**Important update 10/24/11 -** Please see this revised analysis.

**Original post:**

A second earth-shattering fact is that there are more numbers in the universe that begin with the digit 1 than 2, or 3, or 4, or 5, or 6, or 7, or 8, or 9. And more numbers that begin with 2 than 3, or 4, and so on. This relationship holds for the lengths of rivers, the populations of cities, molecular weights of chemicals, and any number of other categories. What a blow to any of us who purport to have mastered the basic facts of the world around us!

This numerical regularity is known as Benford's Law. Specifically, it says that the probability that the first digit of a number drawn from a set of numbers is *d* is given by

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, \dots, 9$$

In fact, Benford's law has been used in legal cases to detect corporate fraud, because deviations from the law can indicate that a company's books have been manipulated. Naturally, I was keen to see whether it applies to the large public firms that we commonly study in finance.
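To make the first-digit probabilities concrete, here is a minimal Python sketch that tabulates the standard Benford formula, log10(1 + 1/d). The function name `benford_probabilities` is only an illustrative choice, not something taken from the original analysis.

```python
import math

def benford_probabilities():
    """Return {d: P(d)} for leading digits d = 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_probabilities()
for d, p in probs.items():
    # Print each digit with its predicted share of leading digits.
    print(f"{d}: {p:.3f}")
```

The nine probabilities sum to one because the logarithms telescope to log10(10).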

I downloaded quarterly accounting data for all firms in Compustat, the most widely-used dataset in corporate finance that contains data on over 20,000 firms from SEC filings. I used a standard set of 43 variables that comprise the basic components of corporate balance sheets and income statements (revenues, expenses, assets, liabilities, etc.).

And lo, it works! Here is the distribution of first digits vs. Benford's law's prediction for total assets and total revenues.

Next, I looked at how adherence to Benford's law changed over time, using the sum of squared deviations of the empirical density from Benford's prediction:

$$\sum_{d=1}^{9} \left( \hat{P}(d) - \log_{10}\left(1 + \frac{1}{d}\right) \right)^2$$

where $\hat{P}(d)$ is the empirical probability of the first digit *d*.
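As a rough illustration of this deviation measure, the sketch below extracts leading digits and sums the squared gaps between empirical and Benford frequencies. It is not the code behind the charts. The helper names (`first_digit`, `benford_ssd`) and the two toy data sets are assumptions made for the example.

```python
import math
from collections import Counter

def first_digit(x):
    """Leading nonzero digit of a nonzero finite number, ignoring sign."""
    # Scientific notation looks like 'd.ddd...e+XX', so the first character is the digit.
    return int(f"{abs(x):.15e}"[0])

def benford_ssd(values):
    """Sum over d = 1..9 of (empirical share of d - Benford share of d)^2."""
    digits = [first_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    n = len(digits)
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) / n - math.log10(1 + 1 / d)) ** 2
        for d in range(1, 10)
    )

# Toy data (hypothetical): powers of 2 follow Benford closely; 1..999 has uniform leading digits.
ssd_powers = benford_ssd([2 ** k for k in range(1000)])
ssd_uniform = benford_ssd(range(1, 1000))
print(ssd_powers, ssd_uniform)
```

Because the measure is built from shares rather than raw counts, it does not grow mechanically with the number of observations, though small samples will be noisier.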
Deviations from Benford's law have increased substantially over time, such that today the empirical frequency of each digit is about 3 percentage points off from what Benford's law would predict. The deviation increased sharply between 1982 and 1986 before leveling off, then zoomed up again from 1998 to 2002. Notably, the deviation from Benford dropped off very slightly in 2003-2004 after the enactment of the Sarbanes-Oxley accounting reform act in 2002, but the drop was tiny, and the deviation resumed its increase up to an all-time peak in 2009.

So according to Benford's law, accounting statements are getting less and less representative of what's really going on inside of companies. The major reform that was passed after Enron and other major accounting scandals barely made a dent.

Next, I looked at Benford's law for three industries: finance, information technology, and manufacturing. The finance industry showed a huge surge in the deviation from Benford's from 1981-82, coincident with two major deregulatory acts that sparked the beginnings of that other big mortgage debacle, the Savings and Loan Crisis. The deviation from Benford's in the finance industry reached a peak in 1988 and then *decreased* starting in 1993 at the tail end of the S&L fraud wave, not matching its 1988 level until … 2008.

The time series for information technology is similarly tied to that industry's big debacle, the dotcom bubble. Neither manufacturing nor IT showed the huge increase and decline of the deviation from Benford's that finance experienced in the 1980s and early 1990s, further validating the measure since neither industry experienced major fraud scandals during that period. The deviation for IT streaked up between 1998-2002 exactly during the dotcom bubble, and manufacturing experienced a more muted increase during the same period.

While these time series don't prove anything decisively, deviations from Benford's law are compellingly correlated with known financial crises, bubbles, and fraud waves. And overall, the picture looks grim. Accounting data seem to be less and less related to the natural data-generating process that governs everything from rivers to molecules to cities. Since these data form the basis of most of our research in finance, Benford's law casts serious doubt on the reliability of our results. And it's just one more reason for investors to beware.

As noted by William Black in his great book on the S&L crisis, *The Best Way to Rob a Bank Is to Own One*, the most fraudulent S&Ls were the ones that looked most profitable on paper. That was in fact an inherent part of the scam. So perhaps, instead of looking solely at profitability, we should also consider this more fundamental measure of a firm's "performance." And many questions remain. What types of firms, and what kind of executives, drive the greatest deviations from Benford's law? Does this measure do well in predicting known instances of fraud? How much of these deviations is driven by government deregulation, changes in accounting standards, and traditional measures of corporate governance?

Stay tuned to find out.

**References:**

Mark Nigrini's book on Benford's Law

## 65 comments:

awesome post!!!!

Do you think there are enough numbers that occur in a quarterly filing w/ the sec to reliably tell if a company is 'fudging' their numbers using benford's law?

This is excellent but be careful - can you be sure that there has not been a change to reporting requirements which means reporting of numbers that would not follow Benford's Law?

I like the idea, however, that means some cooking at top level. if individual departments cook their book, the sum/average should follow benford law....

and i would suspect cooking to happen more at dpt level than top.

So Benford's law would dictate that 1990 (starting with a 1) should be more common than 2000, or 2001 (because they start with 2). Could this explain why Benford's law underpredicts the distribution more as we left the 20th century and entered the 21st?

@MostlyAPragmatist Benford's law applies to data sets that grow exponentially such as incomes and stock prices. It does not apply to sets of numbers you pull out of your hat, which is actually why it is useful for fraud detection.

Who said that economics can't be a science!

Good post. I'd be interested to know which variable (revenue?) was plotted in the last few charts and how many companies went into each data point (was it all 20,000, and the same 20,000 each time?)

Nice work.

Three questions:

1. Can you replot without normalizing the first two charts? There's some loss of information such as total number of firms.

2. How many types of variables are being used in your sum of squared deviations, just the total revenue and total assets for each company in each year?

3. The increase looks real but very small e.g. 0.001 to 0.008 (was this on your normalized distribution or non-normalized?) - can one estimate the number of companies engaged in fraudulent behavior? Either Bayesian or frequentist estimate would be fine.

Thanks, I wondered when someone would get around to this.

Try doing this with Fed, Pentagon and OMB numbers. I'm willing to bet there will be some even more significant deviations from Benford in those numbers.

Firstly, congratulations on some interesting work.

I have substantial experience applying Benford's Law to the insurance industry.

Some warnings:

There are many reasons why data may not follow Benford's Law; not just fraud.

This type of work is highly susceptible to data mining.

I have found certain types of people (usually otherwise highly intelligent mathematicians/econometricians) become so obsessed with Benfords law because of it's simplicity and elegance that they are completely blind to the two points above.

Apologies for my misuse of apostrophes in its and Benford's. Difficult posting on a cellphone.

Assets = Liabilities + Owners Equity

I think the analysis is a bit of a stretch considering there have been many changes in GAAP reporting requirements, especially on the L and OE side of the equation(off balance sheet financing, stock options, derivatives, mark-to-market, etc. over time). The underlying data may likely be skewed.

Since you appear to be Stata user, you can do a formal hypothesis test using digdis or firstdigit (both from SSC).

Thanks to everyone for their comments. Other than Sarbanes-Oxley, I haven't tested whether accounting standards changes are related to changes in the deviation. What I can say with some confidence is that these changes have made the data less representative of what's going on inside firms.

The natural quasi-random economic processes that generate assets and cashflows are still present and should still follow the law. If there were reforms, they have made the numbers LESS representative of the underlying processes.

Moreover, accounting changes don't explain why deviations from the law seem to correlate with known periods of fraud within industries. In my next post I'll look at how it correlates with known fraud within individual firms.

The main charts use quarterly data for ALL 43 variables for ALL firms in Compustat (most SEC filings by most public firms and some private ones). The first two use data from just revenues and assets for all of these firms.

Benford's law does not hold for arithmetic sequences such as years.

very interesting:

http://acemaxx-analytics-dispinar.blogspot.com/2011/10/die-abnehmende-zuverlassigkeit-der.html

@Nello: My point was that there may be some kind of non-fraud-related data which doesn't follow a power series that is polluting the results and making it disagree with Benford's law. So, if dates were somehow being included in the data, they could show this. I see the correlation with the tech bubble and the housing bubble (at least a little bit, but mostly in the financial sector). I'm looking for a non-fraud explanation for why the numbers are increasing so uniformly in all industries regardless of whether we're entering a bubble.

I have never before seen a Benford's law time series. It is a great way to check that the law is relevant to your data set.

Aside from revenue and profits, what other series are included?

Very interesting. One thing that occurred to me was that the advent of computers could possibly have changed the way we present data, and that might have an effect on the outcome of these distributions. What happens if you look at NON-financial data of some sort with similar time histories, but that is presented in a similar way as financial reports? Say water or electricity bills or usage from dozens of utility companies. I want you to be right, but my skeptical nature warns...

Excellent Post. I suggest you use Google scholar to save yourself some work: Accounting Professors and Finance Professors have been doing similar work for nearly 20 years now, and there is a lot of good stuff there.

Past Benford's Law work has shown hanky panky in Greek public finances, Enron accounting statements, and Earnings.

http://onlinelibrary.wiley.com/doi/10.1111/j.1468-0475.2011.00542.x/abstract

http://www.emeraldinsight.com/journals.htm?articleid=1657234&show=abstract

http://www.springerlink.com/content/g2532196r4284x21/

tylerh - Thanks for the references. I did do a literature search, but there is surprisingly NOT a lot of work out there considering how long we've known about it, and the work hasn't been as influential as it probably should be. Most academics don't know about it, and furthermore no one has done the type of broad-scale study presented here.

So I think there's quite a bit of value in getting this out there!

Benford's Law is a good tool to uncover possible fraud.

I've used it in a privately held business before, and uncovered enough questions to pursue issues. So it works and is practical.

It's a good quick cut on the data set to zero in on suspicious activity. But do be cautious about small data sets and be willing and able to drill down to the next level.

Very interesting - how would Benford's law work if the growth process stagnated? For example, if the period 2008-2011 has little to no growth in the financial numbers, wouldn't that be "suspicious" activity as per Benford?

The leading zero in values >0 and <1 would cause some deviation from Benford's distribution. Values for Earnings Per Share are often in this range.

Interesting ideas. I wonder if the large changes in valuations of companies before and during the various crises itself could be the source of the deviation from Benfords? Ie, market values are no longer increasing at a steady geometric rate, thus skewing the number distribution.

Dummy me - can you run a chi-square test on your data? Is it significant?
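For readers wondering what such a test would look like, here is a hedged sketch of a Pearson chi-square goodness-of-fit statistic against Benford's proportions. It uses toy digit lists, not the Compustat data, and the function and variable names are made up for the illustration. The 5% critical value for 8 degrees of freedom is 15.507.

```python
import math
from collections import Counter

CHI2_CRIT_DF8_05 = 15.507  # 5% critical value of chi-square with 9 - 1 = 8 degrees of freedom

def benford_chi_square(digits):
    """Pearson chi-square statistic of observed leading digits vs. Benford's law."""
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # expected count of digit d
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

# Toy inputs (hypothetical): leading digits of powers of 2, and perfectly uniform digits.
digits_powers = [int(str(2 ** k)[0]) for k in range(1000)]
digits_uniform = [d for d in range(1, 10)] * 111

stat_powers = benford_chi_square(digits_powers)
stat_uniform = benford_chi_square(digits_uniform)
print(stat_powers < CHI2_CRIT_DF8_05, stat_uniform > CHI2_CRIT_DF8_05)
```

One caveat: with a very large number of observations, a chi-square test will flag even tiny departures as significant, so the statistic says more about detectability than about the size of the deviation.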

Great post. I found a similar one about Benford's Law and the current mess in Europe here: http://www.helixpartners.com/market-commentary/benfords-law-and-the-eurozone/

One thing I'd be really interested in seeing is how deviations from Benford's law varies at the firm level. Do firms stick to one level? Or are they learning how to fudge numbers over time? (Learning by doing!)

What is the variation within industry vs. across industry? Do firms with a simpler production process (e.g., turning trees into paper) have different Benford deviations than firms with complex and hard-to-define production functions (e.g., software development)?

-Eric

Have you run across a population of data that yields a flat line? It would be interesting to know about one or two of those.

Are there enough datapoints for you to look at Benford's law just within the universe of US-listed Chinese companies (of which many have been found fraudulent) to see if there is a meaningful divergence from what is expected?

See "Benford's Law" article in Wikipedia, section "Generalization to digits beyond the first [digit]". Formula, explanation, and references provided.

Great work. You may want to split your analysis according to the size of the firm and the size of the numbers. Discrepancy of large firms/large numbers is more important than discrepancy of small firms/small numbers.

Interesting food for thought. Would be interesting to see if there was a statistically significant difference when controlling for the number and size of public companies who are required to file with the SEC over time. It's a widely held theory in aviation that the number of fatal airline accidents will increase as passenger traffic increases. I would think that as the number of SEC filers increased, so would incidents of accounting irregularities. Additionally, the geographic diversity of companies who file with the SEC has increased dramatically in the last 20 years as the equity markets have globalized. Chinese and other foreign, emerging market companies have captivated investors even while being notorious perpetrators of accounting fraud.

However, Benford's Law is only relevant to data sets across multiple orders of magnitude. Public companies must comply with various listing requirements that require a firm to reach and maintain a certain size in order to be allowed to be publicly-traded and subject to SEC filing requirements. Smaller companies are also more likely to fail or to be consumed in M&A (mice can't eat elephants), and are much less likely to exist as public companies because of competitive dynamics in certain sectors, such as financial services. The flip side of Sarbanes-Oxley is that it also makes it much harder to justify being public as a small firm. It is interesting to note how the increase in the sum of squares correlates with bull markets for M&A and IPOs in the mid 80's (junk-bond boom, conglomerate busting), late 90's (dot-com era), and 2000's (age of the LBO).

A test to validate the existence of a sufficiently wide and consistent variance in the population of the data set over time would make the analysis more compelling.

Neat work. Glad it's getting attention. My thought is this may reflect a) more composite statements and b) greater manipulation of earnings through financial operations. First one implies greater uncertainty which people naturally resolve in patterns we think make sense. Second implies a number is targeted - as in we need x this quarter - and again we don't want that to look what we think is unnatural.

[partly cross-posted from Mark Thoma's blog]

Interesting work, though it is important to consider whether certain economic processes [other than fraud] might bias the distribution:

One could argue that competitive pressures would drive corporate results to be correlated with each other to a greater extent than in a random power-law distribution. If high profit levels are competed away over time, figures for multiple firms would tend to approach each other. It's not obvious that this would result in non-Benford distributions, but it is certainly possible to imagine circumstances where it could (e.g. if firms with over $1 billion in profits become particularly noticeable to competitors).

Or, even given the figures *are* partly influenced by the managers of the companies, this might be a financially oriented management strategy rather than fraud. For instance, a company might target a 10% sales increase each year, and might then invest whatever resources it takes, or work their salespeople hard enough, to ensure that happens. Again, not certain that this would result in a Benford's law violation but it is mathematically plausible depending on the parameters.

The trend of increasing deviation from Benford's law is suggestive, as are the industry-specific trends. But again you would want to check against other potential causes: it is plausible that deregulation leads to more competition, or more financially-oriented management targets, which could cause either of the effects I've suggested above.

But all that said: maybe it is fraud after all. The fact that you've checked 43 different variables and the effects still exist is certainly interesting.

Have you tried checking each of these 43 individually, in case there are specific variables which don't fit a Benford's type distribution at all? (e.g. earnings per share might not, because companies execute stock splits or combinations which tend to push stock price, and therefore EPS, towards a consistent value across different companies)

Link to this article posted on the "Talk Economics" Bulletin Board at Bullion Bulls Canada: http://www.bullionbullscanada.com

Is the number of firms filing to the SEC increasing over time? If so, then wouldn't we expect the "sum of squares" to increase over time as well (even in the absence of fraud)? Perhaps something more like a "mean square" measure is needed.

Well ... i have a point that should be considered.

Benford's law assumes that the underlying quantity does not grow or does not follow some nontrivial law of motion. The length of a river is not contracting or expanding over time as systematically as the accounting variables. Thus the second measure is not valid for time processes that exhibit e.g. a random walk or growth.

IMHO Benford's law should hold once a nontrivial transformation of the variable is taken so that it does satisfy the underlying assumption at the end. Yet this has to be shown. Maybe empirical proof could be considered as sufficient.

Thinking about the finance sector specifically:

(1) Did (or should) you control for the massive consolidation in the Bank/S&L industries (over 18k institutions in the early '80s to fewer than 8k today)? Does growth through merger influence Benford numbers differently than organic growth (i.e., would merging of three "1" banks in year 1 give you a "3" bank in year 2?)

(2) Can the Benford approach reveal anything by looking at relationships between fin'l stmt. numbers (e.g., change in total loans vs. change in loan-losses or loss reserves)?

(3) Might the growth in loan securitization since the mid-'80s explain some of the results by introducing more variance in assets and revenues than existed previously (when banks held all loans in their portfolio and had organic run-on / run-off)?

I'm still concerned about systematic effects that have changed over the years. Example: just today when recording data from my digital oscilloscope -- I was only getting numbers that ended in even digits!? Some rounding algorithm that would make a mess of Benford's law but really didn't affect my data in any way.

What happens when businesses stop using pennies? Does this walk up the ladder all the way so that all numbers have an excess of 5's? How about the change when the markets went from fractional numbers to decimal? You really need some controls. Also, the details of which digits are either over or under represented might shed light on if there are logical systematic effects that could account for discrepancies.

What happens if you run the data again, but throw away the least significant digit in every number? This would presumably remove most of the effects I am concerned about, yet leave the effects you are concerned about exposed.

@MostlyAPragmatist This law is not meant to be applied to a data type like 'calendar year'.

Jialan - excellent analysis, I feel extremely wonky after reading through it! I found your blog through a link on Dealbreaker. Looking forward to more posts.

Have you tried the KL-divergence as a measure of the difference between two distributions? It's a more natural notion of distance than the sum of squares error.
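A minimal sketch of what this KL-divergence suggestion might look like, measuring the divergence of an empirical first-digit distribution from Benford's in nats. The function name and the uniform example distribution are illustrative assumptions, not part of the original analysis.

```python
import math

# Benford's predicted share for each leading digit 1..9.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def kl_divergence_from_benford(empirical):
    """D_KL(empirical || Benford) in nats; `empirical` maps digit -> probability."""
    total = 0.0
    for d in range(1, 10):
        p = empirical.get(d, 0.0)
        if p > 0:  # terms with p = 0 contribute nothing, by the usual convention
            total += p * math.log(p / benford[d])
    return total

uniform = {d: 1 / 9 for d in range(1, 10)}  # hypothetical non-Benford distribution
kl_same = kl_divergence_from_benford(benford)
kl_uniform = kl_divergence_from_benford(uniform)
print(kl_same, kl_uniform)
```

Unlike the sum of squares, this measure is asymmetric and weights each digit's gap relative to its expected share, so a shortfall in a rare digit such as 9 counts for more.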

Yes, as mentioned above, why use sum of squares rather than MSE / RMSE? Surely the error should be normalized to account for increase in the number of companies reporting..

Also if no effect were found this would certainly not rule out fraud. In fact, since Benford's law has been well known for awhile, I would expect expert book-cookers to ensure their fraudulent numbers are consistent with the law.

One good set of well-known book-cooking companies are Chinese companies that listed their stock on the U.S. exchanges via a reverse takeover. At least 80% of them have turned out to be frauds.

It would be interesting if you could run your analysis on this group of companies. I can provide you with a list of them if needed.

> more numbers in the universe that begin with the digit 1 than 2, or 3...

According to Benford's Law, that statement is incorrect. The probability that a number starts with 1 is log10(2) = 0.3, so there are fewer numbers in the universe that start with 1 than 2,3,4,5,6,7,8, or 9.

It is possible for someone committing fraud to be aware of Benford's law, and adjust the numbers accordingly. I would be interested in seeing plots of the same data, but in a different base (e.g. plot the frequency of first digits in base 6 vs log_6(1 + 1/d))
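The base change suggested here is straightforward to sketch. In base b the Benford probability of leading digit d is log_b(1 + 1/d) for d = 1, ..., b - 1. The helper names below are hypothetical, and the digit extraction assumes positive integers.

```python
import math

def benford_probabilities_base(b):
    """Benford probabilities of leading digits 1..b-1 in base b."""
    return {d: math.log(1 + 1 / d, b) for d in range(1, b)}

def first_digit_base(n, b):
    """Leading digit of a positive integer n when written in base b."""
    while n >= b:
        n //= b  # drop the last base-b digit until one digit remains
    return n

base6 = benford_probabilities_base(6)
for d, p in base6.items():
    print(f"{d}: {p:.3f}")
```

For example, 35 is written "55" in base 6, so `first_digit_base(35, 6)` returns 5, while 36 is "100" and returns 1.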

Interesting post.

Previous research on the subject for context: http://www.jstor.org/pss/247861

This is pretty cool. It's always amazing to find skews in things that I would think were random.

I dispute Benford's Law. Here's a simple proof that the number of numbers starting with the digit 2 is greater than the number of numbers starting with the digit 1.

* * *

Let X be the set of all numbers starting with the digit 1.

Let Y be the set of all numbers starting with the digit 2.

There are both odd and even numbers in X.

There are both odd and even numbers in Y.

Let Z be the set of all numbers that equals 2 multiplied by each number in X.

The number of numbers in Z equals the number of numbers in X.

Since Y contains all numbers that start with the digit 2, then Z must either be a subset of Y or equal to Y.

However, each number in Z is an even number; therefore, Z must be smaller than Y.

Since Z is equal to X, then X must be smaller than Y.

Therefore, the number of numbers that begin with the digit 1 is smaller than the number of numbers that begin with the digit 2.

QED.

* * *

@Ray Balestri: when you take a number that starts with 1, and multiply it by two, you don't necessarily get a number that starts with 2. Consider, for example, 15*2=30. Don't bother trying to correct the proof. The cardinality of all natural numbers starting with 1 is the same as the cardinality of the whole set of natural numbers... both are aleph null. Benford's Law is not a statement about number theory.

Ouch. You are correct.

What, then, does Benford's Law explain or describe?

Anonymous, don't bother responding. Wikipedia's entry is very helpful. The lead-in sentence, which makes a lot of sense, says: "Benford's law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way."

Of course, this is much different from Ms. Wang's rather imprecise and misleading statement: "there are more numbers in the universe that begin with the digit 1 than 2, or 3, or 4, or 5, or 6, or 7, or 8, or 9."

The universe is a pretty big place, producing results closer to the what one would expect to find under number theory than Benford's Law.

How about some update on the problem you have found, and when might we see the new results? I suspect that it may be minor and that the results may still be directionally accurate.

So, all was nonsense.

awesome. Although it is very technical. But it really helps. Looking forward for your next post =)

Has anyone checked Groupon's declared numbers for their fit with Benford's Law?

Are Compustat numbers normalized at all or can you count on a one to one relationship with reported data in the SEC filings? Love this stuff!

China is on the decline so where is the true supply-side fundamental growth that can support Dow 12,000 ???

16 Facts About China That Will Make Your Mind Melt , and why the Lie about China's economic growth as an International investment needs to be Debated .

http://www.businessinsider.com/facts-about-china-2011-9?op=1

Tighter Oversight of China Bank Risk Needed: IMF

http://www.bloomberg.com/news/2011-11-15/imf-calls-for-more-oversight-of-china-bank-risk.html

I think there is a good case to be made that would suggest that the multi-Nationalists Corporations are Cooking the books a little bit today because what the heck is going on today to support Dow Jones equities at 12,000 when in 2006-7 we had a whole lot more going on than we do NOW ????

If they will allow this to go one they will allow the Multi-Nationals to cook their books don't ya think ?????

Congress: Trading stock on inside information?

http://www.cbsnews.com/8301-18560_162-57323527/congress-trading-stock-on-inside-information/

'60 Minutes' Uncovers Pelosi's Insider Stock Trades

http://www.newsmax.com/InsideCover/pelosi-stock-insider-60minutes/2011/11/13/id/417848?s=al&promo_code=D800-1

Note the good fit to Benford's Law. The spikes were mainly due to recurrences of low value invoice amounts. The analysis was done in IDEA using the Benford's Law analysis tool.

What is really interesting about the article is not the magnitude of deviation from Benford's law, it's the change in deviation over time.

I've placed an Excel spreadsheet for investigating Benford's Law here

Benford's Law is pretty good technique to detect the frauds and irregularities in accounting data, i have prepared a guide and step by step guidance on how to apply the benfords law using MS-Excel.

http://internalauditworkingpapers.weebly.com/benfords-law-application.html
