By Nigrini’s definition, the second order test looks at relationships and patterns in data and is based on the digits of the differences between amounts that have been sorted from smallest to largest. The digit patterns of these differences are expected to closely approximate the digit frequencies of Benford’s Law. The second order test gives few (if any) false positives: if the results are not as expected (close to Benford), then the data do indeed have some characteristic that is rare and unusual, abnormal or irregular.

As described in the previous article, a mixture of approximately geometric sequences will produce a Benford Set.

What is a Benford Set?

A set of numbers that conforms closely to Benford’s Law is called a Benford Set.
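For the first-two digits test used throughout this article, Benford’s Law gives the expected probability of the first two digits being d (10 to 99) as log10(1 + 1/d). The local BenfordExpectedProbability class isn’t shown here, so the following is only a minimal stand-in sketch, assuming it computes these same probabilities; the function name `benford_first_two_probabilities` is hypothetical.

```python
import numpy as np
import pandas as pd


def benford_first_two_probabilities() -> pd.Series:
    """Expected Benford probabilities for the first-two digits 10..99.

    Hypothetical helper, a stand-in for the local BenfordExpectedProbability
    class (whose real interface is not shown in this article).
    """
    digits = np.arange(10, 100)
    # Benford's Law: P(first two digits = d) = log10(1 + 1/d)
    return pd.Series(np.log10(1 + 1 / digits), index=digits, name="expected")


expected = benford_first_two_probabilities()
print(expected.head(3))
print(f"sum = {expected.sum():.6f}")
```

The probabilities sum to exactly 1 because the sum telescopes: log10(11/10) + log10(12/11) + … + log10(100/99) = log10(100/10).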

A geometric sequence can be described as follows:

\[ S_n = ar^{n-1} \]

The symbols mean the following:

\(S_n\) => the \(n^{th}\) member of the geometric sequence.
\(a\) => the first term of the geometric sequence.
\(r\) => the common ratio, i.e. the \((n + 1)^{st}\) element divided by the \(n^{th}\) element.

The second order test is based on the differences \(D_n\) between successive elements of a geometric sequence:

\[ D_n = ar^{n} - ar^{n - 1} = a(r - 1)r^{n - 1} \]

Since the differences \(D_n\) themselves form a geometric sequence (first term \(a(r - 1)\), common ratio \(r\)), their digits will also conform to Benford’s Law, and the \(N - 1\) differences between the \(N\) elements of the sequence will form a Benford Set.
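To check this numerically, here is a small sketch with illustrative values a = 3, r = 1.05 and N = 2,000 (my own assumptions, not from Nigrini). It builds the sequence, confirms that consecutive differences share the ratio r, and compares the first digits of the differences with Benford’s first-digit probabilities log10(1 + 1/d).

```python
import numpy as np

# Illustrative (assumed) parameters: any a > 0 and r > 1 would do
a, r, N = 3.0, 1.05, 2_000

n = np.arange(1, N + 1)
S = a * r ** (n - 1)   # S_n = a r^(n-1)
D = np.diff(S)         # D_n = S_(n+1) - S_n, giving N - 1 differences

# D_n = a (r - 1) r^(n-1): again geometric, with the same ratio r
assert np.allclose(D, a * (r - 1) * r ** (n[:-1] - 1))
assert np.allclose(D[1:] / D[:-1], r)


def first_digit(values: np.ndarray) -> np.ndarray:
    """Leading (first significant) digit of positive values."""
    return (values / 10.0 ** np.floor(np.log10(values))).astype(int)


digits = first_digit(D)
observed = np.bincount(digits, minlength=10)[1:10] / digits.size
expected = np.log10(1 + 1 / np.arange(1, 10))
for d, (o, e) in enumerate(zip(observed, expected), start=1):
    print(f"{d}: observed {o:.3f}, Benford {e:.3f}")
```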

Nigrini makes the following statement:

If the data are made up of nondiscrete random variables drawn from any continuous distribution with a smooth density function (Uniform, Triangular, Normal or Gamma distributions), then the digit patterns of the \(N - 1\) differences between the ordered elements will be Almost Benford (meaning that the digit patterns will conform closely, but not exactly, to Benford’s Law).

Interestingly, this also applies to data drawn from most of the continuous distributions encountered in practice.

Let’s check this out.

1 Normal distribution

Let’s import our libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns

sns.set_style("darkgrid")

from src.BenfordExpectedProbability import BenfordExpectedProbability
from src.BenfordAnalysis import BenfordAnalysis

np.random.seed(13)

BenfordExpectedProbability and BenfordAnalysis are local classes that I wrote for the previous article.

Let’s draw samples from a normal distribution and plot them:

x = np.random.normal(100_000, 10_000, 100_000)

fig, ax = plt.subplots()

sns.histplot(x, ax=ax)
ax.set(title="Histogram of random samples from normal distribution")

plt.tight_layout()
plt.show()

Calculating the first order differences of the sorted values with pandas is very easy:

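x_diff = (
    pd.Series(x)
    .sort_values()                   # order samples from smallest to largest
    .diff()                          # gap between each value and its predecessor
    .dropna()                        # the first gap is NaN
    .pipe(lambda a: a[a >= 0.0001])  # drop zero and near-zero gaps
)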

x_diff *= 100_000
x_diff

6916     2.981607e+06
76912    7.386981e+08
91236    6.330016e+07
92269    6.214962e+06
76806    1.827014e+07
             ...     
75749    1.262661e+07
5266     2.617066e+07
6455     1.278739e+08
37773    7.745682e+07
12258    5.277675e+08
Length: 99973, dtype: float64

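Note that we multiply by a power of 10 (here 100,000). This shifts the decimal point without changing the digits, and because the filter keeps only differences of at least 0.0001, every scaled difference is at least 10, so it always has two leading digits to read. Nigrini multiplies by 100; I chose a larger factor.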
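Reading the first two digits off a scaled difference can be sketched as follows. This assumes a log10-based extraction (the BenfordAnalysis class may do it differently), and `first_two_digits` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def first_two_digits(values: pd.Series) -> pd.Series:
    """Return the first two significant digits (10..99) of positive values.

    Hypothetical helper: shifts each value so that exactly two digits sit
    to the left of the decimal point, then truncates.
    """
    v = values.to_numpy(dtype=float)
    # e.g. 2.98e6 has floor(log10) = 6, so divide by 10**5 to get 29.8...
    shift = np.floor(np.log10(v)) - 1
    digits = np.floor(v / 10.0 ** shift).astype(int)
    return pd.Series(digits, index=values.index, name="first_two")


sample = pd.Series([2.981607e06, 7.386981e08, 12.5, 99.9, 0.0456])
print(first_two_digits(sample).tolist())  # → [29, 73, 12, 99, 45]
```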

Plotting the first-two-digit frequencies of these differences against Benford’s Law shows that they conform almost perfectly. The red columns (first-two-digit combinations flagged as not conforming to Benford’s Law) can be disregarded here, since their deviations are very small.
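The comparison behind such a plot can be sketched without the local classes. This assumes BenfordAnalysis compares actual and expected first-two-digit proportions; the summary statistic here is the mean absolute deviation (MAD), a conformity measure Nigrini recommends, and the way the red columns are chosen is not reproduced.

```python
import numpy as np
import pandas as pd

np.random.seed(13)
x = np.random.normal(100_000, 10_000, 100_000)

# Same pipeline as above: sort, difference, drop (near-)zero gaps, scale
diffs = pd.Series(x).sort_values().diff().dropna()
diffs = diffs[diffs >= 0.0001] * 100_000
values = diffs.to_numpy()

# First two significant digits of each scaled difference (all are >= 10)
shift = np.floor(np.log10(values)) - 1
first_two = np.floor(values / 10.0 ** shift).astype(int)

# Actual vs. expected proportions for the 90 combinations 10..99
digits = np.arange(10, 100)
expected = np.log10(1 + 1 / digits)
observed = np.bincount(first_two, minlength=100)[10:100] / first_two.size

# Mean absolute deviation: one number summarising conformity (smaller is closer)
mad = np.mean(np.abs(observed - expected))
worst = digits[np.argmax(np.abs(observed - expected))]
print(f"{first_two.size} differences, MAD = {mad:.5f}, largest deviation at {worst}")
```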

2 Uniform distribution

Applying the same methodology to a uniform distribution:

x = np.random.uniform(10_000, 100_000, 100_000)

A picture says a thousand words™:

3 Triangular distribution

Let’s run this test on a triangular distribution:

x = np.random.triangular(10_000, 50_000, 100_000, 100_000)

And we get this plot:

4 Gamma distribution

Finally, let’s check a gamma distribution:

x = np.random.gamma(10_000, 1_000, 100_000)

Will conformity fail?

No.
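To compare all four distributions side by side, the pipeline can be wrapped in one function. This is a sketch under the same assumptions as before (scale factor 100,000, 0.0001 filter, MAD as the summary); `second_order_mad` is a hypothetical name, not part of the local classes.

```python
import numpy as np


def second_order_mad(x: np.ndarray, scale: float = 100_000) -> float:
    """MAD between first-two-digit proportions of sorted differences and Benford.

    Hypothetical helper mirroring the pipeline used in this article.
    """
    diffs = np.diff(np.sort(x))
    diffs = diffs[diffs >= 0.0001] * scale  # drop ties and tiny gaps, then scale
    shift = np.floor(np.log10(diffs)) - 1
    first_two = np.floor(diffs / 10.0 ** shift).astype(int)
    observed = np.bincount(first_two, minlength=100)[10:100] / first_two.size
    expected = np.log10(1 + 1 / np.arange(10, 100))
    return float(np.mean(np.abs(observed - expected)))


np.random.seed(13)
samples = {
    "normal": np.random.normal(100_000, 10_000, 100_000),
    "uniform": np.random.uniform(10_000, 100_000, 100_000),
    "triangular": np.random.triangular(10_000, 50_000, 100_000, 100_000),
    "gamma": np.random.gamma(10_000, 1_000, 100_000),
}
for name, x in samples.items():
    print(f"{name:>10}: MAD = {second_order_mad(x):.5f}")
```

A single number per distribution is easier to compare than four plots, though it hides which digit combinations deviate.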

5 Second order test on real world data

We can use Nigrini’s invoice data to see how the test behaves with real-world data.

	ID	SUPPLIER	DATE	INVOICE	AMOUNT
0	1	2001	2010-01-01	4242J10	25.19
1	2	2001	2010-01-01	7899J10	25.86
2	3	2001	2010-01-01	3830J10	26.57
3	4	2001	2010-01-01	9514J10	27.83
4	5	2001	2010-01-01	6296J10	28.09
...	...	...	...	...	...
189465	189466	52935	2010-07-01	270221266736	33.46
189466	189467	52936	2010-07-01	270348386110	61.52
189467	189468	52937	2010-02-01	271253401514	12.36
189468	189469	52938	2010-02-01	261715090450	8.02
189469	189470	52939	2010-02-01	270241460335	16.30

189470 rows × 5 columns

After applying the same methodology to the AMOUNT column, we can plot its conformity with Benford’s Law.
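For currency amounts, the differences step can be sketched on a DataFrame shaped like the invoice table. The few rows below are hypothetical stand-ins, the ×100 factor follows Nigrini’s choice mentioned earlier, and rounding to whole cents is my own assumption: it keeps binary floating-point error (a one-cent gap stored as 0.00999…) from shifting leading digits.

```python
import pandas as pd

# Hypothetical stand-in for Nigrini's invoice table (only AMOUNT matters here)
invoices = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "AMOUNT": [25.19, 25.86, 26.57, 27.83, 28.09, 28.09],
})

sorted_inv = invoices.sort_values("AMOUNT").reset_index(drop=True)
# Difference between each amount and the next-smaller one
sorted_inv["DIFF"] = sorted_inv["AMOUNT"].diff()
# Zero differences (duplicate amounts) and the first row's NaN carry no digits
diffs = sorted_inv.loc[sorted_inv["DIFF"] > 0, "DIFF"]
scaled = (diffs * 100).round()  # assumption: dollars with cents -> whole cents
print(scaled.tolist())
```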

The plot shows that the analyst should check all invoices where the first two digits of the differences between sorted invoice amounts are 10, 19, 20, 29, 30 … and so on (all red columns). The most critical case is 99.
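Pulling out the invoices to check can be sketched like this. The invoice rows and the `flagged` list are hypothetical; in practice the list would come from the red columns. Each flagged difference involves two invoices, the row itself and the one just below it in sorted order, so both are returned.

```python
import numpy as np
import pandas as pd

# Hypothetical invoices; in the article this would be Nigrini's data
invoices = pd.DataFrame({
    "INVOICE": ["A1", "A2", "A3", "A4", "A5"],
    "AMOUNT": [16.30, 12.36, 25.19, 12.55, 13.54],
})
flagged = [19, 99]  # first-two-digit combinations to review (the red columns)

inv = invoices.sort_values("AMOUNT").reset_index(drop=True)
inv["DIFF"] = (inv["AMOUNT"].diff() * 100).round()  # gap to previous amount, in cents
has_diff = inv["DIFF"] > 0                          # skip the first row and ties

# First two significant digits of each positive difference
d = inv.loc[has_diff, "DIFF"]
inv.loc[has_diff, "FIRST_TWO"] = np.floor(d / 10.0 ** (np.floor(np.log10(d)) - 1))

# A flagged difference involves two invoices: this row and the one before it
hit = inv["FIRST_TWO"].isin(flagged)
to_review = inv[hit | hit.shift(-1, fill_value=False)]
print(to_review[["INVOICE", "AMOUNT", "DIFF", "FIRST_TWO"]])
```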