Proving Election Fraud using Data Science
Overview
While we await the outcome of court cases that will determine who will win the 2020 Presidential election, I thought it would be instructive to analyze the mounting evidence that vote fraud occurred in multiple states.
As it turns out, the 2020 Presidential election is teeming with fraudulent votes. The tool we're going to use to prove this is called Benford's law. Statisticians have been using Benford's law for years to detect election fraud. We'll dive into the details of Benford's law in a moment, but first, a word of caution:
Beware of those who are quick to dismiss Benford's law. They are usually political junkie types worried that careful analysis will turn the election results against their chosen candidate. Methinks they doth protest too much. Benford's law can be used to detect election fraud, but it cannot by itself tell you who cheated. All it can tell you is that someone fiddled with the numbers. Therefore, anyone who has an allergic reaction to using Benford's law as a screening tool to detect fraud is probably dishonest.
Detecting fraud using Benford's law
Math isn't everyone's strong suit, so I'm going to simplify the explanation of Benford's law a bit. Think about sets of numbers that occur in nature or in the ordinary course of human activity and that span several orders of magnitude: electric bill amounts, house prices, city populations, river lengths, and so on. If you take such a set of numbers and look at the leftmost (leading) digit, you'll find that the digit 1 occurs about 30.1% of the time, the digit 2 appears about 17.61% of the time, and so on, like this:
Digit  Percent of the time the digit occurs as a leading digit 

1  30.10% 
2  17.61% 
3  12.49% 
4  9.69% 
5  7.92% 
6  6.69% 
7  5.80% 
8  5.12% 
9  4.58% 
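The percentages in the table come from a simple formula: the expected share of leading digit d is log10(1 + 1/d). Here is a minimal Python sketch that reproduces the table; the function name benford_percent is just an illustrative choice.

```python
import math

def benford_percent(digit: int) -> float:
    """Expected share, in percent, of numbers whose leading digit is `digit`."""
    return 100 * math.log10(1 + 1 / digit)

# Print the expected share for each leading digit 1 through 9.
for d in range(1, 10):
    print(f"{d}  {benford_percent(d):.2f}%")
```

The nine shares add up to 100% because the logarithms telescope to log10(10) = 1.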
This is called a Benford distribution, and the graph of it looks like this:
Benford's distribution (aka Benford's law) also holds true for vote tallies, assuming nobody cheated. I became aware of the Benford approach to fraud detection when I saw people screaming that Twitter was (ironically) censoring mention of it. My initial reaction was my usual skepticism, until I dug a little more and saw that accountants have been using Benford's law for years to detect accounting fraud. I decided it was worth delving into and doing some serious analysis of the election data.
Georgia: Is the Peach State ripe for election fraud?
I started with my home state of Georgia, a place not exactly notable for vote fraud. Frankly, I assumed the Georgia election was acceptably clean and the results mostly accurate. Was I wrong?
Grouping by Presidential candidate, I had Excel tally the leading digit of each candidate's vote total in each of Georgia's 159 counties. (Yes, I used Excel. You can download the workbook here.)
Leading digit  Trump  Biden  Jorgensen 

1  43  46  52 
2  35  41  21 
3  13  16  18 
4  14  21  18 
5  10  7  15 
6  17  9  14 
7  8  8  9 
8  10  6  5 
9  9  5  7 
Total  159  159  159 
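If you'd rather not use Excel, the same tally takes only a few lines of Python. This is a rough sketch: the function names are my own, and the sample totals are made up for illustration. They are not the Georgia county results.

```python
from collections import Counter

def leading_digit(n: int) -> int:
    """Return the leftmost digit of a positive integer."""
    return int(str(n)[0])

def leading_digit_counts(totals):
    """Count how often each digit 1-9 leads; zero totals are skipped."""
    counts = Counter(leading_digit(t) for t in totals if t > 0)
    return {d: counts.get(d, 0) for d in range(1, 10)}

# Hypothetical county totals for one candidate (illustrative only).
sample_totals = [1234, 987, 20431, 156, 3120, 18, 2750]
print(leading_digit_counts(sample_totals))
```

Zero totals are skipped because a zero has no leading digit between 1 and 9.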
You can't tell much just from counting the leading digits. You need to calculate the percentage of the time each leading digit occurs. This way you can compare the vote tallies to the expected Benford distribution, like so:
Leading digit  Benford distribution  Trump  Biden  Jorgensen 

1  30.10%  27.04%  28.93%  32.70% 
2  17.61%  22.01%  25.79%  13.21% 
3  12.49%  8.18%  10.06%  11.32% 
4  9.69%  8.81%  13.21%  11.32% 
5  7.92%  6.29%  4.40%  9.43% 
6  6.69%  10.69%  5.66%  8.81% 
7  5.80%  5.03%  5.03%  5.66% 
8  5.12%  6.29%  3.77%  3.14% 
9  4.58%  5.66%  3.14%  4.40% 
Total  100.00%  100.00%  100.00%  100.00% 
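The percentages are just each count divided by the column total of 159. A short Python sketch, using the counts from the first Georgia table (the variable and function names are illustrative):

```python
# Leading-digit counts for digits 1 through 9, copied from the Georgia table.
COUNTS = {
    "Trump": [43, 35, 13, 14, 10, 17, 8, 10, 9],
    "Biden": [46, 41, 16, 21, 7, 9, 8, 6, 5],
    "Jorgensen": [52, 21, 18, 18, 15, 14, 9, 5, 7],
}

def to_percentages(counts):
    """Convert a list of counts into percentages of their total."""
    total = sum(counts)
    return [100 * c / total for c in counts]

for name, counts in COUNTS.items():
    row = "  ".join(f"{p:.2f}%" for p in to_percentages(counts))
    print(f"{name}: {row}")
```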
Let's visualize this table as a bar chart:
The grey line is the Benford distribution. Remember, in an honest, nonfraudulent election, each candidate's leading digits should follow a Benford distribution. But do they?
Interpreting the results
Look at Biden's column for digit 2. It's way above the Benford line. Also, take a look at his column for digit 4. Same thing. Now look at Trump's column for digit 6. It's also way above the line.
How do we interpret this? A common way statisticians determine whether data fits an expected distribution is to use what's called the chi square goodness-of-fit test. I'm not going to get into the detailed math behind this. Suffice it to say that the chi square test looks at the difference between the observed and expected values, the expected values in this case being the Benford distribution. Here is each digit's contribution to the chi square statistic, computed from the percentages above:
Leading digit  Trump  Biden  Jorgensen 

1  0.31027  0.04541  0.22535 
2  1.10066  3.79612  1.10060 
3  1.48997  0.47165  0.10946 
4  0.08082  1.27690  0.27444 
5  0.33575  1.56221  0.28940 
6  2.39381  0.15846  0.66866 
7  0.10184  0.10184  0.00336 
8  0.26705  0.35407  0.76211 
9  0.25485  0.44983  0.00688 
Total  6.33502  8.21649  3.44026 
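Each entry in the table is (observed - expected)^2 / expected. The table's values are reproduced when observed and expected are both expressed as percentages, using the two-decimal Benford percentages shown earlier. The conventional goodness-of-fit statistic is computed on raw counts instead. For these 159 counties that multiplies every value by 159/100 = 1.59. The sketch below computes both versions so you can compare them; the names are illustrative.

```python
# Benford percentages and leading-digit counts (digits 1-9) from the tables.
BENFORD_PCT = [30.10, 17.61, 12.49, 9.69, 7.92, 6.69, 5.80, 5.12, 4.58]
COUNTS = {
    "Trump": [43, 35, 13, 14, 10, 17, 8, 10, 9],
    "Biden": [46, 41, 16, 21, 7, 9, 8, 6, 5],
    "Jorgensen": [52, 21, 18, 18, 15, 14, 9, 5, 7],
}

def chi_square_terms(observed, expected):
    """Per-category contributions: (observed - expected)^2 / expected."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]

def chi_square_totals(counts):
    """Return (percent-based total, count-based total) for one candidate."""
    n = sum(counts)
    observed_pct = [100 * c / n for c in counts]
    expected_counts = [n * p / 100 for p in BENFORD_PCT]
    pct_total = sum(chi_square_terms(observed_pct, BENFORD_PCT))
    count_total = sum(chi_square_terms(counts, expected_counts))
    return pct_total, count_total

for name, counts in COUNTS.items():
    pct_total, count_total = chi_square_totals(counts)
    print(f"{name}: percent-based {pct_total:.5f}, count-based {count_total:.5f}")
```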
We add up the per-digit contributions for each candidate. What we're looking for is whether the total is less than or greater than what's called the critical value. Again without going into the math, the critical value is 15.5073, which corresponds to 8 degrees of freedom (nine possible leading digits minus one) at the 5% significance level. (If you're curious about the critical value, go check out https://www.omnicalculator.com/statistics/criticalvalue.) Here's what's important:

If the chi square for a candidate is less than the critical value, the observed distribution is a good fit to the Benford distribution.

If the chi square for a candidate is more than the critical value, the observed distribution is not a good fit to the Benford distribution.
In this case, each candidate's chi square value is far less than 15.5073, so each candidate's first-digit distribution does fit the Benford distribution. Now, this does not mean that no fraud occurred. People have cheated at contests throughout history, and that especially includes elections. The question is never, "Does vote fraud exist?" but rather, "How bad is it?" Cheating occurs in every election of any size or significance. But in the case of Georgia, the cheating doesn't show up in this particular analysis.
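The comparison against the critical value can also be written as a few lines of Python. This sketch hardcodes the critical value and the Georgia totals from the text. As an extra, it computes a p-value using a closed-form expression for the chi square upper tail that only works for an even number of degrees of freedom (8 here, for nine digits minus one). The function name is my own, and a statistics package would give the same numbers.

```python
import math

CRITICAL_VALUE = 15.5073  # from the text: 8 degrees of freedom, 5% level

def chi_square_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi square variable; even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    # For df = 2m: exp(-x/2) * sum_{i < m} (x/2)^i / i!
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )

# Chi square totals for Georgia, copied from the table in the text.
TOTALS = {"Trump": 6.33502, "Biden": 8.21649, "Jorgensen": 3.44026}

for name, total in TOTALS.items():
    verdict = "fits" if total < CRITICAL_VALUE else "does not fit"
    p = chi_square_sf_even_df(total, df=8)
    print(f"{name}: chi square {total:.5f}, p = {p:.3f}, {verdict} Benford")
```

A total equal to the critical value gives a p-value of 0.05, which is where the 15.5073 cutoff comes from.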
Chicago, Illinois lives up to the corruption stereotype
Chicago has long had a well-deserved reputation as a place run by crooked politicians and, frankly, criminals acting under color of law. It's not just in the movies and history books. If Benford's law is any indication, Chicago is still living up to the reputation built by its shady past.
Here's the graph for Chicago:
Biden's numbers clearly don't fit the Benford distribution. They look more like a bell curve. Let's perform the chi square test to verify what the graph is showing us:
Digit  Trump  Biden 

1  3.39754  23.20173 
2  0.01160  0.39186 
3  2.62273  25.27114 
4  1.56170  18.56627 
5  0.37939  5.46632 
6  0.04551  0.57512 
7  0.00161  1.04767 
8  0.02969  3.60855 
9  0.66986  3.32666 
Total  8.71963  81.45532 
I won't repeat what I blurted out when I saw these results in my Excel spreadsheet. Biden's chi square total is 81.45532, far above the critical value of 15.5073. That makes it very likely that vote fraud occurred in Chicago. Of course, that's not much of a surprise. It is Chicago, after all.
Notice that Trump's numbers do fit the distribution, which tells me that the few Trump votes were not thrown out or moved to Biden. Rather, it seems that someone was physically or digitally stuffing the ballot box with votes for Biden.
Detroit, Michigan: Another hotbed of corruption
The results from Detroit also demonstrate likely fraud. Biden's tallies, having a chi square of 30.84060, deviate significantly from a Benford distribution. Trump's chi square is 7.016. The difference is palpable.
One more thing worth a mention: The fraud in Detroit is not just from mail-in ballots. I removed all mailed-in/dropped-off ballots from the analysis, and the distribution of Biden votes actually deviated from a Benford distribution even more. This tells me that there was systemic fraud that extended to in-person voting as well.
Does cheating change the results?
Most of the time election fraud goes undetected or—more commonly—ignored because throwing out the fraudulent ballots wouldn't make a difference. This means people who commit election fraud usually get away with it. And when people get away with it for long enough, they eventually up the ante. I think that's what we're seeing here. The people who have engaged in ballot box stuffing for years have decided this is the year to go all in. They didn't count on data nerds catching them in the act.
I don't know how all this will play out. But I do hope that we'll soon see some automated, real-time election analysis that will immediately alert us to issues like we've seen in the 2020 Presidential election. Maybe that will deter the career criminal-politicians from cheating the next time around.
But I'm not holding my breath.
For further reading: DIY Election Fraud Analysis Using Benford’s Law by Rajat Gupta