The Base Rate Fallacy

Who is more likely to be killed by a police officer in the United States: a white person or a black person? You might think, “Police kill more white people than black people in the US. So it’s the white person.” That answer contains a fallacy: the base rate fallacy. This post explains the fallacy, provides some examples, and suggests how to avoid it.

1. Base Rates

To correctly calculate a probability about something in a population, we need to account for that thing’s prevalence in the population. In short, we need to account for its base rate.

Base rates are rates at which something occurs in a population (of people, items, etc.). For example, if 1% of people in my neighborhood are doctors, then the base rate of doctors in my neighborhood is simply 1%. If 60% of people in Atlanta own a pet, then the base rate of pet owning in Atlanta is 60%. If 13% of US residents are black people, then the base rate of black people in the US is 13%. You get the idea.

So if we want to calculate a probability about doctors, pet owners, or black people, then we need to account for the base rate of doctors, pet owners, or black people. It’s that simple.

2. Examples Of The Base Rate Fallacy

Pregnancy tests, drug tests, and police data often determine life-changing decisions, policies, and access to public goods. So we should make sure we understand how to avoid the base rate fallacy when thinking about them.

2.1 Pregnancy Test

Suppose Jesse’s pregnancy test kit is 99% accurate and Jesse tests positive. What is the probability that Jesse is pregnant? If the test is 99% accurate, then the probability must be 99%, right?

Positive pregnancy test result. Public domain.

Actually, it depends on, among other things, whether Jesse is part of the population of people who can get pregnant. After all, if Jesse lacks a uterus, then the base rate of pregnancies in Jesse’s population is 0%, making the probability that Jesse is pregnant 0%. (In that case, Jesse’s positive test result would be a false positive.) So if someone thought that the probability of Jesse being pregnant was more than 0%, then they either assumed a nonzero base rate or ignored the base rate altogether.

2.2 Drug Test

Suppose the drug tests for welfare eligibility are 90% accurate. If I test positive, what is the probability that I am among the 10% of people that actually use drugs? Well, the test is 90% accurate. So it’s 90%, right?

Calculating the probability of a true positive drug test result on a population of 100 people when the base rate of drug use is 10% and the test is 90% accurate. Boxes not drawn to scale. Nick Byrd, CC BY 4.0

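Actually, it’s 50%. The key is to remember the 10% base rate of drug use. Check the back-of-the-napkin math for yourself (above): of 100 people, 10 use drugs and 9 of them test positive, while 90 do not use drugs and 9 of them also test positive. So only 9 of the 18 positive results are true positives.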
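The napkin math can also be checked in a few lines of Python. This is only a minimal sketch: it assumes that "90% accurate" means the test gives the right answer 90% of the time for drug users and non-users alike, and the function name and parameters are made up for illustration.

```python
def share_of_true_positives(population, base_rate, accuracy):
    """Count people in each box and return true positives / all positives.

    Assumption: `accuracy` applies equally to users and non-users.
    """
    users = population * base_rate            # people who actually use drugs
    non_users = population - users            # people who do not
    true_positives = users * accuracy         # users correctly flagged
    false_positives = non_users * (1 - accuracy)  # non-users wrongly flagged
    return true_positives / (true_positives + false_positives)


# The scenario from the text: 100 people, 10% base rate, 90% accurate test.
result = share_of_true_positives(population=100, base_rate=0.10, accuracy=0.90)
print(f"{result:.0%}")  # prints 50%
```

Counting people rather than multiplying probabilities keeps the base rate in plain view: the 9 false positives come from the much larger group of 90 non-users.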

The thing to remember with diagnostic tests like this is that a large share of the positive results from even a highly accurate test will be false if the base rate of what it tests for is very low (as Dr. Deborah Birx explains in less than 60 seconds below). So if we ignore base rates, especially low base rates, then we end up with inaccurate probabilities. And sometimes the stakes are way too high to shrug off such errors.
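To see how much the base rate matters, here is a rough sketch that holds the test's accuracy fixed at 99% and varies only the base rate. As before, it assumes that "accuracy" applies equally to people who have the condition and people who do not; the function name and the list of base rates are illustrative choices, not figures from any real test.

```python
def probability_given_positive(base_rate, accuracy):
    """Probability that a positive result is a true positive (Bayes' rule).

    Assumption: the test is right with probability `accuracy` for both
    people who have the condition and people who do not.
    """
    true_positive = base_rate * accuracy
    false_positive = (1 - base_rate) * (1 - accuracy)
    if true_positive + false_positive == 0:
        return 0.0  # no positive results are possible at all
    return true_positive / (true_positive + false_positive)


# Same 99% accurate test, applied to populations with different base rates.
for base_rate in (0.50, 0.10, 0.01, 0.001, 0.0):
    p = probability_given_positive(base_rate, accuracy=0.99)
    print(f"base rate {base_rate:.1%} -> {p:.1%}")
```

With these inputs, the probability falls from roughly 99% at a 50% base rate to 50% at a 1% base rate and about 9% at a 0.1% base rate. At a 0% base rate it is 0%, which is the pregnancy test case above.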

2.3 Police Statistics

So is a white person or black person more likely to be killed by the police in the United States? Well, police kill more white people than black people. So you might think that the white person is more likely to be killed by the police.

However, there are far more white people in the US than black people (60% vs. 13% according to 2019 Census estimates). So of course police kill more white people than black people in the US! That tells us more about the general population than about police. But what happens when we account for the base rates of white and black people?

Police killings per race per million people of that race. Image via Statista.

When we factor in the base rates of each race in the US, we find that black people are multiple times more likely to be killed by police officers than white people. That is the opposite of what we thought when we fallaciously neglected the relevant base rates.
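The adjustment itself is simple arithmetic, sketched below. The population shares are the 60% and 13% figures cited above. The count ratio in the example is purely hypothetical and is not taken from any dataset. The function and variable names are also made up for illustration.

```python
WHITE_SHARE = 0.60  # share of US population, from the Census figure cited above
BLACK_SHARE = 0.13  # share of US population, from the Census figure cited above


def relative_risk(count_ratio, share_a, share_b):
    """Per-capita rate of group A divided by per-capita rate of group B.

    `count_ratio` is (raw count for group A) / (raw count for group B).
    Dividing each count by its group's population share gives the
    per-capita comparison; the total population size cancels out.
    """
    return count_ratio * share_b / share_a


# Count ratio at which both groups face the same per-capita risk.
break_even = BLACK_SHARE / WHITE_SHARE
print(f"break-even count ratio: {break_even:.3f}")

# HYPOTHETICAL input, not real data: suppose the raw count for black
# people were half the raw count for white people.
hypothetical = relative_risk(0.5, share_a=BLACK_SHARE, share_b=WHITE_SHARE)
print(f"relative per-capita risk: {hypothetical:.2f}")
```

The break-even ratio is about 0.22. So whenever the raw count for black people is more than about 22% of the raw count for white people, the per-capita risk is higher for black people, even though the raw count is lower.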

3. Stereotypes & The Base Rate Fallacy

Psychology has revealed that we are prone to ignore base rates when calculating probabilities about stereotyped people or groups. Consider a famous example of how we might reason according to stereotypes (Kahneman & Tversky, 1973, p. 241).

Jack is a 45-year-old man. He is married and has four children. He is generally conservative, careful, and ambitious. He shows no interest in political and social issues and spends most of his free time on his many hobbies which include home carpentry, sailing, and mathematical puzzles.

The probability that Jack is one of the 30 engineers in the sample of 100 people is ____ %

When Jack is described as a stereotypical engineer, most people tend to overestimate the probability that Jack is an engineer relative to the 30% base rate of engineers, t(169) = 3.23, p < 0.01 (ibid.). That is, people seem to ignore the 30% base rate of engineers in the final sentence. In other words, people tend to commit the base rate fallacy about that description of Jack.

However, people tend to avoid the base rate fallacy when individuals are not described stereotypically (Turpin et al., 2020). Instead, they seem to realize that the probability of someone being an engineer depends on the base rate of engineers in the population.

Suppose that you are given no information whatsoever about an individual chosen at random from the sample. The probability that this man is one of the 30 engineers in the sample of 100 is ____ %

(The answer to both prompts is 30.)

4. Implications Of The Base Rate Fallacy

There seem to be a few major takeaways from this information about the base rate fallacy.

We can help each other avoid the base rate fallacy. We know some of the conditions under which we are more and less likely to commit the base rate fallacy. So if we want people to evaluate probabilities correctly, then we should design the questions and decision environments in ways that are less likely to encourage people to commit the base rate fallacy. For example, we should either anonymize individuals or else not describe individuals stereotypically in certain decision contexts.
We can identify our own base rate fallacies. Now that you know about base rates and their role in probability calculations, you can correct yourself. Just ask, “What is the relevant base rate?” You might first need to ask, “What is the relevant population?”
We can be more humble. Sometimes we do not know the relevant base rate, and sometimes we cannot know it. In those cases, we simply cannot calculate the probabilities that depend on that base rate, and we should admit as much, especially when the stakes are high (e.g., drug testing, disease testing, etc.).