Who is more likely to be killed by a police officer in the United States: a white person or a black person? You might think, “Police kill more white people than black people in the US. So it’s the white person.” That answer contains a fallacy: the base rate fallacy. This post explains the fallacy, provides some examples, and suggests how to avoid it.

## 1. Base Rates

To correctly calculate a probability about something in a population, we need to account for that thing’s prevalence in the population. In short, we need to account for its base rate.

*Base rates are rates at which something occurs in a population (of people, items, etc.).* For example, if 1% of people in my neighborhood are doctors, then the base rate of doctors in my neighborhood is simply 1%. If 60% of people in Atlanta own a pet, then the base rate of pet owning in Atlanta is 60%. If 13% of US residents are black people, then the base rate of black people in the US is 13%. You get the idea.

So if we want to calculate a probability about doctors, pet owners, or black people, then we need to account for the base rate of doctors, pet owners, or black people. It’s that simple.

## 2. Examples Of The Base Rate Fallacy

Pregnancy tests, drug tests, and police data often determine life-changing decisions, policies, and access to public goods. So we should make sure we understand how to avoid the base rate fallacy when thinking about them.

#### 2.1 Pregnancy Test

Suppose Jesse’s pregnancy test kit is 99% accurate and Jesse tests positive. What is the probability that Jesse is pregnant? 99%?

Actually, it depends on, among other things, whether Jesse belongs to the population of people who can get pregnant (e.g., the population of people who have a uterus). After all, if the base rate of pregnancy (or of uteruses) in Jesse’s population is 0%, then the probability that Jesse is pregnant is 0%. (In that case, Jesse’s positive result would be false, a.k.a. a false positive.) So anyone who put the probability of Jesse being pregnant above 0% either assumed or ignored a relevant base rate.
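Bayes’ rule makes the role of the base rate explicit. In the sketch below, $P(\text{preg})$ stands for the base rate of pregnancy in Jesse’s population and $+$ for a positive test result; the notation is ours, not the test maker’s:

$$
P(\text{preg} \mid +) = \frac{P(+ \mid \text{preg})\,P(\text{preg})}{P(+ \mid \text{preg})\,P(\text{preg}) + P(+ \mid \neg\text{preg})\,\bigl(1 - P(\text{preg})\bigr)}
$$

If $P(\text{preg}) = 0$, the numerator is 0 however accurate the test is, so (whenever a positive result can occur at all) the probability that Jesse is pregnant is 0. The test’s accuracy enters only through $P(+ \mid \text{preg})$ and $P(+ \mid \neg\text{preg})$; it cannot stand in for the base rate.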

#### 2.2 Drug Test

Suppose the drug tests for welfare eligibility are 90% accurate. If I test positive, what is the probability that I am among the 10% of people that actually use drugs? Well, the test is 90% accurate. So it’s 90%, right?

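Actually, it’s 50%. The key is to remember the 10% base rate of drug use. Here is the back-of-the-napkin math, assuming “90% accurate” means the test correctly flags 90% of drug users and correctly clears 90% of non-users. Out of 1,000 applicants, 100 use drugs and 900 do not. The test flags 90 of the 100 users (true positives) and 10% of the 900 non-users, another 90 people (false positives). So only 90 of the 180 positive results, or 50%, belong to actual drug users.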

The thing to remember with diagnostic probabilities like this is that *even a highly accurate test can produce mostly false positives when the base rate of what we’re testing for is very low*. So if we ignore base rates, especially low ones, the base rate fallacy can produce drastically incorrect probability estimates.
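To see how quickly a positive result loses its value as the base rate falls, here is a minimal Python sketch of Bayes’ rule. It assumes “90% accurate” means 90% sensitivity (users test positive) and 90% specificity (non-users test negative); the function name `positive_predictive_value` is hypothetical, not from any library.

```python
def positive_predictive_value(base_rate: float, sensitivity: float, specificity: float) -> float:
    """Return P(condition | positive test) via Bayes' rule.

    base_rate   -- share of the population that has the condition
    sensitivity -- P(positive | condition)
    specificity -- P(negative | no condition)
    """
    for name, value in (("base_rate", base_rate), ("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Share of the whole population that has the condition AND tests positive.
    true_positives = sensitivity * base_rate
    # Share of the whole population that lacks the condition but tests positive anyway.
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    all_positives = true_positives + false_positives
    if all_positives == 0.0:
        raise ValueError("a positive result is impossible under these inputs")
    return true_positives / all_positives


if __name__ == "__main__":
    # The same hypothetical 90%-accurate test, applied to populations with shrinking base rates.
    for rate in (0.5, 0.1, 0.01, 0.001):
        ppv = positive_predictive_value(rate, sensitivity=0.9, specificity=0.9)
        print(f"base rate {rate:.1%}: P(user | positive) = {ppv:.1%}")
```

Run as a script, it shows the same test’s positive results being right about 90% of the time at a 50% base rate, 50% at a 10% base rate, roughly 8% at 1%, and under 1% at 0.1%.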

#### 2.3 Police Statistics

So is a white person or black person more likely to be killed by the police in the United States? Well, police kill more white people than black people. So you might think that the white person is more likely to be killed by the police.

However, there are far more white people in the US than black people (60% vs. 13% according to 2019 Census estimates). So *of course* police kill more white people than black people in the US! That alone doesn’t tell us much about white people, black people, or police. It might only reflect the base rates of white people and black people in the US. So what happens when we account for those base rates?

When we factor in the base rate of each race in the US, we find that black people are multiple times more likely than white people to be killed by police officers: the opposite of what we concluded when we fallaciously neglected the relevant base rates.
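Because the total US population appears in both per-capita rates, it cancels, so the comparison depends only on the counts and each group’s population share. The Python sketch below uses the 13% and 60% shares cited above; the function name is hypothetical, and the only “counts” it uses are an illustrative 1:2 ratio, not real police statistics.

```python
def per_capita_rate_ratio(killed_a: float, share_a: float, killed_b: float, share_b: float) -> float:
    """How many times more likely a member of group A is to be killed than a member of group B.

    killed_* are counts of people killed; share_* are each group's share of the
    population. The total population size cancels out of the ratio.
    """
    if share_a <= 0 or share_b <= 0 or killed_b <= 0:
        raise ValueError("population shares and group B's count must be positive")
    rate_a = killed_a / share_a  # proportional to group A's per-capita rate
    rate_b = killed_b / share_b  # proportional to group B's per-capita rate
    return rate_a / rate_b


BLACK_SHARE = 0.13  # share of US residents (2019 Census estimate cited in the text)
WHITE_SHARE = 0.60

# Break-even point: if black deaths are MORE than this fraction of white deaths,
# black residents face the higher per-capita risk.
break_even = BLACK_SHARE / WHITE_SHARE
print(f"break-even count ratio: {break_even:.3f}")

# Hypothetical illustration only: police kill half as many black people as white people.
print(f"per-capita ratio at 1:2: {per_capita_rate_ratio(1, BLACK_SHARE, 2, WHITE_SHARE):.2f}")
```

With those shares, black residents face the higher per-capita risk whenever police kill more than about 0.22 black people per white person killed. Even in the hypothetical where police kill only half as many black people as white people, the per-capita risk for black residents comes out about 2.3 times higher.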

## 3. Stereotypes & The Base Rate Fallacy

Psychology has revealed that we are prone to ignore base rates when calculating probabilities about stereotyped people or groups. Consider a famous example (Kahneman & Tversky, 1973, p. 241).

Jack is a 45-year-old man. He is married and has four children. He is generally conservative, careful, and ambitious. He shows no interest in political and social issues and spends most of his free time on his many hobbies which include home carpentry, sailing, and mathematical puzzles.

The probability that Jack is one of the 30 engineers in the sample of 100 people is

____%

People tend to *overestimate* the probability that Jack is an engineer relative to the 30% base rate of engineers, *t*(169) = 3.23, *p* < 0.01 (ibid.). That is, people seem to ignore the 30% **base rate of engineers** stated in the prompt’s final sentence. In other words, people tend to commit the base rate fallacy when given that description of Jack.

However, people tend to avoid the base rate fallacy when individuals are *not* described stereotypically (Turpin et al., 2020). In that case, they seem to realize that the probability of someone being an engineer depends on the base rate of engineers in the population. For example:

Suppose that you are given no information whatsoever about an individual chosen at random from the sample. The probability that this man is one of the 30 engineers in the sample of 100 is

____%

(The answer to both prompts is 30.)
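One way to see why the base rate is the starting point is Bayes’ rule in odds form: posterior odds equal the prior odds (set by the base rate) times a likelihood ratio (how much more likely the evidence is for an engineer than for a non-engineer). The Python sketch below is only an illustration of that framing; the function name is hypothetical and nothing in it comes from Kahneman and Tversky’s materials. Knowing nothing about the person means a likelihood ratio of 1, so the answer stays at the 30% base rate.

```python
def posterior_probability(base_rate: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form.

    likelihood_ratio = P(evidence | engineer) / P(evidence | not engineer).
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = base_rate / (1.0 - base_rate)      # e.g. 30 engineers : 70 others
    posterior_odds = prior_odds * likelihood_ratio  # update the odds on the evidence
    return posterior_odds / (1.0 + posterior_odds)  # convert odds back to a probability


# No information about the person: the "evidence" is equally likely either way (ratio 1).
print(f"{posterior_probability(30 / 100, likelihood_ratio=1.0):.0%}")
```

In this framing, evidence multiplies the prior odds rather than replacing them, so the base rate never drops out of the calculation.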

## 4. Implications Of The Base Rate Fallacy

There seem to be a few major takeaways from this information about the base rate fallacy.

- **We can help others avoid the base rate fallacy**. We know some of the conditions under which we are more and less likely to commit the base rate fallacy. So if we want people to evaluate probabilities correctly, we should design questions and decision environments that are less likely to encourage the fallacy. For example, in certain decision contexts we should either anonymize individuals or avoid describing them stereotypically.
- **We can correct our base rate fallacies**. Now that you know about base rates and their role in probability calculations, you can correct yourself. Just ask, “What is the relevant base rate?” You might first need to ask, “What is the relevant population?”
- **Agnosticism, honesty, and humility**. Sometimes we do not know the relevant base rate, and sometimes we *cannot* know it. In those cases we simply cannot calculate the probabilities that depend on that base rate, and we should admit that we do not know them, especially when the stakes are high (e.g., drug testing or disease testing).