
# Stats without Tears: 5. Probability

Updated 15 Nov 2021
Copyright © 2012–2022 by Stan Brown, BrownMath.com


Intro: By now you know: There’s no certainty in statistics. When you draw a sample from a population, what you get is a matter of probability. When you use a sample to draw some conclusion about a population, you’re only probably right. It’s time to learn just how probability works.

If you’re learning independently, you can skip the sections marked “Optional” and still understand the chapters that follow. If you’re taking this course with an instructor, s/he may require some or all of those sections. Ask if you’re not sure!

For easy reference, tables used in more than one problem are duplicated at the end of this document.

## 5A.  Probability Basics

### 5A1.  What Is Probability?

Definitions: Probability can be defined two ways: the long-term relative frequency of an event, or the likelihood that an event will occur.

A trial is any procedure or observation whose result depends at least partly on chance. The result of a trial is called the outcome. We call a group of one or more repeated trials a probability experiment.

Example 1: Ten thousand doctors took aspirin every night for six years, and 104 of them had heart attacks. The relative frequency is 104/10000 = 1.04%, so the probability of heart attack is 1.04% for doctors taking aspirin nightly.

Each doctor represents a trial, and the outcome of each trial is either “heart attack” or “no heart attack”. The group of 10,000 trials is a probability experiment.

Definition: An event is a group of one or more possible outcomes of a trial. Usually those outcomes are related in some way, and the event is named to reflect that.

Example 2: If you draw a card from a deck without looking, there are 52 possible outcomes (assuming the jokers have been removed). “Ace” is an event, representing a group of four outcomes, and the probability of that event is 4/52 or 1/13. “Spade” is an event, representing a group of 13 outcomes, so its probability is 13/52 or 1/4. “Ace of spades” is both an outcome and an event, with a probability of 1/52.

Write probabilities as fractions, decimals, or percentages, like this:

P(event) = number

Example 3: On a coin flip, P(heads) = 0.5, read as the probability of heads is 0.5. “P(0.5)” is wrong. Don’t write P(number); always write P(event) = number.

All probabilities are between 0 and 1 inclusive. A probability of 0 means the event is impossible or cannot happen; a probability of 1 means the event is a certainty or will definitely happen. Probabilities between 0 and 1 are assigned to events that may or may not happen; the more likely the event, the higher its probability.

Definition: When an event is unlikely — when it has a low probability of occurring — you call it an unusual event. Unless otherwise stated, “unlikely” means that the probability is below 0.05.

This will be an important idea in inferential statistics.

### 5A2.  Where Do You Get Probabilities?

Pure thought is enough to give many probabilities: the probability of drawing a spade from a deck of cards, the probability of rolling doubles three times in a row at Monopoly, the probability of getting an all-white jury pool in a county with 26% black population. Any such probability is called a theoretical probability or classical probability.

Theoretical probabilities come ultimately from a sample space, usually with help from some of the laws for combining events. (I’ll tell you about both of these later in this chapter.)

Example 4: A standard die (used in Monopoly or Yahtzee) has six faces, all equally likely to come up. Therefore you know that the probability of rolling a two is 1/6.

On the other hand, some probabilities are impossible to compute that way, because there are too many variables or because you don’t know enough: the probability that weather conditions today will give rise to rain tomorrow, the probability that a given radium nucleus will decay within the next second, the probability that a given candidate will win the next election, the probability that a driver will have a car crash in the next year. To find the probability of an event like that, you do an experiment or rely on past experience, and so it is called an experimental probability or empirical probability.

Example 5: The CDC says that the incidence of TB in the US is 5 cases per 100,000 population. 5/100,000 = 0.005%. Therefore you can say that the probability a randomly selected person has TB is 0.005%.

These two terms describe where a probability came from, but there’s no other difference between experimental and theoretical probabilities. They both obey the same laws and have the same interpretations.

You probably don’t need formulas, but if you want them, here they are:

Theoretical or classical (all outcomes equally likely): P(success) = N(success) / N(possible outcomes)

Experimental or empirical: P(success) = N(success) / N(trials)
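As a minimal sketch, the two formulas can be compared in Python on a fair six-sided die. The function names are illustrative, not from any library. The theoretical value comes from counting faces; the empirical value comes from a simulated die built on Python’s `random` module with an arbitrary fixed seed, so its exact digits depend on that seed.

```python
import random
from fractions import Fraction


def theoretical_probability(n_success, n_possible):
    """Classical probability: successes / equally likely possible outcomes."""
    return Fraction(n_success, n_possible)


def empirical_probability(n_success, n_trials):
    """Empirical probability: times the event happened / number of trials."""
    return n_success / n_trials


# Theoretical: one face (a two) out of six equally likely faces.
p_two_theory = theoretical_probability(1, 6)

# Empirical: roll a simulated fair die many times and count the twos.
rng = random.Random(2012)  # arbitrary seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(60_000)]
p_two_empirical = empirical_probability(rolls.count(2), len(rolls))

# Empirical, from Example 1: 104 heart attacks among 10,000 doctors.
p_heart_attack = empirical_probability(104, 10_000)

print(p_two_theory, round(float(p_two_theory), 4))  # 1/6 0.1667
print(round(p_two_empirical, 4))  # close to 0.1667; exact digits depend on the seed
print(p_heart_attack)  # 0.0104
```

Both numbers obey the same laws; the only difference is where they came from.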

### 5A3.  Interpreting Probability Statements

Every probability statement has two interpretations, probability of one and proportion of all. You use the interpretation that seems most useful in a given situation.

Example 6: For doctors taking aspirin nightly, P(heart attack in six years) = 1.04%. The “probability of one” interpretation is that there’s a 1.04% chance any given doctor taking aspirin will have a heart attack. The “proportion of all” interpretation is that 1.04% of all doctors taking aspirin can be expected to have heart attacks.

Which interpretation is right? They’re both right, but in a given situation you should use the one that feels more natural.

### 5A4.  Law of Large Numbers

You know that P(boy) is about 50% for live births, but you’re not surprised to see families with two or three girls in a row. Probability is long-term relative frequency; it can’t predict what will happen in any particular case.

This is expressed in the law of large numbers: as you take more and more trials, the relative frequency tends toward the true probability.

The law of large numbers was stated in 1689 by Jacob Bernoulli.

Example 7: For just a few babies, say the four children in one family, it’s quite common to find a proportion of boys very different from 50%, say one in four (25%) or even zero in four. But consider a class of thirty statistics students. The proportion may still be different from 50%, but a very different proportion (more than 70%, say, or less than 30%) would be unusual. And when you look at all babies born in a large hospital in a year, experience tells you that the proportion will be very close to 50%. The more trials you take, the closer the relative frequency is to the true probability — usually.
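A small simulation shows the same pattern. This Python sketch assumes P(boy) is exactly 0.5 (a simplification of “about 50%”) and that births are independent. It uses Python’s `random` module with an arbitrary seed, so the particular proportions printed vary with the seed, but the spread around 0.5 shrinks as the number of births grows.

```python
import random


def proportion_of_boys(n_births, rng, p_boy=0.5):
    """Simulate n_births independent births; return the fraction that are boys."""
    boys = sum(1 for _ in range(n_births) if rng.random() < p_boy)
    return boys / n_births


rng = random.Random(7)  # arbitrary seed
# A family of four, a class of thirty, a large hospital's year, and far more births.
for n in (4, 30, 4_000, 400_000):
    print(f"{n:>7} births: proportion of boys = {proportion_of_boys(n, rng):.4f}")
```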

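| Flip | Result | Heads so far | Rel. freq. of heads |
|---|---|---|---|
| 1 | T | 0 | 0.0000 |
| 2 | H | 1 | 0.5000 |
| 3 | H | 2 | 0.6667 |
| 4 | H | 3 | 0.7500 |
| 5 | H | 4 | 0.8000 |
| 6 | T | 4 | 0.6667 |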

But the Law of Large Numbers says only that the relative frequency tends toward the true probability over many trials. Probability can’t predict what will happen in any given case. The idea that a particular outcome is “due” is just wrong, and it’s such a classic mistake that it has a name. The Gambler’s Fallacy is the idea that somehow events try to match probabilities.

Example 8: I’ve just flipped a coin a few times, and the results are shown in the table above. The first flip was a tail, and after that flip the relative frequency (rf) of heads is 0. The next flip is a head, and after two flips I’ve had one head out of two trials, so the rf is 0.5. The third flip is also a head, so now the rf is 2/3 or about 0.6667. At this point someone might say, “You’re due for a tail, to move the rf back toward the true probability of 0.5.” That’s the Gambler’s Fallacy.

The coin doesn’t know what it did before, and it doesn’t try to make things “right”. In my trials, the fourth flip moves the rf of heads further from 0.5, and the fifth flip moves it further still. True, the sixth flip moves the rf of heads closer to 0.5, but it could just as well have moved it further away, even if the coin is perfectly fair.

I stopped after six trials. I know that if I went on to do ten trials, or a hundred, or a thousand, over time the proportion of heads would almost always move closer to 0.5 — not necessarily on any particular flip, but in the long run.
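The relative-frequency column of the table can be recomputed from the six flips. This minimal Python sketch (the function name is illustrative) only keeps a running count of flips that already happened; like the coin itself, it has no way to predict the next flip.

```python
def running_relative_frequency(flips, target="H"):
    """For each flip, return (flip number, result, target count so far, relative frequency)."""
    rows = []
    count = 0
    for number, result in enumerate(flips, start=1):
        if result == target:
            count += 1
        rows.append((number, result, count, count / number))
    return rows


# The six flips from Example 8: T, H, H, H, H, T
for number, result, heads, rf in running_relative_frequency("THHHHT"):
    print(f"{number}  {result}  {heads}  {rf:.4f}")
```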

Subconsciously you expect random events not to show a pattern, but you may see patterns along the way. For example, if you flip a fair coin repeatedly, inevitably you will see a run of ten heads or ten tails — about twice in every thousand sequences of ten. If you flip the coin once every two seconds, you can expect to see a run of ten flips the same about once every 17 minutes, on average.
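The “about twice in every thousand sequences of ten” figure can be checked by listing every possible sequence. This Python sketch assumes a fair coin and independent flips; it enumerates all 2^10 = 1024 equally likely sequences with `itertools.product` and counts the two that are all heads or all tails.

```python
from fractions import Fraction
from itertools import product

# Every equally likely sequence of ten fair coin flips.
sequences = list(product("HT", repeat=10))
all_same = sum(1 for seq in sequences if len(set(seq)) == 1)  # all heads or all tails

p_run_of_ten = Fraction(all_same, len(sequences))
print(len(sequences), all_same)  # 1024 2
print(p_run_of_ten, float(p_run_of_ten) * 1000)  # 1/512 1.953125 (per thousand sequences)
```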

Here are two more examples of patterns cropping up in processes that are actually random:

Example 9: You have flipped a coin 999 times, and there were 499 heads and 500 tails. What’s the probability of a head on the next flip?

Solution: It is 50%, the same as on any other flip. The Law of Large Numbers tells you that over time you tend to get closer and closer to 50% heads, but it doesn’t tell you anything at all about any particular flip. If you think that the coin is somehow “due for a head”, you’ve fallen into the Gambler’s Fallacy.

### 5A5.  Sample Space

At bottom, probability is about counting. Empirical probability is the number of times something did happen, divided by the number of trials. Classical probability is similar, but it makes use of a list or table of all possible outcomes, called a sample space. Technically a sample space is just a list of all possible outcomes, but it’s only useful if you make it a list of all possible equally likely outcomes.

For repeated independent trials (flipping multiple coins, rolling multiple dice, making successive bets at roulette, and so on), the size of the sample space will be the number of outcomes in each trial, raised to the power of the number of trials. For example, if you want to compute probabilities for the number of girls in a family of four children, your sample space will have 2^4 = 16 entries.
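The “outcomes per trial raised to the number of trials” rule can be seen by generating the sample space directly. A minimal sketch using Python’s `itertools.product`, with B and G as illustrative labels for boy and girl:

```python
from itertools import product

# One birth has two outcomes; a family of four is four repeated trials.
families = list(product("BG", repeat=4))
print(len(families), 2 ** 4)  # 16 16
print(families[:3])  # the first few outcomes, listed in a systematic order

# The same rule for two dice: 6 outcomes per die, 2 dice.
two_dice = list(product(range(1, 7), repeat=2))
print(len(two_dice), 6 ** 2)  # 36 36
```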

Example 10: If you roll two dice, what’s the probability you’ll roll a seven? You could list the sample space as

S = { 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }

but the outcomes are not equally likely. There’s only one way to get a twelve, for instance (double sixes), but there are several ways to get a seven (1–6, 2–5, and so on). So it’s much more useful to list your sample space with equally likely outcomes.

When constructing a sample space, be systematic so that you don’t leave any out or list any twice. Here, you’re rolling two dice, and each die has six equally likely results, so you have 6×6 = 36 equally likely outcomes in your sample space. How can you be systematic? List the outcomes in some regular order, like the picture below. Each row lists all the possibilities with the same outcome for the first die; each column lists all the possibilities with the same outcome for the second die.

[Figure: the 36 outcomes of rolling two dice, one row per value of the first die. Image courtesy of Bob Yavits, Tompkins Cortland Community College.]

Once you have a sample space of equally likely outcomes, finding the probability is simple. There are six ways to roll a seven: 6-1, 5-2, 4-3, 3-4, 2-5, 1-6. There are 36 possible outcomes, all equally likely. Therefore the probability of rolling a seven is 6/36 or 1/6 or about 0.1667. In symbols, P(7) = 6/36 or P(7) = 1/6.

Presenting numbers: There’s no need to reduce fractions to lowest terms. If a decimal is not exactly equal to a fraction, it’s probably better to keep the fraction. But if the fraction is complex or you’re comparing fractions, round to four decimal places and use the “approximately equal” sign, like this:

P(7) ≈ 0.1667

Caution: Round your final answer only. Never use a rounded number in further calculations; that’s the Big No-no. Fortunately, your calculator makes it easy to chain calculations so that you can see rounded numbers but it still uses the unrounded numbers for further calculations.

Example 11: Find the probability of rolling craps (two, three, or twelve).

Solution: There’s one way to roll a two, two ways to roll a three, and one way to roll a twelve. P(craps) = (1+2+1)/36 = 4/36 or 1/9.
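Examples 10 and 11 can be checked by building the 36-outcome sample space and counting. A minimal Python sketch; `prob` is an illustrative helper, not a library function. Python’s `Fraction` always reduces, so 6/36 prints as 1/6 and 4/36 as 1/9.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
SPACE = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: outcomes where event(outcome) is true / all outcomes."""
    favorable = sum(1 for outcome in SPACE if event(outcome))
    return Fraction(favorable, len(SPACE))


p_seven = prob(lambda roll: sum(roll) == 7)
p_craps = prob(lambda roll: sum(roll) in (2, 3, 12))

print(p_seven, round(float(p_seven), 4))  # 1/6 0.1667
print(p_craps)  # 1/9
```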

### 5A6.  Probability Models

Often, it’s not practical to construct a sample space and compute probabilities from it. Instead, you construct a probability model. Probability models are yet another kind of mathematical model as introduced in Chapter 4.

Definition: A probability model is a table showing all possible outcomes and their probabilities. Every probability must be 0 to 1 inclusive, and the total of the probabilities must be 1 or 100%.

A probability model can be theoretical or empirical.

Number of Heads on Two Coin Flips

| x | P(x) |
|---|---|
| 0 | 1/4 |
| 1 | 2/4 |
| 2 | 1/4 |
| Total | 4/4 = 1 |

Example 12: Construct a probability model for the number of heads that appear when you flip two coins.

Solution: Start by constructing the sample space. Remember that you need equally likely events if you are going to find probabilities from the sample space. The first coin can be heads or tails, and whatever the first coin is, the second coin can also be heads or tails. So the sample space has 2×2 = 4 outcomes:

S = { HH, HT, TH, TT }

There are four equally likely outcomes, so the denominator (bottom number) on all the probabilities will be 4. The possible outcomes are no heads (one way), one head (two ways), and two heads (one way). The probability model is shown in the table above. Often a total row is included, as I did, to show that the probabilities add up to 1.
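The same model can be built mechanically from the sample space: list the equally likely outcomes, count the heads in each, and divide each count by the number of outcomes. A minimal Python sketch (note that `Fraction` prints 2/4 in lowest terms, as 1/2):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two coins: HH, HT, TH, TT (all equally likely).
space = ["".join(flips) for flips in product("HT", repeat=2)]

# x = number of heads; count how many outcomes give each x.
ways = Counter(outcome.count("H") for outcome in space)
model = {x: Fraction(ways[x], len(space)) for x in sorted(ways)}

for x, p in model.items():
    print(x, p)
print("Total", sum(model.values()))  # the probabilities must add to 1
```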

That was an easy example, so easy that you could just as well work from the sample space. But think about more complex situations, especially with empirical (experimental) probabilities. Constructing a sample space may be impractical, but a probability model is relatively easy to create.

Example 13: (adapted from Sullivan 2011, page 235 problem 40): The CDC asked college students how often they wore a seat belt when driving. 118 answered never, 249 rarely, 345 sometimes, 716 most of the time, 3093 always. Construct a probability model for seat-belt use by college students when driving.

Seat-Belt Use by College Students Driving (sample size: 4521)

| Response | Probability |
|---|---|
| Never | 2.61 % |
| Rarely | 5.51 % |
| Sometimes | 7.63 % |
| Most of the time | 15.84 % |
| Always | 68.41 % |
| Total | 100.00 % |

Solution: Probability of one is proportion of all, so to get the probabilities you simply calculate the proportions. Sample size was (118+249+345+716+3093) = 4521. The proportions or probabilities are then simply 118/4521, 249/4521, and so on. The probability model is shown in the table above.
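A minimal Python sketch of that calculation, using the counts given in the example; the percentages match the table once rounded to two decimal places.

```python
# Survey counts from Example 13
counts = {
    "Never": 118,
    "Rarely": 249,
    "Sometimes": 345,
    "Most of the time": 716,
    "Always": 3093,
}

n = sum(counts.values())  # sample size
# Probability of one = proportion of all: each count divided by the sample size.
model = {answer: count / n for answer, count in counts.items()}

print("Sample size:", n)  # 4521
for answer, p in model.items():
    print(f"{answer:<18}{p:8.2%}")
print(f"{'Total':<18}{sum(model.values()):8.2%}")
```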

Comments: Don’t push this model too far. In this sample, 68.4% of college students reported that they always use a seat belt when driving. There’s no uncertainty about that statement; it’s a completely accurate statistic (summary number for a sample). But can you go further and say that 68.4% of college students always wear a seat belt when driving? No, for two reasons.

First, this is a sample. Even if it’s a perfect random sample, it’s still not the population. There’s always sample variability. A different sample of college students would most likely give different answers (probably not very different, since this was a large sample, but almost certainly not identical). Second, and more serious, this survey depended on self-reporting: students weren’t observed; they were just asked. When people report their behavior, they tend to shade their responses in the direction of what’s socially approved or what they would like to think about themselves (response bias). How many of those “always” responses should have been “most of the time” or “sometimes”? You have no way to know.

## 5B.  Combining Probabilities

You can find probabilities of simple events by making sample spaces and counting. But life isn’t usually that simple. To find probabilities of more interesting (and complex) events, you need to use rules for combining probabilities.

The rules are the same whether your original probabilities are theoretical or experimental.

### 5B1.  Probability “or” for Disjoint Events

Definition: When two events can’t both happen on the same trial, they are called mutually exclusive events or disjoint events.

Example 14: You select a student and ask where she was born. “Born in Cortland” and “born in Ithaca” are mutually exclusive events because they can’t both be true for the same person.

Comment: Obviously it’s possible that neither is true. Disjoint events could both be false, or one might be true, but they can’t both be true in the same trial.

Example 15: You select a student and ask his major. “Major in physics” and “major in music” are non-disjoint events because they could be true of the same student. (It doesn’t matter whether they are both true of the student you asked. They are non-disjoint because they could both be true of the same student — think about double majors.)

Rule: For disjoint events, P(A or B) = P(A)+P(B)

Example 16: You draw a card from a standard 52-card deck. What’s P(ace or face card)? (A face card is a king, queen, or jack.)

Solution: Are the events “ace” and “face card” disjoint? Yes, because a given card can’t be both an ace and a face card. Therefore you can use the rule:

P(ace or face card) = P(ace) + P(face card)

But what are P(ace) and P(face card)? A picture may help.

[Figure: a standard deck of 52 cards. Used by permission; source: http://www.jfitz.com/cards/, accessed 2012-09-26.]

Now you can see that the deck of 52 cards has four aces and twelve face cards. Therefore

P(ace) = 4/52 and P(face card) = 12/52

Since the events are disjoint,

P(ace or face card) = P(ace) + P(face card)

P(ace or face card) = 4/52 + 12/52 = 16/52
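The same answer falls out of a deck modeled in code. This Python sketch represents each card as a (rank, suit) pair; `DECK`, `prob`, `is_ace`, and `is_face` are illustrative names. It first confirms that no card is both an ace and a face card, then compares the simple addition rule with direct counting.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) cards, no jokers


def prob(event):
    """Fraction of the 52 cards for which event(card) is true."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_ace(card):
    return card[0] == "A"


def is_face(card):
    return card[0] in ("J", "Q", "K")


# The events are disjoint: no card is both an ace and a face card.
print(prob(lambda c: is_ace(c) and is_face(c)))  # 0

by_rule = prob(is_ace) + prob(is_face)  # simple addition rule
by_counting = prob(lambda c: is_ace(c) or is_face(c))  # count the cards directly
print(by_rule, by_counting)  # 4/13 4/13, i.e. 16/52 in lowest terms
```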

Reminder: When you need to compute probability of A or B, always ask yourself first, are the events disjoint? Use the simple addition rule only if the events are disjoint. If events are non-disjoint — if it’s possible for both to happen on the same trial — you have to use the general rule, below.

US Marital Status in 2006 (in Millions)

| | Men | Women | Totals |
|---|---|---|---|
| Married | 63.6 | 64.1 | 127.7 |
| Widowed | 2.6 | 11.3 | 13.9 |
| Divorced | 9.7 | 13.1 | 22.8 |
| Never married | 30.3 | 25.0 | 55.3 |
| Totals | 106.2 | 113.5 | 219.7 |

Take a look at this table of marital status in 2006, from the US Census Bureau. It’s known as a contingency table or two-way table, because it classifies each member of the sample or population by two variables — in this case, sex and marital status.

Example 17: What’s the probability that a randomly selected person is widowed or divorced?

Solution: Are those events disjoint? Yes, because a given person can’t be listed in both rows of the table. (You might argue that a given person can be both widowed and divorced in his or her lifetime, and that’s true. But the table shows marital status at the time the survey was made, not over each person’s lifetime. The “Widowed” row counts those whose most recent marriage ended with the death of their spouse.) Therefore

P(widowed or divorced) = P(widowed) + P(divorced)

How do you find those probabilities? Remember that probability of one = proportion of all. Find the proportions, and you have the probabilities.

P(widowed or divorced) = 13.9/219.7 + 22.8/219.7

P(widowed or divorced) = 36.7/219.7 ≈ 0.1670

Example 18: Find the probability that a randomly selected man is widowed or divorced.

Solution: Disjoint events? Yes, a given man can’t be in both rows of the table. Again, the probabilities are the proportions, but now you’re looking only at the men:

P(widowed or divorced) = P(widowed) + P(divorced)

P(widowed or divorced) = 2.6/106.2 + 9.7/106.2

P(widowed or divorced) = 12.3/106.2 ≈ 0.1158
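Examples 17 and 18 differ only in the denominator. This Python sketch copies the marital-status table into a dictionary (`TABLE` and `count` are illustrative names) and adds the proportions for the two disjoint rows, first over everyone and then over men only.

```python
# US marital status in 2006, in millions, copied from the table above
TABLE = {
    "Married": {"Men": 63.6, "Women": 64.1},
    "Widowed": {"Men": 2.6, "Women": 11.3},
    "Divorced": {"Men": 9.7, "Women": 13.1},
    "Never married": {"Men": 30.3, "Women": 25.0},
}


def count(rows=None, cols=("Men", "Women")):
    """Total (in millions) of the cells in the given rows (default: all rows) and columns."""
    if rows is None:
        rows = TABLE.keys()
    return sum(TABLE[r][c] for r in rows for c in cols)


# Example 17: among all people; the rows are disjoint, so the probabilities add.
everyone = count()
p17 = count(["Widowed"]) / everyone + count(["Divorced"]) / everyone

# Example 18: only men, so the denominator is the total number of men.
men_only = ("Men",)
all_men = count(cols=men_only)
p18 = count(["Widowed"], men_only) / all_men + count(["Divorced"], men_only) / all_men

print(round(p17, 4), round(p18, 4))  # about 0.1670 and 0.1158
```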

Now let’s look at a couple of examples of probability “or” for non-disjoint events.

Example 19: Find P(seven or club).

Solution: Are the events “seven” and “club” disjoint? No, because a given card can be both a seven and a club. You can’t use the simple addition rule.

The next section shows you a formula, but in math there’s usually more than one way to approach a problem. Here you can look back at the picture of the deck (reprinted on the last page) and count from the sample space. There are thirteen clubs, plus the sevens of spades, hearts, and diamonds, for a total of 16. (You don’t count the seven of clubs when counting sevens, because you already counted it when counting clubs.) Therefore P(seven or club) = 16/52.

Example 20: Find P(woman or divorced).

Solution: Disjoint events? No, a given person can be both. So what do you do? The same thing as in the preceding example: you count up all the women, and all the divorced people who aren’t women, and divide by the number of people:

P(woman or divorced) = 113.5/219.7 + 9.7/219.7 = 123.2/219.7 ≈ 0.5608

### 5B2.  Probability “or” for All Events


Look back at P(seven or club). Those are not disjoint events, so you can’t just add P(seven) and P(club). But what did you do, when counting? You counted the clubs, then you counted the sevens that aren’t clubs. In other words, just adding P(seven) and P(club) would be wrong because that would double count the overlap.

With 52 cards, it’s easy enough just to count. But that’s not practical in every problem, so there’s a rule: go ahead and double count by adding the probabilities, then fix it by subtracting the part you double counted.

Rule: P(A or B) = P(A) + P(B) − P(A and B)

This general addition rule works for all events, disjoint or non-disjoint. (If two events are disjoint, they can’t happen at the same time, P(A and B) is 0, and the general rule becomes the same as the simple rule.)

Let’s redo the last two examples with this new general rule, to see that it gives the same answers.

Example 19 again: Find P(seven or club).

P(seven or club) = P(seven) + P(club) − P(seven and club)

Caution: P(seven and club) doesn’t mean “all the sevens and all the clubs”. It means the probability that one card will be both a seven and a club — in other words, it means the seven of clubs.

P(seven or club) = 4/52 + 13/52 − 1/52

P(seven or club) = 16/52
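With the deck modeled as a Python set of (rank, suit) pairs, “and” is set intersection and “or” is set union, so the general addition rule can be checked against direct counting. A minimal sketch; `DECK` and `p` are illustrative names.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = set(product(RANKS, SUITS))  # 52 (rank, suit) cards

sevens = {card for card in DECK if card[0] == "7"}
clubs = {card for card in DECK if card[1] == "clubs"}


def p(cards):
    """Probability of drawing one of the given cards from the full deck."""
    return Fraction(len(cards), len(DECK))


naive = p(sevens) + p(clubs)  # wrong here: the seven of clubs is counted twice
general = p(sevens) + p(clubs) - p(sevens & clubs)  # general addition rule
direct = p(sevens | clubs)  # count the cards that are a seven or a club

print(sevens & clubs)  # {('7', 'clubs')}
print(naive, general, direct)  # 17/52 4/13 4/13
```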


Example 20 again: Using the table of marital status (reprinted on the last page), find P(woman or divorced).

Solution:

P(woman or divorced) = P(woman) + P(divorced) − P(woman and divorced)

P(woman or divorced) = 113.5/219.7 + 22.8/219.7 − 13.1/219.7

P(woman or divorced) = 123.2/219.7 ≈ 0.5608
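A minimal Python sketch comparing the general addition rule with the direct count (all women plus divorced men), using the cells of the marital-status table; `p_or` is an illustrative helper.

```python
# Cells of the marital-status table needed for Example 20 (millions)
everyone = 219.7
women = 113.5
divorced = 22.8
divorced_women = 13.1
divorced_men = 9.7


def p_or(p_a, p_b, p_a_and_b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


by_rule = p_or(women / everyone, divorced / everyone, divorced_women / everyone)
by_counting = (women + divorced_men) / everyone  # all women plus divorced men

print(round(by_rule, 4), round(by_counting, 4))  # 0.5608 0.5608
```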

### 5B3.  Probability “not” — Complements

About two thirds of students who register for a math class complete it successfully. What’s the probability that a randomly selected student who registers for a math class will not complete it successfully? Of course you already know it’s 1−(2/3) = 1/3. Let’s formalize this.

Definitions: Two events are complementary if they can’t both occur but one of them must occur. If A is an event from a given sample space, then the complement of A, written A^C or not A, is the rest of that sample space.

Describing a complement usually involves using the word “not”. Complementary events (can’t both happen, but one must happen) are a subcategory of disjoint events (can’t both happen).

Example 21: The complement of the event “the student completes the course successfully” is the event “the student does not complete the course successfully.” Obviously the complement need not be a simple event. The complement of “the student completes the course successfully” is “the student never shows up, or attends initially but stops attending, or withdraws, or earns an F, or takes an incomplete but never finishes”, or probably other outcomes I haven’t thought of.

Rule: P(A^C) = 1 − P(A)

This comes directly from the definition and the rule for “or”. A and A^C can’t both happen, so they’re disjoint and P(A or A^C) = P(A)+P(A^C). But one or the other must happen, so P(A or A^C) = 1. Therefore P(A)+P(A^C) = 1, and P(A^C) = 1−P(A).

Example 22: In rolling two dice, “doubles” and “not doubles” are complementary events because they can’t both happen on the same roll, but one of them must happen. “Boxcars” (double sixes) and “snake eyes” (double ones) can’t both happen, so they’re disjoint; but they are not complementary because other outcomes are possible.

The complement rule is useful on its own, but it really shines as a labor-saving device. Very often when a probability problem looks like a lot of tedious computation, the complement is your friend. This really sticks out with “at least” problems (later), but here are a few simpler examples.

Colors of Plain M&Ms

| Color | Percent |
|---|---|
| Blue | 24 % |
| Orange | 20 % |
| Green | 16 % |
| Yellow | 14 % |
| Brown | 13 % |
| Red | 13 % |

Example 23: The color distribution for plain M&Ms is shown in the table above. What’s the probability that a randomly selected plain M&M is any color but yellow?

Solution: You could add the probabilities of the five other colors, but of course it’s easier to say

P(Yellow^C) = 1 − P(Yellow)

P(Yellow^C) = 100% − 14% = 86%


Example 24: Referring again to the table of marital status (reprinted on the last page), what’s the probability that a randomly selected person is not currently married?

Solution: Since the four marital statuses are disjoint, you could add the probabilities for widowed, divorced, and never married. But it’s easier to take the complement of “married”:

P(not currently married) = P(married^C)

P(not currently married) = 1 − P(married)

P(not currently married) = 1 − 127.7/219.7

P(not currently married) ≈ 0.4188
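A minimal Python sketch of Examples 23 and 24, showing that the long way and the complement agree. The M&M percentages are kept as whole numbers, and `Fraction` is used for the marital-status figures, so the arithmetic stays exact until the final rounding.

```python
from fractions import Fraction

# Example 23: plain M&M colors, in percent (from the table above)
colors = {"Blue": 24, "Orange": 20, "Green": 16, "Yellow": 14, "Brown": 13, "Red": 13}

long_way = sum(pct for color, pct in colors.items() if color != "Yellow")  # add the other five
complement = 100 - colors["Yellow"]  # P(Yellow^C) = 1 - P(Yellow), in percent
print(long_way, complement)  # 86 86

# Example 24: P(married^C) = 1 - P(married)
p_not_married = 1 - Fraction("127.7") / Fraction("219.7")
print(p_not_married, round(float(p_not_married), 4))  # 920/2197 0.4188
```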

### 5B4.  Probability “and” for Independent Events

Definition: Two events are called independent events if the occurrence of one doesn’t change the probability of the other.

Example 25: When you play poker, being dealt a pair in this hand and a pair in the next are independent events because the deck is shuffled between hands. But in casino blackjack, according to Scarne on Cards (Scarne 1965, 144 [see “Sources Used” at end of book]), four decks are used and they aren’t necessarily shuffled between hands. Therefore, getting a natural (ace plus a ten or face card) in this hand and a natural in the next are not independent events, because the cards already dealt change the mix of remaining cards and therefore change the probabilities.

That’s also an example of sampling with replacement (poker) and sampling without replacement (casino blackjack).

Samples drawn with replacement are independent because the sample space is reset to its initial condition between draws. Samples drawn without replacement are usually dependent because what you draw out changes the mix of what is left. However, if you’re drawing from a very large group, the change to the proportions in the mix is very small, so you can treat small samples from a very large group as independent.

Independent events are not disjoint, and disjoint events are not independent (as long as each event has a nonzero probability). If two events A and B are disjoint, then if A happens B can’t happen, so its probability becomes zero. One of two disjoint events happening changes the probability of the other, so they can’t be independent.

Rule: For independent events, P(A and B) = P(A) × P(B)

Example 26: In Monopoly, you get an extra roll if you roll doubles, but if you roll doubles three times in a row you have to go to jail. What’s the probability you’ll have to go to jail on any given turn?

Solution: Refer to the picture of the dice (reprinted on the last page). There are six ways out of 36 to get doubles, so P(doubles) = 6/36 or 1/6. Each roll is independent, so the probability of doubles three times in a row is (1/6)×(1/6)×(1/6) or (1/6)^3 = 1/216, about 0.0046. If you play a lot of Monopoly, you’ll go to jail, because of doubles, between four and five times per thousand turns.

Example 27: The first traffic light on your morning commute is red 40% of the time, yellow 5%, and green 55%. What’s the probability you’ll hit a green all five mornings in any given week?

Solution: Are the five days independent? Yes, because where you hit that light in its cycle on one morning doesn’t influence where you hit it on the next day. The probability of green is 55% each day regardless of what happens on any other day. Therefore, the probability of five greens on five successive mornings is 55%×55%×55%×55%×55% or (0.55)^5 ≈ 0.0503. About one week in twenty, that light should be green for you all five mornings.
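A minimal Python sketch of the multiplication rule, applied to Examples 26 and 27. `p_all` is an illustrative helper built on the standard library’s `math.prod` (Python 3.8 or later); it gives the right answer only when the events really are independent, as they are here.

```python
from fractions import Fraction
from itertools import product
from math import prod


def p_all(probabilities):
    """Multiplication rule: P(A and B and ...) for independent events."""
    return prod(probabilities)


# Example 26: doubles on two dice, then three independent rolls in a row
rolls = list(product(range(1, 7), repeat=2))
p_doubles = Fraction(sum(1 for a, b in rolls if a == b), len(rolls))  # 6/36
p_jail = p_all([p_doubles] * 3)
print(p_doubles, p_jail, round(float(p_jail), 4))  # 1/6 1/216 0.0046

# Example 27: green (0.55) on each of five independent mornings
p_five_greens = p_all([0.55] * 5)
print(round(p_five_greens, 4))  # 0.0503
```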


Example 28: Refer again to the table of marital status (reprinted on the last page). What’s the probability that a randomly selected person is female and widowed?

Solution: In a two-way table, for probability “and”, you don’t worry about formulas or independence because everything is already laid out for you. 11.3 million persons are female and widowed, out of 219.7 million. Therefore:

P(female and widowed) = 11.3/219.7 ≈ 0.0514.

Example 29: Earlier in this section, I said that samples drawn without replacement are usually dependent, but you can treat them as independent when drawing a small sample from a very large group. Here’s an example. If you randomly select three women, what’s the probability that all three are widowed?

Solution: The probability that any one woman is widowed is the number of widowed women divided by the number of women, 11.3/113.5. (Don’t reuse 11.3/219.7 from the preceding example: that is the probability that a randomly selected person is both female and widowed.) Because three women is a small sample against the millions of women in the census, and the sample is random, you can treat them as independent. If you randomly select one woman out of millions, the mix of marital status in the remaining women is so nearly unchanged that you can ignore the difference. Therefore, the probability that all three women are widowed is

(11.3/113.5) × (11.3/113.5) × (11.3/113.5) = (11.3/113.5)³ ≈ 0.0010.
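The difference between “a person who is female and widowed” and “a woman who is widowed” is just a change of denominator, and the independence shortcut can be compared with exact draws without replacement. In this Python sketch the probability for one woman is widowed women divided by all women (11.3/113.5). The exact calculation treats the rounded census totals (113.5 million women, 11.3 million widowed) as exact head counts, which is an assumption made only for illustration.

```python
# Counts in millions from the marital-status table
everyone = 219.7
women = 113.5
widowed_women = 11.3

# Example 28: a randomly selected PERSON is female and widowed
p_female_and_widowed = widowed_women / everyone
# One randomly selected WOMAN is widowed: widowed women / all women
p_widowed_woman = widowed_women / women

# Example 29: three women, treated as independent picks
p_three_independent = p_widowed_woman ** 3

# Exact draws without replacement, treating the rounded totals as exact head counts
N, W = 113_500_000, 11_300_000
p_three_exact = (W / N) * ((W - 1) / (N - 1)) * ((W - 2) / (N - 2))

print(round(p_female_and_widowed, 4))  # 0.0514
print(round(p_three_independent, 4))  # 0.001
print(abs(p_three_exact - p_three_independent) < 1e-9)  # True
```

The last line shows why treating a small sample from a huge group as independent is safe: the exact and approximate answers agree to about ten decimal places.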

### 5B5.  Probability “at least” for Independent Events

There’s no special rule for “at least”, but textbook writers (and quiz writers) love this type of problem, so it’s worth looking at. “At least” problems usually want you to combine several of the probability rules.

Example 30: Think back to that traffic light that’s green 55% of the time, yellow 5%, red 40%. What’s the probability that you’ll catch it red at least one morning in a five-day week?

Solution: You could find the probability of catching it red one morning (five separate probabilities for five separate mornings), or two mornings (ten different ways to hit two mornings out of five), or three, four, or five mornings. This would be incredibly laborious. Remember that the complement is your friend. What’s the complement of “at least one morning”? It’s “no mornings”. So you can find the probability of getting a red on no mornings, subtract that from 1, and have the desired probability of hitting red on at least one morning.

P(at least one red in five) = 1 − P(no red in five)

But the status of the light on each morning is independent of all the others, so

P(no red in five) = [ P(no red on one) ]^5

What’s the probability of no red on any one morning? It’s 1 minus the probability of red on any one morning:

P(no red on one) = 1 − P(red on one) = 1−0.4

Now put all the pieces together:

P(no red on one) = 1 − P(red on one) = 1−0.4

P(no red in five) = [ P(no red on one) ]^5 = (1−0.4)^5

P(at least one red in five) = 1 − P(no red in five) = 1 − (1−0.4)^5 ≈ 0.9222

About 92% of weeks, you hit red at least one morning in the week.
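The complement shortcut can be checked by brute force: list all 2^5 = 32 red/not-red patterns for the week, give each pattern its probability by the multiplication rule, and add up the patterns that contain at least one red. A minimal Python sketch, with R for red and N for not red (green or yellow):

```python
from itertools import product
from math import prod

P_ONE_MORNING = {"R": 0.4, "N": 0.6}  # red, or not red (green or yellow)

# Shortcut: complement of "at least one red" is "no red on any of the five mornings".
p_no_red_in_five = (1 - P_ONE_MORNING["R"]) ** 5
p_at_least_one_red = 1 - p_no_red_in_five

# Brute force: add up every five-morning pattern that contains a red.
# Mornings are independent, so each pattern's probability is a product.
brute_force = sum(
    prod(P_ONE_MORNING[m] for m in week)
    for week in product("RN", repeat=5)
    if "R" in week
)

print(round(p_at_least_one_red, 4), round(brute_force, 4))  # 0.9222 0.9222
```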

Be careful with your logic! You really do need to work things through step by step, and write down your steps. Some students just seem to subtract things from 1, and multiply other things, and hope for the best. That’s not a very productive approach.

One thing that can help you with these “at least” and “at most” problems is to write down all the possibilities and then cross out the ones that don’t apply, or underline the ones that do apply. For “at least one red in five”, you can write 0 1 2 3 4 5 and cross out the 0, or separate the cases with a bar: 0 | 1 2 3 4 5. Either way, with this enumeration technique, taught to me by Benjamin Kirk, you can see that the complement of “at least one” is “none”.

A common mistake is computing 1−0.4⁵ for P(none), instead of the correct (1−0.4)⁵. “None are red” means “all are not-red”: every one of the five is something other than red. Remember that “all are not” is different from “not all are”. In ordinary English, people often say “All my friends can’t go to the concert” when they really mean “Some of my friends can go, but not all of them can go.” In math you have to be careful about the distinction. Here’s an example.

Example 31: For the same situation, what’s the probability that you’ll hit a red light no more than four mornings in a five-day week? (This could also be asked as “at most four mornings” or “four mornings at most”.)

Solution: Try enumerating. “At most four out of five” looks like this: 0 1 2 3 4 5 with the 5 crossed out, or 0 1 2 3 4 | 5. The previous example was a “none are” or “all are not”, but this one is a “not all are”.

P(≤ 4 out of 5) = 1 − P(5 out of 5)

P(5 out of 5) = 0.4⁵

P(≤ 4 out of 5) = 1 − 0.4⁵ ≈ 0.9898

About 99% of weeks, you hit the red light no more than four mornings of the week.
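If you have Python handy, you can check both complement shortcuts by doing it the laborious way. This minimal sketch assumes, as Examples 30 and 31 do, that each morning is independent with P(red) = 0.4. It lists all 2⁵ = 32 red/not-red weeks, adds up the probabilities of the weeks that qualify, and compares the totals with 1 − (1−0.4)⁵ and 1 − 0.4⁵. The helper name prob_of_red_count is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

P_RED = Fraction(2, 5)   # 40% red on any one morning (Example 30)
DAYS = 5                 # five-day week

def prob_of_red_count(condition):
    """Add up P(week) over every red/not-red week whose number of reds satisfies condition."""
    total = Fraction(0)
    for week in product([True, False], repeat=DAYS):   # True = red that morning
        p_week = Fraction(1)
        for red in week:
            p_week *= P_RED if red else 1 - P_RED       # mornings are independent
        if condition(sum(week)):                        # sum counts the red mornings
            total += p_week
    return total

at_least_one = prob_of_red_count(lambda reds: reds >= 1)   # Example 30
at_most_four = prob_of_red_count(lambda reds: reds <= 4)   # Example 31

# Brute force agrees with the complement shortcuts.
assert at_least_one == 1 - (1 - P_RED) ** DAYS
assert at_most_four == 1 - P_RED ** DAYS
print(float(at_least_one), float(at_most_four))   # 0.92224 0.98976
```

Fractions keep the arithmetic exact, so comparing with == is safe here.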

Example 32: You’re throwing a barbecue, and you want to start the grill at 2 PM. Fred and Joe live on opposite sides of town, and they’ve both agreed to bring the charcoal. The problem is that they’re both slackers. Fred is late 40% of the time, and Joe is late 30% of the time. What’s the probability you’ll start the grill by 2 PM?

Solution: This is another “at least” problem for independent events, though this time the independent events don’t have the same probability. To have charcoal by 2 PM, at least one of them has to show up by then. What’s the probability that at least one will be on time? Again, you could compute the probability that they’re both on time, that Fred’s on time but Joe’s late, and that Fred’s late and Joe’s on time — all of those together will be the probability of charcoal on time. But again, the complement is your friend. The complement of “charcoal on time” is “charcoal late”, which happens only if they’re both late.

P(charcoal on time) = 1 − P(charcoal late)

P(charcoal on time) = 1 − P(Fred late and Joe late)

(Fred and Joe live on opposite sides of town, so whether one is late has no connection with whether the other one is late. The events are independent.)

P(charcoal on time) = 1 − P(Fred late) × P(Joe late)

P(charcoal on time) = 1 − 0.4×0.3 = 0.88

You’ve got an 88% chance of starting the grill on schedule.

Example 33: The space shuttle Challenger exploded shortly after launch in 1986, when one of six gaskets failed. After the fact, engineers realized that they should have known the design was too risky, but they didn’t think past “each gasket is 97% reliable.” The trouble was that if any gasket failed, the shuttle would explode. If you were asked to evaluate the design while the plans were still on the drawing board, what would you conclude? (Note: The design makes the six gaskets independent.)

Solution: The shuttle will explode if one or more gaskets fail. Here’s another “at least” problem, so enumerate the case you’re interested in: 0 | 1 2 3 4 5 6.

P(explosion) = P(at least one gasket fails)

The complement of “at least one gasket fails” (hard to compute) is “no gaskets fail” (much easier). What does it mean for no gaskets to fail? All gaskets must hold. Since the gaskets are independent, that’s easy to compute:

P(all six gaskets hold) = 0.97⁶

The answer you want is the complement of the all-hold or zero-fail case:

P(at least one gasket fails) = 1 − P(all six hold) = 1 − 0.97⁶

P(explosion) = P(at least one gasket fails) = 1 − 0.97⁶ ≈ 0.1670

Conclusion: There’s about a 17% chance that the shuttle will explode, just considering the gaskets and ignoring all other possible causes of trouble. This is about the same as the odds of shooting yourself in Russian roulette.
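Examples 32 and 33 follow the same pattern: for independent events, P(at least one) = 1 − P(none). Here is a short Python sketch of that pattern. p_at_least_one is a hypothetical helper, and it assumes the events are independent, as the text does for Fred and Joe and for the six gaskets.

```python
from fractions import Fraction
from math import prod

def p_at_least_one(event_probs):
    """P(at least one event happens) for independent events, via the complement:
    1 - P(none happen) = 1 - product of (1 - p) over all the events."""
    return 1 - prod(1 - p for p in event_probs)

# Example 32: Fred is on time 60% of the time, Joe 70%.
charcoal_on_time = p_at_least_one([Fraction(6, 10), Fraction(7, 10)])

# Example 33: each of six independent gaskets fails with probability 0.03.
explosion = p_at_least_one([Fraction(3, 100)] * 6)

print(f"{float(charcoal_on_time):.2f}")   # 0.88
print(f"{float(explosion):.4f}")          # 0.1670
```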

### 5B6.  Conditional Probability

In 2012, the Honda Accord was the most frequently stolen vehicle in the US (Siu 2013 [see “Sources Used” at end of book]). Does that mean that your Honda Accord is more likely to be stolen than another model?

You’re tested for a rare strain of flu, and the result is positive. Your doctor tells you the test is 99% accurate. Does that mean that there’s a 99% chance you have that strain of flu?

In New York City, a rape victim identifies physical characteristics that match only 0.0001% of people. Police find someone with those characteristics and arrest him. Is there only a 0.0001% chance that he’s innocent?

These are examples of conditional probability — the probability of one event under the condition that another event happened. It’s probably the most misunderstood probability topic, but I’m going to demystify it for you.

The definition may seem hard at first. But after you work through the examples you’ll find it makes sense.

Definition: The conditional probability of B given A, written P(B | A), is the probability of B under the condition that A occurs. Read B | A as “B given A” or “if A then B”.

That’s the “probability of one” interpretation. You might find the “proportion of all” interpretation easier: P(B | A) is the proportion of A’s that are also B.

Either way, the order matters — P(B | A) and P(A | B) mean different things and they’re different numbers.

Example 34: P(truck | Ford) is the probability that a vehicle is a truck if it’s a Ford, or the probability that a Ford is a truck, or the proportion of trucks among Fords. P(Ford | truck) is the probability that a vehicle is a Ford if it’s a truck, or the probability that a truck is a Ford, or the proportion of Fords among trucks.

Example 35: Let’s look first at the suspected rapist. The prosecutor presents evidence that these physical characteristics are found in only 0.0001% of people. The prosecutor therefore claims that there’s only a 0.0001% chance the suspect is innocent.

But the defense points out that there are over 8 million people in New York City. 0.0001% × 8,000,000 = 8, so the suspect is not a unique individual at all, but one of about eight people who match the eyewitness accounts. Seven of them are innocent. If there’s no evidence beyond the physical match to tie him to the crime, the probability that this defendant is innocent isn’t 0.0001%, it’s 7/8 or 87.5%. (And that’s just in the city. If you consider the metro area, or the US, or the world, there are even more people who match, so any one of them is even more likely to be innocent.)

The prosecutor’s fallacy is the false idea that the probability of a random match equals the probability of innocence. You can also describe this fallacy as “consider[ing] the unlikelihood of an event, while neglecting to consider the number of opportunities for that event to occur”, in the words of “The Prosecutor’s Fallacy” on the Poker Sleuth site (Stutzbach 2011 [see “Sources Used” at end of book]).

It’s an easy mistake to make if you just think about low probabilities. To not make this error, think in whole numbers, as the defense did. 0.0001% is hard to think about; 8 is much easier.

The key to solving conditional-probability problems is your old friend, probability of one equals proportion of all. The probability that this particular matching person is innocent is the same as the proportion of all matching people that are innocent, or the proportion of innocent people among those who match. Probability problems usually get easier when you turn them into problems about numbers of people or numbers of things.

What does this look like in symbols? (Don’t be afraid of symbols! They are your friend, I promise. Words are slippery and confusing, but when you reduce a problem to symbols you make the situation clear and you are halfway to solving it.)

In this example, there’s a 0.0001% chance that a random person would match the physical type of the criminal:

P(matching) = 0.0001%

The prosecution wants you to believe that the probability of a matching individual being innocent is the same:

P(innocent | matching) = 0.0001%    (WRONG)

This is a conditional probability, the probability that one thing is true if another thing is true. Formally, the whole expression is “the probability of innocent given matching”. But it’s easier to think of as “the probability that a person who matches is innocent” or “the proportion of matching people who are innocent”.

The symbols help you clarify your thinking. “The probability of a match” and “the probability of innocence among those who match” are different symbols, and they’re different concepts. You’d expect them to be different probabilities.

The defense showed the right way to figure the probability of innocence given a match. 0.0001%×8,000,000 = 8 people match, and 7 of them are innocent. The probability that a matching person is innocent — the probability that a person is innocent given that he matches — is 87.5%.

P(innocent | matching) = 87.5%    (CORRECT)
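The defense’s whole-number reasoning fits in a few lines of Python. This sketch assumes, as the example does, a population of 8,000,000, a match rate of 0.0001% (one in a million), exactly one guilty person among those who match, and no other evidence.

```python
from fractions import Fraction

match_rate = Fraction(1, 1_000_000)   # 0.0001% = 0.000001
population = 8_000_000                # "over 8 million" in New York City

expected_matches = match_rate * population   # about 8 people fit the description
# Assumes exactly one of the matching people is guilty and there is no other evidence.
p_innocent_given_match = (expected_matches - 1) / expected_matches

print(expected_matches, float(p_innocent_given_match))   # 8 0.875
```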

Notice what happens with if-then probabilities. You’re considering one group within a subgroup of the population, not one group within the whole population. You’ve reduced your sample space — not all people, but all matching people. The bottom number of your fraction comes from the “given that” part of the conditional probability, because P(innocent | matching) is the proportion of matching people that are also innocent.

To explode the prosecutor’s fallacy, you distinguish between a probability in the whole population and a probability in a subgroup. You also have to ask yourself, “which group?” The issue of medical test results is a good example.

Example 36: There’s a rare skin disease, Texter’s Peril (TP), where you become hypersensitive to the buttons on your phone. (Yes, I am making this up.) It affects 0.03% of adults aged 18–30, three in ten thousand. The only cure is to lay off texting for 30 days, no exceptions. Naturally this is about the worst thing that can happen to anyone.

Your doctor has tested you and the test comes up positive. She tells you that the test is 99% accurate. Does that mean you are 99% likely to have TP? You might think so, and sadly many doctors make the same mistake.

You have a positive test result, and you want to know how likely it is that you have Texter’s Peril. In symbols,

P(disease | positive) = ?

Your doctor told you that the test is 99% accurate, meaning that 99% of people who actually have TP get a positive result:

P(positive | disease) = 99%

These are obviously not the same symbol, so the probability you care about, the probability you have the disease, may well be different from 99%. How can you compute it?

Change those probabilities to whole numbers, and make a table. (I got this technique from the book Calculated Risks [Gigerenzer 2002 [see “Sources Used” at end of book]]. The book cites a study showing that doctors routinely confused probabilities when counseling patients about test results.) You’ve already played with a two-way table; now you’re going to make one. It’s a little bit like filling in a puzzle. I hope you like puzzles. ☺

You don’t know the population size, but that’s okay. Just use a large round number, like a million. Start with what you know.

P(disease) = 0.03%

Out of 1,000,000 people, 0.03% = 300 will have TP, and the other 999,700 won’t. That’s the bottom row of the table, the totals row.

P(positive | disease) = 99%

Of the 300 who actually have TP, 99% = 297 will get a correct positive result, and 3 will get a false negative. That’s the first column of the table.

P(negative | diseaseᶜ) = 99%

(In the real world, a given test may not be equally accurate for positives and negatives, but we’ll overlook that to keep things simple.) Out of 999,700 who don’t have TP, 99% = 989,703 will get a correct negative result, and 9,997 will get a false positive. This is the second column of the table, and now you can fill in the column of totals.

| | Have TP | Don’t Have TP | Total |
|---|---|---|---|
| Positive Test | 297 | 9,997 | 10,294 |
| Negative Test | 3 | 989,703 | 989,706 |
| Total | 300 | 999,700 | 1,000,000 |

Take a look at that table, specifically the “Positive Test” row. Do you see the problem? Most of the people with positive test results actually don’t have Texter’s Peril, even though the test is 99% accurate!

It took a while to get here, but it’s better to be correct slowly than to be wrong quickly. You can now compute the probability of having TP given that you have a positive test result. Once again, probability of one equals proportion of all, so this is really the same as the proportion of people with positive test results who actually have TP:

P(disease | positive) = 297 / 10,294 = 2.89%

The test is 99% accurate, but because TP is rare, most of the positive results are false positives, and there’s under a 3% chance that a positive result means you actually have Texter’s Peril. There’s a 1 − 297/10,294 = 97.11% chance that a positive result is a false positive.
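The table-building technique works for any prevalence and accuracy, so it can be sketched as a small Python function. Here natural_frequency_table is a hypothetical name. Sensitivity means P(positive | disease) and specificity means P(negative | no disease), and the call below assumes the numbers from Example 36: 0.03% prevalence and 99% for both.

```python
from fractions import Fraction

def natural_frequency_table(population, prevalence, sensitivity, specificity):
    """Turn rates into whole-number counts, the way the two-way table does."""
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * sensitivity          # sick people who test positive
    false_neg = sick - true_pos            # sick people who test negative
    true_neg = healthy * specificity       # healthy people who test negative
    false_pos = healthy - true_neg         # healthy people who test positive
    return {"true_pos": true_pos, "false_pos": false_pos,
            "false_neg": false_neg, "true_neg": true_neg}

t = natural_frequency_table(1_000_000, Fraction(3, 10_000),
                            Fraction(99, 100), Fraction(99, 100))
positives = t["true_pos"] + t["false_pos"]    # the "Positive Test" row total
negatives = t["false_neg"] + t["true_neg"]    # the "Negative Test" row total

p_disease_given_pos = t["true_pos"] / positives
print(positives, negatives)                   # 10294 989706
print(f"{float(p_disease_given_pos):.4f}")    # 0.0289
```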

Notice again: With conditional probability, you’re not concerned with the whole population. Rather, you focus on a subgroup within a subgroup. P(disease | positive) is the proportion of people who actually have the disease, within the subgroup that received a positive test result.

Example 37: What’s the chance that a negative is a false negative, that given a negative test result you actually have TP? In symbols,

P(disease | negative) = ?

You’ve already got the table, so this is a piece of cake. Out of a million people, 989,706 test negative and 3 of them have the disease. The probability that a negative is a false negative is

P(disease | negative) = 3/989,706 ≈ 0.000 003

which is essentially nil.

Example 38: A lot of Web sites in 2013 trumpeted the news that the Honda Accord was the most frequently stolen model in the US the year before. And that’s true. Out of 721,053 stolen cars and light trucks in 2012, Hot Wheels 2012 tells us that 58,596 were Honda Accords (NICB 2013 [see “Sources Used” at end of book]).

But many Web sites warned Honda owners that they were most at risk. For instance, Honda Accord, Civic Remain Top Targets for Thieves at cars.com (Schmitz 2013 [see “Sources Used” at end of book]) leads with “If you own a Honda Accord or Civic, or a full-size Ford pickup truck, you might want to take a moment to make sure your auto-insurance payments are up to date. You drive one of the top three most-stolen vehicles in the US.”

Do you see what’s wrong here? Think about it for a minute before reading on.

Yes, a lot of Honda Accords were stolen, because there are a lot of them on the road. Too many news organizations are sloppy and think that the likelihood a stolen car is an Accord is the same as the likelihood that an Accord will be stolen. This is the doctor’s mistake from the previous example, all over again.

Let’s clarify. You have 58,596 Accords out of 721,053 thefts, so the probability that a stolen car was an Accord — the probability that a car was an Accord given that it was stolen — the probability of “if stolen then Accord” — is

P(Accord | stolen) = 58,596/721,053 = 8.13%

But that doesn’t tell you doodley-squat about your chance of having your Accord stolen. That would be the probability of a car being stolen given that it is an Accord, “if Accord then stolen”. The top number of that fraction is still 58,596, but the bottom number is the total number of Accords on the road:

P(stolen | Accord) = 58,596/(total Accords on the road in 2012)

Do you see the difference? They’re both conditional probabilities, but they’re different conditions. “If stolen then Accord” is different from “if Accord then stolen”. The first one is about Accord thefts as a proportion of all thefts, and the second one is about Accord thefts as a proportion of all Accords. Those are different numbers.

To find the chance that an Accord will be stolen, you need the number of Accords on the road in 2012. A press release from Experian (2012) says there were “more than 245 million vehicles on US roads” in 2012, and 2.6% of them were Accords.

P(stolen | Accord) = (stolen Accords)/(total Accords on the road in 2012)

P(stolen | Accord) = 58,596/(2.6% of 245 million)

P(stolen | Accord) = 58,596/6,370,000

P(stolen | Accord) = 0.92%

Yes, over 8% of cars stolen in 2012 were Accords, but the chance of a given Accord being stolen was under 1%. P(Accord | stolen) = 8.13%, but P(stolen | Accord) = 0.92%.
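The Accord calculation boils down to one numerator over two different denominators. This sketch uses the figures quoted above (the NICB theft counts and the Experian estimate that 2.6% of about 245 million vehicles were Accords); the variable names are just illustrative.

```python
stolen_total = 721_053                        # all stolen cars and light trucks, 2012
stolen_accords = 58_596                       # of which were Honda Accords
accords_on_road = 245_000_000 * 26 // 1000    # 2.6% of about 245 million vehicles

# Same numerator, different "given" group in the denominator.
p_accord_given_stolen = stolen_accords / stolen_total      # Accords among thefts
p_stolen_given_accord = stolen_accords / accords_on_road   # thefts among Accords

print(f"P(Accord | stolen) = {p_accord_given_stolen:.2%}")   # 8.13%
print(f"P(stolen | Accord) = {p_stolen_given_accord:.2%}")   # 0.92%
```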

#### Optional:  Conditional Probability Formula

Rule: P(B | A) = P(A and B) / P(A)  —or—  N(A and B) / N(A)

The “N” alternatives remind you that often it’s easier just to count than to find probabilities and then divide. Either way, when you consider P(B | A), remember that you’re interested in the likelihood of B given that A occurs. It’s the B cases within the A group, not all the B cases.

P(A | B) is not the same as P(B | A). You’ll get the probability right if you remember that the second event, the “given that” event, supplies the bottom number of the fraction.

Example 39: Find P(stolen | Accord), the chance that any one Accord will be stolen. Using the numbers from Example 38,

P(stolen | Accord) = N(Accord and stolen) / N(Accord)

P(stolen | Accord) = 58,596/6,370,000 = 0.92%

Example 40: I draw a card from the deck, and I tell you it’s red. What’s the probability that it’s a heart? If you didn’t know anything about the card, you’d write P(heart) = ¼ because a quarter of the cards in the deck are hearts. But what is the probability given that it’s red?

P(heart | red) = P(heart and red) / P(red)

P(heart and red) is the probability of a red heart. A quarter of the cards in the deck are red hearts, so this is just ¼. P(red) is of course ½ because half the cards in the deck are red.

P(heart | red) = (¼) / (½) = (¼) × 2 = ½

This one is probably easier to do by just counting:

P(heart | red) = N(heart and red) / N(red)

P(heart | red) = 13/26 = ½

Either way, you’re concerned with the sub-subgroup of hearts within the subgroup of red cards. P(heart | red) = ½ — half of the red cards are hearts.

Example 41: You know P(heart | red) = ½: given that a card is red, there’s a ½ probability that it’s a heart. But what is P(red | heart), the probability that a card is red given that it’s a heart? You probably already know the answer, but let’s run the formula:

P(red | heart) = N(red and heart) / N(heart)

P(red | heart) = 13/13 = 1 (or 100%)
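When outcomes are equally likely, P(B | A) = N(A and B) / N(A) is just counting. This Python sketch builds a 52-card deck and counts. cond_prob is a hypothetical helper written for this illustration, and it assumes every card is equally likely to be drawn.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
deck = list(product(RANKS, SUITS))   # 52 (rank, suit) pairs

def cond_prob(event, given, space):
    """P(event | given) = N(event and given) / N(given), over an equally likely space."""
    given_cases = [x for x in space if given(x)]       # the "given" group is the denominator
    both = [x for x in given_cases if event(x)]
    return Fraction(len(both), len(given_cases))

def is_heart(card):
    return card[1] == "hearts"

def is_red(card):
    return SUITS[card[1]] == "red"

print(cond_prob(is_heart, is_red, deck))   # 1/2
print(cond_prob(is_red, is_heart, deck))   # 1
```

Swapping the two arguments swaps which group supplies the bottom number, which is why P(heart | red) and P(red | heart) come out different.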

Conditional probabilities often come up in two-way tables.

US Marital Status in 2006 (in Millions)

| | Men | Women | Totals |
|---|---|---|---|
| Married | 63.6 | 64.1 | 127.7 |
| Widowed | 2.6 | 11.3 | 13.9 |
| Divorced | 9.7 | 13.1 | 22.8 |
| Never married | 30.3 | 25.0 | 55.3 |
| Totals | 106.2 | 113.5 | 219.7 |

Example 42: Again using the table of marital status (reprinted on the last page), what’s the probability that a randomly selected woman is divorced? In other words, given that the person is a woman, what’s the probability that she’s divorced?

Solution: The problem wants P(divorced | woman), the probability that the person is divorced given that she’s a woman.

P(divorced | woman) = N(divorced and woman) / N(woman)

P(divorced | woman) = 13.1/113.5 ≈ 0.1154

Because we have “given woman” or “if woman”, the bottom number is the number of women, 113.5 million.
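Reading a conditional probability off a two-way table means dividing a cell by the total for the “given” column (or row). This minimal sketch stores the 2006 marital-status counts in a dictionary; the dictionary layout and the function name p_status_given_sex are assumptions made for illustration.

```python
# Counts in millions, from the 2006 marital-status table.
TABLE = {
    "Married":       {"Men": 63.6, "Women": 64.1},
    "Widowed":       {"Men": 2.6,  "Women": 11.3},
    "Divorced":      {"Men": 9.7,  "Women": 13.1},
    "Never married": {"Men": 30.3, "Women": 25.0},
}

def p_status_given_sex(status, sex):
    """P(status | sex): the cell count divided by that sex's column total."""
    column_total = sum(row[sex] for row in TABLE.values())   # 113.5 for women
    return TABLE[status][sex] / column_total

print(f"{p_status_given_sex('Divorced', 'Women'):.4f}")   # 0.1154
```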

### 5B7.  Optional:  Checking Independence

Remember the definition of independent events? A and B are independent if the occurrence of one doesn’t change the probability of the other. Now that you know about conditional probability, you can define independent events in terms of conditional probability:

Definition: Two events A and B are independent if and only if P(A|B) = P(A).

This makes sense. P(A) is the probability of A without considering whether B happened or not, and P(A|B) is the probability of A given that B happened. If B’s occurrence doesn’t change the probability of A, then those two numbers will be equal.


Example 43: Referring again to the table of marital status (reprinted on the last page), show that “woman” and “widowed” are dependent (not independent).

Solution:

P(widowed) = 13.9 / 219.7 ≈ 0.0633

P(widowed | woman) = 11.3 / 113.5 ≈ 0.0996

These numbers are different: the probability of “widowed” changes when “woman” is given. In English, the proportion of women who are widowed is different from the proportion of all people who are widowed. Therefore the events “woman” and “widowed” are not independent.

By the way, if A and B are independent then B and A are independent. So you could just as well compare P(woman) = 113.5/219.7 ≈ 0.5166 to P(woman|widowed) = 11.3/13.9 ≈ 0.8129. Since those are different, you conclude that “woman” and “widowed” are dependent.
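The independence check is a comparison of two numbers. In this sketch, the tolerance of 0.0001 passed to math.isclose is an assumption, chosen to match the four-decimal rounding used in the text. Real survey data almost never gives exactly equal proportions, and deciding whether a small difference is meaningful takes more than this simple comparison.

```python
from math import isclose

# Counts in millions, from the 2006 marital-status table.
widowed_women = 11.3
women = 113.5
widowed = 13.9
everyone = 219.7

p_widowed = widowed / everyone                  # P(widowed)
p_widowed_given_woman = widowed_women / women   # P(widowed | woman)

# Independent only if conditioning on "woman" leaves the probability unchanged
# (within an assumed tolerance of 0.0001).
independent = isclose(p_widowed, p_widowed_given_woman, abs_tol=1e-4)
print(f"{p_widowed:.4f} {p_widowed_given_woman:.4f} independent={independent}")
# 0.0633 0.0996 independent=False
```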

### 5B8.  Optional:  Probability “and” for All Events

When events are not independent, to find probability “and” you need to use a conditional probability. Remember the formula for conditional probability: P(B | A) = P(A and B) / P(A). Multiply both sides by P(A) and you have P(A) × P(B | A) = P(A and B), or:

Rule: For all events, P(A and B) = P(A) × P(B | A)

Example 44: You draw two cards from the deck without looking. What’s the probability that they’re both diamonds?

Solution: Are these independent events? No! P(diamond1), the probability that the first card is a diamond, is 13/52 because there are 13 diamonds out of 52. But if the first card is a diamond, the probability that the second card is a diamond is different. Now there are only 12 diamonds left in the deck, out of a total of 51 cards. So P(diamond2 | diamond1) = 12/51, which is a bit less than 13/52.

P(diamond1 and diamond2) = P(diamond1) × P(diamond2 | diamond1)

P(diamond1 and diamond2) = (13/52) × (12/51) ≈ 0.0588
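You can confirm the multiplication rule for dependent events by brute force. This sketch treats the deck as 13 diamonds and 39 other cards, lists every ordered pair of two different cards (52 × 51 = 2,652 of them, all equally likely when you draw without replacement), and compares the count with (13/52) × (12/51).

```python
from fractions import Fraction
from itertools import permutations

# 52 cards: 13 diamonds and 39 others; only the suit matters here.
deck = ["diamond"] * 13 + ["other"] * 39

# Multiplication rule for dependent events: P(D1) x P(D2 | D1).
rule = Fraction(13, 52) * Fraction(12, 51)

# Brute force: every ordered pair of two different cards is equally likely.
pairs = list(permutations(range(52), 2))
both = sum(1 for i, j in pairs if deck[i] == deck[j] == "diamond")
brute = Fraction(both, len(pairs))

print(rule == brute, f"{float(rule):.4f}")   # True 0.0588
```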

## 5C.  Sequences instead of Formulas

A lot of probability problems can be solved without using formulas, through the technique of sequences. Here’s the procedure:

1. Write down the “winning sequences”, the sequences that lead to the desired outcome.
2. Assign probabilities to each event in each sequence, from start to end.
3. Multiply the probabilities within each sequence, and then add up the probabilities of all the sequences.

Example 45: Suppose a bag contains 6 oatmeal cookies, 4 raisin cookies, and 5 chocolate chip. You are to draw two cookies from the bag without looking (and without replacement, which would be yucky). What is the probability that you will get two chocolate chip cookies?

Solution: To start with, notice that there are 6+4+5 = 15 cookies. There’s only one winning sequence, but this one illustrates an important point: you have to assign each probability in its situation at that point in its sequence.

1. Sequence: CC1 and CC2
2. Probabilities: 5/15 and 4/14.
You compute the probability of CC2 at this point in the sequence: it’s the probability of a second CC if the first cookie was CC. You don’t care about the probabilities if the first cookie was anything else, because the sequence starts with a CC cookie. That means that, when you are looking for the probability of a second CC cookie, the bag now contains only 14 cookies, and only 4 of them are CC.
3. Arithmetic: (5/15)×(4/14) ≈ 0.0952

Example 46: In the same situation, what’s the probability you’ll get one oatmeal and one raisin?

Solution: Even though you don’t care which order they come in, you have to list both orders among your winning sequences. Remember the example of flipping two coins, or the examples with dice: to make probabilities come out right, consider all possible orderings.

1. Sequences: (A) O1 and R2; (B) R1 and O2
2. Probabilities: (A) 6/15 and 4/14; (B) 4/15 and 6/14
3. Arithmetic: (6/15)×(4/14) + (4/15)×(6/14) ≈ 0.2286

Example 47: Consider the same bag of 15 cookies, but now what’s the probability you get two cookies the same?

Solution:

1. Sequences: (A) O1 and O2; (B) R1 and R2; (C) CC1 and CC2
2. Probabilities — again, the probability for the second cookie takes into account the first cookie that was drawn.
(A) 6/15 and 5/14; (B) 4/15 and 3/14; (C) 5/15 and 4/14
3. Arithmetic: (6/15)×(5/14) + (4/15)×(3/14) + (5/15)×(4/14) ≈ 0.2952
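As a cross-check on the method of sequences, this Python sketch lists every ordered pair of two different cookies from the bag of 6 oatmeal, 4 raisin, and 5 chocolate chip (15 × 14 = 210 pairs, all equally likely without replacement) and counts the pairs that qualify. Listing both orders for a mixed pair is the same idea as writing down both winning sequences. The function name p_two_draws is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

bag = ["O"] * 6 + ["R"] * 4 + ["C"] * 5   # 6 oatmeal, 4 raisin, 5 chocolate chip

def p_two_draws(wanted):
    """P(the ordered pair of flavors, drawn without replacement, is in `wanted`).
    Counts every ordered pair of two different cookies; all are equally likely."""
    pairs = list(permutations(range(len(bag)), 2))
    hits = sum(1 for i, j in pairs if (bag[i], bag[j]) in wanted)
    return Fraction(hits, len(pairs))

two_cc = p_two_draws({("C", "C")})                                # Example 45
oat_and_raisin = p_two_draws({("O", "R"), ("R", "O")})            # Example 46, both orders
same_flavor = p_two_draws({("O", "O"), ("R", "R"), ("C", "C")})   # Example 47

print(f"{float(two_cc):.4f}")           # 0.0952
print(f"{float(oat_and_raisin):.4f}")   # 0.2286
print(f"{float(same_flavor):.4f}")      # 0.2952
```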

Example 48: Your teacher’s policy is to roll a six-sided die and give a quiz if a 2 or less turns up. Otherwise, she rolls again and collects homework if a 3 or less turns up. You haven’t done the homework for today and you’re not ready for a quiz. What is the probability you’ll get caught?

Solution: Though you could do this with formulas, you’ll get the same answer with less pain by following the method of sequences. The “winning sequences” in this case are the sequences that lead to either a quiz or homework.

1. There are two sequences: (A) quiz (and stop, without deciding about homework); (B) no quiz, but homework
Notice that you start each sequence from the same starting point. Notice also that you don’t consider the possible sequence “no quiz and no homework” because in that sequence you don’t get caught.
2. P(quiz) = 2/6 = 1/3. P(no quiz) = 1−1/3 = 2/3. P(homework on the second roll) = 3/6 = 1/2.
(A) 1/3 (B) 2/3 and 1/2
3. (1/3) + (2/3)×(1/2) = (1/3)+(1/3)= 2/3
There’s a 2/3 probability of a quiz or homework.
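Here’s the same problem done by enumeration in Python. The sketch pretends the teacher always rolls twice and simply ignores the second roll when the first one already produced a quiz; that keeps all 36 outcomes equally likely without changing the probability of getting caught. The function caught is a hypothetical helper that encodes the teacher’s policy from Example 48.

```python
from fractions import Fraction
from itertools import product

def caught(first_roll, second_roll):
    """Quiz on a first roll of 1-2; otherwise homework is collected on a second roll of 1-3."""
    if first_roll <= 2:
        return True            # sequence (A): quiz; the second roll never matters
    return second_roll <= 3    # sequence (B): no quiz, but homework

# Always rolling twice (and ignoring the unused roll) keeps all 36 outcomes equally likely.
outcomes = list(product(range(1, 7), repeat=2))
p_caught = Fraction(sum(caught(a, b) for a, b in outcomes), len(outcomes))
print(p_caught)   # 2/3
```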

Sequences let you think through a situation without getting confused about which formula may apply. Sometimes no formula applies. Here’s a famous example.

Example 49: You’re a contestant on Let’s Make a Deal. You have to pick one of three doors, knowing that there’s a new car behind one of them and a “zonk” (something funny but worthless) behind the other two. Let’s say you pick Door #1.

The host, who of course knows where the car is, opens Door #2 and shows you a zonk. He then asks whether you want to stick with your choice of Door #1, or instead take what’s behind Door #3. What should you do, and why?

(I gave specific door numbers to help make this problem less abstract, but the specifics don’t matter. What does matter is that you pick a door at random, and the host reveals that a door you didn’t pick is the wrong one.)

Solution: There’s really no formula for this one, because the host’s actions aren’t governed by probability. Once you realize that, it’s easy.

1. In the long run, 1/3 of contestants will choose the correct door, whichever one it is, and 2/3 will choose one of the two wrong doors. Why? The show’s producers have to make sure that prizes are equally distributed among the three doors over the long haul. If they favored one door over the others, people would notice and would start picking that door.

Therefore, P(right door) = 1/3 and P(wrong door) = 2/3.

2. If you chose the right door, the host opens one of the two wrong doors, but obviously you would not benefit by switching.
3. If you chose the wrong door, the host opens the other wrong door and offers you the chance to switch doors. The host has eliminated the other wrong door, and the third door must be the winning door. You should switch.

If you chose the wrong door and switch doors, you will always win because the host has eliminated the other wrong door.

4. The probability that you chose the right door initially, and will lose if you switch, is 1/3. The probability that you chose the wrong door initially, and will win if you switch, is 2/3.

In the long run, keeping your original choice is the winning strategy 1/3 of the time, and switching is the winning strategy 2/3 of the time.

5. Switching doors doubles your chance of winning.

This is the famous Monty Hall Problem. Monty Hall developed Let’s Make a Deal and hosted the show for many years. There was a lot of controversy (Tierney 1991 [see “Sources Used” at end of book]) about the answer. Many people who should have known better thought that Door #1 and Door #3 were equally likely after Door #2 was opened. But they forgot that this is not a pure probability problem. The host knows where the car is and picks a door to open based on that knowledge, and that makes all the difference.
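If the argument still feels slippery, an exact enumeration may help. This Python sketch walks through every car position and every first pick, assuming the car is equally likely to be behind any door and your first pick is made at random, and it has the host open a zonk door you didn’t pick. When you picked the car, the host has two doors to choose from; the sketch assumes he picks either one with equal probability, but that choice doesn’t change the totals. The function name win_probabilities is hypothetical.

```python
from fractions import Fraction

DOORS = (1, 2, 3)

def win_probabilities():
    """Exact P(win) for sticking vs switching, over every car position and first pick."""
    stick = switch = Fraction(0)
    weight = Fraction(1, 9)                      # 3 car positions x 3 first picks
    for car in DOORS:
        for pick in DOORS:
            # The host may open any door that is neither your pick nor the car.
            host_options = [d for d in DOORS if d != pick and d != car]
            for opened in host_options:
                w = weight / len(host_options)   # assumed: host chooses at random if he can
                remaining = next(d for d in DOORS if d not in (pick, opened))
                stick += w * (pick == car)       # sticking wins only if first pick was right
                switch += w * (remaining == car) # switching wins if the other closed door has it
    return stick, switch

stick, switch = win_probabilities()
print(stick, switch)   # 1/3 2/3
```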

## What Have You Learned?

Key ideas:

(The online book has live links to all of these.)

Study aids:

## Exercises for Chapter 5

Write out your solutions to these exercises, showing your work for all computations. Then check your solutions against the solutions page and get help with anything you don’t understand.

Caution! If you don’t see how to start a problem, don’t peek at the solution — you won’t learn anything that way. Ask your instructor or a tutor for a hint. Or just leave it and go on to a different problem for now. You may find when you return to that “impossible” problem that you see how to do it after all.

### Problem Set 1

1 You toss three coins.
(a) How many entries do you expect in the sample space of equally likely events?

(b) Construct that sample space.

(c) Find P(2H), the probability of getting exactly two heads.

2 In 2003 a federal government survey estimated that 58.2% of US households had both a cell phone and a landline, 2.8% had only cell service, and 1.6% had no phone service at all.
(a) Construct a probability model for type of phone service to US households. (Hint: You’re going to have to add a fourth case.)
(b) Supposedly, polling agencies try not to call cell phones, because consumers object to paying for the calls. What proportion of US households could be reached by a landline in 2003?
3 According to DiscovertheOdds.com (2014) [see “Sources Used” at end of book], the probability of being struck by lightning in a given year is about 1 in 1,000,000. A blog post by Tara Parker-Pope (2007) [see “Sources Used” at end of book] says that the probability of suffering a shark attack in 2003 was about 1 in 4,691,000. Can you add these two numbers to find the probability of being struck by lightning or attacked by a shark in 2003 as 1/1,000,000 + 1/4,691,000? Briefly, why or why not?
4 P(A), the probability of event A, is 0.7. A and B are complementary events. Find (a) P(not A); (b) P(B); (c) P(A and B).
If any of them cannot be determined from the information given, say so.
5 A blog post by Tara Parker-Pope (2007) [see “Sources Used” at end of book] reported that your lifetime risk of dying of heart disease is 1/5, and your lifetime risk of dying of cancer is 1/7. Can you add these two numbers to find the probability of dying of heart disease or cancer? Briefly, why or why not?
6 Explain the difference between P(divorced | man) and P(man | divorced).
7 A company analyzed all 412 customer complaints that were received in January 2013. None of them were for unresolved billing disputes. Therefore the probability that a randomly selected complaint from January 2013 was for an unresolved billing dispute is zero. We’re used to interpreting a probability of zero as impossible, but obviously it is possible for a complaint to be about an unresolved billing dispute. How do you resolve this paradox?

Need a hint? Think about the two kinds of probability from the beginning of the chapter.

8 You shuffle a standard 52-card deck well and deal five cards. What is the probability that the fifth card is a spade?
9 Write out the sample space for flipping two coins, and use it to answer these questions.
(a) If you are told that at least one of the flips came up heads, what is the probability that both are heads?
(b) If you are told that the first coin came up heads, what is the probability that both are heads?
10 The chance of being a victim of violent crime in a given year varies by age and sex, according to What are my chances of being a victim of violent crime? [see “Sources Used” at end of book]. Take 17.1 per thousand, or 1.71%, as the average.

(a) You’re waiting for a flight at the airport. You fall into conversation with a stranger, and you’re surprised to learn that both of you have been victims of violent crime in the past year. Assuming random selection, what are the chances of that happening?

(b) Explain why you cannot use the same technique to find the probability that both members of a married couple have been victims of violent crime in the past year.
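For part (a) only, the arithmetic is short. The sketch assumes that two strangers chosen at random are independent, so the multiplication rule applies. Part (b) asks why that assumption fails for a married couple, so this code does not apply there.

```python
p_victim = 0.0171  # 17.1 per thousand, the average annual rate given in the problem

# Strangers selected at random are treated as independent,
# so P(A and B) = P(A) * P(B).
p_both = p_victim * p_victim
print(f"P(both were victims) = {p_both:.6f}")
```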

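US Marital Status in 2006 (in Millions)

| Status | Men | Women | Totals |
|---|---|---|---|
| Married | 63.6 | 64.1 | 127.7 |
| Widowed | 2.6 | 11.3 | 13.9 |
| Divorced | 9.7 | 13.1 | 22.8 |
| Never married | 30.3 | 25.0 | 55.3 |
| Totals | 106.2 | 113.5 | 219.7 |
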
11 For this problem, please use the table of US marital status above (reprinted at the end of this document).

(a) Find P(divorced).

(b) Give two interpretations of that probability.

(c) What type of probability is this: classical, empirical, experimental, theoretical?

(d) Find P(divorced^C) and give one interpretation.

(e) Find P(man and married).

(f) Find P(man or married). (Work this with and without the formula.)

(g) Find the probability that a randomly selected male was never married:
P(never married | male) = ?

(h) Find P(man | married), and interpret as “____% of ____ were ____.”

(i) Find P(married | man), and interpret as “____% of ____ were ____.”
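After working Problem 11 by hand, you can check the numerical parts ((a), (d) through (i)) with this sketch. It copies the 2006 marital-status figures into a dictionary and treats the table as the whole population, so each probability is a ratio of counts. The dictionary layout and variable names are illustrative choices, not anything from the original.

```python
# US marital status in 2006, in millions (copied from the table for this problem)
table = {
    "married":       {"men": 63.6, "women": 64.1},
    "widowed":       {"men": 2.6,  "women": 11.3},
    "divorced":      {"men": 9.7,  "women": 13.1},
    "never married": {"men": 30.3, "women": 25.0},
}

def row_total(status):
    """Total for one marital status, men plus women."""
    return sum(table[status].values())

grand_total = sum(row_total(s) for s in table)          # about 219.7
men_total = sum(row["men"] for row in table.values())   # about 106.2

p_divorced = row_total("divorced") / grand_total                # (a)
p_not_divorced = 1 - p_divorced                                 # (d) complement rule
p_man_and_married = table["married"]["men"] / grand_total       # (e)
# (f) with the general addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_man_or_married = (men_total / grand_total
                    + row_total("married") / grand_total
                    - p_man_and_married)
# (f) without the formula: all men, plus married people who are not men
p_man_or_married_direct = (men_total + table["married"]["women"]) / grand_total
# (g)-(i): for a conditional probability, divide by the "given" group's total
p_never_given_man = table["never married"]["men"] / men_total
p_man_given_married = table["married"]["men"] / row_total("married")
p_married_given_man = table["married"]["men"] / men_total

for label, value in [
    ("(a) P(divorced)", p_divorced),
    ("(d) P(not divorced)", p_not_divorced),
    ("(e) P(man and married)", p_man_and_married),
    ("(f) P(man or married), formula", p_man_or_married),
    ("(f) P(man or married), direct", p_man_or_married_direct),
    ("(g) P(never married | man)", p_never_given_man),
    ("(h) P(man | married)", p_man_given_married),
    ("(i) P(married | man)", p_married_given_man),
]:
    print(f"{label} = {value:.4f}")
```

Notice that (h) and (i) use the same numerator but different denominators. That difference is the whole point of the order inside P( | ).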

12 In five-card draw poker, you are dealt five cards and then during the betting you can discard some in hopes that the replacements will improve your hand. You have a pat hand if the first five cards are good enough that you don’t need to discard. What’s the probability you’ll be dealt a diamond flush (five diamonds) as a pat hand?
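Problem 12 can be worked two equivalent ways, and this sketch does both with exact fractions so you can check either method. It uses `math.comb`, which requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

# Counting approach: ways to choose 5 of the 13 diamonds,
# divided by the number of possible 5-card hands.
p_counting = Fraction(comb(13, 5), comb(52, 5))

# Multiplication-rule approach: every card dealt must be a diamond,
# and each deal removes one card (and one diamond) from the deck.
p_sequential = Fraction(1)
for i in range(5):
    p_sequential *= Fraction(13 - i, 52 - i)

print(p_counting, float(p_counting))
print(p_counting == p_sequential)
```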
13 There are 20 M&Ms left in the dish: 5 blue, 4 orange, 3 green, 3 yellow, 3 brown, and 2 red. The yellows are your favorites. Your friend takes three M&Ms without looking.

(a) What’s the chance that she leaves your favorites behind?

(b) What’s the chance that all three of her picks are red?
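Because your friend takes the M&Ms without putting any back, Problem 13 is a without-replacement problem. The sketch below counts combinations to check both parts. The dish contents come from the problem; the helper function is hypothetical, written only for this illustration.

```python
from fractions import Fraction
from math import comb

dish = {"blue": 5, "orange": 4, "green": 3, "yellow": 3, "brown": 3, "red": 2}
total = sum(dish.values())   # 20 M&Ms
picks = 3

def p_all_from(count):
    """P(all `picks` M&Ms come from a particular group of `count`), without replacement."""
    return Fraction(comb(count, picks), comb(total, picks))

p_no_yellow = p_all_from(total - dish["yellow"])   # (a) all three from the non-yellows
p_all_red = p_all_from(dish["red"])                # (b) math.comb gives 0 when asked for more items than exist
print(p_no_yellow, float(p_no_yellow))
print(p_all_red)
```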

14 Tom Turkey invested in two risky startup companies, A and W. There is a 0.90 probability that company A will go bankrupt, and a 0.80 probability that company W will go bankrupt. Assuming the two companies have no connection, find the probabilities that (a) both will go bankrupt; (b) one of them, but not both, will go bankrupt; (c) neither will go bankrupt.
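For Problem 14, the phrase "no connection" is read as independence, so probabilities of separate events multiply. This sketch checks all three parts and also confirms that the three cases cover every possibility.

```python
p_a = 0.90   # P(company A goes bankrupt)
p_w = 0.80   # P(company W goes bankrupt)

both = p_a * p_w                                   # (a)
exactly_one = p_a * (1 - p_w) + (1 - p_a) * p_w    # (b) two disjoint ways this can happen
neither = (1 - p_a) * (1 - p_w)                    # (c)

print(f"both {both:.2f}, exactly one {exactly_one:.2f}, neither {neither:.2f}")
print(f"sum {both + exactly_one + neither:.2f}")   # the three cases are exhaustive
```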

Colors of Plain M&Ms

| Color | Percentage |
|---|---|
| Blue | 24% |
| Orange | 20% |
| Green | 16% |
| Yellow | 14% |
| Brown | 13% |
| Red | 13% |

15 Without looking, you take three M&Ms from a new three-pound bag. (The bag contains over a thousand M&Ms.) Use the probability model of plain M&M colors above (reprinted at the end of this document) to answer these questions.

(a) Find the probability that all three are red.

(b) Find the probability that none are red.

(c) Find the probability that at least one is green.

(d) Find the probability that exactly one is green.
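Because the bag holds over a thousand M&Ms, taking out three barely changes the color proportions. This sketch therefore treats the three picks as independent and uses the color model's percentages unchanged. That is an approximation, not an exact without-replacement calculation.

```python
from math import comb

p = {"blue": 0.24, "orange": 0.20, "green": 0.16,
     "yellow": 0.14, "brown": 0.13, "red": 0.13}
n = 3  # number of M&Ms taken

p_all_red = p["red"] ** n                                  # (a)
p_none_red = (1 - p["red"]) ** n                           # (b)
p_at_least_one_green = 1 - (1 - p["green"]) ** n           # (c) complement of "no greens"
# (d) choose which of the three picks is the green one, then multiply
p_exactly_one_green = comb(n, 1) * p["green"] * (1 - p["green"]) ** (n - 1)

for label, value in [("(a)", p_all_red), ("(b)", p_none_red),
                     ("(c)", p_at_least_one_green), ("(d)", p_exactly_one_green)]:
    print(label, round(value, 6))
```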

16 A poll found that 45% of baseball fans had attended a game in person within the past year. Of five randomly selected baseball fans, find the probability that at least one fan had not attended a game within the past year.
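"At least one" problems like Problem 16 are usually easiest through the complement. This sketch assumes the five fans are selected independently, so "all five attended" is a product of five equal probabilities.

```python
p_attended = 0.45
n = 5

# "At least one had NOT attended" is the complement of "all five attended".
p_at_least_one_not = 1 - p_attended ** n
print(f"{p_at_least_one_not:.4f}")
```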
17 Without looking, Grace Underfire takes two sourballs from a bowl that contains 11 cherry and 9 orange flavor. What is the probability that she will get one of each flavor?
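Problem 17 is a without-replacement problem that can be checked two ways: add the probabilities of the two possible orders, or count unordered pairs. This sketch does both with exact fractions.

```python
from fractions import Fraction
from math import comb

cherry, orange = 11, 9
total = cherry + orange

# Two orders give one of each: cherry then orange, or orange then cherry.
p_sequential = (Fraction(cherry, total) * Fraction(orange, total - 1)
                + Fraction(orange, total) * Fraction(cherry, total - 1))

# Counting: one cherry and one orange, out of all possible pairs.
p_counting = Fraction(comb(cherry, 1) * comb(orange, 1), comb(total, 2))

print(p_sequential, p_counting, float(p_counting))
```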
18 An annual church raffle offers one chance in 500 of winning something. Find the chance that you win at least once if you play five years in a row.
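A quick way to check Problem 18 is the complement rule. This sketch assumes you hold one chance per year and that each year's drawing is independent of the others. It also prints the naive answer of adding the yearly chances, which comes out slightly too high because it double-counts years in which you would win more than once.

```python
from fractions import Fraction

p_win = Fraction(1, 500)
years = 5

# P(at least one win) = 1 - P(no win in any of the five years)
p_at_least_once = 1 - (1 - p_win) ** years
naive_sum = years * p_win

print(f"complement rule: {float(p_at_least_once):.6f}")
print(f"naive sum:       {float(naive_sum):.6f}")
```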
19 Butch will miss an important TV program while taking his statistics exam, so he sets both his DVRs to record it. The first one records 70% of the time, and the second one records 60% of the time. (Their performance is independent.) What is the probability that he gets home after the exam and finds
(a) No copies of his program?
(b) One copy of his program?
(c) Two copies of his program?
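The problem states that the two DVRs perform independently, so each combination of "records" and "fails" is a product. This sketch checks Problem 19's three parts and confirms they add to 1.

```python
p1, p2 = 0.70, 0.60   # chance that each DVR records the program

none = (1 - p1) * (1 - p2)                # (a) both fail
one = p1 * (1 - p2) + (1 - p1) * p2       # (b) either one works, but not both
two = p1 * p2                             # (c) both record

print(f"none {none:.2f}, one {one:.2f}, two {two:.2f}, total {none + one + two:.2f}")
```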

### Problem Set 2

20 Police plan to enforce speed limits during the morning rush hour on four different routes into the city. The traps on routes A, B, C, and D are operated 40%, 30%, 20%, and 30% of the time, respectively. Biff always speeds to work, and he has probability 0.2, 0.1, 0.5, and 0.2 of using those routes.
(a) What’s the probability that he’ll get a ticket on any one morning?
(b) What’s the probability he’ll go five mornings without a ticket?
(Hint: His choice of a route, and whether there’s a speed trap on that route, are independent.)
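Part (a) of Problem 20 adds up the four routes, each weighted by how often Biff takes it. This sketch assumes that a speed trap on his route always means a ticket (he always speeds) and, for part (b), that different mornings are independent of each other.

```python
p_route = {"A": 0.2, "B": 0.1, "C": 0.5, "D": 0.2}   # P(Biff uses this route)
p_trap = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.3}    # P(trap operating on this route)

# (a) route and trap are independent (per the hint), so multiply within
#     each route, then add across the mutually exclusive route choices
p_ticket = sum(p_route[r] * p_trap[r] for r in p_route)

# (b) five ticket-free mornings, assuming mornings are independent
p_five_clean = (1 - p_ticket) ** 5

print(f"(a) {p_ticket:.2f}")
print(f"(b) {p_five_clean:.4f}")
```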
21 For this problem, please use the table of US marital status in 2006 (given with Problem 11 and reprinted at the end of this document). Show that the events “man” and “divorced” are not independent.
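After you've shown it by hand, this sketch runs Problem 21's independence check both ways, using the marital-status table's figures. Because the table's numbers are rounded to 0.1 million, "equal" is tested with a 1% relative tolerance. That tolerance is an arbitrary assumption for this illustration, not a statistical test.

```python
import math

grand_total = 219.7   # all adults, millions
men = 106.2           # all men
divorced = 22.8       # all divorced people
men_divorced = 9.7    # men who are divorced

p_man = men / grand_total
p_divorced = divorced / grand_total
p_man_and_divorced = men_divorced / grand_total

# Check 1: independent events satisfy P(A and B) = P(A) * P(B).
product_rule_holds = math.isclose(p_man_and_divorced, p_man * p_divorced, rel_tol=0.01)

# Check 2: equivalently, P(divorced | man) would equal P(divorced).
p_divorced_given_man = men_divorced / men

print(f"P(man and divorced) = {p_man_and_divorced:.4f}, P(man)*P(divorced) = {p_man * p_divorced:.4f}")
print(f"P(divorced | man) = {p_divorced_given_man:.4f}, P(divorced) = {p_divorced:.4f}")
print("independent?", product_rule_holds)
```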
22 I remarked that if you flip a fair coin repeatedly, you’ll see a run of ten heads or ten tails. Show why this should happen twice in about every thousand flips.
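One way to check the reasoning in Problem 22 is to compare the exact chance that any particular stretch of ten flips is all heads or all tails with a simulation. In this sketch, every ten-flip stretch is counted separately, so one run of 12 heads counts as three stretches; that is one reasonable reading of "a run of ten", and other readings give slightly different counts. The flip count and seed are arbitrary.

```python
import random

WINDOW = 10

# Exact: a given stretch of ten flips is all heads with probability (1/2)**10,
# and all tails with the same probability; the two events are disjoint.
p_window = 2 * 0.5 ** WINDOW
expected_per_thousand = 1000 * p_window

def simulated_per_thousand(n_flips=200_000, seed=7):
    """Count ten-flip stretches that are all heads or all tails, per 1000 flips."""
    rng = random.Random(seed)
    run_length = 0
    prev = None
    count = 0
    for _ in range(n_flips):
        flip = rng.random() < 0.5      # True = heads, False = tails
        run_length = run_length + 1 if flip == prev else 1
        prev = flip
        if run_length >= WINDOW:       # the last ten flips all match
            count += 1
    return 1000 * count / n_flips

simulated = simulated_per_thousand()
print(f"expected per 1000 flips:  {expected_per_thousand:.3f}")
print(f"simulated per 1000 flips: {simulated:.3f}")
```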
23 (adapted from Dabes and Janik 1999 [see “Sources Used” at end of book], page 24) The probability that a certain door is locked is 0.5. The key to the door is one of five unidentified keys hanging on a rack. You select two keys before going to the door. Find the probability that you can open the door without returning for another key.
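You can check your answer to Problem 23 through the complement: you fail only if the door is locked *and* the right key is not among the two you took. This sketch assumes the two keys are picked at random and that whether the door is locked has nothing to do with which keys you picked.

```python
from fractions import Fraction
from math import comb

p_locked = Fraction(1, 2)
keys_on_rack, keys_taken = 5, 2

# P(right key is missing) = ways to pick 2 of the 4 wrong keys / ways to pick any 2 keys
p_missing_key = Fraction(comb(keys_on_rack - 1, keys_taken), comb(keys_on_rack, keys_taken))

p_open = 1 - p_locked * p_missing_key
print(p_open, float(p_open))
```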

## What’s New?

• 15 Nov 2021: Corrected a link to a biography of Monty Hall.
• (intervening changes suppressed)
• 22 Aug 2012: New document.
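
US Marital Status in 2006 (in Millions)

| Status | Men | Women | Totals |
|---|---|---|---|
| Married | 63.6 | 64.1 | 127.7 |
| Widowed | 2.6 | 11.3 | 13.9 |
| Divorced | 9.7 | 13.1 | 22.8 |
| Never married | 30.3 | 25.0 | 55.3 |
| Totals | 106.2 | 113.5 | 219.7 |
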

Colors of Plain M&Ms

| Color | Percentage |
|---|---|
| Blue | 24% |
| Orange | 20% |
| Green | 16% |
| Yellow | 14% |
| Brown | 13% |
| Red | 13% |


Seat-Belt Use by College Students Driving (sample size: 4521)

| Frequency | Percentage |
|---|---|
| Never | 2.61% |
| Rarely | 5.51% |
| Sometimes | 7.63% |
| Most of the time | 15.84% |
| Always | 68.41% |

Updates and new info: https://BrownMath.com/swt/