Stats without Tears
6. Discrete Probability Models
Updated 18 Jan 2017
(What’s New?)
Copyright © 2013–2019 by Stan Brown
Intro: In Chapter 5, you looked at the probabilities of specific events. In this chapter, you’ll take a more global view and look at the probabilities of all possible outcomes of a given trial.
The random variable is one of the main concepts of statistics, and we’ll be dealing with random variables from now till the end of the course.
A variable is “the characteristic measured or observed when an experiment is carried out or an observation is made.”
—Upton and Cook (2008, 401) [see “Sources Used” at end of book]
If the results of that procedure depend on chance, completely or partly, you have a random variable. Each outcome of the procedure is a value of the variable. We use a capital letter like X for a variable, and a lowercase letter like x for each value of the variable.
As you learned in Chapter 1, numeric variables can be discrete or continuous. A discrete random variable can have only specific values, typically whole numbers. A continuous random variable can have infinitely many values, either across all the real numbers or within some interval.
In this chapter, you’ll be concerned with discrete random variables. In the next chapter, you’ll look at one particular type of continuous random variable, the normal distribution.
Example 1: You roll three dice. The number of sixes that appear is a random variable, and the total number of spots on the upper faces is another random variable. These are both discrete.
Example 2: You randomly select a household and ask the family income for last year. This is a continuous random variable.
Example 3: You randomly select twelve TC3 students, measure their heights, and take the average. “Height of a student” is a continuous random variable, and “average height in a 12-student sample” is another continuous random variable.
Example 4: You randomly select 40 families and ask the number of children in each. “Number of children in family” is a discrete random variable, and “average number of children in a sample of 40 families” is a continuous random variable.
Definition: A discrete probability distribution or DPD (also known as a discrete probability model) lists all possible values of a discrete random variable and gives their probabilities. The distribution can be shown in a table, a histogram, or a formula. Like any probabilities, the probabilities in a DPD can be determined theoretically or experimentally.
Prize  Declared Value, x  Chance of Winning, P(x)
Two Camaros  $100,000  1 in 5,000,000
Cash  $10,000  1 in 1,000,000
Apple iPad  $1,000  1 in 500,000
Various  $500  1 in 250,000
Gift card  $5  0.9999928
Example 5: In March 2013, Royal Auto sent me one of those “Win big!” flyers with a fake car key taped to it. The various prizes, and chances of winning, are shown in the table above.
This is a discrete probability distribution. The discrete variable X is “prize value”, and the five possible values of X are $100,000 down to $5.
Remember the two interpretations of probability: probability of one = proportion of all. From the table, you can equally well say that any person’s chance of winning a $500 prize is 1/250,000 = 0.000 004 = 0.0004%, or that in the long run 0.0004% of all the people who participate in the promotion will win a $500 prize.
A discrete probability distribution must list all possible outcomes. The total probability for all possible outcomes in any situation is 1. Therefore, for any discrete probability distribution, the probabilities must add up to 1 or 100%.
Definitions: Suppose you do a probability experiment a lot of times. (For the Royal Auto example, suppose bazillions of people show up to claim prizes.) Each outcome will be a discrete value. The mean of the discrete probability distribution, μ, is the mean of the outcomes from an indefinitely large number of trials, and the standard deviation of the discrete probability distribution, σ, is the standard deviation of the outcomes from an indefinitely large number of trials. The mean of any probability distribution is also called the expected value, because it’s the expected average outcome in the long run.
How do you find the mean and SD of a discrete probability distribution? Well, one interpretation of probability is long-term relative frequency, so you can treat a discrete probability distribution as a relative frequency distribution. (You can also think of the probabilities as weights, with the mean as the weighted average.) On the TI-83/84, that means good old 1-Var Stats, just like in Chapter 3.
μ = ∑ x·P(x)
σ = √[ ∑ (x²·P(x)) − μ² ]
For ∑, see ∑ Means Add ’em Up in Chapter 1.
Example 6: To find the mean and SD of the distribution of winnings in the Royal Auto sweepstakes, put the x’s in one list and the P(x)’s in another list. Caution: When the probability is a fraction, enter the fraction, not an approximate decimal. The calculator will display an approximate decimal, but it will do its calculations on a much more precise value.
After entering the x’s and p’s, press [STAT] [►] [1] and specify your two lists, such as 1-Var Stats L1,L2. (Yes, the order matters: the x list must be first and the P(x) list second.) When you get your results, check n first. In a discrete probability distribution, n represents the total of the probabilities, so it must be exactly 1. If it’s just approximately 1, you made a mistake in entering your probabilities.
The mean of the distribution is μ = $5.03, and the standard deviation is σ = $45.85.
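If you don’t have the calculator handy, the same computation can be sketched in a few lines of Python. This is only an illustration, not part of the TI-83/84 procedure: the function name dpd_mean_sd is made up for this sketch, and exact fractions are used so that the total probability comes out to exactly 1, in the spirit of the caution about entering fractions rather than approximate decimals.

```python
from fractions import Fraction
from math import sqrt

def dpd_mean_sd(values, probs):
    """Return (mean, SD) of a discrete probability distribution."""
    total = sum(probs)
    if total != 1:  # same idea as checking n on the calculator
        raise ValueError(f"probabilities add up to {total}, not 1")
    mu = sum(x * p for x, p in zip(values, probs))
    variance = sum(x * x * p for x, p in zip(values, probs)) - mu ** 2
    return float(mu), sqrt(variance)

# Royal Auto prizes from the table. The gift-card probability is
# computed as "1 minus the others", which equals 0.9999928 exactly.
prizes = [100_000, 10_000, 1_000, 500, 5]
chances = [Fraction(1, 5_000_000), Fraction(1, 1_000_000),
           Fraction(1, 500_000), Fraction(1, 250_000)]
chances.append(1 - sum(chances))

mu, sigma = dpd_mean_sd(prizes, chances)
print(f"mean = ${mu:.2f}, SD = ${sigma:.2f}")
```

Run as written, this should report a mean of $5.03 and an SD of $45.85, matching the calculator.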
Interpretation: In the long run, the dealership will have to pay out $5.03 per person in prizes. The SD is a little harder to get a grasp on, but notice that it’s more than nine times the mean. This tells you that there is a lot of variability in outcome from one person to the next. In general, the mean tells you the long-term average outcome, and the SD tells you the unpredictability of any particular trial. You can look at the SD as a measure of risk.
A couple of notes about the calculator output: The calculator knows that a DPD is a population, so it gives you σ and not s for the SD. It should give you μ for the mean, but instead it displays x̅, so you need to make the change. I’ve already mentioned that the sum of the probabilities (n) must be exactly 1, not just approximately 1.
Example 7: When visiting the city, should you park in a lot or on the street? On a quarter of your visits (25%), you park for an hour or less, which costs $10 in a lot; for parking more than an hour they charge a flat $14. If you park on the street, you might receive a simple $30 parking ticket (p = 20%), or a $100 citation for obstruction of traffic (p = 5%), but of course you might get neither. Which should you do?
(Adapted from Paulos 2004 [see “Sources Used” at end of book].)
You have two probability models here, one for the outcomes of parking in a lot, and one for street parking. Begin by putting the two models into tables:


The problem leaves out some things that you can figure for yourself. Remember that every probability model includes all outcomes, and the probabilities add up to 1. If there’s a 25% chance of parking up to an hour, there must be a 100−25 = 75% chance of parking more than an hour. And on the street, if you have a 20+5 = 25% chance of getting some kind of ticket, you have a 100−25 = 75% chance of getting neither. The cost of getting neither ticket is zero.
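Now you can fill in the empty cells in the tables.
Parking lot:
Outcome  Cost, x  P(x)
One hour or less  $10  25%
More than an hour  $14  75%
Total    100% = 1
Street parking:
Outcome  Cost, x  P(x)
Parking ticket  $30  20%
Obstruction citation  $100  5%
Neither  $0  75%
Total    100% = 1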
I showed the total probability to emphasize that it’s 1. Never compute the total of the outcomes (x’s), because that wouldn’t mean anything.
How do these tables help you make up your mind where to park? By themselves, they don’t. But they let you compute μ and σ, and that will help you decide.
I placed the x’s and P(x)’s for the parking lot in L1 and L2, and did 1-Var Stats L1,L2. I placed the x’s and P(x)’s for street parking in L3 and L4 and did 1-Var Stats L3,L4. Here are the results:
Lot: μ = $13.00, σ = $1.73, n = 1
Street: μ = $11.00, σ = $23.64, n = 1
As always, look first at n. If it’s not exactly 1, find your mistake in entering the probabilities.
Now you can interpret these results. Parking in the lot is a bit more expensive in the long run (μ = $13.00 per day versus μ = $11.00 per day). But there are no nasty surprises (σ = $1.73, little variation from day to day). Parking on the street is much riskier (σ = $23.64), meaning that what happens today can be wildly different from what happened yesterday.
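The “long run” reading of μ and σ can be illustrated with a small simulation. The Python sketch below is an illustration only: it assumes the two probability models from this example, and the seed, the number of days, and the helper name simulate are arbitrary choices made for the sketch. Because the draws are random, the averages should land near $13 and $11 rather than exactly on them.

```python
import random
from statistics import mean, pstdev

def simulate(costs, probs, days, rng):
    """Draw one random cost per day from the given probability model."""
    return rng.choices(costs, weights=probs, k=days)

rng = random.Random(2017)   # fixed seed so the run is repeatable
days = 100_000

# Lot: $10 (25%) or $14 (75%). Street: $30 (20%), $100 (5%), $0 (75%).
lot = simulate([10, 14], [0.25, 0.75], days, rng)
street = simulate([30, 100, 0], [0.20, 0.05, 0.75], days, rng)

for name, costs in (("lot", lot), ("street", street)):
    print(f"{name}: average ${mean(costs):.2f}, SD ${pstdev(costs):.2f}")
```

Looking at the first few simulated street-parking days next to the first few lot days makes the difference in risk concrete: the lot costs are always $10 or $14, while the street costs jump between $0, $30, and $100.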
So what should you do? Statistics can give you information, but part of your decision is always your own temperament. If you like stability and predictability — if you are risk averse — you’ll opt for the parking lot. If it’s more important to you to save $2 a day on average, and you can accept occasionally getting hit with a nasty fine, you’ll choose to park on the street.
The fair price of a game is the price that would make the expected value or mean value of the probability distribution equal to zero, the breakeven point.
(“Fair price” is one of those math words that look like English but mean something different. You should expect to pay more than the fair price because the operator of the game — the insurance company or casino or stockbroker — also has to cover selling and administrative expenses.)
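There are two ways to compute the fair price:
Method 1: Multiply each prize by its probability and add up the products.
Method 2: Find the mean μ of the probability distribution of net winnings at the actual price, and add it to the actual price.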
Die shows  x  P(x)
1,2,3,4,5  −$12  5/6
6  $60−12 = $48  1/6
Total    6/6 = 1
Example 8: Take a really simple bar game: a stranger offers to pay you $60 if you roll a 6 with a standard six-sided die, but you have to pay him $12 per roll. Find the fair price of this game.
Method 1: The only prize is $60, and you have a 1/6 chance of winning it. $60×(1/6) = $10.
Method 2: Amounts in L1, probabilities in L2; 1-Var Stats L1,L2. Verify that n=1, and read off the mean of −$2. The actual price is $12, so the fair price is $12 + (−2) = $10.
Naturally, the two methods always give the same answer. Method 2 is easier if you already know the mean of the probability distribution; otherwise Method 1 is easier.
Example 9: A lottery has a $6,000,000 grand prize with probability of winning 1 in 3,000,000. It also has a $10 consolation prize with probability of winning 1 in 1000. What is the fair price of your $5 lottery ticket?
Solution: You don’t need μ, so Method 1 is easier: multiply each prize by its probability and add up the products. $6,000,000×(1/3,000,000) + $10×(1/1000) → fair price is $2.01.
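Both methods are short enough to sketch in Python. The function names below are invented for this illustration, and exact fractions are used only to avoid rounding noise; treat it as one possible way to check Examples 8 and 9, not as a required procedure.

```python
from fractions import Fraction

def fair_price_from_prizes(prizes):
    """Method 1: add up prize * probability over all prizes."""
    return sum(amount * prob for amount, prob in prizes)

def fair_price_from_mean(actual_price, mu):
    """Method 2: actual price plus the mean of the net winnings."""
    return actual_price + mu

# Example 8: a $60 prize with probability 1/6, at $12 per roll.
method1 = fair_price_from_prizes([(60, Fraction(1, 6))])

# Net winnings: -$12 with probability 5/6, +$48 with probability 1/6.
mu = -12 * Fraction(5, 6) + 48 * Fraction(1, 6)
method2 = fair_price_from_mean(12, mu)

# Example 9: the lottery's grand prize and consolation prize.
lottery = [(6_000_000, Fraction(1, 3_000_000)), (10, Fraction(1, 1000))]
lottery_fair = fair_price_from_prizes(lottery)

print(method1, method2, float(lottery_fair))   # 10 10 2.01
```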
Why does a lottery ticket that is worth $2.01 actually cost $5.00? In effect, the lottery is paying out about 2.01/5.00 ≈ 40% of ticket sales in prizes. Some of the 60% that the lottery commission keeps will cover the lottery’s own expenses, and the rest is paid to the state treasury. This is actually fairly typical: most lotteries pay out in prizes less than half of what they take in. By contrast, the illegal “numbers game” pays out about 70%, or at least it did in the 1980s in Cleveland. (Don’t ask me how I know that!)
In the examples so far of probability models, I’ve had to give you a table of probabilities. But there are many subtypes of discrete probability distribution where the probabilities can be calculated by a formula. The rest of this chapter will look at part of one family, discrete probability distributions that come from Bernoulli trials.
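Repeated trials of a process or an event are called Bernoulli trials if they have both of these characteristics:
(1) There are only two possible outcomes on each trial, called success and failure.
(2) The trials are independent: the probability of success is the same on every trial.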
If the probability of success on each trial is p, then the probability of failure on each trial is 1−p, or q for short.
Bernoulli trials are named after Jacob Bernoulli, a Swiss mathematician. He developed the binomial distribution, which you’ll meet later in this chapter.
Example 10: You randomly interview 30 people to find out which party they will vote for in the next election. These are not Bernoulli trials, because there are more than two possible outcomes. (New York State ballots often have six or more parties listed, though some parties just endorse the Republican or Democratic candidate.)
Example 11: On reflection, you realize that you don’t care which party a given voter will choose. All you care about is whether they are voting for your candidate or not, so you randomly select 30 registered voters and ask, “Will you be voting for Abe Snake for President?” (Yes, that’s a real thing.) These are Bernoulli trials, because there are only two answers, and the probability of voting for Abe Snake is the same for each randomly selected person. (p equals the proportion of Abe Snake voters in the population. Remember, proportion of all = probability of one.)
Actually, this overlooks the undecided or “swing” voters. These become fewer as the election gets closer, but in real life they can’t be overlooked because they may be a larger proportion than the leading candidate’s lead.
Example 12: You draw cards from a deck until you get a heart. These are not Bernoulli trials. Although there are only two outcomes, heart and other suit, the probability changes with each draw because you have removed a card from the deck.
Variation: You replace each card and reshuffle the deck before drawing the next card. Then these become Bernoulli trials because the probability of drawing a heart is 25% on every trial.
Variation: You have five decks shuffled together, instead of one 52-card pack. You don’t replace cards after drawing them. You can treat these as Bernoulli trials even without replacement, because you won’t be drawing enough cards to alter the probabilities significantly.
How do I know? Five packs is 260 cards, and 10% of 260 is 26. On the first card, P(heart) = 25%. It’s quite unlikely that you’d have no hearts by the 26th card (0.04% chance), but if you did, the probability of a heart on the 27th card would be: 5×13/(5×52−26) ≈ 27.8%. That’s not much different from the original 25%.
(You don’t have to take my word for these probabilities. Use the sequences method from Chapter 5 to compute them.)
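One way to carry out that check is with exact fractions in Python. This is a sketch of the sequences idea applied to the five-deck example, with variable names invented for the illustration; it multiplies the chance of a non-heart on each successive draw, shrinking the pack by one card each time.

```python
from fractions import Fraction

DECKS = 5
cards = 52 * DECKS     # 260 cards in the combined pack
hearts = 13 * DECKS    # 65 of them are hearts
draws = 26             # 10% of the pack

# Sequence of 26 non-hearts in a row, drawn without replacement.
p_no_hearts = Fraction(1)
for i in range(draws):
    p_no_hearts *= Fraction(cards - hearts - i, cards - i)

# If that happened, all 65 hearts would be among the remaining 234 cards.
p_heart_next = Fraction(hearts, cards - draws)

print(f"P(no hearts in first {draws} cards) = {float(p_no_hearts):.4%}")
print(f"P(heart on card {draws + 1}) = {float(p_heart_next):.1%}")
```

The second figure works out to 65/234, or about 27.8%, and the first comes out as a small fraction of one percent, in line with the numbers quoted above.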
Although this sample without replacement violates independence, it doesn’t violate it by very much, not enough to worry about. This bears out what I said earlier: Trials without replacement can still be treated as independent when the sample is small relative to the population.
Example 13: According to the AVMA (2014) [see “Sources Used” at end of book], 30.4% of US households own one or more cats. Suppose you randomly select some households.
(a) How likely is it that the first time you find cat owners is in the fifth household?
(b) How likely is it that your first cat-owning household will be somewhere in the first five you survey?
Although you could compute these individual probabilities using techniques from Chapter 5, there’s a specific model called the geometric model that makes it a lot easier to compute. Also, using the geometric model you can get an overview of the probabilities for various outcomes, which you’d miss by computing probabilities of specific events using the previous chapter’s techniques. If trials are independent, and you want the probability of a string of failures before your first success, you’re using a geometric model.
The geometric model, also known as the geometric probability distribution, is a kind of discrete probability distribution that applies to Bernoulli trials when you try, and try, and try again until you get a success. P(x) is the probability that your first success will come on your xth attempt, after x−1 failures.
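Expanding on the definition of Bernoulli trials, you can say that a geometric model is one where
(1) each trial has only two possible outcomes, success and failure;
(2) the probability of success, p, is the same on every trial;
(3) you repeat the trials until you get your first success.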
The probability of success on any given trial, p, completely describes a geometric model.
Here’s a picture of part of the geometric model for cat-owning households, with p = 0.304.
How do you read this? The horizontal axis is x, the number of the trial that gives your first success, and the vertical axis is P(x), the probability of that outcome.
For example, there’s a hair over a 30% chance that you’ll find cat owners in your first household, P(1) = 30.4%. There’s about a 21% chance that the first household won’t own cats but the second household will, P(2) ≈ 21%. Skipping a bit to x = 6, there’s just about a 5% chance that the first five households won’t have cats but the sixth will, P(6) ≈ 5%. And so forth.
x = 1 is always the most likely outcome, and larger x values are successively less and less likely. This is true for every geometric distribution, not just this particular one with p = 0.304.
The geometric model never actually ends. The probabilities eventually get too small to show in the picture, but no matter what x you pick, the probability is still greater than 0.
Your TI-83/84 calculator has two menu selections for the geometric model:
geometpdf(p,x) answers the question “what’s the probability that my first success will come at trial number x?”
geometcdf(p,x) answers the question “what’s the probability that my first success will come at or before trial number x?” (The “c” stands for cumulative, because the cdf functions accumulate the probabilities for a range of outcomes.)
They’re both in the [2nd VARS makes DISTR] menu.
(If you have a calculator in the TI-89 family, use the [F5] Distr menu. Select Geometric Pdf and Geometric Cdf.)
Let’s use the calculator to find the answers for Example 13. Here p, the probability of success in any given household, is 30.4% or 0.304.
Part (a) wants the probability of four failures followed by a success on the fifth try. For that you use geometpdf. Press [2nd VARS makes DISTR] [▲] [▲] to get to geometpdf, and press [ENTER].
With the “wizard” interface: Enter p and x. Press [
With the classic interface: After entering p and x, press [)] [ENTER] to get the answer.
geometpdf(.304,5) = .0713362938 → 0.0713 
There’s about a 7% chance you won’t find any cat owners in the first four households but you will in the fifth household.
(You could calculate this the long way. The probability of four failures followed by a success is (1−.304)^{4}×.304. But the geometric model is easier. That’s the point of a model: one general rule works well enough for all cases, so you don’t have to treat each situation as a special case with its own unique methods.)
Part (b) wants the probability of a success occurring anywhere in the first five trials. This is a geometcdf problem. Press [2nd VARS makes DISTR] [▲] to get to geometcdf, and press [ENTER].
With the “wizard” interface: Enter p and x. Press [
With the classic interface: After entering p and x, press [)] [ENTER] to get the answer.
geometcdf(.304,5) = .8366774327 → 0.8367 
There’s almost an 84% chance you will find at least one cat-owning household among the first five.
(Doing this the long way, you would use the complement. The complement of “at least one cat-owning household in the first five” is “no cat-owning households in the first five”. The probability that a given household doesn’t own a cat is q = 1−.304 = 0.696, and the probability that five in a row don’t own cats is 0.696^{5}. Therefore the original probability you wanted is 1−(.696^{5}) = 0.8367.)
You don’t actually need formulas for the geometric model, but if you’re curious about what your calculator is doing, here they are:
geometpdf(p,x) = q^{x−1}·p
geometcdf(p,x) = 1 − q^{x}
where q = 1−p as usual. You can see that the two “long way” paragraphs above actually used those formulas.
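Those two formulas translate directly into code. The Python functions below borrow the calculator’s names purely for convenience; they are stand-ins written for this sketch, not the TI implementation.

```python
def geometpdf(p, x):
    """P(first success comes on trial x): x-1 failures, then a success."""
    q = 1 - p
    return q ** (x - 1) * p

def geometcdf(p, x):
    """P(first success comes on or before trial x)."""
    q = 1 - p
    return 1 - q ** x

p = 0.304                           # cat-owning households
print(round(geometpdf(p, 5), 4))    # Example 13(a)
print(round(geometcdf(p, 5), 4))    # Example 13(b)

# The cdf is the running total of the pdf values for trials 1 through 5.
running_total = sum(geometpdf(p, k) for k in range(1, 6))
print(abs(running_total - geometcdf(p, 5)) < 1e-12)
```

The first two printed values should agree with the calculator results for Example 13, 0.0713 and 0.8367.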
The geometric distribution is completely specified by p, so you can compute the mean and standard deviation quite easily:
μ = 1/p
σ = μ√q = (1/p)·√(1−p)
Example 14: 30.4% of US households own cats. How many households do you expect you’ll need to visit to find a cat-owning household?
Solution: The expected value of a distribution is the mean. μ = 1/p = 1/.304 = 3.289473684. μ = 3.3. Interpretation: On average, you expect to have to visit between 3 and 4 households to find the first cat owners.
Caution! The expected value (mean) is not the most likely value (mode). Take a look back at the histogram, and you’ll see that the most likely value is 1: you’re more likely to get lucky on the first trial than on any other specific trial. But the distribution is highly skewed right, so the average gets pulled toward the higher numbers.
To compute the SD, just multiply the mean by √q. A handy technique is called chaining calculations. After first calculating the mean, press the [×] key, and the calculator knows you are multiplying the previous answer by something. The result is σ = 2.7.
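The same two steps, mean first and then mean times √q, can be sketched in Python. The function name geometric_mean_sd is invented for this illustration.

```python
from math import sqrt

def geometric_mean_sd(p):
    """Return (mu, sigma) for a geometric model with success probability p."""
    q = 1 - p
    mu = 1 / p
    sigma = mu * sqrt(q)   # "chain" from the mean: multiply it by sqrt(q)
    return mu, sigma

mu, sigma = geometric_mean_sd(0.304)
print(f"mu = {mu:.1f}, sigma = {sigma:.1f}")   # mu = 3.3, sigma = 2.7
```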
Interpreting σ is a bit harder. The geometric distribution is a type of discrete probability distribution, so you interpret its standard deviation the same way as for any other DPD. In this particular example, σ is almost as large as μ, so you expect a lot of variability. If you and a lot of coworkers go out independently looking for households with cats, the group average number of visits will be 3.3 households, but there will be a lot of variability between different workers’ experience. You can’t use the Empirical Rule here because the geometric model is not a bell curve, but you can at least say you won’t be surprised to find workers who get lucky on the first house (μ−σ ≈ 0.5), and workers who have to visit six houses or more (μ+σ ≈ 6.0).
Some people find it very hard to make choices because they feel they must consider all the pros and cons of every possibility. Others look at possibilities one at a time and take the first one that’s acceptable. Studies such as The Tyranny of Choice (Roets, Schwartz, Guan 2012 [see “Sources Used” at end of book]) show that the first group may make better choices objectively, but the second group is happier with the items they choose.
Example 15: You have to buy a new sofa. You’d be content with 55% of the sofas out there. Let’s assume that your Web search presents sofas in an order that has nothing to do with your preferences. There are hundreds to choose from, so you decide to adopt the “first one that’s acceptable” strategy. How likely is it that you’d order the third sofa you’d see?
Solution: This is a geometric model, with two failures followed by one success. p = 55%. geometpdf(.55,3) = .111375. There’s about an 11% chance you’d order the third sofa.
Example 16: Larry’s batting average is .260. During which time at bat would he expect to get his first hit of the game? How likely is he to get his first hit within his first four times at bat?
Solution: This is a geometric model with p = 0.260. The mean or expected value is 1/p = 1/.26 = 3.85, about 4. On average, his first hit each game will come on his fourth time at bat.
For the second question, geometcdf(.26,4) = .70013424; there’s about a 70% chance he’ll get his first hit within his first four times at bat.
In the previous section, we looked at the geometric model, where you just keep trying until you get a success. In this section, we’ll look at the binomial model, where you have a fixed number of trials and a varying number of successes.
The binomial model, also known as the binomial probability distribution or BPD, is a kind of discrete probability distribution that applies to Bernoulli trials when you have a fixed number of trials, n.
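Expanding on the definition of Bernoulli trials, you can say that a binomial model is one where
(1) there is a fixed number of trials, n;
(2) each trial has only two possible outcomes, success and failure;
(3) the probability of success, p, is the same on every trial.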
Example 17: Cats again! 30.4% of US households own one or more cats. You visit five households, selected randomly.
(a) What’s the chance that no more than two have cats?
(b) What’s the chance that exactly two have cats?
(c) What’s the chance that at least two have cats?
(d) What’s the chance that two to four have cats?
This problem fits the binomial model: n = 5 trials, each household does or does not have cats, and the probability p = 30.4% is the same for each household.
A picture of this binomial distribution is shown at right, and you can see some differences from the picture of the geometric distribution:
How do you read the picture? There’s about a 16% probability that none of the five households will have cats, about 36% that one of the five will have cats, and so on. (Why 36% and not 30.4%? Because there’s a greater chance of “winning” one out of five than one out of one.)
In this book we’re more concerned with computing probabilities, but it can be nice to get an overall picture of a distribution. I made this particular graph by using @RISK from Palisade Corporation, but you can also make histograms of binomial distributions by using MATH200A Program part 1(5).
Here you have a choice. Your TI-83/84 calculator comes with two menu selections for the binomial model, but the MATH200A program gives you a simpler interface. Here’s a quick overview of both, before we start on computations:
With the MATH200A program (recommended):
MATH200A Program part 3 gives you one interface for all binomial probability calculations. The program might already be on your calculator from Chapter 3 boxplots, but if it’s not, see Getting the Program for instructions. To find binomial probability with the program, press [
That puts the program name on your home screen. Press [

If you’re not using the program:
These are both in the [
(Got a TI-89 family calculator? Use the [F5] Distr menu. Select Binomial Pdf or Binomial Cdf. The Cdf function can handle any range of successes, not just 0 to x. See Binomial Probability Distribution on TI-89 for full instructions.)
Now let’s use your TI-83/84 to answer the questions in Example 17. You have five trials, so n = 5. The probability of success on any given household is 30.4%, so p = 0.304.
(a) What’s the probability that no more than two of the five randomly selected households have cats?
With the MATH200A program (recommended):
Press [
Enter n and p. “No more than two cats” is from 0 to 2 cats, so enter those values when prompted. The program echoes back your inputs and shows the computed probability. To show your work, write down the screen name, the inputs, and the result.
Conclusion: P(x ≤ 2) or P(0 ≤ x ≤ 2) = 0.8316.

If you’re not using the program:
The probability that no more than two of your five households have cats (in other words, the probability that 0 to 2 have cats) is
If you don’t have the “wizard” interface, or you have it turned off,
If you have the “wizard” interface, you get a menu screen, but you enter the same information. Press [
Either way, write down the
Conclusion: P(x ≤ 2) or P(0 ≤ x ≤ 2) = 0.8316.
(b) What’s the probability that exactly two of five randomly selected households are cat owners?
With the MATH200A program (recommended):
You need a specific number of successes, instead of a range. It’s almost exactly the same deal: you just enter the same number for
Conclusion: P(x = 2) or P(2) = 0.3116.

If you’re not using the program:
The probability of exactly two cat-owner households in five is
(The “wizard” interface screen is the same as it was for
Conclusion: P(x = 2) or P(2) = 0.3116.
(c) What’s the probability that at least two of the five randomly selected households have cats?
With the MATH200A program (recommended):
“At least two”, in a sample of five, means from two to five successes. Enter those values in MATH200A part 3. Here’s the results screen:
Conclusion: P(x ≥ 2) or P(2 ≤ x ≤ 5) = 0.4800.

If you’re not using the program:
This one is a little trickier. You could find P(2), P(3), P(4), and P(5) and add them up by hand, but that’s tedious and error prone, and it can introduce rounding errors. Instead, you’ll make the calculator add them up for you.
First, get all the probabilities for 0 through n successes into a statistics list. To do this, use
After the closing paren, don’t press [
Now you need to sum the desired range of cells. You want 2 ≤ x ≤ 5. But the lowest possible x is 0, and the cells in statistics lists are numbered starting at 1. So to get x from 2 through 5, you need cells 3 through 6. When summing part of a list, add 1 to your desired x values. Press [
Your answer: P(x ≥ 2) or P(2 ≤ x ≤ 5) = 0.4800.
Beware of off-by-one errors when you solve problems with phrases like at least and no more than. Always test the “edge conditions”. “Okay, I need at least 2, and that’s 2 through 5, not 3 through 5. Oh yeah, add 1 for the statistics list in the TI-83, so I’m summing cells 3 through 6, not 2 through 5.”
Alternative solution: Do you remember solving “at least” problems in Chapter 5? What was the lesson there? With laborious probability problems, the complement is your friend. What’s the complement of “at least two”? It’s “fewer than two”, which is the same as “no more than one”. Shaky on the logic of complements? Use the enumeration method from Chapter 5: 0 1 2 3 4 5 or 0 1 | 2 3 4 5. Find the probability of ≤1 household with cats, and subtract from 1:
P(x ≥ 2) = 1 − P(x ≤ 1)
P(x ≥ 2) = 1 − binomcdf(5, .304, 1)
P(x ≥ 2) = .4799959639 → 0.4800
(d) What’s the chance for two to four cat-owning households in your random sample of five households?
With the MATH200A program (recommended):
Nothing new here: just use good old MATH200A part 3. Here’s the results screen:
P(2 ≤ x ≤ 4) = 0.4774.

If you’re not using the program:
You need x from 2 through 4, but remember you always add 1 when summing binomial probabilities from a statistics list, so you put 3 to 5 in your
P(2 ≤ x ≤ 4) = 0.4774.
Alternative solution: You can also do it without summing. If you think about it, the probability for x from 2 to 4 is the probability for x from 0 to 4, with x below 2 (x no more than 1) removed:
P(0 ≤ x ≤ 1) + P(2 ≤ x ≤ 4) = P(0 ≤ x ≤ 4)
and by subtracting that first term you get
P(2 ≤ x ≤ 4) = P(0 ≤ x ≤ 4) − P(0 ≤ x ≤ 1)
Your probability is the result of subtracting two cumulative probabilities, the cdf from 0 to 4 minus the cdf from 0 to 1. This is tricky, I admit. You have to set that x value correctly in the second
You don’t actually need a formula for the binomial model, but if you’re curious about what your calculator is doing, here it is:
binompdf(n,p,x) = _{n}C_{x} · p^{x} · q^{n−x}
Why? p^{x} is the probability of getting successes on all of the first x trials. q is the probability of failure on one trial, and therefore q^{n−x} is the probability of failure on the remaining trials, after the x out of n successes. But in a binomial probability model, you care how many successes and failures there are, not in what order they occur. To account for the fact that order doesn’t matter, the formula has to multiply by _{n}C_{x}, “the number of ways to choose x objects out of n”. (If you want to know more about _{n}C_{x}, search “combinations” at your favorite math site.)
Unlike the geometric case, there’s no simple formula for binomcdf. Your calculator just has to compute probabilities for x = 0, 1, and so on and add them up.
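As a rough illustration of what that adding-up looks like, here is a Python sketch of the binomial formula and a range sum, applied to the four parts of Example 17. The name binompdf is borrowed from the calculator for convenience and binom_range is invented for this sketch; neither is the TI implementation. Note that Python’s range needs hi + 1 to include the upper end, which is the same kind of off-by-one trap as the statistics-list cells.

```python
from math import comb

def binompdf(n, p, x):
    """P(exactly x successes in n trials)."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

def binom_range(n, p, lo, hi):
    """P(lo <= x <= hi), found by adding up the individual probabilities."""
    return sum(binompdf(n, p, x) for x in range(lo, hi + 1))

n, p = 5, 0.304
print(f"(a) no more than two: {binom_range(n, p, 0, 2):.4f}")
print(f"(b) exactly two:      {binompdf(n, p, 2):.4f}")
print(f"(c) at least two:     {binom_range(n, p, 2, 5):.4f}")
print(f"(d) two to four:      {binom_range(n, p, 2, 4):.4f}")

# The complement gives the same answer for part (c).
at_least_two = 1 - binom_range(n, p, 0, 1)
```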
Example 18: Larry’s batting average is .260. How likely is it that he’ll get more than one hit in four times at bat?
Solution: This is a binomial model with n = 4, p = 0.26, x = 2 to 4. You can use MATH200A part 3 or the binompdf-sum technique to get .27870128. P(x > 1) = 0.2787 or about 28%. (The program is completely straightforward, so I’m showing only the tricky binompdf-sum sequence here.)
Alternative solution: If you don’t have the program, can you see how to use the complement to solve this problem more easily? Check your answer against mine to be sure that your method is correct.
The binomial distribution depends on the proportion in the population (p) and your sample size (n). You can compute the mean and SD quite easily:
μ = np
σ = √[npq]
What are the mean and SD of the number of cat-owning households in a random sample of five households?
μ = np = 5 × 0.304 = 1.52
σ = √[npq] = √[5 × .304 × (1−.304)] = 1.028552381
Conclusion: μ = 1.5 and σ = 1.0.
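To see that these shortcuts agree with the general formulas for any discrete probability distribution, μ = ∑ x·P(x) and σ = √[∑ x²·P(x) − μ²], here is a short Python check. It is a sketch for this one example, with variable names chosen for the illustration.

```python
from math import comb, sqrt

n, p = 5, 0.304
q = 1 - p

# Shortcut formulas for the binomial model.
mu = n * p
sigma = sqrt(n * p * q)

# The long way: build the whole distribution, then apply the DPD formulas.
probs = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
mu_long = sum(x * px for x, px in enumerate(probs))
sigma_long = sqrt(sum(x * x * px for x, px in enumerate(probs)) - mu_long ** 2)

print(f"shortcut: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"long way: mu = {mu_long:.2f}, sigma = {sigma_long:.2f}")
```

Both lines should show a mean of 1.52 and an SD of about 1.03.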
Interpretation: in a sample of five households, the expected number of cat-owning households is 1.5. Or, if you take a whole lot of samples of five households, on average you will find that 1.5 households per sample own cats. The SD is 1.0. You can’t use the Empirical Rule, but you can say that you expect most of the samples of five to contain μ±2σ = 1.5±2×1.0 = 0 to 3 cat-owning households.
Example 19: 30.4% of US households own one or more cats. You visit ten random households and seven of them own cats. Are you surprised at this result?
A result is surprising or unusual or unexpected if it has low probability, given what you think you know about the population in question. The threshold for “low probability” can vary in different problems, but a typical choice is 5%.
When we ask whether a result is surprising (unusual, unexpected), we are really talking about that result or one even further from the expected value.
You think you know that 30.4% of US households own cats. A sample of ten doesn’t seem very large; how do you decide whether seven successes seems reasonable or unreasonable?
First, what’s the expected value? That’s μ = np = 10×.304 = 3.04.
Next, what does “that result or one further from the expected value” mean? The expected value is 3.04, seven is greater than 3.04, so we’re talking about seven or more successes, x = 7 to 10.
Find the probability of that result or one even further from the expected value.
That’s easiest with MATH200A part 3: set n=10, p=.304, x=7 to 10. You can also do it with binomcdf: among the possible values 0 1 2 3 4 5 6 7 8 9 10, seven or more successes is the complement of zero to six successes, so you compute 1−binomcdf(10,.304,6).
Either way, the probability is 0.0115 or just over 1%.
Draw your conclusion. If 30.4% of US households own cats, finding seven or more cat houses in a random sample of ten households is unusual (surprising, unexpected).
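Here is a minimal Python sketch of Example 19, computing the probability both directly (x = 7 to 10) and through the complement (1 minus x = 0 to 6). It is an illustration only. The helper name binom_pmf and the variable names are made up for this sketch, and the 5% threshold is the typical choice mentioned above, not a fixed rule.

```python
from math import comb

n, p = 10, 0.304        # ten households, 30.4% own cats

def binom_pmf(x):
    """P(exactly x cat-owning households out of n)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

direct = sum(binom_pmf(x) for x in range(7, n + 1))      # x = 7 to 10
complement = 1 - sum(binom_pmf(x) for x in range(0, 7))  # 1 - P(0 to 6)

print(round(direct, 4), round(complement, 4))   # 0.0115 0.0115
print(direct < 0.05)                            # True: below the 5% threshold
```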
That was a trivial example. But in real life, when a result is unexpected it can cast doubt on what you’ve been told. Here’s an example.
Example 20: In Talladega County, Alabama, in 1962, an African American man named Robert Swain was accused of rape. 26% of eligible jurors in the county were African American, but the 100-man jury panel for Swain’s trial included only 8 African Americans. (Through exemptions and peremptory challenges, all were excluded from the final 12-man jury.) Swain was convicted and sentenced to death.
Swain’s lawyer appealed, on grounds of racial bias in jury selection. The Supreme Court ruled in 1965 that “The overall percentage disparity has been small and reflects no studied attempt to include or exclude a specified number of blacks.”
—Adapted from Michailides [see “Sources Used” at end of book]
What do you think of that ruling? If 100 men in the county were randomly selected, is eight out of 100 in the jury pool unexpected (unusual, surprising)?
Solution: This is a binomial model: every man in the county either is or is not African American, the sample size is a fixed 100, and in a random sample there’s the same 26% chance that any given man is African American.
To determine whether eight in 100 is unexpected, ask what is expected. For binomial data, μ = np = 100×.26 = 26; in a sample of 100, you expect 26 African Americans.
Okay, 26 is expected, 8 is less than 26, “further away from expected” is less than 8, so you compute the probability for x = 0 to 8.
Use binomcdf(100,.26,8) or MATH200A part 3. Either way you get a probability of 4.734795002E−6, or about 0.000 005, five chances in a million. That is unexpected. It’s so unlikely that we have to question the county’s claim that the selection was random.
Unfortunately, Mr. Swain’s lawyer didn’t consult a statistician.
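For readers without a TI calculator, the jury-panel probability can be sketched in Python by adding up the binomial probabilities for x = 0 to 8. This is an illustrative check under the same assumptions as the solution (random selection, p = 0.26 for every man), not the calculator's own algorithm. The variable names are made up for the sketch.

```python
from math import comb

n, p = 100, 0.26        # 100-man panel, 26% of eligible jurors African American
q = 1 - p

# P(8 or fewer African Americans on the panel): add up x = 0 through 8
prob = sum(comb(n, x) * p**x * q**(n - x) for x in range(0, 9))

print(f"{prob:.3e}")    # 4.735e-06, about five chances in a million
```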
(The online book has live links to all of these.)
You can compute binomial probabilities with binomcdf/binompdf or MATH200A, but MATH200A is less work.
Write out your solutions to these exercises, showing your work for all computations. Then check your solutions against the solutions page and get help with anything you don’t understand.
Caution! If you don’t see how to start a problem, don’t peek at the solution — you won’t learn anything that way. Ask your instructor or a tutor for a hint. Or just leave it and go on to a different problem for now. You may find when you return to that “impossible” problem that you see how to do it after all.
Velma the Vampire will drink anything, but she prefers O
negative. She doesn’t know a victim’s blood type until she
tastes it.
(a) How many does she expect to drain before she gets some O
negative?
(b) How likely is it that she’ll find her first O negative
within her first ten victims?
(c) How likely is it that exactly two of her first ten victims will
be O negative?
x       0       1       2       3       4       5
P(x)    0.0778  0.2591  0.3456  0.2305  0.0768  0.0102
(a) Find and interpret the mean and standard deviation of this
probability model.
(b) For an extra challenge, can you use your answer from
part (a) to construct a simpler probability model for five flips
of this coin?
Updates and new info: https://BrownMath.com/swt/