The Negative Binomial Random Variable

Program files: NEGBINOM.83p, NEGBINOM.86p, negbinom.89p

The negative binomial random variable, denoted by X ~ nb(r, p), is a generalization of the geometric random variable. Suppose you have probability p of succeeding on any one try. If you make independent attempts over and over, then X counts the number of attempts needed to obtain the rth success, for some designated r >= 1. When r = 1, X ~ geo(p).

We often let q = 1 - p be the probability of failure on any one attempt. Then the probability of having the rth success on the kth attempt, for k >= r, is given by

P(X = k) = C(k - 1, r - 1) * q^(k - r) * p^r .
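As a rough illustration, the pdf formula above can be written as a short Python function. This is only a sketch for checking values away from the calculator; the name nb_pmf is made up here and is not part of the NEGBINOM program.

```python
from math import comb

def nb_pmf(k, r, p):
    """P(X = k): probability that the r-th success occurs on attempt k."""
    if k < r:
        return 0.0  # the r-th success cannot occur before attempt r
    q = 1 - p       # probability of failure on any one attempt
    return comb(k - 1, r - 1) * q ** (k - r) * p ** r

# With r = 1 the formula reduces to the geometric pdf q^(k - 1) * p.
print(nb_pmf(4, 1, 0.5))  # 0.5^3 * 0.5 = 0.0625
```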

The average (or mean) number of attempts needed to succeed r times is given by

E[X] = r / p.

The variance of the number of attempts needed to succeed r times is given by

Var(X) = r*(1 - p) / p^2.
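One way to sanity-check the mean and variance formulas is to sum the series numerically. The sketch below assumes r = 3 and p = 1/6 and truncates the infinite sum at k = 2000, which is an arbitrary cutoff chosen so that the neglected tail is negligible for these values.

```python
from math import comb

def nb_pmf(k, r, p):
    """P(X = k) for k >= r."""
    return comb(k - 1, r - 1) * (1 - p) ** (k - r) * p ** r

r, p = 3, 1 / 6
ks = range(r, 2001)  # truncated range of attempts (assumed cutoff)

mean = sum(k * nb_pmf(k, r, p) for k in ks)
var = sum((k - mean) ** 2 * nb_pmf(k, r, p) for k in ks)

# Compare the numerical sums with the closed-form expressions.
print(mean, r / p)
print(var, r * (1 - p) / p ** 2)
```

The two numbers on each printed line should agree to many decimal places.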

There is no simple closed-form formula for the cumulative distribution function (or cdf) P(X <= k); it is computed by summing the pdf values P(X = i) for r <= i <= k.
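In practice the cdf is obtained by adding up pdf values. The sketch below does that, and also cross-checks the result against a binomial identity that is not stated in the text above: the rth success occurs by attempt k exactly when the first k attempts contain at least r successes. Both function names are hypothetical.

```python
from math import comb

def nb_cdf_sum(k, r, p):
    """P(X <= k) by summing the pdf from i = r to i = k."""
    q = 1 - p
    return sum(comb(i - 1, r - 1) * q ** (i - r) * p ** r
               for i in range(r, k + 1))

def nb_cdf_binomial(k, r, p):
    """P(X <= k) as the chance of at least r successes in k attempts."""
    q = 1 - p
    fewer_than_r = sum(comb(k, i) * p ** i * q ** (k - i) for i in range(r))
    return 1 - fewer_than_r

# The two approaches should agree up to rounding error.
print(nb_cdf_sum(11, 3, 1 / 6), nb_cdf_binomial(11, 3, 1 / 6))
```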

Using the NEGBINOM Program

The NEGBINOM program can be used to compute probabilities such as P(j <= X <= k), P(X = k), and P(X <= k). To execute the program, we enter the values of p and r, followed by the lower bound j and the upper bound k. (Enter the same value k for both the lower and upper bound to compute a pdf value P(X = k).) The program also asks if you want a complete distribution to be entered into the STAT Edit screen. If so, then enter 1. If not, then enter 0. The program then displays P(j <= X <= k) along with the average value and standard deviation.

If you enter 1, then most of the distribution will be entered into the STAT Edit screen. Under L1, the possible numbers of attempts r, . . . , n are listed. Under L2, the pdf values P(X = k), for r <= k <= n, are listed. Under L3, the cdf values P(X <= k), for r <= k <= n, are listed. The upper bound n is chosen to equal twice the average value.
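The three lists can be imitated in Python as follows. This is a sketch of the layout described above, not the calculator program itself; the function name nb_table is hypothetical, and rounding 2r/p to the nearest integer is an assumption for cases where twice the average is not a whole number.

```python
from itertools import accumulate
from math import comb

def nb_table(r, p):
    """Return (L1, L2, L3): attempts k, pdf P(X = k), and cdf P(X <= k)."""
    n = round(2 * r / p)  # assumed rounding of "twice the average value"
    L1 = list(range(r, n + 1))
    L2 = [comb(k - 1, r - 1) * (1 - p) ** (k - r) * p ** r for k in L1]
    L3 = list(accumulate(L2))  # running sums of the pdf give the cdf
    return L1, L2, L3

L1, L2, L3 = nb_table(3, 1 / 6)
for k, pdf, cdf in list(zip(L1, L2, L3))[:5]:
    print(k, round(pdf, 5), round(cdf, 5))
```

Because the lists stop at n, the last cdf entry is less than 1, which matches the remark that only most of the distribution is listed.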

Click here for info on the TI-86 and TI-89 Stat Edit displays.

Example. Suppose one die is rolled over and over until a 6 is rolled three times. What is the probability that it takes

(a) from 10 to 20 rolls?
(b) exactly 18 rolls?
(c) at least 12 rolls?
(d) What is the most likely number of rolls needed to roll a 6 three times?

Solution. After calling up the NEGBINOM program, enter 1 / 6 for PROBABILITY, enter 3 for NO. OF OCCURRENCES, enter 10 for LOWER BOUND, and enter 20 for UPPER BOUND. Also enter 1 for a complete distribution.

(a) We find that P(10 <= X <= 20) = 0.49308, with an average of 18 attempts needed and a standard deviation of about 9.4868.

(b) To find P(X = 18), either run the program again with 18 for both LOWER BOUND and UPPER BOUND, or simply look up the value in the STAT Edit screen under list L2 (yStat on the TI-86, or c2 in APPS, 6, 1 on the TI-89). We find that P(X = 18) = 0.04087.

(c) Using the cdf under list L3 (fStat on the TI-86, or c3 in APPS, 6, 1 on the TI-89), we obtain P(X >= 12) = 1 - P(X <= 11) = 1 - 0.273225 = 0.726775.

(d) From the complete distribution listings, we see that the distribution is bi-modal. The most likely number of rolls needed to roll a 6 three times is either 12 or 13; each occurs with probability 0.0493489.
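For readers without the calculator at hand, the four answers can be re-derived with a few lines of Python. This is a minimal sketch with hypothetical helper names; the mode search is limited to k <= 2r/p, and ties are detected with a small relative tolerance because the two peak values are equal only up to floating-point rounding.

```python
from math import comb, isclose

def nb_pmf(k, r, p):
    """P(X = k) for X ~ nb(r, p); zero when k < r."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * (1 - p) ** (k - r) * p ** r

def nb_interval(j, k, r, p):
    """P(j <= X <= k), obtained by summing pdf values."""
    return sum(nb_pmf(i, r, p) for i in range(max(j, r), k + 1))

def nb_modes(r, p):
    """All k whose pdf value ties for the maximum (searched up to 2r/p)."""
    ks = range(r, round(2 * r / p) + 1)
    peak = max(nb_pmf(k, r, p) for k in ks)
    return [k for k in ks if isclose(nb_pmf(k, r, p), peak, rel_tol=1e-9)]

r, p = 3, 1 / 6
part_a = nb_interval(10, 20, r, p)     # from 10 to 20 rolls
part_b = nb_pmf(18, r, p)              # exactly 18 rolls
part_c = 1 - nb_interval(r, 11, r, p)  # at least 12 rolls
part_d = nb_modes(r, p)                # most likely number(s) of rolls

print(part_a, part_b, part_c, part_d)
```

The printed values should agree with parts (a) through (d) to the number of digits quoted.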

Exercise

Suppose you pull a card from a shuffled deck (with replacement) over and over until a Heart is drawn five times. What is the probability that it will take

(a) at most 20 draws?
(b) at least 10 draws?
(c) What is the median number of draws needed and the most likely number of draws needed?

Solution

Here, X ~ nb(5, 1 / 4). Since there must be at least 5 attempts to succeed 5 times, we first wish to find P(5 <= X <= 20). In the NEGBINOM program, enter 1 / 4 for PROBABILITY, enter 5 for NO. OF OCCURRENCES, enter 5 for LOWER BOUND, and enter 20 for UPPER BOUND. Also enter 1 for COMPLETE DIST.

(a) We see that P(5 <= X <= 20) = 0.58516 with an average of 20 attempts needed and a standard deviation of about 7.746.

(b) We wish to find P(X >= 10) = 1 - P(5 <= X <= 9). From the cdf listing in the complete distribution, we find that P(5 <= X <= 9) = 0.04893; thus, the probability that it will take at least 10 draws is P(X >= 10) = 1 - 0.04893 = 0.95107.

(c) Also from the cdf listing, we see that P(X <= 19) = 0.53458, which makes 19 the median because it is the first value in the range where the cdf exceeds 0.5. From the pdf listing, we see that this distribution is bi-modal with P(X = 16) = P(X = 17) = 0.0563.
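The median lookup can also be expressed as a short loop that accumulates pdf values until the cdf reaches 0.5. The sketch below uses hypothetical function names and takes the median to be the smallest k with P(X <= k) >= 0.5; it then prints the pdf near the peak to show the tie between 16 and 17.

```python
from math import comb

def nb_pmf(k, r, p):
    """P(X = k) for k >= r."""
    return comb(k - 1, r - 1) * (1 - p) ** (k - r) * p ** r

def nb_median(r, p):
    """Smallest k for which the cdf P(X <= k) is at least 0.5."""
    k = r
    cdf = nb_pmf(r, r, p)
    while cdf < 0.5:
        k += 1
        cdf += nb_pmf(k, r, p)
    return k

r, p = 5, 1 / 4
cdf_20 = sum(nb_pmf(k, r, p) for k in range(r, 21))         # part (a)
tail_10 = 1 - sum(nb_pmf(k, r, p) for k in range(r, 10))    # part (b)
median = nb_median(r, p)                                    # part (c)

print(cdf_20, tail_10, median)
for k in range(15, 19):
    print(k, nb_pmf(k, r, p))  # pdf values around the two modes
```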