The Hypergeometric Random Variable

Program files: HYPGEOM.83p, HYPGEOM.86p, hypgeom.89p

Suppose we have a population of N objects which are divided into two types: Type A and Type B. There are n objects of Type A and N - n objects of Type B. For example, a standard deck of 52 playing cards can be divided in many ways. Type A could be "Hearts" and Type B could be "All Others." Then there are 13 Hearts and 39 others in this population of 52 cards.

Suppose a random sample of size r is taken (without replacement) from the entire population of N objects. The Hypergeometric Random Variable X, denoted X ~ Hyp(N, n, r), counts the total number of objects of Type A in the sample.

If r <= n, then there could be at most r objects of Type A in the sample. If r > n, then there can be at most n objects of Type A in the sample. Thus, the value min(r, n) is the maximum possible number of objects of Type A in the sample.

On the other hand, if r <= N - n, then all objects chosen may be of Type B. But if r > N - n, then there must be at least r - (N - n) objects of Type A chosen. Thus, the value max(0, r - (N-n)) is the least possible number of objects of Type A in the sample.
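The two bounds above can be checked with a few lines of Python. This is only an illustrative sketch: Python is not part of the HYPGEOM program, and the function name support_bounds is a hypothetical helper introduced here.

```python
def support_bounds(N, n, r):
    """Return (lowest, highest) possible number of Type A objects in the sample."""
    lowest = max(0, r - (N - n))   # Type A picks forced once Type B runs out
    highest = min(r, n)            # capped by the sample size and the Type A supply
    return lowest, highest

print(support_bounds(52, 13, 5))    # 5 cards, 13 Hearts        -> (0, 5)
print(support_bounds(22, 10, 15))   # 15 chosen, 10 of Type A   -> (3, 10)
```

In the second call there are only 12 objects of Type B, so at least 3 of the 15 chosen must be of Type A.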

The probability of having exactly k objects of Type A, for max(0, r - (N-n)) <= k <= min(r, n) is

P(X = k) = C(n, k)*C(N-n, r-k) / C(N, r) .
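As a rough cross-check of this formula away from the calculator, the following Python sketch evaluates it with math.comb from the standard library. The function name hyp_pmf is an assumption made for illustration; it is not the HYPGEOM program itself.

```python
from math import comb

def hyp_pmf(k, N, n, r):
    """P(X = k) for X ~ Hyp(N, n, r); returns 0 outside the possible range."""
    if k < max(0, r - (N - n)) or k > min(r, n):
        return 0.0
    return comb(n, k) * comb(N - n, r - k) / comb(N, r)

# Exactly 3 Hearts in a 5-card hand from a 52-card deck with 13 Hearts
print(round(hyp_pmf(3, 52, 13, 5), 5))   # 0.08154
```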

The average (or mean) number of objects of Type A in the sample of size r is given by

E[X] = r*(n / N).

The variance of the number of objects of Type A in the sample is given by

Var(X) = r * (n/N) * ((N - n)/N) * ((N - r)/(N - 1)).
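The mean and variance formulas translate directly into code. The sketch below, with the hypothetical helper name hyp_mean_sd, returns the mean and the standard deviation (the square root of the variance), which are the two summary values the program is described as displaying.

```python
from math import sqrt

def hyp_mean_sd(N, n, r):
    """Mean and standard deviation of X ~ Hyp(N, n, r)."""
    mean = r * n / N
    variance = r * (n / N) * ((N - n) / N) * ((N - r) / (N - 1))
    return mean, sqrt(variance)

# 5 cards drawn from 52, of which 13 are Hearts
mean, sd = hyp_mean_sd(52, 13, 5)
print(mean, round(sd, 4))   # 1.25 0.9295
```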

There are no closed-form formulas for the cumulative probability P(X <= k) or for probabilities such as P(j <= X <= k). These are found by summing the values P(X = i) over the relevant range of i.
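Since no closed form is available, a range probability is obtained by adding up individual terms. A minimal Python sketch of that summation follows; the name hyp_interval is hypothetical, and the bounds are clipped to the possible range so that impossible values contribute nothing.

```python
from math import comb

def hyp_interval(j, k, N, n, r):
    """P(j <= X <= k) for X ~ Hyp(N, n, r), by direct summation."""
    lo = max(j, 0, r - (N - n))   # clip to the least possible value
    hi = min(k, r, n)             # clip to the greatest possible value
    total = sum(comb(n, i) * comb(N - n, r - i) for i in range(lo, hi + 1))
    return total / comb(N, r)

# At least 3 Hearts in a 5-card hand: P(3 <= X <= 5)
print(round(hyp_interval(3, 5, 52, 13, 5), 7))   # 0.0927671
```

Passing the same value for j and k gives the single probability P(X = k), and passing the least possible value for j gives the cumulative probability P(X <= k).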

Using the HYPGEOM Program

The HYPGEOM program can be used to compute probabilities such as P(j <= X <= k), P(X = k), and P(X <= k). To execute the program, we enter the values of N, n, and r, followed by the lower bound j and the upper bound k. (Enter the same value k for both the lower and upper bound to compute a pdf value P(X = k).) The program also asks if you want a complete distribution to be entered into the STAT Edit screen. If so, then enter 1. If not, then enter 0. The program then displays P(j <= X <= k) along with the average value and standard deviation.

If you enter 1, then the entire distribution will be entered into the STAT Edit screen. Under L1, the possible numbers of objects of Type A are listed. Adjacent to these, under L2, the values of P(X = k) are listed. Under L3, the cumulative probabilities P(X <= k) are listed.
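For readers without the calculator at hand, the three lists can be imitated in Python. This is a sketch under the assumption that the lists hold the possible values, the probabilities P(X = k), and the running totals P(X <= k), as described above; the function name hyp_table is hypothetical, and the variable names L1, L2, L3 simply mirror the calculator's list names.

```python
from itertools import accumulate
from math import comb

def hyp_table(N, n, r):
    """Return (values, pmf, cdf) lists for X ~ Hyp(N, n, r)."""
    ks = list(range(max(0, r - (N - n)), min(r, n) + 1))
    pmf = [comb(n, k) * comb(N - n, r - k) / comb(N, r) for k in ks]
    cdf = list(accumulate(pmf))   # running sums give P(X <= k)
    return ks, pmf, cdf

L1, L2, L3 = hyp_table(52, 13, 5)
for k, p, c in zip(L1, L2, L3):
    print(f"{k:2d}  {p:.5f}  {c:.5f}")
```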


Example. Suppose a hand of 5 cards is dealt. What is the probability of there being

(a) at least 3 Hearts?
(b) exactly 3 Hearts?
(c) at most 3 Hearts?
(d) What is the average number of Hearts and the most likely number of Hearts to be dealt?

Solution. (a) There are 13 Hearts in a deck of 52. In this sample of 5, "at least 3" means from 3 to 5. After calling up the HYPGEOM program, enter 52 for POP. SIZE, enter 13 for TYPE A SIZE, enter 5 for SAMPLE SIZE, enter 3 for LOWER BOUND, and enter 5 for UPPER BOUND.

We find that P(X >= 3) = P(3 <= X <= 5) = 0.0927671, and that there would be an average of 1.25 Hearts in a random 5 card deal with a standard deviation of 0.9295.

If you entered 1 to receive a complete distribution, then view the distribution in the STAT Edit screen (LIST EDIT on the TI-86 or APPS, 6, 1 on the TI-89). For a value k under L1 (xStat or c1), the value of P(X = k) is adjacent under L2 (yStat or c2), and the value of P(X <= k) is under L3 (fStat or c3).

(b) We see here that the probability of exactly 3 Hearts is 0.08154.

(c) We also see from the list that P(X <= 3) = 0.98878.

(d) The average number of Hearts is 1.25, as displayed by the program. The most likely number (the mode) is 1 Heart, since it has the highest probability of occurring at 0.41142.
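The mode can also be found by scanning the whole distribution for its largest probability, which is what reading down the L2 column amounts to. A short Python sketch, using variable names chosen here for illustration:

```python
from math import comb

N, n, r = 52, 13, 5   # deck size, number of Hearts, hand size

# Probability of each possible number of Hearts in the hand
pmf = {k: comb(n, k) * comb(N - n, r - k) / comb(N, r)
       for k in range(max(0, r - (N - n)), min(r, n) + 1)}

mode = max(pmf, key=pmf.get)       # value with the largest probability
print(mode, round(pmf[mode], 5))   # 1 0.41142
```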

Exercises

1. Suppose a hand of 13 cards is dealt. What is the probability of there being

(a) at most 2 Aces?
(b) exactly 2 Aces?
(c) at least one Ace?
(d) What is the average number of Aces and the most likely number of Aces to be dealt?

2. In a drama class, there are 12 females and 10 males. A group of 15 is to be chosen at random to read a screenplay.

(a) What is the range for the possible number of males chosen? Of females chosen?
(b) What is the probability that there will be at most 6 males? At least 6 males?
(c) What is the average number of males and the most likely number of males to be chosen? What is the average number of females chosen?

3. In another class, there are 12 males and 17 females. A random group of 14 is chosen.

(a) What is the range for the possible number of females chosen? Of males chosen?
(b) What is the probability of there being at most 9 females? At least 9 females?
(c) What is the average number of males chosen?
(d) What is the most likely number of females to be chosen?

Solutions

1. Here, X ~ Hyp(52, 4, 13). There are 4 Aces (Type A). In this case, "at most 2" means 0 to 2 Aces. In the HYPGEOM program, enter 52 for POP. SIZE, enter 4 for TYPE A SIZE, enter 13 for SAMPLE SIZE, enter 0 for LOWER BOUND, and enter 2 for UPPER BOUND. Also enter 1 for a complete distribution.

(a) We see that P(X <= 2) is about 0.95616, and that the average number of Aces in a 13 card deal is 1 with a standard deviation of about 0.84017.

(b) Under STAT Edit (LIST EDIT on the TI-86 or APPS, 6, 1 on the TI-89), we find that P(X = 2) = 0.21349.

(c) P(X >= 1) = 1 - P(X = 0) = 1 - 0.3038175 = 0.6961825.

(d) We also observe that the most likely number of Aces in a 13 card hand is 1, occurring with probability 0.43885.

2. (a) Here, X ~ Hyp(22, 10, 15). There are 10 males (Type A). There are 15 chosen and there are only 12 females (Type B). Thus, there must be at least 3 males chosen. But there can be at most 10 males chosen; so the range of the possible number of males chosen is {3, 4, . . ., 10}. Similarly, the range of the possible number of females chosen is {5, 6, . . ., 12}.

(b) In this case, "at most 6 males" means 3 to 6 males. In the HYPGEOM program, enter 22 for POP. SIZE, enter 10 for TYPE A SIZE, enter 15 for SAMPLE SIZE, enter 3 for LOWER BOUND, and enter 6 for UPPER BOUND. Also enter 1 for a complete distribution.

We see that P(3 <= X <= 6) = 0.3839. Moreover, P(X >= 6) = 1 - P(X <= 5) = 1 - 0.113 = 0.887 (or re-run the program with 6 for LOWER BOUND and 10 for UPPER BOUND).

(c) We also see that the average number of males is 6.81818. Under STAT Edit (LIST EDIT on the TI-86 or APPS, 6, 1 on the TI-89), we find the mode to be 7, which occurs with probability 0.3483. The average number of females chosen then must be 15 - 6.81818 = 8.18182.
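The subtraction used for the females works because every person chosen is either male or female, so the two counts always add up to 15. As a hedged check in Python, the same average can be computed directly by treating the 12 females as Type A; the variable names below are illustrative only.

```python
N, n, r = 22, 10, 15   # class size, number of males, group size

mean_males = r * n / N
mean_females_by_subtraction = r - mean_males
mean_females_direct = r * (N - n) / N   # females treated as Type A

print(round(mean_males, 5))                    # 6.81818
print(round(mean_females_by_subtraction, 5))   # 8.18182
print(round(mean_females_direct, 5))           # 8.18182
```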

3. (a) Here, X ~ Hyp(29, 17, 14). There are 17 females (Type A). There are 14 chosen and there are only 12 males (Type B). Thus, there must be at least 2 females chosen, and there can be at most 14 chosen. So the range of the possible number of females chosen is {2, 3, . . ., 14}. Similarly, the range of the possible number of males chosen is {0, 1, . . ., 12}.

(b) In this case, "at most 9" means 2 to 9 females. In the program, enter 29 for POP. SIZE, enter 17 for TYPE A SIZE, enter 14 for SAMPLE SIZE, enter 2 for LOWER BOUND, and enter 9 for UPPER BOUND. Also enter 1 for a complete distribution.

We see that P(2 <= X <= 9) = 0.8351298. Moreover, P(X >= 9) = 1 - P(X <= 8) = 1 - 0.5869 = 0.4131 (or re-run the program with 9 for LOWER BOUND and 14 for UPPER BOUND).

(c) Because the average number of females chosen is 14 * 17 / 29, or about 8.2069, the average number of males chosen is 14 - 8.2069 = 5.7931.

(d) Under STAT Edit (LIST EDIT on the TI-86 or APPS, 6, 1 on the TI-89), we find that the most likely number of females chosen is 8, which occurs with probability 0.28962.