Confidence Interval for the Difference Between Proportions

Program files: DIFPCI.83p (TI-83), DIFPCI.86p (TI-86), difpci.89p (TI-89)

Suppose we are studying two populations and wish to measure the proportions p1 and p2 of members having a certain characteristic. The difference between the proportions, p1 - p2, is a special case of the difference between means. To estimate this difference, we take independent random samples from the two populations and first estimate p1 and p2 individually. Thus p1 ~ m1 / n1, where m1 is the number of affirmative responses and n1 is the number surveyed in the first population. Similarly, p2 ~ m2 / n2, where m2 is the number of affirmative responses and n2 is the number surveyed in the second population. Then (p1 - p2) ~ (m1 / n1 - m2 / n2) +/- e, where e is an appropriate margin of error.

For this scenario, each population may be "large" or "small" relative to its sample size. If Population 1 is "large", then the sample deviation is S1 = Sqrt[ (m1 / n1) (1 - m1 / n1) ]. If Population 1 has a smaller finite size N1 (usually meaning that the sample size is more than 5% of N1), then the sample deviation includes a finite population correction factor: S1 = Sqrt[ (m1 / n1) (1 - m1 / n1) ] Sqrt[ (N1 - n1) / (N1 - 1) ]. The sample deviation S2 for the second population is defined similarly, using m2, n2, and N2.
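As a minimal Python sketch of the two cases above, the correction factor is applied only when a population size is supplied. The function name `sample_deviation` is a hypothetical illustration, not part of the DIFPCI program.

```python
from math import sqrt
from typing import Optional


def sample_deviation(m: float, n: float, N: Optional[float] = None) -> float:
    """Sample deviation for a proportion estimated as m / n.

    With N=None the population is treated as "large"; otherwise the
    finite population correction Sqrt[(N - n) / (N - 1)] is applied.
    """
    p = m / n
    s = sqrt(p * (1 - p))
    if N is not None:
        s *= sqrt((N - n) / (N - 1))
    return s


# Large population: S = Sqrt[0.67 * 0.33]
print(sample_deviation(1340, 2000))
# Finite population of 10,000 with 4544 sampled: the correction shrinks S
print(sample_deviation(1032, 4544, N=10000))
```

If the entire population is sampled (n = N), the correction factor is 0, so S is 0 as well: there is no sampling error left.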

If our desired level of confidence is r and z is the z-score such that P(-z <= Z <= z) = r, where Z ~ N(0,1), then the margin of error is given by e = z*Sqrt[ (S1)^2 / n1 + (S2)^2 / n2 ]. Thus,

 (p1 - p2) ~ (m1 / n1 - m2 / n2) +/- z*Sqrt[ (S1)^2 / n1 + (S2)^2 / n2 ]
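The formula can be sketched in Python as follows. This is a minimal illustration assuming Python 3.8+ for `statistics.NormalDist`; the function name `diff_prop_ci` and its parameters are hypothetical. The z-score with P(-z <= Z <= z) = r is the standard normal quantile at (1 + r) / 2.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional, Tuple


def diff_prop_ci(m1: float, n1: float, m2: float, n2: float, r: float,
                 N1: Optional[float] = None, N2: Optional[float] = None
                 ) -> Tuple[float, float, Tuple[float, float]]:
    """Return (m1/n1 - m2/n2, margin of error e, (lower, upper)).

    Passing None for N1 or N2 treats that population as "large".
    """
    def deviation(m, n, N):
        p = m / n
        fpc = 1.0 if N is None else sqrt((N - n) / (N - 1))
        return sqrt(p * (1 - p)) * fpc

    z = NormalDist().inv_cdf((1 + r) / 2)  # P(-z <= Z <= z) = r
    S1, S2 = deviation(m1, n1, N1), deviation(m2, n2, N2)
    diff = m1 / n1 - m2 / n2
    e = z * sqrt(S1 ** 2 / n1 + S2 ** 2 / n2)
    return diff, e, (diff - e, diff + e)


diff, e, (lo, hi) = diff_prop_ci(1340, 2000, 304, 400, 0.95)
print(f"{diff:.4f} +/- {e:.4f}  ->  [{lo:.4f}, {hi:.4f}]")
```

With the counts from the example later in this section, the margin of error comes out to about 0.0467, in agreement with the program's result.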

To use the analysis above, we generally require sample sizes n1 >= 30 and n2 >= 30. However, to obtain reasonably small margins of error, we usually need much larger samples.
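To see why larger samples are usually needed, the following sketch computes the 95% margin of error for two large populations with equal sample sizes n1 = n2 = n, in the worst case where both sample proportions are 0.5 (the value that makes p(1 - p) largest). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def worst_case_margin(n: int, r: float = 0.95) -> float:
    """Margin of error for two large populations with n1 = n2 = n and
    both sample proportions equal to 0.5 (the largest possible p(1 - p))."""
    z = NormalDist().inv_cdf((1 + r) / 2)
    return z * sqrt(0.25 / n + 0.25 / n)


for n in (30, 300, 3000):
    print(n, round(worst_case_margin(n), 4))
```

The margins come out to roughly 0.25, 0.08, and 0.025. With only 30 in each sample, the interval can extend about 25 percentage points on either side of the estimate, and each tenfold increase in sample size shrinks the margin only by a factor of about Sqrt[10] ~ 3.16.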

Using the DIFPCI Program

To execute the program, we first enter 1, 2, 3, or 4 to designate the type of populations under study: (1) two large populations; (2) the first large and the second finite; (3) the first finite and the second large; or (4) two finite populations. For each finite population, we then enter its population size. Next, we enter the number of affirmative responses and the sample size for each population, followed by the desired level of confidence. The program displays the difference in sample proportions m1 / n1 - m2 / n2, the margin of error, and the confidence interval.
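The input sequence above can be mirrored by a hypothetical Python function. This is a sketch under the assumption that the program applies the formulas given earlier; it is not the TI source code, and the names `POPULATION_TYPES` and `difpci` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical mapping of the menu codes to (first finite?, second finite?).
POPULATION_TYPES = {
    1: (False, False),  # two large populations
    2: (False, True),   # first large, second finite
    3: (True, False),   # first finite, second large
    4: (True, True),    # two finite populations
}


def difpci(kind, m1, n1, m2, n2, conf, N1=None, N2=None):
    """Hypothetical re-creation of DIFPCI's inputs and outputs.

    Returns (m1/n1 - m2/n2, margin of error, (lower, upper)).
    """
    finite1, finite2 = POPULATION_TYPES[kind]
    if (finite1 and N1 is None) or (finite2 and N2 is None):
        raise ValueError("each finite population needs its population size")

    def deviation(m, n, N, finite):
        p = m / n
        s = sqrt(p * (1 - p))
        return s * sqrt((N - n) / (N - 1)) if finite else s

    S1 = deviation(m1, n1, N1, finite1)
    S2 = deviation(m2, n2, N2, finite2)
    z = NormalDist().inv_cdf((1 + conf) / 2)
    diff = m1 / n1 - m2 / n2
    e = z * sqrt(S1 ** 2 / n1 + S2 ** 2 / n2)
    return diff, e, (diff - e, diff + e)


# Data of Exercise 2: two finite populations of 5000 each.
diff, e, (lo, hi) = difpci(4, .25 * 2458, 2458, .2 * 2086, 2086, .99,
                           N1=5000, N2=5000)
print(f"{lo:.4f} <= p1 - p2 <= {hi:.4f}")
```

In this sketch, a population size passed for a population designated "large" is simply ignored, and an invalid menu code raises a KeyError from the dictionary lookup.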

Example. A poll commissioned by the Center on Addiction and Substance Abuse at Columbia University found that 1340 out of 2000 adults and 304 out of 400 youths interviewed believed that popular culture encourages drug use. Find a 95% confidence interval for the true difference between the proportions of adults and youths holding this belief at that time.

Solution. We shall assume that the adults and youths under study came from large nationwide populations. Thus, after calling up the DIFPCI program, first enter 1 to specify that we have two large populations. Next, enter 1340 for 1ST NO. OF YES, enter 2000 for 1ST SAMPLE SIZE, enter 304 for 2ND NO. OF YES, enter 400 for 2ND SAMPLE SIZE, and enter .95 for CONF. LEVEL.

We find that p1 - p2 ~ -0.09 +/- 0.0467, or that -0.1367 <= p1 - p2 <= -0.0433. Equivalently, 0.0433 <= p2 - p1 <= 0.1367. So a greater percentage of youths apparently held this belief at the time of the study: with 95% confidence, the percentage of youths holding it was between 4.33 and 13.67 percentage points higher than the percentage of adults.
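As a check on these numbers, a minimal Python sketch (treating both populations as large, so no correction factor is used; the helper name is hypothetical) computes the interval in both orders. Swapping the two populations simply negates the interval, which is why the statements about p1 - p2 and p2 - p1 are equivalent.

```python
from math import sqrt
from statistics import NormalDist


def large_pop_ci(m1, n1, m2, n2, r):
    """CI for p1 - p2 when both populations are treated as large."""
    p1, p2 = m1 / n1, m2 / n2
    z = NormalDist().inv_cdf((1 + r) / 2)
    e = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - e, p1 - p2 + e


adults_minus_youths = large_pop_ci(1340, 2000, 304, 400, 0.95)
youths_minus_adults = large_pop_ci(304, 400, 1340, 2000, 0.95)
print([round(x, 4) for x in adults_minus_youths])
print([round(x, 4) for x in youths_minus_adults])
```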

Exercises

1. On July 13, 1995, USA TODAY reported the results of a USA TODAY/CNN Gallup Poll. Out of 801 adults surveyed nationally, 68% felt that the Republicans' work in Congress was "politics as usual." However, out of 326 adults surveyed in California, only 62% felt this way. Find a 90% confidence interval for the difference between the national and Californian proportions.

2. On June 25, 1995, The Associated Press reported the results of a national survey conducted by the Center for Social and Religious Research at the Hartford Seminary. The study examined divorce among a group of 5000 Protestant clergywomen and 5000 Protestant clergymen. It found that 25% of the 2458 clergywomen who responded had been divorced at least once, as had 20% of the 2086 clergymen who responded. Find a 99% confidence interval for the true difference in divorce rates between the two targeted groups of 5000 clergywomen and 5000 clergymen.

3. Suppose we know that 1032 out of 4544 respondents from a targeted population of 10,000 clergy had been divorced. Then an independent national survey (which may include some of these clergy) found that 2080 out of 8000 adults had been divorced at least once. Find a 95% confidence interval for the true difference in divorce rates among these 10,000 Protestant clergy and the general adult population.

Solutions

1. In the DIFPCI program, first enter 1 to designate two large populations. Since the exact numbers of affirmative responses are not given, we can enter .68*801 for 1ST NO. OF YES and 801 for 1ST SAMPLE SIZE, then enter .62*326 for 2ND NO. OF YES and 326 for 2ND SAMPLE SIZE. Finally, enter .9 for CONF. LEVEL.

We find that p1 - p2 ~ 0.06 +/- 0.0519, or that 0.0081 <= p1 - p2 <= 0.1119. That is, this feeling was apparently stronger nationally, by somewhere between 0.81 and 11.19 percentage points.
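A minimal sketch of this solution, assuming both populations are large (the helper name is hypothetical). Because only percentages were reported, the entered counts .68*801 = 544.68 and .62*326 = 202.12 are not whole numbers; the formula does not mind, since it uses only the ratios m / n.

```python
from math import sqrt
from statistics import NormalDist


def large_pop_ci(m1, n1, m2, n2, r):
    """Difference p1 - p2 and margin of error, both populations large."""
    p1, p2 = m1 / n1, m2 / n2
    z = NormalDist().inv_cdf((1 + r) / 2)  # about 1.6449 for r = 0.90
    e = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2, e


diff, e = large_pop_ci(0.68 * 801, 801, 0.62 * 326, 326, 0.90)
print(f"{diff:.2f} +/- {e:.4f}: [{diff - e:.4f}, {diff + e:.4f}]")
```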

2. If we limit the two populations to the two groups of 5000, rather than all possible Protestant clergy, then we have two small finite populations of known size, and we are studying the proportions within these two groups rather than among all clergy. Thus, in the DIFPCI program, first enter 4 to designate two finite populations, then enter 5000 for each population size.

Again, we do not have the exact number of "Yes" responses; thus, in the program, enter .25*2458 for 1ST NO. OF YES and 2458 for 1ST SAMPLE SIZE, then enter .2*2086 for 2ND NO. OF YES and 2086 for 2ND SAMPLE SIZE. Finally enter .99 for CONF. LEVEL.

We find that 0.0265 <= p1 - p2 <= 0.0735. That is, among these two groups of 5000 clergywomen and 5000 clergymen, the proportion of clergywomen who have been divorced is from 2.65 percentage points higher to 7.35 percentage points higher than the proportion of clergymen who have been divorced.
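The effect of the correction factor here can be seen with a short sketch (the helper name is hypothetical). It computes the 99% margin of error using the population sizes of 5000 and, for comparison, as if both populations were large. Since roughly half of each group responded, the correction narrows the margin from about 0.0319 to about 0.0235.

```python
from math import sqrt
from statistics import NormalDist


def margin(m1, n1, m2, n2, r, N1=None, N2=None):
    """Margin of error e; N1 or N2 = None treats that population as large."""
    def var_term(m, n, N):
        p = m / n
        fpc = 1.0 if N is None else (N - n) / (N - 1)  # squared correction factor
        return p * (1 - p) * fpc / n

    z = NormalDist().inv_cdf((1 + r) / 2)
    return z * sqrt(var_term(m1, n1, N1) + var_term(m2, n2, N2))


args = (0.25 * 2458, 2458, 0.20 * 2086, 2086, 0.99)
e_finite = margin(*args, N1=5000, N2=5000)
e_large = margin(*args)
diff = 0.25 - 0.20
print(f"finite: [{diff - e_finite:.4f}, {diff + e_finite:.4f}]")
print(f"large:  [{diff - e_large:.4f}, {diff + e_large:.4f}]")
```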

3. Our first population is again small and finite, but now the second population is "large" and can be treated as infinite. Thus, in the DIFPCI program, first enter 3 to designate that the first population is finite and the second is large, then enter 10000 for the first population size.

Next, enter 1032 for 1ST NO. OF YES and 4544 for 1ST SAMPLE SIZE, then enter 2080 for 2ND NO. OF YES and 8000 for 2ND SAMPLE SIZE. Finally enter .95 for CONF. LEVEL.

We find that -0.0461 <= p1 - p2 <= -0.0197, or equivalently 0.0197 <= p2 - p1 <= 0.0461. Thus, the general adult population apparently has a higher divorce rate than these 10,000 clergy, by 1.97 to 4.61 percentage points.
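This exercise mixes the two cases: the correction factor applies to the first population only. A minimal sketch that reproduces the interval (the variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

# First population: finite (N1 = 10,000); second population: large.
m1, n1, N1 = 1032, 4544, 10_000
m2, n2 = 2080, 8000
r = 0.95

p1, p2 = m1 / n1, m2 / n2
S1_sq = p1 * (1 - p1) * (N1 - n1) / (N1 - 1)  # squared deviation with correction
S2_sq = p2 * (1 - p2)                          # no correction for a large population
z = NormalDist().inv_cdf((1 + r) / 2)
e = z * sqrt(S1_sq / n1 + S2_sq / n2)

lower, upper = p1 - p2 - e, p1 - p2 + e
print(f"{lower:.4f} <= p1 - p2 <= {upper:.4f}")
print(f"{-upper:.4f} <= p2 - p1 <= {-lower:.4f}")
```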

Note: The built-in 2-PropZInt command on the TI-83 cannot be used without the exact numbers of "Yes" responses, because these inputs must be integers. Also, this built-in function cannot take into account the finite population correction factor for the sample deviations.