Confidence Interval for a Proportion

Program files: PCONFINT.83p, PCONFINT.86p, pconfint.89p

A special case of the mean of a population is the proportion p of those having a certain designation. For example, we may ask what proportion of the population approves of the President's performance. Since p is a proportion, it is always the case that 0 <= p <= 1; however, p is often stated as a percentage. If p = 0.73, then we could say that 73% approve.

Using a "large" random sample, we can estimate the value of p by the sample proportion m / n, where m equals the number of favorable responses, and n equals the total number of those surveyed.

If we assume that the overall population is extremely large relative to the sample size n, then the sample deviation is given by S = Sqrt[ (m / n)(1 - m / n) ]. However, if the population is of a small size N, then the sample deviation is S = Sqrt[ (m / n)(1 - m / n) ] * Sqrt[ (N - n) / (N - 1) ]. Usually, if the sample size is less than 5% of the overall population, then we use the first form for S. Often, such as with national surveys, the population size is not stated because it can be assumed to be extremely large. However, for smaller populations, the population size must be stated in order to use the second form for S.
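The two forms of S can be compared with a short Python sketch. It is not part of the PCONFINT program; the function name sample_deviation and the survey numbers (60 favorable responses out of 100, population of 500) are hypothetical and chosen only for illustration.

```python
import math

def sample_deviation(m, n, pop_size=None):
    """Return S for a sample proportion m / n.

    If pop_size (N) is given, apply the finite-population factor
    Sqrt[(N - n) / (N - 1)]; otherwise assume a very large population.
    """
    p_hat = m / n
    s = math.sqrt(p_hat * (1 - p_hat))
    if pop_size is not None:
        s *= math.sqrt((pop_size - n) / (pop_size - 1))
    return s

# Hypothetical survey: 60 favorable responses out of 100.
s_large = sample_deviation(60, 100)                 # first form
s_finite = sample_deviation(60, 100, pop_size=500)  # second form, N = 500
print(round(s_large, 4), round(s_finite, 4))
```

Here the sample is 20% of the population, well above the 5% guideline, so the second form gives a noticeably smaller S than the first.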

Since m / n is an estimate, we say that p = (m / n) +/- e, where e is the appropriate margin of error that again depends on the desired level of confidence r. Once again, the z-score is the value z such that P(-z <= Z <= z) = r, where Z ~ N(0,1). The margin of error is then given by e = z*S / Sqrt(n). However, since 0 <= p <= 1, the actual confidence interval for p will be

[max(0, m / n - z*S / Sqrt(n)), min(1, m / n + z*S / Sqrt(n))]
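As a minimal sketch of the calculation above (not the calculator program itself), the interval can be computed in Python. The function name proportion_confint and the sample numbers are hypothetical; the z-score comes from the standard library's statistics.NormalDist, using the fact that P(-z <= Z <= z) = r is equivalent to P(Z <= z) = (1 + r) / 2.

```python
import math
from statistics import NormalDist

def proportion_confint(m, n, r, pop_size=None):
    """Return (sample proportion, margin of error, lower, upper).

    m: favorable responses, n: sample size, r: confidence level,
    pop_size: population size N for a small population, else None.
    """
    p_hat = m / n
    s = math.sqrt(p_hat * (1 - p_hat))
    if pop_size is not None:
        # Finite-population form of S.
        s *= math.sqrt((pop_size - n) / (pop_size - 1))
    # z such that P(-z <= Z <= z) = r for Z ~ N(0, 1).
    z = NormalDist().inv_cdf((1 + r) / 2)
    e = z * s / math.sqrt(n)
    # Clip the interval to [0, 1] because p is a proportion.
    return p_hat, e, max(0.0, p_hat - e), min(1.0, p_hat + e)

# Hypothetical data: 60 favorable out of 100, 95% confidence.
p_hat, e, lower, upper = proportion_confint(60, 100, 0.95)
print(f"{p_hat:.4f} +/- {e:.4f} -> [{lower:.4f}, {upper:.4f}]")

# A very small sample proportion shows the clipping at 0.
print(proportion_confint(1, 30, 0.95))
```

The max and min calls only matter when the sample proportion is close to 0 or 1, as in the second call.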

Usually, a sample of size n >= 30 is considered sufficient to use the analysis above. In practice, however, a much larger sample is needed to obtain a reasonably small margin of error for a proportion estimate.
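To see how the margin of error shrinks as n grows, the following Python sketch evaluates e = z*S / Sqrt(n) for several sample sizes. The sample proportion 0.5 (which makes S as large as possible), the 95% confidence level, and the list of sample sizes are assumptions chosen for illustration, and a "large" population is assumed.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, r):
    """Margin of error z*S / Sqrt(n) for a 'large' population."""
    z = NormalDist().inv_cdf((1 + r) / 2)
    s = math.sqrt(p_hat * (1 - p_hat))
    return z * s / math.sqrt(n)

# Assumed worst case p_hat = 0.5 at 95% confidence.
for n in (30, 100, 400, 1000, 2000):
    print(n, round(margin_of_error(0.5, n, 0.95), 4))
```

Under these assumptions the margin of error is roughly 0.18 at n = 30 and only falls to about 0.03 at n = 1000. Because e is proportional to 1 / Sqrt(n), halving the margin of error requires four times as many responses.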

Using the PCONFINT Program

To execute the program, first enter either 1 or 2 to specify a "large" population or a finite population. For Option 2, enter the population size. Then enter the number of favorable responses, the sample size, and the confidence level. The program displays the sample proportion, the margin of error, and the confidence interval.

Example. A national poll commissioned by the Center on Addiction and Substance Abuse at Columbia University found that 1340 out of 2000 adults interviewed believed that popular culture encourages drug use. Find a 98% confidence interval for the true proportion of adults with this belief at that time.

Solution. We assume that the number of adults in the target population is extremely large. Thus, call up the PCONFINT program and enter 1 for "large" population, then enter 1340 for NUMBER OF YES, enter 2000 for SAMPLE SIZE, and enter .98 for CONF. LEVEL.

We see that 67% of adults responding had this belief, with a margin of error of 2.45 percentage points, which gives a 98% confidence interval of (0.6455, 0.6945).
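The figures in this example can be checked by hand with the "large" population formulas. The Python sketch below is only a cross-check of the arithmetic, not a transcription of the PCONFINT program; the variable names are illustrative.

```python
import math
from statistics import NormalDist

m, n, r = 1340, 2000, 0.98           # favorable responses, sample size, confidence level

p_hat = m / n                         # sample proportion
z = NormalDist().inv_cdf((1 + r) / 2) # z with P(-z <= Z <= z) = 0.98
s = math.sqrt(p_hat * (1 - p_hat))    # S for a "large" population
e = z * s / math.sqrt(n)              # margin of error
lower, upper = max(0.0, p_hat - e), min(1.0, p_hat + e)

print(f"{p_hat:.4f} +/- {e:.4f} -> ({lower:.4f}, {upper:.4f})")
# prints: 0.6700 +/- 0.0245 -> (0.6455, 0.6945)
```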

Exercises

1. In a recent national poll, 54% of 900 adults surveyed stated that they favored President Bush over John Kerry in the next presidential election. Find a 95% confidence interval for the true proportion of adults nationwide who favored Bush at that time.

2. In a college of 2650 students, 264 out of 400 surveyed had registered to vote. Find a 90% confidence interval for the true proportion of those registered to vote at this college.

3. On June 25, 1995, The Associated Press reported the findings of a national survey conducted by the Center for Social and Religious Research at the Hartford Seminary. The study was on the divorce rate of a group of 5000 Protestant clergymen and 5000 Protestant clergywomen. It was found that 20% of the 2086 clergymen responding had been divorced at least once. Find a 99% confidence interval for the true proportion of those who have been divorced among the targeted group of 5000 clergymen.

Solutions

1. Again we assume a "large" population. Since the exact number of "Yes" responses was not stated, we can enter .54*900 for NUMBER OF YES, then enter 900 for SAMPLE SIZE, and .95 for CONF. LEVEL.

We find that p ~ 0.54 +/- 0.0326, which gives us the interval [0.5074, 0.5726]. Since the entire interval lies above 0.5, Bush apparently had majority support at that time.

2. Here we have a small, finite population. Thus, after calling up the program, enter 2 for "finite" population, then enter 2650 for POP. SIZE, 264 for NUMBER OF YES, and 400 for SAMPLE SIZE. Lastly, enter .90 for CONF. LEVEL. We find that p ~ 0.66 +/- 0.0359, which gives the interval [0.6241, 0.6959].

3. Again we have a small, finite population. Thus, enter 5000 for POP. SIZE, then enter .2*2086 for NUMBER OF YES, 2086 for SAMPLE SIZE, and .99 for CONF. LEVEL. We find that p ~ 0.2 +/- 0.0172. Thus, we can be relatively certain that the true proportion of the targeted group of 5000 clergymen who have been divorced lies in the interval [0.1828, 0.2172].
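The margins of error in Solutions 2 and 3 use the finite-population form of S. As a cross-check of that arithmetic, here is a short Python sketch; the function name finite_margin is hypothetical, and the code is an independent calculation rather than the PCONFINT program.

```python
import math
from statistics import NormalDist

def finite_margin(m, n, pop_size, r):
    """Margin of error z*S / Sqrt(n) using the finite-population S."""
    p_hat = m / n
    s = math.sqrt(p_hat * (1 - p_hat))
    s *= math.sqrt((pop_size - n) / (pop_size - 1))
    z = NormalDist().inv_cdf((1 + r) / 2)
    return z * s / math.sqrt(n)

# Exercise 2: 264 of 400 surveyed, population 2650, 90% confidence.
e2 = finite_margin(264, 400, 2650, 0.90)
# Exercise 3: 20% of 2086 responding, population 5000, 99% confidence.
e3 = finite_margin(0.2 * 2086, 2086, 5000, 0.99)

print(round(e2, 4), round(e3, 4))
```

Both values agree with the stated margins of 0.0359 and 0.0172 to four decimal places.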