This project concerns the analysis of a population proportion. You will obtain data by conducting a random survey of a target population and two disjoint sub-populations. The project involves constructing confidence intervals and performing hypothesis tests for a population proportion p and for the difference in proportions p1 - p2 between the sub-populations.

We wish to study a specific population and analyze the proportion of this population that responds favorably to a certain question.

1. Think of a question that has only two responses. For example, "Do you favor capital punishment?" or "Are you a smoker or non-smoker?" State your question and which response you want to measure. This will be called a **favorable response**.

2. Specify a population W that you wish to target for your survey. For example, "Students at my university" or "Adults age 21 or older in this city."

3. Break your population down into two disjoint sub-populations W1 and W2. For example, W1 = female and W2 = male, or W1 = married and W2 = single.

4. If possible, specify the proportionate sizes of the sub-populations. For example, W1 is 60% of the population and W2 is 40%.

5. If possible, give the size N of your population W and the sizes N1 and N2 of the sub-populations W1 and W2. Or if appropriate, simply specify that these are "large" populations that do not require a finite population correction factor.

Note: If you are doing this project at the same time as Project 1: Survey on Population Mean, then you should use the same population W and sub-populations W1 and W2 for both projects.

Before conducting the survey, we wish to make some initial estimates of the proportions. We shall use these estimates later for our hypothesis tests.

1. Give an estimate P of the proportion of the entire population W that you think will respond favorably to your question. Give reasons for your estimate.

2. Give estimates of the proportions p1 and p2 of the sub-populations W1 and W2 that you think will respond favorably. Again, give reasons for your estimates. The difference of your estimates then becomes an estimate for the difference in proportions p1 - p2.

3. If applicable, how would you respond to your question? Do you think that you would be with the majority or minority of the population with your response?

1. For your proportion, choose a desired margin of error e such as 0.03 or 0.035.

2. For each sub-population W1 and W2, find the sample sizes n1 and n2 required to obtain 95% confidence intervals whose margin of error is no larger than your desired margin of error e.
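The sample-size calculation above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the standard large-sample formula n = (z/e)^2 p(1 - p) with the conservative guess p = 0.5, and an optional finite population correction when a population size N is given. The function name `required_sample_size` and its parameters are hypothetical and are not part of the project materials.

```python
import math
from statistics import NormalDist

def required_sample_size(e, conf=0.95, p_guess=0.5, N=None):
    """Smallest n whose margin of error for a proportion is at most e.

    p_guess = 0.5 is the conservative (worst-case) choice.
    N is the population size; leave as None for a "large" population.
    """
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Large-population formula
    n0 = (z / e) ** 2 * p_guess * (1 - p_guess)
    if N is not None:
        # Finite population correction, assuming the factor sqrt((N - n)/(N - 1))
        n0 = n0 / (1 + (n0 - 1) / N)
    return math.ceil(n0)

print(required_sample_size(0.03))          # large population, e = 0.03
print(required_sample_size(0.035))         # large population, e = 0.035
print(required_sample_size(0.03, N=2000))  # hypothetical sub-population of 2000
```

If the sizes N1 and N2 are known, the function can be called once per sub-population with `N=N1` and `N=N2`. Otherwise the same n applies to both.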

Next, you must conduct a random survey on the targeted population to obtain sample measurements. For scientific purposes, there usually should be at least n1 respondents from sub-population W1 and at least n2 respondents from sub-population W2. However, for instructional purposes here, you may limit yourself to a total of 100 measurements from the entire population with at least 30 from each sub-population.

But you should at least use sample sizes that are in proportion to the sizes of the sub-populations. For example, if W1 is 60% of the population, then use n1 = 60 and n2 = 40, or n1 = 45 and n2 = 30, etc.

1. Take a random survey of people specifically within your target population W and gather their responses. Be sure to determine to which sub-population each respondent belongs. State how the survey was conducted and how randomness was ensured.

2. List the total number of people surveyed n and the number m of those responding favorably. Compute the sample proportion m/n. How does your sample proportion compare with your estimate P?

3. For the two sub-populations W1 and W2, state the number of respondents n1 and n2 and the numbers m1 and m2 of those responding favorably. Compute the respective sample proportions m1/n1 and m2/n2. Do these values compare favorably with your estimates?

4. How do m1/n1 and m2/n2 compare with m/n? Based on this data, does it seem that a person is more or less likely to respond favorably depending on sub-population?

5. Do you think it is possible for the true sub-population proportions p1 and p2 to be equal; that is, could the favorable response be independent of sub-population?

We next wish to display our data graphically. One way to do this is with a Block Diagram, which shows the data represented in set notation. To set up the diagram, let A = those responding favorably, A' = those responding unfavorably, B = sub-population W1, and B' = sub-population W2.

1. Draw two Block Diagrams, the first of which shows the number of people in each of the four categories A & B, A & B', A' & B, and A' & B', and the second of which shows the percentages of persons in each of these four subsets.

Note that A & B is merely the set of favorable responses from W1 and A & B' is the set of favorable responses from W2.

2. Compute the conditional probabilities P(B | A) and P(B' | A) and explain what they mean.
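As a rough illustration of the block-diagram counts, percentages, and conditional probabilities, the following Python sketch may help. The survey counts are invented for the example and are not real data. The variable names are hypothetical.

```python
# Hypothetical survey counts, invented for illustration only.
n1, m1 = 60, 33   # W1: number surveyed, number responding favorably
n2, m2 = 40, 18   # W2: number surveyed, number responding favorably

# The four cells of the block diagram.
counts = {
    ("A", "B"): m1,         # favorable, from W1
    ("A", "B'"): m2,        # favorable, from W2
    ("A'", "B"): n1 - m1,   # unfavorable, from W1
    ("A'", "B'"): n2 - m2,  # unfavorable, from W2
}

n = n1 + n2
percents = {cell: 100 * c / n for cell, c in counts.items()}

# Conditional probabilities given a favorable response:
# P(B | A) = |A & B| / |A| and P(B' | A) = |A & B'| / |A|.
favorable = counts[("A", "B")] + counts[("A", "B'")]
p_B_given_A = counts[("A", "B")] / favorable
p_Bc_given_A = counts[("A", "B'")] / favorable

for cell in counts:
    print(cell, counts[cell], f"{percents[cell]:.1f}%")
print(f"P(B | A)  = {p_B_given_A:.3f}")
print(f"P(B' | A) = {p_Bc_given_A:.3f}")
```

Here P(B | A) is the fraction of all favorable responses that came from W1, which is not the same quantity as the sample proportion m1/n1 of W1 that responded favorably.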

We next wish to construct confidence intervals for the true population proportion p, the true sub-population proportions p1 and p2, and the true difference in sub-population proportions p1 - p2.

1. Use your data and the **PCONFINT** program to construct separate 95% confidence intervals for p, p1, and p2.

2. Do you feel that the margins of error are small enough to provide useful information and to make the confidence intervals meaningful statistics?

3. Is it now possible for p1 and p2 to be equal; that is, is there any overlap in their confidence intervals? Could the favorable response be independent of sub-population?

4. Use the **DIFPCI** program to find a 95% confidence interval for the difference p1 - p2. How does this interval help determine if the favorable response is independent of sub-population?
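For readers without access to the PCONFINT and DIFPCI programs, the intervals can be approximated with the sketch below. It assumes the standard large-sample (Wald) intervals. The programs themselves are not shown here, so their exact method may differ. The function names `prop_ci` and `diff_prop_ci` and the sample counts are hypothetical.

```python
import math
from statistics import NormalDist

def prop_ci(m, n, conf=0.95):
    """Large-sample (Wald) confidence interval for one proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = m / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat - moe, p_hat + moe

def diff_prop_ci(m1, n1, m2, n2, conf=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p1, p2 = m1 / n1, m2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se

# Hypothetical counts, invented for illustration only.
m1, n1, m2, n2 = 33, 60, 18, 40
print("p      :", prop_ci(m1 + m2, n1 + n2))
print("p1     :", prop_ci(m1, n1))
print("p2     :", prop_ci(m2, n2))
print("p1 - p2:", diff_prop_ci(m1, n1, m2, n2))
```

If the interval for p1 - p2 contains 0, the data are consistent with p1 = p2 at the chosen confidence level. An interval entirely above or below 0 suggests the favorable response depends on sub-population.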

Lastly, we shall use the sample data to perform hypothesis tests on the initial personal estimates that you made.

1. Let P denote your personal estimate for p. Choose an appropriate level of significance, then use your data and the **PTEST** program to test the following three hypotheses:

2. Now let P denote your personal estimate of the difference p1 - p2. Choose an appropriate level of significance, then use your data and the **DIFPTEST** program to test the following three hypotheses:

3. At the 0.05 level of significance, test whether the favorable response is independent of sub-population. That is, test the hypothesis Ho: p1 - p2 = 0.
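The tests in this section can be sketched in Python as large-sample z-tests. This is a minimal illustration and not the PTEST or DIFPTEST programs, whose internals are not shown here. The function names, the choice to report two-sided and both one-sided p-values, and the use of a pooled standard error when the hypothesized difference is 0 are all assumptions. The counts are invented.

```python
import math
from statistics import NormalDist

def _p_values(z):
    """Two-sided and one-sided p-values for a z statistic."""
    cdf = NormalDist().cdf(z)
    return {"two-sided": 2 * min(cdf, 1 - cdf), "less": cdf, "greater": 1 - cdf}

def one_prop_ztest(m, n, P):
    """z-test of Ho: p = P using the standard error under Ho."""
    p_hat = m / n
    z = (p_hat - P) / math.sqrt(P * (1 - P) / n)
    return z, _p_values(z)

def diff_prop_ztest(m1, n1, m2, n2, d0=0.0):
    """z-test of Ho: p1 - p2 = d0.

    Assumption: pooled standard error when d0 == 0, unpooled otherwise.
    """
    p1, p2 = m1 / n1, m2 / n2
    if d0 == 0:
        pooled = (m1 + m2) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2 - d0) / se
    return z, _p_values(z)

# Hypothetical counts and personal estimates, invented for illustration only.
m1, n1, m2, n2 = 33, 60, 18, 40
print(one_prop_ztest(m1 + m2, n1 + n2, P=0.5))      # personal estimate P = 0.5
print(diff_prop_ztest(m1, n1, m2, n2, d0=0.2))      # personal estimate of p1 - p2
print(diff_prop_ztest(m1, n1, m2, n2))              # Ho: p1 - p2 = 0
```

A p-value below the chosen level of significance (for example 0.05) leads to rejecting Ho. Otherwise the data do not provide enough evidence against it.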

Draw any conclusions or add any comments that you like.

For further study, you can also test various hypotheses about the proportions of your sub-populations.
