This project is on the analysis of a population mean. You will obtain data by performing a random survey on a targeted population and two disjoint sub-populations. The project involves construction of confidence intervals and hypothesis tests for the population mean µ and for the difference in means µ1 - µ2 among the sub-populations.

We wish to study a specific population and to analyze the mean of some measurement on this population and its sub-populations.

1. Think of an average that you would like to determine. For example, the average number of hours a typical student sleeps per weeknight, or the average number of jobs that a person has had.

2. Specify the population W that you wish to target for your survey. For example, "Undergraduates at my university," or "Adults age 21 or older in this city."

3. Break your population down into two disjoint sub-populations W1 and W2. For example, W1 = female and W2 = male, or W1 = having a college degree and W2 = not having a college degree.

4. If possible, specify the proportionate sizes of the sub-populations. For example, W1 is 60% of the population and W2 is 40%.

5. If possible, give the size N of your population W and the sizes N1 and N2 of the sub-populations W1 and W2. Or if appropriate, simply specify that these are "large" populations that do not require a finite population correction factor.

Before conducting the survey, we wish to make some initial estimates of the means. We shall later use these estimates for the purposes of hypothesis tests.

1. Give an estimate m of the true population mean µ. Give reasons for your estimate such as intuition, observation, or knowledge of a previous study.

2. Give estimates of the means µ1 and µ2 of the sub-populations W1 and W2. Again, give reasons for your estimates. The difference of your estimates then becomes an estimate for the difference in means µ1 - µ2.

3. If applicable, how do you compare with your estimated average? That is, with regard to this measurement, do you feel that you are above, below, or close to average?

1. Give an estimate for the possible range [c , d] of your measurements. Give a logical explanation for your chosen bounds.

2. For your measurement, choose a desired margin of error e. This value will depend on the size of your measurements. If your measurements are small, then you should choose a relatively small e. If your measurements are large, then you will need an appropriately larger margin of error.

3. For each sub-population W1 and W2, find the sample sizes n1 and n2 required to obtain 95% confidence intervals whose margin of error is no larger than your desired e. (If W1 and W2 are both "large" and have the same range [c , d], then n1 will equal n2.)
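The sample-size step can be sketched in Python as follows. This is only an illustration, not the method prescribed by the course: it assumes a z-based interval and the conservative bound σ ≤ (d - c)/2 for measurements confined to [c , d], and the function name, the optional finite population correction, and the example numbers are all hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(c, d, e, confidence=0.95, population_size=None):
    """Smallest n whose z-interval margin of error is at most e.

    Assumption: sigma is bounded by (d - c) / 2, the largest standard
    deviation possible for measurements that all lie in [c, d].
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    sigma_bound = (d - c) / 2
    n0 = (z * sigma_bound / e) ** 2  # sample size for a "large" population
    if population_size is not None:
        # Finite population correction: solve
        # e = z * sigma / sqrt(n) * sqrt((N - n) / (N - 1)) for n.
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)


# Hypothetical example: hours of sleep between 0 and 12, margin of error 0.5
print(required_sample_size(0, 12, 0.5))
print(required_sample_size(0, 12, 0.5, population_size=2000))
```

Because the bound on σ is the worst case, the resulting n is usually much larger than what is needed in practice; a tighter estimate of σ from a previous study would reduce it.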

Next, you must conduct a random survey on the targeted population to obtain sample measurements. For scientific purposes, there usually should be at least n1 respondents from sub-population W1 and at least n2 respondents from sub-population W2. However, for instructional purposes here, you may limit yourself to a total of 100 measurements from the entire population with at least 30 from each sub-population.

But you should at least use sample sizes that are in proportion to the sizes of the sub-populations. For example, if W1 is 60% of the population, then use n1 = 60 and n2 = 40, or n1 = 45 and n2 = 30, etc.
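As a small illustration of proportional allocation, the arithmetic above could be written in Python like this. The function name and the `minimum` argument are hypothetical; the default of 30 simply mirrors the "at least 30 from each sub-population" guideline.

```python
def proportional_allocation(total, share_w1, minimum=30):
    """Split a total sample size between W1 and W2 in proportion to their sizes."""
    n1 = round(total * share_w1)
    n2 = total - n1
    if min(n1, n2) < minimum:
        raise ValueError(f"allocation ({n1}, {n2}) falls below {minimum} per sub-population")
    return n1, n2


print(proportional_allocation(100, 0.60))  # (60, 40)
print(proportional_allocation(75, 0.60))   # (45, 30)
```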

1. Take a random survey of people specifically within your target population W and record their measurements. Be sure to determine to which sub-population each respondent belongs. State how the survey was conducted and how randomness was ensured.
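One possible way to ensure randomness, assuming you have a complete roster (sampling frame) of the population, is to let software choose the respondents. The sketch below uses Python's standard `random` module; the roster of 500 ID numbers, the seed, and the function name are hypothetical.

```python
import random


def draw_simple_random_sample(roster, k, seed=None):
    """Pick k distinct members of the roster; every subset of size k is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(roster, k)


# Hypothetical sampling frame: ID numbers for 500 members of W
roster = list(range(1, 501))
chosen = draw_simple_random_sample(roster, 100, seed=2024)
print(sorted(chosen)[:10])
```

Recording the seed and the roster in your report is one way to document how the sample was drawn.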

It is often important to display the data visually with charts or histograms.

1. For each of W, W1, and W2, list the data in a frequency chart that gives the possible measurements and the number of respondents having that measurement.

2. For each of W, W1, and W2, make a histogram that shows the numbers of people having each possible measurement.
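If you keep your data on a computer, a frequency chart and a rough text histogram can be produced along the following lines. This is a minimal Python sketch with made-up measurements; the function names are illustrative only.

```python
from collections import Counter


def frequency_chart(measurements):
    """Return (measurement, count) pairs sorted by measurement."""
    return sorted(Counter(measurements).items())


def text_histogram(chart):
    """Render one row of asterisks per distinct measurement."""
    return "\n".join(f"{value:>4} | {'*' * count} ({count})" for value, count in chart)


# Hypothetical measurements (e.g. hours of sleep) for the two sub-populations
w1 = [6, 7, 7, 8, 8, 8, 9]
w2 = [5, 6, 6, 7, 7, 8]

for name, data in [("W", w1 + w2), ("W1", w1), ("W2", w2)]:
    print(name)
    print(text_histogram(frequency_chart(data)))
```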

1. Compute the respective sample means and sample standard deviations for W, W1, and W2. Do the means compare favorably with your estimates?

2. How do the sample means from the sub-populations compare with the sample mean of the overall population? Based on this data, do the averages for the sub-populations seem to be much higher or lower than the overall population average?

3. Do you think it seems possible for the true sub-population means µ1 and µ2 to be equal; that is, could the averages be independent of sub-population?

4. Compute the median and the mode for each of W, W1, and W2.

5. Based on your data, how do the medians compare with the means? That is, does it appear that at least half, or perhaps exactly half, of each population is above or below average?

6. For the overall population W, what percentage of measurements are within one sample standard deviation of the sample mean? Within two standard deviations?
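The summary statistics in the questions above can be computed with Python's standard `statistics` module, as in this sketch. The data and function names are hypothetical. For comparison, a normal distribution has about 68% of its values within one standard deviation of the mean and about 95% within two.

```python
import statistics


def summarize(data):
    """Sample mean, sample standard deviation (n - 1 divisor), median, and mode(s)."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # a list, since ties are possible
    }


def fraction_within(data, k):
    """Fraction of measurements within k sample standard deviations of the sample mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)


# Hypothetical measurements for one population
data = [6, 7, 7, 8, 8, 8, 9]
print(summarize(data))
print(fraction_within(data, 1), fraction_within(data, 2))
```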

From looking at your histograms, does the overall population W or either sub-population appear to be normally distributed? Use the **TESTNORM** program to test the hypothesis that W is normally distributed. Then test whether the sub-populations are normally distributed.

Recall: If you have put your data in a frequency chart, then you can use the **FREQ** program to enter the measurements into a list in order to run the **TESTNORM** program. On the TI-83, enter the measurements into list **L5** and the frequencies into list **L6**. On the TI-86, enter the measurements into **xStat** and the frequencies into list **yStat**. On the TI-89, enter the measurements into column **c5** and the frequencies into column **c6** in a DATA variable called **dist**. Then execute the **FREQ** program, then execute the **TESTNORM** program.
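If you are working on a computer rather than a calculator, the same idea of expanding a frequency chart back into a list of individual measurements can be sketched in Python. This is a rough analogue written for illustration, not the **FREQ** program itself, and the values and frequencies are made up.

```python
def expand_frequency_chart(values, frequencies):
    """Repeat each measurement as many times as its frequency."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    data = []
    for value, count in zip(values, frequencies):
        data.extend([value] * count)
    return data


# Hypothetical frequency chart: measurement -> number of respondents
values = [5, 6, 7, 8, 9]
frequencies = [1, 3, 4, 4, 1]
data = expand_frequency_chart(values, frequencies)
print(len(data), data)
```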

We next wish to construct confidence intervals for the true population mean µ, the true sub-population means µ1 and µ2, and the true difference in sub-population means µ1 - µ2.

1. Use your data and the **ZCONFINT** or **TCONFINT** program as appropriate to construct separate 95% confidence intervals for µ, µ1, and µ2.

2. Do you feel that the margins of error are small enough to provide useful information and to make the confidence intervals meaningful statistics?

3. Is it now possible for µ1 and µ2 to be equal; that is, is there any overlap in their confidence intervals? Could the averages be independent of sub-population?

4. Use the **ZDIFMNCI** or **TDIFMNCI** program as appropriate to find a 95% confidence interval for the difference µ1 - µ2. How does this interval help to determine whether the means depend on the sub-population?
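For readers who want to check their calculator results, the intervals above can be approximated in Python. The sketch below is not a port of the calculator programs: it assumes large samples (30 or more per group), uses z critical values with the sample standard deviation in place of σ, and ignores any finite population correction. The data and function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided z critical value, about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def mean_ci(data, confidence=0.95):
    """Large-sample confidence interval for a population mean."""
    center = statistics.mean(data)
    margin = z_critical(confidence) * statistics.stdev(data) / math.sqrt(len(data))
    return center - margin, center + margin


def diff_means_ci(data1, data2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2 from independent samples."""
    center = statistics.mean(data1) - statistics.mean(data2)
    se = math.sqrt(statistics.variance(data1) / len(data1)
                   + statistics.variance(data2) / len(data2))
    margin = z_critical(confidence) * se
    return center - margin, center + margin


# Hypothetical samples of size 35 and 30
w1 = [6, 7, 7, 8, 8, 8, 9] * 5
w2 = [5, 6, 6, 7, 7, 8] * 5
print("mu :", mean_ci(w1 + w2))
print("mu1:", mean_ci(w1))
print("mu2:", mean_ci(w2))
print("mu1 - mu2:", diff_means_ci(w1, w2))
```

If the interval for µ1 - µ2 does not contain 0, the data suggest that the two sub-population means differ at the chosen confidence level.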

Next, we shall use the sample data to perform hypothesis tests on the initial personal estimates that you made.

1. Let m denote your personal estimate for µ. Choose an appropriate level of significance, then use your data and the **ZMNTEST** or **TMNTEST** program as appropriate to test the following three hypotheses:

2. Now let m denote your personal estimate of the difference µ1 - µ2. Choose an appropriate level of significance, then use your data and the **Z2MNTEST** or **T2MNTEST** program as appropriate to test the following three hypotheses:

3. At the 0.05 level of significance, test whether the average measurement is the same for each sub-population. That is, test the hypothesis H0: µ1 - µ2 = 0. What can be said now about the mean being independent of sub-population?
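The tests above can be cross-checked in Python with a large-sample z test. This sketch rests on several assumptions: it supposes that the three hypotheses are the usual two-sided, lower-tail, and upper-tail alternatives to a null value m (the list is not reproduced in this text), it uses the sample standard deviation in place of σ, and it is not a port of the calculator programs. The data and function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def p_value(z, alternative):
    """P-value of a z statistic for the chosen alternative hypothesis."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":  # Ha: parameter != m
        return 2 * (1 - cdf(abs(z)))
    if alternative == "less":       # Ha: parameter < m
        return cdf(z)
    if alternative == "greater":    # Ha: parameter > m
        return 1 - cdf(z)
    raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")


def z_test_mean(data, m, alternative="two-sided"):
    """Test H0: mu = m using one large sample."""
    se = statistics.stdev(data) / math.sqrt(len(data))
    z = (statistics.mean(data) - m) / se
    return z, p_value(z, alternative)


def z_test_diff(data1, data2, m=0.0, alternative="two-sided"):
    """Test H0: mu1 - mu2 = m using two independent large samples."""
    se = math.sqrt(statistics.variance(data1) / len(data1)
                   + statistics.variance(data2) / len(data2))
    z = (statistics.mean(data1) - statistics.mean(data2) - m) / se
    return z, p_value(z, alternative)


# Hypothetical samples of size 35 and 30
w1 = [6, 7, 7, 8, 8, 8, 9] * 5
w2 = [5, 6, 6, 7, 7, 8] * 5
for alt in ("two-sided", "less", "greater"):
    print(alt, z_test_mean(w1 + w2, 7.0, alt))
print("H0: mu1 - mu2 = 0:", z_test_diff(w1, w2))
```

At the 0.05 level of significance, the null hypothesis is rejected when the reported p-value is below 0.05.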

Draw any conclusions or add any comments that you like.

For further study, you can also test various hypotheses about the means of your sub-populations.
