Between Two Independent Normal Populations

Consider two independent normally distributed populations having unknown means µx and µy, respectively. We wish to construct a confidence interval for the difference in means µx - µy. We first estimate this difference by the difference in sample means Xbar - Ybar, computed from independent random samples of sizes n and m drawn from the respective populations. The confidence interval then has the form (Xbar - Ybar) +/- e, where e is an appropriate margin of error.

If we assume that the populations have the same variance, then the standardized difference in sample means has a t distribution with n + m - 2 degrees of freedom. Moreover, we no longer need large sample sizes to construct the confidence interval. First, we define a "pooled deviation" Sp to approximate the common standard deviation by

$$S_p = \sqrt{\frac{(n-1)S_x^2 + (m-1)S_y^2}{n+m-2}},$$

where Sx and Sy are the respective sample deviations. Then we define the t-score to be the value t such that P(-t <= t(n+m-2) <= t) = r, where r is the desired level of confidence. The confidence interval is then given by

$$(\bar{X} - \bar{Y}) \pm t\, S_p \sqrt{\frac{1}{n} + \frac{1}{m}}.$$

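To check the pooled computation outside the calculator, here is a minimal Python sketch, assuming the pooled-variance formula Sp² = ((n - 1)Sx² + (m - 1)Sy²)/(n + m - 2). The names `pooled_sd` and `pooled_margin` are hypothetical helpers, not part of **TDIFMNCI**, and the t-score is supplied by hand (for example, from a t table) rather than computed.

```python
import math


def pooled_sd(n: int, sx: float, m: int, sy: float) -> float:
    """Pooled estimate Sp of the common standard deviation of both populations."""
    # Each sample variance is weighted by its degrees of freedom (sample size - 1).
    return math.sqrt(((n - 1) * sx**2 + (m - 1) * sy**2) / (n + m - 2))


def pooled_margin(n: int, sx: float, m: int, sy: float, t: float) -> float:
    """Margin of error e = t * Sp * sqrt(1/n + 1/m) for a supplied t-score t."""
    return t * pooled_sd(n, sx, m, sy) * math.sqrt(1 / n + 1 / m)
```

With the statistics of Example 1 below (n = 29, Sx = 13, m = 68, Sy = 11), `pooled_sd(29, 13, 68, 11)` is about 11.625, and a tabled t-score of about 2.366 (98% confidence, 95 degrees of freedom) gives a margin of error of about 6.10.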
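If we do not assume that the populations have a common variance, then the standardized difference in sample means has approximately a t distribution with ν degrees of freedom, where ν is the greatest integer less than or equal to

$$\frac{\left(\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{S_x^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{S_y^2}{m}\right)^2}.$$

We use this t distribution to find the t-score t, and then the confidence interval is given by

$$(\bar{X} - \bar{Y}) \pm t \sqrt{\frac{S_x^2}{n} + \frac{S_y^2}{m}}.$$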

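The unpooled degrees-of-freedom rule is easy to mistype on a calculator, so here is a minimal Python sketch of it, assuming the value is rounded down to an integer as this section describes. `welch_df` and `unpooled_se` are hypothetical names chosen for illustration.

```python
import math


def welch_df(n: int, sx: float, m: int, sy: float) -> int:
    """Degrees of freedom for the unpooled interval, rounded down to an integer."""
    vx = sx**2 / n  # estimated variance of Xbar
    vy = sy**2 / m  # estimated variance of Ybar
    nu = (vx + vy) ** 2 / (vx**2 / (n - 1) + vy**2 / (m - 1))
    return math.floor(nu)


def unpooled_se(n: int, sx: float, m: int, sy: float) -> float:
    """Standard error sqrt(Sx^2/n + Sy^2/m) used in the unpooled interval."""
    return math.sqrt(sx**2 / n + sy**2 / m)
```

For the Example 2 statistics below (two samples of size 14 with Sx = 0.6902 and Sy = 0.43043), the unrounded value is about 21.78, so `welch_df` returns 21. The result always lies between the smaller of n - 1 and m - 1 and n + m - 2, so it never exceeds the pooled degrees of freedom.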
To execute the **TDIFMNCI** program to find such a confidence interval, we enter the size of the first sample, followed by Xbar and Sx, then the size of the second sample, Ybar, and Sy. Next we enter the desired level of confidence as a decimal (for example, .98 for 98%). To denote that we are using the pooled sample deviation for populations having a common variance, enter **1** for **POOLED**; otherwise enter **0**. The program displays the difference Xbar - Ybar, the margin of error, and the confidence interval.
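The program's own code is not listed in this section, so the following Python function is only a hypothetical re-creation of its inputs and outputs: the six sample statistics, a confidence level given as a decimal, and a POOLED flag of 1 or 0. It follows the pooled and unpooled cases described above. Because Python's standard library has no t quantile function, the sketch finds the t-score by integrating the t density numerically (Simpson's rule) and bisecting; that numerical method is our substitution, not necessarily what the calculator does.

```python
import math


def _t_central_prob(t: float, df: int, steps: int = 1000) -> float:
    """P(-t <= T <= t) for Student's t with df degrees of freedom.

    Twice the integral of the t density over [0, t], by Simpson's rule
    (steps must be even).
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    s = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return 2 * s * h / 3


def t_score(conf: float, df: int) -> float:
    """Value t with P(-t <= T <= t) = conf, found by bisection."""
    if not 0 < conf < 1:
        raise ValueError("conf must be strictly between 0 and 1")
    hi = 1.0
    while _t_central_prob(hi, df) < conf:  # widen until the target is bracketed
        hi *= 2
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if _t_central_prob(mid, df) < conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tdifmnci(n, xbar, sx, m, ybar, sy, conf, pooled):
    """Hypothetical re-creation of TDIFMNCI: returns (Xbar - Ybar, e, (low, high))."""
    diff = xbar - ybar
    if pooled:
        df = n + m - 2
        sp = math.sqrt(((n - 1) * sx**2 + (m - 1) * sy**2) / df)
        se = sp * math.sqrt(1 / n + 1 / m)
    else:
        vx, vy = sx**2 / n, sy**2 / m
        df = math.floor((vx + vy) ** 2 / (vx**2 / (n - 1) + vy**2 / (m - 1)))
        se = math.sqrt(vx + vy)
    e = t_score(conf, df) * se
    return diff, e, (diff - e, diff + e)
```

For example, `tdifmnci(29, 97, 13, 68, 109, 11, 0.98, 1)` returns a difference of -12 and a margin of error close to the 6.1009 reported in Example 1 below; passing `0` as the last argument switches to the unpooled degrees of freedom.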

**Example 1.** Many studies have been done on environmental and hereditary influences on IQ scores. In one such study, "The malleability of IQ as judged from adoption studies," in the journal

Let us assume that IQ scores are normally distributed, that these groups were independent, and that the true variances of scores are equal for the two entire populations of children from which these samples came. Find a 98% confidence interval for the true difference in average score between these two populations of children.

*Solution.* After calling up the **TDIFMNCI** program, enter **29** for **X SAMPLE SIZE**, **97** for **XBAR**, **13** for **X SAMPLE DEV.**, **68** for **Y SAMPLE SIZE**, **109** for **YBAR**, **11** for **Y SAMPLE DEV.**, **.98** for **CONF. LEVEL**, and **1** for **POOLED?**. We find that µx - µy ≈ -12 +/- 6.1009, or -18.1009 <= µx - µy <= -5.8991. Equivalently, 5.8991 <= µy - µx <= 18.1009. That is, we can be 98% confident that the average IQ score of all such children in the second group is from 5.9 to 18.1 points higher than the average score of the first group.
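As a hand check on the program's output, the pooled computation for Example 1 can be worked as follows. The t-score of about 2.366 (98% confidence, 95 degrees of freedom) is a rounded table value, so the last digits can differ slightly from the program's.

$$
\begin{aligned}
S_p^2 &= \frac{(29-1)(13)^2 + (68-1)(11)^2}{29+68-2} = \frac{4732 + 8107}{95} \approx 135.147, \qquad S_p \approx 11.625,\\
S_p\sqrt{\tfrac{1}{29} + \tfrac{1}{68}} &\approx 11.625 \times 0.2218 \approx 2.578,\\
e &\approx 2.366 \times 2.578 \approx 6.10.
\end{aligned}
$$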

If we have two data sets with an equal number of measurements, as in the example that follows, then we can enter the data into the **STAT Edit** (**LIST EDIT** on TI-86, **APPS 6** on TI-89) screen in order to compute the statistics Xbar, Sx, Ybar, and Sy. We can then access the statistics to enter these values into the program as explained in the previous section.
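Outside the calculator, the same four statistics can be computed from two raw data lists. This minimal sketch uses Python's standard `statistics` module; `statistics.stdev` divides by n - 1, which matches the sample deviation Sx (not the population deviation σx) that the program expects. The name `two_var_stats` is hypothetical, and the equal-length check mirrors the 2-Var Stats restriction described in this section.

```python
import statistics


def two_var_stats(xs, ys):
    """Return (Xbar, Sx, Ybar, Sy) for two data lists of equal length."""
    if len(xs) != len(ys):
        raise ValueError("2-Var Stats needs lists of equal length")
    # stdev uses the n - 1 divisor, i.e. the sample deviation.
    return (statistics.mean(xs), statistics.stdev(xs),
            statistics.mean(ys), statistics.stdev(ys))
```

For instance, `two_var_stats([1, 2, 3], [4, 6, 8])` gives means 2 and 6 with sample deviations 1.0 and 2.0 (toy values, not data from the text).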

**Example 2.** The "ego strength" scores of two independent samples of middle-aged men participating in fitness programs are given below. Assuming all such measurements are normally distributed, find a 90% confidence interval for the difference in mean ego strength between the Low Fitness and High Fitness populations of middle-aged men in such programs.

*Solution.* First, enter the data into the **STAT Edit** screen (**LIST EDIT** on TI-86, **APPS 6** on TI-89), then use the **2-Var Stats** command (**TwoVar** on the TI-86 and TI-89) to compute the statistics Xbar, Sx, Ybar, and Sy. For these random samples, each of size 14, we find Xbar = 4.64 and Sx = 0.6902 for the Low Fitness group, and Ybar = 6.43 and Sy = 0.43043 for the High Fitness group.

Next, call up the **TDIFMNCI** program and either enter these statistics directly or access the non-rounded values as explained in the previous section. Enter **0** for **POOLED**, since we have no grounds for assuming a common variance for the two populations.

We find that -2.1634 <= µx - µy <= -1.4152, or equivalently 1.4152 <= µy - µx <= 2.1634. That is, we can be 90% confident that the mean ego strength of all High Fitness participants is from 1.4152 to 2.1634 points higher than that of all Low Fitness participants.
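Working Example 2 by hand shows where the degrees of freedom come from. The values below use the rounded statistics quoted in the solution and a tabled t-score of about 1.721 for 90% confidence with 21 degrees of freedom, so they agree with the program's interval only to about two decimal places.

$$
\begin{aligned}
\frac{S_x^2}{n} &= \frac{0.6902^2}{14} \approx 0.03403, \qquad \frac{S_y^2}{m} = \frac{0.43043^2}{14} \approx 0.01323,\\
\nu &= \left\lfloor \frac{(0.03403 + 0.01323)^2}{\dfrac{0.03403^2}{13} + \dfrac{0.01323^2}{13}} \right\rfloor = \lfloor 21.78 \rfloor = 21,\\
e &\approx 1.721\sqrt{0.03403 + 0.01323} \approx 1.721 \times 0.2174 \approx 0.374,\\
\bar{X} - \bar{Y} \pm e &\approx -1.79 \pm 0.374 \approx [-2.16,\ -1.42].
\end{aligned}
$$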

Exercises

1. In the article "Sex Differences in Mental Test Scores, Variability, and Numbers of High Scoring" in the journal Science (Vol. 269, July 7, 1995), it is stated that males traditionally have a greater variance in scores on mathematics tests than do females. Males tend to score at the higher and lower ends, while females' scores are more consistent. However, the variances on other tests are usually equal. Moreover, males generally tend to average higher on mathematics tests, while females tend to average higher on tests involving reading comprehension.

So assume that Verbal SAT scores of girls and boys are normally distributed with a common variance. We wish to see whether there is an appreciable difference in the average scores. The following data are a random sample of Verbal SAT scores from a group of sophomores at a randomly chosen university. Find a 90% confidence interval for the difference between the girls' and boys' average scores.


2. (Data sets of different sizes.) Suppose we obtain the following additional random scores to add to the above data:

Add this new data and find a new 90% confidence interval for the difference between the average scores of girls and boys.

3. If we now consider Math SAT scores, then according to the article mentioned above, we can no longer assume equal variances for boys' and girls' scores. The data below are a random sample of Math SAT scores from a group of sophomores at a randomly chosen university. Assuming that all scores for both boys and girls are normally distributed, find a 95% confidence interval for the true difference between the average scores of boys and girls.


4. (Non-independent populations.) The data below are a random sample of pairs of grade point averages: for each student, the first is the final high school GPA and the second is the first-year college GPA. Assuming that both sets of GPAs come from normally distributed populations, find a 99% confidence interval for the average difference in GPA from high school to the first year of college.

| Coll | Coll | Coll | Coll | Coll |
|------|------|------|------|------|
| 1.87 | 1.13 | 1.43 | 3.17 | 3.16 |
| 2.27 | 3.4 | 2.74 | 1.57 | 2.09 |
| 2.56 | 1.96 | 1.95 | 1.71 | 3.6 |
| 2.35 | 1.96 | 2.42 | 2.09 | 0.85 |
| 3.02 | 3.17 | 3.04 | 3.4 | 1.88 |
| 2.44 | 1.6 | 3.00 | 2.1 | 2.53 |
| 1.33 | 2.37 | 3.73 | 1.66 | 2.81 |

Solutions

1. First, enter the data into the **STAT Edit** screen (**LIST EDIT** on TI-86, **APPS 6** on TI-89), then use the **2-Var Stats** command (**TwoVar** on the TI-86 and TI-89) to compute the statistics Xbar, Sx, Ybar, and Sy. We obtain Xbar = 507.143 and Sx = 110.279 for the girls, and Ybar = 490.476 and Sy = 65.687 for the boys.

Next, call up the **TDIFMNCI** program and enter these statistics. Use **1** for **POOLED** to denote a common variance for the two populations.

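We see that the 90% confidence interval for µx - µy is [-30.4986, 63.8319]; hence, based on this data, girls average from 30.4986 points lower to 63.8319 points higher than boys on Verbal SAT scores. Since this interval contains 0, the data do not show an appreciable difference between the girls' and boys' average scores at this confidence level.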
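A quick hand check of this interval: the group sizes are 21 each (Exercise 2 adds 4 and 6 scores to reach 25 and 27), so both sample variances get weight 20 in the pooled estimate. The t-score of about 1.684 for 90% confidence with 40 degrees of freedom is a rounded table value.

$$
\begin{aligned}
S_p^2 &= \frac{20(110.279)^2 + 20(65.687)^2}{40} \approx 8238.1, \qquad S_p \approx 90.76,\\
e &\approx 1.684 \times 90.76\sqrt{\tfrac{1}{21} + \tfrac{1}{21}} \approx 1.684 \times 28.01 \approx 47.17,\\
\bar{X} - \bar{Y} \pm e &\approx 16.667 \pm 47.17 \approx [-30.5,\ 63.8].
\end{aligned}
$$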

2. We first add the additional 4 girls' scores and the additional 6 boys' scores to the appropriate columns in the list editor. Since the data sets no longer have the same size, we cannot use the **2-Var Stats** command to compute the sample means and sample deviations. Instead, compute them separately with the **1-Var Stats** command (**OneVar** on the TI-86 and TI-89).

For the girls, the sample size is 25, the sample mean is 524, and the sample deviation is 110.8678. For the boys, the sample size is 27, the sample mean is 495.5556, and the sample deviation is 67.33.

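After entering these values into the **TDIFMNCI** program (again with **1** for **POOLED**), we obtain a 90% confidence interval of [-13.8242, 70.7132]. This interval still contains 0, so the enlarged data set also does not show an appreciable difference.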
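With unequal sample sizes, the pooled estimate weights the girls' variance by 24 and the boys' by 26, and the degrees of freedom become 25 + 27 - 2 = 50. Using a rounded table value t ≈ 1.676 for 90% confidence, a hand check runs:

$$
\begin{aligned}
S_p^2 &= \frac{24(110.8678)^2 + 26(67.33)^2}{50} \approx 8257.3, \qquad S_p \approx 90.87,\\
e &\approx 1.676 \times 90.87\sqrt{\tfrac{1}{25} + \tfrac{1}{27}} \approx 1.676 \times 25.22 \approx 42.27,\\
\bar{X} - \bar{Y} \pm e &\approx 28.444 \pm 42.27 \approx [-13.8,\ 70.7].
\end{aligned}
$$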

3. Compute the statistics and execute the **TDIFMNCI** program with **0** for **POOLED**. We obtain a 95% confidence interval of [10.7198, 85.5302]. So, based on this data, we can be 95% confident that boys average from 10.7198 to 85.5302 points higher than girls on the Math SAT.

4. Since the first and second GPAs come from the same student, the measurements are clearly dependent, so the two-sample method above does not apply. Instead, we create a single new sample consisting of the differences (first GPA minus second GPA) and enter its statistics into the **TCONFINT** program, which gives a confidence interval for the mean of a normally distributed population. To enter the data under **L1** in the **STAT Edit** screen (**LIST EDIT** on the TI-86, **APPS 6** on the TI-89), we enter the differences 2.6-1.87, 3.1-2.27, etc.

After completing the **TCONFINT** program, we find with 99% confidence that the average high school GPA is from 0.4937 to 1.2006 points higher than the average first-year college GPA.
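The paired approach in the fourth solution amounts to a one-sample t interval on the differences. Here is a minimal Python sketch, assuming the caller supplies the t-score for k - 1 degrees of freedom (k = number of pairs); `paired_interval` is a hypothetical helper, not the **TCONFINT** program. Only the first two high school GPAs (2.6 and 3.1) appear in the text, so no full-data usage is shown.

```python
import math
import statistics


def paired_interval(first, second, t):
    """Interval for the mean of first[i] - second[i], given a t-score t with k - 1 df."""
    if len(first) != len(second):
        raise ValueError("paired data must have the same number of entries")
    diffs = [a - b for a, b in zip(first, second)]  # e.g. high school minus college GPA
    k = len(diffs)
    dbar = statistics.mean(diffs)
    e = t * statistics.stdev(diffs) / math.sqrt(k)  # margin of error
    return dbar - e, dbar + e
```

The design choice is the point of the exercise: pairing turns two dependent samples into one sample of differences, so the one-sample interval applies and the two-sample formulas are never used.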
