Confidence Interval for the Difference of Means
Between Two Independent Arbitrary Populations

 ZDIFMNCI.83p ZDIFMNCI.86p zdifmnci.89p

Consider two independent populations having unknown means µx and µy respectively. We wish to construct a confidence interval for the difference in means µx - µy. We first estimate this difference with the difference in sample means Xbar - Ybar, computed from independent random samples of sizes n and m drawn from the respective populations. Then the confidence interval is of the form (Xbar - Ybar) +/- e, where e is an appropriate margin of error.

For this scenario, each population could be "large" or "small" relative to the respective sample size. If Population 1 is of a small finite size N, then the sample deviation Sx is adjusted by multiplying it by the finite population correction factor Sqrt[ (N - n) / (N - 1) ]. The sample deviation Sy is adjusted similarly, using the second population's size and the sample size m, if the second population is of a small finite size.
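The correction is easy to check numerically. The Python sketch below is illustrative only; the function name fpc_adjust is an assumption introduced here and is not part of the ZDIFMNCI program.

```python
from math import sqrt

def fpc_adjust(s, n, N):
    """Return the sample deviation s multiplied by the finite
    population correction factor Sqrt[(N - n) / (N - 1)].

    s: sample deviation, n: sample size, N: population size.
    Hypothetical helper for illustration; not taken from ZDIFMNCI.
    """
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return s * sqrt((N - n) / (N - 1))

# With s = 1 the result is the correction factor itself.
# A sample of 30 from a population of 982 changes the deviation only
# slightly; a sample of 810 from 4160 shrinks it much more.
print(fpc_adjust(1.0, 30, 982))
print(fpc_adjust(1.0, 810, 4160))
```

The factor approaches 1 as N grows relative to n, which is why no adjustment is made for a "large" population.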

Once again, if the desired level of confidence is r, then we define the z-score z to be the value such that P(-z <= Z <= z) = r, where Z ~ N(0,1). If we let Sx and Sy denote the sample deviations of the respective random samples, then the margin of error is given by e = z Sqrt[ (Sx)^2 / n + (Sy)^2 / m ] .
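The z-score defined above can be obtained from the standard normal quantile function. The following is a minimal Python sketch using the standard library; the function name z_score is an assumption made for illustration.

```python
from statistics import NormalDist

def z_score(r):
    """Return z such that P(-z <= Z <= z) = r for Z ~ N(0,1)."""
    if not 0 < r < 1:
        raise ValueError("confidence level must be strictly between 0 and 1")
    # A central area of r leaves (1 - r)/2 in each tail, so z is the
    # (1 + r)/2 quantile of the standard normal distribution.
    return NormalDist().inv_cdf((1 + r) / 2)

for r in (0.90, 0.95, 0.99):
    print(r, round(z_score(r), 4))
```

For the common confidence levels 0.90, 0.95, and 0.99, this gives the familiar values of about 1.645, 1.960, and 2.576.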

The confidence interval is given by

(Xbar - Ybar) +/- z*Sqrt[ (Sx)^2 / n + (Sy)^2 / m ].

Usually, samples of sizes n >= 30 and m >= 30 are considered sufficient to use the analysis above. In practice, however, much larger sample sizes are needed if one wishes to obtain a reasonably small margin of error.

Using the ZDIFMNCI Program

To execute the program, we first enter 1, 2, 3, or 4 to designate the types of populations we have under study: (1) two large populations; (2) the first large and the second finite; (3) the first finite and the second large; or (4) two finite populations. If a population is finite, then we enter its population size. Next, enter the size of the first sample, followed by Xbar and Sx, then enter the size of the second sample, Ybar, and Sy. Lastly, we enter the desired level of confidence (in decimal). The program displays the difference Xbar - Ybar, the margin of error, and the confidence interval.
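The calculation described above can be sketched in Python as follows. This is not the calculator program's source code, only an illustration of the same arithmetic under the formulas given earlier; the function name diff_means_ci and the parameters pop_x and pop_y are assumptions. Instead of a menu choice of 1 through 4, the sketch takes an optional population size for each sample.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci(n, xbar, sx, m, ybar, sy, conf, pop_x=None, pop_y=None):
    """Return (difference, margin of error, (lower, upper)).

    pop_x / pop_y are population sizes for finite populations; leave
    them as None for a "large" population. The four program options
    correspond to: (1) both None, (2) only pop_y given,
    (3) only pop_x given, (4) both given.
    """
    if pop_x is not None:
        # Finite population correction for the first sample deviation.
        sx = sx * sqrt((pop_x - n) / (pop_x - 1))
    if pop_y is not None:
        # Finite population correction for the second sample deviation.
        sy = sy * sqrt((pop_y - m) / (pop_y - 1))
    z = NormalDist().inv_cdf((1 + conf) / 2)
    diff = xbar - ybar
    e = z * sqrt(sx**2 / n + sy**2 / m)
    return diff, e, (diff - e, diff + e)

# Inputs of Example 1 (two large populations, so no population sizes).
diff, e, (lower, upper) = diff_means_ci(650, 38560, 4350, 700, 34920, 3870, 0.95)
print(f"{diff} +/- {e:.4f}, interval [{lower:.2f}, {upper:.2f}]")
```

Passing pop_x=4160, for instance, would apply the correction to the first sample only, which corresponds to option 3.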

Example 1. A national random survey of 650 workers with a college degree yields an average income of Xbar = $38,560 with a standard deviation of Sx = $4350. An independent random survey of 700 workers without a college degree yields an average income of Ybar = $34,920 with a standard deviation of Sy = $3870. Find a 95% confidence interval for the true difference in average incomes.

Solution. After calling up the ZDIFMNCI program, enter 1 to designate two "large" populations. Next, enter 650 for X SAMPLE SIZE, 38560 for XBAR, 4350 for X SAMPLE DEV., 700 for Y SAMPLE SIZE, 34920 for YBAR, 3870 for Y SAMPLE DEV., and .95 for CONF. LEVEL. We see that (µx - µy) ~ 3640 +/- 440.4781. In other words, workers with a degree average somewhere from $3199.52 to $4080.48 more in income.
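As a check on the program's output, the margin of error can be worked by hand from the formula for e, taking z to be approximately 1.959964 for 95% confidence. Intermediate values are rounded.

```latex
\begin{aligned}
e &= z\sqrt{\frac{S_x^2}{n} + \frac{S_y^2}{m}}
   = 1.959964\sqrt{\frac{4350^2}{650} + \frac{3870^2}{700}} \\
  &\approx 1.959964\sqrt{29111.54 + 21395.57}
   \approx 1.959964 \times 224.738 \approx 440.48
\end{aligned}
```

Subtracting and adding this margin to 3640 reproduces the interval endpoints of about 3199.52 and 4080.48.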

Using Data Sets of a Common Size

If we have two data sets with an equal number of measurements, then we can enter the data into the STAT Edit screen (LIST EDIT on the TI-86, APPS 6 on the TI-89) in order to compute the statistics Xbar, Sx, Ybar, and Sy with the 2-Var Stats command (TwoVar on the TI-86 and TI-89). We then can access the statistics to enter these values into the program as follows:

On the TI-83: For X SAMPLE SIZE, press VARS, press 5, press 1, press ENTER. For XBAR, press VARS, press 5, press 2, press ENTER. For X SAMPLE DEV., press VARS, press 5, press 3, press ENTER. For Y SAMPLE SIZE, press VARS, press 5, press 1, press ENTER. For YBAR, press VARS, press 5, press 5, press ENTER. For Y SAMPLE DEV., press VARS, press 5, press 6, press ENTER.

On the TI-86: For X SAMPLE SIZE, type 2nd ALPHA 9 to obtain n, press ENTER. For XBAR, press STAT (i.e., 2nd +), press F5, then press F1, press ENTER. For X SAMPLE DEV., press STAT, press F5, press F3, press ENTER. For Y SAMPLE SIZE, type 2nd ALPHA 9 for n, press ENTER. For YBAR, press STAT, press F5, then press F4, press ENTER. For Y SAMPLE DEV., press STAT, press F5, press MORE, press F1, press ENTER.

On the TI-89: For X SAMPLE SIZE, type and enter nStat. For XBAR, press CHAR (i.e., 2nd +), press 2, then scroll down to Xbar (item A), press ENTER. For X SAMPLE DEV., type and enter Sx. For Y SAMPLE SIZE, type and enter nStat. For YBAR, press CHAR, press 2, then scroll down to Ybar (item B), press ENTER. For Y SAMPLE DEV., type and enter Sy.

Exercises

1. We wish to see if there is any apparent difference in high school grade point average between girls and boys who choose to go to college. The data below is a random collection of high school GPAs from a group of sophomores at a random university. Find a 90% confidence interval for the difference between average female and average male grade point average in the following cases:

(a) the samples are to represent all students nationwide.
(b) the samples are to represent only the 1254 female sophomores and the 982 male sophomores at that university.

Random Collection of Female High School GPAs
 3.25 3.25 3 3 4 3.6 3 3.25 3.4 3.6 3.75 3.7 3 3.25 3.5 3.8 3 2.8 4 3.25 2.75 3.1 3.75 3.5 3.4 3.75 3.25 3.3 2.7 3.1

Random Collection of Male High School GPAs
 3.75 3 2.3 2.9 3 4 2.1 3.5 2.1 2.5 4 3.75 3.75 3 3.4 4 2.4 2.5 2.9 2.7 3.75 4 2.5 2.5 3.75 2.2 3.7 4 2.8 2.3

2. (Data sets of different sizes). Suppose we obtain the following additional random GPAs to add to the above data:

More Random Female High School GPAs

 3.85 3.05 4 2.65 3.8 3.5 3.45 3 3.6 3.4

More Random Male High School GPAs

 3.65 2.4 3.3 3.5 2.6 3.4 2.7 3

Add this new data and find a new 90% confidence interval for the difference between average female and average male grade point average in the same two cases as in Exercise 1.

3. A survey of 810 married men in a county that has 4160 married men found that the mean age of first marriage was 25.2 years with a sample deviation of 2.4 years. A national survey of 850 women found that the mean age of first marriage was 23.3 years with a sample deviation of 2.1 years. Find a 95% confidence interval for the difference in average age at first marriage between men in this county and women nationwide.

Solutions

1. First, enter the data into the STAT Edit screen (LIST EDIT on the TI-86, APPS 6 on the TI-89), then use the 2-Var Stats command (TwoVar on the TI-86 and TI-89) to compute the statistics. For these random samples of size 30, we see that XBar ~ 3.333 and Sx ~ 0.35727 for the girls and YBar ~ 3.1017 and Sy ~ 0.673448 for the boys.

Next, call up the ZDIFMNCI program and either enter these statistics directly or access the non-rounded values as explained above under Using Data Sets of a Common Size.

(a) For two large populations, we obtain a 90% confidence interval for µx - µy of [0.0027, 0.4606]. That is, based on this data we may say with 90% confidence that, nationally, the average high school GPA of females is greater than that of males by as little as 0.0027 or by as much as 0.4606.

(b) For the two finite populations of sizes 1254 and 982 respectively, we obtain a 90% confidence interval of [0.006, 0.4574]. So just for sophomores at this university, we may say with 90% confidence that the average high school GPA of females is greater than that of males by as little as 0.006 or by as much as 0.4574.

2. We first add the 10 additional girl GPAs and the 8 additional boy GPAs in the appropriate columns in the list editor. Since the data sets no longer have the same size, we cannot use the 2-Var Stats command to compute the sample means and sample deviations. Instead, compute them separately with the 1-Var Stats command (OneVar on the TI-86 and TI-89).

For the girls, the sample size is 40, the sample mean is 3.3575, and the sample deviation is 0.37151. For the boys, the sample size is 38, the sample mean is 3.094736842, and the sample deviation is 0.629327126.
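These summary statistics can also be reproduced away from the calculator. The Python sketch below uses the standard library's mean and stdev (the latter divides by n - 1, matching the sample deviation) on the combined data from Exercises 1 and 2; the variable names are chosen here for illustration.

```python
from statistics import mean, stdev

# Female GPAs: the 30 values from Exercise 1 followed by the 10 new ones.
girls = [3.25, 3.25, 3, 3, 4, 3.6, 3, 3.25, 3.4, 3.6,
         3.75, 3.7, 3, 3.25, 3.5, 3.8, 3, 2.8, 4, 3.25,
         2.75, 3.1, 3.75, 3.5, 3.4, 3.75, 3.25, 3.3, 2.7, 3.1,
         3.85, 3.05, 4, 2.65, 3.8, 3.5, 3.45, 3, 3.6, 3.4]

# Male GPAs: the 30 values from Exercise 1 followed by the 8 new ones.
boys = [3.75, 3, 2.3, 2.9, 3, 4, 2.1, 3.5, 2.1, 2.5,
        4, 3.75, 3.75, 3, 3.4, 4, 2.4, 2.5, 2.9, 2.7,
        3.75, 4, 2.5, 2.5, 3.75, 2.2, 3.7, 4, 2.8, 2.3,
        3.65, 2.4, 3.3, 3.5, 2.6, 3.4, 2.7, 3]

for label, data in (("girls", girls), ("boys", boys)):
    # Sample size, sample mean, and sample deviation for each group.
    print(label, len(data), round(mean(data), 6), round(stdev(data), 6))
```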

(a) After entering these values in the ZDIFMNCI program for two large populations, we obtain a 90% confidence interval of [0.069, 0.4565].

(b) Upon entering these values in the ZDIFMNCI program for two finite populations of sizes 1254 and 982 respectively, we obtain a 90% confidence interval of [0.0726, 0.453].

3. Bring up the ZDIFMNCI program and enter 3 for a finite population against a large population, then enter 4160 for the first population size. Next, enter the summary statistics for each sample. We obtain a 95% confidence interval of [1.6952, 2.1048]. Thus, men in this select county average from 1.6952 years older to 2.1048 years older at the age of first marriage compared to women nationwide.
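The arithmetic behind this interval can be written out as a check. Only the men's sample deviation receives the finite population correction, since the women's survey is national. The symbol S_x' for the adjusted deviation is notation introduced here, z is taken as approximately 1.959964, and intermediate values are rounded.

```latex
\begin{aligned}
S_x' &= 2.4\sqrt{\frac{4160 - 810}{4160 - 1}} = 2.4\sqrt{\frac{3350}{4159}} \approx 2.1540 \\
e &\approx 1.959964\sqrt{\frac{2.1540^2}{810} + \frac{2.1^2}{850}}
   \approx 1.959964 \times 0.10448 \approx 0.2048 \\
(25.2 - 23.3) \pm e &\approx 1.9 \pm 0.2048 = [1.6952,\ 2.1048]
\end{aligned}
```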