Goodness of Fit Test for Normality

 TESTNORM.83p TESTNORM.86p testnorm.89p

Given a random sample of measurements, x1, x2, . . . , xn from a population A, we would like to test the hypothesis that A is normally distributed. Assuming that A is normal with mean µ and standard deviation s, and that X denotes a random measurement from A, then (X - µ) / s follows a standard normal distribution Z ~ N(0,1). If we let yi = (xi - µ) / s, for 1 <= i <= n, then the data y1, y2, . . . , yn would be a random sample from this standard normal distribution.

To test the "goodness of fit," we will divide the real line (-Infinity, Infinity) into m subdivisions, or bins, U1, U2, . . . ,Um-1, Um, where m is usually chosen to be from 1 / 5 to 1 / 10 of the sample size. The division is as follows: U1 = (-Infinity, -3], U2 = (-3, -3 + 6 / (m - 2) ], . . . , Um-1 = (-3 + 6(m - 3) / (m - 2), 3], and Um = (3, Infinity). The m - 2 middle bins are of equal length 6 / (m - 2).
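The cut points of this division can be sketched in a few lines of Python. This is only an illustration of the formula above, not the calculator program itself; the function name bin_edges is an assumption made for this example.

```python
def bin_edges(m):
    """Return the m - 1 cut points -3, -3 + 6/(m - 2), ..., 3 that separate the m bins."""
    if m < 3:
        raise ValueError("need at least 3 bins: two tails and one middle bin")
    # Computing 6 * k / (m - 2) makes the last cut point exactly 3.
    return [-3 + 6 * k / (m - 2) for k in range(m - 1)]

edges = bin_edges(12)
print(len(edges), edges[0], edges[-1])  # 11 -3.0 3.0
```

With 12 bins, the 10 middle bins each have length 6 / 10 = 0.6.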

We then count how many of the measurements yi lie in each subdivision and denote this frequency by fk, for 1 <= k <= m.

Since Z ~ N(0,1), the expected number of measurements that should fall in Uk is ek = n*P(Z in Uk ). We wish to compare the actual frequencies fk with these expected frequencies ek. If the differences are too large, then we must reject the null hypothesis that A is Normal. To measure the differences, we use a test statistic W defined by

W = Sum[ (fk - ek)^2 / ek , {k ,1, m} ] .
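As a minimal sketch of the steps so far, the following Python computes the frequencies fk, the expected counts ek, and W for data that has already been standardized. The names phi and chi_square_statistic are assumptions for this example, and the normal probabilities are taken from the error function in the standard library rather than from a calculator routine.

```python
import math
from bisect import bisect_left

def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def chi_square_statistic(y, m):
    """Return (frequencies, expected counts, W) for standardized data y and m bins."""
    n = len(y)
    edges = [-3 + 6 * k / (m - 2) for k in range(m - 1)]
    # Bins are closed on the right, so bisect_left places a value equal to
    # a cut point into the bin that ends at that cut point.
    freq = [0] * m
    for v in y:
        freq[bisect_left(edges, v)] += 1
    cdf = [0.0] + [phi(c) for c in edges] + [1.0]
    expected = [n * (cdf[k + 1] - cdf[k]) for k in range(m)]
    w = sum((f - e) ** 2 / e for f, e in zip(freq, expected))
    return freq, expected, w
```

The expected counts depend only on n and m, so for a fixed sample size and number of bins they are the same whichever parameters are specified.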

For large sample sizes n, W follows an approximate Chi-Square distribution with m - 1 degrees of freedom.

However, if µ is unknown or unspecified, then we must replace it with Xbar in the definition of yi. Likewise, if s is unknown or unspecified, then we must replace it with its maximum likelihood estimate sigma(x) in the definition of yi. That is, we may instead have to let yi = (xi - Xbar ) / sigma(x), or yi = (xi - µ ) / sigma(x) , or yi = (xi - Xbar) / s.

In these cases, W is still defined as above, but now W follows an approximate Chi-Square distribution with m - 1 - e degrees of freedom, where e = 1 or e = 2 denotes the number of (maximum likelihood estimate) replacements made in the definition of yi. We then use this Chi-Square distribution to compute the p-value (right-tail value) created by the test statistic.

Note: Here sigma(x) denotes the maximum likelihood estimate of the standard deviation, namely Sqrt[ Sum[ (xi - Xbar)^2 , {i, 1, n} ] / n ], or Sqrt[ Sum[ (xi - µ)^2 , {i, 1, n} ] / n ] if µ is specified.
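The four possible definitions of yi, together with the count e of replacements, can be sketched as follows. The function name standardize and its keyword arguments are assumptions for this illustration; passing None stands for "unknown or unspecified".

```python
import math

def standardize(x, mu=None, s=None):
    """Return (y, e): the standardized data and the number of estimated parameters."""
    n = len(x)
    e = 0
    if mu is None:
        mu = sum(x) / n  # replace the mean by Xbar
        e += 1
    if s is None:
        # Maximum likelihood estimate: divide by n, not n - 1.
        # Deviations are taken about the specified mean when one is given.
        s = math.sqrt(sum((xi - mu) ** 2 for xi in x) / n)
        e += 1
    return [(xi - mu) / s for xi in x], e

y, e = standardize([1, 2, 3, 4, 5], mu=3)
print(e)  # 1, so W would be compared with m - 1 - 1 degrees of freedom
```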

Suppose we obtain a small p-value p. If A were normally distributed, then the chance of the test statistic W being as large as it is would only be p. In other words, the probability is approximately p that the frequencies of occurrences in the various bins differ by as much as they do from the expected numbers of occurrences. For p small enough (perhaps below a specified level of significance), we reject the null hypothesis that A is normally distributed (with the stated parameters).
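For readers without a Chi-Square routine at hand, the right-tail value can be sketched with the Python standard library alone. This is one possible way to evaluate it, using a standard recurrence for the upper incomplete gamma function; it is not necessarily how the calculator computes it, and the name chi_square_right_tail is an assumption.

```python
import math

def chi_square_right_tail(w, df):
    """P(Chi-Square with df degrees of freedom >= w), for integer df >= 1."""
    if w <= 0:
        return 1.0
    t = w / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)             # Q(1, t) = exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))  # Q(1/2, t) = erfc(sqrt(t))
    # Recurrence: Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1)
    while a < df / 2:
        q += t ** a * math.exp(-t) / math.gamma(a + 1)
        a += 1
    return q
```

Here df would be m - 1 - e, and a small returned value corresponds to a small p-value in the sense described above.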

Using the TESTNORM Program

The TESTNORM program performs this goodness of fit test. Before executing the program:

On TI-83+, enter the data points into list L1 in the STAT Edit screen.

On TI-86, enter the data points into list L1 in the LIST EDIT screen.

On TI-89, enter the data points into column c1 in a Data Editor called dist.

Then call up the TESTNORM program.

On TI-83+, a drop down menu appears which allows you to specify which parameters you wish to enter.

On TI-86, a toolbar appears which allows you to specify which parameters you wish to enter.

On TI-89, press F1 to obtain a drop down menu which allows you to specify which parameters you wish to enter.

The program will sort the data, then compute the frequencies and the expected number of occurrences for each bin. The program displays the test statistic W and the p-value.

Warning: You might run out of memory with a large data set. If so, then you might be able to execute the program after clearing some memory space.

Example. The following measurements are a random sample of combined SAT scores from a group of students at a randomly chosen university.

Random Sample of Combined SAT Scores
 1270 850 1030 910 1330 760 1300 1060 1160 1220 1280 1020 980 1070 810 1070 1040 1050 1160 1020 1400 800 810 830 1040 990 1130 950 880 720 1070 1240 1240 1320 1040 880 1070 1080 1170 1320 820 910 1230 1300 750

Using a collection of 12 bins, test the hypothesis that the distribution of all SAT scores of students at this university

(a) is normally distributed with mean 980,
(b) is normally distributed.

Solution. (a) After entering the data, call up the TESTNORM program and designate that you wish to specify the mean. Then enter 980 for MEAN and enter 12 for NUMBER OF BINS.

We obtain a right-tail value of 0.00743 from a test statistic of 24.064564 (on 12 - 1 - 1 = 10 degrees of freedom, since only the standard deviation is estimated). With this low p-value, we have significant evidence to reject the hypothesis that the scores are normally distributed with a mean of 980. If that hypothesis were true, there would be only a 0.00743 probability of the frequencies differing by as much as they do from the expected values.

On the TI-83+, press STAT, press ENTER:
1. The data points have been sorted into increasing order in list L1.
2. The list L2 contains the frequency of the data yi occurring in each bin.
3. The list L3 contains the expected numbers of occurrences for each bin.

On the TI-86, press LIST, press F4:
1. The data points have been sorted into increasing order in list L1.
2. The list L2 contains the frequency of the data yi occurring in each bin.
3. The list L3 contains the expected numbers of occurrences for each bin.
(You may have to add these list names into the LIST EDIT screen in order to see the results.)

On the TI-89, press APPS, press 6, press 1:
1. The data points have been sorted into increasing order in column c1.
2. The column c2 contains the frequency of the data yi occurring in each bin.
3. The column c3 contains the expected numbers of occurrences for each bin.

By comparing the frequencies {0, 0, 0, 1, 8, 6, 14, 4, 11, 1, 0, 0} with the expected values {0.06075, 0.30814, 1.248, 3.5613, 7.1632, 10.159, 10.159, 7.1632, 3.5613, 1.248, 0.30814, 0.06075}, we can see that there is not much "goodness" of fit to the specified normal distribution.
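The two lists can also be checked directly. The short Python sketch below recomputes W from the frequencies and the rounded expected values quoted above, then evaluates the right-tail value using the closed form that holds for an even number of degrees of freedom; the variable names are chosen for this example only.

```python
import math

observed = [0, 0, 0, 1, 8, 6, 14, 4, 11, 1, 0, 0]
expected = [0.06075, 0.30814, 1.248, 3.5613, 7.1632, 10.159,
            10.159, 7.1632, 3.5613, 1.248, 0.30814, 0.06075]

w = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# The mean is specified and the standard deviation is estimated, so one
# replacement is made: 12 - 1 - 1 = 10 degrees of freedom.
df = 10
t = w / 2
# For even df: P(Chi-Square >= w) = exp(-t) * Sum[ t^j / j!, {j, 0, df/2 - 1} ]
p_value = math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(df // 2))

print(round(w, 2), round(p_value, 5))
```

With these rounded expected values, W comes out near 24.06 and the right-tail value near 0.0074. The single largest contribution comes from the ninth bin, where 11 scores were observed against about 3.56 expected.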

So either the mean is not 980 or the scores do not come from a normally distributed population. For part (b), we simply test for normality without specifying any parameters. Re-execute the program, but initially enter 4 to designate that we are not entering any parameters (NEITHER).

We now obtain a p-value of 0.657566. With the high p-value, we do not have evidence to reject normality. The differences in the frequencies {0, 0, 1, 7, 5, 10, 9, 6, 6, 1, 0, 0} and the expected values are no longer significant.

Using Data in a Frequency Chart

 FREQ.83p FREQ.86p freq.89p

The TESTNORM program requires that raw data be entered into specific lists (L1 on the TI-83+ and TI-86, and c1 on the TI-89). But data is often given in a frequency chart that lists the number of occurrences of each measurement. If so, then we can use the FREQ program above to expand the chart into the required list of measurements before running the TESTNORM program.

On the TI-83+: Enter the measurements into list L5 and the frequencies into list L6.

On the TI-86: Enter the measurements into list xStat and the frequencies into list yStat.

On the TI-89: Enter the measurements into column c5 and the frequencies into column c6 in a DATA variable called dist.

After entering the data, execute the FREQ program, then execute the TESTNORM program.
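The listing of the FREQ program is not shown here, but the expansion it is described as performing can be sketched in Python as follows. The function name expand_frequency_chart and the small chart used to exercise it are hypothetical.

```python
def expand_frequency_chart(values, counts):
    """Repeat each value by its count, giving the raw data list."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    data = []
    for value, count in zip(values, counts):
        data.extend([value] * count)
    return data

# Hypothetical chart: the value 2 occurs three times, 5 once, and 7 twice.
raw = expand_frequency_chart([2, 5, 7], [3, 1, 2])
print(raw)  # [2, 2, 2, 5, 7, 7]
```

The resulting list plays the role of L1 (or c1) for the normality test.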

Example. A number of households were surveyed as to how many children lived at home. The responses are below:

 Number of Children      0   1   2   3   4   5   6
 Number of Households   60  42  86  59  22   4   2

Test whether or not the number of children per household follows a normal distribution.

Solution. Rather than enter all 275 measurements into the appropriate list required to execute the TESTNORM program, we enter the frequency chart into the proper lists described above. Then we execute the FREQ program.

Next, we execute the TESTNORM program without specifying the mean or standard deviation, using 7 bins. We obtain a low p-value of 2.125 E-11; thus, we can reject the hypothesis that the number of children per household is normally distributed. If it were normally distributed, then there would be virtually no chance of the frequencies {0, 0, 102, 86, 81, 4, 2} differing so much from the expected values in each bin. Is the result any different for more bins?

Exercise

The following measurements are weights (in pounds) from a random sample of men aged 40 to 49.

Random Weights of Men Aged 40-49
 218.5 262.75 133.5 152.75 165.25 198.5 165.5 178.25 191 247.25 205 135.75 136.25 192.5 173.75 200.25 163 187.5 191.75 217 127.5 176 184.25 172.75 194 175.25 185.25 202.25 212 158.25 177 156.5 196.75 170.75 158 160.25

Using 11 bins, test whether or not the data comes from a normally distributed population

(a) with standard deviation 15,
(b) with mean 180 and standard deviation 30.

Solution

(a) Execute the TESTNORM program by specifying a standard deviation of 15 (using the third item on the menu) and 11 bins. We obtain a displayed p-value of 0 (from a test statistic of 276.198). If the weights of 40-49 year old men were normally distributed with a standard deviation of 15, then there would be essentially no chance of the frequencies {3, 1, 1, 7, 5, 4, 5, 5, 1, 2, 2} differing so much from the expected occurrences in each bin. Therefore we must reject the hypothesis of normality with a standard deviation of 15.

(b) Re-execute the program by specifying a mean of 180 and a standard deviation of 30 (using the first item on the menu) and again using 11 bins. We now obtain a p-value of 0.9814 from frequencies {0, 0, 1, 3, 8, 10, 9, 3, 1, 1, 0}. With such a high p-value, we have no evidence to reject the hypothesis that the weights come from a normally distributed population having mean 180 and standard deviation 30. (Failing to reject is not proof that the hypothesis is true; the data are simply consistent with it.)