The Rank Sum Test

 RANKSUM.83p RANKSUM.86p ranksum.89p

The Wilcoxon Rank Sum test can be used to test the null hypothesis that two populations X and Y have the same continuous distribution. We assume that we have independent random samples x1, x2, ..., xm and y1, y2, ..., yn, of sizes m and n respectively, from the two populations. We then merge the data and rank each measurement from lowest to highest. All sequences of ties are assigned an average rank.
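The ranking step can be illustrated with a short Python sketch. This is only one possible way to assign average ranks to ties; the function name average_ranks and the small samples below are hypothetical and are not taken from the RANKSUM program.

```python
def average_ranks(values):
    """Return the 1-based rank of each value, giving tied values their average rank."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` to cover the whole run of values tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1; assign their mean.
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


x = [1, 2, 4]  # hypothetical sample from X
y = [2, 3, 5]  # hypothetical sample from Y
print(average_ranks(x + y))  # [1.0, 2.5, 5.0, 2.5, 4.0, 6.0]
```

Here the two measurements equal to 2 would occupy ranks 2 and 3, so each receives the average rank 2.5.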

The Wilcoxon test statistic W is the sum of the ranks from population X. If the two populations have the same continuous distribution (and no ties occur), then W has mean and standard deviation given by

µ = m (m + n + 1) / 2

and

s = Sqrt[ m n (N + 1) / 12 ],

where N = m + n.
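As a quick numerical check of these two formulas, here is a minimal Python sketch. The function name rank_sum_moments is hypothetical, and the sample sizes m = 30 and n = 32 are chosen only for illustration.

```python
import math


def rank_sum_moments(m, n):
    """Mean and standard deviation of W under the null hypothesis (no ties)."""
    total = m + n  # N in the text
    mu = m * (total + 1) / 2
    s = math.sqrt(m * n * (total + 1) / 12)
    return mu, s


mu, s = rank_sum_moments(30, 32)
print(mu, round(s, 3))  # 945.0 70.993
```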

We test the null hypothesis Ho: no difference in distributions. A one-sided alternative is Ha: the first population yields lower measurements. We use this alternative if we expect or see that W is unusually low compared to its expected value µ. In this case, the p-value is given by a normal approximation. We let N ~ N(µ, s) (here N denotes a normal random variable, not the combined sample size m + n) and compute the left-tail probability P(N <= W), using a continuity correction if W is an integer.

If we expect or see that W is much higher than its expected value, then we should use the alternative Ha: the first population yields higher measurements. In this case, the p-value is given by the right-tail probability P(N >= W), again using a continuity correction if needed. If the two sums of ranks from each population are close, then we could use a two-sided alternative Ha: there is a difference in distributions. In this case, the p-value is given by twice the smaller tail value (2*P(N <= W) if W < µ, or 2*P(N >= W) if W > µ).
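The three p-value rules can be sketched in Python as follows. This is an illustration under stated assumptions rather than the RANKSUM program itself: the function names and the labels "less", "greater", and "two-sided" are hypothetical, the continuity correction is assumed to be half a unit (W + 0.5 for the left tail, W - 0.5 for the right tail), and the two-sided value is capped at 1.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def rank_sum_p_value(w, m, n, alternative):
    """Normal-approximation p-value for the rank sum w of the first sample.

    alternative: "less", "greater", or "two-sided" (hypothetical labels).
    """
    mu = m * (m + n + 1) / 2
    s = math.sqrt(m * n * (m + n + 1) / 12)
    # Assumed continuity correction: half a unit, applied only when w is an integer.
    cc = 0.5 if float(w).is_integer() else 0.0
    left = normal_cdf((w + cc - mu) / s)     # P(N <= W)
    right = normal_cdf(-(w - cc - mu) / s)   # P(N >= W)
    if alternative == "less":
        return left
    if alternative == "greater":
        return right
    # Two-sided: twice the smaller tail, capped at 1.
    return min(1.0, 2 * min(left, right))


# Hypothetical rank sum of 8.5 from samples of sizes 3 and 3.
print(rank_sum_p_value(8.5, 3, 3, "less"))
```

The sample sizes in this usage line are far too small for the normal approximation to be trusted; they only show how the function is called.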

We note that the formulas for µ and s above assume that no ties occur; if there are ties, then the validity of this test is questionable.

Using the RANKSUM Program

Before executing the RANKSUM program, we must enter the X data into list L1 and enter the Y data into list L2 (use xStat and yStat on the TI-86, or c1 and c2 in a list called dist on the TI-89). Then execute the program and enter 1, 2, or 3 to specify the desired alternative: X < Y, X > Y, or X does not equal Y, respectively.

The program will first sort each list, then merge and sort the lists into list L3 (fStat or c3). Then it will put the rank of each measurement in L3 next to it in L4 (LW or c4). All sequences of ties are assigned an average rank. The expected sum µ of the ranks from population X is displayed followed by the actual sums of the ranks from the populations X and Y. The program then displays the P-value for the entered alternative.
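The first part of the program's output (the expected sum and the two actual rank sums) can be mimicked with the following Python sketch. It is not a translation of the calculator code: the function name rank_sums is hypothetical, the data are made up, and ranks are computed by counting smaller and equal values rather than by the program's own method.

```python
def rank_sums(x, y):
    """Return (expected rank sum for X, actual rank sum of X, actual rank sum of Y)."""
    # Sorting is not needed for the counting below; it mirrors the merged list L3.
    merged = sorted(x + y)

    def midrank(v):
        below = sum(1 for u in merged if u < v)
        tied = sum(1 for u in merged if u == v)
        # Ranks below+1 .. below+tied are shared; their average is below + (tied + 1) / 2.
        return below + (tied + 1) / 2

    w_x = sum(midrank(v) for v in x)
    w_y = sum(midrank(v) for v in y)
    mu = len(x) * (len(x) + len(y) + 1) / 2
    return mu, w_x, w_y


# Hypothetical data standing in for lists L1 and L2.
print(rank_sums([1, 2, 4], [2, 3, 5]))  # (10.5, 8.5, 12.5)
```

As a sanity check, the two rank sums always add up to N(N + 1)/2; here 8.5 + 12.5 = 21 = 6(7)/2.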

Example. In order to justify the rising retail prices for compact discs during the 1990s, the industry compared the playing time for new releases in 1993 and in 1999. Below are random samples of times (in minutes) from discs released on the five major labels in each year. Use the Rank Sum test to check whether the 1999 times tend to be higher.

 1993 50.05 53.733 63 39.417 44.3 63.983 48.017 64.85 71.617 48.2 65 53.733 45.3 37.25 56.95 71.933 58.033 51.1 60.7 45.633 48.317 59.083 60.2 46.4 57.117 49.233 37.867 48.367 53.6 53.067

 1999 69.95 56.917 45.133 73.517 61.4 61.733 66.033 49.333 34 60.2 62.8 68.967 56.217 56.367 49 50.95 61.1 42 56.55 65.867 61.983 49.267 46.267 67.117 53.017 60.7 60.883 69.367 75.117 64.45 55.7 73.55

Solution. We first enter the 1993 times into one list (L1, xStat, or c1) and the 1999 times into another list (L2, yStat, or c2). The alternative hypothesis is that the 1993 times tend to be less than the 1999 times.

We now execute the RANKSUM program by entering 1 for the alternative to specify that the first list tends to be lower. We obtain the following results:

The expected sum of ranks from the 1993 times is 945. The actual sums of the ranks from the 1993 and 1999 times are 786 and 1167, respectively. The P-value is 0.012787.

Now consider the null hypothesis of "no difference" with the alternative that the 1993 times tend to be lower. If the null hypothesis were true, there would be only a 1.28% chance of the 1993 sum of ranks being as low as 786. This p-value is low enough to reject the null hypothesis in favor of the alternative that the 1993 times tend to be lower than the 1999 times.
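The reported values can be checked from the summary numbers alone. The Python sketch below assumes a half-unit continuity correction (since W = 786 is an integer) and uses the standard library's NormalDist; the variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

m, n = 30, 32          # sample sizes for 1993 and 1999
w_1993, w_1999 = 786, 1167

# The two rank sums must add up to N(N + 1)/2 with N = m + n.
total = (m + n) * (m + n + 1) / 2
print(w_1993 + w_1999 == total)  # True

mu = m * (m + n + 1) / 2
s = math.sqrt(m * n * (m + n + 1) / 12)

# Left-tail probability with the assumed continuity correction of 0.5.
p_left = NormalDist(mu, s).cdf(w_1993 + 0.5)
print(mu, round(p_left, 4))  # 945.0 0.0128
```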

Exercises

1. A professor is trying to determine if there is a significant difference in the Verbal and Math ACT scores of students enrolled in a General Math class. The respective scores for students in his class are given below:

 (24, 18) (22, 19) (18, 19) (20, 24) (14, 16) (18, 15) (24, 22) (15, 16) (23, 17) (22, 23) (26, 30) (28, 26) (16, 18) (22, 20) (19, 18) (20, 20) (24, 22) (18, 16) (24, 21) (18, 16) (19, 19) (20, 26) (15, 16) (19, 18) (21, 25) (28, 26) (25, 23) (24, 24)

Using his class as a sample, use the Rank Sum test to see whether General Math students significantly tend to score higher on the Verbal ACT.

2. A manufacturer of cat food wants to ensure that the packages being produced at the Tennessee plant have the same weight as the packages being produced at the Wisconsin plant. Below are random samples of package weights (in ounces) from the production lines at each plant:

 Tenn. 4.67 4.65 4.68 4.59 4.64 4.56 4.54 4.81 4.72 4.51 4.63 4.59 4.6 4.57 4.62 4.7 4.62 4.69 4.61 4.67 4.6 4.63 4.66 4.72

 Wisc. 4.74 4.65 4.6 4.62 4.67 4.72 4.67 4.70 4.62 4.63 4.57 4.68 4.58 4.66 4.54 4.64 4.54 4.62 4.59 4.71 4.7 4.52 4.62

Use the Wilcoxon Rank Sum test (with a two-tail alternative) to test whether there is any overall difference in weights at the two production plants.

Solutions

1. We first enter the Verbal ACT scores into one list (L1, xStat or c1) and the Math ACT scores into another list (L2, yStat, or c2). Next, we execute the RANKSUM program by entering 2 for the alternative > (the Verbal scores in the first list tend to be higher). We obtain the following results:

The expected sum of ranks from the Verbal ACT scores is 798. The actual sums of the ranks from the Verbal and Math ACT scores are 832.5 and 763.5, respectively. The P-value is 0.2859.

Now consider the null hypothesis of "no difference" with the alternative that the Verbal ACT scores tend to be higher. If the null hypothesis were true, then there would still be a 28.59% chance of the Verbal sum of ranks being as high as 832.5. This p-value is not low enough to reject the null hypothesis in favor of the alternative.

2. After entering the data into the appropriate lists and executing the RANKSUM program by entering 3 for the alternative, we obtain the following results:

The expected sum of ranks from the Tennessee plant is 576. The actual sums of the ranks from the Tenn. and Wisc. plants are 573.5 and 554.5, respectively. The P-value is 0.9576.

For the null hypothesis of "no difference" with a two-sided alternative, we can state: If the null hypothesis were true, then there would be a 95.76% chance of the Tenn. sum of ranks differing from 576 (either higher or lower) by as much as it does. Obviously we do not reject the null hypothesis.