The Wilcoxon Rank Sum test can be used to test the null hypothesis that two populations X and Y have the same continuous distribution. We assume that we have independent random samples x1, x2, ..., xm and y1, y2, ..., yn, of sizes m and n respectively, one from each population. We then merge the data and rank each measurement from lowest to highest. All sequences of ties are assigned an average rank.
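The merge-and-rank step can be sketched in Python as follows. This is only an illustration: the function name `average_ranks` and the two small samples are hypothetical, not part of the original material.

```python
def average_ranks(values):
    """Return 1-based ranks of values; tied values share their average rank."""
    # Indices of the values in ascending order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; assign their average.
        avg = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


x = [3, 7, 7]    # hypothetical sample from population X
y = [5, 7, 10]   # hypothetical sample from population Y
ranks = average_ranks(x + y)    # ranks within the merged data
w = sum(ranks[:len(x)])         # sum of the ranks belonging to X
```

Here the three tied 7s occupy ranks 3, 4 and 5, so each receives the average rank 4.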

The Wilcoxon test statistic W is the sum of the ranks from population X. Assuming that the two populations have the same continuous distribution (and no ties occur), W has mean and standard deviation given by

$$\mu = \frac{m(N+1)}{2}$$

and

$$\sigma = \sqrt{\frac{m\,n\,(N+1)}{12}},$$

where N = m + n.
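As a small numerical illustration, the sketch below computes the mean and standard deviation of W using the standard no-ties formulas µ = m(N + 1)/2 and σ = sqrt(mn(N + 1)/12). The function name `rank_sum_moments` and the sample sizes are hypothetical.

```python
import math


def rank_sum_moments(m, n):
    """Mean and standard deviation of W under the null hypothesis (no ties)."""
    N = m + n
    mu = m * (N + 1) / 2
    sigma = math.sqrt(m * n * (N + 1) / 12)
    return mu, sigma


# Hypothetical sample sizes, chosen only for illustration.
mu, sigma = rank_sum_moments(3, 3)
print(mu, round(sigma, 4))
```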

We test the null hypothesis H0: no difference in distributions. A one-sided alternative is Ha: the first population yields lower measurements. We use this alternative if we expect or see that W is unusually low compared with its expected value µ. In this case, the p-value is given by a normal approximation. We let N ~ N(µ, σ) be a normal random variable with mean µ and standard deviation σ (not to be confused with the total sample size N = m + n) and compute the left-tail probability P(N <= W), applying a continuity correction, P(N <= W + 0.5), if W is an integer.

If we expect or see that W is much higher than its expected value, then we should use the alternative Ha: the first population yields higher measurements. In this case, the p-value is given by the right-tail probability P(N >= W), again using a continuity correction if W is an integer. If the two sums of ranks from each population are close, then we could use a two-sided alternative Ha: there is a difference in distributions. In this case, the p-value is given by twice the smaller tail probability (2*P(N <= W) if W < µ, or 2*P(N >= W) if W > µ).
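The three p-value rules can be sketched in Python as below. This is a sketch under stated assumptions rather than a definitive implementation: the names `normal_cdf` and `rank_sum_p_value` are hypothetical, the codes 1, 2, 3 for the alternatives are a convention chosen here, the continuity correction of 0.5 is applied only when W is an integer, and capping the two-sided value at 1 is a choice made for this sketch.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def rank_sum_p_value(w, m, n, alternative):
    """Normal-approximation p-value for the rank sum W of the first sample.

    alternative: 1 for X < Y (left tail), 2 for X > Y (right tail),
                 3 for the two-sided alternative.
    """
    N = m + n
    mu = m * (N + 1) / 2
    sigma = math.sqrt(m * n * (N + 1) / 12)
    # Continuity correction only when W is an integer (assumption of this sketch).
    cc = 0.5 if float(w).is_integer() else 0.0
    left = normal_cdf((w + cc - mu) / sigma)        # P(N <= W)
    right = 1 - normal_cdf((w - cc - mu) / sigma)   # P(N >= W)
    if alternative == 1:
        return left
    if alternative == 2:
        return right
    # Two-sided: twice the smaller tail, capped at 1.
    return min(1.0, 2 * min(left, right))


# Hypothetical values: W = 12 from samples of sizes m = 4 and n = 5.
p = rank_sum_p_value(12, 4, 5, 1)
```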

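We note that if there are ties, the mean and standard deviation formulas above no longer hold exactly, so the resulting p-value is only approximate and the validity of the test is questionable.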

Before executing the **RANKSUM** program, we must enter the X data into list **L1** and enter the Y data into list **L2** (use **xStat** and **yStat** on the TI-86, or **c1** and **c2** in a list called **dist** on the TI-89). Then execute the program by entering **1**, **2**, or **3** to specify the desired alternative X < Y, X > Y, or X does not equal Y.

The program will first sort each list, then merge and sort the lists into list **L3** (**fStat** or **c3**). Then it will put the rank of each measurement in **L3** next to it in **L4** (**LW** or **c4**). All sequences of ties are assigned an average rank. The expected sum µ of the ranks from population X is displayed followed by the actual sums of the ranks from the populations X and Y. The program then displays the P-value for the entered alternative.
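The source of the **RANKSUM** program is not reproduced here, so the following Python function is only a sketch of the same sequence of steps (merge, rank with averaged ties, sum the ranks, compute the P-value). The name `ranksum`, the returned dictionary keys, and the sample data are hypothetical, and the simple counting approach to ranking is chosen for brevity, not speed.

```python
import math


def ranksum(x, y, alternative):
    """Sketch of a rank sum test; alternative is 1 (X < Y), 2 (X > Y) or 3."""
    merged = sorted(x + y)

    def rank(v):
        # Tied values occupy positions below+1 .. below+equal; average them.
        below = sum(1 for u in merged if u < v)
        equal = merged.count(v)
        return below + (equal + 1) / 2

    w_x = sum(rank(v) for v in x)
    w_y = sum(rank(v) for v in y)

    m, n = len(x), len(y)
    N = m + n
    mu = m * (N + 1) / 2
    sigma = math.sqrt(m * n * (N + 1) / 12)

    def cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    # Continuity correction only when the rank sum is an integer.
    cc = 0.5 if float(w_x).is_integer() else 0.0
    left = cdf((w_x + cc - mu) / sigma)
    right = 1 - cdf((w_x - cc - mu) / sigma)
    p = {1: left, 2: right}.get(alternative, min(1.0, 2 * min(left, right)))
    return {"expected": mu, "sum_x": w_x, "sum_y": w_y, "p_value": p}


# Hypothetical data, for illustration only.
result = ranksum([3, 7, 7], [5, 7, 10], 1)
print(result)
```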

**Example.** In order to justify the rising retail prices for compact discs during the 1990s, the industry compared the playing time for new releases in 1993 and in 1999. Below are random samples of times (in minutes) from discs released on the five major labels in each year. Use the Rank Sum test to check whether the 1999 times tend to be higher.

*Solution.* We first enter the 1993 times into one list (**L1**, **xStat** or **c1**) and the 1999 times into another list (**L2**, **yStat**, or **c2**). The alternative hypothesis is that the 1993 times tend to be less than the 1999 times.

We now execute the **RANKSUM** program by entering **1** for the alternative to specify that the first list tends to be lower. We obtain the following results:

The expected sum of ranks from the 1993 times is 945. The actual sums of the ranks from the 1993 and 1999 times are respectively 786 and 1167. The P-value is 0.012787.

Now consider the null hypothesis of "no difference" with the alternative that the 1993 times tend to be lower. If the null hypothesis were true, there would be only a 1.28% chance of the 1993 sum of ranks being as low as 786. This p-value is low enough to reject the null hypothesis in favor of the alternative that the 1993 times tend to be lower than the 1999 times.
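As a cross-check, the reported P-value can be reproduced by hand. The sample sizes are not stated in the text; m = 30 and n = 32 are inferred here from the expected sum 945 = m(N + 1)/2 and the total rank sum 786 + 1167 = 1953 = N(N + 1)/2, so they should be treated as an assumption. Since W = 786 is an integer, the continuity correction applies:

$$
\sigma = \sqrt{\frac{30 \cdot 32 \cdot 63}{12}} = \sqrt{5040} \approx 70.99, \qquad
z = \frac{786 + 0.5 - 945}{70.99} \approx -2.233, \qquad
P(Z \le -2.233) \approx 0.0128 .
$$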

**Exercises.**

1. A professor is trying to determine if there is a significant difference in the Verbal and Math ACT scores of students enrolled in a General Math class. The respective scores for students in his class are given below:

Using his class as a sample, use the Rank Sum test to see whether General Math students significantly tend to score higher on the Verbal ACT.

2. A manufacturer of cat food wants to ensure that the packages being produced at the Tennessee plant have the same weight as the packages being produced at the Wisconsin plant. Below are random samples of package weights (in ounces) from the production lines at each plant:

Use the Wilcoxon Rank Sum test (with a two-tail alternative) to test whether there is any overall difference in weights at the two production plants.

*Solutions.*

1. We first enter the Verbal ACT scores into one list (**L1**, **xStat** or **c1**) and the Math ACT scores into another list (**L2**, **yStat**, or **c2**). Next, we execute the **RANKSUM** program by entering **2** for the alternative > (the Verbal scores in the first list tend to be higher). We obtain the following results:

The expected sum of ranks from the Verbal ACT scores is 798. The actual sums of the ranks from the Verbal and Math ACT scores are respectively 832.5 and 763.5. The P-value is 0.2859.

Now consider the null hypothesis of "no difference" with the alternative that the Verbal ACT scores tend to be higher. If the null hypothesis were true, then there would still be a 28.59% chance of the Verbal sum of ranks being as high as 832.5. This p-value is far too high to reject the null hypothesis in favor of the alternative.
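The arithmetic behind this P-value can be checked by hand. The sample sizes m = n = 28 are not stated in the text; they are inferred here from the expected sum 798 = m(N + 1)/2 and the total rank sum 832.5 + 763.5 = 1596 = N(N + 1)/2. Because W = 832.5 is not an integer, no continuity correction is applied:

$$
\sigma = \sqrt{\frac{28 \cdot 28 \cdot 57}{12}} = \sqrt{3724} \approx 61.02, \qquad
z = \frac{832.5 - 798}{61.02} \approx 0.565, \qquad
P(Z \ge 0.565) \approx 0.286 .
$$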

2. After entering the data into the appropriate lists and executing the **RANKSUM** program by entering **3** for the alternative, we obtain the following results: The expected sum of ranks from the Tennessee plant is 576. The actual sums of the ranks from the Tenn. and Wisc. plants are respectively 573.5 and 554.5. The P-value is 0.9576.

For the null hypothesis of "no difference" with a two-sided alternative, we can state: If the null hypothesis were true, then there would be a 95.76% chance of the Tenn. sum of ranks differing from 576 (either higher or lower) by as much as it does. Obviously we do not reject the null hypothesis.
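The two-sided value can also be verified by hand. The sample sizes m = 24 (Tennessee) and n = 23 (Wisconsin) are not stated in the text; they are inferred here from the expected sum 576 = m(N + 1)/2 and the total rank sum 573.5 + 554.5 = 1128 = N(N + 1)/2. Since W = 573.5 lies below 576, we double the left tail, without continuity correction because W is not an integer:

$$
\sigma = \sqrt{\frac{24 \cdot 23 \cdot 48}{12}} = \sqrt{2208} \approx 46.99, \qquad
z = \frac{573.5 - 576}{46.99} \approx -0.0532, \qquad
2\,P(Z \le -0.0532) \approx 0.958 .
$$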
