Basic Statistics

Statistics is concerned with the study of numerical data. Often we are given a set of raw measurements from a population and we are asked to analyze the data in some way. The most elementary analysis involves computing the mean, median, mode, and standard deviation. We shall perform such computations in this section. Later sections will involve increasingly advanced topics.

Determining The Population

When studying a set of measurements, we must know the population under consideration. Moreover, we must know whether the set of measurements includes each member of the population or whether it is only a sample of measurements. If we have a measurement from every member of the population, then we call the data set a census. We shall consider this case first.

Census

In a census, we assume that we have a measurement from every member in the population under consideration. For example, consider the following Math ACT scores from a calculus class at a university which requires the ACT for admission.

 18 25 23 27 19 21 25 19 28 31 27 30 21 24 28 29 21 21 30 25 26 22 29 34 20 27 28 28 25 22 27

Since we have a measurement for every person in the class, we have a census. The population is simply this class of calculus students at this university.

(a) Find the true values of the mean, median, mode, and standard deviation.

(b) What percentage of this class is within one standard deviation of average?

Entering Data & Computing Statistics

Step 1: Bring up the list editor.

On the TI-83: Press STAT, then press ENTER.

On the TI-86: Press LIST (i.e., 2nd -), then F4 (for EDIT). Or you can press STAT (2nd +), then F2.

On the TI-89: Press APPS, then 6, then 1 to get to a list editor. Throughout this site, the "Current" item obtained by pressing APPS, 6, 1 is a DATA variable called dist that is stored in the main folder.

Step 2: Next, we clear the lists.

On the TI-83/86: In order to clear L1 on the TI-83 (or xStat on the TI-86), press the Up arrow to highlight L1 (xStat), press CLEAR, then press ENTER.

On the TI-89: Move the cursor into column c1. Press F6 (which is 2nd F1), then press 5 to clear column c1. Or press F1, then press 8 to clear all columns in the list editor.

Step 3: Enter the data.

With the cursor under list L1 (xStat on the TI-86, c1 on the TI-89), type 18, press ENTER; type 25, press ENTER; continue until all measurements are entered under L1 on the TI-83 (xStat on the TI-86, or c1 on the TI-89). Then press 2nd QUIT (EXIT on the TI-86, HOME on the TI-89) to return to the Home screen.

Sorting the Data

If we desire, we can sort the data into increasing order:

On the TI-83: Press STAT, press 2. We obtain SortA( on the screen. Press L1 (i.e., 2nd 1), press ENTER. In other words, enter the command SortA(L1. Now press STAT, press ENTER. Observe that the data under list L1 is now sorted. Press the down arrow to scroll down the list.

On the TI-86: Press LIST, press F5 (for OPS), press F2 for sortA. Press 2nd F3 for NAMES, then press the button for xStat. Next, press STO, press the button for xStat, and press ENTER. In other words, enter the command sortA xStat -> xStat.

On the TI-89: After entering the data in c1, press F6 (i.e., 2nd F1) and press 3 to sort the column.

Computing the Mode

The mode is the measurement that occurs most often, which can also be interpreted as the most likely measurement. There could be more than one mode.

The calculator does not compute the mode for us. But by scrolling down the sorted list, we can observe that four different measurements each occur four times, which is the most that any measurement occurs. Thus, we have modes of 21, 25, 27, and 28.
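Since the calculator does not report the mode, it can be useful to cross-check the count made by scrolling. The following is a minimal Python sketch of that tally; the variable names are our own, and nothing here is part of the calculator procedure.

```python
from collections import Counter

# Math ACT scores for the calculus class (the census data above)
scores = [18, 25, 23, 27, 19, 21, 25, 19, 28, 31, 27, 30, 21, 24, 28, 29,
          21, 21, 30, 25, 26, 22, 29, 34, 20, 27, 28, 28, 25, 22, 27]

counts = Counter(scores)          # measurement -> number of occurrences
top = max(counts.values())        # the highest frequency in the data
modes = sorted(x for x, c in counts.items() if c == top)

print(top, modes)                 # 4 [21, 25, 27, 28]
```

Collecting every measurement that reaches the highest frequency, rather than stopping at the first one, is what allows more than one mode to be reported.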

Computing the Statistics

After data has been entered into list L1 (or xStat, or c1), we can compute the other desired statistics.

On the TI-83: Press STAT, press the right arrow to display the CALC screen, press 1 to obtain the line 1-Var Stats. Press L1 (2nd 1) to obtain the line 1-Var Stats L1. Press ENTER to obtain the basic statistics.

On the TI-86: Press STAT (2nd +), then F1, then F1 again to obtain the command OneVar. Type xStat (or press List, then F3, then the button for xStat), and enter the command OneVar xStat.

On the TI-89: Press MATH (2nd 5), then press 6, then press 1 to obtain the command OneVar. Type c1 and enter the command OneVar c1. To see the statistics, press MATH, then press 6, then press 8. Then enter the command ShowStat.

Note: If the data had been entered into a different list, say list L3 (or yStat, or c2), then we would use the command 1-Var Stats L3 (or OneVar yStat or OneVar c2).

Since we have a census of the entire population, the value of Xbar is actually the true population mean; thus, µ = 25.16129032.

On the TI-83 and TI-86, two standard deviation values are given. The first, Sx, is the sample deviation, which is to be used when the data set is only a sample. (This is the only value given on the TI-89.) The second, sigma, is the true standard deviation when the data set is a census of the entire population. Hence the true standard deviation, which we shall denote here by s, is s = 3.952106618.
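The distinction between the two standard deviations can be illustrated off the calculator. The sketch below uses Python's standard statistics module, where pstdev divides by n (population) and stdev divides by n - 1 (sample); the variable names are our own choices.

```python
import statistics

# Math ACT scores for the calculus class (the census data above)
scores = [18, 25, 23, 27, 19, 21, 25, 19, 28, 31, 27, 30, 21, 24, 28, 29,
          21, 21, 30, 25, 26, 22, 29, 34, 20, 27, 28, 28, 25, 22, 27]

n = len(scores)
mu = statistics.mean(scores)        # mean of all 31 measurements
sigma = statistics.pstdev(scores)   # population standard deviation (divides by n)
s_x = statistics.stdev(scores)      # sample standard deviation (divides by n - 1)

print(n, round(mu, 6), round(sigma, 6), round(s_x, 6))
```

The values of mu and sigma should agree with the calculator's Xbar and sigma to the displayed precision, and s_x is always slightly larger than sigma for the same data.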

We also note a sample size of n = 31. If we scroll down, we obtain more statistics. The minimum of the data set is 18 while the maximum is 34. In particular, the median is 25. That is, the "middle" measurement is 25.

The first quartile is Q1 = 21 and the third quartile is Q3 = 28. Thus, around 1/4 of the measurements are 21 or below while around 3/4 of the measurements are 28 or below.
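Quartiles can be defined in several slightly different ways. The sketch below assumes the "median of each half" convention (the median itself is left out of both halves when n is odd), which reproduces the values reported above; the function name five_number_summary is hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the middle position
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above the middle position
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

# Math ACT scores for the calculus class (the census data above)
scores = [18, 25, 23, 27, 19, 21, 25, 19, 28, 31, 27, 30, 21, 24, 28, 29,
          21, 21, 30, 25, 26, 22, 29, 34, 20, 27, 28, 28, 25, 22, 27]

summary = five_number_summary(scores)
print(summary)   # (18, 21, 25, 28, 34)
```

Other software may use an interpolating definition and give slightly different quartiles for the same data.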

We note that sigma = Sqrt[ (n - 1) / n ] * Sx. Thus, to calculate sigma on the TI-89, enter Sqrt(30/31)*Sx -> s, which also stores the value as s.
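The relation above follows from the two definitions, which differ only in the divisor. A short derivation, writing S_x for the sample deviation and \sigma for the true standard deviation:

```latex
\[
S_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2,
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
\]
\[
\sigma^2 = \frac{n-1}{n}\,S_x^2
\quad\Longrightarrow\quad
\sigma = \sqrt{\frac{n-1}{n}}\;S_x = \sqrt{\frac{30}{31}}\;S_x
\quad\text{for } n = 31.
\]
```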

Accessing the Statistics

After computing the statistics, the calculator stores their values in memory. We can call up the values from this screen. For example, we can compute the interval (µ - s, µ + s).

On the TI-83: Press VARS, press 5, press 2, press -, press VARS, press 5, press 4, press ENTER. We see that µ - s = 21.2091837. Now press 2nd ENTER to retrieve the command, edit the - to a +, and press ENTER. We see that µ + s = 29.11339694.

On the TI-86: Press STAT (i.e., 2nd +), press F5, then press F1. Then press -, press F2, and press ENTER. Now press 2nd ENTER to retrieve the command, edit the - to a +, and press ENTER. We see that (µ - s, µ + s) = (21.2091837, 29.11339694).

On the TI-89: Press CHAR (i.e., 2nd +), scroll down to MATH, scroll right, then scroll down to item A for xbar and press ENTER. Next, press -, then type s and press ENTER. Next, edit the - to a +, and press ENTER to compute µ + s.

Now return to your list of data in the data editor, scroll down the list of measurements and observe that 19 measurements are between 21.2091837 and 29.11339694. Thus, 19 / 31 or 61.29% of the measurements are within one standard deviation of average.
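The count made by scrolling can be double-checked with a few lines of Python. This is only a sketch under our own naming; it uses the population standard deviation because the data set is a census.

```python
import statistics

# Math ACT scores for the calculus class (the census data above)
scores = [18, 25, 23, 27, 19, 21, 25, 19, 28, 31, 27, 30, 21, 24, 28, 29,
          21, 21, 30, 25, 26, 22, 29, 34, 20, 27, 28, 28, 25, 22, 27]

mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)        # census, so divide by n
low, high = mu - sigma, mu + sigma       # the interval (mu - s, mu + s)

inside = [x for x in scores if low < x < high]
pct = 100 * len(inside) / len(scores)

print(len(inside), len(scores), round(pct, 2))   # 19 31 61.29
```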

Random Sample

Now, suppose the calculus class is actually much larger and that the data set of Math ACT scores only represents a sample of measurements from the entire class. Assume also that they were chosen arbitrarily or "at random."

What is the largest population that this sample can honestly represent?

Here are some choices for the population: (1) the entire calculus class, (2) all present calculus students at the university, (3) all students at the university, (4) all college students, (5) all math majors at the university, (6) all students required to take calculus at the university, (7) all students who have recently taken the ACT, (8) other?

Which is the best choice?

Given a sample of measurements, it may be hard to determine the largest population that it can honestly represent. Usually then, the population is decided upon first and a sample is taken, hopefully at random, from just that population. The sample then is used to study the entire population.

Example. Suppose we wish to estimate the percentage of students at our school that are within two standard deviations of average height. But we do not wish to measure every student. Estimate the percentage based on the following random sample of heights (measured to the nearest inch).

 68 72 76 64 68 69 66 70 65 72 67 69 62 64 68 70 74 68 66 71 60 65 75 63 72 63 65 64 68 67 70 69 68

Solution. We first enter the data into a list, say L2 (or yStat or c2), sort it, and compute the statistics. Because we have a sample rather than a census, Xbar denotes the sample mean and Sx denotes the sample deviation. We see that Xbar = 67.818, Sx = 3.77, and that we have a sample of size n = 33.

Next, we compute the interval (Xbar - 2 Sx, Xbar + 2 Sx) by accessing the stored variables:

On the TI-83, use item 3 in the VARS Statistics for Sx.

On the TI-86, use F3 from the STAT VARS menu for Sx.

On the TI-89, just type (capital) S, then x for Sx.

We find this interval to be (60.27738369, 75.3589794). By scrolling down the sorted list, we see that all but the smallest measurement of 60 and the largest measurement of 76 lie in this range. Thus, 31 out of 33 or roughly 93.94% should be within two standard deviations of average height.

When scrolling down the list, we also can observe that the mode is 68, which occurs 6 times.
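The height example can be cross-checked the same way. The sketch below assumes the sample standard deviation (divisor n - 1) because the data are a sample; the variable names are our own.

```python
import statistics
from collections import Counter

# Random sample of student heights, to the nearest inch
heights = [68, 72, 76, 64, 68, 69, 66, 70, 65, 72, 67, 69, 62, 64, 68, 70, 74,
           68, 66, 71, 60, 65, 75, 63, 72, 63, 65, 64, 68, 67, 70, 69, 68]

xbar = statistics.mean(heights)
s_x = statistics.stdev(heights)                # sample deviation (divides by n - 1)
low, high = xbar - 2 * s_x, xbar + 2 * s_x     # (Xbar - 2 Sx, Xbar + 2 Sx)

inside = [h for h in heights if low < h < high]
pct = 100 * len(inside) / len(heights)
mode, mode_count = Counter(heights).most_common(1)[0]

print(len(inside), len(heights), round(pct, 2))   # 31 33 93.94
print(mode, mode_count)                           # 68 6
```

Only 60 and 76 fall outside the interval, which matches the count obtained from the sorted list.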

Frequency Charts

Often data with many measurements are given in a frequency chart that gives the number of occurrences for each measurement. For example, suppose a number of households were surveyed as to how many children lived at home. The responses are below:

 Number of Children      0   1   2   3   4   5   6
 Number of Households   60  42  86  59  22   4   2

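Note: The measurement 0 occurs 60 times, the measurement 1 occurs 42 times, and so on. Written out in full, the data would be 0, 0, 0, 0, . . . , 1, 1, 1, 1, . . . , 2, 2, 2, . . . , etc. Entering all of these values one at a time would be tedious, so it is easier to use the frequency chart.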

To enter the data into lists, follow the first set of instructions; but on the TI-83 enter the measurements (children) under list L1 and the frequencies under list L2. On the TI-86, enter the measurements under xStat and the frequencies under yStat. On the TI-89, enter the measurements under c1 and the frequencies under c2. (Remember to clear the lists before entering new data.)

There is no need to sort the data. To compute the statistics:

On the TI-83: Enter the command 1-Var Stats L1, L2 which means that the measurements in list L1 occur with frequency L2.

On the TI-86: Enter the command OneVar xStat, yStat.

On the TI-89: Enter the command OneVar c1, c2.

We see that there were 275 measurements, with the average number of children being about 1.85818 with a sample deviation of 1.33928.

What percentage of these measurements are within one sample deviation of average?

Compute Xbar - Sx and Xbar + Sx to obtain an interval of (0.518897, 3.197466).

The measurements 1, 2, and 3 are within this range. Thus, there are 42 + 86 + 59 = 187 out of 275, or 68% within one sample deviation of average.
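The frequency-weighted statistics can be reproduced directly from the chart without expanding it into 275 separate values. The following Python sketch is one way to do so, under our own variable names; it weights each measurement by its frequency and uses the sample divisor n - 1.

```python
import math

children = [0, 1, 2, 3, 4, 5, 6]            # measurements
households = [60, 42, 86, 59, 22, 4, 2]     # frequency of each measurement

n = sum(households)                          # total number of measurements
xbar = sum(x * f for x, f in zip(children, households)) / n

# Sum of squared deviations, each weighted by its frequency
ss = sum(f * (x - xbar) ** 2 for x, f in zip(children, households))
s_x = math.sqrt(ss / (n - 1))                # sample deviation

low, high = xbar - s_x, xbar + s_x
inside = sum(f for x, f in zip(children, households) if low < x < high)
pct = 100 * inside / n

print(n, round(xbar, 5), round(s_x, 5))      # 275 1.85818 1.33928
print(inside, pct)                           # 187 68.0
```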