Understanding Scores

Overview

What types of scores are there?

  1. Raw Scores
  2. Linear Transformed Scores
  3. Area Transformed Scores
  4. Stanines

What good are they?

The Raw Score

The Raw Score is a simple count of the test behavior (e.g., # of free throws). It isn't usually meaningful for psychological measures. If I say you have a score of '4,' is that good? Bad? Indifferent?

You can't know anything from a simple total. It may sound like a lot or a little, but without knowing how it relates to other information, it is useless.

[Figure: a 4 all alone, sobbing]

So we transform it.

[Figure: a 4 being magically transformed]

 

Transformation

  1. Doesn't change the score; it just uses different units. (Sounds contradictory? Wait and see how it works.)
  2. Transformation includes or takes into account information not in the raw score (such as the # of items on the test).
  3. More informative/interpretable than a raw score.
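The three points above can be sketched with the simplest transformation there is, percent correct. This is only an illustration, and the function name and the two test lengths are made up for it. The raw score of 4 never changes; it is just re-expressed in units that include the number of items on the test.

```python
def percent_correct(raw_score: int, n_items: int) -> float:
    """Re-express a raw count as a percentage of the items on the test."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if not 0 <= raw_score <= n_items:
        raise ValueError("raw_score must be between 0 and n_items")
    return 100.0 * raw_score / n_items


# The same raw score of 4 on two hypothetical tests of different lengths.
print(percent_correct(4, 5))   # 80.0
print(percent_correct(4, 40))  # 10.0
```

The same '4' reads very differently once the test length is taken into account.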

Linear Transformation

Uses the linear equation for the transformation.

Examples:

So how are these used?

Well, if you know an IQ is 100, you know that person earned a score at the mean of everyone who takes the test.

If the IQ score is 115, you know the person is noticeably above the mean (IQ scores use a standard deviation of 15, so 115 is 1 standard deviation above the mean).

If 130, much more noticeably above the mean--definitely an unusual score because it is 2 standard deviations from the mean.

How unusual is a score of 70? Is 70 as unusual as 115 or as 130?

130-- they're both 2 standard deviations from the mean.

Suddenly, by using a linear transformation, the number now has meaning! It tells you how unusual or common a score is, just from knowing the chosen mean/sd and the score.
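Here is a minimal sketch of that reasoning, assuming the IQ scale described above (mean 100, standard deviation 15). The function names are made up for this example, and the T-score scale (mean 50, standard deviation 10) is a common convention that is not part of these notes.

```python
def z_score(score: float, mean: float, sd: float) -> float:
    """Distance of a score from the mean, in standard deviation units."""
    return (score - mean) / sd


def linear_transform(score: float, old_mean: float, old_sd: float,
                     new_mean: float, new_sd: float) -> float:
    """Move a score to a new scale: new = new_mean + new_sd * z."""
    return new_mean + new_sd * z_score(score, old_mean, old_sd)


IQ_MEAN, IQ_SD = 100, 15

for iq in (70, 100, 115, 130):
    # 70 and 130 are both 2 standard deviations from the mean.
    print(iq, z_score(iq, IQ_MEAN, IQ_SD))

# The same person on an assumed T-score scale (mean 50, sd 10).
print(linear_transform(115, IQ_MEAN, IQ_SD, 50, 10))  # 60.0
```

Only the units change. A score that is 1 standard deviation above the mean stays 1 standard deviation above the mean on every scale.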

[Figure: the 4 says, "I have context. I am transformed!"]

Area Transformation

Uses the normal curve to create transformations. An area transformation is where on the curve the score falls, e.g., percentiles (NOT equivalent to percent or percentage).

55th percentile: the person earned a score equal to or greater than the scores of 55 percent of the people who took the test. Notice that phrase 'equal to or greater than.' It has to be exactly that phrase to be accurate.

[Figure: percentiles drawn on a bell-shaped curve, squished in the middle and spread out at the ends]

Percentiles have uneven intervals: greatly compressed in the middle and stretched at the limits. The difference in underlying scores between the 50th and 55th percentiles is tiny, but the difference between the 90th and 95th percentiles is much larger. Notice that you have to understand the normal distribution to correctly interpret percentile scores.
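To see the uneven intervals in numbers, here is a rough sketch that assumes scores are normally distributed on the IQ scale (mean 100, standard deviation 15). It uses `NormalDist` from Python's standard library; the helper names are made up for this example.

```python
from statistics import NormalDist

# Assumption: scores follow a normal curve with mean 100 and sd 15.
iq = NormalDist(mu=100, sigma=15)


def score_at_percentile(percentile: float) -> float:
    """Score below which the given percent of the curve falls."""
    return iq.inv_cdf(percentile / 100)


def percentile_of(score: float) -> float:
    """Percent of the curve at or below the given score."""
    return 100 * iq.cdf(score)


gap_middle = score_at_percentile(55) - score_at_percentile(50)
gap_tail = score_at_percentile(95) - score_at_percentile(90)

# Roughly 2 points in the middle versus roughly 5 points in the tail.
print(round(gap_middle, 1), round(gap_tail, 1))
```

The same 5 percentile steps cover nearly three times as many score points near the top of the curve as they do in the middle.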

Standard Nine aka Stanine
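
A stanine sorts scores into nine bands, numbered 1 through 9. The sketch below assumes the conventional split, which is not spelled out in these notes: the bottom 4 percent get a 1, then 7, 12, 17, 20, 17, 12, 7, and 4 percent for stanines 2 through 9. Sending a score that sits exactly on a boundary up to the higher stanine is also an assumption made for this example.

```python
from bisect import bisect_right

# Cumulative percent of scores at the top of stanines 1 through 8,
# assuming the conventional 4-7-12-17-20-17-12-7-4 split.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine(percentile: float) -> int:
    """Convert a percentile (0 to 100) to a stanine (1 to 9)."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    # Count how many cut points the percentile has reached or passed.
    return bisect_right(STANINE_CUTS, percentile) + 1


for p in (2, 50, 90, 99):
    print(p, stanine(p))
```

Under this split the middle stanine, 5, holds the middle 20 percent of scores.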

But what does the number mean in real life?

Depends on what you are using for the standard or comparison. That is, a score is meaningful relative to the comparison on which it is based.

Options for Standards

Personal performance - improvement compared to self over time. You may say "I lift more weight now than I used to."

Useful in clinical and some educational work. People like this standard when they are learning--it emphasizes progress made away from the starting point.

Lifting weights. Day 1 can't do it. Day 40 lifts it.
   

External or criterion-referenced. An absolute level of performance must be met to be successful. External agencies often set criterion-referenced standards.

This one can seem hard from a learner's point of view. It doesn't matter how hard you tried, how nice you are, or how much you studied. All that matters is whether you can remove an appendix without killing the patient or can land a plane without crashing.

Lifting weights: A human can lift 50 pounds, but can't lift 5,000-- it just isn't possible.
   

Norm groups - Find out how a group of people does on the test and compare all scores to their performance. This one is used a lot for psychological measures because there often is no external criterion.

How much is enough learning? The norm group answer is "As much as everyone else is doing." Learners often like this one because it gives them the security of fitting in. However, they may also rebel at being compared to everyone else.

The group can all lift small weights. The guy who can lift a big one is above average. The guy who can't lift any is below average.
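
The three standards can be put side by side in a small sketch. All the numbers here (weights in pounds, the 50-pound cutoff, the norm group) are hypothetical, and the function names are made up for the example. The norm comparison uses the sample standard deviation, which is one reasonable choice among several.

```python
from statistics import mean, stdev


def personal_change(earlier: float, now: float) -> float:
    """Personal standard: improvement compared to self over time."""
    return now - earlier


def meets_criterion(score: float, cutoff: float) -> bool:
    """Criterion standard: an absolute level must be met."""
    return score >= cutoff


def z_vs_norm_group(score: float, norm_scores: list) -> float:
    """Norm standard: distance from the group mean in sd units."""
    return (score - mean(norm_scores)) / stdev(norm_scores)


# Hypothetical data: pounds lifted.
norm_group = [20, 25, 30, 35, 40]
day_1, day_40 = 0, 50

print(personal_change(day_1, day_40))       # 50
print(meets_criterion(day_40, cutoff=50))   # True
print(z_vs_norm_group(day_40, norm_group))  # positive: above the group average
```

The same 50 pounds gets a different reading under each standard, which is why you have to say which comparison you are using.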

Since norm groups are so common in psychology, let's explore them a bit further.

Notice that norm groups are also called "standardization samples". Do you see where the concept of standard comes into play?

What group should be used for the norm comparison?

Depends on what you want to know. Do you want to know how the person is doing compared to a sample of peers, e.g., from the same school? Or compared to the US population, e.g., a national group? Do you want to know how they do relative to others their age? Or how they'd fit into an older group?

The sample typically is representative

But it doesn't have to fit one of the above categories. If you have a specific research or clinical question, you need a norm group that reflects your question. My mother had a right hemisphere stroke and I want to know how she is doing. I may want to compare her to other persons with right hemisphere strokes, so we can estimate whether her pace of improvement is "normal", "slower", or "faster" and predict rehabilitation and financial needs.

[Note in this case that there is also an external comparison to be made and considered-- she can either feed herself or not, toilet herself or not. Meeting those standards is necessary for independent functioning. It doesn't matter if she is doing better than the average right hemisphere patient. And relative to herself she has made tremendous improvement, from being confined to bed to being able to walk 100 feet with help. This is great, but it may not be enough for that hard external standard.]

Let's examine the most popular norms.

Age norms

Many psychological traits of interest (e.g., intelligence) change with age so you need a comparison group of the same age.

Crude example: let's say that 5 year olds can define "apple" but 4 year olds can't.

If the subject defines "apple" then she is performing at a "5 year old level"

Grade Norms

Done similarly, only with grades. They may say a child is performing similar to the beginning of 3rd grade or similar to the end of 3rd grade.

Typically they just get a beginning-of-year and an end-of-year sample, unless it is 1st or 2nd grade, where they may also have a middle-of-year sample. (They limit their samples because they'd need a full sample of children for each segment, and that gets costly.)

Big problems with grade norms though

  1. "Typical grade tasks" vary considerably across schools, so what does "grade" mean anyway? What does it mean to be at the beginning of 3rd grade?
  2. You absolutely may NOT interpolate (guess at scores between the calculated beginning-of-year and end-of-year ones). Why? Because development is not even or regular over the course of the school year. So you can't say a child is performing similar to the middle of the year unless you took a comparison sample from the middle of the year.
  3. Finally, there's a big problem with interpreting "out of age" performance…

    Example: a first grader passes "3rd grade" items.

Other considerations

Remember! The particular question you are trying to answer (decision you are trying to make) determines what is the appropriate comparison group. If you are deciding who to take into the US Army, then the potential pool is everyone in the USA. So any selection test standardization sample needs to reflect that. If you are deciding whom to hire in your Bowling Green bakery, you'll be choosing from amongst local Kentuckians and your selection process norm group should reflect that.

Culture Issues

Case example: I was asked to evaluate, for purposes of educational placement, an immigrant child who was not doing well in a school in the USA. She didn't speak English. The family's intention was to stay in the USA. What norms should be used to evaluate ability? What questions might you need to consider?

What if we changed the situation so that she was returning to her home country in a month for the rest of her life and would need to know where to be placed when she returned home?

Feel free to bring up this question in the discussion board. It is a hard question so only pick one of the issues to discuss.

To summarize


Created January 30, 2000: Last Modified: January 30, 2005. All contents© Sally Kuhlenschmidt.