Data Collection and Analysis


Two-Variable Relationships



We will look at a data set that addresses an issue sure to be near and dear to all of you: standardized testing and what leads to higher test scores. The data are aggregate figures collected from each of the 50 states for the 1994-1995 school year. They include the average cost per pupil, the student:teacher ratio, the estimated average teacher salary, and the percent of students who took the SAT exam. The goal is to use this information to predict each state's average Verbal, Math, and combined SAT scores.

Without looking at the data, which variable do you think will be of most use in predicting SAT scores? Why?

Download the data set from inspire.stat.ucla.edu/unit_02/SATdata.txt (note that there is an underscore _ between the words "unit" and "02").

Your primary goal is to understand the relation between SAT scores and the various cost variables. The questions below are meant to guide your analysis.
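As a minimal sketch of the arithmetic behind these questions, the Python functions below assume each variable has already been read from SATdata.txt into a list of numbers, one entry per state, with every list in the same state order. The function names and the dictionary-of-columns layout are illustrative assumptions, not part of the course materials or the data file.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 values each")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def least_squares_line(x, y):
    """Intercept b0 and slope b1 of the least-squares line y = b0 + b1 * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1

def rank_by_correlation(predictors, response):
    """(name, r) pairs sorted by |r| with the response, strongest first.

    predictors maps a (hypothetical) column name to its list of values.
    """
    pairs = [(name, pearson_r(col, response)) for name, col in predictors.items()]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)

def residual_signs(x, y):
    """Signs of the residuals from the least-squares line, in order of increasing x.

    A pattern such as + + - - - + + suggests a curved, not linear, relationship.
    Exact float comparisons are used, so tiny residuals may show as + or -.
    """
    b0, b1 = least_squares_line(x, y)
    signs = []
    for a, b in sorted(zip(x, y)):
        r = b - (b0 + b1 * a)
        signs.append("+" if r > 0 else "-" if r < 0 else "0")
    return signs
```

For question (1), pass the expenditure-per-pupil column as `x` and the total SAT column as `y`. The slope `b1` is the change in predicted average total SAT score for each one-unit increase in spending (in the units of the data file), and its sign always matches the sign of `r`. Ranking by |r| treats a strong negative correlation as just as strong as a strong positive one, which matters for question (3). On Python 3.10 or later, `statistics.correlation` and `statistics.linear_regression` compute the same correlation, slope, and intercept.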

(1) What is the relationship between the amount of money spent per pupil and the average total SAT score? What does the scatter plot show? What is the correlation? What is the slope coefficient in the linear regression of total SAT score on expenditure per pupil? What does it say about how SAT scores can be expected to change when more money is spent per pupil?

(2) What do you find when you examine the relationship between the student:teacher ratio and total SAT score?

(3) Of the variables included in the data set, which is most highly correlated with total SAT score? Does this variable seem to have a larger effect on Math or Verbal scores, or is the effect about equal?

(4) Examine the linear relationship between total SAT score and the variable you found to be most highly correlated with it. Suggest explanations for the shape of the relationship.

(5) Suppose your state decides to focus on raising its average SAT score. According to these data, what is the best thing to do to try to raise a state's average SAT score?

(6) Remember that this is aggregate, state-level data. You cannot see what is happening in individual schools or even in individual districts. How does that affect our ability to understand variability in SAT scores? Is there any downside to having only state-level data?
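One way to see what aggregation hides is to take individual-level records, average them within groups, and compare the correlation before and after. The SAT data set does not contain individual records, so the sketch below assumes hypothetical ones (a group label such as a school or state, plus two measurements) and only shows the mechanics. Averaging removes the variation within each group, so the correlation between group means is often stronger than, and can even differ in sign from, the correlation among the individuals.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def aggregate(groups, x, y):
    """Collapse individual records to one (mean x, mean y) pair per group.

    groups[i] is the hypothetical group label (e.g., a state) for record i.
    Groups are returned in sorted label order.
    """
    xs, ys = defaultdict(list), defaultdict(list)
    for g, a, b in zip(groups, x, y):
        xs[g].append(a)
        ys[g].append(b)
    labels = sorted(xs)
    return [mean(xs[g]) for g in labels], [mean(ys[g]) for g in labels]
```

With hypothetical records in `groups`, `x`, and `y`, compare `pearson_r(x, y)` with `pearson_r(*aggregate(groups, x, y))`; a state-level data set only ever lets you compute the second of these.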

Discuss your answers to these questions on the Discussion Board. We will revisit aggregated SAT data in the Unit 3 demonstration.