More on Two Variable Relationships

 Main Concepts

Not all relationships are linear. In this unit we explore some ways to work with relationships that are not. Transforming a non-linear relationship into a linear one is a difficult concept for many students.

• Transforming the data to linearity
Some non-linear relationships can be made linear by transforming the x variable, the y variable, or both. Although there are infinitely many transforming functions to consider, in practice (in this course) only power transformations, exponentials, and logarithms are used. Sometimes the appropriate transformation is suggested by a theory that prescribes the relationship (e.g. a physics equation). At other times, one simply tries a transformation and checks whether it improves the residual plot, meaning the residuals show less of a pattern.
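As a rough illustration of the idea, the sketch below fits a least-squares line to made-up exponential data, first on the raw scale and then after taking the logarithm of y. The data, the helper name `fit_line`, and the choice of y = 2·exp(0.5·x) are all assumptions made for the example; they do not come from the course materials.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope, r_squared) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

# Hypothetical, noise-free data that follow y = 2 * exp(0.5 * x).
xs = list(range(1, 11))
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Fit on the raw scale, then on the log scale (transforming y only).
raw_fit = fit_line(xs, ys)
log_fit = fit_line(xs, [math.log(y) for y in ys])

print("R-squared, raw y:", round(raw_fit[2], 3))
print("R-squared, log y:", round(log_fit[2], 3))
print("log-scale intercept and slope:", round(log_fit[0], 3), round(log_fit[1], 3))
```

Because ln(y) = ln(2) + 0.5·x exactly for these data, the log-scale fit recovers a slope of 0.5 and an intercept of ln(2), about 0.693. Real data would include noise, so the residual plot is still worth checking after any transformation.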

• Errors in Interpretation of Regression
Causal relationships cannot be inferred from regressions, correlations, or scatterplots. Language is important here. Algebra teaches us that the slope tells us the "change in y for a unit change in x". But in most contexts in which we interpret regression, x was not observed to change (and it may not even be possible to change it). A safer reading is that the slope describes how the average y differs between cases whose x-values differ by one unit.

• Aggregate data
Many data sets are actually aggregates of larger collections of data, and these can lead to a false sense of security. For example, suppose we plot the average SAT score for each state against that state's average expenditure per pupil. The "actual" data consists of the SAT scores of the many thousands of students in each state plotted against, say, that state's average expenditure. Had we plotted each student's SAT score vs. state expenditures, we would have seen a much "messier" scatterplot and a much lower R-squared value. This means that the correct interpretation is a statement about the relationship between states' average scores and states' expenditures, and not about individuals' scores and states' expenditures (which would be a much weaker relationship).
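The effect of aggregation can be seen in a small simulation. The sketch below is only an illustration under invented assumptions: ten hypothetical "states", 200 simulated "students" per state, a made-up linear trend, and a made-up noise level. None of these numbers are real SAT or expenditure data.

```python
import random

def r_squared(xs, ys):
    """Squared correlation between xs and ys (R-squared of the least-squares line)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable

state_x = list(range(1, 11))  # hypothetical expenditure per state, arbitrary units
student_x, student_y, state_mean_y = [], [], []

for x in state_x:
    # Each simulated student's score: a weak trend in x plus large individual noise.
    scores = [10 + 2 * x + random.gauss(0, 10) for _ in range(200)]
    student_x.extend([x] * len(scores))
    student_y.extend(scores)
    state_mean_y.append(sum(scores) / len(scores))

r2_students = r_squared(student_x, student_y)  # one point per student
r2_states = r_squared(state_x, state_mean_y)   # one point per state (averages)

print("R-squared using individual students:", round(r2_students, 3))
print("R-squared using state averages:", round(r2_states, 3))
```

Averaging within each state removes most of the student-to-student noise, so the state-level R-squared comes out far higher than the student-level one even though both are computed from the same simulated scores.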

• Extrapolating
Try not to do this, particularly if you are a weatherman or stockbroker. The idea is that many, many phenomena are linear over short segments but non-linear over a larger scale. So if you predict a y-value for x-values beyond the range of the observed data, you are implicitly assuming that the relationship continues to be linear (with the same slope) outside that range. This assumption is often impossible to verify and is frequently untrue.
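A small numerical sketch of the danger, using an invented relationship: suppose the true relationship is y = x², but x is only observed between 10 and 11, where the curve looks almost perfectly straight. The function `true_y`, the observed range, and the prediction point x = 20 are all assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def true_y(x):
    """Hypothetical true relationship: curved overall, nearly straight on a short segment."""
    return x ** 2

# Observe x only on the short segment from 10 to 11.
xs = [10 + i / 10 for i in range(11)]
ys = [true_y(x) for x in xs]
intercept, slope = fit_line(xs, ys)

def predict(x):
    """Prediction from the fitted line."""
    return intercept + slope * x

# Largest miss inside the observed range versus the miss far outside it.
worst_in_range = max(abs(predict(x) - true_y(x)) for x in xs)
extrapolation_error = abs(predict(20) - true_y(20))

print("largest error inside the observed range:", round(worst_in_range, 2))
print("error when extrapolating to x = 20:", round(extrapolation_error, 2))
```

Within the observed range the line misses the curve by a fraction of a unit, but at x = 20 it predicts about 310 when the true value is 400. Nothing in the observed data would have warned of this.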