

Main Concepts
Not all relationships are linear. In this unit we explore some ways to
handle nonlinear relationships and some common errors in interpreting
regression. Transforming nonlinear relationships into linear
relationships is a difficult concept for many students.
• Transforming the data to linearity
Some nonlinear relationships can be transformed into linear
relationships by transforming either the x variable, the y variable, or
both. Although there is an infinite variety of transforming functions
to consider, in practice (in this course) only power transformations,
exponentials, and logarithms are used. Sometimes a theory that
prescribes the relation (e.g. a physics equation) suggests the
appropriate transform; at other times one simply tries a transform and
checks whether it makes the residual plot look better.
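As a minimal sketch of transforming the y variable, the example below fits a
line to made-up exponential data twice: once to (x, y) and once to (x, log y).
The data, the helper name fit_line, and the constants 3.0 and 0.5 are
assumptions chosen for illustration; they do not come from the course
materials.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; also returns R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Synthetic data (assumed for illustration): exact exponential growth,
# y = 3 * exp(0.5 * x), with no noise.
xs = list(range(1, 11))
ys = [3.0 * math.exp(0.5 * x) for x in xs]

# Fit a line to the raw data, then to the data with y replaced by log(y).
raw_slope, raw_intercept, raw_r2 = fit_line(xs, ys)
log_slope, log_intercept, log_r2 = fit_line(xs, [math.log(y) for y in ys])

print("raw fit R-squared:   ", raw_r2)
print("log(y) fit R-squared:", log_r2)
print("log(y) fit slope and intercept:", log_slope, log_intercept)
```

Because log(3 * exp(0.5 * x)) = log(3) + 0.5 * x, the transformed fit should
recover a slope near 0.5 and an intercept near log(3). With real data the
improvement would be judged from the residual plot rather than from R-squared
alone.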
• Errors in Interpretation of Regression
Causal relations cannot be inferred from regressions, correlations, or
scatterplots. Language is important here. Algebra teaches us that the
slope tells us the "change in y for a unit change in x". But in most
contexts in which we interpret regression, x was not observed to change
(and it may not even be possible to change it). It is safer to describe
the slope as the average difference in y between cases that differ by
one unit in x.
• Aggregate data
Many data sets are actually aggregates of larger collections of data,
and these can lead to a false sense of security. For example, suppose
we plot the average SAT score by state against the average expenditure
per pupil. The "actual" data consists of the SAT scores of the many
thousands of students in each state plotted against, say, that state's
average expenditure. Had we plotted each student's SAT score vs. state
expenditures we would have seen a much "messier" scatter plot, and
would have had a much lower R-squared value. This means that the
correct interpretation will include a statement about the relationship
between states' scores and states' expenditures, and not about
individuals' scores and states' expenditures (which would be a much
weaker relation).
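The effect of aggregation can be sketched with a small simulation. All
numbers below are invented for illustration and are not real SAT or
expenditure data: 20 hypothetical states, 200 hypothetical students per
state, an assumed slope of 10 points per unit of spending, and student-level
noise with a standard deviation of 100 points.

```python
import random

def r_squared(xs, ys):
    """Squared correlation between xs and ys (R-squared of a simple linear fit)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
N_STATES = 20             # hypothetical number of states
STUDENTS_PER_STATE = 200  # hypothetical number of students per state

student_x, student_y = [], []  # one point per student
state_x, state_y = [], []      # one point per state (the aggregate)

for i in range(N_STATES):
    spending = 5.0 + 0.5 * i  # invented per-pupil expenditure for state i
    # Each student's score = assumed linear trend + large individual noise.
    scores = [900.0 + 10.0 * spending + rng.gauss(0.0, 100.0)
              for _ in range(STUDENTS_PER_STATE)]
    student_x.extend([spending] * STUDENTS_PER_STATE)
    student_y.extend(scores)
    state_x.append(spending)
    state_y.append(sum(scores) / len(scores))  # state average score

r2_students = r_squared(student_x, student_y)
r2_states = r_squared(state_x, state_y)

print("R-squared, individual students:", r2_students)
print("R-squared, state averages:     ", r2_states)
```

Averaging within each state cancels most of the student-to-student noise, so
the state-level plot looks much tighter than the student-level plot even
though both come from the same underlying data.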
• Extrapolating
Try not to do this, particularly if you are a weatherman or stock
broker. The idea is that many, many phenomena are linear for short
segments, but nonlinear over a larger scale. So if you try to predict
a y-value for x-values beyond the range of the observed data, you are
implicitly assuming that the relationship will continue to be linear
(with the same slope) beyond that range. This is an assumption that is
often impossible to verify and frequently untrue.
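A minimal sketch of the danger, using made-up data: the assumed true
relationship is y = x squared, observed only for x between 1 and 2, where it
looks nearly linear. The helper name fit_line and the choice of x = 10 as the
extrapolation point are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; also returns R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Observed range (assumed): x = 1.0, 1.1, ..., 2.0, with y = x squared.
xs = [1.0 + 0.1 * i for i in range(11)]
ys = [x ** 2 for x in xs]

slope, intercept, r2 = fit_line(xs, ys)

# Extrapolate far beyond the observed range and compare with the true curve.
x_new = 10.0
predicted = intercept + slope * x_new
true_value = x_new ** 2

print("R-squared within the observed range:", r2)
print("line's prediction at x = 10:", predicted)
print("true value at x = 10:       ", true_value)
```

Within the observed range the line fits well (R-squared above 0.98), yet at
x = 10 it predicts roughly 27.85 where the true value is 100. Nothing in the
observed data warns of this.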
