• In this unit we introduce the concept of a statistical model. A
model is a set of assumptions about a variable or about a relationship
between variables. It is an idealization that we hope approximates
reality closely enough for our purposes. Or, as
George Box, a well-known statistician, once said: "All models are
wrong; some are useful."
• Models can be used for different purposes, including summarizing
relationships, making predictions, and understanding phenomena.
• A surprisingly large number of interesting relationships can be
described well by a linear model.
• Regression is a complex and subtle tool about which entire books
have been written. In this course we will only lightly scratch the
surface; we will return to the topic at the end of the course and
scratch it once more.
• Correlation does not imply causation. Correlation merely measures the
strength and direction of a linear relationship between two variables;
in other words, it tells us something about how well a straight line
can predict one variable from the other.
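To make "strength and direction" concrete, here is a minimal sketch of the Pearson correlation coefficient r computed from its usual definition (the sum of cross-products of deviations from the means, scaled by the two sums of squared deviations). The function name `pearson_r` and the small data sets are made up for illustration. The sign of r gives the direction of the linear trend, and its magnitude (at most 1) gives how tightly the points cluster around a line.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products of deviations: positive when x and y move together.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations, used to put r on a scale from -1 to 1.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0: points lie exactly on a rising line
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0: points lie exactly on a falling line
print(pearson_r(x, [2, 1, 4, 3, 5]))   # 0.8: rising trend with some scatter
```

Nothing in the calculation refers to cause and effect: swapping the roles of x and y gives the same r.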
• High correlation does not mean that the linear model is good, and low
correlation does not mean that the linear model is inappropriate. The
correlation coefficient measures how close the data lie to a straight
line, but it does not measure the appropriateness of the linear model.
• In addition to learning to interpret models, we also study how well
suited they are to answering our questions and how well they fit the
data. Residual plots are useful tools for evaluating the fit of the