Regression Revisited

 

   

 Activity

This activity will (hopefully) reinforce the idea of slope and intercept as statistics, each varying from sample to sample. We will use Beth Chance's regression sampling applet:

http://statweb.calpoly.edu/chance/applets/regcoeff/regcoeff.html

The applet allows you to select the population slope and intercept, which in turn determine the population regression line (in yellow). You can also choose a mean and standard deviation for the x-values, and finally the population standard deviation for the responses about the regression line. For now, let's all be consistent:

• Set the population slope to 1.5 and the population intercept to 2. Keep all other values the same.

• Click the Set Population button to create your new population of data. (Note: If you would like to see more of the graph, you can change the window frame using the gray boxes along the four sides of the graph.)

At the bottom of the page, you should see the equation y = 1.50x+2. That is the population equation.

We will now sample from the population displayed on the graph (the blue dots).

• Hit the Draw Samples button once. The applet randomly selects a sample of points (in red; n = 80 is the default). Then the applet calculates the least squares regression line for those n points and graphs that line in red. The equation of the line appears at the bottom of the page. Is it exactly the same as our population equation? Do the graphs line up exactly? Why not?

• Hit the Draw Samples button a few more times, just to see how the samples (and, hence, the resulting least squares regression lines) differ from sample to sample. This is an illustration of sampling variability.

• Change the "num samples" from 1 to 100, then click Draw Samples. The applet will superimpose all 100 sample least squares lines onto the graph (the "wave" of red) and launch a window with dot plots of the sampling distributions of the slope and intercept. Focus on the slopes: what do you see? Is the center of the dot plot reasonably close to 1.5? Do you notice a shape forming?

• Before you close the dot plot window, write down the standard deviation of the slopes. You will compare later results against it.
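
If the applet is unavailable, the steps above can be roughly imitated in a few lines of Python. This is only a sketch under stated assumptions, not the applet's own code: it draws each sample directly from the model y = 2 + 1.5x + noise with normally distributed x-values and errors, rather than from a fixed population of dots, and the x mean of 10 is a made-up value (the activity does not state the applet's default). The function names are ours.

```python
import random
import statistics


def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def sample_slopes(slope=1.5, intercept=2.0, sigma=0.45,
                  x_mean=10.0, x_std=1.84, n=80, num_samples=100, seed=1):
    """Draw num_samples samples of size n and return the fitted slopes.

    x_mean=10.0 is an assumed value; the other defaults follow the activity.
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(num_samples):
        xs = [rng.gauss(x_mean, x_std) for _ in range(n)]
        # Responses scatter about the population line with SD sigma.
        ys = [intercept + slope * x + rng.gauss(0.0, sigma) for x in xs]
        b1, _b0 = least_squares(xs, ys)
        slopes.append(b1)
    return slopes


slopes = sample_slopes()
print("mean of sample slopes:", round(statistics.fmean(slopes), 3))
print("SD of sample slopes:  ", round(statistics.stdev(slopes), 4))
```

The mean of the 100 slopes plays the role of the center of the dot plot, and the standard deviation is the value to note for later comparison. Changing `seed` gives a different set of 100 samples.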

Now let's see how varying the other "parameters" of the applet changes things.

• Hit the Reset button.

• Change the value of sigma from .45 (the default) to 2.45, and click Set Population. What happened to the population graph? Remember, sigma is the standard deviation of the y-values about the regression line.

• Once again, take 100 samples of size n = 80 and look at the sampling distribution of the slopes. What happened to their spread? (That is, did the standard deviation of the slopes increase or decrease, compared to the value you noted earlier?) Is this what you would expect to happen for a larger value of sigma?

• Change sigma back to .45 (the default) and click Set Population. For our next illustration, change the sample size from 80 to 20. Again, take 100 samples and look at the resulting slopes. What happened to the spread of the slopes this time? Is this what you would expect to happen for a smaller sample size?

• Finally, change the sample size back to 80 (the default). How do the x-values play a role? To find out, change "x std" (the standard deviation of the x-values) from 1.84 to 4.84 and click Set Population. Take 100 samples again. What happened to the spread of the slopes? Does this result surprise you?
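
To double-check what you saw in the applet, the four settings used above can be compared side by side in Python. Again, this is a sketch under assumptions rather than a copy of the applet: samples are drawn straight from the model y = 2 + 1.5x + noise with normal x-values and errors, the x mean of 10 is an invented value, and `slope_sd` is our own helper name.

```python
import random
import statistics


def slope_sd(sigma=0.45, x_std=1.84, n=80, num_samples=100, seed=1):
    """SD of the least squares slopes over repeated samples of size n."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(num_samples):
        xs = [rng.gauss(10.0, x_std) for _ in range(n)]  # x mean of 10 is assumed
        ys = [2.0 + 1.5 * x + rng.gauss(0.0, sigma) for x in xs]
        x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        sxx = sum((x - x_bar) ** 2 for x in xs)
        slopes.append(sxy / sxx)  # least squares slope for this sample
    return statistics.stdev(slopes)


# Each entry changes one value from the defaults, as in the activity.
settings = {
    "defaults (sigma=.45, n=80, x std=1.84)": {},
    "sigma = 2.45": {"sigma": 2.45},
    "n = 20": {"n": 20},
    "x std = 4.84": {"x_std": 4.84},
}
results = {label: slope_sd(**kwargs) for label, kwargs in settings.items()}
for label, sd in results.items():
    print(f"{label:40s} SD of slopes = {sd:.4f}")
```

Compare each printed standard deviation with the one for the defaults, just as you compared the applet's dot plots with the value you noted earlier.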


Assuming the simulations went according to plan, we should have found three patterns in how the least squares slopes vary from one random sample to the next:

1) The larger the standard deviation of the responses about the line, the more widely varying our estimates of the slope will be.

2) The variability of the sampling distribution of the slopes is larger for smaller sample sizes.

3) Slopes across different samples are less variable when the x-values are more variable.
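
All three patterns can be read off one standard formula. Under the usual simple linear regression model (independent errors with common standard deviation sigma), and treating the x-values of a sample as fixed, the standard deviation of the sample slope is given below. The symbols b_1 (the sample slope) and s_x (the sample standard deviation of the x-values) are our notation, not the applet's.

```latex
\mathrm{SD}(b_1)
  \;=\; \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
  \;=\; \frac{\sigma}{s_x \sqrt{n - 1}}
```

A larger sigma in the numerator makes the slopes more variable (pattern 1), while a larger n or a larger s_x in the denominator makes them less variable (patterns 2 and 3).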