Inference and Sampling Distributions
Main Concepts: Sampling Distributions
The central idea of this unit is to answer the question: how can we extend what we've learned about samples to the entire population?
• There is a crucial distinction to be made between statistics (functions of data) and parameters (which characterize probability distributions). Statistics are observable (because we get them from our data). Parameters are often unknowable, because they "belong" to populations. We will use the statistics of the sample to estimate the parameters of the population.
• Statistics, because they are based on data and, ideally, data are generated by some sort of random process, will vary from sample to sample. This means that statistics are random variables, and therefore have their own probability distributions. This is a Big Idea and a Very Difficult Concept. The probability distribution of a statistic is called the sampling distribution.
• Sampling distributions are very abstract, which is partly why they are difficult. What makes them abstract is that they do not describe variability within a single sample, but variability from sample to sample.
• One of the Big (and Beautiful) Ideas of Statistics is the Central Limit Theorem. Loosely, if many samples of size n were drawn randomly and independently from a population with finite variance, and n were sufficiently large, then the histogram of the means of those samples would be approximately normal. This is commonly misinterpreted to mean that the data within a sample will be normally distributed. The theorem describes the distribution of the sample means, not the distribution of the observations.
• In theory, the sampling distribution can often be figured out mathematically based on probability theory. In practice, this can be quite difficult or impossible for some statistics. The Central Limit Theorem is one way of providing an approximate sampling distribution for some statistics, but it is not universally applicable.
• The Central Limit Theorem is one of the major concepts of Statistics and is seductively useful. However, it is not a panacea. There are some statistics for which the sampling distribution is not approximately normal, no matter how large the sample size. And there are some populations that are so non-normal that astronomically large sample sizes are required before the sampling distribution is approximately normal.
• When we say that an estimator is an unbiased estimator of a parameter, we mean that the method, applied to repeated samples, produces estimates whose average equals the value of the parameter. It would be incorrect to say that 2.4 is an "unbiased" estimate of the population mean, because the term "unbiased" refers to the method, not the number.
• Be aware that although the books dwell on sampling distributions of sample means and sample proportions, there are as many sampling distributions as there are statistics. We feel it is important for both you and your students to look at sampling distributions distinct from the Central Limit Theorem. It's not that we think the sampling distribution of, say, the sample maximum is useful information, but the activity of examining it helps us learn about the more general concept of sampling distributions.
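Several of the ideas above (variability from sample to sample, the Central Limit Theorem, unbiasedness, and the sampling distribution of a statistic other than the mean) can be explored with a short simulation. The following is a minimal sketch, not a prescribed method. It assumes a hypothetical population that is exponential with mean 1, and the sample size, number of repetitions, and function names (draw_sample, simulate_sampling_distribution, skewness) are illustrative choices that do not come from the unit materials.

```python
import random
import statistics


def draw_sample(rng, n):
    """Draw n independent values from a hypothetical population:
    exponential with mean 1, which is strongly right-skewed."""
    return [rng.expovariate(1.0) for _ in range(n)]


def simulate_sampling_distribution(statistic, n, reps, rng):
    """Approximate a sampling distribution by simulation: draw `reps`
    samples of size n and record the value of `statistic` for each."""
    return [statistic(draw_sample(rng, n)) for _ in range(reps)]


def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric shape
    such as the normal curve, positive for a long right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(0)  # fixed seed so the run is repeatable
n, reps = 30, 5000      # assumed values, chosen only for illustration

# One value of the statistic per simulated sample.
sample_means = simulate_sampling_distribution(statistics.fmean, n, reps, rng)
sample_maxes = simulate_sampling_distribution(max, n, reps, rng)

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
skew_of_means = skewness(sample_means)
skew_of_maxes = skewness(sample_maxes)

print("mean of the sample means:", round(mean_of_means, 3))
print("SD of the sample means:  ", round(sd_of_means, 3))
print("skewness of sample means:", round(skew_of_means, 3))
print("skewness of sample maxes:", round(skew_of_maxes, 3))
```

Under these assumptions, the mean of the simulated sample means should land near the population mean of 1, which is what unbiasedness of the sample mean looks like in practice, and their standard deviation should land near 1/sqrt(30). The sample means should be noticeably less skewed than the population itself, while the sample maxima should remain clearly right-skewed, which illustrates that not every statistic has an approximately normal sampling distribution.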