Comparing Two Populations
3a) The sample sizes are fairly large, so we may rely on the central limit theorem: our estimator, the difference between the two sample proportions, approximately follows a normal distribution.
Let p1 represent the proportion of the population who downloaded music in the Spring and p2 represent the proportion who did the same in the Winter. We wish to estimate p2 - p1.
Our estimate of this difference is 0.14 - 0.29 = -0.15. The standard error of this estimator is the square root of (0.14*0.86/1358 + 0.29*0.71/1300), which is 0.0157. (For example, see 12.2 of YMS or 11.3 of POD.)
So a 95% CI is -0.15 +/- 1.96*0.0157, or -0.15 +/- 0.031, or about [-18%, -12%]. With 95% confidence, we conclude that the proportion of the population who downloaded music in the Winter was between 12 and 18 percentage points below the corresponding proportion in the Spring.
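As a check on the arithmetic in 3a, here is a minimal Python sketch of the large-sample interval for p2 - p1. It uses only the reported proportions and sample sizes from the text; the function name `two_prop_ci` and the default `z_star=1.96` (for 95% confidence) are illustrative choices, not part of any library.

```python
import math

def two_prop_ci(p1_hat, n1, p2_hat, n2, z_star=1.96):
    """Large-sample CI for p2 - p1 using the unpooled standard error."""
    estimate = p2_hat - p1_hat
    # Unpooled SE: each sample contributes its own p(1-p)/n term
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z_star * se
    return estimate, se, (estimate - margin, estimate + margin)

# Spring (p1) and Winter (p2) figures from the text
estimate, se, (lo, hi) = two_prop_ci(0.29, 1300, 0.14, 1358)
print(f"estimate={estimate:.2f}, SE={se:.4f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Running it should show an SE of about 0.0157 and an interval of roughly (-0.181, -0.119), which rounds to the [-18%, -12%] quoted above.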
3b) The null hypothesis is p1 = p2 and the alternative is p1 > p2.
We perform all calculations assuming the null hypothesis is true. In that case we can pool our estimates of p1 and p2, but to do this we need to know how many people answered "yes" in each poll. In the Spring, we assume 1300 people were polled and are told that 29% answered "yes", so 0.29*1300 = 377 people said yes. In the Winter, 0.14*1358 = 190.12, which we round up to 191 people who said yes.
In other words, let p represent the proportion of the population that downloads music. If H0 is true, p = p1 = p2, and so a good estimate of p is (377+191)/(1300+1358) = 0.21369. We'll use this to estimate the standard error of our estimator of the difference between the two population proportions.
Our test statistic (again refer to POD or YMS) is Z= (0.14 - 0.29)/sqrt(0.21369*0.78631*(1/1300+1/1358)) = -9.43.
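The pooled calculation can be sketched the same way. One assumption to flag: to mirror the text exactly, the sketch pools the rounded counts (377 and 191) but keeps the reported proportions 0.29 and 0.14 in the numerator; using 191/1358 in the numerator instead would give a slightly different z (about -9.39). The helper `pooled_two_prop_z` is a hypothetical name.

```python
import math

def pooled_two_prop_z(x1, n1, x2, n2, p1_hat, p2_hat):
    """z statistic for H0: p1 = p2, using the pooled proportion in the SE."""
    p_pool = (x1 + x2) / (n1 + n2)  # estimate of the common p under H0
    se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p_pool, (p2_hat - p1_hat) / se0

# Counts from the text: 377 of 1300 (Spring), 191 of 1358 (Winter)
p_pool, z = pooled_two_prop_z(377, 1300, 191, 1358, 0.29, 0.14)
print(f"pooled p = {p_pool:.5f}, z = {z:.2f}")
```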
To calculate p-values, we must assume that both samples were random and independent and that the sample sizes are "large". "Large" is often taken to mean n*0.21369 > 10 and n*(1-0.21369) > 10 for each sample. This is easily met here: even for the smaller sample, 1300*0.21369 is about 278 and 1300*0.78631 is about 1022, both well over 10.
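The rule of thumb for "large" translates directly into a check on expected counts. A minimal sketch, with the threshold of 10 from the text exposed as a parameter; `large_enough` is a hypothetical helper name.

```python
def large_enough(n, p, threshold=10):
    """Rule-of-thumb check: expected 'yes' and 'no' counts both exceed threshold."""
    return n * p > threshold and n * (1 - p) > threshold

p_pool = 0.21369  # pooled estimate from the text
for label, n in [("Spring", 1300), ("Winter", 1358)]:
    print(label, round(n * p_pool, 1), round(n * (1 - p_pool), 1), large_enough(n, p_pool))
```

A sample of, say, 30 would fail the check, since 30*0.21369 is only about 6.4.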
P(Z < -9.43) is essentially 0, so we reject the null hypothesis and conclude that the proportion of the population who download music really did decline over the study period.
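The p-value needs only the standard normal CDF, which Python's standard library can supply through `math.erfc`. Writing the CDF as 0.5*erfc(-z/sqrt(2)) instead of 0.5*(1 + erf(z/sqrt(2))) keeps precision in the far tail, where a value like z = -9.43 would otherwise round to exactly 0. This is a sketch; `normal_cdf` is our own name, not a library function.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

z = -9.43
p_value = normal_cdf(z)  # one-sided: Ha is p1 > p2, i.e. p2 - p1 < 0
print(f"P(Z < {z}) = {p_value:.2e}")
```

It reports a probability of roughly 2e-21, which is what "essentially 0" means here.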
Solutions to Practice Problems
1) The data set records gender. We'll test whether there's a difference in mean body temperature between men and women.
a) State the null and alternative hypotheses.
Null: the mean temperature for women is equal to the mean temperature for men; equivalently, the difference between the mean temperature for women and the mean for men is 0. Note that it is important to state the hypotheses in terms of population parameters.
Alternative: the mean temperature for women does not equal the mean temperature for men.
b) What are the assumptions? Are they satisfied here?
We assume the observations are a random sample and are independent within each group (all of the men are independent of each other, and likewise for the women) and that the two groups are independent of each other. We have no good way to check the assumption of independence, but from what we are told about the data it seems plausible. Since both samples are large, we do not need to verify that the samples came from normally distributed populations. (Note: if either sample were small, you would need to check each sample separately for normality using normal quantile plots.)
c) State the test and give the observed value of the test statistic.
We'll use the (unpooled) t-test. The observed value for this data set is t = 2.28.
d) What's the p-value?
The p-value = P(|T| > 2.28, assuming the null hypothesis is true), where T follows a t-distribution with approximately 127.791 degrees of freedom (from Fathom). This is equal to 0.024.
e) State your conclusion.
We conclude, at the 5% level, that the mean body temperature for men differs from the mean body temperature for women.
2) Test whether pulse rates differ for men and women. (Note: this is a "trick" question. Your answer should include all of the "correct" components and not just state a conclusion.)
H0: mean pulse rate for women = mean pulse rate for men.
Ha: mean pulse rate for women ≠ mean pulse rate for men.
We assume that the pulse rates are random samples from the populations of healthy adult men and women and are independent both within and across genders. We will just have to assume that the data were collected in a way that supports the independence and randomness assumptions. (Probably the data were not a random sample; we will proceed as if they were.) Since the samples are large, we do not need to assume the populations are normally distributed. We'll use an unpooled t-statistic. Fathom calculates the observed value of this statistic to be 0.6319. The p-value, using 116.704 degrees of freedom, is 0.53. This is larger than 0.05, so we fail to reject the null hypothesis: there is insufficient evidence to conclude that men and women differ in their mean pulse rates.
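The p-values Fathom reports for both practice problems can be checked without a statistics package. The sketch below is a stand-in for Fathom, not its actual method: it integrates the t density numerically (Simpson's rule) to get a two-sided p-value for a given t and degrees of freedom. It also includes the Welch-Satterthwaite formula, which is presumably what produces fractional degrees of freedom such as 127.791 and 116.704 for the unpooled test. All function names are hypothetical, and the group standard deviations that `welch_df` needs are not given in the text, so it is only exercised on a simple case in the note below.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution; df may be fractional."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t, df, steps=2000):
    """P(|T| > |t|), via Simpson's rule on the central area P(0 < T < |t|)."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return 1 - 2 * central

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom for the unpooled t-test."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Observed t and df as reported by Fathom in the text
print(round(t_two_sided_p(2.28, 127.791), 3))    # problem 1 (body temperature)
print(round(t_two_sided_p(0.6319, 116.704), 2))  # problem 2 (pulse rate)
```

Run as written, these reproduce the p-values quoted above (0.024 and 0.53). As a sanity check on `welch_df`: with equal standard deviations and equal sample sizes it reduces to n1 + n2 - 2, so `welch_df(1.0, 50, 1.0, 50)` gives 98.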