9-5 Resampling: Using Technology for Inferences 493

This confidence interval is very close to the one previously obtained in Section 9-1. Again, because the confidence interval does not include 0, it appears that the e-cigarette and nicotine replacement treatment groups have different success rates. Also, the confidence interval gives us an estimate of the size of the difference.

Randomization

The randomization method for two sample proportions involves combining both sets of sample data (for proportions, use 0’s and 1’s), and then randomly selecting samples without replacement using the same sample sizes as the original samples.

Example: The difference between the two sample proportions is 79/438 - 44/446 = 0.08171059. If we use the randomization procedure with those two samples and use technology to get 1000 simulated differences, a typical result is that a difference of 0.08171059 or more never occurs, and a difference of -0.08171059 or below never occurs. That is, a difference “at least as extreme” as 0.08171059 does not occur among the simulated differences. Consequently, there is a very small chance of getting a difference at least as extreme as 0.08171059. This is similar to the result of the P-value method in Section 9-1 (P-value = 0.00045). Because the original samples do have a difference of 0.08171059, we conclude that this difference is significant: it appears that the e-cigarette and nicotine replacement treatment groups have different success rates.

Two Means: Independent Samples

Section 9-2 presents methods for using sample data from two independent samples to make inferences about two population means. The following example illustrates the use of resampling with means from two independent samples.

EXAMPLE 3 Resampling to Test a Claim About Two Means: Independent Samples

Example 1 in Section 9-2 includes the following heights (mm) of randomly selected U.S. Army male personnel measured in 1988 (from Data Set 2 “ANSUR I 1988”) and different heights (mm) of randomly selected U.S. Army male personnel measured in 2012 (from Data Set 3 “ANSUR II 2012”). That example specified a 0.05 significance level for testing the claim that the mean height of the 1988 population is less than the mean height of the 2012 population.

ANSUR I 1988: 1698 1727 1734 1684 1667 1680 1785 1885 1841 1702 1738 1732
ANSUR II 2012: 1810 1850 1777 1811 1780 1733 1814 1861 1709 1740 1694 1766 1748 1794 1780

The above two data sets have this difference between their means:

x̄1 - x̄2 = 1739.4 - 1777.8 = -38.4

Bootstrapping

The procedure for bootstrap resampling involves creating a bootstrap sample for the first sample by sampling with replacement, then doing the same for the second sample. Next, find the difference between the two bootstrap sample means. Repeat many times to get a large list of differences, then sort those differences and find the percentiles P2.5 and P97.5, which are the limits of a 95% confidence interval estimate of the difference between the two population means.
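The bootstrap steps just described can be sketched in Python using only the standard library. This is a minimal illustration, not the technology the text has in mind: the function names `bootstrap_diff_means` and `percentile`, the seed, and the choice of 1000 repetitions are assumptions made for the sketch, and the percentile rule (linear interpolation between sorted values) may differ slightly from the rule used elsewhere in the book.

```python
import random
from statistics import mean

# Heights (mm) from Example 3.
ansur_1988 = [1698, 1727, 1734, 1684, 1667, 1680,
              1785, 1885, 1841, 1702, 1738, 1732]
ansur_2012 = [1810, 1850, 1777, 1811, 1780, 1733, 1814, 1861,
              1709, 1740, 1694, 1766, 1748, 1794, 1780]


def bootstrap_diff_means(sample1, sample2, reps=1000, seed=None):
    """Return a sorted list of bootstrap differences (mean1 - mean2)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Resample each group separately, with replacement,
        # keeping the original sample sizes.
        boot1 = rng.choices(sample1, k=len(sample1))
        boot2 = rng.choices(sample2, k=len(sample2))
        diffs.append(mean(boot1) - mean(boot2))
    diffs.sort()
    return diffs


def percentile(sorted_values, p):
    """Percentile by linear interpolation (an assumed rule)."""
    pos = (len(sorted_values) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


observed = mean(ansur_1988) - mean(ansur_2012)
diffs = bootstrap_diff_means(ansur_1988, ansur_2012, reps=1000, seed=1)
lower = percentile(diffs, 2.5)   # P2.5
upper = percentile(diffs, 97.5)  # P97.5

print(f"observed difference: {observed:.1f}")
print(f"95% bootstrap CI: {lower:.1f} < mu1 - mu2 < {upper:.1f}")
```

Because the resampling is random, the interval limits change from run to run unless a seed is fixed, so results will vary somewhat from any values reported by other technology.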

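The randomization method for two proportions described earlier in this section can be sketched the same way. The pooled data are coded as 1's (successes) and 0's (failures), shuffled, and split into groups of the original sizes, which amounts to sampling without replacement. The function name `randomization_diffs`, the seed, and the two-sided count of "at least as extreme" differences are assumptions for illustration.

```python
import random


def randomization_diffs(x1, n1, x2, n2, reps=1000, seed=None):
    """Simulate differences p1 - p2 by reshuffling the pooled 0/1 data."""
    rng = random.Random(seed)
    successes = x1 + x2
    # Pool both samples: 1 = success, 0 = failure.
    pooled = [1] * successes + [0] * (n1 + n2 - successes)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)        # random reassignment without replacement
        group1 = pooled[:n1]       # same size as the first original sample
        group2 = pooled[n1:]       # same size as the second original sample
        diffs.append(sum(group1) / n1 - sum(group2) / n2)
    return diffs


# Sample counts from the e-cigarette / nicotine replacement example.
observed = 79 / 438 - 44 / 446
sim = randomization_diffs(79, 438, 44, 446, reps=1000, seed=1)

# Count simulated differences at least as extreme as the observed one
# (in either direction).
extreme = sum(1 for d in sim if abs(d) >= abs(observed))
proportion_extreme = extreme / len(sim)

print(f"observed difference: {observed:.8f}")
print(f"proportion at least as extreme: {proportion_extreme}")
```

A proportion at or near 0 among the simulated differences is in line with the text's description of a typical result, though the exact count depends on the random shuffles.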