9-5 Resampling: Using Technology for Inferences

Usefulness of Randomization: Assume that we are testing the null hypothesis that salaries are independent of gender, so that there is no difference between the two means: H0: μ1 = μ2 and H1: μ1 ≠ μ2.

■ The original data show a difference between means of -4.0.
■ The randomization shown below yields a difference of -1.1.
■ We can repeat the randomization 1000 times to determine whether the difference of -4.0 is significant in the sense that it rarely occurs, or whether the difference is not significant because it occurs often.

Original Data: Two Independent Samples ($k)

    Male    Female
      2        5
      3        6
      7        9
              12

Difference in means (d): 4.0 - 8.0 = -4.0

Combined Two Samples ($k)

    2   3   7   5   6   9   12

Randomization: Combined Samples Reallocated ($k)

    Male    Female
      2        3
      6        5
      9        7
              12

Difference in means: 5.7 - 6.8 = -1.1

Here are two important observations about the preceding reallocation of data from the two samples:

1. The reallocation of data to the male group and female group reflects a null hypothesis that there is no difference between the two groups. The resampling therefore yields a distribution of differences between means that is based on the null hypothesis, so it makes sense to use the resulting distribution for hypothesis tests about the mean difference.

2. The reallocation of data to the male group and female group destroys the individual characteristics of the original two data sets (such as means, standard deviations, distributions). As a result, the distribution of differences between means is worthless for obtaining a confidence interval that is an estimate of the difference between the two group means.

SOLUTION

Randomization Procedure for Estimating the P-Value

Randomization is used to estimate a P-value that can be used to form a conclusion about the null hypothesis H0: μ1 = μ2.
The following procedure for estimating a P-value is based on 1000 repetitions, but any large number of repetitions could be used instead. The steps in the procedure are vastly simplified by using technology.

Step 1: Find the difference d between the means of the original two samples.
Example: The difference between means for the male and female income samples is d = 4.0 - 8.0 = -4.0.

Step 2: Repeat the randomization (reallocation) procedure 1000 times or more.
Example: The randomization (reallocation) is completed 1000 times, with the male/female income sample data randomly reallocated (without replacement) between the two samples.
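Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration and not the output of any particular statistics package: the function names mean_difference and randomization_differences, the seed value, and the final two-sided tally are assumptions introduced here. The tally simply counts how often a reallocated difference is at least as far from 0 as d = -4.0, in line with the alternative hypothesis μ1 ≠ μ2.

```python
import random

# Original samples ($k), taken from the example above.
male = [2, 3, 7]
female = [5, 6, 9, 12]


def mean_difference(group1, group2):
    """Return the mean of group1 minus the mean of group2."""
    return sum(group1) / len(group1) - sum(group2) / len(group2)


def randomization_differences(group1, group2, repetitions=1000, seed=None):
    """Reallocate the combined data without replacement `repetitions` times
    and return the difference between means for each reallocation.
    (Hypothetical helper written for this illustration.)"""
    rng = random.Random(seed)
    combined = list(group1) + list(group2)
    n1 = len(group1)
    differences = []
    for _ in range(repetitions):
        # A random permutation of the combined data: every value is used
        # exactly once, so the reallocation is without replacement.
        shuffled = rng.sample(combined, len(combined))
        differences.append(mean_difference(shuffled[:n1], shuffled[n1:]))
    return differences


# Step 1: difference between the original sample means (4.0 - 8.0 = -4.0).
d = mean_difference(male, female)

# Step 2: 1000 random reallocations (the seed is an arbitrary assumption,
# used only so the run is repeatable).
diffs = randomization_differences(male, female, repetitions=1000, seed=1)

# Assumed two-sided tally: proportion of reallocated differences at least as
# extreme as d. The small tolerance guards against floating-point rounding.
proportion_extreme = sum(abs(x) >= abs(d) - 1e-9 for x in diffs) / len(diffs)

print("d =", d)
print("proportion at least as extreme as d:", proportion_extreme)
```

The proportion changes slightly with the seed and the number of repetitions, which is expected for an estimate based on random reallocation.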
