7-4 Bootstrapping: Using Technology for Estimates

Why Is It Called “Bootstrap”?
The term “bootstrap” is used because the data “pull themselves up by their own bootstraps” to generate new data sets. In days of yore, “pulling oneself up by one’s bootstraps” meant that an impossible task was somehow accomplished, and the bootstrap method described in this section might seem impossible, but it works!

How Many?
In the interest of providing manageable examples that don’t occupy multiple pages each, the examples in this section involve very small data sets and no more than 20 bootstrap samples, but we should use at least 1000 bootstrap samples when we use bootstrap methods in serious applications. Professional statisticians commonly use 10,000 or more bootstrap samples.

Bootstrap Procedure for a Confidence Interval Estimate of a Parameter
1. Given a simple random sample of size n, obtain many (such as 1000 or more) bootstrap samples of the same size n.
2. For the parameter to be estimated, find the corresponding statistic for each of the bootstrap samples. (Example: For a confidence estimate of μ, find the sample mean x̄ from each bootstrap sample.)
3. Sort the list of sample statistics from low to high.
4. Using the sorted list of the statistics, create the confidence interval by finding corresponding percentile values. Procedures for finding percentiles are given in Section 3-3. (Example: Using a list of sorted sample means, the 90% confidence interval limits are P5 and P95. The 90% confidence interval estimate of μ is P5 < μ < P95, where P5 and P95 are replaced with the actual percentile values calculated from the sample data.)

Limitations of the Following Examples
For the purpose of illustrating the bootstrap procedure, Examples 2, 3, and 4 all involve very small samples with only 20 bootstrap samples. Consequently, the resulting confidence intervals include almost the entire range of sample values, and those confidence intervals are not very useful.
Larger samples with 1000 or more bootstrap samples will provide much better results than those from Examples 2, 3, and 4.

Proportions
When working with proportions, it is very helpful to represent the data from the two categories by using 0’s and 1’s, as in the following example.

EXAMPLE 2 Eye Color Survey: Bootstrap CI for Proportion
In a survey, four randomly selected subjects were asked if they have brown eyes, and here are the results: 0, 0, 1, 0 (where 0 = no and 1 = yes). Use the bootstrap resampling procedure to construct a 90% confidence interval estimate of the population proportion p, the proportion of people with brown eyes in the population.

SOLUTION
REQUIREMENT CHECK The sample is a simple random sample. (There is no requirement of at least 5 successes and at least 5 failures or np ≥ 5 and nq ≥ 5. There is no requirement that the sample must be from a normally distributed population.)
The four solution steps are on the next page.
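
As a rough illustration only, the four-step bootstrap procedure described earlier in this section can be sketched in Python using the Example 2 eye color data. This is not the textbook's worked solution. The function names (percentile, bootstrap_ci, proportion), the fixed random seed, the use of 1000 bootstrap samples, and the percentile locator rule are all assumptions made for this sketch; if the procedure in Section 3-3 differs, that procedure should be used instead.

```python
import random


def percentile(sorted_values, k):
    """Return the k-th percentile (0 < k < 100, k an integer) of a sorted list.

    Locator rule (an assumption for this sketch): L = (k / 100) * n.
    If L is a whole number, average the L-th and (L + 1)-th values;
    otherwise round L up and take the value at that 1-based position.
    """
    n = len(sorted_values)
    if n == 0 or not 0 < k < 100:
        raise ValueError("need a non-empty list and 0 < k < 100")
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # 1-based positions L and L + 1 are indexes whole - 1 and whole.
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Rounding L up gives 1-based position whole + 1, which is index whole.
    return sorted_values[whole]


def proportion(values):
    """Sample proportion of 1's in a list of 0's and 1's."""
    return sum(values) / len(values)


def bootstrap_ci(sample, statistic, lower_pct=5, upper_pct=95,
                 n_boot=1000, seed=2024):
    """Percentile bootstrap interval; the defaults give a 90% interval."""
    rng = random.Random(seed)  # fixed seed (hypothetical) for repeatability
    n = len(sample)
    stats = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)  # Step 1: resample with replacement
        stats.append(statistic(resample))    # Step 2: statistic of each resample
    stats.sort()                             # Step 3: sort low to high
    # Step 4: read the interval limits from the sorted statistics.
    return percentile(stats, lower_pct), percentile(stats, upper_pct)


eye_colors = [0, 0, 1, 0]  # 0 = no, 1 = yes (data from Example 2)
low, high = bootstrap_ci(eye_colors, proportion)
print(f"90% bootstrap CI for p: {low} < p < {high}")
```

Because the sample has only four values, each bootstrap proportion can only be 0, 0.25, 0.5, 0.75, or 1, so the resulting interval is necessarily coarse, which is consistent with the limitations noted above. Passing a different statistic function (for example, one that returns the sample mean) would apply the same steps to another parameter.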
