356 CHAPTER 7 Estimating Parameters and Determining Sample Sizes

DEFINITION Given a simple random sample of size n, a bootstrap sample is another random sample of n values obtained with replacement from the original sample.

Without replacement, every sample would be identical to the original sample, so the proportions or means or standard deviations or variances would all be the same, and there would be no confidence "interval."

CAUTION Note that a bootstrap sample involves sampling with replacement, so that when a sample value is selected, it is replaced before the next selection is made.

Advantage of Bootstrapping
Bootstrap resampling gives us a reasonable estimate of how the point estimate of the parameter varies. With enough bootstrap samples, the resulting distribution of the sample statistic tends to be a reasonable approximation of the true distribution.

Disadvantage of Bootstrapping
With the bootstrap method, many samples are generated, and the statistic used to estimate a parameter is then computed from each of those samples. The sampling distribution of those statistics is centered around the sample statistic of the original sample data, not the population parameter being estimated. If we have a sample with a point estimate that is a very poor estimate of the population parameter, the bootstrap method will not make that point estimate any better.

Limitation of Percentile Bootstrapping Confidence Interval
The method of bootstrapping presented in this section involves resampling a data set and then constructing a confidence interval by finding percentile values of the statistic being analyzed. The result is a confidence interval estimate of a parameter. This approach of constructing a confidence interval using percentile values is not ideal. There are more advanced methods, such as the bias-corrected (BC) bootstrap confidence interval, that yield better confidence intervals, but they are beyond the scope of this text.
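The resampling and percentile steps described in this section can be sketched in Python. This is only a minimal illustration, not the procedure used by any particular statistics package: the data values, the function names `bootstrap_sample` and `percentile_interval`, the choice of 1000 resamples, and the simple index rule for locating percentiles are all assumptions made for the example.

```python
import random
import statistics

# Hypothetical sample values, chosen only for illustration (not from the text).
original = [12, 15, 9, 21, 30, 11, 14, 18]

def bootstrap_sample(data, rng):
    """Return one bootstrap sample: len(data) values drawn WITH replacement."""
    return rng.choices(data, k=len(data))

def percentile_interval(data, stat=statistics.mean, n_boot=1000,
                        confidence=0.95, seed=0):
    """Percentile bootstrap interval for `stat`, plus the sorted bootstrap statistics."""
    rng = random.Random(seed)
    stats = sorted(stat(bootstrap_sample(data, rng)) for _ in range(n_boot))
    alpha = 1 - confidence
    # Simple index convention for the percentiles; other conventions exist.
    lo_idx = round(alpha / 2 * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return stats[lo_idx], stats[hi_idx], stats

lower, upper, boot_means = percentile_interval(original)

# Without replacement, a "resample" of size n is just the original values reordered.
rng = random.Random(1)
no_replacement = rng.sample(original, k=len(original))

print("sample mean:", statistics.mean(original))
print("mean of bootstrap means:", statistics.mean(boot_means))
print("95% percentile interval:", (lower, upper))
print("same values without replacement:", sorted(no_replacement) == sorted(original))
```

The mean of the bootstrap means should land close to the mean of the original sample, which illustrates the disadvantage noted above: the bootstrap distribution is centered on the sample statistic, so it cannot correct a poor point estimate.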
EXAMPLE 1 Bootstrap Sample of Incomes

When the author collected annual incomes of current statistics students, he obtained these results (in thousands of dollars): 0, 2, 3, 7.

Original Sample    Bootstrap Sample
0                  7
2                  2
3                  2
7                  3

The sample of {7, 2, 2, 3} is one bootstrap sample obtained from the original sample. Other bootstrap samples may be different. Incomes tend to have distributions that are skewed instead of being normal, so we should not use the methods of Section 7-2 with a small sample of incomes. This is a situation in which the bootstrap method comes to the rescue.

How Many People Do You Know
It's difficult for anyone to count the number of people he or she knows, but statistical methods can be used to estimate the mean number of people that we all know. The simple approach of just asking someone how many people are known has worked poorly in the past. A much better approach is to select a representative sample of people and ask each person how many people he or she knows who are named Mario, Ginny, Rachel, or Todd. (Uncommon names are more effective because people with more common names are more difficult to accurately recall.) Responses are then used to project the total number of people that are known. (If sample subjects know a mean of 1.76 people with those names, and we know that 0.288% of the population has those names, then the mean number of people known is 1.76/0.00288 = 611.) According to one estimate, the mean number of people known is 611, and the median is 472. (See "How Many People Do You Know? Efficiently Estimating Personal Network Size," by McCormick, Salganik, and Zheng, Journal of the American Statistical Association, Vol. 105, No. 4.)
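The projection in the parenthetical calculation above is a single division, which can be checked with a short Python sketch. The function name `estimate_people_known` and its parameter names are hypothetical, introduced only for this illustration; the two input values are the ones quoted in the text.

```python
def estimate_people_known(mean_known_with_names, population_share_with_names):
    """Scale the mean count of known people with the chosen names up to everyone known."""
    return mean_known_with_names / population_share_with_names

# 1.76 known people with those names; 0.288% of the population has those names.
estimate = estimate_people_known(1.76, 0.288 / 100)
print(round(estimate))  # 611, matching the figure in the text
```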