298 CHAPTER 6 Normal Probability Distributions

Outliers
We should always be aware of the presence of any outliers, particularly because they can have very dramatic effects on results. Test for the effects of outliers by applying statistical methods with these outliers included and then a second time with outliers excluded. Outliers should be investigated because they may be the most important characteristics of the data and they may reveal critical information about the data. Discard outliers only if they are identified as being errors.

Data Transformations
Many data sets have a distribution that is not normal, but we can transform the data so that the modified values have a normal distribution. One common transformation is to transform each value of x by taking its logarithm. (You can use natural logarithms or logarithms with base 10. If any original values are 0, take logarithms of values of x + 1.) If the distribution of the logarithms of the values is a normal distribution, the distribution of the original values is called a lognormal distribution. (See Exercises 21 “Transformations” and 22 “Lognormal Distribution.”) In addition to transformations with logarithms, there are other transformations, such as replacing each x value with √x, or 1/x, or x². In addition to getting a required normal distribution when the original data values are not normally distributed, such transformations can be used to correct deficiencies, such as a requirement (found in later chapters) that different data sets have the same variance.

EXAMPLE 2 Dallas Commute Times
Example 1 used only five of the Dallas commute times listed in Data Set 31 “Commute Times” in Appendix B. Shown in the accompanying display is the result obtained by using the Statdisk Normality Assessment feature with all 1000 Dallas commute times.

[Statdisk display: Normality Assessment of the 1000 Dallas commute times]

Let’s use the display with the three criteria for assessing normality.

1. Histogram: We can see that the histogram is skewed to the right and is far from being bell-shaped.
2. Normal quantile plot: The points in the normal quantile plot are very far from a straight-line pattern.

We conclude that the 1000 Dallas commute times do not appear to be from a population with a normal distribution.

YOUR TURN. Do Exercise 11 “Small World.”
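
The logarithm transformation described under Data Transformations can be tried on right-skewed data like that in Example 2. The following Python code is a minimal sketch, not a Statdisk procedure: the sample is simulated (it is not the Dallas commute times), the function names skewness and log_transform are hypothetical, and skewness is used here only as a rough numerical stand-in for judging a histogram by eye. It assumes all values are nonnegative.

```python
import math
import random
import statistics


def skewness(values):
    """Mean cubed z-score: near 0 for symmetric data, positive for right skew."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def log_transform(values):
    """Replace each x with ln(x), or ln(x + 1) if any original value is 0."""
    shift = 1 if any(v == 0 for v in values) else 0
    return [math.log(v + shift) for v in values]


# Hypothetical right-skewed sample (simulated, NOT Data Set 31).
rng = random.Random(0)
sample = [rng.lognormvariate(3.0, 0.6) for _ in range(1000)]

raw_skew = skewness(sample)
log_skew = skewness(log_transform(sample))
print(f"skewness before: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

If the skewness of the logarithms is close to 0 while the skewness of the original values is clearly positive, that suggests the original values may follow a lognormal distribution. A histogram and normal quantile plot of the transformed values should still be checked.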
