70 CHAPTER 2 Descriptive Statistics

Although the mean, the median, and the mode each describe a typical entry of a data set, each has advantages and disadvantages. The mean is a reliable measure because it takes into account every entry of a data set. However, the mean can be greatly affected when the data set contains outliers.

DEFINITION
An outlier is a data entry that is far removed from the other entries in the data set.

(You will learn a formal way to determine an outlier in Section 2.5.)

While some outliers are valid data, others may occur due to data-recording errors. A data set can have one or more outliers, causing gaps in a distribution. Conclusions drawn from a data set that contains outliers may be flawed.

EXAMPLE 6
Comparing the Mean, the Median, and the Mode
The table below shows the sample ages of students in a class. Find the mean, median, and mode of the ages. Are there any outliers? Which measure of central tendency best describes a typical entry of this data set?

Ages in a class
20 20 20 20 20 20 21 21 21 21
22 22 22 23 23 23 23 24 24 65

SOLUTION
From the histogram below, it appears that the data entry 65 is an outlier because it is far removed from the other ages in the class.

Mean: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{475}{20} \approx 23.8$ years

Median: There are 20 entries, so the median is the mean of the 10th and 11th entries: $\text{Median} = \dfrac{21 + 22}{2} = 21.5$ years

Mode: The entry occurring with the greatest frequency (six times) is 20 years.

Interpretation
The mean takes every entry into account but is influenced by the outlier of 65. The median also takes every entry into account, and it is not affected by the outlier. In this case, the mode exists, but it does not appear to represent a typical entry.

Sometimes a graphical comparison can help you decide which measure of central tendency best represents a data set. The histogram shows the distribution of the data and the locations of the mean, the median, and the mode. In this case, it appears that the median best describes the data set.
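The arithmetic in the Solution can be checked with a short script. This is a minimal sketch in Python, assuming Python 3.8 or later (for `statistics.multimode`); the helper name `summarize` is hypothetical and not from the text. It computes the three measures for the class ages, then repeats the computation with the entry 65 removed (the reworking that Try It Yourself 6 asks for), so the effect of the outlier on each measure can be compared side by side.

```python
from statistics import mean, median, multimode

# Sample ages of the students in the class (table in Example 6).
ages = [20, 20, 20, 20, 20, 20, 21, 21, 21, 21,
        22, 22, 22, 23, 23, 23, 23, 24, 24, 65]


def summarize(data):
    """Hypothetical helper: return the mean, median, and mode(s) of a data set."""
    return {
        "mean": mean(data),
        "median": median(data),     # averages the two middle entries when n is even
        "mode": multimode(data),    # list of every entry with the greatest frequency
    }


with_outlier = summarize(ages)
# Rework the example with the outlier 65 removed.
without_outlier = summarize([a for a in ages if a != 65])

for label, stats in [("with 65", with_outlier), ("without 65", without_outlier)]:
    print(f"{label}: mean = {stats['mean']:.1f}, "
          f"median = {stats['median']}, mode = {stats['mode']}")
```

Using `multimode` rather than `mode` returns every most-frequent entry as a list, so the helper still works for a data set that has more than one mode.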
[Figure: histogram "Ages of Students in a Class" (horizontal axis: Age; vertical axis: Frequency), marking the mean, the median, the mode, the outlier at 65, and the gap before it]

TRY IT YOURSELF 6
Remove the data entry 65 from the data set in Example 6. Then rework the example. How does the absence of this outlier change each of the measures?
Answer: Page A36

Picturing the World
The National Association of Realtors keeps track of existing home sales. One list uses the median price of existing homes sold, and another uses the mean price of existing homes sold. The double-bar graph shows the median and mean sales prices of existing homes over a three-month span. (Source: National Association of Realtors)

[Figure: double-bar graph "U.S. Existing Home Prices" showing the median price and the mean price (in thousands of dollars) for Nov. 2020, Dec. 2020, and Jan. 2021]

Notice in the graph that each month, the mean price is more than the median price. Identify a factor that would cause the mean price to be greater than the median price.