2-4 Scatterplots, Correlation, and Regression
• Develop an ability to construct a scatterplot of paired data.
• Analyze a scatterplot to determine whether there appears to be a correlation between two variables.

2-1 Frequency Distributions for Organizing and Summarizing Data

Key Concept When working with large data sets, a frequency distribution (or frequency table) is often helpful in organizing and summarizing data. A frequency distribution helps us to understand the nature of the distribution of a data set. Also, construction of a frequency distribution is often the first step in constructing a histogram, which is a graph used to help visualize the distribution of data.

DEFINITION
A frequency distribution (or frequency table) shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them.

Let’s use the commute times listed in Table 2-1 from the Chapter Problem. Table 2-2 is a frequency distribution summarizing those times. The frequency for a particular class is the number of original values that fall into that class. For example, the first class in Table 2-2 has a frequency of 6, so 6 of the commute times are between 0 and 14 minutes, inclusive. Examining the list of frequencies, we see that the commute times are distributed with most of the times being at the lower end.

The following standard terms are often used in constructing frequency distributions and graphs.

DEFINITIONS
Lower class limits are the smallest numbers that can belong to each of the different classes. (Table 2-2 has lower class limits of 0, 15, 30, 45, 60, 75, and 90.)

Upper class limits are the largest numbers that can belong to each of the different classes. (Table 2-2 has upper class limits of 14, 29, 44, 59, 74, 89, and 104.)
Class boundaries are the numbers used to separate the classes, but without the gaps created by class limits. Figure 2-1 on the next page shows the gaps created by the class limits from Table 2-2. In Figure 2-1 we see that the values of 14.5, 29.5, 44.5, 59.5, 74.5, and 89.5 are in the centers of those gaps. Following the pattern of those class boundaries, we see that the lowest class boundary is -0.5 and the highest class boundary is 104.5. The complete list of class boundaries is -0.5, 14.5, 29.5, 44.5, 59.5, 74.5, 89.5, and 104.5.

Class midpoints are the values in the middle of the classes. Table 2-2 has class midpoints of 7, 22, 37, 52, 67, 82, and 97. Each class midpoint can be found by adding the lower class limit to the upper class limit and dividing the sum by 2.

Class width is the difference between two consecutive lower class limits (or two consecutive lower class boundaries) in a frequency distribution. Table 2-2 uses a class width of 15. (The first two lower class limits are 0 and 15, and their difference is 15.)

TABLE 2-2 Daily Commute Time in Los Angeles

Daily Commute Time in Los Angeles (minutes)    Frequency
0–14                                            6
15–29                                          18
30–44                                          14
45–59                                           5
60–74                                           5
75–89                                           1
90–104                                          1
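The definitions above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the text's own method: the class limits and frequencies are copied from Table 2-2, while the variable names and the `tally` helper are hypothetical names introduced here for illustration. Because the individual commute times from Table 2-1 are not reproduced in this section, the sketch does not attempt to rebuild the frequencies from raw data.

```python
# Class limits and frequencies copied from Table 2-2.
lower_limits = [0, 15, 30, 45, 60, 75, 90]
upper_limits = [14, 29, 44, 59, 74, 89, 104]
frequencies = [6, 18, 14, 5, 5, 1, 1]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoints: (lower class limit + upper class limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: the centers of the gaps between one class's upper limit
# and the next class's lower limit, extended by the same half-gap at both ends.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]


def tally(values, lower_limits, upper_limits):
    """Hypothetical helper: count how many values fall in each class (limits inclusive)."""
    counts = [0] * len(lower_limits)
    for value in values:
        for i, (lo, hi) in enumerate(zip(lower_limits, upper_limits)):
            if lo <= value <= hi:
                counts[i] += 1
                break
    return counts


print("Class width:", class_width)
print("Class midpoints:", midpoints)
print("Class boundaries:", boundaries)
print("Total number of values:", sum(frequencies))
```

Given a list of raw commute times, `tally(times, lower_limits, upper_limits)` would return one count per class in the same order as the rows of Table 2-2.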