3-2 Measures of Variation 111

Notation

Here is a summary of notation for the standard deviation and variance:

$s$ = sample standard deviation
$s^2$ = sample variance
$\sigma$ = population standard deviation
$\sigma^2$ = population variance

Note: Articles in professional journals and reports often use SD for standard deviation and VAR for variance.

Important Properties of Variance

■ The units of the variance are the squares of the units of the original data values. (If the original data values are in feet, the variance will have units of ft²; if the original data values are in seconds, the variance will have units of sec².)
■ The value of the variance can increase dramatically with the inclusion of outliers. (The variance is not a resistant measure of variation.)
■ The value of the variance is never negative. It is zero only when all of the data values are the same number.
■ The sample variance $s^2$ is an unbiased estimator of the population variance $\sigma^2$, as described in Part 2 of this section.

The variance is a statistic used in some statistical methods, but for our present purposes, the variance has the serious disadvantage of using units that are different from the units of the original data set. This makes it difficult to understand variance as it relates to the original data set. Because of this property, it is better to first focus on the standard deviation when trying to develop an understanding of variation.

PART 2 Beyond the Basics of Variation

In Part 2, we focus on making sense of the standard deviation so that it is not some mysterious number devoid of any practical significance. We begin by addressing common questions that relate to the standard deviation.

Why Is Standard Deviation Defined as in Formula 3-4?

In measuring variation in a set of sample data, it makes sense to begin with the individual amounts by which values deviate from the mean. For a particular data value $x$, the amount of deviation is $x - \bar{x}$.
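As a small numerical check of the variance properties listed above and of the deviations $x - \bar{x}$ just defined, the following Python sketch may help. The data values are hypothetical and chosen only for illustration; the sketch uses Python's standard `statistics` module, whose `variance` and `stdev` functions compute the sample versions (dividing by n - 1).

```python
import statistics

# Hypothetical sample of heights in feet (illustrative values, not from the text)
heights_ft = [5.0, 5.5, 6.0, 6.5, 7.0]

mean_ft = statistics.mean(heights_ft)
deviations = [x - mean_ft for x in heights_ft]  # each deviation x - x_bar, in feet

s2 = statistics.variance(heights_ft)  # sample variance, units are ft^2
s = statistics.stdev(heights_ft)      # sample standard deviation, units are ft

# Adding a single outlier inflates the variance sharply (not a resistant measure)
with_outlier = heights_ft + [30.0]
s2_outlier = statistics.variance(with_outlier)

# When every value is the same number, the variance is zero
s2_constant = statistics.variance([4.0, 4.0, 4.0, 4.0])

print("deviations (ft):", deviations)
print("variance (ft^2):", s2, " standard deviation (ft):", s)
print("variance with outlier (ft^2):", s2_outlier)
print("variance of identical values:", s2_constant)
```

Note that the standard deviation comes out in feet, the same units as the data, while the variance is in square feet, which is why the standard deviation is usually easier to interpret.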
It makes sense to somehow combine those deviations into one number that can serve as a measure of the variation. Simply adding the deviations does not work, because the sum will always be zero. To get a statistic that measures variation, it's necessary to avoid the canceling out of negative and positive numbers. One approach is to add absolute values, as in $\sum |x - \bar{x}|$. If we find the mean of those absolute deviations, we get the mean absolute deviation (or MAD), which is the mean distance of the data from the mean:

$$\text{Mean absolute deviation} = \frac{\sum |x - \bar{x}|}{n}$$

Why Not Use the Mean Absolute Deviation Instead of the Standard Deviation?

Computation of the mean absolute deviation uses absolute values, so it uses an operation that is not "algebraic." (The algebraic operations include addition, multiplication, extracting roots, and raising to powers that are integers or fractions.) The use of absolute values would be simple, but it would create algebraic difficulties in inferential

Geographical Center of North America

Geography professor Peter Rogerson calculated the geographical center of North America to be in the aptly named town of Center, North Dakota. The method used by Dr. Rogerson involved finding the point at which the sum of the squares of the distances to all other points in the region is the smallest possible sum. The calculation of the standard deviation is based on a sum of squares, as are some other measures in statistics.
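To make the comparison among the raw sum of deviations, the mean absolute deviation, and the standard deviation concrete, here is a minimal Python sketch. The function names and the data values are illustrative assumptions, and the standard deviation is computed with the usual sample formula that divides the sum of squared deviations by n - 1, which is assumed here to be what Formula 3-4 refers to.

```python
import math

def mean_absolute_deviation(values):
    """Mean distance of the values from their mean: sum(|x - x_bar|) / n."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n

def sample_standard_deviation(values):
    """Square root of sum((x - x_bar)^2) / (n - 1), based on a sum of squares."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

# Hypothetical data values, chosen only for illustration
data = [1, 3, 14]
x_bar = sum(data) / len(data)

# The signed deviations cancel, so their sum carries no information about spread
deviations = [x - x_bar for x in data]
print("sum of deviations:", sum(deviations))

# Both of these avoid the cancellation, one with absolute values, one with squares
mad = mean_absolute_deviation(data)
s = sample_standard_deviation(data)
print("mean absolute deviation:", mad)
print("sample standard deviation:", s)
```

Both measures avoid the canceling of positive and negative deviations; the standard deviation does so by squaring, which relies only on algebraic operations.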