Descriptive statistics is the branch of statistics that provides techniques for presenting a quantitative summary of a given set of data. Inferential statistics, by contrast, provides ways to make inferences about the population from which the data were sampled.

When summarizing data on a single variable, at least two major characteristics should be considered: central tendency and dispersion. Each describes a different aspect of the distribution of the data. The central tendency denotes where the center of the data distribution is located. The mean, the median, and the mode are commonly used to describe it. The mean is the average of all data values, that is, the sum of the data values divided by the number of observations. The median is the midpoint of the sorted data values; half of the values fall above the median and half fall below it. The mode is the data value that appears most frequently in the given set of data.
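As a minimal sketch of the three measures just described, the following uses Python's standard `statistics` module. The `scores` list is a small hypothetical data set chosen for easy arithmetic; it is not taken from this entry.

```python
from statistics import mean, median, mode

# Hypothetical scores (an assumption for illustration, not data from this entry).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean: sum of the values divided by the number of observations (40 / 8).
center_mean = mean(scores)

# Median: midpoint of the sorted values; with an even count it is the
# average of the two middle values, here (4 + 5) / 2.
center_median = median(scores)

# Mode: the most frequently occurring value (4 appears three times).
center_mode = mode(scores)

print(center_mean, center_median, center_mode)
```

With a distribution this small and mildly asymmetric, the three measures already differ slightly, which is why it can be useful to report more than one of them.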

The dispersion refers to the degree to which the data vary around the center of the distribution, or to what extent the data are spread out. One of the measures of dispersion is the standard deviation, which indicates the extent to which individual data points depart from the mean on average. Technically, it is defined by the following formula:

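\[
SD = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}}
\]

where $X_i$ is the $i$th observation, $\bar{X}$ is the mean, and $N$ is the total number of observations. Some texts divide by $N - 1$ instead of $N$ when the data are treated as a sample from a larger population.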

Another term for dispersion is variation. One measure of variation is the variance, which is defined as the square of the standard deviation. To illustrate how these indices are used, a set of data will be analyzed.
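The standard deviation and variance described above can be computed along the following lines. This is a sketch under two assumptions: the formula divides by the number of observations N, and the `scores` list is hypothetical rather than data from this entry. The N - 1 (sample) variant is included only for comparison.

```python
import math
from statistics import pstdev, pvariance, stdev

# Hypothetical scores (an assumption for illustration, not data from this entry).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
m = sum(scores) / n  # mean

# Standard deviation computed by hand, dividing the summed squared
# deviations from the mean by N (assumed form of the formula).
sd_manual = math.sqrt(sum((x - m) ** 2 for x in scores) / n)

# The same quantities from the standard library (divide-by-N versions).
sd_population = pstdev(scores)
variance_population = pvariance(scores)

# Sample version, which divides by N - 1 instead of N.
sd_sample = stdev(scores)

print(sd_manual, sd_population, variance_population, sd_sample)
```

The variance is the square of the standard deviation under either convention, so whichever divisor is used should be applied consistently to both.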

Example 1

Suppose that 20 children took a math test and we observed the following scores:

[Table of the 20 test scores]

The mean for the above data is 10.70. If the above data are sorted, scores for the 10th and 11th observations are both 11. The median is their average (because the total number of observations is an even number), and thus calculated as 11. The mode for the above data is 11, because the score 11 is observed most frequently (five times). When the data distribution has a single mode and is almost symmetric about its center, as in this example, these three measures give almost the same values. When the distribution is skewed or when there are a few extreme values (outliers) in the data, the three measures of central tendency can disagree.

Although the mean is most commonly used because of its computational and interpretational convenience, the median or the mode is preferred for skewed data or when outliers are present, because the mean is sensitive to these factors and could be misleading as a representative value of the distribution. The standard deviation is calculated as 3.05 for the above data and the variance is thus 9.30. It should be noted that the standard deviation, like the mean, may not be representative of data if the distribution is highly skewed (i.e., lacks symmetry), because the average dispersion can differ for the data points above the mean and for those below the mean (and the mean itself is also affected by the skewness).
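The sensitivity of the mean to outliers can be seen in a small sketch. Both lists below are hypothetical and are not the test scores from Example 1; the extreme value of 50 is an arbitrary choice for illustration.

```python
from statistics import mean, median

# Hypothetical scores (an assumption for illustration, not data from this entry).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# The same scores with one extreme value (outlier) appended.
with_outlier = scores + [50]

# Change in each measure caused by the single outlier.
mean_shift = mean(with_outlier) - mean(scores)
median_shift = median(with_outlier) - median(scores)

print("mean:  ", mean(scores), "->", mean(with_outlier))
print("median:", median(scores), "->", median(with_outlier))
```

In this sketch a single extreme value doubles the mean, while the median moves by only half a point, which is the reason the median is often preferred when outliers are present.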

...
