Percentiles

Sarah Boslaugh

doi:10.4135/9781412953948

Edited by:
Sarah Boslaugh
In: Encyclopedia of Epidemiology
Chapter DOI: https://doi.org/10.4135/9781412953948.n345
Subject: Epidemiology & Biostatistics, Public Health (general), Public Health Research Methods

The percentile is a concept often used to summarize data and to place the score or measurement taken on an individual into the context of a larger population. For any particular number p between 0 and 100, the pth percentile of a set of n measurements arranged in order of magnitude is the value that has at most p% of the observations below it and at most (100 − p)% above it. Roughly speaking, the first percentile is the number that divides the bottom 1% of the data from the top 99%; the second percentile is the number that divides the bottom 2% of the data from the top 98%; and so on. Therefore, if a man has a body mass index score at the 98th percentile for his age, roughly 98% of men his age have a body mass index score lower than his, and only 2% have a higher score.
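To make the "roughly p% below" reading concrete, the following minimal Python sketch counts the share of observations that fall strictly below a given score. The function name `percent_below` and the ten body mass index scores are hypothetical and serve only as an illustration; they are not data from this entry.

```python
def percent_below(values, x):
    """Return the percentage of observations in `values` strictly below `x`."""
    if not values:
        raise ValueError("values must not be empty")
    below = sum(1 for v in values if v < x)
    return 100.0 * below / len(values)


# Hypothetical body mass index scores for ten men of the same age
# (illustrative values only, not taken from the entry).
bmi_scores = [19.8, 21.4, 22.0, 23.5, 24.1, 25.3, 26.7, 27.9, 29.2, 33.6]

# Nine of the ten scores are lower than 33.6.
print(percent_below(bmi_scores, 33.6))  # 90.0
```

With only ten observations the result is coarse; in practice such percentages are read off a large reference population.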

Percentiles may be viewed as dividing a data set into 100 equal parts. Smaller groupings are often used; for instance, the median of a data set is also the 50th percentile, which specifies that at least half the observations are equal to or smaller than it. Other commonly used percentile groupings include deciles, which divide a data set into tenths (10 equal parts); quintiles, which divide a data set into fifths (5 equal parts); and quartiles, which divide a data set into quarters (4 equal parts). Of these, quartiles are the most commonly used.
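The relationship between these groupings and percentiles can be sketched in a few lines of Python. The helper name `percentile_cut_points` is an assumption made for illustration; it lists which percentiles separate a data set into a given number of equal parts.

```python
def percentile_cut_points(parts):
    """Return the percentiles that split a data set into `parts` equal groups."""
    if parts < 2:
        raise ValueError("parts must be at least 2")
    # A split into `parts` groups needs parts - 1 cut points.
    return [100 * i / parts for i in range(1, parts)]


print(percentile_cut_points(2))  # median:    [50.0]
print(percentile_cut_points(4))  # quartiles: [25.0, 50.0, 75.0]
print(percentile_cut_points(5))  # quintiles: [20.0, 40.0, 60.0, 80.0]
```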

Percentiles are often used to describe large data sets; for instance, in the body mass index example above, the percentiles may have been calculated using a sample of thousands of American men. However, researchers sometimes want to calculate percentiles, quartiles, and so on, for a smaller data set, in which case the following procedure may be used to establish cut points.

1. Arrange the observations in increasing order, from smallest to largest.
2. Calculate the product of the sample size n and the proportion p you wish to include in each division (for quartiles, p = 0.25; for deciles, p = 0.10; etc.).
3. If np is an integer, say k, calculate the average of the kth and (k + 1)th ordered values; if np is not an integer, round it up to the next integer and find the corresponding ordered value.
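The three steps above can be sketched as a short Python function. This is one possible reading of the procedure, not a reference implementation: the name `cut_point`, the restriction to 0 < p < 1, and the floating-point tolerance used to decide whether np is an integer are all assumptions added here, and the eight sample values are hypothetical.

```python
import math


def cut_point(values, p):
    """Cut point for proportion p (0 < p < 1) following the three steps above."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not values:
        raise ValueError("values must not be empty")

    # Step 1: arrange the observations in increasing order.
    ordered = sorted(values)
    n = len(ordered)

    # Step 2: product of the sample size and the proportion.
    position = n * p

    # Step 3a: if np is an integer k, average the kth and (k + 1)th values.
    # The tolerance guards against floating-point error (an assumption here).
    k = round(position)
    if 1 <= k < n and math.isclose(position, k, abs_tol=1e-9):
        return (ordered[k - 1] + ordered[k]) / 2  # lists are 0-indexed

    # Step 3b: otherwise round np up and take that ordered value.
    k = math.ceil(position)
    return ordered[k - 1]


# Hypothetical sample of eight observations (illustrative only).
sample = [7, 3, 9, 5, 1, 11, 13, 15]
print(cut_point(sample, 0.25))  # np = 2, average of 2nd and 3rd values: 4.0
print(cut_point(sample, 0.10))  # np = 0.8, rounded up to 1: 1
```

Other software may use different interpolation rules, so results from statistical packages can differ slightly from this procedure on small data sets.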

For example, a study of serum total cholesterol (mg/L) levels recorded the ordered levels shown in Table 1 for 20 adult patients (the data were adapted by the author from data presented in Ott and Longnecker, 2001, p. 83).

To determine the first quartile, we take p = 0.25 and calculate np = (20)(0.25) = 5. Because np is an integer, the first quartile is the average of the fifth and sixth observations, (152 + 167)/2 = 159.5.

Table 1 Ordered Serum Total Cholesterol Levels (mg/L) for 20 Adult Patients
Ordered Observation	Cholesterol (mg/L)
1	133
2	137
3	148
4	149
5	152
6	167
7	174
8	179
9	189
10	192
11	201
12	209
13	210
14	211
15	218
16	238
17	245
18	248
19	253
20	257
Source: Adapted from data presented in Ott and Longnecker (2001, p. 83).
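The first-quartile calculation for the Table 1 data can be reproduced with a brief Python sketch. The variable names are assumptions chosen for readability; the values are the 20 ordered observations from the table.

```python
# Ordered serum total cholesterol levels from Table 1.
cholesterol = [133, 137, 148, 149, 152, 167, 174, 179, 189, 192,
               201, 209, 210, 211, 218, 238, 245, 248, 253, 257]

n = len(cholesterol)  # 20 observations
p = 0.25              # proportion for the first quartile
k = int(n * p)        # np = 5, an integer, so average the 5th and 6th values

# Python lists are 0-indexed, so the 5th and 6th values sit at k - 1 and k.
first_quartile_cut = (cholesterol[k - 1] + cholesterol[k]) / 2
in_first_quartile = [x for x in cholesterol if x <= first_quartile_cut]

print(first_quartile_cut)  # 159.5
print(in_first_quartile)   # [133, 137, 148, 149, 152]
```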

Therefore, data points falling at or below this cut point are in the first quartile of the data set. To calculate the cut point for the median, we take p = 0.5, and np = (20)(0.5) = 10, so the median is the average of the 10th and 11th

...
