If a set of numerical data has $n$ elements and is arranged in increasing order, $x_1 \le x_2 \le \dots \le x_n$, the lower quartile ($Q_1$) may be taken to be the median of the lower half of the data, i.e. of $x_1, x_2, \dots, x_{(n-1)/2}$ if $n$ is odd, and the median of $x_1, x_2, \dots, x_{n/2}$ if $n$ is even. The upper quartile ($Q_3$) may be taken to be the median of the upper half of the data, i.e. of $x_{(n+3)/2}, x_{(n+5)/2}, \dots, x_n$ if $n$ is odd, and the median of $x_{(n+2)/2}, x_{(n+4)/2}, \dots, x_n$ if $n$ is even. When $n$ is odd, the median $x_{(n+1)/2}$ therefore belongs to neither half. The difference $Q_3 - Q_1$ is the interquartile range, a term introduced by Galton in 1882. An alternative term is the midspread.
As an example, consider the ordered data:
101, 103, 104, 105, 106, 107, 108, 109, 111, 111, 111, 115, 118, 121, 124, 127, 130, 156, 199.
There are nineteen observations. The tenth value in order is 111, the median. Within the lower nine values, the fifth is 106 ($= Q_1$). Within the upper nine values, the fifth is 124 ($= Q_3$). The interquartile range is $124 - 106 = 18$.
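The median-of-halves rule can be sketched in Python as follows. This is a minimal illustration rather than a standard library routine: the function name `quartiles` is hypothetical, and the sketch assumes at least two observations, since the median of an empty half is undefined.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Hypothetical helper: when the number of observations is odd,
    the middle value is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2            # size of each half
    lower = xs[:half]        # the `half` smallest values
    upper = xs[n - half:]    # the `half` largest values
    return median(lower), median(upper)

data = [101, 103, 104, 105, 106, 107, 108, 109, 111, 111,
        111, 115, 118, 121, 124, 127, 130, 156, 199]
q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)  # 106 124 18
```

Other conventions for sample quartiles exist and may give slightly different values, so results from statistical software need not agree with this sketch.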
When there are many observations it may be easier to read approximate values for the lower and upper quartiles from a cumulative frequency graph. These will be the values of the variable corresponding to cumulative relative frequencies of 25% and 75%, respectively.
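As a rough numerical counterpart to reading values off such a graph, the sketch below finds the smallest data value at which the cumulative relative frequency reaches a given level. The function name `value_at_cumulative_fraction` is hypothetical, and the sketch treats the cumulative frequency graph as a step function; a smoothed or interpolated graph could give somewhat different readings.

```python
from collections import Counter

def value_at_cumulative_fraction(data, p):
    """Smallest value whose cumulative relative frequency is at least p.

    Hypothetical helper that treats the cumulative frequency graph
    as a step function over the distinct data values.
    """
    counts = Counter(data)
    n = len(data)
    running = 0
    for value in sorted(counts):
        running += counts[value]      # cumulative frequency so far
        if running / n >= p:          # cumulative relative frequency
            return value
    raise ValueError("p must lie between 0 and 1 and data must be non-empty")

data = [101, 103, 104, 105, 106, 107, 108, 109, 111, 111,
        111, 115, 118, 121, 124, 127, 130, 156, 199]
print(value_at_cumulative_fraction(data, 0.25))  # 106
print(value_at_cumulative_fraction(data, 0.75))  # 124
```

For these nineteen observations the step-function readings happen to coincide with the quartiles found from the medians of the two halves; in general the two approaches give only approximately equal values.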
For a continuous random variable $X$, the lower quartile of the distribution is such that $P(X < Q_1) = \frac{1}{4}$ and the upper quartile is such that $P(X < Q_3) = \frac{3}{4}$.
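To illustrate the definition for a particular distribution, the sketch below computes the quartiles of a standard normal variable with Python's `statistics.NormalDist`. The choice of the standard normal distribution is an assumption made purely for illustration.

```python
from statistics import NormalDist

# Assumed example distribution: standard normal (mean 0, standard deviation 1).
X = NormalDist(mu=0.0, sigma=1.0)

q1 = X.inv_cdf(0.25)   # value with P(X < q1) = 1/4
q3 = X.inv_cdf(0.75)   # value with P(X < q3) = 3/4

print(round(q1, 4), round(q3, 4))  # -0.6745 0.6745
print(round(q3 - q1, 3))           # 1.349
```

Because the distribution is continuous, using $P(X < Q_1)$ or $P(X \le Q_1)$ makes no difference to the result.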
In his 1970 book on exploratory data analysis, Tukey referred (in the context of data) to the quartiles as hinges and called the interquartile range the H-spread. Tukey defined a step as 1.5 × H-spread, and proposed that values one step beyond a hinge should be called inner fences and values two steps beyond a hinge should be called outer fences. Any data item beyond an outer fence would be called far out.
In the previous data the hinges are 106 and 124, thus the H-spread is 124−106=18 and the step is 1.5 × 18=27. The inner fences are at 106−27=79 and 124+27=151. The outer fences are at 79−27=52 and 151+27=178. The observation 199 is greater than 178 and is therefore far out.
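The fence arithmetic above can be reproduced with a short Python sketch. The function name `tukey_fences` is hypothetical, and the hinges are passed in directly (106 and 124, as in the worked example) rather than recomputed from the data.

```python
def tukey_fences(lower_hinge, upper_hinge):
    """Return ((inner_low, inner_high), (outer_low, outer_high)).

    Hypothetical helper: a step is 1.5 times the H-spread; inner fences
    lie one step beyond the hinges and outer fences two steps beyond.
    """
    step = 1.5 * (upper_hinge - lower_hinge)
    inner = (lower_hinge - step, upper_hinge + step)
    outer = (lower_hinge - 2 * step, upper_hinge + 2 * step)
    return inner, outer

data = [101, 103, 104, 105, 106, 107, 108, 109, 111, 111,
        111, 115, 118, 121, 124, 127, 130, 156, 199]

inner, outer = tukey_fences(106, 124)
far_out = [x for x in data if x < outer[0] or x > outer[1]]

print(inner)    # (79.0, 151.0)
print(outer)    # (52.0, 178.0)
print(far_out)  # [199]
```

The observation 156 lies beyond the upper inner fence (151) but inside the upper outer fence (178), so it is not far out under this rule.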
See also boxplot; outlier; quantile; skewness; trimean.