An observation that is very different from the other observations in a set of data. Since the most common cause is recording error, it is sensible to search for outliers (by means of summary statistics and plots of the data) before conducting any detailed statistical modelling.

Outlier. A small data set showing an obvious outlier. Whenever feasible, data should be plotted, since including an outlier will usually make nonsense of any calculations.
Various indicators are used to identify outliers. One is that an observation has a value that is more than 2.5 standard deviations from the mean. Another is that an observation has a value that lies more than $1.5I$ beyond the upper or the lower quartile, where $I$ is the interquartile range (see boxplot).
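The two indicators can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are hypothetical, the sample standard deviation (divisor $n-1$) is assumed, and the quartiles follow the default convention of `statistics.quantiles`, which is one of several in common use.

```python
import statistics


def flag_by_sd(data, k=2.5):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    return [x for x in data if abs(x - mean) > k * sd]


def flag_by_iqr(data, k=1.5):
    """Return the values lying more than k * I beyond the lower or upper quartile."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3
    # (default 'exclusive' method; other quartile conventions give slightly different fences).
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]


# Hypothetical data: eleven readings near 10 and one suspicious value.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 10.0, 25.0]
print(flag_by_sd(data))
print(flag_by_iqr(data))
```

Because a gross outlier inflates both the mean and the standard deviation, the first rule can fail to flag anything in very small samples; the quartile-based rule is less affected.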
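If there is only a single outlier present, then an effective test is the Dixon test. Denoting the $k$th smallest observation by $y_{(k)}$, so that $y_{(1)}$ is the smallest and $y_{(n)}$ the largest, the test statistic is either

$$\frac{y_{(n)}-y_{(n-1)}}{y_{(n)}-y_{(1)}} \quad\text{or}\quad \frac{y_{(2)}-y_{(1)}}{y_{(n)}-y_{(1)}},$$

depending on whether $y_{(n)}$ appears unusually large, or $y_{(1)}$ appears unusually small. Special tables are required in order to determine significance.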
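As a rough illustration, the following Python sketch computes a Dixon-type ratio for each end of the sample. It assumes the commonly used gap-to-range form (the gap between the suspect value and its nearest neighbour, divided by the range); the function name is hypothetical, and no critical values are included, since those come from special tables.

```python
def dixon_ratios(data):
    """Return (upper, lower) gap-to-range ratios for the largest and smallest values."""
    y = sorted(data)  # y[0] is the smallest value, y[-1] the largest
    if len(y) < 3:
        raise ValueError("need at least three observations")
    spread = y[-1] - y[0]
    if spread == 0:
        raise ValueError("all observations are equal")
    upper = (y[-1] - y[-2]) / spread  # used when the largest value looks too large
    lower = (y[1] - y[0]) / spread    # used when the smallest value looks too small
    return upper, lower


# Hypothetical sample in which the largest value is suspect.
upper, lower = dixon_ratios([1, 2, 3, 4, 10])
print(upper, lower)
```

Only the ratio for the suspect end is compared with the tabulated critical value for the given sample size.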
For data from a normal distribution, the test statistic of the Grubbs test, suggested by Grubbs in 1969, is $G$, given by

$$G = \frac{1}{s}\max\{y_{(n)}-\bar{y},\ \bar{y}-y_{(1)}\},$$

where $\bar{y}$ and $s$ are the sample mean and standard deviation.
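The Grubbs statistic is straightforward to compute directly from its definition. The sketch below is a minimal illustration: the function name is hypothetical, $s$ is taken to be the sample standard deviation with divisor $n-1$, and the comparison with a critical value is deliberately left out.

```python
import statistics


def grubbs_statistic(data):
    """Return G = max(largest - mean, mean - smallest) / s."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # assumes divisor n - 1
    if s == 0:
        raise ValueError("all observations are equal")
    return max(max(data) - mean, mean - min(data)) / s


# Hypothetical sample: mean 4, largest deviation 6.
print(grubbs_statistic([1, 2, 3, 4, 10]))
```

A large value of $G$ indicates that the most extreme observation lies unusually far from the mean relative to the spread of the sample.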
The Rosner test for multiple outliers relies on ordering the $n$ observations in terms of their distance from the overall mean, $\bar{y}$. Let $y_m$ be the observation that is the $m$th closest to $\bar{y}$ and let the mean and standard deviation of the $m-1$ observations closest to the overall mean be $\bar{y}_{m-1}$ and $s_{m-1}$. The decision as to whether $y_m$ is an outlier is based on the value of

$$\frac{|y_m-\bar{y}_{m-1}|}{s_{m-1}}.$$
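The ordering step described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a full implementation of the test: the function name is hypothetical, the ratio computed is the standardized distance of the candidate from the mean of the closer observations (assumed here to be the quantity the decision rests on), and no critical values are supplied.

```python
import statistics


def rosner_style_ratio(data, m):
    """Return (y_m, ratio) for the observation that is m-th closest to the overall mean."""
    n = len(data)
    if not 3 <= m <= n:
        raise ValueError("m must satisfy 3 <= m <= n")
    overall_mean = statistics.mean(data)
    # Order the observations by their distance from the overall mean.
    ordered = sorted(data, key=lambda v: abs(v - overall_mean))
    inner = ordered[: m - 1]  # the m - 1 observations closest to the overall mean
    y_m = ordered[m - 1]      # the candidate outlier
    inner_mean = statistics.mean(inner)
    inner_sd = statistics.stdev(inner)  # assumes divisor (m - 1) - 1
    if inner_sd == 0:
        raise ValueError("the closest observations are all equal")
    # Assumed decision quantity: distance of y_m from the inner mean, in inner SD units.
    return y_m, abs(y_m - inner_mean) / inner_sd


# Hypothetical sample: the value furthest from the overall mean is 10.
print(rosner_style_ratio([1, 2, 3, 4, 10], m=5))
```

Because the mean and standard deviation are computed without the candidate (and without anything further out), one extreme value cannot mask another by inflating the spread.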