
Prediction interval

 
Statistics Dictionary: prediction interval

A statement about the likely value of a future observation. It has a similar interpretation to a confidence interval but the interval is wider because it allows for the random error associated with the future observation.

For example, if a random sample of n observations is taken from a population with known variance σ² and unknown mean μ, then the natural estimate of μ is x̄, the sample mean. A confidence interval for μ is based on the uncertainty in the estimate, which has variance σ²/n. However, associated with a future observation from this distribution is the future random sampling variation with variance σ². A prediction interval for a future observation is based on the sum of the two variances, σ² + σ²/n.
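
As a worked illustration with assumed numbers (n = 25, σ = 2, and the standard normal 97.5% quantile rounded to 1.96; none of these values come from the dictionary entry), the 95% half-widths of the two intervals compare as follows:

\begin{align}
\text{confidence interval for } \mu: &\quad \overline{x} \pm 1.96\,\sqrt{\sigma^2/n} = \overline{x} \pm 1.96 \times 0.4 = \overline{x} \pm 0.784 \\[6pt]
\text{prediction interval}: &\quad \overline{x} \pm 1.96\,\sqrt{\sigma^2+\sigma^2/n} = \overline{x} \pm 1.96\,\sqrt{4.16} \approx \overline{x} \pm 3.998
\end{align}

The prediction interval is roughly five times wider here. Unlike the confidence interval, its half-width does not shrink to zero as n grows; it approaches 1.96σ.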



Wikipedia: Prediction interval

In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which future observations will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis.

Prediction intervals are used in both frequentist statistics and Bayesian statistics. A prediction interval bears the same relationship to a future observation that a frequentist confidence interval or Bayesian credible interval bears to an unobservable population parameter. Prediction intervals predict the distribution of individual future points, whereas confidence intervals and credible intervals of parameters predict the distribution of estimates of the true population mean or other quantity of interest that cannot be observed.


Introduction

For example, if one makes the parametric assumption that the underlying distribution is a normal distribution, and has a sample set {X1, ..., Xn}, then confidence intervals and credible intervals may be used to estimate the population mean μ and population standard deviation σ of the underlying population, while prediction intervals may be used to estimate the value of the next sample variable, Xn+1.

Alternatively, in Bayesian terms, a prediction interval can be described as a credible interval for the variable itself, rather than for a parameter of the distribution thereof.

The concept of prediction intervals need not be restricted to inference about just a single future sample value but can be extended to more complicated cases. For example, in the context of river flooding where analyses are often based on annual values of the largest flow within the year, there may be interest in making inferences about the largest flood likely to be experienced within the next 50 years.

Since prediction intervals are only concerned with past and future observations, rather than unobservable population parameters, they are advocated as a better method than confidence intervals by some statisticians, such as Seymour Geisser[citation needed], following the focus on observables by Bruno de Finetti[citation needed].

Examples

Non-parametric

One can compute prediction intervals without any assumptions on the population; formally, this is a non-parametric method.[1]

Suppose one randomly draws a sample of two observations X1 and X2 from a population.

What is the probability that X2 > X1?

Assuming that samples are unequal, exactly 50%, regardless of the underlying population – the probability of picking 3 and then 7 is the same as picking 7 and then 3, regardless of the particular probability of picking 3 or 7. Thus, if one picks a single sample X1, then 50% of the time the next sample will be greater, which yields (X1, +∞) as a 50% prediction interval for X2. Similarly, 50% of the time it will be smaller, which yields another 50% prediction interval for X2, namely (−∞, X1).

Similarly, if one has a sample {X1, ..., Xn} then the probability that the next observation Xn+1 will be the largest is 1/(n + 1), since all observations have equal probability of being the maximum, and similarly the probability that Xn+1 will be the smallest is 1/(n + 1). Thus the other (n − 1)/(n + 1) of the time, Xn+1 falls between the sample maximum and sample minimum of {X1, ..., Xn}. Denoting the sample maximum and minimum by M and m, this yields an (n − 1)/(n + 1) prediction interval of [m, M].

For example, if n = 19, then [m, M] gives an 18/20 = 90% prediction interval – 90% of the time, the 20th observation falls between the smallest and largest observation seen heretofore. Likewise, n = 39 gives a 95% prediction interval, and n = 199 gives a 99% prediction interval.
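
The (n − 1)/(n + 1) rule can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the choice of an exponential population (picked only because it is skewed), the trial count and the seed are all assumptions made for illustration.

```python
import random
from fractions import Fraction


def range_coverage(n):
    """Exact coverage of the [min, max] interval built from n observations."""
    return Fraction(n - 1, n + 1)


def simulate_range_coverage(n, trials=20000, seed=0):
    """Estimate how often observation n+1 lands inside the range of the first n.

    The exponential population is an arbitrary illustrative choice; the
    argument in the text does not depend on the population's shape.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        nxt = rng.expovariate(1.0)
        if min(sample) <= nxt <= max(sample):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (19, 39, 199):
        print(n, range_coverage(n))
    print("simulated, n = 19:", simulate_range_coverage(19))
```

The simulated fraction should land close to the exact value for n = 19, up to Monte Carlo noise.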

One can visualize this by drawing the n samples on a line, which divides the line into n + 1 sections (n − 1 segments between samples, and 2 intervals going to infinity at both ends), and noting that Xn+1 has an equal chance of landing in any of these n + 1 sections.

Thus one can also pick any k of these sections and give a k/(n + 1) prediction interval (or set, if the sections are not consecutive). For instance, if n = 2, then the probability that X3 will land between the existing 2 observations is 1/3.

Notice that while this gives the probability that a future observation will fall in a range, it does not give any estimate as to where in a segment it will fall – notably, if it falls outside the range of observed values, it may be far outside the range. See extreme value theory for further discussion.

Formally, this applies not just to sampling from a population, but to any exchangeable sequence of random variables, not necessarily independent or identically distributed.

Normal distribution

Given a sample from a normal distribution, whose parameters are unknown, it is possible to give prediction intervals in the frequentist sense, i.e., an interval [a, b] based on statistics of the sample such that on repeated experiments, Xn+1 falls in the interval the desired percentage of the time; one may call these "predictive confidence intervals".[2]

A general technique of frequentist prediction intervals is to find and compute a pivotal quantity of the observables X_1,\dots,X_n,X_{n+1} – meaning a function of observables and parameters whose probability distribution does not depend on the parameters – that can be inverted to give a probability of the future observation X_{n+1} falling in some interval computed in terms of the observed values so far, X_1,\dots,X_n. Such a pivotal quantity, depending only on observables, is called an ancillary statistic.[3] The usual method of constructing pivotal quantities is to take the difference of two variables that depend on location, so that location cancels out, and then take the ratio of two variables that depend on scale, so that scale cancels out. The most familiar pivotal quantity is the Student's t-statistic, which can be derived by this method and is used in the sequel.

Known mean, known variance

To begin, if one has a normal distribution N(μ, σ²) with known mean and variance, then one can compute prediction intervals in terms of the quantile function, \Phi^{-1}_{\mu,\sigma^2}(p)=\mu + \sigma \Phi^{-1}(p), where Φ is the cumulative distribution function for the standard normal distribution. For instance, a symmetric 95% prediction interval is given by


\begin{align}
& {}\qquad \left[\Phi^{-1}_{\mu,\sigma^2}(0.025),\  \Phi^{-1}_{\mu,\sigma^2}(0.975)\right] \\[6pt]
& = \left[\mu + \sigma\Phi^{-1}(0.025),\  \mu+\sigma\Phi^{-1}(0.975)\right] \\[6pt]
& =\left[\mu - \sigma\Phi^{-1}(0.975),\  \mu+\sigma\Phi^{-1}(0.975)\right];
\end{align}

2.5% of the time a sample will fall to the left of this interval, 2.5% of the time it will fall to the right, and the rest of the time it will fall in the interval.
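
A minimal sketch of this computation, using `statistics.NormalDist` from the Python standard library for the quantile function. The function name and the example values μ = 100, σ = 15 are illustrative assumptions, not taken from the article.

```python
from statistics import NormalDist


def normal_prediction_interval(mu, sigma, p=0.95):
    """Symmetric 100p% prediction interval for a draw from N(mu, sigma^2).

    Both parameters are assumed known, so the interval is read directly
    off the quantile function mu + sigma * Phi^{-1}(q).
    """
    tail = (1 - p) / 2
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


if __name__ == "__main__":
    print(normal_prediction_interval(0.0, 1.0))     # standard normal
    print(normal_prediction_interval(100.0, 15.0))  # hypothetical mu, sigma
```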

Estimation of parameters

For a distribution with unknown parameters, a direct approach to prediction is to estimate the parameters and then use the associated quantile function – for example, one could use the sample mean \overline{X} as an estimate for μ and the sample variance s² as an estimate for σ². Note that there are two natural choices for s² here – dividing by (n − 1) yields an unbiased estimate, while dividing by n yields the maximum likelihood estimator, and either might be used. One then uses the quantile function with these estimated parameters \Phi^{-1}_{\overline{X},s^2} to give a prediction interval.

This approach is usable, but the resulting interval will not have the repeated sampling interpretation[4] – it is not a predictive confidence interval.

For the sequel, use the sample mean:

\overline{X} = \overline{X}_n=(X_1+\cdots+X_n)/n

and the (unbiased) sample variance:

s^2 = s_n^2={1 \over n-1}\sum_{i=1}^n (X_i-\overline{X}_n)^2.
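
The two variance conventions mentioned above map onto two functions in Python's standard `statistics` module: `variance` divides by n − 1 and `pvariance` divides by n. The data values below are made up purely for illustration.

```python
from statistics import mean, pvariance, variance

# Hypothetical observations, chosen only so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

xbar = mean(data)             # sample mean
s2_unbiased = variance(data)  # sum of squared deviations / (n - 1)
s2_mle = pvariance(data)      # sum of squared deviations / n

if __name__ == "__main__":
    print(xbar, s2_unbiased, s2_mle)
```

Here the squared deviations sum to 32, so the unbiased estimate is 32/7 and the maximum likelihood estimate is 32/8 = 4.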

Unknown mean, known variance

Given[5] a normal distribution with unknown mean μ but known variance 1, the sample mean \overline{X} of the observations X_1,\dots,X_n has distribution N(μ, 1/n), while the future observation X_{n+1} has distribution N(μ, 1). Taking the difference of these cancels the μ and yields a normal distribution of variance 1 + (1/n), thus

\frac{X_{n+1}-\overline{X}}{\sqrt{1+(1/n)}} \sim N(0,1)

Solving for X_{n+1} gives the prediction distribution N(\overline{X},1+(1/n)), from which one can compute intervals as before. This is a predictive confidence interval in the sense that if one uses a quantile range of 100p%, then on repeated applications of this computation, the future observation X_{n+1} will fall in the predicted interval 100p% of the time.

Notice that this prediction distribution is more conservative than using the estimated mean \overline{X} and known variance 1, as this uses variance 1 + (1 / n), hence yields wider intervals. This is necessary for the desired confidence interval property to hold.
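
The difference between the two intervals can be seen in a small simulation. This is a sketch under stated assumptions: the function names, the true mean 7.0, the trial count and the seed are all hypothetical, and only the Python standard library is used.

```python
import math
import random
from statistics import NormalDist


def predictive_interval(xbar, n, p=0.95):
    """Interval from N(xbar, 1 + 1/n): allows for the uncertainty in xbar."""
    z = NormalDist().inv_cdf((1 + p) / 2)
    half = z * math.sqrt(1 + 1 / n)
    return xbar - half, xbar + half


def plug_in_interval(xbar, n, p=0.95):
    """Interval from N(xbar, 1): treats xbar as if it were the true mean."""
    z = NormalDist().inv_cdf((1 + p) / 2)
    return xbar - z, xbar + z


def coverage(make_interval, n, mu=7.0, trials=20000, seed=2):
    """Fraction of repeated experiments in which X_{n+1} lands in the interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        lo, hi = make_interval(xbar, n)
        if lo <= rng.gauss(mu, 1.0) <= hi:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("predictive:", coverage(predictive_interval, 5))
    print("plug-in:   ", coverage(plug_in_interval, 5))
```

With n = 5 the predictive interval should cover close to 95% of future observations, while the plug-in interval falls short of the nominal level.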

Known mean, unknown variance

Conversely, given a normal distribution with known mean 0 but unknown variance σ², the sample variance s² of the observations X_1,\dots,X_n has, up to scale, a \scriptstyle\chi_{n-1}^2 distribution; more precisely:

\frac{(n-1)s_n^2}{\sigma^2} \sim \chi_{n-1}^2.

while the future observation X_{n+1} has distribution N(0, σ²). Taking the ratio of the future observation and the sample standard deviation cancels the σ, yielding a Student's t-distribution with n − 1 degrees of freedom:

\frac{X_{n+1}}{s} \sim T^{n-1}

Solving for X_{n+1} gives the prediction distribution s T^{n-1}, from which one can compute intervals as before.

Notice that this prediction distribution is more conservative than using a normal distribution with the estimated standard deviation s and known mean 0, as it uses the t-distribution instead of the normal distribution, hence yields wider intervals. This is necessary for the desired confidence interval property to hold.
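
A rough simulation illustrates why the normal quantile is not enough here. The sketch below uses only the standard library and measures how often X_{n+1} falls inside the plug-in interval (−z·s, z·s), where z is the standard normal quantile. The function name, σ = 3, the trial count and the seed are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def plug_in_coverage(n, p=0.95, sigma=3.0, trials=20000, seed=1):
    """Fraction of trials in which X_{n+1} lands in (-z*s, z*s).

    s is the sample standard deviation with the n - 1 divisor, matching
    the definition used in the text; the true mean is 0.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + p) / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        if abs(rng.gauss(0.0, sigma)) <= z * s:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("n = 5: ", plug_in_coverage(5))
    print("n = 50:", plug_in_coverage(50))
```

For small n the plug-in interval covers noticeably less than the nominal 95%, and the shortfall shrinks as n grows and the t-distribution approaches the normal.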

Unknown mean, unknown variance

Combining the above for a normal distribution N(μ, σ²) with both μ and σ² unknown yields the following ancillary statistic,[6] in which S_n denotes the sample standard deviation s_n defined above:

\frac{X_{n+1}-\overline{X}_n}{S_n\sqrt{1+1/n}} \sim T^{n-1}

This simple combination is possible because the sample mean and sample variance of the normal distribution are independent statistics; this is only true for the normal distribution, and in fact characterizes the normal distribution.

Solving for X_{n+1} yields the prediction distribution

\overline{X}_n + S_n\sqrt{1+1/n} \cdot T^{n-1}.

The probability of X_{n+1} falling in a given interval is then:

\Pr\left(\overline{X}_n-T_a S_n\sqrt{1+(1/n)}\leq X_{n+1}   \leq\overline{X}_n+T_a S_n\sqrt{1+(1/n)}\,\right)=p

where T_a is the 100((1 + p)/2)th percentile of Student's t-distribution with n − 1 degrees of freedom. Therefore the numbers

\overline{X}_n\pm T_a {S}_n\sqrt{1+(1/n)}

are the endpoints of a 100p% prediction interval for X_{n+1}.
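
The endpoints can be computed as in the following sketch. Python's standard library has no Student's t quantile function, so this example substitutes a simple numerical one (Simpson's rule plus bisection) to stay self-contained; that helper is a stand-in written for this illustration, and a library routine such as `scipy.stats.t.ppf` would normally be used instead. All function names and the sample data are assumptions.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, by composite Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(q, df):
    """Quantile for 0.5 < q < 1: bracket by doubling, then bisect."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < q:
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prediction_interval(sample, p=0.95):
    """xbar +/- T_a * s * sqrt(1 + 1/n), with T_a from t with n - 1 df."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # unbiased (n - 1) convention
    t_a = t_quantile((1 + p) / 2, n - 1)
    half = t_a * s * math.sqrt(1 + 1 / n)
    return xbar - half, xbar + half


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    print(prediction_interval([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```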

Contrast with parametric confidence intervals

Note that in the formula for the predictive confidence interval no mention is made of the unobservable parameters μ and σ of population mean and standard deviation – the observed sample statistics \overline{X}_n and S_n of sample mean and standard deviation are used, and what is estimated is the outcome of future samples.

Rather than using sample statistics as estimators of population parameters and applying confidence intervals to these estimates, one considers "the next sample" X_{n+1} as itself a statistic, and computes its sampling distribution.

In parametric confidence intervals, one estimates population parameters; if one wishes to interpret this as prediction of the next sample, one models "the next sample" as a draw from this estimated population, using the (estimated) population distribution. By contrast, in predictive confidence intervals, one uses the sampling distribution of (a statistic of) n or n + 1 samples from such a population, and the population distribution is not directly used, though the assumption about its form (though not the values of its parameters) is used in computing the sampling distribution.

Regression analysis

A common application of prediction intervals is to regression analysis.

Suppose the data is being modeled by a straight line regression:

y_i=\alpha+\beta x_i +\epsilon_i\,

where yi is the response variable, xi is the explanatory variable, εi is a random error term, and α and β are parameters.

Given estimates \hat \alpha and \hat \beta for the parameters, such as from a simple linear regression, the predicted response value yd for a given explanatory value xd is

\hat{y}_d=\hat\alpha+\hat\beta x_d ,

(the point on the regression line), while the actual response would be

y_d=\alpha+\beta x_d +\epsilon_d.  \,

The point estimate \hat{y}_d is called the mean response, and is an estimate of the expected value of yd, E(y | xd).

A prediction interval instead gives an interval in which one expects yd to fall; this is not necessary if the actual parameters α and β are known (together with the error term εi), but if one is estimating from a sample, then one may use the standard error of the estimates for the intercept and slope (\hat\alpha and \hat\beta) to compute a prediction interval.
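
For the straight-line model above, the textbook form of this interval can be written out explicitly. The following is the standard result under the additional assumption (not stated in the text) that the errors are independent and normally distributed with mean 0 and a common unknown variance, and that \hat\alpha and \hat\beta are the least-squares estimates. The symbols n (the number of data points), \overline{x}, S_{xx} and s_e are introduced here for illustration.

\begin{align}
& \overline{x} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad S_{xx} = \sum_{i=1}^n (x_i-\overline{x})^2, \qquad s_e^2 = \frac{1}{n-2}\sum_{i=1}^n \left(y_i-\hat\alpha-\hat\beta x_i\right)^2, \\[6pt]
& \hat{y}_d \pm T_a\, s_e\sqrt{1+\frac{1}{n}+\frac{(x_d-\overline{x})^2}{S_{xx}}}
\end{align}

Here T_a is the 100((1 + p)/2)th percentile of Student's t-distribution with n − 2 degrees of freedom, and the two endpoints give a 100p% prediction interval for y_d. The leading 1 under the square root accounts for the error term of the new observation; the remaining two terms account for the uncertainty in the estimated intercept and slope, and grow as x_d moves away from \overline{x}.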

Bayesian statistics

Seymour Geisser, a proponent of predictive inference, gives predictive applications of Bayesian statistics.[7]

In Bayesian statistics, one can compute (Bayesian) prediction intervals from the posterior probability of the random variable, as a credible interval. In theoretical work, credible intervals are not often calculated for the prediction of future events, but for inference of parameters – i.e., credible intervals of a parameter, not for the outcomes of the variable itself. However, particularly where applications are concerned with possible extreme values of yet to be observed cases, credible intervals for such values can be of practical importance.
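
As one concrete case, consider normal data with known standard deviation and a conjugate normal prior on the mean. The posterior predictive distribution of the next observation is then itself normal, and a Bayesian prediction interval is read off its quantiles. The sketch below assumes this specific model, which the article does not spell out; the function name, parameter names and numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def bayes_prediction_interval(xbar, n, sigma, prior_mean, prior_sd, p=0.95):
    """Central 100p% posterior predictive interval for the next observation.

    Assumed model: X_i ~ N(mu, sigma^2) with sigma known,
    and prior mu ~ N(prior_mean, prior_sd^2).
    """
    prior_prec = 1 / prior_sd ** 2
    data_prec = n / sigma ** 2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)
    # Predictive variance = observation noise + remaining uncertainty about mu.
    predictive = NormalDist(post_mean, math.sqrt(sigma ** 2 + post_var))
    tail = (1 - p) / 2
    return predictive.inv_cdf(tail), predictive.inv_cdf(1 - tail)


if __name__ == "__main__":
    # Informative prior pulls the interval toward prior_mean.
    print(bayes_prediction_interval(10.0, 4, 1.0, prior_mean=8.0, prior_sd=1.0))
    # Very diffuse prior: close to xbar +/- z * sigma * sqrt(1 + 1/n).
    print(bayes_prediction_interval(10.0, 4, 1.0, prior_mean=0.0, prior_sd=1e6))
```

With a very diffuse prior the result approaches the frequentist interval for unknown mean and known variance derived earlier.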



 
 

 

Copyrights:

Statistics Dictionary. A Dictionary of Statistics. Second edition revised. Copyright © Oxford University Press, 2008. All rights reserved.
Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Prediction interval".