| Sci-Tech Dictionary: prior probabilities |
(statistics) Probabilities of the outcomes of an experiment before the experiment has been performed.
| Investment Dictionary: Prior Probability |
The probability that an event will reflect established beliefs about the event before the arrival of new evidence or information. Prior probabilities are the original probabilities of an outcome, which will be updated with new information to create posterior probabilities.
Investopedia Says:
Prior probabilities represent what we originally believed before new evidence is uncovered. New information is used to produce updated probabilities, which are a more accurate measure of a potential outcome. For example, three acres of land have the labels A, B and C. One acre has reserves of oil below its surface, while the other two do not. The prior probability of oil being on acre C is one third, or 0.333. A drilling test is conducted on acre B, and the results indicate that no oil is present at the location. Since acres A and C are now the only candidates for oil reserves, the prior probability of 0.333 is updated to a posterior probability of 0.5 for acre C, as each remaining acre has one chance in two.
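The three-acre example can be reproduced with a small Bayes update. This is only a sketch: the helper name `bayes_update` is made up for illustration, and it assumes the drilling test is perfectly reliable, which the example implies but does not state.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Multiply each prior by its likelihood, then normalize to sum to 1."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}

# Prior: the oil is equally likely to be under acre A, B or C.
prior = {acre: Fraction(1, 3) for acre in "ABC"}

# Likelihood of the observation "no oil found at B" under each hypothesis
# about where the oil is (assumed: the test never gives a wrong result).
likelihood_no_oil_at_b = {"A": Fraction(1), "B": Fraction(0), "C": Fraction(1)}

posterior = bayes_update(prior, likelihood_no_oil_at_b)
print(posterior["C"])  # posterior probability that the oil is under acre C
```

Exact fractions are used so that the result can be compared directly with the one-in-two figure in the text.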
| Philosophy Dictionary: prior probability |
The probability assigned to a hypothesis or event before a piece of evidence emerges. In Bayesian reasoning, the probability after a piece of evidence (the posterior probability) is a function of the prior probability, and the extent to which the evidence fits the hypothesis, but inversely proportional to the prior probability of the evidence—in other words, evidence that might be expected to have arisen on many different hypotheses does not confirm any one of them particularly well. The terminology is criticized by de Finetti, who prefers to talk of initial and final probabilities.
| Wikipedia: Prior probability |
In Bayesian inference, a prior probability distribution, often called simply the prior, is a probability distribution representing knowledge or belief about an unknown quantity a priori, that is, before any data have been observed; it is written P(A). The unknown quantity could be a parameter, hypothesis or latent variable.
The posterior probability is then the conditional probability that takes the data B into account, written P(A | B). This is computed from the prior and the likelihood function via Bayes' theorem.
In Bayesian statistical inference, a prior probability distribution, often called simply the prior, of an uncertain quantity p (for example, suppose p is the proportion of voters who will vote for the politician named Smith in a future election) is the probability distribution that would express one's uncertainty about p before the "data" (for example, an opinion poll) are taken into account. It is meant to attribute uncertainty rather than randomness to the uncertain quantity.
One applies Bayes' theorem, multiplying the prior by the likelihood function and then normalizing, to get the posterior probability distribution, which is the conditional distribution of the uncertain quantity given the data.
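The multiply-then-normalize recipe can be sketched on a grid of candidate values for p. The poll counts, the flat prior and the helper name `grid_posterior` are illustrative assumptions, not part of the article.

```python
def grid_posterior(prior_weights, likelihood_values):
    """Multiply prior by likelihood pointwise, then normalize to sum to 1."""
    unnormalized = [p * l for p, l in zip(prior_weights, likelihood_values)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Candidate values for p, the proportion of voters favoring Smith.
grid = [i / 100 for i in range(101)]

# Assumed prior: flat over the grid (any other weights could be used).
prior = [1.0] * len(grid)

# Hypothetical poll: 6 respondents for Smith, 4 against.
votes_for, votes_against = 6, 4
likelihood = [p**votes_for * (1 - p) ** votes_against for p in grid]

posterior = grid_posterior(prior, likelihood)

# The grid point carrying the most posterior probability.
mode = grid[max(range(len(grid)), key=posterior.__getitem__)]
print(mode)
```

A grid is used here only because it makes the normalization step explicit; it is a coarse approximation to the continuous posterior.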
A prior is often the purely subjective assessment of an experienced expert. Some will choose a conjugate prior when they can, to make calculation of the posterior distribution easier.
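To illustrate why a conjugate prior simplifies the calculation, the sketch below assumes a beta prior on the success probability of Bernoulli trials. With that pairing the posterior is again a beta distribution, so updating reduces to adding counts. The class name `BetaPrior` and the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) distribution over a Bernoulli success probability."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaPrior":
        # Conjugacy: the posterior is Beta(alpha + successes, beta + failures).
        return BetaPrior(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

prior = BetaPrior(alpha=2.0, beta=2.0)             # assumed prior belief
posterior = prior.update(successes=7, failures=3)  # hypothetical data
print(posterior, posterior.mean)
```

No integration is needed at any point, which is the practical appeal of conjugate priors.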
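Parameters of prior distributions are called hyperparameters, to distinguish them from parameters of the model of the underlying data. For instance, if one is using a beta distribution to model the distribution of the parameter p of a Bernoulli distribution, then p is a parameter of the underlying data model (the Bernoulli distribution), while the parameters α and β of the beta distribution are hyperparameters.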
An informative prior expresses specific, definite information about a variable. An example is a prior distribution for the temperature at noon tomorrow. A reasonable approach is to make the prior a normal distribution with expected value equal to today's noontime temperature, with variance equal to the day-to-day variance of atmospheric temperature.
This example has a property in common with many priors, namely, that the posterior from one problem (today's temperature) becomes the prior for another problem (tomorrow's temperature); pre-existing evidence which has already been taken into account is part of the prior and as more evidence accumulates the prior is determined largely by the evidence rather than any original assumption, provided that the original assumption admitted the possibility of what the evidence is suggesting. The terms "prior" and "posterior" are generally relative to a specific datum or observation.
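The way a posterior becomes the next prior can be sketched with a normal prior and a normally distributed observation. The sketch assumes the observation noise variance is known, and all temperatures and variances are invented for illustration; `normal_update` is a hypothetical helper.

```python
def normal_update(prior_mean, prior_var, observation, obs_var):
    """Posterior mean and variance for a normal prior and one normal
    observation with known noise variance obs_var."""
    precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + observation / obs_var)
    return post_mean, post_var

# Assumed prior for the noon temperature: mean 20, variance 9.
mean, var = 20.0, 9.0

# First observation: 23 degrees, measured with noise variance 1.
mean, var = normal_update(mean, var, observation=23.0, obs_var=1.0)
first_posterior = (mean, var)

# That posterior now serves as the prior for the next observation.
mean, var = normal_update(mean, var, observation=22.0, obs_var=1.0)
print(first_posterior, (mean, var))
```

Each update shrinks the variance, so the original prior matters less and less as observations accumulate.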
An uninformative prior expresses vague or general information about a variable. The term "uninformative prior" may be somewhat of a misnomer; often, such a prior might be called a not very informative prior, or an objective prior, i.e. one that's not subjectively elicited. Uninformative priors can express "objective" information such as "the variable is positive" or "the variable is less than some limit".
The simplest and oldest rule for choosing a non-informative prior is the principle of indifference, which assigns equal probabilities to all possibilities.
In parameter estimation problems, the use of an uninformative prior typically yields results which are not too different from conventional statistical analysis, as the likelihood function often yields more information than the uninformative prior.
Some attempts have been made at finding a priori probabilities, i.e. probability distributions in some sense logically required by the nature of one's state of uncertainty; these are a subject of philosophical controversy. For example, Edwin T. Jaynes has published an argument (Jaynes 1968) based on Lie groups that suggests that the prior for the proportion p of voters voting for a candidate, given no other information, should be the Haldane prior $p^{-1}(1 - p)^{-1}$. If one is so uncertain about the value of the aforementioned proportion p that one knows only that at least one voter will vote for Smith and at least one will not, then the conditional probability distribution of p given this information alone is the uniform distribution on the interval [0, 1], which is obtained by applying Bayes' theorem to the data set consisting of one vote for Smith and one vote against, using the above prior. The Haldane prior has been criticized on the grounds that it yields an improper posterior distribution that puts 100% of the probability content at either p = 0 or at p = 1 if a finite sample of voters all favor the same candidate, even though mathematically the posterior probability is simply not defined and thus we cannot even speak of a probability content. The Jeffreys prior $p^{-1/2}(1 - p)^{-1/2}$ is therefore preferred (see below).
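The two claims about the Haldane prior can be checked by treating it as the limiting beta form Beta(0, 0) and using the fact that a beta prior updated with Bernoulli votes stays in the beta family. The helper names below are hypothetical, and the vote counts are illustrative.

```python
def update_beta(alpha, beta, votes_for, votes_against):
    """Beta(alpha, beta) prior updated with Bernoulli votes."""
    return alpha + votes_for, beta + votes_against

def is_proper_beta(alpha, beta):
    """A beta density integrates to a finite value only if both
    parameters are strictly positive."""
    return alpha > 0 and beta > 0

haldane = (0, 0)  # p^-1 (1 - p)^-1 written as the improper Beta(0, 0)

# One vote for and one against gives Beta(1, 1), the uniform distribution.
after_one_each = update_beta(*haldane, votes_for=1, votes_against=1)

# A unanimous sample leaves one parameter at zero: still improper.
after_unanimous = update_beta(*haldane, votes_for=5, votes_against=0)

print(after_one_each, is_proper_beta(*after_one_each))
print(after_unanimous, is_proper_beta(*after_unanimous))
```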
Priors can be constructed which are proportional to the Haar measure if the parameter space X carries a natural group structure which leaves invariant our Bayesian state of knowledge. For example, in physics we might expect that an experiment will give the same results regardless of our choice of the origin of a coordinate system. This induces the group structure of the translation group on X, which determines the prior probability as a constant improper prior. Similarly, some measurements are naturally invariant to the choice of an arbitrary scale (i.e., it doesn't matter if we use centimeters or inches, we should get results that are physically the same). In such a case, the scale group is the natural group structure, and the corresponding prior on X is proportional to 1/x. It sometimes matters whether we use the left-invariant or right-invariant Haar measure. For example, the left and right invariant Haar measures on the affine group are not equal. Berger (1985, p. 413) argues that the right-invariant Haar measure is the correct choice.
Another idea, championed by Edwin T. Jaynes, is to use the principle of maximum entropy (MAXENT). The motivation is that the Shannon entropy of a probability distribution measures the amount of information contained in the distribution. The larger the entropy, the less information is provided by the distribution. Thus, by maximizing the entropy over a suitable set of probability distributions on X, one finds the distribution that is least informative in the sense that it contains the least amount of information consistent with the constraints that define the set. For example, the maximum entropy prior on a discrete space, given only that the probability is normalized to 1, is the prior that assigns equal probability to each state. And in the continuous case, the maximum entropy prior given that the density is normalized with mean zero and variance unity is the standard normal distribution. The principle of minimum cross-entropy generalizes MAXENT to the case of "updating" an arbitrary prior distribution with suitable constraints in the maximum-entropy sense.
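The discrete case can be checked numerically: among distributions over the same states, the one with equal probabilities has the largest Shannon entropy. The two example distributions below are arbitrary choices for illustration.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability states contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

h_uniform = shannon_entropy(uniform)
h_skewed = shannon_entropy(skewed)
print(h_uniform, h_skewed)
```

For four states the uniform entropy equals log 4, the upper bound for any distribution on four outcomes.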
A related idea, reference priors, was introduced by José-Miguel Bernardo. Here, the idea is to maximize the expected Kullback–Leibler divergence of the posterior distribution relative to the prior. This maximizes the expected posterior information about X when the prior density is p(x); thus, in some sense, p(x) is the "least informative" prior about X. The reference prior is defined in the asymptotic limit, i.e., one considers the limit of the priors so obtained as the number of data points goes to infinity. Reference priors are often the objective prior of choice in multivariate problems, since other rules (e.g., Jeffreys' rule) may result in priors with problematic behavior.
Objective prior distributions may also be derived from other principles, such as information or coding theory (see e.g. minimum description length) or frequentist statistics (see frequentist matching).
Philosophical problems associated with uninformative priors are associated with the choice of an appropriate metric, or measurement scale. Suppose we want a prior for the running speed of a runner who is unknown to us. We could specify, say, a normal distribution as the prior for his speed, but alternatively we could specify a normal prior for the time he takes to complete 100 metres, which is proportional to the reciprocal of his speed. These are very different priors, but it is not clear which is to be preferred. Jaynes' often-overlooked method of transformation groups can answer this question in some situations[1].
Similarly, if asked to estimate an unknown proportion between 0 and 1, we might say that all proportions are equally likely and use a uniform prior. Alternatively, we might say that all orders of magnitude for the proportion are equally likely, which gives the logarithmic prior: a prior that is uniform on the logarithm of the proportion, and hence proportional to 1/p. The Jeffreys prior attempts to solve this problem by computing a prior which expresses the same belief no matter which metric is used. The Jeffreys prior for an unknown proportion p is $p^{-1/2}(1 - p)^{-1/2}$, which differs from Jaynes' recommendation.
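The form of the Jeffreys prior for a proportion can be recovered from its standard definition as the square root of the Fisher information. That definition is not spelled out in the text above, so the sketch below supplies it as an assumption; the function names are hypothetical.

```python
import math

def bernoulli_fisher_information(p):
    """Expected squared score of one Bernoulli(p) observation."""
    score_success = 1.0 / p           # d/dp log p
    score_failure = -1.0 / (1.0 - p)  # d/dp log (1 - p)
    return p * score_success**2 + (1.0 - p) * score_failure**2

def jeffreys_unnormalized(p):
    """Jeffreys prior density up to a constant: sqrt of Fisher information."""
    return math.sqrt(bernoulli_fisher_information(p))

p = 0.3
closed_form = p**-0.5 * (1.0 - p) ** -0.5
print(jeffreys_unnormalized(p), closed_form)
```

The two printed values agree up to rounding, because the Fisher information of a Bernoulli trial simplifies to 1 / (p (1 - p)).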
Practical problems associated with uninformative priors include the requirement that the posterior distribution be proper. The usual uninformative priors on continuous, unbounded variables are improper. This need not be a problem if the posterior distribution is proper. Another issue of importance is that if an uninformative prior is to be used routinely, i.e., with many different data sets, it should have good frequentist properties. Normally a Bayesian would not be concerned with such issues, but it can be important in this situation. For example, one would want any decision rule based on the posterior distribution to be admissible under the adopted loss function. Unfortunately, admissibility is often difficult to check, although some results are known (e.g., Berger and Strawderman 1996). The issue is particularly acute with hierarchical Bayes models; the usual priors (e.g., Jeffreys' prior) may give badly inadmissible decision rules if employed at the higher levels of the hierarchy.
If Bayes' theorem is written as

$$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_j P(B \mid A_j)\, P(A_j)}$$

then it is clear that it would remain true if all the prior probabilities $P(A_i)$ and $P(A_j)$ were multiplied by a given constant; the same would be true for a continuous random variable. The posterior probabilities will still sum (or integrate) to 1 even if the prior values do not, and so the priors only need to be specified in the correct proportion.
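The invariance under rescaling is easy to confirm numerically. In this sketch the prior weights and likelihoods are arbitrary illustrative values, and `posterior_from` is a hypothetical helper.

```python
from fractions import Fraction

def posterior_from(prior_weights, likelihoods):
    """Bayes' theorem over a finite set of hypotheses; the prior weights
    need only be given up to a common constant factor."""
    joint = [w * l for w, l in zip(prior_weights, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

weights = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]

# Multiply every prior weight by the same constant: they no longer sum to 1.
scaled_weights = [7 * w for w in weights]

original = posterior_from(weights, likelihoods)
rescaled = posterior_from(scaled_weights, likelihoods)
print(original == rescaled, sum(rescaled))
```

The constant cancels between numerator and denominator, so both posteriors are identical and still sum to 1.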
Taking this idea further, in many cases the sum or integral of the prior values may not even need to be finite to get sensible answers for the posterior probabilities. When this is the case, the prior is called an improper prior. Some statisticians use improper priors as uninformative priors. For example, if they need a prior distribution for the mean and variance of a random variable, they may assume p(m, v) ~ 1/v (for v > 0) which would suggest that any value for the mean is "equally likely" and that a value for the positive variance becomes "less likely" in inverse proportion to its value. Many authors (Lindley, 1973; De Groot, 1937; Kass and Wasserman, 1996) warn against the danger of over-interpreting those priors since they are not probability densities. The only relevance they have is found in the corresponding posterior, as long as it is well-defined for all observations. (The Haldane prior is a typical counterexample.)
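A simpler case than the p(m, v) example shows how an improper prior can still give a well-defined posterior. The sketch assumes normally distributed observations with a known noise variance and a flat (constant) prior on the mean alone; under those assumptions the posterior for the mean is normal, centered on the sample mean with variance equal to the noise variance divided by the sample size. The data and the function name are hypothetical.

```python
def flat_prior_posterior(observations, noise_var):
    """Posterior mean and variance of a normal mean under a flat improper
    prior, with known observation noise variance."""
    n = len(observations)
    sample_mean = sum(observations) / n
    return sample_mean, noise_var / n

# Hypothetical measurements with assumed known noise variance 3.0.
post_mean, post_var = flat_prior_posterior([1.0, 2.0, 3.0], noise_var=3.0)
print(post_mean, post_var)
```

The prior itself does not integrate to a finite value, yet the posterior is an ordinary normal distribution as soon as one observation is available.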
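Examples of improper priors mentioned above include the constant prior on an unbounded location parameter, the prior proportional to 1/x on a positive scale parameter, and the Haldane prior.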
Copyrights:
Sci-Tech Dictionary. McGraw-Hill Dictionary of Scientific and Technical Terms. Copyright © 2003, 1994, 1989, 1984, 1978, 1976, 1974 by McGraw-Hill Companies, Inc. All rights reserved.
Investment Dictionary. Copyright ©2000, Investopedia.com - Owned and Operated by Investopedia Inc. All rights reserved.
Philosophy Dictionary. The Oxford Dictionary of Philosophy. Copyright © 1994, 1996, 2005 by Oxford University Press. All rights reserved.
Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Prior probability".