
Bayes' theorem

 
Sci-Tech Dictionary: Bayes' theorem
(¦bāz ′thir·əm)

(mathematics) A theorem stating that the probability of a hypothesis, given the original data and some new data, is proportional to the product of the probability of the hypothesis, given the original data only, and the probability of the new data, given the original data and the hypothesis. Also known as inverse probability principle.


Encyclopedia of Public Health: Bayes' Theorem

Bayes' theorem deals with the role of new information in revising probability estimates. The theorem assumes that the probability of a hypothesis (the posterior probability) is a function of new evidence (the likelihood) and previous knowledge (prior probability). The theorem is named after Thomas Bayes (1702–1761), a nonconformist minister who had an interest in mathematics. The basis of the theorem is contained in an essay published in the Philosophical Transactions of the Royal Society of London in 1763.

Bayes' theorem is a logical consequence of the product rule of probability, which states that the probability (P) of two events (A and B) both happening, P(A,B), is equal to the conditional probability of one event occurring given that the other has already occurred, P(A|B), multiplied by the probability of the other event happening, P(B). The derivation of the theorem is as follows: P(A,B) = P(A|B)×P(B) = P(B|A)×P(A)

Thus: P(A|B) = P(B|A)×P(A)/P(B).

Bayes' theorem has been frequently used in the areas of diagnostic testing and in the determination of genetic predisposition. For example, suppose one wants to know the probability that a person with a particular genetic profile (B) will develop a particular tumour type (A), that is, P(A|B). Previous knowledge leads to the assumption that the probability that any individual will develop the specific tumour, P(A), is 0.1 and the probability that an individual has the particular genetic profile, P(B), is 0.2. New evidence establishes that the probability that an individual with the tumour has the genetic profile of interest, P(B|A), is 0.5.

Thus: P(A|B) = 0.1×0.5/0.2 = 0.25
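
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `bayes_posterior` and its argument names are illustrative choices, not part of the encyclopedia entry.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, marginal_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if marginal_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / marginal_b


# Values from the tumour example: P(A) = 0.1, P(B|A) = 0.5, P(B) = 0.2
p_tumour_given_profile = bayes_posterior(0.1, 0.5, 0.2)
print(round(p_tumour_given_profile, 4))  # 0.25
```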

The adoption of Bayes' theorem has led to the development of Bayesian methods for data analysis. Bayesian methods have been defined as "the explicit use of external evidence in the design, monitoring, analysis, interpretation and reporting" of studies (Spiegelhalter, 1999). The Bayesian approach to data analysis allows consideration of all possible sources of evidence in the determination of the posterior probability of an event. It is argued that this approach has more relevance to decision making than classical statistical inference, as it focuses on the transformation from initial knowledge to final opinion rather than on providing the "correct" inference.

In addition to its practical use in probability analysis, Bayes' theorem can be used as a normative model to assess how well people use empirical information to update the probability that a hypothesis is true.

(SEE ALSO: Bayes, Thomas; Probability Model; Statistics for Public Health)

Bibliography

Spiegelhalter, D.; Myles, J.; Jones, D.; and Abrams, K. (1999). "An Introduction to Bayesian Methods in Health Technology Assessment." British Medical Journal 319:508–512.

—— (2000). "Bayesian Methods in Health Technology Assessment: A Review." Health Technology Assessment 4(38):1–130.

— GEORGE WELLS



Philosophy Dictionary: Bayes's theorem

Theorem in probability theory. Thomas Bayes (1702–61) was an English clergyman, whose An Essay towards Solving a Problem in the Doctrine of Chances occurs in two memoirs presented by Price (Bayes having died), in Philosophical Transactions of 1763 and 1764. Bayes gave a result for the probability that the chance of an event on a single trial is within a certain interval, given the number of times the event has occurred and the number of times it has failed. But the form in which his theorem is remembered is as an expression for the posterior probability of a hypothesis (its probability after evidence is obtained). This is a product of (i) its probability before the evidence, or prior probability, and (ii) the probability of the evidence being as it is, given the hypothesis, divided by (iii) the prior probability of the evidence (often expressed as the probability of the evidence considered in the light of all the different possible hypotheses).

Wikipedia: Bayes' theorem

In probability theory, Bayes' theorem (often called Bayes' law or Bayes' rule, and named after Rev. Thomas Bayes; IPA: /ˈbeɪz/) shows how one conditional probability (such as the probability of a hypothesis given observed evidence) depends on its inverse (in this case, the probability of that evidence given the hypothesis). The theorem expresses the posterior probability (i.e. after evidence E is observed) of a hypothesis H in terms of the prior probabilities of H and E, and the probability of E given H. It implies that evidence has a stronger confirming effect if it was more unlikely before being observed.[1] Bayes' theorem is valid in all common interpretations of probability, and is applicable in science and engineering.[2] However, frequentist and Bayesian statisticians, and subjective and objective ones, disagree about the proper implementation and extent of Bayes' theorem.

Simple statement of theorem

Bayes gave a special case involving continuous prior and posterior probability distributions and discrete probability distributions of data, but in its simplest setting involving only discrete distributions, Bayes' theorem relates the conditional and marginal probabilities of events A and B, where B has a non-vanishing probability:

P(A|B) = \frac{P(B | A)\, P(A)}{P(B)}\,\! .

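Each term in Bayes' theorem has a conventional name:

  • P(A) is the prior probability or marginal probability of A. It is "prior" in the sense that it does not take into account any information about B.
  • P(A|B) is the conditional probability of A given B. It is also called the posterior probability because it depends on the specified value of B.
  • P(B|A) is the conditional probability of B given A. It is also called the likelihood.
  • P(B) is the prior or marginal probability of B, and acts as a normalizing constant.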

Bayes' theorem in this form gives a mathematical representation of how the conditional probability of event A given B is related to the converse conditional probability of B given A.

Likelihood functions and continuous prior and posterior distributions

Suppose a continuous probability distribution with probability density function ƒΘ is assigned to an uncertain quantity Θ. (In the conventional language of mathematical probability theory Θ would be a "random variable", but in certain kinds of scientific applications such language may be considered objectionable.) The probability that the event B will be the outcome of an experiment depends on Θ; it is P(B | Θ). As a function of Θ this is the likelihood function:

 L(\theta) = P(B \mid \Theta = \theta). \,

Then the posterior probability distribution of Θ, i.e. the conditional probability distribution of Θ given the observed data B, has probability density function

 f_\Theta(\theta \mid B) = \text{constant}\cdot f_\Theta(\theta) L(\theta), \,

where the "constant" is a normalizing constant so chosen as to make the integral of the function equal to 1, so that it is indeed a probability density function. This is the form of Bayes' theorem actually considered by Thomas Bayes.

In other words, Bayes' theorem says:

To get the posterior probability distribution, multiply the prior probability distribution by the likelihood function and then normalize.

More generally still, the new data B may be the value of an observed continuously distributed random variable X. The probability that it has any particular value is therefore 0. In such a case, the likelihood function is the value of a probability density function of X given Θ, rather than a probability of B given Θ:

 L(\theta) = f_X(x \mid \Theta = \theta). \,

The theorem still prescribes multiplying the prior distribution by the likelihood function and then normalizing, to get the posterior distribution.

Application of the theorem

As a formal theorem, Bayes' theorem is valid in all common interpretations of probability. However, frequentist and Bayesian interpretations disagree on how (and to what) probabilities are assigned. In the Bayesian interpretation, probabilities are rationally coherent degrees of belief, or a degree of belief in a proposition given a body of well-specified information.[2] Bayes' theorem can then be understood as specifying how an ideally rational person responds to evidence.[1] In the frequentist interpretation, probabilities are the frequencies of occurrence of random events as proportions of a whole. Though his name has become associated with subjective probability, Bayes himself interpreted the theorem in an objective sense.[3]

The theorem was given extra prominence by a theorem by physicist R. T. Cox which showed that any system of inference fitting certain requirements could be mapped onto probability.[2][4] Bayes' Theorem has since found a wide variety of applications in science and engineering.[2]

Simple example

Suppose there is a school having 60% boys and 40% girls as students. The female students wear trousers or skirts in equal numbers; the boys all wear trousers. An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. What is the probability this student is a girl? The correct answer can be computed using Bayes' theorem.

The event A is that the student observed is a girl, and the event B is that the student observed is wearing trousers. To compute P(A|B), we first need to know:

  • P(A), or the probability that the student is a girl regardless of any other information. Since the observer sees a random student, meaning that all students have the same probability of being observed, and the fraction of girls among the students is 40%, this probability equals 0.4.
  • P(B|A), or the probability of the student wearing trousers given that the student is a girl. As they are as likely to wear skirts as trousers, this is 0.5.
  • P(B), or the probability of a (randomly selected) student wearing trousers regardless of any other information. Since P(B) = P(B|A)P(A) + P(B|A')P(A'), this is 0.5×0.4 + 1×0.6 = 0.8.

Given all this information, the probability of the observer having spotted a girl given that the observed student is wearing trousers can be computed by substituting these values in the formula:

P(A|B) = \frac{P(B|A) P(A)}{P(B)} = \frac{0.5 \times 0.4}{0.8} = 0.25.

Another, essentially equivalent way of obtaining the same result is as follows. Assume, for concreteness, that there are 100 students, 60 boys and 40 girls. Among these, 60 boys and 20 girls wear trousers. All together there are 80 trouser-wearers, of which 20 are girls. Therefore the chance that a random trouser-wearer is a girl equals 20/80 = 0.25. Put in terms of Bayes' theorem, the probability of a student being a girl is 40/100, and the probability that any given girl will wear trousers is 1/2. The product of the two is 20/100. Since you know the student is wearing trousers, you exclude the 20 students who are not wearing trousers and divide by the probability of wearing trousers, 80/100, which gives (20/100)/(80/100), or 20/80.

It is often helpful when calculating conditional probabilities to create a simple table containing the number of occurrences of each outcome, or the relative frequencies of each outcome, for each of the independent variables. The table below illustrates the use of this method for the above girl-or-boy example:

            Girls   Boys   Total
Trousers      20     60      80
Skirts        20      0      20
Total         40     60     100
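
The table lookup can also be written out directly. The sketch below assumes the four cell counts from the table; the dictionary layout and the name `conditional_from_counts` are illustrative only.

```python
# Cell counts from the table, keyed by (clothing, sex).
counts = {
    ("trousers", "girl"): 20,
    ("trousers", "boy"): 60,
    ("skirt", "girl"): 20,
    ("skirt", "boy"): 0,
}


def conditional_from_counts(counts, target_sex, observed_clothing):
    """P(sex = target_sex | clothing = observed_clothing) from raw counts."""
    row_total = sum(n for (clothing, _), n in counts.items()
                    if clothing == observed_clothing)
    if row_total == 0:
        raise ValueError("no students with that clothing")
    return counts[(observed_clothing, target_sex)] / row_total


print(conditional_from_counts(counts, "girl", "trousers"))  # 0.25
```

Conditioning on "trousers" simply restricts attention to the first row of the table, which is why only that row's total appears in the denominator.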

Derivation from conditional probabilities

To derive the theorem, we start from the definition of conditional probability. The probability of event A given event B is

P(A|B)=\frac{P(A \cap B)}{P(B)}.

Equivalently, the probability of event B given event A is

P(B|A) = \frac{P(A \cap B)}{P(A)}. \!

Rearranging and combining these two equations, we find

P(A|B)\, P(B) = P(A \cap B) = P(B|A)\, P(A). \!

This lemma is sometimes called the product rule for probabilities. Dividing both sides by P(B), provided that it is non-zero, we obtain Bayes' theorem:

P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A)\,P(A)}{P(B)}. \!

Alternative forms

Bayes' theorem is often completed by noting that, according to the Law of total probability,

P(B) = P(A\cap B) + P(A^C\cap B) = P(B|A) P(A) + P(B|A^C) P(A^C)\! ,

where A^C is the complementary event of A (often called "not A").

This results in the analogous form:

P(A|B) = \frac{P(B|A) P(A)}{P(B|A) P(A) + P(B|A^C) P(A^C)} \!.

More generally, the law states that given a partition {A_i} of the event space,

P(B) = {\sum_i P(B \cap A_i)} = {\sum_i P(B|A_i) P(A_i)} \!.

Thus, for any A_i in the partition, Bayes' theorem states that

P(A_i|B) = \frac{P(B | A_i)\, P(A_i)}{P(B)}  = \frac{P(B | A_i)\, P(A_i)}{\sum_j P(B|A_j)\,P(A_j)}  \!.
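
As a rough illustration of the partition form, the following sketch computes P(A_i|B) for every member of a finite partition at once. The function name and the list-based interface are assumptions made for this example.

```python
def posterior_over_partition(priors, likelihoods):
    """Given P(A_i) and P(B|A_i) for a partition {A_i}, return the list of P(A_i|B)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B and A_i)
    evidence = sum(joint)  # P(B), by the law of total probability
    if evidence == 0:
        raise ValueError("B has probability zero under these inputs")
    return [j / evidence for j in joint]


# School example: partition {girl, boy}, B = "student wears trousers".
p_girl, p_boy = posterior_over_partition([0.4, 0.6], [0.5, 1.0])
print(p_girl, p_boy)
```

The denominator is the same for every A_i, so the posteriors are just the joint probabilities rescaled to sum to 1.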

In terms of odds and likelihood ratio

Bayes' theorem can also be written neatly in terms of a likelihood ratio Λ and odds O as

O(A|B)=O(A) \cdot \Lambda (A|B)

where O(A|B) are the (posterior) odds of A given B,

O(A|B)=\frac{P(A|B)}{P(A^C|B)} \!

O(A) are the (prior) odds of A by itself

O(A)=\frac{P(A)}{P(A^C)} \!

and Λ(A|B) is the likelihood ratio.

\Lambda (A|B) = \frac{P(B|A)}{P(B|A^C)} \!
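
The odds form can be tried on the school example (A = girl, B = wears trousers). This is a small sketch with hypothetical helper names; it assumes P(A) < 1 and P(B|A^C) > 0 so that both ratios are defined.

```python
def posterior_odds(prior_prob, lik_given_a, lik_given_not_a):
    """O(A|B) = O(A) * Lambda(A|B)."""
    prior_odds = prior_prob / (1 - prior_prob)        # O(A)
    likelihood_ratio = lik_given_a / lik_given_not_a  # Lambda(A|B)
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds O back to a probability O / (1 + O)."""
    return odds / (1 + odds)


# P(A) = 0.4, P(B|A) = 0.5, P(B|A^C) = 1
odds = posterior_odds(0.4, 0.5, 1.0)
print(odds, odds_to_probability(odds))
```

The prior odds of 2:3 are halved by the likelihood ratio of 1/2, giving posterior odds of 1:3, which corresponds to the probability 0.25 found earlier.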

For probability densities

There is also a version of Bayes' theorem for continuous distributions. It is somewhat harder to derive, since probability densities are not probabilities, so Bayes' theorem has to be established by a limit process; see Papoulis (citation below), Section 7.3 for an elementary derivation.

Bayes originally used the proposition to find a continuous posterior distribution given discrete observations.

Bayes' theorem for probability densities is formally similar to the theorem for probabilities:

 f_X(x|Y=y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{f_Y(y|X=x)\,f_X(x)}{f_Y(y)} = \frac{f_Y(y|X=x)\,f_X(x)}{\int_{-\infty}^{\infty} f_Y(y|X=\xi )\,f_X(\xi )\,d\xi }\!.

There is an analogous statement of the law of total probability, which is used in the denominator:

 f_Y(y) = \int_{-\infty}^{\infty} f_Y(y|X=x )\,f_X(x)\,dx \!.

As in the discrete case, the terms have standard names.

 f_{X,Y}(x,y)\,

is the joint density function of X and Y,

 f_X(x|Y=y)\,

is the posterior probability density function of X given Y = y,

 f_Y(y|X=x) = L(x|y)\,

is (as a function of x) the likelihood function of X given Y = y, and

 f_X(x)\,

and

 f_Y(y)\!

are the marginal probability density functions of X and Y respectively, where f_X(x) is the prior probability density function of X.

Abstract form

Given two absolutely continuous probability measures P ~ Q on the probability space (\Omega, \mathcal{F}) and a sigma-algebra \mathcal{G} \subset \mathcal{F}, the abstract Bayes' theorem for a \mathcal{F}-measurable random variable X becomes

E_P \left[ X|\mathcal{G} \right]  = \frac{E_Q \left[ \left.\frac{dP}{dQ} X \right|\mathcal{G} \right] }{E_Q \left[\left. \frac{dP}{dQ}\right|\mathcal{G} \right] }.

Proof :

by definition of conditional probability,

 E_P \left[ X|\mathcal{G} \right]  = \frac{E_P \left[ X 1_\mathcal{G} \right] }{P \left[ \mathcal{G} \right] }  = \frac{E_P \left[ X 1_\mathcal{G} \right] }{E_P \left[ 1_\mathcal{G} \right] }

We further have that

 E_Q \left[ \frac{dP}{dQ} X\right]  = \int \frac{dP}{dQ} X \,dQ   = \int X \, dP = E_P \left[ X \right].

Hence,

 E_Q \left[ \left.\frac{dP}{dQ} X\right|\mathcal{G} \right]  = \frac{E_Q \left[ \frac{dP}{dQ} X 1_\mathcal{G} \right] }{E_Q \left[ 1_\mathcal{G} \right] } = \frac{E_P \left[ X 1_\mathcal{G} \right] }{E_Q \left[ 1_\mathcal{G} \right] }
 E_Q \left[ \left.\frac{dP}{dQ}\right|\mathcal{G} \right]  = \frac{E_Q \left[ \frac{dP}{dQ} 1_\mathcal{G} \right] }{E_Q \left[ 1_\mathcal{G} \right] } = \frac{E_P \left[ 1_\mathcal{G} \right] }{E_Q \left[ 1_\mathcal{G} \right] }.

Summarizing, we obtain the desired result.

This formulation is used in Kalman filtering to find Zakai equations. It is also used in financial mathematics for change of numeraire techniques.

Extensions

Theorems analogous to Bayes' theorem hold in problems with more than two variables. For example:

 P(A|B \cap C) = \frac{P(A) \, P(B|A) \, P(C|A \cap B)}{P(B) \, P(C|B)}\,.

This can be derived in a few steps from Bayes' theorem and the definition of conditional probability:

 P(A|B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} = \frac{P(C|A \cap B) \, P(A \cap B)}{P(B) \, P(C|B)} = \frac{P(A) \, P(B|A) \, P(C|A \cap B)}{P(B) \, P(C|B)}\,.

Similarly,

 P(A|B \cap C) = \frac{P(B|A \cap C) \, P(A|C)}{P(B|C)}\,,

which can be regarded as a conditional Bayes' theorem, and can be derived as follows:

 P(A|B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} = \frac{P(B|A \cap C) \, P(A|C) \, P(C)}{P(C) \, P(B|C)} = \frac{P(B|A \cap C) \, P(A|C)}{P(B|C)}\,.
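
Both three-event identities can be sanity-checked numerically. The sketch below builds an arbitrary joint distribution over three binary events and compares the two sides; the helper names `prob` and `cond` and the random seed are choices made for this illustration, and a numerical check is of course not a proof.

```python
import itertools
import random

random.seed(0)
# Arbitrary joint distribution over three binary events (A, B, C).
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

INDEX = {"a": 0, "b": 1, "c": 2}


def prob(**fixed):
    """Probability that the named events take the given values, e.g. prob(a=1, b=1)."""
    return sum(p for o, p in joint.items()
               if all(o[INDEX[k]] == v for k, v in fixed.items()))


def cond(event, given):
    """P(event | given), with both arguments passed as dicts such as {"a": 1}."""
    return prob(**event, **given) / prob(**given)


lhs = cond({"a": 1}, {"b": 1, "c": 1})
# First identity: P(A) P(B|A) P(C|A,B) / (P(B) P(C|B))
rhs_first = (prob(a=1) * cond({"b": 1}, {"a": 1}) * cond({"c": 1}, {"a": 1, "b": 1})
             / (prob(b=1) * cond({"c": 1}, {"b": 1})))
# Conditional Bayes' theorem: P(B|A,C) P(A|C) / P(B|C)
rhs_conditional = (cond({"b": 1}, {"a": 1, "c": 1}) * cond({"a": 1}, {"c": 1})
                   / cond({"b": 1}, {"c": 1}))
print(lhs, rhs_first, rhs_conditional)
```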

A general strategy is to work with a decomposition of the joint probability, and to marginalize (integrate) over the variables that are not of interest. Depending on the form of the decomposition, it may be possible to prove that some integrals must be 1, and thus they fall out of the decomposition; exploiting this property can reduce the computations very substantially. A Bayesian network, for example, specifies a factorization of a joint distribution of several variables in which the conditional probability of any one variable given the remaining ones takes a particularly simple form (see Markov blanket).

Further examples

Example 1: Drug testing

An example of the use of Bayes' theorem is the evaluation of drug test results. Suppose a certain drug test is 99% sensitive and 99% specific, that is, the test will correctly identify a drug user as testing positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the time. This would seem to be a relatively accurate test, but Bayes' theorem can be used to demonstrate the relatively high probability of misclassifying non-users as users. Let's assume a corporation decides to test its employees for drug use, and that only 0.5% of the employees actually use the drug. What is the probability that, given a positive drug test, an employee is actually a drug user? Let "D" stand for being a drug user and "N" indicate being a non-user. Let "+" be the event of a positive drug test. We need to know the following:

  • P(D), or the probability that the employee is a drug user, regardless of any other information. This is 0.005, since 0.5% of the employees are drug users. This is the prior probability of D.
  • P(N), or the probability that the employee is not a drug user. This is 1 − P(D), or 0.995.
  • P(+|D), or the probability that the test is positive, given that the employee is a drug user. This is 0.99, since the test is 99% sensitive.
  • P(+|N), or the probability that the test is positive, given that the employee is not a drug user. This is 0.01, since the test will produce a false positive for 1% of non-users.
  • P(+), or the probability of a positive test event, regardless of other information. This is 0.0149 or 1.49%, which is found by adding the probability that a true positive result will appear (= 99% × 0.5% = 0.495%) and the probability that a false positive will appear (= 1% × 99.5% = 0.995%). This is the prior probability of +.

Given this information, we can compute the posterior probability P(D|+) of an employee who tested positive actually being a drug user:

\begin{align}P(D|+) & = \frac{P(+ | D) P(D)}{P(+)} \\
& = \frac{P(+ | D) P(D)}{P(+ | D) P(D) + P(+ | N) P(N)} \\
& = \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.01 \times 0.995} \\
& = 0.3322.\end{align}

Despite the high specificity and sensitivity of the test, the low base rate of use makes a positive result unreliable: the probability that an employee who tests positive is actually using drugs is only about 33%, so it is in fact more likely that the employee is not a drug user. The rarer the condition for which we are testing, the greater the percentage of positive tests that will be false positives.
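
The dependence on the base rate is easy to explore numerically. In the sketch below, the function name `positive_predictive_value` is an illustrative choice, and the prevalences other than 0.5% are hypothetical values added only to show the trend.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(D|+) from the base rate P(D) and the test's sensitivity and specificity."""
    true_pos = sensitivity * prevalence               # P(+|D) P(D)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(+|N) P(N)
    return true_pos / (true_pos + false_pos)


# 0.5% is the prevalence used in the example; the others are hypothetical.
for prevalence in (0.005, 0.05, 0.5):
    ppv = positive_predictive_value(prevalence, 0.99, 0.99)
    print(f"prevalence={prevalence:.3f}  P(D|+)={ppv:.4f}")
```

For the prevalence of 0.5% this reproduces the value of about 0.3322 computed above, and the posterior rises as the condition becomes more common.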

Example 2: Bayesian inference

Applications of Bayes' theorem often assume the philosophy underlying Bayesian probability that uncertainty and degrees of belief can be measured as probabilities.

We describe the marginal probability distribution of a variable A as the prior probability distribution or simply the 'prior'. The conditional distribution of A given the "data" B is the posterior probability distribution or just the 'posterior'.

Suppose we wish to know about the proportion r of voters in a large population who will vote "yes" in a referendum. Let n be the number of voters in a random sample (chosen with replacement, so that we have statistical independence) and let m be the number of voters in that random sample who will vote "yes". Suppose that we observe n = 10 voters and m = 7 say they will vote yes. From Bayes' theorem we can calculate the probability distribution function for r using

 f(r | n=10, m=7) = 
  \frac {f(m=7 | r, n=10) \, f(r)} {\int_0^1 f(m=7|r, n=10) \, f(r) \, dr}. \!

From this we see that from the prior probability density function f(r) and the likelihood function L(r) = f(m = 7|r, n = 10), we can compute the posterior probability density function f(r|n = 10, m = 7).

The prior probability density function f(r) summarizes what we know about the distribution of r in the absence of any observation. We provisionally assume in this case that the prior distribution of r is uniform over the interval [0, 1]. That is, f(r) = 1. If some additional background information is found, we should modify the prior accordingly. However before we have any observations, all outcomes are equally likely.

Under the assumption of random sampling, choosing voters is just like choosing balls from an urn. The likelihood function L(r) = P(m = 7|r, n = 10) for such a problem is just the probability of 7 successes in 10 trials for a binomial distribution.

 P( m=7 | r, n=10) = {10 \choose 7} \, r^7 \, (1-r)^3.

As with the prior, the likelihood is open to revision—more complex assumptions will yield more complex likelihood functions. Maintaining the current assumptions, we compute the normalizing factor,


\begin{align}
\int_0^1 P( m=7|r, n=10) \, f(r) \, dr & = \int_0^1 {10 \choose 7} \, r^7 \, (1-r)^3 \, 1 \, dr \\
& = {10 \choose 7} \left / {11 \choose 3,7,1} \right . = {10 \choose 7} \, \frac{1}{1320}
\end{align}

and the posterior distribution for r is then

 f(r | n=10, m=7) = 
 \frac{{10 \choose 7} \, r^7 \, (1-r)^3 \, 1} {{10 \choose 7} \, \frac{1}{1320}} = 1320 \, r^7 \, (1-r)^3

for r between 0 and 1, inclusive.

One may be interested in the probability that more than half the voters will vote "yes". The prior probability that more than half the voters will vote "yes" is 1/2, by the symmetry of the uniform distribution. In comparison, the posterior probability that more than half the voters will vote "yes", i.e., the conditional probability given the outcome of the opinion poll – that seven of the 10 voters questioned will vote "yes" – is

1320\int_{1/2}^1 r^7(1-r)^3\,dr \approx 0.887, \!

which is about an "89% chance".
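
The recipe "multiply the prior by the likelihood, then normalize" can be approximated on a grid. The sketch below uses a simple midpoint rule with 10,000 cells; the function name, the grid size and the default uniform prior are assumptions of this illustration, not part of the original calculation, which is exact.

```python
from math import comb


def grid_posterior(m, n, prior=lambda r: 1.0, steps=10_000):
    """Midpoint-grid approximation of the posterior density of r given m of n "yes"."""
    dr = 1.0 / steps
    grid = [(i + 0.5) * dr for i in range(steps)]
    # Unnormalized posterior: binomial likelihood times prior density.
    unnorm = [comb(n, m) * r**m * (1 - r)**(n - m) * prior(r) for r in grid]
    evidence = sum(unnorm) * dr  # approximates the normalizing integral
    density = [u / evidence for u in unnorm]
    return grid, density, evidence, dr


grid, density, evidence, dr = grid_posterior(m=7, n=10)
p_majority = sum(d for r, d in zip(grid, density) if r > 0.5) * dr
print(evidence, p_majority)
```

With the uniform prior, `evidence` should be close to 120/1320 = 1/11 and `p_majority` close to the 0.887 quoted above. Passing a different `prior` function shows how the result changes when background information is available.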

Example 3: The Monty Hall problem

We are presented with three doors—red, green, and blue—from which to choose, one of which has a prize hidden behind it. Suppose we choose the red door. The presenter, who knows where the prize is (and will not choose that door to open), opens the blue door and reveals that there is no prize behind it. He then asks if we wish to change our choice from our initial selection of red. Will changing our mind at this point improve our chances of winning the prize?

You might think that, with two doors left unopened, you have a 50:50 chance with either door, and so there is no point in changing doors. However, this is not the case. Let us call the situations in which the prize is behind the red, green, and blue doors A_r, A_g, and A_b respectively.

To start with, P(A_r) = P(A_g) = P(A_b) = \tfrac 1 3, and to make things simpler we shall assume that we have already picked the red door.

Let us call B "the presenter opens the blue door". Without any prior knowledge, we would assign this a probability of 50%.

  • In the situation where the prize is behind the red door, the presenter is free to pick between the green or the blue door at random. Thus, P(B | A_r) = 1/2
  • In the situation where the prize is behind the green door, the presenter must pick the blue door. Thus, P(B | A_g) = 1
  • In the situation where the prize is behind the blue door, the presenter must pick the green door. Thus, P(B | A_b) = 0

Thus,


\begin{align}
P(A_r|B) & =  \frac{P(B | A_r) P(A_r)}{P(B)} =   \frac{\frac 1 2 \cdot \frac 1 3}{\frac 1 2} = \tfrac 1 3
\\
P(A_g|B) & =  \frac{P(B | A_g) P(A_g)}{P(B)} =   \frac{1 \cdot \frac 1 3}{\frac 1 2} = \tfrac 2 3
\\
P(A_b|B) & =  \frac{P(B | A_b) P(A_b)}{P(B)} =   \frac{0 \cdot \frac 1 3}{\frac 1 2} = 0.
\end{align}

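So, we should switch to the green door. This assumes that, when the prize is behind the red door, the presenter chooses between the green and blue doors at random with equal probability. If instead the presenter always opens the blue door whenever he is free to do so, then P(B | A_r) = 1 and P(B) = 2/3, so the red and green doors each have posterior probability 1/2 and it makes no difference what we do.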

Note how this depends on the value of P(B). Another way of looking at the apparent inconsistency is that when you chose the first door, you had a 1/3 chance of being right. When the second door was removed from the list of possibilities, the conditional probability for the chosen door to be the winning door is also 1/3 and hence this left the last door with a 2/3 chance of being right.
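
The calculation, including its dependence on the presenter's behaviour, can be reproduced exactly with rational arithmetic. This is a sketch under the assumptions of the example (we picked red, the presenter opened blue); the function and parameter names are illustrative.

```python
from fractions import Fraction


def monty_posterior(p_blue_when_free=Fraction(1, 2)):
    """Posterior over the prize door after we pick red and the blue door is opened.

    p_blue_when_free is P(B | A_r): how often the presenter opens blue when the
    prize is behind our own (red) door and he may open either remaining door.
    """
    prior = {"red": Fraction(1, 3), "green": Fraction(1, 3), "blue": Fraction(1, 3)}
    likelihood = {"red": p_blue_when_free, "green": Fraction(1), "blue": Fraction(0)}
    evidence = sum(prior[d] * likelihood[d] for d in prior)  # P(B)
    return {d: prior[d] * likelihood[d] / evidence for d in prior}


print(monty_posterior())             # presenter chooses at random when free
print(monty_posterior(Fraction(1)))  # presenter always opens blue when free
```

Under the random-choice assumption the posteriors are 1/3, 2/3 and 0 for red, green and blue; when the presenter always prefers blue, red and green are equally likely.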

Historical remarks

Bayes' theorem is named after the Reverend Thomas Bayes (1702–1761), who studied how to compute a distribution for the parameter of a binomial distribution (to use modern terminology). His friend, Richard Price, edited and presented the work in 1763, after Bayes' death, as An Essay towards solving a Problem in the Doctrine of Chances[5]. Pierre-Simon Laplace replicated and extended these results in an essay of 1774, apparently unaware of Bayes' work.

Bayes presents his work as the solution to a problem, namely:

Given the number of times in which an unknown event has happened and failed [... Find] the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named.[5]

He goes on to give an example of a man trying to guess the ratio of 'blanks' and 'prizes' at a lottery. He supposes the man has watched the lottery, so far, draw 10 'blanks' and one 'prize'. Given that data, he then gives detailed calculations as to how to compute the probability that the ratio of blanks to prizes is thus between 9:1 and 11:1. (The odds are bad...about 7.7 percent). He goes on to describe the same calculation for the case in which the man has watched the lottery draw 20 blanks and 2 prizes, 40 blanks and 4 prizes, and so on. He ends by calculating the odds for the lottery having drawn 10,000 blanks and 1,000 prizes (Good odds there, a 97 percent chance that the ratio of blanks/prizes is between 9:1 and 11:1).[5]

One of Bayes' results (Proposition 5) gives a simple description of conditional probability, and shows that it can be expressed independently of the order in which things occur:

If there be two subsequent events, the probability of the second b/N and the probability of both together P/N, and it being first discovered that the second event has also happened, from hence I guess that the first event has also happened, the probability I am right [i.e., the conditional probability of the first event being true given that the second has also happened] is P/b.

Note that the expression says nothing about the order in which the events occurred; it measures correlation, not causation. His preliminary results, in particular Propositions 3, 4, and 5, imply the result now called Bayes' Theorem (as described above), but it does not appear that Bayes himself emphasized or focused on that result.

Bayes' main result (Proposition 9 in the essay) is the following: assuming a uniform distribution for the prior distribution of the binomial parameter p, the probability that p is between two values a and b is


\frac {\int_a^b {n+m \choose m} p^m (1-p)^n\,dp}
 {\int_0^1 {n+m \choose m} p^m (1-p)^n\,dp}
\!

where m is the number of observed successes and n the number of observed failures.
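
The ratio of integrals in Proposition 9 can be evaluated exactly by expanding (1 − p)^n with the binomial theorem and integrating term by term; the common binomial coefficient cancels between numerator and denominator. The sketch below applies this to the lottery example with one prize and ten blanks. The function names are illustrative, and treating a prize as a "success" is an assumption about how the example maps onto the formula.

```python
from fractions import Fraction
from math import comb


def integral_pm_qn(m, n, a, b):
    """Exact integral of p**m * (1 - p)**n over [a, b] via binomial expansion."""
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    for k in range(n + 1):
        power = m + k + 1
        total += (-1) ** k * comb(n, k) * (b**power - a**power) / power
    return total


def proposition9(m, n, a, b):
    """P(a < p < b | m successes, n failures) under a uniform prior on p."""
    return integral_pm_qn(m, n, a, b) / integral_pm_qn(m, n, 0, 1)


# One prize (success) and ten blanks (failures). A blanks-to-prizes ratio
# between 9:1 and 11:1 corresponds to p between 1/12 and 1/10.
chance = proposition9(1, 10, Fraction(1, 12), Fraction(1, 10))
print(float(chance))
```

This gives roughly 0.077, in line with the "about 7.7 percent" mentioned for the lottery example. The alternating sum becomes slow and unwieldy for very large counts, so a numerical method would be a better fit for the 10,000-blank case.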

What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter p. So, one can compute probability for an experimental outcome, but also for the parameter which governs it, and the same algebra is used to make inferences of either kind.

Bayes states his question in a way that might make the idea of assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at random onto a billiard table, and that the probabilities p and q are the probabilities that subsequent billiard balls will fall above or below the first ball.

Stephen Fienberg [1] describes the evolution of the field from "inverse probability" at the time of Bayes and Laplace, and even of Harold Jeffreys (1939) to "Bayesian" in the 1950s. The irony is that this label was introduced by R.A. Fisher in a derogatory sense. So, historically, Bayes was not a "Bayesian". It is actually unclear whether or not he was a Bayesian in the modern sense of the term, i.e. whether or not he was interested in inference or merely in probability: the 1763 essay is more of a probability paper.

An investigation by Stigler[6] suggests that Bayes' theorem was discovered by Nicholas Saunderson some time before Bayes. However, this interpretation was later disputed by Edwards[7].

Richard Price and the Existence of a Deity

Richard Price, who, as mentioned, found the essay among Bayes' papers after Bayes' death, believed that the theorem helped prove the existence of a Deity. In his introduction to Bayes' original essay, he wrote:

The purpose I mean is, to shew what reason we have for believing that there are in the constitution of things fixt laws according to which things happen, and that, therefore, the frame of the world must be the effect of the wisdom and power of an intelligent cause; and thus to confirm the argument taken from final causes for the existence of the Deity. It will be easy to see that the converse problem solved in this essay is more directly applicable to this purpose; for it shews us, with distinctness and precision, in every case of any particular order or recurrency of events, what reason there is to think that such recurrency or order is derived from stable causes or regulations in nature, and not from any irregularities of chance. - Philosophical Transactions of the Royal Society of London, 1763.[5]

References

  1. ^ a b Howson, Colin; Peter Urbach (1993). Scientific Reasoning: The Bayesian Approach. Open Court. ISBN 9780812692341. 
  2. ^ a b c d Jaynes, Edwin T. (2003). Probability theory: the logic of science. Cambridge University Press. ISBN 9780521592710. 
  3. ^ Earman, John (1992). "Bayes' Bayesianism". Bayes Or Bust?: A Critical Examination of Bayesian Confirmation Theory. MIT Press. ISBN 9780262050463. 
  4. ^ Baron, Jonathan (1994). Thinking and Deciding (2 ed.). Oxford University Press. pp. 209-210. ISBN 0521437326. 
  5. ^ a b c d Bayes, Thomas, and Price, Richard (1763). "An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F. R. S.". Philosophical Transactions of the Royal Society of London 53: 370–418. http://www.stat.ucla.edu/history/essay.pdf. 
  6. ^ Stephen M. Stigler (1983), "Who Discovered Bayes' Theorem?" The American Statistician 37(4):290–296.
  7. ^ A. W. F. Edwards (1986), "Is the Reference in Hartley (1749) to Bayesian Inference?", The American Statistician 40(2):109–110
  8. ^ Gillies, Donald (Mar 1986). "In Defense of the Popper-Miller Argument". Philosophy of Science 53 (1): 110-113. http://www.jstor.org/stable/187924. 

Commentaries

  • G. A. Barnard (1958) "Studies in the History of Probability and Statistics: IX. Thomas Bayes' Essay Towards Solving a Problem in the Doctrine of Chances", Biometrika 45:293–295. (biographical remarks)
  • Daniel Covarrubias. "An Essay Towards Solving a Problem in the Doctrine of Chances". (an outline and exposition of Bayes' essay)
  • Stephen M. Stigler (1982). "Thomas Bayes' Bayesian Inference," Journal of the Royal Statistical Society, Series A, 145:250–258. (Stigler argues for a revised interpretation of the essay; recommended)
  • Isaac Todhunter (1865). A History of the Mathematical Theory of Probability from the time of Pascal to that of Laplace, Macmillan. Reprinted 1949, 1956 by Chelsea and 2001 by Thoemmes.
  • An Intuitive Explanation of Bayesian Reasoning (includes biography)

Copyrights:

Sci-Tech Dictionary. McGraw-Hill Dictionary of Scientific and Technical Terms. Copyright © 2003, 1994, 1989, 1984, 1978, 1976, 1974 by McGraw-Hill Companies, Inc. All rights reserved.
Encyclopedia of Public Health. Copyright © 2002 by The Gale Group, Inc. All rights reserved.
Philosophy Dictionary. The Oxford Dictionary of Philosophy. Copyright © 1994, 1996, 2005 by Oxford University Press. All rights reserved.
Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Bayes' theorem".