
Categorical distribution

 

In probability theory, a categorical distribution is a discrete probability distribution whose sample space is the set of 1-of-N encoded[nb 1][1] random vectors \mathbf{x} of dimension n having the property that \textstyle{\sum_{i=1}^n x_i = 1} with each x_i equal to 0 or 1, i.e. exactly one element is 1 and the others are 0. The probability mass function f is:


f( \mathbf{x} ; \boldsymbol{p} ) = \prod_{i=1}^n p_i^{x_i}

where p_i represents the probability of seeing element i and \textstyle{\sum_i p_i = 1}. This is the formulation adopted by Bishop[1][nb 2].

The distribution can be regarded as a special case of the multinomial distribution in which the number of trials (the parameter n of the multinomial distribution, not the number of categories) is fixed at 1.
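The probability mass function above can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation; the function names `categorical_pmf` and `sample_one_hot` and the example probabilities are assumptions introduced here for illustration.

```python
import random

def categorical_pmf(x, p):
    """Evaluate f(x; p) = prod_i p_i ** x_i for a one-hot vector x."""
    if sum(x) != 1 or any(x_i not in (0, 1) for x_i in x):
        raise ValueError("x must contain exactly one 1 and otherwise 0s")
    prob = 1.0
    for x_i, p_i in zip(x, p):
        prob *= p_i ** x_i  # factors with x_i == 0 contribute 1
    return prob

def sample_one_hot(p, rng=random):
    """Draw a single one-hot vector whose i-th entry is 1 with probability p[i]."""
    i = rng.choices(range(len(p)), weights=p, k=1)[0]
    return [1 if j == i else 0 for j in range(len(p))]

p = [0.2, 0.5, 0.3]  # hypothetical category probabilities
print(categorical_pmf([0, 1, 0], p))  # the product collapses to p[1]
print(sample_one_hot(p))
```

Because exactly one exponent is 1 and the rest are 0, the product always reduces to the probability of the selected category.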

Properties

The possible probabilities for the categorical distribution with n = 3 are the 2-simplex p_1 + p_2 + p_3 = 1, embedded in 3-space.
  • The distribution is completely specified by the probabilities associated with each category k: p_k = P(X = x_k), k = 1,...,n, where \textstyle{\sum_k p_k = 1}. The possible probabilities are exactly the standard (n − 1)-dimensional simplex; for n = 2 this reduces to the possible probabilities of the Bernoulli distribution being the 1-simplex, p + q = 1, 0 \leq p \leq 1.
  • The distribution is a special case of a "multivariate Bernoulli distribution"[2] in which exactly one of the n 0-1 variables takes the value one.
  • \mathbb{E} \left[ \mathbf{x} \right] = \boldsymbol{p}
  • Let \boldsymbol{X} be the realisation from a categorical distribution. Define the random vector Y as composed of the elements:
Y_j=I(\boldsymbol{X}=x_j),
where I is the indicator function. Then Y has a distribution which is a special case of the multinomial distribution with the number of trials equal to 1. The sum of m independent and identically distributed such random vectors Y constructed from a categorical distribution with parameter \boldsymbol{p} is multinomially distributed with parameters m and \boldsymbol{p}.
  • The sufficient statistic from n independent observations is the set of counts (or, equivalently, proportion) of observations in each category, where the total number of trials (=n) is fixed.
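The properties above (expected value equal to \boldsymbol{p}, and per-category counts as the summary of repeated observations) can be illustrated by simulation. This is a sketch under stated assumptions: the helper `one_hot_samples`, the seed, the sample size and the probabilities are all hypothetical choices made for the example.

```python
import random

def one_hot_samples(p, size, seed=0):
    """Draw `size` independent one-hot vectors with category probabilities p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(p)
    draws = rng.choices(range(n), weights=p, k=size)
    return [[1 if j == i else 0 for j in range(n)] for i in draws]

p = [0.2, 0.5, 0.3]  # hypothetical category probabilities
samples = one_hot_samples(p, size=10_000)

# Summing the one-hot vectors component-wise gives the per-category counts,
# which follow a multinomial distribution with 10,000 trials.
counts = [sum(column) for column in zip(*samples)]

# Dividing by the number of observations gives the sample mean vector,
# which should be close to p because E[x] = p.
proportions = [c / len(samples) for c in counts]

print(counts)
print(proportions)
```

The counts (or, equivalently, the proportions) are all that is retained from the individual draws, which is the sense in which they act as the sufficient statistic.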

Chi-square statistic

Definition

The goodness of fit of a categorical data set of size k to a theoretical model categorical distribution with n categories and probabilities p_i can be computed by the following chi-square statistic:

 X^2 = \sum_{i=1}^{n} {(O_i - k\cdot p_i)^2 \over k \cdot p_i} ,

where

X^2 = the test statistic; this asymptotically approaches a \chi^2 distribution, hence is denoted by X, not \chi.
O_i = the observed frequency of category i;
k \cdot p_i = the model frequency of category i under the theoretical distribution.

The sampling distribution of this statistic is asymptotically a chi-square distribution, hence it is called a "chi-square test".
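The definition translates directly into code. The sketch below assumes the observed frequencies are given as a list of counts; the function name `chi_square_statistic` and the example data are hypothetical.

```python
def chi_square_statistic(observed, p):
    """Compute X^2 = sum_i (O_i - k*p_i)^2 / (k*p_i), where k = sum_i O_i."""
    k = sum(observed)  # size of the data set
    return sum((o_i - k * p_i) ** 2 / (k * p_i) for o_i, p_i in zip(observed, p))

observed = [18, 55, 27]   # hypothetical counts, k = 100
p = [0.2, 0.5, 0.3]       # model probabilities, so model frequencies are 20, 50, 30
print(chi_square_statistic(observed, p))  # about 1.0, i.e. 4/20 + 25/50 + 9/30
```

Note that every model probability must be strictly positive, since k \cdot p_i appears in the denominator.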

Descriptive statistics

By comparing the value of the statistic with the chi-square distribution, one may, in descriptive statistics terms, describe how well the model fits the data:

  • if X^2 is small (one would rarely get such a small value from a \chi^2 distribution), then the frequencies are closer to the theoretical values than would be expected by random trials;
  • if X^2 is large (one would rarely get such a large value from a \chi^2 distribution), then the frequencies are further from the theoretical values than would be expected by random trials.

For example, if one were modeling a coin flip by a fair coin, and the coin alternated head/tail/head/tail, then X^2 would be zero (after an even number of trials), which is unusually low – the data are more uniform than one would predict.

Conversely, if the coin always came up heads, then X^2 would be unusually large – the data are more concentrated than the model would predict.
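As a worked check of the two coin cases, assume k flips (k even) and the fair-coin model p_1 = p_2 = 1/2, so that each model frequency is k/2. The subscripts "alt" and "heads" are labels introduced here for illustration only.

```latex
X^2_{\text{alt}} = \frac{(k/2 - k/2)^2}{k/2} + \frac{(k/2 - k/2)^2}{k/2} = 0,
\qquad
X^2_{\text{heads}} = \frac{(k - k/2)^2}{k/2} + \frac{(0 - k/2)^2}{k/2} = \frac{k}{2} + \frac{k}{2} = k .
```

Under this model the all-heads statistic equals the number of flips, so it grows without bound as more flips are observed.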

Frequentist statistics

In frequentist statistics terms, this statistic is widely used for statistical hypothesis testing in Pearson's chi-square test: one tests the null hypothesis that the data are samples from a given categorical distribution, often a uniform distribution, by computing the p-value of the observed value of X^2 and then rejecting or not rejecting the null hypothesis at a given level of statistical significance.
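The test can be sketched without any statistics library by estimating the p-value through simulation under the null hypothesis. This is a swapped-in technique, a Monte Carlo estimate rather than the asymptotic \chi^2 approximation that Pearson's test normally uses; the function names, seed, simulation count and die data below are assumptions made for the example.

```python
import random

def chi_square_statistic(observed, p):
    """Compute X^2 = sum_i (O_i - k*p_i)^2 / (k*p_i), where k = sum_i O_i."""
    k = sum(observed)
    return sum((o_i - k * p_i) ** 2 / (k * p_i) for o_i, p_i in zip(observed, p))

def monte_carlo_p_value(observed, p, n_sim=2000, seed=0):
    """Estimate P(X^2 >= observed X^2) when the data really come from p."""
    rng = random.Random(seed)
    k, n = sum(observed), len(p)
    x2_observed = chi_square_statistic(observed, p)
    exceed = 0
    for _ in range(n_sim):
        # Simulate one data set of size k from the null model and count categories.
        counts = [0] * n
        for i in rng.choices(range(n), weights=p, k=k):
            counts[i] += 1
        if chi_square_statistic(counts, p) >= x2_observed:
            exceed += 1
    # The +1 terms count the observed data set itself, so the estimate is never 0.
    return (exceed + 1) / (n_sim + 1)

uniform_die = [1 / 6] * 6
p_balanced = monte_carlo_p_value([10, 10, 10, 10, 10, 10], uniform_die)
p_skewed = monte_carlo_p_value([30, 6, 6, 6, 6, 6], uniform_die)
print(p_balanced, p_skewed)
```

At a significance level of, say, 0.05, the skewed counts would lead to rejecting the uniform null hypothesis, while the perfectly balanced counts would not.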

Bayesian statistics

In Bayesian statistics, one may interpret the value of the chi-square statistic, given different models, as measuring the likelihood of these models being correct. However, in practice one instead uses the Dirichlet distribution as a conjugate prior: one begins with a given distribution on the space of possible p_i (often uniform), and then updates it based on observations. That is, if the observed frequency of each outcome is E_i and one begins with a uniform prior Dir(1,...,1), then the posterior distribution is Dir(E_1 + 1,...,E_n + 1). The mode of this posterior coincides with the maximum likelihood estimate, namely the observed relative frequencies, and by integrating the posterior density one may give credible regions for the probabilities.
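The conjugate update amounts to adding the observed counts to the prior's parameters. The sketch below assumes the uniform prior is written as Dir(1,...,1); the function names and the example counts are hypothetical.

```python
def dirichlet_posterior(counts, prior=None):
    """Return posterior Dirichlet parameters: prior parameters plus observed counts."""
    if prior is None:
        prior = [1.0] * len(counts)  # assumption: uniform prior Dir(1, ..., 1)
    return [a + c for a, c in zip(prior, counts)]

def dirichlet_mean(alpha):
    """Mean of Dir(alpha): alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]

def dirichlet_mode(alpha):
    """Mode of Dir(alpha); this formula is valid only when every alpha_i > 1."""
    n, total = len(alpha), sum(alpha)
    return [(a - 1) / (total - n) for a in alpha]

counts = [18, 55, 27]               # hypothetical observed frequencies
alpha = dirichlet_posterior(counts)
print(alpha)                        # [19.0, 56.0, 28.0]
print(dirichlet_mode(alpha))        # [0.18, 0.55, 0.27], the observed proportions
print(dirichlet_mean(alpha))
```

With the uniform prior, the posterior mode reproduces the observed proportions, while the posterior mean is pulled slightly toward equal probabilities.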

Notes

  1. ^ aka 1-of-K encoded
  2. ^ However, Bishop does not explicitly use the term categorical distribution

References

  1. ^ a b Bishop, C. 2006. Pattern Recognition and Machine Learning. Springer.
  2. ^ Johnson, N.L., Kotz, S., Balakrishnan, N. (1997) Discrete Multivariate Distributions, Wiley. ISBN 0-471-12844-9 (p.105)

Copyrights:

Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Categorical distribution".