Answers.com

Chi-square distribution

 
Sci-Tech Dictionary: chi-square distribution
(′kī ¦skwer dis·trə′byü·shən)

(statistics) The distribution of the sum of the squares of a set of variables, each of which has a normal distribution and is expressed in standardized units.


Veterinary Dictionary: chi-square distribution

In statistical terms, a variable with K degrees of freedom is said to have this distribution if it is distributed like the sum of the squares of K independent random variables, each of which has a normal distribution with mean zero and variance 1.

Wikipedia: Chi-square distribution
chi-square
[Figures: probability density function; cumulative distribution function]
parameters: k > 0 (degrees of freedom)
support: x \in [0, +\infty)
pdf: \frac{x^{k/2 - 1}}{2^{k/2} \Gamma(k/2)} e^{-x/2}
cdf: \frac{\gamma(k/2, x/2)}{\Gamma(k/2)}
mean: k
median: approximately k - \frac{2}{3} + \frac{4}{27k} - \frac{8}{729k^2}
mode: k - 2 if k \geq 2
variance: 2k
skewness: \sqrt{8/k}
excess kurtosis: 12/k
entropy: \frac{k}{2} + \ln(2\Gamma(k/2)) + (1 - k/2)\psi(k/2)
mgf: (1 - 2t)^{-k/2} for 2t < 1
cf: (1 - 2it)^{-k/2}       [1]


In probability theory and statistics, the chi-square distribution (also chi-squared or χ²-distribution) is one of the most widely used theoretical probability distributions in inferential statistics, e.g., in statistical significance tests.[2][3][4][5] A random variable is said to have a chi-square distribution if it is distributed as the sum of the squares of a set of statistically independent standard Gaussian random variables.

The best-known situations in which the chi-square distribution is used are the common chi-square tests for goodness of fit of an observed distribution to a theoretical one, and for the independence of two criteria of classification of qualitative data. Many other statistical tests also use this distribution, such as Friedman's analysis of variance by ranks.
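As a hedged illustration of the goodness-of-fit use, the Python sketch below computes Pearson's statistic, the sum of (O_i - E_i)^2 / E_i over categories, for hypothetical counts from five categories assumed equally likely, and turns it into a p-value with the chi-square survival function. It assumes the expected counts are fully specified, so the test has (number of categories - 1) degrees of freedom, and it uses the elementary closed form of the survival function that holds only for an even number of degrees of freedom. The data and function names are made up for the example.

```python
import math

def pearson_statistic(observed, expected):
    """Pearson's goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_survival_even(x, k):
    """P(X > x) for X ~ chi-square(k); this closed form is valid only for even k:
    exp(-x/2) * sum_{j=0}^{k/2 - 1} (x/2)^j / j!."""
    if k <= 0 or k % 2:
        raise ValueError("this closed form needs a positive even k")
    half_x = x / 2.0
    term, total = 1.0, 0.0
    for j in range(k // 2):
        total += term
        term *= half_x / (j + 1)      # next term (x/2)^(j+1) / (j+1)!
    return math.exp(-half_x) * total

# Hypothetical data: 100 observations over 5 categories assumed to be equally likely.
observed = [18, 22, 25, 15, 20]
expected = [20.0] * 5
stat = pearson_statistic(observed, expected)
p_value = chi2_survival_even(stat, k=len(observed) - 1)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

A large statistic (small p-value) indicates that the observed counts would be unlikely under the assumed category probabilities.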


Definition

If X_1, ..., X_k are k independent, normally distributed random variables with mean 0 and variance 1, then the random variable

Q = \sum_{i=1}^k X_i^2

is distributed according to the chi-square distribution with k degrees of freedom. This is usually written

Q\sim\chi^2_k.\,
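The definition can be checked by simulation. The Python sketch below (standard library only) builds Q directly as the sum of k squared draws from `random.gauss(0, 1)` and compares the sample mean and variance with k and 2k, the values listed in the summary box. The helper names, seed, and sample size are illustrative assumptions.

```python
import random
import statistics

def sample_chi_square(k, rng):
    """Draw one value of Q = X_1^2 + ... + X_k^2 with X_i ~ N(0, 1) independent."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def simulate(k, n=20000, seed=1):
    """Return the sample mean and variance of n simulated chi-square(k) draws."""
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    draws = [sample_chi_square(k, rng) for _ in range(n)]
    return statistics.fmean(draws), statistics.variance(draws)

mean, var = simulate(k=4)
print(f"sample mean {mean:.3f} (theory 4), sample variance {var:.3f} (theory 8)")
```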

The chi-square distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (i.e., the number of X_i).

The chi-square distribution is a special case of the gamma distribution.

Characteristics

Further properties of the chi-square distribution can be found in the summary box above.

Probability density function

A probability density function of the chi-square distribution is


f(x;k)=
\begin{cases}\displaystyle
\frac{1}{2^{k/2}\Gamma(k/2)}\,x^{(k/2) - 1} e^{-x/2}&\text{for }x>0,\\
0&\text{for }x\le0,
\end{cases}

where Γ denotes the Gamma function, which has closed-form values at the half-integers.
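A direct transcription of the density is sketched below in Python. As an implementation choice (not something the formula requires), it evaluates the density on the log scale with `math.lgamma`, which handles both integer and half-integer arguments and avoids overflow of 2^{k/2} and Γ(k/2) for large k. The `integrate_midpoint` helper is a hypothetical addition used only to check that the density integrates to 1.

```python
import math

def chi2_pdf(x, k):
    """Chi-square density f(x; k); returns 0 for x <= 0 as in the piecewise formula."""
    if x <= 0:
        return 0.0
    half_k = k / 2.0
    # Log scale: (k/2 - 1) ln x - x/2 - (k/2) ln 2 - ln Gamma(k/2).
    log_f = ((half_k - 1.0) * math.log(x) - x / 2.0
             - half_k * math.log(2.0) - math.lgamma(half_k))
    return math.exp(log_f)

def integrate_midpoint(f, a, b, n=100000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# For k = 4 the mass beyond x = 100 is negligible, so this should be very close to 1.
print(integrate_midpoint(lambda x: chi2_pdf(x, 4), 0.0, 100.0))
```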

For derivations of the pdf in the cases of one and two degrees of freedom, see Proofs related to chi-square distribution.

Cumulative distribution function

Its cumulative distribution function is:

F(x;k)=\frac{\gamma(k/2,x/2)}{\Gamma(k/2)} = P(k/2, x/2)

where γ(s,z) is the lower incomplete Gamma function and P(s,z) = γ(s,z)/Γ(s) is the regularized (lower incomplete) Gamma function.
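One way to evaluate F(x;k) = P(k/2, x/2) is sketched below. It assumes the standard power series γ(s,z) = z^s e^{-z} \sum_{n\ge0} z^n / (s(s+1)\cdots(s+n)), which is not stated in this article. The series converges for every z > 0 but needs more terms as z grows and can overflow for very large z, where a production routine would usually switch to a different expansion. The function names are hypothetical.

```python
import math

def regularized_lower_gamma(s, z, rel_tol=1e-15, max_terms=10000):
    """P(s, z) = gamma(s, z) / Gamma(s) from the power series
    gamma(s, z) = z**s * exp(-z) * sum_{n>=0} z**n / (s (s+1) ... (s+n))."""
    if z <= 0:
        return 0.0
    term = 1.0 / s                  # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= z / (s + n)         # ratio between consecutive terms
        total += term
        if term < rel_tol * total:
            break
    # Combine z**s * exp(-z) / Gamma(s) on the log scale to avoid overflow.
    log_prefactor = s * math.log(z) - z - math.lgamma(s)
    return min(1.0, math.exp(log_prefactor) * total)

def chi2_cdf(x, k):
    """F(x; k) = P(k/2, x/2)."""
    return regularized_lower_gamma(k / 2.0, x / 2.0)

print(chi2_cdf(3.841459, 1))   # 3.841459 is the tabulated 95th percentile of chi-square(1)
```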

Tables of this distribution, usually in its cumulative form, are widely available, and the function is included in many spreadsheets and most statistical packages.

Additivity

It follows from the definition of the chi-square distribution that the sum of independent chi-square variables is also chi-square distributed. Specifically, if \{X_i\}_{i=1}^n are independent chi-square variables with \{k_i\}_{i=1}^n degrees of freedom, respectively, then Y = X_1 + \cdots + X_n is chi-square distributed with k_1 + \cdots + k_n degrees of freedom.
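The additivity property can be illustrated by simulation. The sketch below adds an independent χ²_3 and χ²_5 draw, each built from the definition, and compares the result with χ²_8: the sample mean against 8, and the empirical probability P(Y ≤ 8) against the exact CDF. Because 3 + 5 = 8 is even, it assumes the elementary closed form F(x;k) = 1 - e^{-x/2} \sum_{j=0}^{k/2-1} (x/2)^j / j!, a standard identity for even k that this section does not derive. Seed, sample size, and names are illustrative.

```python
import math
import random
import statistics

def chi_square_draw(k, rng):
    """One chi-square(k) draw built from the definition: k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def chi2_cdf_even(x, k):
    """Exact F(x; k) for even k: 1 - exp(-x/2) * sum_{j=0}^{k/2-1} (x/2)^j / j!."""
    half_x = x / 2.0
    term, total = 1.0, 0.0
    for j in range(k // 2):
        total += term
        term *= half_x / (j + 1)
    return 1.0 - math.exp(-half_x) * total

def simulate_sum(dofs, n=20000, seed=7):
    """Samples of Y = X_1 + ... + X_m with independent X_i ~ chi-square(dofs[i])."""
    rng = random.Random(seed)
    return [sum(chi_square_draw(k, rng) for k in dofs) for _ in range(n)]

ys = simulate_sum([3, 5])                   # should behave like chi-square(8)
empirical = sum(y <= 8.0 for y in ys) / len(ys)
print(statistics.fmean(ys), empirical, chi2_cdf_even(8.0, 8))
```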

Information entropy

The information entropy is given by


H = -\int_{0}^\infty f(x;k)\ln(f(x;k))\, dx = \frac{k}{2} + \ln\left(2\Gamma\left(\frac{k}{2}\right)\right) + \left(1 - \frac{k}{2}\right)\psi\left(\frac{k}{2}\right),

where ψ(x) is the Digamma function.
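The closed form can be checked against a direct numerical evaluation of H = -\int f \ln f. Python's `math` module has no digamma function, so the sketch below includes a hypothetical `digamma` helper that uses the recurrence ψ(x) = ψ(x+1) - 1/x and then the standard asymptotic series; its accuracy of roughly 1e-8 is assumed to be adequate here. The numerical side uses a midpoint rule on the log-density, with an upper cutoff of 200 chosen for small k.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0 (hypothetical helper; math has no digamma)."""
    result = 0.0
    while x < 6.0:                 # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def entropy_closed_form(k):
    """H = k/2 + ln(2 Gamma(k/2)) + (1 - k/2) psi(k/2)."""
    h = k / 2.0
    return h + math.log(2.0) + math.lgamma(h) + (1.0 - h) * digamma(h)

def entropy_numeric(k, upper=200.0, n=200000):
    """H = -integral of f ln f over (0, upper), by the midpoint rule on the log-density."""
    half_k = k / 2.0
    log_norm = half_k * math.log(2.0) + math.lgamma(half_k)
    step = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        log_f = (half_k - 1.0) * math.log(x) - x / 2.0 - log_norm
        total -= math.exp(log_f) * log_f
    return total * step

print(entropy_closed_form(4), entropy_numeric(4))
```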

Noncentral moments

The moments about zero of a chi-square distribution with k degrees of freedom are given by[6][7]

\begin{align}
E(X^m) &= k (k+2) (k+4) \cdots (k+2m-2) \\
       &= 2^m \frac{\Gamma(m+k/2)}{\Gamma(k/2)}.
\end{align}
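The two expressions for E(X^m) can be compared numerically. The sketch below evaluates the product form and the Gamma-function form (the latter via `math.lgamma`), and the accompanying check recovers the variance 2k from the first two moments. Function names are illustrative.

```python
import math

def moment_product(k, m):
    """E(X^m) = k (k+2) (k+4) ... (k+2m-2); the empty product (m = 0) is 1."""
    result = 1.0
    for j in range(m):
        result *= k + 2 * j
    return result

def moment_gamma(k, m):
    """E(X^m) = 2^m Gamma(m + k/2) / Gamma(k/2), evaluated on the log scale."""
    return math.exp(m * math.log(2.0) + math.lgamma(m + k / 2.0) - math.lgamma(k / 2.0))

k = 4
print(moment_product(k, 2) - moment_product(k, 1) ** 2)   # variance, compare with 2k
```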

Cumulants

The cumulants are readily obtained by a (formal) power series expansion of the logarithm of the characteristic function:


\kappa_n = \frac{2^n \, n!\, k}{2n}
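The cumulant formula simplifies to κ_n = 2^{n-1}(n-1)!\,k, and the standardized cumulants reproduce the skewness and excess kurtosis listed in the summary box: κ_3/κ_2^{3/2} = \sqrt{8/k} and κ_4/κ_2^2 = 12/k. A small Python sketch of that check (function names are illustrative):

```python
import math

def cumulant(n, k):
    """kappa_n = 2^n n! k / (2n), equivalently 2^(n-1) (n-1)! k."""
    return 2 ** n * math.factorial(n) * k / (2 * n)

def skewness(k):
    """Third standardized cumulant kappa_3 / kappa_2^(3/2)."""
    return cumulant(3, k) / cumulant(2, k) ** 1.5

def excess_kurtosis(k):
    """Fourth standardized cumulant kappa_4 / kappa_2^2."""
    return cumulant(4, k) / cumulant(2, k) ** 2

k = 5
print(cumulant(1, k), cumulant(2, k), skewness(k), excess_kurtosis(k))
```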

Asymptotic properties

By the central limit theorem, because a chi-square variable is the sum of k independent, identically distributed random variables with finite mean and variance, its distribution converges to a normal distribution for large k (k > 50 is "approximately normal" according to [8]). Specifically, if X\sim\chi^2_k, then as k tends to infinity, the distribution of (X-k)/\sqrt{2k} tends to a standard normal distribution. However, convergence is slow, since the skewness is \sqrt{8/k} and the excess kurtosis is 12/k.

Other functions of the chi-square distribution converge more rapidly to a normal distribution. Some examples are:

  • If X\sim\chi^2_k then \sqrt{2X} is approximately normally distributed with mean \sqrt{2k-1} and unit variance (result credited to R. A. Fisher).
  • If X\sim\chi^2_k then \sqrt[3]{X/k} is approximately normally distributed with mean 1 − 2 / (9k) and variance 2 / (9k) (Wilson and Hilferty, 1931).
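To see how the three normal approximations compare at small k, the sketch below evaluates each one at a single point and measures the difference from the exact CDF. It assumes an even number of degrees of freedom so the exact CDF has the elementary closed form 1 - e^{-x/2} \sum_{j=0}^{k/2-1} (x/2)^j / j! (a standard identity not derived in this article), and it uses 18.307, the tabulated 95th percentile of χ²_10, as the test point. Function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi2_cdf_even(x, k):
    """Exact F(x; k) for even k: 1 - exp(-x/2) * sum_{j=0}^{k/2-1} (x/2)^j / j!."""
    half_x = x / 2.0
    term, total = 1.0, 0.0
    for j in range(k // 2):
        total += term
        term *= half_x / (j + 1)
    return 1.0 - math.exp(-half_x) * total

def clt_approx(x, k):
    """(X - k) / sqrt(2k) treated as standard normal."""
    return normal_cdf((x - k) / math.sqrt(2.0 * k))

def fisher_approx(x, k):
    """sqrt(2X) treated as normal with mean sqrt(2k - 1) and unit variance."""
    return normal_cdf(math.sqrt(2.0 * x) - math.sqrt(2.0 * k - 1.0))

def wilson_hilferty_approx(x, k):
    """(X/k)^(1/3) treated as normal with mean 1 - 2/(9k) and variance 2/(9k)."""
    mean = 1.0 - 2.0 / (9.0 * k)
    sd = math.sqrt(2.0 / (9.0 * k))
    return normal_cdf(((x / k) ** (1.0 / 3.0) - mean) / sd)

x, k = 18.307, 10
exact = chi2_cdf_even(x, k)
for approx in (clt_approx, fisher_approx, wilson_hilferty_approx):
    print(approx.__name__, round(approx(x, k) - exact, 4))
```

At this point the plain central-limit approximation is the least accurate of the three and the Wilson–Hilferty cube-root transform the most accurate, consistent with the remark that other functions of X converge faster.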

Related distributions

Normal distribution

A chi-square variable with k degrees of freedom is defined as the sum of the squares of k independent standard normal random variables.

More generally, the chi-square distribution is related to any Gaussian random vector of length k as follows. If Y is a Gaussian random vector having mean vector μ and nonsingular covariance matrix C, then X = (Y-\mu)^T C^{-1} (Y-\mu) is chi-square distributed with k degrees of freedom. This is because subtracting μ and multiplying by C^{-1/2} transforms Y into a vector of k i.i.d. standard normal variables, and X is the sum of their squares.
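A two-dimensional Monte Carlo check of this quadratic-form result is sketched below, with a made-up mean vector and covariance matrix. It generates Y = μ + Lz using a hand-computed Cholesky factor L of C, evaluates (Y-μ)^T C^{-1} (Y-μ) with the explicit inverse of a 2×2 matrix, and compares the results with χ²_2, which has mean 2 and (being exponential with mean 2) satisfies P(X > 2) = e^{-1}. All names and numbers are illustrative assumptions.

```python
import math
import random
import statistics

def simulate_quadratic_form(mu, cov, n=20000, seed=3):
    """Samples of (Y - mu)^T cov^{-1} (Y - mu) for Y ~ N(mu, cov) in two dimensions."""
    (a, b), (_, d) = cov            # symmetric 2x2 covariance [[a, b], [b, d]]
    l11 = math.sqrt(a)              # Cholesky factor L with L L^T = cov
    l21 = b / l11
    l22 = math.sqrt(d - l21 ** 2)
    det = a * d - b * b
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1 = mu[0] + l11 * z1       # Y = mu + L z
        y2 = mu[1] + l21 * z1 + l22 * z2
        e1, e2 = y1 - mu[0], y2 - mu[1]
        # Inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / det.
        out.append((d * e1 * e1 - 2.0 * b * e1 * e2 + a * e2 * e2) / det)
    return out

samples = simulate_quadratic_form((1.0, -2.0), ((4.0, 1.2), (1.2, 1.0)))
print(statistics.fmean(samples), sum(s > 2.0 for s in samples) / len(samples), math.exp(-1.0))
```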

The sum of squares of statistically independent unit-variance Gaussian variables which do not have mean zero yields a generalization of the chi-square distribution called the noncentral chi-square distribution.

If Y is a vector of k i.i.d. standard normal random variables and A is a k \times k symmetric idempotent matrix with rank k-n, then the quadratic form Y^T A Y is chi-square distributed with k-n degrees of freedom.
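A concrete instance, chosen for this example rather than taken from the text, is the centering matrix A = I - \frac{1}{k}\mathbf{1}\mathbf{1}^T. It is symmetric and idempotent with rank k - 1, and Y^T A Y = \sum_i (Y_i - \bar{Y})^2, so the quadratic form should follow χ²_{k-1}, with mean k - 1 and variance 2(k - 1). The sketch below builds A explicitly and checks these properties by simulation; names, seed, and sample size are illustrative.

```python
import random
import statistics

def centering_matrix(k):
    """A = I - (1/k) * (all-ones matrix): symmetric, idempotent, rank k - 1."""
    return [[(1.0 if i == j else 0.0) - 1.0 / k for j in range(k)] for i in range(k)]

def quadratic_form(y, a):
    """Compute y^T A y with plain loops."""
    n = len(y)
    return sum(y[i] * a[i][j] * y[j] for i in range(n) for j in range(n))

def simulate(k=5, n=20000, seed=11):
    """Samples of Y^T A Y for Y made of k i.i.d. standard normal entries."""
    rng = random.Random(seed)
    a = centering_matrix(k)
    return [quadratic_form([rng.gauss(0.0, 1.0) for _ in range(k)], a) for _ in range(n)]

samples = simulate()
print(statistics.fmean(samples), statistics.variance(samples))  # compare with k - 1 = 4 and 2(k - 1) = 8
```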

The chi-square distribution is also naturally related to other distributions arising from the Gaussian.

Generalizations

The chi-square distribution is obtained from the sum of the squares of k independent, zero-mean, unit-variance Gaussian random variables. Generalizations of this distribution can be obtained by summing the squares of other types of Gaussian random variables. Several such distributions are described below.

Noncentral chi-square distribution

The noncentral chi-square distribution is obtained from the sum of the squares of independent Gaussian random variables having unit variance and nonzero means.

Generalized chi-square distribution

The generalized chi-square distribution is obtained from the quadratic form z^T A z, where z is a zero-mean Gaussian vector having an arbitrary covariance matrix, and A is an arbitrary matrix.

Gamma, exponential, and related distributions

The chi-square distribution X\sim\chi^2_k is a special case of the gamma distribution, in that X \sim \Gamma(\frac{k}{2}, \theta=2).
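The gamma relationship can be checked in two ways, as sketched below. First, a gamma density with shape k/2 and scale θ = 2 matches the chi-square pdf pointwise. Second, chi-square draws can be produced with Python's `random.gammavariate(alpha, beta)`, whose `beta` argument is a scale parameter, by passing `alpha = k/2` and `beta = 2`. The helper names are illustrative.

```python
import math
import random
import statistics

def gamma_pdf(x, shape, scale):
    """Gamma density with shape alpha and scale theta (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return math.exp((shape - 1.0) * math.log(x) - x / scale
                    - shape * math.log(scale) - math.lgamma(shape))

def chi2_pdf(x, k):
    """Chi-square density written exactly as in the pdf formula."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def sample_via_gamma(k, n=20000, seed=5):
    """Chi-square(k) draws as Gamma(shape=k/2, scale=2) variates."""
    rng = random.Random(seed)
    return [rng.gammavariate(k / 2.0, 2.0) for _ in range(n)]

print(statistics.fmean(sample_via_gamma(6)))   # should be close to k = 6
```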

Because the exponential distribution is also a special case of the Gamma distribution, if X \sim \chi^2_2 (2 degrees of freedom), then X is exponentially distributed: X \sim \mathrm{Exponential}(\lambda = \tfrac{1}{2}), with rate 1/2 and mean 2.

The Erlang distribution is also a special case of the Gamma distribution, so if X\sim\chi^2_k with even k, then X is Erlang distributed with shape parameter k / 2 and rate parameter 1/2 (equivalently, scale parameter 2).
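Both special cases can be checked through the CDF. The sketch below assumes the standard Erlang CDF, 1 - \sum_{j=0}^{n-1} e^{-\lambda x} (\lambda x)^j / j! for shape n and rate λ (not given in this article). With n = 1 and λ = 1/2 it reduces to the exponential CDF 1 - e^{-x/2}. With n = k/2 it matches a numerical integral of the χ²_k density, shown here for k = 6. The function names are illustrative.

```python
import math

def erlang_cdf(x, shape, rate):
    """Erlang CDF: 1 - sum_{j=0}^{shape-1} exp(-rate*x) (rate*x)^j / j!."""
    if x <= 0:
        return 0.0
    lx = rate * x
    term, total = 1.0, 0.0
    for j in range(shape):
        total += term
        term *= lx / (j + 1)
    return 1.0 - math.exp(-lx) * total

def exponential_cdf(x, rate):
    """Exponential CDF 1 - exp(-rate*x)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def chi2_cdf_numeric(x, k, n=100000):
    """Midpoint-rule integral of the chi-square pdf over (0, x), for comparison."""
    h = x / n
    norm = 2 ** (k / 2) * math.gamma(k / 2)
    return h * sum(((i + 0.5) * h) ** (k / 2 - 1) * math.exp(-(i + 0.5) * h / 2)
                   for i in range(n)) / norm

print(erlang_cdf(7.0, 3, 0.5), chi2_cdf_numeric(7.0, 6))
```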

Other generalizations

If Z_i are k independent, complex Gaussian random variables with mean 0 and variance \sigma_i^2, then the random variable

\tilde{Q} = \sum_{i=1}^k |Z_i|^2

is a type of generalized chi-square distribution. It differs from the standard chi-square distribution in that the Z_i are complex and can have different variances. If \sigma_i^2 = \mu for all i, then \tilde{Q} is a \chi^2_{2k} variable scaled by μ/2, also known as the Erlang distribution. If the \sigma_i^2 have distinct values for all i, then \tilde{Q} has the pdf[9]


f(x; k,\sigma_1^2,\ldots,\sigma_k^2) = \sum_{i=1}^{k} \frac{e^{-x/\sigma_i^2}}{\sigma_i^2 \prod_{j=1, j\neq i}^{k} \left(1- \frac{\sigma_j^2}{\sigma_i^2}\right)} \quad\mbox{for }x\geq0.
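The distinct-variance pdf can be sanity-checked numerically. The sketch below evaluates it for a made-up set of variances and confirms three properties: it integrates to 1, its mean equals \sum_i \sigma_i^2 (each |Z_i|^2 has mean \sigma_i^2), and it vanishes at x = 0 when k ≥ 2. The variances, integration range, and helper names are illustrative assumptions.

```python
import math

def pdf_distinct(x, variances):
    """pdf of sum |Z_i|^2 for independent zero-mean complex Gaussians with distinct variances."""
    if x < 0:
        return 0.0
    total = 0.0
    for i, vi in enumerate(variances):
        denom = vi
        for j, vj in enumerate(variances):
            if j != i:
                denom *= 1.0 - vj / vi
        total += math.exp(-x / vi) / denom
    return total

def midpoint(f, a, b, n=150000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

v = (0.5, 1.0, 2.5)
f = lambda x: pdf_distinct(x, v)
print(midpoint(f, 0.0, 150.0), midpoint(lambda x: x * f(x), 0.0, 150.0))  # compare with 1 and sum(v) = 4
```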

If there are sets of repeated variances among the \sigma_i^2, assume that they are divided into M sets, each representing a distinct variance value. Let \mathbf{r}=(r_1, r_2, \dots, r_M) denote the number of repetitions in each group, i.e., the m-th set contains r_m variables that have variance \sigma^2_m. Then \tilde{Q} is a linear combination of χ² random variables with different degrees of freedom:

\tilde{Q} = \sum_{m=1}^M \frac{\sigma^2_m}{2} Q_m, \quad Q_m \sim \chi^2_{2r_m} \,

The pdf of \tilde{Q} becomes[10]


f(x; \mathbf{r}, \sigma^2_1, \dots \sigma^2_M) = \prod_{m=1}^M \frac{1}{\sigma^{2r_m}_m} \sum_{k=1}^M \sum_{l=1}^{r_k} \frac{\Psi_{k,l,\mathbf{r}}}{(r_k-l)!} (-x)^{r_k-l} e^{-\frac{x}{\sigma^2_k}} \quad\mbox{for }x\geq0.

where

\Psi_{k,l,\mathbf{r}} = (-1)^{r_k-1} \sum_{\mathbf{i} \in \Omega_{k,l}} \prod_{j \neq k} \binom{i_j + r_j - 1}{i_j} \Big(\frac{1}{\sigma^2_j} - \frac{1}{\sigma^2_k} \Big)^{-(r_j + i_j)},

with \mathbf{i}=[i_1,\ldots,i_M]^T from the set Ω_{k,l} of all partitions of l − 1 (with i_k = 0), defined as


\Omega_{k,l} = \Big\{ [i_1,\ldots,i_M]\in \mathbb{Z}^M : \sum_{j=1}^M i_j = l-1,\ i_k=0,\ i_j\geq 0 \ \forall j \Big\}.
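The repeated-variance pdf is easy to mistype, so a direct transcription with numerical checks can help. The sketch below enumerates Ω_{k,l} by brute force, computes Ψ_{k,l,\mathbf{r}} once per (k, l) pair, and returns the pdf as a closure. Two checks follow. With a single group (M = 1) the formula should reduce to a gamma density with shape r_1 and scale \sigma_1^2. For a made-up two-group case, the pdf should integrate to 1 and have mean \sum_m r_m \sigma_m^2. Indices are 0-based in the code, and the group variances must be distinct. All names and test values are illustrative assumptions.

```python
import math
from itertools import product

def omega(k, l, m_groups):
    """Vectors i of length M with i_j >= 0, sum(i) == l - 1 and i_k == 0 (k is 0-based)."""
    for i in product(range(l), repeat=m_groups):
        if i[k] == 0 and sum(i) == l - 1:
            yield i

def psi(k, l, r, s2):
    """Psi_{k,l,r} as defined above (r = repetitions, s2 = distinct group variances)."""
    total = 0.0
    for i in omega(k, l, len(r)):
        term = 1.0
        for j in range(len(r)):
            if j != k:
                term *= math.comb(i[j] + r[j] - 1, i[j])
                term *= (1.0 / s2[j] - 1.0 / s2[k]) ** (-(r[j] + i[j]))
        total += term
    return (-1) ** (r[k] - 1) * total

def make_pdf(r, s2):
    """Return x -> f(x; r, sigma^2_1, ..., sigma^2_M)."""
    prefactor = 1.0
    for rm, v in zip(r, s2):
        prefactor /= v ** rm
    # (power of -x, coefficient Psi / (r_k - l)!, variance sigma^2_k) for each (k, l).
    terms = [(r[k] - l, psi(k, l, r, s2) / math.factorial(r[k] - l), s2[k])
             for k in range(len(r)) for l in range(1, r[k] + 1)]

    def pdf(x):
        if x < 0:
            return 0.0
        return prefactor * sum(c * (-x) ** p * math.exp(-x / v) for p, c, v in terms)

    return pdf

def midpoint(f, a, b, n=150000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

g = make_pdf((2, 1), (1.0, 3.0))   # two variables with variance 1, one with variance 3
print(midpoint(g, 0.0, 150.0), midpoint(lambda x: x * g(x), 0.0, 150.0))  # compare with 1 and 5
```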

Applications

The chi-square distribution has numerous applications in inferential statistics, for instance in chi-square tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student's t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables divided by their respective degrees of freedom.
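The F-distribution construction mentioned above can be sketched by simulation: draw two independent chi-square variables from the definition, divide each by its degrees of freedom, and take the ratio. As a check, the sketch compares the sample mean with d_2/(d_2 - 2), a standard property of the F-distribution for d_2 > 2 that this article does not state. Degrees of freedom, seed, and names are illustrative.

```python
import random
import statistics

def chi_square_draw(k, rng):
    """One chi-square(k) draw: the sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

def f_ratio_samples(d1, d2, n=20000, seed=13):
    """Samples of (X1/d1) / (X2/d2) with independent X1 ~ chi2(d1), X2 ~ chi2(d2)."""
    rng = random.Random(seed)
    return [(chi_square_draw(d1, rng) / d1) / (chi_square_draw(d2, rng) / d2)
            for _ in range(n)]

samples = f_ratio_samples(5, 10)
print(statistics.fmean(samples), 10 / (10 - 2))   # sample mean vs d2 / (d2 - 2)
```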

Following are some of the most common situations in which the chi-square distribution arises from a Gaussian-distributed sample.

  • The box below shows probability distributions whose names start with "chi", for some statistics based on independent random variables X_i\sim \mathrm{Normal}(\mu_i,\sigma^2_i), i=1,\ldots,k:

      Name                                  Statistic
      chi-square distribution               \sum_{i=1}^k \left(\frac{X_i-\mu_i}{\sigma_i}\right)^2
      noncentral chi-square distribution    \sum_{i=1}^k \left(\frac{X_i}{\sigma_i}\right)^2
      chi distribution                      \sqrt{\sum_{i=1}^k \left(\frac{X_i-\mu_i}{\sigma_i}\right)^2}
      noncentral chi distribution           \sqrt{\sum_{i=1}^k \left(\frac{X_i}{\sigma_i}\right)^2}


References

  1. ^ M.A. Sanders. "Characteristic function of the central chi-square distribution". http://www.planetmathematics.com/CentralChiDistr.pdf. Retrieved 2009-03-06. 
  2. ^ Abramowitz, Milton; Stegun, Irene A., eds. (1965), "Chapter 26", Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York: Dover, ISBN 0-486-61272-4, http://www.math.sfu.ca/~cbm/aands/page_940.htm .
  3. ^ NIST (2006). Engineering Statistics Handbook - Chi-Square Distribution
  4. ^ Johnson, N. L.; Kotz, S.; Balakrishnan, N. (1994). Continuous Univariate Distributions (Second Ed., Vol. 1, Chapter 18). John Wiley and Sons. ISBN 0-471-58495-9.
  5. ^ Mood, Alexander; Franklin A. Graybill, Duane C. Boes (1974). Introduction to the Theory of Statistics (Third Edition, p. 241-246). McGraw-Hill. ISBN 0-07-042864-6. 
  6. ^ Chi-square distribution, from MathWorld, retrieved Feb. 11, 2009
  7. ^ M. K. Simon, Probability Distributions Involving Gaussian Random Variables, New York: Springer, 2002, eq. (2.35), ISBN 978-0-387-34657-1
  8. ^ Box, Hunter and Hunter. Statistics for experimenters. Wiley. p. 46. 
  9. ^ D. Hammarwall, M. Bengtsson, B. Ottersten, Acquiring Partial CSI for Spatially Selective Transmission by Instantaneous Channel Norm Feedback, IEEE Transactions on Signal Processing, vol 56, pp. 1188-1204, March 2008.
  10. ^ E. Björnson, D. Hammarwall, B. Ottersten, Exploiting Quantized Channel Norm Feedback through Conditional Statistics in Arbitrarily Correlated MIMO Systems, IEEE Transactions on Signal Processing, vol 57, pp. 4027-4041, October 2009
  • Wilson, E. B.; Hilferty, M. M. (1931). The distribution of chi-square. Proceedings of the National Academy of Sciences, Washington, 17, 684–688.

Copyrights:

Sci-Tech Dictionary. McGraw-Hill Dictionary of Scientific and Technical Terms. Copyright © 2003, 1994, 1989, 1984, 1978, 1976, 1974 by McGraw-Hill Companies, Inc. All rights reserved.
Veterinary Dictionary. Saunders Comprehensive Veterinary Dictionary 3rd Edition. Copyright © 2007 by D.C. Blood, V.P. Studdert and C.C. Gay, Elsevier. All rights reserved.
Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Chi-square distribution".