| Dictionary: chi-square test |
| Business Dictionary: Chi-Square Test |
Statistical method to test whether two (or more) variables are: (1) independent or (2) homogeneous. The chi-square test for independence examines whether knowing the value of one variable helps to estimate the value of another variable. The chi-square test for homogeneity examines whether two populations have the same proportion of observations with a common characteristic. Though the formula is the same for both tests, the underlying logic and sampling procedures vary.
| Encyclopedia of Public Health: Chi-Square Test |
Studies often collect data on categorical variables that can be summarized as a series of counts. These counts are commonly arranged in a tabular format known as a contingency table. For example, a study designed to determine whether or not there is an association between cigarette smoking and asthma might collect data that could be assembled into a 2×2 table. In this case, the two columns could be defined by whether the subject smoked or not, while the rows could represent whether or not the subject experienced symptoms of asthma. The cells of the table would contain the number of observations or patients as defined by these two variables.
The chi-square test statistic can be used to evaluate whether there is an association between the rows and columns in a contingency table. More specifically, this statistic can be used to determine whether there is any difference between the study groups in the proportions of the risk factor of interest. Returning to our example, the chi-square statistic could be used to test whether the proportion of individuals who smoke differs by asthmatic status.
The chi-square test statistic is designed to test the null hypothesis that there is no association between the rows and columns of a contingency table. This statistic is calculated by first obtaining, for each cell in the table, the expected number of events that will occur if the null hypothesis is true. When the observed number of events deviates significantly from the expected counts, it is unlikely that the null hypothesis is true, and it is likely that there is a row-column association. Conversely, a small chi-square value indicates that the observed values are similar to the expected values, leading us to conclude that the null hypothesis is plausible. The general formula used to calculate the chi-square (χ²) test statistic is as follows:

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}, \qquad df = (r-1)(c-1) \tag{1}$$

where O_i = observed count in category i; E_i = expected count in category i under the null hypothesis; df = degrees of freedom; and c, r represent the number of columns and rows in the contingency table.

Table 1. Observed values for data presented in a two-by-two table
| Variable 2 | Variable 1: Yes | Variable 1: No | Total |
| Yes | a | b | a+b |
| No | c | d | c+d |
| Total | a+c | b+d | n |
SOURCE: Courtesy of author.
The value of the chi-square statistic cannot be negative and can assume values from zero to infinity. The p-value for this test statistic is based on the chi-square probability distribution and is generally extracted from published tables or estimated using computer software programs. The p-value represents the probability that the chi-square test statistic would be as extreme as, or more extreme than, the value observed if the null hypothesis were true. As with the t and F distributions, there is a different chi-square distribution for each possible value of degrees of freedom. Chi-square distributions with a small number of degrees of freedom are highly skewed; however, this skewness is attenuated as the number of degrees of freedom increases. In general, the number of degrees of freedom for tests of hypothesis that involve an r×c contingency table is equal to (r−1)×(c−1); thus for any 2×2 table, the number of degrees of freedom is equal to one. A chi-square distribution with one degree of freedom is the distribution of the square of a standard normal variable, and, consequently, either the chi-square or standard normal table can be used to determine the corresponding p-value.

Table 2. Expected values for data presented in a two-by-two table
| Variable 2 | Variable 1: Yes | Variable 1: No | Total |
| Yes | (a+b)(a+c)/n | (a+b)(b+d)/n | a+b |
| No | (c+d)(a+c)/n | (c+d)(b+d)/n | c+d |
| Total | a+c | b+d | n |
SOURCE: Courtesy of author.
The chi-square test is most widely used to conduct tests of hypothesis that involve data that can be presented in a 2×2 table. Indeed, this tabular format is a feature of the case-control study design that is commonly used in public health research. Within this contingency table, we could denote the observed counts as shown in Table 1. Under the null hypothesis of no association between the two variables, the expected number in each cell is calculated from the observed row and column totals using the formulas outlined in Table 2.
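The expected-count rule (row total times column total, divided by n) can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` and the numbers standing in for a, b, c, and d are hypothetical and do not come from the source.

```python
def expected_counts(observed):
    """Expected cell counts under the null hypothesis of no association.

    `observed` is a list of rows of counts. Each expected count is
    (row total * column total) / n.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical values for a, b, c, d chosen only to exercise the formula.
table = [[10, 20],
         [30, 40]]
expected = expected_counts(table)
for row in expected:
    print(row)
```

Because the function works from the marginal totals alone, the same sketch applies to any r×c table, not only the 2×2 case.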
The use of the chi-square test can be illustrated by using hypothetical data from a study investigating the association between smoking and asthma among adults observed in a community health clinic. The results obtained from classifying 150 individuals are shown in Table 3. As Table 3 shows, among asthmatics the proportion of smokers was 40 percent (20/50), while the corresponding proportion among asymptomatic individuals was 22 percent (22/100). By applying the formulas presented in Table 2 to the observed cell counts of 20, 30, 22, and 78 (Table 3), the corresponding expected counts are 14, 36, 28, and 72. The observed and expected counts can then be used to calculate the chi-square test statistic as outlined in Equation 1. The resulting value of the chi-square test statistic is approximately 5.36, and the associated p-value for this chi-square distribution that has one degree of freedom is 0.02. Therefore, if there were truly no association between smoking and asthma, there would be a 2 out of 100 probability of observing a difference in proportions at least as large as 18 percentage points (40% versus 22%) by chance alone. We would therefore conclude that the observed difference in the proportions is unlikely to be explained by chance alone, and consider this result statistically significant.

Table 3. Hypothetical data showing chi-square test
| Symptoms of asthma | Ever smoked cigarettes: Yes | Ever smoked cigarettes: No | Total |
| Yes | 20 | 30 | 50 |
| No | 22 | 78 | 100 |
| Total | 42 | 108 | 150 |
SOURCE: Courtesy of author.
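The worked example can be reproduced with a short Python sketch. The function names are hypothetical, and the p-value helper is valid only for one degree of freedom: it relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so the upper-tail probability can be obtained from the complementary error function in the standard library.

```python
import math


def chi_square_statistic(observed):
    """Pearson chi-square statistic: sum over cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            stat += (o - e) ** 2 / e
    return stat


def p_value_one_df(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


# Observed counts from the smoking and asthma example (20, 30, 22, 78).
observed = [[20, 30],
            [22, 78]]
stat = chi_square_statistic(observed)
p = p_value_one_df(stat)
print(f"chi-square = {stat:.2f}, p = {p:.2f}")
```

For tables with more than one degree of freedom the p-value would need a general chi-square distribution function, which is not covered by this sketch.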
Because the construction of the chi-square test makes use of discrete data to estimate a continuous distribution, some authors will apply a continuity correction when calculating this statistic. Specifically,

$$\chi^2 = \sum_{i} \frac{(|O_i - E_i| - 0.5)^2}{E_i}$$

where |O_i − E_i| is the absolute value of the difference between O_i and E_i, and the term 0.5 in the numerator is often referred to as Yates' correction factor. This correction factor serves to reduce the chi-square value and, therefore, increases the resulting p-value. It has been suggested that this correction yields an overly conservative test that may fail to reject a false null hypothesis. However, as long as the sample size is large, the effect of the correction factor is negligible.
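As a rough illustration of the continuity correction, the sketch below subtracts 0.5 from each absolute difference before squaring and applies it to the smoking and asthma counts (20, 30, 22, 78). The function name is hypothetical, and the p-value line again assumes one degree of freedom.

```python
import math


def yates_chi_square(observed):
    """Chi-square statistic with the 0.5 continuity correction.

    Each cell contributes (|O - E| - 0.5)^2 / E.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            stat += (abs(o - e) - 0.5) ** 2 / e
    return stat


observed = [[20, 30],
            [22, 78]]
corrected = yates_chi_square(observed)
# One degree of freedom: upper-tail probability via the standard normal.
corrected_p = math.erfc(math.sqrt(corrected / 2.0))
print(f"corrected chi-square = {corrected:.2f}, p = {corrected_p:.3f}")
```

With these counts the corrected statistic comes out smaller than the uncorrected value of about 5.36, and the p-value correspondingly larger, which is the direction of change described above.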
When there is a small number of counts in the table, the use of the chi-square test statistic may not be appropriate. Specifically, it has been recommended that this test not be used if any cell in the table has an expected count of less than one, or if more than 20 percent of the cells have an expected count of less than five. Under this scenario, Fisher's exact test is recommended for conducting tests of hypothesis.
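A small-sample check of this kind might be sketched as follows. The sketch assumes the usual reading of the rule: the test is avoided if any expected count is below one, or if more than 20 percent of the cells have an expected count below five. The function name and the second example table are hypothetical.

```python
def chi_square_is_appropriate(observed):
    """Return False when the expected counts are too small for the test.

    Assumed rule: no expected count below 1, and at most 20 percent of
    the cells with an expected count below 5.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    cells = [r * c / n for r in row_totals for c in col_totals]
    if min(cells) < 1:
        return False
    share_small = sum(1 for e in cells if e < 5) / len(cells)
    return share_small <= 0.20


large_table = [[20, 30], [22, 78]]  # smoking and asthma example
small_table = [[3, 1], [1, 3]]      # hypothetical sparse table

print(chi_square_is_appropriate(large_table))
print(chi_square_is_appropriate(small_table))
```

When the check fails, an exact method such as Fisher's exact test would be used instead; that computation is outside the scope of this sketch.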
(SEE ALSO: Normal Distributions; Probability Model; Sampling; Statistics for Public Health; T-Test)
Bibliography
Cochran, W. G. (1954). "Some Methods for Strengthening the Common χ² Tests." Biometrics 10:417–451.
Grizzle, J. E. (1967). "Continuity Correction in the χ² Test for 2×2 Tables." The American Statistician 21:28–32.
Pagano, M., and Gauvreau, K. (2000). Principles of Biostatistics, 2nd edition. Pacific Grove, CA: Duxbury Press.
Rosner, B. (2000). Fundamentals of Biostatistics, 5th edition. Pacific Grove, CA: Duxbury Press.
— PAUL J. VILLENEUVE
| Wikipedia: Chi-square test |
A chi-square test (also chi-squared or χ² test) is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true, or any in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by making the sample size large enough.
Some examples of chi-squared tests where the chi-square distribution is only approximately valid:
One case where the distribution of the test statistic is an exact chi-square distribution is the test that the variance of a normally-distributed population has a given value based on a sample variance. Such a test is uncommon in practice because values of variances to test against are seldom known exactly.
If a sample of size n is taken from a population having a normal distribution, then there is a well-known result (see distribution of the sample variance) which allows a test to be made of whether the variance of the population has a pre-determined value. For example, a manufacturing process might have been in stable condition for a long period, allowing a value for the variance to be determined essentially without error. Suppose that a variant of the process is being tested, giving rise to a small sample of product items whose variation is to be tested. The test statistic T in this instance could be set to be the sum of squares about the sample mean, divided by the nominal value for the variance (i.e., the value to be tested as holding). Then T has a chi-square distribution with n−1 degrees of freedom. For example, if the sample size is 21, the acceptance region for T for a significance level of 5% is the interval 9.59 to 34.17.
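The statistic T can be computed directly from its definition. In the sketch below the sample values and the nominal variance are invented purely for illustration, and the function name is hypothetical; the acceptance limits 9.59 and 34.17 are the ones quoted above for a sample of size 21 at the 5% level.

```python
def variance_test_statistic(sample, nominal_variance):
    """Sum of squares about the sample mean, divided by the nominal variance."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / nominal_variance


# Hypothetical sample of 21 measurements and a hypothetical nominal variance.
sample = [float(i) for i in range(21)]
nominal_variance = 38.5

t = variance_test_statistic(sample, nominal_variance)

# Acceptance region quoted in the text for n = 21 (20 degrees of freedom).
LOWER, UPPER = 9.59, 34.17
in_region = LOWER <= t <= UPPER
print(f"T = {t:.2f}, within acceptance region: {in_region}")
```

For other sample sizes the limits would have to be looked up from a chi-square table with n−1 degrees of freedom, since the standard library does not provide chi-square quantiles.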
This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors (see full disclaimer)
Copyrights:
Dictionary. The American Heritage® Dictionary of the English Language, Fourth Edition. Copyright © 2007, 2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company. All rights reserved.
Business Dictionary. Dictionary of Business Terms. Copyright © 2000 by Barron's Educational Series, Inc. All rights reserved.
Encyclopedia of Public Health. Copyright © 2002 by The Gale Group, Inc. All rights reserved.
Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Chi-square test".