Answers.com

Spearman's rank correlation coefficient

 
Sci-Tech Dictionary: Spearman's rank correlation coefficient
(¦spir·mənz ¦raŋk ′kär·ə′lā·shən ′kō·ə′fish·ənt)

(statistics) A statistic used as a measure of correlation in nonparametric statistics when the data are in ordinal form; a product moment correlation coefficient. Also known as Spearman's rho.


Dental Dictionary: Spearman’s rho

n.pr

A statistical test for correlation between two rank-ordered scales. It yields a statement of the degree of interdependence of the scores of the two scales.

Geography Dictionary: Spearman's rank correlation coefficient

Also known as Spearman's rho, the meaning of this coefficient is the same as that of the product-moment correlation coefficient. The two sets of variables are ranked separately and the differences in rank, d, are calculated for each pair of variables. The equation is:

 \rho = 1- {\frac {6 \sum d^2}{n(n^2 - 1)}}

where n is the number of paired variables.

Sports Science and Medicine: Spearman rank correlation coefficient

A statistical test that uses a ranking system to assess the degree of correlation existing between two sets of data. The two sets of data are placed in rank order next to each other so that they can be compared statistically.

Wikipedia: Spearman's rank correlation coefficient

In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as r_s, is a non-parametric measure of correlation – that is, it assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any other assumptions about the particular nature of the relationship between the variables. Certain other measures of correlation are parametric in the sense of being based on possible relationships of a parameterised form, such as a linear relationship.


Calculation

In practice, a simple procedure is normally used to calculate ρ: the raw scores are converted to ranks, and the differences d_i between the ranks of each observation on the two variables are calculated.

If there are no tied ranks, then ρ is given by:[1]

 \rho = 1- {\frac {6 \sum d_i^2}{n(n^2 - 1)}}

where:

d_i = x_i − y_i = the difference between the ranks of corresponding values X_i and Y_i, and
n = the number of values in each data set (same for both sets).
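
As a quick check on this formula (a sketch that assumes untied ranks 1, ..., n on both variables), the two extreme cases give the expected bounds. When the two rankings agree exactly, every d_i is 0 and ρ = 1. When one ranking is the exact reverse of the other, y_i = n + 1 − x_i, so d_i = 2x_i − (n + 1) and:

 \sum d_i^2 = \sum_{i=1}^{n} (2i - n - 1)^2 = \frac{n(n^2 - 1)}{3}, \qquad \rho = 1 - \frac{6 \cdot n(n^2 - 1)/3}{n(n^2 - 1)} = -1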

If tied ranks exist, the classic Pearson correlation coefficient between the ranks has to be used instead of the above formula:[1]


\rho=\frac{n(\sum x_iy_i)-(\sum x_i)(\sum y_i)}
{\sqrt{n(\sum x_i^2)-(\sum x_i)^2}~\sqrt{n(\sum y_i^2)-(\sum y_i)^2}}.

Each of the equal values has to be assigned the same rank: the average of the positions they occupy in the sorted order of the values.

An example of averaging ranks

In the table below, notice how the rank of values that are the same is the mean of what their ranks would otherwise be.

Variable X_i    Position in the descending order    Rank x_i
0.8             5                                   5
1.2             4                                   (4+3)/2 = 3.5
1.2             3                                   (4+3)/2 = 3.5
2.3             2                                   2
18              1                                   1

In this case the shortcut formula cannot be used because of the tied ranks in the data; the second, product-moment form must be used instead.
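
The averaging of tied ranks and the product-moment form can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `average_ranks`, `pearson` and `spearman_with_ties` and the `descending` parameter are assumptions made for this example, and the code assumes that neither variable is constant (otherwise the denominator is zero).

```python
from math import sqrt

def average_ranks(values, descending=False):
    """Rank the values; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_position = (start + 1 + end + 1) / 2  # positions are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_position
        start = end + 1
    return ranks

def pearson(x, y):
    """Product-moment correlation, written term by term like the formula above."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

def spearman_with_ties(xs, ys):
    """Spearman's rho as the Pearson correlation between average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# The tied values 1.2 and 1.2 from the table share rank (4+3)/2 = 3.5.
print(average_ranks([0.8, 1.2, 1.2, 2.3, 18], descending=True))
# prints: [5.0, 3.5, 3.5, 2.0, 1.0]
```

The ranking direction does not change ρ as long as both variables are ranked in the same direction, which is why `spearman_with_ties` simply uses the ascending default for both.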

Example

The raw data used in this example is shown below. The goal is to calculate the correlation between the IQ of a person and the number of hours that person spends in front of the TV per week.

IQ, X_i    Hours of TV per week, Y_i
106        7
86         0
100        27
101        50
99         28
103        29
97         20
113        12
112        6
110        17

The first step is to sort the data by the second column (Y_i). Next, two more columns are created (x_i and y_i). The last of these columns (y_i) is assigned 1, 2, 3, ..., n, and the data is then sorted by the first original column (X_i), after which the first of the newly created columns (x_i) is assigned 1, 2, 3, ..., n. Then a column d_i is created to hold the differences between the two rank columns (x_i and y_i). Finally, another column d_i^2 is created, holding the square of each d_i.

After performing this process with the example data, the end result is something like the following:

IQ, X_i    Hours of TV per week, Y_i    rank x_i    rank y_i    d_i    d_i^2
86         0                            1           1           0      0
97         20                           2           6           −4     16
99         28                           3           8           −5     25
100        27                           4           7           −3     9
101        50                           5           10          −5     25
103        29                           6           9           −3     9
106        7                            7           3           4      16
110        17                           8           5           3      9
112        6                            9           2           7      49
113        12                           10          4           6      36

The values in the d^2_i column can now be added to find \sum d_i^2 = 194. The value of n is 10. So these values can now be substituted back into the equation,

 \rho = 1- {\frac {6\times194}{10(10^2 - 1)}}

which evaluates to ρ = −0.175757575..., a weak negative correlation: IQ and hours spent watching TV are barely related in this sample. If the original values contain ties, this formula should not be used; instead, the Pearson correlation coefficient should be calculated on the ranks, with tied values given averaged ranks as described above.
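
The arithmetic of this example can be reproduced with a short Python sketch. The helper names `ranks_no_ties` and `spearman_shortcut` are illustrative assumptions, and the ranking helper is only valid when all values are distinct, as they are in this data set.

```python
iq = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
tv_hours = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]

def ranks_no_ties(values):
    """Rank 1 for the smallest value; assumes all values are distinct."""
    position = {v: r for r, v in enumerate(sorted(values), start=1)}
    return [position[v] for v in values]

def spearman_shortcut(xs, ys):
    """Return (sum of squared rank differences, rho) using the shortcut formula."""
    rank_x, rank_y = ranks_no_ties(xs), ranks_no_ties(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return sum_d2, rho

sum_d2, rho = spearman_shortcut(iq, tv_hours)
print(sum_d2, round(rho, 6))
# prints: 194 -0.175758
```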

Determining significance

One approach to testing whether an observed value of ρ is significantly different from zero (the sample coefficient r always satisfies −1 ≤ r ≤ 1) is a permutation test: calculate the probability, under the null hypothesis, that the coefficient would be greater than or equal to the observed r. An advantage of this approach is that it automatically takes into account the number of tied data values in the sample and the way they are treated in computing the rank correlation.
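
A permutation test of this kind might be sketched as below. Several choices here are assumptions of the sketch rather than part of the description above: it samples random permutations (Monte Carlo) instead of enumerating all n! of them, it fixes a random seed for repeatability, and the names `permutation_p_value`, `n_permutations` and `seed` are hypothetical. It also assumes neither variable is constant.

```python
import random

def average_ranks(values):
    """Ascending ranks; tied values get the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

def permutation_p_value(xs, ys, n_permutations=10000, seed=0):
    """Estimate P(coefficient >= observed) under random re-pairing of the ranks."""
    rng = random.Random(seed)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    observed = pearson(rank_x, rank_y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(rank_y)  # re-pair the y ranks with the x ranks at random
        # Small tolerance so floating-point noise does not hide exact ties.
        if pearson(rank_x, rank_y) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_permutations

iq = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
tv_hours = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]
rho, p = permutation_p_value(iq, tv_hours)
print(f"rho = {rho:.4f}, estimated one-sided p = {p:.3f}")
```

Because the ranks are computed once with ties averaged and then shuffled, tied values are handled in the permutations exactly as they are in the observed coefficient.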

Another approach parallels the use of the Fisher transformation in the case of the Pearson product-moment correlation coefficient. That is, confidence intervals and hypothesis tests relating to the population value ρ can be carried out using the Fisher transformation:

F(r) = {1 \over 2}\log{1+r \over 1-r} = \operatorname{arctanh}(r).

If F(r) is the Fisher transformation of r, the sample Spearman rank correlation coefficient, and n is the sample size, then

z = \sqrt{\frac{n-3}{1.06}}F(r)

is a z-score for r which approximately follows a standard normal distribution under the null hypothesis of statistical independence (ρ = 0).[2][3]
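
As an illustration, the z-score can be computed for the IQ and TV example (r ≈ −0.1758, n = 10) with the Python standard library. The function names are assumptions of this sketch, and the two-sided p-value derived from the standard normal distribution is an added convenience, not something the formula above prescribes.

```python
from math import atanh, erfc, sqrt

def fisher_z(r, n):
    """z = sqrt((n - 3) / 1.06) * F(r), where F(r) = arctanh(r)."""
    return sqrt((n - 3) / 1.06) * atanh(r)

def two_sided_p(z):
    """Two-sided tail probability of a standard normal variable."""
    return erfc(abs(z) / sqrt(2))

z = fisher_z(-0.175757575, 10)
print(f"z = {z:.3f}, two-sided p = {two_sided_p(z):.3f}")
```

For this sample z is roughly −0.46, far from the usual critical values, which agrees with the reading of the example as showing barely any correlation.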

A generalization of the Spearman coefficient is useful in the situation where there are three or more conditions, a number of subjects are all observed in each of them, and it is predicted that the observations will have a particular order. For example, a number of subjects might each be given three trials at the same task, and it is predicted that performance will improve from trial to trial. A test of the significance of the trend between conditions in this situation was developed by E. B. Page[4] and is usually referred to as Page's trend test for ordered alternatives.

Correspondence analysis based on Spearman's rho

Classic correspondence analysis is a statistical method that assigns a score to every value of two nominal variables in such a way that Pearson's correlation coefficient between them is maximized.

There exists an equivalent of this method, called grade correspondence analysis, which maximizes Spearman's rho or Kendall's tau.[5]


References

  1. Myers, Jerome L.; Well, Arnold D. (2003). Research Design and Statistical Analysis (2nd ed.). Lawrence Erlbaum. p. 508. ISBN 0805840370.
  2. Choi, S.C. (1977). Test of equality of dependent correlations. Biometrika 64 (3), 645–647.
  3. Fieller, E.C. et al. (1957). Tests for rank correlation coefficients: I. Biometrika 44, 470–481.
  4. Page, E. B. (1963). "Ordered hypotheses for multiple treatments: A significance test for linear ranks". Journal of the American Statistical Association 58: 216–230. doi:10.2307/2282965.
  5. Kowalczyk, T.; Pleszczyńska, E.; Ruland, F. (eds.) (2004). Grade Models and Methods for Data Analysis with Applications for the Analysis of Data Populations. Studies in Fuzziness and Soft Computing vol. 151. Berlin Heidelberg New York: Springer Verlag. ISBN 9783540211204.
  • G.W. Corder, D.I. Foreman, "Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach", Wiley (2009)
  • C. Spearman, "The proof and measurement of association between two things", Amer. J. Psychol., 15 (1904) pp. 72–101
  • M.G. Kendall, "Rank correlation methods", Griffin (1962)
  • M. Hollander, D.A. Wolfe, "Nonparametric statistical methods", Wiley (1973)
  • J. C. Caruso, N. Cliff, "Empirical Size, Coverage, and Power of Confidence Intervals for Spearman's Rho", Ed. and Psy. Meas., 57 (1997) pp. 637–654


Copyrights:

Sci-Tech Dictionary. McGraw-Hill Dictionary of Scientific and Technical Terms. Copyright © 2003, 1994, 1989, 1984, 1978, 1976, 1974 by McGraw-Hill Companies, Inc. All rights reserved.
Dental Dictionary. Mosby's Dental Dictionary. Copyright © 2004 by Elsevier, Inc. All rights reserved.
Geography Dictionary. A Dictionary of Geography. Copyright © Susan Mayhew 1992, 1997, 2004. All rights reserved.
Sports Science and Medicine. The Oxford Dictionary of Sports Science & Medicine. Copyright © Michael Kent 1998, 2006, 2007. All rights reserved.
Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Spearman's rank correlation coefficient".