Answers.com

Exploratory data analysis

 
Statistics Dictionary: exploratory data analysis

Variant: EDA

The search for relationships between variables (as in data mining) by the use of graphics such as boxplots and techniques such as classification trees. Writing about his work over the previous decade, Tukey used the phrase as the title of his 1977 book.



Wikipedia: Exploratory data analysis

Exploratory data analysis (EDA) is an approach to analyzing data for the purpose of formulating hypotheses worth testing, complementing the tools of conventional statistics for testing hypotheses.[1] It was so named by John Tukey to contrast with confirmatory data analysis, the term used for the set of ideas about hypothesis testing, p-values, confidence intervals, and so on that formed the key tools in the arsenal of practicing statisticians at the time.

EDA development

Tukey held that too much emphasis in statistics was placed on statistical hypothesis testing (confirmatory data analysis); more emphasis needed to be placed on using data to suggest hypotheses to test. In particular, he held that confusing the two types of analyses and employing them on the same set of data can lead to systematic bias owing to the issues inherent in testing hypotheses suggested by the data.
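The bias described above can be illustrated with a small simulation. The following is a minimal sketch, not Tukey's own procedure: the function names, the choice of 20 candidate variables and 50 observations, and the simple z-test with known unit variance are all assumptions made for illustration. Every variable is pure noise, so any "significant" finding is a false positive.

```python
import random
import statistics


def z_score(sample):
    """z statistic for H0: mean == 0, assuming a known variance of 1."""
    return statistics.fmean(sample) * len(sample) ** 0.5


def simulate(trials=1000, n_vars=20, n_obs=50, seed=0):
    """Return (same-data, fresh-data) false positive rates at the 5% level."""
    rng = random.Random(seed)
    same_hits = 0
    fresh_hits = 0
    for _ in range(trials):
        # Exploratory step: every variable is noise with true mean 0.
        data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
        # Let the data "suggest" a hypothesis: pick the most extreme variable.
        best = max(data, key=lambda s: abs(z_score(s)))
        # Confirmatory test on the SAME data that suggested the hypothesis.
        if abs(z_score(best)) > 1.96:
            same_hits += 1
        # Confirmatory test on FRESH data for the selected variable
        # (still noise, so one new independent sample stands in for it).
        fresh = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(z_score(fresh)) > 1.96:
            fresh_hits += 1
    return same_hits / trials, fresh_hits / trials


if __name__ == "__main__":
    same_rate, fresh_rate = simulate()
    print("false positive rate, same data: ", same_rate)
    print("false positive rate, fresh data:", fresh_rate)
```

With 20 independent noise variables, the chance that at least one exceeds the 5% threshold is 1 - 0.95^20, or about 0.64, so the same-data rate should land far above the nominal 0.05, while the fresh-data rate should stay close to it.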

The objectives of EDA are to:

Many EDA techniques have been adopted into data mining and are being taught to young students as a way to introduce them to statistical thinking.[2]

Techniques

There are a number of tools that are useful for EDA, but EDA is characterized more by the attitude taken than by particular techniques.[3]
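As one concrete instance of such a tool, the boxplot mentioned in the dictionary entry above rests on a five-number summary plus a rule for flagging outlying points. The sketch below is a minimal illustration under stated assumptions: it uses the quartiles from Python's `statistics.quantiles` with `method="inclusive"`, which can differ slightly from Tukey's hinges, and the conventional 1.5 × IQR fences. The function names are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # Quartiles by linear interpolation; an approximation to Tukey's hinges.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def fence_outliers(values, k=1.5):
    """Return the points lying more than k * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(five_number_summary(sample))
    print(fence_outliers(sample))
```

The summary relies on order statistics rather than the mean and standard deviation, so a single extreme value such as the 100 in the sample changes only the maximum and is flagged by the fences rather than distorting the rest of the summary.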

The principal graphical techniques used in EDA are:

The principal quantitative techniques are:

Graphical and quantitative techniques are:

History

Many EDA ideas can be traced back to earlier authors, for example:

The Open University course Statistics in Society (MDST 242) took the above ideas and merged them with Gottfried Noether's work, which introduced statistical inference via coin-tossing and the median test.

Software

See also

Bibliography

  • Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1985). Exploring Data Tables, Trends and Shapes. ISBN 0-471-09776-4. 
  • Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1983). Understanding Robust and Exploratory Data Analysis. ISBN 0-471-09777-2. 
  • Tukey, John Wilder (1977). Exploratory Data Analysis. Addison-Wesley. ISBN 0-201-07616-0. 
  • Velleman, P F & Hoaglin, D C (1981). Applications, Basics and Computing of Exploratory Data Analysis. ISBN 0-87150-409-X.
  • Andrienko, N & Andrienko, G (2005). Exploratory Analysis of Spatial and Temporal Data. A Systematic Approach. Springer. ISBN 3-540-25994-5.

References

  1. ^ "And roughly the only mechanism for suggesting questions is exploratory. And once they’re suggested, the only appropriate question would be how strongly supported are they and particularly how strongly supported are they by new data. And that’s confirmatory.", A conversation with John W. Tukey and Elizabeth Tukey, Luisa T. Fernholz and Stephan Morgenthaler, Statistical Science Volume 15, Number 1 (2000), 79-94.
  2. ^ Konold, C. (1999). Statistics goes to school. Contemporary Psychology, 44(1), 81-82.
  3. ^ "Exploratory data analysis is an attitude, a flexibility, and a reliance on display, NOT a bundle of techniques, and should be so taught.", John W. Tukey, We need both exploratory and confirmatory, The American Statistician, 34(1), (Feb., 1980), pp. 23-25.
  • Leinhardt, G., Leinhardt, S., Exploratory Data Analysis: New Tools for the Analysis of Empirical Data, Review of Research in Education, Vol. 8 (1980), pp. 85-157.
  • Theus, M., Urbanek, S. (2008), Interactive Graphics for Data Analysis: Principles and Examples, CRC Press, Boca Raton, FL, ISBN 978-1-58488-594-8

External links

Notes

  • [1] (Very clear set of notes on EDA from Andrew Zieffler)

 
 

 

Copyrights:

Statistics Dictionary. A Dictionary of Statistics. Second edition revised. Copyright © Oxford University Press, 2008. All rights reserved.
Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Exploratory data analysis".