Answers.com

Missing values

 
Wikipedia: Missing values

In statistics, missing values occur when no data value is stored for a variable in the current observation. Missing values are common, and statistical methods have been developed to deal with them. Modern statistical packages have made handling missing values much easier; they often use maximum likelihood estimation to obtain summary statistics, confidence intervals, and so on. For a researcher, however, missing values can be a serious problem. Researchers try to establish relationships between variables: they draw a representative sample from the population of interest and study this group to estimate the relationship between the variables of interest in that population. Missing values undermine this, because the cases that remain may no longer represent the population, which makes it difficult to draw reliable and correct conclusions. As such, researchers try to avoid missing values as much as possible (Adèr & Mellenbergh, 2008).

Vital questions to consider when you have missing values

Issues:

  • How serious is the missing data problem?
  • Can the research design be adapted?
  • Is it plausible to assume that the missing data are distributed (completely) at random? See MCAR (missing completely at random).
  • Can you apply one of the techniques that allow for missing data?
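The first issue above, how serious the missing data problem is, can be given a rough first answer by counting what is missing. A minimal Python sketch, assuming the data sit in a list of rows with None marking a missing value (the function name `missingness_summary` and the toy data are hypothetical):

```python
from typing import Optional, Sequence

# One row per observation (case), one column per variable; None marks a missing value.
Row = Sequence[Optional[float]]


def missingness_summary(rows: list[Row], names: list[str]) -> dict:
    """Return the fraction of missing values per variable and the fraction of complete cases."""
    n = len(rows)
    per_variable = {
        name: sum(1 for row in rows if row[j] is None) / n
        for j, name in enumerate(names)
    }
    complete = sum(1 for row in rows if all(v is not None for v in row))
    return {"per_variable": per_variable, "complete_case_fraction": complete / n}


if __name__ == "__main__":
    names = ["age", "income", "score"]  # hypothetical variables
    rows = [
        [34.0, 52000.0, 7.1],
        [41.0, None, 6.4],
        [None, 61000.0, None],
        [29.0, 48000.0, 5.9],
    ]
    print(missingness_summary(rows, names))
```

In the toy data each variable is only 25% missing, yet only half of the cases are complete, so an analysis that drops every incomplete case would discard 50% of the observations.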

Methodological or statistical consultants can translate these issues into specific questions for the client:

  • Can you reduce or avoid missing data by adapting the design?
  • Can you retrieve the missing information and add it to the data?
  • Do you have any idea why information is missing? If there is an apparent cause, is it in some way represented in the data? (Adèr & Mellenbergh, 2008)

Techniques of dealing with missing values

Because missing values can distort the data and thereby lead to incorrect or unreliable conclusions, it is best to avoid them as much as possible. Before conducting research, a researcher should consider the possibility of missing values, especially because, as noted above, they are common and make data analysis considerably more difficult. The researcher should then consider data analysis methods that are content robust. An analysis is content robust when we are confident that violations of its basic assumptions (every statistical test has basic assumptions) will not distort the conclusions drawn about the research question.

Imputation

If the data analysis technique to be used is known not to be content robust, it is worth considering imputing the missing data. This can be done in several ways; the recommended approach is multiple imputation. Rubin convincingly showed[citation needed] that even with a small number m of repeated imputations (m ≤ 5), the quality of estimation improves enormously (in: Adèr & Mellenbergh, 2008). For most practical purposes, 2 or 3 imputations are sufficient. There is a drawback, though: every data analysis has to be repeated for each of the m imputed data sets and, in some cases, the relevant statistics have to be combined in a relatively complicated way (Adèr & Mellenbergh, 2008). Examples of imputations are:
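For a single scalar estimate, the combining step is usually done with Rubin's rules: average the m estimates, and add the average within-imputation variance W to (1 + 1/m) times the between-imputation variance B. The Python sketch below applies this to the mean of one variable. The imputation model, an approximate Bayesian bootstrap that draws each missing value from a bootstrap resample of the observed values, is our choice for illustration and is only reasonable when the data are MCAR; the function names are hypothetical, and at least two imputations and two values per data set are needed for the variances.

```python
import math
import random
import statistics
from typing import Optional


def rubin_pool(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Pool m point estimates and their squared standard errors with Rubin's rules.

    Returns (pooled estimate, total variance T = W + (1 + 1/m) * B). Needs m >= 2.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance W
    b = statistics.variance(estimates)    # between-imputation variance B (m - 1 divisor)
    return q_bar, w + (1.0 + 1.0 / m) * b


def abb_impute(values: list[Optional[float]], rng: random.Random) -> list[float]:
    """Approximate Bayesian bootstrap: resample the observed values with
    replacement, then fill each missing value with a draw from that resample."""
    observed = [v for v in values if v is not None]
    donors = [rng.choice(observed) for _ in observed]
    return [v if v is not None else rng.choice(donors) for v in values]


def mi_mean(values: list[Optional[float]], m: int = 5, seed: int = 0) -> tuple[float, float]:
    """Multiply imputed estimate of the mean and its pooled standard error."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = abb_impute(values, rng)  # one completed data set
        estimates.append(statistics.fmean(completed))
        # Squared standard error of the mean within this completed data set.
        variances.append(statistics.variance(completed) / len(completed))
    q_bar, total_var = rubin_pool(estimates, variances)
    return q_bar, math.sqrt(total_var)


if __name__ == "__main__":
    data = [4.1, None, 5.3, 6.0, None, 4.8, 5.5, None, 6.2, 5.1]  # hypothetical data
    est, se = mi_mean(data, m=5)
    print(f"pooled mean {est:.2f}, pooled SE {se:.2f}")
```

With no missing values every completed data set is identical, B is zero, and the pooled standard error reduces to the ordinary standard error of the mean; the extra (1 + 1/m)B term is what carries the uncertainty due to imputation.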

Partial imputation

The expectation-maximization (EM) algorithm is an approach in which the values of the statistics that would be computed if a complete dataset were available are estimated (imputed), taking into account the pattern of missing data. In this approach, values for individual missing data items are usually not imputed.
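As an illustration of the idea, here is a minimal Python sketch of EM for one simple case assumed for the example: two variables with a bivariate normal distribution, x fully observed and some values of y missing at random given x. The E-step adds the conditional expectations E[y | x] and E[y² | x] for the missing items to running sums of y, y² and xy; the M-step recomputes the mean of y and the (co)variances from those sums. In line with the paragraph above, no completed value of y is stored; only the expected sufficient statistics are kept. The function name is hypothetical, and the sketch assumes x is not constant and at least one y is observed.

```python
from typing import Optional


def em_bivariate_normal(
    xs: list[float], ys: list[Optional[float]], max_iter: int = 500, tol: float = 1e-12
) -> tuple[float, float, float]:
    """EM estimates (mu_y, var_y, cov_xy) for a bivariate normal with x fully
    observed and y partly missing (None). Variances use the ML divisor n."""
    n = len(xs)
    # x has no missing values, so its ML estimates are plain sample moments.
    mu_x = sum(xs) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    # Starting values: complete-case moments of y and zero covariance.
    y_obs = [y for y in ys if y is not None]
    mu_y = sum(y_obs) / len(y_obs)
    var_y = sum((y - mu_y) ** 2 for y in y_obs) / len(y_obs)
    cov_xy = 0.0
    for _ in range(max_iter):
        # E-step: regression of y on x implied by the current parameters.
        beta = cov_xy / var_x
        resid_var = var_y - beta * cov_xy  # Var(y | x)
        sum_y = sum_yy = sum_xy = 0.0
        for x, y in zip(xs, ys):
            if y is None:
                e_y = mu_y + beta * (x - mu_x)  # E[y | x]
                e_yy = e_y * e_y + resid_var    # E[y^2 | x]
            else:
                e_y, e_yy = y, y * y
            sum_y += e_y
            sum_yy += e_yy
            sum_xy += x * e_y
        # M-step: ML estimates from the expected sufficient statistics.
        new_mu_y = sum_y / n
        new_var_y = sum_yy / n - new_mu_y ** 2
        new_cov_xy = sum_xy / n - mu_x * new_mu_y
        change = max(abs(new_mu_y - mu_y), abs(new_var_y - var_y), abs(new_cov_xy - cov_xy))
        mu_y, var_y, cov_xy = new_mu_y, new_var_y, new_cov_xy
        if change < tol:
            break
    return mu_y, var_y, cov_xy
```

For this particular missing-data pattern the EM estimate of the mean of y should agree with a closed form: the complete-case mean of y plus the complete-case regression slope times the difference between the mean of x over all cases and over the complete cases.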

Partial deletion

Methods that reduce the available data to a dataset with no missing values include:

  • Listwise deletion/casewise deletion (a naïve solution)
  • Pairwise deletion (a naïve solution)
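The difference between the two deletion methods shows up as soon as a statistic involves more than one variable. In the Python sketch below (the toy data and function names are hypothetical), listwise deletion estimates a covariance only from cases that are complete on every variable, while pairwise deletion uses every case that is complete on the two variables involved, so different entries of a covariance matrix can rest on different subsets of cases. Each estimate needs at least two usable cases.

```python
from typing import Optional

Row = list[Optional[float]]


def sample_cov(pairs: list[tuple[float, float]]) -> float:
    """Sample covariance with the n - 1 divisor; needs at least two pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    return sum((a - mean_a) * (b - mean_b) for a, b in pairs) / (n - 1)


def listwise_cov(rows: list[Row], i: int, j: int) -> float:
    """Drop every case with a missing value on ANY variable, then estimate."""
    complete = [r for r in rows if all(v is not None for v in r)]
    return sample_cov([(r[i], r[j]) for r in complete])


def pairwise_cov(rows: list[Row], i: int, j: int) -> float:
    """Drop only the cases that are missing on variable i or variable j."""
    return sample_cov(
        [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    )


if __name__ == "__main__":
    rows = [
        [1.0, 2.0, 3.0],
        [2.0, 4.0, None],
        [3.0, 6.0, 1.0],
        [4.0, None, 2.0],
    ]
    print(listwise_cov(rows, 0, 1))  # uses the first and third rows only
    print(pairwise_cov(rows, 0, 1))  # uses the first three rows
```

Here listwise deletion keeps two of the four cases and gives a covariance of 4.0 for the first two variables, while pairwise deletion keeps three cases for that pair and gives 2.0. Because each entry rests on its own subset of cases, a covariance matrix assembled pairwise is not guaranteed to be positive semidefinite.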

Full analysis

Methods that take full account of all available information, without the distortion that results from treating imputed values as if they had actually been observed:

References

  • Adèr, H. J. (2008). Chapter 13: Missing data. In H. J. Adèr & G. J. Mellenbergh (Eds.) (with contributions by D. J. Hand), Advising on Research Methods: A consultant's companion (pp. 305–332). Huizen, The Netherlands: Johannes van Kessel Publishing.

Copyrights:

Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Missing values".