| Sci-Tech Dictionary: data analysis |
(computer science) The evaluation of digital data.
| Wikipedia: Data analysis |
Data analysis is a process of gathering, modeling, and transforming data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery for predictive rather than purely descriptive purposes. Business intelligence covers data analysis that relies heavily on aggregation, focusing on business information. In statistical applications, some people divide data analysis into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data, and CDA on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical or structural models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a species of unstructured data. All are varieties of data analysis.
Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination. The term data analysis is sometimes used as a synonym for data modeling, which is unrelated to the subject of this article.
In nuclear and particle physics, the data usually originate from the experimental apparatus via a data acquisition system. They are then processed, in a step usually called data reduction, to apply calibrations and to extract physically significant information. Data reduction is most often, especially in large particle physics experiments, an automatic, batch-mode operation carried out by software written ad hoc. The resulting data n-tuples are then scrutinized by the physicists, using specialized software tools such as ROOT or PAW, comparing the results of the experiment with theory.
The theoretical models are often difficult to compare directly with the results of the experiments, so they are used instead as input for Monte Carlo simulation software such as Geant4, which predicts the response of the detector to a given theoretical event and produces simulated events that are then compared to experimental data.
See also: Computational physics.
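The idea of turning a theoretical prediction into simulated detector events can be illustrated with a toy sketch. This is not Geant4 and does not model any real detector: the Gaussian smearing, the flat detection efficiency, and all numerical values (true_energy, resolution, efficiency, the histogram range) are assumptions chosen only for illustration.

```python
import random

def simulate_detector(true_energy, n_events, resolution, efficiency, rng):
    """Toy detector response (hypothetical model): each event is kept with
    probability `efficiency`, and its measured energy is the true energy
    plus Gaussian smearing of width `resolution`."""
    measured = []
    for _ in range(n_events):
        if rng.random() < efficiency:
            measured.append(rng.gauss(true_energy, resolution))
    return measured

def histogram(values, low, high, n_bins):
    """Count values per equal-width bin; values outside [low, high) are dropped."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for v in values:
        if low <= v < high:
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts

rng = random.Random(12345)  # fixed seed so the run is reproducible
simulated = simulate_detector(true_energy=10.0, n_events=10_000,
                              resolution=0.5, efficiency=0.9, rng=rng)
sim_counts = histogram(simulated, low=8.0, high=12.0, n_bins=8)
```

In an analysis of this kind, a histogram such as sim_counts would be compared bin by bin with the histogram of the measured events, after scaling both to the same number of events.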
Qualitative data analysis (QDA), or qualitative research, is the non-quantitative analysis of data from non-numerical sources, for example words, photographs, and observations.
The statistical analysis of data is a process with several phases, each with its own goal.
During data cleaning, erroneous entries are inspected and corrected where possible. In some cases, it is easy to substitute suspect data with the correct values. However, when it is unclear what caused the erroneous data or what should replace them, no subjective decisions should be made, so that the quality of the data is not compromised. Furthermore, it is important not to throw information away at any stage of the data cleaning phase. When variables are altered, the original values should be kept in a duplicate dataset or under a different variable name, so that information is always cumulatively retrievable.[1]
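A minimal sketch of this practice, assuming a hypothetical dataset of records with an age field: a documented missing-value code is recoded, a suspect value of unknown cause is only flagged, and the original value is always kept under a different variable name. The field names, the code -1, and the plausible range are illustrative assumptions.

```python
from copy import deepcopy

# Hypothetical raw records; -1 is assumed to be a documented code for "not recorded".
raw_records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -1},
    {"id": 3, "age": 340},  # suspect value, cause unknown
]

def clean_age(records, missing_code=-1, plausible=(0, 120)):
    """Return cleaned copies of the records; the input list is not modified.
    The raw value is always preserved under the name 'age_raw'."""
    cleaned = deepcopy(records)
    for row in cleaned:
        row["age_raw"] = row["age"]        # keep the original value
        if row["age"] == missing_code:
            row["age"] = None              # documented code becomes missing
            row["age_flag"] = "missing_code"
        elif not plausible[0] <= row["age"] <= plausible[1]:
            row["age_flag"] = "suspect"    # flag only, no guessed replacement
        else:
            row["age_flag"] = "ok"
    return cleaned

cleaned_records = clean_age(raw_records)
```

Because the suspect value is flagged rather than replaced, the decision on how to handle it can be made later and documented, and the raw dataset remains available unchanged.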
The initial data analysis uses descriptive statistics to answer the following four questions[1]:
Each step of the initial data analysis is described below.
The quality of the data can be assessed in several ways. First, the distribution of the variables before data cleaning is compared to the distribution of the variables after data cleaning, to see whether data cleaning has had unwanted effects on the data. Second, the missing observations in the data are analyzed to see whether they are missing at random and whether some form of imputation is needed. Third, extreme observations in the data are analyzed to see whether they disturb the distribution. If that is the case, robust techniques can be applied.
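These checks can be sketched for a single variable as follows. The data values are hypothetical, and the outlier rule (a modified z-score based on the median absolute deviation, with the commonly used constant 0.6745 and cutoff 3.5) is one robust technique among several, chosen here as an assumption.

```python
import statistics

def summarize(values):
    """Summary of one variable, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "n_missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
    }

def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score, based on the median absolute
    deviation (MAD), exceeds `threshold`."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    mad = statistics.median([abs(v - med) for v in present])
    if mad == 0:
        return []  # no spread around the median, so the score is undefined
    return [v for v in present if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical variable before and after cleaning (one entry set to missing).
before = [4.1, 3.9, 4.3, 40.0, 4.0, 4.2]
after = [4.1, 3.9, 4.3, None, 4.0, 4.2]

summary_before = summarize(before)
summary_after = summarize(after)
extreme_values = mad_outliers(before)
```

Comparing summary_before with summary_after shows how much the cleaning step shifted the mean and spread, while the median, being robust, changes far less.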
When the quality of the measurement instruments is not itself the main focus of the research, it can be checked during initial data analysis. One way to assess the quality of a measurement instrument is to perform an analysis of homogeneity (internal consistency). A homogeneity index such as Cronbach's α gives an indication of the reliability of a measurement instrument.
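For an instrument with k items, Cronbach's α is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores). A minimal sketch of the calculation, assuming complete responses (no missing values) and using hypothetical scores:

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.
    Assumes every respondent answered all k items."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*item_scores))  # one tuple of scores per item
    item_variances = [statistics.variance(column) for column in items]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: four respondents, three items on the same scale.
responses = [
    [3, 4, 3],
    [5, 5, 4],
    [1, 2, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
```

Values closer to 1 indicate that the items vary together, which is what one expects if they measure the same underlying construct.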
In many cases, a check to see whether the randomization procedure has worked will be the starting point for analyzing the implementation of the design. This can be done by checking whether variables are equally distributed across groups. Other ways of checking the implementation of the design are manipulation checking and the analysis of nonresponse and dropout.
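One way to check whether a baseline variable is equally distributed across two groups is the standardized mean difference. The sketch below assumes two groups and a numeric covariate; the function name and the example values are hypothetical.

```python
import math
import statistics

def standardized_mean_difference(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    pooled_sd = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    )
    return mean_diff / pooled_sd

# Hypothetical baseline ages in a treatment group and a control group.
treatment_age = [31, 45, 28, 52, 39, 47]
control_age = [33, 44, 30, 50, 41, 45]
smd_age = standardized_mean_difference(treatment_age, control_age)
```

A value near zero suggests the groups are balanced on that variable. An absolute value above roughly 0.1 is sometimes used as a rule of thumb for imbalance, but the appropriate cutoff depends on the study.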
In this step, the findings of the initial data analysis are documented and possible corrective actions are taken. For instance, when the distribution of a variable is not normal, the data may need to be transformed or categorized. Furthermore, a decision should be made on how to handle missing data and outliers. If the randomization procedure seems to be defective, propensity scores can be calculated and included in the main analyses as a
Copyrights:
| Sci-Tech Dictionary. McGraw-Hill Dictionary of Scientific and Technical Terms. Copyright © 2003, 1994, 1989, 1984, 1978, 1976, 1974 by McGraw-Hill Companies, Inc. All rights reserved. |
| Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Data analysis". |