Data that does not reside in fixed locations. Free-form text in a word processing document is a typical example. Contrast with structured data. See free-form database.
| Computer Desktop Encyclopedia: unstructured data |
| Wikipedia: Unstructured data |
Unstructured data (or unstructured information) refers to information, usually computerized, that either does not have a data model or has one that is not easily usable by a computer program. The term distinguishes such information from data stored in fielded form in databases or annotated (semantically tagged) in documents.
The term is imprecise for two reasons: (1) structure, while not formally defined, can still be implied; and (2) data with some form of structure may still be characterized as unstructured if its structure is not helpful for the desired processing task. In the first case, software that creates machine-processable structure exploits the linguistic, auditory, and visual structure that is inherent in all forms of human communication.[1] This inherent structure can be inferred from text, for instance, by examining word morphology, sentence syntax, and other small- and large-scale patterns.
In the second case, examples of "unstructured data" may include audio, video, and unstructured text such as the body of an e-mail message, Web page, or word processor document. While the main content being conveyed does not have a defined structure, it generally comes packaged in objects (for example, files or documents) that themselves have structure. Such objects are thus a mix of structured and unstructured data, but collectively they are still referred to as "unstructured data". For example, an HTML Web page is tagged, but HTML mark-up is typically designed solely for rendering; it does not capture the meaning or function of tagged elements in ways that support automated processing of the page's information content. XHTML tagging does allow machine processing of elements, although it typically does not capture or convey the semantic meaning of tagged terms.
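The mix of structured packaging and unstructured content can be illustrated with an e-mail message. The following is a minimal sketch using Python's standard `email` module; the sample message and the helper name `split_message` are made up for illustration. The header fields come out as named values a program can use directly, while the body remains free-form text.

```python
from email import message_from_string

# Hypothetical sample message: structured headers, then a free-form body.
raw = """From: alice@example.com
To: bob@example.com
Subject: Quarterly numbers

Hi Bob, revenue was up a bit last quarter, maybe 4 percent.
Let's talk Thursday.
"""

def split_message(raw_text):
    """Return (structured header fields, unstructured body text)."""
    msg = message_from_string(raw_text)
    # Headers are fielded data: each value has a well-known name.
    headers = {name: value for name, value in msg.items()}
    # For a non-multipart message the payload is the body as one string.
    body = msg.get_payload()
    return headers, body

headers, body = split_message(raw)
print(headers["Subject"])  # a named field, directly usable
print(body)                # free text; its meaning is not machine-readable as is
```

Nothing in the body marks "4 percent" as a revenue figure or "Thursday" as a meeting date. Recovering that would need the kind of inference from linguistic patterns described above.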
Merrill Lynch in 1998 cited estimates that as much as 80% of all potentially usable business information originates in unstructured form.[2] Such estimates may not be based on primary research, but they are nonetheless widely accepted.[3]
Much unstructured data is noisy text. Spontaneous communication (such as e-mail, SMS, blogs, and web pages) contains noisy text, and processing steps such as automatic speech recognition also produce noisy text. Noise in text is defined as any kind of difference between the surface form of a coded representation of the text and the intended, correct, or original text.
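One way to make this definition of noise concrete is to count the character edits that separate the surface form from the intended text. The sketch below uses Levenshtein edit distance as the measure. That choice, the function names, and the normalization by the intended text's length are assumptions for illustration, not something the definition prescribes.

```python
def edit_distance(surface, intended):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning surface into intended."""
    previous = list(range(len(intended) + 1))
    for i, s_char in enumerate(surface, start=1):
        current = [i]
        for j, t_char in enumerate(intended, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]

def noise_rate(surface, intended):
    """Edits per character of the intended text (assumed normalization)."""
    return edit_distance(surface, intended) / max(len(intended), 1)

# Hypothetical SMS-style surface form versus the text the writer meant.
print(noise_rate("c u tmrw", "see you tomorrow"))
```

A rate of 0 means the surface form matches the intended text exactly; larger values indicate noisier text.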
Data mining, text analytics, and noisy text analytics are different methods used to find patterns in, or otherwise interpret, this information. Common techniques for structuring text usually involve manual tagging with metadata or part-of-speech tagging for further text mining-based structuring. UIMA provides a common framework for processing this information to extract meaning and create structured data about the information.
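As a rough illustration of creating structured data about a piece of text, the sketch below attaches simple metadata to a free-form note using regular expressions. It is a deliberately simple stand-in for the techniques named above, not part-of-speech tagging and not UIMA; the patterns, field names, and sample note are assumptions made for this example.

```python
import re

# Simplified patterns chosen for illustration; real text needs more robust ones.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # ISO-style dates only

def tag_text(text):
    """Build a structured record (metadata) describing free-form text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "dates": DATE_RE.findall(text),
        "word_count": len(text.split()),
    }

note = "Please send the report to carol@example.org by 2010-03-15. Thanks!"
record = tag_text(note)
print(record)
```

The resulting record can be stored in fielded form and queried, while the original note stays unchanged.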
This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors.
Copyrights:
Computer Desktop Encyclopedia. THIS DEFINITION IS FOR PERSONAL USE ONLY. All other reproduction is strictly prohibited without permission from the publisher. © 1981-2010 The Computer Language Company Inc. All rights reserved.
Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Unstructured data".