String metric

 

In mathematics, string metrics (also known as similarity metrics) are a class of text-based metrics that produce a similarity or dissimilarity (distance) score between two text strings, for use in approximate matching, comparison, and fuzzy string searching. For example, the strings "Sam" and "Samuel" are not the same but can be considered similar to a degree. A string metric returns a number, often floating point, whose interpretation is specific to the algorithm that produced it.
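As a rough illustration of such a score, the sketch below compares "Sam" and "Samuel" using `difflib.SequenceMatcher` from the Python standard library. This is only one of many possible similarity measures, chosen here because it needs no third-party packages; the helper name `similarity` is an assumption made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score in [0.0, 1.0]; 1.0 means the strings are identical.

    SequenceMatcher.ratio() computes 2 * M / T, where M is the number of
    matched characters and T is the combined length of both strings.
    """
    return SequenceMatcher(None, a, b).ratio()


# "Sam" matches the first three characters of "Samuel": 2 * 3 / (3 + 6)
print(round(similarity("Sam", "Samuel"), 3))  # 0.667
print(similarity("Sam", "Sam"))               # 1.0
```

A different algorithm would generally assign a different number to the same pair, so scores from different metrics are not directly comparable.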

The most widely known (although rudimentary) string metric is Levenshtein distance (also known as edit distance), which takes two input strings and returns the minimum number of single-character insertions, deletions, and substitutions needed to transform one string into the other. Beyond simple metrics such as Levenshtein distance, the field has expanded to include phonetic, token-based, grammatical, and character-based methods of statistical comparison.
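Levenshtein distance can be sketched with the standard dynamic-programming approach, keeping only two rows of the table at a time. The function name `levenshtein` and the unit cost for every edit are assumptions of this example, not something the article prescribes.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `a` into `b` (each edit costs 1)."""
    # previous[j] = distance between the processed prefix of `a` and b[:j].
    # Initially the prefix is empty, so the distance is j insertions.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("Sam", "Samuel"))      # 3 (insert "u", "e", "l")
print(levenshtein("kitten", "sitting"))  # 3
```

The running time is proportional to the product of the two string lengths, which is one reason optimised variants are used for long inputs.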

A widespread application of string metrics is DNA and RNA sequence analysis, in which optimised string metrics are used to identify matching sequences.

String metrics are used heavily in information integration and are currently applied in fraud detection, fingerprint analysis, plagiarism detection, ontology merging, DNA analysis, RNA analysis, image analysis, evidence-based machine learning, database deduplication, data mining, data integration, semantic knowledge integration, and Web interfaces (e.g. Ajax-style suggestions as you type).
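To make the suggestions-as-you-type use case concrete, here is a minimal sketch built on `difflib.get_close_matches` from the Python standard library. The helper name `suggest`, the vocabulary, and the 0.6 cutoff are assumptions for illustration only; a real interface would tune these and would likely use an index rather than a linear scan.

```python
from difflib import get_close_matches


def suggest(typed: str, vocabulary: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` vocabulary entries most similar to `typed`,
    best match first, ignoring entries scoring below the cutoff."""
    return get_close_matches(typed, vocabulary, n=limit, cutoff=0.6)


words = ["ape", "apple", "peach", "puppy"]  # hypothetical vocabulary
print(suggest("appel", words))  # ['apple', 'ape']
```

Raising the cutoff returns fewer but closer suggestions, while lowering it tolerates more typing errors at the cost of noisier results.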



Copyrights:

Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "String metric".