Language model

 
Wikipedia: Language model

A statistical language model assigns a probability P(w_1,\ldots,w_m) to a sequence of m words by means of a probability distribution.

Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information retrieval.

In speech recognition and in data compression, such a model tries to capture the properties of a language and to predict the next word in a sequence.

When used in information retrieval, a language model is associated with each document in a collection. With query Q as input, retrieved documents are ranked by the probability that the document's language model M_d would generate the terms of the query, P(Q|M_d).
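The ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each document model is taken to be an unsmoothed unigram model estimated from whitespace-separated tokens, and the function names `query_likelihood` and `rank` are hypothetical, not part of any library.

```python
from collections import Counter

def query_likelihood(query, document):
    """Return P(Q | M_d) under an unsmoothed unigram model of the document.

    Assumption: terms are independent, so P(Q | M_d) is the product of the
    relative frequencies of the query terms in the document.
    """
    tokens = document.split()
    counts = Counter(tokens)
    total = len(tokens)
    prob = 1.0
    for term in query.split():
        prob *= counts[term] / total  # 0 if the term never occurs in the document
    return prob

def rank(query, documents):
    """Sort documents by decreasing query likelihood."""
    return sorted(documents, key=lambda d: query_likelihood(query, d), reverse=True)

docs = ["the red house", "the blue car is red", "a green tree"]
ranked = rank("red house", docs)
```

Without smoothing, a document that lacks even one query term receives probability zero, which is one reason practical systems smooth the document model.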

Estimating the probability of sequences is difficult because phrases and sentences can be arbitrarily long, so many sequences are never observed while the language model is trained (the data sparseness problem, a cause of overfitting). For that reason these models are often approximated using smoothed n-gram models.

N-gram models

In an n-gram model, the probability P(w_1,\ldots,w_m) of observing the sentence w_1,\ldots,w_m is approximated as


P(w_1,\ldots,w_m) = \prod^m_{i=1} P(w_i|w_1,\ldots,w_{i-1})
 \approx \prod^m_{i=1} P(w_i|w_{i-(n-1)},\ldots,w_{i-1})

Here, it is assumed that the probability of observing the i-th word w_i given the full history of the preceding i-1 words can be approximated by the probability of observing it given the shortened history of the preceding n-1 words (a Markov property of order n-1).

The conditional probability can be calculated from n-gram frequency counts: 
P(w_i|w_{i-(n-1)},\ldots,w_{i-1}) = \frac{count(w_{i-(n-1)},\ldots,w_{i-1},w_i)}{count(w_{i-(n-1)},\ldots,w_{i-1})}
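As a minimal sketch of this count-based estimate, the following Python code collects n-gram and context counts from a token list and divides them. The helper names `count_ngrams` and `cond_prob` and the toy corpus are assumptions made for illustration; the sketch handles n >= 2 and applies no smoothing.

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count all n-grams and all (n-1)-gram contexts in a token list."""
    if n < 2:
        raise ValueError("this sketch expects n >= 2")
    ngram_counts = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    context_counts = Counter(
        tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 2)
    )
    return ngram_counts, context_counts

def cond_prob(word, context, ngram_counts, context_counts):
    """count(context, word) / count(context); 0.0 if the context is unseen."""
    context = tuple(context)
    denom = context_counts[context]
    if denom == 0:
        return 0.0
    return ngram_counts[context + (word,)] / denom

# Toy corpus (hypothetical), tokenized on whitespace.
tokens = "the red house and the red car and the blue car".split()
bigrams, unigram_contexts = count_ngrams(tokens, 2)
p_red_given_the = cond_prob("red", ("the",), bigrams, unigram_contexts)
```

Here "the" occurs three times and is followed by "red" twice, so `p_red_given_the` is 2/3.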


The terms bigram and trigram language model denote n-gram language models with n=2 and n=3, respectively.
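Count-based estimates give probability zero to every n-gram that is missing from the training data, which is the sparseness problem that smoothing addresses. The article does not say which smoothing method is meant, so the sketch below swaps in add-one (Laplace) smoothing for a bigram model purely as an illustration. The function name `laplace_bigram_prob` and the toy corpus are assumptions.

```python
from collections import Counter

def laplace_bigram_prob(word, prev, bigram_counts, context_counts, vocab_size):
    """Add-one smoothed estimate of P(word | prev).

    Every bigram count is incremented by 1, and the denominator grows by the
    vocabulary size so the probabilities for a given prev still sum to 1.
    """
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

# Toy corpus (hypothetical), tokenized on whitespace.
tokens = "the red house and the red car".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))
# Count a token as a context only when another token follows it.
context_counts = Counter(tokens[:-1])
vocab = sorted(set(tokens))

p_seen = laplace_bigram_prob("red", "the", bigram_counts, context_counts, len(vocab))
p_unseen = laplace_bigram_prob("car", "the", bigram_counts, context_counts, len(vocab))
```

The unseen bigram "the car" now gets a small non-zero probability (1/7 in this toy corpus) instead of zero, at the cost of lowering the estimate for the observed bigram "the red" from 1 to 3/7.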

Example

In a bigram (n=2) language model, the probability of the sentence I saw the red house is approximated as


P(I,saw,the,red,house) \approx P(I|<s>) P(saw|I) P(the|saw) P(red|the) P(house|red)

whereas in a trigram (n=3) language model, the approximation is


P(I,saw,the,red,house) \approx P(I|<s>,<s>) P(saw|<s>,I) P(the|I,saw) P(red|saw,the) P(house|the,red)

Note that the context of the first n-1 words is filled with start-of-sentence markers, typically denoted <s>.
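The two approximations above can be reproduced with a short Python sketch that pads each sentence with n-1 start-of-sentence markers and multiplies the count-based conditional probabilities. The training sentences and the names `padded`, `train`, and `sentence_prob` are assumptions for illustration; no smoothing and no end-of-sentence marker are used.

```python
from collections import Counter

def padded(sentence, n):
    """Prefix a whitespace-tokenized sentence with n-1 <s> markers."""
    return ["<s>"] * (n - 1) + sentence.split()

def train(sentences, n):
    """Collect n-gram counts and context counts from padded sentences."""
    ngrams, contexts = Counter(), Counter()
    for sentence in sentences:
        toks = padded(sentence, n)
        for i in range(n - 1, len(toks)):
            ctx = tuple(toks[i - n + 1:i])
            ngrams[ctx + (toks[i],)] += 1
            # Assumption: a context is counted each time a word follows it.
            contexts[ctx] += 1
    return ngrams, contexts

def sentence_prob(sentence, n, ngrams, contexts):
    """Product over i of P(w_i | previous n-1 words), unsmoothed."""
    toks = padded(sentence, n)
    prob = 1.0
    for i in range(n - 1, len(toks)):
        ctx = tuple(toks[i - n + 1:i])
        if contexts[ctx] == 0:
            return 0.0  # unseen context: the unsmoothed estimate is undefined
        prob *= ngrams[ctx + (toks[i],)] / contexts[ctx]
    return prob

# Hypothetical training sentences.
corpus = ["I saw the red house", "I saw the blue house", "you saw the red car"]

bi_ngrams, bi_contexts = train(corpus, 2)
tri_ngrams, tri_contexts = train(corpus, 3)
p_bigram = sentence_prob("I saw the red house", 2, bi_ngrams, bi_contexts)
p_trigram = sentence_prob("I saw the red house", 3, tri_ngrams, tri_contexts)
```

With this toy corpus the bigram product is P(I|<s>) P(saw|I) P(the|saw) P(red|the) P(house|red) = 2/3 · 1 · 1 · 2/3 · 1/2 = 2/9, and the trigram model happens to give the same value.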


Copyrights:

Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Language model".