Wikipedia:

parallel text

A parallel text is a text in one language together with its translation in another language. Parallel text alignment is the identification of the corresponding sentences in both halves of the parallel text.

Large collections of parallel texts are called parallel corpora (see text corpus). Sentence-level alignment of parallel corpora is a prerequisite for many areas of linguistic research. During translation, sentences can be split, merged, deleted, inserted or reordered, which makes alignment a non-trivial task.
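
To illustrate why splits, merges and deletions complicate alignment, here is a minimal sketch of a length-based aligner that uses dynamic programming. It is a simplification, loosely in the spirit of length-based methods, and not the algorithm of any particular tool: the function name `align`, the set of allowed groupings in `BEADS`, the penalty values and the plain character-count cost are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Allowed groupings (source sentences, target sentences) and a fixed penalty.
# The penalties are illustrative assumptions, not tuned values.
BEADS = {(1, 1): 0, (1, 0): 50, (0, 1): 50, (2, 1): 10, (1, 2): 10}


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return aligned groups of sentences, minimising a length-mismatch cost."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                # Cost of a grouping: difference in total character length.
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                candidate = cost[i][j] + abs(s_len - t_len) + penalty
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (di, dj)
    # Walk back from the end to recover the chosen groupings in order.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((src[i - di:i], tgt[j - dj:j]))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


english = ["The cat sat.", "It was tired and it slept for hours."]
french = ["Le chat s'est assis.", "Il était fatigué.",
          "Il a dormi pendant des heures."]
for source_group, target_group in align(english, french):
    print(source_group, "<->", target_group)
```

In this example the second English sentence was split into two French sentences, and the sketch groups them as one 1-to-2 unit. A fixed set of groupings like this cannot represent reordered sentences, which is one reason real alignment remains difficult.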

Bitext

In the field of translation studies, a bitext is a merged document composed of both the source- and target-language versions of a given text.

Bitexts are generated by a piece of software called an alignment tool, or a bitext tool, which automatically aligns the original and translated versions of the same text. The tool generally matches these two texts sentence by sentence. A collection of bitexts is called a bitext database or a bilingual corpus, and can be consulted with a search tool.
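
As a rough sketch of how a bitext database might be consulted, the example below models each bitext as a list of aligned sentence pairs and searches the source side for a term. The type alias `Bitext`, the function `search`, the dictionary layout and the sample sentences are hypothetical and do not describe any existing search tool.

```python
from typing import Dict, List, Tuple

# A bitext is modelled here as aligned (source, target) sentence pairs,
# kept in document order. This layout is an assumption for illustration.
Bitext = List[Tuple[str, str]]


def search(database: Dict[str, Bitext], term: str) -> List[Tuple[str, int, str, str]]:
    """Return (document name, position, source, target) for each source-side hit."""
    needle = term.lower()
    hits = []
    for name, bitext in database.items():
        for position, (source, target) in enumerate(bitext):
            if needle in source.lower():
                hits.append((name, position, source, target))
    return hits


database = {
    "manual": [
        ("Press the red button.", "Appuyez sur le bouton rouge."),
        ("Wait ten seconds.", "Attendez dix secondes."),
    ],
    "letter": [
        ("Please wait for our reply.", "Veuillez attendre notre réponse."),
    ],
}
for hit in search(database, "wait"):
    print(hit)
```

Each hit carries the document name and sentence position, so a translator could go back to the surrounding text of the match.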

History

The idea of the bitext is attributed to Brian Harris, who first wrote a paper on the concept in 1988. It has been promoted by RALI (Recherche appliquée en linguistique informatique, or Applied Research in Computational Linguistics), a group of computer scientists and linguists based at the Université de Montréal who study natural language processing. Pierre Isabelle and Claude Bédard are noted promoters of the concept.

Bitexts and Translation memories

The concept of the bitext is similar to that of the translation memory. The main difference is that a translation memory is a database whose segments (matched sentences) are stored without reference to their original context, so the original sentence order is lost. A bitext retains the original sentence order.
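
The contrast can be sketched with two simple data structures. This is only an illustration under stated assumptions: the bitext is a list of pairs in document order, the translation memory is a plain dictionary keyed by source segment, and the helper names `build_memory` and `context` are hypothetical. Real translation memories are considerably more elaborate.

```python
from typing import Dict, List, Tuple

# Bitext: aligned pairs kept in document order (illustrative data).
bitext: List[Tuple[str, str]] = [
    ("Open the valve.", "Ouvrez la vanne."),
    ("Wait.", "Attendez."),
    ("Close the valve.", "Fermez la vanne."),
    ("Wait.", "Attendez."),
]


def build_memory(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Store segments by source text only; positions are not recorded."""
    return {source: target for source, target in pairs}


def context(pairs: List[Tuple[str, str]], position: int, width: int = 1) -> List[Tuple[str, str]]:
    """Return the pair at `position` together with its neighbours in the bitext."""
    start = max(0, position - width)
    return pairs[start:position + width + 1]


memory = build_memory(bitext)
print(memory["Wait."])     # lookup by segment text, with no surrounding context
print(len(memory))         # repeated segments collapse into one entry
print(context(bitext, 1))  # the bitext can still show what came before and after
```

The memory answers the question "how was this segment translated?", while the bitext can also answer "where did it occur, and what surrounded it?".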

Note, however, that the standard format for exchanging translation memories between CAT (computer-assisted translation) programs is TMX, an XML vocabulary published by LISA (Localization Industry Standards Association). TMX can preserve the original order of sentences, so the distinction drawn above is not absolute.
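
As a minimal sketch of this point, the example below reads translation units from a small hand-written TMX fragment with Python's standard `xml.etree.ElementTree` module and returns them in file order. The sample document, the header attribute values and the function name `read_units` are assumptions made for illustration; real TMX files carry more metadata, and the order of units in a file reflects document order only if the exporting tool wrote them that way.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the valve.</seg></tuv>
      <tuv xml:lang="fr"><seg>Ouvrez la vanne.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Close the valve.</seg></tuv>
      <tuv xml:lang="fr"><seg>Fermez la vanne.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_units(tmx_text: str, src: str = "en", tgt: str = "fr") -> List[Tuple[str, str]]:
    """Return (source, target) segments in the order the units appear in the file."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        segments = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        # Skip units that lack either of the requested languages.
        if src in segments and tgt in segments:
            units.append((segments[src], segments[tgt]))
    return units


for source, target in read_units(SAMPLE_TMX):
    print(source, "<->", target)
```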

Bitexts are designed to be consulted by a human translator, not by a machine. Consequently, small alignment errors or minor discrepancies that would cause a translation memory to fail are of no importance.

External links and references

  1. Ralf Steinberger, Bruno Pouliquen, Anna Widiger, Camelia Ignat, Tomaž Erjavec, Dan Tufiş, Dániel Varga (2006). "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages". Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006). Genoa, Italy, 24-26 May 2006.

Copyrights:

Wikipedia. This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Parallel text".