parse

(pärs)

v., parsed, pars·ing, pars·es.

v.tr.
  1. To break (a sentence) down into its component parts of speech with an explanation of the form, function, and syntactical relationship of each part.
  2. To describe (a word) by stating its part of speech, form, and syntactical relationships in a sentence.
  3. a. To examine closely or subject to detailed analysis, especially by breaking up into components: "What are we missing by parsing the behavior of chimpanzees into the conventional categories recognized largely from our own behavior?" (Stephen Jay Gould).
     b. To make sense of; comprehend: I simply couldn't parse what you just said.
  4. Computer Science. To analyze or separate (input, for example) into more easily processed components.
v.intr.
To admit of being parsed: sentences that do not parse easily.

[Probably from Middle English pars, part of speech, from Latin pars (ōrātiōnis), part (of speech).]

parser pars'er n.

(1) To analyze a sentence or language statement. Parsing breaks down words into functional units that can be converted into machine language. For example, to parse the expression sum salary for title = "MANAGER", the word SUM must be identified as the primary command, FOR as a conditional search, TITLE as a field name and MANAGER as the data to be searched.

Parsing breaks down a natural language request, such as "What's the total of all the managers' salaries" into the commands required by a high-level language, such as in the example above. See name parsing.
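As a rough illustration of sense (1), the sketch below labels the parts of the example query in Python. It is only a sketch under stated assumptions, not how any particular query language works: the function name parse_query, the regular expression, and the output keys are all invented for this example.

```python
import re

# Hypothetical pattern for the toy query form: COMMAND target FOR field = "value"
QUERY_PATTERN = re.compile(r'(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s*=\s*"([^"]*)"')

def parse_query(text):
    """Label each part of a query written in the toy form shown above."""
    match = QUERY_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"cannot parse query: {text!r}")
    command, target, condition, field, value = match.groups()
    # Only the two keywords from the example are recognised in this sketch.
    if command.upper() != "SUM" or condition.upper() != "FOR":
        raise ValueError(f"unsupported query: {text!r}")
    return {
        "command": command.upper(),      # primary command
        "target": target,                # field being summed
        "condition": condition.upper(),  # conditional search keyword
        "field": field,                  # field name tested by the condition
        "value": value,                  # data to be searched for
    }

print(parse_query('sum salary for title = "MANAGER"'))
```

A real system would hand the labelled parts to a query engine; here they are simply returned as a dictionary.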

(2) To convert from one format to another. The term is often used as a substitute for the word "convert" when continuous strings of text are scanned to find embedded format codes that must be changed. In contrast, when data are moved between different databases, that is generally known as database "conversion," because the locations of the fields in a database record are easily identified and generally do not have to be searched (scanned) to be found.


1. To determine the syntactic structure of a sentence or other utterance (close to the standard English meaning). “That was the one I saw you.” “I can't parse that.”

2. More generally, to understand or comprehend. “It's very simple; you just kretch the glims and then aos the zotz.” “I can't parse that.”

3. Of fish, to have to remove the bones yourself. “I object to parsing fish”, means “I don't want to get a whole fish, but a sliced one is okay”. A parsed fish has been deboned. There is some controversy over whether unparsed should mean ‘bony’, or also mean ‘deboned’.


The process of determining the syntactic or grammatical structure of a sentence. The interesting philosophical point is that for natural languages, this process is extremely difficult to do explicitly. Yet ordinary speakers have no difficulty recognizing grammaticality, nor the way in which words and other features of sentences contribute to their structure. This raises the question of whether it is right to think in terms of tacit or implicit deployment of rules, or whether our ordinary knowledge is better thought of in some other way.


to analyse amino acid sequences or nucleotide base sequences during the construction of multiple sequence alignments when determining homology.

Random House Word Menu by Stephen Glazier
For a list of words related to parse, see:
  • Grammar and Usage - parse: (vb) analyze, describe, and grammatically label the parts of a sentence


In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar. Parsing can also be used as a linguistic term, for instance when discussing how phrases are divided up in garden path sentences.

Parsing is also an earlier term for the diagramming of sentences of natural languages, and is still used for the diagramming of inflected languages, such as the Romance languages or Latin. The term parsing comes from Latin pars (ōrātiōnis), meaning part (of speech).[1][2]

Parsing is a common term used in psycholinguistics when describing language comprehension. In this context, parsing refers to the way that human beings, rather than computers, analyze a sentence or phrase (in spoken language or text) "in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc." [3] This term is especially common when discussing what linguistic cues help speakers to parse garden-path sentences.

Parser

In computing, a parser is one of the components in an interpreter or compiler that checks for correct syntax and builds a data structure (often some kind of parse tree, abstract syntax tree or other hierarchical structure) implicit in the input tokens. The parser often uses a separate lexical analyser to create tokens from the sequence of input characters. Parsers may be programmed by hand or may be (semi-)automatically generated (in some programming languages) by a tool.

Human languages

In some machine translation and natural language processing systems, human languages are parsed by computer programs. Human sentences are not easily parsed by programs, as there is substantial ambiguity in the structure of human language, whose purpose is to convey meaning (or semantics) amongst a potentially unlimited range of possibilities, only some of which are germane to the particular case. So the utterance "Man bites dog" versus "Dog bites man" is definite on one detail, but in another language it might appear as "Man dog bites", relying on the larger context to distinguish between those two possibilities, if indeed that difference was of concern. It is difficult to prepare formal rules to describe informal behaviour, even though it is clear that some rules are being followed.

In order to parse natural language data, researchers must first agree on the grammar to be used. The choice of syntax is affected by both linguistic and computational concerns; for instance some parsing systems use lexical functional grammar, but in general, parsing for grammars of this type is known to be NP-complete. Head-driven phrase structure grammar is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in the Penn Treebank. Shallow parsing aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is dependency grammar parsing.

Most modern parsers are at least partly statistical; that is, they rely on a corpus of training data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. (See machine learning.) Approaches which have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy, and neural nets. Most of the more successful systems use lexical statistics (that is, they consider the identities of the words involved, as well as their part of speech). However, such systems are vulnerable to overfitting and require some kind of smoothing to be effective.[citation needed]

Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties as with manually designed grammars for programming languages. As mentioned earlier, some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not context-free, some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free grammars often rely on some variant of the CKY algorithm, usually with some heuristic to prune away unlikely analyses to save time. (See chart parsing.) However, some systems trade accuracy for speed using, e.g., linear-time versions of the shift-reduce algorithm. A somewhat recent development has been parse reranking, in which the parser proposes some large number of analyses and a more complex system selects the best option.
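To make the CKY idea concrete, here is a minimal recognizer in Python. It is a sketch under assumptions: the grammar must already be in Chomsky normal form, the tiny LEXICON and RULES tables are invented for illustration, and there is no pruning heuristic or probability model of the kind a practical natural-language parser would add.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (an assumption for this sketch).
LEXICON = {"the": {"Det"}, "dog": {"N"}, "man": {"N"}, "bites": {"V"}}
RULES = {
    ("Det", "N"): {"NP"},   # NP -> Det N
    ("V", "NP"): {"VP"},    # VP -> V NP
    ("NP", "VP"): {"S"},    # S  -> NP VP
}

def cky_recognize(words, lexicon=LEXICON, rules=RULES, start="S"):
    """Return True if the words can be derived from the start symbol."""
    n = len(words)
    # table[i][j] holds every nonterminal that derives words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(lexicon.get(word, ()))
    for span in range(2, n + 1):             # length of the substring
        for i in range(n - span + 1):        # start of the substring
            j = i + span
            for k in range(i + 1, j):        # split point
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= rules.get((left, right), set())
    return start in table[0][n]

print(cky_recognize("the dog bites the man".split()))
print(cky_recognize("bites the dog man".split()))
```

The three nested loops over span, start and split point are what give the algorithm its cubic running time in the sentence length.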

Programming languages

The most common use of a parser is as a component of a compiler or interpreter. This parses the source code of a computer programming language to create some form of internal representation. Programming languages tend to be specified in terms of a context-free grammar because fast and efficient parsers can be written for them. Parsers are written by hand or generated by parser generators.

Context-free grammars are limited in the extent to which they can express all of the requirements of a language. Informally, the reason is that the memory of such a language is limited. The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently. Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); later, the unwanted constructs can be filtered out.
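The two-step strategy described above can be sketched in Python as follows. The toy language (lines of the form "let NAME" or "use NAME") and both function names are assumptions made for this example: the first pass accepts any well-formed sequence of statements, and a separate pass filters out programs that use a name before declaring it.

```python
def parse_statements(source):
    """Relaxed pass: accept any sequence of 'let NAME' / 'use NAME' lines."""
    statements = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2 or parts[0] not in ("let", "use") or not parts[1].isidentifier():
            raise SyntaxError(f"line {line_no}: bad statement {line!r}")
        statements.append((parts[0], parts[1], line_no))
    return statements

def check_declarations(statements):
    """Later pass: report names that are used before they are declared."""
    declared = set()
    errors = []
    for kind, name, line_no in statements:
        if kind == "let":
            declared.add(name)
        elif name not in declared:
            errors.append(f"line {line_no}: {name!r} used before declaration")
    return errors

program = "let x\nuse x\nuse y"
print(check_declarations(parse_statements(program)))
```

The syntax pass never needs to remember which names exist; that bookkeeping lives entirely in the second pass.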

Overview of process

Flow of data in a typical parser

The following example demonstrates the common case of parsing a computer language with two levels of grammar: lexical and syntactic.

The first stage is the token generation, or lexical analysis, by which the input character stream is split into meaningful symbols defined by a grammar of regular expressions. For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2, each of which is a meaningful symbol in the context of an arithmetic expression. The lexer would contain rules to tell it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens like "12*" or "(3" will not be generated.
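A lexer for the calculator input above might look like the following Python sketch. The token pattern and the function name tokenize are assumptions for illustration; a real lexer would usually also record token types and source positions.

```python
import re

# One token is either a run of digits or a single operator/parenthesis.
TOKEN_RE = re.compile(r"\d+|[-+*/^()]")

def tokenize(text):
    """Split an arithmetic expression into a list of token strings."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1          # whitespace separates tokens but is not a token
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens

print(tokenize("12*(3+4)^2"))
```

Because digits are matched greedily and every operator is a token on its own, fragments such as "12*" or "(3" can never be produced.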

The next stage is parsing or syntactic analysis, which is checking that the tokens form an allowable expression. This is usually done with reference to a context-free grammar which recursively defines components that can make up an expression and the order in which they must appear. However, not all rules defining programming languages can be expressed by context-free grammars alone, for example type validity and proper declaration of identifiers. These rules can be formally expressed with attribute grammars.
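The syntactic stage can be sketched as a small recursive-descent parser in Python. The grammar in the comments is one plausible choice for the calculator example, not one given by the source, and representing the tree as nested tuples is likewise an assumption. The input is a list of token strings such as a lexer would produce.

```python
def parse_expression(tokens):
    """Check the tokens against the grammar below and return a nested-tuple tree.

    expr   -> term (('+' | '-') term)*
    term   -> factor (('*' | '/') factor)*
    factor -> base ('^' factor)?          (right-associative power)
    base   -> NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        node = base()
        if peek() == "^":
            advance()
            node = ("^", node, factor())
        return node

    def base():
        token = advance()
        if token is not None and token.isdigit():
            return int(token)
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_expression(["12", "*", "(", "3", "+", "4", ")", "^", "2"]))
```

Each grammar rule becomes one function, so the call structure mirrors the recursive definition of an expression.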

The final phase is semantic parsing or analysis, which is working out the implications of the expression just validated and taking the appropriate action. In the case of a calculator or interpreter, the action is to evaluate the expression or program; a compiler, on the other hand, would generate some kind of code. Attribute grammars can also be used to define these actions.
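For a calculator, the semantic action is evaluation. The Python sketch below assumes the syntax tree is a nested tuple of the form (operator, left, right) with integers at the leaves; that representation and the name evaluate are illustrative choices, not something the source specifies.

```python
import operator

# Map each operator token to the function that carries out its meaning.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}

def evaluate(node):
    """Evaluate a tree of (operator, left, right) tuples with numeric leaves."""
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    return OPERATIONS[op](evaluate(left), evaluate(right))

# Tree for 12*(3+4)^2, written out by hand for this example.
tree = ("*", 12, ("^", ("+", 3, 4), 2))
print(evaluate(tree))  # 588
```

A compiler would walk the same tree but emit instructions for each node instead of computing a value.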

Types of parser

The task of the parser is essentially to determine if and how the input can be derived from the start symbol of the grammar. This can be done in essentially two ways:

  • Top-down parsing - Top-down parsing can be viewed as an attempt to find left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand sides of grammar rules.[4]
  • Bottom-up parsing - A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parsing is shift-reduce parsing.

LL parsers and recursive-descent parsers are examples of top-down parsers which cannot accommodate left-recursive productions. Although it has been believed that simple implementations of top-down parsing cannot accommodate direct and indirect left recursion and may require exponential time and space complexity while parsing ambiguous context-free grammars, more sophisticated algorithms for top-down parsing have been created by Frost, Hafiz, and Callaghan[5][6] which accommodate ambiguity and left recursion in polynomial time and which generate polynomial-size representations of the potentially exponential number of parse trees. Their algorithm is able to produce both left-most and right-most derivations of an input with regard to a given CFG (context-free grammar).

An important distinction with regard to parsers is whether a parser generates a leftmost derivation or a rightmost derivation (see context-free grammar). LL parsers will generate a leftmost derivation and LR parsers will generate a rightmost derivation (although usually in reverse).[4]
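The difference between the two kinds of derivation can be seen by expanding the same parse tree in two orders. The Python sketch below is illustrative only: the tree encoding (a nonterminal paired with a list of children) and the function names are assumptions, and it replays a derivation from a finished tree rather than parsing anything.

```python
def label(item):
    """A subtree is (nonterminal, children); a terminal is a plain string."""
    return item[0] if isinstance(item, tuple) else item

def derivation(tree, leftmost=True):
    """List the sentential forms obtained by always expanding the
    leftmost (or rightmost) nonterminal still present."""
    form = [tree]
    steps = [label(tree)]
    while True:
        pending = [i for i, item in enumerate(form) if isinstance(item, tuple)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]   # replace the nonterminal by its children
        steps.append(" ".join(label(item) for item in form))

# Parse tree for "1 + 2" under the rules E -> E + E and E -> number.
TREE = ("E", [("E", ["1"]), "+", ("E", ["2"])])

print(" => ".join(derivation(TREE, leftmost=True)))
print(" => ".join(derivation(TREE, leftmost=False)))
```

Both runs end in the same sentence; only the order in which the nonterminals are rewritten differs.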

Lookahead

Lookahead establishes the maximum number of incoming tokens that a parser can use to decide which rule it should use. Lookahead is especially relevant to LL, LR, and LALR parsers, where it is often explicitly indicated by affixing the lookahead to the algorithm name in parentheses, such as LALR(1).

Most programming languages, the primary target of parsers, are carefully defined in such a way that a parser with limited lookahead, typically one, can parse them, because parsers with limited lookahead are often more efficient. One important change[citation needed] to this trend came in 1990 when Terence Parr created ANTLR for his Ph.D. thesis, a parser generator for efficient LL(k) parsers, where k is any fixed value.

Parsers typically have only a few actions after seeing each token. They are shift (add this token to the stack for later reduction), reduce (pop tokens from the stack and form a syntactic construct), end, error (no known rule applies) or conflict (does not know whether to shift or reduce).

Lookahead has two advantages.

  • It helps the parser take the correct action in case of conflicts, for example when parsing an if statement that may be followed by an else clause.
  • It eliminates many duplicate states and eases the burden of an extra stack. A C language non-lookahead parser will have around 10,000 states. A lookahead parser will have around 300 states.

Example: Parsing the Expression 1 + 2 * 3

The set of expression-parsing rules (called a grammar) is as follows:

  • Rule1: E → E + E (an expression is the sum of two expressions)
  • Rule2: E → E * E (an expression is the product of two expressions)
  • Rule3: E → number (an expression is a simple number)
  • Rule4: + has less precedence than *

Most programming languages (except for a few such as APL and Smalltalk) and algebraic formulas give higher precedence to multiplication than addition, in which case the correct interpretation of the example above is (1 + (2*3)). Note that Rule4 above is a semantic rule. It is possible to rewrite the grammar to incorporate this into the syntax. However, not all such rules can be translated into syntax.

Simple non-lookahead parser actions
  1. Reduce 1 to expression E on input 1 based on rule3.
  2. Shift + onto stack on input + in anticipation of rule1.
  3. Reduce stack element 2 to Expression E based on rule3.
  4. Reduce stack items E+ and new input E to E based on rule1.
  5. Shift * onto stack on input * in anticipation of rule2.
  6. Shift 3 onto stack on input 3 in anticipation of rule3.
  7. Reduce 3 to Expression E on input 3 based on rule3.
  8. Reduce stack items E* and new input E to E based on rule2.

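The parse tree and the code resulting from it are not correct according to the language semantics: the addition is reduced before the multiplication, so the expression is treated as (1 + 2) * 3.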
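The eager behaviour traced in the steps above can be reproduced with a short Python sketch. It is a simplification under assumptions: numbers are treated as already reduced to E by rule3 at the moment they are shifted, trees are nested tuples, and the function name greedy_shift_reduce is invented for this example.

```python
OPERATORS = ("+", "*")

def is_expression(item):
    """Anything on the stack that is not an operator string counts as an E."""
    return not isinstance(item, str)

def greedy_shift_reduce(tokens):
    """Shift one token at a time and reduce E op E as soon as it appears,
    without ever looking at the next token."""
    stack = []
    for token in tokens:
        # Shift; a number is reduced to E immediately (rule3).
        stack.append(int(token) if token.isdigit() else token)
        while (len(stack) >= 3 and is_expression(stack[-1])
               and stack[-2] in OPERATORS and is_expression(stack[-3])):
            right = stack.pop()
            op = stack.pop()
            left = stack.pop()
            stack.append((op, left, right))   # rule1 or rule2
    if len(stack) != 1 or not is_expression(stack[0]):
        raise SyntaxError("input is not a complete expression")
    return stack[0]

print(greedy_shift_reduce(["1", "+", "2", "*", "3"]))
```

The result groups the addition first, which corresponds to the wrong value 9 rather than 7.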

To correctly parse without lookahead, there are three solutions:

  • The user has to enclose expressions within parentheses. This often is not a viable solution.
  • The parser needs to have more logic to backtrack and retry whenever a rule is violated or not complete. A similar method is followed in LL parsers.
  • Alternatively, the parser or grammar needs to have extra logic to delay reduction and reduce only when it is absolutely sure which rule to reduce first. This method is used in LR parsers. This correctly parses the expression but with many more states and increased stack depth.

Lookahead parser actions
  1. Shift 1 onto stack on input 1 in anticipation of rule3. It does not reduce immediately.
  2. Reduce stack item 1 to simple Expression on input + based on rule3. The lookahead is +, so we are on path to E +, so we can reduce the stack to E.
  3. Shift + onto stack on input + in anticipation of rule1.
  4. Shift 2 onto stack on input 2 in anticipation of rule3.
  5. Reduce stack item 2 to Expression on input * based on rule3. The lookahead * expects only E before it.
  6. Now the stack has E + E and the input is still *. The parser has two choices: shift based on rule2, or reduce based on rule1. Since * has more precedence than + according to rule4, it shifts * onto the stack in anticipation of rule2.
  7. Shift 3 onto stack on input 3 in anticipation of rule3.
  8. Reduce stack item 3 to Expression after seeing end of input based on rule3.
  9. Reduce stack items E * E to E based on rule2.
  10. Reduce stack items E + E to E based on rule1.

The parse tree generated is correct, and the parser is more efficient[citation needed] than a non-lookahead parser. This is the strategy followed in LALR parsers.
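The lookahead steps above can be imitated in Python with one token of lookahead and a precedence table standing in for rule4. This is a hand-written sketch, not an LALR parser: the PRECEDENCE table, the tuple tree format, and the function name are assumptions, and numbers are treated as reduced to E by rule3 when they are shifted.

```python
PRECEDENCE = {"+": 1, "*": 2}   # rule4: + has less precedence than *

def is_expression(item):
    """Anything on the stack that is not an operator string counts as an E."""
    return not isinstance(item, str)

def lookahead_shift_reduce(tokens):
    """Shift-reduce parse that consults the next token before reducing."""
    stack = []
    for i, token in enumerate(tokens):
        stack.append(int(token) if token.isdigit() else token)   # shift
        lookahead = tokens[i + 1] if i + 1 < len(tokens) else None
        while (len(stack) >= 3 and is_expression(stack[-1])
               and stack[-2] in PRECEDENCE and is_expression(stack[-3])):
            # Delay the reduction if the upcoming operator binds more tightly.
            if lookahead in PRECEDENCE and PRECEDENCE[lookahead] > PRECEDENCE[stack[-2]]:
                break
            right = stack.pop()
            op = stack.pop()
            left = stack.pop()
            stack.append((op, left, right))   # rule1 or rule2
    if len(stack) != 1 or not is_expression(stack[0]):
        raise SyntaxError("input is not a complete expression")
    return stack[0]

print(lookahead_shift_reduce(["1", "+", "2", "*", "3"]))
```

With E + E on the stack and * as the lookahead, the loop breaks instead of reducing, which is the shift decision described in step 6.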

References

  1. ^ "Bartleby.com homepage". http://www.bartleby.com/61/33/P0083300.html. Retrieved 28 November 2010. 
  2. ^ "parse". dictionary.reference.com. http://dictionary.reference.com/search?q=parse&x=0&y=0. Retrieved 27 November 2010. 
  3. ^ "parse". dictionary.reference.com. http://dictionary.reference.com/browse/parse. Retrieved 27 November 2010. 
  4. ^ a b Aho, A.V., Sethi, R. and Ullman, J.D. (1986) "Compilers: principles, techniques, and tools." Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA.
  5. ^ Frost, R., Hafiz, R. and Callaghan, P. (2007) "Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars." 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE, Pages: 109-120, June 2007, Prague.
  6. ^ Frost, R., Hafiz, R. and Callaghan, P. (2008) "Parser Combinators for Ambiguous Left-Recursive Grammars." 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN, Volume 4902/2008, Pages: 167-181, January 2008, San Francisco.
  7. ^ shproto.org

Dansk (Danish)
v. tr. - analysere (grammatik)
v. intr. - analyseres
n. - analyse

Nederlands (Dutch)
ontleden (zin), nauwkeurig analyseren, reeks ontleden en analyseren (computer)

Français (French)
v. tr. - (Ling) faire l'analyse grammaticale de, (Comput) analyser
v. intr. - (Ling) faire l'analyse grammaticale de
n. - analyse grammaticale

Deutsch (German)
v. - grammatisch beschreiben od. analysieren
n. - grammatikalische Analyse

Ελληνική (Greek)
v. - (γραμμ.) τεχνολογώ, (Η/Υ) "φρασεολογώ" πρόγραμμα ή εντολή

Italiano (Italian)
analizzare

Português (Portuguese)
v. - analisar (Gram.)

Русский (Russian)
делать грамматический разбор

Español (Spanish)
v. tr. - analizar gramaticalmente
v. intr. - ser analizado gramaticalmente
n. - análisis gramatical

Svenska (Swedish)
v. - analysera, ta ut satsdelar

中文(简体)(Chinese (Simplified))
解析, 符合语法, 分列

中文(繁體)(Chinese (Traditional))
v. tr. - 解析
v. intr. - 符合語法
n. - 分列

한국어 (Korean)
v. tr. - 품사 및 문법적 관계를 설명하다, 해부하다, 뼈를 발라내다
v. intr. - 문장의 품사 및 문법적 관계를 설명하다
n. - 품사분석

日本語 (Japanese)
v. - 解剖する

العربيه (Arabic)
‏(فعل) يعرب, تعرب الكلمه‏

עברית (Hebrew)
v. tr. - ‮ניתח מלה מבחינה דיקדוקית ותחבירית, ניתח משפט‬
v. intr. - ‮איפשר לנתח דיקדוקית‬
n. - ‮ניתוח מילה או משפט‬

