

Journal Of Harmonized Research (JOHR)

Journal Of Harmonized Research in Engineering 1(2), 2013, 73-79 ISSN 2347 – 7393

Original Research Article

PARSING: PROCESS OF ANALYZING WITH THE RULES OF A FORMAL GRAMMAR

Nikita Chhillar, Nisha Yadav, Neha Jaiswal

Department of Computer Science and Engineering, Dronacharya College of Engineering, Khentawas, Farukhnagar, Gurgaon, India

Abstract: Parsing is the process of structuring a linear representation in accordance with a given grammar. The "linear representation" may be a sentence, a computer program, a knitting pattern, a sequence of geological strata, a piece of music, actions in ritual behavior; in short, any linear sequence in which the preceding elements in some way restrict the next element. For some of these examples the grammar is well known, for some it is an object of research, and for some our notion of a grammar is only just beginning to take shape.

Keywords: Parsing, Grammar, Bottom-up parsing, Top-down parsing, Parser.

Introduction:
Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural language or in a computer language, according to the rules of a formal grammar. The term parsing comes from Latin pars, meaning part (of speech).
The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. Within computational linguistics the term is used to refer to the formal analysis by computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information.
The term is also used in psycholinguistics when describing language comprehension.

In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) "in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc." This term is especially common when discussing what linguistic cues help speakers to interpret garden-path sentences.

For Correspondence:
nikitachhillarATyahoo.com
Received on: October 2013
Accepted after revision: December 2013
Downloaded from: www.johronline.com


Within computer science, the term is used in the analysis of computer languages, referring to the syntactic analysis of the input code into its component parts in order to facilitate the writing of compilers and interpreters.

1) Traditional methods:
The traditional grammatical exercise of parsing, sometimes known as clause analysis, involves splitting a text into its component parts of speech with an explanation of the form, purpose, and syntactic relationship of each part. This is determined in large part from study of the language's conjugations and declensions, which can be quite complex for heavily inflected languages. To parse a phrase such as 'Kittu saw monkey' involves noting that the singular noun 'Kittu' is the subject of the sentence, the verb 'saw' is the third person singular of the past tense of the verb 'to see', and the singular noun 'monkey' is the object of the sentence. Techniques such as sentence diagrams are used to indicate the relations between elements in the sentence.

2) Computational methods:
In some machine translation and natural language processing systems, written texts in human languages are parsed by computer programs. Human sentences are not easily parsed by programs, as there is substantial ambiguity in the structure of human language, whose usage is to convey meaning (or semantics) across a potentially unlimited range of possibilities, only some of which are germane to the particular case. So an utterance "Kittu saw monkey" versus "Monkey saw Kittu" is definite on one detail, but in another language might appear as "Kittu monkey saw", with a reliance on the larger context to distinguish between those two possibilities, if indeed that difference was of concern. It is difficult to prepare formal rules to describe informal behavior, even though it is clear that some rules are being followed.
To parse natural language data, researchers must first agree on the grammar to be used. The selection is affected by both linguistic and computational concerns; for example, some parsing systems make use of lexical functional grammar, whereas in general, parsing for grammars of this type is identified as NP-complete. Head-driven phrase structure grammar is another linguistic formalism that has been accepted in the parsing community, but further research efforts have focused on simpler formalisms such as the one used in the Penn Treebank. Shallow parsing aims to locate only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is dependency grammar parsing.
Most modern parsers are at least partially statistical; that is, they rely on a body of training data which has already been annotated (parsed by hand). This approach permits the system to gather information about the frequency with which different constructions occur in specific contexts. Approaches which have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy, and neural nets. Most of the successful systems use lexical statistics (that is, they consider the identities of the words involved, as well as their part of speech). However, such systems are vulnerable to overfitting and need some kind of smoothing to be effective.
Parsing algorithms used for natural language cannot rely on the grammar having 'good' properties as with manually designed grammars for programming languages. As mentioned before, some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not context-free, some type of context-free approximation to the grammar is used to perform a first pass. Algorithms which make use of context-free grammars often rely on some variant of the CKY algorithm, usually with some heuristic to prune away unlikely analyses and save time. However, some systems trade speed for accuracy using, for example, linear-time versions of the shift-reduce algorithm. A recent development has been parse reranking, in which the parser proposes some large number of analyses and a more complex system picks the best option.
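As an illustration of the CKY-style chart parsing mentioned above, the following is a minimal recognizer for a toy grammar in Chomsky normal form. The grammar, the category names, and the example sentence are assumptions made for demonstration and are not taken from the paper; a probabilistic (PCFG) version would simply attach weights to the rules and keep the best-scoring analyses.

```python
# Minimal CKY recognition for a toy grammar in Chomsky normal form.
# Grammar and sentence are illustrative only.

from itertools import product

# Rules in CNF: A -> B C (binary) or A -> 'word' (lexical).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}
LEXICAL_RULES = {
    "Kittu": {"NP"},
    "monkey": {"NP"},
    "saw": {"V"},
}

def cky_recognize(words, start="S"):
    n = len(words)
    # chart[i][j] holds the non-terminals that can span words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICAL_RULES.get(word, set()))
    for span in range(2, n + 1):                  # width of the span
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # split point
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY_RULES.get((b, c), set())
    return start in chart[0][n]

print(cky_recognize(["Kittu", "saw", "monkey"]))  # True
print(cky_recognize(["saw", "Kittu"]))            # False
```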



3) Overview of process:
The following example demonstrates the general case of parsing a computer language with two levels of grammar: lexical and syntactic.
The first step is token generation, or lexical analysis, in which the input character stream is split into meaningful symbols defined by a grammar of regular expressions. For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2, each of which is a meaningful symbol in the context of an arithmetic expression. The lexer would contain rules telling it that the characters *, +, ^, ( and ) mark the start of a new token, so meaningless tokens such as "12*" or "(3" will not be generated.
The next stage is parsing or syntactic analysis, which checks that the tokens form an allowable expression. This is generally done with reference to a context-free grammar which recursively defines the parts that can make up an expression and the order in which they must appear. However, not all rules defining programming languages can be expressed by context-free grammars alone, for example type validity and proper declaration of identifiers. These rules can be formally expressed with attribute grammars.
The last phase is semantic parsing or analysis, which works out the implications of the expression just validated and takes the suitable action. A calculator or interpreter would evaluate the expression or program; a compiler, on the other hand, would generate some kind of code. Attribute grammars can also be used to describe these actions.

Types of parsing:
The task of the parser is basically to determine how the input can be derived from the start symbol of the grammar. This can be done in basically two ways:
• Top-down parsing - Top-down parsing can be viewed as an attempt to find the left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding every alternative right-hand side of the grammar rules.
• Bottom-up parsing - A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parser is shift-reduce parsing.
LL parsers and recursive-descent parsers are examples of top-down parsers that cannot accommodate left-recursive production rules. Although it has been assumed that simple implementations of top-down parsing cannot accommodate direct and indirect left recursion and may require exponential time and space complexity while parsing ambiguous context-free grammars, more sophisticated algorithms for top-down parsing have been produced by Frost, Hafiz, and Callaghan which accommodate ambiguity and left recursion in polynomial time and which generate polynomial-size representations of the potentially exponential number of parse trees. Their algorithm is able to produce both left-most and right-most derivations of an input with respect to a given CFG (context-free grammar).
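A minimal sketch of the lexical-analysis step described above, assuming calculator-style token definitions given as regular expressions. The token names and patterns here are illustrative choices, not taken from the paper.

```python
# Split the calculator input from the text into tokens using regular expressions.

import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),      # integer literals such as 12
    ("OP",     r"[*+^]"),    # the operators mentioned in the text
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),      # whitespace, discarded
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split the input character stream into meaningful symbols (tokens)."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(tokenize("12*(3+4)^2"))
# [('NUMBER', '12'), ('OP', '*'), ('LPAREN', '('), ('NUMBER', '3'),
#  ('OP', '+'), ('NUMBER', '4'), ('RPAREN', ')'), ('OP', '^'), ('NUMBER', '2')]
```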



An important distinction with regard to parsers is whether a parser produces a leftmost derivation or a rightmost derivation. LL parsers will produce a leftmost derivation and LR parsers will produce a rightmost derivation (although usually in reverse).

Top-down parsing:
Top-down parsing is a parsing approach where one begins with the topmost level of the parse tree and works downward using the rewriting rules of a formal grammar. LL parsers are a kind of parser that makes use of the top-down parsing approach.
• Top-down parsing is a strategy of analyzing unknown data relationships by hypothesizing general parse tree structures and then considering whether the known fundamental structures are compatible with the hypothesis. It occurs in the analysis of both natural languages and computer languages.
• Top-down parsing can be viewed as an attempt to find the left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand sides of grammar rules.
• Simple implementations of top-down parsing do not terminate for left-recursive grammars, and top-down parsing may have exponential time complexity with respect to the length of the input for ambiguous CFGs. However, more complicated top-down parsers have been created by Frost, Hafiz, and Callaghan which do accommodate ambiguity and left recursion in polynomial time and which generate polynomial-sized representations of the potentially exponential number of parse trees.

Application:
A compiler parses input from a programming language to assembly language or an internal representation by matching the incoming symbols to production rules. Production rules are defined using Backus-Naur form. An LL parser is a kind of parser that does top-down parsing by applying each production rule to the incoming symbols, working from the left-most symbol yielded by a production rule and then proceeding to the next production rule for every non-terminal symbol encountered. In this way the parsing begins on the left of the result side (right-hand side) of the production rule and evaluates non-terminals from the left first, thus moving down the parse tree for every new non-terminal before continuing to the next symbol of a production rule. For example, given a set of production rules, the parser would match the first alternative of a rule and attempt to match the following symbols in turn; if a symbol cannot be matched, the next alternative would be tried. As one may suppose, some languages are more ambiguous than others. For a non-ambiguous language in which all productions for a non-terminal produce distinct strings, the string produced by one production will not begin with the same symbol as the string produced by another production. A non-ambiguous language may be parsed by an LL(1) grammar, where the (1) signifies that the parser reads ahead one token at a time. For an ambiguous language to be parsed by an LL parser, the parser must look ahead more than one symbol, e.g. LL(3).
The common way out of this problem is to use an LR parser, which is a kind of shift-reduce parser and does bottom-up parsing.

Accommodating left recursion in top-down parsing:
A formal grammar which contains left recursion cannot be parsed by a naive recursive-descent parser unless it is converted to a weakly equivalent right-recursive form. However, recent research reveals that it is possible to accommodate left-recursive grammars (along with all other forms of general CFGs) in a more sophisticated top-down parser by use of curtailment.
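For illustration, a left-recursive rule such as expr -> expr '+' term can be rewritten into the weakly equivalent right-recursive/iterative form expr -> term ('+' term)*, which a naive top-down parser handles without looping. The following is a minimal recursive-descent sketch of that idea for an assumed toy expression grammar; the grammar and function names are illustrative, not taken from the paper.

```python
# Recursive-descent (top-down, LL(1)-style) parsing with left recursion
# eliminated: expr -> term ('+' term)* ; term -> factor ('*' factor)* ;
# factor -> NUMBER | '(' expr ')'

def parse_expr(tokens, pos=0):
    """expr -> term ('+' term)* ; returns (value, next_position)."""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        rhs, pos = parse_term(tokens, pos + 1)
        value += rhs
    return value, pos

def parse_term(tokens, pos):
    """term -> factor ('*' factor)*"""
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        rhs, pos = parse_factor(tokens, pos + 1)
        value *= rhs
    return value, pos

def parse_factor(tokens, pos):
    """factor -> NUMBER | '(' expr ')' ; one token of lookahead picks the rule."""
    if tokens[pos] == "(":
        value, pos = parse_expr(tokens, pos + 1)
        assert tokens[pos] == ")", "expected closing parenthesis"
        return value, pos + 1
    return int(tokens[pos]), pos + 1

print(parse_expr(["2", "*", "(", "3", "+", "4", ")"]))  # (14, 7)
```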


A detection algorithm which accommodates ambiguous grammars and curtails an ever-growing direct left-recursive parse by imposing depth limitations with respect to input length and current input position is described by Frost and Hafiz in 2006. That algorithm was extended to a complete parsing algorithm which accommodates indirect (by comparing previously computed context with the current context) as well as direct left recursion in polynomial time, and which generates compact polynomial-size representations of the potentially exponential number of parse trees for highly ambiguous grammars, by Frost, Hafiz and Callaghan in 2007. The algorithm has since been implemented as a set of parser combinators written in the Haskell programming language.

Time and space complexity of top-down parsing:
When a top-down parser tries to parse an ambiguous input with respect to an ambiguous CFG, it may require an exponential number of steps (with respect to the length of the input) to try all alternatives of the CFG in order to produce all possible parse trees, which in due course would require exponential memory space. The problem of exponential time complexity in top-down parsers constructed as sets of mutually recursive functions was addressed by Norvig in 1991. His technique is similar to the use of dynamic programming and state sets in Earley's algorithm (1970), and tables in the CYK algorithm of Cocke, Younger and Kasami. The main idea is to store the result of applying a parser p at position j in a memo table and to reuse that result whenever the same situation arises. Frost, Hafiz and Callaghan also use memoization to avoid redundant computations, accommodating any form of CFG in polynomial time (Θ(n^4) for left-recursive grammars and Θ(n^3) for non-left-recursive grammars). Their top-down parsing algorithm also requires only polynomial space for potentially exponentially ambiguous parse trees, by means of 'compressed representation' and 'local grouping'.

Bottom-up parsing:
Parsing reveals the grammatical structure of linear input text, as a primary step in working out its meaning. Bottom-up parsing discovers and processes the text's lowest-level small elements first, before its mid-level structures, leaving the highest-level overall structure to last. Bottom-up parsing works in the reverse direction from top-down parsing. A top-down parser starts with the start symbol at the top of the parse tree and works downward, applying productions in the forward direction until it reaches the terminal leaves. A bottom-up parse begins with the string of terminals itself and builds from the leaves upward, working in reverse towards the start symbol by applying the productions backwards. A bottom-up parser searches for substrings of the working string which match the right-hand side of some production. When it finds such a substring, it substitutes the left-hand-side non-terminal for the matching right-hand side. The goal is to reduce all the way to the start symbol and report a successful parse. Generally, bottom-up parsing algorithms are more powerful than top-down methods but, not surprisingly, the constructions required are also more complex. It is difficult to write a bottom-up parser by hand for anything but a trivial grammar, but fortunately there are excellent parser generator tools that build a parser from an input specification, not unlike the way lex builds a scanner from its specification. Shift-reduce parsing is the most commonly used and most powerful of the bottom-up techniques. Bottom-up parsing takes as input a stream of tokens and develops the list of productions used to build the parse tree, but the productions are discovered in the reverse order relative to a top-down parser. Like a table-driven predictive parser, it makes use of a stack to keep track of the position in the parse and a parsing table to determine what to do next.

Bottom-up Versus Top-down:
The bottom-up name originally comes from the concept of a parse tree, in which the most detailed parts are at the bushy bottom of the (upside-down) tree, and larger structures composed from them appear in successively higher layers, until at the top or "root" of the tree a single unit describes the entire input stream. A bottom-up parse processes that tree starting from the bottom left end and incrementally works its way upwards and rightwards. A parser may act on the structure hierarchy's low, mid, and highest levels without ever creating an actual data tree; the tree is then merely implicit in the parser's actions.
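The memoisation idea described above, storing the result of applying a parser at a given input position and reusing it whenever the same situation recurs, can be sketched roughly as follows. The grammar, the names, and the representation are illustrative assumptions, not the algorithm from the cited papers, and a left-recursive rule would additionally need the curtailment discussed earlier.

```python
# Memoised top-down recognition: results are cached per (symbol, position).
# NOTE: this naive sketch does not handle left-recursive rules; those need
# the curtailment technique described in the text.

GRAMMAR = {
    # Each non-terminal maps to a list of alternative right-hand sides.
    "S":  [["NP", "VP"]],
    "NP": [["Kittu"], ["monkey"]],
    "VP": [["saw", "NP"]],
}

def make_recognizer(grammar):
    memo = {}  # (symbol, position) -> set of end positions

    def parse(symbol, words, pos):
        """Return every position at which `symbol` can end, starting at `pos`."""
        key = (symbol, pos)
        if key in memo:                       # reuse a previously computed result
            return memo[key]
        if symbol not in grammar:             # terminal: must match the next word
            ends = {pos + 1} if pos < len(words) and words[pos] == symbol else set()
        else:
            ends = set()
            for rhs in grammar[symbol]:
                positions = {pos}
                for part in rhs:              # thread positions through the RHS
                    positions = {e for p in positions for e in parse(part, words, p)}
                ends |= positions
        memo[key] = ends
        return ends

    return parse

parse = make_recognizer(GRAMMAR)
words = ["Kittu", "saw", "monkey"]
print(len(words) in parse("S", words, 0))  # True: the whole input derives from S
```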



Bottom-up parsing lazily waits until it has scanned and parsed all parts of some construct before committing to what the combined construct is.
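A naive illustration of this reduce-to-the-start-symbol behaviour, under an assumed toy grammar; a practical shift-reduce parser would instead drive the reductions with a stack and a parsing table, as noted above, rather than this blind search.

```python
# Repeatedly find a substring of the working string that matches the
# right-hand side of some production and replace it with the left-hand-side
# non-terminal, until only the start symbol remains. Toy grammar only.

PRODUCTIONS = [
    ("NP", ["Kittu"]),
    ("NP", ["monkey"]),
    ("VP", ["saw", "NP"]),
    ("S",  ["NP", "VP"]),
]

def bottom_up_parse(words, start="S"):
    symbols = list(words)
    while symbols != [start]:
        for lhs, rhs in PRODUCTIONS:
            # search for an occurrence of the right-hand side
            for i in range(len(symbols) - len(rhs) + 1):
                if symbols[i:i + len(rhs)] == rhs:
                    symbols[i:i + len(rhs)] = [lhs]   # reduce
                    print("reduce:", " ".join(rhs), "->", lhs, "giving", symbols)
                    break
            else:
                continue
            break
        else:
            return False    # no reduction applies: the parse fails
    return True

print(bottom_up_parse(["Kittu", "saw", "monkey"]))  # True
```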

[Figure: typical parse tree, with bottom-up parse steps and top-down parse steps, for the input A = B + *2; D = 1]

The opposite of bottom-up parsing methods are top-down parsing methods, in which the input's overall structure is decided (or guessed at) first, before dealing with the mid-level parts, leaving the lowest-level small details to last. A top-down parse processes the hierarchical tree starting from the top and incrementally works its way downwards and rightwards. Top-down parsing eagerly decides what a construct is much earlier, when it has only scanned the leftmost symbols of that construct and has not yet parsed any of its parts. Left corner parsing is a hybrid method which works bottom-up along the left edge of each subtree, and top-down on the rest of the parse tree.
If a grammar has multiple rules which start with the same leftmost symbol but have different endings, then the grammar can be handled by a deterministic bottom-up parse but cannot be handled top-down without guesswork and backtracking. So bottom-up parsers handle a larger range of computer language grammars than deterministic top-down parsers do. Bottom-up parsing is occasionally done by backtracking. But generally, bottom-up parsing is done by a shift-reduce parser such as a LALR parser.

4) Top-down parsers:
Parsers which use top-down parsing are:
• Recursive descent parser
• LL parser (Left-to-right, Leftmost derivation)

5) Bottom-up parsers:
Parsers which use bottom-up parsing are:
• Precedence parser
  o Operator-precedence parser
  o Simple precedence parser
• BC (bounded context) parsing
• LR parser (Left-to-right, Rightmost derivation)
  o Simple LR (SLR) parser
  o LALR parser
  o Canonical LR (LR(1)) parser
  o GLR parser
• CYK parser
• Recursive ascent parser

Conclusions:
Parsing splits a sequence of characters or letters into smaller parts. Parsing is also used for recognizing characters or letters that occur in a specific order. In addition to providing a strong, readable, and maintainable approach to pattern matching, parsing techniques enable you to create your own custom languages for specific purposes.

References:



1. Aho, Alfred V.; Sethi, Ravi; Ullman, Jeffrey D. (1986). Compilers: Principles, Techniques, and Tools (Repr. with corrections ed.). Addison-Wesley Pub. Co. ISBN 978-0201100884.
2. Aho, Alfred V.; Ullman, Jeffrey D. (1972). The Theory of Parsing, Translation, and Compiling (Volume 1: Parsing) (Repr. ed.). Englewood Cliffs, NJ: Prentice-Hall. ISBN 978-0139145568.
3. Frost, R., Hafiz, R. and Callaghan, P. (2007) "Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars." 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE, pages 109-120, June 2007, Prague.
4. Frost, R., Hafiz, R. and Callaghan, P. (2008) "Parser Combinators for Ambiguous Left-Recursive Grammars." 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN, Volume 4902/2008, pages 167-181, January 2008, San Francisco.
5. Frost, R. and Hafiz, R. (2006) "A New Top-Down Parsing Algorithm to Accommodate Ambiguity and Left Recursion in Polynomial Time." ACM SIGPLAN Notices, Volume 41, Issue 5, pages 46-54.
6. Norvig, P. (1991) "Techniques for automatic memoisation with applications to context-free parsing." Computational Linguistics, Volume 17, Issue 1, pages 91-98.
7. Tomita, M. (1985) "Efficient Parsing for Natural Language." Kluwer, Boston, MA.
8. "Bartleby.com homepage". Retrieved 28 November 2010.
9. "Parse". Dictionary.reference.com. Retrieved 27 November 2010.
10. "Grammar and Composition".
11. Aho, A.V., Sethi, R. and Ullman, J.D. (1986) "Compilers: Principles, Techniques, and Tools." Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
12. Frost, R., Hafiz, R. and Callaghan, P. (2007) "Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars." 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE, pages 109-120, June 2007, Prague.
13. Frost, R., Hafiz, R. and Callaghan, P. (2008) "Parser Combinators for Ambiguous Left-Recursive Grammars." 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN, Volume 4902/2008, pages 167-181, January 2008, San Francisco.
14. shproto.org
15. Compilers: Principles, Techniques, and Tools (2nd Edition), by Alfred Aho, Monica Lam, Ravi Sethi, and Jeffrey Ullman, Prentice Hall, 2006.
