International Journal of Research (IJR) Vol-1, Issue-8, September 2014 ISSN 2348-6848

Parsing – A Brief Study

Tanya Sharma, Sumit Das & Vishal Bhalla

Research Scholars, Computer science and engineering department, MDU Rohtak, India

ABSTRACT

Parsing is a technique to determine how a string might be derived using productions (rewrite rules) of a given grammar. It can be used to check whether or not a string belongs to a given language. When a statement written in a programming language is input, it is parsed by a compiler to check whether or not it is syntactically correct and to extract its components if it is correct. Finding an efficient parser is a nontrivial problem, and a great deal of research has been conducted on parser design. This paper serves the purpose of elaborating the concept of parsing.

1) INTRODUCTION

Parsing or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech). The term has slightly different meanings in different branches of linguistics and computer science.

Within computational linguistics the term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information.

The term is also used in psycholinguistics when describing language comprehension. In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) "in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc." This term is especially common when discussing what linguistic cues help speakers to interpret garden-path sentences.

Within computer science, the term is used in the analysis of computer languages, referring to the syntactic analysis of the input code into its component parts in order to facilitate the writing of compilers and interpreters.

2) NATURAL LANGUAGE PARSING

2.1) TRADITIONAL METHODS

Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

The traditional grammatical exercise of parsing, sometimes known as clause analysis, involves breaking down a text into its component parts of speech with an explanation of the form, function, and syntactic relationship of each part. This is determined in large part from study of the language's conjugations and declensions, which can be quite intricate



for heavily inflected languages. To parse a phrase such as 'man bites dog' involves noting that the singular noun 'man' is the subject of the sentence, the verb 'bites' is the third person singular of the present tense of the verb 'to bite', and the singular noun 'dog' is the object of the sentence. Techniques such as sentence diagrams are sometimes used to indicate the relation between elements in the sentence.

Parsing was formerly central to the teaching of grammar throughout the English-speaking world, and widely regarded as basic to the use and understanding of written language. However, the teaching of such techniques is no longer current.

2.2) COMPUTATIONAL METHODS

In some machine translation and natural language processing systems, written texts in human languages are parsed by computer programs. Human sentences are not easily parsed by programs, as there is substantial ambiguity in the structure of human language, whose usage is to convey meaning (or semantics) amongst a potentially unlimited range of possibilities, only some of which are germane to the particular case. So an utterance "Man bites dog" versus "Dog bites man" is definite on one detail but in another language might appear as "Man dog bites" with a reliance on the larger context to distinguish between those two possibilities, if indeed that difference was of concern. It is difficult to prepare formal rules to describe informal behaviour even though it is clear that some rules are being followed.

In order to parse natural language data, researchers must first agree on the grammar to be used. The choice of syntax is affected by both linguistic and computational concerns; for instance some parsing systems use lexical functional grammar, but in general, parsing for grammars of this type is known to be NP-complete. Head-driven phrase structure grammar is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in the Penn Treebank. Shallow parsing aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is dependency grammar parsing.

Most modern parsers are at least partly statistical; that is, they rely on a corpus of training data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. Approaches which have been used include straightforward PCFGs (probabilistic context-free grammars), maximum entropy, and neural nets. Most of the more successful systems use lexical statistics (that is, they consider the identities of the words involved, as well as their part of speech). However such systems are vulnerable to over-fitting and require some kind of smoothing to be effective.

Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties as with manually designed grammars for programming languages. As mentioned earlier, some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not context-free, some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free



grammars often rely on some variant of the CKY algorithm, usually with some heuristic to prune away unlikely analyses to save time. However some systems trade speed for accuracy using, e.g., linear-time versions of the shift-reduce algorithm. A somewhat recent development has been parse re-ranking, in which the parser proposes some large number of analyses, and a more complex system selects the best option.

3) COMPUTER LEVEL PARSING

3.1) PARSER

A parser is a software component that takes input data (frequently text) and builds a data structure – often some kind of parse tree or other hierarchical structure – giving a structural representation of the input, checking for correct syntax in the process. The parsing may be preceded or followed by other steps, or these may be combined into a single step. The parser is often preceded by a separate lexical analyser, which creates tokens from the sequence of input characters; alternatively, these can be combined in scannerless parsing. Parsers may be programmed by hand or may be automatically or semi-automatically generated by a parser generator. Parsing is complementary to templating, which produces formatted output. These may be applied to different domains, but often appear together, such as the scanf/printf pair, or the input (front end parsing) and output (back end code generation) stages of a compiler.

The input to a parser is often text in some computer language, but may also be text in a natural language or less structured textual data, in which case generally only certain parts of the text are extracted, rather than a parse tree being constructed. Parsers range from very simple functions such as scanf, to complex programs such as the frontend of a C++ compiler or the HTML parser of a web browser. An important class of simple parsing is done using regular expressions, where a regular expression defines a regular language, and then the regular expression engine automatically generates a parser for that language, allowing pattern matching and extraction of text. In other contexts regular expressions are instead used prior to parsing, as the lexing step whose output is then used by the parser.

The use of parsers varies by input. In the case of data languages, a parser is often found as the file reading facility of a program, such as reading in HTML or XML text; these examples are markup languages. In the case of programming languages, a parser is a component of a compiler or interpreter, which parses the source code of a computer programming language to create some form of internal representation; the parser is a key step in the compiler frontend.

Programming languages tend to be specified in terms of a deterministic context-free grammar because fast and efficient parsers can be written for them. For compilers, the parsing itself can be done in one pass or multiple passes – see one-pass compiler and multi-pass compiler. The implied disadvantages of a one-pass compiler can largely be overcome by adding fix-ups, where provision is made for fix-ups during the forward pass, and the fix-ups are applied backwards when the current program segment has been recognized as having been completed. An example where such a fix-up mechanism would be useful would be a forward GOTO statement, where the target of the GOTO is unknown until the



program segment is completed. In this case, the application of the fix-up would be delayed until the target of the GOTO was recognized. Obviously, a backward GOTO does not require a fix-up.

Context-free grammars are limited in the extent to which they can express all of the requirements of a language. Informally, the reason is that the memory of such a language is limited. The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently. Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); later, the unwanted constructs can be filtered out at the semantic analysis (contextual analysis) step.

Fig 1)
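The relaxed-parser strategy just described can be illustrated with a short sketch. The toy statement language and all function names below are invented for illustration: the syntactic pass checks only statement shape (a constraint a context-free grammar can express), so it accepts a superset of the real language, and a separate semantic pass then filters out programs that use undeclared names.

```python
import re

# Hypothetical toy language: "let <name> = <name-or-number>" and "print <name>".
STMT = re.compile(r"let (\w+) = (\w+)|print (\w+)")

def parse(program):
    """Syntactic pass: reject lines that do not match the statement grammar."""
    stmts = []
    for line in program.splitlines():
        m = STMT.fullmatch(line.strip())
        if not m:
            raise SyntaxError(f"bad statement: {line!r}")
        if m.group(1):
            stmts.append(("let", m.group(1), m.group(2)))
        else:
            stmts.append(("print", m.group(3)))
    return stmts

def check_declared(stmts):
    """Semantic (contextual) pass: every name must be declared before use."""
    declared = set()
    for s in stmts:
        if s[0] == "let":
            if not s[2].isdigit() and s[2] not in declared:
                raise NameError(f"undeclared name: {s[2]}")
            declared.add(s[1])
        elif s[1] not in declared:
            raise NameError(f"undeclared name: {s[1]}")
    return True
```

Here `parse("print y")` succeeds syntactically, but `check_declared` rejects it: the declare-before-use constraint lives outside the grammar, exactly as the paragraph above suggests.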

3.2) OVERVIEW OF PROCESS

The task of the parser is essentially to determine if and how the input can be derived from the start symbol of the grammar. This can be done in essentially two ways: top-down or bottom-up.

4) TYPES OF PARSING

• TOP-DOWN PARSING - Top-down parsing is a parsing strategy where one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar. LL parsers are a type of parser that uses a top-down parsing strategy. Top-down parsing is a strategy of analyzing unknown data



relationships by hypothesizing general parse tree structures and then considering whether the known fundamental structures are compatible with the hypothesis. It occurs in the analysis of both natural languages and computer languages. Top-down parsing can be viewed as an attempt to find left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand sides of grammar rules. Simple implementations of top-down parsing do not terminate for left-recursive grammars, and top-down parsing with backtracking may have exponential time complexity with respect to the length of the input for ambiguous CFGs. However, more sophisticated top-down parsers have been created by Frost, Hafiz, and Callaghan which do accommodate ambiguity and left recursion in polynomial time and which generate polynomial-sized representations of the potentially exponential number of parse trees.

• BOTTOM-UP PARSING - Bottom-up parsing reveals the grammatical structure of linear input text, as a first step in working out its meaning. Bottom-up parsing identifies and processes the text's lowest-level small details first, before its mid-level structures, leaving the highest-level overall structure to last.

4.1) TOP DOWN VERSUS BOTTOM UP PARSING

The bottom-up name comes from the concept of a parse tree, in which the most detailed parts are at the bushy bottom of the (upside-down) tree, and larger structures composed from them are in successively higher layers, until at the top or "root" of the tree a single unit describes the entire input stream. A bottom-up parse discovers and processes that tree starting from the bottom left end, and incrementally works its way upwards and rightwards. A parser may act on the structure hierarchy's low, mid, and highest levels without ever creating an actual data tree; the tree is then merely implicit in the parser's actions. Bottom-up parsing lazily waits until it has scanned and parsed all parts of some construct before committing to what the combined construct is.

The opposite of this are top-down parsing methods, in which the input's overall structure is decided (or guessed at) first, before dealing with mid-level parts, leaving the lowest-level small details to last. A top-down parser discovers and processes the hierarchical tree starting from the top, and incrementally works its way downwards and rightwards. Top-down parsing eagerly decides what a construct is much earlier, when it has only scanned the leftmost symbol of that construct and has not yet parsed any of its parts. Left corner parsing is a hybrid method which works bottom-up along the left edges of each subtree, and top-down on the rest of the parse tree.

If a language grammar has multiple rules that may start with the same leftmost symbols but have different endings, then that grammar can be efficiently handled by a deterministic bottom-up parse but cannot be handled top-down without guesswork and backtracking. So bottom-up parsers handle a somewhat larger



range of computer language grammars than do deterministic top-down parsers.
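The contrast can be made concrete with a small sketch. The grammar E -> E '+' T | T, T -> 'n' is chosen purely for illustration; neither routine is a production parser. The top-down routine commits to a construct (its trace begins at the root, E) as soon as it sees the leftmost token, while the shift-reduce routine builds the lowest-level pieces first and names the whole expression last.

```python
def top_down(tokens):
    """Recursive-descent (top-down) recognizer for E -> T ('+' T)*, T -> 'n'.
    The left-recursive rule E -> E + T is rewritten to iteration, since naive
    top-down parsing does not terminate on left recursion."""
    steps, pos = ["predict E"], 0

    def parse_T():
        nonlocal pos
        steps.append("predict T")
        if pos >= len(tokens) or tokens[pos] != "n":
            raise SyntaxError("expected 'n'")
        steps.append("match n")
        pos += 1

    parse_T()
    while pos < len(tokens) and tokens[pos] == "+":
        steps.append("match +")
        pos += 1
        parse_T()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return steps

def bottom_up(tokens):
    """Shift-reduce (bottom-up) parser for E -> E '+' T | T, T -> 'n'."""
    stack, steps, i = [], [], 0
    while not (stack == ["E"] and i == len(tokens)):
        if stack[-1:] == ["n"]:
            stack[-1:] = ["T"]; steps.append("reduce T -> n")
        elif stack[-3:] == ["E", "+", "T"]:
            stack[-3:] = ["E"]; steps.append("reduce E -> E + T")
        elif stack == ["T"]:
            stack[:] = ["E"]; steps.append("reduce E -> T")
        elif i < len(tokens):
            stack.append(tokens[i]); steps.append("shift " + tokens[i]); i += 1
        else:
            raise SyntaxError("no action applies")
    return steps
```

For the input "n + n" the top-down trace starts with "predict E" (root decided first), while the bottom-up trace starts with "shift n" and only ends with "reduce E -> E + T" (root decided last), matching the eager/lazy distinction described above.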

Fig 2)
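As a preview of the operator-precedence technique discussed under bottom-up parsers in section 5.2, the closely related precedence-climbing method can be sketched as follows. The table of binding powers is invented for illustration, and the routine evaluates as it parses rather than building a tree, to keep the sketch short.

```python
import operator

# Hypothetical precedence table: larger number = binds tighter; '^' is
# right-associative, the rest are left-associative.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}

def parse_expr(tokens, min_prec=1):
    """Precedence climbing over a token list such as ["2", "+", "3", "*", "4"].
    Consumes tokens from the front; atoms are plain numbers in this sketch."""
    node = float(tokens.pop(0))
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # a left-associative operator must not reabsorb itself on the right
        next_min = PREC[op] if op in RIGHT_ASSOC else PREC[op] + 1
        node = OPS[op](node, parse_expr(tokens, next_min))
    return node
```

With this table, `parse_expr(["2", "+", "3", "*", "4"])` groups the multiplication first, and a chain of `^` operators nests to the right.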

5) TYPES OF PARSERS

5.1) TOP-DOWN PARSERS

Some of the parsers that use top-down parsing include:

• Recursive descent parser - A recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure usually implements one of the production rules of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

A predictive parser is a recursive descent parser that does not require backtracking. Predictive parsing is possible only for the class of LL(k) grammars, which are the context-free grammars for which there exists some positive integer k that allows a recursive descent parser to decide which production to use by examining only the next k tokens of input. (The LL(k) grammars therefore exclude all ambiguous grammars, as well as all grammars that contain left recursion. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(k) grammar.) A predictive parser runs in linear time.

Recursive descent with backtracking is a technique that determines which production to use by trying each production in turn. Recursive descent with backtracking is not limited to LL(k) grammars, but is not guaranteed to terminate unless the grammar is LL(k). Even when they terminate, parsers that use recursive descent with backtracking may require exponential time.

Although predictive parsers are widely used, and are frequently chosen if writing a parser by hand, programmers often prefer to use a table-based parser produced by a parser generator, either for an LL(k) language or using an alternative parser, such as LALR or LR. This is



particularly the case if a grammar is not in LL(k) form, as transforming the grammar to LL to make it suitable for predictive parsing is involved. Predictive parsers can also be automatically generated, using tools like ANTLR.

• LL parser (Left-to-right, Leftmost derivation) - An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence (hence LL, compared with LR parser). The class of grammars which are parsable in this way is known as the LL grammars. LL parsers may be table-based, the alternative being a recursive descent parser which is usually coded by hand (although not always; see e.g. ANTLR for an LL(*) recursive-descent parser generator).

An LL parser is called an LL(k) parser if it uses k tokens of lookahead when parsing a sentence. If such a parser exists for a certain grammar and it can parse sentences of this grammar without backtracking, then the grammar is called an LL(k) grammar. A language that has an LL(k) grammar is known as an LL(k) language. There are LL(k+n) languages that are not LL(k) languages. A corollary of this is that not all context-free languages are LL(k) languages.

LL(1) grammars are very popular because the corresponding LL parsers only need to look at the next token to make their parsing decisions. Languages based on grammars with a high value of k have traditionally been considered to be difficult to parse, although this is less true now given the availability and widespread use of parser generators supporting LL(k) grammars for arbitrary k.

An LL parser is called an LL(*) parser if it is not restricted to a finite k tokens of lookahead, but can make parsing decisions by recognizing whether the following tokens belong to a regular language (for example, by use of a Deterministic Finite Automaton).

5.2) BOTTOM-UP PARSERS

Some of the parsers that use bottom-up parsing include:

• Operator-precedence parser - While ad-hoc methods are sometimes used for parsing expressions, a more formal technique using operator precedence simplifies the task. For example, an expression such as

(x*x + y*y)^.5

is easily parsed. Operator precedence parsing is based on bottom-up parsing techniques and uses a precedence table to determine the next action. The table is easy to construct and is typically hand-coded. This method is ideal for applications that require a parser for expressions and where embedding compiler technology, such as yacc, would be overkill.

• LR PARSER (LEFT-TO-RIGHT, RIGHTMOST DERIVATION)

‹ Simple LR (SLR) parser - The most prevalent type of bottom-up



parser today is based on a concept called LR(k) parsing; the "L" is for left-to-right scanning of the input, the "R" for constructing a rightmost derivation in reverse, and the k for the number of input symbols of lookahead that are used in making parsing decisions. The cases k = 0 or k = 1 are of practical interest, and we shall only consider LR parsers with k ≤ 1 here. When (k) is omitted, k is assumed to be 1.

Some familiarity with the basic concepts is helpful even if the LR parser itself is constructed using an automatic parser generator. We begin with "items" and "parser states"; the diagnostic output from an LR parser generator typically includes parser states, which can be used to isolate the sources of parsing conflicts.

LR parsing is attractive for a variety of reasons:

1. LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written. Non-LR context-free grammars exist, but these can generally be avoided for typical programming-language constructs.
2. The LR-parsing method is the most general nonbacktracking shift-reduce parsing method known, yet it can be implemented as efficiently as other, more primitive shift-reduce methods (see the bibliographic notes).
3. An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
4. The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive or LL methods. For a grammar to be LR(k), we must be able to recognize the occurrence of the right side of a production in a right-sentential form, with k input symbols of lookahead. This requirement is far less stringent than that for LL(k) grammars, where we must be able to recognize the use of a production seeing only the first k symbols of what its right side derives. Thus, it should not be surprising that LR grammars can describe more languages than LL grammars.

‹ LALR parser - LALR parsers are based on a finite-state-automata concept, from which they derive their speed. The data structure used by an LALR parser is a pushdown automaton (PDA). A deterministic PDA is a deterministic finite automaton (DFA) that uses a stack for a memory, indicating which states the parser has passed through to arrive at the current state. Because of the stack, a PDA can recognize grammars that would be impossible with a DFA; for example, a PDA can determine



whether an expression has any unmatched parentheses, whereas an automaton with no stack would require an infinite number of states due to unlimited nesting of parentheses.

LALR parsers are driven by a parser table in a finite-state machine (FSM) format. An FSM is very difficult for humans to construct, and therefore an LALR parser generator is used to create the parser table automatically from a grammar in Backus–Naur Form which defines the syntax of the computer language the parser will process. The parser table is often generated in source code format in a computer language (such as C++ or Java). When the parser (with parser table) is compiled and/or executed, it will recognize text files written in the language defined by the BNF grammar.

LALR parsers are generated from LALR grammars, which are capable of defining a larger class of languages than SLR grammars, but not as large a class as LR grammars. Real computer languages can often be expressed as LALR(1) grammars, and in cases where an LALR(1) grammar is insufficient, usually an LALR(2) grammar is adequate. If the parser generator handles only LALR(1) grammars, then the LALR parser will have to interface with some hand-written code when it encounters the special LALR(2) situation in the input language.

‹ Canonical LR (LR(1)) parser - A canonical LR parser or LR(1) parser is an LR parser whose parsing tables are constructed in a similar way as with LR(0) parsers except that the items in the item sets also contain a lookahead, i.e., a terminal that is expected by the parser after the right-hand side of the rule. For example, such an item for a rule A → B C might be [A → B · C, a], which would mean that the parser has read a string corresponding to B and expects next a string corresponding to C followed by the terminal 'a'. LR(1) parsers can deal with a very large class of grammars, but their parsing tables are often very big. This can often be solved by merging item sets if they are identical except for the lookahead, which results in so-called LALR parsers.

• GLR parser - A GLR parser (GLR standing for "generalized LR", where L stands for "left-to-right" and R stands for "rightmost (derivation)") is an extension of an LR parser algorithm to handle nondeterministic and ambiguous grammars. The theoretical foundation was provided in a 1974 paper by Bernard Lang (along with other general context-free parsers such as GLL). It describes a systematic way to produce such algorithms, and provides uniform results regarding correctness proofs, complexity with respect to grammar classes, and optimization techniques.

SUMMARY

'Parsing' is the term used to describe the process of automatically building syntactic analyses of a sentence in terms of a given grammar and lexicon. The resulting


syntactic analyses may be used as input to a process of semantic interpretation (or perhaps phonological interpretation, where aspects of this, like prosody, are sensitive to syntactic structure). Occasionally, 'parsing' is also used to include both syntactic and semantic analysis. We use it in the more conservative sense here, however. In most contemporary grammatical formalisms, the output of parsing is something logically equivalent to a tree, displaying dominance and precedence relations between constituents of a sentence, perhaps with further annotations in the form of attribute-value equations ('features') capturing other aspects of linguistic description. However, there are many different possible linguistic formalisms, and many ways of representing each of them, and hence many different ways of representing the results of parsing. We shall assume here a simple tree representation, and an underlying context-free grammatical (CFG) formalism. However, all of the algorithms described here can usually be used for more powerful unification-based formalisms, provided these retain a context-free 'backbone', although in these cases their complexity and termination properties may be different.

Parsing algorithms are usually designed for classes of grammar rather than tailored towards individual grammars. There are several important properties that a parsing algorithm should have if it is to be practically useful. It should be 'sound' with respect to a given grammar and lexicon; that is, it should not assign to an input sentence analyses which cannot arise from the grammar in question. It should also be 'complete'; that is, it should assign to an input sentence all the analyses it can have with respect to the current grammar and lexicon. Ideally, the algorithm should also be 'efficient', entailing the minimum of computational work consistent with fulfilling the first two requirements, and 'robust': behaving in a reasonably sensible way when presented with a sentence that it is unable to analyze successfully.

DISCLOSURE STATEMENT

There is no financial support for this research work from any funding agency.

ACKNOWLEDGMENTS

We thank our guide for his timely help, outstanding ideas, and encouragement to finish this research work successfully.

SIDE BAR

Comparison: an act of assessment or evaluation of things side by side in order to see to what extent they are similar or different. It is used to bring out similarities or differences between two things of the same type, mostly to discover essential features or meaning, either scientifically or otherwise.

Content: the amount of things contained in something; things written or spoken in a book, an article, a programme, a speech, etc.

DEFINITION

• Mutual - held in common by two or more parties.
• Hierarchical - a system in which members of an organization or society are ranked according to relative status or authority.
• Segment - each of the parts into which something is or may be divided.
• Component - a part or element of a larger whole, especially a part of a machine or vehicle.



REFERENCES

[1] "Bartleby.com homepage". Retrieved 28 November 2010.
[2] "parse". dictionary.reference.com. Retrieved 27 November 2010.
[3] "Grammar and Composition".
[4] Aho, A.V., Sethi, R. and Ullman, J.D. (1986) "Compilers: Principles, Techniques, and Tools." Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
[5] Frost, R., Hafiz, R. and Callaghan, P. (2007) "Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars." 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE, pages 109-120, June 2007, Prague.
[6] Frost, R., Hafiz, R. and Callaghan, P. (2008) "Parser Combinators for Ambiguous Left-Recursive Grammars." 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN, Volume 4902/2008, pages 167-181, January 2008, San Francisco.
[7] Knuth, D. E. (July 1965). "On the translation of languages from left to right". Information and Control 8 (6): 607-639. doi:10.1016/S0019-9958(65)90426-2. Retrieved 29 May 2011.
[8] Language theoretic comparison of LL and LR grammars.
[9] Engineering a Compiler (2nd Edition), by Keith Cooper and Linda Torczon, Morgan Kaufmann, 2011.
[10] Crafting a Compiler, by Charles Fischer, Ron Cytron, and Richard LeBlanc, Addison-Wesley, 2009.
[11] Flex & Bison: Text Processing Tools, by John Levine, O'Reilly Media, 2009.
[12] Compilers: Principles, Techniques, and Tools (2nd Edition), by Alfred Aho, Monica Lam, Ravi Sethi, and Jeffrey Ullman, Prentice Hall, 2006.

