cmp-lg/9508003  02 Aug 1995

A ROBUST PARSING ALGORITHM FOR LINK GRAMMARS

Dennis Grinberg, John Lafferty, Daniel Sleator
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA

CMU-CS, August 1995

This research was supported in part by the NSF and ARPA under grant IRI, and by ATR Telecommunications Research Laboratories. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NSF or the US government. Correspondence regarding this paper should be sent to Dennis Grinberg at the above address, or by email to dennis@cs.cmu.edu.

Keywords: robust parsing, partial parsing, natural language processing

Abstract

In this paper we present a robust parsing algorithm based on the link grammar formalism for parsing natural languages. Our algorithm is a natural extension of the original dynamic programming recognition algorithm, which recursively counts the number of linkages between two words in the input sentence. The modified algorithm uses the notion of a null link in order to allow a connection between any pair of adjacent words, regardless of their dictionary definitions. The algorithm proceeds by making three dynamic programming passes. In the first pass, the input is parsed using the original algorithm, which enforces the constraints on links to ensure grammaticality. In the second pass, the total cost of each substring of words is computed, where cost is determined by the number of null links necessary to parse the substring. The final pass counts the total number of parses with minimal cost. All of the original pruning techniques have natural counterparts in the robust algorithm. When used together with memoization, these techniques enable the algorithm to run efficiently, with cubic worst-case complexity.

We have implemented these ideas and tested them by parsing the Switchboard corpus of conversational English. This corpus is comprised of approximately three million words of text, corresponding to many hours of transcribed speech collected from telephone conversations restricted to a variety of topics. Although only a small fraction of the sentences in this corpus are grammatical by standard criteria, the robust link grammar parser is able to extract relevant structure for a large portion of the sentences. We present the results of our experiments using this system, including the analyses of selected and random sentences from the corpus.

We placed a version of the robust parser on the World Wide Web for experimentation. It can be reached at URL http://www.cs.cmu.edu/afs/cs.cmu.edu/project/link/www/robust.html. In this version there are some limitations, such as the maximum length of a sentence (in words) and the maximum amount of memory the parser can use.

1  Introduction

In this paper we present a robust parsing algorithm for the link grammar formalism introduced by Sleator and Temperley. Using a simple extension of the original formalism, we develop efficient parsing and pruning algorithms for extracting structure from unrestricted natural language.

Our approach to robust parsing is purely algorithmic: no modification to the underlying grammar is necessary. We begin by making a generalized definition of what is allowed as a parse. We then assign a nonnegative cost to each generalized parse, in such a way that the cost of a parse which is grammatical with respect to the underlying grammar is zero. The goal of the robust parsing algorithm is then to enumerate all parses of a given sentence that have minimal cost. While this approach to robust parsing is certainly not new, it has a particularly simple and effective realization for link grammar that takes advantage of the formalism's unique properties. In particular, all of the pruning techniques that make link grammar parsing efficient for large natural language grammars either remain unchanged or have natural extensions in the robust parsing algorithm.

The robust algorithm uses the notion of a null link to allow a connection between any pair of adjacent words, regardless of their dictionary definitions. The algorithm proceeds by making three dynamic programming passes through the sentence. In the first pass, the input is parsed using the original algorithm, which enforces the constraints on links to ensure grammaticality. In the second pass, the total cost of each substring of words is computed, where cost is determined by the number of null links necessary to parse the substring. The final pass counts the total number of parses with minimal cost. Memoization, together with pruning techniques, enables the algorithm to run efficiently, with a worst-case time complexity of O(n^3) for an input of n words.

We have implemented these ideas and tested them by parsing the Switchboard corpus of conversational English. This corpus is comprised of approximately three million words of text, corresponding to many hours of transcribed speech collected from telephone conversations restricted to a variety of topics. Although only a small fraction of the sentences in this corpus are grammatical by standard criteria, the robust link grammar parser is able to extract relevant structure for a large portion of the sentences. We present here the results of our experiments using this system, including the analyses of selected and random sentences from the corpus and the statistics of a typical parsing session.

There is a wide body of literature related to robust parsing. This can be seen immediately from the references given by S. Abney in a tutorial titled "Partial Parsing" at the Conference on Applied Natural Language Processing in 1994. While some robust parsing methods depend on specific domains and use semantics as their guide, in this paper we investigate a purely syntactic technique. The advantages of using syntactic techniques include the ability to use the robust parser in varied domains and applications, the ability to use existing grammars with little or no change, and the ability to base techniques on parsing technologies whose characteristics are well understood.

There has been work by a number of researchers on least-errors recognition for context-free grammars. In 1974, Lyon proposed a dynamic programming algorithm for finding the least number of mutations, insertions, and deletions of terminal symbols necessary to parse a sentence. Recently, Lee et al. extended this work by allowing errors in nonterminals. Lavie and Tomita describe a modification of the Generalized LR Parser. Their GLR* algorithm is designed to determine the maximal subsets of the input that are parsable, by skipping words. While the algorithm is not guaranteed to have cubic running time, a beam search is used to prune parsing options that are unlikely to produce a maximal parse. With pruning, the system is no longer guaranteed to produce the best solution, but the authors report that the beam search works well in practice.

In the following section we briefly review the relevant concepts from link grammar that are necessary for presenting the robust parsing algorithm. In Section 3 we give two definitions of cost. The first is given in terms of the edit distance of a generalized parse, and it is analogous to least-errors recognition. The second definition introduces the concept of a null link. Using this definition, our approach bears some similarity to the word-skipping approach of Lavie and Tomita. While the edit distance is perhaps a more general and intuitive concept, it does not lead us to efficient parsing algorithms. In Section 4 we give the details of the robust parsing algorithm, using a cost function defined in terms of null links. In Section 5 we explain how the pruning techniques of the original parser can be extended to accommodate null links. Finally, in Section 6 we present the results of our experiments with the robust parser on the Switchboard corpus.

2  Link Grammar Concepts and Notation

In this section we briefly summarize the relevant concepts and notation of link grammar. The figure below represents one of the parses produced by a link grammar parser on the input sentence "Despite newspaper reports to the contrary, Mary handles herself with extreme confidence". The labelled arcs connecting words to other words on their left or right are called links. A valid parse is called a linkage.

    [Linkage diagram; link labels: CO, Xca, Jp, Jp, EVp, Jp, AN, Mp, D*u, Ss, O, A]
    Despite newspaper reports to the contrary , Mary handles herself with extreme confidence

In this figure we see that "confidence" is linked on the left to the adjective "extreme", with a link labelled A, denoting an adjective. We can also say that "extreme" is linked on the right to "confidence". Words can have multiple links; for example, the word "handles" has a singular noun (Ss) link to its left, and object (O) and prepositional phrase (EVp) links to its right.

A link grammar is defined by a dictionary comprising a vocabulary and definitions of the vocabulary words. These definitions describe how the words can be used in linkages. A word's definition can be thought of as a list of disjuncts. Each disjunct d is represented by two ordered lists, written as

    d = ((l1, l2, ..., lm) (rn, rn-1, ..., r1))

where the li are left connectors and the rj are right connectors. Connector names typically begin with one or more upper case letters, followed by sequences of lower case letters and *'s. For example, from the above linkage we can see that one of the disjuncts in the dictionary definition of "handles" must be ((Ss) (EVp, O)).

A word W with disjunct d = ((l1, l2, ..., lm) (rn, rn-1, ..., r1)) can be linked to other words by connecting each li to right connectors of words on W's left, and also connecting each rj to left connectors of words on W's right. Links are only permitted between matching connectors.

A linkage is specified by choosing a disjunct for each word in the sentence from the word's definition, and linking every connector in the disjunct with a connector of a different word's disjunct. A linkage is valid if and only if it meets the following criteria:

Connectivity:  The graph of links and words is connected.

Planarity:  When drawn above the sentence, the links do not cross.

Ordering:  For disjunct d = ((l1, l2, ..., lm) (rn, rn-1, ..., r1)) of word W, the li and rj are linked to words whose distances from W are monotonically increasing in i and in j.

Exclusion:  No two links connect the same pair of words.
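The last three of these criteria depend only on which pairs of words are linked, so they can be checked mechanically from the links alone (ordering additionally requires each word's chosen disjunct). Below is a minimal sketch, under the assumption that links are represented as (i, j) word-index pairs; this representation is an illustration, not the paper's data structure.

```python
from itertools import combinations

def is_structurally_valid(n_words, links):
    """Check connectivity, planarity, and exclusion for a set of links.

    links: list of (i, j) word-index pairs with i < j.  The ordering
    criterion is omitted here because it depends on each word's disjunct.
    """
    # Exclusion: no two links connect the same pair of words.
    if len(links) != len(set(links)):
        return False
    # Planarity: drawn above the sentence, links must not cross.
    for (i, j), (k, l) in combinations(set(links), 2):
        if i < k < j < l or k < i < l < j:
            return False
    # Connectivity: the graph of words and links is connected.
    adj = {w: set() for w in range(n_words)}
    for i, j in links:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n_words
```

For example, two links (0, 2) and (1, 3) cross when drawn above the sentence, so any linkage containing both is rejected.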

While the use of disjuncts simplifies the mathematical analysis and parsing algorithms for link grammars, it is cumbersome to express actual grammars in these terms. For this reason, the linking requirements of a word are often expressed as a logical formula involving connectors and the operators "and" and "or". As a simple example, the formula

    D & (O or S)

represents the two disjuncts (D, S) and (D, O). The links attached to each word in a linkage must satisfy that word's formula. That is, the connectors must make the logical expression represented by the formula true, with the "or" operator understood to be exclusive.

A convenient data structure for storing and manipulating a link grammar dictionary represents a formula as an expression tree, with each node in the tree being either an or-node, an and-node, or a leaf representing a connector. To prepare for parsing a sentence, the expression tree for each word in the sentence is first pruned, to eliminate connectors that cannot possibly participate in any linkage. The pruned expression tree is then expanded into a list of disjuncts, before power pruning is carried out to eliminate disjuncts that necessarily violate one or more of the structural constraints that a valid linkage must obey. After pruning, the parsing algorithm itself is carried out. In Sections 4 and 5 we give the details on the parsing and pruning algorithms that are needed to explain the robust algorithm. The relationship between link grammar and other grammatical formalisms is discussed elsewhere.
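The expansion of a formula tree into its list of disjuncts can be sketched as follows. The tree representation (("leaf", name), ("and", children), ("or", children)) is an assumption of this sketch, and left/right connector direction is omitted for brevity.

```python
from itertools import product

def expand(node):
    """Expand a formula tree into its list of disjuncts, each given as a
    tuple of connector names (direction is omitted in this sketch)."""
    kind, val = node
    if kind == "leaf":
        return [(val,)]
    if kind == "or":
        # exclusive choice: the union of the children's expansions
        return [d for child in val for d in expand(child)]
    # "and": every combination of the children's disjuncts, concatenated
    return [sum(combo, ()) for combo in product(*map(expand, val))]

# the formula D & (O or S) from the text
formula = ("and", [("leaf", "D"), ("or", [("leaf", "O"), ("leaf", "S")])])
```

Applied to the formula above, this yields exactly the two disjuncts (D, O) and (D, S) mentioned in the text.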

3  The Cost of a Linkage

The approach that we take to robust parsing uses the notion of cost. Informally, the idea is to define a set of generalized parse trees with respect to the grammar, and the assignment of a number cost(P | S) to each generalized parse P of a sentence S. The task of the robust parsing algorithm is to find all generalized parses having minimum cost. This, of course, is not a new idea; many approaches to robust parsing can be viewed in these terms. Some algorithms minimize the number of ungrammatical "islands," while statistical parsing algorithms for unrestricted text generally use the cost function cost(P | S) = -log Pr(P | S), where Pr(P | S) is given by a probabilistic model. It is important to note that the objective of minimizing a cost function is only a heuristic: for any cost function there are likely to be examples for which optimal parses have nonminimal cost.

In this section we discuss two definitions of cost for link grammars. The first is called edit distance. The second, which introduces null links, is more restricted in the generalized parses it allows. We have been unable to devise a practical robust parsing algorithm for the edit distance, and thus we only mention it briefly as motivation. We have, however, been successful in designing an efficient algorithm using the more restricted definition of cost.

3.1  Edit distance

To define a cost function, we first need to define the set of generalized parses that are allowed. The definition of a linkage given in Section 2 requires that the links obey the planarity, connectivity, ordering, and exclusion properties. A set of links that obeys all of these conditions will be called structurally sound. If, in addition, the links satisfy the formula of each word in the sentence, the linkage will be called legal. We will assign a cost to any structurally sound linkage.

The edit distance of a structurally sound linkage is the minimum number of links and words that need to be added, deleted, or renamed in order to create a collection of one or more legal linkages. The edit distance of a string of words is defined to be the smallest edit distance of all structurally sound linkages connecting those words.

To illustrate these definitions, suppose that the following linkage is legal:

    [Linkage diagram; link labels: S, Os, Ds, A]
    Mary liked the new movie

Then the edit distance of the string "liked the new movie" is no more than two, since the linkage above can be constructed from the following structurally sound linkage by adding a word and a link:

    [Linkage diagram; link labels: Os, Ds, A]
    liked the new movie

Similarly, the string "Mary liked the Fellini movie" has edit distance no greater than one if the word "Fellini" is not in the dictionary. The string "Mary liked the new movie John liked the new movie" also has edit distance no greater than one, since the structurally sound linkage

    [Linkage diagram; link labels: S, Os, Ds, A, NL, S, Os, Ds, A]
    Mary liked the new movie John liked the new movie

can be reduced to two legal linkages by removing a single link. In general, the edit distance of a string of n words is no more than 2n - 1. This is because the linkage obtained by linking only adjacent words can be disassembled into the empty linkage by deleting n - 1 links and all n words.

Using the original link grammar parsing methods, it is not too difficult to design an algorithm that takes O(n^3) steps to calculate the edit distance of a string of n words. However, we have been unable to find such an algorithm that will run efficiently for a significantly large grammar. The primary reason is that the pruning methods that allow the standard link grammar parsing algorithm to run efficiently do not have natural counterparts to prune the space of structurally sound linkages. However, by using a more restricted notion of edit distance, we can design an efficient robust parsing algorithm.

3.2  Null links

A null link is an unlabeled link connecting adjacent words. The following linkage has two null links, drawn as dashed arcs:

    [Linkage diagram; link labels: S, Os, Ds, A, S, Os, Ds, A, and two null links]
    Mary liked the new movie John liked the new movie too

If both of these null links are deleted, then we obtain two legal linkages, together with the disconnected word "too". If a structurally sound linkage has the property that removing its null links results in a collection of one or more legal linkages, together with zero or more isolated words, we say that the linkage is chained. We assign to each chained linkage a cost equal to the number of null links it contains. The cost of a string of words is defined to be the minimum cost of all chained linkages connecting those words. By this measure, the cost of a string of n words can be no greater than n - 1, and it is zero only if the string is grammatical. We do not include a dependence on the number of isolated words, since there is no a priori reason to favor one chained linkage over another, when they have the same number of null links, simply because it has fewer isolated words when disassembled.

The set of chained linkages is the same as the set of legal linkages in an extended grammar. To construct this grammar, we create a special connector name NL that does not appear in the original dictionary. For each disjunct

    d = ((l1, l2, ..., lm) (rn, rn-1, ..., r1))

in the original dictionary D, we construct the following three disjuncts in the extended dictionary:

    d'   = ((l1, l2, ..., lm) (rn, rn-1, ..., r1, NL))
    d''  = ((NL, l1, l2, ..., lm) (rn, rn-1, ..., r1))
    d''' = ((NL, l1, l2, ..., lm) (rn, rn-1, ..., r1, NL))

In addition, we add the following three disjuncts to the definitions of each word:

    ((NL) ())
    (() (NL))
    ((NL) (NL))

This defines the extended dictionary D'. Each link labeled NL in a linkage of the extended grammar corresponds to a null link. The removal of such links results in one or more legal linkages in the original grammar, together with zero or more disconnected words.
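The construction of D' from D can be sketched directly. In this sketch a disjunct is assumed to be a (left, right) pair of connector-name tuples, with left connectors listed nearest-first and right connectors listed with the nearest connector last, matching the notation above; the original disjuncts are kept alongside the NL variants so that every legal linkage of D remains legal in D'.

```python
NL = "NL"  # special connector name, assumed absent from the original dictionary

def extend_dictionary(dictionary):
    """Build D' from D.

    dictionary: dict mapping word -> set of disjuncts, where a disjunct
    is a pair (left_connectors, right_connectors) of tuples.
    """
    extended = {}
    for word, disjuncts in dictionary.items():
        new = set()
        for left, right in disjuncts:
            new.add((left, right))                  # original disjunct d
            new.add((left, right + (NL,)))          # d':   NL after r1
            new.add(((NL,) + left, right))          # d'':  NL before l1
            new.add(((NL,) + left, right + (NL,)))  # d''': NL on both sides
        # the three disjuncts that let a word attach only by null links
        new |= {((NL,), ()), ((), (NL,)), ((NL,), (NL,))}
        extended[word] = new
    return extended
```

As the text notes, the parser never materializes this dictionary; the construction is shown only to make the equivalence between chained linkages and legal linkages of D' concrete.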

There are several advantages to using the restricted definition of cost described above. As we will show in the next two sections, there are natural extensions of the original parsing and pruning algorithms that accommodate null links. These extensions are easily implemented, and they lead to a robust parsing algorithm that is efficient and practical. Moreover, there is no need to explicitly construct the extended grammar just described; in fact, the grammar does not need to be modified in any way. Thus, null links represent virtual rather than physical links. Most importantly, as we indicate in Section 6, experimentation with the robust parser using null links has shown that it is capable of extracting relevant grammatical structure even from ungrammatical text, such as transcriptions of spontaneous speech.

4  The Robust Parsing Algorithm

In this section we describe an algorithm for calculating the cost of a sentence. The algorithm determines the minimum number of null links necessary to parse the sentence by making three dynamic programming passes. In the first pass, the input is parsed using the original algorithm, which counts the number of legal linkages. If the sentence is grammatical, then this is the only pass that is carried out. If not, a second pass is made to calculate the minimum number of null links necessary to parse each substring. Since the results of the first pass are memoized, only regions of the input sentence that are not grammatical need to be explored in this pass. The final pass counts the total number of parses with minimal cost.

The robust parsing algorithm is most easily understood by first considering the problem of counting all chained linkages, not just those with the fewest possible null links. To explain how this calculation is carried out, we need to review the original link grammar parsing algorithm, which proceeds by recursively counting parses in a top-down fashion. Consider the situation after a link has been proposed between a connector l' on word L and a connector r' on word R. For convenience, we define l and r to be next(l') and next(r'), respectively.

In order to attach the words of the region (L, R) strictly between L and R to the rest of the sentence, there must be at least one link either from L to some word in this region, or from R to some word in this region, since no word in this region can link to a word outside of the (L, R) range, and something must connect these words to the rest of the sentence. Since the connector l' has already been used in the solution being constructed, this solution must use the rest of the connectors of the disjunct in which l' resides. The same holds for r'. The only connectors of these disjuncts that can be involved in the (L, R) region are those in the lists beginning with l and r; the use of any other connector of these disjuncts in this region would violate the ordering requirement. In fact, all of the connectors of these lists must be used in this region in order to have a satisfactory solution.

Suppose that l is not nil. We know that this connector must link to some disjunct on some word in the region (L, R). The algorithm tries all possible words and disjuncts. Suppose it finds a word W and a disjunct d on W such that the connector l matches left[d]. We can now add this link to our partial solution. The situation is shown in the following diagram:

    [Diagram: words L, W, R, with connectors l' and l on L; the disjunct d on W with its lists left[d] and right[d]; and connectors r and r' on R]

Here the square box above L represents the data structure node corresponding to the word L. This points to a list of disjuncts, each of which is shown as a rectangular box above the word. The disjunct in turn points to two linked lists of connectors, shown here as small square boxes.

The algorithm determines if this partial solution can be extended to a full solution by solving two problems similar to the original problem. In particular, the word range (L, W) is extended using the connector lists beginning with next(l) and next(left[d]). In addition, the word range (W, R) is extended using the connector lists beginning with right[d] and r. The case where l = nil is similar.

To explain the robust algorithm, it will be helpful to repeat a fragment of the pseudocode representing the original algorithm. The function Count takes as input the indices of two words L and R (where the words are numbered from 0 to N) and a pair of connector lists l and r. Count returns the number of different ways to draw links among the connectors on the words strictly between L and R, and among the connectors in lists l and r.

    Parse()
    1  t <- 0
    2  for each disjunct d of word 0
    3      do if left[d] = nil
    4            then t <- t + Count(0, N, right[d], nil)
    5  return t

    Count(L, R, l, r)
    1  if R = L + 1
    2     then if l = nil and r = nil
    3             then return 1
    4             else return 0
    5     else total <- 0
    6          for W <- L + 1 to R - 1
    7              do for each disjunct d of word W
                         ...

The algorithm counts all linkages by recursively counting smaller and smaller regions. Lines 1-4 of the Count procedure handle the boundary conditions: if word R is adjacent to L, then there is no solution unless l = r = nil. This ensures that all of the connectors are used for every disjunct that participates in a linkage. If L and R are not neighboring words, then the algorithm examines each intervening word W, beginning at line 6.
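The region-splitting recursion can be made concrete with a small runnable sketch. Several simplifying assumptions are made here that are not the paper's implementation: connectors match only on exact name equality, each disjunct is a (left, right) pair of connector-name tuples ordered so that the first element is the farthest-reaching connector, and position N = len(disjuncts) acts as a virtual right boundary with no connectors, so the recursion is seeded with Count(0, N, right[d], nil).

```python
from functools import lru_cache

def make_counter(disjuncts):
    """Count legal linkages with a memoized Count(L, R, l, r) recursion."""
    N = len(disjuncts)

    def match(a, b):          # simplified matching: exact name equality
        return a == b

    @lru_cache(maxsize=None)  # memoization keeps the recursion polynomial
    def count(L, R, l, r):
        if R == L + 1:                        # boundary: empty region
            return 1 if (not l and not r) else 0
        total = 0
        for W in range(L + 1, R):
            for ld, rd in disjuncts[W]:
                if l:
                    # l's first connector must link to some word W
                    if ld and match(l[0], ld[0]):
                        lc = count(L, W, l[1:], ld[1:])
                        if lc:
                            total += lc * count(W, R, rd, r)         # L-W link only
                            if r and rd and match(rd[0], r[0]):      # L-W and W-R links
                                total += lc * count(W, R, rd[1:], r[1:])
                elif r and rd and match(rd[0], r[0]):
                    # l is exhausted: r's first connector links to W
                    total += count(L, W, (), ld) * count(W, R, rd[1:], r[1:])
        return total

    def parse():
        return sum(count(0, N, rd, ()) for ld, rd in disjuncts[0] if not ld)

    return parse

# toy grammar for "a b c": a-b link via X, b-c link via Y; one linkage
grammar = (
    (((), ("X",)),),        # a: links right with X
    ((("X",), ("Y",)),),    # b: links left with X, right with Y
    ((("Y",), ()),),        # c: links left with Y
)
```

Because each subproblem is a (span, connector-list) pair and is memoized, the number of distinct calls is polynomial in the sentence length, which is the source of the cubic bound cited in the text.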

To explain how null links are introduced, we need to first closely examine the situation where l = nil and r is not nil. In this case, to extend the linkage into the region (L, R), a word W must be chosen to link with r. This leads to a test of whether the region (L, W) can be extended using the connector lists nil for L and left[d] for some disjunct d of W. Assuming a leftmost derivation, this process continues until either left[d] = nil or W = L + 1. If left[d] = nil, then there is no solution in the original algorithm, except when the boundary condition W = L + 1 holds. In the robust algorithm, however, there is always a way of continuing.

Note that in the case l = nil, when parsing with respect to the original dictionary D, we can always replace l by NL for the extended grammar D'. Since NL connectors can only link adjacent words, this means that L could only be connected to W = L + 1, with a disjunct d whose connector left[d] is NL in the extended grammar. The robust algorithm makes such links virtually, without actually parsing the extended grammar. If R > L + 1 and l = r = nil, the algorithm considers each disjunct d of W = L + 1 having the property that left[d] = nil. For each such disjunct, a null link is made between L and W, and then the region (W, R) is parsed using the connector list beginning with right[d] for W. Another way of proceeding is to make a null link between L and W and parse the region (W, R) using the connector list nil for W. This corresponds to using the disjunct ((NL) (NL)) for W in the extended grammar.

The counting algorithm just outlined is given more formally in the pseudocode below. The code after line 11 is identical to that of the original algorithm.

    Parse()
    1  t <- 0
    2  for each disjunct d of word 0
    3      do if left[d] = nil
    4            then t <- t + Count(0, N, right[d], nil)
    5  t <- t + Count(0, N, nil, nil)
    6  return t

    Count(L, R, l, r)
    1  if R = L + 1
    2     then if l = nil and r = nil
    3             then return 1
    4             else return 0
    5  elseif l = nil and r = nil
    6     then total <- 0
    7          for each disjunct d of word L + 1
    8              do if left[d] = nil
    9                    then total <- total + Count(L + 1, R, right[d], nil)
    10         total <- total + Count(L + 1, R, nil, nil)
    11         return total
    12 else ...

Note that the boundary conditions in lines 1-4 are identical to those of the original algorithm. In the case where l = r = nil and R > L + 1, the original counting procedure returns 0; this is where null links are introduced in the robust algorithm.
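The robust counting recursion can be sketched as a runnable extension of the original one. As before, several conventions are assumptions of this sketch rather than the paper's implementation: exact-name connector matching, disjuncts as (left, right) tuples with farthest-reaching connectors first, and a virtual right boundary at position N with no connectors.

```python
from functools import lru_cache

def make_robust_counter(disjuncts):
    """Count chained linkages: the original recursion plus a null-link
    branch for the case l = r = nil with a nonempty region."""
    N = len(disjuncts)

    def match(a, b):
        return a == b

    @lru_cache(maxsize=None)
    def count(L, R, l, r):
        if R == L + 1:                          # boundary unchanged
            return 1 if (not l and not r) else 0
        if not l and not r:
            # Null link between L and L+1: word L+1 either starts linking
            # rightward with one of its left-empty disjuncts, or hangs on
            # a further null link itself.
            total = sum(count(L + 1, R, rd, ())
                        for ld, rd in disjuncts[L + 1] if not ld)
            return total + count(L + 1, R, (), ())
        total = 0                               # identical to the original
        for W in range(L + 1, R):
            for ld, rd in disjuncts[W]:
                if l:
                    if ld and match(l[0], ld[0]):
                        lc = count(L, W, l[1:], ld[1:])
                        if lc:
                            total += lc * count(W, R, rd, r)
                            if r and rd and match(rd[0], r[0]):
                                total += lc * count(W, R, rd[1:], r[1:])
                elif r and rd and match(rd[0], r[0]):
                    total += count(L, W, (), ld) * count(W, R, rd[1:], r[1:])
        return total

    def parse():
        total = sum(count(0, N, rd, ()) for ld, rd in disjuncts[0] if not ld)
        return total + count(0, N, (), ())      # word 0 hangs on a null link

    return parse

# toy grammar for "a b c": one legal linkage plus the all-null-link chain
grammar = (
    (((), ("X",)),),
    ((("X",), ("Y",)),),
    ((("Y",), ()),),
)
```

On this toy grammar the sketch counts two chained linkages: the fully grammatical one, and the chain that joins all three words by null links.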

This algorithm returns a number that is, in general, less than the number of legal linkages in the extended grammar; that is, it only counts a subset of all chained linkages. The reason is that the extended grammar introduces NL links symmetrically, and this symmetry is not broken by the parsing algorithm. For example, if one of the following linkages is legal in the extended grammar, then all three of them are:

    [Diagram: three linkages of the words "a b c", each with a link labeled X and with NL links introduced in different ways]

In the algorithm given above, however, a null link can be introduced in this case only as follows:

    [Diagram: the words "a b c" with a link labeled X and a single null link]

This inconsistency is easily remedied: the two algorithms will return the same number of chained linkages if the original parsing algorithm for the extended grammar is modified so that, in the case where l = NL and r = NL, a connection is only allowed to l.

Knowing the total number of chained linkages is of limited use. This collection of parses includes many that have little or no structural information, such as the linkage that joins each word with its immediate neighbors by null links. Using the ideas just described, however, we can count the number of linkages that have the minimal number of null links. We first modify the algorithm to compute the cost of each span. To do this, the increment operations are replaced by min operations. The following pseudocode demonstrates how this is carried out.

    CalcCost()
    1  c <- infinity
    2  for each disjunct d of word 0
    3      do if left[d] = nil
    4            then c <- Min(c, Cost(0, N, right[d], nil))
    5  c <- Min(c, Cost(0, N, nil, nil))
    6  return c

    Cost(L, R, l, r)
    1  if R = L + 1
    2     then if l = nil and r = nil
    3             then return 0
    4             else return infinity
    5  elseif l = nil and r = nil
    6     then cost <- infinity
    7          for each disjunct d of word L + 1
    8              do if left[d] = nil
    9                    then cost <- Min(cost, 1 + Cost(L + 1, R, right[d], nil))
    10         cost <- Min(cost, 1 + Cost(L + 1, R, nil, nil))
    11         return cost
    12 else ...

Note that while the cost is initialized to be infinite on line 1 of CalcCost and line 6 of Cost, the cost of a string of n words can be no greater than n - 1. The code after line 11 is modified from the original code in the obvious way. If a region (L, R) with connector lists l and r is grammatical, this fact will have been recorded (memoized) during the first pass, which carries out the standard algorithm. In this case the cost is zero, and there is no need to carry out the second pass on the span. Thus, the second pass only needs to explore those regions that are ungrammatical.

In the third pass, the number of chained linkages with minimal cost is calculated, in a manner similar to the way in which the second pass is carried out. However, rather than using a min operation, a count is incremented if a given span has minimum cost. The details are omitted.
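The second pass can be sketched by replacing the additions of the counting recursion with min operations, each null link contributing a cost of one. The same illustrative conventions as before are assumptions of this sketch: exact-name matching, farthest-first connector tuples, and a virtual right boundary at N.

```python
import math
from functools import lru_cache

def make_cost(disjuncts):
    """Minimum number of null links needed to parse (0 iff grammatical)."""
    N = len(disjuncts)
    INF = math.inf

    def match(a, b):
        return a == b

    @lru_cache(maxsize=None)
    def cost(L, R, l, r):
        if R == L + 1:
            return 0 if (not l and not r) else INF
        if not l and not r:
            # each option spends one null link between L and L+1
            best = min((1 + cost(L + 1, R, rd, ())
                        for ld, rd in disjuncts[L + 1] if not ld), default=INF)
            return min(best, 1 + cost(L + 1, R, (), ()))
        best = INF
        for W in range(L + 1, R):
            for ld, rd in disjuncts[W]:
                if l:
                    if ld and match(l[0], ld[0]):
                        lc = cost(L, W, l[1:], ld[1:])
                        best = min(best, lc + cost(W, R, rd, r))
                        if r and rd and match(rd[0], r[0]):
                            best = min(best, lc + cost(W, R, rd[1:], r[1:]))
                elif r and rd and match(rd[0], r[0]):
                    best = min(best, cost(L, W, (), ld) + cost(W, R, rd[1:], r[1:]))
        return best

    def calc_cost():
        best = min((cost(0, N, rd, ()) for ld, rd in disjuncts[0] if not ld),
                   default=INF)
        return min(best, cost(0, N, (), ()))

    return calc_cost

# toy grammars: one grammatical, one whose last word cannot link
grammar = ((((), ("X",)),), ((("X",), ("Y",)),), ((("Y",), ()),))
bad = (grammar[0], grammar[1], ((("Z",), ()),))
```

A third pass in the same style would rerun the recursion once more and sum multiplicities over exactly those options that achieve the memoized minimum for each span.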

5  Pruning Techniques

The original link grammar parsing algorithm is made practical by means of several pruning techniques, including expression pruning, the fast-match data structure, power pruning, and conjunction pruning. This section shows how each of these techniques has a simple and natural extension to the robust parsing algorithm.

5.1  Pruning

Suppose that a word W has a disjunct d with a connector C in its list of right connectors. In order for d to be used for W in a linkage of a given sentence, some word to the right of W must have a disjunct with a connector that matches C in its left list. This simple observation is the basis for an algorithm to prune the set of disjuncts that are considered by the parsing algorithm.

The pruning algorithm alternately makes sequential left-to-right and right-to-left passes through the words in the sentence. In a left-to-right pass, a set of right connectors is maintained, initially empty. The pruning step for word W consists of examining each disjunct d of W that has survived previous pruning passes. If a left connector of d does not match any connector in the set, then d is discarded; otherwise, the connectors in the right list of d are added to the set. After all disjuncts of W have been processed in this way, the algorithm advances to the next word. The right-to-left pass is analogous.
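A single left-to-right pass can be sketched as follows, again under the illustrative assumption that a disjunct is a (left, right) pair of connector-name tuples and that connectors match on exact name equality.

```python
def prune_left_to_right(sentence_disjuncts, match=lambda a, b: a == b):
    """One left-to-right pruning pass.

    A disjunct survives only if every connector in its left list matches
    some right connector contributed by a surviving disjunct of an
    earlier word.  sentence_disjuncts: per-word lists of (left, right)
    connector-name tuples.
    """
    seen_right = set()
    result = []
    for disjuncts in sentence_disjuncts:
        survivors = []
        for left, right in disjuncts:
            if all(any(match(c, s) for s in seen_right) for c in left):
                survivors.append((left, right))
                seen_right.update(right)   # only survivors contribute
        result.append(survivors)
    return result
```

A full pruning phase would alternate this with the symmetric right-to-left pass until no further disjuncts are discarded.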

Pruning can be carried out more efficiently when applied to the entire expression tree representing the formula of a word. In a left-to-right expression pruning pass, each left connector in a word's expression tree is checked to see if it has a match in the set of right connectors. If not, the connector, and those nodes that depend on it through "and" relations, are pruned from the tree.

When parsing with null links, the pruning algorithm can be used without modification. Viewing null links in terms of the extended grammar, it can be seen that a disjunct will never be pruned because of the lack of a matching NL connector (excepting boundary cases, where there is no word to the left or right). In other words, applying pruning to the original grammar and then extending the surviving disjuncts with NL connectors results in the same set of disjuncts as applying the pruning algorithm to the extended grammar.
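The propagation of a pruned connector through "and" relations can be sketched on a small expression-tree representation. The tuple encoding ("leaf", name) / ("and", children) / ("or", children) is an assumption of this sketch, and only left connectors are modeled.

```python
def prune_expr(node, right_seen):
    """Prune a left-connector expression tree against the set of right
    connectors seen so far.  Returns the pruned tree, or None if the
    whole subtree is eliminated."""
    kind, val = node
    if kind == "leaf":
        return node if val in right_seen else None
    pruned = [p for p in (prune_expr(c, right_seen) for c in val) if p]
    if kind == "and":
        # a dead connector kills everything joined to it by "and"
        return (kind, pruned) if len(pruned) == len(val) else None
    return (kind, pruned) if pruned else None   # "or": keep the survivors
```

For the formula D & (O or S), losing the match for D eliminates the entire tree, while losing only O leaves D & S behind.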

5.2  The fast-match data structure

The fast-match data structure is used in the parsing algorithm to quickly determine which disjuncts of a word might possibly match a given left or right connector. The data structure uses two hash tables for each word in the sentence. The disjuncts that survive pruning are hashed, and disjuncts that hash to the same location are maintained in a linked list. The left table for a word uses a hash function that depends only on the first connector in the left list of the disjunct; the right table has an analogous hash function. When parsing a span (L, R) by attempting to link the connector lists l and r to an intermediate word W, the fast-match data structure allows a list of candidate disjuncts for W to be quickly obtained, by forming the union of the lists obtained from the left table of W by hashing l and the right table of W by hashing r.

A simple modification to the fast-match data structure can be made to accommodate null links. As discussed in the previous section, when null links are constructed virtually, rather than by explicitly parsing the extended grammar, a null link is only made in the case where the region (L, R) is extended using connector lists l and r which are both nil. In this case, the only disjuncts of word W = L + 1 that are considered are those with empty left connector lists. Thus, to modify the fast-match data structure, it is only necessary to maintain, for each word, a list of disjuncts d for which left[d] = nil.
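The modified fast-match tables can be sketched as follows. This sketch assumes, as an illustration only, that connectors match on exact name equality, so the hash key can simply be the connector name itself; real link grammar matching would need a coarser key.

```python
from collections import defaultdict

def build_fastmatch(word_disjuncts):
    """Build the two per-word tables, keyed by the first connector of the
    left and right lists (key () for an empty list), plus the extra list
    of left-empty disjuncts used when forming null links."""
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    left_nil = []                        # disjuncts d with left[d] = nil
    for left, right in word_disjuncts:
        left_table[left[0] if left else ()].append((left, right))
        right_table[right[0] if right else ()].append((left, right))
        if not left:
            left_nil.append((left, right))
    return left_table, right_table, left_nil

def candidates(tables, l, r):
    """Disjuncts of W that might link to l's or r's first connector."""
    left_table, right_table, _ = tables
    cands = list(left_table[l[0]]) if l else []
    if r:
        cands += [d for d in right_table[r[0]] if d not in cands]
    return cands
```

The third component of the returned triple is exactly the per-word list of left-empty disjuncts described in the text.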

Power pruning

After expression pruning is carried out each words pruned expression tree is expanded into a list

of disjuncts At this p oint p ower pruning is carried out in an attempt to enforce the structural

constraints imp osed on valid linkages including the ordering connectivity planarity and exclusion

prop erties As with pruning p ower pruning is carried out by making alternating lefttoright and

righttoleft passes through the sentence the details are given in

The power pruning algorithm eliminates some disjuncts by observing that, in order for a linkage
to be connected, any two connectors linking non-neighboring words cannot
both be the last connectors of their respective lists. This condition is relaxed when parsing with
null links. In fact, a null link is only formed after two non-neighboring words are connected using
the last connectors in their lists. All of the other conditions that power pruning checks remain valid.
Thus it is a simple matter to modify the power pruning algorithm to allow for the possibility of
null links.
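The relaxed connectedness check can be shown in isolation as follows. This is a sketch only: the real pass works over whole connector lists, and the argument names here are assumptions.

```python
def may_survive_power_prune(word_gap, l_is_last, r_is_last, null_links_allowed):
    """Connectedness condition from power pruning: a candidate link
    between two words that are not neighbors (word_gap > 1) is useless
    if both connectors are the last of their respective lists, since
    the words strictly between them could then never be attached.
    With null links allowed, exactly that configuration becomes legal
    (it is how null links arise), so the check must be skipped."""
    if word_gap == 1:              # neighboring words: always fine
        return True
    if l_is_last and r_is_last:    # the condition power pruning enforces
        return null_links_allowed  # ...relaxed by the robust parser
    return True
```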

In the case where the sentence contains a conjunction, a variation of the power pruning algorithm
enables pruning of disjuncts before fat connectors are built. This variation uses the notion of a
deletable region. A substring of words is said to be a deletable region if its removal would result
in a grammatical sentence. For example, in the sentence "the dog and cat ran", both "dog and"
and "and cat" are deletable regions. In essence, conjunction pruning proceeds by parsing the sentence
but allowing for the excision of one or more deletable regions. Each disjunct that participates in
such a parse is marked; all unmarked disjuncts can be pruned away. This strategy is again easily
extended to the situation where we allow null links, by simply using the robust parsing
algorithm when marking disjuncts and allowing any region of words to be deletable.
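As a concrete (if naive) illustration of the definition, deletable regions can be enumerated by brute force given a grammaticality oracle. The function below is a sketch of the definition only, not the marking pass the parser actually performs:

```python
def deletable_regions(words, is_grammatical):
    """Return all (i, j) such that removing words[i:j] leaves a
    grammatical sentence.  is_grammatical is assumed to be a predicate
    supplied by the (non-robust) parser; this enumeration is quadratic
    in sentence length, for illustration only."""
    regions = []
    n = len(words)
    for i in range(n):
        for j in range(i + 1, n + 1):
            if is_grammatical(words[:i] + words[j:]):
                regions.append((i, j))
    return regions
```

For "the dog and cat ran", with an oracle that accepts "the dog ran" and "the cat ran", this yields exactly the two regions covering "dog and" and "and cat", as in the example above.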

Experimental Results

The Switchboard corpus was created to allow standard evaluations and provide a stable research
environment for large-vocabulary continuous speech recognition. The corpus contains an abundance
of phenomena associated with conversational language: non-standard words, false starts,
stutters, interruptions by other speakers, partial words, grammatical errors, changes in grammatical
structure mid-sentence, hesitation sounds, and non-speech sounds.

This section describes the results of an experiment in which we applied our robust parser to a
randomly chosen subset of the Switchboard corpus. Our purpose in doing this experiment was
twofold. First, we wanted to determine whether the robust parsing algorithm, built upon a grammar
that attempts to describe correct English, would be able to glean useful structure from such
incorrect text. Second, we wanted to measure the time efficiency of the parser. It was not our
purpose to give an objective measure of the quality of the grammar or to make a direct comparison
with other robust parsing methods.

sentences   not skipped
            skipped (too short)
            grammatical
            with unknown words
words       not skipped
            skipped (too short)
linkages    average per sentence
            average number of null links
time        total wall time (seconds)
            maximum for one sentence

Other than the entries listing the number of sentences and words skipped, skipped sentences do not
participate in the calculation of the other entries. The total wall time is the elapsed time needed
to compute and sort all linkages for each sentence on a DEC AXP Alpha workstation.

Table: The results of the robust parser on Switchboard data.

Before parsing the text we removed all punctuation and case information. We also combined
some commonly occurring word pairs, such as "you know", into single words; however, the dictionary
was not modified to account for these new words. These changes make the problem of parsing the
Switchboard corpus closer to that of analyzing speech.

One problem that emerged in trying to apply the parser to this corpus was the lack of sentence
boundaries. The parser expects its input to be broken into sentence-size blocks. A natural
approach is to use changes of speaker identity to partition the input. We will call such a block
of text an utterance. Some utterances are very long and require a lot of time and memory to
parse if fed to the parser as a single sentence. We explored two ways to deal with this problem.
The first, reported below, is merely to split long utterances into shorter groups of words, which we
will call sentences, that can be parsed using reasonable resources. This loses some grammatical
information, but even with an unsophisticated splitting algorithm the parser still seems to discover
useful structure. Another method to alleviate the high cost of parsing long sentences is to limit
the length of the longest link permitted.
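The first approach amounts to nothing more than a fixed-width split. A minimal sketch follows; the token limit is left as a parameter, since the value used in the experiment is not reproduced here.

```python
def split_utterance(tokens, max_tokens):
    """Break one speaker turn ('utterance') into consecutive blocks of
    at most max_tokens tokens; each block is then parsed as a sentence."""
    return [tokens[i:i + max_tokens]
            for i in range(0, len(tokens), max_tokens)]
```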

In the experiment reported here we chose utterances at random from the Switchboard
corpus. Each utterance was split into pieces no longer than a fixed number of tokens each. The number of words
in a sentence after the splitting can still exceed this limit because the parser considers some
tokens, such as possessives, to be more than one word. We also skipped all sentences with fewer
than four words. The results of applying the robust parser to this data are summarized in the table above.

Somewhat surprisingly, given that only a small fraction of the corpus's sentences were found to be grammatical,
only two null links were needed on average to parse the corpus. In order to better understand
this statistic, we randomly permuted the words in the sentences we previously used,
preserving the lengths of the sentences. Far fewer of these randomly generated sentences
were grammatical, and far more null links were needed on average to parse them.

Perhaps the most conspicuous feature that distinguishes the utterances in the Switchboard data
from more grammatical text is the frequent occurrence of repeated words. The following example
contains two repeats of the word "we've" in addition to a false start. (A description of the connectors
used is given in an appendix.) The robust parser successfully ignores the false start by enforcing
the constraint of subject-verb agreement. The three occurrences of "we've" are interchangeable, and
this results in three equivalent parses for every use of this word.

[Linkage diagram: I'VE WE'VE WE'VE WE'VE KIND OF BEEN THE TYPES THAT WILL SAY THIS IS WHAT WE INTEND TO SPEND, with connectors B, T, O, D, C, CL, S, TT, I]

It is also common in this data for an utterance to include an embedded phrase that is grammatical
in isolation but not with respect to the larger utterance. As the following example demonstrates,
the robust parser can handle this by connecting the embedded phrase to the larger utterance with
a null link. (There were other linkages of this utterance that are essentially equivalent to the one shown below.)

[Linkage diagram: ///// BUT I I IT'S INTERESTING UH I LIKE TO LOOK AT AIRPLANES, with connectors CL, S, W, AI, TO, I, EV, J]

In many cases the utterance is ungrammatical, yet by minimizing the number of null links the
parser is able to recover the relevant structure. The following example fails to be grammatical
because of false starts and the sequence of words "instead of the prosecuting".

[Linkage diagram: WELL I'VE HEARD THIS ONE THEORY THAT IF THAT IF A KID COMMITS A CRIME THAT THAT UH INSTEAD OF THE PROSECUTING THE KID THEY PROSECUTE THE PARENT, with connectors O, TH, CL, T, D, S, CO, M, ID]

Random sentences

We now present parses of ten randomly chosen sentences from the corpus.

uh i havent been to any toronto games yet

The parses for this sentence have a single null link.

[Linkage diagram: UH I HAVEN'T BEEN TO ANY TORONTO GAMES YET, with connectors EV, J, D, S, T, EV, AN]

The word "toronto" is not in the dictionary. When a word is not defined, the parser assigns a
default list of disjuncts to the word. In this case the context is used to correctly determine that
the unknown word functions as an adjective.

well its it it yeah its a little bit like any other sport you know

This sentence has six linkages with four null links.

[Linkage diagram: WELL IT 'S IT IT YEAH IT 'S A LITTLE BIT LIKE ANY OTHER SPORT YOU-KNOW, with connectors EV, O, S, D, J, CL, A, ID]

uh are you presently looking uh for a used car

One linkage with two null links.

[Linkage diagram: ///// UH ARE YOU PRESENTLY LOOKING UH FOR A USED CAR, with connectors J, Q, GI, EV, D, SI, EX, A]

when i was in i used to live in california

One linkage with no null links.

[Linkage diagram: WHEN I WAS IN I USED TO LIVE IN CALIFORNIA, with connectors CO, CL, S, K, TO, I, EV, J]

is is the kmart there

Two linkages, differing only in which instance of "is" is used.

[Linkage diagram: ///// IS IS THE KMART THERE, with connectors EV, Q, SI, D]

The word "kmart" is not in the dictionary, but the unknown-word mechanism takes advantage of
the context to choose an appropriate disjunct.

well i think uh one thing that well see is that we wont be educating everybody

One linkage with two null links.

[Linkage diagram: WELL I THINK UH ONE THING THAT WE'LL SEE IS THAT WE WON'T BE EDUCATING EVERYBODY, with connectors S, CL, B, D, C, I, O, GI]

The parser incorrectly splits this sentence into "well i think one thing that well see is that"
and "we wont be educating everybody".

and they dont know a lot about that noise

Five linkages with one null link. They differ in whether "about" modifies "a lot" or "know", whether
"a lot" is an object or an adverb, and whether or not to treat "a lot" as an idiom. The correct linkage is shown below.

[Linkage diagram: ///// AND THEY DON'T KNOW A LOT ABOUT THAT NOISE, with connectors EV, O, W, CM, S, I, ID, J]

i mean but it would to be an emergency type thing

One linkage with two null links.

[Linkage diagram: I_MEAN BUT IT WOULD TO BE AN EMERGENCY TYPE THING, with connectors O, D, I, AN, CM, S]

right kind of like uh uh barefoot cruises type of thing yeah uh huh

Five similar linkages with nine null links.

[Linkage diagram: RIGHT KIND OF LIKE UH UH BAREFOOT CRUISES TYPE OF THING YEAH UH_HUH, with connectors A, A, S]

i dont have any either and ive been a married for eleven years so

Two linkages with two null links. The following linkage treats "and" as a conjunction.

[Linkage diagram: I DON'T HAVE ANY EITHER AND I'VE BEEN A MARRIED FOR ELEVEN YEARS SO, with connectors EV, S, I, O, V, J, T, D]

Analysis

The mixed-case version of the parser treats words beginning with upper-case characters as proper
nouns. However, since there is no case information available in our data, most proper nouns must
be treated as unknown words. The generic definition used for unknown words often interprets
these words correctly, taking clues from the surrounding context.

Minimizing the number of null links in a linkage, while a fairly good heuristic for producing good
parses, is not foolproof. For example, we might interpret "when i was in i used to live in california"
as a false start of "when i was in" followed by "i used to live in california". The parser cannot mimic this interpretation because
to do so would require more than the minimum number of null links. Some of the examples demonstrate
deficiencies of the grammar, and suggest that a learning algorithm would be useful to infer new
word usages and grammatical relations. Similarly, the parser cannot infer
missing words: in "i mean but it would to be an emergency type thing", we would like the parser to realize that "have" should be inserted between "would"
and "to". And, as the nine-null-link example shows, the parser can utterly fail at producing useful structure.

The robust parser has limitations on the input it can parse. For the Switchboard domain,
splitting long sentences into shorter ones does not seem to cause significant problems, but on other
domains it might. Limiting the length of the longest link permitted can allow the system to parse
long utterances in reasonable time with realistic memory limitations. While the resulting linkages
might not be as good as ones produced without any limitation, the tradeoff of time and space
versus accuracy may be worthwhile for many applications.

References

J. Godfrey, E. Holliman, and J. McDaniel. SWITCHBOARD: Telephone speech corpus for research
and development. In Proc. ICASSP, 1992.

A. Lavie and M. Tomita. Parsing unrestricted sentences using a generalized LR parser. In
Proceedings of the Third International Workshop on Parsing Technologies, 1993.

K. J. Lee, C. J. Kweon, J. Seo, and G. C. Kim. A robust parser based on syntactic information.
In Proceedings of the Conference of the European Chapter of the Association for
Computational Linguistics, 1995.

G. Lyon. Syntax-directed least-errors analysis for context-free languages: A practical approach.
Communications of the ACM, January 1974.

D. D. K. Sleator and D. Temperley. Parsing English with a link grammar. Technical Report
CMU-CS-91-196, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1991.

A. Descriptions of Connectors

A: Connects pre-nominal adjectives to nouns.

AN: Connects noun-modifiers to nouns.

B: Used in a number of situations involving relative clauses and questions.

C: Connects nouns to relative clauses.

CL: Connects conjunctions and certain verbs with subjects of clauses.

CM: Connects conjunctions and certain verbs with subjects of clauses.

CO: Used to connect openers to subjects of clauses.

D: Connects determiners to nouns.

DS: Connects determiners to number expressions.

EV: Connects verbs and adjectives to modifying phrases like adverbs, prepositional phrases,
certain conjunctions, and other items.

EX: Used for adverbs modifying verbs which precede the verb.

GI: Connects verbs that take present participles with present participles.

I: Connects certain verbs with infinitives.

ID: Idioms.

J: Connects prepositions to their objects.

K: Connects certain verbs with participles like "in", "out", "up", and the like.

M: Connects nouns to various kinds of post-nominal modifiers, such as prepositional phrases,
collapsed relatives, and possessive relatives.

O: Connects transitive verbs to direct or indirect objects.

Q: Used in questions.

R: Connects certain verbs and adjectives to question words, forming indirect questions.

S: Connects subject nouns to finite verbs.

SI: Used in subject-verb inversion.

SXI: Connects non-referential subjects "it" and "there" to invertible verbs in questions with
subject-verb inversion.

T: Connects forms of "have" with past participles.

TH: Connects words that take "that" clause complements with the word "that".

TT: Connects certain verbs to infinitival complements when the verb is also taking a direct
object.

TO: Connects verbs and adjectives which take infinitival complements to the word "to".

Q: Connects auxiliaries in simple subject-verb inversion questions when there is no preceding
object.

V: Used to connect passive participles to forms of "be".