University of Pennsylvania ScholarlyCommons
IRCS Technical Reports Series Institute for Research in Cognitive Science
June 1997
Complexity of Lexical Descriptions and its Relevance to Partial Parsing
Srinivas Bangalore University of Pennsylvania
Follow this and additional works at: https://repository.upenn.edu/ircs_reports
Bangalore, Srinivas, "Complexity of Lexical Descriptions and its Relevance to Partial Parsing" (1997). IRCS Technical Reports Series. 82. https://repository.upenn.edu/ircs_reports/82
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-97-10.
This paper is posted at ScholarlyCommons. https://repository.upenn.edu/ircs_reports/82
Abstract
In this dissertation, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated with rich descriptions (supertags) that impose complex constraints in a local context. However, increasing the complexity of descriptions makes the number of different descriptions for each lexical item much larger and hence increases the local ambiguity for a parser. This local ambiguity can be resolved by using supertag co-occurrence statistics collected from parsed corpora. We have explored these ideas in the context of the Lexicalized Tree-Adjoining Grammar (LTAG) framework, wherein supertag disambiguation provides a representation that is an almost parse. We have used the disambiguated supertag sequence in conjunction with a lightweight dependency analyzer to compute noun groups, verb groups, dependency linkages and even partial parses. We have shown that a trigram-based supertagger achieves an accuracy of 92.1% on Wall Street Journal (WSJ) texts. Furthermore, we have shown that the lightweight dependency analysis on the output of the supertagger identifies 83% of the dependency links accurately. We have exploited the representation of supertags with Explanation-Based Learning to improve parsing efficiency. In this approach, parsing in limited domains can be modeled as a Finite-State Transduction. We have implemented such a system for the ATIS domain which improves parsing efficiency by a factor of 15. We have used the supertagger in a variety of applications to provide lexical descriptions at an appropriate granularity. In an information retrieval application, we show that the supertag-based system performs at higher levels of precision compared to a system based on part-of-speech tags.
In an information extraction task, supertags are used in specifying extraction patterns. For language modeling applications, we view supertags as syntactically motivated class labels in a class-based language model. The distinction between recursive and non-recursive supertags is exploited in a sentence simplification application.
This thesis or dissertation is available at ScholarlyCommons: https://repository.upenn.edu/ircs_reports/82
Complexity of Lexical Descriptions and Its Relevance to Partial Parsing
Srinivas Bangalore
University of Pennsylvania 3401 Walnut Street, Suite 400A Philadelphia, PA 19104-6228
June 1997
Site of the NSF Science and Technology Center for Research in Cognitive Science
IRCS Report 97-10
COMPLEXITY OF LEXICAL DESCRIPTIONS AND ITS
RELEVANCE TO PARTIAL PARSING
SRINIVAS BANGALORE
A DISSERTATION
in
COMPUTER AND INFORMATION SCIENCE
Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy