University of Pennsylvania ScholarlyCommons
IRCS Technical Reports Series Institute for Research in Cognitive Science
September 1998
Incorporating Punctuation Into the Sentence Grammar: A Lexicalized Tree Adjoining Grammar Perspective
Christine D. Doran University of Pennsylvania
Follow this and additional works at: https://repository.upenn.edu/ircs_reports
Doran, Christine D., "Incorporating Punctuation Into the Sentence Grammar: A Lexicalized Tree Adjoining Grammar Perspective" (1998). IRCS Technical Reports Series. 68. https://repository.upenn.edu/ircs_reports/68
University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-98-24.
This paper is posted at ScholarlyCommons. https://repository.upenn.edu/ircs_reports/68 For more information, please contact [email protected]. Incorporating Punctuation Into the Sentence Grammar: A Lexicalized Tree Adjoining Grammar Perspective
Abstract Punctuation helps us to structure, and thus to understand, texts. Many uses of punctuation straddle the line between syntax and discourse, because they serve to combine multiple propositions within a single orthographic sentence. They allow us to insert discourse-level relations at the level of a single sentence. Just as people make use of information from punctuation in processing what they read, computers can use information from punctuation in processing texts automatically. Most current natural language processing systems fail to take punctuation into account at all, losing a valuable source of information about the text. Those which do mostly do so in a superficial way, again failing to fully exploit the information conveyed by punctuation. To be able to make use of such information in a computational system, we must first characterize its uses and find a suitable eprr esentation for encoding them.
The work here focuses on extending a syntactic grammar to handle phenomena occurring within a single sentence which have punctuation as an integral component. Punctuation marks are treated as full- fledged lexical items in a Lexicalized Tree Adjoining Grammar, which is an extremely well-suited formalism for encoding punctuation in the sentence grammar. Each mark anchors its own elementary trees and imposes constraints on the surrounding lexical items. I have analyzed data representing a wide variety of constructions, and added treatments of them to the large English grammar which is part of the XTAG system. The advantages of using LTAG are that its elementary units are structured trees of a suitable size for stating the constraints we are interested in, and the derivation histories it produces contain information the discourse grammar will need about which elementary units have used and how they have been combined. I also consider in detail a few particularly interesting constructions where the sentence and discourse grammars meet-appositives, reported speech and uses of parentheses. My results confirm that punctuation can be used in analyzing sentences ot increase the coverage of the grammar, reduce the ambiguity of certain word sequences and facilitate discourse-level processing of the texts.
Comments University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-98-24.
This thesis or dissertation is available at ScholarlyCommons: https://repository.upenn.edu/ircs_reports/68
INCORPORATING PUNCTUATION INTO THE
SENTENCE GRAMMAR A LEXICALIZED TREE
ADJOINING GRAMMAR PERSPECTIVE
Christine D Doran
A DISSERTATION
in
Linguistics
Presented to the Faculties of the University of Pennsylvania
in Partial Ful llmentofthe Requirements for the Degree of Do ctor of Philosophy