Sepia: Semantic Parsing for Named Entities by Gregory A. Marton Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY August 2003 () Massachusetts Institute of Technology 2003. All rights reserved.
. li Author...... Department of Elec1cal Engineering and Computer Science August 29, 2003
Certifiedby ...... - Boris Katz Principal Research Scientist Thesis Supervisor
( Accepted by Arthur C. Smith Chairman, Department Committee on Graduate Students
MASSACHUSETITISINSURi OF TECHNOLOGY ARGHVES 31, APR 15 2004
LIBRARIES I. I Sepia: Semantic Parsing for Named Entities by Gregory A. Marton
Submitted to the Department of Electrical Engineering and Computer Science on August 29, 2003, in partial fulfillment of the requirements for the degree of Master of Science
Abstract People's names, dates, locations, organizations, and various numeric expressions, col- lectivelv called Named Entities, are used to convey specific meanings to humans in the same way that identifiers and constants convey meaning to a computer language interpreter. Natural Language Question Answering can benefit from understanding the meaning of these expressions because answers in a text are often phrased dif- ferently from questions and from each other. For example, "9/11" might mean the same as "September 11th" and "Mayor Rudy Giuliani" might be the same person as "Rudolph Giuliani". Sepia, the system presented here, uses a lexicon of lambda expressions and a mildly context-sensitive parser ireateAri data structure fe ehch' named entity. The parser and grammar design are inspired by Combinatory Categorial Grammar. The data structures are designed to capture semantic dependencies using common syntactic forms. Sepia differs from other natural language parsers in that it does not use a pipeline architecture. As yet there is no statistical component in the architecture. To evaluate Sepia, I use examples tp illustrate its qualitative differences from other named entity systems, I measure component perforrmance on Automatic Con- tent Extraction (ACE) competition held-out training data. and I assess end-to-end performance in the Infolab's TREC-12 Question Answering competition entry. Sepia will compete in the ACE Entity Detection and Tracking track at the end of Septem- ber.
Thesis Supervisor: Boris Katz Title: Principal Research Scientist
2