Generating Domain Terminologies Using Root- and Rule-Based Terms
Jacob Collard (1), T. N. Bhat (2), Eswaran Subrahmanian (3, 4), Ram D. Sriram (3), John T. Elliot (2), Ursula R. Kattner (2), Carelyn E. Campbell (2), Ira Monarch (4, 5)

(1) Independent Consultant, Ithaca, New York
(2) Materials Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD
(3) Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD
(4) Carnegie Mellon University, Pittsburgh, PA
(5) Independent Consultant, Pittsburgh, PA

Abstract

Motivated by the need for flexible, intuitive, reusable, and normalized terminology for guiding search and building ontologies, we present a general approach for generating sets of such terminologies from natural language documents. The terms that this approach generates are root- and rule-based terms, produced by a series of rules designed to be flexible, to evolve, and, perhaps most important, to protect against ambiguity and to standardize semantically similar but syntactically distinct phrases to a normal form. The approach combines several linguistic and computational methods that can be automated with the help of training sets to extract normalized terms quickly and consistently. We discuss how it can be extended as natural language technologies improve and how the strategy applies to common use cases such as search, document entry and archiving, and identifying, tracking, and predicting scientific and technological trends.

Keywords: dependency parsing; natural language processing; ontology generation; search; terminology generation; unsupervised learning.

1. Introduction

1.1 Terminologies and Semantic Technologies

Services and applications on the World Wide Web, as well as standards defined by the World Wide Web Consortium (W3C), the primary standards organization for the web, have been integrating semantic technologies into the Internet since 2001 (Koivunen and Miller 2001).