A Link Grammar for Turkish

Total pages: 16

File type: PDF, size: 1020 KB

A LINK GRAMMAR FOR TURKISH

A thesis submitted to the Department of Computer Engineering and the Institute of Engineering and Sciences of Bilkent University in partial fulfillment of the requirements for the degree of Master of Science.

By Özlem İstek
August, 2006

I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in quality, as a thesis for the degree of Master of Science.

Asst. Prof. Dr. İlyas Çiçekli (Supervisor)
Prof. Dr. H. Altay Güvenir
Assoc. Prof. Ferda Nur Alpaslan

Approved for the Institute of Engineering and Sciences:
Prof. Dr. Mehmet Baray, Director of Institute of Engineering and Sciences

ABSTRACT

A LINK GRAMMAR FOR TURKISH
Özlem İstek
M.S. in Computer Engineering
Supervisor: Asst. Prof. Dr. İlyas Çiçekli
August, 2006

Syntactic parsing, or syntactic analysis, is the process of analyzing an input sequence in order to determine its grammatical structure, i.e. the formal relationships between the words of a sentence, with respect to a given grammar. In this thesis, we developed a grammar of the Turkish language in the link grammar formalism. The grammar uses the output of a full-coverage, two-level morphological analyzer, which is very important for agglutinative languages like Turkish. The grammar we developed is lexical in the sense that only some function words are entered by their lexemes; for the remaining word classes, morphological feature structures are used in place of the words themselves. In addition, our system preserves some of the syntactic roles of the intermediate derived forms of words.

Keywords: Natural Language Processing, Turkish Grammar, Turkish Syntax, Parsing, Link Grammar.

ÖZET

TÜRKÇE İÇİN BİR BAĞ GRAMERİ
Özlem İstek
Bilgisayar Mühendisliği Bölümü, Yüksek Lisans
Tez Yöneticisi: Yar. Doç. Dr. İlyas Çiçekli
Ağustos, 2006

Sözdizimsel çözümleme veya ayrıştırma, bir tümcenin dilbilgisel yapısını, yani kelimeleri arasındaki ilişkiyi, ortaya çıkarmak amacıyla verilen bir gramere göre inceleme işlemidir. Bu çalışmada, Türkçe için bir bağ grameri geliştirilmiştir. Sistemimizde, Türkçe gibi çekimli ve bitişken biçimbirimlere sahip diller için çok önemli olan, tam kapsamlı, iki aşamalı bir biçimbirimsel tanımlayıcının sonuçları kullanılmıştır. Geliştirdiğimiz gramer sözcükseldir; ancak bazı işlevsel kelimeler oldukları gibi kullanılırken, diğer kelime türleri için kelimelerin kendilerinin yerine biçimbirimsel özellikleri kullanılmıştır. Ayrıca sistemimizde kelimelerin ara türeme formlarının sözdizimsel rollerinin bazıları muhafaza edilmiştir.

Anahtar Kelimeler: Doğal Dil İşleme, Türkçe Dilbilgisi, Türkçe Sözdizimi, Sözdizimsel Çözümleme, Bağ Grameri.

Acknowledgement

I would like to express my deep gratitude to my supervisor Asst. Prof. Dr. İlyas Çiçekli for his invaluable guidance, encouragement, and suggestions throughout the development of this thesis. I would also like to thank Prof. Dr. H. Altay Güvenir and Assoc. Prof. Ferda Nur Alpaslan for reading and commenting on this thesis. I would like to thank my friends Abdullah Fişne and Serdar Severcan for their help. I am also grateful to my friend Arif Yılmaz for his invaluable help, moral support, encouragement, and suggestions.
I am grateful to my family for their infinite moral support and help throughout my life.

To my mother, Fatma İSTEK

Contents

1 Introduction
  1.1 Linguistic Background
  1.2 Thesis Outline
2 Link Grammar
  2.1 Introduction
  2.2 Main Rules of the Grammar
  2.3 Language and Notion of Link Grammars
    2.3.1 Rules for Writing Connector Blocks or Linking Requirements
    2.3.2 The Concept of Disjuncts
  2.4 General Features of the Link Parser
  2.5 Special Features of the Dictionary
  2.6 Coordinating Conjunctions
    2.6.1 Handling Conjunctions
    2.6.2 Some Problematic Conjunctional Structures
  2.7 Post-Processing
    2.7.1 Introduction
    2.7.2 Structures of Domains
    2.7.3 Rules in Post-Processing
3 Turkish Morphology and Syntax
  3.1 Distinctive Features of Turkish
  3.2 Turkish Morphotactics
    3.2.1 Inflectional Morphotactics
    3.2.2 Derivational Morphotactics
    3.2.3 Question Morpheme
  3.3 Constituent Order in Turkish
  3.4 Classification of Turkish Sentences
    3.4.1 Classification by Structure
    3.4.2 Classification by Predicate Type
    3.4.3 Classification by Predicate Place
    3.4.4 Classification by Meaning
  3.5 Substantival Sentences
4 Design
  4.1 Morphological Analyzer
    4.1.1 Turkish Morphological Analyzer
    4.1.2 Improvements and Modifications to Turkish Morphological Analyzer
  4.2 System Architecture
5 Turkish Link Grammar
  5.1 Scope of Turkish Link Grammar
  5.2 Linking Requirements Related to All Words
  5.3 Compound Sentences, Nominal Sentences, and the Wall
  5.4 Linking Requirements of Word Classes
    5.4.1 Adverbs
    5.4.2 Postpositions
    5.4.3 Adjectives and Numbers
    5.4.4 Pronouns
    5.4.5 Nouns
    5.4.6 Verbs
    5.4.7 Conjunctions
6 Performance Evaluation
7 Conclusion
Bibliography
A Turkish Morphological Features
B Summary of Link Types
C Input Document and Statistical Results
D Example Output from Our Test Run

List of Figures

Figure 1 METU-Sabancı Turkish Treebank
Figure 2 Typical Order of Constituents in Turkish
Figure 3 Architecture of a Two-Level Morphological Analyzer
Figure 4 System Architecture
Recommended publications
  • A Protein Interaction Extraction System Using a Link Grammar Parser from Biomedical Abstracts
    World Academy of Science, Engineering and Technology, International Journal of Biomedical and Biological Engineering, Vol. 1, No. 5, 2007. PIELG: A Protein Interaction Extraction System Using a Link Grammar Parser from Biomedical Abstracts. Rania A. Abul Seoud, Nahed H. Solouma, Abou-Baker M. Youssef, and Yasser M. Kadah, Senior Member, IEEE.

    Abstract: Due to the ever-growing amount of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of the crucial technologies in bioinformatics. This paper presents a Protein Interaction Extraction System using a Link Grammar Parser from biomedical abstracts (PIELG). PIELG uses the linkage given by the Link Grammar Parser to start a case-based analysis of the contents of various syntactic roles as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verb patterns to overcome problems with preposition combinations. The recall and precision are 74.4% and 62.65%, respectively. Experimental evaluations with two other state-of-the-art extraction systems indicate that the PIELG system achieves better performance.

    From the introduction: Applications that repair or replace portions of or whole living tissues (e.g., bone, dentine, or bladder) using living cells are named Tissue Engineering (TE). For example, dentine formation is the process of regenerating dental tissues by tissue engineering principles and technology. Dentine formation is governed by biological mediators or growth factors (proteins) and interactions amongst different proteins, and it needs the support of continuously updated information about protein-protein interactions. Research in the last decade has resulted in the production of a large amount of information about protein functions involved in the dentine formation process.
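    Given the reported precision (62.65%) and recall (74.4%), the balanced F-measure follows directly; a quick arithmetic check as a minimal sketch in Python (the F1 formula is the standard definition, not something reported in the paper):

```python
# Harmonic mean of the precision and recall reported for PIELG.
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

p, r = 0.6265, 0.744  # values reported in the PIELG abstract
print(f"F1 = {f1_score(p, r):.3f}")  # -> F1 = 0.680
```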
  • Implementing a Portable Clinical NLP System with a Common Data Model: a Lisp Perspective
    Implementing a Portable Clinical NLP System with a Common Data Model: a Lisp Perspective. Yuan Luo (Dept. of Preventive Medicine, Northwestern University, Chicago, USA) and Peter Szolovits (CSAIL, MIT, Cambridge, USA). In Proceedings, 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2018), December 3-6, 2018, Madrid, Spain. DOI: 10.1109/BIBM.2018.8621521. Publisher: Institute of Electrical and Electronics Engineers (IEEE). Version: original manuscript, openly shared by the MIT Faculty. Citable link: https://hdl.handle.net/1721.1/124439. Terms of use: Creative Commons Attribution-Noncommercial-Share Alike, http://creativecommons.org/licenses/by-nc-sa/4.0/.

    Abstract: This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized, and in-house developed NLP tools. Our system facilitates portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements. Existing tools produce annotations, often in idiosyncratic representations, which makes it quite difficult to chain together sequences of operations; although several recent projects have achieved reasonable success in analyzing certain types of clinical narratives [3-6], efforts towards a common data model are needed to ensure portability.
  • Chapter 14: Dependency Parsing
    Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright © 2021. All rights reserved. Draft of September 21, 2021. Chapter 14: Dependency Parsing.

    The focus of the two previous chapters has been on context-free grammars and constituent-based representations. Here we present another important family of grammar formalisms called dependency grammars. In dependency formalisms, phrasal constituents and phrase-structure rules do not play a direct role. Instead, the syntactic structure of a sentence is described solely in terms of directed binary grammatical relations between the words, as in the dependency parse (14.1) of "I prefer the morning flight through Denver", whose arcs carry the labels root, nsubj, dobj, det, nmod (twice), and case. Relations among the words are illustrated above the sentence with directed, labeled arcs from heads to dependents. We call this a typed dependency structure because the labels are drawn from a fixed inventory of grammatical relations. A root node explicitly marks the root of the tree, the head of the entire structure. Figure 14.1 shows the same dependency analysis as a tree alongside its corresponding phrase-structure analysis of the kind given in Chapter 12. Note the absence of nodes corresponding to phrasal constituents or lexical categories in the dependency parse; the internal structure of the dependency parse consists solely of directed relations between lexical items in the sentence. These head-dependent relationships directly encode important information that is often buried in the more complex phrase-structure parses. For example, the arguments to the verb prefer are directly linked to it in the dependency structure, while their connection to the main verb is more distant in the phrase-structure tree.
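    The arcs of the parse in (14.1) can be written down directly as (relation, head, dependent) triples; a minimal sketch in Python of that encoding (the triple representation is a common convention, assumed here rather than taken from the chapter):

```python
# The dependency parse (14.1) as (relation, head, dependent) triples.
# "ROOT" is the artificial root node marking the head of the structure.
parse = [
    ("root",  "ROOT",   "prefer"),
    ("nsubj", "prefer", "I"),
    ("dobj",  "prefer", "flight"),
    ("det",   "flight", "the"),
    ("nmod",  "flight", "morning"),
    ("nmod",  "flight", "Denver"),
    ("case",  "Denver", "through"),
]

def dependents(head: str) -> list[tuple[str, str]]:
    """All (relation, dependent) pairs governed by the given head."""
    return [(rel, dep) for rel, h, dep in parse if h == head]

print(dependents("prefer"))  # the verb's arguments attach directly to it
```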
  • Hybridity versus Revivability: Multiple Causation, Forms and Patterns
    HYBRIDITY VERSUS REVIVABILITY: MULTIPLE CAUSATION, FORMS AND PATTERNS Ghil‘ad Zuckermann Associate Professor and ARC Discovery Fellow in Linguistics The University of Queensland, Australia Abstract The aim of this article is to suggest that due to the ubiquitous multiple causation, the revival of a no-longer spoken language is unlikely without cross-fertilization from the revivalists’ mother tongue(s). Thus, one should expect revival efforts to result in a language with a hybridic genetic and typological character. The article highlights salient morphological constructions and categories, illustrating the difficulty in determining a single source for the grammar of Israeli, somewhat misleadingly a.k.a. ‘Modern Hebrew’. The European impact in these features is apparent inter alia in structure, semantics or productivity. Multiple causation is manifested in the Congruence Principle, according to which if a feature exists in more than one contributing language, it is more likely to persist in the emerging language. Consequently, the reality of linguistic genesis is far more complex than a simple family tree system allows. ‘Revived’ languages are unlikely to have a single parent. The multisourced nature of Israeli and the role of the Congruence Principle in its genesis have implications for historical linguistics, language planning and the study of language, culture and identity. “Linguistic and social factors are closely interrelated in the development of language change. Explanations which are confined to one or the other aspect, no matter how well constructed, will fail to account for the rich body of regularities that can be observed in empirical studies of language behavior.” Weinreich, Labov & Herzog 1968: 188.
  • Building a Lexical Functional Grammar for Turkish
    Building a Lexical Functional Grammar for Turkish, by Özlem Çetinoğlu. Submitted to the Graduate School of Engineering and Natural Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Sabancı University, June 2009. Approved by Prof. Dr. Kemal Oflazer (Thesis Supervisor), Assoc. Prof. Dr. Aslı Göksel, Assoc. Prof. Dr. Berrin Yanıkoğlu, Prof. Dr. Cem Say, and Assoc. Prof. Dr. Hüsnü Yenigün. © Özlem Çetinoğlu 2009. All rights reserved. To my parents.

    Acknowledgments: I would like to express my deepest gratitude to my supervisor Prof. Dr. Kemal Oflazer, for his guidance and advice, not only for this research but also for academic life in general. I thank Prof. Dr. Cem Say, Assoc. Prof. Dr. Berrin Yanıkoğlu, Assoc. Prof. Dr. Hüsnü Yenigün, and Assoc. Prof. Dr. Aslı Göksel for their kind attendance at the thesis committee and for their valuable contributions. My sincere gratitude goes to Prof. Dr. Miriam Butt for patiently answering my questions and for her enormous help in linguistics. I would also like to thank all ParGram members for the insightful discussions and comments; I learned a lot from them. Önsel Armağan and Tuba Gümüş helped me in integrating their work into my system; I thank them for their efforts. Thanks to everyone at FENS 2014 for creating a fun place to work.
  • Development of a Persian Syntactic Dependency Treebank
    Development of a Persian Syntactic Dependency Treebank. Mohammad Sadegh Rasooli (Department of Computer Science, Columbia University, New York, NY), Manouchehr Kouhestani (Department of Linguistics, Tarbiat Modares University, Tehran, Iran), Amirsaeid Moloodi (Department of Linguistics, University of Tehran, Tehran, Iran).

    Abstract: This paper describes the annotation process and linguistic properties of the Persian syntactic dependency treebank. The treebank consists of approximately 30,000 sentences annotated with syntactic roles in addition to morpho-syntactic features. One of the unique features of this treebank is that there are almost 4,800 distinct verb lemmas in its sentences, making it a valuable resource for educational goals. The treebank is constructed with a bootstrapping approach by means of available tagging and parsing tools and manual correction of the annotations. The data is split into standard train, development, and test sets in the CoNLL dependency format and is freely available to researchers. Dependency treebanks are collections of sentences with their corresponding dependency trees; in the last decade, many dependency treebanks have been developed for a large number of languages, and there are at least 29 languages for which at least one dependency treebank is available (Zeman et al., 2012). Dependency trees are much more similar to the human understanding of language and can easily represent the free word-order nature of syntactic roles in sentences (Kübler et al., 2009). Persian is a language with about 110 million speakers all over the world (Windfuhr, 2009), yet in terms of the availability of teaching materials and annotated data for text processing it is undoubtedly under-resourced.
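    The CoNLL dependency format mentioned here stores one token per line as tab-separated columns, with blank lines between sentences. A minimal reader sketch, assuming the 10-column CoNLL-X layout (the field names below are illustrative, not taken from the treebank's documentation):

```python
def read_conll(path: str):
    """Yield one sentence at a time as a list of token dicts (CoNLL-X layout)."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:              # blank line terminates a sentence
                if sentence:
                    yield sentence
                    sentence = []
                continue
            cols = line.split("\t")
            sentence.append({
                "id": int(cols[0]),   # 1-based token index
                "form": cols[1],      # surface form
                "lemma": cols[2],
                "head": int(cols[6]), # 0 means the artificial root
                "deprel": cols[7],    # syntactic role label
            })
    if sentence:
        yield sentence
```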
  • Performance Testing Suricata: The Effect of Configuration Variables on Offline Suricata Performance
    Performance Testing Suricata: The Effect of Configuration Variables on Offline Suricata Performance. A project completed for CS 6266 under Jonathon T. Giffin, Assistant Professor, Georgia Institute of Technology, by Winston H. Messer. Project Advisor: Matt Jonkman, President, Open Information Security Foundation. December 2011.

    Abstract: The Suricata IDS/IPS engine, a viable alternative to Snort, has a multitude of potential configurations. A simplified automated testing system was devised for the purpose of performance testing Suricata in an offline environment. Of the available configuration variables, seventeen were analyzed independently by testing in fifty-six configurations. Of these, three variables were found to have a statistically significant effect on performance: detect engine profile, multi-pattern algorithm, and CPU affinity.

    Acknowledgements: In writing the final report on this endeavor, I would like to start by thanking four people who made this project possible. Matt Jonkman, President, Open Information Security Foundation: for allowing me the opportunity to carry out this project under his supervision. Victor Julien, Lead Programmer, Open Information Security Foundation, and Anne-Fleur Koolstra, Documentation Specialist, Open Information Security Foundation: for their willingness to share their wisdom and experience of Suricata via email for the past four months. John M. Weathersby, Jr., Executive Director, Open Source Software Institute: for allowing me the use of Institute equipment for the creation of a suitable testing
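    The automated testing described here boils down to timing offline runs of Suricata over the same capture file under different configuration files. A hypothetical harness along those lines (the config file names are illustrative assumptions; Suricata's -c and -r flags select the configuration file and the pcap input):

```python
import subprocess
import time

# Illustrative configuration files varying one variable (multi-pattern algorithm).
CONFIGS = ["configs/mpm-ac.yaml", "configs/mpm-wumanber.yaml"]
PCAP = "traffic-sample.pcap"  # hypothetical capture file for offline replay

for cfg in CONFIGS:
    start = time.perf_counter()
    # Run Suricata in offline mode against the pcap with this configuration.
    subprocess.run(["suricata", "-c", cfg, "-r", PCAP], check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    print(f"{cfg}: {time.perf_counter() - start:.1f}s")
```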
  • Lexicalization and Grammar Development
    University of Pennsylvania ScholarlyCommons, IRCS Technical Reports Series, Institute for Research in Cognitive Science, April 1995. Lexicalization and Grammar Development. B. Srinivas, Dania Egedi, Christine D. Doran, and Tilman Becker (University of Pennsylvania). Citation: Srinivas, B.; Egedi, Dania; Doran, Christine D.; and Becker, Tilman, "Lexicalization and Grammar Development" (1995), IRCS Technical Reports Series 138, University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-95-12. https://repository.upenn.edu/ircs_reports/138

    Abstract: In this paper we present a fully lexicalized grammar formalism as a particularly attractive framework for the specification of natural language grammars. We discuss in detail Feature-based, Lexicalized Tree Adjoining Grammars (FB-LTAGs), a representative of the class of lexicalized grammars. We illustrate the advantages of lexicalized grammars in various contexts of natural language processing, ranging from wide-coverage grammar development to parsing and machine translation. We also present a method for compact and efficient representation of lexicalized trees. (A German version of this abstract, with the same content, accompanies the original report.)
  • Parsing Aligned Parallel Corpus by Projecting Syntactic Relations from Annotated Source Corpus
    Parsing Aligned Parallel Corpus by Projecting Syntactic Relations from Annotated Source Corpus. Shailly Goyal and Niladri Chatterjee (Department of Mathematics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi - 110 016, India).

    Abstract: Example-based parsing has already been proposed in the literature. In particular, attempts are being made to develop techniques for language pairs where the source and target languages are different, e.g. the Direct Projection Algorithm (DPA) (Hwa et al., 2005). This enables one to develop a parsed corpus for target languages having fewer linguistic tools with the help of a resource-rich source language. Such approaches either use examples from the same language, e.g. (Bod et al., 2003; Streiter, 2002), or try to imitate the parse of a given sentence using the parse of the corresponding sentence in some other language (Hwa et al., 2005; Yarowsky and Ngai, 2001). The DPA works on the Direct Correspondence Assumption (DCA), which simply means that the relation between two words of the source language sentence is preserved across, and can be projected directly onto, the corresponding words of the parallel target language sentence. However, with respect to Indian languages we observed that the DCA does not hold good all the time. In order to overcome this difficulty, we propose an algorithm based on a variation of the DCA, which we call the pseudo Direct Correspondence Assumption (pDCA).
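    Under the DCA, a dependency between two source-language words is copied onto the target words aligned with them. A minimal sketch of that projection step, assuming a one-to-one word alignment given as a dict of token indices (function and variable names are illustrative, not from the paper):

```python
def project_dependencies(source_deps, alignment):
    """Project (relation, head_idx, dep_idx) triples from a source sentence
    onto its target sentence via a one-to-one word alignment (DCA).
    Dependencies with an unaligned endpoint are dropped."""
    projected = []
    for rel, head, dep in source_deps:
        if head in alignment and dep in alignment:
            projected.append((rel, alignment[head], alignment[dep]))
    return projected

# "I saw him" with 0-based indices: nsubj(saw -> I), dobj(saw -> him)
source_deps = [("nsubj", 1, 0), ("dobj", 1, 2)]
alignment = {0: 0, 1: 2, 2: 1}  # hypothetical SOV target word order
print(project_dependencies(source_deps, alignment))
# -> [('nsubj', 2, 0), ('dobj', 2, 1)]
```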
  • The Universality of Concord
    The Universality of Concord, by Isa Kerem Bayirli. BA, Middle East Technical University (2010); MA, Boğaziçi University (2012). Submitted to the Department of Linguistics and Philosophy in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Linguistics at the Massachusetts Institute of Technology, September 2017. © 2017 Isa Kerem Bayirli. All rights reserved. The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created. Certified by David Pesetsky, Ferrari P. Ward Professor of Linguistics, Thesis Supervisor; accepted by David Pesetsky, Head, Department of Linguistics and Philosophy.

    Abstract: In this dissertation, we develop and defend a universal theory of concord (i.e. feature sharing between a head noun and the modifying adjectives). When adjectives in a language show concord with the noun they modify, concord morphology usually involves the full set of features of that noun (e.g. gender, number, and case). However, there are also languages in which concord targets only a subset of the morphosyntactic features of the head noun. We first observe that feature combinations that enter into concord in such languages are not random.
  • Dependency-Length Minimization in Natural and Artificial Languages
    Journal of Quantitative Linguistics, 2008, Volume 15, Number 3, pp. 256-282. DOI: 10.1080/09296170802159512. Dependency-Length Minimization in Natural and Artificial Languages. David Temperley, Eastman School of Music at the University of Rochester.

    Abstract: A wide range of evidence points to a preference for syntactic structures in which dependencies are short. Here we examine the question: what kinds of dependency configurations minimize dependency length? We consider two well-established principles of dependency-length minimization: that dependencies should be consistently right-branching or left-branching, and that shorter dependent phrases should be closer to the head. We also add a third, novel principle: that some "opposite-branching" of one-word phrases is desirable. In a series of computational experiments, using unordered dependency trees gathered from written English, we examine the effect of these three principles on dependency length, and show that all three contribute significantly to dependency-length reduction. Finally, we present what appears to be the optimal "grammar" for dependency-length minimization.

    Introduction: A wide range of evidence from diverse areas of linguistic research points to a preference for syntactic structures in which dependencies between related words are short. This preference is reflected in numerous psycholinguistic phenomena: structures with long dependencies appear to cause increased complexity in comprehension and tend to be avoided in production.
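    Total dependency length is simply the sum of linear distances between each head and its dependent. A minimal sketch of the measure, assuming dependencies encoded as (head_position, dependent_position) pairs (an illustrative encoding, not the paper's own code):

```python
def total_dependency_length(deps):
    """Sum of linear head-dependent distances: the quantity being minimized."""
    return sum(abs(head - dep) for head, dep in deps)

# "the dog chased the cat" with 0-based word positions:
# chased(2)->dog(1), dog(1)->the(0), chased(2)->cat(4), cat(4)->the(3)
deps = [(2, 1), (1, 0), (2, 4), (4, 3)]
print(total_dependency_length(deps))  # -> 5
```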
  • Natural Language Processing (NLP): Written and Oral
    Natural Language Processing (NLP), written and oral, involves processing of written text using computer models at the lexical, syntactic, and semantic levels.

    Introduction:
    1. Understanding (of language): map text/speech to an immediately useful form.
    2. Generation (of language).

    Sentence analysis phases:
    1. Morphological analysis: extracts the root word from a declined/inflected form of the word after removing suffixes and prefixes.
    2. Syntactic analysis: builds a structural description of a sentence from the output of the morphological analysis phase based on grammatical rules; this process is called parsing.
    3. Semantic analysis: creates a semantic structure by ascribing a literal meaning to the sentence using the parse structure obtained in the syntactic phase. It maps individual words onto corresponding objects in the knowledge base; the focus is on the creation of a target representation of the meaning of the sentence.
    4. Pragmatic analysis: establishes the meaning of a sentence in different contexts.
    5. Discourse analysis: interpretation based on belief during conversation.

    Grammars and parsers. Parsing is the process of analyzing an input sequence in order to determine its structure with respect to a given grammar. Types of parsing:
    1. Rule-based parsing: the syntactic structure of the language is provided in the form of linguistic rules, which can be coded as production rules similar to context-free rules.
       a. Top-down parsing: start with the start symbol and apply grammar rules in the forward direction until the terminal symbols of the parse tree correspond to the words in the sentence (see the sketch below).
       b. Bottom-up parsing: start with the words in the sentence and apply grammar rules in the backward direction until a single tree is produced whose root matches the start symbol.
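    The top-down strategy in item 1a can be made concrete with a tiny recursive-descent parser. A minimal sketch over a toy context-free grammar (the grammar and function names are illustrative assumptions, not from the notes):

```python
# Toy grammar: S -> NP VP ; NP -> 'the' N ; N -> 'dog' | 'cat' ; VP -> 'sleeps'
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"]],
    "N":  [["dog"], ["cat"]],
    "VP": [["sleeps"]],
}

def parse(symbol, words, pos=0):
    """Top-down: expand `symbol`; return the next input position if
    words[pos:...] can be derived from it, else None."""
    if symbol not in GRAMMAR:                     # terminal symbol
        if pos < len(words) and words[pos] == symbol:
            return pos + 1
        return None
    for production in GRAMMAR[symbol]:            # try each rule in turn
        p = pos
        for sym in production:
            p = parse(sym, words, p)
            if p is None:
                break
        else:
            return p
    return None

words = "the dog sleeps".split()
print(parse("S", words) == len(words))  # True: the sentence is derivable
```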