Modelling Function Words Improves Unsupervised Word Segmentation

Total Page:16

File Type:pdf, Size:1020Kb

Modelling Function Words Improves Unsupervised Word Segmentation Modelling function words improves unsupervised word segmentation Mark Johnson1,2, Anne Christophe3,4, Katherine Demuth2,6 and Emmanuel Dupoux3,5 1 Department of Computing, Macquarie University, Sydney, Australia 2 Santa Fe Institute, Santa Fe, New Mexico, USA 3 Ecole Normale Superieure,´ Paris, France 4 Centre National de la Recherche Scientifique, Paris, France 5 Ecole des Hautes Etudes en Sciences Sociales, Paris, France 6 Department of Linguistics, Macquarie University, Sydney, Australia Abstract function words are treated specially in human language acquisition. We do this by comparing Inspired by experimental psychological two computational models of word segmentation findings suggesting that function words which differ solely in the way that they model play a special role in word learning, we function words. Following Elman et al. (1996) make a simple modification to an Adaptor and Brent (1999) our word segmentation models Grammar based Bayesian word segmenta- identify word boundaries from unsegmented se- tion model to allow it to learn sequences quences of phonemes corresponding to utterances, of monosyllabic “function words” at the effectively performing unsupervised learning of a beginnings and endings of collocations lexicon. For example, given input consisting of of (possibly multi-syllabic) words. This unsegmented utterances such as the following: modification improves unsupervised word segmentation on the standard Bernstein- juwɑnttusiðəbʊk Ratner (1987) corpus of child-directed En- a word segmentation model should segment this as glish by more than 4% token f-score com- ju wɑnt tu si ðə bʊk, which is the IPA representation pared to a model identical except that it of “you want to see the book”. does not special-case “function words”, We show that a model equipped with the abil- setting a new state-of-the-art of 92.4% to- ity to learn some rudimentary properties of the ken f-score. Our function word model as- target language’s function words is able to learn sumes that function words appear at the the vocabulary of that language more accurately left periphery, and while this is true of than a model that is identical except that it is inca- languages such as English, it is not true pable of learning these generalisations about func- universally. We show that a learner can tion words. This suggests that there are acqui- use Bayesian model selection to determine sition advantages to treating function words spe- the location of function words in their lan- cially that human learners could take advantage of guage, even though the input to the model (at least to the extent that they are learning similar only consists of unsegmented sequences of generalisations as our models), and thus supports phones. Thus our computational models the hypothesis that function words are treated spe- support the hypothesis that function words cially in human lexical acquisition. As a reviewer play a special role in word learning. points out, we present no evidence that children use function words in the way that our model does, 1 Introduction and we want to emphasise we make no such claim. Over the past two decades psychologists have in- While absolute accuracy is not directly relevant vestigated the role that function words might play to the main point of the paper, we note that the in human language acquisition. Their experiments models that learn generalisations about function suggest that function words play a special role in words perform unsupervised word segmentation the acquisition process: children learn function at 92.5% token f-score on the standard Bernstein- words before they learn the vast bulk of the asso- Ratner (1987) corpus, which improves the previ- ciated content words, and they use function words ous state-of-the-art by more than 4%. to help identify context words. As a reviewer points out, the changes we make The goal of this paper is to determine whether to our models to incorporate function words can computational models of human language acqui- be viewed as “building in” substantive informa- sition can provide support for the hypothesis that tion about possible human languages. The model 282 Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 282–292, Baltimore, Maryland, USA, June 23-25 2014. c 2014 Association for Computational Linguistics that achieves the best token f-score expects func- miners and prepositions are associated with tion words to appear at the left edge of phrases. nouns, auxiliary verbs and complementisers While this is true for languages such as English, are associated with main verbs) it is not true universally. By comparing the pos- 6. semantically, content words denote sets of terior probability of two models — one in which objects or events, while function words de- function words appear at the left edges of phrases, note more complex relationships over the en- and another in which function words appear at the tities denoted by content words right edges of phrases — we show that a learner 7. historically, the rate of innovation of function could use Bayesian posterior probabilities to deter- words is much lower than the rate of innova- mine that function words appear at the left edges tion of content words (i.e., function words are of phrases in English, even though they are not typically “closed class”, while content words told the locations of word boundaries or which are “open class”) words are function words. Properties 1–4 suggest that function words This paper is structured as follows. Section 2 might play a special role in language acquisition describes the specific word segmentation mod- because they are especially easy to identify, while els studied in this paper, and the way we ex- property 5 suggests that they might be useful for tended them to capture certain properties of func- identifying lexical categories. The models we tion words. The word segmentation experiments study here focus on properties 3 and 4, in that are presented in section 3, and section 4 discusses they are capable of learning specific sequences of how a learner could determine whether function monosyllabic words in peripheral (i.e., initial or words occur on the left-periphery or the right- final) positions of phrase-like units. periphery in the language they are learning. Sec- A number of psychological experiments have tion 5 concludes and describes possible future shown that infants are sensitive to the function work. The rest of this introduction provides back- words of their language within their first year of ground on function words, the Adaptor Grammar life (Shi et al., 2006; Halle´ et al., 2008; Shafer models we use to describe lexical acquisition and et al., 1998), often before they have experienced the Bayesian inference procedures we use to infer the “word learning spurt”. Crucially for our pur- these models. pose, infants of this age were shown to exploit frequent function words to segment neighboring 1.1 Psychological evidence for the role of content words (Shi and Lepage, 2008; Halle´ et function words in word learning al., 2008). In addition, 14 to 18-month-old Traditional descriptive linguistics distinguishes children were shown to exploit function words to function words, such as determiners and prepo- constrain lexical access to known words - for in- sitions, from content words, such as nouns and stance, they expect a noun after a determiner (Cau- verbs, corresponding roughly to the distinction be- vet et al., 2014; Kedar et al., 2006; Zangl and tween functional categories and lexical categories Fernald, 2007). In addition, it is plausible that of modern generative linguistics (Fromkin, 2001). function words play a crucial role in children’s Function words differ from content words in at acquisition of more complex syntactic phenom- least the following ways: ena (Christophe et al., 2008; Demuth and McCul- 1. there are usually far fewer function word lough, 2009), so it is interesting to investigate the types than content word types in a language roles they might play in computational models of 2. function word types typically have much language acquisition. higher token frequency than content word types 1.2 Adaptor grammars 3. function words are typically morphologically Adaptor grammars are a framework for Bayesian and phonologically simple (e.g., they are typ- inference of a certain class of hierarchical non- ically monosyllabic) parametric models (Johnson et al., 2007b). They 4. function words typically appear in peripheral define distributions over the trees specified by positions of phrases (e.g., prepositions typi- a context-free grammar, but unlike probabilistic cally appear at the beginning of prepositional context-free grammars, they “learn” distributions phrases) over the possible subtrees of a user-specified set of 5. each function word class is associated with “adapted” nonterminals. (Adaptor grammars are specific content word classes (e.g., deter- non-parametric, i.e., not characterisable by a finite 283 set of parameters, if the set of possible subtrees from Gi. The PCFG generates the distribution GS of the adapted nonterminals is infinite). Adaptor over the set of trees generated by the start sym- TS grammars are useful when the goal is to learn a bol S; the distribution over the strings it generates potentially unbounded set of entities that need to is obtained by marginalising over the trees. satisfy hierarchical constraints. As section 2 ex- In a Bayesian PCFG one puts Dirichlet priors plains in more detail, word segmentation is such Dir(α) on the rule probability vector θ, such that a case: words are composed of syllables and be- there is one Dirichlet parameter αA α for each long to phrases or collocations, and modelling this rule A α R. There are Markov Chain→ Monte → ∈ structure improves word segmentation accuracy. Carlo (MCMC) and Variational Bayes procedures Adaptor Grammars are formally defined in for estimating the posterior distribution over rule Johnson et al.
Recommended publications
  • An Analysis of the Content Words Used in a School Textbook, Team up English 3, Used for Grade 9 Students
    [Pijarnsarid et. al., Vol.5 (Iss.3): March, 2017] ISSN- 2350-0530(O), ISSN- 2394-3629(P) ICV (Index Copernicus Value) 2015: 71.21 IF: 4.321 (CosmosImpactFactor), 2.532 (I2OR) InfoBase Index IBI Factor 3.86 Social AN ANALYSIS OF THE CONTENT WORDS USED IN A SCHOOL TEXTBOOK, TEAM UP ENGLISH 3, USED FOR GRADE 9 STUDENTS Sukontip Pijarnsarid*1, Prommintra Kongkaew2 *1, 2English Department, Graduate School, Ubon Ratchathani Rajabhat University, Thailand DOI: https://doi.org/10.29121/granthaalayah.v5.i3.2017.1761 Abstract The purpose of this study were to study the content words used in a school textbook, Team Up in English 3, used for Grade 9 students and to study the frequency of content words used in a school textbook, Team Up in English 3, used for Grade 9 students. The study found that nouns is used with the highest frequency (79), followed by verb (58), adjective (46), and adverb (24).With the nouns analyzed, it was found that the Modifiers + N used with the highest frequency (92.40%), the compound nouns were ranked in second (7.59 %). Considering the verbs used in the text, it was found that transitive verbs were most commonly used (77.58%), followed by intransitive verbs (12.06%), linking verbs (10.34%). As regards the adjectives used in the text, there were 46 adjectives in total, 30 adjectives were used as attributive (65.21 %) and 16 adjectives were used as predicative (34.78%). As for the adverbs, it was found that adverbs of times were used with the highest frequency (37.5 % ), followed by the adverbs of purpose and degree (33.33%) , the adverbs of frequency (12.5 %) , the adverbs of place ( 8.33% ) and the adverbs of manner ( 8.33 % ).
    [Show full text]
  • ASYMMETRIES in the PROSODIC PHRASING of FUNCTION WORDS: ANOTHER LOOK at the SUFFIXING PREFERENCE Nikolaus P
    ASYMMETRIES IN THE PROSODIC PHRASING OF FUNCTION WORDS: ANOTHER LOOK AT THE SUFFIXING PREFERENCE Nikolaus P. Himmelmann Universität zu Köln It is a well-known fact that across the world’s languages there is a fairly strong asymmetry in the affixation of grammatical material, in that suffixes considerably outnumber prefixes in typo - logical databases. This article argues that prosody, specifically prosodic phrasing, plays an impor - tant part in bringing about this asymmetry. Prosodic word and phrase boundaries may occur after a clitic function word preceding its lexical host with sufficient frequency so as to impede the fu - sion required for affixhood. Conversely, prosodic boundaries rarely, if ever, occur between a lexi - cal host and a clitic function word following it. Hence, prosody does not impede the fusion process between lexical hosts and postposed function words, which therefore become affixes more easily. Evidence for the asymmetry in prosodic phrasing is provided from two sources: disfluencies, and ditropic cliticization, that is, the fact that grammatical pro clitics may be phonological en clit- ics (i.e. phrased with a preceding host), but grammatical enclitics are never phonological proclit- ics. Earlier explanations for the suffixing preference have neglected prosody almost completely and thus also missed the related asymmetry in ditropic cliticization. More importantly, the evi - dence from prosodic phrasing suggests a new venue for explaining the suffixing preference. The asymmetry in prosodic phrasing, which, according to the hypothesis proposed here, is a major fac - tor underlying the suffixing preference, has a natural basis in the mechanics of turn-taking as well as in the mechanics of speech production.* Keywords : affixes, clitics, language processing, turn-taking, grammaticization, explanation in ty - pology, Tagalog 1.
    [Show full text]
  • 6 the Major Parts of Speech
    6 The Major Parts of Speech KEY CONCEPTS Parts of Speech Major Parts of Speech Nouns Verbs Adjectives Adverbs Appendix: prototypes INTRODUCTION In every language we find groups of words that share grammatical charac- teristics. These groups are called “parts of speech,” and we examine them in this chapter and the next. Though many writers onlanguage refer to “the eight parts of speech” (e.g., Weaver 1996: 254), the actual number of parts of speech we need to recognize in a language is determined by how fine- grained our analysis of the language is—the more fine-grained, the greater the number of parts of speech that will be distinguished. In this book we distinguish nouns, verbs, adjectives, and adverbs (the major parts of speech), and pronouns, wh-words, articles, auxiliary verbs, prepositions, intensifiers, conjunctions, and particles (the minor parts of speech). Every literate person needs at least a minimal understanding of parts of speech in order to be able to use such commonplace items as diction- aries and thesauruses, which classify words according to their parts (and sub-parts) of speech. For example, the American Heritage Dictionary (4th edition, p. xxxi) distinguishes adjectives, adverbs, conjunctions, definite ar- ticles, indefinite articles, interjections, nouns, prepositions, pronouns, and verbs. It also distinguishes transitive, intransitive, and auxiliary verbs. Writ- ers and writing teachers need to know about parts of speech in order to be able to use and teach about style manuals and school grammars. Regardless of their discipline, teachers need this information to be able to help students expand the contexts in which they can effectively communicate.
    [Show full text]
  • The Impact of Function Words on the Processing and Acquisition of Syntax
    NORTHWESTERN UNIVERSITY The Impact of Function Words on the Processing and Acquisition of Syntax A DISSERTATION SUBMITTED TO THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS for the degree DOCTOR OF PHILOSOPHY Field of Linguistics By Jessica Peterson Hicks EVANSTON, ILLINOIS December 2006 2 © Copyright by Jessica Peterson Hicks 2006 All Rights Reserved 3 ABSTRACT The Impact of Function Words on the Processing and Acquisition of Syntax Jessica Peterson Hicks This dissertation investigates the role of function words in syntactic processing by studying lexical retrieval in adults and novel word categorization in infants. Christophe and colleagues (1997, in press) found that function words help listeners quickly recognize a word and infer its syntactic category. Here, we show that function words also help listeners make strong on-line predictions about syntactic categories, speeding lexical access. Moreover, we show that infants use this predictive nature of function words to segment and categorize novel words. Two experiments tested whether determiners and auxiliaries could cause category- specific slowdowns in an adult word-spotting task. Adults identified targets faster in grammatical contexts, suggesting that a functor helps the listener construct a syntactic parse that affects the speed of word identification; also, a large prosodic break facilitated target access more than a smaller break. A third experiment measured independent semantic ratings of the stimuli used in Experiments 1 and 2, confirming that the observed grammaticality effect mainly reflects syntactic, and not semantic, processing. Next, two preferential-listening experiments show that by 15 months, infants use function words to infer the category of novel words and to better recognize those words in continuous speech.
    [Show full text]
  • Syntactic Variation in English Quantified Noun Phrases with All, Whole, Both and Half
    Syntactic variation in English quantified noun phrases with all, whole, both and half Acta Wexionensia Nr 38/2004 Humaniora Syntactic variation in English quantified noun phrases with all, whole, both and half Maria Estling Vannestål Växjö University Press Abstract Estling Vannestål, Maria, 2004. Syntactic variation in English quantified noun phrases with all, whole, both and half, Acta Wexionensia nr 38/2004. ISSN: 1404-4307, ISBN: 91-7636-406-2. Written in English. The overall aim of the present study is to investigate syntactic variation in certain Present-day English noun phrase types including the quantifiers all, whole, both and half (e.g. a half hour vs. half an hour). More specific research questions concerns the overall frequency distribution of the variants, how they are distrib- uted across regions and media and what linguistic factors influence the choice of variant. The study is based on corpus material comprising three newspapers from 1995 (The Independent, The New York Times and The Sydney Morning Herald) and two spoken corpora (the dialogue component of the BNC and the Longman Spoken American Corpus). The book presents a number of previously not discussed issues with respect to all, whole, both and half. The study of distribution shows that one form often predominated greatly over the other(s) and that there were several cases of re- gional variation. A number of linguistic factors further seem to be involved for each of the variables analysed, such as the syntactic function of the noun phrase and the presence of certain elements in the NP or its near co-text.
    [Show full text]
  • Hungarian Copula Constructions in Dependency Syntax and Parsing
    Hungarian copula constructions in dependency syntax and parsing Katalin Ilona Simko´ Veronika Vincze University of Szeged University of Szeged Institute of Informatics Institute of Informatics Department of General Linguistics MTA-SZTE Hungary Research Group on Artificial Intelligence [email protected] Hungary [email protected] Abstract constructions? And if so, how can we deal with cases where the copula is not present in the sur- Copula constructions are problematic in face structure? the syntax of most languages. The paper In this paper, three different answers to these describes three different dependency syn- questions are discussed: the function head analy- tactic methods for handling copula con- sis, where function words, such as the copula, re- structions: function head, content head main the heads of the structures; the content head and complex label analysis. Furthermore, analysis, where the content words, in this case, the we also propose a POS-based approach to nominal part of the predicate, are the heads; and copula detection. We evaluate the impact the complex label analysis, where the copula re- of these approaches in computational pars- mains the head also, but the approach offers a dif- ing, in two parsing experiments for Hun- ferent solution to zero copulas. garian. First, we give a short description of Hungarian copula constructions. Second, the three depen- 1 Introduction dency syntactic frameworks are discussed in more Copula constructions show some special be- detail. Then, we describe two experiments aim- haviour in most human languages. In sentences ing to evaluate these frameworks in computational with copula constructions, the sentence’s predi- linguistics, specifically in dependency parsing for cate is not simply the main verb of the clause, Hungarian, similar to Nivre et al.
    [Show full text]
  • Several Parts of Speech Nouns
    What is a Part of Speech? A part of speech is a group of words that are used in a certain way. For example, "run," "jump," and "be" are all used to describe actions/states. Therefore they belong to the VERBS group. In other words, all words in the English language are divided into eight different categories. Each category has a different role/function in the sentence. The English parts of speech are: Nouns, pronouns, adjectives, verbs, adverbs, prepositions,conjunctions and interjections. Same Word – Several Parts of Speech In the English language many words are used in more than one way. This means that a word can function as several different parts of speech. For example, in the sentence "I would like a drink" the word "drink" is a noun. However, in the sentence "They drink too much" the word "drink" is a verb. So it all depends on the word's role in the sentence. Nouns A noun is a word that names a person, a place or a thing. Examples: Sarah, lady, cat, New York, Canada, room, school, football, reading. Example sentences: People like to go to the beach. Emma passed the test. My parents are traveling to Japan next month. The word "noun" comes from the Latin word nomen, which means "name," and nouns are indeed how we name people, places and things. Abstract Nouns An abstract noun is a noun that names an idea, not a physical thing. Examples: Hope, interest, love, peace, ability, success, knowledge, trouble. Concrete Nouns A concrete noun is a noun that names a physical thing.
    [Show full text]
  • 175 Function Words of Andio Language Viewed from Syntactical
    E-ISSN 2281-4612 Academic Journal of Interdisciplinary Studies Vol 2 No 2 ISSN 2281-3993 Published by MCSER-CEMAS-Sapienza University of Rome July 2013 Function Words of Andio Language Viewed from Syntactical Aspect Baso Andi-Pallawa Tadulako University, Indonesia - 2006 Email: [email protected] Doi:10.5901/ajis.2013.v2n2p175 Abstract This research describes the function words of Andio Language in terms of the function word types. For examples: (1) noun determiners, (2) qualifiers, (3) prepositions, (4) coordinators, (5) interrogators, (6) subordinating conjunction, (7) sentence linkers, (8) modal auxiliaries, (9) interjection, (10) aspect words, and (11) particles and (12) relative pronouns. The significance of this study is expected to produce the descriptive data of those types of function words, to furnish information for those who are interested in carrying out another study concerning with Andio language, to help the teaching of local content course, such as function words themselves of Andio language and to contribute to the Department of National Education, particularly in Central Sulawesi for the consideration as development of the local content course curriculum. The design of this study is descriptive qualitative. It is based on the process of investigation involving description and interpretation that can be assigned without manipulating any variables. The subjects of this study are the native speakers of Andio language at Lamala district, Luwuk Banggai regency in Central Celebes (Sulawesi). Keywords: function words, syntactic aspects, andio language, 1. Introduction Language is one of the most important and normal forms of human behavior. It always has a place in the academic world where its position has converted seriously: at one time the investigation of language is almost completely limited to specific languages, particularly those of their speakers are decreasing that make those languages undergo extinction.
    [Show full text]
  • Automatic Discovery of Adposition Typology
    Automatic Discovery of Adposition Typology Rishiraj Saha Roy Rahul Katare∗ IIT Kharagpur IIT Kharagpur Kharagpur, India – 721302. Kharagpur, India – 721302. [email protected] [email protected] Niloy Ganguly Monojit Choudhury IIT Kharagpur Microsoft Research India Kharagpur, India – 721302. Bangalore, India – 560001. [email protected] [email protected] Abstract Natural languages (NL) can be classified as prepositional or postpositional based on the order of the noun phrase and the adposition. Categorizing a language by its adposition typology helps in addressing several challenges in linguistics and natural language processing (NLP). Understand- ing the adposition typologies for less-studied languages by manual analysis of large text corpora can be quite expensive, yet automatic discovery of the same has received very little attention till date. This research presents a simple unsupervised technique to automatically predict the adpo- sition typology for a language. Most of the function words of a language are adpositions, and we show that function words can be effectively separated from content words by leveraging differ- ences in their distributional properties in a corpus. Using this principle, we show that languages can be classified as prepositional or postpositional based on the rank correlations derived from entropies of word co-occurrence distributions. Our claims are substantiated through experiments on 23 languages from ten diverse families, 19 of which are correctly classified by our technique. 1 Introduction Adpositions form a subcategory of function words that combine with noun phrases to denote their se- mantic or grammatical relationships with verbs, and sometimes other noun phrases. NLs can be neatly divided into a few basic typologies based on the order of the noun phrase and its adposition.
    [Show full text]
  • Morphology, Material for Incorporation in Curricula
    REPOR TRESUMES ED 019 258 24 TE 000 261 MORPHOLOGY, MATERIAL FORINCORPORATION IN CURRICULAOF GRADES 11 AND 12. NORTHERN ILLINOIS UNIV.,DE KALB 'REPORT NUMBER CRP-H-144-2 PUB DATE AUG 66 REPORT NUMBER BR -5- 1112-2 CONTRACT OEC-4-10-252 EDRS PRICE MF-$0.50 HC-$4.66 115P. DESCRIPTORS- *LANGUAGE INSTRUCTIONS*MORPHOLOGY (LANGUAGES), *CURRICULUM GUIDES, *ENGLISHINSTRUCTION, *FORM CLASSES (LANGUAGES), APPLIED LINGUISTICS,LANGUAGE GUIDES, GRAMMAR, PHONOLOGY, SYNTAX, GRACE119 .GRADE 12, SECONDARYEDUCATION, LANGUAGE, NORTHERN ILLINOISUNIVERSITY, DEKALB, PROJECT ENGLISH, THIS 'NORTHERN ILLINOISUNIVERSITY PROJECT ENGLISHUNIT IS PLANNED TO COMPLEMENT THEGRAMMAR WHICH 11TH- AND 12TH-GRADE STUDENTS ALREADYKNOW, AND TO ENRICH THEIR UNDERSTANDING OF THE ENGLISHLANGUAGE. THOUGH NOT PRIMARILY AN INTRODUCTION TO THE PARTSOF SPEECH, THE UNIT'ROVIDES SECTIONS ON NOUNS, VERBS ANDAUXILIARIES, ADJECTIVES, ADVERBS, CONJUNCTIONS, SUBORDINATORS,SENTENCE CONNECTORS, PRONOUNS, AND QUALIFIERS. THE UNITALSO CONTAINS (1) A DIAGNOSTIC TEST ON THE PARTSOF SPEECH,(2) A LESSON PLAN TO INTRODUCE MORPHEMICS,. (3) ANINTRODUCTION TO VOCABULARY STUDY, (4) A SECTION ON DERIVATIONALSUFFIXES,(5) A LESSON USING THE PARTS OF SPEECH INTEACHING THE ESSAY OF DEFINITION,(6) AN INTRODUCTION TO A12TH-GRADE UNIT ON THE PARTS OF SPEECH AND FORMALDEFINITION, AND (7)A TEST ON PHONOLOGY, MORPHOLOGY, AND SYNTAX.PARTS OR ALL OF THIS UNIT CAN BE USED (1) PRECEDING OR FOLLOWING A UNIT ON SYNTAX, (2) FOLLOWING A UNIT ON PHONOLOGY, (3) IN AN HONORS SENIOR ENGLISH CLASS, OR (4) TO ENRICHA PROGRAM FOR SUPERIOR STUDENTS. (MM) It.,1 110 e 'PO 13B MORPHOLOGY Material for incorporation in curricula of grades 11 and12 Caution:Those materials are being used experimentally by Project English participants, who are continuing to develop them.
    [Show full text]
  • Part-Of-Speech Tagging Dionysius Thrax of Alexandria (C
    Speech and Language Processing. Daniel Jurafsky & James H. Martin. Copyright c 2019. All rights reserved. Draft of October 2, 2019. CHAPTER 8 Part-of-Speech Tagging Dionysius Thrax of Alexandria (c. 100 B.C.), or perhaps someone else (it was a long time ago), wrote a grammatical sketch of Greek (a “techne¯”) that summarized the linguistic knowledge of his day. This work is the source of an astonishing proportion of modern linguistic vocabulary, including words like syntax, diphthong, clitic, and parts of speech analogy. Also included are a description of eight parts of speech: noun, verb, pronoun, preposition, adverb, conjunction, participle, and article. Although earlier scholars (including Aristotle as well as the Stoics) had their own lists of parts of speech, it was Thrax’s set of eight that became the basis for practically all subsequent part-of-speech descriptions of most European languages for the next 2000 years. Schoolhouse Rock was a series of popular animated educational television clips from the 1970s. Its Grammar Rock sequence included songs about exactly 8 parts of speech, including the late great Bob Dorough’s Conjunction Junction: Conjunction Junction, what’s your function? Hooking up words and phrases and clauses... Although the list of 8 was slightly modified from Thrax’s original, the astonishing durability of the parts of speech through two millennia is an indicator of both the importance and the transparency of their role in human language.1 POS Parts of speech (also known as POS, word classes, or syntactic categories) are useful because they reveal a lot about a word and its neighbors.
    [Show full text]
  • Phrasal Verbs
    Inglés IV (B-2008) Prof. Argenis A. Zapata Universidad de Los Andes Facultad de Humanidades y Educación Escuela de Idiomas Modernos Phrasal Verbs Phrasal verbs, also called two-word and three-word verbs, are compound verbs that consist of a verb followed by an adverb particle, which may also function as a preposition. Phrasal verbs function as semantic units; that is to say, they have a meaning as a whole. Often their meaning cannot be inferred from the sum of the meanings of the individual words. For this reason, the meaning of phrasal verbs must be memorized as a whole. For example, run into = meet (someone) by accident, talk over = discuss (something), look up = seek (a word) in a reference book, turn on = start the operation of (an appliance), turn off = stop the operation of (an appliance), wait on = serve (someone at a restaurant), look over = examine (a test), look into = investigate. Likewise, phrasal verbs are grammatical units that fulfill normal English verb functions in sentences. They may be transitive or intransitive verbs; i.e., they may or may not be followed by noun phrases or object pronouns (direct objects). E.g., I wanted to call up the department store, but I didn’t have its number. He got off at the corner. I haven’t seen my dog for a while; I’m looking for him. If you don’t know the meaning of a word, look it up in a dictionary. Phrasal verbs must be differentiated from normal verb + preposition sequences (also referred to as verb + prepositional phrases).
    [Show full text]