Linguistic Preprocessing for Distributional Analysis : Evidence from French Emmanuel Cartier

Total Page:16

File Type:pdf, Size:1020Kb

Linguistic Preprocessing for Distributional Analysis : Evidence from French Emmanuel Cartier Linguistic Preprocessing for Distributional Analysis : Evidence from French Emmanuel Cartier To cite this version: Emmanuel Cartier. Linguistic Preprocessing for Distributional Analysis : Evidence from French. Corpus Linguistics 2015, Lancaster University, Jul 2015, Lancaster, United Kingdom. hal-01443193 HAL Id: hal-01443193 https://hal.archives-ouvertes.fr/hal-01443193 Submitted on 22 Jan 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Linguistic Preprocessing for experiments/tasks with varying values for each parameter, so as to assess the most efficient ones. Distributional Analysis Efficiency : They conclude that : dependency-based does not Evidence from French trigger any improvement over raw-text n-gram calculus; as for feature granularity, that stemming yields the better results; as for stopwords or high- Emmanuel Cartier frequency words removal, it does yield better results, Université Paris 13 Sorbonne Paris Cité but only if no raw frequency weighting is applied to the results; this is in line with the conclusion of LIPN – RCLN – UMR 7030 (Bulinaria and Levy, 2012). [email protected] Nevertheless, these conclusions should be refined and completed : 1/ As feature granularity is concerned, the authors do 1 Introduction not take into account a combination of features from different levels; (Béchet et al., 2012), for example, have shown that combining features from three Distributional Hypothesis and Cognitive levels (form, lemma, pos-tag) can result in better Foundations pattern recognition for specific linguistic tasks; such For about fifteen years, the statistical paradigm, from a combination is also in line with the Cognitive the distributionalism hypothesis (Harris, 1954) and Semantics and the Construction Grammar corpus linguistics (Firth, 1957), has prevailed in the hypothesis, that linguistic signs emerge as NLP field, with a lot of convincing results : constructions combining schemes, lemmas and multiword expression, part-of-speech, semantic specific forms; relation identification, and even probabilistic models 2/ The experiments on dependency need additional of language. These studies have identified interesting experiments, as several works (for example Pado linguistic phenomena, such as collocations, and Lapata, 2007) made a contradictory conclusion. "collostructions" (Stefanowitsch, 2003), « word 3/ Stopwords or high-frequency words removal sketches » (Kilgariff et al., 2004). results in better results if no frequency weighting is Cognitive Semantics (Langacker, 1987, 1991; applied; but the authors apply – as quasi all work in Geeraerts et al. 1994 ; Schmid, 2007, 2013), have the field -, a brute-force removal either based on also introduced novel concepts, most notably that of “gold standard” stopword lists, or on a arbitrary « entrenchment », which enables to ground the count to cut off results; this technique should be social lexicalization of linguistic signs and to refined to remove only the noisy words or n-grams correlate it with repetition in corpus. and should be linguistically motivated. Finally, Construction Grammars (Fillmore et al., 1988 ; Goldberg, 1995, 2003 ; Croft, 2001, 2004, Linguistic Motivation for Linguistic 2007) have proposed linguistic models which reject Preprocessing the distinction lexicon (list of “words”) - grammar The hypothesis supported in this paper is that, if (expliciting the combination of words) : all linguistic repetition of sequences is the best way to access signs are constructions, from morphemes to usage and to induce linguistic properties, language syntaxical schemes, leading to the notion of users do not only rely on the sequentiality of “constructicon”, as a goal for linguistic description. language, but also on non-sequential knowledge thus untractable from the actual distribution of words. Computational Models of the Distributional This knowledge is linked to the three classical Hypothesis linguistical units : lexical units, phrases and As Computational Linguistics is concerned, the predicate structures, each being a combination of the Vector Space Model (VSM) has prevailed to preceding with language-specific rules for their implement the distributional hypothesis, giving rise construction. Probabilistic models of language have to continuous sophistication and several state-of-the- mainly focused until now on the lexical units level, arts (Turney and Pantel, 2010; Lenci and al., 2010; but to leverage language, probabilistic research must Kiela and Clark, 2013; Clark, 2015). (Kiela and also model and preprocess phrases and predicate Clarke, 2014) state that the following parameters are structures. implied in any VSM implementation: vector size, The present paper will try to ground this window size, window-based or dependency-based hypothesis through an experiment, aimed at context, feature granularity, similarity metric, retrieving lexico-semantic relations in French, where weighting scheme, stopwords and high frequency we preprocess the corpus in three ways : cut-off. Three of them are directly linked to • morphosyntactic analysis linguistic preprocessing : window-based or • peripheral lexical units removal dependency-based context, the second requiring a dependency analysis of the corpus; feature • phrases identification. granularity, ie, the fact of taking into account either the raw corpus, or a lemmatized or pos-tagged one As we will see, these steps enables to access more for n-gram calculus; stopwords and high frequency easily the predicate structures that the experiment cut-off , ie removal of high-frequency words or “tool aims at revealing, while using a VSM model on the words”. (Kiela and Clarke, 2014) conducted six resulting preprocessed corpus. 2 Evidence from French : Semantic 2. Unify nominal phrases, as they are the target Relations and Definitions for hypernym relations and their sparsity complicate retrieval of patterns. Definition model Here we assume that a definitory statement is a Take the following source definition : statement asserting the essential properties of a en/P cuisine/NC ,/PONCT un/DET DEFINIENDUM être/V lexical unit. It is composed of the definiendum un/DET pièce/NC de/P pâte/NC aplatir/VPP ,/PONCT (DFM), i.e. the lexical unit to be defined; the g é n é r a l e m e n t / A D V a u / P + D r o u l e a u / N C à / P pâtisserie/NC ./PONCT ((cooking) an undercrust is a piece definiens (DFS), i.e. the phrasal expression denoting of dough that has been flattened, usually with a rolling pin.) the essential properties of the DFM lexical unit; the definition relator (DEFREL), i.e. the linguistic items It will be reduced to : denoting the semantic relation between the two un/DET DEFINIENDUM être/V un/DET pièce/NC de/P previous elements. pâte/NC aplatir/VPP ,/PONCT au/P+D rouleau/NC à/P The traditional model of definition decomposes pâtisserie/NC ou/CC un/DET laminoir/NC ./PONCT the DFS into two main parts : HYPERNYM + And we extract the domain restriction : en/P PROPERTIES. cuisine/NC. Definitory statement can also comprise other information : enunciation components (according to Step 1 and 2 : Adverbials and subordinate clauses Sisley, a G-component is a ... ); domain restrictions The first linguistical sequences removed from the (in Astrophysics, a XXX is a YYY). source sentence are adverbials and specific clauses. But we would like to remove only clauses dependent Corpus on the main predicate, not those dependent on one of We use three corpora and retain only the nominal its core components. For example, we remove the entries in each : incidential clause in : DEFINIENDUM (/PONCT parfois/ADV Apaiang/NPP Trésor de la Langue Française (TLF) : 61 234 ,/PONCT même/ADJ prononciation/NC )/PONCT être/V nominal lexical units, and 90 348 definitions; un/DET atoll/NC de/P le/DET république/NC du/P+D French Wiktionary (FRWIK) : 140 784 nouns, Kiribati/NPP ./PONCT for a total of 187 041 definitions. But relative clauses dependent on one of the Wikipedia (WIKP) : 610 013 glosses (ie first definiens component should be first extracted: sentence of each article) from the French Wikipedia, DEFINIENDUM être/V du/P+ enzymes/NC qui/PROREL contrôler/V le/DET structure/NC topologique/ADJ de/P using a methodology next to (Navigli and al, 2008) l’ADN/NC ... The first two are dictionaries (TLF, FRWIK), the To achieve this goal, we use distributional analysis last one is an encyclopedia (WIKP). In the first case, on these clauses with (SDMC, Béchet et al., 2012) definition obeys to lexicographic standards, whereas and human tuning to determine the most frequent definitions are more “natural” in WIKP. patterns and trigger words of incidential clauses at specific locations in the sentence : beginning of the System Architecture definition, between the definiendum and a definition The system is composed of four steps : relator. • Morpho-syntactic analysis of the corpus Some incidential clauses convey a semantic relation, • Semantic Relation Trigger words Markup for example the synonymy relation : • Sentence Simplification : this step aims at
Recommended publications
  • Will Distributional Semantics Ever Become Semantic?
    Will Distributional Semantics Ever Become Semantic? Alessandro Lenci University of Pisa 7th International Global WordNet Conference January 28, 2014 - Tartu Alessandro Lenci GWC 2014 @ Tartu - January 28, 2014 1 Distributional semantics Theoretical roots What is Distributional Semantics? Distributional semantics is predicated on the assumption that linguistic units with certain semantic similarities also share certain similarities in the relevant environments. If therefore relevant environments can be previously specified, it may be possible to group automatically all those linguistic units which occur in similarly definable environments, and it is assumed that these automatically produced groupings will be of semantic interest. Paul Garvin, (1962), “Computer participation in linguistic research”, Language, 38(4): 385-389 Alessandro Lenci GWC 2014 @ Tartu - January 28, 2014 2 Distributional semantics Theoretical roots What is Distributional Semantics? Distributional semantics is predicated on the assumption that linguistic units with certain semantic similarities also share certain similarities in the relevant environments. If therefore relevant environments can be previously specified, it may be possible to group automatically all those linguistic units which occur in similarly definable environments, and it is assumed that these automatically produced groupings will be of semantic interest. Paul Garvin, (1962), “Computer participation in linguistic research”, Language, 38(4): 385-389 Alessandro Lenci GWC 2014 @ Tartu - January 28, 2014 3 Distributional semantics Theoretical roots The Pioneers of Distributional Semantics Distributionalism in linguistics Zellig S. Harris To be relevant [linguistic] elements must be set up on a distributional basis: x and y are included in the same element A if the distribution of x relative to the other elements B, C, etc.
    [Show full text]
  • The Twilight of Structuralism in Linguistics?*
    Ewa Jędrzejko Univesity of Silesia in Katowice nr 3, 2016 The Twilight of Structuralism in Linguistics?* Key words: structuralism, linguistics, methodological principles, evolution of studies on language I would like to start with some general remarks in order to avoid misunderstandings due to the title of this paper. “The question that our title / has cast in deathless bronze”1 is just a mere trigger, and a signal of reference to animated discussions held in many disci- plines of modern liberal arts: philosophy, literary studies, social sciences, aesthetics, cultural studies (just to mention areas closest to linguistics), concerning their current methodologi- cal state and future perspectives regarding theoretical research. What I mean by that is, among others, the dispute around deconstructionism being in opposition to neo-positivistic stance, as well as some views on the current situation in art expressed in the volume called Zmierzch estetyki – rzekomy czy autentyczny? [The twilight of aesthetics – alleged or true?] (Morawski, 1987),2 or problems signalled by the literary scholars in the tellingly entitled Po strukturalizmie [After structuralism] (Nycz, 1992).3 The statements included in this article are for me one out of many voices in the discus- sion on the changes observed in linguistics in the context of questions about the future of structuralism. Since in a way the situation in contemporary linguistics (including Polish linguistics) is similar. The shift in the scope of interest of linguistics and the way it is presented may
    [Show full text]
  • AVAILABLE from 'Bookstore, ILC, 7500 West Camp Wisdom Rd
    DOCUMENT RESUME ED 401 726 FL 024 212 AUTHOR Payne, David, Ed. TITLE Notes on Linguistics, 1996. INSTITUTION Summer Inst. of Linguistics, Dallas, Tex. REPORT NO ISSN-0736-0673 PUB DATE 96 NOTE 239p. AVAILABLE FROM 'Bookstore, ILC, 7500 West Camp Wisdom Rd., Dallas, TX 75236 (one year subscription: SIL members, $15.96 in the U.S., $19.16 foreign; non-SIL members, $19.95 in the U.S.; $23.95 foreign; prices include postage and handling). PUB TYPE Collected Works Serials (022) JOURNAL CIT Notes on Linguistics; n72-75 1996 EDRS PRICE MF01/PC10 Plus Postage. DESCRIPTORS Book Reviews; Computer Software; Conferences; Dialects; Doctoral Dissertations; Group Activities; *Language Patterns; *Language Research; *Linguistic Theory; Native Speakers; Phonology; Professional Associations; Publications; Research Methodology; *Syntax; Textbooks; Tone Languages; Workshops IDENTIFIERS 'Binding Theory ABSTRACT The four 1996 issues of this journal contain the following articles: "Sketch of Autosegmental Tonology" (H. Andrew Black); "System Relationships in Assessing Dialect Intelligibility" (Margaret Milliken, Stuart Milliken); "A Step-by-Step Introduction to Government and Binding Theory of Syntax" (Cheryl A. Black); "Participatory Research in Linguistics" (Constance Kutsch Lojenga); "Introduction to Government and Binding Theory II" (Cheryl A. Black); What To Do with CECIL?" (Joan Baart); "WINCECIL" (Jerold A. Edmondson); "Introduction to Government and Binding Theory III" (Cheryl A. Black); and "Mainland Southeast Asia: A Unique Linguistic Area" (Brian Migliazza). Each issue also contains notes from the SIL Linguistics Department coordinator, a number of reports on linguistics association conferences around the world, book and materials reviews, and professional announcements. (MSE) *********************************************************************** Reproductions supplied by EDRS are the best that can be made from the original document.
    [Show full text]
  • Glossary for Syntax in Three Dimensions (2015) © Carola Trips
    Glossary for Syntax in three Dimensions (2015) © Carola Trips A - B - C - D - E - F - G - H - I - J - K - L - M - N - O - P - Q - R - S - T - U - V - W - X - Y – Z A abstract case Abstract Case (or "Deep Case") is an abstract notion of case distinct from the morphologically marked case (cf. "case") used in Case theory (subtheories of Generative Grammar). Abstract Case is present even in languages which lack morphological case on noun phrases; it is usually assumed to be congruent with morphological case when such features are present. For example in German dative case is marked morphologically as in Ich helfe ihm ('I help him') whereas no such case marking is found in the English sentence (but Case is still there). academy In previous centuries the need for an academy dealing with linguistic matters was discussed and debated in England (on the model of the French académie française). Due to the increase in the production of grammars in the 17th and 18th centuries and the rising number of grammarians, calls for an academy to codify the English language by publishing an authoritative grammar and dictionary dwindled. acceptance Step in the process of standardization: a selected variety of a particular language must gain acceptance by a group of speakers/users who are capable of influencing other speakers. actants In valency grammar, a functional unit determined by the valency of the verb. Actants are required by the valence of the verb and are the equivalents of arguments (in generative grammar). © Carola Trips 1 active voice Term used in grammatical analysis of voice, referring to a sentence, clause or verb form where from a semantic point of view the grammatical subject is typically the actor in relation to the verb.
    [Show full text]
  • TABLE of CONTENTS Preface 5 List of Abbreviations 10 L Introduction 13 2. Linguistic Research in Ancient Greece 15 3. the Indian
    TABLE OF CONTENTS Preface 5 List of Abbreviations 10 LINGUISTIC RESEARCH BEFORE THE NINETEENTH CENTURY L Introduction 13 2. Linguistic Research in Ancient Greece 15 3. The Indian Grammatical School 22 4. From the Days of the Roman Empire to the end of the Renaissance 26 5. From the Renaissance to the end of the Eighteenth Century 31 LINGUISTIC RESEARCH IN THE NINETEENTH CENTURY 6. Introduction 37 7. The Epoch of the First Comparativists 40 8. The Biological Naturalism of August Schleicher. ... 44 9. "Humboldtism" in Linguistics (the "Weltanschauung" theory) 48 10. Psychologism in Linguistics 52 11. The Neo-grammarians 58 12. Hugo Schuchardt, a Representative of the 'Tndependents" 65 8 TABLE OF CONTENTS LINGUISTIC RESEARCH IN THE TWENTIETH CENTURY 13. Introduction 69 The Basic Characteristics of Twentieth Century Scholar- ship 69 The Trend of Development in Linguistics 72 14. Non-Structural Linguistics 78 Linguistic Geography 78 The Foundation of Methods 78 Modern Dialectology 82 The French Linguistic School 84 The Psychophysiological, Psychological and Sociolog- ical Investigation of Language 84 Stylistic Research 87 Aesthetic Idealism in Linguistics 89 Introduction 89 Vossler's School 90 Neolinguistics 93 The Progressive Slavist Schools 97 The Kazan School 97 The Fortunatov (Moscow) School 100 The Linguistic Views of Belic 101 Marrism 102 Experimental Phonetics 107 15. Structural Linguistics . 113 Basic Tendencies of Development 113 Ferdinand de Saussure 122 The Geneva School 129 The Phonological Epoch in Linguistics 132 The Forerunners 132 The Phonological Principles of Trubetzkoy 134 The Prague Linguistic Circle 141 The Binarism of Roman Jakobson 144 TABLE OF CONTENTS 9 The Structural Interpretation of Sound Changes.
    [Show full text]
  • The Semantics of Poetry: a Distributional Reading
    The semantics of poetry: a distributional reading Aur´elie Herbelot University of Cambridge, Computer Laboratory J.J. Thomson Avenue, Cambridge CB1 8AZ United Kingdom [email protected] February 2, 2014 1 Abstract Poetry is rarely a focus of linguistic investigation. This is far from surprising, as poetic language, especially in modern and contemporary literature, seems to defy the general rules of syntax and semantics. This paper assumes, however, that linguistic theories should ideally be able to account for creative uses of language, down to their most difficult incarnations. It proposes that at the semantic level, what distinguishes poetry from other uses of language may be its ability to trace conceptual patterns which do not belong to everyday discourse but are latent in our shared language structure. Distributional semantics provides a theoretical and experimental basis for this exploration. First, the notion of a specific ‘semantics of poetry’ is discussed, with some help from literary criticism and philosophy. Then, distributionalism is introduced as a theory supporting the notion that the meaning of poetry comes from the meaning of ordinary language. In the second part of the paper, experimental results are provided showing that a) distributional representa- tions can model the link between ordinary and poetic language, b) a distributional model can experimentally distinguish between poetic and randomised textual out- put, regardless of the complexity of the poetry involved, c) there is a stable, but not immediately transparent, layer of meaning in poetry, which can be captured distributionally, across different levels of poetic complexity. 2 1 Introduction Poetry is not a genre commonly discussed in the linguistics literature.
    [Show full text]
  • Peter Matthews, Grammatical Theory in the United
    Linguistic Society of America Review: [untitled] Author(s): D. Terence Langendoen Reviewed work(s): Grammatical Theory in the United States from Bloomfield to Chomsky by Peter H. Matthews Source: Language, Vol. 71, No. 3 (Sep., 1995), pp. 595-600 Published by: Linguistic Society of America Stable URL: http://www.jstor.org/stable/416230 Accessed: 12/05/2009 12:08 Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=lsa. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit organization founded in 1995 to build trusted digital archives for scholarship. We work with the scholarly community to preserve their work and the materials they rely upon, and to build a common research platform that promotes the discovery and use of these resources. For more information about JSTOR, please contact [email protected]. Linguistic Society of America is collaborating with JSTOR to digitize, preserve and extend access to Language.
    [Show full text]
  • English Verbs in Syntactic Structures
    Mark Arono English verbs in Syntactic Structures Beauty is truth, truth beauty—that is all Ye know on earth, and all ye need to know. John Keats, Ode on a Grecian Urn (1820) 1 Introduction The formally inclined take comfort in the last two lines of Keats’s Ode on a Grecian Urn. We judge an analysis by its beauty, or elegance, or simplicity. In linguistics, there is no clearer demonstration of Keats’s maxim than the analysis of English verbs in Syntactic Structures (henceforth SS). I am not alone in my belief that the beauty of this analysis played a large part in leading the field to accept the truth of Chomsky’s claim that the description of human language calls for the use of transformations. The system of English verbs provides one of the core pieces of evidence in SS for the value of transformations in grammatical theory and description. In this article, I will review Chomsky’s analysis of English verbs and use it to partially re- construct the (largely implicit) view of morphology that lies behind it—most cen- trally, the role that the morpheme played in SS and just what the term morpheme meant in that work. I will explore the roots of this view in structuralist linguistics, especially in the morphological theory of Chomsky’s mentor, Zellig Harris. Chom- sky’s analysis of English verbs and their morphology might have been possible within another framework, but a deeper understanding of the assumptions that undergird it makes the analysis even more beautiful. In his preface, Chomsky described SS as a comparison of “three models for linguistic structure” (p.
    [Show full text]
  • FRENCH DISCOURSE ANALYSIS Glyn Williams
    FRENCH DISCOURSE ANALYSIS The method of post-structuralism Glyn Williams London and New York I ELLIW HAF First published 1999 by Routledge Fy nghariad a' m cymar. 11 New Fetter Lane, London EC4P 4EE Simultaneously published in the USA and Canada by Routledge 29 West 35th Street, New York, NY 10001 © 1999 Glyn Williams Typeset in Bembo by Keystroke, Jacaranda Lodge, Wolverhampton Printed and bound in Great Britain by Mackays of Chatham plc, Chatham, Kent All rights reserved. No part of this book may be reprinted or reproduced or utilised in any fonn or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library Library cifCongress Cataloguing in Publication Data Williams, Glyn French Discourse Analysis: The method of post-structuralism/Glyn Williarns. Includes bibliographical references. 1. French language--Discourse analysis. 2. Poststructuralism. I. Title. PC2434.WSS 1998 440.1'41-dc21 98-13943 ISBN 0-415-18940-3 CONTENTS Acknowledgements xi Introduction 1 1 Modernism and the philosophy of language 11 Modernism 11 The Enlightenment, reason and philosophies rf language 12 Language 21 Discourse 2 6 Conclusion 3 2 2 Structuralism 33 Sign and signifier 35 Langue and parole 40 Influences 44 Uvi-Strauss 48 Lacan 53 Conclusion 61 3 Post-structuralism 63 Bachelard and the philosophy rf science 64 Canguilhem on science as discourse 65 Althusser's anti-humanism 67 Foucault and discourse 7 6 Subject, object, concep ts and enonciative modalities 85 The production rf discourse 89 Power and resistance 94 Conclusion 98 Vll CONTENTS 4 Materialism and Discourse 100 Interpretation 2 80 Introduction 100 Conclusion 284 Th e Althusserian influence 103 normativity and the ontology of being 285 The methodology cifAAD69 112 9 Post-structuralism.
    [Show full text]
  • Distributionalism: Document Classification, Word Sense
    Gerold Schneider, SS 2005: Distributionalism 1 Morphologie Computerlinguistik, und Lexikologie Institut f¨ur Informatik Distributionalism: Document Classification, Word Sense Disambiguation, Word Similarity Gerold Schneider Universit¨at Z¨urich gschneid@ifi.unizh.ch June 30, 2005 Gerold Schneider, SS 2005: Distributionalism 2 Contents 1. The Distributional Hypothesis 2. Document Classification – TFIDF 3. WSD – Yarowsky 92 4. Document Classification with WSD – Schutze 95 5. Word Similarity – Weeds et al. 05 Gerold Schneider, SS 2005: Distributionalism 3 1 The Distributional Hypothesis The Strong or Weak Contextual Hypothesis of (Miller and Charles, 1991) says that the similarity of the contextual representation of two words determines (fully or partially) the semantic similarity of those words. “Miller and Charles (1991) advanced a contextual approach to semantic similarity that builds upon Leibnizs definition of synonymy in terms of the interchangeability of words in linguistic contexts. A main idea is that the semantic similarity of two words is a critical function of their interchangeability, without a loss of plausibility, in natural linguistic contexts. The contextual approach borrows from discussions of synonymy in terms of interchangeability.” Gerold Schneider, SS 2005: Distributionalism 4 2 Document Classification – TFIDF Classical IR models use a beg-of-words TFIDF model Idea: Similar documents have similar words. Especially if they are rare words. • Edit distance=Term frequency (TF) distance: How many tokens are dissimilar between two
    [Show full text]
  • Structural Linguistics
    Structural linguistics Dr LE BI Le Patrice, OCT Canada-Ontario Certified Teacher of English and French UNIVERSITE METHODISTE DE COTE D’IVOIRE (UMECI) COURSE DESCRIPTION • This course gives an overview of the general theoretical framework of Structural Linguistics. It mainly explores the four major Schools of Structural Linguistics (the School of Geneva, the Prague School of Linguistics, Glossematics and Distributionalism) by featuring their prominent figures and laying an emphasis on their methods of investigation. The course also highlights the contribution of the four schools to the development of language science and the understanding of human language. The course ends by exposing students to other theories or schools which are sometimes regarded as being related to structuralism. Overall Expectations of the Course • BY THE END OF THIS COURSE, STUDENTS ARE EXPECTED TO: • Know the major schools of structural linguistics, their prominent figures and their methods of investigation; • Understand the relevance of structural linguistics in General Linguistics and its contribution to the advent of Modern Linguistics. Specific Expectations • BY THE END OF THIS COURSE, STUDENTS WILL: • Know and understand the basic concepts developed by each school and their relevance to the analysis of human language; • Figure out how the concept of structuralism extends to theories other than the four major schools of structural linguistics; Learning goals •BY THE END OF THIS COURSE, STUDENTS SHOULD BE ABLE TO: •Conduct a linguistic analysis grounded in the framework of the theories studied. bibliography • Campel, Lyle (2013) Historical Linguistics: An Introduction. Edinburgh, Edinburg University Press. 3rd Edition. • De Saussure, Ferdinand (1916) Course in General Linguistics. Edited by Charles Bally and Albert Sechehaye.
    [Show full text]
  • Language Use in the Public Sphere Li 170 the Publicsphere Language Usein
    li li170 li170 Linguistic Insights 170 Studies in Language and Communication This book comprises a range of general discus- sions on tradition and innovation in the method- ology used in discourse studies (Pragmatics, Discourse Analysis, Argumentation Theory, Rhetoric, Philosophy) and a number of empirical applications of such methodologies in the anal- ysis of actual instances of language use in the public sphere – in particular, discourses arising in the context of the debate on the presence of religious symbols in public places. Inés Olza is Post-Doctoral Research Fellow with GRADUN (Discourse Analysis Group, University Language Use in the Public Sphere • of Navarra) in ICS (Institute for Culture and ) Society). She also works as a Lecturer in General (eds and Spanish Linguistics at the Spanish National University for Distance Education (UNED, associ- ated center in Pamplona). Her research focuses on the semantic-pragmatic analysis of phraseol- ogy and metalanguage (of Spanish and other Inés Olza, Óscar Loureda & languages), on multimodality, and on the argu- Manuel Casado-Velarde (eds) mentative use of figurative language in public discourse. Óscar Loureda is Chair Professor of Spanish Linguistics at the University of Heidelberg (Ger- Language Use in many), and since 2011 he has been the director of the University’s Iberoamerika-Zentrum. He is also a member of the Research Group “Dis- the Public Sphere course Particles and Cognition” in the Heidel- berg University Language and Cognition Lab (HULC LAB). His research interests center on Methodological Perspectives and Semantics and Lexicography of Spanish, and on Empirical Applications general theory of discourse (Text Linguistics, Text Grammar, Discourse Analysis).
    [Show full text]