Noise Reduction and Normalization of Microblogging Messages

Total Pages: 16

File Type: PDF, Size: 1020 KB

FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO
PROGRAMA DOUTORAL EM ENGENHARIA INFORMÁTICA

Noise Reduction and Normalization of Microblogging Messages

Author: Gustavo Alexandre Teixeira Laboreiro
Supervisor: Eugénio da Costa Oliveira
Co-Supervisor: Luís Morais Sarmento

February 5, 2018

Dedicated to my loving family

Acknowledgements

First of all I would like to thank my family in loving recognition of their tireless support and understanding throughout this long endeavor. Such a work would not have been possible without their help. Further thanks are also warmly extended to my supervisor and co-supervisor, who provided knowledge, help, guidance and encouragement through all these years. Special appreciation is also due to the teachers who provided valuable insight and opinions, which greatly enriched this work and to whom the author feels indebted, particularly Eduarda Mendes Rodrigues and Carlos Pinto Soares. I would also like to express my recognition for the help provided by the members of the Scientific Committee.

For many days several colleagues accompanied me in this academic journey (but also a life journey), selflessly sharing their time and friendship. To all of them I present an expression of gratitude. The friendly department staff is not forgotten, especially Sandra Reis, who always went beyond the line of duty to provide assistance in the hours of need.

The author would also like to acknowledge the financial support provided through grant SAPO/BI/UP/Sylvester from SAPO/Portugal Telecom and by Fundação para a Ciência e a Tecnologia (FCT) through the REACTION project: Retrieval, Extraction and Aggregation Computing Technology for Integrating and Organizing News (UTA-Est/MAI/0006/2009).

Abstract

User-Generated Content (UGC) was born out of the so-called Web 2.0 — a rethinking of the World Wide Web around content produced by the users.
Social media posts, comments, and microblogs are examples of textual UGC that share a number of characteristics: they are typically short, sporadic, to the point and noisy. In this thesis we propose new pre-processing and normalization tools, specifically designed to overcome some limitations and difficulties common in the analysis of UGC that prevent a better grasp on the full potential of this rich ecosystem. Machine learning was employed in most of these solutions, which improve (sometimes substantially) over the baseline methods that we used for comparison. Performance was measured using standard metrics, and errors were examined for a better evaluation of limitations.

We achieved 0.96 and 0.88 F1 values on our two tokenization tasks (0.96 and 0.78 accuracy), employing a language-agnostic solution. We were able to assign messages to their respective author from a pool of 3 candidates with a 0.63 F1 score, based only on their writing style, using no linguistic knowledge. Our attempt at identifying Twitter bots achieved a median accuracy of 0.97, in part due to the good performance of our stylistic features. Language variant identification is much more difficult than simply recognizing the language used, since it is a much finer problem; however, we were able to achieve 0.95 accuracy based on a sample of 100 messages per user. Finally, we address the problem of reverting obfuscation, which is used to disguise words (usually swearing) in text. Our experiments measured a 0.93 median weighted F1 score in this task. We have also made available the Portuguese annotated corpus that was created and examined for this task.

Sumário

Os Conteúdos Gerados por Utilizadores (CGU) nasceram da chamada Web 2.0 — uma revisão da World Wide Web em torno de conteúdos produzidos pelos utilizadores. Exemplos dos CGU são as mensagens colocadas nos media sociais, os comentários em websites e os microblogs.
Todos eles partilham algumas características típicas como a sua brevidade, serem esporádicos, diretos e ruidosos.

Nesta tese propomos novas ferramentas de pré-processamento e normalização que foram criadas explicitamente para superarem algumas limitações e dificuldades típicas na análise de CGU e que impedem uma melhor compreensão do verdadeiro potencial deste ecossistema rico. Na maioria dos casos foi empregue aprendizagem automática, o que permitiu superar (por vezes de forma substancial) os sistemas usados para comparação. O desempenho foi calculado recorrendo a métricas padronizadas e os erros foram examinados por forma a avaliar melhor as limitações dos sistemas propostos.

Alcançámos valores F1 de 0,96 e 0,88 nas nossas duas tarefas de atomização (0,96 e 0,78 de exatidão), empregando uma solução sem qualquer dependência linguística. Conseguimos atribuir a autoria de uma mensagem, de entre 3 candidatos, com uma pontuação F1 de 0,63, baseada apenas no seu estilo de escrita, sem qualquer atenção à língua em que foi redigida. A nossa tentativa de identificar sistemas automáticos no Twitter recebeu uma mediana de exatidão de 0,97, em parte graças ao bom desempenho das características estilísticas mencionadas anteriormente. A identificação de variantes da língua é um problema muito mais complexo do que a simples identificação da língua, dado que é uma questão de granularidade mais fina; no entanto conseguimos alcançar 0,95 de exatidão operando sobre 100 mensagens por cada utilizador. Por fim, abordámos o problema de reverter a ofuscação, que é empregue para disfarçar algumas palavras (regra geral, palavrões) em texto. As nossas experiências apontaram uma pontuação mediana ponderada do F1 de 0,93 nesta tarefa. Disponibilizámos livremente o nosso corpus anotado em português que foi criado e examinado para esta tarefa.

Contents

1 Introduction
  1.1 Research questions
  1.2 State of the art
  1.3 Methodology used
  1.4 Contributions
  1.5 Thesis Structure

2 UGC and the world of microblogging
  2.1 What is User-Generated Content
  2.2 A brief introduction to microblogs
  2.3 A few notes about Twitter
    2.3.1 The network
    2.3.2 User interaction
    2.3.3 Defining the context of a message
    2.3.4 Takeaway ideas
  2.4 How microblog messages can be problematic
    2.4.1 Spelling choices
    2.4.2 Orality influences
    2.4.3 Non-standard capitalization
    2.4.4 Abbreviations, contractions and acronyms
    2.4.5 Punctuation
    2.4.6 Emoticons
    2.4.7 Decorations
    2.4.8 Obfuscation
    2.4.9 Special constructs
  2.5 Conclusion

3 Tokenization of micro-blogging messages
  3.1 Introduction
  3.2 Related work
  3.3 A method for tokenizing microblogging messages
    3.3.1 Extracting features for classification
  3.4 Tokenization guidelines
    3.4.1 Word tokens
    3.4.2 Numeric tokens
    3.4.3 Punctuation and other symbols
  3.5 Experimental set-up
    3.5.1 The tokenization methods
    3.5.2 Evaluation scenarios and measures
    3.5.3 The corpus creation
  3.6 Results and analysis
  3.7 Error analysis
    3.7.1 Augmenting training data based on errors
  3.8 Conclusion and future work

4 Forensic analysis
  4.1 Introduction
  4.2 Related work
  4.3 Method description & stylistic features
    4.3.1 Group 1: Quantitative Markers
    4.3.2 Group 2: Marks of Emotion
    4.3.3 Group 3: Punctuation
    4.3.4 Group 4: Abbreviations
  4.4 Experimental setup
    4.4.1 Performance evaluation
  4.5 Results and analysis
    4.5.1 Experiment set 1
    4.5.2 Experiment set 2
  4.6 Conclusions

5 Bot detection
  5.1 Introduction
    5.1.1 Types of users
  5.2 Related work
  5.3 Methodology
    5.3.1 Chronological features
    5.3.2 The client application used
    5.3.3 The presence of URLs
    5.3.4 User interaction
    5.3.5 Writing style
  5.4 Experimental set-up
    5.4.1 Creation of a ground truth
    5.4.2 The classification experiment
  5.5 Results and analysis
  5.6 Conclusion and future work

6 Nationality detection
  6.1 Introduction
  6.2 Related work
  6.3 Methodology
    6.3.1 User selection
    6.3.2 Features
  6.4 Experimental setup
    6.4.1 Dataset generation
    6.4.2 Determining the number of messages
    6.4.3 Determining the most significant features
  6.5 Results
  6.6 Analysis
  6.7 Conclusion
  6.8 Future work

7 An analysis of obfuscation
  7.1 Introduction
  7.2 What is profanity, and why it is relevant
    7.2.1 On the Internet
  7.3 Related work
    7.3.1 How prevalent is swearing?
    7.3.2 Why is it difficult to censor swearing?
  7.4 Swearing and obfuscation
    7.4.1 Our objectives
  7.5 Works related to the use of swear words
    7.5.1 List-based filtering systems
    7.5.2 Detection of offensive messages
    7.5.3 Works related to the study of swear words
  7.6 The SAPO Desporto corpus
    7.6.1 Description of the dataset
    7.6.2 The annotation
    7.6.3 The lexicon
  7.7 Obfuscation
    7.7.1 Obfuscation methods
    7.7.2 Obfuscation method analysis
    7.7.3 Preserving word length
    7.7.4 Altering word length
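The abstract above reports several F1 scores. F1 is the harmonic mean of precision and recall; a minimal sketch of the computation (the function name and the precision/recall values below are illustrative assumptions, not figures taken from the thesis):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values only: precision 0.95 and recall 0.97
# give an F1 close to the 0.96 reported for the first tokenization task.
print(round(f1(0.95, 0.97), 2))  # prints 0.96
```

The weighted variant mentioned for the obfuscation task averages per-class F1 values weighted by class frequency.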
Recommended publications
  • Ehrensperger Report
    American Name Society 51st Annual Ehrensperger Report 2005. A Publication of the American Name Society. Michael F. McGoff, Editor. PREFACE: After a year's hiatus the Ehrensperger Report returns to its place as a major publication of the American Name Society (ANS). This document marks the 51st year since its introduction to the membership by Edward C. Ehrensperger. For over twenty-five years, from 1955 to 1982, he compiled and published this annual review of scholarship.

    Edward C. Ehrensperger, 1895-1984

    As usual, it is a partial view of the research and other activity going on in the world of onomastics, or name study. In a report of this kind, the editor must make use of what comes in, often resulting in unevenness. Some of the entries are very short; some extensive, especially from those who are reporting not just for themselves but also for the activity of a group of people. In all cases, I have assumed the prerogative of an editor and have abridged, clarified, and changed the voice of many of the submissions. I have encouraged the submission of reports by email or electronically, since it is much more efficient to edit text already typed than to type the text myself. For those not using email, I strongly encourage sending me written copy. There is some danger, however, in depending on electronic copy: sometimes diacritical marks or other formatting matters may not have come through correctly. In keeping with the spirit of onomastics and the original Ehrensperger Report, I have attempted where possible to report on research and publication under a person's name.
  • Preprint Corpus Analysis in Forensic Linguistics
    Nini, A. (2020). Corpus analysis in forensic linguistics. In Carol A. Chapelle (ed), The Concise Encyclopedia of Applied Linguistics, 313-320, Hoboken: Wiley-Blackwell Corpus Analysis in Forensic Linguistics Andrea Nini Abstract This entry is an overview of the applications of corpus linguistics to forensic linguistics, in particular to the analysis of language as evidence. Three main areas are described, following the influence that corpus linguistics has had on them in recent times: the analysis of texts of disputed authorship, the provision of evidence in cases of trademark disputes, and the analysis of disputed meanings in criminal or civil cases. In all of these areas considerable advances have been made that revolve around the use of corpus data, for example, to study forensically realistic corpora for authorship analysis, or to provide naturally occurring evidence in cases of trademark disputes or determination of meaning. Using examples from real-life cases, the entry explains how corpus analysis is therefore gradually establishing itself as the norm for solving certain forensic problems and how it is becoming the main methodological approach for forensic linguistics. Keywords Authorship Analysis; Idiolect; Language as Evidence; Ordinary Meaning; Trademark This entry focuses on the use of corpus linguistics methods and techniques for the analysis of forensic data. “Forensic linguistics” is a very general term that broadly refers to two areas of investigation: the analysis of the language of the law and the analysis of language evidence in criminal or civil cases. The former, which often takes advantage of corpus methods, includes areas as wide as the study of the language of the judicial process and courtroom interaction (e.g., Tkačuková, 2015), the study of written legal documents (e.g., Finegan, 2010), and the investigation of interactions in police interviews (e.g., Carter, 2011).
  • Swearing a Cross-Cultural Study in Asian and European Languages
    Swearing: A cross-cultural study in Asian and European Languages. Thesis submitted to Radboud University Nijmegen for the degree of Master of Arts (M.A.). Name: Syahrul Rahman / s4703944. Email: [email protected]. Supervisor 1: Dr. Ad Foolen. Supervisor 2: Professor Helen de Hoop. Master Linguistics, Radboud University Nijmegen, 2016/2017. Acknowledgment: In the name of Allah, the beneficent and merciful. All praises be to Allah for His mercy and blessing. He has given me health and strength to complete this master thesis. May His peace and blessing be upon His final prophet and messenger, Muhammad SAW, His family and His best friends. In writing and finishing this thesis, there are many people who have provided their suggestions, motivation, advice and remarks that have all helped me to finish this paper. Therefore, I would like to express my big appreciation to all of them. First, the greatest thanks to my beloved parents Abd. Rahman and Nuriati and my family, who have patiently given their love, moral values and motivation, and even prayed for me in every single prayer, just wishing me to be happy, safe and successful; I cannot thank you enough for that. Secondly, I would like to dedicate my special gratitude to my supervisor, Dr. Ad Foolen, thanking him for his guidance, assistance, support, friendly talks, and brilliant ideas that all aided in finishing my master thesis. I also wish to dedicate my big thanks to Helen de Hoop, for her kind willingness to be the second reviewer of my thesis.
  • Identifying Idiolect in Forensic Authorship Attribution: an N-Gram Textbite Approach Alison Johnson & David Wright University of Leeds
    Identifying idiolect in forensic authorship attribution: an n-gram textbite approach. Alison Johnson & David Wright, University of Leeds. Abstract: Forensic authorship attribution is concerned with identifying authors of disputed or anonymous documents, which are potentially evidential in legal cases, through the analysis of linguistic clues left behind by writers. The forensic linguist “approaches this problem of questioned authorship from the theoretical position that every native speaker has their own distinct and individual version of the language [...], their own idiolect” (Coulthard, 2004: 31). However, given the difficulty in empirically substantiating a theory of idiolect, there is growing concern in the field that it remains too abstract to be of practical use (Kredens, 2002; Grant, 2010; Turell, 2010). Stylistic, corpus, and computational approaches to text, however, are able to identify repeated collocational patterns, or n-grams, two to six word chunks of language, similar to the popular notion of soundbites: small segments of no more than a few seconds of speech that journalists are able to recognise as having news value and which characterise the important moments of talk. The soundbite offers an intriguing parallel for authorship attribution studies, with the following question arising: looking at any set of texts by any author, is it possible to identify ‘n-gram textbites’, small textual segments that characterise that author’s writing, providing DNA-like chunks of identifying material? Drawing on a corpus of 63,000 emails and 2.5 million words written by 176 employees of the former American energy corporation Enron, a case study approach is adopted, first showing through stylistic analysis that one Enron employee repeatedly produces the same stylistic patterns of politely encoded directives in a way that may be considered habitual.
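The "n-gram textbites" described in this entry are contiguous chunks of two to six words. A minimal sketch of collecting such chunks from a text (the function name and example sentence are our own illustration, not material from the study):

```python
from collections import Counter

def ngram_textbites(text, n_min=2, n_max=6):
    """Count every contiguous word n-gram of length n_min..n_max words."""
    words = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

bites = ngram_textbites("please call me as soon as you can please call me")
print(bites[("please", "call", "me")])  # prints 2: the chunk recurs
```

Chunks that one author repeats far more often than other writers do are the candidate "textbites" the entry treats as identifying material.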
  • An Introduction to Forensic Linguistics: Language in Evidence
    An Introduction to Forensic Linguistics: ‘Seldom do introductions to any field offer such a wealth of information or provide such a useful array of exercise activities for students in the way that this book does. Coulthard and Johnson not only provide their readers with extensive examples of the actual evidence used in the many law cases described here but they also show how the linguist’s “toolkit” was used to address the litigated issues. In doing this, they give valuable insights about how forensic linguists think, do their analyses and, in some cases, even testify at trial.’ Roger W. Shuy, Distinguished Research Professor of Linguistics, Emeritus, Georgetown University. ‘This is a wonderful textbook for students, providing stimulating examples, lucid accounts of relevant linguistic theory and excellent further reading and activities. The foreign language of law is also expertly documented, explained and explored. Language as evidence is cast centre stage; coupled with expert linguistic analysis, the written and spoken clues uncovered by researchers are foregrounded in unfolding legal dramas. Coulthard and Johnson have produced a clear and compelling work that contains its own forensic linguistic puzzle.’ Annabelle Mooney, Roehampton University, UK. From the accusation of plagiarism surrounding The Da Vinci Code, to the infamous hoaxer in the Yorkshire Ripper case, the use of linguistic evidence in court and the number of linguists called to act as expert witnesses in court trials has increased rapidly in the past fifteen years. An Introduction to Forensic Linguistics provides a timely and accessible introduction to this rapidly expanding subject. Using knowledge and experience gained in legal settings – Coulthard in his work as an expert witness and Johnson in her work as a West Midlands police officer – the two authors combine an array of perspectives into a distinctly unified textbook, focusing throughout on evidence from real and often high-profile cases including serial killer Harold Shipman, the Bridgewater Four and the Birmingham Six.
  • Being Pragmatic About Forensic Linguistics Edward K
    Journal of Law and Policy, Volume 21, Issue 2 (2013), SYMPOSIUM: Authorship Attribution Workshop, Article 12. Being Pragmatic About Forensic Linguistics. Edward K. Cheng, J.D. Follow this and additional works at: https://brooklynworks.brooklaw.edu/jlp Recommended Citation: Edward K. Cheng, J.D., Being Pragmatic About Forensic Linguistics, 21 J. L. & Pol'y (2013). Available at: https://brooklynworks.brooklaw.edu/jlp/vol21/iss2/12 This article is brought to you for free and open access by the Law Journals at BrooklynWorks. It has been accepted for inclusion in Journal of Law and Policy by an authorized editor of BrooklynWorks. BEING PRAGMATIC ABOUT FORENSIC LINGUISTICS, Edward K. Cheng*: If my late colleague Margaret Berger taught me anything about evidence, it was that the field seldom yields easy answers. After all, law is necessarily a pragmatic discipline, especially when it comes to matters of proof. Courts must make their best decisions given the available evidence. They have neither the luxury of waiting for better, nor the ability to conjure up, evidence (or new technologies) that they wished they had. Scholars, by contrast, are naturally attracted to the ideal, sometimes like moths to a flame. Ideals reflect the values and commitments of our society, and they provide the goals that inspire and guide research. But when assessing a new field like forensic linguistics as a legal academic, one needs to carefully separate the ideal from the pragmatic. For when it comes to real cases, evidence law can ill afford to allow the perfect to be the enemy of the good. Bearing this admonition firmly in mind, this article aims to provide some legal context to the Authorship Attribution Workshop (“conference”).
  • Germanistische Studien
    A journal of the Verein Deutsche Sprache (Georgia): GERMANISTISCHE STUDIEN, No. 10, anniversary issue. Edited by Lali Kezba-Chundadse and Friederike Schmöe; founded by Samson (Tengis) Karbelaschwili († 2009). Tbilissi · Dortmund, Verlag „Universal“. The journal "Germanistische Studien" of the Verein Deutsche Sprache (Tbilisi branch) is a forum for research contributions in German linguistics, literary theory, and German as a foreign language. It is interdisciplinary in outlook and open to all theoretical approaches within these subfields of German studies. The journal has appeared once a year in printed form since 2000. Until 2009 it published case studies across all subfields of German studies; the last two issues (2009/10), each devoted to a focus topic, are conference proceedings. Contributions may be submitted in German only and are reviewed by an internationally staffed advisory board. The print edition is supported by the Verein Deutsche Sprache. Each contribution is preceded by a short English-language abstract of 10-15 lines (at most half a page). The tables of contents of the print editions, the abstracts of all published contributions, and selected articles, ordered alphabetically by author name, are available in full text at http://germstud.wordpress.com
  • About the AAFS
    American Academy of Forensic Sciences 410 North 21st Street Colorado Springs, Colorado 80904 Phone: (719) 636-1100 Email: [email protected] Website: www.aafs.org @ AAFS Publication 20-2 Copyright © 2020 American Academy of Forensic Sciences Printed in the United States of America Publication Printers, Inc., Denver, CO Typography by Kathy Howard Cover Art by My Creative Condition, Colorado Springs, CO WELCOME LETTER Dear Attendees, It is my high honor and distinct privilege to welcome you to the 72nd AAFS Annual Scientific Meeting in Anaheim, California. I would like to thank the AAFS staff, the many volunteers, and everyone else who have worked together to create an excellent program for this meeting with the theme Crossing Borders. You will have many opportunities to meet your colleagues and discuss new challenges in the field. There are many workshops and special sessions that will be presented. The Interdisciplinary and Plenary Sessions will provide different views in forensic science—past, present, and future. The Young Forensic Scientists Forum will celebrate its 25th Anniversary and is conducting a workshop related to the meeting theme. More than 1,000 presentations are scheduled that will provide you with more insight into the developments in forensic science. The exhibit hall, always interesting to explore, is where you will see the latest forensic science equipment, technology, and literature. The theme Crossing Borders was chosen by me and my colleagues at the Netherlands Forensic Institute (NFI). We see many definitions of crossing borders in forensic science today. For the 2020 meeting, six words starting with the letters “IN” are included in the theme.
  • Euphemism and Language Change: the Sixth and Seventh Ages
    Lexis, Journal in English Lexicology, 7 | 2012: Euphemism as a Word-Formation Process. Euphemism and Language Change: The Sixth and Seventh Ages. Kate Burridge. Electronic version URL: http://journals.openedition.org/lexis/355 DOI: 10.4000/lexis.355 ISSN: 1951-6215. Publisher: Université Jean Moulin - Lyon 3. Electronic reference: Kate Burridge, « Euphemism and Language Change: The Sixth and Seventh Ages », Lexis [Online], 7 | 2012, online since 25 June 2012. Lexis is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Abstract: No matter which human group we look at, past or present, euphemism and its counterpart dysphemism are powerful forces and they are extremely important for the study of language change. They provide an emotive trigger for word addition, word loss, phonological distortion and semantic shift. Word taboo plays perpetual havoc with the methods of historical comparative linguistics, even undermining one of the cornerstones of the discipline – the arbitrary nature of the word. When it comes to taboo words, speakers behave as if there were a very real connection between the physical shape of words and their taboo sense. This is why these words are so unstable and why they are so powerful. This paper reviews the various communicative functions of euphemisms and the different linguistic strategies that are used in their creation, focusing on the linguistic creativity that surrounds the topic of ‘old age’ in Modern English (Shakespeare’s sixth and seventh ages).
  • Usage and Origin of Expletives Usage
    MASARYK UNIVERSITY BRNO, FACULTY OF EDUCATION, DEPARTMENT OF ENGLISH LANGUAGE AND LITERATURE. Usage and Origin of Expletives in British English. Diploma thesis, Brno 2006. Supervised by: Andrew Philip Oakland, M.A. Written by: Hana Čechová. ACKNOWLEDGEMENT: I would like to thank Mr. Andrew Philip Oakland, M.A. for his academic guidance and the precious advice he provided me with, which helped to accomplish the following thesis. I would also like to thank Peter Martin Crossley for all his patience and kind help. Brno, 20th April 2006. DECLARATION: I hereby declare that I have worked on this undergraduate diploma thesis on my own and that I have used only the sources listed in the bibliography. I also give consent to deposit this thesis at Masaryk University in the library or the Informational System of the Faculty of Education and to be made available for study purposes. Hana Čechová. CONTENTS: 1. INTRODUCTION; 2. WHAT ARE EXPLETIVES? (2.1 Expletives in grammar; 2.2 Expletives as a part of socio-linguistics); 3. WHY DO PEOPLE USE EXPLETIVES? (3.1 Why do we swear?; 3.2 Is swearing a sign of a weak mind?); 4. SWEARING IN HISTORICAL PERIODS (4.1 Old English; 4.2 Middle English; 4.3 The Reformation; 4.4 The Renaissance; 4.5 Modern period: 4.5.1 Pygmalion (1914); 4.5.2 Lady Chatterley's Lover (1928, 1960); 4.5.3 Other incidents (1965-2004)); 5. SEMANTIC CATEGORIES AND SPEAKERS' PREFERENCES (5.1 Heaven and Hell; 5.2 Sex/body and its functions); 6.
  • Page 1 DOCUMENT RESUME ED 335 965 FL 019 564 AUTHOR
    DOCUMENT RESUME ED 335 965 FL 019 564 AUTHOR Riego de Rios, Maria Isabelita TITLE A Composite Dictionary of Philippine Creole Spanish (PCS). INSTITUTION Linguistic Society of the Philippines, Manila.; Summer Inst. of Linguistics, Manila (Philippines). REPORT NO ISBN-971-1059-09-6; ISSN-0116-0516 PUB DATE 89 NOTE 218p.; Dissertation, Ateneo de Manila University. The editor of "Studies in Philippine Linguistics" is Fe T. Otanes. The author is a Sister in the R.V.M. order. PUB TYPE Reference Materials - Vocabularies/Classifications/Dictionaries (134) -- Dissertations/Theses - Doctoral Dissertations (041) JOURNAL CIT Studies in Philippine Linguistics; v7 n2 1989 EDRS PRICE MF01/PC09 Plus Postage. DESCRIPTORS *Creoles; Dialect Studies; Dictionaries; English; Foreign Countries; *Language Classification; Language Research; *Language Variation; Linguistic Theory; *Spanish IDENTIFIERS *Cotabato Chabacano; *Philippines ABSTRACT This dictionary is a composite of four Philippine Creole Spanish dialects: Cotabato Chabacano and variants spoken in Ternate, Cavite City, and Zamboanga City. The volume contains 6,542 main lexical entries with corresponding entries with contrasting data from the three other variants. A concluding section summarizes findings of the dialect study that led to the dictionary's writing. Appended materials include a 99-item bibliography and materials related to the structural analysis of the dialects. An index also contains three alphabetical word lists of the variants. The research underlying the dictionary's construction is
  • Tove Skutnabb-Kangas Page 1 1/9/2009
    Tove Skutnabb-Kangas, Page 1, 1/9/2009. Bibliography on multilingualism, multilingual and Indigenous/minority education, linguistic human rights, language and power, the subtractive spread of English, the relationship between linguistic diversity and biodiversity, etc. Last updated in January 2009. This private bibliography (around 946,000 bytes, over 127,000 words, around 5,350 entries, 313 pages in Times Roman 12) contains all the references I (Tove Skutnabb-Kangas) have used in what I have published since 1988. There may be errors (and certainly moving to Mac has garbled references in Kurdish, Greek, etc.). I have not been able to delete all the doubles (and there are several) but am doing it when I see them. If an article says "In XX (ed)" and page numbers, the book itself will be found under the editor's name. I have not checked the bibliography; still, I hope it may be useful for some people. My own publications are also here in alphabetical order, but for more recent ones check my old web page http://akira.ruc.dk/~tovesk/ which is updated more often. Or, from December 2008 onwards, the new one, www.tove-skutnabb-kangas.org which is under construction. For (my husband) Robert Phillipson's publications, check his webpage, www.cbs.dk/staff/phillipson (I do not have all his publications here). All websites here were up-to-date when I put them in but I do not check them regularly. Comments and additions are very welcome! Abadzi, Helen (2006). Efficient Learning for the Poor. Insights from the Frontier of Cognitive Neuroscience.