Experiments in Clustering Urban-Legend Texts
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Enhanced Thesaurus Terms Extraction for Document Indexing
Enhanced Thesaurus Terms Extraction for Document Indexing Frane Šarić, Jan Šnajder, Bojana Dalbelo Bašić, Hrvoje Eklić Faculty of Electrical Engineering and Computing, University of Zagreb Unska 3, 10000 Zagreb, Croatia E-mail: {Frane.Saric, Jan.Snajder, Bojana.Dalbelo, Hrvoje.Eklic}@fer.hr Abstract. In this paper we present an enhanced method for the thesaurus term extraction regarded as the main support to a semi-automatic indexing system. The enhancement is achieved by neutralising the effect of language morphology applying lemmatisation on both the text and the thesaurus, and by implementing an efficient recursive algorithm for term extraction. Formal definition and statistical evaluation of the experimental results of the proposed method for thesaurus term extraction are given. The need for disambiguation methods and the effect of lemmatisation in the realm of thesaurus term extraction are discussed. Keywords. Information retrieval, term extraction, NLP, lemmatisation, Eurovoc. …mogeneous due to diverse background knowledge and expertise of human indexers. The task of building semi-automatic and automatic systems, which aim to decrease the burden of work borne by indexers, has recently attracted interest in the research community [4], [13], [14]. Automatic indexing systems still do not achieve the performance of human indexers, so semi-automatic systems are widely used (CINDEX, MACREX, MAI [10]). In this paper we present a method for thesaurus term extraction regarded as the main support to semi-automatic indexing system. Term extraction is a process of finding all verbatim occurrences of all terms in the text. Our method of term extraction is a part of -
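The idea in the snippet above, lemmatising both the text and the thesaurus and then finding every occurrence of every term, can be illustrated with a minimal Python sketch. It is not the authors' recursive algorithm: it uses a naive scan, and the lemma table `LEMMAS`, the helper names, and the sample thesaurus are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lemma table; a real system would use a full morphological lexicon.
LEMMAS = {"policies": "policy", "markets": "market", "fisheries": "fishery"}


def lemmatise(text):
    """Lowercase the text, split it into letter-only tokens, and map each to its lemma."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens]


def extract_terms(text, thesaurus):
    """Return (start_token_index, term) for every occurrence of a thesaurus term."""
    tokens = lemmatise(text)
    # Lemmatise the thesaurus as well, so both sides are compared in the same form.
    lemma_terms = {tuple(lemmatise(term)): term for term in thesaurus}
    hits = []
    # Naive scan: try every term at every token position.
    for start in range(len(tokens)):
        for lemma_term, term in lemma_terms.items():
            if tuple(tokens[start:start + len(lemma_term)]) == lemma_term:
                hits.append((start, term))
    return hits


hits = extract_terms(
    "Common fisheries policies shape agricultural markets.",
    ["fishery policy", "agricultural market", "policy"],
)
print(hits)
```

Because matching happens on lemmas, the plural forms in the sentence still match the singular thesaurus entries. Overlapping matches (here "fishery policy" and "policy") are both reported, which is one reason a disambiguation step may be needed afterwards.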
Urban Legends
Jestice/English 1 Urban Legends An urban legend, urban myth, urban tale, or contemporary legend is a form of modern folklore consisting of stories that may or may not have been believed by their tellers to be true. As with all folklore and mythology, the designation suggests nothing about the story's veracity, but merely that it is in circulation, exhibits variation over time, and carries some significance that motivates the community in preserving and propagating it. Despite its name, an urban legend does not necessarily originate in an urban area. Rather, the term is used to differentiate modern legend from traditional folklore in pre-industrial times. For this reason, sociologists and folklorists prefer the term contemporary legend. Urban legends are sometimes repeated in news stories and, in recent years, distributed by e-mail. People frequently allege that such tales happened to a "friend of a friend"; so often, in fact, that "friend of a friend" has become a commonly used term when recounting this type of story. Some urban legends have passed through the years with only minor changes to suit regional variations. One example is the story of a woman killed by spiders nesting in her elaborate hairdo. More recent legends tend to reflect modern circumstances, like the story of people ambushed, anesthetized, and waking up minus one kidney, which was surgically removed for transplantation--"The Kidney Heist." The term “urban legend,” as used by folklorists, has appeared in print since at least 1968. Jan Harold Brunvand, professor of English at the University of Utah, introduced the term to the general public in a series of popular books published beginning in 1981. -
An Evaluation of Machine Learning Approaches to Natural Language Processing for Legal Text Classification
Imperial College London Department of Computing An Evaluation of Machine Learning Approaches to Natural Language Processing for Legal Text Classification Author: Clavance Lim Supervisors: Prof Alessandra Russo, Nuri Cingillioglu Submitted in partial fulfillment of the requirements for the MSc degree in Computing Science of Imperial College London September 2019
Contents
Abstract
Acknowledgements
1 Introduction
1.1 Motivation
1.2 Aims and objectives
1.3 Outline
2 Background
2.1 Overview
2.1.1 Text classification
2.1.2 Training, validation and test sets
2.1.3 Cross validation
2.1.4 Hyperparameter optimization
2.1.5 Evaluation metrics
2.2 Text classification pipeline
2.3 Feature extraction
2.3.1 Count vectorizer
2.3.2 TF-IDF vectorizer
2.3.3 Word embeddings
2.4 Classifiers
2.4.1 Naive Bayes classifier
2.4.2 Decision tree
2.4.3 Random forest
2.4.4 Logistic regression
2.4.5 Support vector machines
2.4.6 k-Nearest Neighbours
-
Using Lexico-Syntactic Ontology Design Patterns for Ontology Creation and Population
Using Lexico-Syntactic Ontology Design Patterns for ontology creation and population Diana Maynard and Adam Funk and Wim Peters Department of Computer Science University of Sheffield Regent Court, 211 Portobello S1 4DP, Sheffield, UK Abstract. In this paper we discuss the use of information extraction techniques involving lexico-syntactic patterns to generate ontological information from unstructured text and either create a new ontology from scratch or augment an existing ontology with new entities. We refine the patterns using a term extraction tool and some semantic restrictions derived from WordNet and VerbNet, in order to prevent the overgeneration that occurs with the use of the Ontology Design Patterns for this purpose. We present two applications developed in GATE and available as plugins for the NeOn Toolkit: one for general use on all kinds of text, and one for specific use in the fisheries domain. Key words: natural language processing, relation extraction, ontology generation, information extraction, Ontology Design Patterns 1 Introduction Ontology population is a crucial part of knowledge base construction and maintenance that enables us to relate text to ontologies, providing on the one hand a customised ontology related to the data and domain with which we are concerned, and on the other hand a richer ontology which can be used for a variety of semantic web-related tasks such as knowledge management, information retrieval, question answering, semantic desktop applications, and so on. Automatic ontology population is generally performed by means of some kind of ontology-based information extraction (OBIE) [1, 2]. This consists of identifying the key terms in the text (such as named entities and technical terms) and then relating them to concepts in the ontology. -
Fact Or Fiction?
The Ins and Outs of Media Literacy
Part 1: Fact or Fiction? Fake News, Alternative Facts, and other False Information. By Jeff Rand, La Crosse Public Library
Goals: To give you the knowledge and tools to be a better evaluator of information. Make you an agent in the fight against falsehood
Ground rules: Our focus is knowledge and tools, not individuals. You may see words and images that disturb you or do not agree with your point of view. No political arguments. Agree
Historical Context: “No one in this world . has ever lost money by underestimating the intelligence of the great masses of plain people.” (H. L. Mencken, September 19, 1926)
What is happening now and why
Shift from “Old” to “New” Media. Business/Professional: Newspapers, Magazines, Television, Radio. Individual/Social: Facebook, Twitter, Websites/blogs
News Platforms
Who is your news source? Professional? Personal? Educated, trained, experienced, supervised, with a code of ethics: https://www.spj.org/ethicscode.asp
Social Media & News
Facebook & Fake News
Veles, Macedonia
Filtering. Based on: your location, previous searches, previous clicks, previous purchases, overall popularity. Creates filter bubbles
Echo chamber effect
Repetition theory: Coke is the real thing. Coke is the real thing. Coke is the real thing. Coke is the real thing. Coke is the real thing.
Our tendencies. Filter bubbles: not going outside of your own beliefs. Echo chambers: repeating whatever you agree with to the exclusion of everything else. Information avoidance: just picking what you agree with and ignoring everything else. Satisficing: stopping when the first result agrees with your thinking and not researching further. Instant gratification: clicking “Like” and “Share” without thinking (Dr.
-
LASLA and Collatinus
L.A.S.L.A. and Collatinus: a convergence in lexica Philippe Verkerk, Yves Ouvrard, Margherita Fantoli, Dominique Longrée To cite this version: Philippe Verkerk, Yves Ouvrard, Margherita Fantoli, Dominique Longrée. L.A.S.L.A. and Collatinus: a convergence in lexica. Studi e saggi linguistici, ETS, In press. hal-02399878v1 HAL Id: hal-02399878 https://hal.archives-ouvertes.fr/hal-02399878v1 Submitted on 9 Dec 2019 (v1), last revised 14 May 2020 (v2) HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L.A.S.L.A. (Laboratoire d'Analyse Statistique des Langues Anciennes, University of Liège, Belgium) began in 1961 a project of lemmatisation and morphosyntactic tagging of Latin texts. This project is still running, with new texts lemmatised each year. The resulting files have recently been opened to interested scholars and now count approximately 2,500,000 words, the lemmatisation of which has been checked by a philologist. In the early 2000s, Collatinus was developed by Yves Ouvrard for teaching. -
“La Llorona” As a Liminal Archetypal Monster in Modern Latin American Society
eTropic 16.1 (2017): ‘Tropical Liminal: Urban Vampires & Other Bloodsucking Monstrosities’ Special Issue | 67 The Role of the Internet in the Endurance of “La Llorona” as a Liminal Archetypal Monster in Modern Latin American Society David Ramírez Plascencia University of Guadalajara-SUV, México Abstract Monsters are liminal beings that not only portray fears, proscriptions and collective norms, they are also embedded with special qualities that scare and, at the same time, captivate people’s inquisitiveness. Monstrosities are present in practically all cultures; they remain alive, being passed from one generation to another, often altering their characteristics over time. Modernity and science have not ended people’s belief in paranormal beings; to the contrary, they are still vivid and fresh, with contemporary societies updating and incorporating them into daily life. This paper analyses one of the most well-known legends of Mexico and Latin America, the ghost of “La Llorona” (the weeping woman). The legend of La Llorona can be traced to pre-Hispanic cultures in Mexico; however, the presence of a phantasmagoric figure chasing strangers in rural and urban places has spread across the continent, from Mexico and Central America, to Latino communities in the United States of America. The study of this liminal creature aims to provide a deep sense of her characteristics – through spaces, qualities and meanings; and to furthermore understand how contemporary societies have adopted and modernised this figure, including through the internet. The paper analyses different versions of the legend shared across online platforms using Jeffrey Jerome Cohen’s (1996) theoretical tool described in his work Monster Culture (Seven Theses), which demonstrates La Llorona’s liminal qualities. -
Storytelling
Storytelling Anderson, Katie Elson https://scholarship.libraries.rutgers.edu/discovery/delivery/01RUT_INST:ResearchRepository/12643385580004646?l#13643502170004646 Anderson, K. E. (2010). Storytelling. SAGE. https://doi.org/10.7282/T35T3HSK This work is protected by copyright. You are free to use this resource, with proper attribution, for research and educational purposes. Other uses, such as reproduction or publication, may require the permission of the copyright holder. Chapter 28- 21st Century Anthropology: A Reference Handbook Edited by H. James Birx Storytelling Katie Elson Anderson, Rutgers University. Once upon a time, before words were written, before cultures and societies were observed and analyzed, there was storytelling. Storytelling has been a part of humanity since people were able to communicate and respond to the basic biological urge to explain, educate and enlighten. Cave drawings, traditional dances, poems, songs, and chants are all examples of early storytelling. Stories pass on historical, cultural, and moral information and provide escape and relief from the everyday struggle to survive. Storytelling takes place in all cultures in a variety of different forms. Studying these forms requires an interdisciplinary approach involving anthropology, psychology, linguistics, history, library science, theater, media studies and other related disciplines. New technologies and new approaches have brought about a renewed interest in the varied aspects and elements of storytelling, broadening our understanding and appreciation of its complexity. What is Storytelling? Defining storytelling is not a simple matter. Scholars from a variety of disciplines, professional and amateur storytellers, and members of the communities where the stories dwell have not come to a consensus on what defines storytelling. -
Removing Boilerplate and Duplicate Content from Web Corpora
Masaryk University Faculty of Informatics Removing Boilerplate and Duplicate Content from Web Corpora Ph.D. thesis Jan Pomikálek Brno, 2011 Acknowledgments I want to thank my supervisor Karel Pala for all his support and encouragement in the past years. I thank Luděk Matyska for a useful pre-review and also for being so kind and correcting typos in the text while reading it. I thank the KrdWrd team for providing their Canola corpus for my research, and namely Egon Stemle for his help and technical support. I thank Christian Kohlschütter for making the L3S-GN1 dataset publicly available and for an interesting e-mail discussion. Special thanks to my NLPlab colleagues for creating a great research environment. In particular, I want to thank Pavel Rychlý for inspiring discussions and for interesting joint research. Special thanks to Adam Kilgarriff and Diana McCarthy for reading various parts of my thesis and for their valuable comments. My very special thanks go to Lexical Computing Ltd and namely again to the director Adam Kilgarriff for fully supporting my research and making it applicable in practice. Last but not least I thank my family for all their love and support throughout my studies. Abstract In recent years, the Web has become a popular source of textual data for linguistic research. The Web provides an extremely large volume of texts in many languages. However, a number of problems have to be resolved in order to create collections (text corpora) which are appropriate for application in natural language processing. In this work, two related problems are addressed: cleaning boilerplate and removing duplicate and near-duplicate content from Web data. -
The State of (Full) Text Search in Postgresql 12
The State of (Full) Text Search in PostgreSQL 12 FOSDEM 2020 Jimmy Angelakos Senior PostgreSQL Architect Twitter: @vyruss https://www.2ndQuadrant.com FOSDEM Brussels, 2020-02-02
Contents ● (Full) Text Search ● Indexing ● Operators ● Non-natural text ● Functions ● Collation ● Dictionaries ● Other “text” types ● Examples ● Maintenance
Your attention please Allergy advice ● This presentation contains linguistics, NLP, Markov chains, Levenshtein distances, and various other confounding terms. ● These have been known to induce drowsiness and inappropriate sleep onset in lecture theatres.
What is Text? (Baby don’t hurt me) ● PostgreSQL character types – CHAR(n) – VARCHAR(n) – VARCHAR, TEXT ● Trailing spaces: significant (e.g. for LIKE / regex) ● Storage – Character Set (e.g. UTF-8) – 1+126 bytes → 4+n bytes – Compression, TOAST
What is Text Search? ● Information retrieval → Text retrieval ● Search on metadata – Descriptive, bibliographic, tags, etc. – Discovery & identification ● Search on parts of the text – Matching – Substring search – Data extraction, cleaning, mining
Text search operators in PostgreSQL ● LIKE, ILIKE (~~, ~~*) ● ~, ~* (POSIX regex) ● regexp_match(string text, pattern text) ● But are SQL/regular expressions enough? – No ranking of results – No concept of language – Cannot be indexed ● Okay okay, can be somewhat indexed* ● SIMILAR TO → best forget about this one
What is Full Text Search (FTS)? ● Information retrieval → Text retrieval → Document retrieval ● Search on words (on tokens) in a database (all documents) ● No index → Serial search (e.g.
-
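The contrast the slides draw between serial search and searching on tokens with an index can be sketched in a few lines of Python. This is a toy illustration only, not how PostgreSQL implements full text search: the helper names (`tokenize`, `serial_search`, `build_index`) and the sample documents are assumptions made up for this example, and there is no stemming, stop-word handling, or ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (no stemming or stop words here)."""
    return re.findall(r"\w+", text.lower())


def serial_search(docs, word):
    """No index: every query re-reads and re-tokenises every document."""
    return [doc_id for doc_id, text in docs.items() if word in tokenize(text)]


def build_index(docs):
    """Inverted index: token -> set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


# Hypothetical sample documents.
docs = {
    1: "The kidney heist is an urban legend",
    2: "Alligators live in the sewer",
    3: "A legend about bananas",
}

index = build_index(docs)
indexed_hits = sorted(index.get("legend", set()))  # one dictionary lookup
scanned_hits = serial_search(docs, "legend")       # full pass over all documents
print(indexed_hits, scanned_hits)
```

Both approaches return the same documents; the difference is that the index is built once and each later query is a single lookup, whereas the serial scan costs a full pass over the collection every time.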
Urban-Legend-PUPIL.Pdf
PUPIL Like the wood we use for our furniture, the paper used to print this catalog is FSC certified and comes from responsible forests. OUR BRAND VALUES Urban Legend, a brand by CasaMia Interior Ltd. Born from the will to create accessible Designs that do not compromise innovation, quality and style, “Urban Legend” is a freedom brand first and foremost. We create what we like. We believe the best way to keep our innovative and creative spirit alive every day is to love what we do, and to do it with purpose. We constantly re-invent ourselves. Our Designs cannot be classified under one particular style, but rather incorporate different languages to create unique statements. Our Design vision is very focused on what we call our “emotional intelligence”, or our capacity to be moved by a simple detail, a shape or a texture. Our style is minimalist yet sophisticated, warm and authentic yet resolutely contemporary, innovative and sometimes conservative. These contradictions, we do not fight them, but we integrate them fully into our work, and they define the very unique “Urban Legend” signature, between Artisan well kept knowhow and “modern lifestyles”. “There is an invisible, subconscious feeling that connects us to the things we love. It is this very intimate and ethereal emotion that we strive to ignite through our products” D. Jamet, Head of Design OUR PRODUCTS Our furniture is manufactured in Vietnam, by “workshop-sized” factories that are both highly skilled and cost-efficient. We put a high priority to work with people who take pride in what they do, and always aim at achieving higher goals with us! / FSC timbers / Human-scaled 100% of the wood we use in production is imported and comes from sustainable forests. -
Exploding Phones and Dangerous Bananas: Perceived Precision and Believability of Deceptive Messages Found on the Internet
EXPLODING PHONES AND DANGEROUS BANANAS: PERCEIVED PRECISION AND BELIEVABILITY OF DECEPTIVE MESSAGES FOUND ON THE INTERNET Stefano Grazioli, McIntire School of Commerce, University of Virginia; Ronald C. Carrell, McCombs School of Business, University of Texas at Austin Abstract Urban legends are untrue messages that circulate widely in a social environment and that have a wide audience that believes them to be true. We argue that the Internet facilitates the spreading of these deceptive messages and that in some cases this carries with it social and managerial consequences. We contribute to current research on dysfunctional forms of communication by showing that targets of deceptive messages judge lower-precision messages to be untruthful, and that some (but not all) targets use higher precision as a proxy for truth. Overall, these results caution us not to use precision as a proxy for truth when evaluating reports and rumors such as those that flood Internet mailboxes and chat rooms. Introduction Urban legends are narratives that (a) originate and spread anonymously or semi-anonymously, (b) are often found in several versions, (c) have a wide audience, and (d) are believed by many to be true despite the lack of supporting evidence or despite the existence of clear contradictory evidence (Brunvand 2000; Heath et al. 2001). Examples of urban legends include reports of alligators inhabiting city sewer systems, razor blades being placed inside apples, phones that cause explosions, and bacteria-laden bananas. Google and the Open Directory Project, as well as dozens of other web sites, contain hundreds of examples of such narratives.