Evaluating Wordnet-Based Measures of Lexical Semantic Relatedness
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Combining Semantic and Lexical Measures to Evaluate Medical Terms Similarity?
Combining semantic and lexical measures to evaluate medical terms similarity? Silvio Domingos Cardoso1;2, Marcos Da Silveira1, Ying-Chi Lin3, Victor Christen3, Erhard Rahm3, Chantal Reynaud-Dela^ıtre2, and C´edricPruski1 1 LIST, Luxembourg Institute of Science and Technology, Luxembourg fsilvio.cardoso,marcos.dasilveira,[email protected] 2 LRI, University of Paris-Sud XI, France [email protected] 3 Department of Computer Science, Universit¨atLeipzig, Germany flin,christen,[email protected] Abstract. The use of similarity measures in various domains is corner- stone for different tasks ranging from ontology alignment to information retrieval. To this end, existing metrics can be classified into several cate- gories among which lexical and semantic families of similarity measures predominate but have rarely been combined to complete the aforemen- tioned tasks. In this paper, we propose an original approach combining lexical and ontology-based semantic similarity measures to improve the evaluation of terms relatedness. We validate our approach through a set of experiments based on a corpus of reference constructed by domain experts of the medical field and further evaluate the impact of ontology evolution on the used semantic similarity measures. Keywords: Similarity measures · Ontology evolution · Semantic Web · Medical terminologies 1 Introduction Measuring the similarity between terms is at the heart of many research inves- tigations. In ontology matching, similarity measures are used to evaluate the relatedness between concepts from different ontologies [9]. The outcomes are the mappings between the ontologies, increasing the coverage of domain knowledge and optimize semantic interoperability between information systems. In informa- tion retrieval, similarity measures are used to evaluate the relatedness between units of language (e.g., words, sentences, documents) to optimize search [39]. -
Evaluating Lexical Similarity to Build Sentiment Similarity
Evaluating Lexical Similarity to Build Sentiment Similarity Grégoire Jadi∗, Vincent Claveauy, Béatrice Daille∗, Laura Monceaux-Cachard∗ ∗ LINA - Univ. Nantes {gregoire.jadi beatrice.daille laura.monceaux}@univ-nantes.fr y IRISA–CNRS, France [email protected] Abstract In this article, we propose to evaluate the lexical similarity information provided by word representations against several opinion resources using traditional Information Retrieval tools. Word representation have been used to build and to extend opinion resources such as lexicon, and ontology and their performance have been evaluated on sentiment analysis tasks. We question this method by measuring the correlation between the sentiment proximity provided by opinion resources and the semantic similarity provided by word representations using different correlation coefficients. We also compare the neighbors found in word representations and list of similar opinion words. Our results show that the proximity of words in state-of-the-art word representations is not very effective to build sentiment similarity. Keywords: opinion lexicon evaluation, sentiment analysis, spectral representation, word embedding 1. Introduction tend or build sentiment lexicons. For example, (Toh and The goal of opinion mining or sentiment analysis is to iden- Wang, 2014) uses the syntactic categories from WordNet tify and classify texts according to the opinion, sentiment or as features for a CRF to extract aspects and WordNet re- emotion that they convey. It requires a good knowledge of lations such as antonymy and synonymy to extend an ex- the sentiment properties of each word. This properties can isting opinion lexicons. Similarly, (Agarwal et al., 2011) be either discovered automatically by machine learning al- look for synonyms in WordNet to find the polarity of words gorithms, or embedded into language resources. -
Teaching and Learning Academic Vocabulary
Musa Nushi Shahid Beheshti University Homa Jenabzadeh Shahid Beheshti University Teaching and Learning Academic Vocabulary Developing learners' lexical competence through vocabulary instruction has always been high on second language teachers' agenda. This paper will be focusing on the importance of academic vocabulary and how to teach such vocabulary to adult EFL/ESL learners at intermediate and higher levels of proficiency in the English language. It will also introduce several techniques, mostly those that engage students’ cognitive abilities, which can in turn facilitate the process of teaching and learning academic vocabulary. Several online tools that can assist academic vocabulary teaching and learning will also be introduced. The paper concludes with the introduction and discussion of four important academic vocabulary word lists which can help second language teachers with the identification and selection of academic vocabulary in their instructional planning. Keywords: Vocabulary, Academic, Second language, Word lists, Instruction. 1. Introduction Vocabulary is essential to conveying meaning in a second language (L2). Schmitt (2010) notes that L2 learners seem cognizant of the importance of vocabulary in language learning, as evidenced by their tendency to carry dictionaries, and not grammar books, around with them. It has even been suggested that the main difference between intermediate and advanced L2 learners lies not in how complex their grammatical knowledge is but in how expanded and developed their mental lexicon is (Lewis, 1997). Similarly, McCarthy (1990) observes that "no matter how well the student learns grammar, no matter how successfully the sounds of L2 are mastered, without words to express a wide range of meanings, communication in an L2 just cannot happen in any meaningful way" (p. -
Using Prior Knowledge to Guide BERT's Attention in Semantic
Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks Tingyu Xia Yue Wang School of Artificial Intelligence, Jilin University School of Information and Library Science Key Laboratory of Symbolic Computation and Knowledge University of North Carolina at Chapel Hill Engineering, Jilin University [email protected] [email protected] Yuan Tian∗ Yi Chang∗ School of Artificial Intelligence, Jilin University School of Artificial Intelligence, Jilin University Key Laboratory of Symbolic Computation and Knowledge International Center of Future Science, Jilin University Engineering, Jilin University Key Laboratory of Symbolic Computation and Knowledge [email protected] Engineering, Jilin University [email protected] ABSTRACT 1 INTRODUCTION We study the problem of incorporating prior knowledge into a deep Measuring semantic similarity between two pieces of text is a fun- Transformer-based model, i.e., Bidirectional Encoder Representa- damental and important task in natural language understanding. tions from Transformers (BERT), to enhance its performance on Early works on this task often leverage knowledge resources such semantic textual matching tasks. By probing and analyzing what as WordNet [28] and UMLS [2] as these resources contain well- BERT has already known when solving this task, we obtain better defined types and relations between words and concepts. Recent understanding of what task-specific knowledge BERT needs the works have shown that deep learning models are more effective most and where it is most needed. The analysis further motivates on this task by learning linguistic knowledge from large-scale text us to take a different approach than most existing works. Instead of data and representing text (words and sentences) as continuous using prior knowledge to create a new training task for fine-tuning trainable embedding vectors [10, 17]. -
Defining Vocabulary Words Grades
Vocabulary Instruction Booster Session 2: Defining Vocabulary Words Grades 5–8 Vaughn Gross Center for Reading and Language Arts at The University of Texas at Austin © 2014 Texas Education Agency/The University of Texas System Acknowledgments Vocabulary Instruction Booster Session 2: Defining Vocabulary Words was developed with funding from the Texas Education Agency and the support and talent of many individuals whose names do not appear here, but whose hard work and ideas are represented throughout. These individuals include national and state reading experts; researchers; and those who work for the Vaughn Gross Center for Reading and Language Arts at The University of Texas at Austin and the Texas Education Agency. Vaughn Gross Center for Reading and Language Arts College of Education The University of Texas at Austin www.meadowscenter.org/vgc Manuel J. Justiz, Dean Greg Roberts, Director Texas Education Agency Michael L. Williams, Commissioner of Education Monica Martinez, Associate Commissioner, Standards and Programs Development Team Meghan Coleman, Lead Author Phil Capin Karla Estrada David Osman Jennifer B. Schnakenberg Jacob Williams Design and Editing Matthew Slater, Editor Carlos Treviño, Designer Special thanks to Alice Independent School District in Alice, Texas Vaughn Gross Center for Reading and Language Arts at The University of Texas at Austin © 2014 Texas Education Agency/The University of Texas System Vaughn Gross Center for Reading and Language Arts at The University of Texas at Austin © 2014 Texas Education Agency/The University of Texas System Introduction Explicit and robust vocabulary instruction can make a significant difference when we are purposeful in the words we choose to teach our students. -
Latent Semantic Analysis for Text-Based Research
Behavior Research Methods, Instruments, & Computers 1996, 28 (2), 197-202 ANALYSIS OF SEMANTIC AND CLINICAL DATA Chaired by Matthew S. McGlone, Lafayette College Latent semantic analysis for text-based research PETER W. FOLTZ New Mexico State University, Las Cruces, New Mexico Latent semantic analysis (LSA) is a statistical model of word usage that permits comparisons of se mantic similarity between pieces of textual information. This papersummarizes three experiments that illustrate how LSA may be used in text-based research. Two experiments describe methods for ana lyzinga subject's essay for determining from what text a subject learned the information and for grad ing the quality of information cited in the essay. The third experiment describes using LSAto measure the coherence and comprehensibility of texts. One of the primary goals in text-comprehension re A theoretical approach to studying text comprehen search is to understand what factors influence a reader's sion has been to develop cognitive models ofthe reader's ability to extract and retain information from textual ma representation ofthe text (e.g., Kintsch, 1988; van Dijk terial. The typical approach in text-comprehension re & Kintsch, 1983). In such a model, semantic information search is to have subjects read textual material and then from both the text and the reader's summary are repre have them produce some form of summary, such as an sented as sets ofsemantic components called propositions. swering questions or writing an essay. This summary per Typically, each clause in a text is represented by a single mits the experimenter to determine what information the proposition. -
Ying Liu, Xi Yang
Liu, Y. & Yang, X. (2018). A similar legal case retrieval Liu & Yang system by multiple speech question and answer. In Proceedings of The 18th International Conference on Electronic Business (pp. 72-81). ICEB, Guilin, China, December 2-6. A Similar Legal Case Retrieval System by Multiple Speech Question and Answer (Full paper) Ying Liu, School of Computer Science and Information Engineering, Guangxi Normal University, China, [email protected] Xi Yang*, School of Computer Science and Information Engineering, Guangxi Normal University, China, [email protected] ABSTRACT In real life, on the one hand people often lack legal knowledge and legal awareness; on the other hand lawyers are busy, time- consuming, and expensive. As a result, a series of legal consultation problems cannot be properly handled. Although some legal robots can answer some problems about basic legal knowledge that are matched by keywords, they cannot do similar case retrieval and sentencing prediction according to the criminal facts described in natural language. To overcome the difficulty, we propose a similar case retrieval system based on natural language understanding. The system uses online speech synthesis of IFLYTEK and speech reading and writing technology, integrates natural language semantic processing technology and multiple rounds of question-and-answer dialogue mechanism to realise the legal knowledge question and answer with the memory-based context processing ability, and finally retrieves a case that is most similar to the criminal facts that the user consulted. After trial use, the system has a good level of human-computer interaction and high accuracy of information retrieval, which can largely satisfy people's consulting needs for legal issues. -
Untangling Semantic Similarity: Modeling Lexical Processing Experiments with Distributional Semantic Models
Untangling Semantic Similarity: Modeling Lexical Processing Experiments with Distributional Semantic Models. Farhan Samir Barend Beekhuizen Suzanne Stevenson Department of Computer Science Department of Language Studies Department of Computer Science University of Toronto University of Toronto, Mississauga University of Toronto ([email protected]) ([email protected]) ([email protected]) Abstract DSMs thought to instantiate one but not the other kind of sim- Distributional semantic models (DSMs) are substantially var- ilarity have been found to explain different priming effects on ied in the types of semantic similarity that they output. Despite an aggregate level (as opposed to an item level). this high variance, the different types of similarity are often The underlying conception of semantic versus associative conflated as a monolithic concept in models of behavioural data. We apply the insight that word2vec’s representations similarity, however, has been disputed (Ettinger & Linzen, can be used for capturing both paradigmatic similarity (sub- 2016; McRae et al., 2012), as has one of the main ways in stitutability) and syntagmatic similarity (co-occurrence) to two which it is operationalized (Gunther¨ et al., 2016). In this pa- sets of experimental findings (semantic priming and the effect of semantic neighbourhood density) that have previously been per, we instead follow another distinction, based on distri- modeled with monolithic conceptions of DSM-based seman- butional properties (e.g. Schutze¨ & Pedersen, 1993), namely, tic similarity. Using paradigmatic and syntagmatic similarity that between syntagmatically related words (words that oc- based on word2vec, we show that for some tasks and types of items the two types of similarity play complementary ex- cur in each other’s near proximity, such as drink–coffee), planatory roles, whereas for others, only syntagmatic similar- and paradigmatically related words (words that can be sub- ity seems to matter. -
Aspects of Idiom*
University of Calgary PRISM: University of Calgary's Digital Repository Calgary (Working) Papers in Linguistics Volume 07, Winter 1982 1982-01 Aspects of idiom* Sayers, Coral University of Calgary Sayers, C. (1982). Aspects of idiom*. Calgary Working Papers in Linguistics, 7(Winter), 13-32. http://hdl.handle.net/1880/51297 journal article Downloaded from PRISM: https://prism.ucalgary.ca Aspects of Idiom* Coral Sayers • 1. Introduction Weinreich (1969) defines an idiom as "a complex expression • whose meaning cannot be derived from the meanings of its elements." This writer has collected examples of idioms from the English, German, Australian English, and Quebec French dialects (Appendix I) in order to examine the properties of the idiom, to explore in the literature the current concepts of idiom, and to relate relevant knowledge gained by these processes to the teaching of English as a Second Language. An idiom may be a word, a "lexical idiom," or, if it has a more complicated constituent structure, a "phrasal idiom." (Katz and Postal, 1964, cited by Fraser, 1970:23) As the lexical idiom, domi nated by a single branch syntactic category (e.g. "squatter," a noun) does not present as much difficulty as does the phrasal idiom, this paper will focus upon the latter. Despite the difficulties presented by idioms to the Trans formational Grammar model, e.g. the contradiction of the claim that virtually all sentences have a low occurrence probability and frequency (Coulmas, 1979), the consensus of opinion seems to be that idioms can be accommodated in the grammatical description. Phrasal idioms of English should be considered as a "more complicated variety of mono morphemic lexical entries." (Fraser, 1970:41) 2. -
Measuring Semantic Similarity of Words Using Concept Networks
Measuring semantic similarity of words using concept networks Gabor´ Recski Eszter Iklodi´ Research Institute for Linguistics Dept of Automation and Applied Informatics Hungarian Academy of Sciences Budapest U of Technology and Economics H-1068 Budapest, Benczur´ u. 33 H-1117 Budapest, Magyar tudosok´ krt. 2 [email protected] [email protected] Katalin Pajkossy Andras´ Kornai Department of Algebra Institute for Computer Science Budapest U of Technology and Economics Hungarian Academy of Sciences H-1111 Budapest, Egry J. u. 1 H-1111 Budapest, Kende u. 13-17 [email protected] [email protected] Abstract using word embeddings and WordNet. In Sec- tion 3 we briefly introduce the 4lang resources We present a state-of-the-art algorithm and the formalism it uses for encoding the mean- for measuring the semantic similarity of ing of words as directed graphs of concepts, then word pairs using novel combinations of document our efforts to develop novel 4lang- word embeddings, WordNet, and the con- based similarity features. Besides improving the cept dictionary 4lang. We evaluate our performance of existing systems for measuring system on the SimLex-999 benchmark word similarity, the goal of the present project is to data. Our top score of 0.76 is higher than examine the potential of 4lang representations in any published system that we are aware of, representing non-trivial lexical relationships that well beyond the average inter-annotator are beyond the scope of word embeddings and agreement of 0.67, and close to the 0.78 standard linguistic ontologies. average correlation between a human rater Section 4 presents our results and pro- and the average of all other ratings, sug- vides rough error analysis. -
Semantic Similarity Between Sentences
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072 SEMANTIC SIMILARITY BETWEEN SENTENCES PANTULKAR SRAVANTHI1, DR. B. SRINIVASU2 1M.tech Scholar Dept. of Computer Science and Engineering, Stanley College of Engineering and Technology for Women, Telangana- Hyderabad, India 2Associate Professor - Dept. of Computer Science and Engineering, Stanley College of Engineering and Technology for Women, Telangana- Hyderabad, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - The task of measuring sentence similarity is [1].Sentence similarity is one of the core elements of Natural defined as determining how similar the meanings of two Language Processing (NLP) tasks such as Recognizing sentences are. Computing sentence similarity is not a trivial Textual Entailment (RTE)[2] and Paraphrase Recognition[3]. task, due to the variability of natural language - expressions. Given two sentences, the task of measuring sentence Measuring semantic similarity of sentences is closely related to similarity is defined as determining how similar the meaning semantic similarity between words. It makes a relationship of two sentences is. The higher the score, the more similar between a word and the sentence through their meanings. The the meaning of the two sentences. WordNet and similarity intention is to enhance the concepts of semantics over the measures play an important role in sentence level similarity syntactic measures that are able to categorize the pair of than document level[4]. sentences effectively. Semantic similarity plays a vital role in Natural language processing, Informational Retrieval, Text 1.1 Problem Description Mining, Q & A systems, text-related research and application area. -
SELECTING and CREATING a WORD LIST for ENGLISH LANGUAGE TEACHING by Deny A
Teaching English with Technology, 17(1), 60-72, http://www.tewtjournal.org 60 SELECTING AND CREATING A WORD LIST FOR ENGLISH LANGUAGE TEACHING by Deny A. Kwary and Jurianto Universitas Airlangga Dharmawangsa Dalam Selatan, Surabaya 60286, Indonesia d.a.kwary @ unair.ac.id / [email protected] Abstract Since the introduction of the General Service List (GSL) in 1953, a number of studies have confirmed the significant role of a word list, particularly GSL, in helping ESL students learn English. Given the recent development in technology, several researchers have created word lists, each of them claims to provide a better coverage of a text and a significant role in helping students learn English. This article aims at analyzing the claims made by the existing word lists and proposing a method for selecting words and a creating a word list. The result of this study shows that there are differences in the coverage of the word lists due to the difference in the corpora and the source text analysed. This article also suggests that we should create our own word list, which is both personalized and comprehensive. This means that the word list is not just a list of words. The word list needs to be accompanied with the senses and the patterns of the words, in order to really help ESL students learn English. Keywords: English; GSL; NGSL; vocabulary; word list 1. Introduction A word list has been noted as an essential resource for language teaching, especially for second language teaching. The main purpose for the creation of a word list is to determine the words that the learners need to know, in order to provide more focused learning materials.