Dictionary-Based Concept Mining: an Application for Turkish
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Text Mining Course for KNIME Analytics Platform
Text Mining Course for KNIME Analytics Platform KNIME AG Copyright © 2018 KNIME AG Table of Contents 1. The Open Analytics Platform 2. The Text Processing Extension 3. Importing Text 4. Enrichment 5. Preprocessing 6. Transformation 7. Classification 8. Visualization 9. Clustering 10. Supplementary Workflows Licensed under a Creative Commons Attribution- ® Copyright © 2018 KNIME AG 2 Noncommercial-Share Alike license 1 https://creativecommons.org/licenses/by-nc-sa/4.0/ Overview KNIME Analytics Platform Licensed under a Creative Commons Attribution- ® Copyright © 2018 KNIME AG 3 Noncommercial-Share Alike license 1 https://creativecommons.org/licenses/by-nc-sa/4.0/ What is KNIME Analytics Platform? • A tool for data analysis, manipulation, visualization, and reporting • Based on the graphical programming paradigm • Provides a diverse array of extensions: • Text Mining • Network Mining • Cheminformatics • Many integrations, such as Java, R, Python, Weka, H2O, etc. Licensed under a Creative Commons Attribution- ® Copyright © 2018 KNIME AG 4 Noncommercial-Share Alike license 2 https://creativecommons.org/licenses/by-nc-sa/4.0/ Visual KNIME Workflows NODES perform tasks on data Not Configured Configured Outputs Inputs Executed Status Error Nodes are combined to create WORKFLOWS Licensed under a Creative Commons Attribution- ® Copyright © 2018 KNIME AG 5 Noncommercial-Share Alike license 3 https://creativecommons.org/licenses/by-nc-sa/4.0/ Data Access • Databases • MySQL, MS SQL Server, PostgreSQL • any JDBC (Oracle, DB2, …) • Files • CSV, txt -
Words and Alternative Basic Units for Linguistic Analysis
Words and alternative basic units for linguistic analysis 1 Words and alternative basic units for linguistic analysis Jens Allwood SCCIIL Interdisciplinary Center, University of Gothenburg A. P. Hendrikse, Department of Linguistics, University of South Africa, Pretoria Elisabeth Ahlsén SCCIIL Interdisciplinary Center, University of Gothenburg Abstract The paper deals with words and possible alternative to words as basic units in linguistic theory, especially in interlinguistic comparison and corpus linguistics. A number of ways of defining the word are discussed and related to the analysis of linguistic corpora and to interlinguistic comparisons between corpora of spoken interaction. Problems associated with words as the basic units and alternatives to the traditional notion of word as a basis for corpus analysis and linguistic comparisons are presented and discussed. 1. What is a word? To some extent, there is an unclear view of what counts as a linguistic word, generally, and in different language types. This paper is an attempt to examine various construals of the concept “word”, in order to see how “words” might best be made use of as units of linguistic comparison. Using intuition, we might say that a word is a basic linguistic unit that is constituted by a combination of content (meaning) and expression, where the expression can be phonetic, orthographic or gestural (deaf sign language). On closer examination, however, it turns out that the notion “word” can be analyzed and specified in several different ways. Below we will consider the following three main ways of trying to analyze and define what a word is: (i) Analysis and definitions building on observation and supposed easy discovery (ii) Analysis and definitions building on manipulability (iii) Analysis and definitions building on abstraction 2. -
Application of Machine Learning in Automatic Sentiment Recognition from Human Speech
Application of Machine Learning in Automatic Sentiment Recognition from Human Speech Zhang Liu Ng EYK Anglo-Chinese Junior College College of Engineering Singapore Nanyang Technological University (NTU) Singapore [email protected] Abstract— Opinions and sentiments are central to almost all human human activities and have a wide range of applications. As activities and have a wide range of applications. As many decision many decision makers turn to social media due to large makers turn to social media due to large volume of opinion data volume of opinion data available, efficient and accurate available, efficient and accurate sentiment analysis is necessary to sentiment analysis is necessary to extract those data. Business extract those data. Hence, text sentiment analysis has recently organisations in different sectors use social media to find out become a popular field and has attracted many researchers. However, consumer opinions to improve their products and services. extracting sentiments from audio speech remains a challenge. This project explores the possibility of applying supervised Machine Political party leaders need to know the current public Learning in recognising sentiments in English utterances on a sentiment to come up with campaign strategies. Government sentence level. In addition, the project also aims to examine the effect agencies also monitor citizens’ opinions on social media. of combining acoustic and linguistic features on classification Police agencies, for example, detect criminal intents and cyber accuracy. Six audio tracks are randomly selected to be training data threats by analysing sentiment valence in social media posts. from 40 YouTube videos (monologue) with strong presence of In addition, sentiment information can be used to make sentiments. -
Text KM & Text Mining
Text KM & Text Mining An Introduction to Managing Knowledge in Unstructured Natural Language Documents Dekai Wu HKUST Human Language Technology Center Department of Computer Science and Engineering University of Science and Technology Hong Kong http://www.cs.ust.hk/~dekai © 2008 Dekai Wu Lecture Objectives Introduction to the concept of Text KM and Text Mining (TM) How to exploit knowledge encoded in text form How text mining is different from data mining Introduction to the various aspects of Natural Language Processing (NLP) Introduction to the different tools and methods available for TM HKUST Human Language Technology Center © 2008 Dekai Wu Textual Knowledge Management Text KM oversees the storage, capturing and sharing of knowledge encoded in unstructured natural language documents 80-90% of an organization’s explicit knowledge resides in plain English (Chinese, Japanese, Spanish, …) documents – not in structured relational databases! Case libraries are much more reasonably stored as natural language documents, than encoded into relational databases Most knowledge encoded as text will never pass through explicit KM processes (eg, email) HKUST Human Language Technology Center © 2008 Dekai Wu Text Mining Text Mining analyzes unstructured natural language documents to extract targeted types of knowledge Extracts knowledge that can then be inserted into databases, thereby facilitating structured data mining techniques Provides a more natural user interface for entering knowledge, for both employees and developers Reduces -
Greek and Latin Roots, Prefixes, and Suffixes
GREEK AND LATIN ROOTS, PREFIXES, AND SUFFIXES This is a resource pack that I put together for myself to teach roots, prefixes, and suffixes as part of a separate vocabulary class (short weekly sessions). It is a combination of helpful resources that I have found on the web as well as some tips of my own (such as the simple lesson plan). Lesson Plan Ideas ........................................................................................................... 3 Simple Lesson Plan for Word Study: ........................................................................... 3 Lesson Plan Idea 2 ...................................................................................................... 3 Background Information .................................................................................................. 5 Why Study Word Roots, Prefixes, and Suffixes? ......................................................... 6 Latin and Greek Word Elements .............................................................................. 6 Latin Roots, Prefixes, and Suffixes .......................................................................... 6 Root, Prefix, and Suffix Lists ........................................................................................... 8 List 1: MEGA root list ................................................................................................... 9 List 2: Roots, Prefixes, and Suffixes .......................................................................... 32 List 3: Prefix List ...................................................................................................... -
Knowledge-Powered Deep Learning for Word Embedding
Knowledge-Powered Deep Learning for Word Embedding Jiang Bian, Bin Gao, and Tie-Yan Liu Microsoft Research {jibian,bingao,tyliu}@microsoft.com Abstract. The basis of applying deep learning to solve natural language process- ing tasks is to obtain high-quality distributed representations of words, i.e., word embeddings, from large amounts of text data. However, text itself usually con- tains incomplete and ambiguous information, which makes necessity to leverage extra knowledge to understand it. Fortunately, text itself already contains well- defined morphological and syntactic knowledge; moreover, the large amount of texts on the Web enable the extraction of plenty of semantic knowledge. There- fore, it makes sense to design novel deep learning algorithms and systems in order to leverage the above knowledge to compute more effective word embed- dings. In this paper, we conduct an empirical study on the capacity of leveraging morphological, syntactic, and semantic knowledge to achieve high-quality word embeddings. Our study explores these types of knowledge to define new basis for word representation, provide additional input information, and serve as auxiliary supervision in deep learning, respectively. Experiments on an analogical reason- ing task, a word similarity task, and a word completion task have all demonstrated that knowledge-powered deep learning can enhance the effectiveness of word em- bedding. 1 Introduction With rapid development of deep learning techniques in recent years, it has drawn in- creasing attention to train complex and deep models on large amounts of data, in order to solve a wide range of text mining and natural language processing (NLP) tasks [4, 1, 8, 13, 19, 20]. -
Different but Not All Opposite: Contributions to Lexical Relationships Teaching in Primary School
INTE - ITICAM - IDEC 2018, Paris-FRANCE VOLUME 1 All Different But Not All Opposite: Contributions To Lexical Relationships Teaching In Primary School Adriana BAPTISTA Polytechnic Institute of Porto – School of Media Arts and Design inED – Centre for Research and Innovation in Education Portugal [email protected] Celda CHOUPINA Polytechnic Institute of Porto – School of Education inED – Centre for Research and Innovation in Education Centre of Linguistics of the University of Porto Portugal [email protected] José António COSTA Polytechnic Institute of Porto – School of Education inED – Centre for Research and Innovation in Education Centre of Linguistics of the University of Porto Portugal [email protected] Joana QUERIDO Polytechnic Institute of Porto – School of Education Portugal [email protected] Inês OLIVEIRA Polytechnic Institute of Porto – School of Education Centre of Linguistics of the University of Porto Portugal [email protected] Abstract The lexicon allows the expression of particular cosmovisions, which is why there are a wide range of lexical relationships, involving different linguistic particularities (Coseriu, 1991; Teixeira , 2005). We find, however, in teaching context, that these variations are often replaced by dichotomous and decontextualized proposals of lexical organization, presented, for instance, in textbooks and other supporting materials (Baptista et al., 2017). Thus, our paper is structured in two parts. First, we will try to account for the diversity of lexical relations (Choupina, Costa & Baptista, 2013), considering phonological, morphological, syntactic, semantic, pragmatic- discursive, cognitive and historical criteria (Lehmann & Martin-Berthet, 2008). Secondly, we present an experimental study that aims at verifying if primary school pupils intuitively organize their mental lexicon in a dichotomous way. -
Formal Concept Analysis in Knowledge Processing: a Survey on Applications ⇑ Jonas Poelmans A,C, , Dmitry I
Expert Systems with Applications 40 (2013) 6538–6560 Contents lists available at SciVerse ScienceDirect Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa Review Formal concept analysis in knowledge processing: A survey on applications ⇑ Jonas Poelmans a,c, , Dmitry I. Ignatov c, Sergei O. Kuznetsov c, Guido Dedene a,b a KU Leuven, Faculty of Business and Economics, Naamsestraat 69, 3000 Leuven, Belgium b Universiteit van Amsterdam Business School, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands c National Research University Higher School of Economics, Pokrovsky boulevard, 11, 109028 Moscow, Russia article info abstract Keywords: This is the second part of a large survey paper in which we analyze recent literature on Formal Concept Formal concept analysis (FCA) Analysis (FCA) and some closely related disciplines using FCA. We collected 1072 papers published Knowledge discovery in databases between 2003 and 2011 mentioning terms related to Formal Concept Analysis in the title, abstract and Text mining keywords. We developed a knowledge browsing environment to support our literature analysis process. Applications We use the visualization capabilities of FCA to explore the literature, to discover and conceptually repre- Systematic literature overview sent the main research topics in the FCA community. In this second part, we zoom in on and give an extensive overview of the papers published between 2003 and 2011 which applied FCA-based methods for knowledge discovery and ontology engineering in various application domains. These domains include software mining, web analytics, medicine, biology and chemistry data. Ó 2013 Elsevier Ltd. All rights reserved. 1. Introduction mining, i.e. gaining insight into source code with FCA. -
Concept Maps Mining for Text Summarization
UNIVERSIDADE FEDERAL DO ESPÍRITO SANTO CENTRO TECNOLÓGICO DEPARTAMENTO DE INFORMÁTICA PROGRAMA DE PÓS-GRADUAÇÃO EM INFORMÁTICA Camila Zacché de Aguiar Concept Maps Mining for Text Summarization VITÓRIA-ES, BRAZIL March 2017 Camila Zacché de Aguiar Concept Maps Mining for Text Summarization Dissertação de Mestrado apresentada ao Programa de Pós-Graduação em Informática da Universidade Federal do Espírito Santo, como requisito parcial para obtenção do Grau de Mestre em Informática. Orientador (a): Davidson CurY Co-orientador: Amal Zouaq VITÓRIA-ES, BRAZIL March 2017 Dados Internacionais de Catalogação-na-publicação (CIP) (Biblioteca Setorial Tecnológica, Universidade Federal do Espírito Santo, ES, Brasil) Aguiar, Camila Zacché de, 1987- A282c Concept maps mining for text summarization / Camila Zacché de Aguiar. – 2017. 149 f. : il. Orientador: Davidson Cury. Coorientador: Amal Zouaq. Dissertação (Mestrado em Informática) – Universidade Federal do Espírito Santo, Centro Tecnológico. 1. Informática na educação. 2. Processamento de linguagem natural (Computação). 3. Recuperação da informação. 4. Mapas conceituais. 5. Sumarização de Textos. 6. Mineração de dados (Computação). I. Cury, Davidson. II. Zouaq, Amal. III. Universidade Federal do Espírito Santo. Centro Tecnológico. IV. Título. CDU: 004 “Most of the fundamental ideas of science are essentially simple, and may, as a rule, be expressed in a language comprehensible to everyone.” Albert Einstein 4 Acknowledgments I would like to thank the manY people who have been with me over the years and who have contributed in one way or another to the completion of this master’s degree. Especially my advisor, Davidson Cury, who in a constructivist way disoriented me several times as a stimulus to the search for new answers. -
Empowering Video Storytellers: Concept Discovery and Annotation for Large Audio-Video-Text Archives
Empowering Video Storytellers: Concept Discovery and Annotation for Large Audio-Video-Text Archives A large amount of video content is available and stored by broadcasters, libraries and other enterprises. However, the lack of effective and efficient tools for searching and retrieving has prevented video becoming a major value asset. Much work has been done for retrieving video information from constrained structured domains, such as news and sports, but general solutions for unstructured audio-video-language intensive content are missing. Our goal is to transform the multimedia archive into an organized and searchable asset so that human efforts can focus on a more important aspect - storytelling. This project involves a cross- disciplinary effort towards development of new techniques for organizing, summarizing, and question answering over audio-video-language intensive content, such as raw documentary footage. Such domains offer the full range of real-world situations and events, manifested with interaction of rich multimedia information. The team includes investigators from Schools of Engineering and Applied Science, Journalism, and Arts. It has extensive experience and deep expertise in analysis of multimedia information - video, audio, and language - as well as broad experience with professional documentary film production and interactive media designs. The project includes two major content partners, WITNESS and WNET, both providing unique video archives, well-defined application domains and user communities. In particular, WITNESS collects and manages a documentary video archive from activists all over the world to support the promotion of human rights. This archive has several important qualities: it is diverse (including interviews, documentation of events, and ‘hidden camera’ reporting), it is relatively ‘raw’ (shot by semi-professional operators), it has been shot for specific purposes (e.g. -
Technical Phrase Extraction for Patent Mining: a Multi-Level Approach
2020 IEEE International Conference on Data Mining (ICDM) Technical Phrase Extraction for Patent Mining: A Multi-level Approach Ye Liu, Han Wu, Zhenya Huang, Hao Wang, Jianhui Ma, Qi Liu, Enhong Chen*, Hanqing Tao, Ke Rui Anhui Province Key Laboratory of Big Data Analysis and Application, School of Data Science & School of Computer Science and Technology, University of Science and Technology of China {liuyer, wuhanhan, wanghao3, hqtao, kerui}@mail.ustc.edu.cn, {huangzhy, jianhui, qiliuql, cheneh}@ustc.edu.cn Abstract—Recent years have witnessed a booming increase of TABLE I: Technical Phrases vs. Non-technical Phrases patent applications, which provides an open chance for revealing Domain Technical phrase Non-technical phrase the inner law of innovation, but in the meantime, puts forward Electricity wireless communication, wire and cable, TV sig- higher requirements on patent mining techniques. Considering netcentric computer service nal, power plug Mechanical fluid leak detection, power building materials, steer that patent mining highly relies on patent document analysis, Engineering transmission column, seat back this paper makes a focused study on constructing a technology portrait for each patent, i.e., to recognize technical phrases relevant works have been explored on phrase extraction. Ac- concerned in it, which can summarize and represent patents cording to the extraction target, they can be divided into Key from a technology angle. To this end, we first give a clear Phrase Extraction [8], Named Entity Recognition (NER) [9] and detailed description about technical phrases in patents and Concept Extraction [10]. Key Phrase Extraction aims to based on various prior works and analyses. Then, combining characteristics of technical phrases and multi-level structures extract phrases that provide a concise summary of a document, of patent documents, we develop an Unsupervised Multi-level which prefers those both frequently-occurring and closed to Technical Phrase Extraction (UMTPE) model. -
Lexical Sense Labeling and Sentiment Potential Analysis Using Corpus-Based Dependency Graph
mathematics Article Lexical Sense Labeling and Sentiment Potential Analysis Using Corpus-Based Dependency Graph Tajana Ban Kirigin 1,* , Sanda Bujaˇci´cBabi´c 1 and Benedikt Perak 2 1 Department of Mathematics, University of Rijeka, R. Matejˇci´c2, 51000 Rijeka, Croatia; [email protected] 2 Faculty of Humanities and Social Sciences, University of Rijeka, SveuˇcilišnaAvenija 4, 51000 Rijeka, Croatia; [email protected] * Correspondence: [email protected] Abstract: This paper describes a graph method for labeling word senses and identifying lexical sentiment potential by integrating the corpus-based syntactic-semantic dependency graph layer, lexical semantic and sentiment dictionaries. The method, implemented as ConGraCNet application on different languages and corpora, projects a semantic function onto a particular syntactical de- pendency layer and constructs a seed lexeme graph with collocates of high conceptual similarity. The seed lexeme graph is clustered into subgraphs that reveal the polysemous semantic nature of a lexeme in a corpus. The construction of the WordNet hypernym graph provides a set of synset labels that generalize the senses for each lexical cluster. By integrating sentiment dictionaries, we introduce graph propagation methods for sentiment analysis. Original dictionary sentiment values are integrated into ConGraCNet lexical graph to compute sentiment values of node lexemes and lexical clusters, and identify the sentiment potential of lexemes with respect to a corpus. The method can be used to resolve sparseness of sentiment dictionaries and enrich the sentiment evaluation of Citation: Ban Kirigin, T.; lexical structures in sentiment dictionaries by revealing the relative sentiment potential of polysemous Bujaˇci´cBabi´c,S.; Perak, B. Lexical Sense Labeling and Sentiment lexemes with respect to a specific corpus.