Exploratory Search on Mobile Devices
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
D1.3 Deliverable Title: Action Ontology and Object Categories Type
Deliverable number: D1.3 Deliverable Title: Action ontology and object categories Type (Internal, Restricted, Public): PU Authors: Daiva Vitkute-Adzgauskiene, Jurgita Kapociute, Irena Markievicz, Tomas Krilavicius, Minija Tamosiunaite, Florentin Wörgötter Contributing Partners: UGOE, VMU Project acronym: ACAT Project Type: STREP Project Title: Learning and Execution of Action Categories Contract Number: 600578 Starting Date: 01-03-2013 Ending Date: 30-04-2016 Contractual Date of Delivery to the EC: 31-08-2015 Actual Date of Delivery to the EC: 04-09-2015 Page 1 of 16 Content 1. EXECUTIVE SUMMARY .................................................................................................................... 2 2. INTRODUCTION .............................................................................................................................. 3 3. OVERVIEW OF THE ACAT ONTOLOGY STRUCTURE ............................................................................ 3 4. OBJECT CATEGORIZATION BY THEIR SEMANTIC ROLES IN THE INSTRUCTION SENTENCE .................... 9 5. HIERARCHICAL OBJECT STRUCTURES ............................................................................................. 13 6. MULTI-WORD OBJECT NAME RESOLUTION .................................................................................... 15 7. CONCLUSIONS AND FUTURE WORK ............................................................................................... 16 8. REFERENCES ................................................................................................................................ -
Awareness and Utilization of Search Engines for Information Retrieval by Students of National Open University of Nigeria in Enugu Study Centre Library
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Library Philosophy and Practice (e-journal) Libraries at University of Nebraska-Lincoln 2020 Awareness and Utilization of Search Engines for Information Retrieval by Students of National Open University of Nigeria in Enugu Study Centre Library SUNDAY JUDE ONUH National Open University of Nigeria, [email protected] OGOEGBUNAM LOVETH EKWUEME National Open University of Nigeria, [email protected] Follow this and additional works at: https://digitalcommons.unl.edu/libphilprac Part of the Library and Information Science Commons ONUH, SUNDAY JUDE and EKWUEME, OGOEGBUNAM LOVETH, "Awareness and Utilization of Search Engines for Information Retrieval by Students of National Open University of Nigeria in Enugu Study Centre Library" (2020). Library Philosophy and Practice (e-journal). 4650. https://digitalcommons.unl.edu/libphilprac/4650 Awareness and Utilization of Search Engines for Information Retrieval by Students of National Open University of Nigeria in Enugu Study Centre Library By Jude Sunday Onuh Enugu Study Centre Library National Open University of Nigeria [email protected] & Loveth Ogoegbunam Ekwueme Department of Library and Information Science National Open University of Nigeria [email protected] Abstract This study dwelt on awareness and utilization of search engines for information retrieval by students of National Open University of Nigeria (NOUN) Enugu Study centre. Descriptive survey research was adopted for the study. Two research questions were drawn from the two purposes that guided the study. The population consists of 5855 undergraduate students of NOUN Enugu Study Centre. A sample size of 293 students was used as 5% of the entire population. -
Gender and Information Literacy: Evaluation of Gender Differences in a Student Survey of Information Sources
Gender and Information Literacy: Evaluation of Gender Differences in a Student Survey of Information Sources Arthur Taylor and Heather A. Dalal Information literacy studies have shown that college students use a va- riety of information sources to perform research and commonly choose Internet sources over traditional library sources. Studies have also shown that students do not appear to understand the information quality issues concerning Internet information sources and may lack the information literacy skills to make good choices concerning the use of these sources. No studies currently provide clear guidance on how gender might influ- ence the information literacy skills of students. Such guidance could help improve information literacy instruction. This study used a survey of college-aged students to evaluate a subset of student information literacy skills in relation to Internet information sources. Analysis of the data collected provided strong indications of gender differences in information literacy skills. Female respondents ap- peared to be more discerning than males in evaluating Internet sources. Males appeared to be more confident in the credibility and accuracy of the results returned by search engines. Evaluation of other survey responses strengthened our finding of gender differentiation in informa- tion literacy skills. ollege students today have come of age surrounded by a sea of information delivered from an array of sources available at their fingertips at any time of the day or night. Studies have shown that the most common source of information for college-aged students is the Internet. Information gathered in this environment is most likely found using a commercial search engine that returns sources of dubious quality using an unknown algorithm. -
Final Study Report on CEF Automated Translation Value Proposition in the Context of the European LT Market/Ecosystem
Final study report on CEF Automated Translation value proposition in the context of the European LT market/ecosystem FINAL REPORT A study prepared for the European Commission DG Communications Networks, Content & Technology by: Digital Single Market CEF AT value proposition in the context of the European LT market/ecosystem Final Study Report This study was carried out for the European Commission by Luc MEERTENS 2 Khalid CHOUKRI Stefania AGUZZI Andrejs VASILJEVS Internal identification Contract number: 2017/S 108-216374 SMART number: 2016/0103 DISCLAIMER By the European Commission, Directorate-General of Communications Networks, Content & Technology. The information and views set out in this publication are those of the author(s) and do not necessarily reflect the official opinion of the Commission. The Commission does not guarantee the accuracy of the data included in this study. Neither the Commission nor any person acting on the Commission’s behalf may be held responsible for the use which may be made of the information contained therein. ISBN 978-92-76-00783-8 doi: 10.2759/142151 © European Union, 2019. All rights reserved. Certain parts are licensed under conditions to the EU. Reproduction is authorised provided the source is acknowledged. 2 CEF AT value proposition in the context of the European LT market/ecosystem Final Study Report CONTENTS Table of figures ................................................................................................................................................ 7 List of tables .................................................................................................................................................. -
Collocation Classification with Unsupervised Relation Vectors
Collocation Classification with Unsupervised Relation Vectors Luis Espinosa-Anke1, Leo Wanner2,3, and Steven Schockaert1 1School of Computer Science, Cardiff University, United Kingdom 2ICREA and 3NLP Group, Universitat Pompeu Fabra, Barcelona, Spain fespinosa-ankel,schockaerts1g@cardiff.ac.uk [email protected] Abstract Morphosyntactic relations have been the focus of work on unsupervised relational similarity, as Lexical relation classification is the task of it has been shown that verb conjugation or nomi- predicting whether a certain relation holds be- nalization patterns are relatively well preserved in tween a given pair of words. In this pa- vector spaces (Mikolov et al., 2013; Pennington per, we explore to which extent the current et al., 2014a). Semantic relations pose a greater distributional landscape based on word em- beddings provides a suitable basis for classi- challenge (Vylomova et al., 2016), however. In fication of collocations, i.e., pairs of words fact, as of today, it is unclear which operation per- between which idiosyncratic lexical relations forms best (and why) for the recognition of indi- hold. First, we introduce a novel dataset with vidual lexico-semantic relations (e.g., hyperonymy collocations categorized according to lexical or meronymy, as opposed to cause, location or ac- functions. Second, we conduct experiments tion). Still, a number of works address this chal- on a subset of this benchmark, comparing it in lenge. For instance, hypernymy has been modeled particular to the well known DiffVec dataset. In these experiments, in addition to simple using vector concatenation (Baroni et al., 2012), word vector arithmetic operations, we also in- vector difference and component-wise squared vestigate the role of unsupervised relation vec- difference (Roller et al., 2014) as input to linear tors as a complementary input. -
Comparative Evaluation of Collocation Extraction Metrics
Comparative Evaluation of Collocation Extraction Metrics Aristomenis Thanopoulos, Nikos Fakotakis, George Kokkinakis Wire Communications Laboratory Electrical & Computer Engineering Dept., University of Patras 265 00 Rion, Patras, Greece {aristom,fakotaki,gkokkin}@wcl.ee.upatras.gr Abstract Corpus-based automatic extraction of collocations is typically carried out employing some statistic indicating concurrency in order to identify words that co-occur more often than expected by chance. In this paper we are concerned with some typical measures such as the t-score, Pearson’s χ-square test, log-likelihood ratio, pointwise mutual information and a novel information theoretic measure, namely mutual dependency. Apart from some theoretical discussion about their correlation, we perform comparative evaluation experiments judging performance by their ability to identify lexically associated bigrams. We use two different gold standards: WordNet and lists of named-entities. Besides discovering that a frequency-biased version of mutual dependency performs the best, followed close by likelihood ratio, we point out some implications that usage of available electronic dictionaries such as the WordNet for evaluation of collocation extraction encompasses. dependence, several metrics have been adopted by the 1. Introduction corpus linguistics community. Typical statistics are Collocational information is important not only for the t-score (TSC), Pearson’s χ-square test (χ2), log- second language learning but also for many natural likelihood ratio (LLR) and pointwise mutual language processing tasks. Specifically, in natural information (PMI). However, usually no systematic language generation and machine translation it is comparative evaluation is accomplished. For example, necessary to ensure generation of lexically correct Kita et al. (1993) accomplish partial and intuitive expressions; for example, “strong”, unlike “powerful”, comparisons between metrics, while Smadja (1993) modifies “coffee” but not “computers”. -
Collocation Extraction Using Parallel Corpus
Collocation Extraction Using Parallel Corpus Kavosh Asadi Atui1 Heshaam Faili1 Kaveh Assadi Atuie2 (1)NLP Laboratory, Electrical and Computer Engineering Dept., University of Tehran, Iran (2)Electrical Engineering Dept., Sharif University of Technology, Iran [email protected],[email protected],[email protected] ABSTRACT This paper presents a novel method to extract the collocations of the Persian language using a parallel corpus. The method is applicable having a parallel corpus between a target language and any other high-resource one. Without the need for an accurate parser for the target side, it aims to parse the sentences to capture long distance collocations and to generate more precise results. A training data built by bootstrapping is also used to rank the candidates with a log-linear model. The method improves the precision and recall of collocation extraction by 5 and 3 percent respectively in comparison with the window-based statistical method in terms of being a Persian multi-word expression. KEYWORDS: Information and Content Extraction, Parallel Corpus, Under-resourced Languages Proceedings of COLING 2012: Posters, pages 93–102, COLING 2012, Mumbai, December 2012. 93 1 Introduction Collocation is usually interpreted as the occurrence of two or more words within a short space in a text (Sinclair, 1987). This definition however is not precise, because it is not possible to define a short space. It also implies the strategy that all traditional models had. They were looking for co- occurrences rather than collocations (Seretan, 2011). Consider the following sentence and its Persian translation1: "Lecturer issued a major and also impossible to solve problem." مدرس یک مشکل بزرگ و غیر قابل حل را عنوان کرد. -
Collocation Classification with Unsupervised Relation Vectors
Collocation Classification with Unsupervised Relation Vectors Luis Espinosa-Anke1, Leo Wanner2,3, and Steven Schockaert1 1School of Computer Science, Cardiff University, United Kingdom 2ICREA and 3NLP Group, Universitat Pompeu Fabra, Barcelona, Spain fespinosa-ankel,schockaerts1g@cardiff.ac.uk [email protected] Abstract Morphosyntactic relations have been the focus of work on unsupervised relational similarity, as Lexical relation classification is the task of it has been shown that verb conjugation or nomi- predicting whether a certain relation holds be- nalization patterns are relatively well preserved in tween a given pair of words. In this pa- vector spaces (Mikolov et al., 2013; Pennington per, we explore to which extent the current et al., 2014a). Semantic relations pose a greater distributional landscape based on word em- beddings provides a suitable basis for classi- challenge (Vylomova et al., 2016), however. In fication of collocations, i.e., pairs of words fact, as of today, it is unclear which operation per- between which idiosyncratic lexical relations forms best (and why) for the recognition of indi- hold. First, we introduce a novel dataset with vidual lexico-semantic relations (e.g., hyperonymy collocations categorized according to lexical or meronymy, as opposed to cause, location or ac- functions. Second, we conduct experiments tion). Still, a number of works address this chal- on a subset of this benchmark, comparing it in lenge. For instance, hypernymy has been modeled particular to the well known DiffVec dataset. In these experiments, in addition to simple using vector concatenation (Baroni et al., 2012), word vector arithmetic operations, we also in- vector difference and component-wise squared vestigate the role of unsupervised relation vec- difference (Roller et al., 2014) as input to linear tors as a complementary input. -
Dish2vec: a Comparison of Word Embedding Methods in an Unsupervised Setting
dish2vec: A Comparison of Word Embedding Methods in an Unsupervised Setting Guus Verstegen Student Number: 481224 11th June 2019 Erasmus University Rotterdam Erasmus School of Economics Supervisor: Dr. M. van de Velden Second Assessor: Dr. F. Frasincar Master Thesis Econometrics and Management Science Business Analytics and Quantitative Marketing Abstract While the popularity of continuous word embeddings has increased over the past years, detailed derivations and extensive explanations of these methods lack in the academic literature. This paper elaborates on three popular word embedding meth- ods; GloVe and two versions of word2vec: continuous skip-gram and continuous bag-of-words. This research aims to enhance our understanding of both the founda- tion and extensions of these methods. In addition, this research addresses instability of the methods with respect to the hyperparameters. An example in a food menu context is used to illustrate the application of these word embedding techniques in quantitative marketing. For this specific case, the GloVe method performs best according to external expert evaluation. Nevertheless, the fact that GloVe performs best is not generalizable to other domains or data sets due to the instability of the models and the evaluation procedure. Keywords: word embedding; word2vec; GloVe; quantitative marketing; food Contents 1 Introduction 3 2 Literature review 4 2.1 Discrete distributional representation of text . .5 2.2 Continuous distributed representation of text . .5 2.3 word2vec and GloVe ..............................7 2.4 Food menu related literature . .9 3 Methodology: the base implementation 9 3.1 Quantified word similarity: cosine similarity . 12 3.2 Model optimization: stochastic gradient descent . 12 3.3 word2vec ................................... -
Evaluating Language Models for the Retrieval and Categorization of Lexical Collocations
Evaluating language models for the retrieval and categorization of lexical collocations Luis Espinosa-Anke1, Joan Codina-Filba`2, Leo Wanner3;2 1School of Computer Science and Informatics, Cardiff University, UK 2TALN Research Group, Pompeu Fabra University, Barcelona, Spain 3Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain [email protected], [email protected] Abstract linguistic phenomena (Rogers et al., 2020). Re- cently, a great deal of research analyzed the degree Lexical collocations are idiosyncratic com- to which they encode, e.g., morphological (Edmis- binations of two syntactically bound lexical ton, 2020), syntactic (Hewitt and Manning, 2019), items (e.g., “heavy rain”, “take a step” or or lexico-semantic structures (Joshi et al., 2020). “undergo surgery”). Understanding their de- However, less work explored so far how LMs in- gree of compositionality and idiosyncrasy, as well their underlying semantics, is crucial for terpret phraseological units at various degrees of language learners, lexicographers and down- compositionality. This is crucial for understanding stream NLP applications alike. In this paper the suitability of different text representations (e.g., we analyse a suite of language models for col- static vs contextualized word embeddings) for en- location understanding. We first construct a coding different types of multiword expressions dataset of apparitions of lexical collocations (Shwartz and Dagan, 2019), which, in turn, can be in context, categorized into 16 representative useful for extracting latent world or commonsense semantic categories. Then, we perform two experiments: (1) unsupervised collocate re- information (Zellers et al., 2018). trieval, and (2) supervised collocation classi- One central type of phraselogical units are fication in context. -
Information Rereival, Part 1
11/4/2019 Information Retrieval Deepak Kumar Information Retrieval Searching within a document collection for a particular needed information. 1 11/4/2019 Query Search Engines… Altavista Entireweb Leapfish Spezify Ask Excite Lycos Stinky Teddy Baidu Faroo Maktoob Stumpdedia Bing Info.com Miner.hu Swisscows Blekko Fireball Monster Crawler Teoma ChaCha Gigablast Naver Walla Dogpile Google Omgili WebCrawler Daum Go Rediff Yahoo! Dmoz Goo Scrub The Web Yandex Du Hakia Seznam Yippy Egerin HotBot Sogou Youdao ckDuckGo Soso 2 11/4/2019 Search Engine Marketshare 2019 3 11/4/2019 Search Engine Marketshare 2017 Matching & Ranking matched pages ranked pages 1. 2. query 3. muddy waters matching ranking “hits” 4 11/4/2019 Index Inverted Index • A mapping from content (words) to location. • Example: the cat sat on the dog stood on the cat stood 1 2 3 the mat the mat while a dog sat 5 11/4/2019 Inverted Index the cat sat on the dog stood on the cat stood 1 2 3 the mat the mat while a dog sat a 3 cat 1 3 dog 2 3 mat 1 2 on 1 2 sat 1 3 stood 2 3 the 1 2 3 while 3 Inverted Index the cat sat on the dog stood on the cat stood 1 2 3 the mat the mat while a dog sat a 3 cat 1 3 dog 2 3 mat 1 2 Every word in every on 1 2 web page is indexed! sat 1 3 stood 2 3 the 1 2 3 while 3 6 11/4/2019 Searching the cat sat on the dog stood on the cat stood 1 2 3 the mat the mat while a dog sat a 3 cat 1 3 query dog 2 3 mat 1 2 cat on 1 2 sat 1 3 stood 2 3 the 1 2 3 while 3 Searching the cat sat on the dog stood on the cat stood 1 2 3 the mat the mat while a dog sat a 3 cat -
Collocation Segmentation for Text Chunking
VYTAUTAS MAGNUS UNIVERSITY Vidas DAUDARAVIČIUS COLLOCATION SEGMENTATION FOR TEXT CHUNKING Doctoral dissertation Physical Sciences, Informatics (09P) Kaunas, 2012 UDK 004 Da-371 This doctoral dissertation was written at Vytautas Magnus University in 2008–2012. Research supervisor: doc. dr. Minija Tamoši unait¯ e˙ Vytautas Magnus University, Physical Sciences, Informatics – 09 P ISBN 978-9955-12-844-1 VYTAUTO DIDŽIOJO UNIVERSITETAS Vidas DAUDARAVIČIUS TEKSTO SKAIDYMAS PASTOVIU˛JU˛JUNGINIU˛SEGMENTAIS Daktaro disertacija Fiziniai mokslai, Informatika (09P) Kaunas, 2012 Disertacija rengta 2008–2012 metais Vytauto Didžiojo universitete Mokslinis vadovas: doc. dr. Minija Tamoši unait¯ e˙ Vytauto Didžiojo universitetas, fiziniai mokslai, informatika – 09 P In the beginning was the Word (Jon 1:1) Summary Segmentation is a widely used paradigm in text processing. There are many methods of segmentation for text processing, including: topic segmentation, sentence segmentation, morpheme segmentation, phoneme segmentation, and Chinese text segmentation. Rule-based, statistical and hybrid methods are employed to perform the segmentation. This dissertation introduces a new type of segmentation– collocation segmentation–and a new method to perform it, and applies them to three different text processing tasks: – Lexicography. Collocation segmentation makes possible the use of large corpora to evaluate the usage and importance of terminology over time. It highlights important methodological changes in the history of research and allows actual research trends throughout history to be rediscovered. Such an analysis is applied to the ACL Anthology Reference Corpus of scientific papers published during the last 50 years in this research area. – Text categorization. Text categorization results can be improved using collocation segmen- tation. The study shows that collocation segmentation, without any other language resources, achieves better results than the widely used n-gram techniques together with POS (Part-of- Speech) processing tools.