Geographical Localization of Web Domains and Organization Addresses Recognition by Employing Natural Language Processing, Pattern Matching and Clustering
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Text and Data Mining: Technologies Under Construction
Text and Data Mining: Technologies Under Construction WHO’S INSIDE Accenture Elsevier Science Europe American Institute of Figshare SciTech Strategies Biological Sciences General Electric SPARC Battelle IBM Sparrho Bristol-Myers Squibb Komatsu Spotfire Clinerion Linguamatics Limited Talix Columbia Pipeline Group MedAware UnitedHealth Group Copyright Clearance Mercedes-Benz Verisk Center, Inc. Meta VisTrails CrossRef Novartis Wellcome Trust Docear OMICtools Copyright Clearance Center, Inc. (CCC) has licensed this report from Outsell, Inc., with the right to distribute it for marketing and market education purposes. CCC did not commission this report, nor is this report a fee-for-hire white paper. Outsell’s fact-based research, analysis and rankings, and all aspects of our opinion were independently derived and CCC had no influence or involvement on the design or findings within this report. For questions, please contact Outsell at [email protected]. Market Advancing the Business of Information Performance January 22, 2016 Table of Contents Why This Topic 3 Methodology 3 How It Works 3 Applications 7 Scientific Research 7 Healthcare 10 Engineering 13 Challenges 15 Implications 17 10 to Watch 18 Essential Actions 21 Imperatives for Information Managers 22 Related Research 24 Table of Contents for Figures & Tables Table 1. Providers of TDM-Related Functions 5 © 2016 Outsell, Inc. 2 Why This Topic Text and data mining (TDM), also referred to as content mining, is a major focus for academia, governments, healthcare, and industry as a way to unleash the potential for previously undiscovered connections among people, places, things, and, for the purpose of this report, scientific, technical, and healthcare information. Although there have been continuous advances in methodologies and technologies, the power of TDM has not yet been fully realized, and technology struggles to keep up with the ever-increasing flow of content. -
Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured Text
remote sensing Article Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured Text Edwin Aldana-Bobadilla 1,∗ , Alejandro Molina-Villegas 2 , Ivan Lopez-Arevalo 3 , Shanel Reyes-Palacios 3, Victor Muñiz-Sanchez 4 and Jean Arreola-Trapala 4 1 Conacyt-Centro de Investigación y de Estudios Avanzados del I.P.N. (Cinvestav), Victoria 87130, Mexico; [email protected] 2 Conacyt-Centro de Investigación en Ciencias de Información Geoespacial (Centrogeo), Mérida 97302, Mexico; [email protected] 3 Centro de Investigación y de Estudios Avanzados del I.P.N. Unidad Tamaulipas (Cinvestav Tamaulipas), Victoria 87130, Mexico; [email protected] (I.L.); [email protected] (S.R.) 4 Centro de Investigación en Matemáticas (Cimat), Monterrey 66628, Mexico; [email protected] (V.M.); [email protected] (J.A.) * Correspondence: [email protected] Received: 10 July 2020; Accepted: 13 September 2020; Published: 17 September 2020 Abstract: The automatic extraction of geospatial information is an important aspect of data mining. Computer systems capable of discovering geographic information from natural language involve a complex process called geoparsing, which includes two important tasks: geographic entity recognition and toponym resolution. The first task could be approached through a machine learning approach, in which case a model is trained to recognize a sequence of characters (words) corresponding to geographic entities. The second task consists of assigning such entities to their most likely coordinates. Frequently, the latter process involves solving referential ambiguities. In this paper, we propose an extensible geoparsing approach including geographic entity recognition based on a neural network model and disambiguation based on what we have called dynamic context disambiguation. -
WHU Otto Beisheim School of Management Handbook
International Student Handbook Welcome to WHU Message from the Dean Welcome to WHU – Otto Beisheim School of Management. WHU is a private, state-accredited business school of university rank located in the center of Germany – in Vallendar, just outside of Koblenz. The school has frequently been ranked top in international rankings, such as the renowned Financial Times rankings and the Wall Street Jour- nal ranking and is accredited by AACSB, EQUIS and FIBAA. The school’s main mission is to provide first-class education in management at all levels. We are known for the international orientation of our programs, our international joint ventures we have had for a number of years and our extensive student exchange programs. Our students have access to a network of partner universities which has grown to more than 180 first-class institutions world- wide. Student exchanges are a very important element in our programs, because studying abroad deepens a student’s knowledge of other cultures and increases both flexibility and mobility. Just as WHU students spend considerable time abroad (one term at the undergraduate level and another term at the graduate level), we are happy to welcome many students from our partner in- stitutions at our school and to offer them a unique learning environment. Currently, about 25% of the students on our campus come from abroad and we are proud that their number is growing con- tinuously. We appreciate very much the contributions of our exchange students both in the class- room and outside, because they enrich the discussion and give our campus life an international flair. -
Knowledge-Driven Geospatial Location Resolution for Phylogeographic
Bioinformatics, 31, 2015, i348–i356 doi: 10.1093/bioinformatics/btv259 ISMB/ECCB 2015 Knowledge-driven geospatial location resolution for phylogeographic models of virus migration Davy Weissenbacher1,*, Tasnia Tahsin1, Rachel Beard1,2, Mari Figaro1,2, Robert Rivera1, Matthew Scotch1,2 and Graciela Gonzalez1 1Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and 2Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA *To whom correspondence should be addressed. Abstract Summary: Diseases caused by zoonotic viruses (viruses transmittable between humans and ani- mals) are a major threat to public health throughout the world. By studying virus migration and mutation patterns, the field of phylogeography provides a valuable tool for improving their surveil- lance. A key component in phylogeographic analysis of zoonotic viruses involves identifying the specific locations of relevant viral sequences. This is usually accomplished by querying public data- bases such as GenBank and examining the geospatial metadata in the record. When sufficient detail is not available, a logical next step is for the researcher to conduct a manual survey of the corresponding published articles. Motivation: In this article, we present a system for detection and disambiguation of locations (topo- nym resolution) in full-text articles to automate the retrieval of sufficient metadata. Our system has been tested on a manually annotated corpus of journal articles related to phylogeography using inte- grated heuristics for location disambiguation including a distance heuristic, a population heuristic and a novel heuristic utilizing knowledge obtained from GenBank metadata (i.e. a ‘metadata heuristic’). Results: For detecting and disambiguating locations, our system performed best using the meta- data heuristic (0.54 Precision, 0.89 Recall and 0.68 F-score). -
Natural Language Processing
Chowdhury, G. (2003) Natural language processing. Annual Review of Information Science and Technology, 37. pp. 51-89. ISSN 0066-4200 http://eprints.cdlr.strath.ac.uk/2611/ This is an author-produced version of a paper published in The Annual Review of Information Science and Technology ISSN 0066-4200 . This version has been peer-reviewed, but does not include the final publisher proof corrections, published layout, or pagination. Strathprints is designed to allow users to access the research output of the University of Strathclyde. Copyright © and Moral Rights for the papers on this site are retained by the individual authors and/or other copyright owners. Users may download and/or print one copy of any article(s) in Strathprints to facilitate their private study or for non-commercial research. You may not engage in further distribution of the material or use it for any profitmaking activities or any commercial gain. You may freely distribute the url (http://eprints.cdlr.strath.ac.uk) of the Strathprints website. Any correspondence concerning this service should be sent to The Strathprints Administrator: [email protected] Natural Language Processing Gobinda G. Chowdhury Dept. of Computer and Information Sciences University of Strathclyde, Glasgow G1 1XH, UK e-mail: [email protected] Introduction Natural Language Processing (NLP) is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. NLP researchers aim to gather knowledge on how human beings understand and use language so that appropriate tools and techniques can be developed to make computer systems understand and manipulate natural languages to perform the desired tasks. -
1 Application of Text Mining to Biomedical Knowledge Extraction: Analyzing Clinical Narratives and Medical Literature
Amy Neustein, S. Sagar Imambi, Mário Rodrigues, António Teixeira and Liliana Ferreira 1 Application of text mining to biomedical knowledge extraction: analyzing clinical narratives and medical literature Abstract: One of the tools that can aid researchers and clinicians in coping with the surfeit of biomedical information is text mining. In this chapter, we explore how text mining is used to perform biomedical knowledge extraction. By describing its main phases, we show how text mining can be used to obtain relevant information from vast online databases of health science literature and patients’ electronic health records. In so doing, we describe the workings of the four phases of biomedical knowledge extraction using text mining (text gathering, text preprocessing, text analysis, and presentation) entailed in retrieval of the sought information with a high accuracy rate. The chapter also includes an in depth analysis of the differences between clinical text found in electronic health records and biomedical text found in online journals, books, and conference papers, as well as a presentation of various text mining tools that have been developed in both university and commercial settings. 1.1 Introduction The corpus of biomedical information is growing very rapidly. New and useful results appear every day in research publications, from journal articles to book chapters to workshop and conference proceedings. Many of these publications are available online through journal citation databases such as Medline – a subset of the PubMed interface that enables access to Medline publications – which is among the largest and most well-known online databases for indexing profes- sional literature. Such databases and their associated search engines contain important research work in the biological and medical domain, including recent findings pertaining to diseases, symptoms, and medications. -
TAGGS: Grouping Tweets to Improve Global Geotagging for Disaster Response Jens De Bruijn1, Hans De Moel1, Brenden Jongman1,2, Jurjen Wagemaker3, Jeroen C.J.H
Nat. Hazards Earth Syst. Sci. Discuss., https://doi.org/10.5194/nhess-2017-203 Manuscript under review for journal Nat. Hazards Earth Syst. Sci. Discussion started: 13 June 2017 c Author(s) 2017. CC BY 3.0 License. TAGGS: Grouping Tweets to Improve Global Geotagging for Disaster Response Jens de Bruijn1, Hans de Moel1, Brenden Jongman1,2, Jurjen Wagemaker3, Jeroen C.J.H. Aerts1 1Institute for Environmental Studies, VU University, Amsterdam, 1081HV, The Netherlands 5 2Global Facility for Disaster Reduction and Recovery, World Bank Group, Washington D.C., 20433, USA 3FloodTags, The Hague, 2511 BE, The Netherlands Correspondence to: Jens de Bruijn ([email protected]) Abstract. The availability of timely and accurate information about ongoing events is important for relief organizations seeking to effectively respond to disasters. Recently, social media platforms, and in particular Twitter, have gained traction as 10 a novel source of information on disaster events. Unfortunately, geographical information is rarely attached to tweets, which hinders the use of Twitter for geographical applications. As a solution, analyses of a tweet’s text, combined with an evaluation of its metadata, can help to increase the number of geo-located tweets. This paper describes a new algorithm (TAGGS), that georeferences tweets by using the spatial information of groups of tweets mentioning the same location. This technique results in a roughly twofold increase in the number of geo-located tweets as compared to existing methods. We applied this approach 15 to 35.1 million flood-related tweets in 12 languages, collected over 2.5 years. In the dataset, we found 11.6 million tweets mentioning one or more flood locations, which can be towns (6.9 million), provinces (3.3 million), or countries (2.2 million). -
Structured Toponym Resolution Using Combined Hierarchical Place Categories∗
In GIR’13: Proceedings of the 7th ACM SIGSPATIAL Workshop on Geographic Information Retrieval, Orlando, FL, November 2013 (GIR’13 Best Paper Award). Structured Toponym Resolution Using Combined Hierarchical Place Categories∗ Marco D. Adelfio Hanan Samet Center for Automation Research, Institute for Advanced Computer Studies Department of Computer Science, University of Maryland College Park, MD 20742 USA {marco, hjs}@cs.umd.edu ABSTRACT list or table is geotagged with the locations it references. To- ponym resolution and geotagging are common topics in cur- Determining geographic interpretations for place names, or rent research but the results can be inaccurate when place toponyms, involves resolving multiple types of ambiguity. names are not well-specified (that is, when place names are Place names commonly occur within lists and data tables, not followed by a geographic container, such as a country, whose authors frequently omit qualifications (such as city state, or province name). Our aim is to utilize the context of or state containers) for place names because they expect other places named within the table or list to disambiguate the meaning of individual place names to be obvious from place names that have multiple geographic interpretations. context. We present a novel technique for place name dis- ambiguation (also known as toponym resolution) that uses Geographic references are a very common component of Bayesian inference to assign categories to lists or tables data tables that can occur in both (1) the case where the containing place names, and then interprets individual to- primary entities of a table are geographic or (2) the case ponyms based on the most likely category assignments. -
Case No COMP/M.4942 - NOKIA / NAVTEQ
EN This text is made available for information purposes only. A summary of this decision is published in all Community languages in the Official Journal of the European Union. Case No COMP/M.4942 - NOKIA / NAVTEQ Only the English text is authentic. REGULATION (EC) No 139/2004 MERGER PROCEDURE Article 8 (1) Date: 02/VII/2008 COMMISSION OF THE EUROPEAN COMMUNITIES Brussels, 02/VII/2008 C (2008) 3328 PUBLIC VERSION COMMISSION DECISION of 02/VII/2008 declaring a concentration to be compatible with the common market and the EEA Agreement (Case No COMP/M.4942 - NOKIA/ NAVTEQ) COMMISSION DECISION of 02/VII/2008 declaring a concentration to be compatible with the common market and the EEA Agreement (Case No COMP/M.4942 - NOKIA/ NAVTEQ) (Only the English text is authentic) (Text with EEA relevance) THE COMMISSION OF THE EUROPEAN COMMUNITIES, Having regard to the Treaty establishing the European Community, Having regard to the Agreement on the European Economic Area, and in particular Article 57 thereof, Having regard to Council Regulation (EC) No 139/2004 of 20 January 2004 on the control of concentrations between undertakings1, and in particular Article 8(1) thereof, Having regard to the Commission's decision of 28 March 2008 to initiate proceedings in this case, After consulting the Advisory Committee on Concentrations, Having regard to the final report of the Hearing Officer in this case, Whereas: I. INTRODUCTION (1) On 19 February 2008, the Commission received a notification of a proposed concentration pursuant to Article 4 and following a referral pursuant to Article 4(5) of Council Regulation (EC) No 139/2004 ("the Merger Regulation") by which the undertaking Nokia Inc. -
Toponym Resolution in Scientific Papers
SemEval-2019 Task 12: Toponym Resolution in Scientific Papers Davy Weissenbachery, Arjun Maggez, Karen O’Connory, Matthew Scotchz, Graciela Gonzalez-Hernandezy yDBEI, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA zBiodesign Center for Environmental Health Engineering, Arizona State University, Tempe, AZ 85281, USA yfdweissen, karoc, [email protected] zfamaggera, [email protected] Abstract Disambiguation of toponyms is a more recent task (Leidner, 2007). We present the SemEval-2019 Task 12 which With the growth of the internet, the public adop- focuses on toponym resolution in scientific ar- ticles. Given an article from PubMed, the tion of smartphones equipped with Geographic In- task consists of detecting mentions of names formation Systems and the collaborative devel- of places, or toponyms, and mapping the opment of comprehensive maps and geographi- mentions to their corresponding entries in cal databases, toponym resolution has seen an im- GeoNames.org, a database of geospatial loca- portant gain of interest in the last two decades. tions. We proposed three subtasks. In Sub- Not only academic but also commercial and open task 1, we asked participants to detect all to- source toponym resolvers are now available. How- ponyms in an article. In Subtask 2, given to- ponym mentions as input, we asked partici- ever, their performance varies greatly when ap- pants to disambiguate them by linking them plied on corpora of different genres and domains to entries in GeoNames. In Subtask 3, we (Gritta et al., 2018). Toponym disambiguation asked participants to perform both the de- tackles ambiguities between different toponyms, tection and the disambiguation steps for all like Manchester, NH, USA vs. -
A Survey on Geocoding: Algorithms and Datasets for Toponym Resolution
A Survey on Geocoding: Algorithms and Datasets for Toponym Resolution Anonymous ACL submission Abstract survey and critical evaluation of the currently avail- 040 able datasets, evaluation metrics, and geocoding 041 001 Geocoding, the task of converting unstructured algorithms. Our contributions are: 042 002 text to structured spatial data, has recently seen 003 progress thanks to a variety of new datasets, • the first survey to review deep learning ap- 043 004 evaluation metrics, and machine-learning algo- proaches to geocoding 044 005 rithms. We provide a comprehensive survey • comprehensive coverage of geocoding sys- 045 006 to review, organize and analyze recent work tems, which increased by 50% in the last 4 046 007 on geocoding (also known as toponym resolu- 008 tion) where the text is matched to geospatial years 047 009 coordinates and/or ontologies. We summarize • comprehensive coverage of annotated geocod- 048 010 the findings of this research and suggest some ing datasets, which increased by 100% in the 049 011 promising directions for future work. last 4 years 050 012 1 Introduction 2 Background 051 013 Geocoding, also called toponym resolution or to- An early work on geocoding, Amitay et al.(2004), 052 014 ponym disambiguation, is the subtask of geopars- identifies two important types of ambiguity: A 053 015 ing that disambiguates place names in text. The place name may also have a non-geographic mean- 054 016 goal of geocoding is, given a textual mention of a ing, such as Turkey the country vs. turkey the ani- 055 017 location, to choose the corresponding geospatial co- mal, and two places may have the same name, such 056 018 ordinates, geospatial polygon, or entry in a geospa- as the San Jose in California and the San Jose in 057 019 tial database. -
Vegetation of Andean Wetlands (Bofedales) in Huascarán National Park, Peru
Vegetation of Andean wetlands (bofedales) in Huascarán National Park, Peru M.H. Polk1, K.R. Young1, A. Cano2 and B. León1,2 1Department of Geography & the Environment, The University of Texas at Austin, USA 2Museo de Historia Natural, Universidad Nacional Mayor de San Marcos, Lima, Perú _______________________________________________________________________________________ SUMMARY Hybrid terrestrial-aquatic ecosystems in the Andes, commonly known as bofedales, consist of both peatlands and wet meadows and line valley floors at elevations > 3800 m. Compared with similar ecosystems at lower altitudes and higher latitudes, the ecosystem processes and spatial patterns of bofedales are only just beginning to be understood. The research presented here provides the first exploratory and descriptive analysis of the biodiversity and place-to-place variation of vegetation in bofedales in three valleys inside Peru’s Huascarán National Park. Through vegetation surveys, we recorded 112 plant species in 29 families. Over a short geographical distance, a valley-to-valley comparison showed high dissimilarity in terms of species composition. Based on dominant life form and species composition, vegetation in bofedales can be grouped into five assemblages. Our preliminary analysis suggests that several abiotic factors could influence the floristic composition of bofedales: elevation, bulk density, percent organic matter, and cation exchange capacity. The findings of high valley-to-valley variation in species, soil and elevation influences may be useful to land managers of high mountain landscapes that are undergoing transformation related to glacier recession. While our findings advance research on tropical Andean bofedales, they also highlight the need for additional comprehensive investigations to fill gaps in knowledge about the tropical mountains of Latin America.