Extracting Actionable Information from Microtexts

Total Page:16

File Type:pdf, Size:1020Kb

Extracting Actionable Information from Microtexts Extracting actionable information from microtexts Ali Hurriyeto¨ gluˇ arXiv:2008.00343v1 [cs.CL] 1 Aug 2020 i The research was supported by the Dutch national research program COMMIT/. SIKS Dissertation Series No. 2019-17. The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Research School for Infor- mation and Knowledge Systems. An electronic version of this dissertation is available at http://repository.ru.nl ISBN: 978-94-028-1540-5 Copyright c 2019 by Ali Hurriyeto¨ glu,ˇ Nijmegen, the Netherlands Cover design by Burcu HÃijrriyetoħlu c Published under the terms of the Creative Commons Attribution License, CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, pro- vided the original author and source are credited. Acknowledgements The work presented in this book would be much harder without support from my fam- ily, friends, and colleagues. First and foremost, I thank my supervisors prof. dr. Antal van den Bosch and Dr. Nelleke Oostdijk, for their invaluable support, guidance, enthusiasm, and optimism. Being part of LAnguage MAchines (LAMA) group at Radboud University was a great experience. I met a lot of wonderful and interesting people. The round table meeting, hutje-op-de-hei, ATILA, the IRC channel, and lunches were fruitful sources of motiva- tion and inspiration. I am grateful for this productive and relaxed work environment. Erkan, Florian, Iris, Kelly, Maarten, Martin, Wessel, Alessandro, Maria, and Ko were the key players of this pleasant atmosphere. I thank all my co-authors and collaborators, who were in addition to my advisors and LAMA people, Piet Daas & Marco Puts from Statistics Netherlands (CBS), Jurjen Wage- maker & Ron Boortman from Floodtags, Christian Gudehus from Ruhr-University Bochum, for their invaluable contributions to the work in this thesis and to my development as a scientist. Florian and Erkan, we have done a lot together. Completing this together in terms of you supporting me again as being my paranymphs feels awesome. I appreciate it. Serwan, Erkan, Ghiath, and Ali are the friends who were always there for me. I am very happy to have this invaluable company with you in this long and hard journey in life. Remy and Selman were the great people who made my time in Mountain View, CA, USA memorable during my internship at Netbase Solutions Inc. It would not be that much productive and cheerful without them. Their presence made me feel safe and at home. I am grateful to prof. dr. Marteen de Rijke for welcoming me and Florian to his group on Fridays during the first year of my PhD studies. This experience provided me the context to understand information retrieval. ii iii I would also like to thank the members of the doctoral committee, prof. dr. Martha Larson, prof. dr. ir. Wessel Kraaij, prof. dr. Lidwien van de Wijngaert, Dr. ir. Alessan- dro Bozzon, and Dr. Aswhin Ittoo. Moreover, I appreciate support of the anonymous reviewers who provided feedback to my submissions to scientific venues. Both the for- mer and latter group of people significantly improved quality of my research and un- derstanding scientific method. There is not any dissertation that can be completed without institutional support. I appreciate support of Graduate School for the Humanities (GSH), Faculty of Arts, and Center for Language and Speech Technology (CLST) at Radboud University, COMMIT/ project, and the Netherlands Research School for Information and Knowledge Systems (SIKS). They provided the environment needed to complete this dissertation. I am always grateful to my family for their effort in standing by my side. Last but not least, Sevara and Madina, you are meaning of happiness for me. Ali HÃijrriyetoħlu SarÄśyer, May 2019 Contents 1 Introduction 1 1.1 Research Questions . .3 1.2 Data and Privacy . .4 1.3 Contributions and Outline . .5 2 Time-to-Event Detection 7 2.1 Introduction . .7 2.2 Related Research . .9 2.3 Data Collection . 11 2.4 Estimating the Time between Twitter Messages and Future Events . 14 2.4.1 Introduction . 14 2.4.2 Methods . 15 2.4.2.1 Linear and Local Regression . 15 2.4.2.2 Time Series Analysis . 16 2.4.3 Experimental Set-up . 16 2.4.3.1 Training and Test Data Generation . 16 2.4.3.2 Baseline . 18 2.4.3.3 Evaluation . 18 2.4.4 Results . 18 2.4.5 Conclusion . 20 2.5 Estimating Time to Event from Tweets Using Temporal Expressions . 21 2.5.1 Introduction . 22 2.5.2 Experimental Set-Up . 23 2.5.2.1 Data Sets . 23 2.5.2.2 Temporal Expressions . 24 2.5.2.3 Evaluation and Baselines . 26 2.5.3 Results . 28 2.5.4 Analysis . 31 2.5.5 Conclusion . 32 2.6 Estimating Time to Event based on Linguistic Cues on Twitter . 34 2.6.1 Introduction . 34 2.6.2 Time-to-Event Estimation Method . 36 2.6.2.1 Features . 37 2.6.2.2 Feature Selection . 42 2.6.2.3 Feature Value Assignment . 42 2.6.2.4 Time-to-Event Estimation for a Tweet . 43 2.6.3 Experimental Set-up . 44 iv Contents v 2.6.3.1 Training and Test Regimes . 44 2.6.3.2 Evaluation and Baselines . 44 2.6.3.3 Hyperparameter Optimization . 46 2.6.4 Test Results . 48 2.6.5 Discussion . 49 2.6.6 Conclusion . 53 2.7 Conclusion . 53 3 Relevant Document Detection 70 3.1 Introduction . 71 3.2 Related Research . 72 3.3 Relevancer . 74 3.3.1 Data Preparation . 75 3.3.2 Feature Extraction . 76 3.3.3 Near-duplicate Detection . 76 3.3.4 Information Thread Detection . 77 3.3.5 Cluster Annotation . 79 3.3.6 Creating a Classifier . 81 3.3.7 Scalability . 82 3.4 Finding and Labeling Relevant Information in Tweet Collections . 83 3.5 Analysing the Role of Key Term Inflections in Knowledge Discovery on Twitter . 87 3.6 Using Relevancer to Detect Relevant Tweets: The Nepal Earthquake Case 90 3.7 Identifying Flu Related Tweets in Dutch . 95 3.8 Conclusion . 97 4 Mixing Paradigms for Relevant Microtext Classification 102 4.1 Introduction . 102 4.2 Classifying Humanitarian Information in Tweets . 103 4.2.1 Introduction . 103 4.2.2 Approach 1: Identifying Topics Using Relevancer . 105 4.2.3 Approach 2: Topic Assignment by Rule-based Search Query Gen- eration . 106 4.2.4 Combined Approach . 109 4.2.5 Results . 111 4.2.6 Discussion . 111 4.2.7 Conclusion . 112 4.3 Comparing and Integrating Machine Learning and Rule-Based Microtext Classification . 113 4.3.1 Related Studies . 114 4.3.2 Data Sets . 115 4.3.3 Baseline . 117 4.3.4 Building a Machine Learning Based Classifier . 117 4.3.5 Rule-Based System . 118 4.3.6 Comparing Machine Learning and Rule-Based System Results . 119 4.3.7 Integrating ML and RB Approaches at the Result Level . 121 4.3.8 Discussion . 124 Contents vi 4.3.9 Conclusion . 124 4.4 Conclusion . 125 5 Conclusions 126 5.1 Answers to Research Questions . 126 5.2 Answer to Problem Statement . 129 5.3 Thesis Contributions . 129 5.4 Outlook . 130 References 133 Samenvatting 145 Summary 147 Curriculum Vitae 149 SIKS Dissertation Series 150 Chapter 1 Introduction Microblogs such as Twitter represent a powerful source of information. Part of this information can be aggregated beyond the level of individual posts. Some of this ag- gregated information is referring to events that could or should be acted upon in the interest of e-governance, public safety, or other levels of public interest. Moreover, a significant amount of this information, if aggregated, could complement existing infor- mation networks in a non-trivial way. Here, we propose a semi-automatic method for extracting actionable information that serves this purpose. The term event denotes what happens to entities in a defined space and time (Casati & Varzi, 2015). Events that affect the behaviors and possibly the health, well-being, and other aspects of life of multiple people, varying from hundreds to millions, are the focus of our work. We are interested in both planned events (e.g. football matches and concerts) and unplanned events (e.g. natural disasters such as floods and earthquakes). Aggregated information that can be acted upon is specified as actionable.1 Actionable information that can help to understand and handle or manage events may be detected at various levels: an estimated time to event, a graded relevance estimate, an event’s precise time and place, or an extraction of entities involved (Vieweg, Hughes, Starbird, & Palen, 2010; Yin, Lampert, Cameron, Robinson, & Power, 2012; Agichtein, Castillo, Donato, Gionis, & Mishne, 2008). Microtexts, which are posted on microblogs, are specified as short, user-dependent, minimally-edited texts in comparison to traditional writing products such as books, essays, and news articles (Ellen, 2011; Khoury, Khoury, & Hamou-Lhadj, 2014). 1http://www.oed.com/view/Entry/1941, accessed June 10, 2018 1 Introduction 2 Users of a microblog platform may be professional authors but very often they are non- professional authors whose writing skills vary widely and who are usually less con- cerned with readers’ expectations as regards the well-formedness of a text. Generally, microtexts can be generated and published in short time spans with easy-to-use inter- faces without being bound or dependent to a fixed place. Microbloggers may generate microtexts about anything they think, observe, or want to share through the microblogs. In recent years, vast quantities of microtexts have been generated on microblogs that are known as social networking services, e.g., Twitter2 and Instagram3 (Vallor, 2016).
Recommended publications
  • A Logical Framework for Temporal Deductive Databases S
    A logical framework for temporal deductive databases S. M. Sripada Departmentof Computing Imperial College of Science& Technology 180 Queen’sGate, London SW7 2BZ Abstract researchers have incorporated time into conventional databasesusing various schemes Temporal deductive databases are deductive providing different capabilities for handling databaseswith an ability to representboth valid temporal information. In particular, work has time and transaction time. The work is basedon been done on adding temporal information to the Event Calculus of Kowalski & Sergot. Event conventionaldatabases to turn them into temporal Calculus is a treatment of time, based on the databases. However, no work is done on notion of events, in first-order classical logic handling both transaction time and valid time in augmentedwith negation as failure. It formalizes deductive databases.In this paper we propose a the semanticsof valid time in deductivedatabases framework for dealing with time in temporal and offers capability for the semantic validation deductivedatabases. of updates and default reasoning. In this paper, the Event Calculus is extended to include the Snodgrass dz Ahn [16,17] describe a concept of transaction time. The resulting classification of databasesdepending on their framework is capable of handling both proactive ability to represent temporal information. They and retroactive updates symmetrically. Error identify three different concepts of time - valid correction is achievedwithout deletionsby means time, transaction time and user-defined time. Of of negation as failure. The semantics of these, user-defined time is temporal transaction time is formalised and the axioms of information of some kind that is of interest to the the Event Calculus are modified to cater for user but does not concern a DBMS.
    [Show full text]
  • Temporal Metadata for Discovery
    Temporal Metadata for Discovery A review of current practice Makx Dekkers AMI Consult SARL Massimo Craglia (Editor) European Commission DG JRC Institute for Environment and Sustainability EUR 23209 EN - 2008 The mission of the Institute for Environment and Sustainability is to provide scientific-technical support to the European Union’s Policies for the protection and sustainable development of the European and global environment. European Commission Joint Research Centre Institute for Environment and Sustainability Contact information Makx Dekkers Address: AMI Consult Société à responsabilité limitée B.P. 647 2016 Luxembourg E-mail: [email protected] Massimo Craglia (Editor) Address: European Commission Joint Research Centre Institute for Environment and Sustainability Spatial Data Infrastructures Unit TP262, Via Fermi 2749 I-21027 Ispra (VA) ITALY E-mail: [email protected] Tel.: +39-0332-786269 Fax: +39-0332-786325 http://ies.jrc.ec.europa.eu/ http://www.jrc.ec.europa.eu/ Legal Notice Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use which might be made of this publication. Europe Direct is a service to help you find answers to your questions about the European Union Freephone number (*): 00 800 6 7 8 9 10 11 (*) Certain mobile telephone operators do not allow access to 00 800 numbers or these calls may be billed. A great deal of additional information on the European Union is available on the Internet. It can be accessed through the Europa server http://europa.eu/ JRC 42620 EUR 23209 EN ISSN 1018-5593 Luxembourg: Office for Official Publications of the European Communities © European Communities, 2008 Reproduction is authorised provided the source is acknowledged Printed in Italy Table of contents 1 BACKGROUND AND RATIONALE ....................................................................
    [Show full text]
  • Dealing with Granularity of Time in Temporal Databases
    DEALING WITH GRANULARITY OF TIME IN TEMPORAL DATABASES Gio Wiederhold t Department of Computer Science Stanford University, Stanford, CA 94305-2140, U.S.A. Sushil Jajodia Department of Information Systems and Systems Engineering George Mason University, Fairfax, VA 22030-4444, U.S.A. Witold Litwin* I.N.R.I.A. 78153 Le Chesnay, France ABSTRACT A question that always arises when dealing with temporal informa- tion is the granularity of the values in the domain type. Many different approaches have been proposed; however, the community has not yet come to a basic agree- ment. Most published temporal representations simplify the issue which leads to difficulties in practical applications. In this paper, we resolve the issue of temporal representation by requiring two domain types (event times and intervals), formal- ize useful temporal semantics, and extend the relational operations in such a way that temporal extensions fit into a relational representation. Under these considera- tions, a database system that deals with temporal data can not only present con- sistent temporal semantics to users but perform consistent computational sequences on temporal data from diverse sources. 1. Introduction Large databases do not just collect data about the current state of objects but retain infor- marion about past states as well. In the past, when storage was costly, such data was often not retained online. Aggregations of past information might have been kept, and detail data was archived. Certain applications always required that an adequate history be kept. In medical data- base systems a record of past events is necessary to avoid repeating ineffective or inappropriate treatments, and in other applications legal or planning requirements make keeping of a history desirable.
    [Show full text]
  • A Centralized Ledger Database for Universal Audit and Verification
    LedgerDB: A Centralized Ledger Database for Universal Audit and Verification Xinying Yangy, Yuan Zhangy, Sheng Wangx, Benquan Yuy, Feifei Lix, Yize Liy, Wenyuan Yany yAnt Financial Services Group xAlibaba Group fxinying.yang,yuenzhang.zy,sh.wang,benquan.ybq,lifeifei,yize.lyz,[email protected] ABSTRACT certain consensus protocol (e.g., PoW [32], PBFT [14], Hon- The emergence of Blockchain has attracted widespread at- eyBadgerBFT [28]). Decentralization is a fundamental basis tention. However, we observe that in practice, many ap- for blockchain systems, including both permissionless (e.g., plications on permissioned blockchains do not benefit from Bitcoin, Ethereum [21]) and permissioned (e.g., Hyperledger the decentralized architecture. When decentralized architec- Fabric [6], Corda [11], Quorum [31]) systems. ture is used but not required, system performance is often A permissionless blockchain usually offers its cryptocur- restricted, resulting in low throughput, high latency, and rency to incentivize participants, which benefits from the significant storage overhead. Hence, we propose LedgerDB decentralized ecosystem. However, in permissioned block- on Alibaba Cloud, which is a centralized ledger database chains, it has not been shown that the decentralized archi- with tamper-evidence and non-repudiation features similar tecture is indispensable, although they have been adopted to blockchain, and provides strong auditability. LedgerDB in many scenarios (such as IP protection, supply chain, and has much higher throughput compared to blockchains. It merchandise provenance). Interestingly, many applications offers stronger auditability by adopting a TSA two-way peg deploy all their blockchain nodes on a BaaS (Blockchain- protocol, which prevents malicious behaviors from both users as-a-Service) environment maintained by a single service and service providers.
    [Show full text]
  • Time, Points and Space - Towards a Better Analysis of Wildlife Data in GIS
    Time, Points and Space - Towards a Better Analysis of Wildlife Data in GIS Dissertation zur Erlangung der naturwissenschaftlichen Doktorw¨urde (Dr. sc. nat.) vorgelegt der Mathematisch-naturwissenschaftlichen Fakult¨at der Universit¨at Z¨urich von Stephan Imfeld von Lungern OW Begutachtet von Prof. Dr. Kurt Brassel Prof. Dr. Bernhard Nievergelt Dr. Britta Allg¨ower Prof. Dr. Peter Fisher Z¨urich 2000 Die vorliegende Arbeit wurde von der Mathematisch-naturwissenschaftlichen Fakult¨at der Universit¨at Z¨urich auf Antrag von Prof. Dr. Kurt Brassel und Prof. Dr. Robert Weibel als Dissertation angenommen. How to Catch Running Animals with GIS and See What They’re up to i Abstract Geographical Information Systems are powerful instruments to analyse spatial data. Wildlife researchers and managers are always confronted with spatial data analysis and make use of these systems for various tasks. One important characteristic of the animals unter investigation is their locomotion. Thus the temporal aspects are important, but unfortunately GIS are almost ignorant concerning the analysis of the temporal domain. This thesis is trying to provide a new perspective on how to analyse moving point objects within GIS. A conceptual shift is performed from a space centered view to a way of analysing spatial and temporal aspects in an equally balanced way. For this purpose the family of analytical Time Plots was developed. They represent a completely new approach of how to analyse moving point objects. They transform the data originating from an animal’s movements into a repre- sentation with two time axes and one spatial axis that allows for an effective recognition of spatial patterns within the data.
    [Show full text]
  • SQL and Temporal Database Research: Unified Review and Future Directions
    International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072 SQL and Temporal Database Research: Unified Review and Future Directions Rose-Mary Owusuaa Mensah1, Vincent Amankona2 1Postgraduate Student, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana 2 Postgraduate Student, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Several attempts to incorporate temporal In the early 1990s, a renowned researcher in the database extensions into the Structured Query Language, SQL, one of community, Richard Snodgrass, proposed that temporal the most popular query languages for databases date back extensions to SQL be developed because few of them to the nineteenth and twentieth century. Although a lot of existed. In response to his proposal [1], a committee was work and research has been done on temporal databases formed to consolidate past research and suggestions from and SQL, there exist very limited literature clearly outlining the research community in order to design temporal the various events which have taken place with regards to extensions to the 1992 edition of the SQL standard. Those temporal extensions of SQL over the years till the present extensions, known as TSQL2, were then developed by this state in a concise document. Consequently, researchers need committee and in 1993, they presented proposals to the to gather several pieces of literature before they can obtain ANSI SQL Technical Committee. Based on responses to the a vivid pictorial timeline of the history and the current state proposals, changes were made to the Language, and the of these temporal extensions for research and software definitive version of the TSQL2 Language Specification development purposes.
    [Show full text]
  • The Multi-Temporal Database of Planetary Image Data (Muted): New Features to Study Dynamic Mars
    50th Lunar and Planetary Science Conference 2019 (LPI Contrib. No. 2132) 1001.pdf THE MULTI-TEMPORAL DATABASE OF PLANETARY IMAGE DATA (MUTED): NEW FEATURES TO STUDY DYNAMIC MARS. T. Heyer1, H. Hiesinger1, D. Reiss1, J. Raack1, and R. Jaumann2, 1Institut für Planetologie, Westfälische Wilhelms-Universität, Wilhelm-Klemm-Str. 10, 48149 Münster, Germany, 2German Aerospace Center (DLR), Berlin, Germany. ([email protected]) Introduction: The Multi-Temporal Database of Structure: MUTED is based on open source Planetary Image Data (MUTED) is a web-based tool to software and standards from the Open Geospatial support the identification of surface changes and time- Consortium (OGC). Metadata of the planetary image critical processes on Mars. The database enables scien- datasets are integrated from the Planetary Data System tists to quickly identify the spatial and multi-temporal (PDS) into a relational database (PostGreSQL). In coverage of orbital image data of all major Mars mis- order to provide the multi-temporal coverage, addi- sions. Since the 1970s, multi-temporal spacecraft tional information, e.g., the geometry, the number and observations have revealed that the martian surface is time span of overlapping images are derived for each very dynamic [e.g., 1-3]. The observation of surface image respectively. A Geoserver translates the metada- changes and processes, including eolian activity [e.g., ta stored in the relational database into web map ser- 5, 6], mass movement activities [e.g., 6, 7, 8], the vices (WMS) and web features services (WFS). Using growth and retreat of the polar caps [e.g., 9, 10], and Common Query Language (CQL), the web services crater-forming impacts [11] became possible by the can be filtered by date, solar longitude, spatial resolu- increasing number of repeated image acquisitions of tion, incidence angle, and spatial extend.
    [Show full text]
  • Self-Organizing Networks and GIS Tools Cases of Use for the Study of Trading Cooperation (1400-1800)
    Journal of Knowledge Management, Economics and Information Technology Self-organizing Networks and GIS Tools Cases of Use for the Study of Trading Cooperation (1400-1800) Ana Crespo Solana and David Alonso García (coords.) Copyright © 2012 by Scientific Papers and individual contributors. All rights reserved. Scientific Papers holds the exclusive copyright of all the contents of this journal. In accordance with the international regulations, no part of this journal may be reproduced or transmitted by any media or publishing organs (including various websites) without the written permission of the copyright holder. Otherwise, any conduct would be considered as the violation of the copyright. The contents of this journal are available for any citation, however, all the citations should be clearly indicated with the title of this journal, serial number and the name of the author. Edited and printed in France by Scientific Papers @ 2012. Editorial Board: Adrian GHENCEA, Claudiu POPA This publication has been sponsored by the Research Project: “Geografía Fiscal y Poder financiero en Castilla” Ref. HAR2010-15168, MICINN, Spain, and GlobalNet, Ref. HAR2011-27694, MICINN, Spain. Self-organizing Networks and GIS Tools Cases of Use for the Study of Trading Cooperation (1400-1800) Summary: 1. J. B. “Jack” Owens, Dynamic Complexity of Cooperation-Based Self-Organizing Commercial Networks in the First Global Age (DynCoopNet): What’s in a name? 2. Monica Wachowicz and J. B. “Jack” Owens, Dynamics of Trade Networks: The main research issues on space-time representations. 3. Adolfo Urrutia Zambrana, María José García Rodríguez, Miguel A. Bernabé Poveda, Marta Guerrero Nieto, Geo-history: Incorporation of geographic information systems into historical event studies.
    [Show full text]
  • 8. Time in GIS and Geographical Databases
    735450_ch08.qxd 2/4/05 1:00 PM Page 91 8 Time in GIS and geographical databases D J PEUQUET Although GIS and geographical databases have existed for over 30 years, it has only been within the past few that the addition of the temporal dimension has gained a significant amount of attention. This has been driven by the need to analyse how spatial patterns change over time (in order to better understand large-scale Earth processes) and by the availability of the data and computing power required by that space–time analysis. This chapter reviews the basic representational and analytical approaches currently being investigated. 1 INTRODUCTION increasingly encountering the issue of how to keep a geographical database current without overwriting Representations used historically within GIS assume outdated information. This problem has come to be a world that exists only in the present. Information known as ‘the agony of delete’ (Copeland 1982; contained within a spatial database may be added to Langran 1992). The rapidly decreasing cost of or modified over time, but a sense of change or memory and the availability of larger memory dynamics through time is not maintained. This capacities of all types is also eliminating the need to limitation of current GIS capabilities has been throw away information as a practical necessity. In receiving substantial attention recently, and the addition, the need within governmental policy- impetus for this attention has occurred on both a making organisations (and subsequently in science) theoretical and a practical level. to understand the effects of human activities better On a theoretical level GIS are intended to provide on the natural environment at all geographical scales an integrated and flexible tool for investigating is now viewed with increasing urgency.
    [Show full text]
  • Time in Database Systems
    30th January 2004 16:9 WorldScientific/ws-b9-75x6-50 book-timeai Chapter 19 Time in Database Systems Jan Chomicki David Toman Department of Comp. Sci. and Eng. School of Computer Science University at Buffalo, U.S.A. University of Waterloo, Canada [email protected] [email protected] 19.1 Introduction Time is ubiquitous in information systems. Almost every enterprise faces the problem of its data becoming out of date. However, such data is often valuable, so it should be archived and some means of accessing it should be provided. Also, some data may be inherently historical, e.g., medical, cadastral, or ju- dicial records. Temporal databases provide a uniform and systematic way of dealing with historical data. This chapter develops point-based data models and query languages for temporal databases in the relational framework. The models provide a separa- tion between the conceptual data (what is stored in the database) and the way the data is compactly represented in the temporal relations (how it is stored). This approach leads to a clean and elegant data model while still providing an efficient implementation path. The foundations of the approach can be traced to the constraint database technology [Kanellakis et al., 1995]: constraint rep- resentation is used as the basis for a space-efficient representation of temporal relations. We first study how logics of time can be used as query and integrity con- straint languages in the above setting and the differences resulting from choos- ing a particular logic as a query language for temporal data. Consequently, model-theoretic notions, particularly formula satisfaction in a fixed model, are of primary interest.
    [Show full text]
  • Preference-Aware Integration of Temporal Data
    Preference-aware Integration of Temporal Data Bogdan Alexe Mary Roth Wang-Chiew Tan IBM Almaden IBM Almaden and UCSC UCSC [email protected] [email protected] [email protected] ABSTRACT hanced if temporal information from different sources is also care- A complete description of an entity is rarely contained in a single fully accounted for [31, 34] in order to determine when facts about data source, but rather, it is often distributed across different data entities are true. sources. Applications based on personal electronic health records, For example, patients typically visit multiple medical profes- sentiment analysis, and financial records all illustrate that signifi- sionals/facilities over the course of their lifetime, and often even cant value can be derived from integrated, consistent, and query- simultaneously. While it is important for each medical facility to able profiles of entities from different sources. Even more so, such maintain medical history records for its patients to provide more integrated profiles are considerably enhanced if temporal informa- comprehensive diagnosis and care, there is even greater value for tion from different sources is carefully accounted for. both the patient and the medical professionals to have access to an integrated profile derived from the histories kept by each institu- We develop a simple and yet versatile operator, called PRAWN, that is typically called as a final step of an entity integration work- tion. Through the integrated profile, one could understand when a drug was administered and taken by a patient and for how long. flow. PRAWN is capable of consistently integrating and resolv- ing temporal conflicts in data that may contain multiple dimen- In turn, one could determine whether drugs with adverse interac- sions of time based on a set of preference rules specified by a tions have been unintentionally prescribed to a patient by different institutions at the same time.
    [Show full text]
  • Semantics of Time-Varying Attributes and Their Use
    Semantics of TimeVarying Attributes and Their Use for Temp oral Database Design Christian S Jensen Richard T Sno dgrass 1 Department of Mathematics and Computer Science Aalb org University Fredrik Ba jers Vej E DK Aalb org DENMARK csjiesdaucdk 2 Department of Computer Science University of Arizona Tucson AZ USA rtscsarizonaedu Abstract Based on a systematic study of the semantics of temp oral attributes of entities this pap er provides new guidelines for the design The notions of observation and up date of temp oral relational databases patterns of an attribute capture when the attribute changes value and when the changes are recorded in the database A lifespan describ es when an attribute has a value And derivation functions describ e how the values of an attribute for all times within its lifespan are computed from stored values The implications for temp oral database design of the semantics that may b e captured using these concepts are formulated as schema decomp osition rules Intro duction Designing appropriate database schemas is crucial to the eective use of rela tional database technology and an extensive theory has b een develop ed that sp ecies what is a go o d database schema and how to go ab out designing such a schema The relation structures provided by temp oral data mo dels eg the recent TSQL mo del provide builtin supp ort for representing the temp o ral asp ects of data With such new relation structures the existing theory for e eective use of tem relational database design no longer applies Thus to mak p oral database
    [Show full text]