Finding Rising Stars in Co-Author Networks Via Weighted Mutual Influence


Ali Daud (a, b), Naif Radi Aljohani (a), Rabeeh Ayaz Abbasi (a, c), Zahid Rafique (b), Tehmina Amjad (b), Hussain Dawood (d), Khaled H. Alyoubi (a)

a Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
b Department of Computer Science and Software Engineering, International Islamic University, Islamabad, Pakistan
c Department of Computer Science, Quaid-i-Azam University, Islamabad, Pakistan
d Faculty of Computing and Information Technology, University of Jeddah, Jeddah, Saudi Arabia

ABSTRACT
Finding rising stars is a challenging and interesting task that has recently been investigated in co-author networks. Rising stars are authors who have a low research profile at the start of their careers but may become experts in the future. This paper introduces a new method, Weighted Mutual Influence Rank (WMIRank), for finding rising stars. WMIRank exploits the influence of co-authors' citations, order of appearance and publication venues. Comprehensive experiments are performed to analyze the performance of WMIRank in comparison to baseline methods, which ignore weighted mutual influence. AMiner (https://aminer.org/) data for the years 1995-2000 is used for the experiments. The lists of top 30 authors produced by the proposed and baseline methods are compared by their average number of papers, average number of citations and achievements. Experimental results provide convincing evidence of the effectiveness of the investigated weighted mutual influence.

CCS CONCEPTS
• Information systems~Link and co-citation analysis
• Information systems~Retrieval models and ranking
• Information systems~Social networks

KEYWORDS
Bibliographic databases; Citations; Author Order; Rising Stars; Mutual Influence; Co-Author Networks

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. WWW 2017 Companion, April 3-7, 2017, Perth, Australia. ACM 978-1-4503-4914-7/17/04. DOI: http://dx.doi.org/10.1145/3041021.3054137

1. INTRODUCTION
The rapid evolution of scientific research produces large volumes of publications every year, and this is expected to continue in the near future [21]. Online databases such as DBLP or AMiner provide a large number of publications, with information about each publication's title, abstract, authors, publication year and venue [17].

Informally, co-author relationships and citation links can be called Academic Social Networks (ASNs). Formally, an academic social network consists of people in a research community who share ideas, research publications and comments, and ask questions. Several areas are well investigated for web databases, such as expert finding / author ranking [2,5,6,19], author interest finding [4], research collaboration [9], citation content analysis [23], author name disambiguation [10,15,16,20] and identifying influential entities [1,3,11,21]. However, the problem of finding rising stars [7,8,12,18] still needs to be explored.

Rising stars are scholars with the expertise and capabilities to achieve a high reputation in their respective fields in the near future. Finding rising stars in a specific domain is an emerging research direction that may enable research communities to highlight promising researchers early.

Rising star finding is a new research area and little work has been done on it. The idea was first formulated by PubRank [12], which incorporates only the mutual influence of authors and papers and a static ranking of publication venues. The dynamics of authors' research profiles are modeled in [18]; in that method, researchers are classified into four groups using unsupervised learning techniques. Later, PubRank was enhanced by StarRank [7], which incorporates two new features: mutual influence based on the author's contribution, and dynamic ranking lists of publications. Afterwards, a prediction method was proposed [8] that applies machine learning techniques to find rising stars using a combination of publication, co-author and venue based features. However, contribution-based mutual influence has not been deeply explored in terms of co-authors' citations, order of appearance and publication venues' citations.

In this paper, we propose WMIRank for finding outstanding researchers. It incorporates the weighted mutual influence of co-authors' citations, order of appearance and publication venues' citations. This weighted mutual influence enables us to find outstanding researchers effectively.

The main contributions of our work are summarized as follows.
1. It is the first attempt to consider co-authors' citations, co-authors' order of appearance and the citations of co-authors' publication venues.
2. Mathematical formulations for the computation of weighted mutual influence.
3. Performance evaluation of the proposed and baseline methods in terms of average number of papers and average number of citations.
4. Qualitative analysis of the top 30 ranked authors in terms of their achievements.
5. The effect of parameters on the computation of author ranks is examined, and a comparison of methods for ranking position relocation is provided.

This paper is organized as follows. The problem definition is presented in Section 2, followed by a brief review of two existing methods and a detailed description of the proposed method in Section 3. Section 4 provides the dataset, the performance evaluation, experiments examining the accuracy and efficiency of the proposed method, and a comparison of results with two baseline methods. Finally, Section 5 concludes this work.

2. PROBLEM DEFINITION
A formal definition of the problem of calculating rising star scores is provided here. The structure of the ranking problem is quite similar to the well-known PageRank problem.

Given a set $\{A_1, \ldots, A_n\}$ of $n$ authors, we have to calculate the ranking score (rising star score) for each author by calculating different mutual influence values. Let $W = \{W_1, \ldots, W_n\}$ be an $n \times n$ matrix, where $W_{ij}$ represents the mutual influence of author $A_i$ on author $A_j$. Let $Y$ be a vector representing the ranking scores of the $n$ authors. Then the ranking function is defined as follows.

$$y(A_i) = \frac{1-d}{n} + d \cdot \sum_{j=1}^{|v|} \frac{W(A_i, A_j)}{\sum_{k=1}^{|v|} W(A_k, A_j)} \cdot y(A_j) \qquad (1)$$

Here, $d$ is the damping factor, $v$ is the set of co-authors of author $A_i$ and $W(A_i, A_j)$ is the influential weight. The score $y(A_i)$ is calculated for each author in order to rank the authors.

We propose three features for the computation of the rising star score of an author $A_i$: mutual influence based on co-authors' citations, mutual influence based on co-author order, and mutual influence based on the citations of co-authors' venues.

3. METHODS
Before describing WMIRank, a brief introduction to existing methods for finding rising stars is presented. These methods are PubRank [12] and StarRank [7]. Both are derived from the well-known PageRank algorithm. Their details are presented in Sections 3.1 and 3.2.

3.1 PubRank
PubRank is the first method derived for finding rising stars in bibliographic networks [12]. It is based on two features: the mutual influence among researchers, and the track record of an author's publications in terms of publishing in venues of different levels. Consider an example of two authors $A_k$ and $A_l$ with 4 and 3 publications respectively. The two authors have co-authored two papers. The mutual influence weights between the authors are calculated as follows:

$$W(A_l, A_k) = \frac{(A_l, A_k)}{P_{A_k}} = \frac{2}{4} = 0.5$$
$$W(A_k, A_l) = \frac{(A_k, A_l)}{P_{A_l}} = \frac{2}{3} \approx 0.67 \qquad (2)$$

Here, the weight $W(A_l, A_k)$ describes the influence of author $A_l$ on author $A_k$, the numerator $(A_l, A_k)$ is the number of papers the two authors have co-authored (2 in this example), and $P_{A_l}$ and $P_{A_k}$ are the total numbers of publications by authors $A_l$ and $A_k$. The weight $W(A_l, A_k)$ is smaller than the weight $W(A_k, A_l)$ because author $A_k$ has more publications than author $A_l$. So, author $A_k$ has more influence on author $A_l$.

Then the worth of a scholar's publications is computed on the basis of the reputation or prestige of the publication venues. The publication quality score considers the venue of publication: publishing in a high-level venue at the start of a career indicates that a scholar has a good chance of becoming a rising star. Long et al. suggest static ranking lists with the following rank information: Rank 1 (premium), Rank 2 (leading), Rank 3 (reputable) and Rank 4 (unranked) [13]. Finally, the publication quality score for an author with a set $P$ of publications is computed as follows.

$$\lambda(A_i) = \frac{1}{|P|} \sum_{i=1}^{|P|} \frac{1}{\alpha^{(r(pub_i) - 1)}} \qquad (3)$$

Here, $pub_i$ is the $i$-th publication, $r(pub_i)$ is the rank of the paper and $\alpha$ takes a value in $(0 < \alpha < 1)$. The value of $\alpha$ is low for a low rank paper. The larger $\lambda(A_i)$ is, the higher the average quality of the papers published by the researcher. Finally, PubRank is formulated as follows.

$$PubRank(A_i) = \frac{1-d}{n} + d \cdot X \qquad (4)$$
$$X = \sum_{j=1}^{|v|} \frac{W(A_i, A_j) \cdot \lambda(A_i) \cdot PubRank(A_j)}{\sum_{k=1}^{|v|} W(A_k, A_j) \cdot \lambda(A_k)}$$

Here, $n$ is the total number of authors, and $W(A_i, A_j)$ and $\lambda(A_i)$ are the influential weight and the publication quality score respectively.

3.2 StarRank
In StarRank, the order in which authors appear on papers is also considered, with the first author treated as the maximum contributor [7]. An author with a smaller contribution gets a lower score and an author with a larger contribution gets a higher score. Based on this intuition, mutual influence based on author contribution is calculated. StarRank also recommends a dynamic way of computing the publication venue score instead of a static ranking. Consider two authors $A_k$ and $A_l$ with 4 and 3
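
As an illustration of Equations (1) and (2), the following Python sketch computes the publication-count-based influence weights from the PubRank example and then iterates the ranking function. The function names `influence_weights` and `rank_scores`, the damping value d = 0.85, the fixed iteration count, and the choice to normalise over every author who has a weight onto $A_j$ are assumptions made for this sketch, not details taken from the paper.

```python
from collections import defaultdict


def influence_weights(papers):
    """Return W[(a, b)] = (papers shared by a and b) / (total papers of b).

    `papers` is a list of author-name lists (hypothetical input format).
    W[(a, b)] is read as the influence of author a on author b, as in Eq. (2).
    """
    totals = defaultdict(int)   # P_A: total publications per author
    shared = defaultdict(int)   # number of co-authored papers per ordered pair
    for authors in papers:
        unique = set(authors)
        for a in unique:
            totals[a] += 1
            for b in unique:
                if a != b:
                    shared[(a, b)] += 1
    return {(a, b): count / totals[b] for (a, b), count in shared.items()}


def rank_scores(weights, authors, d=0.85, iterations=50):
    """Iterate Eq. (1): y(A_i) = (1 - d)/n + d * sum_j W(i,j)/sum_k W(k,j) * y(A_j)."""
    n = len(authors)
    # Denominator of Eq. (1): total influence weight received by each author j.
    received = defaultdict(float)
    for (_, j), w in weights.items():
        received[j] += w
    y = {a: 1.0 / n for a in authors}  # uniform starting scores (assumption)
    for _ in range(iterations):
        updated = {}
        for i in authors:
            total = sum(
                w / received[j] * y[j]
                for (a, j), w in weights.items()
                if a == i
            )
            updated[i] = (1 - d) / n + d * total
        y = updated
    return y


# The example from Section 3.1: A_k has 4 papers, A_l has 3, and 2 are shared.
papers = [["Ak", "Al"], ["Ak", "Al"], ["Ak"], ["Ak"], ["Al"]]
W = influence_weights(papers)
print(W[("Al", "Ak")])  # influence of A_l on A_k: 2/4
print(W[("Ak", "Al")])  # influence of A_k on A_l: 2/3
print(rank_scores(W, ["Ak", "Al"]))
```

With only two authors, the normalisation in Equation (1) cancels the weights, so both authors end up with the same score; differences in rank only appear in networks with three or more authors.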