Revisiting Data Partitioning for Scalable RDF Graph Processing Jorge Armando Galicia Auyon

Total Page:16

File Type:pdf, Size:1020Kb

Revisiting Data Partitioning for Scalable RDF Graph Processing Jorge Armando Galicia Auyon Revisiting Data Partitioning for Scalable RDF Graph Processing Jorge Armando Galicia Auyon To cite this version: Jorge Armando Galicia Auyon. Revisiting Data Partitioning for Scalable RDF Graph Processing. Other [cs.OH]. ISAE-ENSMA Ecole Nationale Supérieure de Mécanique et d’Aérotechique - Poitiers, 2021. English. NNT : 2021ESMA0001. tel-03167657 HAL Id: tel-03167657 https://tel.archives-ouvertes.fr/tel-03167657 Submitted on 12 Mar 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THESE` pour l'obtention du Grade de Docteur de l'Ecole Nationale Sup´erieure de M´ecanique et d'A´erotechnique (Dipl^omeNational - Arr^et´edu 25 mai 2016) Ecole Doctorale : Sciences et Ing´eni´erie pour l'Information et Math´ematiques Secteur de Recherche : Informatique et Applications Pr´esent´ee par : Jorge Armando GALICIA AUYON´ ************************ Revisiting Data Partitioning for Scalable RDF Graph Processing ************************ Directeur de th`ese : Ladjel BELLATRECHE Co-encadrant de th`ese : Amin MESMOUDI ************************ Soutenue le 12 Janvier 2021 devant la Commission d'Examen ************************ JURY Pr´esident : Emmanuel GROLLEAU Professeur, ISAE - ENSMA, Poitiers Rapporteurs : Yannis MANOLOPOULOS Professor, Open University of Cyprus, Chypre Farouk TOUMANI Professeur, Universit´eBlaise Pascal, Clermont-Ferrand Membres du jury : Genoveva VARGAS-SOLAR Charg´eede Recherche, CNRS, Grenoble Carlos ORDONEZ Associate Professor, University of Houston, USA Patrick VALDURIEZ Directeur de Recherche, INRIA, Montpellier Amin MESMOUDI Ma^ıtre de Conf´erences, Universit´ede Poitiers, Poitiers Ladjel BELLATRECHE Professeur, ISAE - ENSMA, Poitiers ii It always seems impossible until it's done. | Nelson Mandela iii iv Acknowledgements This thesis would not have been possible without the inspiration and support of so many indi- viduals: - I would like to thank my supervisors Prof. Ladjel Bellatreche and Dr. Amin Mes- moudi for their guidance, patience, and support during my thesis. The insightful dis- cussions I had with each of you definitely enlightened my way while exploring new ideas. Their careful editing contributed greatly to the production of this manuscript. You have helped me learn how to do research. - I would also want to show my sincere gratitude to Professors Yannis Manolopoulos and Farouk Toumani for taking their precious time to be rapporteurs of my thesis. I greatly appreciate your feedback that I carefully considered to improve the quality of my work. In addition, I would like to thank Professors Genoveva Vargas-Solar, Patrick Valduriez, Carlos Ordonez, and Emmanuel Grolleau who have honored me to be part of the examination committee and whose questions have enriched my study from many different angles. - I also thank all the members of the LIAS laboratory who welcomed me and with whom I shared so many moments of conviviality throughout these three years. Many thanks to Allel Hadjali, Brice Chardin, Henri Bauer, Michael Richard, and Yassine Ouhammou for mentoring me during the first steps of my teaching career. I wish to extend my special thanks to Micka¨el Baron for the technical support throughout my work, for his availability and enthusiasm. I sincerely thank B´en´edicte Boinot for her kindness, generosity, and all the administrative assistance to coordinate my thesis project. - I want to thank my past and present Ph.D. colleagues and friends. Abir, Amna, Ana¨ıs, Anh Toan, Chourouk, Fatma, Houssameddine, Ibrahim, Ishaq, Issam, Khe- didja, Louise, Matheus, Mohamed, Nesrine, R´eda, Richard, Sana, Simon, and v Soulimane. Many thanks to everyone for all the shared moments in which we exchanged ideas of our academic works and for the diverse conversations that cleared my mind when I needed it most. - Many thanks to my group of friends, Anita, Abril, Gad, Gledys, Laura, Lina, Mar- cos, Mariana, Melissa, Moditha, and Suela on whom I was able to unload a lot of pressure thanks to so many shared laughs. Your sense of humor was the ray of sunshine for many cloudy days during this journey. - Finally, my deep and sincere gratitude to my family for their unconditional and unpar- alleled love, help, and support. Many thanks to my parents, Veronica and Jorge, my sisters Daniela and Jimena, brother Jos´e, and aunt Carmen for always being there to restore my optimism when I was missing. I am forever indebted to R´egis, Marie France, Caroline, Thomas, and Marie Jos´e who became my family and I never felt alone while being so far from my loved Guatemala. Finally, I would like to express my deepest gratitude to Gabriel, who encouraged, supported, and loved me without fail and stood by my side every time. I will be forever grateful to you, this journey would not have been possible if not for all your support. <Muchas gracias a tod@s! vi Table of Contents Table of Contents vii Introduction 1 I Preliminaries 17 1 Data Partitioning Foundations 19 1.1 Introduction . 21 1.2 Data Partitioning Fundamentals . 22 1.2.1 Partitioning definition and development overview . 23 1.2.2 Partitioning concept evolution . 24 1.3 Partitioning dimensions . 27 1.3.1 Type....................................... 27 1.3.2 Main objective . 29 1.3.3 Mechanism . 30 1.3.4 Algorithm . 31 1.3.5 Cost Model . 31 1.3.6 Constraints . 32 1.3.7 Platform . 33 1.3.8 System Element . 33 1.3.9 Adaptability . 33 1.3.10 Data model . 33 1.4 Partitioning approaches . 34 1.4.1 Partitioning by type, platform and mechanism . 35 1.4.2 Partitioning by data model . 41 1.4.3 Partitioning by adaptability . 47 1.4.4 Partitioning by constraints . 51 1.5 Partitioning in large-scale platforms . 53 1.5.1 Hadoop ecosystem . 53 1.5.2 Apache Spark . 55 1.5.3 NoSQL stores . 56 1.5.4 Hybrid architectures . 58 1.6 Conclusion . 58 2 Graph Data : Representation and Processing 61 2.1 Introduction . 63 2.2 Graph database models . 63 2.2.1 Logical graph data structures . 64 vii TABLE OF CONTENTS 2.2.2 Data storage . 65 2.2.3 Query and manipulation languages . 66 2.2.4 Query processing . 69 2.2.5 Graph partitioning . 71 2.2.6 Graph databases . 77 2.3 Resource Description Framework . 80 2.3.1 Background . 80 2.3.2 Storage models . 82 2.3.3 Processing strategies . 86 2.3.4 Data partitioning . 87 2.4 Conclusion . 91 II Contributions 93 3 Logical RDF Partitioning 95 3.1 Introduction . 97 3.2 RDF partitioning design process . 97 3.3 Graph fragments . 99 3.3.1 Grouping the graph by instances . 99 3.3.2 Grouping the graph by attributes . 104 3.4 From logical fragments to physical structures . 107 3.5 Allocation problem . 109 3.5.1 Problem definition . 109 3.5.2 Graph partitioning heuristic . 111 3.6 RDF partitioning example . 112 3.7 Dealing with large fragments . 116 3.8 Conclusion . 117 4 RDFPartSuite in Action 119 4.1 Introduction . 121 4.2 RDF QDAG ...................................... 121 4.2.1 System architecture . 122 4.2.2 Storage model . 123 4.2.3 Execution model . 125 4.3 Loading costs . 128 4.3.1 Tested datasets . 128 4.3.2 Configuration setup . 129 4.3.3 Pre-processing times . 129 4.4 Evaluation of the fragmentation strategies . 130 4.4.1 Data coverage . 130 4.4.2 Exclusive comparison of fragmentation strategies . 131 4.4.3 Combining fragmentation strategies . 134 4.5 Evaluation of the allocation strategies . 136 4.5.1 Data skewness comparison . 136 4.5.2 Communication costs study . 137 4.5.3 Distributed experiments . 140 4.6 Partitioning language . 143 4.6.1 Notations . 143 4.6.2 CREATE KG statement . 144 4.6.3 LOAD DATA statement . 144 4.6.4 FRAGMENT KG statement . 144 viii TABLE OF CONTENTS 4.6.5 ALTER FRAGMENT statement . 145 4.6.6 ALLOCATE statement . 145 4.6.7 ALTER ALLOCATION statement . 146 4.6.8 DISPATCH statement . 146 4.6.9 Integration of the language to other systems . 146 4.7 RDF partitioning advisor . 147 4.7.1 Main functionalities . 147 4.7.2 System architecture . ..
Recommended publications
  • A Comparison of On-Line Computer Science Citation Databases
    A Comparison of On-Line Computer Science Citation Databases Vaclav Petricek1,IngemarJ.Cox1,HuiHan2, Isaac G. Councill3, and C. Lee Giles3 1 University College London, WC1E 6BT, Gower Street, London, United Kingdom [email protected], [email protected] 2 Yahoo! Inc., 701 First Avenue, Sunnyvale, CA, 94089 [email protected] 3 The School of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA [email protected] [email protected] Abstract. This paper examines the difference and similarities between the two on-line computer science citation databases DBLP and CiteSeer. The database entries in DBLP are inserted manually while the CiteSeer entries are obtained autonomously via a crawl of the Web and automatic processing of user submissions. CiteSeer’s autonomous citation database can be considered a form of self-selected on-line survey. It is important to understand the limitations of such databases, particularly when citation information is used to assess the performance of authors, institutions and funding bodies. We show that the CiteSeer database contains considerably fewer single author papers. This bias can be modeled by an exponential process with intuitive explanation. The model permits us to predict that the DBLP database covers approximately 24% of the entire literature of Computer Science. CiteSeer is also biased against low-cited papers. Despite their difference, both databases exhibit similar and signif- icantly different citation distributions compared with previous analysis of the Physics community. In both databases, we also observe that the number of authors per paper has been increasing over time. 1 Introduction Several public1 databases of research papers became available due to the advent of the Web [1,22,5,3,2,4,8] These databases collect papers in different scientific disciplines, index them and annotate them with additional metadata.
    [Show full text]
  • Linking Educational Resources on Data Science
    The Thirty-First AAAI Conference on Innovative Applications of Artificial Intelligence (IAAI-19) Linking Educational Resources on Data Science Jose´ Luis Ambite,1 Jonathan Gordon,2∗ Lily Fierro,1 Gully Burns,1 Joel Mathew1 1USC Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, California 90292, USA 2Department of Computer Science, Vassar College, 124 Raymond Ave, Poughkeepsie, New York 12604, USA [email protected], [email protected], flfierro, burns, [email protected] Abstract linked data (Heath and Bizer 2011) with references to well- known entities in DBpedia (Auer et al. 2007), DBLP (Ley The availability of massive datasets in genetics, neuroimag- 2002), and ORCID (orcid.org). Here, we detail the meth- ing, mobile health, and other subfields of biology and ods applied to specific problems in building ERuDIte. First, medicine promises new insights but also poses significant we provide an overview of our project. In later sections, we challenges. To realize the potential of big data in biomedicine, the National Institutes of Health launched the Big Data to describe our resource collection effort, the resource schema, Knowledge (BD2K) initiative, funding several centers of ex- the topic ontology, our linkage approach, and the linked data cellence in biomedical data analysis and a Training Coordi- resource provided. nating Center (TCC) tasked with facilitating online and in- The National Institutes of Health (NIH) launched the Big person training of biomedical researchers in data science. A Data to Knowledge (BD2K) initiative to fulfill the promise major initiative of the BD2K TCC is to automatically identify, of biomedical “big data” (Ohno-Machado 2014). The NIH describe, and organize data science training resources avail- BD2K program funded 15 major centers to investigate how able on the Web and provide personalized training paths for data science can benefit diverse fields of biomedical research users.
    [Show full text]
  • A Fuzzy Datalog Deductive Database System 1
    A FUZZY DATALOG DEDUCTIVE DATABASE SYSTEM 1 A Fuzzy Datalog Deductive Database System Pascual Julian-Iranzo,´ Fernando Saenz-P´ erez´ Abstract—This paper describes a proposal for a deduc- particular, non-logic constructors are disallowed, as the cut), tive database system with fuzzy Datalog as its query lan- but it is not Turing-complete as it is meant as a database query guage. Concepts supporting the fuzzy logic programming sys- language. tem Bousi∼Prolog are tailored to the needs of the deductive database system DES. We develop a version of fuzzy Datalog Fuzzy Datalog [9] is an extension of Datalog-like languages where programs and queries are compiled to the DES core using lower bounds of uncertainty degrees in facts and rules. Datalog language. Weak unification and weak SLD resolution are Akin proposals as [10], [11] explicitly include the computation adapted for this setting, and extended to allow rules with truth de- of the rule degree, as well as an additional argument to gree annotations. We provide a public implementation in Prolog represent this degree. However, and similar to Bousi∼Prolog, which is open-source, multiplatform, portable, and in-memory, featuring a graphical user interface. A distinctive feature of this we are interested in removing this burden from the user system is that, unlike others, we have formally demonstrated with an automatic rule transformation that elides both the that our implementation techniques fit the proposed operational degree argument and the explicit call to fuzzy connective semantics. We also study the efficiency of these implementation computations in user rules. In addition, we provide support for techniques through a series of detailed experiments.
    [Show full text]
  • Powerdesigner 16.6 Data Modeling
    SAP® PowerDesigner® Document Version: 16.6 – 2016-02-22 Data Modeling Content 1 Building Data Models ...........................................................8 1.1 Getting Started with Data Modeling...................................................8 Conceptual Data Models........................................................8 Logical Data Models...........................................................9 Physical Data Models..........................................................9 Creating a Data Model.........................................................10 Customizing your Modeling Environment........................................... 15 1.2 Conceptual and Logical Diagrams...................................................26 Supported CDM/LDM Notations.................................................27 Conceptual Diagrams.........................................................31 Logical Diagrams............................................................43 Data Items (CDM)............................................................47 Entities (CDM/LDM)..........................................................49 Attributes (CDM/LDM)........................................................55 Identifiers (CDM/LDM)........................................................58 Relationships (CDM/LDM)..................................................... 59 Associations and Association Links (CDM)..........................................70 Inheritances (CDM/LDM)......................................................77 1.3 Physical Diagrams..............................................................82
    [Show full text]
  • OLAP, Relational, and Multidimensional Database Systems
    OLAP, Relational, and Multidimensional Database Systems George Colliat Arbor Software Corporation 1325 Chcseapeakc Terrace, Sunnyvale, CA 94089 Introduction Many people ask about the difference between implementing On-Line Analytical Processing (OLAP) with a Relational Database Management System (ROLAP) versus a Mutidimensional Database (MDD). In this paper, we will show that an MDD provides significant advantages over a ROLAP such as several orders of magnitude faster data retrieval, several orders of magnitude faster calculation, much less disk space, and le,~ programming effort. Characteristics of On-Line Analytical Processing OLAP software enables analysts, managers, and executives to gain insight into an enterprise performance through fast interactive access to a wide variety of views of data organized to reflect the multidimensional aspect of the enterprise data. An OLAP service must meet the following fundamental requirements: • The base level of data is summary data (e.g., total sales of a product in a region in a given period) • Historical, current, and projected data • Aggregation of data and the ability to navigate interactively to various level of aggregation (drill down) • Derived data which is computed from input data (performance rados, variance Actual/Budget,...) • Multidimensional views of the data (sales per product, per region, per channel, per period,..) • Ad hoc fast interactive analysis (response in seconds) • Medium to large data sets ( 1 to 500 Gigabytes) • Frequently changing business model (Weekly) As an iUustration we will use the six dimension business model of a hypothetical beverage company. Each dimension is composed of a hierarchy of members: for instance the time dimension has months (January, February,..) as the leaf members of the hierarchy, Quarters (Quarter 1, Quarter 2,..) as the next level members, and Year as the top member of the hierarchy.
    [Show full text]
  • Analysis of Computer Science Communities Based on DBLP
    Analysis of Computer Science Communities Based on DBLP Maria Biryukov1 and Cailing Dong1,2 1 Faculty of Science, Technology and Communications, MINE group, University of Luxembourg, Luxembourg [email protected] 2 School of Computer Science and Technology, Shandong University, Jinan, 250101, China cailing [email protected] Abstract. It is popular nowadays to bring techniques from bibliometrics and scientometrics into the world of digital libraries to explore mecha- nisms which underlie community development. In this paper we use the DBLP data to investigate the author’s scientific career, and analyze some of the computer science communities. We compare them in terms of pro- ductivity and population stability, and use these features to compare the sets of top-ranked conferences with their lower ranked counterparts.1 Keywords: bibliographic databases, author profiling, scientific commu- nities, bibliometrics. 1 Introduction Computer science is a broad and constantly growing field. It comprises various subareas each of which has its own specialization and characteristic features. While different in size and granularity, research areas and conferences can be thought of as scientific communities that bring together specialists sharing simi- lar interests. What is specific about conferences is that in addition to scope, par- ticipating scientists and regularity, they are also characterized by level. In each area there is a certain number of commonly agreed upon top ranked venues, and many others – with the lower rank or unranked. In this work we aim at finding out how the communities represented by different research fields and conferences are evolving and communicating to each other. To answer this question we sur- vey the development of the author career, compare various research areas to each other, and finally, try to identify features that would allow to distinguish between venues of different rank.
    [Show full text]
  • On Active Deductive Databases
    On Active Deductive Databases The Statelog Approach Georg Lausen Bertram Ludascher Wolfgang May Institut fur Informatik Universitat Freiburg Germany flausenludaeschmayginfo rmat ikun ifr eibur gde Abstract After briey reviewing the basic notions and terminology of active rules and relating them to pro duction rules and deductive rules resp ectively we survey a number of formal approaches to active rules Subsequentlywe present our own stateoriented logical approach to ac tive rules which combines the declarative semantics of deductive rules with the p ossibility to dene up dates in the style of pro duction rules and active rules The resulting language Statelog is surprisingly simple yet captures many features of active rules including comp osite eventde tection and dierent coupling mo des Thus it can b e used for the formal analysis of rule prop erties like termination and expressivepo wer Finally weshowhow nested transactions can b e mo deled in Statelog b oth from the op erational and the mo deltheoretic p ersp ective Intro duction Motivated by the need for increased expressiveness and the advent of new appli cations rules have b ecome very p opular as a paradigm in database programming since the late eighties Min Today there is a plethora of quite dierent appli cation areas and semantics for rules From a birdseye view deductive and active rules may b e regarded as two ends of a sp ectrum of database rule languages Deductive Rules Production Rules Active Rules higher level lower level stratied well RDL pro ARDL Ariel Starburst Postgres
    [Show full text]
  • Multidimensional Network Analysis
    Universita` degli Studi di Pisa Dipartimento di Informatica Dottorato di Ricerca in Informatica Ph.D. Thesis Multidimensional Network Analysis Michele Coscia Supervisor Supervisor Fosca Giannotti Dino Pedreschi May 9, 2012 Abstract This thesis is focused on the study of multidimensional networks. A multidimensional network is a network in which among the nodes there may be multiple different qualitative and quantitative relations. Traditionally, complex network analysis has focused on networks with only one kind of relation. Even with this constraint, monodimensional networks posed many analytic challenges, being representations of ubiquitous complex systems in nature. However, it is a matter of common experience that the constraint of considering only one single relation at a time limits the set of real world phenomena that can be represented with complex networks. When multiple different relations act at the same time, traditional complex network analysis cannot provide suitable an- alytic tools. To provide the suitable tools for this scenario is exactly the aim of this thesis: the creation and study of a Multidimensional Network Analysis, to extend the toolbox of complex network analysis and grasp the complexity of real world phenomena. The urgency and need for a multidimensional network analysis is here presented, along with an empirical proof of the ubiquity of this multifaceted reality in different complex networks, and some related works that in the last two years were proposed in this novel setting, yet to be systematically defined. Then, we tackle the foundations of the multidimensional setting at different levels, both by looking at the basic exten- sions of the known model and by developing novel algorithms and frameworks for well-understood and useful problems, such as community discovery (our main case study), temporal analysis, link prediction and more.
    [Show full text]
  • How Do Computer Scientists Use Google Scholar?: a Survey of User Interest in Elements on Serps and Author Profile Pages
    BIR 2019 Workshop on Bibliometric-enhanced Information Retrieval How do Computer Scientists Use Google Scholar?: A Survey of User Interest in Elements on SERPs and Author Profile Pages Jaewon Kim, Johanne R. Trippas, Mark Sanderson, Zhifeng Bao, and W. Bruce Croft RMIT University, Melbourne, Australia fjaewon.kim, johanne.trippas, mark.sanderson, zhifeng.bao, [email protected] Abstract. In this paper, we explore user interest in elements on Google Scholar's search engine result pages (SERPs) and author profile pages (APPs) through a survey in order to predict search behavior. We inves- tigate the effects of different query intents (keyword and title) on SERPs and research area familiarities (familiar and less-familiar) on SERPs and APPs. Our findings show that user interest is affected by the re- spondents' research area familiarity, whereas there is no effect due to the different query intents, and that users tend to distribute different levels of interest to elements on SERPs and APPs. We believe that this study provides the basis for understanding search behavior in using Google Scholar. Keywords: Academic search · Google Scholar · Survey study 1 Introduction Academic search engines (ASEs) such as Google Scholar1 and Microsoft Aca- demic2, which are specialized for searching scholarly literature, are now com- monly used to find relevant papers. Among the ASEs, Google Scholar covers about 80-90% of the publications in the world [7] and also provides individual profiles with three types of citation namely, total citation number, h-index, and h10-index. Although the Google Scholar website3 states that the ranking is de- cided by authors, publishers, numbers and times of citation, and even considers the full text of each document, there is a phenomenon called the Google Scholar effect [11].
    [Show full text]
  • PP-DBLP: Modeling and Generating Public-Private Attributed Networks with DBLP
    PP-DBLP: Modeling and Generating Public-Private Attributed Networks with DBLP Xin Huang∗, Jiaxin Jiang∗, Byron Choi∗, Jianliang Xu∗, Zhiwei Zhang∗, Yunya Song# ∗Department of Computer Science, Hong Kong Baptist University #Department of Journalism, Hong Kong Baptist University ∗fxinhuang, jxjian, bchoi, xujl, [email protected], #[email protected] Abstract—In many online social networks (e.g., Facebook, graph is visible only to the corresponding user. From each Google+, Twitter, and Instagram), users prefer to hide her/his users viewpoint, the social network is exactly the union of the partial or all relationships, which makes such private relation- public graph and her/his own private graph. Several sketching ships not visible to public users or even friends. This leads to a new graph model called public-private networks, where each and sampling approaches [2] have been proposed to address user has her/his own perspective of the network including the essential problems of graph processing, such as the size of private connections. Recently, public-private network analysis has reachability tree [5], all-pair shortest paths [6], pairwise node attracted significant research interest in the literature. A great similarities [7], correlation clustering [8] and so on. deal of important graph computing problems (e.g., shortest paths, centrality, PageRank, and reachability tree) has been studied. In social networks, vertices usually contain attributes, e.g., However, due to the limited data sources and privacy concerns, a person has information including name, interests, skills, proposed approaches are not tested on real-world datasets, but and so on. Recent study [2] focuses on one essential aspect on synthetic datasets by randomly selecting vertices as private of topological structure of public-private graphs.
    [Show full text]
  • The Best Nurturers in Computer Science Research
    The Best Nurturers in Computer Science Research Bharath Kumar M. Y. N. Srikant IISc-CSA-TR-2004-10 http://archive.csa.iisc.ernet.in/TR/2004/10/ Computer Science and Automation Indian Institute of Science, India October 2004 The Best Nurturers in Computer Science Research Bharath Kumar M.∗ Y. N. Srikant† Abstract The paper presents a heuristic for mining nurturers in temporally organized collaboration networks: people who facilitate the growth and success of the young ones. Specifically, this heuristic is applied to the computer science bibliographic data to find the best nurturers in computer science research. The measure of success is parameterized, and the paper demonstrates experiments and results with publication count and citations as success metrics. Rather than just the nurturer’s success, the heuristic captures the influence he has had in the indepen- dent success of the relatively young in the network. These results can hence be a useful resource to graduate students and post-doctoral can- didates. The heuristic is extended to accurately yield ranked nurturers inside a particular time period. Interestingly, there is a recognizable deviation between the rankings of the most successful researchers and the best nurturers, which although is obvious from a social perspective has not been statistically demonstrated. Keywords: Social Network Analysis, Bibliometrics, Temporal Data Mining. 1 Introduction Consider a student Arjun, who has finished his under-graduate degree in Computer Science, and is seeking a PhD degree followed by a successful career in Computer Science research. How does he choose his research advisor? He has the following options with him: 1. Look up the rankings of various universities [1], and apply to any “rea- sonably good” professor in any of the top universities.
    [Show full text]
  • Oracle Rdb™ SQL Reference Manual Volume 4
    Oracle Rdb™ SQL Reference Manual Volume 4 Release 7.3.2.0 for HP OpenVMS Industry Standard 64 for Integrity Servers and OpenVMS Alpha operating systems August 2016 ® SQL Reference Manual, Volume 4 Release 7.3.2.0 for HP OpenVMS Industry Standard 64 for Integrity Servers and OpenVMS Alpha operating systems Copyright © 1987, 2016 Oracle Corporation. All rights reserved. Primary Author: Rdb Engineering and Documentation group This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable: U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007).
    [Show full text]