Database Queries – Logic and Complexity
Moshe Y. Vardi, Rice University

Mathematical logic emerged during the early part of the 20th century, out of a foundational investigation of mathematics, as the basic language of mathematics. In 1970 Codd proposed the relational database model, based on mathematical logic: logical structures offer a way to model data, while logical formulas offer a way to express database queries. This proposal gave rise to a multi-billion dollar relational database industry as well as a rich theory of logical query languages. This talk will offer an overview of how mathematical logic came to provide foundations for one of today's most important technologies, and show how the theory of logical queries offers deep insights into the computational complexity of evaluating relational queries.

Moshe Y. Vardi is the George Distinguished Service Professor in Computational Engineering and Director of the Ken Kennedy Institute for Information Technology at Rice University. He is the co-recipient of three IBM Outstanding Innovation Awards, the ACM SIGACT Goedel Prize, the ACM Kanellakis Award, the ACM SIGMOD Codd Award, the Blaise Pascal Medal, and the IEEE Computer Society Goode Award. He is the author or co-author of over 400 papers, as well as two books: Reasoning about Knowledge and Finite Model Theory and Its Applications. He is a Fellow of the Association for Computing Machinery, the American Association for Artificial Intelligence, the American Association for the Advancement of Science, and the Institute of Electrical and Electronics Engineers. He is a member of the US National Academy of Engineering, the American Academy of Arts and Sciences, the European Academy of Sciences, and Academia Europaea. He holds honorary doctorates from Saarland University in Germany and Orleans University in France. He is the Editor-in-Chief of the Communications of the ACM.

Scientific Data Management: Not your everyday transaction
Anastasia Ailamaki, EPFL Lausanne

Today's scientific processes heavily depend on fast and accurate analysis of experimental data. Scientists are routinely overwhelmed by the effort needed to manage the volumes of data produced either by observing phenomena or by sophisticated simulations. As database systems have proven inefficient, inadequate, or insufficient to meet the needs of scientific applications, the scientific community typically uses special-purpose legacy software. When compared to a general-purpose data management system, however, application-specific systems require more resources to maintain, and in order to achieve acceptable performance they often sacrifice data independence and hinder the reuse of knowledge. With the exponential growth of dataset sizes, data management technologies are no longer a luxury; they are the sole solution for scientific applications. I will discuss some of the work from teams around the world and the requirements of their applications, as well as how these translate to challenges for the data management community. As an example I will describe a challenging application on brain simulation data, and its needs; I will then present how we were able to simulate a meaningful percentage of the human brain as well as access arbitrary brain regions fast, independently of increasing data size or density. Finally I will present some of the data management challenges that lie ahead in domain sciences.

Anastasia Ailamaki is a Professor of Computer Sciences at the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland. Her research interests are in database systems and applications, and in particular (a) in strengthening the interaction between the database software and emerging hardware and I/O devices, and (b) in automating database management to support computationally demanding and data-intensive scientific applications.
She has received a Finmeccanica endowed chair from the Computer Science Department at Carnegie Mellon (2007), a European Young Investigator Award from the European Science Foundation (2007), an Alfred P. Sloan Research Fellowship (2005), seven best-paper awards at top conferences (2001-2011), and an NSF CAREER award (2002). She earned her Ph.D. in Computer Science from the University of Wisconsin-Madison in 2000. She is a member of IEEE and ACM, and has also been a CRA-W mentor.

Open Data
François Bancilhon

Open Data consists of making PSI (public sector information) available to the general public and to private and public organizations for access and reuse. More and more open data is becoming available in most democratic countries following the launch of the data.gov initiative in the US in 2009. The availability of this new information brings a number of opportunities and raises a number of challenges. The opportunities are the new applications that companies and organizations can build using this data and the new understanding given to the people who access it. The challenges are the following: most of this data is in a poor format (poorly structured XLS tables or in some cases even PDF), it is often of poor quality, and it is fragmented into thousands or millions of files with duplicate and/or complementary information. To use these fragmented, poorly structured and poor-quality files, several approaches can be used, not necessarily mutually exclusive. One is to move the intelligence from the data into the application and to develop search-based applications which directly manage the data as is. Another is to bring some order to the data using a semantic web approach: converting the data into RDF, identifying entities and linking them from one data set to another. A final one is to structure the data by aligning data sets on common attributes and structures, to get closer to a uniform database schema.
François is currently CEO of Data Publica, a key actor in the Open Data space in France, and CEO of the Mobile Services Initiative for INRIA. He has co-founded and/or managed several software startups in France and in the US (Data Publica, Mandriva, Arioso, Xyleme, Ucopia, O2 Technology). Before becoming an entrepreneur, François was a researcher and a university professor, in France and the US, specializing in database technology. François holds an engineering degree from the École des Mines de Paris, a PhD from the University of Michigan and a Doctorate from the University of Paris XI.

Web archiving
Julien Masanès

The Web represents the largest source of open information ever produced in history. Larger than the printed sphere by several orders of magnitude, it also exhibits specific characteristics compared to traditional media, such as its collaborative editing, in which a large fraction of humanity participates, even marginally, its complex dynamics, and the paradoxical nature of the traces it conveys, both ubiquitous and fragile at the same time. These unique features have also made the Web a major source for modern information, analysis and study, and the capacity to preserve its memory an important issue for the future. But these features also require laying new methodological and practical foundations in the well-established field of cultural artefact preservation. This presentation will outline the salient properties of the Web viewed from the somewhat different angle of its preservation and offer some insight into how its memory can be built to serve science in the future.

Julien Masanès is Director of the Internet Memory, a non-profit foundation for web preservation and digital cultural access. Before this he directed the Web Archiving Project at the Bibliothèque Nationale de France from 2000. He also actively participated in the creation of the International Internet Preservation Consortium (IIPC), which he coordinated during its first two years.
He contributes to various national and international initiatives and provides advice to the European Commission as an expert in the domain of digital preservation and web archiving. He has also launched and presently chairs the International Web Archiving Workshop (IWAW) series, the main international rendezvous in this field. Julien Masanès studied Philosophy and Cognitive Science, gaining his MS in Philosophy from the Sorbonne in 1992 and his MS in Cognitive Science from the Ecole des Hautes Etudes en Sciences Sociales (EHESS) in 1994. In 2000 he gained an MS in librarianship at the Ecole Nationale Supérieure des Sciences de l'information et des Bibliothèques (ENSSIB).

Static Analysis and Verification
Victor Vianu, U.C. San Diego

Correctness and good performance are essential desiderata for database systems and the many applications relying on databases. Indeed, bugs and performance problems are commonly encountered in such systems and can range from annoying to catastrophic. Static analysis and verification provide tools for automatic reasoning about queries and applications in order to guarantee desirable behavior. Unfortunately, such reasoning, carried out by programs that take other programs as input, quickly runs up against fundamental limitations of computing. In the cases when it is feasible, it often requires a sophisticated mix of techniques from logic and automata theory. This talk will discuss some of the challenges and intrinsic limitations of static analysis and verification and identify situations where it can be very effective.

Victor Vianu is a Professor of Computer Science at the University of California, San Diego. He received his PhD in Computer Science from the University of Southern California in 1983. He has spent sabbaticals