Some Undecidable Implication Problems for Path Constraints

Total Page:16

File Type:pdf, Size:1020Kb

Some Undecidable Implication Problems for Path Constraints University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science April 1997 Some Undecidable Implication Problems for Path Constraints Peter Buneman University of Pennsylvania Wenfei Fan University of Pennsylvania Scott Weinstein University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/cis_reports Recommended Citation Peter Buneman, Wenfei Fan, and Scott Weinstein, "Some Undecidable Implication Problems for Path Constraints", . April 1997. University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-97-14. This paper is posted at ScholarlyCommons. https://repository.upenn.edu/cis_reports/84 For more information, please contact [email protected]. Some Undecidable Implication Problems for Path Constraints Abstract We present a class of path constraints of interest in connection with both structured and semi-structured databases, and investigate their associated implication problems. These path constraints are capable of expressing natural integrity constraints that are not only a fundamental part of the semantics of the data, but are also important in query optimization. We show that, despite the simple syntax of the constraints, the implication problem for the constraints is r.e. complete and the finite implication problem for the constraints is co-r.e. complete. Indeed, we establish the existence of a conservative reduction of the set of all first-order sentences to the path constraint language. Comments University of Pennsylvania Department of Computer and Information Science Technical Report No. MS- CIS-97-14. This technical report is available at ScholarlyCommons: https://repository.upenn.edu/cis_reports/84 Some Undecidable Implication Problems for Path Constraints y z Peter Buneman Wenfei Fan Scott Weinstein p etercentralcisup ennedu wfansaulcisup ennedu weinsteinlinccisup ennedu Department of Computer and Information Science University of Pennsylvania April Abstract We present a class of path constraints of interest in connection with b oth structured and semistructured databases and investigate their asso ciated implication problems These path constraints are capable of expressing natural integrity constraints that are not only a funda mental part of the semantics of the data but are also imp ortant in query optimization We show that despite the simple syntax of the constraints the implication problem for the constraints is re complete and the nite implication problem for the constraints is core complete Indeed we establish the existence of a conservative reduction of the set of all rstorder sentences to the path constraint language Intro duction Path inclusion constraints have b een studied in in the context of semistructured data Consider the following ob jectoriented schema class studentf Name string Taking setcourse g class coursef CName string setstudent Enrolled g Students setstudent Courses setcourse in which we assume that the declarations Students and Courses dene p ersistent entry p oints it stands this declaration do es not provide full information ab out the intended into the database As structure Given such a database we would exp ect the following informally stated constraints to hold This work was partly supp orted by the Army Research Oce DAAH and NSF Grant CCR y Supp orted by an IRCS graduate fellowship z Supp orted by NSF Grant CCR r Students Courses Students Courses Taking Taking Taking S1 C1 S2 C2 Enrolled Enrolled Enrolled Name CName Name CName "Smith" "Chem3" "Jones" "Phil4" Figure Representation of a studentcourse database a s S tudents c sT ak ing c C our ses nr ol l ed s S tudents b c C our ses s cE That is any course taken by a student must b e a course that o ccurs in the database extent of courses e and any student enrolled in a course must b e a student that similarly o ccurs in the database W shall call such constraints extent constraints We might also exp ect an inverse relationship to hold b etween Taking and Enrolled Ob ject oriented databases dier in the ways they enable one to state and enforce extent constraints and inverse relationships Compare for example O and Ob jectStore The presence of such 2 constraints is imp ortant b oth for database and for query optimization Let us develop a more formal notation for describing such constraints To do this we b orrow an idea that has b een exploited in semistructured data mo dels eg of regarding semistructured data as an edgelab eled graph The database consists of two sets and we express this by a ro ot no de r with edges emanating from it that are lab eled either Students or Courses These connect to no des that resp ectively represent students and courses which have edges emanating from them that resp ectively describ e the structure of students and courses For example a student has a single Name edge connected to a string no de and multiple Taking edges connected to course no des See Figure for an example of such a graph this representation of data we can examine certain kinds of constraints Using Extent Constraints By taking edge lab els as binary predicates constraints of the form a and b ab ove can b e stated as c s S tudentsr s T ak ing s c C our sesr c s c C our sesr c E nr ol l edc s S tudentsr s These constraints are examples of word constraints studied in the implication problems for word constraints were shown to b e decidable there verse Constraints These are common in ob jectoriented databases With resp ect to our In studentcourse schema the inverse b etween Taking and Enrolled is expressed as s c S tudentsr s T ak ing s c E nr ol l edc s c s C our sesr c E nr ol l edc s T ak ing s c Such constraints cannot b e expressed as word constraints or even by the more general path constraints given in Lo cal Database Constraints In database integration it is sometimes desirable to make one database a comp onent of another database or to build a database of databases Supp ose for example we wanted to bring together a numb er of studentcourse databases as describ ed ab ove We might write something like class SchoolDBf DBidentifier string Studentssetstudent as defined above Courses setcourse as defined above g Schools setSchoolDB Now we may want certain constraints to hold on comp onents of this database For example the we refer to extent constraints describ ed ab ove now hold on each memb er of the Schools set Here a comp onent database such as a memb er of the set Schools as a loc al database and its constraints our graph representation by adding Schools edges from as local database constraints Extending the new ro ot no de to the ro ots of lo cal databases the lo cal extent constraints are d c S chool sr d s S tudentsd s T ak ing s c C our sesd c d s S chool sr d c C our sesd c E nr ol l edc s S tudentsd s Again these cannot b e stated as path constraints of These considerations give rise to the question whether there is a natural generalization of the constraints of which will capture these slightly more complicated forms Here we consider a class of path constraints of either the form y r x x y x y x or the form x y r x x y y x where x y x y x y represents a path from no de x to no de y This class of path constraints can b e used to express all the constraints we have so far en countered For semistructured data in particular this class of constraints is useful not only for on this sub ject optimizing navigational queries but also for inferring structure see Surprisingly the implication problems for this mild generalization of word constraints are unde cidable However certain restricted cases are decidable in semistructured databases and these t to express at least the constraints we have describ ed ab ove In this pap er we cases are sucien establish the undecidability of the implication problems for this class of path constraints We defer to another pap er a full treatment of the decidability results Related work The idea of representing data as an edgelab eled graph and using paths to sp ecify navigational queries dates back to the early s Recently the idea has b een exploited and adapted to a variety of new database applications ranging from querying ob jectoriented databases eg XSQL OQLdo c to querying semistructured data eg UnQL Lorel WebSQL STRUQL There has also b een work in constraint languages dened in terms of paths for structured data as well as for semistructured data A class of functional constraints called path functional dependencies was prop osed in The axiomatizability and decidability of its asso ciated unrestricted implication problem were established in This constraint language generalizes functional dep endencies in the relational data mo del for semantic and ob ject oriented data mo dels It diers signicantly from ours which is a generalization of unary inclusion dep endencies in the relational mo del for b oth structured and semistructured data In a class of constraints for sp ecifying range restrictions asso ciated with paths called specialization constraints was prop osed for ob jectoriented data mo dels The axiomatizability and decidability of its asso ciated implication problems were established in A sp ecialization constraint asserts a typ e condition which is often sp ecied by a schema The central dierence b etween sp ecialization constraints and our path constraints is that sp ecialization constraints are
Recommended publications
  • Fast Data Analytics by Learning
    Fast Data Analytics by Learning by Yongjoo Park A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan 2017 Doctoral Committee: Associate Professor Michael Cafarella, Co-Chair Assistant Professor Barzan Mozafari, Co-Chair Associate Professor Eytan Adar Professor H. V. Jagadish Associate Professor Carl Lagoze Yongjoo Park [email protected] ORCID iD: 0000-0003-3786-6214 c Yongjoo Park 2017 To my mother ii Acknowledgements I thank my co-advisors, Michael Cafarella and Barzan Mozafari. I am deeply grateful that I could start my graduate studies with Mike Cafarella. Mike was the one of the nicest persons I have met (I can say, throughout my life). He also deeply cared of my personal life as well as my graduate study. He always encouraged and supported me so I could make the best decisions for my life. He was certainly more than a simple academic advisor. Mike also has excellent talents in presentation. His descriptions (on any subject) are well-organized and straightforward. Sometimes, I find myself trying to imitate his presentation styles when I give my own presentations, which, I believe, is natural since I have been advised by him for more than five years now. I started work with Barzan after a few years since I came to Michigan. He is excellent in writing research papers and presenting work in the most interesting, formal, and concise way. He devoted tremendous amount of time for me. I learned great amount of writing and presentation skills from him.
    [Show full text]
  • O. Peter Buneman Curriculum Vitæ – Jan 2008
    O. Peter Buneman Curriculum Vitæ { Jan 2008 Work Address: LFCS, School of Informatics University of Edinburgh Crichton Street Edinburgh EH8 9LE Scotland Tel: +44 131 650 5133 Fax: +44 667 7209 Email: [email protected] or [email protected] Home Address: 14 Gayfield Square Edinburgh EH1 3NX Tel: 0131 557 0965 Academic record 2004-present Research Director, Digital Curation Centre 2002-present Adjunct Professor of Computer Science, Department of Computer and Information Science, University of Pennsylvania. 2002-present Professor of Database Systems, Laboratory for the Foundations of Computer Science, School of Informatics, University of Edinburgh 1990-2001 Professor of Computer Science, Department of Computer and Information Science, University of Pennsylvania. 1981-1989 Associate Professor of Computer Science, Department of Computer and Information Science, University of Pennsylvania. 1981-1987 Graduate Chairman, Department of Computer and Information Science, University of Pennsyl- vania 1975-1981 Assistant Professor of Computer Science at the Moore School and Assistant Professor of Decision Sciences at the Wharton School, University of Pennsylvania. 1969-1974 Research Associate and subsequently Lecturer in the School of Artificial Intelligence, Edinburgh University. Education 1970 PhD in Mathematics, University of Warwick (Supervisor E.C. Zeeman) 1966 MA in Mathematics, Cambridge. 1963-1966 Major Scholar in Mathematics at Gonville and Caius College, Cambridge. Awards, Visting positions Distinguished Visitor, University of Auckland, 2006 Trustee, VLDB Endowment, 2004 Fellow of the Royal Society of Edinburgh, 2004 Royal Society Wolfson Merit Award, 2002 ACM Fellow, 1999. 1 Visiting Resarcher, INRIA, Jan 1999 Visiting Research Fellowship sponsored by the Japan Society for the Promotion of Science, Research Institute for the Mathematical Sciences, Kyoto University, Spring 1997.
    [Show full text]
  • RSE Fellows Ordered by Area of Expertise As at 11/10/2016
    RSE Fellows ordered by Area of Expertise as at 11/10/2016 HRH Prince Charles The Prince of Wales KG KT GCB Hon FRSE HRH The Duke of Edinburgh KG KT OM, GBE Hon FRSE HRH The Princess Royal KG KT GCVO, HonFRSE A1 Biomedical and Cognitive Sciences 2014 Professor Judith Elizabeth Allen FRSE, FMedSci, Professor of Immunobiology, University of Manchester. 1998 Dr Ferenc Andras Antoni FRSE, Honorary Fellow, Centre for Integrative Physiology, University of Edinburgh. 1993 Sir John Peebles Arbuthnott MRIA, PPRSE, FMedSci, Former Principal and Vice-Chancellor, University of Strathclyde. Member, Food Standards Agency, Scotland; Chair, NHS Greater Glasgow and Clyde. 2010 Professor Andrew Howard Baker FRSE, FMedSci, BHF Professor of Translational Cardiovascular Sciences, University of Glasgow. 1986 Professor Joseph Cyril Barbenel FRSE, Former Professor, Department of Electronic and Electrical Engineering, University of Strathclyde. 2013 Professor Michael Peter Barrett FRSE, Professor of Biochemical Parasitology, University of Glasgow. 2005 Professor Dame Sue Black DBE, FRSE, Director, Centre for Anatomy and Human Identification, University of Dundee. ; Director, Centre for Anatomy and Human Identification, University of Dundee. 2007 Professor Nuala Ann Booth FRSE, Former Emeritus Professor of Molecular Haemostasis and Thrombosis, University of Aberdeen. 2001 Professor Peter Boyle CorrFRSE, FMedSci, Former Director, International Agency for Research on Cancer, Lyon. 1991 Professor Sir Alasdair Muir Breckenridge CBE KB FRSE, FMedSci, Emeritus Professor of Clinical Pharmacology, University of Liverpool. 2007 Professor Peter James Brophy FRSE, FMedSci, Professor of Anatomy, University of Edinburgh. Director, Centre for Neuroregeneration, University of Edinburgh. 2013 Professor Gordon Douglas Brown FRSE, FMedSci, Professor of Immunology, University of Aberdeen. 2012 Professor Verity Joy Brown FRSE, Provost of St Leonard's College, University of St Andrews.
    [Show full text]
  • WWW2007 Program Guide
    AOL_AD_WWW2007-Open_FIXED.pdf 4/13/2007 3:13:25 PM C M Y CM MY CY CMY K www2007.org Message from the Chair of IW3C2 On behalf of the International World Wide Web Conference Steering Committee (IW3C2), I welcome you to the 16th conference of our series. As you already know, the World Wide Web was first conceived in 1989 by Tim Berners-Lee at CERN in Geneva, Switzerland. The first conference, WWW1, was held at CERN in 1994. The conference series aims to provide the world a premier forum about the evolution of the Web, the standardization of its associated technologies, and the Web’s impact on society and culture. These conferences bring together researchers, developers, users, and vendors – indeed all of you who are passionate about the Web and what it has to offer, now and in the future. The IW3C2 was founded by Joseph Hardin and Robert Cailliau in August 1994 and has been responsible for the International WWW Conference series ever since. Except for 1994 and 1995 when two conferences were held each year, WWWn became an annual event held in late April or early May. The location of the conferences rotates among North America, Europe, and Asia. In 2002 we changed the conference designator from a number (1 through 10) to the year it is held; i.e., WWW11 became known as WWW2002, and so on. You may browse our website http://www. iw3c2.org/ for information on past and future conferences. I wish to recognize our long-term partner, the World Wide Web Consortium (W3C), who has contributed a significant part of the conferences’ contents since the very beginning.
    [Show full text]
  • Data Quality: from Theory to Practice
    Data Quality: From Theory to Practice Wenfei Fan School of Informatics, University of Edinburgh, and RCBD, Beihang University [email protected] ABSTRACT “more than 25% of critical data in the world’s top compa- Data quantity and data quality, like two sides of a coin, are nies is flawed” [53], and “pieces of information perceived equally important to data management. This paper provides as being needed for clinical decisions were missing from an overview of recent advances in the study of data quality, 13.6% to 81% of the time” [76]. It is also estimated that from theory to practice. We also address challenges intro- “2% of records in a customer file become obsolete in one duced by big data to data quality management. month” [31] and hence, in a customer database, 50% of its records may be obsolete and inaccurate within two years. 1. INTRODUCTION Dirty data is costly. Statistics shows that “bad data or poor When we talk about big data, we typically emphasize the data quality costs US businesses $600 billion annually” [31], quantity (volume) of the data. We often focus on techniques “poor data can cost businesses 20%-35% of their operating that allow us to efficiently store, manage and query the data. revenue” [92], and that “poor data across businesses and the For example, there has been a host of work on developing government costs the US economy $3.1 trillion a year” [92]. scalable algorithms that, given a query Q and a dataset D, Worse still, when it comes to big data, the scale of the data compute query answers Q(D) when D is big.
    [Show full text]
  • Download the Trustees' Report and Financial Statements 2018-2019
    Science is Global Trustees’ report and financial statements for the year ended 31 March 2019 The Royal Society’s fundamental purpose, reflected in its founding Charters of the 1660s, is to recognise, promote, and support excellence in science and to encourage the development and use of science for the benefit of humanity. The Society is a self-governing Fellowship of distinguished scientists drawn from all areas of science, technology, engineering, mathematics and medicine. The Society has played a part in some of the most fundamental, significant, and life-changing discoveries in scientific history and Royal Society scientists – our Fellows and those people we fund – continue to make outstanding contributions to science and help to shape the world we live in. Discover more online at: royalsociety.org BELGIUM AUSTRIA 3 1 NETHERLANDS GERMANY 5 12 CZECH REPUBLIC SWITZERLAND 3 2 CANADA POLAND 8 1 Charity Case study: Africa As a registered charity, the Royal Society Professor Cheikh Bécaye Gaye FRANCE undertakes a range of activities that from Cheikh Anta Diop University 25 provide public benefit either directly or in Senegal, Professor Daniel Olago from the University of indirectly. These include providing financial SPAIN UNITED STATES Nairobi in Kenya, Dr Michael OF AMERICA 18 support for scientists at various stages Owor from Makerere University of their careers, funding programmes 33 in Uganda and Professor Richard that advance understanding of our world, Taylor from University College organising scientific conferences to foster London are working on ways to discussion and collaboration, and publishing sustain low-cost, urban water supply and sanitation systems scientific journals.
    [Show full text]
  • Duplication in Biological Databases Further Limit the Development of the Related Duplicate Detection Methods
    School of Computing and Information Systems The University of Melbourne DUPLICATIONINBIOLOGICAL DATABASES:DEFINITIONS, IMPACTSANDMETHODS Qingyu Chen ORCID ID: 0000-0002-6036-1516 Supervisors: Prof. Justin Zobel Prof. Karin Verspoor Submitted in total fulfilment of the requirements of the degree of Doctor of Philosophy Produced on archival quality paper August 2017 ABSTRACT Duplication is a pressing issue in biological databases. This thesis concerns duplication, in terms of its definitions (what records are duplicates), impacts (why duplicates are significant) and solutions (how to address duplication). The volume of biological databases is growing at an unprecedented rate, populated by complex records drawn from heterogeneous sources; the huge data volume and the diverse types cause concern for the underlying data quality. A specific challenge is dupli- cation, that is, the presence of redundant or inconsistent records. While existing studies concern duplicates, the definitions of duplicates are not clear; the foundational under- standing of what records are considered as duplicates by database stakeholders is lacking. The impacts of duplication are not clear either; existing studies have different or even inconsistent views on the impacts. The unclear definitions and impacts of duplication in biological databases further limit the development of the related duplicate detection methods. In this work, we refine the definitions of duplication in biological databases through a retrospective analysis of merged groups in primary nucleotide databases – the duplicates identified by record submitters and database staff (or biocurators) – to understand what types of duplicates matter to database stakeholders. This reveals two primary representa- tions of duplication under the context of biological databases: entity duplicates, multiple records belonging to the same entities, which particularly impact record submission and curation, and near duplicates (or redundant records), records sharing high similarities, particularly impact database search.
    [Show full text]
  • Adding Regular Expressions to Graph Reachability and Pattern Queries
    Front. Comput. Sci., 2012, 6(3): 313–338 DOI 10.1007/s11704-012-1312-y Adding regular expressions to graph reachability and pattern queries Wenfei FAN1,2, Jianzhong LI2, Shuai MA 3, Nan TANG4, Yinghui WU1 1 School of Informatics, University of Edinburgh, Edinburgh EH8 9YL, UK 2 Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, China 3 NLSDE Lab, Beihang University, Beijing 100919, China 4 Qatar Computing Research Institute, Qatar Foundation, Doha, Qatar c Higher Education Press and Springer-Verlag Berlin Heidelberg 2012 Abstract It is increasingly common to find graphs in which edges are of different types, indicating a variety of relation- 1 Introduction ships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is It is increasingly common to find data modeled as graphs in specified with a regular expression of a certain form, ex- a variety of areas, e.g., computer vision, knowledge discov- pressing the connectivity of a data graph via edges of var- ery, biology, chem-informatics, dynamic network traffic, so- ious types. In addition, we define graph pattern matching cial networks, semantic Web, and intelligence analysis. To based on a revised notion of graph simulation. On graphs in query data graphs, two classes of queries are being widely emerging applications such as social networks, we show that used: these queries are capable of finding more sensible informa- (a) Reachability queries, asking whether there exists a path tion than their traditional counterparts. Better still, their in- from one node to another [1–6].
    [Show full text]
  • Data Quality: from Theory to Practice
    Data Quality: From Theory to Practice Wenfei Fan School of Informatics, University of Edinburgh, and RCBD, Beihang University [email protected] ABSTRACT deed, some attributes of t1,t2 and t3 have become obso- 2 Data quantity and data quality, like two sides of a coin, lete and thus inaccurate. are equally important to data management. This paper provides an overview of recent advances in the study of data quality, from theory to practice. We also address The example shows that if the quality of the data is challenges introduced by big data to data quality man- bad, we cannot find correct query answers no matter agement. how scalable and efficient our query evaluation algo- rithms are. 1. INTRODUCTION Unfortunately, real-life data is often dirty: inconsis- When we talk about big data, we typically empha- tent, inaccurate, incomplete, obsolete and duplicated. size the quantity (volume) of the data. We often focus Indeed, “more than 25% of critical data in the world’s on techniques that allow us to efficiently store, manage top companies is flawed” [53], and “pieces of infor- and query the data. For example, there has been a host mation perceived as being needed for clinical decisions of work on developing scalable algorithms that, given a were missing from 13.6% to 81% of the time” [76]. It query Q and a dataset D, compute query answers Q(D) is also estimated that “2% of records in a customer file when D is big. becomeobsoletein one month”[31] and hence, in a cus- tomer database, 50% of its records may be obsolete and But can we trust Q(D) as correct query answers? inaccurate within two years.
    [Show full text]
  • RSE Fellows Ordered by Academic Discipline As at 15/04/2014
    RSE Fellows ordered by Academic Discipline as at 15/04/2014 HRH Prince Charles The Prince of Wales KG KT GCB Hon FRSE HRH The Duke of Edinburgh KG KT OM, GBE Hon FRSE HRH The Princess Royal KG KT GCVO, HonFRSE A1 Biomedical and Cognitive Sciences 2014 Professor Judith Elizabeth Allen FRSE, Professor of Immunobiology and Director of Research, School of Biological Sciences, University of Edinburgh. 1998 Dr Ferenc Andras Antoni FRSE, Head, Division of Preclinical Research, EGIS Pharmaceuticals plc. 1993 Sir John Peebles Arbuthnott MRIA PRSE, FMedSci, Former Principal and Vice-Chancellor, University of Strathclyde. Member, Food Standards Agency, Scotland; Chair, NHS Greater Glasgow and Clyde. 2010 Professor Andrew Howard Baker FRSE, Professor of Molecular Medicine, University of Glasgow. 1986 Professor Joseph Cyril Barbenel FRSE, Professor, Department of Electronic and Electrical Engineering, University of Strathclyde. 2013 Professor Michael Peter Barrett FRSE, Professor of Biochemical Parasitology, University of Glasgow. 2005 Professor Susan Margaret Black OBE FRSE, Director, Centre for Anatomy and Human Identification, University of Dundee. Director, Centre for Anatomy and Human Identification, University of Dundee. 2007 Professor Nuala Ann Booth FRSE, Personal Professor of Molecular Haemostasis and Thrombosis, University of Aberdeen. 2001 Professor Peter Boyle CorrFRSE, FMedSci, Former Director, International Agency for Research on Cancer, Lyon. 1991 Professor Sir Alasdair Muir Breckenridge CBE KB FRSE, FMedSci, Emeritus Professor of Clinical Pharmacology, University of Liverpool. 2007 Professor Peter James Brophy FRSE, FMedSci, Professor of Anatomy, University of Edinburgh. Director, Centre for Neuroregeneration, University of Edinburgh. 2013 Professor Gordon Douglas Brown FRSE, Professor of Immunology, University of Aberdeen. 2012 Professor Verity Joy Brown FRSE, Provost of St Leonard's College, University of St Andrews.
    [Show full text]
  • Download Version of Record (PDF / 2MB)
    Open Research Online The Open University’s repository of research publications and other research outputs Capturing and Exploiting Citation Knowledge for the Recommendation of Scientific Publications Thesis How to cite: Khadka, Anita (2020). Capturing and Exploiting Citation Knowledge for the Recommendation of Scientific Publications. PhD thesis The Open University. For guidance on citations see FAQs. c 2020 The Author https://creativecommons.org/licenses/by-nc-nd/4.0/ Version: Version of Record Link(s) to article on publisher’s website: http://dx.doi.org/doi:10.21954/ou.ro.00011b89 Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright owners. For more information on Open Research Online’s data policy on reuse of materials please consult the policies page. oro.open.ac.uk CAPTURINGANDEXPLOITINGCITATIONKNOWLEDGE FORTHERECOMMENDATIONOFSCIENTIFIC PUBLICATIONS Anita Khadka Thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy Knowledge Media Institute Faculty of Science, Technology, Engineering & Mathematics The Open University September 2020 ABSTRACT With the continuous growth of scientific literature, it is becoming in- creasingly challenging to discover relevant scientific publications from the plethora of available academic digital libraries. Despite the current scale, important efforts have been achieved towards the research and development of academic search engines, reference management tools, review management platforms, scientometrics systems, and recom- mender systems that help finding a variety of relevant scientific items, such as publications, books, researchers, grants and events, among others. This thesis focuses on recommender systems for scientific public- ations. Existing systems do not always provide the most relevant scientific publications to users, despite they are present in the recom- mendation space.
    [Show full text]