RDF Shape Induction Using Knowledge Base Profiling

Total Page:16

File Type:pdf, Size:1020Kb

RDF Shape Induction Using Knowledge Base Profiling RDF Shape Induction using Knowledge Base Profiling Nandana Mihindukulasooriya Mohammad Rifat Ahmmad Giuseppe Rizzo Universidad Politécnica de Madrid, Rashid Instituto Superiore Mario Boella, Italy Spain Politecnico di Torino, Italy [email protected] [email protected] [email protected] Raúl García-Castro Oscar Corcho Marco Torchiano Universidad Politécnica de Madrid, Universidad Politécnica de Madrid, Politecnico di Torino, Italy Spain Spain [email protected] [email protected] [email protected] ABSTRACT Ontology languages are designed for inferring new knowledge Knowledge Graphs (KGs) are becoming the core of most artificial using known axioms rather than for validating data based on ax- intelligent and cognitive applications. Popular KGs such as DBpedia ioms. A reasoner and a validator have different functions, i.e., a and Wikidata have chosen the RDF data model to represent their reasoner is used for inferring new knowledge while a validator is data. Despite the advantages, there are challenges in using RDF used for finding violations against a set of constraints. The under- data, for example, data validation. Ontologies for specifying domain pinning principles used in OWL, such as the use of Open World conceptualizations in RDF data are designed for entailments rather Assumption (OWA) and Non-Unique Name Assumption, can lead to than validation. Most ontologies lack the granular information unexpected results in a validator. Thus, it is challenging to perform needed for validating constraints. Recent work on RDF Shapes and certain validation tasks using such languages. standardization of languages such as SHACL and ShEX provide Validation against a pre-defined schema is one of the key steps better mechanisms for representing integrity constraints for RDF in most conventional data storage and publishing paradigms; for data. However, manually creating constraints for large KGs is still instance, validation against DDL constraints in SQL databases or a tedious task. In this paper, we present a data driven approach for XML Schema or RelaxNG in XML documents [27]. For RDF data, inducing integrity constraints for RDF data using data profiling. typically OWL or RDF Schema based ontologies are used for doing Those constraints can be combined into RDF Shapes and can be used this kind of validations. However, the use of ontologies for validat- to validate RDF graphs. Our method is based on machine learning ing RDF data leads to several problems: (a) the mismatch between techniques to automatically generate RDF shapes using profiled intended purposes, inference (for which the ontologies are designed RDF data as features. In the experiments, the proposed approach for) and validation, and (b) lack of information useful for validation achieved 97% precision in deriving RDF Shapes with cardinality in most of the ontologies. constraints for a subset of DBpedia data. In general, data quality remains a critical aspect to obtain trust by users. Data quality relates to the perception of the “fitness for use” CCS CONCEPTS in a given context [32]. Constraints are limitations incorporated on the data that are supposed to be satisfied all the time by instances • Information systems → Data cleaning; • Computing method- of the database [2]. They are useful for users to understand data ologies → Knowledge representation and reasoning; as they represent characteristics that data naturally exhibits [22]. KEYWORDS Data profiling is defined as “the activity of creating small but infor- mative summaries of a database” [17]. In this study, we present a RDF Shape, Knowledge Base, Data Quality, Machine Learning data-driven approach by applying machine learning techniques on profiled RDF data to automatically generate integrity constraints. 1 INTRODUCTION Then, a set of constraints related to a specific class are combined to generate an RDF Shape as a collection of constraints. These RDF Knowledge Graphs (KGs) are becoming the core of most artificial shapes can be used to validate RDF graphs. intelligent applications driven by domain knowledge. Popular KGs One of the main use cases for generating RDF Shapes from data is such as DBpedia, YAGO2, and Wikidata have chosen the RDF data quality assessment. In this work, we use DBpedia [21] (version 2016- model for knowledge representation. RDF is a graph-based data 04) as our target KB and we experiment on assessing the quality of model which is the de facto model in Semantic Web and Linked all dbo:Person instances in the DBpedia KB. The main requirement Data applications. RDF graphs can capture and represent domain for such quality assessment would be to identify the validation rules information in a semantically rich manner using ontologies. As that each person instance should conform to. For example, a person with any other data model, in order to specify the conditions that should have exactly one value for the “dbo:birthDate" property or must be satisfied by an RDF graph, constraints need to be imposed. the values of the “dbo:height" property should always be a positive In practical settings, constraints are commonly used for three main number. The instances of the Person class have more than 13,000 tasks: (a) specifying the properties that data should hold; (b) handle associated properties (including dbo, DBpedia ontology proper- contradictions within data or with respect to the domain under ties and dbp, auto-generated properties from Wikipedia infobox consideration; or (c) as a help for query optimization. keys). Thus, it is a tedious, time-consuming, and error-prone task Shapes Working Group1 and later on through the Shape Expres- to generate such validation rules manually. Some of the validation sion Community Group2. The outcomes of such efforts, languages rules can be encoded into the ontology, but it still requires a lot of such as W3C Shapes Constraint Language (SHACL) [19] and Shape manual effort. This work proposes an approach for inducing such Expressions Language [26], allow integrity constraints to be de- validation rules in the form of RDF shapes by profiling the data and fined for RDF data and to validate data. However, for largeRDF using inductive approaches to extract those rules. Other use cases knowledge bases such as DBpedia, which contains more than 750 for inducting shapes include describing the data (which is helpful classes and 60,000 properties (dbo: and dbp:), it is still a challenge to in generating queries or creating dynamic user interfaces). generate integrity constraints manually. In this context, the main The main contribution of this paper is a novel approach for research problem addressed in the paper is: How to automatically generating RDF Shapes based on inducing constraints using ma- generate integrity constraints for RDF data? chine learning algorithms and using data profiling information In this study, we propose a data-driven approach to automati- as features. The proposed approach is defined in a generic way cally generate RDF shapes for a given dataset by applying machine that applies to any type of constraint and is validated using two learning techniques to RDF data profiling information extracted common constraint types: (a) cardinality constraints, and (b) range from the dataset. constraints. The remainder of this paper is structured as follows: in Section 2 2.1 Objective we define the problem setting; in Section 3 we provide references Using the information extracted from statistical data profilers, our and a background on related research activities; in Section 4 we objective is to generate representative RDF Shapes for the data in a describe our approach and in Section we report the experimental given dataset. We generate shapes at the class-level. An example results 5 achieved by our approach in a controlled setup (Section 6). excerpt of RDF Shape in SHACL for the dbo:Person class is illustrated We conclude in Section 7 providing insights on future research in Listing 1. endeavours. Listing 1: A snippet from an example Person shape @prefix dbo: <http://dbpedia.org/ontology/> . 2 PROBLEM DEFINITION @prefix sh: <http://www.w3.org/ns/shacl#> . A knowledge graph is defined to be consistent if it does not contain conflicting or contradictory data [16]. When a schema is available ex:DBpediaPerson a sh:NodeShape; with the integrity constraints that the data should comply, the sh:targetClass dbo:Person; data usually goes through a validation process that verifies the sh:property [sh:path foaf:name; compliance against those constraints. These integrity constraints sh:minCount 1; encapsulate the consistency requirements for the data in order sh:nodeKind sh:Literal ]; to be fit to a given set of use cases. For instance, in relational sh:property [ sh:path dbo:birthDate; databases, integrity constraints are expressed in a data definition sh:datatype xsd:date ; language (DDL), and the database management systems (DBMS) sh:minCount 1; ensure that any data inserted into the database will not lead to any sh:maxCount 1; inconsistency in the entire database. However, validation of RDF sh:nodeKind sh:Literal ] ; data is not done in this manner due to several reasons such as the sh:property [ sh:path dbo:birthPlace; lack of a language for expressing constraints (RDFS and OWL are sh:datatype dbo:Place; designed for entailment and not for validation as discussed in the sh:nodeKind sh:BlankNodeOrIRI; introduction), or having generic models suitable for wider use and sh:minCount 1; not for specific use cases. sh:maxCount 1 ] . When an ontology is available with TBox axioms that define the conceptualization of the domain, a reasoner can be used to verify To this end, a large number of special purpose systems and whether the dataset is consistent with the domain by verifying the techniques have been developed for solving such constraint-based axioms defined in the ontology. However, most of the ontologies mining and learning problems [10]. Constraints can be used both in do not have rich axioms that could help to detect inconsistencies a descriptive setting, such as discovering association rules [3], or in in data.
Recommended publications
  • A Comparative Evaluation of Geospatial Semantic Web Frameworks for Cultural Heritage
    heritage Article A Comparative Evaluation of Geospatial Semantic Web Frameworks for Cultural Heritage Ikrom Nishanbaev 1,* , Erik Champion 1,2,3 and David A. McMeekin 4,5 1 School of Media, Creative Arts, and Social Inquiry, Curtin University, Perth, WA 6845, Australia; [email protected] 2 Honorary Research Professor, CDHR, Sir Roland Wilson Building, 120 McCoy Circuit, Acton 2601, Australia 3 Honorary Research Fellow, School of Social Sciences, FABLE, University of Western Australia, 35 Stirling Highway, Perth, WA 6907, Australia 4 School of Earth and Planetary Sciences, Curtin University, Perth, WA 6845, Australia; [email protected] 5 School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Perth, WA 6845, Australia * Correspondence: [email protected] Received: 14 July 2020; Accepted: 4 August 2020; Published: 12 August 2020 Abstract: Recently, many Resource Description Framework (RDF) data generation tools have been developed to convert geospatial and non-geospatial data into RDF data. Furthermore, there are several interlinking frameworks that find semantically equivalent geospatial resources in related RDF data sources. However, many existing Linked Open Data sources are currently sparsely interlinked. Also, many RDF generation and interlinking frameworks require a solid knowledge of Semantic Web and Geospatial Semantic Web concepts to successfully deploy them. This article comparatively evaluates features and functionality of the current state-of-the-art geospatial RDF generation tools and interlinking frameworks. This evaluation is specifically performed for cultural heritage researchers and professionals who have limited expertise in computer programming. Hence, a set of criteria has been defined to facilitate the selection of tools and frameworks.
    [Show full text]
  • From the Semantic Web to Social Machines
    ARTICLE IN PRESS ARTINT:2455 JID:ARTINT AID:2455 /REV [m3G; v 1.23; Prn:25/11/2009; 12:36] P.1 (1-6) Artificial Intelligence ••• (••••) •••–••• Contents lists available at ScienceDirect Artificial Intelligence www.elsevier.com/locate/artint From the Semantic Web to social machines: A research challenge for AI on the World Wide Web ∗ Jim Hendler a, , Tim Berners-Lee b a Tetherless World Constellation, RPI, United States b Computer Science and AI Laboratory, MIT, United States article info abstract Article history: The advent of social computing on the Web has led to a new generation of Web Received 24 September 2009 applications that are powerful and world-changing. However, we argue that we are just Received in revised form 1 October 2009 at the beginning of this age of “social machines” and that their continued evolution and Accepted 1 October 2009 growth requires the cooperation of Web and AI researchers. In this paper, we show how Available online xxxx the growing Semantic Web provides necessary support for these technologies, outline the challenges we see in bringing the technology to the next level, and propose some starting places for the research. © 2009 Elsevier B.V. All rights reserved. Much has been written about the profound impact that the World Wide Web has had on society. Yet it is primarily in the past few years, as more interactive “read/write” technologies (e.g. Wikis, blogs and photo/video sharing) and social network- ing sites have proliferated, that the truly profound nature of this change is being felt. From the very beginning, however, the Web was designed to create a network of humans changing society empowered using this shared infrastructure.
    [Show full text]
  • Deciding SHACL Shape Containment Through Description Logics Reasoning
    Deciding SHACL Shape Containment through Description Logics Reasoning Martin Leinberger1, Philipp Seifer2, Tjitze Rienstra1, Ralf Lämmel2, and Steffen Staab3;4 1 Inst. for Web Science and Technologies, University of Koblenz-Landau, Germany 2 The Software Languages Team, University of Koblenz-Landau, Germany 3 Institute for Parallel and Distributed Systems, University of Stuttgart, Germany 4 Web and Internet Science Research Group, University of Southampton, England Abstract. The Shapes Constraint Language (SHACL) allows for for- malizing constraints over RDF data graphs. A shape groups a set of constraints that may be fulfilled by nodes in the RDF graph. We investi- gate the problem of containment between SHACL shapes. One shape is contained in a second shape if every graph node meeting the constraints of the first shape also meets the constraints of the second. Todecide shape containment, we map SHACL shape graphs into description logic axioms such that shape containment can be answered by description logic reasoning. We identify several, increasingly tight syntactic restrictions of SHACL for which this approach becomes sound and complete. 1 Introduction RDF has been designed as a flexible, semi-structured data format. To ensure data quality and to allow for restricting its large flexibility in specific domains, the W3C has standardized the Shapes Constraint Language (SHACL)5. A set of SHACL shapes are represented in a shape graph. A shape graph represents constraints that only a subset of all possible RDF data graphs conform to. A SHACL processor may validate whether a given RDF data graph conforms to a given SHACL shape graph. A shape graph and a data graph that act as a running example are pre- sented in Fig.
    [Show full text]
  • The Application of Semantic Web Technologies to Content Analysis in Sociology
    THEAPPLICATIONOFSEMANTICWEBTECHNOLOGIESTO CONTENTANALYSISINSOCIOLOGY MASTER THESIS tabea tietz Matrikelnummer: 749153 Faculty of Economics and Social Science University of Potsdam Erstgutachter: Alexander Knoth, M.A. Zweitgutachter: Prof. Dr. rer. nat. Harald Sack Potsdam, August 2018 Tabea Tietz: The Application of Semantic Web Technologies to Content Analysis in Soci- ology, , © August 2018 ABSTRACT In sociology, texts are understood as social phenomena and provide means to an- alyze social reality. Throughout the years, a broad range of techniques evolved to perform such analysis, qualitative and quantitative approaches as well as com- pletely manual analyses and computer-assisted methods. The development of the World Wide Web and social media as well as technical developments like optical character recognition and automated speech recognition contributed to the enor- mous increase of text available for analysis. This also led sociologists to rely more on computer-assisted approaches for their text analysis and included statistical Natural Language Processing (NLP) techniques. A variety of techniques, tools and use cases developed, which lack an overall uniform way of standardizing these approaches. Furthermore, this problem is coupled with a lack of standards for reporting studies with regards to text analysis in sociology. Semantic Web and Linked Data provide a variety of standards to represent information and knowl- edge. Numerous applications make use of these standards, including possibilities to publish data and to perform Named Entity Linking, a specific branch of NLP. This thesis attempts to discuss the question to which extend the standards and tools provided by the Semantic Web and Linked Data community may support computer-assisted text analysis in sociology. First, these said tools and standards will be briefly introduced and then applied to the use case of constitutional texts of the Netherlands from 1884 to 2016.
    [Show full text]
  • The Web of Data for E-Commerce: Schema.Org and Goodrelations for Researchers and Practitioners
    The Web of Data for E-Commerce: Schema.org and GoodRelations for Researchers and Practitioners Martin Hepp() E-Business & Web Science Research Group, Universität der Bundeswehr München, Werner-Heisenberg-Weg 39, 85577, Neubiberg, Germany [email protected] Abstract. Schema.org is one of the main drivers for the adoption of Semantic Web principles by a broad number of organizations and individuals for real business needs. GoodRelations is a well-established conceptual model for representing e-commerce information, one of the few widely used OWL DL on- tologies, and since 2012 the official e-commerce model of schema.org. In this tutorial, we will (1) give a comprehensive overview and hands-on training on the advanced conceptual structures of schema.org for e-commerce, including patterns for ownership and demand, (2) present the full tool chain for producing and consuming respective data, (3) explain the long-term vision of Linked Open Commerce, and (4) discuss advanced topics, like access control, identity and authentication (e.g. with WebID); micropayment services, and data management issues from the publisher and consumer perspective. We will also cover research opportunities resulting from the growing adoption and the re- spective amount of data in RDFa, Microdata, and JSON-LD syntaxes. Keywords: Schema.org · GoodRelations · Semantic Web · Ontologies · Micro- data · OWL · RDFa · JSON-LD · Linked Open Data · E-Commerce · E-Business 1 Introduction Schema.org [1] is one of the main drivers for the adoption of Semantic Web principles by a broad number of people for their real business needs. The resulting amount of real-world RDF1 data exceeds a critical mass so that it becomes interesting as reference data for any kind of foundational Semantic Web research.
    [Show full text]
  • A Celebration of 10 Years of the Science of the Web WEB SCIENCE TRUST BOARD Board Members
    2006-2016 A celebration of 10 Years of the Science of the Web WEB SCIENCE TRUST BOARD Board Members Professor Dame Wendy Hall JP Rangaswami Professor Sir Nigel Shadbolt Professor George Metakides Professor James Hendler John Taysom Professor Noshir Contractor Daniel J Weitzner Fellows and Advisors Professor Bebo White Web Science Champion Sir John Taylor Professor Sir Tim Berners-Lee Senior Fellow Senior Fellow Anni Rowland-Campbell Baroness Rennie Fritchie Advisor Patron We also wish to acknowledge the contribution of colleagues who acted as supporters and research fellows for the fore- runner to the Web Science Trust, the Web Science Research Initiative (WSRI). CELEBRATING 10 YEARS OF WEB SCIENCE 2016 marks the tenth anniversary of the academic discipline of Web Science. It was in 2006 that the paper ‘Creating a Science of the Web’ appeared in the journal Science. The paper’s authors: Tim Berners-Lee, Wendy Hall, James Hendler, Nigel Shadbolt, and Daniel Weitzner, set out their concerns about the future direction of the Web, and emphasized the need to establish a clear research agenda ‘aimed at understanding the current, evolving, and potential Web’: “If we want to model the Web; if we want to understand the architectural principles that have provided for its growth; and if we want to be sure that it supports the basic social values of trustworthiness, privacy, and respect for social boundaries, then we must chart out a research agenda that targets the Web as a primary focus of attention.” The authors called for the new discipline of Web Science role in shaping appropriate policy directives, as well as to be inherently interdisciplinary, to tackle research enabling a better understanding of the central importance challenges around ownership and access to data, and to of the Web in all our lives.
    [Show full text]
  • A Manifesto for Web Science? Susan Halford, Cathy Pope, Leslie Carr University of Southampton Southampton United Kingdom
    A Manifesto for Web Science? Susan Halford, Cathy Pope, Leslie Carr University of Southampton Southampton United Kingdom [email protected] Categories and Subject Descriptors The call for Web Science insists that we open up this H.3.5 [Information Storage and Retrieval]: Online space. In doing so, a flag has been planted. Hendler, Information Services Berners-Lee et al have named this territory for web science and have begun to map it from their vantage point in General Terms Computer Science. But – and as they would be the first to Management, Documentation, Economics, Security, acknowledge – this is only one vantage point. Other Human Factors, Legal Aspects. disciplines will add new perspectives and interpretations. However, it is by no means certain that we will all agree Keywords about what we see. For whilst we might all agree that Web Web science Science cannot develop without inter-disciplinarity, we 1. INTRODUCTION should be clear from the beginning that this is no simple matter. We need to be realistic about what we are getting A clarion call for a new science of the web has been ourselves into. There will be big challenges in making sounded in the pages of CACM (Hendler et al 2008) and ourselves understood to each other and developing elsewhere in path-breaking papers by Berners-Lee et al collaborative understandings will require us to leave the (2006a, 2006b). These authors point to a paradox: despite comfort of our disciplinary silos. But, the promise of new the huge effect that the web has had on computing – not to forms of knowledge and understanding that are bigger than mention the world – computer scientists rarely study the the sum of our parts are gains worth working for.
    [Show full text]
  • Coverage of Academic Citation Databases Compared with Coverage
    The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/1468-4527.htm Coverage of Coverage of academic citation academic databases compared with citation coverage of scientific social media databases Personal publication lists as 255 calibration parameters Received 24 July 2014 Fourth revision approved Fee Hilbert, Julia Barth, Julia Gremm, Daniel Gros, Jessica Haiter, 5 January 2015 Maria Henkel, Wilhelm Reinhardt and Wolfgang G. Stock Department of Information Science, Heinrich Heine University, Dusseldorf, Germany Abstract Purpose – The purpose of this paper is to show how the coverage of publications is represented in information services. Academic citation databases (Web of Science, Scopus, Google Scholar) and scientific social media (Mendeley, CiteULike, BibSonomy) were analyzed by applying a new method: the use of personal publication lists of scientists. Design/methodology/approach – Personal publication lists of scientists of the field of information science were analyzed. All data were taken in collaboration with the scientists in order to guarantee complete publication lists. Findings – The demonstrated calibration parameter shows the coverage of information services in the field of information science. None of the investigated databases reached a coverage of 100 percent. However Google Scholar covers a greater amount of publications than other academic citation databases and scientific social media. Research limitations/implications – Results were limited to the publications of scientists working at an information science department from 2003 to 2012 at German-speaking universities. Practical implications – Scientists of the field of information science are encouraged to review their publication strategy in case of quality and quantity. Originality/value – The paper confirms the usefulness of personal publication lists as a calibration parameter for measuring coverage of information services.
    [Show full text]
  • Astrea: Automatic Generation of SHACL Shapes from Ontologies
    Astrea: automatic generation of SHACL shapes from ontologies Andrea Cimmino[0000−0002−1823−4484], Alba Fern´andez-Izquierdo[0000−0003−2011−3654], and Ra´ulGarc´ıa-Castro[0000−0002−0421−452X] Ontology Engineering Group Universidad Polit´ecnicade Madrid cimmino,albafernandez,[email protected] Abstract. Knowledge Graphs (KGs) that publish RDF data modelled using ontologies in a wide range of domains have populated the Web. The SHACL language is a W3C recommendation that has been endowed to encode a set of either value or model data restrictions that aim at validat- ing KG data, ensuring data quality. Developing shapes is a complex and time consuming task that is not feasible to achieve manually. This arti- cle presents two resources that aim at generating automatically SHACL shapes for a set of ontologies: (1) Astrea-KG, a KG that publishes a set of mappings that encode the equivalent conceptual restrictions among ontology constraint patterns and SHACL constraint patterns, and (2) Astrea, a tool that automatically generates SHACL shapes from a set of ontologies by executing the mappings from the Astrea-KG. These two resources are openly available at Zenodo, GitHub, and a web application. In contrast to other proposals, these resources cover a large number of SHACL restrictions producing both value and model data restrictions, whereas other proposals consider only a limited number of restrictions or focus only on value or model restrictions. Keywords: SHACL Shapes · RDF Validation · Ontology Resource type: Dataset & Software Astrea-KG: http://astrea.helio.linkeddata.es/ Astrea-KG DOI: https://doi.org/10.5281/zenodo.3571009 Astrea application: http://astrea.linkeddata.es/ 1 Introduction Knowledge Graphs (KGs) are becoming pervasive on the Web [5].
    [Show full text]
  • Public and Private Research: Current State of German Publications
    A Service of Leibniz-Informationszentrum econstor Wirtschaft Leibniz Information Centre Make Your Publications Visible. zbw for Economics Stahlschmidt, Stephan; Stephen, Dimity; Hinze, Sybille Research Report Performance and structures of the German science system Studien zum deutschen Innovationssystem, No. 5-2019 Provided in Cooperation with: Expertenkommission Forschung und Innovation (EFI) Suggested Citation: Stahlschmidt, Stephan; Stephen, Dimity; Hinze, Sybille (2019) : Performance and structures of the German science system, Studien zum deutschen Innovationssystem, No. 5-2019, Expertenkommission Forschung und Innovation (EFI), Berlin This Version is available at: http://hdl.handle.net/10419/194275 Standard-Nutzungsbedingungen: Terms of use: Die Dokumente auf EconStor dürfen zu eigenen wissenschaftlichen Documents in EconStor may be saved and copied for your Zwecken und zum Privatgebrauch gespeichert und kopiert werden. personal and scholarly purposes. Sie dürfen die Dokumente nicht für öffentliche oder kommerzielle You are not to copy documents for public or commercial Zwecke vervielfältigen, öffentlich ausstellen, öffentlich zugänglich purposes, to exhibit the documents publicly, to make them machen, vertreiben oder anderweitig nutzen. publicly available on the internet, or to distribute or otherwise use the documents in public. Sofern die Verfasser die Dokumente unter Open-Content-Lizenzen (insbesondere CC-Lizenzen) zur Verfügung gestellt haben sollten, If the documents have been made available under an Open gelten abweichend von diesen Nutzungsbedingungen die in der dort Content Licence (especially Creative Commons Licences), you genannten Lizenz gewährten Nutzungsrechte. may exercise further usage rights as specified in the indicated licence. www.econstor.eu Performance and Structures of the German Science System Stephan Stahlschmidt, Dimity Stephen and Sybille Hinze Studien zum deutschen Innovationssystem Nr.
    [Show full text]
  • Data Standards and Guidelines 05/11/2020 Online Event
    Artificial Intelligence and Augmented Intelligence for Automated Investigations for Scientific Discovery The Physical Sciences Data-Science Service (PSDS) Patterns Failed it to Nailed it: Data Standards and Guidelines 05/11/2020 Online Event Dr Samantha Kanza & Dr Nicola Knight University of Southampton 10/12/2020 AI3SD-Event-Series:Report-20 Failed it to Nailed it: Data Standards and Guidelines AI3SD-Event-Series:Report-20 10/12/2020 DOI: 10.5258/SOTON/P0033 Published by University of Southampton Network: Artificial Intelligence and Augmented Intelligence for Automated Invest- igations for Scientific Discovery This Network+ is EPSRC Funded under Grant No: EP/S000356/1 Principal Investigator: Professor Jeremy Frey Co-Investigator: Professor Mahesan Niranjan Network+ Coordinator: Dr Samantha Kanza An EPSRC National Research Facility to facilitate Data Science in the Physical Sciences: The Physical Sciences Data science Service (PSDS) This Facility is EPSRC Funded under Grant No: EP/S020357/1 Principal Investigator: Professor Simon Coles Co-Investigators: Dr Brian Matthews, Dr Juan Bicarregui & Professor Jeremy Frey Contents 1 Event Details1 2 Event Summary and Format1 3 Event Background1 4 Talks 2 4.1 Metadata........................................2 4.1.1 Introduction to Metadata...........................2 4.1.2 Introduction to FAIR.............................2 4.1.3 Data generation, data standards, and metadata capture in drug discovery - Dr Martin-Immanuel Bittner (Arctoris)..................2 4.2 Open Data.......................................8 4.2.1 Introduction to Open Data..........................8 4.2.2 Giving your open data the best chance to realise its potential - Mr Chris Gutteridge (University of Southampton)...................8 4.3 Linked Data....................................... 13 4.3.1 Introduction to Semantic Web Technologies................
    [Show full text]
  • Network Science, Web Science and Internet Science: Comparing Interdisciplinary Areas
    Network Science, Web Science and Internet Science: comparing interdisciplinary areas Thanassis Tiropanis1, Wendy Hall1, Jon Crowcroft2, Noshir Contractor3, Leandros Tassiulas4 1 University of Southampton 2 University of Cambridge 3 Northwestern University 4 University of Thessaly ABSTRACT In this paper, we examine the fields of Network Science, Web Science and Internet Science. All three areas are interdisciplinary, and since the Web is based on the Internet, and both the Web and the Internet are networks, there is perhaps confusion about the relationship between them. We study the extent of overlap and ask whether one includes the others, or whether they are all part of the same larger domain. This paper provides an account of the emergence of each of these areas and outlines a framework for comparison. Based on this framework, we discuss these overlaps and propose directions for harmonization of research activities. INTRODUCTION The observation of patterns that characterise networks, from biological to technological and social, and the impact of the Web and the Internet on society and business have motivated interdisciplinary research to advance our understanding of these systems. Their study has been the subject of Network Science research for a number of years. However, more recently we have witnessed the emergence of two new interdisciplinary areas: Web Science and Internet Science. Network Science Network science can be traced to its mathematical origins dating back to Leonard Euler’s seminal work on graph theory (Euler, 1741) in the 18th century and to its social scientific origins two centuries later by the psychiatrist Jacob Moreno’s (1953) efforts to develop “sociometry.” Soon thereafter, the mathematical framework offered by graph theory was also picked up by psychologists (Bavelas, 1950), anthropologists (Mitchell, 1956), and other social scientists to create an interdiscipline called Social Networks.
    [Show full text]