Modelling and Computing the Quality of Scientific Information on the Web of Data

Total Page:16

File Type:pdf, Size:1020Kb

Modelling and Computing the Quality of Scientific Information on the Web of Data MODELLING AND COMPUTING THE QUALITY OF SCIENTIFIC INFORMATION ON THE WEB OF DATA A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences 2014 By Matthew Gamble School of Computer Science Contents List of Tables 7 List of Figures 11 List of Code Listings 13 Abstract 14 Declaration 15 Copyright 16 Table of Artefacts 21 Acknowledgements 24 1 Introduction 26 1.1 Problem . 26 1.1.1 A Motivating Question . 30 1.1.2 Quality Knowledge and the Information Quality Life-cycle 31 1.2 Examples of Quality Knowledge for Scientific Data on the Web . 32 1.2.1 Minimum Information Checklists . 32 1.2.2 Gene Ontology Annotations Quality Score . 34 1.2.3 Online Chemical Structure Repositories . 36 1.3 Research Aims and Objectives . 37 1.4 Objective, Predictive, and Subjective . 37 1.5 Research Hypotheses . 39 1.6 Methodology and Approach . 40 1.7 Research Contributions . 42 2 1.8 Publications and Research Activity . 44 1.8.1 Publications . 45 1.8.2 Research Activity . 46 1.9 Thesis Organization . 46 2 Background 48 2.1 Chapter Introduction . 48 2.2 What is Data Quality? . 49 2.2.1 The IQ Life-Cycle . 53 2.3 Quality Knowledge and the Web of Data . 58 2.3.1 The Linked Data Approach . 58 2.3.2 Examining Quality Knowledge . 63 2.4 OPS IQ: an Objective, Predictive and Subjective Classification . 69 2.4.1 Using the Classification . 72 2.5 The Role of Provenance . 75 2.5.1 What is Provenance? . 75 2.5.2 State of Provenance in the Web of Data . 78 2.6 Related Work in Information Quality Assessment . 82 2.7 Summary and Conclusions . 95 3 MIM: a Minimum Information Model 97 3.1 Chapter Introduction . 97 3.2 Minimum Information Checklists . 98 3.3 A WikiProject Chemicals Case-Study . 102 3.3.1 Extending the IQ Life-cycle . 107 3.3.2 chemmim: A Checklist for Chemical Compound Data . 108 3.4 Minimum Information Checklist Structural Meta-Study . 112 3.5 The MIM Vocabulary . 115 3.5.1 Describing a Checklist . 117 3.5.2 Reporting Against a Checklist . 120 3.5.3 Checklist Satisfaction . 123 3.6 Implementation . 125 3.6.1 Checklist Satisfaction using SPIN . 127 3.7 Evaluating chembox Data with Chemmim . 129 3.7.1 Iterative Assessment - The IQ Life-cycle . 133 3.8 Comparison with other approaches . 134 3 3.8.1 minim . 134 3.8.2 OWL and OWL Integrity Constraints . 136 3.9 Discussion . 138 3.10 Future Work for MIM . 139 3.11 Chapter Summary . 141 4 Quality Fragments: a Probabilistic Approach 142 4.1 Chapter Introduction . 142 4.2 Modelling Predictive Quality Knowledge . 143 4.3 Bayesian Networks . 146 4.3.1 Modelling the Variables of a Metric: Continuous vs. Discrete150 4.3.2 Modelling the GAQ Metric . 151 4.3.3 Modular Metrics . 154 4.4 Multi-Entity Bayesian Networks . 156 4.5 Implementation . 163 4.5.1 Quality Knowledge Encoding with PR-OWL and the Evi- dent Vocabulary. 166 4.6 Assessing Bio2RDF data with Quality Fragments . 169 4.6.1 Data Preparation . 170 4.6.2 Replicating the GAQ Metric. 171 4.6.3 GAQ Assessment with Missing Metadata . 173 4.7 Discussion . 179 4.7.1 Some Notes on Implementation . 181 4.8 Comparison with Related Work . 182 4.9 Future Work for Quality Fragments . 183 4.10 Chapter Summary . 184 5 Procedurally Building Quality Fragments using Provenance 186 5.1 Chapter Introduction . 186 5.2 Provenance-based Quality Knowledge . 187 5.2.1 Our Intuition . 190 5.3 PROV . 192 5.4 Influence Factor . 199 5.4.1 Modelling evident:influenceFactor . 200 5.5 Quality Fragment Generation Procedure . 202 5.5.1 Assumptions and Restrictions . 203 4 5.5.2 Structure Generation . 204 5.5.3 Combination Strategy . 206 5.5.4 Influence Factor Normalization . 208 5.6 Evaluation . 212 5.6.1 The Zeng Network . 213 5.6.2 Data Preparation . 217 5.6.3 The Evident Generator . 220 5.6.4 The Evident Vocabulary for Provenance-aware Quality Frag- ments. 221 5.6.5 The Generated Quality Fragment . 222 5.6.6 Comparison of the Generated Network With the Zeng Net- work.............................. 225 5.7 Comparison with Related Work . 232 5.8 Conclusions and Future Work for Quality Fragment Generation . 233 5.9 Chapter Summary . 235 6 Conclusions 236 6.1 Summary of Research Contributions . 236 6.2 Future Work . 241 Bibliography 245 A Minimum Information Checklist Analysis 275 B The MIM Vocabulary 279 C The chemmim Checklist 285 D The mimspin MIM Validation Rules 288 E The Evident Vocabulary 305 F GAQ Multi-Entity Bayesian Network 308 G Wikiprov RDF Serialization Example 324 5 Word Count: 65331 6 List of Tables 3 Table of Artefacts that Support this Thesis . 21 2.1 Information Quality Dimensions Classified using OPS IQ . 74 2.2 Provenance Vocabularies . 79 2.3 Provenance Vocabulary Usage in the Web of Data . 81 2.4 IQ Assessment Frameworks . 83 3.1 Summary of Minimum Information Checklist Analysis . 112 4.1 Summary of Evaluation Datasets . 173 4.2 Results for GAQ Mfrags compared with GA2GAQ.pl . 174 5.1 The Core Elements of the PROV Data Model . 196 5.2 Summary of PROV Influence Types in ProvBench . 198 5.3 tAi CPT................................ 215 5.4 AuthorTrust CPT . 215 5.5 Results from the Zeng Network . 216 5.6 Statistics of the Training Dataset . 218 5.7 Statistics of the Test Dataset . 218 5.8 Results from the article Q Quality Fragment compared with the Zeng Network . 226 5.9 article Q Results for Featured and Cleanup Articles in Training Set230 5.10 article Q Results for Featured and Cleanup Articles in Test Set . 231 7 List of Figures 1.1 The Minimum Information about and RNAi Experiment (MIARE) Checklist (extract). 33 1.2 Annotation of UniprotKB P06727 with the term \lipoprotein metabolic process" (GO:0042157) in the Gene Ontology Annotations Database. 34 1.3 Position of the term \lipoprotein metabolic process" (GO:0042157) in the Gene Ontology. 35 1.4 The Three Aspects of Quality Knowledge. 37 1.5 A Readers Guide to the Structure and Dependencies in this Thesis 41 2.1 Two-dimensional matrix illustrating a spectrum of applicability and effectiveness of IQ metrics. 51 2.2 Map of Concepts used in this Thesis . 54 2.3 The Information Quality Life-cycle [Mis08] . 54 2.4 Abstract Representation of the GAQ score Assessment processes using a Reusable Quality Component for the GAQ Score . 55 2.5 Abstract Representation of a Quality View reusing the GAQ Score Assessment . 57 2.6 The Linking Open Data cloud diagram from [CJ11] . 59 2.7 Abstract Representation of a Quality Standard based Assessment 65 2.8 Evidence Code Ranks (ECRs) as Defined by Buza et al. as an Example of Prior Knowledge . 67 2.9 Abstract Representation of a Quality Fragment based Assessment 67 2.10 Abstract Representation of a Quality View based Assessment . 69 2.11 Using the OPS IQ classification. 72 2.12 Core Elements of the PROV Provenance Model [GM13] . 77 2.13 Example PROV Provenance Graph for a GO Annotation. 78 2.14 The WIQA Filtering Processes. 84 2.15 Intrinsic vs. Provenance-based Quality Knowledge in the GAQ score. 89 8 2.16 Example Bayesian Network Metric from Wang et al. [WV05] . 92 3.1 The Minimum Information about and RNAi Experiment (MIARE) Checklist (Assay Requirement Set). 99 3.2 A Minimum Information Checklist Solution in the Web of Data. 101 3.3 The Wikipedia Article and Chembox for Bicarbonate . 103 3.4 Interaction Between the IQ Life-cycle and Linked Data Extraction processes. 108 3.5 The 11 Requirements of the Chemmim Checklist. 109 3.6 Comparison of Wikipedia Article for Aluminium Hydroxide Oxide (Stub) and Acetic Acid (A). 111 3.7 The Anatomy of a Minimum Information Checklist. 113 3.8 The Minimum Information Model Vocabulary. 116 3.9 Creating an ObjectReport and Aligning it with a Checklist Re- quirement. 121 3.10 Creating a DataReport using a blank node and Aligning it with a Checklist Requirement. 121 3.11 Creating a ReportSet and Aligning it with a Checklist Requirement Set. 122 3.12 Associating the InChI Requirement with a SPARQL Query to Au- tomatically Generate Reports. 123 3.13 Requirement Satisfaction. 124 3.14 Requirement Satisfaction. ..
Recommended publications
  • The Chem Access Project from Bitmap Graphics to Fully Accessible Chemical Diagrams
    The Chem Access project From Bitmap Graphics to Fully Accessible Chemical Diagrams Volker Sorge Scientific Document Analysis Group Progressive Accessibility Solutions School of Computer Science Birmingham, UK University of Birmingham progressiveaccess.com 9th European e-Accessibility Forum, 2015, Paris, 8 June 2015 Motivation I Accessibility to STEM material is important issue for inclusive education I Diagrams are an important teaching means in STEM I Chemical diagrams (depictions of molecules) are ubiquitous in teaching from GCSE and A-levels teaching to undergrad curriculum chemistry, biosciences, life sciences. I Previous work on assistive technology for chemical diagrams I Require diagrams to be drawn in particular way or authoring environment I Need for specialist software to access and interact with diagrams I Additional hurdles for both authors and readers Goals I Make regular teaching material accessible I From inaccessible image to support for independent learning I Source independence I Do not rely on the benevolent, educated author I Platform independence I Use standard web technology (HTML5) I Accessible with all browsers, screen readers I Provide a seamless user experience without/very little interface I Support diverse material, for novices and experts alike Examples I Different representations of Aspirin molecule. Displayed formula. Skeletal formula. Structural formula. Examples I Or somewhat more complex. Procedure Input: A bitmap image of a molecule diagram 1. Image analysis and recognition 2. Generation of annotated SVG
    [Show full text]
  • Understanding Semantic Aware Grid Middleware for E-Science
    Computing and Informatics, Vol. 27, 2008, 93–118 UNDERSTANDING SEMANTIC AWARE GRID MIDDLEWARE FOR E-SCIENCE Pinar Alper, Carole Goble School of Computer Science, University of Manchester, Manchester, UK e-mail: penpecip, carole @cs.man.ac.uk { } Oscar Corcho School of Computer Science, University of Manchester, Manchester, UK & Facultad de Inform´atica, Universidad Polit´ecnica de Madrid Boadilla del Monte, ES e-mail: [email protected] Revised manuscript received 11 January 2007 Abstract. In this paper we analyze several semantic-aware Grid middleware ser- vices used in e-Science applications. We describe them according to a common analysis framework, so as to find their commonalities and their distinguishing fea- tures. As a result of this analysis we categorize these services into three groups: information services, data access services and decision support services. We make comparisons and provide additional conclusions that are useful to understand bet- ter how these services have been developed and deployed, and how similar ser- vices would be developed in the future, mainly in the context of e-Science applica- tions. Keywords: Semantic grid, middleware, e-science 1 INTRODUCTION The Science 2020 report [40] stresses the importance of understanding and manag- ing the semantics of data used in scientific applications as one of the key enablers of 94 P. Alper, C. Goble, O. Corcho future e-Science. This involves aspects like understanding metadata, ensuring data quality and accuracy, dealing with data provenance (where and how it was pro- duced), etc. The report also stresses the fact that metadata is not simply for human consumption, but primarily used by tools that perform data integration and exploit web services and workflows that transform the data, compute new derived data, etc.
    [Show full text]
  • Description Logics Emerge from Ivory Towers Deborah L
    Description Logics Emerge from Ivory Towers Deborah L. McGuinness Stanford University, Stanford, CA, 94305 [email protected] Abstract: Description logic (DL) has existed as a field for a few decades yet somewhat recently have appeared to transform from an area of academic interest to an area of broad interest. This paper provides a brief historical perspective of description logic developments that have impacted their usability beyond just in universities and research labs and provides one perspective on the topic. Description logics (previously called terminological logics and KL-ONE-like systems) started with a motivation of providing a formal foundation for semantic networks. The first implemented DL system – KL-ONE – grew out of Brachman’s thesis [Brachman, 1977]. This work was influenced by the work on frame systems but was focused on providing a foundation for building term meanings in a semantically meaningful and unambiguous manner. It rejected the notion of maintaining an ever growing (seemingly adhoc) vocabulary of link and node names seen in semantic networks and instead embraced the notion of a fixed set of domain-independent “epistemological primitives” that could be used to construct complex, structured object descriptions. It included constructs such as “defines-an-attribute-of” as a built-in construct and expected terms like “has-employee” to be higher-level terms built up from the epistemological primitives. Higher level terms such as “has-employee” and “has-part-time-employee” could be related automatically based on term definitions instead of requiring a user to place links between them. In its original incarnation, this led to maintaining the motivation of semantic networks of providing broad expressive capabilities (since people wanted to be able to represent natural language applications) coupled with the motivation of providing a foundation of building blocks that could be used in a principled and well-defined manner.
    [Show full text]
  • Open PHACTS: Semantic Interoperability for Drug Discovery
    REVIEWS Drug Discovery Today Volume 17, Numbers 21/22 November 2012 Reviews KEYNOTE REVIEW Open PHACTS: semantic interoperability for drug discovery 1 2 3 Antony J. Williams Antony J. Williams , Lee Harland , Paul Groth , graduated with a PhD in 4 5,6 chemistry as an NMR Stephen Pettifer , Christine Chichester , spectroscopist. Antony 7 6,7 8 Williams is currently VP, Egon L. Willighagen , Chris T. Evelo , Niklas Blomberg , Strategic development for 9 4 6 ChemSpider at the Royal Gerhard Ecker , Carole Goble and Barend Mons Society of Chemistry. He has written chapters for many 1 Royal Society of Chemistry, ChemSpider, US Office, 904 Tamaras Circle, Wake Forest, NC 27587, USA books and authored >140 2 Connected Discovery Ltd., 27 Old Gloucester Street, London, WC1N 3AX, UK peer reviewed papers and 3 book chapters on NMR, predictive ADME methods, VU University Amsterdam, Room T-365, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands 4 internet-based tools, crowdsourcing and database School of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK 5 curation. He is an active blogger and participant in the Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, 1211 Geneva 4, Switzerland Internet chemistry network as @ChemConnector. 6 Netherlands Bioinformatics Center, P. O. Box 9101, 6500 HB Nijmegen, and Leiden University Medical Center, The Netherlands Lee Harland 7 is the Founder & Chief Department of Bioinformatics – BiGCaT, Maastricht University, The Netherlands 8 Technical Officer of Respiratory & Inflammation iMed, AstraZeneca R&D Mo¨lndal, S-431 83 Mo¨lndal, Sweden 9 ConnectedDiscovery, a University of Vienna, Department of Medicinal Chemistry, Althanstraße 14, 1090 Wien, Austria company established to promote and manage precompetitive collaboration Open PHACTS is a public–private partnership between academia, within the life science industry.
    [Show full text]
  • The Fourth Paradigm
    ABOUT THE FOURTH PARADIGM This book presents the first broad look at the rapidly emerging field of data- THE FOUR intensive science, with the goal of influencing the worldwide scientific and com- puting research communities and inspiring the next generation of scientists. Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets. The speed at which any given scientific discipline advances will depend on how well its researchers collaborate with one another, and with technologists, in areas of eScience such as databases, workflow management, visualization, and cloud- computing technologies. This collection of essays expands on the vision of pio- T neering computer scientist Jim Gray for a new, fourth paradigm of discovery based H PARADIGM on data-intensive science and offers insights into how it can be fully realized. “The impact of Jim Gray’s thinking is continuing to get people to think in a new way about how data and software are redefining what it means to do science.” —Bill GaTES “I often tell people working in eScience that they aren’t in this field because they are visionaries or super-intelligent—it’s because they care about science The and they are alive now. It is about technology changing the world, and science taking advantage of it, to do more and do better.” —RhyS FRANCIS, AUSTRALIAN eRESEARCH INFRASTRUCTURE COUNCIL F OURTH “One of the greatest challenges for 21st-century science is how we respond to this new era of data-intensive
    [Show full text]
  • 11.2 Alkanes
    11.2 Alkanes A large number of carbon compounds are possible because the covalent bond between carbon atoms, such as those in hexane, C6H14, are very strong. Learning Goal Write the IUPAC names and draw the condensed structural formulas and skeletal formulas for alkanes and cycloalkanes. Chemistry: An Introduction to General, Organic, and Biological Chemistry, Twelfth Edition © 2015 Pearson Education, Inc. Naming Alkanes Alkanes • are hydrocarbons that contain only C—C and C—H bonds • are formed by a continuous chain of carbon atoms • are named using the IUPAC (International Union of Pure and Applied Chemistry) system • have names that end in ane • use Greek prefixes to name carbon chains with five or more carbon atoms Chemistry: An Introduction to General, Organic, and Biological Chemistry, Twelfth Edition © 2015 Pearson Education, Inc. IUPAC Naming of First Ten Alkanes Chemistry: An Introduction to General, Organic, and Biological Chemistry, Twelfth Edition © 2015 Pearson Education, Inc. 1 Condensed Structural Formulas In a condensed structural formula, • each carbon atom and its attached hydrogen atoms are written as a group • a subscript indicates the number of hydrogen atoms bonded to each carbon atom The condensed structural formula of butane has four carbon atoms. CH3—CH2—CH2—CH3 butane Core Chemistry Skill Naming and Drawing Alkanes Chemistry: An Introduction to General, Organic, and Biological Chemistry, Twelfth Edition © 2015 Pearson Education, Inc. Condensed Structural Formulas Alkanes are written with structural formulas that are • expanded to show each bond • condensed to show each carbon atom and its attached hydrogen atoms Expanded Condensed Expanded Condensed Chemistry: An Introduction to General, Organic, and Biological Chemistry, Twelfth Edition © 2015 Pearson Education, Inc.
    [Show full text]
  • Imaging Live Drosophila Brain with Two-Photon Fluorescence Microscopy Syeed Ehsan Ahmed University of Texas at El Paso, [email protected]
    University of Texas at El Paso DigitalCommons@UTEP Open Access Theses & Dissertations 2017-01-01 Imaging Live Drosophila Brain With Two-Photon Fluorescence Microscopy Syeed Ehsan Ahmed University of Texas at El Paso, [email protected] Follow this and additional works at: https://digitalcommons.utep.edu/open_etd Part of the Biophysics Commons, and the Optics Commons Recommended Citation Ahmed, Syeed Ehsan, "Imaging Live Drosophila Brain With Two-Photon Fluorescence Microscopy" (2017). Open Access Theses & Dissertations. 397. https://digitalcommons.utep.edu/open_etd/397 This is brought to you for free and open access by DigitalCommons@UTEP. It has been accepted for inclusion in Open Access Theses & Dissertations by an authorized administrator of DigitalCommons@UTEP. For more information, please contact [email protected]. IMAGING LIVE DROSOPHILA BRAIN WITH TWO-PHOTON FLUORESCENCE MICROSCOPY SYEED EHSAN AHMED Master’s Program in Physics APPROVED: Chunqiang Li, Ph.D., Chair Cristian E. Botez, Ph.D. Chuan Xiao, Ph.D. Charles Ambler, Ph.D. Dean of the Graduate School Copyright © by Syeed Ehsan Ahmed 2017 Dedication Dedicated to my beloved family and friends. Thank you for believing in me and helping me throughout my journey. IMAGING LIVE DROSOPHILA BRAIN WITH TWO-PHOTON FLUORESCENCE MICROSCOPY by SYEED EHSAN AHMED, B.Sc. in Physics THESIS Presented to the Faculty of the Graduate School of The University of Texas at El Paso in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE Department of Physics THE UNIVERSITY OF TEXAS AT EL PASO August 2017 ACKNOWLEDGEMENTS All praise and worthiness goes to Almighty Merciful Allah who has given me the opportunity, strength and ability to complete this work.
    [Show full text]
  • Arxiv:2001.11604V2 [Cs.PL] 6 Sep 2020 of Trigger Conditions, Having an in Situ Infrastructure That Simplifies Results They Desire
    DIVA: A Declarative and Reactive Language for in situ Visualization Qi Wu* Tyson Neuroth† Oleg Igouchkine‡ University of California, Davis, University of California, Davis, University of California, Davis, United States United States United States Konduri Aditya§ Jacqueline H. Chen¶ Kwan-Liu Ma|| Indian Institute of Science, India Sandia National Laboratories, University of California, Davis, United States United States ABSTRACT for many applications. VTK [61] also partially adopted this ap- The use of adaptive workflow management for in situ visualization proach, however, VTK is designed for programming unidirectional and analysis has been a growing trend in large-scale scientific simu- visualization pipelines, and provides limited support for highly dy- lations. However, coordinating adaptive workflows with traditional namic dataflows. Moreover, the synchronous dataflow model is procedural programming languages can be difficult because system somewhat difficult to use and does not always lead to modular pro- flow is determined by unpredictable scientific phenomena, which of- grams for large scale applications when control flows become com- ten appear in an unknown order and can evade event handling. This plicated [19]. Functional reactive programming (FRP) [19,24,48,53] makes the implementation of adaptive workflows tedious and error- further improved this model by directly treating time-varying values prone. Recently, reactive and declarative programming paradigms as first-class primitives. This allowed programmers to write reac- have been recognized as well-suited solutions to similar problems in tive programs using dataflows declaratively (as opposed to callback other domains. However, there is a dearth of research on adapting functions), hiding the mechanism that controls those flows under these approaches to in situ visualization and analysis.
    [Show full text]
  • Data Curation+Process Curation^Data Integration+Science
    BRIEFINGS IN BIOINFORMATICS. VOL 9. NO 6. 506^517 doi:10.1093/bib/bbn034 Advance Access publication December 6, 2008 Data curation 1 process curation^data integration 1 science Carole Goble, Robert Stevens, Duncan Hull, Katy Wolstencroft and Rodrigo Lopez Submitted: 16th May 2008; Received (in revised form): 25th July 2008 Abstract In bioinformatics, we are familiar with the idea of curated data as a prerequisite for data integration. We neglect, often to our cost, the curation and cataloguing of the processes that we use to integrate and analyse our data. Downloaded from https://academic.oup.com/bib/article/9/6/506/223646 by guest on 27 September 2021 Programmatic access to services, for data and processes, means that compositions of services can be made that represent the in silico experiments or processes that bioinformaticians perform. Data integration through workflows depends on being able to know what services exist and where to find those services. The large number of services and the operations they perform, their arbitrary naming and lack of documentation, however, mean that they can be difficult to use. The workflows themselves are composite processes that could be pooled and reused but only if they too can be found and understood. Thus appropriate curation, including semantic mark-up, would enable processes to be found, maintained and consequently used more easily.This broader view on semantic annotation is vital for full data integration that is necessary for the modern scientific analyses in biology.This article will brief the community on the current state of the art and the current challenges for process curation, both within and without the Life Sciences.
    [Show full text]
  • Social Networking Site for Researchers Aims to Make Academic Papers a Thing of the Past 16 July 2009
    Social networking site for researchers aims to make academic papers a thing of the past 16 July 2009 myExperiment, the social networking site for and traditional ideas of repositories,’ said Professor scientists, has set out to challenge traditional ideas Carole Goble. ‘myExperiment paves the way for of academic publishing as it enters a new phase of the next generation of researchers to do new funding. research using new research methods.’ The site has just received a further £250,000 In its first year, the myExperiment.org website has funding from the Joint Information Systems attracted thousands of users worldwide and Committee (JISC) as part of the JISC Information established the largest public collection of its kind. Environment programme to improve scholarly communication in contemporary research practice. More information: www.myexperiment.org According to Professor David De Roure at the Source: University of Southampton University of Southampton’s School of Electronics and Computer Science, who has developed the site jointly with Professor Carole Goble at the University of Manchester, researchers will in the future be sharing new forms of “Research Objects” rather than academic publications. Research Objects contain everything needed to understand and reuse a piece of research, including workflows, data, research outputs and provenance information. They provide a systematic and unbiased approach to research, essential when researchers are faced with a deluge of data. ‘We are introducing new approaches to make research more reproducible, reusable and reliable,’ Professor De Roure said. ‘Research Objects are self-contained pieces of reproducible research which we will share in the future like papers are shared today.’ The myExperiment Enhancement project will integrate myExperiment with the established EPrints research repository in Southampton and Manchester’s new e-Scholar institutional repository.
    [Show full text]
  • A Dynamic Multidimensional Visualization Method for Social Networks
    PsychNology Journal, 2008 Volume 6, Number 3, 291 – 320 SIM: A dynamic multidimensional visualization method for social networks Maria Chiara Caschera*¨, Fernando Ferri¨ and Patrizia Grifoni¨ ¨CNR-IRPPS, National Research Council, Institute of Research on Population and Social Policies, Rome (Italy) ABSTRACT Visualization plays an important role in social networks analysis to explore and investigate individual and groups behaviours. Therefore, different approaches have been proposed for managing networks patterns and structures according to the visualization purposes. This paper presents a method of social networks visualization devoted not only to analyse individual and group social networking but also aimed to stimulate the second-one. This method provides (using a hybrid visualization approach) both an egocentric as well as a global point of view. Indeed, it is devoted to explore the social network structure, to analyse social aggregations and/or individuals and their evolution. Moreover, it considers and integrates features such as real-time social network elements locations in local areas. Multidimensionality consists of social phenomena, their evolution during the time, their individual characterization, the elements social position, and their spatial location. The proposed method was evaluated using the Social Interaction Map (SIM) software module in the scenario of planning and managing a scientific seminars cycle. This method enables the analysis of the topics evolution and the participants’ scientific interests changes using a temporal layers sequence for topics. This knowledge provides information for planning next conference and events, to extend and modify main topics and to analyse research interests trends. Keywords: Social network visualization, Spatial representation of social information, Map based visualization.
    [Show full text]
  • Locality-Dependent Training and Descriptor Sets for QSAR Modeling
    Locality-Dependent Training and Descriptor Sets for QSAR Modeling Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University By Bryan Christopher Hobocienski Graduate Program in Chemical Engineering The Ohio State University 2020 Dissertation Committee Dr. James Rathman, Advisor Dr. Bhavik Bakshi Dr. Jeffrey Chalmers Copyrighted by Bryan Christopher Hobocienski 2020 Abstract Quantitative Structure-Activity Relationships (QSARs) are empirical or semi-empirical models which correlate the structure of chemical compounds with their biological activities. QSAR analysis frequently finds application in drug development and environmental and human health protection. It is here that these models are employed to predict pharmacological endpoints for candidate drug molecules or to assess the toxicological potential of chemical ingredients found in commercial products, respectively. Fields such as drug design and health regulation share the necessity of managing a plethora of chemicals in which sufficient experimental data as to their application-relevant profiles is often lacking; the time and resources required to conduct the necessary in vitro and in vivo tests to properly characterize these compounds make a pure experimental approach impossible. QSAR analysis successfully alleviates the problems posed by these data gaps through interpretation of the wealth of information already contained in existing databases. This research involves the development of a novel QSAR workflow utilizing a local modeling strategy. By far the most common QSAR models reported in the literature are “global” models; they use all available training molecules and a single set of chemical descriptors to learn the relationship between structure and the endpoint of interest.
    [Show full text]