Enriching and Analysing Statistics with Linked Open Data

Benjamin Zapilko (1), Andreas Harth (2), Brigitte Mathiak (1)

(1) GESIS – Leibniz Institute for the Social Sciences, Bonn, Germany, e-mail: [email protected], [email protected]
(2) Institute AIFB, Karlsruhe Institute of Technology, Karlsruhe, Germany, e-mail: [email protected]

Abstract

Scientists often need to analyse heterogeneous and distributed datasets for secondary purposes, for example to verify prior assumptions or to detect correlations between datasets. In most cases the time-consuming effort of converting the desired data into formats for statistical tools is not justified by the information need itself. With the emerging paradigm of Linked Open Data, existing information infrastructures for processing research data can be opened up to external data published at sources on the web. Scientists can thus conduct statistical analyses on a broader range of available datasets and also avoid time-consuming data conversion, because widely established standards and interfaces are used. This paper discusses issues and challenges in using Linked Open Data for analysing and enriching statistics, illustrated by a real-life use case scenario.

Keywords: Data Integration, Linked Open Data, Statistics

1. Introduction

A tenet of research in the social sciences is the study of social phenomena by analysing quantitative evidence. Scientists typically need to perform major and complex analyses on statistical data, and with the ever-increasing amount of available digital data, new approaches to data analysis become possible. As part of their main tasks, researchers often have to carry out tedious secondary examinations of heterogeneous and distributed datasets, for example to quickly verify prior or referenced assumptions or to detect correlations between two or more datasets (Schnell, et al. 2005; King, et al. 1994; Kohler, et al. 2008).

Many tools already exist which support researchers in processing and analysing their data, for example SPSS [1] or STATA [2]. Sizeable amounts of the data used by scientists are attainable through the web; however, the data is published in a large variety of data formats. To process and analyse data, scientists often need to convert the raw data to particular data formats and integrate data from multiple sources. In general, data conversion and integration is not a technical barrier, but the effort spent on conversion is a nuisance, especially for necessary but tedious routine tasks or in cases where the expected research gain is minor.

[1] SPSS Statistics, IBM, http://www.spss.com/de/
[2] STATA Data Analysis and Statistical Software, http://www.stata.com

As much research data is attainable through the web, advanced web technologies may facilitate access to and integration of such data. In 2006 Tim Berners-Lee articulated four principles for publishing data on the web as so-called Linked Data. Linked Data provides a technical basis for exposing, sharing and linking data on the web, based on the established web architecture comprising standardised formats and interfaces. Spurred by the Linking Open Data project [3] (which aims at making Linked Data available under liberal copyright licenses), a large number of datasets of scientific interest have been published, e.g. life science data, governmental data and official statistical data covering a broad range of domains and countries. The popularity of the Linked Open Data idea raises expectations that more data providers will publish their data in standardised ways. For researchers, Linked Open Data offers access to a lot of data which has not been available or easily accessible in the past. To conduct scientific analyses, Linked Data still has to be converted to the formats of the tools scientists are using.
However, with data accessible via a single standardised interface, time-consuming transformation processes could be automated and analysis tasks could be performed more easily and thus at lower cost.

The information infrastructures for working with research data are often too strictly tailored to the data sources traditionally used or to specific domains or purposes. Considering the Linked Open Data movement, a method for performing mostly secondary research tasks is needed which allows for statistical queries and calculations on standardised datasets. If these standards covered a broader range than, for example, those of SPSS or STATA for statistical data, additional datasets which lie outside the researchers' facilities could be used automatically for combined analyses. This would also open up published data, and easy-to-perform analyses of it, to other interested user groups, for instance journalists.

In this paper, we propose a method for conducting such statistical analyses on Linked Data resources to support common tasks that researchers encounter when working with heterogeneous and distributed datasets. We give a brief overview of Linked Open Data in Section 2 and present a real-world use case from the domain of the social sciences in Section 3, where heterogeneous datasets are linked for further combined analyses. The technical implementation of the use case is presented in Section 4 and covers four issues: (i) the publication of data as Linked Data, (ii) the integration and processing of multiple datasets, (iii) the processing of statistical methods and calculations on such datasets and (iv) the combined visualisation of multiple datasets in a line graph. In Section 5 we present related work in the field of statistical online analyses of data. We conclude in Section 6 with a discussion of problems observed during the development and implementation of the use case and give an outlook on future work.

[3] http://linkeddata.org/

2. Linked Open Data

The main idea behind Linked Open Data (Bizer, et al. 2009) is to expose, share and connect freely available data on the web using Semantic Web standards. Linked Data is based on the unique identification of each thing, such as metadata elements or certain entities. The representation of information about these things can be combined with connections to other relevant or associated things on the web. Tim Berners-Lee outlined four principles for Linked Data (Berners-Lee 2006):

1. Use URIs [4] to identify things.
2. Use HTTP [5] URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
3. Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF [6].
4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the web.

From a technical perspective, the publication of data as Linked Open Data (Bizer, et al. 2007) is based on common standards and techniques which have been developed over years and are established worldwide as fundamental formats and interfaces for publishing data on the web, e.g. URIs, HTTP and RDF. RDF is a graph-structured data format. With the standardisation of SPARQL [7] in 2008, a key technology for querying RDF data has been established. The use case described later consists partly of a processed version of the German General Social Survey ALLBUS [8] for North Rhine-Westphalia. An RDF representation of this dataset looks as follows:

    <http://lod.gesis.org/lod.pilot/ALLBUS/ZA4570agg.rdf>
        rdfs:label "ZA4570 Cumulated ALLBUS / GGSS 1980-2008";
        gesis:geo <http://lod.gesis.org/vocab#D-NRW>;
        rdfs:seeAlso <http://www.gesis.org/dienstleistungen/daten/umfragedaten/allbus/>.
    <http://lod.gesis.org/vocab#D-NRW>
        owl:sameAs <http://dbpedia.org/resource/North_Rhine-Westphalia>.

RDF is structured like Subject-Predicate-Object sentences.
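The Subject-Predicate-Object structure can be made concrete with a minimal Python sketch that holds the RDF example as plain tuples. This is an illustration only, not the implementation used in the paper: the helper names `objects` and `external_equivalents` are hypothetical, prefixed names such as `rdfs:label` are kept as plain strings, and a real application would more likely rely on an RDF library and a SPARQL endpoint.

```python
# Sketch: the RDF example from this section as (subject, predicate, object) tuples.
# Prefixed names ("rdfs:label", "owl:sameAs", ...) are kept as plain strings here;
# this is a simplification for illustration, not a full RDF data model.

ALLBUS = "http://lod.gesis.org/lod.pilot/ALLBUS/ZA4570agg.rdf"
NRW = "http://lod.gesis.org/vocab#D-NRW"

TRIPLES = [
    (ALLBUS, "rdfs:label", "ZA4570 Cumulated ALLBUS / GGSS 1980-2008"),
    (ALLBUS, "gesis:geo", NRW),
    (ALLBUS, "rdfs:seeAlso",
     "http://www.gesis.org/dienstleistungen/daten/umfragedaten/allbus/"),
    (NRW, "owl:sameAs", "http://dbpedia.org/resource/North_Rhine-Westphalia"),
]


def objects(triples, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def external_equivalents(triples, dataset):
    """Follow gesis:geo and then owl:sameAs to find external URIs for the region."""
    result = []
    for region in objects(triples, dataset, "gesis:geo"):
        result.extend(objects(triples, region, "owl:sameAs"))
    return result


if __name__ == "__main__":
    # Which external resources describe the region covered by the dataset?
    print(external_equivalents(TRIPLES, ALLBUS))
```

Following the `owl:sameAs` link in this way is what allows a dataset described in one vocabulary to be combined with information about the same region published elsewhere.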
The last statement of the RDF example reads: the entry D-NRW in the gesis vocabulary is the same as the entry North_Rhine-Westphalia in DBpedia [9].

[4] Uniform Resource Identifier
[5] Hypertext Transfer Protocol
[6] Resource Description Framework, http://www.w3.org/RDF/
[7] SPARQL Protocol and RDF Query Language (SPARQL), http://www.w3.org/TR/rdf-sparql-protocol/
[8] http://www.gesis.org/en/services/data/survey-data/allbus/
[9] http://dbpedia.org/

The paradigm of Linked Open Data was well received in the Semantic Web community and has encouraged more organisations to publish data. Research organisations, archives, universities and media corporations publish and link their data and participate in the movement. Work on standardisation and discussions about the openness of data generate new impulses, so that governments all over the world are also beginning to publish official government data on the web. This strengthens the transparency of governmental agencies as well as collaboration with citizens and their participation in diverse aspects.

The linkage of distributed and apparently unrelated data sources holds potential for both data providers and users. Data providers are in a position to enrich their own data with external sources from the web which prove to be relevant for their users, or at least to be interesting additional information complementary to their data. Developers can in turn include the published and linked data in their own tools and applications, which can then be made available to users. The real advantages of Linked Open Data for end users, e.g. scientists, will therefore become evident once there are enough tools and applications which use linked data sources in a practical and useful way.

3. Use Case

In this section, we present a use case scenario for enriching and analysing statistics based on Linked Data which covers typical research tasks of social scientists: the detection and analysis of correlations between heterogeneous data sources and further calculations on these data sources. Our use case revolves around the following data sources: The German General Social Survey ALLBUS, which collects up-to-date data on attitudes, behaviour, and social structure in Germany and is archived at GESIS – Leibniz Institute for the Social Sciences [10]. Due to data privacy restrictions we use a specially processed version of a subset of ALLBUS.
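To illustrate what detecting a correlation between two heterogeneous sources involves computationally, the following Python sketch aligns two year-indexed series on their shared years and computes the Pearson correlation coefficient. It is a sketch under stated assumptions, not the method of the paper: the function names `align_by_year` and `pearson` are hypothetical, the choice of Pearson's coefficient is an assumption, and the numbers in the usage part are placeholder values that are not taken from ALLBUS or any other real source.

```python
from math import sqrt


def align_by_year(series_a, series_b):
    """Keep only the years present in both series; return two equally long value lists."""
    years = sorted(set(series_a) & set(series_b))
    return [series_a[y] for y in years], [series_b[y] for y in years]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT taken from ALLBUS or any real dataset.
    survey_share = {2000: 0.40, 2002: 0.42, 2004: 0.47, 2006: 0.45}
    external_rate = {2002: 8.1, 2004: 9.0, 2006: 8.7, 2008: 7.5}
    xs, ys = align_by_year(survey_share, external_rate)
    print(round(pearson(xs, ys), 3))
```

The alignment step matters because independently published datasets rarely cover exactly the same years; only observations that both sources share can enter the calculation.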