Foundations of Biostatistics Foundations of Biostatistics M
Total Page:16
File Type:pdf, Size:1020Kb
M. Ataharul Islam Abdullah Al-Shiha Foundations of Biostatistics Foundations of Biostatistics M. Ataharul Islam • Abdullah Al-Shiha Foundations of Biostatistics 123 M. Ataharul Islam Abdullah Al-Shiha ISRT Department of Statistics and Operations University of Dhaka Research Dhaka College of Science, King Saud University Bangladesh Riyadh Saudi Arabia ISBN 978-981-10-8626-7 ISBN 978-981-10-8627-4 (eBook) https://doi.org/10.1007/978-981-10-8627-4 Library of Congress Control Number: 2018939003 © Springer Nature Singapore Pte Ltd. 2018 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Printed on acid-free paper This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. part of Springer Nature The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Preface The importance of learning biostatistics has been growing very fast due to its increasing usage in addressing challenges in life sciences – in particular biomedical and biological sciences. There are challenges, both old and new, in nature, mostly attributable to interactive developments in the fields of life sciences, statistics, and computer science. The developments in statistics and biostatistics have been complementary to each other, biostatistics being focused to address the challenges in the field of biomedical sciences including its close links with epidemiology, health, biological science, and other life sciences. Despite the focus on biomedical-related problems, the need for addressing important issues of concern in the life science problems has been the source of some major developments in the theory of statistics. The current trend in the development of data science indicates that biostatistics will be in greater demand in the future. The compelling motivation behind writing this book stemmed from our long experience of teaching the foundation course on biostatistics in different universities to students having varied background. The students seek a thorough knowledge with adequate background in both theory and applications, employing a more cohesive approach such that all the relevant foundation concepts can be linked as a building block. This book provides a careful presentation of topics in a sequence necessary to make the students and users of the book proficient in foundations of biostatistics. In other words, the understanding of any of the foundation materials is not left to the unfamiliar domain. In biostatistics, all these foundation materials are deeply interconnected and even a single source of unfamiliarity may cause difficulty in understanding and applying the techniques properly. In this textbook, immense emphasis is given on coverage of relevant topics with adequate details, to facilitate easy understanding of these topics by the students and users. We have used mostly elementary mathematics with minor exceptions in a few sections in order to provide a more comprehensive view of the foundations of biostatistics that are expected from a biostatistician in real-life situations. v vi Preface There are 11 chapters in this book. Chapter 1 introduces basic concepts, and organizing and displaying data. One section includes the introduction to designing of sample surveys to make the students familiar with the steps necessary for col- lecting data. Chapter 2 provides a thorough background on basic summary statis- tics. The fundamental concepts and their applications are illustrated. In Chap. 3, basic probability concepts are discussed and illustrated with the help of suitable examples. The concepts are introduced in a self-explanatory manner. The appli- cations to biomedical sciences are highlighted. The fundamental concepts of probability distributions are included in Chap. 4. All examples are self-explanatory and will help students and users to learn the basics without difficulty. Chapter 5 includes a brief introduction of continuous probability distributions, mainly normal and standard normal distributions. In this chapter, emphasis is given mostly to the appropriate understanding and knowledge about applications of these two distri- butions to the problems in biomedical science. The sampling distribution plays a vital role in inferential procedures of biostatistics. Without understanding the concepts of sampling distribution, students and users of biostatistics would not be able to face the challenges in comprehensive applications of biostatistical tech- niques. In Chap. 6, concepts of sampling distribution are illustrated with very simple examples. This chapter provides a useful link between descriptive statistics in Chap. 2 with inferential statistics in Chaps. 7 and 8. Chapters 7 and 8 provide inferential procedures, estimation, and tests of hypothesis, respectively. As a pre- lude to Chap. 6 along with Chaps. 7 and 8, Chaps. 3–5 provide very important background support in understanding the underlying concepts of biostatistics. All these chapters are presented in a very simple manner and the linkages are easily understandable by students and users. Chapters 9–11 cover the most extensively employed models and techniques that provide any biostatistician the necessary understanding and knowledge to explore the underlying association between risk or prognostic factors with outcome variables. Chapter 9 includes correlation and regression, and Chap. 10 introduces essential techniques of analysis of variance. The most useful techniques of survival analysis are introduced in Chap. 11 which includes a brief discussion about topics on study designs in biomedical science, measures of association such as odds ratio and relative risk, logistic regression and proportional hazards model. With clear understanding of the concepts illustrated in Chaps. 1–8, students and users will find the last three chapters very useful and meaningful to obtain a comprehensive background of foundations of biostatistics. We want to express our gratitude to Halida Hanum Akhtar and Mahbub Elahi Chowdhury for the BIRPERHT data on pregnancy-related complications, the Health and Retirement Study (HRS), USA, M. Lichman, Machine Learning Repository (http://archive.ics.uci.edu/ml) of the Center for Machine Learning and Intelligent Systems, UCI, W. H. Wolberg, W. N. Street, and O. L. Mangasarian of the University of Wisconsin, Creators of the Breast Cancer Wisconsin Data Set, and NIPORT for BDHS 2014 and BMMS 2010 data. We would like to express our deep gratitude to the Royal Statistical Society for permission to reproduce the statistical tables included in the appendix, and our special thanks to Atinuke Phillips for her help. We also acknowledge that one table has been reproduced from Preface vii C. Dougherty’s tables that have been computed to accompany the text ‘Introduction to Econometrics’ (second edition 2002, Oxford University Press, Oxford). We are grateful to our colleagues and students at the King Saud University and the University of Dhaka. The idea of writing this book has stemmed from teaching and supervising research students at the King Saud University. We want to express our heartiest gratitude to the faculty members and the students of the Department of Statistics and Operations Research of the King Saud University and to the Institute of Statistical Research and Training (ISRT) of the University of Dhaka. Our special thanks to ISRT for providing every possible support for completing this book. We want to thank Shahariar Huda for his continued support to our work. We extend our deepest gratitude to Rafiqul I. Chowdhury for his immense help at different stages of writing this book. We want to express our gratitude to F. M. Arifur Rahman for his unconditional support to our work whenever we needed. We would like to thank Mahfuzur Rahman for his contribution in preparing the manuscript. Our special thanks to Amiya Atahar for her unconditional help during the final stage of writing this book. Further, we acknowledge gratefully the continued support from Tahmina Khatun, Jayati Atahar, and Shainur Ahsan. We acknowledge with deep gratitude that without help from Mahfuza Begum, this work would be difficult to complete. We extend our deep gratitude to Syed Shahadat Hossain, Azmeri Khan, Jahida Gulshan, Shafiqur Rahman, Israt Rayhan, Mushtaque Raza Chowdhury, Lutfor Rahman, Rosihan M. Ali, Adam Baharum, V. Ravichandran, and A. A. Kamil for their support. We are