Processing of Data and Analysis
Conceptual Paper
Biostatistics and Epidemiology International Journal

Dr Balkishan Sharma
Sri Aurobindo Medical College & PG Institute, India

Correspondence: Dr Balkishan Sharma, PhD, Associate Professor (Biostatistics), Department of Community Medicine, Sri Aurobindo Medical College & PG Institute, Indore (MP), India, Email [email protected] and [email protected]

Received: February 06, 2018 | Published: February 20, 2018

Copyright © 2018 Sharma. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Citation: Sharma B. Processing of data and analysis. Biostatistics Epidemiol Int J. (2018);1(1): 3-5. DOI: 10.30881/beij.00003. Ology Press.

Introduction

Health research is essential in developing evidence-based interventions that will make a difference in mitigating health problems, promoting health and ultimately improving the quality of life. Data collected and compiled from experimental work, records and surveys should be accurate and complete. They must be checked for accuracy and adequacy before further processing. Until then they lie in masses, scattered across the records; in other words, they are mixed and unsorted.

The processing of data and further analysis may be broken up into three stages: (1) data management, (2) exploratory data analysis and (3) statistical analysis (testing and modeling). However, before proceeding to statistical analysis, the researcher must consider the following points:

✓ Understand the process involved in data processing.
✓ Use computers to perform data processing.
✓ Distinguish between qualitative and quantitative data.
✓ Understand probabilities and their applications.
✓ Interpret summary statistics, graphical presentations and contingency tables commonly presented in the health literature.
✓ Carry out exploratory data analysis.
✓ Understand the process involved in estimation and hypothesis testing.
✓ Interpret the functions of confidence intervals and p-values.
✓ Understand statements in published articles relating to statistics.
✓ Use computers to perform some statistical analysis.

Exploratory data analysis involves examining the data for errors and describing the data using summary statistics and graphical techniques (descriptive statistics). As data errors are often detected at this stage, exploratory data analysis and data cleaning are typically iterative. The aim of this process is for the researcher to gain familiarity with, and understanding of, the data in order to determine the approach to take and the methods to use in further statistical analyses.

Data-processing techniques

Once the fieldwork has been completed, the information that has been gathered must be centralized and entered into the computer. Data entry is tiresome work, but it is necessary to accord it some attention because of inevitable errors in reading and entering the information, especially if it is not carried out by people accustomed to using a keyboard for data entry. The computerized data can be stored in tabular form in a spreadsheet (e.g. Microsoft Excel, Lotus 123) or more compactly and efficiently in a relational database management system (e.g. Access, Oracle, dBase).

Methods of statistical processing

Statistics offers a range of methods, the choice of which will depend on four factors: (1) the type of variables: qualitative or quantitative; (2) the status of the variables: explanatory or dependent; (3) the number of variables: one, two or multiple; and (4) the type of analysis: exploratory (descriptive) or confirmatory (inferential).

Scope and purpose

Data analysis is the process of developing answers to questions through the examination and interpretation of data. The basic steps in the analytic process consist of identifying issues, determining the availability of suitable data, deciding which methods are appropriate for answering the questions of interest, applying the methods, and evaluating, summarizing and communicating the results.

Analytical results underscore the usefulness of data sources by shedding light on relevant issues. Data analysis also plays a key role in data quality assessment by pointing to data quality problems in a given survey. Analysis can thus influence future improvements to the survey process.

Quality indicators

Main quality elements: relevance, interpretability, accuracy, accessibility. An analytical product is relevant if there is an audience who is (or will be) interested in the results of the study. For the interpretability of an analytical article to be high, the style of writing must suit the intended audience. Sufficient and relevant details must be provided so that another person, if allowed access to the data, could replicate the results.

For an analytical product to be accurate, appropriate methods and tools need to be used to produce the results. For an analytical product to be accessible, it must be available to the people for whom the research results would be useful.

Analysis of data

Data analysis converts data into information and knowledge, and explores the relationship between variables. Data analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data. According to Shamoo & Resnik¹ various analytic procedures “provide a way of drawing inductive inferences from data and distinguishing the signal (the phenomenon of interest) from the noise (statistical fluctuations) present in the data.”

Understanding the data analysis procedures will enable you to appreciate the meaning of the scientific method, which includes testing of hypotheses and statistical significance in relation to research questions. There are a number of issues that researchers should be cognizant of with respect to data analysis. Some of the key considerations in analysis and in selecting the right test of significance are as follows:

✓ Having the necessary skills to analyze.
✓ Distinguishing data types.
✓ Distinguishing different types of statistical tests.
✓ Identifying the right test.
✓ Determining statistical significance.
✓ Distinguishing between parametric and non-parametric tests and the criteria for applying them.
✓ Distinguishing between correlation and regression.
✓ Drawing unbiased inference.
✓ Inappropriate subgroup analysis.
✓ Lack of clearly defined and objective outcome measurements.
✓ Partitioning ‘text’ when analyzing qualitative data.
✓ Reliability and validity.
✓ Extent of analysis.

Table 1 lists common statistical tests. All of the tests are described in the book Intuitive Biostatistics by Harvey Motulsky² and are performed by InStat, except for tests marked with asterisks. Tests labeled with a single asterisk are briefly mentioned in the book, and tests labeled with two asterisks are not mentioned at all.
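As a small illustration of the "Describe one group" row of Table 1, the sketch below summarises one group with the mean and SD when the measurements are assumed to come from a Gaussian population, and with the median and interquartile range otherwise. It is a minimal sketch, not the method used in the paper: the function name `describe_group`, the `assume_gaussian` flag and the blood pressure readings are hypothetical and were invented for this example. It uses only Python's standard `statistics` module, whose default quartile method may give slightly different quartiles from those reported by other packages.

```python
import statistics


def describe_group(values, assume_gaussian):
    """Summarise one group of measurements (hypothetical helper).

    assume_gaussian=True  -> mean and sample standard deviation
    assume_gaussian=False -> median and interquartile range
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    if assume_gaussian:
        # Measurement from a Gaussian population: mean and sample SD.
        return {
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),
        }
    # Rank, score, or non-Gaussian measurement: median and quartiles.
    # statistics.quantiles uses its default ("exclusive") method here.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
readings = [118, 122, 125, 130, 135, 140, 160]

gaussian_summary = describe_group(readings, assume_gaussian=True)
robust_summary = describe_group(readings, assume_gaussian=False)

print(gaussian_summary)
print(robust_summary)
```

In this made-up sample the single high reading pulls the mean above the median, which is the kind of feature exploratory analysis is meant to surface before a parametric or non-parametric test is chosen.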
Table 1 Common statistical tests (columns give the type of data)

| Goal | Measurement (from Gaussian Population) | Rank, Score, or Measurement (from Non-Gaussian Population) | Binomial (Two Possible Outcomes) | Survival Time |
|---|---|---|---|---|
| Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve |
| Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test** | - |
| Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square for large samples) | Log-rank test or Mantel-Haenszel* |
| Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression* |
| Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression** |
| Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q** | Conditional proportional hazards regression** |
| Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients** | - |
| Predict value from another measured variable | Simple linear regression or nonlinear regression | Nonparametric regression** | Simple logistic regression* | Cox proportional hazard regression* |
| Predict value from several measured or binomial variables | Multiple linear regression* or multiple nonlinear regression** | - | Multiple logistic regression* | Cox proportional hazard regression* |

*briefly mentioned in Intuitive Biostatistics²
**not mentioned in Intuitive Biostatistics²

Developments in the field of statistical data analysis often parallel or follow advancements in other fields to which statistical methods are fruitfully applied. Data is known to be crude information and not knowledge by itself. The sequence from data to knowledge is: from Data to Information, from Information to Facts, and finally, from Facts to Verification of Truth. Data becomes information, when it becomes

5. http://www.emacpd.org/sites/default/files/resource_center/3.Data%20Processing%20Module.pdf
6. Korn EL, Graubard BI. Analysis of health surveys. John Wiley & Sons. 2011;323.
7. Lehtonen