CHAPTER – 11 Univariate and Bivariate Analysis of Data

Total Page:16

File Type:pdf, Size:1020Kb

CHAPTER – 11 Univariate and Bivariate Analysis of Data CHAPTER – 11 Univariate and Bivariate Analysis of Data 1. Which of the following cannot be covered under univariate analysis of data? a. Association between two variables b. Computation of mean, median and mode c. Preparation of frequency table d. Computation of percentage frequency for a variable 2. Both mean and mode can be computed for what type of measurement scales. a. Nominal b. Ordinal c. Interval d. Ratio e. c and d f. a and c 3. Which of the following option is not available for the treatment of missing value? a. Substitute the average value of the response b. Substitute a neutral value c. Return to the field to get the desired observation d. The concerned questionnaire should be eliminated from the analysis. 4. In the interpretation of the cross table, the percentages should be computed a. Row wise b. Column wise c. In the direction of the independent variable d. Computing percentages either row wise or column wise does not make a difference. 5. For which type of measurement, the coefficient of variation can be computed. a. Nominal scale b. Ordinal scale c. Interval scale d. Ratio scale 6. A third variable is introduced in the two variable table to a. Refine the association that was observed originally between two variables. b. The introduction of third variable may show that there was no association between the original two variables. c. Introducing a third variable may indicate association between two original variables although initially no relationship was found between them. d. All of the above are true. 7. Which of the following is true about the missing observation? a. Missing value could be coded with another number which should not be equal to the value of the variable obtained as a part of the survey. b. It is possible to get different research conclusions if the value of the missing observation was available. c. The actual results may deviate from the observed results depends upon the number of missing observation and the extent to which the missing data would be different from the actual observation. d. All of the above statements are true. 8. Which of the following is true in case of interpretation of a table with analysis of multiple responses? a. The percentages add up to 100%. b. The percentages exceed 100% because of multiplicity of answer. c. The percentages cannot be computed. d. All the above statements are incorrect. 9. There are 20% female students in a class – this is an example of a. Descriptive analysis b. Inferential analysis c. Gender bias d. All of the above are correct 10. For which type of measurement, median cannot be computed. a. Nominal b. Ordinal c. Interval d. Ratio 11. Which of the following analysis cannot be carried out using ordinal scale data? a. Preparation of frequency distribution b. Computation of the quartiles of the distribution c. Computing the mode of the distribution d. Computing the standard deviation of the variable 12. The uses of a frequency distribution are a. To get the extent of non response. b. To detect the presence of extreme cases (outliers in the distribution). c. To get the extent of illegitimate responses. d. All of the above 13. That measure of central tendency above which 50% values fall and below which the remaining 50% fall is called: a. Mean b. Median c. Mode d. Range 14. When a respondent assigns an order of preference using values as 1, 2, 3 and so on, he is using a. Nominal values b. Ordinal values c. Interval values d. Ratio values 15. The median can be computed from a. Ordinal, interval and nominal data b. Ratio, ordinal and nominal data c. Ratio, interval and ordinal data d. Ratio, interval and nominal data 16. Which of the following gives the measure of consistency of data? a. Mean b. Standard deviation c. Mode d. Median 17. The simple way to look at association for the data which requires only the ability to compute percentages. a. Cross-tabulation b. Correlation coefficient c. Spearman rank correlation coefficient d. Simple graph e. None of the above 18. Factor analysis is a technique for a. Univariate data b. Bivariate data c. Multivariate data d. Both (a) and (b) e. Both (a) and (c) f. Both (b) and (c) 19. Which of the following may not be the right method for dealing with missing information on a questionnaire? a. Discard the questionnaire b. Ignore that particular question and code the remaining c. Based on responses of similar respondents, substitute a value d. All of them may be appropriate e. Both (a) and (b) f. Both (b) and (c) 20. Median can be computed for a. Closed ended class interval b. Open ended class interval c. Ordinal scale data d. Interval scale data e. Ratio scale data f. All of them are true 21. The one-way table cannot be used for a. Frequency distribution b. Percentage distribution c. Cumulative frequency distribution d. To compute measure of central tendency e. It can be used for all of the above. For the next 4 questions, read the following table: TABLE Consumption of ice cream and household income Low Consumption of Ice cream High Consumption of Ice cream Total Low Income 30 10 40 Middle Income 20 20 40 High Income 12 28 40 Total 62 58 120 22 The above table is an example of a. Cross-tabulation b. One way tabulation c. Four way classification d. None of the above 23 What percentage of household have less consumption of Ice cream? a. 50 b 51.67 c 54 d 49.38 24 How many households are there with middle income? a 30 b 28 c 40 d None of the above 25 How many household with middle income have high consumption of Ice cream\/ a 20 b 30 c 28 d 12 For the next 3 question read the following data Given below are the marks of 10 students in an examination 40, 60, 50, 55, 45, 70, 60, 55, 75, 80. 26 What is the average score of the students? a 55 b 60 c 65 d 59 27 What is median of the distribution of marks? a 55 b 57.5 c 58.5 d 60 28 What is the range of the marks of the distribution? a 40 b 18 c 20 d 30 CHAPTER-12 Testing of Hypotheses 1 A p value of o.o5 means a. There is only 5% chance that the results are incorrect b. There is probability of 5 in 100 that this result would occur if the null hypotheses were true c. There is only 5% chance of getting this result d. All of the above are true 2. The probability of rejecting a null hypothesis when it is true is called a Level of significance b Type II error c Type I error d Beta 3 While testing for single population mean when the sample size n is less than 30 and population standard deviation is unknown a z test is used. b t test with n-2 degrees of freedom is used. c t test with n-1 degrees of freedom is used. d None of the above statement is true. 4 Which of the following statements is false? a t distribution is symmetrical. b z distribution is symmetrical. c t distribution is flatter than z distribution. d t distribution is applicable only when sample size n is greater than 50. 5 Which of the following distribution is useful for small sample while testing for population means? a z distribution b. F distribution c. chi-square distribution d. t distribution 6 When testing for the equality of two means, the hypotheses would take which of following forms? a. 푯풐:풖ퟏ = 풖ퟐ 퐻1:푢1 ≠푢2 2 2 b. 퐻표: 휎1 = 휎2 2 2 퐻1: 휎1 ≠ 휎2 C 퐻표 : μ ≤ 2.0 퐻1 : μ > 2.0 d 퐻표:푝1 = 푝2 퐻1:푝1 ≠ 푝2 7 Testing hypotheses concerning population parameters using sample data is called a Exploratory research b Descriptive research c Descriptive analysis d Inferential analysis 8 Which of the following could be labeled as a null hypothesis? a μ ≠ 25 b μ > 25 c μ = 25 d μ < 25 9 The degrees of freedom for testing the equality of two population means assuming equal variance using a t test are a 푛1 + 푛2 b 푛1 - 푛2 c 푛1 + 푛2 - 1 d 풏ퟏ + 풏ퟐ – 2 10 When testing for the proportion of a population, we use a t test b F test c Z test assuming normal approximation to binominal distribution d Paired sample t test 11 When we accept the null hypothesis when it is false we, are committing a type 1 error b type 2 error c neither type 1 nor type 2 error d none of the above is true 12 The alternative hypothesis is “that more than 80% of the students know driving” is an example of a One-tailed test b Two-tailed test c Type 1 error d Type 2 error 13 The degrees of freedom for the t test to test the hypothesis about paired sample are a n b 푛1 + 푛2 c 푛1 + 푛2 – 2 d n -1 14 In case of small sample with the objective of testing the equality of two population means with unequal population variance a z test is used b t test with degrees of freedom 푛1 + 푛2 – 2 is used c t test with 푛1 + 푛2 degrees of freedom is used d None of the above statements is correct. 15 The null hypothesis to test the equality of two population means using t test would be a 푡1 = 푡2 b 푋1 = 푋2 c μ1>μ2 d 훍 ퟏ = 훍 ퟐ 16 The most appropriate level of significance used in testing of hypothesis is a.
Recommended publications
  • T-TEST Outline Hypothesis Testing Steps Bivariate Analysis
    Bivariate Analysis T-TEST Variable 1 2 LEVELS >2 LEVELS CONTINUOUS Variable 2 2 LEVELS X2 X2 t-test chi square test chi square test >2 LEVELS X2 X2 ANOVA chi square test chi square test (F-test) CONTINUOUS t-test ANOVA -Correlation (F-test) -Simple linear Regression Outline Comparison of means: t-test Hypothesis testing steps T-test is used when one variable is of a continuous nature and the other is dichotomous. T-test The t-test is used to compare the means of two groups on a given variable. Anova Examples: Difference in average blood pressure among males & females. Difference in average BMI among those who exercise and those who do not. Hypothesis testing steps Comparison of means: t-test Example 1: Identify the study objective Research question: Among university students, is there a State the null & alternative hypothesis difference between the average weight for males versus females? Select the proper test statistic Null hypothesis (Ho): μ weight males = μ weight females Calculate the test statistic Alternative hypothesis (Ha): μ weight males ≠ μ weight females Take a statistical decision based on the p-value. Statistical test: t-test Reject or accept the null hypothesis Comparison of means: t-test Comparison of means: t-test T-Test (SPSS output) If this p-value is < 0.05 then reject null hypothesis and conclude that the variances are different (accept alternative) Group Statistics and hence check this p-value for the t-test. Std. Error gender N Mean Std. Deviation Mean weight male 804 75.92 12.843 .453 If this p-value is > 0.05 then accept null hypothesis and female 1135 56.47 8.923 .265 conclude that the variances are equal and hence check this p-value for the t-test.
    [Show full text]
  • Chapter Five Multivariate Statistics
    Chapter Five Multivariate Statistics Jorge Luis Romeu IIT Research Institute Rome, NY 13440 April 23, 1999 Introduction In this chapter we treat the multivariate analysis problem, which occurs when there is more than one piece of information from each subject, and present and discuss several materials analysis real data sets. We first discuss several statistical procedures for the bivariate case: contingency tables, covariance, correlation and linear regression. They occur when both variables are either qualitative or quantitative: Then, we discuss the case when one variable is qualitative and the other quantitative, via the one way ANOVA. We then overview the general multivariate regression problem. Finally, the non-parametric case for comparison of several groups is discussed. We emphasize the assessment of all model assumptions, prior to model acceptance and use and we present some methods of detection and correction of several types of assumption violation problems. The Case of Bivariate Data Up to now, we have dealt with data sets where each observation consists of a single measurement (e.g. each observation consists of a tensile strength measurement). These are called univariate observations and the related statistical problem is known as univariate analysis. In many cases, however, each observation yields more than one piece of information (e.g. tensile strength, material thickness, surface damage). These are called multivariate observations and the statistical problem is now called multivariate analysis. Multivariate analysis is of great importance and can help us enhance our data analysis in several ways. For, coming from the same subject, multivariate measurements are often associated with each other. If we are able to model this association then we can take advantage of the situation to obtain one from the other.
    [Show full text]
  • Package 'Biwavelet'
    Package ‘biwavelet’ May 26, 2021 Type Package Title Conduct Univariate and Bivariate Wavelet Analyses Version 0.20.21 Date 2021-05-24 Author Tarik C. Gouhier, Aslak Grinsted, Viliam Simko Maintainer Tarik C. Gouhier <[email protected]> Description This is a port of the WTC MATLAB package written by Aslak Grinsted and the wavelet program written by Christopher Torrence and Gibert P. Compo. This package can be used to perform univariate and bivariate (cross-wavelet, wavelet coherence, wavelet clustering) analyses. License GPL (>= 2) URL https://github.com/tgouhier/biwavelet BugReports https://github.com/tgouhier/biwavelet/issues LazyData yes LinkingTo Rcpp Imports fields, foreach, Rcpp (>= 0.12.2) Suggests testthat, knitr, rmarkdown, devtools RoxygenNote 7.1.1 NeedsCompilation yes Repository CRAN Date/Publication 2021-05-26 05:10:10 UTC R topics documented: biwavelet-package . .2 ar1.spectrum . .4 ar1_ma0_sim . .5 arrow ............................................6 arrow2 . .7 1 2 biwavelet-package check.data . .7 check.datum . .8 convolve2D . .9 convolve2D_typeopen . 10 enviro.data . 10 get_minroots . 11 MOTHERS . 12 phase.plot . 12 plot.biwavelet . 13 pwtc............................................. 17 rcpp_row_quantile . 19 rcpp_wt_bases_dog . 20 rcpp_wt_bases_morlet . 21 rcpp_wt_bases_paul . 22 smooth.wavelet . 23 wclust . 24 wdist . 25 wt.............................................. 26 wt.bases . 28 wt.bases.dog . 29 wt.bases.morlet . 29 wt.bases.paul . 30 wt.sig . 31 wtc.............................................. 32 wtc.sig . 34 wtc_sig_parallel . 36 xwt ............................................. 37 Index 40 biwavelet-package Conduct Univariate and Bivariate Wavelet Analyses Description This is a port of the WTC MATLAB package written by Aslak Grinsted and the wavelet program written by Christopher Torrence and Gibert P. Compo. This package can be used to perform uni- variate and bivariate (cross-wavelet, wavelet coherence, wavelet clustering) wavelet analyses.
    [Show full text]
  • Meta4diag: Bayesian Bivariate Meta-Analysis of Diagnostic Test Studies for Routine Practice
    meta4diag: Bayesian Bivariate Meta-analysis of Diagnostic Test Studies for Routine Practice J. Guo and A. Riebler Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, PO 7491, Norway. July 8, 2016 Abstract This paper introduces the R package meta4diag for implementing Bayesian bivariate meta-analyses of diagnostic test studies. Our package meta4diag is a purpose-built front end of the R package INLA. While INLA offers full Bayesian inference for the large set of latent Gaussian models using integrated nested Laplace approximations, meta4diag extracts the features needed for bivariate meta-analysis and presents them in an intuitive way. It allows the user a straightforward model- specification and offers user-specific prior distributions. Further, the newly proposed penalised complexity prior framework is supported, which builds on prior intuitions about the behaviours of the variance and correlation parameters. Accurate posterior marginal distributions for sensitivity and specificity as well as all hyperparameters, and covariates are directly obtained without Markov chain Monte Carlo sampling. Further, univariate estimates of interest, such as odds ratios, as well as the SROC curve and other common graphics are directly available for interpretation. An in- teractive graphical user interface provides the user with the full functionality of the package without requiring any R programming. The package is available through CRAN https://cran.r-project.org/web/packages/meta4diag/ and its usage will be illustrated using three real data examples. arXiv:1512.06220v2 [stat.AP] 7 Jul 2016 1 1 Introduction A meta-analysis summarises the results from multiple studies with the purpose of finding a general trend across the studies.
    [Show full text]
  • UNIT 1 INTRODUCTION to STATISTICS Introduction to Statistics
    UNIT 1 INTRODUCTION TO STATISTICS Introduction to Statistics Structure 1.0 Introduction 1.1 Objectives 1.2 Meaning of Statistics 1.2.1 Statistics in Singular Sense 1.2.2 Statistics in Plural Sense 1.2.3 Definition of Statistics 1.3 Types of Statistics 1.3.1 On the Basis of Function 1.3.2 On the Basis of Distribution of Data 1.4 Scope and Use of Statistics 1.5 Limitations of Statistics 1.6 Distrust and Misuse of Statistics 1.7 Let Us Sum Up 1.8 Unit End Questions 1.9 Glossary 1.10 Suggested Readings 1.0 INTRODUCTION The word statistics has different meaning to different persons. Knowledge of statistics is applicable in day to day life in different ways. In daily life it means general calculation of items, in railway statistics means the number of trains operating, number of passenger’s freight etc. and so on. Thus statistics is used by people to take decision about the problems on the basis of different type of quantitative and qualitative information available to them. However, in behavioural sciences, the word ‘statistics’ means something different from the common concern of it. Prime function of statistic is to draw statistical inference about population on the basis of available quantitative information. Overall, statistical methods deal with reduction of data to convenient descriptive terms and drawing some inferences from them. This unit focuses on the above aspects of statistics. 1.1 OBJECTIVES After going through this unit, you will be able to: Define the term statistics; Explain the status of statistics; Describe the nature of statistics; State basic concepts used in statistics; and Analyse the uses and misuses of statistics.
    [Show full text]
  • Correlation and Regression Analysis
    OIC ACCREDITATION CERTIFICATION PROGRAMME FOR OFFICIAL STATISTICS Correlation and Regression Analysis TEXTBOOK ORGANISATION OF ISLAMIC COOPERATION STATISTICAL ECONOMIC AND SOCIAL RESEARCH AND TRAINING CENTRE FOR ISLAMIC COUNTRIES OIC ACCREDITATION CERTIFICATION PROGRAMME FOR OFFICIAL STATISTICS Correlation and Regression Analysis TEXTBOOK {{Dr. Mohamed Ahmed Zaid}} ORGANISATION OF ISLAMIC COOPERATION STATISTICAL ECONOMIC AND SOCIAL RESEARCH AND TRAINING CENTRE FOR ISLAMIC COUNTRIES © 2015 The Statistical, Economic and Social Research and Training Centre for Islamic Countries (SESRIC) Kudüs Cad. No: 9, Diplomatik Site, 06450 Oran, Ankara – Turkey Telephone +90 – 312 – 468 6172 Internet www.sesric.org E-mail [email protected] The material presented in this publication is copyrighted. The authors give the permission to view, copy download, and print the material presented that these materials are not going to be reused, on whatsoever condition, for commercial purposes. For permission to reproduce or reprint any part of this publication, please send a request with complete information to the Publication Department of SESRIC. All queries on rights and licenses should be addressed to the Statistics Department, SESRIC, at the aforementioned address. DISCLAIMER: Any views or opinions presented in this document are solely those of the author(s) and do not reflect the views of SESRIC. ISBN: xxx-xxx-xxxx-xx-x Cover design by Publication Department, SESRIC. For additional information, contact Statistics Department, SESRIC. i CONTENTS Acronyms
    [Show full text]
  • Multiple Linear Regression
    Multiple Linear Regression Mark Tranmer Mark Elliot 1 Contents Section 1: Introduction............................................................................................................... 3 Exam16 ...................................................................................................................................................... Exam11...................................................................................................................................... 4 Predicted values......................................................................................................................... 5 Residuals.................................................................................................................................... 6 Scatterplot of exam performance at 16 against exam performance at 11.................................. 6 1.3 Theory for multiple linear regression................................................................................... 7 Section 2: Worked Example using SPSS.................................................................................. 10 Section 3: Further topics.......................................................................................................... 36 Stepwise................................................................................................................................... 46 Section 4: BHPS assignment................................................................................................... 46 Reading list.............................................................................................................................
    [Show full text]
  • The Landscape of R Packages for Automated Exploratory Data Analysis by Mateusz Staniak and Przemysław Biecek
    CONTRIBUTED RESEARCH ARTICLE 1 The Landscape of R Packages for Automated Exploratory Data Analysis by Mateusz Staniak and Przemysław Biecek Abstract The increasing availability of large but noisy data sets with a large number of heterogeneous variables leads to the increasing interest in the automation of common tasks for data analysis. The most time-consuming part of this process is the Exploratory Data Analysis, crucial for better domain understanding, data cleaning, data validation, and feature engineering. There is a growing number of libraries that attempt to automate some of the typical Exploratory Data Analysis tasks to make the search for new insights easier and faster. In this paper, we present a systematic review of existing tools for Automated Exploratory Data Analysis (autoEDA). We explore the features of fifteen popular R packages to identify the parts of the analysis that can be effectively automated with the current tools and to point out new directions for further autoEDA development. Introduction With the advent of tools for automated model training (autoML), building predictive models is becoming easier, more accessible and faster than ever. Tools for R such as mlrMBO (Bischl et al., 2017), parsnip (Kuhn and Vaughan, 2019); tools for python such as TPOT (Olson et al., 2016), auto-sklearn (Feurer et al., 2015), autoKeras (Jin et al., 2018) or tools for other languages such as H2O Driverless AI (H2O.ai, 2019; Cook, 2016) and autoWeka (Kotthoff et al., 2017) supports fully- or semi-automated feature engineering and selection, model tuning and training of predictive models. Yet, model building is always preceded by a phase of understanding the problem, understanding of a domain and exploration of a data set.
    [Show full text]
  • WHY MULTIVARIATE ANALYSIS? • Used for Analysing Complicated
    WHY MULTIVARIATE ANALYSIS? • Used for analysing complicated data sets • When there are many Independent Variables (IVs) and/or many Dependent Variables (DVs) • When IVs and DVs are correlated with one Dr. Azmi Mohd Tamil another to varying degrees Jabatan Kesihatan Masyarakat Fakulti Perubatan UKM • When need to come up with Prediction Model Based on lecture notes by Dr Azidah Hashim • Parallels greater complexity of contemporary research USING MULTIVARIATE ANALYSIS CHOICE OF APPROPRIATE STATISTICAL METHOD BASED ON: • WHICH STATISTICAL PROCEDURE TO USE? • Nature of IVs and DVs • HOW TO PERFORM CHOSEN PROCEDURE? • Investigator’s Experience • Personal Preferences • HOW TO INFER FROM RESULTS OBTAINED? • Ease of Comfort with Methods Used • Literature Review References • ANY OTHER ALTERNATIVE APPROACH? • Consultation with Statistician BASIC PREMISE I: BASIC PREMISE I: TERM USED:- Relationship between X Y X Y Independent Variable (IV) Dependent Variable (DV) Predictor Outcome Explanatory Response Eg . Smoking Lung Cancer Risk Factor Effect Age Hypertension Covariates (Continuous) Maternal ANC Birthweight Factor (Categorical) Control INDEPENDENT DEPENDENT VARIABLE VARIABLE Confounders RISK FACTOR OUTCOME Nuisance 1 BASIC PREMISE II: TERMINOLOGY: Univariate Analysis DATA • Analysis in which there is a single DV Bivariate Analysis Qualitative Quantitative • Analysis of two variables (categorical) • Wish to simply study the relationship between -Dichotomous -Continuous the variables - Polynomial -Discrete Multivariate Analysis • Simultaneously analyse multiple DVs and IVs Rough Guide to Multivariate BASIC PREMISE 111:VARIATIONS OF Methods (1) THE SAME THEME Name Xs Y THE GENERAL LINEAR MODEL: Regression and Continuous Continuous Correlation (eg. age) (eg. BP) Y = a + b1x1 + b2x2 + b3x3 …. + bixi Analysis of Variance Categorical Continuous Used in the following procedures: (ANOVA) (eg.
    [Show full text]
  • Geographic Information Systems Correlation Modeling As A
    Boise State University ScholarWorks Anthropology Graduate Projects and Theses Department of Anthropology 6-1-2011 Geographic Information Systems Correlation Modeling as a Management Tool in the Study Effects of Environmental Variables’ Effects on Cultural Resources Brian Wallace Boise State University GEOGRAPHIC INFORMATION SYSTEMS CORRELATION MODELING AS A MANAGEMENT TOOL IN THE STUDY EFFECTS OF ENVIRONMENTAL VARIABLES’ EFFECTS ON CULTURAL RESOURCES by Brian N. Wallace A Project Submitted in Partial Fulfillment Of the Requirements for the Degree of Master of Applied Anthropology Boise State University June 2011 BOISE STATE UNIVERSITY GRADUATE COLLEGE DEFENSE COMMITTEE APPROVAL of the project submitted by Brian N. Wallace The project presented by Brian N. Wallace entitled Geographic Information Systems Correlation Modeling as a Management Tool in the Study Effects of Environmental Variables’ Effects on Cultural Resources is hereby approved: ______________________________________________________ Mark G. Plew Date Advisor ______________________________________________________ John Ziker Date Committee Member ______________________________________________________ Margaret A. Streeter Date Committee Member ______________________________________________________ John R. (Jack) Pelton Date Dean of the Graduate College DEDICATION To my parents Jerry and Cheryl Wallace ii ACKNOWLEDGMENTS The author would like to express his appreciation to those individuals and entities who contributed to the completion of this thesis. Dr. Mark Plew is owed a great debt of gratitude for his invaluable advice, guidance, and friendship throughout my graduate experience. It is because of his helpful suggestions both for this paper and his mentorship over the last few years that have allowed me succeed beyond my own expectations. To the Cultural Resource and GIS staff of the Boise District of the Bureau of Land Management (BLM) whom allowed for access to the necessary data to run analyses successfully and for helpful guidance along the way.
    [Show full text]
  • Univariate Analysis and Normality Test Using SAS, STATA, and SPSS
    © 2002-2006 The Trustees of Indiana University Univariate Analysis and Normality Test: 1 Univariate Analysis and Normality Test Using SAS, STATA, and SPSS Hun Myoung Park This document summarizes graphical and numerical methods for univariate analysis and normality test, and illustrates how to test normality using SAS 9.1, STATA 9.2 SE, and SPSS 14.0. 1. Introduction 2. Graphical Methods 3. Numerical Methods 4. Testing Normality Using SAS 5. Testing Normality Using STATA 6. Testing Normality Using SPSS 7. Conclusion 1. Introduction Descriptive statistics provide important information about variables. Mean, median, and mode measure the central tendency of a variable. Measures of dispersion include variance, standard deviation, range, and interquantile range (IQR). Researchers may draw a histogram, a stem-and-leaf plot, or a box plot to see how a variable is distributed. Statistical methods are based on various underlying assumptions. One common assumption is that a random variable is normally distributed. In many statistical analyses, normality is often conveniently assumed without any empirical evidence or test. But normality is critical in many statistical methods. When this assumption is violated, interpretation and inference may not be reliable or valid. Figure 1. Comparing the Standard Normal and a Bimodal Probability Distributions Standard Normal Distribution Bimodal Distribution .4 .4 .3 .3 .2 .2 .1 .1 0 0 -5 -3 -1 1 3 5 -5 -3 -1 1 3 5 T-test and ANOVA (Analysis of Variance) compare group means, assuming variables follow normal probability distributions. Otherwise, these methods do not make much http://www.indiana.edu/~statmath © 2002-2006 The Trustees of Indiana University Univariate Analysis and Normality Test: 2 sense.
    [Show full text]
  • Chapter Four: Univariate Statistics SPSS V11 Chapter Four: Univariate Statistics
    Chapter Four: Univariate Statistics SPSS V11 Chapter Four: Univariate Statistics Univariate analysis, looking at single variables, is typically the first procedure one does when examining first time data. There are a number of reasons why it is the first procedure, and most of the reasons we will cover at the end of this chapter, but for now let us just say we are interested in the "basic" results. If we are examining a survey, we are interested in how many people said, "Yes" or "No", or how many people "Agreed" or "Disagreed" with a statement. We aren't really testing a traditional hypothesis with an independent and dependent variable; we are just looking at the distribution of responses. The SPSS tools for looking at single variables include the following procedures: Frequencies, Descriptives and Explore all located under the Analyze menu. This chapter will use the GSS02A file used in earlier chapters, so start SPSS and bring the file into the Data Editor. ( See Chapter 1 to refresh your memory on how to start SPSS). To begin the process start SPSS, then open the data file. Under the Analyze menu, choose Descriptive Statistics and the procedure desired: Frequencies, Descriptives, Explore, Crosstabs. Frequencies Generally a frequency is used for looking at detailed information in a nominal (category) data set that describes the results. Categorical data is for variables such as gender i.e. males are coded as "1" and females are coded as "2." Frequencies options include a table showing counts and percentages, statistics including percentile values, central tendency, dispersion and distribution, and charts including bar charts and histograms.
    [Show full text]