Statistics & Quantitative Analysis SIPA U4320


Segment 10: Multiple Regression
Prof. Sharyn O'Halloran
Copyright Sharyn O'Halloran 2001

Key Points
- Review Univariate Regression Model
- Introduce Multivariate Regression
  - Assumptions
  - Estimation
  - Hypothesis Testing
- Interpreting Multiple Regression Model
  - "Impact of X on Y controlling for ...."
  - Slope Coefficient as a Multiplication Factor
- Path Diagram and Causal Models
  - Direct and Indirect Effects

Univariate Analysis
- Assumptions of Regression Model
- Regression Line
- Population Parameters
  - The standard regression equation is
      $Y_i = \alpha + \beta X_i + \varepsilon_i$
  - The only things we observe are Y and X.
  - From these data we estimate $\alpha$ and $\beta$.
  - But our estimate will always contain some error.

Univariate Analysis (cont.)
- This error is represented by:
    $\varepsilon_i = Y_i - (\alpha + \beta X_i)$
[Figure: Yield plotted against Fertilizer with the line $Y = \alpha + \beta X$, its intercept $\alpha$, and the errors $\varepsilon_1, \varepsilon_2, \varepsilon_3$ for the observations at $X_1, X_2, X_3$.]

Univariate Analysis (cont.)
- Underlying Assumptions
  - Linearity: The true relation between Y and X is captured in the equation
      $Y = \alpha + \beta X$
  - Homoscedasticity (Homogeneous Variance): Each of the $\varepsilon_i$ has the same variance.
      $E(\varepsilon_i^2) = \sigma^2$ for all $i$
  - Independence: Each of the $\varepsilon_i$ is independent of the others. That is, the value of one observation's error does not affect the value of any other observation's error.
      $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
  - Normality: Each $\varepsilon_i$ is normally distributed with mean 0 and variance $\sigma^2$.
      $\varepsilon_i \sim N(0, \sigma^2)$

Univariate Analysis (cont.)
- Probability of Y given X
[Figure: P(Y|X). Distributions of Yield with the same variance $\sigma^2$, centered on the true regression line $Y = \alpha + \beta X$ at the Fertilizer levels $X_1, X_2, X_3$.]

Univariate Analysis (cont.)
- Sample Parameters
  - Most times we don't observe the underlying population parameters.
  - All we observe is a sample of X and Y values, from which we make estimates of $\alpha$ and $\beta$.
  - The predicted line takes the form
      $\hat{Y} = a + bX$
    where
      $b = \frac{\sum xy}{\sum x^2}$, $a = \bar{Y} - b\bar{X}$
    and $x = X_i - \bar{X}$, $y = Y_i - \bar{Y}$ are deviations from the sample means.
  - The predicted line is the expected value of Y for a given value of X.
  - For any value of the independent variable, there is a single most likely value of the dependent variable.
[Figure: Relation Between Yield and Fertilizer. Yield (Bushel/Acre) against Fertilizer (lb/Acre), with the predicted line.]

Univariate Analysis (cont.)
- So we introduce a new form of error in our analysis:
    $e_i = Y_i - \hat{Y}_i$
- Source of error: inherent variability of the sampling process.
[Figure: Yield against Fertilizer, showing the estimated regression line $\hat{Y} = a + bX$, the true regression line $Y = \alpha + \beta X$, and the errors $e_1, e_2, e_3$.]

Univariate Analysis (cont.)
- Inferences
  - Make inferences about the population given a sample.
- Best Fit Line
  - We are estimating the population line by drawing the best fit line through our data:
      $\hat{Y} = a + bX$
  - We estimate both a slope and an intercept:
      $b = \frac{\sum xy}{\sum x^2}$, $a = \bar{Y} - b\bar{X}$

Univariate Analysis (cont.)
- Parameter of interest is $\beta$
  - The slope coefficient $\beta$ measures the impact of the independent variable on the dependent variable.
  - $\beta = 0$ implies that X has no effect on Y.
  - To construct a statistical test of the slope of the regression line, we need to know its mean and standard error.
- Mean
  - The mean of the slope of the regression line: the expected value of b is $\beta$.
      $E(b) = \beta$

Univariate Analysis (cont.)
- Standard Error
  - The standard error measures how far our estimate b typically falls from the true slope $\beta$.
      Standard error of $b = \frac{\sigma}{\sqrt{\sum x^2}}$
    where $x^2 = (X_i - \bar{X})^2$ and
      $s_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} = \frac{\sum x^2}{n}$
  - Rewrite the formula:
      $SE = \frac{\sigma}{\sqrt{\sum x^2}} = \frac{\sigma}{\sqrt{n \left( \frac{\sum x^2}{n} \right)}} = \frac{\sigma}{\sqrt{n}} \cdot \frac{1}{\sqrt{\sum x^2 / n}}$
    The first factor has the form of a standard error; the second is the inverse of the spread of X.

Univariate Analysis (cont.)
- The Standard Error of slope b
  - Distribution of error terms
      $SE = \frac{\sigma}{\sqrt{\sum x^2}}$
  - This makes sense: b is the factor that relates the X's to the Y.
  - The standard error depends on both the expected variation in the Y's and the variation in the X's.

Univariate Analysis (cont.)
- Hypothesis Testing
  - 95% Confidence Intervals ($\sigma$ unknown)
    - Confidence interval for the true slope $\beta$ given our estimate b:
        $\beta = b \pm t_{.025} \, SE$
        $\beta = b \pm t_{.025} \frac{s}{\sqrt{\sum x^2}}$
    - Test to see if the hypothesized value lies within the estimated range.

Univariate Analysis (cont.)
- P-values
  - The p-value is the probability of observing a result at least as extreme as the one in our sample, given that the null hypothesis is true.
  - We can calculate the p-value by:
    - Standardizing and calculating the t-statistic:
        $t = \frac{b - \beta_0}{SE}$
    - Determining the degrees of freedom: for univariate analysis, $n - 2$.
    - Finding the probability associated with the t-statistic with $n - 2$ degrees of freedom in the t-table.

Univariate Analysis (cont.)
- Example: Do people save more money as their income increases?
  - Data: Suppose we observed four individuals' income and savings.

| Observation | Income (X) | Savings (Y) | x = X - mean | y = Y - mean | xy | x^2 | Predicted Y | Deviation from predicted Y | Squared deviation from predicted Y |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 22 | 2 | 1 | -0.2 | -0.2 | 1 | 2.34 | -0.34 | 0.116 |
| 2 | 18 | 2 | -3 | -0.2 | 0.6 | 9 | 1.77 | 0.23 | 0.053 |
| 3 | 17 | 1.6 | -4 | -0.6 | 2.4 | 16 | 1.63 | -0.03 | 0.0009 |
| 4 | 27 | 3.2 | 6 | 1 | 6 | 36 | 3.05 | 0.15 | 0.0225 |
| Sum | 84 | 8.8 | 0 | 0 | 8.8 | 62 | 8.79 | | 0.1924 |
| Mean | 21 | 2.2 | | | | | | | |

  Here $x = X_i - \bar{X}$, $y = Y_i - \bar{Y}$, and the predicted line is $\hat{Y} = a + bX$.

Univariate Analysis (cont.)
- Calculate the fitted line $\hat{Y} = a + bX$
  - Estimate b:
      $b = \frac{\sum xy}{\sum x^2} = \frac{8.8}{62} = 0.142$
    - What does this mean? On average, people save a little over 14% of every extra dollar they earn.
  - Intercept a:
      $a = \bar{Y} - b\bar{X} = 2.2 - 0.142\,(21) = -0.782$
    - What does this mean? With no income, people borrow.
  - The regression equation is:
      $\hat{Y} = -0.78 + 0.142X$

Univariate Analysis (cont.)
- Savings Ratio by Income
[Figure: Savings plotted against Income with the fitted line $\hat{Y} = -0.78 + 0.142X$.]
  - For each additional dollar of income, savings rise by 14.2 cents.
  - People with no income borrow.

Univariate Analysis (cont.)
- Calculate a 95% confidence interval
  - State Hypothesis
    - Now let's test the null hypothesis that $\beta = 0$, that is, the hypothesis that people do not save any of the extra money they earn.
        $H_0: \beta = 0$
        $H_a: \beta \neq 0$
      at the 5% significance level.
  - Construct the Confidence Interval
    - What do we need to calculate the confidence interval?
      - Degrees of freedom: $n - 2 = 2$
      - $\alpha$-level = .05
      - Sample variance:
          $s^2 = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - 2} = \frac{.192}{2} = 0.096$
        so that $s = \sqrt{0.096} \approx .31$

Univariate Analysis (cont.)
- What is the formula for the confidence interval?
    $\beta = b \pm t_{.025} \frac{s}{\sqrt{\sum x^2}}$
    $\beta = .142 \pm 4.30 \cdot \frac{.31}{\sqrt{62}}$
    $\beta = .142 \pm .169 \Rightarrow -.027 \leq \beta \leq .311$
- Reject or fail to reject the null hypothesis
  - Since zero falls within this interval, we cannot reject the null hypothesis. This is probably due to the small sample size.
[Figure: the interval from -.027 to .311, with $\beta = 0$ inside it.]

Univariate Analysis (cont.)
- Additional Examples
  - How about the hypothesis that $\beta = .50$, so that people save half their extra income?
    - It is outside the confidence interval, so we can reject this hypothesis.
  - Let's say that it is well known that Japanese consumers save 20% of their income on average.
    - Can we use these data (presumably from American families) to test the hypothesis that Japanese save at a higher rate than Americans?
    - Since 20% also falls within the confidence interval, we cannot reject the null hypothesis that Americans save at the same rate as Japanese.

Regression in Excel
- Example:
  - Manatees are large, gentle sea creatures that live along the Florida coast.
  - Many manatees are killed or injured by powerboats each year.
  - The US Fish and Wildlife Service conducted a study of the impact of powerboat registrations on the number of manatees killed.
- Manatee Data: number of powerboats registered and manatee deaths.

Regression in Excel
These are the data collected:

| Year | Powerboat registration (1000) | Manatees killed |
|---|---|---|
| 1977 | 447 | 13 |
| 1978 | 460 | 21 |
| 1979 | 481 | 24 |
| 1980 | 498 | 16 |
| 1981 | 513 | 24 |
| 1982 | 512 | 20 |
| 1983 | 526 | 15 |
| 1984 | 559 | 34 |
| 1985 | 585 | 33 |
| 1986 | 614 | 33 |
| 1987 | 645 | 39 |
| 1988 | 675 | 43 |
| 1989 | 711 | 50 |
| 1990 | 719 | 47 |
| 1991 | 716 | 53 |
| 1992 | 716 | 38 |
| 1993 | 716 | 35 |
| 1994 | 735 | 49 |

Descriptive Statistics

| Statistic | Powerboat registration (1000) | Manatees killed |
|---|---|---|
| Mean | 601.56 | 32.61 |
| Standard Error | 24.46 | 3.02 |
| Median | 599.50 | 33.50 |
| Mode | 716.00 | 24.00 |
| Standard Deviation | 103.79 | 12.82 |
| Sample Variance | 10773.32 | 164.25 |
| Range | 288.00 | 40.00 |
| Minimum | 447.00 | 13.00 |
| Maximum | 735.00 | 53.00 |
| Sum | 10828.00 | 587.00 |
| Count | 18.00 | 18.00 |
| Confidence Level (95.0%) | 51.62 | 6.37 |

- Does the number of registered powerboats increase the number of manatees killed?

Regression in Excel (cont.)
- Graph Data:
[Figure: Relation between Powerboat Registration (1000) and Manatee Deaths. Manatees Killed plotted against Registration, with the fitted line.]
    $\hat{Y} = -35.18^{*} + 0.11^{*} X_1$
    t-statistics: (-4.57) for the intercept, (8.93) for the slope
  Note: * indicates p-value < 0.05.
  - For each additional 1000 powerboats registered, we expect an increase of .11 manatee deaths.

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | -35.18 | 7.70 | -4.57 | 0.000314 |
| Powerboat registration (1000) | 0.11 | 0.01 | 8.93 | 0.000000 |

Regression in Excel (cont.)
- Hypothesis Testing
    $H_0: \beta_1 = 0$
    $H_a: \beta_1 \neq 0$
- Calculate a 95% Confidence Interval, with $\alpha/2 = .025$ in each tail and $n - k - 1 = 16$ degrees of freedom:
    $b_1 \pm t_{.025}^{\,n-k-1} \cdot SE_{b_1}$
    $0.11 \pm 2.12 \cdot 0.01$
    $0.11 \pm 0.0212 \Rightarrow 0.0888 \leq \beta_1 \leq 0.1312$
- Reject or Fail to Reject Null Hypothesis
  - Zero lies outside this interval. Therefore, we reject the null hypothesis that $\beta_1 = 0$ in favor of the alternative that it is not equal to 0.
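The two worked examples above (savings against income, and manatee deaths against powerboat registrations) can be checked with a short script. What follows is a minimal sketch in Python and is not part of the original slides: the names `ols_fit` and `slope_interval` are hypothetical helpers, and the critical values 4.303 and 2.120 are assumed t-table entries for 2 and 16 degrees of freedom (the slides round them to 4.30 and 2.12). The script applies the deviation formulas $b = \sum xy / \sum x^2$, $a = \bar{Y} - b\bar{X}$ and $SE = s / \sqrt{\sum x^2}$ directly.

```python
import math


def ols_fit(xs, ys):
    """Fit Y-hat = a + b*X by least squares using deviations from the means.

    Returns (a, b, se_b, s): intercept, slope, standard error of the slope,
    and the residual standard deviation with n - 2 degrees of freedom.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy / sum_x2
    a = y_bar - b * x_bar
    # Sum of squared deviations of the observed Y from the predicted Y.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_b = s / math.sqrt(sum_x2)
    return a, b, se_b, s


def slope_interval(b, se_b, t_crit):
    """Two-sided confidence interval b +/- t_crit * SE for the true slope."""
    margin = t_crit * se_b
    return b - margin, b + margin


# Savings example: four observations of income (X) and savings (Y).
income = [22, 18, 17, 27]
savings = [2.0, 2.0, 1.6, 3.2]
a_s, b_s, se_s, s_s = ols_fit(income, savings)
ci_s = slope_interval(b_s, se_s, 4.303)  # assumed t_.025 with 2 df

# Manatee example: powerboat registrations (1000s) and manatees killed, 1977-1994.
boats = [447, 460, 481, 498, 513, 512, 526, 559, 585,
         614, 645, 675, 711, 719, 716, 716, 716, 735]
deaths = [13, 21, 24, 16, 24, 20, 15, 34, 33,
          33, 39, 43, 50, 47, 53, 38, 35, 49]
a_m, b_m, se_m, s_m = ols_fit(boats, deaths)
t_m = b_m / se_m  # t-statistic for H0: slope = 0
ci_m = slope_interval(b_m, se_m, 2.120)  # assumed t_.025 with 16 df

print(f"savings:  Y-hat = {a_s:.3f} + {b_s:.3f} X, "
      f"95% CI for slope = ({ci_s[0]:.3f}, {ci_s[1]:.3f})")
print(f"manatees: Y-hat = {a_m:.2f} + {b_m:.4f} X, t = {t_m:.2f}, "
      f"95% CI for slope = ({ci_m[0]:.4f}, {ci_m[1]:.4f})")
```

For the savings data the interval contains zero, so the null hypothesis of no effect cannot be rejected; for the manatee data it lies entirely above zero. Because the script carries the unrounded slope and standard error, its manatee interval (roughly 0.09 to 0.14) differs a little from one computed with the rounded values 0.11 and 0.01.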