Data Analysis LCE
Market Research  Data Analysis  LCE

Univariate Data Analysis
Frequency Distribution
Cross-Tabulation

Agenda
1) Overview
2) Frequency Distribution
3) Statistics Associated with Frequency Distribution
   i. Measures of Location
   ii. Measures of Variability
   iii. Measures of Shape
4) Introduction to Hypothesis Testing
5) A General Procedure for Hypothesis Testing
6) Cross-Tabulations
7) Statistics Associated with Cross-Tabulation

Internet Usage Data (File Internet)

Respondent                       Internet   Attitude Toward          Usage of Internet
Number       Sex    Familiarity  Usage      Internet    Technology   Shopping    Banking
 1           1.00   7.00         14.00      7.00        6.00         1.00        1.00
 2           2.00   2.00          2.00      3.00        3.00         2.00        2.00
 3           2.00   3.00          3.00      4.00        3.00         1.00        2.00
 4           2.00   3.00          3.00      7.00        5.00         1.00        2.00
 5           1.00   7.00         13.00      7.00        7.00         1.00        1.00
 6           2.00   4.00          6.00      5.00        4.00         1.00        2.00
 7           2.00   2.00          2.00      4.00        5.00         2.00        2.00
 8           2.00   3.00          6.00      5.00        4.00         2.00        2.00
 9           2.00   3.00          6.00      6.00        4.00         1.00        2.00
10           1.00   9.00         15.00      7.00        6.00         1.00        2.00
11           2.00   4.00          3.00      4.00        3.00         2.00        2.00
12           2.00   5.00          4.00      6.00        4.00         2.00        2.00
13           1.00   6.00          9.00      6.00        5.00         2.00        1.00
14           1.00   6.00          8.00      3.00        2.00         2.00        2.00
15           1.00   6.00          5.00      5.00        4.00         1.00        2.00
16           2.00   4.00          3.00      4.00        3.00         2.00        2.00
17           1.00   6.00          9.00      5.00        3.00         1.00        1.00
18           1.00   4.00          4.00      5.00        4.00         1.00        2.00
19           1.00   7.00         14.00      6.00        6.00         1.00        1.00
20           2.00   6.00          6.00      6.00        4.00         2.00        2.00
21           1.00   6.00          9.00      4.00        2.00         2.00        2.00
22           1.00   5.00          5.00      5.00        4.00         2.00        1.00
23           2.00   3.00          2.00      4.00        2.00         2.00        2.00
24           1.00   7.00         15.00      6.00        6.00         1.00        1.00
25           2.00   6.00          6.00      5.00        3.00         1.00        2.00
26           1.00   6.00         13.00      6.00        6.00         1.00        1.00
27           2.00   5.00          4.00      5.00        5.00         1.00        1.00
28           2.00   4.00          2.00      3.00        2.00         2.00        2.00
29           1.00   4.00          4.00      5.00        3.00         1.00        2.00
30           1.00   3.00          3.00      7.00        5.00         1.00        2.00

Univariate Data Analysis
Examining Summary Statistics for Individual Variables

Level of Measurement
• Different summary measures are appropriate for different types of data, depending on the
level of measurement:
  – Nominal (categorical data where there is no inherent order to the categories)
  – Ordinal (categorical data where there is a meaningful order of categories, but no measurable distance between categories)
  – Scale (data measured on an interval or ratio scale)

Summary Measures
• For categorical data
  – From the menus choose:
    • Analyse
      – Descriptive Statistics
        • Frequencies
• For scale variables
  – Analyse
    • Descriptive Statistics
      – Explore

Frequency Distribution
• In a frequency distribution, one variable is considered at a time.
• A frequency distribution for a variable produces a table of frequency counts, percentages, and cumulative percentages for all the values associated with that variable.

Frequency Distribution of Familiarity with the Internet

Value label        Value   Frequency (N)   Percentage   Valid percentage   Cumulative percentage
Not so familiar    1        0                0.0          0.0                 0.0
                   2        2                6.7          6.9                 6.9
                   3        6               20.0         20.7                27.6
                   4        6               20.0         20.7                48.3
                   5        3               10.0         10.3                58.6
                   6        8               26.7         27.6                86.2
Very familiar      7        4               13.3         13.8               100.0
Missing            9        1                3.3
TOTAL                      30              100.0        100.0

[Figure: Frequency Histogram. x-axis: Familiarity (2 to 7); y-axis: Frequency (0 to 8)]

Statistics: Familiarity

N             Valid      29
              Missing     1
Median                    5.00
Mode                      6
Percentiles   25          3.00
              50          5.00
              75          6.00

Familiarity

                           Frequency   Percent   Valid Percent   Cumulative Percent
Valid     2                 2            6.7       6.9              6.9
          3                 6           20.0      20.7             27.6
          4                 6           20.0      20.7             48.3
          5                 3           10.0      10.3             58.6
          6                 8           26.7      27.6             86.2
          Very Familiar     4           13.3      13.8            100.0
          Total            29           96.7     100.0
Missing   9                 1            3.3
Total                      30          100.0

Internet Usage Group

                         Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Light Users    15           50.0      50.0             50.0
          Heavy Users    15           50.0      50.0            100.0
          Total          30          100.0     100.0

[Figure: Internet Usage Group chart. Categories: Light Users, Heavy Users]

Statistics Associated with Frequency Distribution
Measures of Location
• The mean, or average value, is the most commonly used measure of central tendency.
• The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.
• The median of a sample is the middle value when the data are arranged in ascending or descending order.
• If the number of data points is even, the median is usually estimated as the midpoint between the two middle values, that is, by adding the two middle values and dividing their sum by 2.
• The median is the 50th percentile.

Statistics Associated with Frequency Distribution
Measures of Variability
• The range measures the spread of the data. It is simply the difference between the largest and smallest values in the sample.
  $\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
• The interquartile range is the difference between the 75th and 25th percentiles. For a set of data points arranged in order of magnitude, the pth percentile is the value that has p% of the data points below it and (100 - p)% above it.
• The variance is the mean squared deviation from the mean. The variance can never be negative.
• The standard deviation is the square root of the variance.
  $s_x = \sqrt{\sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}}$
• The coefficient of variation is the ratio of the standard deviation to the mean expressed as a percentage, and is a unitless measure of relative variability.
  $CV = s_x / \bar{X}$

Statistics Associated with Frequency Distribution
Measures of Shape
• Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other. It can be thought of as the tendency for one tail of the distribution to be heavier than the other.
• Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution.
  The kurtosis of a normal distribution is zero under the excess-kurtosis convention used here. If the kurtosis is positive, then the distribution is more peaked than a normal distribution. A negative value means that the distribution is flatter than a normal distribution.

[Figure: Skewness of a Distribution. (a) Symmetric Distribution; (b) Skewed Distribution. Each panel marks the Mean, Median, and Mode]

Bivariate Data Analysis
To obtain Crosstabulations …

Cross-Tabulation
• While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously.
• Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values, e.g., Table 15.3.

Gender and Internet Usage

                   Gender
Internet Usage     Male    Female    Row Total
Light (1)           5      10        15
Heavy (2)          10       5        15
Column Total       15      15

Two Variables Cross-Tabulation
• Since two variables have been cross-classified, percentages could be computed either columnwise, based on column totals, or rowwise, based on row totals.
• The general rule is to compute the percentages in the direction of the independent variable, across the dependent variable.
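The columnwise and rowwise percentages described above can be sketched from the Gender and Internet Usage counts. This is a minimal illustration in plain Python, not the output of any statistics package; the function and variable names are hypothetical, and treating gender as the independent variable is an assumption made for the example.

```python
# Counts from the "Gender and Internet Usage" table:
# rows are usage levels, columns are gender.
counts = {
    "Light": {"Male": 5, "Female": 10},
    "Heavy": {"Male": 10, "Female": 5},
}

def column_percentages(table):
    """Percentages computed down each column (each column sums to 100%)."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: round(100 * row[c] / totals[c], 1) for c in columns}
        for r, row in table.items()
    }

def row_percentages(table):
    """Percentages computed across each row (each row sums to 100%)."""
    return {
        r: {c: round(100 * v / sum(row.values()), 1) for c, v in row.items()}
        for r, row in table.items()
    }

# Assumption: gender is the independent variable, so percentages
# are taken within each gender column.
by_gender = column_percentages(counts)
by_usage = row_percentages(counts)

for usage, row in by_gender.items():
    print(usage, row)
```

With gender as the independent variable, the columnwise version is the one the general rule calls for: it shows how usage is distributed within each gender.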
Internet Usage by Gender

                   Gender
Internet Usage     Male      Female
Light              33.3%     66.7%
Heavy              66.7%     33.3%
Column total       100%      100%

Gender by Internet Usage

             Internet Usage
Gender       Light     Heavy     Total
Male         33.3%     66.7%     100.0%
Female       66.7%     33.3%     100.0%

Purchase of Fashion Clothing by Marital Status

Purchase of               Current Marital Status
Fashion Clothing          Married    Unmarried
High                      31%        52%
Low                       69%        48%
Column                    100%       100%
Number of respondents     700        300

Ownership of Expensive Automobiles by Education Level

Own Expensive       Education
Automobile          College Degree    No College Degree
Yes                 32%               21%
No                  68%               79%
Column totals       100%              100%
Number of cases     250               750

Desire to Travel Abroad by Age

Desire to                 Age
Travel Abroad             Less than 45    45 or More
Yes                       50%             50%
No                        50%             50%
Column totals             100%            100%
Number of respondents     500             500

Eating Frequently in Fast-Food Restaurants by Family Size

Eat Frequently in         Family Size
Fast-Food Restaurants     Small     Large
Yes                       65%       65%
No                        35%       35%
Column totals             100%      100%
Number of cases           500       500

Statistics Associated with Cross-Tabulation
Chi-Square
• To determine whether a systematic association exists, the probability of obtaining a value of chi-square as large as or larger than the one calculated from the cross-tabulation is estimated.
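As a rough illustration of that probability, the sketch below applies the usual Pearson chi-square statistic to the Gender and Internet Usage counts (5, 10, 10, 5). It assumes expected counts are derived from the row and column totals and applies no continuity correction; the function names are hypothetical, and the p-value shortcut shown holds only for one degree of freedom, which is the case for a 2 x 2 table.

```python
import math

# Observed counts: rows are Light / Heavy usage, columns are Male / Female.
observed = [[5, 10],
            [10, 5]]

def chi_square(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """P(chi-square >= stat) for 1 degree of freedom only.

    With 1 df the statistic is a squared standard normal, so the
    upper-tail probability equals erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))

stat = chi_square(observed)
p = p_value_df1(stat)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

For these counts every expected cell is 7.5, so the statistic comes to about 3.33 with one degree of freedom, and the estimated probability falls between 0.05 and 0.10. Larger tables need the chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom, which this shortcut does not cover.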