F-Test) CONTINUOUS T-Test ANOVA -Correlation (F-Test) -Simple Linear Regression


Bivariate Analysis

Choice of statistical test by type of each variable:

| Variable 2 \ Variable 1 | 2 LEVELS | >2 LEVELS | CONTINUOUS |
|---|---|---|---|
| 2 LEVELS | X2 (chi-square test) | X2 (chi-square test) | t-test |
| >2 LEVELS | X2 (chi-square test) | X2 (chi-square test) | ANOVA (F-test) |
| CONTINUOUS | t-test | ANOVA (F-test) | Correlation; simple linear regression |

Comparison of means: F-test

Example 1

Research question: Among university students, is the average weight of students in university "A" different from that in university "B" and that in university "C"? Is there an association between weight and type of university?

Ho: Average weight A = Average weight B = Average weight C
Ha: At least two averages are different

Statistical test: F-test (Analysis of Variance, ANOVA)

One-way F-test (SPSS output): Example 1

Descriptives: weight

| University | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean: Lower Bound | 95% CI for Mean: Upper Bound | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| A | 290 | 65.59 | 13.297 | .781 | 64.06 | 67.13 | 41 | 125 |
| B | 1340 | 63.46 | 14.201 | .388 | 62.70 | 64.22 | 39 | 135 |
| C | 345 | 67.74 | 15.299 | .824 | 66.12 | 69.36 | 42 | 115 |
| Total | 1975 | 64.52 | 14.360 | .323 | 63.89 | 65.15 | 39 | 135 |

ANOVA: weight

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 5414.963 | 2 | 2707.482 | 13.293 | .000 |
| Within Groups | 401651.5 | 1972 | 203.677 | | |
| Total | 407066.5 | 1974 | | | |

The value in the Sig. column is the p-value for the F-test, which tests the null hypothesis that mean weight for A = mean weight for B = mean weight for C. A value printed as .000 means p < 0.001.

If the p-value is > 0.05, do not reject the null hypothesis: the data give no evidence that the means of the 3 groups differ.
If the p-value is < 0.05, reject the null hypothesis (accept the alternative) and conclude that at least two means are different.

Here the p-value is < 0.05, so we reject the null hypothesis and conclude that at least two means are different. But which of the means are different? To find out exactly which two means differ, ask for a Post Hoc test.

Post Hoc Tests: Multiple Comparisons (Dependent Variable: weight, Bonferroni)

| (I) university | (J) university | Mean Difference (I-J) | Std. Error | Sig. | 95% CI: Lower Bound | 95% CI: Upper Bound |
|---|---|---|---|---|---|---|
| A | B | 2.135 | .924 | .063 | -.08 | 4.35 |
| A | C | -2.144 | 1.137 | .178 | -4.87 | .58 |
| B | A | -2.135 | .924 | .063 | -4.35 | .08 |
| B | C | -4.279* | .862 | .000 | -6.34 | -2.21 |
| C | A | 2.144 | 1.137 | .178 | -.58 | 4.87 |
| C | B | 4.279* | .862 | .000 | 2.21 | 6.34 |

*. The mean difference is significant at the .05 level.

A p-value < 0.05 (marked *) identifies a significant difference between 2 groups. In this example, the difference in average weight is between B and C.

Example 1: summary

Research question: Is there an association between weight and type of university?
Ho: Average weight A = Average weight B = Average weight C
Ha: At least two averages are different
Statistical test: F-test = 13.293; p < 0.05
Conclusion: There is a significant relationship between weight and type of university. Based on the Post Hoc test, the difference in average weight is between B and C.

Example 2

Research question: Is there an association between height and type of university?

Ho: Average height A = Average height B = Average height C
Ha: At least two averages are different

One-way F-test (SPSS output): Example 2

Descriptives: height

| University | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean: Lower Bound | 95% CI for Mean: Upper Bound | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| A | 291 | 170.63 | 12.197 | .715 | 169.22 | 172.03 | 72 | 194 |
| B | 1339 | 169.47 | 9.446 | .258 | 168.96 | 169.97 | 58 | 202 |
| C | 341 | 171.80 | 9.336 | .506 | 170.80 | 172.79 | 150 | 201 |
| Total | 1971 | 170.04 | 9.917 | .223 | 169.60 | 170.48 | 58 | 202 |

ANOVA: height

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 1589.713 | 2 | 794.856 | 8.140 | .000 |
| Within Groups | 192172.9 | 1968 | 97.649 | | |
| Total | 193762.6 | 1970 | | | |

Value of statistical test: 8.140
P-value: 0.000 (that is, p < 0.001)

Post Hoc Tests: Multiple Comparisons (Dependent Variable: height, Bonferroni)

| (I) university | (J) university | Mean Difference (I-J) | Std. Error | Sig. | 95% CI: Lower Bound | 95% CI: Upper Bound |
|---|---|---|---|---|---|---|
| A | B | 1.158 | .639 | .211 | -.37 | 2.69 |
| A | C | -1.171 | .789 | .413 | -3.06 | .72 |
| B | A | -1.158 | .639 | .211 | -2.69 | .37 |
| B | C | -2.328* | .599 | .000 | -3.76 | -.89 |
| C | A | 1.171 | .789 | .413 | -.72 | 3.06 |
| C | B | 2.328* | .599 | .000 | .89 | 3.76 |

*. The mean difference is significant at the .05 level.

There is a difference in average height between B and C.

Conclusion: There is a significant relationship between height and type of university. Based on the Post Hoc test, the difference in average height is between B and C.

SPSS commands for F-test

Example 1:
Analyze > Compare Means > One-Way ANOVA
Select weight as the dependent variable
Select university as the factor
Go to Options: choose descriptive statistics
Go to Post Hoc: select Bonferroni (under equal variances assumed)

Example 2:
Analyze > Compare Means > One-Way ANOVA
Select height as the dependent variable
Select university as the factor
Go to Options: choose descriptive statistics
Go to Post Hoc: select Bonferroni (under equal variances assumed)

T-TEST: CI VS. P-VALUE

Hypothesis Testing (P-Value) & Confidence Interval (CI)

In hypothesis testing (p-value): decision of whether or not to reject the null hypothesis.
In confidence interval: estimation of the parameter, and decision of whether or not to reject the null hypothesis.

P-value & Confidence interval, Example 1: T-test

Group Statistics: weight

| gender | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| male | 804 | 75.92 | 12.843 | .453 |
| female | 1135 | 56.47 | 8.923 | .265 |

Independent Samples Test: weight

| | Levene's Test: F | Levene's Test: Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference: Lower | 95% CI of the Difference: Upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | 132.258 | .000 | 39.337 | 1937 | .000 | 19.444 | .494 | 18.475 | 20.414 |
| Equal variances not assumed | | | 37.059 | 1335.508 | .000 | 19.444 | .525 | 18.415 | 20.473 |

CI: Estimation of the parameter
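The F statistic of Example 1 and the "equal variances assumed" row of the t-test example can be approximately recomputed from the group summaries alone (N, mean, standard deviation). The sketch below is not the SPSS procedure: the function names are hypothetical, only the Python standard library is used, and the confidence interval uses a normal quantile as a large-sample stand-in for the t quantile. Because the inputs are the rounded values printed in the tables, the results should only be close to the SPSS output, not identical.

```python
import math
from statistics import NormalDist


def anova_from_summary(groups):
    """One-way ANOVA from per-group (n, mean, sd) tuples (hypothetical helper)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares: recovered from each group's sample variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


def pooled_t_from_summary(n1, m1, sd1, n2, m2, sd2, level=0.95):
    """Pooled-variance two-sample t statistic and an approximate CI (hypothetical helper)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = m1 - m2
    # Assumption: with df this large, the normal quantile approximates the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {"diff": diff, "se": se, "t": diff / se, "df": df,
            "ci": (diff - z * se, diff + z * se)}


# Summary statistics copied from the Example 1 tables (weight by university, weight by gender).
weight_by_university = [(290, 65.59, 13.297), (1340, 63.46, 14.201), (345, 67.74, 15.299)]
anova = anova_from_summary(weight_by_university)
ttest = pooled_t_from_summary(804, 75.92, 12.843, 1135, 56.47, 8.923)

print(f"F({anova['df_between']}, {anova['df_within']}) = {anova['F']:.3f}")
print(f"t({ttest['df']}) = {ttest['t']:.3f}, SE = {ttest['se']:.3f}")
print(f"approx. 95% CI for the mean difference: "
      f"({ttest['ci'][0]:.3f}, {ttest['ci'][1]:.3f})")
```

The interval for the male minus female difference lies entirely above zero, which is the same decision the p-value gives: reject the null hypothesis of equal means. This is the sense in which a confidence interval both estimates the parameter and supports the test decision.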