Descriptive Statistics
Recommended publications
EART6 Lab Standard Deviation in Probability and Statistics, The
EART6 Lab: Standard deviation. In probability and statistics, the standard deviation is a measure of the dispersion of a collection of values. It can apply to a probability distribution, a random variable, a population or a data set. The standard deviation is usually denoted with the letter σ. It is defined as the root-mean-square (RMS) deviation of the values from their mean, or as the square root of the variance. Formulated by Galton in the late 1860s, the standard deviation remains the most common measure of statistical dispersion, measuring how widely spread the values in a data set are. If many data points are close to the mean, then the standard deviation is small; if many data points are far from the mean, then the standard deviation is large. If all data values are equal, then the standard deviation is zero. A useful property of standard deviation is that, unlike variance, it is expressed in the same units as the data. $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$ Fig. 1. Dark blue is less than one standard deviation from the mean. For the normal distribution, this accounts for 68.27% of the set; while two standard deviations from the mean (medium and dark blue) account for 95.45%; three standard deviations (light, medium, and dark blue) account for 99.73%; and four standard deviations account for 99.994%. The two points of the curve which are one standard deviation from the mean are also the inflection points. Fig. 2. A data set with a mean of 50 (shown in blue) and a standard deviation (σ) of 20. -
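The definition in this excerpt (root-mean-square deviation from the mean) and the quoted normal-distribution percentages can be checked with a short Python sketch. The function names and the sample data are illustrative assumptions, not part of the lab handout; the coverage figures use the standard identity that the fraction of a normal distribution within k standard deviations is erf(k/√2).

```python
import math


def population_std(values):
    """Root-mean-square deviation of the values from their mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def normal_coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


# Hypothetical data set, chosen so the arithmetic is easy to follow by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print("std of data:", population_std(data))
print("std of equal values:", population_std([3, 3, 3]))

for k in (1, 2, 3, 4):
    print(f"within {k} sd: {100 * normal_coverage(k):.3f}%")
```

The second call illustrates the statement that equal data values give a standard deviation of zero.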
Uses for Census Data in Market Research
Uses for Census Data in Market Research. MRS Presentation, 4 July 2011. Andrew Zelin, Director, Sampling & RMC, Ipsos MORI. Why Census Data is so critical to us: Use of Census Data underlies almost all of our quantitative work activities and projects. Needed to ensure that: samples are balanced and representative (i.e. Sampling); results derived from samples reflect those of the target population (i.e. Weighting); we understand how demographic and geographic factors affect the findings from surveys (i.e. Analysis). How would we do our jobs without the use of Census Data? What I will cover: Introduction, overview and why Census Data is so critical; Census data for sampling; Census data for survey weighting; Census data for statistical analysis; Use of SARs; The future (e.g. use of “hypercubes”); Our work without Census Data / use of alternatives; Conclusions. Census Data for Sampling. For Random (pre-selected) Samples: need to define stratum sample sizes and sampling fractions; need to define cluster size, esp. if PPS; need to define stratum / cluster membership. For any type of Sample: to balance samples across demographics that cannot be included as quota controls or stratification variables; to determine number of sample points by geography; to create accurate booster samples (e.g. young people / ethnic groups). For Quota (non-probability) Samples: to set appropriate quotas (based on demographics). Data are used here at a very localised level, i.e. Census Outputs. Census Data for Survey Weighting. To ensure that the sample profile balances the population profile, Census Data are needed to tell you what the population profile is, and hence provide accurate weights, and hence provide an accurate measure of the design effect and the true level of statistical reliability / confidence intervals on your survey measures. But also pre-weighting, for interviewer field dept. -
Robust Logistic Regression to Static Geometric Representation of Ratios
Journal of Mathematics and Statistics 5 (3): 226-233, 2009. ISSN 1549-3644. © 2009 Science Publications. Robust Logistic Regression to Static Geometric Representation of Ratios. 1Alireza Bahiraie, 2Noor Akma Ibrahim, 3A.K.M. Azhar and 4Ismail Bin Mohd. 1,2 Institute for Mathematical Research, University Putra Malaysia, 43400 Serdang, Selangor, Malaysia. 3Graduate School of Management, University Putra Malaysia, 43400 Serdang, Selangor, Malaysia. 4Department of Mathematics, Faculty of Science, University Malaysia Terengganu, 21030, Malaysia. Abstract: Problem statement: Some methodological problems concerning financial ratios such as non-proportionality, non-asymetricity, non-salacity were solved in this study and we presented a complementary technique for empirical analysis of financial ratios and bankruptcy risk. This new method would be a general methodological guideline associated with financial data and bankruptcy risk. Approach: We proposed the use of a new measure of risk, the Share Risk (SR) measure. We provided evidence of the extent to which changes in values of this index are associated with changes in each axis values and how this may alter our economic interpretation of changes in the patterns and directions. Our simple methodology provided a geometric illustration of the new proposed risk measure and transformation behavior. This study also employed the Robust logit method, which extends the logit model by considering outliers. Results: Results showed that the new SR method obtained better numerical results compared with the common ratios approach. With respect to accuracy results, Logistic and Robust Logistic Regression Analysis illustrated that this new transformation (SR) produced statistically more accurate predictions and can be used as an alternative to common ratios. Additionally, the robust logit model outperformed the logit model in both approaches and was substantially superior to the logit method in predictions to assess sample forecast performances and regressions. -
Iam 530 Elements of Probability and Statistics
IAM 530 ELEMENTS OF PROBABILITY AND STATISTICS. LECTURE 3: RANDOM VARIABLES. VARIABLE • Studying the behavior of random variables, and more importantly functions of random variables is essential for both the theory and practice of statistics. Variable: A characteristic of population or sample that is of interest for us. Random Variable: A function defined on the sample space S that associates a real number with each outcome in S. In other words, a numerical value to each outcome of a particular experiment. • For each element of an experiment’s sample space, the random variable can take on exactly one value. TYPES OF RANDOM VARIABLES We will start with univariate random variables. • Discrete Random Variable: A random variable is called discrete if its range consists of a countable (possibly infinite) number of elements. • Continuous Random Variable: A random variable is called continuous if it can take on any value along a continuum (but may be reported “discretely”). In other words, its outcome can be any value in an interval of the real number line. Note: • Random Variables are denoted by upper case letters (X) • Individual outcomes for RV are denoted by lower case letters (x) DISCRETE RANDOM VARIABLES EXAMPLES • A random variable which takes on values in {0,1} is known as a Bernoulli random variable. • Discrete Uniform distribution: $P(X = x) = \frac{1}{N}$; $x = 1, 2, \ldots, N$; $N = 1, 2, \ldots$ • Throw a fair die. P(X=1)=…=P(X=6)=1/6 DISCRETE RANDOM VARIABLES • Probability Distribution: Table, Graph, or Formula that describes values a random variable can take on, and its corresponding probability (discrete random variable) or density (continuous random variable). -
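As a small illustration of the discrete uniform distribution and the fair-die example in this excerpt, the sketch below builds the probability table P(X = x) = 1/N with exact fractions. The function name `discrete_uniform_pmf` is a hypothetical helper introduced here, not something from the lecture notes.

```python
from fractions import Fraction


def discrete_uniform_pmf(n):
    """Return the table {x: P(X = x)} for X uniform on 1, 2, ..., n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return {x: Fraction(1, n) for x in range(1, n + 1)}


# A fair die is the case N = 6.
die = discrete_uniform_pmf(6)
total_probability = sum(die.values())
expected_value = sum(x * p for x, p in die.items())

print("P(X = 1) =", die[1])
print("sum of probabilities =", total_probability)
print("E[X] =", expected_value)
```

Using `Fraction` rather than floats keeps the check that the probabilities sum to exactly one free of rounding error.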
CORRELATION COEFFICIENTS Ice Cream and Crime Difficulty Scale ☺ ☺ (Moderately Hard)
5 COMPUTING CORRELATION COEFFICIENTS Ice Cream and Crime Difficulty Scale ☺ ☺ (moderately hard) WHAT YOU WILL LEARN IN THIS CHAPTER • Understanding what correlations are and how they work • Computing a simple correlation coefficient • Interpreting the value of the correlation coefficient • Understanding what other types of correlations exist and when they should be used WHAT ARE CORRELATIONS ALL ABOUT? Measures of central tendency and measures of variability are not the only descriptive statistics that we are interested in using to get a picture of what a set of scores looks like. You have already learned that knowing the values of the one most representative score (central tendency) and a measure of spread or dispersion (variability) is critical for describing the characteristics of a distribution. However, sometimes we are as interested in the relationship between variables—or, to be more precise, how the value of one variable changes when the value of another variable changes. The way we express this interest is through the computation of a simple correlation coefficient. For example, what’s the relationship between age and strength? Income and years of education? Memory skills and amount of drug use? Your political attitudes and the attitudes of your parents? A correlation coefficient is a numerical index that reflects the relationship or association between two variables. The value of this descriptive statistic ranges between −1.00 and +1.00. Copyright ©2020 by SAGE Publications, Inc. This work may not be reproduced or distributed in any form or by any means without express written permission of the publisher. -
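To make the idea of a simple correlation coefficient concrete, here is a minimal Python sketch of the Pearson product-moment coefficient, which is one common choice and may differ in presentation from the formula the chapter goes on to use. The education and income figures are invented purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: years of education and income (in thousands).
education = [10, 12, 12, 14, 16, 18]
income = [28, 35, 31, 44, 52, 61]
r = pearson_r(education, income)
print(f"r = {r:.3f}")
```

A value near +1.00 means the two variables rise together, a value near −1.00 means one falls as the other rises, and the result always lies within that range.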
What Is Statistic?
What is Statistic? OPRE 6301. In today’s world... we are constantly being bombarded with statistics and statistical information. For example: Customer Surveys, Medical News, Demographics, Political Polls, Economic Predictions, Marketing Information, Sales Forecasts, Stock Market Projections, Consumer Price Index, Sports Statistics. How can we make sense out of all this data? How do we differentiate valid from flawed claims? What is Statistics?! “Statistics is a way to get information from data.” Data → Statistics → Information. Data: Facts, especially numerical facts, collected together for reference or information. Information: Knowledge communicated concerning some particular fact. Statistics is a tool for creating an understanding from a set of numbers. Humorous Definitions: The Science of drawing a precise line between an unwarranted assumption and a foregone conclusion. The Science of stating precisely what you don’t know. An Example: Stats Anxiety. A business school student is anxious about their statistics course, since they’ve heard the course is difficult. The professor provides last term’s final exam marks to the student. What can be discerned from this list of numbers? Data → Statistics → Information. Data: list of last term’s marks (95, 89, 70, 65, 78, 57, ...). Information: new information about the statistics class, e.g. class average, proportion of class receiving A’s, most frequent mark, marks distribution, etc. Key Statistical Concepts. Population: a population is the group of all items of interest to a statistics practitioner; frequently very large; sometimes infinite. E.g. All 5 million Florida voters (per Example 12.5). Sample: A sample is a set of data drawn from the population. -
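The "Stats Anxiety" example turns a list of marks into information about the class. A rough Python sketch of that step follows, using only the six marks visible in the excerpt (the original list is longer and is cut off) and assuming, hypothetically, that an A means a mark of 90 or above.

```python
import statistics

# The six marks shown in the excerpt; the full list is truncated in the source.
marks = [95, 89, 70, 65, 78, 57]

# Assumption for illustration only: the slides do not state the cutoff for an A.
A_THRESHOLD = 90

class_average = statistics.mean(marks)
class_median = statistics.median(marks)
proportion_a = sum(1 for m in marks if m >= A_THRESHOLD) / len(marks)

print(f"class average: {class_average:.2f}")
print(f"median mark: {class_median}")
print(f"proportion receiving A's: {proportion_a:.2f}")
```

With the complete list of marks, the same three lines would give the student a far better picture of the course than the raw numbers alone.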
Questionnaire Analysis Using SPSS
Questionnaire design and analysing the data using SPSS. Questionnaire design. For each decision you make when designing a questionnaire, there is likely to be a list of points for and against, just as there is for deciding on a questionnaire as the data gathering vehicle in the first place. Before designing the questionnaire, the initial driver for its design has to be the research question: what are you trying to find out? After that is established, you can address the issues of how best to do it. An early decision will be to choose the method by which your survey will be administered, i.e. how you will inflict it on your subjects. There are typically two underlying methods for conducting your survey: self-administered and interviewer administered. A self-administered survey is more adaptable in some respects; it can be written, e.g. a paper questionnaire, or sent by mail or email, or conducted electronically on the internet. Surveys administered by an interviewer can be done in person or over the phone, with the interviewer recording results on paper or directly onto a PC. Deciding which is best for you will depend upon your question and the target population. For example, if questions are personal then self-administered surveys can be a good choice. Self-administered surveys reduce the chance of bias sneaking in via the interviewer, but at the expense of having the interviewer available to explain the questions. The hints and tips below about questionnaire design draw heavily on two excellent resources: SPSS Survey Tips, SPSS Inc (2008) and Guide to the Design of Questionnaires, The University of Leeds (1996). -
4. Introduction to Statistics Descriptive Statistics
Statistics for Engineers. 4. Introduction to Statistics. Descriptive Statistics. Types of data: A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation to another. For example, the units might be headache sufferers and the variate might be the time between taking an aspirin and the headache ceasing. An observation or response is the value taken by a variate for some given unit. There are various types of variate. Qualitative or nominal: described by a word or phrase (e.g. blood group, colour). Quantitative: described by a number (e.g. time till cure, number of calls arriving at a telephone exchange in 5 seconds). Ordinal: this is an "in-between" case. Observations are not numbers but they can be ordered (e.g. much improved, improved, same, worse, much worse). Averages etc. can sensibly be evaluated for quantitative data, but not for the other two. Qualitative data can be analysed by considering the frequencies of different categories. Ordinal data can be analysed like qualitative data, but really requires special techniques called nonparametric methods. Quantitative data can be: Discrete: the variate can only take one of a finite or countable number of values (e.g. a count). Continuous: the variate is a measurement which can take any value in an interval of the real line (e.g. a weight). Displaying data: It is nearly always useful to use graphical methods to illustrate your data. We shall describe in this section just a few of the methods available. Discrete data: frequency table and bar chart. Suppose that you have collected some discrete data. -
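The excerpt stops just as it introduces the frequency table and bar chart for discrete data. A minimal Python sketch of both is given below; the call counts are invented to match the excerpt's own example of calls arriving at a telephone exchange in 5 seconds, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(observations):
    """Return (value, count) pairs sorted by value."""
    return sorted(Counter(observations).items())


# Hypothetical counts of calls arriving in ten successive 5-second windows.
calls = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]
table = frequency_table(calls)

print("calls  freq")
for value, count in table:
    # A row of '#' characters serves as a text-only bar chart.
    print(f"{value:>5}  {count:>4}  {'#' * count}")
```

Because the variate is a count, each distinct value gets its own row; for continuous data the values would first have to be grouped into intervals.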
Three Kinds of Statistical Literacy: What Should We Teach?
ICOTS6, 2002: Schield THREE KINDS OF STATISTICAL LITERACY: WHAT SHOULD WE TEACH? Milo Schield Augsburg College Minneapolis USA Statistical literacy is analyzed from three different approaches: chance-based, fallacy-based and correlation-based. The three perspectives are evaluated in relation to the needs of employees, consumers, and citizens. A list of the top 35 statistically based trade books in the US is developed and used as a standard for what materials statistically literate people should be able to understand. The utility of each perspective is evaluated by reference to the best sellers within each category. Recommendations are made for what content should be included in pre-college and college statistical literacy textbooks from each kind of statistical literacy. STATISTICAL LITERACY Statistical literacy is a new term and both words (statistical and literacy) are ambiguous. In today’s society literacy means functional literacy: the ability to review, interpret, analyze and evaluate written materials (and to detect errors and flaws therein). Anyone who lacks this type of literacy is functionally illiterate as a productive worker, an informed consumer or a responsible citizen. Functional illiteracy is a modern extension of literal literacy: the ability to read and comprehend written materials. Statistics have two functions depending on whether the data is obtained from a random sample. If a statistic is not from a random sample then one can only do descriptive statistics and statistical modeling even if the data is the entire population. If a statistic is from a random sample then one can also do inferential statistics: sampling distributions, confidence intervals and hypothesis tests. -
Measurements of Entropic Uncertainty Relations in Neutron Optics
Applied Sciences. Review. Measurements of Entropic Uncertainty Relations in Neutron Optics. Bülent Demirel 1,†, Stephan Sponar 2,*,† and Yuji Hasegawa 2,3. 1 Institute for Functional Matter and Quantum Technologies, University of Stuttgart, 70569 Stuttgart, Germany. 2 Atominstitut, Vienna University of Technology, A-1020 Vienna, Austria. 3 Division of Applied Physics, Hokkaido University Kita-ku, Sapporo 060-8628, Japan. † These authors contributed equally to this work. Received: 30 December 2019; Accepted: 4 February 2020; Published: 6 February 2020. Abstract: The emergence of the uncertainty principle has celebrated its 90th anniversary recently. For this occasion, the latest experimental results of uncertainty relations quantified in terms of Shannon entropies are presented, concentrating only on outcomes in neutron optics. The focus is on the type of measurement uncertainties that describe the inability to obtain the respective individual results from joint measurement statistics. For this purpose, the neutron spin of two non-commuting directions is analyzed. Two sub-categories of measurement uncertainty relations are considered: noise–noise and noise–disturbance uncertainty relations. In the first case, it will be shown that the lowest boundary can be obtained and the uncertainty relations be saturated by implementing a simple positive operator-valued measure (POVM). For the second category, an analysis for projective measurements is made and error correction procedures are presented. Keywords: uncertainty relation; joint measurability; quantum information theory; Shannon entropy; noise and disturbance; foundations of quantum measurement; neutron optics. 1. Introduction. According to quantum mechanics, any single observable or a set of compatible observables can be measured with arbitrary accuracy. -
THE USE of EFFECT SIZES in CREDIT RATING MODELS By
THE USE OF EFFECT SIZES IN CREDIT RATING MODELS by HENDRIK STEFANUS STEYN, submitted in accordance with the requirements for the degree of MASTER OF SCIENCE in the subject STATISTICS at the UNIVERSITY OF SOUTH AFRICA. SUPERVISOR: PROF P NDLOVU. DECEMBER 2014. © University of South Africa 2015. Abstract: The aim of this thesis was to investigate the use of effect sizes to report the results of statistical credit rating models in a more practical way. Rating systems in the form of statistical probability models like logistic regression models are used to forecast the behaviour of clients and guide business in rating clients as “high” or “low” risk borrowers. Therefore, model results were reported in terms of statistical significance as well as business language (practical significance), which business experts can understand and interpret. In this thesis, statistical results were expressed as effect sizes like Cohen's d that puts the results into standardised and measurable units, which can be reported practically. These effect sizes indicated strength of correlations between variables, contribution of variables to the odds of defaulting, the overall goodness-of-fit of the models and the models' discriminating ability between high and low risk customers. Key Terms: Practical significance; Logistic regression; Cohen's d; Probability of default; Effect size; Goodness-of-fit; Odds ratio; Area under the curve; Multi-collinearity; Basel II -
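For readers unfamiliar with Cohen's d, the sketch below shows the usual textbook form: the difference between two group means divided by the pooled standard deviation. This is an assumption about the variant meant; the thesis may define or apply it differently, and the two groups of scores are invented for illustration.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical credit scores for clients who defaulted and clients who did not.
defaulted = [520, 540, 555, 570, 600]
repaid = [610, 640, 655, 680, 700]
d = cohens_d(defaulted, repaid)
print(f"Cohen's d = {d:.2f}")
```

Because d is expressed in standard-deviation units, it can be compared across variables measured on different scales, which is the practical appeal the abstract describes.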
Measures of Central Tendency (Mean, Mode, Median) Mean • Mean Is the Most Commonly Used Measure of Central Tendency
Measures of Central Tendency (Mean, Mode, Median) Mean • Mean is the most commonly used measure of central tendency. • There are different types of mean, viz. arithmetic mean, weighted mean, geometric mean (GM) and harmonic mean (HM). • If mentioned without an adjective (as mean), it generally refers to the arithmetic mean. Arithmetic mean • Arithmetic mean (or, simply, “mean”) is nothing but the average. It is computed by adding all the values in the data set and dividing the sum by the number of observations in it. If we have the raw data, the mean is given by the formula $\bar{X} = \frac{\sum X}{n}$ • Where ∑ (the uppercase Greek letter sigma) refers to summation, X refers to the individual value and n is the number of observations in the sample (sample size). • Research articles published in journals do not provide raw data and, in such a situation, readers can compute the mean from the frequency distribution: $\bar{X} = \frac{\sum fX}{n}$ • Where f is the frequency, X is the midpoint of the class interval and n is the number of observations. • The standard statistical notations (in relation to measures of central tendency). • The mean calculated from the frequency distribution is not exactly the same as that calculated from the raw data. • It approaches the mean calculated from the raw data as the number of intervals increases. • It is closely related to standard deviation, the most common measure of dispersion. • The important disadvantage of the mean is that it is sensitive to extreme values/outliers, especially when the sample size is small. • Therefore, it is not an appropriate measure of central tendency for skewed distributions.
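The three points made above (the raw-data mean, the approximation obtained from a frequency distribution, and the sensitivity to outliers) can be seen in a short Python sketch. The observations and class intervals are invented for illustration, and the function names are hypothetical.

```python
def mean_raw(values):
    """Arithmetic mean: sum of the values divided by the number of observations."""
    return sum(values) / len(values)


def mean_grouped(classes):
    """Mean from a frequency distribution given as (lower, upper, frequency) rows.

    Each class is represented by its midpoint, so the result only
    approximates the mean of the underlying raw data.
    """
    n = sum(freq for _, _, freq in classes)
    return sum(freq * (lower + upper) / 2 for lower, upper, freq in classes) / n


# Hypothetical raw observations and the same data grouped into three classes.
raw = [12, 15, 17, 21, 24, 26, 33, 38]
grouped = [(10, 20, 3), (20, 30, 3), (30, 40, 2)]

print("mean from raw data:", mean_raw(raw))
print("mean from grouped data:", mean_grouped(grouped))

# One extreme value pulls the mean far away from the bulk of the data.
print("mean with an outlier added:", round(mean_raw(raw + [400]), 2))
```

The small gap between the first two results comes from replacing each observation with its class midpoint; narrower classes would shrink it.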