Introduction to Statistics - Stat 1011
Awol S.
Department of Statistics
College of Computing & Informatics
Haramaya University
Dire Dawa, Ethiopia
© 2013/2014

Contents

1 Introduction
  1.1 Some Statistical Terms
  1.2 Definition and Classification of Statistics
    1.2.1 Definitions of Statistics
    1.2.2 Stages in Statistical Investigation
    1.2.3 Classification of Statistics
  1.3 Applications, Uses and Limitations of Statistics
    1.3.1 Applications of Statistics
    1.3.2 Uses of Statistics
    1.3.3 Limitations of Statistics
  1.4 Types of Variables and Measurement Scales
    1.4.1 Variable
    1.4.2 Scales of Measurement
2 Methods of Data Collection and Presentation
  2.1 Types of Data
  2.2 Methods of Data Collection
    2.2.1 Questionnaire
    2.2.2 Secondary data
  2.3 Data Organization
  2.4 Methods of Data Presentation
    2.4.1 Frequency Distributions
    2.4.2 Diagrammatic Display of Data
    2.4.3 Graphical Presentation of Data
3 Measures of Central Tendency
  3.1 Objectives of Measures of Central Tendency
  3.2 Characteristics of Good Measure of Central Tendency
  3.3 Summation Notation
  3.4 Mean
    3.4.1 Arithmetic Mean
    3.4.2 Geometric Mean
    3.4.3 Harmonic Mean
  3.5 Median
  3.6 Other Measures of Location: Quantiles
    3.6.1 Quartiles
    3.6.2 Deciles
    3.6.3 Percentiles
  3.7 Mode
4 Measures of Variation, Skewness and Kurtosis
  4.1 Objectives of Measures of Variation
  4.2 Types of Measures of Variation
    4.2.1 Range and Relative Range
    4.2.2 Quartile Deviation and Coefficient of Quartile Deviation
    4.2.3 Mean Deviation and Coefficient of Mean Deviation
    4.2.4 Variance and Standard Deviation
    4.2.5 Coefficient of Variation
    4.2.6 Standard Score
  4.3 Moments
  4.4 Skewness
    4.4.1 Frequency Curves
    4.4.2 Measures of Skewness
  4.5 Kurtosis
5 Elementary Probability
  5.1 Concept of Set
  5.2 Basic Probability Terms
  5.3 Counting Techniques
  5.4 Definitions of Probability
  5.5 Some Rules of Probability
  5.6 Conditional Probability and Independence
    5.6.1 Conditional Events
    5.6.2 Independent Events
6 Probability Distributions
  6.1 Random Variable
    6.1.1 Probability Distribution
    6.1.2 Expectations of a Random Variable
  6.2 Common Discrete Distributions
    6.2.1 The Binomial Distribution
    6.2.2 The Poisson Distribution
  6.3 Common Continuous Distributions
    6.3.1 The Normal Distribution
    6.3.2 Other Continuous Distributions
7 Sampling Techniques
  7.1 Basic Concepts
  7.2 Reasons for Sampling
  7.3 Types of Errors
  7.4 Types of Sampling Techniques
    7.4.1 Probability Sampling Techniques
    7.4.2 Non-probability Sampling Techniques
8 Statistical Inference for a Single Population
  8.1 Estimation
    8.1.1 Point Estimation
    8.1.2 Interval Estimation
  8.2 Hypothesis Testing
    8.2.1 Basic Concepts in Hypothesis Testing
    8.2.2 Hypothesis Testing for a Population Mean
    8.2.3 Confidence Interval for a Population Mean
9 Inference for Two or More Populations
  9.1 Comparison of the Population Mean in Two groups
    9.1.1 Paired Sample
    9.1.2 Independent Samples
  9.2 Analysis of Variance (ANOVA)
10 Simple Linear Regression and Correlation
  10.1 Correlation
    10.1.1 Covariance
    10.1.2 Pearson's Correlation Coefficient
    10.1.3 Spearman's Rank Correlation
  10.2 Simple Linear Regression
    10.2.1 Method of Estimation
    10.2.2 The Coefficient of Determination

Chapter 1 Introduction

1.1 Some Statistical Terms

Statistics has become an integral part of our daily lives. Every day we are confronted with some form of statistical information through newspapers, magazines and other forms of communication.
Such statistical information has become highly influential in our lives. Before going into the subject matter in detail, let us define some of the terms used extensively in the field of statistics.

• Datum: a single piece of information taken from an object. It is also known as an observation, an item, a case or a unit.
• Data: a collection of observed values representing one or more characteristics of some objects.
• Population: the totality of all objects under study.
• Sample: a subset of the population. Normally a sample should be selected in such a way as to be representative of the population.

1.2 Definition and Classification of Statistics

1.2.1 Definitions of Statistics

Statistics can be defined in two senses: plural (as statistical data) and singular (as statistical methods).

• Plural sense: Statistics are collections of facts (figures). This meaning of the word is widely used when reference is made to facts and figures on a certain characteristic, for example sales statistics, labor statistics, employment statistics, etc. In this sense the word "statistics" serves simply as "data". But not all numerical data are statistics. For numerical data to be identified as statistics, they must possess certain identifiable characteristics. Some of these characteristics are described as follows:

1. Statistics are aggregates of facts. Single or isolated facts or figures cannot be called statistics, as these cannot be compared or related to other figures within the same framework. Accordingly, there must be an aggregate of these figures. For example, if a person says "I earn Birr 30000 per year", it would not be considered statistics.
On the other hand, if we say that the average salary of a professor at our university is Birr 30000 per year, then this would be considered statistics, since the average has been computed from many related figures, such as the yearly salaries of many professors.

2. Statistics are numerically expressed. All statistics are stated in numerical figures only. Qualitative statements cannot be called statistics. For example, such qualitative statements as 'Ethiopia is a developing country' or 'Jack is very tall' would not be considered statistical statements. On the other hand, comparing the per capita income of Ethiopia with that of Kenya would be considered statistical in nature. Similarly, Jack's height in numbers compared with the average height in Ethiopia would also be considered statistics.

3. Statistics must be placed in relation to each other. The main objective of statistical analysis is to facilitate a comparative and relative study of the desired characteristics of the data. The comparison of facts and figures may be conducted regarding the same characteristic over a period of time from a single source, or it may be from various sources at any one given time. For example, the prices of different items in a store would not, as such, be considered statistics. However, the prices of one product in different stores constitute statistical data, since these prices are comparable. The changes in the price of a product in one store over a period of time would also be considered statistical data, since these changes provide for comparison over a period of time. However, these comparisons must relate to the same phenomenon or subject, so that like is compared with like and oranges are not compared with apples.

• Singular sense: Statistics is a science that deals with the methods of data collection, data organization, data presentation, data analysis and interpretation of results.
It refers to a subject matter that is concerned with extracting relevant information from available data with the aim of making sound decisions. According to this meaning, statistics is concerned with the development and application of methods and techniques for collecting, organizing, presenting and analyzing data and interpreting results.

1.2.2 Stages in Statistical Investigation

According to the singular sense definition of statistics, a statistical investigation involves five stages: data collection, organization, presentation, analysis and interpretation of results.

1. Collection of data: Data collection is the first stage in any statistical investigation. It involves the process of obtaining (gathering) a set of related measurements or counts to meet predetermined objectives. Data may be available from existing published sources, which may have already been organized in some presentable form. Such information is commonly referred to as secondary data. On the other hand, the investigator may actually collect his or her own data. This is usually warranted when information about some area of inquiry has not been ascertained. In such cases, the data.