Econometrics – 18KP3EC11

Dr. S. Manonmani
Assistant Professor of KNGAC for Women, Thanjavur

UNIT – I Introduction

• Econometrics is formed from two Greek words meaning economy and measure.
• Econometrics deals with measurement of economic relationships.
• Origin of Econometrics
• Prof. – Quantity theory of - with the help of .
• Ragnar Frisch – Father of econometrics and one of the founders of the Econometric Society.

• Definition of econometrics
• Prof. G. Tintner defined: “Econometrics consists of the application of mathematical economic theory and statistical procedures to economic data in order to establish numerical results in the field of economics and to verify economic theorems.”

• Econometrics is a combination of economic theory, mathematics and statistics.
• Economics + Mathematics = Mathematical Economics
• Mathematical Economics + Statistics = Econometrics

• Econometrics is usually described as the application of statistical techniques to economic data.
• Economists use econometrics to quantify economic relationships, e.g. to estimate how demand for a product varies with its price, or to estimate trends in a nation’s imports over a given period of time.
• Economic theory and mathematical economics postulate exact relationships between various economic magnitudes.

• Whereas econometrics deals with random component of economic relationships.

e.g. Economic theory says that the demand for a commodity depends on its price, on the prices of other commodities, on the consumer’s income and on tastes.

Q = b0 + b1P + b2PO + b3Y + b4t

Where
Q – quantity demanded of a particular commodity
P – price of the commodity
PO – prices of other commodities
Y – consumer’s income
t – consumer’s tastes
b0, b1, b2, b3, b4 – coefficients of the demand equation

In econometrics, the influence of other factors, such as psychological and sociological factors, is also recognised. These other factors are taken into account by introducing a random variable (u) into the equation.

Q = b0 + b1P + b2PO + b3Y + b4t + u Where u stands for random factors.

• Relationship between econometrics, mathematical economics and statistics
Mathematical economics states economic theory in terms of mathematical symbols, e.g. Q = f(P). It describes economic relationships in exact form and never allows for random elements.
• Econometrics assumes that relationships are not exact. Its methods take the random variables into account. Further, econometrics provides numerical values for the coefficients of economic phenomena.

• Econometrics and Statistics
• Economic statistics and mathematical statistics.
• Economic statistics – empirical data: record the data, tabulate and chart it.
• Mathematical statistics deals with methods of measurement which are developed on the basis of controlled experiments in laboratories.
• Econometrics uses statistical methods after adapting them to the problems of economic life. These adapted statistical methods are called econometric methods.
• Raw materials of econometrics
Econometrics deals with the statistical estimation of economic relationships which can be formulated economically. Data, both quantitative and qualitative, are of prime importance to econometricians. The statistics of income, production, price, government expenditure, private expenditure, exports, imports etc. provide the raw material for econometricians.

• The econometrician is free to collect data from primary sources and secondary sources.
• Primary and secondary data can be classified into two categories:
Time series data – data from the same entity (universe) for several periods of time.
Cross-section data – data which describe the activities of individual persons, firms or other units at a given point of time.

Objectives or uses of econometrics
1. Econometrics gives numerical estimates of economic parameters. We have economic parameters like marginal values, propensities, growth rates and . We derive numerical values for these parameters by estimating economic models.
2. Econometrics verifies economic theories. Using the fitted demand function D = 45 – 3P we get (–3) as the price effect. The negative sign is consistent with Marshall’s theory of demand.
3. Econometrics forecasts economic variables, e.g. decisions on the purchase of raw materials are made by forecasting future sales.
4. Econometrics suggests economic policies.
Production and policy – estimated demand and elasticity of demand for a commodity.
Income policy and policy – Keynes’ linear consumption function.
and subsidy policy – Cobb-Douglas production function.

• Characteristics of Econometrics
Economic theory is mainly concerned with quantitative relationships among economic variables. Such quantitative statements are usually expressed in the form of equations with specified numerical coefficients. According to Prof. Carl F. Christ, these equations must have some desirable characteristics.
1. Relevance: Since our study is based on equations, an economic equation should be relevant to the phenomenon being studied; the model should be relevant to the objective and area of study. E.g. if we want to estimate a production function for the manufacturing sector, the Cobb-Douglas production function is to be considered.
2. Simplicity: The model should be simple and easy to estimate. The simple satisfies this criterion.
3. Theoretical acceptability: constructing economic theory specifies the . E.g. the Harrod–Domar growth model is suitable for developing countries and it should be considered to get the growth rate of national income.
4. Explanatory ability: The estimated econometric model should have high explanatory power. The explanatory power of the model can be measured using the coefficient of determination (R2). E.g. if consumption expenditure has a multiple relation with income, wealth, family size and age of the head, then a multiple linear consumption function is to be estimated instead of a simple linear consumption function.

5. Accuracy: The model should give accurate estimates of economic parameters. E.g. price elasticity of demand can be estimated more accurately with Engel’s demand function than with Marshall’s demand function.
6. Forecasting power: The estimated model should have forecasting ability. An econometric study is not concerned with the present only; it is also meant to forecast the future. E.g. if we know the population in 2016 with an accurate coefficient, the probable population in 2020 can be forecast.

• Scope of Econometrics
The scope and areas of application of econometrics are expanding constantly. It plays a role in economic analysis. It is now widely used in policy formation by governments, businessmen and other economic thinkers. For example, suppose a government wants to devalue its currency to correct the balance of payments position. To estimate the consequences of devaluation, the government is immediately concerned with the price elasticities of imports and exports. If exports and imports are inelastic, the devaluation will ruin the economy; if they are elastic, it is advantageous to the country. These price elasticities are estimated with the help of demand functions for import and export commodities, and here econometric tools are applied. The problem of have been effectively solved by econometrics. It is an outstanding method for verification of economic theories.

• Limitations of Econometrics
According to Tintner, “The case for econometrics has sometimes been overstated by enthusiastic econometricians”, as it has the following limitations.
1. Econometrics is applicable only to quantifiable economic behaviour.
2. Estimation of an econometric model is based on certain assumptions, which may not hold for economic data.
3. Mathematical models are inadequate to explain an economic model.
4. Econometric models are time consuming and complex.
5. Estimation of an econometric model is biased by specification errors and measurement errors.
• Tools and methods of study
Econometrics is a scientific method which has a systematic way of studying phenomena. The important tools of econometrics are:
Mathematics
Statistics

• Econometrics and mathematics
Econometrics transforms economic theory into mathematical terms and utilizes statistical methods to derive economic relationships under certain assumptions. In mathematics we study different types of equations. An economic model is a set of equations which describes the relationships among economic variables. E.g. in constructing a national income model, the set of equations is

Y = C + I
C = f(Y) or C = α + βY + u

where Y = national income, C = consumption, I = investment, u = disturbance term, and α, β are the parameters or coefficients.
The above model consists of two types of variables:
1. Endogenous variable or dependent variable
2. Exogenous variable or independent variable

Endogenous variable: A variable which is determined inside the model (Y and C).

Exogenous variable: A variable which is determined outside the model (I).

• Econometrics and statistics
The second important tool is statistics. Two different approaches in statistics are used in econometrics:
1. Descriptive statistics – measurement of central tendency, measurement of dispersion etc.
2. The technique of statistical inference – it includes the theory of probability.

Some of the statistical tools are part of econometrics. They are:
1. Statistical data – The economic parameters are estimated on the basis of the statistical data available to an investigator.
2. Statistical methods of sampling – In econometrics the data are collected by the sampling method.
3. Testing of hypothesis – Whether the assumed relationship exists or not can be verified with the help of a t-test or z-test at the given level of significance.

• Methods of econometric research (methodology or stages of econometric research)
Econometric research deals with the measurement of parameters in economic relationships. It includes the following stages:
Stage I : Specification of the model
Stage II : Estimation of the model
Stage III : Evaluation of the estimated model
Stage IV : Application

• Stage I : Specification of the model
An econometrician has to express the relationship between variables in order to explore the economic phenomena empirically. This is called the specification of the model. It involves the determination of variables, constants and equations.
1. Selection of endogenous and explanatory variables: The specification of variables assumes prior knowledge of economic theory. The econometrician must know the general laws of economic theory. Economic laws indicate the variables influencing the dependent variable in any problem. E.g. in a supply equation, quantity supplied is the endogenous variable (Q) and price is the explanatory variable (P). In the Keynesian linear consumption function, the dependent variable is consumption (C) and the explanatory variable is income (Y).
2. Specification of sign and magnitude of parameters: The sign of the parameters and their size are known from economic theory. E.g. in the general theory of demand

Dx = b0 + b1Px + b2Pr + b3Y + u

According to the theory of demand, the price effect b1 is negative for normal goods; b2 represents the cross effect, which is positive for substitutes and negative for complementary goods; b3 is the income effect, which is positive for normal goods and negative for inferior goods (including Giffen goods).
3. Choice of suitable functional form and model: An econometrician must choose a suitable set of equations and number of explanatory variables.

In the case of a supply function

Y = b0 + b1X + u

where Y is quantity supplied and X is the price of the commodity.
For a consumption function

C = b0 + b1Y + b2W + b3N + u

where C is consumption, Y is income, W is wealth and N is the number of members in the family.
For the Keynesian national income model, a system of simultaneous equations is used instead of a single equation.

• Stage II: Estimation of the model
The values of the parameters in the specified model are to be estimated.
1. Collection of data: The data must be collected from a randomly selected sample. Using the sampling method, either time series data or cross section data are collected.
2. Selection of method of estimation: The method is expected to provide an unbiased and essential . It is essential to satisfy either the small sample properties or the large sample properties. In practice the Ordinary Least Squares (OLS) method is found to have optimal properties (BLUE). It is simple and easy, but it is applicable only under certain assumptions. The choice of an appropriate econometric method of estimation depends on the nature of the relation, the purpose of the research, the properties of and the availability of resources. For a single equation model, OLS and limited information maximum likelihood (LIML) methods are used. For a simultaneous system of equations, indirect least squares, 2SLS, 3SLS and full information maximum likelihood (FIML) methods are used.
3. Problems in estimation: Multicollinearity and autocorrelation are problems which arise at the time of estimation and should be avoided. Only then will the estimated parameters give true results. The macro variables are to be properly aggregated.

• Stage III: Evaluation of the estimated model
The evaluation consists of deciding whether the estimated parameters are theoretically meaningful and statistically significant. Only then can the reliability of the estimates be determined. For this purpose we use economic criteria, statistical criteria and econometric criteria.
1. Economic criteria: The estimates are parametric constants of economic theory such as elasticity, growth rate, propensity, marginal value and . Economic theory imposes restrictions on the sign and values of these parameters of economic relationships. In Keynes’ liquidity function,

M = b0 + b1Y + b2i + u

where M is the demand for money, Y is income and i is the interest rate, b1 is positive and b2 is negative. Unless there is reason to accept a wrong sign or size of the parameters, they may be attributed to deficiencies of the empirical data.

2. Statistical criteria: The most widely used statistical criteria are the square of the correlation coefficient (R2) and the standard error of the estimates. Statistical criteria are secondary to economic criteria. A high coefficient of determination implies that the explanatory variables in the model strongly influence the endogenous variable. A low standard error of the estimates implies that the estimates are statistically significant.

3. Econometric criteria: It estimates the prediction power of estimates. It helps us to establish the desirable properties of estimates such as unbiasedness, consistency, and sufficiency. Further the model should not be affected by multi-collinearity and auto-correlation.

An econometrician must use all the above criteria (economic, statistical and econometric) before accepting or rejecting the estimates.

• Stage IV: Application
Prediction and policy suggestions are the end points of econometric research. The ultimate goal of econometrics is to obtain reliable predictions for application to policy suggestions. If the estimated model is reliable, the expected results will follow. Biased results lead to losses in the of the society. So a model fitted to empirical data is applicable for policy purposes only if it satisfies the economic, statistical and econometric criteria.

• Economic and econometric model
Economic and econometric models both study economic phenomena, but the two differ in many respects.
Economic model: It is a set of exact equations explaining economic relationships.
1. It is based on abstract economic theories.
2. It contains only established facts and relations in economics.
3. It needs precise knowledge about economic theories.
4. It always deals with exact equations such as C = a + bY and Q = A L^a K^b.
5. It explains existing laws and theories in economics.
Econometric model: It is an integration of economics, mathematics and statistics. It is a set of inexact equations explaining economic relationships.
1. It is based on existing economic theories and also new phenomena.
2. It also contains new facts and relations in economics.
3. It needs wide knowledge about economic theories and human behaviour.
4. It always deals with inexact equations such as C = a + bY + u and Q = A L^a K^b + u.
5. It develops new laws and theories in economics.

• Time series and cross section models Time series: Time series studies are based upon those data which have been collected from same entity for several periods of time such as weeks, months, and years. Cross section: Cross section studies are based upon those data which are drawn from different groups of a population at the same time, such as consumers, families or firms, industries etc. Time series model: It contains equations which are supposed to remain unchanged during each of several different time periods.

Example of a time series model: the consumption (Ct) of an individual consumer is related to his disposable income (Yt) in year t. The time series model is

Ct = α + βYt where t = 1, 2, 3, ..., n

In this model it is assumed that the parameters α and β remain unchanged during the given time period.
Cross section model: Let Ci be the consumption of the i-th consumer in a certain period. It depends upon his disposable income (Yi) in that period. The model may be defined by the following equation, assuming that the equation applies to each consumer.

The cross section model is

Ci = α + βYi where i = 1, 2, 3, ..., n

The parameters α and β should not change from consumer to consumer.

Econometrics – Unit II Statistical Inference

• Introduction
The technique of statistical inference includes the theory of probability. With the help of the observed data, hypotheses and probability theory, inferences are drawn about the nature of the world. There are two kinds of statistical inference:
1. Estimation – estimating the value of a parameter.
2. Testing of hypothesis – we make a statement about the population and then test whether the statement is true or false.
These two are concerned with making judgements about some unknown aspect of a given population on the basis of sample information.
• Estimation – Calculating the value of the parameter by using sample data.
• Estimator – The procedure or formula that enables us to arrive at the estimate.
There are two types of estimation:
1. Point estimation: A single value for the population parameter. E.g. in a city it is decided to estimate people’s preference for a certain product, and the result shows that 30 per cent of people consume the product.
2. Interval estimation: A range of values for the population parameter with a certain degree of confidence. E.g. the result for the above problem shows that the percentage of people who consume the product could be anywhere between 20 and 40.
• Point estimator: A point estimate indicates a single value which can be used to estimate the population parameter.
1. The sample mean x̄ is a point estimate of the population mean µ.
2. The sample variance s2 is a point estimate of the population variance (dispersion) σ2.
• Interval estimator: An interval estimate is a range of values which may include the population parameter, with a certain degree of confidence.
1. The 95% confidence interval (5% level of significance) is

Sample statistic ± T0.05 (Standard error)
T0.05 = Table value of the sample statistic at 5% level of significance.

2. The 99% confidence interval (1% level of significance) is

Sample statistic ± T0.01 (Standard error)
T0.01 = Table value of the sample statistic at 1% level of significance.
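The interval formula above (sample statistic ± table value × standard error) can be sketched in a few lines of Python. This is only an illustration: the helper name `confidence_limits` is hypothetical, the table values 1.96 and 2.58 are the usual large-sample z values, and the inputs (mean 60, standard deviation 10, n = 100) are illustrative figures.

```python
import math

def confidence_limits(statistic, std_error, table_value):
    """Return (lower, upper) limits: statistic -/+ table_value * std_error."""
    margin = table_value * std_error
    return statistic - margin, statistic + margin

# Illustrative inputs (assumed): mean 60, standard deviation 10, n = 100.
mean, sd, n = 60.0, 10.0, 100
std_error = sd / math.sqrt(n)  # standard error of the mean = sigma / sqrt(n)

lower95, upper95 = confidence_limits(mean, std_error, 1.96)  # 95% limits
lower99, upper99 = confidence_limits(mean, std_error, 2.58)  # 99% limits

print(f"95% limits: {lower95:.2f} to {upper95:.2f}")
print(f"99% limits: {lower99:.2f} to {upper99:.2f}")
```

The wider 99% interval reflects the larger table value: more confidence is bought at the cost of a less precise range.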

For a large sample (n ≥ 30):
95% confidence level: z value is 1.96
99% confidence level: z value is 2.58
For a small sample (n < 30), Student’s t-test, Snedecor’s F-test and Pearson’s χ2 (Chi-Square) test can be used; their table values are given for different levels of significance and degrees of freedom.

Example 1. The mean of marks of 100 students is 60 with standard deviation 10. Find the 95% and 99% confidence limits.
Ans. X̄ = 60, σ = 10, n = 100

$$S.E.(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1$$

The 95% confidence limits for the population mean are

$$\bar{X} \pm Z_{0.05}\, S.E.(\bar{X}) = 60 \pm 1.96\,(1) = 58.04,\ 61.96$$

The 99% confidence limits are

$$\bar{X} \pm Z_{0.01}\, S.E.(\bar{X}) = 60 \pm 2.58\,(1) = 57.42,\ 62.58$$

With 95% confidence the population mean lies between 58.04 and 61.96; with 99% confidence it lies between 57.42 and 62.58.

• Properties of estimators
The properties of estimators can be divided into two groups depending upon the size of the sample: small sample properties and large sample properties.
Small sample properties: If the sample size is less than 30, the sample is known as a small sample.
1. Unbiasedness: An unbiased estimator is one whose mean value is equal to the value of the population parameter to be estimated. $\hat{\theta}$ is an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$.
2. Efficiency: An unbiased estimator with minimum variance is an efficient estimator. $\hat{\theta}$ is an efficient estimator if $E(\hat{\theta}) = \theta$ and $Var(\hat{\theta}) \le Var(\theta^*)$, where $\theta^*$ is another unbiased estimator of $\theta$. For example, $\bar{x}$ is an efficient estimator of µ: $E(\bar{x}) = \mu$ and $E(M) = \mu$, where M is the sample median, with

$$Var(\bar{x}) = \frac{\sigma^2}{n} \quad \text{and} \quad Var(M) = \frac{\pi\sigma^2}{2n}$$

So $Var(\bar{x}) < Var(M)$.
3. Sufficiency: An estimator is said to be sufficient if it utilizes all the information about the parameter that is contained in the sample. The estimator is based on all sample observations (data). The sample median is not a sufficient estimator since it uses only the order of the sample observations and not their values. An estimator cannot be efficient unless it makes use of all the sample observations, so sufficiency is a necessary condition for efficiency. Examples:
i. $\bar{x}$ is a sufficient estimator of µ as $\bar{x} = \frac{\sum X}{n}$
ii. $s^2$ is a sufficient estimator of $\sigma^2$ as $s^2 = \frac{\sum (X - \bar{X})^2}{n}$
iii. $\hat{b}$ is a sufficient estimator of b in Y = a + bX + u; using OLS, $\hat{b} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$
4. Best linear unbiasedness: A linear and unbiased estimator with minimum variance is the Best Linear Unbiased Estimator (BLUE). For example, X̄ is the BLUE of µ, as it is linear, unbiased and efficient, and $\hat{b}$ is the BLUE of b in the regression line Y = a + bX + u.
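The efficiency comparison between the sample mean and the sample median (property 2 above) can be illustrated with a small simulation. This is only a sketch under stated assumptions: a standard normal population, samples of size 25, 4000 repetitions and a fixed seed are all choices made here for illustration, and the function name `sampling_variances` is hypothetical.

```python
import random
import statistics

def sampling_variances(n=25, reps=4000, mu=0.0, sigma=1.0, seed=0):
    """Simulate the sampling variance of the mean and of the median.

    Assumes a normal population (an illustrative choice, not from the notes).
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)

var_mean, var_median = sampling_variances()
print(f"Var(mean)   ~ {var_mean:.4f}  (theory: sigma^2/n = {1 / 25:.4f})")
print(f"Var(median) ~ {var_median:.4f}")
```

In such a run the simulated variance of the mean should come out smaller than that of the median, which is what efficiency of x̄ means in practice.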

Large Sample Properties or Asymptotic Properties: Asymptotic properties relate to the behaviour of an estimator when the sample size is large.
1. Asymptotic unbiasedness: The expected value of the estimator approaches the actual value of the parameter as the sample size n increases (n > 30). $\hat{\theta}_n$ is an asymptotically unbiased estimator of $\theta$ if

$$\lim_{n \to \infty} E(\hat{\theta}_n) = \theta$$

2. Asymptotic efficiency: An asymptotically unbiased estimator with minimum variance in large samples is an asymptotically efficient estimator of $\theta$:

$$\lim_{n \to \infty} E(\hat{\theta}_n) = \theta \quad \text{and} \quad \lim_{n \to \infty} Var(\hat{\theta}_n) < \lim_{n \to \infty} Var(\theta^*_n)$$

with $\theta^*_n$ another unbiased estimator of $\theta$.
3. Consistency: The estimator approaches the actual value of the parameter as the sample size increases. That is,

$$\lim_{n \to \infty} \hat{\theta}_n = \theta$$

• Methods of Estimation
In order to derive estimators of the population parameter, any one of the following methods may be used. These methods provide point estimators.
1. Method of Moments: This is the oldest method. In this method the sample moments are used in place of the population moments to get estimators.
By definition, the r-th central moment of the population is $\mu_r = E(X - \mu)^r$, where $\mu = E(X)$.
Second central moment: $\mu_2 = \sigma^2 = E(X - \mu)^2$
Third central moment: $\mu_3 = E(X - \mu)^3$
Using the method of moments, $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = s^2$.
This method of estimation has the following properties:
i. The estimators are not always unbiased. E.g. X̄ is an unbiased estimator of µ whereas s2 is a biased estimator of σ2.
ii. The estimators are always sufficient. E.g. $\bar{X} = \frac{1}{n}\sum X_i$ and $s^2 = \frac{1}{n}\sum (X_i - \bar{X})^2$ are based on all the sample data.
iii. The estimators are consistent.
iv. We do not always get efficient estimators.
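The method-of-moments estimators above can be computed directly. The sketch below uses a small made-up data set (an assumption for illustration only) and contrasts the moment estimator of the variance, which divides by n, with the familiar unbiased version that divides by n − 1; the difference between the two is the bias mentioned in property i.

```python
# Hypothetical sample, chosen only so that the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# First moment: the sample mean estimates the population mean.
mu_hat = sum(data) / n

# Second central moment with divisor n: the method-of-moments variance (biased).
sum_sq_dev = sum((x - mu_hat) ** 2 for x in data)
sigma2_moments = sum_sq_dev / n

# Divisor n - 1 gives the unbiased variance estimator, shown for comparison.
sigma2_unbiased = sum_sq_dev / (n - 1)

print(mu_hat, sigma2_moments, round(sigma2_unbiased, 3))
```

The moment estimator is always a factor (n − 1)/n smaller than the unbiased one, so the gap shrinks as n grows, in line with the consistency property.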

2. Method of Least Squares: In this method, the estimators are obtained by minimising the sum of squares of the deviations of the observations from their estimated values. This method is also known as the classical least squares or ordinary least squares (OLS) method of estimation. E.g. to estimate the simple model Y = α + βX + u,

$$\sum e^2 = \sum (Y - \hat{Y})^2 \text{ is minimised.}$$

Using derivatives we get

$$\frac{\partial \sum e^2}{\partial \hat{\alpha}} = 0 \quad \text{and} \quad \frac{\partial \sum e^2}{\partial \hat{\beta}} = 0$$

From these equations we derive that

$$\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2} \quad \text{and} \quad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$

with $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.
These OLS estimators satisfy the following properties:
i. They are linear
ii. They are unbiased
iii. They are efficient and
iv. They are sufficient
So OLS estimators are known as BLUE, with optimal properties. This method is used to estimate the parameters of simple and multiple linear regression functions in single equation models. For a system of simultaneous equations, the 2SLS, 3SLS and ILS methods are used to derive optimal estimators.
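The OLS formulas above, with β̂ computed from deviations from the means and α̂ from the sample means, can be sketched as follows. The function name `ols_simple` and the five data points are hypothetical, chosen only to keep the arithmetic transparent.

```python
def ols_simple(X, Y):
    """Estimate alpha and beta in Y = alpha + beta*X + u by ordinary least squares."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Deviations from the means: x_i = X_i - X_bar, y_i = Y_i - Y_bar
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
    sum_xx = sum((x - x_bar) ** 2 for x in X)
    beta_hat = sum_xy / sum_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Hypothetical data, for illustration only.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
alpha_hat, beta_hat = ols_simple(X, Y)

# Residuals e = Y - Y_hat; with an intercept they sum to (numerically) zero.
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(X, Y)]
print(round(alpha_hat, 4), round(beta_hat, 4), round(sum(residuals), 10))
```

Here Σxy = 6 and Σx² = 10, so β̂ = 0.6 and α̂ = 4 − 0.6 × 3 = 2.2.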

3. Method of Maximum Likelihood: In this method the estimators are obtained by maximising the likelihood function, built from the probability function of the population f(X, θ). The likelihood function for the sample is

$$L = \prod_{i=1}^{n} f(X_i, \theta) \quad (\Pi \text{ denotes the product})$$

In order to find θ,

$$\log L = \sum_{i=1}^{n} \log f(X_i, \theta) \text{ is maximised}$$

$$\therefore \frac{d(\log L)}{d\theta} = 0 \quad \text{and} \quad \frac{d^2(\log L)}{d\theta^2} < 0$$

The maximum likelihood method gives estimators with the following properties:
i. They need not be unbiased
ii. They are sufficient and
iii. They are consistent
This method is used to find the parameters if the theoretical distribution of a variable is known. The estimators are at least asymptotically unbiased and asymptotically efficient.

• Testing of Hypothesis
What is a hypothesis? A hypothesis is a statement or assumption concerning a population. For the purpose of decision making, a hypothesis has to be verified and then accepted or rejected.
What is hypothesis testing? The procedure which, on the basis of sample results, enables us to decide whether a hypothesis is to be accepted or rejected is called hypothesis testing. It is also known as a test of significance.
There are two types of hypothesis:
1. Null hypothesis: A null hypothesis says there is no relationship between the two variables in the hypothesis. It is denoted by H0. The null hypothesis is the one to be tested. It is usually the hypothesis a researcher or experimenter will try to disprove.
2. Alternate hypothesis: It is contrary to the null hypothesis. An alternate hypothesis states that there is a statistically significant relationship between two variables. It is denoted by H1 or Ha.

E.g. to test whether there is no difference between the sample mean X̄ and the population parameter µ, we write the null hypothesis

H0 : x̄ = µ

The alternate hypothesis would be

H1 : x̄ ≠ µ

Other examples:
1. A new fragrance soap is introduced in the market.

H0 : The new soap is not better than the existing soap.
H1 : The new soap is better than the existing soap.

2. To find the relationship between education and income

H0 : There is no association between level of education and income of people.
H1 : There is an association between level of education and income of people.

3. In agriculture, to compare the yield of two varieties of seeds

H0 : There is no significant difference in the yield of the two varieties of seeds.
H1 : There is a significant difference in the yield of the two varieties of seeds.

4. To find the relationship between the income of the consumer and the demand for a commodity

H0 : There is no significant relationship between the income of the consumer and the demand for a commodity.

H1 : There is a significant relationship between the income of the consumer and the demand for a commodity.

• Committing Errors
When we decide whether to accept or reject a hypothesis, we may commit errors in two ways:
1. Type I Error: We reject a hypothesis when it is in fact true.
2. Type II Error: We accept a hypothesis when it is in fact false.
The other two situations are desirable: we accept a hypothesis when it is true, and we reject a hypothesis when it is false.
The decision rule:

              Accept H0        Reject H0
H0 True       Desirable        Type I Error
H0 False      Type II Error    Desirable

• Level of significance
The probability of committing a Type I Error is the level of significance of a statistical test. A 5% level implies that the probability of committing a Type I Error is 0.05. A 1% level implies a 0.01 probability of committing a Type I Error.

The probability of committing Type I error is denoted by α

That is, α = Prob. (Rejecting H0 / H0 is true) 1 – α = Prob. (Accepting H0 / H0 is true)

The probability of making Type II error is denoted by β

Where, β = Prob. (Accepting H0 / H0 is false) 1 – β = Prob. (Rejecting H0 / H0 is false)

• One tailed and Two Tailed tests
The hypothesis to be tested is called the null hypothesis and is denoted by H0. The null hypothesis implies that there is no difference between the sample statistic and the population parameter. To test whether there is no difference between the sample statistic X̄ and the population parameter µ, we write the null hypothesis as

H0: x ̅ = µ The alternate hypothesis would be

H1: x ̅ ≠ µ This means x ̅ > µ or x ̅ < µ This is called two tailed test

The alternative H1 : x̄ > µ is right tailed.
The alternative H1 : x̄ < µ is left tailed.
These are one tailed alternatives, giving a one tailed test.
Example:
1. In a study on “mobilisation of rural savings” we need a comparison of the marginal propensity to save (MPS) between landless labourers and farmers. For this purpose we consider a left side test (one sided test):

H0 : MPS of labourers is not less than that of farmers, that is MPSL ≥ MPSF
H1 : MPS of labourers is less than that of farmers, that is MPSL < MPSF

OR

H0 : MPS of labourers is not greater than that of farmers, that is MPSL ≤ MPSF
H1 : MPS of labourers is greater than that of farmers, that is MPSL > MPSF
This is a right side test (one sided test).

2. In a study on return on education, difference in rate of return to education with level of education is got by taking two tailed or two sided test

H0 : There is no significant difference in the return on education between UG and PG level of education

H0 : µ1 = µ2 H1 : There is significant difference in return to education between UG and PG level of education

H1 : µ1 ≠ µ2 This means either µ1 > µ2 or µ1 < µ2 This is two sided test.

• Critical Region
A critical region, also known as the rejection region, is a set of values of the test statistic for which the null hypothesis is rejected. That is, if the observed test statistic is in the critical region, we reject the null hypothesis and accept the alternative hypothesis.
• Critical Value
In hypothesis testing, a critical value is a point on the test distribution that is compared with the test statistic to determine whether to reject the null hypothesis. If the absolute value of the test statistic is greater than the critical value, we declare statistical significance and reject the null hypothesis. If it is less than the critical value, we declare statistical insignificance and accept the null hypothesis.
• Standard error
The concept of the standard error of a statistic is used to test the precision of a sample. The standard error (SE) of any statistic is the standard deviation of the sampling distribution of the statistic. The formulas for the SE of various statistics are

$$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}} \qquad SE(p) = \sqrt{\frac{PQ}{n}}$$
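The two standard error formulas above can be written as small helper functions. This is a minimal sketch: the function names are hypothetical, Q is taken as 1 − P (the usual convention, assumed here), and the numeric inputs are illustrative only.

```python
import math

def se_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(P*Q/n), with Q = 1 - P (assumed)."""
    return math.sqrt(p * (1 - p) / n)

# Illustrative inputs: sigma = 3.3 with n = 400, and P = 0.30 with n = 100.
se_x = se_mean(3.3, 400)
se_p = se_proportion(0.30, 100)
print(round(se_x, 3), round(se_p, 4))
```

Both formulas shrink with the square root of n, so quadrupling the sample size only halves the standard error.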

• Types of tests
We divide the tests into small sample tests and large sample tests as follows.
Small sample tests (n < 30)
1. Gosset’s Student t-test (Average, Correlation, Regression)
2. Snedecor and Fisher’s F-test (Variance, R2, ANOVA)
3. Pearson’s Chi-Square χ2 test (Association, Goodness of fit)
Large sample test (n ≥ 30)
Z-test

Two tailed test       One tailed test
Z0.05 = 1.96          Z0.05 = 1.645
Z0.01 = 2.58          Z0.01 = 2.33
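The table values of Z above can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Critical z value for significance level alpha.

    A two tailed test splits alpha equally between the two tails.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.05, 0.01):
    two = z_critical(alpha, two_tailed=True)
    one = z_critical(alpha, two_tailed=False)
    print(f"alpha = {alpha}: two tailed {two:.3f}, one tailed {one:.3f}")
```

The one tailed values are smaller because the whole rejection probability sits in a single tail.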

• Degrees of freedom
Degrees of freedom is a combination of how much data you have and how many parameters you need to estimate. It indicates how much independent information goes into a parameter estimate.
• Steps in Hypothesis Testing
1. State H0 and H1
2. Choose the statistical test
3. Select the desired level of significance
4. Compute the calculated value
5. Obtain the critical value
6. Make the decision

• Large sample test ( Z – test) • Test for a sample mean X̅ 푥 − µ σ z = Where SE(x)̅ = 푆퐸(푥 ) 푛 1. A sample of 400 persons has a mean height of 171.38 cm. can the sample be regarded as having been drawn from a population with mean height of 171.17 cm and SD 3.3? n = 400, x ̅ = 171.38, µ = 171.17, σ = 3.3

H0: The sample is not taken from population with mean height of 171.17 cm H0: x ̅ = µ H1: The sample is taken from population with mean height of 171.17 cm H1: x ̅ ≠ µ 푥 − µ σ 3.3 3.3 z = Where SE(x)̅ = SE(x)̅ = = = 0.165 푆퐸(푥 ) 푛 400 20 171.38 −171.17 z = 0.165 z = 1.27 Calculated value or absolute value

z0.05 = 1.96 Table value or critical value The calculated z value is less than the table value, therefore the null hypothesis is accepted at 5% level of significance. That is the sample is not taken from population with mean height of 171.17 cm. 2. A factory claims to produce a lower average number of defectives compared to the prevalent average of 30.5. A random sample of 100 defective articles from the factory with mean 28.8 and SD 6.4. Is the claim sustained at 5% level of significance?

H0: µ = 30.5
H1: µ < 30.5
x̄ = 28.8, µ = 30.5, σ = 6.4, n = 100
$$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{6.4}{\sqrt{100}} = \frac{6.4}{10} = 0.64$$
$$z = \frac{\bar{x} - \mu}{SE(\bar{x})} = \frac{28.8 - 30.5}{0.64} = -2.66, \qquad |z| = 2.66 \quad \text{(calculated value, in absolute terms)}$$
z0.05 = 1.645 (table value or critical value, one tailed)
The absolute value of the calculated z is greater than the table value, therefore the null hypothesis is rejected at the 5% level of significance. It is concluded that the claim is sustained.
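Both worked problems on the sample mean follow the same pattern: compute z = (x̄ − µ)/(σ/√n) and compare it with the table value. The sketch below reproduces them; the function names and the `tail` argument are illustrative assumptions, and the critical values 1.96 and 1.645 are the table values used in these notes.

```python
import math

def z_statistic(sample_mean, mu, sigma, n):
    """z = (sample mean - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

def reject_h0(z, critical, tail="two"):
    """Decision rule. tail is 'two', 'lower' (H1: mean < mu) or 'upper'."""
    if tail == "two":
        return abs(z) > critical
    if tail == "lower":
        return z < -critical
    return z > critical

# Problem 1: heights, two tailed test at the 5% level
z1 = z_statistic(171.38, 171.17, 3.3, 400)
print(round(z1, 2), reject_h0(z1, 1.96))

# Problem 2: defectives, one tailed (lower) test at the 5% level
z2 = z_statistic(28.8, 30.5, 6.4, 100)
print(round(z2, 2), reject_h0(z2, 1.645, tail="lower"))
```

In problem 1 the statistic (about 1.27) stays inside the acceptance region, so H0 is not rejected. In problem 2 the statistic (about −2.66) falls in the lower critical region, so H0 is rejected and the factory's claim is sustained.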

• Large sample test for equality of two means x̄1, x̄2
$$z = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}, \qquad \text{where } SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
1. The average scores of two groups A and B were found to be 25 and 22, with SD 4 and 5.5 respectively. Test the equality of the two group means, given n1 = n2 = 400.
H0: µ1 = µ2
H1: µ1 ≠ µ2
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{4^2}{400} + \frac{5.5^2}{400}} = \sqrt{0.04 + 0.0756} = 0.340$$
$$z = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)} = \frac{25 - 22}{0.340} = 8.82 \quad \text{(calculated value)}$$
Z0.05 = 1.96 (table value)

The calculated z value is greater than the z table value at the 5% level of significance, therefore the null hypothesis is rejected. It is concluded that the average scores of the two groups are not equal.
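The two-mean test can be checked the same way. This is a minimal sketch assuming the formula SE(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2); the function name `two_mean_z` is illustrative.

```python
import math

def two_mean_z(mean1, mean2, sd1, sd2, n1, n2):
    """Return (z, se) for the large sample test of equality of two means."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    return z, se

# Groups A and B from the worked problem: means 25 and 22, SD 4 and 5.5, n = 400 each
z, se = two_mean_z(25, 22, 4, 5.5, 400, 400)
print(f"SE = {se:.3f}, z = {z:.2f}")
print("reject H0 at 5%:", abs(z) > 1.96)
```

Carrying full precision gives SE ≈ 0.340 and z ≈ 8.82. The calculated value is far beyond 1.96, so the decision does not depend on how the intermediate steps are rounded.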

• Small sample test (t-test)
Test for a sample mean x̄
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} \quad \text{with } n-1 \text{ degrees of freedom}$$
1. Ten randomly selected persons are found to have a mean height of 67.8 inches with a standard deviation of 2.94 inches. Discuss the suggestion that the mean height in the population is 66 inches.

H0: µ = 66
H1: µ ≠ 66
n = 10, x̄ = 67.8, s = 2.94, µ = 66
$$t = \frac{67.8 - 66}{2.94/\sqrt{10-1}} = \frac{1.8}{0.98} = 1.84 \quad \text{(calculated value)}$$
t0.05 with 9 d.f. = 2.262 (table value)
The calculated t value is less than the t table value, so the null hypothesis is accepted at the 5% level of significance. It is concluded that the sample mean does not differ significantly from the suggested population mean of 66 inches.
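The t statistic for this problem can be recomputed as below. This is a sketch under the notes' own formula, t = (x̄ − µ)/(s/√(n − 1)); the function name is illustrative, and the critical value 2.262 is the table value quoted in the notes rather than something computed here.

```python
import math

def t_statistic(sample_mean, mu, s, n):
    """t = (sample mean - mu) / (s / sqrt(n - 1)), with n - 1 degrees of freedom."""
    return (sample_mean - mu) / (s / math.sqrt(n - 1))

n = 10
t = t_statistic(67.8, 66, 2.94, n)
df = n - 1
t_table = 2.262  # t at the 5% level with 9 d.f., as given in the notes

print(f"t = {t:.2f} with {df} d.f.")
print("reject H0 at 5%:", abs(t) > t_table)
```

The statistic is about 1.84, below 2.262, so H0 is not rejected at the 5% level.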

• To test the significance of correlation coefficient r
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \quad \text{with } n-2 \text{ degrees of freedom}$$
If |t| > table value of t(n−2, 0.05), then r is significant.
If |t| < table value of t(n−2, 0.05), then r is not significant.

1. For 18 observations it was found that r = −0.6. Is this significant evidence of correlation in the population?

H0: There is no significant correlation
H1: There is significant correlation
$$|t| = \frac{0.6\sqrt{18-2}}{\sqrt{1-0.6^2}} = \frac{0.6 \times 4}{0.8} = 3.0 \quad \text{with } 18 - 2 = 16 \text{ degrees of freedom}$$
t(16, 0.05) = 2.12
The calculated value is greater than the table value. Therefore the null hypothesis is rejected. It is concluded that the coefficient of correlation is significant.
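The statistic for the correlation problem follows directly from t = r√(n − 2)/√(1 − r²). A minimal sketch, with an illustrative function name and the table value 2.12 taken from the notes:

```python
import math

def t_for_correlation(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_for_correlation(-0.6, 18)
t_table = 2.12  # t at the 5% level with 16 d.f., as given in the notes

print(f"|t| = {abs(t):.2f}")
print("r is significant at 5%:", abs(t) > t_table)
```

With r = −0.6 and n = 18 the formula evaluates to |t| = 0.6 × 4 / 0.8 = 3.0. The sign of r only affects the sign of t, so the comparison uses the absolute value.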

• Chi-Square (χ²) test
It is used
1. To test the goodness of fit (to test the difference between theoretical and observed frequencies)
2. To test the independence of attributes
3. To test whether the population variance has a specified value
Test for goodness of fit
$$\chi^2 = \sum \frac{(O - E)^2}{E} \quad \text{with } n - 1 \text{ degrees of freedom}$$
where O is the observed frequency, E is the expected frequency and n is the number of classes.
• There are 100 slips in an urn, supposed to contain all digits in equal numbers. A check gives the following result. Is the result consistent with the hypothesis that the urn contains all digits in equal numbers, i.e. 10 of each?

Digit                      0   1   2   3   4   5   6   7   8   9
O (observed frequencies)   9  11  12  12  10   8   7  10   9  12

Ans.
H0: There is no difference between observed and expected values
H1: There is a difference between observed and expected values

Digit   O (observed frequencies)   E (expected frequencies)   (O − E)²/E
0        9                         10                         0.1
1       11                         10                         0.1
2       12                         10                         0.4
3       12                         10                         0.4
4       10                         10                         0
5        8                         10                         0.4
6        7                         10                         0.9
7       10                         10                         0
8        9                         10                         0.1
9       12                         10                         0.4
Total                                                         2.8
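The (O − E)²/E column and its total can be recomputed from the observed frequencies. A minimal sketch; the function name `chi_square` is illustrative, and 16.92 is the 5% table value for 9 degrees of freedom used in these notes.

```python
# Observed frequencies of digits 0 to 9 in the urn problem
observed = [9, 11, 12, 12, 10, 8, 7, 10, 9, 12]

# Under H0 every digit is equally frequent: 100 slips / 10 digits = 10 each
expected = [sum(observed) / len(observed)] * len(observed)

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = chi_square(observed, expected)
df = len(observed) - 1  # number of classes minus one
chi2_table = 16.92      # 5% level, 9 d.f., as given in the notes

print(f"chi-square = {chi2:.1f} with {df} d.f.")
print("reject H0 at 5%:", chi2 > chi2_table)
```

Note that the degrees of freedom come from the number of classes (10 digits), not from the total frequency of 100.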

χ² = Σ (O − E)²/E = 2.8 and d.f. = n − 1 = 10 − 1 = 9
From the table, χ²(9, 0.05) = 16.92
The calculated χ² is less than the table value, therefore the null hypothesis is accepted at the 5% level of significance. It is concluded that there is no difference between observed and expected values.

Test of independence
$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \quad \text{with } (r-1)(c-1) \text{ degrees of freedom}$$
Contingency table
Suppose the frequencies in the data are classified according to attribute A into r classes (rows) and according to attribute B into c classes (columns) as follows:

Class   B1     B2     ...   Bc     Total
A1      O11    O12    ...   O1c    (A1)
A2      O21    O22    ...   O2c    (A2)
...     ...    ...    ...   ...    ...
Ar      Or1    Or2    ...   Orc    (Ar)
Total   (B1)   (B2)   ...   (Bc)   N

The totals of the row and column frequencies are (Ai) and (Bj). To test whether there is any relation between A and B, we set up the null hypothesis of independence between A and B. The expected frequency in any cell is calculated using the formula
$$E_{ij} = \frac{(A_i)(B_j)}{N}$$
$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \quad \text{with } (r-1)(c-1) \text{ degrees of freedom}$$
1. From the following, test whether there is any association between sex and education. Given χ²(2, 0.05) = 5.99 and χ²(2, 0.01) = 9.2

        School   College   University   Total
Boys    10       15        25           50
Girls   25       10        15           50
Total   35       25        40           100

Ans.

H0: There is no association between sex and education
H1: There is an association between sex and education
Expected frequencies
Boys:  35 × 50 / 100 = 17.5, 25 × 50 / 100 = 12.5, 40 × 50 / 100 = 20
Girls: 35 × 50 / 100 = 17.5, 25 × 50 / 100 = 12.5, 40 × 50 / 100 = 20

Expected frequencies
        School   College   University   Total
Boys    17.5     12.5      20           50
Girls   17.5     12.5      20           50
Total   35       25        40           100

(O − E)²/E
Boys:  (10 − 17.5)²/17.5 = 3.21, (15 − 12.5)²/12.5 = 0.5, (25 − 20)²/20 = 1.25
Girls: (25 − 17.5)²/17.5 = 3.21, (10 − 12.5)²/12.5 = 0.5, (15 − 20)²/20 = 1.25
χ² = 9.92 with d.f. = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2
χ²(2, 0.05) = 5.99 and χ²(2, 0.01) = 9.2
The calculated chi-squared value (9.92) is greater than the table value at both the 5% and 1% levels of significance. Therefore the null hypothesis is rejected. It is concluded that there is an association between sex and education; the two are not independent.
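The test of independence can be sketched in a few lines: expected frequencies come from E_ij = (A_i)(B_j)/N, and the statistic sums (O_ij − E_ij)²/E_ij over all cells. The function names are illustrative; the table values 5.99 and 9.2 are those given in the problem.

```python
# Observed frequencies: rows = Boys, Girls; columns = School, College, University
observed = [
    [10, 15, 25],
    [25, 10, 15],
]

def expected_frequencies(table):
    """E_ij = (row total) * (column total) / N for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an r x c contingency table."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_independence(observed)
print(expected_frequencies(observed))
print(f"chi-square = {chi2:.2f} with {df} d.f.")
print("reject H0 at 5%:", chi2 > 5.99, "| reject H0 at 1%:", chi2 > 9.2)
```

At full precision the statistic is about 9.93; the 9.92 in the worked answer comes from summing terms already rounded to two decimals. Either way it exceeds both table values.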