<<

Centre for Development, Environment and Policy

P101

Applied Econometrics

Prepared by Francesca Di Nuzzo

This module is partially based on the earlier module ‘Applied Econometrics for the Agricultural and Food Sector’ prepared for the University of London’s External Programme by Alison Burrell.

© SOAS | 3740 Applied Econometrics Module Introduction

ABOUT THIS MODULE

This module is about econometric methods and how they are applied to estimate and test the unknown parameters of economic relationships. Priority is given to both the statistical reasoning underlying the methodology and the practical considerations involved in using this methodology with a variety of models and real data.

The focus of the module is on the classical linear regression model. This is the basis for much econometric methodology and it provides the framework for organising the module.

The module covers:

• the principles of regression analysis and its statistical foundations

• the simple linear regression model

• the multiple linear regression model

• violations of the assumptions of classical linear regression.

There is a limit to the distance that can be covered in the study time available. In an econometrics module, there is little room to trade depth for breadth: without good groundwork and sufficient information at each stage, ideas may be misunderstood and techniques misapplied. This module follows the standard itinerary of most econometrics textbooks.

The practical exercises, which are designed to be done with the free software package R, are an important element of the module.

STRUCTURE OF THE MODULE

Each unit in the study guide follows the same format.

Each unit starts with a section on ideas or issues, whose purpose is to explain, in simple words and with a minimum of technical notation, the basic substance of the unit. The aim is to give you an intuitive feel for the subject matter before going into technical detail. If you feel that mathematics and statistics are not your strongest suit, this regular section will give you a few ‘analytical handles’ to hold on to when studying relevant techniques. But even if you are confident with mathematics and statistics, it is important not to skip this section.

Technical expertise is not just a question of one’s ability to work out the steps in a technical procedure or to understand a mathematical derivation. It also involves understanding the type of questions a technique tries to address and the assumptions on which it is based as well as judging the appropriateness of particular technical procedures in specific conditions.

Next, the module units contain a Key Concepts section, which guides your study in further detail. The purpose of this section is to highlight the main concepts as well as to structure your reading of the textbook.

Following on from this, you will find a section containing an example (except in Units 6 and 10). The purpose of this section is twofold.

• First, the example highlights a specific aspect of the topic under study in a particular unit of the module.

• Second, the example also gives you a glimpse of econometrics in action.

The examples aim to highlight the links between economic theory and empirical investigation, and to illustrate the problems that can arise when we work with real data.

Often, you will find a set of self-assessment questions at the end of each section. It is important that you work through all of these. Their purpose is threefold:

• to check your understanding of basic concepts and ideas

• to verify your ability to execute technical procedures in practice

• to develop your skills in interpreting the results of empirical analysis.

Also, you will find additional exercises at the end of each unit, which aim to give you hands-on examples of how to carry out empirical analysis; in many cases, you will be asked to work with actual data and to interpret results. Answers to the self-assessment questions are provided.

For each unit (except in Units 6 and 10) you will find a guide that explains how to use R – the software package you will use to carry out econometric exercises. This guide will help you to master this particular econometrics software package. You may find it more convenient to study each R guide individually at the end of each unit (before the section ‘Unit Self-Assessment Questions’).

When studying each unit, we suggest that you study the text at your own pace and then work through the questions, which are designed to test your understanding of the unit material. If these questions reveal some weak spots, revisit the relevant material before going on to the applied, data-based questions, which you will need to answer in conjunction with the R guide. The Summary at the end of each unit briefly describes the topics covered, and the table of Key Terms and Concepts lists all the important concepts you should have learned through your study.

Applied statistics and econometrics are subjects with a great deal of specialised jargon. It can be disconcerting to be faced with a number of unfamiliar new terms, many of them quite long and often rather similar to each other. We recommend very strongly that, right from the beginning of the module, you keep a glossary in which you list each new term as you encounter it, together with an explanation of the term in your own words. You should read through this glossary every few weeks, updating your definitions if you find that, as the module progresses, your understanding of the term develops along with your familiarity with the concept.

WHAT YOU WILL LEARN

Module Aims

The specific aims of the module are:

• To explain the principles of econometric estimation and its statistical foundations.

• To present the theory of the classical linear regression model and explain why the conditions in such a model provide an ideal environment for ordinary least squares regression.

• To develop practical skills of data analysis, use of regression techniques and interpretation of regression results.

• To explain the procedures of interval estimation and hypothesis testing in the classical normal linear regression model.

• To show how econometric models can be made more realistic through the use of dummy variables.

• To explain how linear restrictions can be imposed on parameters during estimation and how these restrictions can be tested.

• To investigate the consequences of heteroscedasticity of the disturbances and endogeneity of the regressors.

• To encourage an appreciation of what constitutes a ‘good’ econometric model, and how to test that a model is well specified.

Module Learning Outcomes

By the end of this module students should be able to:

• understand and selectively and critically apply the basic principles of regression analysis and statistical inference in the context of a single-equation regression model

• formulate a single-equation regression model, estimate its parameters, carry out a variety of tests relating to model specification and critically interpret all results

• test hypotheses about economic behaviour and critically interpret the results of these tests

• specify and interpret models using dummy variables and different types of dynamic specification, and incorporate and test linear restrictions

• test for heteroscedasticity and endogeneity, and take appropriate action when these conditions are found to be present.

ASSESSMENT

This module is assessed by:

• an examined assignment (EA) worth 40%

• a written examination worth 60%.

Since the EA is an element of the formal examination process, please note the following:

(a) The EA questions and submission date will be available on the Virtual Learning Environment (VLE).

(b) The EA is submitted by uploading it to the VLE.

(c) The EA is marked by the module tutor and students will receive a percentage mark and feedback.

(d) Answers submitted must be entirely the student’s own work and not a product of collaboration.

(e) Plagiarism is a breach of regulations. To ensure compliance with the specific University of London regulations, all students are advised to read the guidelines on referencing the work of other people. For more detailed information, see the FAQ on the VLE.

STUDY MATERIALS

Except for Unit 9, for which separate material will be provided, the single textbook for the module is:

• Gujarati, D. & Porter, D. (2010) Essentials of Econometrics. 4th edition. International Edition. McGraw-Hill.

This book has been chosen for this self-study module because of its attention to full explanations of concepts and procedures, its long introductory section presenting the basic statistical concepts used in regression modelling, and its avoidance of unnecessary algebra and difficult notation.

You are also encouraged to read those parts that are not specifically identified in the module texts, since all the material here should be within your grasp and will reinforce your understanding of the subject. By the end of the module, you will know this textbook well and will be ready for other more advanced readings.

The following texts are not compulsory. They are more advanced in notation and explanation, and are suggested should you wish to explore specific concepts in more depth. Please note that these texts are recommendations and are not provided.

Intermediate

Greene, W. (2000) Econometric Analysis. 4th edition. New Jersey, Prentice Hall.

Gujarati, D. (1979) Basic Econometrics. Singapore, McGraw-Hill.

Gujarati, D. (2011) Econometrics by Example. Palgrave Macmillan.

Advanced

Maddala, G.S. & Lahiri, K. (2009) Introduction to Econometrics. 4th edition. Chichester, John Wiley & Sons.

Other

Hallam, D. (1990) Econometric Modelling of Agricultural Commodity Markets. London, Routledge. Discussion of econometric modelling in agricultural commodity markets.

For each of the module units, the following are provided.

Key Readings

These are drawn mainly from the textbook and relevant academic journals and internationally respected reports. Key Readings are provided to add breadth and depth to the unit materials, as appropriate, and are required reading as they contain material on which students may be examined. The notes under each reading indicate the scope and relevance of the reading.

Further Readings

These texts are not provided in hard copy, but weblinks have been included. Further Readings are NOT examinable and are provided to enable students to pursue their own areas of interest.

Multimedia

The e-version of the study guide includes a number of interviews with rural development managers in which they discuss particular aspects of their management experience. These interviews can be treated rather like audio case studies to illustrate concepts and arguments presented in the module text.

References

Each unit contains a full list of all material cited in the unit text. This is primarily a matter of good academic practice: to show where points made in the text can be substantiated. Students are not expected to consult these references as part of their study of this module.

Self-Assessment Questions

Often, you will find a set of Self-Assessment Questions at the end of each section within a unit. It is important that you work through all of these. Their purpose is threefold:

• to check your understanding of basic concepts and ideas

• to verify your ability to execute technical procedures in practice

• to develop your skills in interpreting the results of empirical analysis.

Also, you will find additional Unit Self-Assessment Questions at the end of each unit, which aim to help you assess your broader understanding of the unit material. Answers to the Self-Assessment Questions are provided in the Answer Booklet.

In-text Questions

This icon invites you to answer a question for which an answer is provided. Try not to look at the answer immediately; first write down what you think is a reasonable answer to the question before reading on. This is equivalent to lecturers asking a question of their class and using the answers as a springboard for further explanation.

In-text Activities

This symbol invites you to halt and consider an issue or engage in a practical activity.

Key Terms and Concepts

At the end of each unit you are provided with a list of Key Terms and Concepts which have been introduced in the unit. The first time these appear in the text they are shown in bold italics. Some key terms are very likely to be used in examination questions, and an explanation of the meaning of relevant key terms will nearly always gain you credit in your answers.

Acronyms and Abbreviations

As you progress through the module you may need to check unfamiliar acronyms that are used. A full list of these is provided for you at the end of the Introduction.

TUTORIAL SUPPORT

There are two opportunities for receiving support from tutors during your study, and you are strongly advised to take advantage of both. These opportunities involve:

(a) participating in the Virtual Learning Environment (VLE)

(b) completing the examined assignment (EA).

Virtual Learning Environment (VLE)

The Virtual Learning Environment provides an opportunity for you to interact with both other students and tutors. A discussion forum is provided through which you can post questions regarding any study topic that you have difficulty with, or for which you require further clarification. You can also discuss more general issues on the News forum within the CeDEP Programme Area.

INDICATIVE STUDY CALENDAR

Part/unit Unit title Study time (hours)

PART I INTRODUCTORY IDEAS AND STATISTICAL CONCEPTS

Unit 1 Introduction to Econometrics 10

Unit 2 Statistical Review 15

PART II THE SIMPLE REGRESSION MODEL

Unit 3 The Classical Linear Regression Model 15

Unit 4 Hypothesis Testing 15

PART III THE MULTIPLE REGRESSION MODEL

Unit 5 The Multiple Regression Model 15

Unit 6 Dummy Variables 10

Unit 7 Linear Parameter Restrictions 15

PART IV NON-CLASSICAL DISTURBANCES AND ENDOGENEITY

Unit 8 Heteroscedasticity 15

Unit 9 Causality and Instrumental Variables 15

PART V MODULE SUMMARY

Unit 10 Module Summary 10

Examined Assignment 15 Check the VLE for submission deadline

Examination entry July

Revision and examination preparation Jul–Sep

End-of-module examination Late Sep to early Oct

ACRONYMS AND ABBREVIATIONS

BLUE best linear unbiased estimator

CLRM classical linear regression model

CLT central limit theorem

CNLRM classical normal linear regression model

CPI consumer price index

d.f. degrees of freedom

ESS explained sum of squares

FAO Food and Agriculture Organization

FDI foreign direct investment

GDP gross domestic product

GNP gross national product

LM Lagrange multiplier

MBA Master of Business Administration

ML maximum likelihood

MPC marginal propensity to consume

MSE mean square error

MT Metical; the currency of Mozambique

OLS ordinary least squares

PDF probability density function

PRF population regression function

PRL population regression line

RLS restricted least squares

RSS residual sum of squares

SRF sample regression function

SUR seemingly unrelated regression

TSS total sum of squares

VIF variance inflation factor

WLS weighted least squares

Unit One: Introduction to Econometrics

Unit Information
  Unit Overview
  Unit Aims
  Unit Learning Outcomes
  Unit Interdependencies

Key Readings

References

1.0 Ideas and issues
  Section Overview
  Section Learning Outcomes
  1.1 What is econometrics?
  1.2 Econometrics in theory and in practice

2.0 Key concepts: the concept of regression
  Section Overview
  Section Learning Outcomes
  2.1 What is regression?
  2.2 Linearity and log-linearity
  2.3 Correlation versus regression analysis
  2.4 Data and regression
  2.5 A word of caution on notation
  Section 2 Self-Assessment Question

3.0 Example: The Keynesian function

Unit Summary

Unit Self-Assessment Questions

Key Terms and Concepts

Applied Econometrics Unit 1

UNIT INFORMATION

Unit Overview

This unit introduces you to the study of econometrics. It begins by defining econometrics and then explains how econometrics relates to and differs from other branches of economics. The important roles of economic theory and data in econometric work are emphasised. Regression analysis is identified as the basis of econometric procedure. The aims and purpose of regression analysis are explained. The main steps of a typical econometric investigation are described and illustrated with an example.

Unit Aims

• To define the nature and scope of econometrics.

• To identify the special characteristics of econometrics as a tool of applied economics.

• To describe and illustrate the main steps of an econometric investigation.

• To identify some characteristics of economic data.

• To practise some basic techniques of data investigation.

Unit Learning Outcomes

By the end of this unit, students should:

• have an appreciation of econometrics as a method of empirical investigation

• have an understanding of major differences between econometric models and economic models

• have an understanding of the main steps of an econometric investigation

• have a knowledge of essential terminology relating to regression analysis

• know how to perform basic data analysis in R.

Unit Interdependencies

Unit 3

In Unit 3, you will take the first steps into linear regression analysis, which is one technique to deal with non-experimental data.

Unit 4

In this unit, we discuss the methodology of econometrics, which involves testing our maintained hypothesis; this is what you will be introduced to in Unit 4.

Unit 5

In Unit 5, we expand linear regression to multivariate regression, which is commonly used to perform empirical analysis. Also, we will make use of correlation analysis in dealing with the issue of multicollinearity.

Unit 8

In this unit, you will learn that we can use a variety of functional forms to specify our model. In Unit 8 you will review the logarithmic transformation and employ it to mitigate the effect of non-constant variance in the residuals, i.e. heteroscedasticity.

Unit 9

In this unit we introduce the concept of regression analysis. This kind of analysis implies that a causal nexus is identified between two or more variables. In Unit 9 the issues of causality are discussed more extensively.

KEY READINGS

Gujarati, D. & Porter, D. (2010) The nature and scope of econometrics. In: Essentials of Econometrics. 4th edition. International Edition, McGraw-Hill. pp. 1–13.

Gujarati and Porter (2010) is the main textbook for the study of this module. This introductory chapter will give you an overview of the course and it will define the field of econometrics, its methodologies and types of data. The chapter is straightforward and can be read fairly quickly.

Gujarati, D. & Porter, D. (2010) Basic ideas of linear regression: the two-variable model. In: Essentials of Econometrics. 4th edition. International Edition, McGraw-Hill. Sections 2.1–2.6, pp. 21–32.

Sections 2.1 to 2.6 provide a brief, clear explanation of the key introductory concepts of linear regression. Make sure that you also refer to these definitions as you go through the following units.

REFERENCES

Maddala, G.S. (2009) Introduction to Econometrics. 4th edition. Chichester, John Wiley & Sons.

ONS. (1991) Economic Trends Annual Supplement 1991. Unipub.

Tsai, P.L. (1991) Determinants of foreign direct investment in Taiwan: an alternative approach with time series data. World Development, 19 (2–3), 275–285.

1.0 IDEAS AND ISSUES

Section Overview

In this section, we begin to explore the subject of econometrics, what it does and how it is used in applied research.

Section Learning Outcomes

By the end of this section, students should be able to:

• understand what econometrics does and how it is employed in empirical research

• understand the nature of the data and methods used in econometrics.

1.1 What is econometrics?

Welcome to this module. Its aim is to give you an introduction to econometric methods or, more specifically, to linear regression which is the main statistical foundation for econometric work. Throughout the module you will be working with data; we hope you will find this interesting.

Economic theory is concerned with relationships between variables. You might have already met some of these, such as demand and supply functions for agricultural products, production functions, labour supply and demand functions, and so on. Economic theory aims to explain economic behaviour; this involves studying the relationship between economic variables and the factors that influence them.

The purpose of econometrics is to quantify economic relationships. Econometrics can provide numerical estimates of the parameters of these relationships and a framework for testing hypotheses about them. Broadly defined, econometrics is

‘ … the application of statistical and mathematical methods to the analysis of economic data, with the purpose of giving empirical content to economic theories and verifying them or refuting them …’

Source: Maddala (2009) p. 3.

Other definitions are possible: you will come across a number of definitions, each with a slightly different emphasis. Common to all definitions, however, is the stress on the empirical nature of econometric work.

The process of econometrics involves the confrontation between economic theory and economic data in quantifying economic relationships.

Econometrics is not just a branch of mathematical economics. Mathematical economics need not have any empirical content at all whereas in econometrics the emphasis is on empirical analysis. At the same time, econometrics is not just a ‘box of tools’ to work with data. It requires, undoubtedly, a good training in statistical techniques but these techniques need to be deployed in an interactive process between theory and the data.

This module can be studied in its own right, but normally we would expect you to take it as part of the MSc programme where, in Part I, you will have studied various economic theories and models. You should therefore be familiar with a range of questions raised in theoretical discussions and with the results of some applied empirical studies. These are good foundations on which to build the study of econometrics. If up to now you have approached empirical studies from the point of view of theory or the consequences for policy making, we now invite you to look at them from the point of view of an econometrician. What is the difference?

To give empirical content to economic theories, the econometrician is confronted with four problems that hardly concern the economic theorist. These four problems are explained below.

(a) Non-experimental data

Economic theory develops models using a priori reasoning applied to relatively simple assumptions. This procedure involves abstracting from secondary complications by assuming that ‘other things remain equal’ (or ceteris paribus), in order to investigate the links between a few key economic variables.

For example, in demand theory we say that the quantity demanded of a commodity (that is not a Giffen good) will fall if its price rises, other things being equal. These ‘other things’ which we assume are held constant include consumers’ incomes and income distribution, and the prices of substitutes and complementary goods.

This method is fruitful in economic theory but, unfortunately, it is rarely possible to carry out controlled experiments to test such statements. Therefore, in empirical economics the scope for observing such behaviour is severely limited. A researcher cannot alter a commodity’s price, holding other things constant, in order to see what happens to its demand.

In general, economic data are not the outcome of experiments but rather are observed and recorded in a non-experimental world where other things are never equal. Therefore, econometrics involves untangling the effects of different factors that act simultaneously rather than analysing the results of a laboratory experiment.

(b) Stochastic relationships

Economic theory usually involves deterministic relationships between economic variables. This can be explained with a simple example: the Keynesian consumption function. In economic theory we assume that, if we know the level of aggregate real income, consumption will be uniquely determined. That is, for each value of aggregate real income there corresponds a given level of aggregate consumption.

In reality, however, we do not expect theoretical relationships to hold exactly. Even when all the main factors that systematically affect the behaviour of an economic variable are taken into account, there will still be some random variation due to non-systematic, ‘one-off’ factors and human variability.

Hence, in econometric work we deal with relationships between variables that contain a random or stochastic element, and that are therefore not deterministic in nature. We investigate functions between variables which we believe to be reasonably stable on average, but there is always a degree of uncertainty about them.

In econometrics we make explicit assumptions about these random components, called disturbances. This is why econometrics draws heavily on probability theory and statistical inference.
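The contrast between a deterministic and a stochastic relationship can be sketched with a short simulation. The exercises in this module use R; the sketch below is written in Python purely for illustration. The intercept, the marginal propensity to consume and the spread of the disturbance are hypothetical values chosen for the example, and the normal distribution of the disturbance is likewise an assumption, not something taken from the text.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
INTERCEPT = 50.0   # autonomous consumption (a)
MPC = 0.8          # marginal propensity to consume (b)
SIGMA = 10.0       # standard deviation of the disturbance (u)


def deterministic_consumption(income):
    """Consumption implied exactly by the theory: C = a + b*Y."""
    return INTERCEPT + MPC * income


def stochastic_consumption(income, rng):
    """Consumption with a random disturbance added: C = a + b*Y + u."""
    disturbance = rng.gauss(0.0, SIGMA)  # assumed normal, mean zero
    return deterministic_consumption(income) + disturbance


rng = random.Random(42)  # fixed seed so that the draws are reproducible
incomes = [100.0, 200.0, 300.0, 400.0, 500.0]

for y in incomes:
    exact = deterministic_consumption(y)
    observed = stochastic_consumption(y, rng)
    print(f"income={y:6.1f}  theory={exact:6.1f}  observed={observed:6.1f}")
```

In the deterministic version each income level maps to exactly one consumption level. In the stochastic version the same income can produce different observed consumption on different draws, which is why the relationship is described as holding only on average.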

(c) Observed variables

In economic theory we work with theoretical variables. Econometrics, in contrast, deals with observed data.

Obviously, there is a certain correspondence between them: data collection is inspired by some theoretical framework. For example, the framework for measuring national income account data derives from macroeconomic theory, which is centred on the analysis of theoretical aggregates such as output, demand, employment and the price level.

However, observed variables do not fully correspond to their theoretical counterparts because of differences in definition and coverage, and errors in measurement. For example, the ‘price level’ is an abstract concept that is usually represented empirically by some aggregate price index; however, the values it takes depend on the goods whose prices are covered by the index and the method of calculating the index.

Another example concerns modelling technology. In agricultural supply functions, the ‘state of technology’ is an important variable: changes in supply over time are driven both by price changes and by the pace of technological change. But how can technological development be measured? Many researchers resort to a simple time trend to represent this important variable.

Finally, ‘management’ is a key input in the theoretical specification of an agricultural production function but one that is always difficult to measure empirically. Econometricians sometimes resort to proxy variables for management like the number of years of education the farmer has received.

In econometrics we need to be aware of the discrepancy between theoretical concepts and observed data, and its implications when quantifying theoretical propositions.

(d) The treatment of time

The econometrician must make explicit assumptions about the role of time in the model. When economic theory postulates that consumption depends on disposable income, ceteris paribus, it implies that when income takes different values so does consumption. Econometrics can quantify this dependency by using information about how consumption changes as income takes different values. However, this dependency could be observed empirically in two alternative ways:

(i) by recording how consumption and income move together over time, or

(ii) by recording the consumption of households at different income levels during the same time period.

In the first case, we have a time-series model, requiring time-series data (measured at different intervals over time).

In the second case, we have a cross-section model, requiring cross-section data (measured for different individuals or micro-units at the same point or during the same period in time). The choice between a time-series and a cross-section model often depends on data availability, although this choice is less straightforward than it may seem.

First, we may need to modify our theory for explaining consumption changes over time before it can be applied to cross-sectional consumption analysis.

Second, there may be data considerations; for example, a time-series approach is hardly appropriate for studying how consumption varies with income during periods in which there has been virtually zero income growth.
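The difference between the two kinds of data set in (d) can be sketched as follows. The module's exercises use R; this Python sketch is illustrative only. All figures are invented, and `ols_slope` is a hypothetical helper written for this example that computes the slope of a least-squares line of consumption on income.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Time-series data: ONE unit (an economy) observed in successive years.
# Invented figures, for illustration only.
time_series = [
    {"year": 2001, "income": 100.0, "consumption": 82.0},
    {"year": 2002, "income": 104.0, "consumption": 85.0},
    {"year": 2003, "income": 109.0, "consumption": 89.0},
    {"year": 2004, "income": 113.0, "consumption": 92.0},
    {"year": 2005, "income": 118.0, "consumption": 96.0},
]

# Cross-section data: MANY units (households) observed in the same period.
# Invented figures, for illustration only.
cross_section = [
    {"household": "A", "income": 20.0, "consumption": 19.0},
    {"household": "B", "income": 35.0, "consumption": 31.0},
    {"household": "C", "income": 50.0, "consumption": 42.0},
    {"household": "D", "income": 80.0, "consumption": 62.0},
    {"household": "E", "income": 120.0, "consumption": 85.0},
]

ts_slope = ols_slope(
    [row["income"] for row in time_series],
    [row["consumption"] for row in time_series],
)
cs_slope = ols_slope(
    [row["income"] for row in cross_section],
    [row["consumption"] for row in cross_section],
)

print(f"time-series slope:   {ts_slope:.3f}")
print(f"cross-section slope: {cs_slope:.3f}")
```

Both data sets relate consumption to income, but the observations are indexed differently (by year in one case, by household in the other), and nothing guarantees that the two slopes coincide. This is one reason why the theory may need modifying before it is moved from one setting to the other.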

1.2 Econometrics in theory and in practice

The four elements above give econometric work its distinctive flavour:

• the fact that we cannot hold other things constant in empirical analysis

• the random (or stochastic) nature of relationships between variables

• the discrepancies between theoretical variables and observed data

• the need to make explicit assumptions about time.

We cannot move straight from an economic model as formulated by economic theory to parameter estimation without dealing with these issues. In empirical analysis, our data never behave exactly as our theoretical models would lead us to believe. Simple theoretical models are useful abstractions.

But in empirical work the relationships we wish to disentangle from the data may involve a number of variables, and may be subject to uncertainties that our theories could not possibly aim to explain. ‘Econometric methodology’ therefore includes approaches for dealing with these issues, as well as the statistical techniques of parameter estimation.

Regression analysis provides us with an analytical framework for handling relationships involving a number of causal factors, including random elements. It seeks to establish statistical regularities among observed variables. To do this we need to deal with the randomness inherent in the behaviour of our variables. This requires the help of statistical theory, which allows us to model randomness as an integral part of the relationship between variables. How this is done, and how we should interpret the results, is the subject of this module.

The following are the main points to remember.

• In econometrics we confront theory with economic data so as to quantify economic relationships and to test hypotheses about them.

• In practice, we deal with stochastic relationships between variables which we can only observe in a non-experimental context.

• Econometric methodology has been developed in order to deal with this situation, and differs significantly from the way regression analysis is applied to experimental data. There are many outstanding issues and unresolved methodological problems in the practice of econometrics.

• Moreover, the conclusions we draw in a particular context will always involve a considerable degree of uncertainty, even if our model is correctly specified. For this reason, we rely on probability theory and statistical inference to deal with uncertainty in assessing the results of empirical analysis.

• Econometrics is concerned primarily with quantifying and testing relationships between variables, and regression analysis is its main tool of statistical analysis.

2.0 KEY CONCEPTS: THE CONCEPT OF REGRESSION

Section Overview

Linear regression is one important tool of econometrics. In this section, we discuss the general nature of regression analysis and provide a background that will be used to interpret real examples in the rest of the module.

Section Learning Outcomes

By the end of this section, students should be able to:

 understand the concept of linear regression

 identify the components of regression analysis.

Teaching and learning econometrics involves a preoccupation with technical details, definitions of technical terms, mathematical derivations, step-by-step descriptions of statistical procedures, etc, all expressed in technical notation.

This is normal and, indeed, necessary. But this preoccupation with technical detail often implies that students lose a perspective on ‘What is it all about?’ and ‘Why are we doing this?’ That is, there is a need to keep a grip on the kinds of basic questions, which give substance to the subsequent technical exercises, uncluttered by notation and technical detail. We need to get an overview of a problem before we attack it aided by our technical armoury. We need to know the simple questions and intuitive insights which have prompted elaborate technical enquiries. Let us first start with the concept of regression analysis.

2.1 What is regression?

An intuitive explanation

Regression is the main statistical tool of econometrics. But what is it exactly? Regression can best be explained by an example. Consider a famous empirical law of consumption, formulated by the German economist and statistician Engel. This was based on a household budget survey of Belgian working class families collected in 1855. Engel observed that the share of expenditure on food in total household expenditure (= the $Y$-variable) was a declining function of household income (= the $X$-variable). This is indeed what one would expect: on average, poorer families spend a higher proportion of their income on food in comparison with better-off families. Note that we refer to the proportion of total household expenditure spent on food and not to the total food consumption of the family (one would expect better-off families to spend more money on food even though this expenditure is generally a smaller proportion of their total expenditure).

Hence, we expect that, on average, the share of food in household expenditures is inversely related to household income. But we do not expect this relationship to be exact. That is, if we were to sample 10 families with identical income (ie equal $X$-values), we would not expect to get 10 identical shares of food consumption in total household expenditures (the $Y$-values).

Differences in the demographic composition of families, in consumption habits and in tastes will account for differences in food expenditures. In fact, many budget studies, in the past and in the present, reveal that there is considerable variation within each income class with respect to the proportion of household expenditures spent on food. But, nevertheless, it is still valid to say that, on average, the proportion of household expenditures spent on food declines as the level of income increases.

This leads us to the concept of regression. Regression methods bring out this average relationship between a dependent variable (the $Y$-variable) on the one hand and one or more independent variables (the $X$-variables, also called the explanatory variables) on the other.

In our example, the average relationship between the share of food in household expenditure and the level of household income is the regression of the former variable on the latter.

Hence, in regression analysis we seek to model the chance variation around the average as well as the average itself.

In summary, we hope that our model captures the basic structure of interaction between economic variables. We expect that the behavioural relationships are reasonably stable but we know that they do not hold exactly because of the random component (the disturbance term). At most, we expect these relations to hold ‘on average’.

Trying to determine this average relationship amidst the random variation in the data is like trying to separate sound from noise when listening to a badly tuned radio.

Thus, a regression model has two components.

(a) A regression line, which models the average relationship between the dependent variable and its explanatory variable(s). This requires us to make an explicit assumption about the shape of the regression line: the function that expresses it may be linear, quadratic, exponential, etc.

(b) Disturbances; we acknowledge the existence of chance fluctuations due to a multitude of factors not explicitly recognised in the model. We model this element of uncertainty (the noise) in the form of a disturbance term which constitutes an integral part of our model. This disturbance term is a ‘catchall for all the variables considered as irrelevant for the purpose of this model as well as all unforeseen events’ (Maddala, 2009: p. 5). It is a random variable that we cannot observe or measure in practice.

We are not interested in the disturbance term as a variable per se, but we are keen to remove its blurred messages that hamper our attempts to investigate the behavioural relationship between the variables of our model. To do this, we need to model the stochastic (probabilistic) nature of the disturbance term. This is no easy task and we always need to think carefully about whether the assumptions we make about the behaviour of the disturbance term are indeed appropriate for the relationship under study. Not surprisingly, a great deal of econometric theory and practice revolves around these assumptions.

A formal explanation

It is useful to express these important ideas more formally. We start with the population regression function. This function is a theoretical construct representing a hypothesis about how the data are generated. For the simple, two-variable linear regression model we have

$Y_i = \beta_1 + \beta_2 X_i + u_i$ (1.1)

where

$Y_i$ is the dependent variable (sometimes called the ‘regressand’)

$X_i$ is the explanatory variable or independent variable (or ‘regressor’)

$u_i$ is the disturbance term

$\beta_1$ and $\beta_2$ are the regression parameters: $\beta_1$ is the intercept, or constant, and $\beta_2$ is the slope coefficient.

The subscript $i$ indicates the $i$-th observation, ie the $i$-th person or object sampled.

Typically, the variables $Y_i$ and $X_i$ are observable for each observation $i$; the disturbance $u_i$ takes different values for each $i$ but is not observable, whereas the parameters $\beta_1$ and $\beta_2$ are unknown but constant for all observations.

The presence of the random disturbance $u_i$ in equation (1.1) means that $Y_i$ is stochastic.

The population regression function may be viewed as comprising two components:

 a systematic element represented by a straight line showing the statistical dependence of $Y$ on $X$;

 a random, or stochastic, element represented by the disturbance term $u$.

The systematic element can be expressed as

$E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$ (1.2)

that is, the average, or expected, value of $Y$ conditional on a given value of $X$ is a linear function of $X$. Therefore, the population regression function joins the conditional means of $Y$. The disturbance term, $u_i$, accounts for the variation in $Y$ around the population regression line. In later units, you will learn about the assumptions made concerning $u_i$.
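The step from equation (1.1) to equation (1.2) can be sketched as follows. The derivation assumes that the disturbance has a conditional mean of zero, $E(u_i \mid X_i) = 0$. This assumption is introduced here for illustration only, since the module discusses the assumptions about the disturbance in later units.

```latex
\begin{align*}
E(Y_i \mid X_i) &= E(\beta_1 + \beta_2 X_i + u_i \mid X_i) \\
                &= \beta_1 + \beta_2 X_i + E(u_i \mid X_i) \\
                &= \beta_1 + \beta_2 X_i
                   \qquad \text{if } E(u_i \mid X_i) = 0 .
\end{align*}
```

Under that assumption the disturbance shifts individual observations above or below the line but leaves the conditional mean of $Y$ unchanged.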

Regression enables us to quantify the unknown parameters $\beta_1$ and $\beta_2$, and the unknown disturbances $\{u_i\}$, for $i = 1, \ldots, n$, in equation (1.1).

Using a sample of data on $Y$ and $X$, we obtain estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ of the unknown population parameters (the symbol $\hat{\ }$ is read as ‘hat’, hence $\hat{\beta}_1$ is ‘beta 1 hat’). We have the sample regression function

$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$ (1.3)

in which $\hat{\beta}_1$ and $\hat{\beta}_2$ are random variables (the particular estimates obtained depend on the particular sample of data on $Y$ and $X$ used) that differ from the population parameters $\beta_1$ and $\beta_2$.

Consequently, the sample residuals, $\hat{u}_i$, differ from the unknown population disturbances, $u_i$. Whereas the disturbance term accounts for the variation in $Y$ around the population regression line, the residuals give us the deviations of the observed $Y$-values from the estimated regression line.

2.1.1 Difference between disturbances and residuals

Name           Notation       Refers to
Disturbances   $u_i$          Population regression line (computed on the entire population)
Residuals      $\hat{u}_i$    Estimated regression line (computed on the sample)

Source: unit author

The residuals, therefore, are not identical with the disturbances, but clearly they may contain some information that can help us understand the behaviour of the disturbances. How to analyse the information contained in the residuals is addressed in later units.

 Please note that different textbooks use different definitions for disturbances and residuals; for example, some use the term ‘error’ instead of residuals. In this module, we try to be as flexible as possible with terminology.

The predicted (or fitted) value of the dependent variable, $\hat{Y}_i$, is given by the sample regression line

$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$ (1.4)

in which $\hat{Y}_i$ is the fitted value of the dependent variable, the estimator of $E(Y_i \mid X_i)$, that is, the estimator of the population conditional mean (cf. equation (1.2)).

The sample linear regression line is an estimator of the population regression line.
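A small simulation may help to fix the distinction between disturbances and residuals. The sketch below is written in Python purely for illustration (the module's practical work uses R). The parameter values, the sample size and the use of ordinary least squares to fit the line are all assumptions made for this example, since estimation methods are only introduced in later units.

```python
import random

random.seed(0)  # fixed seed so that the sketch is reproducible

# Hypothetical population parameters (assumed for illustration only)
BETA1, BETA2 = 2.0, 0.5

# Generate a sample from the population regression function (1.1).
# In real work the disturbances u are never observed; here we keep them
# only because we generated the data ourselves.
x = [float(i) for i in range(1, 31)]
u = [random.gauss(0.0, 1.0) for _ in x]
y = [BETA1 + BETA2 * xi + ui for xi, ui in zip(x, u)]


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Sample regression function (1.3): estimates, fitted values and residuals
b1_hat, b2_hat = ols(x, y)
fitted = [b1_hat + b2_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print("estimates of beta1 and beta2:", round(b1_hat, 3), round(b2_hat, 3))
print("first disturbance:", round(u[0], 3))
print("first residual:   ", round(residuals[0], 3))
```

The estimates differ from the population values because they depend on the particular sample drawn, and each residual differs from the corresponding disturbance for the same reason.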

2.2 Linearity and log-linearity

Equation (1.1) is an example of a linear regression model. That is, $Y$ is linear in $X$ and in the parameters $\beta_1$ and $\beta_2$. With the linear regression line

$E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$

the interpretation given to $\beta_2$ relies on the fact that

$\dfrac{dE(Y_i \mid X_i)}{dX_i} = \beta_2$ (1.5)

In case you are not familiar with derivatives, this means that an increase of one unit in $X$ (measured in units of $X$) results, on average, in an increase of $\beta_2$ units in $Y$ (measured in units of $Y$).

On the intercept – in theory, $\beta_1$ is the predicted value of $Y$ (in units of $Y$) if $X = 0$. In practice, this interpretation of $\beta_1$ is not recommended unless zero values of $X$ could reasonably occur.

Now consider the model

$Y_i = \beta_1 X_i^{\beta_2} e^{u_i}$ (1.6)

which, after taking natural logarithms of both sides of the equation, can be written as

$\ln Y_i = \alpha + \beta_2 \ln X_i + u_i$ (1.7)

where $\alpha = \ln \beta_1$. If you are not familiar with this transformation, please make sure that you revise the algebra in a basic textbook; topics of this kind are usually covered in an appendix.

This model is also linear in the parameters $\alpha$ and $\beta_2$. We may view the model as

$Y_i^* = \alpha + \beta_2 X_i^* + u_i$ (1.8)

where $Y_i^* = \ln Y_i$ and $X_i^* = \ln X_i$.

This model is known by various names – logarithmic, double log, log-log, log-linear and constant elasticity – and is frequently used in applied work to characterise the form of the functional relationship between the variables. It has the property that the slope coefficient $\beta_2$ measures the elasticity of $Y$ with respect to $X$ because

$\beta_2 = \dfrac{d \ln Y}{d \ln X} = \dfrac{dY / Y}{dX / X}$ (1.9)

Again, if you do not feel comfortable with this formulation, please make sure that you refresh the basic mathematical tools.
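The constant-elasticity property can also be checked numerically. The sketch below is in Python for illustration only (the module's practical work uses R). It assumes that the systematic part of model (1.6) has the form $Y = \beta_1 X^{\beta_2}$, and the parameter values are hypothetical.

```python
# Hypothetical parameter values, chosen only for illustration
BETA1, BETA2 = 3.0, 0.8


def y_of(x):
    """Systematic part of the constant-elasticity model: Y = beta1 * X**beta2."""
    return BETA1 * x ** BETA2


def pct_change_in_y(x, pct_change_in_x):
    """Percentage change in Y when X rises by the given percentage."""
    x_new = x * (1 + pct_change_in_x / 100)
    return 100 * (y_of(x_new) - y_of(x)) / y_of(x)


# A 1% rise in X at low, middle and high levels of X
for x in (10.0, 100.0, 1000.0):
    print(x, round(pct_change_in_y(x, 1.0), 4))
```

Whatever the starting level of $X$, a 1% rise in $X$ raises $Y$ by approximately $\beta_2$% (here roughly 0.8%), which is what distinguishes this model from the linear model, where the slope rather than the elasticity is constant.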

2.3 Correlation versus regression analysis

Although regression analysis is related to correlation analysis, conceptually these two types of analysis are very different.

The main aim of correlation analysis is to measure the degree of linear association between two variables and this is summarised by a value, the correlation coefficient.

The two variables are treated symmetrically:

 both are considered random

 there is no distinction between dependent and explanatory variables

 there is no implication of causality in a particular direction from one variable to the other.

Regression analysis, on the other hand, can deal with relationships between two or more variables and the variables are not treated symmetrically:

 the dependent and explanatory variables are carefully distinguished

 the former is random whereas the latter are often assumed to take the same values in different samples – often referred to as ‘fixed in repeated samples’

 the underlying economic theory implies that $X$, an explanatory variable, ‘causes’ or ‘determines’ $Y$, the dependent variable

 moreover, with more than one explanatory variable, regression analysis quantifies the influence of each explanatory variable on the dependent variable.

It is important to note that the regression of $Y$ on $X$ does not give the same sample regression line as the regression of $X$ on $Y$. The appropriate direction of causality is determined by the modeller according to a priori reasoning, based on theory or common sense.
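The asymmetry between the two regressions can be seen with a few made-up numbers. The sketch below is in Python for illustration only (the module's practical work uses R). The data are invented, and the least-squares formula used to fit each line is an assumption here, since estimation is covered in later units.

```python
def ols(x, y):
    """Least-squares line of y on x, returned as (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Invented data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.6]

a_yx, b_yx = ols(x, y)  # regression of Y on X:  Y = a + b X
c_xy, d_xy = ols(y, x)  # regression of X on Y:  X = c + d Y

# Rewrite the second line in the same (X, Y) plane: Y = -c/d + (1/d) X
implied_intercept = -c_xy / d_xy
implied_slope = 1.0 / d_xy

print("Y on X:", round(a_yx, 3), round(b_yx, 3))
print("X on Y, rewritten:", round(implied_intercept, 3), round(implied_slope, 3))
```

The two lines coincide only when the points lie exactly on a straight line. Otherwise the product of the two slopes is less than one, so the line implied by the regression of $X$ on $Y$ is steeper than the regression of $Y$ on $X$.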

2.4 Data and regression

Regression methods allow us to investigate associations between variables, but the justification for these relationships comes from theory. Relationships have to be meaningful and whether they are or not depends on theoretical arguments.

This does not mean, however, that data play only a passive role in economic analysis. Empirical investigation is an active part of theoretical analysis inasmuch as it involves testing theoretical hypotheses against the data as well as, in many instances, providing clues and hints towards new avenues of theoretical enquiry. Theoretical insights have to be translated into empirically testable hypotheses that we can investigate with observed data. Hence, theory and data are interactive: theoretical propositions should be continually tested empirically and theoretical insights can be improved with the aid of signals from the data.

Most of the data we use in applied economic analysis are not obtained from experiments but are the result of surveys and observational programmes.

 For example: national income accounts, agricultural and industrial surveys, financial accounts, employment surveys, population census data, household budget surveys, and price and income data, that are collected by various statistical offices.

They are records of unplanned events; they are not the outcome of experiments. The nature of economic data makes an econometrician’s work quite different from that of a psychologist or an agricultural scientist.

In the latter cases, experiments play a central role in empirical research, and much emphasis is put on the careful design of experiments in order to single out the ‘stimulus-response’ relationship between two variables whilst controlling for the influence of other variables (that is, by holding them constant).

In economics, the scope for experimentation is very limited. We cannot change the price of a commodity, holding incomes and all other prices constant, just to see what would happen to the demand for it. In economic theory, we assume that ‘other things are equal’ (ceteris paribus) and focus on cause and effect between the remaining variables. But in empirical analysis other things are never equal, and we have to observe the behaviour of economic agents from survey data. Multiple regression techniques allow us to ‘account’ for the influence of other variables whilst investigating the interaction between two key variables, but this is not the same as ‘holding other variables constant’.

A careful observer uses data not just to confirm his or her theories, but also to get clues from empirical analysis to advance his/her theoretical grasp of a problem. It is primarily this aspect that enables data to contribute to the process of analysis.

2.5 A word of caution on notation

You will notice that terminology and notation may be different from the textbook in some instances. Although these differences are inconvenient, it is an unfortunate fact that terminology and notation are not wholly standardised amongst econometricians.

For example, we maintain the ‘hat’ to denote estimators, eg $\hat{\beta}$, where other texts may use a different symbol. You should not be alarmed by these discrepancies; we invite you to focus on the concepts and ideas rather than on the technicalities. This material is specifically designed to keep technical content to a minimum, and our aim is for everyone to enjoy the course independently of their mathematical knowledge.

Section 2 Self-Assessment Question

Question 1

What are the links between econometrics and both economic theory and mathematical economics?

3.0 EXAMPLE: THE KEYNESIAN CONSUMPTION FUNCTION

We shall now illustrate several phases in the methodology of econometrics with an example, the Keynesian consumption function.

Statement of the theory

The Keynesian theory of consumption is the basis of our model of consumption expenditure. This theory states that real consumption expenditure depends on real disposable income, other things held constant. When income rises, consumption expenditure rises, but changes in consumption expenditure are less than the change in income. Also, as income rises, the average propensity to consume, that is, consumption per unit of income, falls.

Mathematical model of the theory

Suppose we represent the Keynesian consumption function as a linear relationship

$Y = \beta_1 + \beta_2 X$ (1.10)

where $Y$ is real consumption expenditure, $X$ is real disposable income, $\beta_1$ is a constant and $\beta_2$ is the slope of the consumption function, ie the marginal propensity to consume out of disposable income. Because of our a priori expectations concerning the average and marginal propensities to consume, we expect $\beta_1 > 0$ and $0 < \beta_2 < 1$. (Note that the average propensity to consume is $Y/X = (\beta_1/X) + \beta_2$. For this to fall as income rises, we need $\beta_1 > 0$.)
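The reasoning behind the sign restriction on the intercept can be written out in a few lines. The sketch assumes the notation $Y$ for real consumption expenditure and $X$ for real disposable income, with $X > 0$, and writes APC and MPC for the average and marginal propensities to consume.

```latex
\begin{align*}
\text{APC} &= \frac{Y}{X} = \frac{\beta_1 + \beta_2 X}{X}
            = \frac{\beta_1}{X} + \beta_2 , \\[4pt]
\frac{d\,\text{APC}}{dX} &= -\frac{\beta_1}{X^{2}} < 0
            \quad \text{if and only if} \quad \beta_1 > 0 , \\[4pt]
\text{APC} - \text{MPC} &= \frac{\beta_1}{X} > 0
            \quad \text{when} \quad \beta_1 > 0 .
\end{align*}
```

So a positive intercept delivers both features of the theory at once: the average propensity to consume falls as income rises, and it always exceeds the marginal propensity to consume, $\beta_2$.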

Econometric model of the theory

The econometric model is stochastic. It includes a random disturbance, $u$, which captures the influence of all the other variables that may influence consumption expenditure.

$Y = \beta_1 + \beta_2 X + u$ (1.11)

Collection of data

The data to be used are annual time-series data for the UK covering the period 1955–1991. They are aggregate consumption expenditure and personal disposable income, both measured in £(1985) million. The source of the data is the Economic Trends Annual Supplement 1991 (ONS, 1991). Thus, our model represents a theory about the behaviour of aggregate consumption over time. A scatter plot of these data is given in the figure in 3.1.1.

It is obvious from this scatter plot that the relationship is upward sloping and it seems to be reasonably linear.

3.1.1 Scatter plot of aggregate consumption expenditure ($Y$) and personal disposable income ($X$)

Source: unit author using data from Economic Trends Annual Supplement (ONS, 1991)

Parameter estimation

Using these data the parameters $\beta_1$ and $\beta_2$ can be estimated to obtain the average relationship between $Y$ and $X$. Just how the coefficients of the population regression function are estimated will be explained in detail in later units. The consumption function estimated with our data is

$\hat{Y} = 3952 + 0.889X$ (1.12)

and this represents the average relationship between consumption expenditure and personal disposable income.

The estimated value of $\beta_1$ is 3952 and of $\beta_2$ is 0.889. Consequently, if personal disposable income increases by £1 million, consumption expenditure increases on average by £0.889 million.

In this case, the interpretation of the intercept is not so meaningful. Mechanical interpretation of the estimate tells us that consumption expenditure is £3952 million if aggregate personal disposable income is zero. However, this is not particularly helpful because if aggregate personal disposable income were zero then the economy would be in chaos and the Keynesian theory of consumption expenditure would not be appropriate. The fact is that, in our sample, the $X$-values are a long way from zero, and we really have no idea what the consumption function might look like at low levels of income.

Alternatively, some explain the value of the intercept as the average of all the variables omitted from the model.

Tests of the hypothesis

Do the results conform to the theory of the consumption function?

With our theory we expect $\beta_1 > 0$ and $0 < \beta_2 < 1$. Is each of these hypotheses supported by the results? Clearly, our estimates are consistent with what we expected to obtain. For a discussion of formal hypothesis tests, we must wait until the corresponding units.

Prediction

We can use the estimated model to predict what consumption expenditure would be if personal disposable income were a particular amount. Suppose personal disposable income was £250 000 million. The predicted amount of consumption expenditure is

$\hat{Y} = 3952 + 0.889(250\,000)$

∴ $\hat{Y} = 226\,202$

That is, consumption expenditure is predicted to be £226 202 million if disposable income is £250 000 million.
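The same arithmetic can be reproduced in a few lines of code, which is convenient when predictions are needed for several income levels. The sketch below is in Python for illustration only (the module's practical work uses R). It uses the estimates reported in equation (1.12); the function and variable names are hypothetical.

```python
# Estimates reported in equation (1.12), in £ million at 1985 prices
B1_HAT = 3952.0  # estimated intercept
B2_HAT = 0.889   # estimated slope (marginal propensity to consume)


def predict_consumption(income):
    """Predicted consumption expenditure for a given disposable income."""
    return B1_HAT + B2_HAT * income


income = 250_000.0
consumption = predict_consumption(income)
apc = consumption / income  # average propensity to consume at this income

print(round(consumption))  # 226202
print(round(apc, 3))       # 0.905
```

At this level of income the average propensity to consume (about 0.905) exceeds the marginal propensity to consume (0.889), as the theory requires when the intercept is positive.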

UNIT SUMMARY

In this unit we have introduced some basic ideas on econometrics and regression analysis. The most important points to remember are the following.

 Econometrics is the application of statistical and mathematical methods to the analysis of economic data, with the purpose of giving empirical content to economic theories and testing them against ‘reality’.

 The econometrician’s approach differs from that of the economic theorist because:

- we cannot ‘hold other things constant’ in empirical analysis

- the random nature of relationships between variables means that the results and conclusions of empirical analysis always contain an element of uncertainty

- there is a discrepancy between theoretical variables and observed data in terms of coverage and precision of measurement

- econometricians cannot avoid explicit assumptions about the time frame of their model, since the data they use have been generated in a ‘real-time’ context.

 Regression analysis is the statistical basis of econometric theory and practice. Its aim is to quantify relationships between variables, especially between variables whose relationship is subject to chance variation.

 Regression involves finding an average line that summarises the relationship whereby $Y$ depends on $X$ in the midst of random variation and uncertainty of outcome.

 The randomness inherent in conclusions and outcomes based on regression analysis is formally modelled by introducing a disturbance term into our behavioural equations. This is a stochastic variable which is not observable. However, the residuals of a sample regression function may provide us with an indication as to the behaviour of these unknown disturbances.

 Regression allows us to investigate the association between variables, but it cannot ‘discover’ causality between them. To establish causality we need to resort to economic theory.

 Empirical work in economics cannot rely on experimentation. Econometric analysis is therefore based on careful observation of data drawn from within a context which we do not control.

In terms of practical skills, this unit requires that:

 you are familiar with the scatter plot as a practical tool of empirical analysis

 you know how to load data into R software from a pre-existing data file

 you know the R software commands to obtain a summary of descriptive statistics of a variable, make a scatter plot and create logarithms of variables.

UNIT SELF-ASSESSMENT QUESTIONS

To answer the Unit Self-assessment Questions, you will need to use R software. Please refer to the R Guide for Unit 1 (available on your e-study guide) and follow the instructions given.

Question 1

The data file u1q2.txt contains annual time-series data for the United States over the period 1959–1991 on aggregate consumption expenditure, $Y$, and disposable income, $X$, both measured per head of population and in billions of constant 1987$. (Source: Economic Report of the President, 1992: table B-5, p. 305).

(a) Use R software to produce a scatter plot of $Y$ on the vertical axis and $X$ on the horizontal axis. Comment on the scatter plot: would a linear regression seem appropriate?

(b) Use R software to obtain time-series plots of $Y$ and $X$. Describe the way consumption and income have moved over the period 1959–1991.

Question 2

The hypothesis that foreign direct investment is determined by demand suggests that foreign direct investment and gross domestic product are positively related, other variables remaining constant. The data file u1q3.txt contains annual time-series data for the period 1958–1985 on foreign direct investment, FDI, and gross domestic product, GDP, for Taiwan. (Source: Tsai 1991: Table A-1, p. 285).

Use R software to obtain scatter plots of FDI on GDP and of the logarithm of FDI on the logarithm of GDP, both for the period 1958–1985. Comment on the two scatter plots. Remember that a log transformation is used in empirical research to reduce the ‘noise’ (stored in the disturbances) associated with higher values of a variable relative to that associated with its lower values. Which of the following would you expect to be the more appropriate linear regression model:

(a) $FDI = \beta_1 + \beta_2\, GDP + u$ or

(b) $\ln FDI = \beta_1 + \beta_2 \ln GDP + u$ ?

Question 3

The data file u1q4.txt contains cross-section data from a sample of 100 rural households on the value of their consumption and income during a given month. Income ($X$) includes cash income from all sources during the month concerned, plus the (imputed) market value of own production consumed by the household. Consumption ($Y$) includes the value of all purchased items, plus the value of own production consumed by the household. The units are measured in the local currency rounded to the nearest whole number.

(a) Obtain the scatter plot of $Y$ on $X$. What is the main difference between this scatter plot and the one constructed in Question 1?

(b) Use R software to obtain the histograms of $Y$ and $X$. Income has the usual positively skewed distribution that we would expect, whereas the distribution of consumption is less skewed. Can you suggest a reason for this?

(c) Use R software

(i) to obtain the average propensity to consume at the sample means

(ii) to compare the degree of skewness of the two variables

(iii) to obtain their correlation coefficient.

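You may remember that the correlation coefficient (or Pearson’s coefficient) is defined as:

$r_{XY} = \dfrac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \, \sum_i (Y_i - \bar{Y})^2}}$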

It measures the degree of linear dependence between the two variables, it varies between −1 and 1 and it is invariant to the units of measurement (eg $, thousands $ etc). A positive (negative) coefficient indicates that the variables are positively (negatively) correlated, with a zero-value coefficient indicating that they are uncorrelated random variables.
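These properties can be verified directly from the definition. The sketch below is in Python for illustration only (the exercises themselves are to be done in R). The income and consumption figures are invented and are not taken from the data file.

```python
from math import sqrt


def pearson(x, y):
    """Sample correlation coefficient computed from deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Invented figures, for illustration only
income = [1200.0, 1500.0, 1800.0, 2400.0, 3100.0, 4500.0]
consumption = [1100.0, 1300.0, 1500.0, 1900.0, 2300.0, 3000.0]

r = pearson(income, consumption)

# Express income in thousands instead of units: the coefficient is unchanged
r_rescaled = pearson([v / 1000 for v in income], consumption)

# Swap the roles of the two variables: the coefficient is unchanged
r_swapped = pearson(consumption, income)

print(round(r, 4), round(r_rescaled, 4), round(r_swapped, 4))
```

The three printed values coincide, which illustrates both the invariance to units of measurement and the symmetric treatment of the two variables in correlation analysis.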

Question 4

Weekly earnings can vary considerably in the case of casual dock labourers recruited on a day-to-day basis. There are differences between workers as well as across weeks. Weekly earnings will vary from week to week depending on the activity of the harbour which determines the demand for labour. Daily recruitment will be high if demand is high, and vice versa. Earnings also vary between workers in any given week. These depend on the numbers of days a worker manages to get recruited for in a particular week, on whether he or she is recruited for the day shift or the night shift, and on the number of hours of overtime he or she works in that week.

In this exercise you will look at data on the weekly earnings of casual workers, ECAS, and the recruitment of casual workers, CASREC. The data file u1q5.txt contains paired observations on the two variables ECAS and CASREC. The data were taken from a field study carried out in 1980/1981 by the Centre of African Studies in Mozambique (Eduardo Mondlane University, Maputo) on casual labour on the docks of Maputo harbour. The earnings data are in units of 100 MT, the local currency being the Metical.

(a) Using R software, calculate the means, standard deviations, and minimum and maximum values for both variables.

(b) A particular worker is randomly chosen from the labour force in a particular week in 1980/1981 that is also randomly chosen. Using the information in your answer to part (a), what is your best estimate of the weekly earnings of this randomly selected worker?

(c) With R software, obtain the scatter plot of ECAS against CASREC. Write down what you observe.

KEY TERMS AND CONCEPTS

cross-section data: Type of data collected by observing a number of individuals, countries, firms etc at the same point in time.

disturbances: The unobserved random component that explains the difference between observed and predicted values of $Y$.

econometrics: The discipline that investigates economic data and relationships using statistical techniques.

linear regression: The statistical methodology that models the relationship between a dependent variable and one or more explanatory variables. In linear regression, this relationship (or function) is assumed to be linear. Other methodologies can deal with other functional forms, eg exponential, quadratic.

non-experimental data: Type of data that are not compiled as a result of experiments. Other factors in the model are assumed to be fixed, although the researcher cannot actually hold them fixed as he or she could do in an experimental context.

time-series data: Type of data collected by observing the same individual, country, firm etc over a period of time.
