
Urban and Regional Report No. 32-8 Public Disclosure Authorized

Series on the Evaluation of Urban Shelter Programs

VOLUME SEVEN

STATISTICAL PROCEDURES FOR THE EVALUATION

OF PROJECT IMPACT

Michael Bamberger

May 1982

This volume was prepared as part of a cooperative project sponsored by the International Development Research Centre (Ottawa) and the World Bank. The views reported here are those of the authors, and they should not be interpreted as reflecting the views of the World Bank or its affiliated organizations.

Urban and Regional Economics Division
Development Economics Department
The World Bank
Washington, D.C. 20433

TABLE OF CONTENTS

Preface

Part I: QUASI-EXPERIMENTAL DESIGNS AND THE EVALUATION OF URBAN SHELTER PROGRAMS

Chapter 1: Quasi-Experimental Designs for Evaluation Research and their Weaknesses

1. Experimental and Quasi-Experimental Research Designs

2. Limitations of the quasi-experimental designs and the threats they present to the validity of the results

2.1 Statistical conclusion validity
2.2 Internal validity
2.3 Construct validity of causal relations
2.4 External validity

Chapter 2: The Application of the Evaluation Designs to Urban Shelter Programs in Developing Countries

1. The principal types of shelter programs being studied

2. Problems in evaluating the impact of urban shelter programs

2.1 Some general research problems
2.2 Some specific problems in the evaluation of sites and services
2.3 Some specific problems in the evaluation of upgrading projects

3. The principal research designs for the evaluation of urban shelter programs

3.1 Panel studies
3.2 Independent random samples
3.3 Panel study with replacement of drop-outs (Mixed Samples)
3.4 Ex-post comparison of experimental and control groups

4. Comparison of the 8 research designs

Part II: STATISTICAL TESTS FOR EVALUATING CHANGE

Chapter 3: Selecting the Appropriate Statistical Test and Defining the Research Hypothesis

1. Factors which influence the type of statistical test to use for measuring change

1.1 The research design
1.2 The level of measurement of the dependent variable
1.3 The form of the distribution being studied
1.4 The form in which the data is stored and in which it can be accessed

2. Defining the research hypothesis and the conditions under which it will be accepted or rejected

2.1 Defining the research hypothesis
2.2 Defining the null hypothesis
2.3 Defining the conditions under which the null hypothesis will be rejected
2.4 One- and two-tail tests
2.5 Summary of procedures for defining hypotheses

Chapter 4: Statistical Tests for measuring Change with Nominal Variables

1. Independent samples: The Chi-Square Test, PHI and Cramer's V

1.1 Assumptions and conditions for the use of Chi-Square
1.2 Some limitations of Chi-Square
1.3 The PHI coefficient
1.4 Cramer's V

2. Related samples: The McNemar Test

2.1 A Limitation of the McNemar Test
2.2 Applying the McNemar Test with SPSS

Chapter 5: Statistical Tests for measuring Change with Ordinal Scale Variables

1. The relationship between Ordinal and Interval Measurement Scales

2. Independent samples: The Median Test

3. Related samples: The Sign Test and Wilcoxon's Matched Pairs Signed Ranks Test

3.1 The Sign Test: Weak Ordinal Measurement
3.2 The Wilcoxon Matched Pairs Signed Ranks Test: Stronger Ordinal Measurement

Chapter 6: Statistical Tests for measuring Change with Interval Scale Variables

1. Three analytical approaches to the study of change with interval scale variables

2. Independent samples

2.1 The T-Test for the significance of the difference between means
2.1.1 Calculating the T-Statistic with common variance
2.1.2 Calculating the T-Statistic with unequal variances
2.2 The T-Test or Z-Test for the difference of proportions
2.2.1 The T-Test or Z-Test for the difference of differences of proportions
2.3 The confidence intervals for the difference of means

3. Related samples

3.1 The T-Test for pairs
3.1.1 The use of the T-Test without a control group
3.1.2 The use of the T-Test with a control group
3.2 The use of confidence intervals
3.3 The use of regression analysis to evaluate differences in the rate of change between the experimental and control groups while controlling for the value of dependent and independent variables in T(1)

Part III: APPLICATION OF THE STATISTICAL TESTS IN THE EVALUATION OF URBAN SHELTER PROGRAMS. Examples from ongoing research

Chapter 7: Operational Problems in matching Cases in Urban Research

1. Causes of the difficulties in matching cases

1.1 Administrative problems
1.2 Confusion during the ...
1.3 Coding errors and problems with punching

2. Some corrective measures

2.1 Clearer definitions of head of household, participant etc.
2.2 Recording full names in the data files
2.3 Building redundancy into the coding
2.4 Logical consistency checks at the analysis stage

Chapter 8: The Application of the Statistical Tests with Panel Samples

1. Panel studies without control groups

1.1 Nominal scale measurement: Example: Changes in Job Stability (McNemar Test)
1.2 Ordinal scale measurement: Example: Changes in housing quality (Wilcoxon Matched Pairs Signed Ranks Test)
1.3 Interval scale measurement: Example: Changes in earned income of head and earned income of household (T-Test)

2. Related samples with control groups

2.1 Nominal scale measurement: Example: Changes in Job Stability
2.2 Ordinal scale measurement: Example: Changes in housing quality (The Median Test)
2.3 Interval scale measurement: Example: Changes in earned income of household head (T-Test)

Chapter 9: The Application of the Statistical Tests with Independent Samples

1. Independent designs without control groups

1.1 Nominal scale measurement: X²
1.2 Ordinal scale measurement: the Median Test
1.3 Interval scale measurement: T-Test, confidence intervals and regression analysis

2. Independent sample designs with control groups

2.1 Nominal scale measurement: Example: Changes in occupational categories (Application of the T-Test for Proportions to dichotomized categories)
2.2 Ordinal scale measurement: Example: Changes in housing quality (The Median Test)
2.3 Interval scale measurement: Example: Changes in expenditure on food (T-Test)
2.3.1 Estimating confidence intervals

Chapter 10: The Application of the Statistical Tests with Mixed Samples

1. Nominal scale variables: Example: Changes in the proportion of heads of household working in services and commerce

2. Ordinal scale variables: Example: Changes in housing quality

3. Interval scale variables: Example: Changes in the number of households occupying each structure

Chapter 11: The Application of the Statistical Tests when the Research Design does not include a Pre-Test

1. Simple Ex-Post comparison

1.1 Nominal scale measurement: Example: Changes in status of housing in Jakarta (X² used in conjunction with PHI)
1.2 Ordinal scale measurement: Example: Method of transport used to travel to work in Jakarta (Median Test)
1.3 Interval scale measurement: Example: Changes in per capita income in Jakarta (T-Test)

2. Ex-post comparison using different ex-ante study on same families as the reference point: Example: Impact of mutual help house construction on attitudes and social participation (X² and T-Test)

Chapter 12: The Problem of Non-Equivalent Control Groups

1. Adjusting for the effect of intervening variables when the dependent variable is dichotomized: Example: Changes in Job Stability

1.1 Controlling for the influence of intervening variables through the use of cross-breaks
1.2 Controlling for the effect of intervening variables through the use of ...

2. Adjusting for the effect of intervening variables when the dependent variable is measured on an ordinal scale: Example: Changes in housing quality

2.1 The use of cross-breaks in conjunction with the Median Test
2.2 Controlling for intervening variables through the use of Multiple Regression Analysis

3. Adjusting for the effect of intervening variables when the dependent variable is measured on an interval scale: Example: Changes in income of head of household

References

PREFACE

This is the seventh and final volume of a Series produced by the Urban and Regional Economics Division of the World Bank on the design and implementation of systems for the evaluation of urban shelter programs. The Series is one of the end products of a cooperative venture by the International Development Research Centre of Canada and the World Bank, which helped establish national research units to conduct five-year evaluations of urban shelter programs in Zambia, Senegal, El Salvador and the Philippines.

One of the original objectives of the program was to use the experience of the four evaluations to develop systems which could be applied to the evaluation of other urban projects. The present Series is the response to this objective. The Series comprises seven volumes which between them provide guidance on most of the major policy and research issues related to the planning, implementation and use of evaluation research on urban shelter programs. Table 1 presents a summary of the content and intended audience for each volume.

The present volume is intended for researchers in developing countries who are involved in the design and interpretation of studies whose objective is to evaluate the impact of urban shelter projects. Although a large number of research textbooks are available, most of them are concerned with studies which are able to approximate reasonably closely to one of the Quasi-Experimental designs. Researchers in developing countries who are concerned with the evaluation of large scale urban development projects will constantly find themselves in situations where the textbook methodology cannot be applied and where different designs and methods of analysis must be used. The purpose of the present volume is to cover in as comprehensive a way as possible the statistical procedures required to analyze the main evaluation research designs which are likely to be used in urban shelter evaluations. The manual is intended to be easily accessible to readers who do not have advanced statistical training and who do not have easy access to social science research literature. At the same time, it is directed to readers with more advanced knowledge of research methodology, but who find that many of the practical problems they are encountering are not covered by the standard textbooks.

There are three parts. Part I describes the research setting, the research designs which are being used and the theoretical limitations of these designs. An attempt is made to illustrate many of the practical limitations which are found in the evaluation of large scale urban shelter programs, and particularly the problems of drawing inferences from large scale projects which affect population groups who are not usually representative of the low-income population with which policy makers are concerned.

Part II presents each of the main types of statistical test referred to in later sections. Each test is accompanied by simple examples so that the reader can work through the stages of the calculation if he or she is unfamiliar with the test.

Table 1: The content and audience for each volume of the Evaluation Series

Volume 1. Planning an Evaluation System for an Urban Shelter Program: Key Issues for Program Managers
    Contents: Definition of main users of evaluation and the types of information they need. Key issues in each type of study and main research designs. Main stages in the planning of the evaluation system.
    Audience: Policy makers and Program Managers

Volume 2. A basic evaluation package
    Contents: Presentation of a basic evaluation system with a step by step explanation of how it should be set up.
    Audience: Program managers and researchers

Volume 3. A methodology for impact evaluation in urban shelter programs
    Contents: Research techniques for the design, implementation and analysis of impact evaluations.
    Audience: Researchers

Volume 4. Designing a questionnaire for longitudinal impact studies
    Contents: Discussion of the main types of information required for measuring project impact over time. Examples of typical questionnaire formats.
    Audience: Researchers

Volume 5. Non-survey techniques in the evaluation of urban shelter programs
    Contents: Description of non-survey techniques of evaluation which avoid many of the problems inherent in the use of structured ...
    Audience: Researchers

Volume 6. The preparation of evaluation survey data for computer analysis - some practical issues
    Contents: Discussion of typical problems with matching cases, data cleaning, consistency checks and data transference from one computer system to another.
    Audience: Researchers

Volume 7. Statistical evaluation of project impact through longitudinal surveys
    Contents: Review of the main statistical techniques for evaluating project impact with different types of design. Explanations are included of how to use each technique.
    Audience: Researchers

Part III illustrates the way in which each test has been applied in one of the ongoing evaluations of World Bank shelter programs. Each example includes a discussion of the level of measurement, the formulation of the hypotheses and an interpretation of the results. An analysis is made of each of the possible "threats to validity" suggested by Cook and Campbell (1979).

The final chapter is devoted to the problem of non-equivalent control groups as nearly all of the studies conducted in urban areas fall into this category. As this is one of the issues least adequately covered in the research literature, this is perhaps the most important chapter. Examples are given of analytical techniques which can be used with each of the research designs.

Although the statistical techniques presented in the manual have been gleaned from a large number of different sources, an attempt has been made to limit the basic references to five easily accessible texts 1/ so as to simplify the problems of the reader who does not have easy access to libraries.

1/ The basic recommended texts are:
Thomas D. Cook and Donald T. Campbell, "Quasi-Experimentation: Design and Analysis Issues for Field Settings." Rand McNally, 1979.
Sidney Siegel, "Non-Parametric Statistics for the Behavioural Sciences." McGraw Hill, 1956.
Hubert M. Blalock Jr., "Social Statistics." (Second Edition) McGraw Hill, 1972.
Jacob Cohen and Patricia Cohen, "Applied Multiple Regression/Correlation Analysis for the Behavioural Sciences." Lawrence Erlbaum Associates, Publishers, New Jersey, 1975.
Norman Nie and others, "Statistical Package for the Social Sciences." (Second Edition) McGraw Hill, 1975.

PART I

QUASI-EXPERIMENTAL DESIGNS AND THE EVALUATION OF URBAN SHELTER PROGRAMS

In Part I we review the concept of quasi-experimental designs and discuss their application in the evaluation of large scale urban shelter programs.

Chapter 1 reviews the concepts of experimental and quasi-experimental research designs as they have been developed in the social science research literature. The weaknesses of each design are then evaluated using the concept of the four main types of "Threats to Validity" proposed by Cook and Campbell.

Chapter 2 describes the specific urban shelter programs covered by the World Bank-IDRC evaluation program from which the examples given in this text are drawn. These projects are used to illustrate the practical problems involved in the evaluation of large scale urban programs.

In the final section of Chapter 2 we present the 8 types of longitudinal research designs which have been, or could be, used to evaluate these programs and we consider the major "Threats to Validity" inherent in each of them.

CHAPTER 1: QUASI-EXPERIMENTAL DESIGNS FOR EVALUATION RESEARCH AND THEIR WEAKNESSES

1. Experimental and Quasi-Experimental Research Designs

Although it can rarely be used outside the field of experimental psychology, we begin this Chapter with a brief discussion of the concept of an Experimental Design, as this is held up by most textbooks as the ideal paradigm on which all research designs should be modeled. Recently there has been some discussion of whether in fact it is appropriate for the social sciences to try to base themselves on a research methodology which is taken from the natural sciences. Whatever one's feelings on this philosophical issue, the Experimental Design is a useful point of reference because it will be seen that departures from this design make the process of interpreting the results of our studies increasingly difficult. In those research designs which depart completely from this model (for example Design Number 1) it is often extremely difficult to know how to interpret the results obtained from a study.

There is not in fact one single Experimental Design, and the term indicates more an approach to scientific research. The paradigm presented in the following paragraphs is one of the simpler forms and the one most common in social science textbooks. The design, which is described as a Pre-Test Post-Test Panel design with equivalent experimental and control groups, can be represented as follows:

        T(1)              T(2)

        E(1) ---- X ----  E(2)

        C(1)              C(2)

where:  T(1) and T(2) represent time periods before and after the experimental treatment.

        X = experimental treatment

        E(1) and E(2) represent the measurement of the dependent variable in the experimental group in T(1) and T(2).

        C(1) and C(2) represent the measurement of the dependent variable in the control group in T(1) and T(2).

If we took an example from housing this would mean that a large number of families, say 500, would be selected at random. They would then be randomly assigned to an experimental and a control group, with perhaps 250 families in each. Let us assume that the experiment is designed to measure the impact of housing on income. The income of all families would be measured at Time 1, called T(1) for convenience. As the two groups are chosen at random we would expect that the average income of the two groups would be the same, in other words:

Expected value: E(1) = C(1)

In practice it is extremely unlikely that the two groups would have identical incomes as there is always some variation due to chance. What expected income means is that if the samples were redrawn a very large number of times, on average, the income of the two groups would have the same mean.
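The idea of an expected value can be illustrated with a small simulation. The sketch below is not taken from the manual: the 500 incomes are hypothetical values drawn from an assumed log-normal distribution, and the function name `mean_gap_after_random_assignment` is introduced only for this illustration. It redraws the random assignment many times and compares the average gap between the groups with the size of a typical single gap.

```python
import random
import statistics

def mean_gap_after_random_assignment(incomes, rng):
    """Randomly split the families into two equal groups; return E(1) - C(1)."""
    shuffled = incomes[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    experimental, control = shuffled[:half], shuffled[half:]
    return statistics.mean(experimental) - statistics.mean(control)

rng = random.Random(1982)  # fixed seed so the sketch is reproducible
# Hypothetical monthly incomes for 500 families (illustrative values only).
incomes = [rng.lognormvariate(5.0, 0.5) for _ in range(500)]

# Redraw the random assignment 2000 times.
gaps = [mean_gap_after_random_assignment(incomes, rng) for _ in range(2000)]
average_gap = statistics.mean(gaps)
typical_gap = statistics.mean([abs(g) for g in gaps])

print(f"average gap over all redraws: {average_gap:.2f}")
print(f"typical size of a single gap: {typical_gap:.2f}")
```

Under these assumptions the average gap should be close to zero even though any single assignment usually shows some difference, which is the distinction the text draws between the expected value and the result of one particular sample.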

The experimental group would then all be given exactly the same package of new housing at the same time. The experimental treatment, in this case housing, is defined as X. The control group would not receive any change in housing, but in all other respects the conditions of the two groups would be identical. After a certain period of time the income of the two groups would again be measured. These new measures, taken in Time 2, are indicated as E(2) and C(2) respectively.

If the new housing had an effect on income we would expect to find that

E(2) ≠ C(2).

In fact, as we have seen, it is highly likely that some difference would exist between the mean income of the two groups, simply due to chance factors. We would use a statistical test to determine the probability of finding a difference of this magnitude simply by chance if there was no real difference between the groups. The use of statistical tests and of null hypotheses will be discussed in detail in later chapters.

If the tests indicate a statistically significant difference between the experimental and control groups, and if all of the conditions of the experimental design have been met, then we would conclude that the experimental treatment (new housing) had had an effect on income. If the test indicated no difference between the groups, then we would infer that housing had not been shown to have an effect on income.
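As a rough illustration of such a test, the sketch below compares the T(2) incomes of two groups using a T-statistic with unequal variances. It is a minimal sketch under stated assumptions, not the procedure laid out in Part II: the function name `two_sample_test` and the income figures are hypothetical, and the probability is taken from the normal distribution, an approximation which is only reasonable for large groups such as the 250 families discussed above.

```python
import math
import statistics
from statistics import NormalDist

def two_sample_test(experimental, control):
    """Return (t, p) for the difference between two group means.

    Uses the unequal-variance form of the T-statistic. The two-tail
    probability comes from the normal distribution (large-sample
    assumption); small samples need the T distribution instead.
    """
    mean_gap = statistics.mean(experimental) - statistics.mean(control)
    standard_error = math.sqrt(
        statistics.variance(experimental) / len(experimental)
        + statistics.variance(control) / len(control)
    )
    t = mean_gap / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Hypothetical T(2) incomes, kept tiny so the arithmetic is easy to follow.
e2 = [11.0, 12.0, 13.0, 14.0, 15.0]
c2 = [1.0, 2.0, 3.0, 4.0, 5.0]
t, p = two_sample_test(e2, c2)
print(f"t = {t:.2f}, two-tail p = {p:.4f}")
```

A small value of `p` means that a difference of this size would rarely arise by chance if the null hypothesis of no difference were true.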

For a research design to satisfy the requirements of an experimental design, the following conditions must be met:

(1) There must be a random assignment of subjects (people or families) to the experimental and control groups. This means that families would have no choice as to whether they participated or not.

(2) Other than the new housing all other aspects of the environment of the two groups would remain the same. There would be no differences in employment, living conditions, special events etc. which could affect one group differently from the other and which could thus affect changes in income.

(3) All families would receive exactly the same package of housing services and at the same time. If some families move in later than others, or some receive water while others do not, or some receive a larger house than others, then this would violate the requirement of a unique experimental treatment. There are research designs which can include differential treatment, but only if this is planned in a systematic way.

It will become clear as we discuss urban shelter programs that no program has yet been developed which satisfies all of the above conditions. Even in those rare cases where families are assigned to a project by a lottery (where there are more applications than houses) conditions (2) and (3) are not satisfied. This is not a special problem in the evaluation of shelter programs but is found in almost all evaluation research. The real world very rarely approximates the scientist's laboratory. This situation is so prevalent that social researchers normally expect to work with Quasi-Experimental designs which are acknowledged to depart in some important way from the Experimental paradigm.

Campbell and Stanley (1963) identified 10 different types of quasi-experimental designs as well as what they called 3 "Pre-Experimental Designs" and a number of techniques based on correlation or ex-post analysis. One of the main factors which differentiates the different quasi-experimental designs is the way in which the control group is drawn and how closely it approximates the experimental group.

2. LIMITATIONS OF THE QUASI-EXPERIMENTAL DESIGNS AND THE THREATS THEY PRESENT TO THE VALIDITY OF THE RESULTS

The essential logic of the experimental design is to select two groups which are equivalent, to isolate all external influences and to subject one of the two groups to an experimental treatment. Under these conditions, if a difference is found between the groups in T(2) then there is a strong presumption that this difference was caused by the experimental treatment. Once we depart from this experimental design it becomes much harder to infer that differences observed in T(2) are due to the experimental treatment. The differences might be due, for example, to initial differences between the two groups, to the fact that the group who received the treatment was more (or less) likely to react than the control group, to the fact that events between T(1) and T(2) were different for the two groups, or that not everyone received the treatment in the same way.

Cook and Campbell (1979) propose four types of "threat to validity" which may occur with quasi-experimental designs. These four categories are described here and they will later be used to evaluate each of the research designs described later in this Chapter, then again in Parts 3 and 4.

In reading the discussion which follows it is important to consider why it is we conduct a particular study. Although in most cases we study only a sample of families or individuals, our interest is usually not limited to these particular families. Instead we hope that what we learn about these particular families will help us to understand the effect which the project is having on all families in the project. In most cases we would also like to be able to use the results to infer what effects this type of project would have on some wider group such as the low-income population of a particular city or country. This distinction is important because our statistical test will often tell us that (for example) the income of Rodriguez, Sanchez and the other 148 families we studied in a project has increased more than the income of Rojas, Blanco and the other 120 families we studied in our control group. However, what we wish to know is whether this finding will hold true for a comparison between all families in the project and all families in the control group. It is when we try to answer this question that we become concerned about the validity of our results.

2.1 Statistical conclusion validity: Cook and Campbell's first category refers to the statistical interpretation of the results of the study. The first question to ask is whether the survey design would be able to detect a statistical association between the dependent and independent variables if such an association were to exist. If the association is weak, and if the sample size is small (or the sample wrongly designed) it might not be possible to detect a statistical relationship even if one did exist. Under these circumstances it would be possible to make the error of inferring that there was no relationship when in fact a more careful analysis of the sample would have shown that it would have been extremely difficult, if not impossible, for any relationship to have been found.

In some cases, if we have good information from previous studies on the variance of the dependent variable, it is possible to calculate the sample size which is needed to ensure that a given change in a variable can be detected. If we do not have this information, the analysis must be done once the survey results are available.
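One way of making such a calculation is sketched below. It uses a common normal-approximation formula for comparing the means of two groups of equal size, n = 2(z_a + z_b)² s² / d², where s is the standard deviation of the dependent variable, d is the smallest difference that should be detectable, and z_a and z_b are the normal deviates for the significance level and for the desired probability of detection. The formula, the function name `families_per_group` and the figures used are assumptions introduced for illustration and do not come from this manual.

```python
import math
from statistics import NormalDist

def families_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-tail comparison of means.

    sigma: standard deviation of the dependent variable (from earlier studies)
    delta: smallest difference between group means that should be detectable
    alpha: significance level of the two-tail test
    power: desired probability of detecting a true difference of size delta
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical figures: incomes vary with a standard deviation of 50 and
# we wish to be able to detect a difference of 10 between the groups.
n_needed = families_per_group(sigma=50.0, delta=10.0)
print(f"families needed in each group: {n_needed}")
```

Because the required size grows with the square of sigma/delta, halving the difference to be detected roughly quadruples the number of families needed. Expected attrition between T(1) and T(2) would have to be added on top of this figure.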

However the analysis is done, before accepting a null hypothesis that no change has occurred, always make sure that this negative result has not been produced by the way in which the sample was designed. A design which is potentially subject to this problem is the panel study (see Designs 1 and 2 in Section 5 of this chapter). A high attrition rate may substantially reduce the sample size in T(2), thus making it more difficult to obtain statistically significant results.

Once the researcher is satisfied that the survey design is adequate to detect the effects of the treatment, he or she can then proceed to determine whether any effect has in fact been observed. This requires the application of the appropriate statistical test to determine whether there is a statistically significant relationship between the treatment and the dependent variable. In some types of test this will mean determining whether there is a significant difference between the experimental and control groups (for example with the T-test), in others it will involve a test of the difference between two distributions (X²), whilst in others it will mean determining whether there is a covariation between the dependent variable and experimental status (experimental and control groups). In all cases the researcher must decide on the level of statistical significance to be used (see Part II). A frequent error in reporting research results is to confuse statistical significance with important or substantive differences. If samples are large, even very small changes will often be found to be statistically significant.

The researcher must be careful to show the magnitude of the differences which have been found so that both the researcher and the reader can judge whether the differences are sufficiently large to have any practical importance. A large sample might show, for example, that the income of project participants has increased by $2 a month more than the income of the control group. A judgment must be made as to whether a difference of this size is important.

Two different criteria can be used in determining whether a statistically significant association is "important." The first is to examine the absolute magnitude of the differences. This can often be done by presenting the confidence intervals of observed differences (see Chapter 6, Section 2.2). From an analytical point of view one can also examine the proportion of the total variance which is explained by this association. For example, if we find an association between income and participation in a project, we can examine whether participation is among the most important causes of increased income (explains a high proportion of the explained variance) or whether there are other more important causes. This issue is discussed in Chapters 6 and 12.
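Both criteria can be computed from summary statistics, as in the sketch below. The figures are hypothetical (a difference of $2 a month, a standard deviation of 30 in each group and 5,000 families per group), the function name `magnitude_summary` is introduced only for this illustration, and the confidence interval relies on the normal approximation, which assumes large samples.

```python
import math
from statistics import NormalDist

def magnitude_summary(diff, sd_e, sd_c, n_e, n_c, confidence=0.95):
    """Summarize how large a difference between two group means is.

    Returns the test statistic z, the confidence interval of the
    difference, and the share of total variance in the dependent
    variable that is associated with group membership.
    """
    standard_error = math.sqrt(sd_e ** 2 / n_e + sd_c ** 2 / n_c)
    z = diff / standard_error
    half_width = NormalDist().inv_cdf(0.5 + confidence / 2) * standard_error
    interval = (diff - half_width, diff + half_width)
    # Between-group and within-group sums of squares for two groups.
    ss_between = n_e * n_c / (n_e + n_c) * diff ** 2
    ss_within = (n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2
    variance_share = ss_between / (ss_between + ss_within)
    return z, interval, variance_share

z, interval, variance_share = magnitude_summary(2.0, 30.0, 30.0, 5000, 5000)
print(f"z = {z:.2f}")
print(f"95% interval for the difference: {interval[0]:.2f} to {interval[1]:.2f}")
print(f"share of variance explained: {variance_share:.4f}")
```

With these assumed figures the difference is statistically significant at the 0.05 level, yet the interval shows it may be less than $1 a month and group membership accounts for only about one tenth of one percent of the variance in income.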

A fourth issue, not discussed by Cook and Campbell in this context, 1/ is whether the sample design permits the control of other independent variables which might be producing the effects which are being attributed to the experimental treatment. This statistical control is largely achieved through regression analysis. One particularly important technique is the use of lag variables to control for the value of the dependent variable in the previous time period, T(1). If this technique cannot be used, there is a danger of making wrong inferences as to the cause of the observed changes in the dependent variable in T(2). This point is discussed further in the review of the panel study design in Section 5 of this chapter.

1/ This issue is, however, discussed in detail in other sections of their book.
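The use of a lag variable can be sketched as a regression of income in T(2) on income in T(1) and on experimental status. The example below is an illustration under stated assumptions rather than the procedure described later in the manual: the data are hypothetical and free of random error, the true treatment effect is set to 5 by construction, and the helper names `solve_linear_system` and `fit_lagged_model` are introduced only for this sketch.

```python
import statistics

def solve_linear_system(a, b):
    """Solve a small system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_lagged_model(y1, treated, y2):
    """Least-squares fit of y2 = b0 + b1*y1 + b2*treated (normal equations)."""
    rows = [[1.0, lag, float(flag)] for lag, flag in zip(y1, treated)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, y2)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical panel: the experimental group was already richer in T(1).
y1 = [120.0, 130.0, 140.0, 150.0, 80.0, 90.0, 100.0, 110.0]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
y2 = [10.0 + 0.9 * lag + 5.0 * flag for lag, flag in zip(y1, treated)]

naive_gap = (statistics.mean(y2[:4]) - statistics.mean(y2[4:]))
b0, b1, b2 = fit_lagged_model(y1, treated, y2)
print(f"difference in T(2) means without control: {naive_gap:.1f}")
print(f"treatment coefficient controlling for T(1): {b2:.1f}")
```

In this constructed example the simple comparison of T(2) means attributes the whole gap to the project, whereas the coefficient on experimental status, once income in T(1) is controlled, recovers the much smaller effect that was built into the data.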

The process of determining the statistical validity of the results can conveniently be summarized in Fig. 1.

Fig. 1 The process of determining the statistical validity of the survey findings

Question 1: Is the sample design adequate to permit the null hypothesis that the treatment has no effect to be rejected?
    NO: No further analysis is possible.
    YES: Proceed to Question 2.

Question 2: Does a statistically significant relationship exist between the treatment and the dependent variable?
    YES: Proceed to Question 3.

Question 3: Is the relationship sufficiently strong?
    Question 3a: What is the absolute effect (difference between the experimental and control groups)?
    Question 3b: What proportion of the total explained variance in the dependent variable is explained by the treatment?

Question 4: Does the design permit statistical control of independent variables, including lag variables, which might be causing the effects attributed to the experimental treatment?

2.2 Internal validity

Once it has been established that there exists a statistical association between the treatment and the dependent variable, the next task is to determine whether this association is evidence of a causal relationship between the two variables. It is important not to confuse statistical association (for example a correlation) with causality.

The fact that income of participants in a housing project increases faster than the income of the control group does not prove that participation in the project caused the increase in income. It might be that families who enter the project are those more likely to increase their income (for example small businessmen looking for new markets, or people who are generally more ambitious), or it might be that the increased income is due to a third factor such as the opening of a new factory near the project. Some of the alternative explanations for the correlation between project participation and increased income are the following:

P ---> Y+
   Participation in the project (P) causes income to increase (Y+).

D ---> P
D ---> Y+
   Desire to increase income (D) leads people to participate in the project (P) and also leads income to increase (Y+), but there is no direct relationship between P and Y+.

D ---> P ---> Y+
D ---> Y+
   Desire to increase income (D) leads people to participate in the project (P). Income is raised partly by participation in the project and partly by the desire to increase income (D).

P      Y+
   Income is increased (Y+) due to the opening of a new factory close to the project.

The existence of a simple correlation coefficient between P and Y+ does not allow us to determine which of the above causal explanations (and there are others) is correct. We must be careful not to make invalid causal inferences from our data. 1/
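The second of the explanations above can be illustrated with a small numerical sketch. The records below are invented purely for illustration (they are not taken from any of the studies), and the names D, P and Y follow the diagrams. In the pooled data participants appear to gain more income than non-participants, yet within each level of D the gap disappears, which is what one would expect if D alone were driving both participation and income.

```python
# Hypothetical records (invented for illustration only):
# (D, P, Y) = (desire to increase income, participation, income gain).
records = [
    (0, 0, 10), (0, 0, 11), (0, 0, 12), (0, 1, 11),
    (1, 0, 21), (1, 1, 20), (1, 1, 21), (1, 1, 22),
]


def mean(values):
    return sum(values) / len(values)


def participation_gap(rows):
    """Mean income gain of participants minus that of non-participants."""
    participants = [y for _, p, y in rows if p == 1]
    others = [y for _, p, y in rows if p == 0]
    return mean(participants) - mean(others)


# Gap when D is ignored (the simple association between P and Y+).
overall_gap = participation_gap(records)

# Gap recomputed separately for families with low and high D.
gap_within_d = {
    d: participation_gap([r for r in records if r[0] == d]) for d in (0, 1)
}

print(overall_gap, gap_within_d)
```

Stratifying in this way is only possible when the third factor has been measured; it is one simple form of the statistical control discussed in Section 2.1.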

Cook and Campbell (1979) list 13 threats to internal validity. 2/

Several of these are more relevant to the field of social psychology, but the following are important for the present discussion:

(1) History: the events, unrelated to the experiment, which occur between T(1) and T(2). Examples would be a fire in one community or the opening of a factory which generates new sources of employment.

(2) Maturation: changes which occur as a result of the passage of time. For example, subjects receive more education and earn more as they grow older.

(3) Testing: the effects which the process of interviewing or obtaining the information may have on the subject. The fact that a person is repeatedly asked about household expenditures may lead him or her to change the expenditures. Less might be spent on entertainment and more on food, for example, if the person comes to realize how money is currently being spent.

1/ The presentation of causal linkages in the form of the above diagrams is the basis of Path Analysis, a very useful technique which helps determine the direction of causation from correlation coefficients. See Cook and Campbell (1979) Chapter 7 and Nie and others (1975) Chapter 21. For a more complete discussion of causal analysis see Blalock and Blalock (1968) Chapters 5 and 6.

2/ Cook and Campbell (1979) pages 51-55.

(4) Instrumentation: spurious differences can be produced when changes are made in the way a question is asked or who does the interview. It sometimes happens, for example, that the researcher realizes there was a problem in the way a question was asked in T(0) so the wording is changed in T(1). An example would be the realization that certain forms of female income were not being recorded as the woman did not consider making clothes for the neighbors as income. If the question is refined in T(1) to include this source of income this might produce an apparent increase in income.

(5) Regression: if groups have been selected on the basis of extreme scores (for example high income and low income) there is a tendency for these scores to regress towards the mean. This phenomenon will be found to be extremely important when we use change scores (Part 2, Section 2.3.1).

(6) Biases resulting from differential selection for the experimental and control groups. The example of the job-training program illustrates this.

(7) Experimental mortality: In a longitudinal study it is usually not possible to reinterview a certain number of subjects. The "drop-outs" tend to be different from the other families so that estimates based on the remaining families are likely to be biased.

(8) Demoralization of subjects in the control group: If families in the control areas feel that they are being unfairly excluded, or that participants are being given some unfair advantages, they may become discouraged and perhaps perform less well than would otherwise have been the case. This is particularly likely to be true if the control group is drawn from families who were unsuccessful in the selection lottery.

In addition to the factors mentioned by Cook and Campbell, the following additional factors are important in many urban contexts:

(9) Direct effect of the project on the control areas: If a cooperative is formed in a community, the families who are not members (and who are selected for the control group) may be financially worse off as a result of the cooperative. This is true if the cooperative is able to monopolize work contracts or if it has an advantage in obtaining scarce raw materials. Another example is the effect of a project on land prices or rents. In the former case the project might increase land values in surrounding areas, hence making the control group worse off. In the second case, the increased housing supply might lower rents in surrounding areas and hence produce additional benefits for the control group. All of these examples serve to show some of the difficulties in separating the experimental and control groups and using the latter to measure what would have happened if there had not been a project.

(10) Reaction of public authorities and other groups to the existence of the project: As many low-income shelter programs are set up on an experimental basis outside the normal structure of the government housing programs, the creation of the programs can create either hostility or desire to obtain political benefit. In the former case the affected agencies may try to slow the progress of the project (by not providing basic services for example). In the latter case the high visibility of the projects may encourage the government to provide additional services such as schools or health centers. What usually tends to happen is that these services do not represent additional budget items but are diverted from other parts of the city. This means that other low-income areas may lose a school or health center which had been planned. When one compares the experimental and control groups, the former may come out much higher because of these additional services, and the control groups will come out lower. Again it becomes very difficult to isolate which effects have been caused directly by the project and which by these external government policy decisions.

2.3 Construct validity of causal relations

In the previous section we considered the possibility that supposed causal relations between A and B had in fact been the result of some third factor which had not been taken into account (controlled for) in the research design. Construct validity, on the other hand, refers to the danger that different researchers might interpret the same findings in different ways.

An example would be the finding that a large number of families drop out of a housing project, selling their new house and returning to the poorer housing area in which they previously lived. One researcher might interpret this as proving the project was not affordable to many families in the target group. Another researcher might argue, on the basis of the same evidence, that families were making a capital gain and that their decision to leave the project was rational and did not reflect an affordability problem.

This problem is particularly severe in urban shelter studies, as a large number of factors, many of which cannot be controlled, are operating at the same time. As potential causes often cannot be completely isolated, there is considerable room for interpretation of causal relations. Cook and Campbell observe:

"It is our distinct impression that most applied experimental research is much more oriented toward high construct validity of effects than of causes." 1/

This is certainly true in urban shelter studies where great care is taken to measure effects (for example very detailed analysis of changes in housing quality or family income) but almost no attention is paid to the careful definition of causes. It is frequently assumed that the cause is the housing project, without trying to systematically define or quantify this at all.

Some of the factors which cause the problems of construct validity are the following:

(1) Absence of a clearly defined causal model: In many studies no causal model is developed at all. Analysis is based on the use of null hypotheses which simply relate to the expected outcome. If the null hypothesis is rejected (i.e. there is a difference between the experimental and control groups) it is assumed that this difference was caused by the experimental treatment.

(2) Lack of clear definition of the experimental treatment (presumed cause)

In most projects there is not a single, unique treatment which all participants receive in exactly the same way, or at the same time. Many projects may include as many as 10 different components (basic core house, building materials loan, water, sanitation, roads, community center, cooperative, etc.) and different families will often receive different combinations of services. When houses are built through progressive development one family may complete a large, solid structure up to two years before another family completes a much smaller house made of poorer quality materials.

Under these circumstances it becomes extremely important to define and quantify the "amount" of treatment received by each family.

1/ Cook and Campbell (1979) p. 63.

(3) Mono-operation bias

Cook and Campbell point out that when concepts are complex and difficult to define and precisely delimit, there is considerable danger of spurious measurement if only one definition is used. This is due to the fact that unanticipated factors may enter into the definition. If, for example, "completion of house" were used as a definition of families who were receiving the maximum project impact, this may have a bias in favor of the wealthier families who have the resources to complete the house. It may also exclude families who prefer to benefit from the cooperative to improve their employment situation and who decide to defer the completion of the house.

To overcome this one should try to use a number of different operational definitions of both treatment and effects. If the different definitions provide consistent results, then one can be more confident that a meaningful concept has been identified. If, on the other hand, the results are not consistent, then this might provide insight into which aspects of the treatment produce different types of effect.

(4) Interaction of different treatments

As many participants will be receiving a number of different treatments, it will often be very difficult to isolate the effects of one particular treatment in which the researcher is interested. If the sample is large enough, and if each treatment can be quantified, it is possible to isolate the effects of each treatment with statistical techniques such as Multiple Regression. However, one must be careful not to immediately infer that one particular treatment was the cause of the observed effect, without going through this more complete type of statistical analysis.

2.4 External Validity

In most cases the main purpose of the evaluation research is to be able to generalize the results to some wider population. On the first level of generalization one wishes to generalize from the particular sample to all families in the project area. On a second level one usually wishes to be able to generalize from the results of the project to a wider population such as all low-income families in a particular city or in all urban areas of the country.

An important distinction is between generalizing to all of a given population (for example all low-income families in the city) and generalizing to a particular group (such as families who are renting, or self-employed). Although sampling textbooks usually assume the objective is to generalize to a given population, from the policy point of view it may in fact be more important to evaluate the potential impact on a particular group. The difference is important as it may sometimes be easier to design samples which are representative of particular groups rather than representative of a total population.

The following are some of the main threats to external validity:

(1) Validity of the inferred causal relationships

This is a summary of the issues covered under 2.2 and 2.3. If the assumptions about the inferred causal relationship are not in fact valid, they obviously cannot be generalized to a wider population.

(2) Representativity of the experimental group

As projects tend to select particular groups or particular geographical areas, it is important to determine whether these groups or areas are typical of the wider population to which we wish to generalize. Did the project select the families who were most ambitious or who had the highest potential to improve? Are there other geographical areas similar to those of the first projects?

(3) Could the project itself be replicated?

In many cases the project itself is unique and very difficult to replicate. If a project started as a result of a flood, earthquake, political upheaval or land invasion, it may not be possible to replicate under different circumstances.

CHAPTER 2. THE APPLICATION OF THE EVALUATION DESIGNS TO URBAN SHELTER PROGRAMS IN DEVELOPING COUNTRIES

1. The principal types of shelter programs being studied

Although the research strategies discussed in this document can be applied in a wide range of different research settings, the discussion is directed specifically to the evaluation of different types of intervention in the low-income housing market. The initial research program to which reference is constantly made was designed to evaluate the impact of urban low-income shelter programs developed under World Bank loans. 1/ These will be referred to as Urban Shelter Programs, rather than housing programs, to reflect the fact that many of the upgrading programs are more directly concerned with the provision of basic services such as water and sanitation than with the provision of housing. Two main types of program are discussed.

The first is Sites and Services in which plots of land are sold, or in some cases rented, to low-income families. The plots are provided with certain basic services such as water, sanitation, roads and possibly community services. In some cases a partially built house may be included whilst in others there is no construction at all. The purchaser will build or complete the house, either through his own efforts or through organized mutual help.

Sites and Services projects are often quite small and may offer as few as 250 to 500 dwelling units, although some include up to 10,000 units.

1/ The evaluation presently covers: El Salvador (sites and services), Senegal (sites and services), Zambia (sites and services and upgrading), the Philippines (sites and services and upgrading), Indonesia (upgrading), Kenya (sites and services) and Colombia (upgrading).

Normally the total number of units will not provide housing for more than about 10 percent of the low-income population of the city. Participating families are required to make a monthly payment which, although designed to be as low as possible, will normally be sufficiently high to exclude at least the poorest 20 percent of urban families. Given the relatively limited supply, and the need to ensure that families can meet the monthly payments, Sites and Services projects always involve a selection procedure.

The selection has the effect of including families who are not a representative sample of low-income families. The participants do not include the poorest families and there may also be requirements in terms of family size, type of employment 1/ and time living in the city. 2/

A Sites and Services project involves the physical movement of families from their existing place of residence to a new community built on previously unoccupied land. In some cases the distance travelled is relatively small, but where projects are located on outskirts of large cities, families may have to move a distance of 15 miles or even further from their present place of dwelling and employment. The projects will normally result in an increase in the total housing stock.

1/ In some cases there is a requirement that stated income must be confirmed by the employer, or that monthly mortgage payments must be deducted from the pay check. Both of these conditions effectively eliminate families working in the informal sector.

2/ The requirement of a certain time living in the city is included to try and discourage rural-urban migration.

The second type of project involves the Upgrading of existing communities or areas of the city. Some projects concentrate on improving the housing stock through dedensification and construction material loans, whereas others focus on upgrading public services such as roads, water and drainage. In this latter type of project it is hoped families will be encouraged to make their own investments in housing once they see the improvements which are being produced in the community and surrounding areas.

Upgrading projects are often very large and in many countries are designed to cover most of the low-income population of the city. Some upgrading projects will directly affect all families in a given area. This is the case when reblocking or dedensification involves the re-alignment of dwellings or the demolition of a certain number of units and the moving of these families to overspill areas. Other projects which concentrate on the installation of roads and other basic services may have differential impact, with some families being directly affected and others experiencing almost no direct impact. The difference in impacts can be even greater in projects which involve a large number of different agencies providing different services. An example of this is when day-care centers, job-training programs, community centers, health clinics and paved roads are provided in different areas of a community or city. Many families may have no access to some services whilst being directly affected by others. This means that different families will receive widely different packages of services. The cost to families of participating in upgrading programs is much lower than for sites and services, and the goal is often to ensure that almost no families are excluded through incapacity to pay. This contrasts markedly with the selection procedures of Sites and Services.

Normally an upgrading program will not increase the total housing stock but it will improve the quality of existing housing units.

2. Problems in Evaluating the Impact of Urban Shelter Programs

2.1 Some general research problems

The evaluation of the impact of an urban housing intervention poses a number of special problems for the research design. We will discuss some of the general problems and then consider special problems for the evaluation of sites and services and upgrading programs.

The first problem is that participants in the projects cannot be selected randomly. In the case of sites and services the participants are selected from among families who apply and who have certain characteristics which make them likely to be successful in the project. In the case of upgrading all families in a given area are automatically included. This fact makes it extremely difficult to select an equivalent control group. In the case of Sites and Services it is usually possible to select controls who are fairly similar in terms of their socio-economic characteristics, but their motivation cannot be controlled.

A second problem relates to the multiple treatments which are included in most projects. The treatments are received in different combinations and different intensities by different families, thus making it difficult to use many of the simpler methods for evaluating change.

A third problem relates to the selection and maintenance of a control group. In the case of upgrading projects which are either intended to cover the whole low-income population or are directed to groups which are unique, 1/ it is very difficult to find similar communities to which they can be matched. In many cases the potential control groups are very unstable and may be demolished, destroyed by flooding or radically transformed 2/ during the period of the study. Even when the control area is not affected there is likely to be a high turnover rate of families. This complicates the analysis considerably.

A fourth major problem for the research design is that projects are often subject to long and unpredictable delays. It is quite common for a project to start 18 months later than expected, and in one of the projects studied there was almost a five year delay. This makes it extremely difficult to plan a study and may mean that some of the treatments can still not be evaluated by the end of a 3 to 5 year study.

2.2 Some specific problems in the evaluation of Sites and Services

From the point of view of the research design, the sites and services projects have a number of advantages which make them easier to evaluate. In the first place there are no problems in defining which families are affected by the project. The houses are constructed on previously empty land and there is no ambiguity as to whether a family is a participant. Families tend either to all receive the same package of basic services (water, light, sanitation, etc.), or there are several clearly distinct packages each of which is received by a determined group of families. Sites and services also tend to be limited to a relatively small proportion of the population of a city (usually not more than about 10 percent of the low-income population), so that it is usually feasible to select a control group of families who will not participate but who have fairly similar characteristics to the participants.

1/ In the case of the Philippines for example the upgrading project is being executed in Tondo which is a large area with a population of 100,000 people and very special historical and economic conditions. The control areas are relatively small and do not share many of these characteristics. In Jakarta the Kampung Improvement Program will cover all kampungs. It started with the poorest and gradually works upwards over a period of about 10 years. This means that at any point in time the control group is likely to have better conditions than the experimental group.

2/ Parts of squatter settlements are often demolished to build roads, whilst others may be dramatically improved through a new government policy.

2.3 Some specific problems in the evaluation of upgrading projects

Upgrading projects are more difficult to evaluate for a number of different reasons. Many of the projects are very large and seek to cover the total low-income population of a city. In practice different families in an upgraded area will be differentially affected by the project. At one extreme a family may gain direct access to the paved road, be located close to a public water supply or be able to pay for the installation of a private water connection. At the other extreme one can often find large numbers of families who do not appear to have been directly affected by the project at all. These families may live in a sector of the community where roads have not been paved or drainage installed, and they may not be able to afford the purchase of the private water connection. In other projects, which involve the cooperation of a large number of different government agencies, a large number of different services may be provided in widely scattered areas. One family may live close to a day care center which they use, but be too far from a job training program to be able to enroll. Another family may have access to a health clinic but not to a community center. This situation makes it extremely difficult to use a [...] as the number of families in the sample who have received each service will be extremely low.

In many cases with an upgrading project it is almost impossible to delimit clearly which families lie within the project area and can be considered as forming part of the experimental group. The fact that many projects cover such a large area of the city also makes it very difficult, or sometimes impossible, to locate a control group of families with similar characteristics but who have not been affected.

3. The Principal Research Designs for the Evaluation of Urban Shelter Programs

In this section we describe the 8 main types of evaluation design which are currently being used to evaluate the World Bank shelter programs. These do not exhaust all possible designs, but they cover the main concepts involved and can easily be generalized to other designs.

After the designs are presented we consider the particular validity problems inherent in each.

Fig 2 summarizes the main characteristics of the 8 designs.

The designs are distinguished according to the type of sample, whether or not they use a control group, whether interviews are conducted before as well as after the application of the experimental treatment, and by whether the same dependent variables are used before and after the experimental treatment. The large number of designs, many of which contain serious defects from the theoretical point of view, reflects the difficult situations under which many of the evaluations have to be conducted.

Fig. 2 EIGHT EVALUATION RESEARCH DESIGNS CURRENTLY BEING USED TO EVALUATE URBAN SHELTER PROGRAMS

                                                  Before and after research designs     Ex post design
                                                  Type of sample                        Type of sample
                                                  Panel       Independent   Mixed       Independent

Designs in which the same      No control group   Design 1    Design 3      Design 5
dependent variable is used
for all samples                Control group      Design 2    Design 4      Design 6    Design 8

Designs in which different     Control group      Design 7
dependent variables are used
in T(1) and T(2)

3.1 Panel studies (Designs 1 and 2) 1/

The panel study involves the application of the research instrument to the same group of subjects (individuals, families, etc.) on at least 2 occasions. It is a powerful design because two measurements are obtained from the same person. This increases the sample precision by reducing the variance of the difference between the T(1) and T(2) means. This means that the same level of precision can be obtained with a smaller sample than would be required if independent samples were used. The benefit of the smaller sample size may be offset by the need to make some allowance for drop-outs (see below).
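The gain in precision can be sketched numerically. The function below is a minimal illustration under assumptions that are ours rather than the report's: the same number of interviews n in each round, known standard deviations, and a known correlation between a subject's T(1) and T(2) scores (zero for independent samples). The figures used are invented.

```python
import math


def se_of_mean_difference(sd1, sd2, n, correlation=0.0):
    """Standard error of mean(T2) - mean(T1) with n interviews per round.

    correlation is the correlation between a subject's T(1) and T(2)
    scores; it is 0 when the two rounds use independent samples.
    """
    variance = (sd1 ** 2 + sd2 ** 2 - 2 * correlation * sd1 * sd2) / n
    return math.sqrt(variance)


# Hypothetical figures: standard deviation of 10 in both rounds, 100 families.
se_independent = se_of_mean_difference(10.0, 10.0, n=100)
se_panel = se_of_mean_difference(10.0, 10.0, n=100, correlation=0.5)

print(round(se_independent, 3), round(se_panel, 3))
```

The advantage depends on the scores being positively correlated over time; with a correlation near zero the panel offers little gain in precision over independent samples.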

The panel study (or the modified version using mixed samples) is the only possible design if we wish to adjust the value of the dependent variable in T(2) by its value in T(1). As we will show in Chapters 6 and 12, regressing on the score in T(1) is often the only way to avoid serious errors in the interpretation of the causes of change. In the example we give in Chapter 12, Section 3, when the T-Test was used it appeared that there was a significantly larger change in income of participants than of the control group. This would normally be interpreted as showing that the project had an effect on income. However, when income in T(2) was used as the dependent variable and this was regressed on income in T(1) as well as participant status, it was found that the project no longer had a significant effect on income and that the change in income (variance) could largely be explained by knowing the family income in T(1). In other words the families which showed the greatest income change were those who had the highest income in T(1). As these families were mainly project participants one could mistakenly assume the income change had been produced by the project.

1/ We do not include Design 7 in this discussion, because even though it uses a panel sample it contains very serious flaws so it is discussed separately.

This ability to regress on the dependent variable in T(1), defined by economists as the use of lag variables, is one of the strongest arguments in support of panel designs.
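The following is a minimal sketch of this kind of analysis, not a reproduction of the Chapter 12 example. The incomes are invented and constructed so that every family's T(2) income is exactly 1.2 times its T(1) income, which means the project has no effect by construction; the helper names solve and ols are ours. A simple comparison of mean changes suggests a project effect, while regressing T(2) income on T(1) income and participant status assigns a coefficient of approximately zero to participation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented data: participants start with higher incomes than controls.
income_t1 = [100, 120, 140, 160, 60, 80, 100, 120]
participant = [1, 1, 1, 1, 0, 0, 0, 0]
income_t2 = [1.2 * v for v in income_t1]  # no project effect by construction

# Comparison of mean change scores between the two groups.
changes = [t2 - t1 for t1, t2 in zip(income_t1, income_t2)]
mean_change_e = sum(c for c, p in zip(changes, participant) if p) / 4
mean_change_c = sum(c for c, p in zip(changes, participant) if not p) / 4
naive_gap = mean_change_e - mean_change_c

# Regression of T(2) income on a constant, T(1) income (the lag variable)
# and participant status.
design = [[1.0, t1, p] for t1, p in zip(income_t1, participant)]
intercept, lag_coef, project_coef = ols(design, income_t2)

print(round(naive_gap, 2), round(lag_coef, 3), round(project_coef, 3))
```

In a real study one would also examine the standard errors of the coefficients; the sketch shows only the point estimates.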

Another advantage of the panel designs will become apparent in Part III. When a control group is used we have a total of four observations, two on the experimental group, in T(1) and in T(2), and two on the control group. If we use independent samples it is very difficult to conduct the statistical analysis as a way has to be found to reduce the four observations to two so that the comparison can be made. With panel studies the reduction is simple as we can combine the two observations on each subject to produce a change score (referred to by some authors as a gain score or a first difference). With independent samples this option is not possible. The problem is particularly difficult when the dependent variable is measured on an ordinal or nominal scale.
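The reduction of four observations to two can be sketched as follows. The records are hypothetical and the variable names are ours; each subject contributes one change score, and the comparison is then made between the mean change of the experimental group and that of the control group. The caution about regressing on the T(1) score, discussed earlier in this section, still applies when interpreting such a difference.

```python
# Hypothetical panel records (invented for illustration):
# (group, score in T(1), score in T(2)); "E" experimental, "C" control.
panel = [
    ("E", 50, 62), ("E", 40, 49), ("E", 45, 54),
    ("C", 48, 52), ("C", 42, 47), ("C", 39, 42),
]


def mean_change(records, group):
    """Average change score (T(2) minus T(1)) for the subjects in one group."""
    changes = [t2 - t1 for g, t1, t2 in records if g == group]
    return sum(changes) / len(changes)


change_e = mean_change(panel, "E")
change_c = mean_change(panel, "C")
net_difference = change_e - change_c

print(change_e, change_c, net_difference)
```

With independent samples the subtraction inside mean_change cannot be carried out, because no subject has both a T(1) and a T(2) score.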

A weakness of the panel design is that when there is a high rate of drop-out or turnover, the panel sample becomes increasingly unrepresentative of the population in the project or control area. If 25 percent of families leave the area, as is quite common in many of the studies, then the 25 percent of new families who come in are not covered by the panel. If we assume the new families differ in some way from the original families, then the panel sample is not able to provide any information on these new families and the information it provides becomes increasingly unrepresentative of the total community. This problem is largely overcome by the use of a Mixed Sample (see below).

The need to compensate for drop-outs may require an increase in the initial sample size. If, for example, it was estimated that a sample of 400 families were required to obtain statistically significant results, and if a drop-out rate of 25% were anticipated, an initial sample of approximately 530 would be required. This largely offsets the benefits of the greater sample precision of the panel sample.
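The arithmetic behind this figure can be sketched as below. The function name is ours and the calculation assumes, for simplicity, that drop-out is the only source of sample loss: the required final sample is divided by the expected retention rate and rounded up.

```python
import math


def initial_sample_size(required_final, dropout_rate):
    """Families to interview in T(1) so that about required_final remain in T(2)."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be at least 0 and less than 1")
    return math.ceil(required_final / (1 - dropout_rate))


# The example in the text: 400 families needed, 25% drop-out anticipated.
needed = initial_sample_size(400, 0.25)
print(needed)
```

For the figures in the text this gives 534 families, in line with the approximate value of 530 quoted above.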

A third disadvantage of the panel design is that in practice it is not possible to follow the families who leave the project or control areas. Several attempts to follow up on these families have proved prohibitively expensive and even then the number of families who have been located is quite low. As many of the control areas are considered extra-legal, families are reluctant to leave records of their coming and going, and none of the neighbors will provide the information, as enquiries of this type usually come from creditors or the police.

This lack of ability to follow up means that one cannot observe the project impact on families who leave, which makes the panel design weaker than it would otherwise have been. Within this context we will now describe the two panel designs which have been used.

Design 1 Panel Study without control group

This design simply involves the interviewing of a sample of project participants at two points in time, T(1) before the project began and T(2) at some point, usually one or two years, after the start of the project. X indicates the point at which the project began. In this design there is no control group. The design can be represented as follows:

                        T(1)            T(2)
Experimental group      E(1)1 --- X --- E(1)2

The following symbols will be used in the presentation of all designs:

T(1) and T(2) indicate the time periods in which the interviews are conducted.

E indicates the experimental group. In subsequent designs C will indicate the control group.

E(1) The number in parentheses indicates the time period in which the subject was first interviewed. In the present design all subjects were first interviewed in T(1).

E(1)1 and E(1)2 The number after the parentheses indicates the time period in which the interview was conducted. Thus the two symbols indicate interviews conducted in T(1) and T(2) respectively.

X The start of the experimental treatment, in other words the start of the project.

An example of this design would be a tracer study in which a group of trainees in a technical training program were interviewed at the start of the program and then perhaps one year later when they were in the labor market. In many cases the interview would be repeated several times, in which case the design would be represented as follows:

                        T(1)          T(2)     T(3)     T(4)

Experimental group      E(1)1    X    E(1)2    E(1)3    E(1)4

In this case the symbols indicate that the same group of trainees who were

first interviewed in T(1) were reinterviewed in T(2), T(3) and T(4).

Design 2 Panel study with control group

In this design a control group is selected as well as the experimental group. The same families in each group are interviewed twice. This design can be represented as follows:

                        T(1)           T(2)

Experimental group      E(1)1    X     E(1)2

Control group           C(1)1          C(1)2

In the control group, represented by C, there is no experimental treatment and hence no X. An example would be a study of the effects of improved water and sanitation on parasitic infection in which stool samples were taken from children when they entered a project and again after one year. Similar samples would also be taken from a control group of children who would be observed at the same two points in time.

3.2 Independent random samples

With this design a new random sample is drawn every time the interviews are repeated and a different set of subjects is therefore interviewed on each occasion. The two observations to be compared do not in this case come from the same individuals. This design has the advantage of not being affected by high turnover rates. It is more flexible and adjusts to increases or decreases in the total number of dwelling units. It is often possible to use a smaller sample than would be required with a panel study as there is no need to compensate for expected drop-outs (see discussion above of panel designs). The disadvantage is that it is not possible to study changes in particular individuals as information is only available on each person at one point in time. This means that more detailed follow-up studies on subsamples of high or low-changers are much more difficult to design. It is also not possible to use the techniques of statistical control through regression on the T(1) scores which were mentioned above.

Again two designs, with and without control group, can be

distinguished:

Design 3 Independent samples without control groups

This design involves the comparison of two independent samples of participants, drawn at different points in time, and with no control group. The design can be represented as follows:

                        T(1)           T(2)

Experimental group      E(1)1    X     E(2)2

where: E(1)1 = observation at T(1) of group selected in T(1)
       E(2)2 = observation at T(2) of group selected in T(2)

In this design the symbol E(2)2 indicates that the subjects interviewed in T(2) were selected in T(2) and were being interviewed for the first time.

Design 4 Independent samples with control group

In this design the experimental group and a control group are interviewed in T(1). New samples of both the experimental and the control group are selected in T(2) and interviewed. Thus there are two independent observations of the experimental and of the control groups. This design can be represented as follows:

                        T(1)           T(2)

Experimental group      E(1)1    X     E(2)2

Control group           C(1)1          C(2)2

3.3 Panel study with replacement of drop-outs (mixed sample)

In all of the areas studied there is a high sample attrition rate (a large number of families who move and cannot be re-interviewed). One approach in this situation is to use a panel study and accept a reduction in the sample size and simply re-interview the families who are still living in the same dwelling (or who can otherwise be relocated) in T(2). The disadvantages of this approach are that the sample size will be substantially reduced, and more importantly that the sample becomes progressively less representative of all families living in the community in T(2) (Bamberger, 1978).

An alternative strategy is to replace subjects who cannot be re-interviewed. This can be done either by interviewing the new family living in the house occupied by the subject who was interviewed in T(1) and has since moved, or by drawing a sample of replacements at random. The merits of the different systems of replacement will not be discussed here.

This design offers the advantage of the greater statistical flexibility of the panel design, whilst at the same time permitting adjustments for the changing composition of the total population. Whilst it does not overcome the effects on the statistical precision of the panel sample of a high drop-out rate, the Mixed Sample Design is perhaps the best compromise design which is possible in most circumstances. This is the design which has been most widely used. Again two variations are possible, depending upon whether or not a control group has been used.

Design 5 Mixed sample without control group

This design involves the repeated observation of families in the experimental group. Those families who cannot be re-interviewed are replaced (either by the new family living in the same dwelling or by a new family randomly selected) 1/. No control group is used. The design can be represented as follows:

                        T(1)           T(2)

Experimental group      E(1)1    X     E(1)2

                                       E(2)2

where: E(1)2 = original families who are reinterviewed in T(2)
       E(2)2 = replacements for original families who cannot be reinterviewed in T(2)

In this design it can be seen that some of the families in T(2) are being re-interviewed (the group E(1)2), whereas the group E(2)2 is being interviewed for the first time. When the groups E(1)2 and E(2)2 are combined they represent a stratified random sample of all families in the project in T(2).

1/ A question also arises as to whether the replacements should be drawn randomly from among all families in the project or only from among families who have moved in later than T(1). If this latter decision is adopted the replacements will represent a random sample of new families.

Design 6 Mixed sample with control group

This design follows the same replacement policy as Design 5 but in this case there is also a control group. The design can be represented as follows:

                        T(1)           T(2)

Experimental group      E(1)1    X     E(1)2

                                       E(2)2

Control group           C(1)1          C(1)2

                                       C(2)2

This design has been used in both the Philippines and El Salvador.

3.4 Ex-post comparison of experimental and control groups

This is not strictly a longitudinal design but it is mentioned here because several of the evaluation programs have had to use this for at least one of their studies. There are two designs, one in which only an ex-post comparison is made and the other in which information from a different but related ex-ante study is introduced.

Design 7. Ex-post comparison

                        T(1)           T(2)

Experimental group                     E(2)2

Control group                          C(2)2

This design is extremely weak as it is very difficult to make any inferences about program impact simply from an observed difference between the two groups in the ex-post condition.

Design 8 Ex-post comparison with different ex-ante study

In some studies it has been possible to strengthen the design by obtaining information on the two groups before the project began, but from a different source and with different, although similar, dependent variables. 1/ This situation can be represented as follows:

                        T(1)           T(2)

Experimental group      e(1)1    X     E(1)2

Control group           c(1)1          C(1)2

where: e(1)1 = a different set of data obtained in T(1) on the experimental families who are to be observed in T(2)

4. Comparison of the 8 Research Designs

Fig. 3 summarizes the list of threats to validity discussed in

Section 2 of this Chapter, and indicates how each of these threats affects each of the 8 designs.

As is to be expected, in all cases the designs which do not include a control group are much weaker as there are many intervening variables which cannot be controlled and which might have produced the effects which are being attributed to the experimental treatment. The 3 non-control group designs (1, 3 and 5) are recognized as being weak, and are usually only used when it is not possible to have a control group.

1/ An example was a study in El Salvador in which two groups of participants were compared 6 months after they had begun work on the construction of their houses. It was possible to compare the two groups on socio-economic conditions at an earlier point in time by referring to the form filled in at the time of selection.

Fig. 3 Threats to the validity of inferences from each of the 8 evaluation designs

Evaluation designs 1 2 3 4 5 6 7 8

STATISTICAL CONCLUSION VALIDITY

1. Is sample design adequate   X X - X X -
2. Is there a statistically significant relationship
3. Is relationship important   - - - - - -
4. Control for intervening variables   X X X X X X

INTERNAL VALIDITY

1. History   X X - X - - X
2. Maturation   X X - X - - X
3. Testing   - - - - - X
4. Instrumentation   - - - - X
5. Regression   X X X X X
6. Selection   X - X X - X
7. Experimental mortality   X X - - X
8. Demoralization   - - - - -
9. Effects of project on control areas   - - - - -
10. Reaction of public authorities   - - - -

CONSTRUCT VALIDITY

1. Absence of causal model   - - X X - -
2. Lack of operationalization   - - - - - - X -
3. Mono-operational bias   - - - - - -
4. between treatment effects   - - - - - -

EXTERNAL VALIDITY

1. Problems of inferring causal relationship   X - X X X - X X
2. Representativeness   X - X - X - - X
3. Replicability of project   X - X - X - - -

Key: Threats to validity are explained in Part 1 Section 2
     Evaluation designs are explained in Part 1 Section 5

X = this problem is particularly serious with this design
- = this problem can potentially exist but is not a particular problem with this design
blanks indicate this problem does not exist with this particular design.

Designs 7 and 8, in which either only ex-post measurements are made, or where the ex-ante measurement uses a different instrument, are also very weak. Again these two designs are only used where it is not possible to use a stronger design.

Before going on to compare the three strongest designs, we should consider what are the dangers of using the designs without control groups. Firstly, from examining the section of the chart on internal validity we see that none of these designs can control for the effects of history or maturation. If our dependent variable is income or employment, it is very important to be able to control for general changes in the economic environment such as wage increases, the opening of new factories or seasonal changes in employment. In El Salvador, for example, many families enjoy a much higher income during the 5 months of the coffee harvest. If our first interview was conducted outside this period of the year and the second interview was conducted during one of the five coffee harvest months, it is very likely we would find many families with a higher income. Without a control group we might easily attribute this historical (external) change to the effect of the project.
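The way in which a control group guards against this error can be shown with a small numerical sketch. All income figures below are invented for illustration, and the simple subtraction of the control group change is our simplification; it assumes that the seasonal change affects both groups equally.

```python
def mean(values):
    return sum(values) / len(values)


def mean_change(before, after):
    """Change in average income between T(1) and T(2)."""
    return mean(after) - mean(before)


# Hypothetical monthly incomes. T(1) falls outside the coffee harvest
# and T(2) falls within it, so both groups show a seasonal rise.
project_t1 = [100, 120, 90, 110]
project_t2 = [150, 165, 135, 160]
control_t1 = [95, 115, 100, 110]
control_t2 = [135, 150, 140, 155]

change_project = mean_change(project_t1, project_t2)
change_control = mean_change(control_t1, control_t2)

# Without a control group the whole project change would be credited
# to the project. With one, only the excess over the control change is.
net_change = change_project - change_control

print(change_project, change_control, net_change)
```

In this invented example most of the increase observed among participants is matched by the increase in the control group, so only the small remainder could reasonably be attributed to the project.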

A similar danger can exist with maturation. In a health study,

we will find that many types of parasitic infection are age-specific.

A repeat measure of parasitic infection on the same children would pro-

bably find marked changes due to maturation. Again if we do not have a

control group we might attribute these changes to the effect of the

project.

Another problem when there is no control group is to control for the effects of selection. Often the families who enter a project are different from the general population in that they have more education, smaller families, different employment patterns, and possibly higher income. Unless we have a control group we will not know how different the participants are from the general population and again may make erroneous inferences about the project's effects.

Finally, the lack of a control group seriously limits the external validity and the extent to which the results can be generalized. The chart shows that with a non-control group design it is almost impossible to generalize from the survey results to the low-income population. All of these factors obviously reduce very substantially the usefulness of the non-control group designs.

We now come to the comparison of the 3 strongest designs: (2) the panel study with control group, (4) the independent random samples with control group and (6) the mixed samples with control group. In terms of Statistical Conclusion Validity the panel and mixed designs have the potential problem of a high drop-out rate which can reduce sample size and hence the precision of the results. This does not affect the independent samples. On the other hand the independent sample has the disadvantage that it cannot use lag variables and that it is much more difficult to control for the effect of intervening variables. This means there is a greater danger of attributing causation to the project when in fact the changes in the dependent variable were really produced by one or more of the uncontrolled independent variables.

When we compare internal validity the independent samples seem to fare better. The panel and mixed samples may be subject to problems of statistical regression 1/ which may tend to reduce somewhat the observed project effect. Similarly these two designs are faced with the problem of attrition rates, although this is a problem which can be handled easily enough (in terms of its effects on internal validity) as long as the researcher is aware of the problem.

In terms of construct validity the fact that the independent sample design cannot observe change in particular subjects, limits some of the types of hypothesis about the causes and effects of change. It is very difficult, for example, to identify individuals who have experienced particularly high or low rates of change, and hence the reasons for these high or low rates cannot be analyzed.

In terms of external validity the only difference between the three designs is that the independent samples design is in a somewhat weaker position for inferring causal relationships. This is due to the more limited control over independent variables, and the inability to study change at the level of the individual. Although all three of these designs are very useful if designed and analyzed properly, our feeling is that the Mixed Sample Design is potentially the strongest as it combines many of the strong points of both the Panel and the Independent Sample designs. However, the chart above shows that all of the designs are subject to a number of potential pitfalls and that in the interpretation of the results of studies using any of these designs the researcher must be very careful to examine and report the potential weaknesses of the results.

1/ There is a statistical tendency for extreme cases to regress towards the mean. Thus subjects with an exceptionally low score in T(1) are likely to show a more than average increase, whereas subjects with an exceptionally high score in T(1) are likely to show a less than average increase. This causes problems for those designs where two measurements are obtained from the same subject.
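The regression effect described in the footnote on statistical regression can be reproduced with a short simulation. The sketch below is ours and rests on invented assumptions: each subject has a stable true score, each interview measures it with independent random error, and no real change takes place between T(1) and T(2).

```python
import random
from statistics import mean

random.seed(1)
N = 2000

# Stable true scores, measured with independent error at T(1) and T(2).
true_score = [random.gauss(100, 15) for _ in range(N)]
t1 = [s + random.gauss(0, 10) for s in true_score]
t2 = [s + random.gauss(0, 10) for s in true_score]

# Rank subjects by their T(1) score and pick out the two extreme tenths.
order = sorted(range(N), key=lambda i: t1[i])
lowest = order[: N // 10]
highest = order[-(N // 10):]

change_low = mean(t2[i] - t1[i] for i in lowest)
change_high = mean(t2[i] - t1[i] for i in highest)
change_all = mean(b - a for a, b in zip(t1, t2))

print(round(change_low, 1), round(change_high, 1), round(change_all, 1))
```

Although nothing has changed for any subject, the group with the lowest T(1) scores shows an average increase and the group with the highest T(1) scores shows an average decrease, while the average change for the whole sample stays close to zero.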

PART II

STATISTICAL TESTS FOR EVALUATING CHANGE

Part II describes in detail the statistical tests which can be used to measure change and project impact with different types of sample designs and with variables on different levels of measurement.

Chapter 3 discusses some of the factors which determine the appropriate type of statistical test to use and also describes how hypotheses are defined and tested.

Chapters 4, 5 and 6 describe the statistical tests which can be used with nominal, ordinal and interval scale variables respectively, considering in each case the different statistical analyses which are required for panel studies and designs using independent samples. Chapter 6 also considers ways to avoid some of the weaknesses inherent in the use of change scores (referred to by other writers as gain scores or first differences).

Each test is accompanied by a worked example, illustrating the mechanics and logic of the process of applying the test and of making inferences on the basis of the results.

CHAPTER 3 SELECTING THE APPROPRIATE STATISTICAL TEST AND DEFINING THE RESEARCH HYPOTHESIS

1. Factors which influence the type of statistical test to use for measuring change

Each of the tests which are presented in this section is based upon a series of assumptions as to the characteristics of the sample design and the types of variables being measured. In this section we indicate some of the main factors which must be taken into account in the selection of the appropriate type of test.

1.1 The research design

The first factor determining the type of statistical test is the research design. Several aspects of the design are particularly important in this respect:

(i) The use of related or independent samples: A related sample design is when two or more observations are made on the same subject. The most common form is the panel study. Independent samples are used when a new sample is drawn each time the study is conducted so that each observation is independent of the other. This distinction is important because many of the better known significance tests such as Chi-Square, T-Test and Analysis of Variance assume that the samples are independent. The null hypothesis which is being tested is that the two samples are drawn from the same population. (The logic of null hypotheses is discussed in section 2.2.1.) If the null hypothesis is rejected then it can be inferred that the two samples come from different populations. If both observations are made on the same individual the inferences which can be made are different. When related samples are used it becomes necessary to use different statistical tests. It will be seen that in the following sections the type of test to be used is always determined by whether the samples are independent or related. 1/

(ii) The use of a control group: Ideally a control group should always be used with experimental and quasi-experimental designs. In practice there are often studies in which a control group is not used or where it has become lost during the process of the study. 2/ The presence or absence of a control group affects the test which can be used.

(iii) The type of control group: Some research designs use a matched

sample in which a control subject is matched to each experimental

subject. This affects the type of test, because with a matched

sample design the two samples are related so that the assumption

of independent estimates of variance cannot be maintained.

(iv) Equivalence of experimental and control groups: For a true experimental design there should be random assignment of subjects to the experimental and control conditions. In most social research, and in virtually all of the designs being studied, random assignment is not possible and there is a non-equivalent control group. This does not affect the type of test so much as the inferences which can be made from the results of the tests (external validity).

1/ It will be found that in many cases it is possible to transform related samples into a form whereby they can be treated as independent. An example is the use of change scores (see section 2.3.1).

2/ In one of the Zambian studies, for example, the control community was obliterated through flooding and all families had to be relocated, thus eliminating the control group.
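The practical importance of point (i), the distinction between related and independent samples, can be seen in a small sketch. The incomes are invented, the function names are ours, and the second function uses an unequal-variance form of the t statistic chosen by us for simplicity. Applying it to panel data is deliberately incorrect and is shown only for contrast.

```python
from math import sqrt
from statistics import mean, variance


def related_samples_t(before, after):
    """t statistic based on the change score of each family (panel data)."""
    diffs = [b - a for a, b in zip(before, after)]
    standard_error = sqrt(variance(diffs) / len(diffs))
    return mean(diffs) / standard_error


def independent_samples_t(sample1, sample2):
    """t statistic that treats the two sets of scores as unrelated samples."""
    standard_error = sqrt(
        variance(sample1) / len(sample1) + variance(sample2) / len(sample2)
    )
    return (mean(sample2) - mean(sample1)) / standard_error


# Hypothetical incomes of the same six families at T(1) and T(2).
income_t1 = [100, 150, 200, 250, 300, 350]
income_t2 = [110, 165, 205, 265, 310, 365]

t_related = related_samples_t(income_t1, income_t2)
t_independent = independent_samples_t(income_t1, income_t2)

print(round(t_related, 2), round(t_independent, 2))
```

Every family in this invented panel gained between 5 and 15 units, so the related-samples statistic is large. The independent-samples statistic is small because the modest gain is swamped by the wide differences in income between families, which is why the choice of test must follow the sample design.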

1.2 The level of measurement of the dependent variable

Before any variable can be analyzed statistically it must be measurable. Measurement is achieved by constructing a scale on which the possible values of the variable can be placed. Three or four main types of scale (or level of measurement) are normally distinguished:

Nominal scales are used when different categories can be distinguished but when it is not possible to order them from greater to less. Examples are sex, nationality, employment status, occupational category and religion. The types of arithmetic operations which can be conducted with nominal scales are very limited and for this reason they will often be converted into binary scales (see below).

Binary or dichotomous variables are used when a variable can only assume one of two values. Examples are: sex, member of the experimental or control group, employed or unemployed. A very common form is to create dummy variables which can only assume the values of 1 or 0, which makes them simple to handle in regression analysis.

Any variable can be reduced to a binary form. The two main ways used in statistical analysis are:

a) To decide the most important division and to classify all cases into two groups on the basis of this division. For example with occupational categories all cases could be classified into self-employed or other. Informal settle- ments could be classified into rental units and other etc.

b) A series of dummy variables could be created. If there are 4 types of low-income housing: tenements, squatter, extra-legal subdivisions and government housing, three dummy variables could be created as follows:

DUMMY VARIABLE    Tenement    Squatter    Extra-legal    Government

RENT                 1           0             0              0
SQUAT                0           1             0              0
EXTRA                0           0             1              0

There is always one less dummy variable than the number of categories.

Binary variables also have the advantage that they can be converted into proportions so that tests such as the T-Test can be used to evaluate change.
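The coding scheme in the table can be sketched in a few lines. The variable names RENT, SQUAT and EXTRA come from the table; the function name and the four sample dwellings are our own illustrative assumptions.

```python
CATEGORIES = ["tenement", "squatter", "extra-legal", "government"]

# One dummy variable fewer than the number of categories. Government
# housing is the omitted category, coded 0 on all three dummies.
DUMMIES = {"RENT": "tenement", "SQUAT": "squatter", "EXTRA": "extra-legal"}


def dummy_code(housing_type):
    """Return the three 0/1 dummy variables for one dwelling."""
    if housing_type not in CATEGORIES:
        raise ValueError(f"unknown housing type: {housing_type}")
    return {name: int(housing_type == cat) for name, cat in DUMMIES.items()}


# Hypothetical sample of four dwellings.
sample = ["tenement", "squatter", "squatter", "government"]
coded = [dummy_code(h) for h in sample]

# The mean of a 0/1 variable is the proportion of cases in that category.
proportion_squatter = sum(row["SQUAT"] for row in coded) / len(coded)

print(coded[3], proportion_squatter)
```

The last two lines show the point made above about proportions: averaging a dummy variable gives the share of the sample falling in that category.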

Ordinal scales are used when values of a variable can be ranked from greater to lesser but where it is not possible to assume that the intervals between the different positions on the scale are equal. The following scale to measure attitudes towards permitting a community store to sell liquor is a typical ordinal scale:

"As you know the Community Council has been asked to grant a liquor licence to several stores in the community. How would you feel about these licences being granted?"

Strongly approve ...... 1

Approve ...... 2

Neither approve nor disapprove ...... 3

Disapprove ...... 4

Strongly disapprove ...... 5

It can be assumed that a person who approves (position 2) is more in favor than a person who neither approves nor disapproves (position 3). However, it is not immediately clear that the difference between position 1 and position 2 is equal to that between position 3 and position 4. If one tries to visualize the intervals between the positions on the scale it could be that for most people the situation is as follows:

     1  2  3                    4  5

in that there is very little difference between strongly approving, approving and being indifferent, but there is a large difference between indifference and the two levels of disagreement. On the other hand the distances could be more as follows:

     1              2  3  4              5

where the 3 middle positions are clustered and the large differences are between the two extreme positions. Given the fact that we cannot assume the scale to have equal intervals, it could be very misleading to calculate means or other statistics which involve adding, subtracting or dividing the scores. There is considerable discussion in the literature on the importance of the assumption of

equal intervals. Some writers argue that in most cases

the assumption can be relaxed and ordinal scales can be

manipulated in the same way as interval scales without

causing any serious problems in the interpretation of

the results of the analysis.
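The risk in calculating means from an ordinal scale can be illustrated with the five-point liquor licence scale. In the sketch below the responses of the two groups and the unequal spacing are invented by us; the unequal spacing is only one of many that would be consistent with the rank order of the answers.

```python
def mean(values):
    return sum(values) / len(values)


# Equal-interval coding, as printed on the questionnaire.
equal_spacing = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}

# Hypothetical spacing in which positions 1 to 3 lie close together and
# there is a wide gap before the two levels of disapproval.
unequal_spacing = {1: 1.0, 2: 1.2, 3: 1.4, 4: 4.6, 5: 5.0}

# Invented responses (scale positions) for two small groups.
group_a = [3, 3, 3, 3]
group_b = [1, 1, 4, 4]


def group_mean(responses, spacing):
    return mean([spacing[r] for r in responses])


equal_a = group_mean(group_a, equal_spacing)
equal_b = group_mean(group_b, equal_spacing)
unequal_a = group_mean(group_a, unequal_spacing)
unequal_b = group_mean(group_b, unequal_spacing)

print(equal_a, equal_b)      # group B appears more approving (lower mean)
print(unequal_a, unequal_b)  # group A now appears more approving
```

The same answers lead to opposite conclusions about which group is more favorable, depending only on the spacing assumed between scale positions.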

The reason for wishing to treat the scales as if they were equal interval is that this permits the use of a much wider range of arithmetic and statistical operations, such as the calculation of means, standard deviations, etc., which are not possible with ordinal scales.

Interval scales (equal interval scales or continuous

variables) are applicable when the intervals between

the positions on the scale are equal. Examples are age,

income, cost of a house, time living in the city, etc.

These scales can be analyzed with any type of mathematical

or statistical operation. This is the simplest type of

variable for the application of significance tests to

measure changes between groups or over time.

1.3 The form of the distribution being studied

Most statistical tests for interval variables assume a normal distribution of the sample means. If a sample is sufficiently large the distribution of means will tend to approximate normality even if the original distribution is not normal. However, if one is working with small samples (of less than 100 for example) there may be significant departures from a normal distribution of the sample means. Cohen and Cohen (1975), Kerlinger (1964) and others argue, however, that the two main tests we will be using (the T-Test and the F-Test) are very robust, and departures from the normality assumption will usually not affect too seriously the validity of inferences although in some cases the size of the probabilities may be somewhat over or under-estimated 1/. In most cases the use of the more stringent 0.01 significance level in place of the 0.05 level will be sufficient to compensate for the possible bias in the estimates.
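The tendency of sample means to approach a normal distribution as the sample grows can be checked by simulation. The sketch below is ours: it assumes a strongly skewed (exponential) population with a mean of 100, and it uses a simple skewness coefficient, which is zero for a symmetrical distribution, as a rough indicator of departure from normality.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Average cubed standardized deviation; 0 for a symmetrical curve."""
    m = mean(values)
    s = pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


def simulate_sample_means(sample_size, replications, rng):
    """Means of repeated samples from a skewed population (mean 100)."""
    return [
        mean(rng.expovariate(1 / 100) for _ in range(sample_size))
        for _ in range(replications)
    ]


rng = random.Random(42)
means_small = simulate_sample_means(5, 2000, rng)
means_large = simulate_sample_means(100, 2000, rng)

skew_small = skewness(means_small)
skew_large = skewness(means_large)

print(round(skew_small, 2), round(skew_large, 2))
```

The means of samples of 5 remain visibly skewed, whereas the means of samples of 100 are much closer to symmetry, even though the underlying population is the same in both cases.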

1.4 The form in which the data is stored and in which it can be accessed

Many statistical tests require that the data for the observations being compared must be stored in the same file. This is the case, for example, when change scores are to be calculated. If the score in T(1) is to be subtracted from the score in T(2) it is obvious that the two values must be in the same file. In some cases, due to the limited capacity of the computers being used, it may not be possible to merge the files from the first and second survey. 2/ In these cases it is necessary to use tests which do not require the combining of the two scores.
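The merging step needed before change scores can be calculated is sketched below. The family identifiers and incomes are invented, and the two dictionaries stand in for the T(1) and T(2) survey files; a real study would read these from its own data files.

```python
# Hypothetical survey files, keyed by a family identification number.
survey_t1 = {"F001": 120, "F002": 95, "F003": 140}
survey_t2 = {"F001": 150, "F003": 135, "F004": 110}


def change_scores(t1, t2):
    """Income at T(2) minus income at T(1), for families found in both files."""
    in_both = sorted(t1.keys() & t2.keys())
    return {family: t2[family] - t1[family] for family in in_both}


changes = change_scores(survey_t1, survey_t2)
print(changes)
```

Family F002 dropped out after T(1) and family F004 was first interviewed in T(2), so neither has a change score. Only families present in both files can be used once the files are merged.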

1/ Cohen and Cohen, p. 49.

2/ An example is the use, by many research groups who only have access to a small computer, of the Mini-version of SPSS. This only permits the creation of files with less than 100 variables. As many surveys include 200 or more variables in each interview, it becomes extremely difficult to combine the two files. In these cases it is much easier to use tests which do not require the use of change scores or the merging of the files.

2. Defining the research hypothesis and the conditions under which it will be accepted or rejected

2.1 Defining the research hypothesis

Before any statistical test can be applied it is necessary to define the hypothesis which is to be tested. Although this may seem very obvious, in many research papers the hypothesis is never stated very clearly and the deductions which are made from the results of a statistical test are often not logically valid.

The first step is to define the expected outcome we wish to test. This is called the research hypothesis, maintained hypothesis or alternative hypothesis, depending on which author is consulted and on the approach which is used to test the hypothesis. We will use the term Research Hypothesis. For example we may believe that participation in a housing project will affect the income of the participants. We may believe that the income of participants will increase more than that of families in the control group or we may simply expect income to change differently for the experimental and control group but without being sure of the direction.

Let us assume that we expect income for participants to increase more than for the control group. Our research hypothesis might be stated as follows:

Research hypothesis: Between the first and second application of the interview (where the first interview is conducted at T(1) before the project begins, and the second is conducted at T(2) one year after families have occupied the project) the income of families in the project will increase more than the income of families in the control areas.

The problem now arises of deciding under what conditions we will consider our research hypothesis has been proved. If income in the experimental group increases by $200 and in the control group the increase is only $199, would we say that our hypothesis has been proved? How do we define more?

2.2 Defining the null hypothesis

To test a research hypothesis it is usually necessary to state the conditions under which we will consider it has not been proved. If these conditions can be rejected, we will then consider that our research hypothesis has been proved. The reason for this somewhat tortuous sounding approach will become clear when we examine the logic of the statistical test.

The conditions under which our research hypothesis will not be accepted are called the null hypothesis. In many cases, although not all, the null hypothesis will be that there is no difference between the groups or that no change has occurred. We must be careful to ensure that our null hypothesis is the logical opposite of our research hypothesis so that the rejection of the former implies the acceptance of the latter. For example to test our research hypothesis it would not be valid to use the following null hypothesis:

Null hypothesis: There will be no difference in the change in

income between the experimental and control groups.

The reason why this is not valid is that if the control group income increased more than the experimental group we would have to reject the null hypothesis. However, in this case it is clear that rejecting the null hypothesis does not logically imply that the research hypothesis is true. In this example the null hypothesis should be:

Null hypothesis: There will be no difference in the change in

income of the experimental and control groups or the control

group income will increase more.

If this null hypothesis were rejected it would be valid to accept our research hypothesis.

Null hypotheses are often presented in the following form:

H0: C(E) = C(C)   change in experimental group is equal to the change in the control group (the first null hypothesis)

H0: C(E) ≤ C(C)   change in the experimental group is less than or equal to control group (second null hypothesis)
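The difference between the two forms of null hypothesis can be made concrete with a sketch. It rests on assumptions that are ours rather than the text's: the difference in change between the groups is taken to be approximately normally distributed, its standard error is treated as known, and all figures are invented.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def p_first_null(difference, standard_error):
    """Probability of so large a difference in either direction under
    the first null hypothesis, C(E) = C(C)."""
    z = difference / standard_error
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def p_second_null(difference, standard_error):
    """Probability of so large a difference in favor of the experimental
    group under the second null hypothesis, C(E) <= C(C)."""
    z = difference / standard_error
    return 1 - STANDARD_NORMAL.cdf(z)


# Hypothetical result: income in the control group rose by 50 more than
# in the experimental group, so C(E) - C(C) = -50.
difference = -50.0
standard_error = 20.0

p_first = p_first_null(difference, standard_error)
p_second = p_second_null(difference, standard_error)

print(round(p_first, 4), round(p_second, 4))
```

With these invented figures the first null hypothesis would be rejected at the 0.05 level, even though the result runs against the research hypothesis. The second null hypothesis is not rejected, which is the logically correct outcome.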

2.3 Defining the conditions under which the null hypothesis will be rejected

The participants who were interviewed were a sample drawn from the population of all project participants. If another sample were drawn from the same population, it is likely the average income would be slightly different. If one continued to draw samples, in each case one would probably find a different average income. However, the average of all the samples, if one took enough, would be the same as the average income of all project participants.

Similarly the control group is a sample of families drawn from all families who form part of the population we have defined as the control group. Again, if we drew another sample from the control group we would probably find that the average income was slightly different from the first sample. These differences are called sampling error. The fact that there are differences between the sample means of repeated samples does not contradict the fact that all samples come from the same population.

When we have a null hypothesis of the form:

$H_0: C(E) = C(C)$

what this is essentially saying is that both samples are drawn from the same population and that any differences which are found are due simply to sampling error. In this case the null hypothesis would be saying that participants and control group come from the same population. The question we have to answer is: how large a difference would be required between C(E) and C(C) before we would feel secure in rejecting the null hypothesis and stating that the two samples come from different populations (i.e. that there really is a difference between the change in income and that we can accept the research hypothesis)?

Fig. 4 represents a theoretical distribution which would be obtained if we repeatedly took two samples from a population and plotted the difference between their means. The distribution would approximate a normal curve. This curve has a mean of 0 (i.e. if the two samples come from the same population the most common difference between them would be 0) and a standard deviation S. We know that approximately 95% of the samples would have a value which fell within the range of ± 1.96 S of the mean. 1/ From this it follows that the difference between the means of any given pair of samples has only a 5% chance of differing by more than 1.96 S from the true population mean. Fig. 4 shows that half of these 5% of cases would fall above the mean (to the right of D) and half would fall below the mean (to the left of B). Similarly only 1% of sample means would fall at a distance greater than ± 2.58 S from the true population mean.

1/ For a more detailed discussion of the Normal Curve see Blalock (1972) Chapter 7.

[Fig. 4: Theoretical distribution of the difference between the means of two samples drawn from the same population]

It is possible to construct this theoretical frequency distribution on the basis of data from two samples. 1/ Let us suppose that S, the standard deviation of the curve, is estimated to be 2 and that we find that the difference between the means of our two samples is -2.8. This would mean that our sample had a value of:

-2.8 / 2 = -1.4 S

which is known as the Z score. This is shown as position Y on the curve. If we consult a table of the normal distribution we will find that .08076 of the curve lies below -1.4. To find the total area lying beyond ± 1.4 we must double this, and we find that approximately 16% of the samples fall in these two tails. This would indicate that there is a high probability of finding a difference between means as large as this if both samples came from the same population.

Now let us assume that the difference between the sample means was 5.6. This sample would have a Z score of:

5.6 / 2 = 2.8

This is shown by position X on the curve. When we consult the table we find there is only a probability of .00256 of finding a value this far above the mean. When we include the probability of finding a value this far below the mean the total probability becomes approximately .00512 or 0.5%. In this case it is extremely unlikely that we would find a value as large as this if the two samples did come from the same population.

1/ This is estimated by pooling the standard deviations from the two samples and taking a weighted average. If both samples come from the same population then both samples should have the same standard deviation. This is discussed further in Chapter 6 when we examine the T-Test.

It now becomes clear on what basis we can accept or reject the null hypothesis that the two samples come from the same population. We must decide a probability level for rejecting the null hypothesis. Traditionally social scientists tend to use the 0.05 and 0.01 probability levels for rejecting the null hypothesis. If we took the more stringent 0.01 level and we found a sample difference equal to Y, we would not reject the null hypothesis. In this case the probability of finding a value as extreme as Y is 0.16, which is greater than our rejection level. In the second example of X, we would reject the null hypothesis as a value as extreme as this has only a .005 chance of occurring. It is important to understand that we are not saying that the null hypothesis is true or false. What we are doing is stating the probability of obtaining a difference this large if the null hypothesis were true. When this probability is very small we decide to reject the null hypothesis. We must be aware however that even in the case of X there is a 0.005 chance of two samples being drawn from the same population and having a difference as large as 5.6. We accept this risk of being wrong when we reject the null hypothesis.

In social science research we may be quite prepared to accept a 0.05 chance of being wrong, but in medical research on new drugs or in testing components for a moon rocket it would be necessary to have much more stringent testing procedures and the rejection level may be put at 0.001 or even lower.
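The two worked examples above (positions Y and X) can be checked with a short sketch. It assumes the normal approximation described in the text, with S = 2 and a rejection level of 0.01; the function names are illustrative and do not come from the report.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_tailed_probability(difference, s):
    """Return (Z, p): the Z score of a difference between means and the
    probability of a difference at least this extreme in either direction
    when both samples come from the same population."""
    z = difference / s
    return z, 2.0 * normal_cdf(-abs(z))

REJECTION_LEVEL = 0.01  # the more stringent level used in the text's example

for label, diff in (("Y", -2.8), ("X", 5.6)):
    z, p = two_tailed_probability(diff, s=2.0)
    decision = "reject" if p < REJECTION_LEVEL else "do not reject"
    print(f"{label}: Z = {z:+.1f}, two-tailed p = {p:.4f} -> {decision} H0")
```

With these inputs the probability for Y is about 0.16 and for X about 0.005, in line with the figures quoted in the text.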

2.4 One- and two-tail tests

If our hypothesis does not indicate the direction of the difference we expect then an extreme value at either tail of the curve would be sufficient to reject our null hypothesis. If however, our research hypothesis is that the experimental group should have a higher mean than the control group, then we may only be interested in extreme cases at the right tail of the curve. In Fig. 4 position C shows that 5% of samples can be expected to fall below -1.64 S. If we know the direction we are looking for, a lower Z score or difference from the population mean will be sufficient to reject the null hypothesis. If we found a Z score of -1.64 with a null hypothesis which did not state the direction of difference, then this score would not be sufficient to reject the null hypothesis at the 0.05 level. If, however, our hypothesis did indicate that we expected the difference to be negative then we could reject the null hypothesis on the basis of this score.

When our hypothesis does indicate direction we use a one-tail test; when our hypothesis does not state direction we use a two-tail test.
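The difference between the two kinds of test can be illustrated with a small sketch. The Z score of -1.70 is a hypothetical value chosen for illustration (it is not taken from the report), and the function names are likewise assumptions.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(z, tails, direction="lower"):
    """Probability of a Z score at least as extreme as z.

    tails=2 ignores direction (two-tail test); tails=1 looks only at the
    tail named by `direction` ("lower" or "upper"), as in a one-tail test.
    """
    if tails == 2:
        return 2.0 * normal_cdf(-abs(z))
    if direction == "lower":
        return normal_cdf(z)
    return 1.0 - normal_cdf(z)

z = -1.70  # hypothetical Z score, in the direction the hypothesis predicted
one_tail = p_value(z, tails=1, direction="lower")
two_tail = p_value(z, tails=2)
print(f"one-tail p = {one_tail:.4f}, two-tail p = {two_tail:.4f}")
```

The same Z score falls below the 0.05 rejection level under the one-tail test but not under the two-tail test, which is the point made in the paragraph above.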

2.5 Summary of procedures for defining hypotheses

The procedures can be summarized in the following steps:

1. Define a research hypothesis about the type and direction of the difference it is expected to find between the two groups.

2. Decide on the basis of 1 whether a one-tail or a two-tail test is appropriate.

3. Define the null hypothesis in such a way that its rejection will logically lend support to the research hypothesis.

4. Decide the level of rejection.

These steps can now be illustrated using the example of income:

Research hypothesis: Between T(1) and T(2) the income of families in the project (E) will increase more than the income of families in the control group (C).

Null hypothesis: $H_0: E \le C$

Level of rejection: 0.05 (one-tail test)

CHAPTER 4: STATISTICAL TESTS FOR MEASURING CHANGE WITH NOMINAL SCALE VARIABLES

1. Independent samples: The Chi-Square Test, PHI and Cramer's V

The Chi-Square test is used when we wish to test for the existence of a difference between 2 complete distributions. The fact that we are evaluating a nominal variable means that it is not possible to test for the difference between mean or median scores, as we normally do for interval or ordinal scales. The following example will illustrate the logic of the test.

A sample of 400 heads of household were interviewed in T(1) and information was obtained on their occupation. A new sample of 450 heads of household was interviewed in T(2) and information was again obtained on their occupations. Table 1 summarizes the information obtained.

Table 1: Occupation of head of household in independent samples drawn in 1975 and 1977

Occupation        1975 (T1)    1977 (T2)    Total

Agriculture          160          130        290
Manufacturing        110           90        200
Construction          60           70        130
Services              40          100        140
Transport             30           50         80

Total                400          450        850

In 1975 (T1) it can be seen that the main occupation was agriculture (160 out of 400) and the second most important was manufacturing. In 1977 (T2) agriculture is still the most important although the number has dropped to 130, but the second most important category is now Services which has increased from 40 to 100, whilst employment in manufacturing has dropped from 110 to 90. The question we wish to answer is whether there has been a statistically significant change in occupations between 1975 and 1977 or whether the differences are due to sampling error.

Our Research Hypothesis is simply that the distribution between occupational categories in 1975 (d1) will be different from the distribution in 1977 (d2). With $\chi^2$ our hypothesis is usually in this form as normally we do not have a specific hypothesis about the way the distribution will change. 1/ The null hypothesis will be stated as follows:

$H_0: d_1 = d_2$

Our research, or alternative, hypothesis can be stated as follows:

$H_1: d_1 \ne d_2$

In this case we will specify the 0.01 level for rejection of the null hypothesis.

The test statistic is calculated as follows:

$\chi^2 = \sum \frac{(O - E)^2}{E}$   (Equation 1)

where:

O = observed frequency
E = expected frequency on the basis of $H_0$

It can be seen that the test statistic is the sum, over all cells, of the squared difference between the observed and the expected frequency, divided by the expected frequency. The expected frequency of a cell under the null hypothesis of no difference between the two distributions is calculated as follows:

$E_{ij} = \frac{r_i \times c_j}{T}$   (Equation 2)

where:

$E_{ij}$ = expected frequency of the cell where row i and column j intersect
$r_i$ = total frequency of row i
$c_j$ = total frequency of column j
T = total frequency

1/ It is, however, possible to specify the expected frequencies for each category. This is rarely done, however, in the type of evaluation we are discussing.

In Table 1 the expected frequency of agricultural employment in 1975 would be:

$E = \frac{290 \times 400}{850} = 136.4$

The value of $\chi^2$ for this cell is:

$\frac{(160 - 136.4)^2}{136.4} = 4.08$

When the values are computed for all the cells the value of $\chi^2$ is found to be:

$\chi^2 = 66.7$

Before we can determine the significance of this value in a table of the $\chi^2$ distribution we must calculate the number of Degrees of Freedom (d.f.). This refers to the number of cells in the table which are independently determined and is calculated as follows:

d.f. = (c-1)(r-1)   (Equation 3)

where:

c = number of columns in the table (not including the total column)
r = number of rows in the table (not including the total row)

In the present case d.f. = (2-1)(5-1) = 1 x 4 = 4.

The logic of degrees of freedom can be understood as follows. If all of the marginal totals are known for the rows and columns we only need to know one number in each row to be able to calculate the other. For example 290 people in total worked in agriculture. If we know that 160 were in agriculture in 1975 then the number in 1977 must be: 290 - 160 = 130. Once we know one figure in a row the other is automatically determined. This means that (c-1) figures in each row are independently determined. In the same way if we know 4 figures in a column the fifth is automatically determined so that the number of independent figures in a column is (r-1), which in this case is 4.

Table 2 gives the values of $\chi^2$ for d.f. = 4 for various probability levels:

Table 2: Values of $\chi^2$ corresponding to different probability levels under the null hypothesis when d.f. = 4

             Probability Levels

d.f.    0.05      0.025     0.01      0.001

4       9.488     11.143    13.277    18.467

With d.f. = 4 a value of $\chi^2$ equal to 9.488 indicates that there is only a 5% probability of obtaining a value as high as this if the null hypothesis is true. If the value of $\chi^2$ is 13.277 there is only a 1% probability of obtaining a value as high as this if the null hypothesis is true. In our present example the value of $\chi^2$ is 66.7, which means that the probability of obtaining such a value if the null hypothesis were true is virtually nil. On this basis we can accept the alternative or research hypothesis that the two samples come from different populations. In our present study we would infer that there has been a change in the distribution of occupations between the time of the two samples. It is extremely important to appreciate that this analysis, which does not include a control group, does not permit us to infer that the change has been produced by the project.
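Equations 1 to 3 can be expressed as a minimal sketch. The function names are illustrative assumptions, and the only figures used are the margins and the observed frequency of the agriculture cell quoted in the text; `chi_square` accepts any table of counts given as a list of rows.

```python
def expected_frequency(row_total, col_total, grand_total):
    """Equation 2: expected cell frequency under the null hypothesis."""
    return row_total * col_total / grand_total

def cell_contribution(observed, expected):
    """One cell's term in the sum of Equation 1."""
    return (observed - expected) ** 2 / expected

def chi_square(table):
    """Equation 1 summed over every cell of a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = expected_frequency(row_totals[i], col_totals[j], grand_total)
            total += cell_contribution(observed, expected)
    return total

def degrees_of_freedom(n_rows, n_cols):
    """Equation 3, not counting the total row or total column."""
    return (n_cols - 1) * (n_rows - 1)

# Agriculture in 1975: row total 290, column total 400, grand total 850.
e_agriculture = expected_frequency(290, 400, 850)
print(f"expected frequency = {e_agriculture:.2f}")
print(f"cell contribution  = {cell_contribution(160, e_agriculture):.2f}")
print(f"d.f. for 5 rows x 2 columns = {degrees_of_freedom(5, 2)}")
```

The cell contribution differs slightly from the 4.08 shown in the text because the text rounds the expected frequency to 136.4 before squaring.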

It is worth mentioning again that the $\chi^2$ test tells us something about the characteristics of the total distribution but it does not indicate which categories have changed most or which of these changes is the most significant. If, for example, we wished to test the hypothesis that a change had occurred in the proportion of heads working in the service sector, we would not use the above form of the test. A more appropriate way to present and test the data is given below in Table 3: 1/

Table 3: Re-organization of the data from Table 1 to test a hypothesis about changes in employment in the service sector

Occupation        1975 (T1)    1977 (T2)    Total

Service sector        40          100        140
Other sectors        360          350        710

Total                400          450        850

The information on all non-service employment has been combined so that the previous 2 x 5 table has been collapsed to a 2 x 2 table. In this case the null hypothesis is:

H0: S(1) = S(2)

where:

S = employment in the service sector.

1/ When the table is reduced to this binary form a number of other statistical tests are possible including: T-Test for proportions (see Chapter 6), use of regression analysis with dummy variables (see Chapter 6) or the calculation of a correlation coefficient (PHI). This latter is presented in the following section.

When estimating $\chi^2$ with a 2 x 2 table it is necessary to use a modified form of the equation to adjust for continuity:

$\chi^2_{adj} = \sum \frac{(|O - E| - 0.5)^2}{E}$   (Equation 4)

In the present example $\chi^2_{adj}$ = 22.1 with d.f. = 1. Again the probability of obtaining a value this high if the null hypothesis were true is less than 0.001 and we can confidently assume that there is a difference between the two samples in terms of the proportion of heads who work in the service sector.

The test could be repeated to evaluate changes in a different occupational category if this was of interest to the investigator.
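The adjusted statistic of Equation 4 can be sketched as follows for the figures in Table 3. The function name is an assumption made for illustration; expected frequencies are derived from the table margins in the same way as for the unadjusted test.

```python
def chi_square_adjusted(table):
    """Equation 4: chi-square with the continuity correction, intended for
    a 2 x 2 table given as a list of two rows of two counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            total += (abs(observed - expected) - 0.5) ** 2 / expected
    return total

# Table 3: rows are service sector / other sectors, columns are 1975 / 1977.
table_3 = [[40, 100],
           [360, 350]]
print(f"adjusted chi-square = {chi_square_adjusted(table_3):.1f}")
```

For Table 3 this gives approximately 22.1, the value quoted in the text.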

1.1 Assumptions and conditions for the use of $\chi^2$

1. The two samples must be independent.

2. Not more than 20% of the Expected frequencies should be less than 5.

3. None of the Expected frequencies should be zero.

In a table with a large number of cells it is often necessary to combine cells to satisfy conditions 2 and 3. This can be done by combining cells which are similar or which are of little interest to the study.

1/ Siegel proposes a different form:

$\chi^2 = \frac{N(|AD - BC| - N/2)^2}{(A + B)(C + D)(A + C)(B + D)}$   (Equation 5)

1.2 Some limitations of $\chi^2$

There are two main limitations on the use and interpretation of $\chi^2$. The first is that as there is no upper limit, it is not possible to use $\chi^2$ to estimate the proportion of the variance which is explained. 1/ This means that although we can estimate the statistical significance of the relationship we cannot estimate the strength of its explanatory power. Thus we know that $\chi^2$ in the previous example is significant at the 0.001 level but we do not know how large the change is.

The second limitation is that the value of $\chi^2$ is strongly affected by the sample size. The larger the sample, the more likely one is to find a highly significant value of $\chi^2$. With a very large sample one could find that the difference between 49.9% and 50.1% is highly significant, even though a difference as small as this normally has no practical importance (unless of course you are a politician seeking re-election!).

With a 2 x 2 table these limitations can be overcome by the use of PHI, and when the table is larger than 2 x 2 they can partially be overcome by the use of Cramer's V.

1/ Variance can be understood if we think of the factors which determine height. Not all people are the same size. Some of the factors which determine height are age, weight and sex. If we found that the average height of everyone in the country was 4 ft. 6 inches we could then compare age, weight and sex individually to find out which was the best predictor of a person's height (the amount this person varied from the mean). The ability to predict height from each of these factors is defined as the contribution of the factor to explaining the variance in height.

1.3 The PHI coefficient

To calculate PHI it is necessary for the table to be reduced to a 2 x 2 form. PHI is equivalent to a correlation coefficient in that its maximum possible value is + 1 and the minimum is 0. PHI is calculated as follows:

$PHI = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}}$   (Equation 6)

If the value of $\chi^2$ is already known, PHI can be calculated more easily as:

$PHI = \sqrt{\frac{\chi^2}{N}}$   (Equation 7)

PHI has the advantage over $\chi^2$ that it provides an indication of the proportion of the variance which is explained. In the present case PHI is 0.164. As the proportion of variance explained is given by the square of a correlation coefficient, only about 2.7% of the variance is explained, which would suggest that this relationship is not one of the main determinants of employment sector. Care must be taken in the interpretation however, because with a 2 x 2 table the maximum possible value of the correlation is normally less than 1. 1/ In the example given in Chapter 12 the maximum possible value of PHI was estimated to be 0.55.

It is possible to estimate the significance level of PHI by consulting a table which gives the significance level of a correlation coefficient. In the present case the coefficient is found to be significant at the 0.001 level.

For the figures presented in Table 3, PHI would be calculated as follows:

$PHI = \frac{(100 \times 360) - (40 \times 350)}{\sqrt{(40 + 100)(360 + 350)(40 + 360)(100 + 350)}} = 0.1645$

1/ For a discussion of this point see Cohen and Cohen (1975) Chapter 2, page 37.

1.4 Cramer's V

When a table is larger than 2 x 2, PHI no longer has an upper limit of 1, so its usefulness is severely reduced. The value can be adjusted through using Cramer's V so that the maximum value becomes +1. This is calculated as follows:

$\text{Cramer's } V = \sqrt{\frac{PHI^2}{\min(r-1,\ c-1)}}$   (Equation 8)

This means that the square of PHI is divided either by the number of rows minus one, or by the number of columns minus one, whichever is the smaller, and the square root is then taken.
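Equations 6 to 8 can be sketched as below, using the cell labels A, B, C, D for the figures of Table 3 (A = 40, B = 100, C = 360, D = 350). The function names are illustrative assumptions, and the unadjusted $\chi^2$ of about 22.99 passed to Equation 7 is computed here from Table 3 rather than quoted from the report.

```python
import math

def phi_from_cells(a, b, c, d):
    """Equation 6: PHI from the four cells of a 2 x 2 table."""
    return (b * c - a * d) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def phi_from_chi_square(chi2, n):
    """Equation 7: PHI from an unadjusted chi-square and the sample size."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, n_rows, n_cols):
    """Equation 8, with PHI squared written as chi-square divided by N."""
    return math.sqrt((chi2 / n) / min(n_rows - 1, n_cols - 1))

phi = phi_from_cells(40, 100, 360, 350)
print(f"PHI = {phi:.4f}, PHI squared = {phi ** 2:.3f}")
print(f"PHI from chi-square = {phi_from_chi_square(22.99, 850):.4f}")
```

For a 2 x 2 table min(r-1, c-1) is 1, so Cramer's V reduces to PHI; the divisor only matters for larger tables.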

2. Related samples: McNemar Test

The McNemar Test is derived from the Chi-Square Distribution but can be used with related samples. The test can only be used with dichotomous variables and to apply the test the variable must be presented in the following form:

Table 4: Employment status in T(1) and T(2)

                               T(2)
                     Unemployed     Employed

T(1)  Employed         60 (A)       150 (B)
      Unemployed       80 (C)        10 (D)

This table summarizes the results of a study on unemployment. In T(1) a total of 90 people were unemployed and in T(2) this figure had increased to 140. 230 people did not change their employment status between T(1) and T(2) but 60 previously employed people were unemployed in T(2) compared with only 10 previously unemployed who were working in T(2). The test is based on a comparison of the people who change, to determine whether there is a significant difference between the number of people who changed from the negative to positive category (unemployed to employed) and the number who changed from positive to negative.

As only those people who have changed are included in the analysis, the samples of those who change from positive to negative and negative to positive are now independent. The results can be transformed to produce a $\chi^2$ distribution in the following way:

$\chi^2 = \frac{(A - D)^2}{A + D}$ with 1 degree of freedom.   (Equation 9)

The most exact form of the equation requires a correction for continuity (because there is only 1 degree of freedom), expressed as follows:

$\chi^2 = \frac{(|A - D| - 1)^2}{A + D}$ with 1 degree of freedom.   (Equation 10) 1/

In the example given in Table 4 the value of $\chi^2$ would be calculated from Equation 10 as follows:

$\chi^2 = \frac{(|60 - 10| - 1)^2}{60 + 10} = 34.3$   (d.f. = 1)

1/ This is the form recommended by Siegel (1956). A number of other authors recommend subtracting 0.5 instead of 1.

The probability of achieving a score as high as 10.828 if there is no difference between A and D is only .001, so the probability of finding a score as high as 34.3 is virtually nil. We can therefore reject the null hypothesis that there is no difference, and conclude that the number of people who have lost their jobs between T(1) and T(2) is significantly higher than the number of unemployed in T(1) who were working in T(2).
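The McNemar calculation for Table 4 can be checked with a minimal sketch. The function name and the `correction` flag are illustrative assumptions; with the flag off the function follows Equation 9, and with it on, Equation 10.

```python
def mcnemar_chi_square(a, d, correction=True):
    """McNemar statistic from the two 'changer' cells A and D.

    correction=True applies the continuity correction of Equation 10
    (subtracting 1, the form attributed to Siegel in the text);
    correction=False gives Equation 9.
    """
    difference = abs(a - d)
    if correction:
        difference -= 1
    return difference ** 2 / (a + d)

CRITICAL_0_001 = 10.828  # chi-square value at the 0.001 level for d.f. = 1

corrected = mcnemar_chi_square(60, 10)
uncorrected = mcnemar_chi_square(60, 10, correction=False)
print(f"Equation 10: {corrected:.1f}, Equation 9: {uncorrected:.1f}")
print("reject H0" if corrected > CRITICAL_0_001 else "do not reject H0")
```

Either form lies far above the 0.001 critical value for these figures, so the choice of correction does not affect the conclusion here.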

2.1 A limitation of the McNemar Test

To achieve independent samples and thus permit the use of $\chi^2$ the test rejects all cases where there was no change between the two time periods. In a situation where the majority of subjects do not change, the test can be very misleading as it can produce a highly significant result, but one which could be based on an extremely small number of cases. To take an extreme example: if only 10 people out of a population of 1000 changed their employment status, with 9 changing from unemployed to employed and the remaining 1 case changing from employed to unemployed, the test would show a statistically significant result. To deduce from this that a major change in employment status had occurred would obviously be very misleading.

2.2 Applying the McNemar Test with SPSS

The McNemar Test is available in Version 7 of SPSS but not in earlier versions or the Miniversion. If Version 7 is not available the actual computation of Equation 10 must be done manually. However, this is very simple as SPSS can produce a 2 x 2 table which gives the frequencies of A, B, C and D. In the following example information was obtained on employment type in T(1) and again in T(2). Employment in T(1) was Variable 10 in the code and Employment in T(2) was Variable 20. Employment was classified into the following types:

Agriculture ........ 1
Manufacturing ...... 2
Services ........... 3
Transportation ..... 4
Construction ....... 5

As the McNemar Test can only analyse a dichotomous variable it was decided to compare changes in Agricultural employment. Employment is thus reduced to a dichotomous variable: Agriculture and other.

The 2 x 2 table is produced as follows:

IF (VAR010 EQ 1 AND VAR020 NE 1) VARX = 1
IF (VAR010 EQ 1 AND VAR020 EQ 1) VARX = 2
IF (VAR010 NE 1 AND VAR020 NE 1) VARX = 3
IF (VAR010 NE 1 AND VAR020 EQ 1) VARX = 4

When a frequency table of VARX is produced, the values of VARX correspond to the values of A (VARX = 1), B (VARX = 2), C (VARX = 3) and D (VARX = 4).

The value of $\chi^2$ must then be calculated manually.
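For readers without access to SPSS, the same recoding and the manual calculation can be sketched in Python. This is an illustrative equivalent only: the function name and the six records are hypothetical and are not data from the study.

```python
from collections import Counter

AGRICULTURE = 1  # employment code for agriculture in the list above

def classify(code_t1, code_t2):
    """Mirror the four IF statements: return the cell (A, B, C or D)
    for one person's employment codes in T(1) and T(2)."""
    in_t1 = code_t1 == AGRICULTURE
    in_t2 = code_t2 == AGRICULTURE
    if in_t1 and not in_t2:
        return "A"  # VARX = 1: left agriculture
    if in_t1 and in_t2:
        return "B"  # VARX = 2: stayed in agriculture
    if not in_t1 and not in_t2:
        return "C"  # VARX = 3: stayed outside agriculture
    return "D"      # VARX = 4: entered agriculture

# Hypothetical records: (employment code in T(1), employment code in T(2)).
records = [(1, 3), (1, 1), (2, 2), (4, 1), (1, 5), (3, 3)]
counts = Counter(classify(t1, t2) for t1, t2 in records)

# Equation 10, computed from the two cells that record a change.
a, d = counts["A"], counts["D"]
chi2 = (abs(a - d) - 1) ** 2 / (a + d)
print(dict(counts), f"chi-square = {chi2:.2f}")
```

With so few hypothetical cases the statistic itself is meaningless; the sketch only shows how the cell frequencies feed Equation 10.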

CHAPTER 5. STATISTICAL TESTS FOR MEASURING CHANGE WITH ORDINAL SCALE VARIABLES

1. The relationship between ordinal and interval measurement scales

An ordinal scale is used when positions on a scale can be ranked from lowest to highest but where there is no evidence to demonstrate that the intervals between the scale positions are equal. Due to this there are severe limitations on the type of statistical analysis which can be conducted with ordinal variables. There has been considerable discussion in the research literature about the effect of treating these scales as if in fact they were equal interval scales. If this is possible it has the great advantage that a wider range of more powerful statistical techniques can be used. The following quotation from Cohen and Cohen (1975) illustrates this point of view.

"Both theoretical statistical analysis and many empirical results, however, suggest that such scales (ordinal) have "equal enough" intervals for most purposes. Subjectively approximately equal intervals have been found to behave more like equal than like unequal intervals. For example their regression lines with other variables are usually reasonably straight, and when not, the fault is likely to be with the other variable." 1/

Whilst this conclusion may be true, if the researcher does not have access to an expert statistician it may be advisable to adopt a more conservative approach and assume the intervals are not equal. 2/ In this case the types of statistical tests to use are those presented in the following sections.

1/ Cohen and Cohen (1975) p. 263.

2/ One method for determining whether the scale approximates a normal distribution is illustrated in Chapter 12, Section 2.

2. Independent samples: The Median Test 1/

Let us assume that a study to measure attitudes towards trade unions has been administered. A 9 point scale is used where +4 indicates an extremely positive attitude and -4 indicates an extremely negative attitude. Table 5 gives the results of the application of the scale to a sample of 110 families in a new housing settlement and 180 families in control areas. It can be seen that only 25 of the participants have an attitude less favorable than zero with none below -1. On the other hand 90 of the control group are below zero and 30 are below -1.

To use the test the median score for the combined sample must be calculated. This is done by producing a cumulative frequency distribution as shown in Table 5.

With a total of 290 observations the median lies between the value of the 145th and 146th cases, which means that it lies between the value of +1 and 0. We now produce a table to indicate how many cases in the participant and control groups respectively fall above and below the median value. This is shown in Table 6.

1/ See Siegel (1965) pages 111-116.

Table 5: Application of a scale of attitudes to trade unions to participants in a new housing project and to a control group

                               Experimental    Control
                               Group           Group         Cumulative
Attitude                       Frequency       Frequency     Frequencies

+4 extremely favorable              0               5              5
+3                                 10               0             15
+2                                 25              20             60
+1                                 35              50            145
 0 neutral                         15              15            175
-1                                 25              60            260
-2                                  0               5            265
-3                                  0              15            280
-4 extremely unfavorable            0              10            290

Total                             110             180            290

Table 6: Number of cases falling above and below the median in the participants and control groups

                             Participants     Control

Cases above the median          70 (A)         75 (B)
Cases below the median          40 (C)        105 (D)

Total                          110            180

The distribution can be transformed to correspond to the $\chi^2$ distribution in the following way:

$\chi^2 = \frac{N(|AD - BC| - N/2)^2}{(A+B)(C+D)(A+C)(B+D)}$   (Equation 11)

with d.f. = 1

For the present example the value of $\chi^2$ would be:

$\chi^2 = \frac{290\,(|70 \times 105 - 75 \times 40| - 290/2)^2}{(70+75)(40+105)(70+40)(75+105)} = 12.3$

The probability of this value occurring by chance if the two groups come from the same population is less than 0.001 (for which the critical value is 10.828), so we can safely conclude that the two samples are drawn from different populations.
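The whole Median Test, from the frequencies of Table 5 to the statistic of Equation 11, can be traced with a short sketch. The variable and function names are illustrative assumptions; the frequencies are those printed in Table 5.

```python
import statistics

scores       = [4, 3, 2, 1, 0, -1, -2, -3, -4]
experimental = [0, 10, 25, 35, 15, 25, 0, 0, 0]   # participants
control      = [5, 0, 20, 50, 15, 60, 5, 15, 10]

# Median of the combined sample, built from the two frequency columns.
combined = [s for s, e, c in zip(scores, experimental, control)
            for _ in range(e + c)]
median = statistics.median(combined)

def split_at_median(frequencies):
    """Count the cases of one group falling above and below the median."""
    above = sum(f for s, f in zip(scores, frequencies) if s > median)
    below = sum(f for s, f in zip(scores, frequencies) if s < median)
    return above, below

a, c = split_at_median(experimental)  # cells A and C of Table 6
b, d = split_at_median(control)       # cells B and D of Table 6

n = a + b + c + d
chi2 = (n * (abs(a * d - b * c) - n / 2) ** 2
        / ((a + b) * (c + d) * (a + c) * (b + d)))
print(f"median = {median}, A={a} B={b} C={c} D={d}, chi-square = {chi2:.1f}")
```

Here no score coincides with the median, so every case falls clearly above or below it; a scale on which some scores equal the median would need a rule for those cases, which this sketch does not cover.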

3. Related samples: The Sign Test and Wilcoxon's Matched Pairs Signed Ranks Test

Two different tests are available, the choice depending on the strength of the level of measurement. In some instances it is only possible to indicate that one score is greater than another without being able to estimate the magnitude of the difference. An example of this weak level of measurement would be the classification of subjects into two groups according to whether or not their attitudes had become more favorable to participation in mutual help construction. In this case it might not be considered possible to measure the degree of improvement of attitude. With a weak measurement of this kind the appropriate test is the Sign Test.

In other cases it is also possible to rank subjects on a scale of order of magnitude. The example in Table 5 of a 9 point scale to measure attitudes to trade unions would be a typical scale of this stronger level of ordinal measurement. In this case the appropriate test would be the Wilcoxon Matched Pairs Signed Ranks Test. An example of each test is given below.

3.1 The Sign Test - weak ordinal measurement 1/

The Sign Test is used when the level of measurement is weak and it is not possible to rank cases on an ordinal scale but where it is possible to classify scores into two groups. Typical examples would be the classification of people into "Well informed" and "Poorly informed" on a particular social issue, or into "High level of social participation" and "Low level of social participation". If A is the classification of a person in T(1) and B is the classification of the same person in T(2), the null hypothesis would be:

$p(A > B) = p(A < B) = 0.5$

where:

$p(A > B)$ = probability that A is greater than B.

To administer the test a 2 x 2 table of the form shown in Table 7 must be constructed.

1/ See Siegel (1965) pages 68-75.

Table 7: Hypothetical 2 x 2 table to apply the Sign Test to evaluate changes in the level of social participation

                                     Level of participation in T(2)

Level of participation in T(1)          High           Low

Low                                    (A) 80        (B) 120
High                                   (C) 40        (D) 100

The table shows that a sample of 340 families were interviewed in T(1) and again in T(2). A total of 180 changed their level of participation (80 went from low to high and 100 went from high to low). The test will determine the probability of finding 100 cases where participation declined compared with 80 where it increased if the null hypothesis:

$H_0: A = D$

is true.

There are two forms of the test depending upon whether (A+D) is greater than or less than 25.

3.1.1. Application of the test when (A+D) < 25

To illustrate the procedures we will simply divide the scores in

Table 7 by 10 so that:

A = 8 and D = 10.

The purpose of the test is to determine whether the higher number of cases where the participation level decreased is statistically significant or simply due to sampling error. The following steps are used in the analysis:

i. Select the smaller of the two values (in this case A).

ii. Add the two values. A + D = 18

iii. Use a Binomial distribution Table 1/ and calculate the probability of encountering 8 or fewer cases of A when P(A) = 0.5 and N (total number of cases or A+D) = 18. It will be found that the probability is 0.4073 which means that the null hypothesis cannot be rejected.
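The binomial probability in step iii can also be computed directly rather than read from a table. The sketch below assumes Python 3.8 or later (for `math.comb`); the function name is illustrative.

```python
from math import comb

def sign_test_probability(x, n):
    """Cumulative binomial probability of x or fewer cases out of n
    when each outcome has probability 0.5 (the null hypothesis)."""
    return sum(comb(n, k) for k in range(x + 1)) / 2 ** n

# A = 8 is the smaller value and N = A + D = 18.
p = sign_test_probability(8, 18)
print(f"P(8 or fewer of 18) = {p:.4f}")
```

This is a cumulative probability, which is the form needed for the test; the exact probability of precisely 8 cases would be smaller.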

1/ When using a binomial distribution, read the figures in the table with extreme care since different texts present different forms of the same table. Some tables will indicate cumulative probability while others show the exact probability.

3.1.2. Application of the test when (A+D) > 25

When (A+D) is greater than 25, as with the figures presented in Table 7, we use the Normal Distribution and calculate the corresponding Z score as follows:

$Z = \frac{(x \pm 0.5) - 0.5N}{0.5\sqrt{N}}$   (Equation 12)

where:

x = A or D, depending on the hypothesis being tested
N = (A + D)

i. When x > .5N subtract 0.5 from x. When x < .5N add 0.5 to x.

ii. To allow comparison with the binomial result above, the calculation is illustrated with the reduced figures (x = 8, N = 18):

$Z = \frac{(8 + 0.5) - (0.5)(18)}{0.5\sqrt{18}} = -0.2357$

iii. The area in the tail of the normal curve beyond Z = -0.2357 is approximately .407, which is the probability of finding a score equal to or less than 8. This agrees closely with the exact binomial value of .4073 and is too high to reject the null hypothesis.
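Equation 12 and the tail probability that goes with it can be sketched as follows, using the same values of x = 8 and N = 18 as the worked calculation. The function names are illustrative assumptions, and the tail area is computed from the error function rather than read from a table.

```python
import math

def sign_test_z(x, n):
    """Equation 12: Z score for the Sign Test with the 0.5 correction,
    added when x is below 0.5N and subtracted when x is above it."""
    corrected = x + 0.5 if x < 0.5 * n else x - 0.5
    return (corrected - 0.5 * n) / (0.5 * math.sqrt(n))

def lower_tail(z):
    """Area of the standard normal curve lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = sign_test_z(8, 18)
p = lower_tail(z)
print(f"Z = {z:.4f}, probability of a score this low or lower = {p:.3f}")
```

The normal approximation gives a tail probability very close to the exact cumulative binomial value for the same figures, even though N = 18 is below the threshold of 25 at which the text recommends switching methods.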

3.2 The Wilcoxon Matched Pairs Signed Ranks Test for stronger ordinal

measurement 1/

The Wilcoxon Matched Pairs Signed Ranks Test is used when it is possible to rank each case on an ordinal scale with a relatively large number of positions. In the following examples we use a scale of satisfac- tion with housing where the minimum possible score is 0, indicating complete dissatisfaction and the maximum is 20, indicating complete satisfaction.

The scale is ordinal because although we can assume that a score of 15 indicates a greater level of satisfaction than a score of 14, we cannot assume that the difference between the score of 14 and 15 is equal to the difference between a score of 9 and 10. As with thie Sign Test there are two different ways to apply the test, depending upon whether the number of cases is greater or lesser than 25.

3.2.1. Application of the Wilcoxon Test when n < 25

Table 8 illustrates a hypothetical scale of satisfaction with housing. A sample of 15 people were asked to indicate their degree of

satisfaction with their housing in T(1) and again in T(2). The score on

a scale of 0 to 20 is shown in the table. The lowest score in T(1) is 2

(Case 8) and the highest is 19 (Case 15). The following steps are used

to apply the test:

1/ See Siegel (1965) pages 75-83 and Blalock (1972) Chapter 14.

Table 8: Results of a hypothetical study on satisfaction with housing in T(1) and T(2) (N=15)

          Satisfaction score             Signed rank    Rank with least
Case No.    T(1)     T(2)        d          of d          common sign

    1        15       19         4           9.5
    2        12       11        -1          -1               -1
    3         8       13         5          12.5
    4         9        7        -2          -3.5             -3.5
    5        14       18         4           9.5
    6         6       10         4           9.5
    7         3        6         3           6.5
    8         2        9         7          14
    9         9        7        -2          -3.5             -3.5
   10         8       10         2           3.5
   11         7       12         5          12.5
   12         6        8         2           3.5
   13        14       17         3           6.5
   14        13       13         0           -
   15        19       15        -4          -9.5             -9.5

                                                         T = -17.5

i. The score of each case in T(1) and T(2) is

presented as illustrated in the table.

ii. The difference (d) between the two scores is computed. If the second score is greater the d score will be positive (as for the first case) and if the second is lower, the d score will be negative (as in the second case).

iii. The cases are ranked in order of the magnitude of d (irrespective of the sign of the difference). If there is no change the case is excluded from the analysis (Case 14). The least change is case 2 where d = -1. This is given a rank of -1 (1 because it is the lowest and a negative sign because the sign of d was negative).

iv. There are 4 cases where the absolute change is 2 (cases 4, 9, 10 and 12). When the ranks are tied each case is given the average rank. As the ranks would be 2, 3, 4 and 5 each receives a rank of:

$$\frac{2 + 3 + 4 + 5}{4} = 3.5$$

For those cases where the sign of d is negative, the rank is given as -3.5 whereas for those cases where d is positive the rank is given as +3.5.

v. Each case is ranked in this way finishing with case 8 which is given a rank of 14 (remember case 14 where d = 0 was eliminated).

vi. The frequency of positive and negative ranks

is compared to determine which sign is least

frequent. In this case there are 10 cases with

a positive sign and only 4 with a negative sign

so the negative sign is the less frequent. The

ranks of the cases with least common sign are

recorded in the final column.

vii. The values of the ranks in the final column are added to give the T score which in this case is:

(-1) + (-3.5) + (-3.5) + (-9.5) = -17.5

viii. The null hypothesis is that there is no change between T(1) and T(2). If this is true the number and size of negative and positive ranks should be approximately equal with the only difference being due to sampling error. The average positive and negative rank would be 7.5 (estimated as the average of the lowest possible rank which is 1 and the highest possible which is 14). In this case we would expect to find 7 cases with negative signs (ignoring the one tied case), and each of these would have an average rank of 7.5 which means the value of T would be approximately 52.5.

The purpose of the test is to determine the

probability of finding a T score as low as 17.5

when the expected score with 14 cases is 52.5.

ix. A special table has been developed to test the

significance of the difference. The table

indicates the probability of achieving certain

T scores for a given number of cases. Assuming

we are using a one-tail test, because our null

hypothesis is that there will be no increase,

the probabilities of achieving different T scores

are as follows:

Table 9: The probability of achieving certain T scores with a one-tail test when n = 14 (Wilcoxon matched pairs signed ranks test) 1/

                          Level of significance of certain T scores

                            0.05     0.025     0.01     0.005

T score when n = 14          26        21       16        13

x. The score of 17.5 (ignoring the sign) falls between 21 and 16 which means that the probability of achieving this score is greater than 0.01 but less than 0.025. This means that one can reject the null hypothesis and accept the alternative hypothesis that satisfaction with housing has in fact increased.
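Steps i to vii can be checked with a few lines of code. The following is a minimal sketch in Python using the scores of Table 8; the function name `wilcoxon_t` is an illustrative assumption, and the function returns the magnitude of T (without the sign carried in the table).

```python
def wilcoxon_t(before, after):
    """Return (T, n): T is the sum of the ranks carrying the less frequent
    sign, n is the number of pairs left after dropping zero differences."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    ordered = sorted(abs(d) for d in diffs)
    # Tied differences share the average of the ranks they occupy.
    avg_rank = {}
    for value in set(ordered):
        first = ordered.index(value) + 1   # first rank taken by this value
        count = ordered.count(value)       # number of tied cases
        avg_rank[value] = first + (count - 1) / 2
    positive = [avg_rank[abs(d)] for d in diffs if d > 0]
    negative = [avg_rank[abs(d)] for d in diffs if d < 0]
    less_frequent = negative if len(negative) <= len(positive) else positive
    return sum(less_frequent), len(diffs)

# Satisfaction scores of the 15 cases in Table 8.
t1 = [15, 12, 8, 9, 14, 6, 3, 2, 9, 8, 7, 6, 14, 13, 19]
t2 = [19, 11, 13, 7, 18, 10, 6, 9, 7, 10, 12, 8, 17, 13, 15]
T, n = wilcoxon_t(t1, t2)
print(T, n)  # 17.5 14
```

The value of T is then compared with the tabulated critical values for the corresponding n, as in Table 9.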

1/ Kmietowicz and Yannoulis (1976) Table 23

3.2.2. Application of Wilcoxon Test when n > 25

When n is greater than 25 the distribution of T scores approximates a normal distribution and a Z score can be calculated as follows:

$$Z = \frac{T - \frac{N(N+1)}{4}}{\sqrt{\frac{N(N+1)(2N+1)}{24}}}$$ (Equation 13)

The application of this test will be illustrated with the hypo-

thetical data on a scale of housing quality which is presented in Table 10.

The scale was produced by adding different components of a house which have

been evaluated on a 3 point scale (Bad-Average-Good). Observations have

been obtained on a sample of 30 households in T(1) and T(2). Our Research

Hypothesis is that the quality of housing will have improved so the Null hypothesis is expressed as follows:

$$H_0: Q(1) \geq Q(2)$$

where Q = Quality of housing measured in T(1) and T(2).

Given the form of the null hypothesis it will be appropriate to use

a One-tail test as we have specified the direction of the change we expect.

The T score is calculated in exactly the same way as for the previous example. In this case the number of ranks with a negative sign is 15 and the number with a positive sign is 14, so the positive sign is the less frequent. T is therefore the sum of the positive ranks shown in the final column of Table 10:

7(5.5) + 4(13) + 18.5 + 23 + 26 = 158

As the number of positive and negative ranks is almost equal the value of T is much greater than in the previous example.

Table 10: Results of a Hypothetical Study of Housing Quality (N=30)

            Housing Quality              Signed rank    Rank with least
Case No.    T(1)     T(2)        d          of d          common sign

    1        23       19        -4         -23
    2        24       18        -6         -28
    3        16       17         1           5.5              5.5
    4        15       14        -1          -5.5
    5        19       19         -           -
    6        10       11         1           5.5              5.5
    7        14       16         2          13               13
    8        15       12        -3         -18.5
    9        16       15        -1          -5.5
   10        23       18        -5         -26
   11        24       21        -3         -18.5
   12        21       16        -5         -26
   13        17       18         1           5.5              5.5
   14        19       21         2          13               13
   15        23       21        -2         -13
   16        17       14        -3         -18.5
   17        18       19         1           5.5              5.5
   18        19       24         5          26               26
   19        17       19         2          13               13
   20        12       13         1           5.5              5.5
   21        16       19         3          18.5             18.5
   22        15       19         4          23               23
   23         8        9         1           5.5              5.5
   24        12        9        -3         -18.5
   25        14       10        -4         -23
   26        15       12        -3         -18.5
   27        17       18         1           5.5              5.5
   28        19       21         2          13               13
   29        10        9        -1          -5.5
   30        18       10        -8         -29

                                                          T = 158

Using Equation 13 with N = 29 (case 5, where d = 0, is excluded) we obtain the following Z score:

$$Z = \frac{158 - \frac{29(30)}{4}}{\sqrt{\frac{29(30)(58+1)}{24}}} = \frac{158 - 217.5}{46.25} = -1.29$$

The negative Z score shows that the sum of the positive ranks is smaller than would be expected under the null hypothesis, so the observed change is in the opposite direction to that stated in the research hypothesis. The probability of finding a sum of positive ranks at least this large if the null hypothesis is true is approximately .90. We therefore cannot reject the null hypothesis.
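The normal approximation of Equation 13 is easy to program. The following is a minimal sketch in Python; the function names are illustrative assumptions, and the example uses N = 29, the number of non-tied cases in Table 10.

```python
import math

def wilcoxon_null_moments(n):
    """Mean and standard deviation of T when the null hypothesis is true."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return mean, sd

def wilcoxon_z(t, n):
    """Equation 13: Z score for an observed T with n non-tied pairs."""
    mean, sd = wilcoxon_null_moments(n)
    return (t - mean) / sd

mean, sd = wilcoxon_null_moments(29)
print(mean, round(sd, 2))  # 217.5 46.25
```

Because the ranks 1 to N sum to N(N+1)/2, no observed T can exceed that total (435 when N = 29), which provides a quick check on hand calculations.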

CHAPTER 6 STATISTICAL TESTS FOR MEASURING CHANGE WITH INTERVAL SCALE VARIABLES

1. Three analytical approaches to the study of change with interval variables

With interval variables there are a number of significantly

different analytical approaches which can be used. The three approaches which will be discussed in this section are the following:

i. Use of the T-Test to estimate the significance

of a difference between means or proportions.

ii. Use of confidence intervals to estimate the

magnitude of the difference between means.

iii. Use of regression and covariance analysis.

The traditional approach in the evaluation of differences is to use

the T-Test to determine whether the difference between two sample means is

statistically significant. A criticism of this approach is that statistical significance is often confused with important or meaningful differences. This criticism is expressed in the following quote (Cohen and Cohen, 1975):

"Despite the preoccupation (some critics would sub-

stitute obsession) of the behavioral sciences with

quantitative methods, the level of consciousness in

many areas of just how big things are is at a sur-

prisingly low level. This is because concern about

the statistical significance of effects (whether they

exist at all) has tended to pre-empt attention to

their magnitude. That significant effects may be

small and non-significant ones large, is a truism.

...... Yet many research reports, at least implicitly,

confuse the issues of size and statistical significance,

using the latter as if it meant the former."

To overcome this problem a second approach is to estimate the confidence intervals for the difference of means. This indicates the level of confidence which can be placed upon different estimates of the magnitude of the difference. This has the advantage that the researcher can judge whether the size of the difference is meaningful. In many cases it will be decided that although a statistically significant difference has been found, it is so small as to have no practical utility.

A further disadvantage of the use of the T-Test is that when a panel study uses an experimental and control group, each observed at 2 points in time, the 4 observations must be reduced to 2 by the use of change scores.

A Change Score is calculated as:

Change = Score(2) - Score (1) (Equation 14)

where: (1) and (2) refer to the two time periods being compared.

Change = Change Score

The use of Change Scores has two serious disadvantages. The first is that the

Score in T(2) tends to be correlated with the score in T(1). This can produce two types of bias:

a) Due to the statistical tendency for values to regress towards the mean,

it is likely that extremely low values in T(1) will have an unusually

high increase in T(2) whereas unusually high scores in T(1) will have

an unusually small increase.

This tendency can be illustrated if we think of a study of the number

of fish caught by fishermen. Let us assume that on average fishermen

catch 20 fish a day. On the day on which the first study is conducted some fishermen will have a particularly good day and may catch up to 30 fish, whereas others will have a bad day and may

catch as few as 5. Let us assume that conditions improve and a

year later when we return for the second survey the average catch

has gone up to 25 per day. We would expect that on average the

fishermen who had had a particularly lucky day on the first survey

would this time be nearer to the average. If this is so many of

them would record a lower catch in the second survey even though

the overall average is increasing. In the same way we would expect

the fishermen with the poor catch the first time around to also be

closer to the mean, so that many of them might record increases

from T(1) to T(2) of 15 or 20 fish, even though the overall average

increase is only 5.

b) The regression towards the mean operates when the low and high

scores in T(1) were caused by random variations. In many

cases, however, one would expect that the fishermen who caught

more fish in T(1) were better fishermen or had better equipment.

In this case we would expect them to show a higher than average

increase in T(2).

Both of these factors make it difficult to interpret a Change Score as being the result of a project impact. To be able to estimate how far the change is due to the project we are evaluating it will be necessary to find a way of isolating the effect of the two intervening factors mentioned above.
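The regression effect described in (a) can be reproduced with a small simulation. The following is a minimal sketch in Python under stated assumptions that are not in the original text: each catch is pure day-to-day luck, normally distributed with a standard deviation of 5 fish around an average of 20 in T(1) and 25 in T(2), and the two days are independent.

```python
import random

def simulate_catches(n_fishermen=10000, seed=1):
    """Simulate two surveys in which every fisherman has the same skill
    and each day's catch differs only by chance."""
    rng = random.Random(seed)
    t1 = [rng.gauss(20, 5) for _ in range(n_fishermen)]
    t2 = [rng.gauss(25, 5) for _ in range(n_fishermen)]
    return t1, t2

def mean(values):
    return sum(values) / len(values)

t1, t2 = simulate_catches()
changes = [b - a for a, b in zip(t1, t2)]
lucky = [c for a, c in zip(t1, changes) if a >= 25]    # good day in T(1)
unlucky = [c for a, c in zip(t1, changes) if a <= 15]  # bad day in T(1)
print(round(mean(changes), 1), round(mean(lucky), 1), round(mean(unlucky), 1))
```

Although the average change is close to 5 fish, the fishermen who were lucky in T(1) show a decrease on average and the unlucky ones show a much larger increase, even though nothing distinguishes the two groups except chance.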

A second difficulty with the use of Change Scores is that the reliability of the Change Score is lower than the reliability of the two observations on which it is based. If for example we are measuring a change in income, there is some error in the estimation of income in T(1) and T(2) due to the unreliability of the information. A family does not know their exact income or for one of a number of reasons they do not give an exact figure to the interviewer. If we assume that the reliability of the estimation of income in both T(1) and T(2) is 0.85 (which is in fact very high) and that the correlation between the two scores is 0.65, the reliability of the change score will be:

$$Rel(X_2 - X_1) = \frac{[Rel(X_1) + Rel(X_2)]/2 - r_{X_1 X_2}}{1 - r_{X_1 X_2}}$$ (Equation 15)

where:

$Rel(X_2 - X_1)$ = reliability of Change Score.

$Rel(X_1)$, $Rel(X_2)$ = reliability of measurement in T(1) and T(2).

$r_{X_1 X_2}$ = correlation between the two scores.

In the present example the reliability is estimated as follows:

$$Rel(X_2 - X_1) = \frac{(.85 + .85)/2 - .65}{1 - .65} = \frac{.20}{.35} = .57$$

Thus even though the reliability of the two original measurements was .85 the reliability of the change score is only .57.
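Equation 15 can be evaluated for other combinations of reliability and correlation. The following is a minimal sketch in Python; the function name is an illustrative assumption, and the additional correlations in the loop are hypothetical values chosen only to show the trend.

```python
def change_score_reliability(rel_1, rel_2, r_12):
    """Equation 15: reliability of the change score X2 - X1."""
    return ((rel_1 + rel_2) / 2 - r_12) / (1 - r_12)

print(round(change_score_reliability(0.85, 0.85, 0.65), 2))  # 0.57

# Hypothetical correlations: the closer the T(1) and T(2) scores are
# correlated, the less reliable the change score becomes.
for r in (0.0, 0.4, 0.8):
    print(r, round(change_score_reliability(0.85, 0.85, r), 2))
```

When the two scores are uncorrelated the change score is as reliable as the original measurements, and its reliability falls quickly as the correlation approaches the reliability of the measurements themselves.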

For these reasons many writers 1/ argue against the use of Change Scores, or at least their use in their simplest form. An alternative which largely avoids this problem is the use of regression analysis or covariance analysis, with the score in T(2) as the dependent variable. Both of these methods will be explained in this chapter.

1/ See for example Cohen and Cohen (1975) Chapter 9 and Cook and Campbell (1979) Chapter 4.

2. Independent samples

2.1 The T-Test for significance of the difference between means

The T-Test permits us to test the probability of finding

a given difference between 2 sample means if the null hypothesis

that the two samples are drawn from the same population, is true.

The T statistic is calculated as follows:

$$T = \frac{\bar{X}_e - \bar{X}_c}{S(D)}$$ (Equation 16)

where:

$\bar{X}_e$ = mean of the experimental group

$\bar{X}_c$ = mean of the control group

S(D) = standard deviation of the difference between means of the experimental and control group.

There are two ways in which S(D) is calculated, depending upon whether the two groups have the same variance. 1/ To know which of the two forms must be used we must first apply the F test to compare the variance ($S^2$) of the two variables. This is applied as follows:

$$F = \frac{\text{larger } S^2}{\text{smaller } S^2}$$ (Equation 17)

For example in Table 12 the experimental group (E) has the larger standard deviation and the F-Test would be applied as follows:

$$F = \frac{(21.4)^2}{(21.3)^2} = 1.009$$

1/ The variance $V = S^2$.

As the sample size is 25 in both cases, the Degrees of Freedom is 24 (N-1) for each variance. Consulting the F table we find that with D.F. = 24, 24 it would be necessary to obtain an F score of 1.97 for the difference between the two variances to be significant at the 0.05 level. We can therefore conclude that the two samples have a common variance. With the sample in Table 11 the F value would be:

$$F = \frac{(30.1)^2}{(21.3)^2} = 1.997$$

This indicates that the variances are statistically different at the 0.05 level and that in this case it would be necessary to use the procedures for separate variances.
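The choice between the two forms of the T-Test can be sketched as follows. This is a minimal illustration in Python using the standard deviations of Tables 11 and 12 and the critical value of 1.97 quoted above; the function name is an illustrative assumption.

```python
def f_ratio(s_a, s_b):
    """Equation 17: larger variance divided by smaller variance."""
    larger, smaller = max(s_a, s_b), min(s_a, s_b)
    return larger ** 2 / smaller ** 2

CRITICAL_F = 1.97  # value quoted in the text for D.F. = 24, 24 at the 0.05 level

for label, s_e, s_c in (("Table 12", 21.4, 21.3), ("Table 11", 21.3, 30.1)):
    f = f_ratio(s_e, s_c)
    form = "separate variances" if f > CRITICAL_F else "common variance"
    print(label, round(f, 3), form)
# Table 12 1.009 common variance
# Table 11 1.997 separate variances
```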

2.1.1. Calculating the T-statistic with common variances

The standard deviation of the difference between means of the experimental and control groups, S(D), is calculated as follows:

$$S(D) = \sqrt{\frac{N_e S_e^2 + N_c S_c^2}{N_e + N_c - 2}} \sqrt{\frac{N_e + N_c}{N_e N_c}}$$ (Equation 18)

where:

$N_e$ = number of interviews in experimental group

$N_c$ = number of interviews in control group

$S_e$ = Standard deviation of experimental group

$S_c$ = Standard deviation of control group.

Table 11: Comparison of per capita income in a project (experimental group) and in low-income rental housing (control group)

Per capita income (Pesos/month)

        Project     Rental housing
          (E)            (C)

          27             40
          52             65
          53             75
          54             70
          61             80
          49             65
          19             25
          18             20
          22             21
          24             29
          29             39
          60             91
          52             50
          18             18
          19             26
          20             27
          15             25
          16             20
          31             45
          32             48
          84            104
          78            110
          69             95
          12             20
          41             90

X         38.2           51.9
S         21.3           30.1
N         25             25

The T statistic is then calculated according to Equation 16 with $N_e + N_c - 2$ degrees of freedom.

Example

Table 12 presents the results of a hypothetical study in which the per capita family income in a project is compared with per capita income of families living in extra-legal subdivision housing. The research hypothesis is that families in rental accommodation are richer. This is expressed as follows:

$H: C > E$

The null hypothesis is therefore:

$H_0: C \leq E$

It has already been established that the two samples have a common variance and the value of S(D) is therefore calculated using Equation 18:

$$S(D) = \sqrt{\frac{(25 \times 21.4^2) + (25 \times 21.3^2)}{25 + 25 - 2}} \sqrt{\frac{25 + 25}{25 \times 25}} = 6.16$$

The value of T is then calculated by Equation 16:

$$T = \frac{40.9 - 38.2}{6.16} = .4383$$

With (25 + 25 - 2) = 48 Degrees of Freedom a T score of 1.67 would be needed to be significant at the 0.05 level so it is clear that no difference between the means of the project and the rental group can be observed, and the null hypothesis cannot be rejected.
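The calculation for common variances can be sketched in code. The following is a minimal illustration in Python using the summary statistics of Table 12; the function names are illustrative assumptions.

```python
import math

def pooled_sd_of_difference(n_e, s_e, n_c, s_c):
    """Equation 18: S(D) when the two groups share a common variance."""
    pooled = (n_e * s_e ** 2 + n_c * s_c ** 2) / (n_e + n_c - 2)
    return math.sqrt(pooled) * math.sqrt((n_e + n_c) / (n_e * n_c))

def t_statistic(mean_a, mean_b, sd_diff):
    """Equation 16: difference between means divided by S(D)."""
    return (mean_a - mean_b) / sd_diff

sd_diff = pooled_sd_of_difference(25, 21.4, 25, 21.3)
t = t_statistic(40.9, 38.2, sd_diff)
degrees_of_freedom = 25 + 25 - 2
print(round(sd_diff, 2), round(t, 2), degrees_of_freedom)  # 6.16 0.44 48
```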

Table 12: Comparison of per capita income in a project (E) and in extra-legal subdivision housing (C)

Per capita income (Pesos/month)

        Project     Rental housing
          (E)            (C)

          27             30
          52             54
          53             57
          54             58
          61             63
          49             51
          19             22
          18             25
          22             23
          24             26
          29             34
          60             62
          52             54
          18             21
          19             20
          20             25
          15             17
          16             18
          31             34
          32             34
          84             87
          78             81
          69             73
          12             13
          41             42

X         38.2           40.9
S         21.4           21.3
N         25             25

2.1.2. Calculating the T-statistic with unequal variances

When the variances of the two groups are unequal, S(D) is estimated as follows:

$$S(D) = \sqrt{\frac{S_1^2}{N_1 - 1} + \frac{S_2^2}{N_2 - 1}}$$ (Equation 19)

where: $S_1$ and $S_2$ are the standard deviations of the first and second groups.

When the two variances are not equal the calculation of the degrees of freedom is more complicated as it must take into account the fact that sample sizes may not be equal and that this could artificially give too much weight to one of the two variances. The correct equation for estimating the degrees of freedom in this case is the following:

$$D.F. = \frac{\left(\frac{S_1^2}{N_1 - 1} + \frac{S_2^2}{N_2 - 1}\right)^2}{\left(\frac{S_1^2}{N_1 - 1}\right)^2 \frac{1}{N_1 + 1} + \left(\frac{S_2^2}{N_2 - 1}\right)^2 \frac{1}{N_2 + 1}} - 2$$ (Equation 20)

Example

Table 11 presents hypothetical data on a comparison between per capita income in a project and in low-income rental accommodation. In this case the investigator is not sure whether income of rental families should be higher or lower than in the project and the research hypothesis is simply that the two groups of housing will have different per capita incomes:

$H: E \neq C$

As a consequence the null hypothesis is that the two areas will have the same income:

$H_0: E = C$

and in this case the appropriate test is two-tail.

The standard deviation of the difference of means S(D) is calculated using Equation 19 as follows:

$$S(D) = \sqrt{\frac{21.3^2}{24} + \frac{30.1^2}{24}} = 7.52$$

The T-statistic is calculated as follows:

$$T = \frac{51.9 - 38.2}{7.52} = 1.82$$

The degrees of Freedom, according to Equation 20, are:

$$D.F. = \frac{\left(\frac{21.3^2}{24} + \frac{30.1^2}{24}\right)^2}{\left(\frac{21.3^2}{24}\right)^2 \frac{1}{26} + \left(\frac{30.1^2}{24}\right)^2 \frac{1}{26}} - 2 = 46.8 - 2 = 44.8 \approx 45$$

As the two samples have the same size there is little difference in the number of degrees of freedom from the previous example.

With 45 degrees of freedom the one-tail probabilities associated with different T scores are the following (using 50 D.F. as most tables do not give the exact value for D.F. = 45).

Probability     0.1      0.05     0.025
T Score         1.29     1.67     2.01

We must be very careful to note that these are probabilities for a one-tail test in which the direction of the expected difference is specified. However, in the present case we do not know the expected direction and a two-tail test is used. If we refer back to Fig. 4, this distinction will be clarified. The area to the left of C shows the 5% area at the left tail of the curve. This is the 5% probability we would use for a one-tail test. However, if the test is two-tail and if we have this amount of area on each tail, the total area covered would be 10%. For a two-tail test at the 5% level we only require 2.5% at each tail of the curve (represented by the area to the left of B on Fig. 4). From this it can be seen that the 2.5% area to the left of B will represent the 2.5% significance level for a one-tail test but if a similar area exists at the other tail also, then this will only represent the 5% significance level for a two-tail test. It is for this reason that the one-tail probability must be doubled for a two-tail test. In the present example T = 1.82 lies between the one-tail values for the 0.05 and 0.025 levels, so its two-tail probability lies between 0.10 and 0.05 and the null hypothesis cannot be rejected at the 5% level.
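The calculation for separate variances can be sketched as follows, using the summary statistics of Table 11. This is a minimal illustration in Python; the function names are illustrative assumptions, and the degrees-of-freedom function assumes that the final subtraction of 2 in Equation 20 applies to the whole ratio.

```python
import math

def separate_sd_of_difference(s_1, n_1, s_2, n_2):
    """Equation 19: S(D) when the two variances are unequal."""
    return math.sqrt(s_1 ** 2 / (n_1 - 1) + s_2 ** 2 / (n_2 - 1))

def separate_degrees_of_freedom(s_1, n_1, s_2, n_2):
    """Equation 20, assuming the '- 2' is subtracted from the whole ratio."""
    a = s_1 ** 2 / (n_1 - 1)
    b = s_2 ** 2 / (n_2 - 1)
    return (a + b) ** 2 / (a ** 2 / (n_1 + 1) + b ** 2 / (n_2 + 1)) - 2

sd_diff = separate_sd_of_difference(21.3, 25, 30.1, 25)
t = (51.9 - 38.2) / sd_diff
df = separate_degrees_of_freedom(21.3, 25, 30.1, 25)
print(round(sd_diff, 2), round(t, 2))  # 7.53 1.82
```

The value of `df` is rounded to the nearest whole number before the T table is consulted; as explained above, a one-tail probability read from the table must be doubled when the test is two-tail.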

2.2. The T-Test or Z Test for Differences of Proportions

Differences of proportions is a special case of differences between means. The standard deviation of a proportion is:

$$s_p = \sqrt{pq}$$ (Equation 21)

where:

p = the proportion with a certain attribute

q = 1 - p = proportion without this attribute

Following the general rule for the sum of variances 1/, the standard deviation of the difference between proportions is:

$$s(p_2 - p_1) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$ (Equation 23)

1/ The variance of the difference of means or proportions is equal to the weighted sum of the variances of the two distributions of the two means being compared. This is expressed as follows:

$$S^2(\bar{X}_2 - \bar{X}_1) = \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}$$ (Equation 22)

Given that $s_p^2 = pq$ and that the null hypothesis to be tested is that both samples come from the same population, it follows that to test the null hypothesis we can assume that there is a common variance PQ. The two observed variances from the samples will be pooled to obtain the estimate pq of the true variance. With the use of a pooled estimate of variance, Equation 23 becomes:

$$s(p_2 - p_1) = \sqrt{pq \, \frac{N_1 + N_2}{N_1 N_2}}$$ (Equation 24)

The estimate of p is in turn obtained by pooling the two observed values of p ($p_1$ and $p_2$) as follows:

$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$ (Equation 25)

q can then be estimated directly as (1 - p).

Example

A hypothetical study was conducted to estimate the proportion of women who were members of a cooperative. Independent samples were taken twice at a year's interval and the following results were found:

                                              T(1)     T(2)

Proportion of women who were members
of a cooperative (p)                          .53      .46

Sample size (n)                                50       48

Using Equation 25, p is estimated as follows:

$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{50(.53) + 48(.46)}{50 + 48} = .4957$$

q = 1 - p = .5043

The value of the standard deviation of the difference can now be obtained from Equation 24 as follows:

$$s(p_2 - p_1) = \sqrt{(.4957)(.5043) \frac{50 + 48}{50 \times 48}} = .1010$$

The value of Z 1/ is estimated as:

$$Z = \frac{p_1 - p_2}{s(p_2 - p_1)} = \frac{.53 - .46}{.1010} = .6931$$

The two-tail probability of this value is .49 so the null hypothesis can obviously not be rejected.
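The test for a difference of proportions can be sketched in code. The following is a minimal illustration in Python using the cooperative membership example; the function names are illustrative assumptions, and the two-tail probability is taken from the normal curve by means of the complementary error function.

```python
import math

def two_proportion_z(p_1, n_1, p_2, n_2):
    """Equations 24 and 25: Z score for a difference of proportions,
    using the pooled estimate of p."""
    p = (n_1 * p_1 + n_2 * p_2) / (n_1 + n_2)
    q = 1 - p
    sd_diff = math.sqrt(p * q * (n_1 + n_2) / (n_1 * n_2))
    return p, sd_diff, (p_1 - p_2) / sd_diff

def two_tail_probability(z):
    """Area in both tails of the normal curve beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2))

p, sd_diff, z = two_proportion_z(0.53, 50, 0.46, 48)
print(round(p, 4), round(sd_diff, 4), round(z, 2), round(two_tail_probability(z), 2))
# 0.4957 0.101 0.69 0.49
```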

2.2.1 . The T-Test or Z-Test for the Difference of Difference of Proportions

The previous analysis can easily be extended to the difference between difference of proportions. In this case the Z score is calculated as follows:

$$Z = \frac{(p_1 - p_2) - (p_3 - p_4)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} + \frac{p_3 q_3}{n_3} + \frac{p_4 q_4}{n_4}}}$$ (Equation 26)

Example

A study was conducted to compare changes in political party membership of men and women. The proportions of male and female heads of household who are members of a political party were as follows:

          Female heads          Male heads
          T(1)     T(2)         T(1)     T(2)

p         .65      .58          .69      .67

n          50       60           45       65

1/ Z is used rather than T as the frequencies are relatively large.

Our null hypothesis is:

$H_0: p(F_2 - F_1) = p(M_2 - M_1)$

where:

F = female participation

M = male participation

The Z score is calculated as follows:

$$Z = \frac{(.65 - .58) - (.69 - .67)}{\sqrt{\frac{(.65)(.35)}{50} + \frac{(.58)(.42)}{60} + \frac{(.69)(.31)}{45} + \frac{(.67)(.33)}{65}}} = 0.3862$$

The probability of finding this score is approximately .7 so we

cannot reject the null hypothesis of no difference in the change of parti-

cipation in political parties for female and male headed families.
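The Z score for a difference of differences can be sketched as follows. This is a minimal illustration in Python using the political party membership example; the function name and the way the four groups are passed in are illustrative assumptions.

```python
import math

def difference_of_differences_z(groups):
    """Equation 26. `groups` holds four (p, n) pairs in the order
    p1, p2, p3, p4; the Z score tests (p1 - p2) - (p3 - p4)."""
    (p1, _), (p2, _), (p3, _), (p4, _) = groups
    variance = sum(p * (1 - p) / n for p, n in groups)
    return ((p1 - p2) - (p3 - p4)) / math.sqrt(variance)

# Female heads in T(1) and T(2), then male heads in T(1) and T(2).
z = difference_of_differences_z([(0.65, 50), (0.58, 60), (0.69, 45), (0.67, 65)])
print(round(z, 4))  # 0.3862
```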

2.3. Confidence Intervals for the Difference of Means

It was pointed out in Section 1 of this chapter that knowing the

difference between two means is statistically significant does not tell us

anything about the magnitude of the difference. In other words the statistical significance test does not tell us whether the difference is important or meaningful. An alternative approach, which does indicate the size of the difference as well as its statistical significance, is the calculation of confidence intervals.

Fig. 5 presents the areas under the normal curve. In most cases, when sample size is relatively large (50 or more) the distribution of means which will be obtained from repeated samples, follows this distribution.

This is also the case in the present instance where we are considering a sample which estimates the difference between means of two samples. X is the difference between the means of the two populations from which the samples are drawn. If the sample is repeated a large number of times the majority of the estimated differences between means will lie near to X but a certain number will be more widely distributed. If we take a range which goes from 1.96 standard deviations (s) below the true mean (X) to 1.96s above X it will be found that approximately 95% of the sample means will fall within this range. This is usually expressed by considering the total area under the curve as 1 and expressing the areas contained under different parts of the curve as proportions or probabilities. In this case we would say that the area contained between:

X ± 1.96s = .95.

As the normal curve is perfectly symmetrical, if .95 of the area falls within 1.96s of the mean, then .025 will lie below this range and .025 will lie above this range. These proportions mean that if 1000 samples were drawn the mean of 950 would lie within 1.96s of the true mean (X). Similarly one would expect to find 25 samples where the mean was more than 1.96s below

X and 25 where the mean was more than 1.96s above X. The shaded areas in

Fig. 5 represent the sample means falling outside the 1.96s range.

Figure 5: AREAS UNDER THE NORMAL CURVE

In a similar way it is found that .99 of the area lies within a range of ± 2.58s and as a consequence one can expect .005 of the sample means to lie below this range and .005 to lie above. This is shown by the double shaded area.

A table of areas under the normal curve indicates the proportion of observations, in this case sample means, which fall outside the range delimited by any standard deviation (usually referred to as the Z score).

Table 13 provides a few scores to illustrate.

Table 13: Some areas under the normal curve

            Area lying beyond this value
   Z        on one tail of the curve

  0.0              .5000
  1.0              .1587
  1.96             .0250
  2.58             .0049
  3.0              .0013

The figures in the table only refer to one side of the curve. In Fig. 5 this could either be the .5000 of the area lying above or below X. The value for Z = 0.0 indicates that .5000 of the area lies above the mean (X). When Z = +1.0 we find that .1587 of the area lies above this value. If we were interested in the area lying outside the range X ± 1s we would have to double this value to take into account both sides of the curve and we would find that .3174 lies outside this range.

When Z = +1.96 we find that .0250 lies above this range and an area of .0250 x 2 = .0500 lies outside the range of X ± 1.96s. Similarly when Z = +2.58 we find that approximately .01 of the area lies outside the range of X ± 2.58s. The final value we give in the example is Z = 3.0 and in this case it is found that only .0026 of the area lies outside the range of X ± 3s.
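The areas in Table 13 can be reproduced without a printed table. The following is a minimal sketch in Python that relies on the complementary error function of the standard library; the function name is an illustrative assumption.

```python
import math

def upper_tail_area(z):
    """Area under the standard normal curve lying beyond z on one tail."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for z in (0.0, 1.0, 1.96, 2.58, 3.0):
    one_tail = upper_tail_area(z)
    # The second figure doubles the area to cover both tails of the curve.
    print(z, round(one_tail, 4), round(2 * one_tail, 4))
```

The first figure printed for each Z score corresponds to the one-tail area of Table 13, and the second to the total area lying outside the range X ± Zs.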

Example: Estimating Confidence Intervals for the Difference in Food Expenditure for Male and Female Headed Households

Table 14 presents hypothetical data from a study which compares per capita monthly food expenditures of male and female headed households. The mean expenditure of female households is 29.2 Pesos and for male households 48.4 Pesos. If we were to apply the T-Test it would be found that the value of T = 24.7 (d.f. = 48) is statistically significant at the .0005 level so that we can feel sure that this is highly significant. However, we do not know how large the absolute difference is because the T-Test does not directly tell us anything about the confidence intervals for the difference of 19.2 Pesos found between the sample means.

To establish confidence intervals we must calculate the standard deviation of the difference between means. The F-Test shows that the variances of the two groups are different (F = 2.54 with d.f. = 24/24) so we must calculate S(D) using Equation 19.

$$S(D) = \sqrt{\frac{(2.02)^2}{25 - 1} + \frac{(3.22)^2}{25 - 1}} = 0.7759$$

If we wish to establish confidence intervals to determine the range which has a 95% probability of including the true difference between means, we would estimate as follows, where X is the observed difference between the sample means (48.4 - 29.2 = 19.2 pesos):

X ± 1.96s

= 19.2 ± 1.96 (0.7759)

= 19.2 ± 1.5208

This means we can be 95% confident that per capita food expenditure of male headed households is between 17.6792 pesos and 20.7208 pesos higher than for female headed households. The researcher and the reader now have much more valuable information than simply knowing how significant the difference is. It is now possible to judge whether a difference of this magnitude is important.

If the researcher wished to be even more certain that the true value of the difference lies within the confidence range, one could use 2.58s to be 99% certain of including the true value within the range. In this case the limits of the estimate would be:

19.2 ± 2.58 (0.7759)

= 19.2 ± 2.0018 pesos
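The confidence limits can be computed directly from the summary statistics of Table 14 (means of 48.4 and 29.2 pesos, standard deviations of 3.22 and 2.02, and 25 households in each group). The following is a minimal sketch in Python; the function name is an illustrative assumption.

```python
import math

def confidence_interval(mean_a, mean_b, s_a, n_a, s_b, n_b, z=1.96):
    """Interval for mean_a - mean_b using S(D) from Equation 19."""
    difference = mean_a - mean_b
    sd_diff = math.sqrt(s_a ** 2 / (n_a - 1) + s_b ** 2 / (n_b - 1))
    return difference - z * sd_diff, difference + z * sd_diff

# Male headed households minus female headed households.
low, high = confidence_interval(48.4, 29.2, 3.22, 25, 2.02, 25)
low99, high99 = confidence_interval(48.4, 29.2, 3.22, 25, 2.02, 25, z=2.58)
print(round(low, 2), round(high, 2))      # 95% limits
print(round(low99, 2), round(high99, 2))  # 99% limits
```

Raising the level of confidence from 95% to 99% widens the interval, which is the price paid for greater certainty of including the true value.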

Table 14: Comparison of Per Capita Monthly Expenditure on Food for Female and Male Headed Households (Pesos)

        Female Headed     Male Headed

             27                45
             29                48
             31                47
             26                51
             25                50
             33                48
             27                46
             28                47
             29                48
             29                49
             30                46
             28                48
             27                51
             28                53
             30                54
             30                39
             31                44
             29                46
             28                48
             30                49
             32                50
             32                52
             31                53
             30                50
             29                49

X            29.2              48.4

s            2.02              3.22

3. Related samples

3.1. The T-Test for Pairs

3.1.1. The use of the T-Test without a control group

When paired samples are used, and there is no control group the T-Statistic to be calculated is as follows:

$$T = \frac{d - \mu}{s(d)}$$ (Equation 27)

where: $\mu$ = the hypothetical mean under the null hypothesis.

$d = \bar{X}(2) - \bar{X}(1)$

Unless otherwise stated our null hypothesis is:

$H_0: \bar{X}_1 = \bar{X}_2$, so $\mu = 0$ and can be left out.

The standard deviation of the difference between means S(d) is calculated as follows:

$$S(d) = \sqrt{\frac{S_1^2 + S_2^2 - 2\,Covar(x_1 x_2)}{n}}$$ (Equation 28)

The value of T is calculated with D.F. = n - 1. It can be seen that the use of a paired sample reduces the degrees of freedom by half. This sacrifice in the number of degrees of freedom is only worthwhile if there is a substantial reduction in the size of the standard deviation. This reduction will be achieved when there is a high correlation between the score of a person in T(1) and T(2). Within Equation 28 this correlation is reflected through the covariation term which is:

$$Covar(x_1 x_2) = \frac{\sum (x_1 - \bar{X}_1)(x_2 - \bar{X}_2)}{n - 1}$$ (Equation 29)

In other words, if a person's score in T(1) is highly positively correlated with the score in T(2) there will be a benefit to be derived from using a paired sample. If there is no close correlation, or if the correlation is negative then pairing should not be used.

An alternative approach 1/

An alternative way to handle paired samples when no control group is used is to calculate the change score as defined in Equation 14:

CHANGE = SCORE (2) - SCORE (1)

When this score is calculated the two paired samples are reduced to a single sample. The T-Statistic is now calculated as follows:

$$T = \frac{\bar{C} - \mu}{s(\bar{C})}$$ (Equation 30)

where: C = change score

$\bar{C}$ = mean change score
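The two approaches to paired samples lead to the same T value, which can be checked numerically. The following is a minimal sketch in Python; the five pairs of scores are hypothetical, the function names are illustrative, and the standard deviation of the difference is assumed to take the form (S1² + S2² - 2 Covar) / n with variances computed on n - 1.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Equation 29: sample covariance with n - 1 in the denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def paired_t(x1, x2):
    """Equations 27 and 28 with a hypothetical mean of 0."""
    n = len(x1)
    sd_diff = math.sqrt((covariance(x1, x1) + covariance(x2, x2)
                         - 2 * covariance(x1, x2)) / n)
    return (mean(x2) - mean(x1)) / sd_diff

def change_score_t(x1, x2):
    """Equation 30: one-sample T computed on the change scores."""
    change = [b - a for a, b in zip(x1, x2)]
    sd_mean = math.sqrt(covariance(change, change) / len(change))
    return mean(change) / sd_mean

# Hypothetical scores for five people in T(1) and T(2).
t1 = [10, 12, 9, 15, 11]
t2 = [13, 14, 9, 19, 12]
print(round(paired_t(t1, t2), 3), round(change_score_t(t1, t2), 3))
```

The larger the covariance between the T(1) and T(2) scores, the smaller the standard deviation of the difference and the larger the resulting T value, which is the benefit expected from pairing.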

3.1.2 The use of the T-Test with a control group

When a control group is included in the sample design the two

observations on each case must be reduced to a single Change Score. Once

this has been done the comparison is now between two independent samples,

one drawn from the experimental group and one from the control group. The

T-Test can now be used in exactly the same way as for the comparison of two

independent samples (See Section 2.1).

Example: Evaluating changes in monthly expenditures on transport

Table 15 presents the results of a hypothetical study in which the impact of a new housing project on transportation costs was evaluated. The families in the experimental and control groups were interviewed in T(1) before the former moved to the new project site, and information was obtained on monthly expenditure on transport. The same families were reinterviewed in T(2) after participants had been living in the project for a year. The change score was obtained by subtracting transport costs in T(1) from those in T(2). It can be seen that average expenditure for participants decreased by 2.76 pesos and for the control group increased by 12.1 pesos.

Table 15: Hypothetical data on changes in monthly expenditure on transport for project participants and a control group

Related samples

Monthly expenditures on transport (pesos)

       Project participants                 Control group
     T(1)    T(2)   Change score       T(1)    T(2)   Change score

      30      27        -3              27      40        13
      54      52        -2              52      65        13
      57      53        -4              53      75        22
      58      54        -4              54      70        16
      63      61        -2              61      80        19
      51      49        -2              49      65        16
      22      19        -3              19      25         6
      25      18        -7              18      20         2
      23      22        -1              22      21        -1
      26      24        -2              24      29         5
      34      29        -5              29      39        10
      62      60        -2              60      91        31
      54      52        -2              52      50        -2
      21      18        -3              18      18         0
      20      19        -1              19      26         7
      25      20        -5              20      27         7
      17      15        -2              15      25        10
      18      16        -2              16      20         4
      34      31        -3              31      45        14
      34      32        -2              32      48        16
      87      84        -3              84     104        20
      81      78        -3              78     110        32
      73      69        -4              69      95        26
      13      12        -1              12      20         8
      42      41        -1              41      50         9

X    40.9    38.2      -2.76           38.2    50.3       12.1

s    21.3    21.4       1.45           21.3    29          9.28

1/ See for example: H. Blalock (1972) Chapter 13, Section 13.4.

The null hypothesis being tested is:

$H_0: C_c > C_e$

where: $C_e$ = changed expenditure in experimental group

$C_c$ = changed expenditure in control group.

The F-Test shows that the difference between the variance of the two samples is highly significant (F = 40.84 p < .001) so the version of the T-Test for separate variances must be used (Equations 19 and 100.

The test produced the following results:

T = 7.93 D.F. = 25 Probability (one tailed) < .001

This indicates that there is a highly significant difference in the change in transport costs for the experimental and control groups and this would support the hypothesis that the project has reduced transport costs.
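The calculation can be retraced with a short Python sketch using the change scores of Table 15. It assumes the common separate-variance (Welch) form of the T-Test with the Satterthwaite approximation for the degrees of freedom; the report's own equations may use slightly different divisors, so small differences in the standard error are possible. The function name is illustrative.

```python
import math
import statistics

# Change scores, T(2) - T(1), from the hypothetical data of Table 15.
participants = [-3, -2, -4, -4, -2, -2, -3, -7, -1, -2, -5, -2, -2,
                -3, -1, -5, -2, -2, -3, -2, -3, -3, -4, -1, -1]
control = [13, 13, 22, 16, 19, 16, 6, 2, -1, 5, 10, 31, -2,
           0, 7, 7, 10, 4, 14, 16, 20, 32, 26, 8, 9]

def separate_variance_t(a, b):
    """Return (t, approximate d.f.) without pooling the two variances."""
    qa = statistics.variance(a) / len(a)
    qb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(qa + qb)
    # Satterthwaite approximation for the degrees of freedom (assumed form).
    df = (qa + qb) ** 2 / (qa ** 2 / (len(a) - 1) + qb ** 2 / (len(b) - 1))
    return t, df

# Ratio of the larger to the smaller sample variance (the F-Test statistic).
f_ratio = statistics.variance(control) / statistics.variance(participants)
t_stat, df = separate_variance_t(control, participants)

print(f"F = {f_ratio:.2f}")
print(f"T = {t_stat:.2f}, D.F. = {round(df)}")
```

The variance ratio is computed first because it decides which version of the T-Test applies: when the two variances differ this much, pooling them would not be appropriate.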

3.2 Use of confidence intervals

As in the earlier discussions, knowing that the differences are highly significant does not tell us whether they are important. The standard error of the difference is calculated from Equation 19 and is found to be 1.92. The mean difference between the change scores is

12.1 - (-2.7) = 14.8 pesos

so at the 95% level the confidence intervals for the estimate of the mean difference are:

14.8 ± 1.96 (1.92) = 11.0 to 18.6 pesos
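The interval is simply the estimate plus and minus 1.96 standard errors. A minimal sketch of the arithmetic, taking the mean difference and standard error exactly as reported in the text (the variable names are illustrative):

```python
mean_difference = 14.8   # pesos, difference between the mean change scores
standard_error = 1.92    # standard error of the difference, as reported
z_95 = 1.96              # normal deviate for the 95% level

half_width = z_95 * standard_error
lower = mean_difference - half_width
upper = mean_difference + half_width

print(f"95% confidence interval: {lower:.1f} to {upper:.1f} pesos")
```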

3.3 The use of regression analysis to evaluate differences in the rate of change between the experimental and control groups while controlling for the value of dependent and independent variables in T(1)

Section 1 indicated some of the difficulties in using change scores to evaluate project impact. The first problem mentioned was that the reliability of the change score is lower than the reliability of the two scores on which it is based. The second was that a large part of the variance in the change score is produced by the score in T(1) and if the analysis does not control for the effect of T(1) the impact of the project may appear to be higher than it really is. To overcome both of these problems, an alternative approach is to measure and evaluate change through multiple regression analysis using the score in T(2) as the dependent variable. The logic of this approach will be demonstrated with the example given in Table 15.

Table 16 gives the results of a simple regression analysis using the change score as the dependent variable, and the Dummy Variable STATUS as the independent variable. STATUS is defined as follows:

Project Participant: STATUS = 0

Control Group: STATUS = 1

Table 16: Simple Regression analysis with change in transport costs (CHANGE) as the dependent variable and project participation (STATUS) as the independent variable

DEPENDENT VARIABLE    CHANGE (Change in monthly transport costs)

MULTIPLE R            0.7528

R SQUARE              0.5668

STANDARD ERROR        6.6383        F = 62.8 (D.F. = 1/48)

VARIABLES IN THE EQUATION

VARIABLE      B        Standard error of B    F         Simple R    R SQUARE    RSQ Change

STATUS        14.88    1.8776                 62.807    .7529       0.5668      0.5668

Constant      -2.76

Comparing with Table 15 it will be found that the Y intercept (the constant) is equal to the mean change score for participants. It is always the case with dummy variables that the variable or category defined as 0 in the dummy variable has a mean equal to the constant. The mean for the control group is found by adding the regression coefficient for STATUS to the constant:

14.88 + (-2.76) = 12.12

The confidence interval for the change score can also be calculated as the standard deviation (standard error of B) is also given. The F score of 62.8 is significant beyond the 0.0001 level showing the difference to be highly significant.

The result is the same as that found with the T-Test and in fact when only one independent variable is used:

$$T = \sqrt{F}$$ (Equation 31)

This can be checked by taking the square root of the F Score ($\sqrt{62.8} = 7.92$) which is the same as the T Statistic estimated in Section 3.1.2.
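The equivalence between the dummy-variable regression and the comparison of means can be checked with a small least-squares sketch. It uses the change scores of Table 15 and plain Python arithmetic; the variable names are illustrative, and the closed-form formulas are the standard ones for a regression with a single independent variable, not a reproduction of the statistical package used for Table 16.

```python
import math

# Change scores from the hypothetical data of Table 15.
participants = [-3, -2, -4, -4, -2, -2, -3, -7, -1, -2, -5, -2, -2,
                -3, -1, -5, -2, -2, -3, -2, -3, -3, -4, -1, -1]
control = [13, 13, 22, 16, 19, 16, 6, 2, -1, 5, 10, 31, -2,
           0, 7, 7, 10, 4, 14, 16, 20, 32, 26, 8, 9]

status = [0] * len(participants) + [1] * len(control)  # dummy variable
change = participants + control
n = len(change)

mean_x = sum(status) / n
mean_y = sum(change) / n
sxx = sum((x - mean_x) ** 2 for x in status)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(status, change))

b = sxy / sxx                    # regression coefficient for STATUS
constant = mean_y - b * mean_x   # Y intercept: mean of the group coded 0

ss_total = sum((y - mean_y) ** 2 for y in change)
ss_resid = sum((y - (constant + b * x)) ** 2 for x, y in zip(status, change))
r_square = 1 - ss_resid / ss_total
f_score = (ss_total - ss_resid) / (ss_resid / (n - 2))

print(f"B = {b:.2f}, constant = {constant:.2f}")
print(f"R SQUARE = {r_square:.4f}, F = {f_score:.1f}")
print(f"square root of F = {math.sqrt(f_score):.2f}")
```

The constant is the mean of the group coded 0 and the coefficient is the difference between the two group means, which is why the regression and the T-Test lead to the same conclusion.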

This finding provides the same conclusion as the T-Test, namely that the project (as defined by STATUS) has a strong effect on travel expenditure. Finally, the table shows that $R^2$ = .5668 which means that STATUS explains approximately 56 percent of the variance in transport costs. Given the problems inherent in the use of change scores, an alternative approach is to use transport expenditure in T(2) as the dependent variable. If expenditure in T(1) is introduced as a second independent variable together with STATUS, the effect of expenditure in T(1) will be controlled for, or "partialled out" 1/ so that the regression coefficient for STATUS now shows the effect on expenditure in T(2) after the initial differences in expenditure in T(1) have been taken out. Table 17 shows the results of the analysis. Although the coefficient of STATUS has now dropped considerably it is still highly significant (F = 33.8). T(1) explains the largest share of the variance and can be seen to reduce, although not completely eliminate, the effect of STATUS. In Chapter 12 we will give an example where the introduction of the T(1) score completely eliminates the effect of project participation.

This regression technique, in addition to being able to control for the effect of the T(1) score, can also control, or partial out, the effect of other variables. We will return to the importance of this in Chapter 12 when we discuss the problems which arise from non-equivalent control groups.

One possible weakness of this approach has been stressed by econometricians who use "lagged variables" extensively in analysis. The lagged variable, in its simplest form, is introduced in the following way into the regression equation:

$$Y_t = a + bY_{t-1} + e_t$$

where:

$Y_{t-1}$ = lagged variable

$e_t$ = error term

1/ The regression coefficients are in fact partial coefficients which show the effect of each variable when the effect of all other variables in the equation has been controlled for or "partialled out."

Table 17: Multiple Regression analysis with transport costs in T(2) as the dependent variable and transport costs in T(1) and project participation (STATUS) as the independent variables

DEPENDENT VARIABLE    T2 (Transport costs in T2)

MULTIPLE R            0.9762

R SQUARE              0.9530

STANDARD ERROR        5.6865        F = 477 (D.F. = 2/47)

VARIABLES IN THE EQUATION

VARIABLE      B        Standard error of B    F        Simple R    R SQUARE    RSQ Change

T1            1.165    0.038                  920      0.9587      0.9192      0.9192

STATUS        9.360    1.608                  33.86    0.1839      0.9530      0.0338

Constant      -3.53

It is argued that with the use of lagged variables the distribution of the error term is not random. The random distribution of the error term is an essential requirement for regression analysis, and if it is not satisfied the estimates from the regression equation will be biased.

However, this is more of a problem in a time series analysis than with a comparison of two points in time, as the error term, $e_t$, is probably independent of $Y_t$ and $Y_{t-1}$ although it is not independent of $Y_{t+1}$, $Y_{t+2}$, and future observations. 1/ As our analysis only covers two time periods this should not, therefore, be a problem.

1/ Johnston (1963) Chapter 8, Section 8.3.

PART III

APPLICATION OF THE STATISTICAL TESTS IN THE EVALUATION OF URBAN SHELTER PROGRAMS Examples from ongoing research

Part III illustrates how the tests have been applied with each of the research designs presented in Chapter 2. All of the examples are taken from ongoing evaluations of World Bank projects.

Chapter 7 presents some important practical, but often ignored problems of matching cases in longitudinal studies. The analysis of panel studies and mixed samples requires matching the two interviews with each subject. In practice the matching process can be very difficult.

Chapters 8, 9 and 10 illustrate the statistical analysis of panel studies, studies based on independent random samples, and designs using mixed samples. Chapter 11 reviews two approaches which have been adopted when it is not possible to conduct a pre-test.

The final chapter is devoted to the problems created by non-equivalent control groups. A number of strategies are proposed for resolving these problems with dependent variables measured on nominal, ordinal and interval scales.

Chapter 7. OPERATIONAL PROBLEMS IN MATCHING CASES IN URBAN RESEARCH

1. Causes of the Difficulties in Matching Cases

The logic of the evaluation of change with panel studies or mixed samples requires that information be obtained from a sample of subjects at two (or more) points in time. The score for each case on a given variable is compared for T(1) and T(2) and an appropriate test is used to determine if there is a significant difference between the two samples. To conduct this analysis it is necessary to compare the scores for each case in the two time periods. In a laboratory experiment this matching of cases presents no problem as participation in the experiment is closely controlled by the experimenter. The only problem which may arise is that it is not possible to repeat the interview or test with a certain number of cases, and when this happens the incomplete cases are dropped from the study.

In urban research the situation tends to be more complex.

In most of the studies we are discussing, the second interview will be conducted one or even two years after the family has been interviewed for the first time. Many of the families will have moved and will not be locatable for the second interview. When a family "drops out" of the sample in this way, the sample size will either be reduced (panel study) or a replacement family will be selected (mixed sample). The replacement family may either be the new family who has moved into the house, or another family may be selected on the basis of random selection.

Once the decision has been made as to which sampling design to use, the procedures to follow are clearly defined and the appropriate type of statistical test will be used depending on whether the samples are related, independent or mixed.

Unfortunately at the time of the analysis it is often extremely difficult for the analyst to be completely certain which persons or families have been reinterviewed and which cases include interviews with different people in T(1) and T(2). The main reasons which can lead to this confusion are the following:

1.1. Administrative problems

(i) In many cases the researcher will rely on information collected by the program he is studying to assist in drawing or updating the sample. This information may be used by program management to determine when families move to new areas (with essential resettlement programs for example), who is selected for a project or who leaves. In some cases this information is not accurate, is not kept up-to-date or is not prepared in the way needed by the researcher. Due to one or more of these reasons it may not be possible to keep control over which families are still in the sample or who can be classified as affected by the project. Some examples:

a. In the essential resettlement program in Zambia, families were moved to a provisional lot which had not been surveyed and which did not have definite boundaries. Many of these plots had their boundaries changed when titles were finalized. This meant that many of the families were "lost" and it was not possible to determine what had been the plot number in the original upgraded areas for the families who had moved to the resettlement areas.

b. In Nairobi many families sold their plots or sublet the whole house, both of which were not permitted by the project regulations. As families were careful to conceal these illegal transactions project records tended to be somewhat unreliable as a source of information on which families were actually occupying each plot.

(ii) Many projects keep a record of the names of the families and the head of household occupying each plot. It is extremely difficult to use the list of names to determine whether the same family is occupying the plot in T(1) and T(2). Some examples:

a. In El Salvador many of the original applications were made by women, and it was the woman's name which appeared on the application form in T(1). When the title was given it would often be in the name of the man or in both names. As a result the name appearing in the project records in T(2) would often be different from the original name in T(1) even though it was the same family.

b. In Zambia, a person has several different names and the name he or she gives in T(1) may be different from the name given in T(2).

1.2 Confusion during the interview

i. In many of the countries studied there is no clear and consistent definition of household head. As a result a woman may be defined as household head in T(1) whereas in T(2) the head may be defined as a man. This is particularly true in countries such as El Salvador where the family group is unstable. In many cases the woman is the de facto head, but a man, who only lives in the house part of the time, will sometimes assume the title of head.

1.3 Coding errors and problems with punching

i. It is almost impossible to eliminate coding errors completely. In many coding systems there is a single variable which indicates whether the original family is being interviewed for the second time or whether the family is a replacement for one of the original families who has moved. For example:

                                                      Code

Original family reinterviewed                           1

New family being interviewed for the first time         2

If a 2 is coded instead of a 1 the case will be classified as a replacement whereas it is an original family. When this happens all of the information is wrongly analysed.

ii. Even when the information is correctly coded there is still a danger of errors in the punching.

iii. In many cases there are ambiguous cases which lead to potential errors and confusions. For example, in the El Salvador studies there were a certain number of families who had been selected as participants but who, for different reasons, had not moved to their new house by the time of the second interview. These families could be considered as participants because they had been selected for the project and were in fact planning to move. On the other hand they were like the control group in that they had not moved to the project.

A similar confusion was caused by families who were part of the control group in T(1) but who had subsequently been selected as participants to replace families who had left. When these families were revisited in T(2) for the second interview with the control group it was found that they were now participants.

All of these examples illustrate types of confusion which can arise, and which can lead to families being wrongly classified.

2. Some corrective measures

2.1 Clearer definitions of head of household, participants, etc.

A considerable amount of confusion can be eliminated through clearer definition of concepts, and through clearer coding instructions. The confusion created by marginal cases such as participants who have not moved to the project can be eliminated by specifying in the coding plan all possible alternative outcomes. For example:

Code 1: Family who had been selected as participant in T(1) and who has moved to the project by T(2)

Code 2: Family who had been selected as participant in T(1) who has not moved to the project by T(2) but who still intends to move

Code 3: Family who had been selected as participant in T(1) but who never moved to the project and who does not intend to continue in the project

Code 4: Member of the control group in T(1) who still remains in the same house and in the control group in T(2)

Code 5: Member of the control group in T(1) who has become a participant and moved to the project by T(2)

Code 6: Participant family in T(1) who dropped out of the project before T(2) and where the replacement family is interviewed in T(2)

Code 7: Control family in T(1) who has moved and been replaced by a new control family in T(2)

The important thing with the coding system is to clearly classify all of the possible outcomes. It does not matter at this point whether a decision has been made as to which of these groups should be defined as participants and control, but only that the classifications are clear. The decision can be made later at the time of analysis how to classify each group, and in fact it is possible to experiment with different definitions to determine the effect they have on the results.
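The idea of coding every outcome separately and deciding on the analysis groups later can be sketched as follows. The seven codes follow the list above; the two analysis definitions (STRICT and BROAD), the sample of codes and all names are hypothetical and only illustrate how alternative definitions can be tried without recoding the data.

```python
from enum import IntEnum

class Outcome(IntEnum):
    PARTICIPANT_MOVED = 1
    PARTICIPANT_NOT_MOVED_STILL_INTENDS = 2
    PARTICIPANT_WITHDREW = 3
    CONTROL_UNCHANGED = 4
    CONTROL_BECAME_PARTICIPANT = 5
    PARTICIPANT_REPLACED = 6
    CONTROL_REPLACED = 7

# Two alternative analysis definitions (illustrative assumptions only).
STRICT = {
    Outcome.PARTICIPANT_MOVED: "participant",
    Outcome.CONTROL_UNCHANGED: "control",
}
BROAD = {
    **STRICT,
    Outcome.PARTICIPANT_NOT_MOVED_STILL_INTENDS: "participant",
    Outcome.CONTROL_BECAME_PARTICIPANT: "participant",
}

def classify(codes, definition):
    """Count cases per analysis group; codes not in the definition are excluded."""
    counts = {}
    for code in codes:
        group = definition.get(Outcome(code), "excluded")
        counts[group] = counts.get(group, 0) + 1
    return counts

sample_codes = [1, 1, 2, 4, 4, 5, 3, 7, 1, 6]  # hypothetical coded cases

strict_counts = classify(sample_codes, STRICT)
broad_counts = classify(sample_codes, BROAD)
print(strict_counts)
print(broad_counts)
```

Because the raw codes are never overwritten, the same analysis can be rerun under each definition to see how sensitive the results are to the classification of marginal cases.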

2.2 Recording the full name of the male and female spouses in the data file

If a coding system is used which permits alpha-numeric coding it is possible to record the names of both spouses on the cards, tape or disk. If a different person is defined as head of household in T(2) it is possible by comparing names to know whether the new name indicates a different family or whether it is simply the other spouse. As a general guideline the names should be recorded in as much detail as possible as in many cultures a person has two or more family names (which can sometimes be used interchangeably) or may have a number of other different names by which he or she may be known.

If the number of cases is not too large (not more than two or three hundred for example) a listing of all names, together with the classification as original family or a replacement, provides a very useful visual consistency check. The use of names has a great advantage over numerical control numbers in that a name is usually recognizable with one or even more letters mispunched, whereas a single error in a number will render it invalid. For example, if FERNANDEZ is mispunched as FERNENDEZ, FARNANDEZ or even FORNENDEZ it is still recognizably the same name. However, if an identity number which should be 990034 is mispunched as 980034 we have no way of knowing whether this is a punching error or whether it indicates that a different person was interviewed on each occasion.
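The visual check on names can be partly automated. The sketch below uses the `difflib` module from the Python standard library to find the closest name on a first-interview roster; the roster, the cutoff of 0.7 and the function name are illustrative assumptions, and any suggested match would still need to be confirmed against the questionnaires.

```python
import difflib

# Hypothetical roster of family names recorded in T(1).
roster_t1 = ["FERNANDEZ", "MARTINEZ", "GONZALEZ"]

def closest_name(name, roster, cutoff=0.7):
    """Return the most similar roster name, or None if nothing is close enough."""
    matches = difflib.get_close_matches(name, roster, n=1, cutoff=cutoff)
    return matches[0] if matches else None

for punched in ["FERNENDEZ", "FARNANDEZ", "FORNENDEZ", "LOPEZ"]:
    print(punched, "->", closest_name(punched, roster_t1))
```

A mispunched name still points to one recognizable roster entry, whereas a mispunched identity number such as 980034 could be a valid number belonging to another case, so similarity alone cannot resolve it.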

2.3 Building redundancy into the coding

Instead of relying on one code to indicate whether the case is an original or a replacement family, there are various types of redundancy which can be built in. For example:

a. Instead of using 1 to represent a repeat interview and 2 a replacement family, one could use:

111 = repeat

222 = replacement

It is unlikely that three consecutive punching errors occur. Thus if we find a code such as 112 or 212 we know there is an error and can go back and check.

b. Coding the information twice or more in different parts of the coding frame. If the information has to be coded independently in two or more different locations in the coding frame this will eliminate the danger that with system a. above a coder could through carelessness code 111 when he meant 222. This is unlikely to happen if the coding occurs on different occasions and is recorded in widely separated columns.
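Both kinds of redundancy are easy to check mechanically. The following sketch assumes the 111/222 scheme described above and a second, independently coded copy of the same field; the function names are illustrative.

```python
VALID_CODES = {"111": "repeat", "222": "replacement"}

def decode_status(code):
    """Return the interview status, or None when the triple code is inconsistent."""
    return VALID_CODES.get(code)

def cross_check(code_a, code_b):
    """Accept a status only when two independently coded fields agree."""
    a, b = decode_status(code_a), decode_status(code_b)
    return a if a is not None and a == b else None

for code in ["111", "222", "112", "212"]:
    print(code, "->", decode_status(code) or "ERROR: check questionnaire")

# A careless but internally consistent code is caught by the second field.
print(cross_check("111", "222"))
```

The triple code guards against punching errors, while the independent second field guards against a coder entering the wrong but well-formed value.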

2.4 Logical consistency checks in the analysis stage.

If the second interview is conducted two years after the first interview, each person must be approximately 2 years older (they could be one or three years older if the interview was not conducted exactly 2 years later). If the head of household remains the same person, then:

AGE (2) = AGE (1) + (between 1 and 3 years).

If we find a case where AGE (2) > AGE (1) + 3 or AGE (2) < AGE (1) + 1 then there is an error. Most computer programs can easily identify the error cases and list them. It is then necessary to return to the coding sheets or the questionnaires to determine the cause of the mistake.
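A listing of the error cases can be produced with a few lines of code. The sketch below applies the rule stated above (AGE (2) must lie between AGE (1) + 1 and AGE (1) + 3); the case identifiers and ages are hypothetical, and the allowed gap would need adjusting to the actual interval between interviews.

```python
def age_inconsistent(age1, age2, min_gap=1, max_gap=3):
    """True when the second reported age falls outside AGE(1)+min_gap..AGE(1)+max_gap."""
    return not (age1 + min_gap <= age2 <= age1 + max_gap)

# Hypothetical cases: (case identifier, AGE(1), AGE(2)).
cases = [
    ("A01", 33, 36),
    ("A02", 47, 38),
    ("A03", 58, 60),
    ("A04", 22, 64),
]

flagged = [case_id for case_id, age1, age2 in cases
           if age_inconsistent(age1, age2)]
print("Cases to check against the questionnaires:", flagged)
```

The check only flags cases for review; it does not say whether the cause is a coding error, a punching error or a different respondent.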

This type of consistency check is extremely useful when the head of household remains the same, but it will obviously not work if a male head is replaced by a female or vice versa.

The appropriate consistency check varies from one culture to another. In some countries age is not a good basis for consistency as there is a high level of inaccuracy in reporting age so that a person will frequently give a different age in different interviews simply because he or she is not sure how old he or she really is.

An example of a consistency check

Table 18 is an illustration of part of the consistency check procedure used in El Salvador. The table was used to estimate the magnitude of possible inconsistencies on age and education. The cases with apparent inconsistencies were then identified and a check was made with the original questionnaire and code-sheets. The two interviews were conducted with an interval of approximately 30 months. All of the cases given in the table are supposedly panel interviews where the same head of household was interviewed on both occasions. Logically the respondents' age in the second interview should be between two and three years greater than in the first interview.

If we look at the distribution of the variable AGEDIF (the difference between reported age in the two interviews) we find the following:

Differences in age between the
first and second interviews         Number of cases       %

-1 or less                                32            14.6
0-1 years                                 24            11.0
2-3 years                                 69            31.5
4-5 years                                 54            24.7
More than 5 years                         40            18.3

Total                                    219           100

Table 18: PANEL DATA AMBIGUITIES, SANTA ANA, EL SALVADOR

CROSSTABULATION OF AGEDIF BY EDUDIF (first line of each row: count; second line: percent of total)

                                      EDUDIF
AGEDIF              0       1      -1       2    3 OR MORE   -2 OR LESS   ROW TOTAL

0-1                12       1       3       1        2           5           24
                  5.5     0.5     1.4     0.5      0.9         2.3         11.0

2-3                 0      13      11       9       20          16           69
                  0.0     5.9     5.0     4.1      9.1         7.3         31.5

4-5                32       4       1       2       10           5           54
                 14.6     1.8     0.5     0.9      4.6         2.3         24.7

-1 OR LESS         14       1       0       6        7           4           32
                  6.4     0.5     0.0     2.7      3.2         1.8         14.6

MORE THAN 5        22       5       3       4        2           4           40
                 10.0     2.3     1.4     1.8      0.9         1.8         18.3

COLUMN TOTAL       80      24      18      22       41          34          219
                 36.5    11.0     8.2    10.0     18.7        15.5        100.0

Source: David Lindauer, "Longitudinal Analysis and Project Turnover. Lessons from El Salvador." September 1979, Urban and Regional Economics Division, Development Economics Department, The World Bank.

Even if we are generous and accept a certain margin of error in reporting, it would seem that the 14.6% whose age is less in the second interview and the 18.3% whose age has increased by more than 5 years signal some potential inconsistency or error.

If we examine the information on education, the group whose education has increased by 3 years or more, or whose education has decreased by two years or more, would also seem to signal potential problems.

On the basis of these and similar checks it was possible to identify a number of punching and coding errors and also to eliminate a number of other inconsistencies. However, a very thorough checkback revealed that it is extremely difficult to develop any foolproof system of consistency checks. Many people simply do not know what their age is, people do change their names, and there are many lapses of memory.

Table 19 illustrates part of a more complete listing system which can be used to detect inconsistencies. Looking at this table reveals a number of possible inconsistencies. In row 9 for example (case number 184) the head of household has supposedly been interviewed twice. In both cases the head is a man, but his age drops from 47 to 38 and his education drops from 11 years to 6 years. Was a different member of the family interviewed? Is there a new head? Was it a different family? Was the interview poorly conducted? In all doubtful cases the original questionnaires were consulted. Although these techniques, particularly with fairly small samples of only 200-300, can reduce considerably the margin of error, it is inevitable there will always be a certain number of cases wrongly coded or punched or classified.

Table 19: MATCHING HOUSEHOLD CHARACTERISTICS USING THE SANTA ANA PANEL DATA (Some Examples)

Columns, in order after the row number: (1) ID#; (2) No. of times interviewed; Characteristics of Household Head *: (3) Sex 1, (4) Sex 2 (1 = Male; 2 = Female), (5) Age 1, (6) Age 2 (in yrs.), (7) Educ 1, (8) Educ 2 (in yrs.), (9) Marital 1, (10) Marital 2 (**); Characteristics of the Household *: (11) Size 1, (12) Size 2 (number)

Row #  ID#  Times  Sex 1  Sex 2  Age 1  Age 2  Educ 1  Educ 2  Marital 1  Marital 2  Size 1  Size 2

1 013 1 2 1 30 45 12 6 2 3 4 3

2 039 1 2 2 75 24 0 9 7 2 2 4

3 209 1 2 2 69 71 0 0 6 6 3 1

4 168 2 1 1 33 36 6 6 3 7 7

5 373 2 1 1 58 60 4 4 2 2 5 5

6 392 2 1 1 56 59 6 6 3 3 5 4

7 150 2 2 1 22 64 9 0 1 3 10 10

8 158 2 1 2 47 46 2 6 2 2 8 6

9 184 2 1 1 47 38 11 6 2 3 5 4

10 344 2 2 2 53 52 0 2 1 5 6 6

11 389 2 1 1 38 42 0 0 2 2 10 9

* Subscripts refer to First (1) and Second (2) panels.

** 1 = Single; 2 = Married; 3 = Free Union; 4 = Divorced; 5 = Separated; 6 = Widowed; 7 = No Information.

Source: David Lindauer, "Longitudinal Analysis and Project Turnover. Some Lessons from El Salvador." September 1979. Urban and Regional Economics Division. World Bank.

Several practical lessons come from this discussion:

1. The most important is to be aware that misclassification is a serious problem. All of the effort of carefully designed samples or supervised interviews is wasted if the cases are wrongly classified. This is a vital step in the research and one which is almost never mentioned.

2. Introduce rigorous supervision systems to make sure you know who you have interviewed.

3. Introduce consistency checks and redundancy into the coding so that internal checks can be made.

4. Develop as many logical checks as possible and be prepared to go back to the questionnaires.

5. Review very carefully any strange, inconsistent cases, the "outliers" on the scatterplots. With fairly small samples a result can be dramatically changed by one or two cases which may be wrong. Be prepared to throw out doubtful cases rather than to defend at all costs the sample size.

Chapter 8: THE APPLICATION OF THE STATISTICAL TESTS WITH PANEL SAMPLES

1. Panel studies without control groups

Panel studies use a design in which the same sample of subjects (individuals, families, etc.) are reinterviewed at two or more points in time. The samples are dependent or related in that the selection of the subjects to interview in T(2) is determined by the selection in T(1). The fact that the samples are related means that we cannot directly use statistical tests which assume that each sample provides an independent estimate of the population variance ($\chi^2$ and T-Test for example).

The sample design for a panel conducted at two points in time without a control group is as follows:

                         T(1)             T(2)
Experimental Group       E(1)1      X     E(1)2

Although this is theoretically a very weak design, in practice it often has to be used as control groups may not be available or are destroyed by natural disaster or government policy.

In the following sections examples will be given of the way change is evaluated with panel studies without control groups when the variables being evaluated are measured on nominal, ordinal and interval scales.

1.1 Nominal scale measurement: Changes in Job Stability (McNemar Test)

Table 20 presents data on changes in job stability in El Salvador. 1/ This study did in fact use a control group which is excluded here for illustrative purposes. 127 participants in a housing project were interviewed in 1977, before they entered the project and again in 1979. In each interview information was obtained on job stability. If a person had been in the same job for 12 months or more his employment was defined as stable, whereas if he had held the job for less than 12 months it was defined as unstable. As all of the families were interviewed twice, this is a panel design with related samples. 1/

1/ The example is taken from "A preliminary analysis of panel data on income and employment in El Salvador" Umnuay Sae-Hua. DEDRB. October 1979.

Table 20: JOB STABILITY AMONG PROJECT PARTICIPANTS IN SONSONATE, EL SALVADOR 1977 and 1979

                                     1979
                       Unstable (-)    Stable (+)    Total

1977   Stable (+)           19             90         109
       Unstable (-)          8             10          18

       Total                27            100         127

Source: Umnuay Sae-Hua. "A preliminary analysis of panel data on income and employment in El Salvador." Table 5. DEDRB. October 1979.

As the two samples are related it is not possible directly to use $\chi^2$, as this is based upon the assumption of the independence of the two samples. The appropriate test in this case is the McNemar Test. 2/ To use this test the four cells of the table are defined as:

                   1979
                 -       +

1977     +       A       B
         -       C       D

The distribution can be made to approximate the $\chi^2$ distribution as follows:

$$\chi^2 = \frac{(|A - D| - 1)^2}{A + D}$$ with D.F. = 1

It can be seen that the test eliminates cases where there has been no change in job stability (8 cases where employment was unstable in both time periods and 90 cases where it was stable in both periods). In this way the two samples are made independent of each other. The A group can be considered as a random sample of families who changed from stable to unstable employment, and the D group as an independent random sample of families who changed from unstable to stable employment. Each sample is now independent of the other so it is possible to apply $\chi^2$.

1/ Families who could not be reinterviewed were replaced in the second interview but are excluded from this panel analysis although they do enter into the analysis of the "mixed sample".

2/ See Chapter 4, Section 2.

The research hypothesis which is being tested is that job stability will increase between the two observations. If this does occur it will be inferred that this is related to participation in the project. The inference is very weak as we have no control group and are thus not able to control for any of the other possible factors which might have produced the change (the "threats to validity" discussed in Chapters 1 and 2).

The null hypothesis to be tested in this case is:

$H_0: A = D$

In other words the number of people whose job stability increases will equal the number whose job stability decreases. Unlike T-Test, Median Test and others, it is not usual with $\chi^2$ for the null hypothesis to indicate the expected direction of change. It will be recalled that the $\chi^2$ test compares total distributions rather than measures of central tendency (as is the case with the median test and the T-Test). As there is only one degree of freedom, $\chi^2$ is calculated using the correction for continuity as follows:

$$\chi^2 = \frac{(|19 - 10| - 1)^2}{19 + 10} = 2.207$$

With D.F. = 1 this does not reach the 0.05 level of significance so that we cannot reject the null hypothesis, and hence have no evidence of a change in job stability.
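The arithmetic of the test can be retraced with a short sketch. It takes the two "changed" cells of Table 20, applies the continuity-corrected formula, and compares the result with 3.841, the standard tabulated critical value of the chi-square distribution at the 0.05 level with one degree of freedom. The function and variable names are illustrative.

```python
def mcnemar_chi_square(a, d):
    """Continuity-corrected McNemar statistic from the two 'changed' cells."""
    return (abs(a - d) - 1) ** 2 / (a + d)

stable_to_unstable = 19    # cell A in Table 20
unstable_to_stable = 10    # cell D in Table 20
total_cases = 127

chi_square = mcnemar_chi_square(stable_to_unstable, unstable_to_stable)
CRITICAL_05_DF1 = 3.841    # tabulated chi-square value, 0.05 level, D.F. = 1
reject_null = chi_square > CRITICAL_05_DF1
share_changed = (stable_to_unstable + unstable_to_stable) / total_cases

print(f"chi-square = {chi_square:.3f}, reject null hypothesis: {reject_null}")
print(f"proportion of cases that changed: {share_changed:.1%}")
```

Reporting the proportion of cases that changed alongside the statistic makes clear how much of the sample the test is actually based on.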

Table 21 illustrates one way in which the results of the test can be presented. This is useful as it makes explicit all of the assumptions involved, and makes it easier for the reader to follow the logic of the process of inference. It is important to note that a major weakness of this test is that it only includes cases where there was a change. If change only occurs in a minority of cases it would be possible to find a significant value of $\chi^2$ which could be misleading as it could give the impression that (in this case) participation had had a major impact on job stability, whereas in fact the change might have occurred in only a very small number of cases. It is a good practice when reporting results to indicate the proportion of total cases in which change occurred.

1.2 Ordinal scale measurement: Changes in housing quality (Wilcoxon matched pairs signed ranks test) 1/

Table 22 shows the results of the application of a scale of Housing Quality to a panel sample of 243 families who were observed before and after moving to a new house. The new index was constructed by taking 6 attributes of a house:

    Quality of floor
    Quality of roof
    Quality of walls
    Type of toilet
    Type of water supply
    Type of lighting

Each was ranked from 1 (low) to 3 (high) in terms of quality. The 6 ranks were then combined to produce an ordinal scale of quality. 2/ The maximum possible score on the scale is 18 and the minimum is 6. QUALITY1 indicates the scores in 1977, whilst QUALITY2 indicates the scores in 1979. Notice that the table includes a QUALITY1 score of 28 and a QUALITY2 score of 30. Both of these are errors as they are above the maximum possible score of 18. As there were only 2 such cases they were eliminated from the calculations without any danger of affecting the results.
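The construction of the index and the screening of out-of-range scores can be sketched as follows. This is a minimal illustration under stated assumptions: the text says only that the six ranks were "combined", and summing them is assumed here because it is consistent with the stated minimum of 6 and maximum of 18. The attribute keys, function names and sample pairs are hypothetical.

```python
ATTRIBUTES = ("floor", "roof", "walls", "toilet", "water", "lighting")
MIN_SCORE, MAX_SCORE = 6, 18  # six attributes, each ranked 1 to 3

def quality_index(ranks: dict) -> int:
    """Sum the six attribute ranks (assumed combination rule)."""
    for name in ATTRIBUTES:
        if ranks[name] not in (1, 2, 3):
            raise ValueError(f"rank for {name} must be 1, 2 or 3")
    return sum(ranks[name] for name in ATTRIBUTES)

def in_range(score: int) -> bool:
    """True when a score is possible on the 6 to 18 scale."""
    return MIN_SCORE <= score <= MAX_SCORE

# Hypothetical (QUALITY1, QUALITY2) pairs; two contain impossible scores.
pairs = [(14, 16), (28, 18), (13, 30), (15, 15)]
valid = [pair for pair in pairs if in_range(pair[0]) and in_range(pair[1])]
dropped = len(pairs) - len(valid)

best_house = {name: 3 for name in ATTRIBUTES}
print(quality_index(best_house), valid, dropped)
```

Screening on the known range before any ranking is done keeps coding errors of this kind out of the difference scores.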

To apply the Wilcoxon Test the differences between the scores for each case in T(1) and T(2) were calculated and arranged in the form of a frequency distribution. This can be done manually or directly with programs such as SPSS. The results are presented in Table 23.

1/ See Chapter 5 Section 3 for a more detailed explanation of the test.

2/ Figures are taken from the longitudinal study conducted in Sonsonate, El Salvador by the Fundacion Salvadorena de Desarrollo y Vivienda Minima in 1977 and 1979.

Table 21: RESULTS OF TESTING THE HYPOTHESIS THAT PARTICIPATION IN A HOUSING PROJECT AFFECTS JOB STABILITY

Research hypothesis: Participants in a housing project will experience an increase in job stability.

Null hypothesis: $H_0: A = D$
    where: A = people changing from unstable to stable employment
           D = people changing from stable to unstable employment

Level of measurement: Nominal scale variables reduced to binary form.

Sample: Related sample of 127 heads of household who were interviewed twice. No control group was used.

Assumed sampling distribution: No assumptions made.

Statistical test: McNemar Test in which related samples are converted into two independent samples by combining the observations from T(1) and T(2) into a single observation. Cases where no change occurred in employment stability are eliminated so that the test involves only 29 of the total of 127 cases in the original sample.

Rejection level for null hypothesis: p < 0.05

Test statistic: X² corrected for continuity = 2.21

Degrees of freedom: 1

Probability of observed score under null hypothesis: Does not reach minimum required score of 3.841 for p = 0.05.

Interpretation: The null hypothesis cannot be rejected and there is no evidence of a change in job stability.

Table 22: CHANGES IN QUALITY OF HOUSING INDEX OF PROJECT PARTICIPANTS, 1977 AND 1979. SONSONATE, EL SALVADOR.

HOUSING                       HOUSING QUALITY2 (1979)
QUALITY1
(1977)       8   11   12   13   14   15   16   17   18   30   Total

   9         1    0    0    0    1    0    0    0    0    0      2
  10         0    0    0    1    0    0    0    0    0    0      1
  11         0    0    0    0    1    0    2    0    2    0      5
  12         0    1    1    0    2    3    3    1    5    0     16
  13         0    0    2    5    3    3    9    0    5    1     28
  14         0    2    1    3   19    6   30    3   34    1     99
  15         0    0    0    0    5   10   24    3   16    1     59
  16         0    0    0    1    1    5    9    1    7    0     24
  17         0    0    0    0    1    1    3    0    0    0      5
  18         0    0    0    0    0    1    1    0    1    0      3
  28         0    0    0    0    0    0    0    0    1    0      1

Total        1    3    4   10   33   29   81    8   71    3    243

Source: Socio-Economic Survey of Sonsonate 1977/79. Fundacion Salvadorena de Desarrollo y Vivienda Minima. Table prepared by the author.

Table 23: ORGANIZATION OF THE DATA FROM TABLE 22 TO PERMIT THE APPLICATION OF THE WILCOXON MATCHED PAIRS SIGNED RANKS TEST

                                    Frequency
Diff     F     Cum. F    Rank       +      -      Rank (F)

  7      2        2       1.5       2      0          0
  6      5        7       5         5      0          0
  5      8       15      11.5       8      0          0
  4     36       51      33.5      36      0          0
  3     38       89      70.5      33      5        352.5
  2     49      139     114.5      45      4        458
  1     57      197     168.5      34     23       3875

Total                             136     32

T                                                  4658.5

Column Headings:

Diff = the difference between T(1) and T(2) without considering the sign.
F = the frequency of the difference.
Cum. F = cumulative frequency.
Rank = rank order for this group (for example there are 5 cases with a difference of 6. These are cases 3 thru 7 in rank order and the average is (3 + 7)/2 = 5).
Frequency + - = frequency of each difference when the sign is taken into account.
Rank (F) = frequency of the least frequent sign multiplied by the rank of the group. For example the least frequent sign is negative (32 cases compared with 136 positive). As there are 5 negative cases with a Diff = 3, the Rank (F) for this group is 70.5 x 5 = 352.5.
T = Sum of Rank (F) = 4658.5

Note: The 45 cases where there was no change are not included in the table. The two cases with a value out of range are also excluded.

Source: Table 22.

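As the number of cases is greater than 25, the results can be transformed into a Z score by using Equation 13:

$$Z = \frac{T - \dfrac{N(N+1)}{4}}{\sqrt{\dfrac{N(N+1)(2N+1)}{24}}}$$

Substituting T = 4658 and N = 168:

$$|Z| = \left| \frac{4658 - \dfrac{168(169)}{4}}{\sqrt{\dfrac{168(169)(2 \times 168 + 1)}{24}}} \right| = 3.86$$

Because T is the sum of the ranks for the less frequent (negative) sign, the positive differences predominate and the score is recorded as Z = + 3.86.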
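The transformation in Equation 13 can be checked numerically. The sketch below is an illustration with hypothetical function names, not the author's procedure. Since T here is the rank sum for the less frequent sign, the raw value of the expression is negative; the sketch therefore reports the magnitude, which is the figure compared with the table of the normal curve, together with the one-tail area beyond it.

```python
import math

def wilcoxon_z(t: float, n: int) -> float:
    """Normal approximation for the Wilcoxon signed-ranks statistic T."""
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean_t) / sd_t

def normal_upper_tail(z: float) -> float:
    """Area of the standard normal curve beyond z (one tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = wilcoxon_z(4658, 168)          # T and N as used in the text
p_one_tail = normal_upper_tail(abs(z))

print(f"|Z| = {abs(z):.2f}, one-tail p = {p_one_tail:.5f}")
```

The direction of the change has to be read from which sign is the less frequent one, not from the arithmetic sign of this expression.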

It is important to note the sign of the Z score. As it is positive this means QUALITY2 is greater than QUALITY1. If the sign were negative this would automatically confirm the null hypothesis as we have stated it.

Our research hypothesis is that housing quality will improve as a result of moving to the project and that:

QUALITY2 > QUALITY1

Our null hypothesis is that there is either no change or that QUALITY1 will be higher than QUALITY2. This is expressed as follows:

$$H_0: \text{QUALITY1} \ge \text{QUALITY2}$$

As we are specifying the direction of expected change, it is appropriate to use a one-tail test. We will specify the 0.01 level for rejection of the null hypothesis.

The table of the normal curve indicates there is only .00006 of the area of the curve beyond this value. As we are using a one-tail test we do not have to double the value. The probability is thus extremely small of finding a value as extreme as this under the null hypothesis and we can conclude that housing quality has improved for the project participants. What we cannot infer is that the improvement was due to the project. To be able to make any inference about project effect it is necessary to know what would have happened to housing quality without the project, and this can only be done statistically if we have an adequate control group. We underline the word statistically because in many cases it is possible to know, without using statistics, that the change was due to the project. In Lusaka, for example, where there is presently no control group, it is very easy to visit squatter compounds in other parts of the city and to observe that almost no improvements have been made there since the project started. As the project was a squatter compound previously, one can make inferences about the project impact without the need to resort to statistics.

This point is important because a reading of research textbooks would often give the impression that without a control group no type of useful or valid inferences can be made about the effects of a project or experimental treatment. Although the research cautions are extremely important, they should not blind us to the very useful (and probably valid) inferences which can be made in many contexts without the benefits of a control group.

Returning to our present example, the housing market in San Salvador is more complex than Lusaka, and many changes are taking place in the control areas. In this case, without the benefit of a control group, it would be very difficult to reach strong conclusions about the project's impact.

Table 24 summarizes the results of the statistical test.

Table 24: RESULTS OF TESTING THE HYPOTHESIS THAT PARTICIPATION IN A HOUSING PROJECT IMPROVES THE QUALITY OF HOUSING

Research hypothesis: Participants in a housing project will experience improvements in the quality of their housing.

Null hypothesis: $H_0: \text{QUALITY1} \ge \text{QUALITY2}$

Level of measurement: Ordinal scale variables.

Sample: Related sample of 243 families who were interviewed twice. No control group.

Assumed sampling distribution: No assumptions are required.

Statistical test: Wilcoxon matched pairs signed ranks test. As N > 25 the result will be transformed into a Z score.

Rejection level of null hypothesis: p < 0.01 (one-tail).

Test statistic: Z = 3.86

Probability of observed score under null hypothesis: p = 0.00006.

Interpretation: The null hypothesis must be rejected and we conclude that the quality of housing, as measured by the index QUALITY, has increased. However, as no control group was used, we cannot conclude with any certainty that this change was produced by participation in the project.

1.3 Interval scale measurement: Changes in earned income of head and earned income of household (T-Test)

Table 25 presents information on the earnings of the family head and of household members among participating families in a housing project in El Salvador. Each family was interviewed twice, in 1977 and 1979, and for both the head and the household it can be observed that income has increased (for heads of households the increase = ¢ 37.6 which is about 14 percent; for family income the increase = ¢ 100.5 which is about 27 percent). It should be stressed that these increases are in nominal terms and do not take into account the rate of inflation so that the increase in real income will be considerably less and might even be negative.

Table 25: CHANGES IN EARNED INCOMES BETWEEN 1977 AND 1979 OF FAMILY HEADS AND ALL HOUSEHOLD MEMBERS OF FAMILIES PARTICIPATING IN A HOUSING PROJECT. SONSONATE, EL SALVADOR.

                                 Average earned incomes (¢)
                                 1977      1979      Change     Sample

Earned income of head            260.8     298.4      37.6       127
Earned income of household       362.6     463.1     100.5       155

Note: Families where the information was not available were excluded.

Source: Umnuay Sae-Hua. "A preliminary analysis of panel data on income and employment in El Salvador." Table 1. Urban and Regional Economics Division. World Bank. October 1979.

Our research hypotheses are that mean income of the head of household (HI) and family income (FI) have increased. As we specify the direction of the expected changes we will use a one-tail test. The null hypotheses are as follows:

$$H_0: HI(1977) \ge HI(1979)$$

$$H_0: FI(1977) \ge FI(1979)$$

To determine whether there has been a statistically significant increase, the T-Test will be used. As the samples are paired the version of the T-Test for paired samples will be used (see Equations 27 and 28). The values of T are as follows:

                        T        D.F.     One-tail probability

    Income of head      2.93     125      < 0.005
    Family income       5.55     125      < 0.0005

In both cases the probabilities of finding these values if the null hypothesis is true are so low that we can conclude there has been a statistically significant increase in income. However, as there was no control group we cannot infer that the change was due to participation in the project. In Section 2.3 of this chapter we will show that a similar change occurred in the control group. Although in some cases it may be possible to make inferences about the cause of improved housing quality without a control group on the basis of observation, it is almost impossible to do this with income as so many more complicating factors are involved and because the relationship between project and income is less direct.
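The paired version of the T-Test works on the difference between the two observations for each family. Equations 27 and 28 are not reproduced in this section, so the sketch below assumes the standard form, in which the mean difference is divided by its standard error and the degrees of freedom are the number of pairs minus one. The incomes are hypothetical figures chosen only to show the mechanics; the survey values of T cannot be recomputed from the averages in Table 25 alone.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired T statistic and degrees of freedom (assumed standard form)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical incomes for five household heads.
income_t1 = [200, 250, 300, 260, 240]
income_t2 = [220, 270, 310, 300, 250]

t_value, df = paired_t(income_t1, income_t2)
print(f"T = {t_value:.2f} with D.F. = {df}")
```

For a one-tail test of an increase, a positive T is then compared with the one-tail critical value for the given degrees of freedom.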

Table 26 presents the results in the usual form (for the household head).

Table 26: RESULTS OF TESTING THE HYPOTHESIS THAT PARTICIPATION IN A HOUSING PROJECT IMPROVES INCOME OF THE HOUSEHOLD HEAD

Research hypothesis: Participation in the housing project will increase the income of heads of household.

Null hypothesis: $H_0: HI(1977) \ge HI(1979)$
    where HI = average income of head of household in 1977 and 1979.

Level of measurement: Interval scale variables.

Sample: Dependent sample of 127 families who were interviewed twice. No control group was used.

Assumed sampling distribution: Normal distribution of sample means.

Statistical test: T-Test for pairs with no control group.

Minimum rejection level for null hypothesis: p < 0.05

Test statistic: T = 2.93

Degrees of freedom: 125

Probability of observed score under null hypothesis: p < 0.005

Interpretation: The null hypothesis must be rejected and we can conclude that income of heads of household has increased significantly. However, as there is no control group we cannot infer that the increase was due to participation in the project.

2. Related samples with control groups

The design for related samples (panel) with control group can be represented as follows:

                          T(1)            T(2)
    Experimental group    E(1)1     X     E(1)2
    Control group         C(1)1           C(1)2

This indicates that the same families are interviewed in T(1) and T(2).

2.1 Nominal variable: Changes in job stability. Use of X² with variables transformed into independent samples. 1/

Table 27 presents data from a longitudinal study in Sonsonate, El Salvador. An original sample of 320 households was interviewed in 1977. When the survey was repeated in 1979 it was only possible to reinterview 243. When a family could not be reinterviewed it was replaced by the new family living in the same dwelling. The combination of the original families with the replacements produces a mixed sample which under certain circumstances 2/ can be assumed to approximate a random sample of all families living in the area at that point in time.

The present example is based on the analysis of the 243 families who were reinterviewed and who thus formed a panel. Of these families 156 were project participants and 87 were from two control areas. 3/ These families can be considered as random samples of original families who have remained in the control areas or who have moved into the project.

1/ The data for this example are taken from "A preliminary analysis of panel data on income and employment in El Salvador" Umnuay Sae-Hua. Urban and Regional Economics Division, World Bank. September 1979.

2/ The most important assumption is that the total number of dwellings in the area has not changed significantly between the time of the two surveys.

3/ Extra legal subdivisions and tenement houses.

Table 27: JOB STABILITY AMONG PROJECT PARTICIPANTS AND CONTROL GROUPS IN SONSONATE 1977 AND 1979

                                1977     1979

    Experimental   Stable        109      100
    Group          Unstable       18       27
                   Total         127      127

    Control        Stable         58       46
    Group          Unstable        7       19
                   Total          65       65

Source: Umnuay Sae-Hua. "A preliminary analysis of panel data on income and employment in El Salvador" Urban and Regional Economics Division. World Bank. September 1979. Table 5. The figures have been reworked by the author.

It can be appreciated that once families begin to drop out of the sample and are replaced, the panel sample is no longer representative of all families living in the area but only of the original families. 1/ For the purposes of the analysis, job stability was dichotomized into stable, which meant having been in the present job for at least 12 months, and unstable, having been in the present job for less than 12 months. It can be seen that in 1977 a total of 109 participants had stable employment and 18 unstable, as compared with 58 and 7 respectively for the control group. In 1979 the number of people with stable employment had fallen for both the experimental and the control groups. The purpose of the test is to determine whether there is a difference in the change from stable to unstable employment between the two groups.

The simplest way to compare these groups would be to use X² but as the samples in T(1) and T(2) are related, it is not possible to directly use X². 2/

1/ To determine the extent of bias produced by the use of the panel sample, a comparison was made for the 1977 data between the families who were reinterviewed in 1979 (the panel) and the families who had moved by 1979 and could not be reinterviewed. The differences between the two groups were very small. For example, the income of household heads in the control group was 225 Colones for the panel and 238 for the replacement sample.

2/ The X² test is designed to compare the distributions of two independent samples. If the two samples are drawn independently, and there is found to be no difference between them in terms of the distribution of observations between the different categories, then we can accept the null hypothesis that both samples are drawn from the same population. If the two samples are not independent of each other, then the application of the test is not valid as the difference between the two distributions will be substantially reduced.

The problem is resolved by transforming the data in the way indicated in Table 28. The stability scores for 1977 and 1979 are combined for each case. A person who had a stable job in both time periods would fall into the category of ++ whereas a person with unstable employment in both periods would fall into the -- group. Similarly a person who had unstable employment in 1977 but stable in 1979 would be classified as - + and a person who changed from stable to unstable would be classified as + -. The table shows that among participants 90 were stable in both periods and 8 were unstable in both periods, 10 changed from unstable to stable and 19 moved from stable to unstable.

Table 28: CREATION OF 4 JOB STABILITY CATEGORIES BY COMBINATION OF STABILITY SCORES IN T(1) AND T(2). SONSONATE, EL SALVADOR 1977 AND 1979

    1977              -      +      -      +
    1979              -      +      +      -     Total

    Participants      8     90     10     19      127
    Control           3     42      4     16       65

    Total            11    132     14     35      192

Source: "A preliminary analysis of panel data on income and employment in El Salvador." Umnuay Sae-Hua. Urban and Regional Economics Division. September 1979. Table 5.
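The recoding described above, and its link back to the yearly totals, can be sketched as follows. The function names and the short list of example observations are hypothetical; the cell counts are those printed in Table 28. Adding the cells back up by year gives the stable and unstable totals for 1977 and 1979, which is a useful consistency check on the transformation (for participants in 1979 the cells imply 100 stable and 27 unstable, which sum to the 127 cases).

```python
from collections import Counter

def change_category(stable_1977: bool, stable_1979: bool) -> str:
    """Combine the two observations on one person into ++, --, -+ or +-."""
    return ("+" if stable_1977 else "-") + ("+" if stable_1979 else "-")

# Hypothetical paired observations: (stable in 1977, stable in 1979).
observations = [(True, True), (True, False), (False, True),
                (False, False), (True, True)]
counts = Counter(change_category(a, b) for a, b in observations)

# Cell counts as printed in Table 28.
TABLE_28 = {
    "participants": {"--": 8, "++": 90, "-+": 10, "+-": 19},
    "control": {"--": 3, "++": 42, "-+": 4, "+-": 16},
}

def yearly_totals(cells: dict) -> dict:
    """Recover the stable/unstable totals for each year from the 4 cells."""
    return {
        "stable_1977": cells["++"] + cells["+-"],
        "unstable_1977": cells["--"] + cells["-+"],
        "stable_1979": cells["++"] + cells["-+"],
        "unstable_1979": cells["--"] + cells["+-"],
    }

for group, cells in TABLE_28.items():
    print(group, yearly_totals(cells))
```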

The importance of the transformation of the data is that each of the 4 categories is now independent of the others. The ++ cases, for example, can be considered as a random sample of original families who had stable employment in both time periods. The -- cases can be considered as a random sample of original families who had unstable employment in both periods, and whose selection is completely independent of the other groups.

As the samples are now independent we can proceed to define our research hypothesis. We now face a problem. If we test Table 28 and find a significant X², there are two different factors which could have produced this. The difference could have been caused by a difference between the experimental and control groups over both periods, or by a general difference between 1977 and 1979 in both groups. Table 29 presents two hypothetical situations to illustrate this point. In Case A the difference is produced by the fact that nearly all participants have stable employment in both periods whilst nearly all of the control group have unstable employment in both periods. This would produce an X² score of 162.2 which is significant at the .001 level. It would not, however, be valid to interpret this as proving that the project had affected job stability, because it can be seen that virtually all participants had stable employment before they entered the project (1977) and that no change was produced.

Table 29: TWO ALTERNATIVE HYPOTHETICAL SITUATIONS WHICH COULD PRODUCE A SIGNIFICANT X² WHEN PARTICIPANTS AND CONTROL GROUP ARE COMPARED OVER 2 TIME PERIODS

CASE A

    1977              -      +      -      +
    1979              -      +      +      -     Total

    Participants      0    107     10     10      127
    Control          55      0      5      5       65

    Total            55    107     15     15      192

    X² = 162.2     D.F. = 3     Probability < 0.001

CASE B

    1977              -      +      -      +
    1979              -      +      +      -     Total

    Participants     60     10     52      5      127
    Control          50      5      0     10       65

    Total           110     15     52     15      192

    X² = 49.8      D.F. = 3     Probability < 0.001

Source for both tables: Hypothetical data
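The Case A figure can be reproduced from the hypothetical counts with an ordinary X² calculation on a 2 x 4 table. The sketch below uses a hypothetical helper name and the usual definition of expected frequencies (row total times column total divided by the grand total); it is offered as a check on the arithmetic, not as the procedure used in the original report.

```python
def chi_square(table):
    """Pearson chi-square and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Case A of Table 29; columns are --, ++, -+, +-.
CASE_A = [
    [0, 107, 10, 10],   # participants
    [55, 0, 5, 5],      # control
]

stat_a, df_a = chi_square(CASE_A)
print(f"X2 = {stat_a:.1f} with D.F. = {df_a}")
```

Almost all of the statistic comes from the first two columns, that is from people whose job stability did not change, which is why a significant value here says nothing about project impact.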

In Case B it can be seen that in both groups the majority had unstable employment in 1977 (88% and 77% respectively for participants and control). In this case the highly significant X² score of 49.8 is due to the substantial improvement of job stability by participants (52 who were unstable in 1977 were stable in 1979) whilst the trend for the control group was in the other direction. In this case it is valid to say that participation in the project appears to have improved job stability.

To distinguish between these alternative explanations our testing procedures, and consequently our definition of research and null hypotheses, must be conducted in several stages. Our research hypothesis is that job stability will increase more for participants between 1977 and 1979 than for the control group. This can be broken down into two hypotheses:

1) There was a significant change in job stability patterns between 1977 and 1979. The use of this hypothesis is important as it avoids the previously discussed problem of making invalid inferences on the basis of reduced sample sizes.

2) There was a difference in the direction of change of job stability for the experimental and control groups.

To test the first hypothesis we will apply X² to the modified form of the data presented in Table 28. As the samples are now independent it is possible to apply X². The null hypothesis will be:

$$H_0: \text{Dist}(E) = \text{Dist}(C)$$

where (E) and (C) refer to the distribution of the experimental and control groups respectively.

The value of X² = 3.14 with D.F. = 3. As the critical value at the 0.05 level is 7.815 we must conclude there is no difference in the overall distribution of employment stability for the experimental and control groups. We must be careful to note that the absence of an overall difference between the two distributions does not preclude the existence of a difference with respect to those cases where there is a change in job stability. What it means is that any such difference would be too small to have an impact on the overall patterns of job stability. This becomes clear when we note in Table 28 that 143 of the 192 cases in the sample showed no change in job stability between 1977 and 1979. The question is now to determine whether among the 49 people who did change there is any difference in the direction of change for the experimental and control groups. In this case our null hypothesis will be derived from Table 30 and will be as follows:

$$H_0: \text{Dist}(E) = \text{Dist}(C)$$

The value of X² (corrected for continuity) = 1.21 which with D.F. = 1 has a probability greater than 0.1 so we must accept the null hypothesis.
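Both stages of the test can be rerun from the printed counts. The sketch below is a hedged illustration with hypothetical names: it computes an ordinary X² for the 2 x 4 layout of Table 28 and, for the 2 x 2 layout of the 49 changers, the statistic with and without the continuity correction. A recomputation of this kind need not match the printed values digit for digit (on these counts it gives roughly 2.8 for the full table and, for the changers, roughly 1.22 uncorrected and 0.61 corrected); small differences may reflect rounding or the computational form used in the original analysis. All of the values lie well below the 0.05 critical values of 7.815 and 3.841, so the conclusions are unchanged.

```python
def chi_square(table, yates=False):
    """Pearson chi-square for an r x c table. `yates` subtracts 0.5 from
    each absolute deviation and is intended for 2 x 2 tables only."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            gap = abs(observed - expected)
            if yates:
                gap = max(gap - 0.5, 0.0)
            stat += gap ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

TABLE_28 = [[8, 90, 10, 19], [3, 42, 4, 16]]   # columns: --, ++, -+, +-
TABLE_30 = [[10, 19], [4, 16]]                 # columns: -+, +-
CRITICAL_05 = {1: 3.841, 3: 7.815}             # 0.05 critical values by D.F.

overall, df_overall = chi_square(TABLE_28)
changers_raw, df_changers = chi_square(TABLE_30)
changers_corrected, _ = chi_square(TABLE_30, yates=True)

print(f"all cases: X2 = {overall:.2f}, significant: "
      f"{overall > CRITICAL_05[df_overall]}")
print(f"changers:  X2 = {changers_raw:.2f} uncorrected, "
      f"{changers_corrected:.2f} corrected, significant: "
      f"{changers_raw > CRITICAL_05[df_changers]}")
```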

The conclusion of our analysis is that there has been no overall change in patterns of job stability between 1977 and 1979, and even among the small group of 25% who did change their job stability there is no difference between the experimental and control groups. We must therefore conclude that there is no evidence of an association between project participation and job stability.

The results of the analysis are summarized in Table 31.

Table 30: TESTING THE DIFFERENCE IN THE DIRECTION OF CHANGE FOR CASES WHERE JOB STABILITY CHANGED BETWEEN 1977 AND 1979

Cases where job stability changed between 1977 and 1979

    1977              -      +
    1979              +      -     Total

    Participants     10     19       29
    Control           4     16       20

    Total            14     35       49

    X² (corrected for continuity) = 1.21     D.F. = 1     Probability > 0.1

Source: Table 28

Table 31: RESULTS OF TESTING THE HYPOTHESIS THAT PROJECT PARTICIPATION AFFECTS JOB STABILITY

Research hypothesis: Two hypotheses must be tested. Firstly, that a difference in job stability patterns exists between participants and the control group. Secondly, that among subjects who change their job stability there is a difference between participants and control.

Level of measurement: Nominal scale.

Sample: Related samples of participants and control group interviewed in 1977 and 1979.

First null hypothesis: $H_0: \text{Dist}(E) = \text{Dist}(C)$
    where: Dist = the transformed total distributions presented in Table 28

Assumed sampling distribution: Independent random samples with no assumptions about the form of the distribution.

Statistical test: X² after the data from Table 27 has been transformed to produce independent random samples.

Rejection level for null hypothesis: p < 0.05

Test statistic: X² = 3.14

Degrees of freedom: 3

Probability of observed score under null hypothesis: p > 0.1

Second null hypothesis: $H_0: \text{Dist}(E) = \text{Dist}(C)$
    where: Dist = reduced samples presented in Table 30

Assumed sampling distribution: As above.

Statistical test: X² with correction for continuity.

Rejection level for null hypothesis: p < 0.05

Test statistic: X² = 1.21

Degrees of freedom: 1

Probability of observed score under null hypothesis: p > 0.1

Interpretation: There is no difference between participants and the control group in terms of job stability. For the 25% of cases who do change their job stability, there is no difference between the pattern of change for participants and the control group. We must conclude that there is no evidence that participation in the project affects job stability.

2.2 Ordinal scale variable: changes in housing quality (The Median Test)

The Median Test can be used only with independent samples (see Chapter 5 Section 2) but it is often possible to transform related samples into independent samples by combining the two observations on each case into a change score. This can only be done when a control group is used, as in this case two independent ordinal scales can be produced, one for the control group and one for the experimental. In Section 1.2 of this chapter, where we again used housing quality but without a control group, it was not possible to use this transformation. The reason is that if only one change score is produced for the experimental group, there is no other scale to which it can be compared. 1/

To calculate the change score we use Equation 14:

$$\text{Change} = \text{Score}_2 - \text{Score}_1$$

which means that the score in T(1) is subtracted from the score in T(2).

Using these change scores the information needed to apply the Median Test is given in Table 32. It was found that the median increase in QUALITY between T(1) and T(2) was 2 points. 82 families in the experimental group were above this point and 74 below. This contrasted with the control group where only 7 families were above the median change and 80 were below. With the information in this form it is possible to apply the X² test, corrected for continuity. The null hypothesis being tested is that there is no difference between the two distributions. The obtained X² score of 46 has a probability of less than 0.0001 of occurring if the null hypothesis is true, so we can accept the alternative hypothesis that there is in fact a difference between the two distributions. This shows that QUALITY of the housing of participants has improved much more than for the control group, and we can conclude that participation in the project appears to be associated with improved housing quality. The results of the test are presented in the usual form in Table 33.

1/ With an interval level variable it would be possible to apply the T-test under these circumstances to test whether the observed mean differs from zero, but this cannot be done with an ordinal level variable.

Table 32: THE NUMBER OF EXPERIMENTAL AND CONTROL GROUP FAMILIES WHO EXPERIENCE AN INCREASE OF HOUSING QUALITY ABOVE AND BELOW THE MEDIAN CHANGE SCORE OF +2. SONSONATE, EL SALVADOR, 1977-79.

                               Project          Control
                               participant      Group

    Above median increase          82               7
    Below median increase          74              80

Source: Fundacion Salvadorena de Desarrollo y Vivienda Minima. Socio-economic survey of Sonsonate, 1977 and 1979. Calculations by the author.
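The X² for the Median Test can be recomputed from the four counts in Table 32. The sketch below is illustrative, with a hypothetical function name; it uses the familiar shortcut formula for a 2 x 2 table with the continuity correction. On these counts it gives about 45.8, that is 46 after rounding, far beyond the 3.841 required at the 0.05 level with one degree of freedom.

```python
def yates_chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Continuity-corrected chi-square for the 2 x 2 table
           a  b
           c  d
    using the shortcut formula N(|ad - bc| - N/2)^2 / (row and column totals)."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Table 32: rows are above / below the median change score,
# columns are project participants / control group.
stat = yates_chi_square_2x2(82, 7, 74, 80)
print(f"X2 (corrected for continuity) = {stat:.1f}")
```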

Table 33: RESULTS OF TESTING THE HYPOTHESIS THAT THE QUALITY OF HOUSING OF PROJECT PARTICIPANTS WILL INCREASE MORE THAN FOR THE CONTROL GROUP.

Research hypothesis: Participants in a housing project will experience greater improvement in housing quality than will families in a control group.

Null hypothesis: $H_0: Q(E) \le Q(C)$
    where: Q(E) = change in quality for experimental group
           Q(C) = change in quality for control group

Level of measurement: Ordinal scale variables.

Sample: Related samples of 156 experimental and 87 control families who were interviewed twice.

Assumed sampling distribution: No assumptions required.

Statistical test: Two ordinal scores from each subject reduced to a single change score. Median Test applied to these scores which are then tested using X² corrected for continuity.

Rejection level for null hypothesis: p < 0.05 (two-tail)

Test statistic: X² = 46

Degrees of freedom: 1

Probability of observed score under null hypothesis: p < 0.0001

Interpretation: The quality of housing as measured by the QUALITY score has increased more for the experimental than for the control group. Although the control and experimental groups are not equivalent, the results must provide a first indication that the project has had a positive impact on housing quality.

2.3 Interval scale variable: changes in earned income of head of household (T-Test)

Table 34 presents information on earned income of the head of household of participants and a control group in 1977 and 1979. The example is the same as that used in Section 1.3 of this chapter with the exception that in this case we also include a control group.

Although the survey is based upon paired samples, they can be made independent by the calculation of a Change Score 1/ which represents the dif- ference in income between 1977 and 1979. Using this procedure the samples are reduced to two independent groups, one of participants and one of the control.

Table 34 shows that in 1977 the average income of the participants was ¢ 26 higher than that of the control group. However, in 1979 the control group income had increased by ¢ 75 compared with ¢ 38 for the experimental group and was now ¢ 11 higher. A superficial inspection would suggest that the income of the control group had increased more rapidly and that the project had had a negative effect on income. Thus although the initial research hypothesis was that the project had a positive impact on income, on the basis of the results it is more appropriate to test the opposite hypothesis (that the project has had a negative effect). To test this hypothesis we should use a one-tail test as we wish to specify the direction of the difference.

The first step in the analysis is to determine whether the two samples have a common variance. The F-Value of 17.9 has a probability of only 0.006 of occurring if there is no difference between the variances, so we must conclude that the variances are different, and must use the version of the T-Test for separate variances. The table shows that when we have separate variances the degrees of freedom are approximately halved (99 compared with 189). For separate variances the table shows a 2-tail probability of 0.172; however, as we are using a one-tail test the probability is half of this, in other words 0.086. This still does not reach the 0.05 level but suggests that participation may possibly have some slight but not yet significant negative effect on income. The results are presented in Table 35.

1/ It should be recalled that in Chapter 6 we indicated some weaknesses in the use of Change Scores. This problem will be discussed again in Chapter 12.

Table 34: THE USE OF THE T-TEST TO DETERMINE THE SIGNIFICANCE OF CHANGES IN EARNED INCOME OF HEAD OF HOUSEHOLD IN PARTICIPANT AND CONTROL GROUP BETWEEN 1977 AND 1979. SONSONATE, EL SALVADOR

                                                        Variance Test       Pooled Variance         Separate Variance
               No. of   Mean income (¢)                           2-tail                 2-tail                  2-tail
Group          cases    1977   1979   Change   S.D.    F-Value    prob.     T     D.F.   prob.     T     D.F.    prob.

Participant     127      260    298     38     144      17.9      0.006    1.51   189    0.132    1.38    99     0.172
Control          64      234    309     75     294

Source: Umnuay Sae-Hua. "A preliminary analysis of panel data on income and employment in El Salvador." Urban and Regional Economics Division. World Bank. October 1979.

Table 35: RESULTS OF TESTING HYPOTHESIS TO DETERMINE THE IMPACT OF PROJECT PARTICIPATION ON THE EARNED INCOME OF HOUSEHOLD HEADS

Research hypothesis: The original hypothesis had been that project participation would increase earned income of household heads. However, on inspecting the first results this hypothesis was changed to stating that participation in a housing project would have a negative effect on the earned income of household heads when compared to a control group.

    $H_1: C(YE) < C(YC)$
    where C(YE) = change in income of participants
          C(YC) = change in income of control group

Null hypothesis: $H_0: C(YE) \ge C(YC)$

Level of measurement: Interval scale variables

Sample: Related samples of 127 participants and 64 in control group, each of whom are interviewed twice.

Assumed sampling distribution: Normal distribution of the mean change scores and equal variances of the distribution of the mean change scores.

Test of (equal variances): The F-Test was applied and the probability of equal variances was found to be only 0.006 so it is assumed the variances are different.

Statistical test: Transformation of related samples into independent samples through the use of change scores. Use of T-test for independent samples with unequal variances.

Rejection level for null hypothesis: p < 0.05 (one-tail)

Test statistic: T = 1.38

Degrees of freedom: 99

Probability of observed score under null hypothesis: p = 0.086

Interpretation: No statistically significant difference has been found between the two change scores so we must conclude there is no evidence that participation in the project affects the change in income. However, as the probability is close to the minimum rejection level of the null hypothesis there is a slight indication that the project may have some negative impact on income of head of household.

CHAPTER 9: THE APPLICATION OF THE STATISTICAL TESTS WITH INDEPENDENT SAMPLES

None of the research designs used in the evaluation program discussed in the present report have used independent samples for the main survey design.

The reason for this is that related samples offer a number of advantages, of which the following are some of the most important:

a. Related samples, in which two observations are obtained from the same subject, contain a smaller sampling error. This means that more precise estimations can be made from a given sample size and consequently that smaller samples can be used for the same level of precision.

b. Having two observations on the same subject permits the use of multiple regression analysis in which the effects of non-equivalent control groups can be statistically controlled ("partialled out"). It is particularly important that the effect of the dependent variable in T(1) can be partialled out.

c. Having information on the amount of change in individual subjects means that more detailed follow-up studies can be conducted with groups which have experienced specially high or low rates of change. This cannot be done with independent samples as we normally have no way of measuring the amount of change experienced by a particular person.

Despite these advantages for the related sample design, this design has a number of disadvantages which can be overcome by independent samples, and it is expected that the independent sample design will be considered more seriously in the future. Some of the advantages of the independent sample design are the following:

a. It is much simpler to draw samples. The expensive task of matching can also be avoided.

b. In Chapter 7 we pointed out that matching is a major problem and that when one uses related samples one must always expect a certain margin of error from mismatching, so that some cases are classified as being repeated interviews with the same family when in fact the families are different, and vice versa. This problem can be completely eliminated with independent samples.

c. In the earlier discussion of the advantages of panel studies it was pointed out that the error term is smaller and the same level of precision can be obtained with a smaller sample than would be required if an independent sample design were being used. This potential benefit is, however, largely offset by the effects of sample attrition. In many countries it is found that between the first and second applications of the study, 25 percent or more of families have moved and have to be replaced. Assuming for example that it had been estimated that a sample of 400 was required to obtain accurate estimates with a panel study, this would mean that an original sample of 533 would have to be interviewed so that 400 would remain in T(2).

d. Independent samples, as they are selected afresh each time, are always representative of the total population and do not present the types of interpretation problems found with panel studies once families begin to drop out. With a high attrition rate the panel study only represents original families who lived in the area in T(1) and cannot take into account the changes in the population which have taken place since then.
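The attrition arithmetic in point c can be checked with a short calculation. The following Python sketch is an illustration added here and is not part of the original study; the function name `initial_panel_size` is hypothetical, and the target of 400 cases and the 25 percent attrition rate are simply the figures quoted above.

```python
def initial_panel_size(required_final: int, attrition_rate: float) -> float:
    """Cases to interview in T(1) so that `required_final` remain in T(2).

    Assumes attrition removes a fixed proportion of the original sample.
    """
    if not 0.0 <= attrition_rate < 1.0:
        raise ValueError("attrition_rate must lie in [0, 1)")
    return required_final / (1.0 - attrition_rate)


# Figures quoted in the text: 400 cases needed, 25 percent attrition.
needed = initial_panel_size(400, 0.25)
print(round(needed))  # 533
```

In practice the result would be rounded up rather than to the nearest whole case if at least 400 remaining families must be guaranteed.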

Given the potential importance of the independent sample designs we present them in this Chapter even though they have not yet been used very frequently. Examples of applications will not be given for the designs without control groups, although the appropriate tests will be indicated for variables on each level of measurement. Examples of application are given for the designs with control groups even though in most cases they could not be taken from the principal designs in the evaluation programs, as in most cases these use panel studies or mixed designs. For this reason some of the examples are based on rather unusual group comparisons, but this does not affect the logic of the test.

1. Independent sample designs without control groups

The sample design for a study conducted at two points in time with independent samples and no control group is as follows:

        T(1)              T(2)

        E(1)1      X      E(2)2

which shows that a new sample was drawn in T(2).

The fact of the samples being independent permits the use of a wider range of statistical tests and in most cases, but not all, makes the analysis somewhat simpler.

1.1 Nominal scale measurement: $\chi^2$

With independent samples, the $\chi^2$ test can be used directly without any need for transformation of the variables. The test is applied exactly as explained in Chapter 4. As explained in that chapter it is convenient to combine the use of $\chi^2$ with PHI (for 2 x 2 tables) or Cramer's V (for tables with more cells) so as to adjust for sample size and to have a test whose maximum value is 1 and which can be interpreted as a correlation coefficient. The value of $\chi^2$ is directly proportional to sample size, and when large samples are used there is a danger of producing spuriously significant values of $\chi^2$.
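The dependence of $\chi^2$ on sample size, and the way PHI removes it, can be illustrated with a small calculation. The sketch below is an added illustration, not part of the original report. It assumes the ordinary (uncorrected) Pearson formula for a 2 x 2 table, which may differ in detail from the version given in Chapter 4, and the cell counts are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def phi(a: int, b: int, c: int, d: int) -> float:
    """PHI coefficient: chi-square adjusted for sample size."""
    n = a + b + c + d
    return math.sqrt(chi_square_2x2(a, b, c, d) / n)


# Hypothetical table, and the same proportions with ten times the cases.
small = (30, 20, 20, 30)
large = tuple(10 * cell for cell in small)

print(chi_square_2x2(*small), phi(*small))  # 4.0 0.2 (up to floating-point rounding)
print(chi_square_2x2(*large), phi(*large))  # 40.0 0.2 (up to floating-point rounding)
```

The association between the two variables is identical in both tables, yet $\chi^2$ is ten times larger in the second, while PHI is unchanged.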

1.2 Ordinal scale measurement: The Median Test

The test is applied exactly as explained in Chapter 5.

1.3 Interval scale variables: T-Test, confidence interval estimate and regression analysis

The simplest procedure is to use the T-Test to determine the significance of the difference between means (Chapter 6). This method has the potential disadvantage that in some cases a statistically significant difference will be so small as to have no substantive importance. To avoid this error the T-Test can be complemented by the estimation of confidence intervals which will show how large the differences are as well as their level of significance (Chapter 6 Section 2.2).

2. Independent samples with control groups

This sample design, in its simplest form, can be represented as follows:

                            T(1)              T(2)

    Experimental group      E(1)1      X      E(2)2

    Control group           C(1)1             C(2)2

which shows that for both experimental and control groups, new random samples were drawn in T(2). The statistical analysis for this sample design is slightly more complex than for the related samples, as will be seen in the following sections.

2.1 Nominal scale variables: Changes in occupational categories: Application of T-Test for Proportions to dichotomized categories

Table 36 presents data on the occupational category of the head of household among project participants and control group families in Sonsonate in 1977 and 1979. The figures only refer to families who did not live in the same dwelling in both time periods. For 1977 the sample includes families who were interviewed in this year but who had left their dwelling by 1979. For 1979 the figures refer to families who replaced drop-outs. We thus have two independent samples. 1/ The investigator would like to examine changes in

1/ It could be argued that these two samples are not strictly independent as the replacement families were drawn from the dwelling where the dropout family had lived. However, as assignment to a house is largely independent of the process of leaving, the two groups approximate independent samples.

Table 36: COMPARISON OF CHANGES IN OCCUPATION CATEGORY OF HEAD OF HOUSEHOLD OF DROP-OUTS AND REPLACEMENTS AMONG PROJECT PARTICIPANTS AND CONTROL GROUP. SONSONATE 1977 AND 1979. EL SALVADOR

                                            1977                          1979
                                  Participants    Control         Replacement    Replacement
                                  Who Drop Out    Families        Participants   Control
                                                  Who Move

    1. Agriculture and Fishing          0             0                4              3
    2. Industry and Manufacturing       3             4                4              7
    3. Construction                     1             5                1              0
    4. Electricity and Gas              2             1                4             12
    5. Commerce                         1             4                1              6
    6. Transport and Services          10            27                8             18

    TOTAL                              17            41               19             46

Source: Socioeconomic Studies of Sonsonate 1977 and 1979. Fundacion Salvadorena de Desarrollo y Vivienda Minima. Table prepared by author.

occupational categories to determine whether new families who move into the project have different types of employment from families who have dropped out. This is an interesting research question as program management is interested in determining the potential impact which projects can have on employment.

Unfortunately, there is no statistical test which can be applied directly to the data of Table 36. As the samples are independent of each other it is not possible to obtain change scores as was done for ordinal variables with matched samples (Chapter 8, Section 2.1). The only way in which data of this kind from independent samples can be analyzed is to reduce the table to a dichotomized form. To do this the researcher must decide which is the employment category of most interest. In this case it was decided that the most interesting break would be to compare commerce and services (categories 5 and 6) with all other types of employment. It can be argued that these are the two types of employment which are most unstable and that the project, because of the incentive it gives for long-term housing investment, might tend to attract families with more stable employment. To test this the data is dichotomized into Services and Commerce and Other Employment (Table 37). An initial inspection of Table 37 shows that the proportion of Other Employment among replacement families in the project is 52.7% compared with only 35.3% among families who dropped out of the project. This would provide initial support for the hypothesis. However, to control for the possibility that this might be a general trend throughout the city and not restricted to project participants, a control group is introduced into the analysis. It can be seen that a similar trend exists there as well, with the proportion of Other Employment increasing from 24.4% to 47.8%. To determine whether the two changes differ significantly,

Table 37: CHANGE IN PROPORTION OF HEADS OF HOUSEHOLD WORKING IN COMMERCE AND SERVICES AMONG DROP-OUTS AND REPLACEMENTS. SONSONATE 1977 AND 1979. EL SALVADOR.

                                            1977                          1979
                                  Participants    Control         Replacement    Replacement
                                  Who Drop Out    Families        Participants   Control
                                                  Who Move

    Commerce and Services
        Number                         11            31                9             24
        %                              64.7          75.6             47.3           52.2

    Other Sectors of Employment
        Number                          6            10               10             22
        %                              35.3          24.4             52.7           47.8

    Total
        Number                         17            41               19             46
        %                             100           100              100            100

Source: Table 36

the T-Test for Proportions is applied. As we are comparing rates of change between four groups, the appropriate test is for the Difference of the Difference of Proportions (see Chapter 6, Section 2.1.2).

Our experimental hypothesis is that:

$H_1: (p_{e2} - p_{e1}) > (p_{c2} - p_{c1})$

where p refers to the proportion of Other Employment in the participant and control groups in the two time periods.

The null hypothesis will be:

$H_0: (p_{e2} - p_{e1}) \leq (p_{c2} - p_{c1})$

in other words, that the increase in the proportion of Other Employment in the control group will be greater than or equal to that in the experimental group. As our experimental hypothesis indicates the direction we expect the change to take, we will use a one-tail test.

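Due to the relatively small sample sizes the T-Test should be applied rather than the Z Test. This is calculated by using Equation 26:

$$T = \frac{(p_1 - p_2) - (p_3 - p_4)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2} + \dfrac{p_3 q_3}{n_3} + \dfrac{p_4 q_4}{n_4}}}$$

In the present example, entering the proportions in Commerce and Services from Table 37, this gives the following:

$$T = \frac{(.647 - .473) - (.756 - .522)}{\sqrt{\dfrac{(.647)(.353)}{17} + \dfrac{(.473)(.527)}{19} + \dfrac{(.756)(.244)}{41} + \dfrac{(.522)(.478)}{46}}} = -0.3142$$

The size of the statistic is therefore 0.3142; the negative sign indicates that the shift out of Commerce and Services was slightly larger in the control group than among participants.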

As the T-Score for the 0.05 probability level with a one-tail test and D.F. = 121 is approximately 1.66, well above the value obtained, we must conclude that there is no statistically significant difference in the change of proportions between the participant and the control group. There is therefore no evidence to indicate that participation in the project affects employment in categories other than commerce and services. The results of the test are presented in Table 38.
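The calculation of the statistic for the Difference of the Difference of Proportions can be retraced in a few lines. The Python sketch below is an added illustration rather than part of the original analysis; the function name is hypothetical, and the proportions and sample sizes are those of Table 37 for Commerce and Services, entered in the order used in the worked example (drop-out participants, replacement participants, control families who moved, replacement control).

```python
import math


def t_diff_of_diff_proportions(p, n):
    """T = [(p1 - p2) - (p3 - p4)] / sqrt(sum of p_i * q_i / n_i), with q_i = 1 - p_i."""
    p1, p2, p3, p4 = p
    variance = sum(pi * (1.0 - pi) / ni for pi, ni in zip(p, n))
    return ((p1 - p2) - (p3 - p4)) / math.sqrt(variance)


# Proportions in Commerce and Services and sample sizes from Table 37.
proportions = (0.647, 0.473, 0.756, 0.522)
sizes = (17, 19, 41, 46)

t_score = t_diff_of_diff_proportions(proportions, sizes)
print(round(abs(t_score), 4))  # 0.3142
```

The sign of `t_score` is negative, because the fall in the Commerce and Services share was larger in the control group (0.234) than among participants (0.174).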

Table 38: RESULTS OF TESTING THE HYPOTHESIS THAT PARTICIPATION IN A HOUSING PROJECT REDUCES THE PROPORTION OF HEADS OF HOUSEHOLD WORKING IN COMMERCE AND SERVICES

Research hypothesis: New families who move into the project will be employed more in non-commerce and service employment than families who leave the project. This is stated as:

$H_1: (p_{e2} - p_{e1}) > (p_{c2} - p_{c1})$

Null hypothesis: $H_0: (p_{e2} - p_{e1}) \leq (p_{c2} - p_{c1})$

Level of measurement: Nominal scale variables.

Sample: Independent samples of original families who drop out of project and of families who replace them. Control samples also drawn independently.

Assumed sampling distribution: No assumptions are made about the distribution of the nominal variable but when this is dichotomized, the distribution of the difference of proportions approximates to the normal distribution.

Statistical test: The scale is dichotomized into Commerce and Services versus Other Employment Categories. The T-Test for the Difference of Difference of Proportions is then applied.

Rejection level for null hypothesis: p < 0.05 (one-tail test)

Test statistic: T = 0.3142

Degrees of Freedom: 121

Probability of observed score under null hypothesis: p > 0.25

Interpretation: We must accept the null hypothesis and conclude that there is no evidence that the project tends to attract families who work in employment categories other than commerce and services.

2.2 Ordinal scale variables: Changes in housing quality: The Median Test

Table 39 presents information on housing quality of participants and the control group in Sonsonate, El Salvador in 1977 and 1979. As in the previous example, the figures only refer to families who did not live in the same residence in both periods of time. For the participant group the families included for 1977 are those who had left before the second interview in 1979, whereas the 1979 families are those who had entered the project since 1977. Similarly for the control group, the 1977 families had left before the 1979 interview, and the 1979 families had entered since 1977. We thus have independent samples for both participants and control groups. The hypothesis we wish to investigate is that replacement families entering the project will enjoy a higher standard of housing than the families they replaced in 1977. To determine whether this is due to the project or is a general phenomenon throughout the city, the same measure of housing quality will be applied to the control group. The index of housing quality is the same as that used in Chapter 8, Sections 1.2 and 2.2 and is measured on an ordinal scale.

As the two samples of participants are independent, we cannot use a change score in the way we did in earlier sections. However, if the data from the two participant groups and from the two control groups are combined as in Table 40, it is then possible to use the Median Test.

What we are doing in this case is to compare the total quality scores for the two participant groups with the total quality scores for the two control groups. If the test is applied in this way and we find a difference, this will show there is a difference between the participant and control groups in total. However, in this simple form there are two possible explanations of the difference:

Table 39: HOUSING QUALITY IN 1977 OF FAMILIES WHO DROP OUT OF EXPERIMENTAL AND CONTROL GROUPS BY 1979, AND IN 1979 OF REPLACEMENT FAMILIES WHO WERE NOT IN SAMPLE IN 1977. SONSONATE, EL SALVADOR

                              1977                        1979
    Housing Quality   Participants   Control      Participants   Control

     6 to 10               0            0              0            0
    11                     1            0              0            0
    12                     3            2              0            0
    13                     4            4              0            6
    14                     5           29              0           18
    15                     4           14              0           20
    16                     2            5              9            4
    17                     1            1              0            1
    18                     0            1             12            0

    Total                 20           56             21           49

Source: Socio-Economic Survey of Sonsonate 1977 and 1979. Fundacion Salvadorena de Desarrollo y Vivienda Minima. Table prepared by the author.

Table 40: POOLING OF SCORES FOR 1977 AND 1979 FOR THE EXPERIMENTAL AND CONTROL GROUPS FOR THE DATA PRESENTED IN TABLE 39

                  Combined frequencies for 1977 and 1979

    Quality     Experimental     Control     Cumulative
    score       group            group       frequency

    11               1               0             1
    12               3               2             6
    13               4              10            20
    14               5              47            72
    ----------------------- MEDIAN -----------------------
    15               4              34           110
    16              11               9           130
    17               1               2           133
    18              12               1           146

Source: Table 39

a. That there was a difference between the participant and control groups in T(1).

b. That there was no difference in T(1) and that the overall difference is due to differences in the rate of change of the two groups between T(1) and T(2).

To be able to test the second hypothesis it is necessary to first test hypothesis a. This is done by using the Median Test to compare the two groups in 1977. The information from 1977 can be reduced to the following form:

(Median score = 14)

Score Participants Control

11 to 14 13 35

15 + 7 21

Applying the Median Test as defined in Equation 11:

$$\chi^2 = \frac{76\left(\left|(13 \times 21) - (35 \times 7)\right| - \dfrac{76}{2}\right)^2}{(13 + 35)(7 + 21)(13 + 7)(35 + 21)} = 0.005$$

With a value of $\chi^2$ as low as this there is no evidence of any difference between the two groups in 1977 and we can proceed to compare the overall distributions. When the two distributions are compared the 2 x 2 table is as follows:

(Median score = 14)

Score Participants Control

11 to 14 13 59

15 + 28 46

Our hypothesis is that we should find more high scores for the participants, and the null hypothesis is that there will be no difference between the two groups. Inspection of the table indicates that there are more high scores in the experimental group where 28 out of 41 cases are above the median as opposed to only 46 out of 105 for the control group. This means that if the null hypothesis is rejected, the reason will be that the change is in the expected direction. The application of the test produces a $\chi^2$ score of 8.08 which has a probability of less than 0.005 if the null hypothesis is true. This shows that new project arrivals have improved the quality of their housing in comparison with the control group. The results are summarized in Table 41.
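The first stage of the procedure, the comparison of the two groups in 1977, can be retraced as follows. This Python sketch is an added illustration and not part of the original report. It follows the arithmetic of the worked example literally, that is, a 2 x 2 $\chi^2$ with N/2 subtracted from the absolute cross-product difference before squaring; this is assumed here to be the form of Equation 11, and the function name is hypothetical.

```python
def median_test_chi2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square for the 2 x 2 Median Test table [[a, b], [c, d]].

    Rows are scores at or below the median and above it; columns are the
    participant and control groups. N/2 is subtracted from |ad - bc|
    before squaring, as in the worked example.
    """
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# 1977 comparison: scores 11 to 14 (13 participants, 35 control),
# scores 15 and over (7 participants, 21 control).
chi2_1977 = median_test_chi2(13, 35, 7, 21)
print(round(chi2_1977, 3))  # 0.005
```

With one degree of freedom, a value this small gives no grounds for rejecting the hypothesis that the two groups were alike in 1977.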

2.3 Interval scale variables: Changes in expenditure on food: The T-Test

Table 42 presents data on food expenditure of participants and control families in 1977 and 1979. As in the two previous examples the data refers to drop-outs and replacement families. The hypothesis we are testing is that due to increased expenditures on housing during the construction phase, food expenditure of participant families will decrease relative to the control group.

As food expenditure is an interval level variable, the most convenient test to use is the T-Test. However, as the two observations on the experimental and the control groups are drawn in both cases from independent samples, it is necessary to calculate the T-Statistic in two stages. In the first stage, we must estimate the standard deviation of the difference of means for the experimental and control groups respectively, and then in the second stage we use these two standard deviations to estimate the standard deviation of the difference of means of the experimental and control groups.

Table 41: RESULTS OF TESTING THE HYPOTHESIS THAT NEW FAMILIES ENTERING A HOUSING PROJECT WILL HAVE IMPROVED QUALITY OF HOUSING WHEN COMPARED WITH A CONTROL GROUP OF LOW-INCOME FAMILIES

Research hypothesis: The quality of housing of new families entering the project will have increased more compared with the original housing of families they replace, than will have the housing of new families moving into control areas as compared with the families they replace. This can be expressed as:

$H_1: Q_{e_2} - Q_{e_1} > Q_{c_2} - Q_{c_1}$

where: $e_2$ and $c_2$ = replacement families in experimental and control groups in 1979.
       $e_1$ and $c_1$ = families in the experimental and control groups in 1977 who had left by 1979.
       Q = housing quality.

Null hypothesis: $H_0: Q_{e_2} - Q_{e_1} = Q_{c_2} - Q_{c_1}$

Level of measurement: Ordinal scale variables

Sample: Independent samples of families who drop out and their replacements in the experimental and control groups.

Assumed sampling distribution: No assumptions are made

Statistical test: Median Test conducted in 2 stages, first to determine whether a difference exists between $e_1$ and $c_1$ in T(1), then to compare the pooled scores for the experimental and control groups.

Rejection level for null hypothesis: p < 0.05

Test statistic: (1) For comparison of E(1) and C(1): $\chi^2$ = 0.005 with D.F. = 1

(2) For pooled comparisons of control and experimental groups: $\chi^2$ = 8.08 with D.F. = 1

Probability of observed scores if null hypothesis true: (1) p = .95 (2) p < .005

Interpretation: There is no difference between the two groups in 1977. However, the combined scores show there has been a greater increase of Quality in the experimental group, which means that the change is in the direction predicted by the research hypothesis.

Table 42: COMPARISON OF FOOD EXPENDITURE OF ORIGINAL FAMILIES AND REPLACEMENT IN EXPERIMENTAL AND CONTROL GROUPS 1977 AND 1979. SONSONATE. EL SALVADOR.

                                        1977                        1979
                                Participants   Control      Participants   Control

    Mean (Colones)                 177.4        155.2          240.6        241.5
    Standard deviation (Sx)         62.3         78.8          127.9        192.8
    Number of Observations         179          140             20           46

Source: Socio Economic Studies of Sonsonate 1977 and 1979. Fundacion Salvadorena de Desarrollo y Vivienda Minima. Table prepared by author.

To calculate the standard deviation of the difference of means for the experimental and control groups (Stage 1) we must determine in each case whether the samples from 1977 and 1979 have common variances. This is done by using the F-Test as specified in Equation 17. The F-scores are the following:

    Comparison of variances
    for 1977 and 1979            F        D.F.       Probability

    Experimental                1.73     19,178         .05
    Control                     1.66     45,139         .05

In both cases the variances are different and the standard deviation of the difference of means must be calculated using Equation 19. For the experimental group, using information in Table 42, the standard deviation of the difference of means S[De] is calculated as follows:

$$S[D_e] = \sqrt{\frac{(240.6)^2}{20 - 1} + \frac{(177.4)^2}{179 - 1}} = 56.8$$

Conducting a similar calculation for the control group gives:

$$S[D_c] = 39.7$$

The standard deviation for the difference of the difference of means S[De-c] can now be calculated. The F-Test on the variances of the two differences of means gives a value of 2.04, which has a probability of less than 0.01 if the variances are the same, so we must assume there are separate variances and again use Equation 19. The calculation is as follows:

$$S[D_{e-c}] = \sqrt{\frac{(56.8)^2}{20 + 179 - 1} + \frac{(39.7)^2}{46 + 140 - 1}} = 4.98$$

The T-Score is now estimated to be:

$$T = \frac{(240.6 - 177.4) - (241.5 - 155.2)}{4.98} = -4.63$$

where the figures in the numerator are the means of the four original samples (the first two the experimental groups and the last two the control groups).

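As the two variances were different the number of degrees of freedom must be estimated using Equation 20 as follows:

$$D.F. = \frac{\left[\dfrac{(56.8)^2}{198} + \dfrac{(39.7)^2}{185}\right]^2}{\left[\dfrac{(56.8)^2}{198}\right]^2 \dfrac{1}{200} + \left[\dfrac{(39.7)^2}{185}\right]^2 \dfrac{1}{187}} - 2 = 356$$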
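The second-stage arithmetic can be retraced with the short Python sketch below. It is an added illustration, not part of the original analysis, and it simply follows the worked example as printed: it takes the Stage 1 values 56.8 and 39.7 as given, divides their squares by the pooled number of cases less one (198 and 185), and uses divisors of 200 and 187 in the degrees-of-freedom formula. Those two divisors are a reading of a partly illegible equation and should be treated as an assumption, as should the function names.

```python
import math


def se_diff_of_diff(s_e: float, s_c: float, n_e: int, n_c: int) -> float:
    """Standard deviation of the difference of the differences of means.

    n_e and n_c are the pooled numbers of cases less one (198 and 185).
    """
    return math.sqrt(s_e ** 2 / n_e + s_c ** 2 / n_c)


def approx_degrees_of_freedom(s_e: float, s_c: float, n_e: int, n_c: int) -> float:
    """Approximate D.F. for separate variances, in the form used above."""
    a = s_e ** 2 / n_e
    b = s_c ** 2 / n_c
    return (a + b) ** 2 / (a ** 2 / (n_e + 2) + b ** 2 / (n_c + 2)) - 2


s_de, s_dc = 56.8, 39.7            # Stage 1 results quoted in the text
n_e, n_c = 20 + 179 - 1, 46 + 140 - 1

se = se_diff_of_diff(s_de, s_dc, n_e, n_c)
t_score = ((240.6 - 177.4) - (241.5 - 155.2)) / se
df = approx_degrees_of_freedom(s_de, s_dc, n_e, n_c)

print(round(se, 2))  # 4.98
print(t_score, df)
```

Run on these inputs, the sketch gives a T-Score close to the -4.63 reported and a D.F. value slightly under 357, in line with the 356 quoted in the text.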

The probability of finding a score as high as this if the null hypothesis is true is less than .0005 and we must therefore conclude that there is a difference between the two groups. As the sign of the T-Score was negative this means that there was a greater increase in the control group expenditure on food than in the experimental group and we can accept the research hypothesis.

As the research design includes a control group the results tentatively suggest that the smaller increase in food expenditure by the experimental group is at least partially caused by their participation in the project. One likely reason would be that more money has to be spent on the construction of the new house and other moving expenses. However, caution must be used in making these inferences because the design uses a non-equivalent control group, so until we are able to control for the effects of the initial differences between the experimental and control groups we cannot exclude the possibility that some of the differences we attribute to the project are in fact partially caused by other intervening variables (this issue was discussed at length in Chapter 2 and we return to it in Chapter 12).

2.3.1 Estimating confidence limits: It was pointed out in Chapter 6 that a difference may be statistically significant but at the same time be so small as to have no substantive importance. To determine the importance of statistical differences it was recommended that the confidence intervals be estimated for differences between means. If we use the 0.05 and 0.01 confidence levels, the confidence limits for the observed difference of means of -¢s 23.1 are the following:

                    Minimum difference     Maximum difference

    0.05 level           -13.34                 -32.8
    0.01 level           -10.35                 -35.8

This means we can be 95% certain that control group food expenditure exceeds that of the experimental group by between ¢s 13.34 and ¢s 32.8, and we can be 99% certain that the difference lies between ¢s 10.35 and ¢s 35.8. The policy planner can then decide whether this amount of difference is large enough to be considered important.
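The 0.05 level limits can be reproduced from the figures already obtained. The Python sketch below is an added illustration, not part of the original report. It assumes the usual normal-approximation multiplier of 1.96, which seems reasonable with D.F. = 356 and reproduces the half-width of 9.76 shown in Table 43; the function name is hypothetical.

```python
def confidence_limits(difference: float, std_error: float, multiplier: float):
    """Lower and upper confidence limits: difference -/+ multiplier * std_error."""
    half_width = multiplier * std_error
    return difference - half_width, difference + half_width


# Observed difference of -23.1 colones and standard deviation of 4.98,
# with the assumed 95 percent multiplier of 1.96.
lower, upper = confidence_limits(-23.1, 4.98, 1.96)
print(round(lower, 2), round(upper, 2))  # -32.86 -13.34
```

Because the whole interval lies below zero, the control group increase exceeds the experimental group increase at this level of confidence, and the interval also shows how large that difference is likely to be.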

The results of the tests are summarized in Table 43.

Table 43: TESTING THE HYPOTHESIS THAT FOOD EXPENDITURES INCREASE MORE RAPIDLY FOR THE CONTROL GROUP THAN FOR THE EXPERIMENTAL GROUP

Research hypothesis: Food expenditure will increase less rapidly in the experimental than in the control group. This can be expressed in the following way:

$H_1: F_{e2} - F_{e1} < F_{c2} - F_{c1}$

where: F equals food expenditure

Null hypothesis: $H_0: F_{e2} - F_{e1} \geq F_{c2} - F_{c1}$

Level of measurement: Interval scale variables

Sample: Independent samples drawn from the experimental and control groups in 1977 and 1979.

Assumed sampling distribution: Normal distribution of sampling means.

Statistical test: T-Test for the difference of the difference of sample means with independent samples.

Rejection level of null hypothesis: p

Test statistic: T = -4.63 with D.F. = 356

Probability of observed score under null hypothesis: p < .0005

Confidence intervals for difference of means:

p = 0.05: -¢s 23.1 ± 9.76
p = 0.01: -¢s 23.1 ± 12.74

Interpretation: The results of the test support the research hypothesis that there has been a greater increase in food expenditure by the control than by the experimental group. At the 0.01 level of confidence the difference is at least

CHAPTER 10: THE APPLICATION OF THE STATISTICAL TESTS WITH MIXED SAMPLES

A mixed sample, as defined in the present context, is a panel sample design in which drop-outs are replaced. This means that in T(2) some of the families are being interviewed for the second time (panel) and some are being interviewed for the first time (independent samples). When the total sample is analyzed we have a mixed sample. When this design is used with a control group it can be represented as follows:

                            T(1)              T(2)

    Experimental group      E(1)1      X      E(1)2     Panel sample
                                              E(2)2     Independent sample

    Control group           C(1)1             C(1)2     Panel sample
                                              C(2)2     Independent sample

Using our standard terminology we can see that the sample in T(2) is composed of a sub-sample of families who were first interviewed in T(1) and who have been reinterviewed; and a subsample of families who are being interviewed for the first time in T(2). The structure of the mixed sample can be shown in the following diagram:

                            Experimental     Control
                            Sample           Sample

    Panel sample                 A               B

    Independent sample           C               D

In T(2) the total sample, represented by the area A + B + C + D, is divided into the experimental group (A + C) and the control group (B + D). Three main types of comparison can be made between the experimental and control groups:

1. Panel study A:B

2. Independent random samples C:D

3. Mixed samples (A + C):(B + D)

The mixed sample is designed so that it is a stratified random sample of all families in the area at the time of the interview. The independent samples represent families who were in the area during certain specified time periods, and the panel study represents families who have been in the area since the start of the study.

This design has been used in most of the studies reported in this document. There are several reasons why this design has been used so widely:

i. It permits an analysis both of original families and of the total population at each point in time.

ii. It resolves the problem of the increasing unrepresentativeness of the panel sample.

iii. It permits the use of a much smaller sample than would be required if the original sample had to be increased to allow for expected drop-outs.

iv. It permits a separate analysis of new families who enter a project or control area.

Despite these many advantages, the mixed sample is more difficult to analyze. The reason for this is that the sample is composed of two parts, a panel sample and an independent sample, each of which usually requires a different statistical treatment. As we have seen in Chapter 8, panel samples, because they do not satisfy the requirement of independent estimates of variance, usually have to be modified before statistical tests can be applied. In Chapter 9 it was shown that as independent samples satisfy the requirement of independent estimates of variance, the usual statistical tests can be applied directly. However, these tests can be difficult to apply with four groups (two observations on each of the experimental and control groups) and one does not have the advantage of being able to reduce the four observations to two through the use of change scores or multiple regression as could be done with the panel samples.

The problem to be faced is to determine which are the appropriate testing procedures when the two parts of the sample would normally be tested using different procedures in each case.

One possibility is to analyze the two sub-samples separately. The panel sample would be analyzed using the appropriate tests for related samples whilst the sample of replacements would be analyzed using the appropriate tests for independent samples. The results of the two tests would be reported separately. Table 44 presents the 9 possible outcomes of this procedure. E+ indicates that the test showed that there was a statistically significant difference in the change between the two groups and that the experimental group increased more (or decreased less). E- indicates that there was a statistically significant difference in the rate of change but that in this case the control group increased more (or decreased less). 0 indicates that there was no statistically significant difference between the two groups. As there are 3 possible outcomes for the panel sample (E+, 0 and E-) and the same 3 for the independent samples, when the results of the two samples are combined there are 9 possible outcomes. It can be seen from Table 44 that only 3 of these 9 outcomes give the same result for the panel and independent samples: outcome 1, where the experimental group is shown to increase in both cases (E+,E+); outcome 5, where neither sample shows any difference (0,0); and outcome 9, where the control group increases more than the experimental in both cases (E-,E-). When any of these three outcomes occurs the interpretation of the results is simple, and one can conclude that the experimental group has increased for the whole sample (outcome 1), that there is no difference between the two groups (outcome 5) or that the control group has increased more for the whole sample (outcome 9).

The problem arises when one of the other 6 outcomes occurs. For example, what can be said about the change in the total sample with outcome 2?

In this case the experimental group shows an increase for the panel sample but there is no difference between the two groups for the independent sample.

In outcome 3 the situation is even more difficult as the direction of change is different for the two samples. In this case the experimental group shows an increase for the panel but a decrease for the independent sample.

One possibility is of course to leave the analysis at this point and decide that, except in those cases where the test gives the same result for the two samples, we cannot make any statements about the total sample.

This would be very unfortunate as one of the main purposes of most of the

studies is to examine precisely these types of change at the level of the

total sample.

The approach which is proposed is the following:

i. Test the total sample as if it were independent.

ii. Require a higher level of significance before accepting an observed difference. This will usually mean requiring at least the 0.01 level.

Table 44: POSSIBLE OUTCOMES OF SEPARATE TESTING OF DEPENDENT AND INDEPENDENT SUB-SAMPLES WITHIN A MIXED SAMPLE DESIGN

Outcome     Panel sample     Independent sample

   1             E+                  E+
   2             E+                  0
   3             E+                  E-
   4             0                   E+
   5             0                   0
   6             0                   E-
   7             E-                  E+
   8             E-                  0
   9             E-                  E-

Note: E+ = Experimental group increased significantly in comparison to control group.
      0  = No statistical difference between experimental and control groups.
      E- = Control group increased significantly in comparison to experimental group.

iii. Test the two sub-samples separately.

Using this policy one can obtain a reasonably accurate estimate for the total sample. Normally, a much higher proportion of the cases will fall into the panel sample 1/ so the results will tend to approximate the results of the panel. There are two reasons for using a higher significance level. The first is to add a note of caution and to ensure that marginal results, which might have been produced by the combination of the samples, are not accepted. The second reason is that increasing the confidence level has approximately the same effect as reducing the number of degrees of freedom. With the T-Test (i.e. for interval variables) related samples halve the degrees of freedom, so the use of a higher confidence level will avoid over-estimating significance when tests for independent samples are used with data which are partly derived from related samples.
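The bookkeeping behind Table 44 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the sign convention (a positive statistic means the experimental group changed more) and the default cut-off are assumptions introduced here, not part of the original procedure.

```python
# Hypothetical helpers for labelling test results in the notation of Table 44.
OUTCOMES = ("E+", "0", "E-")


def classify(statistic, p_value, alpha=0.05):
    """Label one test result as 'E+', '0' or 'E-'.

    Assumes a positive statistic means the experimental group changed more
    than the control group. alpha is the rejection level chosen by the analyst
    (0.01 is the level recommended above for the total mixed sample).
    """
    if p_value >= alpha:
        return "0"
    return "E+" if statistic > 0 else "E-"


def table_44_outcome(panel, independent):
    """Return the outcome number (1 to 9) for a pair of sub-sample results."""
    return 3 * OUTCOMES.index(panel) + OUTCOMES.index(independent) + 1


def sub_samples_agree(panel, independent):
    """True for outcomes 1, 5 and 9, where interpretation is simple."""
    return panel == independent


panel_result = classify(statistic=2.9, p_value=0.004)
independent_result = classify(statistic=0.4, p_value=0.35)
outcome = table_44_outcome(panel_result, independent_result)
print(outcome, sub_samples_agree(panel_result, independent_result))
```

When the two labels disagree, the total-sample test at the stricter level becomes the deciding evidence, which is the situation the three-step approach is meant to handle.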

In the following sections examples are given of the use of these estimating techniques with nominal, ordinal and interval variables. All of the examples use control groups, but the logic of the tests is the same if the analysis is conducted without a control group.

1. Nominal scale variables: Changes in the proportion of heads of household working in services and commerce

Table 45 presents information on the occupational group of heads of household in the project and in a control group in Sonsonate. The data are presented for 1977 and 1979. There is also a breakdown between panel (families who were interviewed on both occasions), replacements (where a different family was interviewed in 1977 and 1979) and the total sample. The purpose of the analysis is to determine whether there have been changes in the type of occupation of the heads of household for all participants.

1/ The results of the studies so far available show that after 2 to 3 years it is still possible to find at least 60 percent of the original sample.

As we are dealing with a nominal scale variable the table must be reduced to a dichotomy before the comparison can be made. It was decided that the most interesting question is whether there has been a change in the proportion of heads of household who work in services and commerce. The reason for selecting this category is the same as that presented in Chapter 9 Section 2. Our hypothesis is again that participation in the project will reduce the proportion of heads working in services and commerce (here called SERVICES). The experimental hypothesis is defined as:

$$H_1: (p_{e2} - p_{e1}) < (p_{c2} - p_{c1})$$

where: p refers to the proportion of SERVICE employment.

The null hypothesis will be:

$$H_0: (p_{e2} - p_{e1}) = (p_{c2} - p_{c1})$$

To test this hypothesis for the total sample we will assume that the two samples are independent and will use the T-Test for the Difference of the

Difference of Proportions. The data necessary to calculate T is presented in

Table 46. Using Equation 26 the value of T is found to be 0.4978 which has a probability of 0.312 under the null hypothesis. We must obviously accept the null hypothesis that there is no difference between the experimental and control groups in terms of the change in the proportion of family heads working in

SERVICES.
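The calculation for the total sample can be sketched as follows. Equation 26 is not reproduced in this chapter, so the sketch assumes an unpooled standard error (the sum of p(1 - p)/n over the four samples); with the rounded proportions of Table 46 this assumption gives a value of about 0.498 and a one-tail probability of roughly 0.31, close to the figures quoted above. The function names are illustrative.

```python
import math


def diff_of_diff_proportions(samples_e, samples_c):
    """Z-type statistic for the difference of the difference of proportions.

    samples_e and samples_c are ((p1, n1), (p2, n2)) for the first and second
    survey. The statistic is positive when the experimental group's proportion
    fell by more than the control group's. The unpooled standard error is an
    assumption of this sketch.
    """
    (pe1, ne1), (pe2, ne2) = samples_e
    (pc1, nc1), (pc2, nc2) = samples_c
    reduction_e = pe1 - pe2
    reduction_c = pc1 - pc2
    variance = sum(
        p * (1 - p) / n
        for p, n in ((pe1, ne1), (pe2, ne2), (pc1, nc1), (pc2, nc2))
    )
    return (reduction_e - reduction_c) / math.sqrt(variance)


def upper_tail_probability(z):
    """One-tail probability of a standard normal value at least as large as z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Rounded proportions and sample sizes as given in Table 46.
experimental = ((0.675, 157), (0.454, 154))
control = ((0.67, 115), (0.491, 114))

t_total = diff_of_diff_proportions(experimental, control)
p_total = upper_tail_probability(t_total)
print(round(t_total, 4), round(p_total, 3))
```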

We will now proceed to test the two sub-samples separately to determine whether any difference exists in the change in proportion of heads in SERVICES.

Table 45: OCCUPATIONAL GROUP OF HEAD OF HOUSEHOLD FOR PANEL, REPLACEMENT FAMILIES AND TOTAL SAMPLE. PARTICIPANTS AND CONTROL GROUP. SONSONATE 1977 AND 1979

                               Panel         Replacements        Total
                            1977   1979      1977   1979      1977   1979

Participants
Agriculture and fishing        3     17         0      4         3     21
Manufacturing                 25     14         3      1        28     15
Construction                  11      6         1      1        12      7
Electricity, gas, water        6     37         2      4         8     41
Commerce                      29      5         1      1        30      6
Transport and services        64     56        10      8        76     64

TOTAL                        138    135        17     19       157    154

Control group
Agriculture and fishing        4     11         0      3         4     14
Manufacturing                 14      5         4      7        18     12
Construction                   8      3         5      0        13      3
Electricity, gas, water        2     15         1     12         3     27
Commerce                      14      4         4      6        18     10
Transport and services        31     30        27     18        59     48

TOTAL                         73     68        41     46       115    114

Source: Socio-economic surveys of Sonsonate. Fundacion Salvadorena de Desarrollo y Vivienda Minima. San Salvador, El Salvador. Table prepared by the author.

Table 46: INFORMATION REQUIRED TO APPLY THE T-TEST FOR THE DIFFERENCE OF DIFFERENCE OF PROPORTIONS TO THE TOTAL SAMPLE FOR TABLE 45.

                          T(1)      T(2)
                          1977      1979

Experimental group
  Np                       106        70
  Nq                        51        84
  p                       .675      .454
  q                       .325      .545
  Sd(p)                   .468      .497
  n                        157       154

Control group
  Np                        77        56
  Nq                        38        58
  p                        .67      .491
  q                        .33      .509
  Sd(p)                    .47      .489
  n                        115       114

Table 47: INFORMATION REQUIRED TO APPLY THE X² TEST FOR THE PANEL SAMPLE WITH THE DATA OF TABLE 45

SERVICE (1) and other employment (2) in 1977 and 1979

                1977:     1      2      1      2
                1979:     2      1      1      2     Total

Participants             42     11     44     28      125
Control                  15      8     25     16       64

TOTAL                    57     19     69     44      189

For the panel sample we cannot directly use the T-Test for proportions so we will combine the two observations and use X². The necessary information is presented in Table 47. When the complete distributions are compared, to determine whether there is a difference in the total distribution of participants and the control group, the value of X² = 2.28, which with 3 degrees of freedom is not significant. When the test is applied to heads who have changed from SERVICE to other employment or vice versa, the value of X² = 1.68, which is also not significant. This means that for the panel sample there is also no observable difference in the rate of change of SERVICE employment.

For the replacement sample it is again possible to apply the

T-Test for the difference of difference of proportions as the two samples are independent. The value of T = 0.3142 which has a probability of 0.378 under the null hypothesis which means that we cannot reject the null hypothesis.

Table 48 presents the results of the tests for the total sample, for the panel and for the independent samples. As all 3 tests show no significant difference between the experimental and control groups, we can conclude that the finding of no difference for the total sample is valid (this is Outcome 5 on Table 44).

Table 48: RESULTS OF THE STATISTICAL TESTS FOR THE TOTAL SAMPLE, PANEL AND INDEPENDENT SAMPLES FOR THE DATA IN TABLE 45

                                                                      Probability under
                                                    Test     Score    null hypothesis

Total sample                                         Z        .498         .312

Panel:
  comparison of control and experimental in T(1)     X²       1.68         .25
  comparison T(1) and T(2)                           X²       2.28         .5

Independent samples                                  Z        .3142        .378

2. Ordinal scale variables: Changes in housing quality

Table 49 presents the results of a study on housing quality conducted in Sonsonate in 1977 and 1979. The figures are for a mixed sample where some of the families are being reinterviewed in 1979 whereas the replacement families are being interviewed for the first time.

As the scale is ordinal the appropriate test to use is the Median Test. As a first step we compare the total samples in 1977 to determine whether there was any difference between the experimental and control groups. The test gives X² = 7.65 which with one degree of freedom has a probability of 0.01. This means that a difference already existed between the two groups in 1977. When the two sets of data for the experimental group (1977 and 1979) are combined and compared with the two sets of data for the control group, again using the Median Test, we find that X² has now increased to 127 which has a probability of 0.0001. This suggests tentatively that there may have been an increase in the difference between the experimental and control groups, but one cannot infer too much simply from an increase in the level of significance.

Table 50 shows that for both the panel sample and the independent samples (replacements) there are also significant differences between participants and control. The conclusions which we can draw from these results are:

i. The quality of housing is higher for participants than for the control group in 1979.

ii. The same pattern is found for all three samples so we can accept that there is a consistent difference.

Table 49: HOUSING QUALITY IN 1977 AND 1979 OF PARTICIPANTS AND CONTROL GROUP. SONSONATE, EL SALVADOR

                          1977                      1979
Housing Quality    Participants   Control    Participants   Control

      8                  0            0            0            1
      9                  0            2            0            0
     10                  0            1            0            0
     11                  4            2            0            3
     12                 12            9            0            4
     13                 17           19            0           16
     14                 67           66            0           51
     15                 52           25            7           47
     16                 19           12           81           13
     17                  4            3            6            3
     18                  3            1           83            0
     19                  1            0           28            1
                         9 3 7

Total                  180          140          180          140

Source: Socio-Economic Survey of Sonsonate, 1977 and 1979. Fundacion Salvadorena de Desarrollo y Vivienda Minima. Table prepared by author.

iii. There is some slight indication that the difference in quality has increased between the two groups from 1977 to 1979 but the evidence is not very strong.

iv. We cannot say with certainty that participation in the project has increased the difference in housing quality between participants and the control group.

3. Interval variables

It was shown in Chapter 6 Section 3.3.1 that when the T-Test is applied to a panel study with experimental and control groups, the result is the same as if the test had been applied to two independent samples, but with the number of degrees of freedom halved. The reason for this is that the two observations on each case are combined to produce a single change score, which makes the two samples independent but with half the number of observations. A mixed sample is composed partly of cases where two observations are obtained on the same subject and partly of cases where the two observations are obtained from different subjects. Following the argument presented above it is possible to analyze the sample as if it were independent, but using half the degrees of freedom. As part of the sample is in fact independent we will be reducing the degrees of freedom more than necessary, so this is a conservative measure to ensure that we under-estimate rather than over-estimate the significance level.

Example: Changes in the number of households occupying structures in

low-income areas of Manila 1/

Table 51 shows the number of households occupying structures

in the Tondo Foreshore Project and in 3 low-income areas used as a control

in Manila. The survey was conducted in 1978 and repeated in 1979 with the

1/ Taken from "A study of the impact of the project on the physical environment of Tondo." Research and Analysis Division. National Housing Authority. The Philippines. 1979.

Table 50: RESULTS OF THE STATISTICAL TESTS FOR THE TOTAL SAMPLE, PANEL AND INDEPENDENT SAMPLES FOR THE DATA IN TABLE 49

                                               Probability under
                         Test       Score      null hypothesis

Total sample:
  1977                   Median      7.65          0.01
  1977:1979              Median      127           0.0001

Panel                    Median      46            0.0001

Independent samples      Median      8.08          0.005

purpose of determining whether the upgrading of community services and the

reblocking process would affect the number of households occupying each dwelling.

As there is no clear theoretical reason to indicate whether the reblocking process will increase or decrease the number of households per dwelling, the research hypothesis is that reblocking will change the number of households per dwelling but without indicating the expected direction of change. The null hypothesis is therefore:

$$H_0: H(E) = H(C)$$

where: H(E) = no. of households per dwelling in the project area.
       H(C) = no. of households per dwelling in the control areas.

The figures required to conduct the T-Test are given in Table 52 where it

can be seen that a slight increase in the number of households per dwelling

occurred in Tondo (1.7045 to 1.7219) at the same time as there was a slight

reduction in the combined control areas (1.8897 to 1.8419).

As we are dealing with 4 samples the T-Test must be calculated in two stages; in the first we estimate the standard deviation of the difference between means for the two samples drawn from the experimental and control groups respectively. In the second stage we then calculate the standard deviation of the difference of the difference of means.

Table 51: DISTRIBUTION OF STRUCTURES BY NUMBER OF HOUSEHOLDS OCCUPYING THE STRUCTURE. TONDO AND CONTROL AREAS, 1978 AND 1979

                        Tondo          Malabon        Mandaluong      Paranague
No. of Households     No.     %      No.     %       No.     %       No.     %

A. 1978
  1                   584   59.7      41   58.6       36   37.5       69   65.1
  2                   219   22.4      14   20.0       22   23.0       22   20.8
  3                    98   10.0      12   17.1       13   13.5       12   11.3
  4                    48    4.9       2    2.9       13   13.5        3    2.8
  5                    20    2.1       1    1.4       10   10.4
  6                     6    0.6
  7                     2    0.2                       2    2.1
  8
  9                     1    0.1
  10

TOTAL                 978  100.0      70  100.0       96  100.0      106  100.0

B. 1979
  1                   560   57.3      38   54.3       37   38.5       72   67.9
  2                   251   25.7      17   24.3       23   24.0       24   22.7
  3                    92    9.4      10   14.3       17   17.7        5    4.7
  4                    45    4.6       1    1.4        9    9.4        3    2.8
  5                    20    2.0       3    4.3        5    5.2        2    1.9
  6                     6    0.6       1    1.4        3    3.1
  7                     3    0.3                       2    2.1
  8
  9                     1    0.1
  10 or more

TOTAL                 978  100.0      70  100.0       96  100.0      106  100.0

Source: "A study of the impact of the project on the physical environment of Tondo." Table 3, p. 49. RAD. National Housing Authority. The Philippines, 1979.

Table 52: PRESENTATION OF THE INFORMATION REQUIRED TO CONDUCT A T-TEST ON THE DATA PRESENTED IN TABLE 51

                                                 Combined scores for
                              Tondo              3 control areas
                         1978      1979          1978      1979

X                      1.7045    1.7219        1.8897    1.8419
s                      1.0908    1.0874        1.2137    1.2593
N                         978       978           272       272

F Test on variances         1.0063                  1.0766

Source: Table 51
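The Tondo entries of Table 52 can be recovered from the frequency distribution in Table 51. The sketch below assumes the standard deviation is computed with N (not N - 1) in the denominator, since that choice reproduces the tabulated values; the function and variable names are illustrative only.

```python
import math


def mean_and_sd(freq):
    """Mean, standard deviation (divisor N) and N of a {value: count} table."""
    n = sum(freq.values())
    mean = sum(value * count for value, count in freq.items()) / n
    variance = sum(count * (value - mean) ** 2 for value, count in freq.items()) / n
    return mean, math.sqrt(variance), n


# Number of structures in Tondo by number of households (Table 51).
tondo_1978 = {1: 584, 2: 219, 3: 98, 4: 48, 5: 20, 6: 6, 7: 2, 9: 1}
tondo_1979 = {1: 560, 2: 251, 3: 92, 4: 45, 5: 20, 6: 6, 7: 3, 9: 1}

mean_78, sd_78, n_78 = mean_and_sd(tondo_1978)
mean_79, sd_79, n_79 = mean_and_sd(tondo_1979)

# Ratio of the two variances, larger over smaller, as used in the F Test row.
f_ratio = max(sd_78, sd_79) ** 2 / min(sd_78, sd_79) ** 2

print(round(mean_78, 4), round(sd_78, 4), n_78)
print(round(mean_79, 4), round(sd_79, 4), n_79)
print(round(f_ratio, 4))
```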

Using Equation 26 the standard deviation of the difference between the means of the two samples taken from the experimental group S(De) is calculated as follows:

$$S(D_e) = \sqrt{\frac{978(1.0908)^2 + 978(1.0874)^2}{978 + 978 - 2}} \sqrt{\frac{978 + 978}{(978)(978)}} = 0.0493$$

Similarly the standard deviation of the difference between means of the two samples from the control group is calculated as follows:

$$S(D_c) = \sqrt{\frac{272(1.2137)^2 + 272(1.2593)^2}{272 + 272 - 2}} \sqrt{\frac{272 + 272}{(272)(272)}} = 0.1793$$

These two standard deviations are now combined in the equation to obtain the standard deviation of the difference between the difference of means as follows:

$$S(D_{e-c}) = \sqrt{\frac{978(0.0493)^2 + 272(0.1793)^2}{978 + 272 - 2}} \sqrt{\frac{978 + 272}{(978)(272)}} = 0.0065$$
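The two-stage calculation can be sketched in Python as below. The helper name is illustrative. The first stage is shown for the experimental group only, and the second stage takes the reported value S(Dc) = 0.1793 as given rather than recomputing it.

```python
import math


def sd_of_difference(n1, s1, n2, s2):
    """Standard deviation of a difference between two means (pooled form).

    Follows the structure of the expressions above: the pooled variance
    (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2), scaled by (n1 + n2) / (n1 * n2).
    """
    pooled_variance = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance) * math.sqrt((n1 + n2) / (n1 * n2))


# Stage 1: experimental group (Tondo), 1978 versus 1979, values from Table 52.
s_de = sd_of_difference(978, 1.0908, 978, 1.0874)

# Stage 2: combine the experimental and control results.
# S(Dc) is the value reported in the text, used here as an input.
S_DC_REPORTED = 0.1793
s_de_c = sd_of_difference(978, round(s_de, 4), 272, S_DC_REPORTED)

print(round(s_de, 4), round(s_de_c, 4))
```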

The T-Statistic is then calculated in the usual way as follows:

$$T = \frac{(1.7219 - 1.7045) - (1.8419 - 1.8897)}{0.0065} = -4.6769$$

which with 1248 degrees of freedom is significant at the 0.0005 level. Our recommendation was that with mixed samples we should always require the 0.01 level of significance to reject the null hypothesis. The present T score far exceeds this level so we can safely conclude that the number of households per structure has increased significantly more in Tondo than in the control areas. However, as the difference between the two groups only represents 0.05 households per structure it is probably too small to have much operational importance.

CHAPTER 11 THE APPLICATION OF THE STATISTICAL TESTS WHEN THE RESEARCH DESIGN DOES NOT INCLUDE A PRE-TEST

1. Simple ex-post comparison

The ex-post comparison is one of the weakest designs and is normally only employed when for some reason it has not been possible to conduct interviews before the project has started. Although this design is theoretically very weak, there are many situations in which this is the only type of study which is possible. The design is as follows:

T(1)          T(2)

              E(2)2
              C(2)2

In contrast with the stronger longitudinal designs where comparisons

are made between T(1) and T(2), in this design a comparison is made between two

groups interviewed in T(2), one of which has been affected by the project and

the other has not. It is assumed that if a difference is found between the two

groups, this can provide some indication of project impact. The design is very

weak, although the following example will illustrate that there are situations

in which a slightly stronger case can be made.

As the comparison is between two independent samples at one point in

time, the application of statistical tests is very straightforward.

Example of an ex-post comparison: Evaluation of the impact of the Kampung Improvement Program in Jakarta

The Kampung Improvement Program (KIP) is designed to upgrade all of Jakarta's low-income kampung (neighborhoods) over a period of approximately 10 years. The plan, which began in about 1969, was to start with the poorest kampung and gradually to extend the program to include kampungs of an increasingly higher economic level. A major kampung survey was conducted in 1976-77.

This included a sample of 4937 families drawn from 145 kampungs classified into the following 3 groups:

Group 1: 58 kampungs scheduled to be improved (2249 families)

Group 2: 60 kampungs already improved (1720 families)

Group 3: 27 kampungs never to be improved (968 families). This

latter group consisted of kampungs beside railways, in

flood areas or in other areas not suitable for improvement.

A review of existing evidence seemed to indicate that the government had followed the stated policy of starting with the poorest kampungs and gradually extending the program to include richer kampungs. The argument was made that as the already improved kampungs were the poorest at the time the program began, they must have been poorer than the group of kampungs to be improved. If the study in 1976-77 indicated that the already improved kampungs now had equal or better conditions than the to be improved group then this would demonstrate that the improvement program had been successful. The design therefore consisted of a comparison between the already improved and the to be improved kampungs. For the purposes of the present discussion the never to be improved group can be ignored.

1.1 Comparison for nominal variables: Status of Housing [X2 test used in conjunction with PHI]

Table 53 presents information on the status of housing in improved and to be improved kampungs. There are five status groups: "owned," "rented," "leased," "free of rental" and "other." The purpose of the analysis is to determine whether the proportion of families in each status category is different in the two groups of kampungs. If a difference is found this may provide a first indication as to whether the program is producing more stable and secure land tenure.

As the two samples are independent the appropriate test to use is X². X² will test the null hypothesis that the two samples come from the same distribution. In the present case the null hypothesis can be stated as:

$$H_0: \mathrm{Dist}(E) = \mathrm{Dist}(C)$$

where:

Dist(E) = distribution of experimental sample

Dist(C) = distribution of control sample

The application of X² produces a value of X² = 59.47 with (5 - 1 = 4) degrees of freedom. The critical values for X² at the 0.05 and 0.01 levels are 9.488 and 13.277 respectively, which indicates the existence of a statistically significant difference between the two samples. However, as the value of X² is directly affected by the sample size, it is always advisable to apply PHI or Cramer's V to the result to adjust for the effects of sample size. 1/ As the present data are presented in a 2 x 5 table the appropriate test is PHI. The value of PHI is calculated as follows:

$$\mathrm{PHI} = \sqrt{\frac{X^2}{N}} = \sqrt{\frac{59.47}{3969}} = 0.1224$$

PHI is approximately equivalent to a correlation coefficient. Using a 2 tail test the critical value of r at the .01 level (with d.f. 500) is .1149. This means that the result is statistically significant but that the proportion of

1/ PHI is used for a 2 x 2 or 2 x n table as under these circumstances its values range between 0 and 1 and the score can be interpreted in a similar way to a correlation coefficient. When there are more than 2 rows and columns there is no fixed upper value so it is preferable to use Cramer's V which does have limits of 0 and 1.

Table 53: HOUSING STATUS IN IMPROVED AND TO BE IMPROVED KAMPUNGS IN JAKARTA 1976-77

Housing status               To be improved     Improved     Total

Owned                             1515             1085       2600
Rented                             194              273        467
Leased                             307              170        477
Free of rental                     149              113        262
Other                               84               79        163

Total number of kampungs          2249             1720       3969

Source: "Analysis and evaluation of impacts of KIP Implementation in Jakarta" Table 2.13(b) p. 41. P. T. Resources Jaya Teknik Management Indonesia. June 30, 1979.

The table presents the proportional distribution between the different categories. The present table was prepared by the author through combining these proportions with the total number of families interviewed in each type of kampung.

the variance explained is only 1.32 per cent 2/ which means that the difference between the two samples is of almost no practical importance.

The conclusion is that a very small statistical difference has been found between housing status in the two groups of kampungs, but the difference is so small as to be unimportant. The test results are summarized in Table 54.
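The X² and PHI values for Table 53 can be checked with a short sketch. It uses the ordinary Pearson statistic (observed against expected counts from the row and column totals) and PHI as the square root of X²/N; the function name is illustrative.

```python
import math


def chi_square(table):
    """Pearson X² for a table of counts, with degrees of freedom and N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, n


# Rows: owned, rented, leased, free of rental, other (Table 53).
# Columns: to be improved, improved.
housing_status = [
    [1515, 1085],
    [194, 273],
    [307, 170],
    [149, 113],
    [84, 79],
]

x2, dof, n = chi_square(housing_status)
phi = math.sqrt(x2 / n)
print(round(x2, 2), dof, n, round(phi, 4))
```

Because PHI divides by N before taking the square root, a large sample can produce a highly significant X² together with a small PHI, which is the point made in the text.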

1.2 Ordinal variable: Method of transport to work (Median Test)

Table 55 presents data on the method of transport used to go to work. The figures refer only to families who work outside the kampung in which they live. Methods of transport can be ordered from best to worst, with the highest score (1) assigned to people using a private car, (2) office car, (3) means of public transport other than a bus, which for the purpose of this example is assumed to mean a taxi, (4) bus, (5) becak, a bicycle taxi and (6) walking. The purpose of the test is to determine whether families in improved kampungs use "better" means of transport to get to work. It should be pointed out that this example is chosen purely to illustrate the operation of the test, and that a more thorough analysis would of course control for factors such as distance to work. As the modes of transport can be ordered from best to worst the table represents an ordinal scale. As the two samples are independent the appropriate statistical test is the Median Test (See Chapter 5, Section 2).

To conduct this test a cumulative frequency distribution must be produced for the combined distribution of improved and to be improved kampungs. This is shown in the last column of Table 55. The frequency of group 1 (private car) is 5.4%, the cumulative frequency of the first 2 categories is 16.1%, etc. The Median Score falls in category 4.

2/ Proportion of variance explained = r²

Table 54: TESTING THE HYPOTHESIS THAT THE KAMPUNG IMPROVEMENT PROGRAM AFFECTS LAND TENURE

Research hypothesis: Participation in the kampung improvement program first phase (improved kampungs) will affect the land tenure system of the kampung.

Null hypothesis: H0: Dist(I) = Dist(N)
   where: Dist(I) = distribution of tenure in improved kampungs
          Dist(N) = distribution of tenure in not improved kampungs

Level of measurement: Nominal variables

Sample: 2 independent random samples

Assumed sampling distribution: No assumptions required

Statistical tests: X² used in combination with PHI

Minimum rejection level: 0.05 with 2-tail test

Test statistics: X² = 59.47
                 PHI = 0.1224

Degrees of Freedom: 4 for X² and 3969 for r

Critical value of: X²: 13.277 at 0.01 level
                   r: 0.1149 at 0.01 level

Interpretation: There is a statistically significant difference between the two sampling distributions, but as only 1.26 percent of the variance is explained the difference is of no operational importance.

Table 55: METHOD OF TRANSPORT TO WORK (FOR FAMILIES WORKING OUTSIDE THEIR KAMPUNG) IN TO BE IMPROVED AND IMPROVED KAMPUNGS IN JAKARTA 1976-77

Rank                        To be                             Cumulative
order    Transport          improved     Improved    Total    percentage

  1      Private car           103           48        151        5.47
  2      Office car            187          108        295       16.17
  3      Taxi                  110          104        214       23.84
  4      Bus                   578          394        972       59.15
  5      Becak                  25           22         47       60.90
  6      Walk                  600          478       1078      100

Total                         1603         1154       2757

Source: "Analysis and evaluation of impacts of KIP Implementation in Jakarta." Table 2.6 p. 30. P.T. Resources Jaya Teknik Management, Indonesia. June 30, 1979.

The table was calculated by the author. It was assumed for the purpose of this example that the category "Other types of public transport" could be considered as taxis.
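Locating the median category and splitting each sample at it can be sketched as follows. The counts are those of Table 55 as printed, so the resulting cells may differ by a unit from a table built on slightly different totals; the function names are illustrative.

```python
def median_category(counts):
    """Return the 1-based rank of the category containing the median case."""
    total = sum(counts)
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        if running >= total / 2:
            return rank
    raise ValueError("counts must not be empty")


def split_at(counts, rank):
    """Cases on or above the median category, and cases below it."""
    return sum(counts[:rank]), sum(counts[rank:])


# Ranks 1 to 6: private car, office car, taxi, bus, becak, walk (Table 55).
to_be_improved = [103, 187, 110, 578, 25, 600]
improved = [48, 108, 104, 394, 22, 478]

combined = [a + b for a, b in zip(to_be_improved, improved)]
median_rank = median_category(combined)

print(median_rank)
print(split_at(to_be_improved, median_rank))
print(split_at(improved, median_rank))
```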

Using the policy recommended in Chapter 5 for situations where a considerable number of cases fall on the median, the table is dichotomized into one group of cases which fall above or on the median, and a second group which lie below the median. Table 56 presents the data in this dichotomized form.

Now that the data have been presented in this form it is possible to apply the X² test. As the table is 2 x 2 it is necessary to apply the Correction for Continuity (Equation 4) which is calculated as follows:

$$X^2 = \sum \frac{[(O - E) + 0.5]^2}{E}$$

The null hypothesis which is being tested is that there is no difference between improved and to be improved kampungs in terms of the proportion of families above or on the median. In the present example the corrected value of X² = 5.18 which with D.F. = 1 is significant at the 0.025 level (critical value = 5.025). 1/

This shows that there is a difference between the two groups of kampungs. However, inspection of the table will show that the proportion of improved families on or above the median (56.7%) is lower than for the to be improved group (61%). This shows that the difference is in the opposite direction to that which had been expected, and that the families in improved kampungs use inferior means for travelling to work.

1/ If we used the Correction for Continuity formula proposed by Siegel (Equation 6.4 page 107) the value of X² would be reduced to 4.9 which would lower the significance level to 0.05.

Table 56: REDUCTION OF TABLE 55 TO A DICHOTOMIZED FORM OF CASES LYING ON OR ABOVE THE MEDIAN CONTRASTED WITH CASES LYING BELOW THE MEDIAN

                         To be improved      Improved      Total

Above or on median          A    978          B    655      1633

Below median                C    625          D    500      1125

Total                           1603              1155      2758

Source: Table 55
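For a 2 x 2 table the X² statistic can be computed directly from the four cells. The sketch below uses the widely used shortcut formula N(|AD - BC| - N/2)² / ((A+B)(C+D)(A+C)(B+D)), which is assumed here to be the Siegel form of the continuity correction; with the cells of Table 56 it gives a value a little under 5.0. It does not attempt to reproduce Equation 4, which is defined in an earlier chapter.

```python
def chi_square_2x2(a, b, c, d, continuity=True):
    """X² for a 2 x 2 table with cells A, B (first row) and C, D (second row).

    With continuity=True, N/2 is subtracted from |AD - BC| before squaring
    (assumed here to correspond to the Siegel correction cited in the text).
    """
    n = a + b + c + d
    difference = abs(a * d - b * c)
    if continuity:
        difference = max(difference - n / 2, 0)
    return n * difference ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Cells of Table 56.
A, B, C, D = 978, 655, 625, 500

x2_corrected = chi_square_2x2(A, B, C, D)
x2_uncorrected = chi_square_2x2(A, B, C, D, continuity=False)

# Share of each group lying on or above the median, to read the direction.
share_to_be_improved = A / (A + C)
share_improved = B / (B + D)

print(round(x2_corrected, 2), round(x2_uncorrected, 2))
print(round(share_to_be_improved, 3), round(share_improved, 3))
```

The test itself is non-directional, so the two shares are printed alongside the statistic to show which group lies more often on or above the median.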

It should be noted that the hypothesis that improved kampungs use

superior methods of transport cannot be tested directly. The hypothesis

being tested is that there is a difference between the two distributions.

If the null hypothesis (that there is no difference) is rejected, we must

then determine by inspection which is the direction of the difference. The results are summarized in Table 57.

1.3 Interval variable: Per capita income (T-Test)

Table 58 shows per capita income in improved and to be improved

kampungs. The mean per capita income in improved kampungs (Rupias 7018)

is lower than for to be improved kampungs (Rupias 7395). The initial

research hypothesis was that per capita income would be higher in improved

kampungs. However, as this can be observed not to be true, the more

interesting research question is to determine whether per capita income

is in fact lower in improved kampungs. This is done by testing a one-tail null hypothesis.

H0: I(I) > I(N)

where:

I(I) = Average per capita income in improved kampungs.

I(N) = Average per capita income in not improved kampungs.

The first step is to determine whether the two samples have common or separate variances as this determines the way in which the T-Test will be calculated (Chapter 6 Section 2). The application of the F-Test to the comparison of the two variances produces:

F = 1.38 with D.F. 59,57

Table 57: TEST OF THE HYPOTHESIS THAT FAMILIES IN IMPROVED KAMPUNGS USED SUPERIOR MEANS OF TRANSPORT TO TRAVEL TO WORK

Research hypothesis: Families in improved kampungs use superior means to travel to work.

Null hypothesis: Dist(E) = Dist(C)

Level of measurement: Ordinal scale

Sample: Two independent random samples

Assumed sampling distribution: No assumptions are made

Statistical test: Median Test

Minimum rejection level: 0.05

Test statistic: X² (Corrected for Continuity) = 5.18

Degrees of freedom: 1

Critical value of X²: 5.025 at 0.025 level

Conclusion: The null hypothesis is rejected. However, inspection of Table 56 reveals that the proportion of families in improved kampungs who use superior means of transport is in fact lower. This means that the research hypothesis must be rejected.

Table 58: PER CAPITA INCOME IN IMPROVED AND UNIMPROVED KAMPUNGS. JAKARTA 1976-77

                                          Improved      Unimproved
                                          kampungs      kampungs

Per capita income (Rupias per month)        7018          7395
Standard deviation                          2084          2453
Sample size (no. of kampungs)                 60            58

Source: "Analysis and evaluation of impacts of KIP implementation in Jakarta." Table 2.1 Page 24. P. T. Resources Jaya Teknik Management Indonesia. June 30, 1979.

Calculations by the author. Note that the estimations were based on average data from 60 and 58 kampungs rather than from the data of each individual family.

This does not reach the critical level with 0.05 probability so the two variances can be considered equal and can be pooled. The standard deviation of the difference between means S(D) is calculated according to Equation 25 as follows:

$$S(D) = \sqrt{\frac{60(2084)^2 + 58(2453)^2}{60 + 58 - 2}} \sqrt{\frac{60 + 58}{60(58)}} = 422$$

The T-Statistic is then calculated according to Equation 16 as follows:

$$T = \frac{7395 - 7018}{422} = 0.891$$

With D.F. (60 + 58 - 2) = 116 the critical score at the 0.05 level is 1.65 so the obtained score is clearly not statistically significant. The conclusion is that there is no observed difference between per capita income in improved and unimproved kampungs. Thus even though the average per capita income in improved kampungs was found to be lower in the sample, the difference was not sufficiently large to be considered statistically significant.
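The whole sequence for Table 58 (variance ratio, pooled standard deviation of the difference, T score) can be sketched in a few lines. The variable names are illustrative, and the critical value 1.65 is simply the one quoted in the text.

```python
import math

# Summary figures from Table 58 (kampung-level averages).
mean_improved, sd_improved, n_improved = 7018, 2084, 60
mean_unimproved, sd_unimproved, n_unimproved = 7395, 2453, 58

# Variance ratio for the check on equal variances (larger variance on top).
f_ratio = (sd_unimproved / sd_improved) ** 2

# Pooled standard deviation of the difference between the two means.
pooled_variance = (
    n_improved * sd_improved ** 2 + n_unimproved * sd_unimproved ** 2
) / (n_improved + n_unimproved - 2)
s_d = math.sqrt(pooled_variance) * math.sqrt(
    (n_improved + n_unimproved) / (n_improved * n_unimproved)
)

t_score = (mean_unimproved - mean_improved) / s_d
degrees_of_freedom = n_improved + n_unimproved - 2

CRITICAL_T_ONE_TAIL_05 = 1.65  # value quoted in the text for 116 d.f.
significant = t_score > CRITICAL_T_ONE_TAIL_05

print(round(f_ratio, 2), round(s_d), round(t_score, 2), degrees_of_freedom, significant)
```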

The results are summarized in Table 59.

2. Ex-post comparison using different ex-ante study on same families as the reference point

The design presented in Section 1 of this chapter is extremely weak and should only be used as a last resort. The main problem is that we have no way of knowing if observed ex-post differences are due to project impact or simply to the fact that the two groups were different in T(1). A slight improvement can be achieved if some source of information can be found to compare the groups in T(1) even if this comes from a different study. A typical situation is where information is available on families from the

Table 59: TEST OF THE HYPOTHESIS THAT PER CAPITA INCOME IN IMPROVED KAMPUN(.S WILL BE HIGHER THAN IN UNIMPROVED KAMPUNGS

Research hypothesis: Per capita income in improved kampungs will be higher than in unimproved kampungs.

Null hypothesis: As per capita income in improved kampungs is observed to be lower than in unimproved kampungs, for the sample, the null hypothesis is that there is no significant difference:

$H_0: \bar{X}_e = \bar{X}_c$

Level of measurement: Interval scale variables

Sample: Two independent random samples

Assumed sampling distribution: Normal distribution with equal variances.

Result of test for homoscedasticity: The F-Test was applied to the two variances and a value of F = 1.38 was found. With D.F. = 59,57 this is not significant and the two samples can be assumed to have the same variance.

Statistical test: T-Test for independent samples

Minimum rejection level: 0.05 with one-tail test

Test statistic: T = 0.891

Degrees of freedom: 116

Critical value of T: 1.65 for 0.05 level

Conclusion: There is no significant difference between per capita income in improved and unimproved kampungs. The original research hypothesis (that per capita income in improved kampungs is higher) must be rejected, but the null hypothesis (that there is no difference) must be accepted.

project selection stage. This usually includes information on income, education and family size, all of which are likely to be related to the rate of change in the variables in which we are interested. Using these data our research design is as follows:

        T(1)        T(2)
        e1     X    E2
        c1          C2

where c and e refer to a different set of data available on the families studied in T(2).

The illustration comes from a study of the mutual help construction process in

the Salvadorean city of San Miguel. Groups 1 and 2 were compared at the completion of mutual help. It was hypothesized that Group 1 would have different attitudes and perceptions in T(2) because being the first group they would face more problems and would therefore have more opportunity for group discussions and group interaction. Information was available from the selection process which had taken place about 9 months earlier and this was used to determine whether the two groups were sufficiently similar in T(1) for the comparison in T(2) to be meaningful. Table 60 shows the results of this comparison in T(1). On the 8 variables compared it was found that the two groups only differed in their family income. This would suggest that the two groups are sufficiently similar for the comparison in T(2) to be meaningful.

Table 61 compares the two groups in T(2). The comparison was made on

8 attitude variables and differences between the two groups were only found on three of these variables (participation in planning meetings, types of problems identified during the work periods, and the most important lessons learnt from the group meetings). It is beyond the scope of the present discussion to evaluate the theoretical importance of these differences, but given the

Table 60: COMPARISON OF THE FIRST AND SECOND GROUPS ON A NUMBER OF SOCIOECONOMIC CHARACTERISTICS AT THE START OF THE PROJECT

Variable Test Score DF Probability

Age of head T -0.85 193 0.402

Years of education T 0.91 194 0.364

Income of head T 1.08 193 0.282

Family size T 0.99 194 0.323

Family income T 2.08 194 0.039 +

House Payment T -0.52 194 0.603

Monthly expenditure T 1.69 194 0.092

Labor for participation X 0.98 3 0.804

Source: Evaluation of the mutual help program in San Miguel. Unidad de Evaluacion. FSDVM 1979.

Table 61: COMPARISON OF THE ATTITUDES OF THE FIRST AND SECOND GROUPS IN T(2)

Variable Test Score DF Probability

Should group meetings continue after the construction is completed T -0.05 192 0.959

Did respondent participate in planning of project T 2.23 192 0.027 +

Objectives of mutual help X 0.85 2 0.65

Most important lessons learnt from mutual help X 3.62 5 0.604

Principal problems during 2 work periods X 6.02 2 0.049 +

Most important lessons learnt from group meetings X 31.59 8 0.0001 ++

Reasons for wishing to continue meetings X 5.69 3 0.12

Lessons learnt from group work X 7.32 11 0.77

Source: Evaluation of the mutual help program in San Miguel. Unidad de Evaluacion. FSDVM. 1979.

similarity of the two groups in T(1) one can conclude that there is some tentative evidence that the differences may have been produced by the different project experiences.

CHAPTER 12 THE PROBLEM OF NON-EQUIVALENT CONTROL GROUPS

Much of the theory of significance tests is based on the assumption that the experimental and control groups are very closely matched in T(1). In the laboratory experiment there is random assignment of subjects to the experimental and control conditions. Although it is acknowledged that such strict comparability is usually not possible in field research, it is assumed, at least implicitly, that there is a very close approximation. Unfortunately, as was pointed out in Part 1, in urban research it is very common to find that considerable differences exist between the experimental and control groups.

Table 62 presents a typical situation in which the experimental and control groups were compared in T(1) on a series of key indicators.

A statistically significant difference between the two groups was found on 4 of the 8 indicators (education of head, family size, household income and age of the household head).

These differences in the initial condition of the experimental and control groups make it very difficult to apply and interpret the usual statistical tests. Let us assume, for example, that we find a more rapid rate of housing consolidation in the experimental than in the control group (using the median test). If the two groups were identical in T(1) we would have some justification for inferring that the observed differences in T(2) were due to the effects of the project. If, however, the two groups were different in T(1), how are we to know whether the difference in T(2) is due to the effect of the project or to these initial differences?

Table 62: A COMPARISON OF THE CHARACTERISTICS OF THE EXPERIMENTAL AND CONTROL GROUPS IN T(1). SONSONATE. 1977.

Variable                            Participants   Control   Test   Score    Probability

Years of education                  4.52           3.05      T      3.96     0.0001
Weeks worked last month             3.9            3.78      T      1.39     0.168
Months in present job               112            126       T      -1.01    0.339
Family size                         5.67           4.82      T      2.89     0.004
Household income last month         385.6          301.2     T      3.52     0.001
Age of head of household            37.2           43.5      T      -3.7     0.0001
Sex of head of household (% male)   60.3           66.7      X²     0.725    0.39
Branch of economic activity                                  X²     4.04     0.67

Source: "Socio-economic survey of Sonsonate 1977." Fundacion Salvadorena de Desarrollo y Vivienda Minima. Table prepared by the author.

Figures 6 and 7 suggest one way to clarify the issue. Figure 6A shows the typical situation in an experimental design. With random assignment (or matching) the mean score on the dependent variable in T(1) is very close for the experimental and control groups, with the only difference being due to sampling error. In T(2) we find there is now a difference between E2 and C2. If this difference proves to be statistically significant we can infer that the experimental treatment has had an effect.

Figure 6B represents a typical quasi-experimental design in which E1 and C1 are different in T(1). The results of the study show that in T(2) the difference between E2 and C2 has increased even further. Can we infer that this is due to the effect of the experimental treatment, or can it be explained as a result of the initial differences between the two groups?

Figure 7 presents two alternative explanations of the Figure 6B situation. Figure 7A represents a situation in which E2 and C2 are found to be two points on the same regression line. This means that the E2 and C2 scores can be explained without attributing any effect to the experimental treatment. An example of this would be where the dependent variable was income and where age was one of the independent variables on which C1 and E1 differed. Suppose that income increases with age, and that the control group in T(1) was younger than the experimental group (for example, a mean age of 20 years compared with 25 years). In this case we might find that most of the variance in income can be explained by aging. Under these circumstances C2 is simply lower down the curve than E2.

Figure 7B presents the alternative situation where E1 and E2 are points on the BB regression line, whilst C1 and C2 are points on the CC regression line. In this case there is a real difference between the two, after the effect of intervening variables has been controlled for. In this situation we can infer that the project has probably had an effect. The objective of this chapter is to suggest ways in which we can control or adjust for the effect of the uncontrolled independent variables. Essentially we wish to determine whether there still exists a difference between the experimental and control groups which cannot be explained by these other variables, and which we can then argue is due to the effect of the project.

As is to be expected, the methods for adjusting for the intervening variables will depend upon the level of measurement of the dependent variable in which we are interested. This distinction is important as many texts only discuss this issue when the dependent variable is measured on an interval scale.

1. Adjusting for the effect of intervening variables when the dependent variable is dichotomized

In the present discussion we assume that when the dependent variable is measured on a nominal scale it will be reduced to a dichotomous form. The procedures for doing this are the same as those used in earlier sections.

An important first decision is whether in fact the dependent variable should be reduced to a dichotomous form. In many cases the variable is originally measured on an ordinal scale but is reduced to a dichotomous form as it is believed this will facilitate the analysis. One of the reasons for doing this is the widespread use of dummy variables where this procedure is followed. However, in the present case it is often preferable to maintain

the variable in its original ordinal form. The reason for this is that the reduction to binary form reduces the maximum possible value of the correlation coefficient. 1/ With small samples and large differences between p and q the maximum possible value of r (the correlation coefficient) may be as low as 0.25. 2/ This obviously makes it difficult to interpret the results. The problem can be partly overcome by using Tetrachoric Correlations but the procedures involved are much more complicated and should probably be avoided for this reason. 3/

It is strongly recommended that unless the researcher proposes

to use a more complex analytical procedure such as Probit or Logit, the

variables should be left wherever possible in ordinal form.

1.1 Controlling for the influence of intervening variables through the use of cross-breaks

Throughout this section we will continue to use the example of changes in housing quality. In the present case the sample has been dichotomized according to whether a low or high level of improvement of quality has been observed. It should be cautioned that this procedure runs counter to the recommendation above that where possible variables should be kept in ordinal form. The example is used, therefore, for illustrative purposes as it is helpful to be able to compare the results of analyzing the same data in several different ways.

1/ The reason is that r can only assume the value of +1 when the shape of the frequency distribution of each variable is equal or exactly the opposite. With dichotomous variables this can only occur when p = q = 0.5. The larger the difference between p and q the greater the difference between the distributions.

2/ See Cohen and Cohen (1975) p. 60 for an illustration.

3/ Cohen and Cohen (1975) p. 62 or McNemer (1975) for a more detailed discussion.

Figure 6: GRAPHICAL COMPARISON OF EXPERIMENTAL AND QUASI-EXPERIMENTAL DESIGNS

A. The experimental design where E(1) = C(1)

B. The quasi-experimental design where E(1) ≠ C(1)

[Each panel plots the value of the dependent variable for the E and C groups against time, from T(1) to T(2).]

Figure 7: ALTERNATIVE EXPLANATIONS OF DIFFERENCES BETWEEN CONTROL AND EXPERIMENTAL GROUPS IN T(2) WITH NON-EQUIVALENT CONTROL GROUPS

A. E2 and C2 are different points on the same regression line

B. E2 and C2 are points on different regression lines

Table 63 presents the comparison of the control and participant groups in the form of a 2 x 2 table. It can be seen that the proportion of high changes among participants is very much higher than for the control group (52.6% compared with 4.9%) and we find that the X² value of 53.4 is significant beyond the 0.0001 level (the probability is reported as 0.0000). As we mentioned in earlier sections it is advisable to use PHI rather than X² so as to avoid the danger of spuriously inflated significance levels. We find the PHI score is 0.476 (Table 64), which is also significant beyond the 0.0001 level.

The difficulty of interpreting the PHI coefficient is shown when we estimate the maximum possible value of r for the values in Table 63.

The maximum possible value would occur if all control group cases were in the low-change cell. The table would then become:

                 Change
                 Low        High
Participant      68 (A)     84 (B)
Control          83 (C)     0 (D)

r is calculated as follows:

$$r = \frac{BC - AD}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$$

With the above distribution the value of r = 0.5511.
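As a check on the arithmetic, the following Python sketch applies the same formula to the observed cells of Table 63 and to the hypothetical table in which no control family shows a high change. The function name is illustrative; the cell layout [[A, B], [C, D]] follows the lettering used above.

```python
import math

def phi_coefficient(a, b, c, d):
    """PHI (r) for a 2 x 2 table laid out as [[A, B], [C, D]]."""
    numerator = b * c - a * d
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Observed cells from Table 63: participants (72 low, 80 high), control (79 low, 4 high).
observed_phi = phi_coefficient(72, 80, 79, 4)

# Maximum attainable with the same margins: all control cases in the low-change cell.
maximum_phi = phi_coefficient(68, 84, 83, 0)

# For a 2 x 2 table without continuity correction, chi-square equals N * PHI squared.
chi_square = 235 * observed_phi ** 2

print(round(observed_phi, 3), round(maximum_phi, 3), round(chi_square, 1))
```

Expressing the observed PHI as a share of its attainable maximum (roughly 0.48 against 0.55) gives a fairer impression of the strength of the association than the raw coefficient.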

Table 63: COMPARISON OF DICHOTOMIZED CHANGE IN HOUSING QUALITY FOR PARTICIPANTS AND CONTROL GROUP

                  Low change        High change
                  N      p          N      q         Total

Participants      72     .473       80     .526      152
Control group     79     .951       4      .049      83

Total sample      151    .642       84     .358      235

Source: Socio-economic study of Sonsonate, 1977 and 1979. Table prepared by the author.

Table 64: X² AND PHI FOR 2 X 2 COMPARISON OF HOUSING QUALITY CHANGE AND PARTICIPANT STATUS CONTROLLING FOR INCOME AND FAMILY SIZE

Value of association between dichotomized housing quality change and participant status

Control variable   Level   X²     N     D.F.   Probability   PHI     Probability

Income             Low     35.2   136   1      0.0000        0.524   0.0000
Income             High    15.5   100   1      0.0001        0.417   0.0000

Family size        Low     29.0   132   1      0.0000        0.485   0.0000
Family size        High    19.7   103   1      0.0000        0.46    0.0000

Complete sample            53.4   235   1      0.0000        0.476   0.0000

Source: Socio-economic study of Sonsonate 1977 and 1979. Table prepared by the author.

To illustrate the process of controlling for the effect of

intervening variables we select income and family size as examples.

On the basis of a frequency distribution each of these was dichotomized

into LOW and HIGH groups. In the present case the division was made so

as to ensure an approximately equal number of cases in each group as follows:

Family size: Less than 4 = Low

4 or more = High

Income: Less than ¢ 450 = Low

¢ 450 or more = High

One can select any point for the division but the advantage of producing equal numbers in each is that the problem of not having sufficient cases in each cell for some of the cross-breaks is minimized. Table 64 gives the results of the analysis when the effects of family size and income are controlled for separately. When we control for income we find PHI = 0.524 when income is low, and PHI = 0.417 when income is high. Both of these values are significant beyond the 0.0001 level. This means that when we separate families according to their income level this does not affect the fact that the participant group contains a very much higher proportion of families who have experienced a high increase in housing quality. If the change in quality was not due to participant status as such but to the fact that the participant group contained a much higher proportion of families with high income, then when the group was divided according to income we would have expected to find a reduction in the significance level. When we control for family size we again find that the difference between the control and participant groups has not been affected.


In Table 65 we control simultaneously for income and family size so that this time the group is divided into 4 cells (low income and low family size, high income and low family size, etc.). Again we find that all of the PHI scores are highly significant, so we can conclude that the interaction between income and family size also does not alter the basic difference between participants and control groups.

If the sample is sufficiently large it would be possible to control for the interaction between 3 or more variables, but every additional variable which is added (in dichotomous form) will reduce on average the cell sizes by half and in some cases the reduction will be even greater. It can be seen that if we were examining a relatively large number of independent variables, this process would become extremely cumbersome both to conduct and to interpret.
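The halving of cell sizes can be illustrated with a short Python sketch. It assumes, purely for illustration, that each dichotomized control variable splits the sample evenly; real splits are usually less balanced, so some groups will be smaller still. The function name is hypothetical.

```python
def average_group_size(sample_size, n_controls):
    """Average number of cases per cross-break group when each of
    n_controls dichotomous variables splits the sample evenly."""
    return sample_size / 2 ** n_controls

# Sample of 235 cases, as in Table 63.
for k in range(5):
    print(k, "control variables:", round(average_group_size(235, k), 1), "cases per group")
```

With two control variables the average group has about 59 cases, which is close to the group sizes reported in Table 65; each group must then still be spread over the four cells of its own 2 x 2 table.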

1.2 Controlling for the effect of intervening variables through the use of partial correlation

Partial correlation is a technique which permits the calculation of a correlation coefficient between two variables whilst holding constant one or more intervening variables. It is a method of performing statistically the type of control which was done mechanically through the cross-breaks. The use of partial correlation has at least two advantages. Firstly, it is much easier to handle and interpret, and secondly, it is possible to analyze simultaneously the interaction between a larger number of independent variables with a given sample size than would be possible with cross-breaks. 1/

1/ This is possible because it is assumed that the relationship between the variables is linear. The value of each of the two variables in the correlation is then adjusted to take out the effect of the independent variable (see Nie et al p. 302).

Table 65: PHI SCORES FOR 2 X 2 COMPARISON OF CHANGE IN HOUSING QUALITY AND PARTICIPANT STATUS, CONTROLLING SIMULTANEOUSLY FOR DICHOTOMIZED INCOME AND FAMILY SIZE.

                        INCOME
FAMILY SIZE     Low                   High

Low             PHI = 0.429           PHI = 0.523
                N = 83                N = 47
                D.F. = 1              D.F. = 1
                Signif = 0.0000       Signif = 0.0001

High            PHI = 0.603           PHI = 0.345
                N = 51                N = 52
                D.F. = 1              D.F. = 1
                Signif = 0.0000       Signif = 0.0061

Source: Socio-economic study of Sonsonate 1977 and 1979. FSDVM. Table prepared by the author.

In statistical packages such as SPSS it is usually possible to control simultaneously for the influence of up to 5 independent variables. Table 66 illustrates the use of partial correlations with the previous example of changes in housing improvement. Four intervening variables are introduced: income, age, education and family size. When none of these variables is introduced the (zero order) correlation between change in housing quality and participant status is 0.4803, which is significant at the 0.001 level. In the next four rows of the table the intervening variables are controlled for one by one (first order partials). It can be seen that this has virtually no effect on the size of the correlation and the significance level remains unaffected. The intervening variables are then entered two at a time (second order partial correlations), three at a time (third order) and finally all four are introduced together (fourth order). It can be seen that the correlation between housing quality change and participant status remains virtually unaffected by the introduction of these variables. The conclusion of this analysis is that there appears to be a real difference between participants and the control group which remains unchanged by the introduction of intervening variables.

This analysis is both easier to conduct and interpret than the crossbreaks and has the considerable advantage that the sample size is not reduced with the introduction of additional variables.
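For readers who wish to see what a first order partial involves, the sketch below uses the standard formula for the correlation between x and y with a third variable z held constant. The zero order correlation of 0.4803 is taken from the text; the two correlations with the control variable (0.20 and 0.10) are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
import math

def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant (first order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# r_xy: housing quality change with participant status (from the text).
# r_xz, r_yz: HYPOTHETICAL correlations of each variable with the control variable.
partial = first_order_partial(0.4803, 0.20, 0.10)
print(round(partial, 4))
```

When the control variable is only weakly related to the other two, as in this illustration, the partial stays close to the zero order value, which is the pattern described for Table 66.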

2. Adjusting for the effect of intervening variables when the dependent variable is measured on an ordinal scale

Two approaches are possible with ordinal variables. The first is to use cross-breaks in a similar way to the previous section, the only difference being that in this case we can analyze each of the resulting contingency tables either with X² or using the Median Test. The second

Table 66: ZERO TO FOURTH ORDER PARTIAL CORRELATION COEFFICIENTS FOR IMPROVEMENT IN HOUSING QUALITY WITH PARTICIPANT STATUS CONTROLLING FOR INCOME, AGE, EDUCATION AND FAMILY SIZE

Control variables: Income, Age, Education, Family size (each X marks one variable controlled)

Order     Controls    Correlation   Significance

Zero                  0.4803        .001
First     X           0.4822        .001
First     X           0.4692        .001
First     X           0.4613        .001
First     X           0.4666        .001
Second    X X         0.4693        .001
Second    X X         0.4583        .001
Second    X X         0.4555        .001
Second    X X         0.4749        .001
Second    X X         0.4607        .001
Second    X X         0.4463        .001
Third     X X X       0.4641        .001
Third     X X X       0.4558        .001
Third     X X X       0.4431        .001
Third     X X X       0.4516        .001
Fourth    X X X X     0.4484        .001

Source: Socio-economic study of Sonsonate 1977 and 1979. FSDVM. Table prepared by the author.

approach is to make the "heroic assumption" that the ordinal scale variable can be treated as if it were measured on an interval scale, and then proceed to use the multiple regression techniques which will be presented in Section 3 of this chapter. Both procedures will be discussed using the previously discussed index of changes in housing quality.

2.1 The use of cross-breaks in conjunction with the median test

Table 67 shows the difference between the median housing quality change for participants and control groups when we separate the sample into large and small family groups, and into high and low incomes. The median test can then be applied in exactly the same way as given in Chapter 5.

Table 68 gives the frequency distributions for small families to illustrate the process of applying the Median Test. It can be seen from the combined cumulative frequency distribution that the median value for the combined distributions lies between the scores of 1 and 2. The table is then reduced to the following form:

                 Participants   Control

Below median     15             45
Above median     59             11

The value of X² is then estimated as:

$$X^2 = \frac{N\left(|AD - BC| - \frac{N}{2}\right)^2}{(A + B)(C + D)(A + C)(B + D)}$$

= 48.75

Table 67: MEDIAN QUALITY CHANGE IN PARTICIPANTS AND CONTROL GROUP WHEN CONTROLLING FOR FAMILY SIZE AND INCOME

                                TOTAL    FAMILY SIZE          INCOME
                                         Small     Large      Low      High

PARTICIPANTS   Median           2.23     2.142     2.37       2.35     2
               N                154      74        78         80       74

CONTROL        Median           -.47     -.65      -.39       -.56     -.187
               N                82       58        25         56       26

Difference between medians      2.70     2.79      2.76       2.91     2.187

Table 68: FREQUENCY DISTRIBUTION OF CHANGES IN HOUSING QUALITY FOR PARTICIPANTS AND CONTROL GROUP WHEN FAMILY SIZE IS SMALL

Change in quality   Frequencies                 Combined cumulative
                    Participants    Control     frequency

-3                  0               4           4
-2                  0               2           6
-1                  2               13          21
0                   4               19          44
+1                  9               7           60
------ Median ------
+2                  20              7           87
+3                  14              3           104
+4                  15              0           119
+5                  5               1           125
+6                  3               0           128
+7                  2               0           130

With D.F. = 1 this is statistically significant beyond the 0.0001 level.
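The steps of the Median Test for small families can be sketched in Python as follows. The frequencies are those printed in Table 68; the function names are illustrative, and the continuity-corrected formula is the one given above. The value obtained from the printed counts need not coincide exactly with the figure of 48.75 reported in the text, which may rest on slightly different counts.

```python
scores = [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]
participants = [0, 0, 2, 4, 9, 20, 14, 15, 5, 3, 2]
control = [4, 2, 13, 19, 7, 7, 3, 0, 1, 0, 0]

def median_split(values, freqs, cut):
    """Count cases at or below the cut point and cases above it."""
    below = sum(f for v, f in zip(values, freqs) if v <= cut)
    above = sum(f for v, f in zip(values, freqs) if v > cut)
    return below, above

def chi_square_2x2(a, b, c, d):
    """Chi-square with continuity correction for the table [[A, B], [C, D]]."""
    n = a + b + c + d
    diff = max(abs(a * d - b * c) - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# The combined median lies between the scores of 1 and 2, so the cut is at 1.
part_below, part_above = median_split(scores, participants, 1)
ctrl_below, ctrl_above = median_split(scores, control, 1)
x2 = chi_square_2x2(part_below, ctrl_below, part_above, ctrl_above)
print(part_below, ctrl_below, part_above, ctrl_above, round(x2, 2))
```

Whatever the exact figure, the result is far above 3.84, the critical value of X² at the 0.05 level with one degree of freedom.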

The test can be applied in the same way for each of the other cells given in Table 67 and in each case the differences are found to be highly significant. This shows that the difference between participants and control with respect to housing change is not affected by family size or income level.

If the sample is sufficiently large one can control simultaneously for 2 or more independent variables. Table 69 illustrates this process when we control separately for income and family size and then for the interactions between the two. Although there are some changes in the medians of each cell, in all cases the difference between participants and control is found to be highly significant.

2.2 Controlling for intervening variables through the use of multiple regression analysis

We will again use the example of changes in housing quality discussed in earlier sections. In this case the dependent variable will be

QUALITY3 = QUALITY2 - QUALITY1

where: QUALITY2 and QUALITY1 indicate housing quality in T(2) and T(1) respectively.

To be able to use regression analysis a key assumption is that the dependent variable is measured on an interval scale and that its distribution is approximately normal. This second assumption is particularly important in the present case as we wish to use the F test to determine whether the coefficients of the independent variables are significantly different from zero.

Table 69: RELATIONSHIP BETWEEN THE ORDINAL VARIABLE CHANGE IN HOUSING QUALITY AND PARTICIPANT STATUS, CONTROLLING FOR INCOME AND FAMILY SIZE

Participant   Family size   Income   Median   N

Entire population                    …        236

Yes           All           All      2.23     154
Yes           Small         All      2.142    74
Yes           Small         Low      2.5      42
Yes           Small         High     2.44     32
Yes           Large         All      2.37     78
Yes           Large         Low      2.72     37
Yes           Large         High     …        41

No            All           All      -.47     …
No            Small         All      …        …
No            Small         Low      …        …
No            Small         High     -.25     …
No            Large         All      -.39     25
No            Large         Low      …        14
No            Large         High     -.3      …

To test for the normality of the distribution we first construct a distribution of the observed and expected frequencies for a normal distribution. The observed frequencies are shown in column 1 of Table 70, where it can also be seen that the mean change in quality is +1.92. Secondly, we calculate the Z scores for each scale value (column 2). These are defined as:

$$Z = \frac{x - \bar{x}}{s}$$

where:

$x$ = observed score

$\bar{x}$ = mean

$s$ = standard deviation

Thirdly, we calculate the cumulative probabilities for each scale position (column 3). For negative Z values (i.e. below the mean), the cumulative probability can be read directly from the table of the normal curve. For positive Z values the cumulative probability is defined as:

Cum Prob = 1 - C

where: C = area under the normal curve (the tail area beyond Z).

The expected cumulative frequency is then calculated by multiplying the cumulative probability by the total frequency (238). The corresponding figures are given in column 4. The expected cumulative frequencies are then compared with the observed cumulative frequencies (column 5) by using X².

The value of X² is 66.59 which with 9 degrees of freedom has a probability of less than 0.0001 of occurring under the null hypothesis.
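The construction of columns 2 to 5 can be sketched in Python. The observed frequencies, mean and standard deviation are those reported in Table 70; the normal curve is evaluated with the error function instead of a printed table, so the results may differ from the tabulated figures in the last decimal places. Variable and function names are illustrative.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

scores = list(range(-3, 8))
observed = [5, 4, 20, 44, 34, 45, 33, 37, 9, 5, 2]
mean, sd = 1.92, 2.01
total = sum(observed)

rows = []
cumulative = 0
for x, freq in zip(scores, observed):
    z = (x - mean) / sd                      # column 2
    cum_prob = normal_cdf(z)                 # column 3
    expected_cum = cum_prob * total          # column 4
    cumulative += freq                       # column 5
    rows.append((x, round(z, 2), cum_prob, expected_cum, cumulative))

for row in rows:
    print(row[0], row[1], round(row[2], 4), round(row[3], 1), row[4])
```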

The conclusion is that there is a difference between the observed and expected sampling distributions and that our observed distribution cannot therefore be considered to be normal. However, inspection of Figure 8

Table 70: COMPARISON OF THE OBSERVED FREQUENCY DISTRIBUTION OF CHANGES IN HOUSING QUALITY WITH THE EXPECTED FREQUENCIES FOR A NORMAL DISTRIBUTION

            (1)          (2)        (3)            (4)            (5)
Score       Observed     Z score    Cumulative     Expected       Observed
category    frequency               probability    cumulative     cumulative
                                                   frequencies    frequencies

-3          5            -2.45      .00714         1.69           5
-2          4            -1.95      .02559         6.09           9
-1          20           -1.45      .07353         17.5           29
0           44           -0.96      .1685          40.1           73
+1          34           -0.46      .32276         76.8           107
+2          45           0.04       .5016          119.4          152
+3          33           0.54       .72054         171.5          185
+4          37           1.03       .8485          201.9          222
+5          9            1.53       .9370          223            231
+6          5            2.03       .9788          232.9          236
+7          2            2.53       .9946          237            238

Mean 1.92

s 2.01

Source: Socio-economic study of Sonsonate. 1977 and 1979. Calculations by the author.

Figure 8: COMPARISON OF FREQUENCY DISTRIBUTION OF CHANGES IN HOUSING QUALITY WITH THE EXPECTED FREQUENCIES UNDER THE NORMAL CURVE

shows that the distribution appears to follow closely the general shape of a normal distribution. If we accept the conclusions of studies by Norton, Boneau and others on the robustness of the F-Test and T-Test 1/ then it is probably justified to apply the tests to the present distribution given its reasonable approximation to the shape of the normal curve.

Assuming the above arguments are valid, it now becomes possible to

use multiple regression analysis in the same way as in section 3 of this

chapter. The purpose of the analysis is to determine whether a significant

difference between participants and control group exists when the effect of

intervening variables has been controlled for. Table 71 shows that the

coefficient for participant status (-2.37) is still highly significant on

the F-Test when we control for months in present job, sex of head of household, weeks worked by head last month, family size, education and household income. On the basis of this analysis we can be more confident that there is a real difference in quality change between participants and control group and that this is not a spurious relationship caused by initial socio-economic differences of the two groups.

3. Adjusting for the effect of intervening variables when the dependent variable is measured on an interval scale

To illustrate the approach we return to the study of changes in the income of the head of household in Sonsonate, El Salvador. In the actual data obtained from the study there was found to be no significant difference in the rate of change of income between participants and the control group. For the purposes of the present discussion we have multiplied

1/ Kerlinger, 1973, Chapter 16 and Cohen and Cohen, 1975, Chapter 2 Section 2.8 summarize the results of many of these studies.

Table 71: STEPWISE MULTIPLE REGRESSION WITH CHANGE IN HOUSING QUALITY AS THE DEPENDENT VARIABLE AND WITH PARTICIPANT STATUS AND VARIOUS SOCIO-ECONOMIC ATTRIBUTES OF THE SUBJECTS AS THE INDEPENDENT VARIABLES

Variable                   Multiple R   R Square   RSQ Change   Simple R   B        Beta     F      Probability of F

Participant status         0.592        0.351      0.351        -0.59      -2.37    -0.59    97.9   <0.0001
Months in job              0.598        0.358      0.007        0.07       0.178    0.09     2.37   >0.1
Sex                        0.602        0.363      0.005        0.12       0.303    0.07     1.67   >0.1
Weeks worked last month    0.605        0.366      0.003        -0.03      -0.26    -0.06    1.20   >0.1
Family size                0.607        0.369      0.003        0.05       -0.38    -0.04    0.42   >0.1
Education of head          0.608        0.369      0.000        0.13       0.21     0.03     0.28   >0.1
Household income           0.608        0.370      0.000        0.02       -0.16    -0.02    0.12   >0.1
Constant                                                                   5.52

Source: Socio-economic study of Sonsonate. 1977 and 1979. Table prepared by the author.

the income of participants in T(2) by a factor of 1.2 so as to ensure

a significant difference in the rates of change. None of the other

variables have been changed.

Table 72 shows that with this adjustment the mean change in the income of the household head is ¢ 199 for the participants compared with ¢ 116 for the control group. When the T-Test is applied this difference is found to be statistically significant at the 0.005 level. Under the traditional analytical approach one would infer that the difference was due to the effect of participation in the project. However, Table 62 showed that there were significant differences in the initial conditions of the two groups in T(1) with respect to education, family size, household income and age of the household head. To be able to state that the observed difference in the rates of change is due to participation in the project we must be able to control for these initial differences. This control can easily be achieved through the use of multiple regression analysis.
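Before turning to the regression, the unadjusted T-Test summarized in Table 72 can be retraced from its summary statistics. The sketch below assumes the separate variance estimate is the usual unequal-variance (Welch) form; the function name is illustrative. Because the tabulated means and standard deviations are rounded, small differences from the tabulated T value are to be expected.

```python
import math

def separate_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """T value and approximate degrees of freedom using separate variance estimates."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t_value = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_value, df

# Summary statistics from Table 72: participants, then control group.
t_value, df = separate_variance_t(199, 254, 156, 116, 189, 82)
f_ratio = 254 ** 2 / 189 ** 2   # ratio of the two variances
print(round(t_value, 2), round(df), round(f_ratio, 2))
```

The variance ratio of about 1.81 is what leads to the separate variance estimate being used in place of the pooled one.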

As a first step we will control for differences in initial income in T(1). This will be done using the procedures presented in Chapter 6 Section 3. Instead of using the change in income as the dependent variable, we will regress on the income in T(2) so that we can separate out the effects of the initial income of the head in T(1). Table 73 shows that when we use income in T(1) and Participant Status as independent variables, we can explain 34% of the variance in income in T(2). When we examine the table we find that 32.6% of the variance is explained by income in T(1) (R Square = 0.326) and that participant status only explains an additional 1.7%. This means that 95% of

Table 72: T-TEST FOR THE DIFFERENCE OF MEANS OF INCOME FOR PARTICIPANTS AND CONTROL GROUP. HYPOTHETICAL DATA BASED ON A STUDY IN SONSONATE

                                                                            Separate estimate of variance
                No. of            Standard                 2-tail           T        Degrees of   2-tail
                cases     Mean    deviation    F value     prob.            value    freedom      prob.

Participants    156       199     254
                                               1.81        0.004            2.86     209          0.005
Control         82        116     189

Source: Based on data from the Socio-economic study of Sonsonate. Income for participants in T(2) was multiplied by 1.2 to ensure a significant difference so as to be able to demonstrate the logic of the testing procedures.

Table 73: MULTIPLE REGRESSION ANALYSIS WITH INCOME IN T(2) AS THE DEPENDENT VARIABLE AND WITH INCOME IN T(1) AND PARTICIPANT STATUS AS INDEPENDENT VARIABLES. HYPOTHETICAL DATA BASED ON A STUDY IN SONSONATE

                      Multiple      R       RSQ     Simple                            Probability
                         R       Square   Change      R         B       Beta     F       of F

Income in T(1)         0.5709    0.326    0.326     0.57      0.916    0.546   88.9     0.01
Participant status     0.5888    0.343    0.017     0.23     84.159    0.133    5.32    0.05
Constant                                                    147.49

Source: Based on data from the Socio-economic study of Sonsonate. Income for participants in T(2) was multiplied by 1.2 so as to ensure a significant difference to be able to demonstrate the logic of the testing procedure.

the explained variance can be attributed to income in T(1). When we control

for the initial income, participant status still has a significant impact

on income in T(2) but accounts for only about 5% of the explained variance

and is only significant at the 0.05 level, as opposed to the 0.005 level when

the T-Test was used. This indicates clearly that much of the impact which we

had originally attributed to the project is in fact due to the initial income

of the head. Having controlled for the effect of initial income we will now

proceed to examine the effect of the other variables which showed differences

between the two groups in T(1). Table 74 shows the results of a multiple

regression analysis in which family size, education, weeks worked last month,

age of head, months in present job, sex of head and type of residence were also introduced into the equation. The proportion of variance explained now rises to 39.6% (R Square = 0.396). The analysis was conducted using stepwise regression, in which the variables are introduced into the equation in descending order of their contribution to the as yet unexplained proportion of the variance. In other words, the program first selects the variable which explains the greatest proportion of the variance (in this case income in T(1)), then selects the variable which explains the greatest proportion of the remaining variance, and so on until all variables have been introduced. 1/ If we take the 0.05 level as the minimum level for considering a variable significant, we find that the four variables which make a significant contribution to explaining the variance are the four where a significant difference was found between the two groups in T(1), namely:

1/ It is possible to request the program only to include variables which achieve a certain specified F value or RSQ change.

Table 74: STEPWISE MULTIPLE REGRESSION WITH INCOME IN T(2) AS THE DEPENDENT VARIABLE AND PARTICIPANT STATUS AND VARIOUS SOCIO-ECONOMIC ATTRIBUTES OF THE SUBJECTS AS THE INDEPENDENT VARIABLES

                           Multiple      R       RSQ     Simple                           Probability
                              R       Square   Change      R        B       Beta     F       of F

Income in T(1)              0.571     0.326    0.326     0.57      0.83     0.49   67.9      0.01
Family size                 0.589     0.348    0.021     0.30     21.9      0.14    5.9      0.05
Education of head           0.604     0.365    0.017     0.20     17.8      0.17    6.9      0.01
Weeks worked last month     0.611     0.373    0.008    -0.05    -60.6     -0.09    2.9      0.10
Participant status          0.617     0.381    0.008     0.23     68.2      0.10    3.31     0.10
Age of head                 0.624     0.390    0.009     0.02      3.58     0.13    4.1      0.05
Months in job               0.627     0.395    0.004    -0.03     -0.22    -0.06    1.4     >0.10
Sex of head                 0.628     0.396    0.001     0.07    -19.1     -0.03    0.2     >0.10
Type of residence           0.629     0.396    0.000    -0.00    -10.7     -0.01    0.09    >0.10
Constant                                                         137.6

Source: Based on data from the Socio-economic study of Sonsonate. Income for participants in T(2) was multiplied by 1.2 to ensure a significant difference so as to be able to demonstrate the logic of the testing procedure.

income in T(1), family size, education of head and age of head. The crucial point for our present discussion is that when we control for the effect of all these variables, we find that there is no longer a significant difference at the 0.05 level between the income of the two groups in T(2). This means that the apparently significant impact of the project on income can be completely explained by the differences in the initial conditions of the participant and control groups. On the basis of this result we can conclude that if a true experimental design had been used in which the two groups were matched on all these attributes in T(1), the analysis would have shown that the project had no effect on income, and the income of the two groups would not have differed in T(2). This is an example of Case A in Figure 7 where E and C2 can be considered as two points on the same regression line.
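The logic of this result can be illustrated with a small simulation. The sketch below is not a reanalysis of the Sonsonate data: the group sizes, means, slope and error variance are invented for illustration, the project is given no true effect on income, and a plain least-squares fit solved through the normal equations stands in for the stepwise program used in the text.

```python
import random

def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows, income_t2, status_flags = [], [], []
# Hypothetical groups: participants start richer; the project has NO true effect.
for status, mean_t1 in ((1, 250.0), (0, 150.0)):
    for _ in range(500):
        income_t1 = rng.gauss(mean_t1, 50.0)
        rows.append([1.0, income_t1, float(status)])  # constant, income T(1), status
        income_t2.append(50.0 + 0.9 * income_t1 + rng.gauss(0.0, 30.0))
        status_flags.append(status)

def group_mean(flag):
    """Mean T(2) income for participants (flag=1) or controls (flag=0)."""
    values = [v for v, s in zip(income_t2, status_flags) if s == flag]
    return sum(values) / len(values)

naive_gap = group_mean(1) - group_mean(0)
constant, b_income_t1, b_status = ols(rows, income_t2)

print(f"Unadjusted T(2) gap between groups: {naive_gap:.1f}")
print(f"Coefficient on income in T(1):      {b_income_t1:.2f}")
print(f"Adjusted participant-status effect: {b_status:.1f}")
```

Under these assumptions the unadjusted comparison shows a large gap in T(2) income, while the coefficient on participant status falls close to zero once income in T(1) enters the equation. This is the situation of Case A, in which both groups lie on the same regression line.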

The result of this analysis is extremely important as it suggests that with imperfectly matched experimental and control groups the results of traditional statistical significance tests are likely to be misleading.

The conclusion is that one should always attempt to control for the effects of these differences in the initial conditions of the two groups.
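As a worked summary, the share of the explained variance credited to participant status in each regression can be computed directly from the RSQ Change and R Square columns of Tables 73 and 74. The subscript labels are introduced here for illustration only.

```latex
\begin{align*}
\text{Table 73:}\quad
  \frac{\Delta R^2_{\text{status}}}{R^2_{\text{total}}} &= \frac{0.017}{0.343} \approx 0.05,
  &
  \frac{R^2_{\text{income } T(1)}}{R^2_{\text{total}}} &= \frac{0.326}{0.343} \approx 0.95 \\[4pt]
\text{Table 74:}\quad
  \frac{\Delta R^2_{\text{status}}}{R^2_{\text{total}}} &= \frac{0.008}{0.396} \approx 0.02
\end{align*}
```

Because stepwise regression credits each variable only with the variance left unexplained by those entered before it, these shares depend on the order of entry and should be read as incremental contributions rather than as fixed properties of the variables.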

REFERENCES

Bamberger, M. "Problems of SamR1e Attrition in the Evaluation of Social and Economic Change. A Case Study from El Salvador." Paper presented to the Evaluation Research Society Session on "Evaluation in Developing Countries" October 1978.

Blalock, H.M. 'tSocial Statistics" (Second edition) McGraw Hill 1972.

and Blalock, A (Eds) "Methodology in Social Research" McGraw Hill. 1968.

Campbell, D.T. and Stanley, J.C. "Experimental and Quasi-Experimental Designs for Research." Rand McNally. 1966.

Cohen J. and Cohen C. "Applied Multiple Regression/Correlation Analysis for the Behavioural Sciences." Lawrence Erlbaum Associates Publishers. New Jersey. 1975.

Cook, T.D. and Campbell, D.T. "Quasi-Experimentation. Design and Analysis Issues for Field Settings." Rand McNally 1979.

Hansen, M., Hurwitz, W. N. and Madow, W.G. "Sample Survey Methods and Theory." John Wiley 1953.

Hays, T.L. "Statistics for the Social Sciences." (Second Edition) Holt, Rinehart and Winston. 1973.

Johnston, J. "Econometric Methods" McGraw Hill 1963.

Kerlinger, F.N. "Foundations of Behavioural Research" (Second Edition) Holt, Rinehart and Winston 1973.

Kish, L. "" John Wiley 1965.

Kmietowicz, Z.W. and Yannoulis, Y. "Mathematical, Statistical and Financial Tables for the Social Sciences." Longman. 1976.

Lindauer, D. "Longitudinal Analysis and Project Turnover. Lessons from El Salvador." (mimeo) Urban and Regional Economics Division. World Bank. September 1979.

McNemar, Q. "Psychological Statistics" (Fifth Edition) Wiley 1975.

Nie, N.H. and others "Statistical Package for the Social Sciences" (Second Edition) McGraw Hill 1975.

Nunnally, J.C. "Psychometric Theor-y' McGraw Hill 1967. -254-

References (continued)

P. T. Resources "Analysis and Evaluation of KIP Implementation in Jakarta" P. T. Resources. Jaya Teknik Management. Indonesia. June 1979.

Reforma, M. "A Study of the Impact of the Project on the Physical Environment of Tondo." Research and Analysis Division. National Housing Autho- rity. The Philippines. 1979.

Sae-Hua, U. "A Preliminary Analysis of Panel Data on Income and Employment in El Salvador." (mimeo) Urban and Regional Economics Division. World Bank. 1979.

Siegel, S. "Non-Parametric statistics for the Behavioural Sciences." McGraw Hill. 1956.