Introduction to Biostatistics and Data Analysis

Total Page:16

File Type:pdf, Size:1020Kb

Introduction to Biostatistics and Data Analysis

UNIT Introduction to 3 Biostatistics and Data Analysis

In Unit 2, you will think about the nature of schools as settings for Health

Introduction

Unit 1 introduced a systematic approach to selecting indicators and monitoring information on health in your community. Unit 2 looked more closely at the research activities in epidemiology that are responsible for generating much of this evidence. Unit 3 will introduce the basic concepts, terms and methods for analysing epidemiologic information. This type of data analysis is generally called statistics or biostatistics. The latter is commonly used to indicate that the statistics are related to biologic or health conditions and not for example business and accounting statistics. For this module we will use the two terms interchangeably.

The statistics section of any Public Health course is often the most difficult for many of the students, as grasping the concepts and even understanding the notation and formulas requires a basic grounding in maths. However, it is important that any Public Health professional understand how to analyse basic data and understand important concepts like statistical testing (what it is and what it isn't) in order to interpret published research, as well as local data monitoring and programme evaluation reports. The statistics must be understood so that the Public Health practitioner can make informed decisions and correctly inform public policy, which is the topic of the last unit in this module. But first the statistics!

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 85 The contents of Sessions 1 and 2 are required for Assignment 1, so make sure you work through them thoroughly. Study Sessions 3 and 4 introduce you to using Epi Info for data analysis. Study Session 5 - 7 provide guidance for the initial processes your Epidemiological Report - Assignment 2.

You have two options: After Assignment 1 is handed in, start with Sessions 5 and 6, embark on your literature review, and then return to Study Sessions 3 and 4; or start working with Epi Info first in Study Sessions 3 and 4, and then start the literature review for the assignment. Remember, however, that a literature review is an essential process before you start to analyse the data, as is the process of identifying your study aim and objectives.

Study Sessions

Study Session 1: Basic Descriptive Statistics. Study Session 2: Basic Analytic Statistics. Study Session 3: Deciding on What Data Analysis to Use. Study Session 4: Using Epi Info for Data Analysis. Study Session 5: Assignment 2: Introduction and Literature Review. Study Session 6: Assignment 2: Study Methodology. Study Session 7: Assignment 2: Analyse Study Data.

Learning Outcomes of Unit 3

By the end of this unit you should be better able to:

. Understand basic statistical terms and concepts. . Conduct basic descriptive and analytic statistical analyses. . Use Epi Info to analyse a set of data. . Search for, locate and critically review literature relating to your epidemiological investigation. . Define and write a problem statement. . Identify aims and objectives for an epidemiological study. . Explore a selection of research methods related to your own discipline or area of interest.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 86 Unit 3 - Session 1 Basic Descriptive Statistics

Introduction

Once we have measured our variables of interest using the study design selected from Unit 2, we now have to describe the data so we can explain what we found to others. This session will introduce some basic statistical terms.

It will also discuss the different type of variables and scales and introduce the basic descriptive statistics that correspond to each type of variable.

Session Contents

1 Learning outcomes of this session 2 Readings 3 Clarifying statistical terms 4 Calculating measures of central tendency 5 Using measures of dispersion 6 Data reduction and frequency distributions 7 Practice using descriptive statistics 8 Session summary

1 LEARNING OUTCOMES OF THIS SESSION

By the end of this study session you should be better able to:

. Recognise different types of data. . Use simple statistical tools to summarise a data set. . Understand the use of descriptive statistics for different types of data.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 87 2 READINGS

Author/s Publication details Katzenellenbogen, (1997). Ch 11 - An Introduction to Data Presentation, Analysis, and J. M., Joubert, G., Interpretation. In Epidemiology: A Manual for South Africa. Cape Town: Abdool Karim, S. S. Oxford University Press: 101 - 114. Beaglehole, R., (1993). Ch 4 - Basic Statistics. In Basic Epidemiology. Geneva: WHO: 53 Bonita, R. & - 66. Kjellstrom, T. Friedman, G. (1980). Ch 2 - Basic Measurements in Epidemiology. In Primer of Epidemiology. New York: McGraw-Hill: 8 - 22

3 CLARIFYING STATISTICAL TERMS

This section invites you to familiarise yourself with some important statistical concepts which you will need to use in the process of analysing and interpreting data. Use these readings to define each of the concepts in your own words, and consolidate your understanding by doing the relevant task.

READINGS

Katzenellenbogen, J. M., Joubert, G., Abdool Karim, S. S. (1997). Ch 11 - An Introduction to Data Presentation, Analysis, and Interpretation. In Epidemiology: A Manual for South Africa. Cape Town: Oxford University Press: 101-103. See Reader, pages 193 - 195.

TASK 1 – DEFINE AND APPLY THE CONCEPT OF VARIABLES

Search the above readings for explanations of this term and then define it in your own words: Different types of variables/data.

FEEDBACK

As Katzenellenbogen et al put it, variables are “the characteristics which one measures, and about which data are collected …” (1997: 101). Variables can be divided into two broad categories: they are either quantitative or qualitative.

Quantitative variables are numerical, since the allocated numbers have a quantitative meaning. These quantitative variables can either be discrete or continuous.

Discrete quantitative variables can take on only whole numbers (integers) and

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 88 the basic unit of measurement cannot be subdivided, e.g. numbers of teeth in a mouth = 24, number of people at Olympics Opening Ceremony = 79 251.

Continuous quantitative variables can take an infinite numbers of possible values such as fractions and the basic unit of measurement can be subdivided, e.g. weight = 63.59 kg, distance from Cape Town to George = 359.69 km.

Qualitative or categorical variables are non-numerical, without any magnitude, e.g. gender (male/female), type of tooth (incisor/molar). However in reporting variables, a number can be used to describe a quality, e.g. gender. For example, 1 = male and 2 = female.

It is extremely important to understand the types of variables you are going to analyse, as the types of statistical analyses carried out will depend on the type of variable. Now check your understanding by doing Task 2.

TASK 2 – CONSOLIDATE YOUR UNDERSTANDING OF VARIABLES

Categorise each variable in the table by variable type, e.g. discrete quantitative variables.

Gender Type of House Age (yrs) Exam Mark (%) Income Group (R/month) M Brick 13 79.67 1000+ F Wood 11 79.59 500-999 F Mud 12 78.65 0-499

Variable Type Gender Type of House Age (yrs) Exam % Income Group

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 89 FEEDBACK

Variable Type Gender Qualitative or categorical Type of House Qualitative or categorical Age (yrs) Discrete qualitative or numeric Exam % Continuous qualitative or numeric Income Group Qualitative or categorical

. Gender is a qualitative (or categorical) variable. . Type of house is a qualitative (or categorical) variable. . The ages in the table are discrete, quantitative (numerical) variables. . The exam mark (%) is a quantitative (or numeric), continuous variable. . The income group is a qualitative (or categorical) variable. While actual monthly income would have been a continuous qualitative (or numeric) variable, because in this case it is income 'group' not actual income it is a categorical not a numeric variable. Numeric variables can be made categorical, but categorical variables cannot be name to be numeric, even though sometimes groups may be represented as numbers in the data set, they are still categories not numeric.

The method you choose for analysis is dependent on the variables being investigated.

READING

Beaglehole, R., Bonita, R. & Kjellstrom, T. (1993). Ch 4 - Basic Statistics. In Basic Epidemiology. Geneva: WHO: 53. See prescribed book.

TASK 3 – DEFINE AND APPLY THE CONCEPT OF SCALES

Search the above reading for explanations of the range of scales.

FEEDBACK

Types of Scales: A nominal scale has equivalency. Objects or observations are categorised or placed into classes, e.g. diabetic or non-diabetic. The characteristic of a nominal scale is that the attributes do not have a quantitative value. The assigned value is qualitative.

An ordinal scale is used to express the ranking of a characteristic in some empirical order, e.g. the use of an index for plaque measurement, ranking of

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 90 students in class or people by income level. The data may be discrete or continuous, and may be either qualitative or quantitative.

An interval scale includes quantitative characteristics that yield continuous units of measurements, and equal intervals between points on a scale, e.g. age or height. An interval scale does not have to begin with absolute zero, e.g. body temperature in F degrees.

A ratio scale involves measurements that indicate intervals on a continuous scale where one number is divided by another, e.g. percent of population that are children.

Using a similar table as above what type of scale is each variable below:

Gender Type of Age (yrs) Exam Mark (%) Income House Group (R/month) M Brick 13 79.67 1000+ F Wood 11 79.59 500-999 F Mud 12 78.65 0-499

Variable Type Gender Nominal Type of House Nominal or Ordinal Age (yrs) Interval Exam Mark (%) Ratio Income Group Ordinal

. Gender is nominal, because subjects are categorised as male or female, which do not have a quantitative value, and are not ordered in any way. . Type of House could be nominal if you don't consider one type of house to be better than any of the others, or ordinal if you would classify brick houses as better than wood, and wood better than mud. It would depend on the context in which you do your study and how you define the variable . Age is an interval variable, as it is quantitative data which is in continuous units of measurement, with equal intervals between points, i.e. the difference is the same from one year to the next. . Exam Mark (%) is a ratio measure, as percentages are a ratio of number correct per 100 questions. . Income group is ordinal, as groups have increasing Rand value per month, suggesting an order.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 91 4 CALCULATING MEASURES OF CENTRAL TENDENCY

Having introduced the basic vocabulary for types of variables and scales we use for data analysis, now we will consider the basic concepts of descriptive data analysis. Note that the type of data determines what type of statistics you can use. We will first consider descriptive statistics for numeric (quantitative) variables.

When examining or describing numeric data the first concept is the Measures of Central Tendency. These are summary indices, describing the central point of a set of measurements. These summary indices apply to quantitative variables and include means, medians, modes etc. Review the readings on these indices and complete Task 4.

READINGS

Beaglehole, R., Bonita, R. & Kjellstrom, T. (1993). Ch 4 - Basic Statistics. In Basic Epidemiology. Geneva: WHO: 55-56. See prescribed book.

Katzenellenbogen, J. M., Joubert, G., Abdool Karim, S. S. (1997). Ch 11 - An Introduction to Data Presentation, Analysis, and Interpretation. In Epidemiology: A Manual for South Africa. Cape Town: Oxford University Press: 108-111.

TASK 4 - CALCULATE MEASURES OF CENTRAL TENDENCY

Clarify the concepts Measures of central tendency, which include mean, median and mode; refer to the readings noted above. Use the index to find these concepts and then answer these questions in the tables below:

a) What type of variables are these, and b) Calculate the mean, median and mode for the age, distance to work and income variables in the table below.

Record No. Age (yrs) Distance to Work (km) Income (R) per month

1 23 13 1 500 2 31 19 500 3 55 15 3 500 4 43 35 3 500 5 55 76 6 000 6 19 44 700 7 17 23 400 8 44 14 1 500 9 43 6 9 000 10 37 55 700 Age (yrs) Distance to Work (km) Income (R) per month SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 92 Variable Type Mean Median Mode FEEDBACK The Table is completed as follows:

Age (yrs) Distance to Work (km) Income (R) per month Type/scale Numeric Numeric interval Numeric interval interval Mean 36.7 30 2730 Median 40 21 1500 Mode 43 & 55 None 700, 1500 & 3500 Type: All variables in this table are numeric interval data – quantitative, continuous with equal units.

Mean: This is calculated by summing the numbers in that column and then dividing by the number of measurements, for example the sum of the ten values of age is 367 and 367 divided by the numbers of observations, i.e. 10, equals 36.7.

Median: To calculate the median, you first order the data from lowest to highest and then select the middle value. In this example there are an even number of observations, so the average between the two middle values are the median. For example, when you order the ages 17 19 23 31 37 43 43 44 55 55, numbers 37 and 43 are the 5th and 6th values (the two middle values). The average of 37 and 43 is 40. This is the median age.

Mode: This is the most frequently occurring value. In this example, age has two variables appearing twice, i.e. the most frequently occurring; distance has no values that appear more than once; and income has three values that appear twice.

You will need to be able to use these concepts when analysing statistical information. Work with them, developing your own examples until you feel that they are part of your vocabulary.

Note that the most commonly used of these statistics is the mean, also called the average. This is the preferred measure unless there are extreme values in your data, for example the distance to work for the 55 km value was instead 250 km. This would increase the mean from 30 to 49.5. However, note the median would remain 21, as the middle values did not change. In a case where you have extreme measures that are very different from all or most of the other values, it is generally considered better to use the median, instead of the mean, because the SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 93 median is not changed by these unusual people/values.

5 USING MEASURES OF DISPERSION

In addition to measures of central tendency, when describing quantitative or numeric data we also report measures of dispersion that focus on the variation within a data set.

READINGS

Beaglehole, R., Bonita, R. & Kjellstrom, T. (1993). Ch 4 - Basic Statistics. In Basic Epidemiology. Geneva: WHO: 56-58. See prescribed book.

Katzenellenbogen, J. M., Joubert, G., Abdool Karim, S. S. (1997). Ch 11 – An Introduction to Data Presentation, Analysis, and Interpretation. In Epidemiology: A Manual for South Africa. Cape Town: Oxford University Press: 111.

TASK 5 – CLARIFY MEASURES OF DISPERSION

Once again, refer to these readings to define key measures of dispersion for yourself, including: range, percentile, inter-quartile range (IQR), standard deviation (S) and variance.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 94 FEEDBACK

While measures of central tendency tell us where the middle of the group of numeric measurements is, the measures of dispersion tell us how wide apart they are. For example two sets of data could have the same mean of 85, but one set could have a minimum value of 80 and a maximum of 90, while another set of data might have a minimum of 30 and a maximum of 250. Clearly these two sets of data are different. Your goal is to describe the data so that is why we report both a measure of central tendency and a measure of dispersion when describing numeric data.

The range is the simplest and weakest of the measures of dispersion, and can be reported as the minimum and maximum values and/or the difference between the largest and smallest measurements. It is also sensitive to extreme values.

Percentile is as if the sample was split into 100 parts. One example of a percentile is the median, which is the 50th percentile: it divides the data set into two equal halves - with half below and half above. One could calculate anything from the 1st to the 99th percentiles of a data set. For example the 25th percentile has 25% of the values below it and 75% of the values above it.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 95 The calculation of percentiles also differs (as with the median) for equal and odd numbers of observations. As with the median if there is an odd number of measurements then the middle value is the median, however if there are an even number of measurements then the average of the two middle measurements is the median. It is the same for calculating any other percentile. The following procedure can be used to calculate any percentile including the median:

Calculating a Percentile

1st calculate np/100 which equals number of observations or measurements (n) multiplied by the desired percentile (p) then divide by 100. If np/100 is not an integer or whole number, (and includes a decimal point, e.g. 2.5), then the desired percentile will be the (k+1)th largest sample point where k is the largest integer less than np/100 – for example if np/100 = 10 X 25/100 = 2.5 then k = 2 and K+1 = 3 so the 3rd number in the ordered series of 10 numbers would be the desired percentile (25th). If np/100 is an integer or whole number (for example 9) then the desired percentile is the average of the (np/100)th and (np/100+1)th largest observation - for example if np/100 = 12 X 75/100 = 9 then the 75th percentile equals the average of the 9th and 10th numbers in the ordered series of 12 numbers.

The Inter-quartile Range (IQR) is the difference between the 75th and 25th percentiles. It encloses the central 50% of the observations. Noting that all of these statistics are percentiles - the median (50th percentile), the 25th and 75th percentiles - it should be easy to recognise that if you are using the median as your measure of central tendency, that you should use the inter-quartile range as your measure of dispersion.

The standard deviation (S) is the mean of the differences between each measurement and the mean of that data set. This definition in words is very difficult to understand. Understanding a standard deviation is better done with a picture:

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 96 Mean = 30

-10 20 +15 45

-15 15

+10 40

______10 20 30 40 50

FIGURE 1: Differences Between Each Measurement and the Mean.

If you were to calculate the mean for the values 20, 45, 15 & 40 it would equal 30. The difference between each value and the mean (x-μ) is noted in italics, for example 20-30 = -10. The standard deviation is the mean or average of these differences. Recall how to calculate the mean: you sum the numbers and divide by the number of measurements. However, in this case if we added the positive and negative numbers the sum would equal 0. Therefore, to be able to calculate the average difference between the values and the mean we must get rid of the negative signs. One way to do this mathematically is to square the number, i.e. -10 X -10 = +100. So we can sum the numbers and divide by the number of observations, and finally convert it back to the original scale by taking the square- root of the number just calculated. The number before we take the square-root is called the variance.

The variance (S 2) is the square of the standard deviation. The formula for the standard deviation and variance can be found on Beaglehole et al (1993) on pages 56-7. This is a complex calculation. A simpler calculation often used is the 2 formula: S = square root of the sum () of each measurement squared (xi ) 2 minus the sum of all measurements squared (xi) divided by the number of observations (n) all divided by (n – 1), which is the square root of the following formula:

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 97 2 2 2 S =  (xi ) – ((  xi ) / n) (n – 1)

In the literature you will often find the expression: ± sd. This summarises the mean of the data set with the standard deviation. In a normal distribution, (where the mean and median are similar, and the measurements are equally spread about the mean and median), one would expect 68% of the observations to fall within the ± 1sd range. Also, in a normally distributed data set, 95% of the observations will lie within ± 1.96sd and 99% of the observations will lie within ± 2.58sd. (See Figure 4.4 in Beaglehole (1993), page 57).

In words and mathematical formulas the standard deviation and the variance appear quite complex, however, using the graphic example above hopefully you now have sense that it is actually a relatively simple concept - the average of the differences between the measurements and the mean. Like its counterpart the mean, the standard deviation is the preferred measure of dispersion. Because of its relationship to the normal curve (Figure 4.4 in Beaglehole (1993) page 57), it is a very powerful statistic and is used in calculating many of the other statistics we will review in future sections of this unit.

TASK 6 - DETERMINE THE DEGREE OF DISPERSION

Calculate the range, IQR, variance and standard deviation for the age, distance to work and income variable presented in the table in the previous section.

FEEDBACK

AGE (yrs) DISTANCE to Work (km) INCOME (R per Month RANGE 17-55 6-76 400-9000 IQR 23 20 2600 VARIANCE 192 498 8,095,670 STANDARD 14 22 2845 DEVIA TION

Range: This is the lowest and highest values in the series. These are easily obtained from the ordered (sequenced) list you did above to obtain the median.

IQR: is the 75th percentile minus the 25th percentile. If you use the ordered list for age that we used before, the value with 25% of the measures below is 23 (3rd value) and the value with 75% below is 44 (8th value). 44 minus 23 equals 23.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 98 Variance and Standard Deviation: are calculated using the formula above. It is often easier to create a table as follows to get the necessary values:

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 99 2 Observation AGE (xi) AGE Squared (xi ) 1 23 529 2 31 961 3 55 3025 4 43 1849 5 55 3025 6 19 361 7 17 289 8 44 1936 9 43 1849 10 37 1369 SUM () (367)2=134689  10 = 13468.9 15193

Therefore S2 = (15193 – 13468.9)/(10-1) = (1724.1) / (9) = 191.567 is the variance and the square root of 191.567 = 13.841 is the standard deviation.

As another example to work through see if you can also complete Task 7.

TASK 7 – CALCULATE MEASURES OF CENTRAL TENDENCY AND DEGREE OF DISPERSION

a) Categorise the types of variables. b) Calculate the mean, median, mode, range, IQR and standard deviation of the variables in this table.

Record No. Mother’s Age (yrs) Birth Weight of New Born (grams) 1 19 2523 2 21 2622 3 35 2948 4 28 2325 5 32 3600 6 16 1889 7 22 2126 8 35 3430 9 42 2125 10 31 3062

FEEDBACK

See above.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 100 6 DATA REDUCTION & FREQUENCY DISTRIBUTIONS

If however, your data is categorical (qualitative) you will need to use different statistics to describe the data from your study. Review the following readings and complete Task 8.

READINGS

Beaglehole, R., Bonita, R. & Kjellstrom, T. (1993). Ch 4 - Basic Statistics. In Basic Epidemiology. Geneva: WHO: 54-55. See prescribed text.

Katzenellenbogen, J. M., Joubert, G., Abdool Karim, S. S. (1997). Ch 11 - An Introduction to Data Presentation, Analysis, and Interpretation. In Epidemiology: A Manual for South Africa. Cape Town: Oxford University Press: 106-108.

Friedman, G. (1980). Ch 2 - Basic Measurements in Epidemiology. In Primer of Epidemiology. New York: McGraw-Hill: 15 - 17. (Note that in this reading the quantitative variable serum uric acid is converted from a continuous numeric variable to a categorical variable and then presented using statistics for a categorical variables).

TASK 8 - DEFINE DIFFERENT FORMS OF DATA REDUCTION

Define the terms: Data Reduction, Frequency Distributions, Relative Frequency, Cumulative Frequency and Class Intervals.

FEEDBACK

The concept of Data Reduction includes the calculation of Frequency Distributions, Relative Frequencies and Cumulative Frequencies. A Frequency Distribution is a set of ordered measurements and their corresponding frequencies. The frequency (f) is the number of times a measurement occurs in a data set. The Relative Frequency (Rf) Distribution is the proportion or percentage of the frequency of the measurement in the data set. This is calculated by dividing the frequency of the measurement by the total number of observations (f/N). If the data set is very large and become cumbersome to work with, one often has to group or condense the data set into class intervals. The formula used to calculate the number of class intervals is called the Sturges’ Rule (k=1+3.322[log N]), where k is the number of class intervals and N the total number of measurements. The formula used to calculate the width of a class interval is as follows: W=R/k, where W is the width of the class interval, R the highest to lowest measurements and k the number of class intervals. These formulas are used as a guideline for the number and width of class intervals. Statisticians suggest that the number of class intervals should range from 7 to 20. SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 101 The Cumulative Frequency (Cf) Distribution is the successive addition of the number of frequencies.

TASK 9 - REDUCTION OF FLUORIDATION DATA

The Fluoride Concentration of Municipal Water Supplies in South Africa is given as:

1.2, 0.03, 0.05, 0.7, 0.5, 0.03, 0.08, 0.7, 0.9, 1.2

Order the measurements, then count the frequency for each concentration and calculate the relative and cumulative frequency distribution. These all can be represented in a single table, also known as a 1-way frequency table.

Fl f Rf Cf 0.03 2 2/10 = 0.2 or 20% 2/10 = 0.2 or 20% 0.05 1 1/10 = 0.1 or 10% 2+1/10 = 0.3 or 30% 0.08 1 1/10 = 0.1 or 10% 2+1+1/10 = 0.4 or 40% 0.5 1 1/10 = 0.1 or 10% 2+1+1+1/10 = 0.5 or 50% 0.7 2 2/10 = 0.2 or 20% 2+1+1+1+2/10 = 0.7 or 70% 0.9 1 1/10 = 0.1 or 10% 2+1+1+1+2+1 = 0.8 or 80% 1.2 2 2/10 = 0.2 or 20% 2+1+1+1+2+1+2/10 = 1 or 100%

TASK 10 - WORK OUT THE FREQUENCY DISTRIBUTION

The following are course marks (%) for a class of postgraduate students in 1999.

65, 60, 73, 50, 65, 55, 70, 45, 65, 75, 50, 60, 70, 55, 75, 70, 55, 50, 65, 55, 65, 60, 73, 50, 65, 55, 70, 45, 65, 75, 50, 60, 70, 55, 75, 70, 55, 50, 65, 55

Draw a frequency distribution table of the marks using the same format as the table above i.e.

Observation F Rf Cf

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 102 7 PRACTICE USING DESCRIPTIVE STATISTICS

Now let’s examine a set of data and see if you can apply the principles and statistics learned in this study session to produce descriptive statistics on a set of study data. Review the reading and tasks in this session and complete Task 11.

TASK 11 – ANALYSE A SET OF DATA

The table below emerged from a study of the determinants of heart disease that was carried out.

a) Indicate the qualitative/quantitative nature of the variables. b) Calculate the mean, median, mode and standard deviation of the income variable. c) Draw a frequency distribution, Rf and Cf of the heart disease and level of stress variables. d) Graphically present the relationship between i) Income and Heart Disease and ii) Income and Stress.

Income per Month Heart Disease Marital Status Level of Stress 3000 N 2 2 4500 Y 1 2 1500 N 1 2 750 Y 1 1 12000 Y 3 1 450 Y 4 1 750 Y 4 2 900 N 1 2 1100 N 1 1 5000 N 4 2 350 N 1 1 3500 Y 3 2 900 N 2 2 5500 Y 3 2 3000 N 1 1 3500 Y 2 2 1200 N 4 2 800 Y 2 1 1200 N 2 2 1500 Y 4 1

FEEDBACK

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 103 See previous tasks and study session readings.

8 SESSION SUMMARY

This study session introduced you to some basic terms and concepts used in data analysis of epidemiologic data. In particular it focuses on descriptive statistics. The purpose of descriptive statistics is simply to describe the data results from your study. However, sometimes one of the objectives of the study is not just to describe a population or sample of people, but also to compare some people to others. This latter type of analysis requires more complex statistics. The next session will introduce you to the basic concepts and statistics used to answer analytic questions, (i.e. to compare groups of study subjects).

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 104 Unit 3 - Session 2 Basic Analytic Statistics

Introduction

The analysis of the examples until now has mainly involved descriptive statistics, i.e. different ways to summarise and describe the main features of the data. In most analytical studies, you will also need to determine whether certain determinants (or exposures) and health or disease outcomes, are related. One way to do this is to test their statistical association. However, even if this turns out to be statistically significant, this must not be confused with clinical significance (or lack thereof) of the relationship between the variables.

Session Contents

1 Learning outcomes of this session 2 Readings 3 Comparing means 4 Measures of effect for categorical variables 5 Testing for association between two variables 6 Practice using analytic statistics 7 Session summary

1 LEARNING OUTCOMES OF THIS SESSION

By the end of this study session you should be better able to:

. Understand how to compare sets of numeric (or quantitative data). . Use measures of effect and measures of association. . Understand the use of analytic statistics for different types of data.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 105 2 READINGS

Author/s Publication details Katzenellenbogen, (1997). Ch 11 – An Introduction to Data Presentation, Analysis, and J. M., Joubert, G., Interpretation. In Epidemiology: A Manual for South Africa. Cape Town: Abdool Karim, S. S. Oxford University Press: 111-123. Beaglehole, R., (1993). Ch 4 - Basic Statistics. In Basic Epidemiology. Geneva: WHO: Bonita, R. & 58-69. Kjellstrom, T. Unwin, N., Carr, S. (1998). Ch 3 - Measures of Risk. An Introductory Study Guide to Public & Leeson, J. Health and Epidemiology. Open University Press. Buckingham: 37-45.

Rothman, K. (2002). Ch 6 - Random Error and the Role of Statistics. In Epidemiology: An Introduction. New York: Oxford University Press: 113-117.

3 COMPARING MEANS

If the objective is to compare two groups and the data is numeric then as with descriptive statistics you have particular statistics that apply. In this case we once again use the mean of the numeric variable. For example if you wanted to compare the mean birthweight of babies born to smokers and non-smokers. Both the simplest and most common comparison of numeric data is the mean difference. The mean difference is a simple parameter:

Mean in Group 1 (exposed) minus Mean in Group 2 (non-exposed)

For example if the mean birthweight in the smokers is 2540 gms and the mean birthweight in the non-smokers is 3230 gms then the mean difference equals -690 gms, suggesting that smoking by the mother decreases the mean birthweight of babies. If the mean difference is Ǿ, then there is no difference in the two groups.

Another measure of effect (comparison) for numeric data is the correlation coefficient. Read the following and try Task 1. READING

Beaglehole, R., Bonita, R. & Kjellstrom, T. (1993). Ch 4 - Basic Statistics. In Basic Epidemiology. Geneva: WHO: 65. See prescribed text.

Katzenellenbogen, J. M., Joubert, G., Abdool Karim, S. S. (1997). Ch 11 – An Introduction to Data Presentation, Analysis, and Interpretation. In Epidemiology: A Manual for South Africa. Cape Town: Oxford University Press: 120.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 106 TASK 1 - ASSESS ANALYTIC STATISTICS FOR NUMERIC DATA

a) Discuss the difference between a correlation coefficient and the mean difference. b) Interpret the following data: Pearsons R = - 0.6 Mean Difference = 23 Pearsons R = 0.80 Mean Difference = -180 Mean Difference = Ǿ

FEEDBACK a) The mean difference subtracts the mean (or average) in one group compared to the mean in the other group. The correlation coefficient measures how one variable changes (e.g. blood pressure) with changes in another (e.g. age). In this case there are no groups, you would just have a set of blood pressures with the corresponding age of the person and you would see if blood pressure goes up or down as people in the sample are older. A positive correlation suggests blood pressure goes up with increasing age, while a negative correlation suggests that blood pressure goes down with increasing age. b) A Pearsons R of -0.6 is a negative number suggesting that as one variable goes up the other goes down.

A mean difference of 23 suggests that the mean in the first group is 23 points higher (e.g. systolic blood pressure) than the mean in group 2.

A Pearsons R of 0.80 suggests that as one variable goes up the other also goes up. A mean difference of -180 suggests that mean in group 1 is 180 points (e.g. weight in grams) lower than in group 2.

A mean difference of Ǿ suggests that the two groups are similar.

4 MEASURES OF EFFECT FOR CATEGORICAL VARIABLES

In the previous task we saw how to measure whether one variable had an “effect” on another variable for numeric data. We can also measure 'effect' in categorical data. Read the following and complete Task 2.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 107 READING

Katzenellenbogen, J. M., Joubert, G., Abdool Karim, S. S. (1997). Ch 11 – An Introduction to Data Presentation, Analysis, and Interpretation. In Epidemiology: A Manual for South Africa. Cape Town: Oxford University Press: 116-119.

Unwin, N., Carr, S. & Leeson, J. (1998). Ch 3 - Measures of Risk. An Introductory Study Guide to Public Health and Epidemiology. Open University Press. Buckingham: 37-45.

TASK 2 - DEFINE TERMS RELATED TO THE EFFECT OF ONE VARIABLE ON ANOTHER

Define the following terms: Risks, the 2-by-2 (2 X 2) table, Relative Risk, Attributable Risk and Odds Ratios.

FEEDBACK

Many studies set out to study a relationship/association between a disease, (e.g. tooth decay) and an exposure, (e.g. sugar). Often the results of these studies can be summarised into a 2 X 2 table as follows:

Diseased Not Diseased

Exposed A B A+B

Not Exposed C D C+D

A+C B+D

The effect of the exposure on disease can be calculated from the 2 X 2 table (also known as a 2-way frequency table). In a prevalence or incidence study, the people studied could be placed into any of the above four cells (A, B, C or D). The risk (or rate) of disease among the exposed is A/A+B and the risk of diseased among the non-exposed is C/C+D, i.e. the number of people who have disease in the exposed (A) divided by the total number of people who are exposed (A+B) and the number of people who have disease in the non-exposed (C ) divided by the total number of people who are not exposed (C+D).

One measure of the association between the exposure and disease is the ratio of the risk of disease in the exposed to the risk of disease in the non- exposed, called the Relative Risk or Risk Ratio (RR). This is calculated by A/A+B ÷ C/C+D. The relative risk is always a positive number and a RR of 1 means that there is no association between the exposure and the disease. A RR of more than 1 indicates the

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 108 exposure is a risk factor (more disease among the exposed) and a RR of less than 1 (a decimal/fraction) suggests that the exposure is protective (less disease among the exposed). If there is no difference in the exposed and non-exposed the RR will equal 1. Another measure of association, the Attributable Risk (AR), is one that is particularly important to Public Health, as it provides an estimate of the actual increase or decrease in the burden of disease. For this measure the rate in the non-exposed is subtracted from the rate in the exposed. This is simply calculated by A/A+B – C/C+D. The AR often provides an indication of the Public Health importance of a disease because the units of measure are the same as the actual rates of the disease, for example percent of children with diarrhoea. For example, two studies might find the following ARs - 20% (40%-20%) and 0.2% (0.4%-0.2%); notice that these two situations have the same RR (2.0) but a quite different AR. Clearly the 1st occurs much more frequently in the community. However, severity of the condition must also be taken into account: for example, if the 1st was just the common cold and the second was death rates in children, the second would still be a major concern despite its rare occurrence. If the AR is a positive number then it indicates that there is more disease in the exposed, while an AR that is a negative number indicates that the exposed have less disease, and an AR of zero indicates the rate of disease in the exposed and non-exposed is the same. In a case-control study, a different measure of association is used. In these studies, the case and controls are first selected (A+C and B+D) and then the exposure status is determined. Here the odds of exposure among the diseased (A/C) and the odds of exposure among the non- diseased (B/D) are determined. The Odds Ratio is calculated as follows: A/C ÷ B/D = AD/BC. It is useful to note that when a disease is rare, (i.e. occurs in the population in less than 10% of people), the odds ratio will be very close to the relative risk. As the odds of exposure in the diseased can sometimes be a difficult concept to explain, if the disease is rare, you can also interpret the odds ratio to be the increased risk of disease in the exposed. However, if the disease is not rare then the odds ratio will be much larger than the relative risk and this rule would not hold. Because the OR is also a ratio measure, the direction of effect is the same as in the RR, and no effect would be an OR =1.

5 TEST FOR ASSOCIATION BETWEEN TWO VARIABLES

When an effect is seen between one variable and another (e.g. exposure and disease), it is possible that the difference seen was just the result of random error when sampling the exposed and the non-exposed (or the diseased and non- diseased in a case-control study). Some of the most commonly used statistics SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 109 test for the likelihood that this type of error has occurred in a study. Review the following readings and then complete Task 3.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 110 READINGS

Beaglehole, R., Bonita, R. & Kjellstrom, T. (1993). Ch 4 - Basic Statistics. In Basic Epidemiology. Geneva: WHO: 58-69. See prescribed text.

Katzenellenbogen, J. M., Joubert, G., Abdool Karim, S. S. (1997). Ch 11 – An Introduction to Data Presentation, Analysis, and Interpretation In Epidemiology: A Manual for South Africa. Cape Town: Oxford University Press: 111-116.

Rothman, K. (2002). Ch 6 - Random Error and the Role of Statistics. In Epidemiology: An Introduction. New York: Oxford University Press: 113-117.

TASK 3 - DEFINE TERMS RELATED TO STATISTICAL TESTS FOR ASSOCIATION AND SIGNIFICANCE

Define the following terms: Null Hypothesis, Random error, Type I & Type II error, Significance Tests and P-Value, Chi-square and t-tests and confidence interval.

FEEDBACK

The strength of the association between disease and exposure can be assessed using significance tests. In 2 X 2 tables (qualitative data),you test the significance of the association by carrying out a Chi-Square test with an attached p-value (see later). For quantitative data, the mean scores between two groups can be tested for significance. This test is called the t-Test, again with an attached p-Value. For both the Chi-Square and t-Tests, commonly a p-value of less than 0.05 is considered a significant association between the disease and the exposure. A p- value greater than 0.05 shows a non-significant association between the two.

However, the above representation of the p-value is fairly simplistic. The underlying theory behind statistical testing is that you are 'testing' the null hypothesis, i.e. that there are no differences in the groups (RR or OR =1; Mean Difference or AR = 0). Due to the fact that we sample populations we can by chance have random error in our sample (see Katzenellenbogen Tables 11.3 & 11.4, pages 112 & 118). Because of random error we can make two kinds of mistakes: find a difference when the two groups are actually the same i.e. reject the null hypothesis when it is true (Type I error) or not find a difference when the two groups are not the same i.e. do not reject the null hypothesis when it is in fact false (Type II error). The p-value is actually the probability expressed as a decimal that we have made a Type I error. Note that in neither of the statements related to Type I and Type II error do we state that we "accept the null hypothesis" this is because we can never accept the null hypothesis - we either

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 111 reject it or do not reject it. This may seem like semantics, but it is an important distinction that relates to current theories in science, which are beyond the scope of this module. If you are interested in reading more about this consider the works by Karl Popper or obtain and read epidemiology textbooks by Kenneth Rothman such as the one in the reference list.

The corollary to the p-value is the confidence interval. The confidence interval indicates the range within which the investigator feels 95% 'confident' that the true value of whatever they are measuring (mean, RR, OR, rate in the population) falls between. For example the RR may be 2.1 and the 95% confidence interval around the RR is 1.5 to 3.6. Note that if the confidence interval does not include the null value (in this case 1 for the RR) then the p- value will be less than 0.05 so you can tell the statistical significance from the confidence interval as well as the p-value.

Now try calculating some of these statistics. Complete Task 4.

TASK 4 - TEST THE ASSOCIATION BETWEEN OCCUPATION AND LUNG DISEASE

Consider the 2 X 2 table below: a) Calculate the appropriate measure of association if this was i) a prospective cohort study and ii) case-control study. b) Interpret the measures of association calculated in a. c) Interpret the p-Value.

A 2 X 2 table representing the relationship between occupation and lung disease (p-Value < 0.001):

Lung Disease No Lung Disease Miner 55 16 Non-Miner 22 43

FEEDBACK a) (i) For a prospective cohort study you can calculate either a relative risk or an attributable risk as follows: RR = (A/A+B) divided by (C/C+D) = 55(71) divided by 22(65) = .775/.338 = 2.29 AR = A/(A+B) – C/(C+D) = 55/71 – 22/65 = .775 - .338 = .437 or 43.7% a) (ii) For a case-control study you would use the odds ratio as follows: OR = AD/BC = 55(43)/22(16) = 2365/352 = 6.72 b) (i) Miners are 2.29 times as likely to have lung disease then non-miners or miners have 43.7% more lung disease than non-miners. SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 112 b) (ii) The odds of being a miner is 6.72 times higher in those with lung disease, than in those who do not have lung disease. **Note that the occurrence of disease in this population is 77.5% and 33.8%, not rare, and that the odds ratio is much larger than the relative risk. c) The p-value is <0.001 which is very small and suggests that the differences between the two groups (miners and non-miners) is statistically significant or not likely due to random error.

6 PRACTICE USING ANALYTIC STATISTICS

Try another example - complete Task 5

TASK 5 - TEST THE ASSOCIATION BETWEEN HEART DISEASE AND SELECTED VARIABLES

Use the data set on income, heart disease, marital status and stress in Unit 3 Session 1 Task 11 to answer the following questions. Note that Stress = 1 is a higher stress level than Stress = 2 and that Marital Status = 1 is Married and Marital Status = 2 is Single.

a) Create a 2 X 2 table of the relationship between levels of stress and heart disease. b) Select only married and single people and create a 2 X 2 table for marital status and stress compared to people without heart disease. c) Calculate the odds ratios for these relationships. d) Interpret your findings in (c) above. e) What does a p-Value measure?

FEEDBACK

a) 2 X 2 TABLE OF STRESS & HEART DISEASE HEART DISEASE NO HEART DISEASE STRESS = 1 5 3 STRESS = 2 5 7

b) 2 X 2 TABLE OF MARITAL STATUS & HEART DISEASE HEART DISEASE NO HEART DISEASE SINGLE (MAR STAT=2) 2 3 MARRIED (MAR STAT=1) 2 5

c) Relationship of Stress and Heart Disease OR = 5(7)/5(3) = 35/15 = 2.33 Relationship of Marital Status and Heart Disease OR = 2(5)/2(3) = 10/6 = 1.67

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 113 d) Higher stress and being single are more likely (2.33 times and 1.67 times, respectively) in people with heart disease. e) A p-value measures the probability or likelihood that an association seen between an exposure and a disease is due to chance or random error. It indicates the strength of the association with smaller p-values indicating a stronger relationship between the two variables. A p-value of less than 0.05 is often said to be statistically significant, which means that statistically the association is strong or it is a “significant” association.

7 SESSION SUMMARY

In this session you learned how to use analytic statistics to compare two groups. You also were introduced to many new statistical terms and the concept of statistical significance testing. The calculations for these latter tests are quite complex and difficult to do using hand-calculations. Also if a study has a large sample size, even the simplest calculations may be difficult. Therefore, generally a computer is used to assist with data analysis and complex calculations. Study Session 4 in this unit introduces a commonly used computer programme for analysing epidemiologic data – Epi Info. In the next session, you will distinguish what type of data analysis you would use for different types of study data and study questions.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 114 SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 115 Unit 3 - Session 3 Deciding What Data Analysis

to Use

Introduction

In this session, you are guided through selecting an appropriate type of data analysis, and introduced to the different uses of descriptive and analytic statistics.

Session Contents

1 Learning outcomes of this session 2 Readings 3 Choosing your type of analysis 4 Map a set of data 5 Descriptive statistics – uses and misuses 6 Using analytic statistics to answer “why” questions. 7 Session summary

1 LEARNING OUTCOMES OF THIS SESSION

By the end of this study session you should be better able to:

. Understand which statistical analysis you should use for different types of data and study questions. . Compare the different uses of descriptive and analytic statistics.

2 READINGS

Author/s Publication details Refer to the readings provided in Study Sessions 1 and 2 of Unit 3.

3 CHOOSING YOUR TYPE OF ANALYSIS

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 116 The type of analysis you use in your study will depend primarily on two basic questions:

Question 1 – What type of data do you have?

Numeric data: you would use the mean, median, range, standard deviation, etc.

Categorical data: frequency distributions.

Question 2 – What type of study question(s) do you have?

What, Where, Who, When: Descriptive statistics = measures of central tendency and dispersion for numeric data or simple (1-way) frequency tables for categorical data.

Why: Analytic statistics = comparison of two means (Mean Difference) for numeric data or cross-tabulation of frequencies (2 X 2 Table) and calculation of associated RR, AR or for categorical data WITH associated p-values for each comparison or cross-tabulation.

Descriptive statistics are used when we just want to describe what is happening in a population, for example in an annual report of health data or a community survey of health behaviours and diseases.

Analytic statistics are used when we want to try and understand why people are getting certain diseases – this can include understanding the causes of disease or just looking for risk factors or associations with the disease or health outcome of interest. Review Unit 2, Study Session 1 for definitions of these terms.

4 MAP A SET OF DATA

Let’s look at a practical example from an assignment used in this module in a previous year. Review the Example assignment and study questions below and then try Tasks 1 and 2.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 117 DESCRIPTION OF EXAMPLE ASSIGNMENT

You are in charge of maternal and newborn health for your district. Your district is a small rural district in a primarily farming area in South Africa. You have one district hospital which delivers approximately 700-800 infants per year. This represents 50-75% of births in your district, with the remainder delivering at home in the villages with traditional birth attendants. The population at the hospital is fairly homogeneous of the same racial and ethnic group. The hospital has four doctors (two general practice, one paediatrician and one obstetrician) and four advanced midwives to cover the maternity ward. You have all the WHO recommended services for essential obstetric care, such as caesarean delivery, blood for transfusion, etc. Your neonatal services are limited to essential neonatal resuscitation at birth, one incubator, mask oxygen, intravenous therapy and gavage tube feeding. Transfer of newborns and mothers for advanced care on an emergency basis is generally not available as the closest tertiary centre is over 4 hours away and there are limited ambulances and transport in the district.

Monthly data from your hospital suggests a high perinatal death rate. You know that birthweight and gestational age at birth are associated with perinatal mortality, so you undertake a study to investigate the outcomes of births at your hospital. You are fortunate to have a UWC MPH student working at your hospital so you ask them to collect the data. Data was collected on all births from mid-March through October. They entered the data into an EpiInfo version 6 (Epi6) data set. However, your student assistant has not yet completed their epidemiology module so you will have to assist with the data analysis and prepare the report and make recommendations to the District Management Team and Hospital Administration.

YOUR STUDY QUESTIONS WERE:

1) What is the birthweight and gestational age distribution of births at your hospital. 2) What is the rate of perinatal death, low birth weight, preterm birth and small for gestational age at your hospital. 3) Why are babies dying, that is what are the likely contributors to, or causes of, perinatal death at your hospital.

TASK 1 – TYPES OF STUDY DATA

Answer the first of the two questions in the introduction above to help you decide what type of data analysis to use:

Question 1 – Complete Table A below to help you answer this question as follows: a) Indicate the types of data for each of the variables in your study questions in the 2nd column of the table below b) Now indicate in the 3rd column of the table above what type of data analysis would be appropriate for this type of data.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 118 TABLE A – TYPES OF STUDY DATA Study Variable Type of Data (numeric OR Type of Data Analysis categorical) (measure s of central tendency & dispersio n OR frequenc y analysis) Birthweight (weight in grams at birth) Gestational Age (completed weeks of pregnancy at birth) Perinatal Death (baby died right before or after birth) Low Birth Weight (baby born with birthweight <2500gms) Preterm Birth (baby born at <37 weeks of pregnancy) Small for Gestational Age (baby born small for gestational age according to standard growth scale)

FEEDBACK TABLE A – TYPES OF STUDY DATA

Review the previous sections and tasks in the earlier study sessions in this Unit to assist you. The Feedback Table is over the page.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 119 Study Variable Type of Data (numeric OR Type of Data Analysis categorical) (measure s of central tendency & dispersio n or frequenc y analysis) Birthweight (weight in grams at Numeric Measures of central birth) tendency & dispersio n (mean, median, mode, range, IQR, Standard Deviation ) Gestational Age (completed Numeric Measures of central weeks of tendency pregnancy at & birth) dispersio n (mean, median, mode, range, IQR, Standard Deviation ) Perinatal Death (baby died right Categorical – yes-died or no- Frequency table analysis before or after lived (frequenc birth) y, relative frequenc y (%), cumulativ e frequenc y (%)) Low Birth Weight (baby born Categorical – yes-BW<2500g Frequency table analysis with or no- (frequenc birthweight BW>=2500g y, relative <2500gms) frequenc y (%), cumulativ e frequenc y (%)) Preterm Birth (baby born at <37 Categorical – yes-born before Frequency table analysis weeks of 37 weeks or (frequenc pregnancy) no-born 37+ y, relative SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 120 weeks frequenc y (%), cumulativ e frequenc y (%)) Small for Gestational Age (baby Categorical – yes-born small for Frequency table analysis born small for age or no-not (frequenc gestational born small for y, relative age according age frequenc to standard y (%), growth scale) cumulativ e frequenc y (%))

TASK 2 – IDENTIFY THE STUDY QUESTIONS

Now answer the second question to help you decide what kind of analysis to use: Question 2 - What type of study question(s) do you have?

Determine the type of study questions you have in your project. Complete Table B below for each study question as to type of study question, type of analysis and actual expected data output to answer each question.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 121 NS

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 122 Study Question Type of Study Type of Analysis Expected Data Output – ( expected D data or data e table(s) s c r i p t i v e

o r A n a l y t i c ) 1a) What is the birth weig ht distri buti on of birth s at your hos pital. 1b) What is the gest atio nal age distri buti on of birth s at your hos pital. 2a) What is the rate of peri nata l SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 123 deat h at your hos pital 2b) What is the rate of low birth weig ht at your hos pital 2c) What is the rate of pret erm birth at your hos pital 2d) What is the rate of smal l for gest atio nal age at your hos pital 3) Why are babies dyin g, that is what are the likel y cont ribut ors to, or caus es of, peri nata l deat h at your SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 124 hos pital.

FEEDBACK TABLE B – TYPES OF STUDY QUESTIONS

First notice that to complete this exercise we had to split questions 1 and 2 into subcomponents one for each variable in the question. The Feedback Table is over the page.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 125 Study Question Type of Type of Expected Data Output A n a l y s i s 1a) What is the WHAT DESCRIPTIVE The mean, median, mode, birth range, IQR wei and Standard ght Deviation of distr birthweight in ibuti your babies on of birth s at your hos pital . 1b) What is the WHAT DESCRIPTIVE The mean, median, mode, gest range, IQR atio and Standard nal Deviation of age gestational distr age in your ibuti babies on of birth s at your hos pital . 2a) What is the rate of WHAT DESCRIPTIVE A 1-way frequency table of peri perinatal death nata (yes or no) l deat h at your hos pital 2b) What is the rate of WHAT DESCRIPTIVE A 1-way frequency table of low low birth weight birth (yes or no) wei ght at your hos SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 126 pital 2c) What is the rate of WHAT DESCRIPTIVE A 1-way frequency table of pret preterm birth erm (yes or no) birth at your hos pital 2d) What is the rate of WHAT DESCRIPTIVE A 1-way frequency table of small sma for gestational ll for age (yes or gest no) atio nal age at your hos pital 3) Why are babies WHY ANALYTIC A series of 2-way frequency dyin tables (2 X 2 g, tables) that comparing is rates of wha perinatal death t are for potential the causes or risk likel factors for y death, PLUS cont the risk ratios ribut (RR) and p- ors values to, associated or with these cau tables. ses of, peri nata l deat h at your hos pital .

Note that this type of data mapping is very useful. You can use similar tables to guide you when you plan your analysis for your module assignment at the end of this Unit (Study Session 5).

Also note the last column “Expected Data Output” are the kinds of statements you would use in the “Data Analysis” section of a study protocol or study report.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 127 5 DESCRIPTIVE STATISTICS – USES AND MISUSES

For study questions 1 and 2, the answers found in Table B are pretty straightforward, i.e. descriptive statistics for descriptive questions – the correct use of descriptive analysis. We will not review how to produce simple descriptive statistics again here, as that was covered in study session 1 of this Unit.

However, for study question 3 you may still not have a clear picture in your mind as to what these tables and data would look like. A common mistake is to use descriptive statistics and just look at the rates of risk factors in the deaths to see what occurs more frequently in the deaths. This unfortunately will not adequately answer a “WHY” question. You must also look at the rates of risk factors in the babies who lived and compare this to the deaths in order to tell what are the possible causes of deaths. Let’s try an example from this same assignment – try Task 3.

TASK 3 – EXAMINE & INTERPRET DESCRIPTIVE STATISTICS

Below are the 1-way frequency tables of some of the potential risk factors for perinatal death from our data as produced by EpiInfo 2002 (you will learn how to generate these kinds of data tables in the next study session).

The data is restricted to just the 22 perinatal deaths (Select: BABYDEATH=”+”). Examine the data and describe which factors occur more frequently (have high rates) in the babies who died.

Examining these data which of these risk factors appear to be associated with Perinatal Death at your hospital?

EpiInfo 2002

Current View: C:\Epi_Info\ASSIGN.rec: Select: BABYDEATH = "+" Record Count: 22 (Deleted records excluded) Date:2002/12/18 02:48:18

FREQ INFANTSSEX PRETERM LBW PRENATALC DIABETIC

Infants Sex Preterm LBW PrenatalCare SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 128 Abruptio Diabetic

Infants Sex

Infants Sex Frequency Percent Cum Percent f 14 63.6% 63.6% m 8 36.4% 100.0% Total 22 100.0% 100.0%

95% Conf Limits f 40.7% 82.8% m 17.2% 59.3%

Preterm

Preterm Frequency Percent Cum Percent Yes 9 47.4% 47.4% No 10 52.6% 100.0% Total 19 100.0% 100.0%

95% Conf Limits Yes 24.4% 71.1% No 28.9% 75.6%

LBW

LBW Frequency Percent Cum Percent Yes 12 60.0% 60.0% No 8 40.0% 100.0% Total 20 100.0% 100.0%

95% Conf Limits Yes 36.1% 80.9% No 19.1% 63.9%

Prenatal Care

Prenatal Care Frequency Percent Cum Percent

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 129 Yes 15 68.2% 68.2% No 7 31.8% 100.0% Total 22 100.0% 100.0%

95% Conf Limits Yes 45.1% 86.1% No 13.9% 54.9%

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 130 Diabetic

DIABETIC Frequency Percent Cum Percent 1=Yes 2 9.1% 9.1% 2=No 20 90.9% 100.0% Total 22 100.0% 100.0%

95% Conf Limits 1=Yes 1.1% 29.2% 2=No 70.8% 98.9%

MEANS WEIGHTGMS

Obs Total Mean Variance Std Dev 22 55958.0000 2543.5455 848967.1169 921.3941 Minimum 25% Median 75% Maximum Mode 1023.0000 1814.0000 2456.5000 3155.0000 4347.0000 1023.0000

FEEDBACK

We see from this data that in perinatal deaths, 63.6% are female, 47.4% are preterm, 60.0% are low birth weight, 68.2% had prenatal care, 9.1% had mothers with diabetes, and the average birth weight was 2543g with a standard deviation of 921g. The factors that occur more frequently in perinatal deaths from this data analysis include female, preterm and low birth weight, (although the average birth weight in the deaths is not below 2 500g), and prenatal care. Diabetes only occurred in less than 10% of babies who died.

While this seems logical, it is actually a misuse of descriptive statistics as they are not meant to answer “Why” questions like Study Question #3 in this example. Work through the next section and Task 4 to see why you can get incorrect answers if you try to use descriptive statistics for ‘Why’ questions.

5 USING ANALYTIC STATISTICS TO ANSWER ‘WHY’ QUESTIONS

Now let’s look at the data correctly using 2 X 2 tables, risk ratios and p-values. Once again the data is presented as an EpiInfo 2002 printout (some of

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 131 the extra statistics have been deleted from the printout to take it easier for you to read).

TASK 4 – EXAMINE ANALYTIC STATISTICS

Interpret the Epi Info printout of data analysis below and determine which of these potential risk factors appear to be associated with perinatal death in your hospital.

Note that this time we are using all 471 subjects, i.e. those with and without perinatal death (BABYDEATH) and comparing them to each other.

EpiInfo 2002

Current View: C:\Epi_Info\ASSIGN.rec: Record Count: 471 (Deleted records excluded) Date:2002/12/20 02:11:26

TABLES INFANTS SEX BABYDEATH

Single Table Analysis Point 95% Confidence Interval Estimate Lower Upper

PARAMETERS: Risk-based Risk Ratio (RR) 2.0430 0.8738 4.7769 (T) STATISTICAL TESTS Chi-square 1-tailed p 2-tailed p Chi square - uncorrected 2.8452 0.0916481906

TABLES PRETERM BABYDEATH

Single Table Analysis Point 95% Confidence Interval Estimate Lower Upper PARAMETERS: Risk-based SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 132 Risk Ratio (RR) 13.0800 5.7547 29.7296 (T) STATISTICAL TESTS Chi-square 1-tailed p 2-tailed p Chi square - uncorrected 55.0926 0.0000000000

TABLES LBW BABYDEATH BABYDEATH LBW 1=Yes 2=No TOTAL Yes 12 18 30 Row % 40.0 60.0 100.0 Col % 60.0 4.1 6.5 No 8 425 433 Row % 1.8 98.2 100.0 Col % 40.0 95.9 93.5 TOTAL 20 443 463 Row % 4.3 95.7 100.0 Col % 100.0 100.0 100.0

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 133 Single Table Analysis Point 95% Confidence Interval Estimate Lower Upper PARAMETERS: Risk-based Risk Ratio (RR) 21.6500 9.5879 48.8866 (T) STATISTICAL TESTS Chi-square 1-tailed p 2-tailed p Chi square - uncorrected 98.8100 0.0000000000

TABLES PRENATALC BABYDEATH

Single Table Analysis Point 95% Confidence Interval Estimate Lower Upper PARAMETERS: Risk-based Risk Ratio (RR) 0.8699 0.3627 2.0864 (T) STATISTICAL TESTS Chi-square 1-tailed p 2-tailed p Chi square - uncorrected 0.0974 0.7550247957

TABLES DIABETIC BABYDEATH

Single Table Analysis Point 95% Confidence Interval Estimate Lower Upper PARAMETERS: Risk-based Risk Ratio (RR) 6.6286 1.9042 23.0740 (T) STATISTICAL TESTS Chi-square 1-tailed p 2-tailed p Chi square - uncorrected 9.1157 0.0025354661

MEANS WEIGHTGMS BABYDEATH

Descriptive Statistics for Each Value of Crosstab Variable Obs Total Mean Variance Std Dev 1=Yes 22 55958.0000 2543.5455 848967.1169 921.3941 2=No 445 1475635.0000 3316.0337 210014.2669 458.2731 SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 134 Minimum 25% Median 75% Maximum Mode 1=Yes 1023.0000 1814.0000 2456.5000 3155.0000 4347.0000 1023.0000 2=No 2186.0000 3030.0000 3320.0000 3600.0000 4859.0000 3382.0000

T Statistic =7.2368 P-value =0.0000

FEEDBACK

The key data to look at are the Risk Ratios or Mean Differences and the p- values. First lets put this data into a single table that would be more suitable for presentation in a report.

RM ANALYTIC ANALYSIS VARIABLE RISK RATIO (RR) P-VALUE Infant Sex 2.04 0.09 (Female=expose d) Preterm (Yes=exposed) 13.08 0.00 Low Birth Weight 21.65 0.00 (Yes=exposed) Prenatal Care 0.87 0.75 (Yes=exposed) Diabetic Mother 6.63 0.002 (Yes=exposed)

VARIABLE MEAN DIFFERENCE P-VALUE Mean Birth Weight 2544-3316 = -772 0.00

From this table we see that Preterm, Low Birth Weight and Diabetic Mother all have large RR and p-values less than 0.05, so they appear to be associated with baby deaths, while infant sex and prenatal care are not associated with perinatal deaths (p-values >0.05). In addition, babies who died had birthweights which were on average almost ¾ of a kilo less than babies who lived, and this is a significant difference (p- value <0.05). If we were to use the 1st data analysis we would have mistakenly associated prenatal care and infant sex with death and missed the fact that diabetes in the mother is actually a problem. So it is important to recognise that if you have a “WHY” question in your study that you must compare exposed and not exposed or diseased and not diseased groups to adequately answer this question. Descriptive statistics for a “WHY” question can lead to errors in interpretation.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 135 6 SESSION SUMMARY

This session worked through an example of data analysis to help you determine what type of data analysis you will need for a study. First you answered the two basic questions necessary to guide your data analysis:

Question 1 - The type of data you have? Question 2 – What type of study question(s) do you have?

You should always answer these two questions before you start any data analysis and map the data as you did in Tasks 1 and 2.

Then using an example you learned the correct analysis to answer a “Why” questions, i.e. analytic statistics.

In the next session you will learn how to use a common statistical package to make data analysis easier.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 136 Unit 3 - Session 4 Using Epi Info for Data

Analysis

Introduction

The first three study sessions of this unit reviewed basic data analysis used in epidemiology. While it is important to understand how these measures are derived and to be able to calculate them simply and by hand, when the sample or the amount of data collected is large most study analyses will use a computer to analyse study data. This study session will introduce one of the most common computer applications for epidemiologic analysis, Epi Info. Epi Info is a public domain programme which means that it is free for anyone who wants to use it.

The programme is supported by the WHO and U.S. Centers for Disease Control. Copies can be obtained free from the CDC Epi Info Web Site: http://www.cdc.gov/epiinfo/epiinfo.htm.

There are many statistical analysis programmes available to the Public Health researcher. Other examples include: SPSS, SAS, and STATA. Basic programmes like EXCEL and ACCESS can also be used, especially for graphs and charts, and are commonly available in South Africa. The principles are the same in all programmes, so that once a basic understanding of analysis is obtained, any programme available should be of assistance to the researcher.

Session Contents

1 Learning outcomes of this session 2 Readings 3 Starting Epi Info 4 Reading a data set and creating a line listing 5 Creating categorical variables in Epi Info 6 Calculating measures of central tendency & dispersion using Epi Info 7 Creating cross-tabulations and test for association between two variables using Epi Info 8 Using Epi Info to create bar graphs and pie charts 9 Saving your analysis 10 Session summary

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 137 1 LEARNING OUTCOMES OF THIS SESSION

By the end of this study session you should be better able to:

. Conduct simple statistical analyses using Epi Info. . Manipulate study data using simple Epi Info programming. . Create a simple bar and pie chart and line graph using Epi Info.

2 READINGS

Author/s Publication details Hall, A./ Centers for (2004) Oswego: An Outbreak Investigation - Tutorial. Epi Info V3.5.1: Disease Control Objectives, Introduction & Table of Contents

Vaughan, J. P. & (1989) Chapter 10 - Data Processing and Analysis. In Manual of Morrow, R. H. Epidemiology for District Health Management. Geneva: WHO: 99-110.

3 STARTING EPI INFO

This section will help you get started using Epi Info by assisting you to install the programme, open the programme and review the main menu.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 138 TASK 1 – INSTALL & OPEN EPI INFO 2002

a) If you happen to be loading the data set now, look back at Section 3.5.4 in the Module Introduction. If you have not done so already complete the Preliminary Session in this Module - Installing Epi Info. If you have already completed this session as directed or you already have Epi Info on your computer proceed to "b".

b) An Epi Info 2002 Icon should now appear on your computer screen (Windows Desktop). Double click your cursor on the icon to start Epi Info 2002. The Epi Info 2002 main menu should appear on your screen. To execute or use any of the programmes in Epi Info 2002 you start by selecting the desired task by double clicking on the appropriate grey/blue box or by selecting from the “tool bar” at the top of the page. This is what the EPI INFO 2002 Main Menu page/window looks like (note this is the Epi Info 2000 main page, the 2002 main page is similar just with fewer blue (or grey) boxes).

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 139 FEEDBACK

After completing these tasks you should now be able to open Epi Info and negotiate the main menu of the programme. You also should have a copy of the database necessary to complete Tasks B through F below copied on to your hard drive.

For this study session, the majority of the readings and all of the tasks will be computer based. You will first read and work through instructions using an Epi Info computer based tutorial (the Oswego Foodborne Outbreak Investigation) and then you will be asked to complete similar data analysis tasks using the Hookworm data set.

IT IS IMPORTANT TO NOTE HERE THAT TASKS 3 & 4 - THE OSWEGO TUTORIAL IS PART OF YOUR MODULE GUIDE - YOU NEED TO WORK THROUGH IT AS DIRECTED, AS IF IT WERE ANY OTHER SESSION TASK. THE TASKS THAT FOLLOW DO NOT INCLUDE OSWEGO PROVIDE YOU WITH ADDITIONAL INFORMATION/LEARNING, THEY DO NOT REPLACE OSWEGO, AND IN FACT THEY ASSUME YOU HAVE COMPLETED THE RELEVANT OSWEGO TASKS AS DIRECTED.

4 READING A DATA SET & CREATING A LINE LISTING USING EPI INFO

This exercise will introduce you to the data set we are going to use, the analysis section of Epi Info, and assist you to produce a listing of the data in the hookworm data set.

READING

Vaughan, J. P. & Morrow, R. H. (1989) Chapter 10 - Data Processing and Analysis. In Manual of Epidemiology for District Health Management. Geneva: WHO: 99-110. See Reader, pages 509-522.

TASK 2 – REVIEW VAUGHAN & MORROW CHAPTER 10.

Read Chapter 10 from Vaughan & Morrow (1989) and work through the examples from the Hookworm data by hand.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 140 FEEDBACK

The data analysis used in the Vaughan & Morrow (1989) Chapter 10 should be familiar to you based on your learning in Sessions 1-3 of this Unit. The answers to the exercises are clearly outlined in Chapter 10.

For many of the exercises in this Session we will be using the hookworm data found in Vaughan & Morrow Chapter 10 as our analysis data set. This data has been previously entered in to an Epi Info data set named Hookworm, which you copied into your hard drive in the Preliminary Session. If you wish to learn about creating questionnaires and entering data in Epi Info then refer to the computer tasks on those topics in the Oswego Foodborne Outbreak Tutorial you will use below. The basic format for the remainder of this session will be for you to work through tasks in the Oswego Tutorial and then try them on your own using the Hookwork data set.

READING

Hall, A./ Centers for Disease Control (2004) Oswego: An Outbreak Investigation - Tutorial. Epi Info 2002 V3.3: Objectives, Introduction & Table of Contents.

Examine copies of the Objectives, Introduction and Table of Contents for the Oswego Tutorial in the reading above. These are the entry screens for this tutorial and should help you to navigate through the tutorial. Now start with Task 3.

TASK 3 – BEGIN OSWEGO FOODBORNE OUTBREAK INVESTIGATION TUTORIAL IN EPI INFO 2002

Highlight/click on the “Tool Bar” menu item “Help” on the menu at the top of the Epi Info 2002 main menu and highlight the item “Tutorials” which then gives you a box to the right, now select/click on the item called “Oswego Foodborne Outbreak”. Another window will now appear towards the bottom of your screen. (Note to see this entire window you possibly will need to click on the black bar on the top and “drag” the window to the top and centre of your screen until you can see the “Scroll Bar” on the right side of the window.) Click on the down arrow on the side of this window (scroll bar) to move down the page. Double click on the blue text which says “[Click Here} to Begin”. Another window will appear. Also use right scroll bar to move to the bottom of this window. At the bottom of this page there are two choices:

“[Click here] to begin the first session, or [click here] to see the complete Table of Contents”

Choose the second “[click here] to see the Table of Contents” shown here. SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 141 “Scroll Bar Down Arrow”

Now click on “Session 1: Introduction” to read the introduction to this tutorial.

FEEDBACK

For this study session we will be working through selected tasks in the Oswego Tutorial to learn how to analyse data using Epi Info. We will not be using the entire Oswego Tutorial, which covers how to conduct an infectious disease outbreak investigation, and also how to develop a questionnaire and enter data using Epi Info. If you would like to also complete these sections, or if at some later date you need to create a questionnaire and enter data using Epi Info then you can come back to the tutorial and complete those sections any time.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 142 TASK 3 – PRODUCING A LINE LISTING

Now click on the item “Computer Tasks” in the yellow box on the left side of the “Introduction” page. Select and Read AND follow the instructions to work through the Oswego Tutorial Computer Task 3 titled “Creating a Line Listing”. You can also find this task on the Table of Contents for the Oswego Tutorial.

Once you have completed the line listing task in the Oswego tutorial, use the same procedures to READ the Hookworm data set and CREATE A LINE LISTING of all the variables in the Hookworm data set.

READ in the data set: Note that the Hookworm data set is an old Epi Info Version 6 data set. This means that on the 1 st screen after you select READ from the analysis menu, under “Data Formats” you will have to tell Epi Info 2002 that you will be reading an Epi Info Version 6 data set. To do this click the arrow at the right of the “Data Formats” box and then highlight “Epi6”. This will identify all the Epi Info Version 6 data sets in your Epi Info directory. Click on the one called Hookworm. This will now appear in the bottom

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 143 CREATING A LINE LISTING: use the “LIST” command as you did in the Oswego Tutorial to create the line listing. Remember the * asks for a listing of all variables which is what you want to do.

FEEDBACK

The listing of all 100 records should appear on your screen that should resemble the data in Vaughan & Morrow Table 10.1. Now review the data by using the right scroll bar in the “Results Library”.

5 CREATING CATEGORICAL VARIABLES IN EPI INFO As in Vaughan & Morrow Chapter 10, it is often useful to categorise continuous data into categories or groups. Table 10.5 categorises haemoglobin as anaemia when the HGB is less than 10g/100ml. This exercise will create the new variable ANAEMIA as described in Vaughan & Morrow. In Task 5 below, you will re-create this table using Epi Info.

TASK 4 - USE EPI INFO TO DEFINE A NEW VARIABLE

Complete the Oswego Tutorial Computer Task 5 titled “Calculating Incubation Period”.

FEEDBACK

This tutorial teaches you to DEFINE new variables from your data and introduces you to the ASSIGN command which is one way to give a new variable data. Use the ASSIGN command when the new variable is an arithmetic expression of existing data as was the case in the tutorial.

TASK 5 – CREATE A NEW VARIABLE

Create a categorical variable called ANAEMIA in analysis using the Vaughan & Morrow definition of anaemia=haemoglobin <10.0 g/100ml. (Note you will need to READ HOOKWORM again.) It is often preferred that data be collected and entered in a data set as continuous data, which you can categorise later if necessary.

-- Use the DEFINE command from Computer Task “Calculating Incubation Period” to create the ANAEMIA variable in the Hookworm data set. -- Now instead of using the ASSIGN command use the IF command to indicate Yes or No in the ANAEMIA variable. Choose the IF command. In the “If” box type: HGB<10.0. Next in the “Then” box SOPH, UWC, type: Master ANAEMIA=”Y”. of Public Health: Next Measuring in the “Else” Health box & Disease type: ANAEMIA=”N”. II: Intermediate Epidemiology – Unit 3 Finally click on OK to complete the IF command procedure -- see144 To check and see if the variable was created properly use the “FREQuencies” command. Select “FREQuencies” from the Analysis Commands section. In the dialogue box select using the down arrow to list variables in the “Frequency of” box and select ANAEMIA and click OK -- see “FREQuencies” Dialogue Box below. How many of the study subjects in the data set have ANAEMIA (HGB <10.0)

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 145 FEEDBACK

The DEFINE command creates a new variable named ANAEMIA which is now available in the data set. Note that this variable is temporary. When you exit analysis the variable will be deleted. If you opened the data set again and wanted to use the ANAEMIA variable you would have to create it again.

The IF command then places either a Y for yes or N for no in the ANAEMIA variable based on the value of HGB. IF/THEN programming is the basic procedure used in computer programming and is useful for manipulating the data to define new variables when conditions are used to decide the variable data rather than an arithmetic expression as was used in the Oswego Tutorial. When creating new variables it is important to check that you have successfully created the variable and that the correct data was entered into the field. A simple way to check this is by using the “FREQuencies” command. This command gives you a simple count of how many subjects are in each category. In this case we compare the data in the Epi Info “Results Library” with the data in Vaughan & Morrow Chapter 10. There should be 44 subjects with ANAEMIA=Y in your results.

6TASKCALCULATE 6 – PRODUCING MEASURES MEASURES OF CENTRALCENTRAL TENDENCY TENDENCY & DISPERSION & First DISPERSION complete the WITH Computer EPIINFO Task 6 “Finding the Mean” in the Oswego Tutorial.

SOPH,Now UWC,use the Master “MEANS” of Public command Health: Measuring to produce Health a mean, & Disease median II: Intermediate and mode, Epidemiology –range, Unit 3 IQR, variance and standard deviation for a) age and b) haemoglobin for those subjects with and without hookworm146 infection. Are the mean age and mean haemoglobin statistically different? What are the P-Values? FEEDBACK

Select the “MEANS” command and then in the “Means of” box select AGE and in the “Cross-tabulate by Value of” box select HOOKWORM to get the mean age in each hookworm group plus the accompanying statistics. The measures of central tendency and dispersion are as follows: AGE HOOKWORM NO HOOKWORM MEAN 25.237 23.756 MEDIAN 16 24 MODE 4 42 RANGE 0-71 0-56 IQR 34 28 VARIANCE 449.598 257.089 STD DEV 21.204 16.034

Select the “MEANS” command and then in the “Means of” box select HGB and in the “Cross-tabulate by Value of” box select HOOKWORM to get the mean haemoglobin in each hookworm group plus the accompanying statistics. The measures of central tendency and dispersion are as follows:

HGB HOOKWORM NO HOOKWORM MEAN 9.475 10.817 MEDIAN 9.6 10.9 MODE 8.4 10.8 RANGE 6.2-13.8 7.3-13.1 IQR 1.9 2.0 VARIANCE 2.313 1.896 STD DEV 1.572 1.377

Age is not significantly different for those with and without hookworm (p-value 0.706095), but haemoglobin is significantly different for those with and without hookworm (p-value 0.000018) **NOTE: How much easier this is than hand calculations, which would have been almost unmanageable with this number of subjects!!!

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 147 7 CREATE A CROSS-TABULATION & TEST FOR ASSOCIATION BETWEEN TWO VARIABLES USING EPI INFO

TASK 7 - LEARN HOW TO CREATE CROSS-TABULATIONS & PRODUCE STATISTICS FOR CATEGORICAL VARIABLES.

Complete the Oswego Tutorial Computer Task 8 called “Creating Tables”.

TASK 8 - RE-CREATE TABLE 10.5 FROM VAUGHAN & MORROW USING THE TABLES COMMAND

Use the “TABLES” command from Oswego tutorial to re-create Table 10.5. Which variable will be the “Exposure Variable” and which will be the “Outcome Variable”?

FEEDBACK

This command creates a table of ANAEMIA by HOOKWORM and produces statistics. Select HOOKWORM as the “Exposure Variable” and ANAEMIA as the “Outcome Variable” in the “TABLES” dialogue box. Each cell and totals should match those in Table 10.5. (Note that you have exited the analysis programme of Epi Info since completing previous tasks so you will need to “READ” the hookworm data set again and then re-create the ANAEMIA variable before using the “TABLES” command.)

TASK 9 – INTERPRET THE DATA

a) What are the odds ratio, risk ratio and p-values testing the association between anaemia and hookworm, interpret these results. b) Which of these would you use if this were a case-control study? A cohort study? c) Using the table created above you can also test for associations FEEDBACK between haemoglobin defined as anaemia and the presence or absence of hookworm.

FEEDBACK

a) and (c):The odds ratio is 3.46 and the Risk Ratio is 2.14. (Once again, since the occurrence of anaemia is not rare the odds ratio and risk ratio are not that similar.) The p-value (uncorrected) is 0.0039. This suggests that there is a significant association between haemoglobin and anaemia.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 148 b) Case-Control Study use Odds Ratio. Cohort Study use Risk Ratio.

8 USING EPI INFO TO CREATE BAR GRAPHS AND PIE CHARTS When presenting data to managers it is often helpful to use graphics instead of tables. Epi Info will also create selected graphics. This exercise will assist you to create a simple pie and bar chart, and a line graph using Epi Info.

TASK 10 – CREATE GRAPHS & CHARTS

Complete the Oswego Tutorial Computer Task titled “Creating a Graph”. Now create the following graphics and print copies of them using the hookworm data set: Bar Chart of ANAEMIA Pie Chart of HOOKWORM Line Graph of HGB

HINT: You need to adjust the scale and label the Line Graph of HGB for it to make sense -- see below:

FEEDBACK

You should now have bar and pie charts and a line graph like those below which could be placed in a report, or for use during a presentation, to illustrate the prevalence of anaemia and hookworm, and the distribution of HGB values in the population. You should have labelled your charts and graphs

For the line graph of HGB you also adjusted the scale of the vertical axis so that the counts would be single integers which would represent number of subjects with the matching HGB from the horizontal axis. Note you needed to check the “Vertical Labels” box or else all the Horizontal (X)

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 149 Axis values would have overlapped and been impossible to read. Note that you should also use Labels for the graph and the Y & X Axis for the graph to be understandable to others.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 150 Epi Info graphics are not as fancy as those which could be developed using another programme like Excel, but if done properly could still get the message across and provide another way to simply present research results.

To print graphic results, click on “Menu” in the top left corner of charts and graphs and then highlight “File” then highlight “Print”.

You have now learned how to use a basic statistical analysis programme to assist in simple data analysis. You may want to take a chance to look at the other tasks and tutorial sessions in the Oswego Tutorial and try some of the other commands in Analysis. EPI Info is a simple and powerful tool that can help you examine data from your programme or study.

9 SAVING YOUR ANALYSIS

After you have finished your analysis you will want to save the data for reference while writing up the results or sharing with others. There are two ways to save the data. Make sure you are in the Epi Info Analysis screen and complete Task 11.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 151 TASK 11 - SAVE EPI INFO OUTPUTS

There are two basic ways you can save your output for future use - try the following:

a) Print the output directly from Epi Info b) Find and open the relevant file in your Epi Info sub-directory in MSWord.

FEEDBACK a) If you just want a simple printed copy for reference you can print directly from Epi Info. You will note in the Analysis Output section of the Epi Info Analysis screen at the top right, there is a box that says "Print" with a picture of a printer. As long as your computer is attached to a printer and the printer is turned on all you need to do is click on that button and it will give you your usual printer dialogue box. Note that everything currently in the grey output box will be printed. If you have used the select option, or gone in and out of Analysis then prior work will not be currently be in the grey output box and you will have to go to the Epi_Info sub-directory and find the file with the desired output to access prior data outputs. b) If you need to access prior outputs or you just want to save an accessible copy of the data analysis in an electronic format, then you will want to access the data in your Epi_Info sub-directory. Note that Epi Info saves all outputs in the grey output box to a file. Every time the grey output box resets (for example after a select procedure) then Epi Info starts a new output file. You can access these files and save them for future use in MSWord, by using the following commands:

1. Close Epi Info. 2. Open MSWord or other word processor programme. 3. Click on the File - Open command. 4. You will now get a Browse dialogue box, which will generally be sitting in the "My Documents" sub-directory (or folder). 5. Navigate the Browse dialogue box to the C-Drive on your computer. 6. In the list of folders on your C-Drive you should see a folder called Epi_Info" - click to open this folder. 7. In the Epi_Info sub-directory (or folder) there are several hundred files that are used for running the Epi Info programme. You will also find your analysis files, such as Hookworm and your assignment data set here. Also in this folder are the saved data outputs from any analysis you have run. Generally the files are in alphabetical order, so use the scroll down arrow to find files that are named "Out#" for example "Out1", "Out2", etc. These are your output files. Just click on them to find the one you want. They are numbered sequentially in the order you created them, i.e. SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 152 conducted your analysis, so if you want the most recent output then select the Out file with the highest number. 8. Once you have the output file open in MSWord then you can edit it just like any other word document. You can also save it with a name that makes sense, for example "Hookworm analysis 3March05" or something like that. You can also save it to another folder where you keep all your analysis. At this point you can do anything to it that you can do to a MSWord document. Note that the output tables are now regular MSWord Tables so they can be edited as necessary as well. You can also copy and paste data into other documents or Powerpoint slides, etc.

EPI INFO ADVANCED TUTORIALS - OPTIONAL

Included on your CD-ROM is a folder called EPIINFO ADVANCED TUTORIAL. In this folder there is a series of exercises to learn/practice using Epi Info. It covers many of the topics covered in the Oswego tutorial, but at a slightly more advanced level. Exercises 5 & 6 cover basic statistical analysis. If you want to learn more about Epi Info or just want to try additional learning exercises you should copy all the files in this folder into your Epi_Info folder (directory). You can access the “Epi_Info_Exercises” using MSWord and browsing to find the file of this name in your Epi_Info folder (directory). You can then print out the selected pages and try the exercises.

10 SESSION SUMMARY

This session introduced you to a simple computer programme for analysing epidemiologic data. Like all computer programmes you need to first understand the statistics you are using to make sense of the data, the computer does not do that for you. If you had trouble understanding any of the data produced using Epi Info then review again Sessions 1-3 in this Unit and then practice again using Epi Info to produce results. The session used data you could hand-calculate, i.e. the hookworm data, so that you could see the relationship between simple hand- calculations and computer analysis. You also learned how to save your data analysis outputs for future use.

The next session in this unit supports you in starting the first part of Assignment 2 for this module. In it you will learn how to find and review the literature relevant to understanding the particular health problem which is presented in the dataset (Mhdassignment on your CD). Reading for and writing a literature review is a skill you will need in all epidemiological studies and research and enables you to understand more about the problem you are to analyse before you embark on the analysis.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 153 SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 154 Unit 3 - Session 5

FOCUS ON Assignment 2: ASSIGNMENT 2 Introduction and Literature Review

Introduction The last three study sessions of this unit aim to support you in developing your Assignment 2. Before you work through this session, review the assignment requirements in the Module Introduction of this Module Guide (section 3.5). The session’s aim is to guide you in understanding the requirements of the first part of Assignment 2 and identifying relevant topics for your literature search. Once you have understood the assignment requirements, and scrutinised the data set and other information provided about the study, you can begin searching for relevant literature. This will involve searching databases on the Internet, as well as scientific journals and books in an academic library. Talk to colleagues about any relevant sources of information they know of and consider research reports on the topic as well as comparable data sets or programmes that may be available at your workplace. You will need to read this literature critically and review it as part of your assignment. Remember to compile your Reference List as you read, using the SOPH Academic Handbook to guide you.

Your goal is to pare down your discussion to a 2 - 5 page literature review of the material you have found, and to submit this for review by your lecturer as part of your Draft Assignment 2. Although you may find it difficult to be critical of the studies you read at this stage, you will have a chance to revisit your Literature Review as part of Unit 3 - Session 6 - Study Methodology). At that stage, you could improve your critique, as you study that session.

Session Contents

1 Learning outcomes of this session 2 Readings 3 Reviewing and starting Assignment 2 4 The purpose of a literature review 5 The literature search process 6 Developing critical review skills 7 Preparing to write your literature review 8 Session summary SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 155 9 References

1 LEARNING OUTCOMES OF THIS SESSION

By the end of this session, you should be better able to:

. Clarify and justify investigating a health problem based on a data set. . Search and locate relevant health information relevant to the research problem. . Write a short, critical review of the available health literature. . Write an introduction for your report.

2 READINGS

Refer to all the readings listed in the preceding Study Sessions.

Author/s Publication details Vaughan, J. P. & (1989). Ch 12 - Communicating Health Information. In Epidemiology for Morrow, R. H. Health Managers. Geneva: WHO Publications: 125 - 129.

Depoy, E. & (1994). Ch 5 - Developing a Knowledge Base Through Review of the Gitlin, L. Literature. In Introduction to Research. St Louis: Mosby: 61 - 76.

Mouton, J. (2001). Ch 6 - The Literature Review. How to Succeed in your Master’s and Doctoral Studies: A South African Guide and Resource Book. Pretoria: Van Schaik: 90 - 97. Dane, F. C. (1990). Ch 4 - Reviewing the Literature. In Research Methods. Pacific Grove, California: Brooks/Cole: 65 - 78.

Alexander, L./ (2005). Section 5.3 - Citing and Referencing the Sources That You Use. School of Public SOPH Academic Handbook. Bellville, UWC: 52 - 61. Health, UWC Beaglehole, R., (1993). Ch 11 - Continuing Education in Epidemiology. In Basic Bonita, R. & Epidemiology. Geneva: WHO: 143 - 147. Kjellstrom, T.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 156 3 REVIEWING AND STARTING ASSIGNMENT 2

3.1 Narrative Context For the purposes of Assignment 2, you may assume that you are a public official, responsible for monitoring Public Health in your region. You have recently contracted a team of researchers to carry out a survey for you. They have provided a preliminary data set. (See CD and Module Introduction, section 3). This data is drawn from a problem occurring in the real world of Public Health practice. This data set raises a number of concerns which may require an urgent response from you, depending on your findings.

3. 2 Review the Instructions for Assignment 2

Assignment 2 – Write a Literature Review

The outcome of Assignment 2 is an Epidemiological Research Report based on your study of a Public Health problem which is represented by the data set provided. Use the description of the research, the related information provided and the data set to identify the problem.

Then begin searching for as much relevant literature as you can find on this problem. Use the Internet, resources in your workplace, consult expert opinion, access the records of local health authorities available to you or comparative studies from journals, academic libraries and resource centres. Having accessed sufficient literature:

. Read and critically review the literature you find. . Prepare your Reference List using the correct referencing method: use the SOPH Academic Handbook, 2005, and the websites provided in this session.

Your review of the data and the relevant literature will help you to develop your study questions (or objectives) in the next stage of the assignment.

3.3 Plan Your Assignment

Before plunging into your assignment, take a few moments to clarify what you hope to achieve, by when, and the resources available to you. A plan like this will help you to work systematically through all aspects of this task.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 157 TASK 1 - OUTLINE THE PROCESS FOR YOUR ASSIGNMENT

After reading the assignment instructions above, scan through the Module Introduction and Sessions 5, 6 and 7 of this Unit. In the table below, note down the relevant information. a) Identify the marks allocated and due dates of the assignment. b) Identify the main components of the Report required for the assignment. c) Identify what you must do to complete these components. d) Identify any resources that may help you with the assignments.

Assignment Marks Date What section of Actions to be Resources no Due the Report must taken identified so be completed far 2 (Draft Assignment)

2 (Final Assignment)

FEEDBACK

Hopefully this gives you a sense of the process that lies ahead - made up of Draft Assignment 2 and Final Assignment 2. You should add the required actions to this table as you work through this session, (e.g. search for comparable studies on Internet), and return to it to remind yourself of the process that lies ahead.

You have a list of assignment due dates (deadlines) from the Student Administrators. The components of the Report required for each assignment is set out in the Module Introduction and explained in the final sessions of each unit.

Some of the resources that may be helpful include provincial or local authority annual health reports, the Health Systems Trust’s annual South African Health Review, World Health Organisation (WHO) or UNICEF reports and other documents. These might provide you with a basis for comparing the data in this report with similar data in other places.

In order to get a feel for the final report that you must produce, take a look at Chapter 12 of Vaughan and Morrow (1998), on different ways to present health information. The checklist provided at the end of Chapter 12 may also be particularly useful. SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 158 READING

Vaughan, J. P. & Morrow, R. H. (1989). Ch 12 - Communicating Health Information. In Epidemiology for Health Managers. Geneva: WHO Publications: 125 - 129.

Before you even start your next section, the literature review, make sure that you have a good grasp of the health problem that you are facing in the data set.

3.4 Clarifying the Health Problem

You will need to spend some quiet time scrutinising the problem represented by the research study description presented in the assignment, in order to start searching for information. Use the study description, data set and any other information that has been provided. The outcome of this process should be that you write a draft of your introductory paragraph/s for the report, in which you give an overview of the problem, its context, and a rationale for addressing it. Think for a moment about the role that the Introduction plays in a research report.

TASK 2 – THE PURPOSE OF THE INTRODUCTION

What would you say is the purpose or role of an Introduction in a Research Report?

FEEDBACK

Take a look at this example of an Introduction by one of SOPH’s past students from Tanzania, Jeremiah Mazala (2004) which is reproduced with his permission. The data set that he studied was concerned with:

THE SEXUAL AND DRUG RELATED RISK BEHAVIOUR IN ADOLESCENTS AGED 11 AND 15 YEARS IN REGION X. Defines key concepts Introduction

There are different definitions of risk behaviour by different writers. Richard (1998) defined it as behaviours that can directly or indirectly compromise the well being, the health and even the life course of young people. The World Health Report 2002 suggests that risk is “… the probability of an adverse outcome, or a factor that raises this probability” (WHO, 2002: 21). In other words any behaviour that will raise this probability is a health risk.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 159 Develops context and establishes rationale for investigation

Teenagers are at peak of risky behaviour because of their curiosity. They constantly experiment and test varieties of behaviours. In doing this, they become dangerously exposed to all kinds of behaviour associated risks. Amongst others, these include cigarette smoking, alcohol drinking, unsafe

States problem and builds rationale

sex, marijuana smoking, use of cocaine and mandrax, etc. Since the young population is higher than other age groups in most of populations, the future working and economically viable population will be in jeopardy if these risky behaviours are left unchecked. This has an impact on future adult productiveness economically, socially and politically. So knowing the extent of the problem is important for productive interventions.

The Introduction to your Report should define key concepts, succinctly state the problem studied by the survey or data, explain the context or setting in which it exists and establish why it is worth tackling. This introduction achieves all of these things, but the context must, in a real situation, be presented in more detail. For the context of region in your report, describe your own region (where you love and work).

Your Assignment 2 task arising from this section is to draft an Introduction to your Report. Now that you have defined the problem, you are in a position to start reading more about the topic.

4 THE PURPOSE OF A LITERATURE REVIEW

Before going any further, do Task 3, in which you are asked to define the purpose of a literature review for yourself. This is always a helpful step when preparing to write any text – clarify its purpose. A text is, after all, a communication written to fulfil a particular communicative role. To assist you, read the first part of Depoy & Gitlin (1994) in which they discuss the purpose of literature reviews.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 160 READING

Depoy, E. & Gitlin, L. (1994). Ch 5 - Developing a Knowledge Base Through Review of the Literature. In Introduction to Research. St Louis: Mosby: 61 - 66.

TASK 3 – WHY REVIEW THE LITERATURE?

From the reading and the assignment instructions, try to answer the question: Why should you review the literature at this stage of the study?

FEEDBACK

In the section you have read, Depoy & Gitlin (1994) make several important points about the purpose of literature reviews in research studies.

In research in general, these authors point out that a literature review assists you, the researcher to understand the current state of knowledge on a topic, to define the “boundaries” of your study, and to select a research strategy for a study. More specifically, they note that knowing more about the topic helps you to refine the questions you will ask in your study. In an epidemiology study, you will often be faced with a data set raising an unfamiliar problem. Reading current literature will help you to understand the problem better, to explore how others have understood the problem, to see whether comparable studies have been undertaken, and to note what sorts of solutions they have adopted.

Even experienced health workers may not be familiar with every health problem they encounter, and may not know much about its natural history, risk factors or prevention strategies. Studies addressing a similar problem, and exploring the influence of social and biomedical risk factors, may suggest risk factors that you should be aware of in your study. Familiarity with the problem therefore helps you to define or refine your study objectives, in order to address the problem with greater focus. In addition, you might search for studies which have taken place in similar settings, with comparable populations, which provide insight into the causes of a problem, or even how it might be addressed.

Reading other studies also enables you to consider your own study method in relation to previous studies, to see what can be learnt from them. Other studies may, for example, suggest that specific variables and instruments will be suitable in your study. In relation to this, we ask whether studies we read are “replicable”.

You will find that Depoy and Gitlin (1994) define the key purpose of a literature review very clearly on page 61 at the end of the second paragraph and on page 62. Look back at these purposes in detail so that they inform your Literature Review for Assignment 2. SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 161 From this discussion of the purpose of the literature review, you should have some idea of the kinds of issues you should be alert to when reading the literature. We will return to these after discussing the process of searching for relevant literature.

5 THE LITERATURE SEARCH PROCESS Searching the literature is a cumulative process that you should start at the beginning of your study, and resume as you see gaps in what you have gathered. As a postgraduate student and a Public Health professional, you need to develop the skill of searching for information quickly and easily when you need it. You will often be faced with new problems and your responsibility is such that you need to develop an in-depth understanding in a short time. Never before has so much information been available to us as now, with the facilities of the World Wide Web. If you are not already conversant with using the Internet, you must aim to develop this competence as soon as possible. Refer to your SOPH Academic Handbook for support, try to attend a SOPH Summer or Winter School soon, and use the UWC Library Guide which has been sent to you.

Once you have your health problem or study topic clear in your mind, there are four key steps to a literature search which are outlined below. They are: . Identify key words (related to your topic or problem); . Search for and identify relevant information; . Obtain copies of the literature; . Take notes for your literature review. From this introduction to searching for literature, you will realise that you need to be: . Internet literate; . Focused, so that you search for relevant literature; . Strategic, otherwise searching will take too long; . Systematic, otherwise you will end up confused by what you have and missing reference details; . Selective, otherwise you will end up with much that is irrelevant; . Ready for disappointments: sometimes you will not find what you need, and you cannot afford to be disheartened, but rather to plan alternative searches. We now describe the process of searching for literature. Start by reading Dane’s (1990) Introduction to this reading, including the section on “Planning the Search”.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 162 READING

Dane, F. C. (1990). Ch 4 - Reviewing the Literature. In Research Methods. Pacific Grove, California: Brooks/Cole: 65 - 78.

Dane (1990) makes a number of important points – firstly to plan your search, and secondly to stop your search when you have sufficient material.

5.1 Identify Key Words

In any library or Internet collection, you can search for literature by author and title, but if you do not have this information, your best bet is to search using subject or topic key words. This requires you to brainstorm a set of key words or phrases relevant to your topic in advance. When identifying key words, you might also use some of the variables which were used to collect the data set.

Identifying key words is done by: . choosing both broad and narrow subject terms for your topic, e.g. Adolescence (very broad) and Mandrax addiction amongst South African teenagers (narrower). . identifying synonyms for the topic, e.g. elderly; aged; old age; ageism; ageism; aging; ageing; aged; older adult; geriatric; gerontology. You could even use Roget’s Thesaurus to find synonyms for your search.

This is an important strategy in order to find all the publications on a topic: different authors and librarians may have used different key words, and if you only use one term, you may miss out on thousands of others.

To demonstrate the brainstorming process, here is a research problem or topic from a previous assignment. SURVEY OF ADOLESCENT RISK BEHAVIOUR

Risk taking by teenagers has been identified as a problem in several communities in your province. As part of a larger survey on adolescent health, school children aged 11 and 15 from three different schools were surveyed to determine the extent of sexual and drug related risk behaviour encountered in the previous two-year period. The Department of Education has requested assistance from the Epidemiology Unit of the Department of Health to analyse this data and write a report and recommendations for joint action by the two Departments.

The data set included a range of variables for the study, including mandrax and suicide. In the data collection questionnaire, the following questions were asked: . How often have you taken Mandrax in the past year? . Have you seriously contemplated suicide at any time the past two years?

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 163 TASK 4 – IDENTIFY KEY WORDS a) From this brief study description of the Survey of Adolescent Risk Behaviour, brainstorm at least eight key words which could be used in your search. Remember to use synonyms as well. b) After reading the Feedback, do the same for your Assignment topic, using the study description and data set, as well as your general knowledge on the topic.

FEEDBACK

There is a wide range of key words that could be used in different combinations. For example, you could use: risk behaviour, risk taking, teenage risk, adolescent risk, youth risk behaviour, drug related risk, drug addiction, Mandrax, risky sexual behaviour, safe sex, teenage suicide, etc.

By brainstorming alternative key words before you start searching, you will broaden the range of literature that you will find, but at the same time, it will help you to focus more specifically on your topic. It is, in fact, an essential time and cost saver.

5.2 Search for and Identify Relevant Information

Once you have your key words, you should search and identify relevant information. Take a look at Mouton (2001), pages 88 and 89, for a review of sources of information. Mouton makes a useful point that a Literature Review is actually a review of “a body of accumulated scholarship” (Mouton, 2001: 6). This suggests that the review is inclusive of definitions of a range of items including definitions of the key concepts, theories and models which are relevant to the topic, “existing data and empirical findings” produced on the topic and previous “measuring instruments” used in this field (Mouton, 2001: 87).

READING

Mouton, J. (2001). Ch 6 - The Literature Review. How to Succeed in your Master’s and Doctoral Studies: A South African Guide and Resource Book. Pretoria: Van Schaik: 86 - 97.

In order to become familiar with the search process, there are a few technical concepts you need to understand. Academic libraries pay subscription fees in order to provide their students with access to Electronic Databases and Electronic Journals. The UWC Library Website guides you through the process of SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 164 using these resources, as will the librarian if you visit UWC. In brief, an Electronic Database is a collection of thousands of scholarly references, catalogued by topic, author and title, which you can search, select from, print out or read on screen or save. There are databases which are devoted to medical, health and social sciences literature. You need to familarise yourself with these resources, as they will make your literature search task so much easier. It is important to remember that a literature review should review scientific literature: extracts from websites are seldom appropriate.

You can save such documents which are usually in PDF format (Portable Document Format) onto a disk, a Mass Storage Device known as a Flash Drive or you can e-mail them to your own e-mail address.

TASK 5 – SEARCH THE ELECTRONIC DATABASES

Having identified your key words, go onto the UWC Library Website (Electronic Resources), select Databases and search several of them using your key words. You could use established electronic databases such as Ebscohost, InfoTrac, MedLine, MedInfo and others. Below is a list of further databases that you could search.

Print the reference details and abstracts of articles that seem to be useful or save the complete articles if they appear to be particularly relevant. Some full articles can be found on the Internet while others only can only be accessed as Abstracts: these can still be useful.

Relevant Electronic Databases

Remember also that as a UWC student you have on-line access to some Important databases as well as hundreds of electronic journals. In addition, you can order hard copies of older journal articles through the UWC Inter- Library Loan system at a small cost. You will find this information under the section on “Using the Library” in the SOPH Programme Handbook.

The following electronic databases may be helpful to you in the course of this Module, and must be used from the UWC Library website. You will need passwords to use these databases which will be supplied to you by the Librarian or Student Administrator. See UWC Library Guide on the website and the SOPH Academic Handbook for information on how to use these databases.

In many instances you can now obtain full-text references off the Internet, especially from such sources such as WHO, UNICEF and other organisations. Remember also that you have the possibility of ordering hard copies of journal articles through the UWC Inter-Library Loan system at a small cost. You will find SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 165 this information in the SOPH Programme Handbook.

NAMES OF USEFUL DATABASES

Ebscohost contains other databases including Medline, Academic Search Premier, and many others.

InfoTrac provides full-text and bibliographic access to journals via a collection of databases which specialises in business, health, biography and history. It also provides access to an excellent full-text multidisciplinary database which covers all academic areas.

Scopus claims that it is the world's largest abstract and citation database of scientific, technical, medical and social sciences literature with 13,450 peer-reviewed titles from over 4,000 international publishers.

NAMES OF USEFUL WEBSITES http://www.who.int (World Health Organisation) http: //www.unicef.org (United Nations Childrens Fund) http://www.cdc.gov (Centers for Disease Control) http://www.hst.org.za (Health Systems Trust) http://www.kings.cam.ac.uk/~js229/glossary.html (Dictionary of Epidemiology) http://www.epibiostat.ucsf.edu/epidem/epidem.htm (World Wide Web Virtual Library: Epidemiology)

When it comes to health information, find out about annual publications or reports published in your country, e.g. the Department of Health’s annual reports. Another way to search is to ask colleagues and other experts for advice on locating more local sources of information on a topic.

When searching for literature, particularly on a widely researched topic like HIV/AIDS, you will need to be very specific and selective while searching. One way to select journal references is to concentrate on those published in the last five years. You are, after all, wanting the latest research on the topic, and to understand current thinking on the topic.

One question that often comes up concerns the number of references required for a literature review. There is no simple answer to this question. Mouton discusses the average number of readings for Masters and Doctoral theses. However, your final assignment for this module is a short research report, which generally would include between 15 - 20 references. Remember that they should be relevant and up-to-date references, so this will require you to read many more than 20 references.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 166 5.3 Obtain Copies The next step is to obtain copies of the relevant literature, whether it is in the form of journal articles, books, monographs, or sections of a website. You need copies in order to critically review each reference. Remember that thesecan be obtained by saving them to a disc or “Flash Drive”, or having them printed out immediately, or e-mailing them to your own e-mail address and reading them on your computer. At the same time as obtaining copies, you should be reading these articles to make sure it is relevant, mapping its content and organising or filing your information. Depoy & Gitlin (1994) describe some helpful ways of doing this on pages 71 - 73.

READING Depoy, E. & Gitlin, L. (1994). Ch 5 - Developing a Knowledge Base Through Review of the Literature. In Introduction to Research. St Louis: Mosby: 61 - 66.

Another helpful strategy when working through the literature is, as you finish reading an article, to write a sentence or two at the end of the article summarising the key issues that will be relevant to your study, or summarising the main points of the article. The value of this is that it forces you to reflect on what you have read immediately, and therefore to internalise the information more effectively.

5.4 Take Notes for Your Literature Review As you take your initial notes for the literature review, you should list these references accurately in the correct format, and use the correct in-text referencing style. The SOPH Academic Handbook (see below) concentrates on the Harvard Style and provides guidance on using this style. You may choose your preferred style, but what is important is that the in-text and reference list formats are consistent and correct! In addition, it is essential that you never copy anything down verbatim without putting it in quotation marks and referencing it – not a sentence nor even a phrase, and certainly not parts of the Module Guide! Do not underestimate the importance of referencing: it is essential in any academic or research undertaking, and at postgraduate level, you cannot afford to be casual about it. Locate the section on Referencing by using the Contents list in the SOPH Academic Handbook.

READING

Alexander, L., SOPH, UWC. (2008). Section 5.3 - Citing and Referencing the Sources That You Use. SOPH Academic Handbook. Bellville, UWC: 52 - 61. Provided by SOPH, UWC: not in Reader.

There are several websites that may be helpful to you, depending on which referencing style you use: SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 167 The Harvard System (author, year system): http://www.lc.unsw.edu.au/onlib/ref.html From this site there are a number of links. You can also simply type Harvard Referencing Method into Google, and explore from there.

The Vancouver (number system): http://wwwlib.murdoch.edu.au/find/citation/vancouver.html You can also simply type Vancouver Referencing Method into Google, and explore from there.

Once you have a selection of key texts, you should study, summarise and discuss them in your literature review. In the process, you will be trying to identify study precedents which may later help you to substantiate your own findings. You will be searching for gaps in the literature which may constitute further areas for research. You must remember that the discussion contained in a literature review is expected to be critical and to respond to previous research. In the end, a literature review should map out the findings of previous studies, noting their strengths, analysing their methodologies and findings, and noting areas of weakness or gaps within them.

6 DEVELOPING CRITICAL REVIEW SKILLS

Once you have assembled sufficient relevant literature, you are expected to read the material critically. Depoy & Gitlin (1994) point out that, in the research context, a literature review is a critical review of the literature. This requires you to understand the state of existing knowledge on the topic, and to consider critically whether previous research conducted on the topic is sufficient, whether the focus and research strategy is adequate, and whether the findings are valid. You may be wondering how you will do this, if you are a newcomer to a topic. Perhaps the best advice is that you will need to do a fair amount of reading and gain a level of familiarity with the topic, before you can take a critical position on the literature.

Start exploring the issue of critically reviewing literature by reading this short discussion of the process. Use this Task to focus your reading.

TASK 6 – QUESTIONS TO FOCUS YOUR CRITICAL READING

As you read this discussion, develop three critical or evaluative questions to ask about the literature that you read.

WRITING A CRITICAL LITERATURE REVIEW OF SCIENTIFIC PAPERS

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 168 Introduction

Critical reading of epidemiological literature enables us to be more systematic in our evaluation of population-based studies. We need to ask critical questions such as: Does the study design warrant these conclusions? Such a critique must address the objective or hypothesis of the study in relation to the study’s design, results, discussion, conclusions and significance.

Statement of Problem

The paper should provide the reader with sufficient background on the general problem area to allow an assessment of its importance. Relevant literature should be surveyed and critiqued to conclude what is currently known regarding the problem and what remains debatable or unanswered.

Study Objective or Hypothesis

A clear statement of the original objective/s of the research is essential. Ideally, the objective will be clearly stated as a study hypothesis. Objectives are often not explicitly stated and the reader must then assess the conclusions to identify what the research was actually trying to prove. The objectives should clearly relate to the problem stated earlier.

Methods and Study Design

The study sample must be clearly identified and must be appropriate for the study objectives and the generalisability of the results. Sample size and the selection process must be acceptable. The study design is one of the strongest factors affecting a study’s conclusions and must be appropriate to the hypothesis or objective. Many designs involve studies that determine the relationship between a health or disease outcome, (e.g. death from lung cancer) and an exposure (e.g. smoking). In this case smoking is the independent variable which affects the other dependent variable (death from lung cancer).

Study designs are of two types: experimental or observational - and can be further sub-categorised as cross-sectional or prevalence studies, case-control studies, and incidence or cohort studies. Other concerns include adequate definition of independent and dependent variables, control of confounding variables, the validity and reliability of instruments and techniques used for measurement as well as sound ethical practice, etc.

Results

Data must be presented clearly and precisely, with figures and summary statistics to allow the reader to agree or disagree with the author. You need to check whether the study design and methods of observation warrant the statistical analysis presented as the results. Certain biases in the method of observation or defects in study design may invalidate the results.

Discussion

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 169 This section of a paper allows the author to interpret the results as they relate to the study’s objectives or hypothesis. Comparisons can also be made with what is reported in the literature. Difficulties or problems in study design or data collection leading to potential bias should also be discussed here.

FEEDBACK

There are many critical questions that you could have developed to guide your reading. Here are a few examples: . Is there a rationale for the study? . Is the objective or hypothesis clearly stated? . Does the hypothesis being tested relate to the problem? . Is the method appropriate to the study objective? . Did any aspect of the method bias the results? . Was the study sample appropriate? What were its limitations, if any? . Are the results warranted by the analyses that were undertaken?

Ultimately, you should be asking yourself whether findings be generalised beyond the study setting and whether the findings of the study you are reading are replicable. In other words, could they be applied to the data set that you, as the Public Health official, are concerned with?

Each of the authors below offer advice on writing a critical review of the literature. Mouton (2001) provides “Criteria for a good literature review” on page 90. Depoy & Gitlin (1994) provide 19 questions to guide your evaluation of the literature on page 73. Modify your questions to include some of the points you find useful in these readings, and remember to use these questions while you select and read articles for your Epidemiological Report assignment.

READING

Mouton, J. (2001). Ch 6 - The Literature Review. How to Succeed in your Master’s and Doctoral Studies: A South African Guide and Resource Book. Pretoria: Van Schaik: pages 90- 97.

Depoy, E. & Gitlin, L. (1994). Ch 5 - Developing a Knowledge Base Through Review of the Literature. In Introduction to Research. St Louis: Mosby: 61 - 66.

7 PREPARING TO WRITE YOUR LITERATURE REVIEW

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 170 You are probably aware that no one can write a text without substantial planning and selection of its content. Too often students simply think that a literature review is made up of a series of short paragraphs, briefly describing each article or text you have read. This is not so. You have already been introduced to the idea of reading the literature critically, and using it purposefully to increase your understanding of the problem you are studying. Depoy & Gitlin (1994) suggest two ways of “Charting the Literature”, or organising it: the first way is according to key concepts that arise from the study, e.g. drug-related behaviours, depression and suicide. Mouton (2001) also suggests this approach.

Another way of structuring the content of the literature review is to use the framework provided by scientific papers (Depoy & Gitlin, 1994). Here you could focus on links and contrasts between study population, then focus on the sampling strategies and study methods, then on the findings, and so on. Whatever way you decide to structure your literature review, you need to integrate the literature into a discussion, with the aim of showing what is relevant to your study, what contradicts it and whether there are gaps in the research. Take a look at the readings that follow, and choose one that suits you.

TASK 7 – UNDERTAKE A CRITICAL REVIEW OF THE LITERATURE

Once you have identified several texts for your assignment, read them with your critical questions in mind and take notes using one of the frameworks discussed. Be sure to assess the quality and appropriateness of the research methods used to generate the results, and to comment critically on the findings and conclusions.

READING

Mouton, J. (2001). Ch 6 - The Literature Review. How to Succeed in your Master’s and Doctoral Studies: A South African Guide and Resource Book. Pretoria: Van Schaik: pages 90- 97.

Depoy, E. & Gitlin, L. (1994). Ch 5 - Developing a Knowledge Base Through Review of the Literature. In Introduction to Research. St Louis: Mosby: 61 - 66.

FEEDBACK

From this point, you should develop an outline for your introduction and literature review sections, and begin to see how you can systematically and clearly present the information you have reviewed. It could be “By Theme or Construct” (Mouton, 2001: 93). It may also be useful to prepare tables which display the main findings from different researchers alongside each other. Methodological features such as sample selection and size are just as important to consider as the main research results. Questionable calibration and examiner variability controls can introduce SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 171 serious bias and this should be commented upon.

This is more than just a summary, but you will probably find that only once you have analysed your own data set, will you really be able to comment critically on the other comparative texts you have found. Do as much as you can at this stage, but be prepared to revise your Literature Review after you receive feedback from your lecturer, when you are compiling the Final Report (Unit 4 - Session 5).

8 SESSION SUMMARY

This session introduced the literature review process and guided you into tackling Assignment 2. Your assignments in this module provide you with an opportunity to apply these epidemiological skills in beginning to resolve a real Public Health concern.

9 REFERENCES

. Beaglehole, R., Bonita, R. & Kjellstrom, T. (1993). Ch 11 - Continuing Education in Epidemiology. In Basic Epidemiology. Geneva: WHO: 145-148. See prescribed book.

. Mazala, J. F. (2004). Assignment: The Sexual and Drug Related Risk Behavior in Adolescents Aged 11 And 15 Years in X Region. Bellville: SOPH, UWC.

. Mouton, J. (2001). Ch 6 - The Literature Review. How to Succeed in your Master’s and Doctoral Studies: A South African Guide and Resource Book. Pretoria: Van Schaik: 86 - 97.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 172 Unit 3 - Session 6 FOCUS ON ASSIGNMENT Assignment 2: 2 Study Methodology

Introduction

This study session focuses on a different aspect of the assignment. Before you read this session, review the assignment requirements from the Module Introduction of this Module Guide.

After completing Session 5, you wrote the Introduction and Literature Review for Assignment 2.

Taking Assignment 2 further

This study session guides you in completing additional components of your epidemiologic report (Assignment 2). These components are:

. Develop your study Aims and Objectives. . Write the Methods section of your final report.

Session Contents

1 Learning outcomes of this session 2 Readings 3 Your Assignment task 4 Critically appraise the literature on your Assignment topic 5 Defining the Aims and Objectives of your study. 6 Linking back to the Literature Review 7 Writing a Study Methods section 8 Session summary

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 173 1 LEARNING OUTCOMES OF THIS SESSION

By the end of this session, you should be better able to:

. Improve your skills for critically reviewing health related literature. . Improve on a short, critical review of the available health literature. . Write a problem statement. . Identify aims and objectives for an epidemiological study. . Understand the components of a methods section in an epidemiologic report. . Understand how to discuss limitations found in epidemiologic studies.

2 READINGS

Refer to all the readings listed in the preceding study sessions. Specifically, you may want to use Readings from Study Session 5 of Unit 3 and Study Sessions 1 - 3 in Unit 2.

Author/s Publication details Katzenellenbogen, (1997). Ch 6 - Setting Objectives for Research. In Epidemiology: A J. M., Joubert, G., Manual for South Africa. Cape Town: Oxford University Press: 56 - Abdool Karim, S. S. 63.

3 YOUR ASSIGNMENT TASK – DEVELOP A REPORT ON AN EPIDEMIOLOGICAL PROBLEM

In this section, you are going to develop a problem statement from the data set and aims and objectives related to your investigation. The data set is provided on the CD and the assignment task is described in more detail in the Module Introduction. In this session, you will be guided to complete the next two sections of the final assignment for submission and feedback: . Aims and Objectives . Study Methods These new sections directly relate to the study sessions on study design in Unit 2.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 174 TASK 1 - REVIEW ASSIGNMENT 2

Read again the Module Introduction (section 3) outlining the requirements for Assignment 2 and map out your work plan for completing this assignment.

FEEDBACK

Assignment 2 has the following parts to it: . METHODOLOGY - Develop study aims and objectives and write methods section. . DATA ANLYSIS AND INTERPRETATION - Complete data analysis on assignment data set and write the Results section of your final report. . WRITE UP AND SUBMIT FINAL REPORT - Complete Discussion, Conclusion and Recommendations sections of the final report.

It is important to note the deadline for the draft submission of the complete final report. If you do not submit a draft by the deadline, 5% of your marks will be deducted. Submitting the draft and revising your assignment in relation to the lecturer’s feedback is likely to increase both your ultimate understanding of the material and your Assignment 2 mark.

At this time you may also want to review Unit 3 Session 7 and Unit 4 Session 5 to see what lies ahead in the remainder of this Assignment. Make a timetable to assure that you will complete your draft in time.

4 CRITICALLY APPRAISE THE LITERATURE ON THE ASSIGNMENT TOPIC

Review Sessions 1 and 2 of Unit 2 and then complete Task 2.

TASK 2 - REVISE THE LITERATURE REVIEW

Revise your literature review based on new knowledge gained in Unit 2.

FEEDBACK

In the last session, you developed the following four sections of your final assignment/report: a) Title b) Introduction SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 175 c) Literature Review d) Reference List

Unit 1 Session 4 and Unit 2 focus on increasing your knowledge and understanding around issues of study validity and bias and their relationship to study design. It was noted in Unit 1 Session 4 that several issues being discussed could be used to critique studies and research reports that you find in the literature. Use this knowledge to expand and improve the literature review section of your report. For example refer again to Unit 1 Session 4, Task 2 and Unit 2 Session 1, Task 3 for relevant questions to apply to each study in your literature review.

5 DEFINING THE AIMS AND OBJECTIVES OF YOUR STUDY

The purpose of any problem solving task is to come up with answers. In the case of this assignment, your particular role as a Public Health official means that you have the responsibility to respond to this health problem.

You are now expected to define an aim and objectives for your research study. This means that you should take the role of the epidemiologist, and against the background of the data set and literature review, identify what you aim to find out or what questions you need to answer. The aim and objectives help to guide your research process. They determine your sample group, the data you will collect and how you will analyse the data. Here is an example.

DEVELOPING AIMS & OBJECTIVES FOR A STUDY TOPIC: PERINATAL MORTALITY IN REGION Y

AIM: To understand the causes of perinatal mortality (deaths of infants around the time of birth) at Hospital Y.

OBJECTIVES: 1) Find data on the birthweight and gestational age distribution of births at Hospital Y. 2) Calculate the rate of perinatal death, low birth weight, preterm birth and small for gestational age at Hospital Y. 3) Analyse the reasons for babies dying, that is, the likely contributors to, or causes of, perinatal death at Hospital Y.

STUDY QUESTIONS: 1) What is the birthweight and gestational age distribution of births at Hospital Y? 2) What is the rate of perinatal death, low birth weight, preterm birth and small for gestational age at Hospital Y? 3) Why are babies dying, that is what are the likely contributors to, or causes SOPH, UWC, Masterof perinatal of Public death Health: at MeasuringHospital Y? Health & Disease II: Intermediate Epidemiology – Unit 3 176 Defining your aims and objectives or study questions is a process of deciding what you need to do and therefore what you need to know in order to respond adequately. Katzenellenbogen et al (1997) have written a useful chapter that addresses this process. Read it and work through the examples in the text on pages 62 - 63. Refer to these examples while you complete Task 3.

READING

Katzenellenbogen, J. M., Joubert, G., Abdool Karim, S. S. (1997). Ch 6 - Setting Objectives for Research. In Epidemiology: A Manual for South Africa. Cape Town: Oxford University Press: 56 - 63.

TASK 3 – CLARIFY THE AIM AND OBJECTIVES OF THE RESEARCH STUDY FOR ASSIGNMENT 2

In this study, your aim is to improve the health of this particular community by whatever means are required. Follow Katzenellenbogen et al’s four steps in order to formulate objectives.

What, therefore is the aim and what are the objectives of this study? Formulate the broad aim of your study as clearly as possible.

FEEDBACK

You should have started with the problem statement you formulated and from that extracted your overall purpose in conducting a research study. Then from there you could have asked questions like: what specific information will I need to reach this aim? These are your objectives, and there may be many of them. Objectives can also be formulated as study questions, as you see in the example above.

Objective: Analyse the reasons for babies dying, that is, the likely contributors to, or causes of, perinatal death at Hospital Y.

Study question: Why are babies dying, that is what are the likely contributors to, or causes of perinatal death at Hospital Y?

You are welcome to check your aims and objectives by e-mailing them to your lecturer. Feedback will also be provided when you submit them as Assignment 2 draft, at which point you will be able to revise and clarify them.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 177 6 LINKING BACK TO THE LITERATURE REVIEW

Remember that your objectives should be borne in mind as your write your literature review. In other words, you will be asking:

What is already known about this in the literature, or other reports?

As we have already said, this is the reason for the Literature Review, to understand what is already known about the health problem with which you are concerned. It may be that the answer to your study question is already in the literature, in which case there would be no reason for you to research it. However, even when answers can be found in the literature, they are not necessarily applicable to the local context in which you work.

The process of conducting your study will be guided by your aims and objectives, so it is important to frame them clearly and simply to reduce confusion as you proceed. Note, however, that your own assignment objectives or study questions may be refined as you get more familiar with the study topic and the data.

7 WRITING A STUDY METHODS SECTION

The methods section of a report describes how the study was conducted. Review the Study Methods for the Assignment found in the Module Information. Also review the Vaughan & Morrow (1989) Chapter 12 from Unit 1 - Session 4 again.

READING

Vaughan, J. P. & Morrow, R. H. (1989). Ch 12 - Communicating Health Information. In Epidemiology for Health Managers. Geneva: WHO Publications: 125 - 129.

TASK 4 - WRITE THE METHODS SECTION OF YOUR REPORT FOR ASSIGNMENT 2

a) What are the components of a methods section for an epidemiologic or other research report? b) Critique the Study Methods for the module assignment as described in the Module Information (section 3). c) Add a discussion of the limitations of the Assignment Research Methods of the Methods section provided.

FEEDBACK SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 178 a) Depending on the nature of the report, there are several possible ways to organise your methods section. Commonly the Methods section should include:

DESCRIPTION OF STUDY METHODS

. Study Design: What type of study was done, for example was it a cohort, case-control, ecologic or randomised controlled study. . Population and Sample: Who is the target population of interest, for example all residents of a district, or all women of reproductive age? How did you sample the population, for example random sample of households, convenience sample from local clinics, etc. . Sample Size: Indicate the number of subjects in the sample. If there is more than one group then give the number in each group. Also present sample size calculations, if applicable. . Data Collection: This section should include: - a description of the study data collection tool, e.g. questionnaire, interview schedule, including its development and validation, and - a detailed description of how the data was collected and the qualifications and training of those who collected the data. . Data Management: Describe how the data was managed, for example was it entered into a data base, and which one? Who entered it and how was the data quality checked and cleaned? . Data Analysis: Describe the data analysis plan, for example are you doing simple descriptive statistics or analytic statistics, and what kind of data do you expect? You will learn more about this in Unit 3. . Study Limitations: What are the expected limitations of your data, for example if women are sampled from a clinic, are there any potential biases in using this sample? . Ethical Considerations: Discuss whether the study was approved by an ethics committee, your informed consent procedures and any risks or benefits for the subjects involved in your study. Also discuss procedures for maintaining study confidentiality.

There may be many more sections and sub-sections in your Methods section; these are just a few common examples. Depending on the design, the methods section can be quite detailed; for example, if you include laboratory tests, you will need to describe those in detail. b) Use the knowledge gained in Unit 2 to critique the Study Methods section provided for the module assignment. c) Note that the Study Methods in the Module Information contains some of the above listed sections except Study Limitations. You should include the Study Methods from the Module Information and then ADD the Study Limitations section based on your critique.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 179 8 SESSION SUMMARY

For this session you completed the next two sections of your final research report for Assignment 2. Save these sections and combine them with other necessary sections which will be completed at the end of Units 3 and 4. After completion of Unit 4 - Session 5, you should submit a full draft report to your lecturer for feedback.

Now take a break before continuing with Unit 3 Session 7, which will the analysis of your data set. You will need to revise Sessions 3 and 4 to undertake this process.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 180 FOCUS ON Unit 3 - Session 7 ASSIGNMENT 2 Assignment 2: Analyse Study Data

Introduction

This study session provides the opportunity for you to analyse the assignment data set, and apply some of the statistical techniques and principles of analysis that you explored in the preceding study sessions. It will enable you to identify the main results and make sense of them for your assignment report.

Session Contents

1 Learning outcomes of this session 2 Readings 3 Analyse the data 4 Interpret the information 5 Session summary

1 LEARNING OUTCOMES OF THIS SESSION

By the end of this study session you should be better able to:

. Analyse and interpret a set of epidemiological data/health information.

2 READINGS

Author/s Publication details

Refer to the readings provided in Study Sessions 1 to 4 of Unit 3.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 181 3 ANALYSE THE DATA

The purpose of this task is to analyse the data in your assignment data set, to describe the data and to test for any associations that may exist between the variables in the data set. You will need to apply some of the statistical tools introduced in earlier study sessions, to the assignment data set. Your ultimate aim is to answer your study questions (objectives). Examine the study objectives you developed in Study Session 6 and develop a strategy to answer these question using the assignment data, then complete Task 1.

TASK 1 – DESCRIBE YOUR ASSIGNMENT DATA USING TABLES, GRAPHS AND DETERMINE ANY STATISTICAL ASSOCIATION BETWEEN THE MAIN VARIABLES

Identify how you would analyse the assignment data to answer your study questions. Arrange the data into a format that enables you to represent the descriptive data and then test for relevant associations and carry out an appropriate analysis.

FEEDBACK

Refer to the previous study sessions in Unit 3 for comment and advice.

4 INTERPRET THE INFORMATION

Clearly the most important part of your job is to make sense out of the information that has been given to you. You have already carried out a simple statistical analysis of the data and identified its main features. The next step is to determine what this information means. What do the results of your analysis tell you about the health problem(s) described in the data provided? This task will take you through the process of data interpretation. As in previous sessions, the trick is to choose the right questions to ask, and search the material data for answers. A better understanding of the health problem can guide your recommendations or decisions on what action to take.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 182 TASK 2 - IDENTIFY THE MAIN FINDINGS AND WRITE A DESCRIPTION OF THE DATA AND FINDINGS Now you need to determine which information is the most important. There is a lot of information but not all of it tells us something that is essential to know. In this task you need to determine the main message or result(s) emerging from your analysis of the material.

FEEDBACK Remember that your main task is to determine how to address the health problem described and answer your study questions (objectives). The main findings are those that are best able to give you an answer to these questions. Note that the main finding could also tell you that you have asked the wrong question, in which case you may need to ask another question. The end result should provide you with information that helps you write the Results section of your final assignment. Remember that you still need to complete Unit 4 and then submit your full draft by the deadline in your time table.

5 SESSION SUMMARY

In the course of this Unit, the study sessions addressed the techniques necessary for the analysis and interpretation of data. This was supplemented with an introduction to EpiInfo2002, a computer software tool with which to assist the analysis of epidemiological data. The study session concluded with an opportunity for you to apply these analytic tools to the assignment data set provided and write up the Results of your analysis. All three sessions completed at the end of this Unit should take you relatively far in compiling your Epidemiological Report. Review your work and plan the way forward to meeting the deadline for drafts of Assignment 2. It is hoped that by the end of this unit you find yourself more confident in your understanding the application of simple statistical and computer skills to the analysis of epidemiology research data. More specifically you should now be in a position to more easily recognise the value of different research study designs used in epidemiology, apply basic statistical tools to the analysis of an epidemiological data set, use EpiInfo2002 and systematically approach the interpretation of a set of health data. The skills in this unit provide a useful basis from which to consider the epidemiological basis for various health intervention strategies and the construction of sound health policy, which are addressed in Unit 4.

SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 183 SOPH, UWC, Master of Public Health: Measuring Health & Disease II: Intermediate Epidemiology – Unit 3 184

Recommended publications