Using EpiData & Epi-Info for Windows

Training for Communicable Disease Control in Local Authorities

Cardiff Council (Strategic Planning & Environment)

March 2007 Acknowledgements

Acknowledgements

© 2007 Cardiff Council (Strategic Planning & Environment). This training guide was produced by Alastair Tomlinson to form part of the Communicable Disease Lead Officer Training Programme, co-ordinated by the Wales Centre for Health.

Please send enquiries relating to this training guide to:

Alastair Tomlinson, Chartered Practitioner Team Leader (Health Improvement) Public Protection Division Room 134 City Hall Cathays Park Cardiff. CF10 3ND. 029 2087 1845 [email protected]

About the software ™ is a public domain software package designed for the global community of practitioners and researchers. It provides for easy form and database construction, data entry, and analysis with epidemiologic , maps, and graphs. Epi Info can be downloaded from http://www.cdc.gov/epiinfo

EpiData Software has developed from securing the principles of Epi Info V6 for DOS to an independent documentation oriented system. EpiData can be downloaded from http://www.epidata.dk

Conventions used in this training guide Text to be entered on screen is shown in this font.

Directions to drop-down menu items are shown in bold type, e.g. File > SaveSave.

i Table of Contents

Table of Contents

Acknowledgements...... i

Table of Contents ...... ii

Aim and Objectives ...... 1

Outbreak Scenario...... 2

Creating a Questionnaire using EpiData ...... 3

Entering Data using EpiData ...... 17

Outbreak Investigation using Epi Info Analysis ...... 19

Using Analysis with routine COSURV data...... 35

Other capabilities of EpiData and Epi Info ...... 39

Appendix I – Comparison of Epi Info & EpiData ...... 44

Appendix II – Contents of course CD-ROM...... 48

Appendix III – Further information & resources...... 49

Appendix IV – Worksheet for 2x2 table results ...... 52

Appendix V – Check code example ...... 53

ii Aim and Objectives

Aim and Objectives

Aim of the training To provide training on the practical use of Epi Info and EpiData in communicable disease control, with particular reference to:

♦ An outbreak situation ♦ Analysis of routine Cosurv surveillance data

Objectives By the end of the training delegates will:

♦ Have an understanding of EpiData and Epi-Info for Windows and their component elements ♦ Be able to use EpiData to design a data entry form for a questionnaire in an outbreak situation ♦ Be able to use EpiData to enter outbreak investigation data into a record suitable for analysis in Epi Info for Windows ♦ Be able to use Analysis to obtain useful statistical and epidemiological information from an EpiData / Epi-Info for Windows database for outbreak investigation purposes ♦ Be able to use Analysis to import routine Cosurv surveillance data into Epi-Info for Windows , and obtain useful statistical and epidemiological information

1 Outbreak Scenario

Outbreak Scenario

On the 17 th August, you receive a telephone call from a gentleman who reports that he and several others who attended a buffet following a funeral were suffering symptoms of food poisoning. The buffet, provided by an external caterer, was held at a local club following the funeral, and mourners arrived at the club at around 3.00 pm on 14 th August. Food left over from the buffet was placed in the main bar areas of the club for club members to consume later that day.

Initial activity involves obtaining of a list of people who attended the funeral and others who may have eaten the food provided for the funeral buffet. A list of food served at the buffet has been obtained from the caterer, and cross-referenced with initial information gathered from cases. Indications are that around 70-80 people attended the funeral, and approximately 40-50 of these people may have experienced symptoms consistent with food poisoning.

Table 111 --- List of foods served at the buffet

sausage rolls chicken rolls salmon sandwiches pickled onions ham sandwiches egg rolls corned beef sandwiches ham rolls egg sandwiches chicken nuggets chicken sandwiches cheese & biscuits crisps gateaux pasties An Outbreak Control Team has been convened, and has decided to undertake a to investigate the outbreak. The OCT assigns you with the following tasks:

♦ Establish the case definition ♦ Develop a structured questionnaire to investigate the outbreak. ♦ Enter questionnaire data into an appropriate computer database ♦ Analyse the data to describe outbreak and identify exposures associated with illness This training uses this scenario to introduce the various functions of EpiData and Epi Info for Windows, and their particular use in outbreak investigation.

2 Creating a Questionnaire using EpiData

Creating a Questionnaire using EpiData

Basic Questionnaire Creation A screenshot of the main EpiData screen is shown.

We want to create a new questionnaire, so select DefineDefine Data > New .QES File (It is also possible to edit an existing questionnaire, by using Define Data > Open .QES FileFileFile).File This creates an empty text file into which we can enter information.

On creation of the file, the following toolbar option also becomes available. Clicking this button brings up the ‘Field pick list’ dialog.

This dialog makes it easy to create different kinds of fields. As an example, we will create one or two of the basic field types in our questionnaire.

3 Creating a Questionnaire using EpiData

First, type an appropriate heading into the first line of your questionnaire, such as “Lead Officer Training March 2007 ”.

Then, on the row below, enter Surname: Leave the cursor flashing after the colon. If the Field pick list is not already showing, click the button to bring it on screen. Select the ‘Text’ tab from the pick list. This then gives a short option list of ‘text’, ‘upper-case text’, and ‘encryption field’. For now we’ll accept the default ‘text’ option. Set the field length to 20, then click the Insert button. EpiData inserts a series of underscore characters after the Surname: label. Underscore characters _ are how EpiData denotes plain text fields. The number of underscores indicates the maximum length of the field.

On the next line, type Forename: Using the field pick list again, insert another text field of 15 characters.

Now let’s try a different field type – dates. On the next line, type Date of birth: Select the ‘Date’ tab from the field pick list. This presents two lists of options – general date fields on the left, and ‘automatic’ dates on the right. General date fields are formatted in three different ways. For most of us in Europe, the

format is most natural, so select that. Click the Insert button, and EpiData inserts the relevant date format field type.

On the next line, type Gender: and insert a single character Uppercase text field. EpiData inserts a code, which denotes an uppercase field one character long. Later, we’ll restrict the entries in this field to either M (male), F (female) or U (unknown). Below this, add the label Occupation: and insert another 20 character text field.

We need to be able to record interviewee address details.

First, let’s create a house number field. On the next line type “ House number: ”, and then select the ‘Numeric’ tab on the field pick list. Select 3 digits before the decimal point, and 0 digits after it, then click the Insert button. EpiData inserts ### after the “House number:” label. # characters are how EpiData denotes numeric fields, and again the number of # characters indicates the maximum size of the number. (Numbers with a decimal point appear as ##.## ).

4 Creating a Questionnaire using EpiData

Add another text field for House name (30 characters), and three more fields for Street name (30 characters), District (20 characters) and Town (20 characters). Then add another label for Postcode: and this time add an ‘Uppercase text’ field of 8 characters. EpiData inserts uppercase fields as with the number of spaces determining the total length of the field.

Finally, let’s add a field for telephone details. Initially it seems like a good idea to create this as a numeric field, but in doing this we wouldn’t be able to record any text details (such as ext. etc), and it’s unlikely we would ever want to order our data by telephone number, so it’s probably easier to simply create a text field of around 15-20 characters. If you prefer you can create two fields, one for home and one for other (e.g. work, mobile).

We’ve now created the fields for the basic contact details of the interviewee. Before proceeding onto further work, let’s save what we’ve done so far. Click the Save button on the toolbar (or select File > SaveSave). Enter an appropriate filename and location in the dialog box, and click Save.

We can also take a sneak preview of how the questionnaire will appear for those entering data. Before doing that, let’s set a couple of options that determine how our fields will be named. Click File > OptionsOptions, and then select the ‘Create data file’ tab.

Generally, the ‘automatic field names’ options is best, since it will try to make sense of the question (i.e. the text immediately to the left of the field), and will ignore common words like ‘who’, ‘did’, ‘or’ etc. Sometimes it may be preferable to select the ‘First word in question’ option. For this exercise, select ‘automatic field names’.

5 Creating a Questionnaire using EpiData

Later in the module we’ll look at how we can specifically tailor the fieldnames that EpiData will generate in the data record files. Fieldnames have a maximum length of 10 characters.

The decision on letter case of field names is mainly one of personal preference – the author’s preference is to use upper-case for field names to make them stand out.

Once you have made your option selections and clicked OK, click the Preview Data Form button (or select Make Data File > PreviewPreview Data FormForm).

A new tab on the main display will appear, showing the questionnaire with data entry fields in the relevant places. You can select File > Print Data FFormormormorm to get an idea of how the questionnaire will appear on paper for completion by interviewers. You can even practice entering data into the form to check that things appear as you expect them to. For now, it’s useful just to see how things are going to be presented. To close the form, select File > Close formform,form or press CTRL F4.

Currently our questionnaire lets us record interviewees’ personal details, but not a lot else. Let’s change that by adding some details specific to the event in our scenario. The first thing to establish is whether the person actually attended the funeral (they may have been exposed to the food under suspicion at the club bar after the event).

This introduces us to another important field type: the Boolean field. This is simply a Yes/No field, but this type of data is often crucial in outbreak investigation, since it allows us to construct 2x2 tables to assess for various exposures.

On a new line at the end of the questionnaire, type Did you attend the funeral? Select the ‘Other’ tab on the field pick list, and select the Boolean (yes/no) option (the Length field on the dialog becomes greyed out as it isn’t relevant). Click the Insert button. EpiData inserts a code, which is how it denotes a Boolean field.

For clarity, let’s also include a question on whether the person attended the members club – we don’t know if people from the members club have been affected, or whether there is crossover with the cohort of funeral attendees, but it may be important to be able to distinguish between them later. On a new line, type Did you attend the members club? and insert another Boolean field.

6 Creating a Questionnaire using EpiData

We’ll also use Boolean fields to record whether or not the person was ill, and what their symptoms were. Add the relevant lines and fields to the questionnaire for the following fields:

♦ Were you ill? ♦ Pyrexia ♦ Diarrhoea ♦ Headache ♦ Vomiting ♦ Other aches ♦ Abdominal pain ♦ Other symptoms (with a separate text field for description) ♦ Nausea

Sometimes people may have described themselves as ill, but do not meet the actual case definition, so include an additional Case definition met? Boolean field as well.

Another key set of data to record for those who have suffered symptoms is their onset date/time, and the duration of symptoms. Go back up the questionnaire, and add a couple of extra lines after Were you ill? but before the list of symptoms. Type Onset date: and then insert a general date field. On the next line type Onset time: and then insert a numeric field with 2 digits before and 2 digits after the decimal point ( ##.## ). EpiData records time-related information in this numeric, with the digits before the point representing hours and those after minutes. The 24- hour clock is used. Then add another field for duration of symptoms – 2 digits in size, intended to be measured in days, and a similar 3 digit field for incubation period, this time intended to be measured in hours.

In a full outbreak we would probably also include further questions about whether the person was hospitalised, whether specimens had been submitted, and so on, together with details of any other household contacts, and maybe other data to indicate severity of symptoms, but for the purposes of this exercise we’ll skip these elements.

The final major part of the questionnaire is the recording of relevant exposures. Comparison of the rates of illness in those exposed and not exposed will enable us to assess which exposures are most likely to be implicated in the outbreak. For the purpose of this exercise, we’ll assume that the OCT has decided to focus attention on the foods consumed at the buffet. In a real life situation, it may be more appropriate to retain an open mind and include other potentially relevant exposures that may explain some or all of the illness.

NB – avoid use of the ampersand & symbol in questionnaires, since it tends to cause unexpected display results.

7 Creating a Questionnaire using EpiData

Add a list of Boolean fields for the relevant food items – the table from the outbreak scenario is reproduced below.

sausage rolls chicken rolls salmon sandwiches pickled onions ham sandwiches egg rolls corned beef sandwiches ham rolls egg sandwiches chicken nuggets chicken sandwiches cheese & biscuits crisps gateaux pasties Again, in a real situation we might add additional information on quantity eaten or portion size to investigate the possibility of a dose-response relationship, but for this exercise we will keep the exposures simple yes/no answers. Finally, add a general comments text field of around 50 characters to capture any other relevant information (e.g. perhaps interview was carried out with parent, relative, interpreter etc).

The only remaining items to add to the questionnaire are some basic administration fields. We need to have some way of identifying each record as unique, and EpiData provides a specific field type for this purpose. It’s quite useful for this to be easily seen, so at the very top of the questionnaire, type Record no.: and select the ‘Other’ tab on the Field pick list. Select the ‘Auto ID number’ field type and click Insert. EpiData inserts an code, which will include an automatically incrementing number for each new record added. It is also useful to record the date of interview and the name of the interviewer (initials usually sufficient), so add appropriate date and uppercase fields for this purpose.

We can also add another date field, the date of entry into EpiData. Again EpiData can automatically insert this for us – select the ‘Date’ tab on the Field pick list and select the code from the right-hand list.

The basic questionnaire is now complete. At the moment it isn’t particularly easy to read and this may make it more difficult for interviewers to complete the questionnaire, and harder for data entry staff to accurately and quickly enter the results. EpiData includes an ‘Align fields’ option to help address this problem. Place the cursor in one of the longer questions/labels, such as Corned beef sandwiches , or Did you attend the members club? Then select Edit > Align FieldsFields. EpiData will realign each line of the questionnaire so that the fields appear in a ‘column’ – making it easier for both interviewers and data entry staff.

Save the questionnaire before we proceed any further.

8 Creating a Questionnaire using EpiData

Advanced questionnaire design In this section we’ll cover some of the techniques and functions provided by EpiData to help save time on data entry, and to ensure that accurate and reliable data is entered.

Closer control over fieldnames To start with, lets look at how our questionnaire looks in data entry mode. Select the Preview Data Form button to display the data form. Use the TTTabTababab key to cycle through the fields in the questionnaire. Note that for each field, the fieldname appears in the status bar at the bottom left of the screen, and next to it information on the type of data that can be entered (e.g. ‘ Alpha: all entries allowed ’, ‘ Date (dmy): 0- 9 and / allowed ’, ‘ Boolean: Y,1,N,0 allowed ’ etc.).

As you cycle through the fields, note the fieldnames that EpiData has automatically assigned to each field. In the majority of cases, they make perfect sense, but there are a few where the fieldname doesn’t intuitively indicate what the contents of the field are. This can be particularly important where data analysis is being undertaken by someone who wasn’t involved in the original drafting of the questionnaire (quite conceivable in a large outbreak with several partner organisations) – the last thing that they need is to be unsure what a relevant item of data actually . Fortunately, EpiData allows questionnaire designers greater control over fieldname selection where necessary.

The default fieldname selected in each case is up to 10 letters long, based on the text that appears immediately to the left of the field but ignoring common words such as ‘did’ or ‘the’. As an example, the fieldname for Did you attend the funeral? is YOUATTENDF , for Were you ill? – WEREYOUILL , and for Pickled onions - PICKLEDONI .

For these and some other fields, we would like to tailor the fieldname to make it a bit more meaningful. The chief way of doing this is by the use of braces { } , also known as curly brackets. When automatically selecting fieldnames, EpiData uses text enclosed in braces in preference to normal text. If the question is “{my} first {field}” then the field name will be MYFIELD. Braces offer a powerful method of defining meaningful field names.

Let’s look at a simple example – the Were you ill? question. Although the fieldname does make sense, it doesn’t really need to be that long – simply the term ILL would be enough. By putting braces around the word ill in the questionnaire, we force EpiData to call the field ILL. Modify the text in the questionnaire so that it looks like this:

Were you {ill}?

Now click Preview Data Form and put the cursor into the Were you ill? field. You can see from the status bar in the bottom corner that this fieldname is now simply ILL . Once you’ve satisfied yourself of this, close the preview (CTRL F4).

9 Creating a Questionnaire using EpiData

This is a fairly simple example, but the EpiData capabilities are more sophisticated than that. EpiData can pull text from more than one set of braces together to create a fieldname. As another example, consider the Did you attend the funeral/members club questions. Presently these have fieldnames of YOUATTENDF and YOUATTENDM respectively – not terribly meaningful. But by changing the text in the questionnaire as follows:

Did you {attend} the {fun}eral? Did you {attend} the members {club}?

… we produce fieldnames of ATTENDFUN and ATTENDCLUB, which are far more intuitive. Check for yourself by clicking Preview Data Form. Notice also that the braces do not appear on the entry form (and won’t appear on a printout either), so it doesn’t affect the ease of use for interviewers and data entry staff.

Go through the table below to update the questions as indicated to generate more meaningful fieldnames:

Question Current fieldname Modification New fieldname Date of birth DATEBIRTH {D}ate {o}f {b]irth DOB Abdominal pain ABDOMINALP {Abdom}inal {pain} ABDOMPAIN Case definition met? CASEDEFINI {Case def}inition {met}? CASEDEFMET Sausage rolls SAUSAGEROL {Saus}age {rolls} SAUSROLLS Pickled onions PICKLEDONI Pickled {onions} ONIONS Chicken nuggets CHICKENNUG Chicken {nuggets} NUGGETS Chicken rolls CHICKENROL {Chick}en {rolls} CHICKROLLS Chicken sandwiches CHICKENSAN {Chick}en {sand}wiches CHICKSAND

Click Preview Data Form to confirm the changes that have been made. Once you’re finished, close the preview and save your modified questionnaire.

Controlling data entry and skipping questions For some fields it can be useful to place restrictions on the range of data that can be entered – for example the Gender field can only have three sensible values (male, female, unknown) and it also makes sense to limit the Onset time field to the valid times represented in the 24 hour clock. There also some fields that can be filled through calculation – for example, age at time of interview, incubation period, perhaps even case definition in some circumstances – which can help with data accuracy and consistency. Finally data entry can be significantly quicker by using ‘skips’ so that the data entry operative doesn’t have to cycle through irrelevant fields (such as symptom fields for an interviewee who wasn’t ill).

These functions are all achieved by what EpiData calls checks . Checks are usually added once a ‘data file’ has been created based on the layout in a questionnaire.

10 Creating a Questionnaire using EpiData

One of the things we’ll do is add a simple calculation to work out a persons age in years at the time of the interview. Before we create the data file, add a new numeric field of 2 digits to hold the calculated age. Place it below the date of birth question.

Now our questionnaire has all the fields we need, so we can create the data file that EpiData will actually store the records in once they are entered. Make sure you have saved the most recent changes to the questionnaire, then click Make Data File > Make Data File to display the following dialog:

The .QES file is the file holding the questionnaire details (.QES is the extension that EpiData uses for all questionnaire files). The currently active file should be displayed in this box. The data file will be created according to the details shown in the lower box, and the default setting is the same name as the questionnaire file, but with a .REC extension, which is the extension used by EpiData for data record files. If the settings in the dialog look appropriate, click OK.

You’ll then be presented with another dialog asking you to give a label to the data file:

Give the data file an appropriate label (e.g. codename or incident number of the outbreak).

You should then get a message saying that the data file has been created. EpiData has also closed the original questionnaire, so we now have a blank screen. Click the Checks button, which will then ask you to open a data file – select the file we just created above.

EpiData will then open up the new data form – unsurprisingly this has a similar appearance to that of the Preview Data Form. However, we are currently in ‘Add/revise checks’ mode, so it isn’t possible to enter data. You’ll also notice that

11 Creating a Questionnaire using EpiData the Check file dialog has appeared. The status bar of the dialog shows the name of the check file that check details will be stored in. This will have the same name as the REC file, but with a .CHK extension.

The dialog shows the current field that checks may be added to. Some of the basic check settings are then shown in the lower part of the dialog. We’ll briefly summarise all these options before looking at some specific examples.

Range, Legal allows you to restrict the range of values that can be entered in a particular field. A range is defined by typing the minimum value and the maximum value separated by a hyphen. Typing 2-5 defines that only the numbers 2,3,4 or 5 can be entered in the current field. If only a maximum value is wanted then use -INF (minus infinity) as the minimum value. If only a minimum value is wanted then use INF (infinity) as the maximum value. Typing -INF-5 defines all numbers less than or equal to 5 as legal entries in the current field. Typing 0-INF defines all positive numbers as legal entries. Legal values are defined by typing all the accepted values separated by spaces or commas. Typing 4,6,8,10 defines that only the numbers 4,6,8 or 10 can be entered in the current field.

Jumps are available to help data entry flow. As an example, if a person has indicated that they suffered illness, data entry is likely to cover the symptoms that they suffered. If they weren’t ill, it makes more sense for data entry to skip past those fields and onto the next section. Jumps are entered by specifying the value, entering a greater-than-sign (>) and specifying the name of the field to jump to. For example, in relation to the ILL field, entering Y>ONSETDATE,N>SAUSROLLS would set up the necessary jumps as suggested in the example. We’ll look at exactly how we set this up a little later (since we might also want to automatically set the Case Definition Met field to No if the person has not been ill). It’s also possible to use AUTOJUMP followed by the fieldname to make the skip take place regardless of the value inserted in the field.

Must enter is quite simple – the Yes/No value defines whether or not a value must be entered for the question. There will be some fields for which this is useful (e.g. basic personal details like name, perhaps date of birth and address as well, together with key data points such as whether they were ill and/or meet the case definition). There may be other fields where being able to leave the field blank is useful – for example if someone does not remember whether or not they ate Crisps, it is better that the field is left blank rather than assuming a ‘No’ answer, which could distort results.

Repeat – if Yes is entered in this rule then the data entered in the previous record will be repeated in the next new record. Repeated data can be changed during data entry. This function can save a lot of typing if your forms contain data that changes

12 Creating a Questionnaire using EpiData only rarely in a particular batch of forms (e.g. reporting forms in a surveillance system). It is probably of less use in an outbreak situation.

Value labels are a set of values combined with text items that explain the meaning of each value. For example, a field is created to enter information on the sex of the informant. It is decided that a value of 1 in the field means that the informant is male and that a value of 2 means the informant is female. If a value label is defined then a ‘translation table’ can be shown during data entry if the user presses [F9] (or the [+] key on the numeric keypad). The value labels in this example would be:

1 Male 2 Female

It is important not to confuse value labels with ranges/legal values – although both place restrictions on the data that can be entered into the field. Decide what you want and select the appropriate option – you may not want to have to go to the trouble of setting value labels if a simple range is all that’s required, and of course in some situations value labels aren’t relevant.

These are the basic checks that can be attached to a field through the check file dialog. In addition, you can click Edit to open the check file editor for the current field and enter check code manually. This is useful for calculating field values based on what has already been entered, and for more complicated checks (which are largely outside the remit of this training but covered in detail in the EpiData help files).

Let’s work through some examples.

Select the ONSETTIME field on the form (or from the dropdown list in the check dialog). Then type 00.00-23.59 into the Range, Legal box on the dialog. This sets the range to those relevant to a time setting. Let’s see how this actually appears in the check code itself – click the Edit button. This brings up another screen showing the actual check code relevant to this field.

We can see that the code starts with the fieldname, and then the code for the range is included. The word END indicates the end of the checks for this field. As we’ll go on to see, one field can have several different types of checks in the code. Click Cancel to close this dialog without making any further changes.

Now select the GENDER field. This time we’ll create Value labels to restrict the options for entry (and to give a guide to data entry staff). Select the Value label dropdown list – you’ll see that there are some predefined value labels, including one for ‘sex’. However, the predefined ‘sex’ labels are based on entering a single

13 Creating a Questionnaire using EpiData digit number, and our gender field is an uppercase text field. So let’s instead create our own value label list. Select the [none] option from the dropdown list, and then click the + button next to the list.

The edit checks screen appears, with the following text showing:

LABEL Label_GENDER

END

We then enter the legal values and relevant labels as follows:

LABEL Label_GENDER M Male F Female U Unknown END

The indenting of the text setting the labels is optional, but makes the code easier to read. Click Accept and Close to close the window. The value label list now shows label_gender . This label list can be re-used for other fields if desired – perhaps not so useful in the case of gender, but if for example you wanted to record details of portion size in relation to each food consumed, you could define one list of value labels (e.g. small, medium, large) and apply that to each portion size field.

There are a few fields that we would like to be entered for every questionnaire – for example, if we do not know if the person was ill or meets the case definition, it is difficult to draw any conclusions from any other information they have given us. So we need to make sure that these fields are set to ‘must enter’. Select the ILL field and change the Must enter option to Yes . Repeat this process for the CASEDEFMET , ATTENDFUN and ATTENDCLUB fields. Initially it can be tempting to set this option for most of the fields, but not all fields will be relevant to all interviewees (e.g. ONSETDATE is only relevant for those who have been ill) and the blank field option (indicating missing or unknown data) can be important in relation to exposures.

So far we’ve covered the use of ranges, value labels and must enter checks. Next, let’s consider the use of jumps. Previously we considered that this might be useful in controlling data entry flow after the Were you ill? question. Select the ILL field. There are two options for the contents of the field after entry – Y or N. (By setting the Must enter property to Yes , the ‘empty’ option is not available). Type the following into the Jumps option:

Y>ONSETDATE

This sets the flow so that the next field selected after a Y is entered will be ONSETDATE .

Now we need to enter the details for the N option – add a comma after the text that is already there, then type:

N>

14 Creating a Questionnaire using EpiData

Instead of typing the name of the appropriate field to jump to, you can also select it on the screen using the mouse – do this now by clicking on the Sausage rolls field. EpiData automatically inserts the relevant fieldname ( SAUSROLLS ) into the Jumps option.

Before we move on, let’s look at how this code looks in the editor. Click the Edit button to bring up the check code editor. We can see that the JUMPS options are laid out line by line, and MUSTENTER follows it, showing how more than one check can be included in relation to one particular field. Even so, all this was created just by using the dialog box.

Now lets look at calculating one field based on the information entered into one or more other fields. As an example, we’ll calculate the age of the respondent at the time of the event, based on the date of the event (a value we will provide in the code). Since this is a calculation that will be run once the Date of Birth details have been entered, we actually need to put the relevant code into that field, so select the DOB field. We can’t create the calculation using the dialog options, so instead click Edit to bring up the code editor.

The first thing to do is tell EpiData that we want the commands to run after data entry into the DOB field has finished. We do this using the AFTER ENTRY … END code block, as follows:

DOB AFTER ENTRY

END END

Now we can enter the actual code to do the calculation, in between AFTER ENTRY and the first END command. First, we need to define the date of the event:

DEFINE dateofevent

dateofevent = “14/08/1998”

This defines a temporary ‘variable’ that holds the date of the event – doing things this way makes the final formula easier to understand. Now we add the actual calculation that assigns the age of the person to the AGE field.

AGE = trunc(int(dateofevent - DOB)/365.25)

This might seem a little complicated, but by taking it apart it is easier to understand:

1. First we take the difference in days between the event date and DOB: dateofevent - DOB

15 Creating a Questionnaire using EpiData

2. We convert that difference (which EpiData is still treating as a ‘date’) into an integer, using the int function: int(dateofevent - DOB)

3. Convert the result in days to number of years, by dividing by 365.25: int(dateofevent - DOB ) / 365.25

4. It’s likely that the result of this calculation isn’t going to be a round number, so we use the trunc function to round the result down to the person’s age in years: trunc(int(dateofevent - DOB) / 365.25)

5. Finally we assign the result of the calculation to the AGE data field: AGE = trunc(int(dateofevent - DOB) / 365.25)

One other thing to do – since we are calculating the AGE field, we don’t need the data entry form to actually include that field, so we can skip it and go straight to the Gender field. We can use the Jumps section of the check dialog to do this, so Accept and Close the code edits that you have made for the calculation and return to the check dialog. Because we want to jump straight to the Gender field regardless of the value entered in the DOB field, we use the AUTOJUMP term, as follows:

AUTOJUMP GENDER

That’s all the changes we need to make, so click Save and then Close on the check dialog.

Hopefully this makes some sense, and you can follow how check code can be used to calculate data for a particular field. If it doesn’t, or seems too complicated, don’t worry too much. Knowing how to use the finer points of calculations and check code is not essential to using EpiData for outbreak investigation – but it does open up some of the power of the program in controlling data entry and consistency, and saving time.

On the other hand, if this has piqued your interest in using check code for running calculations and controlling data entry, much more information on how to do this can be found in the EpiData help files. EpiData follows largely the same check code rules as Epi Info 6 (the DOS version of Epi Info) so if you have access to old check code programs used in Epi Info 6, they may still work in EpiData (perhaps with some minor tweaks).

For now, we’ve done enough to create a questionnaire to investigate this outbreak, with some basic checks and calculations in place to help data entry. In the next section, we’ll look briefly at how we actually enter data into our EpiData data file.

16 Entering Data using EpiData

Entering Data using EpiData

From the main EpiData screen, select the Enter Data button (close any open forms first if necessary). You’ll be asked to select a data file – choose the data (.REC) file that you created earlier.

The data entry form that you are probably familiar with by now should appear. This time you can enter data for real! Note also that the status bar at the bottom of the screen has some additional buttons for navigating around records in the file.

The table below includes fictional data for three sample records – have a go at entering them into the form. It’s good to get into the habit of using the Enter and/or Tab keys to move between the fields, rather than clicking with the mouse – because check code that has been set to run before or after entry of data into a particular field will not be run if the mouse is used.

As you go through the data entry, note how the check code we included earlier is working – inserting calculated results into the AGE field, jumping fields according to the data that has been entered, requiring data to be entered into a particular field. At the end of each record, you’ll be asked if you want to save the record to disk – click Yes.

Field Record 1 Record 2 Record 3 Interview date 20/08/1998 21/08/1998 24/08/1998 Interviewer AGT DJG KJB Surname Jones Dickens Jenkins Forename Stephen Charles Hannah Date of birth 24/10/1943 12/12/1922 13/10/1992 Gender M M F Occupation Teacher Retired Schoolchild House number 24 745 House name Ty Gwyn Street name Gelligaer Street Mill Lane Newport Road District Cathays Lisvane Rumney Town Cardiff Cardiff Cardiff Postcode CF24 4LA Home tel. 029 2067 8765 029 2045 3234 Other tel. 07796 423659 Did you attend the Yes Yes Yes funeral? Did you attend the No No No members club? Were you ill? Yes No Yes Onset date 15/08/1998 16/08/1998 Onset time 13.00 01.00

17 Entering Data using EpiData

Duration (days) 3 4 Inc. period (hrs) 22 34 Diarrhoea Y Y Vomiting Y N Abdominal pain Y Y Nausea Y N Pyrexia N Y Headache N Y Other aches N N Other symptoms Y N Other symptoms Fainted description Case definition met? Y N Y Sausage rolls Y N Y Salmon sandwiches N Y N Pickled onions N Y Y Corned beef sandwiches N N N Chicken nuggets Y N Y Chicken rolls N Y Y Chicken sandwiches N Y N Ham sandwiches Y Y N Ham rolls N N N Egg rolls Y N Y Egg sandwiches Y N Y Pasties Y Y N Crisps N N Y Gateaux N Y Y Cheese & biscuits Y N N Comments Interview with mother

Basic data entry is as simple as that really. The only other thing we’ll dwell on here is navigating between existing records. You can use the additional buttons at the bottom of the status bar, which show the current record, total number of records, as well as having buttons for creating a new record and deleting the current record. When you delete a record, the record is just ‘marked’ as deleted with the word DEL in the status bar (and therefore isn’t included in any future analysis). However you can ‘undelete’ the record by simply clicking the delete button again (or using the option in the Goto menu).

Now that we’ve seen how questionnaires can be designed, and data entered using EpiData, let’s look at how we analyse the data using the data analysis tools in Epi Info for Windows.

18 Outbreak Investigation using Epi Info Analysis

Outbreak Investigation using Epi Info Analysis

For this section of the course, we’ll move to using Epi Info for Windows, and specifically the data analysis elements of the software. 1

Epi Info for Windows is based around the idea on working on projects, which are actually based around the file format. Epi Info for Windows provides a full package for designing questionnaires, entering data and carrying out analysis, but because EpiData’s questionnaire design tools are quicker and easier to use, we made use of that software instead. So we need to import our EpiData data file (which is stored in REC format, the same format used by Epi Info 6) into a format that is usable in Epi Info for Windows.

The first thing we need to do is create an Epi Info project that we can import the data into. We actually do this by starting the process of designing a new questionnaire. Using the shortcut on the desktop, or via the Start menu, open the main Epi Info for Windows menu screen. Click the Make View button to open the Make/Edit View program. The program starts with a blank screen, so create a new project by selecting File > NewNew. The Create or Open Project dialog appears:

Browse to the directory containing your working files and type in an appropriate file name, then click Open (which will create the project file). You’ll then be asked to name the new ‘View’ that you are creating ( view is the term Epi Info uses for a questionnaire design & data table) – as we don’t want to create a new

1 Elements of this section of the training have been taken from “Introduction to Epi Info for Windows” by Andrew G Dean, available at: http://www.epiinformatics.com/Resources.htm

19 Outbreak Investigation using Epi Info Analysis

questionnaire, just click Cancel. Then select File > Exit to return to the main Epi Info menu.

Now we can import our data into the Epi Info Analysis program. To run the Analysis element, click the ANALYZE DATA button on the main menu screen. The Analysis program will then open.

Let’s take a quick tour of what the program shows. All the main analysis commands are shown in the tree view on the left. Clicking on a command will bring up a dialog that places the command in appropriate form in the program editor at the bottom of the screen. Results appear as web pages in the Output window, a simplified version of the Microsoft Internet Explorer browser.

READing data into Analysis For this section of the training we’ll use some real (anonymous) outbreak data that broadly matches the scenario we’ve been working with. Before we do that, though, we’ll import the data we entered into the questionnaire we created earlier, just to confirm that it all worked! The first thing we need to do is READ that data into Analysis. The data is in the REC file format used by Epi Info v6 and EpiData.

Click on the READ command. A dialog box appears so that you can choose a database and a view. Click the button called CHANGE PROJECT and then use the

20 Outbreak Investigation using Epi Info Analysis dialog that pops up to find the project file you created a moment ago. Once you’ve found the file, select it and click the Open button to return to the main READ dialog.

Since we created a brand new project, no ‘views’ or data tables appear in the list. Now we need to import our EpiData REC file. On the Data Formats dropdown list, select Epi6 (which is the same format as EpiData uses). Then, click the button with three dots … next to the Data Source text box (which will now be empty). Select the REC file we created earlier from the dialog, and select Open. You’ll now return to the READ dialog, which should show something like the screenshot below:

Check that you’ve selected the right file, then click OK. Epi Info imports the data table into Analysis. You’ll notice that some text will appear in the Analysis Output window detailing the current “view” (Epi Info’s term for a data table), number of records and the current date and time. You’ll also notice that a command appears in the Program Editor – every time you carry out a task/function the relevant line of code will appear in the editor.

The data has now been imported into the Epi Info project – so if you come back a t afuture time to carry out more analysis, you can select the relevant “view” from within the project, rather than having to import the data again. Of course if the data has been updated or amended then you’ll want to import it again to work with the most recent information.

We’ll carry out a couple of basic tasks with this data before we import some more meaningful records.

21 Outbreak Investigation using Epi Info Analysis

LISTing basic case details A common task in outbreak investigation is producing a simple case listing, including for example name, gender, date of birth, case status, onset date, and so on.

Click on LIST in the command tree. A dialog box will then appear. Initially, let’s go with the default settings and produce a grid showing all the data, so just click OK. A grid then appears over the top of the output window, with scrollbars etc, allowing you to scroll through all the data currently selected. This is a bit overwhelming so we need to change our parameters a little to limit the information that appears.

Click the LIST command again, but this time use the drop-down list at the top of the dialog to select the following variables:

SURNAME FORENAME GENDER AGE CASEDEFMET ONSETDATE ONSETTIME INCUBATION DURATION

Note that the items appear in the dialog in alphabetical order as you select them – however when the list is displayed the items will appear from left to right in the order you actually select them. We can also choose to have the listing formatted in the main output window, rather than the temporary grid we got the first time. To do this, select the ‘Web (HTML)’ option from the Display Mode list on the right hand side of the dialog. When you’ve selected this, some extra options will appear – the one that is most useful is ‘Fields Per Page’. Putting a zero into ‘Fields Per Page’ ensures that all the data for one record appears on the same line (otherwise the table can get split after 6 columns, which can be difficult to read). Click OK to display the list. Epi Info displays ‘Missing’ to represent blank fields (such as ONSETDATE for a person who did not suffer illness).

OK, so we’ve seen that we can import data created in EpiData for analysis in Epi Info. Let’s now import some real outbreak data for some more meaningful analysis. Click the Read command again, and repeat the process we followed before (except that you shouldn’t need to Change Project this time). Select the OutbreakData.rec file. If all has gone well, the output window should show a record count of 75 records.

22 Outbreak Investigation using Epi Info Analysis

For the rest of this section we’ll work with this data to carry out analysis that fits in with our outbreak scenario.

Producing FREQuencies for various items of data Generally it is useful to know what the ‘distribution’ of our study population is – for example by age, location, gender, occupation, illness etc. These sort of descriptive statistics are often the starting point for analysis of a set of data. Epi Info’s Frequencies command allows us to do this. Click the Frequencies command in the command tree to bring up the FREQ dialog. In the dialog box, use the ‘Frequency of’ dropdown list to select the following variables:

ILL SEX PERIOD (meaning incubation period)

Then click OK. After a short wait, this produces a frequency breakdown for each of these data variables in the Output window. Scroll up and down and note that each table is accompanied by yellow bars to the right that indicate the frequencies. Epi Info also estimates 95% confidence intervals for each row in the frequency table, although for most variables these are likely to be of limited use (particularly in outbreak investigation). If you want to turn off these statistics, click the Settings button in the FREQ dialog, and ensure the Statistics option is set to None.

For numeric (i.e. measured on a continuous scale) variables like PERIOD , we might also wish to identify some other statistics, such as and values. Epi Info’s Means command can provide this data. Click the Means command in the command tree to bring up the dialog – select PERIOD from the ‘Means of’ dropdown list, and click OK. This produces a similar table to the one we got with the Frequencies command, but at the bottom additional statistics are provided including number of observations, mean, median, mode, standard deviation, and quartiles.

(As an aside, putting two or more numeric variables into the MEANS dialog means that Epi Info also runs a test to check for the statistical significance between the variables. This isn’t usually of use in an outbreak investigation, where we are more interested in the Relative Risk and/or for various exposures, but can be of value in other types of epidemiological investigation, such as evaluation of a trial or intervention, comparing a particular measurement before and after the intervention).

23 Outbreak Investigation using Epi Info Analysis

SELECTing particular groups of records Sometimes you don’t want to run a particular analysis on the entire dataset – perhaps you only want to know about those people who were ill. In Epi Info Analysis, the SELECT statement limits subsequent analysis to particular records based on criteria that you specify. Click the Select command to bring up the SELECT dialog.

The Available Variables dropdown list includes all the available variables in the dataset. We’ll select only those cases that meet the case definition. Select the CASEDEF variable, then click the = and “Yes” buttons to put together the full statement – note that Epi Info uses the notation (+) to denote Yes (it also uses (-) for No and (.) for Missing (blank) data. Alternatively, you can just type the statement directly into the dialog. It’s possible to construct more complex conditional statements – e.g. requiring more than one condition to be met using AND , or one of a series of conditions to be met using OR .

Click OK when you are finished, and you’ll be returned to the main Analysis screen.

The Output window has updated to show the current statement(s) that are applying, and the number of records now selected (note that this has dropped from 75 to 56). Run a FREQ CASEDEF command to confirm that only cases have been selected.

24 Outbreak Investigation using Epi Info Analysis

If we now run another SELECT command, that will be processed only on the 56 records that we are currently working with (this provides another way for combining several statements to select a particular subgroup of the data). Let’s try this by clicking the SELECT command again. This time, we’ll select all those aged 50 or over. Select AGE from the dropdown list of variables, then either type or use the buttons to produce the statement AGE>=50 . Click OK, and see how the Output window has updated to reflect the second SELECT statement. Run a case LISTing of AGE and CASEDEF to see the selected records.

To get back to the base dataset of all records, simply click the CANCEL SELECT command, and click OK in the resulting dialog.

You’ll notice that there are also options in the same section of the command tree for SORTing data. The dialog is self-explanatory so we won’t dwell on it here – like the SELECT command, the SORT on the data will apply to all future analysis until a CANCEL SORT command is issued. Before moving on, cancel any current SELECT or SORT commands so that you are working with the base dataset again.

Recoding data We saw earlier that, as you might expect, the individuals in our dataset are spread across a wide range of ages. This makes it difficult to get an accurate, easily understood picture from Frequencies or Means results about how the population is distributed. We can overcome this problem by recoding the age data into a series of age groups.

To do this, we first need to define a new variable that will hold the age group information for each case. Click the Define command to bring up the DEFINE dialog.

Defining a Standard variable effectively creates an additional field in our dataset, into which we can then insert data. (Global and Permanent variables are more commonly used in more complex Epi Info Analysis programs – if you want to know more about them, read the information in the Epi Info help file). Define the Variable Name as AGEGROUP , and click OK.

Now that we’ve defined the AGEGROUP variable, we can recode the age data into that variable. Click the Recode command to show the RECODE dialog.

25 Outbreak Investigation using Epi Info Analysis

We want to recode from the base data in the AGE field, so select that in the ‘From’ dropdown list. Our target field for the recoded data is AGEGROUP , so put that into the ‘To’ list.

The dialog provides two ways to recode data: entering the options by hand into the grid shown, or by clicking the Fill Ranges button, which is what we’ll do in this case. You’ll see that the dialog changes to a simpler format, asking for values for Start, End and By. Start and End are self-explanatory – the start and finish points for the recoding. The By values determines the size or interval for each group. Enter the range as follows: Start=10, End=90 , By=10 , then click OK. This returns you to the original dialog, which has now had the grid filled with recoding values based on our selections. Notice that the terms LOVALUE and HIVALUE are used to denote the minimum and maximum values and these rows in the grid are automatically created for us (which is why we set 10 as the start point, not zero). Click OK to process the recode.

Now run a FREQuencies command on the AGEGROUP variable. Select the CASEDEF variable in the ‘Stratify by’ dropdown list to produce two separate tables: one for cases and the other for non-cases. Using the recoded variable gives a much more concise description of the age distribution in the population.

One final note: Epi Info may give an error if you try and recode with more than 10 separate recoding options, so it’s best to limit the range of ‘rows’ in the recoding grid to a maximum of 10 options.

Using the Program editor

Because the AGEGROUP variable we defined is only a ‘standard’ variable, the data will be lost when we exit Epi Info Analysis. Rather than having to go through the whole process again (not too difficult in this case, but potentially irritating if the recoding groups have been more carefully hand-crafted), it would be useful to have a way of saving the commands that we produced so that we can issue them again in future. Fortunately, Epi Info Analysis allows us to do this, using the Program Editor.

You’ve probably already noticed that every time you run a command through a dialog, text is added to the Program Editor window. Let’s look a bit more closely at it. Scroll through the text and you’ll see a range of commands, starting with the READ ing of the original data, the case LIST ing, FREQ uencies and MEANS operations,

26 Outbreak Investigation using Epi Info Analysis

SELECT ing and SORT ing of records. At the bottom of the text will be the most recent commands, including DEFINE AGEGROUP and the RECODE commands.

The Program Editor allows us to save this output into the project as an Epi Info Program. We could save this entire output as a program in Epi Info, and then running it again at another date would process all the commands listed one after another. That’s probably overkill, though, and would take a significant amount of processing time to come up with the results. More usefully, we could save the commands relevant to the recoding process. Go through the text in the editor and delete everything that appears before the DEFINE AGEGROUP statement. Then delete any text after the END statement at the end of the RECODE code block. The Program Editor should now contain text that looks something like this:

DEFINE AGEGROUP RECODE AGE TO AGEGROUP LOVALUE - 10 = "<=10" 10 - 20 = ">10 - 20" 20 - 30 = ">20 - 30" 30 - 40 = ">30 - 40" 40 - 50 = ">40 - 50" 50 - 60 = ">50 - 60" 60 - 70 = ">60 - 70" 70 - 80 = ">70 - 80" 80 - 90 = ">80 - 90" 90 - HIVALUE = ">90" END

Click the Save button in the Program Editor – the Save Program dialog will appear:

The Project File field contains the details of the current project, and can be left as it is. Type a name for your program into the Program field, put your name in the Author field and perhaps include a brief comment as well. The Date fields are automatically filled by Epi Info and are not editable. When you’re happy, click OK to save the program. (Note that the Text File option also exists to save the program to a separate text file – useful if you want to maintain a repository of useful programs to import into any Epi Info project.)

Now let’s see how this has worked. Exit the Analysis program (there’s a button at the top left) and return to the main Epi Info menu. Click the Analyze Data button to reopen a new instance of Analysis. You’ll need to READ in the outbreak data, so do that (changing project if necessary), selecting the viewOutbreakData option.

27 Outbreak Investigation using Epi Info Analysis

Once the data has been read into Analysis, click the Open button on the Program Editor. The Read Program dialog will appear – this looks very similar to the Save Program dialog. Click the Program dropdown list and select the program you stored earlier – the author, date and comment details will then appear. Click OK to read the program into the Program Editor. The program hasn’t yet been run, so click the Run button (note that the Run Command button runs only the command currently containing the cursor). Since all the commands in our program don’t actually produce any output of their own, we need to run FREQ AGEGROUP to see if things have worked. You should get the same age group breakdown that we saw a little earlier.

Accessing previous results & controlling file storage By now, you might be wondering where all this Analysis output is being stored. Epi Info Analysis stores output as HTML files (web pages). By default, these files are stored in the same folder as the project file, but there are several options available to customise this in the ‘Output’ section towards the bottom of the command tree.

Epi Info also provides a handy index of all the output that you’ve produced in work on the project. First, click on CLOSEOUT to close the output file you’ve just been working on, and then click on the hyperlink called RESULTS LIBRARY at the top of the output in the browser (you might need to scroll up). An index page appears, showing previous commands that have produced output files. Click on any of the entries to display it.

There are a wide range of options for customising storage of data, mostly accessed via the Storing Output command. The most useful of these is the ability to set the ‘Results Folder’ where output files are stored – perhaps a new subfolder inside your main project directory. Other settings for archiving data are also available, but get more involved – refer to the Epi Info help file for more details.

Despite all this, there might come a time when you want to define a specific file in which to store a particular set of output. In the next section, we’ll be producing some graphs, so now we’ll create a file specifically to store the graph output in. This is does using the Routeout command, which pops up a simple dialog asking for an output filename.

Click the … button to bring up the file browser dialog – enter a file name (like Graphs) and click the Open button, then click OK in the main ROUTEOUT dialog.

Any future output will be directed to this file, until a CLOSEOUT command is issued (when Epi Info will start issuing output files in the default location again.)

28 Outbreak Investigation using Epi Info Analysis

Producing simple GRAPHs A common task in outbreak investigation is to plot an curve showing the order and frequency of onset date. Epi Info Analysis has a Graph command that helps us do this.

Click this command to bring up the GRAPH dialog.

First of all, let’s run a simple bar chart based on ONSETDATE. At the top left of the dialog is the Graph Type dropdown list. The default setting is BAR, so leave that alone for now. Below that, select ONSETDATE as the Main Variable for the x-axis. For the y-axis, we want the Count of the main variable, so this is OK as it is. Click OK.

This displays the graph in a separate ‘Epi Graph’ window, and we can see that the onset date information is displayed in the graph. This window allows further customisation of the graph if desired – for now we’ll just accept what we’re given and return to Epi Info by selecting File > Save & ExitExit.Exit The graph is then displayed in the output window, and is also saved as a separate JPEG image file in the results output directory.

29 Outbreak Investigation using Epi Info Analysis

Unfortunately, this graph isn’t terribly helpful – because the incubation period is relatively short, separating the cases in hours would be more useful. Fortunately, the outbreak investigators in this case included a field for incubation period in their data collection, measured in hours. Previously we ran a MEANS command on the PERIOD variable, so we know that there is a wide range of values from 5 to 46 hours – and around 30 different values in each case. Producing a Bar chart in this instance may not be terribly helpful, since we’ll get the same wide range of values – because a bar chart will produce a bar for each individual value represented in the data set. But because we’re using a numerical (continuous) field, we can run a Histogram graph, which gives us more control over the output. Click the Graph command again, and this time select HISTOGRAM from the Graph Type dropdown list.

Put PERIOD into the x-axis Main Variable list. Notice also the Interval boxes below. This is where you can set the intervals that the Histogram bars will be grouped in – select an appropriate value like 6 hours. The 1 st Value box can also be set – the default is Auto which will select an appropriate value. However, you can adjust this yourself if you want to force the x-axis to start at 0 or another fixed point – which makes sense in this case, so set the value to zero. Accept the other settings and click OK.

The way that the graph appears suggests a point source outbreak, but there is also a second, smaller ‘peak’ at around 36 hours. How could this be explained?

One possible explanation could be that these are ‘secondary’ cases, who did not attend the buffet, but consumed the food in the club bar later in the evening, or contracted the illness through person-to-person contact with cases. We can run the

30 Outbreak Investigation using Epi Info Analysis graph again, and this time separate the data into two ‘series’, based on whether or not the case attended the funeral itself. Open the GRAPH dialog again, select HISTOGRAM as the graph type, PERIOD as the main variable, and an interval of 6 hours, with a 1 st value of zero. We still want a count of cases on the y-axis, but we also want to display bars for the series – so select the variable ATTEND in the ‘Bar for each value of …” dropdown list. It is possible to set the title, but we can do this at the customisation stage, so we’ll look at that in a moment.

Click OK to produce the graph in Epi Graph. Now we can see that there are two separate sets of bars for those who did and did not attend the funeral. Nevertheless, by customising the graph we can make it easier to see exactly what’s going on. In Epi Graph, select View > Customization …

The Customization dialog allows us to modify many of the factors relating to the design and presentation of the graph. On the General tab, we can add or modify the Title and Subtitle, set font sizes, whether the graph should appear in colour or monochrome, apply gridlines, and determine whether a data table is produced to go with the graph.

Add a title and subtitle for the graph if you wish – you can leave most of the other settings as they are unless you particularly want to add grid lines for the y-axis to make it easier to read across the counts.

Now select the Plot tab in the dialog. The Plot Style list in the centre of this tab gives a range of options for displaying the graph series. Currently the separate bars mean that it is hard to pick out the actual epidemic curves. We can change the display to make it easier to see what is happening – select Area from the Plot Style list (which displays the series as filled areas).

To see what difference a change makes, you can click the Apply button, which applies the customisations to the graph without closing the dialog. It might also help to make the series show in 3D, so select that option from the bottom left. The other tabs can be left as they are

31 Outbreak Investigation using Epi Info Analysis

unless you would like to change the Font, Color or Styles used for the display – these tabs are self-explanatory.

Access to many of these settings can also be obtained by right-clicking on the graph and selecting from the pop-up menu options that appear.

Once you’re happy with the settings, click OK in the dialog to have a proper look at your graph. It’s now easier to see what’s been happening. When you’re finished you can select File > Save & Exit to return to the Analysis window. (You can also select File > Export to export the graph as an image to the clipboard, direct to a printer, or a file location of your choice).

An example of the type of results that are achievable using Epi Graph is shown below (you would probably show the graph in a larger format for a full outbreak report).

Epidemic Curve Incubation period since buffet (hours)

ATTEND=Yes ATTEND=No

10.0

COUNT 5.0

0.0 0 6 12 18 24 30 36 42 48 PERIOD We’ve finished working with the graphs now, so click the Closeout command to stop routing output to the Graphs file we created earlier.

Producing 2x2 tables to assess relevance of exposures The chief purpose of conducting an epidemiological study in outbreak investigation is to try to identify statistically significant associations between certain exposures and illness. This is normally done through the construction of ‘2x2’ tables, comparing the rates of illness in those exposed to a particular risk factor to the rate in those who were not exposed.

Epi Info’s Tables command allows us to produce these tables, together with relevant statistics. Let’s investigate the relevance of exposures in our outbreak example. We’ll start by looking at consumption of ham sandwiches. Click the Tables command to bring up the TABLES dialog.

32 Outbreak Investigation using Epi Info Analysis

Select the variable HAM in the Exposure Variable list. Our outcome variable for all this analysis will be CASEDEF , representing whether or not an individual meets the case definition (the alternative would be ILL , but there may be some people who have reported illness but do not meet the case definition). Accept all the other default settings, and click OK.

The 2x2 table will be displayed in the output window, followed by a range of different statistics. Presentation of the 2x2 table is fairly obvious, but let’s spend a little time considering the various statistics. The statistics produced from the above analysis are shown below.

Single Table Analysis Point 95% Estimate Lower Upper PARAMETERS: Odds-based

Odds Ratio (cross product) 0.6771 0.2368 1.9365 (T) Odds Ratio (MLE) 0.6806 0.2286 1.9691 (M) 0.2038 2.1855 (F) PARAMETERS: Risk-based

Risk Ratio (RR) 0.9065 0.6960 1.1807 (T) Risk Difference (RD%) -7.3257 -26.9187 12.2672 (T)

(T=Taylor series; C=Cornfield; M=Mid-P; F=Fisher Exact)

STATISTICAL TESTS Chi-square 1-tailed p 2-tailed p Chi square – uncorrected 0.5319 0.4658206779 Chi square - Mantel-Haenszel 0.5248 0.4688093873 Chi square - corrected (Yates) 0.2151 0.6428085146 Mid-p exact 0.2409736241 Fisher exact 0.3219542501

In a case-control study, we would be interested in the odds ratio (cross product), and risk-based measures should not be used. In a cohort study, we are able to use risk-based parameters as the total population exposed is a known quantity – so the Risk Ratio (also known as the Relative Risk) is the most useful basic measure of risk. In this case the RR is 0.9065, which suggests very little difference in outcome based on this exposure (strict interpretation: those eating ham sandwiches were

33 Outbreak Investigation using Epi Info Analysis

0.9065 times as likely to be cases as those who did not). Since the 95% Confidence Interval includes the ‘no difference’ value of 1, we know that this difference is not statistically significant at the 95% confidence level.

The statistics provided also include chi-square test results in the form of p-values, achieved by a number of different statistical procedures. In general, where a reasonably large dataset/sample size has been used, there is unlikely to be an important difference between these procedures. However, where the differences between the procedures are important (e.g. one identifies a statistically significant result, but another does not), you should seek the advice of an epidemiologist or statistician to assist with interpretation of the results.

This analysis indicates that consumption of ham sandwiches was not associated with illness in this outbreak. However, other foodstuffs might be implicated in the outbreak, and assessments of the strength of association between consumption and illness for each of the menu items should be completed.

Appendix IV contains a worksheet containing an empty table for you to recorecordrd the results of this analysis. A completed version of thethe worksheet is included oonn the course CDCD----ROM.ROM.

34 Using Analysis with routine COSURV data

Using Analysis with routine COSURV data

So far we’ve concentrated on the outbreak scenario, but there are also times when we want to carry out on routinely collected surveillance data, particularly the information held on Cosurv. Cosurv has the ability to export data into the REC format readable by Epi Info for Windows. This short section looks at how we export that data out of Cosurv, and some common analysis tasks that we might wish to perform.

NB This process will overwrite any existing/previous exports that you have done (i.e. the file EpiXport.REC ––– read on for details) so please make sure that you hahaveve taken a copy of any previous exports if the data is still required.

Exporting data from Cosurv Open Cosurv, and enter your username and password. From the Cosurv main screen, select Export > EpiInfo & ASCIIASCII. The Export dialog will then appear:

Select the dates for the relevant period you want to export to – say, all of 2006 – then click the Get Records button. The list of records to be exported will appear in a ‘spreadsheet’ format in the main part of the dialog. You can use the scrollbars to move around the list and explore the data that will be exported. One word of warning: if a specific record has been selected in that dialog, and then the Export button is clicked, only that record will be exported (but potentially hundreds of times, depending on the size of the dataset). So if you have unwittingly clicked on a specific record, try clicking off the dialog somewhere to deselect it. This may be a bug in Cosurv, or it may be intended behaviour – either way, it probably isn’t what you want to happen.

35 Using Analysis with routine COSURV data

If you’re happy with the list that’s shown, click the Export button. Cosurv will process the export – you may see a DOS program window pop up briefly, as Cosurv actually uses one of the old Epi Info 6 DOS utilities to carry out the export. Once the export has been processed, you’ll be returned to the main Cosurv screen.

The exported data has been placed into an Epi Info REC file named EpiXport.REC (NB the data has also been exported in ASCII delimited text format in a file named csExport.txt – a format suitable for import into spreadsheet software such as Excel). But where is it? The location of this file depends on your current Cosurv settings. You can find these out from the main Cosurv screen by selecting System > Setup > DistrictDistrict, which brings up the District settings dialog.

The final box on the General tab shows the location for Epi Info export files. When you’ve done an export, it’s strongly recommended that you move or rename this file appropriately to stop it being overwritten by future exports.

The data that you’re provided with for this training has been anonymised, with key patient-identifying data (names, address, telephone numbers, employer details, id numbers etc removed). Obviously when working with live data these will be available, and this is something to bear in mind since it is important that data protection and confidentiality guidelines are followed. The Cosurv data system itself is encrypted, i.e. data can only be accessed through the password-protected Cosurv database itself – anyone trying to access the datafiles via other methods will just read encrypted gobbledegook. However, data exported in Epi Info (or ASCII text) format is not encrypted, and can be read by anyone with a text editor or spreadsheet. Therefore, you need to keep careful control over how exported data is stored and manipulated.

36 Using Analysis with routine COSURV data

Importing data into Epi Info Since the data is now in the REC file format, importing the data into an Epi Info project follows the same process as discussed earlier in relation to the outbreak data. In reality, you might want to create a separate new Epi Info project for this routine data, but for the purposes of this training, we’ll use the same project that we’ve been working with so far. So use the READ command in Epi Info Analysis, select Epi6 from the Data Formats dropdown list, and then browse to select the Cosurv2005.REC data file. Click OK to import the data into Epi Info Analysis.

Common analysis tasks for routine surveillance data Let’s briefly look at some of the common analysis tasks that we might want to complete with the Cosurv data. Of course, Cosurv itself has its own reporting mechanisms through the District Reports section, but these apply only to certain predetermined formats – using Epi Info allows us full control over the data and its presentation. The data supplied is the anonymised surveillance reports held by Cardiff Council for the calendar year 2005.

First of all, let’s generate some basic frequencies. Click the Frequencies command and include the following fields in the dialog:

SEX Gender of the case LOCALITY Locality (in Cardiff, we use electoral ward) DISEASE Disease (as used in NOIDS return) TRANSMISS Suspected mode of transmission CONTRACTED Suspected source – home, commercial, unknown

The usual list of frequency tables will appear. You may or may not be surprised by the results!

It might also be interesting to study the distribution of cases by age in the population. Age data is stored in two ways in the Cosurv data system – the DOB field (date of birth), and the combination of the fields AGEY (age in years) and AGEM (months passed of current year) – so a case aged 18 months at the time of the data entry would have AGEY=1, AGEM=6. We saw before that age has such a wide range of values that it is often easier to recode age information into age groups for analysis.

Let us reuse the program we created earlier in the outbreak analysis. Load the program into the Program Editor using the Open button. We need to make one minor modification – the variable we use for age is AGEY, whereas in the program it refers to AGE – make any necessary modifications and then run the program. Now run FREQ AGEGROUP to see the distribution of cases by age group (equally, we could use this variable to construct a bar graph, or even a population pyramid by stratifying by gender).

37 Using Analysis with routine COSURV data

We’ll now focus a bit more on the data that specifically relates to cases of Food Poisoning. To do this we need to select only those cases where DISEASE=”Food Poisoning” . Click the SELECT command and pick the DISEASE variable from the dropdown list, then type =”Food Poisoning” into the dialog after the variable name. (Alternatively, you can use the DISCODE variable and type =22 , which achieves the same thing and is quicker and easier to finish – but of course you have to know that the DISCODE for Food Poisoning is 22).

Now lets look at the various types of organism that have been associated with cases of food poisoning – so run a Frequencies command for the ORGANISM field. You’ll note that organisms such as Campylobacter are basically caught under one field, but because most isolates of Salmonella are sent to HPA Colindale for further typing, the full breakdown of species is available for these cases.

As a final exercise, let’s investigate seasonal variation in notification of food poisoning cases. Select the Graph command, and choose the LINE graph type. Select the WEEK field as the main variable for the x-axis ( WEEK is the NOIDS week during which the case was notified), and leave the other default settings as they are, since what we are looking for is the count of cases in each week. The graph will be produced (and as before, you can continue to tinker with the display settings, titles etc). The line is quite jagged, because there is inevitably some variation from week to week anyway – if you wanted to ‘flatten’ this out a bit, you could produce a similar graph in HISTOGRAM type, with an INTERVAL of either 2 or 4 weeks to aggregate the data a bit closer (fortnightly figures are quite useful).

What factors might explain sosomeme of the variation noted?

This training guide has introduced the basic analysis commands available in Epi Info Analysis, with particular reference to the common epidemiological tasks undertaken in outbreak investigation. Epi Info Analysis is a very powerful tool, and we have only scratched the surface of the various options available, not to mention the advanced statistical analysis that is possible. If you want to know more, a good starting point is the further information and resources detailed in Appendix III.

38 Other capabilities of EpiData and Epi Info

Other capabilities of EpiData and Epi Info

EpiData EpiData includes a number of additional capabilities that have not been covered in this training guide. Some of these functions are discussed briefly below – for more information, consult the excellent documentation in the EpiData help file.

Revise an existing questionnaire/data file In the training, we produced a new questionnaire and REC file from scratch, in one session. In reality, you may well design a questionnaire, pilot it with a few individuals, and then come back and amend the questionnaire based on the outcome of the pilot. EpiData allows you to amend an existing QES and REC file structure. Data already entered will be copied to the new format for fields with the same name. You can add fields, change definition of fields or delete fields. Use the function Revise File found in the Tools menu from the main screen (close all files first). Full instructions on how to do this are in the EpiData help file.

WARNING: If you remove fields or change the names of the fields (e.g. by changing the text to the left of the fields) you will lose data. Please check the new, revised data file carefully. If something goes wrong, the original data file can be restored. The original data file is saved with the name FILENAME.OLD.REC in the same directory as the new data file.

This ability to revise existing structures also means that it is possible to create a ‘template’ questionnaire and data file (with check code if desired) for investigation of outbreaks. Design the basic structure once (much as we did in the training), including patient details, medical information on illness and symptoms, details of contacts, and so on. Create the REC file and any check code that you want to include. When a new outbreak occurs, you can use this template as a basis by copying and pasting all the relevant files into a new folder (so that you don’t overwrite your ‘master template’). Then all you need to do is revise the questionnaire to include the specific exposures that are of interest in the particular situation, update the REC file, and you have a new questionnaire and data file structure ready to start working with. This sort of preparation work can be valuable in saving time in the pressured situation of a major outbreak.

Creating encrypted fields One of the options for text fields in EpiData is the ‘Encrypted’ field (denotes in a QES file by the notation , with the total length including spaces determining the field size). Encrypted fields are a special kind of text fields. The contents of encrypted fields are shown in readable form on the screen, but saved on disk with encryption. Use encrypted fields to store data in a protected mode, e.g. personal information. When a datafile is created and the QES-file contains one or more

39 Other capabilities of EpiData and Epi Info encrypted fields, EpiData will prompt for a password that is saved in the datafile. A datafile containing encrypted fields can only be opened if the proper password is entered. There is no way of breaking the encryption, so do not forget the password or the information in that field is lost.

Data file documentation EpiData can automatically produce a file documenting the structure and checks in place for a particular REC file. Select Document > File Structure and then choose the REC file you want to document in the dialog box. EpiData then produces a text file detailing the file structure, field types and checks in place. This can then be saved (EpiData saves these files with the .NOT extension).

Data entry notes During data entry in EpiData of a questionnaire it is often useful to make notes (e.g. if a difficult to read word is written on a questionnaire). The Data Entry Notes function can be used for making short notes either during data entry or when no data file is open.

During data entry the notes can be accessed by pressing [F8]. If no note file exists for the current data file, a new file will be created. Current time and date are automatically inserted in the notes.

Logical consistency check Consistency checks are a special type of check code – rather than being associated with a particular field, they appear in a separate block (and can even be a separate text file), and are run by selecting Document > ConsistencyConsistency CheckCheck. It provides ’batch’ checking as opposed to the interactive (i.e. as data is entered) checking that is also offered by check commands. A number of consistency checks may be defined in the check file or another text file. The function will show a list of all records in the data file that fail one of the specified consistency checks. Consistency checks are particularly useful where some data has already been entered before interactive checks were added to the data file, and so you want to check the existing data for consistency as well as any future records that may be added.

Double entry and validation To ensure a high quality of data, often it is a good strategy to have two different persons enter the same data. In EpiData this can be done in two different ways: either by entering the same data in two separate data files, which later can be compared or by entering in double entry mode where the new data immediately are compared with the original data. Detailed instructions on these processes are available in the EpiData help file.

40 Other capabilities of EpiData and Epi Info

Data backup and encryption EpiData has a simple data backup option – select ExportExport Data > BackupBackup, choose a data file to backup and select a destination directory. Press OK to begin the backup. This function creates a copy of the selected data file, a .QES file of the same name, a check file of the same name and a data entry notes file of the same name in the selected destination directory.

As an alternative to a plain backup which simply copies the files, you can compact files and complete directory structures with the archive command in the tools menu. This will save files in a Zip formatted file (which can be handled by other programs e.g. WinZip, Windows Explorer on Windows XP machines). As an option these archives can be encrypted using “strong encryption” made with the AES/Rijndael algorithm. There is no way to find or guess a forgotten password, so make sure you keep a copy in a safe place.

EpiData Analysis EpiData also comes with a separate Analysis module, which is based on the Analysis module in Epi Info 6 for DOS. Many of the commands are the same as those used in Epi Info for Windows. In the author’s opinion, the Epi Info Analysis is easier to use and more powerful than EpiData Analysis, which is why that software was used for this training. However, the EpiData Analysis module is a relatively small file, and can be installed on a PC without administrator privileges – it also does not require a particularly high-specification of PC hardware. The final advantage is that both EpiData and EpiData Analysis can be installed and run from a USB stick, since they do not rely on any specific Windows libraries. The EpiData Analysis installation file is included on the course CD-ROM.

Epi Info for Windows In addition to the questionnaire creation, data entry and Analysis modules, Epi Info for Windows includes several other modules and functions that may be of interest. Again, a summary is given below, with further information available in the Epi Info help file (see also the resources listed in Appendix III).

Map creation Clicking Create Maps on the main Epi Info window opens the module. Epi Map is designed to show data from Epi Info files by relating data fields to shapefiles containing the geographic boundaries. Shapefiles are an industry standard for Geographic Information Systems (GIS) used in the popular ArcView and ArcInfo programs. Shapefiles of every country in the world can be downloaded from the Epi Info website under “Maps”, and the Internet provides thousands of such files. Shapefiles can be drawn or edited in either ArcView or Epi Info for emergency or custom purposes. Shapefiles also can contain data on population or other variables, and can therefore provide numeric data that become part of the display either as numerator or denominator. Numeric data can be displayed either as

41 Other capabilities of EpiData and Epi Info color/pattern maps or as dot density maps with the dots randomly distributed within geographic regions.

The maps shown below were created using Epi Map. The first map shows the percentage of births in which the mother was an adolescent in 1995 for each Mexican state, whilst Map 2 shows the distribution of cholera cases (triangles) around the Broad Street pump in Victorian Soho.

42 Other capabilities of EpiData and Epi Info

An excellent tutorial on using Epi Map to present geographical data in a visual format can be found in Andrew Dean’s Introduction to Epi Info for Windows, available at http://www.epiinformatics.com/Resources.htm .

StatCalc – a simple epidemiological calculator StatCalc is an epidemiologic calculator the produces statistics from summary data entered on the screen – particularly useful when you only have the summary data available (and so the conventional tools in Epi Info Analysis are not suitable). Three types of calculations are offered:

♦ Statistics for 2-by-2 and 2-by-n tables ♦ Sample Size Calculations ♦ Chi Square for Trend You can access StatCalc from the main Epi Info menu, by selecting Utilities > StatCalcStatCalc.

Producing reports of analysis results Clicking Create Reports opens the Epi Reports module. The Epi Report tool can be used to design and generate various reports. An end user can include various elements in the reports generated, and these elements can be bound to various data sources. Record lists, cell replacement, groups, fragments from analysis XML can be included in the report. Data analysis through pivot table is also provided. Read the Epi Info help file for more details, and see also the CDC Epi Info Exercises file, which is included on the CD-ROM with the installation files – one of the chapters in this file deals with using Epi Reports.

Encrypting Epi Info Projects Epi Lock provides encryption to control access to data files and compression to facilitate both transmission and making backup copies. Using Epi Lock, files can be encrypted with a password to reduce the risk of others being able to read the data. The encryption mechanism also provides some assurance that the files have not been altered in transmission, since they will not decrypt properly if altered. Epi Lock also performs the function of properly deleting the unencrypted version (“clear text”) version of the file after testing to make sure that the encrypted version can be decrypted to produce the identical result. Access the Epi Lock functions by selecting Utilities > Epi Lock from the main Epi Info menu.

43 Appendix I – Comparison of Epi Info & EpiData

Appendix I – Comparison of Epi Info & EpiData

This article was published in Eurosurveillance Weekly Release surveillance report (vol 12, issue 1, 12 January 2007). The original can be accessed online at:

♦ http://www.eurosurveillance.org/ew/2007/070111.asp#4

A comparison of public domain software for outbreak investigations: Epi Info and EpiData G Kafatos ( [email protected] ), NQ Verlander, D Gelb, E Arnold, A Charlett

Statistics, Modelling and Bioinformatics Department, Health Protection Agency Centre for Infections, London, UK

Since the early 1980s, there has been an increase in the number of outbreaks investigated using analytical since it became widely accepted that microbiological investigation alone is often insufficient [1, 2, 3] . This increase has also been encouraged by a growing availability of statistical tools within public health and the establishment of international and national training programmes and courses (such as the European Programme for Intervention Epidemiology Training). In 1985, a team at the Centers for Disease Control and Prevention (CDC) in the United States created a suite of public domain computer programs called Epi Info (http://www.cdc.gov/epiinfo/ ), which were designed to support field epidemiological investigations of outbreaks.

We compare the two most recent versions of Epi Info, and EpiData, a public domain software based on Epi Info 6.04d.

Epi Info 6.04d Since the release of its first version written for the DOS-environment in 1986, Epi Info quickly became very popular amongst epidemiologists and other public health professionals. Its popularity is demonstrated by the fact that the program is available in 16 different languages. In 2001, the CDC released the final version of Epi Info DOS (Epi Info 6.04d).

Epi Info DOS consists of a suite of programs, which provide all that is necessary to support an outbreak investigation. It is a flexible, versatile tool that permits the construction of a data entry system and an underlying database with great speed, which is something greatly appreciated under the time pressures of an on-going outbreak investigation. In addition, Epi Info DOS contains an analysis module for elementary statistical analysis that is very useful for outbreak investigations where the outcome is often binary (case/control, ill/not ill). The ‘tables’ command provides all the necessary measures of the strength of association, such as odds ratios (OR) and risk ratios (RR), their confidence intervals, and hypothesis tests. It also allows stratified

44 Appendix I – Comparison of Epi Info & EpiData analysis to remove the effects of up to three factors. The ‘csample’ module enables cluster analysis and stratified designed surveys. Epi Info DOS was complemented by a CDC public domain mapping software (Epi Map, CDC), allowing area maps to be produced directly from Epi Info files.

Epi Info for Windows 3.3.2 Despite the unprecedented success of Epi Info DOS, by the late 1990s, the widespread uptake of the operating system for personal computers meant that a Windows version was needed. Epi Info 2000 (Epi Info 3) for Windows was released in 2000. However, there are substantial differences compared with Epi Info DOS.

The analysis capability of Epi Info DOS was carried forward to Epi Info 3 while adding the ability to estimate proportional hazards models and integrating logistic regression and mapping facilities into the package. The software uses components from commercial software, which enhances its compatibility with ‘industry standards’ such as Microsoft Office Access 2003. However, this is also the main disadvantage, since it has resulted in the loss of some functions (such as copying and pasting of fields in questionnaire creation and of check commands between different fields) reducing the speed of usage, a key requisite for outbreak investigations. Moreover, because the principle of simplicity and easy use which underpinned the creation of Epi Info DOS was not carried forward to Epi Info 3, those who have worked with Epi Info DOS will need training to become proficient users of Epi Info 3.

EpiData 3.1 (120306) EpiData was initially released in 2000 by a non-profit organisation in Denmark (EpiData Association, http://www.epidata.dk/ ). The current version 3.1 became available in 2006. This software was developed specifically to translate Epi Info DOS into Windows software by using the Epi Info DOS file structures and command language wherever possible.

Certain restrictions in file size were removed but those dependent upon file structures remain. Construction of a data entry system using EpiData is very simple, especially if the original questionnaire has been constructed in a Windows-based word processing software where it can be either opened as a text file or copied directly into EpiData.

For validating double-entered data (a common practice to ensure high quality data) the ‘Validate Duplicate Files’ option in EpiData has built on the dual entry validation capability within Epi Info DOS. Two additional options to either ‘Ignore text fields’ or ‘Ignore letter-case in text fields’ resolve the problems that occur when comparing text fields (i.e. two identical text fields are considered different by the software if one is written in capital and the other in lower-case letters).

45 Appendix I – Comparison of Epi Info & EpiData

EpiData Analysis (v1.1 Build 68) When EpiData was originally released, one major disadvantage was its lack of analytical statistical tools. This was addressed with the introduction of EpiData Analysis in 2005. This provides the statistical methods available within the analysis module of Epi Info DOS, but has greatly improved data manipulation facilities.

A wide range of statistical tools exists within Epi Data Analysis, including , which provides a t-test for independent groups, chi-square and Fisher’s exact tests of association, nonparametric tests of difference in location between two groups, and estimates of measures of association with confidence intervals. The graphical capability in EpiData Analysis is much better than Epi Info DOS, with more options and flexibility. While the graphs are acceptable for descriptive purposes, there will probably always be limitations in producing publication quality graphs. A new capability within EpiData Analysis is the ability to produce statistical process control (SPC) charts for performance monitoring applicable for surveillance data.

Discussion Two public domain software packages have been developed based on the classic DOS version of Epi Info. There does seem to be a divergence in the development of EpiData (Entry and Analysis) and Epi Info 3, which is surprising given their common ancestor, Epi Info DOS. While EpiData has aimed to keep the functionality and interface of Epi Info DOS available, Epi Info 3 appears to be developing into a more general software to support epidemiologists rather than keeping with the original concept of an efficient and relatively simple software for outbreak investigation.

It is difficult to decide which of the two development directions will best meet the needs of epidemiologists over the next twenty years. It is possible that there is a place for both. The DOS version of Epi Info was extremely well-structured and efficient and, as EpiData has remained faithful to this concept, this could be seen as its natural successor. With the requirement for more sophisticated analytical methods to overcome issues such as confounding, it may be that Epi Info 3 will be better placed. However, it is now moving into a very competitive arena where commercial software has predominated. The authors would recommend EpiData 3.1 and EpiData Analysis v1.1 software packages for those involved in the investigation of infectious disease outbreaks for their speed and simplicity of use in a Windows environment.

Reviewed software: Epi Info, version 6.04d for DOS and version 3.3.2 for Windows

Centers for Disease Control and Prevention (CDC), Atlanta, Georgia 30333. Can be downloaded from www.cdc.gov/epiinfo/Epi6/ei6.htm (DOS) and www.cdc.gov/epiinfo/ (Windows)

EpiData, version 3.1 (120306) and EpiData Analysis, version 1.1 Build 68

46 Appendix I – Comparison of Epi Info & EpiData

The EpiData Association, att. Jens Lauritsen, Enghavevej 34, DK5230 Odense M, Denmark. Can be downloaded from www.epidata.dk/download.php

References: 1. Fonseca MGP & Armenian HK (1991) Use of the Case-Control Method in Outbreak Investigations. American Journal of Epidemiology 133: 748-752

2. O'Brien SJ, Elson , Gillespie IA, Adak GK, & Cowden JM (2002). Surveillance of foodborne outbreaks of infectious intestinal disease in England and Wales 1992-1999: contributing to evidence-based food policy? Public Health 116: 75-80

3. Palmer SR (1989) Epidemiology in search of infectious diseases: methods in outbreak investigation. Journal of Epidemiology and 43: 311-314

47 Appendix II – Contents of course CD-ROM

Appendix II – Contents of course CD-ROM

♦ This training guide (Adobe PDF document) ♦ Examples of files that might be produced by following this training:

◊ AGEGROUPS.PGM – sample analysis program file for recoding age into agegroups

◊ Completed worksheet for 2x2 table results (Adobe PDF document) – see Appendix IV

◊ Lead Officer.CHK – EpiData check code file

◊ Lead Officer.MDB – Epi Info project

◊ Lead Officer.QES – EpiData questionnaire file

◊ Lead Officer.REC – EpiData data file ♦ Installation and support files:

◊ EpiData Setup

◊ EpiData Analysis Setup

◊ Epi Info Setup

◊ CDC Epi Info Exercises (Word document)

◊ OpenEpi project (Zip archive file) – see Appendix III

48 Appendix III – Further information & resources

Appendix III – Further information & resources

EpiData

EpiData homepage http://www.epidata.dk/

EpiData email discussion list http://lists.umanitoba.ca/mailman/listinfo/epidata-list List intended for discussions, suggestions and help among users of EpiData.

EpiData documentation and technical info http://www.epidata.dk/documentation.php Extended manuals, field notes and technical information on use of EpiData, as well as some general guidance on data management.

EpiData Wiki http://www.epidata.org/wiki Common entry for development of field guides, questions and solutions. All users can add their suggestions and knowledge to the site. A couple of useful field guides already exist, including one on the investigation of a food poisoning outbreak.

Epi Info

Centers for Disease Control & Prevention: Epi Info homepage http://www.cdc.gov/epiinfo

CDC Epi Info Tutorials http://www.cdc.gov/epiinfo/tutorials.htm Two tutorials designed and published by CDC – one based on a cholera outbreak, and the other on a community health assessment.

49 Appendix III – Further information & resources

Epi Informatics - Dr Andrew Dean http://www.epiinformatics.com/ Site dedicated to the use of computers in epidemiology and public health. Includes a brief online introduction to Epi Info, and links to download a more complete introductory manual, together with two 15-hour courses (complete with tutor manuals).

Professor Kevin Sullivan http://www.sph.emory.edu/~cdckms/ Includes tutorials on Epi Info for Windows and Epi Info 6 for DOS, as well as general epidemiology documents.

Google http://www.google.co.uk/ The above sites are just a summary – there is a wide range of training materials, courses etc. relating to Epi Info available globally – a simple Google search on ‘ Epi Info training ’, ‘ Epi Info exercises ’, ‘ Epi Info tutorials ’ etc. should give a good range of resources.

Epidemiology and Statistics

CDC General Training Resources http://www.cdc.gov/descd/materials.html#foodborne In addition to the cholera outbreak tutorial referred to above, three other epidemiologic case studies that focus on real-life food borne disease outbreak investigations. Not specifically based on Epi Info.

OpenEpi http://www.openepi.com/ OpenEpi is a project to create open source software for Epidemiologic statistics in JavaScript and HTML. OpenEpi can be run from a web site or downloaded and run without a web connection. The OpenEpi files are also included on the course CD-ROM for installation and use without the need for internet access.

Epidemiology Supercourse http://www.pitt.edu/~super1/index.htm The Supercourse is a global repository of lectures on public health and prevention targeting educators across the world. Supercourse has a network of over 41300 scientists in 171 countries who are sharing a free library of over 3162 lectures in 26 languages.

50 Appendix III – Further information & resources

StatPages http://www.statpages.org/ Index of web pages that together comprise a powerful, conveniently accessible, multi-platform statistical software package. There are also links to online statistics books, tutorials, downloadable software, and related resources.

Rice Virtual Lab in Statistics http://www.onlinestatbook.com/rvls.html Online statistics book with simulations/demonstrations, case studies, and online analysis tools.

Outbreak Investigations – A Perspective http://www.cdc.gov/ncidod/EID/vol4no1/reingold.htm Article from the peer-reviewed journal Emerging Infectious Diseases that provides an excellent overview of the principles of outbreak investigation and the issues involved.

51 Appendix IV – Worksheet for 2x2 table results

Appendix IV – Worksheet for 2x2 table results

Complete this table with the values from the TABLES output in Epi Info Analysis.

Epi Info Chi square Relative RR 95% Confidence Interval Exposure variable 2-tailed Risk (RR) name Lower Upper p-value

Ham sandwiches HAM

Ham rolls HAMROLL

Sausage rolls SAUSROLLS

Pickled onions ONION

Corned beef sandwiches BEEF

Chicken nuggets NUGGETS

Chicken rolls CHICKROLL

Chicken sandwiches CHICKEN

Egg rolls EGGROLL

Egg sandwiches EGG

Salmon sandwiches SALMON

Cheese & biscuits CHEESE

Crisps CRISPS

Gateaux GATEAUX

Pasties PASTIE

Which exposures demonstrate an association with illness?

Which associations aarere statistically significant?

What further analysis might you wish to undertake?

The completed version of this worksheet is available on the course CD-ROM.

52 Appendix V – Check code example

Appendix V – Check code example

Calculating incubation period It would often be useful to be able to calculate the incubation period to assist with producing an epidemic curve. In the examples we followed in the analysis training, this data was given to us – but if this was calculated ‘manually’, it introduces the possibility of human calculation or data entry errors. Far easier to have the period calculated for us at the time of data entry, based on the onset date and time reported. We would also need to set an arbitrary ‘base date’ – which could be the date of an implicated event, or just a specific date at some point before the first case. To keep things simple we track the period from midnight on the ‘base date’

In the following examples it is assumed that the fields ONSETDATE and ONSETTIME refer to the actual date and time reported by the case. The field INCUBATION is a numeric field (say, two or three digits long with no decimal places) that will hold the calculated values (whether in hours or days). You may need to modify the check code examples to reflect the fieldnames that you have used.

Where the illness in question has a relatively long incubation period, likely to be measured in days (as might be the case in e.g. typhoid, Cryptosporidium , Legionella outbreaks), this is a simple matter of deducting the ‘base date’ from the ‘onset date’ to obtain the period in days. The int function is used to convert the answer into a numeric value to match the INCUBATION field). The check code for this would appear in relation to the ONSETDATE field, and would read something like the following:

AFTER ENTRY DEFINE basedate

basedate = “14/08/1998” INCUBATION = int(ONSETDATE – basedate) END

The situation gets a little more complicated where a shorter incubation period is suggested, such as might be the case with many ‘food poisoning’ organisms such as Salmonella spp., Clostridium perfringens , Bacillus cereus etc. Since EpiData stores date and time details in separate fields, a more complex calculation is required. The check code would need to appear in relation to whichever of ONSETDATE and ONSETTIME was last on the questionnaire (and therefore last to have data entered). The calculation would look something like this:

AFTER ENTRY DEFINE basedate

basedate = “14/08/1998” INCUBATION=(int(ONSETDATE-basedate)*24) + trunc(ONSETTIME) END

This code initially looks the same, in that the first part of the main calculation determines the difference in days between the base date and onset date. This value (in days) is then multiplied by 24 to convert it into hours. Finally, we include the

53 Appendix V – Check code example onset time element by simply adding the value in hours on (to keep things simple, we limit the value to hours by truncating the time field to the hour, ignoring the minutes).

Further tinkering with the time element of the calculation allows you to set a base time to go with the base date, which could be of use where a specific event is implicated. This involves defining a ‘base time’ using the 24-hour clock, and deducting that number of hours from the calculation formula above (since that formula is based on time passed since midnight, so any other base time will reduce the incubation period accordingly). The check code would then look like this:

AFTER ENTRY DEFINE basedate

basedate = “14/08/1998” DEFINE basetime ##.## basetime = 15.00 INCUBATION=((int(ONSETDATE-basedate)*24) + trunc(ONSETTIME)) – trunc(basetime) END

54