GH531/EPI539 Computer Lab Module: Session 4 2012

SESSION 4: ANALYSIS BASICS

This lesson returns to the standard Epi-Info interface and is an introduction to the Analysis module of ™. This lesson shows different ways information in your data set can be viewed, including the relationship between variables. Once records are in Epi-Info through the Form Designer, CheckCode, and Enter Data components that you previously used, you can use several commands to select, sort, compare, and compute basic statistics about your data. Data analysis is usually the reason why you do all of the previous steps – it finally lets you look at your results!

In Session 2, you added 2 records to an existing data set about a survey about asthma symptoms in students in New York. In this session, you will use the Analysis module to read in that data and view it. You will also add even more data to this set from a different type of file. So, you will need to return to the .mdb file that you saved at the end of Session 2. Some of you may have had trouble entering the data on the Computer Lab machines, possibly due to security restrictions not allowing the content to be edited. A copy of the project with the 2 additional records has been uploaded to the website for you to use for this module.

There are two components to this session. The first part focuses on how to open your data in the Analysis module as well as how to sort and select data. The tasks may seem similar to the search activities that you did in Session 2, but you will be using specific commands from the Analysis mode. The second part of the lesson centers on creating basic statistics about your data set (e.g. means, frequencies). Although Epi-Info does have a rich set of data analysis tools, we will not be engaging in sophisticated analysis as it is beyond the scope of this lab.

SESSION ACTIVITIES

 Understand and practice opening data sets in Analysis as well as merging files from other sources

 Learn how to use basic commands for viewing, sorting, and selecting your data.

 Create frequencies, crosstabs, and means in a data set using Epi-Info’s basic statistical tools.

 Learn how to save your work in the Analysis mode, including the difference between the instruction code and the output files.

WHAT YOU NEED

 The .mdb file that you used in Session 2 or the file that has been loaded onto the Session 4 webpage. It should be called AsthmaSurvey.mdb. This file is the Access file that you had downloaded from the Course Web site and had added records to during the Enter Data portion of the lab. GH531/EPI539 Computer Lab Module: Session 4 2012

 The Excel file called AsthmaTable.xls that is located on the Course Web site under Session 4. Download and save this file to your computer or flashdrive. We will use it in the second half of the lesson. You cannot complete the assignment without having this file.

NOTE: If you did not save your file from Session 2, you can recreate it. Take the Access Survey Access File from Session 2 (on the Course Web site under Epi-Info) and add on the data from the 2 sample surveys provided in Session 2. These surveys are for the children with last names Evans and Williams. You can find these surveys on the Canvas site under Session 2 (look for the PDF called Sample Surveys).

The Analysis module can be accessed by clicking the Analyze Data button from the Epi Info™ main page. There are two options, Classic and Visual Dashboard. We will be covering how to analyze data in the Classic form, but the same functions work in the Visual Dashboard and can be accessed either by right- clicking on the workspace or the tools on the sidebars. Analysis can read data files created in Form Designer and other types of databases such as DBase, FoxPro, MS Excel, and others. Analysis can produce graphs and interact with to display geographic data. Once you have collected your data, Analysis acts as a statistical toolbox providing you with many ways to transform your data and perform statistical evaluations. Analysis also provides you with ways to export your data to new file formats such as MS Excel.

The Analysis Workspace

The Analysis module contains three working areas: the Analysis Command Tree, the Analysis Output window, and the Program Editor. You will see a picture of this workspace on the next page. Of course, you will also see this if you have opened Epi-Info.

 The Analysis Command Tree contains a list of available commands separated into folders by command type. The Command Tree is used to create the functions and programs that run your statistics from the data. Selecting a command opens the corresponding dialog box for that command, function, or statistic to run. It is the column on the lefthand side of your screen.

 The Output window acts as a browser and displays all the information that is generated from the commands run in the Program Editor. The Output window buttons allow you to navigate through the program scripts that have been run and displayed in the output screen. It is the big gray box on the upper righthand side of your screen.

 The Program Editor displays the dialog/code of the commands created using the Command Tree. Commands can be directly typed into the Program Editor. Programs or .PGM files written in Analysis are stored in the open .MDB. Programs can be saved and run against new data when the data becomes available. It is the white box on the lower righthand side of your screen.

GH531/EPI539 Computer Lab Module: Session 4 2012

1. Before beginning, please be sure that you have downloaded the necessary files (see above under the heading “What You Need”). 2. If you have not opened Epi-Info already, go ahead and boot it up. Click the Analyze Data button from the front page or select Programs>Analyze Data from the menus at the top of page to open the Analysis module.

 The Analysis workspace should open. Scroll down to the next page a picture of this space with explanations of each component.  The top of the page shows that you are in the Analysis module.

This is the Output window.

This is the Command Tree for Analysis. After you run a command, the results appear in this window. Commands are broken into folder groups. Sometimes you might have to look carefully Once you read in a file, the Output window shows and/or scroll down to find the command you your Current View and the Record Count for that need. View.

The Help button appears at the bottom of The results you generate in this space are saved in the window. Output files on your desktop.

This is the Program Editor window. Commands generated from the Command Tree wizards appear in this window. These are the instructions you give Epi-Info on how to analyze the data.

Opening a Data Set in Epi-Info Analysis In order to analyze data, you must Read or Import a project into the Analysis module. The Read command allows you to select a project and data table on which to run your statistics. The READ or IMPORT command (found in the Command Tree Window under the ‘Data’ folder) will be used every time you open the Analysis module in order to begin work. GH531/EPI539 Computer Lab Module: Session 4 2012

From the READ dialog box, you will select a project containing data for analyses. The Current Project is the database file (.mdb) which is the default source of data.

It is recommended that the Current Project and the Data Source (fields in the Read Dialog box) match. Having the same Project and Data Source allows you to keep all of your data and tables together.

 If they do not match, you will need to change your project. The new project you select will then appear as the Current Project and Data Source. (There is no need to do this right now – we are about to practice it.)

 If you Read in a file where the Project and Source do not match, you will be asked to create a linked table. Data sources other than the current project are accessed through link tables located in the current project. Linked files are references to the data tables, and not the actual data, so if you move your files to a new location, the links will be broken.

Let’s open up your previous work and get started.

1. From the Analysis Command Tree, click Read (Import). The READ dialog box opens. You may see a number of projects or views that you do not recognize under Recent Data Sources. That is why we need to change the project.

2. Click on the “…” button next to the Data Source pathway. A Windows Explorer window opens for you to direct the program to where your project is saved.

3. Use the drop-down menu to locate the *.prj file where you have the 12 subjects entered. Alternatively, you can also just select the Access database if you change the Database Type to 2002-2003 file. As you can see if you click on the drop-down menu you can select other types of

4. Click Open (in the READ File Name box). Now, the file you just chose will appear in the Current Project field and the Data Source field inside the READ dialog box.

5. Be sure the Forms is selected in the Show section.

6. Select viewPreInterventionSurvey from the Data Source Explorer.

7. Click OK. The Output window opens.

 The Output window shows your Current View from wherever you have the Access table saved.  You should have a Record Count of 12. (If you don’t, you need to enter the data from the 2 surveys provided in Session 2, for the children called Evans and Williams.)

 Notice the READ command you executed appears in the Program Editor at the bottom of the screen (the code displayed here indicates commands that have already been performed – what you have done so far). GH531/EPI539 Computer Lab Module: Session 4 2012

Navigating Analysis

Below is a list of menus and commands that you will find in Analysis. This information is also available in separate documents on the Canvas site for this session (Session 4).

Read the following sections and familiarize yourself with the Analysis workspace. Don’t try to analyze anything yet (unless you are confident that you can get back to the point you are right now (in which you have not selected any datapoints). Click Cancel or the close X to exit any dialog or command boxes you open.

Menus in the Output and Program Editor Windows

The menus in these two windows may be familiar to you from other software programs that you have used.

Output Window

The Analysis Output window acts as a browser to display all the output information that is generated from the commands created and run in the Program Editor. The Output window buttons allow you to navigate through program scripts that have been run and displayed including Previous, Next, Last, History, Open, Bookmark, Print, and Maximize.

Program Editor Window

File Use the File Menu to create New files, Open, Save, Close, Print, and Exit.

Edit Use the Edit Menu to Copy, Paste, Delete, Find, Replace, or go to the Program Beginning, and Program End.

Fonts Use the Fonts Menu to select Fonts, Style, and Color.

Run Use the Run Menu to Run a Program or Run This Command to run one line of code.

Use the LIST Command The LIST command creates a line listing of the current dataset. Select specific variables from the Variables drop-down to narrow the list or select * to display all the variables. To exclude certain variables from the list, check the All (*) Except box and select from the Variables drop-down menu.

1. From the Command Tree Statistics folder, click Statistics>List. The LIST dialog box opens.

GH531/EPI539 Computer Lab Module: Session 4 2012

2. Click OK to accept the default settings. The variables in your dataset appear in a grid table inside your. Output Window. It looks like it might be an Excel spreadsheet; it’s not!

 Notice the LIST command in the Program Editor.

 Notice also that an Output file is created on your desktop as an .htm file. It is probably labeled EpiHome.htm and appears as an Internet Explorer icon. If you click on it, it will not open up in Epi-Info. Instead, it will open up in your default Internet browser. When you open it, it won’t be the grid that is showing, but the corresponding text that is found in the Output Window of Epi- Info. It will be gray and list the project as well as the number of records in the data set.

Commands

The Command Tree contains a set of commands located in a series of nine folders on the left. Sometimes you may need to look carefully or scroll down to find the command you need. Below are only a few of the commands that on the tree – the ones that are pertinent for the first part of this lesson. You will notice that there are many more. We will use a few more when we start creating statistics, but we will not use all of them. So, they are not all defined here.

READ/IMPORT READ is the most commonly used command. The Read (Import) command changes the current data source and/or the current project. It removes any standard defined variables. The READ command operates on many different types of data. Epi Info™ can Read in 24 different types of files. Located in the Data folder.

SELECT The Select command specifies a condition that must be true for a record to be processed. Use to select a set of records for analyses. For example, select records based on gender or zip code. Located in the Select/If folder.

CANCEL SELECT The Cancel Select command cancels a previous SELECT command. Located in the Select/If folder.

SORT The Sort command specifies the sequence in which records will appear when using the LIST, GRAPH, or WRITE commands. SORT organizes the listed data in an ascending or descending order, based on selected variables. For example, you can sort by last name or age. Located in the Select/If folder.

CANCEL SORT The Cancel Sort command cancels a previous SORT command. Located in the Select/If folder.

LIST The List command creates a listing of the current data table. Lists can be customized to list all, exclude, or show specific records. Located in the Statistics folder.

Create a Web Format List Now you are going to list your records again, but you will change the format of your output. Instead of producing a grid (like an Excel spreadsheet), you will use a format that will resemble ho w you might see the same data set in a table on the Web.

1. Click List. The LIST dialog box opens. GH531/EPI539 Computer Lab Module: Session 4 2012

2. From the Display Mode section of the LIST box, select the radio button Printable/Exportable.

3. Click OK. The List appears in the Output window in a web table format.

 The Web display mode creates an embedded Output file.

 Notice the title bar now contains the name of your default output file, OUT1.htm. You can see a copy of this as a web document where you are saving your files. Use the Sort Command The SORT command allows you to specify the sequence in which records will appear. If more than one variable is selected, records are sorted in order by the first variable. After you sort records, you must use the LIST command to see the results.

1. From the Command Tree Select/If folder, click Sort. The SORT dialog box opens.

2. Double click the variable DOB to place it in the Sort Variables field. Variables can be sorted by ascending or descending order.

3. Click OK. The Record Count stays at 12. Notice the Sort information is listed in your Output window.

4. Use the LIST command to view all the variables. You can view the list in the format you prefer, Grid or HTML. Notice how the list is sorted based on the DOB variable (listing in increasing or decreasing order by date). 5. Click Cancel Sort. The CANCEL SORT dialog box opens.

6. Click OK.

NOTE: Records remain sorted until you close Analysis, Read in a new project, or Cancel Sort.

Create a List of Sorted Variables 1. Click Sort. The SORT dialog box opens.

2. Double click the variable ChildLast to place it in the Sort Variables field.

3. Click OK., then Click List. The LIST dialog box opens.

4. Select Printable/Exportable from the Display Mode.

5. From the Variables drop-down select ChildFirst, then ChildLast, then Asthma

6. Click OK. The selected variables appear in a web table format.

 Notice the list is sorted alphabetically by the Child's Last Name variable.

GH531/EPI539 Computer Lab Module: Session 4 2012

1. Click Cancel Sort. The CANCEL SORT dialog box opens.

2. Click OK. The original dataset order is applied.

Use the Select Command You can use the SELECT command to specify criteria for data to be included in Analysis functions. Think about the various ways you could divide your data into groups using this command. From the current data, you could select based on Age, Gender, Zip Code, or any combination of variables, in addition to illness information.

IMPORTANT NOTE: Once you make a selection, it affects all of the analyses that follow. You must use the Cancel Select command to remove the selection. Selections accumulate until you Cancel Select (visible in the Program Editor Window), close Analysis, or Read in a new project. It is very easy to obtain wrong results when you analyze your data if you are still only looking at a subset of the data. Verify your record count in the Output window before activating any analysis command.

Like SORT, you need to use the LIST Command to see the selected records. In this example, you will select only the male students' data from the survey.

1. From the Command Tree Select/If folder, click Select. The SELECT dialog box opens.

2. From the Available Variables drop-down, select Gender. Gender appears in the Select Criteria field.

3. Click the “=” operator.

4. Click or type quote ".

5. Type the word Male.

 The word Male must be enclosed in quotes because it is a text field. Text variables must appear in quotes.

6. Click or type quote " to close the expression.

7. Click OK. The Output window updates with your new selection criteria.

 The Current View stays the same

 The Select criteria appears as Gender="Male"

 The Record Count changes to 9.

 Note that the Program Editor contains the Select command information that you just told Epi- Info.

GH531/EPI539 Computer Lab Module: Session 4 2012

List the Selected Records 1. Click List. The LIST Dialog box opens.

2. Select Printable/Exportable as the Display Mode.

3. Click OK to view all the variables.  Notice that all the records in the list are for male students.

Select Based on Age Remember the Select command has a cumulative affect on your data. Now select all of the male students who are younger than the age of 9. You have already selected the male students. This selection now selects on the basis of Age.

1. Click Select. The SELECT dialog box opens.

2. From the Available Variables drop-down, select Age.

3. Click or type <.

4. Type the number 9.

 The Select Criteria field should read Age<9.

5. Click OK. The Output window updates.

 Notice that your Record Count changes.

 Notice that your Select criteria reads (Gender = "male") AND (Age < 9). This expression could have been entered into the Select Criteria field at the same time. To do so, you need to enclose each expression (e.g. Gender=”male”) into parentheses and you would need to use one of the Boolean operators offered in the SELECT dialog box, AND or OR.

6. List the records in a Printable.

7. Click Cancel Select.

 Notice the Record Count returns to the original 12.

PRACTICE!

Put the students who have bronchitis in order by age and list your results in a Web format. How many did you find? Who is the oldest?

Hint: Bronchitis=”Yes” is the expression you’ll need to use.

You should get a Record Count of 8. Steve Michaels should be the oldest. GH531/EPI539 Computer Lab Module: Session 4 2012

Do not forget to Cancel Sort and Cancel Select to return to the original 12 records in their original order.

Creating Statistics

Now, you will create statistics using Commands available in the Analysis Module. You will also create results using the Frequencies, Means, and Tables Commands. Also, you will merge the 12 records in your .mdb file with records gathered and recorded in an MS Excel spreadsheet format. (*** Note: I couldn’t get the Merge command to work, but I left the instructions in so you have them as reference for if they do update it in the future to work properly ***). Many times information needed for analyses will come from a variety of sources. In this example, you have student records in an .mdb file that you had created in Epi-Info and in an MS Excel spreadsheet. You will use the MERGE command to place all the student records in one project and then create statistics to see what these records can tell you about asthma among the sampled student body.

Display Files This command allows the display of table, view, and database information. You are going to use it to view information about the variables in your current dataset and to determine which variable to use as the Build Key when you merge your student records with the student records from MS Excel.

Your current .mdb file for the Asthma Survey should still have 12 records.

1. From the Command Tree window, click Display from the Variables folder. The Display dialog box opens. You want to see Variables selected in the “Information for” field and Variables Currently Available selected in the “From” field.

2. Click OK to accept the default settings. The Variables in your project appear in the Output window.

3. Scroll through the variables listed.  Display is an important command to use before importing data or completing analyses on unfamiliar data. If needed, you could use this information to format the cells in your spreadsheet prior to merging or importing files. Display can alert you to potential errors in Analysis, for example, assuring that variables are coded as the same field type.

 To see how the MS Excel file you will be merging into the project is set up, you can open the AsthmaTable file in MS Excel (not from Epi-Info) and view the data through a separate window. Do not make any changes to the file.

a. Notice that the MS Excel and the Epi Info™ project file both contain variables called StudentID. This field corresponds to both sets of data and will be used to join the datasets.

GH531/EPI539 Computer Lab Module: Session 4 2012

b. Notice also that the Excel spreadsheet does not contain all the personal information collected in your Epi Info™ project. Values that do not correspond will appear as missing in Analysis. Be sure to close MS Excel before beginning the merge.

GH531/EPI539 Computer Lab Module: Session 4 2012

Commands The following commands from the Command Tree will be used. All of these are found in the column on the lefthand side of Epi-Info. Again, this list is not comprehensive; instead, only the commands used in this lesson are described.

DISPLAY Use this command to display table, view, and database information. Use the display option Variables Currently Available to see all the variables in the dataset, including names, field types, and formatting. Use Display, prior to merging or creating statistics, to ensure that field types and variable names have been coded as needed. Located in the Variables folder.

SET Use this command to view the current values of Analysis option settings and generate commands to change them. Statistical and graphic viewing options can be selected. Yes, No, and Missing values can be viewed in alternate forms. Allows the inclusion or exclusion of missing records in statistical computations. Located in the Options folder.

MERGE Use this command to merge records in one dataset with those in another, using one or more defined identifiers or build keys to establish the match between records. Merged records can be appended or updated to the new dataset. Merge is only supported when the READ data source is an Epi Info™ or MS Access table. The MERGE data source can be of various types. Located in the Data folder.

FREQUENCIES Use this command to produce a table showing how data is distributed, counts of any given interval, how many records have each value of a selected variable, or the number of occurrences for a specified variable. If more than one variable is given, a separate table is created for each variable. Confidence limits for each proportion are included. Records may be included or excluded from the count by using SELECT statements. Located in the Statistics folder.

MEANS Use this command to compute descriptive statistics for a continuous numeric variable. The mean of a set of data is equal to the sum of the data divided by the number of items in the data set. When used with a cross-tabulation variable, it also computes statistics showing the likelihood that the means of the groups are equal. The mean of a yes/no variable is the proportion of respondents answering yes. Means can be created for continuous or categorical values. Continuous variables fall along a continuum with one variable selected, for example, age. Categorical variables are groupings or categories with two variables selected, for example, gender. If one variable is selected, the program produces a table similar to the Frequencies table plus descriptive statistics. If two variables are selected, the first is a numeric variable containing data to be analyzed and the second is a variable that indicates how the group will be distinguished. The output of this format is a table similar to that produced by Tables, plus descriptive statistics of the numeric variable for each value of the group variable. Means produces various statistical tests with which you may not be familiar such as parametric tests, ANOVA, student’s t- test, Mann-Whitney/Wilcoxon Two Sample Test, etc. Located in the Statistics folder.

TABLES Use this command to create cross-tabulations of categorical variables. Tables can help determine the probability that a risk factor is linked to an outcome. For these values to have their accepted epidemiological meanings, the value representing presence of the exposure (independent value) and outcome conditions (dependent variable) must appear in the first row and column of the table. Epi Info yes/no variables are automatically sorted. Values of the first selected variable will appear across the top of the table, and those of the second selected variable will be on the left margin of the table. Normally cells contain counts of records matching the values in the corresponding marginal labels. For 2x2 tables, the command produces odds ratios and risk ratios. For tables other than 2x2, Chi-Squared statistics are computed. Chi square for trend tests for the presence of a trend in dose response or other case control studies where a series of increasing or decreasing exposures is being studied. The p-value is the probability that the association between two variables may be due to chance. A low p-value of <.05 means the exposure may be closely associated with the outcome. Located in the Statistics folder.

GH531/EPI539 Computer Lab Module: Session 4 2012

Set Options The SET command (in the Options folder within the Analysis Window) provides various options that affect the performance and output of data in Analysis. These settings are used whenever the module runs. You want to be sure that the Statistics setting is Advanced or the Risk and Odds Ratios may not appear in your results.

1. From the Command Tree Options folder, click Set. You may need to scroll down to find this Command. It is the last one listed. The SET dialog box opens.

2. Set your options as follows.

 Yes as Yes; No as No; Missing as Missing; All Show boxes selected; Statistics Advanced; Include Missing not selected; Process Records as Normal (undeleted)

 Most of these options are likely to be the default setting except the Statistics choice. Make sure the Advanced checkbox is selected.

3. Click OK.

Merge Files

*** Note: I could not get this command to work properly in Epi Info 7. I tried the commands in both the Classic and Visual Dashboard and they would not work for me. Hopefully in future releases they will fix this issue. You can try to see if it will work for you on your computer. Just use the AsthmaTable.xls file for the rest of the module. ***

The MERGE command is used to join records from multiple sources. It is particularly useful if you have multiple data collection and entry points that need to be combined into one, central record (e.g. multi- site household or health post survey implemented by a Ministry of Health). There are some important points to remember when using this command, requiring you to plan in advance before you start your data entry.

 Merge is only supported when the READ data source is an Epi Info™ or MS Access table.

 The Merge data source can be of any type, such as MS Excel.

 A merge requires a Build Key that represents a matching variable inside both sets of data. So, you need to use the same data types and field names.

Your current Project in Epi-Info should contain 12 student records. In order to view all the available records for the county, use the MERGE command to join the 12 records you created in GH531/EPI539 Computer Lab Module: Session 4 2012 previous lessons with a larger data table gathered using MS Excel. For this example, you will use the MS Excel spreadsheet called AsthmaTable.xls that you downloaded at the beginning of this lesson.

The MS Excel file used for this lesson has been formatted so that the column headings and data types match the field names you created in Form Designer. You are going to merge the files based on the Build Key StudentID. When you created the View, you made StudentID a Required field. In our example, there are no duplicate values on the field StudentID. Each record will have a unique value.

1. From the Command Tree window Data folder, click Read.

2. Select Data Format as Excel 97-2003.

3. Click on the “…” to find the Excel file AsthmaTable.xls.

4. Click Open and the OK.

5. Under the Data Source Explorer, select SurveyData$.

6. Now click on Merge. Your project should be selected already.

7. Click on Show Views and select PreInterventionSurvey.

8. Click Build Key. The BUILD KEY dialog box opens.

 Keep in mind that you need to choose a variable that can be joined from the Current Table that corresponds to the Related Table. In this example, you need a variable from your Epi-Info .mdb file that matches a variable in your MS Excel table. As noted, it is StudentID in this case.

 NOTE: If you have not entered data into ALL of the records in the StudentID field or whichever field you have chosen as your BuildKey, Epi-Info will reject the merge. So, you need to make sure that you choose a BuildKey variable that you know you have all of the datapoints for; even one missing datapoint will block the merge. Another reason why you need to pay attention to detail when you construct and enter data into your database.

9. From the Available Variables drop-down, select StudentID. The Key Component box should now say StudentID.

10. Click OK. The Current Table field shows StudentID. This designation refers to the existing data table from the Access (.mdb) file.

 Now, you will do the same thing for the table from the new Excel file that you downloaded from the course Web site for this lesson. Epi-Info thinks of this as the Related Table. GH531/EPI539 Computer Lab Module: Session 4 2012

11. From the Available Variables drop-down select, StudentID. The Key Component box should now say StudentID.

12. Click OK. The Related Tables field shows StudentID. The RELATE-BUILD KEY dialog box should now look like this:

13. Click OK to accept the Build Key designations. You return to the MERGE dialog box.

 In the Key field it should read: StudentID::StudentID.

 The Update and the Append checkboxes should be selected.

a. For matching records, Update replaces the value of any field in the READ table whose build key matches the MERGE table.

b. For unmatched records, Append creates a new record in the READ table with values only for those fields that exist in the MERGE table. In this case, the READ table is your Epi-Info .mdb file while the MERGE table is your MS Excel file.

c. For this merge, you want to Update and Append records, meaning that records containing the same StudentIDs will be updated (overwritten) if there is new information, and all other records will be appended (added) to the end of the Epi-Info data table.

14. Click OK from the MERGE dialog box. The FILESPEC dialog box opens. (It may look like it is missing a lot of information. It might be – it’s a very odd sight! But, the important information that you need should be working right now.) Be sure the First Row Contains Field Names checkbox is selected.

15. Click OK. The READ dialog box opens. This is a small box that tells you that you could have a temporary or a permanent link.

 Because you are bringing a new data source into the project, you must decide if you want to create a temporary or permanent link between the two sets of data. For this exercise, you will be working with the 800 records for several exercises so make the link permanent.

16. In the READ dialog box, type the link name AsthmaTable. Be sure to type it as written – no spaces, using capital letters where indicated.

17. Click OK. The Record Count in the Output window should read 800 (eight hundred is the number of rows from the Excel file which have been merged and are overlapping the 12 records in the original .mdb dataset).

18. Use the LIST Command to list the variables in a Grid. Choose to list ALL of the variables. GH531/EPI539 Computer Lab Module: Session 4 2012

 Notice the first 12 records contain information from your survey. The StudentIDs are the same, so the records were updated to reflect any changes. These 12 records are not missing any information in the data fields. After the 12th record row, notice that only the first 5 fields are filled in. Fields in the columns after DOB are set to ‘Missing’ because these fields contained no data in the Excel file.

19. Exit the data table by clicking the X in the upper righthand corner. You will return to the Output window.

 Notice that the Program Editor shows the commands used to create the Merge and the List.

You will now start carrying out tasks that are part of this module’s Assignment. You will learn how to use Epi-Info’s Analysis Commands and then answer the assignment questions while you practice what you have learned.

Open the Assignment on the Canvas site now.

The FREQUENCY Command

***Originally this module had students use the merged dataset for the assignment, but since that function does not seem to be working in Epi Info 7, use only the AsthmaTable.xls file ***

*** You may use the Classic or Visual Dashboard for the quiz – it will give you the same results. The instructions below are for the Classic ***

You are going to use the FREQUENCY command to get a count and view the 95% Confidence Limits on selected variables. The FREQ command produces a frequency table that shows how many records have a value for each variable, the percentage of the total, and a cumulative percentage. The command FREQ * creates a table for each variable in the current view. This command is often used to begin analyses on a new data set. To exclude variables, use the All (*) Except checkbox and select variables to be excluded.

1. From the Command Tree Statistics folder, click Frequencies. The FREQ dialog box opens.

GH531/EPI539 Computer Lab Module: Session 4 2012

2. From the Frequency of drop-down, select SchoolNum.

3. Click OK. The Frequency distribution of SchoolNum appears in the Output window.

 The Frequency column gives you the count of students per school. The Percent column indicates this count (the school’s number of students enrolled) as a percentage of the total number of schools in the district.

 The 95% Confidence Limits listed underneath the table are a range of values for the variable SchoolNum. This range indicates the likely location of the true value of the measure. In this frequency, the Confidence Limits for School A’s student population are 21.6%-27.7% and the actual percentage is 24.5%, which falls within the range.

PRACTICE! (as part of the Assignment)

Type your answers to the questions below into the Assignment window of the Canvas site.

1) Are there more males or females in the survey population? What’s the proportion of survey respondents in the largest gender group?

 Hint: Create a Frequency of Gender to find out.

2) Which condition has the highest frequency and what percent of the survey population suffer from this problem?

 Hint: Create a Frequency of Asthmatic Conditions. Notice you get four tables, one for each condition. (Scroll to visualize all.)

3) Which month has the highest number of students with breathing difficulties?

 Hint: Create a Frequency of WMonth.

Remember to write your answers in the Assignment window of Canvas.

Stratify a Frequency GH531/EPI539 Computer Lab Module: Session 4 2012

Frequency tables can be stratified or grouped by a selected variable. You will use two variables to create the results.

 You want to find out which school had the highest frequency of students with bronchitis.

1. Click Frequencies. The FREQ dialog box opens.

2. From the Frequency of drop-down menu, select Bronchitis.

3. From the Stratify By drop-down menu, select SchoolNum.

4. Click OK. You will get four tables containing the frequency of bronchitis per school.

Did you determine that it was School A? 13.8% (N=27) of the students in School A reported Asthmatic or Wheezy Bronchitis.

PRACTICE! (as part of the Assignment)

4) Which gender has the highest frequency of Asthma? What’s the proportion of survey respondents in the largest gender group?

 Hint: Create a Frequency of Asthma stratified by Gender.

Remember to write your answer in the Assignment window of Canvas.

Select Variables for a Frequency You can use the SELECT Command to specify a subset of students and then create a frequency for only those records.

 You are interested in discovering how many students have Reactive Airway Disease and if they are currently taking medication for the condition. So, you want to find those students answered “Yes” to having Reactive Airway Disease and the percentage of students with RAD who have been prescribed medication.

1. Use the SELECT command to view those who answered “Yes” to having Reactive Airway Disease. Remember the steps you used earlier in this lesson to select specific records. You need to choose your variable in the Available Variables drop-down menu and type in or use the operator buttons. In this case, you would use RAD="Yes".

 Click OK to get your results.

 Notice that your results give you a record count. You should have 93 records.

2. From these records, create a Frequency of students who have been prescribed medication. Use the steps you just learned above for running a frequency. (Hint: Frequency of Medication)

 A frequency table should now appear underneath your results for the records selected. 49.5% of the children should have been prescribed medication. Remember that this percentage is for GH531/EPI539 Computer Lab Module: Session 4 2012

those children who have Reactive Airway Disease since you selected the records for this disease only.

3. Click Cancel Select in the Command window (on the lefthand side) to return to the original 800 records.

PRACTICE! (as part of the Assignment)

5) Which condition has the highest frequency among the students living in the part of town with the largest number of study subjects? Which zip code contains the largest number of study subjects?

 (Hint: There are multiple commands you will use to solve this question. Use Zip and Asthmatic Conditions as your two variables. Note that Zip Code is a text field. Text fields must be surrounded by quotes when used in commands such as Zip=“12345”)

Remember to Cancel Select when you are finished or else your next search will use only the records you selected to answer this question.

Remember to write your answer in the Assignment window of Canvas.

6) Notice how your results differ if you stratify Asthma by Zip. (Note: You’ll need to click on one of the Zip code results in order to see the frequency tables.)

a. Which zip code has the highest percentage of students with Asthma?

b. Which zip code has the highest number of students with Asthma?

7) Briefly explain why there is a difference in your results for #5 and for #6.

Remember to write your answers in the Assignment window of Canvas.

8) Are the children who have all 4 asthmatic conditions (Asthma, Reactive Airway Disease, Bronchitis, and Wheezing) concentrated in a certain area of town or in a particular school? Name the zip code (proxy for area of town) or name the school. Explain your reasoning.

 (Hint: The variables for the 4 conditions are: Asthma, Reactive Airway Disease, Bronchitis, and Wheezing. Use Zip for zip code and SchoolNum for the school name. Again, note that Zip Code is a text field. Text fields must be surrounded by quotes when used in commands such as Zip=“12345”)

 Questions to consider when answering this one:

o How many children suffer from all of the asthmatic conditions?

o Are the conditions evenly distributed among the survey population by zip code or by school? Or are they concentrated in one zip code or one school?

GH531/EPI539 Computer Lab Module: Session 4 2012

o Do you need to use frequency tables to determine your answer?

 Remember to Cancel Select when you are finished or else your next search will use only the records you selected to answer this question. You should return to the original 800 records.

Remember to write your answer in the Assignment window of Canvas.

The TABLES Command

As you may recall from and classes (or similar statistical analysis classes), tables, or crosstabs, are used to show the relationship between two or more categorical values. These tables are different from the frequency tables you saw above. Frequency tables show the distribution of one variable, but contingency tables, or crosstabs, show the distribution of multiple variables at the same time. If you have studied epidemiology or biotstatistics, you should be very familiar with a 2x2 contingency table. This table is what you use to calculate Odds Ratio and values.

In Epi-Info, you use the TABLES command to create contingency tables, or crosstabs. A 2x2 table is one type of crosstab.

Create a Contingency Table or Crosstab You want to further explore the idea that there may be a connection between school location and asthmatic conditions. You are also interested in seeing what kinds of results you can get with the 800 student records available to you as a primary data source.

 Your first question to explore is: Does one school have a higher number of students with asthma?

1. From the Command Tree Statistics folder, click Tables. The Tables dialog box opens.

2. From the Exposure Variable drop-down, select SchoolNum.

3. From the Outcome Variable drop-down, select Asthma.

4. Click OK. Results should appear in the Output window.

 In the table created, which school has the highest numbers of students with Asthma? You should identify School A. 29 students answered “Yes” representing 14.8% of the School A student body. This count (number of students) and this proportion (percentage) is higher than the other schools in the survey.

Your next question to explore is: Which school has the highest number of students with Wheezing?

1. Follow the same procedures as above, but select Wheezing as the Outcome Variable.

Again, your answer should be School A. Did you find that 30 students from School A, or 15.3% of the student body, suffer from Wheezing?

PRACTICE! (as part of the Assignment)

GH531/EPI539 Computer Lab Module: Session 4 2012

9) Describe the relationship between the child’s age and the number of days missed from sports. Discuss limitations from the data.

 (Hint: Use Age and MissSport as your two variables.)

Remember to write your answer in the Assignment window of Canvas.

Create a 2x2 Table Two by two tables are a common epidemiological tool to determine if there is a correlation between two variables. They allow you to calculate odds ratios and risk ratios, measures of the strength of this relationship. These ratios help you understand what impact your intervention or exposure amount had on the outcomes in your data. You encountered these in Session 3. The TABLES feature in Epi-Info provides these ratios as well as the results of a Chi-Square test, another common biostatistics tool for analyzing data. A Chi-Square test is used to determine how likely it is that an observed association between an exposure and a disease could have occurred due to chance alone, if the exposure was not actually related to the disease.

You want to determine if there a correlation between the children who have asthma and those with wheezing.

1. Click Tables. The TABLES dialog box opens.

2. Select the Exposure Variable of Asthma.

3. Select the Outcome Variable of Wheezing.

4. Click OK.

 Notice the graphic representation of your results.

 A 2x2 table was created because each variable had a yes or no answer.

 The square with the largest value represents the children who answered no to both wheezing and asthma.

How many students had both Asthma and Wheezing? You should find 63 students which is the upper leftthand box. The results in this box indicate those who answered “Yes” to Asthma and “Yes” to Wheezing.

5. Scroll down to view the Single Table Analysis.

You will see a number of interesting statistics in this box.:

 Odds Ratio (OR) is given, including the 95% C.I. for this OR. Odds Ratio (MLE) is also provided which indicates the OR Maximum Likelihood Estimate, a manner of calculating the Odds Ratio using Maximum Likelihood Estimation, a statistical process that is not within the scope of this class. For simplicity’s sake, we will use the cross-product Odds Ratio in this tutorial.

 Relative Risk or Risk Ratio (RR) is given, along with the 95% C.I. for this RR. The Risk Difference is also provided. Risk Difference is your absolute difference in risk between your two groups, GH531/EPI539 Computer Lab Module: Session 4 2012

experimental and control. Mathematically, it is the experimental groups’ RR – the control groups’ RR. It will show the absolute change in risk that is due to the intervention.

 The first row shown in the 2 x 2 table (in this case, those who have asthma) is the referant population for the calculations for OR and RR.

 The various statistical tests should be familiar to those of you who have completed a basic biostatistics or social science statistics class. These are tests that determine how significant your results were -- how likely it is that your results could have occurred due to chance alone, if your intervention is not related to the outcome. These tests are not within the scope of this class. Consult your friendly biostatistician, statistics text, or the Web for more information.

Now, based on the results you generated in the box above, what are the odds that you have the condition Wheezing if you have the condition Asthma? There is a 42% chance that you have Wheezing if you have Asthma according to the odds ratio.

PRACTICE! (as part of the Assignment)

10) You want to determine if there is a correlation between gender and various asthmatic conditions. List the odds and risk ratios for:

a. Asthma

b. Bronchitis

c. Reactive Airway Disease

d. Wheezing

NOTE: Round off to two decimal points. (e.g 42.0625 should be written as 42.06)

e. Discuss the relationship between gender and the different conditions of asthma.

Remember to write your answer in the Assignment window of Canvas.

The MEANS Command

The MEANS command can be used to get an average for a continuous variable. A continuous variable is one with a range of results (e.g. age, income, # of days on treatment, etc.). The opposite of a continuous variable is a dichotomous variable which has only two results (e.g. yes/no, male/female, use condoms/not use condoms). The mean of a yes/no variable is the proportion of respondents answering yes.

When you use the MEANS command, your output will also include frequency results for the variable. The MEANS command has two formats.

 If only one variable is supplied, the program produces a table like that produced by FREQUENCIES, plus descriptive statistics. GH531/EPI539 Computer Lab Module: Session 4 2012

 If two variables are supplied, the first is a numeric variable containing data to be analyzed and the second is a variable that indicates how groups will be distinguished. The output of this format is a table like that produced by TABLES, plus descriptive statistics of the numeric variable for each value of the group variable.

Create a Mean You want to find the simple means of the number of days children missed school due to asthma or an asthma related condition.

1. From the Command Tree Statistics folder, click Means. The MEANS dialog box opens.

2. From the Means Of drop-down menu, select the variable MissDays.

3. Click OK. The Output window populates with the results. The table results show a breakdown of missed days aggregated per answer. A picture of the results through 9 missed days is below. Your screen should show the results through 15 missed days.

What was the most frequent number of missed days? It should be 1 day. 241, or 30.1%, of the respondents missed 1 day of school.

NOTE: If your continuous variable has a large range of options, it might be difficult to determine the value with the largest number of observations (aka the Mode). (Above, there are only 9 rows above so it is easy to scan the results.) In this case, you can look at the Statistics output underneath the table above.

4. Scroll down to see the Summary of Statistics, including the results for the Mean.

You will see a number of statistics listed in this box:

 Obs – Number of unique data points

 Total – Sum of all values in data

 Mean – Average of datapoint values

 Variance – A measure of the range of observations in your data, how it is spread out.

 Standard deviation – A statistical summary of how dispersed the values of a variable are around its mean.

 Minimum – Lowest value in data set

 Total – Sum of all values in data

 25% – Point denoting that 25% of the data points fall on or below this value

 Median – Halfway point in range of data. (Point denoting that 50% of the data points fall on or below this value.)

 75% – Point denoting that 75% of the data points fall on or below this value.

 Maximum – Highest value in data set GH531/EPI539 Computer Lab Module: Session 4 2012

 Mode – The value that is most frequent in the data set (largest number of observations)

What is the Mean (average) number of days missed due to asthmatic conditions? You should find 3.77 as the mean. Notice that you have rounded off to two decimal points.

PRACTICE! (as part of the Assignment)

11) What is the average age of students in the study?

12) Why do we question these results?

Remember to write your answer in the Assignment window of Canvas.

Fixing a Variable in the Data Set before Analysis You may have noticed that the Age field for the imported data from the Excel file has not been populated. The variable Age did not exist in the MS Excel data that you merged. (Open up the Excel file directly to look at this if you’d like.) But, you realize that there might be a way to correct this missing data since you do have data about the subjects’ Date of Birth. You can compute the age by using the YEARS function as you did in the Check Code lesson. So, you will write a new rule for the data to improve the analysis results.

You will not need to exit the Analysis window of Epi-Info and go to MakeView to write this rule, however. Instead, you will use the ASSIGN command to assign a code to populate the Age field. Then, you will analyze it and find the average age of students in the dataset.

From the Command Tree Variables folder, click Assign. The ASSIGN dialog box opens.

From the Assign Variables drop-down, select Age.

In the =Expression field type your code. This is the same code you used to create the age field Check Code in Form Designer.

 Use the YEARS function.

 The two inputs for this function will be DOB and Survey Date.

 Age= YEARS(DOB,10/25/2005)

 Make sure you don’t type extra spaces in.

Click OK. The code appears in the Program Editor.

List your data in a grid so that you can see the how the Age field is now populated with more data points. Use the LIST command to do so.

 All the once missing fields should now contain ages.

Create a Means of Age. GH531/EPI539 Computer Lab Module: Session 4 2012

PRACTICE! (as part of the Assignment)

13) With the new data points filled in, what is the average age of students in the study?

Use the MEANS Command with a Cross-Tabulated Variable Use a cross tabulation variable to show the distribution of means on one variable for various crosstab values. You can also compute statistics showing the likelihood that the means of the groups are equal.

 You are interested in finding out which school had the highest average of students who missed days of school due to asthma.

To determine this, you need to compute the means of missed days by school.

1. Click Means. The MEANS dialog box opens.

2. Select the Means Of variable MissDays.

3. Select the Cross-Tabulate by Value of variable SchoolNum.

4. Click OK. Results appear in the Output window.

 You should get a very large results table. Each row denotes the number of days missed, starting at 1 day and ending at 15 days. Each column denotes each school in your study, A through D.

 You can use this table to find your answer, or you can use the Descriptive Statistics table.

5. Scroll to the Descriptive Statistics. Notice the Mean is computed by School.

Which school had the highest average of missed days? You should find school A with the highest average of 4.16 days missed.

PRACTICE! (as part of the Assignment)

14) Which grade had the highest average of missed days? What is the average number of days missed by students with asthmatic conditions in this grade?

15) Who has a higher average of missed days due to asthmatic conditions, boys or girls?

Create a Stratified Means You can use the MEANS command to show a distribution of means using two variables (crosstab) that is stratified by still another variable. Use Stratify By to group your data by a variable.

 You are interested in finding out whether the school with the highest average of missed days for males is the same as the school with the highest average of missed days for females.

1. Click Means. The MEANS dialog box opens.

2. Select the Means Of variable as MissDays. GH531/EPI539 Computer Lab Module: Session 4 2012

3. Select the Cross-Tabulate by Value of variable SchoolNum.

4. Select the Stratify By variable Gender.

5. Click OK. Results appear in the Output window.

 Two tables and two sets of descriptive statistics are created, one for female and one for male.

6. Scroll through the results.

 Is it the same school for boys as girls? You should find that it is not the same school for boys as it is for girls. For boys, School A has the highest average of missed days (4.37 days) while School B has the highest average of missed days for girls (3.94 days).

PRACTICE! (as part of the Assignment)

16) Here’s one that will force you to search a bit in the results. (The data set is somewhat limiting for asking questions using a stratified means!)

 Is the area of town (using Zip as your proxy for area of town) with the highest number of missed school days due to asthmatic conditions the same area for boys as girls?

Use Select with the Means Command You can use the SELECT Command to create subsets of data and then use the MEANS command to create statistics.

For this example, you want to find out if the average number of missed days is higher among those who answered Yes to Bronchitis than the average number of missed days overall.

 To answer this, you need to determine two means. The latter has already been done (see the example under Create a Mean above). You need to compare this result with the mean for those with bronchitis.

1) Click Select. The SELECT dialog box opens. Use the SELECT Command to display only those students who answered Yes to Bronchitis. See if you remember how to use the SELECT Command before looking at the instructions below.

 You should have 86 records. Go back to p. 9 for refreshing how to use the Select Command.

2) Create a Means of Missdays.

 What is the average number of missed days by students who have bronchitis? You should have found 10.63 days as the mean.

GH531/EPI539 Computer Lab Module: Session 4 2012

 How does this mean compare with the overall average? Which group – the entire study population or those answering “Yes” to bronchitis – have missed a higher number of days on average?

3) Return to the original 800 records. In other words, cancel the selection you just made. Again, if you’re stuck, go back to p. 9 for refreshing how to use the Select Command.

PRACTICE! (as part of the Assignment)

17) In New York where this asthma study took place, you think that ozone and other air pollutants may be higher in the late summer/early fall period. So, you wonder if students reporting worst months in the late summer/early fall actually miss more days of school. Which group of students missed a higher number of days on average – students reporting their worst month for breathing problems from August to November, or people reporting from March to June?

Hints:

 To answer this, you’ll need to make two selections, one for the late summer/early fall group and another for the spring group.

 WMonth is the variable for the worst month for breathing problems.

 The months are listed in the data according to numbers; thus, January = 1.

 You will need to decide whether you use the operators AND or OR to help make your selection. AND means that your subject will need to meet both conditions. OR means that your subject can meet one of the conditions.

 You need to use parentheses to group each variable in the SELECT dialog box; thus, (Asthma=”Yes”) AND (RAD=”No”) OR (Wheezing=”Yes”)

Saving Your Work in the Analysis Module

You have the option of saving two types of files from the work that you have done in the Analysis Module: the computer code that you used to generate your results, located in the Program Editor window at the bottom of your screen; and the results you generated in the gray box on your screen.

Saving the Instructions (computer code) You Told Epi-Info To save your analysis results in statistical programs, you often save the computer code that you used to generate these results. Epi-Info operates in the same manner. Saving your work like this (instead of simply saving the output themselves) helps you recreate the same results in the future without having to guess or remember what you did. By doing so, you can run these same statistics on the new records. Looking at the code may also help you check for errors in your analysis; perhaps you gave the computer the wrong instructions. While we would often like to blame the computer, it is often user error – forgetting the smallest detail – that causes problems in the end.

So, you will now learn how to save all of the computer code that is listed in the Program Editor at the bottom of your screen. Go ahead and scroll up and down it to see all of the code. You may see where you made mistakes in entering a command or a function. GH531/EPI539 Computer Lab Module: Session 4 2012

Epi-Info Analysis automatically saves the last program created in the program editor in a file called LastPGM. This file is created and saved in the Epi_Info folder (which may be located by default on your C: drive (your local drive). See C:\Epi-Info on the UW computers). This file will get overwritten each time a new PGM is created and the application is closed. So, you want to create your own program file based on your activities with the date today.

1. From the Program Editor at the bottom of your screen, click File>Save. The Save Program dialog box opens.

 The Project File field contains the information about where you will be saving your file.

 You want to leave this information set to your flashdrive or to your computer desktop (wherever you are storing files and information for this class). Note that it will use your account folder on the C: drive as the default if you are using the UW computers. So, if you are trying to save it to your flashdrive, you will need to manually type in the pathway. You could also search for it later and save it to your flashdrive. NOTE: It can be tricky to search for.

2. In the Program Field, type Session 4.

3. In the Author field, type your initials.

4. In Comments, type My First Epi-InfoAnalysis Code

5. Click OK.

 The date created and date updated fields will be populated with new information each time you open the program.

6. Click Exit in the Command Tree to close Analysis.

Test the Saved Program 1. Open Analysis from the main page of EPI Info.

2. From the Program Editor at the bottom of the screen, click File>Open or click the Open button. The READ dialog box opens.

3. From the Program drop-down, select Session4 (the project file should not have changed).

4. Click OK. Your saved program appears in the Program Editor.

5. Select Edit>Program Beginning from the Program Editor menus.

6. Click Run This Command. The first command inside your program runs. The first command was Read, so your 800 records appear. If you had updated records, the new information would be reflected in the results.

 At any time during an Epi-Info Analysis session, you can run any previous commands by placing your cursor next to the command listed in the Program Editor Window and clicking Run This Command. GH531/EPI539 Computer Lab Module: Session 4 2012

7. Click Run to run the entire program.

 Notice that all of your results will flash through the Analysis results window, ending at the last set of results that you generated.

The program you saved is part of your project, to keep it updated you must save it. To access the LASTPGM file, click the Text File button from the Save Program dialog box (which you call up by selecting the Save button in the Program Editor Window) and locate the file in the Epi_Info folder.

Saving the Analysis Output Files The output files – the results that you generated in the Analysis window – are a bit easier to save and retrieve. You should see them as .htm files on your Desktop, perhaps using the Internet Explorer icon. They will be labeled OUT1.htm, OUT2.htm, etc. You can save these wherever you wish.

To open them, simply click on them. The file will open in an Internet browser window since they are .html files, the computer code that one uses to write Web pages.

So, you do not even need Epi-Info open to view these results. Thus, you could paste the results table into a document for presentation in a report. Or, you could send them to someone who does not have Epi-Info on her machine.

Unfortunately, you cannot use these files to continue analysis like you can use the computer code that you learned how to save above. Yes, you can open these files into Epi-Info Analysis by selecting the Open button located on the top of the Analysis results window (and then the OUT.htm file that you want to view). But, the code that you used to generate these results does not open with your results. Therefore, if you had generated a selection of data (e.g. selection of all students who have their worst breathing problems in the spring) and that shows up in your results file, you would not be able to work with it until you had the corresponding computer code in the Program Editor. So, for future analysis tasks, it is better to save the computer code as discussed above.