Introduction to Epi Info (Version 3.4.1)
Analyze Data Module

By Kevin M. Sullivan, PhD, MPH, MHA and Minn Minn Soe, MD, MCTM, MPH
Department of Epidemiology
Rollins School of Public Health of Emory University
October 2007

Figure 1. Epi Info Introductory Screen.

TABLE OF CONTENTS
I. Introduction
II. Basic Commands
   Reading Data and Writing Data
   List Data
   Display Variables
III. Simple Analytic Commands and Graphics
   Frequencies
   Means
   Tables
   Match
   Summarize
   Graph
   Exercise 1
IV. Navigating and Managing the Output and the Program Editor windows
V. Data manipulation commands
   Sort/Cancel Sort
   Select/Cancel Select
   Define/Undefine
   Assign
   Recode
   If
   Exercise 2
VI. Setting System Defaults
VII. Advanced Statistics
   Linear Regression
      Simple linear regression
      Multiple linear regression
   Logistic Regression
      Unconditional logistic regression
      Conditional logistic regression
   Survival Analysis
      Kaplan-Meier
      Cox Proportional Hazards
   Complex sample commands
      Complex Sample Frequencies
      Complex Sample Tables
      Complex Sample Means
   Exercise 3
VIII. Statistics Command Options
   Stratify by
   Weight
IX. Advanced Data Management Topics
   Write (Export)
   Delete file/table
   Delete/undelete records
   Relate
   Merge
Acknowledgments
References
APPENDICES
   Appendix 1. Data Dictionaries
   Appendix 2. Operators/Functions
   Appendix 3. Answers to Exercises
   Appendix 4. Analysis Commands By Type of Variables

I. INTRODUCTION

Epi Info is a program developed by the Centers for Disease Control and Prevention (CDC) that runs under the Microsoft Windows® operating system and provides programs for data entry and analysis. The Epi Info program and help information can be found at www.cdc.gov/epiinfo.
The purpose of this document is to introduce the Analyze Data module, discussing commands in a sequence appropriate for learning the program. Later chapters give more detail on some topics, and full details of each command can be found in Epi Info’s on-line help. The document also presents the program OpenEpi (www.OpenEpi.com) and shows how it can be used to supplement Epi Info.

Figure 1 on the front of this document presents the Epi Info (3.4.1) introductory screen (release date: July 3, 2007). To start the Analyze Data module, click on the Analyze Data button in the lower left of the screen, or select Programs → Analyze Data from the pull-down menu (see Figure 1). The main Analyze Data screen is shown in Figure 2. The screen is composed of three windows:
   o a narrow window on the left side of the screen, labeled Analysis, that presents the commands, i.e., the “command tree”;
   o the largest window, labeled Analysis Output, which occupies the upper right portion of the screen and is where the output is presented; and
   o a smaller window in the bottom right of the screen, labeled Program Editor, where text commands appear.

Figure 2. Analyze Data main screen, Epi Info.

Before discussing the Analysis commands, let’s review some basics. To exit the Analyze Data module, click on the Exit button at the top of the left window; to minimize the module, click on the “_” button in the upper right-hand corner of the left window. The Help button at the bottom of the left window takes you to the Epi Info Help system. You can resize any of the three windows by placing the cursor where two windows meet, holding down the left mouse button, and dragging the border.
The commands listed in the narrow window on the left are grouped according to their general functions. These are the commands related to reading, relating, writing, and merging data files (the Data section), those related to creating variables and assigning values to them (the Variables and Select/If sections), and the analytic commands (the Statistics and Advanced Statistics sections). A brief description of some of these commands is presented in Figure 3. In this document, the Analyze Data commands are described in the following chapters, in this order:

II. Basic Commands
   o Read (Import) – usually the first command used; to “read” or “open” a data file
   o List – to view or update the data in a spreadsheet format
   o Display – to display variable names and types
III. Simple Analytic commands and Graphics
   o Frequencies – to view the frequencies of the values of a variable
   o Means – similar to Frequencies for a single variable, except that for numeric data the Means command also provides summary statistics; can also perform the independent t-test, one-way ANOVA, and their nonparametric equivalents
   o Tables – for single and stratified 2x2 tables (where the odds ratio, risk ratio, and other measures of association are provided) or for r x c tables of any size
   o Match – for matched case-control data
   o Summarize – to create a new table containing a summary of descriptive statistics for the current dataset
   o Graph – to graph data
IV. Navigating and Managing the Output and the Program Editor Windows
   o Issues related to using and navigating the Output and the Program Editor windows
V. Data manipulation commands
   o Sort/Cancel Sort – sort the data/cancel a sort
   o Select/Cancel Select – “select” or “unselect” a subset of records for analysis
   o Define/Undefine – “define” or “undefine” new variables
   o Assign – “assign” values to a variable
   o Recode – recode from one variable to another variable
   o If – If commands for conditional logic, also Then and Else
VI. Setting System Defaults
   o Set – choose default settings
VII. Advanced Statistics
   o Linear Regression – simple linear and multiple linear regression
   o Logistic Regression – both unconditional and conditional logistic regression
   o Survival Analysis
      - Kaplan-Meier – simple survival analysis
      - Cox Proportional Hazards – advanced survival analysis
   o Complex sample commands – commands for cross-sectional data collected with a sample design that includes clustering and/or stratification
      - Complex Sample Frequencies
      - Complex Sample Tables
      - Complex Sample Means
VIII. Statistics Command Options
   o Stratify by
   o Weight
IX. Advanced Data Management Topics
   o Write (Export)
   o Delete File/Table
   o Merge
   o Relate

An introduction to these commands follows. Because the goal of this document is to provide an introduction, some details of the commands are not presented; for more detailed information, please consult the on-line help in Epi Info. In the examples of command output, we have removed or slightly modified some of the output to save space.

Figure 3. Short description of selected commands in Analyze Data
← The minimize button (“_”) on the Analysis commands window minimizes all 3 Analyze Data windows; the close button (“X”) closes all 3 windows, the same as the Exit button described next.
← The Exit button exits from the Analyze Data module.
Data-Related commands
   o Read – read an Epi 2000 (i.e., Access 2000) file; can Import other file types, such as Epi6, dBase, and Excel
   o Relate – relate files; Write – write new data files or Export to a different file type (e.g., Epi Info 6, dBase, and others); Merge – merge data files
   o Delete – commands to delete a file, table, or records, or to undelete records
Variables – commands for creating variables and assigning values
   o Define/Undefine to create/remove a temporary variable; Assign to assign values to a variable; Recode to recode a variable (a shorthand version of If/Then/Else); and Display to display the variable names and types in the data set, along with any temporarily defined variables
Select/If – commands for selecting a subset of records, If/Then/Else statements, and sorting a file
   o Select/Cancel Select to select/unselect a subset of records
   o If/Then/Else for conditional logic
   o Sort/Cancel Sort to sort/unsort the file on one or more variables
Statistics – commands for presenting data and performing statistical analyses
   o List – data displayed as a spreadsheet (“grid”); allows data entry
   o Frequencies – frequency, %, cumulative %, and confidence intervals
   o Tables – cross-tabulations; stratified analysis
   o Match – matched case-control analysis; can have one or more controls for each case
   o Means – frequency, descriptive statistics, independent t-test, one-way ANOVA, and nonparametric tests
   o Summarize – creates a new table with descriptive statistics
   o Graph – graphing module
   o Map – mapping module
Advanced Statistics
   o Linear Regression – simple or multiple linear regression
   o Logistic Regression – regression for dichotomous outcome variables
   o Kaplan-Meier survival analysis
   o Cox Proportional Hazards survival analysis
   o Complex Sample commands – Complex Sample Frequencies, Tables, and Means for surveys using a complex sample design, such as stratification and clusters
Help – opens the Epi Info Help program
There are a number of commands lower in this window that are not described in this figure.

II. BASIC COMMANDS

Reading and Writing Data

One of the first things you will want to do in the Analyze Data module is to Read a data set or, in the usual terms of most Windows programs, to open a file.
To Read a file, click on Read (Import) in the command window; a dialog box will be presented, as shown in Figure 4.

Figure 4. Dialog box for Read (Import), Epi Info.

Epi Info uses the Microsoft Access data file format; in Epi Info these files are referred to as Epi 2000 files. If you are not familiar with Access files, they can seem complicated because they may contain many “tables.” Each table is the equivalent of a data file such as an Epi Info DOS .REC file or an SPSS .SAV file. When Epi Info is installed on a computer, the default Current Project (see the top of the dialog box in Figure 4) is usually on drive C: in the Epi_Info folder. The Data Source drive, folder, and file name is C:\Epi_Info\Sample.mdb, a file distributed with Epi Info that contains a number of example data tables.