The first line in the above code defines a new variable called “anemic”; the second line assigns everyone a value of (-), or No, meaning they are not anemic. The first and second If commands recode the “anemic” variable to (+), or Yes, for those who meet the specified criteria for hemoglobin level and sex.

A note to Epi Info DOS users: In the DOS version of Epi Info you had to be very careful when using the Recode and If/Then commands to avoid recoding a missing value in the original variable to a code in the new variable. In the Windows version, if the original value is missing, the new variable will usually also be missing, but always verify this.

Always Verify Coding

It is recommended that you List the original variable(s) and the newly defined variables to make sure the coding worked as you expected. You can also use the Tables and Means commands to double-check the accuracy of the new coding.

Use of Else …

The Else part of the If command is usually used for categorizing into two groups. The example below separates the individuals in the viewEvansCounty data into “younger” and “older” age categories:

    DEFINE agegroup3
    IF AGE<50 THEN
        agegroup3="Younger"
    ELSE
        agegroup3="Older"
    END

Use of Parentheses ( )

For the Assign and If/Then/Else commands, you may need to use parentheses when an expression contains more than one mathematical operator. In a command, the order in which mathematical operations are performed is as follows: exponentiation (“^”), multiplication (“*”), division (“/”), addition (“+”), and subtraction (“-”). For example, the following command ASSIGNs a value to a new variable called calc_age based on an original variable AGE (in years):

    ASSIGN calc_age = AGE * 10 / 2 + 20

For example, a 14-year-old would have the following calculation:

    14 * 10 / 2 + 20 = 90

First, the multiplication is performed (14 * 10 = 140), followed by the division (140 / 2 = 70), and then the addition (70 + 20 = 90).
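The arithmetic above can be cross-checked outside Epi Info. The following is a minimal sketch in Python, not Epi Info command syntax; it assumes only that Python, like the order described above, performs multiplication and division before addition. The variable names are illustrative.

```python
# Illustrative check of the calc_age arithmetic; this is Python, not Epi Info syntax.
age = 14  # example age in years, as in the text

# Expression evaluated with the default order of operations
calc_age = age * 10 / 2 + 20

# The same calculation written out one operation at a time
step1 = age * 10    # multiplication
step2 = step1 / 2   # division
step3 = step2 + 20  # addition

print(calc_age, step3)
```

Both values should agree with the hand calculation; if they did not, that would point to a misreading of the order of operations rather than to a problem with the data.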
If you wanted the addition (2 + 20) to take place before the multiplication and division, place parentheses around 2 + 20:

    ASSIGN calc_age = AGE * 10 / (2 + 20)

For AGE = 14, the above would result in 6.3636: first, 2 + 20 equals 22; then 14 * 10 = 140, which is then divided by 22 to give 6.3636. It never hurts to insert parentheses for clarity; leaving them out can sometimes lead to unexpected results.

Epi Info Exercise 2: Use of Select, Define, Assign, Recode, and If/Then commands

The following questions are based on the viewEvansCounty data. You are interested in performing some analyses only on those with hypertension. In this data file, the variable name is HPT, and those who are hypertensive have the code “Yes.” Use the Select command and answer the following questions:

1. What is the mean cholesterol (variable CHL) for the hypertensive group?
2. What is the risk ratio for the CAT-CHD relationship among those with hypertension?

At this point, please Cancel Select.

An investigator has developed a new index for predicting coronary heart disease. This index is based on the measure of body size called QTI and on cholesterol level (CHL). The index is calculated as:

    CHD_index = 100 x QTI^2 / cholesterol level

3. Create this variable in the data set. What is the mean CHD_index value?
4. Do those who developed CHD have a significantly higher mean CHD_index compared to those who did not develop CHD?

Using the Recode command, Recode age to agegroup using 20-year age intervals: 40-59 and 60-79 years of age.

5. How many individuals are there in the 40-59 year age group and how many in the 60-79 year age group?

We would like to use the hematocrit information to classify the men as anemic or not anemic. The cutoff for anemia is a hematocrit <39 for nonsmokers and a hematocrit <40 for smokers. The variable name for hematocrit is HEM; the variable name for smoking is SMK, which is coded as “Yes”/“No”.

6. Define a new variable Anemic and use If/Then statements to give a value of 1 if anemic and a value of 2 if not anemic. What is the prevalence of anemia?
7. Save the above Define and If/Then statements into a program file called Anemic in the Sample.mdb file. ReRead viewEvansCounty and Run the program.

VI. Setting System Defaults

Set

The user can specify some aspects of how information is presented using the Set command (see Figure 43). The top three settings in the Set command dialog box control how the following values appear in output and in the List command:

For Yes/No fields, the “Yes” response presented as: “Yes”, “True”, or (+)
For Yes/No fields, the “No” response presented as: “No”, “False”, or (-)
Missing values presented as: “Missing”, “Unknown”, or (.)

Figure 43. Set command dialog box, Analyze Data, Epi Info.

Show Hyperlinks: when checked, shows hyperlinks to output in the Output window; when not checked, hyperlinks are not shown.

Show Selection Criteria: when checked, shows the selection criteria with the output of every subsequent command; when not checked, no selection criteria are shown.

Show Percents: when checked, shows row and column percentages for the Tables and Means commands; when not checked, these percentages are not shown.

Show Tables in Output: when checked, shows tables for the Frequencies, Means, Tables, or Match commands; when not checked, tables are not shown.

For the Statistics output, you will probably want to set this to “Advanced” to see all the statistics.

Include Missing: determines whether missing values are included in the tables presented in Frequencies and Tables.

Process Records: determines whether the analyses use only undeleted (“normal”) records, only deleted records, or both normal and deleted records.

VII. Advanced Statistics

In this section the following advanced statistics commands are described: Linear Regression, Logistic Regression, Kaplan-Meier Survival, Cox Proportional Hazards, and commands for analyzing complex sample designs (Complex Sample Frequencies, Complex Sample Tables, and Complex Sample Means).

Linear Regression

Linear regression is used when the outcome variable is continuous, such as age, hemoglobin values, and cholesterol. The dialog box for Linear Regression is shown in Figure 44. The Linear Regression command can be used for simple linear regression and simple correlation (only one independent variable), and for multiple linear regression (more than one independent variable). In regression, the primary interest is to predict one dependent variable (y) from one or more independent variables (x1, ..., xk).

The correlation coefficient, or r (sometimes referred to as the Pearson correlation coefficient), is a measure of how two continuous variables are related. If the correlation is greater than 0, the variables are positively correlated; i.e., as x increases, y also increases. If the correlation is less than 0, the variables are negatively correlated; i.e., as x increases, y decreases. If the correlation is exactly 0, the variables are uncorrelated. The correlation coefficient can vary between +1 and -1. For positive correlations (r > 0), the closer to +1, the stronger the correlation; for negative correlations (r < 0), the closer to -1, the stronger the correlation. As a rule of thumb for interpreting r: 0.9-1, very high correlation; 0.7-0.89, high correlation; 0.5-0.69, moderate correlation; 0.3-0.49, low correlation; 0.0-0.29, little if any correlation.

Figure 44. Linear Regression command dialog box, Analyze Data, Epi Info. Note a slight discrepancy between the name of the command (Linear Regression) and the name of the dialog box (Regress).
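To make the correlation coefficient and the rule of thumb more concrete, here is a minimal sketch in Python (not Epi Info syntax). The function names and the five data points are hypothetical and chosen only for illustration. Applying the rule-of-thumb bands to the absolute value of r, so that negative correlations are graded the same way, is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences.

    Assumes both variables actually vary (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe_r(r):
    """Grade r using the rule of thumb from the text."""
    size = abs(r)  # assumption: the bands apply to |r| for negative correlations
    if size >= 0.9:
        return "very high correlation"
    if size >= 0.7:
        return "high correlation"
    if size >= 0.5:
        return "moderate correlation"
    if size >= 0.3:
        return "low correlation"
    return "little if any correlation"

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(r, describe_r(r))
```

The sign of r gives the direction of the relationship, and the description reflects only its strength.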
If the data are ordinal or not normally distributed, significance tests based on the Pearson correlation coefficient may not be valid, and a nonparametric equivalent to Pearson’s would be preferable (none is currently available in Epi Info). The following discusses simple linear regression (only one predictor/independent variable) and multiple linear regression (more than one predictor/independent variable).

Simple Linear Regression

As an example of simple linear regression, we will use the viewEstriolandBirthweight data, which can be found in the Sample.mdb file. These data are from Rosner and are described in Appendix 1. In this example, the Outcome Variable is Birthweight and the Other Variables entry is Estriol. [Note: In Epi Info version 3.3.2 and earlier there is an error in this data file. To obtain the same results as below (and in the textbook by Rosner) in the older version of Epi Info, you will need to correct record 12, where the Birthweight should be 31, not 30. This correction can be made using the List command with its Allow Updates option.] The results of the regression are shown in Figure 45, and some of the output is explained below.

Figure 45. Example output of simple linear regression from the Linear Regression command, viewEstriolAndBirthweight data, Epi Info.
Linear Regression

    Variable     Coefficient   Std Error   F-test    P-Value
    Estriol      0.608         0.147       17.1616   0.000286
    CONSTANT     21.523        2.620       67.4656   0.000000

    Correlation Coefficient: r^2= 0.37

    Source       df   Sum of Squares   Mean Square   F-statistic
    Regression   1    250.574          250.574       17.162
    Residuals    29   423.426          14.601
    Total        30   674.000

(Note: The Correlation Coefficient, frequently referred to as “r”, is not the same as r^2 or r².)

Coefficient, Std Error, F-test, and P-value: For the predictor variable, the coefficient value is the slope of the line, sometimes referred to as the “regression coefficient.” In this example, 0.608 can be interpreted as follows: for every one-unit increase in estriol (1 mg/24 hr), birth weight increases by 0.608 units (g/100). Statistics concerning the slope are also provided: the standard error (“Std Error”), which is 0.147; the F-test (the same as the F-statistic presented lower in the output for simple linear regression); and a P-value, in this example 0.000286.
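The relationships among the numbers in this output can be checked by hand. The sketch below, in Python rather than Epi Info syntax, recomputes the mean squares, the F-statistic, and r^2 from the sums of squares shown in the output, and recovers r from r^2 using the sign of the slope (which holds for simple linear regression). The function name and the estriol value of 15 are hypothetical and used only for illustration; small differences from the printed output are due to rounding.

```python
import math

# Values copied from the regression output shown above
ss_regression, df_regression = 250.574, 1
ss_residual, df_residual = 423.426, 29
slope, intercept = 0.608, 21.523

# Mean square = sum of squares / degrees of freedom
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F-statistic = regression mean square / residual mean square
f_statistic = ms_regression / ms_residual

# r^2 = share of the total sum of squares explained by the regression
ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total

# In simple linear regression, r carries the sign of the slope
r = math.copysign(math.sqrt(r_squared), slope)

def predicted_birthweight(estriol):
    """Fitted birth weight (g/100) for a given estriol level (mg/24 hr)."""
    return intercept + slope * estriol

print(round(ms_residual, 3), round(f_statistic, 3))
print(round(r_squared, 2), round(r, 2))
print(round(predicted_birthweight(15), 3))  # 15 is a hypothetical estriol value
```

This also illustrates the note above: the output reports r^2 (about 0.37), while r itself is its square root (about 0.61).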