UNIVERSITY OF CINCINNATI

Date: 3-May-2010

I, Karen Weyer, hereby submit this original work as part of the requirements for the degree of: Master of Science in Biostatistics (Environmental Health)

It is entitled: Determining Appropriate Sample Size for Cases in a Case-Control Study Utilizing Proxy Respondents

Student Signature: Karen Weyer

This work and its defense approved by:

Committee Chair: Paul Succop, PhD

Tania Carreon-Valencia, PhD

5/18/2010 596

Determining Appropriate Sample Size for Cases in a Case-Control Study Utilizing Proxy Respondents

A thesis submitted to the Division of Research and Advanced Studies of the University of Cincinnati in partial fulfillment of the requirements for the degree of Master of Science in the Department of Environmental Health of the College of Medicine by Karen Weyer B.A. Agnes Scott College June 2010 Committee Chair: Paul Succop, PhD

ABSTRACT

There are many situations in research in which a subject cannot be interviewed about their medical, family, or work history because they have died or are incapable of providing responses on their own. In these situations the researcher must rely on a proxy.

Using data from the Upper Midwest Health Study (UMHS), two power simulation analyses were completed to determine the sample size of proxy respondents required to maintain a power of 0.80 or higher. Results show that proxy responses can be relied on to agree with subject responses for demographic questions, but not for study-related questions.

Additionally, the power analysis demonstrated that when the percentage of disagreement between the proxy and subject responses is larger, fewer proxy respondents are required for a power of 0.80.


Acknowledgements

I would like to thank my academic advisor and committee chair Dr. Paul Succop for his guidance, wisdom, and patience.

I would like to thank Dr. Tania Carreon-Valencia for providing me the opportunity to work on this project, as well as her support and efforts through the completion of this project.

I would like to thank Dr. Avima Ruder for creating for me the data subset from the Upper Midwest Health Study on which this analysis is based.

I would like to thank those I work with at i3 statprobe who supported me throughout this project: John Lasley, Jennifer Savage-Sales, Ann Hemken, and Diana Cucos, and the many others who would ask me ‘How is your thesis coming, Karen?’

Finally, I would like to thank my family for their ongoing support: Mom and Dad for encouraging me to get it done, my brother Stephen for unknowingly challenging me to get my degree done at the same time he finished his, and my husband Scott and son Benjamin for providing me love, support, and quiet Sunday afternoons to get this paper finished.


Table of Contents

Abstract
Acknowledgements
List of Tables
List of Figures
Section I: Sample-Size and Power
Section II: Analysis of Upper Midwest Health Study Data
    Introduction
    Methods
    Results Part I: Analysis of UMHS Data
    Results Part II: Analysis of Simulation
    Discussion
Section III: Impact on Environmental Health Research
Bibliography
Tables
Figures
Appendix 1: Question List
Appendix 2: SAS Code


List of Tables

1. Percentage of Matched Responses Between Subject and Proxy Respondents

2. Z-test Scores (p-value) for Response Agreement Between Subjects and Proxies when Subject Answered “Yes”

3. Power of Proxy Agreement to Subject Responses


List of Figures

1) Power versus Sample Size for Demographic Questions – alpha = 0.05

2) Power versus Sample Size for Study Related Questions – alpha = 0.05

3) Power versus Sample Size for Question Groups – alpha = 0.05

4) Power versus Sample Size for Demographic Questions – alpha = 0.01

5) Power versus Sample Size for Study Related Questions – alpha = 0.01

6) Power versus Sample Size for Question Groups – alpha = 0.01

7) Mean Change versus Sample Size for Demographic Questions: alpha = 0.05 and power ≥ 0.80

8) Mean Change versus Sample Size for Study Related Questions: alpha = 0.05 and power ≥ 0.80

9) Mean Change versus Sample Size for Question Groups: alpha = 0.05 and power ≥ 0.80

10) Mean Change versus Sample Size for Demographic Questions: alpha = 0.01 and power ≥ 0.80

11) Mean Change versus Sample Size for Study Related Questions: alpha = 0.01 and power ≥ 0.80

12) Mean Change versus Sample Size for Question Groups: alpha = 0.01 and power ≥ 0.80


Section I: Sample-Size and Power

Choosing the appropriate sample size for a study is critical. Too small a sample could cause the study to show non-significant results even when the true mean differs from the null mean. Too large a sample could be costly without providing much more information than a smaller sample.

Four factors affect the sample size: the standard deviation of the hypothesized effect (σ), the probability of a type I error (α-level), power (1-β), and the effect size (i.e., the distance between the means under the null and alternative hypotheses) (Rosner, 2000). When these factors are set according to the specifications of the study, the ideal sample size can be determined.

Power is defined as one minus the probability of making a type II error. Power tells us the likelihood of rejecting the null hypothesis given that the alternative hypothesis is true (Rosner, 2000). It is influenced by sample size, alpha, effect size, and the standard deviation of the hypothesized effect. Power decreases when alpha decreases or when the standard deviation of the hypothesized effect increases. Power increases when the difference between the true mean and the null mean increases or when the sample size increases (Rosner, 2000). Most researchers set power at 0.80 or higher when doing sample size estimation.
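The relationships among alpha, standard deviation, effect size, and sample size can be checked numerically. The following is a minimal sketch in Python (not the SAS used for the thesis analysis), assuming the one-sided z-test power expression that appears in the Appendix 2 macros, power = Φ(z_α + diff·√n / sd). The function name `one_sided_power` and the starting sample size of 10 are illustrative choices; the difference and standard deviation are the fungicide-usage values listed in Appendix 2.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def one_sided_power(diff, sd, n, alpha=0.05):
    """Power of a one-sided z-test: Phi(z_alpha + diff * sqrt(n) / sd)."""
    z_alpha = STD_NORMAL.inv_cdf(alpha)  # lower-tail critical value (negative)
    return STD_NORMAL.cdf(z_alpha + diff * n ** 0.5 / sd)


if __name__ == "__main__":
    # Fungicide-usage values from Appendix 2; n = 10 is an arbitrary starting point.
    diff, sd, n = 0.100, 0.152, 10
    print("baseline          ", round(one_sided_power(diff, sd, n), 3))
    print("smaller alpha     ", round(one_sided_power(diff, sd, n, alpha=0.01), 3))
    print("larger sd         ", round(one_sided_power(diff, 2 * sd, n), 3))
    print("larger difference ", round(one_sided_power(2 * diff, sd, n), 3))
    print("larger sample     ", round(one_sided_power(diff, sd, 4 * n), 3))
```

Relative to the baseline, the second and third lines should print lower power and the last two higher power, which is the pattern described above.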


Section II: Analysis of Upper Midwest Health Study Data

Introduction:

There are often situations in research when the subject cannot provide a reliable medical, occupational, or lifestyle history. Researchers must then rely on the history provided by the subject’s proxy respondent. In epidemiological studies, proxy responses increase study power by increasing the sample size, and they improve representation of the case group (Campbell et al., 2007). However, the accuracy of the proxy’s information can be influenced by multiple factors, including length of relationship, bias, and quality of relationship, and inaccurate proxy information has the potential to misclassify the exposure (Johnson et al., 1993).

Many studies (Campbell et al., 2007; Johnson et al., 1993) have compared proxy responses for accuracy against self-respondents in case-control studies. Campbell et al. (2007) looked at the completeness of information from proxy respondents in a population-based case-control study that used self-administered mailed questionnaires. They found that parents and spouses provided more complete information than did children, siblings, friends or others for many questions commonly asked in epidemiological research. Johnson et al. (1993) examined the quality of pesticide exposure data provided by self- and proxy respondents and concluded that the pesticide data from the proxy and self-respondents do not lead to the same conclusions or the same estimate of risk. If the information from the self- and proxy respondents does not lead to the same conclusions due to differences in responses, then adding proxy respondents to increase study power may actually cause incorrect conclusions to be made during study analysis.

This paper uses data from the Upper Midwest Health Study (UMHS) to determine the optimal sample size for studies examining pesticide exposure when using data from proxies. The analysis looks at the agreement between proxy and subject answers and whether a sample of proxies provides a response similar to that of the subject/self-respondent. Explicitly, the null hypothesis is that the proxies will respond the same as the subject. A power analysis is then completed using the standard deviation and the difference in subject-proxy agreement to determine the appropriate sample size for similar studies.


Methods:

Data from the Upper Midwest Health Study, a case-control study, were analyzed to determine the optimal sample size when collecting proxy data. Researchers from the National Institute for Occupational Safety and Health (NIOSH) collected pesticide usage data from residents in non-urban areas in four upper Midwestern states: Michigan, Minnesota, Iowa and Wisconsin (Ruder et al., 2006). The 872 cases were diagnosed with gliomas (the most common type of brain cancer) from January 1995 through January 1997. The 1669 control participants had no diagnosis of glioma. All participants (or their proxies) were interviewed and answered questions regarding their occupational experiences, ethnicity, education and lifestyle. A subset of case respondents had a next of kin (proxy) interviewed 1-2 years after the original interview. This subset of 105 matched responses was used for this analysis.

Seven questions from the administered questionnaire were analyzed (Appendix 1). For this analysis, the questions were grouped into two categories, ‘demographic’ and ‘study-related’. The questions were analyzed separately and across the two groups to determine whether there was a significant difference between the subjects and their proxies and whether recall bias was possible.

Additional analyses compared the number of subjects who answered ‘Yes’ to a question with the number of proxies who also answered ‘Yes’ to the same question to determine whether there was any significant difference between the responses. The subject’s response was considered the ‘gold-standard’ and a z-test was used to compare the proxy’s response to the subject’s response. The subject-proxy matched ‘No’ responses were removed from the statistical analysis and simulation. The analysis was done with all subject/proxy pairs, then subset by the relationship between proxy and subject, and subset by the gender of the proxy respondent.


Two power simulation analyses were performed using the difference and standard deviation values obtained from the analysis of the UMHS data. SAS version 9.2 was used for the analysis. For the first simulation, the difference and standard deviation of each of the seven questions were examined to determine the sample size of proxy respondents needed to reach a power of 0.80 when the alpha-level is set at 0.05 (z = 1.645) and at 0.01 (z = 2.326). For the second simulation analysis, the standard deviation of each question was used and the mean difference between the subject and proxy was modified in a step-wise manner from 0.05 to 0.50 by 0.05. The number of subjects was adjusted from one to 30 for each modification in difference. The analysis was done with the alpha-level set at 0.05 (z = 1.645) and again with the alpha-level set at 0.01 (z = 2.326).
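The first simulation can be sketched as a search for the smallest sample size whose power reaches 0.80. The sketch below is in Python rather than the SAS of Appendix 2 and is an illustration, not the thesis code. It assumes the same power expression as the Appendix 2 macros and uses the per-question difference and standard deviation values listed there; the names `QUESTIONS`, `power`, and `smallest_n` and the cap of 100 on the sample size are choices made for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# (mean difference, standard deviation) per question, as listed in Appendix 2.
QUESTIONS = {
    "Alcohol History": (0.067, 0.051),
    "Smoking History": (0.000, 0.049),
    "Farm-Ranch": (0.121, 0.075),
    "Insecticide Usage": (0.452, 0.048),
    "Herbicide Usage": (0.000, 0.132),
    "Fungicide Usage": (0.100, 0.152),
    "Fumigant Usage": (0.185, 0.135),
}


def power(diff, sd, n, alpha):
    """One-sided z-test power: Phi(z_alpha + diff * sqrt(n) / sd)."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(alpha) + diff * n ** 0.5 / sd)


def smallest_n(diff, sd, alpha, target=0.80, max_n=100):
    """First n in 1..max_n with power >= target, or None if never reached."""
    for n in range(1, max_n + 1):
        if power(diff, sd, n, alpha) >= target:
            return n
    return None


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        print(f"alpha = {alpha}")
        for name, (diff, sd) in QUESTIONS.items():
            print(f"  {name:<18} n = {smallest_n(diff, sd, alpha)}")
```

A question with a difference of zero never reaches the target, because its power stays at alpha for every sample size, so `smallest_n` returns `None` for smoking history and herbicide usage.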


Results:

Part I: Analysis of UMHS Data

The potential analysis population included 105 subject-proxy pairs. Fifty pairs were removed from analysis because a proxy was present with the subject at the original interview and it was not possible to determine whether the subject or the proxy answered the questions. Among the subjects, 37 (66%) were male and 19 (34%) were female. Most proxy respondents were spouses (66%), children (18%) or siblings (11%). There were 17 (30%) male proxy respondents and 38 (68%) female proxy respondents. On average, proxies lived with the subject for 30 years or more.

Pooling all of the matched “Yes” and matched “No” responses, the subjects and proxies were in agreement 75% of the time across all 7 questions analyzed. The proxies matched the answer of their matched respondent 90% of the time for the three demographic questions examined and 42% of the time for the four study-related questions examined. The difference in agreement between the demographic questions and the study-related questions was statistically significant (z score = 8.660, p < .001). Table 1 displays the percentage of response agreement for additional subsets of the data. The spousal proxies were in agreement with their counterparts more often than the child proxies, and male proxies had a matched response less frequently than the female proxies; response agreement did not differ significantly in either subset comparison when all questions were taken into consideration.

Table 2 displays the z-scores for the analysis of the percentage of proxies who gave a ‘Yes’ response when the subject gave a ‘Yes’ response. When all of the questions were considered together, there was a significant difference in agreement between the subjects and proxies (p < 0.0001). The difference in agreement was also significant for the spouse (p = 0.002) and slightly significant for the sibling (p = 0.04) proxy relationship analysis. The gender of the proxy did not show a significant difference in agreement (female p = 0.0838, male p = 0.1446; Table 2).

For the demographic questions, the difference between subject and proxy responses was non-significant (probability of agreement 0.970). For the study-related questions, the disagreement in responses between subject and proxy was significant at p < 0.0001. This pattern holds for the spouse, sibling and male gender subsets. However, the female proxies showed significant disagreement on the demographic questions (p = 0.004) and no significant disagreement on the study-related pesticide questions (p = 0.23).

Looking at specific questions for all subject-proxy pairs, there is no significant difference in responses for smoking history, use of herbicides and use of fungicides. Response disagreement was nearly significant for alcohol history, previous exposure to a farm, and use of fumigants (one-tailed alpha-levels of 0.09, 0.06 and 0.08, respectively).

The power of the z-test score results was calculated using a one-tailed alpha-level of 0.05. Table 3 displays the power for the corresponding z-test scores for each question. All but two questions demonstrated power greater than the 0.80 level for all subjects. The responses to smoking history and use of herbicides both resulted in a power of 0.050 since there was no difference between the subject and proxy responses. When the relationship of the proxy is a child, the power falls below the 0.80 level for two questions: alcohol history (power = 0.776) and usage of insecticide (power = 0.654).
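The power of 0.050 for questions with no subject-proxy difference follows from the power expression used in the Appendix 2 macros. A short worked step, where Φ is the standard normal distribution function and δ, σ and n are notation introduced here for the mean difference, standard deviation and sample size:

```latex
\begin{aligned}
\text{power}(n) &= \Phi\!\left(z_{\alpha} + \frac{\delta\sqrt{n}}{\sigma}\right),
  \qquad z_{\alpha} = \Phi^{-1}(\alpha) \\
\delta = 0 \;\Rightarrow\; \text{power}(n) &= \Phi\!\left(\Phi^{-1}(\alpha)\right) = \alpha = 0.05
  \quad \text{for every } n
\end{aligned}
```

With no difference to detect, increasing the sample size cannot raise the power above the alpha-level.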

Part II: Analysis of Simulation

Figures 1-3 show the simulation results as power versus sample size with an alpha-level of 0.05. Data for smoking history and use of herbicides were not analyzed in Figures 1-2 since there is no difference between the subject and proxy respondents’ responses of “Yes”. Figure 1 shows that a sample size of four proxies is sufficient to achieve a power greater than 0.80 for both demographic questions. Figure 2 shows that a sample size of fifteen is sufficient to achieve the desired power when looking at the study-related questions individually.

Figure 3 displays results of the power analysis when the questions are combined into three groups: demographic, study-related, and all questions. The sample size needed to achieve a power greater than 0.80 is two proxies for the study-related questions and one proxy for the demographic questions and for all questions analyzed together.

Figures 4-6 display the power versus sample size results when alpha is set at 0.01. A sample size of six is sufficient to achieve a power greater than 0.80 for all demographic questions (Figure 4). Twenty-four proxies are required to achieve a power greater than 0.80 for the study-related questions when analyzed individually (Figure 5). Figure 6 displays the results of the combined questions. A sample size of three produces a power greater than 0.80 for the combined agreement of the study-related questions, and a sample size of one produces a power of 1.00 for the agreement between subject and proxy when all questions are combined.
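The sample sizes read from the figures can be cross-checked against a closed form obtained by solving the Appendix 2 power expression for n. The worked example below is a sketch under that assumption; it uses the fungicide-usage values from Appendix 2 (difference δ = 0.100, standard deviation σ = 0.152), which is the study-related question that requires the most proxies, and the symbols δ, σ and z are notation introduced here.

```latex
\begin{aligned}
\Phi\!\left(z_{\alpha} + \frac{\delta\sqrt{n}}{\sigma}\right) \ge 1-\beta
\;&\Longleftrightarrow\;
n \ge \left(\frac{(z_{1-\alpha} + z_{1-\beta})\,\sigma}{\delta}\right)^{2},
\qquad z_{1-\alpha} = -z_{\alpha} \\
\alpha = 0.05:\quad
n &\ge \left(\frac{(1.645 + 0.842)(0.152)}{0.100}\right)^{2} \approx 14.3
\;\Rightarrow\; n = 15 \\
\alpha = 0.01:\quad
n &\ge \left(\frac{(2.326 + 0.842)(0.152)}{0.100}\right)^{2} \approx 23.2
\;\Rightarrow\; n = 24
\end{aligned}
```

Rounding up to the next whole proxy gives fifteen at alpha = 0.05 and twenty-four at alpha = 0.01, in line with the values reported for the individual study-related questions.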

The results of the second simulation analysis are presented in Figures 7-12. The figures show the number of proxies and the mean change at which the power surpassed the critical point of 0.80. Figure 7 shows that when the percent difference between subject and proxy response is 5%, 14 proxies are required for a power greater than 0.80 when the alpha-level is set at 0.05. Figure 10 shows that with the alpha-level reduced to 0.01 it will take 23 proxies for the power to reach 0.80 for the farm-ranch question. Both figures show that when the difference reaches 25%, the number of proxies required for a power greater than 0.80 is one for all three demographic questions.

When the disagreement of responses is 5%, the number of subjects required to reach a power of 0.80 is almost double for most of the questions regarding pesticide usage (Figure 8). The number of proxies required reduces to one for all 4 pesticide questions when the difference reaches 40% with alpha set at 0.05 and 50% with alpha set at 0.01 (Figures 8 and 11).

Figures 9 and 12 display the power analysis when the questions are collapsed into the three categories: “Overall”, “Demography”, and “Study-Related”. The study related questions require more proxy respondents when the difference is 5%. When the difference is set as acceptable at 15%, only one proxy is required to reach a power of 0.80.
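The second simulation can be sketched in the same way: hold the standard deviation of a question fixed, step the assumed subject-proxy difference from 0.05 to 0.50 by 0.05, and record the first sample size (from 1 to 30) at which power reaches 0.80. This is an illustrative Python sketch, not the SAS code of Appendix 2. It assumes the Appendix 2 power expression and uses the farm-ranch standard deviation (0.075) listed there; `power`, `smallest_n`, and `sweep` are names chosen for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(diff, sd, n, alpha):
    """One-sided z-test power: Phi(z_alpha + diff * sqrt(n) / sd)."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(alpha) + diff * n ** 0.5 / sd)


def smallest_n(diff, sd, alpha, target=0.80, max_n=30):
    """First n in 1..max_n with power >= target, or None if never reached."""
    for n in range(1, max_n + 1):
        if power(diff, sd, n, alpha) >= target:
            return n
    return None


def sweep(sd, alpha):
    """Required n for each assumed difference from 0.05 to 0.50 in steps of 0.05."""
    diffs = [round(0.05 * k, 2) for k in range(1, 11)]
    return {d: smallest_n(d, sd, alpha) for d in diffs}


if __name__ == "__main__":
    farm_ranch_sd = 0.075  # standard deviation for the farm-ranch question (Appendix 2)
    for alpha in (0.05, 0.01):
        print(f"alpha = {alpha}")
        for diff, n in sweep(farm_ranch_sd, alpha).items():
            print(f"  difference = {diff:.2f}  n = {n}")
```

For the farm-ranch question this sweep gives 14 proxies at a 5% difference with alpha = 0.05, 23 with alpha = 0.01, and a single proxy once the difference reaches 25%, which agrees with the values described for Figures 7 and 10.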


Discussion

The results from this analysis of subject-proxy agreement demonstrate that we can accept the null hypothesis that the proxy response is reliable when the subject is not available to answer demographic questions, but we must reject the null hypothesis for the study-related questions due to disagreement in responses between subject and proxy. The power analysis shows that a minimal number of subjects is required to provide a high level of power when using the standard deviation and mean difference from the UMHS analysis, thus supporting the likelihood that we accept the alternative hypothesis when there is a difference in agreement between the subject and proxy response. When we modified the mean change in the simulations, we found that more subjects are required to achieve appropriate power when the mean change is smaller.

There are a few general issues to consider when reviewing this subject-proxy respondent analysis. These results differ from those of Johnson et al. (1993), who found that respondent-proxy agreement was greater than 60% for most questions regarding overall pesticide usage. The mean length of time for which the subject and proxy knew each other is quite different between the two studies. In this analysis the mean time that the subject and proxy lived together was 30 years. The mean number of years the subject and proxy knew each other in the Johnson et al. (1993) study was 50 years or more. When the analysis was done including only those subject-proxy pairs that knew each other 40 or more years, the subject-proxy response agreement was 87.5% for the study-related questions.

Another consideration for the disagreement between the subject and proxy is that the proxy was interviewed two to three years after the initial interview with the subject. During this time lapse, information could have been forgotten by the proxy, or media details regarding the impact of pesticide usage could have biased the responses provided to the interviewer.

This study does not consider the accuracy of the subject’s responses regarding pesticide usage. Rather, it considers only whether the proxy respondent will provide the same response as the subject. In situations where the disease being studied could impact the memory of the subject (e.g. brain cancer, Alzheimer’s, dementia), the proxy response could be more accurate than the subject response even when the two disagree. Further analyses should be considered using a study design where a gold-standard can be established against which the subject and proxy responses, and their level of accuracy, can be compared.


Section III: Impact on Environmental Health Research

The study of sample size is critical to all research, not just Environmental Health research. However, specific to environmental health, accurate information about the occupation or place of residence of a subject is critical when determining whether an environmental factor is contributing to the illness. If the subject is not able to provide this, then the researchers must rely on the proxy to provide the same information. Determining the correct sample size to ensure that the alternative hypothesis is not being accepted falsely will improve the validity of a study.


Bibliography

Campbell, P. T., Sloan, M., & Kreiger, N. (2007). Utility of Proxy versus Independent Respondent Information in a Population-Based Case-Control Study of Rapidly Fatal Cancers. Annals of Epidemiology, 17(4), 253-257.

Johnson, R. A., Mandel, J. S., Gibson, R. W., Mandel, J. H., Bender, A. P., Gunderson, P. D., et al. (1993). Data on Prior Pesticide Use Collected from Self- and Proxy Respondents. Epidemiology, 4(2), 157-164.

Magaziner, J. (1991). The Use of Proxy Respondents in Health Studies of the Aged. In R. W. Wallace, The Epidemiology Study of the Elderly (pp. 120-129). New York: Oxford University Press.

Rosner, B. (2000). Fundamentals of Biostatistics (5th ed.). Duxbury Thomson Learning. pp: 816.

Ruder, A., Waters, M. A., Carreon, T., Butler, M. A., Davis-King, K. E., Calvert, G. M., et al. (2006). The Upper Midwest Health Study: A Case-Control Study of Primary Intracranial Gliomas in Farm and Rural Residents. Journal of Agricultural Safety and Health, 12(4), 255-274.


Table 1: Percentage of Matched Responses between Subject and Proxy Respondents

                               Proxy Relation    Gender of Proxy
                      All      Spouse   Child    Female   Male
Alcohol History       89.1     91.9     80       91.9     82.4
Smoking History       96.4     97.3     90       97.4     94.1
Farm/Ranch            85.5     81.1     90       86.8     82.4
Demographic           90.4     90.1     86.7     92.0     86.3
Insecticide Usage     42.9     38.5     40       50       20
Herbicide Usage       31.6     30.8     0        40       0
Fungicide Usage       33.3     28.6     33.3     35.3     25
Fumigant Usage        63.2     58.3     66.7     73.3     25
Study Related         42.5     38.5     35.7     49.2     17.6
All Questions         74.8     73.6     70.5     76.7     69.1

Table 2: Z-test scores (p-value) for Response Agreement Between Subjects and Proxies when Subject Answered ‘Yes’

                                              Proxy Relation                        Gender of Proxy
                          All                 Spouse              Child             Female            Male
Alcohol History           1.316 (0.09)
Smoking History           0.0000 (0.5)
Residence on Farm/Ranch   1.625 (0.0526)
Demographic               1.899 (0.0294)      1.968 (0.0239)      0.556 (0.2912)    2.706 (0.0035)    0.000 (0.5)
Insecticide Usage         9.347 (<0.0001)
Herbicide Usage           0.0000 (0.5)
Fungicide Usage           0.657 (0.2578)
Fumigant Usage            1.368 (0.0869)
Study Related             5.221 (<0.0001)     3.808 (<0.0001)     2.098 (0.0183)    0.702 (0.2420)    3.374 (0.004)
All Questions             4.599 (<0.0001)     3.340 (0.0004)      1.708 (0.0446)    1.385 (0.0838)    1.063 (0.1446)

Note: Z-tests are not reported for the individual questions of the subset populations due to the small size of the sample.


Table 3: Power of Proxy Agreement to Subject Responses (a)

                               Proxy Relation    Gender of Proxy
                      All      Spouse   Child    Female   Male
Alcohol History       1.000    1.000    0.776    1.000    0.050
Smoking History       0.050    0.050    0.050    0.050    1.000
Farm/Ranch            1.000    1.000    0.050    1.000    0.050
Demographic           1.000    1.000    0.816    1.000    0.050
Insecticide Usage     1.000    1.000    0.654    0.974    0.804
Herbicide Usage       0.050    0.050    1.000    0.000    1.000
Fungicide Usage       0.816    1.000    0.050    0.826    0.050
Fumigant Usage        1.000    0.964    0.000    1.000    1.000
Study Related         1.000    1.000    1.000    0.999    1.000
All Questions         1.000    1.000    1.000    1.000    1.000

a) Alpha-level set at 0.05.


Appendix 1

Demographic Questions

1) Did (you/your ____) ever drink alcoholic beverages at least 12 or more times in a single year before 1993? Yes No DK

2) Did (you/he/she) ever smoke 100 or more cigarettes during (your/his/her) lifetime before 1993? Yes No DK

3) Did (you/your ____) ever live or work on a farm or ranch before 1993? Yes No DK

Study-Related Questions

4) Before 1993, while (you/your ___) lived or worked on the farm(s), were insecticides ever used on livestock, farm crops, farm buildings, or lots? Insecticides are chemicals used for killing insects, for example, on livestock, farm crops, buildings, or lots. Yes No DK

5) Before 1993, while (you/your ___) lived or worked on the farm(s), were herbicides ever used? Herbicides are chemicals used to kill plants or retard plant growth, especially weeds. Yes No DK

6) Before 1993, while (you/your ___) lived or worked on the farm(s), were fungicides ever used? Fungicides are chemicals used to kill organisms that cause molds, mildews, rots, and smuts. Yes No DK


7) While (you/your ___) lived or worked on the farm(s), were fumigants or miticides ever used? Fumigants are gaseous substances or mixtures of substances that are used to kill insects, bacteria, or rodents. Miticides are substances used to control mites. Yes No DK


Appendix 2

Power Simulation Analysis 1:

%global alpha;
option spool;

/* Compute power over a range of sample sizes for one question. */
%macro getpower (df = , std = , ttle = , end = 100, byvar = 1, num = );

DATA POWER&num;
  format power 8.4;
  DO n = 1 TO &end BY &byvar;
    ZALPHA = PROBIT(&alpha);              /* lower-tail critical value, e.g. -1.645 at alpha = 0.05 */
    diff = &df;
    STDEV = &std;
    Z = zalpha + ((diff*sqrt(n))/stdev);
    power = probnorm(z);
    output;
  end;
run;

proc sort data = power&num;
  by power;
run;

/* Flag the first row of each distinct power value and attach labels. */
data power&num;
  length labl $50.;
  set power&num;
  by power;

  power&num = power;
  labl = "&ttle";
  if first.power then flag1 = 1;

  label zalpha = "Standard Normal Variable Associated with 5% level"
        diff = "Mean Difference"
        STDEV = "Standard Deviation"
        Z = "Z (1-Beta)"
        power = "Statistical Power"
        power&num = "&ttle";
run;

ods rtf;
proc printto file = "C:\Users\Karen\Documents\Thesis\Analysis_Data\DataOut\simulations\&ttle&atitle..rtf" new;
proc print label data = power&num;
  title "&ttle at alpha level &alpha";
  var zalpha diff stdev z power;
  where flag1 = 1;
run;
ods rtf close;
%mend;

/* Run getpower for every question and question group at one alpha-level. */
%macro doalpha (alpha = , atitle = );
  %getpower (df = 0.067, std = 0.051, ttle = Alcohol History, num = 1);
  %getpower (df = 0.000, std = 0.049, ttle = Smoking History, end = 10000, byvar = 200, num = 2);
  %getpower (df = 0.121, std = 0.075, ttle = Farm-Ranch, num = 3);
  %getpower (df = 0.452, std = 0.048, ttle = Insecticide Usage, num = 4);
  %getpower (df = 0.000, std = 0.132, ttle = Herbicide Usage, end = 10000, byvar = 200, num = 5);
  %getpower (df = 0.100, std = 0.152, ttle = Fungicide Usage, num = 6);
  %getpower (df = 0.185, std = 0.135, ttle = Fumigant Usage, num = 7);
  %getpower (df = 0.166, std = 0.036, ttle = Overall, num = 8);
  %getpower (df = 0.066, std = 0.035, ttle = Demography Questions, num = 9);
  %getpower (df = 0.244, std = 0.047, ttle = Study-Related Questions, num = 10);

  data all&atitle keypower&atitle;
    set power1-power10;

    if 0.75 < power <= 1 then flag = 1;
    output all&atitle;
    if flag ne . then output keypower&atitle;
  run;

  proc sort data = keypower&atitle;
    by labl power;
  run;

  /* Keep the lowest flagged power per question. */
  data keypower_final&atitle;
    set keypower&atitle;
    by labl power;
    if first.labl;
  run;

  proc sort data = all&atitle;
    by labl n;
  run;
%mend doalpha;

%doalpha (alpha = 0.05, atitle = 5);
%doalpha (alpha = 0.01, atitle = 1);


Power Simulation Analysis 2:

%global alpha atitle ttle;
option spool;
options linesize = 55 pagesize = 50 orientation = portrait;

/* Compute power over a grid of mean differences and sample sizes for one question. */
%macro getpower (df = , std = , ttle = , end = 50, byvar = 1, num = );

DATA POWER&num;
  format power 8.4;
  do diff = 0.05 to 0.50 by .01;
    DO n = 1 TO &end BY &byvar;
      ZALPHA = PROBIT(&alpha);            /* lower-tail critical value */
      /* diff = &df;*/                    /* observed difference is not used in this simulation */
      STDEV = &std;
      Z = zalpha + ((diff*sqrt(n))/stdev);
      power = probnorm(z);
      output;
    end;
  end;
run;

proc sort data = power&num;
  by diff power;
run;

data power&num;
  length labl $50.;
  set power&num;
  by diff power;

  diff&num = diff;
  power&num = power;
  labl = "&ttle";
  if first.power then flag1 = 1;

  label zalpha = "Standard Normal Variable Associated with 5% level"
        diff = "Mean Difference"
        STDEV = "Standard Deviation"
        Z = "Z (1-Beta)"
        power = "Statistical Power"
        power&num = "&ttle"
        diff&num = "&ttle";
run;

ods rtf;
proc printto file = "C:\Users\Karen\Documents\Thesis\Analysis_Data\DataOut\simulations\&ttle&atitle._modifydiff.rtf" new;
proc print label data = power&num;
  title "&ttle at alpha level &alpha";
  var zalpha diff stdev z power;
  where flag1 = 1;
run;
ods rtf close;
%mend;

%macro doalpha (alpha = , atitle = );
  %getpower (df = 0.067, std = 0.051, ttle = Alcohol History, num = 1);
  %getpower (df = 0.000, std = 0.049, ttle = Smoking History, num = 2);
  %getpower (df = 0.121, std = 0.075, ttle = Farm-Ranch, num = 3);
  %getpower (df = 0.452, std = 0.048, ttle = Insecticide Usage, num = 4);
  %getpower (df = 0.000, std = 0.132, ttle = Herbicide Usage, num = 5);
  %getpower (df = 0.100, std = 0.152, ttle = Fungicide Usage, num = 6);
  %getpower (df = 0.185, std = 0.135, ttle = Fumigant Usage, num = 7);
  %getpower (df = 0.166, std = 0.036, ttle = Overall, num = 8);
  %getpower (df = 0.066, std = 0.035, ttle = Demography Questions, num = 9);
  %getpower (df = 0.244, std = 0.047, ttle = Study-Related Questions, num = 10);

  data all&atitle keypower&atitle;
    set power1-power10;

    if power > 0.8000 then flag = 1;
    output all&atitle;
    if flag ne . then output keypower&atitle;
  run;

  proc sort data = keypower&atitle;
    by labl diff power;
  run;

  /* Keep the first row above 0.80 power for each question and difference. */
  data keypower_final&atitle;
    set keypower&atitle;
    by labl diff power;
    if first.diff;
  run;

  proc sort data = all&atitle;
    by labl diff n;
  run;

  /**/
  /*data graph_output;*/
  /* merge power1 power10;*/
  /* by n;*/
  /*run;*/
%mend doalpha;

%doalpha (alpha = 0.05, atitle = 5);
%doalpha (alpha = 0.01, atitle = 1);
