Statistical Methods in Disease Risk Analysis, Disease Testing and Nutrition Epidemiology Hui Lin Iowa State University

Statistical Methods in Disease Risk Analysis, Disease Testing and Nutrition Epidemiology Hui Lin Iowa State University

Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2012 Statistical methods in disease risk analysis, disease testing and nutrition epidemiology Hui Lin Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Statistics and Probability Commons Recommended Citation Lin, Hui, "Statistical methods in disease risk analysis, disease testing and nutrition epidemiology" (2012). Graduate Theses and Dissertations. 13380. https://lib.dr.iastate.edu/etd/13380 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Statistical methods in disease risk analysis, disease testing and nutrition epidemiology by Hui Lin A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Statistics Program of Study Committee: Alicia Carriquiry, Co-major Professor Chong Wang, Co-major Professor Kenneth Koehler Dan Nordman Derald Holtkamp Iowa State University Ames, Iowa 2013 Copyright c Hui Lin, 2013. All rights reserved. ii TABLE OF CONTENTS LIST OF TABLES . v LIST OF FIGURES . vii ACKNOWLEDGEMENTS . x ABSTRACT . xi CHAPTER 1. GENERAL INTRODUCTION . 1 CHAPTER 2. CONSTRUCTION OF DISEASE RISK SCORING SYS- TEMS USING LOGISTIC GROUP LASSO: APPLICATION TO PORCINE REPRODUCTIVE AND RESPIRATORY SYNDROME SURVEY DATA 3 2.1 Introduction . .4 2.2 Models for risk scoring systems . .6 2.2.1 Multivariate logistic regression model . .6 2.2.2 Group lasso for logistic regression . .7 2.3 Application to PRRS Data . .9 2.3.1 Data Description . .9 2.3.2 Application of logistic group lasso . 10 2.3.3 Results . 11 2.3.4 Comparison among risk scoring systems . 14 2.4 Simulation Study . 14 2.5 Discussion . 16 iii CHAPTER 3. EXACT AND ASYMPTOTIC STATISTICAL TESTS FOR DIFFERENCE IN PROPORTIONS OF ONE-TO-TWO MATCHED BI- NARY VARIABLES . 20 3.1 Introduction . 20 3.2 Basic concepts, Terminology and Notation . 22 3.2.1 Miettinen Exact Test . 23 3.3 Random Exact Test . 24 3.3.1 Test Statistic . 24 3.3.2 Power of the Random Exact Test . 25 3.4 Asymptotic Test . 26 3.5 Simulation . 27 3.6 Application Examples . 30 3.6.1 Dual Sample Pooling Test . 30 3.6.2 Pen-based oral fluid specimens for influenza A virus detection . 32 3.7 Discussion and Conclusion . 34 CHAPTER 4. MEASUREMENT ERROR IN A BIVARIATE MODEL { APPLICATION IN NUTRITION EPIDEMIOLOGY . 36 4.1 Introduction . 37 4.2 Bivariate random measurements with error in one margin . 39 4.2.1 Deconvolution estimator of fX2 (x2)..................... 41 4.2.2 A copula approach to conditional density estimation . 43 4.3 Simulation study . 45 4.4 Discussion . 50 CHAPTER 5. GENERAL CONCLUSIONS . 55 APPENDIX A. ADDITIONAL MATERIAL . 58 A.1 Tables for difference parameterization . 58 A.2 Generalize to One to More Matched Test . 59 A.2.1 Exact Binomial Test . 59 iv A.2.2 Asymptotic Test . 60 BIBLIOGRAPHY . 62 v LIST OF TABLES Table 2.1 Summary of number of questions in the final risk scoring system by category of risk factors . 18 Table 2.2 AUC estimations for three risk scoring systems . 18 Table 2.3 Simulation study result with various values of coefficient γ. Reported are mean and standard deviation of AUC for both methods, mean difference and p value from Wilcoxon signed rank test. 19 Table 3.1 Outcome for Subject j ........................... 23 Table 3.2 Counting Table for n Sets of Observations . 23 3.3 Counting Table for Dual Pooling Test . 32 Table 3.4 Counting Table for influenza A virus detection . 33 Table 4.1 Moments of the distributions of the target values x2i, deconvolution ∗ estimates x2i and contaminated observations X2ij for different sample sizes and error distributions. 50 Table 4.2 Percentiles of the ratio x2 under the three correlation structures. The x1 measurement error distribution is N(0,0.5) and the size is 200 subjects o with 7 replicates each.r ^k is estimated ratio; rk is the true ratio; rk is the observed ratio with measurement error, and k indicates the corre- sponding correlation structure. 53 4.3 Percentiles of the ratio x2 under three correlation structures. The mea- x1 surement error distribution is N(0,0.5) and the size is 350 subjects with o 4 replicates each.r ^k is estimated ratio; rk is the true ratio; rk is the observed ratio with measurement error. 54 vi Table A.1 Table for Setting 1, δ =0.......................... 58 Table A.2 Table for Setting 2, δ =0.......................... 58 Table A.3 Table for Setting 3, δ =0.......................... 58 Table A.4 Table for Setting 4, δ =0.......................... 59 A.5 Outcome for Subject j for 1 to L Matched Test . 59 A.6 Counting Table for N Sets of Observations for 1 to L Matched Test . 59 vii LIST OF FIGURES Figure 2.1 Three criteria for choice of penalty parameter λ ............. 12 Figure 2.2 Distributions of estimated probabilities for both negative and positive groups . 13 Figure 2.3 ROC curves for three risk scoring systems . 15 Figure 3.1 Comparison of exact test power and asymptotic test power for setting 1 of one-to-two case. The numbers in the legend indicate the sample size. AsyTest indicates asymptotic test; Exact indicates exact test. 28 Figure 3.2 Comparison of exact test power and asymptotic test power for setting 2 of one-to-two case. The numbers in the legend indicate the sample size. AsyTest indicates asymptotic test; Exact indicates exact test. 29 Figure 3.3 Comparison of exact test power and asymptotic test power for setting 3 of one-to-two case. The numbers in the legend indicate the sample size. AsyTest indicates asymptotic test; Exact indicates exact test. 30 Figure 3.4 Comparison of exact test power and asymptotic test power for setting 4 of one-to-two case. The numbers in the legend indicate the sample size. AsyTest indicates asymptotic test; Exact indicates exact test. 31 Figure 3.5 Cumulative distribution functions of the fuzzy p-values for Dual Sample Pooling Test based on 2000 iterations . 33 Figure 3.6 Cumulative distribution functions of the fuzzy p-values for Pen-based oral fluid specimens for influenza A virus detection based on 2000 iterations 34 2 4.1 Joint distribution for simulated x1i and x2i; top-left : x1ijx2i ∼ χ (x2i); p Γ(3;5) top-right: x1ijx2i ∼ Γ(5; 1) + x2i; bottom: x1ijx2i ∼ e + sin(x2i) 46 viii 4.2 Errors are ∼ N(0; 0:5). Black solid curve is the average (over 15 reps) of the true density of x2; blue dotted curve is the average of the naive density estimator, ignoring measurement error; red dashed curve is the average of the deconvolution estimator. The left panel corresponds to the case where n = 200 and r = 7 and the right panel corresponds to the case where n = 350 and r =4...................... 48 4.3 Errors are ∼ t(3). Black solid curve is the average (over 15 reps) of the true density of x2; blue dotted curve is the average of the naive density estimator, ignoring measurement error; red dashed curve is the average of the deconvolution estimator. The left panel corresponds to the case where n = 200 and r = 7 and the right panel corresponds to the case where n = 350 and r =4. ......................... 49 2 4.4 Density of the ratio x2=x1 with x1ijx2i ∼ χ (x2i). Left panel corresponds to n = 200 subjects with r = 7 independent replicates each; right panel corresponds to n = 350 subjects with r = 4 replicates. The black solid curve is the true density; the blue dotted curve is the density of the observed ratio (ignoring measurement error); the red dashed curve is obtained using the deconvolution estimate of f(x2). 51 p 4.5 Density of the ratio x2=x1 with x1ijx2i ∼ Γ(5; 1) + x2i. Left panel corresponds to n = 200 subjects with r = 7 independent replicates each; right panel corresponds to n = 350 subjects with r = 4 replicates. The black solid curve is the true density; the blue dotted curve is the density of the observed ratio (ignoring measurement error); the red dashed curve is obtained using the deconvolution estimate of f(x2). 52 ix Γ(3;5) 4.6 Density of the ratio x2=x1 with x1ijx2i ∼ e + sin(x2i). Left panel corresponds to n = 200 subjects with r = 7 independent replicates each; right panel corresponds to n = 350 subjects with r = 4 replicates. The black solid curve is the true density; the blue dotted curve is the density of the observed ratio (ignoring measurement error); the red dashed curve is obtained using the deconvolution estimate of f(x2). 53 x ACKNOWLEDGEMENTS I would like to take this opportunity to express my thanks to those who helped me with various aspects of conducting research and writing this thesis. First and foremost, Dr. Alicia Carriquiry and Dr. Chong Wang for their guidance, patience and support throughout this research and the writing of this thesis. I would additionally like to thank Dr. Derald Holtkamp for his help and encouragement throughout my graduate study. I would also like to thank Dr. Kenneth Koehler and Dr. Dan Nordman for their effort and time. I would also like to thank my friends and family for their loving guidance and financial assistance during my school life. Without those people I would not have been able to complete this work.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    79 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us