Dissertation


PHYSICAL STUDIES OF AIRBORNE POLLEN AND PARTICULATES UTILIZING MACHINE LEARNING

by

Xun Liu

APPROVED BY SUPERVISORY COMMITTEE:
David John Lary, Chair
Roderick Heelis
Robert Glosser
Lunjin Chen
Fabiano Rodrigues

Copyright © 2019
Xun Liu
All rights reserved

PHYSICAL STUDIES OF AIRBORNE POLLEN AND PARTICULATES UTILIZING MACHINE LEARNING

by

XUN LIU, BS, MS

DISSERTATION
Presented to the Faculty of The University of Texas at Dallas in Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY IN PHYSICS

THE UNIVERSITY OF TEXAS AT DALLAS
December 2019

ACKNOWLEDGMENTS

I would like to thank all those who supported my research and dissertation. Without their help, I would never have completed this dissertation. I must express my deepest appreciation to my advisor Dr. David Lary for his guidance in the past three years. My heartfelt gratitude also goes to his patience and tolerance for all those mistakes I had once made. His warm support and insightful instructions helped me out of many hard times during the process of this research.

I would also like to thank other members of my committee: Dr. Roderick Heelis, Dr. Robert Glosser, Dr. Lunjin Chen, and Dr. Fabiano Rodrigues for serving as my committee members and generously providing me with advice and comments. I am also grateful to all teammates in Dr. Lary's group: Daji Wu, Gebreab K. Zewdie and Lakitha Wijeratne, for their collaboration on my research.

October 2019

PHYSICAL STUDIES OF AIRBORNE POLLEN AND PARTICULATES UTILIZING MACHINE LEARNING

Xun Liu, PhD
The University of Texas at Dallas, 2019

Supervising Professor: David John Lary, Chair

This dissertation presents an approach for estimating the abundance of airborne pollen and particulates using a comprehensive description of the physical environment coupled with machine learning.
The aspects of the physical environment are characterized by eighty-five variables that quantify the physical state of the land surface and soil, and the physical state of the atmosphere. The physical environment of plants naturally affects their rate of maturing and pollen generation. Then, once the pollen is released, conditions such as wind speed affect how the pollen is dispersed. Machine learning is helpful for studying such a complex system. It allows us to 'learn by example', since at present we do not have a complete theoretical description, from first principles, of the entire system, from plant growth and development to the plants' full interaction with their physical environment. Machine learning also allows us to objectively highlight which physical parameters play a central role in determining the atmospheric abundance of the pollen, and hence the impact on human health.

Some key aspects of building a physical model of airborne particulates using machine learning that are explored in this dissertation include:

1. The collection of an appropriate and comprehensive training dataset that machine learning algorithms can learn from. This involves characterizing the appropriate temporal and spatial scales involved. Variograms were used to perform this analysis. Machine learning is an automated encapsulation of the scientific method, an automated paradigm for learning by example to build descriptive models that can be tested and iteratively improved.
2. Identifying the physical parameters which are the most appropriate input variables (or features) to build an accurate machine learning model. This is a key step in machine learning called feature engineering. Feature engineering can provide useful physical insights into the key drivers of the system being studied.
3. Providing a framework for updating the machine learning model as new observational data is collected.
This was done by providing a mini-batch training process that allows the machine learning model to be updated in almost real-time.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
  1.1 Airborne Particulates
    1.1.1 Particulate Matter in Different Sizes
    1.1.2 Chemical Composition and Source Apportionment
    1.1.3 Role in Global Environmental Change
    1.1.4 Particulate Matter and Human Health
  1.2 Airborne Ambrosia Pollen
    1.2.1 Ragweeds - Source of Airborne Pollen in North America
    1.2.2 Airborne Pollen Particles
    1.2.3 Ambrosia Pollen and Health
    1.2.4 Environment's effect on airborne pollen
  1.3 Summary
CHAPTER 2 OBSERVATIONS OF THE TEMPORAL CHANGES IN AMBROSIA (RAGWEED) POLLEN ABUNDANCE
  2.1 Previous Work
  2.2 Data
  2.3 Summary
CHAPTER 3 OBSERVATION OF THE TEMPORAL CHANGES IN PARTICULATE MATTER ABUNDANCE
  3.1 Optical Particle Counters
  3.2 In-situ Observation System
  3.3 Data from MINTS
CHAPTER 4 PHYSICAL INSIGHTS PROVIDED BY VARIOGRAMS
  4.1 Stochastic Process and Sampling
  4.2 Variogram and Kriging
    4.2.1 Variogram Definition Equation
    4.2.2 Kriging for Data Interpolation
    4.2.3 Practical Use of Variograms
  4.3 Variograms of Airborne Particulates
  4.4 Summary
CHAPTER 5 BUILDING EMPIRICAL PHYSICAL MODELS OF AIRBORNE PARTICULATES USING MACHINE LEARNING
  5.1 Introduction to Machine Learning
    5.1.1 Supervised Learning
    5.1.2 Unsupervised Learning
    5.1.3 Feature Engineering
  5.2 LASSO
    5.2.1 Algorithm
    5.2.2 Result
  5.3 Neural Networks
    5.3.1 Algorithm
    5.3.2 Result
  5.4 Ensembles of Decision Trees
    5.4.1 Decision Trees
    5.4.2 Random Forest
    5.4.3 Advantages of a Random Forest
    5.4.4 Estimating Ambrosia Pollen Using Random Forests
  5.5 Summary of Pollen Estimation Using Machine Learning
  5.6 Machine Learning Inter-comparison for PM2.5
CHAPTER 6 SUMMARY
  6.1 Conclusion
  6.2 Future Direction
APPENDIX ENVIRONMENTAL VARIABLES USED IN POLLEN ESTIMATION
REFERENCES
BIOGRAPHICAL SKETCH
CURRICULUM VITAE

LIST OF FIGURES

1.1 Size comparison for PM particles, GNU free documentation license from EPA public knowledge.
1.2 Airborne particulate size distribution chart, GNU free documentation license from Wikipedia.
1.3 PM2.5 source and chemical composition apportionment at multiple Chinese sites during 2013 (Huang et al., 2014).
1.4 Chemical composition and source apportionment comparison between PM10 and PM2.5.
1.5 Particulates' direct and indirect effects on the global climate system.
1.6 Percentages of risk factors on attributable deaths in 2013 (Bank, 2016).
1.7 A schematic showing the Ambrosia life-cycle.
2.1 Correlation of the model-predicted pollen concentrations with observed validation data for 2013. Plotted based on equation 2.1, using data from (Howard and Levetin, 2014; Bosilovich et al., 2006).
2.2 Example seasonal pollen data for 1986, 1987 and 1988.
2.3 Averaged 1986-2014 Pollen Data in Flowering Season.
3.1 Optical Particle Counters.
3.2 Schematic of MINTS sensors.
3.3 Particulate Time Series Data in Chattanooga, 08.02.2018.
3.4 Particulate in size of 0.75 - 1.7 µm Time Series Data from August 10th to 12th.
4.1 A spherical variogram fit.
4.2 Covariance function as a function of data pair separation.
4.3 Significance of variogram nugget and range. The range characterizes the spatial scale beyond which the data are no longer correlated, so it is a useful way to determine the spatial and/or temporal scales of our data. The nugget (the variogram at zero separation) characterizes the experimental error in our observations.
4.4 The lower panel shows an example temporal variogram for observed PM2.5. The units of the variogram are the same as the units of variance, in this case of PM2.5. The upper panel shows how many observations are in each lag time bin.
4.5 (a) Observed PM2.5 time series in µg/cm⁻³ (shown in red). Values are recorded every two seconds, then a one-hour moving time window centered on each time point is considered. For this one-hour time window we calculate the representativeness uncertainty, σrep. The green lines either side of the observed PM2.5 indicate this representativeness uncertainty. (b) The representativeness uncertainty over a one-hour moving time window in µg/cm⁻³. Note that the representativeness uncertainty is a significant fraction of the observed PM2.5. (c) A histogram of the range of each variogram. A separate variogram is calculated from all observations within the one-hour moving time window centered on each two-second observation. The most frequent range (time-scale) for this time series is a lag of 9 minutes. So ideally an observation should be reported every few minutes so that this dominant time-scale of temporal variations can be adequately resolved. (d) A histogram of the fractional representativeness uncertainty, σrep, for the entire time series.
4.6 (a) Observed PM10 time series in µg/cm⁻³ (shown in red). Values are recorded every two seconds, then a one-hour moving time window centered on each time point is considered. For this one-hour time window we calculate the representativeness uncertainty, σrep. The green lines either side of the observed PM10 indicate this representativeness uncertainty. (b) The representativeness uncertainty over a one-hour moving time window in µg/cm⁻³. Note that the representativeness uncertainty is a significant fraction of the observed PM10. (c) A histogram of the range of each variogram. A separate variogram is calculated from all observations within the one-hour moving time window centered on each two-second observation.
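
The temporal variogram referred to in the captions of Figures 4.4 to 4.6 can be illustrated with a minimal sketch. This is not the dissertation's own code: it assumes a regularly sampled series (for example, one value every two seconds) and uses the standard empirical semivariogram estimator, gamma(h) = sum((z[i+h] - z[i])^2) / (2 N(h)), where N(h) is the number of pairs separated by h samples. The function name `empirical_variogram` and its arguments are hypothetical, chosen only for illustration.

```python
def empirical_variogram(values, max_lag):
    """Empirical semivariogram of a regularly sampled series (illustrative sketch).

    For each lag h = 1..max_lag (in units of the sampling interval), returns
    gamma(h) = sum((z[i+h] - z[i])**2) / (2 * N(h)),
    together with N(h), the number of data pairs at that lag.
    """
    z = [float(v) for v in values]
    n = len(z)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(values)")

    lags, gamma, pair_counts = [], [], []
    for h in range(1, max_lag + 1):
        # Squared differences of all pairs separated by exactly h samples.
        squared_diffs = [(z[i + h] - z[i]) ** 2 for i in range(n - h)]
        lags.append(h)
        pair_counts.append(len(squared_diffs))
        gamma.append(0.5 * sum(squared_diffs) / len(squared_diffs))
    return lags, gamma, pair_counts


if __name__ == "__main__":
    # Synthetic linear ramp, used only to exercise the function.
    lags, gamma, pair_counts = empirical_variogram(range(10), 3)
    for h, g, c in zip(lags, gamma, pair_counts):
        print(f"lag={h} gamma={g} pairs={c}")
```

The pair counts play the role of the upper panel described for Figure 4.4 (how many observations fall in each lag bin). Estimating the range and nugget would require fitting a variogram model, such as the spherical model named in Figure 4.1, to the returned values; that fitting step is not shown here.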