© Copyright 2017 Nathanial E. Watson

© Copyright 2017 Nathanial E. Watson Development of Instrumental and Computational Methods for Accessing Information in Multi-Dimensional Gas Chromatography with Mass Spectrometry Nathanial E. Watson A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2017 Reading Committee: Robert E. Synovec, Chair Matthew F. Bush Ashleigh B. Theberge Program Authorized to Offer Degree: Chemistry University of Washington Abstract Development of Instrumental and Computational Methods for Accessing Information in Multi-Dimensional Gas Chromatography with Mass Spectrometry Nathanial E. Watson Chair of the Supervisory Committee: Professor Robert E. Synovec Department of Chemistry Three instrumental and computational methods are demonstrated in an endeavor to create novel techniques to extract information from the troves of data generated by multi-dimensional gas chromatography with mass spectrometry. Initially, these methods are considered within the context of targeted and non-targeted experimental design. The tile-based Fisher Ratio with null distribution analysis is first evaluated and validated within the non-targeted realm. The method is shown to be fast and accurate. Forty-six of the fifty-four benchmarked changing metabolites previously discovered were found by the new methodology while consistently excluding all but one of the benchmarked nineteen false positive metabolites previously identified. This was achieved in less than 5% of the time required for the previous method. Later, the three- dimensional gas chromatograph is improved to include mass spectrometric detection. This instrument provides four dimensions (4D) of chemical selectivity and includes significant improvements to total selectivity (mass spectrometric and chromatographic), peak identification, and operational temperature range relative to previous models of the GC3 reported. Useful approaches to visualize the 4D data are presented. The GC3 - TOFMS instrument experimentally achieved total peak capacity, nc,3D, ranging from 5000 to 9600 (x̅ = 7000, s = 1700) for 10 representative analytes for 50 min separations with component dimensional peak capacities averaging 406, 3.6, and 4.9 for 1D, 2D, and 3D, respectively. Using this instrument and the well understood Parallel Factor Analysis (PARAFAC) model a new option for targeted analysis is presented. Conceptualizing the GC3 - TOFMS as a one-dimensional gas chromatograph with GC × GC-TOFMS detection the instrument was allowed to create the PARAFAC target window natively. Each first dimension modulation thus created a full GC × GC-TOFMS chromatogram totally amenable to PARAFAC. A simple mixture of 115 compounds and a diesel sample were interrogated through this methodology. All test analyte targets were successfully identified in both mixtures. In addition, mass spectral matching of the PARAFAC loadings to library spectra yielded results greater than 900 in 40 of 42 test analyte cases. Twenty-nine of these cases produced match values greater than 950. TABLE OF CONTENTS List of Figures ................................................................................................................................ iv List of Tables ................................................................................................................................. vi Chapter 1. Introduction to Data Analysis in Multi-Dimensional Gas Chromatography and Mass Spectrometry ................................................................................................................................... 1 1.1 Scope ............................................................................................................................... 1 1.2 Data Management and Analysis of Variance .................................................................. 3 1.3 Deconvolution ................................................................................................................. 4 1.4 Classification and Comparison ....................................................................................... 8 1.5 Statistical Presentation of Useful Information ................................................................ 9 1.5.1 Targeted Analysis ....................................................................................................... 9 1.5.2 Non-targeted Analysis .............................................................................................. 10 1.5.3 Hybrid Targeted/Non-targeted Analysis ................................................................... 11 1.6 Hypotheses .................................................................................................................... 12 1.6.1 Chapter 2: Performance Evaluation of Tile-Based Fisher Ratio Analysis using a Benchmark Yeast Metabolome Dataset ................................................................................ 12 1.6.2 Chapter 3: Comprehensive Three-Dimensional Gas Chromatography with Time-of- Flight Mass Spectrometry ..................................................................................................... 13 1.6.3 Chapter 4: Targeted Analyte Deconvolution and Identification by Four-Way Parallel Factor Analysis Using Three-Dimensional Gas Chromatography with Mass Spectrometry 13 1.7 References ..................................................................................................................... 13 i Chapter 2. Performance Evaluation of Tile-Based Fisher Ratio Analysis using a Benchmark Yeast Metabolome Dataset ........................................................................................................... 18 2.1 Introduction ................................................................................................................... 18 2.2 Experimental ................................................................................................................. 24 2.2.1 Sample Prep and Data Collection ............................................................................. 24 2.2.2 Data Analysis ............................................................................................................ 26 2.3 Results and Discussion ................................................................................................. 28 2.4 Conclusions ................................................................................................................... 42 2.5 References ..................................................................................................................... 43 Chapter 3. Comprehensive Three-Dimensional Gas Chromatography with Time-of-Flight Mass Spectrometry ................................................................................................................................. 47 3.1 Introduction ................................................................................................................... 47 3.2 Basic Principles and Instrumental Design .................................................................... 50 3.3 Experimental ................................................................................................................. 52 3.4 Results and discussion .................................................................................................. 56 3.5 Conclusions ................................................................................................................... 63 3.6 References ..................................................................................................................... 63 Chapter 4. Targeted Analyte Deconvolution and Identification by Four-Way Parallel Factor Analysis Using Three-Dimensional Gas Chromatography with Mass Spectrometry Data .......... 67 4.1 Introduction ................................................................................................................... 67 4.2 Experimental ................................................................................................................. 70 4.2.1 Data Collection ......................................................................................................... 70 ii 4.2.2 Data Analysis ............................................................................................................ 72 4.3 Results and Discussion ................................................................................................. 73 4.4 Conclusion .................................................................................................................... 83 4.5 References ..................................................................................................................... 84 Chapter 5. Conclusion ................................................................................................................... 87 Bibliography ................................................................................................................................. 90 iii LIST OF FIGURES Figure 1.1. Targeted Analysis ............................................................................................. 2 Figure 1.2. Non-targeted Analysis ...................................................................................... 2 Figure 2.1. Representative (A) GC × GC‒TOFMS chromatogram of m/z 73 and zoom (B) to region in box. ............................................................................................................ 24 Figure 2.2. Extract of the cystathionine chromatographic

© Copyright 2017 Nathanial E. Watson

Proquest Dissertations

A Notation and Definitions

Ryan Tibshirani

High-Dimensional and Causal Inference by Simon James

Pcredux Package - an Overview Stefan Rödiger, Michał Burdukiewicz, Andrej-Nikolai Spiess 2017-11-20

Mathematics People

Arxiv:1905.02086V1 [Stat.CO] 6 May 2019 K Arise When Looking for the Optimal Regularization Parameter in Q X Sˆg(Q) = Rt(Qi + L)−1R

Lester Mackey

T. Hastie, R. Tibshirani, J. Friedman the Elements of Statistical Learning Data Mining, Inference, and Prediction, Second Edition

The Lasso: an Application to Cancer Detection and Some New Tools for Selective Inference

Downloaded from the Gene Expression Omnibus (GEO) Database Under the Accession Number GSE25219

Additive Logistic Regression: a Statistical View of Boosting