REAL-TIME WAVELET COMPRESSION AND SELF-MODELING CURVE
RESOLUTION FOR ION MOBILITY SPECTROMETRY

A dissertation presented to the faculty of the College of Arts and Sciences of Ohio University

In partial fulfillment of the requirements for the degree
Doctor of Philosophy

Guoxiang Chen
March 2003
This dissertation entitled
REAL-TIME WAVELET COMPRESSION AND SELF-MODELING CURVE
RESOLUTION FOR ION MOBILITY SPECTROMETRY

BY
GUOXIANG CHEN

has been approved for the Department of Chemistry and Biochemistry and the College of Arts and Sciences by

Peter de B. Harrington
Associate Professor of Chemistry and Biochemistry

Leslie A. Flemming
Dean, College of Arts and Sciences
CHEN, GUOXIANG. Ph.D. March 2003. Analytical Chemistry.
Real-Time Wavelet Compression and Self-Modeling Curve Resolution for Ion Mobility Spectrometry (203 pp.)
Director of Dissertation: Peter de B. Harrington
Chemometrics has proven useful for solving chemistry problems. Most chemometric methods are applied in post-run analyses, in which data are processed after they have been collected and archived. However, many applications require real-time processing so that information about complex chemical systems is obtained as the data are acquired. Moreover, real-time chemometrics can eliminate the burden of storing the large amounts of raw data that post-run analyses require. These attributes are important for the construction of portable intelligent instruments.
Ion mobility spectrometry (IMS) furnishes inexpensive, sensitive, fast, and portable sensors with a wide variety of potential applications. SIMPLe-to-use Interactive Self-modeling Mixture Analysis (SIMPLISMA) is a self-modeling curve resolution method that has been demonstrated to be an effective tool for enhancing IMS measurements. However, all previously reported studies have applied SIMPLISMA as a post-run tool.
A modified SIMPLISMA algorithm, referred to as RTSIMPLISMA, was developed for modeling IMS data in real time. The real-time algorithm determines the number of components in the IMS data automatically. Resolved concentration and spectral profiles are displayed simultaneously on a virtual instrument while the data are collected from an ion mobility spectrometer.
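The pure-variable idea behind SIMPLISMA can be illustrated with a minimal Python/NumPy sketch of conventional determinant-based SIMPLISMA (the SIMPLISMA-det variant named in the List of Figures), not of RTSIMPLISMA itself. The function name `simplisma_det` is hypothetical, the number of components is fixed rather than determined automatically, and the offset is assumed to be α times the maximum mean intensity. Each candidate variable is scored by its purity σ/(μ + offset), weighted by the determinant of the correlation-around-origin matrix of the already selected pure variables plus the candidate; concentration profiles are then the pure-variable intensities, and spectra follow by least squares.

```python
import numpy as np


def simplisma_det(D, n_comp, alpha=0.05):
    """Sketch of determinant-based SIMPLISMA (hypothetical helper).

    D : (n_spectra, n_variables) array, one spectrum per row.
    Returns pure-variable indices, concentration profiles, component spectra.
    """
    n = D.shape[0]
    mu = D.mean(axis=0)
    sigma = D.std(axis=0)
    offset = alpha * mu.max()             # assumption: offset as a fraction of the max mean
    lam = np.sqrt(mu**2 + (sigma + offset) ** 2)
    Dl = D / lam                          # length-scaled data
    R = Dl.T @ Dl / n                     # correlation-around-origin matrix
    purity = sigma / (mu + offset)
    pure = []
    for _ in range(n_comp):
        # weight: determinant of the correlation submatrix of selected + candidate variables
        weights = np.array([
            np.linalg.det(R[np.ix_(pure + [j], pure + [j])])
            for j in range(D.shape[1])
        ])
        pure.append(int(np.argmax(weights * purity)))
    conc = D[:, pure]                     # concentration profiles from pure variables
    spectra = np.linalg.pinv(conc) @ D    # least-squares component spectra
    return pure, conc, spectra


# Synthetic two-component system with non-overlapping bands at variables 30 and 70
x = np.arange(100)
s1 = np.exp(-((x - 30) ** 2) / 20)
s2 = np.exp(-((x - 70) ** 2) / 20)
c1 = np.linspace(1, 0, 50)
c2 = 1 - c1
D = np.outer(c1, s1) + np.outer(c2, s2)
pure, conc, spectra = simplisma_det(D, 2)
print(sorted(pure))  # expected: [30, 70]
```

The determinant weight is what drives the method toward mutually independent variables: a candidate that is proportional to an already selected pure variable makes the submatrix singular and receives a weight of zero.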
The computational burden of real-time SIMPLISMA increases as the number of collected spectra grows. A spectrum will not be acquired if data processing consumes too large a share of computer resources. To alleviate this problem, two-dimensional wavelet compression (WC²) was applied prior to RTSIMPLISMA modeling. Optimal WC²-RTSIMPLISMA settings for processing IMS data were determined; with these settings, satisfactory models could be resolved when the data were compressed to 1/256 of their original size.
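How two-dimensional compression reaches a factor of 256 can be shown with a short sketch. The dissertation's wavelet filters (for example, the daublet 14 and daublet 4 pair listed for Figure 4.11) are not reproduced; the Haar wavelet is swapped in here for brevity. The helper names `haar_smooth` and `wc2` and the 1024 × 512 matrix size are illustrative assumptions. Only the smooth (low-pass) coefficients are kept, so four pyramid levels along the spectrum axis and four along the drift-time axis reduce the data by 2⁴ × 2⁴ = 256.

```python
import numpy as np


def haar_smooth(a, levels, axis):
    """Keep the Haar smooth coefficients after `levels` forward pyramid steps
    along `axis`; detail coefficients are discarded (hypothetical helper)."""
    a = np.moveaxis(np.asarray(a, dtype=float), axis, 0)
    for _ in range(levels):
        if a.shape[0] % 2:
            raise ValueError("axis length must be divisible by 2**levels")
        # one pyramid step: low-pass filter (1/sqrt2, 1/sqrt2), then downsample by 2
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return np.moveaxis(a, 0, axis)


def wc2(data, spectrum_levels, drift_levels):
    """Sketch of 2D wavelet compression: compress along the spectrum
    (acquisition) axis, then along the drift-time axis."""
    return haar_smooth(haar_smooth(data, spectrum_levels, 0), drift_levels, 1)


rng = np.random.default_rng(0)
data = rng.random((1024, 512))       # assumed: 1024 spectra x 512 drift-time points
compressed = wc2(data, 4, 4)
print(compressed.shape)              # (64, 32)
print(data.size // compressed.size)  # 256
```

A self-modeling method can then operate on the 64 × 32 matrix. Because the Haar inverse with zero detail coefficients simply repeats each smooth coefficient divided by √2, models resolved from the compressed data can be expanded back to the original resolution.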
A novel real-time WC² algorithm has been developed to compress data as they are acquired from IMS sensors. RTSIMPLISMA was applied to the WC²-processed data in real time, which significantly accelerated the real-time modeling. An integrated software package was developed to implement the real-time WC²-RTSIMPLISMA algorithm and was used for the rapid processing of IMS data of drugs and explosives. The real-time algorithm was able to reveal very small features in the IMS data and to rapidly model the dynamic changes during the course of an IMS measurement.
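Compression along the acquisition axis can be performed while spectra arrive, as the following sketch suggests. It again assumes the Haar wavelet rather than the filters used in the dissertation, and the class name `StreamingHaarCompressor` is hypothetical. For Haar, L cascaded smoothing steps collapse into a single FIR filter with 2^L equal taps of 2^(−L/2), followed by downsampling by 2^L, so every block of 2^L incoming spectra yields one compressed spectrum. Incoming spectra are written into a fixed-size circular buffer through a write pointer (named `p_n` here by analogy with the pointer P_n of Figure 5.1). Longer filters would require overlapping blocks and a separate read pointer; that case is not shown.

```python
import numpy as np


class StreamingHaarCompressor:
    """Hypothetical real-time compressor along the acquisition axis.

    Spectra are written into a circular buffer of 2**levels rows; each time
    the buffer fills, one compressed spectrum is emitted.
    """

    def __init__(self, n_points, levels):
        self.block = 2 ** levels
        # L cascaded Haar smoothing steps == one FIR filter with 2**L equal taps
        self.coef = 2.0 ** (-levels / 2)
        self.buffer = np.zeros((self.block, n_points))
        self.p_n = 0  # write pointer: slot for the next incoming spectrum

    def push(self, spectrum):
        """Store one spectrum; return a compressed spectrum or None."""
        self.buffer[self.p_n] = spectrum
        self.p_n = (self.p_n + 1) % self.block
        if self.p_n == 0:  # a full block of 2**levels spectra is available
            return self.coef * self.buffer.sum(axis=0)
        return None


rng = np.random.default_rng(1)
data = rng.random((1024, 512))  # simulated stream of 1024 spectra
comp = StreamingHaarCompressor(n_points=512, levels=4)
rows = [out for out in (comp.push(s) for s in data) if out is not None]
compressed = np.vstack(rows)    # 64 compressed spectra of 512 points each
```

The streamed result matches a batch Haar compression of the same 1024 spectra, so downstream modeling only ever touches one sixteenth of the rows, and the raw spectra need not be retained once a block has been emitted.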

Approved: Peter de B. Harrington
Associate Professor of Chemistry and Biochemistry
Acknowledgments
I would like to thank my research advisor, Dr. Peter de B. Harrington, for his invaluable support and guidance during my stay at Ohio University. This dissertation could not have been written and the research could not have been accomplished without his help. I would also like to thank my dissertation committee members, Drs. Gary W. Small, Howard D. Dewald, Martin T. Tuck, Wen-jia R. Chen, and Xiaozhuo Chen, for their great help in my academic progress and research pursuits. Paul Schmittauer is thanked for his assistance with electronics.
I would like to thank the Department of Chemistry and Biochemistry at Ohio University for offering me the opportunity to conduct my doctoral research. The Center for Intelligent Chemical Instrumentation at Ohio University is thanked for supporting the conference trips. Ohio University is thanked for the support of the Donald R. Clippinger Fellowship. The US Army ERDEC, GeoCenters, and Ion Track Instruments are thanked for the partial support of this research. Metara Inc. is thanked for supporting me in writing this dissertation while I was working there. Dr. Willem Windig at Eigenvector Research Inc. is thanked for permission to use his spectral data files and MATLAB scripts.
I would also like to thank the members of Dr. Harrington’s research group for their helpful suggestions. Special thanks are given to Libo Cao for her consistent help over the years. Dr. Tricia L. Buxton Derringer is also thanked for the bacterial data set.
I would like to thank Zhuo Chen for her love, encouragement, and valuable support.
I would like to thank my father and the other family members who are always caring and supportive in my life.
Table of Contents

Abstract
Acknowledgments
List of Tables
List of Figures
List of Abbreviations

Chapter 1 Introduction
  1.1 General Statement
  1.2 Ion Mobility Spectrometry
  1.3 Self-Modeling Curve Resolution
  1.4 Data Compression
  1.5 The Research Objectives

Chapter 2 SIMPLISMA and Wavelet Transform
  2.1 SIMPLISMA
  2.2 Wavelet Transform

Chapter 3 Real-Time Self-Modeling Mixture Analysis
  3.1 Introduction
  3.2 Theory
  3.3 Experimental Section
  3.4 Results and Discussion
  3.5 Conclusions

Chapter 4 RTSIMPLISMA Applied to Two-Dimensional Wavelet Compressed Ion Mobility Data
  4.1 Introduction
  4.2 Theory
  4.3 Experimental Section
  4.4 Results and Discussion
    4.4.1 Conventional SIMPLISMA Models
    4.4.2 Optimization of WC²-RTSIMPLISMA
    4.4.3 RTSIMPLISMA Applied to Windig Standard Data Sets
  4.5 Conclusions

Chapter 5 Real-Time Two-Dimensional Wavelet Compression and Its Application to Real-Time Self-Modeling of IMS Data
  5.1 Introduction
  5.2 Theory
  5.3 Experimental Section
  5.4 Results and Discussion
    5.4.1 Time Performance of Real-Time WC²-RTSIMPLISMA
    5.4.2 Enhanced IMS Measurement by Real-Time WC²-RTSIMPLISMA
    5.4.3 Real-Time Self-Modeling of IMS Data of Explosives
    5.4.4 Internal Reference Method for Real-Time WC²-RTSIMPLISMA
  5.5 Conclusions

Chapter 6 Summary and Future Work

References
Appendix A: Publications
Appendix B: Presentations
Appendix C: MATLAB Scripts
List of Tables

Table 2.1 The concentrations of compounds A and B during the reaction course.
Table 3.1 Time performances of different methods and batch number R for processing 550 spectra.
Table 4.1 Contribution of wavelet type and compression level to the variation of RRMSES and RRMSEC.
Table 4.2 Compression levels, compression factor (C.F.), percent correct n_c, average RRMSES, minimum RRMSES, and the corresponding wavelet type for different compression levels for the drug data set.
Table 4.3 Compression levels, percent correct n_c, average RRMSES, minimum RRMSES, and the corresponding wavelet type for different compression levels for the bacterial data set.
Table 5.1 Experimental setup for the data sets in Section 5.4.4. In the table, t_i is the time when the sample was inserted into the desorber; t_s is the time when the measurement stopped; n_s is the total number of spectra collected. The sample volume was 1 µL, and the sample disk was removed 5 s after it was inserted.
List of Figures

Figure 1.1 Schematic diagram of an ion mobility spectrometer.
Figure 2.1 The virtual spectra of a 1 mM aqueous solution of pure compound A (Panel A) and a 1 mM aqueous solution of pure compound B (Panel B).
Figure 2.2 Three-dimensional surface plot of the synthesized mixture spectra of the two-component virtual reaction system.
Figure 2.3 SIMPLISMA resolved spectra of compounds A (Panel A) and B (Panel B) with the number of components predefined as two (α = 0.05).
Figure 2.4 SIMPLISMA resolved concentration profiles of compounds A (Panel A) and B (Panel B) with the number of components predefined as two (α = 0.05).
Figure 2.5 SIMPLISMA model of the synthesized data set with the number of components predefined as three (α = 0.05).
Figure 2.6 Schematic of the multi-level operations of the pyramid WT algorithm with dyadic sampling.
Figure 2.7 Father and mother wavelets of daublet 4 (Panel A, 4 coefficients), daublet 14 (Panel B, 14 coefficients), coiflet 3 (Panel C, 18 coefficients), and symmlet 6 (Panel D, 12 coefficients).
Figure 2.8 Illustration of the multi-level operations of the pyramid algorithm for the forward WT using daublet 14.
Figure 2.9 Enlarged view of the smooth and detail parts of the WT spectrum at level 5 in Figure 2.8.
Figure 2.10 Reconstructed spectra from the smooth parts of the forward WT at levels 1 to 5, corresponding to Figure 2.8.
Figure 3.1 Structures of (A) diisopropyl methanephosphonate (DIMP) and (B) pinacolyl methyl phosphonofluoridate (soman).
Figure 3.2 The graphical user interface for real-time SIMPLISMA.
Figure 3.3 3D surface plot of the IMS data of ethanol.
Figure 3.4 RTSIMPLISMA resolved concentration profiles (Panel A) and component spectra (Panel B) for the ethanol data.
Figure 3.5 3D surface plot of the IMS data of DIMP. The data set was acquired from the CAM in positive mode.
Figure 3.6 RTSIMPLISMA resolved concentration profiles (Panel A) and component spectra (Panel B) for the DIMP data.
Figure 3.7 SIMPLISMA-det resolved concentration profiles (Panel A) and component spectra (Panel B) for the DIMP data.
Figure 3.8 RTSIMPLISMA model after processing 25 spectra of the DIMP data. Panel A presents the concentration profiles and Panel B presents the component spectra.
Figure 3.9 RTSIMPLISMA model after processing 40 spectra of the DIMP data.
Figure 3.10 RTSIMPLISMA model after processing 135 spectra of the DIMP data.
Figure 3.11 RTSIMPLISMA resolved model for the DIMP data with NPV threshold β₀ = 0.008.
Figure 3.12 RTSIMPLISMA model for the DIMP data with NPV threshold β₀ = 0.04.
Figure 3.13 RTSIMPLISMA model for the DIMP data with NPV threshold β₀ = 0.24.
Figure 3.14 Comparison of time performance for the real-time implementation of SIMPLISMA-det, RTSIMPLISMA, and data acquisition only.
Figure 3.15 Effects on time performance of the real-time implementation of RTSIMPLISMA for batches of R spectra.
Figure 4.1 Structures of (A) cocaine and (B) heroin.
Figure 4.2 Schematic diagram of the implementation principle of the WC²-RTSIMPLISMA algorithm.
Figure 4.3 The cocaine-heroin data set comprising 1024 spectra on a 3D surface plot (acquired from an ITEMISER® ITMS in positive mode).
Figure 4.4 The TMAH-preprocessed Bacillus cereus data set comprising 1024 spectra on a 3D surface plot (acquired from a Barringer IONSCAN 350 spectrometer in positive mode).
Figure 4.5 Conventional SIMPLISMA model from the original cocaine-heroin data set (three-component model). (A) Concentration profiles. (B) Component spectra.
Figure 4.6 Conventional SIMPLISMA model from the Bacillus cereus data set (four-component model). (A) Concentration profiles. (B) Component spectra.
Figure 4.7 Relative purity curves of determinant-based SIMPLISMA for the drug and bacterial data sets.
Figure 4.8 Relative purity curves of Gram-Schmidt-based SIMPLISMA for the drug data set.
Figure 4.9 Relative purity curves of Gram-Schmidt-based SIMPLISMA for the bacterial data set.
Figure 4.10 Percent correct number of components with respect to the threshold ∆₀.
Figure 4.11 The 4 × 4 daublet 14-daublet 4 compressed Bacillus cereus data set comprising 32 × 64 points in a 3D surface plot.
Figure 4.12 Reconstructed RTSIMPLISMA model from the 4 × 4 daublet 14-daublet 4 compressed drug data set. (A) Concentration profiles. (B) Component spectra.
Figure 4.13 Reconstructed RTSIMPLISMA model from the 4 × 4 daublet 14-daublet 4 compressed bacterial data set. (A) Concentration profiles. (B) Component spectra.
Figure 4.14 The reconstructed data set from the 4 × 4 daublet 14-daublet 4 WC²-RTSIMPLISMA model of the bacterial data set.
Figure 4.15 RTSIMPLISMA relative purity curve for the Windig Raman data set. The transition point is highlighted.
Figure 4.16 RTSIMPLISMA resolved spectra for the Windig Raman data set.
Figure 4.17 Conventional SIMPLISMA (Panel A) and RTSIMPLISMA (Panel B) resolved concentration profiles for the Windig Raman data set.
Figure 4.18 Conventional SIMPLISMA resolved spectra for the Windig Raman data set (α = 0.03).
Figure 4.19 RTSIMPLISMA relative purity curve for the Windig FTIR microscopy data set.
Figure 4.20 RTSIMPLISMA resolved spectra for the Windig FTIR microscopy data set.
Figure 4.21 Conventional SIMPLISMA (Panel A) and RTSIMPLISMA (Panel B) resolved concentration profiles for the Windig FTIR microscopy data set.
Figure 4.22 Conventional SIMPLISMA resolved spectra for the Windig FTIR microscopy data set (α = 0.03).
Figure 4.23 RTSIMPLISMA relative purity curve for the Windig NIR data set.
Figure 4.24 Resolved spectra for the Windig NIR data set with conventional SIMPLISMA applied to the positive part of the inverted second-derivative data set (α = 0.1).
Figure 4.25 RTSIMPLISMA resolved spectra for the Windig NIR data set.
Figure 4.26 RTSIMPLISMA relative purity curve for the Windig time-resolved mass spectrometry data set.
Figure 4.27 Reference spectra for the three photographic color coupling compounds in the Windig time-resolved mass spectrometry data set.
Figure 4.28 TSIMPLISMA resolved spectra for the Windig time-resolved mass spectrometry data set (α = 0.03).
Figure 4.29 RTSIMPLISMA resolved spectra for the Windig time-resolved mass spectrometry data set.
Figure 4.30 TSIMPLISMA (Panel A) and RTSIMPLISMA (Panel B) resolved concentration profiles for the Windig time-resolved mass spectrometry data set.
Figure 5.1 Circular buffer. As a new point is received, it is placed into the memory location pointed to by pointer P_n. The start position of the data to be processed is located by P_p.
Figure 5.2 Structures of (A) urea nitrate, (B) cyclotrimethylenetrinitramine (RDX), (C) 2,4,6-trinitrotoluene (TNT), and (D) 3,4-methylenedioxymethamphetamine (MDMA).
Figure 5.3 The vector φ⁽⁴⁾ that defines the FIR filter for column compression.
Figure 5.4 The time performance curve for RTSIMPLISMA without compression.
Figure 5.5 The time performance curves for data acquisition only and real-time WC²-RTSIMPLISMA.
Figure 5.6 IMS data set of a blank trap disk on a 3D surface plot.
Figure 5.7 The average IMS spectra for three replicate IMS measurements of a blank trap disk and the average spectrum of the data set that contains only the RIP.
Figure 5.8 Real-time WC²-RTSIMPLISMA resolved spectra for the data sets from the three replicate blank trap disk experiments.
Figure 5.9 The variation profiles corresponding to different drift times for the raw data set of blank 3 in Figure 5.7.
Figure 5.10 IMS data set of 3.6 × 10¹ pg TNT on a 3D surface plot.
Figure 5.11 The average IMS spectra for two replicate IMS measurements of 3.6 × 10¹ pg TNT and the average spectrum of the data set that contains only the RIP.
Figure 5.12 Real-time WC²-RTSIMPLISMA resolved spectra for the two replicate data sets of 36 pg TNT.
Figure 5.13 IMS data set of explosives (urea nitrate, RDX, and TNT) on a 3D surface plot.
Figure 5.14 Real-time WC²-RTSIMPLISMA resolved concentration profiles at the final point (258.3 s).
Figure 5.15 Real-time WC²-RTSIMPLISMA resolved component spectra at different acquisition times (Part I).
Figure 5.16 Real-time WC²-RTSIMPLISMA resolved component spectra at different acquisition times (Part II, 177.1-249.0 s).
Figure 5.17 Real-time WC²-RTSIMPLISMA resolved component spectra at different acquisition times (Part III, 249.1-258.3 s).
Figure 5.18 SIMPLISMA-det resolved concentration profiles from the raw IMS data of explosives.
Figure 5.19 SIMPLISMA-det resolved component spectra from the raw IMS data set of explosives.
Figure 5.20 Ion mobility spectra of 1 µL ethanol solutions containing 1.0 × 10² ng MDMA, 1.0 × 10² ng cocaine, and 2.0 × 10² ng heroin, respectively, collected on the ITEMISER® ITMS in positive ion mode.
Figure 5.21 Real-time WC²-RTSIMPLISMA resolved component spectra from the data set of drug mixture A.
Figure 5.22 Real-time WC²-RTSIMPLISMA resolved component spectra for drug mixture A with internal reference spectra of cocaine, MDMA, and heroin.
Figure 5.23 Real-time WC²-RTSIMPLISMA resolved concentration profiles for drug mixture A with internal reference spectra of cocaine, MDMA, and heroin.
Figure 5.24 Real-time WC²-RTSIMPLISMA resolved component spectra for drug mixture B.
Figure 5.25 Real-time WC²-RTSIMPLISMA resolved component spectra for drug mixture B with appended IMS reference spectra of cocaine, MDMA, and heroin.
Figure 5.26 Real-time WC²-RTSIMPLISMA resolved concentration profiles for drug mixture B with appended reference IMS spectra of cocaine, MDMA, and heroin.
List of Abbreviations
