Kumulative Dissertation)
Total Page:16
File Type:pdf, Size:1020Kb
Investigations on chemometric approaches for diagnostic applications utilizing various combinations of spectral and image data types Dissertation (Kumulative Dissertation) zur Erlangung des akademischen Grades Doctor rerum naturalium (Dr. rer. nat.) vorgelegt dem Rat der Chemisch-Geowissenschaftlichen Fakultät der Friedrich-Schiller-Universität Jena von M.Sc. Oleg Ryabchykov geboren am 6. Juli 1990 in Simferopol, die Ukraine 1. Gutachter: Prof. Dr. Jürgen Popp Institut für Physikalische Chemie, Friedrich-Schiller-Universität Jena 2. Gutachter: PD Dr. Thomas Bocklitz Institut für Physikalische Chemie, Friedrich-Schiller-Universität Jena Tag der öffentlichen Verteidigung: 7. August 2019 Mathematical science shows what is. It is the language of unseen relations between things. But to use and apply that language, we must be able fully to appreciate, to feel, to seize the unseen, the unconscious. Ada Lovelace (1815 – 1852) Contents | V Contents Contents .................................................................................................................... V List of tables and figures ....................................................................................... VII List of Abbreviations ................................................................................................ IX 1. Introduction ......................................................................................................... 1 1.1. Tissue and cell-based diagnostics ................................................................ 2 1.2. Machine learning .......................................................................................... 5 1.3. Data fusion .................................................................................................... 6 1.4. Outline of thesis ............................................................................................ 9 2. Data fusion for tissue diagnostics ..................................................................... 11 2.1. Raman spectroscopy ................................................................................... 12 2.2. MALDI spectrometric imaging ................................................................... 18 2.3. Combined data analysis ............................................................................. 20 2.4. Treating misaligned and incomplete data ................................................. 23 3. Cell-based diagnostics ....................................................................................... 27 3.1. Image-based hemogram.............................................................................. 28 3.2. Merging the predictions from morphological and Raman spectroscopic data................ ........................................................................................................ 32 3.3. Combining scores obtained from Raman data and clinical values ........... 34 4. Summary ........................................................................................................... 39 5. Zusammenfassung ............................................................................................. 45 Bibliography ............................................................................................................. 51 Publications .............................................................................................................. 57 [P1] Automatization of spike correction in Raman spectra of biological samples ................................................................................................................. 57 [P2] Leukocyte subtypes classification by means of image processing ......... 65 VI | Contents [P3] Raman based molecular imaging and analytics: a magic bullet for biomedical applications!? ..................................................................................... 75 [P4] Toward food analytics: fast estimation of lycopene and β-carotene content in tomatoes based on surface enhanced Raman spectroscopy (SERS)............... ................................................................................................... 95 [P5] Raman spectroscopic investigation of the human liver stem cell line HepaRG .............................................................................................................. 105 [P6] Fusion of MALDI spectrometric imaging and Raman spectroscopic data for the analysis of biological samples ................................................................ 115 [P7] UV-Raman spectroscopic identification of fungal spores important for respiratory diseases ........................................................................................... 127 [P8] Surface enhanced Raman spectroscopy‐detection of the uptake of mannose‐modified nanoparticles by macrophages in vitro: A model for detection of vulnerable atherosclerotic plaques ................................................................ 135 List of publications and conference contributions .................................................... i Acknowledgments ..................................................................................................... iv Erklärungen .............................................................................................................. v List of tables and figures | VII List of tables and figures Table 1: SVM prediction of test data..................................................................... 32 Table 2: LBOCV prediction of inflammatory conditions by combined CPPLS model....................................................................................................................... 37 Figure 1: Types of data fusion architecture…....................................................... 07 Figure 2: Raman spectroscopy............................................................................... 12 Figure 3: Raman spectroscopic data processing workflow................................... 14 Figure 4: Automated parameter selection............................................................. 15 Figure 5: MALDI-TOF spectrometry..................................................................... 18 Figure 6: MALDI spectrometric data processing workflow................................. 19 Figure 7: PCA of separate and combined spectral data....................................... 22 Figure 8: PCA loading for separated and combined analysis.............................. 23 Figure 9: Missing data imputation as a part of decentralized data fusion......... 25 Figure 10: Leave-one-batch-out performance of PCA-LDA prediction................ 26 Figure 11: Workflow of multi-module blood analysis........................................... 28 Figure 12: Microscopic cell image processing workflow....................................... 29 Figure 13: The visualization of pseudo-Zernike moments and the LDA loadings of a binary classification model.................................................................................. 31 Figure 14: PCA-LDA classification between two major types of lymphocytes............................................................................................................ 34 Figure 15: Distributed data fusion workflow for sepsis diagnostics.................... 36 VIII | List of tables and figures Figure 16: Scores of a combined CPPLS model and a cross-validated performance of disease severity prediction............................................................................ 37 Figure 17: Preprocessing workflows used for various data types........................ 39 Figure 18: A diagram that demonstrates the difference between data fusion ap- proaches.................................................................................................................. 40 List of Abbreviations | IX List of Abbreviations AUC area under curve CCD charge-coupled device CPPLS canonical powered partial least squares CV cross-validation EM-PCA expectation maximization principal component analysis H&E hematoxylin and eosin HCC hepatocellular carcinoma ICA independent component analysis LBOCV leave-batch-out-cross-validation LDA linear discriminant analysis MALDI matrix assisted laser desorption/ionization MALDI-MS MALDI mass spectrometry MALDI-TOF matrix assisted laser desorption/ionization time-of-flight MCR-ALS multivariate curve resolution alternating least squares MS mass spectrometric NA not available (missing value) NMF non-negative matrix factorization PC principal component PC3 third principal component PCA principal component analysis X | List of Abbreviations PLS partial least squares PZI pseudo-Zernike invariants PZM pseudo-Zernike moments ROC receiver operating characteristic RMS root mean square SIRS systemic inflammatory response syndrome SNIP sensitive nonlinear iterative peak SVM support vector machines TIC total ion count TOF time-of-flight WBC white blood cells Introduction | 1 1. Introduction The diagnostics of diseases is one of the main challenges in modern medicine. Classical approaches such as interviewing a patient, screening body temperature, or measuring blood pressure provide valuable information about patients to phy- sicians. These tests can be performed as a part of a general medical examination and may reveal a patient’s state of health. More specific diagnostics require deeper insights into the patient’s condition. In many cases, this additional infor- mation is obtained by means of anatomic and clinical pathology. Anatomic pathology is associated with examination