
HYPERSPECTRAL MICROSCOPY FOR EARLY AND RAPID DETECTION OF

SALMONELLA SEROTYPES

by

MATTHEW BRENT EADY

(Under the Direction of Bosoon Park)

ABSTRACT

Optical microbial detection methodologies have shown potential as early and

rapid pathogen detection methods. This proof-of-concept project explores the use of hyperspectral microscopy as a potential method for early and rapid classification of five

Salmonella serotypes. Darkfield hyperspectral microscope images were collected at early incubation times of 6, 8, 10, and 12 hrs., then compared to 24 hrs. incubation. Hypercube data collected from cells were analyzed through multivariate data analysis (MVDA) methods to assess classification ability of early incubation times. A separate experiment

was conducted to explore the ability of analyzing hyperspectral data collected from only

informative spectral bands as opposed to the original 89 spectral bands. MVDA showed

that incubation times as early as 8 hours had near identical spectra and classification

abilities compared to 24 hours, and the original 89-point spectrum could be reduced to 3 selected spectral bands while maintaining high serotype classification accuracy.

INDEX WORDS: Salmonella, hyperspectral microscopy, early detection,

rapid detection, multivariate data analysis

HYPERSPECTRAL MICROSCOPY FOR EARLY AND RAPID DETECTION OF

SALMONELLA SEROTYPES

by

MATTHEW BRENT EADY

BSA, The University of Georgia, 2011

A Thesis Submitted to the Graduate Faculty of The University of Georgia in Partial

Fulfillment of the Requirements for the Degree

MASTER OF SCIENCE

ATHENS, GEORGIA

2014

© 2014

Matthew Brent Eady

All Rights Reserved

HYPERSPECTRAL MICROSCOPY FOR EARLY AND RAPID DETECTION OF

SALMONELLA SEROTYPES

by

MATTHEW BRENT EADY

Major Professor: Bosoon Park Committee: Mark Harrison Fanbin Kong

Electronic Version Approved:

Julie Coffield Interim Dean of the Graduate School The University of Georgia December 2014


DEDICATION

To my wife and children:

I dedicate this body of work in memory of my daughter, Amelie Eady, whose short time here taught me the true meaning of the word determination and inspired me to pursue the field of food safety, to my son, Judah Eady, who at the end of the day reminds me what all the effort is for, and finally, to my wife, Audrey Blackwell Eady, for her unshakable support of my education through the best of times, the worst of times, and everything in-between.


ACKNOWLEDGEMENTS

I would like to thank my major professor, Dr. Bosoon Park, for the opportunity to be a part of his lab and his commitment to this project. I would also like to acknowledge my committee members, Drs. Mark Harrison and Fanbin Kong, for their time and input.

I am grateful to Dr. Nasreen Bano for her expertise in microscopy and microbiological techniques. Thank you to Dr. Darlene Samuel for giving me my start with the USDA-ARS and the confidence to pursue this endeavor. I am also grateful to Dr. Brian Bowker for his advice on scientific writing and the peer review process.

The past few years would not have been possible without the support and encouragement of my wife, Audrey Blackwell Eady. Her constant support and sacrifices over the past few years have made this body of work possible. I would also like to thank my mother,

Elizabeth Eady, and brother, Jake Eady, for their encouragement and support. Also acknowledged are Brenda and Larry Blackwell for making the many trips to Athens, GA over the years to ease the stress-filled days.

Sincerest appreciation,

Matthew Eady


TABLE OF CONTENTS

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER

1 INTRODUCTION
    References
    Figure

2 LITERATURE REVIEW
    Optical detection of bacteria
    Hyperspectral imaging (HSI) and the hypercube
    Food quality applications of HSI
    Food safety applications of HSI
    Hyperspectral microscope imaging (HMI)
    Data preprocessing methods
    HSI and chemometrics
    Informative spectral band selection
    References
    Table

3 RAPID AND EARLY DETECTION OF SALMONELLA SEROTYPES WITH HYPERSPECTRAL MICROSCOPE AND MULTIVARIATE DATA ANALYSIS
    Abstract
    Introduction
    Materials and methods
    Results
    Discussion
    References
    Figures and Tables

4 VISIBLE NEAR-INFRARED HYPERSPECTRAL MICROSCOPY AND INFORMATIVE BAND SELECTION FOR CLASSIFICATION OF SALMONELLA ENTERICA SEROTYPES
    Abstract
    Introduction
    Materials and methods
    Results
    Discussion
    References
    Figures and Tables

5 CONCLUSIONS

6 APPENDICES
    A Principal component analyses score plots from reduced variable selection sets


LIST OF TABLES

Table 2.1: Overview of HSI methods applied to food for quality and safety assessment

Table 3.1: Difference of serotype values from a one-way ANOVA Fisher test, to compare relative average spectral peak ratios (590 nm / 538 nm) at incubation times of 8 - 24 hrs

Table 3.2: Mahalanobis distance (MD) values between serotype clusters calculated from PCA score plots at incubation times of 8 - 24 hrs

Table 3.3: Soft-independent modeling of class analogy (SIMCA) for all serotypes within each incubation period

Table 4.1: Description of variable sets selected for analyses

Table 4.2: Confusion matrix results obtained from support vector machine classification (SVMC)


LIST OF FIGURES

Figure 1.1: The hyperspectral microscope setup

Figure 3.1: Hyperspectral microscope images of S. Typhimurium cells at different incubation times: (a) 6 hrs., (b) 8 hrs., (c) 10 hrs., (d) 12 hrs., and (e) 24 hrs.

Figure 3.2: (a) Average spectra for S. Enteritidis normalized to 546 nm, (b) average S. Enteritidis spectra with pretreatment algorithm applied

Figure 3.3: PCA score plots with loading vectors for four incubation times: (a) 8 hrs., (b) 10 hrs., (c) 12 hrs., and (d) 24 hrs.

Figure 4.1: Sample collection and analysis flow chart

Figure 4.2: A mosaic of the 590 nm band with each variable selection set image (A-E) displaying the obtained HMI with a highlighted cell (top), with a respective 3D surface plot reproducing the highlighted pixels (middle), and accompanying spectra of the collected bands for the highlighted cell

Figure 4.3: (a) Spectral band variance, normalized to each band, (b) loading vectors for PCA score plots, and (c) average spectra for 5 Salmonella serotypes; vertical lines represent the range of bands selected for variable sets B-E (586 – 662 nm)

Figure 4.4: Example of how support vector machine (SVM) creates a hyperplane for classification of nonlinear data

Figure 4.5: Comparison of the 590-598 nm spectral range collected at each of the five variable selection sets, for S. Enteritidis

Appendix A: Principal component analysis for reduced spectral band models A-E


CHAPTER 1

INTRODUCTION

The detection of foodborne disease causative agents such as Salmonella is a global concern, highlighted by disease outbreaks in recent years. Market demand and advances in transportation logistics have brought foods from around the world to our doorsteps in a matter of hours (Käferstein et al., 1997). While this enhanced availability has placed fresher and more exotic products in consumer markets, it has also been accompanied by food safety concerns regarding traceability and accountability. The 2011 outbreak of Shiga toxin-producing E. coli (STEC) O104 in organic bean sprouts grown in the Lower Saxony region of Germany is an example of the potential for rapid global disease spread. Before the sprouts were identified as the contaminated product, Spanish cucumbers were incorrectly labeled as the pathogen-harboring product, resulting in an estimated loss equivalent to US $200 million per week for exporters (CDC, 2011). The sprouts were later positively identified as the source of the outbreak, and the problem was traced back to Egyptian farms where fenugreek seeds were cultivated and contaminated (Bielaszewska et al., 2011; Buchholz et al., 2011). This outbreak highlights the reach of global agricultural commerce and the need for early and rapid foodborne disease detection methods.

The CDC estimated that in 2011 the United States had 47.8 million people affected by foodborne diseases and approximately 3,000 resulting deaths (CDC, 2014). Salmonella, a facultative intracellular pathogen of interest in many food products, can result in gastroenteritis and, in some cases, death. Majowicz et al. (2010) estimated that there are 80.3 million global cases of foodborne nontyphoidal salmonellosis each year, with 155,000 associated deaths.

Plating techniques involving nutrient-enriched growth media, immunological testing methods such as enzyme-linked immunosorbent assay (ELISA), and molecular methods such as polymerase chain reaction (PCR) have been the standard microbiological testing methods for foodborne bacteria for years (Barlen et al., 2007). While these methods are reliable, they have disadvantages. Confirmation through traditional plating media can take anywhere from several days to weeks because of incubation times and rounds of presumptive testing followed by confirmation plating (Ellis and Goodacre, 2006). Investigators rushing to identify a contaminated product can name the wrong one, as happened with the previously mentioned Spanish cucumbers. Molecular and immunological testing methods are less time-consuming; however, methods such as PCR require extensive training and can become expensive because of the high recurring cost of target-specific antibodies and of the pathogen-specific reagent kits needed to lyse the cell and extract the DNA before amplification (Jay et al., 2005; Mullis and Faloona, 1987).

Apart from the human health effects caused by foodborne disease outbreaks, there is also an economic impact. Broiler chicken and peanut production are integral agricultural industries in the State of Georgia’s economy. Other large agricultural industries in the State of Georgia include blueberries, onions, peaches, and oysters, all of which can be susceptible to a variety of bacterial contaminations. In 2009, a Salmonella

Typhimurium outbreak occurred at a peanut processing facility in Blakely, Georgia


resulting in 471 reported illnesses and nine deaths, as well as the shutdown of a company

and the loss of approximately two hundred jobs in the state (CDC, 2012). The outbreak

caused a dip in peanut sales nationwide for several months before returning to normal

(Wittenberger, 2010). The CDC’s PulseNet is a national disease tracking network of

public health agencies and food regulatory operations, which recognize a pattern of

unique DNA fingerprints for Salmonella and other pathogenic bacteria through pulsed-

field gel electrophoresis (PFGE) (Murase et al., 1995). Because it typically takes two to

three weeks from ingestion of a contaminated food source until a positive diagnosis of

disease from an outbreak, it is critical to reduce the days or weeks that may be required to

then correctly match the disease to a strain isolated from a food source to minimize the

spread of an outbreak.

In February 2013, the Food Safety and Inspection Service (FSIS) of the United States Department of Agriculture (USDA) implemented more stringent allowable Salmonella limits for poultry processors. A rapid and early detection method could help production facilities meet federal regulatory standards while reducing the chance of production slowdowns and yield loss caused by extended quality control checks that rely on time-consuming incubation periods for microbial enumeration. Such a methodology could be implemented in-house, placing less of a strain on production yields.

Optical detection has shown potential in detecting and classifying bacteria through spectroscopic methods that identify bacteria based on a spectral signature unique to the organism (Fan et al., 2008). Previously, spectroscopic techniques such as Raman, Fourier transform infrared (FTIR), laser-induced breakdown spectroscopy (LIBS), and surface plasmon resonance (SPR) have been utilized for obtaining unique spectral signatures (Perkins and Squirrell, 2000; Rehse et al., 2010; Sundaram et al., 2013; Yang et al., 2014). Once the sample’s spectra are collected, multivariate data analysis (MVDA) methods are typically applied to classify samples. The resulting spectra can exhibit collinearity, with only small differences between spectra. MVDA treats each spectral band as an independent variable; therefore, small differences in specific peaks can be identified as unique contributing factors to a specific bacterial species or serotype. MVDA methods such as principal component analysis (PCA), soft independent modeling of class analogy (SIMCA), and partial least squares regression (PLSR) are typically applied to the selected spectra as a means of differentiating and classifying samples.

Hyperspectral imaging (HSI) is a technique that combines imaging with spectroscopy through the collection of data in a three-dimensional matrix. HSI has been most commonly associated with military, astronomy, medical, and pharmaceutical applications (Gowen et al., 2008). The ability to collect both spatial and spectral data from a sample offers an advantage over other spectroscopic techniques. Previously, HSI has been applied to a variety of agricultural commodities for quality measurements. Food safety measures have also been assessed with HSI on commodities such as broiler chickens (Park et al., 2006), with recent applications focusing on predicting bacterial loads on chickens and differentiating non-O157:H7 Shiga toxin-producing E. coli (STEC) (Feng and Sun, 2013; Windham et al., 2013). The focus of this project is investigating the potential of hyperspectral microscope imaging (HMI) as a means of early and rapid classification of pathogenic bacteria, as little research has been published on HMI as a means of foodborne bacteria detection. The HMI setup is shown in Figure 1.1.

This project investigates HMI data collected for five Salmonella serotypes (S. Enteritidis, S. Heidelberg, S. Infantis, S. Kentucky, and S. Typhimurium). The objectives of this research are: 1) to determine whether HMI data collected for serotypes of the same bacterial species can be differentiated, whether they can be differentiated at early incubation times of 6, 8, 10, and 12 hours, and how those spectra compare to spectra collected after 24 hours of incubation at 35-37°C; and 2) to determine whether the hyperspectral data can be reduced to only the necessary spectral bands while maintaining high classification accuracy, thereby reducing the large data processing and storage requirement associated with hyperspectral imaging.


References

1. Barlen, B., S. D. Mazumdar, O. Lezrich, P. Kämpfer, and M. Keusgen. 2007.

Detection of Salmonella by surface plasmon resonance. Sensors 7:1427-1446.

2. Bielaszewska, M., A. Mellmann, W. Zhang, R. Kock, A. Fruth, A. Bauwens, G.

Peters, and H. Karch. 2011. Characterization of the Escherichia coli strain

associated with an outbreak of haemolytic uraemic syndrome in Germany, 2011: a

microbiological study. Lancet Infect. Dis. 11: 671-676.

3. Buchholz, U., H. Bernard, D. Werber, M.M. Bohmer, C. Remschmidt, H.

Wilking, Y. Delere, M. an der Heiden, C. Adlhoch, J. Dreesman. 2011. German

outbreak of Escherichia coli O104:H4 associated with sprouts. N. Engl. J. Med.

365:1763-1770.

4. CDC. 02 September 2011. Investigation update: outbreak of Shiga toxin-producing E. coli O104 (STEC O104:H4) infections associated with travel to Germany. http://www.cdc.gov/ecoli/2011/ecolio104/#investigation. 01 October 2014.

5. CDC. 02 December 2012. Multistate outbreak of Salmonella Typhimurium

infections linked to peanut butter, 2008-2009.

http://www.cdc.gov/salmonella/typhimurium/update.html. 01 October 2014.

6. CDC. 08 January 2014. CDC 2011 estimates: findings.

http://www.cdc.gov/foodborneburden/2011-foodborne-estimates.html. 01 October 2014.


7. Ellis, D.I., and R. Goodacre. 2001. Rapid and quantitative detection of the

microbial spoilage of foods: current status and future trends. Trends in Food Sci. Technol. 11:414-424.

8. Fan, X., I.M. White, S.I. Shopova, H. Zhu, J.D. Sutter, and Y. Sun. 2008.

Sensitive optical biosensors for unlabeled targets: a review. Anal. Chim. Acta 1:8-26.

9. Feng, Y.Z., and D.W. Sun. 2013. Determination of total viable count (TVC) in

chicken fillets by near-infrared hyperspectral imaging and spectroscopic

transformations. Talanta. 105:244-249.

10. Gowen, A., C. O’Donnell, and M. Taghizadeh. 2008. Hyperspectral imaging for

the investigation of quality deterioration in sliced mushrooms (Agaricus bisporus)

during storage. Sens. Inst. Food Qual. 2:133-143.

11. Jay, J.M., M.J. Loessner, and D.A. Golden. 2005. Modern food microbiology.

Springer, New York, NY.

12. Käferstein, F.K., Y. Motarjemi, and D.W. Bettcher. 1997. Foodborne disease

control: a transnational challenge. Emerg. Infect. Dis. 3:503-510.

13. Majowicz, S.E., J. Musto, E. Scallan, F.J. Angulo, M. Kirk, S.J. O’Brien, T.F.

Jones, A. Fazil, and R.M. Hoekstra. 2010. The global burden of nontyphoidal

Salmonella gastroenteritis. Clin. Infect. Dis. 50:882-889.

14. Mullis, K.B., and F.A. Faloona. 1987. Specific synthesis of DNA in vitro via a

polymerase-catalysed chain reaction. J. Gen. Microbiol. 155:335-350.


15. Murase, T., T. Okitsu, R. Suzuki, H. Morozumi, A. Matsushima, A. Nakamura,

and S. Yami. 1995. Evaluation of DNA fingerprinting by PFGE as an

epidemiological tool for Salmonella infections. Microb. Immun. 39:673-676.

16. Park, B., K. C. Lawrence, W.R. Windham, and D.P. Smith. 2006. Performance of

supervised classification algorithms of hyperspectral imagery for identifying fecal

and ingesta contaminates. Trans. ASABE. 49:2017-2024.

17. Perkins, E. A., and D. J. Squirrell. 2000. Development of instrumentation to allow

the detection of microorganisms using scattering in combination with surface

plasmon resonance. Biosens. Bioelectron. 14:853-859.

18. Rehse, S. J., Q. I. Mohaidat, and S. Palchaudhuri. 2010. Towards the clinical

application of laser-induced breakdown spectroscopy for rapid pathogen

diagnosis: the effect of mixed cultures and sample dilution on bacterial

identification. Appl. Optics 49:C27-C35.

19. Sundaram, J., B. Park, A. Hinton, K. Lawrence, and Y. Kwon. 2013. Detection

and differentiation of Salmonella serotypes using surface enhanced Raman

scattering (SERS) technique. J. Measur. 7:1-12.

20. Windham, W.R., S.C. Yoon, S.R. Ladely, J.A. Haley, J.W. Heitschmidt, K.C.

Lawrence, B. Park, N. Narrang, and W.C. Cray. 2013. Detection by hyperspectral

imaging of Shiga toxin–producing Escherichia coli serogroups O26, O45, O103,

O111, O121, and O145 on rainbow agar. J. Food Prot. 76:1129-1136.

21. Wittenberger, K. 2010. Peanut outlook: Impacts of the 2008-09 foodborne illness

outbreak linked to Salmonella in peanuts. DIANE publishing. USDA.


22. Yang, L.J., J. Wang, Z.J. Li, X.H. Song, Y.M. Liu, and Q.R. Hu. 2014. Rapid

differentiation and identification of Shigella sonnei and Escherichia coli O157:H7 by Fourier transform infrared spectroscopy and multivariate statistical analysis. Adv. Mater. Res. 926:1116-1119.


Figure 1.1: The hyperspectral microscope setup (labeled components: EMCCD camera, AOTF, microscope, light source, and controller)


CHAPTER 2

LITERATURE REVIEW

Optical detection of bacteria

The exploration of optical detection methods for pathogenic bacteria is based on identifying organisms from their unique spectral signatures. Optical detection is typically viewed as a non-destructive, rapid identification method that can potentially counter the disadvantages of traditional detection methods such as nutrient-enriched growth media, immunological testing such as antigen/antibody interactions, and molecular methods such as polymerase chain reaction (PCR) (Jay et al., 2005). Nutrient-enriched growth media can take up to a week to yield results (Rose, 1998). Immunological testing and PCR require much less time, but they carry a high recurring cost for cell lysing reagents and pathogen-specific antibodies, and the associated equipment requires extensive training (Barlen et al., 2007). Various spectroscopic approaches have been applied to bacterial identification, such as laser-induced breakdown spectroscopy (LIBS), Fourier-transform infrared (FT-IR) spectroscopy, surface-enhanced Raman scattering (SERS), and surface plasmon resonance (SPR) (Perkins and Squirrell, 2000; Rehse et al., 2010; Sundaram et al., 2013; Yang et al., 2014), with the objective of creating a reference library of spectral signatures that can be used when comparing unknown bacteria obtained from a suspect food product.

LIBS has seen wide remote applications in fields ranging from chemical analysis to military defense and geology (Gottfried and Harmon, 2009). Use of the LIBS system requires little sample preparation and a small sample size (Barnett et al., 2011). Rehse et al. (2010) stated that a LIBS system required only a few nanoseconds to obtain the spectra of a bacterial sample. A review of the literature shows that, for bacterial detection and classification, there is much variation in collection techniques, including laser strength, type, and harmonizing levels, as well as in the platform for observing the bacteria. Platforms have included nutrient-free agar, nutrient-enriched media, water suspensions, and concentrated freeze-dried pellets of bacteria (Barnett et al., 2011; Baudelet et al., 2006; Gamble, 2014; Multari et al., 2010).

FT-IR collects data over a broad spectral range and requires a Fourier transformation algorithm to be applied to the raw data. Yang et al. (2014) used FT-IR when searching for a rapid discrimination method for Shigella sonnei and Escherichia coli O157:H7 and collected absorption spectra between 4000 and 600 cm⁻¹. The study found that focusing on the 1800-900 cm⁻¹ range highlighted the differences between the two species, which could then be differentiated by multivariate data analysis (MVDA).

SERS is another common method for the optical detection of bacteria. Typically, a substrate is prepared that enhances the Raman scattering of light (Xu et al., 2013). Substrates are either roughened metallic surfaces, such as gold or silver, or are composed of nanoparticles. Compared to FT-IR, SERS has seen more bacterial identification applications, in part because FT-IR signals arise from asymmetrical molecular vibrations, whereas SERS yields spectral bands arising from symmetrical vibrations (Ellis and Goodacre, 2006). Water is also a weak Raman scatterer but a strong infrared absorber, so it has a greater effect on FT-IR (Jarvis and Goodacre, 2008).

SPR-based immunoassays have also been developed for detecting bacteria (Bokken et al., 2003; Mazumdar et al., 2007; Perkins and Squirrell, 2000; Waswa et al., 2007). In these assays, bacterial cells are bound to antibodies, creating an enhanced signal for detection of SPR spectra with a biosensor (Barlen et al., 2007). While rapid cell extraction and imaging can be achieved with this technology, there is once again the high recurring cost of purchasing the antibodies in addition to the equipment training requirements.

A review of the literature shows both advantages and disadvantages to these optical detection methodologies. Some methods such as the LIBS system require minimal sample preparation and a data acquisition time of only a few nanoseconds.

There is debate over an appropriate substrate for the bacteria sampling in addition to the time requirement for fine-tuning the instrument settings. Other methods such as SERS,

SPR, or FT-IR may need a concentrated cell sample from the same strain, requiring enrichment.

Hyperspectral imaging (HSI) and the hypercube

There is an array of remote HSI applications, including astronomy, pharmaceuticals, and medicine (Gowen et al., 2008). One advantage of HSI over the previously mentioned spectroscopic techniques is the ability to collect data in a three-dimensional hypercube. Two dimensions represent spatial data (x and y coordinates), and the third is the spectral dimension (λ). At each spectral band, data are collected at every pixel to form one image plane of the cube. The bands are then stacked to form the three-dimensional cube. Because the two spatial dimensions are constant across the stacked images, reading along the third dimension at any pixel yields a spectrum across all collected bands. Data from the hypercube can also be analyzed on a band-by-band basis because each collected band produces a separate image.
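The hypercube layout described above can be illustrated with a minimal sketch in plain Python. The dimensions (2 x 2 pixels, 3 bands) and the intensity values are invented for illustration, and the names `cube`, `pixel_spectrum`, and `band_image` are hypothetical helpers rather than part of any instrument software.

```python
# Toy hypercube stored as cube[y][x][band]; all values are invented intensities.
cube = [
    [[10, 20, 30], [11, 21, 31]],
    [[12, 22, 32], [13, 23, 33]],
]


def pixel_spectrum(cube, x, y):
    """Return the full spectrum (one value per band) at pixel (x, y)."""
    return list(cube[y][x])


def band_image(cube, band):
    """Return the 2-D spatial image for a single spectral band."""
    return [[pixel[band] for pixel in row] for row in cube]


print(pixel_spectrum(cube, x=1, y=0))  # spectrum read along the spectral axis
print(band_image(cube, band=2))        # one image plane of the cube
```

Reading along the last index gives a spectrum for one pixel, while fixing the last index gives one band image, which mirrors the two ways of analyzing the cube mentioned in the text.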

Food quality applications of HSI

HSI has been implemented for analysis of quality attributes in a variety of agricultural commodities, including meats, dairy, fruits, and vegetables. An overview of examples from previous research in this field is listed in Table 2.1. This list is not all-inclusive, but is rather meant to provide an understanding of the variety of quality issues that have been investigated. The table also shows that research investigating HSI and food has increased within the last 10 years. Projects have employed hyperspectral imaging to predict quality parameters within the commodity, such as predicting the drip loss of salmon or assessing apples for bruising (He et al., 2014; Elmasry et al., 2008). Chao et al. (2007) demonstrated that hyperspectral imaging could be applied in a high-throughput setting to maintain production goals in a food processing facility.

Food safety applications of HSI

There has been far less published research applying HSI to food safety matters.

Previously, work has been published in the area of applying HSI to detecting fecal matter

on broiler chicken carcasses (Heitschmidt et al., 2007; Lawrence et al., 2007; Park et al.,

2006; Windham et al., 2003). In general, an in-line hyperspectral imaging system was

established for detection of fecal material present on broiler chicken carcasses during

processing (Park et al., 2006). Images were collected and analyzed through a variety of


MVDA methods such as spectral angle mapper (SAM), Mahalanobis distances, support

vector machine (SVM) classification, and principal component analysis (PCA). Once

contaminated birds were identified, they could then be reprocessed through further

decontamination steps.

Foodborne pathogen detection on a colony scale has also recently been approached with HSI. Windham et al. (2013) and Yoon et al. (2013) were able to classify Shiga toxin-producing E. coli (STEC) bacterial colonies growing on rainbow agar plates. These studies observed colonies of six non-O157:H7 STEC serogroups (O26, O45, O103, O111, O121, and O145) on agar plates. A five-pixel crosshair was used to select pixels within the colonies, followed by MVDA. Windham et al. (2013) found that classifying the six serogroups from HSI data with a k-nearest neighbor (kNN) approach was accurate 99% of the time overall, with approximately 4% of O103 colonies misclassified as O26 colonies.
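As a rough illustration of the kNN idea mentioned above, the following sketch classifies a spectrum by majority vote among its nearest labeled spectra. It is not the implementation used in the cited studies: the Euclidean distance, the value of k, the 3-band spectra, and the class labels "A" and "B" are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training spectra.

    `train` is a list of (spectrum, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented 3-band spectra for two hypothetical classes.
train = [
    ([0.10, 0.50, 0.90], "A"),
    ([0.12, 0.52, 0.88], "A"),
    ([0.11, 0.48, 0.91], "A"),
    ([0.80, 0.40, 0.10], "B"),
    ([0.82, 0.38, 0.12], "B"),
    ([0.79, 0.41, 0.09], "B"),
]

print(knn_predict(train, [0.13, 0.49, 0.87]))
```

In practice each training spectrum would come from pixels selected within a colony, and k would be chosen by validation rather than fixed in advance.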

Feng et al. (2013) applied near-infrared hyperspectral imaging with a spectral range of 930-1450 nm to predict the total Enterobacteriaceae load on chicken breast fillets. This family of bacteria includes organisms commonly found in poultry, such as Salmonella, E. coli, and Enterobacter. The results of this standoff imaging method showed that, through a combination of hyperspectral image collection and partial least squares regression (PLSR), the bacterial loads on fillets were predicted with R² ≥ 81% and a root mean squared error (RMSE) of ≤ 0.47 log₁₀ CFU g⁻¹.
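The two figures of merit quoted above can be computed as in the sketch below. The observed and predicted values are invented log10 CFU/g numbers, and R² is taken here as the coefficient of determination (1 minus the ratio of residual to total sum of squares); the cited study may have used a different definition, so this is an assumption.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Invented bacterial loads (log10 CFU/g) and model predictions.
observed = [3.0, 4.0, 5.0, 6.0]
predicted = [3.5, 3.5, 5.5, 5.5]

print(r_squared(observed, predicted))
print(rmse(observed, predicted))
```

Because RMSE is expressed in the same units as the response, a value of 0.47 means predictions were typically within about half a log unit of the plate counts.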

Hyperspectral microscope imaging (HMI)

HMI applies the concepts of hyperspectral imaging to microscopic images. Hypercube data are generated for the cells captured within a darkfield image. In theory, this creates a sensitive detection method that requires only a few cells for classification. Little published research exists on bacterial classification through HMI. Anderson et al. (2008) investigated the use of HMI to differentiate between live and dead endospores. Their microscope setup used a tunable filter to separate tungsten halogen light into narrow spectral bands and collected 32 bands over the range of 400-720 nm. Bacillus strains were imaged under immersion oil at 1000x magnification. The study found that live, viable cells differed from peroxide-killed cells but not from chlorine-killed cells. It was theorized that the peroxide-killed cells had extensive damage to their outer layers, while chlorine did not have the same effect on the endospores.

Hyperspectral data preprocessing methods

The purpose of preprocessing is to highlight the intrinsic differences among samples' spectral signatures and to minimize spectral noise that can be attributed to background spectra, object scattering, limb-darkening, or differences in particle size (Amigo et al., 2013; Park et al., 2006). Spectroscopic data have a tendency to be collinear (Mark and Workman, 1991). A data pretreatment algorithm can reduce noise while emphasizing differences between samples. A variety of data pretreatment algorithms have been applied to hyperspectral image data. Perhaps the most commonly used is the Savitzky-Golay 1st or 2nd derivative algorithm (Savitzky and Golay, 1964). Other preprocessing measures include moving average smoothing, Norris gap derivatives, baseline offsets, and segment-size derivatives. These preprocessing steps share the common goal of enhancing the intrinsic spectral differences between samples and reducing spectral noise.
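To make the derivative pretreatment concrete, here is a minimal sketch of a Savitzky-Golay first derivative using the standard 5-point quadratic convolution weights (-2, -1, 0, 1, 2)/10. The window size, polynomial order, and edge handling (edge points are simply dropped) are assumptions chosen for brevity, not the settings used in this work.

```python
# Standard 5-point, quadratic Savitzky-Golay weights for the first derivative;
# the weighted sum is divided by 10 times the band spacing.
SG_DERIV_5 = [-2, -1, 0, 1, 2]


def savgol_first_derivative(spectrum, spacing=1.0):
    """First derivative at interior points; two points at each edge are dropped."""
    half = len(SG_DERIV_5) // 2
    derivative = []
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        weighted = sum(c * v for c, v in zip(SG_DERIV_5, window))
        derivative.append(weighted / (10.0 * spacing))
    return derivative


# Sanity check on y = x^2 sampled at x = 0..6, whose exact derivative is 2x.
spectrum = [x ** 2 for x in range(7)]
print(savgol_first_derivative(spectrum))
```

A quadratic test signal is reproduced exactly because the filter fits a local quadratic within each window; on real spectra the same weights suppress a constant baseline offset while smoothing noise.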


HSI and chemometrics

The term “chemometrics” originated in the early 1970s, although the origins of the MVDA approach are said to date back decades earlier (Esbensen, 1990). Because the hypercube generates a full spectrum for each pixel in an image, chemometrics is often used to handle the large amount of data generated by hyperspectral imaging, which also tends to be collinear (Amigo, 2010). Chemometrics can be used to extract meaningful data from hyperspectral images while reducing dimensionality (Amigo et al., 2013). Factorial analyses reduce large amounts of hyperspectral data into easy-to-read plots that relate the common underlying trends between samples. A review of the literature shows that many MVDA methods are applied, such as principal component analysis (PCA), partial least squares regression (PLSR), support vector machine (SVM) classification, multivariate curve resolution (MCR), and multiple linear regression (MLR), to name a few (Elmasry et al., 2012; Feng et al., 2012; Sun, 2010). Multiple MVDA approaches are also listed in Table 2.1.

PCA is a commonly used analytical method. PCA can simplify the analysis and reduce the number of variables by decomposing the spectral X matrix:

$$X = TP^{\top} + E$$

where X is the data matrix, T is the score matrix, P is the loading matrix, the superscript ⊤ denotes the transpose, and E represents the residuals or spectral noise (Wold et al., 1987). The primary tools of a PCA are the scores plot and the x-loading vector plot. The PCA scores plot shows two principal component vectors plotted against each other, representing the underlying footprint of the spectra, while the x-loadings plot can be viewed as a “map of variables” that explains the influence each observed band has on the spectra as a whole (Esbensen,

18

2000). PCA is particularly useful in dealing with HSI samples due to the ability to

handle a large amount of data. The PCA score plot is a clear way to quickly visualize the

cluster relationship among samples, and a quantitative representation amongst the PCA

clusters can help in assessing cluster separation. The Mahalanobis distance (MD) can be

utilized in the principal component space and takes into account the correlation in data by

using the inverse of the variance-covariance matrix from the data set, and measuring from

the center of one cluster to the center of another cluster (De Maesschalck et al., 2000).

The MD can be computed as:

$$MD^{2} = (x_i - x_j)^{T} S^{-1} (x_i - x_j)$$

where $x_i$ and $x_j$ are the averages of groups i and j, S is the covariance matrix, superscript T denotes the matrix transpose, and $MD^{2}$ represents the squared Mahalanobis distance.
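The distance between two cluster centers in a two-component score plot can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the score values are invented, the function names are hypothetical, and pooling the two within-cluster covariance matrices to obtain S is one possible choice rather than the procedure confirmed for this project.

```python
# Illustrative sketch only: the score values are made up, and pooling the
# within-cluster covariance matrices to form S is an assumption.

def mean_vector(points):
    """Center of a cluster of (PC1, PC2) score pairs."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def pooled_covariance(group_a, group_b):
    """Pooled 2x2 within-group covariance matrix S of two clusters."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group in (group_a, group_b):
        m = mean_vector(group)
        for p in group:
            d = [p[0] - m[0], p[1] - m[1]]
            for r in range(2):
                for c in range(2):
                    s[r][c] += d[r] * d[c]
    dof = len(group_a) + len(group_b) - 2
    return [[v / dof for v in row] for row in s]


def mahalanobis_squared(group_a, group_b):
    """MD^2 = (xi - xj)^T S^-1 (xi - xj) between two cluster centers."""
    xi, xj = mean_vector(group_a), mean_vector(group_b)
    (a, b), (c, d) = pooled_covariance(group_a, group_b)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # closed-form 2x2 inverse
    diff = [xi[0] - xj[0], xi[1] - xj[1]]
    tmp = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
           inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    return diff[0] * tmp[0] + diff[1] * tmp[1]


# Hypothetical PC1/PC2 scores for two serotype clusters.
cluster_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
cluster_b = [(4.0, 0.0), (5.0, 0.0), (4.0, 1.0), (5.0, 1.0)]
md2 = mahalanobis_squared(cluster_a, cluster_b)
print(md2, md2 ** 0.5)  # squared distance and distance
```

Because the covariance matrix is inverted, the distance is expressed in units of the within-cluster spread, so tight clusters that are far apart produce large values.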

SVM classification is a method of kernel based nonlinear classification. When

the data is not necessarily linear and multidimensional clustering of sample classes is

present, a kernel classification is implemented for clustering based on the three

dimensional space (Alonso et al., 2011). A kernel based recognition algorithm is applied

for reclassification of the data classes based on the separation capabilities of the created hyperplane, which serves as an intermediate step:

$$h = w^{T} x + b$$

where h is the hyperplane data, x is the HMI input pixel vector, w is the adaptable weight vector, b represents the bias, and T is the transpose operator (Sun, 2010). Samples from the classes are then reclassified based on their separation by this hyperplane.
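The role of the hyperplane expression can be illustrated with a minimal Python sketch. It covers only the linear form of the decision function. A kernel-based SVM would evaluate the same kind of expression after an implicit mapping into a higher-dimensional feature space, and the weights would come from training. The weight vector, bias, and pixel vectors below are hypothetical values chosen for illustration.

```python
# Illustrative sketch only: w, b, and the pixel vectors are hypothetical.

def hyperplane_value(w, x, b):
    """h = w^T x + b for one input pixel vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """Assign a class label from the side of the hyperplane on which x falls."""
    return 1 if hyperplane_value(w, x, b) >= 0 else -1


w = [0.5, -1.0, 2.0]        # hypothetical weight vector (three spectral bands)
b = -0.25                   # hypothetical bias
pixel_1 = [0.9, 0.2, 0.4]   # hypothetical pixel spectra
pixel_2 = [0.1, 0.8, 0.1]

print(classify(w, pixel_1, b), classify(w, pixel_2, b))
```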

PLSR is a generalization of MLR that is used when assessing highly collinear spectra with many variables (Wold et al., 2001). PLSR compares the similarities between two data matrices (Hair et al., 2010). Because we are observing spectra of serotypes within the same bacterial species, the spectra tend to be highly collinear, as reflected in their high Pearson correlation values. In this research project, PLSR is used to assess the goodness-of-fit of a global data pretreatment model. The second data matrix is the collection of untreated raw spectral data obtained through another repetition of our experiment; the same global data pretreatment model is applied to it, and the similarity of the PCA results is observed. A high R^2 and a low root mean squared error of calibration (RMSEC) or validation (RMSEV) are desired.
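The two fit statistics named above can be written out in a few lines of Python. This is a minimal sketch with invented reference and predicted values. R^2 is computed here as the coefficient of determination, which is an assumption, since chemometrics software may instead report the squared correlation between reference and predicted values.

```python
import math


def rmse(reference, predicted):
    """Root mean squared error between reference and predicted values."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical values; applying the same functions to the calibration set
# gives RMSEC and to the validation set gives RMSEV.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(reference, predicted), r_squared(reference, predicted))
```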

Soft independent modeling of class analogy (SIMCA) is another tool that has previously been used for validating the results of a PCA (Sundaram et al., 2013). Instead of addressing the goodness-of-fit as PLSR does, SIMCA applies the same pretreatment algorithm to a training data set and predicts the classification of the validation repetition based on the results of a previously completed PCA (Dunn III et al., 1978).

Table 2.1 shows a summary of HSI data collected from agricultural commodities, including the processing technique, spectral bands analyzed, and the chemometric analysis applied in each study. The table shows a variety of approaches dependent on the scope of the study, the type of image, and the nature of the hypercube data produced. For this project, the Unscrambler V10.3 software program (CAMO, Oslo, Norway) was used. This program is widely used throughout the literature for spectral processing.


Informative spectral band selection

One of the downsides to this methodology is processing and storing the large amount of data generated from the hypercube. It is common practice in HSI projects to select the minimal number of bands necessary, reducing the spectral noise from unnecessary bands while still maintaining the naturally occurring variance (Sun, 2010). Several methods for band selection exist, such as analyzing the loading vectors of

PCAs for the bands explaining the largest amount of variance (Windham et al., 2013).

Another method of band selection focuses on a Monte Carlo approach for variable elimination by using a robust statistical model applied with PLSR (Gowen et al., 2012).

PLSR has also been applied for informative band selection (Lawrence et al., 2004).

Gowen et al. (2008) applied MLR for selection of optimal bands. MVDA and informative band selection methods vary widely. The nature of the biological material, the number of spectral bands, the type of filtering apparatus, and the lighting source all factor into selecting an appropriate MVDA and informative band selection method.
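The loading-based approach to band selection mentioned above can be sketched in Python as a simple ranking of bands by the magnitude of their loading on a principal component. This is a hedged illustration, not the procedure of any cited study: the wavelengths, loading values, and the function name `select_bands` are all hypothetical.

```python
def select_bands(wavelengths, loadings, k):
    """Return the k wavelengths with the largest absolute loading, in ascending order."""
    ranked = sorted(zip(wavelengths, loadings),
                    key=lambda pair: abs(pair[1]), reverse=True)
    return sorted(w for w, _ in ranked[:k])


# Hypothetical band centers (nm) and first-component loadings.
wavelengths = [450, 454, 458, 462, 466, 470]
pc1_loadings = [0.05, -0.62, 0.10, 0.48, -0.02, 0.33]

print(select_bands(wavelengths, pc1_loadings, 3))
```

Ranking on the absolute value matters because a strongly negative loading contributes as much to the component as a strongly positive one.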


References:

1. Alonso, M., J.A. Malpica, and A.M. de Agirre. 2011. Consequences of the Hughes

phenomenon on some classification techniques. ASPRS 2011 Annual Conf.

Milwaukee, WI.

2. Amigo, J.M. 2010. Practical issues of hyperspectral image analysis of solid

dosage forms. J. Anal. Chem. 398:93-109

3. Amigo, J.M., I. Marti, and A. Gowen. 2013. Hyperspectral imaging and

chemometrics: A perfect combination for the analysis of food structure,

composition and quality. Elsevier BV. Amsterdam, The Netherlands. 28:343-370.

4. Anderson, J., C. Reynolds, D. Ringelberg, J. Edwards, and K. Foley. 2008.

Differentiation of live-viable versus dead bacterial endospores by calibrated

hyperspectral reflectance microscopy. J Microsc. 232:130-136.

5. Ariana, D., D.E. Guyer, and B. Shrestha. 2006. Integrating multispectral

reflectance and fluorescence imaging for defect detection in apples. Comput.

Electron. Agr. 2:148-161.

6. Barlen, B., S. D. Mazumdar, O. Lezrich, P. Kämpfer, and M. Keusgen. 2007.

Detection of Salmonella by surface plasmon resonance. Sensors 7:1427-1446.

7. Barnett, C., C. Bell, K. Vig, A.C. Akpovo, L. Johnson, S. Pillai, and S. Singh.

2011. Development of a LIBS assay for the detection of Salmonella enterica

serovar Typhimurium from food. Anal. Bioanal. Chem. 400:3323-3330.

8. Baudelet, M., J. Yu, M. Bossu, J. Jovelet, J.P. Wolf, T. Amodeo, E. Frejafon, and

P. Laloi. 2006. Discrimination of microbiological samples using femtosecond

laser-induced breakdown spectroscopy. Appl. Phy. Let. 89:163903.


9. Bokken, G. C., R.J. Corbee, F. Knapen, and A.A. Bergwerff. 2003.

Immunochemical detection of Salmonella group B, D and E using an optical

surface plasmon resonance biosensor. FEMS Microb. Let. 222:75-82.

10. Burger, J., and A. Gowen. 2011. Data handling in hyperspectral image analysis.

Chemom. Intell. Lab. Syst. 108:13-22

11. Chao, K., C.C. Yang, Y.R. Chen, M.S. Kim, and D.E. Chan. 2007. Hyperspectral-

multispectral line-scan imaging system for automated poultry carcass inspection

applications for food safety. Poult. Sci. 86:2450-2460.

12. De Maesschalck, D., D. Jouan-Rimbaud, and D. Massart. 2000. Tutorial: the

Mahalanobis distance. Chemom. Intell. Lab. Syst. 50:1-18.

13. Dunn III, W.J., S. Wold, and Y.C. Martin. 1978. Structure-activity study of

β-adrenergic agents using the SIMCA method of pattern recognition. J. Med.

Chem. 21:922-930.

14. Ellis, D.I., and R. Goodacre. 2006. Metabolic fingerprinting in disease diagnosis:

biomedical applications of infrared and Raman spectroscopy. Analyst. 131:875-

885.

15. ElMasry, G., N. Wang, C. Vigneault, J. Qiao, and A. Elsayed. 2008. Early

detection of apple bruises on different background colors using hyperspectral

imaging. LWT-Food Sci. Technol. 41:337-345.

16. Elmasry, G., M. Kamruzzaman, D.W. Sun, and P. Allen. 2012. Principles and

applications of hyperspectral imaging in quality evaluation of agro-food products:

a review. Crit. Rev. Food. Sci. 52:999-1023.


17. Esbensen, K. 1990. The start and early history of chemometrics: selected

interviews. Part 2. J. Chemom. 4:389-412.

18. Esbensen, K. 2000. Multivariate data analysis - in practice. 4th ed. 2000. CAMO,

Trondheim, Norway.

19. Feng, Y. Z., and D.W. Sun. 2012. Application of hyperspectral imaging in food

safety inspection and control: a review. Crit. Rev. Food Sci. Nutr. 52:1039-1058.

20. Feng, Y.Z., G. Elmasry, D.W. Sun, A.G. Scannell, D. Walsh, and N. Morcy.

2013. Near-infrared hyperspectral imaging and partial least squares regression for

rapid and reagentless determination of Enterobacteriaceae on chicken fillets.

Food Chem. 138:1829-1836.

21. Gamble, G. 2014. The effect of nutrient media water purity on LIBS based

identification of bacteria.

http://www.ars.usda.gov/research/publications/publications.htm?SEQ_NO_115=3

07307. Accessed 1 October 2014.

22. Gottfried, J. L., and R. S. Harmon. 2009. Multivariate analysis of laser-induced

breakdown spectroscopy chemical signatures for geomaterial classification.

Spectrochim. Acta, Part B. 64:1009-1019.

23. Gowen, A., C. O’Donnell, and M. Taghizadeh. 2008. Hyperspectral imaging for

the investigation of quality deterioration in sliced mushrooms (Agaricus bisporus)

during storage. Sens. Inst. Food Qual. 2:133-143.

24. Gowen, A. A., C. P. O'Donnell, G. Downey, and C. Esquerre. 2012. Selection of

variables based on most stable normalised partial least squares regression


coefficients in an ensemble Monte Carlo procedure. J. Near Infrared Spectrosc.

19:443-450.

25. Hair, J.F., W.C. Black, J.B. Barry, and R.E. Anderson. 2010. Multivariate data

analysis. Prentice Hall, Upper Saddle River, NJ. 760-764.

26. He, H. J., D. Wu, and D. W. Sun. 2014. Rapid and non-destructive determination

of drip loss and pH distribution in farmed Atlantic salmon (Salmo salar) fillets

using visible and near-infrared (Vis-NIR) hyperspectral imaging. Food Chem.

156:394-401.

27. Heitschmidt, G., B. Park, K.C. Lawrence, W. R. Windham, and D. P. Smith.

2007. Improved hyperspectral imaging system for fecal detection on poultry

carcasses. Trans. ASABE 50:1427-1432.

28. Jay, J.M., M.J. Loessner, and D.A. Golden. 2005. Modern food microbiology.

Springer. New York, NY.

29. Lawrence, K. C., W. R. Windham, B. Park, and R.J. Buhr. 2003. A hyperspectral

imaging system for identification of faecal and ingesta contamination on poultry

carcasses. J. Near Infrar. Spectrosc. 11:269-281.

30. Lawrence, K.C., W.R. Windham, B. Park, D.P. Smith, and G.H. Poole. 2004.

Comparison between visible/NIR spectroscopy and hyperspectral imaging for

detecting surface contaminates on poultry carcasses. Proc. SPIE. Monitoring

Food Safety, Agriculture, and Plant Health. Providence, RI.

31. Jarvis, R.M., and R. Goodacre. 2008. Characterisation and identification of

bacteria using SERS. Chem. Soc. Rev. 37:931-936.


32. Mark, H., and J. Workman. 1991. Statistics in spectroscopy. Academic Press Inc.

San Diego, CA. 294-295

33. Mazumdar, S. D., M. Hartmann, P. Kämpfer, and M. Keusgen, 2007. Rapid

method for detection of Salmonella in milk by surface plasmon resonance (SPR).

Biosens. Bioelectron. 22:2040-2046.

34. Multari, R. A., D. A. Cremers, J.M. Dupre, and J.E. Gustafson. 2010. The use of

laser-induced breakdown spectroscopy for distinguishing between bacterial

pathogen species and strains. Appl. Spectrosc. 64:750-759.

35. Naganathan, G.K., L.M. Grimes, J. Subbiah, R. Lu, C.R. Calkins, A. Samal, and

G.E. Meyer. 2008. Visible/near-infrared hyperspectral imaging for beef

tenderness prediction. Comput. Electron. Agr. 64:225-233.

36. Park, B., K. C. Lawrence, W.R. Windham, and D.P. Smith 2006. Performance of

supervised classification algorithms of hyperspectral imagery for identifying fecal

and ingesta contaminates. Trans. ASABE. 49:2017-2014.

37. Park, B., K.C. Lawrence, W.R. Windham, and D. Smith. 2006. Performance of

hyperspectral imaging system for poultry surface fecal contaminate detection. J.

Food Eng. 3:340-348.

38. Perkins, E. A., and D. J. Squirrell. 2000. Development of instrumentation to allow

the detection of microorganisms using light scattering in combination with surface

plasmon resonance. Biosens. Bioelectron. 14:853-859.

39. Rehse, S. J., Q. I. Mohaidat, and S. Palchaudhuri. 2010. Towards the clinical

application of laser-induced breakdown spectroscopy for rapid pathogen


diagnosis: the effect of mixed cultures and sample dilution on bacterial

identification. Appl. Optics 49:C27-C35.

40. Rose, B.E. 1998. USDA/FSIS microbiology guidebook. USDA Food Safety

Inspection Service. 4.1-4.2.

41. Savitzky, A. and M. J. Golay. 1964. Smoothing and differentiation of data by

simplified least squares procedures. Anal. Chem. 36:1627-1639.

42. Singh, C.B., D.S. Jayas, J. Paliwal, and N.D.G. White. Detection of insect

damaged wheat kernels using near-infrared hyperspectral imaging. J. Stored Prod.

Res. 3:151-158.

43. Siripatrawan, U., Y. Makino, Y. Kawagoe, and S. Oshita. 2011. Rapid detection

of Escherichia coli contamination in packaged fresh spinach using hyperspectral

imaging. Talanta 85:276-281.

44. Sun, D. W. Ed. 2010. Hyperspectral imaging for food quality analysis and control.

Elsevier. San Diego, CA. 92-94.

45. Sundaram, J., B. Park, A. Hinton, K. Lawrence, and Y. Kwon. 2013. Detection

and differentiation of Salmonella serotypes using surface enhanced Raman

scattering (SERS) technique. J. Measur. 7:1-12.

46. Waswa, J., J. Irudayaraj, and C. DebRoy. 2007. Direct detection of E. coli O157:

H7 in selected food systems by a surface plasmon resonance biosensor. LWT-

Food Sci. Technol. 40:187-192.

47. Windham, W. R., K. C. Lawrence, B. Park, and R.J. Buhr. 2003. Visible/NIR

spectroscopy for characterizing fecal contamination of chicken carcasses. Trans.

ASABE 46:747-754.


48. Windham, W.R., S.C. Yoon, S.R. Ladely, J.A. Haley, J.W. Heitschmidt, K.C.

Lawrence, B. Park, N. Narang, and W.C. Cray. 2013. Detection by hyperspectral

imaging of Shiga toxin–producing Escherichia coli serogroups O26, O45, O103,

O111, O121, and O145 on rainbow Agar. J. Food Prot. 76:1129-1136.

49. Wold, S., K. Esbensen, and P. Geladi. 1987. Principal component analysis.

Chemom. Intell. Lab. Syst. 2:37-52.

50. Wold, S., M. Sjöström, and L. Eriksson. 2001. PLS-regression: a basic tool of

chemometrics. Chemom. Intell. Lab. Syst. 58:109-130.

51. Wu, D., and D.W. Sun. 2013. Applications of visible and near infrared

hyperspectral imaging for non-invasively measuring distribution of water-holding

capacity in salmon flesh. Talanta 116:266-276.

52. Xu, X., H. Li, D. Hasan, R.S. Ruoff, A.X. Wang, and D.L. Fan. 2013. Near‐field

enhanced plasmonic‐magnetic bifunctional nanotubes for single cell bioanalysis.

Adv. Funct. Mater. 23:4332-4338.

53. Yang, L.J., J. Wang, Z.J. Li, X.H. Song, Y.M. Liu, and Q.R. Hu. 2014. Rapid

Differentiation and identification of Shigella sonnei and Escherichia coli O157:

H7 by Fourier transform infrared spectroscopy and multivariate statistical

analysis. Adv. Mater. Res. 926:1116-1119.

54. Yoon, S. C., W. Windham, S. Ladely, G. Heitschmidt, K. Lawrence, B. Park, N.

Narang, and W. Cray. 2013. Hyperspectral imaging for differentiating colonies of

non-O157 Shiga toxin-producing Escherichia coli (STEC) serogroups on spread

plates of pure cultures. J. Near Infrar. Spectrosc. 21:81-95.


Table 2.1: Overview of HSI methods applied to food for quality and safety assessment.

Product | Quality attribute | Spectra | Band range (nm) | MVDA method | # of optimal bands selected | Reference
Apple | Surface defects | Reflectance | 450-940 | ANN | N/A | Ariana et al., 2006
Beef | Tenderness | Reflectance | 400-1000 | PCA, CDA | 8 | Naganathan et al., 2008
Cheese | Composition | Absorbance | 1100-2498 | PCA, PLS | N/A | Burger and Geladi 2006
Chicken | Fecal detection | Reflectance | 430-900 | Par, Min, Mah, Max, Sam, Bin | N/A | Park et al., 2006
Chicken | Enterobacteriaceae | Reflectance | 930-1660 | PLSR | 3 | Feng et al., 2013
Non-O157:H7 STEC E. coli | Colony differentiation | Absorbance | 400-1000 | PCA, kNN | N/A | Windham et al., 2013
Grain | Insect damage | Reflectance | 1000-1600 | PCA, Bin | 2 | Singh et al., 2008
Mushroom | Deterioration | Reflectance | 400-1000 | MLR, PCR | 20 | Gowen et al., 2008
Spinach | E. coli | Reflectance | 400-1000 | PCA, ANN | N/A | Siripatrawan et al., 2011
Salmon | WHC | Absorbance | 400-1753 | PLSR, LS-SVM | 9, 12, 13 | Wu and Sun 2013

STEC = Shiga toxin producing E. coli, WHC = water holding capacity, ANN = artificial neural network, MLR = multiple linear regression, PCR = principal component regression, Par = parallelepiped, Min = minimum distance, Mah = Mahalanobis distance, Max = maximum likelihood, Sam = spectral angle mapper, Bin = binary, PCA = principal component analysis, CDA = canonical discriminant analysis, PLSR = partial least squares regression, LS-SVM = least-squares support vector machine, kNN = k nearest neighbor


CHAPTER 3

RAPID AND EARLY DETECTION OF SALMONELLA SEROTYPES WITH

HYPERSPECTRAL MICROSCOPE AND MULTIVARIATE DATA ANALYSIS¹

¹Eady, M.B., Park, B., and Choi, S. Submitted to “The Journal of Food Protection” on 08/04/14


Abstract

This study was designed to evaluate hyperspectral microscope images for early

and rapid detection of Salmonella serotypes: S. Enteritidis, S. Heidelberg, S. Infantis, S.

Kentucky, and S. Typhimurium at incubation times of 6, 8, 10, 12, and 24 h. Images

were collected by an acousto-optic tunable filter (AOTF) hyperspectral microscope

imaging (HMI) system, with a metal halide light source measuring 89 contiguous bands

every 4 nm between 450 – 800 nm. Pearson correlation values were calculated for

incubation times of 8, 10, and 12 hrs., compared to 24 hrs. to evaluate the change in

spectral signatures from bacterial cells over time. Regions of interest (ROIs) were analyzed at 30% of the pixels in an average cell size per image. Preprocessing of the spectral data was performed by applying a global data transformation algorithm, and followed with a principal component analysis (PCA). The Mahalanobis Distance (MD) was calculated from PCA score plots for analyzing serotype cluster separation. Partial

least squares regression (PLSR) was applied for calibration and validation of the model,

while soft independent modeling of class analogy (SIMCA) was utilized to classify

serotype clusters of the training set. Pearson correlation values, ranging from 0.987 to 0.999, indicate very similar spectral patterns across incubation times. PCA score plots show cluster separation at all incubation times, with MD values ranging from 2.146 to 27.071. PLSR had a maximum RMSEC value of 0.0025 and RMSEV value of 0.0030. SIMCA correctly classified values at 8 h = 98.32%, 10 h = 96.67%, 12 h = 88.33%, and 24 h = 98.67% with the optimal number of principal components (four or five). The results of this study suggest that Salmonella serotypes can be classified by applying a PCA to HMI data from samples incubated for as little as 8 h.

Introduction

Identifying the causative agents and contaminated products of a foodborne disease

outbreak can be a time consuming task. The use of nutrient enriched growth media has

been the standard methodology for many years. This approach can prove to be lengthy

with a best case scenario requiring several days (4). Methods such as PCR can reduce the

necessary time by lysing cells and amplifying DNA sequences, while enzyme-linked

immunosorbent assays (ELISAs) can reduce the required time through the binding of

pathogen specific surface antigens to antibodies (12,19). These methods require

extensive training with a high recurring cost of reagent kits and antibodies (13). These

obstacles can be difficult to overcome when tracking foodborne disease outbreaks, due to

the expedited measures of transportation logistics and rapid access to imported foods

(14).

Optical detection offers a nondestructive rapid methodology for real-time

recognition of pathogenic bacteria (20). Spectroscopic methods are utilized to produce a spectral signature that is unique to the organism. Multivariate data analysis (MVDA) methods are typically applied to categorize organisms based on the intrinsic differences in spectral signatures. Hyperspectral imaging (HSI) observes light in more bands than the three colors (red, green, and blue) recognizable by the human eye, and has the capability to compile contiguous wavebands for each pixel in an image (22). Hyperspectral microscope imaging (HMI) is a technique that collects microscopic data in a three-dimensional hypercube matrix, with two dimensions representing spatial data (x and y coordinates) and a third representing spectral data (λ), that can be used as a tool in identifying the unique optical properties of bacteria (22). Machine learning algorithms could then potentially be applied to classify bacterial cells, resulting in a sensitive detection method.

In addition to the need for an early and rapid foodborne pathogen detection methodology for disease tracking purposes, there are potential economic benefits for food industry applications. A rapid nondestructive optical detection system can provide a means for maintaining the high throughput required for food production facilities, while streamlining quality assurance protocols. Early detection can dramatically reduce the amount of time that food products are kept in storage before being released to consumers.

Because of the possibility of a foodborne disease outbreak, companies have been known to implement a “test and hold” time period for testing the microbial safety of new products prior to release in consumer markets. Reducing the “test and hold” period to the shortest amount of time could have promising impacts on product safety and transportation logistics while minimizing final product quality degradation. Clinical microbiology is another field that could benefit from the capabilities of early optical detection of pathogenic bacteria. By quickly diagnosing bacterial infections, healthcare professionals can implement specialized medical care sooner, providing the patient with a better chance for recovery. In addition to early detection, HMI offers the capabilities for

rapid detection by eliminating some of the downsides to previously mentioned

methodologies.

Previously, HSI systems have been applied to agricultural commodities and studied as in-line fecal detection mechanisms for poultry processing facilities (11, 15, 17, 21, 23, 28), pathogen detection in vegetables (26), safety characteristics of apples (16), and

quality attributes of the beef industry (24). In 2008, live and dead Bacillus anthracis

endospores were differentiated by HMI with fluorescent tags (1). Hyperspectral imaging

has also been used in food safety applications on a macro-scale by predicting the

Enterobacteriaceae loads, total viable bacteria count, and total Pseudomonas loads on

raw broiler chicken breast meat (6, 7, 8). Recently HMI was implemented as a way of

differentiating between Shiga-toxin producing E. coli (STEC) serogroups on the colony

scale (29, 32). HMI could be an effective tool for differentiating between various

serotypes of a bacterial species at the cellular level, and comparing classification abilities

at early detection times with micro-colonies incubated for less than 24 hrs. The objectives

of this research were to 1) determine if the spectral signatures at 24 hrs. incubation time

can be compared to the spectra of earlier time frames consisting of 6, 8, 10, and 12 hrs. of

incubation, and 2) determine if the five Salmonella serotypes (S. Enteritidis, S.

Heidelberg, S. Infantis, S. Kentucky, and S. Typhimurium) could be differentiated at the

cellular level by their spectral signatures.

Materials and methods

Preparation of isolates and microscope slides

Glycerol stock cultures of S. Enteritidis (SE), S. Heidelberg (SH), S. Infantis (SI),

S. Kentucky (SK), and S. Typhimurium (ST) were obtained from poultry carcass rinses at the Poultry Processing and Swine Physiology Research Unit at the Russell Research Center in Athens, GA and stored at -80°C until needed. Samples were prepared for short-term storage by inoculating agar slants and storing at 4°C. Fresh cultures were prepared by inoculating a few agar slant colonies into 10 ml of tryptic soy broth (TSB) and incubating samples at 35 to 37°C for 18 to 24 h. The TSB cultures were serially diluted in peptone buffer up to a 10^-5 dilution, and 100 µl of the 10^-5 dilution was spread onto brilliant green sulfa (BGS) agar media plates, for a final plate dilution of 10^-6. Samples were incubated at

35 to 37°C for a predetermined amount of time (6, 8, 10, 12, and 24 hrs.). From each plate, 1 to 3 colonies were picked using a stereo-microscope (Olympus, SZK12, Center Valley, PA) (except for 24 h incubated agar plates) and then resuspended in 10 µl of sterile water. To prepare the HMI microscope slides, 3 µl of bacteria suspended in sterile water was placed in the center of a microscope slide, which was allowed to air dry in a biosafety cabinet (BSC, Baker, Sanford, ME) for 15 min. After drying, 0.8 µl of sterile water was added with a cover slide on top, and a drop of 50 cc immersion oil (Nikon, Melville, NY) was placed on top of the cover slide.

Hyperspectral microscope imaging (HMI) system

The HMI system (HSI-400, Gooch & Housego, Orlando, FL) was used for acquiring images from the glass slides containing foodborne bacterial samples. The HMI system consists of a Nikon upright microscope (Nikon, Eclipse e80i, Lewisville, TX), an acousto-optic tunable filter (AOTF) (Gooch & Housego, HSI-400, Ilminster, England), a high-performance cooled electron-multiplying charge-coupled device (EMCCD) 16-bit camera (iXon, Andor Technology, Belfast, Northern Ireland), and a dark-field illumination light source (CytoViva 150 Unit, 24W metal halide, Auburn, AL). The AOTF used for this research is a high-speed, high-throughput, random-access solid-state optical filter with an adjustable optical pass-band and exceptionally high rejected light levels. The AOTF delivers diffraction-limited image quality with variable bandwidth resolution within 2 nm.


HMI acquisitions

HMI-400 software (ChromaDynamics, Lakewood, NJ) was used for acquiring HMIs and converting the data from individual spectral images to the contiguous hypercube format. The hypercube stacks the images obtained at all 89 bands and overlays each pixel into a three-dimensional cube. Once the image acquisition software was executed, all required parameters, including exposure, electron-multiplying (EM) gain, binning, and bandwidth, were selected to obtain high-quality images. In this experiment, the hyperspectral microscope imagery was collected from 450-800 nm at 2 nm bandwidth and 4 nm spectral intervals, with a scanning exposure time of 250 ms and a gain of 9. All images were acquired in spectral sweep mode, which enabled us to collect a total of 89 contiguous spectral images by sweeping the entire image in one scan direction and returning to capture the next spectral band, with an acquisition time of approximately 1.98 s per single spectral image.

Region of interest (ROI) and preprocessing

HMIs were analyzed through the Environment for Visualizing Images (ENVI) software (Exelis, McLean, VA). Region of interest (ROI) extraction by threshold selection was implemented to select the pixels within the bacterial cells, removing the background. ROI pixels were then randomized to ensure, as much as possible, that data subsets were a true representation of the image. Data subsets were calculated based on 30% of the pixels in an average cell for each image (approximately 500 pixels per cell), in an effort to reduce the large data processing and storage requirement associated with HSI. Six hour incubation did not produce enough cells per image with any of the five serotypes for an adequate comparison to other incubation times; therefore, 6 hrs. images were not included in further analyses. Each sample subset size is thirty, with the exception of SH (n=18) and ST (n=21), both at 8 hrs., and SK at 12 hrs. (n=28). ROI subsets were averaged, and spectra were normalized to the excitation band at 546 nm, which has the maximum intensity for all serotypes, placing the spectra on a scale of 0 to 1.

Pearson correlation values were calculated for each serotype to compare the average

spectrum of shortened incubation times to that of 24 h.
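The normalization and correlation steps described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the software workflow used here: the spectra are short invented lists, the function names are hypothetical, and the normalization simply divides by the most intense band, which the text identifies as the 546 nm band.

```python
import math


def normalize_to_peak(spectrum):
    """Scale a spectrum so that its most intense band equals 1."""
    peak = max(spectrum)
    return [value / peak for value in spectrum]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical mean scattering intensities at four bands for two incubation times.
spectrum_8h = [120.0, 400.0, 260.0, 180.0]
spectrum_24h = [150.0, 500.0, 300.0, 240.0]

norm_8h = normalize_to_peak(spectrum_8h)
norm_24h = normalize_to_peak(spectrum_24h)
print(pearson(norm_8h, norm_24h))
```

The Pearson coefficient is unaffected by the rescaling itself, so the normalization mainly serves to put spectra from different images on a common 0 to 1 scale for plotting and later pretreatment.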

Data preprocessing was carried out on the subsets through the Unscrambler

software version 9.8 (Camo, Oslo, Norway). A first derivative Savitzky-Golay data

transformation algorithm (25) with a quadratic polynomial, and 11-point convolution

interval was applied to all data subsets to create more spectral band separation between

serotypes, followed by a moving average smoothing treatment with a 5-point convolution

interval (27).
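The pretreatment sequence described above (a first-derivative Savitzky-Golay filter with a quadratic polynomial and an 11-point window, followed by a 5-point moving average) can be illustrated with a short Python sketch. It is a simplified illustration, not the Unscrambler implementation: edge points without a full window are dropped, which is an assumption, and the derivative is expressed per band index rather than per nanometer. For a symmetric window and a quadratic fit, the least-squares slope at the window center reduces to the weighted sum used below.

```python
def savgol_first_derivative(y, half_window=5):
    """First derivative from a local quadratic least-squares fit.

    With a symmetric window k = -m..m, the fitted slope at the center is
    sum(k * y[i + k]) / sum(k * k). Edge points are dropped (assumption).
    """
    m = half_window
    norm = sum(k * k for k in range(-m, m + 1))
    return [sum(k * y[i + k] for k in range(-m, m + 1)) / norm
            for i in range(m, len(y) - m)]


def moving_average(y, window=5):
    """Centered moving average; edge points without a full window are dropped."""
    half = window // 2
    return [sum(y[i - half:i + half + 1]) / window
            for i in range(half, len(y) - half)]


# Hypothetical spectrum: a quadratic curve, whose true slope at index i is 2 * i.
spectrum = [float(i * i) for i in range(20)]
derivative = savgol_first_derivative(spectrum)   # 11-point window
smoothed = moving_average(derivative)            # 5-point window
print(derivative)
print(smoothed)
```

To express the derivative per nanometer, each value would be divided by the band spacing (4 nm in this study).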

A two-band relative ratio was calculated for each data subset. The band at 590

nm was selected based on the correlation of the peak to the original spectral peak at 590

nm (figure 2), and the 538 nm band was selected because this represents the second

highest peak, which upon visual inspection also depicts variance in each serotype’s

spectra.

$$\text{Relative Ratio} = \frac{SI_{\lambda_1}}{SI_{\lambda_2}}$$

where $SI_{\lambda_1}$ is the scattering intensity at 590 nm and $SI_{\lambda_2}$ is the scattering intensity at 538 nm. OriginPro9 software (OriginLab, Northampton, MA) was used to calculate a one-way ANOVA on the relative ratio values at a significance level of 0.05, with a Fisher’s least significant difference means comparison test (9). The relative ratios were calculated at 8 h, 10 h, 12 h, and 24 h to determine if the two-band relative ratios can be differentiated at each incubation time between serotypes.
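The two-band ratio and the one-way ANOVA F statistic can be sketched in Python as follows. This is a hedged illustration only: the spectra and group values are invented, the function names are hypothetical, and the sketch stops at the F statistic, leaving out the comparison against the critical value at the 0.05 level and the Fisher's least significant difference test.

```python
def relative_ratio(spectrum, wavelengths, band_1=590, band_2=538):
    """Scattering intensity at band_1 divided by the intensity at band_2."""
    return spectrum[wavelengths.index(band_1)] / spectrum[wavelengths.index(band_2)]


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical normalized spectrum sampled at three bands (nm).
wavelengths = [538, 546, 590]
spectrum = [0.5, 1.0, 0.4]
ratio = relative_ratio(spectrum, wavelengths)

# Hypothetical ratio values for three serotypes at one incubation time.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(groups)
print(ratio, f_stat)
```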

Principal component analysis (PCA)

PCA was performed on ROI subsets at each incubation period. In general PCA is

a factorial analysis used to reduce a large amount of multivariate data into manageable

plots based on underlying data trends. Esbensen (5) presents additional information on

PCA. For this experiment, PCAs were calculated with full leave-one-out cross-validation

and Martens’ uncertainty principle was applied (18). PCA can be calculated by:

$$X = TP^{T} + E$$

where X is the data matrix, T represents the score matrix, P represents the loading matrix,

and E represents the residuals or spectral noise (30). PCA score plots and loading plots are generated through the analysis. Score plots show sample clustering based on the underlying data trends, and loading plots were used to determine spectral bands contributing meaningful influence on the score plots. The Mahalanobis Distance (MD) values were calculated to quantify the distances between the center points of two PCA clusters. De Maesschalck et al. (2) and Hair et al. (10) provide further information on

MD. The MD can be calculated by:

$$MD^{2} = (x_i - x_j)^{T} S^{-1} (x_i - x_j)$$

where $x_i$ and $x_j$ are the averages of groups i and j, T represents the transpose of the matrix, and S is the covariance matrix.
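The PCA decomposition of the data matrix into scores, loadings, and residuals can be illustrated with a small Python sketch. It extracts only the first principal component by power iteration on mean-centered data, which is a simplification chosen for illustration and not the algorithm confirmed to be used by the Unscrambler software. The data matrix and all function names are hypothetical.

```python
import math


def center_columns(X):
    """Subtract the column mean from every variable (band) of X."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(means))] for row in X]


def first_component(Xc, iterations=100):
    """Scores t and unit-length loading p of the first PC of centered Xc.

    Power iteration on Xc^T Xc; assumes the start vector is not orthogonal
    to the first loading.
    """
    n_vars = len(Xc[0])
    p = [1.0] * n_vars
    for _ in range(iterations):
        t = [sum(row[j] * p[j] for j in range(n_vars)) for row in Xc]
        p = [sum(Xc[i][j] * t[i] for i in range(len(Xc))) for j in range(n_vars)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
    t = [sum(row[j] * p[j] for j in range(n_vars)) for row in Xc]
    return t, p


def residual_matrix(Xc, t, p):
    """E = Xc - t p^T for a one-component model."""
    return [[Xc[i][j] - t[i] * p[j] for j in range(len(p))]
            for i in range(len(Xc))]


# Hypothetical data: four samples, two bands, perfectly collinear.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
Xc = center_columns(X)
scores, loading = first_component(Xc)
E = residual_matrix(Xc, scores, loading)
print(loading, scores)
```

Because the two hypothetical bands are perfectly collinear, a single component reproduces the centered data and the residual matrix E is zero to within rounding. Real spectra would leave nonzero residuals that later components and E absorb.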

Experiment repetition and validation

The experiment was repeated with new cultures of the same five Salmonella

serotypes grown and plated on BGS. The previously described image collection, ROI extraction, and spectral transformations were repeated. Partial least squares regression (PLSR) with Martens’ uncertainty principle and full leave-one-out cross-validation was

also performed on the validation data. PLSR was used as a method for validating the

performance of the global data transformation model, by assessing the fit of the data

collected from new images. Soft independent modeling of class analogy (SIMCA) was

implemented as a supervised pattern recognition method. In general, SIMCA classifies

the validation data based on results from the previously generated PCA. Dunn III et al.

(3) provide additional information on SIMCA.

Results

BGS plate growth and spectral comparisons

S. Infantis showed no signs of colony growth at 6 hrs. of incubation time during

the first repetition, and little or no growth at 6, 8, 10, or 12 hrs. during the second repetition of the experiment. However, the other four serotypes had noticeable colony growth from 6 to 24 hrs. Figure 3.1 shows hyperspectral microscopic composite images from S. Typhimurium cells grown at different incubation times, and Figure 3.2(a) shows a typical untreated spectral pattern for S. Enteritidis at the four incubation times, while

Figure 3.2(b) represents the spectra with the global pretreatment algorithm applied. A sharp excitation peak occurs at 546 nm, with a second more gradual peak beginning at

590 nm in the raw spectra. Pearson correlation values were used to compare the spectral signatures of the shorter incubation times to the standard 24 h. Correlation values ranged from 0.987 to 0.999 for all serotypes and incubation times, representing highly correlated spectra between samples and justifying the need for MVDA. The results of the two-band relative ratios analyzed through ANOVA for incubation times of 8, 10, 12, and 24 hrs.

can be seen in Table 3.1 and show a significant difference in values for each serotype comparison, with the exception of one (S. Heidelberg and S. Typhimurium at 8 h). This shows that the relationship between the transformed spectral peaks at 538 and 590 nm is responsible for a portion of the serotype difference explained in the PCA score plots.
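The two comparisons described above, the Pearson correlation between average spectra and the two-band relative ratio (590 nm / 538 nm), can be sketched as follows. The example assumes NumPy and a band grid that starts at 450 nm with a 4 nm step; the function names are hypothetical.

```python
import numpy as np

# Assumed band grid: 89 bands starting at 450 nm with a 4 nm step.
wavelengths = 450 + 4 * np.arange(89)

def pearson_r(spectrum_a, spectrum_b):
    """Pearson correlation coefficient between two spectra."""
    return float(np.corrcoef(spectrum_a, spectrum_b)[0, 1])

def band_ratio(spectrum, wavelengths, numerator_nm=590, denominator_nm=538):
    """Ratio of the intensities at the bands nearest the two wavelengths."""
    spectrum = np.asarray(spectrum, dtype=float)
    wavelengths = np.asarray(wavelengths)
    num = spectrum[np.argmin(np.abs(wavelengths - numerator_nm))]
    den = spectrum[np.argmin(np.abs(wavelengths - denominator_nm))]
    return float(num / den)
```

The per-image ratios would then be grouped by serotype and passed to a one-way ANOVA, which is not shown here.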

Principal component analysis (PCA)

Figure 3.3 represents the PCA score plots overlaying the loading vectors for the

first and second principal components (PCs). Visual serotype cluster separation is noticeable. Serotype clustering appears to be consistent in size and separation with the exception of ST at 12 h, which has higher intra-cluster variance. Each of the plots lists the amount of variance explained by the first and second PCs. The MD values listed in Table 3.2 quantify the cluster distances in the PCA score plots by measuring the distance from center point to center point.

Experimental repetition and validation

PLSR shows a root mean squared error of calibration (RMSEC) < 0.0025 and R² > 0.9844, while the root mean squared error of validation (RMSEV) is < 0.0030, with R² > 0.9954. SIMCA results are listed in Table 3.3. The SIMCA results are expressed as the percentage of data subsets from the second experimental run that were correctly classified based on the first run's PCA. The suggested optimal number of PCs for each incubation time, either four or five, was chosen to obtain a desirable percentage of explained variance.
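One way to pick the number of PCs from a target percentage of explained variance is sketched below. It assumes NumPy, mean-centered data, and the >96% criterion given in the note to Table 3.3; the function name is hypothetical and the software used in this study may apply a different rule.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.96):
    """Smallest number of PCs whose cumulative explained variance exceeds threshold."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    # Squared singular values are proportional to the variance along each PC.
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # Index of the first cumulative value strictly greater than the threshold.
    first_above = int(np.searchsorted(cumulative, threshold, side="right"))
    return min(first_above + 1, len(cumulative))
```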


Discussion

Spectral differences between incubation times

The lack of micro-colony growth at early incubation times for S. Infantis could

signify that the serotype metabolizes nutrients differently than the other serotypes,

possibly indicating a longer lag phase. Pearson correlation values are above 0.9870,

describing a strong similarity between the spectral signatures. Figure 3.1 represents a

typical image in which one can see S. Typhimurium cells of various life cycle stages,

denoted by the various cell shapes. The strong correlation values indicate that incubation times are not having an effect on the spectral signatures, suggesting that times earlier than

8 h would have very similar spectral patterns that could be used for identification and

classification purposes, if enough cells per image are obtained.

The two-band relative ratio difference of average values displayed in Table 3.1

shows that only S. Heidelberg and S. Typhimurium at 8 hrs. are not significantly

different. The lack of significant difference between these two serotypes at 8 hrs. could

be accounted for by the PCA only explaining 60% of the variance at the first PC. The

first PC shows strong peaks at the 538 and 590 nm bands, while 8 h would be affected by

the second PC more than the other incubation times, which has prominent peaks at 550,

598, 630, 666, and 726 nm.

Principal component analysis (PCA)

While the high correlation of the spectral signatures between 8 h and 24 h is

advantageous for describing the early detection possibilities, tight grouping makes for

difficult classification. Plotting the resulting score matrix gives a visual of dominant “object patterns” within the data matrix, whereas plotting the loading matrix shows the complementary “variable patterns” (30). Score plots display the relationships of the principal components (PCs) to the central axis in the PC matrix, and loading plots convey information about the direction of each principal component relative to the original coordinate system. While the serotypes can be classified and validated at each

incubation time, PCA score plots show different loading vector bands are influencing

cluster scattering at the four incubation times. This could be due in part to the majority of

the cells imaged at each incubation time being in different life cycle stages, affecting

light scattering. The PCA cluster for ST at 12 hrs. contains a higher level of intracluster

variance. This is because the collected image contained a large number of extracellular particles, which were included within the ROI threshold limits. Those

extracellular particles could be removed with further object-based image processing

algorithms to minimize variation. These particles appear to be background material from

the BGS agar and are accounting for the variation in the PCA score plot. The percentage

of variance that is explained by each PC aids in determining the optimal number of PCs.

It is desirable to use the fewest PCs necessary to reduce the data processing and storage requirements. The 12 hrs. plot has a lower percentage of variance described by the first PC due to the variance within ST. Table 3.2 shows that MD values range from 2.146 to 27.071, with higher values corresponding to greater cluster separation and thus easier classification. At 8 hrs. of incubation the average inter-cluster distance was the highest of the reduced times, suggesting that the intrinsic differences between serotypes have a more noticeable effect on the samples' spectra and could lead to higher classification results. This indicates that the MD could be a good candidate measure for early classification of Salmonella serotypes.


Validation with MVDA

The same ROI subset approach was applied to the validation data set and used to

assess the fit of the data pretreatment model. The precision of the model parameters

improves with the increasing number of relevant variables and observations (31). Using

the full 89 band variable set of data proved to be a close fit for the regression analysis

with R2 values for both calibration and validation > 0.9821.

SIMCA is based on the idea that objects in one class or group show similar but

not necessarily identical patterns (3). This form of classification applies pattern recognition to the second set and uses the same global data pretreatment algorithm applied to the PCA, with the objective of predicting the serotype’s group classification.

The samples at 12 hrs. of incubation had a lower percentage of correctly classified subsets as seen in Table 3.3, likely due to the extracellular particles in the ST image.

Both the 8 hrs. and 24 hrs. samples were classified correctly at rates above 98% against the training data set, suggesting that early identification with high classification accuracy is possible in repeat experiments.

In conclusion, early incubation times appear to have nearly identical spectra to 24 h, and incubation time does not appear to affect the spectral signature of the bacteria, indicating that time is not a limiting factor for early detection through HMI at the cellular level. The data from this study suggest that bacterial cells at any stage of their life cycle have a similar spectral signature that is unique not only to their species but to their serotype as well. The data transformation model shows increased band separation, and the classification of serotypes by HMI can be implemented as a rapid and early analysis tool for presumptive detection of bacterial pathogens. By applying a classification method such as SIMCA on the second repetition to validate the results of the first data set collected, we showed that

classification through the MVDA steps is repeatable. Future experiments will focus on

reducing the number of bands to as few as possible while maintaining high classification accuracy to reduce the data processing and storage requirements necessary for an applicable implementation of this technology in the food industry.


References

1. Anderson, J., C. Reynolds, D. Ringelberg, J. Edwards, and K. Foley. 2008.

Differentiation of live-viable versus dead bacterial endospores by calibrated

hyperspectral microscopy. J. Microsc. 1:130-136.

2. De Maesschalck, R., D. Jouan-Rimbaud, and D.L. Massart. 2000. The

mahalanobis distance. Chemom. Intell. Lab. Syst. 50:1-18.

3. Dunn III, W.J., S. Wold, and Y.C. Martin. 1978. Structure-activity study of β-adrenergic agents using the SIMCA method of pattern recognition. J. Med. Chem. 21:922-930.

4. Ellis, D.I., and R. Goodacre. 2001. Rapid and quantitative detection of the

microbial spoilage of foods: current status and future trends. Trends in Food Sci. Technol. 11:414-424.

5. Esbensen, K. 2000. Multivariate data analysis - in practice. CAMO, Oslo,

Norway.

6. Feng, Y. Z., G. Elmasry, D.W. Sun, A.G.M. Scannell, D. Walsh, and N. Morcy.

2013. Near-infrared hyperspectral imaging and partial least squares regression for

rapid and reagentless determination of Enterobacteriaceae on chicken fillets. J.

Food Chem. 138:1829-1836.

7. Feng, Y.Z., and D.W. Sun. 2013. Determination of total viable count (TVC) in

chicken breast fillets by near-infrared hyperspectral imaging and spectroscopic

transformations. Talanta. 1:244-249.

8. Feng, Y.Z., and D.W. Sun. 2013. Near-infrared hyperspectral imaging in tandem

with partial least squares regression and genetic algorithm for non-destructive determination and visualization of Pseudomonas loads in chicken fillets. Talanta. 1:74-83.

9. Fisher, R.A. 1936. The use of multiple measurements in taxonomic problems. Ann. Eugen. 7:179-188.

10. Hair, J.F., W.C. Black, B.J. Babin, and R.E. Anderson. 2010. Multivariate data

analysis. Prentice Hall, Inc. Upper Saddle River, NJ.

11. Heitschmidt, J.W., B. Park, K.C. Lawrence, W.R. Windham, and D.P. Smith.

2007. Improved hyperspectral imaging system for fecal detection on poultry

carcasses. Trans ASABE 50:1427-1432.

12. Jay, J.M., M.J. Loessner, and D.A. Golden. 2005. Modern food microbiology.

Springer, New York, NY.

13. Joo, J., C. Yim, D. Kwon, J. Lee, H.H. Shin, H.J. Cha, and S. Jeon. 2012. A facile

and sensitive detection of pathogenic bacteria using nanoparticles and optical nanocrystal probes. Analyst. 16:3609-3612.

14. Käferstein, F.K., Y. Motarjemi, and D.W. Bettcher. 1997. Foodborne disease

control: a transnational challenge. Emerging. Infect. Dis. 3:503-510.

15. Lawrence, K. C., W. R. Windham, B. Park, and R.J. Buhr. 2003. A hyperspectral

imaging system for identification of faecal and ingesta contamination on poultry

carcasses. J. Near Infrared. Spectrosc. 11:269-281.

16. Liu, Y., Y. Chen, M.S. Kim, D.E. Chan, and A.M. Lefcourt. 2007. Development

of simple algorithms for the detection of fecal contaminants on apples from visible/near infrared hyperspectral reflectance imaging. J. Food Eng. 87:412-418.

17. Liu, Y., W. R. Windham, K.C. Lawrence, and B. Park. 2003. Simple algorithms

for the classification of visible/near-infrared and hyperspectral imaging spectra of

chicken skins, feces, and fecal contaminated skins. J. Appl. Spectrosc. 57:1609-1612.

18. Martens, H., and M. Martens. 1999. Modified jack-knife estimation of parameter

uncertainty in bilinear modeling by partial least squares regression (PLSR). Food

Qual. Prefer. 11:5-16.

19. Mullis, K.B., and F.A. Faloona. 1987. Specific synthesis of DNA in vitro via a

polymerase-catalysed chain reaction. J. Gen. Microbiol. 155:335-350.

20. Narsaiah, K., S. N. Narayana, R. Bhardwaj, R.Sharma, and R. Kumar. 2011.

Optical biosensors for food quality and safety assurance-a review. J. Food Sci.

Technol. 49:383-406.

21. Park, B., K. C. Lawrence, W.R. Windham, and D.P. Smith. 2006. Performance of

hyperspectral imaging system for poultry surface fecal contaminant detection. J.

Food Eng. 75:340-348.

22. Park, B., K.C. Lawrence, W.R. Windham, and R.J. Buhr. 2002. Hyperspectral

imaging for detecting fecal and ingesta contaminates on poultry carcasses. Trans.

ASABE 45:2017-2026.

23. Park, B., S. Yoon, W.R. Windham, K.C. Lawrence, M. Kim, and K. Chao. 2011.

Line-scan hyperspectral imaging for real-time in-line poultry fecal detection.

Sens. & Instrmen. Food Qual. 5:25-32.


24. Peng, Y., J. Z. Zhang, W. Wang, Y. Li, J. Wu, H. Huang, X. Gao, and W. Jiang.

2011. Potential prediction of the microbial spoilage of beef using the spatially

resolved hyperspectral scattering profiles. J. Food Eng. 102:163-169.

25. Savitzky, A., and M.J. Golay. 1964. Smoothing and differentiation of data by

simplified least squares procedures. J. Anal. Chem. 8:1627-1639.

26. Siripatrawan, U., Y. Makino, Y. Kawagoe, and S. Oshita. 2011. Rapid detection

of Escherichia coli contamination in packaged fresh spinach using hyperspectral

imaging. Talanta 85:276-281.

27. Soderberg, G.L., and L.M. Knutson. 2000. A guide for use and interpretation of

kinesiologic electromyographic data. Phy. Ther. 5:485-498.

28. Windham, W.R., J.W. Heitschmidt, D. P. Smith, and M.E. Berrang. 2005.

Detection of ingesta on pre-chilled broiler carcasses by hyperspectral imaging.

Int. J. Poult. Sci. 4:959-964.

29. Windham, W.R., S. Yoon, S.R. Ladley, J.A. Haley, J.W. Heitschmidt, K.

Lawrence, B. Park, N. Narang, and W.C. Cray. 2013. Detection by hyperspectral

imaging of shiga toxin-producing Escherichia coli serogroups O26, O45, O103,

O111, O121, and O145 on rainbow agar. J. Food Prot. 76:1129-1136.

30. Wold, S., K. Esbensen, and P. Geladi. 1987. Principal component analysis.

Chemom. Intell. Lab. Syst. 2:37-52.

31. Wold, S., M. Sjöström, and L. Eriksson. 2001. PLS-regression: a basic tool of

chemometrics. Chemom. Intell. Lab. Syst. 58:109-130.

32. Yoon, S., W.R. Windham, S.R. Ladley, J.W. Heitschmidt, K.C. Lawrence, B.

Park, N. Narang, and W.C. Cray. 2013. Differentiation of big-six non-O157 Shiga-toxin producing Escherichia coli (STEC) on spread plates of mixed cultures using hyperspectral imaging. J. Food Meas. and Charact. 7:47-59.

Figure 3.1: Hyperspectral microscope images (1000x) of S. Typhimurium cells at different incubation times: (a) 6 hrs., (b) 8 hrs., (c) 10 hrs., (d) 12 hrs., and (e) 24 hrs.

Figure 3.2: (a) Average spectra for S. Enteritidis normalized to 546 nm, (b) average S. Enteritidis spectra with pretreatment algorithm applied. Both panels plot intensity (a.u.) against wavelength (nm) for incubation times of 8, 10, 12, and 24 h.

Figure 3.3: PCA score plots with loading vectors for four incubation times: (a) 8 hrs., (b) 10 hrs., (c) 12 hrs., and (d) 24 hrs. Scores are plotted on PC 1 and PC 2 for S. Enteritidis, S. Heidelberg, S. Infantis, S. Kentucky, and S. Typhimurium; the PC 1 and PC 2 loading vectors are plotted as intensity (a.u.) against wavelength (nm).


Table 3.1: Difference of average serotype values from a one-way ANOVA Fisher test, comparing relative average spectral peak ratios (590 nm / 538 nm) at incubation times of 8 – 24 hrs.

           Comparison between serotypes
Time (h)   E:H      E:I      E:K      E:T      H:I      H:K      H:T       I:K      I:T      K:T
8          -0.11    n/a      0.07     -0.13    n/a      0.064    (-0.00)   n/a      n/a      -0.07
10         -0.05    -0.10    -0.12    -0.08    -0.05    -0.07    -0.026    -0.02    0.02     0.04
12         -0.35    -0.42    -0.23    -0.10    -0.05    0.12     0.254     0.19     0.32     0.131
24         -0.20    -0.25    -0.62    -0.33    -0.05    0.14     -0.136    0.19     -0.08    -0.27

E = S. Enteritidis, H = S. Heidelberg, I = S. Infantis, K = S. Kentucky, T = S. Typhimurium. p = 0.05; ( ) represents a pair of average values that are not significantly different.


Table 3.2: Mahalanobis distance (MD) values between serotype clusters calculated from PCA score plots at incubation times of 8 – 24 hrs.

           Comparison between serotypes
Time (h)   E:H      E:I      E:K      E:T      H:I      H:K      H:T      I:K      I:T      K:T
8          14.24    n/a      26.29    16.43    n/a      15.66    7.44     n/a      n/a      22.07
10         21.46    11.76    10.27    12.43    8.73     13.55    14.63    10.21    13.71    4.11
12         23.27    27.07    16.64    11.06    9.20     5.49     2.15     10.21    3.11     3.81
24         4.60     12.93    15.62    17.62    12.27    17.54    17.37    8.59     5.10     10.45

E = S. Enteritidis, H = S. Heidelberg, I = S. Infantis, K = S. Kentucky, T = S. Typhimurium


Table 3.3: Soft independent modeling of class analogy (SIMCA) for all serotypes within each incubation period.

Incubation time (h)   Optimal # of PCs   Total # of serotype subsets   % of subsets correctly classified
8                     4                  119                           98.32
10                    4                  120                           96.67
12                    5                  120                           88.33
24                    4                  150                           98.67

(Optimal # of PCs represents the # of principal components required to explain >96% of variance)


CHAPTER 4

Visible Near-Infrared Hyperspectral Microscopy and Informative Band Selection for

Classification of Salmonella Enterica Serotypes¹

¹ Eady, M.B., and Park, B. To be submitted to The Journal of Microscopy.


Abstract

A novel method for classifying five Salmonella enterica serotypes (S. Enteritidis, S. Heidelberg, S. Infantis, S. Kentucky, and S. Typhimurium) through the analysis of images obtained from a hyperspectral microscope is proposed as a rapid detection method for pathogenic foodborne bacteria. Data were obtained from hypercubes of darkfield images of bacterial cells. A metal halide lighting source emits light through a glass slide containing an air-dried bacterial suspension, and an acousto-optic tunable filter (AOTF) is paired with an electron-multiplying charge coupled device (EMCCD) to capture light scattering. Hyperspectral microscope images (HMI) were collected with 89 spectral bands, every 4 nm between 450 and 800 nm. Principal component analysis (PCA) was carried out on the images, with informative variables selected based on PCA loading vectors and inter-band intensity variance. Reduced variable sets of 20, 12, 7, and 3 bands were chosen within the 586 to 662 nm range for comparison. PCA score plots show cluster separation for the full 89-point spectrum and for the reduced variable sets; however, separation between S. Enteritidis and S. Typhimurium is less distinct. A second set of HMIs was collected at the reduced number of bands. Support vector machine (SVM) classification was implemented as a tool for classifying the five serotypes into groups. With the first repetition of HMIs, SVM classification accuracy dropped to 84.2%, whereas the HMIs collected at only the reduced variables had increased classification accuracies with as few as three spectral bands selected (590, 594, and 598 nm), representing a 96.6% reduction in the data processing and storage requirement of HMI.


Introduction

Traditional foodborne detection methodologies such as the use of nutrient enriched

growth media or polymerase chain reaction (PCR) have been widely implemented for bacterial

detection and identification for some time; however, these protocols come with disadvantages

that can be limiting in assessing potential causative agents in foodborne disease outbreaks.

Utilizing nutrient enriched growth media is time consuming with confirmation results requiring

several days or longer. PCR analysis can be completed in considerably less time, although it requires extensive training and carries the high recurring cost of cell lysing reagents.

Optical detection methodologies have been approached as a rapid detection method for

bacteria, focusing on the classification of microbes based on a spectral signature that is unique to

that organism (Narsaiah et al., 2012). Spectroscopic techniques such as surface-enhanced

Raman spectroscopy (SERS), Fourier-transform infrared (FT-IR), laser-induced breakdown spectroscopy (LIBS), and surface plasmon resonance (SPR) have previously been used as a means of nondestructive foodborne pathogen detection (Dudka et al., 2009; Rehse et al., 2010;

Sundaram et al., 2013; Udelhoven et al., 2000). The proposed hyperspectral microscope image

(HMI) analysis method has the potential for optical identification with the added benefit of sensitivity, requiring only a few cells captured within an image for classification.

Hyperspectral imaging (HSI) is a technique that generates both spatial and spectral information, representing a three-dimensional data set or hypercube with two of the dimensions corresponding to spatial data (x and y coordinates) and a third dimension yielding spectral data (λ) (Sun, 2010). The hypercube is built by capturing an image at each selected band and overlaying these images, creating a spectrum for each pixel within the image while keeping the x and y coordinates constant. HSI

has been implemented for medicine (Kellicut et al., 2005), pharmaceuticals (Roggo et al., 2005),

and astronomy (Wood et al., 2002) applications. In recent years the exploration for applications of stand-off HSI in agricultural commodities has risen. This technology has been used as a

nondestructive automated defect detection system in agricultural commodities such as broiler

chickens, apples, mushrooms, and salmon (Gowen et al., 2008; Lawrence et al., 2003; Liu et al.,

2007; Wu et al., 2013), to name a few. Park et al. (2012) showed that Salmonella and E. coli

differ in HMI obtained spectra. This experiment proposes a novel method for detecting and

classifying bacteria on a microscopic level through analysis of the HSI.

Salmonellosis is the disease accompanying a Salmonella infection, typically resulting in

gastrointestinal symptoms. It is estimated that 93.8 million cases occur worldwide, resulting in approximately 155,000 deaths each year (Majowicz et al., 2010).

In addition to the risk of adverse human health issues, the presence of Salmonella can produce significant economic problems for the food industry, including food producers, processors, and distributors. This gram-negative pathogenic bacterium has caused foodborne disease outbreaks in poultry, eggs, milk, nuts, fruits, and vegetables (Cox et al., 2013). Costly product recalls can result in a negative consumer perception in addition to the profit loss. The growing acceptance of food safety programs such as the Hazard Analysis and Critical Control

Points (HACCP) program puts a focus on analyzing each processing step for potential chemical, physical, or biological hazards. The microbial analysis of a HACCP plan requires monitoring and validation of results. A rapid detection method for foodborne pathogens such as Salmonella enterica serotypes can assist in automating this process, which typically focuses on microbial detection methods using enriched growth media or PCR.

A large amount of data is generated from HSI hypercubes because each pixel yields a spectrum generated from the overlapping images. To apply this technology in a quality control facility, where a multitude of samples must be collected daily, it is important to reduce the amount of data collected as far as possible while still maintaining a high level of classification accuracy. Previously, various spectral reduction approaches have been undertaken to reduce the number of variables necessary, with success in maintaining classification abilities. Windham et al. (2013) used a principal component analysis (PCA) loading vector to find bands that explained the highest amount of variance. Partial least squares regression (PLSR) techniques have been applied to each band, selecting the bands with the smallest root mean squared error of calibration (RMSEC) or validation (RMSEV) (Wu et al., 2013). Other techniques implemented include an ensemble Monte Carlo selection method, discriminant analysis (DA), analysis of spectral difference (ASD), successive projections algorithm (SPA), and principal component regression (PCR) (Esquerre et al., 2011; Gowen et al., 2008; Liu et al., 2007; Ye et al., 2008). While reducing the number of bands necessary for identification, care must be taken not to over-reduce the spectra and lose variables that explain the naturally occurring variance that drives spectral-based classification. In this study, we evaluated the necessity of each band using two selection techniques: the loading vectors of a principal component analysis, and the amount of inter-band variance obtained by normalizing the spectra to each band, then averaging and assessing the average variance at each band.


The objective of this experiment is to develop a hyperspectral microscopic method to definitively classify five serotypes of Salmonella while selecting the fewest informative variables, or spectral bands, necessary to retain a high rate of classification compared to the full 89-band visible/near-infrared spectrum between 450 and 800 nm.

Materials and methods

Sample preparation

Five Salmonella serotypes: S. Enteritidis (SE), S. Heidelberg (SH), S. Infantis (SI), S.

Kentucky (SK), and S. Typhimurium (ST) were obtained from poultry carcass rinses in the

Poultry Processing and Swine Physiology Research Unit at the Russell Research Center in

Athens, GA. Cultures were stored at -80°C. Fresh cultures were prepared from frozen cultures and incubated for 24 hrs. at 35-37°C. Several colonies were picked from the slants and inoculated into tubes containing tryptic soy broth (TSB), and incubated overnight at 35-37°C. Serial dilutions were prepared to 10^-5 for each of the five serotypes, then 100 µL of the 10^-5 dilution was pipetted onto brilliant green sulfa (BGS) agar plates for a final plate count dilution of 10^-6. BGS plates were incubated for 24 hrs. at 35-37°C.

Hyperspectral microscope and image collection

The HMI system can be viewed in Figure 1.1. A Nikon Eclipse 80i upright microscope

(Nikon, Lewisville, TX) was equipped with an acousto-optic tunable filter (AOTF) (HIS-400,

Gooch & Housego, Orlando, FL) and a high performance cooled electron multiplying charge coupled device (EMCCD), 16-bit camera (iXon) (Andor Technology, Belfast, Northern Ireland), and a darkfield illuminating metal halide lighting source (Cytoviva 150 unit, 24 W, Auburn, AL) that travels through a fiber optic cable illuminating from underneath the sample stage. The

AOTF used in this experiment is a high-speed, high-throughput, random-access solid-state optical filter with an adjustable optical pass-band. The variable bandwidth of the diffraction-limited image resolution was 2 nm.

Two colonies were picked from the BGS plates and suspended into 10 µL of autoclaved deionized water, followed by a brief vortex. Three µL of the bacterial suspensions were placed on a glass slide and allowed to air dry in a bio-safety cabinet (BSC, Baker, Sanford, ME) for 15 min. Sterile water (0.8 µL) was placed on top of the dried suspension and a glass cover slip was

applied. Pressure was applied to the glass cover slip to immobilize cells and prevent movement

during the multispectral sweep, collecting 89 contiguous images.

Initially, 89 contiguous bands were collected between 450 and 800 nm, every 4 nm for

the first repetition of image collection. Once the informative variables were identified the

second repetition focused on collecting those selected spectral bands only. The first repetition

had an acquisition time of 1.98 s between each acquired image within the hypercube.

Hypercube and pixel extraction

The hypercube represents the overlapping of multiple images collected through the

hyperspectral imaging process at various spectral bands; an experimental flow chart

can be seen in Figure 4.1. These images are stacked upon each other creating the hypercube by

combining both spatial and spectral data. The x and y planes are consistent as they represent the

spatial coordinates of each pixel in the image. Lawrence et al. (2003) stated that this information can be used to describe physical or geometric observations within an image such as shape, color, and size. Figure 4.2 represents a mosaic of the same microscope image captured at


each of the five selected variable sets. The middle row shows a typical Salmonella cell and how

intensity values from the 590 nm band are used to recreate a single band of the hypercube, in

which physical features of a cell are defined.

Collected images were analyzed through the Environment for Visualizing Images (ENVI) software program (Exelis, McLean, VA). The region of interest (ROI) is a selection of pixels representing an object in the image. Pixels representing only the cells were extracted from the

HSI images in order to remove the background. This was performed by setting minimum and maximum threshold values for each pixel to extract only cell data. After ROIs were extracted, pixels were randomized to account for optical effects such as limb darkening at the cell boundaries. A thirty percent subset was calculated for each image based on the average number of pixels per cell and per image. Because the scope of this project focuses on an efficient data reduction method, we also looked at extracting a minimal amount of data from each image. A typical rod-shaped bacterial cell may contain approximately 500 pixels; however, not all of these pixels are necessary for classification. The average number of pixels in a cell was calculated per image by dividing the number of pixels extracted from the threshold ROIs by the number of cells per image. Thirty percent of the average number of pixels were then randomized and averaged to reduce the amount of data required for classification. Based on the number of pixels obtainable per image, subsets of n=120 for each image were calculated for the first repetition of serotype images in this experiment. The randomization ensures that subsets are not extracted multiple times from the same cell as a whole, and yields a true representation of the data set.
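The extraction steps above can be sketched as follows. This is an illustration under stated assumptions rather than the ENVI workflow itself: it assumes NumPy, a hypercube stored as an array of shape (rows, columns, bands), and thresholding on a single reference band, which the text does not specify. The function and parameter names are hypothetical.

```python
import numpy as np

def roi_subset_means(cube, ref_band, low, high, n_cells, n_subsets, rng):
    """Average spectra of random pixel subsets drawn from thresholded cell pixels.

    cube: array of shape (rows, cols, bands).
    Assumption: the ROI threshold is applied to one reference band.
    """
    reference = cube[:, :, ref_band]
    mask = (reference >= low) & (reference <= high)
    roi_pixels = cube[mask]                      # shape (n_roi_pixels, bands)
    # Average number of ROI pixels per cell, then a 30% subset size.
    pixels_per_cell = roi_pixels.shape[0] / n_cells
    subset_size = max(1, int(round(0.30 * pixels_per_cell)))
    # Randomize pixel order so a subset is not drawn from one cell as a whole.
    shuffled = roi_pixels[rng.permutation(roi_pixels.shape[0])]
    n_subsets = min(n_subsets, roi_pixels.shape[0] // subset_size)
    used = shuffled[: n_subsets * subset_size]
    # One averaged spectrum per subset.
    return used.reshape(n_subsets, subset_size, -1).mean(axis=1)
```

A call such as `roi_subset_means(cube, 35, low, high, n_cells, 120, np.random.default_rng(0))` would return up to 120 averaged spectra for one image, with fewer returned when the image contains too few ROI pixels.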


Principal component analysis (PCA) and informative band selection

Data was then imported into the Unscrambler software program version 10.3 (Oslo,

Norway) for multivariate data analysis (MVDA). MVDA is typically necessary when analyzing HMI data due to the high correlation of the spectra obtained from samples (Hair et al., 2010). A first derivative Savitzky-Golay preprocessing step (Savitzky et al., 1964) was applied to the 89 contiguous bands (independent variables), with a quadratic polynomial for removing some of the baseline effect and an 11-point convolution interval, an appropriate segment size for spectral noise reduction. A 5-point moving average smoothing treatment was then applied for additional smoothing of the sharp spectral peaks (Soderberg et al., 2000). This helped reduce noise and aided in selecting only independent bands.
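The pretreatment described above can be sketched with NumPy alone. The sketch assumes a 4 nm band spacing and simply drops the window edges where a full 11-point or 5-point window is not available; how the Unscrambler software treats the edges is not stated in the text, and the function names are hypothetical.

```python
import numpy as np

def savgol_first_derivative(spectrum, window=11, polyorder=2, spacing=4.0):
    """First derivative from a local least-squares polynomial fit (Savitzky-Golay)."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are offset**0, offset**1, ..., offset**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 1 of the pseudoinverse gives the weights for the fitted slope
    # at the window center; dividing by the spacing converts it to per-nm units.
    weights = np.linalg.pinv(design)[1] / spacing
    values = np.asarray(spectrum, dtype=float)
    return np.correlate(values, weights, mode="valid")

def moving_average(values, window=5):
    """Unweighted moving average over full windows only."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")
```

With 89 input bands, the derivative step returns 79 points and the subsequent 5-point average returns 75 points under this edge handling.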

PCAs were performed on the first repetition of HMI for each of the variable selection set models A-E. However, because eliminating the uninformative variables reduced spectral noise, data pretreatment was not needed for variable sets B-E. A PCA can be used to simplify an analysis in which multiple variables are present, as with spectroscopy data where each band is treated as a separate variable. This offers a scatter plot for simple visualization of the five serotype clusters. A PCA decomposes the spectral X matrix as:

$X = TP^{T} + E$

where X is the data matrix, T represents the score matrix, P represents the loading matrix, and E represents the residuals or spectral noise (Wold et al., 1987). PCAs in this experiment were calculated for each of the 600 ROI data subsets (n=120 per serotype x 5 serotypes) with full leave-one-out cross-validation, and Martens' uncertainty test was used for creating submodels and applying a cross-validation with a jack-knifing method of resampling for leaving out each observation (band) in the data set and calculating the 89 submodel averages (Martens et al., 1999).
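The decomposition into scores, loadings, and residuals can be sketched as follows, assuming NumPy. This sketch uses a singular value decomposition of the mean-centered data, which is a substitute for whatever algorithm the Unscrambler applies internally; the function name and the default of three components are illustrative assumptions.

```python
import numpy as np

def pca(X, n_components=3):
    """Return scores T, loadings P, residuals E, and explained variance ratios."""
    Xc = X - X.mean(axis=0)                   # mean-center each band (assumed)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # score matrix
    P = Vt[:n_components].T                      # loading matrix (bands x PCs)
    E = Xc - T @ P.T                             # residuals
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return T, P, E, explained
```

For a 600 x 89 data matrix (600 subsets, 89 bands), `T` has shape 600 x 3 and `P` has shape 89 x 3, and the centered data equal `T @ P.T + E` by construction.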

The selection of informative bands for an abbreviated data processing and analysis protocol was defined by treating each of the 89 contiguous bands as independent variables. The

PCA loading vectors represent the principal component (PC) direction relative to the original information and are considered to be a transformation matrix between the original variable space and the PCA matrix (Esbensen, 2010). Each PC within a PCA analysis has a loading vector associated with it. When the loading vectors for essential PCs (explaining a significant amount of variance) are plotted together, spectral bands with peak intensities can be identified.
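As a rough illustration of reading peak bands from the loading vectors, the sketch below lists, for each PC, the wavelengths with the largest absolute loadings. It assumes NumPy; the function name, the `top_k` parameter, and the use of absolute loading magnitude as the peak criterion are assumptions and not the selection rule applied in this study, which was made by visual comparison of the plots.

```python
import numpy as np

def loading_peaks(P, wavelengths, top_k=5):
    """Return, per PC, the wavelengths with the largest absolute loadings."""
    peaks = {}
    for pc in range(P.shape[1]):
        # Indices of the top_k bands ranked by |loading| for this PC
        order = np.argsort(np.abs(P[:, pc]))[::-1][:top_k]
        peaks[pc + 1] = sorted(float(wavelengths[i]) for i in order)
    return peaks
```

Here `P` is a bands x PCs loading matrix and `wavelengths` holds the band centers in nm, for example `450 + 4 * np.arange(89)` under an assumed 4 nm spacing.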

In addition to analyzing the PCA loading vectors, the amount of variance associated with each band was assessed through normalization of the spectra at each band. The 89 spectral bands were normalized to each collected band in the spectrum, and the resulting average serotype variance was plotted in Figure 4.3(a). Spectral bands were selected for the reduced variable sets by comparing the loading vectors (Figure 4.3(b)) to the spectral band variance observed through normalization. Figure 4.3(c) shows the range of variables selected as informative.

Collection of validation hyperspectral microscope images (HMIs)

Informative bands were identified with the first set of images. The purpose of recollecting HMIs in a second repetition was to 1) validate classification results and 2) determine whether a reduction in the hypercube data would in fact allow the five Salmonella serotypes to be classified at a high efficiency. A second set of HMIs was collected for the same five serotypes of Salmonella. The new bacterial cultures were grown following the same protocol as outlined for the first repetition. Glass slides were prepared in the same manner. The collection of HMI differed: the second set of HMIs was collected at only the specified informative spectral bands. Variable set A was collected first, followed by changing the band selection range within the chemometric software to represent 586 to 662 nm (set B) while obtaining the same microscopic image, followed by sets C, D, and E collected at their respective bands. The bandwidth for all repetition 2 images was 2.5 nm with spectral intervals of 4 nm, and a scanning exposure time of 250 ms. Theoretically, classification results should be similar to those of the first HMI set. The purpose was not just to select bands that appear informative, but to carry out the hypercube data collection with the reduced bands and determine whether a practical collection method would yield spectra and classification results similar to those of the first repetition, in which the bands were analyzed from a collection of the full 89 spectral bands.

Support vector machine (SVM) classification

Classification of serotypes for both repetitions and each of the five variable selection sets was performed with SVM classification algorithms. SVM is a form of supervised classification based on object recognition of nonlinear data (Mercier et al., 2003). A kernel-based recognition algorithm creates a hyperplane in a high-dimensional space that is then used to classify the serotypes through the input space with the equation:

$h = w^{T}x + b$

where h is the hyperplane value, x is the HMI input vector from pixel intensity, w is the adaptable weight vector, b represents the bias, and T is the transpose operator (Sun, 2010).

Figure 4.4 shows a representation of how SVM classification uses a kernel-based approach for distinguishing objects in a high-dimensional space. While one class may be visually noticeable in the original variable matrix, a nonlinear classification algorithm is sometimes necessary. SVM creates a high-dimensional linear feature space in which a hyperplane classifies objects based on a linear separation (Sun, 2010). The original variable space is a low-dimensional space to which a kernel is applied to create the hyperplane. A confusion matrix describes the classification results obtained from the SVM algorithms by grouping serotype subsets into one of five classes.
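The hyperplane equation and the confusion matrix can be illustrated with a short sketch, assuming NumPy. It evaluates h = wT x + b for already-trained weights and tabulates the predictions; it does not train an SVM or apply a kernel. The one-hyperplane-per-class arrangement, with the largest h deciding the class, is an assumption for illustration, as the multi-class scheme used by the software is not stated here.

```python
import numpy as np

def decision_values(X, W, b):
    """h = w^T x + b for every sample (rows of X) and every class (rows of W)."""
    return X @ W.T + b

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def percent_correct(cm):
    """Percentage of each true class that was assigned to the correct class."""
    return 100.0 * np.diag(cm) / cm.sum(axis=1)
```

Predicted classes follow from `decision_values(X, W, b).argmax(axis=1)`, and `percent_correct` gives the per-serotype percentages of the kind reported in a confusion matrix table.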

Results

Informative band selection

Informative band selection was determined by comparing the PCA loading vectors with the average variance attributed to normalization at each band. The results can be seen in Figures 4.3(a) and 4.3(b). Figure 4.3(c) shows the original spectra with the area for informative band selection highlighted. All four of the abbreviated variable selection sets fall within the range of 586 to 662 nm. Table 4.1 lists the five variable selection sets, the associated spectral ranges, and the percent reduction from the original 89-band spectra (set A).
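The band counts and reduction percentages for the abbreviated sets can be checked with simple arithmetic. The sketch below assumes the 4 nm spectral interval stated for the repetition 2 images; the names are illustrative.

```python
FULL_BANDS = 89   # bands in the original spectrum (set A)
STEP_NM = 4       # assumed spacing between band centers, in nm

def band_count(start_nm, stop_nm, step_nm=STEP_NM):
    """Number of bands from start to stop inclusive at a fixed spacing."""
    return (stop_nm - start_nm) // step_nm + 1

def percent_reduction(n_bands, full=FULL_BANDS):
    """Percent reduction in bands relative to the full spectrum."""
    return round(100.0 * (full - n_bands) / full, 1)

# Sets B-E as (start nm, stop nm)
reduced_sets = {"B": (586, 662), "C": (586, 630), "D": (590, 614), "E": (590, 598)}
for name, (start, stop) in reduced_sets.items():
    n = band_count(start, stop)
    print(name, n, percent_reduction(n))
```

For example, set E spans 590 to 598 nm, giving (598 - 590) / 4 + 1 = 3 bands and a reduction of (89 - 3) / 89 = 96.6%.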

Principal component analysis (PCA)

The PCA score plots for the first repetition of images collected can be viewed in Appendix A. The score plots give a visual representation of the underlying trends in this factorial analysis. Cluster separation is noticeable; however, variable reduction typically brings less cluster separation. Appendix A depicts models A and D with clear cluster separation, while models B, C, and E display cluster separation although S. Typhimurium shows some overlap at the cluster boundaries.

Support vector machine (SVM) classification

Figure 4.2 represents a mosaic of the HMI collected during repetition 2 of this experiment, where the same image was collected only at the specified spectral bands between 586 and 662 nm for each variable set. In the top row of the image, HMIs collected at 590 nm are visible, which appear very similar to each other. In the middle row, reconstruction of the same S. Enteritidis cell is repeated for each variable selection set. Comparing the middle row to the bottom row, with corresponding spectra of the same pixel, shows that while there is a slight change in spectral intensity, the spectral pattern is consistent across all five variable sets. For all of the pixels selected as a whole, Figure 4.5 describes the relationship of the 590-598 nm spectral range with each variable selection set. Variable set A is similar in pattern to variable set E. The change in spectral intensities may be attributable to several factors that are unknown at this time.

Table 4.2 shows that for the second repetition of the experiment SVM classification results were perfect (120 correctly identified, for n=120 per serotype) for most of the serotypes in variable sets A and B. Variable sets C, D, and E saw reductions in classification ability, dropping as low as 84.2% correctly identified. The confusion matrix also shows 100% classification accuracy for all serotypes across the variable sets of the second repetition.

Discussion

Among the spectral bands within the 450-800 nm range for darkfield images of bacterial cells, those between 586 nm and 662 nm appeared to be the most informative. Bands explaining the most variance in the PCA loading vector also had the lowest average band variance. The mosaic in Figure 4.2 shows that the 590 nm band image, light scattering from a single cell, and pixel spectrum may change slightly in intensity over time; however, the spectral pattern remains similar. Figure 4.5 verifies that the average spectra of all extracted pixels for the 590-598 nm band range are similar, with variable sets A and E being very similar.

The kernel-based classification algorithm succeeded with high classification accuracy because of the robust nature of SVM applied to nonlinear data. SVM results from repetition two confirm the PCA score plot results of the same images. Through both analyses there is clear classification for three serotypes; however, in both analyses S. Enteritidis and S. Typhimurium are not separated as efficiently. Collecting only the 590, 594, and 598 nm spectral bands yielded high classification accuracy amongst all five Salmonella serotypes by reducing noise associated with uninformative variables.

Park et al. (2012) showed that spectral signatures from E. coli and Salmonella cells differed. This experiment suggests that it is possible to classify serotypes of the same bacterial species with as few as three spectral bands (590, 594, and 598 nm) through a kernel-based SVM classification analysis. The Hughes phenomenon is a situation that is common with hyperspectral or multispectral data sets. It states that, with a fixed number of training samples, the accuracy of a classification algorithm will decrease as the number of variables (spectral bands) increases beyond an optimum (Hughes, 1968). Alonso et al. (2011) showed that the SVM classification algorithm, being a non-parametric classifier, was robust enough to negate the Hughes phenomenon. From this, we can assume that the increase in correct SVM classification percentages during the second repetition, where only the selected bands were collected, is not influenced by this statistical phenomenon, but is possibly influenced by the reduction in spectral noise associated with the Salmonella serotypes.

In conclusion, this project shows the sensitive nature of hyperspectral imaging and that it can be applied to differentiate images of bacterial serotypes within the same species based on a microscope image analysis. Selecting as few as three spectral bands (590, 594, and 598 nm) for classification will greatly reduce the amount of data stored and processed, and a kernel-based SVM algorithm is useful for classification. This novel methodology could be implemented as a more cost-efficient alternative to bacterial identification through PCR, while being considerably less time consuming than traditional plating techniques. Further in-depth analysis of the hyperspectral signature of bacteria would be required prior to the creation of a spectral library, followed by repeated sampling trials. Thus, an optical method with HMI is a good candidate to detect and identify bacterial serotypes as well as species rapidly and easily compared to conventional plating and immunoassay methods.

References

1. Alonso, M. C., Malpica, J. A., & de Agirre, A. M. (2011) Consequences of the Hughes

phenomenon on some classification techniques. In ASPRS 2011. Milwaukee, WI. May

pp. 1-5.

2. Cox, N.A., Frye, J.G., McMahon, W., Jackson, C.R., Richardson, J.L., Cosby, D.E.,

Mead, G.C., & Doyle, M.P. (2013) Salmonella. Compendium of methods for the

microbiological examination of foods. American Public Health Association. Washington,

DC.

3. Dudak, F.C., & Boyaci, I. H. (2009) Rapid label free bacteria detection by surface

plasmon resonance (SPR) biosensors. J. Biotechnol. 4, 1003-1011.

4. Esbensen, K. H., Guyot, D., & Westad, F. (2000) Multivariate data analysis: in practice:

an introduction to multivariate data analysis and experimental design 4th Ed. CAMO

ASA. Oslo, Norway.

5. Esquerre, C., Gowen, A.A., & Downey, G. (2012) Wavelength selection for development

of a near infrared imaging system for early detection of bruise damage in mushrooms

(Agaricus bisporus). J. Near Infrar. Spectrosc. 20, 537-546.

6. Gowen, A. A., et al. (2008) Hyperspectral imaging for the investigation of quality

deterioration in sliced mushrooms (Agaricus bisporus) during storage. Sens. Inst. Food

Qual. Safety, 2, 133-143.

7. Gowen, A. A., Taghizadeh, M., & O’Donnell, C. P. (2009) Identification of mushrooms

subjected to freeze damage using hyperspectral imaging. J. Food Eng. 93, 7-12.

8. Hair, J.F., Black, W.C., Barry, J.B., & Anderson, R.E. (2010) Multivariate data analysis.

Prentice Hall, Upper Saddle River, NJ.

9. Hughes, G. (1968) On the mean accuracy of statistical pattern recognizers. Info. Theory,

IEEE Trans. 14, 55-63.

10. Kellicut, D., et al. (2004) Emerging technology: hyperspectral imaging. Perspect. Vasc.

Surg. Endovasc. Ther. 16, 53-57.

11. Lawrence, K. C., Windham, W. R., Park, B., & Buhr, R. J. (2003) A hyperspectral

imaging system for identification of faecal and ingesta contamination on poultry

carcasses. J. Near Infrar. Spectrosc. 11, 269-281.

12. Lawrence, K. C., Park, B., Windham, W. R., & Mao, C. (2003) Calibration of a

pushbroom hyperspectral imaging system for agricultural inspection. Trans. ASAE, 46,

513-521.

13. Liu, Y., Chen, Y. R., Kim, M. S., Chan, D. E., & Lefcourt, A. M. (2007) Development of

simple algorithms for the detection of fecal contaminants on apples from visible/near

infrared hyperspectral reflectance imaging. J. Food Eng. 81, 412-418.

14. Majowicz S.E. et al. (2010) International Collaboration on Enteric Disease Burden of

Illness Studies. The global burden of nontyphoidal Salmonella gastroenteritis. Clin.

Infect. Dis. 50, 882-889.

15. Martens, H., & Martens, M. (1999) Modified jack-knife estimation of parameter

uncertainty in bilinear modeling by partial least squares regression (PLSR). Food Qual.

Prefer. 11, 5-16.

16. Mercier, G., & Lennon, M. (2003) Support vector machines for hyperspectral image

classification with spectral-based kernels. Geoscience and Remote Sensing Symposium,

IGARSS'03. Proc. IEEE 1, pp. 288-290.

17. Narsaiah, K., Jha, S.N., Rishi, B., Rajiv, & S., Ramesh, K. (2012) Optical biosensors for

food quality and safety assurance – a review. J. Food Sci. Technol. 4, 383-406.

18. Park, B., Yoon, S.C., Lee, S., Sundaram, J., Windham, W.R., Hinton Jr., A., & Lawrence,

K.C. (2012) Acousto-optical tunable filter hyperspectral microscope imaging method for

characterizing spectra from foodborne pathogens. Trans. ASABE. 55, 1997-2006.

19. Rehse, S.J., Mohaidat, Q., & Palchaudhuri, S. (2010) Towards the clinical application of

laser-induced breakdown spectroscopy for rapid pathogen diagnosis: the effect of mixed

cultures and sample dilution on bacterial identification. Appl. Optics. 49, C27-C35.

20. Roggo, Y., Edmond, A., Chalus, P., & Ulmschneider, M. (2005) Infrared hyperspectral

imaging for qualitative analysis of pharmaceutical solid forms. Anal. Chim. Acta. 535, 79-

87.

21. Savitzky, A., & Golay, M.J. (1964) Smoothing and differentiation of data by simplified

least squares procedures. Anal. Chem. 8, 1627-1639.

22. Soderberg, G.L., & Knutson, L.M. (2000) A guide for use and interpretation of

kinesiologic electromyographic data. Phys. Therapy. 5, 485-498.

23. Sundaram, J., Park, B., Hinton Jr, A., Lawrence, K. C., & Kwon, Y. (2013) Detection and

differentiation of Salmonella serotypes using surface enhanced Raman scattering (SERS)

technique. J. Food Meas. Char. 7, 1-12.

24. Sun, D.W. (2010) Hyperspectral imaging for food quality analysis and control. Elsevier,

San Diego, CA.

25. Udelhoven, T., Naumann, D., & Schmitt, J. (2000) Development of a hierarchical

classification system with artificial neural networks and FT-IR spectra for the

identification of bacteria. Appl. Spectrosc. 54, 1471-1479.

26. Windham, W. R., et al. (2013) Detection by hyperspectral imaging of shiga toxin–

producing Escherichia coli serogroups O26, O45, O103, O111, O121, and O145 on

Rainbow Agar. J. Food Prot. 76, 1129-1136.

27. Wold, S., Ruhe, A., Wold, H., & Dunn III, W.J. (1984) The collinearity problem in linear

regression. The partial least squares (PLS) approach to generalized inverses. SIAM J. Sci.

Stat. Comp. 5, 735-743.

28. Wold, S., Esbensen, K., & Geladi, P. (1987) Principal component analysis. Chemom.

Intell. Lab. Syst. 2, 37-52.

29. Wold, S., Sjӧstrӧm, M., & Eriksson, L. (2001) PLS-regression: a basic tool for

chemometrics. Chemom. Intell. Lab. Syst. 58, 109-130.

30. Wood, K.S., Guilian, A.M., Fritz, G.G., & VanVechten, D. (2002) A QVR detector for

focal plane hyperspectral imaging in astronomy. Bull. Am. Astron. Soc. 3, 1241.

31. Wu, D., & Sun, D.W. (2013) Application of visible and near infrared hyperspectral

imaging for non-invasively measuring distribution of water-holding capacity in salmon

flesh. Talanta 116, 266-276.

32. Ye, S., Wang, D., & Min, S. (2008) Successive projections algorithm combined with

uninformative variable elimination for spectral variable selection. Chemom. Intell. Lab.

Syst. 91, 194-199.

[Flow chart boxes: Salmonella Serotype Preparation on BGS Agar; Glass Slides Prepared for each Serotype; HMI Collected for 89 Bands (450-800 nm); PCA; PLSR; Informative Bands Selected; New Salmonella Serotypes and Glass Slides Prepared; HMI Collected for Selected Bands Only (89, 20, 12, 7, and 3 bands); SVM Classification]

Figure 4.1: Sample collection and analysis flow chart

[Figure 4.2, bottom row, panels a-e: Intensity versus Wavelength (nm) spectra for variable sets A-E]

Figure 4.2: A mosaic of the 590 nm band with each variable selection set (A-E) displaying the obtained HMI with a highlighted cell (top), with a respective 3D surface plot reproducing the highlighted pixels (middle), and accompanying spectra of the collected bands for the highlighted cell (bottom).

[Figure 4.3 panels: (a) Band Variance versus Wavelength (nm) for S. Enteritidis, S. Heidelberg, S. Infantis, S. Kentucky, and S. Typhimurium; (b) Loading Weight versus Wavelength (nm) for PC 1 (70%), PC 2 (24%), and PC 3 (3%); (c) Intensity (a.u.) versus Wavelength (nm) for the five serotypes]

Figure 4.3: (a) Spectral band variance, normalized to each band; (b) loading vectors for PCA score plots; (c) average spectra for 5 Salmonella serotypes. Vertical lines represent the range of bands selected for variable sets B-E (586-662 nm).

[Diagram labels: Original Variable Matrix; Optimal Hyperplane; Hyperplane Margins; Marginal Object]

Figure 4.4: Example of how support vector machine (SVM) classification creates a hyperplane, for classification of nonlinear data.

[Plot: Intensity versus Wavelength (nm), 588-600 nm; series: Variable Set A, Variable Set B, Variable Set C, Variable Set D, Variable Set E]

Figure 4.5: Comparison of the 590-598 nm spectral range collected at each of the five variable selection sets, for S. Enteritidis.

Table 4.1: Description of variable sets selected for analyses.

Variable Set   Spectral band range (nm)   Number of Bands   Band Reduction Percentage
A              450-800                    89                0
B              586-662                    20                77.5
C              586-630                    12                86.5
D              590-614                    7                 92.1
E              590-598                    3                 96.6

Table 4.2: Confusion matrix results obtained from support vector machine classification (SVMC).

                  Reduced variable sets
Serotype          A             B             C             D             E
S. Enteritidis    100 (100)     95.0 (100)    93.3 (100)    85.8 (100)    85.8 (100)
S. Heidelberg     100 (100)     100 (100)     100 (100)     100 (100)     100 (100)
S. Infantis       100 (100)     100 (100)     100 (100)     98.3 (100)    98.3 (100)
S. Kentucky       100 (100)     100 (100)     99.2 (100)    89.2 (100)    89.2 (100)
S. Typhimurium    97.5 (100)    87.5 (100)    84.4 (100)    84.2 (100)    84.2 (100)

Values represent percentage of correctly classified samples.

( ) represents confusion matrix results for correctly classifying percentage of Salmonella serotype subsets, results of second repetition with variable set E (n=60)

CHAPTER 5

CONCLUSIONS

The first objective of this study was to determine whether five serotypes of the same Salmonella enterica subspecies could be classified through hyperspectral microscope imaging (HMI). The second objective was to analyze spectra obtained from early incubation times (6, 8, 10, and 12 hrs.) and to compare their spectra and classification results with those obtained from 24 hrs. of incubation at 35-37°C. The third objective was to determine whether the large data processing and analysis requirements associated with hyperspectral imaging can be reduced to only informative spectral bands while still maintaining classification efficiencies similar to those of the full 89 spectral bands originally obtained from 450-800 nm.

This study shows that early classification of bacterial cells is possible through HMI. A comparison between the spectra of each serotype from incubation times of 6, 8, 10, and 12 hrs. and those from 24 hrs. yielded Pearson correlation values > 0.980, indicating strong similarities in samples grown at early incubation times. Principal component analysis (PCA), used as a factorial multivariate data analysis (MVDA) tool, showed visual cluster separation through a PCA scores plot, with inter-cluster distances represented by Mahalanobis distance values to quantify separation between serotypes. Partial least squares regression (PLSR) showed that repetition of the experiment yielded R² calibration and validation values > 0.9821, describing spectral data that are consistent through repetition. Soft independent modeling of class analogy (SIMCA) was used in this experiment to classify the five serotypes into clusters, based on the results of the PCAs obtained through the first repetition of sample culturing and image collection. This method verified that serotype classification accuracy for both 8 and 24 hrs. is greater than 98%.
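For reference, the Pearson correlation between two spectra sampled at the same bands can be computed as in the minimal sketch below. The function name is illustrative, and the sketch is not the software used to obtain the values reported here.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

A value near 1 indicates that the two spectra share the same pattern even if their absolute intensities differ by a scale factor or offset.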

Due to the large data processing and storage requirements of hyperspectral imaging (HSI), it is essential that analyses be completed with only the informative spectral bands. The support vector machine (SVM) applied a kernel-based classification technique to serotype cluster separation by introducing a hyperplane for classification purposes. SVM classification of the full spectra provided correct classification > 84.2%; however, collecting data at only three spectral bands (590, 594, and 598 nm) yielded SVM classification results of 100% for all five serotypes. The range between 590 and 598 nm appears to represent key intrinsic differences in cell physiology between the five Salmonella serotypes.

The end goal for this area of study is to develop an early and rapid analytical system that counters drawbacks of traditional microbial testing procedures, such as the high recurring cost and training requirements of PCR or the extended time requirements of nutrient-enriched growth media. While this study shows that detection and classification of pure culture Salmonella isolates is plausible at reduced incubation times, further experimentation on conditions that may affect cellular spectra is needed. It is unknown whether environmental factors such as temperature or ionic variances, as well as morphological factors such as cellular evolutions or mutations, influence spectral signatures. Correlating spectral peaks to physical characteristics of the cell is also needed, as is further understanding of cellular geometric orientation within an image and the effects of limb darkening on the cell's overall spectral profile. This project was able to show that five Salmonella serotypes can be classified with as little as 8 hrs. of incubation time using a method requiring approximately 15 minutes to complete. It is also possible to carry out the analysis with as few as three spectral bands ranging between 590 and 598 nm. In conclusion, HMI for early and rapid bacterial identification is a promising novel microbial identification methodology. Future work is needed before practical application, such as exploring the effects of various food matrix compositions and environmental factors on the bacteria's spectra.

APPENDIX

Appendix A: Principal component analyses score plots from reduced variable selection sets.

[3D scores plot: PC1 (73%), PC2 (17%), PC3 (5%); clusters: S. Enteritidis, S. Heidelberg, S. Infantis, S. Kentucky, S. Typhimurium]

(A.a): PCA scores plot for model A (89 bands).

[3D scores plot: PC1 (79%), PC2 (12%), PC3 (5%); clusters: S. Enteritidis, S. Heidelberg, S. Infantis, S. Kentucky, S. Typhimurium]

(A.b): PCA scores plot for model B (20 bands).

[3D scores plot: PC1 (77%), PC2 (17%), PC3 (3%); clusters: S. Enteritidis, S. Heidelberg, S. Infantis, S. Kentucky, S. Typhimurium]

(A.c): PCA scores plot for model C (12 bands).

[3D scores plot: PC1 (75%), PC2 (22%), PC3 (1%); clusters: S. Enteritidis, S. Heidelberg, S. Infantis, S. Kentucky, S. Typhimurium]

(A.d): PCA scores plot for model D (7 bands).

[3D scores plot: PC1 (74%), PC2 (25%), PC3 (1%); clusters: S. Enteritidis, S. Heidelberg, S. Infantis, S. Kentucky, S. Typhimurium]

(A.e): PCA scores plot for model E (3 bands).