Food Fraud Guide

Methodologies for Food Fraud

Tips for robust experimental results

Executive summary

Because food fraud scandals often drive public awareness and regulatory changes, the goal of this paper is to present analytical techniques and experimental methodologies, and to introduce multivariate statistics and sample class prediction as they relate to food adulteration. Some approaches, such as molecular spectroscopy, tend to be less expensive, and a few of these instruments have been miniaturized to the point where they can be field-deployed. Spectroscopic instruments are useful in fingerprinting food because small changes in a sample's spectral profile can be detected with the latest technology, assuming appropriate data normalization techniques are applied. Similarly, both unit- and high-resolution mass spectrometry (MS) can be important in food fraud testing because they can fingerprint food based on the pattern of discrete compounds they detect. Other techniques, such as inductively coupled plasma mass spectrometry (ICP-MS) and inductively coupled plasma optical emission spectrometry (ICP-OES), have proven adept at identifying geographic origin based on trace element analysis. Genomic testing can accurately identify fish DNA, even from processed samples. From a methodological perspective, nontargeted approaches have proven effective in fingerprinting samples. The advent of inexpensive computer workstations and statistical software has made it possible to link nontargeted workflows with multivariate statistical analysis to extract useful information from analytical data. Until recently, these approaches were too expensive or complex for researchers to perform by themselves; instead, the data were handed over to dedicated statisticians or never fully investigated. Now there are moderately priced tools that allow researchers to more easily use statistical approaches for determining attributes such as sample quality and for building sample class prediction models.
Introduction

As background to why adulteration is an important issue, you may recall the scandal that broke out in 2007 when dogs and cats were poisoned by their food.¹ Investigations discovered that the dog and cat food was tainted with a mixture of melamine and its triazine analogs. These inexpensive but highly nitrogenous industrial chemicals were being used to boost the nitrogen content in food to give the perception that the foods were rich in protein.²

As awareness grew, the adulteration of food with melamine quickly became an international problem. It was later detected in baby formula produced in the US, cookies distributed in Canada, chocolate sold in Asia and Australia, condensed milk in Thailand, and eggs in Hong Kong. During the scandal, international agencies such as the World Health Organization, the Food and Agriculture Organization, the European Food Safety Authority, and the International Food Safety Authorities Network worked together to characterize and control the crisis. While 68 countries banned or recalled foods suspected of containing melamine,³ many countries established allowable limits for melamine, with the FDA setting the maximum residue limit (MRL) at 1 part per million (ppm) for infant formula and 2.5 ppm for other products.⁴ The FDA also set up an economically motivated adulteration (EMA) working group. In 2009, it defined EMA as the fraudulent, intentional substitution or addition of a substance in a product for the purpose of increasing the apparent value of that product or reducing the cost of its production (that is, for economic gain).⁵

Beyond fundamental safety concerns, food adulteration cheats consumers, as has been demonstrated with extra virgin olive oil (EVOO). A study from 2010 found that imported olive oil, which at that time accounted for 99% of the EVOO on the US market, often failed the sensory test for EVOO classification.⁶ Experts now claim that up to 80% of EVOO is fraudulently labeled.⁷ It is relatively straightforward to perform established tests for distinguishing EVOO from other grades of olive oil. Two popular examples are the HPLC measurement of diacylglycerol or pyropheophytin (a breakdown product of chlorophyll).⁸ However, these tests can generate false positive and false negative results, and fraudsters can use an alternative adulterating oil to pass the tests. Furthermore, there are flavor defects that can lead to a true EVOO not passing sensory testing. Mustiness, rancidity, and acidity can all be issues, especially with badly stored EVOO. Chemical analysis offers potential solutions to screen for these issues based on the concentration of specific markers such as acids, esters, and aldehydes found at high ppb to low ppm concentrations.⁹

As chemists search for more reliable tests for EMA, new analytical instruments and methodologies are being explored. The key is finding the right technique for your application and budget.

MS-based food testing

Full spectrum (time of flight), scanning (quadrupole), and image building (Fourier transform) MS techniques are good tools for sample classification. They vary in price from inexpensive gas chromatography/mass spectrometry (GC/MS) instruments, through more expensive Fourier transform mass spectrometers (FTMS), to high-price, high-value liquid phase ion mobility quadrupole time of flight instruments (IM/Q-TOF).

Figure. The Agilent 8890 GC system and Agilent 5977B GC/MSD.

Many volatile and semivolatile compounds are routinely analyzed by electron ionization (EI)-based GC/MS. This technique has the advantage of EI spectra that can easily be library searched against the large NIST 17/Wiley Registry 11 library that contains 597K compounds and over one million spectra. This gives an unprecedented ability to tentatively identify many volatile and semivolatile compounds. There are some limited structure-elucidation tools, such as the MS Interpreter and the substructure search feature provided by NIST, which are useful with unit mass data. Adding a full-spectrum accurate-mass instrument such as a GC/TOF or GC/Q-TOF provides structural information and allows identification of unknown compounds through the generation of fragmentation trees and thermodynamic-based structure elucidation tools such as Agilent Molecular Structure Correlator. The FT-traps have excellent mass resolution, but image building is not yet fully developed for EI analysis, and the match factors are lowered due to missing peaks. One solution to this problem is to develop a proprietary identification algorithm that places greater emphasis on mass accuracy, and less weight on missing ions.
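To illustrate why missing peaks lower a library match factor, the following is a minimal sketch that scores two spectra with a plain cosine (dot-product) similarity on a 0 to 1000 scale. It is an illustrative simplification, not the weighted algorithm used by NIST or any vendor software; the function name `match_factor` and the peak lists are hypothetical.

```python
import math

def match_factor(query, reference):
    """Cosine similarity between two centroided spectra, scaled to 0-1000.

    Each spectrum is a dict mapping nominal m/z to intensity.
    Simplified: real library searches also weight ions by m/z.
    """
    mz_values = set(query) | set(reference)
    dot = sum(query.get(mz, 0.0) * reference.get(mz, 0.0) for mz in mz_values)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return 1000.0 * dot / (norm_q * norm_r)

# Hypothetical library spectrum and two acquired spectra.
reference = {43: 100.0, 58: 45.0, 71: 30.0, 86: 12.0}
complete = dict(reference)                 # every ion detected
missing_peaks = {43: 100.0, 58: 45.0}      # two low-abundance ions not detected

full_score = match_factor(complete, reference)
partial_score = match_factor(missing_peaks, reference)
print(round(full_score), round(partial_score))
```

The spectrum with two undetected ions scores lower even though every ion it does contain agrees with the library entry, which is the effect an identification algorithm that down-weights missing ions is meant to soften.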

Feature-finding analysis

There is also a mixture of commercial and academic feature-finding tools going back to AMDIS, which was made public by NIST in the early 1990s.¹⁰ A good academic solution for EI peak alignment and data extraction is the AMDIS/manual curation/SpectConnect workflow. Unlike XCMS and MZmine, SpectConnect is specifically designed for EI data.

GC also lends itself to multidimensional chromatography with various commercially available solutions from companies such as Agilent, LECO, and ZOEX. An advantage of these techniques is that they typically have a higher signal-to-noise (S/N) than their unmodulated counterparts. Their disadvantage is that they require specialized hardware and peak deconvolution tools from companies such as GC Image. This software tool works well for analyzing samples sequentially, but has not been optimized for batch analysis, a cornerstone of nontargeted analysis.¹¹

Nontargeted workflows depend on reproducible feature-finding software, ideally based on recursion. Recursive feature finding aims to minimize both false positives and false negatives. It does this by eliminating ions considered noise, aligning retention times, aligning masses, and ion binning to build a consensus library (consisting of a composite feature list and corresponding composite feature spectra). The consensus library is used by the algorithm to re-assign individual ions. This reduces the number of false positives. The remaining unassigned features are then used to perform a targeted search. This process reduces the number of false negatives due to missing values by re-examining the target list with less-stringent criteria. Recursive feature extraction is designed to increase the quality of the overall results. Since no algorithm is perfect, batch-wise manual editing of the results should be permitted following batch recursive feature extraction. XCMS is an online public resource for the nontargeted analysis of small molecules, developed by The Scripps Research Institute, that follows a similar workflow to fill in missing peak data.¹²

Once the feature-finding data have been reviewed and any integration errors addressed, the data can be exported as text files. If a peak is not found, it is considered missing. There may be an option in the statistical software to treat missing values either as zero abundance or as peaks that were not found. Selecting zero abundance strongly impacts the statistics. Choose this option only if the data have been carefully reviewed, and it has been confirmed that the peak is missing. Select missing values if the data are not carefully reviewed and the missing component might be a false negative.

Targeted and nontargeted methods

The combination of targeted and nontargeted chemometric approaches provides a good solution when evaluating samples, because it allows accurate identification of expected compounds while also allowing for the presence of unexpected components. A good example of this approach is the work done by Hjelmeland and colleagues evaluating the chemical and sensory profiles of Cabernet Sauvignon wines by HS-SPME. That study shows a targeted analysis using a nontargeted workflow with synchronous SIM/Scan.¹³ Including a nontargeted component allows for retrospective analysis later.

Fragrant rice such as Basmati and Jasmine has higher concentrations of 2-acetyl-1-pyrroline due to a loss-of-function mutation in the fgr gene product.¹⁴ DNA testing could look for the mutated gene, but there are other key volatiles found in rice such as hexanal, which is associated with lipid oxidation. Volatile components such as these are relatively easy to separate chromatographically, and can be used to evaluate sensory attributes as well as characteristics associated with rice adulteration.¹⁵ Subsequently, integrating headspace solid-phase microextraction into the methodology eliminates much of the sample preparation.¹⁶

Figure. The Agilent 1290 Infinity II LC system with an Agilent 6546 LC/Q-TOF.

To build reproducible prediction models, MS-based EMA methods should focus on finding robust identifiers instead of all identifiers. An example of this approach is a study in which an LC/Q-TOF was used in full-spectrum mode to show the applicability of this instrument for differentiating mango variety. The approach relied on newer software tools to identify robust classifiers and to construct class prediction models. Even with the latest software tools, skill and expertise are required to develop a robust prediction method. However, the subsequent step of running the class prediction model should not require as much technical skill. Therefore, a software automation tool was created that contains the feature-finding method as well as the sample class prediction models. This allows production data files to be run in a streamlined manner against a previously developed MS-based model.¹⁷
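Returning to the choice between zero abundance and missing values in the feature-finding discussion, the following is a minimal sketch of how strongly that choice can move simple summary statistics. The peak areas, the `summarize` helper, and the use of `None` as the missing marker are illustrative assumptions, not the behavior of any particular statistical package.

```python
import statistics

# Hypothetical peak areas for one feature across five replicate samples.
# None marks a sample where the feature finder did not report the peak.
areas = [1520.0, 1480.0, None, 1610.0, 1555.0]

def summarize(values, missing_as_zero):
    """Return (mean, sample standard deviation, n) under the chosen policy."""
    if missing_as_zero:
        # Policy 1: an unreported peak is treated as truly absent (area 0).
        used = [0.0 if v is None else v for v in values]
    else:
        # Policy 2: an unreported peak is excluded as "not found".
        used = [v for v in values if v is not None]
    return statistics.mean(used), statistics.stdev(used), len(used)

zero_mean, zero_sd, zero_n = summarize(areas, missing_as_zero=True)
miss_mean, miss_sd, miss_n = summarize(areas, missing_as_zero=False)

print(f"zero abundance: mean={zero_mean:.1f} sd={zero_sd:.1f} n={zero_n}")
print(f"missing value:  mean={miss_mean:.1f} sd={miss_sd:.1f} n={miss_n}")
```

With a single unreported peak, the zero-abundance policy lowers the group mean and inflates the standard deviation by more than an order of magnitude in this toy data set, which is why that option suits only data that have been reviewed and confirmed.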

Elemental fingerprinting techniques for geographic indication

Elemental fingerprinting can be used for many applications such as process quality control, identification of elemental contaminants, and speciating samples based on their trace elemental profiles. High-value foods and beverages that are characterized by origin and are produced in limited quantities are targets for EMA. To clarify the definitions of geographic indication (GI), the World Trade Organization Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS) was agreed upon by all 164 member countries. The obligations are related to two articles. Under Article 22, each country has laws to prevent the use of marks that mislead the public as to the geographical origin of the goods. Article 23 says that governments may refuse to register or may invalidate a trademark that conflicts with a wine or spirits GI whether the trademark misleads or not.¹⁸

Food and beverages can be localized to country of origin by elemental content. The trace and rare earth elements can originate from several sources such as the raw ingredients or the local soil, environment, fertilizers, and agrochemicals. Processing steps can also contribute, such as CuSO₄ added to wine to remove thiols, Zn and Fe leaching from steel fermentation equipment, or copper leaching from the distillation equipment. Finally, bottling and storage can cause lead to leach from the metal capsule seal of a wine bottle.

Common analytical methods for the determination of trace metals in foods can be as simple as ion chromatography, electrochemical analysis, or atomic absorption spectrometry. The electrochemical techniques are simple and inexpensive but are only useful for looking at a handful of specific metals. More expensive instruments such as atomic emission spectrometry (AES), ICP-OES, and ICP-MS have the ability to simultaneously detect trace and rare earth elements at low detection limits.¹⁹

Isotope-ratio mass spectrometry (IR-MS) has been shown to be useful in determining geographic indications using stable-isotope ratios. The samples are first converted into simple gases such as hydrogen, carbon monoxide, nitrogen, oxygen, or sulfur dioxide using an elemental analyzer connected to a stable-isotope IR-MS. The isotope-pair ratios, such as ¹³C/¹²C and ¹⁵N/¹⁴N, can be plotted together to help distinguish between geographic locations, where the ¹³C/¹²C ratio depends mainly on the biochemical type of plant, while the nitrogen ratio reflects the local soil conditions. The nitrogen ratio can be greatly influenced by the type of fertilizer used, with mineral fertilizer fixed from atmospheric nitrogen having a ratio close to zero while organic fertilizer has a higher ¹⁵N/¹⁴N ratio.²⁰

Figure. The Agilent 7900 ICP-MS system.

ICP-MS is a powerful tool for elemental analysis, providing part per billion (ppb) or trillion (ppt) limits of analytical detection for more than 70 elements. Trace-element availability depends on various geographic factors such as soil pH, humidity, clay, and humic acid complexes.²¹ An example is authenticity testing of Pu-erh tea from the Yunnan province in China. An ICP-MS was used to test 30 tea samples in triplicate for 29 elements. The results were analyzed by multivariate statistics using supervised learning. Canonical variate analysis showed two vectors. The first was based on macro elements such as Na, Mg, and Ca and micro elements such as Sr, Zr, Mo, and trace elements. This primary vector differed between the tea regions. The second vector separated the green teas from the black and fermented teas. This vector depended on the levels of Rb, Mn, W, Re, and Tl, all of which were higher in the black and fermented teas.²²

Most ICP-MS interferences are polyatomic, and possess a greater ionic cross-section relative to mono-atomic ions. This characteristic can be exploited to remove background interferents through collisions with an inert gas. This process, called kinetic energy discrimination, causes all ions to lose energy in proportion to their collisional cross-section. The process reaches a point at which the polyatomic ions have lost enough energy that they can be removed from the mass spectrum. With an inert collision gas such as helium, no side reactions or new product interferences are formed.²³ A triple quadrupole ICP-MS also benefits from kinetic energy discrimination. However, it also can eliminate mono-atomic interferences using reactive gases in the collision cell.²² In both cases, modern ICP-MS instruments are significantly more selective and sensitive than their predecessors.

Spectroscopic techniques in authenticity testing

There are several types of EMA seen with wine and spirits. It can be as simple as sourcing an authentic bottle and diluting the product with a cheaper alternative, or counterfeiting a quality product with inexpensive alcohol and additives to simulate the original product, or it can be as bad as unscrupulously using diluted tax-free denatured alcohol to make a dangerous fake.²⁴

Fourier transform infrared (FTIR), near infrared (NIR), and Raman spectroscopy

Milk is one of the seven most adulterated foods.²⁵ Water, whey, melamine, and hydrogen peroxide, among others, have been used to adulterate milk by increasing volume, obtaining higher protein content values, or sanitizing the product. Although direct measurement of milk samples is ideal, water and fat mask some of the spectral signals. A chloroform extraction was used to remove the fat, and an aliquot of the water fraction was dried on the attenuated total reflectance (ATR) diamond crystal. Regardless of which technique was used, there were prominent absorption bands. Mid-infrared (MIR) showed major spectral differences between the control samples and the adulterated milk from 1,600 to 1,200 cm⁻¹. Spectra of milk adulterated with whey had prominent amide I stretching (C=O) at 1,635 cm⁻¹ and amide II (N-H bending/C-N stretching) at 1,530 cm⁻¹. Samples adulterated with urea, synthetic milk, and urine showed a strong urea stretching (C=O) at 1,615 cm⁻¹, and NH₄⁺ deformation at 1,454 cm⁻¹. There were also strong bands observed in the O-H stretching region from 3,700 to 3,200 cm⁻¹, and additional absorption bands in the complex fingerprint region from 1,200 to 800 cm⁻¹.²⁶

The NIR region extends from 13,300 to 4,000 cm⁻¹. This energy range is not high enough for electron excitation but higher than necessary to promote molecules to their lowest vibrational states. NIR spectra are based on molecular overtones and combination bands, weak signals that are formally forbidden by the harmonic-oscillator selection rules. One advantage of NIR's lower absorptivity is that it can penetrate further into a sample than MIR.²⁷ Specifically, the NIR milk adulteration samples showed two prominent absorption bands at 7,700 and 5,000 cm⁻¹.

While both MIR and NIR instruments are capable of adulteration testing, MIR tended to provide better classification models and quantification after principal component analysis (PCA)-based chemometrics.²⁶ Both UV-Vis and IR techniques are based on absorbance, and the amount of light absorbed is an absolute measurement. That is not to say that they are independent of physical parameters. For example, the granularity of a sample is very important, as shown in the following rice authenticity discussion, where multiplicative scatter correction was required to correct for the granular nature of rice.

Rice is the staple food of more than half of the world's population, with approximately 480 million metric tons of milled rice produced annually, 85% of it for human consumption.²⁸ Unfortunately, EMA is a concern, as it is easy to dilute quality rice with substandard rice or add adulterants.²⁹ Field-based NIR techniques have been developed to identify rice by classifications such as geographical indication³⁰ or a measure of sample quality. In these NIR studies, normalization is an important component of the predictive quality of the NIR method. In the sample quality study, multiplicative scatter correction was able to distinguish sample classes of rice, while detrend and mean centering were not enough.³¹ The use of multiplicative scatter correction is a common thread in NIR rice authenticity studies.³²

Spatially offset Raman spectroscopy (SORS) is a technique used for security screening in European airports.³³ As a technology, it shows promise for EMA, as limitations such as fluorescence are being addressed through scientific advancement. A traditional Raman system uses a laser of approximately 785 nm, which is good for sensitivity but bad for interference from fluorescence. Newer Raman systems use longer wavelengths. The Agilent Resolve handheld SORS instrument has addressed fluorescence as a concern by increasing the wavelength to 830 nm, which generates a smaller signal response but is much less susceptible to fluorescence. The SORS method also reduces fluorescence from the packaging materials, such as colored plastics or glass. The operation of SORS includes a spectral normalization step. The instrument first collects zero mode spectra, which are like a traditional backscatter mode where the results are biased by the surface. The laser automatically shifts position before taking an offset measurement to correct for the container contribution to the spectrum.

EMA samples for SORS evaluation were provided by the Scotch Whisky Research Institute at real-world levels of denaturants and adulterants. Six denaturants commonly used are methyl ethyl ketone, isopropyl alcohol, methyl isopropyl ketone, ethyl sec-amyl ketone, methanol, and denatonium benzoate. These have been detected using a handheld SORS instrument down to ppm to sub-ppm levels. Typical flavorings used to adulterate whiskey are vanillin, sucrose, limonene, and trans-anethole. These have been detected in the low ppm range. All of

these were tested in closed glass threaded vials.³⁴ To extend the usefulness of this technique to real-world conditions, additional samples were bought in stores to test the usefulness of SORS with the types of containers typically used for wines and spirits: clear flint, green, and brown glass. The dark glass reduced light transmission so that more measurements were required to achieve the highest S/N possible. With these stipulations, methanol was detected down to 250 ppm (well below the tolerance level of 20,000 ppm, or 2%), and all the other adulterants were detected down in the ppm range. The Resolve is sold as a field-based detection system for narcotics, explosives, and hazardous materials. The limits of detection (LODs) achieved in the above study used offline processing methods that are not currently available on the Agilent Resolve handheld SORS instrument.

Genetic profiling for authenticity testing

Figure. The Agilent 2100 Bioanalyzer Instrument.

Traditional authenticity testing of fish has been based on visual inspection. Unfortunately, many fish are difficult to visually distinguish while alive, and the removal of fins and scales during processing makes visual identification impossible. An antibody-based solution could be used for field-based testing. However, this approach does not work after the meat has been cooked because proteins often denature through heating. Another approach, DNA testing, is relatively stable to food processing. It can also be used to identify simple admixtures of meats and proteins until overlapping profiles complicate interpretation.³⁵

Despite the promise of DNA testing, several considerations need to be weighed. One drawback is that the nuclear genome is large and largely conserved. However, mitochondrial DNA (mtDNA) has a relatively fast mutation rate, comes from a single parent, and is 100,000 times smaller.³⁶ Initially, mtDNA approaches were developed on both mitochondrial cytochrome b and c regions. Eventually, the Barcode of Life Data System was adopted. The standard for animal identification became the 648 base pair region of the cytochrome c oxidase I (COI) gene.³⁷ The fish identification protocol was adapted to this standard.³⁸

Other advancements in fish mtDNA analysis came from hardware improvements. The initial experiments were done using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) with capillary gel electrophoresis and staining for endpoint detection.³⁹ This approach was improved by replacing the gel electrophoresis step with a lab-on-a-chip capillary electrophoresis system.⁴⁰

However, the Barcode of Life approach took longer to develop for land plants because of the high genetic diversity found in plant species. It was decided that targeting the conserved nuclear transcribed region would be reliable.⁴¹ The CBOL Plant Working Group recommended using rbcL and matK genome sequences in 2009.⁴² Prior to this standard approach being agreed on, various microsatellite markers were being used with PCR-RFLP. An example of this was the Basmati rice authenticity proof-of-concept project. In that study, it was shown that DNA testing could quickly determine admixtures of between 10 and 20% non-Basmati rice.⁴³

PCR-RFLP approaches represent the standard authentication approach, and are good at generating a single sequence to compare with Barcode of Life-based public databases.⁴⁴ An emerging approach, based on next-generation sequencing (NGS), is designed to identify mixtures of unknown species. NGS eliminates the need for cloning DNA fragments in bacteria. Instead, it is cell-free, can sequence multiple samples in a single run, and the results can be observed without gel electrophoresis.⁴⁵ Since NGS is based on fragmenting and processing DNA in parallel, it is suited to identifying multispecies seafood products such as surimi. To distinguish between the fish and mollusk species, an experiment can use species-specific universal primers with a metabarcoding approach to classify genera and species of fish and mollusks. The Basic Local Alignment Search Tool (BLAST) is specialized software for aligning and evaluating the results.⁴⁶

Other techniques of note

An electronic nose (e-Nose) detects volatiles using chemical (amperometric and conductometric), piezoelectric, and optical (smell-sensing) sensors, and those based on GC and MS.⁴⁷,⁴⁸ The most frequently used e-Nose sensors are electrochemical. The number and type of sensors as well as their selectivity and sensitivity depend on the e-Nose application. E-Noses, due to their rapid screening capacity, can be an alternative to traditional GC. They have distinguished Emmentaler cheese from various geographic regions.⁴⁹ The drawback with this approach is that without chromatography or spectral data it is not possible to identify specific volatile compounds to monitor.

1D and 2D HR-NMR processed with multivariate statistics have detected fraud in the honey industry, and assisted

in identifying and quantifying several parameters such as sugars and amino acids. To ensure quality control while minimizing safety issues, the profiles of screened food products are compared to a larger database of genuine food samples.⁵⁰

General nontargeted analysis workflow

For nonchromatographic techniques, such as a spectrometric analysis, we are not concerned with identifying individual components. In these experiments, we are simply looking for spectral features that differentiate between sample types. However, with MS-based techniques, we have the option of processing the samples through an identification protocol. This enriches the data, and allows us to compare results against known adulterants, sensory defects, and potential contaminants. Thus, adding a targeted component to a nontargeted approach can provide valuable insight.

It is important to recognize that an incorrect identification reduces the statistical significance of the experimental results. Therefore, leaving the results unidentified is preferable to including spurious identifications. Only include a targeted (known) component in your experiment if you are confident about your ability to consistently identify it. With chromatographically separated components, there are significant challenges to identification such as poor retention time consistency, incomplete peak separation, isomeric and isobaric compounds, and unresolved peaks. These challenges lead to alignment problems, false peak detection (false positives and false negatives), and identification errors. Even in the best circumstances, data review is an important step in the workflow.

Most of the chemical tests used for adulteration only screen for a few well-characterized attributes. Using EVOO as an example, known degradation products for sample quality are compounds such as pyropheophytin, a degradation product of chlorophyll, and 1-octen-3-ol, which contributes to a musty odor. Other known compounds have fermented, rancid, or musty attributes.

We can augment chemical tests with a sensory panel of experts. The biggest drawback is that this approach is labor intensive (expensive) and slow (experts only work for a short period of time before their ability to identify subtle characteristics wanes). A combination of a sensory training set and multivariate statistics can offer a cheaper, more efficient, and reproducible approach for determining authenticity.

Key experimental factors for multivariate statistics

Ideally, adulteration would be tested in the raw materials, as opposed to downstream products, which can dilute and mask them. If possible, evaluate multiple analytical techniques to determine which one best meets your requirements. Spectroscopic, spectrometric, and nuclear magnetic resonance techniques are all options for a specific food fraud test. Cost, complexity, sensitivity, and portability influence selection between competing methodologies.

Balance sample class prediction experiments with similar group sizes for authentic and adulterated samples. Unequal sampling can bias the statistical results because homogeneity of variance is an assumption in one-way analysis of variance (ANOVA) experiments. Modern statistical software tools can compensate for some of the bias, but it is better to avoid problems than to try to minimize their effect after the data have been collected.

It is easy to set up a multivariate statistics sample class prediction method and get results that seem predictive based on the test data set. But having too many independent variables can lead to a class prediction method that fails to predict subsequent samples despite fitting the training data set. This situation is typically caused by overfitting the data. To reduce the likelihood of this scenario, see if simplifying to a two-dimensional PCA is effective at separating the sample classes. In general, prediction models that need fewer independent variables to predict class differences are stronger models, and are less prone to data overfitting.

Data quality is critical, so design the experiment around good analytical principles. Initially, minimize sample preparation for nontargeted workflows, as it biases what can be detected. When necessary, use dilution and simple extractions to deal with matrix effects. Because successful models use discrete class-specific characteristics for prediction, once these factors are confirmed, sample preparation and chromatography can be optimized to ensure that these variables are robustly measured in the final targeted prediction method.

Data normalization also needs to be carefully considered. With small sample class prediction methods, it is possible to pool samples for normalization controls. The advantage of this approach is that every component is present, and matrix and instrument effects can be reduced. However, it is not appropriate for the subsequent production sample class prediction methods, because pooling samples and running these pooled samples on a regular basis require significantly more resources than adding appropriate internal standards. Instead, internal standards should be representative of the classes of adulterants being evaluated. Similarly, use proficiency

samples at the beginning and the end of each analytical batch to confirm that the instrument's performance is stable. The targeted sample class prediction method should be optimized for high-throughput production.

Sample class prediction is a form of supervised learning. It is a good practice to limit the number of dependent variables, ideally to a single variable per predictive method. For example, sample type is the variable, with vehicle blank, control, and known adulterants being instances. Using milk as an example, adulterants could be water, whey, sodium hydroxide, urea, melamine, and hydrogen peroxide. With obvious adulterants such as these, three technical replicates of every sample can confirm chromatographic reproducibility, if necessary, and five samples per condition should be enough for statistical significance. It may be that each target adulterant is distinct enough to be characterized by its absorbance frequency in the sample spectra. In these cases, it is possible that a spectroscopic approach can be used to identify adulteration in the field using a handheld FTIR, potentially avoiding more involved lab-based approaches.

In other cases, as we see with the rampant misclassification of EVOO, some substitutions are subtler and are therefore more difficult to expose. In these cases, chromatography and a sample class prediction workflow are necessary to detect fraud. Decide the scope of the method early. For example, fustiness, mustiness, rancidity, and vinegariness can all be issues, especially with badly stored EVOO, that lead to lower classifications such as virgin olive oil or worse. These four defects are well characterized, but there are many other factors that would lead to a failed EVOO classification that are dependent on the phenotype. Is it necessary to build a method robust enough to identify the major contributors to the most common flavor characteristics? How many olive varieties are going to be included in the training set?

It is important to determine in subtle class prediction experiments if there is an appropriate amount of biological variation in the training data to cover the phenotypic differences and the various targeted characteristics. Are there enough samples to have statistical power? Post-hoc power analysis can determine the minimum required number of samples based on the statistical power of the pilot experiments.⁵¹

Chromatographic reproducibility is important, especially with mass spectrometry techniques. Even though there are chromatographic peak warping tools to help correct for chromatographic drift, peak alignment may not correctly identify structural isomers that nearly co-chromatograph if the retention times are drifting during the run. It is better to avoid chromatographic problems than to deal with them after the fact. To ensure that any run order effects do not influence the overall outcome, analyze samples in random order. For example, matrix interferences can cause either signal suppression or enhancement. This often happens gradually throughout a batch.

After the data are collected, review them to look for chromatographic or spectral outliers that can bias the results. A simple approach is to visually review the data. If the problem seems to be a batch-based chromatographic effect, it may be possible to correct for it through location and scale normalization. The ComBat algorithm, readily available online, is integrated in various multivariate statistical packages.⁵² Also, to detect bad samples within a good batch, look for poor clustering on a PCA and poor correlation within a group, and check the box and whisker plot for a non-Gaussian distribution. For a more systematic treatment, determine the Mahalanobis distance to show how many standard deviations away a value is from the mean value.⁵³

Another useful approach is to use a recursive workflow to process the data when developing spectrometric-based sample class prediction methods. This helps with the problem of false negatives and false positives in the analytical data. Several feature finding and data alignment protocols have been optimized to deal with drifting chromatography and changes in instrument response. These tools perform alignment as part of feature extraction, and help address the sample variability that can come from chromatographic instability from various sources. If recursion is not an option, and there is too much data to manually check the results, consider treating missing values as missing as opposed to zero within the statistical software. This should be a data import feature of all multivariate statistical packages. A final recommendation is to keep track of all available metadata. It may be necessary to track down confounding factors that complicate data interpretation. A reference that goes into greater depth about spectrometric experimental design and quality control is written by Warwick Dunn et al.⁵⁴

Multivariate statistics, significance testing, and confirmation

While not an instrument per se, multivariate statistics can play a pivotal role in identifying food fraud. This is the case for geographic authenticity, food quality evaluation, or adulteration. Whichever type of EMA is being evaluated, both biological and reference sample data need to be extracted to identify the characteristics (and the underlying compounds) associated with acceptability, and which characteristics

8 lead to failure. This was typically done there were many users interested prediction model for the samples. SCP manually with a painstaking search in nontargeted analysis and sample will provide the best results when the for what is different between a failed profiling but less interested in learning sample data are properly filtered. As sample and reference samples. As more how to use a sophisticated statistical mentioned in the multivariate statistics failed samples were evaluated and more package, Agilent focused on making section, Agilent developed Profinder for attributes discovered, a targeted list of MPP user friendly and instrument recursive data extraction to reduce the components were added to a library. centric. As discussed earlier, scientific number of false negatives and positives This is the state that many labs find skill and analytical expertise are still that a researcher needs to evaluate. This themselves in today, combining tests for needed to develop a robust prediction tool works with both scanning and full- known degradation products along with method. However, the subsequent step spectrum mass spectral data. Similarly, a sensory evaluation. Neither chemical of running the class prediction model with IR data generated from instruments nor sensory tests are ideal on their own. should not be complicated. Classifier from the 4300 Handheld FTIR to the Agilent Profinder software does both was developed to automate LC/Q-TOF 8700 LDIR for chemical imaging, Agilent mass and retention time alignment. classification model analysis. provides optional chemometric and Therefore, it can import the median Finally, it is important to independently prediction modules in the MicroLab retention and mass values, eliminating test the statistical power of your findings. Expert software. 
the need to perform a retention time There are standalone power analysis Multiple prediction models allow alignment step in Agilent Mass Profiler programs used to validate the statistical evaluation and customization since each Professional (MPP) software when power of an experimental result.51 prediction model has traits that make it the data are imported as a Profinder Post-hoc power analysis calculates the applicable under certain situations: archive file. However, there is still a power of the results given the sample • Partial least squares discrimination mass alignment option in MPP to size tested, where power is defined performs vector analysis to develop allow people to import CEF files. When as the probability of rejecting the null models that explain the difference working with CEF files, because of the hypothesis when the specific alternative between the classes. Outliers have slight differences between how mass is hypothesis is true. significant influence on the results, defined in Profinder compared to MPP, it and can cause this model to fail. is best to increase the mass alignment Class prediction/classifier It works well in situations where window in MPP to accommodate the data quality is consistent, and the differences. To import data files First, multivariate statistics provide separation of the sample classes, a simple model can distinguish from third-party instruments requires between the classes. importing the file as a text file and using providing components (entities or a targeted workflow. features) that best discriminate between • Support vector machine works with the classes. A sample class prediction overlapping sample classes. The There are also online statistical tools (SCP) model is then built using these algorithm imagines each sample as to perform feature extraction and predictive components. Explained this a point in two- or three-dimensional simple chemometric analysis for way, it is understandable that another space. 
When multiple planes exist pairwise experiments where there is name for sample class prediction is that separate the classes, the 55,56 only a single independent variable. supervised learning. algorithm maximizes the separation Fortunately, there are capable statistical between the classes. tools available when an experiment It is possible to have an entity list that contains more than one independent seems to separate classes but is not • Naïve Bayesian assumes that variable, and the researcher sees value capable of building a valid SCP model. classes are independent from one in sophisticated techniques such as data The case where the model describes a another (called class conditional clustering, class prediction, multiomic random error in the training data instead independence). This algorithm can analysis, or pathway analysis. In these of an underlying relationship in the work with small entity sets since 57 cases, other tools are available that samples is called overfitting the data. only the variance within each class can be used for differential analysis To generate the SCP model with the needs to be determined. and visualization of complex data sets. highest accuracy of prediction, the Examples are MPP, SPSS from IBM, and data quality is crucial. This facilitates Progenesis QI from Waters. Realizing construction of the right filtering and

• Decision tree works well when the entities are present in all or nearly all the samples. It works through a series of if-then-else decisions to separate the classes.

• Neural network is suited for making classifications when there is a complex (or unknown) relationship between the classes. It is a good choice when there are two or more classes present in the data.

In general, there are two approaches to validate models once they are constructed. Leave one out is when a data file from each class is left out of the training set and is later classified with the prediction model. All the files go through this process to generate the confusion matrix, which shows the applicability of the training data with the model. The other approach is N-fold, in which the training data are randomly assigned to N groups. All but one of the groups are used to train the model, and the last group is used to validate the model. The process is repeated N times to generate the confusion matrix. Generally, an N value of three is enough. Once a model is constructed, trained, and validated, it should be verified using known samples that are completely independent of the training data.

There are numerous steps, and the whole process seems rather complex for researchers who do not have any experience in multivariate statistics. For people such as these, there are guided workflow options and tools that integrate feature finding, significance testing, model building, and validation into a single process in developing supervised learning models [17].

Agilent's goal is to provide solutions for customers that include sample preparation products, analytical instrumentation, software, workflows, and support. This guide is written to help academic researchers evaluate the most commonly used techniques for fighting food fraud. It covers field-deployable spectrometric approaches using portable instruments like the Agilent Resolve Raman system and 4300 Handheld FTIR. It discusses the Agilent 2100 Bioanalyzer System as a lab-based genomic test for accurately identifying fish, even from processed samples where the DNA has been degraded. The guide also shows that identifying geographic origin based on trace elemental analysis has never been easier with instruments such as the matrix-tolerant 7900 ICP-MS and the economical 5110 ICP-OES.

Agilent has good solutions for both unit- and high-resolution MS-based food classification. An 8890 GC paired with the 5977B single quadrupole MS can be used to identify volatile and semivolatile adulterants with a spectral library. The Agilent 6546 LC/QTOF is a high-resolution, accurate-mass research instrument for nonvolatile contaminants. Agilent provides a sample classification workflow solution that includes MassHunter Profinder software for recursive mass spectrometric data analysis for both quadrupole and time-of-flight-based instruments. Curated accurate and unit-mass libraries are available for common food adulterants and allergens. Agilent offers instrument-centric statistical tools like Mass Profiler Professional and user-friendly MassHunter Classifier to provide researchers with the ability to build their own sample class prediction models and predict attributes such as sample quality.

References

1. Barboza, D.; Barrionuevo, A. Filler in Animal Feed Is Open Secret in China. The New York Times 30 April 2007.
2. Litzau, J. J.; Mercer, G. E.; Mulligan, K. J. GC-MS Screen for the Presence of Melamine, Ammeline, Ammelide, and Cyanuric Acid. v. 2.1, FDA Center for Veterinary Medicine, original posting May 5, 2007.
3. Bhalla, V.; et al. Melamine Nephrotoxicity: an Emerging Epidemic in an Era of Globalization. Kidney International 2009, 75, 774–779.
4. GC-MS Screen for the Presence of Melamine, Ammeline, Ammelide, and Cyanuric Acid. U.S. Food and Drug Administration, LIB No. 4423, vol. 4, October 2008.
5. FDA Notice of Public Meeting on Economically Motivated Adulteration. 74 Fed. Reg. 15,497 (April 6, 2009).
6. Frankel, E. N.; et al. Tests Indicate That Imported “Extra Virgin” Olive Oil Often Fails International and USDA Standards. UC Davis Olive Center, July 2010.
7. Cord, C. 80 Percent is the New 69 Percent. Olive Oil Times Nov. 30, 2016.
8. Ayton, J.; Mailer, R. J.; Graham, K. The Effect of Storage Conditions on Extra Virgin Olive Oil Quality. RIRDC April 2012, 12/024.
9. Morales, M. T.; Luna, G.; Aparicio, R. S. Comparative Study of Virgin Olive Oil Sensory Defects. Food Chem. 2005, 91(2), 293–301.
10. Stein, S. E.; Scott, D. R. Optimization and Testing of Mass Spectral Library Search Algorithms for Compound Identification. J. Amer. Soc. Mass Spectrom. 1994, 5(9), 859–866.

11. Taro, Q.; et al. New Investigator Tools for Finding Unique and Common Components in Multiple Samples with Comprehensive Two-Dimensional Chromatography. Chromatography Today 2018, 13–18.
12. Smith, C. A.; et al. XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification. Anal. Chem. 2006, 78(3), 779–782.
13. Hjelmeland, A. K.; et al. Characterizing the Chemical and Sensory Profiles of United States Cabernet Sauvignon Wines and Blends. Am. J. Enol. Vitic. 2013, 64(2), 169–179.
14. Bradbury, L. M. T.; et al. The Gene for Fragrance in Rice. Plant Biotechnol. J. 2005, 3, 363–370.
15. Bergman, C. J.; et al. Rapid Gas Chromatograph Technique for Quantifying 2-Acetyl-1-Pyrroline and Hexanal in Rice (Oryza sativa, L). Cereal Chem. 2000, 77(4), 454–458.
16. Grimm, C. C.; et al. Screening for 2-Acetyl-1-Pyrroline in the Headspace of Rice using SPME/GC-MS. J. Agric. Food Chem. 2001, 49, 245–249.
17. Yannell, K. E.; Cuthbertson, D. Food Authenticity Testing with the Agilent 6546 LC/Q-TOF and MassHunter Classifier. Agilent Technologies Application Note, publication number 5994-0694EN, March 2019.
18. WTO Analytical Index, TRIPS Agreement Articles 2018 22, 23.
19. Ibanez, J. G.; et al. Metals in Alcohol Beverages: A Review of Sources, Effects, Concentrations, Removal, Speciation, and Analysis. J. Food Compos. Anal. 2008, 21, 672–683.
20. Förstel, H. The Natural Fingerprint of Stable Isotopes – Use of IRMS to Test Food Authenticity. Anal. Bioanal. Chem. 2007, 388, 541–544.
21. Drivelos, S. A.; Georgiou, C. A. MultiElement and MultiIsotope-Ratio Analysis to Determine the Geographic Origin of Foods in the European Union. Trends Anal. Chem. 2012, 40, 38–51.
22. Nelson, J.; Hopfer, H. Authentication of Specialty Teas: An Application Note. Food Qual. Saf. 2019, December/January, 32–33.
23. Woods, G. Measurement of Trace Elements in Malt Spirit Beverages (Whisky) by 7500cx ICP-MS. Agilent Technologies Application Note, publication number 5989-7214EN, August 2007.
24. Ellis, D. I.; et al. Through-Container, Extremely Low Concentration Detection of Multiple Chemical Markers of Counterfeit Alcohol Using a Handheld SORS Device. Sci. Rep. 2017, 7, 12082.
25. Moore, J. C.; Spink, J.; Lipp, M. Development and Application of a Database of Food Ingredient Fraud and Economically Motivated Adulteration from 1980 to 2010. J. Food Sci. 2012, 77(4), R118–R126.
26. Santos, P. M.; Pereira-Filho, E. R.; Rodriguez-Saona, L. E. Application of Hand-Held and Portable Infrared Spectrometers in Bovine Milk Analysis. J. Agric. Food Chem. 2013, 61, 1205–1211.
27. Pasquini, C. Near Infrared Spectroscopy: Fundamentals, Practical Aspects and Analytical Applications. J. Braz. Chem. Soc. 2003, 14(2), 198–219.
28. Muthayya, S.; et al. An Overview of Global Rice Production, Supply, Trade, and Consumption. Ann. N.Y. Acad. Sci. 2014, 1324, 7–14.
29. Vemireddy, L. R.; et al. Review of Methods for the Detection and Quantification of Adulteration of Rice: Basmati as a Case Study. J. Food Sci. Technol. 2015, 52(6), 3187–3202.
30. Kim, S. S.; et al. Authentication of Rice Using Near-Infrared Reflectance Spectroscopy. Cereal Chem. 2003, 80(3), 346–349.
31. Teye, E.; et al. Innovative and Rapid Analysis for Rice Authenticity Using Hand-Held NIR Spectrometry and Chemometrics. Spectrochim. Acta A 2019, 217, 147–154.
32. Yu, Y.; et al. Accuracy and Stability Improvement in Detecting Wuchang Rice Adulteration by Piece-Wise Multiplicative Scatter Correction in the Hyperspectral Imaging System. Anal. Methods 2018, 10, 3224–3231.
33. Izake, E. L. Forensics and Homeland Security Applications of Modern Portable Raman Spectroscopy. Forensic Sci. Int. 2010, 202(1-3), 1–8.
34. Dooley, J.; et al. Improved Fish Species Identification by the Use of Lab-on-a-Chip Technology. Food Control 2005, 16, 601–607.
35. Dooley, J. J.; et al. Fish Species Identification Using PCR-RFLP Analysis and Lab-on-Chip Capillary Electrophoresis: Application to Detect White Fish Species in Food Products and an Interlaboratory Study. J. Agric. Food Chem. 2005, 53, 3348–3357.
36. Hebert, P. D.; et al. Biological Identifications Through DNA Barcodes. Proc. Biol. Sci. 2003, 270(1512), 313–21.
37. Ratnasingham, S.; Hebert, P. D. BOLD: The Barcode of Life Data System (www.barcodinglife.org). Mol. Ecol. Notes 2007, 7, 355–364.
38. Handy, S. M.; et al. Evaluation of the Agilent Technologies Bioanalyzer-Based DNA Fish Identification Solution. Food Control 2017, 73, 627–633.
39. Cespedes, A.; et al. Identification of Flatfish Species Using Polymerase Chain Reaction (PCR) Amplification and Restriction Analysis of the Cytochrome b Gene. J. Food. Sci. 1998, 63, 206–209.
40. Dooley, J. J.; et al. Improved Fish Species Identification by the Use of Lab-on-a-Chip Technology. Food Control 2005, 16, 601–607.
41. Wattoo, J. I.; et al. DNA Barcoding: Amplification and Sequence Analysis of rbcL and matK Genome Regions in Three Divergent Plant Species. Adv. Life Sci. 2016, 4(1), 3–7.
42. CBOL Plant Working Group. A DNA barcode for land plants. PNAS 2009, 106(31), 12794–12797.
43. Garrett, S.; Clarke, M. Use of the Agilent 2100 Bioanalyzer for Basmati Rice Authenticity Testing. Agilent Technologies Application Note, publication number 5989-6836EN, 2007.
44. Wong, E. H.; Hanner, R. H. DNA Barcoding Detects Market Substitution in North American Seafood. Food Res. Int. 2008, 41(8), 828–837.
45. Xing, R-R; et al. Application of Next Generation Sequencing for Species Identification in Meat and Poultry Products: A DNA Metabarcoding Approach. Food Control 2019, 101, 173–179.
46. Giusta, A.; et al. Seafood Identification in Multispecies Products: Assessment of 16SrRNA, cytb, and COI Universal Primers’ Efficiency as a Preliminary Analytical Step for Setting Up Metabarcoding Next-Generation Sequencing Techniques. J. Agric. Food Chem. 2017, 65(13), 2902–2912.
47. Dymerski, T.; Chmiel, T.; Wardencki, W. Invited Review Article: An Odor-Sensing System—Powerful Technique for Foodstuff Studies. Rev. Sci. Instrum. 2011, 82, 111101–111132.
48. Śliwińska, M.; et al. Food Analysis Using Artificial Senses. J. Agric. Food Chem. 2014, 62, 1423–1448.
49. Pillonel, L.; et al. Analytical Methods for the Determination of the Geographic Origin of Emmental Cheese: Volatile Compounds by GC/MS-FID and Electronic Nose. Eur. Food Res. Technol. 2003, 216, 179–183.
50. Bertelli, D.; et al. Detection of Honey Adulteration by Sugar Syrups Using One-Dimensional and Two-Dimensional High-Resolution Nuclear Magnetic Resonance. J. Agric. Food Chem. 2010, 58, 8495–8501.
51. Faul, F.; et al. Statistical power analyses using G*Power 3.1: Tests for Correlation and Regression Analyses. Behav. Res. Methods 2009, 41(4), 1149–1160.
52. Johnson, W. E.; Li, C. Adjusting Batch Effects in Microarray Expression Data Using Empirical Bayes Methods. Biostatistics 2007, 8:1, 118–127.
53. Schultz-Trieglaff, O.; et al. Statistical Quality Assessment and Outlier Detection for Liquid Chromatography-Mass Spectrometry Experiments. BioData Mining 2009, 2:4.
54. Dunn, W.; et al. The Importance of Experimental Design and QC Samples in Large-Scale and MS-Driven Untargeted Metabolomic Studies of Humans. Bioanalysis 2012, 4:18, 2249–2264.
55. Tautenhahn, R.; et al. XCMS Online: A Web-Based Platform to Process Untargeted Metabolomic Data. Anal. Chem. 2012, 84(11), 5035–5039.
56. Styczynski, M. P.; et al. Systematic Identification of Conserved Metabolites in GC/MS Data for Metabolomics and Biomarker Discovery. Anal. Chem. 2007, 79, 966–973.
57. Smilde, A. K.; et al. Metabolomics in Practice: Successful Strategies to Generate and Analyze Metabolic Data. M. Lämmerhofer; W. Weckwerth, Eds. Weinheim, Germany: Wiley-VCH Verlag & Co., 2013, p. 266.
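
As a supplementary illustration of the N-fold validation described in the class prediction discussion, the following minimal Python sketch randomly assigns samples to N groups, holds each group out once, and tallies a confusion matrix. The nearest-centroid classifier, the function names, and the two-feature intensity values are assumptions made purely for illustration. They are not part of MPP, Classifier, or any other Agilent software, and a real workflow would substitute one of the prediction models listed in this guide.

```python
import random
from collections import defaultdict


def centroid(rows):
    """Mean feature vector of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train(samples):
    """Hypothetical stand-in model: one mean vector (centroid) per class."""
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append(features)
    return {label: centroid(rows) for label, rows in by_class.items()}


def predict(model, features):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))


def n_fold_confusion(samples, n_folds=3, seed=0):
    """Randomly split samples into n_folds groups, hold each group out once,
    and count (true class, predicted class) pairs."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    confusion = defaultdict(int)
    for i, held_out in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = train(training)
        for features, label in held_out:
            confusion[(label, predict(model, features))] += 1
    return dict(confusion)


# Made-up two-feature intensities, for illustration only.
authentic = [([1.0 + 0.1 * i, 5.0 - 0.1 * i], "authentic") for i in range(6)]
adulterated = [([4.0 + 0.1 * i, 1.0 + 0.1 * i], "adulterated") for i in range(6)]

confusion = n_fold_confusion(authentic + adulterated, n_folds=3)
for (true_label, predicted), count in sorted(confusion.items()):
    print(f"true={true_label:<12} predicted={predicted:<12} n={count}")
```

Off-diagonal entries (true class differing from predicted class) are the misclassifications. As the guide notes, a model that passes this step should still be verified with known samples that are independent of the training data.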

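The Mahalanobis distance recommended in the data review discussion can be sketched in the same spirit. This hedged example assumes each sample has already been reduced to two scores (for instance, two PCA components) and uses the closed-form inverse of a 2 × 2 covariance matrix. The function names, the reference values, and the cutoff of 3 are illustrative assumptions, not values taken from the guide or from reference 53.

```python
import math


def mean_and_covariance(points):
    """Sample mean and covariance terms (var_x, cov_xy, var_y) of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return (mean_x, mean_y), (var_x, cov_xy, var_y)


def mahalanobis(point, mean, cov):
    """Distance of point from mean in units of the group's own spread."""
    var_x, cov_xy, var_y = cov
    det = var_x * var_y - cov_xy ** 2
    if det <= 0:
        raise ValueError("singular covariance; more varied reference data are needed")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form d^T S^-1 d, written out with the 2x2 inverse of S.
    d2 = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(max(d2, 0.0))


# Made-up scores for six reference samples and one suspect sample.
reference = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.3), (0.8, 1.9), (1.0, 2.2)]
suspect = (2.0, 1.0)

mean, cov = mean_and_covariance(reference)
reference_distances = [mahalanobis(p, mean, cov) for p in reference]
suspect_distance = mahalanobis(suspect, mean, cov)

CUTOFF = 3.0  # assumed threshold; choose one appropriate to the study
print("suspect flagged as outlier:", suspect_distance > CUTOFF)
```

Unlike a plain Euclidean distance, this measure accounts for the correlation between the two scores, so a sample can be flagged even when each score on its own looks unremarkable.
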
www.agilent.com/chem

This information is subject to change without notice.

© Agilent Technologies, Inc. 2019 Printed in the USA, July 15, 2019 5994-1076EN