VOL. 31 NO. 3 (2019) TONY DAVIES COLUMN

Are omics the death of Good Sampling Practice?

Antony N. Daviesa,b and Roy Goodacrec aExpert Capability Group – Measurement and Analytical Science, Nouryon, Deventer, the Netherlands bSERC, Sustainable Environment Research Centre, Faculty of Computing Engineering and Science, University of South Wales, UK. 0000-0002-3119-4202 cDepartment of , Institute of Integrative Biology, , UK. E-mail: [email protected], 0000-0003-2230-645X

During the recent Royal Society of apparently promising dead-ends. We are in most case-control studies the cases Chemistry, Faraday discussion meeting in reminded by George Poste in his edito- (those with some form of disease) are Edinburgh on Challenges in Analysis of rial “Bring on the Biomarkers” in 20112 usually already on medication (or self Complex Natural Mixtures I found myself that, at that time, of the 150,000 clini- medicating), so this strong confounding wondering if the power that our modern cal biomarkers described in the literature factor also needs to be considered. spectrometers bring to the study of highly a mere 100 were routinely used in the There is an enormous gap between complex systems can sometimes over- clinic. delivering theoretical correlations with whelm our natural scepticism around Omics experiments in themselves the hope of finding causation from stud- poor sampling practices.1 Some targeted present an enormous issue for classical ies of cell cultures in Petri dishes to questions put by Roy Goodacre in this statisticians just by their huge dimension- catching the developing lung cancer in a direction to several speakers seemed to ality. Conventional wisdom has it that the fit, football-playing 45 year-old engineer indicate I was not alone in my concerns, greater the dimensionality of your prob- before he starts coughing blood into his so I thought it might be worth looking lem, the greater the number of unique handkerchief. at the temptations and some good prac- un-related samples you need if you wish So is it really appropriate in such an tices in this area. to analyse the problem successfully. environment to ignore all our Good But where the promises of the omics Sampling Practice that was drummed My spectrometer has approach are being sung the loudest is into us at university (hopefully) and just identified 30,000 also the area where it is always notori- go all out for as much data as we can separate chemical entities ously difficult to recruit large sample get and (ethics committees willing) just so why do I need eight populations. keep throwing the mass spectra, nuclear replicate samples? In the health care environment, omics magnetic resonance (NMR) data sets As regular readers know, this column is believed to be one of the key analytical and our ion mobility fingerprints onto a never aims to be deliberately provoca- spectroscopic advances which will form big pile for the statisticians to fight over? tive (!) but as our analytical spectroscopic the backbone of personalised medicine. Several years ago Raji Balasubramanian and spectrometric toolbox gets stronger However, inconsistent ethics commit- and co-workers compared some clas- and stronger, there is always going to tees, medical practitioner patient notes sification algorithms used in omics be a temptation to revel in the glory and a simple lack of enough patients spectroscopic technologies driven by the of the latest high-resolution enhance - taking part in trials who are the same high-dimensionality of the data.3 Lauren ment for its own sake and to forget, sex, age, weight (or BMI), ethnic origin, McIntyre looked last year at the lack of just for a moment, why we are carrying diet, alcohol intake, fitness regime, medi- samples compared to the complex - out the experiments in the first place. cal history etc. and who are, for exam- ity of metabolites/genes and the lack In the world of omics experiments it is ple, at the same stage of say a non-small of acknowledgement of over-fitting in even more important to be sure that cell carcinoma, could well hinder this the literature proposing a slightly differ- the results we are churning out by the approach well into the future. Let us not ent two-stage approach to the data Petabyte are robust and fit-for-purpose. even start discussing the need for healthy analysis challenge.4 Drupad Trivedi and If we leave aside the cost of the instru- controls with the same characteristics or colleagues recently surveyed the metab- mentation, the societal costs of sloppy- even wander into the analytical mine - olomics literature and found that the vast omics as more data becomes openly field of comparing results from contin- majority of studies were unfortunately available for other scientists to use, could uous monitoring against grab sampling underpowered.5 At the beginning of lead to false conclusions being drawn with different storage strategies taken by this year, Wu and co-authors published and resources being diverted down different projects. Let us not forget that a “selective” review on integrating data www.spectroscopyeurope.com SPECTROSCOPYEUROPE 17 VOL. 31 NO. 3 (2019) TONY DAVIES COLUMN

from different types of omics experi - 150,000 participants being recruited they had been denatured, reduced and ments who want to add another level of from 2013 to 2017. Molecular profil - alkylated followed by digestion and complexity to their lives!6 Thus an abso- ing of each participant is important to de-salting. Unfortunately, these stud- lutely essential components of any study catch genetic and environmental factors. ies take over an hour per sample, which that generates megavariate data is the The analytical centre at the biobank clearly was going to cause problems with need to reduce false discovery.7 carries out standard non-targeted mass a project of this size. If so, then how on earth do we spectrometry (MS) and NMR analyses For the analysis they continue to convince the governmen- making the data available to the scien- adopted a non-targeted approach using tal funding bodies that it is wise to pour tific community. NMR and both positive- and negative-ion money into these areas of research in the The authors discussed the difficulty mode LC-MS. long term? Those in the medical spec- in carrying out sample collection during Metabolites were extracted from the troscopy field who passionately believe omics cohort studies—where although plasma samples into a sodium phos - in this approach, will need to answer the the genome will not alter, target metabo- phate buffer for the NMR studies on a question every three to five years about lites may well be unstable and will be 600 MHz instrument collecting standard how many lives did your last project influenced by many factors which must 1D NOESY and CMPG spectra success- save? (As those approaching the next be captured at the time of sampling. fully identifying and quantifying 37 UK Research Excellence Framework will Indeed, they make the nice statement metabolites. For the MS analyses, an have to think about…). Maybe the best that the quality of the omics data largely automated sample preparation robot was approach is to keep all these issues in depends on the quality of the collected used which could process around 100 mind when designing your experiments samples. They studied which type of samples per hour, detecting over 1000 in the first place as the next story shows. blood collection procedure was best peaks and identifying 250 metabolites. for omics studies, deciding that it was For positive-ion mode analyses, the team Studying the aftereffects best to collect EDTA plasma, as proteins used an UHPLC QTOF/MS system with of a natural disaster by and metabolites can be unstable during electrospray ionisation and a C18 column omics serum clotting. To remain consistent with (Acquity HSS T3; Waters) was used for Tohoku Medical Megabank (TMM) other laboratories, however, they decided LC separation. For negative-ion mode, a Project was created to operate prospec- to continue to collect both serum and NANOSPACE SI II HPLC (Shiseido, Tokyo) tive cohort studies in Japan for regions plasma samples. Figure 1 shows the and a Q Exactive Orbitrap system was where the population were impacted by sample collection and transportation deployed using a HILIC column for sepa- the Great East Japan Earthquake on 11 plan from the cohort recruitment sites to ration (ZIC pHILIC; Sequant, Darmstadt). March 2011.8 The project has at its heart the biobank. The MS measurements could be run at the desire to support personalised medi- four per hour. cal support for the earthquake-damaged The TMM central regions in the future. A good deal of laboratory protocols Sample quality control in thought went into this multi-omics study for proteome and metabolome analysis going right back to the sampling proce- metabolome analysis Not satisfied with the level of sampling dures. Two cohort studies are discussed— For proteome analysis, the TMM team standardisation and analysis described one an adult study and a second birth carried out LC-MS/MS measurements above, the team also put protocols and three generations study with over in triplicate of the plasma samples after in place on the not unreasonable

Figure 1. The TMM cohort sample collection and storage protocol.

18 SPECTROSCOPYEUROPE www.spectroscopyeurope.com VOL. 31 NO. 3 (2019) TONY DAVIES COLUMN

assumption that there would be use the resource to support the deploy- 6. C. Wu, F. Zhou, J. Ren, X. Li, Y. Jiang some sample handling errors deal - ment of personalised medicine to these and S. Ma, “A selective review of ing with such a large study. Samples regions. multi-level omics data integra - were excluded as outliers if the NMR tion using variable selection”, High data on certain control metabolites References Throughput 8(1), 4 (2019). https:// were outside the ranges expected. 1. http://www.rsc.org/events/ doi.org/10.3390/ht8010004 For example, the blood glucose values detail/29574/challenges-in-analysis- 7. D.I. Broadhurst and D.B. Kell, needed to be no lower than 70 % of of-complex-natural-mixtures-faraday- “Statistical strategies for avoiding false that measured by an original blood test discussion (accessed 10 June 2019). discoveries in and carried out at the recruitment site and 2. G. Poste, “Bring on the biomark - related experiments”, Metabolomics the lactose levels could not exceed ers”, Nature 469, 156–157 (2011). 2, 171–196 (2006). https://doi. more than 2× standard deviation of https://doi.org/10.1038/469156a org/10.1007/s11306-006-0037-z the cohort average. Samples were also 3. Y. Guo, A. Graber, R.N. McBurney and 8. S. Koshiba, I. Motoike, D. Saigusa, excluded if they breached some aspect R. Balasubramanian, “Sample size J. Inoue, M. Shirota, Y. Katoh, of the protocol, such as accidental stor- and statistical power considerations F. Katsuoka, I. Danjoh, A. Hozawa, age for longer periods before entering in high-dimensionality data settings: S. Kuriyama, N. Minegishi, the biobank. a comparative study of classifica- M. Nagasaki, T. Takai-Igarashi, Finally, the quality controlled data tion algorithms”, BMC Bioinformatics S. Ogishima, N. Fuse, S. Kure, are being made available at the jMorp 11, 447 (2010). https://doi. G. Tamiya, O. Tanabe, J. Yasuda, Japanese Multi Omics Reference org/10.1186/1471-2105-11-447 K. Kinoshita and M. Yamamoto, Panel at https://jmorp.megabank. 4. A. Kirpich, E.A. Ainsworth, “Omics research project on prospec- tohoku.ac.jp/201905/ and 8 May saw J.M. Wedow, J.R.B. Newman, tive cohort studies from the Tohoku jMorp release 201905 of the 5KJPNv2 G. Michailidis and L.M. McIntyre, Medical Megabank Project”, Genes Genotype Frequency dataset from 3500 “Variable selection in omics data: A Cells 23(6), 406–417 (2018). individuals. The metabolites database practical evaluation of small sample https://doi.org/10.1111/gtc.12588 release (ToMMo Metabolome 2018 sizes”, PLoS ONE 13(6), e0197910 9. D. Broadhurst, R. Goodacre, 20180827) currently contains distribu- (2018). https://doi.org/10.1371/ S.N. Reinke, J. Kuligowski, I.D. Wilson, tions of metabolite concentrations iden- journal.pone.0197910 M. Lewis and W.B. Dunn, ”Guidelines tified by NMR, and distributions of peak 5. D.K. Trivedi, K.A. Hollywood and and considerations for the use of intensities of metabolites characterised R. Goodacre, “Metabolomics for system suitability and quality control by LC-MS detected in samples from an the masses: the future of metab - samples in mass spectrometry assays initial 10,719 volunteers (only 3012 for olomics in a personalized world”, applied in untargeted clinical metab- LC-MS so far). Eur. J. Mol. Clin. Med. 3, 294–305 olomic studies”, Metabolomics 14, For those interested in how to use (2017). https://doi.org/10.1016/j. 72 (2018). https://doi.org/10.1007/ quality controls in metabolomics nhtm.2017.06.001 s11306-018-1367-3 the reader is directed to an article in Metabolomics that won this year’s prize for the most downloads—a testament JOURNAJOURNALL that many researchers are aware that OOFF quality assurance and quality control is a SPECTRAL SPECTRALvolume 1 / 2010 JSI IMAGINIMAGINGISSN 2040-4565 G very important aspect of any large-scale omics studies.9 An Open Access JournalJournal fromfrom IMP OpenOpen

Conclusions Special Issue: So, for what the TMM project authors claim to be one of the biggest planned Spectral Imaging in multi-omics longitudinal studies currently underway, it is clear that those with Synchrotron Light Facilities responsibility for the planning and execu- tion of the project are certainly of the opinion that sampling really is critical to the quality of the whole omics project. Time will tell if they have taken enough precautions as the data sets increase in impopen.com/synchrotron size and the analytical scientists start to www.spectroscopyeurope.com SPECTROSCOPYEUROPE 19