Available online at www.sciencedirect.com (ScienceDirect)
Current Opinion in Toxicology

Next-generation sequencing data for use in risk assessment
Bruce Alexander Merrick

Abstract
Next-generation sequencing (NGS) represents several powerful platforms that have revolutionized RNA and DNA analysis. The parallel sequencing of millions of DNA molecules can provide mechanistic insights into toxicology and open new avenues for biomarker discovery with growing relevance for risk assessment. Over the last decade, NGS technologies have gained sensitivity and accuracy that foster new biomarker assays from tissue, blood, and other biofluids. NGS technologies can identify transcriptional changes and genomic targets with base-pair precision in response to chemical exposure. Furthermore, several exciting movements within the toxicology community incorporate NGS platforms into new strategies for more rapid toxicological characterization. These include the Tox21 in vitro high-throughput transcriptomic screening program, development of organotypic spheroids, alternative animal models, mining of archival tissues, liquid biopsy, and epigenomics. This review will describe NGS-based technologies, demonstrate how they can be used as tools for target discovery in tissue and blood, and suggest how they might be applied for risk assessment.

Addresses
Molecular and Genomic Toxicology Group, Biomolecular Screening Branch, Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27709, USA

Corresponding author: Merrick, B. Alex ([email protected])

Current Opinion in Toxicology 2019, 18:18–26
This review comes from a themed issue on Genomic Toxicology
Available online 8 March 2019
For a complete overview see the Issue and the Editorial
https://doi.org/10.1016/j.cotox.2019.02.010
2468-2020/Published by Elsevier B.V.

Keywords
Next generation sequencing, RNA-seq, DNA-seq, Risk assessment, Biomarkers, High throughput transcriptomics, Liquid biopsy, Transcriptomics.

Introduction
The Sanger sequencing method was developed in the late 1970s to analyze DNA sequence using ³²P-labeled nucleotides separated on polyacrylamide gels and visualized by autoradiography [1]. Radionuclides were later replaced by fluorescently labeled nucleotides and capillary electrophoresis, which gave rise to automated sequencing instruments based on Sanger chemistry, such as the Applied Biosystems, Inc. model 370A sequencer and others [2]. Read lengths were 500–800 nucleotides per sample, and sample throughput was limited. Sanger-based DNA sequencing instruments are considered first-generation platforms. Instruments that perform multiple sequencing reactions simultaneously in a ‘massively parallel’ fashion have been dubbed ‘Next-Generation’ or NextGen sequencing [3]. The emergence of NGS is interlinked with that of microarray technology (Figure 1); both platforms can provide whole-genome approaches to research problems. Microarrays are a fluorescent probe hybridization-based technology with origins in the mid-1990s and are now a mature genomic platform with a well-established data analysis pipeline. The downsides of microarrays are that a priori genomic knowledge is needed to generate probes, which are species-specific, and that their dynamic range for differential expression is limited.

Development of the second wave of sequencing technologies, termed NextGen sequencing (NGS), has overlapped with microarray platforms (Figure 1). NGS began in the new millennium, as exemplified by the massively parallel signature sequencing (MPSS) system that came from university research. Improvements in sequencing chemistries, detection, and automation over the next decade promoted rapid development of NGS platforms. From 2010 to 2015, many NGS instruments became commercially available that could produce millions of reads from 100 to 1000 bases in length. A read is a short piece of sequence (e.g. 100 nucleotides) that can be aligned to a transcript; summed with the other reads aligning to the same transcript, it also serves as a quantitative measure of that transcript. These second-generation sequencers include the Roche ‘454 FLX,’ Life Technologies ‘Ion Torrent,’ Applied Biosystems, Inc. ‘SOLiD,’ and the Illumina family of sequencers, including the HiSeq 2000 series, MiSeq, X-Ten, and NovaSeq [4]. Further advances in NGS technology, such as single-molecule real-time sequencing, have led to longer-read sequencers such as the Pacific Biosciences ‘PacBio RS II’ instrument, which produces reads greater than 10,000 bases [3].


Figure 1

Timeline for development of microarray and next-generation sequencing (NGS) technology platforms. Microarray developments are above the timeline and NGS activities are below. For microarray development, Brown’s laboratory at Stanford was one of the first to develop a multigene expression measurement system using fluorescent detection. The term toxicogenomics (Tgmx) was first defined by Nuwaysir et al. in 1999 [67]. Commercialized platforms such as Affymetrix, Agilent, and NimbleGen matured through 2010. For NGS, massively parallel signature sequencing (MPSS) was developed in 2000 by the Brenner lab at Lynx Therapeutics. 454 Life Sciences developed a massively parallel pyrosequencing method in 2006, followed by a commercial instrument from Roche. The Solexa short-read platform was acquired by Illumina in 2008 and has undergone continued development and improvement. Commercialization of NGS platforms continues, with various speeds of analysis, read lengths, and sequencing capacities. More recent developments include BRB-seq (‘Bulk RNA Barcoding and Sequencing’) and TempO-Seq by BioSpyder, which uses a library of bar-coded probes that hybridize to representative transcripts as a targeted NGS approach to transcript expression. Advantages and disadvantages are summarized and discussed further in the text.

A more recent sequencing technology has been advanced by Oxford Nanopore Technologies. Nanopore instruments read bases directly from single DNA or RNA molecules as they pass through a biological nanopore channel, a nanoscale biological tube that sequences by sensing changes in ionic current as the nucleic acid molecule passes through [5]. The sequencing devices can provide rapid analysis (hours), and some units are portable (the size of a USB flash drive), so they can be readily used in teaching laboratories, medical offices, and field work. Read lengths can range from tens to hundreds of kilobases. A primary advantage of long reads is reduced ambiguity for highly homologous genes, splice variants, and repetitive regions in the genome, where alignment is inherently more difficult using short reads. The high sequence resolution of NGS instruments has come at the expense of relatively low sample throughput. This issue has been addressed by creating libraries of targeted probe sets that analyze the complete transcriptome rapidly and at relatively low cost (e.g. TempO-Seq [6] and bulk RNA barcoding and sequencing [BRB-seq], reviewed later). A brief depiction of NGS applications is shown in Figure 2.

NGS and risk assessment
Traditional risk assessment often involves identifying hazard(s) in a dose–response manner after chemical or test article exposure in animal models, or from human data if available [7]. Test articles can include chemicals and many other agents, including pharmaceuticals, natural products, particles such as asbestos, nanoparticles, physical factors such as radiation, metals, and many others. The type of toxicity or hazard can be broadly defined as macroscopic or microscopic lesions and pathologies; altered pharmacologic, immunologic, functional, and behavioral reactions; changes in biochemistry and physiology; or any measurable response that is considered adverse or outside of normal health. Study of the molecular changes underlying toxicity has been greatly facilitated by ’omics technologies, particularly transcriptomics, while standardization of data analysis and interpretation continues to be refined [8]. Many in vitro assays and screens (e.g. anticholinesterase activity or bacterial mutagenesis) support mode-of-action determination in the risk assessment process, and new initiatives such as Tox21 aim to develop assays incorporating NGS platforms for a larger role in risk determination [9].

The dynamic nature of gene expression (transcriptomics) in response to a chemical or test article exposure makes it well suited as part of the hazard identification and dose-setting process for risk assessment [10,11]. Approximately 15,000 coding genes, and probably an equal number of noncoding genes, are expressed at any one time in a specific cell type. Splice variants add further complexity to the response. Of those expressed genes, only a proportion may change in response to chemical exposure. A robust transcriptomic response may involve thousands of altered transcripts, while a low-level response may involve a few hundred transcripts or fewer. The field has been greatly assisted by genomic dose–response analysis software (e.g. BMDExpress) that facilitates the use of transcriptomic data in toxicology and risk assessment [12].

Figure 2

Many applications for measuring RNA and DNA in toxicogenomics are supported by NGS platforms. Whole genome or transcriptome analysis or targeted portions of each can be measured by NGS. seq, sequencing; WG, whole genome; ATAC, assay for transposase-accessible chromatin; ChIP, chromatin immunoprecipitation; Ribo, ribosome; miRNA, microRNA. RNA analysis platforms are indicated by blue arrows, and DNA analysis platforms are shown by red arrows. Further description of these applications is provided in the text.

RNA-seq
RNA-seq is the principal NGS platform for transcriptome analysis [13,14]. Unlike microarrays, transcript sequencing can occur without prior genomic knowledge, although accurate alignment is greatly enhanced by genome assemblies. RNA-seq performs millions of small-scale DNA sequencing reactions (on cDNA converted from RNA) that produce relatively short sequences (reads) of 100–400 bases in length, which in aggregate represent the transcriptome after alignment to a reference genome. About 95% of total RNA isolates is ribosomal RNA (rRNA); because it provides little information, rRNA must be removed either by poly(A) enrichment or by rRNA depletion strategies [15]. In most tissues, the transcriptome is primarily composed of mRNA that is translated into protein; however, a substantial portion of RNA is not protein coding. This noncoding RNA (ncRNA) can also be detected by RNA-seq and includes microRNA, long noncoding RNA (lncRNA), and other specialized small RNAs (Table 1). mRNA or ncRNA is reverse transcribed into cDNA, and then a library of cDNA fragments is constructed for each sample, with short adaptor sequences attached to either fragment end. Libraries can be sequenced in one direction (single-end reads) or from both ends (paired-end reads). Paired-end reads provide more accurate alignment but at greater expense.
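Because reads in aggregate represent the sequenced target, the average depth of coverage can be estimated from read count, read length, and target size. The Python sketch below assumes the simple even-coverage estimate C = N × L / G (reads × read length / target size); the 50-Mb target size and 100-base read length are placeholder values chosen for illustration, not measured properties of any transcriptome.

```python
import math


def mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Average number of times each target base is sequenced.

    Uses C = N * L / G, which assumes reads are spread evenly over the target.
    """
    return n_reads * read_length / target_size


def reads_needed(coverage: float, read_length: int, target_size: int) -> int:
    """Smallest read count expected to reach the requested mean coverage."""
    return math.ceil(coverage * target_size / read_length)


# Placeholder values (assumptions): a 50-Mb target and 100-base reads.
TARGET_SIZE = 50_000_000
READ_LENGTH = 100

for fold in (15, 30, 100):
    print(fold, reads_needed(fold, READ_LENGTH, TARGET_SIZE))
```

Under these assumed values, 15- to 30-fold mean coverage corresponds to roughly 7.5 to 15 million 100-base reads, and the requirement scales linearly for deeper targets. In practice RNA-seq coverage is highly uneven, because abundant transcripts absorb most reads, so a mean value is only a first approximation.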

Table 1 Transcriptome transcript classification (a).

Transcript type | Genomic number | Mature size | Examples
mRNA (coding) | 20,000–25,000 | 500–15,000 nt | TP53, GAPDH
ncRNA: miRNA | 2000–5000 | 22 nt | miR-29, miR-122
ncRNA: lncRNA | >30,000 | >200 nt | HOTAIR, PVT1
Small ncRNA: regulatory | 1000 | 20–100 nt | tRNA, rRNA, siRNA

GAPDH, glyceraldehyde-3-phosphate dehydrogenase; HOTAIR, HOX antisense intergenic RNA; PVT1, plasmacytoma variant translocation 1.
(a) The transcriptome comprises coding and noncoding transcripts (ncRNA), including microRNAs (miRNAs), long noncoding transcripts (lncRNA), and regulatory small ncRNAs (e.g. transfer RNA, ribosomal RNA, small interfering RNA).


The number of reads per sequencing lane or sequencing run varies with each NGS platform, and multiple samples for RNA-seq analysis can be pooled and distinguished by a multiplexing process called ‘bar coding.’ Statistical confidence in differential transcript expression is increased by devoting a requisite number of reads per sample for adequate ‘depth of coverage’ of the transcriptome [14]. Sequence coverage is the number of times each base within the transcriptome is sequenced; if each base within the transcriptome were sequenced 10 times on average, the coverage would be 10-fold. The coverage needed for differential expression analysis by RNA-seq varies from 15- to 30-fold depending on the organism and reference genome. However, detecting rare transcripts or inferring single-nucleotide polymorphisms (SNPs) from RNA-seq data may require significantly more reads, giving 100- to 200-fold coverage [16]. It should be acknowledged that expense, depth of transcriptome coverage, data complexity, demands for computational analysis, and computational infrastructure are all considerations in RNA-seq analysis.

One of the landmark studies comparing RNA-seq with the microarray platform for chemical exposure studies was the Sequencing Quality Control project, which examined thousands of microarray and RNA-seq analyses to compare differential expression profiling and quality metrics for each platform [17]. In general, the consortium found good agreement between RNA-seq and microarrays for gene expression, despite some data variability in low-expression genes that could be attributed to differences in expression platforms and data analysis pipelines. A follow-up study by this working group compared RNA-seq and microarray data on 498 primary neuroblastomas and showed that RNA-seq outperforms microarrays in overall transcript characterization, but that both platforms show similar results in clinical endpoint prediction [18].

The sensitivity and discovery potential of RNA-seq have found applications in biomarker discovery and environmental monitoring that can be relevant to many stages of risk assessment. For example, nine chemical pollutants were screened in undifferentiated mouse embryonic stem cells in a cell-based toxicity assay in which RNA-seq identified novel RNA biomarkers, including ncRNAs, that showed substantial responses to in vitro chemical exposure [19]. Bis-(2-ethylhexyl)-tetrabromophthalate, a widely used commercial chemical, was screened in vitro in a fish embryo system (Atlantic killifish) by RNA-seq, relating transcriptional and pathway changes to developmental endpoints as part of environmental risk assessment [20]. A very innovative study profiled liver expression responses in mice kept in conventional and germ-free conditions and exposed to the persistent environmental contaminants polybrominated diphenyl ethers, to determine the effect of the presence or absence of gut microbiota on toxicity and gene expression [21]. The authors found several protein-coding–lncRNA pairs that may serve as specific biomarkers to distinguish various polybrominated diphenyl ether congeners and shed light on chemical effects and toxicity related to changes in the microbiome. In another RNA-seq study, livers from rats subchronically exposed to aflatoxin B1 (AFB1) in the diet showed differential expression of 25 new lncRNAs, which were identified as candidate predictive biomarkers of hepatocellular carcinoma [22]. RNA-seq can also detect microRNAs, as recently described in the rat microRNA body atlas [23], including specific biomarkers such as miR-122, which is released from hepatocytes into biological fluids after chemically induced liver injury [24]. Other recent studies have similarly used the sensitivity and base-pair resolution of RNA-seq to characterize differential expression and pathway changes that suggest modes of action for toxicity of estradiol [25] or pharmaceutical water contaminants [26] in zebrafish, and of diesel fractions [27] or the flame-retardant contaminant tetrabromobisphenol A [28] in zebrafish embryos in vitro.

High-throughput transcriptomics
Even though RNA-seq can provide a detailed measurement of the transcriptome, it is not a high-throughput platform. Tox21 is an interagency program to develop and encourage high-throughput in vitro screening and advanced computational methods to better predict the toxicological effects of chemical exposure [29]. In the past few years, high-throughput transcriptomics (HTT) has been developed to measure thousands of transcripts for differential expression after chemical exposure, in a highly multiplexed fashion that accommodates thousands of samples, to establish concentration–response relationships at the gene and pathway level [6,30]. How was this accomplished? A library of transcript-specific probes can be synthesized that bind to RNA in a hybridization–ligation reaction. Indexing sequences are unique to each transcript and sample. Sample libraries can be mixed for simultaneous analysis on an NGS instrument with a sensitivity of detection in the picogram range for RNA. This sensitivity means that transcriptional changes can be measured using thousands or even just hundreds of cells. RNA-seq methods have also been developed to profile single-cell transcriptomes of known and novel cell types in complex tissues such as kidney [31]. The sensitivity of NGS methods has been adapted for toxicant screening in 96- or 384-well plates for high-resolution concentration–response assessment of chemical exposures [32]. Several articles have demonstrated this application in toxicity screening. For example, six compounds were screened in differentiated primary renal proximal tubule epithelial cells (RPTEC) or liver HepaRG cells in a time- and concentration-related manner using a 2800-transcript panel to discriminate compound- and cell type–specific responses [33]. Liver spheroids comprising 1000 to 2000 cells have greater metabolic capacity than two-dimensional cells in flat culture, and the high sensitivity of HTT can be exploited for rapid, high-throughput concentration–response studies with multiple chemicals [32,34].

While the whole transcriptome can be screened in each HTT analysis, considerable sequencing must be performed to deliver statistical confidence in gene expression, especially for low-copy-number transcripts. As a result, a strategy of selecting a transcriptomic subset of genes has been developed into a platform called the S1500+, or ‘Sentinel’ 1500 [30]. This platform evolved as a hybrid approach combining (1) the L1000 platform; (2) the most responsive transcripts, selected by a toxicogenomic data-driven method from public databases; and (3) expert-contributed genes [30]. The S1500+ platform represents a biological space reflecting diverse pharmacologic and toxicity gene expression, covers all known canonical pathways from the Molecular Signatures Database, and can infer changes in the remainder of the transcriptome. A study comparing the S1500+ gene set with RNA-seq and microarray data from rat liver mode-of-action samples demonstrated that S1500+ results are consistent with those of genome-wide platforms (e.g. microarray, RNA-seq) for measuring genome-wide transcriptional responses [35]. Another aspect of the HTT approach for risk assessment can be the comparative screening of specific cell types in vitro (e.g. liver, kidney, heart, neurons) against test articles in the same experiment. For animal testing, it will eventually be possible to monitor transcriptional changes in all tissues of test article–exposed animals to accompany histopathologic and clinical chemistry evaluations.

Archival transcriptomics
Toxicology studies with archival specimens such as formalin-fixed, paraffin-embedded (FFPE) tissues comprise an invaluable resource for linking histopathologic diagnosis to gene expression profiles. Establishing molecular and pathologic relationships can provide a basis for risk assessment by linking established molecular pathways with chemical pathologies and disease. Advanced procedures for deparaffinization and enzymatic digestion have been developed to release nucleic acids for extraction and purification from slices of paraffin blocks after years in storage [36].

The nonspecific hybridization and higher background of microarray analysis of archival block samples have encouraged researchers to use RNA-seq for transcriptomic profiling. Several studies have successfully used NGS transcript profiling technologies on archival samples. One study showed that genomic signatures and gene set analysis of AFB1 differentially expressed transcripts were highly comparable between matched fresh-frozen and FFPE tissues [37]. Subsequent studies have shown conservation of gene expression patterns in FFPE and frozen tissue samples, especially when rRNA depletion procedures were used [38]. A comprehensive analysis of archival liver sample sets involving di(2-ethylhexyl) phthalate or dichloroacetic acid, stored for 2 to 20 years, showed remarkably high correlation in the dose–response of differentially expressed genes, despite the lower read counts from the older study [39]. In particular, data from the more recently (2-year) archived FFPE samples were highly similar to frozen sample transcriptional data in quality metrics, differential expression, and dose–response relationships [39].

DNA sequencing
The base-pair resolution of NGS platforms is uniquely positioned to detect mutations and single-nucleotide variations (SNPs) related to genomic changes and health hazards posed by chemical exposure. Targeted sequencing of molecular sensors of genotoxic and cellular stress such as TP53 [40]; multiple ’omic analyses involving genomic, epigenomic, and transcriptomic characterization of the carcinogen 1,3-butadiene in mouse strains [41]; and research on genomic interrogation of oxidative DNA damage [42] are representative studies that have applied multiple NGS platforms to describe more completely the adverse effects of chemical exposure across the genome. Unlike RNA transcription, which responds dynamically to hazardous substance exposures, DNA in human, rodent, and other animal model systems is stable and does not as readily lend itself to rapid chemical screening for genomic changes in risk assessment. Genetic toxicology uses a range of screening tests for DNA damage, including the Ames assay, the micronucleus test, the Comet assay, and several chromosomal aberration and DNA damage and repair assays [43].

There is increasing interest in the role that high-throughput screening and NGS platforms might play in creating a new generation of tests for genotoxicity and genetic susceptibility to disease and chemical hazards [44,45]. An important part of NGS approaches is the ability to interrogate the whole genome or the exome, where changes in coding regions of genes could alter translated protein products and disrupt regulatory processes. In the past, human genetic and epidemiological studies were limited to a candidate gene approach to establish genomic disease and toxicity relationships. Now, whole-genome sequencing can survey SNPs, copy number variation, and chromosomal aberrations with increasing accuracy. Sequencing studies of humans and experimental species (e.g. mouse) provide publicly available data to better estimate ‘normal’ sequence variation (normal phenotype), which is critical for distinguishing the variants that lead to environmental disease. Inherent in this task is discerning germline variants (heritable sequences) from somatic variants acquired in the DNA of tissues during life. NGS in forward genetic screens in mice is one experimental approach to help sort out candidate genes and mutations that correspond to specific phenotypes [46]. The dbSNP public database maintained by the NCBI is a catalog of single-nucleotide variants, small-scale deletions or insertions, and short tandem repeats such as microsatellites [47,48]. As of 2017, the NCBI accepts only human variant submissions, whereas EBI’s European Variation Archive continues to accept and host nonhuman variant data. dbGaP is the NCBI public database that archives genotype–phenotype associations from many sources, such as genome-wide association study data, the Short Read Archive, molecular diagnostic assays, and others.

Whole-genome sequencing
The cost of whole-genome sequencing has fallen below the one-thousand-dollar barrier. In the wake of decreasing NGS costs, considerable public resources are now being devoted to sequencing thousands of human genomes to benefit personalized medicine and the understanding of genetic susceptibility. For example, the ‘All of Us’ project sponsored by the National Institutes of Health aims to fund several centers for complete genomic sequencing of one million or more people to interrelate the effects of genetics, environment, and lifestyle [49]. As public sequencing projects get underway, different levels of sequencing depth will be required depending on how the genomic information is to be used with confidence: distinguishing germline and rare variants from somatic variation; clinical decisions on surgery and therapeutics; genetic counseling; or advising patients with known or suspected risk factors [50]. Not all variants will carry risk; the risks may yet be undiscovered; variants may be multifactorial in disease risk; or variants may be protective, counteracting the disease potential posed by other variants. Pharmacogenetics and related fields are research areas where NGS may inform both clinical and research efforts on therapeutic and chemical exposure risks [51].

Genomic material for sequencing is generally collected from blood or oral swabs; such procedures offer noninvasive ease of collection [52] but may have limitations. First, many disease phenotypes occur in tissues or organs far from the collection site, so critical sequence variants may not be readily observed in samples obtained from blood or the pharynx. Second, early-stage disease may begin in a single cell or a small cluster of cells, so detecting sequence variants may require cell enrichment or greater depth of DNA sequencing. Despite these concerns, there is enthusiasm for using noninvasive or minimally invasive (e.g. blood) DNA or RNA samples for NGS and genotyping [53,54].

Exome sequencing
Because coding regions comprise about 2% of the genome, designing probe sets to capture and sequence only those coding regions provides an efficient way to examine sequence variants in the most consequential regions of the genome, where changes may be linked to abnormal phenotypes and disease, without sequencing the entire genome. Other advantages of this approach are the greater sequencing depth possible for each sample (e.g. 50- to 100-fold or more) compared with whole-genome sequencing and the greater number of samples that can be analyzed in a sequencing run [55]. Thus far, use of exome sequencing for assessment of risk has been clinically focused on prenatal and reproductive medicine [56] as well as cancer diagnosis and treatment [57]. Use of exome sequencing in environmental risk assessment lies in the future.

Duplex sequencing
Duplex sequencing is a highly specific NGS method for detecting rare sequence variants and mutations with frequencies as low as one in 10 million [58]. Specific adapters and tags uniquely identify reads from each strand of DNA. Although NGS platforms are highly sensitive, sample preparation and the polymerase amplification error rate contribute substantial noise that obscures low-frequency mutations. Because a true mutation is present on both complementary strands, whereas most such errors arise on only one, requiring agreement between the two strands suppresses this noise. For example, duplex sequencing was used to determine TP53 mutations in peritoneal fluid samples from women with ovarian carcinoma and from control individuals without cancer [59]. Nearly all individuals with and without cancer (35/37 total) had low-frequency TP53 mutations, which were more abundant in those with cancer, clustered in hotspots, and increased with age. Widespread, age-associated somatic TP53 mutations in noncancerous tissue suggest an overall mutational burden even in normal individuals. Such TP53 mutations could also be detected in peripheral blood samples. Another study using duplex sequencing identified a characteristic mutation spectrum for the liver carcinogen AFB1 months before tumors were detectable in a mouse model [60]. The AFB1 spectrum proved clinically useful in accurately identifying a subset of cancers associated with AFB1 exposure within a larger set of human liver tumors. Use of duplex sequencing to measure mutational load in sensitive genomic regions due to environmental chemical exposure remains to be explored.

Liquid biopsy
Liquid biopsy refers to efforts to detect and monitor disease or toxicity in accessible biofluids, notably blood, because of the relative ease of sampling, which can be performed repeatedly over time. Release of cell-specific miRNAs during chemically induced toxicity can be exploited by NGS in a liquid biopsy approach to assess the risk of chemical exposure over time [61]. For example, miRNA-seq analysis of urine from rats sampled after 1 week of exposure to the renal toxicant gentamicin found 227 unique miRNAs, of which 146 were differentially expressed, with nine being novel miRNAs not found on a primer-designed qPCR platform [62]. In addition to circulating miRNA, NGS analysis of plasma DNA used for tumor diagnosis in oncology might be similarly applied in toxicology. Circulating cell-free DNA (ccfDNA) comprises short fragments of extracellular DNA (~180 bp) that normally circulate in blood at low levels (e.g. 1–5 ng/ml) in healthy individuals. ccfDNA derives from leukocytes and from apoptosis and cell turnover in all tissues. However, in many tumors, the amount of ccfDNA is increased (10–100 ng/ml), and a portion may harbor diagnostic cancer mutations [63]. ccfDNA has gained attention in the diagnosis, staging, and biomarker discovery of many types of tumors [64] and many other diseases (e.g. autoimmune and infectious diseases) [65] as a novel, minimally invasive form of ‘liquid biopsy’, an attractive alternative to needle biopsy. Another exciting development in this field is epigenetic analysis of ccfDNA for cancer diagnostics and for determining tissue of origin, along with somatic mutations [66]. To date, ccfDNA has been little studied in the environmental health sciences. However, for exposures that leave a somatic mutation or epigenetic pattern, NGS analysis of ccfDNA at the exome, whole-genome, or epigenomic level could provide new data on the amounts and types of environmental exposures over time in experimental and epidemiological settings for improved risk assessment.

Summary
As biomarker research transitions from the clinic to environmental carcinogenesis research, its focus is shifting from therapeutics and companion diagnostic development toward risk assessment and early detection of chemical exposure and disease-related changes. Biomarkers for risk assessment relate directly to environmental regulation and can help define the amounts and types of environmental chemical exposures, as well as biomarkers of effect and biomarkers of susceptibility, all of which reflect interactions between the environment and the population. Thus, NGS data can contribute to an understanding of the environmental and/or genetic factors that could lead to adverse health effects and can have a very positive impact on the future of chemical risk assessment.

Acknowledgements
This work was supported by the Division of the National Toxicology Program at the National Institute of Environmental Health Sciences under grant Z99 ES999999.

Conflicts of interest
Nothing declared.

Funding
No funding was received for this work.

Disclaimer
This article is the work product of employees of the National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH); however, the statements, opinions, or conclusions contained therein do not necessarily represent the statements, opinions, or conclusions of NIEHS, NIH, or the United States government.

References
Papers of particular interest, published within the period of review, have been highlighted as being of special interest or of outstanding interest.

1. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 1977, 74:5463–5467.
2. Heather JM, Chain B: The sequence of sequencers: the history of sequencing DNA. 2016, 107:1–8.
3. Goodwin S, McPherson JD, McCombie WR: Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 2016, 17:333–351.
A current overview of NGS technologies and applications, including a comparison against microarrays and subgenomic platforms such as NanoString, qPCR, and optical mapping. Different NGS instruments and sequencing platforms are described, ranging from DNA-seq and RNA-seq to ATAC-seq and others.
4. Mutz KO, Heilkenbrinker A, Lonne M, Walter JG, Stahl F: Transcriptome analysis using next-generation sequencing. Curr Opin Biotechnol 2013, 24:22–30.
5. Feng Y, Zhang Y, Ying C, Wang D, Du C: Nanopore-based fourth-generation DNA sequencing technology. Genom Proteom Bioinform 2015, 13:4–16.
6. Yeakley JM, Shepard PJ, Goyena DE, VanSteenhouse HC, McComb JD, Seligmann BE: A trichostatin A expression signature identified by TempO-Seq targeted whole transcriptome profiling. PLoS One 2017, 12:e0178302.
7. McCarty LS, Borgert CJ, Posthuma L: The regulatory challenge of chemicals in the environment: toxicity testing, risk assessment, and decision-making models. Regul Toxicol Pharmacol 2018, 99:289–295.
8. Sauer UG, Deferme L, Gribaldo L, Hackermuller J, Tralau T, van Ravenzwaay B, Yauk C, Poole A, Tong W, Gant TW: The challenge of the application of ’omics technologies in chemicals risk assessment: background and outlook. Regul Toxicol Pharmacol 2017, 91(Suppl 1):S14–S26.
9. Tice RR, Austin CP, Kavlock RJ, Bucher JR: Improving the human hazard characterization of chemicals: a Tox21 update. Environ Health Perspect 2013, 121:756–765.
A review of the Tox21 interagency program, detailing the goals and accomplishments of Phase I and Phase II. High-throughput screening assays in 1536-well format are described that generate 15-concentration response curves for a 10K library (10,000 environmental, pesticide, and pharmaceutical compounds) in a variety of nuclear receptor and stress response reporter assay systems.
10. Bourdon-Lacombe JA, Moffat ID, Deveau M, Husain M, Auerbach S, Krewski D, Thomas RS, Bushel PR, Williams A, Yauk CL: Technical guide for applications of gene expression

Current Opinion in Toxicology 2019, 18:18–26 www.sciencedirect.com NGS in risk assessment Merrick 25

profiling in human health risk assessment of environmental chemicals. Regul Toxicol Pharmacol 2015, 72:292–309.
11. Wilson VS, Keshava N, Hester S, Segal D, Chiu W, Thompson CM, Euling SY: Utilizing toxicogenomic data to understand chemical mechanism of action in risk assessment. Toxicol Appl Pharmacol 2013, 271:299–308.
12. Phillips JR, Svoboda DL, Tandon A, Patel S, Sedykh A, Mav D, Kuo B, Yauk CL, Yang L, Thomas RS, et al.: BMDExpress 2: enhanced transcriptomic dose-response analysis workflow. 2018. https://doi.org/10.1093/bioinformatics/bty878.
13. Chen G, Shi T, Shi L: Characterizing and annotating the genome using RNA-seq data. Sci China Life Sci 2017, 60:116–125.
14. Hrdlickova R, Toloue M, Tian B: RNA-Seq methods for transcriptome analysis. Wiley Interdiscip Rev RNA 2017, 8.
15. Zhao S, Zhang Y, Gamini R, Zhang B, von Schack D: Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: polyA+ selection versus rRNA depletion. Sci Rep 2018, 8:4781.
16. Piskol R, Ramaswami G, Li JB: Reliable identification of genomic variants from RNA-seq data. Am J Hum Genet 2013, 93:641–651.
17. Wang C, Gong B, Bushel PR, Thierry-Mieg J, Thierry-Mieg D, Xu J, Fang H, Hong H, Shen J, Su Z, et al.: The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat Biotechnol 2014, 32:926–932.
A large consortium study comparing the RNA-seq and microarray platforms with 27 chemicals representing multiple modes of action (MOA). Cross-platform concordance was very high, but differentially expressed genes and pathways were affected by transcript abundance and the biological complexity of the MOA. The bioinformatic methods and data visualization are very informative for analyzing large datasets and for platform comparison.
18. Zhang W, Yu Y, Hertwig F, Thierry-Mieg J, Zhang W, Thierry-Mieg D, Wang J, Furlanello C, Devanarayan V, Cheng J, et al.: Comparison of RNA-seq and microarray-based models for clinical endpoint prediction. Genome Biol 2015, 16:133.
19. Tani H, Takeshita JI, Aoki H, Nakamura K, Abe R, Toyoda A, Endo Y, Miyamoto S, Gamo M, Sato H, et al.: Identification of RNA biomarkers for chemical safety screening in mouse embryonic stem cells using RNA deep sequencing analysis. PLoS One 2017, 12:e0182032.
20. Huang W, Bencic DC, Flick RL, Nacci DE, Clark BW, Burkhard L, Lahren T, Biales AD: Characterization of the Fundulus heteroclitus embryo transcriptional response and development of a gene expression-based fingerprint of exposure for the alternative flame retardant, TBPH (bis (2-ethylhexyl)-tetrabromophthalate). Environ Pollut 2019, 247:696–705.
21. Li CY, Cui JY: Regulation of protein-coding gene and long noncoding RNA pairs in liver of conventional and germ-free mice following oral PBDE exposure. PLoS One 2018, 13:e0201387.
22. Merrick BA, Chang JS, Phadke DP, Bostrom MA, Shah RR, Wang X, Gordon O, Wright GM: HAfTs are novel lncRNA transcripts from aflatoxin exposure. PLoS One 2018, 13:e0190992.
23. Smith A, Calley J, Mathur S, Qian HR, Wu H, Farmen M, Caiment F, Bushel PR, Li J, Fisher C, et al.: The Rat microRNA body atlas; Evaluation of the microRNA content of rat organs through deep sequencing and characterization of pancreas enriched miRNAs as biomarkers of pancreatic toxicity in the rat and dog. BMC Genomics 2016, 17:694.
24. Kullak-Ublick GA, Andrade RJ, Merz M, End P, Benesic A, Gerbes AL, Aithal GP: Drug-induced liver injury: recent advances in diagnosis and risk assessment. Gut 2017, 66:1154–1164.
25. Zheng Y, Yuan J, Meng S, Chen J, Gu Z: Testicular transcriptome alterations in zebrafish (Danio rerio) exposure to 17beta-estradiol. Chemosphere 2019, 218:14–25.
26. Wu M, Liu S, Hu L, Qu H, Pan C, Lei P, Shen Y, Yang M: Global transcriptomic analysis of zebrafish in response to embryonic exposure to three antidepressants, amitriptyline, fluoxetine and mianserin. Aquat Toxicol 2017, 192:274–283.
27. Mu X, Liu J, Yang K, Huang Y, Li X, Yang W, Qi S, Tu W, Shen G, Li Y: 0# Diesel water-accommodated fraction induced lipid homeostasis alteration in zebrafish embryos. Environ Pollut 2018, 242:952–961.
28. Chen J, Tanguay RL, Xiao Y, Haggard DE, Ge X, Jia Y, Zheng Y, Dong Q, Huang C, Lin K: TBBPA exposure during a sensitive developmental window produces neurobehavioral changes in larval zebrafish. Environ Pollut 2016, 216:53–63.
29. Merrick BA, Paules RS, Tice RR: Intersection of toxicogenomics and high throughput screening in the Tox21 program: an NIEHS perspective. Int J Biotechnol 2015, 14:7–27.
30. Mav D, Shah RR, Howard BE, Auerbach SS, Bushel PR, Collins JB, Gerhold DL, Judson RS, Karmaus AL, Maull EA, et al.: A hybrid gene selection approach to create the S1500+ targeted gene sets for use in high-throughput transcriptomics. PLoS One 2018, 13:e0191105.
31. Park J, Shrestha R, Qiu C, Kondo A, Huang S, Werth M, Li M, Barasch J, Susztak K: Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease. Science 2018, 360:758–763.
32. Ramaiahgari SC, Waidyanatha S, Dixon D, DeVito MJ, Paules RS, Ferguson SS: Three-dimensional (3D) HepaRG spheroid model with physiologically relevant xenobiotic metabolism competence and hepatocyte functionality for liver toxicity screening. Toxicol Sci 2017, 160:189–190.
A human liver spheroid model demonstrating enhanced xenobiotic-metabolizing (CYP1A2, CYP2B6 and CYP3A4/5) and functional capabilities. HepaRG liver spheroids can be screened in 384-well plates and show liver enzyme inducibility with activators of the hepatic receptors AhR, CAR and PXR. The 3D spheroids have the longevity in culture needed for repeated exposures. They also show lab-to-lab and year-to-year repeatability for toxicology screening. The generated data could rapidly provide concentration-response data for risk assessment.
33. Limonciel A, Ates G, Carta G, Wilmes A, Watzele M, Shepard PJ, VanSteenhouse HC, Seligmann B, Yeakley JM, van de Water B, et al.: Comparison of base-line and chemical-induced transcriptomic responses in HepaRG and RPTEC/TERT1 cells using TempO-Seq. Arch Toxicol 2018, 92:2517–2531.
34. Ramaiahgari SC, den Braver MW, Herpers B, Terpstra V, Commandeur JN, van de Water B, Price LS: A 3D in vitro model of differentiated HepG2 cell spheroids with improved liver-like properties for repeated dose high-throughput toxicity studies. Arch Toxicol 2014, 88:1083–1095.
35. Bushel PR, Paules RS, Auerbach SS: A comparison of the TempO-seq S1500+ platform to RNA-seq and microarray using rat liver mode of action samples. Front Genet 2018, 9:485.
36. Wehmas LC, Wood CE, Gagne R, Williams A, Yauk C, Gosink MM, Dalmas D, Hao R, O’Lone R, Hester S: Demodifying RNA for transcriptomic analyses of archival formalin-fixed paraffin-embedded samples. Toxicol Sci 2018, 162:535–547.
37. Auerbach SS, Phadke DP, Mav D, Holmgren S, Gao Y, Xie B, Shin JH, Shah RR, Merrick BA, Tice RR: RNA-Seq-based toxicogenomic assessment of fresh frozen and formalin-fixed tissues yields similar mechanistic insights. J Appl Toxicol 2015, 35:766–780.
38. Webster AF, Zumbo P, Fostel J, Gandara J, Hester SD, Recio L, Williams A, Wood CE, Yauk CL, Mason CE: Mining the archives: a cross-platform analysis of gene expression profiles in archival formalin-fixed paraffin-embedded tissues. Toxicol Sci 2015, 148:460–472.
39. Hester SD, Bhat V, Chorley BN, Carswell G, Jones W, Wehmas LC, Wood CE: Editor’s highlight: dose-response analysis of RNA-seq profiles in archival formalin-fixed paraffin-embedded samples. Toxicol Sci 2016, 154:202–213.
Demonstration of dose–response from RNA-seq transcript profiles after RNA extraction from paired frozen and FFPE samples from two archival studies in mice at 2 years and 20 years of storage. The extraction procedures, RNA-seq methods and data analysis are exemplary for the use of archival materials that connect histopathology and NGS-based transcriptomics.
40. Zerdoumi Y, Kasper E, Soubigou F, Adriouch S, Bougeard G, Frebourg T, Flaman JM: A new genotoxicity assay based on p53 target gene induction. Mutat Res Genet Toxicol Environ Mutagen 2015, 789–790:28–35.
41. Israel JW, Chappell GA, Simon JM, Pott S, Safi A, Lewis L, Cotney P, Boulos HS, Bodnar W, Lieb JD, et al.: Tissue- and strain-specific effects of a genotoxic carcinogen 1,3-butadiene on chromatin and transcription. Mamm Genome 2018, 29:153–167.
42. Poetsch AR, Boulton SJ, Luscombe NM: Genomic landscape of oxidative DNA damage and repair reveals regioselective protection from mutagenesis. Genome Biol 2018, 19:215.
43. Cimino MC: Comparative overview of current international strategies and guidelines for genetic toxicology testing for regulatory purposes. Environ Mol Mutagen 2006, 47:362–390.
44. Maslov AY, Quispe-Tintaya W, Gorbacheva T, White RR, Vijg J: High-throughput sequencing in mutation detection: a new generation of genotoxicity tests? Mutat Res 2015, 776:136–143.
45. Gomy I, Diz Mdel P: Hereditary cancer risk assessment: insights and perspectives for the Next-Generation Sequencing era. Genet Mol Biol 2016, 39:184–188.
46. Schneeberger K: Using next-generation sequencing to isolate mutant genes from forward genetic screens. Nat Rev Genet 2014, 15:662–676.
47. Deng JE, Sham PC, Li MX: SNPTracker: a swift tool for comprehensive tracking and unifying dbSNP rs IDs and genomic coordinates of massive sequence variants. G3 (Bethesda) 2015, 6:205–207.
48. Wei CH, Phan L, Feltz J, Maiti R, Hefferon T, Lu Z: tmVar 2.0: integrating genomic variant information from literature with dbSNP and ClinVar for precision medicine. Bioinformatics 2018, 34:80–87.
49. Scherr CL, Aufox S, Ross AA, Ramesh S, Wicklund CA, Smith M: What people want to know about their genes: a critical review of the literature on large-scale genome sequencing studies. Healthcare (Basel) 2018, 6.
50. Chen HZ, Bonneville R, Roychowdhury S: Implementing precision cancer medicine in the genomic era. Semin Canc Biol 2019, 55:16–27.
51. Schwarz UI, Gulilat M, Kim RB: The role of next-generation sequencing in pharmacogenetics and pharmacogenomics. Cold Spring Harb Perspect Med 2019, 9.
52. Woo JG, Martin LJ, Ding L, Brown WM, Howard TD, Langefeld CD, Moomaw CJ, Haverbusch M, Sun G, Indugula SR, et al.: Quantitative criteria for improving performance of buccal DNA for high-throughput genetic analysis. BMC Genet 2012, 13:75.
53. Bellairs JA, Hasina R, Agrawal N: Tumor DNA: an emerging biomarker in head and neck cancer. Cancer Metastasis Rev 2017, 36:515–523.
54. Yin Y, Lan J, Zhang Q: Application of high-throughput next-generation sequencing for HLA typing on buccal extracted DNA. Methods Mol Biol 2018, 1802:101–113.
55. Warr A, Robert C, Hume D, Archibald A, Deeb N, Watson M: Exome sequencing: current and future perspectives. G3 (Bethesda) 2015, 5:1543–1550.
56. Normand EA, Alaimo JT, Van den Veyver IB: Exome and genome sequencing in reproductive medicine. Fertil Steril 2018, 109:213–220.
57. Kamps R, Brandao RD, Bosch BJ, Paulussen AD, Xanthoulea S, Blok MJ, Romano A: Next-generation sequencing in oncology: genetic diagnosis, risk prediction and cancer classification. Int J Mol Sci 2017, 18.
58. Fox EJ, Reid-Bayliss KS, Emond MJ, Loeb LA: Accuracy of next generation sequencing platforms. Next Gener Seq Appl 2014, 1.
59. Krimmel JD, Schmitt MW, Harrell MI, Agnew KJ, Kennedy SR, Emond MJ, Loeb LA, Swisher EM, Risques RA: Ultra-deep sequencing detects ovarian cancer cells in peritoneal fluid and reveals somatic TP53 mutations in noncancerous tissues. Proc Natl Acad Sci U S A 2016, 113:6005–6010.
Use of novel NGS methods for ultradeep sequencing of TP53 mutations applied to ovarian cancer. The implication is that similar queries on environmentally important genes could be useful for determining risk in environmentally contaminated regions or lifestyles. Mutational load on critical genes may be a determinant of risk and of environmental disease.
60. Fedeles BI, Chawanthayatham S, Croy RG, Wogan GN, Essigmann JM: Early detection of the aflatoxin B1 mutational fingerprint: a diagnostic tool for liver cancer. Mol Cell Oncol 2017, 4:e1329693.
61. Harrill AH, McCullough SD, Wood CE, Kahle JJ, Chorley BN: MicroRNA biomarkers of toxicity in biological matrices. Toxicol Sci 2016, 152:264–272.
Reviews biofluid-based miRNA studies as a “liquid biopsy” that samples extracellular fluids such as blood for chemically induced toxicity in liver, kidney, heart and pancreas. Describes factors unique to miRNA analysis, including biogenesis and baseline circulating expression levels, normalization, spike-in oligonucleotides and potential interference by erythrocyte lysis.
62. Nassirpour R, Mathur S, Gosink MM, Li Y, Shoieb AM, Wood J, O’Neil SP, Homer BL, Whiteley LO: Identification of tubular injury microRNA biomarkers in urine: comparison of next-generation sequencing and qPCR-based profiling platforms. BMC Genomics 2014, 15:485.
63. Parsons HA, Beaver JA, Park BH: Circulating plasma tumor DNA. Adv Exp Med Biol 2016, 882:259–276.
64. Petit J, Carroll G, Gould T, Pockney P, Dun M, Scott RJ: Cell-free DNA as a diagnostic blood-based biomarker for colorectal cancer: a systematic review. J Surg Res 2018, 236:184–197.
65. Ghosh RK, Pandey T, Dey P: Liquid biopsy: a new avenue in pathology. Cytopathology 2019, 30:138–143.
66. Gai W, Sun K: Epigenetic biomarkers in cell-free DNA and applications in liquid biopsy. Genes (Basel) 2019, 10.
67. Nuwaysir EF, Bittner M, Trent J, Barrett JC, Afshari CA: Microarrays and toxicology: the advent of toxicogenomics. Mol Carcinog 1999, 24:153–159.
