Machine Learning, Transcriptome, and Genotyping Chip Analyses

Total Page:16

File Type:pdf, Size:1020Kb

Machine Learning, Transcriptome, and Genotyping Chip Analyses www.nature.com/scientificreports OPEN Machine learning, transcriptome, and genotyping chip analyses provide insights into SNP markers identifying fower color in Platycodon grandiforus Go‑Eun Yu1,6, Younhee Shin2,6, Sathiyamoorthy Subramaniyam2, Sang‑Ho Kang1, Si‑Myung Lee1, Chuloh Cho3, Seung‑Sik Lee4,5 & Chang‑Kug Kim1* Bellfower is an edible ornamental gardening plant in Asia. For predicting the fower color in bellfower plants, a transcriptome‑wide approach based on machine learning, transcriptome, and genotyping chip analyses was used to identify SNP markers. Six machine learning methods were deployed to explore the classifcation potential of the selected SNPs as features in two datasets, namely training (60 RNA‑Seq samples) and validation (480 Fluidigm chip samples). SNP selection was performed in sequential order. Firstly, 96 SNPs were selected from the transcriptome‑wide SNPs using the principal compound analysis (PCA). Then, 9 among 96 SNPs were later identifed using the Random forest based feature selection method from the Fluidigm chip dataset. Among six machines, the random forest (RF) model produced higher classifcation performance than the other models. The 9 SNP marker candidates selected for classifying the fower color classifcation were verifed using the genomic DNA PCR with Sanger sequencing. Our results suggest that this methodology could be used for future selection of breeding traits even though the plant accessions are highly heterogeneous. Te bellfower (Platycodon grandiforus) is a popular plant used as food, medicine, and ornamental plant in Asia1. P. g randif or u s is a monotypic species of the bellfower family (Campanulaceae). Due to its therapeutic efects, its root has been used in traditional medicine as a popular food additive for over 2000 years2. P. g randif or u s consists of 12 cultivars, which has received high commercial ratings in the ornamental fower market for their habitat and foral displays3. Its fower is characterized by an attractive colorful bud, a long fowering time, and an extended vase life 4,5. Genome-wide molecular markers-based genomic selections enhance plant breeding to produce the desired traits6. However, traditional molecular breeding techniques, such as molecular assisted selection, have been limited due to multiple gene variants for complex traits 7. Next-generation sequencing technologies have enabled large-scale genome-wide genotyping for heterogeneous phenotypes, which helped in precise genome selection associated with specifc phenotypes. Further, the reduced representation of genome-wide genotyping is tran- scriptome-wide association studies, which leverage the project cost for molecular breeding studies in the model and non-model plants 8–10. Tese massive genotyping eforts have recently been subject to machine learning (ML) methods to predict SNP associations with specifc traits. Population studies have demonstrated how ML based modeling can be used efectively to predict phenotype from genotype 11. ML facilitates pattern recognition of large biological datasets since ML algorithms are used widely in various biological felds, such as molecular marker identifcation, coding region recognition, pathway gene recognition, protein–protein interaction determina- tion, and metabolic network detection12. For instance, ML models have been constructed for genomic selection 1Genomics Division, National Institute of Agricultural Sciences, Jeonju 54874, Korea. 2Research and Development Center, Insilicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea. 3Crop Foundation Research Division, National Institute of Crop Science, RDA, Wanju 55365, Korea. 4Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, 29 Geumgu-gil, Jeongeup 56212, Korea. 5Department of Radiation Science and Technology, University of Science and Technology, Daejeon 34113, Korea. 6These authors contributed equally: Go-Eun Yu and Younhee Shin. *email: [email protected] Scientifc Reports | (2021) 11:8019 | https://doi.org/10.1038/s41598-021-87281-0 1 Vol.:(0123456789) www.nature.com/scientificreports/ in wheat13, root genotype classifcation14, nut-size prediction in Castanea crenata10, and polyploidy associated SNPs identifcation in plants15. Several studies have been conducted in P. grandiforas to identify molecular markers such as simple sequence repeats16,17, microsatellites18, and cleaved amplifed polymorphic sequences 19. However, molecular markers that determine specifc fower colors are limited. Hence, we report in this study a newly designed transcriptome-wide approach as an efective method in identifying fower color in bellfower plants, where the selected SNP markers could be used to predict bellfower color during the plant breeding experiments. Methods Plant materials. P. g randif or u s plant seeds germinated at 60-hole pots for 30 days and transferred to 18.3 CM PI-pots for 90 days in green house setup under 25 °C temperature. Once the fower color was conformed, the petals were collected as a samples for experiments. All the plants used in this study were maintained at experi- mental feld located in Jeonju, Korea (N: 35° 49′; E: 127° 09′) National Institute of Agricultural Sciences/RDA living modifed organism (LMO) guidance. Accessions for three fower colours (i.e., Astra pink, Janbaek violet, and Jangbaek white) for RNA sequencing (RNA-Seq) were obtained from the RDA Genebank (http:// geneb ank. rda. go. kr/). For three colors (20 in each color), 60 samples were collected, and those sampling details are given in Supplementary Table S2. Te 480 genotyping samples for the Fluidigm chip array were collected from the mutant accessions created by the gamma irradiation of three diferent fower color accessions (Supplementary Table S3). Te leaf and fower components from each plant were collected individually, and the samples were frozen in liquid nitrogen and stored at -80℃ following DNA and RNA extractions. Gamma irradiation. A Co-60 gamma-irradiator (IR 222, MDS Nordion Inc., Kanata, Canada) was used for gamma irradiation. Dry seeds of P. g randif or u s were irradiated with 50, 100, 150, and 200 Gy of gamma radia- tion at dose rates of 25, 50, 75, and 100 Gy h–1 for 2 h using a Co-60 gamma-irradiator at the Advanced Radiation Technology Institute (ARTI), Korea Atomic Energy Research Institute (KAERI). P. grandiforas reference genome. Te reference genome for P. grandiforas was obtained from the P. g randif or u s genome project1 (http:// platy codon. thera genet ex. com/). Te gene model and other functional annotations were also obtained from the respective site. RNA sequencing and variant calling. Te 60 transcriptome sequence libraries were subjected to a high- throughput Illumina NovaSeq sequencing system for RNA sequencing. RNA was extracted from individual samples using TRIzol reagent (Invitrogen, Termo Fisher Scientifc, Waltham, MA, USA). Total RNA was quan- titated using the NanoDrop spectrophotometer (Invitrogen, Termo Fisher Scientifc, Waltham, MA, USA), and quality was assessed using the RNA 6000 Nano assay kit and Bioanalyser2100 (Agilent Technologies, Santa Clara, CA, USA). Before variant calling, sequence reads were checked for bacterial contamination and adapters using Trimmo- matic v0.3620, as explained in shin et. al.,21 and mapped to the reference genome using bowtie2 22. To optimize the small insertion and deletion artifacts, the reads were re-mapped to the reference genome using the GATK v3.5 tool23 and the variants for individual samples were stored in the variant call format (VCF) fles. Te command line includes the parameters (T:HaplotypeCaller; emitRefConfdence:GVCF, variant_index_type:LINEAR, vari- ent_index_parameter:128,000; nct:20; drf:DuplicateRead). Using flters such as the normalized quality score ≥ 20 and mapping quality ≥ 40, the high-quality SNPs were obtained. Te obtained SNPs were annotated using SnpEf v4.224 and the missing genotypes were imputed by Beagle v4.125 with the linkage disequilibrium (LD) score. Population stratifcation. For population stratifcation, SNPs were examined via principal component analysis (PCA) using the PLINK v1.9 tool26. Te commands include options such as assoc, adjust, fsher, model, logistic, hap-assoc, hap-impute. To reduce false positive predictions, stringent fltering cutofs such as genotyp- ing rate ≥ 90%, mapping quality ≥ 40, minor allele frequency (MAF) > 5%, and Hardly-Weinberg equilibrium (HWE) < 0.001were applied to the scores calculated by the PLINK tool. Te sub-populations were estimated based on the number of clusters (i.e., K value), which was obtained using the STRU CTU RE v2.3.4 tool27. Te ad-hoc static ΔK (i.e., change rate of log probability between successive K values) was used to determine the uppermost hierarchical level of population structures. Structural analysis was performed following 20 replicated runs using the 100,000 iterations afer a burn-in period of 50,000 runs. Genome‑wide association. A genome-wide association study was conducted with categorical association (case vs. control). Here, there are three fower color categories tread as case vs control (example: white vs other two as controls). Te association between genotype and phenotype for the above model was conducted using PLINK-v.1.9 with the association mode (Fig. 1). Te signifcant SNPs were selected by applying the cut-of, p < 0.01. Genotyping chip construction. Te SNPs for Fluidigm chip manufacturing were selected in two steps from the color-associated SNPs obtained through GWAS. Firstly, 50% of SNPs were systematically selected
Recommended publications
  • Bacteriophage T7 DNA Polymerase – Sequenase
    Bacteriophage T7 DNA polymerase – sequenase The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Zhu, Bin. 2014. “Bacteriophage T7 DNA polymerase – sequenase.” Frontiers in Microbiology 5 (1): 181. doi:10.3389/fmicb.2014.00181. http://dx.doi.org/10.3389/fmicb.2014.00181. Published Version doi:10.3389/fmicb.2014.00181 Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:12407012 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA REVIEW ARTICLE published: 16 April 2014 doi: 10.3389/fmicb.2014.00181 BacteriophageT7 DNA polymerase – sequenase Bin Zhu* Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA Edited by: An ideal DNA polymerase for chain-terminating DNA sequencing should possess the Andrew F.Gardner, New England following features: (1) incorporate dideoxy- and other modified nucleotides at an efficiency Biolabs, USA similar to that of the cognate deoxynucleotides; (2) high processivity; (3) high fidelity in Reviewed by: the absence of proofreading/exonuclease activity; and (4) production of clear and uniform Kirk Matthew Schnorr, Novozymes A/S, Denmark signals for detection. The DNA polymerase encoded by bacteriophage T7 is naturally Samir Hamdan, King Abdullah endowed with or can be engineered to have all these characteristics. The chemically or University of Science and Technology, genetically modified enzyme (Sequenase) expedited significantly the development of DNA Saudi Arabia sequencing technology.
    [Show full text]
  • Sequencing.Pdf
    Sequencing. Biological sciences has a long and storied history of natural science – identifying, cataloging, and classifying organisms in different environments. Every aspect of biological sciences is being revolutionized by two dominant technologies: cheap and ubiquitous access to DNA sequencing, and cheap and ubiquitous access to high performance computing. Molecular biology was forever altered by the introduction of the polymerase chain reaction (PCR), and that invention, more than any other, has enabled the sequencing revolution. Taxonomy, one of the oldest endeavors of the biologist, is changing from char- acterizing organisms based on morphology to characterizing organisms based on DNA sequences. Revisionist taxonomic approaches are being introduced throughout the tree of life, and no branch on the tree can avoid the sequenc- ing machine. For example, microbial and viral taxonomy are being completely rewritten because of the analysis of DNA sequences (Rohwer & Edwards, 2002; Thompson et al., 2015). All aspects of ecology are being affected by the intro- duction of DNA sequencing: for example, in the marine environment the ecology of everything from the largest (Cammen et al., 2016) to the smallest (Chow & Suttle, 2015; Haggerty & Dinsdale, 2017) organisms is being revised because of sequence analysis. Evolutionary theories about plants and animals are being upended because of the introduction of genome skimming techniques and deep sequencing (Hollingsworth et al., 2016). Sanger Sequencing Two-time nobel Laureate Dr. Fred Sanger developed this sequencing technol- ogy in 1977. The approach uses terminators, dideoxynucleotide triphosphates (ddNTPs), that terminate DNA extension during replication and do not allow the strand to be synthesized further. Thus, every time a ddNTP is added to the growing DNA strand, synthesis stops and the fragement can not be made any longer.
    [Show full text]
  • Sanger Sequencing – a Hands-On Simulation Background And
    Sanger sequencing – a hands-on simulation Background and Guidelines for Instructor Jared Young Correspondence concerning this article should be addressed to Jared Young, Mills College 5000 MacArthur Blvd, Oakland, CA 94613 Contact: [email protected] Synopsis This hands-on simulation teaches the Sanger (dideoxy) method of DNA sequencing. In the process of carrying out the exercise, students also confront DNA synthesis, especially as it relates to chemical structure, and the stochastic nature of biological processes. The exercise is designed for an introductory undergraduate genetics course for biology majors. The exercise can be completed in around 90-minutes, which can be broken up into a 50-minute period for the simulation and a follow-up 50-minute (or less) period for discussion. This follow-up could also take place in a Teaching Assistant (TA) led section. The exercise involves interactions between student pairs and the entire class. There is an accompanying student handout with prompts that should be invoked where indicated in these instructions. Introduction Sanger sequencing is an important technique: it revolutionized the field of Genetics and is still in wide use today. Sanger sequencing is a powerful pedagogical tool well-suited for inducing multiple “aha” moments: in achieving a deep understanding of the technique, students gain a better understanding of DNA and nucleotide structure, DNA synthesis, the stochastic nature of biological processes, the utility of visible chemical modifications (in this case, fluorescent dyes), gel electrophoresis, and the connection between a physical molecule and the information it contains. Sanger sequencing is beautiful: a truly elegant method that can bring a deep sense of satisfaction when it is fully understood.
    [Show full text]
  • DNA Sequencing
    DNA Sequencing: Sanger Method (Dideoxynucleotide chain termination) Sanger sequencing is a DNA sequencing method in which target DNA is denatured and annealed to an oligonucleotide primer, which is then extended by DNA polymerase using a mixture of deoxynucleotide triphosphates (normal dNTPs) and chain-terminating dideoxynucleotide triphosphates (ddNTPs). ddNTPs lack the 3’ OH group to which the next dNTP of the growing DNA chain is added. Without the 3’ OH, no more nucleotides can be added, and DNA polymerase falls off. The resulting newly synthesized DNA chains will be a mixture of lengths, depending on how long the chain was when a ddNTP was randomly incorporated. Manual DNA sequencing example: • First, anneal the primer to the DNA template (must be single stranded): 5’ -GAATGTCCTTTCTCTAAG 3'-GGAGACTTACAGGAAAGAGATTCAGGATTCAGGAGGCCTACCATGAAGATCAAG-5' • Then split the sample into four aliquots including the following nucleotides: "G" tube: All four dNTPs, one of which is radiolabeled, plus ddGTP (low concentration) "A" tube: All four dNTPs, one of which is radiolabeled, plus ddATP "T" tube: All four dNTPs, one of which is radiolabeled, plus ddTTP "C" tube: All four dNTPs, one of which is radiolabeled, plus ddCTP • When a DNA polymerase (e.g. Klenow fragment) is added to the tubes, the synthetic reaction proceeds until, by chance, a dideoxynucleotide is incorporated instead of a deoxynucleotide. This is a "chain termination" event, because there is a 3' H instead of a 3' OH group. Since the synthesized DNA is labeled (classically with 35S-dATP), the products can be detected and distinguished from the template. Note that the higher the concentration of the ddNTP in the reaction, the shorter the products will be, hence, you will get sequence CLOSER to your primer.
    [Show full text]
  • Sanger Sequencing 14
    Molecular Biology and History of DNA Sequencing 02-223 Sept. 9 2014 History of DNA Thomas Morgan first James Watson and described Francis Crick proposed Gregor Mendel first linkage and that DNA is a double described patterns of recombination strand with a double inheritance helical structure 1866 1869 1911 1950 1953 Fredrich Edwin Chargaff Miescher first discovered that A and isolated DNA T, and G and C have equal amounts http://www.nature.com/scitable/content/dna-is-a-double-helix-24263 History of DNA Frederick Sanger Arthur Kornberg Hamilton Smith developed dideoxy replicated DNA in- discovered DNA Commercial DNA sequencing ~100 vitro using DNA restriction automated DNA bases/reaction polymerase enzymes synthesizer PCR developed ~1000 bases/ by Kary Mullis reaction 1957 1961 1970 1971 1977 1983 1986 1996 First genome sequenced using Leroy Hood Marshall in-vitro replication by Ray Wu, developed Nirenberg A.D. Kaiser, and Ellen Taylor . automated elucidated the Phage λ, ~5000 nt took over 3 sequencing codons years DNA Polymerase h"p://www.virology.ws/2009/05/10/the-error-prone-ways-of-rna-synthesis/ Even with proofreading, mistakes made every 107-109 Bases 6 Billion Bases in human genome! h"p://www.virology.ws/2009/05/10/the-error-prone-ways-of-rna-synthesis/ Molecular Biology of the Cell. 4th edition. Alberts B, Johnson A, Lewis J, et al. New York: Garland Science; 2002. PCR • Polymerase Chain ReacJon • Invented in 1983 • DNA polymerase from Thermus aqua+cus • 2.2x105 error rate Polymerase Chain Reaction (PCR) overview buffer, ssDNA primers,
    [Show full text]
  • A Beginner's Guide to Next Generation Sequencing
    A BEGINNER’S GUIDE TO NEXT GENERATION SEQUENCING youseq.com Next Generation Sequencing Let’s keep things simple. The world of Next Generation Sequencing (NGS) can seem complex and intimidating. It need not be. Let’s start by reminding ourselves what its useful for and why we use it. All of life is coded in it’s DNA. A remarkably simple code of four molecules that act as a blue print to define the proteins that we and all of the organisms we share our planet with are made of. Reading this code is one of the most astonishing achievements that the human species has ever and will ever accomplish. Reading this code helps us to understand how we are made, how we are all related, how errors or mutations in our DNA cause disease and how we may respond best to medicines. It holds the promise to revolutionise healthcare and has already begun to do so. The first human genome “read” was competed in 2001. It took 10 years and the best part of $2.7bn. It was achieved by DNA sequencing. A method by which the sequence of the DNA is read painstakingly in small fragments and then reassembled to create a complete sequence. Sanger Sequencing as it is known, was the method used to achieve the first publication of the first human genome. Next Generation Sequencing is a phrase used to describe a range of technologies that speed up and reduce the cost of DNA sequencing vs the traditional Sanger sequencing. What are all these different Next Generation Sequencing Technologies? Well there are quite a few of them.
    [Show full text]
  • Discovery and Characterization Of
    Discovery and characterization of microsatellites for the solitary bee Colletes inaequalis using Sanger and 454 pyrosequencing Margarita López-Uribe, Christine Santiago, Steve Bogdanowicz, Bryan Danforth To cite this version: Margarita López-Uribe, Christine Santiago, Steve Bogdanowicz, Bryan Danforth. Discovery and characterization of microsatellites for the solitary bee Colletes inaequalis using Sanger and 454 py- rosequencing. Apidologie, Springer Verlag, 2013, 44 (2), pp.163-172. 10.1007/s13592-012-0168-3. hal-01201284 HAL Id: hal-01201284 https://hal.archives-ouvertes.fr/hal-01201284 Submitted on 17 Sep 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Apidologie (2013) 44:163–172 Original article * INRA, DIB and Springer-Verlag France, 2012 DOI: 10.1007/s13592-012-0168-3 Discovery and characterization of microsatellites for the solitary bee Colletes inaequalis using Sanger and 454 pyrosequencing 1 1 2 Margarita M. LÓPEZ-URIBE , Christine K. SANTIAGO , Steve M. BOGDANOWICZ , 1 Bryan N. DANFORTH 1Department of Entomology, Cornell University, Ithaca, NY 14853, USA 2Evolutionary Genetics Core Facilities, Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY 14853, USA Received 5 June 2012 – Revised 8 August 2012 – Accepted 18 September 2012 Abstract – The recent implementation of next-generation sequencing for the discovery of microsatellite markers has made this technology the most effective method for generating genetic markers in non-model organisms.
    [Show full text]
  • Guide to Interpreting Genomic Reports: a Genomics Toolkit
    Guide to Interpreting Genomic Reports: A Genomics Toolkit A guide to genomic test results for non-genetics providers Created by the Practitioner Education Working Group of the Clinical Sequencing Exploratory Research (CSER) Consortium Genomic Report Toolkit Authors Kelly East, MS, CGC, Wendy Chung MD, PhD, Kate Foreman, MS, CGC, Mari Gilmore, MS, CGC, Michele Gornick, PhD, Lucia Hindorff, PhD, Tia Kauffman, MPH, Donna Messersmith , PhD, Cindy Prows, MSN, APRN, CNS, Elena Stoffel, MD, Joon-Ho Yu, MPh, PhD and Sharon Plon, MD, PhD About this resource This resource was created by a team of genomic testing experts. It is designed to help non-geneticist healthcare providers to understand genomic medicine and genome sequencing. The CSER Consortium1 is an NIH-funded group exploring genomic testing in clinical settings. Acknowledgements This work was conducted as part of the Clinical Sequencing Exploratory Research (CSER) Consortium, grants U01 HG006485, U01 HG006485, U01 HG006546, U01 HG006492, UM1 HG007301, UM1 HG007292, UM1 HG006508, U01 HG006487, U01 HG006507, R01 HG006618, and U01 HG007307. Special thanks to Alexandria Wyatt and Hugo O’Campo for graphic design and layout, Jill Pope for technical editing, and the entire CSER Practitioner Education Working Group for their time, energy, and support in developing this resource. Contents 1 Introduction and Overview ................................................................ 3 2 Diagnostic Results Related to Patient Symptoms: Pathogenic and Likely Pathogenic Variants . 8 3 Uncertain Results
    [Show full text]
  • DNA Polymerases Drive DNA Sequencing-By-Synthesis Technologies: Both Past and Present
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE REVIEW ARTICLEprovided by Frontiers - Publisher Connector published: 24 June 2014 doi: 10.3389/fmicb.2014.00305 DNA polymerases drive DNA sequencing-by-synthesis technologies: both past and present Cheng-Yao Chen* Protein Engineering Group, Illumina, San Diego, CA, USA Edited by: Next-generation sequencing (NGS) technologies have revolutionized modern biological and Andrew F.Gardner, New England biomedical research. The engines responsible for this innovation are DNA polymerases; Biolabs, USA they catalyze the biochemical reaction for deriving template sequence information. In fact, Reviewed by: DNA polymerase has been a cornerstone of DNA sequencing from the very beginning. Suleyman Yildirim, Walter Reed Army Institute of Research, USA Escherichia coli DNA polymerase I proteolytic (Klenow) fragment was originally utilized Andreas Marx, University of in Sanger’s dideoxy chain-terminating DNA sequencing chemistry. From these humble Konstanz, Germany beginnings followed an explosion of organism-specific, genome sequence information *Correspondence: accessible via public database. Family A/B DNA polymerases from mesophilic/thermophilic Cheng-Yao Chen, Protein Engineering bacteria/archaea were modified and tested in today’s standard capillary electrophoresis Group, Illumina, 5200 Illumina Way, San Diego, CA 92122, USA (CE) and NGS sequencing platforms.These enzymes were selected for their efficient incor- e-mail: [email protected] poration of bulky dye-terminator and reversible dye-terminator nucleotides respectively. Third generation, real-time single molecule sequencing platform requires slightly different enzyme properties. Enterobacterial phage φ29 DNA polymerase copies long stretches of DNA and possesses a unique capability to efficiently incorporate terminal phosphate- labeled nucleoside polyphosphates.
    [Show full text]
  • Discovery of Single-Nucleotide Polymorphisms (Snps) in the Uncharacterized Genome of the Ascomycete Ophiognomonia Clavigignenti-Juglandacearum from 454 Sequence Data
    Molecular Ecology Resources (2011) 11, 693–702 doi: 10.1111/j.1755-0998.2011.02998.x Discovery of single-nucleotide polymorphisms (SNPs) in the uncharacterized genome of the ascomycete Ophiognomonia clavigignenti-juglandacearum from 454 sequence data K.D.BRODERS,*K.E.WOESTE,†P.J.SANMIGUEL,‡ R. P. WESTERMAN‡ and G. J. BOLAND* *School of Environmental Sciences, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada N1G 2W1, †USDA Forest Service Hardwood Tree Improvement and Regeneration Center, Department of Forestry and Natural Resources, Purdue University, 715 W. State Street, West Lafayette, IN 47907, USA, ‡Department of Horticulture & Landscape Architecture, Purdue University, 625 Agriculture Mall Drive, West Lafayette, IN 47907, USA Abstract The benefits from recent improvement in sequencing technologies, such as the Roche GS FLX (454) pyrosequencing, may be even more valuable in non-model organisms, such as many plant pathogenic fungi of economic importance. One appli- cation of this new sequencing technology is the rapid generation of genomic information to identify putative single- nucleotide polymorphisms (SNPs) to be used for population genetic, evolutionary, and phylogeographic studies on non-model organisms. The focus of this research was to sequence, assemble, discover and validate SNPs in a fungal genome using 454 pyrosequencing when no reference sequence is available. Genomic DNA from eight isolates of Ophiognomonia clavigignenti-juglandacearum was pooled in one region of a four-region sequencing run on a Roche 454 GS FLX. This yielded 71 million total bases comprising 217 000 reads, 80% of which collapsed into 16 125 754 bases in 30 339 contigs upon assembly. By aligning reads from multiple isolates, we detected 298 SNPs using Roche’s GS Mapper.
    [Show full text]
  • An Introduction to Next-Generation Sequencing for Cardiology Table of Contents
    An Introduction to Next-Generation Sequencing for Cardiology www.illumina.com/technology/next-generation-sequencing.ilmn Table of Contents Part I. Welcome to Next-Generation Sequencing 3 NGS in Clinical Cardiology Research 3 Part II. What is Next-Generation Sequencing? 3 The Basic NGS Workflow 3 Multiplexing 5 Part III. NGS Applications 6 Whole-Genome Sequencing 6 Exome Sequencing 6 Targeted Sequencing 6 Part IV. Summary 6 Glossary 7 References 8 Part I. Welcome to Next-Generation Sequencing NGS in Clinical Cardiology Research In 1990, the beta cardiac myosin heavy chain gene (MYH7) was first implicated in hypertrophic cardiomyopathy (HCM). Since then, many genes have been associated with channelopathies, cardiomyopathies, and aortopathies, and investigators continue to look to the human genome for answers. Researchers can achieve 60% detection of causative variants for certain cardiac conditions (e.g., LQTS, HCM, ACM/ARVC, CPVT) and genetic testing may reveal whether family members are carriers or at high risk for sudden cardiac arrest.1 Recent advances in technology are removing many of the barriers that have traditionally made genetic testing difficult. The value of genetic testing is gaining recognition among leading organizations. For example, in 2011 the Heart Rhythm Society (HRS) and the European Heart Rhythm Association (EHRA) published guidelines on the state of genetic testing for the channelopathies and cardiomyopathies.1 The introduction of next-generation sequencing (NGS) technology has transformed the way scientists and clinical researchers think about genetic information. NGS offers several advantages over existing methods, such as PCR and Sanger sequencing. Because NGS can assess multiple genes in a single assay, it eliminates the need for multiple single-gene tests to identify the causative variant or variants.
    [Show full text]
  • DNA Metabarcoding of Neotropical Ichthyoplankton: Enabling High Accuracy with Lower Cost
    Metabarcoding and Metagenomics 3: 69–76 DOI 10.3897/mbmg.3.35060 Research Article DNA metabarcoding of Neotropical ichthyoplankton: Enabling high accuracy with lower cost André B. Nobile1, Diogo Freitas-Souza1, Francisco J. Ruiz-Ruano2,3, Maria Lígia M. O. Nobile1, Gabriela O. Costa1, Felipe P. de Lima1, Juan Pedro M. Camacho2, Fausto Foresti1, Claudio Oliveira1 1 Laboratório de Biologia e Genética de Peixes, Departamento de Morfologia, Unesp, Botucatu, Brazil. 2 Evolutionary Genetics group, Department of Genetics, University of Granada, Granada, Spain. 3 Present address: Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden. Corresponding author: André B. Nobile ([email protected]) Academic editor: Filipe Costa | Received 8 April 2019 | Accepted 29 July 2019 | Published 5 September 2019 Abstract Knowledge of ichthyoplankton dynamics is extremely important for conservation management as it can provide information about preferential spawning sites, reproductive period, migratory routes and recruitment success, which can be used to guide management and conservation efforts. However, identification of the eggs and larvae of Neotropical freshwater fish is a difficult task. DNA- bar codes have emerged as an alternative and highly accurate approach for species identification, but DNA barcoding can be time-con- suming and costly. To solve this problem, we aimed to develop a simple protocol based on DNA metabarcoding, to investigate whether it is possible to detect and quantify all species present in a pool of organisms. To do this, 230 larvae were cut in half, one half was sequenced by the Sanger technique and the other half was used to compose six arrays with a pool of larvae that were sequenced using a next-generation technique (NGS).
    [Show full text]