Next Generation Sequencing Based Forward Genetic Approaches for Identification and Mapping of Causal Mutations in Crop Plants: a Comprehensive Review
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Gene Mapping Techniques
Developmental Neurobiology, edited by Philippe Evrard and Alexandre Minkowski. Nestle Nutrition Workshop Series, Vol. 12. Nestec Ltd., Vevey/Raven Press, Ltd., New York © 1989. Gene Mapping Techniques Jean-Louis Guenet Institut Pasteur, 75724 Paris Cedex 15, France Very accurate gene mapping is essential in both man and laboratory mammals (1- 3). Several techniques have been used over the last 50 years to localize mammalian genes on the chromosomes of a given species. This chapter reviews these tech- niques, with special emphasis on the most recent ones that represent a true break- through in formal genetics. CLASSICAL GENE MAPPING TECHNIQUES AND THEIR LIMITATIONS When two genes are linked they have a tendency to cosegregate during successive generations. The closer the linkage, the more absolute is the cosegregation. This is the fundamental principle of gene mapping, which has been successfully applied to all species, including plants, over many years. In mammals such as humans and mice, the continued discovery of marker genes scattered throughout the genome has facilitated the mapping of new genes so that we now possess for these two species, particularly the mouse, linkage maps that are far more detailed than those existing for other mammals. In the mouse, special matings can be set up, with appropriate stocks, to test for possible autosomal linkage after two successive reproductive rounds: In general, cross-back crosses are used, cross-intercrosses being reserved for studies in which the viability or the fertility of the homozygous mutant under study is impaired. In humans investigations concerning linkage are based on pedigree analysis. In other words, for both species it is essential to define as a starting point a situa- tion where two genes are heterozygous and either in repulsion A + / + B or in cou- pling AB/ + +, then to look for changes in this configuration after a reproductive cycle (forms in coupling giving rise to forms in repulsion and vice versa), and fi- nally to count the percentage or frequency of these recombination events. -
Genetic Mapping and Manipulation: Chapter 2-Two-Point Mapping with Genetic Markers* §
Genetic mapping and manipulation: Chapter 2-Two-point mapping with genetic markers* § David Fay , Department of Molecular Biology, University of Wyoming, Laramie, Wyoming 82071-3944 USA Table of Contents 1. The basics ..............................................................................................................................1 2. Calculating map distances ......................................................................................................... 3 3. Other considerations ................................................................................................................. 5 4. References ..............................................................................................................................6 1. The basics The basics. Two-point mapping, wherein a mutation in the gene of interest is mapped against a marker mutation, is primarily used to assign mutations to individual chromosomes. It can also give at least a rough indication of distance between the mutation and the markers used. On the surface, the concept of two-point mapping to determine chromosomal linkage is relatively straightforward. It can, however, be the source of some confusion when it comes to processing the actual data based on phenotypic frequencies to accurately determine genetic distances. It is also worth noting that most researchers don't bother much with exhaustive two-point mapping anymore. Once we've assigned our mutation to a linkage group, it's generally off to the races with three-point and SNP mapping -
File Formats Exercises
Introduction to high-throughput sequencing file formats – advanced exercises Daniel Vodák (Bioinformatics Core Facility, The Norwegian Radium Hospital) ( [email protected] ) All example files are located in directory “/data/file_formats/data”. You can set your current directory to that path (“cd /data/file_formats/data”) Exercises: a) FASTA format File “Uniprot_Mammalia.fasta” contains a collection of mammalian protein sequences obtained from the UniProt database ( http://www.uniprot.org/uniprot ). 1) Count the number of lines in the file. 2) Count the number of header lines (i.e. the number of sequences) in the file. 3) Count the number of header lines with “Homo” in the organism/species name. 4) Count the number of header lines with “Homo sapiens” in the organism/species name. 5) Display the header lines which have “Homo” in the organism/species names, but only such that do not have “Homo sapiens” there. b) FASTQ format File NKTCL_31T_1M_R1.fq contains a reduced collection of tumor read sequences obtained from The Sequence Read Archive ( http://www.ncbi.nlm.nih.gov/sra ). 1) Count the number of lines in the file. 2) Count the number of lines beginning with the “at” symbol (“@”). 3) How many reads are there in the file? c) SAM format File NKTCL_31T_1M.sam contains alignments of pair-end reads stored in files NKTCL_31T_1M_R1.fq and NKTCL_31T_1M_R2.fq. 1) Which program was used for the alignment? 2) How many header lines are there in the file? 3) How many non-header lines are there in the file? 4) What do the non-header lines represent? 5) Reads from how many template sequences have been used in the alignment process? c) BED format File NKTCL_31T_1M.bed holds information about alignment locations stored in file NKTCL_31T_1M.sam. -
EMBL-EBI Powerpoint Presentation
Processing data from high-throughput sequencing experiments Simon Anders Use-cases for HTS · de-novo sequencing and assembly of small genomes · transcriptome analysis (RNA-Seq, sRNA-Seq, ...) • identifying transcripted regions • expression profiling · Resequencing to find genetic polymorphisms: • SNPs, micro-indels • CNVs · ChIP-Seq, nucleosome positions, etc. · DNA methylation studies (after bisulfite treatment) · environmental sampling (metagenomics) · reading bar codes Use cases for HTS: Bioinformatics challenges Established procedures may not be suitable. New algorithms are required for · assembly · alignment · statistical tests (counting statistics) · visualization · segmentation · ... Where does Bioconductor come in? Several steps: · Processing of the images and determining of the read sequencest • typically done by core facility with software from the manufacturer of the sequencing machine · Aligning the reads to a reference genome (or assembling the reads into a new genome) • Done with community-developed stand-alone tools. · Downstream statistical analyis. • Write your own scripts with the help of Bioconductor infrastructure. Solexa standard workflow SolexaPipeline · "Firecrest": Identifying clusters ⇨ typically 15..20 mio good clusters per lane · "Bustard": Base calling ⇨ sequence for each cluster, with Phred-like scores · "Eland": Aligning to reference Firecrest output Large tab-separated text files with one row per identified cluster, specifying · lane index and tile index · x and y coordinates of cluster on tile · for each -
Sequence Alignment/Map Format Specification
Sequence Alignment/Map Format Specification The SAM/BAM Format Specification Working Group 3 Jun 2021 The master version of this document can be found at https://github.com/samtools/hts-specs. This printing is version 53752fa from that repository, last modified on the date shown above. 1 The SAM Format Specification SAM stands for Sequence Alignment/Map format. It is a TAB-delimited text format consisting of a header section, which is optional, and an alignment section. If present, the header must be prior to the alignments. Header lines start with `@', while alignment lines do not. Each alignment line has 11 mandatory fields for essential alignment information such as mapping position, and variable number of optional fields for flexible or aligner specific information. This specification is for version 1.6 of the SAM and BAM formats. Each SAM and BAMfilemay optionally specify the version being used via the @HD VN tag. For full version history see Appendix B. Unless explicitly specified elsewhere, all fields are encoded using 7-bit US-ASCII 1 in using the POSIX / C locale. Regular expressions listed use the POSIX / IEEE Std 1003.1 extended syntax. 1.1 An example Suppose we have the following alignment with bases in lowercase clipped from the alignment. Read r001/1 and r001/2 constitute a read pair; r003 is a chimeric read; r004 represents a split alignment. Coor 12345678901234 5678901234567890123456789012345 ref AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT +r001/1 TTAGATAAAGGATA*CTG +r002 aaaAGATAA*GGATA +r003 gcctaAGCTAA +r004 ATAGCT..............TCAGC -r003 ttagctTAGGC -r001/2 CAGCGGCAT The corresponding SAM format is:2 1Charset ANSI X3.4-1968 as defined in RFC1345. -
HUMAN GENE MAPPING WORKSHOPS C.1973–C.1991
HUMAN GENE MAPPING WORKSHOPS c.1973–c.1991 The transcript of a Witness Seminar held by the History of Modern Biomedicine Research Group, Queen Mary University of London, on 25 March 2014 Edited by E M Jones and E M Tansey Volume 54 2015 ©The Trustee of the Wellcome Trust, London, 2015 First published by Queen Mary University of London, 2015 The History of Modern Biomedicine Research Group is funded by the Wellcome Trust, which is a registered charity, no. 210183. ISBN 978 1 91019 5031 All volumes are freely available online at www.histmodbiomed.org Please cite as: Jones E M, Tansey E M. (eds) (2015) Human Gene Mapping Workshops c.1973–c.1991. Wellcome Witnesses to Contemporary Medicine, vol. 54. London: Queen Mary University of London. CONTENTS What is a Witness Seminar? v Acknowledgements E M Tansey and E M Jones vii Illustrations and credits ix Abbreviations and ancillary guides xi Introduction Professor Peter Goodfellow xiii Transcript Edited by E M Jones and E M Tansey 1 Appendix 1 Photographs of participants at HGM1, Yale; ‘New Haven Conference 1973: First International Workshop on Human Gene Mapping’ 90 Appendix 2 Photograph of (EMBO) workshop on ‘Cell Hybridization and Somatic Cell Genetics’, 1973 96 Biographical notes 99 References 109 Index 129 Witness Seminars: Meetings and publications 141 WHAT IS A WITNESS SEMINAR? The Witness Seminar is a specialized form of oral history, where several individuals associated with a particular set of circumstances or events are invited to meet together to discuss, debate, and agree or disagree about their memories. The meeting is recorded, transcribed, and edited for publication. -
Introduction to Bioinformatics (Elective) – SBB1609
SCHOOL OF BIO AND CHEMICAL ENGINEERING DEPARTMENT OF BIOTECHNOLOGY Unit 1 – Introduction to Bioinformatics (Elective) – SBB1609 1 I HISTORY OF BIOINFORMATICS Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biologicaldata. As an interdisciplinary field of science, bioinformatics combines computer science, statistics, mathematics, and engineering to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques. Bioinformatics derives knowledge from computer analysis of biological data. These can consist of the information stored in the genetic code, but also experimental results from various sources, patient statistics, and scientific literature. Research in bioinformatics includes method development for storage, retrieval, and analysis of the data. Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine. Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems. "Classical" bioinformatics: "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information.” The National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as: "Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. -
Mapping Genomes
472 Chapter 17 | Biotechnology and Genomics the bacterium Agrobacterium tumefaciens occur by DNA transfer from the bacterium to the plant. Although the tumors do not kill the plants, they stunt the plants and they become more susceptible to harsh environmental conditions. A. tumefaciens affects many plants such as walnuts, grapes, nut trees, and beets. Artificially introducing DNA into plant cells is more challenging than in animal cells because of the thick plant cell wall. Researchers used the natural transfer of DNA from Agrobacterium to a plant host to introduce DNA fragments of their choice into plant hosts. In nature, the disease-causing A. tumefaciens have a set of plasmids, Ti plasmids (tumor-inducing plasmids), that contain genes to produce tumors in plants. DNA from the Ti plasmid integrates into the infected plant cell’s genome. Researchers manipulate the Ti plasmids to remove the tumor-causing genes and insert the desired DNA fragment for transfer into the plant genome. The Ti plasmids carry antibiotic resistance genes to aid selection and researchers can propagate them in E. coli cells as well. The Organic Insecticide Bacillus thuringiensis Bacillus thuringiensis (Bt) is a bacterium that produces protein crystals during sporulation that are toxic to many insect species that affect plants. Insects need to ingest Bt toxin in order to activate the toxin. Insects that have eaten Bt toxin stop feeding on the plants within a few hours. After the toxin activates in the insects' intestines, they die within a couple of days. Modern biotechnology has allowed plants to encode their own crystal Bt toxin that acts against insects. -
Physical Mapping Technologies for the Identification and Characterization of Mutated Genes Contributing to Crop Quality
to Crop Quality for the Identification for the Identification and Characterization of Mutated Genes Contributing Physical Mapping Technologies Physical Mapping Technologies IAEA-TECDOC-1664 spine: 7,75 mm - 120 pages IAEA-TECDOC-1664 n PHYSICAL MAPPING TECHNOLOGIES FOR THE IDENTIFICATION AND CHARACTERIZATION OF MUTATED GENES CONTRIBUTING TO CROP QUALITY VIENNA ISSN 1011–4289 ISBN 978–92–0–119610–1 INTERNATIONAL ATOMIC AGENCY ENERGY ATOMIC INTERNATIONAL Physical Mapping Technologies for the Identification and Characterization of Mutated Genes Contributing to Crop Quality The following States are Members of the International Atomic Energy Agency: AFGHANISTAN GHANA NORWAY ALBANIA GREECE OMAN ALGERIA GUATEMALA PAKISTAN ANGOLA HAITI PALAU ARGENTINA HOLY SEE PANAMA ARMENIA HONDURAS PARAGUAY AUSTRALIA HUNGARY PERU AUSTRIA ICELAND PHILIPPINES AZERBAIJAN INDIA POLAND BAHRAIN INDONESIA PORTUGAL BANGLADESH IRAN, ISLAMIC REPUBLIC OF QATAR BELARUS IRAQ REPUBLIC OF MOLDOVA BELGIUM IRELAND ROMANIA BELIZE ISRAEL RUSSIAN FEDERATION BENIN ITALY SAUDI ARABIA BOLIVIA JAMAICA BOSNIA AND HERZEGOVINA JAPAN SENEGAL BOTSWANA JORDAN SERBIA BRAZIL KAZAKHSTAN SEYCHELLES BULGARIA KENYA SIERRA LEONE BURKINA FASO KOREA, REPUBLIC OF SINGAPORE BURUNDI KUWAIT SLOVAKIA CAMBODIA KYRGYZSTAN SLOVENIA CAMEROON LATVIA SOUTH AFRICA CANADA LEBANON SPAIN CENTRAL AFRICAN LESOTHO SRI LANKA REPUBLIC LIBERIA SUDAN CHAD LIBYAN ARAB JAMAHIRIYA SWEDEN CHILE LIECHTENSTEIN SWITZERLAND CHINA LITHUANIA SYRIAN ARAB REPUBLIC COLOMBIA LUXEMBOURG TAJIKISTAN CONGO MADAGASCAR THAILAND COSTA RICA -
Linkage & Genetic Mapping in Eukaryotes
LinLinkkaaggee && GGeenneetticic MMaappppiningg inin EEuukkaarryyootteess CChh.. 66 1 LLIINNKKAAGGEE AANNDD CCRROOSSSSIINNGG OOVVEERR ! IInn eeuukkaarryyoottiicc ssppeecciieess,, eeaacchh lliinneeaarr cchhrroommoossoommee ccoonnttaaiinnss aa lloonngg ppiieeccee ooff DDNNAA – A typical chromosome contains many hundred or even a few thousand different genes ! TThhee tteerrmm lliinnkkaaggee hhaass ttwwoo rreellaatteedd mmeeaanniinnggss – 1. Two or more genes can be located on the same chromosome – 2. Genes that are close together tend to be transmitted as a unit Copyright ©The McGraw-Hill Companies, Inc. Permission required for reproduction or display 2 LinkageLinkage GroupsGroups ! Chromosomes are called linkage groups – They contain a group of genes that are linked together ! The number of linkage groups is the number of types of chromosomes of the species – For example, in humans " 22 autosomal linkage groups " An X chromosome linkage group " A Y chromosome linkage group ! Genes that are far apart on the same chromosome can independently assort from each other – This is due to crossing-over or recombination Copyright ©The McGraw-Hill Companies, Inc. Permission required for reproduction or display 3 LLiinnkkaaggee aanndd RRecombinationecombination Genes nearby on the same chromosome tend to stay together during the formation of gametes; this is linkage. The breakage of the chromosome, the separation of the genes, and the exchange of genes between chromatids is known as recombination. (we call it crossing over) 4 IndependentIndependent assortment:assortment: -
Bioinformatics: a Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D
BIOINFORMATICS A Practical Guide to the Analysis of Genes and Proteins SECOND EDITION Andreas D. Baxevanis Genome Technology Branch National Human Genome Research Institute National Institutes of Health Bethesda, Maryland USA B. F. Francis Ouellette Centre for Molecular Medicine and Therapeutics Children’s and Women’s Health Centre of British Columbia University of British Columbia Vancouver, British Columbia Canada A JOHN WILEY & SONS, INC., PUBLICATION New York • Chichester • Weinheim • Brisbane • Singapore • Toronto BIOINFORMATICS SECOND EDITION METHODS OF BIOCHEMICAL ANALYSIS Volume 43 BIOINFORMATICS A Practical Guide to the Analysis of Genes and Proteins SECOND EDITION Andreas D. Baxevanis Genome Technology Branch National Human Genome Research Institute National Institutes of Health Bethesda, Maryland USA B. F. Francis Ouellette Centre for Molecular Medicine and Therapeutics Children’s and Women’s Health Centre of British Columbia University of British Columbia Vancouver, British Columbia Canada A JOHN WILEY & SONS, INC., PUBLICATION New York • Chichester • Weinheim • Brisbane • Singapore • Toronto Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration. Copyright ᭧ 2001 by John Wiley & Sons, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic or mechanical, including uploading, downloading, printing, decompiling, recording or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. -
NGS Raw Data Analysis
NGS raw data analysis Introduc3on to Next-Generaon Sequencing (NGS)-data Analysis Laura Riva & Maa Pelizzola Outline 1. Descrip7on of sequencing plaorms 2. Alignments, QC, file formats, data processing, and tools of general use 3. Prac7cal lesson Illumina Sequencing Single base extension with incorporaon of fluorescently labeled nucleodes Library preparaon Automated cluster genera3on DNA fragmenta3on and adapter ligaon Aachment to the flow cell Cluster generaon by bridge amplifica3on of DNA fragments Sequencing Illumina Output Quality Scores Sequence Files Comparison of some exisng plaGorms 454 Ti RocheT Illumina HiSeqTM ABI 5500 (SOLiD) 2000 Amplificaon Emulsion PCR Bridge PCR Emulsion PCR Sequencing Pyrosequencing Reversible Ligaon-based reac7on terminators sequencing Paired ends/sep Yes/3kb Yes/200 bp Yes/3 kb Read length 400 bp 100 bp 75 bp Advantages Short run 7mes. The most popular Good base call Longer reads plaorm accuracy. Good improve mapping in mulplexing repe7ve regions. capability Ability to detect large structural variaons Disadvantages High reagent cost. Higher error rates in repeat sequences The Illumina HiSeq! Stacey Gabriel Libraries, lanes, and flowcells Flowcell Lanes Illumina Each reaction produces a unique Each NGS machine processes a library of DNA fragments for single flowcell containing several sequencing. independent lanes during a single sequencing run Applications of Next-Generation Sequencing Basic data analysis concepts: • Raw data analysis= image processing and base calling • Understanding Fastq • Understanding Quality scores • Quality control and read cleaning • Alignment to the reference genome • Understanding SAM/BAM formats • Samtools • Bedtools Understanding FASTQ file format FASTQ format is a text-based format for storing a biological sequence and its corresponding quality scores.