Contents Nucleotide Sequence Databases
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
T-Coffee Documentation Release Version 13.45.47.Aba98c5
T-Coffee Documentation Release Version_13.45.47.aba98c5 Cedric Notredame Aug 31, 2021 Contents 1 T-Coffee Installation 3 1.1 Installation................................................3 1.1.1 Unix/Linux Binaries......................................4 1.1.2 MacOS Binaries - Updated...................................4 1.1.3 Installation From Source/Binaries downloader (Mac OSX/Linux)...............4 1.2 Template based modes: PSI/TM-Coffee and Expresso.........................5 1.2.1 Why do I need BLAST with T-Coffee?.............................6 1.2.2 Using a BLAST local version on Unix.............................6 1.2.3 Using the EBI BLAST client..................................6 1.2.4 Using the NCBI BLAST client.................................7 1.2.5 Using another client.......................................7 1.3 Troubleshooting.............................................7 1.3.1 Third party packages......................................7 1.3.2 M-Coffee parameters......................................9 1.3.3 Structural modes (using PDB)................................. 10 1.3.4 R-Coffee associated packages................................. 10 2 Quick Start Regressive Algorithm 11 2.1 Introduction............................................... 11 2.2 Installation from source......................................... 12 2.3 Examples................................................. 12 2.3.1 Fast and accurate........................................ 12 2.3.2 Slower and more accurate.................................... 12 2.3.3 Very Fast........................................... -
Mouse Kcnip2 Conditional Knockout Project (CRISPR/Cas9)
https://www.alphaknockout.com Mouse Kcnip2 Conditional Knockout Project (CRISPR/Cas9) Objective: To create a Kcnip2 conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering. Strategy summary: The Kcnip2 gene (NCBI Reference Sequence: NM_145703 ; Ensembl: ENSMUSG00000025221 ) is located on Mouse chromosome 19. 10 exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 10 (Transcript: ENSMUST00000162528). Exon 4 will be selected as conditional knockout region (cKO region). Deletion of this region should result in the loss of function of the Mouse Kcnip2 gene. To engineer the targeting vector, homologous arms and cKO region will be generated by PCR using BAC clone RP23-98F2 as template. Cas9, gRNA and targeting vector will be co-injected into fertilized eggs for cKO Mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for disruptions in this gene are susceptible to induced cardiac arrhythmias but are otherwise normal. Exon 4 starts from about 27.65% of the coding region. The knockout of Exon 4 will result in frameshift of the gene. The size of intron 3 for 5'-loxP site insertion: 574 bp, and the size of intron 4 for 3'-loxP site insertion: 532 bp. The size of effective cKO region: ~625 bp. The cKO region does not have any other known gene. Page 1 of 8 https://www.alphaknockout.com Overview of the Targeting Strategy Wildtype allele gRNA region 5' gRNA region 3' 1 2 3 4 5 6 7 8 9 10 Targeting vector Targeted allele Constitutive KO allele (After Cre recombination) Legends Exon of mouse Kcnip2 Homology arm cKO region loxP site Page 2 of 8 https://www.alphaknockout.com Overview of the Dot Plot Window size: 10 bp Forward Reverse Complement Sequence 12 Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. -
How to Generate a Publication-Quality Multiple Sequence Alignment (Thomas Weimbs, University of California Santa Barbara, 11/2012)
Tutorial: How to generate a publication-quality multiple sequence alignment (Thomas Weimbs, University of California Santa Barbara, 11/2012) 1) Get your sequences in FASTA format: • Go to the NCBI website; find your sequences and display them in FASTA format. Each sequence should look like this (http://www.ncbi.nlm.nih.gov/protein/6678177?report=fasta): >gi|6678177|ref|NP_033320.1| syntaxin-4 [Mus musculus] MRDRTHELRQGDNISDDEDEVRVALVVHSGAARLGSPDDEFFQKVQTIRQTMAKLESKVRELEKQQVTIL ATPLPEESMKQGLQNLREEIKQLGREVRAQLKAIEPQKEEADENYNSVNTRMKKTQHGVLSQQFVELINK CNSMQSEYREKNVERIRRQLKITNAGMVSDEELEQMLDSGQSEVFVSNILKDTQVTRQALNEISARHSEI QQLERSIRELHEIFTFLATEVEMQGEMINRIEKNILSSADYVERGQEHVKIALENQKKARKKKVMIAICV SVTVLILAVIIGITITVG 2) In a text editor, paste all your sequences together (in the order that you would like them to appear in the end). It should look like this: >gi|6678177|ref|NP_033320.1| syntaxin-4 [Mus musculus] MRDRTHELRQGDNISDDEDEVRVALVVHSGAARLGSPDDEFFQKVQTIRQTMAKLESKVRELEKQQVTIL ATPLPEESMKQGLQNLREEIKQLGREVRAQLKAIEPQKEEADENYNSVNTRMKKTQHGVLSQQFVELINK CNSMQSEYREKNVERIRRQLKITNAGMVSDEELEQMLDSGQSEVFVSNILKDTQVTRQALNEISARHSEI QQLERSIRELHEIFTFLATEVEMQGEMINRIEKNILSSADYVERGQEHVKIALENQKKARKKKVMIAICV SVTVLILAVIIGITITVG >gi|151554658|gb|AAI47965.1| STX3 protein [Bos taurus] MKDRLEQLKAKQLTQDDDTDEVEIAVDNTAFMDEFFSEIEETRVNIDKISEHVEEAKRLYSVILSAPIPE PKTKDDLEQLTTEIKKRANNVRNKLKSMERHIEEDEVQSSADLRIRKSQHSVLSRKFVEVMTKYNEAQVD FRERSKGRIQRQLEITGKKTTDEELEEMLESGNPAIFTSGIIDSQISKQALSEIEGRHKDIVRLESSIKE LHDMFMDIAMLVENQGEMLDNIELNVMHTVDHVEKAREETKRAVKYQGQARKKLVIIIVIVVVLLGILAL IIGLSVGLK -
Bioinformatics Study of Lectins: New Classification and Prediction In
Bioinformatics study of lectins : new classification and prediction in genomes François Bonnardel To cite this version: François Bonnardel. Bioinformatics study of lectins : new classification and prediction in genomes. Structural Biology [q-bio.BM]. Université Grenoble Alpes [2020-..]; Université de Genève, 2021. En- glish. NNT : 2021GRALV010. tel-03331649 HAL Id: tel-03331649 https://tel.archives-ouvertes.fr/tel-03331649 Submitted on 2 Sep 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THÈSE Pour obtenir le grade de DOCTEUR DE L’UNIVERSITE GRENOBLE ALPES préparée dans le cadre d’une cotutelle entre la Communauté Université Grenoble Alpes et l’Université de Genève Spécialités: Chimie Biologie Arrêté ministériel : le 6 janvier 2005 – 25 mai 2016 Présentée par François Bonnardel Thèse dirigée par la Dr. Anne Imberty codirigée par la Dr/Prof. Frédérique Lisacek préparée au sein du laboratoire CERMAV, CNRS et du Computer Science Department, UNIGE et de l’équipe PIG, SIB Dans les Écoles Doctorales EDCSV et UNIGE Etude bioinformatique des lectines: nouvelle classification et prédiction dans les génomes Thèse soutenue publiquement le 8 Février 2021, devant le jury composé de : Dr. Alexandre de Brevern UMR S1134, Inserm, Université Paris Diderot, Paris, France, Rapporteur Dr. -
Sequence Alignment/Map) Is a Text Format for Storing Sequence Alignment Data in a Series of Tab Delimited ASCII Columns
NGS FILE FORMATS SEQUENCE FILE FORMATS FASTA FORMAT FASTA Single sequence example: >HWI-ST398_0092:1:1:5372:2486#0/1 TTTTTCGTTCTTTTCATGTACCGCTTTTTGTTCGGTTAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGAT ACGTAGCAGCAGCATCAGTACGACTACGACGACTAGCACATGCGACGATCGATGCTAGCTGACTATCGATG Multiple sequence example: >Sequence Name 1 TTTTTCGTTCTTTTCATGTACCGCTTTTTGTTCGGTTAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGAT ACGTAGCAGCAGCATCAGTACGACTACGACGACTAGCACATGCGACGATCGATGCTAGCTGACTATCGATG >Sequence Name 2 ACGTAGACACGACTAGCATCAGCTACGCATCGATCAGCATCGACTAGCATCACACATCGATCAGCATCACGACTAGCAT AGCATCGACTACACTACGACTACGATCCACGTACGACTAGCATGCTAGCGCTAGCTAGCTAGCTAGTCGATCGATGAGT AGCTAGCTAGCTAGC >Sequence Name 3 ACTCAGCATGCATCAGCATCGACTACGACTACGACATCGACTAGCATCAGCAT SEQUENCE FILE FORMATS FASTQ FORMAT FASTQ Text based format for storing sequence data and corresponding quality scores for each base. To enable a one-one correspondence between the base sequence and the quality score the score is stored as a single one letter/number code using an offset of the standard ASCII code. Quality scores range from 0 to 40 and represent a log10 score for the probability of being wrong. E.g. score of 30 => 1:1000 chance of error SEQUENCE FILE FORMATS FASTQ FORMAT FASTQ Each fastq file contain multiple entries and each entry consists of 4 lines: 1. header line beginning with “@“ and sequence name 2. sequence line 3. header line beginning with “+” which can have the name but rarely does 4. quality score line SEQUENCE FILE FORMATS FASTQ FORMAT FASTQ @HWI-ST398_0092:6:73:5372:2486#0/1 TTTTTCGTTCTTTTCATGTACCGCTTTTTGTTCGGTTAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGAT -
Maternally Inherited Pirnas Direct Transient Heterochromatin Formation
RESEARCH ARTICLE Maternally inherited piRNAs direct transient heterochromatin formation at active transposons during early Drosophila embryogenesis Martin H Fabry, Federica A Falconio, Fadwa Joud, Emily K Lythgoe, Benjamin Czech*, Gregory J Hannon* CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom Abstract The PIWI-interacting RNA (piRNA) pathway controls transposon expression in animal germ cells, thereby ensuring genome stability over generations. In Drosophila, piRNAs are intergenerationally inherited through the maternal lineage, and this has demonstrated importance in the specification of piRNA source loci and in silencing of I- and P-elements in the germ cells of daughters. Maternally inherited Piwi protein enters somatic nuclei in early embryos prior to zygotic genome activation and persists therein for roughly half of the time required to complete embryonic development. To investigate the role of the piRNA pathway in the embryonic soma, we created a conditionally unstable Piwi protein. This enabled maternally deposited Piwi to be cleared from newly laid embryos within 30 min and well ahead of the activation of zygotic transcription. Examination of RNA and protein profiles over time, and correlation with patterns of H3K9me3 deposition, suggests a role for maternally deposited Piwi in attenuating zygotic transposon expression in somatic cells of the developing embryo. In particular, robust deposition of piRNAs targeting roo, an element whose expression is mainly restricted to embryonic development, results in the deposition of transient heterochromatic marks at active roo insertions. We hypothesize that *For correspondence: roo, an extremely successful mobile element, may have adopted a lifestyle of expression in the [email protected] embryonic soma to evade silencing in germ cells. -
PIR Brochure
Protein Information Resource Integrated Protein Informatics Resource for Genomic & Proteomic Research For four decades the Protein Information Resource (PIR) has provided databases and protein sequence analysis tools to the scientific community, including the Protein Sequence Database, which grew out from the Atlas of Protein Sequence and Structure, edited by Margaret Dayhoff [1965-1978]. Currently, PIR major activities include: i) UniProt (Universal Protein Resource) development, ii) iProClass protein data integration and ID mapping, iii) PIRSF protein pir.georgetown.edu classification, and iv) iProLINK protein literature mining and ontology development. UniProt – Universal Protein Resource What is UniProt? UniProt is the central resource for storing and UniProt (Universal Protein Resource) http://www.uniprot.org interconnecting information from large and = + + disparate sources and the most UniProt: the world's most comprehensive catalog of information on proteins comprehensive catalog of protein sequence and functional annotation. UniProt Knowledgebase UniProt Reference UniProt Archive (UniProtKB) Clusters (UniRef) (UniParc) When to use UniProt databases Integration of Swiss-Prot, TrEMBL Non-redundant reference A stable, and PIR-PSD sequences clustered from comprehensive Use UniProtKB to retrieve curated, reliable, Fully classified, richly and accurately UniProtKB and UniParc for archive of all publicly annotated protein sequences with comprehensive or fast available protein comprehensive information on proteins. minimal redundancy and extensive sequence searches at 100%, sequences for Use UniRef to decrease redundancy and cross-references 90%, or 50% identity sequence tracking from: speed up sequence similarity searches. TrEMBL section UniRef100 Swiss-Prot, Computer-annotated protein sequences TrEMBL, PIR-PSD, Use UniParc to access to archived sequences EMBL, Ensembl, IPI, and their source databases. -
A Semantic Standard for Describing the Location of Nucleotide and Protein Feature Annotation Jerven T
Bolleman et al. Journal of Biomedical Semantics (2016) 7:39 DOI 10.1186/s13326-016-0067-z RESEARCH Open Access FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation Jerven T. Bolleman1*, Christopher J. Mungall2, Francesco Strozzi3, Joachim Baran4, Michel Dumontier5, Raoul J. P. Bonnal6, Robert Buels7, Robert Hoehndorf8, Takatomo Fujisawa9, Toshiaki Katayama10 and Peter J. A. Cock11 Abstract Background: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Conclusions: Our ontology allows -
Loss of BCL-3 Sensitises Colorectal Cancer Cells to DNA Damage, Revealing A
bioRxiv preprint doi: https://doi.org/10.1101/2021.08.03.454995; this version posted August 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Title: Loss of BCL-3 sensitises colorectal cancer cells to DNA damage, revealing a role for BCL-3 in double strand break repair by homologous recombination Authors: Christopher Parker*1, Adam C Chambers*1, Dustin Flanagan2, Tracey J Collard1, Greg Ngo3, Duncan M Baird3, Penny Timms1, Rhys G Morgan4, Owen Sansom2 and Ann C Williams1. *Joint first authors. Author affiliations: 1. Colorectal Tumour Biology Group, School of Cellular and Molecular Medicine, Faculty of Life Sciences, Biomedical Sciences Building, University Walk, University of Bristol, Bristol, BS8 1TD, UK 2. Cancer Research UK Beatson Institute, Garscube Estate, Switchback Road, Bearsden Glasgow, G61 1BD UK 3. Division of Cancer and Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN UK 4. School of Life Sciences, University of Sussex, Sussex House, Falmer, Brighton, BN1 9RH UK 1 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.03.454995; this version posted August 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Abstract (250 words) Objective: The proto-oncogene BCL-3 is upregulated in a subset of colorectal cancers (CRC) and increased expression of the gene correlates with poor patient prognosis. The aim is to investigate whether inhibiting BCL-3 can increase the response to DNA damage in CRC. -
BIO4342 Exercise 2: Browser-Based Annotation and RNA-Seq Data
BIO4342 Exercise 2: Browser-Based Annotation and RNA-Seq Data Jeremy Buhler March 15, 2010 This exercise continues your introduction to practical issues in comparative annotation. You’ll be annotating genomic sequence from the dot chromosome of Drosophila mojavensis using your knowledge of BLAST and some improved visualization tools. You’ll also consider how best to integrate information from high-throughput sequencing of expressed RNA. 1 Getting Started To begin, go to our local genome browser at http://gander.wustl.edu/. Select “Genome Browser” from the left-side menu and choose the “Improved Dot” assembly of D. mojavensis for viewing. Finally, hit submit to start looking at the sequence. The entire dot assembly is about 1.69 megabases in length; zoom out to see everything. This assembly is built from a set of overlapping fosmid clones prepared for the 2009 edition of BIO 4342. We’ve added a variety of information to the genome browser to help you annotate, such as: • gene-structure predictions from several different tools; • repeats annotated using the RepeatMasker program; • BLAST hits to D. melanogaster proteins; • RNA-Seq data, which we’ll describe in more detail later. Having all this evidence available at once is somewhat overwhelming. To keep the view to a manageable level, I’d suggest that you initially set all the gene prediction tracks (Genscan, Nscan, SNAP, Geneid), as well as the repeat tracks, to “dense” mode, so that each displays on a single line. Set the BLAST hit track (called “D. mel proteins”) to “pack” to see the locations of all BLAST hits, and set the “RNA-Seq Coverage” track to “full” and the “TopHat junctions” track to “pack” to get a detailed view of these results. -
Integrative Annotation of 21,037 Human Genes Validated by Full-Length Cdna Clones
PLoS BIOLOGY Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones Tadashi Imanishi1, Takeshi Itoh1,2, Yutaka Suzuki3,68, Claire O’Donovan4, Satoshi Fukuchi5, Kanako O. Koyanagi6, Roberto A. Barrero5, Takuro Tamura7,8, Yumi Yamaguchi-Kabata1, Motohiko Tanino1,7, Kei Yura9, Satoru Miyazaki5, Kazuho Ikeo5, Keiichi Homma5, Arek Kasprzyk4, Tetsuo Nishikawa10,11, Mika Hirakawa12, Jean Thierry-Mieg13,14, Danielle Thierry-Mieg13,14, Jennifer Ashurst15, Libin Jia16, Mitsuteru Nakao3, Michael A. Thomas17, Nicola Mulder4, Youla Karavidopoulou4, Lihua Jin5, Sangsoo Kim18, Tomohiro Yasuda11, Boris Lenhard19, Eric Eveno20,21, Yoshiyuki Suzuki5, Chisato Yamasaki1, Jun-ichi Takeda1, Craig Gough1,7, Phillip Hilton1,7, Yasuyuki Fujii1,7, Hiroaki Sakai1,7,22, Susumu Tanaka1,7, Clara Amid23, Matthew Bellgard24, Maria de Fatima Bonaldo25, Hidemasa Bono26, Susan K. Bromberg27, Anthony J. Brookes19, Elspeth Bruford28, Piero Carninci29, Claude Chelala20, Christine Couillault20,21, Sandro J. de Souza30, Marie-Anne Debily20, Marie-Dominique Devignes31, Inna Dubchak32, Toshinori Endo33, Anne Estreicher34, Eduardo Eyras15, Kaoru Fukami-Kobayashi35, Gopal R. Gopinath36, Esther Graudens20,21, Yoonsoo Hahn18, Michael Han23, Ze-Guang Han21,37, Kousuke Hanada5, Hideki Hanaoka1, Erimi Harada1,7, Katsuyuki Hashimoto38, Ursula Hinz34, Momoki Hirai39, Teruyoshi Hishiki40, Ian Hopkinson41,42, Sandrine Imbeaud20,21, Hidetoshi Inoko1,7,43, Alexander Kanapin4, Yayoi Kaneko1,7, Takeya Kasukawa26, Janet Kelso44, Paul Kersey4, Reiko Kikuno45, Kouichi -
A SARS-Cov-2 Sequence Submission Tool for the European Nucleotide
Databases and ontologies Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab421/6294398 by guest on 25 June 2021 A SARS-CoV-2 sequence submission tool for the European Nucleotide Archive Miguel Roncoroni 1,2,∗, Bert Droesbeke 1,2, Ignacio Eguinoa 1,2, Kim De Ruyck 1,2, Flora D’Anna 1,2, Dilmurat Yusuf 3, Björn Grüning 3, Rolf Backofen 3 and Frederik Coppens 1,2 1Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium, 1VIB Center for Plant Systems Biology, 9052 Ghent, Belgium and 2University of Freiburg, Department of Computer Science, Freiburg im Breisgau, Baden-Württemberg, Germany ∗To whom correspondence should be addressed. Associate Editor: XXXXXXX Received on XXXXX; revised on XXXXX; accepted on XXXXX Abstract Summary: Many aspects of the global response to the COVID-19 pandemic are enabled by the fast and open publication of SARS-CoV-2 genetic sequence data. The European Nucleotide Archive (ENA) is the European recommended open repository for genetic sequences. In this work, we present a tool for submitting raw sequencing reads of SARS-CoV-2 to ENA. The tool features a single-step submission process, a graphical user interface, tabular-formatted metadata and the possibility to remove human reads prior to submission. A Galaxy wrap of the tool allows users with little or no bioinformatic knowledge to do bulk sequencing read submissions. The tool is also packed in a Docker container to ease deployment. Availability: CLI ENA upload tool is available at github.com/usegalaxy- eu/ena-upload-cli (DOI 10.5281/zenodo.4537621); Galaxy ENA upload tool at toolshed.g2.bx.psu.edu/view/iuc/ena_upload/382518f24d6d and https://github.com/galaxyproject/tools- iuc/tree/master/tools/ena_upload (development) and; ENA upload Galaxy container at github.com/ELIXIR- Belgium/ena-upload-container (DOI 10.5281/zenodo.4730785) Contact: [email protected] 1 Introduction Nucleotide Archive (ENA).