The Human Genome Browser at UCSC
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Mouse Kcnip2 Conditional Knockout Project (CRISPR/Cas9)
https://www.alphaknockout.com Mouse Kcnip2 Conditional Knockout Project (CRISPR/Cas9) Objective: To create a Kcnip2 conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering. Strategy summary: The Kcnip2 gene (NCBI Reference Sequence: NM_145703 ; Ensembl: ENSMUSG00000025221 ) is located on Mouse chromosome 19. 10 exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 10 (Transcript: ENSMUST00000162528). Exon 4 will be selected as conditional knockout region (cKO region). Deletion of this region should result in the loss of function of the Mouse Kcnip2 gene. To engineer the targeting vector, homologous arms and cKO region will be generated by PCR using BAC clone RP23-98F2 as template. Cas9, gRNA and targeting vector will be co-injected into fertilized eggs for cKO Mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for disruptions in this gene are susceptible to induced cardiac arrhythmias but are otherwise normal. Exon 4 starts from about 27.65% of the coding region. The knockout of Exon 4 will result in frameshift of the gene. The size of intron 3 for 5'-loxP site insertion: 574 bp, and the size of intron 4 for 3'-loxP site insertion: 532 bp. The size of effective cKO region: ~625 bp. The cKO region does not have any other known gene. Page 1 of 8 https://www.alphaknockout.com Overview of the Targeting Strategy Wildtype allele gRNA region 5' gRNA region 3' 1 2 3 4 5 6 7 8 9 10 Targeting vector Targeted allele Constitutive KO allele (After Cre recombination) Legends Exon of mouse Kcnip2 Homology arm cKO region loxP site Page 2 of 8 https://www.alphaknockout.com Overview of the Dot Plot Window size: 10 bp Forward Reverse Complement Sequence 12 Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. -
BIO4342 Exercise 2: Browser-Based Annotation and RNA-Seq Data
BIO4342 Exercise 2: Browser-Based Annotation and RNA-Seq Data Jeremy Buhler March 15, 2010 This exercise continues your introduction to practical issues in comparative annotation. You’ll be annotating genomic sequence from the dot chromosome of Drosophila mojavensis using your knowledge of BLAST and some improved visualization tools. You’ll also consider how best to integrate information from high-throughput sequencing of expressed RNA. 1 Getting Started To begin, go to our local genome browser at http://gander.wustl.edu/. Select “Genome Browser” from the left-side menu and choose the “Improved Dot” assembly of D. mojavensis for viewing. Finally, hit submit to start looking at the sequence. The entire dot assembly is about 1.69 megabases in length; zoom out to see everything. This assembly is built from a set of overlapping fosmid clones prepared for the 2009 edition of BIO 4342. We’ve added a variety of information to the genome browser to help you annotate, such as: • gene-structure predictions from several different tools; • repeats annotated using the RepeatMasker program; • BLAST hits to D. melanogaster proteins; • RNA-Seq data, which we’ll describe in more detail later. Having all this evidence available at once is somewhat overwhelming. To keep the view to a manageable level, I’d suggest that you initially set all the gene prediction tracks (Genscan, Nscan, SNAP, Geneid), as well as the repeat tracks, to “dense” mode, so that each displays on a single line. Set the BLAST hit track (called “D. mel proteins”) to “pack” to see the locations of all BLAST hits, and set the “RNA-Seq Coverage” track to “full” and the “TopHat junctions” track to “pack” to get a detailed view of these results. -
BLAT—The BLAST-Like Alignment Tool
Resource BLAT—The BLAST-Like Alignment Tool W. James Kent Department of Biology and Center for Molecular Biology of RNA, University of California, Santa Cruz, Santa Cruz, California 95064, USA Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT’s speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome. Some might wonder why in the year 2002 the world needs sions on any number of perfect or near-perfect hits. Where another sequence alignment tool. -
The Encodedb Portal: Simplified Access to ENCODE Consortium Data Laura L
Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press Resource The ENCODEdb portal: Simplified access to ENCODE Consortium data Laura L. Elnitski, Prachi Shah, R. Travis Moreland, Lowell Umayam, Tyra G. Wolfsberg, and Andreas D. Baxevanis1 Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA The Encyclopedia of DNA Elements (ENCODE) project aims to identify and characterize all functional elements in a representative chromosomal sample comprising 1% of the human genome. Data generated by members of The ENCODE Project Consortium are housed in a number of public databases, such as the UCSC Genome Browser, NCBI’s Gene Expression Omnibus (GEO), and EBI’s ArrayExpress. As such, it is often difficult for biologists to gather all of the ENCODE data from a particular genomic region of interest and integrate them with relevant information found in other public databases. The ENCODEdb portal was developed to address this problem. ENCODEdb provides a unified, single point-of-access to data generated by the ENCODE Consortium, as well as to data from other source databases that lie within ENCODE regions; this provides the user a complete view of all known data in a particular region of interest. ENCODEdb Genomic Context searches allow for the retrieval of information on functional elements annotated within ENCODE regions, including mRNA, EST, and STS sequences; single nucleotide polymorphisms, and UniGene clusters. Information is also retrieved from GEO, OMIM, and major genome sequence browsers. ENCODEdb Consortium Data searches allow users to perform compound queries on array-based ENCODE data available both from GEO and from the UCSC Genome Browser. -
A Multithread Blat Algorithm Speeding up Aligning Sequences to Genomes Meng Wang and Lei Kong*
Wang and Kong BMC Bioinformatics (2019) 20:28 https://doi.org/10.1186/s12859-019-2597-8 SOFTWARE Open Access pblat: a multithread blat algorithm speeding up aligning sequences to genomes Meng Wang and Lei Kong* Abstract Background: The blat is a widely used sequence alignment tool. It is especially useful for aligning long sequences and gapped mapping, which cannot be performed properly by other fast sequence mappers designed for short reads. However, the blat tool is single threaded and when used to map whole genome or whole transcriptome sequences to reference genomes this program can take days to finish, making it unsuitable for large scale sequencing projects and iterative analysis. Here, we present pblat (parallel blat), a parallelized blat algorithm with multithread and cluster computing support, which functions to rapidly fine map large scale DNA/RNA sequences against genomes. Results: The pblat algorithm takes advantage of modern multicore processors and significantly reduces the run time with the number of threads used. pblat utilizes almost equal amount of memory as when running blat. The results generated by pblat are identical with those generated by blat. The pblat tool is easy to install and can run on Linux and Mac OS systems. In addition, we provide a cluster version of pblat (pblat-cluster) running on computing clusters with MPI support. Conclusion: pblat is open source and free available for non-commercial users. It is easy to install and easy to use. pblat and pblat-cluster would facilitate the high-throughput mapping of large scale genomic and transcript sequences to reference genomes with both high speed and high precision. -
Homology & Alignment
Protein Bioinformatics Johns Hopkins Bloomberg School of Public Health 260.655 Thursday, April 1, 2010 Jonathan Pevsner Outline for today 1. Homology and pairwise alignment 2. BLAST 3. Multiple sequence alignment 4. Phylogeny and evolution Learning objectives: homology & alignment 1. You should know the definitions of homologs, orthologs, and paralogs 2. You should know how to determine whether two genes (or proteins) are homologous 3. You should know what a scoring matrix is 4. You should know how alignments are performed 5. You should know how to align two sequences using the BLAST tool at NCBI 1 Pairwise sequence alignment is the most fundamental operation of bioinformatics • It is used to decide if two proteins (or genes) are related structurally or functionally • It is used to identify domains or motifs that are shared between proteins • It is the basis of BLAST searching (next topic) • It is used in the analysis of genomes myoglobin Beta globin (NP_005359) (NP_000509) 2MM1 2HHB Page 49 Pairwise alignment: protein sequences can be more informative than DNA • protein is more informative (20 vs 4 characters); many amino acids share related biophysical properties • codons are degenerate: changes in the third position often do not alter the amino acid that is specified • protein sequences offer a longer “look-back” time • DNA sequences can be translated into protein, and then used in pairwise alignments 2 Find BLAST from the home page of NCBI and select protein BLAST… Page 52 Choose align two or more sequences… Page 52 Enter the two sequences (as accession numbers or in the fasta format) and click BLAST. -
Differential Gene Expression Profiling in Bed Bug (Cimex Lectularius L.) Fed on Ibuprofen and Caffeine in Reconstituted Human Blood Ralph B
Herpe y & tolo og g l y: o C th i u Narain et al., Entomol Ornithol Herpetol 2015, 4:3 n r r r e O n , t y R g DOI: 10.4172/2161-0983.1000160 e o l s o e a m r o c t h n E Entomology, Ornithology & Herpetology ISSN: 2161-0983 ResearchResearch Article Article OpenOpen Access Access Differential Gene Expression Profiling in Bed Bug (Cimex Lectularius L.) Fed on Ibuprofen and Caffeine in Reconstituted Human Blood Ralph B. Narain, Haichuan Wang and Shripat T. Kamble* Department of Entomology, University of Nebraska, Lincoln, NE 68583, USA Abstract The recent resurgence of the common bed bug (Cimex lectularius L.) infestations worldwide has created a need for renewed research on biology, behavior, population genetics and management practices. Humans serve as exclusive hosts to bed bugs in urban environments. Since a majority of humans consume Ibuprofen (as pain medication) and caffeine (in coffee and other soft drinks) so bug bugs subsequently acquire Ibuprofen and caffeine through blood feeding. However, the effect of these chemicals at genetic level in bed bug is unknown. Therefore, this research was conducted to determine differential gene expression in bed bugs using RNA-Seq analysis at dosages of 200 ppm Ibuprofen and 40 ppm caffeine incorporated into reconstituted human blood and compared against the control. Total RNA was extracted from a single bed bug per replication per treatment and sequenced. Read counts obtained were analyzed using Bioconductor software programs to identify differentially expressed genes, which were then searched against the non-redundant (nr) protein database of National Center for Biotechnology Information (NCBI). -
A Dissertation
A Dissertation entitled Strategies for Membrane Protein Studies and Structural Characterization of a Metabolic Enzyme for Antibiotic Development by Buenafe T. Arachea Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Doctor of Philosophy Degree in Chemistry Dr. Ronald E. Viola, Committee Chair Dr. Max O. Funk, Committee Member Dr. Donald Ronning, Committee Member Dr. Marcia McInerney, Committee Member Dr. Patricia R. Komuniecki, Dean College of Graduate Studies The University of Toledo August 2011 Copyright © 2011, Buenafe T. Arachea This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author. An Abstract of Strategies for Membrane Protein Studies and Structural Characterization of a Metabolic Enzyme for Antibiotic Development by Buenafe T. Arachea Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Doctor of Philosophy Degree in Chemistry The University of Toledo August 2011 Membrane proteins are essential in a variety of cellular functions, making them viable targets for drug development. However, progress in the structural elucidation of membrane proteins has proven to be a difficult task, thus limiting the number of published structures of membrane proteins as compared with the enormous structural information obtained from soluble proteins. The challenge in membrane protein studies lies in the production of the required sample for characterization, as well as in developing methods to effectively solubilize and maintain a functional and stable form of the target protein during the course of crystallization. To address these issues, two different approaches were explored for membrane protein studies. -
Bedtools Documentation Release 2.30.0
Bedtools Documentation Release 2.30.0 Quinlan lab @ Univ. of Utah Jan 23, 2021 Contents 1 Tutorial 3 2 Important notes 5 3 Interesting Usage Examples 7 4 Table of contents 9 5 Performance 169 6 Brief example 173 7 License 175 8 Acknowledgments 177 9 Mailing list 179 i ii Bedtools Documentation, Release 2.30.0 Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line. bedtools is developed in the Quinlan laboratory at the University of Utah and benefits from fantastic contributions made by scientists worldwide. Contents 1 Bedtools Documentation, Release 2.30.0 2 Contents CHAPTER 1 Tutorial We have developed a fairly comprehensive tutorial that demonstrates both the basics, as well as some more advanced examples of how bedtools can help you in your research. Please have a look. 3 Bedtools Documentation, Release 2.30.0 4 Chapter 1. Tutorial CHAPTER 2 Important notes • As of version 2.28.0, bedtools now supports the CRAM format via the use of htslib. Specify the reference genome associated with your CRAM file via the CRAM_REFERENCE environment variable. -
Quantification of Experimentally Induced Nucleotide Conversions in High-Throughput Sequencing Datasets Tobias Neumann1* , Veronika A
Neumann et al. BMC Bioinformatics (2019) 20:258 https://doi.org/10.1186/s12859-019-2849-7 RESEARCH ARTICLE Open Access Quantification of experimentally induced nucleotide conversions in high-throughput sequencing datasets Tobias Neumann1* , Veronika A. Herzog2, Matthias Muhar1, Arndt von Haeseler3,4, Johannes Zuber1,5, Stefan L. Ameres2 and Philipp Rescheneder3* Abstract Background: Methods to read out naturally occurring or experimentally introduced nucleic acid modifications are emerging as powerful tools to study dynamic cellular processes. The recovery, quantification and interpretation of such events in high-throughput sequencing datasets demands specialized bioinformatics approaches. Results: Here, we present Digital Unmasking of Nucleotide conversions in K-mers (DUNK), a data analysis pipeline enabling the quantification of nucleotide conversions in high-throughput sequencing datasets. We demonstrate using experimentally generated and simulated datasets that DUNK allows constant mapping rates irrespective of nucleotide-conversion rates, promotes the recovery of multimapping reads and employs Single Nucleotide Polymorphism (SNP) masking to uncouple true SNPs from nucleotide conversions to facilitate a robust and sensitive quantification of nucleotide-conversions. As a first application, we implement this strategy as SLAM-DUNK for the analysis of SLAMseq profiles, in which 4-thiouridine-labeled transcripts are detected based on T > C conversions. SLAM-DUNK provides both raw counts of nucleotide-conversion containing reads as well as a base-content and read coverage normalized approach for estimating the fractions of labeled transcripts as readout. Conclusion: Beyond providing a readily accessible tool for analyzing SLAMseq and related time-resolved RNA sequencing methods (TimeLapse-seq, TUC-seq), DUNK establishes a broadly applicable strategy for quantifying nucleotide conversions. -
Penguin: a Tool for Predicting Pseudouridine Sites in Direct RNA Nanopore Sequencing Data
bioRxiv preprint doi: https://doi.org/10.1101/2021.03.31.437901; this version posted June 9, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Penguin: A Tool for Predicting Pseudouridine Sites in Direct RNA Nanopore Sequencing Data Doaa Hassan1,4, Daniel Acevedo1,5, Swapna Vidhur Daulatabad1, Quoseena Mir1, Sarath Chandra Janga1,2,3 1. Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University, 535 West Michigan Street, Indianapolis, Indiana 46202 2. Department of Medical and Molecular Genetics, Indiana University School of Medicine, Medical Research and Library Building, 975 West Walnut Street, Indianapolis, Indiana, 46202 3. Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translational Sciences (HITS), 410 West 10th Street, Indianapolis, Indiana, 46202 4. Computers and Systems Department, National Telecommunication Institute, Cairo, Egypt. 5. Computer Science Department, University of Texas Rio Grande Valley Keywords: RNA modifications, Pseudouridine, Nanopore *Correspondence should be addressed to: Sarath Chandra Janga ([email protected]) Informatics and Communications Technology Complex, IT475H 535 West Michigan Street Indianapolis, IN 46202 317 278 4147 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.31.437901; this version posted June 9, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. -
Tutorial 1: Exploring the UCSC Genome Browser
Last updated: May 22, 2013 Tutorial 1: Exploring the UCSC Genome Browser Open the homepage of the UCSC Genome Browser at: http://genome.ucsc.edu/ In the blue bar at the top, click on the Genomes link. Change the drop down menus so that clade is set to Vertebrate, genome is set to Human and assembly is set to Feb. 2009 (GRCh37/hg19). Then type in “adam2” in the position or search term window and click submit. On the subsequent page, click on the top link under the UCSC Genes section. This should bring up a window that looks similar to that shown in Figure 1: Figure 1: ADAM2 as displayed in the UCSC genome browser with default tracks. The UCSC Genome browser is highly configurable. If your view looks much different than Fig. 1, try clicking the default tracks button below the graphic window. This is a complicated window, so we will go through it from top to bottom: 1. In the row labeled move, are buttons that allow you to move left to right on the chromosome as well as zoom in and out on the information. 2. The position window shows the genomic coordinates of the chromosome sequence displayed. You can edit this manually or use the move controls to zoom in and out or change positions. The entire chromosome can be viewed at once, though this takes a very long time to load. To the right of the jump and clear buttons, the size the genomic region being viewed is listed. The view is taken up by the ADAM2 gene, corresponding to about 94Kb in length, not atypical for a mammalian gene.