Tutorial 1: Exploring the UCSC Genome Browser
BCH-M628 2013 UCSC Genome Viewer tutorials
Last updated: May 22, 2013

Open the homepage of the UCSC Genome Browser at: http://genome.ucsc.edu/

In the blue bar at the top, click on the Genomes link. Change the drop-down menus so that clade is set to Vertebrate, genome is set to Human and assembly is set to Feb. 2009 (GRCh37/hg19). Then type “adam2” in the position or search term window and click submit. On the subsequent page, click on the top link under the UCSC Genes section. This should bring up a window similar to that shown in Figure 1.

Figure 1: ADAM2 as displayed in the UCSC genome browser with default tracks.

The UCSC Genome Browser is highly configurable. If your view looks very different from Figure 1, try clicking the default tracks button below the graphic window. This is a complicated window, so we will go through it from top to bottom:

1. The row labeled move contains buttons that let you move left and right along the chromosome and zoom in and out on the information.

2. The position window shows the genomic coordinates of the chromosome sequence displayed. You can edit this manually or use the move controls to zoom in and out or change positions. The entire chromosome can be viewed at once, though this takes a very long time to load. To the right of the jump and clear buttons, the size of the genomic region being viewed is listed. The view is taken up by the ADAM2 gene, which is about 94 kb long, not atypical for a mammalian gene.

3. Use the configure button to change the default image width. By default it is set to 800. Try changing it from 800 to 1000 and then click the submit button. You can also adjust the placement of the labels on the graphic window, as well as which tracks are displayed and how those tracks are configured.

4. The next row shows a picture of the chromosome with the chromosome bands in lighter and darker shades of grey.
The red vertical bar shows where on the chromosome the current view is located. In this case, chr8 (p11.22) tells you that this gene is located on the short (p, for petit) arm of chromosome 8, at band 11.22. The designation q is used for the long arm of the chromosome. Cytogenetic mapping has not been done for all organisms, so this view may not always be available.

5. The main window displays a set of annotation tracks. Right-clicking on any track should give you display configuration options. If you scroll down below the main window, the tracks that are shown on the display have their menu boxes in white, whereas the hidden ones are in grey. Clicking on the link above a menu will give you more information about that track as well as configuration options.

The tracks shown in this graphic represent different kinds of sequence information that have been placed on the human genome by coordinates relative to the beginning and end of each chromosome. For the tracks representing genes or ESTs, the longer vertical bars in the gene structure represent exons, while the horizontal lines between the vertical bars represent introns. The > and < symbols tell you the orientation of the gene relative to the chromosome. ADAM2 is listed four times because it has 4 alternatively spliced transcripts. Note the difference in exon number between the 3 isoforms in the area surrounded by the red boxes. ADAM2 is coded on the negative strand (<).

Figure 2: Expanded view shown in 1st red box

Below the gene and EST tracks are 3 tracks from the ENCODE project that provide information relevant to the regulation of transcription. The layered H3K27Ac track represents histone acetylation marks (acetylation of lysine 27 on histone H3). You can configure the track height, among other parameters, to see these marks more clearly. The DNase Clusters track shows regions where the chromatin is hypersensitive to cutting by the DNase enzyme, which has been assayed in a large number of cell types.
Regulatory regions, in general, tend to be DNase sensitive, and promoters are particularly DNase sensitive. The Txn Factor ChIP track shows DNA regions where transcription factors (proteins responsible for modulating gene transcription) bind, as assayed by chromatin immunoprecipitation with antibodies specific to the transcription factor followed by sequencing of the precipitated DNA (ChIP-seq).

Below the regulation tracks is a series of sequence conservation tracks. The darker the color, the greater the conservation between the sequence in the window and the species listed. Change the view to full and then click the refresh button. It should be a little easier to decipher: the display is a bar graph, and the higher the bar, the more conserved the sequences are. Not surprisingly, rhesus macaques have the highest degree of conservation with humans, and conservation falls off as species become more distantly related to humans.

6. Below the image window are the fine-level controls for changing the start and end of the displayed sequence data.

7. The next row provides default tracks, hide all, configure and refresh buttons for rapidly resetting the tracks display. To the right of those buttons are check boxes for toggling the chromosome display and label display on and off. If your display does not look like the one above, click the default tracks button to reset it.

Tracks: You will notice there are many different tracks, representing the different information sources included in the output. There are lots of tracks to view (> 100 for this Human build) and different views for them. Scroll down to the track window below the graphic. Only a few have menus displayed; most are in “hide” mode. Change UCSC Genes from “pack” to “dense”, and click the Refresh button. What happened to the display? Change RefSeq from “dense” to “pack” and click the Refresh button. What happened?
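The position box in step 2 and the fine-level start and end controls in step 6 both work with coordinates written as chromosome:start-end. As a rough illustration, the Python sketch below parses such a string and computes the size of the region, which is the number the browser lists next to the position box. The names Region, parse_position and zoom_out are made up for this example and are not part of the browser, and the coordinates are placeholders rather than the real ADAM2 coordinates. The sketch assumes the position box uses 1-based, inclusive coordinates.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval as typed into the position box (hypothetical helper)."""
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    @property
    def size(self) -> int:
        # Both ends are counted, hence the +1.
        return self.end - self.start + 1


# Accepts forms such as "chr8:1,000,001-1,094,000" or "chr8:1000001-1094000".
_POSITION = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def parse_position(text: str) -> Region:
    """Parse a chrom:start-end string, tolerating thousands separators."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom, start, end = match.groups()
    region = Region(chrom, int(start.replace(",", "")), int(end.replace(",", "")))
    if region.start < 1 or region.end < region.start:
        raise ValueError(f"start must be >= 1 and <= end: {text!r}")
    return region


def zoom_out(region: Region, factor: int) -> Region:
    """Widen the view by an integer factor, roughly centered on the old view."""
    new_size = region.size * factor
    extra_left = (new_size - region.size) // 2
    # Never move left of base 1; the chromosome end is not checked here
    # because this sketch does not know the chromosome length.
    new_start = max(1, region.start - extra_left)
    return Region(region.chrom, new_start, new_start + new_size - 1)


if __name__ == "__main__":
    view = parse_position("chr8:1,000,001-1,094,000")  # placeholder coordinates
    print(view, view.size)
    print(zoom_out(view, 3))
```

Because the sketch has no table of chromosome lengths, zoom_out only guards against running off the start of the chromosome; the browser itself also stops at the chromosome end.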
If you want to know what a track represents, click on the link above each track setting. The UCSC browser has many genomes, most of which do not have nearly as many annotation tracks as the human browser.

You can export the graphic window as either a PDF or PS (PostScript) file. Click on the PDF/PS link on the blue navigation bar at the top and a window will open allowing you to export either PostScript or PDF. If you are familiar with Adobe Illustrator, that program can open a PostScript file so that it can be edited and printed at high resolution.

If you click on a gene name listed above or to the left of the track, it will open a window with a description, summary and links to other databases that have information about that gene. From the gene summary page, there is a link for obtaining the genomic sequence from the region around the gene. Another way to obtain the genomic sequence is to use the DNA link located in the top blue bar. It will automatically retrieve all of the genomic sequence represented in the current view. You can use the subsequent dialog box to add more DNA to one or both ends. If you want the coding sequence, you have to know the orientation of your gene. If it is on the negative/reverse (<) strand, you will want to check the Reverse complement box before using the get DNA button in the dialog box.

Figure 3: Get DNA dialog box for UCSC genome browser.

Tutorial 2: Mapping sequences using BLAT

One of the most useful features of the UCSC browser is the “BLAT” search. This allows you to put in either a nucleotide or protein sequence, and it will conduct a similarity search that is related to BLAST. The result is the location of your sequence on the genome itself. This is quite useful if you want to map an unknown clone or to find a homolog. For this example, we will use an EST fragment and determine where the EST is mapped in the human genome.
Once we find that, we will look for alternatively spliced transcripts.

First, retrieve the EST sequence from NCBI’s Entrez interface by typing “BG334944” into the text box at the top of the page. The resulting page should show one entry in the EST database. Click on the EST link and then click on the fasta link to bring up the sequence in fasta format. Copy the sequence from the definition line (identified by the “>”) to the end of the sequence.

Go to http://genome.ucsc.edu. From the main page, choose human in the pull-down menu, then click the “Blat” link from the list on the left-hand side or from the top menu. In the BLAT search window on the next page, paste in the sequence you copied from NCBI. Keep the Assembly selection at Human Feb. 2009. You can change Query type to DNA. Press Submit.

Figure 4: BLAT search window

The server will very quickly return the search results. In this case, the EST produces several matches on several different chromosomes. This is not unusual, but if you look at the list returned, the first one has a significantly better score over a longer stretch of the sequence than the other hits do.

Figure 5: BLAT Search Results for BG334944.

Click on the details link for the first match to obtain more information about the query sequence and its alignment to the genomic sequence.
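The copy-and-paste step earlier in this tutorial relies on the fasta layout: one definition line starting with “>”, followed by one or more lines of sequence. As an optional check before pasting into the BLAT window, the minimal Python sketch below splits a copied record into its definition line and sequence and tests whether the sequence looks like DNA. The function names and the toy record are hypothetical; the toy record is a placeholder and is not the BG334944 sequence.

```python
def parse_fasta_record(text: str) -> tuple[str, str]:
    """Split one fasta record into (definition line without '>', sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a definition line starting with '>'")
    header = lines[0][1:]
    # Sequence lines are wrapped in fasta files, so join them back together.
    sequence = "".join(lines[1:]).upper()
    return header, sequence


def looks_like_dna(sequence: str) -> bool:
    """True if the sequence is non-empty and uses only A, C, G, T or N."""
    return bool(sequence) and set(sequence) <= set("ACGTN")


if __name__ == "__main__":
    # Placeholder record for illustration only, not a real EST.
    copied = """>example_est hypothetical placeholder record
ACGTACGTNN
GGCCTTAA
"""
    header, sequence = parse_fasta_record(copied)
    print(header)
    print(len(sequence), looks_like_dna(sequence))
```

If looks_like_dna returns False for a record you expected to be nucleotide sequence, the copy probably picked up extra page text or missed part of the record.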