<<

Fishing for in the UCSC Browser A Tutorial

Rachel Harte Mark Diekhans University of California, Santa Cruz, CA Version 1.0 September 18, 2008

Abstract This tutorial is aimed at the biologist who is interested in exploring -coding genes using the University of California Santa Cruz (UCSC) Browser. It is geared towards those who have little or no experience using the UCSC Genome Browser and for more advanced users who are not familiar with many of the -oriented browser features. Using the example of a gene, PPP1R1B, the reader is guided through a step-by-step process for finding and visualizing protein-coding genes in the context of the and a wide variety of genomic data. The user is shown how to use the UCSC Genome Browser to locate a Mammalian Gene Collection (MGC) clone of the gene and how to order the clone from suppliers. Fishing for Genes in the UCSC Browser

1 Accessing the UCSC • Genome Graphs on the side blue bar is a tool to facilitate viewing of whole genome Genome Browser datasets such as genome-wide SNP associ- The UCSC Genome website ation studies, linkage studies and homozy- consists of a suite of tools for the viewing and gosity mapping. Instructions are in the mining of genomic data. The UCSC Genome Genome Graphs User’s Guide Browser [16, 18] facilitates the viewing of • PCR on the top blue bar or In Silico PCR clones from the Mammalian Gene Collection on the side blue bar is a fast method of (MGC)[35, 37] in the context of other genome searching for a pair of PCR primer se- annotations. Various types of annotations such quences in a genome assembly. as MGC Genes are visualized in tracks which • VisiGene is a virtual microscope for view- are graphical representations of data displayed ing in situ hybridization images. in a genomic context on the Genome Browser. • Proteome on the top blue bar and Pro- To view these annotation tracks, first go to the teome Browser on the side blue bar lead home page at http://genome.ucsc.edu (Figure 1). to the gateway of a tool for viewing pro- The blue menu bars at the top and the left side teins and their properties[14]. Both graph- of the Genome Browser home page both include ical images and links to external websites following links to various tools (Figure 1): provide a rich source of protein informa- tion. • on the top blue bar or the • Utilities on the side blue bar allows access Genome Browser link on the side blue to various tools to remove non-sequence- menu bar allow entry to the Gateway page related characters from DNA or protein, for for the Genome Browser. From here, users creating a gif image for a phylogenetic tree may select the genomes that they wish and a tool which converts genome coordi- to browse. An additional route of en- nates between assemblies (liftOver). try is via a Photo Gateway page in the • Downloads on the side blue bar allows UCSC genomewiki with photographs of bulk download of data including dumps each species whose genome is represented from the Genome Browser databases. in the UCSC Genome Browser. • Custom Tracks is a powerful tool allow- • Blat is a fast alignment tool[20] which al- ing users to add their own data for view- lows the user to align DNA or protein se- ing and querying within the context of a quences to the genome assemblies. genome and its associated annotation data. • Tables on the top blue bar or Table A User’s Guide provides help for creating Browser[17] on the side blue bar link to an Custom Tracks. interface allowing the retrieval of data as- sociated with tracks from the databases un- derlying the Genome Browser. Data added 2 Searching for a gene as additional tracks created by the user (Custom Tracks) may also be queried with Let’s suppose that a user wishes to study the hu- this tool. man PPP1R1B (, regula- • Gene Sorter is a tool that displays a sorted tory (inhibitor) subunit 1B) gene, which is ex- table of genes that are related by some met- pressed in the brain. The protein encoded by ric selected by the user e.g. similar expres- this gene is also known as DARPP32. Its phos- sion patterns, protein homology or proxim- phorylation status is regulated by dopaminer- ity in the genome[19]. gic and glutamatergic (NMDA) receptors. Once

1 Fishing for Genes in the UCSC Browser

Genomes - Blat - Tables - Gene Sorter - PCR - VisiGene - Proteome - Session - FAQ - Help

Genome About the UCSC Genome Bioinformatics Site Browser Welcome to the UCSC Genome Browser website. This site contains the reference sequence and ENCODE working draft assemblies for a large collection of genomes. It also provides a portal to the ENCODE Gateway project. Blat We encourage you to explore these sequences with our tools. The Genome Browser zooms and scrolls Table over , showing the work of annotators worldwide. The Gene Sorter shows expression, Browser homology and other information on groups of genes that can be related in many ways. Blat quickly maps your sequence to the genome. The Table Browser provides convenient access to the underlying Gene Sorter database. VisiGene lets you browse through a large collection of in situ mouse and frog images to In Silico examine expression patterns. Genome Graphs allows you to upload and display genome-wide data PCR sets.

Genome The UCSC Genome Browser is developed and maintained by the Genome Bioinformatics Group, a Graphs cross-departmental team within the Center for Biomolecular Science and Engineering (CBSE) at the University of California Santa Cruz (UCSC). If you have feedback or questions concerning the tools or Galaxy data on this website, feel free to contact us on our public mailing list. To view the results of the Genome VisiGene Browser users' survey we conducted in May 2007, click here.

Proteome Browser News

Utilities To receive announcements of new genome assembly releases, new software features, updates and training seminars by email, subscribe to the genome-announce mailing list. Downloads

Release Log 26 June 2008 - New Worm Genome Available

Custom Along with the set of worm browser updates that we're currently releasing, we've added a new Tracks nematode to the collection: Caenorhabditis japonica. This genome assembly (UCSC version caeJap1, Mar. 2008) corresponds to the v. 3.0.2 assembly produced by the Genome Sequencing Archaeal Center at the Washington University St. Louis (WUSTL) School of Medicine. Genomes Bulk downloads of the sequence and annotation data are available via the Genome Browser FTP Mirrors server or Downloads page. Please review the WUSTL data use policy for usage restrictions and Archives citation information.

Training We'd like to thank WUSTL for providing the sequence data for this assembly. The UCSC caeJap1 browser was produced by Hiram Clawson, Ann Zweig, and Donna Karolchik. See the Genome Credits Browser Credits page for a detailed list of the organizations and individuals who contributed to this release. Publications

Cite Us 20 June 2008 - Two Worm Updates Released: We've updated our browsers for Licenses the C. remanei and C. brenneri nematode genomes. Read more.

Jobs 10 June 2008 - Lamprey Browser Released: We have released a Genome Browser for the Mar. 2007 assembly of the lamprey genome, Petromyzon marinus. Read more. Staff

Contact Us 10 June 2008 - Lancelet Genome Available in Browser: The Mar. 2006 release of the lancelet genome (Branchiostoma floridae) is now available in the UCSC Genome Figure 1: UCSC Genome BrowserBrowser. Reahomed more. page. On this page, there is an introduction to the web site and postings of news of5 training June 2008 seminars - Guinea and Pig recent Browser additions Released: such asThe new Feb. assemblies2008 CavPor3 and release of the guinea pig genome (Cavia porcellus) is now available in the UCSC Genome Browser. software features. The blueReamenud more. bars at the top and left side of the page allow access to the Genome Browser for genome assemblies of a variety of organisms, data mining tools and help pages.

2 Fishing for Genes in the UCSC Browser

phosphorylated, PPP1R1B, is a potent protein 35046403 link. This will take you to phosphatase-1 inhibitor. To visualize this gene the PPP1R1B locus in the human genome. in the UCSC Genome Browser in the context of Note that there are two RefSeq splice vari- various genomic annotations, the user may take ants listed for this locus and this one is the the following steps: longer transcript variant.

1. To get started, clicking on either the Genomes or Genome Browser link will 3 Genome Browser take the user to the Gateway page where the clade, genome and assembly of inter- annotation tracks est may be selected from pull-down lists of multiple organisms and genome assem- The Genome Browser displays certain annota- blies. tion tracks by default in the main Browser im- age. The current default display (Figure 4) 2. Below this area, there is a section describ- shows a number of gene tracks: ing the selected assembly which also indi- cates that the default assembly (at the time • UCSC Gene predictions[15, 18] UCSC of writing) is known as hg18 on the • BLAT alignments of sequences from Genome Browser website. This assembly is GenBank[2] also known as NCBI Build 36.1. This sec- • BLAT alignments of full-length ORF MGC tion also contains a list of Sample position Genes. queries. These are a selection of queries that may be entered into the position or Other visible tracks in this default display are: search term box adjacent to the genome and assembly controls. Examples include • Vertebrate Conservation[29] gene name, mRNA or EST accession and • Simple Nucleotide Polymorphisms from descriptive term. Such terms could be used NCBI Build 128 of dbSNP[33](SNPs if the gene is not a known gene with an offi- (128)) track[38] cial gene symbol. For the gene in question, the alternate name, DARPP32, could be • Location of repeats found by Repeat- used or a descriptive term such as NMDA Masker which is less selective and returns a larger number of results. Gene structure is shown in these tracks with filled blocks representing ; thick blocks 3. Enter the gene symbol, PPP1R1B into the are in the coding sequence (CDS) and thin position or search term box (Figure 2). blocks represent untranslated regions (UTRs). Clicking on the submit button directs the The lines connecting the blocks are . user to a page displaying the search results. The direction of arrowheads on the lines or the For each track where there are data items blocks show the strand on which the element re- containing PPP1R1B as their identifier or sides. Right-facing arrows show that it is on the in their description, results are presented sense strand while left-facing arrows show that as links to the genomic positions of these it is on the antisense strand. To change the sense items. (Figure 3). of the strand in order to more conveniently view annotation on the antisense strand, the Genome 4. Next, click on one of the links, e.g., the Ref- Browser display may be reversed using the re- Seq Genes PPP1R1B at chr17:35036705- verse button underneath the Genome Browser

3 Fishing for Genes in the UCSC Browser

Search box Select Home Genomes Blat Tables Gene Sorter PCR Session FAQ Help Assembly Human (Homo sapiens) Genome Browser Gateway Genome The UCSC Genome Browser was created by the Genome Bioinformatics Group of UC Santa Cruz. Clade Software Copyright (c) The Regents of the University of California. All rights reserved.

clade genome assembly position or search term image width Vertebrate Human Mar. 2006 PPP1R1B 620 submit Click here to reset the browser user interface settings to their defaults. add custom tracks configure tracks and display clear position

About the Human Mar. 2006 (hg18) assembly (sequences)

The March 2006 human reference sequence (NCBI Build 36.1) was produced by the International Human Genome Sequencing Consortium. Assembly information Sample position queries

A genome position can be specified by the accession number of a sequenced genomic clone, an mRNA or EST or STS marker, or a cytological band, a chromosomal coordinate range, or keywords from the GenBank description of an mRNA. Example The following list shows examples of valid position queries for the human genome. See the User's Guide for more queries information.

Request: Genome Browser Response:

chr7 Displays all of 7 20p13 Displays region for band p13 on chr 20 chr3:1-1000000 Displays first million bases of chr 3, counting from p arm telomere chr3:1000000+2000 Displays a region of chr3 that spans 2000 bases, starting with position 1000000

D16S3046 Displays region around STS marker D16S3046 from the Genethon/Marshfield maps. Includes 100,000 bases on each side as well. RH18061;RH80175 Displays region between STS markers RH18061;RH80175. This syntax may also be used for other range queries, such as between cytobands and uniquely-determined ESTs, mRNAs, refSeqs, etc.

AA205474 Displays region of EST with GenBank accession AA205474 in BRCA1 cancer gene on chr 17 Figure 2: UCSCAC008101 GenomeDis Browserplays region of Gateway clone with GenBapage.nk access Theion AC008101position or search term box allows AF083811 Displays region of mRNA with GenBank accession number AF083811 the user toP searchRNP for a positionDisplays region within of genome the with HUGO selected Gene Nom genomeenclature Com assemblymittee identifie orr PRN byP keyword, gene symbol or otherNM_017414 identifier.Dis Here,plays the re agi searchon of genom fore withPPP1R1B RefSeq identifieris NM being_017414 initiated. NP_059110 Displays the region of genome with protein accession number NP_059110

pseudogene mRNA Lists transcribed pseudogenes, but not cDNAs homeobox caudal Lists mRNAs for caudal homeobox genes zinc finger Lists many zinc finger mRNAs kruppel zinc finger Lists only kruppel-like zinc fingers huntington Lists candidate genes associated with Huntington's disease zahler Lists mRNAs deposited by scientist named Zahler Evans,J.E. Lists mRNAs deposited by co-author J.E. Evans

Use this last format for author queries. Although GenBank requires the search format Evans JE, internally it uses the format Evans,J.E..

4 Fishing for Genes in the UCSC Browser

Click here to go to the location of the longer splice variant UCSC Genes

PPP1R1B (uc002hsc.1) at chr17:35038279-35046404 - protein phosphatase 1, regulatory (inhibitor) PPP1R1B (uc002hsb.1) at chr17:35038279-35046404 - protein phosphatase 1, regulatory (inhibitor) PPP1R1B (uc002hsa.1) at chr17:35036705-35046404 - protein phosphatase 1, regulatory (inhibitor) PPP1R1B (uc002hrz.1) at chr17:35036705-35046404 - protein phosphatase 1, regulatory (inhibitor) RefSeq Genes

PPP1R1B at chr17:35038279-35046404 - (NM_181505) protein phosphatase 1, regulatory (inhibitor) PPP1R1B at chr17:35036705-35046403 - (NM_032192) protein phosphatase 1, regulatory (inhibitor) Non-Human RefSeq Genes

PPP1R1B at chr17:35037013-35046370 - (NM_174647) protein phosphatase 1, regulatory (inhibitor) Ppp1r1b at chr17:35037163-35046403 - (NM_144828) protein phosphatase 1, regulatory (inhibitor) Ppp1r1b at chr17:35037163-35245015 - (NM_138521) protein phosphatase 1, regulatory (inhibitor) Human Aligned mRNA Search Results

AF435972 - Homo sapiens - and cAMP-regulated neuronal phosphoprotein isoform A (PPP1R1B) mRNA, complete cds; alternatively spliced. AF435973 - Homo sapiens dopamine- and cAMP-regulated neuronal phosphoprotein isoform B (PPP1R1B) mRNA, complete cds; alternatively spliced transcript B1. AF435974 - Homo sapiens dopamine- and cAMP-regulated neuronal phosphoprotein isoform B (PPP1R1B) mRNA, complete cds; alternatively spliced transcript B2. AY070271 - Homo sapiens truncated DARPP-32 isoform t-DARPP (PPP1R1B) mRNA, complete cds, alternatively spliced. BC001519 - Homo sapiens protein phosphatase 1, regulatory (inhibitor) subunit 1B (dopamine and cAMP regulated phosphoprotein, DARPP-32), mRNA (cDNA clone MGC:2855 IMAGE:2987943), complete cds. Non-Human Aligned mRNA Search Results

AY601872 - Rattus norvegicus, dopamine- and cAMP-regulated phosphoprotein AY640620 - Mus musculus, DARPP-30a AY640621 - Mus musculus, DARPP-30b BC011122 - Mus musculus, protein phosphatase 1, regulatory (inhibitor) BC026568 - Mus musculus, protein phosphatase 1, regulatory (inhibitor) BC031129 - Mus musculus, protein phosphatase 1, regulatory (inhibitor) BC078954 - Rattus norvegicus, protein phosphatase 1, regulatory (inhibitor) AY890186 - synthetic construct, protein phosphatase 1 regulatory subunit 1B AY892674 - synthetic construct, protein phosphatase 1 regulatory subunit 1B AY892675 - synthetic construct, protein phosphatase 1 regulatory subunit 1B BC118264 - Bos taurus, protein phosphatase 1, regulatory (inhibitor)

Figure 3: Results of a search for PPP1R1B. The results page shows all the tracks that contain data whose annotation includes the keyword “PPP1R1B”. This gene is found in the UCSC Genes, RefSeq Genes and Human mRNA tracks as well as in the tracks for Non-human RefSeq Genes and Non-human mRNAs. image (see Figure 4. The image will then be re- arate line, but at 50% of the height of full drawn so that the 50 to 30 direction of transcrip- mode and without labels tion of the antisense gene is from left to right, • pack which displays several features on which is more intuitive. each line with labels Above the Genome Browser image for the hu- • full which shows each feature on a separate man genome assemblies (and for other organ- line with labels isms when available), there is a chromosome ideogram[13] showing a red box which indi- Such compacting of tracks is particularly useful cates the position of the current viewing window for those tracks with large amounts of data. By within the chromosome. Navigation controls altering the visibility of a number of tracks, a are found above and below the Browser image display such as that in Figure 5 can be achieved. (Figure 4). Scrolling down below the Browser Zooming in to base level in order to exam- graphic allows access to the visibility controls ine annotations more closely can be achieved by for each track grouped by track type (see the using the navigation buttons above the Browser lower part of Figure 4). These controls allow image. Clicking on the base numbers in the Base the user to select various tracks for display; dis- Position track also allows zooming. At the base play modes can be altered using the pull-down level, the genome bases can be viewed below the lists. Visibility options are: numbering in the Base Position track (Figure 6. The one-letter codes for the trans- • hide which renders the track invisible lation of the codon triplets in all three frames • dense which collapses all the features into for the strand being viewed will come into view a single line as one zooms to base level (if the Base Position • squish which displays each features a sep- track visibility is full) while codons in various

5 Fishing for Genes in the UCSC Browser

Zoom

Home Genomes Blat Tables Gene Sorter PCR DNA Convert Ensembl NCBI PDF/PS Session Help Navigate along UCSC Genome Browser on Human Mar. 2006 Assembly Assembly organism the genome move <<< << < > >> >>> zoom in 1.5x 3x 10x base zoom out 1.5x 3x 10x and date Search box position/search chr17:35,036,705-35,046,403 jump clear size 9,699 bp. configure Chromosome ideogram Click on base position to zoom in

Click for track description Browser graphic

move start Click on a feature for details. Click on base position to zoom in around cursor. Click gray/blue bars move end < 2.0 > on left for track options and descriptions. < 2.0 > default tracks hide all add custom tracks configure reverse refresh

Use drop-down controls below and press refresh to alter tracks displayed. Tracks with lots of items will automatically be displayed in more compact modes. Reverses the refresh Mapping and Sequencing Tracks display for viewing Base Position Chromosome Band STS Markers FISH Clones Recomb Rate annotations on dense hide hide hide hide the negative strand Map Contigs Assembly Gap Coverage BAC End Pairs hide hide hide hide hide Track Groups Fosmid End Pairs GC Percent Short Match Restr Enzymes hide hide hide hide

refresh Phenotype and Disease Associations

refresh Genes and Gene Prediction Tracks

refresh Visibility mRNA and EST Tracks Controls Human mRNAs Spliced ESTs Human ESTs Other mRNAs Other ESTs dense dense hide hide hide H-Inv UniGene Gene Bounds SIB Alt-Splicing Poly(A) Click here for track hide hide hide hide hide description CGAP SAGE hide

refresh Expression and Regulation

refresh

refresh Variation and Repeats

refresh ENCODE Regions and Genes

Figure 4: Default tracks for the human hg18 (NCBI Build 36.1) assembly at the PPP1R1B gene locus.

6 Fishing for Genes in the UCSC Browser

Window Position Human Mar. 2006 chr17:35,036,705-35,046,403 (9,699 bp) chr17: 35040000 35045000 Genetic Association Studies of Complex Diseases and Disorders A PPP1R1B UCSC Gene Predictions Based on RefSeq, UniProt, GenBank, and Comparative Genomics B PPP1R1B PPP1R1B PPP1R1B PPP1R1B Consensus CDS CCDS11339.1 CCDS11340.1 RefSeq Genes PPP1R1B/NM_032192 PPP1R1B/NM_181505 Mammalian Gene Collection Full ORF mRNAs BC001519 Ensembl Gene Predictions ENST00000357165 ENST00000254079 ENST00000394271 ENST00000394265 ENST00000394267 N-SCAN PASA-EST Gene Predictions C chr17.36.002.a N-SCAN Gene Predictions chr17.36.003.a Exoniphy Human/Mouse/Rat/Dog Exoniphy ACEScan alternative conserved Human-Mouse predictions D PPP1R1B_pred.1 Human mRNAs from GenBank E

Human ESTs That Have Been Spliced Spliced ESTs Reported Poly(A) Sites from PolyA_DB F p.84152.1 Predicted Poly(A) Sites Using an SVM NM_181505.polyA-1 NM_032192.polyA-1 CpG Islands (Islands < 300 Bases are Light Green) G CpG: 71 SwitchGear Genomics Transcription Start Sites CHR17_P0438_R1 CHR17_P0438_R2 Genome Institute of Singapore PET of PolyA+ RNA GIS RNA hES3 Genome Institute of Singapore ChIP-PET p53 HCT116 +5FU STAT1 HeLa +gIF STAT1 HeLa -gIF cMyc P493 H3K4me3 hES3 H3K27me3 hES3 Regulatory elements from ORegAnno OREG0024380 OREG0012111 TargetScan miRNA Regulatory Sites H PPP1R1B:miR-330 I Vertebrate Multiz Alignment & PhastCons Conservation (28 Species) Vertebrate Cons

Chimp Rhesus Mouse Rat Dog Cow Tetraodon Fugu Simple Nucleotide Polymorphisms (dbSNP build 128) J SNPs (128) Repeating Elements by RepeatMasker RepeatMasker

Figure 5: Human hg18 Genome Browser showing multiple annotations for the PPP1R1B gene locus. Using the track controls below the Browser image, the visibility has been set to display a variety of annotation types in the following tracks: (A) Genetic Association Database (GAD) View; (B) UCSC, CCDS, RefSeq, MGC and Ensembl Genes; (C) N-SCAN predictions and ExoniPhy; (D) ACEScan; (E) Human mRNAs and Spliced ESTs; (F) Poly(A); (G) CpG Islands, SwitchGear Transcription Start Site predictions, GIS-PET and OregAnno; (H) TargetScan; (I) Conservation; and (J) SNPs and RepeatMasker repeats. 7 Fishing for Genes in the UCSC Browser gene annotation tracks are also labeled with the 5 Phenotype and Disease one-letter amino acid code (Figure 6). Associations tracks

The Genetic Association Database (GAD) View track[1] has a red rectangle spanning the 4 Annotation details PPP1R1B locus, indicating that this gene has been associated with a disease or condition (Fig- Each track has an associated description page ure 5, Box A). The details page for this item which can be reached either by clicking on the shows that the associated condition is nicotine hyperlinked name above the appropriate track dependence in certain individuals. Links are control below this image or via the blue/gray bar provided to both the GAD database and to the at the side of the track. The track description de- single publication documenting evidence for the tails the methods and data sources used to pro- association. duce the track, validation, credits and citations of relevant publications. Above the description, many tracks have configuration controls specific 6 Gene and Gene Prediction to the track type and some tracks also have fil- ters. These controls allow the user to create the tracks display that best shows the data of interest. Each item in a track also has a details page which 6.1 UCSC Genes may be reached by clicking on an item of inter- The UCSC Genes[15, 18] track (Figure 5, Box est in a track. Instead of configuration controls B) consists of gene models based on data from in the top section of the page, there are further GenBank, RefSeq and UniProt, each of which details about the track item. These details can has a details page that is extremely rich in infor- be extensive and may include: mation about that gene including:

• Links to external databases • Gene descriptions • Confidence scores where applicable • Database cross-reference links • Position information • Alternate gene names • Links to sequence • Genetic Association data • Links to alignment information for aligned • Chemicals interacting with the gene prod- sequences,e.g., GenBank sequences. uct • Selected microarray data • mRNA UTR secondary structure Annotation data can be noisy, so care must • Protein domains and structure be taken when using it to make interpretations. For instance, predictions can contain false pos- • Orthologs from other model organisms itives and experimental data can be erroneous. • (GO) annotations For this reason, it is important to evaluate data • Pathways involving the gene product from different sources in order to make an in- • Information on the gene model formed judgment as to the confidence that anno- tations should be assigned. With this in mind, The UCSC Genes set has four splice vari- examples from the different track groups will be ants at the PPP1R1B locus. Two predicted tran- compared for the PPP1R1B locus. scripts longer and differ in the

8 Fishing for Genes in the UCSC Browser

Stop codon In−frame Met codon Genome sequence Window Position Human Mar. 2006 chr17:35,038,933-35,039,021 (89 bp) chr17: 35038940 35038950 35038960 35038970 35038980 35038990 35039000 35039010 35039020 ---> CCC T CC A T C T CC TT A G A T CC GG C G C A GG A G A CC AA C G CC T G CC A T G C T G TT CC GG C T C T C A G A G C A C T CC T C A CC A GG T A GG CCCC T CC P S I S L D P A Q E T N A C H A V P A L R A L L T R * A P P L P P S P * I R R R R P T P A M L F R L S E H S S P G R P L P Translation in S L H L L R S G A G D Q R L P C C S G S Q S T P H Q V G P S Right−facing arrow showing Genetic Association Studies of Complex Diseases and Disorders three frames PPP1R1B UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics 5’−>3’ direction PPP1R1B/uc002hrz.1 I R R R R P T P A M L F R L S E H S S P E PPP1R1B/uc002hsa.1 I R R R R P T P A M L F R L S E H S S P E of sequence PPP1R1B/uc010cvx.1 I R R R R P T P A M L F R L S E H S S P A PPP1R1B/uc002hsb.1 M L F R L S E H S S P E PPP1R1B/uc002hsc.1 M L F R L S E H S S P E Consensus CDS CCDS11339.1 I R R R R P T P A M L F R L S E H S S P E CCDS11340.1 M L F R L S E H S S P E RefSeq Genes PPP1R1B/NM_032192 I R R R R P T P A M L F R L S E H S S P E PPP1R1B/NM_181505 M L F R L S E H S S P E Mammalian Gene Collection Full ORF mRNAs BC001519 Ensembl Gene Predictions ENST00000254079 ENST00000357165 ENST00000394271 ENST00000394265 ENST00000394267 N-SCAN Gene Predictions chr17.36.003.a I R R R R P T P A M L F R L S E H S S P E N-SCAN PASA-EST Gene Predictions chr17.36.002.a I R R R R P T P A M L F R L S E H S S P E Exoniphy Human/Mouse/Rat/Dog Exoniphy Human mRNAs from GenBank AF464196 AK127542 AK123112 AK129537 AF435972 CR606470 CR621995 AY070271 AF435973 AF435974 AK024593 AK225058 BC001519 BT007076 Human ESTs That Have Been Spliced Spliced ESTs Vertebrate Multiz Alignment & PhastCons Conservation (28 Species)

Vertebrate Cons

Gaps + 2 Human CCC T CC A T C T CC TT A G I R R R R P T P A M L F R L S E H S S P E G T A GG CCCC T CC Chimp CCC T CC A T C T CC TT A G I R R R R P T P A M L F R L S E H S S P E G T A GG CCCC T CC Rhesus CCC T CC T T C T CC TT A G I R R R R P T P A M L F R L S E H S S P E G T A GGG CCC T CC Sequence level view Mouse T CC T CC A T C T C T C T A G I R R R R P T P A M L F R V S E H S S P E G T A GGG T CC T CC Rat T CC C CC A T C T CC C T A G I R R R R P T P A L L F R V S E H S S P E G T A GGG CCC T CC of genome in Dog C T C C CC A T C T CC TT A G I R R R R P T P A M L F R L S E H S S P E G T GGG - CCC T CC Cow C T C C C T G T C T CC T C A G I R R R R P T P A M L F R L S E H S S P E G T GGG CCCC T CC mulitiple alignment Tetraodon ======Fugu ======

Translation of coding exons in the reading frame of hg18 UCSC Genes

Figure 6: Multiple annotation tracks at the human hg18 PPP1R1B gene locus. At this zoom level, both the genome sequence and the amino acid translation in three frames are displayed in the Base Position track at full visibility. On the left side of this track, a right-facing arrow shows that the sequence runs from 50 to 30. Clicking on the arrow reverse-complements the sequence so that it reads from right to left (30 to 5prime). Various gene annotation and alignment tracks show color- coding at this zoom level; in the mRNA and MGC Genes tracks, by default, the codons are shown as alternating light and dark colored rectangles. The UCSC Genes, CCDS and N-SCAN tracks and the multiple alignment section of the Conservation track display genomic codons, by default, with the one-letter amino acids code labeling each codon in the CDS region of each transcript. RefSeq Genes has this feature switched off by default, but here, it is switched on together with a label for each item consisting of its gene name and RefSeq accession. At this base-level view, ATG are codons colored light green and labeled with ”M” to represent potential translation start methionine codons while stop codons are colored red and are labeled with an asterisk ”*”. The Conservation track has been configured to show a subset of species alignments and the Vertebrate conservation scores. These features can be configured using the controls at the top of the description page reached by clicking on the blue/gray mini-button at the left side of the relevant track.

9 Fishing for Genes in the UCSC Browser

size of an internal exon. The other two tran- 6.3 Other Gene and Gene Prediction scripts that encode a short protein differ in the tracks length of the 50 UTR (Figure 5, Box B). Clicking on the first transcript (UCSC Gene uc002hrz.1) The RefSeq[31] transcripts are medium blue in displays the details page which is divided into color signifying their Provisional status. At this a number of collapsible sections. Selected sec- status level, a RefSeq is represented by a sin- tions are shown in the UCSC Genes details gle GenBank source sequence and has not yet page figures: Figure 7 shows sections includ- undergone a full review by annotators. Dur- ing those with gene descriptions and alternate ing graduation to Reviewed RefSeq status, addi- gene names, Figure 8 shows expression data and tional sequence data may be used to modify and pathways involving the gene product and Figure extend the transcript structure and additional bi- 9 shows a section of information about the gene ological annotations may be added. However, model for UCSC Gene uc002hrz.1. users can make their own judgment by evaluat- ing the additional evidence for a gene structure using the other annotation data. The Consensus Coding Sequence (CCDS) track (Figure 5, Box B) is a high quality 6.2 TransMap cross-species set of annotations consisting of a core set of protein-coding regions produced as a collabora- alignments tion among NCBI, UCSC and the Havana and Ensembl groups at the Wellcome Trust Sanger The TransMap[35, 36, 44] track contains cross- Institute (WTSI) and the European Bioinformat- species alignments of GenBank[2] mRNAs and ics Institute (EBI). CCDS represent a consen- Spliced ESTs, RefSeq Genes and UCSC Genes sus between RefSeq annotations (NCBI) and the to the genome. The more sensitive BLASTZ Ensembl and Havana annotations (EBI/WTSI). [32] alignment is used to create a base level At this locus, there are CCDS representing projection of transcript alignments from differ- both splice variants in the RefSeq Genes track ent species onto a genome in order to predict (NM 032192 and NM 181505) and also, all orthologous genes on that genome. Mouse, the predicted UCSC Genes. The two CCDS human and many other vertebrate assemblies (CCDS11339.1 and CCDS11340.1) represent have a TransMap annotation track. For each all the coding regions that are shown in the Gene genome, the vertebrate assemblies with BLASTZ and Gene Prediction group annotation tracks in alignments were selected. For closer evolution- Figure 5, Box B. ary distances, the BLASTZ alignment nets[22] The MGC Genes full-length ORF track has are syntenically filtered to distinguish orthologs one MGC clone (BC001519) (Figure 5, Box B). from paralogs; for more distant species or if MGC clone sequences have been submitted to synteny is difficult to determine, all BLASTZ the GenBank database so the same mRNA also chains[22] are used. In this way, more genes appears in the Human mRNA track. The mRNA can be mapped but with the complication that and EST tracks contain BLAT alignments of ad- some genes are mapped to paralogous regions. ditional transcripts (see section 7 and Figure 5, Post-alignment filtering can remove some of the Box E). duplications. It is this set of chains that are used The Gene and Gene Predictions group of to create a base-level projection of the transcript tracks also includes gene predictions based on alignments to the genome. The resulting pair- mRNAs and ESTs such as Ensembl Genes [9] wise alignments are shown in Figure 10. (Figure 5, Box B), de novo gene prediction

10 Fishing for Genes in the UCSC Browser

(a) Description for UCSC Gene uc002hrz.1

Home Genomes Genome Browser Blat Tables Gene Sorter PCR Session FAQ Help

Human Gene PPP1R1B (uc002hrz.1) Description and Page Index

Description:Home Genomes protein phosphataseGenome 1, Browser regulatory (inhibitor)Blat Tables Gene Sorter PCR Session FAQ Help RefSeq Summary (NM_032192): Midbrain dopaminergic neurons play a critical role in multiple brain functions, and Humanabnormal signalingGene PPP1R1B through dopaminergic (uc002hrz.1) pathways Description has been implicated and Page in several Index major neurologic and psychiatric disorders. One well-studied target for the actions of dopamine is DARPP32. In the densely dopamine- and glutamate- Home Genomes Genome Browser Blat Tables Gene Sorter PCR Session FAQ Help innervatedDescription: rat proteincaudate phosphatase-putamen, DARPP32 1, regulatory is expressed (inhibitor) in medium-sized spiny neurons (Ouimet and Greengard, 1990 RefSeq Summary (NM_032192): Midbrain dopaminergic neurons play a critical role in multiple brain functions, and Human[PubMed 2191086])Gene PPP1R1B that also express(uc002hrz.1) dopamine Description D1 receptors and(Walaas Page and Index Greengard, 1984 [PubMed 6319627]). The functionabnormal of signaling DARPP32 through seems dopaminergic to be regulated pathways by receptor has been stimulation. implicated Both in severaldopaminergic major neurologic and glutamatergic and psychiatric (NMDA) disorders. One well-studied target for the actions of dopamine is DARPP32. In the densely dopamine- and glutamate- Description:receptor stimulation protein regulate phosphatase the extent 1, regulatory of DARPP32 (inhibitor) phosphorylation, but in opposite directions (Halpain et al., 1990 [PubMedinnervated 2153935]). rat caudate Dopamine-putamen, D1DARPP32 receptor is stimulation expressed enhancesin medium cAMP-sized formation, spiny neurons resulting (Ouimet in the and phosphorylation Greengard, 1990 of [PubMedRefSeq Summary 2191086]) (NM_032192): that also express Midbrain dopamine dopaminergic D1 receptors neurons (Walaas play and a critical Greengard, role in 1984multiple [PubMed brain functions, 6319627]). and The abnormalDARPP32 signaling (Walaas through and Greengard, dopaminergic 1984 pathways [PubMed has 6319627]); been implicated phosphorylated in several DARPP32major neurologic is a potent and psychiatricprotein phosphatasefunction of DARPP32-1 (see MIM seems 176875) to be regulatedinhibitor (Hemmingsby receptor stimulation.et al., 1984 [PubMedBoth dopaminergic 6087160]). and NMDA glutamatergic receptor (NMDA) stimulati on receptordisorders. stimulation One well- regulatestudied target the extent for the of actionsDARPP32 of dopamine phosphorylation, is DARPP32. but in Inopposite the densely directions dopamine (Halpain- and et glutamate al., 1990- innervatedelevates intracellular rat caudate calcium,-putamen, which DARPP32 leads to isactivation expressed of in calcine mediumurin- sizedand dephosphorylation spiny neurons (Ouimet of phospho and Greengard,-DARPP32, 1990 thereby reducing[PubMed the2153935]). phosphatase Dopamine-1 inhibitory D1 receptor activity stimulation of DARPP32 enhances (Halpain cAMP et al., formation, 1990 [PubMed resulting 2153935]).[supplied in the phosphorylation by of DARPP32[PubMed 2191086]) (Walaas and that Greengard, also express 1984 dopamine [PubMed D1 6319627]); receptors (Walaas phosphorylated and Greengard, DARPP32 1984 is a[PubMed potent protein 6319627]). The functionOMIM]. of DARPP32 seems to be regulated by receptor stimulation. Both dopaminergic and glutamatergic (NMDA) Strand:phosphatase + Genomic-1 (see MIM Size: 176875) 9700 inhibitorExon Count: (Hemmings 7 Coding et al., Exon 1984 Count:[PubMed 7 6087160]). NMDA receptor stimulation elevatesreceptor intracellularstimulation regulatecalcium, the which extent leads of toDARPP32 activation phosphorylation, of and but dephosphorylation in opposite directions of phospho (Halpain-DARPP32, et al., 1990 thereby reducing[PubMed mRNA the2153935]). phosphatase Secondary Dopamine-1 inhibitory Structure D1 receptor activity ofstimulation of 3' DARPP32 and 5'enhances UTRs (Halpain cAMP et al., formation, 1990 [PubMed resulting 2153935]).[supplied in the phosphorylation by of OMIM].DARPP32Page Index (WalaasSequence and Greengard, and Links 1984UniProt [PubMed Comments 6319627]);Genetic phosphorylated Associations DARPP32CTD is a potentMicroarray protein phosphatasePress "+" in- 1the (see title MIM bar above176875) to inhibitoropen this (Hemmings section. et al., 1984 [PubMed 6087160]). NMDA receptor stimulation Strand:RNA Structure + GenomicProtein Size: Structure(b) 9700 InternalExonOther Count: Species and 7 externalCodingGO Exon Annotations links Count: for 7 PPP1R1BmRNA Descriptions Pathways elevates intracellular calcium, which leads to activation of calcineurin and dephosphorylation of phospho-DARPP32, thereby reducingOther Names the phosphataseModel Information-1 inhibitory activityMethods of DARPP32 (Halpain et al., 1990 [PubMed 2153935]).[supplied by OMIM]. Page ProteinIndex DomainSequence andand Links StructureUniProt InformationComments Genetic Associations CTD Microarray Strand:RNA Structure + GenomicProtein Size: Structure 9700 ExonOther Count: Species 7 CodingGO Exon Annotations Count: 7 mRNA Descriptions Pathways Press Sequence "+" in the titleand bar Links above toto openTools this and section. Databases Other Names Model Information Methods Page Index Sequence and Links UniProt Comments Genetic Associations CTD Microarray Genomic Sequence (chr17:35,036,705-35,046,404) mRNA (may differ from genome) Protein (204 aa) RNA OrthologousStructure Protein Genes Structure in OtherOther Species Species GO Annotations mRNA Descriptions Pathways Gene Sequence Sorter andGenome Links Browser to ToolsProteome and Databases Browser VisiGene Table Schema CGAP OtherPress "+"Names in the Modeltitle bar Information above to openMethods this section. Ensembl Gene ExonPrimer GeneCards Gepis Tissue H-INV Genomic Sequence (chr17:35,036,705-35,046,404) mRNA (may differ from genome) Protein (204 aa) HGNC HPRD HuGE Jackson Labs OMIM PubMed Gene GeneSorter OntologyGenome (GO) Browser AnnotationsProteome Browserwith StructuredVisiGene VocabularyTable Schema CGAP Stanford Sequence SOURCE andTreefam Links to ToolsUniProt and Databases EnsemblMolecular Function:Entrez Gene ExonPrimer GeneCards Gepis Tissue H-INV Genomic Sequence (chr17:35,036,705-35,046,404) mRNA (may differ from genome) Protein (204 aa) HGNCGO:0004860 proteinHPRD kinase inhibitor activityHuGE Jackson Labs OMIM PubMed Gene Comments Sorter andGenome Description Browser ProteomeText from Browser UniProtVisiGene (Swiss-Prot/TrEMBL)Table Schema CGAP StanfordGO:0004864 SOURCE proteinTreefam phosphatase inhibitorUniProt activity Ensembl Entrez Gene ExonPrimer GeneCards Gepis Tissue H-INV ID:BiologicalIPPD_HUMAN Process: DESCRIPTION:HGNC DopamineHPRD - and cAMPHuGE-regulated neuronalJackson phosphoprotein Labs OMIM (DARPP-32). PubMed GO:0007165 Comments signal and transduction Description Text from UniProt (Swiss-Prot/TrEMBL) FUNCTION:Stanford SOURCE InhibitorTreefam of protein(c)-phosphatase UniProtUniProt 1. Description for PPP1R1B SUBCELLULAR LOCATION: Cytoplasm. Cellular Component: PTM:ID: IPPD_HUMAN Dopamine- and cyclic AMP-regulated neuronal phosphoprotein. DESCRIPTION:GO:0005737 cytoplasm Dopamine- and cAMP-regulated neuronal phosphoprotein (DARPP-32). PTM: Comments Phosphorylation and of Description Thr-34 is required Text for from activity UniProt (By similarity). (Swiss-Prot/TrEMBL) SIMILARITY:FUNCTION: Inhibitor Belongs of to protein the protein-phosphatase phosphatase 1. inhibitor 1 family. SUBCELLULAR LOCATION: Cytoplasm. PTM:ID: IPPD_HUMAN Dopamine- and cyclic AMP-regulated neuronal phosphoprotein. PTM:DESCRIPTION: Genetic Descriptions Phosphorylation Association Dopamine fromof Thr- - all34andStudies is associated cAMPrequired of-regulated forComplex activityGenBank neuronal (By Diseases similarity). mRNAsphosphoprotein and Disorders (DARPP -32). SIMILARITY:FUNCTION: Inhibitor Belongs of to protein the protein-phosphatase phosphatase 1. inhibitor 1 family. GeneticSUBCELLULARPress "+" Association in the title LOCATION: bar Database: above to PPP1R1BopenCytoplasm. this section. CDCPTM: HuGE Dopamine Published- and cyclic Literature: AMP-regulated PPP1R1B neuronal phosphoprotein. PositivePTM: Genetic Phosphorylation Disease Association Associations: of Thr- 34Studies isnicotine required of smoking forComplex activity behavior (By Diseases similarity). and Disorders Biochemical and Signaling Pathways RelatedSIMILARITY: Studies: Belongs to the protein phosphatase inhibitor 1 family. Genetic Association Database: PPP1R1B CDCBioCarta HuGE from Published NCI Cancer Literature: Genome PPP1R1B Anatomy Project h_ck1Pathway1. nicotine smoking- Regulation behavior of ck1/cdk5 by type 1 glutamate receptors Positive Genetic Disease Association Associations: Studies nicotine(d)Alternate of smoking Complex behavior names Diseases for andPPP1R1B Disorders Relatedh_fosbPathwayBeuten, Studies: J. et- FOSBal. 2006, gene Association expression analysis and drug of theabuse protein phosphatase 1 regulatory subunit 1B (PPP1R1B) gene with Geneticnicotine Association dependence Database: in European PPP1R1B- and African-American smokers, Am J Med Genet B Neuropsychiatr Genet 2006. CDC HuGE Published Literature: PPP1R1B 1. nicotine smoking behavior Positive Other Disease Names Associations: for This nicotine Gene smoking behavior RelatedBeuten, Studies: J. et al. 2006, Association analysis of the protein phosphatase 1 regulatory subunit 1B (PPP1R1B) gene with Alternatenicotine Gene dependence Symbols: in EuropeanDARPP32,- and IPPD_HUMAN, African-American NM_032192, smokers, Am NP_115568, J Med Genet Q9H7G1, B Neuropsychiatr Q9UD71 Genet 2006. UCSC1. nicotine ID: uc002hrz.1 smoking behavior RefSeq Accession: NM_032192 Protein:Beuten,Q9UD71 J. et al. (aka 2006, IPPD_HUMAN) Association analysis of the protein phosphatase 1 regulatory subunit 1B (PPP1R1B) gene with CCDS:nicotineCCDS11339.1 dependence in European- and African-American smokers, Am J Med Genet B Neuropsychiatr Genet 2006.

Figure 7: Description, Gene Model links Information and gene names from the details page for the human gene PPP1R1B UCSC Genes RefSeq Genes (uc002hrz.1) incategory: the coding nonsenseset. (a)-mediatedshows-decay: theno RNA accession: NM_032192.2description for the gene. (b) shows links to theexon sectionscount: on the7 CDS details single in page.3' UTR: Theno RNA next size: section contains1841 links to various tools and databases bothORF atsize: UCSC and615 externally.CDS single in :(c) showsno Alignment a section % ID: containing100.00 the UniProt/SwissProt txCdsPredict score: 1416.67 frame shift in genome: no % Coverage: 100.00 database description for the protein encoded by this gene. The UniProt ID (IPPD HUMAN) links has start codon: yes stop codon in genome: no # of Alignments: 1 to the UniProt database entry for a protein isoform produced by this gene. The UniProt record also has end codon: yes retained intron: no # AT/AC introns Generated0 by www.PDFonFly.com has information about other protein isoforms represented in the database. (d) shows a section of alternate names for the PPP1R1B gene, some of11 which have links to external databases. Fishing for Genes in the UCSC Browser

Comparative Toxicogenomics Database (CTD)

Press "+" in the title bar above to open(a) this Microarray section. data for PPP1R1B

Microarray Expression Data

Expression ratio colors: red high/green low Submit

GNF Expression Atlas 2 Data from U133A and GNF1H Chips

Ratios Absolute

mRNA Secondary Structure of 3' and 5' UTRs

Press "+" in the title bar above to open this section.

Ratios Protein Domain and Structure Information Absolute Press "+" in the title bar above to open this section.

Orthologous Genes in Other Species

Press "+" in the title bar above to open this section.

Gene Ontology (GO) Annotations with Structured Vocabulary

Molecular Function: Ratios GO:0004860 protein kinase inhibitor activity Absolute GO:0004864 protein phosphatase inhibitor activity Affymetrix All Exon Microarrays Biological Process:

GO:0007165 signal transduction

Cellular Component: GO:0005737 cytoplasm Ratios Absolute

Descriptions from all associated GenBank mRNAs

Press "+" in the title bar above to open(b) this Pathways section. involving PPP1R1B

Biochemical and Signaling Pathways

BioCarta from NCI Cancer Genome Anatomy Project h_ck1Pathway - Regulation of ck1/cdk5 by type 1 glutamate receptors h_fosbPathway - FOSB gene expression and drug abuse

Figure 8: Expression Other Names data for and This pathways Gene sections from the details page for the human gene Alternate Gene Symbols: DARPP32, IPPD_HUMAN, NM_032192, NP_115568, Q9H7G1, Q9UD71 PPP1R1B (uc002hrz.1UCSC ID: uc002hrz.1) in the UCSC Genes set. (a) shows heatmap displays of microarray ex- RefSeqPPP1R1B Accession: NM_032192 pression data forProtein: Q9UD71 (akain IPPD_HUMAN) a variety of tissues and cell lines. The upper dataset is from the Genomics InstituteCCDS: ofCCDS11339.1 the Novartis Research Foundation (GNF) (http://symatlas.gnf.org) using Affymetrix chips. The lower dataset was provided by Affymetrix (http://www.affymetrix.com) Gene Model Information and it was produced using Affymetrix Human Exon 1.0 ST arrays. (b) is the pathways section which has links tocategory: the BioCartacoding pathwaysnonsense-mediated involving-decay:12 no theRNAPPP1R1B accession: NM_032192.2gene product. exon count: 7 CDS single in 3' UTR: no RNA size: 1841 ORF size: 615 CDS single in intron: no Alignment % ID: 100.00 txCdsPredict score: 1416.67 frame shift in genome: no % Coverage: 100.00

has start codon: yes stop codon in genome: no # of Alignments: 1

has end codon: yes retained intron: no # AT/AC introns Generated0 by www.PDFonFly.com Other Names for This Gene

Alternate Gene Symbols: DARPP32, IPPD_HUMAN, NM_032192, NP_115568, Q9H7G1, Q9UD71 UCSC ID: uc002hrz.1 RefSeq Accession: FishingNM_032192 for Genes in the UCSC Browser Protein: Q9UD71 (aka IPPD_HUMAN) CCDS: CCDS11339.1

Gene Model Information

category: coding nonsense-mediated-decay: no RNA accession: NM_032192.2 exon count: 7 CDS single in 3' UTR: no RNA size: 1841 ORF size: 615 CDS single in intron: no Alignment % ID: 100.00 txCdsPredict score: 1416.67 frame shift in genome: no % Coverage: 100.00 has start codon: yes stop codon in genome: no # of Alignments: 1 has end codon: yes retained intron: no # AT/AC introns 0 selenocysteine: no end bleed into intron: 0 # strange splices: 0 Click here for a detailed description of the fields of the table above.

Figure 9: Gene model Methods, information Credits, and sectionUse Restrictions from the UCSC Genes details page for the human gene PPP1R1B (uc002hrz.1Click here for). details This on how section this gene model shows was made that and datauc002hrz.1 restrictions if any. is protein-coding, has seven exons and has an open reading frame (ORF) of 615 bp among other features.

tracks such as N-SCAN, and exon prediction exons towards the 30 end of the gene (Figure 5). tracks such as ExoniPhy (Figure 5, Box C). ESTs can therefore be used to predict the exis- EXONIPHY uses conservation among human, tence of additional splice forms not represented mouse, rat and dog to identify putative protein- by mRNAs. Sequence data can be noisy; in par- coding exons. Computational gene predictions ticular, ESTs tend to have low sequence quality can provide validation for the gene and tran- and were generated by single-pass sequencing. script structures produced by manual curation Therefore, care should be taken in interpreting efforts such as those in the RefSeq Genes and these data. One caveat is that differences be- Vega Genes[41] tracks. tween transcripts and the reference genome may not be noise, but simply genetic variation be- tween individuals. The SNPs track can indicate 7 GenBank mRNA and EST such instances (Figure 5, Box J and Figure 11). The SNPs track may display no known SNPs sequence alignments from dbSNP[33] at the position of the nonsyn- onymous codon. It may be that this is a SNP The mRNA and EST Tracks group (Figure 5, that has not yet been discovered or submitted Box E) show sequences from GenBank that to dbSNP or it couldGenerated be dueby www.PDFonFly.com a sequencing er- align well to the genome using BLAT. In Fig- ror resulting in an incorrect base call. Without ure 5, these tracks are compacted; the Human evidence of a SNP or viewing the sequencing mRNAs track is shown with the visibility set to quality scores, it is impossible to determine the squish and the Spliced ESTs track is in dense origin of this base change. mode (see the paragraph on track visibility in There are configurable signals in the align- section 3). The transcript sequence alignments ment displays for the mRNAs and ESTs track are regularly updated to keep in synchrony with that denote certain features in the sequences and GenBank; mRNA and RefSeq Genes updates are may aid in the identification of noise in the se- daily and ESTs tracks are updated weekly. The quences: mRNA and EST transcripts may suggest the ex- istence of additional exons and therefore addi- • By default, the mRNAs tracks display red tional transcript variants than those found in the or yellow lines in aligned blocks where a gene and gene prediction tracks. The Spliced base difference between a transcript and the ESTs track shows that there are ESTs at the reference genome results in a nonsynony- PPP1R1B locus that have two additional small mous codon (Figures 11(b) and (c). At

13 Fishing for Genes in the UCSC Browser

Window Position Human Mar. 2006 chr17:35,036,705-35,046,403 (9,699 bp) chr17: 35038000 35039000 35040000 35041000 35042000 35043000 35044000 35045000 35046000 UCSC Genes Based on RefSeq, UniProt, GenBank, CCDS and Comparative Genomics PPP1R1B/uc002hrz.1 PPP1R1B/uc002hsa.1 PPP1R1B/uc010cvx.1 PPP1R1B/uc002hsb.1 PPP1R1B/uc002hsc.1 RefSeq Genes PPP1R1B/NM_032192 PPP1R1B/NM_181505 Mammalian Gene Collection Full ORF mRNAs BC001519 TransMap UCSC Gene Mappings A Mouse uc007lfz.1 Mouse uc007lga.1 Mouse uc007lgb.1 TransMap RefSeq Gene Mappings B Rat NM_138521.1 Mouse NM_144828.1 TransMap GenBank mRNA Mappings C Rat AY601872.1 Mouse BC026568.1 Rat BC078954.1 Mouse BC011122.1 Mouse BC031129.1 Marmoset DQ535487.1 Mouse AB221544.1 Mouse AY640620.1 Mouse AY640621.1 Mouse AK005627.1 Mouse AK141563.1 Mouse AF281662.1 Rhesus AF233348.1 Rat AF281661.1 Ensembl Gene Predictions ENST00000254079 ENST00000357165 ENST00000394271 ENST00000394265 ENST00000394267 Human mRNAs from GenBank AF464196 AK127542 AK123112 AK129537 AF435972 CR606470 CR621995 AY070271 AF435973 AF435974 AK024593 AK225058 BC001519 BT007076 BX648641 AK123950 AK124417 AF233349 BC118070 BC122554 Human ESTs That Have Been Spliced Spliced ESTs

Figure 10: TransMap track at the human hg18 PPP1R1B gene locus. (A) UCSC Genes; (B) RefSeq Genes and (C) GenBank mRNAs alignments from other species are projected on to the human genome via BLASTZ alignment chains and nets. The TransMap ESTs and human mRNAs are shown in squish visibility mode since there are large numbers of these transcripts. For the same reason, human Spliced ESTs are shown with dense visibility. Colored lines on the alignments represent nonsynonymous codons in aligned transcripts compared to the reference genome sequence; red signifies that amino acids differ in physicochemical properties and yellow signifies similar amino acids. TransMap alignments show that the exons towards the 30 end of the PPP1R1B gene appear to be fast evolving in human whereas those at the 50 end are evolving at a slower rate.

14 Fishing for Genes in the UCSC Browser

(a) Full view of mouse Vill gene with nonsynonymous base changes in MGC and mRNA tracks

Window Position Mouse July 2007 chr9:118,961,920-118,980,444 (18,525 bp) chr9: 118965000 118970000 118975000 118980000 RefSeq Genes Vill/NM_011700 Mammalian Gene Collection Full ORF mRNAs BC021808 Nonsynonymous base BC022664 substitution changes Mouse mRNAs from GenBank AK028229 colored red and yellow AK166823 BC021808 AJ344341 BC022664 AK166652 AK188903 U72681 Simple Nucleotide Polymorphisms (dbSNP build 128) SNPs (128)

(b) Exon 5 showing genomic codons and known SNPs resulting in nonsynonymous base changes

Window Position Mouse July 2007 chr9:118,969,225-118,969,800 (576 bp) Nonsynonymous chr9: 118969300 118969400 118969500 118969600 118969700 ---> base substitutions

coincide with SNPs RefSeq Genes Vill/NM_011700 Mammalian Gene Collection Full ORF mRNAs BC021808 Mouse mRNAs from GenBank AK028229 AK166823 BC021808 AJ344341 Simple Nucleotide Polymorphisms (dbSNP build 128) SNPs (128)

Nonsynonymous SNPs from dbSNP (NCBI build 128)

(c) Base-level view of exon 5 showing amino acid substitutions caused by nonsynonymous base changes

Nonsynonymous base substitution changes Arg to Lys Window Position Mouse July 2007 chr9:118,969,487-118,969,550 (64 bp) chr9: 118969500 118969510 118969520 118969530 118969540 ---> T G C T C T C AA G TTT G C GG A G A CC AA C A T G T A C AA T G T CC A G C G A C T G C TT C A C A T C A GGGG AA GG L L S S L R R P T C T M S S D C F T S G E G C S Q V C G D Q H V Q C P A T A S H Q G K E Nonsynonymous base S A L K F A E T N M Y N V Q R L L H I R G R RefSeq Genes substitution changes Vill/NM_011700S A L K F A E T N M Y N V Q R L L H I R G R Mammalian Gene Collection Full ORF mRNAs Phe to Leu BC021808 L K Mouse mRNAs from GenBank AK028229 AK166823 BC021808 L K AJ344341 K Simple Nucleotide Polymorphisms (dbSNP build 128) SNPs (128)

Known SNPs from dbSNP (NCBI build 128)

Figure 11: Genome Browser transcript rendering modes: Non-synonymous base changes. Various signals can be incorporated into the track displays that can aid in identifying noise in the data. The mouse (mm9, NCBI Build 37 assembly) Vill gene is shown at different zoom levels. Colored tick marks (zoomed out) or rectangles (base level view) represent nonsynonymous amino acid changes due to substitutions in MGC Genes and mRNA transcripts compared to the reference genome. Similar amino acids are colored yellow and amino acids that differ in physicochemical properties are colored red. Zooming in to a specific region can be achieved by clicking at the top of the Base Position track or using the zoom controls. (a) shows the entire Vill gene. (b) shows exon 5 with genomic codons appearing in the Base Position track and the RefSeq Genes, MGC and mRNA tracks. The nonsynonymous base changes in BC021808 and in the AJ344341 are due to known polymorphisms colored red in the SNP track. (c) shows the base level view of exon 5.

15 Fishing for Genes in the UCSC Browser

the base level display, the nonsynonymous for transcription start sites (TSS) (Eponine codons in the transcripts display the one TSS [8] and SwitchGear TSS (http://www. letter amino acid codes (Figure 11(c)). switchgeargenomics.com/) tracks). In Figure • By default, the ESTs tracks display red 5, Box G, the SwitchGear Genomics TSS track lines in aligned blocks showing a base dif- predicts the transcription start sites of both the ference between a transcript and the refer- longer and shorter splice variants in the Ref- ence genome; this signal can also be turned Seq Genes and Ensembl Genes tracks (Box B). on for mRNAs and MGC Genes. Experimental data such as ditags can be used • Vertical blue lines indicate an insertion at to support and verify the 50 and 30 ends of a the beginning or the end of a transcript rela- transcript. Gene Identification Signature Paired- tive to the reference genome (Figure 12(a)). End Tags (GIS-PET) [5, 6, 30, 40, 43] in- 0 0 • Green lines indicate the presence of a volves the sequencing of 5 and 3 signatures polyA tail at the 30 end of a transcript (Fig- of full-length cDNAs that are subsequently con- ure 12(a)). catenated to form ditags, sequenced and then • Orange lines indicate insertions in the mid- mapped to the genome of origin to mark the dle of a transcript relative to the reference boundaries of the transcripts (Figure 5, Box genome (Figure 12(b)). G and Figure 14). Ditags offer a much more • Double horizontal lines indicate that both efficient way of obtaining such data than tra- the genome and the transcript have an in- ditional cDNA sequencing with the limitation sertion. This may be due to poor sequence that the internal exon structure is not deter- quality in a subregion of the transcript (Fig- mined. GIS-PET data from human embryonic ures 12(a) and (b)). stem cells (hES3) confirm the existence of tran- scripts whose 50 end is coincident with both the Due to technical reasons, cloned sequences 50 end of the longer and shorter PPP1R1B tran- are often incomplete especially at the ends of scripts (NM 181505) and with the 50 end of the UTRs. Therefore, it is difficult to determine some of the longer mRNAs (Figure 14). At the whether differences in UTR length are due to 30 end, there are ditags that are coincident the ”real” variation between transcripts. Genomic the 30 end of the RefSeq transcripts and many data displayed in the Genome Browser can help mRNA transcripts of the PPP1R1B gene (Fig- the user to make an informed decision regarding ure 14) as well as the computationally predicted the completeness of mRNAs. A vertical green SwitchGear TSSs. line at the 30 end of a transcript indicates the presence of a polyA tail, confirming that the se- quence is complete at the 30 end. Both reported 8 Conservation and and predicted polyadenylation (polyA) sites are shown in the Poly(A) track[4, 24]; data in this regulation data track may suggest alternative polyadenylation sites whose use would result in variation of the The Conservation track shows a 28-way mul- 30 UTR length (Figure 5, Box F). For PPP1R1B, tiple alignment of vertebrate genomes[29] cre- both predicted and reported sites are in agree- ated using the MULTIZ alignment program[3] ment and coincide with the 30 ends of the tran- and a histogram (wiggle type track) of conser- scripts at this locus. vation scores[34] associated with the alignment The completeness of the 50 UTR is more (Figure 5, Box I). Conservation tends to peak in difficult to assess. To aid in this evalua- coding regions of the gene and falls off in non- tion, there are computationally predicted sites coding parts (introns and intergenic regions) so

16 Fishing for Genes in the UCSC Browser

(a) Insertions at the end of transcripts and polyA tails

Window Position Human Mar. 2006 chr17:35,045,550-35,046,460 (911 bp) chr17: 35046000 --->

RefSeq Genes PPP1R1B/NM_032192 PPP1R1B/NM_181505 Mammalian Gene Collection Full ORF mRNAs BC001519 Human mRNAs from GenBank AF464196 AK127542 AK123112 PolyA tails AK129537 AF435972 CR606470 CR621995 AY070271 AF435973 AF435974 Insertion at AK024593 AK225058 end of mRNA BC001519 BT007076 BX648641 AK123950 AK124417 BC118070 BC122554

(b) Internal transcript insertions

Mouse July 2007 chr1:158,357,727-158,362,861 (5,135 bp) Window Position 158362000 158361000 158360000 158359000 :chr1 RefSeq Genes Soat1/NM_009230 Mammalian Gene Collection Full ORF mRNAs BC083092 Mouse mRNAs from GenBank L42293 AK018442 AK142486 DQ903181 BC083092 AK087210 AK052299 Internal insertions AK159393 AK159529 in transcripts AK160017 S81092 U06668

Figure 12: Genome Browser transcript rendering modes: Insertions and polyA tails. Various signals can be incorporated into the track displays for GenBank mRNAs and ESTs that can aid in identifying noise in the data. (a) shows the human PPP1R1B gene locus with vertical, colored lines at the 30 ends of mRNAs; blue indicates insertions (at the 50 or 30 transcript end) in the transcript alignment relative to the genome and green lines indicate the presence of polyA tails. (b) shows the mouse (mm9, NCBI Build 37 assembly) Soat1 gene locus showing orange lines in aligned mRNAs which indicate insertions that occur in the middle of the mRNA sequences in their alignments to the genome. Note that Soat1 is on the reverse strand and the image has been reversed using the reverse button below the Genome Browser graphic (see Figure 4) so the gene can be viewed in the 50 to 30 orientation. Track element labels are now shown on the right side of the image.

17 Fishing for Genes in the UCSC Browser

(a) Insertion in both transcript and genome

Genome and transcript both have insertions

Human Mar. 2006 chr15:43,267,047-43,267,092 (46 bp) Window Position 43267090 43267085 43267080 43267075 43267070 43267065 43267060 43267055 :chr15 C C G C C C T C G C C A C C C C T G T C G C C C C C G G A C C C A C G C C G C C C C C G C G ---> P P S P P L S P P D P R R P R R R P R H P C R P R T H A A P A A A L A T P V A P G P T P P P R Consensus CDS CCDS10120.2 RefSeq Genes SHF/NM_138356 Human ESTs That Have Been Spliced BX446958 BX446025 AL529700 BX438306 BX332806 BX423813 BX352799 BX421607 BX457975 BI915484 DA061244 BX442519 BX441951 DA065885 DA017657 DC412792 DC396888

(b) EST sequence indicating sequence that is unalignable to the genome

EST sequence unaligned to human hg18 genome sequence

Figure 13: Genome Browser transcript rendering modes: Insertions in both sides of the alignment. Various signals can be incorporated into the track displays for GenBank mRNAs and ESTs that can aid in identifying noise in the data. (a) shows the human (hg18, NCBI Build 36.1) SHF gene locus at a position where the human EST, AL529700, has double horizontal lines between the black rectangles representing sequence aligned to the genome. The double lines indicate that there are insertions in the alignment in both the EST sequence and the genome. This is shown in (b) where the AL529700 sequence is blue if it aligns to the genome where black text shows unaligned regions. The unaligned sequence in the red box corresponds to the double lines in image in (a) where it can also be noted that the genome sequence in this region is different from that in the human EST sequence whereas the flanking regions from both sequences are similar, hence the insertion in both transcript and genome in the . This region of the EST is likely to be an erroneous sequence as a result of poor quality sequencing.

18 Fishing for Genes in the UCSC Browser

Window Position Human Mar. 2006 chr17:35,036,400-35,038,600 (2,201 bp) chr17: 35037000 35037500 35038000 35038500 UCSC Gene Predictions Based on RefSeq, UniProt, GenBank, and Comparative Genomics PPP1R1B PPP1R1B PPP1R1B PPP1R1B RefSeq Genes PPP1R1B/NM_032192 PPP1R1B/NM_181505 Mammalian Gene Collection Full ORF mRNAs BC001519 N-SCAN PASA-EST Gene Predictions chr17.36.002.a N-SCAN Gene Predictions chr17.36.003.a Human mRNAs from GenBank AF464196 AK127542 AK123112 AK129537 AF435972 CR606470 CR621995 AY070271 AF435973 AF435974 AK024593 AK225058 BC001519 Human ESTs That Have Been Spliced Spliced ESTs CpG Islands (Islands < 300 Bases are Light Green) CpG: 71 Genome Institute of Singapore PET of PolyA+ RNA (embryonic stem cell hES3) 290659-1-1 292622-1-5 332791-1-1 353951-1-4 413608-1-1 406135-1-1 256264-1-1 532454-3-1 401956-1-1 433582-1-3 425529-1-2 5' PET tags 36150-1-10 supporting 184616-1-3 96706-1-1 AK127542 and 171432-1-2 369073-1-1 other long mRNAs 197364-1-4 170059-1-5 518216-1-1 220273-1-1 455226-1-1 424563-1-1 393455-1-1 5109-1-7

5' PET tags supporting AY070271 and other shorter mRNAs

Figure 14: PPP1R1B locus showing the GIS-PET PolyA+ RNA ditags from the human embryonic cell line (hES3). The GIS-Pet (Gene Identification Signature Paired End Ditags) RNA track can be considered in evaluating the 50 completeness of transcripts. By clicking on the zoom buttons above the Browser image or by clicking above the base numbers at the top of the image, a zoomed-in genome view can be created. The GIS-Pet RNA track shows ditags from the 50 and 30 ends of full-length polyA+ mRNA. In this view, it can be seen that there are ditags whose 50 end coincide with the 50 end of some of the longer mRNAs e.g. AK127542 and AK123112 and ditags that coincide with the 50 end of the shorter mRNAs, e.g., AY070271 and AF435973 and the RefSeq, NM 181505. There is a longer mRNA (AF464196) and ESTs that have been used to extend the RefSeq, NM 181505, further upstream, but there are no ditags that represent this transcript.

19 Fishing for Genes in the UCSC Browser

it is a strong signal for a protein-coding gene TargetScan[12, 25, 26] predicts the presence (Figure 15). Conservation of sequence implies of a microRNA binding site in a conserved re- functional significance and can also occur where gion at the 30 end of the transcript (Figure 5, Box there are regulatory elements in the genome. H). The prediction is based partially on conser- The ACEScan[42] track predicts conserved vation so the TargetScan annotation and conser- alternative exons that are present in some tran- vation are not independent evidence of a regula- scripts and skipped by others in both human and tory region. mouse (Figure 5, Box D). Enrichment of splic- The Conservation track shows peaks of con- ing regulatory motifs occurs in intron regions servation that correspond to the coding sequence close to alternative exons, which also show a of the gene (Figure 5, Box I). Conservation falls greater degree of conservation than those close off at the exon/intron boundaries as illustrated to constitutive exons. ACEScan uses this infor- in Figures 15(a) and (b) which show a close-up mation to predict the constitutive exon that is view of the 50 and 30 ends of the PPP1R1B gene. skipped in the human mRNA, AK129537. At this zoom level, by default, the Conser- Transcriptional regulatory elements tend to be vation track shows the nucleotide sequence of enriched near the first exon [27]. Evidence of the aligned genomes, and, in coding regions such motifs are the CpG island [10] at the 50 end based on the longest UCSC Genes transcript at of the PPP1R1B gene locus and the SwitchGear this locus, the codon translation can be seen for TSS prediction[7, 39] which is color-coded ac- each of the genomes. This enables the user cording to confidence level (a darker color im- to see not only the conservation at the amino plies a higher score) (Figure 5, Box G). The acid level but also where there are differences at ORegAnno[11] track displays hand-curated reg- the amino acid level between proteins encoded ulatory regions extracted from the literature by orthologous protein-coding genes. In Figure (Figure 5, Box G). The darker green rectangle 15(b), it is possible to see that there is a SNP represents a regulatory region, located in an in- (rs35797948) in the SNPs (128) track which is tron of the PPP1R1B, bound by the CCCTC- colored red, indicating a nonsynonymous mu- binding factor (zinc finger protein) (CTCF) tran- tation in the coding region. Clicking on the scription factor. The lighter green item be- SNP in this track displays further information low represents the actual location of the CTCF about this SNP and a link out to the entry for binding site. This binding site was determined the SNP in dbSNP at NCBI (http://www.ncbi. by ChIP (Chromatin ImmunoPrecipitation)-chip nlm.nih.gov/SNP/). This reveals that there are experiments (details are found by clicking on two known alleles (A/G) which code for either these track items). Histone modifications and an arginine (CGC codon) or a histidine (CAC) multiple transcription factor binding sites for a amino acid in the translation of this last coding variety of cell types are shown in the GIS-PET exon of PPP1R1B. Histidine and arginine are track (the GIS-PET method is described in sec- both positively charged making this a conserva- tion 7). Tri-methylation of lysine4 and lysine27 tive substitution; histidine is an aromatic struc- on histone H3 is indicated at the 50 end of the ture. PPP1R1B gene (Figure 5, Box G). Such sig- nals for regulatory elements may be misleading; CpG islands are frequently found in or near pro- 9 Ordering an MGC clone moters of genes but not all genes have them, TSS predictions may contain false positives and Having used the Genome Browser to explore the transcription factor binding site measurements PPP1R1B gene, the user may now desire to or- can be noisy. der a clone for this gene for experimental re-

20 Fishing for Genes in the UCSC Browser

(a) 50 view

Window Position Human Mar. 2006 chr17:35,038,914-35,039,613 (700 bp) chr17: 35039000 35039050 35039100 35039150 35039200 35039250 35039300 35039350 35039400 35039450 35039500 35039550 35039600 --->

UCSC Gene Predictions Based on RefSeq, UniProt, GenBank, and Comparative Genomics PPP1R1B PPP1R1B PPP1R1B PPP1R1B Consensus CDS CCDS11339.1 CCDS11340.1 RefSeq Genes PPP1R1B/NM_032192 PPP1R1B/NM_181505 Mammalian Gene Collection Full ORF mRNAs BC001519 Vertebrate Multiz Alignment & PhastCons Conservation (28 Species)

Vertebrate Cons

Chimp Rhesus Mouse Rat Dog Cow Tetraodon Fugu Simple Nucleotide Polymorphisms (dbSNP build 128)

Conservation falls off at exon/intron boundaries

(b) 30 view

Window Position Human Mar. 2006 chr17:35,045,192-35,045,891 (700 bp) chr17: 35045250 35045300 35045350 35045400 35045450 35045500 35045550 35045600 35045650 35045700 35045750 35045800 35045850 --->

UCSC Gene Predictions Based on RefSeq, UniProt, GenBank, and Comparative Genomics PPP1R1B PPP1R1B PPP1R1B PPP1R1B Consensus CDS CCDS11339.1 CCDS11340.1 RefSeq Genes PPP1R1B/NM_032192 PPP1R1B/NM_181505 Mammalian Gene Collection Full ORF mRNAs BC001519 Vertebrate Multiz Alignment & PhastCons Conservation (28 Species)

Vertebrate Cons

Chimp Rhesus Mouse Rat Dog Cow Tetraodon Fugu Simple Nucleotide Polymorphisms (dbSNP build 128) rs7212852 rs34257414 rs35797948 rs734645

Synonymous Non−synonymous Untranslated SNPs

Figure 15: Zoomed in view of the 50 and 30 ends of the PPP1R1B gene. Conservation is high in the exons in the coding region and falls off at the boundaries of exons.

21 Fishing for Genes in the UCSC Browser

search. An MGC clone can be ordered by fol- of this tutorial but there is extensive documenta- lowing links from the Genome Browser to ven- tion to help users to navigate use of the Genome dors. The MGC Genes track shows the align- Browser and its integrated tools. To read the ment of one full-length MGC clone (BC001519) documentation, click on the Help link on the top for PPP1R1B. As mentioned previously (section blue menu bar found on the Genome Browser 6.3), it is also shown in the Human mRNAs track website. Links are also provided on the user in- since MGC clones are in the GenBank database terface for each tool. Additionally, questions re- (Figure 5, Box B). garding the website and Genome Browser use A click on the alignment for the BC001519 are welcome. Users may search the mailing list sequence in the MGC Genes track takes the archives (http://genome.ucsc.edu/contacts.html) user to the details page for this MGC transcript and may also send questions via e-mail to the (Figure 16). The details page displays a gene mailing list: [email protected]. description, the RefSeq accession and RefSeq description. Additionally, there is information about the clone, links for downloading protein, 11 FAQ mRNA and genomic sequences, the alignment to the human reference genome sequence, NCBI Question 1: How do I do a batch search for all clone validation information and links to various the genes that lie in a specified region of a hu- external databases including an MGC clone vali- man chromosome? dation report and links to the MGC website. The Answer 1: The Table Browser is an ex- CDS annotation of the MGC clone is frozen, but tremely useful tool for querying the database Genome Browser the RefSeq transcripts are continually being up- tables that underlie the . The dated by NCBI manual annotators as more tran- Table Browser can be reached by clicking on UCSC script and other experimental evidence becomes the Tables link on the top blue bar of the Genome Bioinformatics available. The RefSeq CDS similarity table on web pages. To retrieve UCSC Genes the MGC clone details page shows users how a set of genes from the set, these the RefSeq annotations differ to that of the MGC steps may be followed: clone. 1. Make sure that the correct assembly is se- To order a clone, the user should follow the lected. For the hg18 human assembly first link in the Links box, Order MGC clone, (NCBI Build 36.1), select Vertebrate as the which directs the user to a portal where clone clade, Human as the genome and Mar. distributors are listed. In some cases, there is 2006 as the assembly. a direct ordering link to facilitate the ordering process as seen in Figure 17. 2. For the group, select Genes and Gene Pre- diction Tracks and for the track, select UCSC Genes. 10 Summary 3. Select knownGene as the table. The Genome Browser is a very effective tool 4. Select position. To find genes in a re- for the integration and analysis of biological gion of the chromosome, type the genomic data in a genomic context. It provides an easy location in the text box in the format, method of rapidly locating an MGC clone for chr1:10000-100000. This example will a gene of interest with direct links for order- find genes on chromosome 1 between base ing the clone. Many tools are built into the 10,000 and base 100,000. Then press the Genome Browser; their use is beyond the scope lookup button.

22 Fishing for Genes in the UCSC Browser

Home Genomes Genome Browser Blat Tables Gene Sorter PCR Session FAQ Help

MGC Clone BC001519.2

PPP1R1B Homo sapiens protein phosphatase 1, regulatory (inhibitor) subunit 1B, mRNA (cDNA clone MGC:2855 Gene IMAGE:2987943), complete cds. Description RefSeq: NM_181505.2 RefSeq Summary: Midbrain dopaminergic neurons play a critical role in multiple brain functions, and abnormal signaling through has been implicated in several major neurologic and psychiatric disorders. One well-studied target for the actions of dopamine is DARPP32. In the densely dopamine- and glutamate-innervated rat caudate-putamen, DARPP32 is expressed in medium-sized spiny neurons (Ouimet and Greengard, 1990 [PubMed 2191086]) that also express dopamine D1 receptors (Walaas and Greengard, 1984 [PubMed 6319627]). The function of DARPP32 seems to be regulated by receptor stimulation. Both dopaminergic and glutamatergic (NMDA) receptor stimulation regulate the extent of DARPP32 phosphorylation, but in opposite directions (Halpain et al., 1990 [PubMed 2153935]). Dopamine D1 receptor stimulation enhances cAMP formation, resulting in the phosphorylation of DARPP32 (Walaas and Greengard, 1984 [PubMed 6319627]); phosphorylated DARPP32 is a potent protein phosphatase-1 (see MIM 176875) inhibitor (Hemmings et al., 1984 [PubMed 6087160]). NMDA receptor stimulation elevates intracellular calcium, which leads to activation of calcineurin and dephosphorylation of phospho-DARPP32, thereby reducing the phosphatase-1 inhibitory activity of DARPP32 (Halpain et al., 1990 [PubMed 2153935]).[supplied by OMIM]. Sequence Note: removed 1 base from the 5' end that did not align to the reference genome assembly. Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Entrez Gene record to access additional publications. Clone Source: Mammalian Gene Collection

MGC Clone Information and Links Link for ordering Gene PPP1R1B Links MGC clone Product protein phosphatase 1, regulatory (inhibitor) Order MGC clone Tissue colon, adenocarcinoma MGC clone database Library NIH_MGC_15 IMAGE clone 2987943 Development n/a Genbank BC001519 CDS 214..720 RefSeq NM_181505.2 Modification date 2008-07-04 CCDS11340.1 UCSC Gene uc002hsc.1 Brent Lab Clone Validation

RefSeq CDS similarity with MGC clone BC001519

RefSeq Position CDS similarity NM_181505.2 chr17:35038279-35046404 100.00% NM_032192.2 chr17:35036705-35046403 90.37%

Sequences

mRNA Protein Genomic

Reference genome mRNA Reference genome protein

Alignments

genomic (browser) span mRNA (alignment details) identity aligned

chr17:35038300-35046401 8102 + BC001519:1-1477 100.0% 97.0%

NCBI Clone Validation

mRNA start mRNA end Gene Notes

1481 1522 PPP1R1B polyA tail: 42 bases do not align to the human genome.

Description and Methods

Click here for details

Figure 16: Details page for the PPP1R1B MGC clone. The details page for the MGC full-length ORF mRNA (BC001519) shows information about the gene, this MGC clone and its sequence, similarity between the the RefSeq CDS and the annotated MGC CDS region, alignments, clone validation information and links to external databases. 23

Generated by www.PDFonFly.com Fishing for Genes in the UCSC Browser

NCBI » Clone Registry » Order Clones PubMed Nucleotide Protein Genome Structure PopSet Taxonomy

Search All Databases Go

Order cDNA clones for Homo sapiens gene PPP1R1B PPP1R1B: protein phosphatase 1, regulatory (inhibitor) subunit 1B (dopamine and cAMP regulated phosphoprotein, DARPP-32)

The following clone(s) can be purchased through any of the IMAGE distributors. Some of them have provided direct order link(s) for your convenience.

Clone: MGC:2855 (IMAGE:2987943) I.M.A.G.E. Distributors Clone Clone Sequence: BC001519.2 Vector: pOTB7 The I.M.A.G.E Consortium, information headquartered at Lawrence Corresponding RefSeq mRNA: NM_181505.2 Livermore National Laboratory, has handled from American Type Culture Collection clone arraying, archiving, from RZPD German Resource Center for Genome Research and distribution for a number Press buttons of large-scale projects, from Geneservice Ltd to order clones including MGC and The from Harvard Institute of Proteomics ORFeome Collaboration.

from Open Biosystems I.M.A.G.E Homepage

Distribution of the physical reagents, either individual clones or entire collections, Other clone(s) for the corresponding gene is handled by the following U.S. and European suppliers: American Type Culture Collection Invitrogen,Inc Open Biosystems Gene Service,Ltd. RZPD German Resource Center

Figure 17: NCBI page for ordering PPP1R1B cDNA clones. The pageMamma displayslian Gene Collecti linkson to vendors The NIH Mammalian Gene for cDNA clones. In this case, there is only one clone listed for thisC gene.ollection (MG GenBankC) project and RefSeq aims to identify at least one accessions, and the name of the vector containing the clone are shown.full-O TheRF cDNAclone clone for can be obtained each human and mouse from distributors of Integrated Molecular Analysis of Genomes and theirgene, produ Expressionce a high- (I.M.A.G.E) accuracy sequence, and make the physical reagents Consortium clones and links to some of the vendors are provided.ea Forsily ava thisilable to clone, direct order- researchers. The collection ing links are for American Type Culture Collection (ATCC), RZPD Germanis augmented by a Resource limited Center for number of rat cDNAs and Genome Research, Geneservice Ltd., Harvard Institute of Proteomicsnon and-mamm Openalian clone Biosystems.s coming from the affiliated Zebrafish (ZGC) and Xenopus (XGC) projects.

5. Finally select the output format. The de- instructions in the answerMGC Homepage to Question 1, except ZGC Homepage fault will provide the UCSC Gene identifier at step 4, type the nameXGC Ho ofmepage the chromosome into and the genomic location for each UCSC the position box in the format, chr21. Gene. Press the get output button to per- form the search. Question 3: How do I do batch retrievals for full-CDS MGC cDNA clones? A similar batch search can be done for RefSeq Answer 3: Use the Table Browser. See Ques- Genes (table: refGene), Ensembl Genes (table: tion 1 and Question 2 for guidance on batch ensGene) or other gene set. retrieval using this tool. To search for full-CDS MGC cDNA clones, the track to select is MGC Question 2: How do I do a batch search Genes and the table to select is mgcGenes. for genes on an entire human chromosome, for example, chr21? Answer 2: To carry out this search, follow the

24 Fishing for Genes in the UCSC Browser

12 Other resources URL http://www.genome.org/cgi/content/ abstract/14/4/708. • UCSC Genome Browser help at http:// genome.ucsc.edu/training [4] Y. Cheng, R. M. Miura, and B. Tian. • UCSC Genome Browser updates in the Prediction of mRNA polyadenylation Nucleic Acids Research (NAR) Database sites by support vector machine. Bioin- issues[13, 16, 18, 23] formatics, 22(19):2320–2325, 2006. doi: • The original UCSC Genome Browser 10.1093/bioinformatics/btl394. URL publication[21] http://bioinformatics.oxfordjournals.org/ cgi/content/abstract/22/19/2320. • UCSC genome browser tutorial[45] • UCSC Genome Browser: Deep support for [5] K. Chiu, C.-H. Wong, Q. Chen, P. Ari- molecular biomedical research.[28] yaratne, H. Ooi, C.-L. Wei, W.-K. Sung, • Chapter 1, Unit 1.4 of Using Biological and Y. Ruan. Pet-tool: a software Databases in “Current Protocols in Bioin- suite for comprehensive processing and formatics” managing of paired-end ditag (pet) se- • “Genomes, Browsers and Databases: Data- quence data. BMC Bioinformatics, 7(1): Mining Tools for Integrated Genomic 390, 2006. ISSN 1471-2105. doi: 10. Databases”, Peter Schattner (first edition, 1186/1471-2105-7-390. URL http://www. 2008). biomedcentral.com/1471-2105/7/390. [6] A. Choo, J. Padmanabhan, A. Chin, W. J. Fong, and S. K. W. Oh. Im- References mortalized feeders for the scale-up of human embryonic stem cells in feeder [1] K. G. Becker, K. C. Barnes, T. J. and feeder-free conditions. Journal Bright, and S. A. Wang. The ge- of Biotechnology, 122:130–141, Mar netic association database. Nat Genet, 2006. URL http://www.sciencedirect.com/ 36:431–432, May 2004. ISSN 1061- science/article/B6T3C-4HBTDBW-3/1/ 4036. URL http://dx.doi.org/10.1038/ b4f66b5f48e5b3227c5916653dbd88f0. ng0504-431. 10.1038/ng0504-431. [7] S. J. Cooper, N. D. Trinklein, E. D. An- [2] D. A. Benson, I. Karsch-Mizrachi, D. J. ton, L. Nguyen, and R. M. Myers. Com- Lipman, J. Ostell, and D. L. Wheeler. prehensive analysis of transcriptional pro- GenBank. Nucl. Acids Res., 36(suppl 1): moter structure and function in 1% of the D25–30, 2008. doi: 10.1093/nar/ human genome. Genome Res., 16(1): gkm929. URL http://nar.oxfordjournals. 1–10, 2006. doi: 10.1101/gr.4222606. org/cgi/content/abstract/36/suppl 1/D25. URL http://genome.cshlp.org/cgi/content/ abstract/16/1/1. [3] M. Blanchette, W. J. Kent, C. Riemer, L. Elnitski, A. F. Smit, K. M. Roskin, [8] T. A. Down and T. J. P. Hubbard. Com- R. Baertsch, K. Rosenbloom, H. Claw- putational Detection and Location of Tran- son, E. D. Green, D. Haussler, and scription Start Sites in Mammalian Ge- W. Miller. Aligning Multiple Genomic nomic DNA. Genome Res., 12(3):458– Sequences With the Threaded Block- 461, 2002. doi: 10.1101/gr.216102. set Aligner. Genome Res., 14(4):708– URL http://www.genome.org/cgi/content/ 715, 2004. doi: 10.1101/gr.1933104. abstract/12/3/458.

25 Fishing for Genes in the UCSC Browser

[9] P. Flicek, B. L. Aken, K. Beal, Nucl. Acids Res., 36(suppl 1):D107–113, B. Ballester, M. Caccamo, Y. Chen, 2008. doi: 10.1093/nar/gkm967. URL L. Clarke, G. Coates, F. Cunningham, http://nar.oxfordjournals.org/cgi/content/ T. Cutts, T. Down, S. C. Dyer, T. Eyre, abstract/36/suppl 1/D107. S. Fitzgerald, J. Fernandez-Banet, S. Graf, S. Haider, M. Hammond, R. Holland, [12] A. Grimson, K. K.-H. Farh, W. K. John- K. L. Howe, K. Howe, N. Johnson, ston, P. Garrett-Engele, L. P. Lim, and A. Jenkinson, A. Kahari, D. Keefe, D. P. Bartel. Microrna targeting specificity F. Kokocinski, E. Kulesha, D. Lawson, in mammals: Determinants beyond seed I. Longden, K. Megy, P. Meidl, B. Over- pairing. Molecular Cell, 27:91–105, Jul duin, A. Parker, B. Pritchard, A. Prlic, 2007. URL http://www.sciencedirect.com/ S. Rice, D. Rios, M. Schuster, I. Sealy, science/article/B6WSR-4P48CFV-9/1/ G. Slater, D. Smedley, G. Spudich, S. Tre- 9e1fa34b74d2d78d3ac3fd37b6a4b189. vanion, A. J. Vilella, J. Vogel, S. White, M. Wood, E. Birney, T. Cox, V. Curwen, [13] A. S. Hinrichs, D. Karolchik, R. Baertsch, R. Durbin, X. M. Fernandez-Suarez, G. P. Barber, G. Bejerano, H. Claw- J. Herrero, T. J. P. Hubbard, A. Kasprzyk, son, M. Diekhans, T. S. Furey, R. A. G. Proctor, J. Smith, A. Ureta-Vidal, Harte, F. Hsu, J. Hillman-Jackson, R. M. and S. Searle. Ensembl 2008. Nucl. Kuhn, J. S. Pedersen, A. Pohl, B. J. Acids Res., 36(suppl 1):D707–714, Raney, K. R. Rosenbloom, A. Siepel, 2008. doi: 10.1093/nar/gkm988. URL K. E. Smith, C. W. Sugnet, A. Sultan- http://nar.oxfordjournals.org/cgi/content/ Qurraie, D. J. Thomas, H. Trumbower, abstract/36/suppl 1/D707. R. J. Weber, M. Weirauch, A. S. Zweig, D. Haussler, and W. J. Kent. The [10] M. Gardiner-Garden and M. From- UCSC Genome Browser Database: up- mer. Cpg islands in vertebrate date 2006. Nucl. Acids Res., 34(suppl 1): genomes. Journal of Molecular D590–598, 2006. doi: 10.1093/nar/ Biology, 196:261–282, Jul 1987. gkj144. URL http://nar.oxfordjournals. URL http://www.sciencedirect.com/ org/cgi/content/abstract/34/suppl 1/D590. science/article/B6WK7-4DMP8FD-G/1/ f811b0610c688c457165b7da3cdc22b7. [14] F. Hsu, T. H. Pringle, R. M. Kuhn, D. Karolchik, M. Diekhans, D. Haussler, [11] O. L. Griffith, S. B. Montgomery, and W. J. Kent. The UCSC Proteome B. Bernier, B. Chu, K. Kasaian, S. Aerts, Browser. Nucl. Acids Res., 33(suppl 1): S. Mahony, M. C. Sleumer, M. Bilenky, D454–458, 2005. doi: 10.1093/nar/ M. Haeussler, M. Griffith, S. M. Gallo, gki100. URL http://nar.oxfordjournals. B. Giardine, B. Hooghe, P. Van Loo, org/cgi/content/abstract/33/suppl 1/D454. E. Blanco, A. Ticoll, S. Lithwick, E. Portales-Casamar, I. J. Donaldson, [15] F. Hsu, W. J. Kent, H. Clawson, R. M. G. Robertson, C. Wadelius, P. De Bleser, Kuhn, M. Diekhans, and D. Haussler. D. Vlieghe, M. S. Halfon, W. Wasser- The UCSC Known Genes. Bioinfor- man, R. Hardison, C. M. Bergman, S. J. matics, 22(9):1036–1046, 2006. doi: Jones, and T. O. R. A. Consortium. 10.1093/bioinformatics/btl048. URL ORegAnno: an open-access community- http://bioinformatics.oxfordjournals.org/ driven resource for regulatory annotation. cgi/content/abstract/22/9/1036.

26 Fishing for Genes in the UCSC Browser

[16] D. Karolchik, R. Baertsch, M. Diekhans, [21] W. J. Kent, C. W. Sugnet, T. S. Furey, T. S. Furey, A. Hinrichs, Y. T. Lu, K. M. K. M. Roskin, T. H. Pringle, A. M. Zahler, Roskin, M. Schwartz, C. W. Sugnet, and D. Haussler. The Human Genome D. J. Thomas, R. J. Weber, D. Haussler, Browser at UCSC. Genome Res., 12(6): and W. J. Kent. The UCSC Genome 996–1006, 2002. doi: 10.1101/gr.229102. Browser Database. Nucl. Acids Res., URL http://genome.cshlp.org/cgi/content/ 31(1):51–54, 2003. doi: 10.1093/nar/ abstract/12/6/996. gkg129. URL http://nar.oxfordjournals. org/cgi/content/abstract/31/1/51. [22] W. J. Kent, R. Baertsch, A. Hinrichs, W. Miller, and D. Haussler. Evolu- [17] D. Karolchik, A. S. Hinrichs, T. S. tion’s cauldron: Duplication, deletion, Furey, K. M. Roskin, C. W. Sug- and rearrangement in the mouse and hu- net, D. Haussler, and W. J. Kent. man genomes. Proceedings of the Na- The UCSC Table Browser data retrieval tional Academy of Sciences of the United tool. Nucl. Acids Res., 32(suppl 1): States of America, 100(20):11484–11489, D493–496, 2004. doi: 10.1093/nar/ 2003. doi: 10.1073/pnas.1932072100. gkh103. URL http://nar.oxfordjournals. URL http://www.pnas.org/content/100/20/ org/cgi/content/abstract/32/suppl 1/D493. 11484.abstract. [23] R. M. Kuhn, D. Karolchik, A. S. Zweig, [18] D. Karolchik, R. M. Kuhn, R. Baertsch, H. Trumbower, D. J. Thomas, A. Thakka- G. P. Barber, H. Clawson, M. Diekhans, pallayil, C. W. Sugnet, M. Stanke, B. Giardine, R. A. Harte, A. S. Hin- K. E. Smith, A. Siepel, K. R. Rosen- richs, F. Hsu, K. M. Kober, W. Miller, bloom, B. Rhead, B. J. Raney, A. Pohl, J. S. Pedersen, A. Pohl, B. J. Raney, J. S. Pedersen, F. Hsu, A. S. Hinrichs, B. Rhead, K. R. Rosenbloom, K. E. R. A. Harte, M. Diekhans, H. Clawson, Smith, M. Stanke, A. Thakkapallayil, G. Bejerano, G. P. Barber, R. Baertsch, H. Trumbower, T. Wang, A. S. Zweig, D. Haussler, and W. J. Kent. The D. Haussler, and W. J. Kent. The UCSC UCSC genome browser database: update Genome Browser Database: 2008 up- 2007. Nucl. Acids Res., 35(suppl 1): date. Nucl. Acids Res., 36(suppl 1): D668–673, 2007. doi: 10.1093/nar/ D773–779, 2008. doi: 10.1093/nar/ gkl928. URL http://nar.oxfordjournals. gkm966. URL http://nar.oxfordjournals. org/cgi/content/abstract/35/suppl 1/D668. org/cgi/content/abstract/36/suppl 1/D773. [24] J. Y. Lee, I. Yeh, J. Y. Park, and B. Tian. [19] W. Kent, F. Hsu, D. Karolchik, R. M. PolyA DB 2: mRNA polyadenyla- Kuhn, H. Clawson, H. Trumbower, and tion sites in vertebrate genes. Nucl. D. Haussler. Exploring relationships and Acids Res., 35(suppl 1):D165–168, mining data with the UCSC Gene Sorter. 2007. doi: 10.1093/nar/gkl870. URL Genome Res., 15(5):737–741, 2005. doi: http://nar.oxfordjournals.org/cgi/content/ 10.1101/gr.3694705. URL http://www. abstract/35/suppl 1/D165. genome.org/cgi/content/abstract/15/5/737. [25] B. P. Lewis, I.-h. Shih, M. W. Jones- [20] W. J. Kent. BLAT: The BLAST-like align- Rhoades, D. P. Bartel, and C. B. Burge. ment tool. Genome Res, 12:656–664, Prediction of mammalian microrna tar- 2002. gets. Cell, 115:787–798, Dec 2003.

27 Fishing for Genes in the UCSC Browser

URL http://www.sciencedirect.com/ hab, A. Ridwan, C. H. Wong, E. T. Liu, science/article/B6WSN-4C5RJXX-4/1/ and Y. Ruan. Gene identification signature daacc44b2dc4cbd20152154f2556cbfb. (gis) analysis for transcriptome characteri- zation and genome annotation. Nat Meth, [26] B. P. Lewis, C. B. Burge, and D. P. 2:105–111, Feb 2005. ISSN 1548-7091. Bartel. Conserved seed pairing, often URL http://dx.doi.org/10.1038/nmeth733. flanked by , indicates that 10.1038/nmeth733. thousands of human genes are microrna targets. Cell, 120:15–20, Jan 2005. [31] K. D. Pruitt, T. Tatusova, and D. R. Ma- URL http://www.sciencedirect.com/ glott. NCBI reference sequences (Ref- science/article/B6WSN-4F7PRGS-6/1/ Seq): a curated non-redundant sequence 4caeed2c84a218c0bd80bdf7fe1506c4. database of genomes, transcripts and pro- Nucl. Acids Res. [27] J. Majewski and J. Ott. Distribution and teins. , 35(suppl 1): Characterization of Regulatory Elements D61–65, 2007. doi: 10.1093/nar/ in the Human Genome. Genome Res., 12 gkl842. URL http://nar.oxfordjournals. (12):1827–1836, 2002. doi: 10.1101/gr. org/cgi/content/abstract/35/suppl 1/D61. 606402. [32] S. Schwartz, W. J. Kent, A. Smit, [28] M. E. Mangan, J. M. Williams, S. M. Z. Zhang, R. Baertsch, R. C. Hardison, Lathe, D. Karolchik, and W. C. Lathe III. D. Haussler, and W. Miller. Human-Mouse Biotechnology Annual Review, volume Alignments with BLASTZ. Genome Res., Volume 14, chapter UCSC Genome 13(1):103–107, 2003. doi: 10.1101/gr. Browser: Deep support for molecular 809403. URL http://www.genome.org/cgi/ biomedical research, pages 63–108. Else- content/abstract/13/1/103. vier, 2008. URL http://www.sciencedirect. com/science/article/B7CTS-4SX89K0-8/ [33] S. T. Sherry, M.-H. Ward, M. Kholodov, 2/fd8ba712b4824e759a88e34dac02e743. J. Baker, L. Phan, E. M. Smigielski, and K. Sirotkin. dbSNP: the NCBI database [29] W. Miller, K. Rosenbloom, R. C. Hardi- of genetic variation. Nucl. Acids Res., 29 son, M. Hou, J. Taylor, B. Raney, (1):308–311, 2001. doi: 10.1093/nar/29. R. Burhans, D. C. King, R. Baertsch, 1.308. URL http://nar.oxfordjournals.org/ D. Blankenberg, S. L. Kosakovsky Pond, cgi/content/abstract/29/1/308. A. Nekrutenko, B. Giardine, R. S. Har- ris, S. Tyekucheva, M. Diekhans, T. H. [34] A. Siepel, G. Bejerano, J. S. Pedersen, Pringle, W. J. Murphy, A. Lesk, G. M. A. S. Hinrichs, M. Hou, K. Rosenbloom, Weinstock, K. Lindblad-Toh, R. A. Gibbs, H. Clawson, J. Spieth, L. W. Hillier, E. S. Lander, A. Siepel, D. Haussler, and S. Richards, G. M. Weinstock, R. K. Wil- W. J. Kent. 28-Way vertebrate align- son, R. A. Gibbs, W. J. Kent, W. Miller, ment and conservation track in the UCSC and D. Haussler. Evolutionarily conserved Genome Browser. Genome Res., 17 elements in vertebrate, insect, worm, and (12):1797–1808, 2007. doi: 10.1101/gr. yeast genomes. Genome research, 15(8): 6761107. URL http://www.genome.org/ 1034–1040, 2005. cgi/content/abstract/17/12/1797. [35] A. Siepel, M. Diekhans, B. Brejova, [30] P. Ng, C.-L. Wei, W.-K. Sung, K. P. Chiu, L. Langton, M. Stevens, C. L. Comstock, L. Lipovich, C. C. Ang, S. Gupta, A. Sha- C. Davis, B. Ewing, S. Oommen, C. Lau,

28 Fishing for Genes in the UCSC Browser

H.-C. Yu, J. Li, B. A. Roe, P. Green, Kuznetsov, W.-K. Sung, L. D. Miller, D. S. Gerhard, G. Temple, D. Haussler, B. Lim, E. T. Liu, Q. Yu, H.-H. Ng, and M. R. Brent. Targeted discovery and Y. Ruan. A global map of p53 of novel human exons by comparative transcription-factor binding sites in the genomics. Genome Res., 17(12):1763– human genome. Cell, 124:207–219, Jan 1773, 2007. doi: 10.1101/gr.7128207. 2006. URL http://www.sciencedirect.com/ URL http://www.genome.org/cgi/content/ science/article/B6WSN-4J1B5X8-S/1/ abstract/17/12/1763. 199bb4dbe7ad1d5297fe65359c6dabec.

[36] M. Stanke, M. Diekhans, R. Baertsch, [41] L. G. Wilming, J. G. R. Gilbert, K. Howe, and D. Haussler. Using native and syn- S. Trevanion, T. Hubbard, and J. L. tenically mapped cDNA alignments to Harrow. The vertebrate genome improve de novo gene finding. Bioin- annotation (Vega) database. Nucl. formatics, page btn013, 2008. doi: Acids Res., 36(suppl 1):D753–760, 10.1093/bioinformatics/btn013. URL 2008. doi: 10.1093/nar/gkm987. URL http://bioinformatics.oxfordjournals.org/ http://nar.oxfordjournals.org/cgi/content/ cgi/content/abstract/btn013v1. abstract/36/suppl 1/D753.

[37] T. M. P. Team. The Status, Quality, and [42] G. W. Yeo, E. Van Nostrand, D. Holste, Expansion of the NIH Full-Length cDNA T. Poggio, and C. B. Burge. Identification Project: The Mammalian Gene Collec- and analysis of alternative splicing events tion (MGC). Genome Res., 14(10b):2121– conserved in human and mouse. Pro- 2127, 2004. doi: 10.1101/gr.2596504. ceedings of the National Academy of Sci- URL http://www.genome.org/cgi/content/ ences of the United States of America, 102 abstract/14/10b/2121. (8):2850–2855, 2005. doi: 10.1073/pnas. 0409742102. URL http://www.pnas.org/ [38] D. J. Thomas, H. Trumbower, A. D. Kern, content/102/8/2850.abstract. B. L. Rhead, R. M. Kuhn, D. Haussler, and W. J. Kent. Variation resources at UC [43] K. I. Zeller, X. Zhao, C. W. H. Lee, Santa Cruz. Nucl. Acids Res., 35(suppl 1): K. P. Chiu, F. Yao, J. T. Yustein, H. S. D716–720, 2007. doi: 10.1093/nar/ Ooi, Y. L. Orlov, A. Shahab, H. C. Yong, gkl953. URL http://nar.oxfordjournals. Y. Fu, Z. Weng, V. A. Kuznetsov, W.- org/cgi/content/abstract/35/suppl 1/D716. K. Sung, Y. Ruan, C. V. Dang, and C.-L. Wei. Global mapping of c-Myc [39] N. D. Trinklein, S. F. Aldred, S. J. Hart- binding sites and target gene networks man, D. I. Schroeder, R. P. Otillar, and in human B cells. Proceedings of the R. M. Myers. An Abundance of Bidirec- National Academy of Sciences, 103(47): tional Promoters in the Human Genome. 17834–17839, 2006. doi: 10.1073/pnas. Genome Res., 14(1):62–66, 2004. doi: 0604129103. URL http://www.pnas.org/ 10.1101/gr.1982804. URL http://genome. content/103/47/17834.abstract. cshlp.org/cgi/content/abstract/14/1/62. [44] J. Zhu, J. Z. Sanborn, M. Diekhans, C. B. [40] C.-L. Wei, Q. Wu, V. B. Vega, K. P. Lowe, T. H. Pringle, and D. Haussler. Chiu, P. Ng, T. Zhang, A. Shahab, H. C. Comparative genomics search for losses of Yong, Y. Fu, Z. Weng, J. Liu, X. D. long-established genes on the human lin- Zhao, J.-L. Chew, Y. L. Lee, V. A. eage. PLoS Computational Biology, 3:

29 Fishing for Genes in the UCSC Browser

e247 EP –, Dec 2007. URL http://dx.doi. org/10.1371/journal.pcbi.0030247.

[45] A. S. Zweig, D. Karolchik, R. M. Kuhn, D. Haussler, and W. J. Kent. Ucsc genome browser tuto- rial. Genomics, 92:75–84, Aug 2008. URL http://www.sciencedirect.com/ science/article/B6WG1-4SN8V4K-1/2/ 52fefdb7d8345ae05aa0ec653a0adc27.

30