Woods Hole 2012 Zebrafish Bioinformatics Lab
All materials can be downloaded from
http://faculty.ithaca.edu/iwoods/docs/wh/ (or) http://goo.gl/1bnOF
Ian G. Woods August 2012

Task 1: High resolution mapping, sequencing, and expression
Overview: From a rough map position, refine the critical interval via (virtual) high resolution mapping with additional markers. Query the critical interval in the zebrafish genome for potential candidate genes. Find expression patterns online for these candidates. Design primers to sequence candidate genes for the mutagenic lesion or for additional SNPs to use in mapping.

Task 2: Clone candidate enhancer/promoter sequences to create a transgenic reporter line
Overview: Identify the translational start site of a gene of interest. Obtain ~6kb of sequence upstream of this site. Design PCR primers that will amplify this region, and clone it in-frame with GFP in a tol2 expression vector. Identify BACs for use in creating reporter constructs via homologous recombination or gap repair. Identify evolutionarily conserved sequences from other organisms to uncover potential regulatory regions around your gene of interest.

Task 3: Morpholinos, rescue, and expression
Overview: Find the zebrafish ortholog of your favorite gene. Find its location in the genome, locate the ATG, and identify the exon-intron boundaries. Design two 25-mer morpholino sequences that target (1) the ATG and (2) an exon-intron boundary. Identify an orthologous gene in another species for use in rescue experiments to control for morpholino specificity. Align this sequence with your morpholinos to determine degree of potential activity. Obtain a full-length clone of the zebrafish gene (via RTPCR or clone collections) for use in overexpression experiments or expression analyses via in situ hybridization.

Task 4: Identifying zebrafish transcripts via batch sequence retrieval and BLAST
Overview: Mine OMIM (Online Mendelian Inheritance in Man) for genes related to Sonic Hedgehog. Get amino acid sequences for these genes, and identify (via BLAST) the zebrafish orthologs for these proteins. Use a simple script to parse the BLAST results to see where the genes are located in the zebrafish genome. Finally, find out where a few of these genes are expressed (via ZFIN).
Requirements: UNIX terminal, perl (both native on Mac OS X)
Detailed protocols and results

Task 1: High resolution mapping, sequencing, and expression
Overview: From a rough map position, find more SSLP markers to test for polymorphisms in your mapping cross. Query the critical interval in the zebrafish genome for potential candidate genes. Find expression patterns online for these candidates. Design primers to sequence candidate genes for the mutagenic lesion or for additional SNPs to use in mapping.

ZFIN home http://www.zfin.org
Click on “Genetic Maps”

ZFIN Genetic Maps Browser
Uncheck all but “MGH” and enter marker symbol

ZFIN MGH map viewer
Zoom out as far as possible

ZFIN Map Viewer
Z3057 is at ~38 cM. Z13936 is at ~85 cM.
Plenty of markers to follow up on.
Find primer sequences for Z15270

Find additional SSLPs to test http://www.ensembl.org
Enter Z15270 into the search box and hit go
Follow the links . . .

Ensembl Marker Report Viewer

Ensembl Chromosome View
Zoom out a bit

Configuring Tracks in Ensembl

Zoom out in region
What kind of gene is LOC563432?

Ensembl gene view
Click the ‘Orthologues’ link

Ensembl gene view
Scroll down a bit . . . looks like a phosphodiesterase

Record for rras2
Links to expression pattern – click on External References

Link out to ZFIN

ZFIN expression data
Click on ‘Directly submitted expression data’ link

ZFIN expression data
rras2 expression – consistent with a role in muscle development?

ZFIN home http://www.zfin.org
Follow link for Genes / Markers / Clones

ZFIN gene search
Type in rras2

ZFIN gene search results
Click on “1 Gene”

ZFIN gene record
Scroll down . . . .

ZFIN gene record
Follow link for RNA sequence

GenBank gene record
Scroll down . . . .

GenBank gene record
Copy Sequence to Clipboard – note cds starts at 240

UCSC Genome Home
Click on the “BLAT” tab

UCSC BLAT search
Paste in your sequence, and select “Zebrafish” from the Genome menu

UCSC BLAT results
Follow “details” link for the top hit

UCSC BLAT results
Light blue = exon boundaries
Select about 600-800b of genomic sequence around the first exon

Primer3 Home http://frodo.wi.mit.edu/primer3/
Choose size range of 500-600

Primer3 Results

BLAST 2 Sequences http://blast.ncbi.nlm.nih.gov/Blast.cgi... ‘nucleotide blast’
Paste in wildtype and mutant sequences

BLAST 2 Result
Scroll down . . .

BLAST 2 Result
There are two SNPs . . .

dCAPS Home http://helix.wustl.edu/dcaps/dcaps.html
Paste in about 40b of wildtype and mutant sequence flanking each SNP

dCAPS results – SNP1
Not much luck, but can introduce differential restriction sites via primers

dCAPS results – SNP2
Plenty of enzymes from which to choose

Do the SNPs affect coding?
BLAST mutant gDNA vs. cDNA

Do the SNPs affect coding?
Query = mutant sequence; Subject = GenBank refseq
SNPs are not in coding sequence

Task 2: Clone candidate enhancer/promoter sequences to create a transgenic reporter line
Overview: Identify the translational start site of a gene of interest. Obtain 6kb of sequence upstream of this site. Design PCR primers that will amplify this region, and clone it in-frame with GFP in a tol2 expression vector. Identify BACs for use in creating reporter constructs via homologous recombination. Identify evolutionarily conserved sequences from other organisms to uncover potential regulatory regions around your gene of interest.

ZFIN home http://www.zfin.org
Follow link for Genes / Markers / Clones

ZFIN Gene Search
Type in ‘scube2’

ZFIN Search Result
Follow link for Gene

ZFIN Gene record
Scroll down a bit . . .

ZFIN Gene record
ZFIN localizes this gene to Chr. 7

GenBank Gene record
Scroll down . . .

GenBank Gene record
Find the heading for “CDS” = coding sequence

GenBank Gene record
The “atg” (translational start) is at #106 in this mRNA sequence

Ensembl Zebrafish http://www.ensembl.org/Danio_rerio
Enter scube2 into the search box

Ensembl text search result
Click on “Location”

Ensembl Browser – scube2
The transcript is going to the ‘left’

Genetic vs. physical distance
Z15270 ~ 28,880,000
scube2 ~ 29,900,000
Physical distance ~ 1,000,000
Genetic distance = 0.1 cM
Total genome = 3000 cM = 1.7 x 10^9 bp, so about 560,000 bp / cM
Genetic distance predicts ~ 56,000 bp away
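The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration of the genome-wide average conversion; the function names are made up for the example, and the marker positions are the approximate values quoted above.

```python
# Rough genome-wide conversion between genetic and physical distance,
# using the totals quoted on the slide (3000 cM, 1.7 x 10^9 bp).
GENOME_CM = 3000
GENOME_BP = 1.7e9


def bp_per_cm(genome_bp=GENOME_BP, genome_cm=GENOME_CM):
    """Average number of base pairs per centimorgan across the genome."""
    return genome_bp / genome_cm


def predicted_bp(genetic_cm):
    """Physical distance predicted from a genetic distance, assuming the average rate."""
    return genetic_cm * bp_per_cm()


# Approximate positions from the Ensembl browser (scube2 minus Z15270).
observed_bp = 29_900_000 - 28_880_000
expected_bp = predicted_bp(0.1)

print(f"average: {bp_per_cm():,.0f} bp/cM")
print(f"predicted from 0.1 cM: {expected_bp:,.0f} bp")
print(f"observed: {observed_bp:,} bp")
```

The observed separation comes out well above the prediction, which is the discrepancy the slide goes on to discuss.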
Differences could arise from recombination hotspots/coldspots, and/or errors in sequence assembly

scube2 exon 1

scube2 exon 1 – 5000b
Uh oh .... overlaps another gene

scube2 intergenic region
Hit ‘export data’ on left part of page

scube2 intergenic region – export
Choose ‘text’

Primer3 input http://frodo.wi.mit.edu/primer3/

Primer3 output
Add enzyme (or gateway) sequences, and clone into your GFP vector

Ensembl Zebrafish Home http://www.ensembl.org/Danio_rerio
Enter “scube2” and click “Go”.

Ensembl Chromosome view
scube2 is split between two sequenced BACs: CU467654 and CU464087

Ensembl Chromosome view
Turn on BAC ends track

Ensembl Chromosome view
No BAC has the entire gene plus putative regulatory sequences, but one has 5’ regions

Find a BAC by BLASTing NCBI http://blast.ncbi.nlm.nih.gov/
Click “nucleotide blast”

BLAST scube2 vs. zebrafish nr
Enter accession number, click “nr”, and type in “Danio rerio”

NCBI BLAST results
There are several BACs, but none contain the entire sequence

BLAST2 sequences http://blast.ncbi.nlm.nih.gov/ => ‘nucleotide blast’, ‘Align two sequences’
Place accession numbers for coding sequence on top, and BAC sequence on bottom

BLAST2 result
The BAC contains upstream sequence plus about 1500b of coding sequence

Zebrafish BAC assembly http://www.sanger.ac.uk/cgi-bin/humpub/chromoview

Aligning genomic sequences with VISTA
See exons and conserved noncoding sequences (regulatory?)

Identify orthologous sequences
Grab peptide sequence of Scube2 from GenBank record

BLAT search at UCSC with peptide
Pull down Tetraodon from Genome menu

Which sequence is the true ortholog?
Whole genome duplication in teleosts makes orthology assignment a bit tricky

Which sequence is the true ortholog?
(Zebrafish, Tetraodon, Stickleback, Medaka)
Zebrafish Chromosome #7 = Tetraodon #5; Stickleback #II or #VII; Medaka #3 or #18
Clue from conserved synteny

scube2 in Tetraodon
ZF #7 = Tet #5

scube2 in Tetraodon
Zoom out and grab sequence via DNA tab

scube2 in Stickleback
Zoom out and grab sequence via DNA tab

scube2 in Medaka
Zoom out and grab sequence via DNA tab

mVISTA Submission
Select 4 sequences – upload them on the following page

mVISTA upload

mVISTA Viewer
Probably exons: can adjust parameters to be more or less stringent

mVISTA Viewer
Conserved non-coding regions => regulatory elements?

mVISTA Alignment
Zebrafish vs. Tetraodon

Task 3: Morpholinos, rescue, and expression
Overview: Find the zebrafish ortholog of your favorite gene. Find its location in the genome, locate the ATG, and identify the exon-intron boundaries. Design two 25-mer morpholino sequences that target (1) the ATG and (2) an exon-intron boundary. Identify an orthologous gene in another species for use in rescue experiments to control for morpholino specificity. Align this sequence with your morpholinos to determine degree of potential activity. Obtain a full-length clone of the zebrafish gene (via RTPCR or clone collections) for use in overexpression experiments or expression analyses via in situ hybridization.

Entrez Gene Search http://www.ncbi.nlm.nih.gov
Select Gene from the Search menu

Entrez Gene Search
Search for the first mouse entry

Entrez Gene Entry
Scroll down . . .

Entrez Gene Entry
Click on peptide (NP_XXX) link

Boc GenPept record
Copy peptide sequence to clipboard

BLAST home page http://blast.ncbi.nlm.nih.gov/Blast.cgi
Select tblastn

Tblastn vs. nr
Paste in sequence, select nr, and type Danio rerio into the Organism box

Tblastn vs. ests
Paste in sequence, select est_others, and type Danio rerio into the Organism box

BLAST – access recent searches
Click on Request_IDs to see results

Boc vs. nr
Click on the “U” of the top hits to go to the UniGene page

Boc vs. ests
Click on the “U” of the top hits to go to the UniGene page

Boc UniGene search
Click on the “Brother of CDO” link

Boc UniGene entry
Scroll down . . .

Boc UniGene entry
Follow BC107996 link. “BCXXXXXX” = zebrafish gene collection

BC107996 UniGene link
Click on the GenBank link

BC107996 GenBank entry
1477 bp. Is this full length? Go back and check other sequences

NM_001005393 GenBank entry
3373 bp. Does this have the entire cds? Scroll down . . .

NM_001005393 GenBank entry
Looks like the entire sequence is present. Design primers for RTPCR

NM_001005393 GenBank entry
Looks like the entire sequence is present. CDS begins at 231. Design primers for RTPCR

Primer 3 Home http://frodo.wi.mit.edu/primer3/
Choose size range that will amplify complete cds

Primer 3 Results
atg highlighted, end is included. Do high fidelity PCR and sequence verify

NM_001005393 GenBank entry
CDS begins at 231. Design MO spanning ATG

Ensembl sequence search
http://www.ensembl.org/Multi/blastview
Ensembl BLAT, choose zebrafish, paste in boc mRNA

Zebrafish boc genomic structure
Each exon has an alignment on Chr 24. Beginning of CDS is at 231

Zebrafish boc genomic structure
Extract sequence around ATG or around splice junctions

Zebrafish boc genomic structure http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=7955
Paste in coding sequence, and choose the Traces-WGS Database

Zebrafish boc genomic structure
Alignments show exon structure – click on third exon

Zebrafish boc genomic structure
Third exon ends at about 377b in trace, follow link to trace archive

Zebrafish trace archive
Third exon ends at about 426b in trace

Trace vs. coding sequence
exon / intron boundary
Trace is in same orientation

Trace vs. coding sequence http://www.bioinformatics.org/SMS/rev_comp.html
Can find reverse complement of trace if you need to

Target for splice morpholino
End of exon is highlighted in this genomic sequence
Design morpholino that surrounds this sequence

Morpholino design
Boc mRNA = NM_001005393

>boc_exon3_gDNA
ATGACTGATAGCCAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCCAAACTGGATTCAAATCCGGCGCCAATT
ATAGATGTTTGTTTTGGATAGCCGATATATTTATCCTCCATTTTTCTGTCCTGTTTTCTCTTCTAGATGACGTTCCAG
TGTTCACTGAGGAGCCGTTCTCGGTGGTACAGAAGCTGGGGGGCAGCGTGACCCTGCGCTGCAGTGCCCTGCCTAACC
ATGTCAACATCAGCTGGCATCTGAATGGCAAAGAGCTGCCAGCTGGGGGCGACGAGGAGCTGGGAGTGCTGGTGCGGC
CCGGTTCTCTCTACATCCCCTCCCTTACAAACCTCACCGTGGGCAGATACCAGTGTGTGGCCACCACCAGCGTCGGGT
CCTGCGCGAGTGTGCCGGCCAATGTTACTGCAGCGAGTGAGTACACCAGTGCTTTTGTAAACATCAGACACTTGTTGT
ATATATATTTACAGGCAATGGTGTGAACTTTGAAGCTTTTTTGTACAGAAAGTGTTTGTGTGCGAAAGATGTTTTGTT
GTGTGGTGTGTGTGTGGTGTGTGTGTTTGCCTCATGTTAGTCAGTGTTGGAGCACAGAAAGGGGCGTGTAATATGACC
CACCATAATGCCTGTACTAGTGAACCACTGGCCCCATTAGTGAAGGGCCTCCTCCCTGTTCTTCATAACTTTTCTTTG
CAAATGTAGAGATGGAAAGAAAACCAAACAGTGAGTAAATTTTGAGAGTGTTTGGNTAGTAGTTGATTGAGTTGAGTT
CAGG

splice MO: GTGTACTCACTCGCTGCAGTAACAT
ATG MO: caacgtcccagacatcttccataca
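As a quick alternative to a web tool, the antisense check for the splice morpholino can be sketched in Python. The helper names below are invented for this example, and the target string is only a short excerpt of the boc exon 3 genomic sequence above (the stretch around the exon/intron boundary), not the whole trace.

```python
# Map each base to its complement, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def antisense_site(morpholino, target):
    """Return the 0-based start of the site the morpholino is antisense to, or -1."""
    return target.upper().find(reverse_complement(morpholino).upper())


# Excerpt of the genomic sequence spanning the end of the exon and the start of the intron.
target = "GCCAATGTTACTGCAGCGAGTGAGTACACCAGTGCTTTTG"
splice_mo = "GTGTACTCACTCGCTGCAGTAACAT"

site = antisense_site(splice_mo, target)
print(len(splice_mo), reverse_complement(splice_mo), site)
```

A result of -1 would mean the oligo is not a perfect antisense match to the excerpt, for example because it was typed in the sense orientation. The same function can be pointed at the mRNA sequence to check the ATG morpholino.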
Check these with your favorite sequence aligner to see that they match and are antisense
Splice MO needs to be checked against genomic region (the trace)

Search for boc in Tetraodon http://genome.ucsc.edu/cgi-bin/hgBlat?command=start
Enter mouse sequence into the BLAT page at UCSC, and select Tetraodon

Tetraodon BLAT results
Click on the browser link for the top result

Tetraodon genome browser
Zoom out a bit to include entire sequence

Tetraodon genome browser
Click on GSTENT track

Tetraodon genome browser
Click on Protein and mRNA to retrieve sequences

Tetraodon Boc peptide sequence
How does this compare with the mouse sequence?

Tetraodon Boc vs. Mouse Boc: BLAST2

Get UTR sequences for Tetraodon boc
Grab about 100b of putative 5’ UTR

Get UTR sequences for Tetraodon boc
Grab about 100b of putative 3’ UTR

Primer 3 Home http://frodo.wi.mit.edu/primer3/
Enter cds plus putative UTR into primer 3

Primer 3 Output
Amplify via hi-fi PCR, clone, and inject

Orthology makes sense by conserved synteny?
ZF: 24 Tet: 6
Looks OK

ZF MOs vs. Tetraodon boc
>tetraodon_boc_coding ATGACGATCCGTGTGGGGCTCCGGAGCCGACGGGGAGAGCTGCTTCGAGCATCCGGAGAGGGGAGTGTATGGAAAATGCCTGGAAAGCGCGACTGGACTCCGTGGATGAAAAAGAATAGGGC TCCAGTCTTGTGCACTCTGGGTGCAGTACTGCTATGCTGCCTGCAGAGCGGTGCCTCTGTCCCTGACGAGGTGCCGGTGTTCACCGAGGAGCCTGCGTCGGTGGTGCAGAAGCTCGGTGGTA GCGTGTCTCTGCGCTGCAGTGCCCGGCCCGCCTTGGCCAACATCAGCTGGCGCCTCAACGGCCAGGAACTGCTGGATGGAGATTTTGGAGCCGTGCTGGGGCCCAACAGCCTCTACATCCCG TCTTTGTCCAACCTGACCCTGGGCAGGTACCAGTGTGTAGCCAGCACTGGTGTGGGCGCCTTGGCCAGTGTACTGGCTAATGTGACGGCTGCCAAGCTGCGGGATTTCGAGCCGGACGACCA TCAGGAGATCGAGGTGGACGAGGGCAACACGGCCGTCATCGAGTGCCACCTCCCCGAGAGCCAGCCCAAGGCTCAGGTCCGCTACAGCGTCAAGCAGGAGTGGCTGGAAACATCCAAAGGCA ACTACCTCATCATGCCATCAGGGAACCTGCAGATCGCTAACGCCACCCAGGAGGACGAGGGCCCGTACAAGTGCGCCGCCTACAACCCCGTCACTCAGGAGGTCAAGACATCCATCTCTGCG GACCGCCTGCGCATACGCCGCTCCACCTCCGAGGCCGCACGCATCATTTACCCGCCGGCTTCTCGCTCCATCATGGCGACCAAGGGCCAGCGGCTGGTGCTGGAGTGTGTGGCCAGCGGCAT CCCCACCCCTCAGGTGACATGGGCGAAGGACGGGCAGGACCTGCGCTACGTCAACAACACCCGCTTCCTGCTCAGCAACCTGCTGATCGACGCCGTGGGTGAGAGCGACTCGGGCACCTACG CCTGCCAGGCCGACAACGGCATCCTTGCATCCGCCTCTGCGATGGTGCTCTATAACGTCCAGGTGTCCGAGCCTCCCCAGGTGACGGTGGAGCTGCAGCAGGTGTACGGTGGGACGGTGCGC TTCACCTGCCAGGCTCGCGGCAAACCGGCTCCCTCGGTGACGTGGCTCCACAACGCGCGGCCCCTGTCCCCGTCGCCCCGCCACCGGCTGACCTCCAGGATGCTCCGCGTGTCCAACGTGGG CCTCCAGGACGAGGGCCTGTACCAGTGCATGGCCGAGAACGGCGTGGGCAGCTCGCAGGCGTCGGCTCGCCTCATCATAGCCTCGGCCGTCGTCCCCCCGCGGGGAAAGCCGCCCTCCATTT TTCTGAGTCCCGACAAGGTGCTGCGGGAGCAGCCTCCGGTGAGGCCGGGGCCCGGCGGCGCCATGTTGCCCCTGGACTGCTCCGAGCTGCCGGGACAGGTCCTGCCCGCAGAAGCTCCCATC ATCCTCAGCCAGCCGCGCACGGGCAAGGCCGACTATTACGAGCTGACCTGGAGGCCCCGACACGAGCGCGGCGTTCCCGTGCTGGAGTATATGATTAAATACAGAAAGGTGGGGGACCCTCT GGCCGAGTGGACCTCCAGCAGTATCTCCGGCTCCCTGCACAAGCTGACCCTGGCCAAGCTGCAGCCAGACAGCCTGTACGAGGTGGAGATGGCTGCCAAGAACTGCGCCGGCTTGGGACAGC CGGCAATGATGACCTTCCGAACCGGCAAAGGTACATCGGAGCACCTCGGTGATGTTTCGGGGAAGGTTCTAAGAGTGGGGGCAGGTCGTAGAGGAAAAATCGATCCTCCAAAGACCCCTGCG GTCCCGTCGCCAAGCCTCTCTCGGTTTTTTTTGCCCTGTGTTTCTTGTCCTGTCCCATTTCACACTGCCCCCCCCCCCCCCGCAGCTCCCGAAGCCCCCGACAAGCCCACGGTCTCCGCGGC 
GACGGAGACATCGGCGTACGTGACCTGGATCCCGCGCGGCAACCGCGGCTTCCCCATCCAGTCTTTCCGGGTGGAGTACAAGAAAGTGAAGAAGGCCGGAGAAGACTGGGTGACGGCAGTGG AGAACATCCCCCCATCGCGCCTCTCCGTGGAGATCACAGGCCTGGAGAAAGGTACATCCTACAAGTTCCGCGTGGTGGCGGTGAATGTCATCGGTTCCAGTCCCCCCAGCGCTCCTTCCAAG GCCTACGCGGTGGTGGTTGGGAGAACCCCCGAGCGGCCCGTCGACGGCCCCTACATCACCTACAACGAAGCCATCAATGAGACCAGCATCATCCTCAAATGGACGTACACGCCTGTGAACAA CACGCCCATCTACGGCTTCTACATCTACTACCGCCCGACGGACAGCGACAACGACAGCGACTACAAGAAGGATGTGGTGGAGGGGGACAAGTACTGGCACTCCATCACCAACCTCCAGCCTG AGACCGCCTACGACATCAAGATGCAGAGCTTCAACGAGAAGGGCGAGAGCGAGTTCGGCAACGTGGTGATCCTGGAAACCAAAGGTGGGGCTGTCGTGTCGCTCGCGCCTGGGGGGGTGGAG GACAGAACCGGGTGCATTGATTGTATCCCTCCACTGCGTCTCGCTAGCCCGCCCCAATCAGCCCGTCCCGTCGGAGATCCCAGATTACAGTCCTGGAACCCCCAAGGACGGCGTGCCTCGGC CCGGCGACCTCCCCTACTTCATAGTCGTCATTGTCCTCGGGGCCTTCATCTTCATCATTGTGGCCTTCATCCCCTTCTGTCTTTGGAGGACCTGGGCCAAGCAGAAGCAAACATCAGACATG TGCTTTCCCGCCGTGCCCTCCCCCGTGCCATCCTGCCAGTACACCATGGTCCCTCTCCAGGGACTGGCCCTGGTTGGCCGCTGCCCGCTGGATGGTCACATGACCGGGCCGCACGGGGTTTA CCCTGTGAATGGCGAGTGCGGCATGAATGGCAAACCTCACCACCTGCCAGGACGGCAGCAGGTAAAGAAGCGAAGCGCTGGAACCGGCCTGTGTGGTGGGAGTGGAAAGCCTCAGCTGTGGT TTCTCCACAGGAGGAGGCGGACTGTGACATGGAGTGTGACACCCTGTTACCGCAGACGGTGCCAAATGGACATTTGCCAGTTTGCCATTACCCCACCAGAGTCGGTGTCCTTTCCTCTCCTC TCTCTGGAAGACGAAGGGGTCTTCACCACGTCCTCCTCGACGGCCACAACGCCACAATCCCAAGATACGATTCAGGAAGTGAGCATCCTCCCAAATGA >tetraodon_boc_genome TGGCGTAAAAAGTCACCGTTATGCAGGTGACGCCACAGATTGGTCGTGCGGGACTAAAGAAATGACTCTTTCCTGCAGGCAACTACCTCATCATGCCATCAGGGAACCTGCAGATCGCTAAC GCCACCCAGGAGGACGAGGGCCCGTACAAGTGCGCCGCCTACAACCCCGTCACTCAGGAGGTCAAGACATCCATCTCTGCGGACCGCCTGCGCATACGCCGTGAGTGGCCGCCGGTGCAGAC ACACGCACAGGCCGTTTTTGCATTTTGAGGTCACAAATGGTCGCACAAGCAGTTCCAAGAATTCCGAAATCACGTGCCGGCCGTCCCCCAGGCTCCACCTCCGAGGCCGCACGCATCATTTA CCCGCCGGCTTCTCGCTCCATCATGGCGACCAAGGGCCAGCGGCTGGTGCTGGAGTGTGTGGCCAGCGGCATCCCCACCCCTCAGGTGACATGGGCGAAGGACGGGCAGGACCTGCGCTACG TCAACAACACCCGCTTCCTGCTCAGCAACCTGCTGATCGACGCCGTGGGTGAGAGCGACTCGGGCACCTACGCCTGCCAGGCCGACAACGGCATCCTTGCATCCGCCTCTGCGATGGTGCTC 
TATAACGTCCAGGTGTCCGGTGAGTACACATGGAGGGCGTGAGGTGCAGGGACCTGGATGAGGTGCATGTTTGTTAAAATAGCTTTTTTCTTTCTGTAAAAACAAATGTGTTACTGTAGGTT
TTTTTTGTGTGATTTTGGTAATCACACAAATTGCGTGAAATTGTGTGTGAGAGGCCTGTGAGGCTGGGCGGGGTTCTGGGGGACGGTATTTATGGAGACAGGCCTCCCTCTGTCAGAGGACT
TGACTG

splice MO: CACTCACCGAACACCTGCACATCGT
ATG MO: caacgtcccagacatcttccataca

Test via blast2: ATG-MO vs. cds, and splice-MO vs. genome – no alignment

Task 4: Batch sequence retrieval and BLAST (a bit advanced)
Overview: Mine OMIM (Online Mendelian Inheritance in Man) for genes related to Sonic Hedgehog. Get amino acid sequences for these genes, and identify (via BLAST) the zebrafish orthologs for these proteins. Use a simple script to parse the BLAST results to see where the genes are located in the zebrafish genome. Finally, find out where a few of these genes are expressed (via ZFIN).
Requirements: UNIX terminal, perl (both native on Mac OS X)
Go to the NCBI website: http://www.ncbi.nlm.nih.gov/
Select “OMIM” and search for ‘shh’

OMIM
http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM
Select “Protein links” from the Display pulldown menu
Select “Homo sapiens” from the Top Organisms menu
Select “FASTA (text)” and show 200 from the Display Settings menu
Build up a file from the sequences

BLAST homepage
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Click on the “help” tab
Follow link for “Download BLAST Software and Databases”
Use the “blast” download for your platform

Connect to the Ensembl ftp server

Move to the Zebrafish Directory and download sequences
cd pub/release-63/fasta/danio_rerio/cdna
mget Dan*

Make a BLASTable database from the zebrafish sequences
gunzip Danio*
cat Danio_* >> Zv9_release63_transcripts.fa
mv Zv9_release63_transcripts.fa ~/Desktop/ncbi-blast-2.2.25+
makeblastdb -in Zv9_release63_transcripts.fa -dbtype nucl
(you may need to type ./bin/makeblastdb ... if you haven’t updated your PATH to point to the BLAST executables)

BLAST the SHH-related peptides vs. the zebrafish transcript database
tblastn -query shh_peps.fa -db Zv9_release63_transcripts.fa -num_descriptions 2 -num_alignments 2 -evalue 1e-5 -out shh_v_zv9transcrips.tblastn &
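
The overview mentions a simple script that parses the BLAST results to find where the hits sit in the genome. The lab's own script is not reproduced here; what follows is a minimal Python sketch of one way to do it, under two assumptions that should be checked against your own files. First, it assumes the Ensembl cDNA FASTA headers carry a location field of the form chromosome:ASSEMBLY:NAME:START:END:STRAND (the example header in the comment is hypothetical). Second, it assumes the tblastn search was rerun with tab-separated output (BLAST+ option -outfmt 6) rather than the default pairwise report produced by the command above.

```python
import re

# Assumed header layout (hypothetical example, not a real record):
# >ENSDART00000000001 cdna:known chromosome:Zv9:7:29900000:29950000:-1 gene:ENSDARG00000000001
LOCATION = re.compile(
    r"^>(\S+).*?\s(?:chromosome|scaffold):[^:\s]+:([^:\s]+):(\d+):(\d+):(-?1)"
)


def transcript_locations(fasta_lines):
    """Map transcript ID -> (chromosome, start, end, strand) from FASTA header lines."""
    locations = {}
    for line in fasta_lines:
        match = LOCATION.match(line)
        if match:
            tid, chrom, start, end, strand = match.groups()
            locations[tid] = (chrom, int(start), int(end), int(strand))
    return locations


def locate_hits(tabular_lines, locations):
    """Pair each query with its first-listed hit and that hit's location (None if unknown)."""
    best = {}
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        query, subject = fields[0], fields[1]
        # Tabular output lists hits for each query best-first, so keep only the first one.
        best.setdefault(query, (subject, locations.get(subject)))
    return best
```

Both functions accept any iterable of lines, so open file handles for Zv9_release63_transcripts.fa and for the tabular hits file can be passed in directly.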