Final Exam: Introduction to Bioinformatics and Genomics
DUE: Thursday June 23rd at 4:00 pm
I will NOT grade any exams turned in after this time. I am giving you an extra 2 days because the DAVID website is scheduled to be offline for maintenance from Friday June 17th to Sunday June 19th. I suggest that you conduct the gene enrichment analyses before Friday and save the resulting lists in Excel, so that you can carry out the other analyses without depending on DAVID.

Exam description:
The purpose of this exam is for you to demonstrate your ability to use the different biomolecular databases and bioinformatics tools to find biological meaning in a list of genes identified in a transcriptome study. Secondarily, your report should be written clearly and concisely, and should resemble the results section of a manuscript. That is, your narratives should explain what you did and why, without listing each mouse click step by step. You should address the questions listed, but do not make this look like a short answer exam. The questions I put on the exam should NOT appear in your report, except as rephrased as part of your narrative.

There are five sections to the exam. The starting point is a list of significantly differentially expressed genes from a mouse neuronal stem cell line transduced with a putative oncogenic transcription factor, compared to transduction with a vector control. I analyzed the microarray data using two different methods, compared the gene lists from both, and kept the genes that were significantly differentially expressed by both methods. I then edited the list to remove redundant genes, leaving a final list of 198 genes. You will conduct gene enrichment using gene ontology and pathway analyses, look at protein-protein interactions, and then choose one gene on which to conduct an in silico molecular/biochemical analysis. I strongly suggest that you read the paper to help you put the analyses into context.

Reference: Zheng H., et al. “PLAGL2 Regulates Wnt Signaling to Impede Differentiation in Neural Stem Cells and Gliomas”. (2010) Cancer Cell 17:497-509. PMID: 20478531

To obtain full points, your write-up should include:
• Abstract/Introduction
• Gene ontology enrichment
• Pathway enrichment & PPI
• A background section on a gene of choice
• Virtual biochemical/molecular analysis of the same gene
• No more than 7 pages of text, including figures and tables.
• Use 12 point font and margins ≥0.75 inches
• Figures should have figure legends. The legends can be in 10 point font.
• Tables should be generated in Word, not pasted in from a website or as a screen capture.
• If tables cannot be formatted to fit within ½ the page, then you may submit them as a separate attachment.
• No more than 4 attachments (1 page each).
• The report should have your name as a header and page numbers in the footer

BCHM 6280 2016 Final Exam Page 1 of 5

The exam is worth a total of 140 points. The abstract/introduction is worth 10 points. The gene ontology and pathway enrichment sections are worth 30 points each. The background section on the single gene is worth 20 points. The virtual molecular analysis is worth 50 points.

When characterizing a list of genes, you are looking for clues as to why these particular genes show differential expression in response to some condition. I have suggested specific strategies in the sections below that should guide you in finding potentially useful information and in presenting that information in a concise, informative manner. You need to spend some time thinking about what the enriched GO categories, pathways or protein domains may tell you about this particular experimental system. You have the paper with the authors’ conclusions. You can use this to guide your own thinking and provide context. You may choose to use a different strategy than what I laid out, but you need to explain why you chose that particular path. READ carefully.
You will not get full points if you do not address all of the bullet points for a given analysis. If you cannot obtain the data, or the data are not available to address a particular point, then you must state how you looked for it and the result, even if that result is negative. Choosing a gene without any known domains or annotation is probably not a wise strategy, as it will not give you much to work with.

Section 1: Abstract/introduction
The goal of this section is to provide context for your analyses and to briefly describe your results. In 800 words or fewer, describe the results of the paper from which the microarray data were obtained and the results of your own analyses. No figures or tables should be included in this section of your report. This section should probably be written last.

Section 2: Gene enrichment by gene ontology
Use the DAVID suite of programs to conduct enrichment using gene ontology on the genes that are UP-regulated in the list. Generate a table for your report that lists the top 5 enriched terms for each of the three ontology categories. The table should include: term ID, term description, count, %, P-value and fold enrichment. Identify the terms by their GO category (biological process, cellular component or molecular function).

Address the following in your narrative:
• How many genes from the original list were UP- and DOWN-regulated?
• Do any of the enriched terms support the conclusions of the paper with regard to the role of Plagl2 in the cell? Provide your reasoning for this conclusion.
• Are there other terms that you think might be important for the role of Plagl2 as a potential oncogenic transcription factor?
• What other over-represented GO terms would you find interesting to pursue and why?

Create a sublist from all UP-regulated genes in the top 5 biological process categories. This list should be <40 genes. If you have many more than that, then see me before moving on with your analyses.
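For the UP/DOWN count and the fold enrichment column, a minimal Python sketch may help you check your numbers. It assumes that a positive log2 ratio means UP-regulated with Plagl2 relative to control, and it uses the fold enrichment ratio reported in the DAVID chart, (Count / List Total) / (Pop Hits / Pop Total). The function names are hypothetical, and the gene names and numbers are made up for illustration; they are not from the exam list.

```python
def split_by_direction(log2_ratios):
    """Split {gene symbol: log2 ratio} into sorted UP (> 0) and DOWN (< 0) lists."""
    up = sorted(gene for gene, ratio in log2_ratios.items() if ratio > 0)
    down = sorted(gene for gene, ratio in log2_ratios.items() if ratio < 0)
    return up, down


def fold_enrichment(count, list_total, pop_hits, pop_total):
    """Fraction of list genes annotated with a term, divided by the
    fraction of background (population) genes annotated with it."""
    return (count / list_total) / (pop_hits / pop_total)


# Made-up example values (assumption: not taken from the exam gene list).
example = {"GeneA": 1.8, "GeneB": -0.9, "GeneC": 2.4, "GeneD": -1.3, "GeneE": 0.7}
up, down = split_by_direction(example)
print(len(up), "UP-regulated:", up)
print(len(down), "DOWN-regulated:", down)

# 10 of 100 list genes carry the term, versus 500 of 20000 background genes.
fold = fold_enrichment(count=10, list_total=100, pop_hits=500, pop_total=20000)
print("Fold enrichment:", round(fold, 2))
```

A fold enrichment well above 1 only says the term is more frequent in your list than in the background; you still need the P-value from DAVID to judge whether that is meaningful.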
View the sublist in the Gene List report and export the file. As a supplemental table to your report, convert your sublist to an Excel table that includes: Entrez Gene ID, official gene symbol, gene name, and log2 ratio of Plagl2/Ctl expression. You will use this list in the next section.

Section 3: Protein-protein interactions and pathway enrichment
KEGG pathway enrichment: Provide a table of the KEGG pathways over-represented or enriched in your list of genes, based on the DAVID analysis with default options. The table should include 5 columns of data from the DAVID output: the KEGG ID, pathway name, gene count, % and p-value (unadjusted). Compare the enriched pathways to those listed in Figure 6 of the paper, which was generated using the Ingenuity Pathway Analysis software. Some pathways appear to be the same based on their names. Others may reflect similar pathways if you dig into the KEGG pathway description, which can be found at the KEGG website. Use this information to determine which of the enriched KEGG pathways overlap with those in the paper. In a 6th column of the table of over-represented KEGG pathways, list the name of the pathway from Figure 6 that you think matches the KEGG pathway in that row of the table.

Based on the authors’ assertion that Plagl2 promotes cellular renewal and inhibits differentiation, do any of the KEGG enriched pathways make sense? Discuss which ones and why.

Create a sublist of genes from the top 5 enriched KEGG pathways. As a supplemental table to your report, generate a list of all the genes that appear in the top 5 pathways, including: Entrez Gene ID, official gene symbol, gene name, and log2 ratio of Plagl2/Ctl expression.

Protein-protein interactions: Submit the sublist of genes from the top 5 KEGG pathways to the STRING database. Change the view to confidence. Use this information to answer the following questions as part of your narrative:
1. Do the proteins in this list appear to interact in clusters? Describe how many clusters there are and which proteins form them.
2. What are the three most connected proteins? (You may need to shift some proteins around to count all the interactions.)
3. Does expression level (relative to control) correlate with the most connected proteins? That is, are the most connected also the most UP-regulated?
4. When you conduct an analysis in STRING, are the same pathways over-represented as you found from the KEGG analysis in DAVID? How do they differ?

Submit the Entrez Gene IDs from the top 5 BP gene list to the STRING database.
1. Do the proteins in this list appear to interact in clusters?
2. Compared to the PPIs from the top 5 KEGG list, does this list have more or fewer proteins with no known or predicted interactions?
3. What are the top three most connected proteins?
4. Are they the same as you observed when looking at the gene list from the top 5 KEGG pathways? If not, how do they differ?

In general, comment on whether you find the visual view of the PPI or the pathway visuals helpful in understanding your system. How would you use this context to choose genes to study in bench experiments?

Section 4: Background section on gene of choice
For this section and the next section of the exam, you will select one gene from the list of differentially expressed genes and find its human homolog. Provide a brief summary of the function of your GENE in the cell in table format:
• Provide the Entrez Gene ID, HGNC approved name and symbol, UniProt accession and the RefSeq mRNA accession number for the longest isoform of your GENE.