Chapter 7. Functional Genomics Contents

7. Functional Genomics
7.1. Annotating Protein Function
7.1.1. The BLAST Search Tool at NCBI
7.1.2. Functional Annotation Using Yeast Knockout Mutations
7.1.3. Functional Annotation Using Mouse Knockout Mutations
7.1.4. Gene Knockdown Using RNAi
7.1.5. Gene Editing Using CRISPR-CAS
7.1.6.
7.1.7. Protein Localization Using GFP Tags
7.1.8. Protein-Protein Interactions by Yeast 2-hybrid
7.2. Proteomics
7.2.1. What is Proteomics
7.2.2. Protein Modifications
7.2.3. Technology for Proteomic Analysis
7.2.4. Structural Analysis of Proteins
7.3. Transcriptome and Gene Expression
7.3.1. Single Transcript Abundance Estimation
7.3.2. Genome-wide Transcript Abundance Estimation

CONCEPTS OF GENOMIC BIOLOGY

CHAPTER 7. FUNCTIONAL GENOMICS

The Central Dogma of molecular biology, simply stated, is that DNA is transcribed into RNA and RNA is translated into protein. Further, we know that the phenotype associated with a gene is determined by the proteins made inside the cell. Thus, it is because of the function of the protein that the gene is expressed as a phenotype. In this chapter we will examine how we go about understanding protein function so that the annotation of proteins found at NCBI can be determined. The tools that are required to do this will also be considered. Today we extend this functional analysis to include a sophisticated analysis of gene expression that involves both all transcripts made by a genome, the transcriptome, and all proteins made from a genome, the proteome.

7.1. ANNOTATING PROTEIN FUNCTION

Previously, in Chapter 6, Structural Genomics, we examined how genes are identified in sequenced genomes. Once the coding sequences, the locations of introns and exons, the promoter region, and the 3’-UTR have been identified, the next step is to annotate the structure and function of the protein products of protein-coding ORFs. The general approach and some of the tools available for this task are outlined here.

7.1.1. The BLAST Search Tool at NCBI

The Basic Local Alignment Search Tool (BLAST) at NCBI provides the ability to search a sequence database with a given sequence called a query sequence. The sequences most similar to the query are reported back with a sequence alignment, a computed score indicating the similarity of the query to the sequences found in the database, and any corresponding annotation for the hit sequences. Thus, we can learn the functions of the proteins that are most closely related to the query sequence we are using.

There are several types of BLAST searches that can be used. These include:

I. Nucleotide BLAST – using a nucleotide query to search a nucleotide database for the most similar sequences.
II. Protein BLAST – using a protein query to search a protein database for the most similar sequences.
III. Translated BLAST searches including
   a. BLASTX – using a nucleotide query translated in all 6 reading frames to search a protein database for the sequences most similar to the translated query.
   b. TBLASTN – using a protein query to search a nucleotide database translated in all 6 reading frames.

Note that a laboratory exercise covers the BLAST tool at NCBI, in which examples of these searches are executed and examined.

The utility of BLAST can be further extended by linking BLAST search results to a gene ontology (GO). The Gene Ontology Consortium provides a framework for relating functional information about genes to the function of the whole organism, e.g. determining when, where, and how a gene functions (including metabolic and developmental pathways). The extension of BLAST described above can be executed at several different web pages including the BLAST2GO page. Such analyses have been conducted on virtually every genome sequenced, to provide at least a minimal functional analysis of the genome sequence. An example of the GO analysis of the yeast genome is given in Figure 7.1. The number of genes in various categories, organized in two different ways, is shown in the figure.

Figure 7.1. Yeast GO analysis, indicating the number of genes in each category. Each of the roughly 6200 genes identified in the yeast genome has been placed in one or more GO categories, and a graphic summary of the analysis is presented above. Note that the categories can be altered so that critical points of interest can be investigated in more detail.

7.1.2. Functional Annotation Using Yeast Knockout Mutations

The purpose of making a knockout mutation is to replace the open reading frame of an endogenous gene in the yeast genome with a replacement sequence that makes the endogenous gene nonfunctional. Typically, the replacement sequence is a selectable marker gene that allows the easy detection of the insertion. The most commonly used selectable marker is a gene for resistance to the antibiotic kanamycin. The KanR (kanamycin resistance) cassette, including a promoter for the gene, is inserted between a sequence of approximately 50 base-pairs near the start site at the 5’-end of the gene and a sequence about 50 base-pairs long near the 3’-end of the gene. As a result, the middle portion of the gene is removed (deleted) and the cassette is inserted in its place. The insertion is accomplished by a process referred to as homologous recombination (Figure 7.2a).

This recombinant construct is made in a shuttle vector (refer to Chapter 4, The Genomic Biologists Toolkit, section 4.2.3.) so that it can be transferred into yeast cells, and selection is performed for cells that are stably resistant to kanamycin. Stable resistance indicates that the knockout construct has been successfully recombined into the chromosomal DNA.

Successful knockouts are further verified using PCR. A forward primer (A) outside the gene on the 5’-end and a reverse primer (B) inside the endogenous gene are used in one PCR reaction, while a forward primer (C) also inside the endogenous gene and a reverse primer (D) outside the 3’-end of the gene are used in a second reaction. If the knockout was unsuccessful, these two sets of primers will each amplify endogenous gene sequences. If the knockout was successful, these primer sets will not amplify any sequence, because the binding sites for B and C have been deleted. Instead, the same (A) and (D) primers are paired with a reverse primer that lands inside the KanR cassette (KanB) and a forward primer that also lands inside the cassette (KanC), as shown in Figure 7.2b. The A-KanB and KanC-D primer sets amplify sequences in the knockout, while they will not amplify endogenous gene sequences.

Figure 7.2. a) Yeast knockout mutations are constructed using a KanR cassette by homologous recombination (see text for details). b) Verification of the knockout by PCR using 4 primer sets: A-B, C-D, A-KanB, and KanC-D.

When you want to determine the phenotype of a gene in any Eukaryote, knockout mutations of the yeast ortholog are a valuable tool for doing this. However, the technique only works when an ortholog of the gene of interest can be found in yeast. It is particularly useful for genes with simple metabolic phenotypes.

The steps involved would be as follows:

I. Identify the putative protein of interest using a BLAST search.
II. Determine whether there is a yeast ortholog of your gene of interest.
III. Construct a knockout of that gene and determine the phenotype of the knockout.
IV. Obtain a cDNA clone of the gene of interest from the appropriate Eukaryotic organism.
V. Construct a yeast expression shuttle vector that expresses the gene of interest.
VI. Move that construct into the yeast knockout and determine whether the normal phenotype is restored or the knockout phenotype remains.

The limitation of this analysis is that your gene of interest must have a yeast ortholog. Yeast has a genome that contains approximately 6,300 genes, whereas most Eukaryotes contain several-fold more genes. Yeast cells are essentially unicellular, while most Eukaryotes of interest are multicellular and have many genes associated with the developmentally appropriate expression of genes. To apply the knockout approach to complex multicellular organisms, mouse knockouts have proven more useful for many genes.

7.1.3. Functional Annotation Using Mouse Knockout Mutations

A mouse knockout is made using homologous recombination just as in yeast. The mouse knockout cassette contains the ends of the target gene just as with yeast, but it contains two selectable markers, one inside the gene of interest and the other outside the gene of interest. These two markers make it possible to distinguish between a true homologous recombination knockout and integration of the foreign DNA vector into a random site in the genome, which would not produce a mouse knockout (Figure 7.3).

The deletion module is introduced into mouse embryonic stem (ES) cells from an agouti mouse, and the ES cells are incubated on “selective” media such that only the cells that carry the internal deletion marker (neoR in the case shown in Figure 7.3) and lack the marker outside the deletion site (tk in Figure 7.3) are allowed to grow. This enriches for cells that have the deletion module inserted into the target gene via homologous recombination and minimizes cells that had the full DNA vector randomly integrated into the genome, which is the other possible outcome.
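The two-marker selection described above can be sketched in a few lines of Python. This is only an illustrative model under simplifying assumptions, not a laboratory protocol: the EsCell record, its field names, and the survives_selection function are hypothetical names introduced here, and each cell is reduced to whether it carries the neoR and tk markers of Figure 7.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EsCell:
    """Hypothetical record of the markers an ES cell carries after transfection."""
    label: str
    has_neoR: bool  # marker inside the target gene, kept by the deletion module
    has_tk: bool    # marker outside the target gene, lost on homologous recombination


def survives_selection(cell: EsCell) -> bool:
    """A cell grows only if it carries the internal marker and lacks the external one."""
    return cell.has_neoR and not cell.has_tk


# Three simplified outcomes of introducing the deletion module into ES cells.
cells = [
    EsCell("homologous recombination", has_neoR=True, has_tk=False),
    EsCell("random integration of full vector", has_neoR=True, has_tk=True),
    EsCell("no integration", has_neoR=False, has_tk=False),
]

survivors = [cell.label for cell in cells if survives_selection(cell)]
print(survivors)  # ['homologous recombination']
```

In this simplified model only the homologous recombinant passes both conditions, which is why the surviving population is enriched for true knockouts.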
