Identification, Characterization and Evolution of Membrane-Bound Proteins
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 388

Identification, Characterization and Evolution of Membrane-bound Proteins

PÄR J. HÖGLUND

ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2008
ISSN 1651-6206
ISBN 978-91-554-7312-9
urn:nbn:se:uu:diva-9329

Scientific discovery and scientific knowledge have been achieved only by those who have gone in pursuit of it without any practical purpose whatsoever in view.
Max Planck

To my family

List of publications

I. Fredriksson R, Lagerström MC, Höglund PJ, Schiöth HB. Novel human G protein-coupled receptors with long N-terminals containing GPS domains and Ser/Thr-rich regions. FEBS Lett. 2002 Nov 20;531:407-14.
II. Fredriksson R, Gloriam DE, Höglund PJ, Lagerström MC, Schiöth HB. There exist at least 30 human G-protein-coupled receptors with long Ser/Thr-rich N-termini. Biochem Biophys Res Commun. 2003 Feb 14;301:725-34.
III. Fredriksson R, Höglund PJ, Gloriam DE, Lagerström MC, Schiöth HB. Seven evolutionarily conserved human rhodopsin G protein-coupled receptors lacking close relatives. FEBS Lett. 2003 Nov 20;554:381-8.
IV. Bjarnadóttir TK, Fredriksson R, Höglund PJ, Gloriam DE, Lagerström MC, Schiöth HB. The human and mouse repertoire of the adhesion family of G-protein-coupled receptors. Genomics. 2004 Jul;84:23-33.
V. Höglund PJ, Adzic D, Scicluna SJ, Lindblom J, Fredriksson R.
The repertoire of solute carriers of family 6: identification of new human and rodent genes. Biochem Biophys Res Commun. 2005 Oct 14;336:175-89.
VI. Höglund PJ, Nordström KJ, Schiöth HB, Fredriksson R. The families of Solute Carriers (SLC) have a remarkable long evolutionary history with most families present before a early divergence of eukaryotes. Manuscript.

Articles printed with permission.

Contents

Introduction
  From genetics to postgenomics
    Genetics
    Genomics
    Postgenomics
  Sequenced genomes
  Membrane proteins
    G protein-coupled receptors
    The Solute Carrier Family
Research aim
Methods
  Phylogenetic analysis
  Sequence similarity based search strategies
  Expressed Sequence Tags
Results
Discussion
  The Adhesion Family
  The Solute Carrier Family
  The Solute Carrier Family 6
  Diseases
Future perspectives
Acknowledgements
References

Abbreviations

7TM           Seven transmembrane
Adhesion(s)   GPCR(s) belonging to the Adhesion family
BAC(s)        Bacterial artificial chromosome(s)
BAI           Brain Specific Angiogenesis Inhibitor
BLAST         Basic Local Alignment Search Tool
BLAT          BLAST-Like Alignment Tool
CA            Cadherin Repeats
cDNA          Complementary DNA
CDS           Coding Domain Sequence
CELSR         Cadherin EGF LAG seven-pass G-type receptor
CNS           Central Nervous System
EGF           Epidermal Growth Factor Domain
EGF-LAM       Laminin Type Epidermal Growth Factor Domain
EMR           EGF-module containing mucin-like hormone receptors
EST(s)        Expressed Sequence Tag(s)
E-value       Expectation value
Gbp           Gigabase pairs
GPCR          G Protein-Coupled Receptor
GPS           GPCR-Proteolytic Site
GRAFS         The human GPCR families: Glutamate, Rhodopsin, Adhesion, Frizzled/Taste2 and Secretin
HBD           Hormone-Binding Domain
HE6           Human Epididymal Gene Product 6
HGNC          HUGO Gene Nomenclature Committee
HMM           Hidden Markov Model
HUGO          Human Genome Organisation
LamG          Laminin domain
LEC           Lectomedin receptor
Mbp           Megabase Pairs
ML            Maximum Likelihood
MP            Maximum Parsimony
mRNA          Messenger Ribonucleic Acid
NCBI          (US) National Center for Biotechnology
              Information
NJ            Neighbor Joining
OLF           Olfactomedin domain
PTX           Pentraxin domain
Rhodopsin(s)  GPCR(s) belonging to the Rhodopsin family
RPS-BLAST     Reverse Position specific BLAST
SEA           Sperm Protein, Enterokinase, and Agrin
SLCs          Genes belonging to the Solute Carrier Family
WGS           Whole Genome Shotgun
YAC(s)        Yeast artificial chromosome(s)

Introduction

From genetics to postgenomics

Genetics

The modern era of genetics is generally considered to have started with Gregor Mendel and Charles Darwin. In 1858, Alfred Russel Wallace published his essay “On the Tendency of Varieties to Depart Indefinitely from the Original Type” in a joint presentation together with previously unpublished writings from Charles Darwin. Darwin published his landmark work “On the Origin of Species by Means of Natural Selection” in 1859. In 1866, the Augustinian monk Gregor Mendel published his findings on the laws of inheritance, based on experiments. However, his findings received no attention from the scientific community, and it was not until 1900, with the "rediscovery of Mendel", that his theories became widely accepted. In 1915, Thomas Hunt Morgan published “The Mechanism of Mendelian Heredity”, in which he presented results showing that genes are lined up along chromosomes. In 1953, Francis Crick and James Watson determined the structure of the DNA molecule: a double helix composed of strings of nucleotides (Watson et al. 1953). Crick and Watson shared the Nobel Prize in Physiology or Medicine in 1962 for their discovery. In 1977, Fred Sanger developed the chain termination method for sequencing DNA (Sanger et al. 1977), and Allan Maxam and Walter Gilbert developed a sequencing method based on chemical cleavage of DNA (Maxam et al. 1977). These sequencing techniques were very powerful and substantially raised the sequencing capacity.
In 1983, Kary Mullis and others at Cetus Corporation invented the polymerase chain reaction (PCR), a technique for making many copies of a specific DNA sequence.

Genomics

The Human Genome Project, an international effort to sequence all human DNA and map all human genes, was launched in 1990 (Watson 1990). In 1995, the first full genome sequence of a free-living organism was completed for the bacterium Haemophilus influenzae (Fleischmann et al. 1995). In 1996, Saccharomyces cerevisiae (baker's yeast) became the first eukaryote to have its genome completely sequenced and annotated (Goffeau et al. 1996). The nematode Caenorhabditis elegans was the first multicellular organism to have its genome completely sequenced (CESC 1998). DNA sequencing developed rapidly, and sequences of much larger genomes followed: Drosophila melanogaster (Adams et al. 2000), Arabidopsis thaliana (AGI 2000), Anopheles gambiae (Holt et al. 2002) and Mus musculus (Waterston et al. 2002). In 2001, the completion of a draft of the human genome was announced by Celera Genomics (Venter et al. 2001) and the International Human Genome Sequencing Consortium (IHGSC) (Lander et al. 2001). The so-called final version of the human genome was published in 2004 (IHGSC 2004). The number of sequenced genomes has expanded quickly. As of 2008-09-15, there were 854 published complete genomes in GOLD, the Genomes OnLine Database (http://www.genomesonline.org):