Lorenzo Cerutti March 2011

InterPro: Protein signatures, classification, and functional analysis

Lorenzo Cerutti
Swiss Institute of Bioinformatics
Geneva/Lausanne, Switzerland
March 2011

Credits

- Nicolas Hulo and Marco Pagni from SIB for sharing some slides with me.
- Terri Attwood from the Faculty of Life Sciences & School of Computer Science (University of Manchester) for allowing me to use some of her ideas.
- Jennifer McDowall and Duncan Legge for providing the InterPro tutorial.
- The InterPro team for providing some of their InterPro presentation slides.

Biology: a complex matter!

- Proteins
  - exhibit rich evolutionary relationships;
  - exhibit complex molecular interactions;
  - have complex regulation and modification mechanisms;
  - exist in dynamic systems.
- Computational sequence analysis tools are naïve about real biology and the complex relationships between molecular elements.
- Therefore we should be critical about what we can achieve with such computational sequence analysis tools.
- So, again, be critical! And understand the biology.
Today we speak about similarities and differences

A trip to the farm

- What is similar? What is different? Give me the criteria!
- Accumulated observations help to detect subtle similarities/differences.
- Hey! Donkeys have longer ears than horses!

A trip in the air

- Why can they fly? Wings! What else!
- Accumulated observations help to detect subtle similarities/differences, and function.
- Hey! Some of them have an engine!

Functional annotation of sequences

- Use similarities and differences to infer which residues, motifs, and domains are responsible for a particular function.
- Accumulated observations help in identifying such functional regions.
- Ultimately, experimental evidence should be used to label functional residues.
One more thing before the serious stuff: let's play Lego!

Proteins and Lego

- Most proteins are modular and/or contain specific motifs ... as with Lego, you can use the same brick in different constructions.
- We can use these modules and motifs to build specific signatures that will be used to classify proteins and infer their function.
- So, what are such sequence signatures?

General definitions of conserved sequence signatures

- Conserved regions in biological sequences can be classified into 5 different groups:
  - Domains: specific combination of secondary structures organized into a characteristic three-dimensional structure or fold.
  - Families: groups of proteins that have the same domain arrangement or that are conserved along the whole sequence.
  - Repeats: structural units always found in two or more copies that assemble in a specific fold. Assemblies of repeats might also be thought of as domains.
  - Motifs: regions of domains containing conserved active or binding residues, or short conserved regions present outside domains that may adopt a folded conformation only in association with their binding ligands.
  - Sites: functional residues (active sites, disulfide bridges, post-translationally modified residues).
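The Lego analogy and the "Families" definition can be made concrete with a small sketch. It is an illustration only, not how InterPro itself works: the protein names, the domain names, and both helper functions are hypothetical. Each protein is described by its ordered domain arrangement; proteins with the same arrangement are grouped together, and a single domain (a "brick") can be looked up across different arrangements.

```python
from collections import defaultdict

# Hypothetical toy data: each protein is an ordered tuple of domain names.
# Protein and domain names are invented for illustration only.
proteins = {
    "protein_1": ("domain_A", "domain_B"),
    "protein_2": ("domain_A", "domain_B"),
    "protein_3": ("domain_B", "domain_C", "domain_A"),
    "protein_4": ("domain_C",),
}


def group_by_architecture(prots):
    """Group protein names by their exact ordered domain arrangement."""
    groups = defaultdict(list)
    for name, domains in prots.items():
        groups[domains].append(name)
    # Sort the names so the result does not depend on insertion order.
    return {arch: sorted(names) for arch, names in groups.items()}


def proteins_sharing_domain(prots, domain):
    """Return the sorted names of proteins that contain the given domain."""
    return sorted(name for name, domains in prots.items() if domain in domains)


# Proteins with an identical domain arrangement fall into the same group.
families = group_by_architecture(proteins)

# The same "brick" reused in different constructions.
domain_b_users = proteins_sharing_domain(proteins, "domain_B")
```

Grouping on the exact ordered tuple is a deliberately strict simplification: it ignores sequence similarity along the whole protein, which the definition of families above also allows.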
