EMBL-EBI Annual Scientific Report 2012 Table of Contents

EMBL-European Bioinformatics Institute Annual Scientific Report 2012 [email protected] © 2013 EMBL-European Bioinformatics Institute © 2013 EMBL-European External Relations Team. by EMBL-EBI’s This publication was produced information about EMBL-EBI please contact: For more EMBL-EBI Annual Scientific Report 2012 Table of contents Introduction & overview 3 Organisation 8 Services Genes, genomes and variations 10 Molecular atlas 20 Proteins and protein families 28 Molecular and cellular structures 36 Molecular systems 44 Chemical biology 48 Cross-domain tools and resources 54 Research 58 Bertone group 60 Birney group 62 Enright group 64 Goldman group 66 Le Novère group 68 Luscombe group 70 Marioni group 72 Rebholz-Schuhmann group 74 Saez-Rodriguez group 76 Thornton group 78 The EMBL International PhD Programme 80 Support 82 Financial report 96 Scientific advisory boards 104 Major database collaborations 110 Publications 116 2012 EMBL-EBI Annual Scientific Report 2012 EMBL-EBI Annual Scientific Report 1 2 Foreword Welcome to the 2012 Annual Scientific Report of the EMBL-European Bioinformatics Institute. This document gives an overview of our institute’s major achievements during the calendar year, reflecting progress in our mission areas of service, research, training, industry and European co-ordination. If you have visited the Genome Campus over the past year, you will have noticed a new building under construction. We were fortunate to receive funding in 2011 from the UK government's Large Facilities Capital Fund to expand our activities, and in June 2012 EMBL-EBI Director Janet Thornton broke the ground at a landmark ceremony. The rapid progress of construction means that the building is scheduled for completion in autumn 2013. The South Building will house the ELIXIR Hub as well as an Industry and Innovation Suite, which will promote the translation of biological information to applications in medicine, biotechnology and the environment. Our funding remained stable in 2012, despite another year of austerity in many countries, and our staff numbers also remain steady. Yet we continued to handle an exponential rise in biological data and unprecedented growth in the use of our resources. All of our efforts rely on close international collaborations. The deposition of new data, the daily exchange of information between data resources, the joint development of software tools, the sharing of curation tasks and the challenges of collaborative research allow us to build an extensive network of colleagues. We look forward to strengthening these links in the coming year and, as always, to creating new collaborations. Janet Thornton, Director Rolf Apweiler, Joint Associate Director Ewan Birney, Joint Associate Director 3 EMBL-EBI 2012 It was a year of transition, as we bade a fond farewell to Associate Director Graham Cameron, and welcomed Rolf Apweiler and Ewan Birney as joint Associate Directors. It was also a year of honours, as Drs Apweiler and Birney were appointed members of EMBO and our Director, Professor Janet Thornton, was made Dame Commander of the British Empire in recognition of her contribution to bioinformatics. Our research programme went through major changes as The new global website, to be launched in 2013, provides well, as we gained new group leaders Pedro Beltrao from an easily navigable interface to EBI services and features the University of California, San Francisco; Oliver Stegle from consistent functionality which upholds the unique identity the Max-Planck Institutes in Tübingen; and Sarah Teichman of resource brands. Many teams expended significant time from the MRC Laboratory of Molecular Biology in Cambridge. and effort creating and testing new user interfaces, for In the same year, we were very sad to say farewell to three example UniProt and InterPro, and their findings contributed talented and experienced group leaders, who reached significantly to the overall design strategy. the end of their EMBL contracts: Nick Luscombe, now at University College London and the Cancer Research UK London Research Institute; Nicolas Le Novère, who has Research moved a short way to the Babraham Institute, Cambridge; The ENCODE project dominated scientific news worldwide and Dietrich Rebholz-Schuhmann, now at the University of in September. Co-led by Ewan Birney at EMBL-EBI and Zurich’s Institute of Computational Linguistics. funded by the National Human Genome Research Institute In 2012 our protein services saw a change of leadership, in the US, the project comprised over 400 scientists in 32 as Alex Bateman left the Wellcome Trust Sanger Institute labs throughout the world who produced a detailed map of and moved across the Genome Campus to take on the genome function. ENCODE’s staggering output inspired a responsibilities formerly held by Rolf Apweiler. His team new publishing model: upwards of 30 papers were published brings with them the Pfam, Rfam, TreeFam and MEROPS under open-access license in several different peer-reviewed databases. A new Variation journals, with the contents linked by topic and united for Team, led by newcomer optimum exploration in a single interface, provided by Nature Justin Paschall, from (ENCODE Project Consortium, 2012). The launch of ENCODE the National Center was all about public engagement, with a press conference for Biotechnology featuring an exhibit at the London Science Museum and aerial Information (NCBI) in the acrobats re-enacting gene expression. The event captured US, is creating a unified the imagination of the world’s top-tier television and print interface for the European media. Genome-phenome Archive The 1000 Genomes Project published a map of normal and the Database of human variation: a major milestone for the Flicek team, Genomic Variants archive. which co-led the project’s data co-ordination centre with the Perhaps 2012’s largest National Centre for Biotechnology Information (NCBI). This project, involving every vast dataset is freely available via the Amazon Web Services team at EMBL-EBI, was the Cloud (The 1000 Genomes Project Consortium, 2012). redesign of the website with user experience as the driver. Figure 1. In 2012 Professor Janet Thornton was awarded the Commander of the British Empire medal. 4 Petra Schwalie in Paul Flicek’s research group led efforts to develop a new, integrated model of the evolution of the Services transcription factor CTCF. Her work, carried out with Duncan After broad consultation with the service teams and support Odom’s group at the University of Cambridge, explains the groups, we reorganised our services into ‘clusters’ to reflect origin of some 5000 highly conserved CTCF binding events the research communities they serve. The clusters provide in mammals and sheds light on how these binding sites are a better framework to make strategic decisions whilst still moved and amplified (Schmidt et al., 2012). enabling a deep engagement with the communities. Another gene expression study in Nick Luscombe’s lab The new clusters are: genes, genomes and variation, led by found that an organism’s most valuable assets are protected Paul Flicek; molecular atlas, led by Alvis Brazma; proteins most carefully. Their work demonstrated that mutation rates and protein families, led by Alex Bateman; molecular and in bacteria vary by more than an order of magnitude, with cellular structures, led by Gerard Kleywegt; chemical biology, highly expressed genes having the lowest mutation rates led by John Overington; molecular systems, also led by John (Martincorena et al., 2012). Overington; and cross-domain tools and resources, unified under Rolf Apweiler. Janet Thornton’s group continues its painstaking work to understand the evolution of enzyme mechanisms. Nick As a result of the exercise, the ArrayExpress Archive, Furnham, a postdoc in Janet’s group, worked with Christine Expression Atlas, PRIDE and MetaboLights resources are Orengo’s group at University College London to develop a now clustered together, in the interests of combining them new resource called FunTree and used it to study 276 enzyme into a single portal that provides access to integrated views superfamilies. This allowed them to determine the extent to of –omics data. which enzyme functions have changed over the course of evolution–findings that have important implications for the development of new therapeutics. Figure 2. The new South building, which will house the ELIXIR Technical Hub, under construction on the Genome Campus. 5 New service developments Training • Understanding the basis of crop diseases was the driver • Our user-training programme continued to develop behind the launch of PhytoPath, a new portal for plant alongside our service offerings and in 2012 actively pathogen data, while the Kersey team’s involvement in the involved more than 150 members of EMBL staff in 232 transPLANT project gave rise to a new integrative portal events, reaching an audience of over 9770 people in for plant genomics data. 35 countries on six continents. The Train online portal saw a rapidly rising number of users in its first year, The EBI Metagenomics resource was officially launched • and our hands-on programme offered new courses on by the Hunter and Cochrane teams, enabling users to –omics-based data analysis for plant biologists and a submit, archive and analyse genomic information from combined experimental/computational metagenomics environments containing many species. course, amongst many others. The Cochrane and Brazma • The Enzyme Portal, launched in 2012, draws together teams

EMBL-EBI Annual Scientific Report 2012 Table of Contents

A Method to Infer Changed Activity of Metabolic Function from Transcript Proﬁles

ANNUAL REVIEW 1 October 2005–30 September

Improving the Prediction of Transcription Factor Binding Sites To

The Kyoto Encyclopedia of Genes and Genomes (KEGG)

Applied Category Theory for Genomics – an Initiative

A Community Proposal to Integrate Structural

Distinctive Regulatory Architectures of Germline-Active and Somatic Genes in C

Open Data, Open Source, and Open Standards in Chemistry: the Blue Obelisk Five Years On" Journal of Cheminformatics Vol

Vol 9 No 1 Spring

UNIVERSITY of CALIFORNIA SAN DIEGO Making Sense of Microbial

Gurdon Institute 20122011 PROSPECTUS / ANNUAL REPORT 20112010

Open Data, Open Source and Open Standards in Chemistry: the Blue Obelisk ﬁve Years On