WELCOME !

On behalf of the ECCB10 Organizing and Steering Committees we would like to welcome you to beautiful Ghent, one of Europe’s most exciting cities. We hope you will enjoy ECCB10 - four days of presentations, poster sessions, workshops, software demonstrations and networking with other leading researchers, computational biologists and bio-IT professionals from around the world. ECCB is the leading European meeting in our field and a primary international conference at which people present and discuss advances in computational biology. We are proud to have five keynote lectures by distinguished speakers Peer Bork (European Molecular Biology Laboratory, Heidelberg, Germany), Elaine Mardis (Washington University Genome Center, St-Louis, MO, USA), Michael J. E. Sternberg (Imperial College, London, UK), Yves Van de Peer (Flemish Institute of Biotechnology and University of Ghent, Belgium) and Hans Westerhoff (University of Manchester, UK and VU University Amsterdam, The Netherlands). In addition, there will be a number of interesting workshops and tutorials on Sunday, including the first European Student Council Symposium, a workshop on leveraging perturbation effects towards reverse engineering biological networks and one on extraction and reuse of genotype-phenotype knowledge from scientific literature and databases. There is no better way to kick off ECCB10 than to participate in one of these workshops and tutorials! On Monday afternoon, we will have a special joint lecture by artist Koen Vanmechelen and geneticist Jean- Jacques Cassiman. They will present the Cosmopolitan Chicken Project – Koen’s quest to create a universal, cosmopolitan chicken. Further, they will discuss how art and science influence each other. We would like to acknowledge the help of all people who made ECCB10 happen, as well as our commercial and academic sponsors – thanks to them we were able to offer 62 fellowships for early-stage scientists. Finally, we would like to thank you for your participation. Enjoy the conference!

Yves Moreau & Jaap Heringa

ECCB10 Table of Contents

TABLE OF CONTENTS

Hotels ...... 4 Local Public Transport ...... 5 Floor Plan ...... 6 Organization ...... 7 Keynotes ...... 15 Schedule ...... 17 Social Events ...... 25 Workshops ...... 26 Tutorials ...... 29 Art Meets Science ...... 34 Technology Track ...... 35 Oral Presentations ...... 45 Posters ...... 61 Author Index ...... 102

3

ECCB10 Hotels

HOTELS

NH Gent Sint Pieters Hotel Novotel Gent Centrum Koning Albertlaan 121, 9000 Ghent, Belgium Goudenleeuwplein 5, 9000 Ghent, Belgium +32 9 222 60 65 +32 9 224 22 30

Holiday Inn Gent Expo Carlton Hotel Gent Maaltekouter 3, 9051 Ghent, Belgium Koningin Astridlaan 138, 9000 Ghent, Belgium +32 9 220 24 24 +32 9 222 88 36

NH Gent Belfort Hostel 47 Hoogpoort 63, 9000 Ghent, Belgium Blekerijstraat 47, 9000 Ghent, Belgium +32 9 233 33 31 +32 478 71 28 27

Ghent Marriott Hotel HI De Draecke Gent Korenlei 10, 9000 Ghent, Belgium Sint-Widostraat 11, 9000 Ghent, Belgium +32 9 233 93 93 +32 9 233 70 50

Hotel Gravensteen Ecohostel Andromeda Jan Breydelstraat 35, 9000 Ghent, Belgium Bargiekaai 35, 9000 Ghent, Belgium +32 9 225 11 50 +32 486 67 80 33

Ghent River Hotel Waaistraat 5, 9000 Ghent, Belgium +32 9 266 10 10

Hotel de Flandre Poel 1, 9000 Ghent, Belgium +32 9 266 06 00

Hotel Ibis Gent Centrum Opera Nederkouter 24-26, 9000 Ghent, Belgium +32 9 225 07 07

Hotel Ibis Centrum St-Baafs Kathedraal Limburgstraat 2, 9000 Ghent, Belgium +32 9 233 00 00

Holiday Inn Gent Express Akkerhage 2, 9000 Ghent, Belgium +32 9 222 58 85

Campanile Akkerhage 1, 9000 Ghent, Belgium +32 9 220 02 22

Poortackere Monasterium Oude Houtlei 58, 9000 Ghent, Belgium +32 9 269 22 30

4

ECCB10 Local Public Transport

LOCAL PUBLIC TRANSPORT

Basics ‘De Lijn’ (http://www.delijn.be) provides trams and buses in Ghent. Using public transport is easy and travelers are offered several ticket options, but in any case buying your ticket in one of the distribution points (train station, kiosks, supermarkets…) is at least 25% cheaper than buying it from the bus or tram driver. If you are considering taking several bus/tram trips you might be interested in the “lijnkaart”, which grants the owner 10 trips at a discount rate. People traveling in groups of 5 or more can buy a “lijnkaart %” which also provides 10 trips at an even lower rate. If you plan on using public transportation a lot, a day-pass or a 3-day-pass might be the most convenient way of traveling through Ghent. Once you have bought a ticket traveling is as simple as getting on the bus, saying “Goeiedag!” to the driver and inserting the ticket into the yellow validation device (don’t forget to take it back). Transfers are allowed for one hour after validation.

Ticket Type Price at distribution point Price on the bus/tram Single ticket € 1,20 € 2 Lijnkaart (10 trips) € 8 € 15 Lijnkaart % (10 trips) € 6 € 11 Day-pass € 5 € 6 3-day-pass € 10 € 12 Museum Pass (3-day-pass, includes access to the city’s museums): € 20

From the conference site to the city centre Bus 5 is a convenient way to go to the city centre. It leaves from the bus stop in front of the conference site. You will find a map of public transport in the conference bag.

Ghent Museum Pass The Ghent Museum Pass costs € 20 and offers access to various museums and municipal buildings for 3 days. During those 3 days the Ghent Museum Pass is also valid as a ticket for all De Lijn vehicles in the Ghent city area. The validity period starts on the date that you write on the back of the pass. You can purchase a Ghent Museum Pass in advance and use it for your journey to the first museum. You can purchase the Ghent Museum Pass in the Lijnwinkels in East Flanders, in participating museums, from the tourism department of the city of Ghent, and in several Ghent hotels. The participating museums and municipal buildings are: Belfry, Friary of the Carmelites, Design Museum Ghent, De wereld van Kina: the Garden, De wereld van Kina: the House, The Castle of the Counts, Alijn House, Art Centre St Peter’s Abbey, MIAT (Museum for Industrial Archaeology and Textiles), Museum Dr. Guislain, Museum for the History of the Sciences, Museum of Fine Arts, St Bavo's Cathedral + The Adoration of the Mystic Lamb, SMAK (City Museum for Modern Art), STAM (City Museum Ghent). More information available at De Lijn's website (http://www.delijn.be/en/vervoerbewijzen/types/uitstap/#5).

5

ECCB10 Floor Plan

FLOOR PLAN

6

ECCB10 Organization

ORGANIZATION

Conference Chair , Katholieke Universiteit Leuven, Belgium

Proceedings Chair Jaap Heringa, VU University Amsterdam, The Netherlands

Local Organizing Committee Yves Moreau, Katholieke Universiteit Leuven, Belgium Jaap Heringa, VU University Amsterdam, The Netherlands Gert Vriend, Radboud University, Nijmegen, The Netherlands Yves Van de Peer, University of Ghent and VIB, Belgium Kathleen Marchal, Katholieke Universiteit Leuven, Belgium Jacques van Helden, Université Libre de Bruxelles, Belgium Louis Wehenkel, Université de Liège, Belgium Antoine van Kampen, University of Amsterdam and Netherlands Bioinformatics Center (NBIC) Peter van der Spek, Erasmus MC, Rotterdam, The Netherlands

Steering Committee (Chair), Hebrew University, Jerusalem, Israel Soren Brunak, Technical University of Denmark, Lyngby, Denmark Philipp Bucher, University of Lausanne, Switzerland David Gilbert, University of Glasgow, UK , Max-Planck Institute for Informatics, Saarbrücken, Germany Yves Moreau, Katholieke Universiteit Leuven, Belgium Dietrich Rebholz-Schuhmann, European Bioinformatics Institute, Hinxton, UK Marie-France Sagot, INRIA and Université Claude Bernard, Lyon, France , Universidad Autonoma, Madrid, Spain Jaak Vilo, University of Tartu, Estonia

Program Committee – Area Chairs

A—Sequence Analysis, Alignment, and Next-Generation Sequencing Des Higgins, University College Dublin, Ireland Geoff Barton, University of Dundee, Scotland, UK

B—Comparative Genomics, Phylogeny, and Evolution Martijn Huynen, Radboud University Nijmegen Medical Centre, The Netherlands Yves Van de Peer, Ghent University and VIB, Belgium

C—Protein and Nucleotide Structure , University of Rome "La Sapienza", Italy Jan Gorodkin, University of Copenhagen, Denmark

7

ECCB10 Organization

D—Annotation and Prediction of Molecular Function Nir Ben-Tal, Tel-Aviv University, Israel Fritz Roth, Harvard Medical School, USA

E—Gene Regulation and Transcriptomics Jaak Vilo, University of Tartu, Estonia Zohar Yakhini, Agilent Laboratories, Tel-Aviv and the Technion, Haifa, Israel

F—Text Mining, Ontologies, and Databases Alfonso Valencia, National Center for Cancer Research, Madrid, Spain Guy Cochrane, European Bioinformatics Institute, Hinxton, UK

G—Protein Interactions, Molecular Networks, and Denis Thieffry, University of Marseille, France Benno Schwikowski, Institut Pasteur, Paris, France

H—Genomic Medicine Yves Moreau, Katholieke Universiteit Leuven, Belgium Niko Beerenwinkel, Federal Institute of Technology, Zürich, Switzerland

I—Other Bioinformatics Applications , Max Planck Institute for Molecular Genetics, Berlin, Germany Hans-Peter Lenhof, Saarland University, Saarbrücken, Germany

Program Committee – Members Edoardo Airoldi Mario Caccamo Jan Gorodkin Tero Aittokallio Yasmin Alam-Faruque Daniele Catanzaro Sam Griffiths-Jones Mario Albrecht Claudine Chaouiya Michael Gromiha Andre Altmann J. Michael Cherry Alain Guenoche Simon Andrews Andrea Cilliberto Roderic Guigo Rolf Backofen Kevin B. Cohen Udo Hahn Gary Bader Chris Cole Turkan Haliloglu Guy Baele Tim Conrad Thomas Hamelryck Timothy Bailey Thomas Dandekar Jackie Han Susmita Datta Monica Heiner Geoff Barton Peter Dawyndt Jaap Heringa Niko Beerenwinkel Hidde de Jong Philippe Herve Tim Beissbarth Charlotte Deane Matt Hibbs Asa Ben-Hur Rahul Deo Des Higgins Nir Ben-Tal Francisco Domingues Andreas Hildebrandt Panayiotis Benos Bas E. Dutilh Lynette Hirschman Mathieu Blanchette Arne Elofsson Ivo Hofacker Christian Blaschke Olof Emanuelsson Liisa Holm Christoph Bock Andrea Foulkes Torsten Hothorn Alexander Bockmayr Jan Freudenberg Richard Hughey Richard Bonneau Toni Gabaldon Martijn Huynen Erich Bornberg-Bauer John S. Garavelli Johannes Jaeger Sebastian Böcker Mikhail Gelfand Inge Jonassen Sylvain Brohée Robert Giegerich Igor Jurisica Christine Brun Mark Girolami Daniel Kahn Philipp Bucher Frank Oliver Gloeckner Simon Kasif Jeremy Buhler Richard Goldstein Kazutaka Katoh Joachim Buhmann Didier Gonze Paul Kersey 8

ECCB10 Organization

Daisuke Kihara Jong Park Ingolf Sommer Philip Kim Helen Parkinson Christopher Southan Oliver King John Parkinson Rainer Spang Katja Kivinen Itsik Pe’er John Spouge Stefan Klamt Anders Gorm Pedersen Peter Stadler Steven H. Kleinstein Matteo Pellegrini Jörg Stelling Ina Koch Bengt Persson Peter Sterk Rachel Kolodny Mihaela Pertea Josh Stuart Peter Konings Graziano Pesole Shamil Sunyaev Martin Krallinger Gianluca Pollastri Hakim Tafer Michael Krauthammer Mihai Pop Murat Tasan Martin Kuiper Jiang Qian Chris Taylor Stefan Kurtz Jörg Rahnenführer Weidong Tian Roman Laskowski Domenico Raimondo Silvio Tosatto Sonia Leach Ben Raphael Anna Tramontano Florian Leitner Mathias Rarey Achim Tresch Hans-Peter Lenhof Magnus Rattray Sophia Tsoka Christina Leslie Gunnar Rätsch Alfonso Valencia Olivier Lichtarge Dietrich Rebholz-Schumann Jacques van Helden Michal Linial Chandan Reddy Vera van Noort Ole Lund Knut Reinert Wessel van Wieringen Florian Markowetz Marylyn Ritchie Yves Van de Peer David Martin Stéphane Robin Klaas Vandepoele Aurélien Mazurie David Rocke Roel Verhaak Alice McHardy Michal Rosen-Zvi Jean-Philippe Vert Yves Moreau Volker Roth Allegra Via Pierre Rouzé Albert Vilella Norman Morrison Patrick Ruch Jaak Vilo Vincent Moulton Marie-France Sagot Martin Vingron Arcady Mushegian Jasmin Saric Gert Vriend Luay Nakhleh Michael Schroeder San Ming Wang Goran Nenadic Ora Schueler-Furman Bertram Weiss See-Kiong Ng Stefan Schuster Andreas Wilm Cedric Notredame Russell Schwartz Guillaume Obozinski Benno Schwikowski Haim Wolfson Uwe Ohler Joachim Selbig Yu Xia Ron Ophir Roded Sharan Zohar Yakhini Sandra Orchard Hagit Shatkay Ralf Zimmer Martin Oti Istvan Simon Andrei Zinovyev Christos Ouzounis Saurabh Sinha Kimmo Palin Kimmen Sjolander

9

ECCB10 Organization

Program Committee – Co-reviewers Mohamed Abouelhoda Siederdissen Nan Qiao Aare Abroi Katharina Hoff Stephane Rombauts Pelin Akan Steve Hoffmann Ovidiu Radulescu Sandro Andreotti Franziska Hufsky Fidel Ramirez Inka Appel Rene Hussong Franck Rapaport Aaron Arvey Bo Jiang Florian Rasche Eric Bonnet Marc Johannes Elisabeth Remy Michael Beckstette Klaus Jung Anna Ritz Christian Bender Crystal Kahn Christian Rohr Stephan Bernhart Chen Keasar Patrick Ruch Chris Bielow Andreas Keller Susanna Röblitz Enrique Blanco Ilona Kifer Martin Schwarick Regina Bohnert Hisanori Kiryu Michael Sammeth Daniel Bolser Renzo Kottmann Kerstin Scheubert Anne-Laure Boulesteix Frank Kramer Ruben Schilling Quang Bao Anh Bui Roland Krause Roland Schwarz Peter Butzhammer Sita Lange Donny Soh Rainer Böckmann Glenn Lawyer Artem Sokolov Liran Carmel Andreas Leha Eric Solis Robert Castelo Limor Leibovich Oliver Stegle Yi-Chien Chang Xiaoli Li Israel Steinfeld Pao-Yang Chen Steve Lianoglou Christine Steinhoff Manfred Claassen Yao-Cheng Lin Gautier Stoll Diego Di Bernardo Fei Liu Eric Tannier Francisco Dominguez Ronny Lorenz Léon-Charles Tranchevent Anne-Katrin Emde Claudio Lottaz Anke Truss Florian Erhard Claus Lundegaard George Tsatsaronis Gaston Fiore Fälth Maria Charalampos Tsourakakis Xijin Ge Alberto J. Martin Koji Tsuda Jan Gertheiss Daniel Maticzka Andrei Turinsky Ivan Gesteira Andrey Mironov Fabio Vandin Costa Filho Michael Molla Ian Walsh Giorgio Gonnella Simon Moxon Martin Welk Assaf Gottlieb Arcady Mushegian Christian Widmer Philip Groth Mathias Möhl Sebastian Will Andreas Gruber Liat Perlman Rainer Winnenburg Mostafa Herajy Thomas N Petersen Shanshan Zhu Silvia Heyde Conrad Plake Elena Zotenko Christian Hoener zu Jaime Prilusky

10

ECCB10 Sponsors and Exhibitors

11

ECCB10 Sponsors and Exhibitors

Platinum Sponsor IBM http://www.ibm.com/be/en/

Fellowships ISCB http://www.iscb.org/

Gold Sponsors NBIC http://www.nbic.nl/

Silver Sponsors Convey Computer http://www.conveycomputer.com/ myGrid http://www.mygrid.org.uk/ CLC bio A/S http://www.clcbio.com/ ENFIN http://www.enfin.org/page.php EBI http://www.ebi.ac.uk/ ELIXIR http://www.elixir-europe.org/page.php Illumina http://www.illumina.com/ Omixon http://www.omixon.com/omixon/

12

ECCB10 Sponsors and Exhibitors

Other Sponsors FNRS http://www2.frs-fnrs.be/ FWO http://www.fwo.be/ Keygene http://www.keygene.com/home/index.php EMBO http://www.embo.org/ Cancer Research http://cancerres.aacrjournals.org/ BioMedCentral http://www.biomedcentral.com/ Tibco http://www.tibco.com Microsoft http://www.microsoft.com BioSCENTer http://www.kuleuven.be/bioscenter/ K.U.Leuven http://www.kuleuven.be/ belspo IAP http://www.belspo.com BioMAGNet http://www.kuleuven.be/biomagnet

Cambridge University Press Cambridge University Press advances learning, knowledge and research worldwide. It is an integral part of the and for centuries has extended its research and teaching activities by making available worldwide through its printing and publishing a remarkable range of academic and educational books, journals, and examination papers. For millions of people around the globe, the publications of the Press represent their only real link with the University of Cambridge. http://www.cambridge.org

Oxford University Press publishes some of the most prestigious books and journals in the world, including Bioinformatics , an official journal of the International Society for Computational Biology, Nucleic Acids Research , and our new fully open access journal, Database: The Journal of Biological Databases and Curation . Visit our stand to browse books and pick up journal sample copies. Visit us online: www.oxfordjournals.org

Chapman & Hall/CRC - Taylor & Francis group publishes books, journals and electronic databases. Check out new titles in our mathematical and computational biology series (including Golan Yona’s “Introduction to Computational Proteomics”) and key titles from our Garland list, including Understanding Bioinformatics and Molecular Biology of the Cell. Take advantage of 20% (50% off selected titles) convention discounts.

Applied Maths develops cutting-edge software for the biosciences. The software BioNumerics® is a software suite for integrated analysis and databasing of biological data in the broadest sense. When linked to Kodon®, a high level sequence analysis package or GeneMathsXT®, a highly sophisticated mathematical toolbox for the analysis of data from genechips and genearrays, it becomes a universal, powerful platform for bio-databasing and analysis. The software is licensed to more than 1.000 research sites worldwide and served for the preparation of >1.400 peer-reviewed papers.

The Springer Computer Science program serves research and academic communities around the globe. An impressive 6,000+ volumes published in the Lecture Notes in Computer Science (LNCS) contribute to 7,900+ eBooks, comprehensive reference works and 100+ journals in the Computer Science collection. Stop by the Springer booth to get acquainted with our multi-format publishing model: print – eBook - MyCopy.

Korilog is a bioinformatics company providing next generation of graphical softwares to integrate and explore pathways, genomes and proteomes data. The company provides academic and industry research labs with desktop and web-based softwares (koriblast.com; koriviewer.com) as well as expert software development services to support researchers in their R&D activities.

SciEngines provides innovative and efficient high-performance computers based on reconfigurable processors. For computational biology, data-center computing performance is achieved at a fraction of the cost - enabling new levels of analysis for modern science.

13

ECCB10 ISMB/ECCB 2011

14

ECCB10 Keynotes

KEYNOTES

Systemic analysis of biological systems: lessons from a small single bacterium and a large complex community Peer Bork

European Molecular Laboratory Lab, Heidelberg, Germany Peer Bork , Ph.D., is senior group leader and joint head of the Structural and Computational Biology unit at EMBL, a European research organization with headquarters in Heidelberg. He also holds an appointment at the Max-Delbrueck-Center for Molecular Medicine in Berlin. Dr. Bork received his Ph.D. in Biochemistry (1990) and his habilitation in Theoretical Biophysics (1995). He works in various areas of computational biology and systems analysis with focus on function prediction, comparative analysis and data integration. He has published more than 400 research articles in international, peer-reviewed journals, among them more than 45 in Nature, Science and Cell. According to ISI (analyzing the last 10 years), Dr. Bork is currently the most cited European researcher in Molecular Biology and Genetics. He is on the editorial board of a number of journals including Science and PloS Biology, and functions as senior editor of the journal Molecular Systems Biology. Dr. Bork co-founded four biotech companies, two of which went public. More than 25 of his former associates now hold professorships or other group leader positions in prominent institutions all over the world. He received the "Nature award for creative mentoring" for his achievements in nurturing young scientists and was the recipient of the prestigious "Royal Society and Academie des Sciences Microsoft Award" for the advancement of science using computational methods.

Modeling , function, and interactions Michael J. E. Sternberg

Imperial College, London, UK Professor Michael Sternberg is the Director of the Centre for Integrative Systems Biology at Imperial College (CISBIC) and Centre for Bioinformatics (CfB) at and he holds the Chair of Structural Bioinformatics at Imperial. Michael Sternberg’s research interest are protein bioinformatics and the development of logic-based chemoinformatics. His group have developed the server for protein structure prediction, 3D-Garden for protein-protein docking, and Confunc/ 3DLigand Site for protein function prediction. Recent work has developed methods to analyse the interactome and pathways. The chemoinformatics modelling employs a logic-based approach and is able to learn rules relating structure to activity and then use these rules to identify novel active molecules.

Analysis of whole-genome sequencing data from paired tumor and normal genomes Elaine Mardis

Washington University Genome Center, St-Louis, MO, USA Dr. Elaine Mardis graduated Phi Beta Kappa from the University of Oklahoma with a B.S. degree in zoology. She then went on to complete her Ph.D. in Chemistry and Biochemistry in 1989, also at Oklahoma. Following graduation, Dr. Mardis was a senior research scientist for four years at BioRad Laboratories in Hercules, CA. In 1993, Dr. Mardis joined The Genome Center at Washington University School of Medicine. As Director of Technology Development, she helped create methods and automation pipelines for sequencing the Human Genome. She currently orchestrates the Center’s efforts to explore next generation sequencing technologies and to transition them into production sequencing capabilities.

15

ECCB10 Keynotes

Dr. Mardis has research interests in the application of DNA sequencing to characterize cancer genomes. She also is interested in facilitating the translation of basic science discoveries about human disease into the clinical setting. Dr. Mardis serves on several NIH study sections, is an editorial board member of Genome Research, and acts as a reviewer for Nature and Genome Research. She serves as chair of the Basic and Translational Sciences for the American College of Surgeons Oncology Group. Dr. Mardis recently received the Scripps Translational Research award for her work on cancer genomics.

Computing the flows of life Hans Westerhoff

University of Manchester, UK and VU University of Amsterdam, The Netherlands Hans Westerhoff is Professor of Systems Biology at Manchester University and also Professor of Microbial Physiology (Free University Amsterdam, VUA) and Professor of Mathematical Biochemistry (University of Amsterdam, UvA) at the BioCentrum Amsterdam. He heads a transnational research group on Systems Biology which spans the Manchester Centre for Integrative Systems Biology (MCISB) in the Manchester Interdisciplinary BioCentre (MIB) and the BioCentrum Amsterdam (see also http://www.bio.vu.nl/hwconf). His research interest focuses on how the interactions of macromolecules can lead to biological functioning, and integrates quantitative experimentation with mathematical analyses.

The evolutionary significance of ancient whole genome duplications and computational approaches to unveiling them Yves Van de Peer

University of Ghent and Flemish Institute of Biotechnology, Belgium Yves Van de Peer is professor in Bioinformatics and Genome biology in the Department of Plant Systems Biology at Ghent University, Belgium. He is leading a bioinformatics group of about 35 people. Yves Van de Peer has published about 250 papers in peer-reviewed journals and his group is a center of excellence in the fields of gene prediction and genome annotation, comparative genomics, and (top-down) systems biology.

16

ECCB10 Social Events

SCHEDULE

Sunday, September 26 – Workshops and Tutorials

8:30 AM Workshop and tutorial registration & welcome coffee

9:00 AM Tutorial 1 Tutorial 2 Tutorial 3 Tutorial 4 Workshop 1 Workshop 2 Workshop 3 (Vd Goes) (Window 9) (Window 16) (Window5) (Window 6) AIMM ESCS1 Working Use of semantic Current Protein Learning from (Window 4) (Bauwens) with next- web resources methods and structure perturbation Annotation, Student generation in applications for validation effects interpretation symposium sequencing computational regulatory and data biology and sequence management bioinformatics analysis of mutations

10:30 AM Coffee break

11:00 AM Tutorial 1 Tutorial 2 Tutorial 3 Tutorial 4 Workshop 1 Workshop 2 Workshop 3 AIMM ESCS1

12:30 PM Lunch

1:30 PM Tutorial 1 Tutorial 2 Tutorial 3 Tutorial 4 Workshop 1 Workshop 2 Workshop 3 AIMM ESCS1

3:00 PM Coffee break

3:30 PM Tutorial 1 Tutorial 2 Tutorial 3 Tutorial 4 Workshop 1 Workshop 2 Workshop 3 AIMM ESCS1

5 PM End of workshops and tutorials

17

ECCB10 Social Events

Sunday, September 26 – Conference Opening

4:00 PM Conference registration opens

5:30 PM Conference opening and welcome

5:40 PM Keynote 1 - Peer Bork. Systemic analysis of biological systems: lessons from a small single bacterium and a large complex community

6:30 PM PT1 - Utopia documents: linking scholarly literature with research data. Teresa K. Attwood, Douglas B. Kell, Philip McDermott, James Marsh, Steve Pettifer and David Thorne

6:50 PM PT2 - Integrating genome assemblies with MAIA. Jurgen Nijkamp, Wynand Winterbach, Marcel van den Broek, Jean-Marc Daran, Marcel Reinders and Dick de Ridder

7:10 PM PT3 - A graphical method for reducing and relating models in systems biology. Steven Gay, Sylvain Soliman and François Fages

7:30 PM Walk to the reception venue

8:00 PM Welcome reception

9:30 PM End of welcome reception

18

ECCB10 Social Events

Monday, September 27

8:30 AM Registration and welcome breakfast

9:00 AM Keynote 2 - Michael J. E. Sternberg. Modeling protein structure, function, and interactions

PROCEEDINGS TRACK (Auditorium) TECH TRACK (Vd Goes)

9:50 AM PT4 - Vorescore - Fold recognition improved by rescoring of protein TT1 - Software for the data- structure models. Gergely Csaba and Ralf Zimmer driven researcher of the future . Paul Fisher et al. 10:10 AM PT5 - Solenoid and non-solenoid protein recognition using stationary wavelet packet transform. An Vo, Nha Nguyen and Heng Huang

TT2 - Clinical genomic 10:30 AM PT6 - Discriminatory power of RNA family models. Christian Hoener analysis at IBM: from HIV zu Siederdissen and Ivo L. Hofacker positive to hypertension .

Michael Rosen-Zvi et al. 10:50 AM PT7 - RactIP: fast and accurate prediction of RNA-RNA interaction using integer programming. Yuki Kato, Kengo Sato, Michiaki Hamada, Yoshihide Watanabe, Kiyoshi Asai and Tatsuya Akutsu

11:10 AM Coffee break

PROCEEDINGS TRACK (Auditorium) TECH TRACK (Vd Goes)

11:40 AM PT8 - Semi-supervised multi-task learning for predicting interactions TT3 - The Microsoft Biology between HIV-1 and human proteins. Yanjun Qi, Oznur Tastan, Jaime Foundation . G. Carbonell, Judith Klein-Seetharaman and Jason Weston Simon Mercer et al.

12:00 PM PT9 - Candidate gene prioritization based on spatially mapped gene expression: an application to XLMR. Rosario M. Piro, Ivan Molineris, Ugo Ala, Paolo Provero and Ferdinando Di Cunto

12:20 PM PT10 - Prediction of a gene regulatory network linked to prostate TT4 - Study capturing: from cancer from gene expression, microRNA and clinical data . Eric research question to sample Bonnet, Tom Michoel and Yves Van de Peer annotation. Kees van Bochove et al. 12:40 PM PT11 - Inferring cancer subnetwork markers using density- constrained biclustering. Phuong Dao, Recep Colak, Raheleh Salari, Flavia Moser, Elai Davicioni, Alexander Schoenhuth and Martin Ester

19

ECCB10 Social Events

1:00 PM Lunch

PROCEEDINGS TRACK (Auditorium) TECH TRACK (Vd Goes)

2:15 PM PT12 - Modeling associations between genetic markers using Bayesian TT5 - NextGen biology with networks. Edwin Villanueva and Carlos Dias Maciel TIBCO Spotfire. Christof Gaenzler et al. 2:35 PM PT13 - Mass spectrometry data processing using zero-crossing lines in multi-scale of Gaussian derivative wavelet. Nha Nguyen, Heng Huang, Soontorn Oraintara and An Vo

2:55 PM PT14 - Detecting host factors involved in virus infection by observing TT6 - Partitioning biological the clustering of infected cells in siRNA screening images. Apichat data with transitivity Suratanee, Ilka Wörz, Petr Matula, Anil Kumar, Lars Kaderali, Karl Rohr, clustering. Ralf Bartenschlager, Roland Eils and Rainer König Jan Baumbach et al.

3:15 PM Announcement of ISMB/ECCB 2011 and ECCB 2012

3:30 PM Coffee break

4:00 PM Keynote 3 – Elaine Mardis . Analysis of whole-genome sequencing data from paired tumor and normal genomes

PROCEEDINGS TRACK (Auditorium) TECH TRACK (Vd Goes)

4:50 PM PT15 - Characteristics of 454 pyrosequencing data – enabling realistic ART MEETS SCIENCE simulation with Flowsim. Susanne Balzer, Ketil Malde, Anders Lanzén, Animesh Sharma and Inge Jonassen The Cosmopolitan Chicken Research Project. 5:10 PM PT16 - A Fast for exact sequence search in biological Koen Vanmechelen and Jean- sequences using polyphase decomposition. Abhilash Srikantha, Ajit Jacques Cassiman. Bopardikar, Kalyan Kaipa, Venkataraman Parthasarathy, KyuSang Lee, TaeJin Ahn and Rangavittal Narayanan

5:30 PM PT17 - Classification of ncRNAs using position and size information in deep sequencing data. Florian Erhard and Ralf Zimmer

5:50 PM Poster session (ODD numbers) & drinks

8:00 PM Visit city brewery Gruut and evening stroll through Ghent

20

ECCB10 Social Events

Tuesday, September 28

8:30 AM Registration and coffee

PROCEEDINGS TRACK (Auditorium) TECH TRACK (Vd Goes)

9:50 AM PT18 - Prototypes of elementary functional loops unravel evolutionary connections between protein functions. Alexander Goncearenco and Igor Berezovsky

10:10 AM PT19 - Improved sequence-based prediction of disordered regions TT7 - CLC bio, a with multilayer fusion of multiple information sources. Marcin J. comprehensive platform for Mizianty, Wojciech Stach, Ke Chen, Kanaka Durga Kedarisetti, Fatemeh NGS data analysis. Miri Disfani and Lukasz Kurgan Roald Forsberg

10:30 AM PT20 - A predictor for toxin-like proteins exposes cell modulator candidates within viral genomes. Guy Naamati, Manor Askenazi and Michal Linial

10:50 AM Coffee break

PROCEEDINGS TRACK (Auditorium) TECH TRACK (Vd Goes)

11:20 AM PT21 - Discriminative and informative features for biomolecular text TT8 - DNA sequencing with mining with ensemble feature selection. Sofie Van Landeghem, Thomas Illumina instruments and Abeel, Yvan Saeys and Yves Van de Peer chemistry: current and future applications. 11:40 AM PT22 - BioXSD: the common data-exchange format for everyday Dirk Evers bioinformatics web services. Matúš Kalaš, Pål Puntervoll, Alexandre Joseph, Edita Bartaševi čiūt÷, Armin Töpfer, Prabakar Venkataraman, Steve Pettifer, Jan Christian Bryne, Jon Ison, Christophe Blanchet, Kristoffer Rapacki and Inge Jonassen

12 PM PT23 - Improving disease gene prioritization using semantic TT9 - Super-scale sequence similarity of Gene Ontology terms. Andreas Schlicker, Thomas data analysis with hybrid core Lengauer and Mario Albrecht computing. John Leidel

12:20 PM PT24 - Discovering drug-drug interactions: a text-mining and reasoning approach based on properties of drug metabolism. Luis Tari, Saadat Anwar, Shanshan Liang, James Cai and Chitta Baral

21

ECCB10 Social Events

12:40 PM Lunch

PROCEEDINGS TRACK (Auditorium) TECH TRACK (Vd Goes)

2:10 PM PT25 - Nonlinear dimension reduction and clustering by Minimum TT10 - How robust are NGS Curvilinearity unfold neuropathic pain and tissue embryological whole-genome assemblies? A classes. Carlo Vittorio Cannistraci, Timothy Ravasi . Franco Maria case study with plant Montevecchi, Trey Ideker and Massimo Alessio genomes. Niina Haiminen, Laxmi Parida 2:30 PM PT26 - A varying threshold method for ChIP peak-calling using multiple sources of information. KuanBei Chen and Yu Zhang

2:50 PM PT27 - is-rSNP: a novel technique for in silico regulatory SNP detection. Geoff Macintyre, James Bailey, Izhak Haviv and Adam Kowalczyk

3:10 PM Poster session (EVEN numbers) and coffee

5:10 PM Bus departures and ride to Bruges. Free time in Bruges

8 PM Banquet dinner

10:30 PM Bruges to Ghent. Last bus: 12 AM

22

ECCB10 Social Events

Wednesday, September 29

8:30 AM Registration and coffee

9 AM Keynote 4 – Hans Westerhoff. Computing the flow of life

PROCEEDINGS TRACK (Auditorium) TECH TRACK (Vd Goes)

9:50 AM PT28 - Efficient parameter search for qualitative models of TT11 - Data integration in regulatory networks using symbolic model checking. Gregory Batt, proteomics through EnVsion Michel Page, Irene Cantone, Gregor Goessler, Pedro Monteiro and and EnCore webservices. Hidde de Jong Rafael Jimenez et al.

10:10 AM PT29 - Bayesian experts in exploring reaction kinetics of transcription circuits. Ryo Yoshida, Masaya Saito, Hiromichi Nagao and Tomoyuki Higuchi

10:30 AM PT30 - Dynamic deterministic effects propagation networks: TT12 - ELIXIR: a sustainable learning signalling pathways from longitudinal protein array data. European infrastructure for Christian Bender, Frauke Henjes, Holger Fröhlich, Stefan Wiemann, biological information. Ulrike Korf and Tim Beißbarth Andrew Lyall

10:50 AM PT31 - A novel approach for determining environment-specific protein costs: the case of Arabidopsis thaliana. Max Sajitz-Hermstein and Zoran Nikoloski

11:10 AM Coffee break

PROCEEDINGS TRACK (Auditorium) TECH TRACK (Vd Goes)

11:40 AM PT32 - De-correlating expression in gene-set analysis. Dougu Nam TT13 - The Protein Data Bank in Europe (PDBe) - 12 PM PT33 - How threshold behaviour affects the use of subgraphs for bringing structure to biology. network comparison. Tiago Rito, Zi Wang, Charlotte Deane and Gesine Wim Vranken et al. Reinert

12:20 PM PT34 - Discovering graphical Granger causality using the truncating TT14 - The Universal Protein lasso penalty. Ali Shojaie and George Michailidis Resource (UniProt). Maria J. Martin

12:40 PM Lunch

23

ECCB10 Social Events

2:00 PM PT35 - Parsimony and likelihood reconstruction of human segmental TT15 - Accurate next gen duplications. Crystal Kahn, Benjamin Raphael and Borislav Hristov sequencing data analysis on cloud computing. 2:20 PM PT36 - Maximum likelihood estimation of locus-specific mutation rates Attila Bérces et al. in Y-chromosome short tandem repeats. Osnat Ravid-amir and Saharon Rosset

2:40 PM Coffee break

3:10 PM Keynote 5 – Yves Van de Peer. The evolutionary significance of ancient whole genome duplications and computational approaches to unveiling them

4:00 PM Closing remarks and awards

4:30 PM END OF CONFERENCE

24

ECCB10 Social Events

SOCIAL EVENTS

Sunday The opening reception in the historic Bijloke abbey starts at 8 PM. The Bijloke, originally donated by Countess Johanna van Constantinopel, is home to the Bijloke hospital and the Bijloke abbey. The hospital, founded in 1228 A.D., is a magnificent example of medieval hospital architecture. The unique wooden roof of the first patient hall is constructed from a complete oak forest coming from the Ardennes. The second patient hall, the Craeckhuys, built in the 16th century, is a unique example of 16th century hospital architecture. These days the site serves as a music centre for the city of Ghent. There will be live music by Swingalicious, a young band of 5 swinging musicians who studied at the Conservatory of Ghent, brought together by singer Katrien Van Opstal. They will play a fine set of standards, mostly jazz. Known or unknown pieces, swing or bossa, they will make sure that you learn one thing: jazz is fun!

Monday

Brewery Gruut City brewery Gruut is located in the very heart of Ghent, Belgium. In the Middle Ages the river Lys divided the city of Ghent into two parts. On the right bank the brewers used typical beer ingredients like hops. The brewers on the left bank, which was ruled by the French, added a mixture of herbs or Gruut to their beers. Landlords collected a tax from the brewers based on the amount of spice (“Gruuts”) used in the beer. Our brewery combines present day brewing techniques with centuries old tradition. The beer is unique and especially healthy because of the use of special spices instead of hop. We will visit the brewery (and taste the beer!) on Monday, leaving the conference site at 8 PM. There will be a limited number of free tickets available at the registration desk on Monday morning.

Evening Stroll through Ghent For those who want to explore the historic center of Ghent and try some of the local beverages, we offer a scenic walk through the city center along many of Ghent’s impressive medieval buildings. Teams can scout through the city center, assembling bits of a map to guide them to the next location. In typical Belgian pubs along the route, participants have the opportunity to sample some exclusive Belgian beers. Register for free on Monday.

Tuesday

Conference Dinner The banquet dinner will take place in the magnificent city of Bruges. Only a short bus ride away from Ghent, the "Venice of the North" is a prominent World Heritage Site of UNESCO. There will be time for a stroll through the city before the reception and evening dinner. Busses will leave the conference site at 5:10 PM. The banquet dinner starts at 8 PM in Oud Sint Jan, a XIX th century former hospital. Last busses to Ghent will leave Bruges at midnight.

25

ECCB10 Workshops

WORKSHOPS

Workshop 1: Learning from perturbation effects Reverse engineering of biological networks is a key to the understanding of biological systems and to identifying new drug targets. Perturbation techniques, like target specific inhibitors and RNA interference, open tremendous possibilities to detect so far unknown interdependencies from (possibly multidimensional) effects. At the same time computational methods being able to derive network hypotheses from such complex data play a crucial role. This workshop aims to bring together computational scientists working with various approaches for this challenging task. The goal is to give an overview about different computational methods in the field and to strengthen and initiate new cooperations.

Organizer Prof. Dr. Holger Fröhlich, Bonn-Aachen International Center for IT (B-IT), Bonn, Germany e-mail: [email protected]. http://www.abi.bit.uni-bonn.de

Program

Session 1: Models for RNAi-screening data Niko Beerenwinkel (ETH Zurich (D-BSSE), Basel, Switzerland): Computational approaches to reconstruct signaling networks of pathogen entry from RNAi screens Mirko Birbaumer (ETH Zurich, Zurich, Switzerland): From vesicle features to cellular phenotypes: statistical clustering in image-based high-throughput RNAi screens Lars Kaderali (University of Heidelberg, Bioquant, Germany): Reconstructing signaling pathways with probabilistic boolean threshold networks

Coffee Break

Session 2: Models for static perturbation effects I Florian Markowetz (Cancer Research UK Cambridge Research Institute, Cambridge, UK): The end of the screen is the beginning of the experiment: pathway integration of hits in genome-wide RNAi screens Holger Fröhlich (University of Bonn, Bonn-Aachen International Center for IT (BIT), Germany): Demo session: nested effects models at work Charles Vaske (Princeton University, USA): Predicting and expanding biological pathways using a factor graph nested effects model

Lunch

Session 3: Models for static perturbation effects II Lan Zagar (University of Ljubljana, Slovenia): Inference of epistasis from yeast genome-scale genetic interaction map Achim Tresch (Gene Center Munich, Ludwig-Maximilians University Munich, Germany): Modeling combinatorial interventions in transcriptional networks Nicole Radde (University of Stuttgart, Germany): A statistical Bayesian framework for identification of biological networks from perturbation experiments Marloes Maathuis (ETH Zurich, Switzerland): Predicting perturbation effects in large-scale systems from observational data

26

ECCB10 Workshops

Coffee Break

Session 4: Models for dynamic perturbation effects Rainer Spang (University of Regensburg, Germany): Dynamic nested effects models Christian Bender (German Cancer Research Center, Heidelberg, Germany): Dynamic deterministic effects propagation networks Sven Nelander (Sahlgrenska-CMR/ Wallenberg Laboratory and Gothenburg Mathematical Modeling Center, Sweden): Human transcriptional networks revealed by endogenous perturbations in cancer tumors

Workshop 2: Annotation, Interpretation and Management of Mutations (AIMM) This workshop presents the state of the art in extraction and reuse of genotype-phenotype information. Annotation of mutations with their impact on phenotypic expression is crucial to understanding genetic mechanisms involved in phenotypic processes and ultimately in complex diseases. This knowledge is key to generating novel hypotheses. Despite the existence of literature and databases describing impacts of mutations, association studies fail to deliver linkage to phenotypes which is the most important contemporary research interest. Extraction of such information from scientific literature is a promising research field and existing solutions are ready to be deployed as services.

Organizers Christopher J.O. Baker Ph.D., Associate Professor / Innovatia Research Chair, Department of Computer Science and Applied Statistics, University of New Brunswick, Saint John, Canada. e-mail: [email protected] Dietrich Rebholz-Schuhmann MD, Ph.D., Research Group Leader, European Bioinformatics Institute, Welcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom. e-mail: [email protected] René Witte Dr.-Ing., Assistant Professor, Group Leader, Semantic Software Lab, Concordia University, Department of Computer Science and Software Engineering, Montreal, Canada. E-mail: [email protected]

Keynote Speakers Michael Schroeder (BIOTEC Technical University Dresden, Germany): Extraction and reuse of mutations and annotations from literature using ontologies Joost Schymkowitz (VIB Switch Laboratory, Vrije Universiteit Brussel, Belgium): A knowledgebase for phenotyping of human SNPs and disease mutations

Accepted Papers

Toward a richer representation of sequence variation in the sequence ontology Michael Bada and Karen Eilbeck

Deploying the mutation impact mining pipeline with SADI: an exploratory case study Alexandre Riazanov, Jonas Bergman Laurila and Christopher JO Baker

GenoS: segment-based representation of genomics data. Application to genotyping data management Jean-pierre Kocher, Hugues Sicotte, Yaxiong Lin and Eric Klee

A semantic assistant for mutation mentions in PubMed abstracts Jonas Bergman Laurila, Alexandre Kouznetsov, and Christopher JO Baker

Frankenstein Junior: a relational learning approach toward protein engineering Elisa Cilia and Andrea Passerini 27

ECCB10 Workshops

Improving the prediction of disease -related variants using protein three-dimensional structure Emidio Capriotti and Russ B Altman

Characterization of pathogenic germline mutations in human protein kinases Jose Maria Gonzalez-Izarzugaza Martinez, Lisa Hopcroft, Anja Baresic, , Andre w Martin and Alfonso Valencia

Workshop 3: ESCS1—European Student Council Symposium The European Student Council Symposium is a forum for students and young researchers in the fields of Computational Biology and Bioinformatics. Participants will have the op portunity to present their work to an international audience, build a network within the computational biology community and develop important soft skills in an environment that fosters exchange of ideas and knowledge.

Chairs Magali Michaut, ESCS1 chair (Terrence Donnelly CCBR, University of Toronto, Ontario, Canada) Thomas Abeel, ESCS1 co-chair (Department of Plant Systems Biology, VIB, Ghent University, Gent, Belgium and Broad Institute of MIT and Harvard, Cambridge, MA, USA)

Program 9:00 AM Ice breaker 9:30 AM Keynote “Generating hypotheses for molecular biology through (top -down) systems biology” Yves Van de Peer, VIB and University of Ghent, Belgium 10:30 AM Coffee break 11:00 AM Students: contributed presentations based on abstract submission 12:30 PM Lunch and poster session 1:30 PM Soft skills workshop “Presenting Science using the theatre approach” Gijs Meeusen 3:00 PM Coffee break 3:30 PM Soft skills workshop presentation skills continues 5:00 PM Posters

28

ECCB10 Tutorials

TUTORIALS

The purpose of the tutorial program is to provide participants with lectures and instruction covering topics relevant to the bioinformatics field. It offers participants an opportunity to learn about new areas of bioinformatics research, to get an introduction to important established topics, or to develop advanced skills in areas about which they are already familiar.

Tutorial 1: Working with next-generation sequencing data In recent years, there has been a revolution in the area of DNA sequencing with the arrival of next-generation sequencing technologies. The type and volume of the data produced by next-generation sequencing machines presents many previously unseen informatics challenges. This tutorial will help people who are getting started on next-generation sequencing get an idea of the tools, flows, and procedures that they may need to set up to handle this data. In this short course, we will introduce the participants to the different next-generation sequencing technologies, show how to do some basic quality checking of the data, how to run the various next- generation alignment tools, create de novo sequence assemblies, and call variants (such as SNPs, short indels, and structural variants) from a reference sequence. All slides and data from examples will be available online. Prerequisites for this tutorial are an interest in genome sequencing and basic UNIX skills.

Presenters Thomas Keane completed his Ph.D. degree in the area of distributed computing and high-throughput phylogenomics from NUI Maynooth (Ireland) in 2006. He subsequently moved to the Pathogen Genomics group at the Wellcome Trust Sanger Institute to work on sequence assembly of several pathogens such as Plasmodium falciparum strains, Trypanosoma brucei, and several other pathogen genomes. In 2008, he co- founded the Vertebrate Resequencing Informatics group and manages the sequencing, informatics, and variation pipelines for large projects such as the 1000 genomes and mouse genomes project. Jan Aerts received his Ph.D. at Wageningen University (Netherlands) in 2005 on the subject of chicken genome mapping and sequencing. After a post-doc position at the Roslin Institute near Edinburgh in Scotland - working on the cow genome assembly - he now works at the Wellcome Trust Sanger Institute near Cambridge (UK). His current work includes downstream analysis of next-generation sequencing data in order to identify putative SNPs and indels. He will start an assistant-professorship at the University of Leuven in October.

Schedule

Slot 1 - Thomas Overview of next-generation sequencing technologies Applications of next-generation sequencing Quality control measures and metrics of libraries/lanes Data storage (file formats) and metadata

Slot 2 - Thomas Introduction to short read alignment and tools Practical instructions and examples on use of short read aligners Parallelising short read alignment Introduction to sequence assembly methods and tools Practical instructions on use of short read assembly tools to get the optimal sequence assembly

Slot 3 - Jan Overview of variation calling from next-generation sequence data SNP calling theory and tools 29

ECCB10 Tutorials

Short indel calling theory and tools Practical examples of variation calling File formats for variation calling and storage

Slot 4 - Jan Introduction to structural variation Summary of different types of structural variants (large insertions, deletions, inversions, translocations, copy number variants) Overview of algorithms and tools for calling structural variants Practical examples of calling structural variants Visualisation of structural variants

Tutorial 2: Use of Semantic Web resources in computational biology and bioinformatics The Semantic Web is a set of technologies, or a framework, which is designed to make data integration possible via the web, with the addition of a precise semantic characterization of entities and relations (ontologies). As data integration is a pre-requisite for systems biology and translational research, the Semantic Web can bring relevant benefits in these areas. The aim of this tutorial is to briefly introduce the key basic principles needed to understand what it means to represent information on the Semantic Web, and then to provide the attendees with basic hands on competences to start using biomedical information resources which are now available on this framework. Prerequisites for this tutorial are knowledge of main biology databases; programming skills are a plus. People with laptops and more proficient in programming will have the chance to explore a little bit further the exercises (we will leave some slightly more challenging exercises for the after course), however the course will be designed to be followed as a frontal presentation, at times very interactive.

Presenters Paolo Romano , Ph.D., Bioengineer, is a Senior Scientist in bioinformatics at the National Cancer Research Institute of Genoa, Italy. His research interests include biomedical data management, network standards and tools, data integration, interoperability, ontologies, semantics methods and tools. He designed and contributed to the development of the Cell Line Database and its associated hypertext HyperCLDB, and he designed and developed Biowep, the Workflow Enactment Portal for Bioinformatics. Andrea Splendiani , Ph.D., is a Senior Scientist in data integration at Rothamsted Research (BBSRC), Harpenden, UK. He has previous experience in microarray databases design and standards (University of Milano-Bicocca and Genopolis consortium, Italy), in Systems Biology (Institut Pasteur, France) and in Medical Informatics (University of Rennes1 – now ISERM U936, France). His research interests include standardization of biological data and in particular pathway information, biological data integration and the development of interactive systems to access and analyse biological information.

Schedule

Morning session Introduction to basic principles of Semantic Web based representation of biological information (1h, theory). In this hour we will introduce the basics of RDF, and focus on the importance of URIs, shared relations and the implication of the open world assumption. We will invite the participants to think at RDF in terms of of a conceptual model, rather than XML, where is important to be precise and “what” we are referring to and on which predicates we use. We will show that the Semantic Web is easy. Introduction to relevant biomedical resources, including UniProt, Bioportal, Pathwaycommons (1h, theory). In this hour we introduce briefly a few main biomedical resources which are available on the Semantic Web. The objective is to contextualise what we have presented before to the biomedical domain, and to provide some resources which will be used in the following examples and exercises. At the same time, given the widespread use of the resources above, it is likely that this introduction will

30

ECCB10 Tutorials

enable participants to quickly adopt them via the Semantic Web in their daily work. (Note: this lesson can easily accommodate a break as distinct resources are presented, which don't require a continuous flow of attention). The remainder of the tutorial will introduce technologies through a very simple hands-on use case which will guide the participants through exporting data on the Semantic Web, integrating this data with other existing biomedical resources and querying the resulting integrated (distributed) knowledge-base. Introduction to the simple examples and explanation of how several technologies fit together (15') Introduction to the D2RQ relational mapping system (45'). We will show how to generate an automatic mapping between a relational database and RDF via the D2RQ tool. We will then show how to tune this mapping to better represent URIs and relations, and then we will show how to open a SPARQL endpoint via D2RQ.

Afternoon session Practical example on how to use D2RQ (1h). Participants with laptops or more proficient in programming will be provided an extended example and invited to practice hands on, while we will present a subset of this example in detail to the audience, making this section very interactive by inviting participants to propose how they would conceptually map their information to RDF (which will help them to refine their understanding of this language). Triplestores (30'). We will introduce other triplestores, which can complement D2RQ in a production environment. We will briefly introduce their pros and cons and provide tips on how to install and operate them. Introduction to the SPARQL query language (45'). How to query D2RQ or any triplestore providing a SPARQL endpoint: examples of the most common and useful SPARQL constructs. Practical example on performing queries with SPARQL. We will begin by presenting a list of queries of increasing complexity, which will involve the mapping realized in the previous example (via D2RQ) and the biomedical resources presented in the morning. We will explain how to express these queries in SPARQL, and we will show how the proper definition of URIs for entities and relations is the basis for data integration, which is now automatically realized at query time. This will refer to what we have presented in the introduction and the participants have practiced in the previous session.

Tutorial 3: Current methods and applications for regulatory sequence analysis The annotation of the non-coding genome with gene regulatory function is lagging far behind the annotation of protein-coding genes and improved annotation will depend both on deeper biological insight into cis-regulatory logic and on more efficient computational prediction algorithms. Recent data obtained by high-throughput experiments accelerate the genome-wide identification of regulatory elements but also provide additional bioinformatics challenges. In the light of these developments, this tutorial will focus on bioinformatics methods to predict cis-regulatory elements and to aid the process of regulatory annotation. Participants will be provided with an overview of existing resources (databases, tools) and methods for detecting cis-regulatory elements in genome sequences, and generate testable hypotheses about the binding specificity of transcription factors (motifs discovery), their precise binding locations (binding site prediction), and their target genes (target identification), and go towards regulatory networks. A list of software will be provided in advance, in order to allow participants to install it on their laptop. Some knowledge of the Unix interface is required.

Presenters Jacques van Helden , Ph.D., is heading the Laboratory of Genome and network Bioinformatics (BiGRe - ULB). The BiGRe laboratory is specialized in the development, evaluation and application of bioinformatics approaches to analyze genome regulation, biomolecular networks, metabolic pathways and mobile genetic elements. Since 1997, Jacques van Helden has been developing a suite of specialized software tools for the analysis of genome regulation, the Regulatory Sequence Analysis Tools (RSAT), which integrates a variety of algorithms for motif discovery, sequence scanning for motifs, phylogenetic footprinting, analysis of ChIP-seq data, etc.

31

ECCB10 Tutorials

Stein Aerts , Ph.D., is heading the Laboratory of Computational Biology at the University of Leuven. The lab focuses on computational identification of cis-regulatory elements, on mapping transcriptional networks, on "omics" data integration, and on next-generation sequencing (NGS) data analysis. Stein Aerts is the developer of the TOUCAN software for regulatory sequence analysis, of several other motif and module discovery algorithms and text-mining applications, of the ENDEAVOUR application for gene prioritization, and of the cisTargetX method for transcriptional target identification in Drosophila.

Schedule • Retrieving regulatory sequences from UCSC, EnsEMBL, RSAT, Toucan,.... Upstream sequences, introns, repeat masking, multi-genome sequence retrieval,... Web-server, Web service or command-line access to resources • From binding sites to binding motifs o Public databases about transcriptional regulation : motifs, sites and regulatory regions. JASPAR, PAZAR, ORegAnno, FlyReg o Regular expressions o Position-specific scoring matrices (PSSM) Concepts: counts, frequencies, pseudo-counts, weights, information content o PSSM formats and their inter-conversions • Pattern matching: predicting binding sites o String-based pattern matching o Matrix-based pattern matching  matching scores  estimation of the risks (P-value distribution) o Cis-regulatory modules (clusters of motifs)  homotypic  heterotypic • Pattern discovery: methods and applications o String-based pattern discovery o PSSM-based pattern discovery o PSSM enrichment detection o Combinations of PSSMs • Phylogenetic footprinting o detecting conserved binding sites by comparative genomics • Applications o Clusters of co-expressed genes o Motif discovery using high-throughput data (ChIP-seq, ...) o Genome-wide target prediction o Gene regulatory networks • Evaluation of prediction performances o Benchmark datasets o Strength and pitfalls of evaluation: sensitivity, positive predictive value, accuracy, ROC curves,... o Method comparisons • Browsing genomes for cis-regulatory elements o Visualization o Integration of predictions and annotations

Tutorial 4: Protein structure validation Recent advances in homology modeling and have made clear that the quality of protein structures is really important for good results. Today’s software and CPU time availability on clusters, super computers, and the grid allow for easy improvement of old files. It is therefore important for all protein structure bioinformaticians to be able to evaluate the quality of the structures used in their studies and to know when it might be beneficial to ask an NMR spectroscopist or X-ray crystallographer to help bring an old file up to

32

ECCB10 Tutorials today's standards. This workshop will teach the participants how to use a series of protein structure validation tools and how to interpret the results. The origins of problems are discussed as well as their importance for follow up studies. In silico experience with protein structures is required. We will assume that the participants of this workshop have spent at least a hundred hours studying protein structures with molecular visualisation software. Extensive knowledge of any particular visualisation program, however, is not required. Candidates who cannot bring their own portable computer are requested to contact the organisers as only a limited number of PCs is available.

Presenters Robbie Joosten recently completed his Ph.D. at the CMBI with as topic “Protein Structure Validation.” His PhD project consisted of two major topics. The first topic was the detection of errors in PDB files. The second topic was rerefinement of all X-ray PDB files. After obtaining his Ph.D., Robbie moved to the X-ray department of the Dutch Cancer Institute (the NKI in Amsterdam) where he continues with the further improvement of (all) X- ray PDB files. Jurgen Doreleijers got his Ph.D. in the group of Rob Kapteyn in Utrecht with as topic NMR protein structure quality. After his Ph.D. defense Jurgen moved to the BMRB in Wisconsin, USA. Here he worked on the remediation of the experimental NMR data that has been deposited with the NMR structure ensembles in the PDB. Jurgen recently moved to the CMBI where he works in the group of Geerten Vuister on the CING software that evaluates NMR structures. Hanka Venselaar is a Ph.D. student at the CMBI. Her research project (automation of the prediction of the phenotypic effects of protein point mutations). This project includes extensive studies of protein structures and often involves a multitude of protein structure quality aspects. Gert Vriend is one of the pioneers in the field of protein structure quality determination. His WHAT_CHECK software is for many years already the de facto standard in this field. He has published dozens of articles in this field, and he has been the Ph.D. supervisor of a series of Ph.D. students who worked on the structure validation of X-ray or NMR files.

Schedule

Seminars Gert Vriend - Introduction Robbie Joosten - X-ray structure evaluation Jurgen Doreleijers - NMR structure evaluation

Practicals Hanka Venselaar - Using the YASARA twinset for protein structure evaluation Gert Vriend - Reading WHAT_CHECK reports

Parallel sessions Jurgen Doreleijers - Using CING to evaluate protein structures Robbie Joosten - Errors in X-ray structures

33

ECCB10 Art Meets Science

ART MEETS SCIENCE

ECCB has a tradition of organizing ‘Art Meets Science’ events. This year’s event is a joint lecture by Koen Vanmechelen and Jean-Jacques Cassiman, an artist and a geneticist.

Belgian artist Koen Vanmechelen makes iconoclastic sculpture, paintings, glasswork and installations. His daring expeditions into contemporary science, philosophy and ethics have resulted into several internationally acclaimed projects. Best known is The Cosmopolitan Chicken Project (TCCP), a vast attempt to create and manipulate scores of chicken breeds from all over the world into a new species, a universal chicken or Superbastard. Four subprojects constitute the TCCP: Virtual crossing, Experimental crossing, The Walking Egg, and The Accident, Chronicles of The Cosmopolitan Chicken. Prof. Jean-Jacques Cassiman is a human geneticist, renowned for pioneering genetic testing and forensic medicine in Belgium and for genetic research on cystic fibrosis. From 1993 to 1999, he was secretary general of the European Society of Human Genetics (ESHG). In 1998, he demonstrated through DNA testing that Karl Wilhelm Naundorff was not a descendant of the Bourbons and certainly not Louis XVII. In 2004, he demonstrated the heart that had been kept in Paris belonged to Louis XVII. For several decades, he has been a key figure in communicating genetic research to the general public and the media in Flanders. Together they have initiated the Cosmopolitan Chicken Research Project , a genetic research project that ties art and science together by studying the genetic diversity of chicken breeds and Cosmopolitan Chicken hybrids.

Website http://www.koenvanmechelen.be

34

ECCB10 Technology Track

TECHNOLOGY TRACK

TT1—Software for the data-driven researcher of the future Alan Williams 2, Aleksandra Nenadic 2, Shoaib Sufi 2, Danius Michaelides 2, David Withers 2, Don Cruickshank 2, Franck Tanoh 2, Ian Dunlop 2, Jiten Bhagat 2, Katy Wolstencroft 2, Paolo Missier 2, Sergejs Aleksejevs 2, Stian Soiland-Reyes 2, Stuart Owen 2, Peter Li 2, Finn Bacell 2, Mannie Taggs 2, Rishi Ramgolam 2, Marco Roos 2, Eric Nzuobontane 2, Thomas Laurent 2, David De Roure 2, Robert Stevens 2, Steve Pettifer 2, Rodrigo Lopez 2,Carole Goble 2

The myGrid project have developed a suite of open source software tools for the registration and discovery of Life Science Web Services (BioCatalogue), and the designing, execution, and sharing of scientific workflows (Taverna and myExperiment) for analytics and data processing. The Taverna workflow workbench provides an environment in which scientists design and execute workflows, combining local and public distributed services and data resources into a single experimental protocol. The latest Taverna 2.2.0 release features: a command line tool for running workflows without the need to interact directly with a user-interface; access to the BioCatalogue of Life Science Web Services via a plug-in interface; and a Taverna Server for running workflows on a server as well as on the desktop. myExperiment is a social networking site that provides a collaborative environment where scientists can safely publish their workflows, experiment plans, and standard operating procedures (SOPs). This allows researchers to share them with individuals, groups, and even discover those of others that can be subsequently re-used. myExperiment makes it easy for the next generation of scientists to contribute to a pool of scientific methods, build communities and form relationships – reducing time-to-experiment, sharing expertise and avoiding reinvention. The BioCatalogue is a community driven registry of Life Science Web Services. It provides an open platform for Web Services registration, annotation and monitoring, with a comprehensive REST API. Moreover, BioCatalogue is a platform for Web Service providers to publish and advertise their services, providing a centralised and curated catalogue of Web Services, and to build a collaborative environment where the community can find, contact and meet the experts and maintainers of these services. Taverna, BioCatalogue and myExperiment together create an integrated solution for bioinformatics, data analysis and analytics. We show how these tools together address the challenges of large scale data processing that comes with Next Generation Sequencing.

URL http://www.mygrid.org.uk

Speaker Dr. Paul Fisher 1

Author affiliations 1University of Manchester 2University of Manchester, University of Southampton, , EMBL-EBI, University of Leiden

TT2—Clinical genomic analysis at IBM: from HIV positive to hypertension Ehud Aharoni 1, Hani Neuvirth 1, Noam Slonim 1, EuResist GEIE partners 2, Hypergenes partners 3

The domain of personalized health and genome-based therapy has flourished in recent years due to the significant reduction of information storage devices, the reduction in care delivery costs, and improved clinical outcomes. Building on IBM’s leadership in areas like systems integrations, cloud computing, massive scale

35

ECCB10 Technology Track analytics and even emerging areas of science like nanomedicine, we focus on creating technologies and processes that will build an evidence-centric healthcare ecosystem. IBM has defined four main areas of research: evidence generation, which uses scientific methods to turn raw health data into proof of effective treatment methods; the ability to deliver evidence in a context-dependant and personalized way at the point of care; improving service quality through simplifying the complex healthcare delivery process; and incentives and models to shift the healthcare industry to a system that rewards based on outcomes and healthier patients rather than only treatment and volume of care. In recent years, IBM Research in Haifa, Israel has been involved in research related to evidence generation, with a focus on using the massive volumes of clinical and genomic data arriving from different hospitals to find new ways of optimizing patient treatment. Data mining and machine learning techniques were also applied to decipher the relationship between clinical status and genomic variations, with the goal of helping improve diagnostics and treatment. From 2006 to 2008, IBM Research and the EuResist GEIE consortium developed a drug-interaction modeling tool that lets users predict the success rate of various drug combinations and their impact on virus evolution via an online portal. A set of prediction engines that leverage medical data (for example, viral gene sequences, patient histories) from seven sources are available in the portal and recommend which therapies are expected to be most efficient given viral, genomic, and other clinical and demographic measures. This engine predicts patient response to therapy with 78% accuracy, outperforming other common tools. Since then more data has been contributed and the algorithms were updated based on the new information. In 2010, IBM and the GEIE partners updated the recommendation to include new drugs. IBM researchers are now working with the European HYPERGENES consortium to identify the genetic variations responsible for hypertension and associated organ damage. The team is working with clinical data accumulated from hypertensive and healthy people and genomic biomarkers provided by Illumina’s 1Million SNP chip. The aim is to create a comprehensive genetic-epidemiological model that takes into account how genomics and other factors help improve diagnostic accuracy and introduce new strategies for early detection, prevention, and therapy. This effort will help create more personalized treatment plans for individuals that suffer from hypertension.

URL http://engine.euresist.org/ and http://srv-rimon.haifa.il.ibm.com:8080/rimon_web/snpWeights.jsp

Speaker Dr. Michal Rosen-Zvi, manager 1

Author affiliations 1 machine learning and data mining group, IBM Haifa Research Lab 2 EuResist GEIE 3 Hypergenes

TT3—The Microsoft Biology Foundation Michael Zyskowski, Research Program Manager 1; Bob Davidson, Principal Software Architect 1

The aim of the Microsoft Biology Foundation (MBF) project is to produce a well-architected and comprehensively-documented library of common functionality related to bioinformatics and genomics, with the intention of making it easier to write life science applications on the Windows platform. Using C# and the .NET 4.0 framework provides additional levels of flexibility for the developer – over 70 .NET programming languages are compatible, from Visual Basic and Python to C++ and F#. It also leverages the power of .NET – over 15,000 pre-written functions - and takes advantage of .NET Parallel Extensions, a new feature which can parallelize algorithms across all cores and processors of the local machine. This demonstration will include a brief tour of the MBF library, including details of its free, open source, community-curated and community-owned philosophy and how scientists and developers can participate in future development. We will also demonstrate the flexibility and usability of the library through a range of applications, including a DNA sequence assembler using the Windows Presentation Foundation, an add-in for

36

ECCB10 Technology Track

Microsoft Excel integrating bioinformatics functionality directly with the spreadsheet, access to webservices including demonstration of the Microsoft cloud computing solution Azure, and integration with HPC and scientific workflows.

URL http://mbf.codeplex.com

Speaker Simon Mercer, Ph.D. Director of Health and Wellbeing 1

Author affiliations 1 Microsoft Research

TT4—Study capturing: from research question to sample annotation Machiel Jansen 2, Jeroen Wesbeek 1, Tjeerd Abma 3, Jahn-Takeshi Saito 4, Adem Bilican 4, Michael van Vliet 5, Prasad Gajula 6, Vincent Ludden 1, Robert Horlings 1, Siemen Sikkema 7, Margriet Hendriks 3, Chris Evelo 4, Ben van Ommen 1, Jildau Bouwman 1

The demonstration of the software tool will focus mainly on the part about capturing biological studies. During the demonstration, it will become clear how we achieve the following goals: • Facilitate standardization of the description of biological research studies • Leverage existing standards and ontologies and integrate them into one user interface • Cover complex study designs, such as double-blind crossover designs • Provide an overview of the different studies done in e.g. a consortium, to encourage cooperation and data exchange • Track samples that are used for omics assays back to the source, the circumstances under which and the subject from which they were taken • Provide the first cornerstone needed to do ‘multi-omics’ data analysis over multipe different studies and multiple omics platforms • Focus on user experience, through extensive testing with biologists The demonstration will have the following outline: • Walk the attendants through the process of adding a study to the study capture tool • Demonstration of the connection of samples in the study capture tool to omics data • Give an example of a query involving both study design data and omics data

URL http://www.dbnp.org

Speaker Kees van Bochove 1,2

Author affiliations 1 TNO Quality of Life, Zeist 2 NBIC BioAssist Engineering Team 3 UMCU Metabolomics Centre, UMC Utrecht 4 BiGCaT Department of Bioinformatics, Maastricht University 5 Leiden/Amsterdam Centre for Drug Research 6 Plant Research International, Wageningen University 7 Biosystems Data Analysis Group, Swammerdam Institute for Life Sciences, University of Amsterdam

TT5—NextGen biology with TIBCO Spotfire Prof. Peter van der Spek 2, Dr. Andreas Kremer 2

Spotfire helps you to integrate and analyse Microarray and other –omics data and align it with your LIMS and screening data. The complexity of such an analysis is rising with every new data format you are adding. The analyst needs powerful and flexible software tools to get results out of the combined data.

37

ECCB10 Technology Track

The TIBCO Spotfire software platform fulfils all important business needs of data analysis in modern bioinformatics. Every bioinformatician knows that collections of .CEL and .csv files are not sufficient for analyses as well as simple relational data bases are not. Even Data Warehouses and associated reporting tools are not sufficient to turn data into knowledge. Spotfire DecisionSite is well known for high quality ad-hoc analysis, but also this is not sufficient anymore. We will show how the new TIBCO Spotfire brings all pieces together, reduces the complexity of the analysis and is compliant with your company IT infrastructure. We will demonstrate how Spotfire and the bioinformatics department of the Erasmus Medical Center in Rotterdam built an interactive human brain atlas. This project shows the combination of genomics, proteomics and cytogenetic data with medical imaging. The aim is to identify genes associated with neurological symptoms and diseases (see ECCB2010 poster H13, A. Kremer).

URL http://spotfire.tibco.com

Speaker Dr. Christof Gaenzler, Senior Solution Consultant 1

Author Affiliations 1 TIBCO Spotfire, Germany 2 Erasmus Medical Center, Rotterdam

TT6—Partitioning biological data with Transitivity Clustering Tobias Wittkop 1, Sita Lange 2, Dorothea Emig 4, Sven Rahmann 3, Jan Baumbach 4,5

Partitioning biomedical data objects into groups, such that the objects in each group share common traits, is a long-standing challenge in computational biology. Here we present an integrated data clustering framework based on weighted transitive graph projection: Transitivity Clustering. We illustrate a typical, biomedical clustering task that starts with a list of amino acid sequences, investigates similarity functions and parameter estimation problems, and finally deals with an integrated result interpretation; all of which can be done easily with Transitivity Clustering, but with no other clustering software. We will further demonstrate how to use Transitivity Clustering to identify protein complexes in protein-protein interaction data. With Transitivity Clustering, we provide the user with easy-to-use interfaces that significantly ease and improve each step of the typical data clustering workflow: (1) A web interface for a quick analysis of medium-sized data sets, (2) a powerful stand-alone Java implementation for large-scale data clustering, and (3) a collection of Cytoscape plugins that also provide further methods to answer typical follow-up questions. Reference : Wittkop T, Emig D, Lange S, Morris JH, Boecker S, Rahmann S, Albrecht M, Stoye J, Baumbach J (2010) Partitioning biological data with Transitivity Clustering. Nature Methods. 2010 Jun;7(6):419-20.

URL http://transclust.cebitec.uni-bielefeld.de

Speaker Dr. Jan Baumbach 1,2,3,4,5

Author affiliations 1 Buck Institute for Age Research, USA 2 Albert-Ludwigs-University, Freiburg, Germany 3 Technische Universität Dortmund, Germany 4 Max Planck Institute for Informatics, Germany 5 International Computer Science Institute, University of California at Berkeley, USA

TT7—CLC bio, a comprehensive platform for NGS data analysis Not long ago high-throughput DNA sequencing (HTS) was reserved for a few dedicated genome centers that have dedicated and specialized bioinformatics staff and hardware for the analysis of sequencing data. But over the course of only a few years HTS has become broadly accessible to many non-specialized researchers and groups and the responsibility for analyzing very large HTS data sets may now rest upon researchers with a background in molecular biology or medicine and no formal computer training. For these

38

ECCB10 Technology Track groups the establishment of an appropriate software and hardware platform for bioinformatics analysis presents a very hard and often insurmountable challenge. Most bioinformatics software for analysis of HTS data exist as stand-alone command line tools that are inaccessible to non-specialists. This creates the need for a dedicated bioinformatician in the data analysis workflow to perform simple and tedious tasks such as simply establishing the analysis software infrastructure, executing commands and parsing textual or binary output. This is unfortunate since it makes the access to bioinformatics personnel a serious and poorly scalable bottleneck in the data analysis work flow and directs valuable bioinformatics brain resources away from creative and proactive tasks and towards tedious and non value creating routine tasks. Furthermore, it disempowers the biomedical researchers that requested the data and formulated the biological hypotheses by making them unable to directly inspect, handle and analyze the data themselves. Another serious bottleneck for data analysis is the access to appropriate hardware. The output of a single run of a sequencing instrument is now at around 50 giga bases and still increasing. This places very tough demands on the hardware used in the analysis regarding CPU power and accessible memory. Again, the recent broad dissemination of HTS data means that very large data sets are now accessible to research group that have insufficient financial or personnel resources to establish a dedicated and powerful computational infrastructure. In order to give these groups access to data analysis, software must be designed that can operate on and maximize the utility of standard, economic and omnipresent computer solutions. In organizations where powerful central computing facilities can be installed and dedicated to HTS data analysis these can only be accessed through technically challenging interfaces that are inaccessible to most employees and require specialist intervention and a drain on the bioinformatics personnel resources. To resolve these issues, we have developed and present the CLC bio integrated solution for analysis and data handling of high-throughput sequencing (HTS) data.

URL http://www.clcbio.com

Speaker Roald Forsberg, PhD 1

Author affiliations 1Director of Scientific Software Solutions

TT8—DNA sequencing with Illumina instruments and chemistry: current and future applications Klaus Maisinger 1, Anthony Cox 1, Lisa Murray 1, Come Raczy 1

The session will be separated into three parts, starting with the current range of Illumina sequencers and the latest performance figures achieved in the production sequencing facilities at Illumina Cambridge in Chesterford Research Park, UK. The data will serve as a basis to describe the Illumina software pipeline from RTA real time analysis on the instrument PC to CASAVA for mapping and genome variation detection via grid computing. This will be followed by a characterization of the current state of the art in high-throughput sequencing workflows. Starting from typical computing and storage volumes we will extrapolate application parameters for future research scenarios in de novo assembly, whole genome and whole exome re-sequencing as well as metagenomics. Finally, we will describe bioinformatics scientist positions currently open at our R&D site in Chesterford near Cambridge, UK.

39

ECCB10 Technology Track

Speaker Dirk J. Evers, Director Computational Biology 1

Author affiliations 1Illumina Inc.

TT9—Super-scale sequence data analysis with Hybrid Core computing It is clear that the latest generation of sequencing devices are a vast improvement over previous generations. However, the volume of data created by these new devices has significantly outpaced the ability to sufficiently analyze the data in a timely manner with traditional processors. As such, the only feasible methodology is the adoption of accelerator technologies. The current general purpose graphics processing methodologies provide interesting platforms for development, yet fail to deliver the sufficient order of speedup required to keep pace with the data explosion. Field programmable gate arrays (FPGAs) have typically performed quite well for bioinformatics applications. However, the difficulty in programming the soft cores and the difficulty in data management have historically failed to allow for widespread adoption. Convey’s Hybrid Core computing platform is an ideal platform for sequencing tasks in that it combines the traditional x86 economy of scale with an abstracted set of FPGAs for direct algorithmic synthesis coupled with a cache-coherent memory subsystem. The platform includes a significant set of compilers, tools and libraries such that the historical pain in programming FPGAs is fundamentally eliminated. In this way, Convey has the ability to implement very discretely managed algorithmic kernels (called personalities ), which serve as logic and instruction set extensions to the x86 environment. In doing so, the Convey platform allows users and applications the ability to significantly parallelize and outperform traditional Von Neumann architectures.

Speaker John Leidel 1

Author affiliations 1Software Architect, Convey Computer Corp

TT10—How robust are NGS whole-genome assemblies? A case study with plant genomes Laxmi Parida 1

A decade after the human genome was sequenced, a large section of the computational/bioinformatics community believes the general assembly problem to be “solved”, in spite of the recent changes in the underlying technologies. However, the “quality” of a draft genome of an organism continues to dog the community that studies the genomes closely with a fine-toothed comb for biological insights such as phenotypic associations with specific loci, strain-specific polymorphism detection and so on. As genome assemblers come of age and their numbers grow, is it possible to consolidate multiple assemblies so that the quality of the resulting draft is of a higher quality than the individual constituents? We are developing a system to investigate these questions and their related issues. One such burning issue is to understand, model and detect erroneously assembled segments. Clearly the misassemblies require more attention than gaps in the assembly (the former is the usual problem of we-don’t-know-what-we-don’t-know) and is also perhaps more amenable to computational/algorithmic remedies. We are focusing on two directions to study these: one is on intrinsic single assembly statistics and the other on comparing assemblies. Our work is motivated by the need for the generation of a high quality draft for Theobroma cacao, a plant with estimated 440 Mb genome size that produces cocoa beans, the basic ingredient in chocolate. The case study with the plant genomes originated within the USDA/Mars/IBM Consortium for sequencing the T. cacao genome. In this talk I will describe our assembly evaluation environment and present some preliminary results.

40

ECCB10 Technology Track

Speaker Niina Haiminen 1

Author affiliations 1 IBM T. J. Watson Research Center, Computational Genomics, Yorktown Heights, NY, USA

TT11—Data integration in proteomics through EnVsion and EnCore webservices Pascal Kahlem 1, Henning Hermjakob 1

This demo will introduce the EnCore infrastructure as collection of webservices to query proteomics data. We will describe how to use individual EnCore webservices and we will explain how to create workflow connecting different services. We will explain how the EnCore platform is technically structured and how it simplifies the way to use webservice using a common format named “EnXML”. We will list and briefly describe webservices available in EnCore. Among these services we will explore how to query molecular interactions, protein identifications, biological pathways, protein sequence information, biological models, protein localization and ontology distribution. We will learn how to use EnVision, a web graphical user interface to query EnCore webservices and display elaborate information from its results. We will see examples to query EnCore using EnVision and we will describe in detail the results obtained by EnVsion.

URL http://www.enfin.org

Speaker Rafael Jimenez 1

Author affiliations 1 European Bioinformatics Institute

TT12—ELIXIR: a sustainable European infrastructure for biological information Dr Andrew Lyall 1

ELIXIR has recently completed a two year, European wide consultation involving academic and industrial users, data providers and international collaborators, including three stakeholders meetings, two surveys, and fourteen work packages. ELIXIR will be a distributed infrastructure arranged as a hub and nodes, with the hub at the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK. The ELIXIR Steering Committee has recently issued a Request for Suggestions for ELIXIR nodes, seeking input from organisations that are interested in hosting one of its ‘nodes’, in order to help shape ELIXIRs construction. Organisations wishing to contribute their input have been requested to consider how they might contribute: data resources; bio-computing capacity; infrastructure for data integration; and services for the research community, including training and standards development. Responses from this round of stakeholder input will enable ELIXIR to: define the landscape of potential nodes; provide the coordination necessary to ensure interoperability; and identify any duplications and missing capabilities.

URL http://www.elixir-europe.org

Author affiliations 1 ELIXIR Project Manager, EMBL-EBI

41

ECCB10 Technology Track

TT13—The Protein Data Bank in Europe (PDBe) - bringing structure to biology Sameer Velankar 1, Gerard Kleywegt 1

In 2011, the Protein Data Bank (PDB) celebrates its 40th anniversary. To date, the PDB has essentially been a historic archive capturing structures (and the underpinning experimental data) of biomacromolecules published in the primary literature. As the use of structural data by non-experts becomes commonplace, the demands on the archive (by both these users and funding agencies) and the way it is made accessible will inevitably change. It is therefore necessary to transform sites that serve the archive into resources that are directly relevant for scientists who work in biomedicine and related disciplines, while simultaneously taking care not to alienate the communities that produce the structures. EMBL-EBI’s Protein Data Bank in Europe (PDBe), one of the founders of the Worldwide Protein Data Bank (wwPDB), is committed to becoming such a resource. The first step towards this goal is the recent make-over of our webpages. During the ECCB session, we will demonstrate how to use the new PDBe pages to navigate to the five areas (“ALIVE”) where we are traditionally strong: Advanced services, Ligands, Integration, Validation and Experimental data. We will also relate this to our core position within the European Bioinformatics Institute (EBI). During the session we will demonstrate:

• New front pages and functionality • Wizard for structure newbies • PDBprints (visual overview of PDB entry) • PDBeXplore (exploring biological data) • Easy navigation to the five areas (“ALIVE”) on which the PDBe focuses • Advanced services • SSM (Secondary Structure Matching) • Pisa (Quaternary Structure) • PDBeMotif (Motifs and Sites) • Ligands • PDBeChem (Ligand information) • Integration with other databases and resources • SIFTS project (in collaboration with UniProt) • Validation (implementing new tools and resources) • Experimental data • The Electron Density Server (which will be ported from Uppsala to PDBe • The Electron Microscopy Database (EMDB) • Nuclear Magnetic Resonance (NMR) data

URL www.pdbe.org

Speaker Dr Wim Vranken 1

Author affiliations 1 Protein Data Bank in Europe (PDBe), EMBL-EBI

TT14—The Universal Protein Resource (UniProt) The UniProt Consortium 1,2,3

UniProt is the central resource for storing and interconnecting information from large and disparate sources, and the most comprehensive catalog of protein sequence and functional annotation. UniProt is built upon the

42

ECCB10 Technology Track extensive bioinformatics infrastructure and scientific expertise at European Bioinformatics Institute (EBI), Protein Information Resource (PIR) and Swiss Institute of Bioinformatics (SIB). It has four components optimized for different uses. The UniProt Knowledgebase (UniProtKB) is an expertly curated database, a central access point for integrated protein information with cross-references to multiple sources. The UniProt Archive (UniParc) is a comprehensive sequence repository, reflecting the history of all protein sequences. UniProt Reference Clusters (UniRef) merge closely related sequences based on sequence identity to speed up searches. The UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository specifically developed for the expanding area of metagenomic and environmental data. Other developments include the ID mapping service which allows users to map between UniProtKB and more than 30 other data sources; and UniSave, a comprehensive history service of UniProtKB entries. The demonstration will cover: • A brief description of the UniProt databases. • Accessing UniProt using simple query syntax. The user will be presented with helpful suggestions and hints. • Exploration of sequence similarity searches, alignments and ID mapping tools provided. • Accessing UniProt data programmatically. This demonstration will also encourage user interaction and feedback.

URL http://www.uniprot.org/

Speaker Maria J. Martin, Ph.D.1

Author affiliations 1 Team Leader, UniProt (Development), EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK 2 Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven St. NW, Suite 1200 Washington, DC 20007, USA 3 Swiss Institute of Bioinformatics, Centre Medical Universitaire 1 rue Michel Servet 1211 Geneva 4, Switzerland.

TT15—Accurate next gen sequencing data analysis on cloud computing Miklós Cs őrös 2, Szilveszter Juhos 1

We present an internet based, automated, highly accurate genome variant analysis toolkit for next generation sequencing data called Omixon Variant Toolkit. This method is applicable for exome analysis or for analyzing highly variable genomes. Accurate, reliable and highly sensitive genome variant analysis based on next-generation sequencing short-read data poses a significant computational challenge. There is always a trade-off between computational speed and guaranteed high accuracy. (P. Flicek & E. Birney, S6 vol. 6 no. 11s 2009 Nature Methods) A small increase in the desired accuracy can exponentially increase computational demand. Since many genomics applications do not require high precision, the most popular methods are fast but less accurate especially for identifying small insertions and deletions. These variants are important for variable genomes, in cancer research or biomarker discovery. We developed an accurate and computationally efficient method implemented in the Omixon Variant Toolkit which we offer as an internet-based service. In this presentation we demonstrate how to use our automated analysis tool for genome variant discovery. We also demonstrate the accuracy of the method through comparing results obtained with the most popular and the best open source tools available. We show that the Omixon Variant Toolkit is able to map short reads in the genome in areas of high density of insertions, deletions and SNP and longer (10-12 bp) deletions. We carried out comparative analysis of 14 strains of a bacteria causing inflammatory disorder in humans including some pathogenic, non-pathogenic, drug-resistant, non-resistant strains. Our method was able to identify correctly variants and many of them were later confirmed by Sanger sequencing. We shall present additional example applications in human exome analysis. Additional benefits of our internet-based method is that users do not need expensive computational facilities 43

ECCB10 Technology Track since the calculation is carried out on a cloud computing platform, where the user pays only for the particular calculation. It is significantly more cost efficient than owning your own infrastructure. In addition, our software is very easy to use and guarantees high accuracy without having to know large number of different parameter settings. Users can control the maximum number of mismatches and indels and can choose from three levels of sensitivity: fast, sensitive and ultra sensitive. Accuracy is achieved by a spaced seeds mapping, followed by greedy extension and a precise statistical alignment based on a pair-hidden Markov model, combining DNA sequence evolution models and sequencing errors (from read quality files). The method was published in Csuros, Juhos, Berces “Fast Mapping and Precise Alignment of AB SOLiD Color Reads to Reference DNA” Springer Lecture Notes in Bioinformatics 6293:176- 188, 2010.

URL http://www.omixon.com

Speaker Attila Bérces 1

Author affiliations 1Omixon, Budapest, Hungary 2University of Montreal, Canada

44

ECCB10 Oral Presentations

ORAL PRESENTATIONS

Sequence Analysis, Alignment and Next Generation Sequencing

PT2—Integrating genome assemblies with MAIA Jurgen Nijkamp 1,2,3, ∗, Wynand Winterbach 1,4 , Marcel van den Broek 2,3 , Jean-Marc Daran 2,3 , Marcel Reinders 1,3,5 and Dick de Ridder 1,3,5 1The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Industrial Microbiology Group, Department of Biotechnology, Delft University of Technology, Julianalaan 67, 2628 BC Delft, 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, 4Network Architectures and Services, Department of Telecommunications, Delft University of Technology, Mekelweg 4, 2628 CD Delft and 5Netherlands Bioinformatics Center, 260 NBIC, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands Motivation : De novo assembly of a eukaryotic genome with next-generation sequencing data is still a challenging task. Over the past few years several assemblers have been developed, often suitable for one specific type of sequencing data. The number of known genomes is expanding rapidly, therefore it becomes possible to use multiple reference genomes for assembly projects. We introduce an assembly integrator that makes use of all available data, i.e. multiple de novo assemblies and mappings against multiple related genomes, by optimizing a weighted combination of criteria. Results : The developed algorithm was applied on the de novo sequencing of the Saccharomyces cerevisiae CEN.PK 113-7D strain. Using Solexa and 454 read data, two de novo and three comparative assemblies were constructed and subsequently integrated, yielding 29 contigs, covering more than 12 Mbp; a drastic improvement compared with the single assemblies. Availability : MAIA is available as a Matlab package and can be downloaded from http://bioinformatics.tudelft.nl Contact : [email protected]

PT15—Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim Susanne Balzer 1,2, ∗, Ketil Malde 1, ∗ , Anders Lanzén 3,4 , Animesh Sharma 2 and Inge Jonassen 2,3 1Institute of Marine Research, PO Box 1870, N-5817, 2Department of Informatics, University of Bergen, PO Box 7803, N- 5020, 3Computational Biology Unit, Bergen Center for Computational Science, Thormøhlensgate 55, N-5008 and 4Department of Biology, University of Bergen, PO Box 7803, N-5020, Bergen Motivation : The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Throughout the three available releases, average read lengths have increased to ∼500 base pairs and are thus approaching read lengths obtained from traditional Sanger sequencing. Study design of sequencing projects would benefit from being able to simulate experiments. Results : We explore 454 raw data to investigate its characteristics and derive empirical distributions for the flow values generated by pyrosequencing. Based on our findings, we implement Flowsim, a simulator that generates realistic pyrosequencing data files of arbitrary size from a given set of input DNA sequences. We finally use our simulator to examine the impact of sequence lengths on the results of concrete whole-genome assemblies, and we suggest its use in planning of sequencing projects, benchmarking of assembly methods and other fields. Availability : Flowsim is freely available under the General Public License from http://blog.malde.org/index.php/flowsim/ Contact : [email protected]; [email protected]

PT16—A fast algorithm for exact sequence search in biological sequences using polyphase decomposition Abhilash Srikantha 1, ∗, Ajit S. Bopardikar 1, ∗, Kalyan Kumar Kaipa 1, Parthasarathy Venkataraman 1, Kyusang Lee 2, TaeJin Ahn 2 and Rangavittal Narayanan 1 1Samsung Advanced Institute of Technology Lab, Bangalore, Karnataka, India and 2Genetic Analysis Group, Emerging Center, Samsung Advanced Institute of Technology, Suwon, South Korea 45

ECCB10 Oral Presentations

Motivation : Exact sequence search allows a user to search for a specific DNA subsequence in a larger DNA sequence or database. It serves as a vital block in many areas such as Pharmacogenetics, Phylogenetics and Personal Genomics. As sequencing of genomic data becomes increasingly affordable, the amount of sequence data that must be processed will also increase exponentially. In this context, fast sequence search algorithms will play an important role in exploiting the information contained in the newly sequenced data. Many existing algorithms do not scale up well for large sequences or databases because of their high-computational costs. This article describes an efficient algorithm for performing fast searches on large DNA sequences. It makes use of hash tables of Q-grams that are constructed after downsampling the database, to enable efficient search and memory use. Time complexity for pattern search is reduced using beam pruning techniques. Theoretical complexity calculations and performance figures are presented to indicate the potential of the proposed algorithm. Contact : [email protected]; [email protected]

PT17—Classification of ncRNAs using position and size information in deep sequencing data Florian Erhard ∗ and Ralf Zimmer Institut für Informatik, Ludwig-Maximilians-Universität München, Amalienstraße 17, 80333 München, Germany Motivation : Small non-coding RNAs (ncRNAs) play important roles in various cellular functions in all clades of life. With next-generation sequencing techniques, it has become possible to study ncRNAs in a high- throughput manner and by using specialized algorithms ncRNA classes such as miRNAs can be detected in deep sequencing data. Typically, such methods are targeted to a certain class of ncRNA. Many methods rely on RNA secondary structure prediction, which is not always accurate and not all ncRNA classes are characterized by a common secondary structure. Unbiased classification methods for ncRNAs could be important to improve accuracy and to detect new ncRNA classes in sequencing data. Results : Here, we present a scoring system called ALPS (alignment of pattern matrices score) that only uses primary information from a deep sequencing experiment, i.e. the relative positions and lengths of reads, to classify ncRNAs. ALPS makes no further assumptions, e.g. about common structural properties in the ncRNA class and is nevertheless able to identify ncRNA classes with high accuracy. Since ALPS is not designed to recognize a certain class of ncRNA, it can be used to detect novel ncRNA classes, as long as these unknown ncRNAs have a characteristic pattern of deep sequencing read lengths and positions. We evaluate our scoring system on publicly available deep sequencing data and show that it is able to classify known ncRNAs with high sensitivity and specificity. Availability : Calculated pattern matrices of the datasets hESC and EB are available at the project web site http://www.bio.ifi.lmu.de/ALPS. An implementation of the described method is available upon request from the authors. Contact : [email protected]

Comparative Genomics, Phylogeny and Evolution

PT35—Parsimony and likelihood reconstruction of human segmental duplications Crystal L. Kahn 1, ∗, Borislav H. Hristov 1 and Benjamin J. Raphael 1,2, ∗ 1Department of Computer Science and 2Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA Motivation : Segmental duplications >1 kb in length with ≥90% sequence identity between copies comprise nearly 5% of the human genome. They are frequently found in large, contiguous regions known as duplication blocks that can contain mosaic patterns of thousands of segmental duplications. Reconstructing the evolutionary history of these complex genomic regions is a non-trivial, but important task. Results : We introduce parsimony and likelihood techniques to analyze the evolutionary relationships between duplication blocks. Both techniques rely on a generic model of duplication in which long, contiguous substrings are copied and reinserted over large physical distances, allowing for a duplication block to be constructed by aggregating substrings of other blocks. For the likelihood method, we give an efficient dynamic programming algorithm to compute the weighted ensemble of all duplication scenarios that account for the construction of a duplication block. Using this ensemble, we derive the probabilities of various duplication scenarios.We

46

ECCB10 Oral Presentations formalize the task of reconstructing the evolutionary history of segmental duplications as an optimization problem on the space of directed acyclic graphs. We use a simulated annealing heuristic to solve the problem for a set of segmental duplications in the human genome in both parsimony and likelihood settings. Availability : Supplementary information is available at http://www.cs.brown.edu/people/braphael/supplements. Contact : [email protected]; [email protected].

PT36—Maximum likelihood estimation of locus-specific mutation rates in Y-chromosome short tandem repeats Osnat Ravid-Amir and Saharon Rosset ∗ Department of Statistics and Operation Research, Tel Aviv, Israel Motivation : Y-chromosome short tandem repeats (Y-STRs) are widely used for population studies, forensic purposes and, potentially, the study of disease, therefore knowledge of their mutation rate is valuable. Here we show a novel method for estimation of site-specific Y-STR mutation rates from partial phylogenetic information, via the maximum likelihood framework. Results : Given Y-STR data classified into haplogroups, we describe the likelihood of observed data, and develop optimization strategies for deriving maximum likelihood estimates of mutation rates. We apply our method to Y-STR data from two recent papers. We show that our estimates are comparable, often more accurate than those obtained in familial studies, although our data sample is much smaller, and was not collected specifically for our study. Furthermore, we obtain mutation rate estimates for DYS388, DYS426, DYS457, three STRs for which there were no mutation rate measures until now. Contact : [email protected]

Protein and Nucleotide Structure

PT4—Vorescore—fold recognition improved by rescoring of protein structure models Gergely Csaba ∗ and Ralf Zimmer ∗ Practical Informatics and Bioinformatics Group, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstr. 17, D-80333 München, Germany Summary : The identification of good protein structure models and their appropriate ranking is a crucial problem in structure prediction and fold recognition. For many alignment methods, rescoring of alignment- induced models using structural information can improve the separation of useful and less useful models as compared with the alignment score. Vorescore, a template-based protein structure model rescoring system is introduced. The method scores the model structure against the template used for the modeling using Vorolign. The method works on models from different alignment methods and incorporates both knowledge from the prediction method and the rescoring. Results : The performance of Vorescore is evaluated in a largescale and difficult protein structure prediction context. We use different threading methods to create models for 410 targets, in three scenarios: (i) family members are contained in the template set; (ii) superfamily members (but no family members); and (iii) only fold members (but no family or superfamily members). In all cases Vorescore improves significantly (e.g. 40% on both Gotoh and HHalign at the fold level) on the model quality, and clearly outperforms the state-of-the-art physics-based model scoring system Rosetta. Moreover, Vorescore improves on other successful rescoring approaches such as Pcons and ProQ. In an additional experiment we add high-quality models based on structural alignments to the set, which allows Vorescore to improve the fold recognition rate by another 50%. Availability : All models of the test set (about 2 million, 44GB gzipped) are available upon request. Contact : [email protected]; [email protected]

PT5—Solenoid and non-solenoid protein recognition using stationary wavelet packet transform An Vo 1, ∗,† , Nha Nguyen 2,† and Heng Huang 3 1The Feinstein Institute for Medical Research, North Shore LIJ Health System, NY, 2Department of Electrical Engineering and 3Department of Computer Science and Engineering, University of Texas at Arlington, TX, USA Motivation : Solenoid proteins are emerging as a protein class with properties intermediate between structured and intrinsically unstructured proteins. Containing repeating structural units, solenoid proteins are expected to

47

ECCB10 Oral Presentations share sequence similarities. However, in many cases, the sequence similarities are weak and non-detectable. Moreover, solenoids can be degenerated and widely vary in the number of units. So that it is difficult to detect them. Recently, several solenoid repeats detection methods have been proposed, such as self-alignment of the sequence, spectral analysis and discrete Fourier transform of sequence. Although these methods have shown good performance on certain data sets, they often fail to detect repeats with weak similarities. In this article, we propose a new approach to recognize solenoid repeats and non-solenoid proteins using stationary wavelet packet transform (SWPT). Our method associates with three advantages: (i) naturally representing five main factors of protein structure and properties by wavelet analysis technique; (ii) extracting novel wavelet features that can capture hidden components from solenoid sequence similarities and distinguish them from global proteins; (iii) obtaining statistics features that capture repeating motifs of solenoid proteins. Results : Our method analyzes the characteristics of amino acid sequence in both spectral and temporal domains using SWPT. Both global and local information of proteins are captured by SWPT coefficients. We obtain and integrate wavelet-based features and statistics-based features of amino acid sequence to improve the classification task. Our proposed method is evaluated by comparing to state-of-the-art methods such as HHrepID and REPETITA. The experimental results show that our algorithm consistently outperforms them in areas under ROC curve. At the same false positive rate, the sensitivity of our WAVELET method is higher than other methods. Availability : http://www.naaan.org/anvo/Software/Software.htm Contact : [email protected]

PT6—Discriminatory power of RNA family models Christian Höner zu Siederdissen ∗ and Ivo L. Hofacker Institute for Theoretical Chemistry, University of Vienna, Währinger Strasse 17, A-1090 Wien, Austria Motivation : RNA family models group nucleotide sequences that share a common biological function. These models can be used to find new sequences belonging to the same family. To succeed in this task, a model needs to exhibit high sensitivity as well as high specificity. As model construction is guided by a manual process, a number of problems can occur, such as the introduction of more than one model for the same family or poorly constructed models. We explore the Rfam database to discover such problems. Results : Our main contribution is in the definition of the discriminatory power of RNA family models, together with a first algorithm for its computation. In addition, we present calculations across the whole Rfam database that show several families lacking high specificity when compared to other families. We give a list of these clusters of families and provide a tentative explanation. Our program can be used to: (i) make sure that new models are not equivalent to any model already present in the database; and (ii) new models are not simply submodels of existing families. Availability : www.tbi.univie.ac.at/software/cmcompare/. The code is licensed under the GPLv3. Results for the whole Rfam database and supporting scripts are available together with the software. Contact : [email protected]

PT7—RactIP: fast and accurate prediction of RNA-RNA interaction using integer programming Yuki Kato 1, ∗,† , Kengo Sato 2, ∗,† , Michiaki Hamada 3,4 , Yoshihide Watanabe 5, Kiyoshi Asai 2,4 and Tatsuya Akutsu 1 1Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, 2Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, 3Mizuho Information & Research Institute, Inc, 2-3 Kanda-Nishikicho, Chiyoda-ku, Tokyo 101-8443, 4Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-41-6, Aomi, Koto-ku, Tokyo 135- 0064 and 5Department of Mathematical Sciences, Faculty of Science and Engineering, Doshisha University, 1-3 Tataramiyakodani, Kyotanabe, Kyoto 610-0321, Japan Motivation : Considerable attention has been focused on predicting RNA–RNA interaction since it is a key to identifying possible targets of non-coding small RNAs that regulate gene expression posttranscriptionally. A number of computational studies have so far been devoted to predicting joint secondary structures or binding sites under a specific class of interactions. In general, there is a tradeoff between range of interaction type and efficiency of a prediction algorithm, and thus efficient computational methods for predicting comprehensive type of interaction are still awaited.

48

ECCB10 Oral Presentations

Results : We present RactIP, a fast and accurate prediction method for RNA–RNA interaction of general type using integer programming. RactIP can integrate approximate information on an ensemble of equilibrium joint structures into the objective function of integer programming using posterior internal and external base-paring probabilities. Experimental results on real interaction data show that prediction accuracy of RactIP is at least comparable to that of several state-of-the-art methods for RNA–RNA interaction prediction. Moreover, we demonstrate that RactIP can run incomparably faster than competitive methods for predicting joint secondary structures. Availability : RactIP is implemented in C++, and the source code is available at http://www.ncrna.org/software/ractip/ Contact : [email protected]; [email protected]

Prediction and Annotation of Molecular Function

PT18—Prototypes of elementary functional loops unravel evolutionary connections between protein functions Alexander Goncearenco 1,2 and Igor N. Berezovsky 1, ∗ 1Computational Biology Unit, Bergen Center for Computational Science and 2Department of Informatics, University of Bergen, N-5008 Norway Motivation : Earlier studies of protein structure revealed closed loops with a characteristic size 25–30 residues and ring-like shape as a basic universal structural element of globular proteins. Elementary functional loops (EFLs) have specific signatures and provide functional residues important for binding/activation and principal chemical transformation steps of the enzymatic reaction. The goal of this work is to show how these functional loops evolved from pre-domain peptides and to find a set of prototypes from which the EFLs of contemporary proteins originated. Results : This article describes a computational method for deriving prototypes of EFLs based on the sequences of complete genomes. The procedure comprises the iterative derivation of sequence profiles followed by their hierarchical clustering. The scoring function takes into account information content on profile positions, thus preserving the signature. The statistical significance of scores is evaluated from the empirical distribution of scores of the background model. A set of prototypes of EFLs from archaeal proteomes is derived. This set delineates evolutionary connections between major functions and illuminates how folds and functions emerged in pre-domain evolution as a combination of prototypes. Contact : [email protected]

PT19—Improved sequence-based prediction of disordered regions withmultilayer fusion of multiple information sources Marcin J. Mizianty, Wojciech Stach, Ke Chen, Kanaka Durga Kedarisetti, Fatemeh Miri Disfani and Lukasz Kurgan ∗ Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada T6G 2V4 Motivation : Intrinsically disordered proteins play a crucial role in numerous regulatory processes. Their abundance and ubiquity combined with a relatively low quantity of their annotations motivate research toward the development of computational models that predict disordered regions from protein sequences. Although the prediction quality of these methods continues to rise, novel and improved predictors are urgently needed. Results : We propose a novel method, named MFDp (Multilayered Fusion-based Disorder predictor), that aims to improve over the current disorder predictors. MFDp is as an ensemble of 3 Support Vector Machines specialized for the prediction of short, long and generic disordered regions. It combines three complementary disorder predictors, sequence, sequence profiles, predicted secondary structure, solvent accessibility, backbone dihedral torsion angles, residue flexibility and B-factors. Our method utilizes a custom-designed set of features that are based on raw predictions and aggregated raw values and recognizes various types of disorder. The MFDp is compared at the residue level on two datasets against eight recent disorder predictors and top- performing methods from the most recent CASP8 experiment. In spite of using training chains with ≤25% similarity to the test sequences, our method consistently and significantly outperforms the other methods based on the MCC index. The MFDp outperforms modern disorder predictors for the binary disorder assignment and

49

ECCB10 Oral Presentations provides competitive real-valued predictions. The MFDp’s outputs are also shown to outperform the other methods in the identification of proteins with long disordered regions. Availability : http://biomine.ece.ualberta.ca/MFDp.html Contact : [email protected]

PT20—A predictor for toxin-like proteins exposes cell modulator candidates within viral genomes Guy Naamati 1, Manor Askenazi 2,3 and Michal Linial 2, ∗ 1School of Computer Science and Engineering, 2Department of Biological Chemistry, Sudarsky Center for Computational Biology, Hebrew University of Jerusalem, Israel and 3Blais Proteomics Center, Dana-Farber Cancer Institute, Boston, MA, USA Motivation : Animal toxins operate by binding to receptors and ion channels. These proteins are short and vary in sequence, structure and function. Sporadic discoveries have also revealed endogenous toxin-like proteins in non-venomous organisms. Viral proteins are the largest group of quickly evolving proteomes. We tested the hypothesis that toxin-like proteins exist in viruses and that they act to modulate functions of their hosts. Results : We updated and improved a classifier for compact proteins resembling short animal toxins that is based on a machine-learning method. We applied it in a large-scale setting to identify toxin-like proteins among short viral proteins. Among the ∼ 26 000 representatives of such short proteins, 510 sequences were positively identified. We focused on the 19 highest scoring proteins. Among them, we identified conotoxin-like proteins, growth factors receptor-like proteins and anti-bacterial peptides. Our predictor was shown to enhance annotation inference for many ‘uncharacterized’ proteins. We conclude that our protocol can expose toxin-like proteins in unexplored niches including metagenomics data and enhance the systematic discovery of novel cell modulators for drug development. Availability : ClanTox is available at http://www.clantox.cs.huji.ac.il Contact : [email protected]

Gene Regulation and Transcriptomics

PT25—Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes Carlo Vittorio Cannistraci 1,2,3,4,5, ∗, Timothy Ravasi 1,5 , Franco Maria Montevecchi 3, Trey Ideker 5 and Massimo Alessio 2, ∗ 1Red Sea Integrative Systems Biology Lab, Computational Bioscience Research Center, Division of Chemical and Life Sciences and Engineering, King Abdullah University for Science and Technology (KAUST), Jeddah, Kingdom of Saudi Arabia, 2Proteome Biochemistry, San Raffaele Scientific Institute, via Olgettina 58, 20132 Milan, 3Department of Mechanics, 4CMP Group, Microsoft Research, Politecnico di Torino, c/so Duca degli Abruzzi 24, 10129 Turin, Italy, 5Department of Bioengineering and Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA Motivation : Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology, and pose problems in their investigation. Unsupervised hybrid-two-phase (H2P) procedures—specifically dimension reduction (DR), coupled with clustering—provide valuable assistance, not only for unsupervised data classification, but also for visualization of the patterns hidden in high-dimensional feature space. Methods : ‘Minimum Curvilinearity’ (MC) is a principle that—for small datasets—suggests the approximation of curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering. Results : Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin. Conclusion: MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers

50

ECCB10 Oral Presentations new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal. Availability : https://sites.google.com/site/carlovittoriocannistraci/home Contact : [email protected]; [email protected]

PT26—A varying threshold method for ChIP peak-calling using multiple sources of information Kuan-Bei Chen 1 and Yu Zhang 2, ∗ 1Department of Computer Science and Engineering, The Pennsylvania State University, University Park and 2Department of Statistics, The Pennsylvania State University, 422A Thomas, University Park, PA 16802, USA Motivation : Gene regulation commonly involves interaction among DNA, proteins and biochemical conditions. Using chromatin immunoprecipitation (ChIP) technologies, protein–DNA interactions are routinely detected in the genome scale. Computational methods that detect weak protein-binding signals and simultaneously maintain a high specificity yet remain to be challenging. An attractive approach is to incorporate biologically relevant data, such as protein cooccupancy, to improve the power of protein-binding detection. We call the additional data related with the target protein binding as supporting tracks. Results : We propose a novel but rigorous statistical method to identify protein occupancy in ChIP data using multiple supporting tracks (PASS2). We demonstrate that utilizing biologically related information can significantly increase the discovery of true proteinbinding sites, while still maintaining a desired level of false positive calls. Applying the method to GATA1 restoration in mouse erythroid cell line, we detected many new GATA1-binding sites using GATA1 co-occupancy data. Availability : http://stat.psu.edu/ ∼yuzhang/pass2.tar Contact : [email protected]

PT27—is-rSNP: a novel technique for in silico regulatory SNP detection Geoff Macintyre 1,2, ∗, James Bailey 1,2 , Izhak Haviv 3,4,5 and Adam Kowalczyk 1,2, ∗ 1Department of Computer Science and Software Engineering, 2NICTA, Victoria Research Lab, The University of Melbourne, Victoria 3010, 3Bioinformatics and Systems Integration, The Blood and DNA Profiling Facility, Baker IDI Heart and Diabetes Institute, 75 Commercial Rd, Prahran, Victoria 3004, 4Metastasis Research Lab, Peter MacCallum Cancer Centre, St Andrews Place, East Melbourne, Victoria 3002 and 5Department of Biochemistry and Molecular Biology. The University of Melbourne, Victoria 3010, Australia Motivation : Determining the functional impact of non-coding disease-associated single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) is challenging. Many of these SNPs are likely to be regulatory SNPs (rSNPs): variations which affect the ability of a transcription factor (TF) to bind to DNA. However, experimental procedures for identifying rSNPs are expensive and labour intensive. Therefore, in silico methods are required for rSNP prediction. By scoring two alleles with a TF position weight matrix (PWM), it can be determined which SNPs are likely rSNPs. However, predictions in this manner are noisy and no method exists that determines the statistical significance of a nucleotide variation on a PWM score. Results : We have designed an algorithm for in silico rSNP detection called is-rSNP. We employ novel convolution methods to determine the complete distributions of PWM scores and ratios between allele scores, facilitating assignment of statistical significance to rSNP effects. We have tested our method on 41 experimentally verified rSNPs, correctly predicting the disrupted TF in 28 cases. We also analysed 146 disease- associated SNPs with no known functional impact in an attempt to identify candidate rSNPs. Of the 11 significantly predicted disrupted TFs, 9 had previous evidence of being associated with the disease in the literature. These results demonstrate that is-rSNP is suitable for high-throughput screening of SNPs for potential regulatory function. This is a useful and important tool in the interpretation of GWAS. Availability : is-rSNP software is available for use at: http://www.genomics.csse.unimelb.edu.au/is-rSNP Contact : [email protected]; [email protected]

51

ECCB10 Oral Presentations

PT32—De-correlating expression in gene-set analysis Dougu Nam School of Nano-Biotechnology and Chemical Engineering, Ulsan National Institute of Science and Technology, Republic of Korea Motivation : Group-wise pattern analysis of genes, known as geneset analysis (GSA), addresses the differential expression pattern of biologically pre-defined gene sets. GSA exhibits high statistical power and has revealed many novel biological processes associated with specific phenotypes. In most cases, however, GSA relies on the invalid assumption that the members of each gene set are sampled independently, which increases false predictions. Results : We propose an algorithm, termed DECO, to remove (or alleviate) the bias caused by the correlation of the expression data in GSAs. This is accomplished through the eigenvalue-decomposition of covariance matrixes and a series of linear transformations of data. In particular, moderate de-correlation methods that truncate or rescale eigenvalues were proposed for a more reliable analysis. Tests of simulated and real experimental data show that DECO effectively corrects the correlation structure of gene expression and improves the prediction accuracy (specificity and sensitivity) for both gene and sample-randomizing GSA methods. Availability : The MATLAB codes and the tested data sets are available at ftp://deco.nims.re.kr/pub or from the author. Contact : [email protected]

PT34—Discovering graphical Granger causality using the truncating lasso penalty Ali Shojaie ∗ and George Michailidis Department of Statistics, University of Michigan, Ann Arbor Michigan 48109, USA Motivation : Components of biological systems interact with each other in order to carry out vital cell functions. Such information can be used to improve estimation and inference, and to obtain better insights into the underlying cellular mechanisms. Discovering regulatory interactions among genes is therefore an important problem in systems biology. Whole-genome expression data over time provides an opportunity to determine how the expression levels of genes are affected by changes in transcription levels of other genes, and can therefore be used to discover regulatory interactions among genes. Results : In this article, we propose a novel penalization method, called truncating lasso, for estimation of causal relationships from time-course gene expression data. The proposed penalty can correctly determine the order of the underlying time series, and improves the performance of the lasso-type estimators. Moreover, the resulting estimate provides information on the time lag between activation of transcription factors and their effects on regulated genes. We provide an efficient algorithm for estimation of model parameters, and show that the proposed method can consistently discover causal relationships in the large p, small n setting. The performance of the proposed model is evaluated favorably in simulated, as well as real, data examples. Availability : The proposed truncating lasso method is implemented in the R-package ‘grangerTlasso’ and is freely available at http://www.stat.lsa.umich.edu/ ∼shojaie/ Contact : [email protected]

Text Mining, Ontologies and Databases

PT1—Utopia documents: linking scholarly literature with research data T. K. Attwood 1,2, ∗, D. B. Kell 3,4 , P. McDermott 1,2 , J. Marsh 3, S. R. Pettifer 2 and D. Thorne 3 1School of Computer Science, 2Faculty of Life Sciences, 3School of Chemistry, University of Manchester, Oxford Road, Manchester M13 9PL and 4Manchester Interdisciplinary Biocentre, 131 Princess Street, Manchester M1 7DN, UK Motivation : In recent years, the gulf between the mass of accumulating-research data and the massive literature describing and analyzing those data has widened. The need for intelligent tools to bridge this gap, to rescue the knowledge being systematically isolated in literature and data silos, is now widely acknowledged. Results : To this end, we have developed Utopia Documents, a novel PDF reader that semantically integrates visualization and data-analysis tools with published research articles. In a successful pilot with editors of the Biochemical Journal (BJ), the system has been used to transform static document features into objects that can

52

ECCB10 Oral Presentations be linked, annotated, visualized and analyzed interactively (http://www.biochemj.org/bj/424/3/). Utopia Documents is now used routinely by BJ editors to mark up article content prior to publication. Recent additions include integration of various textmining and biodatabase plugins, demonstrating the system’s ability to seamlessly integrate on-line content with PDF articles. Availability : http://getutopia.com Contact : [email protected]

PT21—Discriminative and informative features for biomolecular text mining with ensemble feature selection Sofie Van Landeghem 1,2,† , Thomas Abeel 1,2,† , Yvan Saeys 1,2 and Yves Van de Peer 1,2, ∗ 1Department of Plant Systems Biology, VIB and 2Department of Plant Biotechnology and Genetics, Ghent University, Gent, Belgium Motivation : In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results. Results : We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text.We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools. Availability : The FS algorithms and classifiers are available in Java- ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP’09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac .jp/GENIA/SharedTask/). Contact : [email protected]

PT22— BioXSD: the common data-exchange format for everyday bioinformatics web services Matúš Kalaš 1,2, ∗, Pål Puntervoll 1, Alexandre Joseph 3, Edita Bartaševi čiūt÷4, Armin Töpfer 1,5 , Prabakar Venkataraman 1, Steve Pettifer 6, Jan Christian Bryne 1,2 , Jon Ison 7, Christophe Blanchet 3, Kristoffer Rapacki 4 and Inge Jonassen 1,2 1Computational Biology Unit, Bergen Center for Computational Science, Uni Research, 5008 Bergen, Norway, 2Department of Informatics, University of Bergen, 5008 Bergen, Norway, 3Université Lyon 1; CNRS, UMR 5086; IBCP, Institut de Biologie et Chimie des Protéines, 69367 Lyon Cedex 07, France, 4Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kongens Lyngby, Denmark, 5Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, 33594 Bielefeld, Germany, 6School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK and 7European Bioinformatics Institute, EMBL, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK Motivation : The world-wide community of life scientists has access to a large number of public bioinformatics databases and tools, which are developed and deployed using diverse technologies and designs. More and more of the resources offer programmatic web-service interface. However, efficient use of the resources is hampered by the lack of widely used, standard data-exchange formats for the basic, everyday bioinformatics data types. Results : BioXSD has been developed as a candidate for standard, canonical exchange format for basic bioinformatics data. BioXSD is represented by a dedicated XML Schema and defines syntax for biological sequences, sequence annotations, alignments and references to resources. We have adapted a set of web services to use BioXSD as the input and output format, and implemented a testcase workflow. This demonstrates that the approach is feasible and provides smooth interoperability. Semantics for BioXSD is provided by annotation with the EDAM ontology. We discuss in a separate section how BioXSD relates to other initiatives and approaches, including existing standards and the Semantic Web. Availability : The BioXSD 1.0 XML Schema is freely available at http://www.bioxsd.org/BioXSD-1.0.xsd under the Creative Commons BY-ND 3.0 license. The http://bioxsd.org web page offers documentation, examples of data in BioXSD format, example workflows with source codes in common programming

53

ECCB10 Oral Presentations languages, an updated list of compatible web services and tools and a repository of feature requests from the community. Contact : [email protected]; [email protected]; [email protected]

PT23—Improving disease gene prioritization using the semantic similarity of Gene Ontology terms Andreas Schlicker †, Thomas Lengauer and Mario Albrecht ∗ Max Planck Institute for Informatics, Department of Computational Biology and Applied Algorithmics, Campus E1.4, 66123 Saarbrücken, Germany Motivation : Many hereditary human diseases are polygenic, resulting from sequence alterations in multiple genes. Genomic linkage and association studies are commonly performed for identifying disease-related genes. Such studies often yield lists of up to several hundred candidate genes, which have to be prioritized and validated further. Recent studies discovered that genes involved in phenotypically similar diseases are often functionally related on the molecular level. Results : Here, we introduce MedSim, a novel approach for ranking candidate genes for a particular disease based on functional comparisons involving the Gene Ontology. MedSim uses functional annotations of known disease genes for assessing the similarity of diseases as well as the disease relevance of candidate genes. We benchmarked our approach with genes known to be involved in 99 diseases taken from the OMIM database. Using artificial quantitative trait loci, MedSim achieved excellent performance with an area under the ROC curve of up to 0.90 and a sensitivity of over 70% at 90% specificity when classifying gene products according to their disease relatedness. This performance is comparable or even superior to related methods in the field, albeit using less and thus more easily accessible information. Availability : MedSim is offered as part of our FunSimMat web service (http://www.funsimmat.de). Contact : [email protected]

PT24—Discovering drug–drug interactions: a text-mining and reasoning approach based on properties of drug metabolism Luis Tari 1, ∗, Saadat Anwar 2, Shanshan Liang 2, James Cai 1 and Chitta Baral 2 1Disease and Translational Informatics, Hoffmann-La Roche, Nutley, NJ 07110 and 2Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA Motivation : Identifying drug–drug interactions (DDIs) is a critical process in drug administration and drug development. Clinical support tools often provide comprehensive lists of DDIs, but they usually lack the supporting scientific evidences and different tools can return inconsistent results. In this article, we propose a novel approach that integrates text mining and automated reasoning to derive DDIs. Through the extraction of various facts of drug metabolism, not only the DDIs that are explicitly mentioned in text can be extracted but also the potential interactions that can be inferred by reasoning. Results : Our approach was able to find several potential DDIs that are not present in DrugBank. We manually evaluated these interactions based on their supporting evidences, and our analysis revealed that 81.3% of these interactions are determined to be correct. This suggests that our approach can uncover potential DDIs with scientific evidences explaining the mechanism of the interactions. Contact : [email protected]

Protein Interactions, Molecular Networks and Systems Biology

PT3—A graphical method for reducing and relating models in systems biology Steven Gay, Sylvain Soliman and François Fages ∗ EPI Contraintes, Institut National de Recherche en Informatique et Automatique, INRIA Paris-Rocquencourt, France Motivation : In Systems Biology, an increasing collection of models of various biological processes is currently developed and made available in publicly accessible repositories, such as biomodels.net for instance, through common exchange formats such as SBML. To date, however, there is no general method to relate different models to each other by abstraction or reduction relationships, and this task is left to the modeler for re-using and coupling models. In mathematical biology, model reduction techniques have been studied for a long time, mainly in the case where a model exhibits different time scales, or different spatial phases, which can be

54

ECCB10 Oral Presentations analyzed separately. These techniques are however far too restrictive to be applied on a large scale in systems biology, and do not take into account abstractions other than time or phase decompositions. Our purpose here is to propose a general computational method for relating models together, by considering primarily the structure of the interactions and abstracting from their dynamics in a first step. Results : We present a graph-theoretic formalism with node merge and delete operations, in which model reductions can be studied as graph matching problems. From this setting, we derive an algorithm for deciding whether there exists a reduction from one model to another, and evaluate it on the computation of the reduction relations between all SBML models of the biomodels.net repository. In particular, in the case of the numerous models of MAPK signalling, and of the circadian clock, biologically meaningful mappings between models of each class are automatically inferred from the structure of the interactions. We conclude on the generality of our graphical method, on its limits with respect to the representation of the structure of the interactions in SBML, and on some perspectives for dealing with the dynamics. Availability : The algorithms described in this article are implemented in the open-source software modeling platform BIOCHAM available at http://contraintes.inria.fr/biocham. The models used in the experiments are available from http://www.biomodels.net/. Contact : [email protected]

PT28—Efficient parameter search for qualitative models of regulatory networks using symbolic model checking Gregory Batt 1, ∗, Michel Page 2,3 , Irene Cantone 4, Gregor Goessler 2, Pedro Monteiro 2,5 and Hidde de Jong 2 1INRIA Paris - Rocquencourt, Le Chesnay, 2INRIA Grenoble - Rhône-Alpes, Montbonnot, 3IAE, Université Pierre Mendès France, Grenoble, France, 4Clinical Sciences Center, Imperial College, London, UK and 5INESC/Instituto Superior Técnico, Lisbon, Portugal Motivation : Investigating the relation between the structure and behavior of complex biological networks often involves posing the question if the hypothesized structure of a regulatory network is consistent with the observed behavior, or if a proposed structure can generate a desired behavior. Results : The above questions can be cast into a parameter search problem for qualitative models of regulatory networks. We develop a method based on symbolic model checking that avoids enumerating all possible parametrizations, and show that this method performs well on real biological problems, using the IRMA synthetic network and benchmark datasets. We test the consistency between IRMA and time-series expression profiles, and search for parameter modifications that would make the external control of the system behavior more robust. Availability : GNA and the IRMA model are available at http://ibis.inrialpes.fr/ Contact : [email protected]

PT29—Bayesian experts in exploring reaction kinetics of transcription circuits Ryo Yoshida ∗, Masaya M. Saito, Hiromichi Nagao and Tomoyuki Higuchi The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo, 190-8562, Japan Motivation : Biochemical reactions in cells are made of several types of biological circuits. In current systems biology, making differential equation (DE) models simulatable in silico has been an appealing, general approach to uncover a complex world of biochemical reaction dynamics. Despite of a need for simulation-aided studies, our research field has yet provided no clear answers: how to specify kinetic values in models that are difficult to measure from experimental/theoretical analyses on biochemical kinetics. Results : We present a novel non-parametric Bayesian approach to this problem. The key idea lies in the development of a Dirichlet process (DP) prior distribution, called Bayesian exper ts, which reflects substantive knowledge on reaction mechanisms inherent in given models and experimentally observable kinetic evidences to the subsequent parameter search. The DP prior identifies significant local regions of unknown parameter space before proceeding to the posterior analyses. This article reports that a Bayesian expert-inducing stochastic search can effectively explore unknown parameters of in silico transcription circuits such that solutions of DEs reproduce transcriptomic time course profiles.

55

ECCB10 Oral Presentations

Availability : A sample source code is available at the URL http://daweb.ism.ac.jp/ ∼yoshidar/lisdas/ Contact : [email protected]

PT30—Dynamic deterministic effects propagation networks: learning signalling pathways from longitudinal protein array data Christian Bender 1, ∗, Frauke Henjes 1, Holger Fröhlich 2, Stefan Wiemann 1, Ulrike Korf 1 and Tim Beißbarth 3 1Department of Molecular Genome Analysis, German Cancer Research Center, 69120 Heidelberg, 2Department of Algorithmic Bioinformatics, Bonn-Aachen International Center for IT, 53113 Bonn and 3Department of Medical Statistics, University of Göttingen, 37099 Göttingen, Germany Motivation : Network modelling in systems biology has become an important tool to study molecular interactions in cancer research, because understanding the interplay of proteins is necessary for developing novel drugs and therapies. De novo reconstruction of signalling pathways from data allows to unravel interactions between proteins and make qualitative statements on possible aberrations of the cellular regulatory program. We present a new method for reconstructing signalling networks from time course experiments after external perturbation and show an application of the method to data measuring abundance of phosphorylated proteins in a human breast cancer cell line, generated on reverse phase protein arrays. Results : Signalling dynamics is modelled using active and passive states for each protein at each timepoint. A fixed signal propagation scheme generates a set of possible state transitions on a discrete timescale for a given network hypothesis, reducing the number of theoretically reachable states. A likelihood score is proposed, describing the probability of measurements given the states of the proteins over time. The optimal sequence of state transitions is found via a hidden Markov model and network structure search is performed using a genetic algorithm that optimizes the overall likelihood of a population of candidate networks. Our method shows increased performance compared with two different dynamical Bayesian network approaches. For our real data, we were able to find several known signalling cascades from the ERBB signaling pathway. Availability : Dynamic deterministic effects propagation networks is implemented in the R programming language and available at http://www.dkfz.de/mga2/ddepn/ Contact : [email protected]

PT31—A novel approach for determining environment-specific protein costs: the case of Arabidopsis thaliana Max Sajitz-Hermstein 1 and Zoran Nikoloski 1,2, ∗ 1Max-Planck Institute of Molecular Plant Physiology and 2Institute of Biochemistry and Biology, University of Potsdam, Potsdam, D-14476 Germany Motivation : Comprehensive understanding of cellular processes requires development of approaches which consider the energetic balances in the cell. The existing approaches that address this problem are based on defining energy-equivalent costs which do not include the effects of a changing environment. By incorporating these effects, one could provide a framework for integrating ‘omics’ data from various levels of the system in order to provide interpretations with respect to the energy state and to elicit conclusions about putative global energy-related response mechanisms in the cell. Results : Here we define a cost measure for amino acid synthesis based on flux balance analysis of a genome- scale metabolic network, and develop methods for its integration with proteomics and metabolomics data. This is a first measure which accounts for the effect of different environmental conditions. We applied this approach to a genome-scale network of Arabidopsis thaliana and calculated the costs for all amino acids and proteins present in the network under light and dark conditions. Integration of function and process ontology terms in the analysis of protein abundances and their costs indicates that, during the night, the cell favors cheaper proteins compared with the light environment. However, this does not imply that there is squandering of resources during the day. The results from the association analysis between the costs, levels and well-defined expenses of amino acid synthesis, indicate that our approach not only captures the adjustment made at the switch of conditions, but also could explain the anticipation of resource usage via a global energy-related regulatory mechanism of amino acid and protein synthesis. Contact : [email protected]

56

ECCB10 Oral Presentations

PT33—How threshold behaviour affects the use of subgraphs for network comparison Tiago Rito 1,2, ∗, Zi Wang 1, Charlotte M. Deane 1,2 and Gesine Reinert 1,2 1Department of Statistics, University of Oxford, OX1 3TG and 2OCISB/DTC Systems Biology, Department of Biochemistry, South Parks Road, OX1 3QU, UK Motivation : A wealth of protein–protein interaction (PPI) data has recently become available. These data are organized as PPI networks and an efficient and biologically meaningful method to compare such PPI networks is needed. As a first step, we would like to compare observed networks to established network models, under the aspect of small subgraph counts, as these are conjectured to relate to functional modules in the PPI network. We employ the software tool GraphCrunch with the Graphlet Degree Distribution Agreement (GDDA) score to examine the use of such counts for network comparison. Results : Our results show that the GDDA score has a pronounced dependency on the number of edges and vertices of the networks being considered. This should be taken into account when testing the fit of models. We provide a method for assessing the statistical significance of the fit between random graph models and biological networks based on non-parametric tests. Using this method we examine the fit of Erdös–Rényi (ER), ER with fixed degree distribution and geometric (3D) models to PPI networks. Under these rigorous tests none of these models fit to the PPI networks. The GDDA score is not stable in the region of graph density relevant to currentPPI networks. We hypothesize that this score instability is due to the networks under consideration having a graph density in the threshold region for the appearance of small subgraphs. This is true for both geometric (3D) and ER random graph models. Such threshold behaviour may be linked to the robustness and efficiency properties of the PPI networks. Contact : [email protected]

Genomic Medicine

PT8—Semi-supervised multi-task learning for predicting interactions between HIV-1 and human proteins Yanjun Qi 1, ∗, Oznur Tastan 2, Jaime G. Carbonell 2, Judith Klein-Seetharaman 2 and Jason Weston 3 1NEC Labs America, 4 Independence Way, Princeton, NJ 08540, 2School of Computer Science, Carnegie Mellon University, PA 15213 and 3Google Research NY, 75 Ninth Avenue, New York, NY 10011, USA Motivation : Protein–protein interactions (PPIs) are critical for virtually every biological function. Recently, researchers suggested to use supervised learning for the task of classifying pairs of proteins as interacting or not. However, its performance is largely restricted by the availability of truly interacting proteins ( labeled ). Meanwhile, there exists a considerable amount of protein pairs where an association appears between two partners, but not enough experimental evidence to support it as a direct interaction ( partially labeled ). Results : We propose a semi-supervised multi-task framework for predicting PPIs from not only labeled , but also partially labeled reference sets. The basic idea is to perform multi-task learning on a supervised classification task and a semi-supervised auxiliary task. The supervised classifier trains a multi-layer perceptron network for PPI predictions from labeled examples. The semi-supervised auxiliary task shares network layers of the supervised classifier and trains with partially labeled examples. Semi-supervision could be utilized in multiple ways. We tried three approaches in this article, (i) classification (to distinguish partial positives with negatives); (ii) ranking (to rate partial positive more likely than negatives); (iii) embedding (to make data clusters get similar labels). We applied this framework to improve the identification of interacting pairs between HIV-1 and human proteins. Our method improved upon the state-of-the-art method for this task indicating the benefits of semi-supervised multi-task learning using auxiliary information. Availability : http://www.cs.cmu.edu/ ∼qyj/HIVsemi Contact : [email protected]

PT9—Candidate gene prioritization based on spatially mapped gene expression: an application to XLMR Rosario M. Piro ∗, Ivan Molineris, Ugo Ala, Paolo Provero and Ferdinando Di Cunto Molecular Biotechnology Center and Department of Genetics, Biology and Biochemistry, University of Torino, Via Nizza 52, 10126 Torino, Italy Motivation : The identification of genes involved in specific phenotypes, such as human hereditary diseases, often requires the time-consuming and expensive examination of a large number of positional candidates

57

ECCB10 Oral Presentations selected by genome-wide techniques such as linkage analysis and association studies. Even considering the positive impact of next-generation sequencing technologies, the prioritization of these positional candidates may be an important step for disease-gene identification. Results : Here, we report a large-scale analysis of spatial, i.e. 3D, gene expression data from an entire organ (the mouse brain) for the purpose of evaluating and ranking positional candidate genes, showing that the spatial gene-expression patterns can be successfully exploited for the prediction of gene–phenotype associations not only for mouse phenotypes, but also for human central nervous system-related Mendelian disorders. We apply our method to the case of X-linked mental retardation, compare the predictions to the results obtained from a previous large-scale resequencing study of chromosome X and discuss some promising novel candidates. Contact : [email protected]

PT10—Prediction of a gene regulatory network linked to prostate cancer from gene expression, microRNA and clinical data Eric Bonnet 1,2 , Tom Michoel 1,2 and Yves Van de Peer 1,2, ∗ 1Department of Plant Systems Biology, VIB and 2Department of Molecular Genetics, Ghent University, Technologiepark 927, B-9052 Gent, Belgium Motivation : Cancer is a complex disease, triggered by mutations in multiple genes and pathways. There is a growing interest in the application of systems biology approaches to analyze various types of cancer-related data to understand the overwhelming complexity of changes induced by the disease. Results : We reconstructed a regulatory module network using gene expression, microRNA expression and a clinical parameter, all measured in lymphoblastoid cell lines derived from patients having aggressive or non- aggressive forms of prostate cancer. Our analysis identified several modules enriched in cell cycle-related genes as well as novel functional categories that might be linked to prostate cancer. Almost one-third of the regulators predicted to control the expression levels of the modules are microRNAs. Several of them have already been characterized as causal in various diseases, including cancer.We also predicted novel microRNAs that have never been associated to this type of tumor. Furthermore, the condition-dependent expression of several modules could be linked to the value of a clinical parameter characterizing the aggressiveness of the prostate cancer. Taken together, our results help to shed light on the consequences of aggressive and non-aggressive forms of prostate cancer. Availability : The complete regulatory network is available as an interactive supplementary web site at the following URL: http://bioinformatics.psb.ugent.be/webtools/pronet/ Contact : [email protected]

PT11—Inferring cancer subnetwork markers using density-constrained biclustering Phuong Dao 1,† , Recep Colak 1,† , Raheleh Salari 1, Flavia Moser 2, Elai Davicioni 3, Alexander Schönhuth 4, ∗,‡ and Martin Ester 1, ∗,‡ 1School of Computing Science, Simon Fraser University, Burnaby, 2University of British Columbia Centre for Disease Control, Vancouver, 3GenomeDX Biosciences Inc., Vancouver, BC, Canada and 4Department of Mathematics, University of California, Berkeley, CA, USA Motivation : Recent genomic studies have confirmed that cancer is of utmost phenotypical complexity, varying greatly in terms of subtypes and evolutionary stages. When classifying cancer tissue samples, subnetwork marker approaches have proven to be superior over single gene marker approaches, most importantly in cross- platform evaluation schemes. However, prior subnetwork-based approaches do not explicitly address the great phenotypical complexity of cancer. Results : We explicitly address this and employ density-constrained biclustering to compute subnetwork markers, which reflect pathways being dysregulated in many, but not necessarily all samples under consideration. In breast cancer we achieve substantial improvements over all cross-platform applicable approaches when predicting TP53 mutation status in a well-established non-cross-platform setting. In colon cancer, we raise prediction accuracy in the most difficult instances from 87% to 93% for cancer versus non- cancer and from 83% to (astonishing) 92%, for with versus without liver metastasis, in well-established cross- platform evaluation schemes.

58

ECCB10 Oral Presentations

Availability : Software is available on request. Contact : [email protected]; [email protected]

PT12—Modeling associations between genetic markers using Bayesian networks Edwin Villanueva and Carlos Dias Maciel ∗ Electrical Engineering Department, Sao Carlos School of Engineering, University of Sao Paulo, Sao Carlos, Sao Paulo, Brazil Motivation : Understanding the patterns of association between polymorphisms at different loci in a population (linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients were proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) were proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understanding the process that generated such structure. GMs based in coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, the inference and parameter estimation of such models is still computationally challenging. Results : We present a more practical method to build GM that describe LD. The method is based on learning weighted Bayesian network structures from haplotype data, extracting equivalence structure classes and using them to model LD. The results obtained in public data from the HapMap database showed that the method is a promising tool for modeling LD. The associations represented by the learned models are correlated with the traditional measure of LD D′. The method was able to represent LD blocks found by standard tools. The granularity of the association blocks and the readability of the models can be controlled in the method. The results suggest that the causality information gained by our method can be useful to tell about the conservability of the genetic markers and to guide the selection of subset of representative markers. Availability : The implementation of the method is available upon request by email. Contact : [email protected]

Other Bioinformatics Applications

PT13—Mass spectrometry data processing using zero-crossing lines in multi-scale of Gaussian derivative wavelet Nha Nguyen 1,2 , Heng Huang 1, ∗, Soontorn Oraintara 2 and An Vo 3 1Department of Computer Science and Engineering, 2Department of Electrical Engineering, University of Texas at Arlington, TX and 3The Feinstein Institute for Medical Research, North Shore LIJ Health System, New York, USA Motivation : Peaks are the key information in mass spectrometry (MS) which has been increasingly used to discover diseases-related proteomic patterns. Peak detection is an essential step for MS-based proteomic data analysis. Recently, several peak detection algorithms have been proposed. However, in these algorithms, there are three major deficiencies: (i) because the noise is often removed, the true signal could also be removed; (ii) baseline removal step may get rid of true peaks and create new false peaks; (iii) in peak quantification step, a threshold of signal-to-noise ratio (SNR) is usually used to remove false peaks; however, noise estimations in SNR calculation are often inaccurate in either time or wavelet domain. In this article, we propose new algorithms to solve these problems. First, we use bivariate shrinkage estimator in stationary wavelet domain to avoid removing true peaks in denoising step. Second, without baseline removal, zero-crossing lines in multi- scale of derivative Gaussian wavelets are investigated with mixture of Gaussian to estimate discriminative parameters of peaks. Third, in quantification step, the frequency, SD, height and rank of peaks are used to detect both high and small energy peaks with robustness to noise. Results : We propose a novel Gaussian Derivative Wavelet (GDWavelet) method to more accurately detect true peaks with a lower false discovery rate than existing methods. The proposed GDWavelet method has been performed on the real Surface-Enhanced Laser Desorption/Ionization Time-Of-Flight (SELDI-TOF) spectrum with known polypeptide positions and on two synthetic data with Gaussian and real noise. All experimental results demonstrate that our method outperforms other commonly used methods. The standard receiver operating characteristic (ROC) curves are used to evaluate the experimental results. Availability: http://ranger.uta.edu/ ∼heng/MS/GDWavelet.html or http://www.naaan.org/nhanguyen/archive.htm Contact : [email protected]

59

ECCB10 Oral Presentations

PT14—Detecting host factors involved in virus infection by observing the clustering of infected cells in siRNA screening images Apichat Suratanee 1,2,† , Ilka Rebhan 3,† , Petr Matula 1,2 , Anil Kumar 3, Lars Kaderali 4, Karl Rohr 1,2 , Ralf Bartenschlager 3, Roland Eils 1,2, ∗ and Rainer König 1,2, ∗ 1Department of Bioinformatics and Functional Genomics, Institute of Pharmacy and Molecular Biotechnology, Bioquant, University of Heidelberg, INF 267, 2Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), INF 280, 3The Department for Infectious Diseases, Molecular Virology, University of Heidelberg, INF 345 and 4Viroquant Research Group Modeling, Bioquant, University of Heidelberg, INF 267, 69120 Heidelberg, Germany Motivation : Detecting human proteins that are involved in virus entry and replication is facilitated by modern high-throughput RNAi screening technology. However, hit lists from different laboratories have shown only little consistency. This may be caused by not only experimental discrepancies, but also not fully explored possibilities of the data analysis.We wanted to improve reliability of such screens by combining a population analysis of infected cells with an established dye intensity readout. Results : Viral infection is mainly spread by cell–cell contacts and clustering of infected cells can be observed during spreading of the infection in situ and in vivo . We employed this clustering feature to define knockdowns which harm viral infection efficiency of human Hepatitis C Virus. Images of knocked down cells for 719 human kinase genes were analyzed with an established point pattern analysis method (Ripley’s K-function) to detect knockdowns in which virally infected cells did not show any clustering and therefore were hindered to spread their infection to their neighboring cells. The results were compared with a statistical analysis using a common intensity readout of the GFP-expressing viruses and a luciferase-based secondary screen yielding five promising host factors which may suit as potential targets for drug therapy. Conclusion : We report of an alternative method for high-throughput imaging methods to detect host factors being relevant for the infection efficiency of viruses. The method is generic and has the potential to be used for a large variety of different viruses and treatments being screened by imaging techniques. Contact : [email protected]; [email protected]

60

ECCB10 Posters

POSTERS

Sequence Analysis, Alignment and Next Generation Sequencing repeat structures and amino acid propensity of A-1 Resolving the genomic breakpoints of a existing proteins. microdeletion using next generation sequencing —Nebel J-C in a family Approaches to finding the exact breakpoints of the A-6 HERD: the highest expected reward decoding deletion causing a recurrent mental retardation for HMMs with application to recombination syndrome. Combining microarray, mate pair detection sequencing and familial data. We present a new approach for decoding hidden —de Ligt J, Vissers LELM, Kloosterman W, Markov models that can be tailored to a particular Hehir-Kwa JY, Bruijn E, Gilissen C, Guryev V, application, and therefore can achieve better Cuppen E , Brunner HG, Veltman JA prediction accuracy than traditional decoding methods, such as the Viterbi algorithm. A-2 Identification of unknown environmental —Nánási M, Vina ř T, Brejová B sequences by biased distribution of signature oligonucleotides A-7 Visualizing the next generation of sequencing We present a database of signature data with GenomeView oligonucleotides and software tools that provides a GenomeView can interactively browse high- graphical user interface to the database. These tools throughput sequencing data and genome allow one to manage the database and infer the alignments of dozens of genomes with dynamic phylogeny of unknown DNA sequences. navigation and zooming. It can handle dozens of —Labuschange P, Emmett W, Davenport CF, aligned genomes, thousands of annotation features Reva ON and millions of short reads —Abeel T, Van Parys T, Galagan J, Van de Peer Y A-3 Identification of conserved repeats with advanced background models A-8 IsUnstruct: a method based on Izing’s model In a survey of single amino acid repeats in over 100 for prediction of disordered residues from protein eukaryotes, leucine repeats are unexpectedly sequence alone. frequent in signal peptides according to an A new method based on the dynamic programming application specific background model. The was developed for searching disordered residues in relatively stronger conservation of the repeats protein chain. Testing of the method on two suggests a functional role. databases has shown that our method is one of the —Łabaj P-P, Sykacek P, Kreil D-P best. —Lobanov MY, Galzitskaya OV A-4 Searching for structural homologues of LeuT-like secondary transporters beyond A-9 MASiVE: Mapping and analysis of SireVirus sequence similarity elements in plant genome sequences We present AlignMe, a program that can use We present MASiVE, an algorithm for the large- various types of information for alignment of two scale, yet sensitive and highly accurate, discovery membrane protein sequences. We use AlignMe to of intact Sirevirus LTR-retrotransposons in plant detect relationships between transporters that share genomic sequences. MASiVE was validated on the a structural fold but lack detectable sequence annotated large Zea mays chromosome one. similarity. —Darzentas N, Bousios A, Apostolidou V, —Khafizov K, Staritzbichler R, Stamm M, Tsaftaris AS Forrest LR A-10 Signals in human promoter sequences A-5 Why inverse proteins are relatively abundant We have located GC rich 6-nts in human promoters Using a new artificial set of peptide sequences, we and identified sigmoidal behavior in their suggest the relative abundance of inverse proteins distribution.These hexamers are also observed in can be explained by the fact they display the same miRNAs and TF binding sites.We aim to establish 61

ECCB10 Posters correlation among promoters, TFs and A-16 AnsNGS: Annotation system of structural miRNAs,based on 6-nts. variation for next generation sequencing data —Putta P We introduce a new data integration and visualization tool AnsNGS to visualize short read A-11 Iterative read mapping and assembly allows mapping, identify and annotate genetic variation the use of more distant reference in metagenome based on the reference genome. assembly —Na YI, Park CH, Cho Y, Kim, JH We assemble a quasispecies consensus genome from 32nt metagenomic reads with 91.1% identity A-17 Assessing the accuracy and completeness of to the original reference. Iterating the mapping and the Bonobo genome sequence assembly procedure allows us to approach the We present the detailed analysis of the quality of consensus of the sequenced community. the Bonobo genome sequence, assembled from 25x —Dutilh BE, Huynen MA, Gloerich J, Strous M coverage of 454 sequencing data, by comparison to the human and chimpanzee genomes. A-12 APPRIS, a database of annotated principal —Prüfer K, Ptak SE, Fischer A, Good JM, splice variants for the human genome Mullikin JC, Miller J, Kodira CD, Knight JR, The We have developed a database that deploys a range Bonobo Genome Consortium, Kelso J, Pääbo S of computational methods to provide value to the annotations of the human genome. The database A-18 Estimating effective DNA database size via selects one of the CDS for each gene as the compression principal functional variants based on these Biological databases often contain identical or very algorithms. similar sequences. This is not taken into account in —Rodríguez J-M, Ezkurdia I, Lopez G, Pietrelli estimation of statistical significance of an A, Maietta P, Wesselink J-J, Valencia A, Tress M alignment. We propose to use the compressed size of database as an estimate of its effective size. A-13 Accurate long read mapping using —Visnovska M, Nanasi M, Vinar T, Brejova B enhanced suffix arrays We present an algorithm to map long reads to A-19 ChIP-seq analysis: from peaks to motifs reference genomes, based on chaining maximal We present ChIP-motifs, a specialized pipe-line exact matches and using heuristics and the combining various algorithms for discovering cis- Needleman-Wunsch algorithm to bridge the gaps. regulatory motifs in massive ChIP-seq datasets. It was tested on simulated BRCA1 data with —Thomas-Chollier M, Defrance M, Sand O, positive results. Herrmann C, Thieffry D, van Helden J —Vyverman M, Fack V, De Schrijver J, Dawyndt P A-20 Applying mass spectrometry data to improve annotation of the reference mouse genome A-14 Detection and architecture of small heat We have constructed a novel genome annotation shock protein monomers pipeline for tandem mass spectrometry data and We studied the architecture of small Heat Shock have applied this to the reference mouse genome, Proteins monomers, based on a new approach that enabling both the validation of current annotation detects and delineates the alpha-crystallin domain . and the identification of novel protein-coding loci. We annotated ACD structural components and —Saunders GI , Brosch M, Frankish A, Collins quantified sHSPs properties. MO, Yu L, Harrow J, Choudhary JS, Hubbard T —Poulain P, Gelly J-C, Flatters D A-21 Prediction of splice-modifying SNPs in A-15 Assembly of mitochondrial genomes using human genes using a new analysis pipeline called next-generation sequencing data AASsites We describe the identification and analysis of Most tools for the annotation of single nucleotide complete mitochondrial genomes using next- polymorphisms analyze possible effects on the generation data derived from whole-genome transcriptome and proteome levels. In contrast, our shotgun sequencing of two isolates of the plant AASsites pipeline indicates if a change in the pathogen Rhynchosporium secalis. splice pattern due to a SNP is likely to occur. —Felder M, Taudien S, Groth M, Münsterkötter —Hotz-Wagenblatt A, Faber K, Risch A, M, Navarro-Quezada A, Knogge W, Platzer M Glatting K-H

62

ECCB10 Posters A-22 Structator: a tool for fast index-based A-28 TAPIR, a web server for the prediction of matching of RNA structural motifs plant microRNA targets, including target mimics Structator is a tool for ultra-fast matching of RNA We present a new web server called TAPIR, structural motifs. It employs affix arrays, which is designed for the prediction of plant microRNA an index data structure that allows for bidirectional targets, including target mimics. The TAPIR web search of RNA structural motifs while taking server can be accessed at: advantage of their symmetric nature. http://bioinformatics.psb.ugent.be/webtools/tapir —Meyer F, Will S, Beckstette M —Bonnet E, He Y, Billiau K, Van de Peer Y

A-23 Project HOPE: Providing the last piece of A-29 GRID/CPCA analysis of the SULTs the puzzle... superfamily: multivariate characterization of the We present Project HOPE; an automatic analysis most relevant structural and physicochemical method for point mutations. HOPE collects and differences combines protein information from a series of We used computational methods to provide a sources and produces a conclusion while focusing multivariate characterization of human and mouse on the structural effects. cytosolic sulfotransferases,revealing the most —Venselaar H relevant structural and physicochemical differences within the SULT family towards to substrate A-24 Simple not-PWM-based approach for specificity. transcription factor binding site prediction —Rebollido-Rios R , Zamora-Rico I We suggest a novel simple not-PWM-based approach to the detection of transcription factor A-30 Theoretical and empirical quality binding sites. It outperforms other tools in both assessment of transcription factor binding motifs sensitivity and specificity and allows to We propose a method, implemented in the program significantly reduce the number of false positives matrix-quality, that combines theoretical and —Fazius E, Shelest V, Shelest E empirical score distributions to assess the reliability of position-specific matrices for predicting A-25 PICMI: mapping point variations on transcription factor binding sites. genomes —Medina-Rivera A, Abreu-Goodger C, Thomas- PICMI is a web server to simplify the mapping of Chollier M, Salgado-Osorio H, Collado-Vides J, nucleotide variations on a genome, to retrieve van Helden J information about their effect on all isoforms of a gene and, to rapidly add value to existing amino A-31 Characterization and prediction of protein acidic variation data. nucleolar localization sequences —Le Pera L, Marcatili P, Tramontano A By extensive curation of the literature, we have uncovered 46 human nucleolar localization A-26 Detection of alternative splice isoforms in sequences . These sequences were used to create an the human genome artificial neural network predictor which accurately We have carried out a comprehensive re- predicts NoLSs and distinguishes them from NLSs. identification of proteomics spectra from several —Scott MS, Boisvert F-M, McDowall MD, data repositories and confirmed that many human Lamond AI, Barton GJ genes express multiple alternative isoforms in sufficient quantities to be detected in proteomics A-32 Jalview: past, present, future. experiments. Jalview is a widely used open-source program for —Ezcurdia I, del Pozo A, Rodriguez JM, Ashman sequence alignment visualization. In this poster, we K, Valencia A, Tress ML describe its new annotation and web service capabilities, and our plans for its future A-27 Boundaries for short-read, low-coverage development. deNovo assembly and resequencing —Procter JB, Troshin PV, Martin MAM, A set of formulas are presented to calculate the Barton GJ expected distribution of contig length for a short- read sequencing experiment before performing it. A-33 Enrichment of eukaryotic linear motifs in Boundaries can be provided by these formulas to viral proteins. reduce the resources needed. We have found that there is an evidence for —Schatz F, Schimmler M enrichment of PCNA binding motif in the ordered region and PDZ ligand binding motifs in the 63

ECCB10 Posters disorder region of 33,269 proteins from 1,646 viral A-39 Monitoring strain diversity in metagenomics genomes. data using meta-MLST SNP profiling —Pushker R, Edwards R-J, Shields D-C Monitoring bacterial strain diversity in metagenomics sequence data, is typically done A-34 Analyzing ChIP-Seq data using qips using 16S rRNA sequences supporting resolution We propose a new tool called 'qips' especially up to the genus level. We describe a method that suited for the analysis of ChIP-Seq data also allows following diversity in metagenomes at the containing broader enriched regions which occur strain level. for example when targetting polymerase or histone —De Jager VCL, Van der Sijde MR, Hazelwood marks. LA, Kutahya O, Kleerebezem M, Smid EJ, Siezen —Gogol-Döring A , Chen W RJ, Van Hijum SAFT

A-35 COMA server for protein distant homology A-40 A variance stabilizing parametrization to search detect differentially expressed genes in RNA-Seq Detection of distant homology is a widely used data computational approach for studying proteins. Here Processing of RNA-Seq data is currently affected we report a homology search web server based on by a transcript length bias in the calling of sequence profile comparisons. The server differentially expressed genes. Here we propose a capabilities are illustrated with some non-trivial novel approach based on a variance stabilizing examples. parametrization and on an unbiased Wald type test. č —Margelevi ius M, Laganeckas M, Venclovas Č —Sales G, Risso D, Chiogna M, Romualdi C

A-36 Protein disorder and short motifs in A-41 DNA annotation induction: from RefGene disordered regions are enriched near the on human chr. 22 to genome-wide CAGE for cytoplasmic side of single-pass transmembrane human and mouse proteins We have shown that the knowledge of functional Human single-pass transmembrane proteins were annotation such as RefGene locations of analyzed to predict intrinsic disorder and transcription start sites on the smallest autosomal potentially functional short linear motifs. human chromosome is sufficient for genome-wide —Stavropoulos I, Khaldi N, Davey NE, Finian M, prediction of CAGE tags, for human and also Shields DC mouse. —Bedo J, Steininger A, Haviv I, Kowalczyk A A-37 Variation in positional preferences by 20 amino acids in alpha helices with different size. A-42 A new resampling method We have dissected alpha helixes with different We introduced a new, biologically inspired number of amino acids . We then compared the AA resampling method to use in molecular sequences. compositions of each position for each class of The method is crucial especially for the non helices. We have also compared the AA contents of alignment based methods. the same position in different helices. —Bakis Y, Sezerman OU —Fallahi H, Dokanehiifard S-A, Hosseini-Nasab S-M-E A-43 Clustering massive sequence databases We developed a tool for clustering massive number A-38 Comparison of mapping and variant calling of DNA sequences. This tool has been accuracies for next-generation sequence data in implemented in BioloMICS, a software solution for relation to coverage depth: a case study in biological data. Experiments show promising leukemia cell lines. results. We have assembled nine analysis pipelines to —Vu D, Robert V analyze targeted sequencing data and assessed their performances under high and low coverage A-44 WordCluster: detecting clusters of DNA thresholds separately and showed that it is possible words and genomic elements to identify correct variations even in low coverage WordCluster finds statistical significant clusters of regions DNA words or any other genomic elements. The —Kalender Z, Geerdens E, Cuppens H, Cools J, co-localization with gene annotations and the Aerts S enrichment/depletion analysis in GO terms may facilitate relating them to biological function. —Barturen G , Oliver JL, Hackenberg M 64

ECCB10 Posters A-45 Analysis of copy loss and gain variations in A-50 An iterative scheme for making long Holstein cattle autosomes using BeadChip SNPs structural alignments of ncRNAs using Using the BovineSNP50 BeadChip for 912 bulls, FOLDALIGN. we analyzed copy losses and gains based either on Pairwise structural alignment of ncRNAs is a deviation from the expected Hardy-Weinberg computational very demanding task. To lower equilibrium or on signal intensity using PennCNV. these requirements, we present an iterative This approach was more effective than previous approach which recursively extends shorter methodologies. alignments. The approach applies to both local and —Seroussi E, Glick G, Shirak A, Yakobson E, global alignments. Weller JI, Ezra E, Zeron Y —Havgaard JH , Gorodkin J

A-46 The role of the Occludin MARVEL domain A-51 A multi-objective approach to the prediction in intracellular targeting and clustering of sRNAs in Bacteria Occludin, a MARVEL domain protein is an We present a methodology, which uses a multi- essential component of tight junctions. objective approach to extract the best methods’ Bioinformatic sequence analysis was applied in aggregations to predict sRNAs in Bacteria by conjunction with live cell imaging to study the maximizing the specificity and sensitivity of the functions of this protein-lipid interacting domain. program´s individual predictions. —Yeheskel A, Yaffe Y, Hirschberg K, Pasmanik- —Arnedo J, Romero-Zaliz R, del Val C Chor M A-52 Scalability of large-scale protein domain A-47 Multi-Harmony: detecting functional family inference specificity from sequence alignment We present MPI_MkDom2, a distributed algorithm Specialized functions in protein sub-families for protein domain family inference enabling the generally are conferred by a small number of construction of regular new releases of ProDom, amino acids. Our improved multi-Harmony web while preserving the structure of domain families server reliably detects these sub-type-specific sites built by the now inapropriate sequential method from an input multiple alignment and sub- MkDom2. grouping. —Rezvoy C, Vivien F, Kahn D —Feenstra KA , Brandt BW, Heringa J A-53 Evolutionary study of eye-developmental A-48 Fast approximate statistical ranking for gene expression across multiple Drosophila sequence motif discovery under higher order species by high-throughput tag sequencing. Markov background model We have used Illumina’s digital gene expression in A new method of motif p-value estimation, based imaginal discs of three Drosophila species. We on a polynomial approximation and a MCMC show that DGE produces accurate measures of optimization of the coefficients, is proposed. gene expression and we identify evolutionary Higher order background correlation can be conserved and divergent expression during eye handled with minimum sacrifice of precision and development complexity. —Naval-Sanchez M , Aerts S —Shida K A-54 Higher order repeat with largest primary A-49 Networking interacting residues, repeat unit and located within a gene in human evolutionary conservation and energetics in genome protein structures Presentation of our novel computational method Residue interaction networks consider single amino Global Repeat Map, which is particularly acids in the protein structure as nodes and convenient for identification and analysis of repeats connections as physico-chemical interactions. and higher order repeats characterized by very long Here, we present RING as a novel tool to generate repeat units on a case study of Build 37.1 genomic RINs for use in Cytoscape. ensemble of Y chromosome. —Vidotto M, Martin AJM, Boscariol F, Walsh I, —Glun čić M, Paar V, Paar P, Rosandi ć M, Tosatto SCE Vlahovi ć I

65

ECCB10 Posters A-55 Fast and accurate digging for binding motifs examined their effect on key prediction measures in ChIP-Seq data using ChIPMunk software used by programs like miRDeep, and estimated the We present the ChIPMunk tool for motif discovery relevant parameters. in ChIP-Seq data using base coverage information. —Thakur V, Wanchana S, Xu M, Mosig A, Zhu X Comparison with other motif discovery tools shows that ChIPMunk identifies the correct motifs of the A-61 Using the Amazon Elastic Cloud Computing same or better quality and works much faster. Resource for the analysis of next-generation —Kulakovskiy I V, Boeva VA, Favorov AV, sequencing data in Ensembl Makeev V J The Ensembl project has ported parts of its Next- Generation Sequence analysis pipeline and web A-56 Mebitoo - A sequence analysis toolbox services to the Amazon Cloud. With this framework infrastructure in place, the analysis and We present Mebitoo, a bioinformatics software tool visualisation of large data sets is attractive for to embrace disequence analysis methods within a smaller labs. uniform graphical user interface. The framework —Vogel J , Aken B, Chiang G-T Clapham P, simplifies both the development of methods and the Coates G, Fairley S, Hourlier T, Ruffier M, Tang workflow when assessing large sets of sequences. A, White S, Zadissa A, Searle S, Hubbard T —Spaniol C , Helms V A-62 Automated sequence extraction of relevant A-57 Practical aspects of RNA-Seq data analysis genomic features for targeted resequencing that you should know and they don't tell you We present an easy to use webtool for genomic We present a guide of practical problems for the target selection based on the most complete and analysis of RNA-Seq data. We analysed a variety uniform annotation sources currently available. The of published data by means of state-of-the-art tool focuses on usability and should allow to easily software and provide valuable hints that should be prepare next generation sequencing experiments. taken into consideration on these kind of analysis —De Wilde B, D’Hont B, Lefever S, Hellemans J, —Garcia F, Tarazona S, Zuñiga S, Santoyo J, Speleman F, Pattyn F, Vandesompele J Dopazo J, Conesa A A-63 An Ensembl-based pipeline for microRNA A-58 MutaBase: a framework and web-interface prediction and expression profiling using next for the archival, management and interpretation generation sequencing data of single nucleotide variants obtained by next We present a workflow appliance for predicting generation sequencing. miRNA loci and profiling miRNA expression MutaBase is an easy-to-use, flexibly queryable based on short read sequences generated from framework which assists in the discovery of small RNA libraries. Is is based on Ensembl eHive interesting single-nucleotide variants called by and incorporates a number of open-source miRNA next-generation sequencing platforms. We here analysis software. present its application in full-exome sequencing of —James N , Donepudi M, Spooner W, Watson M 4 patients —Sifrim A , Van Houdt J, Vermeesch J, Moreau Y A-64 The Goby file formats: towards scalable next-generation sequencing data analysis A-59 Mining cleavage sites of the mouse The poster will present an overview of the Goby peptidome framework and provide the results of a storage We present the application of constraint-based benchmark comparing Goby formats to widely itemset mining techniques on peptide cleavage used next-gen file formats. sites. Mining experimentally verified cleavage —Chambwe N, Dorff KC, Srdanovic M, Deng M, sites allows discovery of recognition motifs for Andrews SJD, Campagne F endopeptidases. —De Grave K, Guns T, Vandekerckhove TTM, A-65 Quantitative transcriptome analysis using de Landuyt B, Menschaert G, Van Criekinge W, novo assembly of RNA-seq from the non-model Schoofs L, Luyten W C4 species Cleome gynandra Here we describe the de novo assembly of A-60 Properties of plant microRNAs affect Illumina-based RNA sequencing data to establish miRDeep statistical scoring as well as accuracy the Cleome gynandra transcriptome, and to assess Given the differences in properties of plant global changes in gene expression during leaf miRNAs from that of animal miRNAs, we development. 66

ECCB10 Posters —Salmon-Divon M , Aubry S, Rutherford K-M, enrichment technique and demonstrate its Kelly K-A, Hibberd J-M, Bertone P applicability for custom SNP genotyping experiments using childhood acute lymphoblastic A-66 iCount – comprehensive analysis of iCLIP leukemia as an example. data —Wesolowska A , Dalgaard MD, Borst L, Gautier iCount is a computational pipeline for the analysis L, Bak M, Weinhold N, Nielsen BF, Nersting J, of individual-nucleotide resolution UV- Tommerup N, Brunak S, Sicheritz-Ponten T, crosslinking and immunoprecipitation (iCLIP) Leffers H, Schmiegelow K, Gupta R data: mapping, quantification of bound RNA, peak finding, binding sequence motif enrichment and A-68 Exploring 3D pooled next-generation visualization. sequencing using SNP-Cub 3 —Rot G, König J, Gorup C, Zupan B, Ule J, We developed a simulation and analysis pipeline - Curk T called SNP-Cub 3 - for 3D pooled illumina resequencing that allows optimization and scaling A-67 Relating genomic variation to drug response of the multidimensional pools, optimization of the in childhood acute lymphoblastic leukemia by sequencing setup and analysis of the sequencing multiplexed targeted sequencing. data. We present a reproducible, cost-effective, multiple- —De Schrijver J , De Wilde B, Trooskens G, sample pooling customization of a genome Vandesompele J, Van Criekinge W

Comparative Genomics, Phylogeny and Evolution — Tran HT , Cuguen J, Touzet P, Saumitou- B-1 The role of transposable elements in the Laprade P evolution of non-mammalian vertebrates and invertebrates B-5 Trees inside trees for genome wide We present a comparative analysis of transposable association studies elements exonization in five different species. our We consider extensions of Random Forests analysis shed light on the differences between candidate splits for taking into account linkage vertbrates and invertbrates regarding the birth of desequilibrium between SNP markers in genome- new exons. wide association studies. We obtain improved —Sela N, Kim E, Ast G predictive power and interpretability on artificial and real datasets. B-2 A tool for comparison of complete proteomes —Botta V, Geurts P, Wehenkel L between pairs of organisms We present a tool that implements an algorithm B-6 Validation of single cell arrayCGH on a 60- that permits an easy comparison of the full mer oligo microarray platform for proteomes between two organisms. This tool is preimplantation genetics diagnosis implemented in PERL and will be made available We present a novel research to detect copy number both on a website and as a set of downloadable variations of single cell DNA derived from scripts. blastomeres. This research is helpful for the future —Karathia HM, Alves R study to improve the selection of high quality human embryos before implantation. B-3 Toward an unified measure of intraspecific —Cheng J, Vanneste E, Konings P, Van selective pressure Eyndhoven W, Voet T, Ampe M, Verbeke G, We present an attempt to unify some of the most Vermeesch J, Moreau Y used measures of intraspecific selective pressure. By their exhaustive analysis, we highlighted their B-7 Polymorphisms associated with mtDNA of pros and cons, suggesting a way for their elite Kenyan endurance athletes integration. We analyzed the complete mtDNA sequences of —Amato R, Miele G, Pinelli M, Cocozza S elite Kenyan endurance athletes to look for SNPs correlated with excellence in international B-4 Assessment of genetic structure within and competition. We have identified 4 polymorphisms among Normandie populations of wild beet in members of the L0 clade which are significantly We assess the level of genetic differentiation in B. enriched. vulgaris sps. maritima form coastal northern France at three different geographical scales. 67

ECCB10 Posters —Seiler M, Fuku N, Diao L, Solovyov A, Seiler S, support for MPI and multithreading allows the Tanaka M, Bhanot G efficient analysis the full Ensembl database . —Proost S, Fostier J, Dhoedt B, Demeester P, Van B-8 Bayesian inference of community average de Peer Y, Vandepoele K gene copy numbers and its application for the metagenomic characterization of B-14 Comparative expression analysis of bacterioplankton community types in the Sargasso orthologous genes between Arabidopsis and rice Sea In this study we developed a general framework to We present a Bayesian approach for estimating measure expression context conservation for community average gene copy numbers for orthologs between Arabidopsis and rice and show comparative metagenomic analyses, and illustrate that for several gene functions sequence and its use in a comparative metagenomic study of expression evolution are only weakly correlated. microbial communities from the Sargasso Sea. —Movahedi S, Van de Peer Y, Vandepoele K —Beszteri B, Frickenhaus S, Giovannoni SJ B-15 Phylogenomics and robust construction of B-9 Phylogenetic mapping of non-model prokaryotic evolutionary trees organism RNA-seq reads using a graph algorithm We show that our method of construction of We present a new methodology to map RNA-seq prokaryotic evolutionary trees based on sets of datasets for non-model organisms that don't have a selected genes produces very robust results. The reference genome to reference gene family sets are constructed from as many as possible alignments. The resulting fragments can then be clusters of orthologous groups of genes. assembled into transcripts, complementing existing —Korenblat K, Volkovich Z, Bolshoy A de novo tools. —Vilella A, Massingham T, Loytynoja A B-16 Mapping the human membrane proteome: a majority of the human membrane proteins can be B-10 GEI-DB: A database of atypical genomic classified according to function and evolutionary elements in bacteria origin. We present GEI-DB, a database of genomic islands Membrane proteins are essential in all cells and are that were identified in bacteria by the analysis of also the targets for the majority of our drugs. In this genome-wide distribution of tetra-nucleotide study we present a detailed overview of the patterns using oliginucleotide usage statistics. members of the human membrane proteome. —Bezuidt O, Lima-Mendez G, Reva O —Almén MS, Nordström KJ, Fredriksson R, Schiöth HB B-11 The CNS twilight zone: limitations in comparing upstream regions in plants B-17 Towards molecular trait-based ecology The limits of successful genomic comparisons to through integration of biogeochemical, understand upstream regions of plants are geographical and metagenomic data dependent on the twilight zone shaped by Metagenomics is a powerful tool in environmental divergence time, relative positioning, differences and medical microbiology . Here we show that an between plant groups, and duplication events. ecosystems 'parts list' can be lead to ecological —Reineke AR , Bornberg-Bauer E, Gu J knowledge through integration of the Venter GOS study with external data. B-12 TurboOrtho – a high performance —Raes J, Letunic I, Yamada T, Jensen LJ, Bork P alternative to OrthhoMCL We have re-implemented in C++ the popular B-18 Orthologous proteins associated with yeast OrthoMCL algorithm for protein clustering. This telomeric complex identified by synteny and resulted in 14-216 times increase in the sequence similarity performance. We attempt to identify orthologs in several yeast —Ekseth O, Lindi B, Kuiper M, Mironov V species for proteins known to have function in maintenance of telomeric complex. By B-13 i-ADHoRe 3.0 Detection of collinearity in combination of sequence similarity and synteny large scale datasets. information we evaluate confidence for putative We present a new version of i-ADHoRe capable of orthologs. detecting collinearity in large datasets. The —Macko M, Tomáška L, Vina ř T combination of algorithmical improvements and

68

ECCB10 Posters B-19 Estimating average heterozygosity using Lipoxygenase gene family with special emphasis multilocus dominant marker data: a maximum on mammalia. likelihood approach via the EM algorithm. —Padmanabhan R, Kuhn H, Reddanna P We present an Expectation-Maximization algorithm for obtaining the maximum likelihood B-25 Cassis: detection of genomic rearrangement estimate of average heterozygosity under presence breakpoints of ascertainment bias, using multilocus dominant Cassis is a software developed for the precise marker data. detection of genomic rearrangement breakpoints by —Khang TF , Yap VB the comparison of the genomes of two related species. It detects and narrows breakpoints and B-20 PLAZA 2.0 : a resource for plant statistically assesses the obtained results. comparative genomics —Baudet C, Lemaitre C, Dias Z, Gautier C, We present an update of the PLAZA plaform for Tannier E, Sagot M-F comparative genomics. Besides a substantial increase in number of available species, various B-26 Methylated cytosines are less likely to mutate tools have been updated and new tools have been within CpG islands added. 5mCpGs are over 1.5-times less likely to mutate —Van Bel M, Proost S, Wischnitzki E, Van de into TpG within CpG islands. Possible causes of Peer Y, Vandepoele K this effect include a higher percentage of deleterious C>T mutations in CpG islands, or B-21 Branch testing in phylogenetic trees: interactions between 5mCpG mutation rates and comparison of computationally efficient methods local GC content. We compare several methods to test branches in —Medvedeva Y, Panchin A, Mitrofanov S, phylogenetic trees in terms of false positive and Makeev V false negative rate, computational efficiency and robustness to the underparametization of the B-27 Entropy approach reveals new features of evolutionary model. genes and genomes —Czarna A, Wróbel B We present an investigation of entropy characteristics for 1031 bacterial, 89 archaeal and 9 B-22 Parsimonious higher-order hidden Markov yeast genomes. Genome regions differing models for enhanced comparative genomics considerably in their entropy characteristics from An efficient extension of the standard Hidden the average over the genome, were studied. Markov Model is developed for the analysis of —Putintseva Y , Sadovsky M Array-CGH data. The benefits of this method over others are demonstrated for the genomes of two B-28 The hidden duplication past of the plant important Arabidopsis ecotypes. pathogen Phytophthora and its consequences for —Seifert M, Banaei A, Gohr A, Strickert M, infection Grosse I We present a newly developed strategy to uncover ancient polyploidy events, in particular for heavily B-23 Correction of bootstrap confidence levels rearranged genomes. Furthermore, we have applied using an iterated bootstrap procedure with our new approach on the Phytophthora genomes to computational efficient methods. unveil their duplication history. We apply iterated bootstrap proposed by Rodrigo —Martens C, Van de Peer Y to correct bootstrap confidence levels using two computationally efficient methods for the B-29 Proteins linkages between African reconstruction of pseudotrees. Trypanosoma and 8 pathogenetic eukaryotic —Czarna A, Wróbel B organisms Trypanosoma brucei differs from higher eukaryotes B-24 Molecular evolution, structure and in many aspects.In the present study, we use the functional divergence of Lipoxygenase gene Domain Fusion Analysis for the prediction of family in vertebrate. functional functional linkages and associations Lipoxygenase comprise a multi-gene family of between Trypanosoma brucei and 8 pathogenic non-heme, iron containing dioxygenases. This organisms. study describes the phylogenetic relationships, —Dimitriadis D, Trimpalis P, Choli- structural and functional divergence of vertebrate Papadopoulou T, Anagnou NP, Kossida S

69

ECCB10 Posters B-30 Reconstructing phylogenetic trees from B-36 cn.FARMS: a probabilistic latent variable clustering trees model to detect copy number variations Some top-down methods for phylogenetic tree High-density oligonucleotide genotyping construction can be viewed not just as constructing microarrays, especially Affymetrix SNP6 chips, are trees, but as identifying constraints that the widely used for copy number analysis. In order to phylogenetic tree must satisfy. We show that this identify CNVs more reliable, we have developed a viewpoint can lead to improved phylogenetic trees. maximum a posteriori factor analysis model called —Costa E, Vens C, Blockeel H cn.FARMS. —Clevert D-A, Mitterecker A, Mayr A, B-31 Predictive modeling of psychrophilic Klambauer G, Tuefferd M, De Bondt A, Talloen W adaptation on the proteome sequence-level , Goehlmann, and Hochreiter S A numerical approach to characterize psychrophilic adaptation on the sequence level is presented. For B-37 Genome-wide heterogeneity of the some protein families discriminating models are substitution process derived, allowing for classification of adapted vs. We present a genomic characterization of variation non-adapted sequences. in best-fitting nucleotide substitution models. —Frickenhaus S , Beszteri B Different genes are best explained by different models, suggesting that there is a large amount of B-32 Microbial phenotype prediction based on phylogenetically relevant genomic heterogeneity. protein domain profiles —Arbiza L, Dopazo H, Posada D We present a discriminative machine learning approach for prediction of microbial phenotypes B-38 Comparative microarray analysis to from the organism-specific domain content. Our elucidate TF-networks in activated T-cells method achieves high prediction accuracy and T-cell activation is crucial for the adaptive immune allows biological interpretation of the learned response against pathogens. We assembled a large genomic features. set of publicly accessible gene expression data sets —Lingner T, Mühlhausen S, Meinicke P concerned with T-cell activation to study and re- engineer transcription factor networks. B-33 New insights into the metazoan evolution of —Kröger S, Scheel T, Leser U , Baumgrass R cadherins: from basal to modern Exhaustive identification and comparative analysis B-39 Gene-trait matching analysis of of the cadherin repertoires in key organisms such Lactobacillus plantarum strains as amphioxus, sea anemone and the placozoan We present a gene-trait matching method that Trichoplax enabled us to reconstruct the complex allows integration of large data sets. We test our metazoan evolution of the cadherin superfamily. method with genotype and phenotype data of 38 —Hulpiau P, van Roy F Lactobacillus plantarum strains. It allowed the identification of phenotype-specific gene clusters. B-34 Comparative mapping of transcription factor —Bayjanov J R, Siezen RJ, Van Hijum SAFT binding sites in plant genomes Comparative mapping of transcription factor B-40 ProtTest-HPC: fast selection of best-fit binding sites results in enrichment for functional models of protein evolution. binding sites and reduces the number of non- We have developed and evaluated a parallel functional and therefore false positive predictions version of ProtTest for the fast selection of best-fit which frequently occur in genome wide motif models of protein evolution in multi-core desktops studies. and high performance computing environments. —Wischnitzki E, Van de Peer Y, Vandepoele K —Darriba D, Taboada G-L, Doallo R, Posada D

B-35 Protein model selection with Prottest B-41 NTRFinder: an algorithm to find nested increases phylogenetic performance tandem repeats We carried out computer simulations to assess the We introduce the algorithm NTRFinder to find role of model selection on the inference of complex repetitive structures in DNA that we call maximum likelihood phylogenetic trees from nested tandem repeats . We have tested our protein alignment, using best-fit empirical models algorithm on both real and simulated data, and to of amino acid replacement identified by ProtTest. date have found 3 NTRs of interest in real —Patricio M, Abascal F, Zardoya R sequences. —Matroud A , Hendy M, Tuffley C 70

ECCB10 Posters B-42 Computational epigenomics of plant SNF2 B-45 Diversity of the chromosomal beta- genes lactamase in 181 clinical Klebsiella oxytoca Chromatin remodeling is at the core of epigenetic isolates. Comparison of the relation between beta- gene regulation, in plants responsible for proper lactamase and gyrase-A sequences development and genotype*environment Phylogenetic analysis of beta-lactamase genes from interactions. Comparative analysis of plant Snf2 181 Danish Klebsiella oxytoca isolates shows a genes helps elucidate mechanisms of chromatin clear separation into five known types. More than remodeling. half is type Oxy2. GyraseA sequences support the —Bargsten JW, Mlynárová L, Nap JPH grouping with one exception Oxy1 and Oxy5 are fused. B-43 Comprehensive analysis of splice site —Worning P , Boye K, Hansen DS evolution in primates using whole genome alignments and RNASeq data B-46 Origin and evolution of the organellar We present the comparative analysis of changes in release factors splicing sites between several primate species. Organellar gene expression is far from understood, Based on human annotated transcripts, we identify with the human mitochondrion genetic code being non-functional splice sites in other primates and elucidated in 2010. We uncovered the remarkable verify our results using RNASeq data. ribosomal release factor functional differentiation —Ongyerth M, Prüfer K, Kelso J and co-evolution with the organellar genetic code. —Duarte I , Huynen M . B-44 An ancient motif signature defines hundreds of vertebrate developmental enhancers B-47 Identification and characterisation of novel We uncover a striking enrichment within vertebrate “atypical” odorant binding proteins in the CNEs for binding-site motifs of the Pbx-Hox mosquito genomes hetero-dimer. We anticipate our findings will have We present the identification of a new class of a significant impact on understanding the gene- atypical odorant binding proteins and the regulatory elements that underlie vertebrate classification of the family in the three mosquito development. genomes Anopheles gambiae, Aedes aegypti and —Parker H, Piccinelli P , Elgar G Culex quinquifasciatus based on comparative genome analysis. —Manoharan M , Ng Fuk Chong M, Vaitinadapoulle A, Frumence E, Sowdhamini R, Offmann B

Protein and Nucleotide Structure

C-1 Developing a methodology for protein-ligand C-3 ProBiS: A web server for detection of docking based on genetic algorithm and normal structurally similar protein binding sites modes A web server, ProBiS, freely available at In this poster we describe the methodology and http://probis.cmm.ki.si, is presented. This provides discuss the results and the performance obtained access to the program ProBiS, which detects with the two first version of methodology that protein binding sites based on local structural combine genetic algorithm, rotamer library and alignments. normal modes anlaysis to simluate docking protein- —Konc J, Janeži č D ligand. —Lima AN, Philot EA, Scott LPB, Perahia D C-4 Toward the prediction of the absolute quality of single protein structure models C-2 A statistical potential for modelling of Here we present a new method for the prediction of protein-RNA complexes. the absolute quality of a model in which quality is We provide two knowledge-based potentials for expressed based on how similar the model’s discrimination of near native decoys, with RMS “energy” is compared to values calculated for —Tuszynska I, Rother K, Bujnicki JM experimental structures solved by X-ray crystallography. —Benkert P, Biasini M, Schwede T

71

ECCB10 Posters C-5 NMR data and protein structure C-11 Pattern recognition with moment invariants The PDBe works on relating NMR data to protein for interpretation of very low resolution structures, and on presenting NMR structures and macromolecular density maps data to the user. This poster gives an overview of We present a novel method for the interpretation of the current status of NMR at the PDBe. low-resolution maps, which does not rely on any —Vranken W map segmentation or knowledge about the position of the individual structural fragments. The C-6 Predicting the effect of mutations on the structures of the fragments should be known in thermal stability of proteins with statistical advance. potentials and artificial neural networks —Heuser P , Lamzin VS We present a new method to predict changes in melting temperature induced by mutations in C-12 PTools: an open source molecular docking proteins. It relies on a set of statistical potentials library and a neural network, and outperforms predictions PTools is an open source docking library that based on estimations of the thermodynamic handles biomolecular structures at atomic and stability. reduced resolutions. The ATTRACT program —Folch B, Rooman M, Dehouck Y based on the PTools library can perform protein- protein, protein-DNA and 3-body docking C-7 BioShell - a universal utility library for simulations. structural bioinformatics —Poulain P, Saladin A., Fiorucci S, Zacharias M, BioShell is a bioinformatical toolkit written in Prévost C Java, providing programs and a scripting library . It can also be used for development of Java programs. C-13 Development of a structural alignment tool Applications include sequence and structure for protein local surfaces alignments and structural analysis. We present a structural alignment tool that —Gront D produces similarity scores, superposed structures and atom alignments of detected similar surface C-8 Molecular dynamic simulation studies of regions of the input pair of proteins. It detected membrane bound fully solvated β3 adrenergic similar active sites of serine proteases in test case. receptor —Minai R, Horiike T A 5.0 ns molecular dynamics simulation was performed on the predicted 3D model of β3AR. C-14 Blender game engine as a tool to navigate The trajectories obtained during production phase the conformational space of proteins were analyzed. The average coordinate file Using Blender, a software of computer graphics generated during production phase of was taken as and video games, we developed a system to the final model. elaborate transitions between different models —Tewatia P, Malik BK, Sahi S present in NMR collections. While providing a intermediate conformations, the system also builds C-9 The significance of the ProtDeform score a navigation map We show that as TM-scores, PD-scores are protein —Zini M-F, Andrei R, Loni T, Zoppè M length independent and follow approximately an Extreme Value Distr. On fold and homology levels, C-15 Using data mining to identify structural PD-scores are more discrimiant that TM-scores. rules in proteins We calculate 3 different PD-values for different The progress in bioinformatics and biotechnology tests. area has generated a huge amount of data —Rocha J , Alberich R sequences that requires a profoundly analysis. In this work, we study formation rules of the C-10 Exploiting synergy between computational secondary structures of proteins using data mining. biology and X-ray crystallography for solving —Stelle D, Scott LPB, Barioni MC challenging macromolecular structures We present a novel method for completing C-16 Detecting protein domains of biological fragmented models automatically obtained from assemblies using TopDomain-web low-resolution electron density maps, exploiting TopDomain-web is a tool that automatically the presence of intrinsic information - the non- assigns domains to different structural entities of crystallographic symmetry . proteins. We demonstrate that comprehensive —Wiegels T , Lamzin V structural information as provided by biological 72

ECCB10 Posters assemblies is essential to assign compact structural analytical solution for the loop closure problem. domains. We show that this kinetic enables reliable sampling —Senn S, Sippl MJ of protein equilibrium. —Bottaro S, Boomsma W, Enøe Johansson K, C-17 Metals in protein structures: classification Andreetta C, Hamelryck T, Ferkinghoff-Borg J and functional prediction We present a new method to compare metal sites in C-23 Self Organising Maps applied to protein protein structures in a systematic and automated structure classification fashion. This provides the basis for their rigorous This work presents how a rather simple yet very classification, the development of prediction tools, powerful algorithm, called Self Organising Map, is and the design of a metalloprotein database. able to cluster similar structures given a numerical —Andreini C, Cavallaro G representation of the proteins. Application of SOMs to structure prediction is analysed. C18. Intuitive visualization of surface properties —Alanis-Lobato G of moving proteins Using Blender, a free, open-source, 3D creation C-24 Construction of a sequence- and backbone- software, we developed a method for simultaneous dependent rotamer library by hidden Markov visualization of surface properties of model macromolecules, such as Electrostatic Potential and Sidechain prediction is an important subproblem of hydropathy. protein design and structure prediction. —Andrei R, Callieri M, Zini M-F, Loni T, Construction of rotamer library is the basis for Zoppè M protein sidechain prediction because it provides the basic searching space for prediction. C-19 Hierarchical classification of helical —Wen W, Lv Q, Yang P, Yang L-Y, Wu J-Z, distortions related to proline Huang X The presence of proline in a helix can be accommodated by several types of distortions. We C-25 Codon usage and protein structure in present here a hierarchical method to classify them prokaryotes: old stories, new findings and into five canonical structures. This method can be puzzling questions implemented on a large scale and will help to We add a new chapter to the long standing debate develop predictive tools for molecular modeling. on the potential relationship between codon usage —Rey J, Devillé J, Chabbert M and protein structure. We show that some early results do not hold true in light of an appropriate C-20 A “smooth” knowledge based potential for statistical framework and large scale analyses. protein structure refinement —Cozzetto D, Ward S, Jones DT We construct a geometrical knowledge-based potential for protein structures by optimizing ‘nice C-26 Analysis of FTICR mass spectra with H/D smooth’ energy minima around native structures. exchange of deprotonated dinucleotides We use only the carbon alpha atoms to allow for We present a statistical method for the faster sampling methods and the potential is spline interpretation of H/D exchange mass spectra. It based. allows estimating the exchange rate of hydrogens —Røgen P , Koehl P and drawing conclusions about the conformation of biomolecules. C-21 PROTEIN PEELING in 2010: recent —Claesen J, Valkenborg D, Burzykowski T developments and applications We present an update of our Protein Peeling web C-27 Voroprot: an interactive tool for the analysis server for protein structure analysis. New of geometric features of protein structure improvements of server are described: mobile Voroprot is an advanced interactive tool that extremity identification, Protein Units energy provides a unique set of possibilities for the computation and structural domain delineation. visualization and analysis of geometric features of —Gelly J-C, de Brevern AG protein structure using additively weighted Voronoi diagram, also known as Apollonius diagram. C-22 Local moves for efficient sampling of —Olechnovi č K, Margelevi čius M, Venclovas Č protein conformational space We present a novel method to perform local deformations of protein backbone that includes an 73

ECCB10 Posters C-28 BBP - Beta-Barrel Predictor: a web server powerful visualization tools. It is an ideal working for the prediction of the super-secondary environment for the development of new structure of transmembrane β-barrel proteins algorithms. We present a web server for the prediction of the —Biasini M, Mariani V, Haas J, Scheuber S, super-secondary structure of transmembrane β- Schenk A-D, Schwede T, Philippsen A barrel proteins whose motifs are based on permutations, such as Greek key or Jelly roll. It is C-34 Modeling structure of ionic channels from compared to existing works and shows positive rigid network of contact sites efficiency. We present graph approach to characterize —Tran VD, Chassignet P, Steyaert J-M topology of protein channels and its characteristic features. The network of channel contact sites C-29 Prediction of three dimensional structure of proved distinct from other proteins and indicated protein complexes optimal definition of the contact site. We present new benchmark of currently available —Kotulska M, Gerstein M docking software. Seven program were evaluated on extensive set of 1300 protein-ligand pairs. C-35 Multi-task sequence labeling for protein Ability to predict correct structure of complexes as annotation well as scoring capabilities were tested. We introduce an original multi-task approach for —Plewczy ński D, Ła źniewski M, Augustyniak R, sequence labeling applied to protein annotation Ginalski K problems. We show successful results for the prediction of secondary structure, solvent C-30 Investigating differences in the structural accessibility and disorder regions on a set of 500 environment of parallel and anti parallel beta PDB proteins. sheets —Maes F, Becker J, Wehenkel L By inspecting the structural environment of beta sheets, we find an unexplained difference between C-36 A convex programming model for protein the solvent exposed area of parallel versus anti- structure prediction parallel beta sheets. This may have important Convex optimization is applied to protein structure consequences for the stability of beta sheets. prediction. Globally minimum conformations may —Bawono P , Abeln S be found quickly and energy function parameters may be exactly determined. Applications for the C-31 Protein model quality assessment based on framework are to NMR data and homology structural and functional similarities modeling. We present GOBA - a model quality assessment —Kauffman C , Karypis G programme that allows for the evaluation of the quality of protein model-structures based on the C-37 Predicting protein function with the relative knowledge of the protein function. backbone position kernel —Konopka BM, Nebel J-C, Kotulska M We propose a kernel method for predicting the function of proteins that makes use of 3D structural C-32 Data driven structure prediction: calculating information. Using the kernel in support vector accurate small angle X-ray scattering curves from machines, we obtain a state-of-the-art accuracy on coarse-grained protein models two datasets of protein structures. We present a method for calculating accurate —Schietgat L, Aryal S, Ramon J SAXS curves from coarse-grained protein structures. It was tested on a set of diverse C-38 The Torsional Network Model: structures and in protein decoy recognition and improvements in modelling of protein flexibility shows promise for use in inference of protein We present a benchmark of our new Torsional structures from SAXS data Network Model in terms of the robustness of the —Stovgaard K, Andreetta C, Ferkinghoff-Borg J, derived normal modes and its application to study Hamelryck T protein conformational changes, both as compared to the closely related Anisotropic Network Model. C-33 OpenStructure: A flexible software —Méndez Giráldez R , Bastolla U framework for computational structural biology We have developed OpenStructure, a flexible software framework for structural bioninformatics which integrates core computational methods with 74

ECCB10 Posters C-39 MEDELLER: coordinate generation for —Capriotti E, Marti-Renom MA membrane proteins MEDELLER is a homology-based coordinate C-44 IPknot: fast and accurate prediction of RNA generation method for membrane proteins. Using secondary structures with pseudoknots using membrane protein-specific information and integer programming accurate loop modelling it outperform the popular We present IPknot for prediction of RNA method Modeller on all test sets by an average of secondary structures with pseudoknots using 0.48 Angstroms RMSD. integer programming. IPknot is at least comparable —Kelm S, Shi J, Deane CM in accuracy to several methods for RNA pseudoknot prediction and can run fast. C-40 Structure-based predictor of HIV coreceptor —Sato K , Kato Y, Akutsu T, Asai K tropism We propose a structural predictor of the HIV C-45 The Protein Model Portal, a resource of the coreceptor tropism offering a high accuracy and Nature PSI Structural Biology Knowledgebase additional insight into the structural and The Protein Model Portal is a resource for 3-D physicochemical properties determining the viral protein structure models and their annotation. The phenotype. current release allows searching millions of model —Bozek K, Lengauer T, Domingues FS structures and provides access to interactive services for model building and quality estimation. C-41 Different structures/same interaction —Bordoli L, Haas J, Benkert P, Mostaguir K, partners: what can we learn from promiscuous Kiefer F, Arnold K, Schwede T protein-protein interactions? In this work, we analyzed the binding sites of C-46 PB-PENTAdb : a web resource for the proteins that are able to bind similar partners, at analysis and prediction of local backbone equivalent binding sites, despite their structural structure and flexibility using pentapeptide dissimilarity. We found no clear evidence of protein blocks convergent evolution. Taking advantage of protein blocks, a 16-state —Martin J description of the backbone of proteins, we have investigated the sequence to structure relationship C-42 3DM: A system to automatically build at the pentapeptide level and are providing for the structure-based super-family systems first time, an online tool. We present a system that automatically builds a —Offmann B , Tyagi M, Joseph A, Drula M, structure-based super-family information system Grondin M, Frumence E, Vaitinadapoulle A, Cadet that integrates data from many different sources. F, Srinivasan N, de Brevern A The system has been tested in a series of protein engineering projects. C-47 RactIP: fast and accurate prediction of —Joosten HJ, Kuipers RKP, Vd Bergh T, Schaap RNA-RNA interaction using integer PJ, Vriend G programming We present RactIP for RNA-RNA interaction C-43 Quantifying the relationship between RNA prediction using integer programming. RactIP is at sequence and three-dimensional structure least comparable in accuracy to several methods for conservation for homology detection RNA-RNA interaction prediction and can run We present an analysis of a selected set composed incomparably faster. by 589 high similarity RNA structural alignment to —Kato Y , Sato K, Hamada M, Watanabe Y, Asai quantify the relationship between RNA sequence K, Akutsu T and structure conservation and define the “twilight zone” for RNA homology detection.

Prediction and Annotation of Molecular Function —Huculeci R, Buts L, Rousseau F, Schymkowitz D-1 Validating the structural information J, van Nuland N, Lenaerts T exchange within the domain Fyn SH2 We show in this poster the validation through NMR of the predicted information processing within the protein domain Fyn SH2 caused by peptide binding. 75

ECCB10 Posters D-2 Discriminant functions for classification and —Khalifa W, Yousef M prediction of T1SS, T3SS, T4SS and T6SS secreted proteins from proteobacteria D-8 Computational identification of synonymous The two sets of predictor variables were obtained SNPs in the human genome and their potential for multiple discrimination among T1SS, T3SS, role in disease T4SS & T6SS secreted proteins. The classifiers are We present a computational approach and analysis considered as either independent or complementary tool for the identification of synonymous SNPs tools to specify the leaderless secretion substrates. within the human genome which may have a —Kampenusa I, Zikmanis P potential functional impact on gene expression and/or the gene product and may be associated with D-3 Computing fragmentation trees from tandem disease. mass spectrometry data —Wood L, Ramsay M We automatically annotate tandem mass spectra of metabolites with a fragmentation tree, which can D-9 The potential functional role of chimeric reveal information about a completely unknown transcripts in higher eukaryotes compound. The method does not require any We studied the potential functional role of chimeric information besides the spectra to work. proteins in eukaryotes. We found that chimeric —Rasche F, Svatos A, Böcker S RNAs resulting from trans-splicing combine full functional and transmembrane domains, and use D-4 Computational identification of new miRNAs signal peptides to change localization of a chimera. in genomic regions associated with psychiatric —Morgenstern F, Valencia A diseases We used an in-house developed miRNA prediction D-10 A ‘Functional Signature’ for analysing pipeline to predict new miRNA genes in genomic annotation space regions associated with psychiatric diseases. Using We have performed a comprehensive analysis of online available data from RNA-Seq experiments, the current state of the functional annotations in we were able to find experimental evidence for 21 protein databases. We describe homologous new miRNAs. clusters by their 'Functional Signature' and indicate —Aelterman B, De Rijk P, Del-Favero J where functional transfer by homology should be possible. D-5 CatANalyst: a web server for predicting —del Pozo A, Tress M, Valencia A catalytic residues CatANalyst is a web server for predicting catalytic D-11 Predicting small molecule ligand binding residues. It incorporates both a sequence-and a sites and catalytic residues with firestar/FireDB structure-based predictor and relies on an Here we present the new developments of firestar underlying SVM classifier that was shown to and FireDB, an expert system for predicting improve over previous state-of-the-art approaches. functional residues in protein sequences based on —Cilia E, Passerini A information extracted from structures and the Catalytic Site Atlas. D-6 Functional characterization of Parkinson by —López G, Maietta P, del Pozo A, Rodríguez JM, high-throughput data analysis with l1l2 Valencia A, Tress ML regularization We present the results of an in-silico analysis on D-12 Classifying substrate specificities of microarray Parkinson data, where the reliability of membrane transporters from Arabidopsis thaliana the employed variable selection technique is Based on a ranking method that measures the confirmed by the functional characterization of the similarity of membrane transporters by their amino identified signature. acid frequency and higher sequence order —Squillario M, Masecchia S, Barla A information, we annotated unknown transporters from Arabidopsis thaliana to four substrate classes. D-7 Feature elimination for one-class MicroRNA —Schaadt NS, Christoph J, Helms V target pridection and gene identification The usage of Zero norm feature elimination for one class machine learning to predict microRNA targets and gene identification cause an improving of the one-class results, where in some cases reaches the performance of the two-class approach. 76

ECCB10 Posters D-13 Molecular modeling and comparison by differentiating colorectal cancer patients from docking and molecular dynamics of the healthy subjects. interaction between fatty acids and proteins and —Dittwald P, Gambin A, Ostrowski J, EgFABP2 EgFABP1 Karczmarski J In order to understand why similar fatty acid binding proteins are expressed in the flatworm D-19 Mining posttranscriptional regulatory parasite Echinococcus granulosus, we studied features by comparing total and polysomal mRNA ligand affinities of EgFABP1 and EgFABP2 to profilings different fatty acids: arachidonic, palmitic, oleic The aim of this study is to relate the sensitivity of and linoleic. mRNAs to posttranscriptional control to UTRs —Paulino Z-M, Esteves A contextual properties. We used machine learning techniques to extract informative features D-14 GreenPhylDB version 2: web resources for characterizing posttranscriptionally controlled comparative and functional genomics in plants mRNAs. We present an update of the GreenPhyl website —Re A, Tebaldi T, Segata N, Passerini A, that contains a catalogue of gene families for 16 Blanzieri E, Quattrone A genomes of the plantae kingdom, ranging from algae to angiosperms. Gene families are first D-20 Prototypes of elementary functional loops manually checked and then phylogenetically unravel evolutionary connections between protein analysed. functions —Rouard M, Guignon V, Aluome C, Laporte We derive prototypes of elementary functional MA, Droc G, Walde C, Zmasek CM, Perin C, loops having specific signatures and characteristic Conte MG size 25-30 residues from sequences of complete proteomes. We show how folds and functions D-15 Collaboration-based function prediction in emerged in pre-domain evolution as a combination protein-protein interaction networks of prototypes We present a new approach for protein function —Goncearenco A, Berezovsky IN prediction in protein-protein interaction networks. The main idea behind our approach is that D-21 Utilization of proteins and nucleic acids in topologically close proteins tend to have the study of gene function: a comparative review collaborative functions, not necessarily the same This review examines protein amenability to functions. prediction of gene function and the potential of —Rahmani H, Blockeel H, Bender A proteomics in biological research. It is extracted from a wide array of literature on proteomics and D-16 SVM feature selection applied to gene function studies in view of current lignocellulose degradation advancements in molecular biology and or We study cellulose degradation by means of feature bioinformatics selection with an L1-regularized linear SVM to a —Mwololo JK, Karaya HG, Munyua JK, Muturi microbal dataset. PW, Munyiri SW —Trukhina Y , McHardy A D-22 ClanTox: A predictor tool for toxin-like D-17 Substrate-specific HMMs for the proteins reveals 500 such proteins within viral classification of adenylation and acyltransferase genomes domains of NRPS/PKS systems Animal toxins are short, highly stable proteins that We have developed an efficient strategy to classify operate by binding to receptors or ion channels. We the substrate specificity of the adenylation and developed a classifier that identifies toxin like acyltransferase domains of the non-ribosomal proteins and tested it on viral proteins. We peptide synthetases and polyketide synthases . discovered 510 toxin like proteins in viruses. —Khayatt B I, Siezen RJ, Francke C —Naamati G , Askenazi M, Linial M

D-18 Stochastic modeling of proteolytic activity D-23 I-Patch web service: inter-protein contact from LC-MS data prediction using local network information We propose the stochastic model of serum Presentation of the I-Patch Web Service for proteolytic activity. Parameters of the model prediction of inter-protein contact sites. I-Patch estimated from LC-MS/MS samples correspond to gives 59% precision with 20% recall on a blind test the cleavage activities of specific enzymes 77

ECCB10 Posters set of 31 proteins - better than other existing number of important biological processes. We methods on the same data set. characterize interactions of IL-8 with natural and —Hamer R, Luo Q, Armitage J, Reinert G, Deane modified glycosaminoglycans using computational C, Krawczyk K approaches. —Samsonov SA , Pisabarro MT D-24 Predicting candidate genes for biomass traits D-29 Expression profile based substrate We present an Ondex workflow for automated specificity prediction in complete Saccharomyces functional annotation of newly sequenced cerevisiae methyltransferome genomes, as well as web interface for systematic We identified complete S. cerevisiae analysis of QTL. It was used to annotate the poplar methyltransferome (all methyltransferases encoded genome and to predict candidate genes for biomass in genome) with structural and substrate specificity traits. characterization, supplemented with development —Hassani-Pak K, Kuo SC, Hanley S, Rawlings C of new methodology for substrate specificity prediction. D-25 Exploring residue interaction networks to —Wlodarski T , Rowicka M, Ginalski K understand protein structure-function relationships D-30 Discovery and annotation strategies of The new Cytoscape plugin RINalyzer enables the nonribosomal peptide synthetases from bacterial interactive, visual analysis of residue interaction genomes networks derived from 3D protein structures. It We propose a strategy to annotate nonribosomal affords exploring the molecular role of residues peptide synthetases, huge enzymatic complexes critical for protein structure and function. producing peptides from free amino acids, not via —Doncheva N-T, Domingues F-S, Albrecht M central dogma. The classical automatic annotation process can't be applied on those enzymes. D-26 BioXSD: the canonical XML-Schema data —Saci Z, Pupin M , Deravel J, Krier F, Caboche S, model for bioinformatics Web services Jacques P, Leclère V BioXSD defines standard exchange formats of the basic bioinformatics data types: sequences, D-31 Predicting N-glycosylation sites in human sequence annotations, alignments, and references to proteins data resources. It aims to enable smooth N-glycosylation site prediction in human proteins interoperability of the world-wide bioinformatics using artificial neural networks on sequence Web services. features. Distinguishes glycosylated and non- —Kalaš M, Puntervoll P, Joseph A, Bartaševi čiūt÷ glycosylated canonical consensus sequons. Impact E, Töpfer A, Ison J, Blanchet C, Rapacki K, on human proteome and disease mutations is Jonassen I investigated. —Gupta R, Frederiksen J , Jung E, Brunak S D-27 Functional inference for experimental proteomics: the PANDORA (annotations graph) D-32 AIGO: Toward a unified framework for the and ProtoNet (family tree) resources Analysis and the Inter-comparison of GO From experimentalist to knowledge - A shortcut functional annotations through PANDORA BOX. ProtoNet & This work defines an ensemble of measures PANDORA web tools for understanding proteins suitable to provide both a quantitative and a function by integration of sequence and qualitative description of GO functional annotations. annotations and also to allow a comprehensive —Rappoport N , Linial M inter-comparison between them. —Hindle M, Powers S, Habash D, Saqi M, D-28 Computational approaches to study Defoin-Platel M glycosaminglycans recognition by IL-8 for the design of biomaterials for tissue regeneration Glycosaminglycan-protein interactions in extracellular matrix play a critical role for a

78

ECCB10 Posters Gene Regulation and Transcriptomics

E-1 TFactS: a tool to predict transcription factor E-6 In silico study of the regulation of sRNAs in regulation from gene expression data Escherichia coli We present TFactS, a transcription factor target In this work we aimed at updating the known gene database, used to predict the regulation of Escherichia coli transcriptional network with the transcription factors from gene expression data. sRNA regulatory network. This includes The tool was updated and applied on different predicting the regulation of the sRNAs and biological systems . extending sRNA regulons of with novel targets. —Essaghir A, Pachikian BD, Delzenne NM, —Ishchukov I, Ryan D, Zarrineh P, Cloots L, Toffalini F, van Helden J, Demoulin JB Thijs I, Engelen K, Marchal K

E-2 Time- and dose-dependent effects of cigarette E-7 Systems biology analysis at VIB smoke on monocytic cell transcriptome We have developed powerful software tools to In the context of atherosclerosis, our infer gene regulation networks from transcriptional transcriptomics study investigates the molecular data that can help identifying molecular markers mechanisms underlying the effect of cigarette and deciphering the wiring of networks of smoke exposure on monocytic cells over time. interacting components. Hundreds of genes were found to be regulated. —Ren X-Y, Bonnet E., Joshi A, Maere S, Michoel —Poussin C, Lietz M, Schlage W, Lebrun S, T, Van Parys T, Vermeirssen V, Van de Peer Y Hoeng J, Peitsch M E-8 A computational framework for automated E-3 Combined Monte Carlo and dynamical analysis of nonlinear dynamics of gene regulatory modelling of gene regulation enables multiple networks parameter estimations and hypothesis generation We present a computational tool for qualitative Parameter estimation has revealed a bimodality of a simulation of gene regulatory network dynamics genetic regulatory circuit and an additional modelled by a specific class of ODEs. It is interpretation of its outcome has been compared appropriate to derive all different types of with sensitivity analyses. Moreover, an dynamical behaviors from given network experimental verification of the model has been structures. suggested. —Ironi L , Panzeri L —Herman D, Thomas CM, Stekel DJ E-9 Exploiting ESTs to infer plant alternative E-4 Module-based comparative gene expression splicing analysis: evolutionary conserved coexpression in We present findAS, a novel java-based utility to Bacillus subtilis and Escherichia coli find alternative splicing events from genomic We developed a method that uses microarray data alignments of mRNA sequences. FindAs extracts and homology mapping among genes as input and the putative AS from alignments of expressed provides pairs of conserved modules as output. We sequences by means of a cluster analysis. applied our methodology to study coexpression —Potenza E, Cestaro A, Fontana P, Bianco L, conservation between Escherichia coli and Bacillus Velasco R subtilis. —Zarrineh P, Fierro C, Sánchez-Rodríguez A, De E-10 Inferring gene regulatory networks to Moor B, Marchal K identify conserved regulation between Arabidopsis and Populus E-5 Algorithm-driven artifacts in median polish We infer the gene regulatory networks of Populus summarization of microarray data and Arabidopsis from microarray expression data Over-similarity artifacts are introduced by the using several methods. The networks from the two summarization step of Affymetrix microarrays species are aligned to identify conserved functional preprocessing methods RMA and GCRMA . We and topological features characteristic to trees. present a modified RMA procedure and show its —Netotea S, Hvidsten TR performance in microarray clustering tasks. —Giorgi F-M, Bolger A, Lohse M, Usadel B

79

ECCB10 Posters E-11 BayesPI - a new model to study protein- —Mestdagh P, Lefever S, Pattyn F, Ridzon D, DNA interactions: a case study of condition- Fredlund, Fieuw A, Vermeulen J, De Paepe A, specific protein binding parameters for Yeast Wong L, Speleman F, Chen C, Vandesompele J transcription factors We have incorporated Bayesian model E-16 D-Light: Transcription factor binding site regularization with biophysical modeling of management and visualization protein-DNA interactions, and of genome-wide We present D-Light, a new client-server system for nucleosome positioning to study protein-DNA compiling, managing, querying and displaying interactions, using a high-throughput dataset. annotation data for transcription factor binding —Wang J, Morigen sites. A GUI allows combinatorial queries within or between genomes. E-12 Use of structural DNA properties for the —Laimer J, Zuzan C, Ehrenberger T, prediction of regulator binding sites with Freudenberger M, Gschwandtner S, Lebherz C, conditional random fields Lirk G, Lackner P We present a framework for integrating local structural DNA properties in a motif screening E-17 Expression and chromosomal clustering of application and create a set of novel target site tissue restricted antigens predictions for various E. coli transcription factors Tissue restricted antigens are crucial in the negative that could be validated using independent data selection of autoreactive T cells in the thymus to sources prevent autoimmune dieseases. We defined TRAs —Meysman P, Engelen K, Laukens K, Dang TH, from microarray data and could show that they are Marchal K significantly clustered on chromosomes. —Dinkelacker M , Pinto S, Derbinski J, Eils E-13 Gene expression analysis in Scots pine in R,Kyewski B, Brors B response to red and far-red light We studied gene regulation in Pinus sylvestris from E-18 Community dynamics and northern Sweden grown under red and far-red light metatranscriptome analysis of lactic acid bacteria using Pine cDNA microarrays. Computational ecosystems through functional gene microarray annotation of the genes was done with Blast2GO, analysis which reveals existence of differentially expressed A lactic acid bacteria functional gene microarray genes was successfully used to unravel community —Ranade SS, Abrahamsson S, Niemi J, García- dynamics and to analyze the metatranscriptome of Gil MR a ten-day backslopped sourdough fermentation in a temporal way. E-14 Transcription regulatory regions and motifs —Weckx S, Allemeersch J, Van der Meulen R, database – the genomic data layer of the Nencki Vrancken G, Huys G, Vandamme P, Van Regulatory Genomics Portal Hummelen P, De Vuyst L Understanding transcriptional co-regulation in vertebrates remains a challenge. To address it, we E-19 A functional gene microarray for lactic acid are developing a database of several kinds of bacteria exceeding the species level: design, putative regulatory regions in human, mouse and validation, and data analysis rat, scored with three major TFBS motifs libraries. A functional gene microarray for lactic acid —Dabrowski M, Lenart J, Mieczkowski J, bacteria was developed to investigate whole- Kaminska B ecosystem gene expression in fermented food ecosystems. E-15 The microRNA body map: dissecting —Allemeersch J, Van der Meulen R, Vrancken G, microRNA function through integrative genomics Huys G, Vandamme P, Van Hummelen P, De We present the microRNA body map, an Vuyst L, Weckx S interactive online compendium and mining tool of high-dimensional newly generated and published E-20 Combinatorial analysis of transcription microRNA expression profiles along with regulation of response to oxidative stresses in E. functional annotation inferred through integrative coli transcriptomics. To study mechanisms involved in stress response in E. coli from gene expression data, an original approach based on Answer Set Programming was

80

ECCB10 Posters designed. With this method, new targets for imported to genome browsers or other analysis predicted transcription factors can be discovered. tools. —Thiele S, Klie S, Jozefczuk S, Selbig J, Blachon —Barreau D, Gao Y, Jaffrezic F, Hugo K, Rogel- S Gaillard C, Marthey S

E-21 Time delays in gene activation: sensitivity to E-27 Evaluation of computational miRNA target external noise predictions in human We present a study of a genetic switch capable of MicroRNA target prediction is a popular research delaying its transition from low to high expression topic. Many computational prediction tools have state after being excited by an external input, and been developed that predict different targets for the discuss the conditions under which these delays are same microRNA. This poster presents an objective most resistant to parameter perturbations. comparison between 7 often used tools. —Albert J, Rooman M —Gevaert O

E-22 Discovering regulatory mechanisms that E-28 Model-based multifactor dimensionality govern zygotic genome activation in Drosophila reduction to detect epistasis for quantitative traits From clusters of genes co-expressed at different in the presence of error-free and noisy data stages of early embryogenesis, we detected over- We present the power of Model-Based Multifactor represented motifs potentially involved in the Dimensionality Reduction in identifying significant zygotic genome activation in Drosophila. epistasis models for quantitative traits in the —Darbo E, Thieffry D, Lecuit T, van Helden J presence of noisy data. —Mahachie John JM, Van Lishout F , Van Steen E-23 Detection of extended UTRs with 3' K expression microarrays Affymetrix 3' expression arrays were used to E-29 Time coherent three-dimensional clustering: predict transcript extension beyond currently unraveling local transcriptional patterns in known boundaries. multiple gene expression time series —Thorrez L, Tranchevent L-C, Chang H-J, We present an efficient algorithm, relying on string Moreau Y, Schuit F matching using suffix trees and itemset mining, to unravel gene modules exhibiting a coherent E-24 A dynamic regulatory network model reveals transcriptional response in specific time frames hard-wired heterogeneity in blood stem cells across multiple gene expression time series. We present the most advanced mammalian —Gonçalves JP, Moreau Y, Madeira SC regulatory network model based on known cis- regulatory interactions to date, from which we E-30 The role of incoherent microRNA-mediated made novel, experimentally testable hypotheses feedforward loops in noise buffering about the control of differentiation in Incoherent microRNA mediated feedforward loop haematopoietic stem cells. are remarkably overrepresenetd in the human —Bonzanni N, Garg A, Foster SD, Wilson NK, regulatory network. We show analytically and Kinston S, Miranda-Saavedra D, Heringa J, through simulations that their role in the network is Feenstra A, Xenarios I, Göttgens B to control the fluctuations in the target protein level —Osella M, Bosia C, Corà D, Caselle M E-25 Alternative splicing as a marker of disease In this work we try to identify aberrant alternative E-31 Dynamic transcriptomics of helper T cell splicing isoforms that are cause or consecuence of differentiation human disease with a focus on neurodegenerative We performed transcription profiling of helper T disease. We will classify them according to tissue cells. Using our polar score method, we were able and age. to identify genes specifically associated with —Muro EM , Andrade-Navarro MA Th1/Th2 differentiation in the large set of – non polarizing – differentially expressed genes. E-26 TilSeg: an automated pipeline to process —van den Ham HJ, de Waal L, Zaaraoui F, Bijl tiling arrays expression data M, van IJcken WF, Osterhaus ADME, de Boer RJ, TilSeg is a tool dedicated to tiling arrays automated Andeweg AC analysis. This software uses standard quantification files, performs data analysis and provides standardized outputs files which can be easily 81

ECCB10 Posters E-32 Effect of size and heterogeneity of samples E-37 Predicting regulatory networks from large on biomarker discovery: synthetic and real data scale time-course microarray data under several assessment stimulation conditions on MCF-7 cells We analyzed the effect of sample size and We analyzed large scale time-course microarray heterogeneity on a number of biomarker discovery data under several stimulation conditions on MCF- methods on real and synthetic data. Classification 7 cells. Our analysis presented a putative regulatory based approaches improve selection; using external network leading to biological functions initiated by cross-validation loops improves results distinct stimulations. reproducibility. —Shiraishi Y, Saeki Y, Nagashima T, Yumoto N, —Di Camillo B, Martini M, Sanavia T, Jurman G, Takahashi K, Okada M Sambo F, Barla A, Squillario M, Furlanello C,Toffolo G, Cobelli C E-38 Inferring regulatory networks from expression data using tree-based methods E-33 Analysis of DNA sequence dependent We present a new algorithm for the inference of structural properties in prokaryotic genomes genetic regulatory networks from expression data Sequence dependent DNA structural property that exploits tree-based ensemble methods. This analysis in E. coli, B. subtilis and M. tuberculosis algorithm was best performer in the DREAM4 In promoter sequences as well as from other Silico Multifactorial challenge. prokaryotic genomes reveal that lower stability and —Huynh-Thu VA, Irrthum A, Wehenkel L, bendability are more common than higher intrinsic Geurts P curvature. —Rangannan V , Bansal M E-39 Dynamic profile of transcription factors inferred from mRNA expression time courses in E-34 Computational analysis of transcription Gefitinib-treated lung cancer cells factors' binding co-localization We present a statistical test to reverse-engineer We studied gene expression and biological dynamic activities of transcription factors from pathways data to investigate DNA Binding Targets gene expression time courses. Key regulatory co-localization in human and mouse. Specific pathways involving new clinical biomarkers are nucleotide distances between DBTs in promoter found in analyses of Gefitinib-treated lung cancer regions suggest patterns of transcription factor cells. cooperation. —Nagao H, Yoshida R, Saito M-M, Imoto S, —Martinez P, Blanc E, Coolen A, Holzwarth J, Yamaguchi R, Yamauchi M, Goto N, Miyano S, Fraternali F Higuchi T

E-35 Cross-checking experimental results with E-40 Deciphering the genomic plasticity of publicly available gene expression data: a query- bacterial species driven approach We developed a computational procedure and We present a computational tool that allows for compared the predicted regulatory systems from cross-checking a list of experimentally derived four closely related bacterial species in the genus genes with a gene expression compendium Bordetellae, of which three are pathogenic with comprising publicly available gene expression data. different host specificity. —De Smet R, Hermans K, De Keersmaecker S, —Janky R, Harvill ET, Madan Babu M Marchal K E-41 Expression arrays & outlying processes: E-36 Using tiling microarrays to predict building a database of outliers transcription start sites: application to We present the development of a database of Lactobacillus plantarum WCFS1 probes for Affymetrix expression arrays. A php We present a method for detecting transcription interface allow to summarize probe features and start sites based on whole-genome tiling microarray empirical measures, in order to detect both theoric expression data to validate genome-wide predicted and empirical outliers. promoter locations. —Berger F, Kroll KM, van Dorp M, Carlon E —Todt T, Wels M, Siezen RJ, Van Hijum SAFT, Kleerebezem M

82

ECCB10 Posters E-42 Detecting significantly correlated gene search for co-expressed genes over hundreds of clusters and their interactions in a continuous datasets at a time. Given the query gene MEM scale-space finds a list of genes that have similar expression in We apply scale space theory to detect genomic many datasets. regions showing significant clusters of genes with —Adler P, Kolde R, Kull M, Peterson H, Reimand highly correlated gene expression profiles. We J, Tkatšenko A, Vilo J introduce a novel method to control the FWE when detecting clusters with non-homogeneous gene E-48 Linking parallel measurements of high- densities. throughput miRNA, gene and protein expression —Van Dyk E, Reinders M, Wessels L data We presented a graph based workflow for linking E-43 MAMMOD: Multi-Agent Motif and MOdule different kinds of omics data using pathway Discovery information and component wise boosting. It was We present Mammod, a novel tool for tested on a colorectal cancer data set with miRNA simultaneous discovery of cis-regulatory motifs and gene expression measurements of 97 samples. and modules in eukaryotic promoters. We —Gade S , Binder H, Beißbarth T demonstrate Mammod's performance on benchmark data sets and show that it can compete E-49 An extended translational network involving with several state-of-the-art methods. transcription factors and RNA binding proteins —Van Delm W, Ayoubi T, De Moor B, Moreau Y derived from the identification of phylogentic footprings in 5’ and 3’ untranslated regions E-44 Modulon identification from bacterial gene We present a computational pipeline and a first expression array data using improved supervised benchmarking experiment aimed at the prediction approach of a network coming from the analysis We present and improvement of correlation matrix- hyperconserved elements of untranslated regions, based method of data mining for microarray data. applied to a 46 vertebrates species alignment. We have analyzed different data source and several —Dassi E , Tebaldi T, Zuccotti P, Riva P, basic regulons in E. coli, coming up with a Quattrone A software for supervised prediction of new modulon members E-50 Unvealing heterogeneity of stage II and —Hedge S, Permina E, Medvedeva Y, Mande SC, stage III colorectal cancer by defining molecular Makeev V subtypes We present the results of a meta-analytical E-45 GCQM: quality assessment of microarray approach for identifying robust subgroups of studies and samples using gene correlations patients with colorectal cancer. Unsupervised The quality of gene expression data is variable, and analysis of gene expression profiles revealed major difficult to measure. We propose a quality measure molecular subgroups, stable across experiments. based on the consistency of gene-gene correlations. —Budinska E, Popovici V, Delorenzi M We show that it allows the flagging of bad quality studies and samples across platforms. E-51 A method for improved prediction of in vivo —Venet D, Detours V, Bersini H transcription factor binding sites by using biophysical DNA features E-46 Integration of correlation structure improves We present a sequence-based biophysical approach performance of pathway-based classification for the identification of transcription factor binding By integrating the correlation structure of gene set sites. This method was used on widely available expression profiling, we imporved the ChIP-Seq data. performance in the gene set-based classification —Hooghe B, Broos S ,Van Roy F, De Bleser P analysis. Our method was validated with simulation data and four breast cancer datasets. E-52 Nonlinear dimension reduction and —Cho SB, Kim JH clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes E-47 Multi Experiment Matrix – web tool for ‘Minimum Curvilinearity’ is a principle which mining co-expressed genes over hundreds of inspires two novel nonlinear machine learning: datasets Minimum Curvilinear Embedding for MEM is a query engine atop of the microarray dimensionality reduction; and Minimum experiments from ArrayExpress that performs Curvilinear Affinity Propagation for clustering. 83

ECCB10 Posters —Cannistraci CV, Ravasi T, Montevecchi FM, E-58 Identification of genes with preferential Ideker T, Alessio M expression in the egg cell Gametes, like egg and central cell, are crucial cell E-53 Seeded Cis-Regulatory Module discovery types in the plant life cycle. To understand gamete using rank based motif scoring differentiation and function, we isolate and We have development an in silico cis-regulatory characterize egg cell specific genes of wheat and module detection algorithm for aid in interpretation the model plant Arabidopsis. of gene regulatory networks in humans. Our —Köszegi D, Czhial A, Kumlehn J, Altschmied L, biologically motived design results in improved Baumlein H performance over existing CRM predictors. —Macintyre G, Bailey J, Kowalczyk A, Haviv I E-59 Towards real-time control of gene expression: controlling the HOG signalling E-54 Proximity-based cis-regulatory module cascade detection using constraint programming for We present preliminary results on the development itemset mining of a platform for the real-time control of gene We present an extendible method for the detection expression. The platform integrates a microfluidic of cis-regulatory modules, based on constraint device, an epi-fluorescence microscope, and programming for itemset mining. The method finds software implementing control approaches. all combinations of transcription factors that —Uhlendorf J, Bottani S, Fages F, Hersen P, frequently have binding sites in each others Batt G proximity. —Guns T, Sun H, Nijssen S, Sanchez-Rodriguez E-60 Sequencing the transcriptome of a deep-sea A, De Raedt L, Marchal K hydrothermal vent mussel: new possibilities for the discovery of immune genes in an E-55 High-throughput detection of epistasis in unconventional model organism studies of the genetics of complex traits We have established the first tissue transcriptional We present a novel tool that has been developed to analysis of a deep-sea hydrothermal vent animal allow routine high throughput analysis of epistasis and generated a searchable catalog of genes which in complex traits in populations genotyped with can be applied in gene expression profiling high density single nucleotide polymorphism experiments from a non-conventional model markers. organism. —Gyenesei A, Laiho A, Semple C, Haley C, —Bettencourt R , Stefanni S, Pinheiro M, Egas C Wei W Serrão Santos R

E-56 Heavy metal resistance in Cupriavidus E-61 Towards a computational tool to uncover metallidurans: a complex evolutionary and genes involved in signaling crosstalk in transcriptional process Arabidopsis thaliana The metal-resistant soil bacterium Cupriavidus Crosstalk between signaling pathways is of critical metallidurans CH34 displays myriads of gene importance in organisms. We present a expression patterns when exposed to a wide range computational method to identify genes involved in of heavy metals at non-lethal concentrations signaling crosstalk in Arabidopsis thaliana, with the pointing towards a complex regulatory network. goal of providing a tool for experimental scientists. —Monsieurs P, Moors H, Van Houdt R, Janssen —Omranian N , Arvidsson S, Riaño-Pachón DM, P, Mergeay M, Leys N Mueller-Roeber B

E-57 Identification of co-regulated E-62 Prediction of the stage of embryonic stem mRNA/miRNA pairs by the CoExpress software cells differentiation from genome-wide expression tool data Here we present the software tool CoExpress and We evaluated several machine learning approaches apply it for the analysis of co-regulation between to predict the stage of embryonic stem cells miRNA and mRNA in 14 cell lines. Co-regulation differentiation from genome-wide expression effects were validated using numerical methods, profiles. Results are very encouraging, even when bioinformatics and qPCR experiments. learning and prediction use samples from different —Nazarov P V, Khutko V, Schmitz S, Muller A, cell lines. Kreis S, Vallar L

84

ECCB10 Posters —Zagar L , Mulas F, Sacchi L, Garagna S, differentially expressed genes. We aim at studying Zuccotti M, Bellazzi R, Zupan B gene expression time series data from the viewpoint complexity. E-63 High throughput sequencing of the —Sun X, Zou Y, Nikiforova V, Kurths J, Anopheles transcriptome through sexual Walther D development Despite the importance of Anopheline mosquito E-65 Efficient query-based biclustering of gene species as vectors of human malaria, the expression data using probabilistic relational mechanisms leading to sexual differentiation in models these insects is yet to be elucidated. 454 We present ProBic, a query-based biclustering pyrosequencing was used to identify sex-specific method. ProBic identifies biologically sound splicing in A. gambiae biclusters and, bicluster identification is robust as —Koscielny G , Nolan T, Severgnini M, Rizzi E, the set of query genes can contain genes that are Lawson D, Kersey P, De Bellis G, Crisanti A not part of the bicluster of interest. —Cloots L, Zhao H, Van den Bulcke T, Wu Y , De E-64 The complexity of gene expression dynamics Smet R, Storms V, Meysman P, Engelen K, revealed by permutation entropy Marchal K The analysis of gene expression data has so far focused primarily on the identification of

Text Mining, Ontologies and Databases

F-1 Resource of Asian Primary F-4 A graphical view of the world of protein- Immunodeficiency Diseases update: an open web- protein interaction data-bases based integrated molecular database on primary The graphical representantion of data exchange immunodeficiencies between various protein-protein databases is RAPID hosts information on PID gene-specific visualized using Cytoscape and presented at molecular alterations, expression profiles, http://www.pathguide.org/interactions.php interaction networks and mouse studies. By —Klingström T, Plewczynski D scanning whole genome, predicted 1442 candidate PID genes using SVM, out of which 15 genes have F-5 StrainInfo: your microbiological stepping been so far confirmed. stone —Ramadoss SK, Keerthikumar S, Raju R, StrainInfo is a global catalog of microbial material Kandasamy K, Balakrishnan L, Dhevi L, Selvan N, integrating the catalogs of more than 60 Biological Sekhar NR, Mohan S, Bhattacharjee M, Hijikata A, Resource Centers . StrainInfo integrates and Koh bundles information on its strain, sequence, taxon and publication passports. F-2 Measuring the functional similarity of drugs —Verslyppe B, De Smet W, Gillis W, De Vos P, We presents a novel strategy to assessing the De Baets B, Dawyndt P functional similarity of annotation profiles based on the Gene Ontology topology. We applied this F-6 Semantic integration of isolation habitat and strategy to an extensive dataset of drug profiles location in StrainInfo derived from genome-wide expression data. The StrainInfo integrates multiple redundant textual dominant b descriptions of isolation habitat and location by —Götz S, Behrens S, Tarazona S, Marba M, means of an ontology-based integration algorithm. Dopazo J, Conesa A —Verslyppe B, De Smet W, De Vos P, De Baets B, Dawyndt P F-3 G-language Bookmarklet: a gateway for semantic web, linked data, and web services F-7 Towards an ontology for farm animal G-language Bookmarklet is a visual gateway to reproduction traits numerous biological resources that works on any A framework of an ontology of reproduction traits web page viewed with any browser, implemented in farm animals was developed. A major goal was as an animated interactive ring-shaped menu, freely to integrate as much knowledge as possible about available at: the domain reproduction traits in farm animals and http://www.g-language.org/wiki/bookmarklet that can be used by computer-based analyses. —Arakawa K , Kido N, Oshita K, Tomita M 85

ECCB10 Posters —Hulsegge B, Smits MA, te Pas MFW, Woelders F-13 A formal framework for evaluating the H significance of peptide-spectrum matches We present a framework for assessing the F-8 Human gene symbol validation at the HGNC significance of peptide-spectrum-matches in High throughput technologies has created an computational mass-spectrometry. The use of increasing demand to resolve large lists of human enumeration techniques for counting peptides that gene symbols to their HGNC ID. We have match a specific spectrum allows to increase the developed a simple web-based 'list search' tool sensitivity of PSMs. which allows a researchers to quickly validate such —Vandenbogaert M datasets. —Lush MJ , Seal RL, Gordon SM, Wright MW, F-14 Improving the execution of bioinformatical Bruford EB workflows We present a novel approach called Data Flow F-9 miRSel: automated extraction of associations Delegation to improve the execution of between microRNAs and genes from the bioinformatical workflows, which optimizes the biomedical literature integration and coordination of different Biological We propose to use text mining of publication Databases and tools for solving various biological abstracts for extracting microRNA-gene problems. associations including microRNA-target relations —Subramanian S, Sztromwasser P, Puntervoll P to complement current repositories. —Naeem H, Küffner R, Csaba G, Zimmer R F-15 BridgeDb: standardized access to gene, protein and metabolite identifier mapping services F-10 Metarel, a relation metagraph for inferences We present BridgeDb, a developer library to in biomedical ontologies connect bioinformatics tools to online and offline We enabled queries on the Open Biomedical identifier mapping services. BridgeDb already Ontologies, by automated reasoning techniques. All works with several mapping services such as the ontologies were translated to RDF, a Semantic BioMart, Synergizer, Cronos, PICR and HMS Web standard. Reasoning modules were added to —Van Iersel MP, Pico AR, Kelder T, Gao J, Ho I, the Metarel metagraph, which allowed SPARUL Hanspers K, Conklin BR, Evelo CT inferences. —Blondé W, Antezana E, Venkatesan A, De Baets F-16 HGVbaseG2P: an advanced database for the B, Kuiper M, Mironov V integration and interrogation of genetic association datasets F-11 Predicting disease causing genes using the The Human Genome Variation Genotype to information content surrounding the disease and Phenotype database provides a comprehensive gene in literature publication medium for summary-level genetic We present a text-mining approach for predicting association data. Powerful graphical presentation gene-disease relationships, even when the gene and methods enable study visualisation & co- the disease never have been co-mentioned together examination. in an abstract before. The prediction is based on —Free R, Hastings R, Thorisson GA, Gollapudi indirect links. VLS, Beck T , Lancaster O, Brookes AJ —van Haagen H, Aten E,’t Hoen P-B, Roos M, Messemaker T, Mons B, van Ommen G-J, F-17 Extracting phytogeography information Schuemie M from species distribution data Availability of species distributions in digital F-12 Revealing heterogeneities and format motivated us to analyze this data. Pair wise inconsistencies in protein functional annotations similarities among grid squares were calculated. We propose a method able to highlight potential Our results were highly accurate when we compare inconsistencies on functional annotations in the with the phytogeographical regions. Gene Ontology database of a pool of proteins —Bakis Y, Sezerman OU, Babaç MT thanks to the simultaneous clustering of both GO terms and protein annotations. F-18 D2K - data to knowledge - data integration —Sanavia T, Facchinetti A, Di Camillo B, Toffolo for biological reasoning G, Lavezzo E, Toppo S, Fontana P We present data mining system by integration of quality controlled published, in silico generated and experimentally gathered data. The system is 86

ECCB10 Posters designed to generate biological hypotheses a priori F-24 The meaning behind the choice of words: a to wet-lab experiments or high-throughput studies. case in cancer metastasis —Wienecke-Baldacchino AK, Heinäniemi M, We analyzed the language used by cancer experts Carlberg C while talking about several aspects of metastasis. Unlike the rigid ideas often found in the dry and F-19 BRENDA text-mining: new developments strictly-written biomedical literature, our data for obtaining enzyme-related disease information revealed a world full of uncertainty and speculation from scientific literature —Divoli A, Mendonca EA, Rzhetsky A BRENDA contains manual annotated and text mining derived information from scientific F-25 Linked open experimentation literature. The disease-related text mining We investigate the application of information component has been revised and extended to the technologies that can help explore knowledge further classification of mining results. beyond a single experiment and a single researcher. —Soehngen C, Scheer M, Schomburg I, Chang A, We link biological models, Taverna results, Grote A, Schomburg D myExperiment, BioCatalogue, and ConceptWiki. —Roos M, Van Haagen HHHBM, Singh B, F-20 Beegle - a new search engine for discovering Marshall MS, Mons BM novel genes We present a new method based on text mining and F-26 First CALBC Challenge – first results association network analysis which, from any free CALBC aims at creating a large annotated corpus text query, returns known and potential novel genes with about 5 different semantic types whose linked to it. annotation is carried out automatically. The —Brohée S, Gonçalves JP, Nitsch D, Moreau Y CALBC challenge I has not been terminated and first results are available. F-21 An overview of the pathogen-host —Rebholz-Schuhmann D, Jimeno Yepes A, Li C, interactions database, PHI-base van Mulligen E, Kors J, Milward D, Hahn U The Pathogen-Host Interaction database is a phenomics database cataloguing experimentally F-27 EuroPhenome: a repository for high- verified pathogenicity genes. An orthologous gene throughput mouse phenotyping data cluster analysis is used to identify host specificity The Europhenome project is a comprehensive genes. resource for raw and annotated high throughput —Janowska-Sejda E, Defoin-Platel M, phenotyping data arising from projects such as Hammond-Kosack K, Urban M, Tsoka S, Saqi M EUMODIC. These provide data on inbred and mutant mouse lines. F-22 Finding disease related genes by GeneRank —Morgan H , Hassan A, Blake A, Hancock JM, algorithm using co-occurrence based network Mallon AM structures We present a method to decide the ranks of genes F-28 InSilico DB: an efficient starting point for for the disease related genes findings by using text the analysis of curated human Affymetrix gene mining for GeneRank algorithm. Our experiment expression microarray datasets in GenePattern shows that this approach outperforms the There is a large amount of Affymetrix gene functional annotation approach with Gene expression datasets in the public domain, but Ontology. reusing them requires tedious and error-prone —Lee H-M, Shin M-Y, Hong M-P retrieval and compilation. InSilico DB allows exports of uniform curated datasets into popular F-23 Identifying variability of human splicing analysis platform. forms in their interactions by using literature —Venet D , Coletta A, Taminau J, Steenhoff D, mining Bentabet L, Walker N, Meganck S, Delgado We present a new protein-protein interaction Blanco J, de Schaetzen V, Savagner F, Rousseau F, database generated through a comprehensive text Schymkowitz J, Detours V, Nowé A, Weiss Solís mining pipeline for human splicing variants. We DY , Bersini H analyze the PPI DB content to assess the variation in interactions of proteins involving isoforms. —Kafkas S, Varoglu E, Rebholz-Schuhmann D, Taneri B

87

ECCB10 Posters Protein Interactions, Molecular Networks and Systems Biology

G-1 Phenotypic effects of network rewiring in G-6 Fast motif enumeration and clustering in regulatory hierarchies integrated networks We study the phenotypic effects of network We present novel algorithms and a Cytoscape user rewiring events in regulatory hierarchies. By interface for identifying clusters of overlapping allowing the hierarchies to change upon deletions network motifs, which form functional modules in and insertions of nodes/edges, we find that location cellular networks composed of multiple types of of changes more accurately reflects the phenotypic interactions. effect. —Audenaert P, Van Parys T, Pickavet M, —Bhardwaj N, Kim PM, Gerstein M B Demeester P, Van de Peer Y, Michoel T

G-2 Bringing order to disorder: comparative G-7 Dynamic deterministic effects propagation genomics and genetic interactions uncover three networks: learning signalling pathways from biologically distinct forms of protein disorder longitudinal protein array data Intrinsically disordered regions are very common We present a method for reconstruction of in many proteins. Using comparative genomics and signalling networks from Reverse-Phase-Protein- genetic interactions we found that disorder can be Array time course data. Boolean signal propagation split into three phenomena: flexible disorder, is combined with a Hidden Markov Model and constrained disorder and non-conserved disorder. used in a Genetic Algorithm to infer signalling —Bellay J, Han S, Michaut M, Constanzo M, networks from data. Andrews BJ, Boone C, Bader GD, Myers CL, Kim —Bender C, Henjes F, Fröhlich H, Wiemann S, PM Korf U, Beißbarth T

G-3 Prediction of genetic interactions in yeast G-8 Systematic mapping of multiple specificity in using machine learning peptide recognition modules reveals new binding We propose to use machine learning techniques to modes of protein domains infer genetic interactions in yeast by integrating We develop a new computational model of binding various feature sets defined on genes. The approach specificity in peptide recognition modules using is validated using four available genetic large experimental datasets of interacting peptides. interactions maps as training networks. Our method predicts unexpected protein —Schrynemackers M, Geurts P, Wehenkel L, interactions and reveals new binding modes of PDZ Madan Babu M domains. —Gfeller D, Ernst A, Verschueren E, Vanhee P, G-4 Prioritizing candidate genes by network Dar N, Serrano L, Sidhu SS, Bader GD, Kim PM analysis of differential expression using machine learning approaches G-9 Reverse-engineering gene regulatory We present a method that can identify promising networks for the abiotic stress response in candidate genes using network-based machine Arabidopsis thaliana learning approaches even if no knowledge is We infer stress gene regulatory networks for available about the disease or phenotype. It was Arabidopsis thaliana, consisting of functionally benchmarked on 40 KO experiments in mice with coherent coexpression modules and biologically positive results. relevant regulators. We also predict new functions —Nitsch D, Gonçalves JP, Ojeda F, Tranchevent for uncharacterized genes in the abiotic stress L-C, Moreau Y response. —Vermeirssen V , De Clercq I, Van Parys T, Van G-5 The human E3 ubiquitin ligase enzyme Breusegem F, Van de Peer Y protein interaction network Using structural data,efficient structural G-10 Predicting protein-protein interaction using comparison algorithms and appropriate filters,we mirror tree construct human E3 ubiquitin ligase enzyme We introduce the Mutual Information measure as protein interaction network.Our analysis reveals an alternative metric to evaluate the co- important functional features and a priori unknown evolutionary information shared by a protein pair. E3 interactions We provide a comparative study of Mirror Tree —Kar G, Keskin O, Nussinov R, Gursoy A 88

ECCB10 Posters and enhanced versions using a large dataset of —Acuner Ozbabacan SE, Gursoy A, Keskin O, E.coli proteins. Nussinov R —Esmaielbeiki R , Nebel J-C G-16 Systematic network analysis G-11 Comparison of Bayesian networks and its We present a tool for large-scale network analysis extensions applied to the inference of regulatory and demonstrate its relevance to the study of networks biological networks, in particular the inference of In this work a comparison among tree different network features constrained by evolution. Our Bayesian networks approaches is presented. BNs approach helps motivate generative models of without any modification, with interventions and networks. with addition of extra knowledge. To compare the —Agarwal S, Villar G, Jones N methods synthetic and cytometry data are used. —Werhli AV G-17 Model of trisporic acid synthesis in Mucorales shows bistable behaviour G-12 Insight into the mechanism of specific We present an ODE model for the synthesis regulation of STAT proteins by atypical dual- pathway of trisporic acid in Mucorales. This specificity phosphatases pathway has some very interesting characteristics, We present our results based on an approach like exchange of intermediates as well as a positive combining docking and molecular dynamic feedback loop. Our model shows bistable simulations to understand at a molecular level the behaviour. mechanism responsible for the specific regulation —Werner S, Vlaic S, Schroeter A, Schuster S of STAT proteins by atypical dual-specificity phosphatases. G-18 Simulation of immune response to —Jardin C, Sticht H Mycobacterium tuberculosis using an agent-based model G-13 VoteDock: the consensus docking method We have developed a three-dimensional agent- for prediction of protein-ligand interactions based model for the immune response to M. Our consensus-based docking method is able to tuberculosis. This model captures the cellular dock properly about 20% of protein-ligand pairs migration, activation, phagocytosis and death. more than docking methods in average, and more Also, it includes the bacillus replication and than 10% of pairs more than single best program. necrosis formation. Also drop in RMSD of top conformation is —Galvão V, Miranda JGV, Andrade RFS observed. —Plewczynski D, Ła źniewski M, von Grotthuss G-19 Design, optimization and predictions of a M, Rychlewski G coupled model of the cell cycle, circadian clock, DNA repair system, irinotecan metabolism and G-14 Inferring genetic regulatory networks with exposure control under temporal logic constraints an hierarchical Bayesian model and a parallel We propose a coupled model of the mammalian sampling algorithm cell cycle, the circadian clock, the p53/Mdm2 We propose the use of a Metropolis Coupled DNA-damage repair system, the metabolism of Markov Chain Monte Carlo in order to improve a irinotecan and the control of cell exposure to it. We Bayesian coupling scheme for learning genetic discuss the use of this model for cancer regulatory networks from a combination of related chronotherapies. data sets, obtained under different experimental —De Maria E, Fages F, Rizk A, Soliman S conditions. —Mendoza MR, Werhli AV G-20 Adding structural information to the von Hippel-Lindau tumor suppressor interaction G-15 Prediction of protein-protein interactions in network the apoptosis pathway The von Hippel-Lindau tumor suppressor gene is a We focus on predicting protein-protein interactions protein interaction hub, controlling numerous genes in the apoptosis pathway, by structurally comparing implicated in tumor progression. Using structural the genes/proteins in that pathway with the information and computational analysis we have template interfaces of non-obligate complexes located three distinct interaction interfaces. obtained from the PDB. PRISM server is used as —Leonardi E, Murgia A, Tosatto S the prediction tool.

89

ECCB10 Posters G-21 Computational multiscale modeling of brain G-27 Visualizing spatiotemporal information in tumor growth heterogeneous biological networks with Arena3D We present a novel computational multiscale We present Arena3D, a visualization and analysis model for the spatio-temporal dynamics of brain platform that enables the understanding of cancer. This model combines a molecular connections between biological networks and the interaction network with its emerging cellular identification of patterns in gene dynamics. It was activities to describe avascular tumor progression. tested on the gene regulatory network of the cell —Schütz T A, Toma A, Becker S, Mang A, Buzug cycle. TM —Secrier M, Pavlopoulos GA, Schneider R

G-22 SLIDER: a generic metaheuristic for the G-28 Efficient learning of signaling networks discovery of correlated motifs in protein-protein from perturbation time series via dynamic nested interaction networks effects models We present the metaheuristic SLIDER for finding We propose a novel, computationally efficient correlated motifs in sequences of interacting approach for reconstructing signaling cascades proteins. We show that it outperforms existing from measured temporal downstream effects of motif-driven CMM methods and scales to large targeted interventions; via an extension of Nested protein-protein interaction networks. Effects Models . —Boyen P, Van Dyck D, Neven F, van Ham —Praveen P, Tresch A, Froehlich H RCHJ, van Dijk ADJ G-29 Discovering patterns of differentially G-23 The effective interactions between proteins regulated enzymes in metabolic pathways of via coarse-grained modeling tumors We propose a method that enables the fast We present a pattern recognition method on calculation of the potential of mean force between networks thatcomplements normal enrichment tests interacting protein pairs. Such information can be to detect such functionally related regulation used for predicting protein-protein interactions. patterns —Pool R, Feenstra KA, Heringa J —Schramm G,Wiesberg S, Diessl N, Kranz A-J, Sagulenko V, Oswald M, Reinelt G, Westermann G-24 Predicting cellular perturbation effects and F, Eils R, Konig R identifying causal perturbations using a message- passing based machine learning method G-30 Extending the yeast metabolic network A novel network reconstruction is proposed that is using a data integration approach based on a message-passing scheme in which the We present a novel methodology, using the data strength of the links are learned. Its effectiveness is integration and visualisation capabilities of Ondex, shown as it is able to explain effects of cellular with Taverna’s ability to combine bioinformatics perturbations. Web Services, to create a powerful platform for —Hulsman M, Reinders MJT analysing data to support systems biology research. —Fisher P , Dobson P, Brenninkmeijer C, Canevet G-25 Towards better receptor-ligand prediction C, Taubert J, Rawlings C, Stevens R Receptor-ligand interactions mediate virtually every aspect of our multicellular life. Here we G-31 Predicting genetic interactions by discuss the use of kernels to correctly pair these quantifying redundancy in biochemical pathways important receptors and ligands. We propose a novel unsupervised method to —Iacucci E, Ojeda F, De Moor B, Moreau Y predict genetic interactions based on the information contained in protein interaction G-26 Extracting and visualising protein-protein networks. Our method, combined with interfaces for data-driven docking experiments, could shed light on the quantitative Protein-protein interactions play a key role in many nature of biological robustness. cellular processes but they are difficult to predict at —Delgado-Eckert E, Gill S, Merdes G, the molecular level. We are developing an Beerenwinkel N approach to exploit knowledge of existing PPIs to help guide protein docking calculations. G-32 Hub protein interfaces and hot region —Ghoorah AW, Devignes M-D, Smaïl-Tabbone organization M, Ritchie DW Structural properties of hub proteins give clues about hot region organization in the interface. 90

ECCB10 Posters Diversity between hot regions in hub proteins G-38 Mapping the ‘Farnesylome’ - structure- contribute to predict whether a given hub interface based prediction of farnesyl transferase targets is constructed by two date hubs or two party hubs. We present a structure-based approach for the —Cukuroglu E, Gursoy A, Keskin O detection of farnesylated proteins. Using this approach we identify new putative targets for G-33 A functional-genomics fermentation Farnesyl transferase not identified by traditional platform to identify and optimize industrial- sequence-based approaches, providing new relevant properties of Lactobacillus plantarum functional insight. We present a functional genomics platform that —London N, Schueler-Furman N matches gene expression to industrially-relevant phenotypic traits in Lactobacillus plantarum G-39 Integrative analysis of gene expression and WCFS1. copy number variation data to elucidate a —Wels M, Bron PA, Wiersma A, Marco M, reference human gene network Kleerebezem M We developed algorithms and database to elucidate directed Gene Networks from gene expression and G-34 DisGeNET: a Cytoscape plugin to visualize, CNV. From human datasets we compiled a healthy integrate, search and analyze gene-disease reference network, which can be used in disease networks studies to identify differential disease networks. DisGeNET provides a user-friendly framework to —Orsini M, Pinna A, De Leo V, de la Fuente A explore human gene-disease associations by means of network analysis tools. G-40 Unknown player in modelling of signal —Bauer-Mehren A, Rautschka M, Sanz F, transduction pathway Furlong LI We present a probabilistic models base on Nested effects Models to infer unknown players in the G-35 Combination of network topology and signalling pathway topologies from RNAi gene pathway analysis to reveal functional modules in silencing data and the applicability of this method human disease in controlled simulation study. A comprehensive database comprising human —Sadeh M, Anchang B, Spang R gene-disease associations for monogenic, complex and environmental diseases was subjected to G-41 Assembly of protein complexes by topological and functional network analysis to integrating graph clusterings identify and characterize disease-related biological We propose a framework for integrating clustering processes. methods for predicting protein functional modules. —Bauer-Mehren A, Bundschus M, Rautschka M, Besides, we provide a web service which compute Mayer MA, Sanz F, Furlong LI six clustering results and their integration. It is interactive and powerful network analyzer. G-36 Attacking interface & interaction networks —Chin C-H, Chen S-H, Chen C-Y, Hsiung C, Ho “Interface & Interaction Networks” result from C-W, Ko M-T, Lin C-Y integration of binding site information into PPI networks. Here proteins are depicted as nodes, G-42 How natural host species avoid CD4+ T cell interactions as edges and interfaces as a different depletion during SIV infection kind of node. Our study reveals one key mechanism by which —Engin B., Gursoy A., Keskin O natural host monkeys have adapted to simian immunodeficiency virus infection and therefore G-37 Pathway Projector: web-based zoomable maintain their immune system and remain free pathway browser using KEGG Atlas and Google from AIDS. Maps API —Chan ML, Petravic J, Ortiz AM, Engram J, Pathway Projector is a web-based pathway browser Paiardini M, Cromer D, Silvestri G, Davenport MP with zoomable user interface. This application allows multiple search, experimental data mapping, G-43 Kinase-specific phosphorylation site and editing. Pathway Projector is available at: prediction in Arabidopsis thaliana http://www.g-language.org/ PathwayProjector/. We present a method for predicting kinase-specific —Kono N, Arakawa K, Ogawa R, Kido N, Oshita phosphorylation sites in Arabidopsis thaliana based K, Ikegami K, Tamaki S, Tomita M on finding kinases orthologous to kinases in other well-studied systems. Results obtained are subject to further computational characterisation. 91

ECCB10 Posters —Dang T-H, Verschoren A, Laukens K G-49 BioGraph: discovering biomedical relations by unsupervised hypothesis generation G-44 BiologicalNetworks: enabling systems-level The BioGraph service integrates heterogeneous studies of host-pathogen interactions knowledge bases and allows for the successful The approach at cross-scale data integration to automated formulation and ranking of study host-pathogen interactions is proposed and comprehensible functional hypotheses relating demonstrated on a study of the Influenza biomedical contexts to candidate targets, e.g., for infections. The methods and data are available disease-gene relations. through the Java application . —Liekens A, De Knijf J, Daelemans W, Goethals —Kozhenkov S, Sedova M, Dubinina Y, B, De Rijk P, Del-Favero J Ponomarenko J, Baitaluk M G-50 Tuning noise propagation in a two-step G-45 Genome-wide identification of disease- series enzymatic cascade related genes We characterize noise propagation in a two-step Elucidating the underlying disease mechanisms is series MAPK enzymatic cascade, which is crucial for understanding the onset of diseases, the ubiquitously conserved in eukaryotes. We identify development of specific diagnostic and therapeutic the parameters that may be used to tune noise approaches. We present a linkage interval- propagation in the enzymatic cascade. independent method to identify disease-related —Dhananjaneyulu V, Sagar PVN, Kumar G, genes. Viswanathan GA —Jaeger S , Leser U G-51 The effect of interactome evolution on G-46 Bioinformatics tools for the investigation of network alignment proteomics aspects involved in cardiovascular We investigate the effect of network evolution and diseases: from biomarker discovery to protein- error on protein network alignment results. Using a protein interaction networks distance metric for network alignments enables us The study of complex samples, by “high- to measure agreement between alignments of real throughput” proteomics technologies, requires the and simulated networks and propose likely error development of computational tools to handle the rates. vast amount of data produced. Their employment —Ali W, Deane CM will be shown in relation to myocardial infarction investigation. G-52 Analyzing compounds’ mode of action by —Di Silvestre D, Brambilla F, Brunetti P, Lionetti biological relatedness of proteins. V, Agostini S, Cavallini C, Mauri PL Mode of action is primarily analyzed by gene expression. As not all regulation happens on the G-47 ABC database - the analysing biomolecular transcriptional level, we enrich protein interactions contacts database by comprehensive data to detect the biological We present un update of our ABC database that context leading to/caused by expression changes. contais detailed information about protein-protein —Schmid R, Baum P, Ittrich C, Fundel-Clemens and protein-small molecule interfaces. The K, Lämmle B, Birzele F, Weith A, Brors B, Eils R, database is freely available via web interface for Mennerich D, Quast K the statistical analysis of biomolecular contacts —Walter P, Metzger J, Helms V G-53 Logical modelling of the regulatory network controlling the formation of the egg appendages G-48 The core and pan metabolism in the in Drosophila Escherichia coli species We present a map and a dynamic logical model of In this work, we analyze the metabolic diversity of the intracellular regulatory network controlling the Escherichia coli species. Using an optimized dorsal appendage patterning in drosophila reconstruction strategy, we built and investigated oogenesis. Our preliminary results support the the metabolic networks of 29 E. coli strains, which hypothesis that Broad inhibits Fas3 in the roof showed high correlation with their phylogeny. cells. —Vieira G, Sabarly V, Bourguignon P-Y, Durot —Fauré A, Vreede B, Sucena É, Chaouiya C M, Le Fèvre F, Mornico D, Vallenet D, Bouvet O, Denamur E, Schachter V & Médigue C

92

ECCB10 Posters G-54 A curated database of microRNA mediated G-59 Inferring translationally active RNA binding feed-forward loops involving MYC as master proteins - mRNA interactions from polysomal regulator profiling data with a Bayesian inference approach We present and discuss a database of mixed We present a Bayesian inference approach to microRNA / Transcription Factor Feed-Forward predict direct or indirect interactions between RNA regulatory Loops having MYC as master regulator binding proteins and their targets based on the and characterized completely by experimentally comparison between translatome and transcriptome supported regulatory interactions, in human. profiling experiments. —El Baroudi M, Corà D, Bosia C, Osella M, —Tebaldi T, Sanguinetti G, Niranjan M, Caselle M Quattrone A

G-55 The Biological Connection Markup G-60 DASS-GUI – a new data mining framework Language : a SBGN compliant data format for We have developed the new data mining visualization, filtering and analysis of biological framework DASS-GUI. It allows the pattern pathways identification itself and additional analyses of the BCML allows a SBGN compliant representation of identified patterns . pathways in a visual and machine-readable format. —Hollunder J, Van de Peer Y, Wilhelm T BCML permits a selection of elements basing on specific biological evidence, creating pathways G-61 EnrichNet: network-based gene set usable for both the bioinformatician and the enrichment analysis biologist. EnrichNet extends classical gene set enrichment —Calura E, Beltrame L,, Popovici R, Rizzetto L, analysis by mapping gene and protein sets of Rivero Guedez D, Donato M, Romualdi C, interest onto interaction networks to investigate Draghici S, Cavalieri D their associations using network distance calculations and sub-network visualizations. G-56 Estimating the size of the S.cerevisiae —Glaab E, Baudot A, Krasnogor N, Valencia A interactome Our method for estimating an interactome size G-62 Interacting copy number alterations in effectively combines high-throughput and breast cancer literature-curated data. We found a higher estimate We present a framework for the detection of co- for the S.cerevisiae interactome, showing that occurring copy number alterations from high- completing the yeast interactome map require density array data. The method is applied to 216 extensive effort. breast tumours, revealing several deregulated gene —Sambourg L, Thierry-Mieg N interactions driven by interdependent genomic alterations. G-57 Laplacian eigenmaps, penalized principal —Canisius S, Klijn C, Smid M, Martens J, component regression on graphs, and analysis of Foekens J, Wessels L biological pathways We propose a network-based approach for G-63 Compression of mass spectral imaging data analyzing the significance of biological pathways using discrete wavelet transform guided by spatial using Laplacian eingenmaps with Neumann information. boundary conditions, which provides a dimension The presented method reduces the size of MSI data reduction method that directly incorporates the sets considerably while still achieving excellent network information. reconstruction of the original mass spectra. This is —Shojaie A, Michailidis G done by retaining only a limited number of wavelet coefficients that express spatial structure. G-58 Predicting metabolic pathways from —Verbeeck N, Van de Plas R, De Moor B, bacterial operons and regulons Waelkens E We use the pathway extraction tool to predict metabolic pathways from bacterial genomes, by G-64 Learning ancestral polytrees for HIV-1 extracting from a metabolic network the subgraphs mutation pathways against nelfinavir that connect at best a set of seed genes that belong This retrospective study shows that even in the to a same operon or regulon. presence of sampling biases and confounding —Faust K, Croes D, Dupont P, van Helden J effects ancestral polytrees can depict HIV-1 resistance mutation pathways under drug selective pressure from nelfinavir, a protease inhibitor . 93

ECCB10 Posters —Li GD, Beheydt G, Bielza C, Larrañaga P, G-70 A generic gene regulatory network Camacho RJ, Grossman Z, Torti C, Zazzi M, reconstruction method: application to Prosperi M, Kaiser R, Van Laethem K, De Maeyer Lactococcus lactis MG1363 M, Jansen M, Vandamme AM We present a novel method for the whole-genome reconstruction of regulatory networks. A G-65 Ranking of genes from RNAi perturbation classification model is trained using known dat interactions for E. coli and B. subtilis. This model We present a comparative study in which we test is used to classify gene interactions in any methods for ranking genes from RNAi perturbation prokaryote. data. We identify the most robust techniques and —Theunissen D, Brouwer R, Kuipers O, validate them on first and secondary RNAi Hugenholtz J, Siezen R, Van Hijum S screening data. —Siebourg J, Beerenwinkel N G-71 Dissecting the specificity of E2-E3 interaction in the ubiquitination pathway by a G-66 A network centered approach for genome molecular docking approach wide data integration applied to Alzheimers Ubiquitin enzymes are key regulating units for disease prediction many fundamental processes required for cell Network centered analysis of SNP and expression viability. In this work, we used our molecular data yields predictive and biologically relevant docking software HADDOCK to get insights at an signatures for Late Onset Alzheimers Disease. atomic level on the specificity of E2/E3 —van den Akker E, Heijmans B, Kok J, interactions. Slagboom P, Reinders M —Melquiond ASJ, Rodrigues J, Bonvin AMJJ

G-67 How to efficiently hunt the disease causing G-72 PopCover – selecting peptides with optimal gene population and pathogen coverage We present Endeavour, a tool that identifies the PopCover integrates pan-specific MHC binding most promising candidate genes from large gene predictions with MHC allelic prevalence and select lists by integration of multiple genomic data pools of peptides that in concert will provide broad sources. It has been extensively benchmarked and coverage of both pathogen genomic and the also experimentally validated. population MHC diversity. —Tranchevent L-C, Aerts S, Van Loo P, Hassan —Lundegaard C , Buggert M, Karlsson AC, Lund BA, Moreau Y O, Perez C, Nielsen M

G-68 Study of the ultrasensitivity of the BCL-2 G-73 A framework for functional selection of apoptotic switch biomarkers In our work we compare different hypothesis about Molecular biomarkers can be used for evaluating functioning of the Bcl-2 apoptotic switch. We used the health state of an individual. Here, a framework mathematical modelling and measuring of the for biomarker selection is presented, and the ability steady-state stimulus-response sensitivity as the of the selected markers to cover different diseases judging criterion of plausibility of these is evaluated. hypotheses. —Kivinen V , Nykter M, Yli-Harja O, —Tokár T, Uli čný J Shmulevich I

G-69 Improved genome-wide protein-protein G-74 iPath: interactive pathways explorer interaction prediction and analysis of biological iPath is a web-based tool for the visualization and process coordination in Escherichia coli analysis of the metabolic pathways. An interactive We build an improved and high-quality genome- viewer provides straightforward navigation through wide interaction map for Escherichia coli, using various pathways and enables easy access to the different machine learning methods and genomic underlying chemicals and enzymes. feature datasets from several sources. These —Yamada T , Letunic I, Okuda S, Bork P predictions are used to propose various pathway interactions G-75 In silico comparative modeling of MTHFR —Muley VY, Ranjan A A1298C polymorphism in acute leukemia Alterations in folate levels are related with the risk factor for the cancer. The C677T and A1298C mutations of MTHFR were associated with 94

ECCB10 Posters susceptibility in leukemia. Simulations in silico are hierarchically and metabolically, but it is unclear proposed to clarify a paradoxal comportment to why and when. We study whether basic A1298C. physiological constraints can help explain the —Ramos F, Lima J, Melo J observed regulatory mechanisms. —Nikerel IE, Hu F, Berkhout J, Teusink B, G-76 Feasibility space as a tool to understand Reinders MJT, De Ridder D regulation of metabolic networks Hierarchical regulation analysis shows that metabolic networks are regulated both

Genomic Medicine

H-1 DISCOVERY: A resource for the rational H-5 Comparing network and pathway based selection of drug target proteins and leads for the classification for breast cancer. Network and malaria parasite, Plasmodium falciparum pathway based classifiers do not outperform The DISCOVERY project provides a resource for single gene classifiers the rational in silico selection of putative drug We compared the accuracy of three methods to target proteins and lead compounds in malaria, predict outcome in breast cancer based on network using data mining on automated protein or pathways data in combination with gene annotations and the creation of relations to ligand expression data with the performance of a single molecules. gene classifier. We found that single gene markers —Odendaal CJ, Harrison CM, Szolkiewicz MS, perform best. Joubert F —Staiger C, Kooter R, Dittrich M, Müller T, Klau GW, Wessels L H-2 Variance estimators for t-test ranking influence the stability and predictive performance H-6 Enterotypes of the human gut microbiome of microarray gene signatures In the gut metagenome of 39 individuals from 6 Identification of differentially expressed genes in countries we identified several robust clusters that cancer prognosis or diagnosis is commonly suggest that the intestinal microbiota variation is performed with a t-Test. Specific variance stratified, not continuous. We study the influencing estimators are shown to affect the classification factors of this phenomenon. performance and the stability of small gene —Raes J, Arumugam M, The Metahit Consortium, signatures. Dore J, Weissenbach J, Ehrlich, SD, Bork P —Touleimat N, Hernández-Lobato D , Dupont P H-7 Identifying chemotherapy resistance genes H-3 Selecting small subsets of genes for using outlier detection predicting the outcome of chemotherapy We have developed a novel algorithm that allows treatments : a dynamic programming approach the detection of genes that show class specific We adress the issue of selecting small sets of genes differential expression in a subset of samples. We involved in the response to a preoperative use this algorithm to find genes related to chemotherapy treatment in breast cancer. Our chemotherapy resistance. approach is to select genes that maximize the —de Ronde J, Mulder L, Lips E, Rodenhuis S, `responders - non responders' interclass distance. Wessels L —Natowicz R, Moraes Pataro C-D, Incitti R, Costa M-A, Cela A, Souza T, Braga A-P, H-8 Genome wide association study of non- Rouzier R synonymous single nucleotide polymorphisms for seven common diseases H-4 Networks of generalized top scoring pairs for We have conducted a genome wide scan for robust phenotype classification identifying association of non-synonymous SNPs We present a network analysis of the top scoring with seven common diseases. In our study we pairs classifiers and we show how one can build identified 18 new associations with diseases more powerful, but still simple, decision rules by studied in additional to the 24 associations publised exploiting the graph topologies induced by TSP. by WTCCC study. Also, we introduce a generalized version of TSP —Surendran P, Shields D, Stanton A and its parallelized implementation. —Popovici V, Budinska E, Delorenzi M 95

ECCB10 Posters

H-9 Comparative bioinformatics approach to analysis & visualization. We describe an multiple tumors reveals novel prognostic markers application that facilitates the integration & in breast cancer visualization of gene expression data in the context We provide a comparative analysis of pathways of the brain anatomy. enriched in multiple tumors and disclosed new —Kremer A, Gaenzler C, van der Spek P prognostic markers by applying gen set enrichment analysis to pathway groups highly conserved H-14 Clinical prognosis with transcriptional among the tumors. association networks and regression trees —Krupp M, Marquardt J, Maas T, Galle P, Tresch We present a new supervised prediction method for A, Teufel A prognostic problems based on the discovery of clinically-relevant association networks. The H-10 Genomic and epigenomic molecular method integrates regression trees and clinical signatures reveals network mechanisms class-specific networks. associated with ovarian cancer prognosis —Nepomuceno I, Azuaje F, Devaux Y, Nazarov The Cancer Genome Atlas has recently made PV, Muller A, Aguilar-Ruiz JS, Wagner DR available the molecular characteristics of more than 200 patients. Using computational network models, H-15 Prediction of clinical outcome after cardiac we were able to find sub-networks that arrest and induced therapeutic hypothermia significantly stratify survival rates.These We analyzed gene expression biosignatures interactions are assoc relevant to the prediction of clinical outcome after —Ben-Hamo R , Efroni S cardiac arrest . In order to identify mechanisms potentially driving a clinical outcome, we H-11 Inferring distributions of trait-associated developed a method to infer class-specific SNPs with application to genetic association networks. studies —Nepomuceno I, Azuaje F, Devaux Y, Stammet Published SNP associations were used to learn P, Aguilar-Ruiz JS, Wagner DR SNPs' a-priory potential of being trait-associated. The predicted potential is integrated with classic H-16 Allele-specific copy number analysis of association methods. Results are improved for breast carcinomas 16/19 simulated diseases, and for Type-2 diabetes Using a novel algorithm, we present the first allele- data. specific copy number analysis of the in vivo breast —Neuvirth H, Aharoni E, Azencott C-A, Farkash cancer genome, and identify specific recurring A, Landau D, Rosen-Zvi M, Geiger D signatures of aberrations. —Van Loo P, Nordgard SH, Lingjærde OC, H-12 Optimizing exon CGH array designs for Russnes HG, Rye IH, Sun W, Weigman VJ, robust rearrangements detection. Marynen P, Zetterberg A, Naume B, Perou CM, We propose new approach to measure the quality Børresen-Dale A-L, Kristensen VN of array CGH designs focusing on robustness of rearrangements detection to the noise. We H-17 Gene dosage balance and the evolution of implemented the efficient Monte Carlo method for protein interaction networks testing noise robustness within DNAcopy We present evidence of different mechanisms to procedure. control gene dosage balance between hubs in —Gambin T , Sykulski M, Gambin A vertebrates and lower eukaryotes. Such mechanisms reflect different properties in terms of H-13 Interactive human brain atlas for a better protein-protein interactions, gene origin and gene understanding of diseases duplicability. The ever increasing complexity & amount of —D'Antonio M , Ciccarelli FD research data require new approaches to data

Other Bioinformatics Applications universal targets. The optimization improves I-1 Universal virtual screening enrichment rates by 77.2%, and approximately We optimized scoring functions of molecular 70% of ligands for target proteins were predictable. docking programs for virtual screenings of 96

ECCB10 Posters

—Onodera K , Kamijo S based on gene-expression similarities. A perturbation approach aids in the selection of I-2 Accuracy and computational efficiency of a potential therapeutics among discovered drug- graphical modeling approach to linkage disease connections. disequilibrium estimation —McArt D , Zhang S-D We develop recent work on using graphical models for linkage disequilibrium to provide efficient I-8 In silico study of expression profile correlation programs for model fitting, phasing and imputation between microRNAs and cancerous genes of missing data in large data sets. We investigate the possibility that microRNA can —Abel H J, Thomas A act as an oncogene or tumor suppressor gene. Experimentally verified microRNA target genes I-3 Stepwise classifier for heterogeneous genomic information (TarBase) are integrated with data microRNA and mRNA expression data (NCI-60) to We introduce stepwise classification method which study this hypothesis, in which the Pearson use different data types independently, so that correlation and Spearman rank coefficients are intrinsic classification power of each data types used to quantify these relations for nine cancer play their role without overcome by tissues. others.Experiment result showed it's distinct —Ng K-L, Weng C-W, Huang C-H classification power —Wubulikasimu A , van de Wiel MA I-9 Harry Plotter: a user friendly program to visualize genome and genetic map features I-4 ALADYN: a web server for aligning proteins We present Harry Plotter a simple program that by matching their large-scale motion visualizes genomic scaffolds anchored to a genetic We present the ALADYN web server which aligns map and chromosome features distribution, like proteins by matching their large-scale internal QTL significance, gene density, repeated elements motion. The method allows to efficiently detect and GpC abundance, using multi-colored gradients. meaningful relationships between proteins that are —Moretto M, Cestaro A, Troggio M, Costa F, not necessarily well-alignable in sequence or Velasco R structure. —Potestio R, Aleksiev T, Pontiggia F, Cozzini S, I-10 Inhibitory activity of Thiadiazoles on Protein Micheletti C Kinase PKnB from Mycobacterium Tuberculosis: a virtual screening and molecular docking study I-5 Fitness differences as an explanation for Tuberculosis causes more than two million deaths difference between chronic myeloid leukemia per year. We did the study of 1800 different therapies Thiadiazoles to discover new derivatives that can Using a computational models of the hematopoietic be drugs with higher potency and specificity. dynamics, we show here why Nilotinib, a second —Raj U , Singh V-B, Swati, Srivastava A, Naqvi line treatment for chronic myeloid leukemia, has a S-A-H faster and deeper response than Imatinib. —Lenaerts T, Castagnetti F, Traulsen A, Pacheco I-11 A microarray format standardization study: JM, Rosti G, Dingli D meaningful structures Microarray data is too big and disparate. I-6 Stochastic extinction of leukemic stem cells Repositories cannot exchange data. The records are provides a road to cure chronic myeloid leukemia not visible as required. We extended the structure We show through a computational model of the and semantics of MINiML file on GEO within a hematopoietic system that tyrosine kinase metadata framework with a case study to prove the inhibitors like Imatinib can cure CML as a result of performance stochastic effects in the system. —Kocabas F, Can T, Baykal N —Lenaerts T, Traulsen A, Pacheco JM, Dingli D I-12 Normalization of proteomic ratio data. I-7 sscMap perturbation: unbiased candidate In order to correct proteomic ratio data for variable therapeutic ranking in connectivity mapping loading we commonly centre the logarithm of the Connectivity mapping is a method for discovering ratios. Here we set this “normalization” in a larger connections between different biological states context, explain why the common practice is a 97

ECCB10 Posters favoured, and show that sometimes we can do even I-18 Evaluation of multivariate data analysis better. strategies for high-content screening —Sherman J, Molloy M, Descallar J, Lance B, Multivariate data, as generated by high-content Uitto P, Wood G screening, is analyzed using a modular analysis pipeline. Dimension reduction methods and I-13 BioCatalogue: the curated catalogue of life approaches to summarize single cell results are science web services such easily evaluated to determine the best analysis We present the BioCatalogue, a registry of curated strategy. biological web services. A place where web service —Kümmel A, Parker CN, Gabriel D providers, users and curators can register, annotate, monitor and search for web services. I-19 Studies of the binding capacity of —Tanoh F, Bhagat J, Eric Nzuobontane E, cyclooxygenase II and biodistribution of phenols Laurent T, Wolstencroft K, Stevens R, Pettifer S, contained in natural products Lopez R, Goble C A set of phenols present in propolis and grape were studied in their interaction with an inflammatory I-14 Regression system for prediction of errors in processes mediator, the ciclooxygenase II enzyme. the data on gene expression in situ obtained from Additionally, their biodistribution properties were confocal images studied using in silico techniques. We present a regression system for the prediction —Paulino Z-M, Aguilera M-S, García Jenifer and correction of errors in gene expression data García, Iribarne F extracted from confocal images. The system is based on the method based on the censoring I-20 Finding logic networks using the Dutch Life technique that we have published elsewhere. Science Grid: handling 15 million jobs —Myasnikova E, Surkova S Samsonova M We present a method that enables running a huge amount of jobs on grid infrastructure. The solution I-15 Impact of genetic variations on involves a pilot job framework for scheduling jobs phosphorylation sites and an output receiver for storing the outputs. It We analyzed the cross-talking between mutations, scales beyond 5k cores and can handle 15M jobs. alternative splicing and phosphorylation sites in —Bot J, de Ridder J, Reinders M proteins and transcripts, describing how evolution shapes genes creating different contexts in which a I-21 A detailed view on several model-based given site is immersed. multifactor dimensionality reduction methods for —Via A, Le Pera L, Ferré F, Tramontano A detecting gene-gene interactions in case-control data in the absence and presence of noise I-16 FuSiGroups: grouping gene products by Model-Based Multifactor Dimensionality functional similarity Reduction is a promising method for detecting We present FuSiGroups, a novel algorithm to epistasis that flexibly deals with different outcome group gene products based on functional similarity. types. Its power and type I error are evaluated in a Unlike existing approaches, FuSiGroups allows variety of simulation settings. each gene product to be in multiple groups, —Cattaert T, Van Lishout F, Mahachie John JM, representing its association with different Van Steen K biological concepts. —Welter DN , Gray WA, Kille P I-22 Quality ranking of 16S sequences: an approach based on poset theory I-17 A file system strategy for high-performance We present a new approach to rank 16S rRNA sequence interval queries of very large datasets sequences according to several predefined quality We present results of performance tests of a file criteria. We devised applications and compared the system solution for fast sequence interval queries results with those of a generally available manually of large volumes of genomics feature data. GF2S curated data set. efficiently provides scalability, speed and —De Smet W, Verslyppe B, De Loof K, De Vos flexibility for visualization of huge datasets. P, De Baets B, Dawyndt P —Karcz SR , Links MG, Parkin IAP

98

ECCB10 Posters

I-23 Clustering and kernel methods using R defined a new measure and set up an evaluation Two different kernel based clustering methods, k methodology based on reference sets. K-Means and spectral clustering, for unsupervised —Benabderrahmane S, Smaïl-Tabbone M, learning on chemoinformatics data are presented in Napoli A, Poch O, Devignes MD this poster. —Adefioye A , De Moor B I-29 Utopia:GPCRDB: a domain-specific PDF reader. I-24 Ensuring data integrity in large modern We present a GPCR-specific PDF reader that integrative genomic studies dynamically links concepts in scientific articles to We present an algorithm for the detection and relevant entities in the GPCRDB, providing readers correction of sample mis-identification in studies with text-specific annotations containing data and combining genotypes with microarray expression information on i.e. GPCRs, residues and mutations. data. We illustrate its worth with simulated errors —Vroling B, Thorne D, McDermott P, Pettifer S in a breast cancer dataset. and Vriend G —Lynch A, Dunning M, Chin S-F, Curtis C I-30 Locally optimal structures of an RNA I-25 Testing branches in dendrograms RNA locally optimal secondary structures give a representing species abundance data rich picture of the folding space of an RNA. We We apply four approaches for dendrograms propose an efficient algorithm computing all such representing species abundance data, and analyse structures for a given RNA sequence. We which method enables to identify biologically implemented this new method in a software called relevant clusters. REGLISS. —Calkiewicz J, Wlodarska-Kowalczuk M, —Saffarian A, Giraud M, Touzet H Wrobel B I-31 Anesthetics and the role of the biological I-26 Reliable identification of hundreds of membrane in the molecular regulation of K2P proteins without peptide fragmentation potassium channels We present evidence that, combining high The TREK-1, a two-pore domain potassium precision mass determination and modern channel, is of paramount importance for anesthesia. prediction of the retention times of tryptic peptides We have investigated its structure in a membrane with a robust scoring system, we can detect environment and shown that the channels opening hundreds of proteins from HPLC-MS experiments must be an indirect reaction of the membrane with a low FDR. swelling. —Bochet P, Rügheimer F, Guina T, Brooks P, —Bernardi RC, Treptow W, Klein ML Goodlett D, Clote P, Schwikowski B I-32 Comparison of validation methods for I-27 Exploring unassigned peaks in protein merging cancer microarray data sets fragment mass spectra with frequent itemset Gene expression information abounds in public mining techniques databases. However, this information is fragmented We introduce an approach to exploit information in small datasets and thus of limited use. We from collections of unassigned spectral peaks lists benchmark datasets merging methods designed to by using frequent itemset mining techniques, and achieve greater statistical power. evaluate it based on a set of over 20.000 validated —Taminau J, Meganck S, Weiss-Solis DY, Van peptide mass fingerprint spectra. Staveren WCG, Dom G, Venet D, Bersini H, —Vu T-N, Valkenborg D, Eeckhout D, De Jaeger Detours V, Nowé A G, Goethals B, Witters E, Lemière F, Laukens K I-33 Mapping insertions to putative target genes I-28 Benchmarking a new semantic similarity We investigate the association of insertions with measure using reference sets and clustering: gene expression in MuLV and Sleeping Beauty evaluation and interpretation for the discovery of data sets. We present a knowledge-based approach missing information to map insertions to target genes and demonstrate Several semantic similarity measures exist using its superiority over nearest gene target selection. Gene Ontology controlled vocabulary. Comparing —De Jong J, De Ridder J, Sun N, Van Uitert M, performances requires evaluation criteria. We have Adams, DJ, Wessels LFA 99

ECCB10 Posters

I-34 The zebrafish embryo in toxicology: A M, Wilmotte A, Quillardet P, Tandeau de Marsac combination of toxicogenomics and other N, Talla E, Zhang C-C, Médigue C, Barbe V, screening techniques Ley N We present here our system for studying toxicological effects in zebrafish embryos. In our I-39 Moa: managing command line approach, we combine gene expression systems bioinformatics like microarrays and automated high throughput Bioinformatics projects are often command line imaging techniques. driven, while this allows flexibility it can make —Legradi J, Yang L, Alshut R., Ho N, Klüver N, maintenance difficult. Moa is designed to address Scholz S, Mikut R, Reischl M, Liebel U, Strähle U this issue; a tool to organise and maintain command line projects whilst retaining flexibility. I-35 Disease gene prediction based on tissue- —Fiers M specific conserved coexpression We show tissue-specific and multi-tissue conserved I-40 SAAPdb: structural effects of single amino coexpression to be highly complementary and acid polymorphisms exploit tissue-specific relationships between We present an update of SAAPdb, a database of disease and candidate genes to provide novel high- single amino acid polymorphisms and the effects confidence candidates for several genetic disorders. they are likely to have on protein structure. This —Piro RM , Ala U, Molineris I, Grassi E, update includes new mutations, a new structural Damasco C, Bracco C, Provero P, Di Cunto F effect and analyses of several datasets. —Baresic A, Alnumair N, Martin ACR I-36 Improved classification by integrating multiple patient data sets with literature I-41 Detecting human proteins involved in virus information using co-inertia analysis infection by observing the clustering of infected We describe a mathematical frame work to cells in siRNA screening images integrate two genome scale expression datasets of We present a clustering method analyzing siRNA patients and its corresponding literature screening images of HCV/DV infected cells to information from PUBMED as prior data sets. We detect host factors needed for the replication of the illustrate this frame work improve the classification virus. The results were compared with common ac intensity readouts and promising host factors were —Thomas M , Daemen A, De Moor B found. —Suratanee A , Rebhan I, Matula P, Kumar A, I-37 Time-resolved signatures from weighted Kaderali L, Rohr K , Bartenschlager R, Eils R, visibility graph representation of time-series data König R We provide a representation of time series via directed weighted visibility graphs. From this a I-42 A benchmark on cancer classification using measure is employed in statistical tests that account LS-SVMs and microarray data for the dependencies between not only the time We compare several cancer classification models points but also the entities from which time-series based on the Least Squares Support Vector data were obtained. Machine (LS-SVM)s and micrroarray data. In total, —Nikoloski Z, Grimbs S, Schäfer R, Sers C, we test 120 combinations of preprocessing Selbig J methods, feature selection methods, kernel functions and kernel parameter I-38 Genome Sequence of the Edible —Popovic D , Daemen A, De Moor B Cyanobacterium Arthrospira sp. PCC 8005 The cyanobacterium Arthrospira sp. PCC 8005 is I-43 MiRPara: a SVM-based software tool for of great interest to the European Space Agency for prediction of mature microRNAs its nutritive value and oyxgenic properties in the We have developed an SVM based system for MELiSSA biological life support system for long- microRNA prediction based on results from term manned missions into space. experimental studies. Our software can be run on —Janssen PJ, Morin N, Monsieurs P, Mergeay M, full length genome sequences from any species. In Leroy B, Wattiez, Vallaeys T, Waleron K, Waleron random tests, the software outperformed existing prediction tools —Wu Y, Liu H, Rayner S 100

ECCB10 Posters

I-44 A Platform for identifying prostate-cancer- related microRNA and mRNA using the empirical Bayes method in analysing microarray data A platform has been set up to study the regulatory role of miRNAs in tumorigenesis. Certain putative pairs of miRNA-mRNA are confirmed to be cancer related, hence, the effectiveness of the present approach is demonstrated. —Chen S-T, Ng K-L

I-45 PRIDE Inspector: a new tool to browse, visualize and review proteomics data PRIDE (http://www.ebi.ac.uk/pride) is one of the main public repositories of MS proteomics data. PRIDE Inspector (http://code.google.com/p/pride- inspector) is an open source application to visualize and perform a first quality assessment of MS data. —Wang R , Ríos D, Reisinger F, Vizcaíno JA, Hermjakob H

I-46 The scientist/staff, project collaboration and content management system (PCCMS) We present a scientist and project collaboration and relationship management platform, designed to streamline and integrate Omics experiment data and project management for various technologies in genomics, proteomics and bioinformatics. —Kumuthini J , Dominy J

101

ECCB10 Presentations & Posters Author Index

AUTHOR INDEX

Abascal F ...... 70 Arbiza L ...... 70 Becker J ...... 74 Abeel T ...... 53, 61 Armitage J ...... 78 Becker S ...... 90 Abel H J ...... 97 Arnedo J ...... 65 Beckstette M ...... 63 Abeln S ...... 74 Arnold K ...... 75 Bedo J ...... 64 Abma T ...... 37 Arumugam M ...... 95 Beerenwinkel N ...... 90, 94 Abrahamsson S ...... 80 Arvidsson S ...... 84 Beheydt G ...... 94 Abreu-Goodger C ...... 63 Aryal S ...... 74 Behrens S ...... 85 Acuner Ozbabacan SE ...... 89 Asai K ...... 48, 75 Beißbarth T ...... 56, 83, 88 Adefioye A ...... 99 Ashman K ...... 63 Bellay J ...... 88 Adler P ...... 83 Askenazi M ...... 50, 77 Bellazzi R ...... 85 Aelterman B ...... 76 Ast G ...... 67 Beltrame L ...... 93 Aerts S ...... 64, 65, 94 Aten E ...... 86 Benabderrahmane S ...... 99 Agarwal S ...... 89 Attwood TK ...... 52 Bender A ...... 77 Agostini S ...... 92 Aubry S ...... 67 Bender C ...... 56, 88 Aguilar-Ruiz JS ...... 96 Audenaert P ...... 88 Ben-Hamo R ...... 96 Aguilera M-S ...... 98 Augustyniak R ...... 74 Benkert P ...... 71, 75 Aharoni E...... 35, 96 Ayoubi T ...... 83 Bérces A ...... 44 Ahn T ...... 45 Azencott C-A ...... 96 Berezovsky IN ...... 49, 77 Aken B ...... 66 Azuaje F ...... 96 Berger F ...... 82 Akutsu T ...... 48, 75 Babaç MT ...... 86 Berkhout J ...... 95 Ala U ...... 57, 100 Bacell F ...... 35 Bernardi RC ...... 99 Alanis-Lobato G ...... 73 Bader GD ...... 88 Bersini H ...... 83, 87, 99 Alberich R ...... 72 Bailey J ...... 51, 84 Bertone P ...... 67 Albert J ...... 81 Baitaluk M...... 92 Beszteri B ...... 68, 70 Albrecht A ...... 54 Bak M ...... 67 Bettencourt R ...... 84 Albrecht M...... 38, 78 Bakis Y ...... 64, 86 Bezuidt O ...... 68 Aleksejevs S ...... 35 Balakrishnan L ...... 85 Bhagat J ...... 35, 98 Aleksiev T ...... 97 Balzer S ...... 45 Bhanot G ...... 68 Alessio M...... 50, 84 Banaei A ...... 69 Bhardwaj N ...... 88 Ali W ...... 92 Bansal M ...... 82 Bhattacharjee M ...... 85 Allemeersch J ...... 80 Baral C ...... 54 Bianco L ...... 79 Almén MS ...... 68 Barbe V ...... 100 Biasini M ...... 71, 74 Alnumair N ...... 100 Baresic A ...... 100 Bielza C ...... 94 Alshut R ...... 100 Bargsten JW ...... 71 Bijl M ...... 81 Altschmied L ...... 84 Barioni MC ...... 72 Bilican A ...... 37 Aluome C...... 77 Barla A ...... 76, 82 Billiau K ...... 63 Alves R ...... 67 Barreau D ...... 81 Binder H ...... 83 Amato R...... 67 Bartaševi čiūt÷ E ...... 53, 78 Birzele F ...... 92 Ampe M ...... 67 Bartenschlager R ...... 60, 100 Blachon S ...... 81 Anagnou NP ...... 69 Barton GJ ...... 63 Blake A ...... 87 Anchang B ...... 91 Barturen G ...... 64 Blanc E ...... 82 Andeweg AC ...... 81 Bastolla U ...... 74 Blanchet C ...... 53, 78 Andrade RFS ...... 89 Batt G ...... 55, 84 Blanzieri E ...... 77 Andrade-Navarro MA ...... 81 Baudet C ...... 69 Blockeel H ...... 70, 77 Andreetta C ...... 73, 74 Baudot A ...... 93 Blondé W ...... 86 Andrei R ...... 72, 73 Bauer-Mehren A ...... 91 Bochet P ...... 99 Andreini C ...... 73 Baum P ...... 92 Böcker S ...... 76 Andrews BJ ...... 88 Baumbach J ...... 38 Boeva VA ...... 66 Andrews SJD ...... 66 Baumgrass R ...... 70 Boisvert F-M ...... 63 Antezana E ...... 86 Baumlein H ...... 84 Bolger A ...... 79 Anwar S ...... 54 Bawono P ...... 74 Bolshoy A ...... 68 Apostolidou V ...... 61 Baykal N ...... 97 Bonnet E ...... 58, 63, 79 Arakawa K ...... 85, 91 Beck T ...... 86

102

ECCB10 Presentations & Posters Author Index

Bonobo Genome Consortium, Can T ...... 97 Corà D ...... 81, 93 The ...... 62 Canevet C ...... 90 Costa E ...... 70 Bonvin AMJJ ...... 94 Canisius S ...... 93 Costa F ...... 97 Bonzanni N ...... 81 Cannistraci CV ...... 50, 84 Costa M-A ...... 95 Boomsma W ...... 73 Cantone I ...... 55 Cox A ...... 39 Boone C ...... 88 Capriotti E ...... 75 Cozzetto D ...... 73 Bopardikar AS ...... 45 Carbonell JG ...... 57 Cozzini S...... 97 Bordoli L ...... 75 Carlberg C ...... 87 Crisanti A ...... 85 Bork P ...... 68, 94, 95 Carlon E ...... 82 Croes D ...... 93 Bornberg-Bauer E ...... 68 Caselle M ...... 81, 93 Cromer D ...... 91 Børresen-Dale A-L ...... 96 Castagnetti F ...... 97 Cruickshank D ...... 35 Borst L ...... 67 Cattaert T ...... 98 Csaba G ...... 47, 86 Boscariol F...... 65 Cavalieri D ...... 93 Cs őrös M...... 43 Bosia C ...... 81, 93 Cavallaro G ...... 73 Cuguen J ...... 67 Bot J ...... 98 Cavallini C ...... 92 Cukuroglu E ...... 91 Botta V ...... 67 Cela A ...... 95 Cuppen E ...... 61 Bottani S ...... 84 Cestaro A ...... 79, 97 Cuppens H ...... 64 Bottaro S ...... 73 Chabbert M ...... 73 Curk T ...... 67 Bourguignon P-Y ...... 92 Chambwe N...... 66 Curtis C ...... 99 Bousios A ...... 61 Chan ML ...... 91 Czarna A ...... 69 Bouvet O ...... 92 Chang A ...... 87 Czhial A ...... 84 Bouwman J ...... 37 Chang H-J ...... 81 Dabrowski M ...... 80 Boye K ...... 71 Chaouiya C ...... 92 Daelemans W ...... 92 Boyen P ...... 90 Chassignet P ...... 74 Daemen A ...... 100 Bozek K ...... 75 Chen C ...... 80, 91 Dalgaard MD ...... 67 Bracco C ...... 100 Chen C-Y ...... 91 Damasco C ...... 100 Braga A-P ...... 95 Chen K ...... 49 Dang TH ...... 80 Brambilla F ...... 92 Chen K-B ...... 51 Dang T-H ...... 92 Brandt BW ...... 65 Chen S-H ...... 91 D'Antonio M ...... 96 Brejova B ...... 62 Chen S-T ...... 101 Dao P ...... 58 Brejová B ...... 61 Chen W ...... 64 Dar N ...... 88 Brenninkmeijer C ...... 90 Cheng J ...... 67 Daran J-M ...... 45 Brohée S ...... 87 Chiang G-T ...... 66 Darbo E ...... 81 Bron PA ...... 91 Chin C-H ...... 91 Darriba D ...... 70 Brookes AJ ...... 86 Chin S-F ...... 99 Darzentas N ...... 61 Brooks P ...... 99 Chiogna M...... 64 Dassi E ...... 83 Broos S ...... 83 Cho SB ...... 83 Davenport CF ...... 61 Brors B ...... 80, 92 Cho Y ...... 62 Davenport MP ...... 91 Brosch M ...... 62 Choli-Papadopoulou T ...... 69 Davey NE ...... 64 Brouwer R ...... 94 Choudhary JS ...... 62 Davicioni E ...... 58 Bruford EB ...... 86 Christoph J ...... 76 Davidson B ...... 36 Bruijn E ...... 61 Ciccarelli FD ...... 96 Dawyndt P ...... 62, 85, 98 Brunak S ...... 67, 78 Cilia E ...... 76 De Baets B ...... 85, 86, 98 Brunetti P ...... 92 Claesen J ...... 73 De Bellis G ...... 85 Brunner HG ...... 61 Clapham P ...... 66 De Bleser P ...... 83 Bryne JC ...... 53 Clevert D-A ...... 70 de Boer RJ ...... 81 Budinska E...... 83, 95 Cloots L ...... 79, 85 De Bondt A ...... 70 Buggert M ...... 94 Clote P ...... 99 de Brevern A ...... 73, 75 Bujnicki JM ...... 71 Coates G ...... 66 de Brevern AG ...... 73 Bundschus M ...... 91 Cobelli C ...... 82 De Clercq I...... 88 Burzykowski T ...... 73 Cocozza S ...... 67 De Grave K ...... 66 Buts L ...... 75 Colak R ...... 58 De Jaeger G ...... 99 Caboche S ...... 78 Collado-Vides J ...... 63 De Jager VCL ...... 64 Cadet F ...... 75 Collins MO ...... 62 de Jong H ...... 55 Cai J ...... 54 Conesa A ...... 66, 85 De Jong J ...... 99 Calkiewicz J ...... 99 Conklin BR ...... 86 De Keersmaecker S...... 82 Callieri M...... 73 Constanzo M ...... 88 De Knijf J...... 92 Calura E ...... 93 Conte MG ...... 77 de la Fuente A ...... 91 Camacho RJ ...... 94 Coolen A ...... 82 De Leo V ...... 91 Campagne F ...... 66 Cools J ...... 64 de Ligt J ...... 61

103

ECCB10 Presentations & Posters Author Index

De Loof K ...... 98 Dobson P ...... 90 Fazius E ...... 63 De Maeyer M ...... 94 Dokanehiifard S-A ...... 64 Feenstra A ...... 81 De Maria E ...... 89 Dom G ...... 99 Feenstra KA ...... 65, 90 De Moor B79, 83, 90, 93, 99, Domingues FS ...... 75 Felder M ...... 62 100 Domingues F-S ...... 78 Ferkinghoff-Borg J ...... 73, 74 De Paepe A ...... 80 Dominy J ...... 101 Ferré F ...... 98 De Raedt L ...... 84 Donato M ...... 93 Fierro C ...... 79 de Ridder D ...... 45 Doncheva N-T ...... 78 Fiers M ...... 100 De Ridder D ...... 95 Donepudi M ...... 66 Fieuw A ...... 80 de Ridder J ...... 98 Dopazo H ...... 70 Finian M ...... 64 De Rijk P ...... 76, 92 Dopazo J ...... 66, 85 Fiorucci S ...... 72 de Ronde J ...... 95 Dore J ...... 95 Fischer A...... 62 De Roure D ...... 35 Dorff KC ...... 66 Fisher P ...... 35, 90 De Schrijver J ...... 62, 67 Draghici S ...... 93 Flatters D ...... 62 De Smet R...... 82, 85 Droc G ...... 77 Foekens J ...... 93 De Smet W ...... 85, 98 Drula M ...... 75 Folch B ...... 72 De Vos P ...... 85, 98 Duarte I ...... 71 Fontana P ...... 79, 86 De Vuyst L ...... 80 Dubinina Y ...... 92 Forrest LR ...... 61 de Waal L ...... 81 Dunlop I ...... 35 Forsberg R ...... 39 De Wilde B ...... 66, 67 Dunning M ...... 99 Foster SD ...... 81 Deane C ...... 75, 78, 92 Dupont P ...... 93, 95 Fostier J ...... 68 Deane CM ...... 57, 75, 92 Durot M ...... 92 Francke C ...... 77 Defoin-Platel M ...... 78, 87 Dutilh BE ...... 62 Frankish A ...... 62 Defrance M ...... 62 Edwards R-J ...... 64 Fraternali F ...... 82 Dehouck Y ...... 72 Eeckhout D ...... 99 Frederiksen J ...... 78 del Pozo A ...... 63, 76 Efroni S ...... 96 Fredlund ...... 80 del Val C ...... 65 Egas C ...... 84 Fredriksson R ...... 68 Del-Favero J ...... 76, 92 Ehrenberger T ...... 80 Free R ...... 86 Delgado-Eckert E ...... 90 Eils R ...... 60, 80, 90, 92, 100 Freudenberger M ...... 80 Delorenzi M ...... 83, 95 Ekseth O ...... 68 Frickenhaus S ...... 68, 70 Delzenne NM ...... 79 El Baroudi M ...... 93 Froehlich H ...... 90 Demeester P ...... 68, 88 Elgar G ...... 71 Fröhlich H ...... 56, 88 Demoulin JB ...... 79 Emig D ...... 38 Frumence E ...... 71, 75 Denamur E ...... 92 Emmett W ...... 61 Fuku N ...... 68 Deng M ...... 66 Engelen K ...... 79, 80, 85 Fundel-Clemens K ...... 92 Deravel J ...... 78 Engin B ...... 91 Furlanello C ...... 82 Derbinski J ...... 80 Engram J ...... 91 Furlong LI ...... 91 Descallar J ...... 98 Enøe Johansson K ...... 73 Gabriel D ...... 98 Detours V...... 83, 87, 99 Erhard F ...... 46 Gade S ...... 83 Devaux Y ...... 96 Ernst A ...... 88 Gaenzler C ...... 38, 96 Devignes MD ...... 99 Esmaielbeiki R ...... 89 Gajula P ...... 37 Devignes M-D ...... 90 Essaghir A ...... 79 Galagan J ...... 61 Devillé J ...... 73 Ester M ...... 58 Galle P ...... 96 Dhananjaneyulu V ...... 92 Esteves A ...... 77 Galvão V ...... 89 Dhoedt B ...... 68 EuResist GEIE partners ...... 35 Galzitskaya OV ...... 61 Di Camillo B ...... 82, 86 Evelo C ...... 37 Gambin A ...... 77, 96 Di Cunto F ...... 57, 100 Evelo CT ...... 86 Gambin T ...... 96 Di Silvestre D ...... 92 Evers MJ ...... 40 Gao J ...... 86 Diao L ...... 68 Ezcurdia I ...... 63 Gao Y ...... 81 Dias Maciel C ...... 59 Ezkurdia I ...... 62 Garagna S ...... 85 Dias Z ...... 69 Ezra E ...... 65 Garcia F ...... 66 Diessl N ...... 90 Faber K ...... 62 García Jenifer García ...... 98 Dimitriadis D ...... 69 Fack V ...... 62 García-Gil MR ...... 80 Dingli D ...... 97 Fages F ...... 54, 84, 89 Garg A ...... 81 Dinkelacker M ...... 80 Fairley S ...... 66 Gautier C...... 69 Disfani FM...... 49 Fallahi H ...... 64 Gautier L ...... 67 Dittrich M ...... 95 Farkash A ...... 96 Gay S ...... 54 Dittwald P ...... 77 Fauré A ...... 92 Geerdens E ...... 64 Divoli A ...... 87 Faust K ...... 93 Geiger D ...... 96 Doallo R...... 70 Favorov AV...... 66 Gelly J-C ...... 62, 73

104

ECCB10 Presentations & Posters Author Index

Gerstein M ...... 74, 88 Hammond-Kosack K ...... 87 Huang X ...... 73 Geurts P ...... 67, 82, 88 Han S ...... 88 Hubbard T ...... 62, 66 Gevaert O...... 81 Hancock JM ...... 87 Huculeci R ...... 75 Gfeller D ...... 88 Hanley S ...... 78 Hugenholtz J ...... 94 Ghoorah AW ...... 90 Hansen DS...... 71 Hugo K ...... 81 Gilissen C ...... 61 Hanspers K ...... 86 Hulpiau P ...... 70 Gill S ...... 90 Harrison CM ...... 95 Hulsegge B ...... 86 Gillis W ...... 85 Harrow J ...... 62 Hulsman M ...... 90 Ginalski K ...... 74, 78 Harvill ET ...... 82 Huynen M ...... 71 Giorgi F-M...... 79 Hassan A ...... 87 Huynen MA ...... 62 Giovannoni SJ ...... 68 Hassan BA...... 94 Huynh-Thu VA ...... 82 Giraud M ...... 99 Hassani-Pak K ...... 78 Huys G ...... 80 Glaab E ...... 93 Hastings R ...... 86 Hvidsten TR ...... 79 Glatting K-H ...... 62 Havgaard JH ...... 65 Hypergenes partners ...... 35 Glick G ...... 65 Haviv I ...... 51, 64, 84 Iacucci E ...... 90 Gloerich J...... 62 Hazelwood LA ...... 64 Ideker T ...... 50, 84 Glun čić M ...... 65 He Y ...... 63 Ikegami K ...... 91 Goble C ...... 35, 98 Hedge S ...... 83 Imoto S ...... 82 Goessler G ...... 55 Hehir-Kwa JY ...... 61 Incitti R ...... 95 Goethals B ...... 92, 99 Heijmans B ...... 94 Iribarne F ...... 98 Gogol-Döring A ...... 64 Heinäniemi M ...... 87 Ironi L ...... 79 Gohr A ...... 69 Hellemans J ...... 66 Irrthum A ...... 82 Gollapudi VLS ...... 86 Helms V ...... 66, 76, 92 Ishchukov I ...... 79 Gonçalves JP ...... 81, 87, 88 Hendriks M ...... 37 Ison J ...... 53, 78 Goncearenco A ...... 49, 77 Hendy M ...... 70 Ittrich C ...... 92 Good JM ...... 62 Henjes F ...... 56, 88 Jacques P...... 78 Goodlett D ...... 99 Heringa J ...... 65, 81, 90 Jaeger S ...... 92 Gordon SM ...... 86 Herman D ...... 79 Jaffrezic F ...... 81 Gorodkin J ...... 65 Hermans K ...... 82 James N ...... 66 Gorup C ...... 67 Hermjakob H ...... 41, 101 Janeži č D...... 71 Goto N ...... 82 Hernández-Lobato D ...... 95 Janky R ...... 82 Göttgens B ...... 81 Herrmann C ...... 62 Janowska-Sejda E ...... 87 Götz S ...... 85 Hersen P ...... 84 Jansen M ...... 37, 94 Grassi E ...... 100 Heuser P ...... 72 Janssen P ...... 84, 100 Gray WA ...... 98 Hibberd J-M ...... 67 Janssen PJ ...... 100 Grimbs S ...... 100 Higuchi T ...... 55, 82 Jardin C ...... 89 Grondin M ...... 75 Hijikata A ...... 85 Jensen LJ...... 68 Gront D ...... 72 Hindle M ...... 78 Jimenez R ...... 41 Grosse I ...... 69 Hirschberg K ...... 65 Jimeno Yepes A ...... 87 Grossman Z ...... 94 Ho C-W ...... 91 Jonassen I ...... 45, 53, 78 Grote A ...... 87 Ho I ...... 86 Jones DT ...... 73 Groth M ...... 62 Ho N ...... 100 Jones N ...... 89 Gschwandtner S ...... 80 Hochreiter S ...... 70 Joosten HJ ...... 75 Gu J ...... 68 Hoeng J ...... 79 Joseph A ...... 53, 75, 78 Guignon V ...... 77 Hofacker IL ...... 48 Joshi A ...... 79 Guina T ...... 99 Hollunder J ...... 93 Joubert F ...... 95 Guns T ...... 66, 84 Holzwarth J ...... 82 Jozefczuk S ...... 81 Gupta R ...... 67, 78 Höner zu Siederdissen C ...... 48 Juhos S ...... 43 Gursoy A ...... 88, 89, 91 Hong M-P ...... 87 Jung E ...... 78 Guryev V ...... 61 Hooghe B ...... 83 Jurman G...... 82 Gyenesei A ...... 84 Horiike T ...... 72 Kaderali L ...... 60, 100 Haas J ...... 74, 75 Horlings R ...... 37 Kafkas S ...... 87 Habash D ...... 78 Hosseini-Nasab S-M-E ...... 64 Kahlem P ...... 41 Hackenberg M ...... 64 Hotz-Wagenblatt A ...... 62 Kahn CL ...... 46 Hahn U ...... 87 Hourlier T ...... 66 Kahn D ...... 65 Haiminen N ...... 41 Hristov BH ...... 46 Kaipa KK ...... 45 Haley C ...... 84 Hsiung C ...... 91 Kaiser R ...... 94 Hamada M ...... 48, 75 Hu F ...... 95 Kalaš M ...... 53, 78 Hamelryck T ...... 73, 74 Huang C-H ...... 97 Kalender Z ...... 64 Hamer R...... 78 Huang H ...... 47, 59 Kamijo S ...... 97

105

ECCB10 Presentations & Posters Author Index

Kaminska B ...... 80 Kono N ...... 91 Lavezzo E ...... 86 Kampenusa I ...... 76 Konopka BM ...... 74 Lawson D ...... 85 Kandasamy K ...... 85 Kooter R ...... 95 Ła źniewski M ...... 74, 89 Kar G ...... 88 Korenblat K ...... 68 Le Fèvre F ...... 92 Karathia HM ...... 67 Korf U ...... 56, 88 Le Pera L ...... 63, 98 Karaya HG ...... 77 Kors J ...... 87 Lebherz C ...... 80 Karcz SR ...... 98 Koscielny G...... 85 Lebrun S ...... 79 Karczmarski J ...... 77 Kossida S ...... 69 Leclère V ...... 78 Karlsson AC ...... 94 Köszegi D ...... 84 Lecuit T ...... 81 Karypis G...... 74 Kotulska M ...... 74 Lee H-M ...... 87 Kato Y ...... 48, 75 Kowalczyk A...... 51, 64, 84 Lee K ...... 45 Kauffman C ...... 74 Kozhenkov S ...... 92 Lefever S...... 66, 80 Kedarisetti KD ...... 49 Kranz A-J ...... 90 Leffers H ...... 67 Keerthikumar S ...... 85 Krasnogor N ...... 93 Legradi J ...... 100 Kelder T ...... 86 Krawczyk K ...... 78 Leidel J ...... 40 Kell DB ...... 52 Kreil D-P ...... 61 Lemaitre C ...... 69 Kelly K-A ...... 67 Kreis S ...... 84 Lemière F ...... 99 Kelm S ...... 75 Kremer A ...... 37, 96 Lenaerts T ...... 75, 97 Kelso J ...... 62, 71 Krier F ...... 78 Lenart J ...... 80 Kersey P...... 85 Kristensen VN ...... 96 Lengauer T ...... 54, 75 Keskin O ...... 88, 89, 91 Kröger S ...... 70 Leonardi E ...... 89 Khafizov K ...... 61 Kroll KM ...... 82 Leroy B ...... 100 Khaldi N ...... 64 Krupp M ...... 96 Leser U ...... 70, 92 Khalifa W ...... 76 Küffner R ...... 86 Letunic I ...... 68, 94 Khang TF ...... 69 Kuhn H ...... 69 Ley N ...... 100 Khayatt B I ...... 77 Kuiper M ...... 68, 86 Leys N ...... 84 Khutko V ...... 84 Kuipers O ...... 94 Li C ...... 87 Kido N ...... 85, 91 Kuipers RKP ...... 75 Li GD ...... 94 Kiefer F ...... 75 Kulakovskiy I V ...... 66 Li P ...... 35 Kille P ...... 98 Kull M ...... 83 Liang S ...... 54 Kim E ...... 67 Kumar A ...... 60, 100 Liebel U ...... 100 Kim JH ...... 83 Kumar G ...... 92 Liekens A ...... 92 Kim PM ...... 88 Kumlehn J ...... 84 Lietz M ...... 79 Kim, JH...... 62 Kümmel A ...... 98 Lima AN ...... 71 Kinston S ...... 81 Kumuthini J ...... 101 Lima J ...... 95 Kivinen V ...... 94 Kuo SC ...... 78 Lima-Mendez G ...... 68 Klambauer G ...... 70 Kurgan L ...... 49 Lin C-Y ...... 91 Klau GW ...... 95 Kurths J ...... 85 Lindi B ...... 68 Kleerebezem M...... 64, 82, 91 Kutahya O ...... 64 Lingjærde OC ...... 96 Klein ML ...... 99 Kyewski B ...... 80 Lingner T ...... 70 Klein-Seetharaman J ...... 57 Łabaj P-P ...... 61 Linial M ...... 50, 77, 78 Kleywegt G ...... 42 Labuschange P ...... 61 Links MG ...... 98 Klie S ...... 81 Lackner P ...... 80 Lionetti V ...... 92 Klijn C ...... 93 Laganeckas M ...... 64 Lips E ...... 95 Klingström T ...... 85 Laiho A ...... 84 Lirk G ...... 80 Kloosterman W ...... 61 Laimer J ...... 80 Liu H ...... 100 Klüver N ...... 100 Lämmle B ...... 92 Lobanov MY ...... 61 Knight JR ...... 62 Lamond AI ...... 63 Lohse M ...... 79 Knogge W ...... 62 Lamzin V ...... 72 London N ...... 91 Ko M-T ...... 91 Lamzin VS ...... 72 Loni T ...... 72, 73 Kocabas F ...... 97 Lancaster O ...... 86 Lopez G ...... 62 Kodira CD ...... 62 Lance B ...... 98 López G ...... 76 Koehl P ...... 73 Landau D ...... 96 Lopez R ...... 35, 98 Kok J ...... 94 Landuyt B ...... 66 Loytynoja A ...... 68 Kolde R ...... 83 Lange S ...... 38 Ludden V ...... 37 Konc J ...... 71 Lanzén A ...... 45 Lund O ...... 94 König J ...... 67 Laporte MA ...... 77 Lundegaard C ...... 94 Konig R ...... 90 Larrañaga P ...... 94 Luo Q ...... 78 König R ...... 60, 100 Laukens K ...... 80, 92, 99 Lush MJ ...... 86 Konings P ...... 67 Laurent T ...... 35, 98 Luyten W ...... 66

106

ECCB10 Presentations & Posters Author Index

Lv Q ...... 73 Mennerich D ...... 92 Mullikin JC ...... 62 Lyall A ...... 41 Menschaert G ...... 66 Münsterkötter M ...... 62 Lynch A ...... 99 Mercer S ...... 37 Munyiri SW ...... 77 Maas T ...... 96 Merdes G ...... 90 Munyua JK ...... 77 Macintyre G ...... 51, 84 Mergeay M ...... 84, 100 Murgia A...... 89 Macko M ...... 68 Messemaker T ...... 86 Muro EM ...... 81 Madan Babu M ...... 82, 88 Mestdagh P ...... 80 Murray L ...... 39 Madeira SC ...... 81 Metahit Consortium, The ..... 95 Muturi PW ...... 77 Maere S ...... 79 Metzger J ...... 92 Mwololo JK ...... 77 Maes F ...... 74 Meyer F ...... 63 Myasnikova E ...... 98 Mahachie John JM ...... 81, 98 Meysman P ...... 80, 85 Myers CL ...... 88 Maietta P ...... 62, 76 Michaelides D ...... 35 Naamati G ...... 50, 77 Maisinger K ...... 39 Michailidis G...... 52, 93 Naeem H ...... 86 Makeev V ...... 66, 69, 83 Michaut M ...... 88 Nagao H ...... 55, 82 Malde K ...... 45 Micheletti C...... 97 Nagashima T ...... 82 Malik BK ...... 72 Michoel T ...... 58, 79, 88 Nam D ...... 52 Mang A ...... 90 Mieczkowski J ...... 80 Nanasi M...... 62 Manoharan M ...... 71 Miele G ...... 67 Nánási M...... 61 Marba M ...... 85 Mikut R ...... 100 Nap JPH ...... 71 Marcatili P ...... 63 Miller J ...... 62 Napoli A ...... 99 Marchal K .... 79, 80, 82, 84, 85 Milward D ...... 87 Naqvi S-A-H ...... 97 Marco M ...... 91 Minai R ...... 72 Narayanan R ...... 45 Margelevi čius M ...... 64, 73 Miranda JGV ...... 89 Natowicz R ...... 95 Mariani V...... 74 Miranda-Saavedra D ...... 81 Naume B ...... 96 Marquardt J ...... 96 Mironov V ...... 68, 86 Naval-Sanchez M...... 65 Marsh J ...... 52 Missier P ...... 35 Navarro-Quezada A ...... 62 Marshall MS ...... 87 Mitrofanov S ...... 69 Nazarov P V ...... 84 Martens C ...... 69 Mitterecker A ...... 70 Nazarov PV...... 96 Martens J ...... 93 Miyano S ...... 82 Nebel J-C ...... 61, 74, 89 Marthey S ...... 81 Mizianty MJ ...... 49 Nenadic A ...... 35 Martin ACR ...... 100 Mlynárová L ...... 71 Nepomuceno I ...... 96 Martin AJM ...... 65 Mohan S ...... 85 Nersting J ...... 67 Martin J ...... 75 Molineris I ...... 57, 100 Netotea S...... 79 Martin MAM ...... 63 Molloy M ...... 98 Neuvirth H ...... 35, 96 Martin MJ ...... 43 Mons B ...... 86 Neven F ...... 90 Martinez P ...... 82 Mons BM ...... 87 Ng Fuk Chong M ...... 71 Martini M...... 82 Monsieurs P...... 84, 100 Ng K-L ...... 97, 101 Marti-Renom MA ...... 75 Monteiro P...... 55 Nguyen N ...... 47, 59 Marynen P ...... 96 Montevecchi FM ...... 50, 84 Nielsen BF ...... 67 Masecchia S ...... 76 Moors H ...... 84 Nielsen M ...... 94 Massingham T ...... 68 Moraes Pataro C-D ...... 95 Niemi J ...... 80 Matroud A ...... 70 Moreau Y66, 67, 81, 83, 87, Nijkamp J ...... 45 Matula P...... 60, 100 88, 90, 94 Nijssen S ...... 84 Mauri PL ...... 92 Moretto M ...... 97 Nikerel IE ...... 95 Mayer MA ...... 91 Morgan H ...... 87 Nikiforova V ...... 85 Mayr A ...... 70 Morgenstern F ...... 76 Nikoloski Z ...... 56, 100 McArt D...... 97 Morigen ...... 80 Niranjan M ...... 93 McDermott P ...... 52, 99 Morin N ...... 100 Nitsch D ...... 87, 88 McDowall MD ...... 63 Mornico D ...... 92 Nolan T ...... 85 McHardy A ...... 77 Moser F ...... 58 Nordgard SH ...... 96 Médigue C ...... 92, 100 Mosig A ...... 66 Nordström KJ ...... 68 Medina-Rivera A ...... 63 Mostaguir K ...... 75 Nowé A ...... 87, 99 Medvedeva Y ...... 69, 83 Movahedi S ...... 68 Nussinov R ...... 88, 89 Meganck S ...... 87, 99 Mueller-Roeber B ...... 84 Nykter M...... 94 Meinicke P ...... 70 Mühlhausen S ...... 70 Nzuobontane E ...... 35, 98 Melo J ...... 95 Mulas F ...... 85 Odendaal CJ ...... 95 Melquiond ASJ ...... 94 Mulder L ...... 95 Offmann B ...... 71, 75 Méndez Giráldez R ...... 74 Muley VY ...... 94 Ogawa R ...... 91 Mendonca EA ...... 87 Muller A ...... 84, 96 Ojeda F ...... 88, 90 Mendoza MR ...... 89 Müller T ...... 95 Okada M ...... 82

107

ECCB10 Presentations & Posters Author Index

Okuda S ...... 94 Plewczy ński D ...... 74 Reineke AR...... 68 Olechnovi č K ...... 73 Poch O ...... 99 Reinelt G ...... 90 Oliver JL ...... 64 Ponomarenko J ...... 92 Reinert G...... 57, 78 Omranian N ...... 84 Pontiggia F ...... 97 Reischl M ...... 100 Ongyerth M ...... 71 Pool R ...... 90 Reisinger F ...... 101 Onodera K ...... 97 Popovic D ...... 100 Ren X-Y ...... 79 Oraintara S ...... 59 Popovici R ...... 93 Reva O ...... 68 Orsini M...... 91 Popovici V...... 83, 95 Reva ON ...... 61 Ortiz AM ...... 91 Posada D ...... 70 Rey J ...... 73 Osella M ...... 81, 93 Potenza E ...... 79 Rezvoy C ...... 65 Oshita K ...... 85, 91 Potestio R ...... 97 Riaño-Pachón DM ...... 84 Osterhaus ADME ...... 81 Poulain P ...... 62, 72 Ridzon D ...... 80 Ostrowski J ...... 77 Poussin C ...... 79 Ríos D ...... 101 Oswald M ...... 90 Powers S ...... 78 Risch A ...... 62 Owen S ...... 35 Praveen P ...... 90 Risso D ...... 64 Pääbo S ...... 62 Prévost C ...... 72 Ritchie DW ...... 90 Paar P ...... 65 Procter JB ...... 63 Rito T ...... 57 Paar V ...... 65 Proost S ...... 68, 69 Riva P ...... 83 Pacheco JM ...... 97 Prosperi M ...... 94 Rivero Guedez D ...... 93 Pachikian BD ...... 79 Provero P ...... 57, 100 Rizk A ...... 89 Padmanabhan R ...... 69 Prüfer K ...... 62, 71 Rizzetto L ...... 93 Page M ...... 55 Ptak SE ...... 62 Rizzi E ...... 85 Paiardini M ...... 91 Puntervoll P ...... 53, 78, 86 Robert V ...... 64 Panchin A ...... 69 Pupin M ...... 78 Rocha J ...... 72 Panzeri L ...... 79 Pushker R ...... 64 Rodenhuis S ...... 95 Parida L ...... 40 Putintseva Y ...... 69 Rodrigues J ...... 94 Park CH ...... 62 Putta P ...... 62 Rodriguez JM ...... 63 Parker C N ...... 98 Qi Y ...... 57 Rodríguez JM ...... 76 Parker H ...... 71 Quast K ...... 92 Rodríguez J-M ...... 62 Parkin IAP ...... 98 Quattrone A ...... 77, 83, 93 Rogel-Gaillard C ...... 81 Pasmanik-Chor M ...... 65 Quillardet P ...... 100 Røgen P ...... 73 Passerini A ...... 76, 77 Raczy C ...... 39 Rohr K ...... 60, 100 Patricio M ...... 70 Raes J ...... 68, 95 Romero-Zaliz R ...... 65 Pattyn F ...... 66, 80 Rahmani H ...... 77 Romualdi C ...... 64, 93 Paulino Z-M ...... 77, 98 Rahmann S ...... 38 Rooman M ...... 72, 81 Pavlopoulos GA ...... 90 Raj U ...... 97 Roos M ...... 35, 86, 87 Peitsch M ...... 79 Raju R ...... 85 Rosandi ć M ...... 65 Perahia D ...... 71 Ramgolam R ...... 35 Rosen-Zvi M ...... 36, 96 Perez C...... 94 Ramon J ...... 74 Rosset S ...... 47 Perin C ...... 77 Ramos F ...... 95 Rosti G ...... 97 Permina E ...... 83 Ramsay M ...... 76 Rot G ...... 67 Perou CM...... 96 Ranade SS ...... 80 Rother K ...... 71 Peterson H ...... 83 Rangannan V ...... 82 Rouard M ...... 77 Petravic J ...... 91 Ranjan A ...... 94 Rousseau F ...... 75, 87 Pettifer S ...... 35, 53, 98, 99 Rapacki K ...... 53, 78 Rouzier R ...... 95 Pettifer SR ...... 52 Raphael BJ ...... 46 Rowicka M ...... 78 Philippsen A ...... 74 Rappoport N ...... 78 Ruffier M ...... 66 Philot EA ...... 71 Rasche F ...... 76 Rügheimer F ...... 99 Piccinelli P ...... 71 Rautschka M ...... 91 Russnes HG ...... 96 Pickavet M ...... 88 Ravasi T ...... 50, 84 Rutherford K-M ...... 67 Pico AR ...... 86 Ravid-Amir O ...... 47 Ryan D ...... 79 Pietrelli A...... 62 Rawlings C ...... 78, 90 Rychlewski G ...... 89 Pinelli M ...... 67 Re A ...... 77 Rye IH ...... 96 Pinheiro M ...... 84 Rebhan I ...... 60, 100 Rzhetsky A ...... 87 Pinna A ...... 91 Rebholz-Schuhmann D ...... 87 Sabarly V ...... 92 Pinto S ...... 80 Rebollido-Rios R ...... 63 Sacchi L ...... 85 Piro RM ...... 57, 100 Reddanna P ...... 69 Saci Z ...... 78 Pisabarro MT ...... 78 Reimand J ...... 83 Sadeh M ...... 91 Platzer M ...... 62 Reinders M ... 45, 83, 90, 94, 98 Sadovsky M ...... 69 Plewczynski D ...... 85, 89 Reinders MJT ...... 90, 95 Saeki Y ...... 82

108

ECCB10 Presentations & Posters Author Index

Saeys Y ...... 53 Schütz T A ...... 90 Sowdhamini R ...... 71 Saffarian A ...... 99 Schwede T ...... 71, 74, 75 Spang R ...... 91 Sagar PVN ...... 92 Schwikowski B ...... 99 Spaniol C ...... 66 Sagot M-F ...... 69 Schymkowitz J ...... 75, 87 Speleman F ...... 66, 80 Sagulenko V ...... 90 Scott LPB ...... 71, 72 Spooner W ...... 66 Sahi S ...... 72 Scott MS ...... 63 Squillario M ...... 76, 82 Saito J-T...... 37 Seal RL ...... 86 Srdanovic M ...... 66 Saito MM ...... 55 Searle S ...... 66 Srikantha A ...... 45 Saito M-M ...... 82 Secrier M ...... 90 Srinivasan N ...... 75 Sajitz-Hermstein M...... 56 Sedova M ...... 92 Srivastava A ...... 97 Saladin A ...... 72 Segata N ...... 77 Stach W ...... 49 Salari R ...... 58 Seifert M ...... 69 Staiger C ...... 95 Sales G ...... 64 Seiler M ...... 68 Stamm M ...... 61 Salgado-Osorio H ...... 63 Seiler S ...... 68 Stammet P ...... 96 Salmon-Divon M ...... 67 Sela N ...... 67 Stanton A ...... 95 Sambo F ...... 82 Selbig J ...... 81, 100 Staritzbichler R ...... 61 Sambourg L ...... 93 Semple C ...... 84 Stavropoulos I ...... 64 Samsonov SA ...... 78 Senn S ...... 73 Stefanni S ...... 84 Samsonova M ...... 98 Seroussi E ...... 65 Steininger A ...... 64 Sanavia T ...... 82, 86 Serrano L ...... 88 Stekel DJ ...... 79 Sanchez-Rodriguez A ...... 84 Serrão Santos R ...... 84 Stelle D ...... 72 Sánchez-Rodríguez A ...... 79 Sers C ...... 100 Stevens R ...... 35, 90, 98 Sand O ...... 62 Severgnini M ...... 85 Steyaert J-M ...... 74 Sanguinetti G ...... 93 Sezerman OU ...... 64, 86 Sticht H ...... 89 Santoyo J ...... 66 Sharma A ...... 45 Storms V ...... 85 Sanz F ...... 91 Shelest E ...... 63 Stovgaard K ...... 74 Saqi M ...... 78, 87 Shelest V ...... 63 Strähle U ...... 100 Sato K ...... 48, 75 Sherman J ...... 98 Strickert M ...... 69 Saumitou-Laprade P ...... 67 Shi J ...... 75 Strous M ...... 62 Saunders GI ...... 62 Shida K ...... 65 Subramanian S ...... 86 Schaadt NS ...... 76 Shields D ...... 95 Sucena É ...... 92 Schaap PJ ...... 75 Shields DC ...... 64 Sufi S ...... 35 Schachter V ...... 92 Shields D-C ...... 64 Sun H ...... 84 Schäfer R ...... 100 Shin M-Y ...... 87 Sun N ...... 99 Schatz F ...... 63 Shiraishi Y...... 82 Sun W ...... 96 Scheel T ...... 70 Shirak A ...... 65 Sun X ...... 85 Scheer M ...... 87 Shmulevich I ...... 94 Suratanee A...... 60, 100 Schenk A-D ...... 74 Shojaie A ...... 52, 93 Surendran P...... 95 Scheuber S ...... 74 Sicheritz-Ponten T ...... 67 Surkova S ...... 98 Schietgat L ...... 74 Sidhu SS ...... 88 Svatos A ...... 76 Schimmler M ...... 63 Siebourg J ...... 94 Swati ...... 97 Schiöth HB ...... 68 Siezen R ...... 94 Sykacek P ...... 61 Schlage W ...... 79 Siezen RJ ...... 64, 70, 77, 82 Sykulski M ...... 96 Schlicker A ...... 54 Sifrim A ...... 66 Szolkiewicz MS ...... 95 Schmid R ...... 92 Sikkema S ...... 37 Sztromwasser P ...... 86 Schmiegelow K ...... 67 Silvestri G ...... 91 Taboada G-L ...... 70 Schmitz S ...... 84 Singh B ...... 87 Taggs M ...... 35 Schneider R ...... 90 Singh V-B ...... 97 Takahashi K ...... 82 Scholz S ...... 100 Sippl MJ ...... 73 Talla E ...... 100 Schomburg D ...... 87 Slagboom P ...... 94 Talloen W ...... 70 Schomburg I ...... 87 Slonim N ...... 35 Tamaki S ...... 91 Schönhuth A ...... 58 Smaïl-Tabbone M ...... 90, 99 Taminau J ...... 87, 99 Schoofs L ...... 66 Smid EJ ...... 64 Tanaka M ...... 68 Schramm G ...... 90 Smid M ...... 93 Tandeau de Marsac N ...... 100 Schroeter A ...... 89 Smits MA ...... 86 Taneri B ...... 87 Schrynemackers M ...... 88 Soehngen C ...... 87 Tang A ...... 66 Schueler-Furman N...... 91 Soiland-Reyes S ...... 35 Tannier E ...... 69 Schuemie M ...... 86 Soliman S ...... 54, 89 Tanoh F ...... 35, 98 Schuit F ...... 81 Solovyov A ...... 68 Tarazona S ...... 66, 85 Schuster S ...... 89 Souza T ...... 95 Tari L ...... 54

109

ECCB10 Presentations & Posters Author Index

Tastan O...... 57 Ule J ...... 67 Vandenbogaert M ...... 86 Taubert J ...... 90 Uli čný J ...... 94 Vandepoele K ...... 68, 69, 70 Taudien S ...... 62 UniProt Consortium, The ..... 42 Vandesompele J ...... 66, 67, 80 te Pas MFW ...... 86 Urban M ...... 87 Vanhee P ...... 88 Tebaldi T ...... 77, 83, 93 Usadel B ...... 79 Vanneste E ...... 67 Teufel A ...... 96 Vaitinadapoulle A ...... 71, 75 Varoglu E ...... 87 Teusink B...... 95 Valencia A...... 62, 63, 76, 93 Vd Bergh T ...... 75 Tewatia P ...... 72 Valkenborg D ...... 73, 99 Velankar S ...... 42 Thakur V ...... 66 Vallaeys T ...... 100 Velasco R ...... 79, 97 Theunissen D ...... 94 Vallar L ...... 84 Veltman JA ...... 61 Thieffry D ...... 62, 81 Vallenet D ...... 92 Venclovas Č ...... 64, 73 Thiele S ...... 81 Van Bel M ...... 69 Venet D ...... 83, 87, 99 Thierry-Mieg N ...... 93 van Bochove K ...... 37 Venkataraman P ...... 45, 53 Thijs I ...... 79 Van Breusegem F ...... 88 Venkatesan A ...... 86 Thomas A ...... 97 Van Criekinge W ...... 66, 67 Vens C ...... 70 Thomas CM ...... 79 Van de Peer Y53, 58, 61, 63, Venselaar H ...... 63 Thomas M ...... 100 68, 69, 70, 79, 88, 93 Verbeeck N ...... 93 Thomas-Chollier M ...... 62, 63 Van de Plas R ...... 93 Verbeke G ...... 67 Thorisson GA ...... 86 van de Wiel MA ...... 97 Vermeesch J ...... 66, 67 Thorne D ...... 52, 99 Van Delm W ...... 83 Vermeirssen V ...... 79, 88 Thorrez L ...... 81 van den Akker E ...... 94 Vermeulen J ...... 80 Tkatšenko A ...... 83 van den Broek M ...... 45 Verschoren A ...... 92 Todt T ...... 82 Van den Bulcke T ...... 85 Verschueren E ...... 88 Toffalini F ...... 79 van den Ham HJ ...... 81 Verslyppe B ...... 85, 98 Toffolo G ...... 82, 86 Van der Meulen R ...... 80 Via A ...... 98 Tokár T ...... 94 Van der Sijde MR ...... 64 Vidotto M ...... 65 Toma A ...... 90 van der Spek P ...... 37, 96 Vieira G ...... 92 Tomáška L ...... 68 van Dijk ADJ ...... 90 Vilella A ...... 68 Tomita M ...... 85, 91 van Dorp M ...... 82 Villanueva E ...... 59 Tommerup N ...... 67 Van Dyck D ...... 90 Villar G ...... 89 Töpfer A ...... 53, 78 Van Dyk E ...... 83 Vilo J ...... 83 Toppo S ...... 86 Van Eyndhoven W ...... 67 Vinar T ...... 62 Torti C ...... 94 van Haagen H ...... 86 Vina ř T ...... 61, 68 Tosatto S ...... 89 Van Haagen HHHBM ...... 87 Visnovska M ...... 62 Tosatto SCE ...... 65 van Ham RCHJ ...... 90 Vissers LELM ...... 61 Touleimat N ...... 95 van Helden J . 62, 63, 79, 81, 93 Viswanathan GA ...... 92 Touzet H ...... 99 Van Hijum S...... 94 Vivien F ...... 65 Touzet P ...... 67 Van Hijum SAFT ..... 64, 70, 82 Vizcaíno JA ...... 101 Tramontano A ...... 63, 98 Van Houdt J ...... 66 Vlahovi ć I ...... 65 Tran HT ...... 67 Van Houdt R ...... 84 Vlaic S ...... 89 Tran VD ...... 74 Van Hummelen P ...... 80 Vo A ...... 47, 59 Tranchevent L-C ...... 81, 88, 94 Van Iersel MP ...... 86 Voet T ...... 67 Traulsen A ...... 97 van IJcken WF ...... 81 Vogel J ...... 66 Treptow W ...... 99 Van Laethem K ...... 94 Volkovich Z ...... 68 Tresch A ...... 90, 96 Van Landeghem S ...... 53 von Grotthuss M ...... 89 Tress M ...... 62, 76 Van Lishout F...... 81, 98 Vrancken G ...... 80 Tress ML ...... 63, 76 Van Loo P ...... 94, 96 Vranken W ...... 42, 72 Trimpalis P ...... 69 van Mulligen E ...... 87 Vreede B ...... 92 Troggio M ...... 97 van Nuland N ...... 75 Vriend G ...... 75, 99 Trooskens G ...... 67 van Ommen B ...... 37 Vroling B ...... 99 Troshin PV...... 63 van Ommen G-J ...... 86 Vu D ...... 64 Trukhina Y...... 77 Van Parys T ...... 61, 79, 88 Vu T-N ...... 99 Tsaftaris AS ...... 61 van Roy F ...... 70 Vyverman M ...... 62 Tsoka S ...... 87 Van Staveren WCG ...... 99 Waelkens E ...... 93 Tuefferd M ...... 70 Van Steen K ...... 81, 98 Wagner DR ...... 96 Tuffley C ...... 70 Van Uitert M ...... 99 Walde C ...... 77 Tuszynska I ...... 71 van Vliet M ...... 37 Waleron K...... 100 Tyagi M ...... 75 Vandamme AM ...... 94 Waleron M ...... 100 Uhlendorf J ...... 84 Vandamme P ...... 80 Walsh I ...... 65 Uitto P ...... 98 Vandekerckhove TTM ...... 66 Walter P ...... 92

110

ECCB10 Presentations & Posters Author Index

Walther D ...... 85 Wiersma A ...... 91 Yang P ...... 73 Wanchana S ...... 66 Wiesberg S ...... 90 Yap VB ...... 69 Wang J ...... 80 Wilhelm T ...... 93 Yeheskel A ...... 65 Wang R ...... 101 Will S ...... 63 Yli-Harja O ...... 94 Wang Z ...... 57 Williams A ...... 35 Yoshida R ...... 55, 82 Ward S ...... 73 Wilmotte A ...... 100 Yousef M ...... 76 Watanabe Y ...... 48, 75 Wilson NK ...... 81 Yu L ...... 62 Watson M ...... 66 Winterbach W ...... 45 Yumoto N ...... 82 Wattiez ...... 100 Wischnitzki E ...... 69, 70 Zaaraoui F ...... 81 Weckx S...... 80 Withers D ...... 35 Zacharias M ...... 72 Wehenkel L ...... 67, 74, 82, 88 Witters E ...... 99 Zadissa A ...... 66 Wei W ...... 84 Wittkop T ...... 38 Zagar L ...... 85 Weigman VJ ...... 96 Wlodarska-Kowalczuk M .... 99 Zamora-Rico I ...... 63 Weinhold N ...... 67 Wlodarski T...... 78 Zardoya R ...... 70 Weissenbach J ...... 95 Woelders H ...... 86 Zarrineh P ...... 79 Weiss-Solis DY ...... 99 Wolstencroft K ...... 35, 98 Zazzi M ...... 94 Weith A ...... 92 Wong L ...... 80 Zeron Y ...... 65 Weller JI ...... 65 Wood G ...... 98 Zetterberg A ...... 96 Wels M ...... 82, 91 Wood L ...... 76 Zhang C-C ...... 100 Welter DN ...... 98 Worning P ...... 71 Zhang S-D...... 97 Wen W ...... 73 Wright MW ...... 86 Zhang Y ...... 51 Weng C-W ...... 97 Wrobel B ...... 99 Zhao H ...... 85 Werhli AV ...... 89 Wróbel B ...... 69 Zhu X ...... 66 Werner S ...... 89 Wu J-Z ...... 73 Zikmanis P ...... 76 Wesbeek J ...... 37 Wu Y ...... 85, 100 Zimmer R ...... 46, 47, 86 Wesolowska A ...... 67 Wubulikasimu A ...... 97 Zini M-F ...... 72, 73 Wesselink J-J ...... 62 Xenarios I ...... 81 Zmasek CM ...... 77 Wessels L...... 83, 93, 95, 99 Xu M ...... 66 Zoppè M ...... 72, 73 Wessels LFA ...... 99 Yaffe Y ...... 65 Zou Y ...... 85 Westermann F ...... 90 Yakobson E ...... 65 Zuccotti M ...... 85 Weston J ...... 57 Yamada T ...... 68, 94 Zuccotti P ...... 83 White S ...... 66 Yamaguchi R...... 82 Zuñiga S ...... 66 Wiegels T...... 72 Yamauchi M ...... 82 Zupan B ...... 67, 85 Wiemann S ...... 56, 88 Yang L ...... 100 Zuzan C ...... 80 Wienecke-Baldacchino AK . 87 Yang L-Y ...... 73 Zyskowski M ...... 36

111