Report 2013/2014


Directors
Prof. Dr. Thomas Lengauer
Prof. Dr. Kurt Mehlhorn
Prof. Dr. Bernt Schiele
Prof. Dr. Hans-Peter Seidel
Prof. Dr. Gerhard Weikum

Scientific Advisory Board
Prof. Dr. Trevor Darrell, University of California, Berkeley, USA
Prof. Dr. Nir Friedman, Hebrew University of Jerusalem,
Prof. Dr. Pascal Fua, École Polytechnique Fédérale de Lausanne,
Prof. Dr. Thomas A. Funkhouser, Princeton University, USA
Prof. Dr. Jürgen Giesl, RWTH Aachen,
Prof. Dr. Alon Halevy, Google Research, Mountain View, USA
Prof. Dr. Renée Miller, University of Toronto, Canada
Prof. Dr. Yves Moreau, Katholieke Universiteit Leuven,
Prof. Dr. Nicole Schweikardt, Humboldt University Berlin, Germany
Prof. Dr. François Sillion, INRIA Rhône-Alpes, Grenoble, France
Prof. Dr. Emo Welzl, Swiss Federal Institute of Technology, Zurich, Switzerland
Prof. Dr. Gerhard J. Woeginger, Eindhoven University of Technology, Netherlands

Curatorship Board
Hon. Adv. Professor (Tsinghua) Dr. Reinhold Achatz, Head of Corporate Technology, Innovation & Quality, Thyssen Krupp AG, Essen
Dr. Siegfried Dais, Robert Bosch Industrietreuhand KG, Stuttgart
Christiane Götz-Sobel, Head of the editorial department of Science and Technology, ZDF, Munich
Prof. Dr. Joachim Hertel, CEO of Denkprozess GmbH, Saarbrücken
Prof. Dr. Henning Kagermann, President of the National Academy of Science and Technology (acatech), Munich
Annegret Kramp-Karrenbauer, Saarland Prime Minister, Saarbrücken
Prof. Dr. Volker Linneweber, President of Saarland University, Saarbrücken
Prof. Dr. Wolf-Dieter Lukas, German Federal Ministry of Education and Research, Bonn
Dr. Nelson Mattos, Vice President Engineering, EMEA, Google, Zurich, Switzerland
Prof. Dr. Wolffried Stucky, Institute of Applied Informatics and Formal Description Methods (AIFB), University of Karlsruhe
Prof. Dr. Margret Wintermantel, President of the German Academic Exchange Service (DAAD), Bonn

CONTENTS

7 PREFACE

8 THE MAX PLANCK INSTITUTE FOR INFORMATICS: OVERVIEW

14 DEPARTMENT OVERVIEW

DEPARTMENTS
14 DEPT. 1 ALGORITHMS AND COMPLEXITY
16 DEPT. 2 COMPUTER VISION AND MULTIMODAL COMPUTING
18 DEPT. 3 COMPUTATIONAL BIOLOGY AND APPLIED ALGORITHMICS
20 DEPT. 4 COMPUTER GRAPHICS
22 DEPT. 5 DATABASES AND INFORMATION SYSTEMS

RESEARCH GROUP
24 RG. 1 AUTOMATION OF LOGIC

25 THE MAX PLANCK CENTER

26 EXCELLENCE CLUSTER «MULTIMODAL COMPUTING AND INTERACTION»

28 THE RESEARCH AREAS

30 UNDERSTANDING IMAGES & VIDEOS
42 BIOINFORMATICS
58 GUARANTEES
66 INFORMATION SEARCH & DIGITAL KNOWLEDGE
80 MULTIMODAL INFORMATION & VISUALIZATION
92 OPTIMIZATION

100 SOFTWARE

102 IMPRS-CS

104 ALUMNI

124 NEWS

144 THE INSTITUTE IN FIGURES

146 JOINT ADMINISTRATION

148 INFORMATION SERVICES & TECHNOLOGY

152 COOPERATIONS

154 PUBLICATIONS

158 DIRECTIONS


PREFACE

The Max Planck Institute for Informatics regularly publishes a report for the general public. With this edition, we would again like to present some of the current topics, goals, and methods of modern informatics. We introduce the work of our Institute and hope to arouse your interest in the fascinating world of our science.

The Max Planck Institute for Informatics aims to be a beacon of research in informatics. We aim to have impact in the following ways: First, through our scientific work, which we disseminate mainly through publications and books, but also in the form of software and Internet services. Second, through the training of young scientists, particularly during their doctoral and postdoctoral phases. We are educating future leaders in science and industry. Third, by assuming a formative role in our field. We initiate and coordinate large research programs, and serve on important committees. Fourth, by attracting talent from within and outside the country. About half of our research staff of over 180 scientists comes from abroad. Fifth, through the transfer of our results into industry. These transfers take place through collaborative projects, spin-offs, and people. Sixth, by building a world-class competence center for informatics in cooperation with our partners: Saarland University with its departments of Informatics and Computational Linguistics as well as its Center for Bioinformatics, the German Research Center for Artificial Intelligence, and the Max Planck Institute for Software Systems.

We have continued to be very successful in all of these undertakings. Our success has become more visible through the erection of many new buildings around the Platz der Informatik on the Saarbrücken campus. In the next few months, the building for the Center of Internet Security, Privacy and Accountability (CISPA) will be finished at the east corner of Stuhlsatzenhausweg. We are in the final phase of the projects funded through the German Excellence Initiative: the Cluster of Excellence “Multimodal Computing and Interaction” and the “Saarbrücken Graduate School of Computer Science”. The recent evaluation of the Institute by our Scientific Advisory Board has, once more, been very positive.

This report is structured as follows: After an overview of the Institute and its departments and research groups, we present the main areas of recent work. These topics span several departments and will also be the focus of our work in the next years. This part is followed by a section on our alumni and a brief overview of current events. The last part of the report contains a selection of recent scientific publications and a compact presentation of the Institute through key indicators.

We wish you much enjoyment reading this report.

Thomas Lengauer
Managing Director

The Max Planck Institute for Informatics, an Overview

Information technology influences all aspects of our lives. Computer systems, hardware, software, and networks are among the most complex structures that have been constructed by man. Computational thinking is a new way of studying the universe. Basic research in informatics is needed to cope with this complexity, to lay the foundations for powerful computer systems, and to further develop computational thinking.


Basic research in informatics has led to dramatic changes in our everyday lives in recent years. This has become particularly clear in the last two decades: The worldwide web, search engines, compression processes for video and music, and secure electronic banking using cryptographic methods have revolutionized our lives just a few years after their discovery at universities and research institutes.

The Max Planck Society, the leading organization for basic research in Germany, reacted to these challenges by founding the Max Planck Institute for Informatics (MPI-INF) in Saarbrücken in 1990. In 2005, the Max Planck Institute for Software Systems (MPI-SWS) was established with sites in Saarbrücken and Kaiserslautern. There are departments with a strong emphasis on informatics in other Institutes of the Max Planck Society as well. The restructuring of the Max Planck Institute for Metal Research into the Institute for Intelligent Systems has further strengthened informatics within the Max Planck Society. Given the importance of the area, the establishment of further Institutes for informatics or related areas is desirable.

Goals
The Max Planck Institute for Informatics aims to be a beacon of research in informatics. We aim to have impact in the following ways:

First, through our scientific work, which we disseminate mainly through publications, but also in the form of software and internet solutions. Presently, we are concentrating on algorithms for very large, multimodal data. Multimodal includes text, speech, images, videos, graphics, and high-dimensional data.

Second, through the training of young scientists, particularly in the doctoral and postdoctoral phases. We are educating future leaders for research and business. Over 180 researchers are working at our Institute and remain with us for three years on average. In this way, we provide society with over 60 well-trained young scientists each year.

Third, by our role in the scientific field. We initiate and coordinate large research programs and serve on important committees, e.g., the “Wissenschaftsrat” or DFG Review Boards. The Institute has played a significant role in forming the Excellence Cluster “Multimodal Computing and Interaction” and the “Graduate School of Computer Science”.

Fourth, by attracting talent from within and outside the country. Half of the research staff of the Institute comes from outside Germany. This strengthens the talent base in Germany and establishes bridges to foreign countries.

Fifth, by transferring our results to industry. These transfers take place through collaborative projects, spinoffs, and people. Intel founded the Intel Visual Computing Institute in 2009 together with the UdS (Saarland University), the DFKI (German Research Center for Artificial Intelligence), the Max Planck Institute for Software Systems, and the Max Planck Institute for Informatics.

To establish a start-up culture and support start-ups and licensing agreements, the Max Planck Society and Saarland University jointly created the IT Inkubator GmbH, located at the Starterzentrum of Saarland University. The IT Inkubator GmbH focuses on supporting scientists in founding and financing high-tech start-ups, as well as marketing the research results to industries nationally and internationally. The official kick-off event took place in March 2015 and was attended by the Saarland Prime Minister, Annegret Kramp-Karrenbauer; the President of the Max Planck Society, Prof. Dr. Martin Stratmann; and the President of Saarland University, Prof. Dr. Volker Linneweber.

Sixth, by building a world-class competence center for informatics in cooperation with our partners, Saarland University, the DFKI, and the MPI-SWS. We have been very successful in all of these endeavors in recent years.

History and Organization
The Max Planck Institute for Informatics was founded in 1990 with Kurt Mehlhorn as the founding director. He has directed the “Algorithms and Complexity” department since then. Harald Ganzinger was also involved from the very beginning and directed the “Logic of Programming” department until his death in 2004. A third department, “Computer Graphics”, followed in 1999 under the direction of Hans-Peter Seidel. Thomas Lengauer then joined in 2001 to direct the “Computational Biology and Applied Algorithmics” department. Gerhard Weikum has directed the “Databases and Information Systems” department since 2003. In the summer of 2010, the new “Computer Vision and Multimodal Computing” department, directed by Bernt Schiele, was formed.

In addition to the departments, the Institute is home to independent research groups: The “Automation of Logic” research group headed by Christoph Weidenbach as well as several junior research groups within the Excellence Cluster “Multimodal Computing and Interaction” and in the Max Planck and Stanford cooperation (Mykhaylo Andriluka, Roland Angst, Andreas Karrenbauer, Michael Kerber, Haricharan Lakshman, Yangyan Li and Dominik Michels).

Senior Researchers
The scientific work of the Institute's current 29 research groups is orthogonal to its organizational structure. These research groups are headed by so-called senior researchers. Directors and independent research group leaders automatically become senior researchers through their appointment to the Institute; moreover, postdoctoral researchers can become senior researchers by passing an acceptance process similar to the procedure for a professorial appointment to a university. The senior researchers currently with the Institute are: Ernst Althaus, Klaus Berberich, Andreas Bulling, Piotr Didyk, Mario Fritz, Martin Hoefer, Olga Kalinina, Andreas Karrenbauer, Michael Kerber, Thomas Lengauer, Christoph Lenzen, Tobias Marschall, Kurt Mehlhorn, Pauli Miettinen, Karol Myszkowski, Nico Pfeifer, Tobias Ritschel, Michael Sagraloff, Bernt Schiele, Marcel Schulz, Hans-Peter Seidel, Viorica Sofronie-Stokkermans, Jürgen Steimle, Thomas Sturm, Christian Theobalt, Jilles Vreeken, Christoph Weidenbach, Gerhard Weikum, and Andreas Wiese.

Research Topics
Algorithms are our central research subject. An algorithm is a general recipe for solving a class of problems. We enlarge the computational universe through the design of new and better algorithms. We prove the correctness of these algorithms and analyze their performance. We implement them and validate them through experiment. We make them available to the world in the form of software libraries and Internet services. We study inherent properties of computation and investigate techniques for robust software design.

We apply the algorithms to interesting application problems. Improved hardware has led to impressive gains in efficiency; improved algorithms can lead, and have led, to even larger gains. Here is an example: The state of hardware and algorithms in 1970 made it possible to calculate an optimal travelling salesman tour through 120 cities (a classic optimization problem and a recognized benchmark for computer performance). Adding one more city multiplies the runtime of the classical algorithm by 120, and adding another city multiplies it by 121; the runtime grows super-exponentially. Combining the classical algorithm from 1970 with the hardware speed made possible by today's technology, we can solve problems with 135 cities. It is the advance in algorithms that allows us to find an optimal route through thousands of cities today; if we were to rely solely on advances in hardware, such high performance would not even be possible in a thousand years.

The scientific problem of understanding hand-developed algorithms and their realization in computer programs has two important aspects. First, there is the question of whether the program calculates what it is intended to without “crashing”, “freezing”, or blocking all of the computer's resources. Second, there is also the question of whether the program is “efficient”, i.e., whether the best possible algorithm has been found. The department “Algorithms and Complexity” concentrates on the resource requirements of algorithms. The most important resources are running time (How long must I wait for the result of my computation?) and space requirement (Do I have enough storage space for my calculation?). The group develops new algorithms with improved running times and storage requirements and also studies the basic limits of computation (How much time and space are provably necessary for a computation?).

The “Automation of Logic” research group investigates logic-based generic procedures for solving “hard” combinatorial and decision problems. Typical logic-based applications are the verification of systems with a significant discrete proportion and optimization problems.

Nowadays, computers are used to model, represent, and simulate aspects of real or virtual worlds. Since the visual sense is a key modality for humans, computer graphics has become a key technology in modern information and communication societies. The department “Computer Graphics” researches the entire processing chain from data acquisition to modeling (the creation of a suitable scene representation) and image synthesis (the generation of views for human consumption). The following scientific challenges emerge from this: For the input side, we want to develop modeling tools for the efficient handling and processing of large data flows, and, on the output side, we seek new algorithms for the fast computation of high-quality views; these algorithms should exploit the capabilities of modern graphics hardware.

The department “Computational Biology and Applied Algorithmics” addresses the potential of computing for the life sciences. The life sciences have a high and ever-increasing demand for algorithmic support due to the recent turning of biology into a quantitative discipline. Algorithms and statistical analyses play a central role in the preparation and configuration of biological experiments and even more so in the interpretation of the biological data generated by them. As a consequence, the computer has become an indispensable tool for biology and medicine. Vast amounts of data need to be processed in modern biology, and the molecular interactions in the living organism are so complex that studying them is unthinkable without computer support. Therefore, bioinformatics methods have become essential for modern research on the diagnosis and treatment of illnesses.

The mission of the “Databases and Information Systems” department is the automatic acquisition of knowledge from Internet sources, text documents, and social media. The ultimate goal is a computer that knows everything Wikipedia contains in a formal knowledge representation and that can harness this for understanding natural language, deep text analytics, and logical inference. The department has been a trendsetter in the automatic construction of formal knowledge bases, which has led to major innovations like the Google Knowledge Graph and similar projects at Internet companies and other enterprises. Our knowledge base YAGO has been used in many projects around the world, including the IBM Watson System, which leveraged YAGO for semantic type checking when it won the TV quiz show Jeopardy in early 2011. Digital knowledge is a key asset for a variety of intelligent computer applications: searching for entities and complex relationships on the Internet or in enterprises, analyzing news articles and social media, summarizing entire text corpora, exploring and analyzing heterogeneous collections of Big Data, and many more.

The department for “Computer Vision and Multimodal Computing” investigates the processing and understanding of sensor information. Sensors range from relatively simple ones, e.g., GPS and acceleration sensors, to very powerful ones, e.g., cameras. They are embedded in more and more devices. Although the algorithmic processing of sensor information has advanced considerably, it is still mainly limited to low-level processing. In particular, we are far from being able to fully interpret

and understand sensor information. Such sensor understanding is, however, a necessary prerequisite for many areas such as man-machine interaction, the indexing of image and video databases, or for autonomous systems such as robots.

Excellence Cluster “Multimodal Computing and Interaction”
The Institute plays an important role in the Cluster of Excellence on “Multimodal Computing and Interaction”, which was established by the DFG (German Research Foundation) in 2007 and successfully renewed in 2012. All directors of the Institute are principal investigators of the Cluster, and Hans-Peter Seidel is the scientific coordinator.

The starting point of the Cluster's research program was the observation that there have been dramatic changes over the last two decades in the way we live and work. Twenty years ago, most digital content was textual, whereas today, its scope has expanded to include audio, video, and graphics that are available practically everywhere. The challenge is to organize, understand, and search this multimodal information in a robust, efficient, and intelligent way and to develop reliable and secure systems that allow intuitive multimodal interaction. The Cluster addresses this challenge. The term “multimodal” describes the different kinds of information such as text, speech, images, video, graphics, and high-dimensional data as well as the way in which it is perceived and communicated, particularly through vision, hearing, and human expression. The Cluster's primary goal is to improve the ability of computer systems to efficiently and robustly collect, then process and present data of various modalities. A further goal is to analyze and interpret large volumes of distributed, noisy, and incomplete multimodal data and then organize and visualize the obtained knowledge in real time. This is called multimodal processing. Everyday interpersonal communication is based on numerous different modalities; so, the Cluster's second major goal is to facilitate a similarly natural multimodal interaction of information systems, anywhere and anytime. The systems must consider environmental context, react to language, text, and gestures, and then respond in the appropriate modalities. This research program builds on existing strengths: The Cluster comprises the departments of Computer Science, Computational Linguistics and Phonetics, and Applied Linguistics at Saarland University, our Max Planck Institute for Informatics, the German Research Center for Artificial Intelligence, and the Max Planck Institute for Software Systems. The participating institutions have agreed on a joint long-term research program as the basis of their work. The university and the state government provide special assistance to the Cluster.

A prominent goal of the Cluster is to qualify and promote young scientists. Saarbrücken has acquired a reputation over the years as an “elite school” for young scientists. For this reason, the majority of the allocated funds have been devoted to the establishment of junior research groups. This concept has proven successful in the past years, and several young scientists have since taken professorships in Germany and abroad.

Publications and Software
The scientific results of the Max Planck Institute for Informatics are distributed through presentations, publications, software, and web services. Our publications appear in the best venues of the computer science field. Most publications are freely available in the Institute's repository [ http://www.mpi-inf.mpg.de/publications/ ]. Some of our results are available in the form of downloadable software or as a web service, for example CGAL (Computational Geometry Algorithms Library) and the clinically-used web service geno2pheno for HIV therapy support. Publications in the form of software and web services make our results available more directly and to a larger audience than classical publications.

Promotion of Young Scientists
A further goal of the Institute is to create a stimulating environment for young scientists, in which they can grow, develop their own research programs, and build their own groups. We concentrate on doctoral and postdoctoral training. Our 137 PhD students are trained in cooperation with the Graduate School of Computer Science at Saarland University and the International Max Planck Research School for Computer Science (IMPRS-CS), page 102. Our postdoctoral researchers participate in international collaborations such as the “Max Planck Center for Visual Computing and Communication” (a cooperation with Stanford University in the area of computer graphics), page 25, the “Indo Max Planck Center for Computer Science” (a cooperation with leading universities in ), page 133, and in our many EU projects.

We encourage our young scientists to establish their own research programs and then move on to other institutions. Since the founding of the Institute, several researchers from the Max Planck Institute for Informatics in Saarbrücken have joined other research facilities, many of them accepting professorships. The section on the alumni of our Institute in this report represents the sustained yield of our efforts.

Structure of the Report
After a brief introduction to the departments and research groups of our Institute, we survey our work by means of representative examples. We group the examples into subject areas, each of which spans at least two departments. This report ends with a section on alumni, a presentation of the IMPRS-CS, an overview of recent events, a presentation of the Institute in figures, infrastructure aspects, and a tabular listing of cooperations and publications.

Enjoy reading it.

Algorithms and Complexity

PROF. DR. KURT MEHLHORN

DEPT. 1

The group has existed since the founding of the institute. It currently has about 40 staff members and doctoral candidates. Our goals are

• carrying out outstanding basic research in the field of algorithms

• implementing our fundamental work in demonstrators and generally useful software libraries

• promoting young scientists in a stimulating workgroup environment.

We are successful in all three undertakings. We are effective through publications, software, and people. We publish in the best journals, present our results at the leading international conferences in the field, and our LEDA and CGAL software libraries are used worldwide. Many group alumni are in top positions domestically and abroad.

CONTACT

Algorithms and Complexity
Secretary Ingrid Finkler-Paul | Christina Fries
Phone +49 681 9325-1000
Email infi @mpi-inf.mpg.de [email protected]


Algorithms are the heart of all software systems. We work on the design and analysis of many facets of algorithms: combinatorial, geometric, and algebraic algorithms, data structures and search processes, very different computer models (sequential, parallel, distributed, flat data or hierarchical data), exact and approximate solutions, problem-specific methods and general heuristics, deterministic and randomized solutions, upper and lower bounds, and analyses in the worst case and on the average. We develop efficient algorithms for abstract versions of application problems as well as for concrete applications, e.g., optimized control of optical displays. A part of our theoretical insight flows into the implementation of software demonstrators and software libraries; we devote a part of our practical work to collaborating with companies.

Outstanding theoretical results in the past two years include a new algorithm for the highly efficient and, at the same time, provably correct isolation of zeros, so-called certifying algorithms for combinatorial problems in graphs (i.e., algorithms that compute not only a solution but also a proof of the correctness of the solution), and a simple (and therefore also practical) algorithm for calculating equilibrium prices in the Fisher and Walras market models. In addition to designing good algorithms, we strive to better understand fundamental algorithmic principles: we have found an explanation for how a certain slime mold can apparently determine shortest paths through mazes, we have given a mathematical explanation for the very fast dissemination of information in social networks, and we have explained why the quadratic runtime barrier for computing the Fréchet distance could not be broken for more than twenty years (a faster algorithm would imply a faster algorithm for the satisfiability problem).

Notable practical results in recent years include our contributions to the Computational Geometry Algorithms Library (CGAL) as well as to the search engine Complete Search and its application in one of the most significant computer science bibliographic databases. Our software library LEDA (begun in the 1990s) continues to enjoy strong demand in science and industry.

Our theoretical and experimental work strengthen each other. For example, the efficient implementations for dealing with curves and surfaces in CGAL are based on our theoretical work on isolating roots of polynomials and finding solutions to systems of polynomial equations. The combination of theoretical and experimental research in algorithms has become a widely accepted research direction. The DFG supported it through its priority program “Algorithm Engineering”. The group is involved in two international projects: the GIF project (geometric computing, with the University of Tel Aviv) and the Indo-German Max Planck Center for Computer Science (IMPECS). In addition, there is a regular international exchange through various means of support for excellent research. Our scientific staff members have received scholarships from the Humboldt Society, the European Union (Marie-Curie), and the German Academic Exchange Service (DAAD). In Germany, we participate in the trans-regional special research area AVACS (Automatic Verification and Analysis of Complex Systems).

The promotion of young scientists is an integral component of our work. We give lectures at Saarland University, which are addressed not only to students, but also to our doctoral candidates. Our annual summer school draws leading experts and distinguished international graduate students to Saarbrücken. Two of our alumni, Susanne Albers and Peter Sanders, received the Leibniz-Award of the German Research Foundation.
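The notion of a certifying algorithm mentioned above can be illustrated with a small, self-contained sketch. It is not taken from LEDA or CGAL; the function names `certify_bipartite` and `check_certificate` are hypothetical and chosen for illustration only. The example tests whether a graph is bipartite and returns a witness either way: a 2-coloring if it is, or an odd cycle if it is not. A much simpler checker can then verify the witness without trusting the algorithm that produced it.

```python
from collections import deque


def _odd_cycle(u, v, parent):
    """Climb the BFS tree from u and v (same depth) until the paths meet."""
    path_u, path_v = [u], [v]
    while path_u[-1] != path_v[-1]:
        path_u.append(parent[path_u[-1]])
        path_v.append(parent[path_v[-1]])
    # u -> ... -> meeting vertex -> ... -> v; the edge (v, u) closes the cycle.
    return path_u + path_v[-2::-1]


def certify_bipartite(n, edges):
    """Return ("bipartite", coloring) or ("odd_cycle", list_of_vertices)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [None] * n
    parent = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    parent[v] = u
                    queue.append(v)
                elif color[v] == color[u]:
                    # Same color across an edge: an odd cycle exists.
                    return "odd_cycle", _odd_cycle(u, v, parent)
    return "bipartite", color


def check_certificate(n, edges, kind, witness):
    """Verify a witness independently of how it was computed."""
    if kind == "bipartite":
        return (
            len(witness) == n
            and all(c in (0, 1) for c in witness)
            and all(witness[u] != witness[v] for u, v in edges)
        )
    # Odd cycle: odd length, and consecutive vertices joined by an edge.
    edge_set = {frozenset((u, v)) for u, v in edges}
    k = len(witness)
    return k % 2 == 1 and all(
        frozenset((witness[i], witness[(i + 1) % k])) in edge_set
        for i in range(k)
    )


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (2, 0)]
    kind, witness = certify_bipartite(3, triangle)
    print(kind, witness, check_certificate(3, triangle, kind, witness))
```

The design point is that the checker is short enough to be inspected by hand, so a user only needs to trust the checker, not the more involved search that produced the answer.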

Computer Vision and Multimodal Computing

PROF. DR. BERNT SCHIELE

DEPT. 2

The department was founded in 2010 and currently includes 30 scientists. The group's main research areas are computer vision with a focus on object recognition and 3D scene description as well as multi-sensor-based context recognition in the area of ubiquitous and wearable computing.

CONTACT

Computer Vision and Multimodal Computing
Secretary Cornelia Balzert
Phone +49 681 9325-2000
Email d2-offi [email protected]


Sensors such as cameras, GPS and accelerometers are being increasingly embedded in devices and environments, and they are already helpful in various ways. Computer-controlled processing of sensor information has made enormous progress but is generally limited to simple matters. This means, in particular, that devices and computers which have access to this sensor information do not fully interpret it and thus cannot truly understand their environment. The department is therefore concerned with the understanding of sensor information, using both powerful sensors, such as cameras, and embedded sensors, such as gyroscopes and accelerometers.

In the area of computer vision, the department deals with problems such as object recognition, one of the basic problems of image understanding. In recent years, this area of computer vision has made impressive progress, and the department has played a pioneering role in presenting several innovative approaches. One of these approaches recognizes and segments the object simultaneously, leading to significantly improved results in comparison with standard approaches. Current work has presented approaches for learning 3D models. These not only enable the robust detection of objects, but also allow the estimation of additional parameters that are essential for 3D scene understanding, e.g., the viewing direction.

Another central theme of the department is people detection and tracking using moving cameras. This problem is not only scientifically challenging but also has a wide variety of applications, such as image and video understanding, or in robotics and the automobile industry. For example, cars equipped with such a camera may predict the movements of pedestrians and therefore react to their behavior more effectively. The department has developed approaches that robustly detect people and track them over longer periods of time, even if they might be out of sight for a long time. A recently presented approach not only describes people but also entire 3D scenes, representing a further step towards complete image and scene understanding.

In addition to computer vision, the second central research area is the processing and understanding of multimodal sensor information. The underlying observation here is that an increasing number of computers and sensors can be found in our environment, in objects and even in our clothing. Context awareness and sensing is often seen as a means of making computing tasks sensitive to the situation and the user's needs. Ultimately, context awareness may support and enable seamless interaction and communication between humans and computing environments without the need for explicit interaction. In this area, the department has presented approaches to recognize long-term activities and to model personal daily routines. It was also possible to show that a person's interruptibility can be predicted with surprising accuracy using a few sensors embedded in their clothing.

The third research area of the department is machine learning. This plays an important role as a cross-cutting theme, as the other research areas make extensive use of probabilistic modeling and inference techniques. These allow, for example, the modeling of the uncertainties that exist with any sensor processing. In addition, they allow the use of large amounts of data, and can also elegantly integrate previous knowledge.

Computational Biology and Applied Algorithmics

PROF. DR. THOMAS LENGAUER, PH.D.

DEPT. 3 This department dates back to October 2001 and is directed by Prof. Dr. Dr. Thomas Lengauer. The department currently comprises about 20 scientists, performing research exclusively in computational biology.

CONTACT

Computational Biology and Applied Algorithmics
Secretary Ruth Schneppen-Christmann
Phone +49 681 9325-3000
Email [email protected]


The department focuses on topics more or less closely related to the diagnosis and therapy of diseases. At the molecular level, diseases can be understood in terms of anomalies in the biochemical circuitry of an organism (“Bioinformatics”, page 42). The building blocks of such biochemical networks are DNA, RNA, and proteins. The circuitry is realized by transient binding of the molecules among each other and with small organic ligands. In this manner, proteins catalyze chemical reactions, regulate the expression of genes, and transduce signals in and between cells.

Our research concentrates on two groups of diseases.

On the one hand, we investigate viral infections. The molecular understanding of such diseases requires the elucidation of the function of the relevant viral proteins and their interactions with the molecules of the infected patient as well as with drug molecules. So, analysis must be carried out at the level of the three-dimensional structure of the involved molecules.

Our methods are applied to concrete diseases such as AIDS (“Structural Insights into the Emergence of Resistance in the HIV-1 Protease”, page 46), Hepatitis B and Hepatitis C. AIDS plays a special role; with this disease, the Max Planck Institute for Informatics goes even a step further. We analyze the resistances of HI viruses against administered combination drug treatments (“Bioinformatical Support of HIV Therapy”, page 47). Furthermore, we investigate the pattern of geographical distribution of the virus (“Reconstructing Pathogen Transmission Networks Using Genomic Data”, page 48, and “Measuring the Influence of Every Node in a Network by its Expected Force of Infection”, page 49).

Diseases such as cancer, neurodegenerative diseases, or immunological diseases are based on different principles. Here, the starting point is an interplay between the genomes of the patient and the environment. Therefore, an early diagnosis of a disease like cancer can be based on the analysis of genetic or epigenetic aberrations of the relevant tissue (“Integrative Analysis of Cancer Data”, page 50). The epigenome (the aggregate of all chemical modifications of the DNA inside the nucleus as well as the chromatin enveloping it) is the key to the complex regulation of the cell, which moves out of kilter in such diseases. Charting the entire epigenome is one of the great challenges facing molecular biology in the next years (“Charting Epigenomes”, page 51, and “Analyzing the Three-Dimensional Organization of the DNA Inside the Nucleus”, page 52).

A more general approach to supporting the diagnosis of a large variety of genetic diseases is described in “Phenotype-Based Computational Clinical Diagnosis for Genetic Diseases”, page 53.

A large part of the method development in the department results in software systems, which are used worldwide by many academic, clinical, and also industrial users. Examples, which are presented in this issue (“EpiExplorer and RnBeads: Integrative Analysis of Epigenomic Data”, page 54, and “BiQ Analyzer HiMod for the Locus-Specific Analysis of DNA Modification Data”, page 55), include the field of epigenetics, the analysis of protein function and protein interaction networks as well as the optimization of AIDS therapies (“Bioinformatical Support of HIV Therapy”, page 47).

Additional topics on which we perform basic research include the evolutionary relationships in the viral kingdom (“Detection of Related Proteins in Unrelated Viral Species”, page 56) and analysis methods and software for analyzing variations in individual (e.g. human) genomes (“Structural Variation in Genomes”, page 57).

Our department is one of the main pillars of the Bioinformatics Center Saar, an inter-faculty center at Saarland University focusing on teaching and research in the area of bioinformatics. The department is a member of the German Arevir network and the European Consortia EuResist and CHAIN, all of which contribute to bioinformatics research on viral resistance development. Moreover, the department coordinates the bioinformatics for the German Epigenome Program (DEEP), sponsored by the German Federal Ministry of Education and Research (BMBF), and is a partner in the European research programs BLUEPRINT (epigenetics) and PREDEMICS (research on viruses with considerable epidemic potential).

Computer Graphics

PROF. DR. HANS-PETER SEIDEL

DEPT. 4 The Computer Graphics work group was founded in 1999 and now includes 40 scientists. An important characteristic of our work is the thorough consideration of the entire processing pipeline, from data acquisition to modeling and image synthesis (Visual Computing, or 3D Image Analysis and Synthesis). Typical for the field is the co-occurrence of very large data sets and the demand for fast, possibly interactive, high quality visual feedback. Furthermore, the user should be able to interact with the environment in a natural and intuitive way.

CONTACT

Computer Graphics
Secretary Sabine Budde
Phone +49 681 9325-4000
Email [email protected]


During the last three decades computer graphics has established itself as a core discipline within computer science and information technology. Ten years ago, most digital content was textual. Today it has expanded to include images, video, and a variety of graphical representations. New and emerging technologies such as multimedia, digital television, and digital photography combined with the rapid development of new sensing devices, telecommunication and telepresence, virtual reality, and 3D Internet further indicate the potential of computer graphics in the years to come. Typical for the field is the co-occurrence of very large data sets and the demand for fast, possibly interactive, high quality visual feedback. Furthermore, the user should be able to interact with the environment in a natural and intuitive way.

New approaches are needed from a scientific point of view in order to meet the above-mentioned challenges. An important characteristic in our department is, therefore, the thorough consideration of the entire pipeline, from data acquisition to modeling (creation of a suitable internal computer scene description) and image synthesis (generation of novel renderings). In our opinion, this point of view is necessary to exploit the capabilities of modern hardware, on both the input (sensors, scanners, digital photography, digital video) and the output (graphics hardware, advanced displays) sides. Moreover, we take into account existing digital footage when analyzing and synthesizing novel scene content. In particular, we seek to extract powerful priors from the abundance of digital visual data that can then assist us during the acquisition, reconstruction, editing, and image formation processes. Our long term goal is to develop methods that efficiently handle the huge amount of data during the acquisition process, extract structure and meaning from the abundance of digital data, and turn this data into graphical representations that facilitate further processing, rendering, and interaction.

The scientific activities of the Computer Graphics work group are embedded in a series of project activities on national, European, and international levels.

The Max Planck Center for Visual Computing and Communication was jointly established by the Max Planck Institute for Informatics and Stanford University in 2003 with support from the BMBF (German Federal Ministry of Education and Research). The collaboration has two intertwined goals: establish a joint research program in the area of visual computing and communication, and incorporate a strong career development component to alleviate the shortage of qualified faculty and scientists in information technology in Germany. The Max Planck Center fosters the professional development of a small number of outstanding individuals by providing them with the opportunity to work at Stanford as visiting assistant professors in the area of visual computing and communication for two years, and then return to Germany to continue their research as leaders of junior research groups at the Max Planck Institute for Informatics, and ultimately as professors at major universities.

Furthermore, the Computer Graphics group is significantly involved in the activities of the Cluster of Excellence on “Multimodal Computing and Interaction”. The Cluster of Excellence was established by the German Research Foundation (DFG) within the framework of the German “Excellence Initiative” in 2007 and was successfully renewed in 2012. Hans-Peter Seidel is the scientific coordinator of the Cluster.

Another important partner is the Intel Visual Computing Institute (Intel VCI) that was established in 2009 as a European hub for academic research in the field of visual computing. Intel VCI is a collaborative effort between on-campus organizations (Saarland University, German Research Centre for Artificial Intelligence, MPI for Software Systems and MPI for Informatics) and Intel, which sponsors the research.

Altogether, since the establishment of the group fifteen years ago, the institute has produced more than 40 faculty members in computer graphics and visual computing in Germany and abroad. The exceptional quality of this academic offspring is further documented by a number of prestigious national and international awards that current and former institute members have been able to attract over the years: German Pattern Recognition Award (3x), Eurographics Young Researcher Award (7x), ERC Starting Grant (5x), Eurographics Technical Contributions Award, Humboldt Research Prize, ERC Advanced Grant, DFG Leibniz Prize (2x), Eurographics Distinguished Career Award.

Databases and Information Systems

PROF. DR.-ING. GERHARD WEIKUM

DEPT. 5 The department, headed by Gerhard Weikum, pursues research in three areas:

1. Algorithmic extraction of knowledge from text and Web sources, and automatic construction of large knowledge bases.

2. Semantic language understanding by automatically detecting and disambiguating names and phrases and mapping them to entities and relations in data and knowledge bases.

3. Big Data analytics and explorative data analysis on structured data such as biological networks or product ratings as well as unstructured data such as news articles or online discussion forums.

CONTACT

Databases and Information Systems
Secretary Petra Schaaf
Phone +49 681 9325-5000
Email d5-offi [email protected]


The central theme and strategic mission of the department is the automatic acquisition of knowledge from Internet sources, text documents, and social media. Envision a computer that knows everything that Wikipedia contains in a formal knowledge representation and can harness this for understanding natural language, deep text analytics, and logical inference. Such a computer would be able to give precise answers to complex questions and may even be able to pass a high-school exam in specific subjects. The ultimate goal would be to pass the Turing Test: can a computer have a conversation with a human person such that the human cannot tell whether she communicates with a machine or a person?

The department has pursued this ambitious goal since 2005 and has been a trendsetter on the automatic construction of formal knowledge bases. The trend has later led to major results like the Google Knowledge Graph and similar projects at Internet companies and other enterprises. Our knowledge base YAGO is one of the three largest publicly available knowledge graphs. YAGO comprises nearly 10 million entities like people, places, movies, drugs, and others, as well as 200 million facts about these entities and relationships among them. YAGO has been used in many projects around the world, including the IBM Watson System, which leveraged YAGO for semantic type checking when it won the TV quiz show Jeopardy in early 2011. We are working on improvements and extensions of YAGO and knowledge-base technology in general, especially on acquiring temporal knowledge and commonsense knowledge about everyday objects, activities, and more.

Digital knowledge is key to a variety of intelligent computer applications: searching for entities and complex relationships in the internet or in enterprises, analyzing news articles and social media, summarizing entire text corpora, exploring and analyzing heterogeneous collections of Big Data. An application where we make intensive use of machine knowledge is our AIDA project on automatic discovery and tracking of entities in text and data.

In terms of algorithmic methodology, the group pursues the combination of logic-based and statistically-based techniques. Methods of this kind are central for knowledge extraction and also key assets for explorative data analysis on both structured data and unstructured text.

Many of the tools and systems developed by the group are publicly available as open source software and are used by other research groups around the world. These include, in particular:

1. Software libraries for the automatic construction and maintenance of the YAGO knowledge base, as well as the YAGO contents itself.

2. The RDF search engine RDF-3X, which can find complex patterns in large collections of RDF triples and graph-structured data very efficiently.

3. SOFIE, PROSPERA, PATTY, ClausIE and other tools for extracting relational facts from Web contents and textual sources.

4. AIDA, a knowledge-based linguistic system for high-precision detection and disambiguation of entities in English or German texts.

5. A variety of tools for explorative analysis of multivariate data, complex networks, matrices and tensors, as well as text corpora.

The department is involved in a number of externally funded projects. These include the EU project “LAWA: Longitudinal Analytics of Web Archive Data”, the ERC Synergy Grant “imPACT: Privacy, Accountability, Compliance and Trust in Tomorrow’s Internet”, and the DFG Excellence Cluster “Multimodal Computing and Interaction”. The group has received a Google Focused Research Award.

Automation of Logic

PROF. DR. CHRISTOPH WEIDENBACH

RG. 1 The independent research group “Automation of Logic”, under the direction of Prof. Dr. Christoph Weidenbach, serves the complete pipeline, from basic research on new logics to industrially used automated reasoning tools.

Logics are formal languages for reasoning with mathematically precise meanings and exact rules. A simple example of a logic is the system of linear equations that we know from high school, together with the variable elimination method for finding its solution. Logic was developed at the end of the 19th century in order to exactly describe and calculate general mathematical arguments. Our everyday language is not suitable for exact descriptions because of its ambiguities, among other things. Since the invention of computers and information technology, logics have been further developed for computer systems so that these could also be described precisely and their properties could be formulated, analyzed, and ultimately proven. Until about the mid-1990s, such analytical and proof efforts required massive manual interaction and, therefore, a great deal of time and money. There have since been major advancements in the automation of the methods.
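The high-school example of linear equations with variable elimination can be made concrete in a few lines. The following is only a minimal sketch in Python, not a tool of the research group; the function name `solve_linear_system` and the sample system are assumptions chosen for illustration. It uses exact fractions so that every elimination step follows the rules without rounding.

```python
from fractions import Fraction


def solve_linear_system(coefficients, constants):
    """Solve A x = b by variable elimination, using exact rational arithmetic.

    Raises ValueError if the system has no unique solution.
    """
    n = len(coefficients)
    # Build the augmented matrix [A | b] with exact fractions.
    rows = [
        [Fraction(value) for value in row] + [Fraction(constant)]
        for row, constant in zip(coefficients, constants)
    ]
    for col in range(n):
        # Find an equation in which the current variable still occurs.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system has no unique solution")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot equation so the variable has coefficient 1.
        pivot_value = rows[col][col]
        rows[col] = [value / pivot_value for value in rows[col]]
        # Eliminate the variable from all other equations.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


# Hypothetical example system: x + y = 3 and x - y = 1.
solution = solve_linear_system([[1, 1], [1, -1]], [3, 1])
print(solution)
```

Each loop iteration applies one exact rule (swap, scale, subtract), which is the sense in which such a calculus is a logic: the result follows mechanically from the rules.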

Proving or analyzing computer system properties is typically “hard”, that is, even if there are automatic procedures available, the potential number of calculation steps grows at least exponentially with the size of the problem. From the mid-90s to date, however, procedures for a number of practically relevant properties have been developed that work fully automatically in reasonable time. In sum, the application of logic was extended from mathematics to properties of computer systems in the broadest sense. In addition, it has been found that many of the techniques developed are generally successful on hard problems, even if they do not originate from the formal analysis of computer systems and mathematics. The application of analysis and proof techniques has thus once again extended to general hard problems in recent years, e.g., combinatorial optimization problems.
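To illustrate what “at least exponentially” means in practice, the sketch below checks a propositional formula by trying all truth assignments. This is a deliberately naive procedure written for illustration, not one of the group's methods; the function name `brute_force_sat` and the clause encoding (lists of signed variable indices) are assumptions. With n variables it may have to try 2^n assignments, which is exactly the growth that modern automated procedures try to avoid on practical inputs.

```python
from itertools import product


def brute_force_sat(num_variables, clauses):
    """Try every truth assignment for a formula in conjunctive normal form.

    Each clause is a list of nonzero integers: k means variable k is true,
    -k means variable k is false. Returns (assignment or None, tried), where
    tried is the number of assignments that were examined.
    """
    tried = 0
    for values in product([False, True], repeat=num_variables):
        tried += 1
        # A clause holds if at least one of its literals is satisfied.
        if all(
            any(values[abs(literal) - 1] == (literal > 0) for literal in clause)
            for clause in clauses
        ):
            return values, tried
    return None, tried


# Hypothetical formula: (x1 or x2) and (not x1 or x2) and (not x2 or x3).
model, tried = brute_force_sat(3, [[1, 2], [-1, 2], [-2, 3]])

# An unsatisfiable formula over 4 variables forces all 2**4 assignments.
no_model, exhausted = brute_force_sat(4, [[1], [-1]])
print(model, tried, no_model, exhausted)
```

Adding one more variable to the unsatisfiable example doubles the work, which shows why the size of the problem matters so much for naive procedures.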

Our research group develops automatic, logic-based procedures. These should be capable of computing solutions for a class of applications within an acceptable amount of time. We are particularly interested in the verification of controls, distributed protocols and systems, safety properties, software, solutions for decision and optimization problems, and in computation in specific domains such as linear and non-linear arithmetic. We aim to be able to answer increasingly complex questions fully automatically and within an acceptable time period. To this end, the efficiency of the formal analysis methods used today must be raised significantly. This is the goal of our research group, “Automation of Logic”.

CONTACT

Automation of Logic
Secretary Jennifer Müller
Phone +49 681 9325-2900
Email [email protected]

Max Planck Center for Visual Computing and Communication

PROJECT MANAGEMENT: PROF. DR. HANS-PETER SEIDEL

In 2003, the “Max Planck Center for Visual Computing and Communication” was established with substantial sponsorship from the BMBF (German Federal Ministry of Education and Research). The Max Planck Center links two globally leading institutes in this field, namely the Max Planck Institute for Informatics in Saarbrücken and Stanford University in the USA.

The cooperation’s focus is on fundamental research in the field of Visual Computing and Communication and includes, in particular, the subareas of image acquisition, image analysis, image synthesis, computer vision, visualization, human-computer interaction, and the exchange of image and video data over complex networks.

Strengthening Germany as a science and research location in the field of computer science

A major goal of the program is the development and promotion of young scientists. This is achieved by opening the way for particularly qualified young computer scientists to gain early scientific independence, with simultaneous close integration into an internationally competitive and stimulating scientific environment. In this context, particularly outstanding young postdoctoral graduates are given the opportunity to do independent research in a small work group for up to five years, under the supervision of two mentors from Germany and the United States. After a two-year stay at Stanford, where they have the status of visiting assistant professors (Phase I), the scientists return to Germany and continue their work as junior research group leaders at the Max Planck Institute for Informatics (Phase II). The second phase of the program is generally also open to outstanding postdoctoral graduates from other countries who would like to return here.

Current status

The Max Planck Center has now been in existence for more than ten years and counters the often observed “Brain Drain” to the USA by providing an attractive return perspective. The program contributes to the training and development of highly-qualified young scientists and, thus, strengthens the innovation capacity and international competitiveness of the country. Meanwhile, the Max Planck Center has gained a strong reputation as a real talent factory. Since launching the program in 2003, a total of 29 young scientists have completed the program. Of these, 21 have meanwhile received tenured faculty positions, fourteen of them in Germany, seven in Europe, and twelve as full professorships.

The Center’s success demonstrates that it is indeed possible for Germany to be successful in the global competition for the brightest talent. Some key elements of this successful program are its international focus and the flexible and highly dynamic research program, whose development strongly benefits from the input of the postdoctoral graduates. Further important elements are the early scientific independence of the young scientists with simultaneous tight integration into an internationally leading scientific environment, and last but not least the attractive return perspectives.

CONTACT

Prof. Dr. Hans-Peter Seidel
Secretary Sabine Budde
Phone +49 681 9325-4000
Email [email protected]

Excellence Cluster Multimodal Computing and Interaction

SCIENTIFIC COORDINATOR: PROF. DR. HANS-PETER SEIDEL

The Excellence Cluster “Multimodal Computing and Interaction” was established by the German Research Foundation (DFG) within the framework of the German Excellence Initiative in 2007 and successfully renewed in 2012.

The starting point of the research program in the Cluster was the observation that our living and working conditions have changed dramatically in the past two decades. Twenty years ago, most digital content was textual. Today, digital content additionally comprises the modalities of speech, audio, video, graphics, and more. Given these trends, the challenge is to organize, understand, and search multimodal information in a robust, efficient, and intelligent manner and to create reliable, privacy-friendly systems that efficiently support natural and intuitive multimodal interaction.

The Cluster of Excellence on Multimodal Computing and Interaction addresses these challenges. In this context, the term “multimodal” describes different kinds of information such as text, speech, images, video, and graphics and the way it is perceived and communicated, particularly through vision, hearing, and various forms of body language. Our first goal is to enhance the ability of computer systems to acquire, process and present different modes of data in an efficient, robust, and privacy-friendly way. We aim for systems that do the following: analyze and interpret multimodal information even when it is extensive, distributed, noisy, and incomplete; organize the obtained knowledge for powerful querying; and produce 3D visual output in real time. We refer to this type of computing as multimodal computing. Our second goal is to enable natural multimodal interaction with information systems anytime and anywhere, exploiting the wealth of modalities present in everyday human-to-human interaction. The system must be aware of each user’s environment and situation, must react to speech, text, and gesture, and must respond in the appropriate output modality.

The research program builds on existing strengths. The Cluster comprises the Computer Science, Computational Linguistics and Phonetics, and Applied Linguistics departments of Saarland University, as well as the Max Planck Institute for Informatics, the German Research Center for Artificial Intelligence, and the Max Planck Institute for Software Systems. The participating institutions have agreed on a common long-term, highly interdisciplinary research agenda as the basis for this Cluster.

An integral goal of the Cluster is the promotion of young researchers. Saarbrücken has been playing a leading role in this regard for many years and has established a reputation as a talent factory. Therefore, the majority of the approved funding has been committed for the establishment of a number of junior research groups.

This concept has proven extremely successful, and a large number of junior scientists have meanwhile been appointed to professorships in Germany and abroad.

The Max Planck Institute for Informatics strongly contributes to the scientific activities of the Cluster. All directors of the Institute are principal investigators of the Cluster, and Hans-Peter Seidel serves as the scientific coordinator.

CONTACT

Prof. Dr. Hans-Peter Seidel
Secretary Sabine Budde
Phone +49 681 9325-4000
Email [email protected]

Research Areas

UNDERSTANDING IMAGES & VIDEOS

32 Articulated Human Pose Estimation

33 Ten Years of Pedestrian Detection, What Have we Learned?

34 Markerless Reconstruction of Static and Dynamic Scenes

35 3D Object Detection

36 Advanced Video Processing

37 Describing Videos with Natural Language

38 Scalability of Object Class Recognition

39 Topology and Data Understanding

40 Learning of Visual Representations

BIOINFORMATICS

44 The Relation Between the Molecular Conformational Changes and the Free Energy Change

45 Attacking HIV From New Angles

46 Structural Insights into the Emergence of Resistance in the HIV-1 Protease

47 Bioinformatical Support of HIV Therapy

48 Reconstructing Pathogen Transmission Networks Using Genomic Data

49 Measuring the Influence of Every Node in a Network by its Expected Force of Infection

50 Integrative Analysis of Cancer Data

51 Charting Epigenomes

52 Analyzing the Three-Dimensional Organization of the DNA Inside the Nucleus

53 Phenotype-Based Computational Clinical Diagnosis for Genetic Diseases

54 EpiExplorer and RnBeads: Integrative Analysis of Epigenomic Data

55 BiQ Analyzer HiMod for the Locus-Specific Analysis of DNA Modification Data

56 Detection of Related Proteins in Unrelated Viral Species

57 Structural Variation in Genomes

GUARANTEES

60 Formal Verification of Peer-to-Peer Protocols

61 Distributed Algorithms for Fault-tolerant Hardware

62 Automated Deduction

63 Dynamic Matching under Preferences

64 Quantifier Elimination – Statements can also be Calculated

INFORMATION SEARCH & DIGITAL KNOWLEDGE

68 Large-Scale Mining and Analysis of Sequential Patterns

69 Scalable Analysis of Very Large Datasets

70 AIDA – Resolving the Name Ambiguity

71 YAGO – A Collection of Digital Knowledge

72 Fast Searches in Large RDF Knowledge Bases

73 Answering Natural Language Questions Logically

74 DEANNA – Natural Language Questions over the Web of Data

75 Open Information Extraction

76 FERRARI: Quickly Probing Graphs for the Existence of a Relationship

77 STICS – Search and Analysis with Strings, Things, and Cats

78 Siren – Interactive Redescription Mining

79 KnowLife: Knowledge Extraction from Medical Texts

MULTIMODAL INFORMATION & VISUALIZATION

82 Stereo 3D and HDR Imaging: Display Quality Measurement and Enhancement

83 Advanced Real-time Rendering

84 Feature-based Visualization

85 Towards a Visual Turing Test: From Perception over Representation to Deduction

86 Physically-based Geometry Processing and Animation

87 Eye-Based Human-Computer Interaction

88 Optimizing User Interfaces using Biomechanical Simulation

89 Digital Fabrication of Flexible Displays and Touch Sensors

90 Perceptual Fabrication

OPTIMIZATION

94 From Routing to Pricing and Learning: Why Are They Hard to Compute?

95 Energy Efficient Algorithms

96 Geometric Packing: Suitcases and More

97 Rule-Based Product Configuration

98 CGAL – Exact and Efficient Geometric Algorithms

99 Improving Flat Panel Displays by Discrete Optimization

UNDERSTANDING IMAGES & VIDEOS

Understanding images and videos is one of the fundamental problems of image processing. The scientific challenges in understanding images and videos range from the modeling of people and objects in camera systems to the reconstruction and description of 3D scenes. This area has many applications including the animation of people and the visualization of 3D scenes, the indexing of image and video material, and 3D-capturing of the environment. This research area is thus situated at the intersection between computer vision and computer graphics, resulting in numerous opportunities for cooperation within the institute.

Various research groups within the Max Planck Institute for Informatics work on aspects of image and video understanding. In the field of modeling people, one group is working on the reconstruction of animated models from multi-video data, for example. The primary goal here is the ability to model and visualize people in detail and as accurately as possible. Furthermore, we are researching methods for people detection and pose estimation, relying on monocular cameras only, which are able to detect and track many people simultaneously in complex scenes. Even if these studies pursue fundamentally different targets and use different camera configurations, they benefit from similar models and algorithms.

An emerging question in recent years is how to generate natural language for images and videos. Our institute has been at the forefront of this development; for example, our researchers have proposed to formulate this problem as a language translation problem, where the input language is given by video descriptors, and the output language is natural language.

Advanced video processing is a topic at the intersection of computer vision and computer graphics. We have developed techniques well beyond the state of the art that allow us to manipulate entire scene objects, such as particular actors.

One of the fundamental problems of understanding images is the recognition of objects. With the growing omnipresence of digital image material, automatic visual object-class recognition and indexing techniques are becoming increasingly important. Even if today’s approaches can achieve remarkable results, two significant questions remain: How can object models be learned or constructed in a scalable fashion? How can we learn models that respect the 3D nature of objects? To address these problems, research work at the institute investigates how such object models can be constructed with as little manual input as possible to enable a broad applicability.

Articulated Human Pose Estimation

Human body pose contains a wealth of information about a person’s intention, attitude and internal state. The focus of our research is to estimate body pose in realistic conditions such as images and videos found on YouTube or captured with a mobile phone. We envision that the approaches we develop will become building blocks in applications such as activity recognition, marker-less motion capture and augmented reality.

We build on the recent advances in hierarchical image representations with convolutional neural networks (CNN) and explore two novel research directions: joint estimation of poses of multiple people and 3D human pose estimation from only a few mobile cameras.

Detection and pose estimation in multi-person scenes

We propose an approach that jointly solves the tasks of articulated person detection and pose estimation. Our approach infers the number of persons in a scene, identifies occluded body parts, and disambiguates the body parts of different people in close proximity of each other [see figure]. Our formulation is based on partitioning and labeling a set of body part hypotheses generated with a CNN-based body part detector. The partitioning is accomplished by solving an integer-linear program that resembles correlation clustering approaches previously proposed for image and video segmentation. One of the advantages of our formulation is that it implicitly performs non-maximum suppression, removing spurious body part detections in the background and merging multiple correct detections corresponding to the same person. We evaluate our approach on standard benchmarks showing its advantages over previously proposed strategies that operated by first detecting the people and then independently estimating their body poses.

Multi-view 3D human pose estimation

We propose a novel method for the accurate marker-less capture of articulated skeleton motion of several subjects in general scenes, indoors and outdoors, even from input filmed with as few as two cameras. Our approach unites a discriminative image-based joint detection method with a model-based generative motion tracking algorithm through a combined pose optimization energy. The discriminative part-based pose detection method, implemented using convolutional neural networks, estimates unary potentials for each joint of a kinematic skeleton model. These unary potentials are used to probabilistically extract pose constraints for tracking by using weighted sampling from a pose posterior guided by the model. In the final energy, these constraints are combined with an appearance-based model-to-image similarity term. Poses can be computed very efficiently using iterative local optimization, as CNN detection is fast, and our formulation yields a combined pose estimation energy with analytic derivatives. In combination, this enables us to track fully articulated joint angles at state-of-the-art accuracy and temporal stability with only very few cameras.

CONTACT

Mykhaylo Andriluka DEPT. 2 Computer Vision and Multimodal Computing Phone +49 681 9325-2119 Email andriluka mpi-inf.mpg.de

Leonid Pishchulin DEPT. 2 Computer Vision and Multimodal Computing Phone +49 681 9325-1208 Email leonid mpi-inf.mpg.de UNDERSTANDING IMAGES & VIDEOS 33
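The multi-person pose section above describes grouping body part hypotheses by solving a correlation-clustering-like integer linear program. The following Python sketch illustrates only the clustering idea on a toy scale: it enumerates every partition of a handful of hypotheses and keeps the one with the highest sum of within-group affinities. The brute-force search, the function names, and the affinity values are assumptions made for illustration; the sketch does not label parts and is not the solver used in the reported work.

```python
from itertools import combinations


def partitions(items):
    """Yield every partition of a list into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing group in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or let it start a group of its own.
        yield [[first]] + part


def cluster_score(partition, affinity):
    """Sum of pairwise affinities between hypotheses that share a group."""
    total = 0.0
    for group in partition:
        for a, b in combinations(sorted(group), 2):
            total += affinity[(a, b)]
    return total


def best_partition(n, affinity):
    """Exhaustive search; only feasible for a handful of hypotheses."""
    return max(partitions(list(range(n))),
               key=lambda p: cluster_score(p, affinity))


# Made-up affinities for four hypotheses: positive values favour
# "same person", negative values favour "different persons".
toy_affinity = {
    (0, 1): 2.0, (2, 3): 1.5,
    (0, 2): -1.0, (0, 3): -2.0, (1, 2): -0.5, (1, 3): -1.0,
}
groups = sorted(sorted(g) for g in best_partition(4, toy_affinity))
print(groups)
```

A hypothesis whose affinities to all others are negative ends up in a group of its own, which gives a rough intuition for why such a formulation can suppress spurious detections without a separate non-maximum suppression step.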

Ten Years of Pedestrian Detection, What Have We Learned?

Object detection is one of the core problems in computer vision. Pedestrian detection is a canonical case of object detection. Due to its multiple applications in car safety, surveillance, and robotics, pedestrian detection is an active area of research, with 1000+ papers published in the last decade. Pedestrian detection also benefits from well-established benchmarks for evaluating and comparing methods.

In the last decade, numerous techniques have been developed for pedestrian detection. Using different image features, cues, and classifier types, detection performance has been steadily improving. Building larger systems that integrate more ingredients has become a reliable recipe to advance quality (“more is more”).

Our research has focused on understanding which are the core ingredients for high quality pedestrian detection. By improving these core ingredients, we have been able to consistently obtain top performance on established standard benchmarks (“less is more”).

Our top performing detectors are grounded on decade-old ideas (oriented gradients features, filter banks, and boosted decision trees) that we have carefully instantiated and adapted to our newest insights. Contrary to previous work, we show that top performance on pedestrian detection can be reached by using a single rigid template (no components, no deformable parts) applied as a sliding window (no geometry prior, scene prior, or bottom-up cues). Because of its simplicity, our approach is suitable for high-speed implementations. Our current results on pedestrian detection outperform both deep convolutional neural networks and methods that use more complex features (such as local binary patterns and covariance) by a large margin.

In the last years, our work has repeatedly advanced the top performance reached on standard benchmarks. In the last two years alone, we have improved the state-of-the-art with a 30 x reduction in errors (false positive per image at 80 % recall on Caltech pedestrian dataset). Our techniques developed for pedestrian detection have also shown top performance for face and traffic sign detection. For face detection, our results match the best performance reported on all four major benchmark datasets.

Although significant progress has been achieved in the last decade, we are still far from reaching human performance on this task, or from reaching the desired super-human quality for automatic operations. We expect that a shift towards a more global scene understanding will help reduce the mistakes of current methods in the future.

Figure: In the last ten years, pedestrian detection has shown great progress.

CONTACT

Rodrigo Benenson DEPT. 2 Computer Vision and Multimodal Computing Phone +49 681 9325-2105 Email benenson mpi-inf.mpg.de

Jan Hosang DEPT. 2 Computer Vision and Multimodal Computing Phone +49 681 9325-2123 Email jhosang mpi-inf.mpg.de
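The error measure quoted in the pedestrian detection article, false positives per image at a given recall, can be made concrete with a short Python sketch. It assumes that each detection has already been matched against the ground truth and flagged as true or false positive; the function name, its parameters, and the toy numbers are illustrative assumptions and do not reproduce the Caltech evaluation protocol.

```python
def fppi_at_recall(scores, is_true_positive, num_ground_truth,
                   num_images, target_recall):
    """False positives per image at the first score threshold whose
    recall reaches `target_recall`; None if that recall is never reached.

    Assumes num_ground_truth > 0 and one flag per detection.
    """
    # Visit detections from most to least confident, which corresponds
    # to gradually lowering the detection threshold.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = false_pos = 0
    for i in order:
        if is_true_positive[i]:
            true_pos += 1
        else:
            false_pos += 1
        if true_pos / num_ground_truth >= target_recall:
            return false_pos / num_images
    return None


# Made-up detections over 2 images containing 4 annotated pedestrians.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_flags = [True, False, True, True, False]
print(fppi_at_recall(toy_scores, toy_flags, 4, 2, 0.75))
print(fppi_at_recall(toy_scores, toy_flags, 4, 2, 1.0))
```

Under this measure, a 30 x reduction in errors means that the value returned at 80 % recall is thirty times smaller for the newer detector than for the older one.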

Markerless Reconstruction of Static and Dynamic Scenes

Reconstruction of detailed animation models from multi-video data

The development of new algorithms to reconstruct geometry and appearance models of dynamic scenes in the real world is an important goal of graphics and vision research. Captured dynamic models of humans and more general scenes are very important for computer animation and visual effects. Reconstruction of the dynamic environment with sensors is also highly important for autonomous robots and vehicles. It is also the precondition for 3D video and virtual replay production in TV broadcasting. Reconstruction of human motion, in particular, is of increasing importance in biomechanics, sports sciences, and human-computer interaction. The development of new sensors and new see-through displays also brings us closer to truly believable augmented reality. The precondition for this is also having algorithms to measure models of the real world in real-time.

The goal of our research is thus the development of new performance capture algorithms. These algorithms reconstruct detailed models of geometry, motion, and appearance of dynamic scenes from multi-view video data. The reconstruction is to be performed without optical interference with the scene, for instance, fiducial markers, as are commonly used in motion capture. In the past, we have pioneered the development of performance capture algorithms that are able to reconstruct detailed models of persons in general apparel or of arbitrary moving objects from multi-view video. Unfortunately, the application of these methods is constrained to indoor studios with many cameras, controlled lighting, and controlled backgrounds. In the reporting period, we therefore began to redefine the algorithmic foundations of dynamic scene reconstruction to enable, in the long run, performance capture of arbitrarily complex outdoor scenes under uncontrolled lighting and with only a few cameras.

In the reporting period, we developed some of the first methods for the detailed marker-less skeletal motion capture of humans and animals in outdoor scenes with few cameras (2 to 3). Another milestone was the development of new inverse rendering algorithms that allow us to measure models of appearance, illumination, and detailed geometry from video footage recorded under uncontrolled conditions. This new methodology is the basis for other important results: the first approach to capture relightable 3D videos in uncontrolled scenes, the first methods to capture highly detailed dynamic face models from stereo or monocular video, and the first approach for detailed full body performance capture with a single stereo camera [see figure 1]. An important new strand of research is also the development of algorithms for marker-less hand motion capture from a few cameras, or even from a single camera view [see figure 2].

Reconstruction of static and dynamic scenes with depth sensors

New types of depth cameras measure 2.5 D scene geometry in real-time. These new sensors are cheap and can be an important tool to bring reconstruction methods for static and dynamic scenes to a broader user community. Unfortunately, depth camera data is very noisy, has low resolution, and exhibits a systematic measurement bias. We have, therefore, developed methods to calibrate depth sensors, eliminate noise, and algorithmically increase the camera resolution. Furthermore, we are working on new algorithms to capture geometry models of even large scenes with such sensors.

Additionally, we research new methods to capture human skeletal motion and body shape in real-time from a single depth camera. In this context, we have also developed some of the first approaches for real-time capture of general deformable shapes with a single camera.

Figure 1: Performance capture with a stereo camera

Figure 2: Real-time marker-less hand tracking

CONTACT

Christian Theobalt DEPT. 4 Computer Graphics Phone +49 681 9325-4028 Email theobalt mpi-inf.mpg.de

3D Object Detection

Detecting and localizing objects in images and videos is a key component of many applications in robotics, autonomous driving, image search, and surveillance. Traditionally, the object detection task has been defined as the localization of instances of a certain object class (such as cars) in an image. That is, the input to an object class detector is an image, and the output is a two-dimensional bounding box that highlights the position of the detected object.

Recent work is making it increasingly clear that high-level applications like scene understanding and object tracking would benefit from a richer, three-dimensional object representation. Such a representation comprises, in addition to the traditional 2D bounding box, an estimate of the viewpoint under which an object is imaged as well as the relative 3D positions of object parts. It allows us to derive scene-level constraints between multiple objects in a scene (objects cannot overlap in 3D) and between multiple views of the same object (the viewpoints of different detections must be consistent with camera movement).

3D deformable part-based models

The goal of this project has been to build a 3D object class detector that, along with a traditional 2D bounding box [see Figure 1], provides an estimate of object viewpoints as well as relative 3D part positions [see Figure 2].

Since 3D object modeling requires supervision in the form of 3D data, we leverage 3D Computer Aided Design (CAD) models as a 3D geometry proxy for the object class of interest. In addition, we use 2D real-world imagery to obtain realistic appearance models. One of the main challenges consists in bridging the gap between artificial 3D geometry and 2D real-world appearance, which we achieve through a shape-based abstraction of object appearance based on non-photorealistic rendering. The final 3D object class representation is then given by a probabilistic, part-based model, trained from both 3D and 2D data.

In our experiments, we have been able to demonstrate excellent performance in both the traditional 2D bounding box localization task and tasks aimed at scene-level reasoning, such as viewpoint estimation and ultra-wide baseline matching.

Occlusion patterns for object class detection

In this project, we have focused on improving object class detection in the face of partial occlusion, as it occurs when the view onto an object is blocked by other objects. This situation can be frequently observed, for example, in street scenes, where cars block the view onto other cars. Our approach is based on the intuition that the resulting occlusions often follow similar patterns that can be exploited, such as cars parked in front of other cars at the side of the road.

We have extended our 3D object class representation to include occlusion patterns in the form of object pairs. In our experiments, we have shown that our model outperforms prior work in both 2D object class detection and viewpoint estimation.

Figure 1

Figure 2

CONTACT

Bojan Pepik DEPT. 2 Computer Vision and Multimodal Computing Phone +49 681 9325-2208 Email bojan mpi-inf.mpg.de

Michael Stark DEPT. 2 Computer Vision and Multimodal Computing Email stark mpi-inf.mpg.de

Advanced Video Processing

Image processing methods are important tools during the post-processing of photographs. Many standard tools for image processing provide algorithms to fulfil typical post-processing tasks, such as standard filters for noise removal or contrast enhancement.

Post-processing of videos is a much more challenging task. Videos are not merely a temporal sequence of individual images. Modifications to videos during post-processing, therefore, need not only be spatially consistent, that is, consistent within a single image; they also need to be consistent over time, that is, over many subsequent frames. Many commercially available video editing tools, however, are based on exactly the assumption that videos can be treated as sequences of individual images. Post-processing operations with such tools are consequently limited to the application of simple filters over a temporal window.

Many post-processing tasks in professional video and movie productions are much more complex: Entire scene elements, such as particular actors or supporting crew, may need to be removed or manipulated in a shot, or their positions in a shot may need to be changed. No existing video filtering approach would even come close to solving such a task automatically. The consequence is that such editing tasks are usually undertaken through time-consuming manual editing of individual pixels, which can easily take several weeks, even for short videos.

Further, today’s personal and online video collections are extremely large. Browsing such collections is a challenge, as current approaches are often based on unreliable user-based text annotation. Ideally, one would want more semantically-based ways of exploration that exploit space-time and content relations between videos, such as the fact that videos were taken in the same environment, show a similar scene, were taken at a similar time, or deal with a similar topic. The precondition for this would be the ability to automatically extract such relations, which is a highly challenging and unresolved problem. We investigate the algorithmic foundations of both of the above problem areas.

Context-based exploration of video databases

Context-based relations between videos enable particularly interesting new exploration modes for video databases. Context relations capture, for instance, whether videos have been recorded in the same area, the same town, or even partly show the exact same location. We therefore develop new methods to extract such context relations automatically from video collections. One of our approaches computes a graph, a so-called Videoscape, whose nodes are portals, i.e., specific locations that videos can show, and whose edges are videos. A video is linked to a portal node if it ever shows the location represented by that portal. The automatic computation of this graph is a complex task, and we have developed new machine vision and machine learning algorithms to serve this purpose. The Videoscape can be explored interactively: for instance, one can take a tour through a city by watching videos recorded in that city. Once a frame showing a portal comes up, one can transition into another video based on a 3D reconstruction of the portal location from the video frames. We have also researched how one can overlay videos of a location with a panoramic context of that location [see figure]. This enables entirely new ways of visualizing the space-time relations between videos filmed at that location.

New video editing approaches

We developed new algorithms to compute dense correspondences between time series of images, even images exhibiting strong noise and images taken under starkly varying camera settings. These methods are the foundation for new advanced video editing operations, such as automatic dynamic background inpainting. They also enable the automatic reconstruction of high dynamic range images from a stack of images taken with different exposure times, even if there was strong motion in the scene between images. New machine learning algorithms we have developed also enable us to remove compression artifacts from images and videos, and to algorithmically increase the frame resolution in both cases.

Figure: Videos in panoramic contexts

CONTACT

Christian Theobalt DEPT. 4 Computer Graphics Phone +49 681 9325-4028 Email theobalt mpi-inf.mpg.de
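The Videoscape described in this article is a graph whose nodes are portals and whose edges are videos. The Python sketch below shows one simple way such a structure could be stored and queried for a tour between two locations. The input format (each video listed with the portals it shows, in order of appearance), the function names, and the choice to let edges be traversed in both directions are all assumptions for illustration; the automatic detection of portals, which is the hard part of the reported work, is not covered.

```python
from collections import defaultdict, deque


def build_videoscape(video_portals):
    """Build portal -> [(neighbouring portal, video)] from a mapping
    video name -> list of portals the video shows, in order."""
    graph = defaultdict(list)
    for video, portals in video_portals.items():
        # Consecutive portals within one video are joined by that video.
        for a, b in zip(portals, portals[1:]):
            graph[a].append((b, video))
            graph[b].append((a, video))  # assumption: usable both ways
    return graph


def find_tour(graph, start, goal):
    """Breadth-first search for a tour with the fewest portal hops.
    Returns a list of (video to watch, portal reached), or None."""
    came_from = {start: None}
    queue = deque([start])
    while queue:
        portal = queue.popleft()
        if portal == goal:
            hops = []
            while came_from[portal] is not None:
                previous, video = came_from[portal]
                hops.append((video, portal))
                portal = previous
            return hops[::-1]
        for neighbour, video in graph[portal]:
            if neighbour not in came_from:
                came_from[neighbour] = (portal, video)
                queue.append(neighbour)
    return None


# Hypothetical collection of three clips.
toy_videos = {
    "clip_a": ["station", "market"],
    "clip_b": ["market", "cathedral", "bridge"],
    "clip_c": ["park"],
}
toy_graph = build_videoscape(toy_videos)
print(find_tour(toy_graph, "station", "bridge"))
```

In this toy collection the tour switches from clip_a to clip_b at the market portal, which mirrors the transition between videos at a shared location described in the text.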

Describing Videos with Natural Language

An important aspect for automated systems is to communicate to humans what they recognize or “see”. However, current computer vision approaches typically focus on generating isolated labels for a video (e.g., “slicing”, “cucumber”, “plate”) that are not well suited for communication with humans. We thus study the problem of generating natural language descriptions of videos. A corresponding sentence for the example above could be “Someone sliced the cucumber on the plate”. Describing videos with natural language is important, e.g., for the automatic captioning of web videos, human-robot interaction, or assisting visually impaired people. In our work, we study two scenarios. First, we generate descriptions of cooking videos. Second, we study the problem of generating audio descriptions for movies to enable blind people to follow a movie without seeing it.

Generating descriptions for cooking videos

In order to study the problem of automatic video description, we propose to learn how to “translate” a video snippet to a natural language sentence using techniques from statistical machine translation between two languages, e.g., French and English. For training our translation approach, we need pairs of videos and sentences depicting various cooking activities, which we have collected on a large scale for videos, e.g., open tin, stir pasta.

In contrast to related work, we do not only describe a short video snippet with a single sentence, but also a long video with multiple sentences. To ensure that we generate a consistent description, we propose to model the topic of the video shared by all short video snippets within a long video. In our kitchen scenario, the topic is a dish to be cooked, e.g., preparing pasta.

We also explore the novel task of describing videos at multiple levels of detail. All of the prior work has focused on describing videos at a fixed level of abstraction, whereas our system is able to produce detailed, short, and single sentence descriptions of cooking videos [figure 1].

Figure 1: An example output of our system, which automatically generated a detailed, short and single-sentence description of a video.
Detailed: A man took a cutting board and knife from the drawer. He took out an orange from the refrigerator. Then, he took a knife from the drawer. He juiced one half of the orange. Next, he opened the refrigerator. He cut the orange with the knife. The man threw away the skin. He got a glass from the cabinet. Then, he poured the juice into the glass. Finally, he placed the orange in the sink.
Short: A man juiced the orange. Next, he cut the orange in half. Finally, he poured the juice into a glass.
One sentence: A man juiced the orange.

Movie description

Existing video description datasets focus on short video snippets, are limited in size, or restricted to the cooking scenario. In order to overcome these limitations, we propose a new dataset of movies and associated textual descriptions. We make use of two sources of text data, namely movie scripts (screenplays) and audio descriptions available for many DVDs and Blu-rays. Audio descriptions provide linguistic descriptions of movies and allow visually impaired people to follow along with friends and family. The collected dataset additionally opens new possibilities to understand stories and plots across multiple sentences in an open domain on a large scale [figure 2].

Figure 2: Example textual descriptions from our dataset, aligned to three movie snippets: first, coming from audio description (AD), and second, from movie script.
Snippet 1. AD: Abby gets in the basket. Script: After a moment a frazzled Abby pops up in his place.
Snippet 2. AD: Mike leans over and sees how high they are. Script: Mike looks down to see – they are now fifteen feet above the ground.
Snippet 3. AD: Abby clasps her hands around his face and kisses him passionately. Script: For the first time in her life, she stops thinking and grabs Mike and kisses the hell out of him.

CONTACT

Anna Rohrbach DEPT. 2 Computer Vision and Multimodal Computing Phone +49 681 9325-2111 Email arohrbach mpi-inf.mpg.de

Marcus Rohrbach DEPT. 2 Computer Vision and Multimodal Computing Phone +49 681 9325-2111 Email rohrbach mpi-inf.mpg.de

Scalability of Object Class Recognition

Due to the omnipresence of digi- WordNet, Wikipedia, Flickr, and Yahoo tal imagery today, automatic techniques web search [ Figure 1 ]. The mined relations for the visual recognition of objects are can then be used to build object class becoming more and more important. models, even for classes where there are Figure 2: Effi cient recognition of object classes by The focus here lies on the recognition no training examples available, relying exploiting similarities between classes of entire classes of objects, such as cars, entirely on re-used components from rather than specifi c instances like one other models. The recognition perfor- particular red sports car. While current mance of the fully automatic system is Effective use of computation time state-of-the-art systems achieve remarka- demonstrated to be en par with providing Real-time systems as well as large- ble recognition performance for individ- the object class-attribute relations of scale recognition and retrieval systems ual classes, the simultaneous recognition human subjects. are often limited by a computational of multiple classes remains a major chal- budget. Today’s approaches rarely take lenge. At training time, building reliable such constraints into consideration and object class models requires a suffi ciently Sharing of computation between classes do not adapt their execution strategies high number of representative training Today’s state-of-the-art object de- with regard to the expected performance examples, often in the form of manually tectors predominantly achieve multi- gains and the computational cost of a annotated images. At test time, prediction class detection through a combination classifi er or detector. In our work, we has to be effi cient so that the correct of multiple single-class detectors. This propose a method that has access to mul- label can be inferred among large sets is undesirable because the computation tiple image classifi ers and detectors and of possible classes. 
Our research aims time grows linearly with the number of learns to deploy them with an optimal at reducing the requirements at training classes. A more detailed analysis of the strategy so that an application dependent time in order to achieve more scalable most common architectures reveals that reward is optimized under computational learning procedures. Increased scalability such independent detection schemes budget constraints. Intermediate results at test time is achieved with more effi cient perform a signifi cant amount of redun- are used to determine which classifi er or inference schemes that share computa- dant computation. Intuitively, car and bi- detector will be executed next. In con- tion among classes as well as optimize cycle detectors are both look for wheels, trast to prior work, our “reinforcement for computation-time aware performance but the detection scheme does not ex- learning”-based approach also picks metrics. ploit this. Therefore, we have presented actions that do not have an immediate shared basis representations that capture pay-off but will generate future rewards. the commonalities between such parts For instance, our image classifi er does Identifying re-usable components and classes. As a consequence, we only not yield any detections, but the result automatically have to compute the activations of such helps the method to make a more in- In order to apply the re-use of model a basis once and can share the results formed decision for the next time steps components in practice, re-usable com- among the class detectors. From this and thereby increases performance over ponents have to be identifi ed automati- shared representation, we are able to the whole episode. We have shown that cally. In this project, model components recover the activations of the individual this approach yields a strategy that out- correspond either to attributes, i.e., dis- class detectors with high accuracy. 
performs the best fi xed execution orders tinct visual properties like color or tex- We have presented a detection method under a time-sensitive performance ture, or to entire object class models. that leverages such decompositions, metric. ::: We propose to relate object classes and together with a hardware accelerated attributes by automatically mining lin- implementation, and have shown up guistic knowledge bases. Examples of to 35 x speedups over individual detec- linguistic knowledge bases include, e.g., tors.
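The idea of computing a shared basis once and recovering each class detector's response from it can be illustrated with a small Python sketch. It assumes, purely for illustration, that every class filter is an exact linear combination of two basis filters; the basis vectors, coefficients, feature values, and function names are all hypothetical, and the reported method recovers detector activations approximately rather than exactly.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def basis_activations(basis, feature):
    """Responses of the shared basis filters, computed once per window."""
    return [dot(b, feature) for b in basis]


def detector_scores(coefficients, activations):
    """Recover one score per class from the shared activations."""
    return {name: dot(c, activations) for name, c in coefficients.items()}


def full_filter(coeffs, basis):
    """Expand a class filter from its coefficients (for comparison only)."""
    return [sum(c * b[k] for c, b in zip(coeffs, basis))
            for k in range(len(basis[0]))]


def multiplication_counts(num_classes, feature_dim, basis_size):
    """Multiplications per window: independent filters vs. shared basis."""
    independent = num_classes * feature_dim
    shared = basis_size * feature_dim + num_classes * basis_size
    return independent, shared


# Two hypothetical basis filters and two classes built from them.
toy_basis = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
toy_coefficients = {"car": [2.0, 0.5], "bicycle": [1.0, 1.5]}
toy_feature = [0.2, 0.4, 0.6, 0.8]

shared_scores = detector_scores(
    toy_coefficients, basis_activations(toy_basis, toy_feature))
direct_scores = {name: dot(full_filter(c, toy_basis), toy_feature)
                 for name, c in toy_coefficients.items()}
print(shared_scores, direct_scores)
print(multiplication_counts(num_classes=20, feature_dim=1000, basis_size=5))
```

With independent detectors the cost per window grows with the number of classes times the feature dimension, whereas with a shared basis most of the work depends only on the basis size, which is the intuition behind the sub-linear growth discussed above.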

CONTACT

Mario Fritz DEPT. 2 Computer Vision and Multimodal Computing Phone +49 681 9325-1204 Email mfritz mpi-inf.mpg.de

Marcus Rohrbach DEPT. 2 Computer Vision and Multimodal Computing Phone +49 681 9325-1206 Email rohrbach mpi-inf.mpg.de Internet http://www.d2.mpi-inf.mpg.de/nlp4vision

Figure 1: Automatic identification of re-usable attribute models

Topology and Data Understanding

We are living in the era of informa- The relation to data analysis is that tion: Huge amounts of data are generated many data sets can be easily interpreted every minute. For instance around 350 as geometric data (in some high-dimen- million photos are uploaded to Facebook sional space), and topology provides a every single day. high-level summary of that data, ignoring details like the distance between points. Data analysis is the task of extrac- However, the classical notions of topo- ting meaningful information content and logy are insuffi cient because they are drawing conclusions from a data collec- developed for the idealized situation of tion. One example are recommendation “clean” shapes and are therefore sensitive systems for user navigation on a web- to noise. This problem has been over- site like Youtube or Netfl ix based on the come with the development of persistent choices made. In general, data analysis topology. The rough idea is to not just becomes more challenging through the count the number of topological features inevitable presence of noise in almost (like the number of holes in the exam- any form of real data. ple), but also to provide an “importance value” to each feature. This allows a In many cases, qualitative, high- more fi ne-grained analysis of the topo- level summaries are required for data logical features, in particular a distinc- analysis. For instance, a fi rst step for tion between noise and relevant features analyzing a collection of images is a clus- of the shape. tering into few categories, like images of people, buildings, landscapes, etc. The Persistent topology has been ap- A Klein bottle embedded in three dimensions growing interest of data analysis asks for plied to various questions in data analy- novel ways of extracting such high-level sis. As an example, the space of “natural information. images” has been shown to fi t the geo- metric shape of a Klein bottle, which is a twisted version of a (hollow) bagel. 
Topological data analysis A perhaps surprising connection has been established between data analy- Computational challenges sis and the topology of geometric shapes. In light of the increasing size of Topology is a mathematical language for data sets, effi cient ways of computing classifying shapes according to how they and analyzing the persistence informa- are connected. To illustrate the idea, a tion of data are required. Researchers at bagel and a Kaiser roll are topologically the Max Planck Institute in Saarbrücken different, because the former contains are devoted to this goal. The computa- a hole that is missing in the roll. More- tional problems connect with various over, a pretzel is different from the for- classical areas in algorithmics, for in- mer, too, again because it has more than stance, approximation algorithms to one hole. On the other hand, a coffee create a shape from data, linear algebra mug and a bagel are, topologically speak- to compute the persistence of a shape, ing, the same because both have one and combinatorial optimization to com- hole, and we can transform one into the pare the topologies of different shapes. ::: other without ever changing the connec- tivity of the shape.
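The “importance value” idea behind persistent topology can be illustrated for the simplest kind of topological feature, connected components. The Python sketch below grows a distance threshold over a point set and records the scale at which each component merges into another one; components that survive over a long range of scales are the relevant ones, while short-lived ones behave like noise. The function name and the toy points are assumptions for illustration, and holes, as in the bagel example, require a boundary matrix reduction that is not shown here.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Death scales of connected components as the distance threshold
    grows from zero. Every point starts as its own component (born at
    scale 0); one component never dies and is therefore not listed."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairs in order of increasing distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j   # two components merge ...
            deaths.append(length)     # ... so one of them dies here
    return deaths


# Two made-up clusters that are far apart.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1)]
toy_deaths = zero_dim_persistence(toy_points)
print(toy_deaths)
```

In the toy data three components disappear at scale 1 and one survives until scale 9; that large gap is the persistent signal indicating two clusters, independent of small perturbations of the individual points.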

CONTACT

Michael Kerber DEPT. 1 Algorithms and Complexity Phone +49 681 9325-1005 Email mkerber mpi-inf.mpg.de

Learning of Visual Representations

With the advance of sensor technology, machines can get a detailed recording of an environment. However, there is a big gap between the raw sensor readings a computer can get, e.g., from a camera, and the semantic understanding a human may get by looking at the same scene. We need to bridge this gap in order to unfold the full potential of applications like autonomous robotics, content-based image search, or visual assistance for the blind.

Computer vision has come a long way by proposing representations that, step by step, try to bridge this gap. However, despite the exciting progress, we see a divergence of methods rather than convergence, and we are far away from matching the adaptivity, efficiency, and effectiveness of the human visual system. Therefore, we seek principled ways to derive visual representations in learning-based approaches in order to eventually match or even surpass human perception.

Latent additive feature models

Inspired by models from document processing, we have applied methods that automatically extract topics from text to the visual domain in order to decompose objects and image patches into parts and sub-structures. These models facilitate the learning of visual representations in a completely unsupervised fashion. This application is facilitated by an analogy between words in the text domain and local gradient structures in the visual domain. Figure 1 illustrates how the gradient distribution of the bike on the left is decomposed into sub-structures on the right. We use methods that allow us to place expectations (priors) on the unknown variables in order to regularize and solve this ill-conditioned learning problem. Our experiments show that this type of representation is very well-suited for the task of object class recognition and detection and can even deal better with effects like transparency.

Most recently, we have defined a recursive scheme that allows us to form hierarchical representations built on the same principle. Our studies have shown the importance of holistic inference for learning these representations, in contrast to the more common feedforward learning approach. We are currently extending these representations to the temporal domain.

Representations across data domains and sensors

The rapid development of the web and mobile devices provides us with rich sources of information. However, when tapping into these, we face a large heterogeneity in sensors, quality, and quantity of observations. It is up to good representations to reunite those data sources on a common ground.

We have studied approaches for domain adaptation in order to bridge the gap between images from the web and images from high quality and low quality sensors [see figure 2]. Our investigations show that some of the arising problems can be solved by metric learning approaches so that data from the web can be more effectively used for recognition in the real world.

Better sensors are continuously added to consumer electronics. 3D sensors are probably the most prominent example right now (e.g., in mobile phones, gaming devices). Our investigation aims at understanding how 2D and 3D representations of objects can be combined best in order to improve object recognition. We have successfully employed object size constraints and have shown how they can improve classification and detection.

Representations across rendered and real data

There is an increasing interest in using rendered data for training visual classifiers in order to reduce the effort of data collection and thereby provide more training data. Despite the strong progress in rendering techniques, such synthesized images are still different from real data, both visually and from a statistical point of view. We have investigated this issue on the task of material recognition. We tap into publicly as well as commercially available material shaders in order to render new examples for our training procedure. We have investigated different methods for domain adaptation and data alignment in order to support material recognition from real and virtual data, which result in substantial improvements in recognition performance of up to 8 %.

Figure 1

Figure 2

CONTACT

Mario Fritz DEPT. 2 Computer Vision and Multimodal Computing Phone +49 681 9325-1204 Email mfritz mpi-inf.mpg.de

RESEARCH AREAS

BIOINFORMATICS

Bioinformatics is a key discipline for progress in biological sciences, such as biotechnology, pharmacy and medicine. With computational methods Bioinformatics deepens and accelerates planning highly complex biological experiments as well as interpreting the resulting very large amounts of data.

For about 25 years, bioinformatics has significantly contributed to progress in the life sciences. Therefore, the field is a key ingredient of the revolution of biology. Bioinformatics methods and software support researchers in planning experiments, storing data which come from all areas of biology, and assessing this data with computer support. With the help of Bioinformatics, scientists can elucidate the molecular processes in cells – the basic units of living organisms – which form a complex system for processing matter, energy, and information. The genome harbors the construction plan for the cell and the working plan of its metabolic and regulatory processes in a complex, encoded form. In order to facilitate these cellular processes, parts of the genome must be “read off”. This includes the genes, which amount to the blueprints of proteins, the molecular machinery of the cell. In turn, reading off the genes is controlled via complex molecular networks. For the synthesis of proteins, and also for their removal, there are specific molecular complexes which themselves are again subject to exquisite molecular control. The cells convert energy, they communicate with cells in their neighborhood, they take on different structures and forms, and they move. They react to changes in their environment, for example, pertaining to light, temperature and pH, and they mount defenses against invaders. Dysregulation of such processes constitutes the molecular basis of disease. Drug therapy aims to restore a tolerable molecular balance.

For more than the past two decades, classical biological research, which used to be concentrated on cellular sub-systems with limited complexity, has been complemented by high-throughput experiments. The respective screening methods afford cell-wide capture of molecular data, e.g., by measuring the frequency of all read off genes (transcription), by capturing the population of the proteins used by the cell (proteome) and of their interactions (interactome). Generating new insight into the biology of the cell as well as the basis of diseases and providing new approaches to therapy involves highly complex information processing. Bioinformatics addresses this challenge. The Max Planck Institute for Informatics performs research on many of the challenges mentioned above.

Bioinformatics has the hybrid character of a basic science which poses and follows clear application perspectives. This unique quality is highlighted by a significant number of spin-offs of bioinformatics research groups. For example, Professor Lengauer co-founded the company BioSolveIT GmbH, which develops and distributes software for drug design. Pharmaceutical companies all over the world are among the users of this software.

The Bioinformatics Center Saar, whose spokesman is Professor Lengauer, received top marks for its research in the final evaluation in 2007 of the five bioinformatics centers funded by the German Research Foundation (DFG) in the past decade.
CONTRIBUTIONS

44 The Relation Between the Molecular Conformational Changes and the Free Energy Change
45 Attacking HIV From New Angles
46 Structural Insights into the Emergence of Resistance in the HIV-1 Protease
47 Bioinformatical Support of HIV Therapy
48 Reconstructing Pathogen Transmission Networks Using Genomic Data
49 Measuring the Influence of Every Node in a Network by its Expected Force of Infection
50 Integrative Analysis of Cancer Data
51 Charting Epigenomes
52 Analyzing the Three-Dimensional Organization of DNA Inside the Nucleus
53 Phenotype-Based Computational Clinical Diagnosis for Genetic Diseases
54 EpiExplorer and RnBeads: Integrative Analysis of Epigenomic Data
55 BiQ Analyzer HiMod for the Locus-Specific Analysis of DNA Modification Data
56 Detection of Related Proteins in Unrelated Viral Species
57 Structural Variation in Genomes

The Relation Between the Molecular Conformational Changes and the Free Energy Change

Biomolecules such as proteins are the molecular machines which work in concert to maintain living organisms. Our main interest is to develop theories to understand the relationships between the structures and the motions of the biomolecules and their roles in important biochemical interactions, such as interactions between proteins and drugs.

The atomistic three-dimensional structures of biomolecules can be obtained, e.g., by X-ray crystallography and can be considered as a snapshot of the biomolecular motion. Growing computer power affords the ability to simulate the motion of the atoms inside biomolecules and to understand the mechanistic basis of their function. Experimental thermodynamic measurements of quantities such as enthalpy, entropy, and free energy are used to quantify the biochemical processes, such as drugs binding to proteins, as well as changes of binding due to changes in the amino acids of the protein (mutations). The free energy is a result of the compensation between the molecular interactions, which are summarized by the enthalpy, and the molecular disorder, which is related to the entropy. Understanding the mechanisms by which the motions of the proteins bring about the changes in the free energy is crucial for many biochemical topics, e.g., drug design.

A striking phenomenon in biomolecular processes is that when a molecular system undergoes a change, the changes of enthalpy and entropy largely compensate each other. That is, enthalpy and entropy undergo comparatively large changes but the resulting change in free energy is comparatively small. We have provided a general theoretical explanation of this phenomenon. The theory exploits information theory to shed light on the physical basis of the molecular process. The simplicity and the generality of the concepts in information theory provide direct physical interpretations and establish a new theoretical framework to understand changes in the biochemical systems. The applications of information theory in molecular simulations are a promising approach to improving computational methods to estimate free energy changes. These methods are of high importance for predicting resistance towards certain drugs upon mutations in the amino acid sequences of proteins.
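The compensation between enthalpy and entropy described above can be illustrated with the standard thermodynamic relation ΔG = ΔH − TΔS. The following is only a minimal sketch: the function name and all numerical values are hypothetical and chosen for illustration, they are not measurements from our studies.

```python
def free_energy_change(delta_h, delta_s, temperature=298.15):
    """Gibbs free energy change dG = dH - T*dS.

    delta_h in kJ/mol, delta_s in kJ/(mol*K), temperature in K.
    """
    return delta_h - temperature * delta_s


# Hypothetical binding data for two variants of a protein (illustration only).
dg_a = free_energy_change(delta_h=-40.0, delta_s=-0.100)
dg_b = free_energy_change(delta_h=-70.0, delta_s=-0.200)

# The enthalpy differs by 30 kJ/mol between the two variants, but the
# entropic term T*dS changes by almost the same amount, so the free
# energies differ by well under 1 kJ/mol.
print(f"dG_a = {dg_a:.3f} kJ/mol, dG_b = {dg_b:.3f} kJ/mol")
print(f"difference in dG = {dg_b - dg_a:.3f} kJ/mol")
```

In this toy case a large change in enthalpy is almost cancelled by the entropic term, which is the pattern referred to as enthalpy-entropy compensation.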

CONTACT

Mazen Ahmad DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3105 Email mahmad mpi-inf.mpg.de

Attacking HIV From New Angles

According to the World Health Organization (WHO), 35 million people world-wide were living with an HIV infection in 2013. About 1.5 million died due to AIDS-related complications, including 190,000 children. The number of new HIV infections in 2013 was about 2.1 million, according to WHO. Preventive measures are an important means of combating these horrendous numbers. There have been some significant breakthroughs over the past years in HIV vaccine research. For example, Walker et al. were able to extract potent broadly neutralizing antibodies (bNAbs) that can neutralize many different HIV strains from patients with strong immune responses. Since there is no approved treatment to cure an HIV infection to date, a large part of HIV research is concerned with alleviating symptoms and extending the life-span of infected patients with the help of antiretroviral drugs.

Prediction of HIV resistance against passive antibody treatment

Treatment with a combination of bNAbs has recently proven effective against an HIV infection in humanized mice and non-human primates. A phase I clinical trial in humans for one of those bNAbs has also shown positive results. Therefore, this new kind of treatment might be available for HIV-infected patients in the near future.

The standard procedure for choosing a therapy for an HIV-infected patient in Germany is to sequence the important parts of the HIV genome and then evaluate which drugs are active (meaning that the patient does not harbor significant resistant viral populations against these drugs). If treatment with bNAbs by passive antibody transfer becomes available, there will be a need to make an informed decision analogous to this setting. Therefore, we have built prediction models that are able to classify with high accuracy whether or not a certain HIV variant is resistant against a certain bNAb. We provided additional ways to visualize the results, making them more interpretable and easier to understand than standard approaches (see figure for one part of the visualizations). Even though treatment of HIV-infected patients with bNAbs may be a few years from clinical routine, these prediction models can already be used to assist future clinical bNAb treatment trials.

Figure: Motif Logo for the classification of a test sample with unknown resistance capacity regarding a certain bNAb. The logo reveals residues of the test sequence that have the strongest contribution to the classification result. Here, the top 5 % discriminant residues are considered. Blue upper case letters represent residues occurring in the test sequence that are strongly associated with susceptibility, whereas orange lower case letters show residues strongly associated with resistance.

Analysis of different antibodies against HIV

An important step towards developing a vaccine against HIV is the analysis of the properties of antibodies that can neutralize the virus. To evaluate the potency of potential candidates, there exist laboratory tests that determine how well a certain antibody can neutralize various different HIV strains. For these panels, several different HIV strains are used that are representative of the viral variation around the globe. Our analyses showed that a certain group of viral strains is less important, clinically, and that certain antibodies are significantly less effective in neutralizing these strains than other antibodies. Because current antibody panels treat all viral strains alike, they give the less important strains more attention than they should. We advocate targeting the antibodies to the important strains only, which gives room for making them more effective on these strains. Our results can be used to improve antibody tests and are consequently an important step on the way towards developing a universal vaccine against HIV.

Furthermore, we have been able to show that mono-treatment with certain types of bNAbs by passive antibody transfer may accelerate a change of the viral population to a type of HIV that is correlated with faster disease progression. Therefore, when deciding on treatment options with bNAbs, these effects have to be taken into account by supplementing those antibodies either with antibodies that are effective against those later HIV variants or with additional antiretroviral treatment.

CONTACT

Anna Feldmann DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3027 Email feldmann mpi-inf.mpg.de

Nico Pfeifer DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3019 Email nico.pfeifer mpi-inf.mpg.de

Structural Insights into the Emergence of Resistance in the HIV-1 Protease

The human immunodeficiency virus (HIV) causes AIDS. There is no cure or vaccine against this disease, but, in many cases, viral replication can be repressed, symptoms eased, and the disease progress slowed down by antiretroviral drugs. One of the prominent targets for antiretroviral therapy is the HIV protease, a viral protein that is necessary during virus maturation, a process that makes viral particles fully infectious. A high mutation rate of HIV results in the emergence of viral particles with new traits that make them resistant towards a particular therapy. This in turn can lead to the necessity to switch the drug therapy, as the mutant protein becomes less likely to be blocked by the binding of the inhibiting drug. While most of the major HIV protease resistance-associated mutation sites are on the binding interface between the protein and the drug molecule (in this case called the ligand), some of them are also found in more distant parts of the protein. Understanding how such mutations influence drug binding is crucial to addressing the viral resistance.

Molecular dynamics and thermodynamics

Experimentally determined three-dimensional structures of the complex of protease and bound ligand, acquired by means of X-ray crystallography, provide static information about conformations of these molecules and interactions between them. Molecular dynamics, a computer simulation technique for modeling molecular movement, can be employed to gain insight into the relevant conformations of the complex and its dynamics, starting from a crystal structure template. When used in combination with statistical mechanics methods for thermodynamic analysis, this method makes it possible to estimate the free energy of binding between ligand and different protease mutants. The free energy describes the state of a system of molecules and is related to the probability of observing a specific conformation of this system. This knowledge of the free energy is of highest importance for understanding the behaviour of any molecular system.

Resistance mutation L76V

A particular mutation in the HIV genome, causing position 76 of the protease to change from amino acid leucine to valine (L76V), is a major resistance mutation. This means that the mutated protein is resistant towards inhibition by some protease-targeting drugs, but susceptible towards other drugs. This amino acid is located outside the drug-binding pocket. Molecular dynamics simulations of protease with different sequences, as found in some patients, together with different inhibitors were performed with both L and V variants at position 76. Thermodynamic calculations were applied to estimate the difference in free binding energy between wildtype (76L) and mutant (76V) protease when binding to an inhibitor, indicating whether the mutation renders protein-drug interaction more or less stable. Comparing wildtype and mutant proteins in complex with different drugs revealed differences in the propensities of different amino acids of the protein to make contacts with the inhibitor. This suggests possible mechanisms of how the mutation influences drug binding. Simulations of the unbound protease structures also revealed differences in the conformations assumed by the wildtype and mutant proteins [see figure]. Since it is in the unbound form in which the protease comes to interact with its natural substrates, this analysis provides insight into possible reasons behind evolutionary selection against some mutations: they may simply have consequences detrimental to the formation of the infective viral particle. Overall, this type of analysis sheds light on the effect of distant site mutations and gives possible support for drug development.

Figure: Example of conformation differences between wildtype (green) and mutant (blue) protease
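Two relations mentioned above can be written down compactly: the link between free energy and the probability of observing a conformation (the Boltzmann relation from statistical mechanics), and the difference in binding free energy between mutant and wildtype. The sketch below is an illustration under stated assumptions, not the simulation workflow used in the study; the function names and the example numbers are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def free_energy_difference(p_a, p_b, temperature=300.0):
    """Free energy of state a relative to state b from their populations.

    Uses G_a - G_b = -R*T*ln(p_a / p_b); a more populated state has
    the lower free energy.
    """
    return -R * temperature * math.log(p_a / p_b)


def relative_binding_free_energy(dg_wildtype, dg_mutant):
    """ddG = dG(mutant) - dG(wildtype), both in kJ/mol.

    A positive value means the inhibitor binds the mutant less
    favourably than the wildtype.
    """
    return dg_mutant - dg_wildtype


# Hypothetical example: a conformation observed in 80 % of simulation
# frames versus one observed in 20 % of frames.
print(free_energy_difference(0.8, 0.2))

# Hypothetical binding free energies for wildtype (76L) and mutant (76V).
print(relative_binding_free_energy(dg_wildtype=-50.0, dg_mutant=-45.0))
```

A positive relative binding free energy, as in the second toy call, would correspond to a mutation that weakens the protein-drug interaction.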

CONTACT

Tomas Bastys DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3029 Email tbastys mpi-inf.mpg.de

Bioinformatical Support of HIV Therapy

Preventing and controlling viral resistance is the central goal when administering drug therapies against viral infections. For instance, HIV exists in millions of variants. In order to suppress the replication of the virus inside the body of the patient [see figure 1], combination drug therapies are administered. These are composed of drugs from a base set of over two dozen compounds [see figure 2]. The relevant information for selecting the drugs is the viral genotype, which can be identified from blood samples of the patient with methods of genome sequencing. From the genotype one can identify the viral resistance phenotype with bioinformatics methods. There are two approaches for this purpose. The first consists of experts manually composing a set of rules for deriving the phenotype from the genotype. These rules are collected in computer-based expert systems which are utilized by the therapist. The second, more systematic approach consists of the bioinformatical derivation of the viral phenotype from the genotype based on a suitable set of clinical data about viral resistance. At the Max Planck Institute for Informatics we are following the second approach. Our research over the last decade or so has led to the geno2pheno system for treating AIDS patients, which is freely available over the Internet at www.geno2pheno.org. Analyses offered on the geno2pheno server have found their way into the European guidelines for treating AIDS patients with certain drugs. In 2010 the research on geno2pheno was credited with the AIDS Research Award of the Heinz-Ansmann Foundation.

A class of analyses offered on the geno2pheno server that are used in clinical routine can be regarded as virtual phenotypes. A virtual phenotype is a bioinformatical, i.e. computer-based, procedure which estimates the result of a laboratory experiment that is informative about the therapy of the patient and which therefore can be used as companion diagnostic for selecting the medication. Here the laboratory experiment itself is usually not accessible, be it that it is too expensive, takes too much time, or is unavailable. Therefore the experiment is carried out only on a limited number of samples within a research project, in order to generate a suitable data set on which to derive the virtual phenotype. geno2pheno makes available such virtual phenotypes, which can be used as companion diagnostic for administering certain AIDS drugs.

Another option for antiviral therapy that is in development is the application of antibodies. For this still emerging approach, we have developed supporting software and performed analyses as well (“Attacking HIV From New Angles”, page 45).

Data collection is a substantial component in our research efforts. The resulting databases also afford analyses of networks of transmissions of the virus among AIDS patients (“Reconstructing Pathogen Transmission Networks Using Genomic Data”, page 48).

The molecular basis of drug resistance is a difficult mystery to which we have contributed in case studies (“Structural Insights into the Emergence of Resistance in the HIV-1 Protease”, page 46) as well as through basic research (“The Relation Between the Molecular Conformational Changes and the Free Energy Change”, page 44).

Figure 2: Millions of HIV variants are contrasted with hundreds of combination drug therapies based on over two dozen drugs. The drug selection benefits from bioinformatical support.

Figure 1: HIV particle budding from an infected immune cell (lower right) (courtesy Prof. Schneweis, Bonn)

CONTACT

Thomas Lengauer DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3000 Email lengauer mpi-inf.mpg.de

Reconstructing Pathogen Transmission Networks Using Genomic Data

Rapidly evolving pathogens

The human immunodeficiency virus type I (HIV) has brought about one of the worst pandemics in recent human history with an estimated 2.1 million new infections in 2013. HIV is mainly transmitted through sexual contact among heterosexuals (HET) and men who have sex with men (MSM), and through the sharing of infected needles among intravenous drug users (IDU). Rapidly evolving pathogens like HIV accumulate genomic mutations as they go through a sequence of transmissions. This rapid evolution allows for the reconstruction of transmission networks using genetic data of pathogens collected from infected individuals.

Threshold-free networks

Large HIV sequence repositories, resulting from efforts to study the evolution of drug resistance in HIV, have enabled the construction of nationwide and global HIV transmission networks. Such networks have generally been constructed by thresholding evolutionary distance. A separate feature-rich approach is to construct phylogenetic trees which model evolutionary history and have been widely used for inferring patterns of epidemic spread. The popularity of threshold-based networks over phylogenies is mainly due to the short computational time required for their construction.

A threshold-based approach does not seem appropriate for HIV due to variance in transmission risk by route of infection. As an alternative we construct transmission networks as minimum spanning trees, which is a threshold-free approach. We used phylogenetic trees for extracting accurate estimates of evolutionary distance. In order to reconstruct phylogenetic trees in a reasonable time we employed parallel computing.

We used this approach to construct transmission networks for a collection of 27,124 HIV sequences which were aggregated from clinics and hospitals across Europe. We observed that network robustness was negatively correlated with size. This lack of robustness can be explained by the uncertainty in estimating the topology of the phylogenetic tree. This is because of a correspondence between the topology of transmission networks constructed as minimum spanning trees and the topology of phylogenetic trees. Specifically, each subtree in the phylogenetic tree corresponds to a subgraph in the transmission network as illustrated in figure 1.

Time-scaled networks

The sampling times of virus along with a phylogeny-based molecular clock model can be used to estimate the time of evolutionary events such as the most recent time point when two species diverged from a common ancestor.

While this model has been widely used to infer transmission dynamics from phylogenetic trees, it has not been used for the analysis of transmission networks.

We applied this approach to estimate transmission time for individuals connected in the transmission network and constructed a time-scaled network as shown in figure 2. The time interval between transmission events was extracted from this network and we observed that HET intervals (34 months) were longer than MSM intervals (25 months). IDU intervals were found to be the shortest (20 months), which is probably due to the higher transmission risk involved in contact with infected blood as compared to sexual contact.

Figure 1: Topological correspondence between transmission networks and phylogenetic trees. The colored circles and branches indicate the corresponding subgraphs and subtrees.

Figure 2: Transmission network with edges colored by estimated time of transmission
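To make the threshold-free construction concrete, the following sketch builds a minimum spanning tree over pairwise evolutionary distances using Prim's algorithm. It is a minimal illustration and not the pipeline used for the 27,124 sequences: the sample names and distances are hypothetical, and in practice the distances would be derived from a phylogenetic tree.

```python
import heapq


def minimum_spanning_tree(nodes, distance):
    """Prim's algorithm on a complete graph.

    nodes: iterable of sample identifiers.
    distance: symmetric function (u, v) -> evolutionary distance.
    Returns a list of edges (u, v, d) forming a minimum spanning tree.
    """
    nodes = list(nodes)
    if not nodes:
        return []
    visited = {nodes[0]}
    heap = [(distance(nodes[0], v), nodes[0], v) for v in nodes[1:]]
    heapq.heapify(heap)
    edges = []
    while heap and len(visited) < len(nodes):
        d, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale entry, v was reached by a shorter edge
        visited.add(v)
        edges.append((u, v, d))
        for w in nodes:
            if w not in visited:
                heapq.heappush(heap, (distance(v, w), v, w))
    return edges


# Hypothetical pairwise distances between four sampled sequences.
toy_distances = {
    ("A", "B"): 0.01, ("A", "C"): 0.04, ("A", "D"): 0.05,
    ("B", "C"): 0.02, ("B", "D"): 0.06, ("C", "D"): 0.03,
}


def toy_distance(u, v):
    return toy_distances[tuple(sorted((u, v)))]


edges = minimum_spanning_tree(["A", "B", "C", "D"], toy_distance)
for u, v, d in edges:
    print(u, v, d)
```

Every sample is connected to the network regardless of how distant it is from the others, which is what distinguishes this construction from one that discards all pairs above a fixed distance threshold.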

CONTACT

Prabhav Kalaghatgi DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3023 Email prabhavk mpi-inf.mpg.de

Measuring the Influence of Every Node in a Network by its Expected Force of Infection

Network models are ubiquitous in modern society. They define everything from our interpersonal relationships to the fundamental infrastructures that support our way of life. Even life itself is now frequently understood in terms of networks of biological interactions.

The importance of networks has led to a number of methods for identifying the most important nodes in a network, for example node degree or page rank. The last decade, however, has seen growing awareness that such measures are not informative for the vast majority of nodes that are not highly influential, pointing to a fundamental lack of understanding of the nature of influence in network models. Since innovation almost always originates from less influential nodes, a method for accurately quantifying the influence of any node has practical as well as intellectual utility.

We approach the question within a continuous-time epidemiological framework, building from first principles. In epidemiology the strength of an epidemic process is defined globally via its force of infection (FoI), the current rate at which individuals are becoming infected. Our task is to convert this global outbreak-specific time-varying random variable into an outbreak- and time-independent property of an individual node. Since the FoI is directly proportional to the current number of edges connecting infected and susceptible nodes, the outbreak-specific part amounts to a multiplicative constant. We resolve the time-varying aspect by only considering the early stages of transmission, because only those are influenced by individual nodes. This still leaves us with a random variable, which we summarize by taking its expected value. The resulting measure, the expected force, represents the expected force of infection that would be generated by an epidemic process originating from a given node.

We have tested the measure extensively on both simulated and real networks, for all basic epidemic models, in both continuous and discrete time. The expected force predicts epidemic outcomes with high accuracy over a broad range of network structures and spreading processes. Our most recent work evaluates its predictive power for pandemic spread over the world airline network. This study used highly realistic models which incorporate local population dynamics as well as travel patterns when simulating how a local disease outbreak would spread globally. The expected force of the airport servicing the outbreak’s starting location showed 90 % correlation with the virulence the disease needed to become pandemic and 87 % correlation with the speed at which the outbreak achieved pandemic status.

Since the expected force depends only on local topology, it allows epidemic outcomes on the whole network to be predicted with high accuracy, even when only a small portion of the network is known. The full structure of a real network is rarely fully known; typically the network structure is inferred from indirect, incomplete, and often biased observations.

What, then, does this measure tell us about the nature of node influence? The definition of the expected force implies that influence is determined by both the degree of the node and the degree of its neighbors, and that the relative influence of these two factors is different for nodes of low versus high spreading power. Weaker nodes gain what strength they have from their neighbors, whereas more influential nodes get their strength from their large number of connections.

This confirms what we also know morally to be true. The best way to increase your personal influence is to build up the influence of the people around you.

Figure: Frankfurt has the highest expected force of any airport in the world.

CONTACT

Glenn Lawyer DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3007 Email lawyer mpi-inf.mpg.de

Integrative Analysis of Cancer Data

Despite extensive research, cancer remains a major health threat that is difficult to treat and, in many cases, lethal. One reason for this is the heterogeneity of the disease, meaning that similar symptoms can have diverse underlying genomic or meta-genomic causes. Therefore, personalized treatments, which do not only take into consideration the tissue of the tumor but are tailored specifically to the patient, increase the chance of successful treatment. Cancer subtypes provide a helpful tool for choosing a good therapy for a patient. Here, cancer subtypes refer to groups of patients with distinct clinical outcomes, e.g., one group might benefit from a specific treatment while a different group does not. The better a subtype reflects the subtleties of a tumor, the better a therapy can be adjusted. Established subtypes are mainly based on individual data types, such as the gene expression data of patients. However, tumors often exhibit mutations on different levels of the cell. Although these changes can influence each other, there is no individual data type that captures all the information about a tumor. Therefore, different molecular data is measured for cancer patients, often including for example genetic and epigenetic data. In this context, one aim is the development of methods that reliably integrate these data in a biologically meaningful manner.

Figure: Conventional approaches analyze each data type separately. The integrative analysis of several data types can identify information that is weak in each individual data source but supported by different data sources.

Data integration

One approach for the fusion of these data is regularized multiple kernel learning for dimensionality reduction (rMKL-DR). Using this method, weights for the different data types are learnt in an iterative procedure. These weights reflect the information that is available from the data type such that data types with less information will have smaller weights and, therefore, less influence on the final result than data types with more information. The integrated data points, ours being cancer patients, are then projected into a low-dimensional subspace, which allows for visualisation and further analysis of the structure of the data. The distances between the data points reflect the similarities that were calculated based on the information given in all data types used. In our case, these similarities between the cancer patients can be used to identify integrated subtypes and, thus, help guide treatment decisions.

Interpretation

Conventional machine learning methods are generally incapable of explaining the underlying causes of a tumor, and they cannot adapt to unseen conditions. Therefore, we have developed a method which extracts different explanations for the molecular basis of a disease from the data; it learns which explanations are more likely, according to observed patterns in the individual samples. When a new sample is given to the method, it predicts how likely each extracted explanation is for the data pertaining to this particular sample. It then gives a prediction and its interpretation accordingly. By keeping each explanation simple, in the end, the method gives a specific interpretation of its prediction, which can be used by oncologists to further investigate the specific cause of cancer in a particular patient, and also to use treatment targeted to that patient.

CONTACT

Adrin Jalali DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3028 Email ajalali mpi-inf.mpg.de

Nora K. Speicher DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3027 Email nora mpi-inf.mpg.de
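The role of the data-type weights in the data integration step can be illustrated by combining one similarity (kernel) matrix per data type into a single weighted matrix. This is only a sketch of the combination step: the weights are fixed by hand here, whereas rMKL-DR learns them iteratively, and the matrices, weights, and function name are hypothetical.

```python
def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices, K = sum_m w_m * K_m.

    kernels: list of square matrices (lists of lists), one per data
             type, all over the same patients in the same order.
    weights: one non-negative weight per data type.
    """
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


# Hypothetical similarities between two patients in two data types.
k_expression = [[1.0, 0.8], [0.8, 1.0]]   # similar gene expression
k_methylation = [[1.0, 0.2], [0.2, 1.0]]  # dissimilar DNA methylation

# A data type with a larger weight has more influence on the result.
combined = combine_kernels([k_expression, k_methylation], [0.75, 0.25])
print(combined)
```

With these toy weights the combined similarity of the two patients lies much closer to the expression-based similarity, reflecting the larger weight given to that data type.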

Charting Epigenomes

Epigenetics is a research area which is at the center of worldwide attention today. The reason is that after the sequencing of the human genome just over 10 years ago, it became clear quite quickly that the central aspect of cell regulation cannot be inferred directly from the genomic DNA, since all cells in an organism harbor the same genome. Cell regulation comprises specifically those aspects which are different in different tissues, in which healthy cells are different from diseased cells, and which implement central biological processes such as the cellular response to stress and the process of aging. These aspects of cell regulation are implemented via the dynamic organization of the genome in the cell nucleus which, in turn, is based on chemical modifications of the genomic DNA and of the protein scaffold that envelops it. With techniques of molecular biology that have been developed in the last two decades, such modifications can be measured genome-wide. As a result, their entirety, the epigenome, can be charted.

Whereas each individual has only a single genome, each of us has many epigenomes. In principle, all of the around 200 tissue types in the human organism are characterized by different epigenomes. Furthermore, the epigenomes of healthy cells differ from those of diseased cells. The knowledge of the epigenome is a prerequisite of the understanding of the cell-regulatory processes in the corresponding cell type. For this reason, several years ago an international federation of scientists known as the International Human Epigenome Consortium (IHEC) was founded with the goal of charting at least 1000 epigenomes of human cells within the next five years.

The Max Planck Institute for Informatics contributes to IHEC in two ways. First, it is a partner in the BLUEPRINT project of the EU, which represents the European contribution to IHEC. This project, with the largest ever EU funding volume for a single project in the area of biology of euro 30 million, aims at charting one hundred epigenomes of bloodline cells (haematopoietic cells), of healthy as well as diseased (mostly malignant) cells. Second, the Institute is a partner in the German Epigenome Program (Deutsches Epigenom Programm, DEEP). This project, which is funded by the German Science Ministry with about euro 20 million, comprises the German contribution to IHEC. Within DEEP, metabolic and immunologic diseases in particular are being investigated and another 70 epigenomes charted. The Max Planck Institute for Informatics coordinates the bioinformatics part of DEEP.

In addition to charting epigenomes, we are also involved in interpreting the resulting data. This entails the identification of epigenetic patterns that are characteristic for different types and stages of disease.

For this purpose, we have been developing software systems for analyzing epigenomic data that are in worldwide use (“BiQ Analyzer HiMod for the Locus-Specific Analysis of DNA Modification Data”, page 55, “EpiExplorer and RnBeads: Integrative Analysis of Epigenomic Data”, page 54).

Furthermore, the Institute is involved in validation studies that involve the application of our software in order to reap biological insight from epigenomic data. As a case in point, a study published in 2012 identified the tissues of origin of malignant tumors that could only be found in the form of metastases. Such identification is central to effective tumor therapy.

Figure: The methylation of the DNA impacts its accessibility by the transcription machinery of the cell.

CONTACT

Thomas Lengauer DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3000 Email lengauer mpi-inf.mpg.de

Analyzing the Three-Dimensional Organization of the DNA Inside the Nucleus

For any eukaryotic organism, its complete genome is packed inside the nucleus of the cell in a very dense fashion to form chromatin, which is a complex of DNA, RNA, and proteins. For example, the human genome, which comprises about 3 billion base pairs and, when stretched out, is about 2 m long, fits inside the cell nucleus with a diameter of one thousandth of a millimeter only due to this compaction brought about by forming chromatin. Despite the compact packaging, the chromatin is arranged in three-dimensional (3D) space in a non-random manner; the arrangement, the so-called spatial conformation of the chromosomes, is closely correlated with the functional state of the cell (healthy or diseased), gene activities, and other factors. Thus, a better understanding of this three-dimensional structure can help gain important insights into gene regulation. Though the overall structure of the chromatin is dependent on the state of the cell, the general picture, jointly measured over many cells, of the spatial co-localization of different portions of the genome can suggest interactions between chromosomal regions that lie far away in the genome sequence, and are possibly functionally important for cells belonging to a particular tissue.

Only over the last decade have molecular biology techniques advanced to be able to identify such ‘long-range’ chromosomal interactions genome-wide. These techniques have provided us with the first set of genome-wide interaction data to model the 3D structure of the chromosome. Obtaining multiple, high-resolution genome-wide interaction profiles for all the different cell types of various organisms still remains very costly.

In our research, we are developing statistical models that help analyze these ‘long-range’ interaction data. Our models let us predict the interaction between putative interacting pairs, i.e., a putative, functionally important pair of genomic loci that spatially co-localize. For example, one of these genomic loci could be a region that is a few hundred base pairs long and just flanking a gene, and another a region that lies several kilo- or megabases apart on the linear genome, but plays a role in the expression of this gene. Furthermore, we can characterize the DNA sequence motifs that are responsible for the prediction in order to analyze general underlying mechanisms. Our model can be useful in a scenario where only low-resolution data on genome-wide interactions are available, but a few selected portions of the genome have been studied at high resolution. In this case we can build a model using the latter and help make reasonable predictions on even the low-resolution data. Based on the informative sequence signals our model picks up to make these predictions, it may be possible to further investigate the underlying folding mechanisms of the genomes. Also, our predictions may be helpful in improving existing techniques that build coarse-grained 3D models of the chromatin and suffer from the use of low-resolution genome-wide experiment data. :::

Visualization of the DNA sequence motifs that contribute most to the prediction. Panel (A) in the above figure visualizes the top-6 of a set of enriched motifs found in the regions highly interacting with a particular genomic locus of interest over a ‘long-range’. Panel (B) shows the top-6 of the set of enriched motifs in the regions that are depleted for these interactions. Each sequence motif shown is visualized as a sequence logo where, at each position, the relative height of the base pairs denotes their frequency at the position, and the total height of the letters denotes the information content of the position.

CONTACT

Sarvesh Nikumbh DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3028 Email snikumbh mpi-inf.mpg.de

Phenotype-Based Computational Clinical Diagnosis for Genetic Diseases

Finding the correct diagnosis for patients is challenging, especially in the field of medical genetics. A differential diagnosis is complicated by a large number of genetic disorders, each of which is characterized by many clinical symptoms that overlap with those of other diseases. Additionally, many clinical symptoms show a variable degree of expression in patients with a disease, which further complicates the diagnosis. It is essential to avoid unnecessary diagnostic procedures before a final diagnosis is made, to reduce the burden on patients and the cost of their medical care. For complex and rare genetic diseases, patients sometimes wait years until a correct diagnosis has been made.

Over the past years we have developed novel algorithms and statistical approaches to solve the differential clinical diagnosis problem, in cooperation with the group of Prof. Dr. Peter Robinson at the Charité hospital in Berlin. These approaches make use of the Human Phenotype Ontology (HPO) that was developed in Berlin.

The Phenomizer system for computational clinical diagnosis

The Phenomizer webserver implements our research ideas and is freely available to physicians all over the world. The user enters clinical symptoms of a patient, as defined by standardized HPO terms. Then the Phenomizer returns a ranked list of the best matching differential diagnoses from all genetic disorders in the database. Currently the Phenomizer knows more than 10 000 disease symptoms and their relationships. These are assigned to over 7500 diseases.

The Phenomizer reports for each search result the estimated probability of obtaining the same result by chance alone, obtained with efficient algorithms that we have developed. Within seconds, a doctor receives a list with the most probable results, so physicians no longer have to search databases or books for several hours. The list supports them in detecting the correct disease more quickly or in planning additional examinations to decide between two or more genetic diseases that the Phenomizer deemed equally likely for the given query symptoms.

Going mobile for clinical practice

In order to support fast decision-making in clinical practice and due to the widespread availability of smartphones and tablets, we have developed the Phenomizer App for Android-based systems. The App offers the same functionality as the Phenomizer webserver on these devices and supports the saving and exchange of queries.

The Phenomizer server has been used over 50 000 times in the last 3 years and more than one thousand people have the App installed. :::

The Phenomizer app allows computational diagnosis of genetic diseases in clinical practice.
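The idea of reporting, for each ranked diagnosis, the probability of obtaining the same result by chance alone can be illustrated with a toy example. The sketch below is not the Phenomizer's scoring or its efficient p-value algorithms: it assumes a simple overlap score as a stand-in for the similarity measure and estimates the p-value by plain random sampling. All function names, term labels and disease names are hypothetical.

```python
import random


def overlap_score(query, disease_terms):
    """Toy similarity: fraction of query terms annotated to the disease."""
    query = set(query)
    return len(query & set(disease_terms)) / len(query)


def empirical_p_value(query, disease_terms, all_terms, n_samples=2000, seed=0):
    """Estimate how often a random query of the same size scores at least as well."""
    rng = random.Random(seed)
    observed = overlap_score(query, disease_terms)
    pool = sorted(all_terms)
    hits = 0
    for _ in range(n_samples):
        random_query = rng.sample(pool, len(query))
        if overlap_score(random_query, disease_terms) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_samples + 1)


def rank_diseases(query, database, n_samples=2000):
    """Return (disease, score, p_value) tuples, most significant first."""
    all_terms = set().union(*database.values())
    results = []
    for name, terms in database.items():
        score = overlap_score(query, terms)
        p_value = empirical_p_value(query, terms, all_terms, n_samples)
        results.append((name, score, p_value))
    return sorted(results, key=lambda r: (r[2], -r[1]))
```

Sorting by p-value rather than by raw score reflects the motivation given above: a match that is easy to obtain with randomly chosen symptoms is less informative than an equally good match that rarely occurs by chance.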

CONTACT

Marcel H. Schulz DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3115 Email mschulz mpi-inf.mpg.de

EpiExplorer and RnBeads: Integrative Analysis of Epigenomic Data

Our DNA encodes how our cells behave and what they look like. However, there is a multitude of additional layers of control that determine the fate of a cell. Chemical modifications of the DNA itself (such as DNA methylation) or its molecular scaffold (such as histone modifications) regulate the ways in which the DNA is packed and processed in the nucleus of the cell and are thus responsible for the cell’s biological properties (phenotype). Two software packages developed at the Max Planck Institute for Informatics facilitate the interpretation and integration of such epigenetic layers of molecular information: EpiExplorer allows for an integrative view and interactive exploration of genomic and epigenomic features based on technology that has been developed in the context of Internet search engines for retrieving information in a fast and intuitive fashion. RnBeads is a comprehensive software package that supports the analysis of experimentally obtained DNA methylation data.

EpiExplorer: A tool for the efficient search and comparison of genomic and epigenomic features

Recent technological advances are leading to an explosion in the amount of quantitative biological data available for a large number of individuals and cell types. Major bioinformatics challenges include the integration of many heterogeneous datasets comprising sequence information on the molecular architecture of the packed genome (chromatin) and on its epigenomic regulation as well as facilitating the discovery of molecular interrelations between genomic and epigenomic features. EpiExplorer is a web-based tool that enables scientists to explore a multitude of large datasets in a fast and intuitive fashion. EpiExplorer’s key strength lies in supporting exploratory hypothesis generation using a broad range of genome-wide data analyses performed in real time over the internet. The tool integrates a variety of genomic and epigenomic maps and helps scientists identify landmarks along the genome sequence characterizing regions in the genome of particular interest. EpiExplorer uses a versatile text indexing scheme in order to perform comprehensive analysis and visualization over the web in a matter of seconds. Datasets can also be compared quantitatively, and users can incorporate their own datasets into the analysis.

Comprehensive analysis of DNA methylation data using RnBeads

DNA methylation comprises an important layer of epigenomic regulation of genomic programs and is probably the best studied of all epigenomic marks. It is a major contributor to embryonic development, and aberrant methylation patterns have been associated with complex diseases like cancer. Various experimental assays have been developed for genome-wide analysis of DNA methylation patterns. RnBeads is a software package that enables both wet lab scientists and computationally inclined scientists to run the entire analysis pipeline on data originating from DNA methylation experiments: Experimental biases can be detected and removed and DNA methylation fingerprints can be quantified, visualized, and compared. For instance, genomic regions that exhibit differential methylation fingerprints in cancerous tissue compared to normal cells can be identified and annotated with biological knowledge. Genomic regions of interest can be exported to EpiExplorer where comparison with other (epi)genomic maps is facilitated. While the default workflow of RnBeads requires no expert knowledge, due to its modular design it is also highly configurable for custom analysis. The tool generates comprehensive and interpretable results in HTML format, thus facilitating easy tracking and comparison of analyses as well as the exchange of results with collaboration partners. :::

EpiExplorer facilitates fast, interactive searches in genomic and epigenomic maps and visualizes them in an intuitive fashion.

CONTACT

Felipe Albrecht DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3008 Email albrecht mpi-inf.mpg.de
Fabian Müller DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3009 Email fmueller mpi-inf.mpg.de Internet http://epiexplorer.mpi-inf.mpg.de

BiQ Analyzer HiMod for the Locus-Specific Analysis of DNA Modification Data

Every cell in the human body contains an identical copy of our genetic makeup, the so-called DNA (deoxyribonucleic acid). The DNA is seen as the blueprint for all parts of the body, yet it is obvious that due to the numerous different cells that constitute our body, e.g., skin, heart, or liver cells, a complex machinery is required to translate the blueprint and to control human development from single egg to old age. A central component of this system is the so-called epigenome, a generic term for mechanisms that regulate the reading of genetic information and thus affect cellular development (see also “Charting Epigenomes”, page 51). A simple chemical modification of the DNA, called DNA methylation, is one of the best-studied epigenetic marks. DNA methylation is one of the key factors that lets the cellular machinery distinguish between active genes, i.e., readable and intended for protein production, and inactive genes.

DNA methylation as source of medical information

The importance of DNA methylation in the regulation of biological processes inside the cell is not limited to normal human development. It is, in fact, in the focus of active biomedical research for many different diseases. Aberrant patterns of DNA methylation are known to play a role in the development and progress of severe diseases such as cancer or Alzheimer’s. State-of-the-art research results in methods for the earlier diagnosis of diseases based on anomalies in DNA methylation patterns as well as more personalized treatment, e.g., an individual estimate of the chances of success of chemotherapy for certain types of cancer. To foster this progress, in particular in the field of biomedical research, modern and easy-to-use software is needed that allows analysis of DNA methylation data even for non-bioinformaticians.

Keeping pace with scientific progress

We develop software tools tailored for this specific scenario (see also “EpiExplorer and RnBeads: Integrative Analysis of Epigenomic Data”, page 54). The original version of BiQ Analyzer was published in 2005 and quickly became a standard tool for visualization, quality control, and locus-specific, i.e. targeted, analysis of DNA methylation data. Within a few years, technological advances enabled scientists to produce new data faster and in higher quantities, which demanded an update of our program to accommodate data generated by these so-called high throughput technologies (BiQ Analyzer HT for high throughput, 2010). An important aspect of such updates is to keep established and accepted user interfaces intact, which makes it unnecessary for users to adapt to new ways of operating the software. Besides technological advances, new biological insights also require software maintenance: recently discovered chemical variants of DNA methylation receive more and more attention from scientists due to their assumed biological relevance. A variant called hydroxymethylation is suspected to be a key factor in the cellular de-methylation process, i.e., the controlled removal of DNA methylation, an event that lacked an explanatory model up to now.

BiQ Analyzer HiMod [see Figure] supports the analysis of regular DNA methylation data and all its known variants. Developing a suitable update was complicated by the fact that standard experimental protocols do not allow distinguishing between DNA methylation and DNA hydroxymethylation during the computational analysis. Specialized laboratory protocols are necessary to reliably identify DNA hydroxymethylation later in a comparative analysis done by BiQ Analyzer HiMod. Long-term support and regular updates adapting bioinformatics software to new discoveries and technologies are essential to enable researchers not only to understand biology, but more importantly, to develop new approaches in fighting complex diseases. :::

Different visualization modes of (hydroxy-)methylation data in BiQ Analyzer HiMod

CONTACT

Peter Ebert DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3008 Email pebert mpi-inf.mpg.de Internet http://biq-analyzer-himod.bioinf.mpi-inf.mpg.de

Detection of Related Proteins in Unrelated Viral Species

A virus is a small particle that con- tains a nucleic acid (DNA or RNA) har- bouring the genomic information, encap- sulated in a protein shell. Viruses lack ribosomes and other essential molecular machines, and thus they rely on the host cell of another organism that they infect for replication.

Unlike cellular organisms, viruses do not have a universal common ancestor, and the origin and evolution of viruses is still highly debated. Evolutionary inference in the virus kingdom is very complicated. The evolutionary relatedness of cellular organisms is usually deduced from comparison of the RNA sequence from the small ribosomal subunit, which is absent in viruses. Additionally, viruses have a very high rate of evolution, several orders of magnitude faster than that of cellular organisms. This makes sequence comparison effective only in closely related viruses, and the evolutionary relationships between different virus families are largely unknown.

In our work, we combine sequence comparison with analysis of protein structure to push further the boundaries of our understanding of viral evolution. Proteins are biological machines that implement virtually all processes in a living organism. Proteins are encoded in the genome in the form of genes, which are translated into a sequence of amino acids that fold into complex three-dimensional (3D) structures. The exact form of the protein 3D structure is dictated by the chemical properties of its amino acids. It is that form that is the basis for the biological function of the protein. Evolutionarily informative similarities between 3D protein structures can be found between much more distantly related organisms than those with detectable similarities of their genetic sequences.

We have compared all known sequences and structures of seemingly unrelated viruses in order to detect similarities that may imply evolutionary links. Particularly, we are interested in viruses with different types of genome-carrying nucleic acid (DNA or RNA, single- or double-stranded), since they exhibit the largest differences in their life cycle. We have observed two general patterns of evolution: Some viral proteins are widely distributed in very different viral species and thus are very ancient, possibly dating back to the origin of life itself. Others have been transferred between viral species via a process known as horizontal or lateral gene transfer, a process of transfer of genetic material by means other than from parent to offspring.

A very interesting example is posed by the hemagglutinin-esterases (HE) of Torovirus and Betacoronavirus, related viral lineages from the Coronaviridae family that cause gastric and respiratory syndromes in vertebrates. HEs mediate attachment of the viral particle and subsequently destroy the host cellular receptor protein involved in viral cell entry. They are similar in sequence to HE of Influenza C virus, but sequence comparison does not indicate from which of the species the protein originated. However, comparison of protein 3D structures of Influenza A, Influenza B and Influenza C (Orthomyxoviridae family) HEs with that of Betacoronavirus clearly shows that HE was introduced into Torovirus and Betacoronavirus after the split of Influenza C HE from Influenza A and B around 8000 years ago. Since other viruses from the Coronaviridae family do not have HEs, this must have been a lateral gene transfer event in Torovirus and Betacoronavirus lineages. They share hosts, so the transfer could have happened to one of them and then to the other, or independently. :::

Transfer of HE. Green arrows represent potential routes of lateral gene transfer.

CONTACT

Olga Kalinina DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 9325-3004 Email kalinina mpi-inf.mpg.de

Structural Variation in Genomes

Each of our body cells harbors a copy of our genome, the set of all genetic material. The genome is organized in chromosomes: we inherit 23 chromosomes from each of our parents. A chromosome consists of a long DNA molecule, stabilized and spatially structured by special proteins. The DNA encodes genetic information as a sequence of its constituting bases adenine (A), cytosine (C), guanine (G), and thymine (T). In this sense, a chromosome can be viewed as a sequence of the letters A, C, G, and T. Likewise, we can represent a genome as a set of such sequences (one for each chromosome). The research field of genomics asks the question, among others, of how our genome influences our traits. Many different traits can be of interest, such as body height, the risk of contracting a certain disease, or the tolerance to a drug, for instance.

The genomes of two persons can differ in a multitude of ways, as illustrated in Figure 1. Traditionally, many genetic studies focused on point mutations (SNPs). That is, one restricts oneself to differences in single letters. In recent years, however, increasing attention has also been given to larger differences, the so-called structural variants, such as deletions or duplications of whole DNA segments. Such variants occur frequently in humans and, hence, studying them is indispensable for capturing the full spectrum of human genetic diversity. This is enabled by new sequencing technologies in combination with novel algorithmic methods, which we develop actively.

High-throughput methods for genome sequencing allow for reading millions of short DNA fragments in parallel. That is, these devices do not return the sequence of a whole chromosome; instead, they output the sequences of short fragments, called reads. The analysis of the resulting data is thus comparable to solving a jigsaw puzzle: from the reads, we want to reconstruct the sequenced genome. Here, we particularly focus on detecting structural differences.

In a large study of 250 families, our computational methods have contributed to characterizing structural genetic variants in detail. In this project, the complete genomes of both parents and one child were sequenced for each family. The large number of studied families as well as the high sequencing depth render this effort a leading project worldwide. By reading the genomes of parents and children, we can determine which genetic variants found in the children were not inherited from the parents, but are the result of new mutations. The rate at which these so-called de novo mutations happen is of great interest as a parameter in models of human evolution.

Beyond fundamental research, studying structural genetic variants is potentially important for personalized medicine. So far, association studies that aim to establish connections between the genome of a person and clinically relevant traits were restricted to point mutations. Our recent research provides evidence that taking structural variants into account in such studies leads to new, medically relevant insights. :::

Differences between the genomic sequences of two persons

CONTACT

Tobias Marschall DEPT. 3 Computational Biology and Applied Algorithmics Phone +49 681 302 70880 Email t.marschall mpi-inf.mpg.de
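The family-based identification of de novo mutations described in “Structural Variation in Genomes” boils down to a comparison of variant sets once the variants of each family member have been called. The following is a minimal sketch under strong assumptions: variants are already detected without error and described by exact coordinates, which real sequencing data does not guarantee. The `Variant` type and the function name are hypothetical and only serve the illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    kind: str  # e.g. "SNP" for a point mutation, "DEL" for a deletion, "DUP" for a duplication


def de_novo_variants(child, mother, father):
    """Variants observed in the child that are present in neither parent."""
    parental = set(mother) | set(father)
    return sorted(set(child) - parental)
```

In practice, a variant missed in a parent would wrongly appear as de novo, which is one reason why a high sequencing depth matters for this kind of study.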

GUARANTEES

Software and Hardware must be reliable. The most important criterion for reliability is correctness. Nearly as important, however, is performance: A correct answer that is not received in time is not helpful. The search for correctness and performance guarantees is a cross-departmental research topic at the institute.

Today, computers are a ubiquitous part of our lives. We use them constantly, sometimes knowingly, such as a computer on the desk, and sometimes unknowingly, such as the electronic controls in the car, airplane, or washing machine. The more our lives are dependent upon hardware and software, the more we pose the question of whether the trust we place in these products is justified. The answer to this question is especially difficult and complex because software, and meanwhile hardware, is not robust. A single change in a program, a minor error in one line of a program with several hundred thousand lines of code, or a small change in the processor’s calculations can cause the overall program to crash or deliver wrong results. Or, the developers may not have considered an infrequent but possible border case, and software that has worked reliably for years can suddenly show incorrect behavior. The granting of guarantees becomes even more complex when we examine not just a single program, but a distributed system of programs in which parts work independently but also react to each other while pursuing different goals.

The article “Dynamic Matching under Preferences” investigates the latter situation. It examines the problem of match-making between individuals pursuing different preferences, seeking answers as to under which assumptions the match-making will be successful.

The articles “Distributed Algorithms for Fault-tolerant Hardware” and “Formal Verification of Peer-to-Peer Protocols” discuss guarantees of different highly distributed systems: the former modern processor hardware, and the latter modern internet protocols.

In order to provide guarantees of programs in general, we need deductive methods that enable us to derive the desired properties from a program in the form of proofs. This requires automatic verification procedures, which are discussed in the article “Automated Deduction”, as are techniques to properly divide and conquer resulting proof obligations, in particular, from theories such as arithmetic. This aspect is discussed in the article “Quantifier Elimination – Statements can also be Calculated”. :::

CONTRIBUTIONS

60 Formal Verification of Peer-to-Peer Protocols
61 Distributed Algorithms for Fault-tolerant Hardware
62 Automated Deduction
63 Dynamic Matching under Preferences
64 Quantifier Elimination – Statements can also be Calculated

Formal Verification of Peer-to-Peer Protocols

Peer-to-peer (P2P) networks are thriving as a means of communication over the Internet. An email service, for example, may be based on the more traditional client-server model, where all information is stored in a central, dedicated location called a server, and client devices, such as phones and laptop computers, communicate with the server to get new messages or manage the inbox. The server controls the level of access users/clients have to resources and information and coordinates the communication between different clients. This centralized model has its benefits, but it is vulnerable due to the server being a single point of failure.

Instead, many modern Internet applications that we use every day, like Skype, Adobe Flash Player, Jaman or Tor, rely on peer-to-peer technology. Participating nodes form an overlay network on top of the Internet. In this overlay network, messages do not pass through a central location; the work of delivering a message to its destination is distributed among nodes. Each node maintains a routing table with the addresses of a few other nodes in the network. If node x does not know the address of the destination node y, it must forward the message to some node z that it does know about on the way to y. Distributed hash tables provide the underlying data structure for this kind of message routing. Among the most popular P2P protocols are Chord, Pastry and Kademlia.

P2P protocols are susceptible to churn, since new nodes may join at any time, and existing nodes may sign off or fail without giving notice. This makes it very challenging to ensure that each node’s routing table stays up to date. Inconsistency in the nodes’ local information may lead to big problems. In August 2007, for example, Skype was unavailable to the majority of users for 3 days, most likely due to a failure in its underlying P2P technology. It is therefore very important to verify the correctness guarantees claimed by these protocols. Hand-written proofs are subject to mistakes, and they almost always contain some. For example, though both Pastry and Chord offer some correctness/performance guarantees for which there are hand-written proofs, research has shown that no published version of either protocol is correct.

Formal verification offers a much better, though much more tedious, alternative. Interactive tools allow a user to write a mathematical proof in some formal language in the form of steps. Each proof step can be sent to a theorem prover software for verification. The proof is accepted if the prover can verify every step. If a step is too large or difficult for the theorem prover to prove automatically at one time, it can be broken into sub-steps. The size of the final proof largely depends on how good the underlying theorem prover is, both at handling large problems, and at handling problems of a particular type or from a particular domain. For example, verifying P2P protocols requires provers that can reason about arithmetic and sets, since node addresses are typically represented by numbers from a ring. Gaining experience in the formal verification of such distributed systems, as well as devising suitable reasoning techniques for the underlying provers, is an important field of research. :::
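The forwarding rule from the routing description (node x passes the message to a known node z on the way to y) and the representation of addresses as numbers from a ring can be sketched in a few lines. The code below is a simplified, Chord-style illustration for a static network without churn. It is not the specification of Chord, Pastry or Kademlia, and the ring size, function names and node identifiers are assumptions chosen for the example.

```python
RING_BITS = 6
RING_SIZE = 2 ** RING_BITS  # addresses are the numbers 0 .. 63 on a ring


def clockwise_distance(a, b):
    """Number of steps from address a to address b when walking clockwise."""
    return (b - a) % RING_SIZE


def successor(node_ids, point):
    """First node at or after the given point in clockwise order."""
    return min(node_ids, key=lambda n: clockwise_distance(point, n))


def build_routing_tables(node_ids):
    """For each node, store the first node at least 2**k steps ahead, for every k."""
    tables = {}
    for n in node_ids:
        known = {successor(node_ids, (n + 2 ** k) % RING_SIZE) for k in range(RING_BITS)}
        known.discard(n)
        tables[n] = sorted(known)
    return tables


def route(tables, source, destination):
    """Forward greedily to the known node closest to the destination; return the path."""
    path = [source]
    current = source
    while current != destination:
        remaining = clockwise_distance(current, destination)
        # Only consider known nodes that do not overshoot the destination.
        candidates = [z for z in tables[current] if clockwise_distance(current, z) <= remaining]
        current = min(candidates, key=lambda z: clockwise_distance(z, destination))
        path.append(current)
    return path
```

With the node identifiers 1, 8, 14, 21, 32, 38, 42, 48, 51 and 56, a message from node 8 to node 56 travels along 8, 42, 51, 56, although node 8 itself only knows nodes 14, 21, 32 and 42. The arithmetic modulo the ring size is exactly the kind of reasoning about numbers and sets that a prover has to handle when such a protocol is verified.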

CONTACT

Noran Azmy RG. 1 Automation of Logic Phone +49 681 9325-2908 Email azmy mpi-inf.mpg.de

Distributed Algorithms for Fault-tolerant Hardware

Distributed Computing is concerned with systems in which communication – as opposed to computation – is the main obstacle to solving a given task fast. It also addresses the issue of fault-tolerance, i.e., the ability of a system to work correctly even if some of its components fail. As hardware development continues to follow “Moore’s Law”, which states that the number of basic components in off-the-shelf hardware doubles roughly every 18 months, both of these aspects become increasingly important. In other words, nowadays a common computer chip, and even more so a multi-core processor, is a distributed system in its own right.

If this claim strikes you as odd, just consider the following facts: (i) at today’s clock speeds in the gigahertz range, an electrical signal cannot traverse an entire chip in a single clock cycle; (ii) by now, standard processors host billions of transistors, making it virtually impossible to guarantee that each of them operates correctly; and (iii) the most basic standard components of computer chips have reached a size of about 10-20 nanometers (a nanometer is a billionth of a meter!), a size at which physical effects necessitate dealing with transient faults, i.e., components temporarily and unpredictably operating out-of-spec.

One of the most fundamental challenges in designing computer chips is the distribution of a well-behaved clock signal throughout the chip. Up to a certain scale, this can be successfully achieved by so-called clock trees [see figure 1]. Basically, a clock tree is carefully engineered to ensure that the clock signal generated by a single quartz crystal arrives at adjacent components at (almost) the same time. This makes it possible to perform computations synchronously: a computational step is executed, then the results are “locked” and communicated, and then the received new values are used for the next computational step. The synchronous design paradigm offers many advantages and is therefore the de facto standard in hardware design – up to the point at which systems become too large to be efficiently clocked by a single clock tree.

We argue that using distributed algorithms, one can efficiently and reliably generate and distribute a clock signal on a significantly larger scale than possible with a single clock tree. For instance, consider Figure 2, which illustrates a simple fault-tolerant clock distribution mechanism. In contrast to a clock tree, there are different paths along which the clock signal is propagated; hence, the system as a whole can still operate correctly even if a few isolated nodes or links fail. We work towards efficient realizations of this and similar schemes to enable building larger, yet more reliable systems. :::

Figure 2: Part of a clock distribution grid. Each node waits for the second received signal before forwarding the clock signal on all outgoing links. Hence, the yellow nodes still operate correctly even if the red node or its outgoing links fail.
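The forwarding rule described for the clock distribution grid (wait for the second received signal, then forward) can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the report: the grid is modeled as layers of equal width that wrap around, every node listens to three nodes in the layer below, all links have the same delay, and a faulty node simply never forwards. The function name `pulse_times` and all parameters are hypothetical.

```python
import math

def pulse_times(width, layers, faulty=frozenset(), delay=1.0):
    """Return times[l][i]: when node i in layer l forwards the clock pulse.

    Assumed toy topology: node i in layer l has incoming links from nodes
    i-1, i, i+1 (wrapping around) in layer l-1. Layer 0 fires at time 0.
    A node forwards once its SECOND incoming signal has arrived.
    Faulty nodes never forward (time = infinity).
    """
    times = [[math.inf if (0, i) in faulty else 0.0 for i in range(width)]]
    for l in range(1, layers):
        row = []
        for i in range(width):
            if (l, i) in faulty:
                row.append(math.inf)
                continue
            arrivals = sorted(times[l - 1][(i + d) % width] + delay
                              for d in (-1, 0, 1))
            row.append(arrivals[1])  # second signal triggers forwarding
        times.append(row)
    return times

# One isolated faulty node: every other node still fires on schedule.
ok = pulse_times(6, 4, faulty={(1, 2)})
# Two adjacent faulty nodes: the node above them never sees two signals.
broken = pulse_times(6, 3, faulty={(1, 2), (1, 3)})
```

In this toy model a single failed node is masked because each node above it still hears from two correct neighbors, whereas two adjacent failures are not, which matches the restriction to "a few isolated nodes" in the text.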

Figure 1: Clock tree. In case of a fault at the junction marked red the entire red area fails.

CONTACT

Christoph Lenzen
DEPT. 1 Algorithms and Complexity
Phone +49 681 9325-1008
Email clenzen mpi-inf.mpg.de
Internet http://www.mpi-inf.mpg.de/departments/algorithms-complexity/research/theory-of-distributed-systems/

Automated Deduction

In order to guarantee that a piece of hardware or software is working correctly, it must be verified – i.e., its correctness must be proven formally. Proving correctness means that one has to check whether certain properties of the system follow from other properties of the system that are already known. The question of how to use computer programs for such proof tasks has been an important research topic for many decades. Ever since the fundamental theoretic results of Gödel and Turing at the beginning of the twentieth century, it has been known that not everything that is true in a mathematical sense is actually provable, and that not everything that is provable is automatically provable. Correspondingly, deduction systems differ significantly in their expressiveness and properties. For example, decision procedures are specialized for a certain type of data (such as real numbers) and are guaranteed to detect the correctness or incorrectness of a statement within this area. Automatic theorem provers for so-called first-order logic can handle any data types defined in a program. For such provers, it is guaranteed only that a proof will be found if one exists; if none exists, the prover may continue searching without success, possibly forever. Even more complicated problems can be handled using interactive provers; these provers, however, only work with user assistance and without any guarantee of completeness.

How does a theorem prover for first-order logic work? It is not difficult to write a program that correctly derives new formulas from given formulas. A logically correct derivation is, however, not necessarily a useful derivation. For example, if we convert 2 · a + 3 · a to 2 · a + 3 · a + 0 and then to 2 · a + 3 · a + 0 + 0, we do not make any computation errors, but we are not getting any closer to our goal either. The actual challenge is thus to find the few useful derivations among infinitely many correct derivations. The first step in this direction is easy: Obviously it is a good idea to apply equations in such a way that the result is simplified, for example, “x + 0 = x” only from left to right and not the other way round.

Figure: Application of equations

However, this approach is not always sufficient. A typical example is fractional arithmetic: We all know that it is occasionally necessary to expand a fraction before we can continue calculating with it. Expansion, however, causes exactly what we are actually trying to avoid: The equation “(x · z) / (y · z) = x / y” is applied from right to left – a simple expression is converted into a more complicated one. The superposition calculus developed by Bachmair and Ganzinger in 1990 offers a way out of this dilemma. On the one hand, it performs calculations in a forward direction; on the other hand, it systematically identifies and repairs the possible problematic cases in a set of formulas where backward computation could be inevitable. Superposition is thus the foundation of almost all theorem provers for first-order logic with equality, including the SPASS theorem prover that we have developed at the Institute.

In our research group, our current focus is on refining the general superposition method for special applications. In particular, we are developing techniques for combining the capabilities of various proof procedures. We are investigating, on the one hand, how arithmetic decision procedures can be integrated into superposition, and on the other hand, how superposition provers can improve the degree of automation of interactive provers. We are also addressing the question of how superposition can be used for business applications such as product configuration. :::

Figure: Output of an automatic theorem prover
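The idea of applying an equation such as “x + 0 = x” only from left to right can be sketched as a tiny term rewriter. This is an illustration of oriented rewriting only, not of the superposition calculus or of SPASS; the term encoding (nested tuples of the form `(operator, left, right)`) and the function name `simplify` are assumptions made for this example.

```python
def simplify(term):
    """Rewrite a term bottom-up using the single oriented rule  x + 0 -> x.

    Terms are numbers, variable names (strings), or tuples (op, left, right).
    The rule is never applied from right to left, so terms only shrink
    and the rewriting always terminates.
    """
    if not isinstance(term, tuple):
        return term  # a constant or a variable: nothing to rewrite
    op, left, right = term
    left, right = simplify(left), simplify(right)
    if op == "+" and right == 0:
        return left  # x + 0 -> x
    return (op, left, right)

# 2*a + 3*a + 0 + 0, the "useless" derivation from the text, run backwards
padded = ("+", ("+", ("+", ("*", 2, "a"), ("*", 3, "a")), 0), 0)
print(simplify(padded))
```

Because the rule is only used in the simplifying direction, the padded term collapses back to 2 · a + 3 · a; applying it in the other direction would produce the endless chain of correct but useless derivations described above.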

CONTACT

Uwe Waldmann
RG. 1 Automation of Logic
Phone +49 681 9325-2905
Email uwe mpi-inf.mpg.de

Dynamic Matching under Preferences

Matching, the grouping of items into disjoint pairs, is a fundamental problem in graph theory, combinatorics, and computer science. A number of important applications for matching arise when participants with preferences strive to form contacts or groups. For example, consider a set of boys and a set of girls, each with a preference over the members of the other sex. If they form couples, what matchings are desirable or likely to evolve? A natural condition is stability: There is no pair of boy and girl such that both prefer each other over their current partner [see figure]. Such stable matchings have many applications in uncoordinated job or housing markets, in multi-agent systems, and centralized markets like the NRMP program in the United States, where medical residents are assigned to hospitals.

Figure: A stable matching with three boys and three girls

Matching under preferences is an active research area in computer science and economics, documented by recent Nobel memorial awards in economics for Alvin Roth and Lloyd Shapley for their work on the subject.

Dynamics in matching markets

In uncoordinated markets, there is no authority that dictates a matching. Instead, participants deviate iteratively to preferred pairs if possible. These matching dynamics are known to describe a “path to stability”, a sequence of matchings that converges to a stable matching.

In large markets, participants often need to learn about possible matches before they can engage in a joint activity. For example, in job markets people often discover possible jobs only via personal contacts or because they work in the same company. To capture this aspect, we studied locally stable matching and dynamics with changing visibility based on a (social) network. A new pair can form only if they currently know about each other and prefer to deviate. We showed that here a path to stability might not exist and dynamics might fail to converge. Interestingly, if each participant has a small memory and repeatedly remembers a random previous partner, convergence can be achieved, in some cases even in polynomial time. Thus, the emergence of stability crucially depends on the means to overcome locality constraints.

Matching in dynamic markets

Even in coordinated markets, dynamics can arise. For example, while a firm can make centralized decisions about jobs and employees, there might be retirements, new positions, current employees seeking promotion, etc. Here, the firm needs to maintain a matching under preferences in a dynamic environment. A natural goal is to avoid unnecessary re-assignments and to make only a small number of changes. It turns out that stable matchings cannot be maintained in such a way and might require changing many pairs frequently.

Another stability criterion for matchings is popularity. For a given α ≥ 1, a matching is α-more popular than another if there are α times more participants that prefer the former to the latter. For an α-popular matching, there is no other matching that is α-more popular. Popularity is a global criterion that says a majority of participants prefer the status quo. The unpopularity factor α relaxes this idea – for larger α, more unhappy participants are required to label a matching unpopular, and hence, the set of α-popular matchings expands.

We have shown how to maintain α-popular matchings using dynamics: Move to an α-more popular matching whenever possible. For large α, this strategy makes changes only rarely, but the maintained conditions are unpopular. We have provided a tight analysis on how a lower α leads to a larger rate of change. In fact, if every person is interested in at most d possible partners, our strategy maintains O(d)-popular matchings at O(d) changes per iteration. :::
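The stability condition (no boy and girl who both prefer each other over their current partner) can be checked mechanically. The sketch below lists such "blocking pairs" for a given matching; it assumes complete preference lists and a perfect matching, and the names, the preference data, and the function `blocking_pairs` are invented for illustration rather than taken from the report's figure.

```python
def blocking_pairs(matching, boy_prefs, girl_prefs):
    """Return all (boy, girl) pairs that prefer each other to their partners.

    matching:   dict boy -> girl (assumed to be a perfect matching)
    boy_prefs:  dict boy -> list of girls, most preferred first
    girl_prefs: dict girl -> list of boys, most preferred first
    A matching is stable exactly when the returned list is empty.
    """
    partner_of_girl = {g: b for b, g in matching.items()}
    pairs = []
    for boy, prefs in boy_prefs.items():
        for girl in prefs:
            if girl == matching[boy]:
                break  # all later girls are less preferred than his partner
            rank = girl_prefs[girl].index
            if rank(boy) < rank(partner_of_girl[girl]):
                pairs.append((boy, girl))
    return pairs

boy_prefs = {"b1": ["g1", "g2", "g3"],
             "b2": ["g1", "g3", "g2"],
             "b3": ["g2", "g1", "g3"]}
girl_prefs = {"g1": ["b2", "b1", "b3"],
              "g2": ["b1", "b3", "b2"],
              "g3": ["b3", "b2", "b1"]}

stable = {"b1": "g2", "b2": "g1", "b3": "g3"}
unstable = {"b1": "g1", "b2": "g2", "b3": "g3"}
print(blocking_pairs(stable, boy_prefs, girl_prefs))
print(blocking_pairs(unstable, boy_prefs, girl_prefs))
```

In the uncoordinated dynamics described above, each step would resolve one such blocking pair by letting the two participants deviate together.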

CONTACT

Martin Hoefer
DEPT. 1 Algorithms and Complexity
Phone +49 681 9325-1055
Email mhoefer mpi-inf.mpg.de

Quantifier Elimination – Statements can also be Calculated

The understanding of computing on computers has historically evolved in several steps. In the early days of electronic computing equipment, in addition to dealing with large amounts of data, the numerical processing of numbers was in the foreground. The development of computer algebra since the mid 1960s generalized that processing to computation with symbolic expressions: A simple example is the calculation of (x – 1) · (x + 2) + 2 with the result x^2 + x. The fact that computational work has indeed been performed here is evident, among other things, because the evaluation of the original expression for a concrete x requires double the amount of arithmetic operations as that of the result. Building on this, it became possible, e.g., to automatically differentiate real functions and even calculate indefinite integrals automatically.

In mathematics, symbolic expressions typically appear in complex statements that can quantify some of the occurring symbols, as, for example, in the following statement about real numbers: For all x there is a y, such that x^2 + xy + b > 0 and x + ay^2 + b ≤ 0. What can it possibly mean to “compute” such a statement? Whether our statement is true or not depends on the selection of a and b. We therefore determine the possible choices of a and b for which the statement is true. The result can be rephrased as a statement that no longer contains any kind of quantification: a < 0 and b > 0. Here again, significant computational work has been performed: based on the result, in contrast to the original formulation, one can determine with minimal effort for concrete choices of a and b whether our considered statement is true or not.

Statements about real numbers that contain arithmetic operations, comparisons, logical operations and quantification, as in our example, can always be computed in such a way that the result does not contain any quantification. In the special case where all symbols in a statement are quantified, the result contains no symbols at all; it is then either 0 = 0 (“true”) or 1 = 0 (“false”).

Let us consider, e.g., a sequence of real numbers as follows: The first two elements x_1 and x_2 are arbitrary. All others are created according to the rule x_{n+2} = |x_{n+1}| – x_n. If we start with, say, 1, 2, then we subsequently obtain 1, –1, 0, 1, 1, 0, –1, 1, 2, and we observe that the sequence repeats after the first nine items. This fact can be formulated as a “for all” statement about x_1, ..., x_11. From this, our method computes 0 = 0. This way, we automatically prove that for arbitrary starting values the sequence will repeat after the first nine items.

If we consider statements about integers instead of real numbers, then there can be no corresponding software; and this is mathematically provable. In this sense, the methods discussed here border on what is mathematically possible. In addition to integers and real numbers, there are also many other domains in the natural sciences being researched at the Institute in terms of such methods. This even includes domains in which symbols not only stand for numbers, but also for functions. Corresponding statements can contain arithmetic and derivatives.

This theoretical research is practically implemented in the software system Redlog, which numerous scientists around the world regard as a powerful and efficient computational tool. The computation time for our above proof about the number sequence is only approximately 0.07 seconds with Redlog.

A recent application of Redlog this year was the analysis of an automatic speed control for cars. Here, speed is automatically regulated in such a way that a collision with a preceding vehicle is impossible. Redlog computed starting speeds at which the control can be safely transferred to the system.

Quantifier elimination and decision methods are also finding application in the field of the analysis of complex systems in natural sciences, for example electrical networks in physics, reaction systems in chemistry, or gene-regulatory networks in biology. :::
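The sequence rule x_{n+2} = |x_{n+1}| – x_n from the text is easy to experiment with. The sketch below is only a spot check on sample starting values using exact rational arithmetic; it is not the quantifier-elimination proof that Redlog performs and it proves nothing for all real numbers. The function names are hypothetical.

```python
from fractions import Fraction
import random

def sequence(x1, x2, length):
    """First `length` items of the sequence x_{n+2} = |x_{n+1}| - x_n."""
    xs = [x1, x2]
    while len(xs) < length:
        xs.append(abs(xs[-1]) - xs[-2])
    return xs[:length]

def repeats_after_nine(x1, x2):
    """Check x_10 == x_1 and x_11 == x_2 for one pair of starting values."""
    xs = sequence(x1, x2, 11)
    return xs[9] == xs[0] and xs[10] == xs[1]

print(sequence(1, 2, 11))

# Spot check on random rational starting values (exact, no rounding errors).
rng = random.Random(1)
samples = [(Fraction(rng.randint(-50, 50), rng.randint(1, 9)),
            Fraction(rng.randint(-50, 50), rng.randint(1, 9)))
           for _ in range(100)]
print(all(repeats_after_nine(a, b) for a, b in samples))
```

A test like this can only raise confidence for the values tried; the point of the method described above is that the "for all" statement about x_1, ..., x_11 is decided symbolically for every choice of starting values at once.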

CONTACT

Thomas Sturm
RG. 1 Automation of Logic
Phone +49 681 9325-2920
Email sturm mpi-inf.mpg.de

RESEARCH AREAS

INFORMATION SEARCH & DIGITAL KNOWLEDGE

Digital information has changed our society, economy, work in science, and everyday life. Modern search engines deliver useful information for nearly every question, and the Internet has the potential to be the world’s most comprehensive knowledge base. However, knowledge structures in the Internet are amorphous, and search engines rarely have precise answers to expert questions for which users would consult reference books and expert literature. A great challenge and research opportunity lies in making the leap from processing raw information to computer-aided intelligent use of digital knowledge.

In parallel to this aim for a quantum leap, we observe a complexity explosion in digital information along several dimensions: quantity, structural variety, digital history, multi-modality, semantic search and text analytics, as well as natural language access to data and knowledge.

In addition to billions of web pages, important information sources are: online news streams, blogs, tweets and social networks with several hundred million users, online communities for photos, music, books, health, politics, scientific topics, and last but not least encyclopedias like Wikipedia. The total volume of this data is in the order of exabytes: 10^18 bytes – more than one million terabyte disks.

There is an increasing amount of more expressive data representations: tables in web pages, XML documents, RSS feeds, semantically connected RDF graphs, and much more. The richer structure and heterogeneity of the data increases the complexity of mastering this digital variety.

The history of digital information – for example earlier versions of our institute’s web pages, which are partly conserved in the Internet Archive – is a potential gold mine for in-depth analyses along the time dimension. Sociologists and political scientists, but also media and market analysts as well as experts in intellectual property, can profit from this.

In addition to text-oriented and structured data, we are also experiencing an explosion of multimodal information: billions of people are becoming data producers in the web by sharing images, videos, and sound recordings with the rest of the world. A large fraction of such data arises from social media, where relationships and interactions among users are further assets for search, analyses, and recommendations.

Queries on the Internet and in enterprises often refer to entities and their attributes and relationships: people, companies, products, etc. Semantic search is the emerging technology to provide precise and concise answers by harnessing digital knowledge bases. In addition, this opens up great opportunities for advancing text analytics in terms of entities and relations.

The proliferation of smartphones entails an increasingly important user need for natural-language question answering and dialog. To this end, computational linguistics, statistical learning and digital background knowledge are essential.

Scientists at our institute address these challenges from various perspectives. The great vision of making the quantum leap towards knowledge search and discovery is pursued in work on automatic knowledge extraction from web sources such as Wikipedia. The Max Planck Institute for Informatics is a world-wide leader in this important research direction. :::

CONTRIBUTIONS

Large-Scale Mining and Analysis of Sequential Patterns

Scalable Analysis of Very Large Datasets

AIDA – Resolving the Name Ambiguity

YAGO – A Collection of Digital Knowledge

Fast Searches in Large RDF Knowledge Bases

Answering Natural Language Questions Logically

DEANNA – Natural Language Questions over the Web of Data

Open Information Extraction

FERRARI: Quickly Probing Graphs for the Existence of a Relationship

STICS – Search and Analysis with Strings, Things, and Cats

Siren – Interactive Redescription Mining

KnowLife: Knowledge Extraction from Medical Texts

Large-Scale Mining and Analysis of Sequential Patterns

Frequent sequence mining is one of the fundamental problems in data mining with various applications. In speech recognition, for example, frequent sequences can be used to correctly identify the speech input (e.g., “get up at eight o’clock” occurs much more often than “get a potato clock”). Likewise, in customer behavior analysis, frequent sequences can be used to describe common behavior across users. As an example, an online video rental may exploit the knowledge that customers often watch Pulp Fiction shortly after having watched Kill Bill I and II to generate movie recommendations in the future.

Existing methods for frequent sequence mining typically operate on a single machine and, therefore, cannot deal with today’s vast amounts of data. We have developed novel methods for frequent sequence mining based on MapReduce, which can be deployed on clusters of many machines and scale out as the data grows. MapReduce, developed at Google in 2004, constitutes a natural environment for large-scale distributed data processing on clusters of many commodity machines and can handle hardware and software failures transparently. Our methods make use of Apache Hadoop, which is an open-source implementation of MapReduce widely used in industry and constitutes a core building block within big-data initiatives.

In more detail, we consider three variants of the frequent sequence mining problem and have devised tailored approaches for each of them. First, we consider the special case of contiguous sequences, i.e., for a sequence to be frequent, it has to occur sufficiently many times “as it is” in the data. Our second variant allows for gaps of bounded length between items constituting the sequence. This allows us to find the sequence “get up at _ o’clock” (with _ as a placeholder), although it might not be frequent when plugging in any concrete number. Lastly, we allow for generalizations of items along a predefined hierarchy. For instance, when working with natural language documents, we may generalize words into their part-of-speech (e.g., article or adjective). Thus, we may find the frequent sequence “get up [preposition] eight o’clock,” where [preposition] is a placeholder for a preposition such as “at” or “before”.

In our experiments, we have demonstrated that our methods can compute frequent sequences on large real-world datasets. In a collection of more than 50 million documents, our methods can compute frequent sequences consisting of up to 50 words in less than an hour, using a cluster of only 11 commodity machines.

In addition to computing frequent sequences, statistics about frequent sequences can be used for several analysis and exploration tasks. For example, the evolution of the frequency of a sequence over time can be used for studying changes in the use of language or for analyzing popularity of entities (e.g., “How often was Barack Obama mentioned within each year during the last two decades?”). Further, quickly determining frequent sequences that meet certain criteria allows for targeted exploration (e.g., “Which movies are frequently watched after Pulp Fiction?”). The figure illustrates an example of targeted exploration in the field of text analytics using our software “SyntEx”. It shows frequent sequences of the form “[…] to […]” sorted according to their frequency in the archive of The New York Times.

Figure: Exploring sequences with SyntEx

All of our methods are publicly available as open-source software. :::

CONTACT

Kaustubh Beedkar
University of Mannheim, Mannheim, Germany
Email kbeedkar uni-mannheim.de

Klaus Berberich
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5005
Email kberberi mpi-inf.mpg.de

Rainer Gemulla
University of Mannheim, Mannheim, Germany
Email rgemulla uni-mannheim.de
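The first variant described in this article, contiguous frequent sequences, follows the familiar map-and-reduce pattern: a map step emits every word sequence up to a maximum length, and a reduce step sums the counts and keeps the sequences that occur often enough. The sketch below runs both steps in a single Python process on three toy sentences; it is not the authors' Hadoop implementation, and the function names and the `min_support` and `max_len` parameters are assumptions for illustration.

```python
from collections import Counter

def map_ngrams(document, max_len):
    """Map step: emit every contiguous word sequence of length 1..max_len."""
    words = document.split()
    for n in range(1, max_len + 1):
        for start in range(len(words) - n + 1):
            yield tuple(words[start:start + n])

def frequent_sequences(documents, min_support, max_len):
    """Reduce step: sum the emitted counts, keep sequences seen often enough."""
    counts = Counter()
    for document in documents:
        counts.update(map_ngrams(document, max_len))
    return {seq: c for seq, c in counts.items() if c >= min_support}

docs = ["get up at eight o'clock",
        "get up at nine o'clock",
        "get a potato clock"]
frequent = frequent_sequences(docs, min_support=2, max_len=3)
print(frequent)
```

On a cluster, the map step would run in parallel over partitions of the document collection and the framework would group identical sequences before the reduce step; the gap and hierarchy variants need more than this plain counting scheme.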

Scalable Analysis of Very Large Datasets

The technical capabilities for data collection as well as the number of available data sources have increased tremendously in recent years, imposing new, unprecedented challenges to information management. The development of the Web 2.0 and social networks, the ubiquity of mobile devices and sensor networks, and the advances in gathering scientific data have contributed significantly to this development. The resulting flood of data is difficult, if not impossible, to manage using traditional tools for data management such as relational databases. On the one hand, the sheer size of the data requires massively-parallel processing, using hundreds or thousands of computers. On the other hand, novel methods for data analysis are necessary to extract useful information from the raw data.

At the Max Planck Institute for Informatics, we develop efficient, highly scalable methods and systems for the statistical analysis of such big datasets. Internet companies such as Amazon, Google, or Netflix, for example, analyze data about users and their behavior to provide personalized recommendations for products, websites, news, movies, or images. From a user’s point of view, such personalized recommendations enable the targeted exploration of potentially interesting items; from a provider’s point of view, customer satisfaction and loyalty increase. The American movie rental company Netflix, for instance, allows its more than 20 million customers to rate movies using a 5-star system. The so-obtained ratings are used to individually recommend new, not-yet-seen movies to users. Modern recommender systems are based on an approach called “collaborative filtering”, i.e., the behavior of many users and user groups is analyzed in order to create recommendations for each individual user. The key challenges that recommender systems need to solve are (1) the modeling and prediction of user interests, and (2) the creation of interesting recommendations on the basis of these predictions.

Figure 1: Known incomplete matrix

Figure 2: Automatically completed matrix

A successful technique for prediction, which is also used by Netflix in production, models the available movie ratings in the form of an incomplete matrix and subsequently tries to “complete” this matrix. Every row of the matrix corresponds to a user, every column to a movie, and every entry to a rating of the respective movie by the respective user. Figure 1 visualizes such an incomplete matrix; here, black dots correspond to unknown values. We have developed parallel algorithms that can complete incomplete matrices with millions of users, millions of movies, and billions of entries within a couple of minutes. Every entry of the output matrix corresponds to a predicted rating. Figure 2 shows a completed matrix computed using our methods.

One option to create recommendations for each user is to recommend the movies with the highest predicted ratings. This is a non-trivial task, however, because the completed matrix is usually too large to be constructed and processed in reasonable time. We have developed algorithms that efficiently extract only good recommendations for each user, without computing all predicted ratings. Our methods are orders of magnitude faster than the naive approach of fully computing the completed matrix.

In practice, we may not want to simply recommend the movies with the highest predicted rating. For example, it is important that recommendations are diverse, i.e., that each user’s recommendations consist of sufficiently different movies (e.g., different genres or different directors). The availability of licenses or physical media is also crucial: if a user accepts a recommendation, then this recommendation should be delivered as fast as possible. Our techniques allow for efficient, high-quality personalized recommendations that respect constraints such as those mentioned above.

Apart from personalized recommendations, our group also works on methods for the interactive exploration of large document collections, for the analysis and automated extraction of knowledge from natural language text, and for pattern mining on sequential data. :::

CONTACT

Rainer Gemulla
University of Mannheim, Mannheim, Germany
Email rgemulla uni-mannheim.de

Christina Teflioudi
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5029
Email chteflio mpi-inf.mpg.de
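To make the idea of "completing" an incomplete rating matrix concrete, the sketch below uses one common technique: approximating the matrix by the product of two small factor matrices that are fitted to the known ratings with stochastic gradient descent. The article does not say which technique the group's parallel algorithms use, so this should be read as a generic, single-machine illustration; the function names, the toy ratings, and all hyperparameters (rank, learning rate, regularization) are assumptions.

```python
import random

def complete_matrix(ratings, n_users, n_items, rank=2, epochs=500,
                    lr=0.02, reg=0.02, seed=0):
    """Fit user factors P and item factors Q to the known ratings and
    return the full matrix of predicted ratings P * Q^T.

    ratings: list of (user, item, rating) triples for the known entries.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][k] * Q[i][k] for k in range(rank))
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)  # gradient step, user
                Q[i][k] += lr * (err * pu - reg * qi)  # gradient step, item
    return [[sum(P[u][k] * Q[i][k] for k in range(rank))
             for i in range(n_items)] for u in range(n_users)]

def rmse(ratings, completed):
    """Root mean squared error of the predictions on the known ratings."""
    total = sum((r - completed[u][i]) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

# Six known entries of a 3 x 3 user-by-movie matrix (made-up values).
known = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
untrained = complete_matrix(known, 3, 3, epochs=0)
completed = complete_matrix(known, 3, 3)
```

Every entry of `completed`, including those with no known rating, is a predicted rating; recommending the highest-predicted unseen movies per user is the naive approach that the text contrasts with more efficient retrieval.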

AIDA – Resolving the Name Ambiguity

Have you ever googled for your own name to find out what the Web knows about you? Chances are that you’re not the only person with your name. The right Web pages are buried among others, unless you are very famous indeed! This is, of course, not the only scenario where ambiguity makes life difficult. When we read our daily news, most of the names mentioned are ambiguous. As human beings, we deal with ambiguity without thinking; the right meaning seems obvious to us. Only in the most difficult cases – take for example the sentence “Bush was a US president.” – do we notice it. Without further information we cannot know if “Bush” means George H. W. Bush (the 41st president) or his son George W. Bush (the 43rd president).

The knowledge about which person (organization, film, or song, etc.) is mentioned where on the Web, or indeed in any given text, is very useful for a multitude of applications. Where search engines could previously only look for a string of characters, they can now actually understand what exactly the user is looking for, and return more precise results. This allows you to specify the real meaning when you are looking for the rock group called “Bush” and not a US president. Or imagine you are a researcher looking for differences in the media reception of Bush Sr. and Bush Jr. You can easily get all the articles and find out how often they were mentioned in them without having to even read a single one.

Our AIDA disambiguation system resolves the ambiguity by linking names in text to a canonical entity representation in a knowledge base, for example YAGO. YAGO contains nearly 10 million unique entities, among them nearly 1 million persons, but it also contains locations, organizations, products and events – see the “YAGO – a collection of digital knowledge” article in this report for more details. The disambiguation process consists of different pieces of data and directives, each of which gives additional clues as to which entity is actually referred to in the text. Combining all of them in the right manner identifies the correct entity.

The most important insights for a correct disambiguation are the following. A name probably refers to the most prominent entity. When the name “Paris” is found in a text it generally refers to the French capital. There has to be really good contextual evidence to suggest otherwise. Take for example the sentence “Paris had to steal Helen from her husband, the king of Sparta.” The words in this sentence suggest that it is a person from Greek mythology. This kind of contextual evidence is the second feature for our disambiguation mechanism. Every entity in our knowledge base is associated with a textual description in keyword form that is compared to the surrounding context of the name. The more the context and the description overlap, the better the indication for the entity. However, in some cases, words alone are not enough, especially when the contextual evidence is very limited. To deal with these cases our methodology enforces coherence among the resolved entities, preferring candidates that go well together. In the example sentence “Paris met Helen.”, Paris and Helen from Troy are more coherent than Paris Hilton and Helen, Georgia, a small U.S. city.

The interplay of the described features can be seen very well in the sentence “Bush did not handle the aftermath of Katrina in New Orleans very well.” New Orleans is easy to resolve, as the city is very prominently associated with the name. The first name “Katrina” is highly ambiguous, but the way it is placed in the context of “aftermath” gives a strong indication that it is not a person but indeed a natural disaster. Once both “Katrina” and “New Orleans” are resolved to the hurricane and the city, it becomes clear that “Bush” refers to Bush Jr., as the hurricane hit during his presidency.

The disambiguation quality of AIDA was tested on a collection of newswire articles, and AIDA achieves better results than any other existing disambiguation method. The resulting knowledge of entities in a text allows a more powerful search in these texts and additionally serves as a basis for the extraction of further knowledge about the entities, for example how they relate to each other. :::

CONTACT

Johannes Hoffart
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5004
Email jhoffart mpi-inf.mpg.de

Gerhard Weikum
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5000
Email weikum mpi-inf.mpg.de

Mohamed Amir Yosef
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5009
Email mampi mpi-inf.mpg.de
Internet http://www.mpi-inf.mpg.de/yago-naga/aida/
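The three signals named in this article (prominence of an entity, overlap between the context and the entity's keywords, and coherence among the chosen entities) can be combined in a toy scoring function. The sketch below is not AIDA's actual algorithm or API: the miniature knowledge base, the prior values, the keyword sets, the equal weighting of the three signals, and the brute-force search over all combinations are all assumptions made to keep the example small.

```python
from itertools import product

# Hypothetical miniature knowledge base: candidates per name, each with a
# prominence prior and a keyword description.
KB = {
    "Paris": [
        {"id": "Paris_(city)", "prior": 0.8,
         "keywords": {"france", "capital", "seine"}},
        {"id": "Paris_(mythology)", "prior": 0.1,
         "keywords": {"troy", "sparta", "greek"}},
        {"id": "Paris_Hilton", "prior": 0.1,
         "keywords": {"hotel", "heiress"}},
    ],
    "Helen": [
        {"id": "Helen_of_Troy", "prior": 0.6,
         "keywords": {"troy", "sparta", "menelaus"}},
        {"id": "Helen,_Georgia", "prior": 0.4,
         "keywords": {"georgia", "city"}},
    ],
}
# Pairs of entities that "go well together" (assumed coherence relation).
COHERENT = {frozenset({"Paris_(mythology)", "Helen_of_Troy"})}

def disambiguate(mentions, context_words):
    """Pick one entity per mention, maximizing prior + overlap + coherence."""
    best, best_score = None, float("-inf")
    for combo in product(*(KB[m] for m in mentions)):
        score = sum(c["prior"] + len(c["keywords"] & context_words)
                    for c in combo)
        for a in range(len(combo)):
            for b in range(a + 1, len(combo)):
                if frozenset((combo[a]["id"], combo[b]["id"])) in COHERENT:
                    score += 1.0
        if score > best_score:
            best, best_score = combo, score
    return [c["id"] for c in best]

print(disambiguate(["Paris", "Helen"], {"met"}))
print(disambiguate(["Paris"], {"capital", "france"}))
```

With almost no context in "Paris met Helen", the prior alone would pick the French capital; only the coherence bonus tips the joint decision towards the two figures from Greek mythology, which mirrors the example in the text.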

YAGO – A Collection of Digital Knowledge

In recent years, the Internet has developed into a significant source of information. Train schedules, news, even entire encyclopedias are available online. Using search engines, we can query this information, but current search engines have limits. Assume we would like to know which scientists are also active in politics. This question can hardly be formulated in a way that it can be answered by Google. Queries like “politician scientist” only return results about opinions on political events. The problem here is that the computers we use today can store a tremendous amount of data, but they are not able to relate this data to a given context, much less understand it. If it were possible to make computers understand data as knowledge, this knowledge could be helpful not only for Internet searches, but also for many other tasks, such as understanding spoken language or the automatic translation of text into multiple languages. This is the goal of the “YAGO-NAGA” project at the Max Planck Institute for Informatics.

Before a computer can process knowledge, it must be stored in a structured fashion. Such a structured knowledge collection is called an ontology. The building blocks of an ontology are entities. An entity is any concrete or abstract object: the physicist Albert Einstein, the year 1879, or the Nobel Prize. Entities are connected by relations; for example, Albert Einstein is connected to the year 1879 by the relation “born” [ see graph ]. We have developed an approach to automatically create such an ontology using the online encyclopedia Wikipedia. Wikipedia contains articles about thousands of personalities, products, and organizations. Each of these articles becomes an entity in our ontology.

There is, for example, an article about Albert Einstein, so the physicist can be recognized as an entity for the ontology. Each article in Wikipedia is classified into specific categories; the article about Einstein, for example, is in the category “born in 1879.” The keyword “born” allows the computer to store the fact that Einstein was born in 1879. Using this approach, we get a very large ontology, in which all of the entities known to Wikipedia have their place. This ontology is called YAGO (Yet Another Great Ontology, http://www.mpi-inf.mpg.de/yago-naga/yago/). At the moment, YAGO contains nearly 10 million entities and about 80 million facts.

More recent extensions of YAGO pay particular attention to the issue of multilinguality and the organization of entities and facts in space and time – two dimensions that are highly useful when searching in a knowledge base. As an example, the great majority of the approximately 900,000 person-entities in YAGO are anchored in time by their birth and death dates, allowing us to position them in their historical context. For example, one can ask questions about important historical events during the lifetime of a specific president, emperor or pope, or ask if and when someone actually was president.

Most of the approximately 7 million locations in YAGO have geographic coordinates which place them on the Earth’s surface. Thus, spatial proximity between two locations can be used as a search criterion. An exemplary search using the space and time criteria could be: Which 20th century scientists were awarded a Nobel Prize and were born in the vicinity of Stuttgart? In YAGO one finds, among others, Albert Einstein, because both his lifetime (1879-1955) and his birthplace Ulm (70 kilometers from Stuttgart) are stored in the knowledge base. The multilingual nature of YAGO allows these queries to be posed in any language, for instance asking for “Wissenschaftler” who received a “Nobelpreis”.

CONTACT

Johannes Hoffart
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5028
Email jhoffart@mpi-inf.mpg.de

Fabian Suchanek
DEPT. 5 Databases and Information Systems
Email fabian@suchanek.name

Gerhard Weikum
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5100
Email weikum@mpi-inf.mpg.de

Internet http://www.mpi-inf.mpg.de/yago-naga/yago/
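The two ideas described in this article, deriving facts from Wikipedia category names and searching by space and time, can be illustrated with a minimal sketch. It is not the YAGO extraction code: the function names, the sample records, and the approximate coordinates are assumptions made for illustration only.

```python
import math
import re


def facts_from_categories(entity, categories):
    """Derive (subject, relation, object) facts from 'born in <year>' categories."""
    facts = []
    for category in categories:
        match = re.fullmatch(r"born in (\d{4})", category)
        if match:
            facts.append((entity, "born", int(match.group(1))))
    return facts


def distance_km(a, b):
    """Great-circle distance (haversine) between two (latitude, longitude) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Illustrative records; coordinates are approximate and not taken from YAGO.
PEOPLE = [
    {"name": "Albert_Einstein", "born": 1879, "birthplace": (48.40, 9.99)},  # Ulm
    {"name": "Marie_Curie", "born": 1867, "birthplace": (52.23, 21.01)},     # Warsaw
]
STUTTGART = (48.78, 9.18)


def born_near(people, center, radius_km, first_year, last_year):
    """Names of people born within radius_km of center in the given year range."""
    return [p["name"] for p in people
            if first_year <= p["born"] <= last_year
            and distance_km(p["birthplace"], center) <= radius_km]
```

Calling `born_near(PEOPLE, STUTTGART, 100, 1850, 1900)` combines a temporal filter (birth year) with a spatial one (distance to Stuttgart); a real knowledge base would answer such a query from indexed facts rather than by scanning a list.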

Fast Searches in Large RDF Knowledge Bases

Semantic data

Our focus in this work lies in the scalable management of semantic data, that is, data that contains information about relationships between objects. Such relationships are ubiquitous. For example, a book has one or more authors, a protein takes part in specific reactions, and a user of Web 2.0 pages has connections to his friends. These relationships between things, or, more abstractly, entities, together with the entities themselves, form a network or graph structure. Using formal semantics associated with it, the data graphs form semantic networks, i.e., networks which allow for automatic reasoning.

A data format specially designed for graph-structured data is the Resource Description Framework (RDF). An RDF data collection consists of triples, each of which corresponds to an edge and an associated node pair in the data graph. In RDF notation, a triple consists of a subject, a predicate and an object. In the graph, this corresponds to the labeled edge from subject to object. The example contains edges which express that the subject is associated with the name “Arthur Conan Doyle” and was born in 1859.

[Figure: Example RDF graph]

Search in RDF graphs

Using the above notation, complex relationships can be formulated in a concise manner. The simplicity and flexibility achieved through the graph structure of the data come at a price: RDF searches are relatively expensive. Search queries are normally formulated in a query language such as SPARQL and describe a pattern that must be searched for in the data. If one searches, for example, for the titles of books by Scottish authors, one can describe this with the following triple patterns:

?autor <geborenIn> ?stadt
?stadt <liegtIn> Schottland
?autor <autorVon> ?buch
?buch <Titel> ?titel

The parts beginning with a question mark are variables whose values must be determined by the database engine. The parts in angle brackets are the values given by the user as search conditions. In order to answer this query, the database system must find all data triples that match the pattern of the query. Because the data graphs are very large, there can be many millions of candidate triples, making searches very difficult.

The two database engines we have developed for this purpose, RDF-3X (“RDF Triples Express”) and TriAD (“Triple Asynchronous Distributed”), approach these problems on several levels. First, the data itself is stored appropriately so that individual triple patterns can be efficiently evaluated. Compressing and indexing the data using search trees enables quick evaluation of any given triple pattern. However, this is not enough; users are mostly interested in larger interrelationships and thus pose queries with linked triple patterns. Therefore, we map such interrelated queries into execution strategies using algebraic operators. The different execution alternatives are assessed with the assistance of statistical models, and the one expected to have the lowest execution time is chosen. For the example query, one must estimate whether there are more authors or more cities in the data set and how this affects the execution time. A careful choice of the execution strategy can often accelerate the query processing by a factor of 10 or more.

Efficiency and Scalability

Based on these ideas for indexing RDF data collections and optimizing queries, we have recently designed the TriAD engine to further scale these techniques to very large RDF collections. TriAD follows, in principle, a classical master-slave, shared-nothing architecture but employs a proprietary, asynchronous communication protocol based on the Message Passing Interface (MPI) in which all slaves can freely communicate with each other. RDF data is partitioned (“sharded”) and indexed in multiple permutations at each slave. One of the novel approaches followed in TriAD is to shard triples based on their locality in the original RDF data graph. This helps in minimizing the communication costs and keeping query processing at the local slaves as efficient as in a centralized approach. However, unlike in a centralized setting, we can process a query in parallel over the different shards, which are distributed across all compute nodes. In addition to the main RDF indexes, a compact summary graph of the RDF data graph is maintained at the master node, by which we can additionally prune irrelevant partitions of the main indexes even before the actual processing at the slaves takes place. In our experiments, which we have conducted on a number of both real-world and synthetic datasets, TriAD consistently outperformed comparable approaches. For a large instance of the synthetic LUBM benchmark (1.8 B triples), we could demonstrate a performance improvement over an RDF-3X installation by a factor of more than 300 (using a 10-node compute cluster).

CONTACT

Sairam Gurajada
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5026
Email gurajada@mpi-inf.mpg.de
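To make the notion of matching linked triple patterns concrete, the following is a minimal sketch of a naive evaluator for the example query above. It is not how RDF-3X or TriAD work internally (they rely on compressed indexes and cost-based optimization); the data set, the identifiers such as `doyle` and `book1`, and the simple "fewest variables first" ordering heuristic are assumptions chosen for illustration.

```python
# Illustrative data set; identifiers such as "doyle" and "book1" are made up.
TRIPLES = {
    ("doyle", "geborenIn", "Edinburgh"),
    ("Edinburgh", "liegtIn", "Schottland"),
    ("doyle", "autorVon", "book1"),
    ("book1", "Titel", "A Study in Scarlet"),
    ("verne", "geborenIn", "Nantes"),
    ("Nantes", "liegtIn", "Frankreich"),
    ("verne", "autorVon", "book2"),
    ("book2", "Titel", "The Chancellor"),
}

# The example query: titles of books by authors born in a Scottish city.
QUERY = [
    ("?autor", "geborenIn", "?stadt"),
    ("?stadt", "liegtIn", "Schottland"),
    ("?autor", "autorVon", "?buch"),
    ("?buch", "Titel", "?titel"),
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, t) != t:
                return None  # variable already bound to a different value
        elif p != t:
            return None      # constant does not match
    return extended


def evaluate(patterns, triples):
    """Join all patterns, starting with the most constrained (fewest variables)."""
    ordered = sorted(patterns, key=lambda p: sum(is_var(t) for t in p))
    bindings = [{}]
    for pattern in ordered:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match_pattern(pattern, t, b)) is not None]
    return bindings
```

Here the pattern with the constant `Schottland` is evaluated first, so only Scottish cities are carried into the later joins. This is a crude stand-in for the statistical cost estimation described above, but it shows why the order of evaluation affects the amount of intermediate work.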

Answering Natural Language Questions Logically

Modern search engines such as Google or Bing cannot answer natural language questions in general and instead return a list of documents in which the user can find answers to his question. Understanding natural language, the question as well as the sources of information, is the main problem for which there is no comprehensive solution. Many linguistic phenomena such as the ambiguity of words and phrases (for example “bank”: financial institution or place to sit?) require much data and complex procedures.

In our research, we develop novel techniques for answering English questions based on the knowledge base YAGO (see “YAGO – A Collection of Digital Knowledge”, page 71). The two main steps are:
1) Analysis of the meaning of the question and
2) Search for an answer in YAGO

The second step has already been well researched in our group so that the focus of the current research lies on the analysis of the meaning of the question. For this purpose we cooperate with the department of computational linguistics at Saarland University.

Analysis of the meaning of a question

The meaning of a question can be derived from the meaning of its subphrases. First, subphrases are derived from single words of the question and are used to derive larger subphrases. The procedure is repeated until the complete question has been covered and a meaning has been derived for it. Figure 1 shows an exemplary analysis of the question “Who saw the man with the telescope?”. This question is ambiguous and can be interpreted in two different ways:
1) as asking for a person who saw a man through a telescope [ figure 1 ].
2) as asking for a person who saw a man carrying a telescope [ figure 2 ].
Such ambiguities occur very often in the English language and make natural language processing very difficult.

Semantic analysis for disambiguation

Humans can disambiguate such phrases as the above because of their life experience and knowledge about the context, i.e., additional knowledge about a given situation. A computer system does not have any life experience and needs other forms of knowledge, in addition to the knowledge about the context, with which the true meaning can be recognized. For example, if the computer system knows that only the referenced man from the above question owns a telescope and nobody else (knowledge about the context), then the computer system can refute the first interpretation of the question. With the following general knowledge and knowledge about the English language, the sentence “The chair saw the table.” can be analyzed unambiguously.

General knowledge (in form of an ontology):
• A chair is a piece of furniture.
• A piece of furniture is not a living being.
• A chairman is a person.
• A person is a living being.

Knowledge about the English language:
• “chair” can represent a piece of furniture.
• “The chair” can represent a chairman.
• “saw” can represent a verb which requires the subject to be a living being.

With the combination of both knowledge sources, the computer system can conclude that the phrase “The chair” refers to a living being, a chairman, and not a piece of furniture. Finally, knowledge about the context enables the system to resolve the reference and realize which person is being referred to.

We achieve such semantic analyses by means of first-order logic and corresponding methods from the domain of automated reasoning (see “Automated Deduction”, page 62). Our focus lies, among other things, on the early and efficient recognition of semantic contradictions.

CONTACT

Eugen Denerz
RG. 1 Automation of Logic
Phone +49 681 9325-2914
Email edenerz@mpi-inf.mpg.de
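The “chair” example from this article can be replayed with a small sketch. It uses plain dictionary lookups instead of first-order reasoning, so it only illustrates how the two knowledge sources interact; the concept names and the helper functions are assumptions introduced for this example.

```python
# General knowledge (ontology): "is a" links and one negative statement.
IS_A = {
    "chair_furniture": "piece_of_furniture",
    "chairman": "person",
    "person": "living_being",
}
NOT_LIVING = {"piece_of_furniture"}  # a piece of furniture is not a living being

# Knowledge about the English language: possible senses of the word "chair".
SENSES = {"chair": ["chair_furniture", "chairman"]}


def ancestors(concept):
    """The concept itself followed by everything it 'is a', transitively."""
    chain = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_living(concept):
    chain = ancestors(concept)
    if any(c in NOT_LIVING for c in chain):
        return False
    return "living_being" in chain


def subject_senses_for_saw(word):
    """Senses of `word` that can be the subject of 'saw' (requires a living being)."""
    return [sense for sense in SENSES[word] if is_living(sense)]
```

For the sentence “The chair saw the table.”, `subject_senses_for_saw("chair")` keeps only the chairman reading, because the furniture reading contradicts the requirement that the subject of “saw” be a living being.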

DEANNA – Natural Language Questions over the Web of Data

When using a search engine, we are often interested in answers as opposed to links in response to our queries. Major search engines have recently started returning crisp answers in the form of entities such as people, places, and movies in response to very simple keyword queries. For example, the answer Angela Merkel will be returned for the query German chancellor. Answers to such queries come from large Knowledge Graphs that store triples, also known as facts, where a relation connects a pair of entities together (e.g., (Angela Merkel, chancellorOf, Germany)).

[Figure: Question answering in DEANNA]

Existing efforts, however, leave much to be desired as they do not fully exploit the potential offered by Knowledge Graphs. Complex queries of interest to researchers and journalists can be answered by Knowledge Graphs. For example, one can find answers for the query “What was the party of Merkel’s predecessor?”, or “Which German states have minister-presidents from the same party as the chancellor?”. Such questions cannot be put in keyword query form, as they completely lose their semantics, which is what makes them interesting in the first place. What we need is a system that takes questions asked by humans, maps them to queries over the Knowledge Graph, and returns answers to the user.

We have developed DEANNA, a framework for question answering over Knowledge Graphs. DEANNA allows users to ask potentially complex questions and obtain answers from a Knowledge Graph. The main task DEANNA has to perform is to understand the question posed by the user with respect to the Knowledge Graph. In this setting, understanding means being able to map phrases in the question to objects in the Knowledge Graph, and then combine these to create a query. Knowledge Graph objects include entities (e.g., Angela Merkel), types (e.g., political party), and relations (e.g., chancellorOf). The target query language is composed of triple patterns, where unknown parts of a triple are replaced with variables whose values will be determined from the Knowledge Graph. For example, the question “Which German states have minister-presidents from the same party as the chancellor?” translates to the triple pattern query:

?s type GermanState
?p1 presidentMinisterOf ?s
?p1 affiliation ?party
?p2 chancellorOf Germany
?p2 affiliation ?party

where the occurrence of the same variable, such as ?s, in two triple patterns connects the two.

The main problem in question understanding in our setting is the inherent ambiguity of language. Merkel can refer to one of many people, and a computer might think the chancellor refers to the 1875 Jules Verne novel although it is intended as a relation connecting a person to a country. While such ambiguities seem trivial for a human to resolve, computers need to be given a framework to resolve them, which is what DEANNA provides.

DEANNA’s disambiguation framework is based on integer linear programming (ILP) with an objective function to maximize and a set of constraints. Here, the constraints prevent any nonsensical understandings of the user’s question. For instance, understanding the chancellor as a novel prevents us from being able to connect it to any other objects mentioned in the query, and hence means that this understanding is incorrect. The objective function captures how well the different objects in an understanding of the question go together. Intuitively, the ILP-based framework returns the most likely understanding of the question (the objective function) that makes sense (the constraints).

CONTACT

Mohamed Yahya
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5028
Email myahya@mpi-inf.mpg.de

Internet http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/deanna/
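The interplay of objective function and constraints described above can be sketched on a toy instance. The sketch below enumerates all combinations by brute force instead of calling an ILP solver, which is only viable for tiny inputs; the candidate lists, scores, type labels, and the feasibility rule are hypothetical and are not DEANNA's actual model.

```python
from itertools import product

# Hypothetical candidate mappings for two phrases, with made-up prior scores.
CANDIDATES = {
    "the chancellor": [
        {"object": "chancellorOf", "kind": "relation", "score": 0.6},
        {"object": "The_Chancellor_(novel)", "kind": "entity", "type": "novel", "score": 0.7},
    ],
    "Germany": [
        {"object": "Germany", "kind": "entity", "type": "country", "score": 0.9},
        {"object": "Germany_national_football_team", "kind": "entity", "type": "team", "score": 0.5},
    ],
}

# Assumed signature: which entity types a relation can connect.
RELATION_ARGUMENT_TYPES = {"chancellorOf": {"person", "country"}}


def is_feasible(choice):
    """Constraint: there is a relation, and every entity fits some chosen relation."""
    relations = [c["object"] for c in choice if c["kind"] == "relation"]
    accepted = set().union(*(RELATION_ARGUMENT_TYPES[r] for r in relations))
    entities = [c for c in choice if c["kind"] == "entity"]
    return bool(relations) and all(e["type"] in accepted for e in entities)


def best_understanding(candidates):
    """Maximize the summed scores over all feasible combinations of mappings."""
    phrases = list(candidates)
    feasible = [c for c in product(*(candidates[p] for p in phrases)) if is_feasible(c)]
    if not feasible:
        return None
    best = max(feasible, key=lambda c: sum(x["score"] for x in c))
    return {phrase: x["object"] for phrase, x in zip(phrases, best)}
```

Without the constraint, the novel reading would win on score alone (0.7 + 0.9); with it, that combination is discarded because nothing connects the two entities, and the relation reading is returned.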

Open Information Extraction

The great majority of the knowledge that mankind has produced is available only in the form of text, including books, news articles, scientific papers, and web pages. Natural language is written for humans; it can be noisy, ambiguous, opinionated, or difficult to interpret without context. Our work aims to extract and represent this knowledge in a structured form amenable to automated processing by computers.

[Figure: Overview of our open information extraction and disambiguation systems]

Text understanding is a difficult task and requires the combination of techniques from the machine learning, logic, linguistics, and data management communities, among others. An ideal – or perhaps idealized – system would be sufficiently powerful to capture the entire set of information represented in a given text collection, regardless of its domain (e.g., biology, history, economics). Language, as the primary representation of knowledge, already provides a systematic way of structuring information, but this structure is often opaque to computers.

We have developed a system called ClausIE, which translates sentences into a form that is easier to process by computers. ClausIE exploits the grammatical structure of sentences, making use of the fact that statements are often expressed in terms of clauses. A clause is essentially a simple sentence that consists of a set of grammatical units, some obligatory (e.g., subject, verb), some optional (e.g., adverbials). Consider, for example, the sentence “Albert Einstein discovered the law of the photoelectric effect and won the Nobel Prize.” This sentence contains two clauses: “Albert Einstein won the Nobel Prize” and “Albert Einstein discovered the law of the photoelectric effect.” ClausIE recognizes such clauses and transforms them into so-called propositions, which reveal the structure of the clause. In our example, the propositions are (Albert Einstein, discovered, law of the photoelectric effect) and (Albert Einstein, won, Nobel Prize). Propositions are easier to process by computers because they are simple, domain-independent, and structured.

A proposition does not, however, provide a complete picture of the information present. For example, in the proposition (Albert Einstein, won, Nobel Prize), we would like to understand that “Albert Einstein” refers to the German scientist, that “Nobel Prize” refers to the famous award, and that “won” means “having been awarded with a prize”. The task of recognizing named entities has been well-studied in recent years. Less attention has been paid to the disambiguation of the verb or verbal phrase, the main element linking the constituents of a proposition. Our Werdy system addresses this gap by automatically disambiguating verbs. The main difficulty of this task is that verbs tend to have many different meanings. For instance, the verb “win” has five different meanings in the electronic dictionary WordNet; other common verbs such as “take” can have more than 40. To determine the correct sense, Werdy exploits the observation that each verb sense occurs in only a limited set of clauses and only with a limited set of arguments. For example, the sense of “win” that refers to obtaining an award requires a subject and an object that can serve as an “award” (e.g., the Nobel Prize). By systematically leveraging this knowledge, Werdy is able to boost the precision of existing disambiguation systems.

Figure 1 illustrates our methods on our running example. ClausIE and Werdy are basic building blocks for automatic text understanding. In our ongoing work, we plan to integrate and reason about large sets of propositions in a principled, automated way.

CONTACT

Luciano Del Corro
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5029
Email delcorro@mpi-inf.mpg.de

Rainer Gemulla
University of Mannheim, Mannheim, Germany
Email rgemulla@uni-mannheim.de
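The observation that a verb sense occurs only with a limited set of arguments can be illustrated with a small sketch of sense pruning on a proposition. It is not the Werdy implementation: the two-entry sense inventory, the type table, and the identifiers such as `win#award` are invented for illustration and do not reproduce WordNet's actual entries.

```python
from typing import NamedTuple


class Proposition(NamedTuple):
    """A (subject, verb, object) proposition; the verb is given as a lemma."""
    subject: str
    verb: str
    obj: str


# Hypothetical sense inventory: each sense lists the object types it admits.
SENSES = {
    "win": [
        {"id": "win#award", "gloss": "obtain an award", "object_types": {"award"}},
        {"id": "win#contest", "gloss": "be victorious in a contest", "object_types": {"contest"}},
    ],
}

# Hypothetical type information for the arguments.
ARGUMENT_TYPES = {"Nobel Prize": "award", "the match": "contest"}


def admissible_senses(proposition):
    """Keep only the verb senses whose argument restriction the object satisfies."""
    object_type = ARGUMENT_TYPES.get(proposition.obj)
    return [sense["id"] for sense in SENSES.get(proposition.verb, [])
            if object_type in sense["object_types"]]
```

For `Proposition("Albert Einstein", "win", "Nobel Prize")` only the award sense survives; a downstream disambiguation system then has fewer candidates to choose from, which is the intuition behind the precision gain mentioned above.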

FERRARI: Quickly Probing Graphs for the Existence of a Relationship

Entities and their connecting relationships are commonly modeled in the form of directed or undirected graphs. Prominent examples include social networks capturing personal or professional relationships among people, biological networks of interactions between biological units such as proteins, or functional dependencies in technical systems, for example call graphs in computer programs. An example for a call graph is shown in the figure. Here, functional dependencies – the relationships – are drawn as arrows between software modules (the entities). Especially over the last decade, we have witnessed substantial growth concerning both the size of individual graph-structured datasets and the number of application domains leveraging the expressive power of the graph data model.

[Figure: Functional dependencies among software modules]

Querying for relationships

The growing size of graphs imposes great challenges with respect to efficient processing. Graphs comprising several millions of entities and relationships can render even the most basic analyses computationally demanding; however, many applications require (near) real-time answers to certain important queries. As an example, consider a graph modeling the functional dependencies between code fragments in a large software project. In this scenario, in order to facilitate fast compilation and testing, it is an important requirement to quickly answer questions of the form: “Is there a functional dependency of code module A on code module B?” In graph processing, questions of this form are known under the term reachability queries. To answer such a question, it is required to examine the graph for the existence of a direct or indirect connection – a connection over several intermediate entities. In the example graph shown in the figure, an indirect connection (series of arrows) exists between fragments A and B. Traditional approaches for reachability queries are too slow, especially for the case of many sequentially arriving queries – the typical case in most application scenarios.

The FERRARI index structure

Our research group has proposed FERRARI, an efficient index structure to answer questions of this form. The main idea lies in precomputing certain information with regard to the connections between entities. The construction of such an index is a one-time effort, but the resulting information is stored and can be used to speed up the answering process for individual questions later.

FERRARI precomputes and stores information about the indirect connections in the graph. As a result, questions concerning the existence of a certain connection can be answered quickly by combining standard algorithms for the reachability problem with the precomputed information in FERRARI. The algorithms can operate much faster because FERRARI has already computed parts of the answer that are common among many queries and would be costly to obtain over and over again for each newly arriving query.

Since a reachability query answers one of the most fundamental questions in graph processing – whether a connection exists between entities – it is not only used as an important application in its own right, but also as a building block in other graph analytics scenarios. In practice, FERRARI can answer reachability queries very quickly, requiring on average only a few microseconds for processing, over graphs containing many millions of entities and relationships. The time and memory requirements for computing and storing the FERRARI index structure are also lightweight. FERRARI enables the precomputation of the index in the order of a few minutes. Furthermore, any restrictions on the memory size are satisfied, either as specified by the user or mandated by the hardware of the computer. The FERRARI software has been released as an open source project.

CONTACT

Stephan Seufert
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5017
Email sseufert@mpi-inf.mpg.de
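The general pattern described above, a one-time precomputation combined with a standard search at query time, can be sketched as follows. This is not the FERRARI index itself, whose internals are not described in this article; the sketch swaps in a much simpler technique (topological ranks used for pruning) and assumes an acyclic graph given as a dictionary of successor lists.

```python
from collections import deque


def topological_ranks(graph):
    """One-time precomputation (Kahn's algorithm) on an acyclic graph."""
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for s in successors:
            indegree[s] = indegree.get(s, 0) + 1
    queue = deque(node for node, d in indegree.items() if d == 0)
    rank = {}
    while queue:
        node = queue.popleft()
        rank[node] = len(rank)
        for s in graph.get(node, []):
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    return rank


def reachable(graph, rank, source, target):
    """Depth-first search that skips nodes which cannot lead to `target`."""
    if source == target:
        return True
    if rank[source] > rank[target]:
        return False  # answered from the precomputed ranks alone
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        for s in graph.get(node, []):
            if s == target:
                return True
            # A node ranked after the target can never reach it.
            if s not in seen and rank[s] < rank[target]:
                seen.add(s)
                stack.append(s)
    return False


# Hypothetical dependency graph: an arrow X -> Y is stored as Y in graph[X].
GRAPH = {"A": ["C"], "C": ["D"], "D": ["B"], "B": [], "E": ["B"]}
RANK = topological_ranks(GRAPH)
```

With this graph, `reachable(GRAPH, RANK, "A", "B")` follows the indirect connection A, C, D, B, while `reachable(GRAPH, RANK, "B", "A")` is rejected by the rank comparison without any traversal. Graphs with cycles would first need their strongly connected components collapsed into single nodes.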

STICS – Search and Analysis with Strings, Things, and Cats

“Things, not Strings” has been Google’s motto when introducing the Knowledge Graph and the entity-awareness of its search engine. When you type in the keyword “Klitschko” as a query, Google still returns Web and news pages, but also explicit entities like Wladimir Klitschko and his brother Vitali (including structured attributes like date of birth, profession, and relations to other entities, from the Knowledge Graph). Moreover, while typing, the query auto-completion method suggests the two brothers in entity form with the additional hints that one is an active boxer and the other a politician.

However, the Google approach still has limitations. First, recognizing entities in a keyword query and returning entity results seems to be limited to prominent entities. Unlike the Klitschko example, a query for the Ukrainian pop singer “Iryna Bilyk” does not show any entity suggestions (neither for auto-completion nor in the search results). Second, Google seems to understand only individual entities, but cannot handle sets of entities that are described by a type name or category phrase. For example, queries like “Ukrainian celebrities” or “East European politicians” return only the usual ten blue links: Web pages that match these phrases. The search engine does not understand the user’s intention to obtain lists of people in these categories.

STICS, short for “Searching with Strings, Things, and Cats”, is a novel search engine that extends entity awareness in Web and news searches by tapping into long-tail entities and understanding and expanding phrases that refer to semantic categories. STICS supports users in searching for strings, things, and cats (short for categories) in a seamless and convenient manner. For example, when posing the query “Merkel Ukrainian opposition”, the user is automatically guided, through auto-completion, to the entity ‘Angela_Merkel’ and the category ‘Ukrainian_politicians’, which is automatically expanded into ‘Vitali_Klitschko’, ‘Arseniy_Yatsenyuk’, etc. The search results include texts like “the German chancellor met the Ukrainian opposition leader and former heavy-weight champion”, even if these texts never mention the strings “Angela Merkel” and “Vitali Klitschko”. STICS achieves this by using the named entity recognition and disambiguation system AIDA, which links ambiguous words to entities in YAGO, which in turn contains the entities’ categories. The inner workings of AIDA are detailed in “AIDA – Resolving the Name Ambiguity”, page 70.

The same technology can also be used to improve the analysis of large archives. Consider the task of visualizing trends around the recent Ukrainian crisis, which originated from the Maidan, the square in Kiev where thousands of Ukrainians protested in early 2014. A search for “Maidan” quickly reveals that the name is highly ambiguous, as it means ‘square’ not only in Ukrainian, but also in Hindi and Arabic. Thus, simply counting the string “Maidan” will result in a large number of false positives, leading to an imprecise analysis. By specifying the canonicalized entity ‘Maidan_Nezalezhnosti’, not only do we get rid of spurious mentions of other Maidans, but we also find articles where the square is mentioned only by its English name “Independence Square”. Thus, entity-level analytics, as supported by STICS, is the only way to get accurate numbers.

Additionally, as we now have the full potential of a structured knowledge base in the background, further opportunities are opened up. In all semantic knowledge bases, entities are organized in a category hierarchy, e.g., ‘Greenpeace’ is an ‘environmental_organization’, which is in turn a subclass of a general ‘organization’. Using this category hierarchy, we can conduct analyses for entire groups of entities, for example, comparing the presence of ‘environmental_organizations’ and ‘power_companies’ in news from different parts of the world, deriving a picture of how their importance changes over time.

CONTACT

Johannes Hoffart
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5004
Email jhoffart@mpi-inf.mpg.de

Dragan Milchevski
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5013
Email dmilchev@mpi-inf.mpg.de

Gerhard Weikum
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5000
Email weikum@mpi-inf.mpg.de

Internet http://www.mpi-inf.mpg.de/yago-naga/stics/
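The difference between string-level and entity-level search described in this article can be sketched on two annotated documents. The sketch assumes that entity annotations are already available (in STICS they come from AIDA); the document texts, the category table, and the function names are illustrative assumptions, not the STICS code.

```python
# Hypothetical category membership, as a knowledge base might provide it.
CATEGORY_MEMBERS = {
    "Ukrainian_politicians": {"Vitali_Klitschko", "Arseniy_Yatsenyuk"},
}

# Documents with entity annotations assumed to come from a disambiguation step.
DOCUMENTS = [
    {"text": "The German chancellor met the Ukrainian opposition leader "
             "and former heavy-weight champion.",
     "entities": {"Angela_Merkel", "Vitali_Klitschko"}},
    {"text": "Thousands gathered on Independence Square.",
     "entities": {"Maidan_Nezalezhnosti"}},
]


def search(documents, entity, category):
    """Texts mentioning `entity` together with any member of `category`."""
    members = CATEGORY_MEMBERS.get(category, set())
    return [d["text"] for d in documents
            if entity in d["entities"] and d["entities"] & members]


def count_entity(documents, entity):
    """Entity-level count: number of documents annotated with `entity`."""
    return sum(entity in d["entities"] for d in documents)


def count_string(documents, string):
    """String-level count: number of documents whose text contains `string`."""
    return sum(string in d["text"] for d in documents)
```

The first document is found for the entity ‘Angela_Merkel’ and the category ‘Ukrainian_politicians’ although neither name occurs in its text, and the second is counted for ‘Maidan_Nezalezhnosti’ although a string search for “Maidan” misses it.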

Siren – Interactive Redescription Mining

Finding alternative ways of characterizing the same entities is at the root of many scientific problems. For example, a biologist could be interested in characterizing land areas by the species inhabiting them, on the one hand, and by the climate of the area on the other. In another domain, a social scientist might be interested in characterizing voters by their socio-economic status and voting behavior. Traditionally, these questions have been answered by explicit hypothesis testing: The researcher designs a hypothesis, such as ‘these species live in the following climates’ or ‘people in this socio-economic class vote for this party’, then collects data and tests whether the hypothesis is supported by the data.

The advent of data-driven science allows us to augment the traditional approach by automatically mining the hypotheses from the data and simultaneously checking their validity. Techniques such as classification – or more generally, supervised learning – have been used to find one characterization when the other is given. The goal of redescription mining is to find sets of entities that can be characterized in alternative ways. In the aforementioned biological problem, the goal of redescription mining would be to find a set of species that co-exist in areas that can be succinctly characterized by their climate. For example, we can approximately describe the habitat of the moose (Alces alces) in Europe by the following climate: February’s maximum temperature is between −10 °C and 0 °C, July’s maximum temperature is between 12 °C and 25 °C, and August’s precipitation is between 57 and 136 mm.

Simply finding (or mining) a set of redescriptions is not enough, though. Even a small data set can contain hundreds of redescriptions, and understanding and verifying them can become a daunting task. Moreover, the users of redescription mining algorithms are usually domain experts, and their knowledge of the data and its specific characteristics should be incorporated into the mining process. Our redescription mining software Siren helps the domain experts by providing a single interface for mining, editing, and visualizing the redescriptions. Its interactive visualizations and brush-and-link workflow facilitate exploratory redescription mining in a powerful way.

[Figure: The user interface of Siren]

The figure shows the interface of Siren, where the user has selected the aforementioned redescription explaining the habitat of the moose in Europe. The map in the bottom right corner shows that there is an area in Northern where the moose does live, but where the weather conditions are not as expected (red squares). The user has selected this area from the map by drawing a yellow polygon, and the data points corresponding to the selected area are highlighted in the other visualizations in yellow. The two 2D-projections to the left of the map do not show the selected area to be clearly different from the rest of the colored markers. The parallel coordinates view displayed above the map, however, explains the difference: the selected areas (yellow lines in the parallel coordinates plot) have a much higher maximum temperature in February than what is expected according to the redescription (second gray box from the left). The user can now decide the appropriate next step based on his/her domain knowledge: Discard the area as an anomaly, try to extend the redescription to correctly handle the area, or reject the redescription altogether.

We are currently working with biologists from Finland, Germany, and the United States to incorporate redescription mining and Siren into their data analysis workflows. We are also exploring other application areas and the potential for commercializing the technology.

CONTACT

Pauli Miettinen
DEPT. 5 Databases and Information Systems
Phone +49 681 9325-5012
Email pauli.miettinen@mpi-inf.mpg.de

Internet http://siren.mpi-inf.mpg.de
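How well two descriptions agree can be quantified by comparing the sets of areas they select. The sketch below evaluates the moose redescription from this article on four invented areas and uses the Jaccard coefficient as the agreement measure; the data, the field names, and the choice of measure are assumptions for illustration and do not reproduce Siren's implementation.

```python
# Invented example areas; the values are not real observations.
AREAS = [
    {"name": "a1", "moose": True,  "feb_max": -5.0, "jul_max": 20.0, "aug_prec": 80.0},
    {"name": "a2", "moose": True,  "feb_max": -2.0, "jul_max": 18.0, "aug_prec": 100.0},
    {"name": "a3", "moose": True,  "feb_max": 3.0,  "jul_max": 22.0, "aug_prec": 70.0},
    {"name": "a4", "moose": False, "feb_max": 8.0,  "jul_max": 30.0, "aug_prec": 20.0},
]


def species_query(area):
    """First description: the moose lives in the area."""
    return area["moose"]


def climate_query(area):
    """Second description: the climate ranges quoted in the article."""
    return (-10.0 <= area["feb_max"] <= 0.0
            and 12.0 <= area["jul_max"] <= 25.0
            and 57.0 <= area["aug_prec"] <= 136.0)


def support(query, areas):
    """Names of the areas for which the query holds."""
    return {a["name"] for a in areas if query(a)}


def jaccard(left, right):
    """Size of the intersection divided by the size of the union."""
    union = left | right
    return len(left & right) / len(union) if union else 1.0


SPECIES_SUPPORT = support(species_query, AREAS)
CLIMATE_SUPPORT = support(climate_query, AREAS)
MISMATCHES = SPECIES_SUPPORT ^ CLIMATE_SUPPORT  # areas where the two sides disagree
```

Area `a3` plays the role of the region highlighted in the figure: the moose is present, but February is warmer than the climate description allows, so it ends up in `MISMATCHES` and lowers the Jaccard coefficient below 1.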

KnowLife: Knowledge Extraction from Medical Texts

Nowadays, the Internet is a rich and diverse repository of medical knowledge. However, since such knowledge is spread over many websites in the World Wide Web, searching for specific information is tedious. Imagine a patient newly diagnosed with diabetes. A typical response would be to search the Internet so as to learn more about the disease. The patient must piece together the big picture, combining information from different sources and identifying untrustworthy ones, which is especially important for online discussion forums. The patient’s overall goal is to get a comprehensive overview of complex medical facts, disregarding redundant and false information.

With KnowLife, we present a one-stop alternative for this patient. KnowLife is a knowledge base featuring over 500,000 facts automatically extracted from medical texts, where a fact is an entity-relation-entity triple such as diabetes-hasSymptom-fatigue. All facts together form a graph, where nodes are medical concepts and edges depict relations between these concepts. Up to now, previous works have extracted knowledge for narrow specialties such as protein-protein interactions and gene-drug relationships. In addition, they have focused on scientific publications as the de facto choice of text. In contrast, the goal of KnowLife is to build a versatile and comprehensive knowledge base: First, KnowLife spans many areas within the life sciences, covering a wide range of entities and relations involving genes, organs, diseases, symptoms, treatments, side effects, and environmental and lifestyle risk factors for diseases. Second, it taps into articles from health portal websites and online discussion forums. Third, to allow for a better interpretation of medical facts, KnowLife enriches them with context information, such as diabetes-hasSymptom-fatigue during initial onset.

Our fact extraction approach applies statistics-based pattern matching to recognize fact candidates and incorporates semantic assets to validate those candidates. The pattern matching module identifies word sequences in our input text that express a relation with high confidence. For example, the sequence “characteristic feature” expresses the relation hasSymptom. To avoid contradictory information and prune out inconsistent fact candidates, we use rules based on the semantics of the facts and relations. The rules express constraints that the extracted facts must satisfy. One type of rule concerns entity types. For example, only a drug can have a side effect, and the side effect must be some symptom. Another type of rule concerns mutual exclusion between facts. For example, a drug that treats a symptom cannot have that symptom as a side effect at the same time. After applying the rules, the remaining facts reach an accuracy of 93 %.

To showcase the usefulness of KnowLife, we have implemented a web portal [ http://knowlife.mpi-inf.mpg.de ] that supports a number of use cases: The portal generates for every concept in our knowledge base an infobox that summarizes all the important information about the concept. Furthermore, a user can provide a webpage, and the portal will annotate on the fly any entities and facts found on that webpage and connect them to our knowledge base. This allows a user, such as our diabetes patient, to quickly gain an overview of the various medical concepts and their relationships to each other. An expert user can read the text sources supporting the facts and form opinions based on the authority and recency of the text sources. For example, the expert will be referred to relevant clinical studies and scientific publications. Finally, the portal also provides an intuitive way to explore our knowledge base. For instance, a user can search for drug-drug interactions by providing the drug names and then explore the links between the drugs and related diseases and side effects.

CONTACT

Patrick Ernst DEPT. 5 Databases and Information Systems Phone +49 681 9325-5006 Email pernst mpi-inf.mpg.de

Amy Siu DEPT. 5 Databases and Information Systems Phone +49 681 9325-5026 Email siu mpi-inf.mpg.de

Gerhard Weikum DEPT. 5 Databases and Information Systems Phone +49 681 9325-5000 Email weikum mpi-inf.mpg.de Internet http://knowlife.mpi-inf.mpg.de
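The two kinds of consistency rules described for KnowLife, entity-type rules and mutual exclusion, can be illustrated with a minimal sketch. It is not the KnowLife implementation: the relation names `treats` and `hasSideEffect`, the small type dictionary, and the policy of discarding both sides of a contradiction are assumptions chosen for the example; only `hasSymptom` and the two rule types are taken from the text.

```python
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    """An entity-relation-entity triple such as diabetes-hasSymptom-fatigue."""
    subject: str
    relation: str
    obj: str


# Hypothetical entity typing; a real system would rely on a medical dictionary.
ENTITY_TYPES = {
    "diabetes": "disease",
    "aspirin": "drug",
    "fatigue": "symptom",
    "headache": "symptom",
    "nausea": "symptom",
}

# Type rule per relation: (required subject type, required object type).
TYPE_RULES = {
    "hasSymptom": ("disease", "symptom"),
    "hasSideEffect": ("drug", "symptom"),
    "treats": ("drug", "symptom"),
}

# Relations that must not hold for the same (subject, object) pair.
MUTUALLY_EXCLUSIVE = [("treats", "hasSideEffect")]


def satisfies_type_rule(fact: Fact) -> bool:
    """True if the relation is known and both entities have the required types."""
    rule = TYPE_RULES.get(fact.relation)
    if rule is None:
        return False
    return (ENTITY_TYPES.get(fact.subject), ENTITY_TYPES.get(fact.obj)) == rule


def conflicting_relation(relation: str) -> Optional[str]:
    """Return the relation that excludes `relation`, if there is one."""
    for first, second in MUTUALLY_EXCLUSIVE:
        if relation == first:
            return second
        if relation == second:
            return first
    return None


def prune(candidates: list) -> list:
    """Keep candidates that pass the type rules and are not contradicted."""
    typed = [f for f in candidates if satisfies_type_rule(f)]
    present = set(typed)
    kept = []
    for fact in typed:
        other = conflicting_relation(fact.relation)
        if other is not None and Fact(fact.subject, other, fact.obj) in present:
            continue  # assumption: drop both sides of a contradiction
        kept.append(fact)
    return kept


CANDIDATES = [
    Fact("diabetes", "hasSymptom", "fatigue"),
    Fact("aspirin", "treats", "headache"),
    Fact("aspirin", "hasSideEffect", "headache"),  # contradicts the line above
    Fact("aspirin", "hasSideEffect", "nausea"),
    Fact("fatigue", "hasSideEffect", "nausea"),    # a symptom is not a drug
]

consistent = prune(CANDIDATES)
```

Dropping both contradictory candidates is the most conservative choice; a system with confidence scores for each candidate could instead keep the better-supported one.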

MULTIMODAL INFORMATION & VISUALIZATION

Pictures are the fastest means of access to human consciousness. Therefore, algorithms for appropriate visualization of digital information on various display types play a significant role in informatics. Artificial and natural worlds must be presented in an ever more realistic and fast manner: in flight simulators, surgical operation planning systems, computer games or for the illustration of large amounts of data in the natural and engineering sciences. Graphical content is just one aspect of a variety of multimodal data types that are nowadays captured and generated by computer and robotic systems. Visual data processing is thus increasingly intertwined with the simultaneous processing of other data modalities. This will pave the way for entirely new approaches of man-machine interaction.

The basis of high-quality computer-generated images is accurate scene models. At the Max Planck Institute for Informatics, we are investigating automated methods for the reconstruction of models of dynamic scenes. This is gaining more and more importance and finds many applications, for instance in computer animation, virtual reality, and in the areas of 3D TV and 3D video (see also the research reports from the area “Understanding Images and Videos”).

Methods to efficiently create, modify, and simulate deformable 3D models are at the heart of many visualization and interaction approaches. We research the algorithms to enable this. Biomechanical simulation and motion estimation approaches are also the precondition for developing new efficient and ergonomically optimized human-computer-interaction paradigms at our institute.

New 2D and 3D printing methods enable the very quick creation of real versions of virtual 3D models, and enable printing of new flexible sensors and displays. The algorithmic foundations of this bridge to reality are laid at the Max Planck Institute for Informatics.

In order to show lifelike models of virtual worlds on various displays (ranging from normal screens to immersive head-mounted displays), we have also developed new simulation methods for light propagation in scenes, so-called global illumination. For this, we focus on the development of real-time algorithms.

The images created through image synthesis or image capture typically have a brightness range (dynamic range) that matches the dynamic range of the real world. We research algorithms for the image processing of these so-called High Dynamic Range images (HDR) as well as methods for their portrayal on standard displays. We also research perceptually-based approaches to enhance the display quality of low and high dynamic range imagery on new types of stereoscopic displays.

Photographs and videos also represent forms of visual media. These are both aesthetically appealing and important tools for data analysis. We are therefore working on new optical systems for cameras and turn the classical camera into a calculation instrument that can extract much more information from individual images than pure light intensity, such as 3D geometry, for example.

Rendering approaches also play an important role in the visualization of complex high-dimensional data. Such data frequently arises, for example, as an output of weather simulations, flow simulations, or the analysis of large scale studies in human society. At the Max Planck Institute for Informatics, we develop new approaches to visualize this complex data, such that correlations and important features can be effectively deduced.
CONTRIBUTIONS

Stereo 3D and HDR Imaging: Display Quality Measurement and Enhancement
Advanced Real-time Rendering
Feature-based Visualization
Towards a Visual Turing Test: From Perception over Representation to Deduction
Physically-based Geometry Processing and Animation
Eye-Based Human-Computer Interaction
Optimizing User Interfaces using Biomechanical Simulation
Digital Fabrication of Flexible Displays and Touch Sensors
Perceptual Fabrication

Stereo 3D and HDR Imaging: Display Quality Measurement and Enhancement

We understand human visual perception as a final and mandatory component in a visual processing pipeline. An accurate model of human visual perception is therefore crucial for improving the visual pipeline as a whole. While psychology provides a range of theoretic models of human perception, they are often not applicable for two reasons. First, too many simplifying assumptions are made about the stimuli. Second, the models are often only passive descriptions of findings. The challenge in our work is to deliver concrete models and algorithms that can predict human visual perception and be computationally inverted, for example, using optimization techniques, to improve the content.

Overcoming physical capabilities of display devices
Apart from describing human perception, our models can be used to improve the user experience when depicting content on existing display devices. For example, a model of human retinal image integration, eye movement, and tempo-spatial information integration can be used to display a high-resolution image on a low-resolution screen such that the sensation is as close as possible to the original. The frame rate has a large impact on the perception of motion in a video. Recently, we have developed a software technique for manipulating the apparent frame rate of an image sequence without changing its actual display rate. Since we can emulate continuously varying frame rates, both in the temporal and the spatial domain, our technique enables fine control over the appearance of moving objects in the input video.

Apparent resolution enhancement for moving images: We optimize individual frames (1-3, from bottom left) so that they appear more detailed (ours, frame 4) on a human retina than conventional solutions (bottom right).

Towards improving the stereo 3D experience
Drawing from our past experience in the image quality evaluation for regular 2D videos, we have developed techniques for comparing stereoscopic 3D content. To this end, we have conducted a number of psychophysical experiments that investigate the performance of the human visual system in perceiving depth differences between simple stimuli. Based on these studies, we have devised methods that can predict perceived differences between natural images. Such metrics find a number of applications in compression and content manipulation techniques that adjust stereoscopic content to different devices and people so that both comfort and quality are optimal. More recently, we have employed a stereo 3D eye-tracker to investigate human performance in adjusting to different depths on a stereoscopic screen. This has resulted in a model that predicts the time a human observer needs to transition between different depth levels. We have demonstrated that such a model is crucial for editing movie cuts where rapid depth changes are often unavoidable. Our technique allows us to improve subjects’ task performance and provide content that is easier to view.

The possibility to combine other perceptual models of high contrast (HDR), glare effects, brightness gradients, and colors makes the building of a more and more complete computational model of human perception conceivable. This provides us with an improved way of depicting content for all users on all devices.

CONTACT

Piotr Didyk DEPT. 4 Computer Graphics Phone +49 681 302 70756 Email pdidyk mmci.uni-saarland.de

Karol Myszkowski DEPT. 4 Computer Graphics Phone +49 681 9325-4029 Email karol mpi-inf.mpg.de

A conventional anaglyph stereo image (top) looks unpleasant when observed without proper anaglyph glasses, due to color artifacts. Using our backward compatible solution (bottom), we can reduce the artifacts and, using anaglyph glasses, still provide a sense of depth when viewing the image.

Advanced Real-time Rendering

The creation of photorealistic images using computers is a basic technology required in many visual media. Originating from special effects, the use of computers to render films is common nowadays. However, the required realism can only be achieved with high computational effort where, typically, the synthesis of a single frame of a motion picture can take several hours. At the same time, real-time rendering has become part of our everyday life: In computer games, geo-visualization on a cell phone or in interactive kitchen planners on desktop computers, the required images are produced instantaneously from a user’s input. In order to achieve this performance, severe simplifications had to be made, which led to the development of highly specialized graphics hardware (GPUs). In our work, we aim to fill the gap between highly realistic offline rendering and fast interactive rendering.

A typical challenge in interactive rendering is participating media, i.e. materials that do not have a hard, opaque surface, but appear transparent instead, as light scatters inside them. Classical examples are clouds and fog, but a large class of other objects such as human skin also shows translucency. Most previous rendering approaches are not able to render such effects in real time, and if they are, then only with certain severe limitations, such as holding the geometry of the object fixed or assuming the scattering to be isotropic. We have proposed different approaches to overcome these limitations [ figure 1 ]. The first extends classic photon mapping to handle slowly moving clouds. The second is based on a decomposition of the volumetric objects into their principal directions of light propagation, after which scattering becomes a diffusion process that can be efficiently computed. The third is simply based on blurring the image in a physically principled way, allowing light scattering with relatively low computational demand within a few milliseconds.

Figure 1: Real-time rendering of clouds using our method.

A particular challenge in rendering is dispersive materials such as in a glass prism. Here, light does not follow a single ray, but is split up in space depending on the wavelength. This introduces another dimension that can be hard to sample properly, often leading to distracting visual noise. In our spectral ray differential work, we have derived formulas to describe the spatial change of a diffracted ray according to wavelength [ figure 2 ]. Put simply, this is a local approximation to the shape of a rainbow. Using this information, we can extrapolate spectral information inside the image such that noise from diffraction is filtered out. The resulting images have no noise but a slight blurring bias, which is often visually less disturbing. We have further shown how a refined computation can guarantee that even this bias vanishes completely after sufficient compute time.

Figure 2: Real-time rendering of dispersive caustics using spectral ray differentials (first image). The three left images show a zoom into a result of our method and how the two competing approaches result in noise or banding, which we remove using very little additional computation.

Modern graphics hardware is best at performing many tasks in parallel that are simple and independent. A particular instance of such a task is an image filter. Therefore, we have continued to extend our stream of work that uses GPU image filtering to perform advanced interactive rendering (screen space shading). We have shown that scattering and diffraction of light can be understood as image filters and how such image filters can be efficiently implemented in graphics hardware. Furthermore, we have improved the general quality of screen space shading by introducing the concept of deep screen space, a point cloud representation of a scene that has an output-sensitive sampling, similar to a common framebuffer, but does not inherit its limited support for occlusion or grazing angles.

Finally, distributing rendering workload to a computationally powerful rendering server has recently received growing attention. However, the visual quality might be reduced for slow connections, for example due to a low frame rate when bandwidth drops or latency increases. In such conditions, most remote rendering software extrapolates in-between frames in order to keep an almost-constant frame rate. This is of particular importance in head-mounted displays where a low frame rate can lead to cyber sickness. The currently used extrapolation, however, tacitly assumes opaque, diffuse surfaces. In our work, we have extended this extrapolation to handle refractive and reflective surfaces as well, leading to better visual quality in combination with a high frame rate. Applications include streaming visualization of specular objects, such as a virtual car prototype.

CONTACT

Tobias Ritschel DEPT. 4 Computer Graphics Phone +49 681 9325-4157 Email ritschel mpi-inf.mpg.de

Feature-based Visualization

Structures repeat in both nature and engineering. An example is the symmetric arrangement of the atoms in molecules such as Benzene. A prime example for a periodic process is the combustion in a car engine where the gas concentration in a cylinder exhibits repeating patterns.

Structures repeat with variations. Sometimes these variations are clearly noticeable. For example, the weather patterns over the Atlantic Ocean are comparable every summer, but not quite the same. Similarly, in engineering, the flow fields around a smaller and a larger car are not quite the same, but they exhibit significant similarity.

One of the research challenges in visualization is to enable domain scientists to detect, quantify and compare structures in their data sets that are similar to each other.

What is a structure?
Our work focuses on comparing data sets from numerical simulations or physical measurements. For such data, the term “structure” needs to be mathematically well-defined. This is a matter of the underlying application, but often it refers to an arrangement of geometric objects such as lines or surfaces. The structures in scalar fields such as temperature or pressure are often described using the topology of the isocontours. This generic description of structures has proven useful in a large number of applications. This approach encodes the structures of the scalar field in a tree, where the size of a branch corresponds to the importance of the structure in the data. The structures in vector fields such as fluid flows or electromagnetic fields can be described using the trajectories of particles. A trajectory is the line consisting of all point locations visited by the particle over time.

How can structures be compared?
Comparing two structures requires a function that returns the difference between them. This is not necessarily simple for trees and particle trajectories. We have developed new methods in this field. Our method for comparing trees exploits the hierarchy in the tree to drastically speed up the comparison and to make it more robust against noise at the same time. As a result, we know the pairwise differences of all structures in the tree. Figure 1 shows how this works in practice.

Figure 1: Similar structures in the neghip data set (left) are found by selecting a structure (middle) and then our algorithm finds the most similar ones in the entire volume (right).

The difficulty in comparing particle trajectories lies in the fact that particles exhibit different flow patterns at different times. As an example, one particle may first be part of a vortex and then part of a hyperbolic region, whereas another particle may exhibit only the vortex. Hence, the trajectories are only partly similar. We deal with this by segmenting trajectories into meaningful parts. This is the basis for both focusing on a local structure and detecting its similar occurrences. Figure 2 shows an example.

Figure 2: The trajectories at the top have been segmented into meaningful parts to facilitate comparison. The bottom image highlights all trajectories similar to the reference pattern.

Future work
A notion of structural similarity gives rise to many interesting future applications. For example, one could compare different data sets with each other. This is particularly interesting for comparing data sets from different modalities such as numerical simulation and physical measurement. Another possibility would be to identify outliers in a data set, which are characterized by not being similar to other structures.

CONTACT

Tino Weinkauf DEPT. 4 Computer Graphics Phone +49 681 9325-4020 Email weinkauf mpi-inf.mpg.de Internet http://feature.mpi-inf.mpg.de/

Towards a Visual Turing Test: From Perception over Representation to Deduction

As vision techniques like segmentation and object recognition begin to mature, there has been an increasing interest in broadening the scope of research to full scene understanding. What do we mean by the “understanding” of a scene, and how do we measure the degree of “understanding”? Today, “understanding” scenes most often refers to a correct labeling of pixels, regions, or bounding boxes in terms of semantic annotations. We have developed a task that can measure progress in a more holistic understanding of visual scenes and that also extends to more complex notions of attributes and spatial concepts.

Equally strong progress has been made on natural language understanding. For example, methods have been proposed that can learn to answer questions solely from training data consisting of question-answer pairs. These methods operate on a set of facts given to the system, which is referred to as a “world”. Based on this knowledge, the answer is inferred by marginalizing over multiple interpretations of the question.

We unite these two research directions by addressing a question answering task based on real-world images. This challenge formulates a holistic task that tests a whole chain, from perception of image content and understanding natural language questions to representing the acquired information and deducing answers to the questions.

Benchmark dataset for answering questions on images
We are lacking a substantial dataset that serves as a benchmark for question answering on real-world images. Such a task relates to the “AI-dream” of building a Turing Test for vision that captures human comprehension of language and images. While we are still not ready to address completely unconstrained settings, we argue that a question-answering task on complex indoor scenes is a timely step in this direction. We introduce a novel dataset of more than 12,000 question-answer pairs produced by humans on RGBD images [ see figure ]. In contrast to comparable efforts in the literature, our approach requires a deep understanding of the visual scene and the asked questions while maintaining a tractable annotation effort, and its automatic evaluation protocol lends itself to easy evaluation and analysis.

Combining symbolic with uncertain visual facts in order to answer questions on images
While previous approaches on question answering in the domain of natural language processing are mostly based on symbolic reasoning and traditional AI techniques, our approach has to cope with “uncertain facts” that originate from the scene analysis by computer vision methods. We propose a Bayesian formulation that marginalizes over multiple possible “worlds” that correspond to different interpretations of the scene in order to be able to process the probabilistic output of state-of-the-art scene segmentation algorithms.

In this manner, we make a step towards bridging the gap between symbolic reasoning and the uncertain output of computer vision algorithms. While human performance is at about 50 % on this challenge, our best algorithms achieve about 13 %. We have established a benchmark and an evaluation scheme in order to make systematic progress, but much work on automatic image and language understanding lies ahead of us before we will be able to match human performance.

CONTACT

Mario Fritz DEPT. 2 Computer Vision and Multimodal Computing Phone +49 681 9325-1204 Email mfritz mpi-inf.mpg.de Internet http://www.d2.mpi-inf.mpg.de/visual-turing-challenge
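The idea of marginalizing over several possible “worlds” can be sketched in a few lines. In this toy version, each world is one interpretation of the scene together with its probability, a question is answered deterministically in every world, and the probabilities of worlds that yield the same answer are added up. The dictionary representation of a world, the probabilities, and the function names are assumptions for illustration and do not reproduce the actual model or its inputs.

```python
from collections import defaultdict


def marginal_answer_distribution(worlds, answer_fn):
    """Sum world probabilities per answer and normalize the result.

    `worlds` is a list of (world, probability) pairs and `answer_fn` maps a
    world to the answer that holds in it.
    """
    totals = defaultdict(float)
    for world, probability in worlds:
        totals[answer_fn(world)] += probability
    norm = sum(totals.values())
    return {answer: mass / norm for answer, mass in totals.items()}


# Three hypothetical interpretations of one indoor scene (object counts).
WORLDS = [
    ({"chair": 2, "table": 1}, 0.5),
    ({"chair": 3, "table": 1}, 0.25),
    ({"chair": 2, "sofa": 1}, 0.25),
]


def count_chairs(world):
    """Answer to the question 'How many chairs are there?' in one world."""
    return world.get("chair", 0)


distribution = marginal_answer_distribution(WORLDS, count_chairs)
most_likely_answer = max(distribution, key=distribution.get)
```

The point of the sketch is that no single segmentation has to be trusted: an answer supported by several plausible interpretations accumulates probability mass from all of them.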

Physically-based Geometry Processing and Animation

Creating digital geometric content is an important task for various applications in areas such as digital manufacturing, architecture, computer animation, and virtual reality. Modern acquisition technologies, like 3D-scanning, facilitate the accurate digitalization of detailed real-world objects, and modern fabrication technologies, like 3D printing, allow the automated realization of physical objects from digital designs. Between acquisition and fabrication lies the field of geometry processing that concerns the representation, analysis, manipulation, and optimization of digital geometric data.

Physically-based geometry processing
Triggered by the needs in the applications, physically-based modeling is becoming more and more important for geometry processing. We are using concepts from continuum mechanics to devise methods and algorithms for geometry processing. Examples are our schemes for deformation-based shape editing and for synthesizing new shapes from examples. These processes are modeled by non-linear optimization problems that the user can control with a small number of parameters. Since the shapes are highly resolved, this leads to a fundamental problem. On the one hand, high-dimensional non-linear optimization problems have to be solved, and, on the other hand, users expect the shape processing tools to provide instant feedback. We have devised algorithms that allow for the efficient computation of approximate solutions of the complex optimization problems. The principle is to carefully construct a low-dimensional approximation of the complex problem in a preprocess and to solve only the reduced problem in the interactive phase. In particular, we want the computational cost for solving the reduced problems to be independent of the resolution of the shapes to be processed. A highlight in this area is our scheme for nonlinear shape interpolation. It is the first algorithm that allows for computing interpolated shapes of a set of larger meshes in real-time. In our experiments, we obtain rates of 20–100 interpolated shapes per second, even for meshes with 500 k vertices per input shape.

Our tool allows the user to interactively explore the nonlinear space of shapes spanned by a set of example shapes.

Physically-based animation
In addition to the processing of static shapes, we consider the problem of creating dynamic shapes. In particular, we consider the problem of generating motions of objects and characters that are physically plausible and follow an animator’s intent. This is a key problem in computer animation. With traditional computer animation techniques, like keyframing, this is difficult since many degrees of freedom must be determined and secondary motion effects have to be modeled by hand. Physical simulation can be of great help for creating realistic and detailed animations, but one needs to determine the forces and physical parameters that produce the effect an animator wants to create. Space-time optimization, in which the keyframes serve as constraints in space-time, makes it possible to combine the realism provided by physical simulation with the control over an animation offered by keyframing. The results are planned motions showing effects like squash-and-stretch, timing, or anticipation, which are desired by animators. We have introduced a scheme that allows for generating interesting and realistic-looking motion with rich secondary motion from minimal input. An animator only needs to generate partial (as opposed to full) keyframes and can thereby generate complex motion from rough sketches.

Snapshots of an animation of a soft-body character generated with our tool.

CONTACT

Klaus Hildebrandt DEPT. 4 Computer Graphics Phone +49 681 9325-4012 Email hildebrandt mpi-inf.mpg.de

Eye-Based Human-Computer Interaction

Our eyes are a compelling input modality for interacting with computing systems. In contrast to other and more well-established modalities, such as touch or speech, our eyes are involved in nearly everything that we do, anywhere and at any point in time – even while we are sleeping. They also naturally indicate what we are interested in and attending to, and their movements are driven by the situation as well as by our goals, intentions, activities, and personality. Finally, eye movements are also influenced by a range of mental illnesses and medical conditions such as Alzheimer’s disease, autism, or dementia. Taken together, these characteristics underline the significant information content available in our eyes and the considerable potential of using them for implicit human-computer interaction. For example, a wearable assistant could monitor the visual behaviour of an elderly person over a long period of time and provide the means to detect the onset of dementia, thereby increasing the chances of an earlier and more appropriate treatment.

In addition, our eyes are the fastest moving external part of the human body and can be repositioned effortlessly and with high precision. This makes the eyes similarly promising for natural, fast, and spontaneous explicit human-computer interaction, i.e., interactions in which users consciously trigger commands to a computer using their eyes.

Figure: Our method infers contextual cues about different aspects of what we do, by analyzing eye movement patterns over time (from left to right): social (interacting with somebody vs. no interaction), cognitive (concentrated work vs. leisure), physical (physically active vs. not active), and spatial (inside vs. outside a building).

Despite considerable advances over the last decade in tracking and analyzing eye movements, previous work on eye-based human-computer interaction has mainly developed the use of the eyes in controlled laboratory settings that involved a single user, a single device, and WIMP-style (windows, icons, mouse, pointer) interactions. Previous work has also not explored the full range of information available in users’ everyday visual behaviour, as described above.

Together with our collaborators, we strive to use the information contained in visual behaviour in all explicit and implicit interactions that users perform with computing systems throughout the day. We envision a future in which our eyes universally enable, enhance, and support such interactions and in which eye-based interfaces fully exploit the wealth of information contained in visual behaviour. In that future, the eyes will no longer remain a niche input modality for special-interest groups, but the analysis of visual behaviour and attention will move centre stage in all interactions between humans and computers.

We work towards this vision by 1) developing sensing systems that robustly and accurately capture visual behaviour in ever-changing conditions, 2) developing computational methods for automatic analysis and modeling that are able to cope with the large variability and person-specific characteristics in human visual behaviour, and 3) using the information extracted from such behaviour to develop novel human-computer interfaces that are highly interactive, multimodal, and modeled after natural human-to-human interactions.

For example, we have developed new methods to automatically recognize high-level information about users from their visual behaviour, such as periods of time during which they were physically active, interacting with others, or doing concentrated work. In other work, we have introduced smooth pursuit eye movements (the movements that we perform when following moving objects, such as a car passing by in front of us) for human-computer interaction. We have demonstrated that smooth pursuit movements can provide spontaneous, natural, and robust means of interacting with ambient displays using gaze.

Figure: Our method using smooth pursuit matches the user’s eye movement with the movement of on-screen objects.
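The idea of matching eye movement against the movement of on-screen objects can be illustrated with a small sketch. The article does not state how the matching is computed; the version below assumes a Pearson correlation between the gaze trajectory and each object trajectory, and the names `pearson`, `match_object`, and the threshold of 0.8 are hypothetical choices made for illustration only.

```python
from math import cos, sin, sqrt

def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def match_object(gaze, objects, threshold=0.8):
    """Return the name of the object whose path the gaze follows, or None.

    gaze: list of (x, y) gaze samples; objects: dict name -> list of (x, y)
    positions sampled at the same times. The threshold is an assumed value.
    """
    gaze_x, gaze_y = zip(*gaze)
    best_name, best_score = None, threshold
    for name, path in objects.items():
        obj_x, obj_y = zip(*path)
        # Both coordinates must correlate, so take the weaker of the two.
        score = min(pearson(gaze_x, obj_x), pearson(gaze_y, obj_y))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Two hypothetical targets moving on circles in opposite directions.
times = [i * 0.3 for i in range(20)]
objects = {
    "A": [(cos(t), sin(t)) for t in times],
    "B": [(cos(t), -sin(t)) for t in times],
}
# Gaze follows A, but with an unknown offset and scale (no calibration).
gaze = [(100 + 50 * cos(t), 200 + 50 * sin(t)) for t in times]
selected = match_object(gaze, objects)
```

Because correlation ignores offset and scale, a sketch of this kind would not need the gaze data to be calibrated to screen coordinates, which is one reason correlation is a natural candidate here.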

CONTACT

Andreas Bulling
DEPT. 2 Computer Vision and Multimodal Computing
Phone +49 681 9325-2128
Email bulling mpi-inf.mpg.de

Optimizing User Interfaces using Biomechanical Simulation

The design of user interfaces has become an area of prime academic and industrial interest. Two groups at the institute have pioneered data-driven solutions to a fundamental challenge: the organization of inputs and outputs in a biomechanically efficient way. Interfaces constrain the postures and movements of the human body in unique ways, and these may or may not be ergonomic and efficient.

Figure: Computational pipeline from optical motion capture and biomechanical simulation to interactive visualization of the multi-dimensional data set.

We developed a new data-driven approach that leverages biomechanical simulation. First, optical motion capture is used to record the 3D movements of a user when interacting with a particular user interface. Second, this data steers a biomechanical simulation of the user. The result is a data set containing information about the ergonomics (e.g., muscular loads) and performance (e.g., speed and accuracy) of the captured movements. This combined output is a very rich, multi-dimensional description of human movement, encompassing more than 400 variables. The figure shows our computational pipeline.

Several novel approaches have been explored to leverage this complex data. We identified four types of problems: (1) understanding basic human factors; (2) optimizing design parameters; (3) comparative studies of different designs which aim to solve the same task; (4) summarization of data for design decisions.

Muscle activation clustering

We developed a novel summarization of the user performance and major muscle activations for 3D pointing tasks. The basis is a large motion data set in which we captured a large variety of pointing movements that cover the entire egocentric space of a user. A subsequent clustering of the muscle activations reveals regions with different performance and ergonomic costs. This can be used to inform the design choices of user interface designers. The work was published in a top journal of the field, ACM TOCHI.

We applied this approach to touchscreen interfaces. Users performed pointing tasks on five common touch surface types. The results illuminate the strengths and weaknesses of each surface and show several trade-offs among them in terms of throughput, postures, and the expected fatigue in sustained interaction. The paper received a Best Paper Honorable Mention at the top HCI conference ACM CHI’15.

Interactive visualization

However, existing visualization tools do not sufficiently support designers and researchers in analyzing such data sets. We identified two shortcomings. First, appropriate visual encodings are missing, particularly for the biomechanical aspects of the data. Second, the physical setup of the user interface cannot be incorporated explicitly into existing tools. We developed the versatile visualization tool MovExp that supports the evaluation of user interfaces using such data sets. In particular, it can be easily adapted to include the physical setup that is being evaluated, and visualize the data on top of it. Furthermore, it provides a variety of visual encodings to communicate muscular loads, movement directions, and other specifics of studies that employ motion capture and biomechanical simulation. MovExp was published in the top visualization conference IEEE VIS’14.

Optimal gestural interaction

We continued with a study of an emerging input method enabled by hand tracking: input by free motion of fingers. The method is expressive, potentially fast, and usable across many settings as it does not require physical contact or visual feedback. The goal was to inform the design of high-performance input methods by providing detailed analysis of the performance and anatomical characteristics of finger motion. We conducted an experiment using a commercially available sensor to report on the speed, accuracy, individuation, movement ranges, and individual differences of each finger. The findings show differences of up to 50 % in movement times and provide indices quantifying the individuation of single fingers. They were applied to text entry by the computational optimization of multi-finger gestures in mid-air. We defined a novel objective function that considers performance, anatomical factors, and learnability. This work was published as a full paper in CHI’15.

CONTACT

Antti Oulasvirta
Cluster of Excellence on Multimodal Computing and Interaction
Phone +358 50 3841561
Email antti.oulasvirta gmail.com

Tino Weinkauf
DEPT. 4 Computer Graphics
Phone +49 681 9325-4020
Email weinkauf mpi-inf.mpg.de
Internet http://feature.mpi-inf.mpg.de/

Digital Fabrication of Flexible Displays and Touch Sensors

Graphical user interfaces offer a high degree of customizability. Graphical contents can be digitally designed and programmed, in order to tailor to the user and to the context of use. In contrast, the screens, which display these graphical contents, barely offer any possibility for customization: today’s displays are typically rectangular, planar and rigid; they are relatively thick and available in only a few predefined sizes and aspect ratios.

Figure: Our techniques allow us to transform a digital vector graphic (left) into a functional, custom touch display for embedded user interfaces (center) and into a stretchable sensor for on-body input (right).

This is getting increasingly problematic, as user interfaces of the third era of computing are increasingly embedded in the physical environment. User interfaces for the Internet of Things and for wearable computing must be customizable in size and shape to match the requirements of the physical context. Moreover, they need new properties; for instance, body-worn displays should be very thin, lightweight, and deformable.

We propose digital fabrication as a unique and novel approach to fabricating embedded user interfaces. Digital fabrication techniques, such as 3D printing and laser cutting, are already successful in commercial applications, for they enable the production of objects with highly custom shapes in industrial production quality. Our pioneering work allows us to go beyond static shape alone to realizing interactive input and output components through digital fabrication. This makes it possible to produce new types of embedded or wearable user interfaces in small quantities and in a fast, easy and inexpensive way.

Printed deformable touch-displays

With PrintScreen, we have contributed the first technique for digital fabrication of custom displays: Instead of buying a display off the shelf, one can now graphically design custom light-emitting and touch-sensitive displays in software, like designing conventional vector graphics. The display is then fabricated in a subsequent printing pass, similarly to printing a document.

This enables the fabrication of highly customized solutions in low volume, for instance, for prototypes and small-series products, personalized wearables, and interactive print products. The displays are around 0.1 mm thick and can be deformed, rolled, and even folded. Moreover, displays can be printed on various materials, including polymer films, paper, wood, ceramics, or leather. Only inexpensive commodity tools are required for their design and printing. In follow-up research, we have developed PrintSense, a technique for printing sensor surfaces that capture additional modalities, including hovering and deformation. This enables single-pass printing of multi-modal interactive display surfaces.

Personalized touch-sensitive surfaces for on-skin user interfaces

For high usability on skin, sensor surfaces not only need to be thin, deformable, and customizable in shape, they must also be stretchable. With iSkin, we have contributed the first stretchable sensor surface for user interfaces on human skin. The sensor is graphically designed and then produced through digital fabrication techniques. The thin and mostly transparent sensor is made of fully biocompatible silicone. Not only the shape, size, and location of the touch-sensitive elements can be customized, but also their visual appearance, in order to tailor to the aesthetic demands of the user. This sensor enables various new types of on-body user interfaces. For instance, tattoo-like skin stickers enable the quick, direct, and discreet control of mobile computing devices.

CONTACT

Jürgen Steimle
DEPT. 4 Computer Graphics
Phone +49 681 302-71935
Email jsteimle mpi-inf.mpg.de

Perceptual Fabrication

In recent years, there has been a tremendous development of new, three-dimensional (3D) printing technologies. While 3D printers are still not as popular as regular 2D printers, this may change very soon due to the great possibilities that they offer. Instead of using traditional manufacturing techniques, such as cutting or injection molding, 3D printers can create complex three-dimensional objects by depositing small portions of printing material and building objects layer by layer. Besides the fine control over the created geometry, new 3D printers can use multiple materials that can be placed almost arbitrarily within the printing volume. This enables not only a fine control over the final appearance of the printed objects [figure 1], but also over their mechanical properties [figure 2].

While the fine control over the material placement provides great flexibility in terms of what can be manufactured, it also poses some computational challenges. Given an object with specified properties (e.g., mechanical or visual), it is unclear how to combine the materials available on the printer to achieve the best results. Solving this problem requires optimizing for a material placement which provides properties as close as possible to those that were specified. Due to a number of limitations of current 3D printers, such as limited printing resolution, color reproduction, or material properties, there is no guarantee that such a process will succeed. This makes producing hardcopies of real objects still very challenging. In our work, we argue that this problem can be, to some extent, overcome by accounting for human perception. We believe that a good understanding of how the printed objects influence user experience is necessary to fully utilize this kind of technology.

An example of such a perception-driven fabrication approach is our recent work on compliance, which addresses the problem of printing deformable materials. Given a deformation property of an object that we want to print and a limited number of different materials that are available on the printer, we want to choose a material that best matches our goal. Answering this question requires building a computational model that can predict perceived differences between different materials. To this end, we first conducted a number of experiments to investigate the ability of human observers to discriminate differences between materials of varying softness. Then, based on these experiments, we built a computational model that can quantify those differences. Such models can be used later to compare perceived properties of different materials and choose the one best suited for printing the desired object. We believe that building similar models for other properties of printed objects will enable us to produce objects that are close replicas not only in a numerical sense, but also in a perceptual one.

Figure 1: 3D printed model of a building with prescribed texture and subsurface scattering properties.

Figure 2: 3D printed model of a book with prescribed deformation properties.

CONTACT

Piotr Didyk
DEPT. 4 Computer Graphics
Phone +49 681 9325-1009
Email [email protected]

OPTIMIZATION

Optimization procedures are today of central significance for companies’ effectiveness. They are employed, for example, to reduce the need for expensive resources such as work or raw materials. The challenge to science is to develop efficient procedures for solving optimization problems. These procedures should quickly find an optimal solution, or at least a solution that is close to the optimum.

Efficient optimization procedures are of core importance in various areas. For large companies, they have a decisive influence on their competitiveness. With careful planning, a large amount of resources in industrial projects can be saved, leading to lower costs. However, such planning problems are usually very complex, and different requirements have to be taken into account. This makes it difficult for computers to find optimal, or at least very good, solutions.

At the Max Planck Institute for Informatics, we are working on optimization problems arising in very different applications such as industrial optimization or medicine. On the one hand, we are developing elaborate procedures that find optimal solutions efficiently. On the other hand, if the underlying problem is too difficult for an optimal solution to be found quickly, we develop procedures that can at least find a solution which is close to the optimum. In addition, we are investigating how we can employ random decisions to obtain more efficient and simpler optimization procedures. In this context, we are also studying procedures that are inspired by optimization processes in nature. Such methods often allow good solutions to be found for a given problem without spending too much effort on designing a specific optimization procedure.

Since optimization plays a significant role in many different areas, scientists in all research areas of the institute are working on optimization problems. Optimization is nowadays a crucial technique for the design of efficient planning processes. Its importance will continue to grow in the future.

CONTRIBUTIONS

From Routing to Pricing and Learning: Why Are They Hard to Compute?
Energy Efficient Algorithms
Geometric Packing: Suitcases and More
Rule-Based Product Configuration
CGAL – Exact and Efficient Geometric Algorithms
Improving Flat Panel Displays by Discrete Optimization

From Routing to Pricing and Learning: Why Are They Hard to Compute?

Pricing problems usually involve a setting where a store owner is interested in selling items to customers in a way that maximizes the profit. This class of problems includes many applications: a dealer who sells cars of various types, a transportation company that owns toll roads and charges a toll to drivers, a social network owner (e.g. Facebook) who sells commercial products for their advertisers, etc. The challenge of this problem lies in striking the right prices: too low a price is not profitable; too high a price scares off customers. In an apparently unrelated setting, network routing problems ask for algorithms that send data (i.e. packets) over the network and guarantee certain quality criteria, e.g., route as many packets as possible in a given time. The challenges of routing lie not only in choosing the right set of requests to route but also in planning their routes along the network. These are some of the most fundamental issues in networking and optimization, so it is no surprise that these problems have shown strong connections to major developments in algorithm design, complexity theory, graph theory, and optimization. As computer scientists, we are interested in efficient computational solutions for these problems, or seek an explanation for why such solutions cannot exist.

In this project, it is shown that computational aspects of routing and pricing problems share more similarities than one would imagine. Network routing problems have been popular since the 1980s. On the other hand, pricing problems only started receiving attention around 2000 – in the 90s, hardly anyone ever thought about selling their products over the internet. Computer scientists knew how to route packets much better than how to price items. We had some idea why certain routing problems are very hard to compute, so that it is better NOT to compute them than to let a computer run the task until the end of the universe. When we see a practical setting, we model the problem in a way that makes it possible to compute the solution efficiently. We had much less understanding of pricing problems.

Together with my collaborators, I connect these two areas by transferring the knowledge from routing problems. Our results show that algorithm designers face a common difficulty when dealing with routing and pricing. With this discovery, we now understand that certain settings of pricing problems are computationally very hard: For instance, (even approximately) maximizing profits from toll roads is very hard when there are many more drivers than toll segments. Theoreticians will benefit from the rich technical tools developed in our work, and practitioners will learn to reformulate their problems in order to avoid these difficult settings.

We pushed this technique even further, and we ultimately succeeded in “transferring” the techniques from both pricing and routing to help understand the computational complexity of almost 20 other problems. These problems arise routinely in practical settings and we now know they all share a common source of computational difficulty. After 25 years, we now completely understand that a computer is, very likely, incapable of learning even a very simple language (i.e. it cannot learn deterministic finite automata). Computer scientists had already suspected this limitation: A ground-breaking result of Kearns and Valiant showed that any algorithm capable of such a learning task would be able to break the famous cryptographic system (RSA), thus convincing a number of researchers that such an algorithm should not exist. Our result provides even more convincing evidence: such an algorithm would not only break the cryptographic system but also solve tens of thousands of other hard computational problems (i.e., all NP-hard problems).
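The toll road setting mentioned in this article can be made concrete with a small sketch. The model below is an assumption chosen for illustration, not the formulation used in the project: each driver has a fixed route (a set of toll segments) and a budget, and pays the sum of the tolls on the route only if it does not exceed the budget. The function names and the example data are hypothetical. The search tries every price vector, so its running time grows exponentially with the number of segments.

```python
from itertools import product

def profit(prices, drivers):
    """Total toll revenue for one price vector.

    drivers: list of (route, budget); route is a tuple of segment indices.
    A driver travels (and pays) only if the route's total toll fits the budget.
    """
    total = 0
    for route, budget in drivers:
        cost = sum(prices[segment] for segment in route)
        if cost <= budget:
            total += cost
    return total

def best_prices(num_segments, drivers, candidate_prices):
    """Exhaustive search over all price vectors (exponential in num_segments)."""
    best = max(product(candidate_prices, repeat=num_segments),
               key=lambda prices: profit(prices, drivers))
    return best, profit(best, drivers)

# Three hypothetical drivers on a road with two toll segments.
drivers = [((0,), 3), ((1,), 3), ((0, 1), 4)]
prices, revenue = best_prices(2, drivers, range(0, 5))
```

Even in this tiny instance the tension described above is visible: charging 3 on both segments extracts the full budget of the two short-distance drivers but loses the driver who needs both segments.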

CONTACT

Parinya Chalermsook
DEPT. 1 Algorithms and Complexity
Phone +49 681 9325-1017
Email parinya mpi-inf.mpg.de

Energy Efficient Algorithms

The efficiency of an algorithm is typically determined by the quality of the solution it returns, along with the amount of resources that it requires to return such a solution. Traditional analysis of algorithms mostly involves evaluating the performance of an algorithm in terms of the resources running time and storage, i.e., the number of steps it requires in order to produce the desired result, or the space it takes up in the storage device during execution. However, as energy is a limited and expensive resource, it is of critical importance to analyze algorithms (or the solutions returned by the algorithms) for this resource as well. To quote former Google CEO, Eric Schmidt: “What matters most to computer designers at Google is not speed, but power, low power, because data centers can consume as much electricity as a city”.

To address the aim of power saving, there has been a significant upsurge in the study of energy-efficient algorithms in recent years. The most obvious type of algorithmic problem that considers energy as a resource is the question of how to manage power directly at the processor level. As an example, most modern portable computers are equipped with a processor that is speed scalable and is supplied with a sleep state. Suppose we want to process a number of programs on such a computer, where each program has to be processed within a specific time frame and requires a particular amount of CPU-cycles. Then, the operating system has to schedule these programs on the processor, that is, it has to decide when to transition the processor to the sleep state, and furthermore, at which times and at what speed to process each program. These decisions can have a huge impact on the energy consumption of the processor and, in turn, on the battery life of the computer. It has been observed that it is sometimes beneficial to operate the processor faster than necessary in order to reside in the sleep state for longer periods of time, a technique commonly referred to as race to idle. However, operating the processor at its highest possible speed whenever there is work to be done, and transitioning it to the sleep state otherwise (as is commonly done in practice) is also suboptimal.

Figure: A schedule

Together with Chien-Chung Huang, we have developed an algorithm for the aforementioned setting. The schedules that our algorithm produces have an energy consumption that is guaranteed to be arbitrarily close to the optimal one. Additionally, the algorithm can compute such schedules in a running time that is polynomial in the input size, which is considered efficient. The main idea of our algorithm is to divide each program into smaller, unsplittable subprograms and identify a specific set of orderings in which these subprograms could potentially be processed. We prove that one of these orderings produces a sufficiently good solution, and that we can efficiently find the best ordering in the set by using a well-known technique called dynamic programming. Our result is the best possible, given widely accepted complexity-theoretic assumptions.
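The trade-off between running slowly, racing to idle, and running at full speed can be checked with a few lines of arithmetic. The sketch below is not the algorithm from this article; it only evaluates the energy of a single program under an assumed power model that is common in the literature: running at speed s costs s^alpha + static per time unit, staying awake while idle costs static per time unit, and going to sleep costs a fixed wake-up energy. All parameter values and function names are hypothetical.

```python
def job_energy(work, window, speed, alpha=3.0, static=2.0, wake_cost=1.0):
    """Energy to finish `work` CPU cycles within `window` time units at a fixed speed.

    Assumed model: power while running is speed**alpha + static; after the job
    finishes, the processor either idles awake (static power) or sleeps and
    later pays `wake_cost`, whichever is cheaper.
    """
    busy = work / speed
    if busy > window:
        raise ValueError("speed too low to finish within the time window")
    active = (speed ** alpha + static) * busy
    idle = window - busy
    return active + min(static * idle, wake_cost)

def critical_speed(alpha=3.0, static=2.0):
    """Speed minimizing energy per unit of work, (s**alpha + static) / s."""
    return (static / (alpha - 1.0)) ** (1.0 / alpha)

work, window = 1.0, 4.0
slowest = job_energy(work, window, speed=work / window)    # just meet the deadline
race = job_energy(work, window, speed=critical_speed())    # faster, then sleep
full = job_energy(work, window, speed=2.0)                  # assumed top speed
```

With these assumed numbers, running just fast enough to meet the deadline is the most expensive option, running at the assumed top speed is cheaper, and the intermediate speed followed by sleeping is cheapest, which matches the two observations made above.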

CONTACT

Antonios Antoniadis
DEPT. 1 Algorithms and Complexity
Phone +49 681 9325-1025
Email aantonia mpi-inf.mpg.de

Sebastian Ott
DEPT. 1 Algorithms and Complexity
Phone +49 681 9325-1029
Email ott mpi-inf.mpg.de

Geometric Packing: Suitcases and More

Everyone who has gone on vacation knows this problem: before the trip can start, you have to pack your suitcase at home, and you want to take many more things with you than can fit into it (not to mention the weight limitations of airlines). So, instead of taking everything, you pick the most useful items that you can actually transport in your suitcase.

Problems of this type arise in many different situations, e.g., when loading a truck, placing advertisement banners on a website, or, with some additional side constraints, when labeling a map with names of cities and countries. Even though these problems seem very different at first glance, they can all be described as follows: We are given some geometric objects; let us assume here for simplicity that they are (two-dimensional) rectangles. Our goal is to select as many of them as possible such that we can place them in a given area (your suitcase, the loading area of a truck, etc.) without overlap. Additionally, we can assign different levels of importance to the objects in order to model that, e.g., your passport is more important than your third pair of shoes. The advantage of such an abstract description is that the solutions we find can be used for many applications – not only for packing your suitcase, but also for loading a truck.

How do we pick the most important objects in the problem described above so that they fit into the given area? There are many different solutions for problems of this type, way too many to find the best one by simply trying all the possibilities (even for the fastest computer in the world)! Therefore, we design algorithms that do not necessarily always find the best possible solution, but that are efficient and always find a solution that does not differ too much from the optimum (approximation algorithms).

Figure 1: The four different types of rectangles.

Figure 2: The partition of the given area into a few rectangular pieces.

For the problem above, we use the following strategy: Roughly speaking, we can partition the given rectangles into high and wide rectangles, high and narrow rectangles, wide and thin rectangles, and finally, rectangles that are small in both dimensions [see figure 1]. It turns out that the first three groups are the problematic ones. If there were only rectangles from one of these groups, the problem would be much easier. However, the rectangles of the different types can interact in the optimal solution in a very complicated way. In a nutshell, we solve this problem by partitioning the given area into a few rectangular pieces such that in each of them we allow only rectangles from one group [see figure 2]. In the second step, we distribute the rectangles into these pieces such that in each of them we place only one type of rectangle. With mathematical methods, we can prove that with this strategy we always find a solution that differs only marginally from the best possible solution and that the resulting algorithm is efficient (in the theoretical sense).
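The grouping of rectangles into four types, which this article uses as the first step of its strategy, can be sketched as follows. The article does not give the exact thresholds; the code assumes that a side counts as "large" when it exceeds a fraction `eps` of the corresponding side of the given area. The function names and the value of `eps` are hypothetical.

```python
def classify(width, height, area_width, area_height, eps=0.25):
    """Assign a rectangle to one of the four groups.

    A side is treated as large if it exceeds eps times the matching side
    of the packing area (an assumed threshold for illustration).
    """
    wide = width > eps * area_width
    high = height > eps * area_height
    if high and wide:
        return "high and wide"
    if high:
        return "high and narrow"
    if wide:
        return "wide and thin"
    return "small"

def group_rectangles(rectangles, area_width, area_height, eps=0.25):
    """Partition a list of (width, height) rectangles by type."""
    groups = {"high and wide": [], "high and narrow": [],
              "wide and thin": [], "small": []}
    for width, height in rectangles:
        kind = classify(width, height, area_width, area_height, eps)
        groups[kind].append((width, height))
    return groups

# Hypothetical items for a 100 x 60 suitcase.
items = [(50, 40), (10, 40), (50, 5), (10, 5)]
groups = group_rectangles(items, area_width=100, area_height=60)
```

Once the rectangles are grouped like this, each rectangular piece of the partitioned area would be reserved for a single group, which is what makes the remaining packing step tractable.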

CONTACT

Andreas Wiese
DEPT. 1 Algorithms and Complexity
Phone +49 681 9325-1015
Email awiese mpi-inf.mpg.de

Rule-Based Product Configuration

In today’s industry and economy, the principle of product lines plays a great role. In this context, different products, based on a preferably high number of identical components, are grouped together and jointly designed, developed, and produced. A typical example is the group-internal production of different automobile models that are based on a common platform.

A currently popular approach for describing product lines is the use of rules: They specify which components, in a specific context, or following an explicit pre-selection of properties, can be configured into an overall product. These contexts can be of a temporal or technical nature, for example. Alternatively or additionally, a product can be described based on the hierarchical structure of its assembly. An example of a component selection for a vehicle would be the selection of a color, an engine or an equipment detail such as an air conditioner. A temporal context would include a particular year of manufacturing or model year; a technical context could be the upper limit of CO2 emissions or the car’s total weight. The rules themselves describe dependencies between the components and contexts: A powerful engine requires specific tire combinations based on the high speeds that can potentially be achieved, whereas trailer hitching devices exclude weak engines. The interactions are represented by so-called if-then rules, which become active through decisions by the user specifying the properties of an eventual product.

Within the framework of our research, we are interested in a formal description and verification of such a system of rules. We also want to be able to calculate mathematically precise analyses of the whole product structure. For many of the systems currently in use, the challenge does not lie in developing the concrete rules, but rather in ensuring important global properties such as consistency. The above-described rule approach is particularly characterized by its flexibility. Rules can be easily adapted to the types of products being configured and their associated components as well as to the respective areas of use. The desired level of detail can be chosen freely: Thus, a product can be regarded solely from a sales perspective, or considered down to the last little screw for engineering purposes. The approach can also be applied to the description of products composed of abstract modules. In addition to the automotive industry, applications for software products that are composed of individual modules are also possible, for example. Production lines, which consist of a sequence of individual steps, or the composition of business services from individual activities, can also be described naturally with the aid of rule systems.

Although promising results have been accomplished with rule systems on the application side, an appropriate formalization, i.e., an explicit and consistent translation of a rule-based system into a language of logic, is still the subject of basic research. However, formalization is a prerequisite for the assurance of global properties. Important properties include consistency, i.e., freedom from contradictions in the system of rules, or the uniqueness of the calculated results. Further examples of these properties are the total number of possible products from one product line according to the pre-selection of components or contexts, optimality properties of products such as minimum production costs, or recognizing components that no longer contribute to an active product. For systems of rules that also describe a particular process, e.g., a production or configuration process, the confluence of the system of rules also plays a major role: Irrespective of the order of the rule executions starting from an initial state, the result is always the same.

The current research encompasses the analysis of existing rule-based systems, the development of a suitable formal language and the development of automatic methods for the calculation of properties. We have created a formalization which enables the verification of a rule-based configuration system, attaching importance to the interactivity of such a system: Is it possible that user decisions lead to inconsistencies? It essentially considers variability in the form of decisions over Boolean values (i.e., a variable in the system can have the values “true” or “false”). Building on that, procedures that allow analyzing properties of a rule-based configuration system have been implemented. Moreover, a first extension of the formal system has been completed, where additionally the variability in the form of decisions over numeric values is considered.

CONTACT

Ching Hoo Tang
RG. 1 Automation of Logic
Phone +49 681 9325-2919
Email chtang mpi-inf.mpg.de

Christoph Weidenbach
RG. 1 Automation of Logic
Phone +49 681 9325-2900
Email weidenbach mpi-inf.mpg.de
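The if-then rules over Boolean decisions described in this article can be sketched in a few lines. This is not the formalization developed in the project; it is a brute-force illustration that encodes the two example rules from the text (a powerful engine requires specific tires, a trailer hitch excludes a weak engine) and checks whether a set of user decisions can still be completed to a valid product. The component names, the third rule, and the function names are assumptions made for the example.

```python
from itertools import product

COMPONENTS = ["powerful_engine", "weak_engine", "sport_tires", "trailer_hitch"]

# Each rule maps a complete configuration (dict name -> bool) to True if satisfied.
RULES = [
    # if powerful engine then specific (here: sport) tires
    lambda c: not c["powerful_engine"] or c["sport_tires"],
    # a trailer hitch excludes the weak engine
    lambda c: not (c["trailer_hitch"] and c["weak_engine"]),
    # assumption for this example: every car has exactly one engine
    lambda c: c["powerful_engine"] != c["weak_engine"],
]

def valid_products(decisions):
    """Enumerate all complete configurations that respect the user's decisions
    and satisfy every rule. Exponential in the number of undecided components."""
    free = [name for name in COMPONENTS if name not in decisions]
    result = []
    for values in product([False, True], repeat=len(free)):
        config = {**decisions, **dict(zip(free, values))}
        if all(rule(config) for rule in RULES):
            result.append(config)
    return result

def is_consistent(decisions):
    """True if the decisions made so far can still lead to a valid product."""
    return len(valid_products(decisions)) > 0

total = len(valid_products({}))
with_hitch = valid_products({"trailer_hitch": True})
conflict = is_consistent({"trailer_hitch": True, "weak_engine": True})
```

Enumeration like this answers the questions raised above (number of possible products, inconsistent user decisions) only for toy systems; for realistic rule sets, dedicated automated reasoning procedures are needed, which is the point of the formalization work.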

CGAL – Exact and Efficient Geometric Algorithms

The Computational Geometry Algorithms Library, CGAL, provides access to a large body of efficient and reliable geometric algorithms and data structures. Our department has been an active member of its development since its beginning in the late 1990s. CGAL reduces the gap between theoretical results and implementations that can be used in practical scenarios. It is successful in academic prototypical development and also widely employed among industrial users. The library's philosophy dictates correct results for any given input, even if intermediate round-off errors occur. This is achieved by a design that separates numerical constructions and predicates from combinatorial algorithms and data structures, which is necessary because computers do not compute exactly with real numbers, at least not without the programmer taking extra care. Thus, naive geometrical implementations result in rounding errors that surface in real applications as crashes, non-terminating programs, or wrong results. CGAL, however, follows the exact geometric computation paradigm that ensures correct output for arbitrary instances of a problem, even in the presence of severe degeneracies.

CGAL is developed as an open-source project by a group of research institutes and commercialized by GeometryFactory, a spin-off company that also offers consulting services for industrial users. Geometrical computations occur in various domains, such as geographic information systems, recommendation systems and wireless communication planning.

When CGAL was founded, Department 1 contributed to the first geometrical kernel, and has since provided developments of algorithms for convex hulls, polygonal partitioning and Boolean operations on Nef polyhedra in two and three dimensions. The main recent contributions of Department 1 to CGAL consist of implementations for the analysis of algebraic curves and surfaces in the field of non-linear computational geometry. Such computations are needed for motion planning, for instance. However, they cannot be carried out reliably with standard machine precision arithmetic. The latest success in performance gain for algebraic curves is due to several ingredients: The central one consists of a cylindrical algebraic decomposition with a revised lifting step. Using results from algebraic geometry we avoid any change of coordinates and replace most of the costly symbolic operations with numerical tools. Two bottleneck tasks have been replaced with implementations that exploit the massive parallelism of fast graphics hardware.

Figure: Segmenting an algebraic curve into x-monotone parts and isolated points

In addition, we have contributed to the combinatorial structures of CGAL in collaboration with Dan Halperin's group from Tel Aviv University in Israel: We worked on arrangements on parameterizable surfaces and provide support for polycurves, that is, curves composed of piecewise subsegments. Both are basic building blocks used, for instance, in 3D printing workflows.

Moreover, a library the size of CGAL needs constant maintenance as well as technical and human resources to keep its infrastructure alive: source code repository, adaptations to multi-core architectures, build system, external library dependencies, testing, documentation, releasing and also interactions on mailing lists and via the website. All these tasks undergo constant technological advancements; these ease the integration of new developers and also ensure industrial-strength quality. Quality assurance is provided through detailed reviews of new contributions, similar to journal submission. Finally, we use CGAL for teaching and training students in modern library design and disseminate our achievements to other researchers in the fields of mathematics and computer science at events like the International Conference for Mathematical Software.
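The effect of round-off errors on a geometric predicate, as discussed in this article, can be illustrated with a small sketch. The following Python example is not CGAL code and does not use the CGAL API; the function names `orientation` and `exact` and the three sample points are assumptions chosen purely for illustration. It evaluates the same orientation test once with machine floating-point numbers and once with exact rational arithmetic from the Python standard library.

```python
from fractions import Fraction


def orientation(p, q, r):
    """Sign of the 2x2 determinant for the triple (p, q, r).

    Returns +1 for a left turn, -1 for a right turn, 0 for collinear points.
    The arithmetic is carried out in whatever number type the coordinates have.
    """
    det = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (det > 0) - (det < 0)


def exact(point):
    """Convert float coordinates to exact rationals (the conversion is lossless)."""
    return tuple(Fraction(c) for c in point)


# Hypothetical sample points; all coordinates are exactly representable doubles.
p = (0.0, 0.0)
q = (2.0**27 + 1, 2.0**27)
r = (2.0**27, 2.0**27 - 1)

naive_sign = orientation(p, q, r)                        # floating-point evaluation
exact_sign = orientation(exact(p), exact(q), exact(r))   # exact evaluation

print("floating point:", naive_sign)
print("exact:         ", exact_sign)
```

In double precision the product (2^27 + 1)(2^27 - 1) = 2^54 - 1 is rounded to 2^54, so the naive evaluation reports the points as collinear, while the exact evaluation yields a determinant of -1, that is, a right turn. Separating such predicates from the combinatorial algorithm is what allows only the predicate to be swapped for an exact version.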

CONTACT

Eric Berberich DEPT. 1 Algorithms and Complexity Phone +49 681 9325-1012 Email eric mpi-inf.mpg.de Internet http://www.cgal.org

Improving Flat Panel Displays by Discrete Optimization

Our objective is to advance the state of the art in modern display technology by means of discrete optimization. Specifically, we want lower power consumption in the next generation of flat-panel displays through a reduction of their addressing time. To this end, we exploit a common feature of today's displays: the pixels are arranged in a matrix fashion with only one contact per row and per column. For the sake of simplicity, let's assume that these contacts are switches and that a pixel (i,j) shines if and only if the switches for row i and column j are closed. This implies that an arbitrary pattern cannot, in general, be displayed all at once. Hence, several subframes are necessary to display an image entirely. Traditionally, an image is displayed row-by-row at a sufficiently high frame rate such that the eye perceives the average. This method is called Single Line Addressing since each subframe consists of a single line. This is a major cause of excessive power consumption because each pixel only shines for a very short time, and thus, it must shine very brightly to achieve a sufficient average brightness. However, it is possible to drive multiple rows at once, though only by solving a discrete optimization problem in real-time on a driving chip. Constrained by the strong competition on the display market, successful algorithms must be efficient with respect to arithmetic operations and memory consumption.

We have developed a fully combinatorial approximation algorithm for the practically relevant case in which pixels in consecutive rows are addressed simultaneously. Because the algorithm uses only addition, subtraction, and comparisons, it is well-suited for implementation in hardware. Nevertheless, our algorithm computes decompositions in real-time whose objective values are within one percent of the optimum solution in the majority of cases in practice. This invention was patented and commercialized by Dialog Semiconductor under the brand name SmartXtend and made transparent and flexible displays possible. In particular, it is used in transparent OLED displays by Futaba (formerly TDK). The Lenovo S800 [figure 1], the Explay Crystal, and the Nexian Glaze M9090 are three mobile phones available on the international market that are driven by our technology.

Figure 1: Lenovo S800, driven by our technology

In a follow-up project funded by the German Research Foundation, we extend this research to further display technologies. Our framework will apply to devices like OLED displays, plasma screens, and e-paper.

Figure 2: Representing the display matrix by a bipartite graph

We develop new sophisticated driving algorithms and thereby contribute to the theory of addressing matrix displays, which still lacks a profound understanding of the underlying computational problems. Due to the relation of binary matrices to bipartite graphs, the addressing problem is captured by the problem of decomposing the edge set of a bipartite graph into a minimum number of complete bipartite subgraphs, also called bicliques. This project will close a research gap in the area of approximation algorithms for biclique decomposition problems, and the foundational research will benefit flat panel displays and in turn other applications, for example, in graph drawing, computer security, and genetics. In fact, we have already taken a leap forward by recently proving nearly tight lower and upper bounds for approximation factors of polynomial-time algorithms for biclique covering and partition.
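The switch model and the idea of subframes described in this article can be made concrete with a short sketch. The Python code below is not the patented driving algorithm; it assumes a purely binary image (no gray levels), and the function names `single_line_addressing`, `merge_equal_consecutive_rows` and `render` as well as the sample image are hypothetical. Each subframe is a pair (closed row switches, closed column switches), i.e., a biclique in the bipartite graph of the display matrix. The sketch compares Single Line Addressing with a simple greedy rule that drives identical consecutive rows together.

```python
def lit_columns(row):
    """Indices of the columns whose pixel is on in this row."""
    return [j for j, value in enumerate(row) if value]


def single_line_addressing(image):
    """One subframe per non-empty row: ([row index], lit columns)."""
    return [([i], lit_columns(row)) for i, row in enumerate(image) if any(row)]


def merge_equal_consecutive_rows(image):
    """Greedy illustration: drive runs of identical consecutive rows at once."""
    subframes = []
    for i, row in enumerate(image):
        cols = lit_columns(row)
        if not cols:
            continue  # an empty row needs no subframe
        last = subframes[-1] if subframes else None
        if last and last[0][-1] == i - 1 and last[1] == cols:
            last[0].append(i)  # extend the current biclique by one more row
        else:
            subframes.append(([i], cols))
    return subframes


def render(subframes, n_rows, n_cols):
    """Pixel (i, j) shines iff some subframe closes both row i and column j."""
    out = [[0] * n_cols for _ in range(n_rows)]
    for rows, cols in subframes:
        for i in rows:
            for j in cols:
                out[i][j] = 1
    return out


# Hypothetical 5x4 binary image.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]

sla = single_line_addressing(image)
mla = merge_equal_consecutive_rows(image)
print(len(sla), "subframes with single line addressing")
print(len(mla), "subframes after merging equal consecutive rows")
```

For the sample image the row-by-row scheme needs five subframes, while the merged scheme needs three, so each pixel can stay lit for a larger share of the frame time. Real images are not binary and rarely contain identical rows, which is why the actual problem requires an optimization algorithm rather than this simple rule.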

CONTACT

Andreas Karrenbauer DEPT. 1 Algorithms and Complexity Phone +49 681 9325 1007 Email karrenba mpi-inf.mpg.de

SOFTWARE

Informatics is, firstly, a discipline of basic research that deals with universal computation and problem-solving methods and investigates fundamental properties such as correctness and complexity. Secondly, it also resembles an engineering science that supports a great variety of different applications. The development of software fulfills many functions in this context. It is a subject of basic research, the outcome of the implementation of new results from basic research, as well as the product of an engineering effort for the solution of a concrete application problem. Software is thus an inherent part and a connecting link in informatics research.

http://www.mpi-inf.mpg.de/software

AIDA

AIDA is a framework and online tool for entity detection and disambiguation. Given a natural-language text or a Web table, it maps mentions of ambiguous names onto canonical entities.

FunSimMat

FunSimMat is a comprehensive functional similarity database for proteins and protein families and a tool for easy and fast disease gene prioritization.

RnBeads

RnBeads is an R package for comprehensive analysis of DNA methylation data obtained with a wide variety of experimental protocols.

Geno2Pheno

Geno2Pheno predicts phenotypic drug resistance of HIV and of Hepatitis B and C virus from the viral genome sequence.

CGAL

CGAL is a software project that provides easy access to efficient and reliable geometric algorithms in the form of a C++ library.

SPASS | SPASS+T

SPASS is an automated reasoning workbench. SPASS+T extends the theorem prover SPASS with theory support.

LEDA

LEDA is a C++ class library for efficient data types and algorithms.

ClausIE

ClausIE is an open information extractor; it identifies and extracts relations and their arguments in natural language text.

NEFI

NEFI is an extensible tool for extracting graphs from images of networks.

Waldmeister

Waldmeister is a theorem prover for unit equational logic.

EpiExplorer

The EpiExplorer integrates multiple epigenetic and genetic annotations and makes them explorable via an interactive interface.

pfstools

Tools for reading, writing, manipulating and viewing High-Dynamic Range (HDR) images and video frames.

YAGO

YAGO is a huge semantic knowledge base knowing over 900,000 entities.

The International Max Planck Research School for Computer Science (IMPRS-CS)

The training of young scientists is fundamental to the future of science, research, and innovation in Germany. The Max Planck Society, in cooperation with German universities, has launched an initiative to promote young scientists: the International Max Planck Research Schools (IMPRS). They offer especially gifted German and foreign students the possibility to earn a doctorate within a structured program that provides excellent conditions for research. The aim is to strengthen the recruitment and training of young scientists.

IMPRS-CS

Promotion of young scientists

The IMPRS-CS is an opportunity for young scientists and scholars who are between the bachelor's or master's degree and the PhD. This includes first-class training programs, academic specialization, often with thematic linking of individual doctorates, and close collaboration between doctoral students and their academic advisors.

One focus is on international cooperation: The IMPRS-CS strives especially to attract foreign applicants to doctoral studies in Germany, familiarize them with the research institutions, and arouse their interest in future work at or in cooperation with German research institutions. Over 50 percent of our doctoral candidates come from abroad, with the largest contingent coming from Bulgaria, , India and Poland.

IMPRS-CS programs

Together with Saarland University and the Saarbrücken Graduate School of Computer Science, the IMPRS-CS offers programs to establish graduate level competency and also to achieve a doctoral degree. All graduate programs are offered in close cooperation with the Max Planck Institute for Informatics, the Max Planck Institute for Software Systems and the Department of Computer Science at Saarland University. The projects are jointly supervised by the scientists of the Max Planck Institutes and their colleagues in the Department of Computer Science at Saarland University. Outstanding knowledge of English is required for all candidates.

Financial support

IMPRS-CS students receive a PhD support contract. In addition, we assist our students in finding accommodation and with administrative matters of all kinds. We offer English and German classes at several levels, joint activities, and excursions.

CONTACT

Jennifer Gerling IMPRS-CS Phone +49 681 9325-1801 Email jgerling mpi-inf.mpg.de

Andrea Ruffing IMPRS-CS Phone +49 681 9325-1801 Email aruffing mpi-inf.mpg.de Internet http://www.imprs-cs.de

ALUMNI

My time at MPI-INF

I always found it fascinating to work at MPI-INF with its extremely stimulating and lively research environment. It was inspiring to discuss and collaborate with colleagues such as Naveen Garg, Torben Hagerup, Hans-Peter Lenhof, Stefano Leonardi, Peter Sanders and Stefan Schirra (this list is by no means complete). With many of them, I am still in close touch. Moreover, it was instructive to attend the famous Noon Seminar where group members and guests reported on recent and ongoing research. The atmosphere at the institute was very open and democratic. Each group member was given the opportunity to pursue his/her own line of research and to establish a network of international contacts. It was also highly interesting to see how Kurt Mehlhorn and Harald Ganzinger built up the institute in the early days. Right from the start MPI-INF was a first-class international research institute.

In addition to the scientific work, there were social activities. Most remarkably, Wolfgang Paul, Professor at Saarland University, and Ingrid Finkler-Paul offered a cooking course for members of the Research Training Group in their house. I still keep the photocopies of the recipes. More generally, the members of the Algorithms and Complexity Department often went out in the evening. A common meeting point was Stiefelbräu near St. Johanner Markt. I also remember great dinner evenings with Naveen Garg, Marina Papatriantafilou, Philippas Tsigas and Christos Zaroliagis. Overall, working at MPI-INF was not only scientifically challenging but also a lot of fun!

SUSANNE ALBERS

CV-Milestones

• PhD Student at MPI-INF from 1990 to 1993
• PhD 1993 at MPI-INF
• Senior Researcher at MPI-INF from 1993 to 1999
• Today: Professor at TU Munich, Germany

Research

My research interests have always been in algorithms, with emphasis on online and approximation algorithms. Further interests include algorithmic game theory and algorithm engineering.

My time at MPI-INF

I joined the “Computational Biology and Applied Algorithmics” group of the MPI for Informatics in Saarbrücken in November 2005, immediately after having finished my Master's degree (back then called a Diplom) in Computer Science at the RWTH Aachen. The curriculum at the RWTH required students to attend courses in an application subject. I selected Biology, and the courses I attended sparked my interest in using computational approaches to solve problems and answer questions in Biology, specifically Molecular Biology.

Upon joining MPI-INF, I was fortunate to work on an exciting topic that went beyond purely biological questions and into supporting medical decisions using computational approaches and data mining. More precisely, we developed computational models to optimize the drug cocktail for patients infected with the human immunodeficiency virus (HIV). The project put me in contact with a truly interdisciplinary team of medical doctors, virologists, biologists and computer scientists, which gave me a good understanding of how Computer Science can be used to advance medicine. The time at MPI-INF was very exciting and still counts among my best research experiences. It was here that I wrote my first paper and learned the skills required for conducting high quality research. Most importantly, during the time at the MPI-INF, I met many great people with whom I am still friends.

ANDRE ALTMANN

CV-Milestones

• PhD Student at MPI-INF from 2005 to 2011
• PhD 2011 at MPI-INF
• Today: Senior Research Fellow at the University College London, UK

Research

My research focuses on combining high throughput sequencing and neuroimaging to better understand the disease processes in brain disorders and to identify useful biomarkers, thus achieving computer aided diagnosis and prognosis of brain disorders.

My time at MPI-INF

The MPI-INF not only boosted my personal career substantially; it also helped me to pioneer the field of breath analysis for non-invasive biomedical diagnostics from exhaled metabolomics samples. In addition, its reputation attracted many internationally renowned bioinformaticians to talks, projects and symposia. Several of my current projects still rely on collaborations that were set up during my time in Saarbrücken, some of them driving whole bioinformatics research areas.

JAN BAUMBACH

CV-Milestones

• PhD 2008 at Bielefeld University, Germany
• Senior Researcher at MPI-INF from 2010 to 2012
• Today: Professor at University of Southern Denmark

Research

My current research concentrates on the combined analysis of biological networks together with OMICS data, the modeling of genetic expression pathways as well as biomarker discovery and computational methods for personalized medicine.

My time at MPI-INF

After I received my Diploma degree from the University of Bonn in 1999 with a thesis topic on algebraic number theory supervised by Florian Pop, my initial assessment of biology and its connections to math and computer science changed completely when I joined Thomas Lengauer's research group at the GMD in St. Augustin. Bioinformatics, now a well-developed discipline rooted in computer science and central to the life sciences, was still a young and less well-defined subject. It became clear to me very quickly that bioinformatics, or computational biology, is a fascinating and highly dynamic field full of unanswered (and unasked) questions as well as numerous applications, especially in biology and medicine.

When Thomas Lengauer moved to Saarbrücken in 2001 to start the “Computational Biology and Applied Algorithmics” Department at the MPI for Informatics, I was lucky enough to join him as his PhD student. On one of my first days at what used to be called Stuhlsatzenhausweg 85, I walked upstairs to a printer on the sixth floor, because ours on the fifth floor had not yet been installed. At the printer, I bumped into Harald Ganzinger, who recognized a new face, introduced himself, and immediately started interrogating me about my research. Although Harald took me by surprise, this early encounter left a long-lasting impression on me as it demonstrated the friendly, down-to-earth, open-minded, and stimulating atmosphere at MPI-INF that I encountered in general and enjoyed throughout my stay. At that time, I largely took the scientific excellence of the Institute for granted and it was only later that I fully appreciated it. I received my PhD in Computer Science from Saarland University in 2004, honored by the Otto Hahn Medal of the Max Planck Society, after Thomas Lengauer had introduced me to the field of bioinformatics and given me the unique opportunity to work on an exciting project concerned with the development of drug resistance in HIV.

NIKO BEERENWINKEL

CV-Milestones

• PhD Student at MPI-INF from 2001 to 2004
• PhD 2004 at MPI-INF
• Today: Professor at ETH Zurich, Switzerland

Research

My research today is at the interface of mathematics, statistics, and computer science with biology and medicine, ranging from mathematical foundations of biostatistical models to clinical applications.

The goal of our research is to support the design of medical interventions in complex and rapidly evolving systems by means of statistical modeling of high-throughput molecular profiling data, by analyzing biological networks and predicting the effect of perturbations, and by developing mathematical models of the evolutionary dynamics of pathogen and tumor cell populations.

My time at MPI-INF

From 2001 to 2004, I was the leader of an independent research group (Nachwuchsgruppe), the Static Analysis group, at the Max-Planck Institute for Informatics. At the same time, I was also a researcher at CNRS, in the abstract interpretation team of Ecole Normale Supérieure (ENS), Paris. I spent about three weeks a month at MPI-INF and one week a month at ENS. That was a very exciting period scientifically, with collaborations both in Saarbrücken and in Paris, as well as with Martín Abadi (University of California, Santa Cruz) and Cédric Fournet (Microsoft Research, Cambridge). I particularly appreciated the excellent working conditions provided by MPI-INF.

BRUNO BLANCHET

CV-Milestones

• PhD 2000 at Ecole Polytechnique, Paris, France
• Independent Research Group Leader at MPI-INF from 2001 to 2004
• Today: Senior Researcher at INRIA Paris, France

Research

My research interests deal with automatic security protocol verification both in the symbolic and in the computational model.

My time at MPI-INF

After completing my MSc in Computer Sciences at the University of Warsaw in Poland, I started my path towards computational biology. From 2007 to 2011, I was a PhD student at the MPI-INF department for “Computational Biology and Applied Algorithmics” with Prof. Lengauer as my PhD advisor.

My stay at MPI-INF was a great time of discovering this new and exciting field of research. The stimulating and inspiring environment of the group and institute taught me the scientific approach: to be creative, think critically, and explore the solutions of a given problem. It provided me with a rich and unique research environment in which I could thrive and develop myself in the chosen direction.

Not only did it result in a good publication record, but it also opened the doors to other opportunities. After my time at MPI-INF, I moved to Shanghai for a Post-Doc at the CAS-MPG Partner Institute for Computational Biology where I could put many of the skills and methods learnt at MPI-INF into practice. Now I am moving on to Okinawa Institute of Science and Technology in Japan to start my work as a group leader.

In all my future work, I will remember dearly the time spent at MPI-INF. This is the place where my scientific curiosity was first awakened and where I learnt the skills and tools I will use the rest of my scientific and professional life.

KASIA BOZEK

CV-Milestones

• PhD Student at MPI-INF from 2007 to 2011
• PhD 2011 at MPI-INF
• Today: Group Leader at Okinawa Institute of Science and Technology, Japan

Research

My field of research is computational biology. The projects I performed can be grouped into 3 major topics: metabolomics, HIV, and circadian clock.

Throughout these projects, I applied methods ranging from comparative metabolomics/genome/gene expression analysis, through modeling in structural biology to machine learning methods for virus classification.

My time at MPI-INF

After my PhD, I applied as a Senior Researcher in the Cluster of Excellence MMCI at Saarland University/Max-Planck Institute for Informatics. I initially approached Hans-Peter Seidel at Eurographics and he invited me to present my work at the MPI-INF. During the conference dinner, the first MPI-INF students already approached me and I received a glimpse of the fantastic atmosphere and team spirit.

This impression was also confirmed during my visit and the warm welcome immediately convinced me that Saarbrücken is the right next step on my career path. I still remember vividly the serious but always positive work atmosphere, combined with a literal striving for “excellence” and the various opportunities for collaboration, which clearly expanded my horizon. The many events in Saarbrücken were impressive and I got the chance to contribute to the organization of EGSR and HPG, which took place on campus in 2010.

Besides work, many other activities are still very present in my mind, ranging from an Oktober Fest, to which we went straight out of a deadline without sleep for 48h, to a visit in climbing parks and a trip to the Cologne Karneval with a small group, which, together with the fact that I impersonated a prince during the cluster launch, led to a change of my door sign into “Prinz Elmar Eisemann”.

I am very grateful for the fantastic time at MPI-INF, which obviously had a strong influence on my future career steps. I enjoy many very good contacts that date back to this time and I am very thankful for their continuing support. I hope that the next 25 years (and many more) will be as successful as the last!

ELMAR EISEMANN

CV-Milestones

• PhD 2008 at University of Grenoble at INRIA Rhône-Alpes, France
• Senior Researcher at MPI-INF from 2008 to 2009
• Today: Professor at TU Delft, Netherlands

Research

My research covers a broad range of topics in the field of computer graphics, including the main areas of game technology, rendering, geometry processing, and visualization.

My time at MPI-INF

I first came to MPI-INF in September '94 when I started as a Post-Doc after having completed my PhD at IIT Delhi. The institute was then housed in a small building near the “Philosophen Café”. I must admit that getting accustomed to the gray weather in Saarbrücken was a bit of a challenge but the cheer, the warmth and the camaraderie of my colleagues at MPI-INF made it possible. Over the 3 years I spent at MPI-INF, I made many friends, friends with whom I continue to be in touch to this day and whose friendship I value a lot.

One incident during my stay there that I still remember and often recount is from a day in November '96. It started snowing around noon and since this was the very first snow that season we were all looking through our large office windows admiring the sight of things turning white. It all looked very pretty but then someone decided that the snow should not stop. So over 8 hours we had enough snow to take all buses off the road and make the roads unsuitable for biking or driving. I used to bike to the institute and my home was almost 7 km away on the other side of the town on top of a hill in a place called Bellevue (no, not in the prison!). That evening I had no option but to walk and it took me over two hours to get back home. It was a beautiful and a tiring experience, one I am unlikely to ever forget.

I left Saarbrücken in '97 to start as an Assistant Professor at IIT Delhi where I continue to this day. However, my association with MPI-INF has only grown stronger over the years. My group on “Approximation Algorithms” at IIT Delhi was a partner group of the MPG from 2005 to 2010, and from 2010 to 2015 I have been co-director of the Indo-German Max-Planck Centre for Computer Science (IMPECS). I believe that except for a couple of occasions, I have visited MPI-INF every summer since the time I left Saarbrücken. I also made two longer visits, one in 2012 as a “Bessel-Preisträger” and another on my sabbatical in 2006-07. While the researchers I have met and worked with have changed over the years, MPI-INF itself has been as welcoming and helpful as ever.

MPI-INF is celebrating its 25th birthday. Many of us who have passed through its portals owe a lot to this great institution for it has not just groomed us scientifically but also given us many of our friends and our experiences. It has been an important part of my life and as everyone around me says “my second home”.

NAVEEN GARG

CV-Milestones

• PhD 1994 at IIT Delhi, New Delhi, India
• Post-Doc at MPI-INF from 1994 to 1997
• Today: Professor at IIT Delhi, India

Research

My research interests are: design and analysis of algorithms, designing approximation algorithms for NP-hard problems, combinatorial optimization and graph theory.

My time at MPI-INF

20 years after I left MPI-INF, really fond memories of my time at MPI-INF remain. My most vivid memories are of vibrant discussions during talks with Kurt Mehlhorn, Naveen Garg, Sanjiv Mahajan, Torben Hagerup, Pierre Kelsen, Sunil Arya, Shiva Chaudhuri and others.

Sanjiv Mahajan's off-beat acts like singing in bars in Hindi, long runs along the Saar with Shiva Chaudhuri and Jesper Traff followed by my first marathon in Berlin, badminton with the always smiling Pierre Kelsen, learning Deutsch with Theresa, reading Hermann Hesse's Siddhartha in Deutsch with Andrea Esser, and sneaking past alert guards at the French border in the pre-Schengen era to shop at a mall right across will forever be etched in my memory.

RAMESH HARIHARAN

CV-Milestones

• PhD 1994 at Courant Institute of Mathematical Sciences, New York, USA
• Post-Doc at MPI-INF from 1994 to 1995
• Today: Chief Technology Officer at Strand Life Sciences, Bangalore, India

Research

Research-wise, I work on design and analysis of algorithms with provable guarantees. In the recent past, I have been working on algorithms for connectivity problems in graphs. My PhD thesis as well as some subsequent work involved design of fast pattern matching algorithms for string, two dimensional and tree patterns.

My time at MPI-INF

The time at MPI-INF was truly special, the research environment was very dynamic and productive, and the cultural and social dimensions were enriched by the many nationalities that make up the MPI-INF community. During the 5 years at MPI-INF, I learned German and made many friends, attended the traditional Christmas and summer trips, and worked very hard. I proudly wore the MPI-INF badge when attending international conferences, and became a part of the MPI-INF identity. MPI-INF has laid the basis for my academic career, as well as my personal life, enriching me with unique work and life experiences, a PhD and my life partner.

GEORGIANA IFRIM

CV-Milestones

• PhD Student at MPI-INF from 2005 to 2008
• PhD 2008 at MPI-INF
• Today: Lecturer at University of Dublin, Ireland

Research

My main research interests stem from the areas of machine learning, data mining and information extraction. My recent research focuses on designing large-scale real-time machine learning methods, for news and social media data streams. The goal is to develop novel methods to enable new and classic digital journalism applications, such as social indexing and story tracking.

My time at MPI-INF

I remember the MPI-INF being a paradise for researchers. The time I spent as a Post-Doc in Kurt Mehlhorn's group was great and productive, and it strongly influenced my scientific career. The institute is a place where excellent (junior) researchers with different research areas meet in a very open and communicative environment. In addition, there are numerous short-stay guest researchers from all over the world. I met many interesting people here at the institute and built up a large research network. Kurt, those were great times! I also really enjoyed the wonderful support the administration gave us in organizing events (such as the yearly ADFOCS summer school and the MPG Career Development Seminars). Working with such a dedicated and experienced team makes everything so much easier.

All in all, I always felt that I was receiving a lot of support from the MPI-INF, and ultimately from the entire Max Planck Society.

NICOLE MEGOW

CV-Milestones

• PhD 2006 at TU Berlin, Germany
• Post-Doc and Senior Researcher at MPI-INF from 2008 to 2012
• Today: Professor at TU Munich, Germany

Research

My research deals with discrete optimization at the interface of discrete mathematics, theoretical computer science and operations research. My work focuses on developing and mathematically analyzing efficient algorithms, particularly approximation algorithms, for the kinds of combinatorial optimization problems that can occur in production, logistics or computer system control.

My time at MPI-INF

I started studying computer science at Philipps-Universität in Marburg, where I was particularly active in the “Database Systems” group led by Prof. Bernhard Seeger. Towards the end of my diploma thesis, in the autumn of 2003, I had a conversation with Bernhard and he told me that a man by the name of Gerhard Weikum would soon be taking office as a director at the Max Planck Institute in Saarbrücken, and that he would presumably have open PhD positions. Now, this might sound like a cliché, but after that conversation I did actually go back to my desk and did some online research to find out where Saarbrücken even was. In the meantime, Bernhard must have gotten in touch with Gerhard, because a few days later Gerhard sent me an invitation for an interview in Saarbrücken. At least I now knew where to find the city on a map. So in September 2003 I travelled to Saarbrücken for the interview with Gerhard, who was still working at Saarland University at the time. And, surprise, I got the PhD position, or else you wouldn't be reading about it right now. In March 2004, I finally joined the MPI-INF.

I had a great time at the MPI-INF: nice colleagues and an extraordinary thesis advisor. Together we very successfully wrote publications, supervised students and organized seminars. In addition to working with Gerhard, my collaboration with Prof. Peter Triantafillou also had a great influence on me. Peter came here from the University of Patras in Greece and joined the MPI-INF for a research stay that lasted six months, if I remember correctly; our collaboration went on to last much longer than those six months. For me, the MPI-INF stands for outstanding research opportunities, high standards and nice people, but just as importantly it also stands for “open doors”, in both the literal and figurative sense. After I received my PhD in July 2007, I became a postdoc at EPF Lausanne where I joined the working group led by Prof. Karl Aberer and had the opportunity to gain experience in the supervision of PhD candidates, teaching and project management.

In late 2008, I came across a job posting announcing open positions for junior research group leaders at Saarland University's Cluster of Excellence MMCI. I sent in my application and was selected for one of the positions. I started the new job in August 2009 and was also directly appointed Senior Researcher at the MPI-INF. While I really enjoyed the freedom I had at the MMCI, I also benefited from my involvement with the MPI-INF, especially from my renewed collaboration with Gerhard Weikum. Several aspects were new: being responsible for staff and a budget, the right to award doctorates as well as the opportunity to give my own lectures, while at the same time having a very low mandatory teaching load.

In March 2014, I was awarded a W3 professorship in databases and information systems at the University of Kaiserslautern, as the successor of Prof. Theo Härder. So now I was off to the Rhineland-Palatinate region. Upon hearing of my appointment as professor in Kaiserslautern, a colleague and friend of mine said: “You should be used to the similar dialect by now!”

SEBASTIAN MICHEL

CV-Milestones

• PhD Student at MPI-INF from 2004 to 2007
• PhD 2007 at MPI-INF
• Senior Researcher at MPI-INF from 2009 to 2014
• Today: Professor at TU Kaiserslautern, Germany

Research

My research interests are databases and (distributed) information systems, particularly Top-k query processing over data streams and multimedia information retrieval.

My time at MPI-INF

At the Max Planck Institute for Informatics, work was conducted in a very productive manner. The large, mixed group and the open atmosphere fostered a high level of collaboration. A project carried out jointly with Gerhard Weikum resulted in the RDF-3X system, which was designed for efficient processing and searching in graph-structured RDF databases. This project was not only a success in itself and attracted international attention – it was also extremely useful for cooperating with other group members working on knowledge extraction projects such as YAGO, for example. The fact that the group covered such a broad spectrum of topics and that it was so easy to work with others were main hallmarks of the MPI-INF, and made for a very pleasant and productive stay.

THOMAS NEUMANN

CV-Milestones

• PhD 2005 at University of Mannheim
• Post-Doc and Senior Researcher at MPI-INF from 2005 to 2010
• Today: Professor at TU Munich

Research

My research interests are: query optimization, graph-structured databases and distributed query processing.

My time at MPI-INF

Moving to MPI-INF was perhaps the biggest gamble in my career. I left behind a successful research group in Helsinki that I had been co-leading for three years. I did not know much about MPI-INF beyond its excellent reputation in natural sciences, and Saarbrücken appeared to be a nice city in the middle of Europe. I would have been able to continue in Helsinki, but I felt I needed to step up and renew my research. Trained in cognitive science, neuroscience, and computer science, I felt that the work that I was leading in Helsinki was not intellectually advancing my field. I had been declined a permanent position in Helsinki – twice – and once in Zürich and was seriously weighing my options. I hence convinced my wife – an anesthesiologist – and my daughter – then 4 years old – to move with me – even if neither could speak a word of German! We sold our beautiful home, packed all our stuff into a truck, and we went off.

The very first days I spent there – working at an empty desk in the MPI-INF building that was and has always been overcrowded – I realized the magnificent and outstanding character of the institute. All group leaders and professors there – many of whom would later become my long-term collaborators – were highly ambitious and brilliant researchers. The atmosphere at MPI-INF was never predatory but always collegial and supportive. It took me about one year to “calibrate myself up” to match that level. I am particularly grateful to Professor Hans-Peter Seidel, my mentor in D4. Even if he knew little about my field, his trust in me and his advice on research leadership were instrumental to my development as a group leader. At the beginning, we would meet on a weekly basis and later a bit less frequently. I remember knocking on his door on the second floor and asking “Do you have a minute?” He responded “No I don’t, but come in anyway.” Under his guidance, I started recruiting PhD students and a Post-Doc and carefully built a group that I still have very fond memories of. We laid the groundwork for something for which I later earned an ERC Starting Grant and am still continuing today.

During my stay at MPI-INF, I was heavily influenced by other group leaders, including in particular Dr. Tino Weinkauf, Prof. Christian Theobalt, Dr. Jürgen Steimle, Dr. Timo Kötzing, and Dr. Andreas Karrenbauer. Our family fared well, too. Although moving to a new country was stressful at the beginning, we found lots of nice escapes in the vicinity and had more time together than in Finland. My daughter eventually finished the first year class in Ost Schule and is now starting in Deutsche Schule in Helsinki – the only German-speaking school in Finland. MPI-INF’s support in finding a daycare right on the campus was instrumental to my ability to organize our life in Saarbrücken. And my wife eventually reached a level in German that allows her to work in a hospital. Now, looking back on the three years in Saarbrücken, the gamble paid off.

ANTTI OULASVIRTA

CV-Milestones

• PhD 2006 at University of Helsinki, Finland
• Senior Researcher at MPI-INF from 2011 to 2014
• Today: Professor at Aalto University, Finland

Research

My group’s mission is to identify and exploit optima of human-computer performance. We formulate interface design problems as optimization tasks, develop predictive modeling of interaction, and implement computational methods for interface design.

My time at MPI-INF

Being born in the Saarland, it was quite natural to study computer science in Saarbrücken, back then (and still now) one of the best addresses for this subject in Germany. I started my scientific career as a Master student and relatively quickly knew that Kurt Mehlhorn’s algorithms group fit my interests. Hence, I specialized in algorithms. When I started my MSc thesis with Michiel Smid it was the time when the MPI-INF was founded and Kurt Mehlhorn was named director. The Mehlhorn group moved to a temporary building where I did my MSc research with fellow MSc students, among them Hannah Bast and Michael Seel (incidentally, it was in the MPI-INF library that I met my now wife).

During my PhD period I was looking into different topics, among them problems in a then relatively new field, computational biology, at the time led by Hans-Peter Lenhof. The problems of working with biological data intrigued me and I chose a topic in this field. During the research I worked with international partners, namely Gene Myers and his PhD student John Kececioglu, which would heavily influence my next career steps.

KNUT REINERT

CV-Milestones

• PhD Student at MPI-INF from 1994 to 1999
• PhD 1999 at MPI-INF
• Today: Professor at FU Berlin, Germany

Research

In my research, I aim at enabling translational research in computationally based life sciences by removing existing (communication) gaps between theoretical algorithmicists, statisticians, programmers, and users in the biomedical field. Of particular interest are devising mathematical models and algorithms for analyzing large genomic sequences, developing algorithms for analyzing data derived from mass spectrometry experiments (e.g. for detecting the differential expression of proteins between normal and diseased samples), and the implementation of methods that benefit researchers at the bench or in the clinic.

My time at MPI-INF

Even though the basic groundwork was already in place at that time, getting my PhD at the MPI-INF essentially meant laying the foundation for my future in at least three different ways. First of all, it was not just me who made a fresh start in Saarbrücken after completing my Master’s degree in Darmstadt – it was also a new beginning for the Computer Vision and Multimodal Computing department, or “D2”, as we call it at the institute. Supervised by Director Bernt Schiele, I was, as one of the first doctoral candidates and group admins, involved in setting up the department. Secondly, my PhD became a cornerstone of my professional scientific career, just like my postdoc at UC Berkeley. And thirdly, it was at the MPI-INF that I met the woman who is now my wife, Anna Rohrbach, and she is an important part of my life’s foundation.

I always enjoy coming back to visit Bernt and his group, to collaborate with them and take part in the retreats.

MARCUS ROHRBACH

CV-Milestones

• PhD Student at MPI-INF from 2010 to 2014
• PhD 2014 at MPI-INF
• Today: Post-Doc at UC Berkeley, USA

Research

My research interests are computer vision, computational linguistics and machine learning.

My time at MPI-INF

Regarding my time at MPI-INF, I would like to mention that I look back with deep gratitude for being a part of this endeavor. I heartily thank all members of department 4 who accompanied me in Saarbrücken. It is a unique and open-minded research facility, the like of which I have never found anywhere else, and it is a perfect springboard for young researchers. On the first day when I arrived I had a meeting with my supervisor Hans-Peter Seidel. He told me to immediately start applying somewhere else and I thought “wow, I haven’t started here and he wants to get rid of me”. In his second sentence, he told me that as long as my research makes progress I should not be concerned about my position. Then I realized: This is the place where I want to be.

BODO ROSENHAHN

CV-Milestones

• PhD 2003 at Christian-Albrechts University of Kiel, Germany
• Senior Researcher at MPI-INF from 2005 to 2008
• Today: Professor at Leibniz-University Hannover, Germany

Research

My research focuses on automated image interpretation. It ranges from classical computer vision to visual signal processing and bio-medical image analysis.

My time at MPI-INF

My stay at MPI-INF was pivotal for my career. From day one I was integrated into a highly international team and was able to cooperate on a variety of topics with students, Post-Docs and Senior Researchers. With Dagstuhl nearby, one almost got the impression that travelling was not necessary since everybody important in the field would eventually come by. In particular, Saarbrücken is one of the best places to be if you want to work on graph algorithms, algorithms for memory hierarchies, or algorithm engineering. These areas became permanently important for me.

Another interesting feature at MPI-INF is the close cooperation with the university. Without a mandatory teaching load, one still has the opportunity to gather experience in teaching a variety of courses.

PETER SANDERS

CV-Milestones

• PhD 1996 at University of Karlsruhe
• Post-Doc and Senior Researcher at MPI-INF from 1997 to 2004
• Today: Professor at University of Karlsruhe

Research

I work on the design, implementation and analysis of efficient algorithms; “analysis” can be both theoretical and experimental. Topics I often touch are parallel processing and communication in networks, solving problems with “irregular” structure, randomized algorithms, memory hierarchies (disks, caches) and realistic models for problems and machines.

My time at MPI-INF

Joining the MPI for Informatics in 2002 was a big opportunity for me. In fact, at this time I was rather uncertain about my own scientific capabilities and career chances. Working in the Computer Graphics group at MPI-INF changed this immediately. This highly motivating and inspiring environment helped me a lot to get clearer about what I want and what I am able to do.

I remember long nights of hard work before the deadlines where everybody helped each other to finalize the research results. I also remember nice social aspects, both at the MPI-INF and at several conferences with my MPI-INF colleagues. I am still thankful to the MPI-INF (in particular to Hans-Peter Seidel) for giving me this chance; at this time, it was not clear at all which way I would go. I am proud and happy to be an alumnus of the Max Planck Institute for Informatics.

HOLGER THEISEL

CV-Milestones

• PhD 1996 at University of Rostock
• Post-Doc at MPI-INF from 2002 to 2005
• Research Group Leader at MPC-VCC from 2005 to 2006
• Today: Professor at University of Magdeburg, Germany

Research

My research focuses on different aspects and subfields of visualization, namely flow visualization, information visualization, visual analytics, volume visualization and tensor visualization. In addition, I work in computer graphics, in particular in the fields of geometry processing and computer-aided geometric design.

My time at MPI-INF

I enjoyed the time at MPI-INF in particular due to the first-class research environment. Even though I was one of the first D2 members at the time, the great support from the institute’s IT department allowed me to be productive with my research from the very beginning. I also remember numerous fruitful discussions, in particular during the retreats, Friday meetings and before deadlines, that made my stay at MPI-INF an inspiring and fun time.

CHRISTIAN WOJEK

CV-Milestones

• PhD 2010 at TU Darmstadt, Germany
• Post-Doc at MPI-INF from 2010 to 2011
• Today: Computer Vision Senior Scientist at Carl Zeiss Corporate Research, Jena, Germany

Research

My current research interest is in the robust and reliable real-world application of computer vision methods to bio-medical (microscopy, ophthalmology) and industry domains. In particular, my focus is on multi-modal registration, multi-modal segmentation, abnormality and event detection.

NEWS

This section contains an overview of the important events of the past year: activities to promote young talent, appointments and awards, prestigious scholarships earned by institute members, cooperations, and important events surrounding the Max Planck Institute for Informatics. Significant events include scientific conferences, the visit of the Scientific Advisory Board, the Board of Curators meeting as well as the launch of the new website and the opening of the new building of the Max Planck Institute for Software Systems in Kaiserslautern.

CONTRIBUTIONS

Awards, Honors, Commendations

Personal Data

Grants

Non-academic Activities / Patents

Cooperations

CeBIT 2013 / 2014

IOI Training 2014 / 2015

ADFOCS 2013 and 2014

Computer Science Research Days

Opening of the new building of the MPI-SWS in Kaiserslautern

Meeting of the Board of Curators

Scientific Review 2015

Visit of the Ambassador of India and the Indian General Consul

Industrial Fair

Science exhibition in Saarbrücken

Launch of the institute’s new website

Summer Party IMPRS-CS

The Saarland Business Run

Awards, Honors, Commendations

Erasmus Medal for Kurt Mehlhorn

The Academia Europaea decorated Kurt Mehlhorn with its highest accolade, the Erasmus Medal for 2014. This annual award attests to its winners’ sustained, internationally recognized scholarship of the highest level and recognition by colleagues. In the encomium the key role of algorithms in every computer system was emphasized and Kurt Mehlhorn was praised as the “Mr. Algorithm” of Europe. Among the 24 laureates he is the ninth from fundamental research, the fourth German, and the first computer scientist.

Doctor honoris causa of the University of Gothenburg

The doctor honoris causa is an honorary distinction of universities or faculties that is bestowed for exceptional academic or scientific merit. Unlike regular doctorates, the doctor honoris causa is not an academic degree awarded for a particular piece of work but an expression of appreciation.

Owing to the long-term collaboration with the University of Gothenburg, Kurt Mehlhorn was decorated with a doctor honoris causa. In the encomium his fundamental contributions to many different topics of computer science were recognized as well as his dedication as a professor.

As a sign of the doctor honoris causa, Kurt Mehlhorn is wreathed with laurel by the dean.

Thomas Lengauer receives Hector Science Award

In 2015, Professor Thomas Lengauer, director at the Max Planck Institute for Informatics, was honored with the Hector Science Award. The Hector Foundation II acknowledges his achievements in the field of bioinformatics as well as his commitment to teaching. The prize is awarded annually to excellent scientists at German universities and carries a value of 150,000 EUR. The distinguished winners are also awarded the title “Hector Fellow” and become members of the Hector Fellow Academy, which promotes interdisciplinary projects and academic networks.

Heinz Maier-Leibnitz Award

The German Science Foundation presents the Heinz Maier-Leibnitz Award to young scientists for outstanding results. It is intended to support them in pursuing their scientific career. The researchers must already have their own scientific profile after their Ph.D. Nicole Megow, department of Kurt Mehlhorn, was one of ten award winners in 2013.

Lise Meitner Award

The Lise Meitner Award is given by MPI-INF to young female scientists in the field of computer science. For the year 2014 it was awarded to Zeynep Akata, INRIA Rhône-Alpes in Grenoble. The award consists of a research stay at MPI-INF and a budget for equipment. Zeynep is going to spend two years with Bernt Schiele, Department 2, focussing on the topic “Zero-shot and Few-shots Learning Methods for Large-scale and Fine-grained Image Classification”. [http://www.mpi-inf.mpg.de/news/employment/lise-meitner-award-fellowship/]

Otto Hahn Medal

The Max Planck Society bestows this distinction on young scientists for outstanding results within their Ph.D.

Carola Doerr, postdoctoral researcher in Kurt Mehlhorn’s department until 2013, was awarded this medal in 2013. It honoured her doctoral thesis, “Toward a Complexity Theory for Randomized Search Heuristics: Black-Box Models”, with which she received her Ph.D. in 2011 at Saarland University.

Ndapandula Nakashole was a Ph.D. student in the department of Professor Gerhard Weikum where she received her Ph.D. in 2012. For her thesis “Automatic Extraction of Facts, Relations, and Entities for Web-scale Knowledge Base Population”, which the reviewers graded summa cum laude, she was awarded the Otto Hahn Medal in 2014.

Elisabeth Schiemann Kolleg

Within the Elisabeth Schiemann Kolleg, scientists of the Max Planck Society support outstanding young female researchers on their way to tenured faculty or director positions in research institutions.

The Schiemann Kolleg promotes its female fellows’ activities to establish themselves in the scientific world. In parallel, the Kolleg provides a platform for interdisciplinary scientific exchange. The support is of an aspirational, not financial, nature. This support structure was established in 2013; Nicole Megow, from the department of Kurt Mehlhorn, was co-opted to it together with four other young female scientists.

Chinese Government Award

With the “Chinese Government Award for outstanding students abroad”, the Chinese Government honours Chinese people who study with great success in foreign countries. The government honours their dedication and tries to strengthen their bonds with their home country. Chenglei Wu, Ph.D. student with Professor Hans-Peter Seidel, received this award in 2014.

CREST International Research Grant

The Japan Science and Technology Agency, JST, awards teams of scientists within the Core Research for Evolutionary Science and Technology (CREST) program. The awards are in support of revolutionary ideas in science and technology that give rise to hope of new research fields. In 2014 Andreas Bulling, Dept. 2, was part of such a team.

Eurographics Young Researcher Award

The Young Researcher Award is given each year by Eurographics (European Association for Computer Graphics) to two young researchers in the field who have already made a significant contribution and are likely to make more. The intent of this award is to recognize people early on in their career. Eurographics nominated Tobias Ritschel, Dept. 4, as one of the award winners for 2014.

Professor Kurt Mehlhorn, the institute’s Founding Director and head of Department 1 “Algorithms and Complexity”, was admitted as a Foreign Associate by the U.S. National Academy of Engineering; he was incorporated into the class of computer science. Admission is decided by the members when a person has achieved outstanding results. This membership is one of the highest honours in the field of engineering science; it is often bestowed in recognition of a life’s work.

Khwarizmi International Award

Since 1987 the Iranian science organisation IROST (Iranian Research Organization for Science and Technology) has bestowed the Khwarizmi International Award on persons who did outstanding work, preferentially in the field of digital and mechanical development. Since 1991 foreign scientists can also receive the award. The award commemorates the renowned Iranian polymath Abu Jafar Mohammad Ibn Mousa Khwarizmi, whose name is the etymological source of the term algorithm. His work made Indian mathematics as well as the use of the number zero known to the western world. Kurt Mehlhorn received the highest category of this award for 2013.

Two Awards for “The Captury”

The start-up company “The Captury”, which was founded by three scientists from the group of Professor Christian Theobalt, received two awards. In the “Founders Competition – ICT Innovative” they were awarded in the highest category. In this Germany-wide competition the German Government honours the foundation of innovative companies in the field of Information and Communication Technologies (ICT).

“1,2,3,GO” is the competition for the best business plan in the Large Region Saar-Lor-Lux-Trier-Wallonie; it is awarded by the Association of the Chambers of Industry and Commerce.

Dr. Eduard Martin Prize

Mahmoud Fouz, Ph.D. student of Benjamin Doerr, department 1, was awarded the Dr. Eduard Martin Prize 2013 for his excellent thesis “Randomized Rumor Spreading in Social Networks & Complete Graphs”. This award was established by Saarland University in memory of its long-time president, whose name it bears. It is bestowed annually by the Association of Friends of Saarland University on the best young scientists of the University.

LICS Test-of-Time Award

The symposium “Logic in Computer Science” of the Association for Computing Machinery annually honours papers that were published 20 years earlier and have remained influential in the field of automated logic. This award is distinguished from others in that it recognizes the proven influence of a paper in retrospect. Uwe Waldmann was chosen for one of these awards in 2013.

ERC Grant

The European Research Council awards so-called ERC Grants, the highest level of European research support, on a regular basis. They provide substantial funding as well as significant prestige for the grant holders, who are chosen in a rigorous selection process.

This funding instrument is meant to support topics of frontier research, that is, high-risk science at the edge of current understanding.

Scientists from the MPI-INF won two of these highly competitive awards, one Starting Grant, a prize for scientists who, at the beginning of their careers, want to start a new field, and one Synergy Grant, the greatest financial award for research teams.

ERC Starting Grant

Professor Christian Theobalt, DEPT. 4, was awarded one of the prestigious prizes of the ERC. In his research project “CapReal – Performance Capture of the Real World in Motion”, he wants to develop new efficient ways to automatically detect and describe in 3D the motion of humans in video streams that were captured in non-artificial environments. The ERC approved an amount of 1.48 million euros for a period of five years.

ERC Synergy Grant

Professor Gerhard Weikum won the most prestigious of the European Research Council awards together with his colleagues, Professor Michael Backes (Saarland University and Max Planck Fellow), and Professor Peter Druschel and Professor Rupak Majumdar (both from the Max Planck Institute for Software Systems). This team was one of only 13 winners out of approx. 450 proposals in 2013.

They will receive around 10 million euros for their project imPACT, which explores how to keep Internet users safe from spying and fraud and unmask perpetrators while maintaining commerce, freedom of speech and open access to information. imPACT stands for “Privacy, Accountability, Compliance, and Trust in Tomorrow’s Internet”. [www.impact-erc.eu/]

Personal Data

New Senior Researchers 2013 / 14

Senior Researchers are appointed by unanimous decision of the institute’s directors when their performance is outstanding. Part of this is an evaluation by independent external reviewers. This appointment is one of the milestones in the academic career; it includes the right to supervise Ph.D. students up to the doctorate. During the report period the following young scientists were appointed:

Klaus Berberich | DEPT. 5 | Information Retrieval and Data Mining
Andreas Bulling | DEPT. 2 | Machine Learning and Pattern Recognition
Piotr Didyk | DEPT. 4 | Displays, Computational Fabrication, and Perception
Mario Fritz | DEPT. 2 | Scalable Learning and Perception
Klaus Hildebrandt | DEPT. 4 | Applied Geometry
Martin Hoefer | DEPT. 1 | Design of Algorithms and Combinatorial Optimization
Andreas Karrenbauer | DEPT. 1 | Discrete Optimization
Michael Kerber | DEPT. 1 | Topological and Geometric Computing
Christoph Lenzen | DEPT. 1 | Theory of Distributed Computing
Tobias Marschall | DEPT. 3 | Algorithms for Computational Genomics
Pauli Miettinen | DEPT. 5 | Data Mining
Nico Pfeifer | DEPT. 3 | Statistical Learning in Computational Biology
Tobias Ritschel | DEPT. 4 | Rendering and GPUs
Marcel Schulz | DEPT. 3 | High-throughput Genomics and Systems Biology
Jürgen Steimle | DEPT. 4 | Embodied Interaction
Thomas Sturm | RG. 1 | Arithmetic Reasoning
Sun He | DEPT. 1 | Randomized Algorithms and Computational Geometry
Jilles Vreeken | DEPT. 5 | Exploratory Data Analysis
Andreas Wiese | DEPT. 1 | Combinatorial Optimization and Approximation Algorithms

Offers

The following young scientists of our institute received faculty offers during the report period:

Khaled Albassari | DEPT. 1 | University of Abu Dhabi
Gilles Bailly | DEPT. 4 | CNRS France
Syan Bhattacharya | DEPT. 1 | Institute of Mathematical Sciences Chennai
Martin Cadik | DEPT. 4 | Brno University of Technology
Carola Doerr | DEPT. 1 | CNRS France
Philip Geevarghese | DEPT. 1 | Institute of Mathematical Sciences Chennai
Rainer Gemulla | DEPT. 5 | University of Mannheim
Klaus Hildebrandt | DEPT. 4 | TU Delft
Matthias Hullin | DEPT. 4 | University of Bonn
Arjun Jain | DEPT. 4 | IIT Bombay
Kwang In Kim | DEPT. 4 | University of Lancaster
Nicole Megow | DEPT. 1 | TU Munich
Sebastian Michel | DEPT. 5 | TU Kaiserslautern
Antti Oulasvirta | DEPT. 4 | University of Aalto
Thomas Sauerwald | DEPT. 1 | University of Cambridge
Jens M. Schmidt | DEPT. 1 | TU Ilmenau
Marc Spaniol | DEPT. 5 | University of Caen Basse-Normandie
Rob van Stee | DEPT. 1 | University of Lancaster
Fabian Suchanek | Otto Hahn Group | Télécom ParisTech
Jacobo Urbani | DEPT. 5 | University of Amsterdam
Magnus Wahlström | DEPT. 1 | Royal Holloway University of London
Michael Wand | DEPT. 4 | TU Utrecht and subsequently University of Mainz
Tino Weinkauf | DEPT. 4 | KTH Stockholm
Shanshan Zhang | DEPT. 2 | Nanjing University of Science and Technology

Grants

Scholarships provide a living wage and are given to outstanding scientists so that they can focus on science. Some grants involve significant prestige and go along with a substantial step in one’s career.

Some members and guests of our institute were supported by distinguished grants during the report period:

Google Anita Borg Scholarship

Silke Jansen, DEPT. 4, was awarded an Anita Borg Scholarship. Since 2004, Google Inc. has provided this funding for young female computer scientists in memory of Dr. Anita Borg. It should encourage the grantees to strive for outstanding results and leadership. In addition to the financial support, the scholarship includes a dedicated retreat.

Samsung Scholarship

The Samsung Scholarship gave Seong Joon Oh the opportunity to study in the department of Professor Schiele. Since 2002 this scholarship has sent the most talented Korean students, who are expected to have worldwide influence in the future, to the best international research facilities.

Humboldt Research Award

Professor Wolfgang Heidrich, King Abdullah University of Science and Technology in Thuwal, Saudi Arabia, was supported by the Humboldt Foundation with a Humboldt Research Award. Owing to his recent results in computer-based photography and display, the foundation awarded him 60,000 euros and invited him to a research visit in Germany. As is the rule with Humboldt scholarships, the scholar can freely choose his host. Since September 2014 the Max Planck Institute for Informatics has been hosting him.

Humboldt Research Fellowship

The Humboldt Foundation awards scholarships to outstanding postdocs who are at the beginning of their academic career, up to four years after their Ph.D. Selected researchers from abroad are supported for 6 to 24 months so that they can continue their research in Germany. They are unrestricted in topic and are also permitted to freely choose the German institution where they want to stay during that period of time. Anna Adamaszek, Ran Duan and Artur Jez, three postdocs in the department of Kurt Mehlhorn, as well as Yusuke Sugano with Bernt Schiele successfully applied for this scholarship.

Non-academic Activities

Patents

The basic research at the Max Planck Institute for Informatics at times produces results that can be directly exploited for commercial purposes. If the results prove a high level of inventiveness, it is possible to apply for a patent on the new invention. The table below provides an overview of the activities concerning patents, licenses, and inventions for the Max Planck Institute for Informatics in the year 2014.

Topic | Activity | Persons involved

A Clipping-free Algorithm for Efficient HW-Implementation of Local Dimming LED-Backlight | Assignment / exploitation agreement | Karrenbauer

Videoscapes: Exploring Sparse, Unstructured Video Collections | Invention reported | Kim / Theobalt / Tompkin / Kautz

Method and System for Tracking an Object in a Sequence of Digital Video Images | Invention reported | Stoll / Theobalt / Seidel

A Perceptual Model for Disparity | Patent management and exploitation agreement | Didyk / Ritschel / Eisemann / Myszkowski / Seidel

Method and Apparatus for Encoding High Dynamic Range Video | Patent extension | Mantiuk / Krawczyk / Myszkowski / Seidel

Apparent Display Resolution Enhancement for Moving Images | Patent application and license agreement | Didyk / Eisemann / Ritschel / Myszkowski / Seidel

Backward Compatible High Dynamic Range MPEG Video Encoding | License agreement | Efremov / Mantiuk / Krawczyk / Myszkowski / Seidel

A System for Automatic Material Suggestions for 3D Objects | Invention reported | Jain / Thormählen / Ritschel / Seidel

Cooperations

Intel Visual Computing Institute

Five years ago, the IVCI was founded as the largest Intel research institute in Europe. As there is a higher density of computer scientists in Saarland than anywhere else in the world, Intel decided that Saarbrücken was the right place for its new institute. The IVCI is headed by a Governance Board on which Intel and its four Saarland-based partners (Saarland University, the DFKI, and both Max Planck Institutes, MPI-INF and MPI-SWS) are equally represented. For an initial period of five years, twelve million euros have been provided by the high-tech enterprise. The projects are selected by a specially established steering committee, with Prof. Christian Theobalt as the representative from the Max Planck Institute for Informatics. Two of his projects, “Markerless Motion Capture” and “User-centric Video Processing”, are carried out within the Intel Institute. [ www.intel-vci.uni-saarland.de ] :::

Indo-German Max Planck Center for Computer Science

The “Indo-German Max Planck Center for Computer Science” (IMPECS) was inaugurated on February 3rd, 2010 at the Indian Institute of Technology in Delhi (IIT Delhi) by then German Federal President Horst Köhler. By encouraging close cooperation between Indian and German scientists, the center seeks to promote excellent basic research in the field of computer science. Joint research, exchanges between PhD candidates and postdocs, a Max Planck visiting professorship, and a large number of workshops and schools are intended to foster the collaboration. For an initial period of five years, the IMPECS will be funded by the Max Planck Society (MPG), the German Federal Ministry of Education and Research (BMBF), and the Indian Department of Science and Technology (DST). The board of directors consists of Kurt Mehlhorn (MPI-INF) and Rupak Majumdar (MPI-SWS) from the Max Planck Society and Naveen Garg (IIT Delhi) and Manindra Agrawal (IIT Kanpur) from India.

Approximately ten Indo-German research groups are funded by IMPECS. From the Indian side, any university or institute can apply to become a partner, whereas German applicants are required to be partners of the MPI for Informatics or the MPI for Software Systems. Currently, eight binational research groups are receiving support and further groups are welcome to apply.

The IMPECS allows for an active exchange between postdocs and students in both directions. For example, in summer 2011 Professor Manindra Agrawal from IIT Kanpur visited the Max Planck Institute for Informatics for several months. In turn, Professor Helmut Seidl of the Technical University of Munich spent two months at IISc Bangalore in fall 2011.

Furthermore, the IMPECS co-financed the “Complexity Theory” workshop in Kanpur and the “Recent Trends in Algorithms and Complexity” workshop in Mysore. For further information, refer to the website [ www.impecs.org ]. ::: Bertram Somieski

CeBIT 2013 / 2014 Information Technology: Saarland Presents its Abilities

Every year in March, the doors of Hannover’s exhibition grounds open for CeBIT. Even if the visitor records of the decade following the millennium are gone, hundreds of thousands still come to inform themselves about news in digital technology. Since 1970, CeBIT has been an international competitive exhibition with a high density of innovations, high media coverage, and broad public interest.

The exhibition stands of research communities, federal states, and the Bundeswehr are united in hall no. 9, which changes its motto every year. It is tradition that the MPI-INF is part of the joint stand of Saarland’s computer science. There, Saarland University and its associated research institutes jointly exhibit novelties to demonstrate the abilities of Saarland’s IT to the public.

Annegret Kramp-Karrenbauer, Prime Minister of the Saarland, at the exhibit of Dr. Nils Hasler.

The research group of Professor Christian Theobalt represented MPI-INF at CeBIT 2013. Its exhibit “Capturing Reality in Realtime” is a system that can detect and follow persons in a video stream in real time. This system does not require marker points or a particular studio environment; it works stably even if two persons partially occlude each other. Federal Minister of Economics and Technology Philipp Rösler awarded “The Captury”, a start-up company that was founded to promote and commercially exploit this system, a grand prize in the founders’ competition “IKT Innovativ”.

A PIN appears in the Google Glass headset display… … and even the police are stunned.

In 2014 another exhibit, jointly developed by MPI-INF and CISPA, attracted much attention: by means of the smart glasses “Google Glass”, the wearer is shown a PIN that cannot be seen by anyone else. This prevents the theft of PINs. The other parts of this exhibition that were provided by MPI-INF also drew attention. A computer program developed in the group of Tobias Ritschel arranges pictures according to artificial aspects into a composition. The department of Gerhard Weikum presented a new search engine that scans the news and links its contents by semantic algorithms.

When top politicians of the Saarland visit CeBIT, a stop at the stand in hall no. 9 is mandatory; Saarland’s computer science is considered to have Germany-wide importance. ::: Bertram Somieski

“The Captury” is awarded one of the grand prizes in the startup competition IKT Innovativ.

IOI Training 2014 / 2015 Strict and Demanding Program Prepares Students for Olympiad in Informatics

The Max Planck Institute for Informatics, together with the German Federal Ministry of Education and Research, the German Informatics Society (GI), and the Fraunhofer IuK Group, constitutes the responsible body of the German National Computer Science Competition [BWINF, http://www.bwinf.de/]. We are especially committed to educating successful students of the national competition so as to qualify them for international competitions. It is for this reason that we, together with Dr. Pohl, manager of the BWINF, also organized the final selection and training for the 2015 International Olympiad in Informatics (IOI).

Simon Bürger, Tobias Lenz, and Gregor Matl, all former successful Olympiad participants, trained the five selected students from May 26 to 29, 2015. As is the case every year, the program was strict and demanding, lasting from nine o’clock in the morning until late in the evening. We are very grateful to the organizers from Gästehaus Braunshausen, which is funded by both the Saarland Gymnastics and Soccer Associations, for providing us with accommodation and infrastructure at short notice.

A typical challenge of the Olympiad in Informatics consists of, first, developing an algorithm for a problem and, then, faultlessly implementing it. The participants are even required to have background knowledge that extends to topics only taught in graduate-level courses.

A challenge of this year’s training was the following: Player A thinks of a number between 1 and n, and player B has to guess the number. He is only allowed to ask questions of the type “Is the number less than k?”. Player A answers with either “Yes” or “No”, but is allowed to lie once in the game. A “Yes” from player A costs x euros, a “No” y euros. As a function of x, y, and n, what is the cost-optimal algorithm for player B to guess the number?

The training incorporates not only finding solutions to such problems, but also holding subsequent discussions of the participants’ solutions and, thus, acquiring new knowledge and learning new methods.

The workshop not only aims to help the participants be successful at the Olympiad, but also seeks to excite them about computer science research. For this purpose, academic presentations and discussions complemented the IOI-specific training. This time, the topics addressed were “What and how can quantum computers perform?” by Prof. Markus Bläser, and “Open Problems in Automated Reasoning” by Prof. Christoph Weidenbach. ::: Christoph Weidenbach
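The guessing game from the training can be explored with a small dynamic program. The following is only a sketch of a simplified variant, not the solution discussed at the training: it assumes that player A never lies and that player B wants to minimize the worst-case total cost. The function names guessing_costs and play are hypothetical and chosen for illustration.

```python
def guessing_costs(n, x, y):
    """Worst-case cost tables for the lie-free variant of the game.

    cost[m]  : minimal worst-case cost to identify one of m candidates
    split[m] : number of candidates placed below k in an optimal question
    Assumption: player A always answers truthfully.
    """
    cost = [0] * (n + 1)
    split = [0] * (n + 1)
    for m in range(2, n + 1):
        best, best_i = None, 0
        for i in range(1, m):
            # "Yes" (costs x) leaves i candidates, "No" (costs y) leaves m - i.
            worst = max(x + cost[i], y + cost[m - i])
            if best is None or worst < best:
                best, best_i = worst, i
        cost[m], split[m] = best, best_i
    return cost, split


def play(secret, n, x, y):
    """Simulate player B following the precomputed splits against a truthful A."""
    _, split = guessing_costs(n, x, y)
    lo, hi = 1, n  # remaining candidates are lo..hi inclusive
    paid = 0
    while lo < hi:
        k = lo + split[hi - lo + 1]  # ask: "Is the number less than k?"
        if secret < k:
            paid += x
            hi = k - 1
        else:
            paid += y
            lo = k
    return lo, paid


if __name__ == "__main__":
    cost, _ = guessing_costs(8, 1, 1)
    print(cost[8])  # 3, i.e. plain binary search when both answers cost the same
```

With equal costs the table reduces to ordinary binary search; with unequal costs the optimal split shifts so that the more expensive answer leaves fewer candidates. Handling the single permitted lie requires tracking additional state and is beyond this sketch.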

ADFOCS 2013 and 2014 Summer School: Young Talents Deepen Their Knowledge in Computer Science

The ADFOCS, Max Planck Advanced Course on the Foundations of Computer Science, has meanwhile become a classic among summer schools. Every summer, master’s and PhD students meet in Saarbrücken to deepen their knowledge in computer science theory. During five days in August, the young scientists are intensely schooled by university teachers of international reputation:

2013: Lukasz Kowalik, Institute of Informatics, University of Warsaw; Dániel Marx, Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI); Saket Saurabh, Department of Informatics, University of Bergen

2014: Sanjeev Khanna, University of Pennsylvania, USA; Aleksander Madry, EPFL, Switzerland; Yurii Nesterov, Catholic University of Louvain, Belgium

Prof. Kurt Mehlhorn initiated this event of advanced education in 2000 to bring together young talents from all over the world with established top scientists. The latter are asked to include their current research topics in their lectures. This was meanwhile the 15th time this summer school has been held, a testament to the continuous interest of young scientists in the course. ::: Bertram Somieski

Computer Science Research Days Challenging and Supporting Talented High School Students in Computer Science

Every year, in collaboration with the Computer Science Department at Saarland University, the German Research Center for Artificial Intelligence (DFKI), the Center for Bioinformatics, and the companies AbsInt GmbH and Logic4Business GmbH, the Max Planck Institute for Informatics organizes the Computer Science Research Days for gifted pupils.

In 2013, 2014, and 2015, the most successful participants of the German Federal Competition in Computer Science (“Bundeswettbewerb Informatik”) as well as Rhineland-Palatinate’s best recent high school graduates who had computer science as one of their main subjects were invited to the event. The students became acquainted with, and actively participated in, current computer science research topics.

The participants spent the first evening getting to know each other while bowling. They then attended lectures on specific themes, such as “Malware Detection”, and also lectures from the current curriculum of the computer science department. In different workshops, the participants were given the opportunity to build and test 3D scanners, or for example learn how to avoid bugs with disastrous consequences through static program analysis. The group work covered all areas of computer science, from the fundamentals, to programming, bioinformatics, graphics, and computer vision; a wide range of current computer science research was covered.

The motto of the Computer Science Research Days is: “Challenging and supporting”. With this event, the Max Planck Institute for Informatics aims to raise young people’s enthusiasm for the subject of computer science and seeks to discover and support the development of new talent. ::: Jennifer Müller

Opening of the New Building of the Max Planck Institute for Software Systems in Kaiserslautern

The Max Planck Institute for Software Systems was founded in 2005 as the second institute of the Max Planck Society that is entirely devoted to computer science. The institute was set up from the beginning with two sites, Kaiserslautern and Saarbrücken. The Nürtingen-based architects’ office Weinbrenner-Single-Arabzadeh won with a double project, although each building had a separate competition. After the inauguration of the first building in October 2012 in Saarbrücken, the Kaiserslautern building was ceremonially opened on 2 July 2013.

As required, the building’s architectural scheme focuses on the support of communication: meandering internal flights of stairs connect six floors. Each floor consists of peripheral offices that encircle a square-based atrium that is flooded with light. An innovative ceiling construction provides additional light. In addition to the approximately 100 office workplaces and 6 conference rooms, the atrium can serve as a large event hall. This, combined with up-to-date communication equipment, provides the institute with all that is needed for a modern building. Located at the entrance of the university campus, the building is an architectural landmark. ::: Bertram Somieski

Meeting of the Board of Curators Support, Inspiration, Discussions and Suggestions

Most Max Planck Institutes have a Board of Curators. It consists of representatives of politics, science, and society, and serves as a link between science and society. During the yearly meetings the institute’s matters are presented to the curators and strategic research topics are explained. This information about the institute’s activities is a confidence-building measure that helps to ensure autonomously organized research in Germany. The names of the current curators are listed on page 4 of this report.

The last meeting of the Board of Curators was held on 5 February 2013. Besides Professor Wolffried Stucky as chairman, the participating curators were Annegret Kramp-Karrenbauer, Saarland’s Prime Minister, Dr. Siegfried Dais, partner of Robert Bosch Industrietreuhand KG, and Christiane Götz-Sobel, Head of the editorial department of Science and Technology at ZDF (Second German Television). Professor Bernt Schiele, managing director of the MPI-INF, reported on the institute’s current status in science and society and presented a strategic outlook on further activities. He thanked the Board of Curators for support and inspiration, for criticism and suggestions.

Kurt Mehlhorn reported on the institute’s activities to establish a self-sustaining science center; Hans-Peter Seidel reported on Saarland’s part in the German Excellence Initiative.

Four young scientists who were promoted to Senior Scientists during the previous year presented their scientific topics in short talks. Olga Kalinina, Antti Oulasvirta, Thomas Sauerwald, and Nicole Megow demonstrated to the curators the scientific breadth and depth of the institute’s research. Subsequently, they received their certificates of appointment from the chairman. The scientific part of the meeting was completed by a poster session where young scientists of the MPI-INF presented their results.

The Board thanked Simone Bischoff, Team Leader of the institute supervisors at the Administrative Headquarters and Curator Board Meeting attendee for many years, for her service and welcomed her successor, Dr. Christoph Freudenhammer. Bernt Schiele, Managing Director, also expressed words of gratitude for a long-standing, trustful, and productive collaboration. ::: Bertram Somieski

Professor Stucky presenting Nicole Megow the certificate as a Senior Researcher.

Scientific Review 2015 Scientific Lectures, Poster Sessions and Meetings with the Senior Researchers

The Scientific Advisory Board and the Senior Researchers

The Max Planck Society is mainly funded by the German federal and state governments. However, it is obliged solely to strive for scientific excellence, which is checked and evaluated every two years by the Scientific Advisory Board for each institute. The Scientific Advisory Board is composed of leading scientists from the fields of research in which the respective institute specializes. New members are regularly appointed to the Scientific Advisory Board: the maximum term of office of each member is six years.

The members of the Scientific Advisory Board, appointed by the President of the Max Planck Society, are:
Prof. Dr. Trevor Darrell of the University of California, Berkeley,
Prof. Dr. Nir Friedman of the Hebrew University of Jerusalem,
Prof. Dr. Pascal Fua of the Swiss Federal Institute of Technology in Lausanne,
Prof. Dr. Jürgen Giesl of RWTH Aachen,
Prof. Dr. Alon Halevy of Google Research in Mountain View, California,
Prof. Dr. Yves Moreau of Leuven University,
Prof. Dr. Nicole Schweikardt of Humboldt University Berlin,
Prof. Dr. François Sillion of INRIA Grenoble Rhône-Alpes and
Prof. Dr. Emo Welzl of ETH Zurich.

In 2015 they were joined by:
Prof. Dr. Renée J. Miller of the University of Toronto,
Prof. Dr. Gerhard J. Woeginger of Eindhoven University, and
Prof. Dr. Thomas A. Funkhouser of Princeton University.

The review started with presentations of the directors and selected senior researchers. It was followed by one-to-one conversations between the members of the Scientific Advisory Board and the senior researchers of the institute. During two poster sessions, the work of all scientists was presented to the Scientific Advisory Board. The picture shows the Scientific Advisory Board together with the senior researchers at the end of the first day of the review. ::: Christoph Weidenbach

Poster session: presentation of the scientists’ work

Visit of the Ambassador of India and the Indian General Consul Fostering Contact with the Institute and with Indian Compatriots Living in Germany

The Indian Ambassador, Sujata Singh, and the Indian General Consul in Frankfurt/M, Raveesh Kumar, visited the Saarland on 17 April 2013 and 4 February 2014. The purpose of their visit was to meet with their fellow countrymen and maintain business and academic contacts. In open discussions, they asked for feedback about life circumstances, social experiences, and working conditions of Indian students in the Saarland. ::: Bertram Somieski

Industrial Fair Computer Science Explained in a Comprehensible Way

Fundamental research has the reputation of being hard to understand or difficult to apply; communicating the progress of certain areas of computer science is hence a great challenge. Current findings and developments should be, at least in part, transparently conveyed to the public at an industrial fair on 25 September 2013, where local and regional companies are the target audience. These are offered cooperations with science institutions on Saarland University’s campus and given inspiration for their own developments. The fair was also open to the public.

In addition to the institute’s focus on fundamental research, this event should present applicable results of this research. Hence, our institute presented itself together with the computer science departments of the University and the DFKI in a big industrial exhibition on 25 November 2013. With 56 exhibits, displayed in the foyers of both Max Planck Institutes, the building of the Cluster of Excellence, and the DFKI, the thematic diversity of the entire Saarland computer science campus was presented. The fair was opened by a talk entitled “Multimodal Computing and Interaction” given by Professor Hans-Peter Seidel. ::: Bertram Somieski

Science exhibition in Saarbrücken “Digital – An Index Ahead” – Research You Can Touch at Hauberrisser Hall

As part of the science year “Digital Society”, an exhibition titled “Digital – An index ahead” was organized by the city of Saarbrücken. The Hauberrisser Hall in the town hall provided an appealing location for the computer science related institutions of the universities and affiliated institutions to display their topics. The focus was on application-oriented research results that were turned into easily understandable exhibits, giving easy insights to non-professionals.

For a whole month, from 25th September to 25th October, random guests, visitors of the town hall, or registered tour groups could try out sorting algorithms by themselves, test their skills by optimizing package volumes, or compete with a computer to find the shortest round-trip route. Six exhibits represented the MPI-INF, depicting a colorful spectrum of research topics. Every Tuesday an “open internet meeting for senior citizens” was offered, including a guided tour of the exhibition.

A panel discussion was scheduled as part of the exhibition. Owing to the then-current situation, the chosen topic was “Pandemic hazard by infected air travelers?” The results of Glenn Lawyer, postdoc in Thomas Lengauer’s department, made clear that this threat really exists; however, the Ebola epidemic at that time was no danger to Central Europe. ::: Bertram Somieski

Launch of the Institute’s New Website

After intense preparation and design, 22 May 2014 saw the launch of the institute’s new website. The concept was realized in cooperation with a local advertising company, creating a modern display of our institute. The result is an easy-to-navigate and clearly structured website that reflects the institute’s structure, offerings, and scientific results. ::: Bertram Somieski

Summer Party IMPRS-CS Celebrating and Enjoying a Culturally Diverse Environment

The International Max Planck Research School for Computer Science supports and guides foreign students during their studies of computer science at Saarland University. Each summer the IMPRS-CS invites the members of both Max Planck Institutes to the summer party that is held on the Platz der Informatik. Complementing barbecue and beverages originating in the Saarland, the international party guests cook dishes of their home countries, provide baked goods, and offer exotic salads.

Guests were entertained by a colourful programme that had activities for toddlers, live music, and some contributions of scientists; ball games or dancing covered physical interests. An especially challenging and sweat-inducing affair was the one-hour Zumba performance, in which four young women demonstrated a choreography on stage. ::: Bertram Somieski

The Saarland Business Run The „Planck Quanten“ had a Good Run

The Saarland Business Run has enjoyed great popularity since its founding in 2005. It is the sports event in the Saarland that attracts the highest number of participants, assembling a field of 15,000 out of barely 1 million Saarlanders. Companies and housing communities, medical offices and schools send teams of four whose attitudes vary between scores, fancy costumes, or plain fun.

For many years, running enthusiasts from both Max Planck Institutes have participated in this event, displaying strong spirit. Undeterred by the huge crowd on the starting line and four live bands amusing and diverting runners on the course, they withstand, depending on the weather, heat, downpours, or cold winds during the circuit through Dillingen’s city and the adjacent steel mill. In summer 2014 team “Planck Quanten #2” scored place 93 out of 1276 in the category mixed teams! Lili Jiang topped that with #60 among all approx. 5000 women. ::: Bertram Somieski

The Institute in Figures

Budget without third-party funds 2011 to 2014 (in 1,000 Euro)

Chart categories: personnel expenses, material expenses, promotion of young scientists, investments.

Staff (current as of 1/1/2015)

Guests: 12
Non-scientific staff: 28
Scientific staff not including Bachelor and Master students: 116
Postdoctoral staff: 73
Total: 229

Ratio of scientific to non-scientific staff members (current as of 1/1/2015)

Non-scientific staff: 17.5%
Scientific staff: 82.5%

Third-party funding from 2011 to 2014: numbers and distribution

Projects per year: 2011: 59 projects; 2012: 59 projects; 2013: 50 projects; 2014: 36 projects.

Funding sources shown in the charts: DFG, Federal government, Industry, IMPRS, Ind. partnergroup, IMPECS, Otto Hahn Group, EU, TN McHardy, Other.

Third-party funding from 2011 to 2014: revenues (in 1,000 Euro)

CONTACT

Volker Maria Geiß Joint Administration Phone +49 681 9325-7000 Email geiss mpi-inf.mpg.de

Joint Administration of the Max Planck Institutes for Informatics and Software Systems

MPI for Informatics Saarbrücken


MPI for Software Systems Saarbrücken MPI for Software Systems Kaiserslautern

When the Max Planck Institute for Software Systems was founded, the two Max Planck Institutes decided to have a joint administration. This administration comprises all services for both institutes: personnel and financial services, general services, public relations, building services, and the library. The library is shared by all computer science education and research institutions on the university campus. In 2011, the information services and technology group was integrated into the joint administration as well.

Excellent research requires excellent service. This guiding principle is the joint administration’s highest maxim in all administrative and scientific service activities that it provides for both the Max Planck Institute for Informatics and the Max Planck Institute for Software Systems Kaiserslautern/Saarbrücken.

The joint administration offers the scientists of both institutes a wide range of support on matters surrounding their research activities. Special support goes to scientists beginning and ending their tenure at the institute.

Fostering an open and friendly interaction of scientists and guests from around the world is another high standard the administration strives to uphold. This encompasses support in bureaucratic and social matters for foreign scientists and visitors living in Germany.

This all-round support is only possible because all administrative staff members are committed to practising the idea of service far beyond their required duties. For this, they deserve the deep appreciation of all institute members. :::

CONTACT

Volker Maria Geiß Joint Administration Phone +49 681 9325-7000 Email geiss mpi-inf.mpg.de

Information Services and Technology

The aim of the Institute to produce first-class research results is fostered by unhindered global cooperation and communication in a motivating environment with flexible, high-quality, reliable, and easy-to-use equipment. These properties apply as well to our IT infrastructure: we operate a versatile system that can adapt to rapidly evolving requirements while providing consistency and reliability at the same time. The system does not neglect security, despite the openness it requires to support international cooperation.

Multiplicity of tools

Research on the front line in computer science often implies using innovative systems from various sources, including prototype and cross-platform setups in hardware and software. As a consequence, we provide heterogeneous systems with hardware from various manufacturers and all major operating systems: MacOS for clients, Solaris for file servers, and Linux and Windows for client and server systems. To improve the homogeneity of these systems and their ease of use, we try to make user data and the most important software packages available independently of platform and OS.

Automation as a guarantee for reliability

Regular updates and upgrades of software and hardware for all supported platforms pose substantial demands on the reliability of the installation and administration. The mere variety of hardware and software components precludes image-based installation approaches. Instead, we use an extensive, automated, and partly self-developed package-based installation, configuration, and administration system that is designed to take the requirements of the researchers and their projects into account.

Once implemented, results can be repeated as required and rolled out very quickly across the whole infrastructure. This implementation work requires effort and, in some cases, slows down the realization. The advantages in operational security, total expenses, and time until larger changes are ready, however, clearly outweigh this disadvantage.

The system is flexibly structured, so it can be adapted quickly to changing requirements (such as new hardware) through specific extensions of the installation system. However, it is only used for repetitive tasks. Prototypical installations are done manually.

Security and protection

Rigid mechanisms that protect against sabotage and espionage are not feasible in open systems, because they greatly limit usability. The security guidelines can therefore be nothing more than a compromise that flexibly follows the requirements.

Some direct hazards can be deflected through the structure of the network, the firewall, the encryption for external access, or virus scanners. Indirect hazards, such as the connection to virus-infected computers in the Intranet or faults in externally acting software systems, must be countered by keeping our local software installations and virus scanners up to date.

Operational reliability

Power supply and cooling are designed in such a way that server operation can be maintained, even during a power outage. A generator with a power output of roughly one megawatt guarantees uninterrupted operation of the core infrastructure even during long outages.

Monitoring systems based on open source software provide information via email and text messaging on critical states of the server systems, the network, and on malfunctions of complex processes in terms of operational and safety aspects.

Cooperation and communication

Our network is divided into various areas, according to organizational and security-relevant points. Externally accessible, but anonymous services such as DNS (Internet address book), WWW, FTP (data transfer), and SMTP (email) are combined at the institute’s firewall in several demilitarized zones (DMZs), which are differentiated according to their importance and their hazard potential.

Guest scientists and students can connect their own devices by wire or via WiFi (BYOD). The network infrastructure automatically treats them as external machines.

International cooperation demands external access to internal resources in the infrastructure (Intranet). Here, we offer secure login and access to email, important databases, and other services. Cooperation in software development is supported by protected access to several revision control systems (software repositories: Subversion, Git, and GitHub).

This logical structure is spread over a multi-10-Gbit backbone, consisting of various floor and data center switches. The backbone is extended with Gigabit and 10 Gbit leased lines to each of our locations and several partners on the campuses in Saarbrücken and Kaiserslautern. Internet connectivity is realized with a 10 Gbit connection to the German National Research and Education Network (DFN) that is used jointly with the university in Saarbrücken.

Workstations and notebooks are connected to their floor switch via Gigabit Ethernet. Central servers, server farms and compute clusters use 1 Gbit or 10 Gbit connections according to their bandwidth requirements. Except for workstations and notebooks, most of these connections are set up to be fail-safe via multiple redundant lines.

Compute service

We operate several larger systems that work with up to 64 closely coupled processors and have up to one and a half terabytes of main memory. These machines are used for scientific research and applications which require high parallelism and uniform access to a large main memory.

Our largest cluster to date, consisting of 128 systems with 8 cores and 48 GB of main memory each, was brought into operation in mid-2010. It is operated under the Grid Engine (GE). The automatic distribution of processes on the cluster’s individual computers allows us to reach high utilization on the whole system. We can flexibly integrate the above-mentioned larger systems into this system. In order to allow their optimal distribution, jobs are categorized according to their specific requirements. The prioritization of specific processes helps with the appropriate and fair distribution of resources across all the waiting jobs.
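As a rough illustration of the categorization and prioritization described above, the following Python sketch serves the user who has been granted the fewest cores so far and places each job on the first node that satisfies its requirements. It is not Grid Engine's actual scheduling policy: the names `Job`, `Node`, and `schedule`, the usage measure, and the example jobs are assumptions made for this sketch. Only the two node sizes (8 cores with 48 GB, and 64 cores with 1.5 TB) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    user: str
    cores: int    # requested cores (part of the job's category)
    mem_gb: int   # requested main memory in GB


@dataclass
class Node:
    name: str
    free_cores: int
    free_mem_gb: int


def schedule(jobs, nodes, usage=None):
    """Return (job, node) placements under a hypothetical fair-share policy."""
    usage = dict(usage or {})   # user -> cores granted so far
    waiting = list(jobs)
    placed = []
    while waiting:
        # Fairness: serve the user with the fewest granted cores first.
        waiting.sort(key=lambda j: (usage.get(j.user, 0), j.name))
        job = waiting.pop(0)
        # Requirements: first node with enough free cores and memory.
        node = next(
            (n for n in nodes
             if n.free_cores >= job.cores and n.free_mem_gb >= job.mem_gb),
            None,
        )
        if node is None:
            continue  # nothing fits; the job is left unplaced in this pass
        node.free_cores -= job.cores
        node.free_mem_gb -= job.mem_gb
        usage[job.user] = usage.get(job.user, 0) + job.cores
        placed.append((job.name, node.name))
    return placed


nodes = [Node("cluster-node", 8, 48), Node("big-smp", 64, 1536)]
jobs = [
    Job("a1", "alice", 8, 40),
    Job("a2", "alice", 8, 40),
    Job("b1", "bob", 32, 1000),
]
placements = schedule(jobs, nodes)
print(placements)
```

In this example the second job of the first user is placed only after the other user's large-memory job, which is the kind of balancing across waiting jobs that the text refers to.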

In 2014, this setup was extended by a new cluster, which consists of 12 systems, each with 20 cores and 256 GB of main memory. Additionally, each of these systems is equipped with two server GPU (graphics processing unit) cards with 12 GB of memory each. Specially designed software can thus utilize the high computing power and high throughput provided by this technology.

File service and data security

The roughly 560 TB of institute data is made available via NFS and CIFS (SMB). Currently, 37 file servers distribute the volumes of about 70 RAID systems with more than 4000 disks. All RAID systems are connected to a redundant Storage Area Network (SAN). In order to prevent data loss due to a failed RAID system, the RAID volumes are operated as mirrored pairs by the ZFS file system. ZFS utilizes checksums to protect data against creeping changes, i.e. silent corruption. During each read operation it checks whether a data block is correct; if it is not, ZFS uses the corresponding block from the second half of the mirror to correct it. This also means that time-consuming file system checks prior to accessing the data are unnecessary. File systems are inspected during normal service operation.

The two mirrored halves are accommodated in at least two different fire compartments so that a fire would have little chance of causing data loss, even at the deepest level. Three file servers share a view of a disk’s status and can substitute for each other within a very short period of time using virtual network addresses and SAN technology. This also allows for server updates without any perceivable interruptions to service.

Our data security is based on two different systems: a conventional tape backup, which stores the data directly on a tape robot, and an online disk backup system that minimizes space requirements by using data comparison and always keeps the data online (open source system BackupPC). Because no data in this system is actually held with unnecessary redundancy, about 450 TB of gross data can be reduced to about 30 TB in the different backup runs.

The tape robot currently has the ability to access 700 tapes with a total capacity of 700 TB (uncompressed). The robot is located in a specially prepared room to maximize the protection of its high-value data. We recently extended the system by a second robot with a capacity of almost 5 PB (uncompressed) on about 1200 tapes. The new robot operates at a remote site to protect the data against disaster situations. This is also achieved by the storage of tapes in a special fire-proof safe.

Special systems

Diverse special systems are required for special research tasks, especially in the area of computer graphics. Available systems include a video editing system, several 3D scanners, multivideo recording systems, and 3D projection systems. External communication and collaboration is supported through the operation of several videoconference installations.
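The read-time repair described under "File service and data security" can be sketched as follows. This is a minimal in-memory model, not ZFS code: `MirroredStore` and its methods are hypothetical names, SHA-256 merely stands in for whichever checksum the file system is configured to use, and the checksum table is kept in a plain dictionary for simplicity.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum of one block (SHA-256 chosen only for this sketch)."""
    return hashlib.sha256(data).hexdigest()


class MirroredStore:
    """Toy model of a two-way mirror with per-block checksums."""

    def __init__(self):
        self.halves = ({}, {})   # two mirror halves: block id -> bytes
        self.sums = {}           # block id -> checksum recorded at write time

    def write(self, block_id, data: bytes):
        self.sums[block_id] = checksum(data)
        for half in self.halves:
            half[block_id] = data

    def read(self, block_id) -> bytes:
        expected = self.sums[block_id]
        for half in self.halves:
            data = half[block_id]
            if checksum(data) == expected:
                # A verified copy was found: overwrite any copy that fails.
                for other in self.halves:
                    if checksum(other[block_id]) != expected:
                        other[block_id] = data
                return data
        raise IOError(f"block {block_id!r}: no mirror half passes its checksum")


store = MirroredStore()
store.write(7, b"sequencing run 42")
store.halves[0][7] = b"sequencing run 4?"   # simulate silent corruption
assert store.read(7) == b"sequencing run 42"
assert store.halves[0][7] == b"sequencing run 42"   # repaired during the read
```

Because every read verifies the block it returns, damage is detected and repaired at the moment the data is used, which is why a separate file system check before access is not needed in this model.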

Flexible support of scientific projects

The described services, servers, and computing clusters are deployed for many scientific projects in very different scenarios, even for collaborations such as the German epigenome programme “DEEP” [http://www.deutsches-epigenom-programm.de]. In order to meet the various requirements, the IST (Information Services and Technology) offers specifically tailored degrees of support that extend from pure server hosting in individual cases to applications support. The division of tasks is in most cases dictated by the intersection of project requirements and the IST portfolio. For problems with self-administered systems in hosting scenarios, for example, the IST only provides consultation.

Responsibilities

IT procurement, installation, administration, operation, application development and support, and the continued development of the described systems and techniques are the duty of the IST, a subdivision of the Joint Administration of the Institutes for Software Systems and Informatics. Due to the Institutes’ cooperation with several departments and institutes of the university, the IST is also responsible for the campus library (‘Campusbibliothek für Informatik + Mathematik’) and the Cluster of Excellence ‘Multimodal Computing and Interaction’ (MMCI).

The IST is divided into one core group and several institute-specific support groups. Put simply, the core group is responsible for services that are identical for both institutes or even operated together. The support groups, accordingly, cover the specific needs of the individual institutes and the services for their respective researchers.

Staff structure

In addition to management and procurement (two positions), six scientific staff members and two technicians work on “core” issues. There is also one employee for each institute who takes care of administrative tasks to support the joint library and MMCI.

For their service desk, the institute’s support group (two scientific staff members and one technician) is assisted by a team of students. The service desk is reachable either by email, through a web interface, or in person during business hours. In addition to processing questions concerning the use of the infrastructure, this group also maintains information systems such as a documentation wiki.

CONTACT

Jörg Herrmann
Joint Administration
IT Core
Phone +49 681 9325-5800
Email jh mpi-klsb.mpg.de
Internet http://www.mpi-klsb.mpg.de/gvw/ist.html

COOPERATIONS

Selected Cooperations

BIOINFORMATICS
::: BioSolveIT GmbH, Sankt Augustin, Germany
::: CeMM Research Center for Molecular Medicine, Vienna, Austria
::: Heinrich Heine University, Düsseldorf, Germany
::: Informa s.r.l, Rome, Italy
::: Karolinska Institute, Stockholm
::: Research Center for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
::: Siemens Healthcare Diagnostics, Berkeley, USA
::: University of Basel, Basel, Switzerland
::: University of Heidelberg, Heidelberg, Germany
::: University of Cologne, Cologne, Germany
::: Saarland University, Saarbrücken, Germany
::: Viiv Healthcare, Brentford, UK
::: Wolfgang-Goethe-Universität, Frankfurt, Germany

UNDERSTANDING IMAGES & VIDEOS
::: German Research Center for Artificial Intelligence, Kaiserslautern, Germany
::: ETH Zurich, Zurich, Switzerland
::: IST Austria, Klosterneuburg, Austria
::: Katholieke Universiteit Leuven, Leuven, Belgium
::: Max Planck Institute for Intelligent Systems, Tübingen, Germany
::: Microsoft Research, Cambridge, UK
::: RWTH Aachen, Aachen, Germany
::: Stanford University, Stanford, USA
::: Technicolor Paris, Paris, France
::: TU Darmstadt, Darmstadt, Germany
::: Tsinghua University, Beijing, China
::: University of British Columbia, Vancouver, Canada
::: University College London, London, UK
::: University of California, Berkeley, USA
::: Saarland University, Saarbrücken, Germany
::: University of Tel Aviv, Tel Aviv, Israel

GUARANTEES
::: INRIA Nancy, Nancy, France
::: Massachusetts Institute of Technology, Cambridge, USA
::: Tata Institute of Fundamental Research, Mumbai, India
::: TU Munich, Munich, Germany
::: TU Vienna, Vienna, Austria
::: University of Freiburg, Freiburg, Germany
::: University of Oldenburg, Oldenburg, Germany
::: University of Tel Aviv, Tel Aviv, Israel

OPTIMIZATION
::: Carnegie Mellon University, Pittsburgh, USA
::: Google, Mountain View, USA
::: IIT Delhi, Delhi, India
::: KTH Royal Institute of Technology, Stockholm, Sweden
::: Logic4Business GmbH, Saarbrücken, Germany
::: Siemens AG, Vienna, Austria
::: University of Santiago de Chile, Santiago, Chile
::: University of Tel Aviv, Tel Aviv, Israel
::: University of Valparaíso, Valparaíso, Chile

INFORMATION SEARCH & DIGITAL KNOWLEDGE
::: Albert-Ludwigs-University Freiburg, Freiburg, Germany
::: Carnegie Mellon University, Pittsburgh, USA
::: IBM Almaden Research Center, San Jose, USA
::: IIT Bombay, Bombay, India
::: IIT Delhi, Delhi, India
::: Max Planck Institute for Software Systems, Kaiserslautern and Saarbrücken, Germany
::: Microsoft Research, Redmond, USA
::: Paris Telecom University, Paris, France
::: Siemens AG, Munich, Germany
::: Tsinghua University, Beijing, China
::: University of Amsterdam, Amsterdam, Netherlands
::: University of California, Santa Cruz, USA
::: University of Glasgow, Glasgow, UK
::: Saarland University, Saarbrücken, Germany

MULTIMODAL INFORMATION & VISUALIZATION
::: Aalto University, Aalto, Finland
::: Carnegie Mellon University, Pittsburgh, USA
::: Google, Zurich, Switzerland
::: Centre National de la Recherche Scientifique, Lyon, France
::: Czech Technical University, Prague
::: ETH Zurich, Zurich, Switzerland
::: Harvard University, Cambridge, USA
::: Keio University, Minato, Japan
::: KIT, Karlsruhe, Germany
::: Lancaster University, Lancaster, UK
::: LMU Munich, Munich, Germany
::: Massachusetts Institute of Technology, Cambridge, USA
::: Microsoft Research, Seattle, USA
::: Stanford University, Palo Alto, USA
::: Technical University Delft, Delft, Netherlands
::: TU Dresden, Dresden, Germany
::: TU Munich, Munich, Germany
::: University of Erlangen-Nuremberg, Erlangen, Germany
::: University of Helsinki, Helsinki, Finland
::: University of Magdeburg, Magdeburg, Germany
::: University of Cambridge, Cambridge, UK
::: University College London, London, UK
::: Zuse Institute Berlin, Berlin, Germany

PUBLICATIONS

Selected Publications

[1] Spectral Graph Reduction for Efficient Image and Streaming Video Segmentation, 2014, IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Oral.
[2] A. Adamaszek and A. Wiese. Approximation schemes for maximum weight independent set of rectangles. In Proceedings of the 54th Annual Symposium on Foundations of Computer Science (FOCS), 2013, pp. 400–409. IEEE.
[3] M. Ahmad, V. Helms, T. Lengauer, and O. V. Kalinina. How Molecular Conformational Changes Affect Changes in Free Energy. Journal of Chemical Theory and Computation, 11(7):150629134059007, 2015.
[4] A. Antoniadis, N. Barcelo, M. Nugent, K. Pruhs, and M. Scquizzato. Energy-efficient circuit design. In Innovations in Theoretical Computer Science, ITCS'14, Princeton, NJ, USA, January 12-14, 2014, 2014, pp. 303–312.
[5] A. Antoniadis, C. Huang, and S. Ott. A fully polynomial-time approximation scheme for speed scaling with sleep state. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, 2015, pp. 1102–1113.
[6] Y. Assenov, F. Müller, P. Lutsik, J. Walter, T. Lengauer, and C. Bock. Comprehensive analysis of DNA methylation data with RnBeads. Nature Methods, 11(11):1138–1140, 2014.
[7] M. Bachynskyi, G. Palmas, A. Oulasvirta, and T. Weinkauf. Informing the design of novel input methods with muscle coactivation clustering. ACM Transactions on Computer-Human Interaction (TOCHI), 21(6):30, 2015.
[8] G. Bailly, A. Oulasvirta, D. P. Brumby, and A. Howes. Model of visual search and selection time in linear menus. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2014, pp. 3865–3874. ACM.
[9] J. Batra, N. Garg, A. Kumar, T. Mömke, and A. Wiese. New approximation schemes for unsplittable flow on a path. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 47–58.
[10] U. Bauer, M. Kerber, and J. Reininghaus. Distributed computation of persistent homology. In 2014 Proceedings of the Sixteenth Workshop on Algorithm Engineering and Experiments, ALENEX 2014, Portland, Oregon, USA, January 5, 2014, 2014, pp. 31–38.
[11] P. Baumgartner and U. Waldmann. Hierarchic superposition with weak abstraction. In M. P. Bonacina, ed., Automated Deduction – CADE-24, Lake Placid, NY, USA, 2013, LNAI 7898, pp. 39–57. Springer.
[12] D. Becker, P. Lutsik, P. Ebert, C. Bock, T. Lengauer, and J. Walter. BiQ Analyzer HiMod: an interactive software tool for high-throughput locus-specific analysis of 5-methylcytosine and its oxidized derivatives. Nucleic acids research, 42(Web Server issue):W501–7, 2014.

[13] E. Berberich, P. Emeliyanenko, A. Kobel, and M. Sagraloff. Exact symbolic-numeric computation of planar algebraic curves. Theoretical Computer Science, 491:1–32, 2013.
[14] E. Berberich, D. Halperin, M. Kerber, and R. Pogalnikova. Deconstructing approximate offsets. Discrete & Computational Geometry, 48(4):964–989, 2012.
[15] K. Berberich and S. Bedathur. Computing n-gram statistics in MapReduce. In Advances in Database Technology (EDBT 2013), Genova, Italy, 2013, pp. 101–112. ACM.
[16] S. Bhattacharya, M. Hoefer, C.-C. Huang, T. Kavitha, and L. Wagner. Maintaining near-popular matchings. In Proc. 42nd Intl. Colloq. Automata, Languages, Programming (ICALP), 2015, vol. 2, pp. 504–515.
[17] K. Bringmann. Why walking the dog takes time: Frechet distance has no strongly subquadratic algorithms unless SETH fails. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, 2014, pp. 661–670.
[18] A. Bulling, U. Blanke, and B. Schiele. A tutorial on human activity recognition using body-worn inertial sensors. ACM Computing Surveys, 46:1–33, 2014.
[19] A. Bulling, U. Blanke, and B. Schiele. A tutorial on human activity recognition using body-worn inertial sensors. ACM Computing Surveys, 46(3):33:1–33:33, 2014.
[20] P. Chalermsook, B. Laekhanukit, and D. Nanongkai. Independent set, induced matching, and pricing: Connections and tight (subexponential time) approximation hardnesses. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, 26-29 October, 2013, Berkeley, CA, USA, 2013, pp. 370–379.
[21] P. Chalermsook, B. Laekhanukit, and D. Nanongkai. Pre-reduction graph products: Hardnesses of properly learning DFAs and approximating EDP on DAGs. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, 2014, pp. 444–453.
[22] C. Chen and M. Kerber. An output-sensitive algorithm for persistent homology. Computational Geometry: Theory and Applications, 46(4):435–447, 2013.
[23] D. Chen, D. I. Levin, P. Didyk, P. Sitthi-Amorn, and W. Matusik. Spec2fab: a reducer-tuner model for translating specifications to 3d prints. ACM Transactions on Graphics (TOG), 32(4):135, 2013.
[24] D. Dhungana, C. H. Tang, C. Weidenbach, and P. Wischnewski. Automated verification of interactive rule-based configuration systems. In E. Denney, T. Bultan, and A. Zeller, eds., 28th IEEE/ACM International Conference on Automated Software Engineering (ASE 2013), Palo Alto, CA, USA, 2013, pp. 551–561. IEEE.
[25] R. Duan and K. Mehlhorn. A combinatorial polynomial algorithm for the linear Arrow-Debreu market. Information and Computation, 243:112–132, 2015. A preliminary version appeared in ICALP 2013, LNCS 7965, pages 425–436.
[26] H. Errami, M. Eiswirth, D. Grigoriev, W. M. Seiler, T. Sturm, and A. Weber. Detection of Hopf bifurcations in chemical reaction networks using convex coordinates. J. Comput. Physics, 291:279–302, 2015.
[27] A. L. Fietzke and C. Weidenbach. Superposition as a decision procedure for timed automata. Mathematics in Computer Science, 6(4):409–425, 2013.
[28] A. Geiger, M. Lauer, C. Wojek, C. Stiller, and R. Urtasun. 3d traffic scene understanding from movable platforms. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 36(5):1012–1025, 2013.
[29] M. Ghaffari, A. Karrenbauer, F. Kuhn, C. Lenzen, and B. Patt-Shamir. Near-optimal distributed maximum flow: Extended abstract. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, PODC 2015, Donostia-San Sebastián, Spain, July 21-23, 2015, 2015, pp. 81–90.
[30] D. Günther, A. Jacobson, J. Reininghaus, H.-P. Seidel, O. Sorkine-Hornung, and T. Weinkauf. Fast and memory-efficient topological denoising of 2D and 3D scalar fields. IEEE Transactions on Visualization and Computer Graphics (Proc. IEEE VIS), 20(12):2585–2594, 2014.
[31] M. Hoefer and L. Wagner. Locally stable marriage with strict preferences. In Proc. 40th Intl. Colloq. Automata, Languages, Programming (ICALP), 2013, vol. 2, pp. 620–631.

[32] J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence, 194:28–61, 2013.
[33] J. Hosang, R. Benenson, P. Dollár, and B. Schiele. What makes for effective detection proposals? IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), In Press.
[34] J. Kappes, B. Andres, F. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. Kausler, T. Kröger, J. Lellmann, N. Komodakis, B. Savchynskyy, and C. Rother. A comparative study of modern inference techniques for structured discrete energy minimization problems. International Journal of Computer Vision, pp. 1–30, 2015.
[35] S. Karayev, M. Fritz, and T. Darrell. Anytime recognition of objects and scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014. Oral.
[36] T. Lengauer, N. Pfeifer, and R. Kaiser. Personalized HIV therapy to control drug resistance. Drug Discovery Today: Technologies, 11:57–64, 2014.
[37] Y. Li, H. L. Nguyen, and D. P. Woodruff. Turnstile streaming algorithms might as well be linear sketches. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31-June 03, 2014, 2014, pp. 174–183.
[38] M. Malinowski and M. Fritz. A multi-world approach to question answering about real-world scenes based on uncertain input. In Neural Information Processing Systems (NIPS), 2014.
[39] S. Metzler and P. Miettinen. Clustering boolean tensors. Data Mining and Knowledge Discovery, pp. 1–31, 2015.
[40] S. Mukherjee, G. Weikum, and C. Danescu-Niculescu-Mizil. People on drugs: Credibility of user statements in health communities. In S. A. Macskassy, C. Perlich, J. Leskovec, W. Wang, and R. Ghani, eds., KDD'14, 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2014, pp. 65–74. ACM.
[41] C. H. Nguyen, T. Ritschel, and H.-P. Seidel. Data-driven color manifolds. ACM Trans. Graph., 34(2), 2015.
[42] S. Olberding, M. Wessely, and J. Steimle. Printscreen: Fabricating highly customizable thin-film touch-displays. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology [Best Paper Award – top 1%], New York, NY, USA, 2014, UIST '14, pp. 281–290. ACM.
[43] B. Pepik, M. Stark, P. Gehler, and B. Schiele. Multi-view and 3d deformable part models. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015.
[44] M. Pilipczuk, P. Sankowski, and E. J. van Leeuwen. Network sparsification for Steiner problems on planar and bounded-genus graphs. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, 2014, pp. 276–285.
[45] M. Regneri, M. Rohrbach, D. Wetzel, S. Thater, B. Schiele, and M. Pinkal. Grounding action descriptions in videos. Transactions of the Association for Computational Linguistics (TACL), 1(Mar):25–36, 2013.
[46] K. Rematas, T. Ritschel, M. Fritz, and T. Tuytelaars. Image-based synthesis and re-synthesis of viewpoints guided by 3d models. In Proc. CVPR, 2014.
[47] M. Rempfler, M. Schneider, G. D. Ielacqua, X. Xiao, S. R. Stock, J. Klohs, G. Székely, B. Andres, and B. H. Menze. Reconstructing cerebrovascular networks under local physiological constraints by integer programming. Medical Image Analysis, 25(1):86–94, 2015.
[48] H. Saikia, H.-P. Seidel, and T. Weinkauf. Extended branch decomposition graphs: Structural comparison of scalar data. Computer Graphics Forum (Proc. EuroVis), 33(3):41–50, 2014.
[49] S.-E. Schelhorn, M. Fischer, L. Tolosi, J. Altmüller, P. Nürnberg, H. Pfister, T. Lengauer, and F. Berthold. Sensitive detection of viral transcripts in human tumor transcriptomes. PLoS computational biology, 9(10):e1003228, 2013.

[50] C. Schulz, C. von Tycowicz, H.-P. Seidel, and K. Hildebrandt. Animating deformable objects using sparse spacetime constraints. ACM Transactions on Graphics (SIGGRAPH), 33(4):109:1–109:10, 2014.
[51] N. K. Speicher and N. Pfeifer. Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery. Bioinformatics (Oxford, England), 31(12):i268–i275, 2015.
[52] S. Sridhar, F. Mueller, A. Oulasvirta, and C. Theobalt. Fast and robust hand tracking using detection-guided optimization. In IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), 2015.
[53] M. Stikic, D. Larlus, S. Ebert, and B. Schiele. Weakly supervised recognition of daily life activities with wearable sensors. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 33(12):2521–2537, 2011.
[54] T. Sturm. Subtropical real root finding. In Proceedings of the 2015 ACM on International Symposium on Symbolic and Algebraic Computation, New York, NY, USA, 2015, ISSAC '15, pp. 347–354. ACM.
[55] S. Tang, M. Andriluka, and B. Schiele. Detection and tracking of occluded people. International Journal of Computer Vision (IJCV), 2014.
[56] K. Templin, P. Didyk, K. Myszkowski, M. M. Hefeeda, H.-P. Seidel, and W. Matusik. Modeling and optimizing eye vergence response to stereoscopic cuts. ACM Transactions on Graphics (Proc. SIGGRAPH), 33(4), 2014.
[57] K. Templin, P. Didyk, K. Myszkowski, and H.-P. Seidel. Perceptually-motivated stereoscopic film grain. Computer Graphics Forum (Proceedings Pacific Graphics 2014, Seoul, South Korea), 33(7):349–358, 2014.
[58] C. von Tycowicz, C. Schulz, H.-P. Seidel, and K. Hildebrandt. An efficient construction of reduced deformable objects. ACM Transactions on Graphics (SIGGRAPH Asia), 32(6):213:1–213:10, 2013.
[59] M. Weigel, T. Lu, G. Bailly, A. Oulasvirta, C. Majidi, and J. Steimle. iSkin: Flexible, stretchable and visually customizable on-body touch sensors for mobile computing. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems [Best Paper Award – top 1%], New York, NY, USA, 2015, CHI '15, pp. 2991–3000. ACM.
[60] C. Wu, M. Zollhöfer, M. Nießner, M. Stamminger, S. Izadi, and C. Theobalt. Real-time shading-based refinement for consumer depth cameras. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 2014.
[61] M. A. Yosef, J. Hoffart, M. Spaniol, and G. Weikum. AIDA: An online tool for accurate disambiguation of named entities in text and tables. In H. V. Jagadish, J. Blakeley, J. M. Hellerstein, N. Koudas, W. Lehner, S. Sarawagi, and U. Röhm, eds., Proceedings of the 37th International Conference on Very Large Data Bases (VLDB 2011), Seattle, USA, 2011, Proceedings of the VLDB Endowment, pp. 1450–1453. VLDB Endowment.
[62] Y. Zhang, A. Bulling, and H. Gellersen. Sideways: A gaze interface for spontaneous interaction with situated displays. In Proc. of the 31st SIGCHI International Conference on Human Factors in Computing Systems (CHI 2013), 2013, pp. 851–860.

Directions

The Max Planck Institute for Informatics (Building E1 4) is located on the campus of Saarland University, approximately 5 km north-east of downtown Saarbrücken, in a wooded area near Dudweiler.

Saarbrücken has its own airport (Saarbrücken-Ensheim) and is well connected to the Frankfurt and Luxembourg airports by shuttle bus, train, or car. There are good hourly train connections from Saarbrücken to cities all over Germany, and trains to Metz, Nancy and Paris leave at regular intervals. The express highways (Autobahns) provide fast routes to Mannheim, Frankfurt, Luxembourg, Trier, Cologne, and Strasbourg as well as Metz, Nancy, and Paris.


The campus is accessible...

from the airport at Saarbrücken-Ensheim: by taxi in approximately 20 minutes

from Saarbrücken central train station: by taxi in approximately 15 minutes, or by bus in approximately 20 minutes (direction “Dudweiler-Dudoplatz” or “Universität Campus”, exit “Universität Mensa” or alternatively “Universität Busterminal”)

from Frankfurt or Mannheim on Autobahn A6: exit “St.Ingbert-West”, then follow the white signs for “Universität”, leading to the campus

from Paris on Autobahn A4: exit “St.Ingbert-West”, then follow the white signs for “Universität”, leading to the campus

IMPRINT

Publisher
Max Planck Institute for Informatics
Campus E1 4
D-66123 Saarbrücken

Editors & Coordination
Klaus Berberich, Eugen Denerz, Volker Maria Geiß, Jan Hosang, Thomas Kesselheim, Thomas Lengauer, Kurt Mehlhorn, Jennifer Müller, Nico Pfeifer, Bernt Schiele, Nathalie Schultes, Hans-Peter Seidel, Bertram Somieski, Christian Theobalt, Uwe Waldmann, Christoph Weidenbach, Gerhard Weikum

Proofreading
Krista Ames, Kamila Kolesniczenko, Nathalie Schultes

Contact
Max-Planck-Institut für Informatik
Phone +49 681 9325-0
Fax +49 681 9325-5719
Email info mpi-inf.mpg.de
Internet http://www.mpi-inf.mpg.de

Report Period
1st of January, 2013 to 31st of December, 2014

Design
Behr Design | Saarbrücken

Printing
Druckerei Wollenschneider | Saarbrücken
