2020

Annual Report Jahresbericht

Think beyond the limits!

1 From a view of the spike glycoprotein-heparin Nahaufnahme der Interaktion zwischen Heparin und interaction: In the PRACE research project “DyCoVin dem SARS-CoV-2 Spike-Glykoprotein: Im Rahmen des – Interactions and dynamics of SARS-CoV-2 PRACE Projekts “DyCoVin – Interactions and dynamics spike-heparin complex”, the Molecular and Cellular of SARS-CoV-2 spike-heparin complex” untersucht die Modeling group (MCM) led by Rebecca Wade Gruppe „Molecular and Cellular Modeling“ (MCM) unter perform molecular dynamics simulations to investi- der Leitung von Rebecca Wade mithilfe von Molekular- gate how heparin may hinder SARS-CoV-2 infection. dynamik-Simulationen Moleküle, die am Infektionspro- The researchers want to characterize the structure zess von SARS-CoV-2 beteiligt sind Die Wissenschaftle- and dynamics of putative binding patches for rinnen und Wissenschaftler befassen sich insbesondere heparin-like compounds on the spike glycoprotein mit der Bestimmung der Struktur und Dynamik mög- (cf. Chapter 2.8, pp. 58/59). licher Bindungsstellen für heparin-ähnliche Wirk- (Image: Giulia Paiardi). stoffe am Spike-Glykoprotein (siehe Kapitel 2.8, S. 58/59). (Bild: Giulia Paiardi).

2 1 HITS Annual Report 2020 – Inhalt | Table of Contents

1 Think beyond the limits! 4 6 Collaborations 92

2 Research 8–81 7 Publications 94 2.1 Astroinformatics (AIN) 8 2.2 Computational Carbon Chemistry (CCC) 14 8 Teaching 98 2.3 Computational Molecular Evolution (CME) 18 2.4 Computational Statistics (CST) 24 9 Miscellaneous 102 2.5 Data Mining and Uncertainty Quantification (DMQ) 32 9.1 Guest Speaker Activities 102 2.6 Groups and Geometry (GRG) 38 9.2 Presentations 104 2.7 Molecular Biomechanics (MBM) 46 9.3 Memberships 107 2.8 Molecular and Cellular Modeling (MCM) 52 9.4 Contributions to the Scientific Community 109 2.9 Natural Language Processing (NLP) 60 9.5 Awards 110 2.10 Physics of Stellar Objects (PSO) 66 2.11 Scientific Databases and Visualization (SDBV) 72 10 Boards and Management 112 2.12 Theory and Observations of Stars (TOS) 78

3 Centralized Services 82 3.1 Administrative Services 82 3.2 IT Infrastructure and Network 83

4 Communication and Outreach 84

5 Events 88 5.1 Conferences, Workshops & Courses 88 5.1.1 Emulator Day 88 5.1.2 EUStands4PM workshop “Using patient-derived data for in 88 silico modeling in personalized medicine” 5.1.3 ZOrA workshop 89 5.1.4 Workshop “FAIR Data Infrastructures for Biomedical Communities” 89 5.1.5 Astrophysics winter workshop 90 5.2 HITS Colloquia 90 5.3 HITS anniversary reception 91

2 3 project “DeepCurate” (see Chapters 2.9 and 2.11). The scientific idea also 1 Think beyond the limits! came from the HITS Lab initiative. Two additional HITS Lab projects began in 2020. The first project, "Emulation in Simulation", is a collab- oration between Frauke Gräter (MBM), Fritz Röpke (PSO), and Tilmann Gneiting (CST) with the aim of estimating partial results via the clever use of machine-learning techniques – so-called emulators – and thereby of reducing computa- tional effort (see Chapters 2.4 and well as for Fabian Schneider, a The HITS Lab is an internal funding 5.1.1). The second project, “Geometry visiting scientist at proposal writing program for projects in which at least and Representation Learning,” is a time who is using the funds from his two groups from different disciplines collaborative effort between Anna ERC Starting Grant to establish his at HITS come together to work on a Wienhard (GRG), Michael Strube own junior group at HITS, “Stellar shared topic. The participating (NLP), and their groups that investi- Evolution Theory” (SET), which began groups have the opportunity to hire gates the use of non-Euclidean in January 2021. researchers as part of the HITS lab geometries within natural language who – in turn – are jointly supervised processing (NLP, see Chapter 2.6). We were thrilled to see independent by the respective group leaders. reviewers recognize the achievements Despite the pandemic and contact of HITS researchers in their respec- The first project to emerge from the restrictions of all kinds, there is much PD Dr. Wolfgang Müller Dr. Gesa Schönberger tive fields, and we continue to work initial considerations of the HITS Lab to report from HITS for 2020, and the (Scientific Director / Institutssprecher) (Managing Director / Geschäftsführerin) on the further development of HITS. was launched toward the end of 2019, next few years promise to continue An essential characteristic of the when Frauke Gräter (MBM) and to be fruitful. We expect to see Institute is its interdisciplinarity. From Michael Strube (NLP) together with projects and results that will take full In every Annual Report, the HITS From the very beginning, it was keep the HITS team spirit high. This the very moment of its founding, Vera Nünning (from the English advantage of the diversity of HITS management describes the major critical to pass on the ever-changing team spirit remained despite the three clusters of research emerged Department at Heidelberg University) and lead to the development new achievements of the preceding public information on the implemen- problems that most of us had and and have been continually strength- – working within the framework of a ideas. As you can also see, Scientific calendar year. At the beginning of tation and easing of restrictions – continue to have with the pandemic. ened: We have groups that develop project at the “Marsilius Kolleg” at Director Frauke Gräter, who assumed 2020, in line with previous years, we which was initially only available in The most important part of keeping and apply methods in the life sci- Heidelberg University – asked, “Does office at the beginning of 2021, will wrote a text about communication at German – internally and on a weekly spirits high, however, is played – as ences, groups that make astronomi- the quality of writing influence continue to be heavily involved – not the Institute and the role that its basis (and in English). As we have always – by the HITSters themselves cal observations and simulations, and scientific impact?” Around the same only in the management of HITS, but environment and the physical proximi- come to learn, the emails on the thanks to their interest, motivation, groups that work in a method-cen- time, Michael Strube (NLP), Wolfgang also in HITS Lab projects. We look ty of the workspaces play in fostering corona situation that were produced trust, and good will. HITS is a special tered, cross-disciplinary way. Müller (SDBV), and colleagues ob- forward to all that is to come. solidarity and joint research at HITS. and circulated by the HITS communi- place to work, even when working tained funding from the BMBF for the cations team (see Chapter 4) were from home. Within the individual fields, collabora- Just a few weeks later, however, also passed on to colleagues at other tion is relatively easy. Researchers came the first corona lockdown in institutes due to their thoroughness This positive attitude was also from the life sciences, for example, Germany, which posed a special and usefulness. reflected in our research: Some HITS share a common language. However, challenge to all of us as well as to groups participated in corona-related collaboration across disciplinary internal communications. We were Beginning in March, all events were research (see Chapters 2.3, 2.4, 2.8, boundaries is more challenging. able to deal with the new require- canceled and soon switched to digital and 2.11), while others took advan- While there is already ongoing ments very thoroughly and without formats, which were even better tage of the time to publish a consid- interdisciplinary work at HITS, we aim major setbacks because almost all attended than their pre-corona erable number of papers. In addition, to give the opportunity addressing scientists and non-academic staff non-digital counterparts. The online numerous third-party funding approv- this challenge more intensively. One were able to work well from home. At end-of-the-year celebration was also als were granted, including no less tool that can be used to help improve the same time, it was important to us a great success. We believe that our than three ERC grants: one each for collaboration is called the “HITS Lab.” to maintain the HITS spirit and diligent communication in years past HITS group leaders Frauke Gräter remain as transparent as possible. as well as during the crisis helped (MBM) and Saskia Hekker (TOS) as

4 5 1 Think beyond the limits!

Saskia Hekker (TOS) sowie für den bisher als Gastwissenschaftler am HITS tätigen Fabian Schneider, der die Mittel aus seinem ERC Starting Grant dafür verwendet, am HITS ab 2021 seine eigene Juniorgruppe „Stellar Evolution Theory“ (SET) aufzubauen.

Die Anerkennung unabhängiger Gutachter/-innen für die Leistungen der HITS-Forscher/-innen in ihren Spezialgebieten sehen wir mit großer Freude – und arbeiten zugleich weiter an der Fortentwicklung des HITS. Ein wesentliches Merkmal des Instituts ist seine Interdisziplinarität. Schon Gegen Ende des Jahres 2019 lief das Wienhard (GRG) und Michael Strube bei der Gründung waren drei Richtun- erste Projekt an, das aus den ersten (NLP) mit ihren Gruppen an nicht- gen erkennbar: Wir haben Gruppen, Überlegungen zum Thema HITS Lab euklidischen Geometrien für Lernauf- die Methoden in den Lebenswissen- entstanden war: „Does the quality of gaben in der natürlichsprachlichen schaften entwickeln und anwenden. writing influence scientific impact?“ Datenverarbeitung (siehe Kapitel 2.6). Wir haben Gruppen, die Astronomie fragten Frauke Gräter (MBM) und beobachtend und simulierend betrei- Michael Strube (NLP) gemeinsam mit Trotz Pandemie und Kontaktein- ben, und wir haben Gruppen, die Vera Nünning (Anglistisches Seminar schränkungen jeglicher Couleur gibt methodenzentriert, gebietsübergrei- der Universität Heidelberg) im Rah- es aus dem HITS für 2020 viel zu In jedem Jahresbericht berichtet die HITS-Spirit zu wahren und möglichst Jahren davor, ebenso wie die Kom- fend arbeiten. men eines Projektes am Marsilius- berichten, und man sieht jetzt schon, Institutsleitung über das vergangene transparent zu sein. Wichtig war es munikation während der Krise, Kolleg der Universität Heidelberg. dass die nächsten Jahre bunt wer- Jahr. Anfang 2020 schrieben wir von Anfang an, die sich ständig geholfen haben, den HITS-Team- Innerhalb der einzelnen Gebiete ist Etwa zeitgleich haben Michael Strube den. Wir erwarten viele interessante bezogen auf die Erfahrungen aus den ändernden und zu Beginn nur in Spirit auf hohem Niveau zu halten. das Zusammenarbeiten vergleichs- (NLP), Wolfgang Müller (SDBV) und Projekte und Resultate, in denen wir Vorjahren einen überzeugten Text deutscher Sprache zur Verfügung Dies gilt trotz der Probleme, die die weise einfach. Forschende aus den Mitarbeiter/-innen das Projekt „Deep- die Vielgestaltigkeit des HITS nutzen über Kommunikation im Institut, stehenden öffentlichen Informationen meisten von uns mit dieser Situation Lebenswissenschaften zum Beispiel Curate“ beim BMBF eingeworben und neue Ideen entwickeln. Wie Sie sowie über die Rolle, die die Umge- über Restriktionen und Wiedererlaub- hatten und haben. Am wichtigsten haben eine gemeinsame Sprache, (siehe Kapitel 2.9 und 2.11). Die auch sehen können, ist die ab 2021 bung des HITS und die kurzen Wege tes wöchentlich intern (und auf hierfür sind aber immer noch die doch die Zusammenarbeit über wissenschaftliche Idee kam auch hier amtierende Institutssprecherin Frauke für den Zusammenhalt und die Englisch) zu kommunizieren. Die von HITSter selbst, ihr Interesse, ihre Disziplingrenzen hinweg bleibt eine aus der HITS Lab-Initiative. Gräter sehr stark engagiert – nicht gemeinsame Forschung am HITS der HITS-Kommunikation (siehe Motivation, ihr Vertrauen und ihr Herausforderung. Dieser will sich nur in der Leitung des HITS, sondern spielen. Kapitel 4) erstellten Rundmails zur guter Wille. HITS ist ein besonderer HITS in Zukunft noch intensiver Im Jahr 2020 haben zwei weitere auch bei den HITS Lab-Projekten. Wir Corona-Situation werden, wie wir Arbeitsplatz, auch im „Homeoffice.“ stellen, ein Werkzeug hierzu heißt HITS Lab-Projekte begonnen: Zum freuen uns darauf. Doch schon wenige Wochen später hören, an Kolleg/-innen anderer „HITS Lab“. einen das Projekt „Emulation in kam der erste Corona-Lockdown in Institutionen weitergereicht, weil sie Diese positive Einstellung schlug sich Simulation", eine Zusammenarbeit Deutschland und damit eine besonde- so gut und nützlich sind. auch in der Forschung nieder: Einige Das HITS Lab ist ein internes Förder- von Frauke Gräter (MBM), Fritz Röpke re Herausforderung für uns alle. Und HITS-Gruppen haben sich an Corona- programm für Projekte, in denen sich (PSO) und Tilmann Gneiting (CST). auch eine Herausforderung für die Ab März wurden alle Veranstaltungen motivierten Forschungsarbeiten mindestens zwei Gruppen aus Hier geht es darum, durch geschick- interne Kommunikation. Wir sind sehr abgesagt und bald auf digitale beteiligt (siehe Kapitel 2.3, 2.4, 2.8 unterschiedlichen Disziplinen am ten Einsatz von Techniken des konsequent mit den neuen Notwen- Formate umgestellt. Diese digitalen und 2.11), andere haben die Zeit HITS zusammenfinden, um ein maschinellen Lernens als sogenannte digkeiten umgegangen. Dies konnten Formate waren sogar besser besucht genutzt, um viel zu publizieren. gemeinsames Thema zu bearbeiten. Emulatoren, Teilresultate zu schätzen wir tun, weil nahezu alle Wissen- als die nicht-digitalen Formate vor Darüber hinaus gab es zahlreiche Die beteiligten Gruppen haben die und so den Rechenaufwand zu schaftler/-innen ebenso wie das der Corona-Zeit. Auch die online- Drittmittel-Bewilligungen, unter Möglichkeit, dafür Mitarbeiter/-innen reduzieren (siehe Kapitel 2.4 und nicht-wissenschaftliche Personal gut Jahresendfeier war ein großer Erfolg. anderem über sage und schreibe drei einzustellen, die wiederum von zwei 5.1.1). Und zum anderen das Projekt von zuhause arbeiten konnten. Wir haben das Gefühl, dass unsere ERC Grants für die HITS-Gruppenlei- Gruppenleiter/-innen gemeinsam „Geometry and Representation Zugleich war es uns ein Anliegen, den sorgsame Kommunikation in den terinnen Frauke Gräter (MBM) und betreut werden. Learning“ – hier arbeiten Anna

6 7 Probabilistic flux variation methods have relied on fitting galaxy the FVG on a probabilistic basis in gradient templates and modeling the order to endow it with the ability to 2 Research host-galaxy profile via high-resolu- (i) account for uncertainty in the flux We have been working on a probabi- tion images. The flux-variation-gradi- measurements, (ii) jointly take all listic reformulation of the flux-varia- ent (FVG) method does not require photometric bands into account 2.1 Astroinformatics (AIN) tion-gradient method with the goal images of high spatial resolution when inferring the intersection point, of disentangling the roles played by and instead simply makes use of and (iii) produce a distribution – as active galactic nuclei (AGN) and fluxes measured at different photo- opposed to a point estimate – of their host galaxies in photometric metric bands. However, FVG does where the intersection point is likely reverberation mapping. This work require (i) a constant-in-time contri- to be located. will serve as a precursor study bution of the host galaxy (including before we embark on developing non-varying emission lines), (ii) a Our probabilistic reformulation more-powerful models that consider varying AGN contribution, (iii) an comprises two steps. The first step the actual physics that underlie the empirically derived linear relationship involves identifying the total flux line generation of the observed light of fluxes in different photometric (dashed line in Fig. 1) as the first curves. Such models will help us not bands, and (iv) knowledge of the principal component obtained via only in disentangling the photomet- colors of the host galaxy in question. probabilistic principal component ric contributions of AGNs and their analysis (PPCA). In contrast to host galaxies but also in shedding The FVG can be best understood classical principal component light on the physical properties of geometrically, as explained in Figure 1. analysis, PPCA incorporates an these systems, such as their black- In this simple geometric view, the explicit noise model that allows us hole mass and accretion rate. goal of FVG is to find the intersec- to account for the presence of noise tion point between the line of the in the observed data. Furthermore, AGNs contain large black holes in vector (termed the “galaxy vector”; by adopting a Bayesian perspective, their centers and produce a great green in Fig. 1) that expresses the we can work out a distribution of deal of energy, which renders them hypothesized colors of the host likely first principal components, among the most-luminous objects in galaxy and the line (dashed in Fig. 1) which means that we can identify a the Universe. Due to their extremely on which the total flux (galaxy plus set of likely lines on which the Group Leader Scholarship holder compact size with respect to their AGN) observations lie (pink in Fig. 1). observed fluxes may lie. This is Dr. Kai Polsterer Erica Hopkins host galaxies and to their usually Inferring this intersection point illustrated in Figure 2 for the case of three observed band filters. Staff members Student assistant Dr. Nikos Gianniotis (staff scientist) Fenja Kollasch The second step of our approach Dr. Antonio D’Isanto involves identifying the intersection Dr. Jan Plier (since June 2020) point. The current FVG method identifies the intersection point as the intersection of a single line that In recent decades, computers have come to revolution- around active super-massive black holes in the centers passes through the total flux obser- ize astronomy. Advances in technology have given rise of galaxies. Driven by these scientific challenges, we vations (pink points in Fig. 2) and to new detectors, complex instruments, and innovative develop new methods and tools and share them with the line implied by the galaxy vector telescope designs. These advances enable today’s the community. From a computer-science perspective, (e.g., u in Fig. 2; in green). In our astronomers to observe more objects than ever before we focus on time-series analyses, sparse-data prob- approach, as illustrated in Figure 2, and at higher spatial, spectral, and temporal resolu- lems, morphological classification, the proper evalua- we have a distribution of lines as tions. In addition, the possibility to observe astroparti- tion and training of models, and the development of Figure 1: Sketch of FVG: On the right-hand side, we see observations from bands i (top) and j opposed to a single line. Hence, we cles and gravitational waves along with previously explorative-research environments. These methods and (bottom) measured at three different time instances. By pairing flux values of co-occurring need to clarify what it means to observations, we form points (in pink) in the flux-flux plot (left-hand side) that fall on a dashed untapped wavelength regimes is now granting tools will prove critical to the analysis of data in large search for an intersection point line. Vector bij (in brown) corresponds to the unobserved host galaxy and defines line bij• • x, more-complete access to the Universe. upcoming survey projects, such as SKA, Gaia, LSST, which intersects the dashed line at x0. The FVG method consists of finding the intersection of between a line (defined by the The Astroinformatics group deals with the challenges and Euclid. u • x and the dashed line, where u (in green) is the so-called galaxy vector that is assumed to galaxy vector) and a density of lines: of analyzing and processing such complex, heteroge- Our ultimate goal is to enable scientists to analyze the have the same direction (i.e., the same colors) as the unobserved bij. The intersection we seek is a point neous, and large datasets. Our scientific focus in ever-growing volume of information in a bias-free relatively large distance from Earth, directly informs us of the different along the line implied by the galaxy astronomy is on evolutionary processes as well as manner. AGNs appear as point sources and photometric flux contributions of the vector (green in PF2) that receives extreme physics in galaxies, as found, for example, are difficult to spatially resolve in AGN vs. the galaxy. Based on this considerable support by the density photometric observations. Previous view, we worked on reformulating of lines. This view leads to a distri-

8 9 2.1 Astroinformatics (AIN) In this context, the AIN group at from the field of machine learning an archive that contains spectral HITS joined the ESCAPE project, a combined with a simple graphical information from heterogeneous large European collaboration whose user interface that allows for com- sources. This prototype is meant to aim is to tackle the new challenges pressed visualization, inspection, and demonstrate how we can overcome created by data-driven research in interaction with spectral observations. the need to download a complete astronomy and astroparticle physics Dimensionality-reduction methods are dataset in order to find a few interest- while maintaining a joint focus on used to construct a low-dimensional ing sources. This new concept of data dealing with complex data work- space into which high-dimensional access and interaction could become flows, infrastructural issues, and data items are projected with the aim an additional standard for accessing data- and software interoperability. of projecting similar objects close to the next generation of archives (see Our contribution was the develop- one another. This method enables us Figure 3).

Figure 2: Probabilistic FVG in action: The pink dots represent observations in three photometric bands; the green line corresponds to the so-called galaxy vector, which is hypothesized to have the same colors as the host galaxy. Our method infers a distribution of lines that likely go through the observations (pink dots). The cyan tube contains the distribution of these lines. We plot three samples of such lines in black and note that the uncertainty of the tube grows with increasing distance from the observed data. The yellow dots display the distribution of possible intersection points – that is, the “intersection” of the green line with the density of lines that go through the observed data. bution of likely intersection points access and analysis possible. The Modern practice has come to adopt Figure 3: The prototype developed to explore the spectra in the HARPS archive. The overview panel allows for interaction with the ML models (yellow dots in Fig. 2) and hence to astronomical community has made machine-learning solutions to deal and for selecting data (upper left). The characteristics of the spectra at the selected coordinate and in the selected area are shown in the three corresponding spectral plots (center). Histogram functions for subset selection as well as an overplot of spectral classes are shown at the a distribution of likely relative a large effort to fulfill these needs with a wide range of analysis tasks bottom. VO tools – such as Topcat (for inspecting the selected sources as tables) and Aladin (for inspecting the corresponding images) – are photometric contributions by the and provide the required infrastruc- used in answering astronomical integrated through VO standards (right). AGN vs. the host galaxy. ture. In so doing, the role of the questions. However, these solutions International Virtual Observatory require efficient access to and ment of new explorative access to browse massive datasets ordered In addition to multiple similarity Infrastructure for exploring Alliance (IVOA) has been essential processing of a vast number of data methods for spectroscopic data by structural similarities in order to measures, several dimensionality-re- astronomical spectra as the IVOA defines standards and in order to train models and obtain taken from the ESO archive. Instead find classes, outliers/anomalies, and duction models beyond a plain ensures a proper discussion on how reliable results. Some data centers of following current data-retrieval scientifically relevant objects. For the autoencoder were implemented Data archives have a long history of to make infrastructures, software, are currently working on concepts practices, we used explicit search prototype, we chose an autoencoder (Principal Component Analysis, use in astronomy. In the past, the and data interoperable and to how such as “bringing code to the data” criteria and positionally indexed as the main dimensionality-reduction Gaussian Process Latent Variable main preservation media were follow concepts like the FAIR data in order to overcome bottlenecks queries to come up with a novel model: an unsupervised neural Model, variational autoencoder, photographic plates and catalogs. principles. However, the current related to data transfer. Other search paradigm based on structural network that is able to compress convolutional autoencoder). Further- Since the beginning of the digital services of data providers reveal approaches focus, for example, on information as well as on data original data into a low-dimensional more, the prototype is connected to era, however, digital data archival severe limitations in functionality in data visualization, predefined analy- similarity. In other words, we built a representation and to generate a the most-important VO tools (Aladin, has been a major topic in astrono- the context of the Big Data regime. sis tasks, or queries about pre-ex- prototype that allows implicit and reconstruction from this low-dimen- Topcat, Splat) in order to obtain my. Advances in instrumentation The data deluge has rendered tracted features that provide a explorative access to data, including sional space. further information on the sources and dedicated data-intense survey traditional access- and analysis compressed representation of the searches by similarity. and allow additional interactive telescopes have led to an increasing techniques unfeasible with respect original data. Two datasets were selected as use features. The user can explore the demand for novel data infrastruc- to the size of modern archives. We reached our goal by utilizing cases: HARPS, an archive that in- projected space by similarities, make tures that make efficient data dimensionality-reduction methods cludes stellar spectra only, and UVES, selections, create new catalogs, and

10 11 2.1 Astroinformatics (AIN) ture individually at the single-cell level. The ability to extract interpre- Computational cardiology table features will enable us to determine features relating to a Due to our interest in analyzing specific substance, which is neces- massive datasets as well as in sary to ensure reliable application in morphologically analyzing galaxies at cardiology. We therefore use select- large scales, we are also involved in a ed engineered features, automatical- medical project dealing with morpho- ly learned features, and compressed logical data at microscopic scales representations as the basis for a (see Chapter 6, “Informatics4Life”). balanced comparison, for example, Our role in this project involves between pure classification perfor- building a pipeline that processes mance and scientific robustness. the captured images and extracts First experiments with respect to Figure 4: The long-term goal of the project presented as a flow chart. The current stage uses features that describe the morpho- unsupervised machine learning are special substances as surrogates for the diseases. The pipeline – including the pre-processing logical structure of each captured currently underway (for the long- phase – has already been implemented. cell. Subsequently, these extracted term goal of the project, see Fig. 4). import and export data. It is possible on a screen. With such a configura- features will be used to build a to load pre-trained models, to retrain tion, the model mainly learns to robust and stable diagnostic tool. In order to create high-resolution them, and to select between differ- reconstruct the spectral continuum, This project does not deal with data images in microscopy, small ent loss functions. which is directly connected to from astronomy, but rather with high-resolution patches are scanned temperature and therefore also to image data from a microscope and then stitched back together. Inspecting the projections of the the spectral classes. instead of a telescope. Both con- However, in microscopy, stitching is HARPS dataset yielded a clear cepts share similar problems and challenging since there might not be sequence of structural similarities Our future plans include the possibil- challenges, and it is therefore worth enough content to reliably calculate that corresponds to the main stellar ity of transferring the prototype to a transferring solutions from one field the necessary features that allow spectral classes, which was con- Jupyter Notebook/Lab package, to the other. the adjacent frames to be precisely firmed by checking the spectral making it available to the communi- registered. Homogeneous back- classification from Simbad and ty, and creating a web service for Instead of extracting morphological ground illumination as well as proper over-plotting it as color. This result some specific archives in order to features from galaxies, we are using focusing are essential for subse- was expected because the autoen- show the potential of this new and our profound knowledge in this area quent tasks, such as segmentation. coder had been configured to project efficient approach to making scien- to develop a machine-learning The same challenges exist in astron- the spectra in a 2-D space and to tific data accessible. approach to detect single cells and omy, and some can be solved allow the projections to be visualized describe their morphological struc- through special observational techniques, such as the so-called drift-scan technique. We have In den letzten Jahrzehnten hat der Einsatz von Computern die Astronomie stark beeinflusst. Der technologische Fort- adapted some common calibration schritt ermöglichte den Bau neuer Detektoren und innovativer Instrumente sowie neuartiger Teleskope. Damit können techniques from astronomy to Astronomen nun mehr Objekte als je zuvor mit bisher unerreichtem Detailreichtum, sowohl räumlich, spektral als auch microscopy and have already zeitlich aufgelöst beobachten. Hinzu kommen neue Beobachtungsmöglichkeiten durch z.B. Astroteilchen sowie Gravita- managed to increase the resulting tionswellen, die neben bisher nicht beobachtbaren Wellenlängenbereichen ein vollständigeres Bild des Universums bieten. image quality (see Fig. 5), thereby Die Forschungsgruppe Astroinformatik (AIN) beschäftigt sich mit den Herausforderungen, die durch die Analyse und leading to improved background Verarbeitung dieser komplexen, heterogenen und großen Daten entstehen. In der Astronomie beschäftigen uns die illumination and improved sharpness Figure 5: Comparison of the image quality gained by applying different pre-processing steps Fragestellungen im Bereich der Galaxienentwicklung sowie die extremen physikalischen Vorgänge, wie man sie z.B. in der of the imaged cells. We utilize a (before: above; after: below). Note the improvement in background illumination (red) and in sharpness (green). Umgebung von aktiven supermassereichen schwarzen Löchern in den Zentren von Galaxien findet. Auf diesen Fragestel- modified version of BaSiC for the lungen basierend, entwickeln wir neue Methoden und Werkzeuge, die wir frei zur Verfügung stellen. In der Informatik liegt background- and shading correction unser Interesse hierbei auf der Zeitreihenanalyse, dem Umgang mit spärlichen Daten, der morphologischen Klassifikation, based on low rank and sparse der richtigen Auswertung und dem richtigen Training von Modellen sowie explorativen Forschungsumgebungen. Diese decomposition. Image sharpness Werkzeuge und Methoden sind eminent wichtig für aktuelle und sich gerade in der Vorbereitung befindenden Projekten, could be improved by using focus wie SKA, Gaia, LSST und Euclid. stacking, which requires multiple Unser Ziel ist es, einen möglichst unvoreingenommenen Zugang zu dieser enormen Menge an Information zu gewähr- observations with different focus leisten. positions.

12 13 Accurate models of gas insights, they are challenged by the with experimental results from the adsorption on graphene periodic nature of the graphene literature for the CO -graphene 2 Research 2 substrates and the broad variability system. These results represent an Christopher Ehlert, Anna Piras, and of the adsorption complex geome- important milestone for the CCC 2.2 Computational Carbon Ganna Gryn’ova tries. group and form the basis of the In order to develop a reliable protocol methodological approaches used in Graphene is a two-dimensional sheet for simulating graphene-based gas our ongoing and future studies on Chemistry (CCC) of sp2 carbon atoms arranged in a sensors, we benchmarked a range of graphene chemistry. Even more honeycomb lattice. This material theoretical procedures against strikingly, certain properties of the displays a number of tantalizing available experimental data for the investigated adsorbates were found

electronic and optical properties adsorption of carbon dioxide (CO2) on to be independent of the model- and pertinent to its applications in nano- a pristine graphene surface. Simula- method choice. Specifically, the electronic devices. Among these tions across a range of models – relative stabilities between the

devices, graphene-based gas sensors from infinite periodic graphene to its different CO2 adsorption sites on are perhaps the most well-developed finite clusters of varying sizes – that pristine graphene were found to be

and utilized in practice. These sen- capture diverse CO2 adsorption generally independent of the density sors rely on a change in electric geometries and employ methods functional and the cluster size. properties triggered by (non-)covalent based on density functional theory Moreover, to arrive at an accurate

A B C

Group Leader Visiting scientist Dr. Ganna Gryn’ova Dr. Michelle Ernst (SNSF Scholarship, since August 2020) Staff members Dr. Christopher Ehlert Project student Anna Piras Juliette Schleicher (Heidelberg University, May–July 2020) Scholarship holder Oğuzhan Kucur

Modern functional materials combine structural complexity and its derivatives interacted with small molecules via with targeted performance and are utilized across many physi- and chemisorption and in simulations in which areas of industry and research, from nanoelectronics to metal-organic frameworks encapsulated small molecules in Figure 6: A Circular models of pristine graphene (the color code reflects the cluster size: smallest adsorption energy for a realistic large-scale production. Theoretical studies of these materi- their pores. Accurate approaches to energy and property in red, medium in red and green, and largest in red, green, and blue). B Various orientations periodic graphene sheet, we estab- (bridge, hollow, and top) of CO adsorbed on benzene. C The interaction energies of CO with als bring mechanistic underpinnings to light, facilitate the computations were identified via benchmarking against 2 2 lished a simple linear fit. With such graphene as a function of the inverse number of carbon atoms in the underlying surface model design and pre-screening of candidate architectures, and available experimental and high-level in-silico data. The at the PBE-D3 level of theory, as well as the linear regression of these data points. an extrapolation, interaction energies ultimately enable predictions of the physical and chemical utility of various tools for visualizing and analyzing interac- for an artificial, infinitely large properties of new systems to be made. tions in these complex systems was validated. interactions between the gas mole- and wavefunction theory were graphene model could be obtained The Computational Carbon Chemistry (CCC) group uses In terms of applications, the design guidelines for cule adsorbate and the graphene- performed (Figure 6A, B). Through within 1 kJ mol–1 accuracy using any theoretical and computational chemistry to explore and graphene-based sensors for nitroaromatic pollutants and based sensing surface. A profound these simulations, we identified quantum-chemical method that is exploit diverse functional organic and hybrid materials. In its for nanographene electrocatalysts for oxygen reduction understanding of adsorbate-surface theoretical procedures for obtaining applicable to finite cluster models 2nd year at HITS, the group focused on establishing reliable reaction were elucidated, with candidate architectures interactions is crucial for improving geometries and interaction energies (Figure 6C). yet computationally feasible protocols for the structural being pre-screened using the established computational existing sensors and developing new accurately and at an acceptable exploration and property quantification of various systems. protocols. These findings lay the groundwork for the in ones with targeted selectivities and computational cost compared with Diverse methods of density functional theory and wavefunc- silico development of new and improved functional carbo- sensitivities. While accurate in silico the prohibitively expensive gold tion theory were tested in simulations in which graphene naceous materials and hybrid organic–inorganic materials simulations are indispensable for standard of computational chemistry with targeted sensing, catalytic, and electronic behavior. gaining the necessary mechanistic (the coupled cluster method) and

14 15 detection separation 2.2 Computational Carbon Chemistry (CCC) the electron affinities (EAs) of the two drug catalysis quantum theory of atoms in molecules (QTAIM), the non-co- catalysts supports these findings: The delivery valent interactions (NCI) index, and the density overlap Nanographene catalysts for catalysts for ORR that had previously computed EA of system II was higher regions indicator (DORI), were applied to enable visualization oxygen reduction reaction been tested experimentally (I and II in than that of system I, which strength- and further analysis of the interactions between the frame- Figure 7A). The chemisorption of ened our argument on the anionic Metal-Organic Frameworks work and the guest molecules (Figure 9B). Our results pore sizeaccessibility Christopher Ehlert, Anna Piras, molecular oxygen on these substrates, nature of the active catalytic state. MOFs for emphasize the critical role of model choice and corroborate applications Juliette Schleicher, and Ganna which is the first step in the catalytic These results highlight the important solvent à Rational the understanding that, for an appropriately chosen model, Adsorption design Gryn’ova cycle, was simulated in several theoretical implications for modeling à Tailored the various tested tools provide invaluable insights into the properties electronic states using dispersion-cor- the ORR with electrocatalysts and physical nature, spatial localization, and energetic strength Host-guest Oxygen reduction reaction (ORR) lies rected density functional theory. First, suggest that electron affinity is as a interaction of the interactions between MOFs and small molecules analysis at the heart of sustainable energy-con- results obtained to date highlight the simple yet powerful descriptor of the Ab-initio à Tuneable inside their pores. These findings lay the groundwork for the computation interactions version technologies, such as fuel cells importance of accounting for the catalytic activity of nanographenes in à Structure- • Energy decomposition design and pre-screening of future MOFs without relying on property analysis analysis and metal-air batteries. However, its electrode double layer in the simula- ORR. • Electron density analysis pre-existing – and currently rather rare – high-resolution • Periodic calculations slow kinetics necessitates the use of tions. Specifically, adsorption on the • Cluster calculations experimental structures. electrocatalysts. Recently, metal-free neutral catalyst was found to be Tools for analyzing and visu- heteroatom-doped carbon catalysts energetically unfavorable. However, alizing non-covalent interac- Figure 8: A roadmap for experimentally and A B have emerged as more efficient, since the catalyst itself is adsorbed on tions in metal-organic computationally exploring MOFs. stable, low-cost, earth-abundant the negatively charged cathode, an frameworks of density functional theory in addition QTAIM alternatives to conventional Pt-based electron transfer under applied bias to symmetry-adapted perturbation systems and have demonstrated can potentially lead to the anionic Michelle Ernst and Ganna Gryn’ova theory were assessed for computing excellent performance in ORR. Howev- catalyst state. Indeed, constrained the interaction energies of the com- er, the chemical complexity of such potential energy scans for negatively Metal-organic frameworks (MOFs) are plexes of a selected MOF with two NCI index DORI periodic systems precludes both charged catalysts revealed a region of porous crystalline hybrid organic–inor- biologically active molecules in precise mechanistic analysis and an attractive chemisorption (Figure 7B). ganic materials that consist of regular- conjunction with periodic and finite elucidation of the structure–activity Of the two experimentally tested ly connected nodes and linkers, have cluster models (Figure 9A). Further- relationships. This challenge is eradi- catalysts, I was found to be inactive in high internal surface areas and low more, various energy decomposition Figure 9: A Crystal packing of the studied MOF with a 4,4’-bipyridine guest molecule, view in c cated in smaller derivatives (nanog- ORR while displaying stronger densities, and are able to host small schemes were employed to elucidate direction. One guest molecule and the nearest water molecule within the framework are shown with balls and sticks, while the corresponding hydrogen bonds between them are denoted with raphenes, also called graphene chemisorption in our simulations guest molecules. Due to their highly the physical nature of these interac- dashed lines. Inlet: the chosen cluster model for each complex. B Visualization of the results nanoflakes or graphene quantum tunable composition, , and tions. Next, a number of electron from various schemes for density partitioning. physico-chemical properties, MOFs are density partitioning tools, including the A B being increasingly utilized for gas storage and separation, drug delivery, Moderne funktionale Materialien kombinieren strukturelle Komplexität mit zielgerichteter Performance und werden in ver- (photo-)catalysis, and biological schiedenen Bereichen von Industrie und Forschung verwendet, von der Nanoelektronik bis hin zur Massenfertigung. Theoreti- imaging (Figure 8). A rational design of sche Studien dieser Materialien fördern mechanistische Grundlagen zutage, erleichtern das Design und Vorsortieren von B new MOFs with targeted absorption Kandidaten und ermöglichen Vorhersagen zu physikalischen und chemischen Eigenschaften neu geschaffener Systeme. I N properties requires an in-depth theo- Die Forschungsgruppe Computational Carbon Chemistry (CCC) arbeitet mit den neuesten Methoden der theoretischen und retical understand- computergestützten Chemie, um funktionale organische und Hybrid-Materialien zu untersuchen und auszuwerten. In ihrem ing of their micro- zweiten Jahr am HITS lag der Forschungsschwerpunkt auf der Erstellung zuverlässiger und gleichzeitig rechnerisch durch- scopic building führbarer Protokolle für die strukturelle Untersuchung und die Quantifizierung von Eigenschaften verschiedener Systeme. II blocks and the Unterschiedliche Methoden der Dichtefunktionaltheorie und Wellenfunktionstheorie wurden in Simulationen getestet, bei interactions denen Graphen und seine Derivate mit kleinen Molekülen via Physi- und Chemisorption interagierten, sowie in Simulationen, B between these in denen metallorganische Gerüstverbindungen kleine Moleküle in ihren Poren einschließen. Hochgenaue Ansätze zur building blocks. Berechnung von Energie und Eigenschaften wurden mittels Benchmarking gegen verfügbare experimentelle und high-level N In-Silico-Daten bestimmt. Die Nützlichkeit verschiedener Tools zur Visualisierung und Analyse von Interaktionen in diesen Figure 7: A Investigated nanographene electrocatalysts for the ORR. B Computed potential To address this komplexen Systemen wurde validiert. energy surfaces of O chemisorption on catalyst II and the structure of the energy minimum 2 challenge, we tested the applicability on the MS = ½ surface (PBE-D3 level of theory). and validity of various quantum-chemi- Im Bereich Anwendungen wurden die Designrichtlinien für graphenbasierte Sensoren für nitroaromatische Gefahrstoffe und dots), which display superior catalytic compared with experimentally active II, cal tools for analyzing and visualizing für nanographenbasierte Elektrokatalysatoren für die Sauerstoffreduktionsreaktion untersucht. Dabei wurden Kandidaten performance. thereby highlighting the subtle balance the strength and physical nature of the anhand etablierter computergestützter Protokolle vorsortiert. Die Forschungsergebnisse bilden die Grundlage für die In-Silico-

To identify the physical underpinnings between O2 activation and product non-covalent interactions in the MOF Entwicklung neuer und verbesserter funktionaler kohlenstoffhaltiger und hybrider organisch–anorganischer Materialien mit of this performance, we focused on desorption, in accordance with the host–guest complexes across periodic gezielten sensorischen, katalytischen und elektronischen Eigenschaften. two N,B-codoped nanographene Sabatier principle. Further analysis of and finite-size scales. Several methods

16 17 What happend in the lab in Our recurring highlight, the summer Introduction 2020? school on Computational Molecular 2 Research Evolution on Crete, for which Alexis The term “computational molecular In the winter of 2019/2020, Alexis, again served as main organizer in the evolution” refers to computer-based Ben, Alexey, and Pierre taught the 12th year and which was scheduled to methods of reconstructing evolution- 2.3 Computational “Introduction to Bioinformatics for take place in May 2020, was post- ary trees from DNA or – for example Computer Scientists” class at the poned until 2021 due to the pandemic. – from protein- or morphological data. Karlsruhe Institute of Technology Moreover, the 13th iteration of the The term also refers to the design of Molecular Evolution (CME) (KIT). As in previous years, we re- summer school, which was scheduled programs that estimate statistical ceived highly positive teaching to take place in Hinxton, UK, in 2021, properties of populations – that is, evaluations from the students (with a was postponed until 2022. Overall, the programs that disentangle evolution- learning quality index of 100 out of crisis management worked well as the ary events within a single species. 100; see http://cme.h-its.org/exelixis/ decision to postpone the event was The very first evolutionary trees were web/teaching/courseEvaluations/ made sufficiently early. inferred manually by comparing the Winter19_20.pdf). morphological characteristics (traits) Alexis was listed on the Clarivate of the species under study. Today, in During the summer semester of 2020, Analytics list of highly cited research- the age of the molecular data ava- we again taught our main seminar, ers for the fifth year in a row as well lanche, the manual reconstruction of “Hot Topics in Bioinformatics.” as for the third consecutive year under trees is no longer feasible. Evolution- Our teaching activities were heavily the new “cross-field” category, which ary biologists thus have to rely on affected by the current pandemic. The comprises researchers with a focus computers and algorithms for phylo- seminar in the summer was carried on interdisciplinary research (see genetic and population-genetic out entirely online. In addition, the vast Chapter 9.5). analyses. majority of the oral exams for the Following the introduction of so-called class from winter 19/20 were also The year was dominated by the short-read sequencing machines conducted online. In general, the pandemic. Overall, the transition to (machines used by biologists in the Group Leader Students transition to pure online teaching was working online was relatively easy as wet lab to extract DNA data from Prof. Dr. Alexandros Stamatakis Dimitri Höhler comparatively easy and unproblemat- online conferencing and collaboration organisms), which can generate over Ivo Baar (until April 2020) ic. Nonetheless, students at KIT tools – such as slack, github, overleaf, 10,000,000 short DNA fragments Staff members Johanna Wegmann (until March 2020) appear to now be tired of the online etc. – had already been in use long (each containing between 30 and 400 Dr. Alexey Kozlov (staff scientist) Julia Schmid (as of December 2020) teaching and miss having social before the pandemic began, and lab DNA characters), the community as a Benjamin Bettisworth Paul Schade (until November 2020) contact with others on the university members were already acquainted whole now faces novel challenges. Benoit Morel Lukas Hübner (until June 2020) campus. with online supervision. We also One key problem that needs to be Lukas Hübner (from October 2020) established a Friday coffee break addressed is the fact that the number Pierre Barbera Lukas Hübner, Dimitri Höhler, and Paul conference call to maintain some of molecular data available in public Sarah Lutteropp Schade all successfully defended their form of social life at the lab. databases is growing at a significantly master's theses at the Department of Computer Science at KIT. The supervi- The Computational Molecular Evolution group focuses on In the following section, we outline our current research sion of their theses was also conduct- developing algorithms, models, and high-performance activities, which lie at the interface(s) between computer ed online to a very large extent. computing solutions for bioinformatics. We focus mainly on science, biology, and bioinformatics. The overall goal of the We are happy that Lukas Hübner • computational molecular phylogenetics group is to devise new methods, algorithms, computer joined the lab as a PhD student in • large-scale evolutionary biological data analyses architectures, and freely available/accessible tools for October 2020. He is co-supervised by • supercomputing molecular data analysis and to make them available to and co-financed along with Prof. Peter • quantifying biodiversity evolutionary biologists. In other words, we strive to support Sanders, head of the algorithm • next-generation sequence data analyses research. One aim of evolutionary biology is to infer evolu- engineering group at the Institute for • scientific software quality & verification tionary relationships between species and the properties of Theoretical Informatics at KIT. individuals within populations of the same species. In In 2020, a total of four KIT master’s Secondary research interests include modern biology, evolution is a widely accepted fact and that students joined the lab either as • emerging parallel architectures can be analyzed, observed, and tracked at the DNA level. As student programmers – to work on • discrete algorithms on trees evolutionary biologist Theodosius Dobzhansky’s famous and their master’s theses – or as PhD • population genetics widely quoted dictum states, “Nothing in biology makes students. Figure 10: Cost of sequencing a human genome over time in comparison with the cost of sense except in the light of evolution.” computing according to Moore’s law (source: National Human Genome Research Institute).

18 19 2.3 Computational Molecular Evolution (CME) faster rate than the computers that total amount of energy-to-solution Together with our lab alumni Dora that the entire set be summarized and structure of the consensus phylogeny pandemic – cannot be determined are capable of analyzing the data can required. Serdari, Pavlos Pavlidis, and Lucas used for any downstream analyses and the virus classification (see Fig. 11). with confidence using either the keep up with. Overall, phylogenetic trees (evolution- Czech as well as all current staff and interpretations of the results. most-closely known bat- or pangolin In addition, the cost of sequencing a ary histories of species) and the members of the lab and virus-evolu- Despite the weak signal, we found Importantly, this study also found that coronaviruses or the first human genome is decreasing at a faster rate application of evolutionary concepts in tion experts from Greece and Cyprus, that our approach of summarizing the current tools for likelihood-based SARS-CoV-2 genomes from the than is the cost of computation, general are important in numerous we set forth to explore the difficulties information in the plausible tree set phylogenetic inference have substan- Wuhan region. although the curve seems to have domains of biological and medical of analyzing the evolution of these does yield helpful information. For tial numerical difficulties when applied been flattening out in the last 3–4 research. Programs for tree recon- challenging data. Apart from the years (see Fig. 10). struction that have been developed in scientific outcome, frequent video our lab can be deployed to infer conferences during the first lockdown We are thus faced with a scalability evolutionary relationships among also helped to prevent social isolation challenge – that is, we are constantly viruses, bacteria, green plants, fungi, and were generally just fun. trying to catch up with the data mammals, etc. – in other words, they Using a snapshot of the available avalanche and to make molecular are applicable to all types of species. whole-genome data on 5 May, we data-analysis tools more scalable with In combination with geographical and investigated the difficulties of analyz- respect to dataset sizes. At the same climate data, evolutionary trees can be ing the data, which are due to uneven time, we also want to implement more used – inter alia – to disentangle the sampling/sequencing across coun- complex and hence more realistic and origin of bacterial strains in hospitals, tries, to a large variation in se- compute-intensive models of evolu- to determine the correlation between quence-data quality across countries, tion. the frequency of speciation events and to the generally very low mutation (species diversity) and past climatic rate of the virus, which renders the To address the scalability challenge, changes, and to analyze microbial phylogenetic tree reconstruction we have recently begun to investigate diversity in the human gut. challenging. mechanisms for improving the fault tolerance (with respect to network- Finally, phylogenies play an important We found that the phylogenetic signal and processor failures) of large role in analyzing the dynamics and in the data is generally very weak (as parallel scientific software tools that evolution of the current SARS-CoV-2 expected) and that it is therefore not run concurrently on thousands of pandemic and in conducting local sufficient to reconstruct a single cores by example of RAxML-NG, our contact tracing. To that end, some of phylogenetic tree because the likeli- tool for phylogenetic inference. our activities also focused on contri- hood surface is extremely rugged. In Another novel line of research in this buting to the analysis of the vast other words, a large number of area is our new focus on making such number of SARS-CoV-2 virus genomes phylogenetic trees (evolutionary large computational codes more that have been sequenced, which now hypotheses about the evolutionary his- energy-efficient. Again, we conduct total approximately 225,000. We tory of the virus) can be found that research in this domain by example of describe some of these activities in exhibit similar likelihood scores – that RAxML-NG as it is the most widely greater detail in the following sections. is, trees, that explain the data equally used and most scalable Bioinformat- well and that we cannot distinguish ics tool developed in our group. Phylogenetic analysis of using standard statistical significance Hence, it also generates the largest SARS-CoV-2 data is difficult! tests for phylogenetics. The key

CO2 footprint. Initial experiments have problem is that these equally plausible Figure 11: Consensus tree constructed from the plausible tree set of SARS-CoV-2 data revealed that using fewer cores for The genomic data on SARS-CoV-2 trees exhibit substantial topological available on 5 May, annotated by the virus-subtype classification. The different subtypes and their names are shown in the bar on the right. the computations and reducing the exhibit several properties that render differences – that is, despite having clock frequency of the cores (as our their phylogenetic analysis difficult. In similar likelihood scores, the actual instance, we mapped a current virus to such data, for instance, when computations are predominantly a project that emerged ad hoc in tree topologies can be vastly different subtype classification onto the estimating the branch lengths of the memory-bandwidth-bound – i.e., the spring 2020 [Morel, 2020], we decided from one another. To alleviate this consensus tree constructed by trees or when using more complex cores waste cycles/energy by waiting to investigate whether it is possible to problem, we introduced the concept summarizing the tree topologies in the statistical models of evolution. In addi- to retrieve data from the main memo- reliably reconstruct large phylogenetic of a ‘plausible tree set’ that contains plausible tree set, and we observed a tion, we found that the root of the tree ry) can substantially decrease the trees on SARS-CoV-2 genome data. all equally likely trees and proposed substantial concordance between the – that is, the starting point of the

20 21 2.3 Computational Molecular Evolution (CME)

The Serratus Project viruses, such as the delta viruses or (e.g., including/excluding errors, the bacteriophages. distinct rates of evolution, etc.). Our PhD student Pierre Barbera also Based on a total of 19,400 simulated became involved in a second project CellPhy - a tool for inferring datasets, our results reveal that associated with a freely available single-cell phylogenies CellPhy is robust with respect to the open-source cloud-based computa- different error types and sources of tional analysis tool called Serratus that Another project related to the applica- uncertainty and that it outperforms was designed primarily for studying tion of phylogenetic methods to public competing methods under realistic coronavirus data [Edgar, 2020]. Pierre health problems is the development of simulation scenarios with respect to integrated the EPA-NG tool for phylo- the CellPhy tool [Kozlov, 2020]. both tree accuracy and speed. genetic placement of anonymous CellPhy uses maximum likelihood tree We also applied CellPhy to a new sequence data that he developed reconstruction methods to infer the empirical single-cell genome dataset while working on his thesis as well as evolutionary history of individual cells. for colorectal cancer and to three methods for standard phylogenetic By using single-cell sequencing, it is previously published empirical data- inference into Serratus. possible to obtain, for instance, sets. Figure 12 displays two trees that individual and potentially altered cell have been reconstructed on the The overall goal of Serratus is to genomes from different types of genomes of healthy cells and tumor provide the infrastructure and methods cancer cells in a single patient. cells from different parts of the body required to carry out sequence-similari- Single-cell sequencing thus represents of two colorectal cancer patients (A ty-based searches with the goal of a revolution in modern biology. and B). identifying closely related sequences to a set of query sequences, such as all Beyond cancer, understanding this In the context of the emerging disciple Figure 12: Sub-figure (A) CellPhy tree for 24 colorectal cancer-cell genome sequences. The distinct shapes and colors at the tips of the tree genes from all available coronavirus genomic diversity within a single living of single-cell sequencing-data analy- denote the respective cell type. A healthy cell is depicted by a blue circle, whereas the cancer cells are depicted by other colors and shapes depending on the type of cancer cell and its location in the body of the patient. Sub-figure (B) CellPhy tree for 86 colorectal cancer-cell genome sequences (this collection of all genes organism has applications in several sis, Alexis and Alexey also contributed sequences. Healthy cells are denoted by dark purple circles and light purple squares. Tumor cells of different types are denoted by light orange from all subtypes/strains of an areas of biology due to its connection to a review paper that summarizes diamonds and dark orange triangles. organism/virus is also commonly with development and aging process- the 11 main challenges faced by denoted as the pangenome) in huge es as well as with genetic diseases. single-cell data science [Lähnemann, collections of public sequencing data. The inference of phylogenetic trees on 2020]. In this context, EPA-NG was deployed this novel type of data goes hand in to taxonomically classify sequences hand with several challenges. For that might belong to the broad example, we needed to substantially spectrum of coronaviruses. adapt our statistical models of Die Gruppe rechnerbasierte Molekulare Evolution In diesem Bericht beschreiben wir unsere For- nucleotide evolution in order to (CME) beschäftigt sich mit Algorithmen, Modellen und schungsaktivitäten. Unsere Forschung setzt an der More specifically, Serratus was used account for the idiosyncrasies of dem Hochleistungsrechnen für die Bioinformatik. Schnittstelle zwischen Informatik, Biologie und to search and align the coronavirus single-cell genome data. In general, Unsere Hauptforschungsgebiete sind: Bioinformatik an. Unser Ziel ist es, Evolutionsbiologen pangenome against publicly available these data tend to be noisier than • Rechnerbasierte molekulare Stammbaumrekonstruk- neue Methoden, Algorithmen, Computerarchitekturen sequencing data containing 5.6 ‘normal’ DNA data used for standard tion und frei zugängliche Werkzeuge für die Analyse petabases of data (i.e., 5.6 x 1015 phylogenetic inference and also • Analyse großer evolutionsbiologischer Datensätze molekularer Daten zur Verfügung zu stellen. Unser nucleotides). contain more – as well as several • Hochleistungsrechnen grundlegendes Ziel ist es, Forschung zu unterstützen. distinct types of – sequencing errors. • Quantifizierung von Biodiversität Die Evolutionsbiologie versucht die evolutionären The computational analysis using To that end, we developed dedicated • Analysen von “Next-Generation” Sequenzdaten Zusammenhänge zwischen Spezies sowie die Serratus helped to identify thousands statistical models for the evolution of • Qualität & Verifikation wissenschaftlicher Software. Eigenschaften von Populationen innerhalb einer of coronavirus- and coronavirus-like single-cell sequencing data, including Spezies zu berechnen. genomes and genome fragments in an error model as well as a method to Sekundäre Forschungsgebiete sind unter anderem: In der modernen Biologie ist die Evolution eine the reference data, including known account for uncertainty in the input • Neue parallele Rechnerarchitekturen weithin akzeptierte Tatsache und kann heute anhand subtypes as well as – more impor- data. This novel model was imple- • Diskrete Algorithmen auf Bäumen von DNA analysiert, beobachtet und verfolgt werden. tantly – putatively novel types. Finally, mented in the standard RAxML-NG • Methoden der Populationsgenetik. Ein berühmtes Zitat in diesem Zusammenhang the Serratus approach is highly tree search algorithm and tested on stammt von Theodosius Dobzhansky: „Nichts in der generic – that is, it can also be used simulated single-cell sequence data Biologie ergibt Sinn, wenn es nicht im Licht der to unravel putative novel types in other under a large number of scenarios Evolution betrachtet wird”.

22 23 2 Research General news There is no doubt that 2020 was a es,” addresses critical needs in evaluat- al regression and forecast evaluation. most unusual year for all of us. The ing earthquake forecasts and also Alexander’s support facilitates and 2.4 Computational year was shaped by a pandemic that deals with epidemiological forecasts in invigorates our research in major ways has changed our lives in unprecedent- the interval format at the COVID-19 and has already borne substantial fruit ed ways. The CST group reacted Forecast Hub. at this early stage, which will be Statistics (CST) quickly to the pandemic and contribut- detailed in the report for the coming ed to the global effort on COVID-19 Timo Dimitriadis was offered a presti- year. forecasting in various ways, including gious new position as Assistant but not limited to the development of Professor in Applied Econometrics at In continuation of a valued tradition, an the German–Polish COVID-19 Forecast Heidelberg University’s Alfred Weber integral aspect of our work is charac- Hub. In parallel, we maintained an Institute of Economics, where he is terized by intense and frequent disci- intense and fruitful interdisciplinary currently based. Timo continues to be plinary and interdisciplinary scientific collaboration and exchange with affiliated with HITS as a visiting exchange. Before this exchange had to meteorologists at our home university, scientist. Incidentally, in this new be shifted almost exclusively to an the Karlsruhe Institute of Technology position, Timo is the successor to online format beginning in March 2020, (KIT), and at the European Centre for another HITS and CST alumnus, Fabian we had organized an interdisciplinary Medium-Range Weather Forecasts Krüger. event on the HITS premises in a joint (ECMWF) in Reading in the United endeavor with the Molecular Biome- Kingdom. Facets of this research are Our visiting scientist Sebastian Lerch chanics (MBM) and Physics of Stellar detailed in respective sections below. received a highly competitive major Objects (PSO) groups. As reported on For the early-career researchers in our four-year award in the “MINT for the in Section 5.1.1, “Emulator Day” took group, 2020 was a particularly reward- Environment” program of the Vector place on 27 January 2020. Emulators ing year marked by much success and Foundation to pursue research on are simple yet smart and computation- many accomplishments. "Artificial Intelligence for Probabilistic ally efficient surrogate models that Weather Forecasting" at the cutting complement more-complex, potentially Johannes Bracher completed his PhD edge of the interface between mathe- prohibitively computationally expensive Group Leader Visiting scientists work in Epidemiology and Biostatistics matics, computer science, and meteo- physics-based computer simulations. Prof. Dr. Tilmann Gneiting Dr. Johannes Bracher (since May 2020) at the University of Zürich in Switzer- rology. Sebastian is currently building a In the course of 2020, we strengthened Dr. Timo Dimitriadis (since October 2020) land in spring 2020 and began a junior research group at our home the partnership between the three Staff members Dr. Sebastian Lerch postdoc position in Melanie Schienle’s university of KIT and maintains close groups, and PhD students Kiril Maltsev Dr. Jonas Brehmer (since June 2020) Eva-Maria Walz group at KIT, paired with an appoint- ties to HITS. and Kai Riedmiller joined HITS to work Dr. Timo Dimitriadis (until September 2020) ment at HITS. Immediately following on mutual projects. While Kai and Kiril Dr. Alexander I. Jordan (staff scientist since July 2020) Student his appointment, Johannes was Patrick Schmidt moved to a postdoc are based in the MBM and PSO groups, Johannes Resin Daniel Wolffram (since September 2020) included in KIT’s competitive Young position at the University of Zürich in respectively, we were pleased to Patrick Schmidt (until March 2020) Investigator Group (YIG) Preparation Switzerland, which began in April 2020. additionally welcome them to the CST Program and received start-up funding In July, we were pleased to welcome group. toward a junior research group. Alexander Jordan back into the group. The Computational Statistics group at HITS was estab- developing both the theoretical foundations for the Johannes also took the lead in devel- After completing PhD studies at HITS lished in November 2013, when Tilmann Gneiting was science of forecasting and cutting-edge methodology in oping the German–Polish COVID-19 and KIT, Alexander moved to an appointed group leader and Professor of Computational statistics and machine learning, notably in connection Forecast Hub and has contributed to academic position at the University of Statistics at the Karlsruhe Institute of Technology (KIT). with applications. While weather forecasting and collabo- related efforts worldwide. Bern in Switzerland. Back at HITS, in The group’s research is focused on the theory and prac- rative research with meteorologists continue to represent addition to expanding his research on tice of forecasting. As the future is uncertain, forecasts prime examples of our work, we have also addressed Jonas Brehmer completed his PhD statistics and machine learning, should be probabilistic in nature, which means that they challenges raised by the pandemic in 2020 by establish- work in Mathematics at the University Alexander has been tackling important should take the form of probability distributions over ing collaborative relationships with epidemiologists, of Mannheim and successfully defend- long-term tasks in the CST group, such future quantities or events. Accordingly, over the past creating the German–Polish COVID-19 Forecast Hub, ed his thesis in early December 2020, as the continued development and several decades, we have been witnessing a trans-disci- developing methods for epidemiological ensemble graduating with top honors. His PhD maintenance of software tools that plinary shift of paradigms from deterministic- or point forecasts, and contributing to similar efforts worldwide thesis, “Theory and Methodology of enable researchers across disciplines forecasts to probabilistic forecasts. The CST group seeks while placing methodological emphasis on the generation Scoring Functions: Tail Properties, to apply our methodological solutions, to provide guidance and leadership in this transition by and evaluation of epidemiological ensemble forecasts. Interval Forecasts, and Point Process- particularly in the areas of distribution-

24 25 2.4 Computational Statistics (CST) Weekly reported cases, Germany, 1 wk ahead

Weekly reported cases, Germany, 1 wk ahead epiforecasts−EpiExpert 50% PI epiforecasts−EpiNow2 95% PI

250000 FIAS_FZJ−Epi1Ger LANL−GrowthRateepiforecasts−EpiExpert 50% PI epiforecasts−EpiNow2 KIT−time_series_baseline 95% PI

250000 KITCOVIDhub−median_ensembleFIAS_FZJ−Epi1Ger LANL−GrowthRate

100000 KIT−time_series_baseline KITCOVIDhub−median_ensemble observed

0 (ECDC) 100000 observed

0 Nov 01 Dec 01 (ECDC)Jan 01

Nov 01 time Dec 01 Jan 01

time Weekly reported deaths, Germany, 1 wk ahead

Weekly reported deaths, Germany, 1 wk ahead 6000 epiforecasts−EpiExpert 50% PI epiforecasts−EpiNow2 FIAS_FZJ−Epi1Ger 95% PI

6000 LANL−GrowthRateepiforecasts−EpiExpert 4000 50% PI epiforecasts−EpiNow2 KIT−time_series_baseline 95% PI KITCOVIDhub−median_ensembleFIAS_FZJ−Epi1Ger LANL−GrowthRate 4000 2000 KIT−time_series_baseline KITCOVIDhub−median_ensemble observed

0 (ECDC) 2000 observed

0 Nov 01 Dec 01 (ECDC)Jan 01

Nov 01 time Dec 01 Jan 01 Figure 13: Screenshot from the interactive web application (https://kitmetricslab.github.io/forecasthub/) showing median forecasts and 95% Figure 15: Forecasts one week in advance for cases (top) of and deaths (bottom) from COVID-19 in Germany from six selected models. We show time forecast intervals for deaths in Germany from selected models at one and two weeks in advance, as issued on 22 February 2021. 50% and 95% prediction intervals overlaid with data from the ECDC.

In what follows, we describe the Johannes Bracher is a member of the the various individual forecasts into a development of the German–Polish US Hub team, and group leader simple ensemble forecast that treats COVID-19 Forecast Hub at KIT and Tilmann Gneiting serves on its Ensem- its members equally. All these types of Weekly reported cases, Germany HITS. Afterward, we move to collabora- ble Advisory Board. Both Johannes and forecasts are publicly available in an 200000 tive research within the “Waves to Tilmann are coauthors on preprints online repository (https://github.com/ Weather'' consortium and report on a that report on results and experiences KITmetricslab/covid19-forecast-hub-de) 150000 recent comprehensive study on the from the US Hub [Ray et al., 2020, and can be explored interactively in a 100000 quality of precipitation forecasts for the https://www.medrxiv.org/content/10.110 dashboard, which is intended for a 50000 tropics. 1/2020.08.19.20177493v1; Cramer et al., general audience and displays median 2021, https://www.medrxiv.org/content/ forecasts along with forecast intervals. 0 German–Polish COVID-19 10.1101/2021.02.03.21250974v1]. Figure 13 illustrates an example for 2020−01−04 2020−03−28 2020−06−20 2020−09−12 2020−12−05 Forecast Hub deaths in Germany. In the German and Polish Forecast Weekly reported deaths, Germany A highlight of our work in 2020 was the Hub, we collaborate with more than a On 8 October 2020, we deposited a establishment of the German and dozen modeling teams from Germany, forecast-study protocol at the registry Polish COVID-19 Forecast Hub (https:// , Switzerland, the United King- of the Open Science Foundation (OSF) 6000 kitmetricslab.github.io/forecasthub/) in dom, and the United States, who that defined a study period and 4000 a joint endeavor with Melanie Schien- provide weekly forecasts of confirmed procedures for a prospective fore- le’s group at the Chair of Econometrics cases of and deaths from COVID-19 at cast-evaluation study. As Figure 14 2000 and Statistics in the Department of prediction horizons of up to four weeks illustrates, the specified study period 0 Economics at KIT with support from in advance in real time using a stan- spanned 10 weeks – from early the Signale Team at the Robert Koch dardized format that consists of October to mid-December – and 2020−01−04 2020−03−28 2020−06−20 2020−09−12 2020−12−05 Institute. The Forecast Hub is run in predictive quantiles at a total of 23 coincided with the rise of the second Figure 14: Weekly reported new cases (top) of and deaths (bottom) from COVID-19 in Germany according to data from the European Centre for close exchange with the US COVID-19 distinct levels. In addition, we produce COVID-19 wave in Germany. On 2 Disease Prevention and Control (ECDC). The study period – from early October to mid-December 2020 – is highlighted in gray. Forecast Hub (https://covid19forecast- and provide a range of baseline November, a national semi-lockdown hub.org/), to which we also contribute. forecasts ourselves, and we aggregate was imposed, followed by a full

26 27 2.4 Computational Statistics (CST) Toward better precipitation In order to generate weather forecasts, undisputed successes, ensemble forecasts for the tropics weather centers draw on highly systems continue to exhibit systematic a) sophisticated numerical models that errors, such as biases (the predictions In Europe and North America, reliable are run on supercomputers in real time are systematically too high or too low) Case forecasts, Germany Death forecasts, Germany weather20 forecasts have been taken for and produce point forecasts of future and dispersion errors (the output is granted for decades. For many tropical atmospheric states. In a strong move systematically too concentrated or too

Models: countries,0 accurate forecasts of rainfall toward probabilistic forecasts, these spread out), as illustrated in Figure 18

1000 epiforecasts−EpiExpert areLatitude unavailable despite the critical role efforts have been transformed through (see following page) for 1-day accumu-

100000 epiforecasts−EpiNow2 FIAS_FZJ−Epi1Ger this rainfall plays in the effective the operational implementation of lated rainfall (see the following page). It ITWW−county_repro −20

50000 MIT_CovidAnalytics−DELPHI

500 management of key resources, such as ensemble systems since the 1990s. An is therefore common practice to KIT−time_series_baseline KITCOVIDhub−median_ensemble water, food, health, and renewable ensemble forecast consists of multiple statistically postprocess the NWP KIT−extrapolation_baseline −100 −50 0 50 100 150 LeipzigIMISE−SECIR energy. In a recently published joint simultaneous runs of numerical-weath- output, and we use a state-of-the-art 20000 UCLA−SuEIR Longitude USC−SIkJalpha study with meteorologists at KIT [Vogel er-prediction (NWP) models, which ensemble-model-output-statistics 200

mean weighted interval score or AE mean weighted interval score or AE mean weighted b) KIT−baseline et al., 2020], we assessed the useful- differ from one another in terms of the (EMOS) technique for this postprocess- 10000

ness20 of predictions of 1- to 5-day two major sources of uncertainty, ing. The respective statistical parame-

5000 accumulated precipitation from leading namely the initial state of the atmo- ters need to be estimated from training 1 2 3 4 1 2 3 4 weather centers. Due to the sparsity of sphere and the mathematical represen- data based on forecast-observation forecast horizon forecast horizon weather0 stations over large parts of tation of the respective physical pairs from the past. Latitude Figure 16: Mean weighted-interval score (WIS) achieved by various models for predictions of cases and deaths in Germany. Results are stratified by the tropics, the study relies on pseu- processes. We focus on results for the prediction horizon. For deterministic models, the mean absolute error (AE) is shown. The lower end of the gray area corresponds to the perfor- do-observations−20 of rainfall from world-leading 52-member ensemble As a benchmark, we use the concept mance of the KIT-baseline model. Line segments within the white area thus indicate that a model outperforms the baseline. satellites. operated by the ECMWF. Despite their of probabilistic climatology with rainfall

−100 −50 0 50 100 150 lockdown from 16 December on. absolute error (AE) in the case of a Longitude Figure 15 shows forecasts one week in deterministic model that issues point An important question that we address advance for cases and deaths in forecasts only. Figure 16 provides an in ongoing work is whether ensemble Germany over the study period. While overview of results for Germany by forecasts could be improved by 20 some forecast models account for target and forecast horizon. For cases, performance-dependent weightings of interventions and changes in testing which are a more immediate measure members or by other types of statisti- 0 strategies, many groups prefer to use of COVID-19 activity than deaths, hardly cal postprocessing, as are ubiquitously Latitude purely data-driven modeling strategies, any approach succeeded in outper- applied in weather forecasting. While which may entail delayed adaptation forming the naïve baseline forecast achieving such improvement was −20 and explains why numerous models – beyond a prediction horizon of two challenging in 2020 given the limited including the ensemble forecast – weeks. However, for deaths, both the forecast history and rapid changes in −100 −50 0 50 100 150 Longitude showed overshoot in the first half of ensemble forecast and a majority of the epidemic situation, approaches of November, when cases began to the individual models outperformed the this type may prove successful in 2021 plateau. For the final week of our study baseline at prediction horizons of up to as the German and Polish Forecast 20 period, most models considerably four weeks in advance. Hub continues to compile short-term underestimated deaths, perhaps due to model forecasts and to process them prior age-specific shifts in incidence Overall, we found that achieving good into ensemble forecasts. The ongoing 0 Latitude rates. Interestingly, the models differ probabilistic forecasts is challenging in vaccine rollout will challenge models BSS from one another with respect not only a dynamic epidemic situation, and we with a new layer of complexity, and we −20 to their point (median) forecasts but observed pronounced heterogeneity will extend our study into these future −1 0 1 also to the implied uncertainty, as between the different forecasts, with a phases, thereby contributing to a −100 −50 0 50 100 150 reflected by the 50% and 95% forecast general tendency toward overconfident rapidly growing body of evidence on Longitude intervals. forecasts, as evidenced by overly the potential and limitations of epide- narrow prediction intervals. However, miological forecasts. Figure 17: Brier skill score (BSS) for probability forecasts of 5-day rainfall accumulations of greater than 50 mm based on the raw (top) and The forecasts were evaluated using the collaborative forecasting is beneficial, statistically postprocessed (bottom) ECMWF ensemble relative to probabilistic climatology from 2009 to 2017. A skill score that is positive or negative corresponds to better or worse performance, respectively, than the reference forecast. Source: Vogel et al., 2020. mean weighted-interval score (WIS), and our ensemble forecast displayed which is a theoretically coherent good – albeit not outstanding – rela- measure that reduces to the mean tive performance.

28 29 2.4 Computational Statistics (CST)

Figure 18: Calibration of 1-day ECMWF raw ensemble forecasts for precipitation accumulation over the tropics from 2009 to 2017. The probabili- ty-integral-transform (PIT) histograms correspond to the climate regions indicated with color shading and with the correspondingly colored arrows. For a calibrated forecast (i.e., a forecast that provides valid uncertainty quantification), the PIT histogram is uniform. The U-shaped histograms indicate that the raw ensemble is systematically too concentrated. Source: Vogel et al., 2020. accumulations from a temporal gy in many parts of the tropics – a Die Forschungsgruppe Computational Statistics (CST) am HITS besteht seit November 2013, window centered on the considered striking finding that remains true als Tilmann Gneiting seine Tätigkeit als Gruppenleiter sowie Professor für Computational day of the year. This reference forecast independently of accumulation time Statistics am Karlsruher Institut für Technologie (KIT) aufnahm. Der Schwerpunkt der For- represents the static, climatological and weather center. Statistical postpro- schung der Gruppe liegt in der Theorie und Praxis der Vorhersage. distribution of rainfall at a given cessing entails significant improve- location and time of year but does not ment of the raw model output in many Im Angesicht unvermeidbarer Unsicherheiten sollten Vorhersagen die Form von Wahrschein- incorporate dynamic information about regions, but not for most of tropical lichkeitsverteilungen über zukünftige Ereignisse und Größen annehmen. Dementsprechend the state of the atmosphere. The raw Africa, where performance remains erleben wir seit nunmehr einigen Jahrzehnten einen trans-disziplinären Paradigmenwechsel and postprocessed ECMWF ensemble sobering. von deterministischen oder Punktvorhersagen hin zu probabilistischen Vorhersagen. Ziel der forecasts are clearly expected to CST Gruppe ist es, diese Entwicklungen nachhaltig zu unterstützen, indem sie theoretische outperform probabilistic climatology. The fact that precipitation forecasts Grundlagen für wissenschaftlich fundierte Vorhersagen entwickelt, eine Vorreiterrolle in der Putting this expectation to the test, are poor for large parts of the tropics is Entwicklung entsprechender Methoden der Statistik und des maschinellen Lernens einnimmt Figure 17 (see previous page) shows a strong demonstration of both the und diese in wichtigen Anwendungsproblemen, wie etwa in der Wettervorhersage, zum the predictive performance of probabili- complexity of the underlying forecast Einsatz bringt. ty forecasts for 5-day rainfall accumu- problem and the need for vigorous lations of greater than 50 mm in terms scientific development. Future work In diesem Zusammenhang pflegen wir intensive Kontakte und Kooperationen mit Meteo- of the Brier score. A negative or should refine statistical postprocessing rolog/-innen am KIT und am Europäischen Zentrum für mittelfristige Wettervorhersagen positive skill score corresponds to a methods, run NWP models at higher (EZMW). Über kollaborative Projekte mit Epidemiolog/-innen, den Aufbau des deutsch-polni- forecast that is inferior or superior, spatial resolution and with improved schen COVID-19 Forecast Hubs und die Unterstützung von ähnlichen Projekten weltweit respectively, to the reference. Surpris- representations of physical processes, stellen wir uns durch die Pandemie ausgelösten neuen Herausforderungen. Unsere besondere ingly, the raw ensemble forecast and merge NWP-based- and data-driv- Aufmerksamkeit gilt dabei der Erzeugung und Bewertung von epidemiologischen Ensemble- underperforms probabilistic climatolo- en statistical forecasting techniques. vorhersagen.

30 31 Machine learning for with as much distance between them features influence the target in predicting employee as possible. Artificial neural networks artificial neural networks, partial 2 Research absences are composed of layers of connected derivatives that represent variations nodes through which information of the output with respect to small In order to remain competitive and travels from the input layer (the first changes in the inputs are obtained. 2.5 Data Mining and Uncer- effective in the market, companies layer) to the output layer (the last These derivatives are then ranked in seek to reduce costs and maximize layer) via one or more (hidden) layers. order to determine the degree to profit. Employee absenteeism, The connections between nodes have which the model is sensitive to each tainty Quantification (DMQ) whether justified or not, is a crucial weights that adjust as learning feature. factor in reaching this goal. Within proceeds through several traversals the KIPROSPER project, we study this of the layers. Artificial neural net- Results reveal that artificial neural issue via machine-learning tech- works perform tasks without pro- networks have the best accuracy niques to gain insights into employee grammed rules or prior knowledge (77%), followed by random forests absences. Specifically, we aim to and instead automatically learn from (72%) and support vector machines predict absences given employee the examples or the training data that (62%). Artificial neural networks also data, such as demographics, medical they are given to process. take at least 6 times longer to train and behavioral history, self-reported and perform better in almost all test job satisfaction, and work-structure Prior to training the models, the cases, except when the training-data information. Furthermore, we analyze training data are normalized to a size is smaller, in which case random which of these employee data are common scale and then resampled forests have slightly better accuracy significant factors in this prediction. to synthesize additional samples in (1–2%). Random forests also yield order to balance the number of consistent results, with accuracy The task is treated as a classification per-class labels. The metric used for ranging from 68–74%. Support vector problem. Classification aims to performance measurement is accura- machines always perform worst approximate a mapping function cy (i.e., the ratio of correct predic- among the models. Results also from input-independent variables tions to total predictions). Cross-vali- show that the set of training features (features) to output-dependent dation is employed to generalize the significantly affects models’ perfor- Group Leader Visiting scientists variables (targets). In this study, the model evaluation. Experiments are mance. Highly correlated features Prof. Dr. Vincent Heuveline Saskia Haupt features are medical measurements, Sotirios Nikas (until June 2020) demographic data, work-structure Staff members Elaine Zaunseder (since October 2020) data, self-reported life- and work Dr. Philipp Gerstner Alejandra Jayme (since September 2020) characteristics, and health- or behav-

Alejandra Jayme (until August 2020) Aksel Alpay (since November 2020) ioral history, and the target is a label Demographic Structural Data Data Philipp Lösel corresponding to a range of the Dr. Chen Song Students number of absent days, which Medical Work Charlotte Boys (until September 2020) represents an absence-risk level. Data Survey Jonas Roller In this work, three machine-learning models are studied: random forests

The Data Mining and Uncertainty Quantification (DMQ) and uncertainty quantification – require a decidedly (RF), support vector machines (SVM), Company and Machine Learning Predict Absences group, headed by Prof. Dr. Vincent Heuveline, began its interdisciplinary approach to mathematical modeling, and artificial neural networks (ANN). Employee Data Analysis Analyze Reasons research in May 2013. The group works in close collab- numerical simulation, hardware-aware computing, Random forests construct a multi- oration with the Engineering Mathematics and Comput- high-performance computing, and scientific visualiza- tude of individually trained decision Figure 19: Machine learning for predicting employee absences. ing Lab (EMCL) at the Interdisciplinary Center for tion. trees. Their output is the label with Scientific Computing (IWR) at Heidelberg University, most votes from the trees. Overfitting designed to analyze the effect of contain the same information, and which is also headed by Vincent Heuveline. The DMQ In 2020, the DMQ group focused on research activities is minimized using different versions training-data size, model size, and the removing redundant features has group’s research focus lies in gaining knowledge from in the areas of uncertainty quantification and machine of the input data (bootstrap aggrega- number of features on the models’ little impact on the models’ accuracy. extremely large and complex datasets through data- learning for medical applications, mathematical oncolo- tion) and random subsets of features performance. The feature sets are Neural networks can also correctly mining technologies. Reliability considerations with gy, and the Biomedisa online platform for biomedical for each tree during training. Support varied by degree of correlation or by classify the change in the risk of respect to these datasets are addressed via methods of image segmentation. vector machines aim to construct a ranking according to a domain absence after switching the values of uncertainty quantification. Both fields – data mining hyperplane or a set of hyperplanes expert, feature importance, and the features to which the model is that separate the data into groups random ranking To analyze how input highly sensitive.

32 33 2.5 Data Mining and Uncertainty Quantification (DMQ) increasing the demand for accelerat- ed image analysis. Image segmenta- These findings make clear that such as MRI. In order to quantify how simulation. In a similar fashion, a tion, in particular, remains a major machine-learning methods can be these uncertainties affect the validity hierarchy of Smolyak sparse grids is bottleneck and is often the most used to successfully predict employ- of given quantities of interest, we considered that is obtained by labor-intensive and error-prone task ee absences and similar health-relat- developed a computational frame- varying the number of grid points in of 3D-image analysis. ed problems. A thorough data work that propagates the input uncer- the input-parameter space – that is, preparation with appropriate model tainties through a deterministic the number of deterministic evalua- In situations in which little a priori selection is however the needed model. From a mathematical point of tions. Next, we combine the gPC knowledge is available, fully automat- ingredient for reliable predictions. view, this framework is based on the expansions that are obtained for ic segmentation routines are less generalized polynomial chaos meth- high-accuracy sparse grids and feasible. In these cases, manual Uncertainty quantification od (gPC) and allows the arising low-accuracy deterministic simula- segmentation by an expert followed in a multi-physics heart random variables to be expanded in a tions with those obtained for low-ac- by morphological interpolation simulation series of orthogonal polynomials. The curacy sparse grids and high-accura- remains a very common approach. gPC discretization technique is cy simulations in a defect-correction Here, labels are assigned to various The contraction of a human heart combined with two additional algo- fashion. In this way, a favorable structures of interest with different can be described by a system of rithmic concepts in order to deal with relationship between accuracy and intervals inside the 3D volume partial differential equations that the so-called curse of dimen- computational effort is (depending on the complexity of the combine hydrodynamical, elastome- sion, which states that achieved. dataset), followed by an interpolation chanical, and electrophysiological the computational Figure 20 illustrates of the labels between the pre-seg- elements. Understanding the underly- effort for uncertain- the results of the mented slices. The underlying image ing physical processes allows various ty quantification previously de- data are usually not taken into cardiovascular diseases to be dia- (UQ) increases scribed UQ account, and the interpolation is gnosed and treated. In a joint project exponentially workflow for therefore based exclusively on the with the respective groups led by with the uncertain segmented slices. Consequently, only Christian Wieners, Bettina Frohnapfel, number of material a fraction of the real experimental and Olaf Dössel, all from the Karls- considered, parameters information is utilized to derive the ruhe Institute of Technology, we uncertain and segmentation. address the issue of quantifying parame- internal uncertainties that arise in the context ters. The blood Figure 21: Mean and standard deviation of wall thickness during heart contraction at two To ensure the proper 3D segmenta- of a multi-physics heart simulation. first pressure. prescribed locations. tion of complex samples based on concept in The figure the morphological interpolation of The studied physical model is based use is shows the model may be low, which must be From semi-automatic to pre-segmented 2D slices, dense on the nonlinear equations of contin- given by heart taken into account in practical fully automatic image pre-segmentation is required, some- uum mechanics and describes the the geometry scenarios (e.g., by a surgeon who segmentation with the times even slice-by-slice. The conven- displacement of the initial heart Smolyak at the time plans an operation based on simula- Biomedisa online platform tional approach to the manual geometry under the influence of a sparse-grid of maximal tion data). The plots in Figure 21 segmentation of 3D images is force that arises due to biochemical method, contraction reveal how the wall thickness of the Three-dimensional imaging is leading therefore often tedious and time-con- reactions in the cells of the heart. As which reduces during a single heart changes over time at the to progress in many scientific disci- suming, effectively impeding the this situation is highly patient-specific the number of heartbeat. Here, prescribed locations. At both loca- plines. The analysis of volumetric analysis of large numbers of datasets and it is often not possible to mea- required evalua- the color map tions, the muscle is stretched, as is medical and biological imaging data from samples with high morphologi- sure material parameters in living tions of the deter- encodes the standard indicated by negative values. At – for example, from X-ray computed cal variability, as required, for exam- organs with accurate precision, it is ministic model – that is, deviation of the solid Location 2, however, the correspond- tomography (CT), magnetic reso- ple, for digitizing scientific collections important to consider that the input the number of deterministic ing standard deviation is significantly nance imaging (MRI), or optical or for conducting quantitative studies data of the simulation are subject to simulations for a given set of input Figure 20: Standard deviation of displacement larger (in relation to the mean value) microscopy – often requires isolating on biodiversity. Furthermore, artifacts uncertainties. parameters. The second concept is a field during heart contraction. than at Location 1. This behavior individual structures from the 3D resulting from the morphological multilevel technique. Here, the idea is again illustrates the heterogeneous volume via segmentation. Ongoing interpolation of manually segmented Examples of these uncertain parame- to consider a hierarchy of determinis- displacement field. Regions of high influence of uncertain input parameters. improvements in imaging technolo- slices and subsequent correction ters include material parameters, the tic models that is obtained by chang- uncertainty (i.e., regions in which the This topic is embedded in the gies result in higher resolutions and (e.g., line artifacts, overly smooth blood pressure inside the heart ing the underlying discretization uncertain input parameters have a Informatics4Life project (see Chapter 6). faster acquisition times, thereby meshes, etc.) limit the quality of the ventricles, the alignment of muscle parameters, which determine the large effect on the displacement) are results. fibers, and the overall geometry that accuracy and computational effort plotted in red (Location 2). In these is obtained by imaging methods, for performing a single deterministic regions, the validity of the physical

34 35 2.5 Data Mining and Uncertainty Quantification (DMQ) With the goal of reducing the neces- The use of mathematical abrogated, and frameshift peptides detect microsatellite indel mutations sary effort required of a human oncology to support the – novel protein structures caused by with high sensitivity [Ballhausen et al., annotator when segmenting large and search for vaccines against shifts of the translational reading 2020]. In close collaboration, we complex samples of unknown compo- cancer frame – are generated. Such frame- identified mutations shared by most sition, we developed Biomedisa shift peptides can be recognized by MSI colorectal and endometrial (Biomedical Image Segmentation App; Cancer is caused by various types of the immune system, and MSI tumors cancers. Such a predictable set of https://biomedisa.org), an easy-to-use genomic instability. Among other may thus respond well to immune shared indel mutations is unique to open-source online platform that is instabilities, so-called DNA mismatch therapies. However, until now, it had MMR-deficient cancers and encour- accessible through a web browser and repair- (MMR-)deficient tumors not been known whether these ages the development of preventive does not require any software installa- account for about 15% of colorectal neoantigens occur randomly in MSI vaccine approaches. tion or maintenance when used online. cancers and up to 30% of endometri- cancers. Biomedisa aims to be a one-button al cancers. Notably, tumors that As part of the "Mathematics in Furthermore, we detected a negative solution to addressing the needs of develop in the context of Lynch Oncology" collaborative initiative with correlation between the prevalence of scientists without the need for com- syndrome, the most common inherit- the Department of Applied Tumor a defined indel mutation in MMR-defi- plex and tedious configuration or for ed cancer predisposition syndrome, Biology (ATB) at Heidelberg University cient colorectal or endometrial substantial computational expertise. are characterized by MMR deficiency. Hospital, we developed and used a cancers and the predicted immuno- Its semi-automatic segmentation is Approximately 1 in every 180 people novel algorithm to quantitatively genicity of the resulting frameshift based on a smart interpolation of peptide. Our study strongly supports A sparsely pre-segmented slices that MSI tumor evolution uncontrolled outgrowth the concept of continuous immu- takes into account the complete without immune surveillance of immunogenic tumor noediting in human cancers (Figure underlying image data. The method 24B), which means that the immune Figure 22 : Biomedisa examples. a Medaka fish with a segmented skeleton and selected works well without parameter optimi- B system monitors the tumor during its internal organs (based on μCT scan). b Mouse molar tooth showing enamel (white) and zation and was specifically developed MSI tumor evolution tumor with limited immunogenicity development and immediately dentine (yellow) (μCT). c Human heart with segmented heart muscle and blood vessels (MRI). under immune surveillance (“immunoediting”) to help massively parallel computer eliminates cancer cells with highly d Fossilized parasitoid wasp from Baltic amber (SR-μCT). e Tracheal system of a hissing cockroach (μCT). f Claw of a theropod dinosaur from Burmese amber (SR-μCT). g Fossilized architectures – such as graphics immunogenic neoantigens. The study parasitoid wasp preserved inside a mineralized fly pupa (SR-μCT). h Head of an Australian processing units (GPUs) – cope with C provides new evidence for the bull-ant queen (SR-μCT). (Lösel et al., Nat. Commun. 2020). MSI tumor evolution Elimination of precancerous lesions, constantly growing image sizes. with immune stimulation no tumor development hypothesis that immunogenic can- (e.g., via vaccination) cers and pre-cancer cell clones can Biomedisa can be used for different be attacked and potentially eradicat- Legend 3D-imaging modalities and various bio- tumor cells immune cells ed by the host’s immune system medical applications (Figure 22). Its (Figure 24C). This finding strongly

smart interpolation can vastly acceler- low high encourages the development of ate the segmentation process while Immunogenicity cancer-preventive vaccines that may simultaneously providing more-accu- Figure 24: Immune surveillance and immunoediting in mismatch repair-deficient, microsatel- help to reduce tumor risk in Lynch rate results than manual segmentation lite-unstable cancers. syndrome and potentially beyond. (Figure 23). may be affected by Lynch syndrome, which is associated with a high Die Forschungsgruppe Data Mining and Uncertainty Quantification (DMQ) Finally, the results of the semi-auto- lifetime cancer risk. Knowing whether unter der Leitung von Prof. Dr. Vincent Heuveline besteht seit Mai 2013. Sie matic segmentation can be used to a tumor is MMR-deficient may help in arbeitet eng mit dem „Engineering Mathematics and Computing Lab“ am train a deep neural network that planning treatment or predicting how Interdisziplinären Zentrum für Wissenschaftliches Rechnen der Universität enables fully automatic segmentation well the tumor will respond to treat- Heidelberg zusammen, welches auch von Vincent Heuveline geleitet wird. Im when segmenting a large number of ment. Fokus der Forschungsarbeit steht ein zuverlässiger und strukturierter Wis- Figure 23: Comparison of a conventional segmentation approach (top row) and Biomedisa similar structures, such as the brain of MMR-deficient cells lack the ability to sensgewinn aus großen, komplexen Datensätzen, der mittels Data- (bottom row): The conventional procedure requires 77 hours compared with 9 hours with insects or the human heart. As a recognize and correct small errors in Mining Technologien erreicht und mit Methoden der Uncertainty Quantifica- Biomedisa. Both procedures require manual pre-segmentation of the 3D-image stack. While the result, Biomedisa enables large-scale DNA during cell division, thereby tion validiert wird. Beide Themenfelder – Data Mining und Uncertainty widely used morphological interpolation solely considers labels on pre-segmented slices, Biomedisa’s random walk interpolation takes both the underlying 3D-image data and the comparative quantitative analyses to leading to the accumulation of an Quantification – erfordern Interdisziplinarität in den Bereichen mathemati- pre-segmented slices into account, resulting in significantly less required manual input. Moreover, be used to gain a deeper understand- exceptionally high load of mutations, sche Modellierung, numerische Simulation, hardwarenahe Programmierung, interpolation artifacts are avoided, and fine details – such as hairs, which are usually omitted ing, for example, of the mechanisms particularly insertions or deletions Hochleistungsrechnen und wissenschaftliche Visualisierung. 2020 wurde during manual segmentation – are included (Lösel et al., Nat. Commun. 2020). behind insect cognition, and to provide (indels) of single nucleotides at dazu in der Gruppe in folgenden Anwendungsbereichen gearbeitet: Uncertain- numerical simulations based, for repetitive sequences (microsatellite ty Quantification und maschinelles Lernen für medizinische Anwendungen, example, on a patient-specific heart instability, MSI). Whenever gene-en- mathematische Onkologie und die Biomedisa-Webanwendung zur Segmen- model that supports clinicians in their coding microsatellites are hit, the tierung biomedizinischer Bilddaten. surgical planning and decision-making. respective gene function may be

36 37 In our research group, we investigate various mathe- groups, which act as symmetries of these spaces. matical problems in the fields of geometry, , We also apply the study of groups and geometry to 2 Research and dynamics that involve the interplay between other sciences, such as mathematical physics or spaces – such as manifolds or metric spaces – and data science and machine learning. 2.6 Groups and Geometry Hyperbolic geometry consistent geometry in which the Hyperbolicity in metric parallel postulate fails. This revolu- spaces (GRG) Hyperbolic geometry is the star of tionary discovery had profound non-Euclidean geometries and consequences on mathematics, There are many equivalent approaches provides the model for negative physics, and even philosophy. to describing hyperbolic geometry and curvature. The long history leading to Hyperbolic geometry continued to the manifestation of negative curva- its discovery originates in Euclid’s play a prominent role through the ture. One such approach is to compare Elements, a monumental treatise of 20th century, culminating in Thur- the measurements of the lengths and mathematics written in ca. 300 BC ston’s geometrization program and angles in triangles – that is, trigonome- that is likely the most influential its completion in the 2000s, which try – in Euclidean versus hyperbolic textbook ever written. Euclid’s ap- solved the renowned Poincaré geometry. For instance, hyperbolic proach is axiomatic and constructive, conjecture, one of the seven Millenni- triangles with given side lengths are relying on five fundamental axioms, um Prize Problems. To this day, always thinner than their Euclidean which he calls postulates. For hyperbolic geometry and its deriva- counterparts (see Fig. 26). centuries, mathematicians ques- tives remain an intensely active field tioned Euclid’s fifth postulate, accord- of mathematical research. In this This type of observation led mathema- ing to which, given a line and a point year’s HITS report, we describe some ticians to define curvature in very not on it, there exists a unique of the main features and applications general metric spaces, which include parallel line of hyperbolic geometry in connection Riemannian manifolds, through the point with the research activities of the graphs, and more-exotic (see Fig. 25 for GRG group. spaces. This line of Group Leader Marta Magnani Euclid’s original thinking proved Prof. Dr. Anna Wienhard Dr. Andreas Ott (until September 2020) formulation). highly Dr. Beatrice Pozzetti In the romantic success- Staff members Anja Randecker (since October 2020) 19th century, Johannes Horn (until October 2020) Evgenii Rogozinnikov Lobachevsky, Dr. Brice Loustau (since August 2020) Dr. Carmen Rovi (since September 2020) Gauss, Beltrami, Mareike Pfeil Dr. Andrew Sanders (until December 2020) Klein, Poincaré, Anna Schilling and other mathe- Dr. Gabriele Viaggi (since October 2020) maticians finally Visiting scientists broke Euclid’s rule: Figure 26: A hyperbolic triangle and its Euclidean counterpart. Hyperbolic triangles are “slim”: Assuming the same side Giulio Belletti (since October 2020) Students They developed a lengths |a| = |a’|, |b| = |b’|, |c|=|c’|, we have |xy| < |x’y’|. (Picture: Brice Loustau). Dr. Nguyen-Thi Dang Levin Maier (since October 2020) Dr. Valentina Disarlo Menelaos Zikidis

Figure 25: Euclid’s original formulation of the ful in the study of discrete groups and gave rise to a The Groups and Geometry research group works the study of all properties of a space that are invariant parallel postulate: If α + β is less than two right sub-field of research known as geometric group theory, in angles (180°), then the lines l and l’ must intersect closely with the Research Station Geometry & Dynam- under a group of transformations. In short: Geometry is which several GRG members specialize. One key idea of (Picture: Brice Loustau). ics at Heidelberg University. Both groups are headed by symmetry. geometric group theory is to associate to any abstract Prof. Dr. Anna Wienhard. This concept unified classical Euclidean geometry, the group a geometric incarnation of it – called its Cayley graph Symmetries play a central role in mathematics as well newly discovered hyperbolic geometry, and projective – and to study this object with concepts and tools stem- as in other natural sciences. Mathematically, symme- geometry, which has its origins in the study of perspec- ming from (hyperbolic) geometry, such as curvature and the tries are transformations of an object that leave it tive in art and is not based on the measurement of classification of isometries according to their dynamics (see unchanged. They can be composed – that is, applied distances but rather on incidence relations. Klein’s Fig. 27 on the following page). one after the other – and form a mathematical struc- concept fundamentally changed our view of geometry ture called a group. In the 19th century, mathematician in mathematics and theoretical physics and continues Felix Klein proposed a new definition of geometry as to influence these fields to this day.

38 39 2.6 Groups and Geometry (GRG)

Representations of discrete groups and geometric structures

Another approach to hyperbolic geometry, in the spirit of Klein’s “Geometry is Symmetry” (see above), consists of investigating the group of isometries of hyperbolic space. By definition, an isometry is a transfor- mation of hyperbolic space that preserves its geometry (it is enough to require that it be distance-preserv- ing). The discrete subgroups of isometries, such as groups of sym- metries of tessellations of the hyperbolic plane, are particularly interesting (see Fig. 28).

More generally, the study of represen- tations of discrete groups as isome- tries of non-Euclidean symmetric spaces is a highly active field of Figure 28: Two examples of uniform tilings research that is closely related to the of the hyperbolic plane in the Poincaré disk model, (Pictures: Wikipedia). theory of geometric structures on manifolds and their deformation spaces. This field of research, which is sometimes called (higher) Teich- müller–Thurston theory, extends the classical Teichmüller theory of the deformation of complex structures on surfaces. Many GRG members specialize in this field (e.g., the 2020 publications [Pozzetti et al., 2020] and [Dang and Glorieux, 2020] fall into this category).

Figure 27: Cayley graphs of the free group on two generators and of a finite group of order 18. (Pictures: Brice Loustau, Wikipedia).

40 41 2.6 Groups and Geometry (GRG)

Differential and computational geometry

The development of differential geometry based on the work of Gauss and Riemann tied the develop- ment of hyperbolic geometry to Riemannian geometry and to the Figure 30: A simple closed study of certain differential equa- curve on a once-punctured torus (Picture: Brice Loustau). tions, such as the geodesic equations and the Dirichlet problem. More generally, many problems from physics translate as PDEs (partial differential equations) on Riemannian manifolds, which mathematicians often attempt to solve using geomet- ric methods. A useful tool in this field called geometric analysis consists of Conjecture (MUC). This problem numbers via the simple relation serving isometries of Minkowski discretizing the domain and the concerns Markov numbers – that is, space. The hyperbolic analogs of equations and converting differential the integer solutions to the Diophan- Euclidean translations are identified problems to discrete ones that are tine equation x2 + y2 + z2 = 3xyz, such . with especially simple Lorentz more amenable to computation and as (x,y,z) = (1, 2, 5), (x,y,z) = (1, 5, 13), Numerical evidence strongly sug- transformations known as Lorentz sometimes also to theoretical analy- etc. MUC asserts that there is exactly gests that the identity holds and that boosts. sis. As an illustration, Fig. 29 features one such ordered triple (x, y, z) for MUC is therefore likely true. two examples of a harmonic map each Markov number z. This notori- Einstein’s theory of relativity is based between two hyperbolic surfaces ously difficult conjecture has defied Minkowski spacetime and on the postulate – confirmed via lifted to an equivariant map of the solution by mathematicians for over relativistic addition experimental measurements – that Poincaré disk with respect to the a century. the speed of light in the vacuum is action of two Fuchsian groups. This In the early 20th century, the develop- independent of the (freely falling) map solves a nonlinear PDE that is It turns out that Markov numbers can ment and language of hyperbolic observer. It follows that the velocities the hyperbolic analog of the Dirichlet be realized geometrically as the geometry was directly influenced by of objects relative to moving frames problem and was computed via a lengths of simple closed geodesics the rise of the theory of relativity in cannot simply be added, as in discrete heat-flow method (see on a hyperbolic once-punctured torus physics due to the relation of both Galilean physics. Instead, a veloci- [Gaster et al., 2019, Computing known as the modular torus. theories to Minkowski spaces. ty-addition formula exists that is harmonic maps between Riemannian Indeed, Minkowski’s 4-dimensional well-known to physicists and that can manifolds, arXiv preprint: 1910.08176] Using this beautiful connection, “spacetime” is the mathematical be simply derived mathematically by for details). Gaster and Loustau [2020, The sum framework for Einstein’s theory of composing a Lorentz boost and the of Lagrange numbers, arXiv preprint: special relativity (in the vacuum). On projection of the hyperboloid on the Hyperbolic geometry and 2008.07659] proved that MUC is the other hand, hyperbolic space (of t=1 plane. This projection yields number theory equivalent to the strikingly simple any dimension) can be realized as another famous model of hyperbolic identity the upper sheet of the hyperboloid of geometry known as the Beltrami– A deep connection exists between Minkowski space (of one dimension Klein model (see Fig. 32, next page). hyperbolic geometry and number . higher) consisting of unit timelike The relativistic addition of velocities theory that dates back to Poincaré’s vectors (see Fig. 31, next page). defines a “hyperbolic addition” in the study of Fuchsian groups and In this formula, ϕ is the golden ratio, Klein model of hyperbolic space automorphic forms relating to and Ln stands for the well-known In this model of hyperbolic geometry, analogous to the addition of vectors 2-dimensional hyperbolic geometry, Lagrange numbers, which appear in isometries are identified with or- in Euclidean space. complex-analytic functions, and Figure 29: Equivariant harmonic maps of the hyperbolic plane in the Poincaré disk model the theory of Diophantine approxima- thochronous Lorentz transformations number theory. (Pictures: Brice Loustau). tion and are related to Markov – that is, with time-orientation-pre- To illustrate this connection, we discuss the Markov Uniqueness

42 43 2.6 Groups and Geometry (GRG) Symmetrien spielen eine zentrale Mit diesem Konzept vereinheitlichte der Geometrie, Topologie und der Hyperbolic geometry and machine learning Rolle in der Mathematik als auch Klein die klassische Euklidische dynamischen Systeme, die das in vielen Naturwissenschaften. In Geometrie, die damals gerade neu Zusammenspiel zwischen Räumen, In recent years, interest in hyperbolic geometry has grown der Mathematik verstehen wir entdeckte hyperbolische Geometrie wie zum Beispiel Mannigfaltigkei- in the fields of data science and machine learning for unter Symmetrien die Transforma- als auch die projektive Geometrie, ten und metrische Räumen, und representation learning via graph embeddings and for the tionen eines Objektes, die dieses die aus dem Studium der perspekti- Gruppen, die als Symmetrien auf construction of so-called hyperbolic neural networks. invariant lassen. Solche Transfor- vischen Kunst erwuchs und die diese Räume wirken, einbeziehen. Learning graph representations via low-dimensional mationen lassen sich verknüpfen, nicht auf dem Messen von Abstän- Außerdem wir beschäftigen uns embeddings is an important problem in machine learning d.h. hintereinander ausführen und den, sondern auf Inzidenzrelationen mit den Anwendungen der Grup- since many situations (e.g., in linguistics, evolutionary bilden so die mathematische beruht. Noch wichtiger ist, dass pentheorie und Geometrie in biology, computer networks, etc.) involve data that have a Struktur einer, so genannten, Felix Kleins Konzept unser Ver- andere Disziplinen wie mathemati- hierarchical structure, such as a graph structure. Hyper- Gruppe. Im 19. Jh. entwickelte der ständnis von Geometrie in der sche Physik, Datenwissenschaft bolic spaces are well-known to geometers as typically Mathematiker Felix Klein einen Mathematik und der theoretischen und maschinelles Rechnen. being better ambient spaces for embeddings of graphs neuen Begriff der Geometrie: Physik grundlegend verändert hat than are Euclidean spaces because the latter do not have Geometrie ist das Studium der und bis heute prägt. “as much room” for the exponential growth of many Eigenschaften eines Raumes, die graphs and trees. An example of the application of invariant sind unter einer gegebe- Unsere Arbeitsgruppe Groups and hyperbolic graph embeddings to historical-linguistics data nen Gruppe von Transformatio- Geometry (GRG) beschäftigt sich is shown in Figure 33. nen. Kurz gesagt: Geometrie ist mit verschiedenen mathematischen Symmetrie. Forschungsfragen auf dem Gebiet Hyperbolic neural networks were introduced by Ganea et al. [Hyperbolic Neural Networks, NeurIPS 2018], thereby connecting hyperbolic geometry with deep learning.

Figure 31: The hyperboloid in Minkowski space (Picture: Brice [Figure 33: Embedding of historical-linguistics data in the Loustau). Poincaré disk model of the hyperbolic plane (from [Nickel and Kiela, 2018, Learning Continuous The key idea is to replace the linear Hierarchies in the Lorentz Model of Hyper- operations that are used in standard bolic Geometry, arXiv preprint: neural networks with hyperbolic 1806.03417]). operations based on the “hyperbolic addition” described above. Empirical- ly, hyperbolic neural networks appear well-suited for hyperbolic embed- dings and have already been used or generalized by many authors. Mem- bers of the NLP group have applied hyperbolic geometry to fine-grain entity typing (see the two papers by [Lopez et al., 2020]). In 2020, we began a collaboration between the GRG and the NLP groups to explore the more-intricate non-Euclidean geometries that arise from symmet- ric spaces, which play an important role in the GRG group’s research in the context of representation learn- ing. A first article, in which we Figure 32: Schematic illustration of the projection of the hyperboloid onto the Beltrami–Klein propose a systematic framework and model (Picture: Brice Loustau). metrics for learning graph embed- dings in symmetric spaces, was submitted in February 2021.

44 45 However, this view is currently being challenged, and the the development of new methods for better understand- Molecular Biomechanics group was able to contribute a ing such links between mechanics and biochemistry in 2 Research fresh perspective with our work on collagen in 2020. We proteins are critical and continuously growing, as exem- found that rich biochemistry occurs within our suppos- plified in the following sections. edly passive fibers, networks, and scaffolds. Protein Below, we take a look back at a productive year of 2.7 Molecular Biomechanics materials thus behave in this sense very much like an science full of social distancing and challenges to enzyme! We further outline these surprising findings collaborative work and creative thinking. (MBM) below. The role of high-performance computations and Advances in molecular er spring potential acts on another the underlying mechanisms of the simulations of the atom (or other atoms) of the molec- mechanical response of the protein mechanical properties and ular system to prevent the transla- under investigation. Recent examples function of proteins tion of the system. Significant of extensive statistics include 10–100 progress has been made in three unfoldings of spectrin domains at a Florian Franz, Csaba Daday, and areas: given unfolding velocity (Sheridan, S. Frauke Gräter Experimental observations and computer simulations are natural allies, as are single-molecule force spectroscopy (SMFS) and classical Molecular Dynamics (MD) when it comes to studying the response of molecules to mechanical force. Recent advances in experiments and high-speed AFM simulations have increasingly facilitated a direct comparison of SMFS- and MD data, most impor- conventional AFM tantly by closing the gap between timescales, which has traditionally Group Leader Isabel Martin been at least 5 orders of magnitudes Figure 34. Pulling speeds used in MD simulations of proteins have decreased over time, Prof. Dr. Frauke Gräter Dr. Nicholas Michelarakis wide. On the one hand, recent approximately following Moore’s law. Data points show the velocities of only a small subset of published simulations and include protein unfolding and protein-ligand-dissociation events with Benedikt Rennekamp methodological advances in molecu- strongly varying system sizes and pulling lengths. The overview includes one data point from Staff members Kai Riedmiller lar simulations – primarily Molecular our group, namely from simulations of desmoplakin (structure on the right, center). Saber Boushehri (since June 2020) Christopher Zapp Dynamics (MD) simulations – have Matthias Brosz expanded the range of timescales First, owing to the increased efficien- et al. How Fast Is Too Fast in Force- Svenja de Buhr Students that are accessible to simulations. cy of hardware and MD codes, Probe Molecular Dynamics Simula- Florian Franz Elizaveta Bobkova On the other hand, emerging high- pulling velocities of far below 1m/s tions? The Journal of Physical Ana María Herrera-Rodríguez (until March 2020) Thomas Ehret (until December 2020) speed AFM methods allow for faster can now be reached in explicit Chemistry B, 123, 3658–3664, 2019). Dr. Fan Jin Anna Schroeder (until March 2020) pulling speeds in experiments, which solvent (Figure 34). Speeds as low Third, instead of focusing on the Fabian Kutzki Dennis Wagner led to the recent breakthrough in the as 0.001m/s have been reported in mechanical stability of individual Dr. Markus Kurth first direct quantitative comparisons recent years. protein domains, interest has recent- of predicted rupture forces with ly shifted to larger multi-domain experimentally measured forces. Second, studies not only present systems and to their mechanisms of Simply speaking, the large variety of proteins in our cell nucleus and its precious genetic content from The basic principles of simulations single trajectories but can also mechano-sensing and mecha- bodies can be divided into two groups with vastly differ- damage. In contrast to enzymes, protein materials are under force have remained the include multiple or even dozens of no-transduction. Proteins in focal ent jobs. The first group comprises enzymes, which take rather passive when it comes to biochemistry. They are same: An atom (or a group of trajectories for a given set of parame- adhesions are particularly interesting care of all biochemistry and whose functions range from made, modified, and degraded by enzymes but do not atoms) in a molecule is (or are) ters. The large number of trajectories candidates for such simulations of building new molecules to degrading existing molecules carry out any such sophisticated biochemistry them- subjected either to a virtual spring of recent studies has allowed for larger systems. and from attaching signals to silencing these signals. selves. Consider a virus: Its shell and adaptor proteins that is moved with constant velocity sufficient statistics on unfolding For the – admittedly only exemplary The second group comprises protein materials, which provide stability and establish connections, while the along the pulling direction or to a forces (under constant-velocity subset of – studies that Figure 34 is hold all parts of the body together, connect bones to proteins contained on the inside of the virus capsid run constant (or constantly changing) pulling) or on unfolding times (under based on, pulling velocities have muscles, surround cells with scaffolds, and protect the the biochemical factory of an infection. force while a counterforce or anoth- constant-force pulling) as well as on roughly halved every two years and

46 47 2.7 Molecular Biomechanics (MBM)

have thereby thus far followed provides structural and mechanical the fibril, there is a part containing A striking feature of the collagen Vereinfacht gesagt kann man Proteine, wie sie in ihrer großen Vielfalt Moore’s law. However, on average, stability to nearly all human tissues. five molecules in cross-section, structure is the high abundance of in unserem Körper vorkommen, in zwei Gruppen einteilen, die ganz they have fallen behind the approxi- It is often under extreme pressure, which is called the overlap region, aromatic residues (phenylalanines unterschiedliche Aufgaben erfüllen: Die erste Gruppe umfasst die mate doubling in protein MD simula- for example, while you play sports or and a part containing only four and tyrosines) in these regions. This Enzyme. Sie kümmern sich um die gesamte Biochemie, vom Aufbau tion time every 1.3 years (Daday, C. et walk, during which time the Achilles molecules, which is called the gap observation was key to solving the neuer Moleküle bis zum Abbau derselben, vom Anhängen von Signa- al.: The mechano-sensing role of the tendon can experience multiple region. Covalent crosslinks between riddle of EPR data: These aromatic len an andere Proteine bis zum späteren Ausschalten dieser Signale. unique SH3 insertion in plakin times your body’s own weight in tropocollagen helices stabilize the residues served as first candidates Die zweite Gruppe von Proteinen umfasst die Baustoffe. Sie halten domains revealed by Molecular pressure. We asked ourselves: Do fibril. Taken together, these unique for the radical signal, which was alle Teile des Körpers zusammen, verbinden Knochen mit Muskeln, Dynamics simulations, Scientific mechanoradicals also form in and highly elaborated structural important in identifying the radicals. umgeben Zellen mit Gerüsten und schützen den Zellkern mit seinem Reports, 7, 11669, 2017), which is biological polymers – that is, in features of collagen equip it with The resulting harmful radicals, which wertvollen genetischen Inhalt vor Beschädigungen. Im Gegensatz zu likely due to the recent focus on proteins such as collagen? high mechanical resilience. However, arise from covalent bond scission, Enzymen sind diese Stoffe eher passiv, was die Biochemie angeht. improved statistics and larger biologi- Together with colleagues from as the new experiments suggest, are quickly scavenged by nearby aro- Sie werden von Enzymen auf-, um- und abgebaut, führen aber selbst cal systems discussed above. While Homburg, Frankfurt, and Seattle, we even if the collagen material remains matic residues – so-called DOPAs, keine ausgefeilte Biochemie durch. Nehmen wir einen Virus: Seine Moore’s law – which states that the demonstrated in a series of experi- intact at the tendon scale, individual or dihydroxyphenylalanines. This Hülle und die Ankerproteine geben Stabilität und stellen Verbindungen number of transistors per area ments that excessive mechanical chemical bonds inside the structure process prevents other side reac- her, während die im Inneren des Viruscapsids enthaltenen Proteine doubles every two years – is bound stress on collagen indeed produces rupture, and radicals are formed. tions and further degradation. The die biochemische Fabrik der Infektion betreiben. to be limited by physical constraints radicals. The first challenge was to Molecular Dynamics simulations of new discovery of a DOPA radical Diese Sichtweise wird derzeit in Frage gestellt, und die Gruppe Molecu- that will be reached within the establish a setup to produce radicals the collagen fibril – which comprises was supported by mass spectrome- lar Biomechanics (MBM) konnte mit ihrer Arbeit über Kollagen im Jahr coming years, we still expect the inside collagen. A key experiment millions of atoms – help to explain try, which revealed a specific DOPA 2020 zu einer neuen Perspektive beitragen. Wir haben herausgefunden, performance of simulations to involved mounting and stretching a the experimental observations: side in one of the collagen chains. dass in unseren vermeintlich passiven Fasern, Netzwerken und Gerüs- increase drastically in the future. With rat tail fascicle directly in an elec- Chemical bonds break at sites of The DOPA radicals then finally ten eine reiche Biochemie stattfindet. Proteinmaterialien verhalten sich GPU acceleration becoming more tron-paramagnetic-resonance (EPR) high stress concentration when convert into hydrogen peroxide, an also in diesem Sinne sehr ähnlich wie ein Enzym! In diesem Bericht and more abundant and AI finding its cavity in order to monitor radical collagen is stretched. We built a important oxidative molecule in the skizzieren diese überraschenden Erkenntnisse. Die Rolle von Hochleis- way to simulations, the current trend formation in real time. By increasing collagen-fibril model in atomistic body. Hydrogen peroxide in the tungsrechnungen und Methodenentwicklung für das Verständnis is likely to even accelerate. the pulling force, the radical concen- resolution to determine what hap- concentrations we find in collagen solcher Zusammenhänge zwischen Mechanik und Biochemie in tration inside the rat tail fascicle pens during pulling at the micro- acts as an important signaling Proteinen steht außer Zweifel und wird stetig wichtiger. Wir blicken auf Stretched beyond the limits rose without any observed macro- scopic level. The Molecular Dynam- molecule. Collagen is therefore not ein produktives Wissenschaftsjahr zurück, wenn auch mit physischer scopic ruptures. The 2-centime- ics simulations of collagen under only a mere bearer of force, but it Distanz - mit all den damit verbundenen Herausforderungen für das Christopher Zapp ter-long fascicle, which is mainly constant force comprised a repre- can also control the consequences gemeinsame Arbeiten und kreative Denken. Polymers are subjected to mechani- made of collagen I, survived forces sentative fragment of collagen, of such forces. The study suggests cal stress in many everyday sys- of up to 20 Newtons for several namely 27 triple helices at a length tems, such as those found in vehicle minutes. The toughness of collagen of one repeat unit along the fiber tires, shoe soles, and rubber bands. comes from its rope-like micro-struc- axis, which resulted in 5 million Imagine pulling a rubber band: ture. A single collagen molecule is collagen- and water atoms, a chal- Chemical bonds are cleaved and used to make larger collagen fibrils. lengingly large system to compute. radicals form, leading to material The molecule is approximately 300 By reaching a total of microseconds damage. Because the yielding nanometers long and 1.5 nanome- of simulation time, we were able to radicals are produced as a direct ters in diameter and is made of reveal the points of stress concen- consequence of the mechanical three polypeptide chains. These tration and bond scission within the force, they are called mechanoradi- chains or strands are twisted togeth- proteins. One major finding was that cals. These mechanoradicals have er into a right-handed triple helix. bonds within the crosslinks between been known for a century and have With type I collagen, the most-promi- triple helices as well as adjacent been technologically exploited. nent type, each triple helix is called a backbone bonds are particularly Now imagine your Achilles tendon. It collagen microfibril. These microfi- stressed and are candidates for is made of collagen, the virtual brils are shifted relative to adjacent bond scission. High-level quan- rubber band in your body. Collagen molecules by about 67 nanometers tum-chemical calculations revealed Figure 35. Force concentrates around crosslinks in tensed collagen. Snapshot of an MD simulation of the crosslinked collagen I model fibril is the basic material of tendons, and build a representative unit of a that the chemical nature of this under 1 nN of external force per chain, colored according to the distribution of the external force through the fibril (blue = low force, red = high force). Below is an example pair of overlapping triple helices connected by crosslinks from the snapshot, depicted separately to better visualize cartilage, ligaments, and bones that collagen fibril. In each repeat unit of scission is homolytic. the forces around the crosslinks. The crosslinks are represented as spheres, the remaining collagen chains as gray ribbons.

48 49 2.7 Molecular Biomechanics (MBM)

Where does collagen break?

Benedikt Rennekamp that collagen has evolved as a Many proteins in our bodies are To understand what happens at the radical sponge to combat damage. exposed to mechanical loads: When molecular scale, Molecular Dynam- In this way, collagen protects itself getting exercise, for example, our ics (MD) simulations are the stan- from the radicals (see Fig. 36). tendons are stretched. The resulting dard tool that we employ in our We are currently testing the hypothe- forces can lead to sis that collagen not only takes the covalent bond mechanical load but acts directly as scissions inside the a biochemical-force sensor in our fibrils, even before bodies. The coupling between macroscopic failure mechanical and oxidative stress in occurs. These collagen-based tissues is a missing tendons and many link between the mechanical load other force-bearing and biological processes, from pain materials, such as sensation to inflammation. These bones and skin, findings can explain why exercising consist largely of Figure 37. In order to simulate bond Figure 38. KIMMDY simulation of our previously modeled collagen fibril, consisting of one overlap and one gap region of the typical hierarchical can at times be quite painful and the structural protein collagen. As scissions in tensed proteins, we developed a collagen structure: At the transition of these regions, collagen strands that end are covalently connected to other molecules with crosslinks hybrid kinetic Monte Carlo / Molecular (shown here in black). The spheres mark simulated breakage sites that show a clear concentration in the vicinity of the crosslinks. also provide a promising starting we have shown and as outlined Dynamics (KIMMDY) scheme that enables point for improving tissue repair and above (in the section “Stretched bond scissions in all-atom MD. simulations. transplantation, for example, in beyond the limits”), bond scissions sports medicine. lead to radical formation in collagen, group. With respect to collagen, one We therefore developed a new back to a Molecular Dynamics internal ruptures, even if the applied which has potentially profound question that arises in this context simulation scheme called KIMMDY simulation. In this way, we are able to forces remain constant. These consequences for tissue. is how to identify weak spots that (kinetic Monte Carlo / Molecular investigate the molecular response internal (homolytic) bond breakages have a high rupture propensity. Dynamics) [citation of paper here] again after the bond scission and in lead to highly reactive radicals in However, in regular MD simulations, that can alleviate these issues via a greater detail. collagen. The subsequently created covalent bonds are predefined, and hybrid combination of different We applied this new technique to a species could potentially act in chemical reactions cannot occur. simulation methods (see Fig. 37). multi-million-atom system of a signaling processes by converting Furthermore, such events rarely take Bond rupture rates are calculated tensed collagen fibril and showed mechanical forces into oxidative place at simulation timescales that based on the interatomic distances in that bond scissions clearly concen- stress. This method hence paves the are accessible with MD. Existing the MD simulation and then serve as trate near the chemical crosslinks. way for further examinations of this approaches that incorporate chemi- an input for a kinetic Monte Carlo These crosslinks connect different biologically highly relevant mecha- cal reactions rely either on computa- step. This approach bridges the collagen molecules and are – ac- nism at a molecular scale and tionally expensive quantum calcula- often-apparent separation of accessi- cording to our analysis – a potential complements our experiments in tions (e.g., QM/MM) or on complex ble timescales since the kinetic weak spot in the collagen structure these fields (see the section bond-order formalisms in force Monte Carlo step propagates the (see Fig. 38). “Stretched beyond the limits”). fields (e.g., ReaxFF). system in time until a rupture takes place. After this transition has Switching back to the molecular occurred, KIMMDY creates a new simulations, the fibril can be seen to molecular topology in order to switch maintain its overall integrity after the

Figure 36. Forces in proteins, as in the displayed Achilles tendon, can lead to bond ruptures inside the molecule. These breakages, which can occur while the whole fibril is still intact, give rise to highly reactive radicals. (Photo credit - for the right part: istockphoto.com/SciePro).

50 51 lar-level understanding of drug interactions along the pro- visualization tools to machine-learning methods and atomic-de- cess extending from drug delivery to drug–target binding to tail molecular simulations. 2 Research drug metabolism. In this report, we outline some of the results achieved in 2020. We take an interdisciplinary approach that entails collabora- Following a general overview of what was new in the group last tion with experimentalists and makes concerted use of year, we focus on projects dealing with (i) predicting drug–target 2.8 Molecular and Cellular computational approaches based on physics and bio-/ binding kinetics, (ii) modeling macromolecular complexes, and chemo-informatics. The broad spectrum of techniques (iii) structure-based drug design against SARS-CoV-2. Modeling (MCM) developed and employed ranges from interactive, web-based What was new in 2020? some time in the SDBV group, she did give an online lecture in November began work as a postdoc at the on “Computational Approaches to It goes without saying that the corona- University of Eastern Finland, Kuopio, Protein Dynamics and Binding Kinetics virus dominated our lives and work in in January 2021. Jan-Niklas Dohrke for Drug Discovery”. While travelling 2020. In the MCM group, we not only completed his master’s thesis in came to a standstill, remote commu- adapted to working from home, to Molecular Biotechnology, in which he nication with scientists around the online teaching, to videoconference performed simulations to investigate globe became easier, and activities group meetings, and to virtual coffees, the mechanism behind the regulation aimed at strengthening the interna- but we also sought to use our know- of AMPAR ion channels in the brain by tional scientific community grew in how to tackle the virus. We initiated regulatory proteins. Fabian Ormers- importance. Ariane Nunes-Alves’s several projects aimed at identifying bach and Lukas Jarosch completed activities as a member of the Early therapeutics against the coronavirus, their bachelor’s theses in Molecular Career Board of the American Chemi- which are described below. In parallel, Biotechnology and Biochemistry, cal Society’s Journal of Chemical we continued work on our other respectively, in the summer. Several Information and Modeling exemplify projects. The final three-year phase of master’s students from Heidelberg such activities. In 2020, Ariane the EU-supported Human Brain Project University completed internships in published three articles on various Group Leader Giulia Paiardi (EMBO Short-Term Fellowship, January– began in April 2020. We participate in the group during the year: Xingyi aspects of conducting computation- Prof. Dr. Rebecca Wade September 2020) two collaborative tasks aimed at (Cyan) Cheng (Molecular and Cellular al-chemistry research [Nunes-Alves, providing tools for using molecular-lev- Biology), Anton Hanke (Molecular 2020b; Nunes-Alves, 2020c, Mazzolari, Staff members Students el information in multiscale simulations Biotechnology), Jonathan Teuffel 2020]. She was also able to team up Christina Athanasiou Xingyi Cheng (January–March 2020) of signal propagation that will be (Biochemistry), and Carolne Demidova with MCM-group alumni Outi Sa- Manuel Glaser Caroline Demidova (October–November 2020) incorporated into the EBRAINS infra- (Chemistry). lo-Ahen (Turku University, Finland) and Dr. Daria Kokh Jan-Niklas Dohrke (April–September 2020) structure for digital brain research Ghulam Mustafa (DKFZ, Heidelberg) Abraham Muniz Chicharro Sungho Bosco Han (since October 2020) (https://ebrains.eu). The pandemic has changed the way and with other scientists across Dr. Stefan Richter Anton Hanke (April–July 2020) scientific meetings are organized. Europe to write a review of the Dr. Kashif Sadiq (until October 2020) Lukas Jarosch (July–October 2020) At the beginning of the year, Giulia Ariane Nunes-Alves co-organized two application of molecular-dynamics Alexandros Tsengenes Konstantinos Mavridakis Paiardi (University of Brescia, Italy) meetings that took place in Septem- simulations for drug discovery and Fabian Ormersbach (March–July 2020) joined the group as a visiting PhD ber: a LatinXChem (https://www. pharmaceutical development (Sa- Visiting scientists Jonathan Teuffel (August–November 2020) student supported by short-term latinxchem.org/) virtual conference, lo-Ahen OMH, Alanko I, Bhadane R, Madhura De (DKFZ) fellowships from Erasmus+ and the which was held exclusively on Twitter, Bonvin AMJJ, Honorato RV, Hossain S, Dr. Goutam Mukherjee (Heidelberg University) EMBO. She performed simulations of and a virtual EMBL conference on Juffer AH, Kabedev A, Dr. Ariane Ferreira Nunes-Alves glycoprotein interactions, including “The impact of the COVID-19 crisis on Lahtela-Kakkonen M, Larsen AS, (Capes-Humboldt Fellowship) investigations of coronavirus women in science: Challenges and Lescrinier E, Marimuthu P, Mirza MU, spike-heparin interactions (see below). solutions” (https://www.embl.org/ Mustafa G, Nunes-Alves A, Having completed his Volkswagen events/covid19-wis ). Rebecca Wade Pantsar T, Saadabadi A, Singaravelu Molecular recognition, binding, and catalysis are fundamental ing how biomolecules interact. What determines the specific- Foundation “Experiment!” project on was named lecturer for the 2020 K, Vanmeert M (2021). processes for cell function. The ability to understand how ity and selectivity of a drug–receptor interaction? How can “RNA Epicatalysis,” Kashif Sadiq left Molecular Graphics and Modelling Molecular dynamics simulations in macromolecules interact with their binding partners and proteins assemble to form a complex? How is the assembly the group in October to join the EMBL Society (MGMS) Lecture Tour (https:// drug discovery and pharmaceutical participate in complex cellular networks is critical to the of a complex influenced by the crowded environment of a in Heidelberg. Ina Pöhner defended www.mgms.org/WordPress/lec- development. prediction of macromolecular function and to applications cell? What makes some binding processes quick and others her doctorate on computational ture-tour/ ). Although the lectures, Processes 9(1):71.). such as protein engineering and structure-based drug design. slow? How do the motions of proteins affect their binding approaches to anti-parasite drug which had been planned to be held in In the MCM group, we are primarily interested in understand- properties? One of our aims is to gain a mechanistic molecu- design against neglected tropical five cities in the UK, had to be post- diseases in May, and after spending poned due to the pandemic, Rebecca

52 53 2.8 Molecular and Cellular Modeling (MCM)

Prediction of drug–target over a wide range of timescales. protein target – its residence time (τ) binding kinetics Recently, many new methods for – is inversely related to the rate of computing binding kinetic parameters dissociation of the drug–target Drug–target binding kinetics are have been reported. We critically complex. For most pharmaceutically increasingly understood to influence assessed their performance by relevant compounds, the timescales the efficacy of a drug. Therefore, considering two well-characterized for dissociation from the target far drug-design procedures should aim to benchmark systems [Nunes-Alves, exceed those that are accessible to discover compounds that bind and 2020] and concluded that some of the conventional molecular dynamics unbind at optimal rates in addition to methods can already be usefully simulations. To address this problem, binding tightly to their molecular applied for computing binding kinetic we developed an efficient computa- target. Consequently, there is a need parameters in drug discovery lead tional workflow that enables the

Figure 40. Correlation between computed (τRAMD) and measured (1/koff) residence times for focal adhesion kinase (FAK) and proline-rich tyrosine kinase 2 (PYK2). The binding sites of the complexes with compound 10, which has the longest residence time on FAK (green circle) and a comparatively low residence time on PYK (purple circle), are shown on the right. The overlay of the sity of Frankfurt), we can affect the interplay between crystal structures of the complexes of compound 10 with FAK (cyan combined experi- protein structural mobility and li- and yellow) and PYK2 (gray and magenta) reveals that the complex mental in vitro and gand-induced structuring, thereby with FAK has a helical conformation of the DFG motif. FAK inhibitors with long residence times make hydrophobic interactions with the in cellulo kinetic leading to the kinetic selectivity of DFG motif in FAK and thereby induce a helical structure in FAK but data, crystal struc- enzyme inhibitors. not PYK2. (Figure adapted from: Berger et al.: Structure-kinetic tures, and mutagen- relationship reveals the mechanism of selectivity of FAK inhibitors esis data with Modeling of macromolecular over PYK2, Cell Chemical Biology, 25 January 2021, with permission from Elsevier). τRAMD calculations complexes of residence times Dynamics (RAMD) method, which – in to reveal the mechanism of kinetic We employ structural bioinformatics addition to existing implementations selectivity of inhibitors between two and molecular simulation approaches in the NAMD and AMBER software closely related kinases that are targets in combination with experimental data packages – Bernd Doser (Software for anticancer agents: focal adhesion to predict structures of protein–pro- Developer in the IT Services group) kinase and proline-rich tyrosine kinase tein- [Diestelkoetter-Bachert, 2020, implemented in the freely available 2 (Berger B, Amaral M, Kokh DB, Moreau, 2020/2021], protein–peptide- GROMACS molecular-simulation Nunes-Alves A, Musil D, Heinrich T, [Weidner, 2020], and protein–nucle- engine for simulations on CPU or GPU Schröder M, Neil R, Wang J, Navratilo- ic-acid [Öztürk, 2020], [Öztürk, 2020b] nodes, thereby enabling greatly va I, Bomke J, Elkins JM, Müller S, complexes and to investigate binding improved computational performance Frech M, Wade RC, Knapp S: Struc- mechanisms and properties. The (see Fig. 39). Relative dissociation ture-Kinetic-Relationship Reveals the modeling and simulation of protein rates are computed with our τRAMD Mechanism of Selectivity of FAK complexes in membrane environ- protocol, and dissociation trajectories Inhibitors Over PYK2. Cell Chemical ments presents particular computa- are analyzed using protein–ligand Biology, 25 January 2021, DOI: tional challenges, which we address in interaction fingerprints with our new 10.1016/j.chembiol.2021.01.003 ; see several projects, including the Euro- Figure 39. Illustration of the application of the τRAMD- and MD-IFP workflow for simulating the dissociation of a drug-like compound from an MD-IFP set of tools [Kokh, 2020] Figure 40). The simulations enabled neurotrophin ITN network (https:// anticancer target protein. Purple contours indicate the regions explored by the compound during dissociation from the protein in RAMD simula- (available at https://github.com/ the prediction of relative residence www.euroneurotrophin.eu/) and the tions. The compound can dissociate via two main paths, along which different transient states, which influence the rate of dissociation, are HITS-MCM/MD-IFP). times that were in good agreement Informatics4Life consortium (https:// sampled. The transient states are defined by an interaction fingerprint-based clustering of structures along the trajectories using MD-IFP. The RAMD simulations can be carried out efficiently using the implementation in the GROMACS software. with experimental data and the informatics4life.org/). In this report, We apply the τRAMD- and MD-IFP identification of the molecular deter- we describe a multiresolution simula- for accurate methods to compute optimization programs, but that prediction of relative drug-protein workflow to a range of different minants that affect the binding tion approach for investigating how binding kinetic parameters. However, further studies on more high-quality residence times and the analysis of proteins, including members of the kinetics. They thus provide a basis for membrane-bound mammalian cyto- computing drug–target binding benchmark datasets are necessary to dissociation mechanisms in an important drug–target classes of the rational optimization of com- chrome P450 (CYP) enzymes form kinetics poses a challenging problem improve and validate computational automated manner. The workflow is kinases and G-coupled protein recep- pounds to extend residence times. electron transfer-competent complex- because the rates at which drugs bind methods. The length of time that a based on simulations performed with tors. In one study performed with This study demonstrated how small es with their redox partner, cyto- to and dissociate from proteins vary drug molecule spends bound to its the Random Acceleration Molecular Stefan Knapp and colleagues (Univer- differences between protein sequences chrome P450 reductase (CPR).

54 55 2.8 Molecular and Cellular Modeling (MCM) imposed by membrane binding, we Structure-based drug design zation that is based on a set of (2) In virtual screens, the docking of identified several arrangements of against SARS-CoV-2 machine-learning models and uses compounds is typically performed to CYPs form a superfamily of ubiqui- minally truncated CPR using our SDA CPR around CYP 1A1 that are compat- simple pose-invariant physicochemical only one or a few structures of the tous heme-containing monooxygenas- software. The resultant encounter ible with electron transfer (see Figure During 2020, we pursued three descriptors of the ligands and the target protein. However, proteins are es that catalyze drug metabolism and complexes were then refined and 41). Computed electron-transfer rates strategies toward antiviral therapeu- target protein-binding pocket to flexible, and their binding sites can steroidogenesis and are of significant used to build several models of the and pathways agree well with avail- tics against SARS-CoV-2. estimate protein–ligand binding change shape and reveal transient or interest for exploitation as biocata- two full-length proteins in a phospho- able experimental data and provide cryptic pockets that can provide lysts and drug targets. The CYP lipid bilayer. These models were used evidence as to why the CYP–CPR binding sites for compounds. We catalytic cycle requires two electrons as initial structures for atomic-detail electron-transfer rates are low com- developed TRAPP (TRAnsient Pockets to be transferred to the heme cofac- molecular dynamics simulations run pared with those of soluble bacterial in Proteins; trapp.h-its.org) to enable tor, and the electron-transfer steps on the HLRS supercomputer in CYPs. Moreover, the identified elec- the exploration of different protein have been conformations, the analysis of binding observed to be pocket flexibility and dynamics, and the rate-limiting for extraction of spatial and physicochemi- the reaction in cal information on the binding pocket many cases. conformations. Version 4 of the TRAPP Both CYP and webserver, which includes a machine CPR are an- Figure 42. Schematic overview of the workflow of RASPD+ (Rapid Screening with Physico- learning method to score pocket chored in the chemical Descriptors + machine learning) from [Holderbach, 2020]. For each ligand molecule, druggability [Yuan, 2020], was released simple physicochemical descriptors are computed based on atomic contributions. Information membrane of the this year. We applied TRAPP to analyze on the target protein is gathered within a sphere around a putative binding position. A set of endoplasmic machine-learning (linear regression (LR), k-nearest neighbors (kNN), support vector regression the binding pocket dynamics of the reticulum by a (SVR), neural network (DNN), and random forest (RF)) methods is used to train and validate SARS-CoV-2 main protease and the single transmem- models, which are then used to predict binding free energies. spike receptor-binding domain. For the brane helix. The main protease, we analyzed about globular catalytic (1) We pooled the expertise of the affinity (see Figure 42). RASPD+ can be 30,000 conformations of the protein domain of CYPs MCM group together with that of used to filter a compound library – from crystal structures, molecular is attached to Francesca Spyrakis (University of before progressing to the more compu- dynamics simulations, and conforma- the transmem- Turin) and Giulia Rossetti (For- tationally demanding docking of tions generated by TRAPP – to select brane helix by a schungszentrum Jülich) in order to ligands into the protein-binding pocket. structures with high druggability flexible linker. We participate in the “JEDI billion mole- RASPD+ is available at https://github. scores (see Figure 43). combined cules against Covid19 grand chal- com/HITS-MCM/RASPDplus. coarse-grained lenge” (https://www.jedi.foundation/ and atomic-detail billion-molecules) and to virtually molecular screen millions of compounds for the dynamics inhibition of four virus proteins. Along simulations to with about 130 other teams world- predict how Figure 41. Representative structures and predicted electron transfer pathways for the structural ensemble of elec- wide, we submitted prioritized lists of CYPs arrange in a tron-transfer-competent CYP 1A1–CPR complexes in a phospholipid bilayer obtained via a multiresolution dynamics compounds based on the results from simulation approach. Upper panel: Three complexes shown with protein surfaces colored by electrostatic potential (pos- phospholipid our computational pipeline. JEDI itive: CYP (cyan) and CPR (blue); negative: CYP (pink) and CPR (red)), with transmembrane helices in green and the lipid bilayer with the phosphorous atoms displayed as orange spheres. Lower panel: The two predicted electron-transfer pathways from the analyzed all submissions and selected catalytic domain CPR FMN cofactor to the CYP heme cofactor in complexes are indicated in red in close-up views of the protein–protein 1,000 compounds for synthesis and dipping into the interfaces. The 2D-histogram plot shows the computed interaction free energy between the CYP globular domain and experimental testing, including com- the CPR FMN domain for snapshots from the simulations versus the computed electron transfer rates. Stars indicate membrane [Musta- pounds on the lists that we submitted. the complexes shown in the upper panel. Adapted from:[Mukherjee, G., Nandekar, P.P. & Wade, R.C. An electron transfer fa, 2020]. We used competent structural ensemble of membrane-bound cytochrome P450 1A1 and cytochrome P450 oxidoreductase. Tests of the compounds remain such a model of Commun Biol 4, 55 (2021). https://doi.org/10.1038/s42003-020-01568-y (CCBY 4.0). ongoing. CYP 1A1, which This work required a very large plays an important role in the metabo- Stuttgart. We found that upon binding tron-transfer pathways provide number of compounds to be Figure 43. Analysis of the flexibility and druggability of the SARS-CoV-2 main protease showed lism of carcinogens, such as those in to CPR, the CYP 1A1 catalytic domain mechanistic insights into the effects screened, which we achieved using high flexibility of the active site and revealed that small structural variations dramatically impact ligand-binding properties. Left: A structure of the main protease homodimer with an tobacco smoke, as a starting point for becomes less embedded in the of mutations associated with in- RASPD+ (Rapid Screening with inhibitor (cyan) bound in the active site. Upper right: Distribution of the two TRAPP druggability modeling CYP1A1-CPR complexes. We membrane and reorients, indicating creased cancer risks. We are currently Physicochemical Descriptors + scores for computationally generated structures. Lower right: Binding sites of two representa- first performed rigid-body Brownian that CPR may affect substrate pas- investigating the sequence depen- machine learning) [Holderbach, 2020]. tive structures with high druggability scores, with pocket surfaces colored by atom type. The dynamics docking simulations of the sage to the CYP active site from the dence of binding and electron transfer We developed RASPD+ as a fast region contributing most to the CNN druggability score is shown by contours. The high mobility of the flexible loops (green and pink) and the C-terminal tail (cyan) should be noted. CYP catalytic domain with the N-ter- membrane. Despite the constraints of other CYPs with CPR. pre-filtering method for ligand prioriti- Adapted from (Gossen et al., 2021). 56 57 2.8 Molecular and Cellular Modeling (MCM)

These conformations were used for virtual screening against compound Molekulare Erkennung, Bindung und Katalyse sind grundlegende Prozesse libraries by Giulia Rossetti and her der Zellfunktion. Die Fähigkeit zu verstehen, wie Makromoleküle mit ihren colleagues (Forschungszentrum Bindungspartnern interagieren und an komplexen zellulären Netzwerken Jülich), who were able to identify a teilnehmen, ist entscheidend für die Vorhersage von makromolekularen blueprint for a specific structure-based Funktionen und für Anwendungen wie beispielsweise Protein-Engineering pharmacophore, which was used to und strukturbasierte Wirkstoffentwicklung. predict high-affinity inhibitors that were independently validated both in In der Gruppe Molecular and Cellular Modeling (MCM) sind wir in erster an enzyme-inhibition assay and by Linie daran interessiert zu verstehen, wie Moleküle interagieren. Was crystallography (see: Gossen J, Albani bestimmt die spezifische und selektive Wirkung beim Zusammenspiel von S, Hanke A, Joseph BP, Bergh C, Wirkstoff und Rezeptor? Wie werden Proteinkomplexe gebildet und Kuzikov M, Costanzi E, Manelfi C, welche Formen können sie annehmen? Welche Wirkung hat die beengte Storici P, Gribbon P, Beccari AR, Zellumgebung auf die Bildung eines Proteinkomplexes? Warum verlaufen Talarico C, Spyrakis F, Lindahl E, einige Bindungsprozesse schnell und andere langsam? Welche Auswir- Zaliani A, Carloni P, Wade RC, Musiani kungen haben Proteinbewegungen auf ihre Bindungseigenschaften? F, Kokh DB, Rossetti G: A blueprint for high affinity SARS-CoV-2 Mpro Eines unserer Ziele besteht darin, die Mechanismen besser zu verstehen, inhibitors from activity-based com- die bei Wechselwirkung von Medikamenten auf der molekularen Ebene pound library screening guided by ablaufen, von der Freisetzung des Wirkstoffs über die Bindung zum analysis of protein dynamics. ACS Rezeptor bis hin zum Metabolismus des Medikaments. Pharmacology & Translational Sci- ence. Xx, 2021). In einem interdisziplinären Ansatz kooperieren wir mit experimentell arbeitenden Forscher/-innen und verwenden gemeinsam rechnerische (3) In collaboration with Marc Rusnati Methoden aus den Bereichen der Physik-, Bio- und Chemoinformatik. Das and Giulia Paiardi (University of breite Spektrum der Techniken, die wir entwickeln und einsetzen, reicht Brescia, Italy), we performed molecu- dabei von interaktiven web-basierten Visualisierungswerkzeugen bis hin lar dynamics simulations to investi- zu Molekularsimulationen auf atomarer Ebene. gate the interactions of heparin with the SARS-CoV-2 spike glycoprotein. In diesem Bericht beschreiben wir einige der Ergebnisse aus dem Jahr Heparin is polysaccharide adminis- 2020. Nach einem allgemeinen Überblick über Neuigkeiten in der Gruppe tered intravenously as an anticoagu- konzentriert sich der Bericht auf Projekte, die sich mit (i) der Vorhersage lant to COVID-19 patients and via von Wirkstoff-Protein-Bindungskinetik, (ii) mit der Modellierung von makro- Figure 44. Schematic illustration of our molecular dynamics simulation approach to studying interactions of the SARS-CoV-2 spike with heparin. aerosol for treating other lung diseas- molekularen Komplexen und (iii) mit strukturbasierter Wirkstoffentwick- (A) Recent findings suggest that SARS-CoV-2 infection depends on the interaction of the SARS-CoV-2 spike both with the glycan chains of the heparan sulphate proteoglycans (HSPGs) and with the angiotensin-converting enzyme-2 receptor (ACE2) of the human host cell and that heparin es. Experiments indicate that heparin lung gegen SARS-CoV-2 befassen. might interfere with these interactions. (B) View of the atomic-detail model of one of the simulated systems consisting of the homotrimeric inhibits SARS-CoV-2 infection by spike glycoprotein head (gray) – with one subunit in an open conformation suitable for binding ACE2 – interacting with 3 heparin chains (pink), binding to the virus spike glycoprotein, each composed of 31 monosaccharides. (C) Use of PRACE high-performance computers enables such systems to be simulated about 25 times which plays a key role in attachment prefusion conformations with zero, heparin masks key functional sites on faster than on a workstation and 5 times faster than on a typical in-house compute-cluster. Image published in “PRACE Digest 2020”(extract in: https://prace-ri.eu/interactions-of-the-spike-protein-and-heparin). and fusion with host cells by interact- one, or three bound heparin oligosac- the spike, stabilizes it in its closed ing with the host-cell ACE2 receptor charides. We then performed several inactive conformation, and allosterical- and heparan sulfate proteoglycans, replica microsecond simulations of ly modulates the exposure of the which are structural analogues of each system. Due to the large size of host-receptor binding site in the open, heparin that act as co-receptors and these systems (about 700,000 atoms), active conformation. Our results may influence host susceptibility. Our these simulations were run on the provide a basis for the rational molecular dynamics simulations were PRACE Marconi100 GPU nodes at optimization of heparin derivatives for designed to investigate the role of CINECA, Italy. Our models reveal long, antiviral therapy. heparan sulfate proteoglycans in positively charged patches on the SARS-CoV-2 infection and the inhibito- spike head that can accommodate the ry activity of heparin. We modeled the linear anionic polysaccharide chains homotrimeric head of the spike of heparin or heparan sulfate proteo- glycoprotein in active and inactive glycans. On binding at these patches,

58 59 2 Research machine-learning methods – such as support-vector The NLP group also began two new collaborations with machines – to neural networks. In his last published other research groups at HITS in 2020. Together with work at HITS, he created a neural network inspired by the SDBV group, we initiated DeepCurate, a BMBF-fund- 2.9 Natural Language linguistic theories. Mohsen currently works as a post- ed project with the aim of (semi-)automatically curating doctoral researcher at the UKP lab at the Technical database entries in the biomedical domain. Another University of Darmstadt. Master’s student Lucas exciting collaboration developed around Federico Processing (NLP) Rettenmeier, who joined the NLP group in 2019, submit- López’s work on hyperbolic neural networks for entity ted his master’s thesis at Heidelberg University in July linking. This work caught the attention of the GRG 2020 – interestingly, the thesis was submitted to the group and led to a joint workshop publication by Department of Physics and Astronomy. Lucas worked Federico, Beatrice Pozzetti, Steve Trettel (Stanford on the stability of word embeddings and applied his University), and Anna Wienhard (see Chapter 2.6, pp. insights to determine semantic change over time. He 44/45). We continue to work on geometric deep learn- has since joined Amazon Web Services as a Solutions ing and will further this work in a HITS lab project in Architect. Along the way, Lucas also finished the half 2021. marathon at the (virtual) NCT Run 2020 in less than one hour and 20 minutes (see Chapter 4)! Michael Strube was co-chair of the "First Workshop on Computational Approaches to Discourse," which took The NLP group continued to publish in 2020 at first- place (virtually) at EMNLP 2020. The event comprised rate conferences – such as EMNLP and COLING – with invited talks, paper presentations, and a large panel dis- first-authored papers by Sungho Jeon and Federico cussion on creating a community of discourse re- López. Over the summer, Federico completed an searchers in NLP. The second iteration of the workshop internship at Google Research in Mountain View, is already in the works and will hopefully be held in California (remote internship). person at the EMNLP 2021.

Neural-network-based coherence modeling

Sungho Jeon Coherence describes the semantic mostly used entity information to Centering only considers changes to relationship between the elements of describe the focus. These studies the focus at the sentence level. Group Leader Visiting scientists a text. A text passage can be ana- assume that entities are already Coherence, however, arises not only Prof. Dr. Michael Strube Mehwish Fatima (PhD student, HEC-DAAD Scholarship) lyzed either as a unified whole or as a determined and track focus changes at the sentence level but also at the Federico López (PhD student, Research Training Group collection of unrelated sentences. by checking which entities occur in document level, thereby providing Staff member AIPHES, delegated, Heidelberg University) The most well-known computational sentences and whether these insights into the structure of the Dr. Mark-Christoph Müller theory – centering theory – was entities are identical. This approach whole text or discourse. Discourse Students proposed in 1983 by Grosz, Joshi, renders centering theory accessible structure represents the semantic Scholarship holders Jason Brockmeyer and Weinstein. Centering theory to AI systems, and many studies organization of a whole text. Incor- Haixia Chai (HITS Scholarship) Fabian Düker determines the most-salient item in follow this idea. However, these porating structural information into Kevin Alex Mathews (HITS Scholarship) Lucas Rettenmeier (until June 2020) each sentence – the center (or focus) systems are mostly based on the centering model is beneficial for Sungho Jeon (HITS Scholarship) Tobias Martiné – and tracks changes in this center. heuristics and do not realize the diverse downstream tasks. In order Designing an AI system inspired by core elements of centering theory. In to identify the discourse structure, this theory in 2020 required overcom- 2020, we published two papers at earlier works have adopted a super- Natural Language Processing (NLP) is an interdisci- In 2020, Mohsen Mesgar successfully defended his ing many challenges. the EMNLP [Jeon and Strube, 2020a] vised approach that relies on human plinary research area that lies at the intersection of thesis, with the first defense taking place under and COLING conferences [Jeon and linguistic annotation. However, computer science and linguistics. The NLP group COVID-19 conditions and a part of the committee One challenge to the design involved Strube, 2020b] that aimed to com- annotating the discourse structure is develops methods, algorithms, and tools for automati- attending online. Mohsen worked on modeling text how to capture the focus of each bine a linguistically thorough treat- difficult, time-consuming, and costly cally analyzing natural language. The group focuses on coherence with an application for automatically as- sentence. Prior AI studies on coher- ment of centering with modern and requires that annotators under- discourse processing and related applications, such as sessing the readability of texts. Throughout the course ence have simplified the task and neural-network-based AI techniques. stand not only the local context automatic summarization and readability assessment. of his PhD work, Mohsen moved from traditional

60 61 2.9 Natural Language Processing (NLP)

surrounding the target sentence but To account for structural informa- text, thereby making it easier to NLP for database curation in also the higher discourse-level tion, we propose using a struc- determine the focus of attention. To the biomedical domain relations. ture-aware transformer that is built investigate this issue, we proposed a on top of a pretrained language coherence model that interprets Mark-Christoph Müller In [Jeon and Strube, 2020a], we model. This process creates trees texts incrementally in order to DeepCurate is a joint research effort curation in the biomedical domain. triage), (2) the identification of paper propose a coherence model inspired that represent the structure of the capture lexical relations [Jeon and of the HITS SDBV and NLP groups The biocuration use case is provided sections that contain relevant, by centering theory that accounts text. We evaluate our model on two Strube, 2020b]. that was initiated in 2020. The goal by the SDBV group’s SABIO-RK curatable content, and (3) the for structural information. The model tasks: automated essay scoring and of DeepCurate is to develop an database and comprises (1) the extraction, normalization, and does not rely on human annotations the assessment of writing quality. In our model, the meaning of sen- integrated, machine-learning-based selection of potential source-re- database insertion of this content. to identify this information and Results demonstrate that our model tences that have already been system for semi-automatic database search papers for curation (paper consists of two components: (1) a achieves state-of-the-art performance processed is represented by a discourse-segment parser that deter- on both tasks. We examine the semantic centroid vector, which is mines structural relationships identified trees assigned to different computed by averaging over the Natural Language Processing (NLP) ist ein interdisziplinäres Forschungsgebiet, das mit Methoden der Informatik between discourse segments by essay scores and observe that texts vector representations of these linguistische Fragestellungen bearbeitet. Die NLP Gruppe entwickelt Methoden, Algorithmen und Tools zur auto- tracking focus changes between of lower quality have deeper trees sentences. The model then mea- matischen Analyse von Sprache. Sie konzentriert sich auf die Diskursverarbeitung und verwandte Anwendungen, segments and (2) a structure-aware and more leaf nodes. These trees are sures the semantic distance be- wie zum Beispiel automatische Zusammenfassung und Lesbarkeitsbewertung. transformer that exploits structural also more skewed. Our findings tween the current sentence and the information to update sentence suggests that the focus changes less centroid vector. After processing the 2020 verteidigte Mohsen Mesgar seine Dissertation, die erste Verteidigung unter COVID-19-Bedingungen, bei der representations. The discourse-seg- frequently in texts of lower quality current sentence, the centroid vector ein Teil des Komitees online teilnahm. Mohsen behandelte das Thema Textkohärenz in seiner Dissertation. Er ment parser first identifies the than in texts of higher quality. is updated, and the model moves to wendete das von ihm entwickelte Modell auf die Bestimmung des Lesbarkeitsgrades von Texten an. Im Zuge der hierarchical discourse segments of a Another interesting observation is the next sentence. The coherence of Entwicklung seines Modells wendete er zunächst traditionelle Methoden des maschinellen Lernens an, während er text, thereby building on an approxi- that existing neural-network-based a document is determined by adding zum Ende seiner Arbeit mit guten Ergebnissen ein neuronales Netzwerk entwickelte, das gleichwohl auf linguisti- mation of centering theory. coherence-modeling systems the semantic distances between schen Einsichten beruhte. Mohsen arbeitet jetzt an der Technischen Universität Darmstadt als Postdoktorand. Der Centering theory defines three data process text differently from hu- each sentence and each sentence’s Masterstudent Lucas Rettenmeier reichte seine Masterarbeit an der Universität Heidelberg im Juli 2020 ein, structures in order to describe the mans. These systems have access corresponding centroid vector. We interessanterweise an der Fakultät für Physik und Astronomie. Er untersuchte die Stabilität von Word Embeddings focus of a sentence: a list of for- to all foci at the same time, and they evaluate the model on two tasks: und wendete seine Ergebnisse auf das Bestimmen von linguistisch-semantischem Wandel an. Lucas arbeitet nach ward-looking centers (Cf), the track the changes to all foci simulta- automated essay scoring and the seinem Abschluss jetzt bei Amazon Web Services als Solutions Architect. Nebenbei lief er einen Halbmarathon preferred center (Cp), and a single neously. The transformer – a recent assessment of discourse coherence. beim (virtuellen) NCT-Run 2020 (siehe Kapitel 4) in weniger als einer Stunde und 20 Minuten! backward-looking center (Cb). Cf AI technique – also relates all items Our findings suggest that it is indeed indicates the salient items of the simultaneously in order to capture useful to constrain the information Die NLP Gruppe publizierte weiterhin erfolgreich auf höchstem Niveau bei Konferenzen wie EMNLP und COLING, sentence that are focus candidates semantic relations in a text. Howev- to which coherence models are so etwa Sungho Jeon und Federico López. Federico absolvierte des weiteren ein dreimonatiges Praktikum bei in the next sentence, and Cp indi- er, humans process text linearly and exposed. Designing neural-net- Google Research in Mountain View, Kalifornien (von zu Hause). cates the most-preferred item of Cf. read sentences one-by-one and work-based coherence models that Cb describes the focus of a sen- incrementally. Incremental process- function similarly to how humans 2020 begannen wir zwei Kollaborationen mit HITS Gruppen. Zusammen mit der SDBV Gruppe starteten wir das tence with regard to the context of ing has a direct connection to process text is thereby beneficial. vom BMBF geförderte Projekt DeepCurate mit dem Ziel das Kuratieren von Datenbankeinträgen in der biomedizini- the previous sentences. Figure 45 centering because processing a text schen Domäne durch Computerhilfe zu unterstützen. Eine weitere faszinierende Kollaboration ergab sich aus den provides a system overview that sentence-by-sentence focuses the Arbeiten von Federico López zu hyperbolischen neuronalen Netzwerken für Entity Linking. Diese Arbeit stieß auf reveals how centers are selected. model’s attention on a part of the Interesse der GRG Gruppe, deren Arbeitsgebiet Differentialgeometrie ist. Diese Kollaboration führte 2020 schon zu einer Präsentation bei einem NeurIPS Workshop zu "Diffential Geometry Meets Deep Learning" von Federico                                                                   López, Beatrice Pozzetti, Steve Trettel (Stanford University) und Anna Wienhard. Wir werden weiterhin zusammen                      über Geometric Deep Learning arbeiten. Die NLP und die GRG Gruppen werden diese Zusammenarbeit im Rah-              men eines HITS Lab-Projekts vertiefen (siehe Kapitel 2.6, S. 44/45).                                          Michael Strube war Program Co-Chair des "First Workshop on Computational Approaches to Discourse", der online im Rahmen der EMNLP 2020 stattfand. Der Workshop umfasste eingeladene Vorträge, Vorstellungen regulärer Forschungsbeiträge und eine große Podiumsdiskussion, deren Ziel das Etablieren einer Gemeinschaft                                       von Wissenschaftlern ist, deren Forschungsthema Diskurs ist. Der zweite Workshop der Reihe ist schon in der Vorbereitung und wird im Rahmen der EMNLP 2021 stattfinden. Figure 45: Selecting forward-looking centers (Cf), preferred centers (Cp), and backward-looking centers (Cb).

62 63 2.9 Natural Language Processing (NLP)

In the first phase of the project, the employed. As a result of this sub- because printed paper is still the NLP group was active in several sub- task, the NLP group provided two preferred medium for manual tasks. To collect multi-modal pa- collections of candidate papers to curation. per-triage data (performed by the the SDBV, which were then used in SDBV), two pools of candidate the manual paper-triage-data collec- The ability to process scanned-paper papers from two different biomedi- tion. images will thus continue to be cal domains had to be created. important in the future. At the same A second, more-comprehensive time, an image (of sufficient quality) Apart from the domain of general effort involved the development of a constitutes the most general and biochemical reactions and their component for a processing task versatile representation of a scientif- kinetics, which constitutes the main called DB-to-document backprojec- ic document that might also contain domain of SABIO-RK, the second tion. In a nutshell, this task takes a tables or non-textual information, domain was related to the COVID-19 single database record from SABIO- such as graphs. virus. The full text corpus for the RK (or any other biomedical data- first domain was downloaded from base) that represents, for example, a In the context of DeepCurate, PubMed (https://pubmed.ncbi.nlm. single measurement reported in a DB-to-document backprojection was nih.gov/), while the second was particular paper and attempts to originally intended as a pre-process- based on the CORD-19 dataset determine the exact location in the ing step for creating training- and (https://www.kaggle.com/allen-insti- paper from which the information test data for machine-learning-based tute-for-ai/CORD-19-research-chal- was extracted during manual cura- curation methods. While this re- lenge). From these corpora, candi- tion. mains the main focus of the project, date papers for the manual triage DB-to-document backprojection is process had to be selected automat- This task is challenging because the also useful as a tool for quality ically. In both cases, the selection source papers are only available as assurance. A first version of the was based on the presence of digital images that were created by DB-to-document-backprojection certain words and – in particular – scanning the original paper print- component is described in [Müller et of expressions of numerical mea- outs. The decision to use these al., 2020]. Figure 46 presents auto- surements and their related units images as the basis for DB-to-docu- matically created results in which (mostly standard parameters from ment backprojection (as opposed to, database records (on the right) are kinetics, such as Kcat and Km). e.g., PDF- or XML-based full texts, linked to sections of the paper on which are also available for many the basis of matches in the OCR In order to support a more fine- papers curated for SABIO-RK) representation of the document Figure 46: OCR representation of a scanned document (left) and automatically extracted database records (right). grained string search involving stemmed from two main issues: image (read more about DeepCurate regular expressions and proximity First, the original paper printouts in Chapter 2.11, pp. 74/75). searches, the raw data were convert- used for SABIO-RK curation contain ed into the XML-based format of the manually added highlighting of annotation tool MMAX2 (https:// important sections that we wanted github.com/nlpAThits/MMAX2). For to preserve, and second, future both this conversion and the subse- curation efforts (by SABIO-RK as quent matching, an early version of well as by other biomedical databas- the pyMMAX API [Müller, 2020] was es) are likely to be paper-based

64 65 discovery of the accelerating expansion of the Universe. explains why stars are observed in different configurations Multi-dimensional fluid-dynamic simulations in combina- while also providing a qualitative understanding of stellar 2 Research tion with nucleosynthesis calculations and radiative-trans- evolution. However, simplifying assumptions limit the fer modeling provide a detailed picture of the physical predictive power of such models. Using newly developed processes that take place in Type Ia supernovae and are numerical tools, our group explores dynamic phases in 2.10 Physics of also applied in the PSO group to other kinds of cosmic stellar evolution via three-dimensional simulations. Our explosions.Classical astrophysical theory describes stars aim is to construct a new generation of stellar models as one-dimensional objects in hydrostatic equilibrium, an based on an improved description of the physical process- Stellar Objects (PSO) approach that has proven extremely successful and es that take place in stars.

Shaking up stars sound waves, which are triggered by The SLH code, developed in the PSO pressure fluctuations and can be group at HITS, was designed to deal Internal waves are hydrodynamic observed as higher-frequency signals. with this special situation based on a perturbations that propagate inside a In a one-dimensional model, the set of numerical techniques that stratified medium. They are a well- excitation of waves via convection remove numerical artifacts that are known phenomenon in the fields of needs to be approximated since common in simulations of low-Mach- oceanography and atmospheric convection is an intrinsically multi-di- number flows. At the same time, the science and are responsible for local mensional process. Moreover, SLH code avoids approximations of weather effects (e.g., föhn winds) and one-dimensional models cannot the hydrodynamical equations that global circulations in the atmosphere predict the amplitude of waves. are often employed in simulations of (e.g., quasi-biennial oscillation). One However, depending on the ampli- low-Mach-number flows but prevent mechanism by which such waves are tude, non-linear effects can lead to a the resolution of sound waves. generated is via convective motions redistribution of angular momentum in the upper atmosphere or in the and to a mixing of chemical elements Simulations of wave generation upper layers of oceans. Convection is throughout a star. Both effects can The capabilities of SLH have been Group Leader Visiting scientists usually restricted to some fixed part have a significant impact on the thoroughly tested by the PSO group Prof. Dr. Friedrich Röpke Sabrina Gronow of the atmosphere. When convective further evolution of the star. Multi-di- [Horst et al., 2020] and demonstrate Florian Lach plumes reach the boundary of a mensional numerical simulations are the superiority of SLH compared with Staff members Giovanni Leidi convective region, they "hammer" therefore needed to verify the cur- standard numerical schemes in Dr. Róbert Andrássy Dr. Fabian Schneider (Gliese Fellowship) against it and excite waves in the rently used models of wave excitation following the propagation of gravity Dr. Johann Higl Theodoros Soultanis (IMPRS PhD Student at MPIA adjacent, stable layer. The same and to estimate wave amplitudes. waves at very low Mach numbers. Javier Morán Fraile Heidelberg) mechanism is also active in stars. An The group’s publication also revealed example involves stars with masses Flows in deep stellar interiors that SLH is indeed capable of simul- Scholarship holders Students of more than twice that of the Sun Flows in the interior of stars usually taneously simulating the excitation Leonhard Horst (HITS Scholarship) Dave Bubeck and that have a convective core and have a very low Mach number – that and propagation of gravity- and Christian Sand (HITS Scholarship) Manuel Kramer a stable envelope. In these stars, is, the flow speed is usually several pressure waves, as was demonstrat- Melvin Moreno waves are excited at the core bound- orders of magnitude less than the ed with the example of a realistic Patrick Ondratschek ary and are expected to propagate all local speed of sound. The frequency stellar model of a 3-solar-mass star the way to the surface of the star, of gravity waves is related to the flow for which the evolution of the convec- where they can be detected as speed. In order to follow the propaga- tive core and a large fraction of the “We are stardust” – the matter we are made of is largely Physics of Stellar Objects research group seeks to under- variations in brightness. tion of gravity waves, it is therefore stable envelope were followed for the result of processing the primordial material formed stand the processes that take place in stars and stellar sufficient to follow the flow. Pressure about 1 month of physical time in a during the Big Bang. All heavier elements originate from explosions. Newly developed numerical techniques and Asteroseismology waves, in contrast, require that a two-dimensional simulation. The nucleosynthesis in stars and in gigantic stellar explosions. the ever-increasing power of supercomputers facilitate the The field of asteroseismology uses simulation also be accurate on the convective motion excites a rich How this material formed and how it is distributed modeling of stellar objects in unprecedented detail and these variations in brightness to infer much smaller timescales related to spectrum of waves in the stable throughout the Universe are fundamental concerns for with unparalleled precision. the interior structure of stars in order the speed of sound. This large layer. A careful analysis of the waves astrophysicists. At the same time, stellar objects make the One of our group’s primary goals is to model the ther- to improve one-dimensional stellar separation of timescales makes it revealed that their properties were in Universe accessible to us by way of astronomical observa- mo-nuclear explosions of white dwarf stars that lead to models. In general, two types of highly challenging to simulate both excellent agreement with theoretical tions. Stars shine in optical and other parts of the electro- the astronomical phenomenon known as Type Ia superno- waves can be observed: gravity gravity- and pressure waves in the predictions. In contrast to one-dimen- magnetic spectrum and are the fundamental building vae. These supernovae are the main source of iron in the waves, for which buoyancy acts as same simulation. sional models, wave amplitudes can blocks of galaxies and larger cosmological structures. Universe and have been instrumental as distance indica- the restoring force and that have a be easily extracted from the simula- With the help of extensive numerical simulations, the tors in cosmology, which has led to the spectacular rather low oscillation frequency, and tions. While the results hint at

66 67 2.10 Physics of Stellar Objects (PSO) „Wir sind Sternenstaub” – die Materie, aus der wir numerischen Hilfsmitteln untersucht unsere Gruppe possible non-linear wave behavior, it Stellar Dance requires fuel, and after some time, geformt sind, ist zum großen Teil das Ergebnis von dynamische Phasen der Sternentwicklung in dreidimen- is not possible to make strong this fuel is exhausted. At this point, Prozessierung des primordialen Materials aus dem sionalen Simulationen. Unser Ziel ist es, eine neue predictions with respect to mixing or Lonely stars? the stellar structure rearranges such Urknall. Alle schwereren Elemente stammen aus der Generation von Sternmodellen zu schaffen, die auf einer to angular-momentum transport due Although on clear nights the sky that new nuclear reactions com- Nukleosynthese in Sternen und gigantischen stellaren verbesserten Beschreibung der in ihnen ablaufenden to the reduced dimensionality of the appears crowded, the stars we mence and equilibrium is recovered. Explosionen. Wie dieses Material gebildet wurde und wie physikalischen Prozesse basiert. simulations (see Fig. 47). Future observe are often extremely far apart Our Sun will thus evolve through es sich im Universum verteilt, stellen für Astrophysiker Eine weitere Komplikation, die in klassischen Sternent- three-dimensional simulations should from one another. The resolving giant stages. As a so-called asymp- fundamentale Fragen dar. wicklungsmodellen nur sehr grob angenähert werden be able to improve on this point and power of modern telescopes, howev- totic-giant-branch (AGB) star, it will Sterne sind fundamentale Bausteine von Galaxien und kann, ist die Binarität. Wohl wegen des Beispiels are currently being prepared in the er, changes this picture. In fact, many develop a pronounced a core-enve- aller größeren kosmologischen Strukturen. Gleichzeitig unserer Sonne tendieren wir oft dazu, Sterne als PSO group. stars that appear to be lope structure in which a central and machen stellare Objekte das Universum für uns in isolierte Objekte zu sehen; tatsächlich findet man die solitary objects largely inert carbon/oxygen core will astronomischen Beobachtungen überhaupt erst sichtbar. meisten von ihnen jedoch in Systemen mit zwei oder turn out to be embedded in shells with burning Sterne scheinen im optischen und anderen Teilen des sogar mehr Sternen. Einige von diesen wechselwirken be a helium/hydrogen and a very large elektromagnetischen Spektrums. Am Ende ihrer Entwick- miteinander, und das hat weitreichende Auswirkungen envelope. lung kollabieren massereiche Sterne zu Neutronenster- auf ihre weitere Entwicklung. Solche Interaktionen sind Our Sun is a lonely star, but many nen oder Schwarzen Löcher. Eine Verschmelzung solcher inhärent mehrdimensional und können in klassischen others are not. Before reaching the kompakten Objekte wurde kürzlich mit Hilfe von Gravita- Modellen nicht konsistent behandelt werden. Die giant stages, stars may have close tionswellen beobachtet, die ein neues Fenster für PSO-Gruppe führt dreidimensionale Simulationen zu companions. astronomische Beobachtungen des Universums öffnen. stellaren Wechselwirkungen durch, um neue Einsichten Unsere Forschungsgruppe Physik stellarer Objekte in diese entscheidenden Phasen der Entwicklung von Common-envelope evolution (PSO) strebt mit Hilfe von aufwendigen numerischen Sternsystemen zu gewinnen. One of the unsolved problems Simulationen ein Verständnis der Prozesse in Sternen Das dritte Forschungsfeld der PSO Gruppe ist die of stellar astrophysics is the und stellaren Explosionen an. Neu entwickelte numeri- Modellierung von thermonuklearen Explosionen Weißer question of what happens to sche Techniken und die stetig wachsende Leistungsfä- Zwergsterne, die zum astronomischen Phänomen der the system when the fast- higkeit von Supercomputern ermöglichen eine Modellie- Supernovae vom Typ Ia führen. Diese sind die Haupt- er-evolving, more-massive rung stellarer Objekte in bisher nicht erreichtem quelle des Eisens im Universum und wurden als ("primary") star evolves into a Detailreichtum und mit großer Genauigkeit. Abstandsindikatoren in der Kosmologie eingesetzt, was giant. It is bound to catch its Die klassische astrophysikalische Theorie beschreibt zur spektakulären Entdeckung der beschleunigten companion in a "common Sterne als eindimensionale Objekte im hydrostatischen Expansion des Universums führte. Mehrdimensionale envelope," inside of which the Gleichgewicht. Dieser Ansatz ist extrem erfolgreich. Er strömungsdynamische Simulationen kombiniert mit Nu- companion and the core of the erklärt, warum wir Sterne in verschiedenen Konfiguratio- kleosyntheserechnungen und Modellierung des Strah- primary star orbit each other. nen beobachten, und liefert ein qualitatives Verständnis lungstransports ergeben ein detailliertes Bild der Because this happens in the der Sternentwicklung. Die hierbei verwendeten vereinfa- physikalischen Prozesse in Typ Ia Supernovae, werden envelope gas, tidal friction chenden Annahmen schränken jedoch die Vorhersage- aber auch auf andere Arten von kosmischen Explosio- drains kinetic energy from the kraft solcher Modelle stark ein. Mit neu entwickelten nen angewendet. orbital motion and transfers it to the energy of the envelope gas, which has two fundamental con- 3D hydrodynamical simulations follows fluid dynamics on a moving observed planetary nebulae and are pair of compan- sequences: The separation of the Common-envelope interaction is an computational mesh, which has thought to be the progenitors of Type Figure 47: Snapshot of the internal-wave ions. Stars are also not as stellar cores shrinks, and the enve- inherently dynamic and multidimen- many advantages over conventional Ia supernovae. Compared with other pattern in a two-dimensional simulation of a changeless as it first may seem. lope material may be ejected from sional phase in binary stellar evolu- static grids for the problem at hand. giant stages, however, the AGB phase 3-solar-mass star. The color coding shows Indeed, they have been found to the system, eventually forming a tion – that is, it cannot be described is numerically challenging. The temperature fluctuations, with red and blue indicating temperatures below and above the evolve through distinct phases that so-called planetary nebula. The in classical one-dimensional stel- Common envelopes resulting from envelope is extremely bloated and angularly averaged temperature, respectively. are characterized by vastly different leftover tight binary consisting of two lar-evolution models. In order to AGB primary stars only slightly bound, which requires The inset magnifies the core region, where the sizes and internal structures. stellar cores may engage in interac- study the physical process in detail, Systems that enter common-enve- meticulous care in the setup to avoid flow is convective (see [Horst et al., 2020]). The reason for this evolution lies in tions that – depending on the nature the PSO group developed a modeling lope evolution when the primary star non-physical perturbations of the the interplay between gravity, which of the two initial stars – can give rise approach based on the AREPO code becomes an AGB star are of particu- delicate equilibrium. Common-enve- pulls the stellar material together, and to spectacular astrophysical phenom- originally developed by Volker Sprin- lar interest. Once the envelope is lope interactions with AGB primary nuclear reactions in the stars’ interi- ena, such as supernova explosions. gel and his team at HITS (see former ejected, its core – a compact white stars have therefore not often been ors, which produce energy such that Annual Reports). This code is particu- dwarf consisting of carbon and simulated in the past. With its new thermal pressure can balance gravita- larly well-suited to modeling com- oxygen – is left with a close compan- tools, the PSO group tackled this tional pull. Nuclear burning, however, mon-envelope interaction because it ion. Such systems form many of the problem and was able to perform

68 69 2.10 Physics of Stellar Objects (PSO) 105 4000 corresponding simulations [Sand et the system settles on a final orbit. An This energy acts in spiral arms, al., 2020]. The snapshots shown in important (and challenging) question boosting the expansion (see Figure 2000 Figure 48 were taken from a system is whether the envelope material is 49).  with a 6-billion-year-old AGB primary ejected in this common-envelope 0 y / R star that – at the onset of com- phase. Where does the energy for the When including this mechanism, the mon-envelope interaction with a ejection come from? The orbital simulations performed by the PSO 2000 − 3 companion – had a radius 200 times energy of the binary is released group indicate full envelope ejection 10 − 3 as large as that of the Sun. Its core is during the inspiral. Although the (see Figure 50). 4000 t = 296 d t = 600 d t = 1000 d − erg cm marked with an "x." The companion envelope of the AGB star is quite / 4000 star, marked with a "+," has the mass loosely bound, the energy from the By varying the parameters of the recomb E 1 of our Sun. It was initialized in an inspiral of the companion – which is initial system, such simulations have V orbit that is not in exact co-rotation eventually transferred to the envelope also revealed that the efficiency of 2000 with the envelope, and its inspiral material – is insufficient to eject and mass loss and the final orbital  was caused by drag force. remove it completely. A second and separation of the core binary system 0 z / R 101

2000 800 − 5 10− 4000 t = 296 d t = 600 d t = 1000 d 400 − 10 6 − 4000 2000 0 2000 4000 4000 2000 0 2000 4000 4000 2000 0 2000 4000 − − − − − − − 3 / / /  x R x R x R 7    0 10− y / R ρ / g cm 10 8 Figure 49: The ionization energy density inside the photosphere (white and light-blue lines) can 400 − − be used to remove the envelope. The inner contours enclose regions with high ionization 9 10− fractions [Sand et al., 2020]. 800 − t = 0d t = 296 d t = 600 d 10 10− ρ / g cm−3 800 10-11 10-9 -7 -5 4 10 10 400 3 80  0 y / R 800 2 Mach number 400 0 − Figure 50: Gas density in the orbital plane 1 at the end of our simulation (after about 800 − t = 0d t = 296 d t = 600 d 400 seven years). The cores of the two stars 0 (marked by the symbols) perform a close 800 400 0 400 800 800 400 0 400 800 800 400 0 400 800 80 − − − − − − x/R x/R x/R 80 0 80 dance while preserving a spiral structure    around them [Fig. 4 of [Sand et al., 2020]].

Figure 48: Common-envelope evolution with an AGB primary star: gas density (upper row) and Mach number (the ratio of fluid velocity to sound 

R 0 speed; lower row) in the orbital plane are plotted at the beginning of the simulation as well as after both one and four orbits. The initial centers /

("cores") of the giant star and the companion are marked by a plus symbol and a cross symbol, respectively. The first spiral arm forms where y the companion dives into the envelope, and another is formed on the opposite side. Large-scale instabilities between layers can be identified particularly well in the lower Mach plot [Sand et al., 2020]. 400

While the first orbit takes about 300 important effect that powers enve- depend on the mass ratio of the two days, another three orbits occur lope ejection is the release of the stars. This finding is currently being 800 within the next 300 days as the ionization energy of the material in further explored by the PSO group t = 2496 d companion spirals in and approaches recombination processes: When gas with simulations of common-enve- 800 0 800 the core of the AGB star. The entire expands and cools, ions recombine lope phases in systems involving event takes only a few years before to form atoms and release energy. stars of different natures. x / R 

70 71 COVID-19 activities research data within national and of SARS-CoV-2 virus–host interac- international COVID-19 research tion mechanisms guided by input 2 Research With the services developed in our communities. from domain experts and based on group, namely SABIO-RK and published work. The resulting map FAIRDOM-SEEK, we were able to COVID-19 Disease Map in the is a platform for visual exploration 2.11 Scientific Databases react quickly to the new scientific FAIRDOMHub and computational analyses of the challenges brought about by the The international COVID-19 Disease molecular processes involved in SARS-CoV-2 coronavirus pandemic Map initiative is an effort involving SARS-CoV-2 entry, replication, and and Visualization (SDBV) and the new COVID-19 disease. more than 230 scientists with the host–pathogen interactions as well FAIRDOM-SEEK is used as a flexible goal of building a comprehensive, as immune response and host-cell platform for storing and sharing standardized knowledge repository recovery- and repair mechanisms.

Vivien Louise Junker (since October 2020) Group Leader Dr. Olga Krebs PD Dr. Wolfgang Müller Michael Lieser (since October 2020) Ghadeer Mobasher (since January 2020) Staff members Ina Pöhner Helen Desmond (since July 2020) Dr. Maja Rey Dr. Dorotea Dudaš Dr. Natalia Simous Dr. Sucheta Ghosh Dr. Andreas Weidemann Martin Golebiewski Benjamin Winter Xiaoming Hu Dr. Ulrike Wittig

This year we highlight our COVID-19 activities​, DeepCurate, our Most of our work is project-funded. We have three types of HITS Lab project, and NFDI4Health, the German National projects: (i) projects with a biological/biomedical focus and for Research Data Infrastructure for Personal Health Data. We which the SDBV group is an infrastructure partner, such as begin, however, by summarizing the group’s focus and its main LiSyM; (ii) collaborative infrastructure projects with the SDBV development​. group, such as FAIRDOM and de.NBI; and (iii) research pro- The SDBV group aims to provide services involving FAIR data. jects, such as DeepCurate and PoLiMeR. The FAIR acronym stands for Findable, Accessible, Interopera- In the past year, one project that (along with its forerunners) ble, and Reusable data. Using the SEEK system, we help people had helped to shape the group’s previous decade was com- make their own data FAIR. In SABIO-RK, our professional pleted: FAIRDOM, a project focused on FAIR Data, Operations, curators provide FAIR data on enzyme-catalyzed reactions and Models. FAIRDOM and its derivatives have been featured extracted from the literature. FAIR data are computationally in the SDBV group in recent years, and FAIRDOM work consti- discoverable. Ideally, a smart harvester should recognize FAIR tutes part of two of the new projects that began last year: data as such and be able to harvest them. Making these ideas ELIXIR CONVERGE and NFDI4Health. work implies standardization. Participating in standardization We hope that our contribution to the NFDI4Health consortium efforts – such as COMBINE, STRENDA, and ISO/TC 276 (which began in 2020) and to NFDI4Biodiversity will further Biotechnology – constitutes a significant part of our work. All develop the work of FAIRDOM. of our work benefits from the close collaboration of biologists and biochemists with computer scientists. Figure 51: Screenshot of the COVID-19 Disease Map project space (upper panel) on the SEEK-based FAIRDOMHub (https://fairdom- hub.org/projects/190) with two examples of SARS-CoV-2 models registered in FAIRDOMHub (lower panels).

72 73 2.11 Scientific Databases and Visualization (SDBV)

The map supports the research based on the SEEK software in an Therefore, in 2020, we put effort from one fixation to another is community and improves our effort to support data storage and into manually extracting from the called a saccade and is useful in understanding of the disease with exchange. SEEK is used to store literature such inhibition constants visual search. In Figure 53, we the aim of facilitating the develop- study documents and make them and other kinetic parameters observed that after working with the ment of efficient diagnostics and accessible. These documents involved in SARS-CoV-2 and related 15th of 30 papers, the cognitive therapies. We support the COVID-19 include study-protocol templates viruses – such as MERS and load during the fixations decreased, Disease Map community with a and data dictionaries as well as SARS-CoV – with the aim of sup- whereas the cognitive load for the project space in our public re- information on study-metadata porting the search for treatments saccade also decreased for one of search-data-management platform, structures – such as data models against COVID-19. Many of these the participants after the 15th task. FAIRDOMHub (https://fairdomhub. that describe study subjects and potentially antiviral agents are We also cross-checked the curators’ org/projects/190), which groups their clinical parameters – in directed (1) against the spike responses and found that this project contributors, data, computa- addition to treatment outcomes and protein needed by the virus to enter Figure 52: Threshold computation for cognitive load (PD in %) using quartile estimation (P1: decrease was not related to the tional models, and literature as well similar information (https://seek. the host cell or (2) against the viral Participant 1, P2: Participant 2). difficulty of the respective paper-tri- as data and curation guidelines for studyhub.nfdi4health.de). Addition- proteases (3CLpro, PLpro) needed age task. This study revealed that the community (Fig. 51, previous ally, direct links to primary resourc- for the virus’s life cycle. manual searches using a do- of completed tasks. In this query, the triage procedure includes both page). FAIRDOMHub and its under- es and websites for the studies are main-specific ontology and a the term familiarity implied attribu- reading and a visual search, which lying SEEK software have been included. Using the SEEK web DeepCurate parameter list from working memo- tion to long-term memory storage are comparable in terms of time. In under development by the SDBV services, this information also ry. The curators accustom them- and included both accommodation our case, the visual search required group for over 10 years. directly feeds a COVID-19 study DeepCurate represents a step selves to the domain-specific with the screen-based triage setup greater effort (that is, a greater search portal implemented by our toward semi-automated data ontology from the core or from the using an eye tracker and the ease cognitive load) than did the reading. COVID-19 Study Hub based on SEEK project partners at ZB MED in curation and offers a deeper under- habitual domain of their work. of reading papers from a new It is also possible that the curators for clinical and epidemiological Cologne (https://covid19.studyhub. standing of the biocuration process. domain. needed to search less and to read studies nfdi4health.de). The standardization Within the DeepCurate project, Another focus of the project is the Due to the pressing need during the of metadata that describe the which has been funded by the analysis of the effects of working progressing pandemic, a quickly studies as well as their subjects BMBF since January 2020, the data- with documents from the new growing number of clinical and and results also constitute part of base aims to strengthen biologi- COVID-19 domain. We found that it epidemiological COVID-19 studies our activities in the NFDI4Health cal-data curation using semi-auto- takes a statistically significant and are planned or already ongoing, but COVID-19 task force and contribute matic processes within the greater amount of effort to work there is a lack of coordination to the aim of making the studies biocuration framework of SABIO-RK with documents from the new among these efforts for securing and their content “FAIR.” via collaboration with the NLP COVID-19 domain than with those common standards, comparable group at HITS. The SABIO-RK from the core domain. We used the results, and – most importantly – COVID-19 kinetics data extraction database for biochemical reaction cognitive load (that is, pupil dilation unified access to these results. for SABIO-RK kinetics has successfully served the [PD] in percentage) to measure the Several partners in the German Within our SABIO-RK database community for several years. effort and then proposed a thresh- National Research Data Infrastruc- (http://sabiork.h-its.org/), users can DeepCurate also seeks to provide a old for effortless reading using ture for Personal Health Data (NFDI- find biochemical reactions together deeper understanding of the cura- quartile-based outlier estimation of 4Health), including the SDBV group, with the catalyzing enzymes and tors’ cognitive and ergonomic the cognitive load (Fig. 52). are developing a COVID-19 Study kinetic constants that describe processes, which can reveal their Figure 53: Cognitive load (PD in %) for fixation and saccade for two domains for both partici- Hub for Germany with funding from reaction velocities (Vmax, kcat) as perceived workload and lead to a We can thus now send feedback to pants (same legend for both figures). the DFG as a COVID-19 task force well as concentrations of reaction better understanding of points of the curators in the case of a diffi- (https://www.nfdi4health.de/index. participants for half-maximal difficulties or subtleties of compre- cult curation task, and we can also We noted that the cognitive load comparatively more for the familiar php/task-force-covid-19-2/). This activities (Km) or of inhibitors for hension. We are currently looking determine the curation cost using related to eye fixations decreased domain because the structure of publicly accessible platform bundles half-maximal inhibitions (Ki, IC50). deeply into the document-triage this metric. We further observed and remained stable over time for the paper may have been familiar to information on relevant clinical and Such structured, annotated, and procedure. Document triage is that the curators took around twice the new domain, which could them and they may have under- epidemiological studies in Germany, standardized collections of kinetic defined as the process of pooling as long to complete the triage of represent weak evidence of increas- stood where the required data could their publicly available study docu- constants that originate from candidate papers from a large COVID-domain literature than of ing familiarity with the new domain. be found. We look forward to using ments and results (e.g., measured scientific publications are needed journal repository – such as core-domain literature. We investi- The fixations were eye movements the proposed metrics in the main parameters), and other information not only to model metabolic net- PubMed – for later curation in the gated whether the curators’ famil- in which the retina stabilized over a curation task in the future (read about the studies. works but also for basic research, SABIO-RK database. This pooling is iarity with the curation in the new stationary object of interest used in more about DeepCurate in Chapter The COVID-19 Study Hub is partially for example, in drug development. performed by the curators via domain increased with the number reading. The rapid motion of the eye 2.9, pp. 63-65).

74 75 2.11 Scientific Databases and Visualization (SDBV) Der Großteil des diesjährigen Berichts ist COVID-19-Aktivitäten gewidmet, außerdem unserem HITS Lab-Projekt DeepCu- German National Research rate und NFDI4Health, der Nationalen Forschungsdateninfrastruktur für personenbezogene Gesundheitsdaten. Zuvor Data Infrastructure for jedoch fassen wir die Ziele der Scientific Databases and Visualization (SDBV) Gruppe zusammen und stellen die Personal Health Data from the health sector have con- in general, as has been impressively wichtigsten Entwicklungen vor. (NFDI4Health) firmed their participation, including demonstrated during the current major professional associations and COVID-19 pandemic. Die Gruppe befasst sich mit Diensten über FAIRe Daten. Die Abkürzung FAIR bezeichnet Findable, Accessible, Interopera- The largest change to the SDBV epidemiological cohorts. Letters of In the majority of cases, these types ble und Reusable Daten, also auffindbar, zugreifbar, interoperabel und nachnutzbare Daten. Mittels des SEEK Systems group in 2020 was the launch of support have been received from 37 of health data are hosted in distribut- helfen wir Nutzern, ihre eigenen Daten FAIR zu machen. Mit SABIO-RK stellen professionelle Kuratorinnen FAIR-Daten NFDI4Health – the German National national and international institu- ed local-data infrastructures within über enzymkatalysierte Reaktionen bereit, extrahiert aus der Literatur. Der Sinn von FAIR ist die “Discoverability“, also, Research Data Infrastructure for tions. institutions, such as in hospitals, dass sich Rechner in den Daten zurechtfinden können. Idealerweise sollte ein smartes Datensammelprogramm FAIRe Personal Health Data – in October public-health authorities, and Daten als solche erkennen, und bereit sein, die Daten zu ernten. Diese Ideen umsetzen zu wollen, impliziert Standardisie- 2020 (https://www.nfdi4health.de). The project will pursue the following research institutions. Information on rung. Partizipation in Standardisierungs-Initiativen wie COMBINE, STRENDA und ISO TC 276 Biotechnology, ist ein As one of 18 partner institutions in main objectives: (1) to implement a these data is often fractured in local signifikanter Teil unserer Arbeit. Wichtig für unsere Arbeit ist die enge Zusammenarbeit von Biolog/-innen und Biochemi- the initiative, the SDBV group at federated health-data infrastructure repositories and websites or fre- ker/-innen, die mit Informatiker/-innen zusammenarbeiten. HITS brings its many years of in Germany for searching for and quently even in publications. expertise with scientific databases accessing healthcare data and Ein Großteil unserer Arbeit wird durch Projektförderung ermöglicht. Wir haben 3 Arten von Projekten: (i) Projekte mit and data standardization to the health databases, (2) to enhance It is therefore vital to enhance the biologischem/biomedizinischem Fokus, in denen SDBV ein Infrastruktur-Partner ist, wie z.B. LiSyM; (ii) kollaborative table. Within the framework of the data sharing and the data linkage of findability and public availability of Infrastrukturprojekte mit SDBV, wie z.B. FAIRDOM und de.NBI; (iii) Forschungsprojekte wie DeepCurate und PoLiMeR. German National Research Infra- personal-health data in compliance these distributed resources through structure (NFDI), the multidisci- with privacy regulations and ethics the publication of GDPR-compliant Im letzten Jahr endete ein Projekt, das zusammen mit seinen Vorgängern SysMO-DB I und SysMO-DB II das vorige plinary team of NFDI4Health is set principles, (3) to enable the develop- rich metadata of health studies, Jahrzehnt unserer Arbeit bestimmt hat: FAIRDOM, das Projekt über FAIRe Daten, Operationen und Modelle. Dieses to establish a research-data infra- ment and deployment of new including applied study instruments, Projekt und von ihm abgeleitete Projekte waren hier oft Berichtsgegenstand. Auch in diesem Jahr sind Resultate aus structure for personal-health data in consent-management mechanisms questionnaires, data catalogues, FAIRDOM in zwei Projekten enthalten, die in diesem Jahr begonnen wurden, ELIXIR CONVERGE und NFDI4Health. Germany. The project – along with and augmented data-access ser- and protocols. It is important to 8 other German research-data vices, and (4) to foster data sharing support the publication of health Wir hoffen, dass unsere Beiträge zu NFDI4Health und NFDI4Biodiversity, ELIXIR CONVERGE und anderen Projekten die infrastructures – is funded by both and cooperation between the data by FAIRDOM-Arbeiten weiter entwickeln werden. the German Federal Government communities involved in clinical 1. developing and establishing and German state governments. research, epidemiological health, domain-specific publication 5. supporting the harmonization of Greifswald. Together with project processing and integration in the The SDBV group is directly involved and public health. guidelines for health data; common-core (meta-)datasets. partner ZB Med in Cologne, HITS life sciences. He is also involved in in the initiative as a key partner and 2. enforcing the consistent appli- Funding and incentives for data researchers will also develop a scientific-standardization initiatives, assumes a leading role in the data Cohort studies, health-surveillance cation of existing health-data collectors as well as corre- search portal that will access these such as COMBINE (Computational standardization. The SDBV group systems, and clinical trials are standards and metadata stan- sponding funding for data data in order to make it easier for Modeling in Biology Network) and will also make its SEEK software characterized by a deep phenotyp- dards (including standard stewards and data-management users to find them. EU-STANDS4PM (a European suite available to the initiative as ing of study subjects (healthy terminologies, ontologies, and infrastructures is needed, standardization framework for data part of the infrastructure architec- individuals and/or patients) via minimal information standards especially for data harmoniza- Martin Golebiewski will assume integration and data-driven in silico ture that will be built by NFDI- questionnaires, medical examina- for the health domain); tion and for past studies and leadership of the entire NFDI- models for personalized medicine). 4Health over the next few years. tions, molecular and genetic profil- 3. establishing new standards for their data (and to extract these 4Health work package (task area) Our goal is to quickly and purpose- ing, and collecting relevant metada- health data and their metadata, data from the scientific litera- “Standards for FAIR Data” (TA2). fully adapt standards to the require- NFDI4Health provides a long-term ta on study subjects. Their if needed, in close collaboration ture). Contributors to the work package ments of the data in NFDI4Health perspective for the SDBV group as longitudinal nature and high quality with domain-specific standard- will adapt and harmonize relevant and – if necessary – to initiate the this infrastructure is intended to be make these data and their metadata ization initiatives and commit- Developed in the SDBV group with data standards and thereby lay the development of new standards. sustainably funded, with an initial a valuable research resource for tees of standards-developing partners at the University of Man- foundation for bundling and struc- runtime of 5 years. NFDI4Health developing preventive and therapeu- organizations (SDOs), such as chester in the UK (Prof. Dr. Carole turing, which will facilitate finding In support of the establishment and represents an interdisciplinary tic measures at the individual- and the ISO, the CEN, and their Goble’s team), the SEEK platform and comparing the collected per- promotion of all research-data research community that integrates population levels. Rendering these national counterparts; will play a central role as a “hub” for sonal-health data. Martin has been infrastructures, both the German major German institutions with types of health data FAIR (Findable, 4. providing incentives and support data bundling, especially for meta- involved in standardization commit- Federal Government and individual experience as health-data collectors Accessible, Interoperable, and for publishing rich information data from various studies that are tees at the International Organiza- states together intend to provide up and holders, health-data managers Reusable) by providing and applying about health data, including collected and sorted in NFDI- tion for Standardization (ISO) for to EUR 90 million annually until at and analysts, as well as methodolo- corresponding standards and data instruments, access informa- 4Health. The group will adapt the several years and leads the ISO/TC least 2028 with the intention of gy developers. In addition to the 18 infrastructures is therefore a major tion, patient-consent forms, and SEEK software to meet these 276/WG 5, which comprises around creating a long-term sustained partners funded by the initiative, a goal not only for health research further study-related informa- requirements along with project 150 experts from all over the world platform that supports research. total of 46 renowned institutions but also for the public-health sector tion; partners in Leipzig, Göttingen, and who develop standards for data

76 77 Background

2 Research In the TOS group, we focus on stars An oscillation mode is uniquely cannot propagate and decay expo- with oscillations similar to those determined by the properties of the nentially; see Fig.54). The coupling present in the Sun. These so-called matter through which it travels and between the two oscillating cavities 2.12 Theory and solar-like oscillations are low-ampli- is described by its frequency (or and the phases of the waves in tude oscillations that are stochasti- period) and mode identification – each cavity can be derived from the cally excited through turbulence in that is, the radial order (the number resulting mixed pressure–gravity Observations of Stars (TOS) the near-surface convection layer of of nodal lines in the radial direc- oscillations and can provide infor- a star. The oscillations are sound tion), spherical degree (the number mation on the physical conditions in waves and are expected to be of nodal lines on the surface), and the evanescent region. Furthermore, present in all stars with convective azimuthal order (the number of the difference in period between outer layers. A convective envelope nodal lines that cross the spin axis). pure gravity dipole modes with is typically present in low-mass In evolved, so-called red-giant stars, consecutive radial orders (so-called main-sequence stars, subgiants, the dipole (with a spherical degree period spacing, which can be and red giants with surface tem- of 1) modes have sensitivity to both extracted from mixed dipole modes) peratures below ~6,700 °K. the deep interior and the outer provides a measure of the extent of The stellar structure is imprinted in layers – that is, the oscillations the gravity mode cavity and thus of the global oscilla- resonate in an inner (gravity) and the properties of the stellar core. tion modes an outer (acoustic) cavity Determining these values and of a separated by an evanes- understanding the physical process- star. cent zone (the area es that take place in these deep between the parts of stars is one of the aims of cavities where the TOS group. oscillations

Group Leader Student Prof. Dr. Ir. Saskia Hekker (since September 2020) Julian Schlecker (since 14 December 2020)

Staff member Daria Mokrytska (since September 2020)

Stars are an important source of electromagnetic radiation in CoRoT-, Kepler-, K2-, TESS-, SONG-, and Plato space observato- the universe that allow for the study of many phenomena, ries – combined with astrometric observations from Gaia, ranging from distant galaxies to the interstellar medium and spectroscopic data from the SDSS-V APOGEE, interferometry, extra-solar planets. However, due to their opacity, “at first sight photometry, and state-of-the-art stellar models, such as MESA it would seem that the deep interior of the Sun and stars is – provides insights into the stellar structure and physical less accessible to scientific investigation than any other region processes that take place in stars. Figure 54 left: Ray path of a dipole mode in the outer acoustic (sound) of the universe” (Sir Arthur Eddington, 1926). Now, however, Understanding these physical processes within stars and how cavity in black and of a dipole mode with the same frequency in the inner gravity cavity in blue. The outer turning points of the gravity through modern mathematical techniques and high-quality they change as a function of stellar evolution is the ultimate modes are indicated by the red-dashed circle. The empty white ring data, it has become possible to probe and study the internal goal of the Theory and Observations of Stars (TOS) group at between the black and blue ray paths is the evanescent zone. Right: stellar structure directly through global stellar oscillations via a HITS, which was established in 2020. Our focus lies in – but is Zoom of the ray path in the gravity mode cavity. The ray paths were method known as asteroseismology. not limited to – low-mass main-sequence stars, subgiants, and computed following Alexander Kosovichev (Kosovichev, Advances in global and local helioseismology: an introductory review, lecture notes Asteroseismology uses similar techniques to helioseismology red giants. These stars are interesting because they go in physics 832, 3, 2011, https://arxiv.org/abs/1103.1707) based on a as carried out on our closest star, the Sun, to study the struc- through a series of internal structure changes. Furthermore, stellar model computed with the MESA stellar-evolution code (Paxton ture of other stars. The properties of waves are used to they are potential hosts of planets and serve as standard et al., Modules for Experiments in Stellar Astrophysics (MESA): the internal conditions of these stars. Oscillations that impact candles for galactic studies (core-helium-burning red giants). Pulsating variable stars, rotation, convective boundaries, and energy conservation, ApJS 243, 10, 2019 and references therein, https://arxiv. upon the whole star reveal information that is hidden by the Exoplanet studies as well as galactic archaeology will hence org/abs/1903.01426). opaque surface. This asteroseismic information from the also benefit from an increased understanding of these stars.

78 79 2.12 Theory and Observations of Stars (TOS)

Ongoing work improve the determined age and Fourier spectrum. We aim to an- distance of the cluster when viewed swer the following questions: Extraction of oscillation features https://www.nature.com/articles/ enhancing lithium include mixing, as a single entity. At the same time, To extract the oscillations, we work nature10612?page=4 ), and the core which could be a result of rotation. we aim to determine the pres- • What are the physical differences in Fourier space. In this space, rotation rate therefore can be With this project, we aim to unravel ent-day helium abundances for a in the structures of / conditions in stellar oscillations appear as peaks. determined from the mixed dipole the impact of rotation on the subsample of stars. This latter red giants that lead to different Due to their stochastic driving modes (Mosser et al., Spin down of abundance of lithium. determination will provide important oscillation spectra? mechanism, solar-like oscillators the core rotation in red giants, A&A constraints on our knowledge of the • What is the cause of the different have peaks with a width depending 548, A10, 2012, https://www.aanda. Red-giant bump primordial helium abundance of the structures / conditions in these on their lifetimes. Identifying these org/articles/aa/full_html/2012/12/ In addition to these ongoing pro- environment in which these stars stars? peaks from the noise is therefore aa20106-12/aa20106-12.html ; jects, we aim to further our under- were formed. non-trivial. We developed a peak-de- Gehan et al., Core rotation braking standing of what happens at the The goal of the DipolarSound tection method based on a Mexi- on the red giant branch for various red-giant bump. The red-giant bump ERC Consolidator Grant “Dipolar- proposal is to unravel the physical can-hat wavelet that automatically mass ranges, A&A 616, A24, 2018, is a short phase in stellar evolution Sound” conditions and processes at play in and reliably determines and charac- https://www.aanda.org/articles/aa/ in which changes in surface tem- As of October 2021, the TOS group red giants using mixed dipole terizes the oscillation signals in pdf/2018/08/aa32822-18.pdf). perature and brightness are retract- will also be funded by the ERC oscillation modes and to better terms of frequencies, amplitudes, However, to obtain the radial rota- ed before the stars continue on Consolidator Grant “DipolarSound.” understand the underlying physical and lifetimes (see Garcia Saravia tion profile, the envelope rotation their paths. This feature is well Within the framework of this grant, origins of the different oscillation Ortiz de Montellano, Hekker, rate is an essential input. Currently, known in models and observations; we will study stars that display spectra observed in red giants. Themeßl, Automated asteroseismic the envelope-rotation rate has only however, the physical mechanism different morphologies in their peak detections, MNRAS 476, 1470, been derived for about 20 red behind it remains unknown, and the 2018, https://academic.oup.com/ giants (Aerts, Mathis, Rogers, location of the bump in the models mnras/article/476/2/1470/4832523 ). Angular momentum transport in and observations is not the same. We are subsequently in the process stellar interiors, ARAA 57, 35, 2019, Last year, we published a paper of developing ways to characterize and references therein, https://www. (Hekker et al., Mirror principle and Sterne sind eine wichtige Quelle elektromagnetischer Strahlung im Universum, mit der viele Phänomene untersucht the modes in terms of their radial annualreviews.org/doi/abs/10.1146/ the red-giant bump: the battle of werden können, von fernen Galaxien über das interstellare Medium bis hin zu Exoplaneten. Aufgrund ihrer Undurchsich- order, spherical degree, and azi- annurev-astro-091918-104359 ). entropy in low-mass stars, MNRAS tigkeit wurde jedoch einmal gesagt, dass „auf den ersten Blick das tiefe Innere der Sonne und der Sterne für wissen- muthal order. TACO (Tools for the 492, 5940, 2020, https://academic. schaftliche Untersuchungen weniger zugänglich zu sein scheint, als jede andere Region des Universums“ (Sir Arthur Automated Characterization of We are in the process of obtaining oup.com/mnras/arti- Eddington, 1926). Durch moderne mathematische Methoden und die Menge und Qualität verfügbarer Daten ist es nun Oscillations) – a full code that we asteroseismic rotation measure- cle/492/4/5940/5709942 ), in which jedoch möglich geworden, die innere Sternstruktur direkt durch Sternschwingungen zu erforschen: eine Methode, die als plan to release soon – is in its final ments for a large set of hundreds we proposed a physical mechanism Asteroseismologie bekannt ist. stages of development. of stars in order to build a statisti- that could cause this feature. We cal sample that can be used to now plan to check whether the Die Asteroseismologie verwendet ähnliche Techniken wie die Helioseismologie, die an unserem nächstgelegenen Stern, Rotation study the angular momentum-trans- developed toy model indeed works der Sonne, durchgeführt wird, um die Struktur anderer Sterne zu untersuchen. Hierzu werden die Eigenschaften von Given that the interior and exterior port mechanisms in stars. Further- and whether it can lead to a better Wellen verwendet, um Rückschlüsse auf die innere Beschaffenheit von Sternen zu ziehen. Schwingungen, die auf den of stars rotate, mixed dipole modes more, we aim to perform tests to understanding of the physics that is ganzen Stern einwirken, enthüllen so Informationen, die durch die undurchsichtige Oberfläche normalerweise verborgen always carry a signature of that validate the robustness of the missing in the models. sind. Diese asteroseismischen Informationen der Weltraumobservatorien wie CoRoT, Kepler, K2, TESS, SONG und Plato rotation. The dipole modes are split results obtained from rotational kombiniert mit astrometrischen Beobachtungen von Gaia, spektroskopischen Daten von SDSS-V APOGEE, Interferome- into several components depending inversions of the radial rotation Projects beginning in 2021 trie, Photometrie und hochmodernen Sternmodellen wie MESA, geben Einblicke in die Sternstruktur und die physikali- on the orientation of the star. If the profile in stars. More specifically, schen Prozesse, die in Sternen ablaufen. rotation rate is high enough, this we intend to obtain these rotational SFB 881 “The Milky Way System” splitting can be observed. A signifi- inversion results for both lithi- Beginning in 2021, the TOS group Das Ziel der Theory and Observations of Stars (TOS) Gruppe am HITS, die 2020 eingerichtet wurde, ist die Untersu- cant part of the rotationally split um-rich stars and stars that are not will participate in SFB 881 “The chung dieser physikalischen Prozesse, die in Sternen ablaufen, und wie sich diese in Abhängigkeit von der Sternentwick- mixed dipole modes – even of the rich in lithium. According to stan- Milky Way System,” in which we lung verändern. Die Gruppe konzentriert sich hierbei unter anderem auf sogenannte Hauptreihen-Sterne geringer Masse, most pressure-dominated ones – is dard stellar-evolution models, intend to use asteroseismology to „Unterriesen“ und rote Riesensterne. Diese Sterne sind deshalb interessant, weil sich ihre innere Struktur schnell ändert. sensitive to the core (Beck et al., lithium is depleted in evolved stars. obtain accurate masses, radii, ages, Da sie potenziell von Planeten umgeben und kosmologische „Standardkerzen“ für Galaxienstudien sind, können sowohl Fast core rotation in red-giant stars However, lithium can be observed at and distances as well as cluster die Exoplanetenforschung als auch die Galaxien-Archäologie vom wachsenden Verständnis dieser Sterne profitieren. as revealed by gravity-dominated enhanced values in a subsample of memberships of individual cluster mixed modes, Nature 481, 55, 2012 stars. Nearly all scenarios for stars and to use this information to

80 81 3 Centralized Services 3.2 IT Infrastructure and Network 3.1 Administrative Services At the beginning of the year, we The HITS administration supports the integrated a new underground fiber scientific groups in nearly all adminis- link in our network infrastructure, trative tasks. It takes care of day-to- thereby finalizing a larger project day operations at both HITS locations, that had begun in 2017. Both the manages human resources and Internet connection of the Institute accounting, clarifies legal issues, and and the connection between HITS assists the communications team in and the Heidelberg University’s organizing events. Computing Centre have thus be- The corona pandemic was, at least in come fully redundant. outward appearance, the main topic of 2020, and this was also true for the administration at The beginning of the COVID-19 HITS. Beginning in March 2020, many procedures and pandemic and the sudden switch to Group Leader processes were changed, and the administration team, like working from home in March did Dr. Gesa Schönberger almost all other HITSters, switched from working in the not catch us off guard. Indeed, for office to working from remote locations without any major several years, HITS has had an Staff members problems. All external and internal events and gatherings extended system for remote work in Group Leader Christina Blach (office) were initially cancelled, but many of these events could be place, including a secure VPN, a Dr. Ion Bogdan Costescu Frauke Bley (human resources) rescheduled online after a short period of time. The text-only- and a graphical login Christina Bölk-Krosta (controlling) cafeteria was closed from 12 March to 17 May and then server, and many web services. Staff members Benedicta Frech (office) again beginning on 21 December 2020, resulting in largely Most HITSters were already Dr. Bernd Doser (Senior Software Developer) Ingrid Kräling (controlling) empty offices each time. A strong hygiene plan put rules in equipped with the necessary lap- Norbert Rabes (System Administrator) Thomas Rasem (controlling) place for work for anyone who still wanted or had to come tops and accessories. The Internet Andreas Ulrich (System Administrator) Rebekka Riehl (human resources and to HITS. This plan was regularly reviewed but could largely connection of the Institute had Taufan Zimmer (System Administrator) Assistant to the Managing Director) be maintained unaltered throughout the year. Thus, al- sufficient free capacity to support Stefanie Szymorek (human resources) though many things at HITS were spatially separated, sustained work from home as well Student Irina Zaichenko (accounting) fortunately, the actual work itself could nevertheless as many videoconferences, which Julian Aris (July - December 2020) proceed as usual in many cases. have since become an integral part of Student After its official start at the beginning of 2020, the HITS daily work at HITS. As a result, most Lilly Börstler (until July 2020) administration was busy with the implementation of its scientific activities have continued, even during the tight The large BeeGFS storage system at HITS was also new corporate design as well as with the first events in lockdown periods. The ITS group members have gener- replaced, and the usable capacity was increased to celebration of our 10-year anniversary (see chapter 5.3). ally combined remote- and on-site work in order to approximately 1.6TB. The 150 compute nodes have maintain the hardware infrastructure while keeping a access to this storage system over Infiniband as well as The HITS Lab program was established at about the same time. It provides additional funding for interdisciplinary safe distance between each other. to the other BeeGFS storage system, which is located in group work on unusual research questions. We currently have two preliminary projects running that receive third-party the Heidelberg University Computing Centre, over funding and another two that are internally funded and began in 2020 (see Chapters 2.4, 2.6, 2.9, 2.11). In addition to its The installation of the new HPC cluster at HITS, which 10/40G Ethernet. Sharing the storage systems increas- day-to-day business, the HR team spent almost the entire year working on selecting and introducing an HR software had also been planned for the beginning of the year, es efficiency by eliminating the need to copy/move data program. Personio software was finally chosen and implemented over the summer and began being used by all HITS- had to be postponed to the summer due to large delays between the two clusters, and it also increases usability ters in September 2020. in the delivery of components. The old compute nodes, by employing a uniform naming scheme. This sharing some of which dated back to 2010, were replaced by will hopefully contribute to increased scientific output in The appointment of Saskia Hekker in the first half of the year and the launch of her TOS group in September 2020 are 150 new ones featuring Intel Cascade Lake CPUs and the years to come. two additional and important major steps taken by the Institute (see Chapter 2.12). Supporting Saskia Hekker and Infiniband technology of the newest generation for Frauke Gräter in the application process for two ERC Consolidator Grants, which was met with success at the end of optimal scalability of our parallel codes. the year, was no less important. We celebrated another great milestone with Fabian Schneider, whose applications to both the DFG (Emmy Noether Program) and the EU (ERC Starting Grant) were successful. Fabian prepared with us for the start of his junior group, SET, in the last third of the year.

82 83 management and standardization to stars: For five years running, Clarivate the search for suitable medicine and Analytics listed computer scientist COVID-19 forecasts – projects no Alexandros Stamatakis as one of the 4 Communication and one would have envisioned in Janu- most-cited researchers in his field ary. “Non-corona”-related research worldwide. was also successful and included Outreach new, “explosive” findings on dying Furthermore, we were able to present stars, a study on collagen with an a plethora of research highlights from impact on material research and the last ten years in our newly HITS on the air: Radio journalist Eberhard biomedicine, and an easy-to-use established blog “Via Data,” edited by Reuß interviews Rebecca Wade (MCM) for open-source online platform for Angela Michel on the “Scilogs” the broadcast on the Institute’s anniversary biomedical image segmentation. platform. Throughout the course of in January 2020. the year, several researchers joined other institutes, but we all missed In terms of the HITS workforce, we the team of authors and posted seeing one another in person. literally reached for the stars: In about their experiences and findings To compensate for this loss, we September, asteroseismologist in easily understandable language. immediately intensified our internal Saskia Hekker joined HITS as TOS Inspired by the positive response, we communications by issuing regular Group Leader (see Chapter 2.12.). In will continue with the “Via Data” blog emails to all HITSters to inform them December, she won an ERC Consoli- beyond the anniversary year. of the current status of the pandem- dator Grant for her research on the ic, and we included some lab news sound of the stars. At the same time,

HITS Communications team in 2020 (f.l.t.r.): Olexandr Golovin, Peter Saueressig, Isabel Lacurie, Angela Michel.

Head of Communications Student Dr. Peter Saueressig Olexandr Golovin

Staff members Isabel Lacurie Angela Michel

“Crash test for proteins”: HITS researchers Frauke Gräter (right) and Isabel Martin (left) presented their research in the MAINS Heidelberg, The HITS communications team is the Institute’s central such as conferences and workshops. Moreover, we strive moderated by science journalist Pia Grzesiak. hub for external and internal communications. Its main to spark enthusiasm for science among school students tasks are to raise the profile of HITS by coordinating and the general public alike through our outreach activi- about special achievements and MBM Group Leader Frauke Gräter Media relations: “Journalist media relations, digital and social-media communications, ties. In 2020, however, we had to adjust all our original interesting topics. As there were won an ERC Consolidator Grant for in Residence” program the Institute’s publications, and design and branding as plans in response to the fundamentally new situation. many such topics, there was always her project on radicals in collagen well as by organizing events for the scientific community, enough “fodder” for us to broadcast. (see Chapter 2.7). We firmly believe that an important Additionally, we were granted even prerequisite for successful science We had been looking forward to 2020 Neckar-Zeitung” and an interview on series of public talks in the MAINS Scientific excellence: more funding from the ERC: Kai communication is the development with great anticipation because on 1 the German radio station SWR 2 with venue in Heidelberg. At the first event Fighting COVID and reaching Polsterer’s AIN group participated in of reliable and sustainable journalistic January 2020, HITS celebrated its HITS researchers Rebecca Wade in February, Frauke Gräter and Isabel for the stars a project that received an ERC contacts. The “Journalist in Resi- 10th birthday. Therefore, we invited (MCM), Sebastian Lerch (CST), and Martin presented “crash test for Synergy Grant for measuring the true dence” program thus represents an distinguished guests to our anniver- Fabian Schneider (PSO). In February, proteins.” In terms of scientific success, the scale of the Universe. Moreover, important project for HITS. The sary reception in January for the we organized the ZoRa Workshop at However, in March, all our plans HITS communications team was astrophysicist Fabian Schneider program is geared toward science beginning of the many planned HITS, a project funded by the Klaus came crashing down due to corona. pleased to announce a great deal of received an ERC Starting Grant for journalists and offers them a paid special events and activities (see Tschira Foundation (see Chapter Many events had to be canceled, and good news to the public. his project on the evolution of stars sojourn at HITS. During their stay, Chapter 5.3). The reception was 5.1.3). Throughout the course of the others were shifted to the digital Our researchers participated in and began his own research group at journalists can learn more about complemented by a one-page news- year, we had planned to organize an sphere. As our lab is the computer, COVID-19 projects ranging from data HITS in January 2021. Speaking of data-driven science and get to know paper portrait in the local “Rhein- anniversary open-house event and a we had fewer problems coping than

84 85 4 Communication and Outreach researchers and new research topics without the pressure of the “daily grind.” In the most-recent opening, candidates from six continents applied. Canadian science journalist and author Siobhan Roberts was chosen to be the 2020 “Journalist in Resi- dence.” Roberts has worked as a freelance journalist with a focus on mathematics and science since 2001. She writes regularly for The New York Times’ “Science Times” and has contributed to The New Yorker’s science and tech blog, “Elements,” as well as to The Walrus, Quanta, and Kai Polsterer delivering his keynote address at the virtual ADASS conference in November 2020 via video stream. The Guardian, among other publica- tions. Moreover, Roberts is the author of two biographies on mathemati- on the HITS YouTube channel). “HITS In 2020, the run had to be reorga- teams participated in Heidelberg cians: “King of Infinite Space,” on is that rare place where you can nized as a virtual three-day event. alone, bicycling more than 200,000 Donald Coxeter, and “Genius at Play,” buckle down and make progress on a From 26–28 June, almost 9,000 kilometers in total. The “HITSter” on John Horton Conway. She has big project while at once expanding participants ran a total of 120,000 team ranked 25th in the citywide earned multiple awards for her work, your reportorial horizons in all sorts kilometers either individually or in competition and contributed over including the Euler Book Prize from of new directions. And the hilly forest small groups following a route of 2,000 kilometers. Thanks to their the Mathematical Association of provides the perfect escape to their own choosing. Among them efforts, the team managed to save America. contemplate and process it all and was the HITS team of 30 runners, over 300 kilograms of CO2 emissions Siobhan Roberts arrived at HITS in map a path forward,” stated Roberts, who were especially motivated: On while having fun and staying healthy. summing up her stay at the Institute. the occasion of its 25th anniversary, The next HITS “Journalist in Resi- the Klaus Tschira Foundation had dence” will be radio journalist Carl decided to sponsor each staff Smith (Sydney, Australia) in the kilometer with 25 euros. And the second half of 2021. running team did not need to be told twice: In keeping with the team Events: Separated by slogan “Run beyond the limits!” they distance, but united in completed a combined distance of spirit 427 kilometers and thus raised a total “Run beyond the limits!”: The HITS running team at the first virtual NCT run in June completed of EUR 10,675 for cancer research. a combined total distance of 427 kilometers and raised more than EUR 10,000 for cancer In addition to many digital events, The best moments of this first virtual research. such as the “virtual lab visit” for young NCT Run are preserved for posterity groups on several occasions, both searchers an online writing workshop researchers during the Heidelberg on YouTube. online and “in real life.” Roberts’s in November, which was fully booked. Laureate Forum (see Chapter 6) and goal was to increase her fluency in Due to the pandemic, she was only the live streaming of Kai Polsterer’s Another event took place from 20 data-driven research and to explore able to visit the numerous university- keynote address from HITS as part of September to 10 October 2020: the opportunities for data journalism. and extramural research institutes in the ADASS conference, some “in-per- nationwide “Stadtradeln” (“City Siobhan Roberts, HITS Journalist in She also worked on her current book Heidelberg on rare occasions; howev- son” events also took place. The Cycling Challenge”) competition. The Residence 2020. project, a biography on Swiss-Ameri- er, Roberts gave an online public talk annual run organized by the National competition was designed to show mid-September 2020. In spite of the can/Canadian mathematical logician entitled “Embracing the Uncertainties” Center for Tumor Diseases (NCT) in that we can easily navigate through Corona crisis, she was able to meet and group theorist Verena Huber- in January 2021 that marked the end Heidelberg is one of the most-popu- daily life by bike and thereby contrib- with HITS researchers from different Dyson. Roberts offered HITS re- of her stay (see the video of her talk lar charity events in the region. ute to saving the environment. 133

86 87 The workshop focused on the need for the standardization of patient-derived data for use in disease-related modeling in the context of personalized medicine as well as on the potential for artificial intelligence to utilize this patient-derived data and to support the modeling process. In addition to the gathering and discussion of use cases, the major aims of 5 Events the workshop were also to compare state-of-the-art modeling approaches based on patient-derived data, to discuss the potential for artificial-intelligence (AI) methods to utilize patient-derived data for modeling, as well as to analyze stan- 5.1 Conferences, Workshops & Courses dardization gaps in the use of AI technologies and manual modeling approaches in the context of in silico models for personalized medicine.

5.1.1 Emulator Day 5.1.3 ZOrA Workshop

27 January 2020 28 February 2020 Studio Villa Bosch/HITS, HITS, Heidelberg, Germany Heidelberg, Germany On 28 February 2020, HITS hosted a workshop within the What does research on astrophysics, molecular biology, and weather prediction have in common? The relevant physical framework of the “Zukunfts-Orientierungs-Akademie” processes act – and need to be modeled – on a wide variety of spatial and temporal scales. Up to now, these scales have (“Future-Orientation Academy”; short: ZOrA). The academy typically been addressed with ad-hoc models and so-called "sub-grid-scale" physics. However, using complimentary methods is a project by the PH Heidelberg and Heidelberg Universi- from computational statistics and machine learning appears promising and could lead to faster, more-efficient simulations. On ty and is funded by the Klaus Tschira Stifung. HITS – the interdisciplinary “Emulator Day,” organized by HITS group leaders Tilmann Gneiting (Computational Statistics), Frauke Gräter among other Heidelberg institutions – is a scientific part- (Molecular Biomechanics), and Friedrich Röpke (Physics of Stellar Objects), scientists met at the Studio Villa Bosch and at HITS ner of the program. ZOrA offers female high-school to discuss emulators – that is, judicious representations of complex, physics-based computer code via simpler and much-fast- students the possibility to visit different institutes and er surrogate models. organizations in Heidelberg in order to find out about The event began with a colloquium talk by Corinna Hoose (Karlsruhe Institute of Technology KIT, Germany) on the “Simulation career options in IT and science. HITS researchers Saskia of Deep Convective Clouds under Various Meteorological and Microphysical Impacts.” Afterward, Tilmann Gneiting presented Haupt (DMQ), Sabrina Gronow (PSO), and Mareike Pfeil an overview on emulators from a statistical perspective. Constanze Wellmann (Heidelberg University, Germany) provided (GRG) provided some valuable insights into their study insights on her use of “Statistical Emulation for Sensitivity Studies of Deep Convective Clouds,” and Takahiro Nishimichi (Kyoto paths as well as into their current research and career University, ) gave a talk about “Emulating Halo Statistics for Large-scale Structure Cosmology.” The “Emulator Day” also goals. Afterwards, the girls had the possibility to ask the marked the kick-off of a new interdisciplinary project called “Emulation in Simulation,” a HITS Lab project led by group leaders three scientists all the questions they had always wanted Tilmann Gneiting, Frauke Gräter, and Friedrich Röpke (see also Chapter 2.4). to ask about becoming a scientist.

5.1.2 EU-STANDS4PM Workshop “Using 5.1.4 Workshop “FAIR Data Infrastructures for Biomedical Communities” patient-derived data for in silico modeling Funded by in personalized medicine” 15 October 2020 online German Research Foundation 4 February 2020 of artificial-intelligence (AI) technology and manual in HITS, Heidelberg, Germany silico modeling approaches for personalized medicine, This online workshop with more than 80 attendees from Germany and all over Europe introduced several European and HITSter Martin Golebiewski together with Heike Moser national activities geared toward supporting the biomedical research community with FAIR (Findable, Accessible, Interopera- EU-STANDS4PM is a European initiative funded by the from DIN (German Institute for Standardization) presented ble, and Reusable) data management. By exchanging experiences from across the different initiatives, common concepts and Horizon2020 Framework Programme of the European possible options for standardization strategies. Moreover, a joint strategy for FAIR data and corresponding standards were developed. Commission with the aim of developing and establishing Christina Kyriakopoulou, a representative from the Europe- HITSter Martin Golebiewski organized the workshop together with partners in Göttingen and Leipzig as a satellite of the 65th a European standardization framework for data integration an Commission’s (EC) Health Directorate, attended the annual conference of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS). Martin leads the and data-driven in silico models for personalized medi- workshop and gave an overview on the EC’s vision of the hosting GMDS working group “FAIR data infrastructures for biomedical informatics,” which was formed in early 2019 (https:// cine. In an EU-STANDS4PM workshop organized by Martin importance of standardization for in silico modeling in bit.ly/2N6rCq1). The group bundles infrastructure activities for health-related FAIR data in the fields of biomedical research, Golebiewski, about 25 experts in data- and model stan- personalized medicine. clinical-data management, and medical informatics in Germany. It aims to provide a platform for information exchange on the dardization, data collection, and model simulation in best practices for and the implementation of FAIR data management in these fields. personalized medicine met in Heidelberg. The attendees Keynotes were presented by Niklas Blomberg – Director of the European Bioinformatics Infrastructure ELIXIR – and Carole originated from 8 different countries – mainly from Goble from the University of Manchester (UK). Sylvia Thun from the Berlin Institute of Health at Charité spoke about the highly Germany and other parts of Europe but also from New topical “Standardization and interoperability of study data from COVID-19 clinical and public-health research.” Several initiatives Zealand – and represented the fields of academic and for FAIR data from the domain presented their activities, including FAIRsharing, FAIR4Health, FAIRness for FHIR, the European industrial research as well as clinical applications. Open Science Cloud for the life sciences (EOSC-Life), and FAIRDOM. The workshop also aimed to collect requirements for the Many of the attendees presented their use cases on using newly formed German National Research Data Infrastructure (NFDI). The coordinators of both the German Human Ge- patient-derived data for in silico modeling in short presen- nome-Phenome Archive (GHGA) and the National Research Data Infrastructure for Personal Health Data (nfdi4health) gave an tations. To initiate a standardization roadmap for the use update on these parts of NFDI.

88 89 5 Events 5.3 HITS anniversary 5.1.5 Astrophysics winter reception workshop HITS was established on 1 January 14–18 December 2020 2010 when EML Research gGmbH online Sweden, the United Kingdom, Ireland, (HITS) on compressible simulations changed its name. In 2020, the insti- the United States, and several institu- of stellar oscillations, by Mark Magee tute celebrated its 10th anniversary. Due to the COVID-19 pandemic, the tions in Germany and discussed (Trinity College Dublin, Ireland) on The first event to mark the occasion annual winter workshop organized by topics on supernova research, binary light curves of Type Ia supernovae, was the anniversary reception on 20 the Physics of Stellar Objects (PSO) stellar evolution, asteroseismology, and by Patrick Ondratschek (MPS January 2020. HITS Managing Direc- group had to take place as an online and processes in stellar interiors. Göttingen, Germany) on magnetohy- tor Gesa Schönberger and Scientific event. Despite its disadvantages in drodynamic simulations of common Director Wolfgang Müller welcomed terms of discussions, the format The plenary session featured invited envelope phases in stellar binary more than 80 guests to the Studio Cutting the first slice: Wolfgang Müller and Gesa Schönberger. allowed for more participants to take talks by Sherry Suyu (MPA Garching, systems. The plenary session was Villa Bosch: scientific and industrial part in the "15th Würzburg workshop" Germany) on gravitationally lensed followed by topical sessions in which partners, local Heidelberg politicians, than had done so in previous years supernovae, by Zhengwei Liu (Yunnan the participants presented their latest science journalists, and friends. and for the program to be extended Observatories of the Chinese Acade- results, and projects for collabora- to cover a full week (from 14–18 my of Sciences) on the impact of tions were discussed. Gesa Schönberger and Wolfgang December). The international group supernova explosions on companion Müller presented a review of the last of about 50 participants included stars, by Saskia Hekker (HITS) on 10 years and an outlook on upcoming scientists from Australia, China, asteroseismology, by Johann Higl events (unfortunately, most of them had to be canceled due to the pan- demic), followed by remarks by 5.2 HITS Colloquia Prof. Dr. Jens Meiler Andreas Reuter, former Managing Director and member of the board of Vanderbilt University, Informatics Center directors of the HITS-Stiftung. Prof. Dr. Corinna Hoose for Structural Biology, USA 10 years HITS: The fancy birthday cake. 8 September 2020: Innovative Institute of Meteorology and Climate Computational Methods for Protein Structure Prediction, Drug Research, Karlsruhe Institute of Discovery, and Therapeutic Design (Online) Next, group leader Kai Polsterer (AIN) Technology, Germany gave a vivid and lively talk about AI in 27 January 2020: Simulation of deep convective clouds under science entitled “Wenn Maschinen various meteorological and microphysical impacts Prof. Dr. Claudia Draxl lernen: Künstliche Intelligenz in der Wissenschaft” (“When Machines Physics Department of the Humboldt- Learn: Artificial Intelligence in Sci- Dr. Wolfgang Maier University of Berlin, Germany ence”). 12 October 2020: Predicting properties of IBM R&D – Systems & Technology complex materials: challenges for modern ab initio theory Afterward, during the informal recep- Group, Böblingen, Germany (Online) tion, the HITS management cut the 17 February 2020: Future Compute The audience in the Carl Bosch Auditorium. first slice of a fancy birthday cake as Paradigms a symbolic start to the anniversary Prof. Dr. Robert Best year.

Prof. Dr. Robert C. Williamson National Institute of Diabetes and Digestive and Kidney Diseases, Australian National University, Research Theoretical Biophysical Chemistry School of Computer Science, Australia Section, USA 29 June 2020: The AI of Ethics (Online) 23 November 2020: Molecular Simulations of Intrinsically Disordered Proteins (Online)

Kai Polsterer during his talk about AI in science.

90 91 6 Collaborations

researchers from four continents and in different time zones participated in the session and made extensive use of the Q&A to pose questions about what it is like to conduct research and work at HITS. The session went smoothly thanks to HITS Alumnus Volker Gaibler, now a member of the HLF staff, who moderated the event.

Anna Wienhard as the new scientific chairperson of the HLFF The closing ceremony of the HLF saw a change in the conference committee: As planned, computer A get-together in times of physical distancing: HLF participants during the Virtual Meeting Hub (Picture: HLFF). Anna Wienhard and Andreas Reuter at the HLF closing ceremony (Photo: HLFF). scientist and former HITS Managing Director Andreas Reuter announced Heidelberg Laureate Forum September 2020. It included a that he was stepping down as multifaceted scientific program and Scientific Chairperson of the HLFF, a Informatics4Life HITS. It consists of several subpro- Another subproject in which HITSters The Heidelberg Laureate Forum various interactive elements designed position he had held since the jects in which HITS researchers are are involved is “ML-supported (HLF) is a networking conference at to facilitate exchange between Foundation was established in 2013. Despite remarkable progress in the involved as principal investigators high-content screening of temporal which 200 carefully selected young participants. Tools such as the Andreas was essential in developing diagnosis and treatment of acute and and contributors. and spatial cellular features” (ML = researchers in mathematics and interactive event app and the Virtual the scientific elements of each chronic cardiovascular diseases in machine learning). This project has computer science spend a week Meeting Hub in VR were heavily HLF program. He passed the prover- recent years, these illnesses remain additionally been funded by the Klaus interacting with the laureates of the employed by the young researchers bial torch to mathematician Anna the leading cause of death and Tschira Foundation and – due to its disciplines, including recipients of the throughout the week. Wienhard, GRG group leader at HITS hospitalization for all people world- similar focus – has recently merged Abel Prize, the ACM A.M. Turing and professor at Heidelberg Universi- wide. Informatics4life is a collabora- with Informatics4 Life. Kai Polsterer Award, the ACM Prize in Computing, Virtual lab visit: HITSters meet ty. Anna has been an active member tive initiative founded by the Klaus HITS involvement: From structure (AIN) and Jan Plier work on this the Fields Medal, and the Nevanlinna young researchers of the HLF’s Scientific Committee for Tschira Foundation that focuses on based-design to Machine Learning project and collaborate with Mathias Prize. Established in 2013, the HLF is This program includes lab visits several years and is an avid support- cardiovascular research and brings Rebecca Wade (MCM) is a Principal Konstandin (Heidelberg University organized annually by the Heidelberg during which a group of young er of the HLF who has helped pro- together experts in clinical research Investigator of the subproject “Struc- Hospital). Even though the scales are Laureate Forum Foundation (HLFF). researchers come to Heidelberg mote its guiding vision. Upon assum- and computational methods. The ture-based design of peptide-based extremely different, morphological The event was the product of a joint institutes and companies to get a ing her new position, she proclaimed, initiative’s patient-centric environment pharmaceuticals against striated analyses of cell tissues are very initiative between HITS and the Klaus glimpse of the work done there. For “I look forward maintaining the – a completely novel approach – is muscle disorders” (see Chapter 2.8), similar to analyses of multi-band Tschira Foundation. Since 2016, HITS the first time, HITS hosted the HLF tradition of excellence at the HLF and critical to translating research into Wolfgang Müller (SDBV) is Collaborat- observations of galaxies – an observa- has served as a scientific partner of young researchers virtually via video to shaping its future.” Anna took on clinical application. As Heidelberg is a ing Principal Investigator of the tion that provides further support for the HLF along with Heidelberg conference on 24 September 2020. her new role on 1 October 2020. “hot spot” for health research and subproject “Research data ware- Klaus Tschira’s vision that interdisci- University. HITS Scientific Director Wolfgang medicine with internationally highly house,” and Vincent Heuveline (DMQ) plinary research is key to making Müller led a virtual campus tour competitive institutes and research- is Principal Investigator of the subproj- progress in research (see Chapter 2.1). Traversing separation: The Virtual and explained the range of research ers, it is the ideal location for this ect “Cognition and uncertainty quanti- HLF, 21–25 September 2020 performed at the Institute. After- pioneering initiative. fication for numerical heart simula- The striking difference in 2020 ward, CME group leader Alexandros tion” (see Chapter 2.5). Moreover, compared with former HLF events Stamatakis gave a remote talk on the The project represents an interdisci- Vincent is also Co-Coordinator of was the obvious shift to the digital “phylogenetic inference of SARS- plinary alliance between cardiovascu- Informatics4life in cooperation with sphere. Branded with the motto CoV-2” from his vacation home in lar physicians and computer scien- Hugo Katus and Benjamin Meder "Virtual HLF – Traversing Separation," Crete – one of the advantages of a tists from Heidelberg University, (both from Heidelberg University the Forum took place from 21–25 virtual event. More than 40 young University Hospital Heidelberg, and Hospital).

92 93 Evans C, Lennon D, Langer N, Almeida L, Bartlett E, Bastian Hon M, Bellinger EP, Hekker S, Stello D, Kuszlewicz JS Bodensteiner J, Sana H, Mahy L, Patrick LR, de Koter A, de N, Bestenlehner J, Britavskiy N, Castro N, Clark S, Crowther (2020). Asteroseismic inference of subgiant evolutionary 7 Publications Mink SE, Evans CJ, Götberg Y, Langer N, Lennon DJ, P, de Koter A, de Mink S, Dufton P, Fossati L, Garcia M, parameters with deep learning. Monthly Notices of the Royal Schneider FRN, Tramper F (2020). The young massive SMC Gieles M, Gräfener G, Grin N, Hénault-Brunet V, Herrero A, Astronomical Society 499(2):2445-2461 cluster NGC 330 seen by MUSE. I. Observations and stellar Howarth I, Izzard R, Kalari V, Maíz Apellániz J, Markova N, Ballhausen A, Przybilla MJ, Jendrusch M, Haupt S, Pfaffen- content. A&A 634:A51 Najarro F, Patrick L, Puls J, Ramírez-Agudelo O, Renzo M, Horst L, Edelmann PVF, Andrássy R, Röpke FK, Bowman dorf E, Seidler F, Witt J, Sanchez AH, Urban K, Draxlbauer Sabín-Sanjulián C, Sana H, Schneider F, Schootemeijer A, DM, Aerts C, Ratnasingam RP (2020). Fully compressible M, Krausert S, Ahadova A, Kalteis MS, Pfuderer PL, Heid D, Bowman DM, Burssens S, Simón-Díaz S, Edelmann PVF, Simón-Díaz S, Smartt S, Taylor W, Tramper F, Van Loon J, simulations of waves and core convection in main-sequence Stichel D, Gebert J, Bonsack M, Schott S, Bläker H, Seppälä Rogers TM, Horst L, Röpke FK, Aerts C (2020). Photometric Villaseñor J, Vink JS, Walborn N (2020). The VLT-FLAMES stars. A&A 641:A18 T, Mecklin J, Broeke ST, Nielsen M, Heuveline V, Krzykalla J, detection of internal gravity waves in upper main-sequence stars. Tarantula Survey. The Messenger, 181:22-27 Benner A, Riemer AB, Doeberitz MvK, Kloor M (2020). The A&A 640:A36 Jeon S, Strube M (2020). Incremental Neural Lexical Coherence shared frameshift mutation landscape of microsatellite-unstable Franz F, Daday C, Gräter F (2020). Advances in molecular Modeling. In Proceedings of the 28th International Conference cancers suggests immunoediting during tumor evolution. Nat Brehmer, J.R., Gneiting, T. Properization: constructing proper simulations of protein mechanical properties and function. on Computational Linguistics (COLING), Online, December, pp. Commun 11(1), 4740 scoring rules via Bayes acts. Ann Inst Stat Math 72, 659–673 Current Opinion in Structural Biology 61:132-138 6752–6758 (2020) Barbera P, Czech L, Lutteropp S, Stamatakis A (2020). SCRAPP: Galvin TJ, Huynh MT, Norris RP, Wang XR, Hopkins E, Jeon S, Strube M (2020). Centering-based Neural Coherence A tool to assess the diversity of microbial samples from phylogenet- Brenna C, Khan AUM, Picascia T, Sun Q, Heuveline V, Gretz Polsterer K, Ralph NO, O’Brien AN, Heald GH (2020). Modeling with Hierarchical Discourse Segments. In Proceedings ic placements. Mol Ecol Resour, Vol. 21, Issue 1/ p. 340-349 N (2020). New technical approaches for 3D morphological Cataloguing the radio-sky with unsupervised machine learning: a of the Conference on Empirical Methods in Natural Language imaging and quantification of measurements. Anat Rec new approach for the SKA era. Monthly Notices of the Royal Processing (EMNLP), Online, November , pp. 7458-7472 Berger B-T, Amaral M, Kokh DB, Nunes-Alves A, Musil D, 303(10):2702-2715 Astronomical Society 497(3):2730-2758 Heinrich T, Schröder M, Neil R, Wang J, Navratilova I, Keating SM, Waltemath D, König M, Zhang F, Dräger A, Bomke J, Elkins JM, Müller S, Frech M, Wade RC, Knapp S Brunak S, Collin CB, EU-STANDS4PM Consortium, Cathaoir Gornik SG, Bergheim BG, Morel B, Stamatakis A, Foulkes Chaouiya C, Bergmann FT, Finney A, Gillespie CS, Helikar T, (2020). Structure-Kinetic-Relationship Reveals the Mechanism of KEÓ, Golebiewski M, Kirschner M, Kockum I, Moser H, NS, Guse A (2020). Photoreceptor diversification accompanies Hoops S, Malik‐Sheriff RS, Moodie SL, Moraru II, Myers CJ, Selectivity of FAK Inhibitors Over PYK2. Cell Chem Biol S2451- Waltemath D (2020). Towards standardization guidelines for in the evolution of Anthozoa. Molecular Biology and Evolution, Naldi A, Olivier BG, Sahle S, Schaff JC, Smith LP, Swat MJ, 9456(21):00003-9 silico approaches in personalized medicine. Journal of Integra- msaa304 Thieffry D, Watanabe L, Wilkinson DJ, Blinov ML, Begley K, tive Bioinformatics 17(2-3) Faeder JR, Gómez HF, Hamm TM, Inagaki Y, Liebermeister Bestenlehner JM, Crowther PA, Caballero-Nieves SM, Gottschling M, Czech L, Mahé F, Adl S, Dunthorn M (2020). W, Lister AL, Lucio D, Mjolsness E, Proctor CJ, Raman K, Schneider FR, Simón-Dı́az S, Brands SA, de Koter A, Chai H, Zhao W, Eger S, Strube M (2020). Evaluation of The windblown: possible explanations for dinophyte DNA in Rodriguez N, Shaffer CA, Shapiro BE, Stelling J, Swainston Gräfener G, Herrero A, Langer N, Lennon DJ, Maíz Apellániz Coreference Resolution Systems Under Adversarial Attacks. In forest soils. bioRxiv 2020.08.07.242388 N, Tanimura N, Wagner J, Meier‐Schellersheim M, Sauro J, Puls J, Vink JS (2020). The R136 star cluster dissected with Proceedings of the First Workshop on Computational Approaches HM, Palsson B, Bolouri H, Kitano H, Funahashi A, Hermja- Hubble Space Telescope/STIS - II. Physical properties of the to Discourse, Online, November, pp. 154-159 Gronow S, Collins C, Ohlmann ST, Pakmor R, Kromer M, kob H, Doyle JC, Hucka M, Adams RR, Allen NA, Anger- most massive stars in R136. Monthly Notices of the Royal Seitenzahl IR, Sim SA, Röpke FK (2020). SNe Ia from double mann BR, Antoniotti M, Bader GD, Červený J, Courtot M, Astronomical Society 499(2):1918-1936 Diestelkoetter‐Bachert P, Beck R, Reckmann I, Hellwig A, detonations: Impact of core-shell mixing on the carbon ignition Cox CD, Dalle Pezze PD, Demir E, Denney WS, Dharuri H, Garcia‐Saez A, Zelman‐Hopf M, Hanke A, Nunes-Alves A, mechanism. A&A 635:A169 Dorier J, Drasdo D, Ebrahim A, Eichner J, Elf J, Endler L, Bettisworth B, Stamatakis A (2020). RootDigger: a root Wade RC, Mayer MP, Wieland F (2020). Structural characteriza- Evelo CT, Flamm C, Fleming RM, Fröhlich M, Glont M, placement program for phylogenetic trees. bioRxiv tion of an Arf dimer interface: molecular mechanism of Arf‐de- Haupt S, Gleim N, Ahadova A, Bläker H, von Knebel Doe- Gonçalves E, Golebiewski M, Grabski H, Gutteridge A, 2020.02.13.935304 pendent membrane scission. FEBS Lett, 594(14):2240-2253 beritz M, Kloor M, Heuveline V (2020). Computational model Hachmeister D, Harris LA, Heavner BD, Henkel R, Hlavacek investigates the evolution of colonic crypts during Lynch syn- WS, Hu B, Hyduke DR, Jong H, Juty N, Karp PD, Karr JR, Blacker S, Bastian NF, Bauswein A, Blaschke DB, Fischer T, Doorenbos L, Cavuoti S, Brescia M, D’Isanto A, Longo G, drome carcinogenesis. bioRxiv 2020.12.29.424555 Kell DB, Keller R, Kiselev I, Klamt S, Klipp E, Knüpfer C, Oertel M, Soultanis T, Typel S (2020). Constraining the onset (2020). Comparison of outlier detection methods on astronomi- Kolpakov F, Krause F, Kutmon M, Laibe C, Lawless C, Li L, density of the hadron-quark phase transition with gravitation- cal image data. Intelligent Astrophysics, Springer Nature Haupt S, Zeilmann A, Ahadova A, von Knebel Doeberitz M, Loew LM, Machne R, Matsuoka Y, Mendes P, Mi H, Mittag F, al-wave observations. Phys. Rev. D 102(12),123023 10.1007/978-3-030-65867-0 Kloor M, Heuveline V (2020). Mathematical Modeling of Monteiro PT, Natarajan KN, Nielsen PM, Nguyen T, Palmisa- Multiple Pathways in Colorectal Carcinogenesis using Dynamical no A, Pettit J, Pfau T, Phair RD, Radivoyevitch T, Rohwer Bläker H, Haupt S, Morak M, Holinski‐Feder E, Arnold A, Edgar RC, Taylor J, Altman T, Barbera P, Meleshko D, Lin V, Systems with Kronecker Structure. bioRxiv 2020.08.14.250175 JM, Ruebenacker OA, Saez‐Rodriguez J, Scharm M, Horst D, Sieber‐Frank J, Seidler F, von Winterfeld M, Alwers Lohr D, Novakovsky G, Al-Shayeb B, Banfield JF, Kor- Schmidt H, Schreiber F, Schubert M, Schulte R, Sealfon SC, E, Chang‐Claude J, Brenner H, Roth W, Engel C, Löffler M, obeynikov A, Chikhi R, Babaian A (2020). Petabase-scale Holderbach S, Adam L, Jayaram B, Wade RC, Mukherjee G Smallbone K, Soliman S, Stefan MI, Sullivan DP, Takahashi Möslein G, Schackert H, Weitz J, Perne C, Aretz S, Hüne- sequence alignment catalyses viral discovery. bioRxiv (2020). RASPD+: Fast Protein-Ligand Binding Free Energy K, Teusink B, Tolnay D, Vazirabad I, Kamp A, Wittig U, burg R, Schmiegel W, Vangala D, Rahner N, Steinke‐Lange 2020.08.07.241729 Prediction Using Simplified Physicochemical Features. Frontiers Wrzodek C, Wrzodek F, Xenarios I, Zhukova A, Zucker J V, Heuveline V, von Knebel Doeberitz M, Ahadova A, Hoff- in Molecular Biosciences 7:393 (2020). SBML Level 3: an extensible format for the exchange and meister M, Kloor M (2020). Age‐dependent performance of Elsworth Y, Themeßl N, Hekker S, Chaplin W (2020). A reuse of biological models. Mol Syst Biol 16(8) BRAF mutation testing in Lynch syndrome diagnostics. Int. J. Layered Approach to Robust Determination of Asteroseismic Cancer 147(10):2801-2810 Parameters. Res. Notes AAS 4(10):177

94 95 7 Publications López F, Strube M (2020). A Fully Hyperbolic Neural Model for Morel B, Barbera P, Czech L, Bettisworth B, Hübner L, Dugourd A, Naldi A, Noel V, Calzone L, Sander C, Demir E, Hierarchical Multi-class Classification. In Proceedings of Find- Lutteropp S, Serdari D, Kostaki E, Mamais I, Kozlov AM, Korcsmaros T, Freeman TC, Augé F, Beckmann JS, Hasenau- Kokh DB, Doser B, Richter S, Ormersbach F, Cheng X, Wade ings of the Association for Computational Linguistics: EMNLP Pavlidis P, Paraskevis D, Stamatakis A (2020). Phylogenetic er J, Wolkenhauer O, Wilighagen EL, Pico AR, Evelo CT, RC (2020). A workflow for exploring ligand dissociation from a 2020, Online, November 2020, pp. 460-475 analysis of SARS-CoV-2 data is difficult. Molecular Biology and Gillespie ME, Stein LD, Hermjakob H, D’Eustachio P, macromolecule: Efficient random acceleration molecular dynam- Evolution, msaa314 Saez-Rodriguez J, Dopazo J, Valencia A, Kitano H, Barillot E, ics simulation and interaction fingerprint analysis of ligand Lösel PD, von de Kamp T, Jayme A, Ershov A, Faragó T, Auffray C, Balling R, Schneider R (2020). COVID-19 Disease trajectories. J. Chem. Phys. 153(12): p.125-102 Pichler O, Jerome NT, Aadepu N, Bremer S, Chilingaryan SA, Mustafa G, Nandekar PP, Mukherjee G, Bruce NJ, Wade RC Map, a computational knowledge repository of SARS-CoV-2 Heethoff M, Kopmann A, Odar J, Schmelzle S, Zuber M, (2020). The Effect of Force-Field Parameters on Cytochrome virus-host interaction mechanisms. bioRxiv 2020.10.26.356014 Kozlov A, Alves J, Stamatakis A, Posada D (2020). CellPhy: Wittbrodt J, Baumbach T, Heuveline V (2020). Introducing P450-Membrane Interactions: Structure and Dynamics. Sci Rep accurate and fast probabilistic inference of single-cell phyloge- Biomedisa as an open-source online platform for biomedical 10(1), pp.72-84 Öztürk MA, Wade RC (2020). Computation of FRAP recovery nies from scDNA-seq data. bioRxiv 2020.07.31.230292 image segmentation. Nat Commun 11(1),5577 times for linker histone – chromatin binding on the basis of Müller M (2020). pyMMAX2: Deep Access to MMAX2 Projects Brownian dynamics simulations. Biochimica et Biophysica Acta Kramer M, Schneider F, Ohlmann S, Geier S, Schaffenroth V, Mahy L, Almeida L, Sana H, Clark J, de Koter Ad, de Mink S, from Python. In Proceedings of the 14th Linguistic Annotation (BBA) - General Subjects 1864(10):129653 Pakmor R, Röpke F (2020). Formation of sdB-stars via com- Evans C, Grin N, Langer N, Moffat A, Schneider F, Shenar T, Workshop, Online, December , pp. 167-173. mon envelope ejection by substellar companions. A&A 642:A97 Tramper F (2020). The Tarantula Massive Binary Monitoring. IV. Öztürk MA, De M, Cojocaru V, Wade RC (2020). Chromato- Double-lined photometric binaries. A&A 634:A119 Müller M, Ghosh S, Rey M, Wittig U, Müller W, Strube M some Structure and Dynamics from Molecular Simulations. Lal S, Alpay A, Salzmann P, Cosenza B, Hirsch A, Stawinoga (2020). Reconstructing Manual Information Extraction with Annu. Rev. Phys. Chem. 71(1):101-119 N, Thoman P, Fahringer T, Heuveline V (2020). SYCL-Bench: A Mahy L, Sana H, Abdul-Masih M, Almeida L, Langer N, DB-to-Document Backprojection: Experiments in the Life Science Versatile Cross-Platform Benchmark Suite for Heterogeneous Shenar T, de Koter Ad, de Mink Sd, de Wit S, Grin N, Evans Domain. In Proceedings of the First Workshop on Scholarly Rennekamp B, Kutzki F, Obarska-Kosinska A, Zapp C, Computing. In: Euro-Par 2020: Parallel Processing. Springer C, Moffat A, Schneider F, Barbá R, Clark J, Crowther P, Document Processing, Online, November 2020, pp. 81-90. 2020. Gräter F (2020). Hybrid Kinetic Monte Carlo/Molecular Dynamics International Publishing, pp. 629-644 Gräfener G, Lennon D, Tramper F, Vink J (2020). The Tarantu- sdp-1.9 Simulations of Bond Scissions in Proteins. J. Chem. Theory la Massive Binary Monitoring. III. Atmosphere analysis of Comput. 16(1):553-563 Langer N, Schürmann C, Stoll K, Marchant P, Lennon D, double-lined spectroscopic systems. A&A 634:A118 Nunes-Alves A, Mazzolari A, Merz KM (2020). What Makes a Mahy L, Mink Sd, Quast M, Riedel W, Sana H, Schneider P, Paper Be Highly Cited? 60 Years of the Journal of Chemical Sadiq SK (2020). Fine-Tuning of Sequence Specificity by Near Schootemeijer A, Wang C, Almeida L, Bestenlehner J, Mathews K, Strube M (2020). A large harvested corpus of Information and Modeling. J. Chem. Inf. Model. 60(12):5866- Attack Conformations in Enzyme-Catalyzed Peptide Hydrolysis. Bodensteiner J, Castro N, Clark S, Crowther P, Dufton P, location metonymy. In Proceedings of the 12th International 5867 Catalysts 10(6):684 Evans C, Fossati L, Gräfener G, Grassitelli L, Grin N, Hast- Conference on Language Resources and Evaluation, Marseille, ings B, Herrero A, de Koter Ad, Menon A, Patrick L, Puls J, France, May 2020, p.11-16 Nunes-Alves A, Kokh DB, Wade RC (2020). Recent progress in Sand C, Ohlmann ST, Schneider FRN, Pakmor R, Röpke FK Renzo M, Sander A, Schneider F, Sen K, Shenar T, Simón- molecular simulation methods for drug binding kinetics. Current (2020). Common-envelope evolution with an asymptotic giant Dı́as S, Tauris T, Tramper F, Vink J, Xu X-T (2020). Properties Mazzolari A, Nunes-Alves A, Wahab HA, Amaro RE, Cournia Opinion in Structural Biology 64:126-133 branch star. A&A 644:A60 of OB star-black hole systems derived from detailed binary Z, Merz KM (2020). Impact of the Journal of Chemical Informa- evolution models. A&A 638:A39 tion and Modeling Special Issue on Women in Computational Ostaszewski M, Niarakis A, Mazein A, Kuperstein I, Phair R, Schneider FRN, Ohlmann ST, Podsiadlowski P, Röpke FK, Chemistry. J. Chem. Inf. Model., acs.jcim.0c00636 Orta-Resendiz A, Singh V, Aghamiri SS, Acencio ML, Glaab E, Balbus SA, Pakmor R (2020). Long-term evolution of a magnet- Lutteropp S, Kozlov AM, Stamatakis A (2020). A fast and Ruepp A, Fobo G, Montrone C, Brauner B, Frischman G, ic massive merger product. Monthly Notices of the Royal memory-efficient implementation of the transfer bootstrap. Miller A. A., Magee M. R., Polin A., Maguire K., Zimmerman Gómez LCM, Somers J, Hoch M, Gupta SK, Scheel J, Bor- Astronomical Society 495(3):2796-2812 Bioinformatics 36(7): p. 2280-2281 E., Yao Y., Sollerman J., Schulze S., Perley D. A., Kromer M., linghaus H, Czauderna T, Schreiber F, Montagud A, Leon Dhawan S., Bulla M., Andreoni I., Bellm E. C., De K., Dekany MPd, Funahashi A, Hiki Y, Hiroi N, Yamada TG, Dräger A, Schreiber F, Sommer B, Czauderna T, Golebiewski M, Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks R., Delacroix A., Fremling C., Gal-Yam A., Goldstein D. A., Renz A, Naveez M, Bocskei Z, Messina F, Börnigen D, Gorochowski TE, Hucka M, Keating SM, Konig M, Myers C, SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel Golkhou V. Z., Goobar A., Graham M. J., Irani I., Kasliwal M. Fergusson L, Conti M, Rameil M, Nakonecnij V, Vanhoefer J, Nickerson D, Waltemath D (2020). Specifications of standards N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CS, M., Kaye S., Kim Y. L., Laher R. R., Mahabal A. A., Masci F. Schmiester L, Wang M, Ackerman EE, Shoemaker J, Zucker in systems and synthetic biology: status and developments in Aparicio S, Baaijens J, Balvert M, Barbanson Bd, Cappuccio J., Nugent P. E., Ofek E., Phinney E. S., Prentice S. J., Riddle J, Oxford K, Teuton J, Kocakaya E, Summak GY, Hanspers K, 2020. Journal of Integrative Bioinformatics 17(2-3) A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, R., Rigault M., Rusholme B., Schweyer T., Shupe D. L., Kutmon M, Coort S, Eijssen L, Ehrhart F, Rex DAB, Slenter D, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel Soumagnac M. T., Terreran G., Walters R., Yan L., Zolkower Martens M, Haw R, Jassal B, Matthews L, Orlic-Milacic M, Shcherbakov O, Polsterer K, Svyatnyy VA (2020). Integration JO, Kozlov AM, Kuo T, Lelieveldt BP, Mandoiu II, Marioni JC, J., Kulkarni S. R. (2020). The Spectacular Ultraviolet Flash from Ribeiro AS, Rothfels K, Shamovsky V, Stephan R, Sevilla C, of Distributed Parallel Simulation Environment with Cloud-Infra- Marschall T, Mölder F, Niknejad A, Raczkowski L, Reinders the Peculiar Type Ia Supernova 2019yvq, The Astrophysical Varusai T, Ravel J, Fraser R, Ortseifen V, Marchesi S, Gawron structures, PMDA 22(1):22-27 M, Ridder Jd, Saliba A, Somarakis A, Stegle O, Theis FJ, Journal, 898, 56) P, Smula E, Heirendt L, Satagopam V, Wu G, Riutta A, Gole- Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, biewski M, Owen S, Goble C, Hu X, Overall RW, Maier D, Suyu SH., Huber S., Cañameras R, Kromer M, Schuldt S, Schönhuth A (2020). Eleven grand challenges in single-cell data Moreau CA, Quadt KA, Piirainen H, Kumar H, Bhargav SP, Bauch A, Gyori BM, Bachman JA, Vega C, Grouès V, Vazquez Taubenberger S, Yıldırım A, Bonvin V, Chan JHH, Courbin F, science. Genome Biology 21(1) p. 1-35 Strauss L, Tolia NH, Wade RC, Spatz JP, Kursula I, M, Porras P, Licata L, Iannuccelli M, Sacco F, Nesterova A, Nöbauer U, Sim SA, Sluse D (2020). HOLISMOKES. I. Highly Frischknecht F (2020). A function of profilin in force generation Yuryev A, Waard Ad, Turei D, Luna A, Babur O, Soliman S, Optimised Lensing Investigations of Supernovae, Microlensing during malaria parasite motility independent of actin binding. J Valdeolivas A, Esteban-Medina M, Peña-Chilet M, Helikar T, Objects, and Kinematics of Ellipticals and Spirals, Astronomy & Cell Sci:jcs.233775 Puniya BL, Modos D, Treveil A, Olbei M, Meulder BD, Astrophysics, 655, A162

96 97 7 Publications Lukas Jarosch: Lectures, Courses and Seminars 8 Teaching ”Computational modeling of SERCA interactions with van Son LAC, De Mink SE, Broekgaarden FS, Renzo M, S100A1ct and DWORF fingerprints”, Bachelor’s thesis, Nguyen-Thi Dang: Justham S, Laplace E, Morán-Fraile J, Hendriks DD, Farmer Biochemistry, Faculty of Biosciences and Faculty of “The top Lyapunov exponent”, Heidelberg University, R (2020). Polluting the Pair-instability Mass Gap for Binary Black Degrees Chemistry and Geosciences, Heidelberg University, and summer semester 2020. Holes through Super-Eddington Accretion in Isolated Binaries. HITS: Manuel Glaser and Rebecca C. Wade (2020). ApJ 897(1):100 Jonas Brehmer: Timo Dimitriadis (with Robert Jung): “Theory and Methodology of Scoring Functions: Tail Mohsen Mesgar: Lecture and exercise course on “Applied Financial Econo- Vogel, P., Knippertz, P., Fink, A. H., Schlueter, A., & Gneiting, Properties, Interval Forecasts, and Point Processes”, Ph.D. “Graph-based Patterns for Local Coherence Modeling”, metrics”, Universität Hohenheim, summer semester T. (2020). Skill of Global Raw and Postprocessed Ensemble thesis, Faculty of Business Informatics and Business Ph.D. Thesis, Neuphilologische Fakultät, Heidelberg 2020. Predictions of Rainfall in the Tropics, Weather and Forecasting, Mathematics, University of Mannheim (Martin Schlather, University, and HITS: Michael Strube (2020). 35(6), 2367-2385 Kirstin Strokorb), and HITS: Tilmann Gneiting (2020). Timo Dimitriadis: Fabian Ormersbach: Lecture and exercise course on “Allgemeine Methodenleh- Waltemath D, Golebiewski M, Blinov ML, Gleeson P, Hermja- David Bubeck: ”Computational ligand superimposition and analysis of re der Statistik (Statistical methods)”, Heidelberg Universi- kob H, Hucka M, Inau ET, Keating SM, König M, Krebs O, "Thermonuclear Explosions of Rapidy Rigidly Rotating protein-ligand interaction fingerprints”, Bachelor’s thesis, ty, winter semester 2020/2021. Malik-Sheriff RS, Nickerson D, Oberortner E, Sauro HM, White Dwarfs", Master´s thesis, Department of Physics Molecular Biotechnology, Faculty of Biosciences, Heidel- Schreiber F, Smith L, Stefan MI, Wittig U, Myers CJ (2020). and Astronomy, Heidelberg University, and HITS: berg University, and HITS: Daria Kokh and Rebecca C. Dorotea Dudas, Xiaoming Hu, Maja Rey, Andreas The first 10 years of the international coordination network for Friedrich Röpke (2020). Wade (2020). Weidemann and Ulrike Wittig: standards in systems and synthetic biology (COMBINE). Journal de.NBI Course “Tools for Systems biology modeling and of Integrative Bioinformatics 17(2-3) Thomas Ehret: Ina A. Pöhner: data exchange: COPASI, CellNetAnalyzer, SABIO-RK, “Comparative Study of FERM-PI(4,5)P2 Interaction Dynam- ”Computational approaches to drug design against the FAIRDOMHub/SEEK”, online, 21 - 23 September 2020. Weidner P, Söhn M, Schroeder T, Helm L, Hauber V, Gutting ics of Ezrin and Focal Adhesion Kinase”, Master´s thesis, folate & biopterin pathways of parasites causing neglect- T, Betge J, Röcken C, Rohrbacher FN, Pattabiraman VR, Heidelberg University, Faculty of Physics, Frauke Gräter ed tropical diseases”, Ph.D. thesis, Combined Faculty for Javier Moran Fraile: Bode JW, Seger R, Saar D, Nunes-Alves A, Wade RC, Ebert and Ulrich Schwarz (2020). the Natural Sciences and Mathematics, Heidelberg Tutorial course "Computational Astrophysics", Heidelberg MPA, Burgermeister E (2020). Myotubularin-related protein 7 University, and HITS: Rebecca C. Wade (2020). University, summer semester 2020. activates peroxisome proliferator-activated receptor-gamma. Philipp Gerstner: Oncogenesis 9(6),59 "Analysis and Numerical Approximation of Dielectrophoret- Lucas Rettenmeier: Tilmann Gneiting (lectures) and Johannes Resin ic Force Driven Flow Problems." Ph.D. thesis, Faculty of "Word Embeddings: Stability and Semantic Change", (exercises): Wu S, Everson RW, Schneider FRN, Podsiadlowski P, Mathematics and Computer Science, Heidelberg Univer- Master´s thesis, Department for Physics and Astronomy, Course on “Time Series Analysis”, Karlsruhe Institute of Ramirez-Ruiz E (2020). The Art of Modeling Stellar Mergers and sity, and HITS: Vincent Heuveline (2020). Heidelberg University (Fred Hamprecht), and HITS: Technology, summer semester 2020. the Case of the B[e] Supergiant R4 in the Small Magellanic Cloud. Michael Strube (2020). Course on “Forecasting: Theory and Practice I”, Karlsruhe ApJ 901(1):44 Dimitri Höhler: Institute of Technology, winter semester 2020/2021. "Advanced Heuristics for Accelerating PaPaRa", Master's Eugen Rogozinnikov: Yuan J-H, Han SB, Richter S, Wade RC, Kokh DB (2020). thesis, Karlsruhe Institute of Technology, and HITS: “Symplectic groups over noncommutative rings and Martin Golebiewski and Wolfgang Müller: Druggability Assessment in TRAPP Using Machine Learning Alexandros Stamatakis (2020). maximal representations”, Ph.D. Thesis, Mathematics “LiSyM Data Management”, LiSyM Retreat 2020 of the Approaches. J. Chem. Inf. Model. 60(3):1685-1699 Institute, Heidelberg University, and HITS: Daniele Liver Systems Medicine Network, Hofgeismar, Germany, Johannes Horn: Alessandrini, Anna Wienhard (2020). 29-31 January 2020. Zapletal A, Höhler D, Sinz C, Stamatakis A (2020). SoftWipe “Singular fibers of Hitchin systems”, Ph.D. Thesis, Mathe- – a tool and benchmark to assess scientific software quality. matics Institute, Heidelberg University, and HITS: Daniele Paul Schade: Frauke Gräter: bioRxiv 2020.10.07.330621 Alessandrini, Anna Wienhard (2020). "Phylogenetic species tree inference from gene trees Contribution to lecture “Computational biochemistry” for despite paralogy", Master's thesis, Karlsruhe Institute of biochemistry master students, winter semester 2020/21. Zapp C, Obarska-Kosinska A, Rennekamp B, Kurth M, Lukas Hübner: Technology, and HITS: Alexandros Stamatakis (2020). Contribution to lecture “Biophysical Chemistry” for Molec- Hudson DM, Mercadante D, Barayeu U, Dick TP, Denysen- "Load-Balance and Fault-Tolerance for Massively Parallel ular Biotechnology bachelor students, winter semester kov V, Prisner T, Bennati M, Daday C, Kappl R, Gräter F Phylogenetic Inference", Master's thesis, Karlsruhe Anna Schroeder: 2020/21. Lecture with practicals “Physical Chemistry of (2020). Mechanoradicals in tensed tendon collagen as a source Institute of Technology, and HITS: Alexandros “Collagen Structure and Mechanics: Molecular Dynamics Life II” winter semester 2020/21 (with Michael Boutros). of oxidative stress. Nat Commun 11(1),2315 Stamatakis (2020). Simulations and Network Analysis”, Master´s thesis, Heidelberg University, Faculty of Physics, Frauke Gräter Zhu J-Y, Song C, Heuveline V, Li B, Li B-H, Han Z-S, Liu X-Y and Ulrich Schwarz (2020). (2020). Characterization Simulation of a Bulk MOSFET in Steady-State with SIPG Method. IEEE 15th International Confer- ence on Solid-State & Integrated Circuit Technology (ICSICT), pp.1-3, IEEE

98 99 8 Teaching course "The Stellar Cookbook: A practical guide to the Gabriele Viaggi: Sebastian Lerch (with Joaquim Pinto): theory of stars", Heidelberg University, winter semester “Hyperbolic manifolds”, Heidelberg University, winter Lecture course on “Methods of Data Analysis”, Karlsruhe 2019/2020, together with Friedrich Röpke. Master Semi- semester 2020/2021. Frauke Gräter, Fabian Kutzki and Benedikt Rennekamp: Institute of Technology, summer semester 2020. nar "Cosmic explosions", Heidelberg University, summer Lecture with tutorials "Fundamentals of simulation meth- semester 2020. Lecture course "Python: programming for Rebecca C. Wade, Christina Athanasiou, Manuel ods", winter semester 2020/21 (with Friedrich Röpke). Sebastian Lerch and Eva-Maria Walz: scientists", Heidelberg University, summer semester Glaser, Daria Kokh, Abraham Muniz, Ariane Nunes- Short course “Introduction to Machine Learning”, Karl- 2020. Lecture course "Python: programming for scien- Alves, Stefan Richter, Athanasios-Alexandros Tsen- Frauke Gräter and Camilo Aponte-Santamaría: sruhe Institute of Technology, Graduate School of Center tists", Heidelberg University, winter semester 2019/2020. genes, (MCM), Frauke Gräter, Isabel Martin, Svenja de “Data Science 2020“, Max Planck School Matter to Life, MathSEE, summer semester 2020. Buhr (MBM): Heidelberg, Germany, October-December 2020. Theodoros Soultanis: B.Sc. Biosciences practical course “Grundkurs Bioinforma- Mark-Christoph Müller: Tutorial course "Experimental Physics II", Heidelberg tik”, Heidelberg University, 20-24 January 2020. Frauke Gräter (MBM), Rebecca C. Wade (MCM): Seminar: "Multiword Expressions", Department of Compu- University, summer semester 2020. M.Sc. Seminar course “Machine Learning for the Biomo- tational Linguistics, Heidelberg University (Winter Semes- Rebecca C. Wade, Christina Athanasiou, Athanasi- lecular World”, Heidelberg University, summer semester, ter 2019/2020). Alexandros Stamatakis, Ben Bettisworth, Alexey os-Alexandros Tsengenes: 2020. Kozlov, Pierre Barbera: EuroNeurotrophin International Training Network (ITM) 3rd Maria Beatrice Pozzetti: Lecture "Introduction to Bioinformatics for Computer Training Week “Structural Biology Approaches for Neuro- Frauke Gräter (MBM), Rebecca C. Wade, Ariane “Translation Surfaces”, Heidelberg University, Heidelberg, Scientists", computer science Master's program at degeneration”, held online, 30 September – 1 October Nunes-Alves, Manuel Glaser (MCM): summer semester 2020; “Differential Geometry 2”, Karlsruhe Institute of Technology, winter semester, 2020. M.Sc. practical course “Computational Molecular Biophys- Heidelberg University, winter semester 2020/2021; 2019/2020. ics”, Heidelberg University, winter semester, 2019/2020. “Bridgeland’s stability for meromorphic differentials”, Rebecca C. Wade: Heidelberg University, winter semester 2020/2021. Alexandros Stamatakis, Benoit Morel, Alexey Kozlov, Module 4, “Biomolecular Recognition: Modeling and Ganna Gryn’ova and Christopher Ehlert: Pierre Barbera: Simulation”, M.Sc. Molecular Cell Biology, Heidelberg Special Lecture Course “Applied Computational Chemis- Maja Rey and Ulrike Wittig: Lecture "Introduction to Bioinformatics for Computer University, 13 March 2020. Module 3, “Protein Modeling”, try”, Heidelberg University, summer semester 2020. FAIRDOM-SEEK Training at MIX-UP Kick off Meeting, Scientists", computer science Master's program at M.Sc. Molecular Cell Biology, Heidelberg University, 20 & Brussels, Belgium, 7 February 2020. Karlsruhe Institute of Technology, winter semester, 25 May 2020. Ringvorlesung “Computational Biochemis- Olga Krebs: 2020/2021. try”, “Electrostatics and Solvation for Biomolecules”, M. “FAIR Principles for research data management with Friedrich Röpke: Sc. Biochemistry, Heidelberg University, 30 October classroom discussions”, Lecture and live demo at the Lecture course "Computational Astrophysics", Heidelberg Alexandros Stamatakis, Benoit Morel, Pierre Barbera, 2020. Ringvorlesung „Biophysik“, “Receptor-Ligand Interac- ELIXIR Luxembourg’s training on “Best practices in University, summer semester 2020. Lecture course "The Ben Bettisworth, Alexey Kozlov: tions: Structure and Dynamics”, B.Sc. Molecular Biotech- research data management and stewardship”, 3 June Stellar Cookbook: A practical guide to the theory of stars", Seminar "Hot Topics in Bioinformatics", computer science nology, Heidelberg University, 15 December 2020. Ring- 2020. “FAIR data principles and Data publishing and Heidelberg University, winter semester 2019/2020 and Master's program at Karlsruhe Institute of Technology, vorlesung „Mobi4all“, “Computational Approaches to archival”, Lecture and practical session at the ELIXIR winter semester 2020/2021, together with Fabian Schnei- summer semester, 2020. Protein Dynamics and Binding Kinetics for Drug Discov- Luxembourg+ELIXIR France online training , 5 - 8 October der. Lecture course "Fundamentals of Simulation Meth- ery”, M.Sc. Molecular Biotechnology, Heidelberg Universi- 2020. “FAIR data training” computer practicals at the ods", Heidelberg University, winter semester 2020/2021, Alexandros Stamatakis: ty, December, 2020. Modelling COVID-19 epidemics course , 30 November - 9 together with Frauke Gräter. Research Seminar "Physics EU ITN IGNITE project PhD student training lectures on December 2020. of Stellar Objects", Heidelberg University, winter semester "General Introduction to Markov Chains", Anna Wienhard: 2019/2020, summer semester 2020, and winter semes- "Green high performance computing", “Topological methods in Data Analysis”, Journal Club, Giovanni Leidi: ter 2020/2021. "Ascertainment bias correction in phylogenetics" Heidelberg University, summer semester 2020; “Lineare Tutorial accompanying the lecture course "Computational "Discrete Operations on phylogenetic trees" Algebra I”, Heidelberg University, winter semester Astrophysics", Heidelberg University, summer semester Christian Sand: "Introduction to parsimony and its computational com- 2020/2021. 2020. Tutorial accompanying the lecture course "Funda- Exercises and tutorial accompanying the lecture course monalities with likelihood" mentals of Simulation Methods", Heidelberg University, "Physik A", Heidelberg University, winter semester "High Performance Computing and Phylogenetics" winter semester 2020/2021. 2019/2020. "Introduction to Bayesian phylogenetics using MCMC" "Computing the Phylogenetic Likelihood", on-line, August Sebastian Lerch: Anna-Sofie Schilling: 2020. Lecture course on “Probability and Statistics for Mechani- “Fun fact aus Analysis und Linearer Algebra”, Heidelberg cal Engineering”, Karlsruhe Institute of Technology, University, winter semester 2020/2021. Michael Strube: summer semester 2020. Lecture course on “Probability PhD Colloquium, Department of Computational Linguis- and Statistics for Computer Scientists”, Karlsruhe Insti- Fabian Schneider: tics, Heidelberg University (Winter Semester 2019/2020). tute of Technology, winter semester 2020/2021. Lecture course "The Stellar Cookbook: A practical guide to PhD Colloquium, Department of Computational Linguis- the theory of stars", Heidelberg University, winter semes- tics, Heidelberg University (Summer Semester 2020). ter 2020/2021, together with Friedrich Röpke. Lecture

100 101 Alexey Kozlov: Mareike Pfeil: "New phylogenetic tools for genome-scale datasets: RAx- “Deforming Anosov representations: From earthquakes 9 Miscellaneous ML-NG, ParGenes and GeneRax", 2nd Bird 10,000 Genomes maps to cataclysms”, Geometry Graduate Colloquium, (B10K) project symposium and Workshop, Copenhagen, Zürich, Switzerland, 2 April 2020. Denmark, February 2020. "Random thoughts on GreenHPC", Sustainability meeting at the Max Planck Institute for Maria Beatrice Pozzetti: 9.1 Guest Speaker Activities (invited only) Astronomy", Heidelberg, Germany (online), December 2020. “Real spectrum compactification of character varieties”, Tilmann Gneiting: Strasbourg, 15 June 2020; “What’s new in Higher (rank) Johannes Bracher: “Isotonic Distributional Regression – Leveraging Monotonic- Olga Krebs: Teichmüller theory?”, ETH Zürich, Switzerland, 30 October “Assembling, Comparing and Combining COVID-19 fore- ity, Uniquely So!”, Computational Science Lab (CSL) Sympo- “FAIRDOMHub: Implementing FAIR Data Principles for 2020; “Orbit growth rate and Hausdorff dimensions for casts”, Frankfurt Institute for Advanced Studies (FIAS) M²C sium, Universität Hohenheim, Hohenheim, Germany, 22 Janu- scientific data management and stewardship”, Lecture at Anosov representations”, Paderborn, Germany, 3 November Seminars (online), Frankfurt, Germany, 28 October 2020. ary 2020. “Wettervorhersage: Welche Rolle spielt der the Modelling COVID-19 epidemics course, 1 December 2020; “The real spectrum compactification of higher rank Zufall?”, Carl-Bosch-Museum, Heidelberg, Germany, 29 2020. Teichmüller spaces”, Frankfurt, Germany, 18 November Nguyen-Thi Dang: January 2020. “Isotonic Distributional Regression (IDR) – 2020. “Mélange topologique des flots hyperboliques homogène”, Leveraging Monotonicity, Uniquely So!”, Luminy Conference Sebastian Lerch: HORUS seminar, IRMA, Strasbourg, France, 5 June 2020; Mathematical Methods of Modern Statistics 2, Luminy, “Neural networks for postprocessing ensemble weather Anja Randecker: “Topological mixing of the Weyl chamber”, Ergodic theory France (online), 16 June 2020. “Erfolge und Grenzen der forecasts”, National Oceanic and Atmospheric Administration “Random walks on hyperbolic manifolds”, Oberseminar and dynamical systems Seminar, Bristol, UK, 29 October Wettervorhersage – eine mathematische Perspektive”, (NOAA) Workshop on Leveraging AI in Environmental Scienc- Algebra und Zahlentheorie, Saarbrücken, Germany, 3 Febru- 2020. Universität Ulm, Ulm, Germany (online), 10 July 2020. es, College Park, USA (online), 22 October 2020. “Predictive ary 2020; “Large-scale geometry of the saddle connection Inference Based on Markov Chain Monte Carlo Output”, complex”, Séminaire Teich, Marseille, France (online), Decem- Madhura De: Martin Golebiewski: Conference on Information Technology and Data Science, ber 2020. “Single-molecule studies on mono- and trinucleosomes”, ELIXIR Guest Webinar: “Two universes, one world - Commu- Debrecen, Hungary (online), 7 November 2020. Heidelberg Chromatin Club, DKFZ, Heidelberg, 27 February nity standards vs. formal ISO standards in the life sciences”, Friedrich Röpke: 2020. “Single-molecule studies on mono- and trinucleosomes” ELIXIR webinar series (online), 16 January 2020. “Establish- Brice Loustau: "A 3D view on stellar astrophysics", Heidelberg Joint Astro- Institute of Biological and Chemical Systems, Karlsruhe Institute ing and harmonizing technical standards and key perfor- “The hyper-Kähler geometry of minimal hyperbolic germs”, nomical Colloquium, Heidelberg, Germany, 21 January 2020. of Technology, Karlsruhe, Germany, 2 April 2020. mance indicators for human digital twins”, European Geometry seminar, University of Wisconsin at Milwaukee, "Type Ia supernova explosion models: challenges and Commission Workshop ‘Human Digital Twin’ (online), USA, 9 November 2020. implications", Meeting of the Royal Astronomical Society, Antonio D’Isanto: 6 November 2020. London, UK (online), 13 November 2020. “ESCAPE – Building the infrastructure for the next genera- Benoit Morel: tion astronomy”, Invited seminar for the Astrophysics group Ganna Gryn’ova: "Gene tree inference under gene duplication, transfer and Carmen Rovi: journal club, University of Naples Federico II, Physics Depart- “Happy Together: How Computational Chemistry Can loss with GeneRax", Sanger Institute, Cambridge, UK, “Whitney stratified spaces”, Heidelberg Topology seminar, ment, Naples, Italy, 7 January 2020. “Convolutional neural Advance Functional Materials”, seminar talk at Centre for Microbial Genomics Meetings (online), December 2020. Heidelberg University, Germany, 30 January 2020; “Topology networks: Theory and application in astronomy”, Invited Advanced Materials, Heidelberg University, Heidelberg, meets Physics: Scissors congruences and TQFTs”, Colloqui- lecture for the Astroinformatics course, University of Naples Germany, 12 February 2020; “Crossing Electronic Bridges: Goutam Mukherjee: um at Harvey Mudd College, Claremont, California, USA, 13 Federico II, Physics Department, Naples, Italy, 8 January Computational Chemistry of Molecular Junctions”, (Bio) "RASPD+: Fast protein-ligand binding free energy prediction February 2020; “The Whitehead group of a CW complex”, 2020. “The two worlds of photometric redshift estimation Molecular Electronics Colloquia (online), University of Liver- using machine learning and its applications to SARS-CoV-2 Heidelberg Topology seminar, Heidelberg University, Germany, via machine learning: fully automatic vs feature based”, pool, UK, 1 October 2020. targets", Drug Discovery Hackathon 2020 workshop, https:// 7 May 2020; “The torsion of a homotopy equivalence”, Invited seminar for the Machine learning in Astronomy www.youtube.com/watch?v=9WarUhvNpGg&t=162s, 18 Heidelberg Topology seminar, Heidelberg University, Germany, meetings of the West Sydney University, Australia (online), 9 Johannes Horn: August 2020. "RASPD+: Machine Learning for Fast Pro- 23 July 2020; “Introduction to Surgery Theory”, RTG lec- June 2020; invited lecture for the Machine learning for “Singular fibers of Hitchin integrable systems”, Seminar tein-Ligand Binding Free Energy Prediction and its Applica- tures, Heidelberg, Germany, (online), November-December astronomers course at the Masaryk’s University, Brno, Czech Symplectic Geometry, Gauge Theory and Categorification, tions to SARS-CoV-2 Targets", BMS Institute of Technology 2020; “Kontsevich's configuration space integrals and republic (online), 24 November 2020. “MEGAVIS - Real-time Columbia University, New York, USA, 21 February 2020; and Management Bengaluru, India (International Webinar), 3 characteristic classes of framed sphere bundles”, Topology spectra analysis and visualization with autoencoders”, “Singular fibers of Hitchin integrable systems”, Geome- October 2020. seminar, Münster, Germany, 18 December 2020. Invited talk for the PUNCHLunch Webinar series (online), try-Topology Seminar, University of Maryland, College Park, Leibniz Institute for Astrophysics Potsdam (AIP), Germany, USA, March 2020. Wolfgang Müller: Fabian Schneider: 3 December 2020. Topic Lead Excel Validator, Integrative Pathway Modelling in "State-of-the-art evolution models for single, binary and Daria Kokh: Systems Biology and Systems Medicine Hackathon, Bonn, magnetic massive stars", Invited talk at Conference "MOB- Valentina Disarlo: “Exploring ligand unbinding kinetics using random accelera- Germany, 5-7 February 2020. “FAIR data in de.NBI” de.NBI STER-I: Stellar variability as a probe of magnetic fields in “Lignes d'étirement de Thurston généralisées”, Séminaire tion molecular dynamics: what can we learn?”, "2020 plenary, Berlin, Germany, 14 February 2020. massive stars" (online), 16 July 2020. "The strongest mag- GT3, Université de Strasbourg, Strasbourg, France, 20 Workshop on Free Energy Methods in Drug Design", nets in the Universe", ARI Colloquium, Astronomisches Rech- January 2020; “Lignes d'étirement de Thurston général- 12-14 November 2020 en-Institut, Centre for Astronomy, Heidelberg University, isées”, Séminaire Géométrie, Université Sorbonne (Jussieu), Heidelberg, Germany (online), 26 November 2020. Paris, 20 February 2020. 102 103 9 Miscellaneous Antonio D’Isanto: Sabrina Gronow: 9.2 Presentations (contributed talks and “Real-time spectra analysis and visualization with autoencod- "Double detonations", talk at the 15th Würzburg Workshop, Alexandros Stamatakis, Alexey Kozlov: posters) ers”, ESCAPE WP4 Tech Forum 1, Strasbourg, France, 4 – 6 Heidelberg, Germany (online), 16 December 2020. "RAxML-NG: a fast, scalable and user-friendly tool for February 2020; IVOA Virtual Interop 2020, Virtual conference, maximum likelihood phylogenetic inference", ISCB Interna- 4 – 8 May 2020; AG 2020 – Meeting of the German Astro- Johann Higl: tional Society for Computational Biology, webinar (online), Robert Andrassy: nomical Society, Virtual conference, 21 – 25 September 2020. "Calibrating the Core Overshooting Parameter With Hydrody- September 2020. "Fully compressible simulations of wave generation and namical Simulations", poster at the Annual Meeting of the propagation in main-sequence stars", talk at the Annual Dorotea Dudas: European Astronomical Society, Leiden, (online), Rebecca C. Wade: Meeting of the European Astronomical Society, Leiden, "SABIO-VIS", lightning talk at HARMONY 2020, EMBL-EBI, 1 July 2020. "Low Mach Number Simulations and Their “Tau-RAMD tutorial: Fast estimation of drug residence Netherlands (online), 1 July 2020. "Convective boundary Hinxton, UK, 9-13 March 2020. Need for Well-Balancing", talk at the Annual Meeting of the times”, 2020 MolSSI School on “Open Source Software for mixing in stellar interiors", talk at the 15th Würzburg Work- German Astronomical Society (online), 25 September 2020. Rare Event Sampling Strategies”, https://github.com/westpa/ shop, Heidelberg, Germany (online), 14 December 2020. Christopher Ehlert: "Compressible Simulations of Stellar Oscillations", talk at westpa/wiki/2020-MolSSI-School-on-Open-Source-Software- “An Assessment of Cluster Models and Methods", the 15th Würzburg Workshop, Heidelberg, Germany (online), in-Rare-Event-Path-Sampling-Strategies, 15 July 2020. Pierre Barbera: Graphene2020 (online), 19 - 23 October 2020 (poster). 14 December 2020. “Zooming in on the dynamic interactions of proteins and "Phylogenetic Placement: Why the where of the what tells us drugs by computer simulation”, CRC 1093 Lecture Series, the who and the why", University of Duisburg-Essen, Essen, Florian Franz and Frauke Gräter: Xiaoming Hu: University of Duisburg-Essen, Germany (online), 17 November Germany, February 2020. Talin impacts force-induces Vinculin activation through “SEEK : A systems biology data and model management 2020. “Computational Approaches to Protein Dynamics and "loosening" the vinculin inactive state. Annual Meeting of the platform”, lightning talk at HARMONY 2020, EMBL-EBI, Binding Kinetics for Drug Discovery”, Molecular Graphics Thomas Baumann: Biophysical Society, San Diego, USA, 15-19 February 2020 Hinxton, UK, 9-13 March 2020. and Modelling Society (MGMS) Lecture Tour Lecture, https:// "Common Envelope Parameter Space Exploration", talk at the (poster). www.mgms.org/WordPress/lecture-tour/, https://www. 15th Würzburg Workshop, Heidelberg, Germany (online), 15 Alexander Jordan: youtube.com/watch?v=QFbxeovFRtg&t=101s (online), 24 December 2020. Javier Moran Fraile: “Evaluating Probabilistic Classifiers: Reliability Diagrams and November 2020. “Zooming in on the dynamic interactions of "3D MHD simulations of White Dwarf - Neutron Star merg- Score Decompositions Revisited”, International Verification proteins and drugs by computer simulation”, “Science at the Johannes Bracher: ers", talk at the 15th Würzburg Workshop, Heidelberg, Methods Workshop Online, 18 November 2020. Edge” colloquium, Biochemistry, Physics and Chemical “Assembling, Comparing and Combining COVID-19 forecasts”, Germany (online), 15 December 2020. Engineering / Materials Sciences Departments, Michigan Karlsruhe Institute of Technology Center for Information, Markus Kurth and Frauke Gräter: State University, USA (online), 4 December 2020. Systems, Technology (KCIST), Germany, Online Lecture Series, Martin Golebiewski: A role for dihydroxyphenylalanine (DOPA) and its radical as 26 October 2020. “Options for standardization strategies in personalized marker for mechanical stress in collagen. EMBO Workshop: Anna Wienhard: medicine”, Workshop “Using patient derived data for in silico Chemical Biology 2020 (online). Heidelberg, Germany, 3-5 “Geometry, Topology, and Groups”, Mathematics Colloquium, Jonas Brehmer: modelling in personalized medicine”, HITS, Heidelberg, September 2020 (poster). IST , 11 March 2020; “Where geometry meets “Scoring Interval Forecasts”, Bernoulli-IMS One World Sympo- Germany, 4 February 2020. “FAIR Data Infrastructures for dynamics: groups, entropy and Hausdorff dimension”, sium 2020 (online), 24-28 August 2020. Biomedical Research”, Annual Conference of the Research Florian Lach: Mathematical Colloquium, Rutgers University, USA, 30 Data Alliance Germany (RDA Deutschland), Potsdam, Germa- "Nucleosynthesis Imprints of different Type Ia Supernova September 2020; “Where geometry meets dynamics: Nguyen-Thi Dang: ny, 25-27 February 2020. “A European standardization Explosion Scenarios and Chandrasekhar-mass Deflagrations groups, entropy and Hausdorff dimension”, Mathematics “Topological dynamics of the Weyl chamber flow”, Sophus Lie framework for data integration and data-driven in silico in Carbon-Oxygen White Dwarfs", talk at the 15th Würzburg Colloquium, Florida State University, USA, 13 November 2020, Paderborn, Germany, February 2020; “Topological models for personalized medicine”, lightning talk at HARMO- Workshop, Heidelberg, Germany (online), 16 December 2020. 2020; “Growth of groups, entropy and Hausdorff dimen- dynamics of the Weyl chamber flow”, Nearly Carbon Neutral NY 2020, EMBL-EBI, Hinxton, UK, 9-13 March 2020. “Chanc- sion”, Zurich Colloquium in Mathematics, Switzerland, 24 Geometric Topology Conference (online), June 2020. es and challenges in developing standards”, EU-STANDS4PM Giovanni Leidi: November 2020. annual meeting 2020 (online), 27 May 2020. “ISO and commu- "Implementing MHD into SLH", talk at the 15th Würzburg Timo Dimitriadis: nity standards for Modelling in personalized Medicine”, Workshop, Heidelberg, Germany (online), 16 December 2020. Ulrike Wittig: “Testing Forecast Rationality for Measures of Central Tenden- COMBINE 2020, session “Towards in silico approaches for “Research data management by SEEK/FAIRDOMHub”, cy”, Computational Science Lab (CSL) Symposium, Universität personalized medicine” (online), 7 October 2020. “FAIRness of Sebastian Lerch: Seminar “Special Interest Group Data Infrastructure”, Hohenheim, Hohenheim, Germany, 22 January 2020; Al- data and models through standardization in systems biolo- “Neural Networks for Postprocessing Ensemble Weather Stuttgart, Germany, 5 February 2020. “SABIO-RK - a manual- fred-Weber-Institut, Universität Heidelberg, Heidelberg, Germa- gy”, COMBINE 2020, session “Data, tools, standards and Forecasts”, EUMETNET Workshop on AI for Weather and Cli- ly curated reaction kinetics database”, South Africa-Germany ny, 23 January 2020; 12th Econometric Society World Con- more: infrastructure for systems biology” (online), 8 October mate Modeling, Brussels, Belgium, 2 February 2020; ECM- bilateral research projects workshop on "Systematic profiling gress, Bocconi University, Milan, Italy (online), 19 August 2020. 2020. “FAIR Data Infrastructures for Biomedical Informatics”, WF-ESA Workshop on Machine Learning for Earth System of multicopper oxidases" (online), 5-7 October 2020. “Evaluating Probabilistic Classifiers: Reliability Diagrams and introduction for GMDS workshop, 65th Annual Conference of Observation and Prediction, Reading, UK (online), 7 October Score Decompositions Revisited”, International Symposium on the German Association for Medical Informatics, Biometry and 2020. “Evaluating Probabilistic Forecasts with scorin- Forecasting, Rio de Janeiro, Brazil (online), 28 October 2020; Epidemiology (GMDS) (online), 15 October 2020. “Standards gRules”, International Verification Methods Workshop Online, Alfred-Weber-Institut, Universität Heidelberg, Heidelberg, for FAIR Data”, NFDI4Health Kick-off-Meeting (online), 4 18 November 2020. “Artificial Intelligence for Probabilistic Germany (online), 18 November 2020. November 2020. “SEEK and Find: FAIRDOM Data Manage- Weather Prediction”, Karlsruhe Institute of Technology Center ment Support for COVID-19 Disease Maps”, 5th Disease Maps for Information, Systems, Technology (KCIST) Online Lecture Community Meeting (online), 12-14 November 2020. Series, Germany, 7 December 2020. 104 105 9 Miscellaneous Anja Randecker: 25 September 2020. "Supernovae from stripped binary “FAIRDOM: supporting FAIR data and model management”, “Coarse geometry of the saddle connection complex” Lost stars", Research Seminar at Monash University, Melbourne, ELIXIR All Hands Meeting 2020 (online), 8-10 June 2020 Kiril Maltsev: in Translation Surfaces, University of Luxembourg, Luxem- Australia (online), 27 October 2020. "Core Collapse Superno- (poster). "On the foundation of Black Hole Thermodynamics", talk at bourg, 30 January 2020; “What does your average transla- vae from Stripped Stars", 15th Würzburg Workshop, Heidel- the Fourth International Zel'dovich Meeting, Minsk (ICRANet), tion surface look like?” Nearly Carbon Neutral Geometric berg, Germany (online), 14 December 2020. Christopher Zapp and Frauke Gräter: Belarus (online), 7 September 2020; "Gaussian Process Topology Conference (online), 1 July 2020. Mechanoradicals in tensed tendon collagen as new source regression of 1D stellar evolution variables", talk at the 15th Theodoros Soultanis: of oxidative stress, DPG Spring Meeting, Dresden, Germa- Würzburg Workshop, Heidelberg, Germany (online), 16 Benedikt Rennekamp: "Neutron star mergers", talk at the IMPRS retreat, Heidel- ny,17 March 2020 (poster). December 2020. "What’s after the break-up? Hybrid simulations of mechanorad- berg, Germany (online), 25 November 2020; "Neutron star icals in collagen", Max-Planck-Institute for Biophysical oscillations", talk at the 15th Würzburg Workshop 2020, Isabel Martin and Frauke Gräter: Chemistry, Göttingen, Germany, (virtual seminar), 9 April Heidelberg, Germany (online), 18 December 2020. 9.3 Memberships Mechanosensing in Focal Adhesions: Pseudokinase Integ- 2020. "Perspectives on collagen failure", Max-Planck-Insti- rin-linked Kinase under Force. Remote BioExcel Summer tute for Biophysical Chemistry, Göttingen, Germany, (virtual Gabriele Viaggi: Camilo Aponte-Santamaría: School on Biomolecular Simulations. Germany, 22-26 June seminar), 30 July 2020. "Multi-scale simulations of collagen “Uniform models for random 3-manifolds”, Nearly Carbon Referee for Plos Computational Biology and Nature 2020 (poster). failure”, Matter-to-Life Summer School, online conference, Neutral Geometric Topology Conference (online), June 2020; Communications 21-25 September 2020. "Collagen as a buffer of mechanical “Volumes and random walks on mapping class groups”, Wolfgang Müller and Martin Golebiewski: and chemical stress", Max-Planck-Institute for Biophysical International young seminar on bounded cohomology and Frauke Gräter: “LiSyM SEEK: A FAIR data & models hub and catalogue for Chemistry, Göttingen, Germany, (virtual seminar), 9 Dezember simplicial volume (online), 22 June 2020. Max Planck Fellow of the Max Planck School Matter to Life LiSyM-Cancer”, LiSyM-Cancer Partnering Event (online), 23 2020. (since 2019). Fellow of the Marsilius Kolleg, Heidelberg April 2020. Eva-Maria Walz: University (2019-2020). Member of the Editorial Board, Benedikt Rennekamp and Frauke Gräter: “Receiver Operating Characteristic (ROC) Movies”, Luminy Biophysical Journal. Member of the Board of Directors, Abraham Muniz-Chicharro, Rebecca C. Wade: "Hybrid Kinetic Monte Carlo / Molecular Dynamics Simula- Conference Mathematical Methods of Modern Statistics 2, Interdisciplinary Center for Scientific Computing (IWR), Heidel- “Prediction of drug-protein binding kinetics”, 12th Annual tions of bond scissions in proteins”, Biophysical Society Luminy, France (online), 17 June 2020. berg University. Associated faculty member, HGS MathComp Colloquium of the Heidelberg Graduate School of Mathemati- Meeting, San Diego, USA, 15-19 February 2020 (poster). Graduate School, University of Heidelberg. Faculty member, cal and Computational Methods for the Sciences, Germany Andreas Weidemann, Ulrike Wittig, Maja Rey, Dorotea Hartmut Hoffmann-Berling International Graduate School of (online), 1 December 2020 (poster). Dudas, Wolfgang Müller: Molecular and Cellular Biology (HBIGS), Heidelberg University. Maja Rey: “SABIO-RK: database for biochemical reaction kinetics”, Member of the coordinating committee of the excellence Ariane Nunes-Alves: “SABIO-RK Introduction”, de.NBI Course “Tools for Systems ELIXIR All Hands Meeting 2020 (online), 8-10 June 2020 cluster “3D Matter Made to Order” (KIT and Heidelberg "Comprehensive characterization of ligand unbinding biology modeling and data exchange: COPASI, CellNetAna- (poster). University). Co-speaker of Biophysics section within the mechanisms and kinetics for T4 lysozyme mutants using lyzer, SABIO-RK, FAIRDOMHub/SEEK” (online), 21 - 23 Condensed Matter section of the German Physical Society. τRAMD", 16th German Conference on Cheminformatics September 2020. Ulrike Wittig: Board Member of the Flagship Initiative "Engineering Molecu- (online), 2-4 November 2020. “Research data management by SEEK/FAIRDOMHub”, lar Systems", Heidelberg University. Referee for Biophysical Friedrich Röpke: MIX-UP Kick off Meeting, Brussels, Belgium, 7 February Journal, Journal of the American Chemical Society, PLoS Anna Piras: "Formation of sdB stars via common envelope ejection by 2020. “Research data management by SEEK/FAIRDOMHub”, Journals, Nature Journals, Proceedings of the National "The Law Of Attraction: Computational Insights into the Role substellar companions", talk at the Annual Meeting of the de.NBI Course “Tools for Systems biology modeling and Academy of Sciences and German Research Society (DFG). Of Non-Covalent Interactions in Graphene-Based Sensing", European Astronomical Society, Leiden, Netherlands (online), data exchange: COPASI, CellNetAnalyzer, SABIO-RK, FAIR- Graphene2020 (online), 19-23 October 2020 (poster). 2 July 2020; talk at the Annual Meeting of the German DOMHub/SEEK” (online), 21-23 September 2020. “SABIO-RK: Tilmann Gneiting: “Computational Insights into the Role of Non-Covalent Astronomical Society (online), 25 September 2020. Curation and Visualization of Reaction Kinetics Data”, Fellow, European Centre for Medium-Range Weather Fore- Interactions in Graphene-Based Sensing”, 1st joint 4EU+/ COMBINE 2020 (Computational Modeling in Biology Network casts (ECMWF), Reading, United Kingdom. Affiliate Professor, HGS MathComp Annual Colloquium (online), 1–2 December Christian Sand: Meeting) (online), 5-9 October 2020. “Research data man- Department of Statistics, University of Washington, Seattle, 2020. "Common-envelope evolution of an AGB star", talk at the agement by SEEK/FAIRDOMHub”, General Meeting of COST Washington, United States. Member, Steering Committee, Annual Meeting of the European Astronomical Society, Action CA18133 “European Research Network on Signal Karlsruhe Institute of Technology Center MathSEE: Mathe- Maria Beatrice Pozzetti: Leiden, Netherlands (online), 2 July 2020; talk at the Annual Transduction” (online), 12-14 October 2020. matics in Sciences, Economics and Engineering. Member, “Surface subgroups of semisimple Lie groups”, Young Meeting of the German Astronomical Society (online), Ensemble Advisory Board, United States COVID-19 Forecast- Geometric Group Theory IX, Rennes, France, 28 February 25 September 2020. Ulrike Wittig, Stuart Owen, Olga Krebs, Martin Gole- Hub. Member, Committee on Publications, Institute for 2020; “The real spectrum compactification of maximal biewski, Maja Rey, Alan Williams, Finn Bacall, Xiaoming Mathematical Statistics. character varieties”, workshop on Simplicial volumes and Fabian Schneider: Hu, Ina Pöhner, Munazah Andrabi, Jacky Snoep, Fate- bounded cohomology, Regensburg, Germany, 22 September "Stellar mergers as the origin of magnetic massive stars", meh Zamanzad Ghavidel, Rune Kleppe, Kjell Petersen, Martin Golebiewski: 2020. talk at the Annual Meeting of the European Astronomical Kidane Tekle, Jon Olav Vik, Inge Jonassen, Andrew Convenor (chair) of the ISO/TC 276 Biotechnology working Society, Leiden, Netherlands (online), 2 July 2020; talk at the Millar, Flora D'Anna, Vahid Kiani, Anders Goksøyr, Katy group 5 “Data Processing and Integration”, International Annual Meeting of the German Astronomical Society (online), Wolstencroft, Frederik Coppens, Carole Goble, and Organization for Standardization (ISO). Chair of the project Wolfgang Müller: group “FAIR Data Infrastructures for Biomedical Informatics”

106 107 9 Miscellaneous Alexandros Stamatakis: kolleg 2229 "Asymptotic Invariants and Limits of Groups Martin Golebiewski: Member of the steering committee of the Munich Supercom- and Spaces”, extended 2020- 2024. PI in DFG SFB/TRR 191 Host and chair of the EU-STANDS4PM Workshop “Using of the German Association for Medical Informatics, Biometry puting System HLRB at LRZ. Member of the scientific "Symplectic Structures in Geometry, Algebra and Dynamics”, patient derived data for in silico modelling in personalized and Epidemiology (GMDS). Member of the board of coordi- advisory board of the Computational Biology Institute in 2020-2024. PI and Member of Program Committee, DFG medicine”, HITS, Heidelberg (Germany), 4 February 2020. nators of COMBINE (Computational Modeling in Biology Montpellier, France. Member of scientific committee of the Schwerpunktprogramm "Geometry at Infinity" (SPP 2026) Co-host and session chair of HARMONY 2020, EMBL-EBI, network). Member of the steering committee of the German SMPGD (Statistical Methods for Post Genomic Data analysis) 2020 – 2023. Hinxton, Cambridgeshire, UK, 9-13 March 2020. Host and National Research Data Infrastructure for Personal Health workshop series. chair of Committee Meetings of ISO/TC 276 Biotechnology Data (NFDI4Health). German delegate at the ISO technical Ulrike Wittig: working group WG5 “Data Processing and Integration”, committee 276 Biotechnology (ISO/TC 276), International Michael Strube: Member of the STRENDA Commission (Standards for online, 8 June - 3 July 2020. Co-host and session chair of Organization for Standardization (ISO). Member of the Research Training Group 1994, Adaptive Preparation of Reporting Enzymology Data). the COMBINE 2020 Online Forum: 11th Computational national German standardization committee (“Nationaler Information from Heterogeneous Sources (AIPHES), TU Modeling in Biology Network Meeting, online, 5-9 October Arbeitsausschuss”) NA 057–06–02 AA Biotechnology, Darmstadt/Heidelberg University/HITS; Associate Editor: 2020. Host and chair of the Workshop “FAIR data infrastruc- German Institute for Standardization (DIN). Co-chair of the Journal of Artificial Intelligence Research. tures for biomedical communities”, 65th Annual Conference ISO/IEC Joint Ad-hoc Group on Standardization of Genomic 9.4 Contributions to the Scientific of the German Association for Medical Informatics, Biometry Information Compression and Storage (MPEG-G). Member of Rebecca C. Wade: Community and Epidemiology (GMDS), online, 15 October 2020. Task the steering committee of the AIMe registry for artificial Associate Editor: Journal of Molecular Recognition, PLOS Area Chair “Standards for FAIR Data (TA2)” at the NFDI- intelligence in biomedical research. Computational Biology. Section Editor: Molecular Informatics, 4Health Kick-off-Meeting, online, 4-6 November 2020. International Journal of Molecular Sciences. Editorial Board: Program Committee Memberships Ganna Gryn’ova: Advances and Applications in Bioinformatics and Chemistry; Olga Krebs: Affiliated junior research group leader: Interdisciplinary Center BBA General Subjects; Journal of Chemical Information and Sucheta Ghosh: Best practices in research data management and steward- for Scientific Computing (IWR), Heidelberg University. Mem- Modeling; Journal of Computer-aided Molecular Design; The 34th AAAI Conference on Artificial Intelligence (AAAI), ship. Online course, 19 May 2020 - 3 June 2020. Best ber: working group on advancing women’s careers in mathe- Biopolymers; Protein Engineering, Design and Selection. New York, NY, USA, 7-12 February 2020. (Program Commit- practices in research data management and stewardship, matics and computer science, Heidelberg Laureate Forum Member of Scientific Advisory Council of the Leibniz-Institut tee). The 58th Annual Meeting of the Association for Compu- Online course organised by ELIXIR Luxembourg and ELIXIR Foundation (HLFF). für Molekulare Pharmakologie (FMP), Berlin-Buch. Member of tational Linguistics (ACL), Seattle, USA, 5-10 July 2020. The France, 5 - 8 October 2020. Modelling COVID-19 epidemics, Scientific Advisory Council of the Computational Biology Unit 21st Annual Conference of the International Speech Commu- online course, co-organized with ELIXIR, ISBE and EOSC, 30 Saskia Hekker: (CBU), University of Bergen, Norway. Member of Scientific nication Association- Interspeech, Shanghai, China, 14-18 November - 9 December 2020. Member of Brite Executive Science Team (BEST); head of an Advisory Board of the Max Planck Institute of Biophysics, September 2020 (Technical Program Committee). The 17th international node of the Stellar Astrophysics Centre (SAC); Frankfurt. Member at Heidelberg University of: HBIGS International Conference Applied Computing 2020, Lisbon, Ariane Nunes-Alves: member of TESS Asteroseismic Science Consortium (TASC) (Heidelberg Biosciences International Graduate School) Portugal, 18-20 November 2020 (Program Committee). Co-organizer: "LatinXChem" chemistry twitter poster confer- Steering Committee. faculty, HGS MathComp Graduate School faculty, Interdisci- ence, https://www.latinxchem.org/ online, 7 September 2020. plinary Center for Scientific Computing (IWR), DKFZ-ZMBH Michael Strube: Co-organizer: "The impact of the COVID-19 crisis on women Sebastian Lerch: Alliance of the German Cancer Research Center and the Senior Area Chair at the 58th Annual Meeting of the Associa- in science: challenges and solutions", EMBL WIS conference, Associate Editor, Monthly Weather Review. Associate Editor, Center for Molecular Biology at Heidelberg University. Mem- tion for Computational Linguistics, Online, The World, 5-10 held online https://www.embl.org/events/covid19-wis/, 9 Meteorological Applications. Guest Editor, Nonlinear Process- ber of Managing Board of Directors, Interdisciplinary Center July 2020. Area Chair at “The 1st Conference of the Asia-Pa- September 2020. es in Geophysics. for Scientific Computing (IWR), Heidelberg University. cific Chapter of the Association for Computational Linguis- tics” and the “10th International Joint Conference on Natural Friedrich Röpke: Wolfgang Müller: Anna Wienhard: Language Processing”, Online, The World, December 4–7, Senatsberichterstatter, Berufungsverfahren Professur Zahner- Member of the Scientific Advisory Board of the BioModels Fellow of the American Mathematical Society. Advisory Board, 2020. Area Chair at the “28th International Conference on haltungskunde, Heidelberg University, Germany, 2020. Database. Deputy Chairman of SIG 4 (Infrastructure & data Springer Lecture Notes in Mathematics. Advisory Board, Computational Linguistics”, Online, The World, December Member of the Scientific Organizing Committee, session SS5 management), German Network for Bioinformatics Infrastruc- Mathematisches Forschungsinstitut Oberwolfach. Scientific 8-13, 2020. "New insights of angular momentum transport in stellar ture (de.NBI). Board Member and Treasurer of FAIRDOM e.V. Committee Wissenschaftskommunikation.de. Member interiors", Annual Meeting of the European Astronomical Heidelberg Academy of Sciences. Member Berlin-Branden- Society, Leiden, Netherlands (online), 1 July 2020. Ariane Nunes-Alves: burg Academy of Sciences and Humanities. Selection Workshop and Conference Organization Member of the Early Career Board of the Journal of Chemical Committee, Heinz Maier Leibnitz Preis der Deutschen For- Alexandros Stamatakis: Information and Modeling. schungsgemeinschaft. Editor Annales Henri Lebesgue. Editor Robert Andrassy and Friedrich Röpke: Organizer of 2020 Computational Molecular Evolution Annales scientifiques de l'Ecole Normale Superieure. Editor Members of the Scientific and Local Organizing Committees, Summer School, Heraklion, Crete, Greece. (The summer Friedrich Röpke: Geometric and Functional Analysis. Editor Geometry & 15th Würzburg Workshop, HITS, Heidelberg, Germany school was postponed to 2021). Advisory Board: Sterne und Weltraum. Topology. Editor Proceedings of the London Mathematical (online), 14-18 December 2020. Society. Editor Geometriae Dedicata (2011–2020). Editor Michael Strube: Forum Mathematicum (2016–2020). Co-Spokesperson: DFG Tilmann Gneiting: Program Co-Chair of CODI 2020, The First Workshop on Excellence Cluster 2181 "STRUCTURES: A unifying approach Member, Organizing Committee, MathSEE Symposium 2020: Computational Approaches to Discourse at EMNLP 2020, to emergent phenomena in the physical world, mathemat- Mathematics in Sciences, Engineering and Economics (online), Online, The World, 19 November 2020. ics, and complex data” ,Co-Spokesperson DFG Graduierten- Karlsruhe Institute of Technology, Germany, 7-9 October 2020. 108 109 9 Miscellaneous

Anna Wienhard: 9.5 Awards Co-organizer of the conference “Teichmüller Theory: Classi- cal, Higher, Super and Quantum”, 5-9 October 2020, CIRM Johannes Bracher: (Centre International de Rencontres Mathématiques), Mar- “ProFID: Probabilistic Real-Time Forecasting of Infectious seille, France. Diseases”, Young Investigator Group Preparation Program (YIG Prep Pro) award, Karlsruhe Institute of Technology, 2020. Other contributions Frauke Gräter: Antonio D’Isanto: ERC Consolidator Grant, European Research Council, 2020. “A trip across the Galaxy: the search for exoplanets”, Invited lecture for the eight-grade classes of the Milton Hershey Ganna Gryn’ova: School, Pennsylvania, USA, 30 Oct 2020. “B like beyond the Principal Investigator: SFB1249 “N-Heteropolycycles as human limit or astronomical challenges in the era of AI”. Functional Materials”. Blog post in the HITS Blog “Via Data” on SciLogs. Saskia Hekker: Valentina Disarlo: ERC Consolidator Grant, European Research Council, 2020. Project Leader in the DFG SPP 2026 Geometry at Infinity for Project 44 "Actions of mapping class group and its sub- Sebastian Lerch: groups". “Artificial Intelligence for Probabilistic Weather Forecasts”, MINT for the Environment Early Career Research Group Isabel Martin and Frauke Gräter: award, Vector Foundation, 2020. “#HITS – Wie Daten Wissen schaffen. Proteine im Crash- test: Von der Achillessehne bis zum Blut.” MAINS Heidel- Eugen Rogozinnikov: berg, Germany, 13 February 2020. Heidelberger Dissertationspreis Mathematik 2020.

Nicholas Michelarakis: Carmen Rovi: "A Day in the Life of a Computational Biochemist"; talk at Summer Research in Mathematics Program, MSRI, Berkeley, the Summer School REMOTE 2020, “Jugend Präsentiert”, USA ($5000 grant and paid accommodation and travel). Heidelberg, Germany, (online), 3 July 2020. Invitation from Max Planck Institute for Mathematics, Janu- ary-June 2020 (comes with significant stipend). Maria Beatrice Pozzetti: Principal Investigator in the RTG 2229: Asymptotic Invariants Fabian Schneider: and Limits of Groups and Spaces. Project Leader in the DFG Emmy Noether Fellowship, German Research Foundation, SPP 2026 Geometry at Infinity for Project 28: Rigidity, 2020 (declined). ERC Starting Grant, European Research deformations and limits of maximal representations and Council, 2020. Project 71: Rigidity, deformations and limits of maximal representations II. Leader of a DFG Emmy Noether indepen- Alexandros Stamatakis: dent junior research group (Project number: 427903332). Highly Cited Researcher in Cross-Field Research, Clarivate Analytics, 2020. Anja Randecker: Principal Investigator in the RTG 2229: Asymptotic Invariants and Limits of Groups and Spaces. Project Leader in the DFG SPP 2026 Geometry at Infinity for Project 72: Limits of invariants of translation surfaces.

Anna Wienhard: Selected as Henriette-Herz Scout, Alexander von Humboldt Foundation.

110 111 Shareholders´ Board HITS Management

Prof. Dr. Wilfried Juling The HITS Management consists of the Managing Director 10 Boards and Management Member of the Board of Directors and the Scientific Director (“Institutssprecher”). The latter is one of the group leaders appointed by the HITS share- holders for a period of two years. The scientific director represents the institute in all scientific matters vis-à-vis cooperation partners and the public. HITS-Stiftung Prof. Dr.-Ing. Dr. h.c. Andreas Reuter Member of the Board of Directors Managing Director: (until May 2020) Dr. Gesa Schönberger

Prof. Dr. Carsten Könneker Member of the Board of Directors (since June 2020) © Mück/Klaus Tschira Stiftung Scientific Director: PD Dr. Wolfgang Müller (2019 – 2020) Heidelberg University Prof. Dr. Jörg Pross Vice-President of Research and Structure The HITS Scientific Advisory Board. From left to right: Wolfgang Müller (HITS Scientific Director), Tony Hey, Frauke Gräter (HITS Deputy Scientific © Philip Benjamin Deputy Scientific Director: Director), Alex Szalay, Victoria Stodden, Thomas Lengauer, Adele Goldberg, Barbara Wohlmuth, Dieter Kranzlmüller, Gert-Martin Greuel, Gesa Prof. Dr. Frauke Gräter Schönberger (HITS Managing Director), Jeffrey Brock. Karlsruhe Institute of Technology (KIT) (2019 – 2020) Dr. Hanns-Günther Mayer Scientific Advisory Board Director of Shareholdings (“Leitung Beteiligungen”) The HITS Scientific Advisory Board (SAB) is a group of internationally renowned scientists that supports the manage- ment of HITS in various aspects of running, planning, and directing the institute. The SAB is responsible for orchestrating the periodic evaluation of all the research groups of HITS. It presents the results to the HITS management and makes recommendations regarding how to further improve the institute’s research performance. In 2020, the board consisted of the following members: HITS

• Prof. Dr. Jeffrey Brock, Zhao and • Prof. Dr. Tony Hey, Chief Data • Prof. Dr. Alex Szalay, Johns The Heidelberg Institute for Theoretical Studies (HITS) was Ji Professor of Mathematics at Scientist, Science and Technology Hopkins University, USA established in 2010 by the physicist and SAP co-founder Klaus Yale University, USA Facilities Council, UK Tschira (1940 – 2015) and the Klaus Tschira Foundation as a • Prof. Dr. Victoria Stodden, School private, non-profit research institute. HITS conducts basic • Dr. Adele Goldberg, former • Prof. Dr. Dieter Kranzlmüller, of Information Sciences, Universi- research in the natural sciences, mathematics, and computer President of the Association for Ludwig Maximilians University, ty of Illinois at Urbana-Champaign, science, with a focus on the processing, structuring, and Computing Machinery (ACM), USA Munich, Director of the Leibniz USA analyzing large amounts of complex data and the development (Vice Chair, SAB) Super Computing Center, Germany of computational methods and software. The research fields (Chair, SAB) • Prof. Dr. Barbara Wohlmuth, Chair range from molecular biology to astrophysics. The sharehold- • Prof. Dr. Gert-Martin Greuel, of Numerical Mathematics at the ers of HITS are the HITS-Stiftung, Heidelberg University, and the University of Kaiserslautern, • Prof. Dr. Thomas Lengauer, Technical University of Munich Karlsruhe Institute of Technology (KIT). HITS also cooperates former Director of the “Mathema- Max-Planck-Institute for Computer (TUM), Germany. with other universities and research institutes and with industri- tisches Forschungszentrum Science, Saarbrücken, Germany al partners. The base funding of HITS is provided by the HITS Oberwolfach”, Germany Stiftung with funds received from the Klaus Tschira Foundation. The primary external funding agencies are the Federal Ministry of Education and Research (BMBF), the German Research Foundation (DFG), and the European Union.

112 113 HITS gGmbH Schloss-Wolfsbrunnenweg 35 D-69118 Heidelberg

Editor Dr. Peter Saueressig Head of Communications

Contact [email protected] Phone: +49 6221-533 533 Fax: +49 6221-533 298 www.h-its.org

Our e-mail addresses have the following structure: [email protected]

Pictures HITS gGmbH (unless otherwise indicated)

All rights reserved. All brand names and product names mentioned in this document are trade names, service marks, trademarks, or registered trademarks of their respective owners. All images are protected by copyright. Although not all are specifically indicated as such, appropriate protec- tive regulations are valid.

Layout and Design FEUERWASSER | grafik . web . design www.feuerwasser.de

ISSN 1438-4159 | © 2021 HITS gGmbH

Twitter: @HITStudies Facebook: /HITStudies Youtube: /TheHITSters LinkedIn: company/the-heidelberg-institute-for-theoretical-studies Instagram: the_hitsters

114