Research Summary - Spring 2016

Wolfgang Huber, Research Group Leader and Senior Scientist

Contents

A Research Vision

B Summary of Work

C Future Plans

D Publications 2012-16

E List of External Grants (since 2012)

F Curriculum Vitae

A Research Vision

The unifying concept of my research is methodology: statistical expertise and the ability to invent new methods. I apply these where closing a methodological gap will advance biology. I run an interdisciplinary group with three main aims. The first aim is to drive forward the state of the art of statistics in biology, that is, the science of reasoning with uncertainty and of making reliable inferences from incomplete, noisy or overwhelming data. I also understand statistics as an instrument for discovery: a set of tools that help humans see interesting patterns in large datasets. The second aim is to gain insight into pressing questions in drug-genotype interactions and precision oncology through proficient use of statistical computing. To achieve both of these aims, I collaborate closely with biomedical researchers who are equipped with exciting novel technologies and are producing novel data types. Thirdly, I aim to advance translational statistics by making methods usable not only by experts, but by a wide range of users. This aim is embodied by my engagement in the Bioconductor project.

B Summary of Work

Research Highlights of the Last Four Years

From 2012 to the present, 22 papers were published with W. Huber as corresponding author and/or group members as (co-)first authors [1–18, 49, 50, 56, 66]. There were 67 papers altogether (Section D). Several are having an impact. Highlights are:

Statistical methods. We developed the first approach to false discovery rates in multiple testing that permits data-driven hypothesis weighting [68]. The power gains can be large, and the method is broadly applicable.

RNA-seq. We developed what have become standard tools for RNA-seq analysis, most prominently DESeq2 for differential gene expression analysis [8, 10]. Moreover, we published htseq, DEXSeq [18] and a method for single-cell RNA-seq [49]. We used DEXSeq for a novel contribution to the debate on 'junk' versus 'function' in alternate RNA isoforms [15]. We used statistical modelling to weigh the extent of stochasticity and regularity in the promiscuous gene expression of medullary thymic epithelial cells [4].

Other 'omics. We developed methods and Bioconductor packages for cancer genome sequencing [6, 12], 4C-seq [7] and iCLIP [25] and applied these in numerous collaborative projects. We contributed to the adaptation of the DESeq2 framework to other data types, such as Ribo-seq and ChIP-seq.

Translational statistics. Many powerful mathematical and computational methods exist but are difficult to access for a majority of biomedical scientists. We translate advanced ideas into practical methods and software. I took responsibility for the European presence of Bioconductor [1], a widely used software project, by organising developer conferences and annual summer courses and by obtaining EC network grants (RADIANT, SOUND) for the project.

Gene-gene & gene-drug interactions. We developed a method for automated inference of the direction of epistatic genetic interactions from high-content phenotyping data [2]. We partnered with the National Centre for Tumour Diseases (NCT) to translate high-dimensional phenotyping and gene-drug interaction screening into practical personalized medicine.

Systems microscopy. We developed methods for estimating quantitative biophysical models from time-resolved microscopy data and applied these in several successful collaborations with developmental biologists [5, 24, 48, 56].

Thermal proteome profiling. We recently started work on statistical methodology and computational infrastructure for thermal proteome profiling [9, 27]. Our aim is to make the technology as widely accessible and usable as possible: a new 'workhorse' for scientists in both fundamental and pharmaceutical research.

Reproducible research. All our major papers are accompanied by a complete transcript of all computations from raw data to the figures, tables and numbers reported in the paper.

Further details on a selection of the above-mentioned highlights are given in the following.

B.1 Translational Statistics

The adjective translational is sometimes used for efforts to translate biological discoveries into something useful for medicine. I use the term translational statistics for efforts to make sophisticated mathematical discoveries and computational methods accessible to a wide range of natural scientists.

I have contributed to the Bioconductor project since 2002 [1, 8, 53, 112, 139, 140]. The project has been providing an energetic, fast-moving platform to the research community for collaborative, interoperable, scientifically leading software in genomics and quantitative biology. It has also become a platform for the publication of bioinformatic software that many authors aspire to. Bioconductor is the largest software project in bioinformatics, with several thousand users and hundreds of developers worldwide. It comprises more than 1000 software packages. I have outlined the aims of the project, and the means by which we achieve them, in a recent perspective paper [1].

My particular role in the project has been in the provision of mathematically sophisticated packages for the primary data analysis of popular technologies [6–10, 12, 18, 50, 68, 71, 82, 88, 104, 105, 109, 124, 154]. For some of them it might be fair to say that they were among the "killer applications" that helped bring new users to Bioconductor. A list of those software packages is provided in Section F, heading Software.

An important goal has been to facilitate interoperability of R/Bioconductor with other software projects. For instance, the rhdf5 package provides an interface to the HDF5 data storage system. HDF5 is used in high-performance computing and permits efficient exchange of large, array-shaped datasets between different software systems. The RBioFormats package provides an interface to BioFormats (footnote 1), the leading solution for reading vendor-specific microscopy image data and metadata formats. The package lpsymphony interfaces to the powerful SYMPHONY optimisation package, an open-source solver for mixed-integer linear programmes.

Since 2005, an annual general Bioconductor conference has been held in the US each summer. Since 2010, I have coordinated annual European developer conferences, which take place in the winter and alternate between the UK and the continent (, Zurich). They usually attract 40-50 active and future package developers. For new users, I organise the annual CSAMA summer schools in Brixen, South Tyrol, which have taken place every year since 2004. These week-long compact courses host around 60 participants (places are usually booked out quickly) and are taught by high-calibre teachers, including R. Gentleman, M. Morgan, V. Carey, M. Love and S. Anders. Since 2015, I have been involved in the organisation of bi-annual Statistics in Genomics workshops at the ETH's wonderful conference centre on Monte Verità, Ascona, Switzerland.

To support Bioconductor development, I have co-written the EC network grant RADIANT (2012-2015) and am coordinating the SOUND project (2015-2018). These grants include leading contributors to Bioconductor in Europe. They provide not only research funding, but also positions for staff to work on strategically important infrastructure- or support-oriented tasks. SOUND also includes a US partner, M. Morgan, the leader of the Bioconductor project.

B.2 DESeq, DESeq2 and DEXSeq

DESeq is a method and software package for the differential analysis of count data from high-throughput sequencing that we published in 2010 [82]. Since then, it has been cited over 2,400 times (footnote 2). With DESeq2, we have greatly extended its statistical sophistication (Figure 1) and the range of its applications, and improved the software user interface and robustness, documentation and associated training material [10]. The method is based on generalized linear models and uses empirical Bayes methodology to permit model parameter estimation even in the case of few (e. g., two) replicates. Contrary to some misperceptions, using only such a 'small' number of replicates is a reasonable, scientifically and economically efficient choice for designed experiments (footnote 3). It is supported by progress in statistical modelling over the last 15 years, in particular empirical Bayes methodology and hierarchical models that share information between genes.

Figure 1: DESeq2 uses empirical Bayes methodology to obtain stable estimates of logarithmic fold changes (LFC) and variances even when the number of replicates is small. In this figure from reference [10], panels A and B show MA-plots of the maximum likelihood (ML, A) and maximum a posteriori (MAP, B) estimates of LFC. Two genes with similar mean count and MLE LFC are highlighted by green and purple circles, and their normalized count data are shown in panel C. The green gene has low dispersion, the purple gene high dispersion. Panel D shows the densities of the likelihoods (solid lines), posteriors (dashed) and the empirical Bayes prior (solid black).

Footnote 1: http://www.openmicroscopy.org/site/products/bio-formats
Footnote 2: ISI Web of Science
Footnote 3: It is helpful to distinguish between designed experiments, performed under well-controlled laboratory conditions, and studies, done with cohorts 'in the wild', e. g., human subjects in the clinic. For the latter, large cohort sizes (100s, 1000s) are needed, and analysis methods can be less reliant on the limma / edgeR / DESeq2-style empirical Bayes approach to information sharing across genes.

Since its publication in December 2014, the DESeq2 paper [10] has been cited over 110 times (footnote 4), and the package was downloaded from >35,000 unique IP addresses over the last year. DESeq2 is an example of relatively sophisticated statistical methodology that makes a practical difference to biologists.

Moreover, we also published htseq and htseq-count for counting the overlap of aligned sequencing reads with genomic features. This is a basic step in the processing of RNA-seq data. The paper associated with the software [11] has been cited over 300 times and is mentioned in >1,000 papers according to the full-text search of PubmedCentral (footnote 5). It is the most prominent implementation of the 'counting' approach to RNA-seq (footnote 6).

DEXSeq [18] addresses alternative isoforms. In comparison to approaches that try to reconstruct full transcripts before testing them for differential abundance across conditions, DEXSeq short-circuits the assembly and looks for differential exon usage directly. It has performed well in recent benchmarks (footnote 7) compared to the aforementioned approaches, a result of the fact that the goal of full mammalian transcript reconstruction from Illumina HiSeq short reads remains elusive (footnote 8).

Drift and conservation of differential exon usage across tissues in primate species. Using DEXSeq on multi-species, multi-tissue data, we have made a contribution to the discussion of 'junk' versus 'function' in alternate RNA isoforms [15]. We found that for a large fraction of the tissue-specific isoform diversity seen in primates, the tissue-specific expression is not conserved even between closely related species. On the other hand, for the subset of highly expressed tissue-specific isoforms (3,800 exons in 1,643 genes), we do detect conserved tissue-specific usage across species. To the extent that such conservation is an indicator of selection for function, our analysis supports the view that, by and large, alternative isoform usage is leaky and noisy at low abundance levels, but more tightly controlled and functional for higher abundance transcripts.

Footnote 4: More than 450 if references to the bioRχiv preprint are included.
Footnote 5: http://www.ncbi.nlm.nih.gov/pmc/?term=htseq
Footnote 6: More recently, methods that circumvent the alignment and feature-counting steps by directly assigning reads to target sequences via k-mer matching, such as sailfish, are gaining traction. Eventually, this approach is likely to make the use cases for htseq-count less numerous, but not those for differential expression analysis, i. e., DESeq2 or its related methods.
Footnote 7: E. g., Soneson et al. (2015) http://dx.doi.org/10.1101/025387
Footnote 8: Steijger et al. (2013) http://dx.doi.org/10.1038/nmeth.2714

For single cell RNA-seq data, Simon Anders published an influential method for distinguishing true biological variability from technical variability [49]. We used it to resolve a debate on the extent of stochasticity and regularity in the promiscuous gene expression programmes of medullary thymic epithelial cells [4]. Also with the Steinmetz lab, we mapped cell-to-cell variability of 3' isoform choice by single-cell polyadenylation site mapping [32].

Extension to other data types. A special highlight here was that the very first high-throughput CRISPR/Cas9 screen (footnote 9) was analysed with DESeq2. The fact that this was done without our direct involvement speaks for the usability of the software. As for our own efforts, we focused on high-throughput chromosome conformation capture assays, specifically 4C, HiC and ChIA-PET. We developed the Bioconductor package FourCSeq [7], applied it to research reported in Nature [42] and presented further results on the analysis of HiC data in [69].

Documentation and usability. We published an end-to-end RNA-seq data analysis protocol oriented to practitioners in Nature Protocols [50]. This was written as a consensus document together with the authors of the main competing package, edgeR. Two years later, we provided an updated and distinctly extended version in F1000Research [8].
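The shrinkage behaviour summarised in Figure 1 can be illustrated with a deliberately simplified sketch. It assumes a zero-centred normal prior on the log fold change and a normal approximation to the likelihood, so that the MAP estimate has a closed form. DESeq2 itself fits a generalized linear model to counts, so this is not its implementation; the function name `shrink_lfc` and all numbers are hypothetical.

```python
def shrink_lfc(mle_lfc, se, prior_sd):
    """MAP estimate of a log fold change under a N(0, prior_sd^2) prior.

    Assumes the likelihood is approximately N(mle_lfc, se^2). This is a
    simplification for illustration, not the DESeq2 implementation.
    """
    prior_var = prior_sd ** 2
    weight = prior_var / (prior_var + se ** 2)  # in (0, 1): trust placed in the data
    return weight * mle_lfc


# Two hypothetical genes with the same ML estimate but different precision,
# mirroring the low-dispersion and high-dispersion genes of Figure 1.
low_dispersion = shrink_lfc(mle_lfc=2.0, se=0.5, prior_sd=1.0)
high_dispersion = shrink_lfc(mle_lfc=2.0, se=2.0, prior_sd=1.0)
print(low_dispersion, high_dispersion)
```

The imprecisely measured gene is pulled much more strongly towards zero than the precisely measured one, which is the qualitative effect that makes estimates from few replicates usable.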

B.3 Cancer Genomics

We developed the h5vc package, which leverages the high-performance data storage system HDF5 together with R/Bioconductor for large-scale analyses of genome sequencing data [12]. We also published the SomaticSignatures package, which identifies mutational signatures of single nucleotide variants (SNVs) in tumour genomes [6]. It provides infrastructure related to the methodology described by Nik-Zainal et al. (2012, Cell). We applied these tools in numerous collaborative projects, including the HeLa genome [17] and the first data-based estimation of position-specific error rates for each base in the human genome (footnote 10).

B.4 Multiple Testing, False Discovery Rates and Hypothesis Weighting

When functional genomics data became available in the 1990s, a spike of interest arose in the topic of multiple testing. With the adoption of the false discovery rate (FDR) as a common experiment-wide summary and with practical computational methods (footnote 11), it seemed for a while that the topic was settled. However, as the size and complexity of datasets have increased, researchers have realized a major limitation of the currently used FDR methods: the exchangeability assumption. The only information used from each hypothesis test is its p-value. Other potentially useful information, such as the power of the test, the observed effect size or the prior probability of the null hypothesis, is effectively ignored. Although various ad hoc fixes and heuristics existed, they were unsatisfactory since they were statistically inefficient, required manual tuning, or were even fallacious. Our work provides a principled, data-driven and statistically near-optimal solution to the problem [68]. It generalizes earlier work [10, 81].
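To make the role of hypothesis weights concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure with optional fixed weights that average to one (each p-value is divided by its weight before the procedure is applied). This is the classical weighted variant with weights supplied by the user, not the data-driven method of [68]; the function name, p-values and weights are hypothetical.

```python
def weighted_bh(pvalues, alpha=0.1, weights=None):
    """Return a list of booleans: True where the hypothesis is rejected.

    With weights=None this is the ordinary Benjamini-Hochberg procedure.
    Otherwise each p-value is divided by its weight first; the weights
    must be non-negative and average to one.
    """
    m = len(pvalues)
    if weights is None:
        weights = [1.0] * m
    if abs(sum(weights) / m - 1.0) > 1e-9:
        raise ValueError("weights must average to one")
    # A zero weight means the hypothesis can never be rejected.
    adjusted = [p / w if w > 0 else float("inf") for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: adjusted[i])
    # Find the largest rank k whose adjusted p-value is <= alpha * k / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if adjusted[i] <= alpha * rank / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


pvals = [0.001, 0.02, 0.06, 0.3, 0.6, 0.9]
unweighted = weighted_bh(pvals, alpha=0.1)
weighted = weighted_bh(pvals, alpha=0.1, weights=[1.5, 1.5, 1.5, 0.5, 0.5, 0.5])
print(sum(unweighted), sum(weighted))
```

In this toy example, up-weighting the first three hypotheses (for instance because side information suggests higher power) yields one additional rejection at the same nominal FDR level.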

B.5 Gene-Gene and Gene-Drug Interactions

Automated phenotyping from microscopy image analysis. Microscopy-based readouts are more informative for phenotyping than bulk viability or reporter assays, because they provide single-cell resolved data on processes such as cell cycle and proliferation, cell migration, trafficking and organelle morphology. We have created an R-based infrastructure, in particular our Bioconductor package EBImage, to support such high-throughput workflows, and have applied it widely in successful collaborations [2, 3, 5, 14, 24, 41, 44, 48, 52, 54, 56, 65, 67]. In comparison to other tools (e. g., CellProfiler, Matlab, ImageJ/Fiji), the strength of our solution lies in the combination of functionality, speed and scriptability.

Footnote 9: Zhou et al. (2014) http://dx.doi.org/10.1038/nature13166
Footnote 10: Julian Gehring's PhD thesis; paper to be published
Footnote 11: Most prominently, the method of Benjamini and Hochberg.

Source: Marco Breinig et al., Integrated phenotypic and pharmacogenetic compound profiling, Molecular Systems Biology, published online December 23, 2015.

Figure 2: Unsupervised clustering of drugs based on the correlation of their imaging-based high-content phenotypes in 12 different cell lines [3]. The correlation distances between each pair of compounds are shown in the upper left half of the matrix. For comparison, the lower right half shows the structural similarities (Tanimoto distances). [Figure: clustered compound-by-compound matrix with clusters labelled C1 to C18; colour scales: similarity of multiparametric interaction profiles (low to high) and structural similarity of compounds (low to high).]

In terms of functionality, we leverage R's rich toolset for statistics, machine learning and publication-quality data visualisation.

Another output of general interest is our new feature selection method [2]. It combines attractive properties of linear rotation methods (such as principal component analysis and linear discriminant analysis), namely non-redundancy and signal-to-noise based dimension selection, with the advantages of feature selection, namely interpretability and portability.

We performed the first gene-gene interaction screen by combinatorial RNAi in human cells [14, 44].

We demonstrated the power of genetically engineered cell lines and high-content phenotyping for discovering drug-gene interactions (Figure 2 [3]).
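The two distance measures compared in Figure 2 can be sketched as follows. The sketch assumes that each compound has a numeric phenotype profile and a structural fingerprint given as a set of feature identifiers. The helper names and toy data are hypothetical, and the published analysis [3] may differ in detail, for example in how profiles are normalised.

```python
import math


def correlation_distance(x, y):
    """1 - Pearson correlation between two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


def tanimoto_distance(fp_a, fp_b):
    """1 - |A intersect B| / |A union B| for two fingerprints given as sets."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / len(union)


profile_1 = [1.0, 2.0, 3.0, 4.0]
profile_2 = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated with profile_1
d_phenotype = correlation_distance(profile_1, profile_2)
d_structure = tanimoto_distance({1, 2, 3}, {2, 3, 4})
print(d_phenotype, d_structure)
```

Comparing the two matrices, as in Figure 2, shows where compounds with similar phenotypic profiles are structurally unrelated.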

We invented a new method for deducing directionality in gene-gene interaction data (Figure 3). The inferred directed arrows can often be related to a temporal, logical or causal hierarchy of the targeted gene products [2]. The method is applicable to multivariate phenotypes, and in particular to features from high-content screening. Besides gene-gene interactions, it will also be applicable to gene-drug or drug-drug interactions. We are currently pushing forward this line of work from laboratory cell lines to large cohorts of primary cancer cells, in an exciting collaboration with haematologists at the National Centre for Tumour Diseases (Figure 4).
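A minimal sketch of the geometric idea, following the description in the figure legend: the expected double-knockdown phenotype of non-interacting genes is the sum of the single-gene effects, the interaction is the difference between the measured and the expected phenotype, and a direction is suggested when the interaction vector is nearly parallel or anti-parallel to one of the single-gene effects. The cosine threshold, the output labels and the toy numbers are assumptions made here for illustration; the statistical model in [2] is more elaborate.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def call_direction(effect_a, effect_b, double_kd, min_abs_cos=0.9):
    """Classify a gene pair from phenotype vectors relative to the negative control.

    effect_a, effect_b: non-zero single-knockdown phenotypes; double_kd:
    measured double-knockdown phenotype. min_abs_cos is an arbitrary
    illustrative threshold, not a value taken from [2].
    """
    expected = [a + b for a, b in zip(effect_a, effect_b)]  # non-interacting model
    pi = [d - e for d, e in zip(double_kd, expected)]       # genetic interaction
    if all(abs(x) < 1e-12 for x in pi):
        return "no interaction"
    cos_a, cos_b = cosine(pi, effect_a), cosine(pi, effect_b)
    name, cos_best = max((("A", cos_a), ("B", cos_b)), key=lambda t: abs(t[1]))
    if abs(cos_best) < min_abs_cos:
        return "interaction without clear direction"
    kind = "aggravating" if cos_best > 0 else "alleviating"
    return f"{kind}, interaction collinear with the effect of gene {name}"


# Toy example with n = 2 features: the double knockdown looks like gene B alone,
# so the interaction cancels the effect of gene A.
result = call_direction(effect_a=[1.0, 0.0], effect_b=[0.0, 1.0], double_kd=[0.0, 1.0])
print(result)
```

Working with the cosine rather than with raw differences makes the call independent of the overall phenotype strength, which matters when features are measured on different scales.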

B.6 Reproducible research

We have established a system of supplementary information that we use for all our major papers. It allows readers to fully reproduce the reported results from raw data to all figures, tables and numbers. We provide this information as packages for the free, open-source R system, most of them hosted on Bioconductor (footnote 12). The packages contain the raw data files and custom-written procedures, including standard R-style documentation in manual pages, and literate programming documents. The latter are documents authored with the knitr system that mix computer code and human-readable narrative and are executable by anyone. In this way, readers can not only reproduce what we did, but also check the effect of variations of our analysis choices on the results. Moreover, they may take our methods and adapt them to their own data.

Besides the direct utility of this information, our aim is also to demonstrate across a range of journals and communities that it is possible to move beyond supplementary information in static PDF files to support a paper. Examples include:

Footnote 12: https://bioconductor.org

Topic | Journal | Package/URL
Single cell transcriptome analysis in the early mouse embryo | Nature Cell Biology [40] | Hiiragi2013
Live-cell microscopy study of cell migration in the fish embryo | Nature [48] | DonaPLLP2013
First comprehensive RNA interactome | Cell [66] | Website
Map of genetic interactions in human cancer cells with RNAi and multiparametric phenotyping | Nature Methods [14] | HD2013SGI
Large-scale directional genetic interaction map in fly | eLife [2] | DmelSGI
Mapping of signalling networks through synthetic genetic interaction analysis by RNAi | Nature Methods [71] | RNAinteractMAPK
Chemical-genetic interaction map of small molecules using high-throughput imaging in cancer cells | Mol. Syst. Biol. [3] | PGPC
Single Cell RNA-Seq | Nature Immunology [4] | Single.mTEC.Transcriptomes
Protein turnover in embryos based on tandem fluorescent timer microscopy | Development [5] | TimerQuant
RNA-Seq analysis end-to-end workflow | F1000 Research [8] | Webpage
RNA-Seq analysis method | Genome Biology [10] | DESeq2, Webpage
Dynamical modelling of cell cycle phenotypes from genome-wide RNAi live-cell imaging | BMC Bioinformatics [16] | mitoODEdata
Drift and conservation of differential exon usage across tissues in primate species | PNAS [15] | PDF vignette
Differential exon usage from RNA-Seq method | Genome Research [18] | DEXSeq, pasilla
Furrow segmentation in live imaging of optogenetic experiment | Developmental Cell [24] | furrowSeg
Multiple testing methods paper | bioRχiv [68] | github


Figure 3: Data-based inference of directional epistatic genetic interactions. Multivariate phenotypes were computed from images of cells treated with combinatorial libraries of single and double RNAi knockdowns [2]. Each phenotype was represented as an n-dimensional vector; the origin of the vector space was fixed such that the null vector is the negative control. For visualisation, here n = 2: cell number and area of nuclei. In [2], we used n = 21. It turns out that in many cases the double knockdown phenotype vector of two genes A and B is approximately collinear with that of one of the two genes, but is either increased or decreased. These four scenarios are depicted schematically on the left. The middle and right panels show data for two exemplary genes, sti and Cdc23, for four replicate experiments. The data are best fit by model B→A, indicating that loss of function of Cdc23 reverts the phenotype of sti. Biologically, this is explained by the fact that the cytokinesis regulator sti acts chronologically after the APC/C member Cdc23 in mitosis. In Fig. 5 of the paper [2] we showed how to derive a dense network of such directional epistatic interactions for mitosis-relevant genes. Note: the images shown here represent only a small zoom-in view of the images analysed. Source: Fischer et al. eLife 2015;4:e05464. DOI: 10.7554/eLife.05464

Legend reproduced with the figure from Fischer et al. [2] (Figure 4 in that article): Deriving directional genetic interactions. (A) Multiparametric phenotypes are extracted for single and double knockdowns. Genetic interaction scores are computed for each double knockdown experiment. The schematic plots in the third column show the model for identifying directional genetic interactions between gene A and gene B using two exemplary phenotypes. The single knockdown phenotypes of genes A and B and the measured double knockdown phenotypes (AB) are depicted as arrows. The expected double knockdown phenotype for non-interacting (NI) genes, which is the sum of the single gene effects, is depicted by the symbol NI. Black arrows depict the genetic interaction π. The first row shows the case where genes A and B are not interacting. Below, four types of interaction between the genes A and B are shown: gene A is alleviating to gene B, gene A is aggravating to gene B; and in reverse, B alleviates or aggravates gene A. Whenever the genetic interaction (black arrows) is parallel or anti-parallel to one of the single gene effects, a directional genetic interaction is called. (B–D) A directional interaction detected between Cdc23 and sti. (C) The two orange and two blue arrows show the phenotypes (nuclei area and cell number) of the two dsRNAs designed for sti and Cdc23. The grey dots show the expected double knockdown effect for the two genes. The black arrows, indicating the genetic interaction, are directed opposite to the phenotype of sti, indicating that functional [legend continues on the next page of the original article]

Collaborator | Institution | Topic, resulting publications and funding
Lars Steinmetz | EMBL | Transcriptomics, systems genetics [4, 15, 17, 21, 32, 45, 49, 51, 58, 69, 72, 74, 90, 96, 97, 99, 103, 122, 124]
Michael Boutros | DKFZ | Gene-gene and gene-drug interactions, high-content phenotyping [2, 3, 14, 44, 71, 78, 79, 83, 84, 93, 123]
Martin Morgan | RPCI (Buffalo, USA) | Bioconductor: software for genome-scale data analysis [1, 53, 102]. Funding: BIGDATA, SOUND
Thorsten Zenz | NCT, DKFZ | Cancer pharmacogenomics [28, 31, 70]. Funding: SOUND, TRANSCAN GCH-CLL
Jan Korbel | EMBL | Cancer genomics [17, 80, 91]. Funding: BioTop, HD-HuB
Mikhail Savitski | EMBL | Thermal proteome profiling: statistical method development [9, 27]
Eileen Furlong | EMBL | 4C data analysis [7, 42, 118]
Jeroen Krijgsveld | EMBL | Mass spectrometry based quantitative proteomics [13, 26, 43, 59, 61, 64, 66]
Susan Holmes | Stanford | Statistical methods for high-throughput biology
Jan Ellenberg | EMBL | Systems microscopy [16, 86, 87]. Funding: Systems Microscopy
Darren Gilmour | EMBL | Quantitative modelling from live cell imaging of cell migration [48, 5]
Stefano de Renzis | EMBL | Optogenetic study of tissue morphogenesis [24]
Takashi Hiiragi | EMBL | Single cell transcriptomics [40]
Michael Knop | Heidelberg | Quantitative modelling of microscopy data for protein turnover [41, 67]
Andreas Trumpp | DKFZ | RNA-seq data analysis [13, 20, 43]
Matthias Hentze | EMBL | RNA interactome: statistical method development [25, 60, 61, 63, 66]. Funding: joint EIPOD
Alvis Brazma | EBI | Quantitative methods for RNA-seq; imaging bioinformatics [29, 46, 75, 95, 105, 106, 131]. Funding: Systems Microscopy
Gitte Neubauer, Gerard Drewes | Cellzome / GSK | Thermal proteome profiling, high-content phenotyping and multi-omics [9, 27]. Funding: GSK postdoc fellowship; joint EIPOD
Judith Zaugg | EMBL | eQTL analysis: statistical method development [30, 68]
Peer Bork | EMBL | Bioinformatics pipelines, statistical methods [76]. Funding: HD-HuB

Table 1: Overview of collaborations. Resulting publications and joint research grants (see also Section E) are stated where available.

C Future Plans

Biostatistics for the 21st Century

The ultimate goal of my research is the successful application of multi-omics and computational reasoning to personalised health and medicine. My distinctive mark will be the combination of statistical methods innovation and practical application to leading-edge experiments or studies. I will continue to search out collaborations with biotechnology developers and biomedical researchers. I also plan to invest in the immersion of physician-scientists into genomic big data analysis.

In terms of data types, the leading themes will be:
• New technologies in nucleotide sequencing, proteomics, imaging, real-time monitoring
• Pervasive longitudinal multi-omic data
• Single-cell resolution for ever more assays
• High-throughput genetics and precision oncology

In terms of methods:
• Data heterogeneity, data missing not at random and other biases
• Structured learning
• Translational statistics

C.1 New Technologies

I aim to create innovative computational algorithms to mine the big and complex data that arise as part of developing new biotechnologies and applying them to novel areas of biology. Successful examples include microarrays [112, 139, 154], tiling arrays [122, 124], collaborative statistical computing [1, 139], RNAi [14, 71, 123], RNA-seq [8, 10, 11, 18, 50, 82], 4C [7, 42], iCLIP [25], single-cell RNA-seq [4, 49], high-content phenotyping [2, 3, 54, 83, 84], iTRAQ [88] and thermal proteome profiling [9, 27]. Current foci are:
• Thermal proteome profiling and other applications of quantitative mass spectrometry
  – Data-driven biophysical modelling of melting curves
  – Rich multiparametric hierarchical models and (empirical) Bayes methods to make them identifiable from data
• Single cell sequencing
  – Dimension reduction, detection and quantitative modelling of underlying structures: trajectories, gradients, bifurcation points, Waddington landscapes
  – Integrating multiple layers of data (e. g., DNA, transposase-accessible chromatin, RNA)
• Imaging-based phenotyping of tumour models
• High-throughput genetics

I have always been keen to spot opportunities that might arise from early access to exciting new data types. Potential fields of future engagement are imaging (high-throughput super-resolution microscopy for spatially resolved single-cell 'omics), microfluidics, high-throughput synthetic biology (e. g., CRISPR) and third-generation sequencing.

C.1.1 Pervasive Longitudinal Multi-Omic Data

Humans are now the best-studied model organism. There are 7 billion individuals to be genotyped and phenotyped. There is a potential for extremely rich phenotypes, as the costs do not need to be borne by research budgets. We can use data from clinics, which among other things are large phenotyping centres funded by health systems (footnote 13). Moreover, wearable devices and the Internet of Things are emerging. They will provide rich data on life-styles and physiological parameters, including from healthy humans.

'Omic datasets of the past were from single time points, were picked together from ad hoc cohorts, had small sample sizes and used a single technology (e. g., microarrays). In contrast, datasets of the future will be pervasive (large cohorts, commoditized technologies), will be assayed at many time points during healthy life and disease, and will use multiple 'omic technologies to cover the range of relevant biology. Taken together, these developments will allow us to drive forward personalized medicine (the use of 'omics and systems biology in evidence-based medicine) and personalized health (managing healthy life using new technologies; cf. the conference I co-organised, Section F).

To help address the associated challenges, I have assembled the international research network SOUND (footnote 14). SOUND is funded by the European Commission within its Horizon 2020 Research and Innovation programme "Personalising Health and Care" and runs from 9/2015 to 8/2018. The partners comprise bioinformatician-statisticians and physician-scientists from leading institutions in personalized medicine, including NCT and EMBL Heidelberg, ETH and University Hospital Zurich, TU Munich, IDMEC Lisbon, BDD in The Hague and the Roswell Park Cancer Institute (USA). The objective of SOUND is to create the bioinformatic tools for statistically informed use of personal 'omic data in medicine, including cancers and rare metabolic diseases. Its partners have a strong track record in, and future commitment to, Bioconductor (see Section C.3.3). Bioconductor has been exceedingly successful in enabling researchers to analyse the 'omic datasets of the past, and the aim of SOUND is to help move Bioconductor forward so that physician-scientists and biological researchers can effectively mine the pervasive longitudinal multi-omic data of the future.

Footnote 13: In 2013, 17.1% of the GDP of the USA was spent on health care, compared to 2.8% for research and development (incl. all sectors, not only health). Source: The World Bank, http://wdi.worldbank.org/table/2.15 and http://wdi.worldbank.org/table/5.13

C.1.2 Single-Cell Resolution for Ever More Data Types

Many technologies were developed to work on bulk samples, i. e., on populations of millions of cells and billions of molecules. These numbers are coming down. In 2015, single-cell RNA sequencing for tens of thousands of cells (drop-seq) and the parallel sequencing of the same single cell’s RNA and DNA-methylation status were reported15. Other assays (e. g., transposase-accessible chromatin, ATAC-seq) are sure to follow. New developments in chemical biology, fluorescent probes and super-resolution microscopy are beginning to enable the spatial localization and quantification of specific RNA (and DNA) sequences at single molecule resolution. For the statistician, these data offer exciting opportunities:
• Error modeling: the technologies will have imperfect sensitivities and specificities. False positives and false negatives will not occur randomly, but often depend on biophysical biases (e. g., sequence, internal state, environment) that need to be discovered, quantitatively modelled and estimated.
• Signal processing: there is a need for designing clever codes (e. g., molecular barcodes) and to later deconvolute them, possibly in complex combinatorial ways and in the presence of error; see, e. g., the work by Xiaowei Zhuang’s lab on spatially resolved multiplexed RNA profiling in single cells16.
• Beyond averages: we will get variances and indeed full distributions, which need to be accurately and robustly estimated, and compared with each other (e. g., between cells with and without a stimulus).
• Patterns: what is noise, what is systematic behaviour? Variations that cancel out on average may or may not be actively regulated and systematic within single cells, and may reveal important mechanisms. We addressed an instance of this question in [4]. Other examples are fluctuations in protein abundance that might be correlated by processes ensuring stoichiometry of operational units, or cellular localisation.

C.2 Application Areas: High-Throughput Genetics and Precision Oncology

This line of research is a continuation of our successful work on gene-gene and gene-drug interactions (Section B.5). I plan to conduct it primarily through two strong, cross-fertilizing collaborations, one with a technology and cell line model focus and one with a translational and clinical focus.

14 http://www.sound-biomed.eu
15 Angermueller et al. (2016) http://dx.doi.org/10.1038/nmeth.3728
16 Chen et al. (2015) http://dx.doi.org/10.1126/science.aaa6090

[Figure 4 image not reproduced. Two rows of ternary plots, with one column each for all patients, IGHV unmutated and IGHV mutated. Top row axes: PI3K, BTK (ibrutinib), SYK. Bottom row axes: MEK, BTK (ibrutinib), MTOR. Axis ticks run from 20 to 100.]

Figure 4: Pharmacogenomics of drug sensitivity. The position of each point in the ternary plots shows the relative response of a patient-derived primary chronic lymphocytic leukaemia (CLL) sample to each of three drugs (ibrutinib, everolimus, selumetinib) that specifically target three different signalling kinases (BTK, MEK, MTOR). The circle size represents the average response of the sample to all three drugs. The plot highlights pathway-specific dependency distributions. While the majority of CLL with unmutated IGHV locus (left panel) depend about equally strongly on BTK and MEK activity, the distribution in CLL with mutated IGHV locus (right panel) is more dispersed and shows a subgroup that responds to MTOR inhibition and less to the other inhibitors.
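As a rough illustration of how a ternary layout of this kind can be computed, the following Python sketch maps three drug responses to a position inside an equilateral triangle and uses their mean as the circle size. The function name `ternary_point`, the corner assignment and the normalisation by the sum of the three responses are assumptions made for this sketch; the figure itself may have been produced differently.

```python
import math


def ternary_point(a, b, c):
    """Map three non-negative responses to (x, y, size).

    Corners of the unit-side triangle (an arbitrary choice for this sketch):
    A at (0, 0), B at (1, 0), C at (0.5, sqrt(3)/2).
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("at least one response must be positive")
    # Relative responses: these sum to 1 and determine the position.
    fb, fc = b / total, c / total
    x = fb + 0.5 * fc
    y = (math.sqrt(3) / 2) * fc
    # Average response over the three drugs determines the circle size.
    size = total / 3
    return x, y, size


# A sample that responds equally to all three drugs sits at the centroid.
print(ternary_point(1.0, 1.0, 1.0))
```

A sample that responds to only one drug lands on the corresponding corner, so clusters near a corner or an edge indicate dependence on one or two pathways.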

C.3 Fundamental Problems in Statistics

C.3.1 Data Heterogeneity, Data Missing Not at Random, and Biased Sampling

The data heterogeneity challenge in multi-omics derives from the fact that different types of features are interrogated for different ‘omic layers. DNA-related data are reported in chromosomal coordinate systems. The central dogma links these to RNA- and protein-related data, but the mapping can become arbitrarily complicated due to splicing, paralogy, and post-transcriptional and post-translational modifications. Moreover, these processes may themselves be affected by the treatment of interest or differ between individuals. For metabolites and drugs, the link to the other coordinate systems is even less well defined. Furthermore, even though in the simplest case all levels of multi-omic data are measured simultaneously on the exact same samples, in practice they may be taken at different body sites or at different times. Altogether, this means that while ’old’ omic data can be conveniently modelled by a 2D matrix (features × samples), multi-omic data are more complex than adding a third dimension to the matrix: the mappings between features and samples at different levels are fiddly, dynamic and uncertain. We work on concepts, algorithms and software to address such challenges.

Sampling is at the basis of much of statistics: voter polls are made not by asking everyone who will vote, but from a sufficiently large and representative sample. Similarly, in RNA-seq or ChIP-seq we do not sequence every DNA molecule that is theoretically available. Complications start when the sampling is biased. If the bias is precisely known, one can try to adjust for it. But in most cases, detecting, modelling and quantifying the important biases is part of the analyst’s task. Furthermore, she can feed such observations back ’upstream’ to improve technologies and experimental designs.

A related problem is data missing not at random. For instance, in single-cell RNA-seq, some genes may go undetected and unreported due to low abundance, but the probability of such drop-out events may depend on biochemical and biophysical factors in complex ways. All of these challenges require deep engagement with the data, good mechanistic understanding of the data-generating biology and technologies, but also of the downstream inferential expectations and, not least, mastery of statistical tools including visualisation and regression modelling.

C.3.2 The Importance of Structure

If we want to estimate any kind of statistical or biophysical model in the high-dimensional setting, we need to impose additional structure onto the data. For the past twenty years, sparsity has been a popular and powerful structural assumption. The lasso is a popular incarnation of this, but more abstractly, the whole multiple testing field has made the same assumption: “only a few genes are truly differentially expressed”. Imposing such structural assumptions makes intractable problems tractable and provides interpretable statistical results. Nevertheless, blindly using a sparsity assumption can lead us astray, especially in heterogeneous settings. We need to apply our accumulated biological knowledge to infer structural patterns. For instance, known signalling or metabolic pathways can impose natural structures on genetic or metabolomic datasets. We will continue to develop regularization strategies based on prior biological knowledge. I am particularly excited about developing methods that can learn or update structural assumptions in a data-driven way (our recent work [68] is a step in this direction). With the plethora of datasets available, we can use these in an Empirical Bayes way. Such an approach would enable iterative rounds of algorithm and model improvement, and data mining for new discovery.
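To make the sparsity assumption concrete, consider the special case of a design matrix X with orthonormal columns. There, the lasso estimate that minimises (1/2)‖y − Xβ‖² + λ‖β‖₁ is obtained by soft-thresholding each ordinary least-squares coefficient. The Python sketch below shows this textbook special case only; the function names and numbers are made up for illustration and do not correspond to any method described here.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Lasso solution for an orthonormal design: soft-threshold each OLS coefficient."""
    return [soft_threshold(z, lam) for z in ols_coefs]


# Hypothetical least-squares effect estimates for five features.
ols = [3.0, -0.4, 0.1, -2.5, 0.6]
print(lasso_orthonormal(ols, lam=1.0))  # → [2.0, 0.0, 0.0, -1.5, 0.0]
```

Small coefficients are set exactly to zero while large ones are shrunk, which is how the penalty encodes the belief that only a few features matter. A structured prior would replace the single λ by feature-specific penalties.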

C.3.3 Translational Statistics: Bioconductor

This is one of the most difficult open problems in statistical research: how to rapidly produce robust software that solves a burning scientific question and share it with biomedical scientists. This question has been driving my research since I started in bioinformatics over 15 years ago, and my approach is embedded in the international Bioconductor collaboration [1]. Much of the infrastructure of the Bioconductor project (archive, build system, website) is managed by Martin Morgan at the Roswell Park Cancer Institute in Buffalo, NY. Its scientific content, however, is driven by groups in multiple locations. I have a track record in algorithm development and flagship biological applications and I aim to maintain this role.
• I will maintain the software engineering work in my group. Our aim is to increase the usability of scientific software in terms of documentation, performance, robustness and interoperability.
• I will continue to organise interdisciplinary training courses, such as the Brixen and EMBO courses (see Section F).
• I will continue to organise the European Bioconductor Developer Workshops and help with similar events in other parts of the world.

Scientific computing evolves rapidly. Although my work is strongly associated with R, this is no dogma. Our challenge will be to provide effective software platforms for the medium-term future, while safeguarding the investments that have been made (e.g. into R, CRAN and Bioconductor). Notably, over the last few years R has turned from an academic curiosity into a commercial-grade infrastructure17. This is excellent news for bioinformatics since the field will benefit from enormous commercial investments that would be unimaginable with research funding. Nevertheless, I will also keenly monitor developments on other fronts, such as Julia and JavaScript18.

Particular fields of focus of future work will be data wrangling, cloud computing and visualisation.
Data wrangling is the process of converting data from one (raw) form into another form that downstream tools for analysis and integration can consume. It often takes the majority of the time of an applied analysis project19. There has recently been remarkable progress in this area, epitomized by the Hadleyverse20. Our particular challenge will be to merge the useful concept of tidy data21 with concepts that have made Bioconductor successful, including self-contained and self-documenting data sets, encapsulation, abstraction and provision of sufficient metadata.
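As a small illustration of the tidy data idea, the sketch below reshapes a features × samples matrix into one record per (feature, sample) observation and attaches sample metadata to each record. It is written in plain Python for brevity; the function name `to_tidy`, the gene and sample identifiers and the metadata fields are hypothetical and not taken from any Bioconductor package.

```python
def to_tidy(matrix, feature_ids, sample_ids, sample_metadata):
    """Convert a features x samples matrix into a list of tidy records.

    Each record holds one observation: feature, sample, value,
    plus whatever metadata is known for that sample.
    """
    records = []
    for i, feature in enumerate(feature_ids):
        for j, sample in enumerate(sample_ids):
            record = {"feature": feature, "sample": sample, "value": matrix[i][j]}
            record.update(sample_metadata.get(sample, {}))
            records.append(record)
    return records


# Hypothetical count matrix: 2 genes (rows) x 3 samples (columns).
counts = [[10, 0, 7],
          [3, 5, 1]]
genes = ["geneA", "geneB"]
samples = ["s1", "s2", "s3"]
metadata = {"s1": {"condition": "control"},
            "s2": {"condition": "treated"},
            "s3": {"condition": "treated"}}

for row in to_tidy(counts, genes, samples, metadata):
    print(row)
```

The long format is convenient for generic plotting and modelling tools, whereas the matrix with separate, linked metadata keeps the data set self-contained. Reconciling the two representations is the challenge described above.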

17 As evident e.g. from the formation of the R consortium, the acquisition of Revolution by Microsoft, the professional refinement of R by RStudio, or the fact that leading high-tech companies including Facebook, Google and SAP hire R programmers.
18 There is a friendly relationship between R and JavaScript as both derive from LISP / Scheme.
19 http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
20 http://www.r-bloggers.com/welcome-to-the-hadleyverse
21 Tidy Data. Hadley Wickham, Journal of Statistical Software 59:10 (2014)

Cloudification of resources is a general trend in the computing world that offers cost savings and increased efficiency. Naturally, it is also affecting bioinformatics. I see our role here not as inventing, but as leading the field by showing how to adapt and specialise generic solutions from the software industry. A recent example is our provision of Docker containers for an RNA-seq workflow22.

Scientific visualisation has so far been remarkably conservative, presumably due to the overall conservativeness of the scientific publication process, which is still centred around “papers” (equivalently: self-contained, printable PDF files). Nevertheless, future generations may learn to make better use of interactive, computer-aided visualisations and modern web technologies, and I plan to leverage such new developments from the wider computing world for scientific data visualisation and exploration.

22 https://hub.docker.com/r/vladkim/rnaseq

D Publications 2012-16

P indicates equal contributions, B co-corresponding authorships. See also http://www.huber.embl.de/publications. Bibliometry is available, e. g., from Google Scholar.

Corresponding author papers 2012–16

[1] Orchestrating high-throughput genomic analysis with Bioconductor. Wolfgang HuberB, Vincent J. Carey, Robert Gentleman, Simon Anders, Marc Carlson, Benilton S. Carvalho, Hector Corrada Bravo, Sean Davis, Laurent Gatto, Thomas Girke, Raphael Gottardo, Florian Hahne, Kasper D. Hansen, Rafael A. Irizarry, Michael Lawrence, Michael I. Love, James MacDonald, Valerie Obenchain, Andrzej K. Oleś, Hervé Pagès, Alejandro Reyes, Paul Shannon, Gordon K. Smyth, Dan Tenenbaum, Levi Waldron, and Martin Morgan. Nature Methods, 12:115–121, 2015. pdf, url (35 citations23).

[2] A map of directional genetic interactions in a metazoan cell. Bernd FischerP, Thomas SandmannP, Thomas HornP, Maximilian BillmannP, Varun Chaudhary, Wolfgang HuberB, and Michael BoutrosB. eLife, 4, 2015. pdf, url.

[3] A chemical-genetic interaction map of small molecules using high-throughput imaging in cancer cells. Marco BreinigP, Felix A. KleinP, Wolfgang HuberB, and Michael BoutrosB. Molecular Systems Biology, 11(12), 2015. pdf, url.

[4] Single-cell transcriptome analysis reveals coordinated ectopic gene-expression patterns in medullary thymic epithelial cells. Philip BrenneckeP, Alejandro ReyesP, Sheena PintoP, Kristin RattayP, Michelle Nguyen, Rita Küchler, Wolfgang HuberB, Bruno KyewskiB, and Lars M. SteinmetzB. Nature Immunology, 16:933–941, 2015. pdf, url.

[5] TimerQuant: A modelling approach to tandem fluorescent timer design and data interpretation for measuring protein turnover in embryos. Joseph D. Barry, Erika Donà, Darren Gilmour, and Wolfgang Huber. Development, 143(1):174–179, 2016. pdf, url.

[6] SomaticSignatures: inferring mutational signatures from single-nucleotide variants. Julian S. Gehring, Bernd Fischer, Michael Lawrence, and Wolfgang Huber. Bioinformatics, 31(22):3673–3675, 2015. pdf, url.

[7] FourCSeq: Analysis of 4C sequencing data. Felix A. Klein, Tibor Pakozdi, Simon Anders, Yad Ghavi-Helm, Eileen E. M. Furlong, and Wolfgang Huber. Bioinformatics, 31(19):3085–3091, 2015. pdf, url.

[8] RNA-Seq workflow: gene-level exploratory analysis and differential expression. Michael I. Love, Simon Anders, Vladislav Kim, and Wolfgang Huber. F1000Research, 4(1070), 2015. pdf, url.

[9] Thermal proteome profiling for unbiased identification of direct and indirect drug targets using multiplexed quantitative mass spectrometry. Holger FrankenP, Toby MathiesonP, Dorothee ChildsP, Gavain M.A. SweetmanP, Thilo Werner, Ina Tögel, Carola Doce, Stephan Gade, Marcus Bantscheff, Gerard Drewes, Friedrich B.M. ReinhardB, Wolfgang HuberB, and Mikhail M. SavitskiB. Nature Protocols, 10(10):1567–1593, 2015. pdf, url.

[10] Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Michael I. Love, Wolfgang Huber, and Simon Anders. Genome Biology, 15(12):550, 2014. pdf, url (112 citations).

23 Source: ISI Web of Science.

[11] HTSeq – a Python framework to work with high-throughput sequencing data. Simon Anders, Paul Theodor Pyl, and Wolfgang Huber. Bioinformatics, 31(2):166–169, 2015. pdf, url (315 citations).

[12] h5vc: scalable nucleotide tallies with HDF5. Paul Theodor Pyl, Julian Gehring, Bernd Fischer, and Wolfgang Huber. Bioinformatics, 30(10):1464–1466, 2014. pdf, url.

[13] Transcriptome-wide profiling and posttranscriptional analysis of hematopoietic stem/progenitor cell differentiation toward myeloid commitment. Daniel KlimmeckP, Nina Cabezas-WallscheidP, Alejandro ReyesP, Lisa von Paleske, Simon Renders, Jenny Hansson, Jeroen Krijgsveld, Wolfgang HuberB, and Andreas TrumppB. Stem Cell Reports, 3(5):858–875, 2014. pdf, url.

[14] Mapping genetic interactions in human cancer cells with RNAi and multiparametric phenotyping. Christina LauferP, Bernd FischerP, Maximilian Billmann, Wolfgang HuberB, and Michael BoutrosB. Nature Methods, 10:427–431, 2013. pdf, url (37 citations).

[15] Drift and conservation of differential exon usage across tissues in primate species. Alejandro ReyesP, Simon AndersP, Robert J. Weatheritt, Toby J. Gibson, Lars M. Steinmetz, and Wolfgang Huber. Proc. Natl. Acad. Sci. U.S.A., 110(38):15377–15382, 2013. pdf, url (11 citations).

[16] Dynamical modelling of phenotypes in a genome-wide RNAi live-cell imaging assay. Gregoire Pau, Thomas Walter, Beate Neumann, Jean-Karim Hériché, Jan Ellenberg, and Wolfgang Huber. BMC Bioinformatics, 14(1):308, 2013. pdf, url.

[17] The Genomic and Transcriptomic Landscape of a HeLa Cell Line. Jonathan LandryP, Paul Theodor PylP, Tobias Rausch, Thomas Zichner, Manu M. Tekkedil, Adrian M. Stütz, Anna Jauch, Raeka S. Aiyar, Gregoire Pau, Nicolas Delhomme, Julien Gagneur, Jan O. Korbel, Wolfgang HuberB, and Lars M. SteinmetzB. G3 (Bethesda), 3(8), 2013. pdf, url (85 citations).

[18] Detecting differential usage of exons from RNA-Seq data. Simon AndersP, Alejandro ReyesP, and Wolfgang Huber. Genome Research, 22:2008–2017, 2012. pdf, url (170 citations).

Collaborative papers 2012–16

[19] A genetic interaction map of cell cycle regulators. Maximilian BillmannP, Thomas HornP, Bernd Fischer, Thomas Sandmann, Wolfgang Huber, and Michael Boutros. Molecular Biology of the Cell, 2016. pdf, url.

[20] Myc depletion induces a pluripotent dormant state mimicking diapause. Roberta Scognamiglio, Nina Cabezas-Wallscheid, Marc Christian Thier, Sandro Altamura, Alejandro Reyes, Áine M. Prendergast, Daniel Baumgärtner, Larissa S. Carnevalli, Ann Atzberger, Simon Haas, Lisa von Paleske, Thorsten Boroviak, Philipp Wörsdörfer, Marieke A.G. Essers, Ulrich Kloz, Robert N. Eisenman, Frank Edenhofer, Paul Bertone, Wolfgang Huber, Franciscus van der Hoeven, Austin Smith, and Andreas Trumpp. Cell, 164(4):668–680, 2016. pdf, url.

[21] Landscape and dynamics of transcription initiation in the malaria parasite Plasmodium falciparum. Sophie H. Adjalley, Christophe D. Chabbert, Bernd Klaus, Vicent Pelechano, and Lars M. Steinmetz. Cell Reports, 14(10):2463–2475, 2016. pdf, url.

[22] Nuclear architecture organized by Rif1 underpins the replication-timing program. Rossana Foti, Stefano Gnan, Daniela Cornacchia, Vishnu Dileep, Aydan Bulut-Karslioglu, Sarah Diehl, Andreas Buness, Felix A. Klein, Wolfgang Huber, Ewan Johnstone, Remco Loos, Paul Bertone, David M. Gilbert, Thomas Manke, Thomas Jenuwein, and Sara C.B. Buonomo. Molecular Cell, 61(2):260–273, 2016. pdf, url.

[23] CYP3A5 mediates basal and acquired therapy resistance in different subtypes of pancreatic ductal adenocarcinoma. Elisa M Noll, Christian Eisen, Albrecht Stenzinger, Elisa Espinet, Alexander Muckenhuber, Corinna Klein, Vanessa Vogel, Bernd Klaus, Wiebke Nadler, Christoph Rösli, Christian Lutz, Michael Kulke, Jan Engelhardt, Franziska M Zickgraf, Octavio Espinosa, Matthias Schlesner, Xiaoqi Jiang, Annette Kopp-Schneider, Peter Neuhaus, Marcus Bahra, Bruno V Sinn, Roland Eils, Nathalia A Giese, Thilo Hackert, Oliver Strobel, Jens Werner, Markus W Büchler, Wilko Weichert, Andreas Trumpp, and Martin R Sprick. Nature Medicine, 22:278–287, 2016. pdf, url.

[24] An optogenetic method to modulate cell contractility during tissue morphogenesis. Giorgia Guglielmi, Joseph D. Barry, Wolfgang Huber, and Stefano De Renzis. Developmental Cell, 35(5):646–660, 2015. pdf, url.

[25] Improved binding site assignment by high-resolution mapping of RNA-protein interactions using iCLIP. Christian Hauer, Tomaz Curk, Simon Anders, Thomas Schwarzl, Anne-Marie Alleaume, Jana Sieber, Ina Hollerer, Madhuri Bhuvanagiri, Wolfgang Huber, Matthias W. Hentze, and Andreas E. Kulozik. Nature Communications, 6(7921), 2015. pdf, url.

[26] The RNA-binding proteomes from yeast to man harbour conserved enigmRBPs. Benedikt M. Beckmann, Rastislav Horos, Bernd Fischer, Alfredo Castello, Katrin Eichelbaum, Anne-Marie Alleaume, Thomas Schwarzl, Tomaz Curk, Sophia Foehr, Wolfgang Huber, Jeroen Krijgsveld, and Matthias W. Hentze. Nature Communications, 6(10127), 2015. pdf, url.

[27] Thermal proteome profiling monitors ligand interactions with cellular membrane proteins. Friedrich B.M. Reinhard, Dirk Eberhard, Thilo Werner, Holger Franken, Dorothee Childs, Carola Doce, Maria Fälth Savitski, Wolfgang Huber, Marcus Bantscheff, Mikhail M. Savitski, and Gerard Drewes. Nature Methods, 2015. pdf, url.

[28] Mutational landscape and complexity in CLL. Thorsten Zenz and Wolfgang Huber. Blood, 126(18):2078–2079, 2015. pdf, url.

[29] Expression atlas update: an integrated database of gene and protein expression in humans, animals and plants. Robert Petryszak, Maria Keays, Y. Amy Tang, Nuno A. Fonseca, Elisabet Barrera, Tony Burdett, Anja Füllgrabe, Alfonso Muñoz-Pomer Fuentes, Simon Jupp, Satu Koskinen, Oliver Mannion, Laura Huerta, Karine Megy, Catherine Snow, Eleanor Williams, Mitra Barzine, Emma Hastings, Hendrik Weisser, James Wright, Pankaj Jaiswal, Wolfgang Huber, Jyoti Choudhary, Helen E. Parkinson, and Alvis Brazma. Nucleic Acids Research, 44(1):D746–D752, 2016. pdf, url.

[30] Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Fabian Grubert, Judith B. Zaugg, Maya Kasowski, Oana Ursu, Damek V. Spacek, Alicia R. Martin, Peyton Greenside, Rohith Srivas, Doug H. Phanstiel, Aleksandra Pekowska, Nastaran Heidari, Ghia Euskirchen, Wolfgang Huber, Jonathan K. Pritchard, Carlos D. Bustamante, Lars M. Steinmetz, Anshul Kundaje, and Michael Snyder. Cell, 162(5):1051–1065, 2015. pdf, url.

[31] Recurrent CDKN1B (p27) mutations in hairy cell leukemia. Sascha Dietrich, Jennifer Hüllein, Stanley Chun-Wei Lee, Barbara Hutter, David Gonzalez, Sandrine Jayne, Martin J. S. Dyer, Małgorzata Oleś, Monica Else, Xiyang Liu, Mikołaj Słabicki, Bian Wu, Xavier Troussard, Jan Dürig, Mindaugas Andrulis, Claire Dearden, Christof von Kalle, Martin Granzow, Anna Jauch, Stefan Fröhling, Wolfgang Huber, Manja Meggendorfer, Torsten Haferlach, Anthony D. Ho, Daniela Richter, Benedikt Brors, Hanno Glimm, Estella Matutes, Omar Abdel Wahab, and Thorsten Zenz. Blood, 126(8):1005–1008, 2015. pdf, url.

[32] Single-cell polyadenylation site mapping reveals 3’ isoform choice variability. Lars Velten, Simon Anders, Aleksandra Pekowska, Aino I Järvelin, Wolfgang Huber, Vicent Pelechano, and Lars M. Steinmetz. Molecular Systems Biology, 11(6), 2015. pdf, url.

[33] BRAF inhibitor therapy in HCL. Sascha Dietrich and Thorsten Zenz. Best Practice & Research Clinical Haematology, 28(4):246–252, 2015. url.

[34] A high-throughput ChIP-Seq for large-scale chromatin studies. Christophe D Chabbert, Sophie H Adjalley, Bernd Klaus, Emilie S Fritsch, Ishaan Gupta, Vicent Pelechano, and Lars M. Steinmetz. Molecular Systems Biology, 11(1), 2015. pdf, url.

[35] A novel inflammatory pathway mediating rapid hepcidin-independent hypoferremia. Claudia Guida, Sandro Altamura, Felix A. Klein, Bruno Galy, Michael Boutros, Artur J. Ulmer, Matthias W. Hentze, and Martina U. Muckenthaler. Blood, 125(14):2265–2275, 2015. pdf, url (13 citations).

[36] Fundamental physical cellular constraints drive self-organization of tissues. Daniel Sánchez-Gutiérrez, Melda Tozluoglu, Joseph D. Barry, Alberto Pascual, Yanlan Mao, and Luis M Escudero. The EMBO Journal, 35(1):77–88, 2015. pdf, url.

[37] An open data ecosystem for cell migration research. Paola Masuzzo, Lennart Martens, Christophe Ampe, Kurt I. Anderson, Joseph Barry, Olivier De Wever, Olivier Debeir, Christine Decaestecker, Helmut Dolznig, Peter Friedl, Cedric Gaggioli, Benjamin Geiger, Ilya G. Goldberg, Elias Horn, Rick Horwitz, Zvi Kam, Sylvia E. Le Dévédec, Danijela Matic Vignjevic, Josh Moore, Jean-Christophe Olivo-Marin, Erik Sahai, Susanna A. Sansone, Victoria Sanz-Moreno, Staffan Strömblad, Jason Swedlow, Johannes Textor, Marleen Van Troys, and Roman Zantl. Trends in Cell Biology, 25(2):55–58, 2015. pdf, url.

[38] Statistical relevance – relevant statistics, part I. Bernd Klaus. The EMBO Journal, 34(22):2727–2730, 2015. pdf, url.

[39] A discrete transition zone organizes the topological and regulatory autonomy of the adjacent Tfap2c and Bmp7 genes. Taro Tsujimura, Felix A. Klein, Katja Langenfeld, Juliane Glaser, Wolfgang Huber, and François Spitz. PLoS Genetics, 11(1):e1004897, 2015. pdf, url.

[40] Cell-to-cell expression variability followed by signal reinforcement progressively segregates early mouse lineages. Yusuke Ohnishi, Wolfgang Huber, Akiko Tsumura, Minjung Kang, Panagiotis Xenopoulos, Kazuki Kurimoto, Andrzej K. Oleś, Marcos J. Araúzo-Bravo, Mitinori Saitou, Anna-Katerina Hadjantonakis, and Takashi Hiiragi. Nature Cell Biology, 16(1):27–37, 2014. pdf, url (49 citations).

[41] Protein quality control at the inner nuclear membrane. Anton Khmelinskii, Marina Pantazopoulou, Bernd Fischer, Deike J. Omnus, Gaëlle Le Dez, Audrey Brossard, Alexander Gunnarsson, Joseph D. Barry, Matthias Meurer, Daniel Kirrmaier, Charles Boone, Wolfgang Huber, Gwenaël Rabut, Per O. Ljungdahl, and Michael Knop. Nature, 516(7531):410–413, 2014. pdf, url.

[42] Enhancer loops appear stable during development and are associated with paused polymerase. Yad Ghavi-Helm, Felix A. KleinP, Tibor PakozdiP, Lucia Ciglar, Daan Noordermeer, Wolfgang Huber, and Eileen E. M. Furlong. Nature, 512(7512):96–100, 2014. pdf, url (51 citations).

[43] Identification of regulatory networks in HSCs and their immediate progeny via integrated proteome, transcriptome, and DNA methylome analysis. Nina Cabezas-WallscheidP, Daniel KlimmeckP, Jenny HanssonP, Daniel B LipkaP, Alejandro ReyesP, Qi Wang, Dieter Weichenhan, Amelie Lier, Lisa von Paleske, Simon Renders, Peer Wünsche, Petra Zeisberger, David Brocks, Lei Gu, Carl Herrmann, Simon Haas, Marieke A G Essers, Benedikt Brors, Roland Eils, Wolfgang Huber, Michael D Milsom, Christoph Plass, Jeroen Krijgsveld, and Andreas Trumpp. Cell Stem Cell, 15(4):507–522, 2014. pdf, url (24 citations).

[44] Measuring genetic interactions in human cells by RNAi and imaging. Christina Laufer, Bernd Fischer, Wolfgang Huber, and Michael Boutros. Nature Protocols, 9(10):2341–2353, 2014. pdf, url.

[45] Alternative polyadenylation diversifies post-transcriptional regulation by selective RNA–protein interactions. Ishaan Gupta, Sandra Clauder-Münster, Bernd Klaus, Aino I Järvelin, Raeka S. Aiyar, Vladimir Benes, Stefan Wilkening, Wolfgang Huber, Vicent Pelechano, and Lars M. Steinmetz. Molecular Systems Biology, 10(2), 2014. pdf, url (12 citations).

[46] Expression Atlas update: a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments. Robert Petryszak, Tony Burdett, Benedetto Fiorelli, Nuno A. Fonseca, Mar Gonzalez-Porta, Emma Hastings, Wolfgang Huber, Simon Jupp, Maria Keays, Nataliya Kryvych, Julie McMurry, John C. Marioni, James Malone, Karine Megy, Gabriella Rustici, Amy Y. Tang, Jan Taubert, Eleanor Williams, Oliver Mannion, Helen E. Parkinson, and Alvis Brazma. Nucleic Acids Research, 42(1):D926–932, 2014. pdf, url (63 citations).

[47] A genome-wide map of mitochondrial DNA recombination in yeast. Emilie S. Fritsch, Christophe D. Chabbert, Bernd Klaus, and Lars M. Steinmetz. Genetics, 198(2):755–771, 2014. pdf, url.

[48] Directional tissue migration through a self-generated chemokine gradient. Erika Donà, Joseph D. Barry, Guillaume Valentin, Charlotte Quirin, Anton Khmelinskii, Andreas Kunze, Sevi Durdu, Lionel R. Newton, Ana Fernandez-Minan, Wolfgang Huber, Michael Knop, and Darren Gilmour. Nature, 503(7475):285–289, 2013. pdf, url (58 citations).

[49] Accounting for technical noise in single-cell RNA-seq experiments. Philip BrenneckeP, Simon AndersP, Jong Kyoung KimP, Aleksandra A. Kolodziejczyk, Xiuwei Zhang, Valentina Proserpio, Bianka Baying, Vladimir Benes, Sarah A. Teichmann, John C. Marioni, and Marcus G. Heisler. Nature Methods, 10(11):1093–1095, 2013. pdf, url (77 citations).

[50] Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Simon Anders, Davis J McCarthy, Yunshun Chen, Michal Okoniewski, Gordon K Smyth, Wolfgang Huber, and Mark D Robinson. Nature Protocols, 8(9):1765–1786, 2013. pdf, url (136 citations).

[51] An Evaluation of High-Throughput Approaches to QTL Mapping in Saccharomyces cerevisiae. Stefan Wilkening, Gen Lin, Emilie S. Fritsch, Manu M. Tekkedil, Simon Anders, Raquel Kuehn, Michelle Nguyen, Raeka S. Aiyar, Michael Proctor, Nikita A. Sakhanenko, David J. Galas, Julien Gagneur, Adam Deutschbauer, and Lars M. Steinmetz. Genetics, 196(3):853–865, 2014. pdf, url (11 citations).

[52] High-content siRNA screen reveals global ENaC regulators and potential cystic fibrosis therapy targets. Joana AlmaçaP, Diana FariaP, Marisa Sousa, Inna Uliyakina, Christian Conrad, Lalida Sirianant, Luka A. Clarke, José Paulo Martins, Miguel Santos, Jean-Karim Hériché, Wolfgang Huber, Rainer Schreiber, Rainer Pepperkok, Karl Kunzelmann, and Margarida D. Amaral. Cell, 154(6):1390–1400, 2013. pdf, url (14 citations).

[53] Software for computing and annotating genomic ranges. Michael Lawrence, Wolfgang Huber, Hervé Pagès, Patrick Aboyoun, Marc Carlson, Robert Gentleman, Martin T. Morgan, and Vincent J. Carey. PLoS Computational Biology, 9(8):e1003118, 2013. pdf, url (92 citations).

[54] CellH5: a format for data exchange in high-content screening. Christoph Sommer, Michael Held, Bernd Fischer, Wolfgang Huber, and Daniel W. Gerlich. Bioinformatics, 29:1580–1582, 2013. pdf, url.

[55] Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Danni Yu, Wolfgang Huber, and Olga Vitek. Bioinformatics, 29:1275–1282, 2013. pdf, url.

[56] Control of tissue morphology by Fasciclin III-mediated intercellular adhesion. Richard E. WellsP, Joseph D. BarryP, Simon Cuhlmann, Paul Evans, Wolfgang Huber, David Strutt, and Martin P. Zeidler. Development, 140:3858–3868, 2013. pdf, url.

[57] Direct competition between hnRNP C and U2AF65 protects the transcriptome from the exonization of Alu elements. Kathi Zarnack, Julian König, Mojca Tajnik, Inigo Martincorena, Sebastian Eustermann, Isabelle Stévant, Alejandro Reyes, Simon Anders, Nicholas M. Luscombe, and Jernej Ule. Cell, 152(3):453–466, 2013. pdf, url (69 citations).

[58] An efficient method for genome-wide polyadenylation site mapping and RNA quantification. Stefan Wilkening, Vicent Pelechano, Aino I. Järvelin, Manu M. Tekkedil, Simon Anders, Vladimir Benes, and Lars M. Steinmetz. Nucleic Acids Research, 41(5):e65, 2013. pdf, url (27 citations).

[59] Properties of isotope patterns and their utility for peptide identification in large-scale proteomic experiments. Satoshi Okawa, Bernd Fischer, and Jeroen Krijgsveld. Rapid Communications in Mass Spectrometry, 27(9):1067–1075, 2013. url.

[60] RNA-binding proteins in Mendelian disease. Alfredo Castello, Bernd Fischer, Matthias W Hentze, and Thomas Preiss. Trends in Genetics, 29:318–327, 2013. pdf, url (43 citations).

[61] System-wide identification of RNA-binding proteins by interactome capture. Alfredo Castello, Rastislav Horos, Claudia Strein, Bernd Fischer, Katrin Eichelbaum, Lars M. Steinmetz, Jeroen Krijgsveld, and Matthias W Hentze. Nature Protocols, 8(3):491–500, 2013. pdf, url (26 citations).

[62] Biggest challenges in bioinformatics. Jonathan C Fuller, Pierre Khoueiry, Holger Dinkel, Kristoffer Forslund, Alexandros Stamatakis, Joseph Barry, Aidan Budd, Theodoros G Soldatos, Katja Linssen, and Abdul Mateen Rajput. EMBO reports, 14(4):302–304, 2013. pdf, url.

[63] The RNA-binding protein repertoire of embryonic stem cells. S Chul Kwon, Hyerim Yi, Katrin Eichelbaum, Sophia Föhr, Bernd Fischer, Kwon Tae You, Alfredo Castello, Jeroen Krijgsveld, Matthias W Hentze, and V Narry Kim. Nature Structural and Molecular Biology, 2013. pdf, url (69 citations).

[64] Highly coordinated proteome dynamics during reprogramming of somatic cells to pluripotency. Jenny Hansson, Mahmoud Reza Rafiee, Sonja Reiland, Jose M. Polo, Julian Gehring, Satoshi Okawa, Wolfgang Huber, Konrad Hochedlinger, and Jeroen Krijgsveld. Cell Reports, 2(6):1579–1592, 2012. pdf, url (67 citations).

[65] A cross-platform toolkit for mass spectrometry and proteomics. Matthew C Chambers, Brendan Maclean, Robert Burke, Dario Amodei, Daniel L Ruderman, Steffen Neumann, Laurent Gatto, Bernd Fischer, Brian Pratt, Jarrett Egertson, Katherine Hoff, Darren Kessner, Natalie Tasman, Nicholas Shulman, Barbara Frewen, Tahmina A Baker, Mi-Youn Brusniak, Christopher Paulse, David Creasy, Lisa Flashner, Kian Kani, Chris Moulding, Sean L Seymour, Lydia M Nuwaysir, Brent Lefebvre, Frank Kuhlmann, Joe Roark, Paape Rainer, Suckau Detlev, Tina Hemenway, Andreas Huhmer, James Langridge, Brian Connolly, Trey Chadick, Krisztina Holly, Josh Eckels, Eric W Deutsch, Robert L Moritz, Jonathan E Katz, David B Agus, Michael MacCoss, David L Tabb, and Parag Mallick. Nature Biotechnology, 30(10):918–920, 2012. pdf, url (175 citations).

[66] Insights into RNA biology from an atlas of mammalian mRNA-binding proteins. Alfredo CastelloP, Bernd FischerP, Katrin Eichelbaum, Rastislav Horos, Benedikt M. Beckmann, Claudia Strein, Norman E. Davey, David T. Humphreys, Thomas Preiss, Lars M. Steinmetz, Jeroen Krijgsveld, and Matthias W. Hentze. Cell, 149:1393–1406, 2012. pdf, url (319 citations).

[67] Tandem fluorescent protein timers for in vivo analysis of protein dynamics. Anton Khmelinskii, Philipp J. Keller, Anna Bartosik, Matthias Meurer, Joseph D. Barry, Balca R. Mardin, Andreas Kaufmann, Susanne Trautmann, Malte Wachsmuth, Gislene Pereira, Wolfgang Huber, Elmar Schiebel, and Michael Knop. Nature Biotechnology, 30:708–714, 2012. pdf, url (49 citations).

Preprints

[68] Data-driven hypothesis weighting increases detection power in big data analytics. Nikolaos Ignatiadis, Bernd Klaus, Judith Zaugg, and Wolfgang Huber. bioRχiv, 2015. pdf, url.

[69] Neural lineage induction reveals multi-scale dynamics of 3D chromatin organization. Aleksandra Pekowska, Bernd Klaus, Felix Alexander Klein, Simon Anders, Małgorzata Oleś, Lars M. Steinmetz, Paul Bertone, and Wolfgang Huber. bioRχiv, 2014. pdf, url.

[70] Mutated SF3B1 is associated with transcript isoform changes of the genes UQCC and RPL31 both in CLLs and uveal melanomas. Alejandro Reyes, Carolin Blume, Vicent Pelechano, Petra Jakob, Lars M. Steinmetz, Thorsten Zenz, and Wolfgang Huber. bioRχiv, 2014. pdf, url.

E List of External Grants (since 2012)

Duration Name – Funding Body. Role. Topic

2015-18 (36 months) SOUND (Statistical Multi-Omics Understanding) – Collaborative research project, Horizon 2020 Research and Innovation programme Personalising Health and Care, European Commission. I am the coordinator and lead one scientific work-package. Topic: to create the bioinformatic tools for statistically informed use of personal genomic and other omic data in medicine.

2011-15 (60 months) Systems Microscopy – Network of Excellence, FP7-HEALTH-2010, European Commission. I led three RTD work packages and was part of the Executive Board. Topic: data-driven modelling of cell biological processes from live cell imaging data.

2012-15 (36 months) Radiant – Collaborative research project, FP7-HEALTH-2010, European Commission. I was scientific co-coordinator (with Magnus Rattray and Neil Lawrence) and led two RTD work packages. Topic: statistical methods for high-throughput sequencing technologies.

2013-15 (24 months) BIGDATA (Scalable Statistical Computing for Emerging Omics Data Streams) – US National Science Foundation (NSF) Mid-scale project: DA: ESCE: Collaborative Research. I was co-investigator. Topic: scaling statistical methods and Bioconductor software for large ‘omics data streams.

2015-17 (24 months) BioTop (Bioinformatic tool harmonization for personalized cancer care) – BMBF. I am a co-investigator, responsible for RNA-seq data types. Topic: standardising methods for analysing high-throughput sequencing data for translational cancer research.

2016-19 (36 months) TRANSCAN GCH-CLL (Translational research on human tumour heterogeneity to overcome recurrence and resistance to therapy) – ERA-NET on translational Cancer Research (TRANSCAN) project. I am a co-investigator, responsible for computational aspects. Topic: intra-tumour heterogeneity in chronic lymphocytic leukaemia.

2014-17 (36 months) GSK postdoc fellowship – Cellzome GmbH. Academic partner. Topic: 3-year postdoc project on developing computational and statistical methods for thermal proteome profiling.

2015-19 (60 months) HD-HuB (Heidelberg Centre for Human Bioinformatics) – BMBF. Co-investigator, contributing a work-package on R/Bioconductor based workflows.

F Curriculum Vitae

Wolfgang Huber European Molecular Biology Laboratory (EMBL) D 69117 Heidelberg ∗ 28.5.1968 in Bad Säckingen www.huber.embl.de nationality: [email protected]

Positions

EMBL Research group leader Dec 2011 - present Senior Scientist Heidelberg, Mar 2009 - present Genome Biology Unit (UK), Sep 2004 - Feb 2009 European Bioinformatics Institute (EBI)
DKFZ Postdoc cancer transcriptomics Heidelberg, Mar 2000 - Sep 2004
IBM Research Postdoc cheminformatics Almaden, San Jose (California) Jun 1998 - Dec 1999
University of Freiburg Research and teaching assistant, Faculty of Physics Oct 1994 - May 1998
Univ. Clinic Freiburg Research assistant, Department of Neurology Sep 1991 - Dec 1997

Education

1998 Dr. rer. nat. (Theoretical Physics), Univ. of Freiburg. Thesis: Dynamics of strongly driven open quantum systems
1994 Diplom (Physics), Univ. of Freiburg. Minor in Mathematics (Probability and Statistics)
1990/91 Non-graduating exchange student, Univ. of Edinburgh. Physics
1990 Vordiplom (Physics), Univ. of Freiburg. Minors in Mathematics and Chemistry

Academic Services – external

Journal reviewing: Bioinformatics, Biostatistics, Cell Reports, EMBO Reports, FEBS Letters, Genome Biology, G3 (Genes | Genomes | Genetics), Genome Research, Methods, Molecular Systems Biology, Nature, Nature Biotechnology, Nature Cell Biology, Nature Methods, Nucleic Acids Research, PLoS ONE, Science, Science Translational Medicine
Programme Committees: ECCB 2012, ISMB/ECCB 2013, ECCB 2014
Editorial board: Bioinformatics, Giga Science, F1000Prime

Grant review boards: HFSP Fellowships
Research proposal reviewing: Academy of Finland, ERC, French NCI (INCa), HRCMM, National Science Centre (Poland), Swiss National Science Foundation (SNF), Skolkovo Fund, Stichting Kinderen Kankervrij (Foundation Children Cancerfree), Wellcome Trust, Wiener Wissenschafts-, Forschungs- und Technologiefonds (WWTF), others
Boards:
Scientific Advisory Board (SAB) and Technical Advisory Board: Bioconductor Project (2003 - )
SAB: Sophia Genetics S.A. (CH) (2011 - 2015)
SAB: UMR3244 in Institut Curie (F) (2015 - )
SAB: Graduate School of Quantitative Biosciences Munich (2014 - )
SAB (Observer): Expression Atlas at EBI
Executive Board: Systems Microscopy EC FP7 Network of Excellence (2011 - 2015)
Consulting: Genentech (2010 - 2015), Evotec (2013 - 2014)

Academic Services – within EMBL

Annually since 2007: Coordinator of the ’Omics module of the EMBL International PhD Programme course
2012-2016: Thesis Advisory Committee: >40 students

Conference (co-)organisation

16 - 18 February 2012 Omics and Personalised Health, conference (140 participants). EMBL Heidelberg, with Lars Steinmetz, Lee Hood and Rudi Balling
7 - 8 June 2014 Annual meeting of the RADIANT consortium (37 participants). EMBL Heidelberg, with Magnus Rattray
12 - 13 January 2015 Bioconductor European Developer Conference (44 participants). EMBL Heidelberg, with Martin Morgan
31 May - 5 June 2015 Workshop on Statistical Learning of Biological Systems from Perturbations (55 participants). Centro Stefano Franscini, Ascona, CH, with Niko Beerenwinkel, Peter Bühlmann
16 - 19 November 2015 Stanford - EMBL conference: Omics and Personalised Health (150 participants). EMBL Heidelberg, with Lars Steinmetz, Judith Zaugg, Michael Snyder, Peer Bork, Jan Ellenberg
24 - 25 November 2015 C1omics - Single Cell ’Omics (57 participants). CR UK Manchester Institute, with Magnus Rattray, Crispin Miller
19 - 21 May 2016 Cancer Systems Genetics. DKFZ Heidelberg, with Claudia Scholl, Stefan Fröhling, Michael Boutros
6 - 8 June 2016 Perspectives in Translational Medicine, EMBL Partnership Conference. EMBL, with Plamena Markova, Andreas Kulozik, Luis Serrano, Kjetil Tasken, Matthias Wilmanns
4 September 2016 Clinical Bioinformatics as a Service, ECCB Workshop. The Hague, NL, with Niko Beerenwinkel, Daniel Stekhoven, Simon Tavaré

Teaching

2 - 6 July 2012; 23 - 28 June 2013; 22 - 27 June 2014; 14 - 19 June 2015; 10 - 15 July 2016: CSAMA Summer School: Statistics and Computing in Genome Data Science, Brixen, South Tyrol
17 - 22 October 2012: EMBO Practical Course: Analysis and informatics of transcriptomics data, Shenzhen, China
24 - 25 January 2013; 16 - 16 January 2015; 25 - 26 February 2016: EMBL Practical Course: Advanced R programming, EMBL Heidelberg
9 September 2012: ECCB Tutorial – Reads to Biological Patterns: End-to-End Differential Expression Analysis of RNA Sequencing Data Using Bioconductor
7 September 2014: ECCB Workshop – Analysis of Differential Isoform Usage by RNA-seq: Statistical Methodologies and Open Software
3 - 8 March 2013: EMBO Practical Course: High-throughput RNAi, EMBL/DKFZ Heidelberg
29 October - 3 November 2012; 20 - 24 October 2014; 19 - 23 October 2015; 5 - 9 September 2016: EMBO Practical Course: Analysis and informatics of transcriptomics data, EBI-EMBL, Hinxton, UK
15 - 20 October 2012; 20 - 26 October 2014; 17 - 22 October 2016: EMBO Practical Course: High-Throughput Microscopy for Systems Biology, EMBL Heidelberg

Above are the courses that I organised or co-organised, with level of responsibility ranked from top to bottom. I have taught at others, mentioned below.

Selected speaker invitations (2012-16 only)

29 February 2012, Munich, DE Genomatix GmbH, internal seminar

20 March 2012, Mainz, DE University, Institute of Molecular Biology, institute seminar

26 June 2012, Augsburg, DE University, Institute for Mathematics, institute seminar

28 June 2012, Würzburg, DE University, Institute for Medical Infection Genomics, RNA-seq Workshop

23 - 25 July 2012, Seattle, USA Bioconductor conference

31 August 2012, Cambridge, UK From Phenotypes to Pathways, conference

10 - 11 October 2012, Cambridge, UK Literature-Data Integration, workshop

12 - 13 October 2012, Potsdam, DE From genomes to networks - New developments in complex disease analysis, annual workshop of the Society for Gene-Diagnostics

6 - 7 December 2012, Dresden, DE Biotec Forum, conference

10 December 2012, Heidelberg, DE University, Heidelberger Kolloquium Medizinische Biometrie, Informatik und Epidemiologie

11 December 2012, Heidelberg, DE NGFN Annual Meeting, conference

13 - 14 December 2012, Zurich, CH Bioconductor Developer Conference

19 March 2013, Palo Alto, USA Stanford Genome Technology Centre, institute seminar

20 March 2013, South San Francisco, USA Genentech Inc., internal seminar

8 April 2013, Barcelona, ES Institute of Predictive and Personalized Medicine of Cancer (IMPPC), institute seminar

24 - 27 April 2013, Freiburg, DE Preclinical models of cancer: Towards enhanced clinical relevance and predictivity, conference

13 May 2013, Lisbon, PT University, Instituto de Medicina Molecular (IMM), institute seminar

19 - 24 May 2013, Dagstuhl, DE Dagstuhl Seminar 13212: Computational Methods Aiding Early-Stage Drug Design

17 - 19 July 2013, Seattle, USA Bioconductor Conference

23 July 2013, Berlin, DE ISMB Workshop: Professional Networks in Bioinformatics

11 - 16 August 2013, Banff, CA BIRS workshop: Statistical Data Integration Challenges in Computational Biology: Regulatory Networks and Personalized Medicine

8 - 11 September 2013, Bertinoro, I Computational Biology meeting: Computational Cancer Genomics, conference

23 September 2013, Tübingen, DE Summer School on Machine Learning for Personalized Medicine

25 September 2013, Stockholm, SE Karolinska Institutet, institute seminar

13 - 19 October 2013, Bedlewo, PL Autumn school on Computational Aspects of Gene Regulation

28 October - 3 November 2013, Recife, Brazil RNA-seq course at Brazilian Symposium on Bioinformatics

9 - 10 December 2013, Cambridge, UK Bioconductor Developer Conference

12 - 13 December 2013, Cambridge, UK Quantitative Methods in Gene Regulation, conference

9 - 10 January 2014, Paris, F Institut Curie, institute seminar

16 January 2014, Münster, DE Max-Planck-Institute for Molecular Biomedicine, institute seminar

12 May 2014, Heidelberg, DE Cellzome, internal seminar

2 June 2014, Stockholm, SE Systems Microscopy, conference

2 July 2014, Saarbrücken, DE Max-Planck-Institute for Informatics, institute seminar

20 October 2014, Munich, DE LMU, Gene Centre, institute seminar

29 - 31 October 2014, Stockholm, SE EMBO Workshop on a Systems-Level View of Cytoskeletal Function, conference

27 - 28 November 2014, Helsinki, FI Institute for Molecular Medicine of Finland (FIMM), institute seminar

29 - 30 January 2015, Zurich, CH RADIANT workshop

12 - 13 February 2015, Munich, DE Statistical Methods for Post Genomic Data, conference

17 - 19 February 2015, NYU Abu Dhabi, UAE Genomics and Systems Biology, conference

25 March 2015, Heidelberg, DE R User Meeting Rhein-Neckar, workshop

16 - 17 April 2015, Kloster Johannisberg, DE Cancer Genomics Meets Cancer Proteomics, workshop

9 June 2015, London, UK Big Data Analytics, conference

20 - 22 July 2015, Seattle, USA Bioconductor conference

31 July - 2 August 2015, Pozega, Croatia Summer School of Science

15 - 18 September 2015, Saas-Fee, CH CERN ROOT 20th anniversary workshop

26 - 27 October 2015, Arlington, USA NSF workshop on Mathematical Biology

7 - 8 December 2015, Cambridge, UK Bioconductor Developer Conference

13 January 2016, Heidelberg, DE University Hospital, Medical Clinic V, institute seminar

22 - 29 January 2016, Bellairs, Barbados Genetic Networks, workshop

15 February 2016, London, UK Imperial College BRC Genomics Seminar Series

23 February 2016, Basel, CH Novartis, internal seminar

8 March 2016, Palo Alto, USA Stanford University, Department of Statistics, institute seminar

9 March 2016, Claremont, USA Harvey Mudd College, Biology Colloquium

10 March 2016, Berkeley, USA UC Berkeley, Department of Statistics, Statistics & Genomics Seminar

16 March 2016, Santa Cruz, USA UC Santa Cruz, institute seminar

18 March 2016, Mountain View, USA 23andMe, internal seminar

23 March 2016, Palo Alto, USA Stanford Genome Technology Center, institute seminar

14 April 2016, Utrecht, NL Centre for Molecular Medicine, institute seminar

22 April 2016, Mainz, DE Institute for Medical Biometry, Epidemiology and Informatics (IMBEI), symposium

25 - 27 April 2016, Copenhagen, DK MedBioinformatics, conference

2 May 2016, Paris, F High Energy Physics Software Foundation, workshop

30 May - 3 June 2016, Paris, F École Analyse Génome Tumoral, summer school

Software

See also http://www.huber.embl.de/software

Primary author and maintainer
vsn: microarray normalisation [154]
cellHTS, cellHTS2: RNAi screen normalisation and quality control [123]
tilingArray: transcript discovery and mapping [124]
arrayQualityMetrics: interactive microarray data quality reports [105]

Initiation, co-authorship, supervision
DESeq, DESeq2: RNA-seq differential expression [10][82]
htseq: processing reads from high-throughput sequencing [11]
IHW: Independent hypothesis weighting [68]
lpsymphony: mixed integer-linear program solver
biomaRt: programmatic access to BioMarts [131]
EBImage: image processing in R [84]
DEXSeq: detecting differential usage of exons from RNA-seq data [18]
h5vc: scalable nucleotide tallies with HDF5 [12]
rhdf5: HDF5 interface to R
FourCSeq: analysis of 4C sequencing data [7]
SomaticSignatures: inferring mutational signatures from single-nucleotide variants [6]
BiocStyle: document formatting for executable documents

Publications from before 2012

[71] Mapping of signalling networks through synthetic genetic interaction analysis by RNAi. Thomas HornP, Thomas SandmannP, Bernd FischerP, Elin Axelsson, Wolfgang Huber, and Michael Boutros. Nature Methods, 8(4), 2011. pdf, url (74 citations).

[72] Antisense expression increases gene expression variability and locus interdependency. Zhenyu XuP, Wu WeiP, Julien GagneurP, Sandra Clauder-Münster, Miłosz Smolik, Wolfgang Huber, and Lars M. Steinmetz. Molecular Systems Biology, 7, 2011. pdf, url (65 citations).

[73] cAMP response element-binding protein is a primary hub of activity-driven neuronal gene expression. E. Benito, L. M. Valor, M. Jimenez-Minchan, W. Huber, and A. Barco. Journal of Neuroscience, 31:18237–18250, 2011. pdf, url (31 citations).

[74] Genome-wide survey of post-meiotic segregation during yeast recombination. Eugenio Mancera, Richard Bourgon, Wolfgang Huber, and Lars M. Steinmetz. Genome Biology, 12:R36, 2011. pdf, url (10 citations).

[75] Contributions of the EMERALD project to assessing and improving microarray data quality. Vidar Beisvåg, Audrey Kauffmann, James Malone, Carole Foy, Marc Salit, Heinz Schimmel, Erik Bongcam-Rudloff, Ulf Landegren, Helen Parkinson, Wolfgang Huber, Alvis Brazma, Arne K. Sandvik, and Martin Kuiper. BioTechniques, 50:27–31, 2011. pdf, url.

[76] Enterotypes of the human gut microbiome. Mani Arumugam, Jeroen Raes, E. Pelletier, D. Le Paslier, T. Yamada, D. R. Mende, G. R. Fernandes, J. Tap, T. Bruls, J. M. Batto, M. Bertalan, N. Borruel, F. Casellas, L. Fernandez, L. Gautier, T. Hansen, M. Hattori, T. Hayashi, M. Kleerebezem, K. Kurokawa, M. Leclerc, F. Levenez, C. Manichanh, H. B. Nielsen, T. Nielsen, N. Pons, J. Poulain, J. Qin, T. Sicheritz-Ponten, S. Tims, D. Torrents, E. Ugarte, E. G. Zoetendal, J. Wang, F. Guarner, O. Pedersen, W. M. de Vos, S. Brunak, J. Dore, J. Weissenbach, S. D. Ehrlich, Peer Bork, Metagenomics Consortium:, M. Antolin, F. Artiguenave, H. M. Blottiere, M. Almeida, C. Brechot, C. Cara, C. Chervaux, A. Cultrone, C. Delorme, G. Denariaz, R. Dervyn, K. U. Foerstner, C. Friss, M. van de Guchte, E. Guedon, F. Haimet, Wolfgang Huber, J. van Hylckama-Vlieg, A. Jamet, C. Juste, G. Kaci, J. Knol, O. Lakhdari, S. Layec, K. Le Roux, E. Maguin, A. Merieux, R. Melo Minardi, C. M’rini, J. Muller, R. Oozeer, J. Parkhill, P. Renault, M. Rescigno, N. Sanchez, S. Sunagawa, A. Torrejon, K. Turner, G. Vandemeulebrouck, E. Varela, Y. Winogradsky, and G. Zeller. Nature, 473:174–180, 2011. pdf, url (141 citations).

[77] Assessing Affymetrix GeneChip microarray quality. Matthew M. McCall, Peter N. Murakami, Margus Lukk, Wolfgang Huber, and Rafael A. Irizarry. BMC Bioinformatics, 12:137, 2011. pdf, url (23 citations).

[78] Polymorphisms in CTNNBL1 in relation to colorectal cancer with evolutionary implications. S. Huhn, D. Ingelfinger, J. L. Bermejo, M. Bevier, B. Pardini, A. Naccarati, V. Steinke, N. Rahner, E. Holinski-Feder, M. Morak, H. K. Schackert, H. Gorgens, C. P. Pox, T. Goecke, M. Kloor, M. Loeffler, R. Buttner, L. Vodickova, J. Novotny, K. Demir, C. M. Cruciat, R. Renneberg, W. Huber, C. Niehrs, M. Boutros, P. Propping, P. Vodieka, K. Hemminki, and A. Forsti. Int J Mol Epidemiol Genet, 2:36–50, 2011. pdf, url.

[79] Extracting quantitative genetic interaction phenotypes from matrix combinatorial RNAi. Elin Axelsson, Thomas Sandmann, Thomas Horn, Michael Boutros, Wolfgang Huber, and Bernd Fischer. BMC Bioinformatics, 12:342, 2011. pdf, url.

[80] Relating CNVs to transcriptome data at fine-resolution: assessment of the effect of variant size, type, and overlap with functional regions. Andreas Schlattl, Simon Anders, Sebastian M. Waszak, Wolfgang Huber, and Jan O. Korbel. Genome Research, 21:2004–2013, 2011. pdf, url (42 citations).

[81] Independent filtering increases detection power for high-throughput experiments. Richard Bourgon, Robert Gentleman, and Wolfgang Huber. PNAS, 107(21):9546–9551, 2010. pdf, url (152 citations).

[82] Differential expression analysis for sequence count data. Simon Anders and Wolfgang Huber. Genome Biology, 11:R106, 2010. pdf, url (2460 citations).

[83] Clustering phenotype populations by genome-wide RNAi and multiparametric imaging. Florian FuchsP, Gregoire PauP, Dominique Kranz, Oleg Sklyar, Christoph Budjan, Sandra Steinbrink, Thomas Horn, Angelika Pedal, Wolfgang Huber, and Michael Boutros. Molecular Systems Biology, 6(370), 2010. pdf, url (59 citations).

[84] EBImage – an R package for image processing with applications to cellular phenotypes. Gregoire Pau, Florian Fuchs, Oleg Sklyar, Michael Boutros, and Wolfgang Huber. Bioinformatics, 26:979–981, 2010. pdf, url (60 citations).

[85] Genome-wide analysis of mRNA decay patterns during early Drosophila development. Stefan Thomsen, Simon Anders, Sarath Chandra Janga, Wolfgang Huber, and Claudio R. Alonso. Genome Biology, 11:R93, 2010. pdf, url (36 citations).

[86] Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes. Beate Neumann, Thomas Walter, Jean-Karim Hériché, Jutta Bulkescher, Holger Erfle, Christian Conrad, Phill Rogers, Ina Poser, Michael Held, Urban Liebel, Cihan Cetin, Frank Sieckmann, Gregoire Pau, Rolf Kabbe, Annelie Wuensche, Venkata Satagopam, Michael H. A. Schmitz, Catherine Chapuis, Daniel W. Gerlich, Reinhard Schneider, Roland Eils, Wolfgang Huber, Jan-Michael Peters, Anthony A. Hyman, Richard Durbin, Rainer Pepperkok, and Jan Ellenberg. Nature, 464(7289):721–727, 2010. pdf, url (348 citations).

[87] CellCognition: time-resolved phenotype annotation in high-throughput live cell imaging. Michael Held, M. H. Schmitz, Bernd Fischer, Thomas Walter, Beate Neumann, M. H. Olma, M. Peter, Jan Ellenberg, and Daniel W. Gerlich. Nature Methods, 7(9):747–754, 2010. pdf, url (93 citations).

[88] Addressing accuracy and precision issues in iTRAQ quantitation. Natasha A. Karp, Wolfgang Huber, Pawel G. Sadowski, Philip D. Charles, Svenja V. Hester, and Kathryn S. Lilley. Molecular and Cellular Proteomics, 9:1885–97, 2010. pdf, url (176 citations).

[89] Organelle proteomics experimental designs and analysis. Laurent Gatto, Juan Antonio Vizcaíno, Henning Hermjakob, Wolfgang Huber, and Kathryn S. Lilley. Proteomics, 2010. pdf, url (23 citations).

[90] High-resolution transcription atlas of the mitotic cell cycle in budding yeast. Marina V. Granovskaia, Lars J. Jensen, Matthew E. Ritchie, Jörn Tödling, Ye Ning, Peer Bork, Wolfgang Huber, and Lars M. Steinmetz. Genome Biology, 11:R24, 2010. pdf, url (40 citations).

[91] Variation in transcription factor binding among humans. Maya Kasowski, Fabian Grubert, Christopher Heffelfinger, Manoj Hariharan, Akwasi Asabere, Sebastian M. Waszak, Lukas Habegger, Joel Rozowsky, Minyi Shi, Alexander E. Urban, Mi-Young Hong, Konrad J. Karczewski, Wolfgang Huber, Sherman M. Weissman, Mark B. Gerstein, Jan O. Korbel, and Michael Snyder. Science, 328:232–235, 2010. pdf, url (266 citations).

[92] Microarray data quality control improves the detection of differentially expressed genes. Audrey Kauffmann and Wolfgang Huber. Genomics, 95:138–142, 2010. pdf, url (28 citations).

[93] A large-scale RNAi screen identifies Deaf1 as a regulator of innate immune responses in Drosophila. David Kuttenkeuler, Nadege Pelte, Anan Ragab, Viola Gesellchen, Lena Schneider, Claudia Blass, Elin Axelsson, Wolfgang Huber, and Michael Boutros. Journal of Innate Immunity, 2:181–194, 2010. pdf, url (20 citations).

[94] Comparison of normalization methods for Illumina BeadChip(R) HumanHT-12 v3. Ramona Schmid, Patrick Baum, Carina Ittrich, Katrin Fundel-Clemens, Wolfgang Huber, Benedikt Brors, Roland Eils, Andreas Weith, Detlev Mennerich, and Karsten Quast. BMC Genomics, 11:349, 2010. pdf, url (31 citations).

[95] A global map of human gene expression. Margus Lukk, Misha Kapushesky, Janne Nikkila, Helen Parkinson, Angela Goncalves, Wolfgang Huber, Esko Ukkonen, and Alvis Brazma. Nature Biotechnology, 28:322–324, 2010. pdf, url (156 citations).

[96] Bidirectional promoters generate pervasive transcription in yeast. Zhenyu XuP, Wu WeiP, Julien Gagneur, Fabiana Perocchi, Sandra Clauder-Muenster, Jurgi Camblong, Elisa Guffanti, Francoise Stutz, Wolfgang Huber, and Lars M. Steinmetz. Nature, 457(7232):1033–1037, 2009. pdf, url (376 citations).

[97] High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Eugenio ManceraP, Richard BourgonP, Alessandro Brozzi, Wolfgang Huber, and Lars M. Steinmetz. Nature, 454(7203):479–485, 2008. pdf, url (253 citations).

[98] The hwriter package. Gregoire Pau and Wolfgang Huber. The R Journal, 1(1):22–24, 2009. pdf, url.

[99] Array-based genotyping in S. cerevisiae using semi-supervised clustering. Richard Bourgon, Eugenio Mancera, Alessandro Brozzi, Lars M. Steinmetz, and Wolfgang Huber. Bioinformatics, 25(8):1056–1062, 2009. pdf, url.

[100] Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Steffen Durinck, Paul T. Spellman, , and Wolfgang Huber. Nature Protocols, 4(8):1184–1191, 2009. pdf, url (108 citations).

[101] Visualisation of genomic data with the Hilbert curve. Simon Anders. Bioinformatics, 25:1231–1235, 2009. pdf, url (27 citations).

[102] ShortRead: a Bioconductor package for input, quality assessment and exploration of high-throughput sequence data. Martin Morgan, Simon Anders, Michael Lawrence, Patrick Aboyoun, Hervé Pagès, and Robert Gentleman. Bioinformatics, 25:2607, 2009. pdf, url (115 citations).

[103] Genome-wide allele- and strand-specific expression profiling. Julien Gagneur, Himanshu Sinha, Fabiana Perocchi, Richard Bourgon, Wolfgang Huber, and Lars M. Steinmetz. Molecular Systems Biology, 5:274, 2009. pdf, url (22 citations).

[104] Quality assessment and data analysis for microRNA expression arrays. Deepayan Sarkar, R. Parkin, S. Wyman, A. Bendoraite, C. Sather, J. Delrow, A. K. Godwin, C. Drescher, Wolfgang Huber, Robert Gentleman, and Munesh Tewari. Nucleic Acids Research, 37(2), 2009. pdf, url (29 citations).

[105] arrayQualityMetrics - a Bioconductor package for quality assessment of microarray data. Audrey Kauffmann, Robert Gentleman, and Wolfgang Huber. Bioinformatics, 25:415–416, 2009. pdf, url (17 citations).

[106] Importing ArrayExpress datasets into R/Bioconductor. Audrey Kauffmann, Tim F. Rayner, Helen Parkinson, Misha Kapushesky, Margus Lukk, Alvis Brazma, and Wolfgang Huber. Bioinformatics, 25:2092–2094, 2009. pdf, url (17 citations).

[107] Analyzing ChIP-chip data using Bioconductor. Jörn Tödling and Wolfgang Huber. PLoS Computational Biology, 4(11), 2008. pdf, url (13 citations).

[108] Rintact: enabling computational analysis of molecular interaction data from the IntAct repository. Tony Chiang, Nianhua Li, Sandra Orchard, Samuel Kerrien, Henning Hermjakob, Robert Gentleman, and Wolfgang Huber. Bioinformatics, 24(8):1100–1101, 2008. pdf, url.

[109] Model-based variance-stabilizing transformation for Illumina microarray data. Simon M. Lin, Pan Du, Wolfgang Huber, and Warren A. Kibbe. Nucleic Acids Res, 36(2), 2008. pdf, url (231 citations).

[110] Combinatorial effects of four histone modifications in transcription and differentiation. Jenny J. Fischer, Jörn Tödling, Tammo Krüger, Markus Schüler, Wolfgang Huber, and Silke Sperling. Genomics, 91(1):41–51, 2008. pdf, url (23 citations).

[111] Estimating node degree in bait-prey graphs. Denise Scholtens, Tony Chiang, Wolfgang Huber, and Robert Gentleman. Bioinformatics, 24(2):218–224, 2008. pdf, url (10 citations).

[112] Florian Hahne, Wolfgang Huber, Robert Gentleman, and Seth Falcon. Bioconductor Case Studies. Use R. Springer, 2008. pdf, url (92 citations).

[113] Coverage and error models of protein-protein interaction data by directed graph analysis. Tony Chiang, Denise Scholtens, Deepayan Sarkar, Robert Gentleman, and Wolfgang Huber. Genome Biology, 8(9), 2007. pdf, url (23 citations).

[114] Making the most of high-throughput protein-interaction data. Robert Gentleman and Wolfgang Huber. Genome Biology, 8(10):112–112, 2007. pdf, url (24 citations).

[115] Graphs in molecular biology. Wolfgang Huber, Vincent J. Carey, Li Long, Seth Falcon, and Robert Gentleman. BMC Bioinformatics, 8(Suppl. 6), 2007. pdf, url (42 citations).

[116] Ringo – an R/Bioconductor package for analyzing ChIP-chip readouts. Jörn Tödling, Oleg Sklyar, Tammo Krüger, Jenny J. Fischer, Silke Sperling, and Wolfgang Huber. BMC Bioinformatics, 8:221–221, 2007. pdf, url.

[117] In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation. Tineke Casneuf, Yves Van de Peer, and Wolfgang Huber. BMC Bioinformatics, 8:461–461, 2007. pdf, url (40 citations).

[118] CoCo: a web application to display, store and curate ChIP-on-chip data integrated with diverse types of gene expression data. Charles Girardot, Oleg Sklyar, Sophie Grosz, Wolfgang Huber, and Eileen E. M. Furlong. Bioinformatics, 23(6):771–773, 2007. pdf, url.

[119] Genomic organization of transcriptomes in mammals: Coregulation and cofunctionality. Antje Purmann, Jörn Tödling, Markus Schüler, Piero Carninci, Hans Lehrach, Yoshihide Hayashizaki, Wolfgang Huber, and Silke Sperling. Genomics, 89(5):580–587, 2007. pdf, url (32 citations).

[120] High-throughput flow cytometry-based assay to identify apoptosis-inducing proteins. Mamatha Sauermann, Florian Hahne, Christian Schmidt, Meher Majety, Heiko Rosenfelder, Stephanie Bechtel, Wolfgang Huber, Annemarie Poustka, Dorit Arlt, and Stefan Wiemann. Journal of Biomolecular Screening, 12(4):510–520, 2007. pdf, url.

[121] Comparative analysis of structured in S. cerevisiae indicates a multitude of different functions. Stephan Steigele, Wolfgang Huber, Claudia Stocsits, Peter F. Stadler, and Kay Nieselt. BMC Biology, 5:25–25, 2007. pdf, url (22 citations).

[122] A high-resolution map of transcription in the yeast genome. Lior DavidP, Wolfgang HuberP, Marina Granovskaia, Jörn Tödling, Curtis J. Palm, Lee Bofkin, T. Jones, Ron W. Davis, and Lars M. Steinmetz. PNAS, 103(14):5320–5325, 2006. pdf, url (393 citations).

[123] Analysis of cell-based RNAi screens. Michael Boutros, Lígia P. Brás, and Wolfgang Huber. Genome Biology, 7(7), 2006. pdf, url (149 citations).

[124] Transcript mapping with high-density oligonucleotide tiling arrays. Wolfgang Huber, Jörn Tödling, and Lars M. Steinmetz. Bioinformatics, 22(16):1963–1970, 2006. pdf, url (91 citations).

[125] Statistical methods and software for the analysis of highthroughput reverse genetic assays using flow cytometry readouts. Florian Hahne, Dorit Arlt, Mamatha Sauermann, Meher Majety, Annemarie Poustka, Stefan Wiemann, and Wolfgang Huber. Genome Biology, 7(8), 2006. pdf, url (14 citations).

[126] Reproducible statistical analysis in microarray profiling studies. Ulrich Mansmann, Markus Ruschhaupt, and Wolfgang Huber. Methods of Information in Medicine, 45:139–145, 2006. pdf, url.

[127] The LIFEdb database in 2006. Alexander Mehrle, Heiko Rosenfelder, Ingo Schupp, Coral del Val, Dorit Arlt, Florian Hahne, Stephanie Bechtel, Jeremy Simpson, Oliver Hofmann, Winston Hide, Karl-Heinz Glatting, Wolfgang Huber, Rainer Pepperkok, Annemarie Poustka, and Stefan Wiemann. Nucleic Acids Research, 34(Database issue):415–418, 2006. pdf, url (21 citations).

[128] Robert Gentleman, Florian Hahne, and Wolfgang Huber. Visualizing genomic data. Technical Report 10, Bioconductor Project Working Papers, 2006. pdf, url.

[129] Image analysis for microscopy screens. Oleg Sklyar and Wolfgang Huber. R News, 6(5):12– 16, 2006. pdf, url.

[130] Transcript mapping with high-density tiling arrays. Matthew Ritchie and Wolfgang Huber. R News, 6(5):23–27, 2006. pdf, url.

[131] BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Steffen Durinck, , Arek Kasprzyk, Sean Davis, Bart De Moor, Alvis Brazma, and Wolfgang Huber. Bioinformatics, 21:3439–3440, 2005. pdf, url (286 citations).

[132] Functional profiling: from microarrays via cell-based assays to novel tumor relevant modulators of the cell cycle. Dorit Arlt, Wolfgang Huber, Urban Liebel, C. Schmidt, Meher Majety, Mamatha Sauermann, Heiko Rosenfelder, Stefanie Bechtel, Alexander Mehrle, Detlev Bannasch, Ingo Schupp, Markus Seiler, Jeremy C. Simpson, Florian Hahne, Petra Moosmayer, Markus Ruschhaupt, Birgit Guilleaume, Ruth Wellenreuther, Rainer Pepperkok, Holger Sültmann, Annemarie Poustka, and Stefan Wiemann. Cancer Research, 65(17):7733–7742, 2005. pdf, url (19 citations).

[133] Systematic comparison of surface coatings for protein microarrays. Birgit Guilleaume, Andreas Buness, C. Schmidt, F. Klimek, G. Moldenhauer, Wolfgang Huber, Dorit Arlt, Ulrike Korf, Stefan Wiemann, and Annemarie Poustka. Proteomics, 5:4705–4712, 2005. pdf, url (32 citations).

[134] Gene expression in kidney cancer is associated with cytogenetic abnormalities, metastasis formation, and patient survival. Holger Sültmann, Anja von Heydebreck, Wolfgang Huber, Rupert Kuner, Andreas Buness, Markus Vogt, Bastian Gunawan, , Laszlo Fuzesi, and Annemarie Poustka. Clinical Cancer Research, 11:646–655, 2005. pdf, url (52 citations).

[135] arrayMagic: two-colour cDNA microarray quality control and preprocessing. Andreas Buness, Wolfgang Huber, Klaus Steiner, Holger Sültmann, and Annemarie Poustka. Bioinformatics, 21(4):554–556, 2005. pdf, url (36 citations).

[136] Novel cancer relevant cell cycle modulators identified in automated cell-based assays. Dorit Arlt, Wolfgang Huber, Mamatha Sauermann, Meher Majety, Florian Hahne, Rainer Pepperkok, Annemarie Poustka, and Stefan Wiemann. European Journal of Cell Biology, 84(Suppl. 55):30, 2005.

[137] Wolfgang Huber, Anja von Heydebreck, and Martin Vingron. Bioinformatics - from Genomes to Therapies, chapter Low-level analysis of microarray experiments. Wiley-VCH, 2005. pdf.

[138] On the synthesis of microarray experiments. Robert Gentleman, Markus Ruschhaupt, and Wolfgang Huber. Journal de la Société Française de Statistique, 146(1-2), 2005. pdf, url.

[139] Robert Gentleman, Vincent J. Carey, Wolfgang Huber, Rafael Irizarry, and Sandrine Dudoit, editors. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005. url (1925 citations).

[140] Bioconductor: open software development for computational biology and bioinformatics. Robert C. Gentleman, Vincent J. Carey, Douglas M. Bates, Ben Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, Laurent Gautier, Y.C. Ge, Jeff Gentry, Kurt Hornik, Torsten Hothorn, Wolfgang Huber, Stefano Iacus, Rafael Irizarry, Friedrich Leisch, Cheng Li, Martin Maechler, Anthony J. Rossini, Günther Sawitzki, Colin Smith, Gordon Smyth, Luke Tierney, Jean Y.H. Yang, and J.H. Zhang. Genome Biology, 5(10), 2004. pdf, url (5421 citations).

[141] matchprobes: a Bioconductor package for the sequence-matching of microarray probe elements. Wolfgang Huber and Robert Gentleman. Bioinformatics, 20:1651–1652, 2004. pdf, url (25 citations).

[142] A compendium to ensure computational reproducibility in high-dimensional classification tasks. Markus Ruschhaupt, Wolfgang Huber, Annemarie Poustka, and Ulrich Mansmann. Statistical Applications in Genetics and Molecular Biology, 3(37), 2004. pdf, url (90 citations).

[143] Systematic analysis of T7 RNA polymerase based in vitro linear RNA amplification for use in microarray experiments. Jörg Schneider, Andreas Buness, Wolfgang Huber, Joachim Volz, Petra Kioschis, Mathias Hafner, Annemarie Poustka, and Holger Sültmann. BMC Genomics, 5(1):29, 2004. pdf, url (60 citations).

[144] From ORFeome to biology: a functional genomics pipeline. Stefan Wiemann, Dorit Arlt, Wolfgang Huber, Ruth Wellenreuther, Simone Schleeger, Alexander Mehrle, Stephanie Bechtel, Mamatha Sauermann, Ulrike Korf, Rainer Pepperkok, Holger Sültmann, and Annemarie Poustka. Genome Research, 108:2136–44, 2004. pdf, url (35 citations).

[145] Wolfgang Huber, Anja von Heydebreck, and Martin Vingron. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics, chapter Error models for microarray intensities. John Wiley & Sons, 2004. pdf (12 citations).

[146] Anja von Heydebreck, Wolfgang Huber, and Robert Gentleman. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics, chapter Differential Expression with the Bioconductor Project. John Wiley & Sons, 2004. pdf (51 citations).

[147] Multi-domain protein families and domain pairs: Comparison with known structures and a random model of domain recombination. Gordana Apic, Wolfgang Huber, and Sarah A. Teichmann. Journal of Structural and Functional Genomics, 4:67–78, 2003. pdf (76 citations).

[148] Cytogenetic and morphologic typing of 58 papillary renal cell carcinomas: Evidence for a cytogenetic evolution of type 2 from type 1 tumors. Bastian Gunawan, Anja von Heydebreck, Thekla Fritsch, Wolfgang Huber, Rolf-Hermann Ringert, Gerhard Jakse, and László Füzesi. Cancer Research, 63:6200–6205, 2003. pdf, url (66 citations).

[149] Mathematical tree models for cytogenetic development in solid tumors. Anja von Heydebreck, Bastian Gunawan, Wolfgang Huber, Martin Vingron, and Laszlo Füzesi. Verhandlungen der Deutschen Gesellschaft für Pathologie, 2003.

[150] Parameter estimation for the calibration and variance stabilization of microarray data. Wolfgang Huber, Anja von Heydebreck, Holger Sültmann, Annemarie Poustka, and Martin Vingron. Statistical Applications in Genetics and Molecular Biology, 2(1):Article 3, 2003. pdf, url (158 citations).

[151] Wolfgang Huber, Anja von Heydebreck, and Martin Vingron. Analysis of microarray gene expression data. In Martin Bishop et al., editor, Handbook of Statistical Genetics. John Wiley & Sons, Ltd, Chichester, UK, 2003. pdf (53 citations).

[152] Prognostic factors influencing surgical management and outcome of gastrointestinal stromal tumours. C. Langer, Bastian Gunawan, P. Schüler, Wolfgang Huber, Laszlo Füzesi, and H. Becker. British Journal of Surgery, 90:332–399, 2003. pdf, url (114 citations).

[153] Transcription profiling of renal cell carcinoma. Wolfgang Huber, Judith M. Boer, Anja von Heydebreck, Bastian Gunawan, Martin Vingron, László Füzesi, Annemarie Poustka, and Holger Sültmann. Verhandlungen der Deutschen Gesellschaft für Pathologie, 86:153–164, 2002.

[154] Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Wolfgang Huber, Anja von Heydebreck, Holger Sültmann, Annemarie Poustka, and Martin Vingron. Bioinformatics, 18 Suppl 1:96–104, 2002. pdf, url (1673 citations).

[155] Identification and classification of differentially expressed genes in renal cell carcinoma by expression profiling on a global human 31,500-element cDNA array. Judith M. Boer, Wolfgang Huber, Holger Sültmann, Friederike Wilmer, Anja von Heydebreck, Stefan Haas, Bernhard Korn, Bastian Gunawan, Astrid Vente, Laszlo Füzesi, Martin Vingron, and Annemarie Poustka. Genome Research, 11(11):1861–1870, 2001. pdf, url (145 citations).

[156] Prognostic impacts of cytogenetic findings in clear cell renal cell carcinoma: Chromosome translocation der(3)t(3;5) or gain of 5q predict a distinct clinical phenotype with favourable prognosis. Bastian Gunawan, Wolfgang Huber, Meike Holtrup, Anja von Heydebreck, Thomas Efferth, Annemarie Poustka, Rolf-Hermann Ringert, Gerhard Jakse, and László Füzesi. Cancer Research, 61:7731–7738, 2001. pdf, url (67 citations).

[157] FLASHFLOOD: A 3D field-based similarity search and alignment method for flexible molecules. Michael C. Pitman, Wolfgang Huber, Hans Horn, Andreas Krämer, Julia E. Rice, and William C. Swope. Journal of Computer-Aided Molecular Design, 15:587–612, 2001. pdf, url (18 citations).

[158] Identifying splits with clear separation: A new class discovery method for gene expression data. Anja von Heydebreck, Wolfgang Huber, Annemarie Poustka, and Martin Vingron. Bioinformatics, 17 Suppl. 1:S107–114, 2001. pdf, url (77 citations).

[159] Gene expression profiling of kidney cancer using a tumor-specific cDNA microarray. Holger Sültmann, Wolfgang Huber, Laszlo Fuzesi, Bastian Gunawan, Anja von Heydebreck, Martin Vingron, and Annemarie Poustka. Clinical Cancer Research, 7(11, Suppl. S):155, 2001. pdf, url.

[160] Quasistationary distributions of dissipative nonlinear quantum oscillators in strong periodic driving fields. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Physical Review E, 61:4883–4889, 2000. pdf, url (26 citations).

[161] Stochastic wave function method versus density matrix: a numerical comparison. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Computer Physics Communications, 104:46–58, 1997. pdf, url (16 citations).

[162] Vestibular-neck interaction and transformation of sensory coordinates. Thomas Mergner, Wolfgang Huber, and Wolfgang Becker. Journal of Vestibular Research, 7:347–367, 1997. (100 citations).

[163] Spatially resolved measurement and modeling of blood brain barrier permeability. Wolfgang Huber, Klaus Kopitzki, Jens Timmer, and Peter Warnke. Biomedizinische Technik, 41 suppl. 1:160, 1996. pdf.

[164] Fast Monte Carlo algorithm for nonequilibrium systems. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Physical Review E, 53:4232–4235, 1996. pdf, url.

[165] The three-loop model: a neural network for the generation of saccadic reaction times. Burkhart Fischer, Stefan Gezeck, and Wolfgang Huber. Biological Cybernetics, 72:185–196, 1995. pdf, url (27 citations).

[166] The macroscopic limit in a stochastic reaction–diffusion process. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Europhysics Letters, 30:69–74, 1995. pdf, url (26 citations).

[167] Fluctuation effects on wave propagation in a reaction–diffusion process. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Physica D, 73:259–273, 1994. pdf, url (49 citations).

[168] Wolfgang Huber. Dynamics of strongly driven open quantum systems. PhD thesis, University of Freiburg, 1998. pdf.

[169] Wolfgang Huber. The description of reaction diffusion processes by master equations. Diploma thesis, University of Freiburg, 1994. pdf.