AARHUS UNIVERSITY
FINAL REPORT
Center for Massive Data Algorithmics 2007-2016
MADALGO - a Center of the Danish National Research Foundation / Department of Computer Science / Aarhus University

When initiated in 2007, the ambitious goal of the Danish National Research Foundation Center for Massive Data Algorithmics (MADALGO) was to become a world-leading center in algorithms for handling massive data. The high-level objectives of the center were:

- To significantly advance the fundamental algorithmic knowledge in the area of efficient processing of massive datasets;
- To educate the next generation of researchers in a world-leading and international environment;
- To be a catalyst for multidisciplinary collaborative research on massive dataset issues in commercial and scientific applications.

Building on the research strength at the main center site at Aarhus University in Denmark, at the sites of the other participants at the Max Planck Institute for Informatics and Frankfurt University in Germany and at the Massachusetts Institute of Technology in the US, as well as on significant international, multidisciplinary and industry collaboration, and in general on a vibrant international environment at the main center site, I believe we have met this goal.

Following the guidelines of the Foundation, this report contains a short reflection on the center's results, impact and international standing (section A), along with appendices with statistics on patents and spin-off companies (section B) and the statistics collected by the Foundation (section C). It also contains 10 publications reflecting the breadth and depth of center research.

Lars Arge
Center Director
May, 2016

Final report A. Center results, impact and international standing

Center Results. A key motivation for the establishment of the Center for Massive Data Algorithmics was the inadequacy of traditional algorithms theory in many applications involving massive amounts of data.
In particular, traditional algorithm theory uses simplistic machine models (the mathematical models in which algorithms are designed and analyzed) that do not take the hierarchical memory organization of modern machines into account. This translates into software inadequacies when processing massive data. Based on an ambitious research plan, center researchers have obtained a large number of results centered around four key areas, all of which relate to the design of algorithms in more realistic models of computation than those of traditional algorithm theory. Below we briefly highlight a few results (corresponding to the 10 selected papers).

In the area of I/O-efficient algorithms, the goal is to design algorithms in a model of computation that takes the large access time of disk (compared to main memory) into account. Center results on fundamental problems in the area include sorting algorithms that are provably optimal in terms of both I/O and internal memory computation time (paper 1, which won the best paper award at the 2013 ISAAC conference). Center researchers have also obtained results on processing of massive terrain data, including several results on modeling water flow on terrains, for example algorithms for computing how depressions in a terrain fill with water as it rains (paper 2, which appeared in the top computational geometry conference SoCG). Overall, we believe center work in the area has been very successful.

In the area of cache-oblivious algorithms, the aim is to develop algorithms that automatically adapt to the unknown multiple levels of modern memory hierarchies. Developing cache-oblivious algorithms is very challenging; at the start of the center, techniques for obtaining cache-oblivious algorithms were not well developed and the fundamental limitations in the area were not well understood. Center researchers have obtained a good number of results in the area, and e.g.
developed new fundamental techniques used to obtain an optimal cache-oblivious data structure that allows for fast search for a data element among a set of elements (paper 3, which appeared in the top algorithms conference SODA). However, overall we have not made as much progress on cache-oblivious problems as we would have liked.

In the area of streaming algorithms, the aim is to develop algorithms that perform only one sequential pass over the input data, processing each input element as fast as possible while using significantly less space than the input data size. Center researchers have obtained a large number of results in the area. For example, we have developed the first optimal algorithm for the well-studied and fundamental problem of estimating the number of distinct elements in a stream (paper 4, which won the best paper award at the top theoretical database conference PODS in 2010). We have also obtained a large number of results in relation to the so-called linear sketching technique, and recently we showed the first known lower bounds on how fast data stream elements can be processed (paper 5, which appeared in the top theoretical computer science conference STOC). Overall, we believe center work in the area has been very successful.

The area of algorithm engineering covers the design and analysis of practical algorithms, their efficient implementation, and experimentation that provides insight into their applicability and further improvement. Center researchers have in particular engineered many I/O-efficient algorithms. For example, the center has had great success with engineering algorithms for flood risk analysis using detailed (and thus massive) terrain data.
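To illustrate the distinct-elements problem from the streaming section above, here is a minimal one-pass sketch in Python using the classical k-minimum-values (KMV) estimator. This is a far simpler (and weaker) estimator than the optimal algorithm of paper 4; the SHA-256-based hash and the choice k=64 are illustrative assumptions, not anything from the center's work.

```python
import bisect
import hashlib

def _uniform_hash(x):
    # Hash an element to a pseudo-uniform value in [0, 1).
    digest = hashlib.sha256(str(x).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2.0**64

def kmv_estimate(stream, k=64):
    """Estimate the number of distinct elements in one pass,
    storing only the k smallest distinct hash values seen so far."""
    mins = []        # sorted ascending, at most k hash values
    members = set()  # same values, for O(1) duplicate checks
    for x in stream:
        h = _uniform_hash(x)
        if h in members or (len(mins) == k and h >= mins[-1]):
            continue  # duplicate, or too large to matter
        bisect.insort(mins, h)
        members.add(h)
        if len(mins) > k:
            members.discard(mins.pop())  # evict the largest
    if len(mins) < k:
        return len(mins)              # exact: fewer than k distinct seen
    return int((k - 1) / mins[-1])    # classical KMV estimator
```

The intuition: if n distinct elements hash uniformly into [0, 1), the k-th smallest hash value lands near k/n, so (k - 1) / h_k estimates n with relative error roughly 1/sqrt(k), while the algorithm stores only k values regardless of the stream length.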
This work includes implementations of the results on how depressions fill during a rain event mentioned above (paper 2) and recent work on modelling and computing flood risk from water rising in rivers (paper 6, which appeared in the good geographic information systems conference GIScience). It also formed the basis of the center startup company SCALGO, which has very successfully marketed flood risk analysis products based on the openly available detailed terrain model of Denmark. For example, the depression risk analysis results (paper 2) formed the basis of analyses purchased by more than half of the local Danish governments. Center algorithm engineering work has also formed the basis for collaboration with several other companies, as well as for extensive collaboration with researchers in other fields, in particular biology (bioinformatics and biodiversity) researchers. The river flood risk research (paper 6) is e.g. performed with a number of biology researchers, and the collaboration has also led to a Science paper (paper 7). Overall, we believe center algorithm engineering work has been very successful.

In addition to the above four core areas, center researchers have obtained a large number of results on research questions relating to e.g. models of parallel computation, flash memory, fault tolerance, and memory-efficient (succinct) data structures, as well as to fundamental problems in more traditional models of computation, in particular data structure and lower bound problems. As a few examples, center researchers have solved longstanding open problems in succinct data structures (paper 8, which won the best student paper award at the top theoretical computer science conference FOCS in 2008) and priority queues (paper 9, which appeared in the top theoretical computer science conference STOC and is co-authored by Turing Award winner Tarjan).
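The depression-filling computation underlying the flood risk work above can be illustrated by the well-known priority-flood technique on a grid terrain. This is a textbook in-memory sketch, not the center's actual I/O-efficient implementation: water drains off the grid boundary, and every interior cell is raised to the level of the lowest path leading out of it.

```python
import heapq

def fill_depressions(grid):
    """Priority-flood sketch: given a rectangular elevation grid
    (list of lists of numbers), return the water surface after all
    depressions have filled, assuming water drains off the edges."""
    rows, cols = len(grid), len(grid[0])
    filled = [[None] * cols for _ in range(rows)]
    pq = []
    # Seed the priority queue with all boundary cells, which drain freely.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                filled[r][c] = grid[r][c]
                heapq.heappush(pq, (grid[r][c], r, c))
    # Grow inwards from the lowest known water level: each unvisited
    # neighbor floods to at least the level it is reached from.
    while pq:
        level, r, c = heapq.heappop(pq)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and filled[nr][nc] is None:
                filled[nr][nc] = max(grid[nr][nc], level)
                heapq.heappush(pq, (filled[nr][nc], nr, nc))
    return filled
```

For a cell in a closed bowl, the returned level equals the height of the bowl's lowest rim; the challenge addressed by the center's work is doing this kind of computation when the terrain grid is far larger than main memory.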
Much of the center's data structure work (in various models of computation) has been on fundamental range searching problems (storing a set of d-dimensional points such that the points in a query range can be identified efficiently), and we have e.g. shown very strong lower bound tradeoffs between search and update time for several variants of the problem (paper 10, which won both the best paper and best student paper awards at the top theoretical computer science conference STOC in 2012).

A key goal of the center has been to educate the next generation of researchers in a world-leading and international environment. Altogether, 28 PhD students have graduated from the center (19 from AU), and the center currently houses 14 PhD students (7 at AU). Among the 26 students at AU, 12 are internationals. Additionally, the center has hosted 24 postdocs (22 at AU).

The center has emphasized the creation of a vibrant and international environment at the main center site at AU. The center has hosted numerous research visitors, and we have organized annual summer schools at the center. The summer school series has been very successful, with around 70 participants from 30+ institutions attending every year. The center also hosted the top computational geometry conference SoCG in 2009, and is hosting the top European algorithms event ALGO (including the top European algorithms conference ESA) in 2016. The center also initiated the Workshop on Massive Data Algorithmics (MASSIVE) in 2009, and this successful workshop has been held every year since in connection with SoCG or ALGO.

Center international standing and impact. Overall, we believe the center has been very successful. From a purely bibliometric point of view, this can e.g. be seen from the fact that in the last four years, center researchers have published 90+ peer-reviewed papers a year, with consistently 10+ papers in the top three general theoretical computer science conferences STOC, FOCS and SODA.
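The range searching problem described above can be made concrete with a small in-memory example: a classical kd-tree answering 2-dimensional orthogonal range queries. This is a standard textbook structure given purely for illustration; it is unrelated to the center's lower bound constructions.

```python
def build_kdtree(points, depth=0):
    # Build a 2-d tree over (x, y) tuples, splitting alternately
    # on the x- and y-coordinate at the median point.
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def range_query(node, lo, hi, out=None):
    # Report all stored points p with lo[i] <= p[i] <= hi[i] on both axes,
    # visiting a subtree only if the query range can intersect it.
    if out is None:
        out = []
    if node is None:
        return out
    p, axis = node["point"], node["axis"]
    if all(lo[i] <= p[i] <= hi[i] for i in range(2)):
        out.append(p)
    if lo[axis] <= p[axis]:          # range may extend into left subtree
        range_query(node["left"], lo, hi, out)
    if p[axis] <= hi[axis]:          # range may extend into right subtree
        range_query(node["right"], lo, hi, out)
    return out
```

A balanced kd-tree on n points answers such a query in O(sqrt(n) + k) time for k reported points; the tradeoffs between query and update time referred to above concern far stronger (and provably unavoidable) bounds in other models.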
According to Google Scholar, center papers have received 12,000+ citations (steadily increasing