FROM DATA TO KNOWLEDGE (FDK) RESEARCH UNIT
NATIONAL CENTRE OF EXCELLENCE 2002-2007
ACADEMY OF FINLAND

BIENNIAL REPORT 2002-2003

Department of Computer Science, University of Helsinki
Laboratory of Computer and Information Science, Helsinki University of Technology

Table of contents

Preface
1. Progress of research work
  1.1. Data mining and machine learning
  1.2. Computational methods in medical genetics and systems biology
  1.3. Combinatorial pattern matching and information retrieval
  1.4. Computational Structural Biology
  1.5. PhD Theses
2. Changes in research strategy and research plans
3. Personnel
  3.1. Summary
  3.2. Prizes and scientific honours received by researchers of the unit in 2002-2003
  3.3. International positions of trust held by researchers of the unit in 2002-2003
  3.4. Domestic positions of trust held by researchers of the unit in 2002-2003
  3.5. Mobility of researchers
4. Publications and other outcomes
  4.1. Articles in international scientific journals with referee practice
  4.2. Articles in international edited works and conference proceedings with referee practice
  4.3. Articles in Finnish scientific journals with referee practice
  4.4. Articles in Finnish edited works and conference proceedings with referee practice
  4.5. Scientific monographs published abroad
  4.6. Other scientific publications
  4.7. Patents
  4.8. Computer programs (and algorithms)
  4.10. Lectures and visiting lectures
  4.11. Radio and television programmes and articles popularising science
  4.12. Other outcomes: international conferences
  4.13. Degrees
5. Funding of the center 2002-2003
APPENDIX: List of personnel 2003

Preface

The Academy of Finland granted to the From Data to Knowledge (FDK) research unit the status of a national Center-of-Excellence from January 1, 2002.
The activities of the unit have, however, a much longer history, dating back at least to the early 1990s, when some key researchers of the unit started to collaborate. The vision has from the beginning been to build on our core competence in the algorithmics of combinatorial pattern matching and data mining, and to apply it to novel problems in data analysis.

With the prestigious new status, our development and expansion have been even stronger than expected. The research activity as well as the size of the personnel has grown rapidly. The unit had about 40 members at the beginning of 2002, while the current number is approaching 60. The establishment of the new Basic Research Unit of the Helsinki Institute of Information Technology (HIIT/BRU) in 2002 has also made our environment much stronger and more attractive. The HIIT/BRU, located in the same building as the FDK, is directed by Professor Heikki Mannila, who is also a member of our unit. In 2004, Sami Kaski will start in the new data analysis professor position at our host department. We will also all move to the new Exactum Building at Kumpula Campus during 2004. I expect these events to strengthen the FDK further.

The present report summarizes the results and new plans of the unit from its first two years of activity.

Helsinki, February 15th 2004.
Esko Ukkonen

Contact information:
Prof. Esko Ukkonen
Department of Computer Science
P.O.Box 26 (Teollisuuskatu 23)
FIN-00014 University of Helsinki, Finland
Tel. +358 9 191 44172
Fax +358 9 191 44441
E-mail: [email protected]
www.cs.helsinki.fi/research/fdk/

1. Progress of research work

In the original plan the research profile of the FDK unit was summarized as follows:

Collection of raw data has in many areas of industry and research become easier than previously.
Molecular biology produces long sequences of biological information, environmental satellites provide a wealth of data, process monitoring gives heaps of measurements, and the Internet gives easy access to a wide variety of data sources. Similar advances in the methods that provide useful information or knowledge from the data have not matched this overwhelming increase in the availability of data.

The “From Data to Knowledge” (FDK) research unit develops methods for forming useful knowledge from large masses of data. The unit operates in a multidisciplinary fashion, integrating in its research groups excellence in discrete algorithms, statistical techniques and application sciences. The major methodological tools of the research unit are combinatorial pattern matching and data mining. The combination of these two is unique in the world. The work combines conceptual advances, algorithmic, statistical and analytical methods, and empirical work: theory and practice go hand in hand.

The results of the unit have been applied in, e.g., molecular biology, process industry, telecommunications, genetics, ecology, and natural language processing. The results have attracted wide international attention. Many concepts created by the group are in use in the scientific community, and they are presented in textbooks. Software that incorporates methods invented at the unit has been commercialized in several countries.

The main themes in the planned activity of the FDK unit are efficient algorithms, data mining and combinatorial pattern matching, and the analysis of sequential and many-dimensional data, as well as applications in computational molecular biology, bioinformatics, telecommunications and natural language technology.

The research of the unit can be viewed as an intertwined combination of four major projects or themes:

I: Data mining and machine learning;
II: Computational methods in medical genetics and systems biology;
III: Combinatorial pattern matching and information retrieval;
IV: Computational structural biology.

The projects are highly connected: basic research of computational methods for some specific applications occurs in each of the four. Similarly, the topics of discrete algorithms and probabilistic approaches occur repeatedly. The projects also share many researchers.

Below, the subgroups belonging to each of the themes I-IV report briefly on the results and new plans. The subgroups are led by the senior members of the unit: Helena Ahonen-Myka, Tapio Elomaa, Jaakko Hollmén, Heikki Mannila, Hannu Toivonen and Esko Ukkonen. Two of our post-docs, Kjell Lemström and Juho Rousu, also have their own subgroups.

1.1. Data mining and machine learning

Group Ahonen-Myka

Members: Helena Ahonen-Myka, Lili Aunimo, Martin Fluch (-06/03), Oskari Heinonen, Kaisa Kostiainen (-05/03), Reeta Kuuskoski, Miro Lehtonen, Greger Lindén, Juha Makkonen, Renaud Petit, Jussi Piitulainen, Andrei Popescu (-07/03), Marko Salmenkivi, Otso Virtanen (-12/02)

We have studied the information retrieval related problems of first story detection and topic tracking [47, 71, 106, 107, 108, 133]. The first story detection task is about spotting new, previously unreported real-life events from an online news feed, while topic tracking attaches a document to a previously detected event. We have addressed these problems by extracting locations, proper names, temporal expressions and normal terms from documents, and assigning weights to these semantic classes. The weights are learned from a training set that contains pre-classified documents. We have also proposed new similarity measures based on semantic classes.
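
As a rough illustration of the idea of class-wise weighting described above, the following minimal Python sketch compares two documents one semantic class at a time and combines the per-class scores with weights. It is not the group's implementation: the class names, the use of cosine similarity, the weighted-sum combination, the fixed threshold and all function names are assumptions made for this example, and the weights are passed in by hand rather than learned from a training set.

```python
import math
from collections import Counter

# Hypothetical class labels mirroring the four semantic classes in the text.
CLASSES = ("locations", "names", "temporal", "terms")


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity(doc_a: dict, doc_b: dict, weights: dict) -> float:
    """Weighted sum of per-class cosine similarities (an assumed combination)."""
    return sum(
        weights[c] * cosine(Counter(doc_a.get(c, [])), Counter(doc_b.get(c, [])))
        for c in CLASSES
    )


def is_first_story(doc: dict, seen: list, weights: dict, threshold: float) -> bool:
    """Flag doc as a new event if no earlier document reaches the threshold."""
    return all(similarity(doc, old, weights) < threshold for old in seen)
```

Each document is represented here as a dictionary mapping a class label to the list of strings extracted for that class. Under the same assumptions, topic tracking could reuse `similarity` by attaching a new document to the previously detected event whose documents score highest.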
In our experiments on a Finnish online news-stream corpus, we have found that the use of semantic classes improves the performance significantly. We have also started to experiment with commonly