Polish Bioinformatics Society
Institute of Computing Science, Poznań University of Technology
Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznań

II Convention of the Polish Bioinformatics Society in conjunction with 7th Workshop on Bioinformatics

Będlewo, October 2–4, 2009

Book of Abstracts

Program Committee

Jacek Błażewicz (Chair), Institute of Computing Science, Poznań University of Technology, Poznań, Poland

Jerzy Tiuryn, Institute of Informatics, University of Warsaw, Warsaw, Poland

Wiesław Nowak, Institute of Physics, Nicolaus Copernicus University, Toruń, Poland

Janusz M. Bujnicki, Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland

Organizing Committee

Marta Kasprzak (Chair), Marta Szachniuk, Marcin Borowski

Institute of Computing Science, Poznań University of Technology, Poznań, Poland

Workshop Overview

This Book of Abstracts contains the talks to be presented at the 7th Workshop on Bioinformatics, to be held in Będlewo, October 2–4, 2009. The first five workshops gathered mostly Ph.D. students and senior researchers from two centers: the University of Warsaw (Institute of Informatics) and Poznań University of Technology (Institute of Computing Science). Since last year the workshop has been organized in conjunction with the Convention of the Polish Bioinformatics Society and has a much wider representation of centers and topics. This year, because of the large response to the call for presentations and the technical constraints imposed on the organizers, not all of the submitted talks could be accepted. After a thorough refereeing procedure, the Program Committee included 29 talks in the final program. The accepted talks cover several topics that dominate current bioinformatics and related areas. Most of them (10 talks) deal with different models of protein structure analysis and prediction. The second largest area is RNA structure analysis and modeling (4 talks). Analysis of DNA sequences (assembly algorithms and motif discovery), improvement of alignment algorithms, and evolution issues are the topics of 3 talks each. Other topics discussed are gene clusters (2 talks), disease and immunology issues (2 talks) and, finally, signaling pathways and text mining (1 talk each). All the talks address problems of high importance, and the level of the results obtained is very good.
Speakers represent different bioinformatics centers in Poland, in particular: Adam Mickiewicz University, Poznań (Bioinformatics Laboratory at the Institute of Molecular Biology and Biotechnology), 9 talks; University of Warsaw (Institute of Informatics and Interdisciplinary Centre for Mathematical and Computational Modelling), 8 talks; Poznań University of Technology (Institute of Computing Science), 7 talks; Nencki Institute of Experimental Biology, Warsaw, 2 talks; Silesian University of Technology, Gliwice (Institute of Informatics), 2 talks; and finally the International Institute of Molecular and Cell Biology, Warsaw (Laboratory of Bioinformatics and Protein Engineering), 1 talk. Here, I would like to thank all the members of the Program and Organizing Committees for their help in evaluating talks and organizing the meeting. Special thanks are due to Marta Kasprzak who, as the Chair of the Organizing Committee, gave the right momentum to the work of these two bodies, devoting a lot of her energy and skills to the success of our workshop.

Prof. Jacek Błażewicz
Chair of the Program Committee

Contents

Michał Startek, Anna Gambin, “Evolving subset seeds for finding protein alignment”
Łukasz Ligowski, Witold Rudnicki, “Comparison of PSI-BLAST and PSI-Smith-Waterman algorithms”
Paweł Wojciechowski, “Parallel implementation of T-COFFEE algorithm on GPU”
Piotr Gawron, “Parallel algorithm for DNA assembly”
Tomasz Głowacki, Marcin Borowski, Piotr Formanowicz, “Long peptides assembly method based on Edman degradation”
Rafał Pokrzywa, “Searching for tandem repeats in DNA sequences using the Burrows-Wheeler transform”
Miron Kursa, “WAIAS – a web toolkit for an aptamer analysis”
Sławomir Walkowiak, Konrad Wawruch, Łukasz Ligowski, Witold Rudnicki, “Searching biochemical compounds in text databases”
Aleksandra Gruca, “Gene ontology based characterization of groups of genes by multi-attribute rules”
Dariusz Plewczyński, “Virtual high-throughput screening using machine learning & docking”
Teresa Szczepińska, Krzysztof Pawłowski, “Looking for chromosome spatial organization rules in microarray gene expression data”
Ewa Szczurek, Florian Markowetz, Irit Gat-Viks, Przemysław Biecek, Jerzy Tiuryn, “Model-based dissection of transcriptional deregulation”
Marcin Jąkalski, Izabela Makałowska, “Evolution of overlapping genes in Drosophila genomes”
Aleksander Jankowski, “Predicting nucleosome binding sites in yeast genome”
Mikołaj Rybiński, Anna Gambin, Haluk Resat, “Testing for heterogeneity in the receptors density”
Joanna Ciomborowska, Damian Szklarczyk, Wojciech Makałowski, Izabela Makałowska, “Retrogenes: the genesis and evolutionary consequences”
Michał Szcześniak, Michał Kabza, Joanna Ciomborowska, Izabela Makałowska, “Intron gain in mammals”
Szymon Wąsik, “On a certain model of HCV infection”
Małgorzata Maciukiewicz, “Association analysis in human diseases: schizophrenia and bipolar affective disease example”
Wojciech Karłowski, Andrzej Zieleziński, “Computational identification and comparative analysis of ARGONAUTE-binding platforms found in plant RRM proteins”
Anna Lenart, Krzysztof Pawłowski, “Comparative modeling of the hydrolase domain from different CLCA proteins”
Arkadiusz Hoffa, Piotr Łukasiak, Maciej Antczak, Jacek Błażewicz, “Ligand-protein conformation structure prediction”
Maciej Antczak, Piotr Łukasiak, Maciej Miłostan, Jacek Błażewicz, “DomAn2 – approach for predicting domains boundaries in proteins”
Wojciech Potrzebowski, Janusz M. Bujnicki, “A novel method for tracing polypeptide chains in medium-resolution electron density maps”
Joanna M. Kasprzak, Wojciech Potrzebowski, Janusz M. Bujnicki, Kristian Rother, “Modeling of large macromolecular complexes”
Agnieszka Rybarczyk, “Branch and cut algorithm for the RNA degradation problem”
Anna Philips, “Prediction of metal-binding sites in RNA structures”
Magdalena Musielak, Kristian Rother, Tomasz Puton, Janusz M. Bujnicki, “RNA tertiary structure prediction with ModeRNA”
Tomasz Puton, Kristian Rother, Janusz M. Bujnicki, “CompaRNA – a server for comparison of methods for RNA structure prediction”

Evolving subset seeds for finding protein alignment
Michał Startek a∗, Anna Gambin a

(a) Institute of Informatics, University of Warsaw, Warsaw, Poland (∗) [email protected]

Abstract

In this talk, a modification of the BLAST algorithm will be presented. The modification replaces the standard substitution table used by BLAST (such as BLOSUM) with a so-called subset seed, chosen specifically for the protein family being aligned, that represents structural information about the family. With such a seed, it becomes possible to find structurally significant alignments in the so-called ‘twilight zone’ of the BLAST algorithm, namely alignments with a low identity percentage that nevertheless represent a conserved domain. Choosing a proper subset seed for a particular protein family, however, poses an interesting computational problem. We will present a solution to this problem using an evolutionary algorithm.
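To make the notion of a subset seed concrete, here is a minimal sketch. It assumes the common definition in which each seed position partitions the amino acid alphabet into groups, and two k-mers produce a hit when their residues fall into the same group at every position. The abstract does not give the authors' definition, so the partition, the seed and the function names below are purely illustrative.

```python
def make_position(groups):
    """Map each residue to the index of its group at one seed position."""
    return {res: i for i, group in enumerate(groups) for res in group}


def seed_hit(seed, a, b):
    """Return True if k-mers a and b agree at every position of the seed."""
    if len(a) != len(seed) or len(b) != len(seed):
        raise ValueError("k-mers must have the same length as the seed")
    for position, x, y in zip(seed, a, b):
        if position is None:          # wildcard: any pair of residues matches
            continue
        gx, gy = position.get(x), position.get(y)
        if gx is None or gx != gy:    # unknown residue or different groups
            return False
    return True


# Hypothetical seed: exact match, then a coarse three-group partition
# (made up for this example), then a wildcard position.
EXACT = make_position([[c] for c in "ACDEFGHIKLMNPQRSTVWY"])
COARSE = make_position(["AVLIMFWC", "DEKRH", "STNQGPY"])
SEED = [EXACT, COARSE, None]

print(seed_hit(SEED, "LIK", "LVW"))  # True: L=L, I and V share a group, wildcard
print(seed_hit(SEED, "LIK", "LDK"))  # False: I and D are in different groups
```

An evolutionary search over seeds, as mentioned in the abstract, would mutate the partitions and score each candidate seed on known family alignments; that part is not sketched here.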

Comparison of PSI-BLAST and PSI-Smith-Waterman algorithms
Łukasz Ligowski a∗, Witold Rudnicki a

(a) Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland (∗) [email protected]

Abstract

PSI-BLAST is a basic tool for detecting distant evolutionary relationships between proteins. It uses BLAST to scan a protein database for similar sequences. The BLAST heuristic scans databases very quickly while retaining good specificity and sensitivity. However, plain BLAST does not match the result quality of exact methods such as the Smith-Waterman algorithm. Until now, using the Smith-Waterman algorithm for large database scans was infeasible because of prohibitive computation time. Several high-performance implementations of the Smith-Waterman algorithm have since emerged on different hardware architectures. To create PSI-SW we used our previous work, an implementation of Smith-Waterman on NVIDIA CUDA. Our implementation matches BLAST speed, while the Smith-Waterman algorithm produces significantly better results than BLAST. Our main interest is to find out whether the PSI-* approach benefits from a different search engine that produces better results. We hope that a better engine will produce significantly better results in the first and subsequent iterations. Our presentation will discuss PSI-SW and PSI-BLAST results on the SCOP database. Comparing two search engines using only ROC curves calculated in a straightforward way can produce misleading results, depending on the composition of the reference database. We will present a detailed analysis of the PSI-SW and PSI-BLAST results.
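For readers unfamiliar with the exact method that PSI-SW builds on, the following is a minimal sketch of the Smith-Waterman local alignment score. It assumes a simple match/mismatch scheme and a linear gap penalty; the scoring values are illustrative, and a real protein search would use a substitution matrix and affine gaps. It is not the authors' CUDA implementation.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                 # start a new local alignment
                prev[j - 1] + s,   # align a[i-1] with b[j-1]
                prev[j] + gap,     # gap in b
                curr[j - 1] + gap, # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("XXACGTXX", "YYACGTYY"))  # 8: the shared ACGT core
```

Every cell depends only on its three neighbours, which is why the computation maps well onto highly parallel hardware.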

Parallel implementation of T-COFFEE algorithm on GPU
Paweł Wojciechowski a∗

(a) Institute of Computing Science, Poznań University of Technology, Poznań, Poland (∗) [email protected]

Abstract

Multiple sequence alignment methods are widely used across biological research, and many algorithms for constructing such alignments are known. One of the best known is T-COFFEE (Tree-based Consistency Objective Function for alignment Evaluation). Its main disadvantage is computation time, which is significantly longer than for other methods. The idea for overcoming this drawback was to implement a parallel version of the algorithm. Modern graphics processing units (GPUs), owing to their highly parallel architecture, have a theoretical peak performance an order of magnitude higher than a CPU. Moreover, their performance is increasing by a factor of 2 to 2.5 per year, which is faster than the increase in CPU performance. The two major GPU producers, NVidia and ATI (which belongs to AMD), have released development platforms for General-Purpose computation on GPUs (GPGPU). However, building GPU performance models is still not trivial. A speedup is attainable only for algorithms that can be divided into many independent parts. To work around this limitation, only the steps of an algorithm that satisfy this condition are performed on the GPU; the other steps are computed on the CPU. This division of tasks between GPU and CPU is called “hybrid programming”. T-COFFEE consists of three main steps: generating a primary library of alignments, extending the library, and finally progressive alignment. The first step relies on calculating pairwise local and global alignments, so these calculations can be performed independently. The second step requires examining all possible triplets of sequences and can also be parallelized. Parallelizing the last step seems to be problematic. Based on these observations, the parallel version of T-COFFEE was implemented on GPUs produced by AMD and NVidia. The performance of this implementation was tested on BAliBASE, a collection of 141 alignments.
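The library extension step can be sketched as follows. This is a minimal CPU illustration, assuming the commonly described T-COFFEE rule in which the extended weight of a residue pair is its primary weight plus, for every residue of a third sequence aligned to both, the minimum of the two primary weights. The data layout and the name `extend_library` are hypothetical, and the code is not the author's GPU implementation.

```python
from collections import defaultdict


def extend_library(primary):
    """primary maps ((seq_a, i), (seq_b, j)) -> weight for residue pairs
    taken from different sequences. Returns extended weights keyed by
    frozenset({residue_x, residue_y})."""
    # Symmetric view: residue -> {aligned residue: primary weight}
    neighbours = defaultdict(dict)
    for (x, y), w in primary.items():
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = defaultdict(float)
    # Direct contribution of each primary pair
    for (x, y), w in primary.items():
        extended[frozenset((x, y))] += w
    # Triplet contribution: x and y both aligned to z in a third sequence
    for z, linked in neighbours.items():
        items = list(linked.items())
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                (x, wx), (y, wy) = items[a], items[b]
                if x[0] != y[0]:   # x and y must come from different sequences
                    extended[frozenset((x, y))] += min(wx, wy)
    return dict(extended)


# Made-up primary weights for three sequences A, B and C
primary = {
    (("A", 0), ("B", 0)): 88,
    (("A", 0), ("C", 0)): 77,
    (("C", 0), ("B", 0)): 100,
    (("A", 1), ("C", 1)): 50,
    (("C", 1), ("B", 2)): 40,
}
ext = extend_library(primary)
print(ext[frozenset((("A", 0), ("B", 0)))])  # 165.0 = 88 + min(77, 100)
print(ext[frozenset((("A", 1), ("B", 2)))])  # 40.0, supported only through C
```

Each intermediate residue `z` is processed independently of the others, which is the kind of independence that makes this step a candidate for the GPU.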

Parallel algorithm for DNA assembly
Piotr Gawron a∗

(a) Institute of Computing Science, Poznań University of Technology, Poznań, Poland (∗) [email protected]

Abstract

The process of reading a DNA sequence consists of a few steps: reading short fragments of the sequence (DNA sequencing), merging these fragments into longer ones of about 1 million nucleotides (DNA assembly), and finally placing them in the proper location on the chromosome (mapping).

Progress in technology and science has led to the development of new approaches to DNA sequencing, which can provide data in a shorter time and at lower cost. These new methods (e.g. Illumina (Solexa), Roche (454)) provide much more data than any earlier method. More data from the DNA sequencing process creates the need for assembly methods that can handle larger input instances. This can be achieved in two ways: by developing a new algorithm or by introducing parallelism into an existing solution. We propose an algorithm that was developed especially for data coming from the 454 sequencing method and that uses parallelism to reduce the computation time. The algorithm is a heuristic based on a graph approach. Parallelism is used to speed up the fundamental procedures that dominate the computation time: calculating the overlap between pairs of sequences with the Smith-Waterman algorithm and searching for the longest path in the overlap graph. The parallel version was implemented with Unix POSIX threads. Tests of both versions of the algorithm, single- and multi-processor, were performed on real data generated during sequencing of the whole genome of Prochlorococcus marinus.
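The overlap computation parallelizes naturally because every pair of reads can be scored independently. The sketch below illustrates this under simplifying assumptions: it uses exact suffix-prefix overlaps instead of Smith-Waterman, Python's `ThreadPoolExecutor` instead of POSIX threads, and made-up reads. Function names are hypothetical, and the longest-path search is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b,
    or 0 if it is shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_graph(reads, min_len=3, workers=4):
    """Directed overlap graph as {(i, j): overlap length}; pairs are
    scored independently by a pool of worker threads."""
    pairs = list(permutations(range(len(reads)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        lengths = pool.map(
            lambda p: overlap(reads[p[0]], reads[p[1]], min_len), pairs
        )
        return {p: k for p, k in zip(pairs, lengths) if k > 0}


reads = ["ACGTAC", "TACGGA", "GGATTC"]   # made-up reads
graph = overlap_graph(reads)
print(graph)  # {(0, 1): 3, (1, 2): 3}
```

In CPython, threads do not speed up CPU-bound work such as this; the example only shows how the task decomposes, which is what a native threaded implementation would exploit.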

Long peptides assembly method based on Edman degradation
Tomasz Głowacki a∗, Marcin Borowski a, Piotr Formanowicz a

(a) Institute of Computing Science, Poznań University of Technology, Poznań, Poland (∗) [email protected]

Abstract

Peptides are chemical compounds formed by linking 20 types of amino acids. They have various functions in the human body, such as hormone regulation, tissue regeneration and antibiotic activity. Long peptides are called proteins and consist of up to thousands of amino acids. The sequence of amino acids determines their structure and therefore their function. Determining the order of amino acids in a peptide is called sequencing. Because existing peptide sequencing methods can determine only short fragments of up to 50 amino acids, assembly methods are needed to bring these pieces together. An assembly method based on Edman degradation and partial enzymatic digestion is presented, and a mathematical representation of the problem is proposed. It is shown that the problem without any biochemical errors is easy. The NP-hard version of the problem, with errors occurring during the digestion phase, is also discussed. A tabu search method and a genetic algorithm for the NP-hard version are proposed, and the obtained results are discussed.

Searching for tandem repeats in DNA sequences using the Burrows-Wheeler transform
Rafał Pokrzywa a∗

(a) Department of Computer Science, Silesian University of Technology, Gliwice, Poland (∗) [email protected]

Abstract

A tandem repeat in DNA is a pattern of two or more nucleotides that occurs two or more times, with the repeated motifs directly adjacent to each other. Tandem repeats play an important role in molecular biology: they are related to the genetic background of inherited diseases, and they can serve as markers for DNA mapping and DNA fingerprinting. Since nucleotide sequence databases keep growing in size and number, there is a need to improve existing tools and create new ones for finding tandem repeats in DNA sequences. This has led to the development of many new tools that analyze newly sequenced genomes in search of repetitive structures. Such analysis can give interesting insight into the structure of the genome and provide a basis for further investigation by other methods. This paper presents an original algorithm that finds all occurrences of tandem repeats in DNA. The algorithm is based on the Burrows-Wheeler Transform, a very fast and effective data compression technique. It exploits the advantages offered by the Burrows-Wheeler Transform, in particular that it provides a form of compressed suffix array. The advantage of the presented algorithm over other tandem repeat identification algorithms is its simplicity and flexibility. It finds all occurrences of tandem repeats in a string without prior knowledge of the pattern, its size or the number of copies. The algorithm is time efficient and especially space efficient, as it allows tandem repeats to be found directly in compressed DNA sequences. It is scalable and places no restriction on the size of the repeat.
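To illustrate why the Burrows-Wheeler Transform can act as a compressed suffix array, here is a minimal sketch of the transform and of pattern counting by backward search. This is textbook material rather than the author's tandem repeat algorithm, which the abstract does not describe in detail. The transform is built by sorting rotations and ranks are computed naively, both for clarity rather than speed.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform via sorted rotations. The sentinel must
    not occur in text and must sort before every other character."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def count_occurrences(bwt_str, pattern):
    """Count occurrences of pattern in the original text using backward
    search on the BWT alone."""
    first_col = sorted(bwt_str)
    starts = {}                      # first row whose rotation starts with c
    for i, c in enumerate(first_col):
        starts.setdefault(c, i)
    lo, hi = 0, len(bwt_str)         # half-open range of matching rows
    for c in reversed(pattern):
        if c not in starts:
            return 0
        lo = starts[c] + bwt_str[:lo].count(c)
        hi = starts[c] + bwt_str[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


print(bwt("banana"))                                       # annb$aa
print(count_occurrences(bwt("GATTACACACAT"), "ACAC"))      # 2
```

The two overlapping hits of `ACAC` in the second example come from the tandem repeat `ACACAC`, which hints at how repeated motifs show up as row ranges in the sorted rotations.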

WAIAS – a web toolkit for an aptamer analysis
Miron Kursa a∗

(a) Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland (∗) [email protected]

Abstract

Aptamers are oligonucleotide particles that possess the ability to bind tightly and specifically to various micro- and macromolecular targets. Owing to their simplicity and flexibility, they are exploited both by nature, as the sensor parts of riboswitches, and by man, in various biomolecular, medical and technical applications. Aptamers binding to nearly any known molecule can be obtained with a simple screening procedure (SELEX); indeed, many such experiments have been and are being performed, resulting in a large number of aptamer sequences against many different targets. Nevertheless, there is currently no actively maintained public database containing them. WAIAS aims to fill that gap, additionally providing an interface to tools devoted to aptamer analysis, covering motif finding, structural predictions, evolutionary research and advanced searching. We will discuss the implemented algorithms and show their applications in aptamer research. The service is currently in an early beta stage and is available at http://bioinfo.icm.edu.pl/services/waias/.

Searching biochemical compounds in text databases
Sławomir Walkowiak a∗, Konrad Wawruch a, Łukasz Ligowski a, Witold Rudnicki a

(a) Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland (∗) [email protected]

Abstract

Exploring text databases with biochemical compounds as queries is a complex task because the queries themselves are complex: they involve multiple variants, naming conventions, isomers and mismatches. We are developing a new tool intended to handle such queries. The server consists of two main components: chemsearch, which processes user input, and nsearch, an n-gram based engine for approximate text searches. Chemsearch uses several heuristics to extend the query by adding synonyms, spelling variants, etc. The algorithm works iteratively. First, it finds all simple variants of the query. Next, it searches synonym databases for new synonyms. The procedure is then repeated until the algorithm finds a self-consistent set of synonyms. The extended query, which may be significantly larger than the user input, is then submitted to the nsearch module. Nsearch is based on an n-gram index with position mapping. The index is stored in a compressed form that is slightly larger than the original data. For the PubMed database (9 GB) the total size of the index is 12 GB. The index is segmented, with 1024 entries per segment. This significantly reduces the search space for most queries and results in significant speedups in query execution. Nsearch currently runs exact and approximate searches using queries of up to 256 characters. In exact mode nsearch returns the result almost instantly: measured response times for queries up to 70 characters long were less than half a second. Approximate search is slower than exact search; the slowdown is proportional to the allowed error level. Initial results show that queries with up to 30% error can be processed efficiently. The nsearch engine will be available at http://bioinfo.icm.edu.pl/services/nsearch.
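The idea of an n-gram index with position mapping can be sketched in a few lines. The example below is a toy, uncompressed and unsegmented version for exact search only; the function names and the sample text are made up, and it says nothing about how nsearch itself stores or compresses its index.

```python
from collections import defaultdict


def build_index(text, n=3):
    """Map every n-gram of text to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - n + 1):
        index[text[i:i + n]].append(i)
    return index


def exact_search(index, query, n=3):
    """Start positions of query, found by intersecting the positions of
    its n-grams after shifting each by its offset within the query."""
    if len(query) < n:
        raise ValueError("query must contain at least one full n-gram")
    candidates = set(index.get(query[:n], []))
    for off in range(1, len(query) - n + 1):
        positions = index.get(query[off:off + n], [])
        candidates &= {p - off for p in positions}
        if not candidates:
            break
    return sorted(candidates)


text = "acetylsalicylic acid and salicylic acid"
index = build_index(text)
print(exact_search(index, "salicylic"))  # [6, 25]
```

An approximate search would relax the intersection, for example by accepting candidate positions supported by most rather than all of the query's n-grams, at the cost of verifying more candidates.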

Gene ontology based characterization of groups of genes by multi-attribute rules
Aleksandra Gruca a∗

(a) Institute of Informatics, Silesian University of Technology, Gliwice, Poland (∗) [email protected]

Abstract

Interpreting the results of DNA microarray experiments is a difficult and time-consuming process, usually performed by an expert in the field of the experimental design. To support the expert's work, specialized systems are designed for storing, organizing and extracting information on genes, their functions and products. One such system is the Gene Ontology (GO) database, which provides a set of structured, controlled vocabularies (GO terms) for annotating genes and gene products. A typical analysis with the GO database involves performing statistical tests to detect enrichment or depletion of GO terms that describe the analyzed gene group. However, a more complex analysis may take into account not only a list of single GO terms but also the relationships among them. For example, analysis of combinations of GO terms may reveal that a particular biological process is associated with a specific cellular component. Thus, information about co-occurrences of GO terms may improve the whole process of experiment interpretation. A method is presented that characterizes gene clusters by decision rules expressed as logical functions of GO terms. The conditional part of a decision rule consists of gene ontology terms. A conjunction of GO terms (those included in the premise of the rule) can describe the biological function of the genes in the group better than a list of single GO terms obtained with standard approaches. The algorithm induces all possible rules with a given (or better) level of statistical significance. The method is extended with an algorithm for filtering decision rules. The filtering algorithm selects rules whose premise includes the largest possible number of attributes representing gene ontology terms from the low levels of the gene ontology hierarchy, which therefore describe more specific gene functions.
The proposed method combines all known approaches to gene ontology analysis. Only the statistical significance of the rule is used as an input parameter of the algorithm; the number of attributes and the level of the gene ontology terms (which are represented as rule attributes) are established from the significance level.
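The following sketch shows how a rule whose premise is a conjunction of GO terms could be scored against a gene group. The abstract does not name the statistical test, so the one-sided hypergeometric test used here is an assumption, and the gene names, term labels and function names are made up for illustration.

```python
from math import comb


def rule_support(premise, annotations):
    """Genes annotated with every GO term in the premise (a conjunction)."""
    return {g for g, terms in annotations.items() if premise <= terms}


def hypergeom_pvalue(n_total, n_support, n_group, n_both):
    """P(X >= n_both) when n_group genes are drawn from n_total genes,
    n_support of which satisfy the rule premise."""
    upper = min(n_support, n_group)
    favourable = sum(
        comb(n_support, i) * comb(n_total - n_support, n_group - i)
        for i in range(n_both, upper + 1)
    )
    return favourable / comb(n_total, n_group)


# Hypothetical annotations: gene -> set of GO term labels
annotations = {
    "g1": {"GO:A", "GO:B"},
    "g2": {"GO:A", "GO:B"},
    "g3": {"GO:A", "GO:B", "GO:C"},
    "g4": {"GO:A"},
    "g5": {"GO:B"},
    "g6": {"GO:C"},
}
group = {"g1", "g2", "g3"}       # the gene cluster being described
premise = {"GO:A", "GO:B"}       # rule: GO:A AND GO:B -> group

support = rule_support(premise, annotations)
p = hypergeom_pvalue(len(annotations), len(support), len(group),
                     len(support & group))
print(sorted(support), p)        # ['g1', 'g2', 'g3'] 0.05
```

Neither single term separates the group on its own here (each also annotates a gene outside it), while the conjunction covers exactly the three group genes.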

Virtual high-throughput screening using machine learning & docking
Dariusz Plewczyński a∗

(a) Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland (∗) [email protected]

Abstract

At the beginning of a typical high-throughput screening (HTS) experiment, some information about small chemical molecules and protein targets is already available. Low-throughput validation studies of the biochemical targets provide sets of active compounds, such as substrate analogues, natural products, inhibitors of a related protein, or ligands published by pharmaceutical companies. Machine learning is a powerful method for organizing this knowledge and guiding the discovery of new inhibitors. Our approach is aimed at: i) reducing the number of compounds to be tested experimentally against a given target, and ii) extending the results of flexible docking experiments, performed on only a subset of a chemical library, in order to select promising inhibitors from the whole dataset. The random forest (RF) method is trained here on compounds from the MDL Drug Data Report (MDDR). The recall values for five selected diverse protein targets are over 90% and the performance reaches 100%. This machine learning method combined with flexible docking is capable of finding 60% of the active compounds for most protein targets by docking only 10% of the ligand database. Our in silico method is therefore able to scan very large databases rapidly in order to predict the biological activity of small-molecule inhibitors, and it provides an effective alternative to more computationally demanding approaches in virtual HTS.
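The figure of 60% of actives recovered from 10% of the database corresponds to a roughly sixfold enrichment over random selection. The sketch below shows how such a recall can be computed from ranked scores; the scores and activity labels are synthetic and the function name is hypothetical, so it illustrates the metric only, not the authors' random forest or docking pipeline.

```python
def recall_at_fraction(scores, is_active, fraction):
    """Share of all active compounds found in the top-scoring fraction."""
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    found = sum(is_active[i] for i in ranked[:n_top])
    return found / sum(is_active)


# Synthetic library of 30 compounds, already listed from best to worst score
scores = [30 - i for i in range(30)]
active_ids = {0, 1, 2, 10, 20}                      # five made-up actives
is_active = [i in active_ids for i in range(30)]

fraction = 0.10
recall = recall_at_fraction(scores, is_active, fraction)
enrichment = recall / fraction                      # relative to random picking
print(recall)                                       # 0.6
print(round(enrichment, 3))                         # 6.0
```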

Looking for chromosome spatial organization rules in microarray gene expression data
Teresa Szczepińska a∗, Krzysztof Pawłowski a

(a) Nencki Institute of Experimental Biology, Warsaw, Poland (∗) [email protected]

Abstract

BACKGROUND. There is increasing interest in how a genome is spatially and temporally organized within the cell nucleus. Evidence supports the idea that basic nuclear functions are structurally integrated. Several studies have demonstrated non-random, gene density- and/or chromosome size-related radial positioning of chromosomes, which may provide an additional level of gene expression regulation. Chromosome positioning is probabilistic rather than deterministic in nature. We have used large sets of human gene expression data and rat hippocampus gene expression data from public repositories to explore relations between gene expression and genomic context.
MATERIAL AND METHODS. We have used several approaches to identify clusters of co-expressed genes. A graph-based data-mining approach efficiently identified frequent co-expression clusters in 105 assembled human microarray datasets. Average linkage hierarchical clustering was applied to a broad dataset of 79 tissue microarray measurements. K-median clustering grouped high-density oligonucleotide gene array time-series measurements from rat hippocampi with kainic acid-induced status epilepticus. The potential transcription modules have been integrated with gene position and density information. Literature relationship based tools have been used for functional analysis.
RESULTS. We identified and functionally annotated distant genomic clusters within co-expression clusters. The number of such groups of genomic clusters is statistically higher than random. Co-expression clusters contain statistically fewer chromosomes than random. Co-expressed genes from different chromosomes are characterized by similar local gene density. In different expression datasets, different chromosomes group within co-expression clusters. This has been observed for similar, medium gene-density, low-density and acrocentric chromosomes.
CONCLUSIONS. Co-expressed genes show inter-chromosomal groups of positional clusters both when analyzing broad data from many tissues and when analyzing data from an individual tissue. Genes from particular chromosomes show higher than random expression correlation. Candidate genes for spatial interactions in rat hippocampi will be further examined by in situ experiments.

Model-based dissection of transcriptional deregulation
Ewa Szczurek a∗, Florian Markowetz b, Irit Gat-Viks c, Przemysław Biecek a, Jerzy Tiuryn a

(a) Institute of Informatics, Warsaw University, Warsaw, Poland (b) Cambridge Research Institute, Cambridge, UK (c) Broad Institute of MIT and Harvard, Cambridge, MA, USA (∗) [email protected]

Abstract

Understanding changes that arise between different cellular conditions is an important step in elucidating mechanisms of disease. For example, one condition could refer to healthy or untreated cells, and the other to tumor or treated cells. In tumor cells, one or more pathway members are often mutated and thus unable to interact with other proteins, which affects the responsive genes. Such a drastic switch in cellular conditions leads to deregulation: differently activated signalling pathways and differently expressed genes. Different activation of the pathway and its downstream targets implies changes in the regulatory control that governs the targets. Here, we propose a pathway-centric approach for dissecting transcriptional deregulation, i.e. changes in the machinery of transcriptional control between different cellular conditions. The signalling pathway itself may change its structure or activation states, e.g. due to treatment or mutations of the pathway members. The approach utilizes prior knowledge of the pathway and perturbation data. We explicitly model changes in the signalling pathway and in the expression profiles of its downstream genes, measured after perturbing pathway members under both cellular conditions. Existing approaches that study interactome deregulation utilize a predefined collection of interactions. Each given interaction is examined for loss or gain of correlation or mutual information between condition-specific expression profiles. Unlike previous methods, we do not require a predefined set of interactions. We use prior pathway information and perturbation data to elucidate both the gene expression deregulation and the underlying changes in regulatory control at the same time. Moreover, we are able to filter out model-independent deregulated regulator-target relations, leaving those that are specific to the system under study.
Our approach is exemplified in a case study on the DNA damage response (DDR) network in humans, which mediates the cellular response to genomic alterations and thus functions as one of the key protections against cancer development. The DNA damage response is known to involve extensive changes in transcriptional control and rewiring of regulatory networks. We find major transcriptional deregulation after exposure of cells to a damaging agent and evaluate the results both statistically and biologically.

Evolution of overlapping genes in Drosophila genomes
Marcin Jąkalski a∗, Izabela Makałowska a

(a) Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Department of Biology, Adam Mickiewicz University, Poznań, Poland (∗) [email protected]

Abstract

Overlapping genes are pairs of different genes whose genomic regions overlap to some extent. The phenomenon is often observed in viral, prokaryotic and also eukaryotic genomes, and it is considered a common strategy of genome organization and gene regulation in bacteria. A growing body of evidence suggests that overlapping genes can regulate key processes of gene expression in Eukaryota, including genomic imprinting, RNA interference, translational regulation and RNA editing. One of the mechanisms proposed to explain the origin of gene overlap is the hypothesis of Keese and Gibbs. It states that overlapping genes are created in an overprinting process, in which new genes are generated from previously existing nucleotide sequences. Consequently, one of the genes in an overlapping pair represents an evolutionarily and phylogenetically young protein-coding gene. In our study we focused on closely related species: 12 species of the genus Drosophila. All analyses were done with reference to a gene set from D. melanogaster. We examined the conservation of overlapping gene pairs and of the single genes that are members of these pairs. Comparative analyses were done at three levels: for all representatives of the genus Drosophila, for other insects (mosquito and bee), and for vertebrates (human, mouse, chicken, zebrafish). Of the 3504 unique overlapping genes originally found in D. melanogaster, the largest number of conserved genes (orthologs) was found in D. yakuba (3022) and the smallest in D. persimilis (2712). Within the group of model organisms, the highest number of orthologs was observed in mosquito (2064) and the lowest in mouse (1147). Of the 2001 overlapping gene pairs found in D. melanogaster, the highest number of conserved gene pairs within the genus Drosophila was observed in D. yakuba (1537) and the lowest in D. grimshawi (1270). For the vertebrates and insects from the group of model organisms these numbers are much lower. The highest number of conserved gene pairs is observed in mosquito (747) and the lowest in chicken (437). All results show that many overlapping genes were formed relatively recently and that these genes are not conserved even within a single genus such as Drosophila. This supports the Keese and Gibbs hypothesis and demonstrates that overprinting is one of the major mechanisms leading to gene overlaps.
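The counts quoted above are easier to compare as fractions of the D. melanogaster reference sets. The short script below only restates the numbers from the abstract as percentages; the variable and function names are made up.

```python
TOTAL_GENES = 3504   # unique overlapping genes in D. melanogaster
TOTAL_PAIRS = 2001   # overlapping gene pairs in D. melanogaster

conserved_genes = {
    "D. yakuba": 3022, "D. persimilis": 2712,
    "mosquito": 2064, "mouse": 1147,
}
conserved_pairs = {
    "D. yakuba": 1537, "D. grimshawi": 1270,
    "mosquito": 747, "chicken": 437,
}


def percentages(counts, total):
    """Conserved counts expressed as a percentage of the reference total."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


gene_pct = percentages(conserved_genes, TOTAL_GENES)
pair_pct = percentages(conserved_pairs, TOTAL_PAIRS)
for name, pct in gene_pct.items():
    print(f"genes conserved in {name}: {pct}%")
for name, pct in pair_pct.items():
    print(f"pairs conserved in {name}: {pct}%")
```

In every species for which both counts are given, the pair-level conservation is lower than the gene-level conservation, which is the pattern the abstract interprets as support for overprinting.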

Predicting nucleosome binding sites in yeast genome
Aleksander Jankowski a∗

(a) Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland (∗) [email protected]

Abstract

Nucleosomes form the fundamental repeating units of chromatin, which packs large eukaryotic genomes into the nucleus while still ensuring appropriate access to the DNA. Chromatin consists mostly of DNA and proteins, and its structure is closely tied to the regulation of gene transcription. In 2009, the first genome-wide maps of nucleosome occupancy in vitro and in vivo were obtained for yeast [Kaplan et al., Nature, vol. 458]. Relying on the experimental data, the authors devised a computational model of nucleosome sequence preferences. The model is based on thermodynamic equilibrium and involves two free parameters, representing nucleosome concentration and inverse temperature. My study is directed towards improving this model. In my talk, I will analyse the impact of the model parameters on the overall performance of prediction. I will compare different ways to estimate the model parameters, and also explain the influence of the individual components on model accuracy.

Testing for heterogeneity in the receptors density Mikołaj Rybiński a∗, Anna Gambin a, Haluk Resat b

(a) Institute of Informatics, University of Warsaw, Warsaw, Poland (b) Pacific Northwest National Laboratory, Richland, WA, USA (∗) [email protected]

Abstract

INTRODUCTION. Cellular membrane receptors can clump together, causing differences in ligand binding affinity. Recent model-based studies of equilibrium binding of epidermal growth factor (EGF) to the EGF receptor (EGFR) have suggested that heterogeneity in EGFR density, due to localization in certain regions of the plasma membrane, may be responsible for experimental data giving a concave-up Scatchard (Rosenthal) plot. In this method, the ratio of bound to free ligand concentration is plotted as a function of bound ligand concentration, which, for the simple ligand-receptor binding reaction model L + R <-> LR, results in a linear relation. PROBLEM. We are interested in checking whether it is possible to perform an experiment that supports or rejects the hypothesis of heterogeneous EGFR density on the membrane of the subject cells (breast cancer cells). In principle, assuming that experimental measurements have uncertainty level σ (e.g. ∼30%), what is the minimum receptor affinity difference δ (e.g. two-fold) that can be reliably measured? Basically, we want to be able to say with high certainty whether we are dealing with two classes of receptors: high density (clumped) and low density (loose).

APPROACH. The experimental measurements are, in fact, not done at equilibrium; thus, we generalize the problem to any given time point during the receptor saturation process. Using a mass action kinetics model of heterogeneous ligand-receptor binding, we express δ as a function of σ. This is achieved by a simple analysis and by numerical simulations of the ordinary differential equations describing the proposed model kinetics. Moreover, we analyze the influence of the system’s intrinsic variations on the reliability of the measurements by taking into account a stochastic description of the model.
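The equilibrium picture behind the Scatchard plot can be illustrated with a minimal sketch. It assumes independent receptor classes at equilibrium, each following L + R <-> LR, so that the bound ligand for class i is R_i F / (Kd_i + F). All parameter values and function names below are hypothetical and are not taken from the study; the time-dependent and stochastic analyses mentioned in the abstract are not covered.

```python
def bound_ligand(free, receptor_classes):
    """Equilibrium bound ligand B for independent receptor classes.

    receptor_classes is a list of (total_receptor, kd) tuples; each class
    follows simple L + R <-> LR binding, so B_i = R_i * F / (Kd_i + F).
    """
    return sum(r_total * free / (kd + free) for r_total, kd in receptor_classes)


def scatchard_points(free_values, receptor_classes):
    """Return (B, B/F) points, i.e. the coordinates of a Scatchard plot."""
    points = []
    for free in free_values:
        b = bound_ligand(free, receptor_classes)
        points.append((b, b / free))
    return points


def segment_slopes(points):
    """Slopes between consecutive Scatchard points (ordered by increasing B)."""
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]


# Hypothetical values in arbitrary concentration units (not from the study).
FREE = [0.01 * 2 ** k for k in range(12)]
ONE_CLASS = [(1.0, 1.0)]                 # single receptor class, Kd = 1
TWO_CLASSES = [(1.0, 0.5), (1.0, 1.0)]   # two classes, two-fold affinity difference

if __name__ == "__main__":
    print("one class  :", segment_slopes(scatchard_points(FREE, ONE_CLASS)))
    print("two classes:", segment_slopes(scatchard_points(FREE, TWO_CLASSES)))
```

With a single class the slope is constant at -1/Kd, which is the linear relation mentioned above. With two classes the slopes become steadily less negative as B grows, giving the concave-up shape; how visible that curvature remains under measurement uncertainty σ is the question the abstract poses.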

Retrogenes: the genesis and evolutionary consequences Joanna Ciomborowska a∗, Damian Szklarczyk a, Wojciech Makałowski b, Izabela Makałowska a

(a) Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Department of Biology, Adam Mickiewicz University, Poznań, Poland (b) Institute of Bioinformatics, Faculty of Medicine, University of Münster, Germany (∗) [email protected]

Abstract

The human genome contains about 19 000 retrocopies of genes generated by reverse transcription. A few hundred of them are functional and are called retrogenes. Despite the number of studies on retrogenes, little is still known about their evolution and function. One unexplored area is the replacement of a parental gene by its intronless retrocopy. The main aim of this project is a comparative analysis of vertebrate genomes to identify cases in which the parental gene was lost and only its retrocopy persists. Using two comparative approaches we analyzed the human and chicken genomes. In the first approach, we looked for all single-exon human genes (UCSC database) and compared them with transcripts of human and chicken multi-exonic genes. After filtering the similarity search results and manual verification we obtained 17 candidates. In the second approach, putative orphan retrogenes were selected based on sequence similarity with chicken multi-exonic genes and the genomic localization of the identified homologous gene pairs. This analysis returned 260 pairs of human retrogenes and chicken orthologs of their parental genes. After additional verification of all results, using annotations from databases, similarity searches, and phylogenetic methods, we identified five functional retrogenes for which there was no parental gene in the human genome. Further analyses showed that these five parental genes were lost in the majority of mammalian genomes. This is the first report of such a phenomenon. Our discovery sheds new light on cross-species differences and the evolution of vertebrate genomes.

Intron gain in mammals Michał Szcześniak a∗, Michał Kabza a, Joanna Ciomborowska a, Izabela Makałowska a

(a) Adam Mickiewicz University, Poznań, Poland (∗) [email protected]

Abstract

Introns are a hallmark of eukaryotic gene organization. Large-scale bioinformatic studies of intron loss and intron gain in mammals have already been performed. However, these analyses focused on the search for intronisation events in multiple-exon genes. It is now believed that there has been no intronisation at all in mammals during the last 100 million years. Our goal was to test this hypothesis taking into consideration a class of very dynamically changing new genes: retrogenes. We performed the analyses for three mammalian species (human, chimp and mouse) using three comparative genomics approaches. In the first approach we created, for each species, sets of two-exon genes, multiple-exon genes and intronless genes (UCSC database). Then we compared (1) intronless genes with one-exon genes and (2) intronless genes with multiple-exon genes (Blast). We filtered out one-exon genes that had high similarity scores in steps (1) and (2) and checked them manually. In the second approach we used a set of human processed pseudogenes and retrogenes (Pseudogene.org database) and performed the same analysis as for intronless genes. Finally, the last approach included a comparison of introns from two-exon genes with one-exon genes (Blast), a search for parental genes, filtering and manual verification. Using these methods we found several candidate genes that possibly represent intronisation through the recruitment of exonic sequences. One of them is RNF113B (human), which contains one alternatively spliced intron. For this gene we performed expression analysis in 16 human tissues using PCR with labeled primers, which showed a tendency for the spliced form to be expressed in testis. We also looked at the evolution of the candidate genes and found that they are a melting pot of gene evolution phenomena. Our findings show that intron gain events take place in mammals, which sheds new light on the evolution of mammalian genomes.

On a certain model of HCV infection Szymon Wąsik a∗

(a) Institute of Computing Science, Poznań University of Technology, Poznań, Poland (∗) [email protected]

Abstract

From earlier research we know that the genetic diversity of an HCV population can be an indicator of the outcome of the commonly used therapy. We investigate the correlation between the genetic diversity of the HCV population and the level of viral RNA accumulation in patient blood. Genetic diversity is defined as the mean Hamming distance between all pairs of virus RNA sequences representing the population. We have found that a low Hamming distance (i.e., low genetic diversity) correlates with a high RNA level. Symmetrically, high diversity corresponds to a low RNA level. We contend that the strength of the obtained correlation justifies the use of the viral RNA level as a forecast of the therapy result. We analyze data on viral RNA levels gathered during a case study in Poznań hospitals. Using statistical analysis we show that the distribution of the data is normal and fill in some missing values. Using this data set we analyze how the viral RNA level in patient blood changes over time. To describe these changes we define transition states and transition probability matrices, which give an overview of how the infection process evolves. To describe the outcome of the therapy we define a therapy efficiency coefficient and propose an algorithm for predicting the efficiency of an established therapy based on the treatment scheme. This algorithm uses the previously defined transition matrices to calculate its result. Finally, we use this algorithm to look for possible ways of improving the efficiency, and we propose that qualifying patients for therapy based on viral RNA level improves its efficiency.
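Two of the quantities described in this abstract can be sketched in a few lines. The first function computes genetic diversity as the mean Hamming distance over all pairs of aligned sequences, as defined above. The second estimates a transition probability matrix from observed sequences of states; the abstract does not say how its states are defined or how its matrices are estimated, so the maximum-likelihood counting used here, the state labels and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(sequences):
    """Genetic diversity: mean Hamming distance over all sequence pairs."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def transition_matrix(trajectories, states):
    """Estimate P(next state | current state) by counting observed transitions.

    Rows for states that never occur as a starting point are left as zeros.
    """
    counts = {s: Counter() for s in states}
    for trajectory in trajectories:
        for current, following in zip(trajectory, trajectory[1:]):
            counts[current][following] += 1
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return matrix


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    population = ["ACGU", "ACGA", "UCGA"]
    print(mean_pairwise_hamming(population))
    rna_levels = [["high", "high", "low"], ["high", "low", "low"]]
    print(transition_matrix(rna_levels, ["high", "low"]))
```

Each row of the resulting matrix sums to one whenever its state was observed at least once as a starting point, which is what allows such matrices to be chained when following the RNA level over successive measurements.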

Association analysis in human diseases: schizophrenia and bipolar affective disease example Małgorzata Maciukiewicz a∗

(a) Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Department of Biology, Adam Mickiewicz University, Poznań, Poland (∗) [email protected]

Abstract

Schizophrenia and bipolar disease are complex mental disorders. Both diseases are highly heritable and are characterized by a complicated etiology. Genes involved in neurodevelopment and neurotransmission, such as DISC1, FAT, and BDNF, are thought to be among those potentially related to elevated risk. Association means that a particular gene variant is present more often in a group carrying a certain trait (e.g. a disease) than in a group without this trait. Association analysis may be performed at the family level or at the population level. Studies at the population level are known as case-control studies. Association between the genotype at a particular locus and a disease may be direct or indirect. Direct association arises when a gene variant is causally related to the disease. Indirect association arises when the gene variant itself is not causal but is in linkage disequilibrium (LD) with a closely located causal variant. When loci are in LD they are co-inherited more often than expected under independent segregation. Results from the analysis of schizophrenia and bipolar affective disease, done in cooperation with Poznan University of Medical Sciences, will be shown. The analysis was performed on a group of 503 schizophrenia patients, 418 bipolar patients and 530 controls, using the Haploview and FamHap software. An analysis of the response to prophylactic lithium treatment in bipolar patients was also performed. In the DISC1 gene a putative association was found for two SNPs: rs1615409 and rs1411776. The strongest association was found in the group of females with schizophrenia. In the combined group of males and females the association was weaker. No significant results were found in the male group alone. The main problem with association studies is the lack of replication of positive results in independent studies. It may be caused by population stratification and/or population admixture. Choosing appropriate groups of cases and controls is crucial for further analysis and must be done before association studies.
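The idea of a case-control association test can be illustrated with a plain allelic chi-square test on a 2x2 table of allele counts. This is a generic textbook sketch, not the procedure implemented in Haploview or FamHap, and the allele counts in the example are invented. The p-value uses the identity that, for one degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2)).

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 d.f., no continuity correction) on allele counts.

    case_counts and control_counts are (allele_1, allele_2) tuples.
    Returns (chi2, p_value, odds_ratio).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / math.prod(margins)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Invented allele counts for one SNP (not data from the study).
    chi2, p, odds = allelic_chi_square(case_counts=(60, 40), control_counts=(40, 60))
    print(f"chi2={chi2:.2f} p={p:.4f} OR={odds:.2f}")
```

A test of this kind says nothing about population stratification or admixture, which is why the choice of case and control groups stressed above has to be settled before any such statistic is interpreted.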

Computational identification and comparative analysis of ARGONAUTE-binding platforms found in plant RRM proteins Wojciech Karłowski a, Andrzej Zieleziński a∗

(a) Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Department of Biology, Adam Mickiewicz University, Poznań, Poland (∗) [email protected]

Abstract

DNA methylation is a conserved epigenetic mark in eukaryotic cells. In plants, DNA methylation can be triggered by small interfering RNAs (siRNAs) through an RNA-directed DNA methylation (RdDM) pathway. siRNAs are loaded onto an Argonaute-containing RISC complex (RNA-induced silencing complex) associated with RNA polymerase V (polV) that targets the de novo DNA methyltransferase DRM2 to RdDM target loci. Recent studies have identified a conserved WG/GW-containing platform, which is crucial for the recruitment of Argonaute proteins to distinct components of the eukaryotic RNA silencing pathways. Here, we report the identification of plant RdDM effectors with the RNA-recognition motif (RRM) containing a C-terminal extension rich in WG/GW repeats. A phylogenetic analysis was performed to investigate the evolutionary relationships among these proteins in representatives of green algae, mosses, monocots and dicots. It showed that the proteins cluster into distinct groups according to the number of RRM copies, which suggests that duplication of the RRM domain occurred early in the evolution of the RRM proteins, before the separation of the relevant plant species. Further analysis revealed that a WG/GW-rich domain was formed independently, during lineage-specific diversification, in RNA-binding proteins that have a glycine-rich domain (i.e. by convergent evolution). We performed a comparative analysis of hydrophilic WG/GW-containing domains and highlighted a specific amino acid composition, which may provide binding sites for AGO proteins during RNAi. We also propose a new, alternative approach for the identification of functional WG/GW-rich domains based on a PSSM method. This study highlights the molecular evolution of the RRM gene family with reiterated WG/GW motifs in plant lineages and indicates particular amino acid residues likely relevant for the AGO-binding properties.
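The abstract does not describe how its PSSM is constructed, so the following is only a generic sketch of the technique: a log-odds position-specific scoring matrix is built from aligned windows of equal length and then slid along a query sequence. The uniform background, the pseudocount, the training windows and the query sequence are all assumptions chosen for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(windows, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length aligned windows.

    Assumes a uniform background frequency of 1/20 for every amino acid.
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all training windows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for position in range(length):
        counts = Counter(w[position] for w in windows)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score_window(pssm, window):
    """Sum of column scores; residues outside the alphabet score 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PSSM")
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    # Invented training windows and query, for illustration only.
    training = ["GWGS", "GWGN", "SWGG", "GWGD"]
    print(best_hit(build_pssm(training), "MKKLAGWGSPPE"))
```

In practice the background frequencies would come from a protein database and the training windows from curated WG/GW-rich domains; both are replaced here by placeholders.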

Comparative modeling of the hydrolase domain from different CLCA proteins Anna Lenart a∗, Krzysztof Pawłowski a

(a) Nencki Institute of Experimental Biology, Warsaw, Poland (∗) [email protected]

Abstract

BACKGROUND. CLCA proteins, until recently believed to be chloride channels, have been characterized as putative hydrolases similar to zincins [1]. Proteins of this family have been identified in vertebrates and also in some other Metazoa as well as in a few bacterial species. It can be hypothesized that a homologue of today’s CLCAs was present in the common ancestor of Metazoa. The specific function of CLCA is not known. MATERIALS AND METHODS. To search for novel CLCA homologs, we ran the BLAST [2] program on the NCBI sequence databases. The results obtained for bacterial sequences and sequences of invertebrate organisms were used as queries to the FFAS03 [3] server for validation of structure prediction. Using an FFAS03-based multiple sequence alignment, models of the hydrolase domain were built using Modeller [4] with the published hCLCA1 model as a template. For some sequences, alternative models were built using Prime [5]. RESULTS. The structures of putative hydrolase active sites in CLCA proteins from different species were compared, including their geometrical and electrostatic properties. Special focus was placed on substituted active sites, where the “classic” HExxH motif was conservatively substituted, e.g. by RExxK or RQxxR. Structures of such sites were analyzed in detail, and phylogenetic analysis was performed in order to predict the functional significance of such “inactive” active sites. CONCLUSION. Structural modeling and phylogenetic studies of substituted enzyme active sites provide insight into evolutionary mechanisms that lead to novel functions, both enzymatic and non-enzymatic. REFERENCES. [1] Pawlowski K et al. PROTEINS: Structure, Function and Bioinformatics 63:424-439 (2006) [2] Altschul SF et al. J. Mol. Biol. 215(3):403-410 (1990) [3] Rychlewski L et al. Protein Sci 9:232-241 (2000) [4] Sali A et al. Methods Mol Biol 426:145-159 (2008) [5] Schrödinger INC http://www.schrodinger.com

Ligand-protein conformation structure prediction Arkadiusz Hoffa a∗, Piotr Łukasiak a, Maciej Antczak a, Jacek Błażewicz a

(a) Institute of Computing Science, Poznań University of Technology, Poznań, Poland (∗) [email protected]

Abstract

One of the most important and difficult problems in biology is finding the final conformation of the 3D structure of a ligand-protein complex. A new approach based on the Elastic Net Method has been proposed to solve this problem. The proposed algorithm tries to dock the ligand molecule into the protein molecule. Because of the complexity of the problem, the Tabu Search metaheuristic has been used. The length of the tabu list depends on the number of amino acids in the ligand molecule. The docking energy is minimized during a dynamic simulation. The energy is calculated as a sum of the distances between ligand and protein. The dynamic simulation finishes when any of the stop criteria is met. The proposed algorithm uses three independent stop criteria: the ligand is in contact with the protein, the maximal number of simulation iterations is reached, the total energy in the last ten steps decreased. Based on the conducted experiments, our algorithm is a useful tool for ligand-protein conformation structure prediction. The algorithm was tested on a representative set of proteins and the obtained results are promising.

DomAn2 – approach for predicting domains boundaries in proteins Maciej Antczak a∗, Piotr Łukasiak a, Maciej Miłostan a, Jacek Błażewicz a

(a) Institute of Computing Science, Poznań University of Technology, Poznań, Poland (∗) [email protected]

Abstract

Understanding the details of the machinery of the human organism has been a great challenge for humanity. Protein structures are usually analyzed at the level of the domain. However, the definition of a domain is not always straightforward. Small structural differences between otherwise similar proteins can have major consequences for the way a protein structure is broken into domains, and even within a domain such differences can alter the way in which its fold (or topology) is perceived. Expert judgment can be used to some extent to overcome these problems, but experts do not always agree. Sequence similarity searching is a crucial step in analyzing newly determined protein sequences. Whereas similarity searching by programs such as BLAST or FASTA allows the inference of homology and/or function in many cases, identification of multidomain proteins is often problematic because their similarities point to various unrelated protein families. The best solution to this problem is the use of pattern databases that store the common sequence patterns of domain groups in the form of consensus representations. Various pattern representation methods are in use, including regular expressions, position-dependent frequency matrices, and hidden Markov models (HMMs). All of these representations are based on multiple sequence alignments. A new method for predicting protein domain boundaries, called DomAn2, has been proposed. The DomAn2 approach predicts protein domains using a combination of information in the form of domain boundary sequence patterns. These patterns are chains twenty amino acids long. All combinations of information are stored in a database specially designed for the DomAn2 method.
The domain boundary sequence patterns are derived from seven domain classification databases: Conserved Domain Database (NCBI), SCOP (Murzin et al.), CATH (Orengo et al.), Dali Domain Dictionary (Holm and Sander), InterPro, UniProt (EMBL-EBI) and Pfam (Sanger Institute). Known protein sequences can be found in the Protein Data Bank (PDB). The DomAn2 approach follows a designed decision tree to reach a final decision about the location of a domain boundary in a protein chain. Based on data from CASP7 and CASP8, the prediction accuracy is higher than 70%. The proposed approach can be successfully applied to the considered problem.

A novel method for tracing polypeptide chains in medium-resolution electron density maps Wojciech Potrzebowski a∗, Janusz M. Bujnicki a

(a) Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland (∗) [email protected]

Abstract

Experimental advances in producing electron microscopy data on structures of macromolecular assemblies have paved the way for theoretical methods that dock atomic models into low-resolution electron density maps. Although considerable progress has recently been made in building pseudo-atomic models based on a combination of cryo-EM and X-ray crystallography, the problem remains unsolved in the absence of structural models for components of the assembly (e.g. without X-ray structures or theoretical models for individual subunits of the complex). We developed a method for tracing polypeptide chains in medium-resolution (up to 8-10 Å) electron density maps, for proteins with known sequence. Our program achieves this goal by reducing the representation of a map to inferred centers of mass of residues, combined with inference of secondary structure elements. The connectivity of the backbone trace is found by means of a threading algorithm. Our method can be applied to infer residue-resolution or pseudo-atomic models for low-resolution electron density maps from X-ray crystallography or cryo-EM.

Modeling of large macromolecular complexes Joanna M. Kasprzak a∗, Wojciech Potrzebowski b, Janusz M. Bujnicki ab, Kristian Rother b

(a) Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland (b) Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland (∗) [email protected]

Abstract

One of the major challenges in structural biology is to determine the structures of macromolecular complexes and to understand their function and mechanism of action. However, compared to structure determination of the individual components, structural characterization of macromolecular assemblies is very difficult. To maximize the completeness, resolution, accuracy, precision, and efficiency of structure determination for large macromolecular complexes, a hybrid computational approach is required that can use spatial information from a variety of experimental methods (such as X-ray, NMR, cryo-EM, SAXS, cross-linking and mass spectrometry) and build models that satisfy all available restraints taken from experimental data. We have developed a method that allows for building and visualizing very low-resolution models of protein complexes, where components are represented as ellipsoids (proteins) and cylinders (nucleic acid helices). Such an approach enables the creation of low-resolution models even for very large macromolecular complexes with components of unknown 3D structure. Model building relies on information about interactions between given particles (e.g. from predictions or from experimental data) and the probability of contacts between them.

Branch and cut algorithm for the RNA degradation problem Agnieszka Rybarczyk a∗

(a) Institute of Computing Science, Poznań University of Technology, Poznań, Poland (∗) [email protected]

Abstract

For the last few years there has been great interest in RNA chain analysis and modeling due to the discovery of its role in the regulation of gene expression. It has turned out that not only large RNAs such as messenger or transfer RNA are responsible for the proper functioning of living organisms. There exist plenty of smaller RNAs, called small regulatory RNAs, which play a crucial role in cellular processes. Small RNAs are produced from larger molecules via enzymatic processing or spontaneous degradation. The nature of the latter mechanism, non-enzymatic hydrolysis, has been investigated in depth, and it was demonstrated that it is an RNA structure-dependent process. In this work, we present a mathematical formulation of a new problem of RNA degradation and a branch and cut algorithm for solving it. We also show laboratory results that make it possible to verify the results of the algorithm.

Prediction of metal-binding sites in RNA structures Anna Philips a∗

(a) Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Department of Biology, Adam Mickiewicz University, Poznań, Poland (∗) [email protected]

Abstract

The interactions of ribonucleic acids with other molecules such as ligands or metal ions play key roles in many biological processes. Mono- and divalent metal ions drive the proper folding of RNA and stabilize its secondary and tertiary structure. Ions take part in the active sites of catalytic RNAs. Furthermore, many different RNAs (e.g. riboswitches) can bind ligands in order to regulate gene expression. Various RNAs are also considered potential targets for small-molecule ligands that may function as drugs. Based on accumulated structural data on RNA complexes with ligands and ions, we are developing predictive tools to assist in the modelling of these interactions, including the prediction of metal-binding sites.

Here, we present a distance- and orientation-dependent statistical potential for the prediction of metal ion-binding sites in a given RNA structure. This potential can be used to predict the localization of ions, e.g. in theoretical models of RNA structure, and will be implemented in our RNA modelling methods.

RNA tertiary structure prediction with ModeRNA Magdalena Musielak a∗, Kristian Rother b, Tomasz Puton a, Janusz M. Bujnicki b

(a) Bioinformatics Laboratory, Institute for Molecular Biology and Biotechnology, Adam Mickiewicz University, Poznań, Poland (b) Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland (∗) [email protected]

Abstract

ModeRNA is a computer program for comparative modeling of RNA structures. With this tool a user can obtain a 3D model of a target RNA molecule based on a homologous template structure from the PDB and a target-template alignment of RNA sequences. ModeRNA can generate a model automatically by introducing nucleotide substitutions as well as by adding fragments missing in the template, choosing linkers with a suitable geometry. Special emphasis was placed on posttranscriptionally modified nucleotides: modifications can be added or removed easily. Moreover, ModeRNA is equipped with many functions useful when working with nucleic acid PDB files for a variety of applications. ModeRNA is controlled by a set of textual commands that can be run as input scripts or as part of a Python program. The program was exhaustively tested by creating 7220 models for known tRNA structures. The ModeRNA software with comprehensive documentation and a tutorial is available at http://iimcb.genesilico.pl/moderna.

CompaRNA — a server for comparison of methods for RNA structure prediction Tomasz Puton a∗, Kristian Rother ab, Janusz M. Bujnicki ab

(a) Laboratory of Bioinformatics, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland (b) Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Warsaw, Poland (∗) [email protected]

Abstract

Here we present the CompaRNA server for the automated and reliable benchmarking of publicly available tools for RNA secondary structure prediction. As the examples of servers such as EVA (Koh et al., 2003) and LiveBench (Bujnicki et al., 2001) have shown, automated and objective testing of different algorithms is a strong stimulus for the development of better prediction methods (in the case of EVA and LiveBench, methods for protein structure prediction). The CompaRNA server was created using the Python programming language (http://www.python.org/) and the Django library (http://www.djangoproject.com/). It monitors the Protein Data Bank database (http://www.pdb.org/) for newly released, experimentally solved RNA structures. The new RNA sequences are downloaded and submitted to a series of third-party bioinformatics prediction methods. Based on the obtained results, performance rankings of the particular methods are created. Currently, CompaRNA evaluates only methods for RNA secondary structure prediction, but we are in the course of implementing a benchmark of fully automated RNA tertiary structure prediction methods and of methods for predicting RNA-binding sites for given protein sequences. The main score used to rank methods for RNA secondary structure prediction is the Matthews Correlation Coefficient, which combines both sensitivity and specificity and is useful for ranking algorithms (Gardner and Giegerich, 2004). The results obtained are immediately displayed online and made available via RSS feeds. The first public release of the server is online and available for testing at http://iimcb.genesilico.pl/comparna/. REFERENCES. Bujnicki J. M., Elofsson A., Fischer D., Rychlewski L. (2001). LiveBench-1: Continuous benchmarking of protein structure prediction servers. Protein Science 10 (2): 352-361. Gardner P. P., Giegerich R. (2004). A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics 5: 140. Koh I. Y., Eyrich V. A., Marti-Renom M. A., Przybylski D., Madhusudhan M. S., Eswar N., Grana O., Pazos F., Valencia A., Sali A., Rost B. (2003). EVA: Evaluation of protein structure prediction servers. Nucleic Acids Research 31: 3311-3315.
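As an illustration of the ranking score mentioned above, the sketch below computes the Matthews Correlation Coefficient for a predicted set of base pairs against a reference set. How CompaRNA itself counts true negatives is not stated in the abstract; treating every other possible pair of positions as a negative is an assumption made here, and the base pairs in the example are invented.

```python
import math


def confusion_counts(predicted_pairs, reference_pairs, sequence_length):
    """Count TP, FP, TN, FN over all unordered pairs of sequence positions."""
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    reference = {tuple(sorted(p)) for p in reference_pairs}
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    all_pairs = sequence_length * (sequence_length - 1) // 2
    tn = all_pairs - tp - fp - fn
    return tp, fp, tn, fn


def matthews_cc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; returns 0.0 when it is undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Invented example: a 10-nucleotide hairpin with three reference pairs.
    reference = [(0, 9), (1, 8), (2, 7)]
    predicted = [(0, 9), (1, 8), (3, 6)]
    print(matthews_cc(*confusion_counts(predicted, reference, 10)))
```

A perfect prediction scores 1, and the score falls as false positives and false negatives accumulate, which is what makes a single number of this kind convenient for ranking methods.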

Author Index

Maciej Antczak, 20, 21
Przemysław Biecek, 13
Jacek Błażewicz, 20, 21
Marcin Borowski, 8
Janusz M. Bujnicki, 22, 22, 24, 24
Joanna Ciomborowska, 16, 17
Piotr Formanowicz, 8
Anna Gambin, 6, 15
Irit Gat-Viks, 13
Piotr Gawron, 7
Tomasz Głowacki, 8
Aleksandra Gruca, 11
Arkadiusz Hoffa, 20
Aleksander Jankowski, 15
Marcin Jąkalski, 14
Michał Kabza, 17
Wojciech Karłowski, 19
Joanna M. Kasprzak, 22
Miron Kursa, 9
Anna Lenart, 19
Łukasz Ligowski, 6, 10
Piotr Łukasiak, 20, 21
Małgorzata Maciukiewicz, 18
Izabela Makałowska, 14, 16, 17
Wojciech Makałowski, 16
Florian Markowetz, 13
Maciej Miłostan, 21
Magdalena Musielak, 24
Krzysztof Pawłowski, 12, 19
Anna Philips, 23
Dariusz Plewczyński, 11
Rafał Pokrzywa, 9
Wojciech Potrzebowski, 22, 22
Tomasz Puton, 24, 24
Haluk Resat, 15
Kristian Rother, 22, 24, 24
Witold Rudnicki, 6, 10
Agnieszka Rybarczyk, 23
Mikołaj Rybiński, 15
Michał Startek, 6
Teresa Szczepińska, 12
Michał Szcześniak, 17
Ewa Szczurek, 13
Damian Szklarczyk, 16
Jerzy Tiuryn, 13
Sławomir Walkowiak, 10
Konrad Wawruch, 10
Szymon Wąsik, 17
Paweł Wojciechowski, 7
Andrzej Zieleziński, 19
