
DanBIF Conference on Molecular Biodiversity, March 11-12, 2004

Venue: Festsal at Geocenter Copenhagen, Øster Voldgade 10, DK-1350 Copenhagen K, Denmark
Organiser: DanBIF, Danish National Node of GBIF, Global Biodiversity Information Facility

Presentations:

Welcome & Introduction: The three main levels of biodiversity (Molecular, Organismic, Ecological); Why is molecular biodiversity important to GBIF/DanBIF?

Professor Henrik Enghoff Director, Natural History Museum of Denmark, University of Copenhagen. email: [email protected]

Abstract: Biodiversity is an immensely complex concept but can be construed as being organised along three main "axes". Organismic biodiversity deals with entire animals, plants, fungi and micro-organisms and includes such aspects as morphology, distribution, phylogeny, naming and the preservation of specimens in collections. Molecular biodiversity focuses on DNA, RNA and proteins and is the main subject of the present conference. Ecological biodiversity is about such things as communities and interactions between species. Although research along each of the three main axes can give wonderful results, it is in the conceptual planes defined by two of the axes, or in the conceptual space defined by all three, that the really exciting discoveries are most likely to be made. GBIF has chosen initially to focus on the organismic level of biodiversity. However, when Denmark presented her bid to host the GBIF secretariat, much emphasis was put on exploring the interaction between organismic and molecular biodiversity. The present conference, arranged by DanBIF, GBIF's Danish node, represents an attempt to stimulate collaboration between molecular and organismic biodiversity researchers.

Session 1 – What is molecular biodiversity? - Presentation of the three molecular levels.

Molecular Biodiversity – more than a GenBank reference?

Professor Peter Arctander Department of Evolutionary Biology, University of Copenhagen, Denmark. email: [email protected]

Abstract: GenBank references generally include taxonomic and bibliographic information, a DNA sequence and often a protein sequence. Comparing sequences from different individuals is useful for characterising relationships, and sequences are thus useful as markers. Over the past fifteen years the use of comparative sequence analysis has been refined to explore a wide spectrum of relationships, from individual relationships to deep phylogenies, and it is now routinely used in most biological and medical disciplines. With the massive increase in data available for molecular studies and the corresponding changes in our views of molecular processes, a more comprehensive classification of molecular diversity is needed.

The human genome sequence has revealed fewer than 25,000 protein-coding genes. In a mouse to human comparison only about 300 protein-coding genes are found to be different. If we compare humans to our closest relative, the chimpanzee, we find about 1% difference between their DNA sequences, and 99% of these differences are found in non-protein-coding regions. Vertebrate genomes have about twice the number of genes found in invertebrates, and this increase is primarily due to duplication of existing genes. Diversity is thus not only due to differences in protein-coding genes or new genes; evidence is accumulating for the role of non-protein-coding DNA. The protein-coding information in the exons comprises about 1.8% of human DNA, while more than half of our DNA is transcribed. Diversity-generating mechanisms such as alternative splicing are estimated to affect 74% of human genes; together with other modifications, the potential variation in protein output is enormous.

The vast majority of the genetic output is, however, non-protein-coding RNA (ncRNA), and it is becoming evident that ncRNA is centrally involved in the regulation and interpretation of genomic information and in the creation of molecular and phenotypic diversity. To fully understand cellular functions, it is important to understand each transcript (protein-coding or not) in the context of a complex regulatory network. The emerging “modern RNA world” and molecular networks are perhaps the most important new research frontiers in molecular diversity studies.

Molecular markers alone, and differences in the DNA of protein-coding genes, are not sufficient to describe molecular diversity. Genomic information, transcript information, protein variation and variation in the networks that drive complex cell function and allow multicellular development are all needed for an understanding of molecular biodiversity. As opposed to the marker-based descriptions, this could be seen more as functional molecular diversity and would thus come closer to uniting genotype and phenotype.

The current impressive effort in collecting data is of paramount importance especially for conservation, nature management and traditional biological sciences. Understanding and characterising molecular diversity is likely also to become important in medicine, molecular engineering and, of course, molecular genetics.

Introtype DNA sequences and what they do not tell us about chemical differentiation and biodiversity.

Professor Jens Chr. Frisvad BioCentrum-DTU, Mycology Group, Technical University of Denmark. email: [email protected]

Abstract: DNA sequences as such are important for establishing molecular clocks and an estimate of the tokogeny of organisms. Zuckerkandl and Pauling (1965) called molecules such as DNA and proteins semantic molecules, because similarities in their sequences of nucleic acids and amino acids, respectively, may give us hints on the phylogeny of species. As such, the chemical world of nucleotides, amino acids, tricarboxylic acid cycle members etc. is common to all prokaryotes and eukaryotes and could be called biouniformity or biosimilarity. Until now most of bioinformatics has dealt with the sequences of , and regulation and quantification of all these basic processes (genomics, transcriptomics, proteomics, metabolomics, fluxomics). This basic apparatus in all organisms can be called the introtype, and the metabolism has been called the primary or general metabolism. Because of some non-lethal mutations in genes for these general processes, data from the introtype may give us estimates on a molecular clock, but in general the introtype tells us nothing about evolution. Despite this, Carl Woese and most other biologists use ribosomal DNA or household gene sequences to establish estimates of phylogenies.

Enter differentiation, which accounts for all the biodiversity on this planet (the extrotype). Differentiation and the ensuing interaction between different organisms, and between organisms and different abiotic conditions on the planet, pave the way for evolution and ecology. For example, animals have a single set of metabolic pathways, while the prokaryotes may account for more than 20 fundamentally different ones to cope with extreme environments. One part of differentiation concerns isolate or individual differences (fingerprint differentiation). According to neo-Darwinists, selection acting on this variation, together with genetic and ecological drift, is the cause of speciation, and in this system populations, races, varieties, subspecies and species merge into each other. One extreme is described by Hubbell (2001), who has suggested a neutral theory (as opposed to the niche assembly theory) where the point mutation mode of speciation is considered very likely and the different systematic levels are like fractals. In contrast, we believe that essential differentiation features (building up the aristotype or the recognome) are of paramount importance for evolution and ecology, backed up by characters based on propagation, nutrition, ecophysiology and resistance. Such functional characters are often invariant, homoplastic or autapomorphic, but ecologically extremely significant, and therefore of use for classification. Classification should be based on such differentiation features, but also on cladification, which is in need of new methods to show symbiogenesis and interactions between species. In the fungal kingdom the so-called secondary metabolites (= extrolites) are of great importance for systematics. Paradoxically, the episemantides (extrolites in our notation) of Zuckerkandl and Pauling (1965) are those that evolution is based on, and hardly the semantides, which are just indirect indicators of history.

- Hubbell SP (2001) The unified neutral theory of biodiversity and biogeography. Princeton University Press, Princeton.
- Zuckerkandl E, Pauling L (1965) Molecules as documents of evolutionary history. Journal of Theoretical Biology 8: 357-366.

Evidence of Hidden Transcriptome and Strategies Employed to Regulate Expression of RNA Transcripts.

Vice President for Biological Sciences, Dr. Thomas R. Gingeras Affymetrix, Inc., Santa Clara, California, USA. email: [email protected]

Abstract: The regulation of gene expression is accomplished at several points, from the initiation of synthesis of RNA to post-translational modification and localization of the protein products. Two early control points where the expression of RNA is regulated are the assembly of factors in a promoter region required for the initiation/repression of RNA synthesis, and the transport of processed (spliced and polyadenylated) RNAs from the nucleus to the cytoplasm. Using high-density arrays which interrogate the non-repetitive sequences of human chromosomes 21 and 22 at 35 base pair resolution, both of these control steps have been investigated. Sites of binding for three transcription factors, c-Myc, Sp-1 and p53, along these chromosomes have been mapped. For Sp-1 and c-Myc a total of 866 high-confidence binding sites have been mapped in a single cell line. Approximately 22% of these sites are located at least 5 kb away from the nearest annotation (exon or EST) and many (25%) are proximal to novel non-coding transcripts discovered using these arrays to map the sites of RNA transcription along these two chromosomes (Kapranov et al., 2002). Half of the Sp-1 binding sites are co-localized with a c-Myc binding site, and at least 37% of the binding sites of these factors have been mapped within the introns and exons of well-characterized genes (Cawley et al., 2004). Finally, maps created to locate the sites of transcription of polyA+ RNA obtained from the nucleus and cytoplasm of the cells studied reveal that many transcripts of these chromosomes are processed (spliced and polyadenylated) but are predominantly located in the nucleus, possibly awaiting transport pending additional signaling. These data describe novel transcription factor binding sites and transcribed regions as well as the regulation of RNA functioning by nuclear localization, possibly via staged release of RNA transcripts for as ways to control the expression of RNA transcription. Strikingly, this organization of sites of transcription of novel non-coding RNAs and binding of transcription factors is also observed in the murine genome.

1. Kapranov, P., S. Cawley, J. Drenkow, S. Bekiranov, R. L. Strausberg, S. P. A. Fodor and T. R. Gingeras (2002) Large-scale transcriptional activity of the human genome revealed in chromosomes 21 and 22. Science 296: 916-919.

2. Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A., Wheeler, R., Wong, B., Drenkow, J., Yamanaka, M., Patel, S., Brubaker, S., Tammana, H., Helt, G., Struhl, K. and Gingeras, T. R. (2004) Mapping of transcription factor binding sites points to wide-spread antisense transcription along human chromosomes 21 and 22. Cell (in press).

Proteomics, why when and how?

Professor Peter Roepstorff Department of Biochemistry and Molecular Biology, University of Southern Denmark. email: [email protected]

Abstract: The advances in DNA sequencing and the rapidly increasing amount of genome sequence data becoming available have supplied the biological sciences with the genetic blueprint for a large number of organisms. However, the genome does not contain information about the further fate of the proteins, e.g. location, modification, interaction. Therefore, the next step is global analysis of the proteins and their variants. Such analyses have recently been termed proteomics, defined as: the analysis of the complete protein complement expressed by a genome or by a cell or tissue type (Wilkins M.R. et al. (1996) Bio/Technology, 14, 61-65). Mass spectrometry is one of the most sensitive analytical techniques that can generate structural information on proteins, and the type of information generated by mass spectrometric analysis is ideal for querying sequence databases. Therefore mass spectrometry has become a key analytical tool in proteomics.

Proteomics studies can be performed on several levels:
- Expression proteomics, i.e., which proteins are expressed, when and where.
- Modification specific proteomics, i.e., which variants are present of the proteins, when and where.
- Interactomics, i.e., which proteins interact with what, when and where.

General proteomics strategies will be described and the potential for using proteomics to assess biodiversity will be discussed.

Session 2 – Linkages between molecular and organismic level biodiversity.

DNA-taxonomy.

Professor Diethard Tautz Institut für Genetik der Universität zu Köln, Köln, Germany. email: [email protected]

Abstract: Identification of species with molecular probes is likely to revolutionize taxonomy, at least for taxa with morphological characters that are difficult to determine otherwise. Among these are the single-cell eucaryotes, such as Ciliates and Flagellates, but also many other kinds of small organisms, such as Nematodes, Rotifers, Crustaceans, mites, Annelids or Insect larvae. These organisms constitute the meiofauna in water and soil, which is of profound importance in the ecological network. Efficient ways of monitoring species identity and abundance in the meiofauna should significantly help to understand ecological processes. We use a DNA taxonomy approach to develop fast assessment tools for complex mixtures of species. One approach is based on the selection of species-specific oligonucleotides that can be used in microarray experiments. Another approach aims at developing cheap and simple sequencing technologies. We believe that these tools will eventually allow species inventories and monitoring to be carried out on a routine basis and at low cost per sample. We have started to generate such data for meiobenthos organisms in lakes.
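The selection of species-specific oligonucleotides mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it simply assumes that a candidate probe is any fixed-length subsequence (k-mer) found in one species' reference sequence and in none of the others. The function names and the toy sequences are hypothetical, and real probe design would also consider melting temperature, near matches and cross-hybridisation.

```python
from typing import Dict, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all subsequences of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def species_specific_oligos(sequences: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """For each species, keep the k-mers that occur in no other species.

    `sequences` maps a species name to one reference sequence
    (a simplifying assumption; real data would hold many per species).
    """
    per_species = {name: kmers(seq, k) for name, seq in sequences.items()}
    specific = {}
    for name, own in per_species.items():
        # Union of the k-mers of every other species.
        others = set().union(
            *(found for other, found in per_species.items() if other != name)
        )
        specific[name] = own - others
    return specific


# Hypothetical toy sequences, for illustration only.
TOY_SEQUENCES = {"species_1": "AAACCC", "species_2": "AAAGGG"}
CANDIDATES = species_specific_oligos(TOY_SEQUENCES, k=3)
```

In the toy data the shared k-mer AAA is discarded for both species, leaving only subsequences that could discriminate between them.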

DNA bar-coding: a new diagnostic tool for rapid species recognition, identification, and discovery.

Director James Hanken Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA. email: [email protected]

Abstract: DNA barcoding is a diagnostic technique in which the base-pair (bp) sequence of a portion of a single gene is used to assist species recognition. It is intended to promote the rapid and inexpensive identification of the estimated 10 million species of eucaryotic life on Earth. Scientific benefits of DNA barcoding include: 1) facilitating species identification, including flagging specimens that may represent species new to a geographic area or new to science; 2) enabling rapid, inexpensive, on-site identifications where traditional methods are inappropriate or impractical; 3) promoting development of portable technology for DNA barcode analysis that can be applied in the field; and 4) providing insight into the natural and evolutionary history of life via quick and reliable species-level identifications.

Single-gene sequence data obtained also will contribute to multi-gene efforts to determine both deep and shallow evolutionary relationships. Potential practical applications lie in many areas, including environmental genomics, biomedicine (identification of pathogens and disease vectors), agriculture (recognition of pest species), environment (identification of species that are endangered or whose trade or exploitation is restricted), national security (remote monitoring for bio-warfare agents), and enhanced bio-literacy among the general public.

Initial efforts to apply DNA barcoding to animals have focused on a 645 bp fragment of the ubiquitous mitochondrial gene, cytochrome c oxidase subunit I (COI). However, COI is not appropriate for use in all animals, and additional pilot studies are needed to select alternate genes for use in such taxa. DNA barcoding is not intended to supplant or otherwise invalidate existing taxonomic practice. It accepts that description of new biological species should be based on multiple characters (data sets), and does not advocate formally linking DNA sequence to species taxonomy. Moreover, barcoding activities should adhere to established professional standards for specimen and data management, including the routine deposition of voucher specimens in institutional collections, and freely accessible electronic databases and specimen images. A "Barcode of Life" consortium is anticipated, with broad international representation, which would establish a vouchered database for barcoded sequences that is linked to other taxonomic and specimen databases.

By providing a simple and convenient molecular diagnostic tool, DNA barcoding is intended to enhance both the identification of existing species and the discovery of new ones, as well as provide other applied benefits to science and society.
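The diagnostic idea described in this abstract can be sketched in a few lines. The sketch assumes that barcode sequences are already aligned to equal length, that similarity is measured as the proportion of differing sites, and that a query further than some threshold from every reference is flagged rather than named. The threshold value, the function names and the toy reference library are all hypothetical; the abstract does not prescribe a particular distance measure or cut-off.

```python
from typing import Dict, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def identify(
    query: str, reference: Dict[str, str], threshold: float = 0.1
) -> Tuple[Optional[str], float]:
    """Return the closest reference species and its distance to the query.

    If even the closest reference is further away than `threshold`
    (an arbitrary value chosen for this toy example), the species is
    reported as None so the specimen can be flagged for closer study.
    """
    species, distance = min(
        ((name, p_distance(query, seq)) for name, seq in reference.items()),
        key=lambda pair: pair[1],
    )
    return (species if distance <= threshold else None), distance


# Hypothetical 20 bp toy barcodes; real COI barcodes are far longer.
REFERENCE = {
    "Species A": "ACGTACGTACGTACGTACGT",
    "Species B": "ACGTTCGTACCTACGAACGT",
}
```

A query that matches no reference closely is returned with a species of None, which corresponds to the "flagging" of possibly new species mentioned in benefit 1) above.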

Phylogeny - Do we really need morphology?

Associate Professor Ole Seberg Department of Evolutionary Biology, University of Copenhagen, Denmark. email: [email protected]

Abstract: Phylogenetic information is pervading all parts of biology, and has proven useful in many fields, such as choosing experimental systems for biological research, tracking the origin and spread of emerging diseases and their vectors, bioprospecting for pharmaceutical and agrochemical products, preserving germplasm, targeting biological control of invasive species, and evaluating risk factors for species conservation and ecosystem restoration.

Molecular data, abundant and inexpensive as they are, have revolutionized phylogenetics, but not diminished the importance of traditional work. The last 20 years have witnessed an exponential increase in the number of papers that deal with molecular phylogeny, whereas the number of, e.g., taxonomic revisions has only increased linearly.

Based on research on the phylogeny of the grass-tribe Triticeae and recent phylogenetic studies in the rushes and sedges the interactions and limitations of the two major data partitions, molecules and morphology, are explored.

Understanding and improving the phylogeographic link between molecular and organismal biodiversity.

Professor Brett R. Riddle Department of Biological Sciences, University of Nevada Las Vegas, USA. email: [email protected]

Abstract: Phylogeography is the discipline that uses genealogical lineages to explore the histories of populations, species, and biotas in a geographic context and primarily within a Late Neogene and Pleistocene timeframe. It is, therefore, designed explicitly to link molecular with organismal diversity.

Yet, phylogeography has to date been underutilized as an approach to discovering and analyzing biodiversity in a conservation context. Phylogeography will become an increasingly useful approach to biodiversity valuation because of its utility in linking microevolutionary with macroevolutionary patterns and processes, thereby creating a broad platform for associating patterns of biotic diversification, distributions, and associations with older geologic to more recent paleoclimatic events in earth history.

Rigorous approaches to revealing the geography of population history (e.g., recent shifts in population structure, size, and location) are advancing rapidly. While phylogeography has not been as successful at addressing rigorously the questions of historical biogeography (the geography of speciation, the contributions of vicariance and dispersal to the assembly of biotas), recent advances in this direction include the linkage of comparative phylogeography with phylogenetic biogeography.

Phylogeography provides data critical to identification of evolutionarily significant units, location of biodiversity hotspots, and reconstruction of Pleistocene refugia and biotic responses to past global climate changes. Growth of comparative data sets in target landscapes of interest will enhance the validity of employing prioritization analyses, e.g., phylogenetic diversity (PD) indices, to the valuation of biodiversity.

I illustrate several of these concepts using the terrestrial vertebrate biota of the warm deserts of North America.
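The phylogenetic diversity (PD) indices mentioned in this abstract can be illustrated with a minimal sketch. It assumes the rooted form of the index, in which the PD of a set of taxa is the sum of the lengths of all branches on the paths from those taxa to the root, each branch counted once. The tree encoding (child mapped to parent and branch length), the function name and the toy tree are hypothetical and are not taken from the abstract.

```python
from typing import Dict, Iterable, Tuple

# A rooted tree encoded as: node -> (parent, length of branch to parent).
Tree = Dict[str, Tuple[str, float]]


def phylogenetic_diversity(tree: Tree, taxa: Iterable[str]) -> float:
    """Sum the branch lengths linking the given taxa to the root.

    Each branch is counted once, however many taxa share it.
    """
    counted = set()
    total = 0.0
    for taxon in taxa:
        if taxon not in tree:
            raise KeyError(f"unknown taxon: {taxon}")
        node = taxon
        # Walk towards the root, stopping at branches already counted.
        while node in tree and node not in counted:
            parent, length = tree[node]
            counted.add(node)
            total += length
            node = parent
    return total


# Hypothetical toy tree: ((A:1, B:1):2, C:4)
TOY_TREE: Tree = {
    "A": ("AB", 1.0),
    "B": ("AB", 1.0),
    "AB": ("root", 2.0),
    "C": ("root", 4.0),
}
```

In the toy tree the two close relatives A and B together capture a PD of 4.0, while A and C capture 7.0, which is the sense in which such an index can rank sets of taxa or areas for prioritization.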

Linking molecular and ecological diversity of microorganisms: pattern and prediction.

Professor David Ward Department of Microbiology, Montana State University – Bozeman, USA. email: [email protected]

Abstract: Distribution of molecular variants along ecological and geographic gradients suggests that hot spring cyanobacterial diversity arose through adaptation and geographic isolation, the same processes thought to mold plant and animal species.

However, conserved loci, like SSU rRNA, underestimate ecological diversity within a community. Higher resolution methods are needed to detect all the ecologically distinct populations (ecotypes) that can be viewed as the basic units from which microbial communities are constructed and that should respond uniquely to changing environmental conditions.

Periodic selection theory predicts that natural selection should lead to patterns such as those we have observed, culminating in the evolution of terminal phylogenetic clades corresponding to distinct ecotype populations.

If periodic selection theory applies to the evolution of microorganisms, it may be possible to distinguish variation within terminal clades (ecotypes) from variation between terminal clades (different ecotypes), provided that methods have sufficient resolution. Thus, it should be possible to understand how genetic diversity is linked to ecological diversity.

Molecular Biodiversity and Evolution of the Archaea.

Professor Roger A. Garrett Institute of Molecular Biology, University of Copenhagen, Denmark. email: [email protected]

Abstract: The fallacy of the prokaryotic-eukaryotic phylogenetic divide has been well documented over the past two decades. Much progress has now been made in characterising the Archaea, roughly estimated as constituting up to 30% of the planet’s biomass, and in positioning them phylogenetically with respect to bacteria and eukarya.

The Archaea were first defined by molecular biologists, and the availability of nucleic acid sequences was probably a prerequisite for their discovery (the initial breakthrough emerged from analysis of oligonucleotide sequences, and their modifications, from T1 RNase digests of 16S rRNA). Although many biologists have remained sceptical about this (almost) exclusively molecular approach to defining phylogeny, there is now a large body of evidence to support it.

In my talk I will consider: (a) some of the basic molecular processes of the archaeal cell that are, to a degree, archaea-specific; (b) how diversified these processes are in different archaeal Orders/Kingdoms; (c) what we can learn from the uniquely diverse of the Crenarchaea. I will also give some insights into how comparative genomics is being used to examine diversification and lateral gene transfer amongst the Archaea.

Session 3 – Use-cases, implementation and future application of linking molecular and organismic level biodiversity databases

Data integration and multi-factorial analyses, the yeasts as a case study.

Dr. Vincent Robert Centraalbureau voor Schimmelcultures, Utrecht, the Netherlands. email: [email protected]

Abstract: Culture collections such as the Centraalbureau voor Schimmelcultures (CBS, The Netherlands) have to manage large numbers of strains (the records) and characters (the fields). Until a few years ago, collected data were essentially used for taxonomic purposes. Most of the time, they were not disclosed to the end-users, and only the species name of the strains was available, together with a few additional data such as the depositor, the substrate and the origin of the strain. Only recently have some major collections published some of their scientific data on the Internet. The amount of data available is, however, usually very limited in terms of records, fields or both. Several years ago, we decided to tackle this problem and to create databases that would contain all possible sources of information at the strain and species levels. This task was far from trivial, since we wanted to integrate many different sources of information such as administrative data, bibliography, geography, pictures, nomenclature, morphology, physiology, biochemistry and molecular data (electrophoresis results, sequences, etc.), as well as hyperlinks to other web-based repositories. Proper storage of data in large databases was far from sufficient. From the beginning, it appeared that the conventional search tools of commercially available databases did not fit our needs. Hence, we developed new software called BioloMICS, capable of searching, identifying, classifying and analysing all available data in a polyphasic way. Many new tools and algorithms were programmed, and the current version of this software is fully adaptable to the end-users’ needs (for example, the list of characters and the weighting of characters can be changed dynamically). Data can be published using the CD-ROM and/or the web-based module of the software. Our yeasts database was our first major achievement and contains more than 900 species descriptions as well as 6250 strain records for more than 350 data points. Our fungal and bacterial databases will also soon be released. A few examples of potential applications of such a system will be demonstrated.

Fungal phenotype versus fungal phylogeny: Evolution of fungal genes encoding for secreted .

Professor Lene Lange Molecular Biotechnology, Novozymes A/S, Denmark. email: [email protected]

How to link organismic level and molecular level databases, implementation and potentials.

Programme Officer Donald Hobern Global Biodiversity Information Facility (GBIF), Denmark. email: [email protected]


Posters:

High-rate evolution of Saccharomyces sensu lato chromosomes – the first step towards speciation?

Mário Špírek (a;b), Jun Yang (a), Casper Groth (a), Randi F. Petersen (a), Rikke B. Langkjær (a), Elena S. Naumova (a;c), Pavol Sulo (a;b), Gennadi I. Naumov (a;c), Dorte Joerck-Ramberg (a), Jure Piškur (a). email: [email protected]
a Molecular Biology Group, BioCentrum-DTU, Technical University of Denmark, Building 301, 2800 Lyngby, Denmark.
b Department of Biochemistry, Faculty of Natural Sciences, Comenius University, Mlynská dolina, 842 15 Bratislava, Slovak Republic.
c State Institute for Genetics and Selection of Industrial Microorganisms, I Dorozhnyi 1, Moscow 113545, Russia.

Abstract: The genus Saccharomyces has traditionally been divided into two subgroups according to physiological and morphological observations as well as relatedness to S. cerevisiae. The Saccharomyces sensu stricto complex comprises species closely related to S. cerevisiae, and the sensu lato group comprises species more distantly related to S. cerevisiae. In order to understand the biodiversity, genome dynamics and speciation within the genus, forty isolates belonging to the Saccharomyces sensu lato complex were examined for various molecular and genomic traits. The number and size of chromosomes, as well as the partial sequences of the nuclear 26S ribosomal RNA and mitochondrial small rRNA (SSU) and ATP9 genes, were determined. Several isolates had almost identical nuclear and mitochondrial sequences and presumably belong to the same species. The second class of isolates includes groups exhibiting slight sequence variations that could represent sister species complexes at the beginning of speciation. The third class contains isolates that represent independent species not so closely related to other isolates. The deduced phylogenetic relationships among isolates based on the nuclear and mitochondrial sequences were usually similar, suggesting that horizontal transfer/introgression has not been frequent. The highest degree of polymorphism was observed at the chromosome level. Even isolates that had identical nuclear and mitochondrial sequences exhibited variation in the number and size of their chromosomes. Therefore, it is likely that yeast chromosomes have been frequently reshaped and that the position of genes has been dynamic during their evolutionary history. Recently the role of chromosome translocation in reproductive isolation, and consequently in the origin of new species, has been pointed out. Our analysis suggests that the variability in karyotypes could isolate different strains and may be the first step towards speciation.

Study on Microbial Diversity of Bor Khlueng Hot Spring in Thailand.

Pattanop Kanokratana (a), Supavadee Chanapan (b), Kusol Pootanakit (b) and Lily Eurwilaichitr (a). email: [email protected]
a BIOTEC Center Laboratory, 113 Phahonyothin Road, Klong 1, Klongluang, Pathumthani 12120, Thailand.
b Institute of Molecular Biology and Genetics, Mahidol University, Salaya Campus, Salaya, Phuttamonthon District, Nakhon Pathom 73170, Thailand.

Abstract: The prokaryotic diversity in the Bor Khlueng hot spring in Ratchaburi province, Thailand, was investigated by a culture-independent molecular approach. The Bor Khlueng hot spring is rich in sulfide and is believed to relieve muscle ache and pain. Community DNA was directly extracted from the hot spring soil sediment, and small-subunit rRNA genes (16S rDNA) were amplified by PCR using primers specific for the Archaea and Bacteria domains. The PCR products were cloned and sequenced. For the bacterial rDNA clone library, 200 clones were randomly selected for further analyses. After screening of rDNA clones by RFLP and exclusion of chimeric sequences, 38 phylotypes were obtained. The phylotypes spanned a wide range within the domain Bacteria, occupying eleven major lineages (phyla). Almost a quarter of the clones were classified as Acidobacteria. The other clones were grouped into the Bacteroidetes (19%), Nitrospira (13%), Proteobacteria (12%), Deinococcus-Thermus lineage (11%), Planctomycetes (6%), and Verrucomicrobia (5%). The four remaining phyla, <5% each, were represented by Actinobacteria, Chloroflexi, Cyanobacteria, and the candidate division “OP10”. For the archaeal 16S rRNA gene sequence library, 25 distinct phylotypes were obtained; 17 clones were found to be associated with Crenarchaeota and 8 clones with Euryarchaeota. A great majority of the prokaryotes found in the Bor Khlueng hot spring were unknown. The findings of this study demonstrate that this hot spring is a rich source of novel bacterial and archaeal species.

The identification of a gene encoding a novel starch-degrading enzyme isolated directly from environmental DNA.

Sutipa Tanapongpipat, Tanatchaporn Utairungsee, Kittapong Tang, Lily Eurwilaichitr, Krisana Asano and Kanyawim Kirtikara. email: [email protected] BIOTEC Central Research Unit, National Center for Genetic Engineering and Biotechnology, Thailand Science Park, 113 Paholyothin Rd, Klong 1, Klong Laung, Pathumthani, 12120, Thailand.

Abstract: PCR fragments were amplified from soil DNA collected from the Bor Khlueng hot spring in Thailand using degenerate primers designed on the basis of conserved regions of prokaryotic starch-degrading enzymes. Nucleotide sequencing of the DNA inserts obtained confirmed that these fragments harboured partial gene sequences encoding consensus regions of amylolytic enzymes. The differences between these sequences and those of known bacteria suggested the possibility of acquiring novel starch-degrading enzymes from the hot spring. The partial DNA sequence of one clone, BK13, was further used to identify a complete gene by genomic walking. A nucleotide sequence of 1,857 bp encoding an ORF with a putative size of 619 amino acids was obtained. Using BLAST analysis, this amino acid sequence showed 48% similarity to neopullulanase of Thermoactinomyces vulgaris and Bacillus stearothermophilus and 47% similarity to maltogenic amylase of Thermus sp. IM6501. This work has shown that by using a PCR-based cloning technique, it is possible to isolate genes from uncultured bacteria that differ largely from those of cultured ones. This technique could be used to identify novel enzymes with unique characteristics for potential industrial applications.
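The two sizes reported in this abstract can be checked against each other with simple arithmetic, since each amino acid is encoded by one codon of three nucleotides. The sketch below is only a consistency check; the function names are hypothetical, and the abstract does not state whether the stop codon is counted in the 1,857 bp, so that is exposed as an explicit assumption.

```python
CODON_LENGTH = 3  # nucleotides per codon


def codon_count(orf_length_bp: int) -> int:
    """Number of whole codons in an open reading frame of the given length."""
    if orf_length_bp <= 0 or orf_length_bp % CODON_LENGTH != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_length_bp // CODON_LENGTH


def residue_count(orf_length_bp: int, includes_stop_codon: bool) -> int:
    """Amino acids encoded, optionally discounting a terminal stop codon.

    Whether the reported length includes the stop codon is an
    assumption the caller has to make; the abstract does not say.
    """
    codons = codon_count(orf_length_bp)
    return codons - 1 if includes_stop_codon else codons
```

Under the assumption that the stop codon is not counted, 1,857 bp corresponds to 1,857 / 3 = 619 codons, which agrees with the putative 619 amino acids given above.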

Phylogeny of snailfishes (Scorpaeniformes: Liparidae) from Atlantic, Arctic and Pacific waters based on morphological and molecular data.

Steen W. Knudsen (a,b), Peter R. Møller (a) and Peter Gravlund (b). email: [email protected] a Zoological Museum, University of Copenhagen, Universitetsparken 15, Copenhagen DK-2100, Denmark. b Department of Evolutionary Biology, University of Copenhagen, Universitetsparken 15, Copenhagen DK-2100, Denmark.

Abstract: An attempt to reveal the phylogeny within the family Liparidae is made by analysing molecular and morphological data from 24 species of Liparidae (ingroup) and 4 species of Cyclopteridae (outgroup). The molecular data consist of partial sequences of two mtDNA genes, 16S and cyt b: 525 bp of 16S and 448 bp of cyt b were sequenced. The morphological data comprise 77 character traits.

The family Liparidae is probably the most species-rich family in Greenlandic and Arctic waters (15 described taxa and several undescribed), but it is also one of the least examined families with regard to taxonomy, phylogeny and biogeography.

Liparidae are found from shallow waters down to 7500 m, mainly in the Northern Pacific, the Arctic, the North Atlantic and around Antarctica, and a few have been reported from subtropical and tropical deep waters. The aim of this study is to elucidate the phylogenetic relationships among 5 North Atlantic and Arctic genera and 6 East Pacific genera, and their relationship to 3 genera of Cyclopteridae, using molecular and morphological data. The close relationship between Liparidae and Cyclopteridae has been verified in earlier phylogenetic studies, and Cyclopteridae is therefore designated as the outgroup. Both Liparidae and Cyclopteridae possess ventral fins modified into a sucking disc, which is assumed to have been lost secondarily in pelagic species. Two attempts have been made to reconstruct the phylogeny within Liparidae from morphological data, but none has so far used molecular data. This study provides the first genetic analysis of partial sequences of two genes from Liparidae and Cyclopteridae. It reveals that the genus Psednos is related to Paraliparis and Rhodichthys. Two most parsimonious molecular trees favour the genus Nectoliparis as more closely related to the genus Liparis, indicating that the sucking disc has been lost twice. Apparently the genus Rhodichthys should not be included in Paraliparis, but should be regarded as a monotypic genus more closely related to the genus Psednos.

Galaxie - addressing sequence identity through automated phylogenetic analysis.

Nilsson RH, Larsson K-H, Ursing BM. email: [email protected] Botanical Institute, Göteborg University, Sweden.

Abstract: BLAST is a good tool for sequence identification. It is extremely fast and gives a rapid view of what is present in the public databases and in the ectomycorrhizal database UNITE (see poster on UNITE). BLAST is, however, less well suited to assessing previously unencountered sequences. Other analysis tools, notably phylogenetic inference, typically retrieve more information from multiple alignments than do similarity searches like BLAST. Galaxie automates phylogenetic identification of sequences for UNITE. It takes an incoming sequence and uses BLAST to retrieve the most similar UNITE sequences, performs a multiple alignment and then carries out a neighbour-joining (NJ) or parsimony search, returning the resulting tree to the user's screen. It can also use pre-aligned matrices rather than aligning on the fly. This feature lets UNITE store multiple alignments for each genus of ectomycorrhizal (ECM) fungi. Every incoming sequence is assessed as to whether it belongs to any of the pre-made alignments; if so, it is aligned to and analysed together with the sequences of that alignment. This gives a very detailed view of the position of a query sequence inside a genus, not least because hand-made alignments tend to be better than those assembled automatically. Galaxie is freely available and open source. While Galaxie was created as a phylogenetic complement to BLAST for UNITE, it is also an independent entity that works on whatever gene and organism group its installer sees fit. More information is available at http://andromeda.botinst.gu.se/galaxiewelcome.html. Reference: Nilsson RH, Larsson K-H, Ursing BM. 2004. galaxie--CGI scripts for sequence identification using automated phylogenetic analysis. (Bioinformatics, in press).
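The workflow described in this abstract (retrieve similar sequences, align, then build a tree) can be illustrated with a deliberately simplified Python sketch. It is not Galaxie's code: a k-mer Jaccard ranking stands in for the BLAST retrieval step, the toy sequences are treated as already aligned, and the sketch stops at the pairwise p-distance matrix from which an NJ search would start. All names and sequences are hypothetical.

```python
from itertools import combinations


def kmers(seq: str, k: int = 4) -> set:
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def top_hits(query: str, references: dict, n: int = 3, k: int = 4) -> list:
    """Rank reference names by k-mer Jaccard similarity (a stand-in for BLAST)."""
    q = kmers(query, k)

    def score(name: str) -> float:
        r = kmers(references[name], k)
        return len(q & r) / len(q | r)

    return sorted(references, key=score, reverse=True)[:n]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict) -> dict:
    """Pairwise p-distances, the usual input to a neighbour-joining search."""
    return {(a, b): p_distance(aligned[a], aligned[b])
            for a, b in combinations(sorted(aligned), 2)}


# Hypothetical toy data, assumed to be already aligned (equal length, no gaps).
references = {
    "ref_A": "ACGTACGTACGTACGT",
    "ref_B": "ACGTACGTACGAACGT",
    "ref_C": "TTTTGGGGCCCCAAAA",
}
query = "ACGTACGTACGTACGA"

hits = top_hits(query, references, n=2)
aligned = {"query": query, **{name: references[name] for name in hits}}
for pair, dist in distance_matrix(aligned).items():
    print(pair, dist)
```

In practice the retrieval, alignment and tree-building steps would be delegated to dedicated programs; the sketch only shows how the three stages hand data to one another.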

UNITE – database providing web based tools for the molecular identification of ectomycorrhizal fungi.

Urmas Kõljalg (1), Karl-Henrik Larsson (2), Kessy Abarenkov (1), Henrik R. Nilsson (2), Ian J. Alexander (3), Ursula Eberhardt (4), Susanne Erland (5), Rasmus Kjøller (6), Ellen Larsson (2), Taina Pennanen (7), Robin Sen (8), Andrew F.S. Taylor (4), Trude Vrålstad (9), Leho Tedersoo (1), Björn M. Ursing (10). email: [email protected] 1 Institute of Botany, University of Tartu, Tartu, Estonia; 2 Systematic Botany, Göteborg University, Sweden; 3 Dept. of Plant and Soil Science, University of Aberdeen, UK; 4 Dept. of Forest Mycology and Ecology, SLU, Sweden; 5 Microbial Ecology, Lund University, Sweden; 6 Botanical Institute, University of Copenhagen, Denmark; 7 Metla, Finnish Forest Research Institute, Finland; 8 Dept. of Biosciences, University of Helsinki, Finland; 9 Dept. of Biology, University of Oslo, Norway; 10 Center for Genomics and Bioinformatics, Karolinska Institutet, Sweden.

Abstract: Identification of ectomycorrhizal fungi is often achieved by comparing ribosomal DNA internal transcribed spacer (ITS) sequences with accessioned sequences deposited in public databases. A major problem is that the annotation of the sequences in these databases is not always complete or trustworthy. To overcome this deficiency, we report on UNITE, an open-access database comprising well-annotated fungal ITS sequences from well-defined herbarium specimens, with full herbarium reference identification data, collector/source, ecological data, etc. UNITE can be searched either by species name or by sequence similarity using blastn. The database also incorporates phylogenetic species recognition. For this purpose the program galaxie, a web-based tool for phylogenetic analyses, is implemented in UNITE (see poster on galaxie). Galaxie uses pre-aligned sequence files to place query sequences inside known taxa. The test version of the UNITE database is available at http://unite.zbi.ee/.

Evolution and biogeography of suboscine birds.

Jan Ohlson, Martin Irestedt and Per G. P. Ericson. email: [email protected] Department of Vertebrate Zoology, Swedish Museum of Natural History, PO Box 50007, S-10405 Stockholm, Sweden.

Abstract: The long-term objective of the project is a) to produce robust estimates of the phylogenetic relationships among suboscine birds, and b) to describe their morphological and behavioural evolution. This is achieved by analysing DNA sequence data obtained from multiple nuclear as well as mitochondrial genes. The research strategy is first to outline the major evolutionary units, after which we proceed to studies of smaller, monophyletic groups that can be sampled more densely taxonomically. Passerine birds are widely used as model organisms in comparative studies of, for example, ecology, ecomorphology and ethology. In these studies, the interpretations often rest on the assumption that the included taxa form a monophyletic group, and that their interrelationships are correctly understood. Our goal is to provide a solid phylogenetic framework within which such assumptions can be assessed. The suboscine passerines comprise about 1100 species traditionally divided into three groups. The Old World Suboscines (Pittas, Broadbills and Asities) consist of only about 50 species, distributed from tropical Africa to Melanesia. The New World Suboscines, consisting of Furnarioidea (Ovenbirds, Woodcreepers, Typical Antbirds, Ground Antbirds, Gnateaters and Tapaculos) and Tyrannoidea (Tyrant Flycatchers, Cotingas and Manakins), are far more species-rich and diverse, with about 500 species each. They constitute an important part of the land bird fauna in every type of habitat in the Neotropics, but only a few genera of Tyrant Flycatchers reach North America. The phylogenetic results of our research largely support the traditional classification into the three main groups, but strongly suggest that the relationships at the family level need to be re-evaluated at several points. The systematic position of some problematic taxa has been clarified, and our studies have produced several novel hypotheses for suboscine relationships. Our studies also support the hypothesis that Passeriformes originated in Gondwana.

A study of clonal relationship of Escherichia coli pathogens using molecular and biochemical tools.

M. F. Anjum, I. Aktan, R. M. la Ragione, M. J. Woodward. email: [email protected] Department of Food and Environmental Safety, Veterinary Laboratory Agency (Weybridge), New Haw, Addlestone, Surrey, KT15 3NB, UK.

Abstract: To improve the current understanding of diversity within pathogenic bacteria, the genomic relatedness between verotoxin (VT)-positive and VT-negative E. coli O26 strains was assessed using DNA microarrays. This approach established the core genes that are central to the E. coli species and identified regions of genomic variation in the pathogenic strains relative to the E. coli K12 chromosome. Whilst the O26 strains showed greater genomic similarity to each other (92.7% similarity) than to any non-O26 strain included in this study, regions of variation were found which differentiated the O26 strains into subgroups. All VT-positive O26 strains lacked a 5.2 kb region (yagP-yagT) which was found by both microarray and PCR to be present in all VT-negative O26 strains. All O26 strains harbouring the haemolysin gene (hlyA) also carried the fimB gene, which regulates expression of type 1 fimbriae. Biochemical studies showed that VT-positive O26 strains were unable to ferment rhamnose, whilst all other 19 strains in this study fermented rhamnose. Further testing with O26 field isolates is currently being undertaken to establish a link between virulence and the different O26 subgroups identified in this study.
