Metagenomics of Cyanobacterial Blooms

Author Pope, Phillip Byron

Published 2007

Thesis Type Thesis (PhD Doctorate)

School School of Biomolecular and Biomedical Sciences

DOI https://doi.org/10.25904/1912/1772

Copyright Statement The author owns the copyright in this thesis, unless stated otherwise.

Downloaded from http://hdl.handle.net/10072/368095

Griffith Research Online https://research-repository.griffith.edu.au

METAGENOMICS OF CYANOBACTERIAL BLOOMS

A Thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy, Griffith University

by Phillip Byron Pope (BSc. Hons)

School of Biomedical and Biomolecular Science, Faculty of Science, Griffith University, Nathan Campus, Queensland, Australia

January 2007

DECLARATION

I declare that this work has not been previously submitted for a degree or diploma in any university. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made in the thesis itself.

Phillip B Pope

ACKNOWLEDGEMENTS

First and foremost I would like to thank Professor Bharat KC Patel for the opportunity to work with him. His continuing advice, support and assurance gave me the confidence and enthusiasm needed to make my PhD candidature as productive and enjoyable as possible.

Many thanks are forwarded to Dr Glenn Shaw and Dr Anthony Greene for their additional support and guidance throughout my candidature. I appreciate the advice, assistance and friendship of Mr Peter Bain, Mr Jamie Nourse, Ms Janet Li Zhang, Mr Matthew Walker-Brown, Dr Kerry Inder, Dr Renee Stirling, Mr Frederich Huynh, Dr Marwan Abu-Halaweh and Dr Lyle McMillen. The visitors from Nanyang Polytechnic, Singapore, Tok Zheny Rony (Alex), Daphne Lim and Wee Yang Hoa, are also thanked for their contribution to this project.

Special thanks must also go to the entire CRC for Water Quality and Treatment “team”, particularly Professor Dennis Mulcahy, who has always been ready to provide words of wisdom and lend a helping hand. Carolyn Bellamy and George Turelli are also thanked for their solid support throughout my candidature, especially for assistance regarding conference travel.

I gratefully acknowledge the financial support from The Australian Research Council Project Grant and the CRC for Water Quality and Treatment for my PhD program, as well as funding provided by Griffith University for conference travel.

Finally I thank my family and friends for their continued interest and enduring support in all that’s been done and what still lies ahead.

PUBLICATIONS AND CONFERENCE PROCEEDINGS

Publications in preparation:

1. Phillip B Pope and Bharat KC Patel. Metagenomics of an Aphanizomenon and Cylindrospermopsis Dominated Toxic Cyanobacterial Bloom. Applied and Environmental Microbiology. In preparation.

2. Phillip B Pope and Bharat KC Patel. Community Composition associated with Cyanobacterial blooms: Analysis of Communities in Lake Samsonvale and Lake Ainsworth. Microbial Ecology. In preparation.

Conference proceedings:

1. Phillip B Pope and Bharat KC Patel (2006). Metagenomic Analysis of a Toxic Cyanobacterial Bloom. 11th International Symposium on Microbial Ecology, 20-25 August, Vienna, Austria. (oral)

2. Phillip B Pope and Bharat KC Patel (2006). Comparative Genomic Analysis of DNA Fragments from a Cyanobacterial Bloom. 12th International Conference on Harmful Algal Blooms, 4-8 September, Copenhagen, Denmark.

3. Phillip B Pope (2006). The Quest for Treasure in a Toxic Green World. Cooperative Research Centre’s Association National Conference, 17-19 May, Brisbane. – National Finalist for ‘Showcasing CRC PhD’s’. (oral)

4. Phillip B Pope (2006). Cyanobacterial Diversity and the Expression of Secondary Metabolites in Environmental Blooms. 5th Biennial Postgraduate Student Conference of the CRC for Water Quality and Treatment, 10-13 July, Melbourne, Australia. – Runner up best presentation. (oral)

5. Phillip B Pope and Bharat KC Patel (2005). Metagenomics of Cyanobacterial Blooms. GRC conference for Applied and Environmental Microbiology, 24-29 July, New London, Connecticut, USA.

ABSTRACT

Cyanobacteria are a diverse and widely distributed group of organisms common in soil and in both marine and freshwater environments. Under favorable conditions they can reproduce explosively, forming dense concentrations called blooms. Freshwater cyanobacterial blooms in particular are commonly associated with toxin production in drinking water supplies and are increasingly becoming a risk to human health. Beyond toxin production, these extremely complex, constantly interacting and changing microbial communities have vast impacts on their surrounding ecosystem. The triggers that initiate bloom formation and/or toxin production remain poorly understood. This stems from the fact that very little is still known about the population structure of cyanobacterial blooms and their function in the real environment.

A greater understanding of the interactions of different microbial populations and their functions in the blooming process leading to toxin production could come from using metagenomics to investigate the genetic and metabolic diversity of the mixed populations rather than the difficult-to-culture cyanobacteria. Two distinct cyanobacterial bloom communities existing in contrasting Australian freshwater lakes were selected and high molecular weight DNA was extracted. PCR-amplified 16S rRNA genes were subsequently cloned and a total of 75 clones from Lake Samsonvale and 50 clones from Lake Ainsworth were examined. Sequences identified belonged to species from 6 different phyla from the Bacterial domain, including Cyanobacteria, Actinobacteria, Firmicutes, Verrucomicrobium, Bacteroidetes, and α-, β- and γ-Proteobacteria. The majority of the bacterial sequences were most closely related to sequences recovered from other freshwater clones or isolates (<80% homology), whilst few were closely related to sequences recovered from soil or marine habitats. In particular, 9% of the total sequences were most closely related to sequences recovered from freshwater lakes that are susceptible to cyanobacterial blooms. A total of 12 novel clusters consisting of 22 sequences were noted, spanning all divisions represented in the analysis. Of these, 7 were found to lack any close relatives, suggesting that sequences in these clusters may be characteristic of bloom events. Preliminary results also indicate that physico-chemical differences in lake character appear to influence the bacterial community composition associated with cyanobacterial blooms.

Bloom communities from Lake Samsonvale demonstrated high levels of toxin-producing Cyanobacteria and uncultured Actinobacteria. These findings were used to justify its selection for further metagenomic analysis to gain insights into the genomes of these and other organisms. DNA was fractionated and used to construct a bacterial artificial chromosome library (CBNPD1) of 2,850 clones which had an average insert size of 27 kb. A PCR-based single-gene polyketide synthase library was constructed in tandem and used as an additional assurance that high quality DNA was being extracted and cloned. Phylogenetic analysis of gene sequences recovered from this library demonstrated an abundance of novel bacterial polyketide synthase genes.

Sequence-based screening of library CBNPD1 was performed to identify clones of interest and provide physiological insight into cyanobacterial blooms. A random BAC-end sequence survey generated 67 sequences (40 kb in total) from 36 randomly selected clones. G+C composition ranged from 33.33 to 72.91%. Fifteen sequence tags (22%) were found most similar to sequences affiliated to genera with no available genome. Another 17 sequence tags (25%) were most similar to sequences affiliated to genera with available genomes; however, similarities were less than 80%. Sequence tags were also found with affiliation to proteins involved in a wide array of cell metabolism processes including amino acid metabolism (e.g. methionine synthase), carbohydrate metabolism (cellulose), inorganic ion metabolism (nitrite/sulfite reductase), and lipid metabolism (fatty acid hydroxylase). A number of genes involved in cell structures (e.g. flagella), DNA processes, energy production (photosynthetic reaction center L subunit) and defense mechanisms (nucleases) were also affiliated to sequence tags. PCR screening of CBNPD1 was used to detect clones containing 16S rDNA to establish a link between physiological and phylogenetic information of uncharacterized microorganisms in cyanobacterial blooms. Screens from 480 clones identified 2 clones containing a 16S rRNA gene. Clone 545 and 578 contained 16S rDNA affiliated to 2 different phylogenetic genera within the division, Pseudomonas and Roseateles respectively.

From library screens, 7 BAC inserts were selected and sequenced to completion, comprising 144 kb of a cyanobacterial bloom metagenome and spanning 3 phyla including Proteobacteria, Actinobacteria and Bacteroidetes. A total of 130 genes were identified and assigned to COG (clusters of orthologous groups of proteins) functional categories. Also identified were many housekeeping proteins spanning the majority of the COG functional groups, as well as physiologically and ecologically important proteins, some of which were examined in more depth. These include a putative phenylacetyl catabolon, a putative RTX toxin, several putative and several putative bacterial transcriptional regulators that are implicated in controlling a wide variety of activities in various biological processes, the most notable being quorum sensing.

This culture-independent experimental approach has provided a phylogenetic snapshot of the cyanobacterial bloom community structure and its physiological functions within the bloom. Moreover, it represents an important biodiversity resource which has already been shown to contain novel biomolecular biodiversity.

TABLE OF CONTENTS

DECLARATION

ACKNOWLEDGEMENTS

PUBLICATIONS AND CONFERENCE PROCEEDINGS

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

ABBREVIATIONS

CHAPTER 1 - INTRODUCTION: A REVIEW OF METAGENOMICS, AND CYANOBACTERIAL BLOOMS

1.1 Advances in microbial ecology

1.2 Metagenomics

1.2.1 Metagenomics defined

1.2.2 Metagenomic analysis of microbial communities: Describing microbial ecology and diversity

1.2.3 Reconstruction of the metagenome

1.2.4 Challenges with metagenomic analysis

1.3 Cyanobacteria

1.3.1 Species diversity and global distribution

1.3.2 Cyanobacterial blooms and factors affecting formation

1.3.3 Cyanobacterial secondary metabolites

1.3.4 Synthesis and regulation of secondary metabolites

1.3.5 Genome analysis of Cyanobacteria

1.4 Aims

CHAPTER 2 - MATERIALS AND GENERAL METHODS

2.1 Research plan

2.2 Reagents and chemicals

2.3 Buffers

2.4 Microbial media

2.5 Sample sites and Collection

2.6 DNA Techniques

2.6.1 DNA plug preparation

2.6.2 DNA plug lysis

2.6.3 Agarose gel electrophoresis

2.6.4 Pulsed field gel electrophoresis

2.6.5 Digestion of HMW DNA

2.6.6 GELase gel digestion

2.6.7 Ligation reactions

2.6.8 Dialysis

2.6.9 Electroporation of E. coli

2.6.10 Small scale plasmid purification

2.6.11 Large scale plasmid purification

2.6.12 Restriction endonucleases and DNA-modifying enzymes

2.6.13 Polymerase chain reaction

2.6.14 Automated DNA sequencing

2.6.15 pGEM-T easy PCR cloning

2.7 Computational analysis of DNA sequences

2.7.1 BioEdit Sequence editor

2.7.2 BLAST searches

2.7.3 Ribosomal Database Project II

2.7.4 Bacterial Annotation System

2.7.5 KEGG2

2.7.6 InterProScan protein signature database search

2.7.7 Treecon phylogenetic trees

CHAPTER 3 - COMMUNITY COMPOSITION ASSOCIATED WITH CYANOBACTERIAL BLOOMS

3.1 Introduction

3.2 Experimental procedures

3.2.1 Sample collection and HMW DNA extraction

3.2.2 PCR amplification and cloning

3.2.3 Sequence editing and phylogenetic analysis

3.3 Results

3.3.1 Cyanobacteria

3.3.2 Heterotrophic comparative analysis

3.3.2a Actinobacteria

3.3.2b Proteobacteria

3.3.2c Others

3.4 Discussion

3.4.1 Bloom species composition

3.4.2 Heterotrophic bacteria comparative analysis

3.4.3 Sample selection for metagenomic library construction

CHAPTER 4 – METAGENOMIC LIBRARY CONSTRUCTION

4.1 Introduction

4.2 Experimental procedures

4.2.1 Partial restriction digestion and size selection of random HMW DNA fragments

4.2.2 Construction of a cyanobacterial bloom metagenomic library

4.2.3 Insert size estimation

4.2.4 PCR amplification, single-gene library construction and library analysis

4.3 Results

4.3.1 Metagenomic library construction

4.3.2 Metagenomic library quality: insert size

4.3.3 Metagenomic library quality: PKS gene survey

4.4 Discussion

CHAPTER 5 – METAGENOMIC LIBRARY SCREENING

5.1 Introduction

5.2 Environmental procedures

5.2.1 BAC-clone 'pooling' and plasmid extraction

5.2.2 Digestion of Chromosomal DNA

5.2.3 PCR amplification

5.2.4 BAC-end sequencing

5.2.5 Computational analysis of DNA sequences

5.3 Results

5.3.1 PCR-based library screening

5.3.2 Random sequence library screening

5.4 Discussion

CHAPTER 6 – GENOMIC ANALYSIS OF 7 DNA FRAGMENTS FROM CYANOBACTERIAL BLOOM ASSOCIATED BACTERIA

6.1 Introduction

6.2 Environmental procedures

6.2.1 BAC DNA purification and sequencing of insert DNA

6.2.2 Computational analysis of sequencing data

6.3 Results

6.3.1 Gene finding

6.3.2 The Roseateles genomic fragment

6.3.3 Low GC% genome fragment analysis

6.3.4 Mid-high GC% genome fragment analysis

6.4 Discussion

6.4.1 Phylogenetic assignment

6.4.1a Phylogenetic anchors: 16S rDNA

6.4.2 Functional assignment

6.4.3 Response regulators

6.4.3a Bacterial regulatory protein, LuxR

6.4.3b Two-component regulatory system lytR/lytS

6.4.5 Alcohol oxidoreductases

6.4.6 A putative RTX toxin

6.4.7 degradation: the phenylacetyl-CoA catabolon

CHAPTER 7 – CONCLUSIONS AND FUTURE RESEARCH

7.1 Conclusions

7.2 Future directions

APPENDICES

Appendix I: GenBank accession numbers for single-gene clone libraries

Appendix II: Nucleotide sequence of Clone 67

Appendix III: Nucleotide sequence of Clone 142

Appendix IV: Nucleotide sequence of Clone 543

Appendix V: Nucleotide sequence of Clone 578

Appendix VI: Nucleotide sequence of Clone 905

Appendix VII: Nucleotide sequence of Clone 1664

Appendix VIII: Nucleotide sequence of Clone 2089

REFERENCES

LIST OF FIGURES

Figure 1.1: Metagenomic library construction and analysis

Figure 1.2: Microcystin synthase gene cluster mcy

Figure 2.1: Outline of research plan used to address aims to better understand cyanobacterial bloom population structure and function

Figure 2.2: Images of Lake Samsonvale (A) and Lake Ainsworth (B)

Figure 3.1: Lake Samsonvale Cyanobacteria cell counts

Figure 3.2: Comparison of the composition percentage of 16S rRNA clones

Figure 3.3: Phylogenetic distributions at genus level for rDNA clones affiliated to the phylum Cyanobacteria (Lake Samsonvale)

Figure 3.4: Phylogenetic tree of Actinobacteria-affiliated sequences

Figure 3.5: Phylogenetic tree of α-Proteobacteria-affiliated sequences

Figure 3.6: Phylogenetic tree of β-Proteobacteria-affiliated sequences

Figure 4.1: PFGE size fractionation of partially digested environmental DNA

Figure 4.2: First and second size selection on one gel

Figure 4.3: PFGE of randomly selected environmental BAC clones

Figure 4.4: Size distribution of BAC clones in CBNPD1

Figure 4.5: Gel electrophoresis analysis of PCR products from library quality screens

Figure 5.1: BAC DNA extractions of 12x pooled cultures

Figure 5.2: PCR screening of 48x pools of CBNPD1 clones before and after BAC DNA template treated with plasmid-safe DNase

Figure 5.3: PCR screening of CBNPD1 clones

Figure 6.1: Linear ORF maps of the 7 completely sequenced BAC clones from the Cyanobacterial bloom metagenome library

Figure 6.2: Phylogenetic tree of Roseateles-affiliated 16S rRNA gene

Figure 6.3: Phylogenetic analysis of a D-lactate dehydrogenase protein

Figure 6.4: Domain structure of a putative LuxR response regulator

Figure 6.5: Domain structure of a putative LytR/AlgR response regulator

Figure 6.6: Biochemical organization of the phenylacetyl-CoA catabolon

Figure 6.7: Organisation of a putative phenylacetyl-CoA degradation gene cluster

LIST OF TABLES

Table 1.1: Metagenomic discovery based on activity-based functional screens

Table 1.2: Metagenomic discoveries of homologues of targeted genes

Table 1.3: Selected bioactive products of cyanobacteria and representative producing species

Table 1.5: Available cyanobacterial genomes from the Genomes Online Database (GOLD)

Table 2.1: Comparisons of physical and chemical properties of sample sites; Lake Samsonvale and Lake Ainsworth

Table 3.1: Summary of closest relatives to cyanobacterial bloom 16S rRNA genes

Table 3.2: Summary of physico-chemical characteristics and prevalent Cyanobacteria species present in Lake Samsonvale, Lake Ainsworth and 4 Swedish Lakes used in comparative analysis

Table 4.1: Protein-coding PKS genes

Table 5.1: 16S rRNA sequences obtained from CBNPD1 BAC clones

Table 5.2: CBNPD1 library BAC-end sequencing statistics

Table 5.3: Cyanobacterial bloom derived BAC-end sequence matches

Table 5.4: Summary of BAC-end sequence matches at phylum and genus level

Table 6.1: Summary of the 10 environmental genome fragments analyzed

Table 6.2: Characteristics of the Roseateles genomic fragment (578)

Table 6.3: Predicted RNA and protein coding genes of the Roseateles genomic fragment 578

Table 6.4: Predicted protein coding genes encoded on a low G+C DNA genomic fragment isolated from a cyanobacterial bloom metagenome

Table 6.5: Predicted protein coding genes encoded on medium-high G+C DNA genomic fragments isolated from a cyanobacterial bloom metagenome

ABBREVIATIONS

aa amino acid

amp ampicillin

BAC Bacterial Artificial Chromosome

BASys bacterial annotation system

BLASTn basic local alignment search tool for nucleic acids

BLASTp basic local alignment search tool for proteins

BLASTx basic local alignment search tool for translated nucleotide sequences

bp base pairs

Ch chloramphenicol

COG clusters of orthologous groups of proteins

dH2O deionised water

DNA deoxyribonucleic acid

EDTA ethylenediaminetetraacetic acid

g gram

GOLD genomes online database

hrs hours

IMG Integrated Microbial Genomes

IPTG isopropyl-β-D-thiogalactopyranoside

KEGG Kyoto Encyclopedia of Genes and Genomes

l litre

LB Luria Bertani

LMP low melting point

LPS lipopolysaccharides

M molar

m metre

MW molecular weight

NRPS non-ribosomal peptide synthetases

ORF open reading frame

PKS polyketide synthase

rDNA ribosomal deoxyribonucleic acid

RE restriction enzyme

RNA ribonucleic acid

RNase ribonuclease

rRNA ribosomal ribonucleic acid

SSU small subunit

STE NaCl/Tris/ethylenediamine tetra-acetic acid (buffer)

Str strain

T type strain

TAE tris/acetate/ethylenediamine tetra-acetic acid (buffer)

Taq Thermus aquaticus (DNA polymerase)

TB Terrific Broth

TBE tris/boric acid/ethylenediamine tetra-acetic acid (buffer)

TE tris/ethylenediamine tetra-acetic acid (buffer)

Tris tris(hydroxymethyl)aminomethane

Tris-Cl tris-chloride (buffer)

tRNA transfer ribonucleic acid

UV ultraviolet

V volts

X-Gal 5-bromo-4-chloro-3-indolyl-β-D-galactoside

CHAPTER 1 - INTRODUCTION: A REVIEW OF METAGENOMICS, AND CYANOBACTERIAL BLOOMS

1.1 Advances in microbial ecology

Microbes have dominated life on Earth for most of its 4.5 billion year history. They are the foundation of our biosphere, controlling biogeochemical cycles of elements and affecting geology, hydrology, and local and global climates. In addition, microorganisms possess the highest potential for production of bioactive products, enzymes, polymers and most of the tools used in biotechnology. Though microbes contribute to and exert control over the Earth’s health, they have until recently been overlooked due to their small size and genome simplicity. Understanding microbial interactions in the natural environment is key to the study of complex global issues such as global warming, biodegradation of harmful compounds, and the discovery of new natural products such as antibiotics. The advent of the genomic era, taken together with the inconspicuous but important contributions of microbes in the Earth’s early and modern history, has changed our perception of their importance and prompted the push to better understand their roles in planetary life.

The total number of microbes on Earth is estimated to be 4-6 x 10^30 (Johri, 2005), of which 95-99% have yet to be isolated and are deemed to be the ‘uncultured population’ (Dykhuizen, 1998; Curtis et al., 2002; Hagstrom et al., 2002; Torsvik et al., 2002). The branch of microbiology that studies interactions between living microorganisms and their surrounding environment is called microbial ecology. It forms the foundation for determining the functional attributes of almost every ecosystem on the planet: from industrial bioreactor functioning to human and animal well-being; from pristine marine and terrestrial environments to global warming. Microbial ecology can be broadly studied with two objectives: (1) the first is to develop an understanding of population structures, biodiversity and interactions. The phrases coined to describe such studies include “population census” or “who is out there”; and (2) the second is to determine the functional activities or physiological processes of each population of the microbial community complex and to monitor their effect during nutrient and seasonal fluctuations. The phrases coined to describe these studies include “guild groups” and “who is doing what”.

Microbial ecology traditionally revolved around the methodologies of enrichment cultures and pure culture isolation, which are still major parts of the technological repertoire available today. There are, however, inherent difficulties and biases associated with these techniques when studying uncharacterized microcosms. A particular problem is our inability to culture 95-99% of the biodiversity in the laboratory. This failure may be due to the inadequacies of the culture conditions and/or the nutritional nature of the medium ingredients. The cultures that do grow are usually the ones which are not dominant and do not necessarily reflect the dominant guild process. This is analogous to growing “weeds” rather than the desired “flowers”. Recent reports suggest that usually uncultured organisms may be adapted to become culturable (Joseph et al., 2003; Sangwan et al., 2005).

Molecular approaches that encompass culture-independent techniques rather than culture-dependent techniques have recently been used as a means to capture and study the vast diversity of organisms that exist in nature. In 1984 mixed, natural microbial populations were first described in situ using 5S rRNA-based phylogenetic characterisation, initiating a new era of microbial ecology (Stahl et al., 1984). This culture-independent approach detailed the sequencing and analysis of a 5S rRNA clone library generated from the symbiotic microbial communities residing in a tubeworm, Riftia pachyptila; a clam, Calyptogena magnifica; and a bivalve, Solemya velum. In comparison to 5S rRNA, 16S rDNA contains a much larger wealth of phylogenetic information. With improved DNA sequencing technology, 16S rDNA quickly became the dominant phylogenetic marker in microbial diversity studies. In 1987 Carl Woese (Woese, 1987), using comparative phylogenetic analyses with more than 50 16S rDNA sequences, proposed 12 bacterial phyla, all with cultured representatives. Today more than 300 000 16S rRNA genes are available in GenBank and belong to approximately 52 identifiable major lineages, or phyla, within the domain Bacteria (Rappe and Giovannoni, 2003). However, 185 593 or 68% of all 16S rDNA sequences contained in GenBank are from uncultured prokaryotes.

Another issue demonstrating the lack of information regarding microbial diversity is that amongst the phyla that contain cultured representatives, several may contain the greater part whilst the rest contain too few to give a complete spectrum of “who is doing what” within the phylum. Cultured representatives from four commonly named phyla, the Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes, make up as much as 97% of the total representatives in some culture collections (Hugenholtz, 2002) and total 93% of 16S rRNA sequences from cultured representatives in GenBank. The remaining 7% of 16S rRNA sequences from cultured isolates is represented by the remaining 27 phyla, which in the real environment may be more dominant and have bigger ecological impacts than the “big four”. Evidence from several field studies indicates that many of the uncultivated phyla are found in diverse habitats and some are extraordinarily abundant, although virtually nothing is known about their physiology or the ecological roles they play. The phylum Acidobacteria is a noteworthy example, as its representatives are among the most ubiquitous and abundant in nature based on culture-independent studies (Barns et al., 1999; Buckley and Schmidt, 2002). However, of the 2365 Acidobacteria 16S rRNA sequences, only 3 are from cultured isolates, many of which are physiologically divergent, making it difficult to predict many characteristics of Acidobacteria-affiliated organisms detected in the real environment (Hugenholtz et al., 1998; Barns et al., 1999).

1.2 Metagenomics

1.2.1 Metagenomics defined

Studies on the uncultured microbial majority are a challenge facing microbiologists searching for insights into the functional roles played by microbes and their interactions with other microorganisms and microbial populations. In addition to playing an essential role in natural environmental processes, these organisms remain a hugely untapped resource for biotechnological applications and natural product discovery. Technologies such as 16S rRNA sequence analysis have provided detailed information on the phylogenetic diversity of microorganisms present in nature but are predominantly limited to one single molecule, the 16S rRNA gene. 16S rRNA is useful for biodiversity studies (i.e. “who is out there”) as it is a structural gene that acts as a molecular clock (chronometer). However, it is not a gene that can be related to function (i.e. “who does what”) and does not provide information on the physiology, biochemistry and ecological significance of the indigenous microbes. Consequently, studies which use rRNA for phylogenetic and population analyses cannot reveal what these organisms actually do in the natural environment, meaning that this remains largely unexplored territory. With the rapid advances in DNA sequencing technology it has become feasible to sequence an entire microbial genome quickly and efficiently, thus providing invaluable information on the putative physiology or function of a microbe. Nonetheless, there is still a significant reliance on the microorganism being cultured before its genome is sequenced. A new approach augmented with DNA sequencing and in situ analysis has been developed in the last decade to access the total genetic resources or ‘metagenome’ of a microbial community without the need to culture. This approach has been aptly termed metagenomics (Rondon et al., 2000).

Metagenomics describes the culture-independent functional and sequence-based analysis of the collective microbial genomes represented in an environmental sample. This approach is essentially based on cloning and/or shotgun sequencing random fragments of DNA isolated from microorganisms inhabiting natural environments. The stages involved in metagenomic library construction and screening are outlined in Figure 1.1. The vectors used in metagenome cloning can range from small to medium-construct vectors that have high copy number in the host cell to large-construct vectors such as Bacterial Artificial Chromosomes (BAC), Yeast Artificial Chromosomes (YAC), fosmids and cosmids. The advantage of using large-construct vectors is that they maintain very large DNA inserts (up to 300 kb) stably in the host cell, meaning that fewer clones are required to cover the same amount of genetic information and there is a far greater chance of capturing entire gene clusters and pathways in microorganisms that may or may not be culturable. Such vectors also have the ability to express proteins from the promoters of the recombinant genes (Shizuya et al., 1992) (Figure 1.1). This particular aspect of metagenomics has been embraced by numerous studies aimed at expression of secondary metabolites encoded by genes originating from unculturable microorganisms and is seen as a new source for natural-product discovery (Rondon et al., 2000; Courtois et al., 2003; Martinez et al., 2004). Importantly, by cloning large fragments, sequence information flanking such genes can be obtained, potentially providing the phylogenetic affiliation of the organism from which the gene was derived. To date, libraries constructed from various environments, including soil and marine habitats, have been used to study single genes such as cellulases (Healy et al., 1995), gene clusters and pathways such as antibiotic synthesis pathways (Rondon et al., 2000), uncultured organisms such as deep sea Archaea (Moreira et al., 2004) and whole communities (Venter et al., 2004).
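The relationship between insert size and the number of clones required can be illustrated with the standard Clarke and Carbon library-coverage estimate, N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size and P is the desired probability that a given locus is present in the library. The sketch below is illustrative only and is not a calculation taken from this work: the function name `clones_needed` and the 4 Mb genome size are assumptions, and the formula treats a single genome, so for a mixed community of unequally abundant organisms the result should be read as a lower bound.

```python
import math


def clones_needed(genome_size_bp: float, insert_size_bp: float, probability: float) -> float:
    """Clarke-Carbon estimate of the clones needed for a locus to be
    represented in a library with the given probability.

    Hypothetical helper for illustration; assumes random, uniformly
    distributed inserts from a single genome.
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    if not 0.0 < insert_size_bp < genome_size_bp:
        raise ValueError("insert size must be positive and smaller than the genome")
    fraction = insert_size_bp / genome_size_bp  # f: share of the genome per clone
    # N = ln(1 - P) / ln(1 - f); log1p(-f) evaluates ln(1 - f) accurately for small f
    return math.log(1.0 - probability) / math.log1p(-fraction)


if __name__ == "__main__":
    GENOME_BP = 4_000_000  # assumed size of one bacterial genome, for illustration only
    # 27 kb is the average insert size reported for library CBNPD1;
    # 300 kb is the upper insert size quoted for large-construct vectors.
    for insert_bp in (27_000, 300_000):
        estimate = clones_needed(GENOME_BP, insert_bp, 0.99)
        print(f"{insert_bp // 1000} kb inserts: about {math.ceil(estimate)} clones")
```

Under these assumptions the estimate falls by roughly an order of magnitude when moving from 27 kb to 300 kb inserts, which is the practical meaning of "fewer clones are required" for large-construct vectors.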

1.2.2 Metagenomic analysis of microbial communities: Describing microbial ecology and diversity

The potential of metagenomics lies in accessing the genomes of as yet uncultured organisms in nature (Beja et al., 2000a), possibly establishing links and hypotheses between the phylogenetic information of a microorganism and its specific functions and ecological properties within its ecosystem. From an industrial perspective, metagenomics provides access to what has long been an untapped resource: the enormous range of pharmaceuticals and other biocatalysts that may result from the chemistry of these as yet uncultured microorganisms. Three approaches have been taken thus far to identify genes, pathways and activities of interest: function-driven screening, sequence-driven screening and whole shotgun sequencing.


Figure 1.1: Metagenomic library construction and analysis

An illustration outlining the various stages for construction and screening of a metagenomic library from a cyanobacterial bloom. (Modified from: Handelsman, 2004).

Function-based screenings of a metagenomic library for a desired phenotype are initiated using activity screens and assays to detect expression. Functional screens of metagenomic libraries have to date identified a variety of enzymes and biocatalysts (summarized in Table 1.1) including new antibiotics (Gillespie et al., 2002), hydrolytic and degradative enzymes (Henne et al., 1999; Henne et al., 2000), biosynthetic functions (Knietsch et al., 2003c), antibiotic resistance enzymes (Courtois et al., 2003) and membrane proteins (Majernik et al., 2001). With this approach there is the added possibility that sequencing the flanking DNA may reveal genes that can be used to define the phylogenetic affiliation of the organism from which the DNA was isolated. Linking phylogenetic information to new functions of that organism could assist in physiological and ecological inferences, which may lead to culturing strategies for uncultured microorganisms.

Sequence-based analysis involves either direct targeting of a sequence of interest or random generation of sequence information from clone-ends, which once annotated can be used to indicate the genomic potential of the insert. Direct targeting of specific sequences entails using a conserved DNA sequence to design hybridization probes or PCR primers to screen metagenomic libraries for clones carrying a gene of interest. Genes targeted often include phylogenetic anchors such as 16S rDNA (Table 1.2). Sequencing the entire clone once an anchor is found provides functional information about the organism from which the DNA originated. Significant discoveries have resulted from using such approaches to study microorganisms in natural environments, the most dramatic involving the sequencing of a clone isolated from seawater (Beja et al., 2000a). This clone, initially identified because it carried a bacterial 16S rRNA gene, revealed a gene with high similarity to a rhodopsin gene, a gene encoding

Table 1.1: Metagenomic discoveries based on activity-based functional screens

| Function/activity of interest | Substrate/assay medium | Habitat | Average insert size (kb) | Reference |
|---|---|---|---|---|
| Esterase/lipase | Tributyrin | Forest soil | 8 | Lorenz and Eck, 2005 |
| Esterase/lipase | Tributyrin | Forest soil | 40 | Lorenz and Eck, 2005 |
| Esterase/lipase | Tributyrin | Sandy ecosystem | 30 | Lorenz and Eck, 2005 |
| Esterase/lipase | Tributyrin | Sandy ecosystem | 40 | Lorenz and Eck, 2005 |
| Esterase/lipase | Tributyrin | Soil | 6 | Henne et al., 2000 |
| Esterase/lipase | Triolein | Soil | 6 | Henne et al., 2000 |
| Esterase/lipase | Bacto Lipid | Soil | 27 | Rondon et al., 2000 |
| Oxidation of polyols | 1,2-ethanediol; 1,2-propanediol; 2,3-butanediol | Soil | 3 | Knietsch et al., 2003c |
| Alcohol | Glycerol/1,2-propanediol | Soil/enrichment | 4 | Knietsch et al., 2003b |
| Amidase | D-phenylglycine-L-leucine | Soil/enrichment | 5 | Gabor et al., 2004 |
| Amylase | Starch | Soil | 27 | Rondon et al., 2000 |
| Biotin production | Biotin deficient medium | Excrement enrichment | 35 | Entcheva et al., 2001 |
| Protease | Skimmed milk | Soil | 10 | Gupta et al., 2002 |
| Cellulase | Carboxymethyl-cellulose | Sediment enrichment | 6 | Rees et al., 2003 |
| Chitinase | Methylumbelliferyl-diacetylchitobioside | Seawater | 5 | Cottrell et al., 1999 |
| Dehydratase | Glycerol | Sediment enrichment | 4 | Knietsch et al., 2003a |
| 4-Hydroxybutyrate conversion | 4-Hydroxybutyrate | Soil | 6 | Henne et al., 1999 |
| β-lactamase | Ampicillin | Sediment | 32 | Song et al., 2005 |

Table 1.2: Metagenomic discoveries of homologues of targeted genes.

| Environment | Insert size (kb) | Gene of interest | Reference |
|---|---|---|---|
| Marine | 80 | 16S rDNA, Photosystem II | Beja et al., 2000b; Zeidner et al., 2003 |
| Marine | 40 | Archaea 16S rDNA | Beja et al., 2002 |
| Marine (500 m depth) | 35 | Archaea 16S rDNA | Moreira et al., 2004 |
| Marine sponge symbionts | 38 | Poribacteria 16S rDNA | Fieseler et al., 2006 |
| Antarctic coastal water | Fosmid | 16S rDNA | Grzymski et al., 2006 |
| Marine | 20 | 16S rDNA | Schmidt et al., 1991 |
| Polychaete symbionts | Fosmid | 16S rDNA | Campbell et al., 2003 |
| Marine | 50 | Polyketide synthase | Courtois et al., 2003 |
| Marine | BAC | Proteorhodopsin | de la Torre et al., 2003 |
| Sediment | Fosmid | Archaea 16S rDNA | Hallam et al., 2003 |
| Tubeworm symbiont | Fosmid | Histidine protein kinase | Hughes et al., 1997 |
| Soil | 27 | 16S rDNA | Rondon et al., 2000; Liles et al., 2003 |
| Marine | 35 | 16S rDNA | Lopez-Garcia et al., 2004 |
| Beetle symbionts | Fosmid | Polyketide synthase | Piel, 2002 |
| Beetle and sponge symbionts | Cosmid | Polyketide synthase | Piel et al., 2004 |
| Soil | 35 | 16S rDNA | Quaiser et al., 2002 |
| Soil | 33 | Acidobacteria 16S rDNA | Quaiser et al., 2002 |
| Sponge symbionts | Fosmid | RadA | Sandler et al., 1999 |
| Sponge symbionts | 40 | Archaea 16S rDNA | Schleper et al., 1998 |
| Freshwater | 40 | Archaea 16S rDNA | Schleper et al., 1997 |
| River biofilm | 40 | Hybridization analysis | Sebat et al., 2003 |
| Marine | 40 | Archaea 16S rDNA | Stein et al., 1996 |

a light-driven proton pump that was once thought limited to the domain Archaea. Subsequent expression of this gene in E. coli further demonstrated the power of this approach to link phylogeny to function. An alternative to targeting phylogenetic anchors is screening for genes that encode conserved regions of sequence associated with a particular function. Several metagenomic studies have targeted and identified novel polyketide synthases (PKS) and peptide synthases (PS) from soil (Courtois et al., 2003) and a bacterial symbiont of a beetle (Piel, 2002). These genes, which are often arranged in large gene clusters, are implicated in the production of secondary metabolites such as antibiotics. Work by Piel (2002) identified and sequenced an entire 54 kb PKS/PS gene cluster implicated in the biosynthesis of pederin, a compound that has antitumor properties.
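As a rough illustration of the direct-targeting idea described above, the following Python sketch scans a set of aligned sequences for windows that are identical in every sequence, which is the kind of region a hybridization probe or PCR primer would be designed against. The toy alignment, the function name `conserved_windows` and the window width are hypothetical and are not taken from any of the cited studies. Real primer design would also weigh melting temperature, degeneracy and specificity.

```python
def conserved_windows(aligned_seqs, width):
    """Return start positions of windows identical across all aligned sequences."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    # A column is conserved when every sequence carries the same non-gap base.
    conserved = [
        len({s[i] for s in aligned_seqs}) == 1 and aligned_seqs[0][i] != "-"
        for i in range(length)
    ]
    return [
        start
        for start in range(length - width + 1)
        if all(conserved[start:start + width])
    ]


# Hypothetical toy alignment (illustrative only, not real 16S rDNA data).
toy_alignment = [
    "ACGTGCCAGCAGCCGCGGTAATAC",
    "TCGTGCCAGCAGCCGCGGTAATTC",
    "ACCTGCCAGCAGCCGCGGTAAGAC",
]
starts = conserved_windows(toy_alignment, width=12)
print(starts)
```

Each returned position marks a candidate 12-base target that is shared by every sequence in the alignment. Positions that differ between sequences (here the first few and last few columns) are excluded automatically.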

The third approach, whole-genome shotgun sequencing, entails mass sequencing and annotation of the information contained within a community metagenome, with the objective of reconstructing the genomes of community members. The largest whole-genome shotgun sequencing project to date was undertaken by Craig Venter and colleagues (Venter et al., 2004), who attempted to sequence the entire metagenome of the Sargasso Sea near Bermuda. Over 1 billion base pairs of nonredundant sequence were generated, annotated and analyzed to elucidate the gene content, diversity and relative abundance within the microbial populations. They estimated that these data, identifying 1.2 million previously unknown genes, were derived from at least 1800 genomic species, including 148 previously unknown bacterial phylotypes.

1.2.3 Reconstruction of the metagenome

With the advent of metagenomic techniques and improved sequencing technology it has become feasible to sequence the collective genomes, or metagenome, of the members of a microbial community. Most environmental communities are far too large and complex for this to be possible with current sequencing and annotation technologies, although with simpler communities there is the possibility of metagenome reconstruction. To date there have been reconstruction attempts for viral communities in the ocean and the human gut (Breitbart et al., 2002; Breitbart et al., 2003), members of a natural acidophilic biofilm (Tyson et al., 2004) and communities of nutrient-limited sea water (Venter et al., 2004). In the case of the acidophilic biofilm, Tyson et al. (2004) demonstrated the culture-independent recovery of two near-complete genomes and the partial recovery of three others, revealing pathways for carbon and nitrogen fixation and energy generation and providing insights into survival strategies in an extreme environment.

1.2.4 Challenges with metagenomic analysis

With the rapid development of metagenomics there seem boundless possibilities although as with most emerging technologies there are various challenges and limitations. Issues include phylogenetic anchors, the complexity of microbial communities and the limitations of screening techniques. At present there is a trend to use 16S rDNA as a phylogenetic anchor, although the variance of rrn operons in different species (Klappenbach et al., 2000) makes it somewhat undesirable as it leads to bias and difficulty in assigning phylogenetic affiliation to genes and pathways without genome reconstruction.

The sheer complexity of most microbial communities makes any DNA manipulations without bias technically challenging. An example that illustrates such problems is detailed thoroughly by Riesenfeld et al. (2004), who hypothesize that a library of 500 Gbp in size may be needed to capture the species richness of just 1 ml of seawater and a staggering 10,000 Gbp for the same size sample of soil (Venter et al., 2004).
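To put the library sizes quoted above into practical terms, the short Python sketch below converts them into an approximate number of clones. The library sizes come from the text. The two insert sizes (40 kb and 100 kb) are illustrative assumptions chosen to represent fosmid-sized and BAC-sized inserts, and the calculation ignores overlap between clones, so it should be read as a lower bound rather than a coverage estimate.

```python
def clones_needed(library_size_bp, insert_size_bp):
    """Clones required to hold a library of the given total size (ceiling division)."""
    return -(-library_size_bp // insert_size_bp)


GBP = 10**9
KB = 10**3

# Library sizes quoted in the text; the insert sizes are illustrative assumptions.
library_sizes_bp = {
    "seawater (1 ml)": 500 * GBP,
    "soil (same sample size)": 10_000 * GBP,
}
insert_sizes_bp = {
    "fosmid-sized (40 kb)": 40 * KB,
    "BAC-sized (100 kb)": 100 * KB,
}

estimates = {
    (sample, vector): clones_needed(lib_bp, ins_bp)
    for sample, lib_bp in library_sizes_bp.items()
    for vector, ins_bp in insert_sizes_bp.items()
}
for (sample, vector), n in estimates.items():
    print(f"{sample}, {vector}: {n:,} clones")
```

Even with 100 kb inserts, the seawater figure corresponds to 5 million clones and the soil figure to 100 million, which illustrates why larger inserts reduce the clone count but do not remove the scale problem.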

Coinciding with the problems of library size and phylogenetic anchors are the limitations associated with library screening. As it stands today none of the current approaches can define the full diversity of gene function in libraries. Even mass sequencing methods, which are seen as a way to bypass the shortcomings of function- and sequence-based screening, are limited by inadequate information regarding the genes already available in the databases. Activity screens can detect genes that are as yet not identifiable based on sequence information, and vice versa for sequence screens, meaning that a combination of the three approaches is critical to offset the shortcomings of each. Advances in methodology addressing these and other challenges facing metagenomic analysis will be forthcoming as this new field evolves with the vast information gained from the metagenomes of microbial communities. Several publications have recently tackled issues such as phylogenetic anchors, size of metagenomes and library screening (Fiandt, 2000; Gabor et al., 2003; Riesenfeld et al., 2004). One in particular has generated habitat-specific fingerprints that reflect characteristics of various environments (Tringe et al., 2005), in effect developing 'community' anchors based on the protein function of a whole community rather than that of an individual organism.

1.3 Cyanobacteria

1.3.1 Species diversity and global distribution

Cyanobacteria are oxygenic, photosynthetic, gram-negative prokaryotes that represent a single taxonomic and phylogenetic group. They are often referred to as blue-green algae, as they were long thought to belong to the plant kingdom, partly because of their ability to perform photosynthesis and partly because of the blue-green pigment phycocyanin, which along with chlorophyll a gives them a blue-green appearance (Cohen-Bazire and Bryant, 1982). They are among the oldest life forms on earth and during the course of evolution have adapted to almost every ecological niche covering terrestrial, marine and freshwater habitats, including those categorised as extreme. In terrestrial environments many members are found on the surfaces of rocks and soils (Garcia-Pichel and Belnap, 1996; Garcia-Pichel et al., 2001). In hot desert soils subject to intense sunlight certain groups of cyanobacteria form extensive epilithic crusts, remaining dormant for most of the year and growing briefly during the cooler, wetter months (Potts, 1994). They are known to exist in planktonic forms in oligotrophic oceans as well as to form cyanobacterial mats of considerable thickness in inland saline lakes and shallow marine lagoons where warm, highly saline waters exist (Bauld, 1981; Javor and Castenholz, 1981; Stal, 1995). They also exist in planktonic forms in freshwater lakes and rivers, where under eutrophic conditions many species may undergo the rapid growth commonly referred to as a cyanobacterial ‘bloom’.

1.3.2 Cyanobacterial blooms and factors affecting formation

Cyanobacterial blooms can form at any water depth and in any area of water, although the occurrence of blooming requires a large population to amass at or near the lake surface in response to vertical and lateral movement (Walsby, 1994). This is otherwise known as surface scum. Vertical movement is controlled by the unique ability of cyanobacteria to change their buoyancy within the water column by means of gas vacuoles. Lateral movement is controlled by prevailing hydro-environmental conditions such as wind and water retention times. The maximum growth rates of most cyanobacteria range from 0.6 to 0.8 d⁻¹ (Reynolds, 1987), which is generally much lower than that of most algal species (Hoogenhout and Amesz, 1965; Reynolds, 1984). However, when conditions of temperature, light and nutrient status are favorable, cyanobacterial populations of both marine and freshwater systems may out-compete other phytoplankton organisms to form a cyanobacterial bloom.
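The growth rates quoted above can be made more tangible by converting them to doubling times. The sketch below assumes that the quoted values are specific growth rates (μ, per day) under simple exponential growth, so that the doubling time is ln 2 / μ. That interpretation is an assumption made for illustration and the function name is hypothetical.

```python
import math


def doubling_time_days(mu_per_day):
    """Doubling time for exponential growth at specific growth rate mu (per day)."""
    if mu_per_day <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / mu_per_day


for mu in (0.6, 0.8):
    print(f"mu = {mu} per day -> doubling time of about {doubling_time_days(mu):.2f} days")
```

Under these assumptions, the quoted range corresponds to doubling times of roughly 1.16 days (μ = 0.6) down to 0.87 days (μ = 0.8).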

Cyanobacteria require sunlight to photosynthesize and produce the necessary energy for cellular functions. The rate of photosynthesis can be correlated to light intensity (Zevenboom and Mur, 1984), although many cyanobacteria have the ability to utilize various intensities and spectral qualities of light, which gives them a competitive advantage over other phytoplankton in water systems. The maintenance constant, or the energy required to maintain cell function and structure, is low (Gons, 1977; Van Liere and Mur, 1979), meaning that cyanobacteria can maintain a relatively higher level of growth than other phytoplankton when light intensity is low. Maximum growth rates are attained for most bloom-forming cyanobacteria between 25°C and 35°C (Reynolds and Walsby, 1975; Robarts and Zohary, 1987), which is higher than for green algae and diatoms. Cyanobacterial blooms are often associated with eutrophic water, although the correlation between cyanobacterial biomass and total nitrogen and phosphorus is generally low (Canfield et al., 1989). Moreover, the affinity of cyanobacteria for nitrogen and phosphorus is higher than that of many other photosynthetic organisms (Mur et al., 1999), which often leads to a competitive advantage over other phytoplankton organisms.

1.3.3 Cyanobacterial secondary metabolites

Secondary metabolites are the intermediates and products of metabolism that are not essential for normal growth, development and reproduction, but usually have important ecological functions. Cyanobacterial blooms are reported to be associated with the production of a number of secondary metabolites, and the use of such compounds dates back as early as the 1500s, when Nostoc species were used to treat gout, fistula and several forms of cancer (Pietra, 1990). Since then over 4000 strains of freshwater, marine and terrestrial cyanobacteria have been screened, with approximately 600 different secondary metabolites identified (Burja et al., 2001) exhibiting cytotoxic, anticancer, antibacterial, antiviral, antifungal, anti-inflammatory, enzyme-inhibiting, immunosuppressive and anti-protease activities (Moore, 1996; Namikoshi and Rinehart, 1996).

The most intensely studied secondary metabolites are those with cytotoxic effects, aptly termed cyanotoxins, which are of potential concern to human health. Such toxins are commonly associated with poisoning of wild and domestic animals (Jackson et al., 1984; Beasley et al., 1989; Frazier et al., 1998) and, to a lesser extent, humans. The first report of a toxic cyanobacterial bloom was made in 1878 and described a toxic Nodularia bloom in Lake Alexandrina, Australia (Francis, 1878). More recently there have been numerous reports, including cyanotoxin-contaminated drinking water being linked to a high incidence of primary liver cancer in China (Yu, 1989; Harada et al., 1996; Ueno et al., 1996) and the tragic deaths of 60 dialysis patients in Brazil due to the presence of cyanotoxins in the water supply used in a haemodialysis unit (Jochimsen et al., 1998; Pouria et al., 1998). Cyanotoxins are classified functionally into hepatotoxins, neurotoxins and cytotoxins, with some cyanobacteria also producing the less toxic lipopolysaccharides (LPS). Based on chemical structure cyanotoxins are divided into three classes: cyclic peptides (hepatotoxins), alkaloids (neurotoxins) and LPS. Toxic bloom-forming cyanobacteria have been found among numerous genera, including Anabaena spp., Aphanizomenon spp. and Microcystis spp. Other secondary metabolites of significance to human health and quality of life include taste and odour compounds such as geosmin, produced by several bloom-forming cyanobacteria (Izaguirre and Taylor, 2004).

A number of cyanobacterial secondary metabolites have been found to have potential benefits to humans and are thus of pharmaceutical interest, including cryptophycins and cyanovirin. Cryptophycins, originally isolated from Nostoc, are a group of potent antimitotic metabolites that also show anticancer activity against a broad spectrum of tumours (Smith et al., 1994; Panda et al., 1998). Cyanovirin is an antiviral protein discovered in extracts of Nostoc ellipsosporum (Mori et al., 1998) that exhibits virion-receptor binding activity. It has been found to act as a fusion inhibitor in HIV, preventing the fusion of viral and cell membranes and thus the transfer of viral genetic material into the host (Esser et al., 1999; Dey et al., 2000). A mini-review of selected cyanobacterial metabolites and their bioactivities is presented in Table 1.3.

Table 1.3: Selected bioactive products of cyanobacteria and representative producing species. Source: (Dittmann et al., 2001)

| Compound | Organism | Bioactivity | Reference |
|---|---|---|---|
| Aeruginosin | Microcystis aeruginosa | Thrombin/trypsin inhibitor | Kodani et al., 1998a |
| Anatoxin-A | Anabaena flos-aquae | Nicotinic agonist | Hemscheidt et al., 1995 |
| Anabaenopeptin | Microcystis aeruginosa | Inhibitor of carboxypeptidase A | Itou et al., 1999 |
| Aplysiatoxin | Lyngbya majuscula | Tumour promoter | Mitchell et al., 2000 |
| Cylindrospermopsin | Cylindrospermopsis raciborskii | Hepatotoxic | Runnegar et al., 1995b |
| Curacin A | Lyngbya majuscula | Antimitotic | Blokhin et al., 1995 |
| Cyanopeptolin | Microcystis aeruginosa | Trypsin inhibitor | Jakobi et al., 1995 |
| Cryptophycin | Nostoc sp. | Antimitotic | Smith et al., 1994 |
| Dehydroradiosumin | Anabaena cylindrica | Trypsin inhibitor | Kodani et al., 1998b |
| Dolastatin | Lyngbya majuscula | Antimitotic inhibitor of HIV-1 integrase | Mitchell et al., 2000 |
| Laxaphycin | Anabaena laxa | Antifungal | Frankmolle et al., 1992 |
| Lyngbyabellin | Lyngbya majuscula | Antimitotic | Luesch et al., 2000 |
| Lyngbyatoxin | Lyngbya majuscula | Activator of protein kinase C | Basu et al., 1992 |
| Majusculamide C | Lyngbya majuscula | Antifungal | Moore, 1996 |
| Microginin | Microcystis aeruginosa | Inhibitor of angiotensin-converting enzyme | Neumann et al., 1997 |
| Microcystin | Microcystis aeruginosa | Inhibitor of protein phosphatases 1 and 2A | Runnegar et al., 1995a |
| Nodularin | Nodularia spumigena | Inhibitor of protein phosphatases 1 and 2A | Annila et al., 1996 |
| Oscillatorin | Oscillatoria agardhii | Chymotrypsin inhibitor | Sano and Kaya, 1996b |
| Oscillapeptin | Oscillatoria agardhii | Tyrosinase inhibitor | Sano and Kaya, 1996a |
| Saxitoxin | Anabaena flos-aquae | Sodium channel blocker | Haney et al., 1995 |
| Tanikolide | Lyngbya majuscula | Antifungal | Singh et al., 1999 |

1.3.4 Synthesis and regulation of secondary metabolites

A high proportion of cyanobacterial secondary metabolites have been found to belong to structurally diverse groups of polyketides and non-ribosomal peptides (Moore, 1996). These compounds are synthesized from short-chain carboxylic acid and amino acid monomers, respectively, by large, multifunctional protein complexes called polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS) (Marahiel et al., 1997; Cane and Walsh, 1999). Both have a modular organisation, often arranged in gene clusters, with each module carrying multiple copies of active domains that catalyse steps in polyketide and/or peptide biosynthesis. Recent studies have demonstrated the presence of PKS and NRPS within a number of cyanobacterial genomes, including Anabaena spp., Cylindrospermopsis spp. and Microcystis spp., with implicated links to secondary metabolite and toxin production in blooms (Schembri et al., 2001; Hoffmann et al., 2003). There are, however, only a few metabolites whose entire gene clusters have been characterised. Microcystin, a common hepatotoxin produced by bloom-associated cyanobacteria, was the first cyanobacterial metabolite whose biosynthetic gene cluster, termed microcystin synthase (mcy; Figure 1.2), was discovered and characterised (Nishizawa et al., 1999; Nishizawa et al., 2000; Tillett et al., 2000). This 55-kb cluster was found to encode 10 bidirectionally transcribed ORFs consisting of PKS and PS modules. Several biosynthetic gene clusters have since been characterised and reported (Table 1.4).

Table 1.4: Discoveries of cyanobacterial gene clusters involved in secondary metabolite production

| Secondary metabolite | Bioactivity | Source | Cluster size (kb) | ORFs |
|---|---|---|---|---|
| Microcystin | Hepatotoxin | Microcystis sp. | 55 | 10 |
| | | Planktothrix sp. | | |
| | | Anabaena sp. | | |
| Nodularin | Hepatotoxin | Nodularia sp. | 48 | 9 |
| Barbamide | Molluscicidal | Lyngbya sp. | 26 | 12 |
| Curacin A | Antimitotic | Lyngbya sp. | 64 | 14 |
| Jamaicamide | Neurotoxin | Lyngbya sp. | 58 | 17 |
| Lyngbyatoxin | Activator of protein kinase C | Lyngbya sp. | 11.3 | 4 |
| Nostopeptide | Unknown | Nostoc sp. | 40 | 8 |

Figure 1.2: Microcystin synthase gene cluster mcy

Modular organisation of the microcystin synthase gene cluster mcy. Each module carries multiple copies of active domains that catalyse steps in polyketide and/or peptide biosynthesis. Genes encoding polyketide synthase modules: mcyD, mcyE, mcyG. Genes encoding peptide synthase modules: mcyA-C, mcyF, mcyG. Genes encoding modifying enzymes: mcyF, mcyH-J.

The understanding of the regulation of cyanobacterial secondary metabolite production has evolved alongside the increase in molecular studies focusing on the genes and/or clusters directly responsible for secondary metabolite production. There have been numerous culture-based studies mimicking environmental conditions in the laboratory and measuring extracellular metabolite concentrations as a means to understand secondary metabolite production and regulation (summarized by Sivonen and Jones, 1999). Results from these approaches are widely disputed and often contradictory (Kaebernick and Neilan, 2001). Moreover, there has been a push for quantitative and qualitative information on the regulation of genes, mRNA and protein levels as well as secondary metabolite quantification. Recent transcriptional analysis of PKS and NRPS modules within the microcystin synthase cluster has revealed differential expression in response to light, with high light intensities resulting in greater expression of the genes in question and red light being more effective than white light (Kaebernick et al., 2000; Kaebernick et al., 2002). There nevertheless remains an alarming number of unanswered questions regarding secondary metabolite production. Some cyanobacteria are known to be genetically capable of producing toxic secondary metabolites but do not do so under all conditions, and the triggers that initiate production are unknown. The increasing number of cyanobacterial secondary metabolite biosynthetic gene clusters being discovered, and their subsequent use in gene expression and regulation studies, will add to the understanding of the expression, physiology and ecology of these metabolites.

1.3.5 Genome analysis of Cyanobacteria

Over the past decade cyanobacterial research has advanced from laboratory-based ecological studies on the isolation of individual environmental effectors to molecular studies aimed at identifying genes involved in secondary metabolite production. This approach has been accelerated by the shift towards microbial genome sequencing and systems biology, i.e. a systems-level understanding of biological systems that takes into account the complex interactions of gene, protein and cell elements within the environment. In the phylum Cyanobacteria, complete genome sequences have been reported for thirteen species and strains (Table 1.5), including the toxic bloom-forming species Anabaena variabilis ATCC 29413T. The comparison of morphological and genetic data is hindered by the lack of axenic cultures of cyanobacterial species and by inadequate morphological data for sequenced strains. However, analysis of the few genomes available has still given invaluable insights into cyanobacterial physiology and ecology. Using the available genome sequences of the Cyanobacteria, it has been reported that they likely make extensive use of a variety of two-component proteins to regulate cell behaviour and gene expression in response to changes in the external environment (Mizuno et al., 1996; Ohmori et al., 2001; Dufresne et al., 2003; Rocap et al., 2003). Two-component proteins enable bacteria to sense and adapt to changes in their surroundings by detecting external stimuli, transmitting the signal across the cell membrane and ultimately regulating targeted gene expression. It has previously been postulated that the fraction of two-component proteins in the total protein set represents a measure of the organism’s ability to adapt to diverse conditions, otherwise known as the “bacterial IQ” (Galperin, 2005) or a “rudimentary form of intelligence” (Hellingwerf, 2005). In that sense, cyanobacteria can be considered “pretty clever” organisms (Ashby and Houmard, 2006). These types of findings have only become possible with the advent of genome sequencing and analysis.

Table 1.5: Available cyanobacterial genomes from the Genomes Online Database (GOLD). Updates available at: http://www.genomesonline.org/

| Species (Total number) | Order | Genome Size (Mb) |
|---|---|---|
| Anabaena variabilis ATCC 29413 | Nostocales | 7.07 |
| Gloeobacter violaceus PCC 7421 | Chroococcales | 4.66 |
| Nostoc sp. PCC 7120 | Nostocales | 7.21 |
| Prochlorococcus marinus (4) | Chroococcales | 1.66-2.41 |
| Synechococcus sp. (4) | Chroococcales | 2.23-2.7 |
| Synechocystis sp. PCC 6803 | Chroococcales | 3.95 |
| Thermosynechococcus elongatus BP-1 | Chroococcales | 2.59 |

1.4 Aims

The environmental playing field is immense. Not only is microbial species diversity vast and mostly undefined, but natural systems are also highly dynamic, their population structure and activities shifting with changing environmental circumstances. Nowhere is this better illustrated than within freshwater systems susceptible to cyanobacterial blooms. These extremely complex, constantly interacting and changing microbial communities have vast impacts on their surrounding ecosystem and remain poorly understood. Although a considerable amount of work has been conducted on the physico-chemical conditions that affect bloom formation and on the phenotypes of cultured Cyanobacteria species, there is still very little known about cyanobacterial bloom population structure and function in the real environment. The fact that the majority of existing Cyanobacteria cultures are not axenic further highlights the various interactions between Cyanobacteria and heterotrophic bacteria, about which there is a severe lack of information. In order to gain a comprehensive understanding of the ecology of environmental microbial systems such as cyanobacterial blooms, information on species diversity and function is required.

The complexity of these communities is simply beyond the reach of conventional tools, which at present are inadequate if used as a sole approach. Using a metagenomic approach, the collective genomes of a cyanobacterial bloom will be investigated using a variety of culture-independent techniques. The species diversity of bloom samples from two contrasting Australian freshwater lakes will be assessed and the more suitable sample chosen for metagenomic analysis. A large-construct BAC library is to be constructed from random DNA fragments isolated from the selected toxin-producing cyanobacterial bloom. In parallel, several single-gene PCR-based libraries will be constructed from the same DNA to verify the quality of the metagenomic library. Various screening, sequencing and annotation techniques will be used to provide the first insight into a cyanobacterial bloom metagenome. It is hoped that a more detailed picture of bloom population structure and function will emerge, enabling a greater understanding of cyanobacterial bloom ecology as well as giving rise to future strategies for bloom prediction and cyanobacterial culture.

CHAPTER 2 – MATERIALS AND GENERAL METHODS

2.1 Research plan

The research plan designed to address the project aims is outlined in Figure 2.1. Two contrasting sites that are frequently affected by cyanobacterial blooms are to be investigated: Lake Samsonvale, which is predominantly used as a drinking water reservoir to augment the supply for Brisbane City, and Lake Ainsworth, a natural freshwater dune lake valued for its scenic, aesthetic and recreational value. Bloom samples from both sites are to be collected, high molecular weight (HMW) DNA extracted and the population diversity of each bloom sample comparatively assessed by 16S rDNA analysis (Chapter 3). Based on which Cyanobacteria species are dominant, one bloom sample will be chosen for HMW DNA fractionation and metagenomic library construction (Chapter 4). Once constructed, the library will be screened to identify clones that contain genes of interest (Chapter 5). Selected clones will be sequenced to completion (Chapter 6), and the genes encoded within annotated using various bioinformatics software packages to gain functional information.

2.2 Reagents and chemicals

All reagents and chemicals used are molecular biology and analytical grade unless otherwise specified.

Chapter 3: Community Composition Associated with Cyanobacterial Blooms

Cyanobacterial bloom collection (1. Lake Samsonvale; 2. Lake Ainsworth)

Gentle HMW DNA extraction

16S rDNA clone library + analysis

Chapter 4: Metagenomic Library Construction

HMW DNA fractionation

BAC library construction

BAC library quality

Chapter 5: Metagenomic Library Screening

BAC library gene screens (1. BAC end survey; 2. 16S rDNA)

Chapter 6: Genomic Analysis of 7 DNA Fragments from Cyanobacterial Bloom-Associated Bacteria

BAC insert DNA sequencing and analysis

Figure 2.1: Outline of the research plan used to address the aims of better understanding cyanobacterial bloom population structure and function.

2.3 Buffers

TE buffer: 10 mM Tris-Cl (pH 7.4), 1 mM EDTA (pH 8.0)

TAE buffer: 40 mM Tris-acetate, 2 mM EDTA

6X loading buffer: 0.25% bromophenol blue, 40% sucrose. Store at 4°C

10X TBE buffer: 108 g/L Tris-HCl, 55 g/L boric acid, 40 ml 0.5 M EDTA (pH 8.0), ddH2O to 1 L

ESP buffer: 1% Sarkosyl, 1 mg/ml Proteinase K, 0.5 M EDTA

Lysis buffer: 10 mM Tris-HCl pH 8.0, 50 mM NaCl, 0.1 M EDTA, 1% Sarkosyl, 0.2% sodium deoxycholate, 1 mg/ml lysozyme

STE buffer: 1 M NaCl, 10 mM Tris-HCl, 1 mM EDTA pH 8.0

Phosphate buffer: 23.1 g/L KH2PO4, 125.4 g/L K2HPO4
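As a worked illustration of how recipes like those above translate into bench volumes, the sketch below applies the standard dilution relation C1 × V1 = C2 × V2. The 0.5 M EDTA stock is the one named in the TBE recipe. The 1 M Tris-Cl stock and the 1 L batch sizes are assumptions made for the example, and the function name is hypothetical.

```python
def stock_volume_ml(stock_conc, final_conc, final_volume_ml):
    """Volume of stock needed, from C1 * V1 = C2 * V2 (concentrations in the same unit)."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_conc * final_volume_ml / stock_conc


# 1 L of TE buffer (10 mM Tris-Cl, 1 mM EDTA); concentrations in mM.
# The 1 M (1000 mM) Tris-Cl stock is an assumed stock concentration.
tris_ml = stock_volume_ml(stock_conc=1000.0, final_conc=10.0, final_volume_ml=1000.0)
edta_ml = stock_volume_ml(stock_conc=500.0, final_conc=1.0, final_volume_ml=1000.0)
water_ml = 1000.0 - tris_ml - edta_ml

# 1 L of 1X TBE working solution from the 10X stock; concentrations in "X".
tbe_10x_ml = stock_volume_ml(stock_conc=10.0, final_conc=1.0, final_volume_ml=1000.0)

print(f"TE: {tris_ml} ml Tris-Cl stock, {edta_ml} ml EDTA stock, {water_ml} ml water")
print(f"1X TBE: {tbe_10x_ml} ml of 10X TBE made up to 1 L")
```

With these assumed stocks, 1 L of TE needs 10 ml of Tris-Cl stock and 2 ml of EDTA stock, and 1 L of 1X TBE needs 100 ml of the 10X stock.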

2.4 Microbial media

LB medium: 10 g/L NaCl, 10 g/L tryptone, 5 g/L yeast extract. (pH 7.0)

TB medium: 24 g/L yeast extract, 12 g/L peptone, 4 ml glycerol, 10% (v/v) phosphate buffer. (pH 7.0)

SOC medium: 2% tryptone, 0.5% yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4 and 20 mM glucose

2.5 Sample sites and Collection

Two Australian freshwater bodies, Lake Ainsworth and Lake Samsonvale, were chosen as contrasting model ecosystems that are both susceptible to cyanobacterial blooms during spring/summer. Lake Ainsworth, located near Lennox Head, NSW (28°42’S, 153°36’E), is a naturally acidic freshwater dune lake whose waters are brightly coloured from the leaching of organic acids from adjacent melaleuca swamps. It can be classified as a dystrophic, polymictic, oligotrophic dune lake. Lake Samsonvale (27°16’S, 152°56’E) is a reservoir impounded by a concrete gravity structure constructed in the early 1970s to augment the domestic water supply to Brisbane City, QLD. The catchment area of Lake Samsonvale is 347 km², of which 44.5% is covered by remnant woody vegetation and 35.2% is pasture. Most streams in the catchment frequently exhibit high nutrient levels, in particular total phosphorus and nitrogen. Lake Samsonvale is a warm, monomictic, eutrophic reservoir that typically experiences thermal stratification from October to May, with toxic cyanobacterial blooms a frequent occurrence.

Figure 2.2: Images of Lake Samsonvale (A) and Lake Ainsworth (B).

Surface water samples (0-3 m) were collected from each lake using sterile 2 L Schott bottles. Samples from Lake Samsonvale were collected from designated sample locations 10001S and 10006S using an integrated depth sampler (SEQwater Corporation, Brisbane, QLD) in mid-September 2003. Samples from Lake Ainsworth were collected from the northern end of the lake in late February 2003. Water samples were kept at 4°C for no more than 24 hours before analysis.

Table 2.1: Comparison of physical and chemical properties of the sample sites, Lake Samsonvale and Lake Ainsworth.

Property | Lake Samsonvale | Lake Ainsworth
Lake surface area (km²) | 21.63 | 0.13
Mean depth (m) | 9.4 | 4.0
Lake character | Monomictic, eutrophic | Dystrophic, polymictic, oligotrophic
Water temp. (°C) | 26 | n/a
pH | Neutral-slightly alkaline | Acidic
Prevalent cyanobacterial species | Cylindrospermopsis, Aphanizomenon, Anabaena | Microcystis, Anabaena

2.6 DNA Techniques

2.6.1 DNA plug preparation

Two litres of bloom sample were concentrated by centrifugation at 18,000 g using a Beckman rotor and resuspended in 1.0 ml of STE buffer. The resulting suspension was mixed with 1 volume of 1% molten low melting point (LMP) agarose (SeaPlaque). The mixture was cooled to 40°C and then immediately transferred into Bio-Rad plug molds. Molds were placed on ice for ten minutes before plugs were removed and stored in 10 ml TE buffer at 4°C.

2.6.2 DNA plug lysis

The cell-containing agarose plugs were transferred to 10 ml of lysis buffer and incubated at 37°C for 1 hour. Plugs were then transferred to 40 ml of ESP buffer and incubated for a further 16 hours at 55°C. The solution was then replaced with 40 ml of fresh ESP buffer and incubated at 55°C for an additional hour. Plugs were washed for 1 hour at 4°C in 10 ml TE containing 2 mM PMSF to inactivate proteinase K, and again with 10 ml TE for 30 minutes. Cell lysis and DNA release were checked on a 0.7% agarose gel. Agarose plugs were stored in 50 mM EDTA at 4°C.

2.6.3 Agarose gel electrophoresis

Agarose gel electrophoresis was used for visualisation of DNA fragments and PCR products. Gel solutions of 0.7% or 1.0% were prepared by dissolving DNA grade agarose in 1X TAE buffer and boiling until completely dissolved. The solution was cooled to approximately 50°C before ethidium bromide was added to a final concentration of 0.1 μg/ml and the gel cast in a plastic tray. Samples containing DNA were mixed with 6X loading buffer before being electrophoresed at 5 V/cm for 20 to 30 minutes in 1X TAE buffer. A λ HindIII linear standard was run simultaneously with samples as a molecular marker for DNA size determination and the estimation of DNA concentration. DNA bands were visualised under a long wavelength UV transilluminator and documented using a UVP gel imaging system (Pathtech).

2.6.4 Pulsed field gel electrophoresis

Pulsed field gel electrophoresis (PFGE) was used for separation of DNA fragments between 5 and 1000 kb. Agarose gel solutions of 1.0% were prepared by dissolving DNA grade LMP agarose in 0.5X TBE and boiling until completely dissolved. The solution was cooled to 50°C and cast in a plastic tray. DNA-containing agarose plugs were inserted directly into the wells or mixed with 1% LMP agarose prior to loading. Electrophoresis was performed in 0.5X TBE (14°C) using a CHEF unit built in-house at 200 V for 16 hours, with initial and final switch times of 5 and 15 seconds respectively. A HMW λ concatamer (NEB) was run simultaneously with samples as a molecular marker. After electrophoresis was completed, gels were stained with ethidium bromide (0.1 μg/ml) for 2 hours and destained with ddH2O for 30 minutes. DNA bands were visualised under a long wavelength UV transilluminator and documented using a UVP gel imaging system.
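The initial and final switch times quoted above can be turned into a switch time at any point in the run. The following is a minimal sketch that assumes the CHEF unit ramps the switch time linearly over the 16 hour run; the text does not state the ramp shape, and the function name `switch_time` is hypothetical.

```python
def switch_time(elapsed_h, run_h=16.0, initial_s=5.0, final_s=15.0):
    """Switch time in seconds after elapsed_h hours, assuming a linear ramp."""
    if not 0.0 <= elapsed_h <= run_h:
        raise ValueError("elapsed time must lie within the run")
    fraction = elapsed_h / run_h  # share of the run completed so far
    return initial_s + fraction * (final_s - initial_s)


for hours in (0, 4, 8, 16):
    print(f"{hours:>2} h: {switch_time(hours):.1f} s")
```

Under this assumption the switch time at the midpoint of the run is 10 seconds.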

2.6.5 Digestion of HMW DNA

Random fragments of HMW DNA used for BAC library construction were generated by partial restriction enzyme (RE) digestion. Plugs containing DNA were placed in 1.5 ml microcentrifuge tubes containing 45 μl of RE digestion buffer (10% of 10X RE buffer, 1% BSA) and incubated on ice for 30 minutes, after which fresh buffer and BamHI were added and the plugs incubated on ice for an additional 20 minutes to allow the RE to diffuse into the plug. Plugs were then incubated at 37°C for 20 minutes for partial digestion. The reaction was stopped by adding 1/10 volume of 0.5 M EDTA (pH 8.0) and placing the tubes on ice. Partial digestion was assessed by PFGE.

2.6.6 GELase gel digestion

For DNA purification from agarose gels, the agarase GELase (Bio-Rad) was used as per the manufacturer’s method. To select appropriately sized DNA for library construction, an area of the electrophoresed LMP agarose gel containing DNA of the size of interest was excised prior to ethidium bromide staining and UV visualisation. This procedure was undertaken so that DNA used for further manipulations was not exposed to UV light, which damages DNA. Flanking regions of the gel were stained, visualised under UV (as per section 2.6.4) and used as markers to identify the regions of the unstained gel containing the DNA to be used in further manipulations.

2.6.7 Ligation reactions

DNA ligation reactions were performed using T4 DNA ligase (Fermentas) as per the manufacturer’s protocol. Ligation reactions with a final volume of 120 µl were set up in sterile 1.5 ml microcentrifuge tubes and incubated overnight at 16°C.

2.6.8 Dialysis

Prior to electroporation, ligation reactions were drop-dialysed for 2 hours against 0.5X TE buffer using a Millipore 0.025 µm VSWP filter in order to remove salts and other small molecules.

2.6.9 Electroporation of E. coli

Dialysed ligation products (2 µl) were mixed with 50 µl of TransforMax EC100 electrocompetent cells (Epicentre) as per the manufacturer’s protocol and transferred to a 0.2 cm gap electroporation cuvette (Bio-Rad). The Gene Pulser II apparatus (Bio-Rad) was used to transform the cells with the following parameters: voltage 2.5 kV, resistance 1000 ohms and capacitance 25 µF. Transformed cells were immediately suspended in 1 ml of SOC medium and incubated at 37°C for 1 hour with shaking at 200 rpm. The transformed cells were plated on LB agar plates containing 12.5 µg/ml chloramphenicol, 50 µg/ml X-gal and 25 µg/ml IPTG and the plates incubated for 24 hours at 37°C. White colonies were transferred to individual wells of 96-well microtitre plates containing 200 µl of LB with 12.5 µg/ml chloramphenicol. The microtitre plates were incubated at 37°C for 24 hours before sterile glycerol was added to a final concentration of 30% (v/v) and the plates stored at -80°C.
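The pulse settings above imply a field strength across the cuvette and a nominal time constant, which can be worked out as follows. This is an illustrative sketch only: the time constant is taken as resistance multiplied by capacitance, which assumes the sample resistance is much larger than the set parallel resistance, and both function names are hypothetical.

```python
def field_strength_kv_per_cm(voltage_kv, gap_cm):
    """Electric field across the cuvette gap in kV/cm."""
    return voltage_kv / gap_cm


def nominal_time_constant_ms(resistance_ohm, capacitance_uf):
    """Nominal RC time constant in milliseconds.

    ohm x microfarad gives microseconds, so divide by 1000 for ms.
    Assumes the sample itself does not lower the effective resistance.
    """
    return resistance_ohm * capacitance_uf / 1000.0


print(field_strength_kv_per_cm(2.5, 0.2))    # settings quoted in the text
print(nominal_time_constant_ms(1000, 25))
```

With the quoted settings this corresponds to 12.5 kV/cm and a nominal time constant of 25 ms.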

2.6.10 Small scale plasmid purification

Small-scale plasmid purification was performed using Wizard™ Plus SV Miniprep columns (Promega) as per the manufacturer’s protocol. For clones carrying low-copy vectors the protocol was altered as follows: clones were inoculated into 10 ml TB+ch, all buffer volumes were doubled, and elution buffer heated to 70°C was left on the column for 1 hour prior to the final spin stage.

2.6.11 Large scale plasmid purification

Large-scale, high quality plasmid DNA purification was performed using QIAGEN QIAfilter Midi kits using conditions and protocols described by the manufacturer.

2.6.12 Restriction endonucleases and DNA modifying enzymes

All enzymes throughout this study were purchased from Fermentas and used under the reaction conditions described by the manufacturer unless otherwise stated.

2.6.13 Polymerase chain reaction

Amplification of specific DNA sequences was performed using the polymerase chain reaction (PCR). Reactions were set up in 0.8 ml microcentrifuge tubes and contained 100 ng of template, 1 U of Taq polymerase (Promega), 25 mM dNTPs (Promega) and 50 µM of each oligonucleotide. PCR reactions were performed using a Corbett thermal cycler (Corbett Research) with 2 minutes of denaturation at 95°C and 30 cycles of 95°C for 1 minute, 55°C annealing for 1 minute and 72°C extension for 90 seconds. Reactions were analysed by agarose gel electrophoresis (section 2.6.3) and PCR fragments recovered using Qiaquick® purification spin columns (Qiagen) as per the manufacturer’s instructions.
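The cycling programme above can be written down as data, which makes the total hold time easy to check. This is a minimal sketch: the variable and function names are hypothetical, and ramp times between temperatures are ignored because the text does not give them.

```python
# Each step is (label, temperature in degrees C, duration in seconds).
INITIAL_DENATURATION = ("denaturation", 95, 120)
CYCLE_STEPS = [
    ("denaturation", 95, 60),
    ("annealing", 55, 60),
    ("extension", 72, 90),
]
N_CYCLES = 30


def total_hold_time_min(initial, steps, n_cycles):
    """Sum of all programmed hold times in minutes (ramping excluded)."""
    cycle_seconds = sum(duration for _, _, duration in steps)
    return (initial[2] + n_cycles * cycle_seconds) / 60.0


print(total_hold_time_min(INITIAL_DENATURATION, CYCLE_STEPS, N_CYCLES))
```

The programmed holds add up to 107 minutes; the real run time would be longer once ramping is included.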

2.6.14 Automated DNA sequencing

Automated DNA sequencing was performed using ABI BigDye™ deoxy termination reaction kits (Applied Biosystems) as per the manufacturer’s instructions, with the exception that 'half reactions' of BigDye mix (4 µl) were used. Cycle sequencing was performed in an Idaho Technology Rapid cycler. The resulting extension products were transferred to sterile 1.5 ml microcentrifuge tubes, mixed with 5 µl of 125 mM EDTA (pH 8.0) and 60 µl of 100% ethanol, and incubated at room temperature for exactly 15 minutes. Sequencing products were pelleted by centrifugation for 15 minutes in a microcentrifuge at maximum speed. The supernatant was carefully decanted and the resulting pellet washed by adding 175 µl of chilled 70% ethanol and spinning for an additional 10 minutes. The supernatant was then carefully removed and the resulting pellets air dried.

2.6.15 pGEM-T easy PCR product cloning

PCR products were cloned using the Promega pGEM-T Easy cloning vector kit as per manufacturer’s instructions.

2.7 Computational analysis of DNA sequences

2.7.1 BioEdit Sequence editor

Automated DNA sequencing data was edited using BioEdit software (Hall, 1998), which is available for download at the URL http://www.mbio.ncsu.edu/RNaseP/info/programs/BIOEDIT/bioedit.html. BioEdit was used to examine automated DNA sequencing trace files in order to manually determine base calling accuracy and edit sequences accordingly. Contiguous DNA sequences were generated from sequence fragments manually and using the CAP Contig Assembly accessory application within BioEdit. Sequences selected from homology searches using BLAST (section 2.7.2) were retrieved from the GenBank database in FASTA format, placed in a single alignment file and aligned using the ClustalW software (Thompson et al., 1994) accessory application within BioEdit using default parameters, which included using bootstrapped neighbour joining trees with a bootstrap value of 1000.

2.7.2 BLAST searches

Homology searches were performed against the GenBank (Benson et al., 2002) non-redundant (nr) database using BLAST software (Altschul et al., 1990) located at the URL http://www.ncbi.nlm.nih.gov/BLAST/. NCBI BLASTx software was used to generate 6-frame theoretical translations of partial DNA sequences and to search these against amino acid sequence databases.
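The idea of a 6-frame theoretical translation can be illustrated with a short, self-contained sketch. This is not the BLASTx implementation: it simply translates the three forward and three reverse-complement reading frames using the standard codon assignments, and the function names are hypothetical. Codons containing ambiguous bases are reported as `X`.

```python
from itertools import product

# Standard codon assignments, with bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the first base; '*' marks a stop."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame(seq):
    """Return translations for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frame("ATGGCC"))
```

Each of the six translations would then be compared against the protein database, which is why partial or unoriented DNA sequences can still be searched.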

2.7.3 Ribosomal Database Project II

Homology searches of bacterial small-subunit 16S rRNA sequences were performed against the Ribosomal Database Project II (RDP) (Cole et al., 2003) using sequence match software located at the URL: http://rdp.cme.msu.edu/seqmatch/seqmatch_intro.jsp. Selected sequences were downloaded in phylip format and imported into BioEdit for ClustalW alignments and phylogenetic tree construction. Hierarchy Browser software located at the URL: http://rdp.cme.msu.edu/hierarchy/hb_intro.jsp was used to browse a phylogenetic hierarchy and compile a list of 16S rRNA sequences for download (phylip format) and use in sequence comparisons and phylogenetic tree construction.

2.7.4 Bacterial Annotation System

Open reading frames were identified using a Glimmer based annotation online web server, BASys (http://wishart.biology.ualberta.ca/basys/cgi/main.pl). BASys uses more than 30 programs to determine nearly 60 annotation subfields for each gene, including gene/protein name, GO function, COG function, possible paralogues and orthologues, molecular weight, isoelectric point, operon structure, subcellular localization, signal peptides, transmembrane regions, secondary structure, 3-D structure, reactions, and pathways. Results generated were saved in FASTA format.

2.7.5 KEGG2

The KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways to which ORFs were assigned were identified using the KEGG automatic annotation server (KAAS) (Kanehisa and Goto, 2000; Kanehisa et al., 2006) located at the URL: http://www.genome.jp/kegg-bin/kaas_main?mode=partial. Predicted ORFs identified by BASys and saved in FASTA format were uploaded and annotated using the SBH (single-directional best hit) method.

2.7.6 InterProScan protein signature database search

InterProScan software located at the URL http://www.ebi.ac.uk/interpro/scan.html was used to perform simultaneous integrated searches of amino acid sequences against the SWISS-PROT, TrEMBL, PROSITE, PRINTS, Pfam, SMART and ProDom databases (Apweiler et al., 2000). This tool takes an integrated approach to the functional classification of biochemically uncharacterized proteins, providing a comprehensive search for protein signatures such as domains, functional sites and protein families.

2.7.7 Treecon phylogenetic trees

Treecon for Windows software (Van de Peer and De Wachter, 1994) was used to generate both DNA and protein phylogenetic trees. Sequence alignments generated using ClustalW in BioEdit were stripped of common gaps and saved in PIR format. Cutoff software (http://www.allserv.rug.ac.be/~avierstr/programs/programs.html) was then used to truncate the sequence names and the resulting file converted to phylip format using BioEdit. Phylogenetic trees were then constructed using Treecon software with default parameters unless otherwise specified. Result files were saved in Treecon format before Cutoff software was used to restore full sequence names in the phylogenetic tree.
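Stripping common gaps from an alignment can be sketched in a few lines. The example below assumes that "common gaps" means alignment columns in which every sequence carries a gap character; the text does not define the term, and the function name `strip_common_gaps` is hypothetical rather than part of BioEdit.

```python
def strip_common_gaps(alignment, gap_chars="-."):
    """Remove columns that are gaps in every sequence.

    alignment maps sequence names to aligned sequences of equal length.
    """
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must have equal length")
    length = len(seqs[0]) if seqs else 0
    # Keep a column if at least one sequence has a residue there.
    keep = [i for i in range(length) if not all(s[i] in gap_chars for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


example = {"clone_a": "AC--GT", "clone_b": "A---GT"}
print(strip_common_gaps(example))
```

Columns where only some sequences have a gap are kept, since they still carry alignment information.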

CHAPTER 3 - COMMUNITY COMPOSITION ASSOCIATED WITH CYANOBACTERIAL BLOOMS

3.1 Introduction

Despite microbes providing major benefits in freshwater habitats (e.g. through biogeochemical cycles), little is known of their identity and functional roles. Identification of species composition is seen as an important first step for any ecological study aimed at understanding the functional roles played by different bacteria. The few freshwater bacterial community studies that have been undertaken have focused mainly on oligotrophic ecosystems, including unique environments such as arctic and crater lakes (Bahr et al., 1996; Urbach et al., 2001). Limited bacterial community data exist for eutrophic freshwater bodies such as lakes and reservoirs, despite their abundance in temperate zones and their economic importance (Zwart et al., 1998; Zwart et al., 2002). A major global problem associated with freshwater ecosystems is the presence of cyanobacterial blooms. Very little is understood about the interactions between cyanobacteria and heterotrophic bacteria (Cole et al., 1982; Ostensvik et al., 1998), and their interactions regarding bloom formation and toxin production are unknown. It has been hypothesized that cyanobacterial blooms are tightly associated with heterotrophic bacterial communities, or vice versa (Eiler and Bertilsson, 2004). An initial study of the composition of bacterial communities associated with cyanobacterial blooms in four eutrophic lakes in Sweden reported clones affiliated with bacterial divisions commonly found in freshwater systems, as well as novel clusters that may be characteristic of bloom events (Eiler and Bertilsson, 2004).

This chapter describes the construction and phylogenetic comparison of 16S rDNA clone libraries from cyanobacterial blooms recovered from two water bodies differing in nutrient status and water colour: a eutrophic reservoir and an oligotrophic dune lake. The major aim of these experiments was to identify a suitable sample for metagenomic library construction and analysis, in order to gain genome information on bloom constituents. This selection will be based on cyanobacterial species composition and lake characteristics. An additional aim is to compare bacterial populations from these lakes with published reports on other bloom populations so that possible patterns in population structure may be identified. Moreover, community composition associated with cyanobacterial blooms may be related to biological, chemical and physical parameters in freshwater bodies.

3.2 Experimental procedures

3.2.1 Sample collection and HMW DNA extraction

Cyanobacterial bloom samples containing toxin-producing cyanobacteria were collected from Lake Ainsworth and Lake Samsonvale, Australia (section 2.5). To minimize mechanical shearing of HMW DNA during DNA extraction, a gentle lysis procedure that entails embedding cells in LMP agarose was used (section 2.6.1).

3.2.2 PCR amplification and cloning

Bacterial 16S rDNA was amplified by PCR (section 2.6.13) using universal 16S rRNA primers FD1 and RD1 (Redburn and Patel, 1993). Recovered PCR fragments were cloned in the pGEM-T vector (section 2.6.15), plasmid DNA containing cloned PCR products was extracted (section 2.6.10), and DNA sequencing performed (section 2.6.14) with F1, R1 and R2 primers (Redburn and Patel, 1993).

3.2.3 Sequence editing and phylogenetic analysis

DNA sequences were edited using BioEdit (section 2.7.1) and preliminary phylogenetic affiliation was performed using the RDP classifier (Cole et al., 2003) to assign 16S rRNA sequences to the taxonomical hierarchy proposed in Bergey's Manual of Systematic Bacteriology, 2nd Ed. Genus affiliation of 16S rRNA sequences was obtained by comparing cloned sequences with GenBank entries using BLASTn (section 2.7.2). Sequences that exhibited <99% identity and demonstrated highest similarity to the same relative were grouped together as phylotypes. A single representative clone from each phylotype was chosen for further phylogenetic analysis. Chimeras were identified using the Chimera Check program at RDP (Cole et al., 2003) and the Bellerophon server (Huber et al., 2004) and were removed from analysis. Reference sequences representing closest relatives of cloned sequences as well as 299 sequences obtained from clone libraries studying bacterial communities associated with cyanobacterial blooms (Eiler and Bertilsson, 2004) were downloaded from RDP. All sequences were imported into BioEdit and automatically aligned using ClustalW. Distance matrices were constructed by the method of Jukes and Cantor (Kuhner and Felsenstein, 1994) and phylogenetic trees constructed by the neighbour-joining method as per section 2.7.7.
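The Jukes and Cantor distance used for the distance matrices is derived from the proportion of differing sites, p, as d = -(3/4) ln(1 - (4/3) p). The sketch below shows this calculation for a pair of aligned sequences. It is not the Treecon implementation: the function names are hypothetical, and skipping any column that contains a gap is an assumption about gap handling.

```python
import math


def p_distance(seq_a, seq_b, gap_chars="-."):
    """Proportion of differing sites, ignoring columns with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTACGTAT")
print(p, jukes_cantor(p))
```

Percentage identity is simply 100 × (1 - p), so the same pairwise comparison underlies both the identity thresholds and the corrected distances.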

3.3 Results

The two Australian lakes were chosen as sample sites due to their economic and recreational value, their contrasting physico-chemical properties and their susceptibility to cyanobacterial blooms. Cell counts from the Lake Samsonvale bloom sample were dominated by the cylindrospermopsin-producing cyanobacteria Aphanizomenon and Cylindrospermopsis (Figure 3.1). Cylindrospermopsin toxin levels were recorded at 1.6 µg/L (Courtesy: SEQwater). Lake Ainsworth was dominated by the microcystin-producing cyanobacterium Microcystis with cell counts recorded at ≤100 000 cells/ml (Courtesy: Ballina Shire Council). Toxin levels were not recorded for samples taken from Lake Ainsworth; however, the lake was closed to the public due to the bloom’s presence.

[Figure 3.1: bar chart. Axis: % abundance of the total cyanobacterial cell count (Lake Samsonvale), 0 to 60. Categories: Aphanizomenon ovalisporum, Cylindrospermopsis raciborskii, Anabaena spp., Prochlorothrix spp., Microcystis spp., Other.]

Figure 3.1: Lake Samsonvale cyanobacteria cell counts. Abundance (% of total cyanobacterial cells) of potentially toxic cyanobacteria in a bloom encountered at Lake Samsonvale, Australia. Samples were collected in early spring 2003 (Data: SEQWater, 2003). Cell numbers against the total count for cyanobacteria are given.

A total of 125 clones from 2 independent 16S rDNA clone libraries were prepared from the DNA extracted from Lake Samsonvale (75 clones) and Lake Ainsworth (50 clones). Clones were partially sequenced (average: 510 nucleotides), the sequences compared and the closest relatives identified (Table 3.1). Clones affiliated with the phyla Cyanobacteria, Proteobacteria and Actinobacteria were found to be the most abundant in both lakes (Figure 3.2) and together accounted for greater than 90% of the total. Nine clones (7%) were affiliated with 3 other phyla: 2 with Bacteroidetes, 4 with Verrucomicrobia and 3 with Firmicutes. A total of 4 clones (3%) were most closely related to sequences whose phylogenetic affiliation is as yet uncertain. Comparisons with the GenBank database showed that the majority of the clones were most closely related to sequences recovered from other freshwater clones or isolates (<80%), whilst few were closely related to sequences recovered from soil or marine habitats. In particular, 11 clones (9%) were most closely related to sequences recovered from freshwater lakes susceptible to cyanobacterial blooms.

From the 125 clones a total of 64 phylotypes (sequences that exhibited <99% identity to one another and demonstrated highest similarity to the same relative) were identified. Twenty phylotypes from 5 phyla had a similarity of 96% or more to cultured isolates and comprised 31% of all clones analysed. Forty-four phylotypes from 5 phyla had less than 96% similarity to cultured isolates, or were related to previously cloned 16S rRNA gene sequences, and comprised 69% of the clones analysed. This shows that the microbial ecology of freshwater cyanobacterial blooms includes a diverse range of uncultured novel species. These phylotypes were used for further phylogenetic analysis to compare cyanobacterial bloom populations.
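The proportions quoted above can be re-derived from the counts given in the text. The following is a small arithmetic sketch, not part of the original analysis; the helper name `percent` is hypothetical. The clone counts are taken as a share of the 125 clones, while the two phylotype groups (20 and 44) are shown here as a share of the 64 phylotypes, which is the ratio that reproduces 31% and 69%.

```python
TOTAL_CLONES = 125      # clones sequenced across both libraries
TOTAL_PHYLOTYPES = 64   # phylotypes identified from those clones


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Clone counts quoted in the text, as a share of all 125 clones.
clone_counts = {
    "other phyla (2 + 4 + 3)": 2 + 4 + 3,
    "uncertain affiliation": 4,
    "relatives from bloom-prone lakes": 11,
}
for label, count in clone_counts.items():
    print(f"{label}: {count}/{TOTAL_CLONES} = {percent(count, TOTAL_CLONES):.1f}%")

# Phylotype counts quoted in the text, as a share of all 64 phylotypes.
phylotype_counts = {"96% or more to cultured isolates": 20, "less than 96%": 44}
for label, count in phylotype_counts.items():
    print(f"{label}: {count}/{TOTAL_PHYLOTYPES} = {percent(count, TOTAL_PHYLOTYPES):.1f}%")
```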

Table 3.1: Summary of closest relatives to cyanobacterial bloom 16S rRNA genes. Clones of the two different libraries constructed from the DNA extracted from the blooms of Lake Ainsworth and Lake Samsonvale and their affiliations to members of the various phyla. Clones listed in brackets are grouped as phylotypes with the clone which is listed first.

Phylum | Family | Clone(s) | Sim. (%) | Nearest relative (GenBank accession number) | Source
Actinobacteria | Microbacteriaceae | CYN-1-2, (2-3) | 99 | Uncultured bacterium clone HT2E3 (AF418967) | Horsetooth Reservoir, USA
 | | CYN-1-38 | 96 | Uncultured bacterium clone HT2E3 (AF418967) | Horsetooth Reservoir, USA
 | Propionibacterineae | MCY-41 | 98 | Uncultured Propionibacterineae bacterium PH-B24N (AF513962) | Brackish Pond, Hawaii
 | Corynebacteriaceae | MCY-28 | 99 | Corynebacterium fastidiosum (CIP103808) | Clinical sample
 | Unclassified | CYN-1-15, (1-8) | 99 | Uncultured actinobacterium N4 (AJ575531) | Lake Niegocin, Poland
 | | CYN-1-18 | 96 | Uncultured actinobacterium N4 (AJ575531) | Lake Niegocin, Poland
 | | CYN-1-50, (1-6) | 99 | Uncultured bacterium S9A-11 (AB154300) | Lake Kasumigaura, Japan (eutrophic)
 | | CYN-1-29, (2-47) | 96 | Uncultured bacterium S9A-11 (AB154300) | Lake Kasumigaura, Japan (eutrophic)
 | | CYN-1-40 | 98 | Uncultured bacterium S9A-07 (AB154299) | Lake Kasumigaura, Japan (eutrophic)
 | | CYN-2-23 | 98 | Uncultured actinobacterium LiUU-3-254 (AY496981) | Cyano-bloom, Sweden (eutrophic lake)
 | | CYN-2-20 | 100 | Uncultured actinobacterium LiUU-11-391 (AY497002) | Cyano-bloom, Sweden (eutrophic lake)
 | | CYN-2-44 | 97 | Uncultured actinobacterium LiUU-5-288 (AY496983) | Cyano-bloom, Sweden (eutrophic lake)
 | | CYN-1-42 | 97 | Uncultured Crater Lake bacterium CL120-17 (AF316668) | Ultra-oligotrophic Crater Lake, USA
 | | CYN-1-37 | 97 | Uncultured actinobacterium S5 (AJ575507) | Lake Schoehsee, Germany
Cyanobacteria | Family 4.1 | CYN-1-11, (1-45, 1-48, 2-2, 1-3, 1-19, 1-24, 1-9, 1-36, 1-21, 1-31, 2-39, 2-37, 1-25, 2-45, 2-43, 2-30) | 98-97 | Aphanizomenon flos-aquae (AY038035) | Brielse Meer, The Netherlands
 | | CYN-1-27, (2-1, 2-41, 2-42, 2-27) | 97 | Aphanizomenon cf. gracile 271 (AJ293125) | Lake Norre, Denmark
 | | CYN-2-19 | 98 | Anabaena cf. cylindrical PMC9705 (AJ293119) | Dam of Champsanglard, France
 | | CYN-1-30 | 98 | Aphanizomenon sp. TR183 (AJ133155) | Brackish water, Baltic sea
 | | CYN-1-28, (1-43, 1-47, 1-5, 2-25) | 99 | Cylindrospermopsis raciborskii strain 05E (AF516732) | NSW, Australia
 | Family 1.1 | MCY-42 | 93 | Synechococcus elongatus PCC 7942 (AF132930) | Freshwater
 | | MCY-84 | 98 | Microcystis sp. AICB 35 (AY672728) | Freshwater
 | Prochlorothrichaceae | CYN-2-18 | 95 | Prochlorothrix hollandica (AF132792) | Freshwater
 | | CYN-1-1 | 93 | Prochlorothrix hollandica (AF132792) | Freshwater
 | Unclassified | CYN-2-4, (2-6) | 98 | Uncultured diatom clone (AY168751) | Arsenite-oxidizing biofilm, USA
 | | MCY-7, (20, 87, 43, 53, 64, 77, 49, 62, 86, 18, 57, 23, 31, 38, 67, 73) | 99-97 | Microcystis aeruginosa PCC7005 (U40338) | USA
α-Proteobacteria | Methylocystaceae | CYN-1-32 | 99 | Methylosinus pucelana (AF107461) | Denitrification reactor
 | | CYN-1-34, (1-12) | 99 | Unidentified bacterium GLB-5 (AY345575) | Green Lake, Hawaii
 | | CYN-1-44, (1-49) | 99 | yanoikuyae B1 (U37524) | ?
 | Rickettsiaceae | CYN-1-35 | 99 | Uncultured bacterium RCP2-17 (AF523878) | Forest wetland, USA
 | Rhodospirallaceae | MCY-13 | 90 | Uncultured Green Bay ferromanganous micronodule bacterium MND8 (AF292999) | Green Bay, USA
 | | MCY-15 | 88 | Uncultured Green Bay ferromanganous micronodule bacterium MND8 (AF292999) | Green Bay, USA
 | Unclassified | CYN-1-41 | 99 | Uncultured alpha proteobacterium LiUU-11-297 (AY509420) | Cyano-bloom, Sweden (eutrophic lake)
 | | CYN-2-49 | 99 | Uncultured bacterium SY1-49 (AF107502) | Lake Soyang, South Korea
 | | CYN-1-10, (2-12, 2-22, 2-28) | 99 | Uncultured bacterium HTH6 (AF418965) | Horsetooth reservoir, USA
 | | CYN-2-33 | 98 | Uncultured alpha proteobacterium LiUU-11-28 (AY509409) | Cyano-bloom, Sweden (eutrophic lake)
 | | MCY-8, (61, 65) | 98 | Uncultured bacterium Sta0-39 (AJ416167) | Lake Ijssel, The Netherlands
 | | MCY-85, (29) | 98 | Uncultured alpha proteobacterium LiUU-3-194 (AY509417) | Cyano-bloom, Sweden (eutrophic lake)
 | | MCY-72 | 96 | Uncultured alpha proteobacterium LiUU-3-194 (AY509417) | Cyano-bloom, Sweden (eutrophic lake)
β-Proteobacteria | Comamonadaceae | CYN-1-17 | 97 | Uncultured beta proteobacterium (AF361203) | Rimov Reservoir, Czech Republic (meso-eutrophic)
 | | MCY-52 | 96 | Uncultured bacterium C-CF-23 (AF443568) | Semi-arid soil, USA
 | Incertae sedis | MCY-6, (10) | 96 | Uncultured beta proteobacterium, BIci13b (AJ318109) | Industrial biofilter, Germany
 | | MCY-25, (82) | 94 | Uncultured Pseudomonas sp. YJQ-50 (AY569299) | Spectacles Hot Spring, China
 | Burkholderiaceae | MCY-46 | 98 | Ralstonia sp. A-471 (AF525456) | Moderately thermophilic waste
 | Unclassified | CYN-2-14 | 96 | beta proteobacterium MWH-UniP1 (AJ565421) | Freshwater Lake, Tanzania
 | | CYN-1-23 | 97 | Hylemonella sp. WQH1 (AJ565430) | Lake Hallstadt, Australia
 | | CYN-2-36 | 99 | Uncultured beta proteobacterium LiUU-9-204 (AY509466) | Cyano-bloom, Sweden (eutrophic lake)
 | | MCY-1 | 97 | Uncultured bacterium (AY154553) | Earthworm intestine
 | | MCY-34 | 96 | Uncultured bacterium (AY154553) | Earthworm intestine
 | | MCY-47, (54, 56, 60) | 98 | Uncultured beta proteobacterium TLM05 (AF534429) | Toolik Lake, Alaska
 | | MCY-39 | 92 | Azoarcus indigens (AF011345) | Plant and Fungus associated
γ-Proteobacteria | Xanthomonadaceae | MCY-93 | 99 | Lysobacter sp. C3 (AY074793) | Soil (Plant-associated)
 | Legionellaceae | CYN-1-7 | 96 | Legionella feeleii (X73406) | ?
 | Unclassified | MCY-80 | 86 | Uncultured gamma proteobacterium FFP54 (AY830056) | Mariculture site (Marine)
 | | MCY-81 | 94 | Uncultured eubacterium WR8111 (AJ292891) | Polychlorinated biphenyl-polluted soil
Bacteroidetes | Flexibacteraceae | CYN-1-14 | 99 | Uncultured bacterium 207ds20 (AY212653) | Freshwater creek, USA
 | | CYN-1-4 | 97 | Uncultured bacterium 207ds20 (AY212653) | Freshwater creek, USA
Verrucomicrobia | Verrucomicrobiales | CYN-2-31, (1-22) | 99-98 | Uncultured Verrucomicrobia bacterium LiUU-9-291 (AY509507) | Cyano-bloom, Sweden (eutrophic lake)
 | | CYN-2-32 | 94 | Uncultured bacterium SY1-18 (AF107499) | Lake Soyang, South Korea
 | Xiphinematobacteraceae | CYN-1-33 | 96 | Unidentified eubacterium LD29 (AF009975) | Lake Loosdrecht, The Netherlands (eutrophic)
Firmicutes | Bacillaceae | MCY-59 | 98 | Bacillus sp. CCR4 (AJ810548) | Hot spring, Mexico
 | | MCY-22, (36) | 96 | Bacillus sp. partial 16S rRNA gene, isolate ISR04 (Y14144) | Sea of Galilee, Israel
Unclassified | Unclassified | CYN-2-46 | 97 | Uncultured eubacterium CR-PA13 (AF141415) | Columbia river, USA
 | | CYN-1-46 | 96 | Uncultured Crater Lake bacterium CL500-37 (AF316770) | Ultra-oligotrophic Crater Lake, USA
 | | MCY-16, (14) | 91 | Uncultured Planctomycetales SM1A02 (AF445645) | Hot springs, USA

[Figure 3.2: charts of the percentage of 16S rRNA clones per group. Panel A, Lake Samsonvale: α-Proteobacteria, β-Proteobacteria, γ-Proteobacteria, Actinobacteria, Verrucomicrobia, Bacteroidetes, Cyanobacteria, Others. Panel B, Lake Ainsworth: α-Proteobacteria, β-Proteobacteria, γ-Proteobacteria, Actinobacteria, Firmicutes, Cyanobacteria, Others.]

Figure 3.2: Comparison of the composition percentage of 16S rRNA clones. Clones affiliated with phyla from Lake Samsonvale (A) and Lake Ainsworth (B). Clones whose phylogenetic affiliation is as yet deemed ‘unclassified’ according to the taxonomical hierarchy proposed in Bergey's Manual of Systematic Bacteriology, 2nd Ed. are included in ‘others’.

3.3.1 Cyanobacteria

In total, 52 clones from the two lakes were affiliated with the phylum Cyanobacteria, making up 44% and 37% of clones from Lake Samsonvale and Lake Ainsworth respectively (Figure 3.2). As indicated by cell counts (Figure 3.1), the bloom samples collected from Lake Samsonvale contained various species of Cyanobacteria and were dominated by the filamentous cylindrospermopsin-producing genera Aphanizomenon and Cylindrospermopsis. This was reflected in the 16S rDNA analysis, with the majority of clones found to be most similar to Aphanizomenon (70%) and Cylindrospermopsis (15%) (Figure 3.3). Also present were low percentages of clones most similar to the potentially toxin-producing Cyanobacteria Anabaena and Prochlorothrix, as well as an uncultured diatom clone (Figure 3.3). This diversity was not represented in the bloom sample collected from Lake Ainsworth, with 95% of clones affiliated to the phylum Cyanobacteria found to be most similar to members of the Microcystis genus.
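The genus-level percentages above can be reproduced by tallying the Cyanobacteria-affiliated clones listed in Table 3.1. The sketch below is illustrative only: the counts are read from Table 3.1 by nearest-relative genus, and it assumes that CYN-prefixed clones come from Lake Samsonvale and MCY-prefixed clones from Lake Ainsworth, which the text does not state explicitly. The function name `genus_shares` is hypothetical.

```python
# Cyanobacteria-affiliated clones per nearest-relative genus (from Table 3.1).
samsonvale = {
    "Aphanizomenon": 17 + 5 + 1,   # flos-aquae, cf. gracile and sp. TR183 groups
    "Cylindrospermopsis": 5,
    "Anabaena": 1,
    "Prochlorothrix": 2,
    "uncultured diatom": 2,
}
ainsworth = {
    "Microcystis": 17 + 1,         # M. aeruginosa group plus Microcystis sp.
    "Synechococcus": 1,
}


def genus_shares(counts):
    """Percentage of clones assigned to each genus."""
    total = sum(counts.values())
    return {genus: 100.0 * n / total for genus, n in counts.items()}


for lake, counts in (("Lake Samsonvale", samsonvale), ("Lake Ainsworth", ainsworth)):
    print(lake, sum(counts.values()), "clones")
    for genus, share in genus_shares(counts).items():
        print(f"  {genus}: {share:.1f}%")
```

Under these assumptions the two libraries contribute 33 and 19 Cyanobacteria-affiliated clones, which together give the 52 clones reported above.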

[Figure 3.3: bar chart. Axis: % of 16S rDNA affiliated with genera from the phylum Cyanobacteria (Lake Samsonvale), 0 to 80. Categories: Aphanizomenon sp., Cylindrospermopsis sp., Anabaena sp., Prochlorothrix sp., uncultured diatom clone.]

Figure 3.3: Phylogenetic distributions at genus level for rDNA clones affiliated to the phylum Cyanobacteria (Lake Samsonvale). 16S rRNA genes amplified from the DNA recovered from a cyanobacterial bloom at Lake Samsonvale, Australia.

3.3.2 Heterotrophic bacteria comparative analysis

A phylogenetic comparison of the phylotypes identified in this study with published 16S rDNA sequences from heterotrophic bacterial communities associated with blooms (Eiler and Bertilsson, 2004) was performed to identify further phylogenetic patterns. Moreover, the addition of 64 phylotypes consisting of 125 sequences from Australian waters is expected to broaden current knowledge to include a warm eutrophic reservoir and an acidic oligotrophic lake. The criterion for defining a cluster, taken from Zwart et al. (2002), was that only monophyletic groups of 16S rDNA sequences that were supported by bootstraps, at least 95% identical and drawn from at least two separate environments were considered. Many of the phylotypes grouped with previously described freshwater clusters (Glockner et al., 2000; Zwart et al., 2002), including those described for communities associated with cyanobacterial blooms (Eiler and Bertilsson, 2004). From our analysis an additional 12 new clusters were identified. In addition, 1 phylotype was identified with a previously described cluster that has been hypothesized to be a signature for cyanobacterial bloom events (Eiler and Bertilsson, 2004). Our study therefore provides further support for this hypothesis.
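The cluster criteria described above can be expressed as a simple check. This is a hypothetical sketch rather than the procedure actually used: monophyly and bootstrap support are supplied as inputs read from the tree, the 50% bootstrap cut-off is an assumed placeholder because the text says only "supported by bootstraps", and "at least 95% identical" is interpreted as applying to every pair of member sequences.

```python
from itertools import combinations


def meets_cluster_criteria(members, identity, monophyletic, bootstrap,
                           min_identity=95.0, min_bootstrap=50.0,
                           min_environments=2):
    """Check a candidate cluster against the stated criteria.

    members maps sequence names to the environment they came from.
    identity maps frozenset({name_a, name_b}) to percentage identity.
    min_bootstrap is an assumed threshold, not a value from the text.
    """
    if not monophyletic or bootstrap < min_bootstrap:
        return False
    if len(set(members.values())) < min_environments:
        return False
    return all(identity[frozenset(pair)] >= min_identity
               for pair in combinations(members, 2))


members = {"clone_a": "Lake Samsonvale", "clone_b": "Swedish lake"}
identity = {frozenset({"clone_a", "clone_b"}): 97.0}
print(meets_cluster_criteria(members, identity, monophyletic=True, bootstrap=88.0))
```

The clone names and identity value in the usage lines are invented for illustration and do not correspond to sequences in this study.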

3.3.2a Actinobacteria

Fourteen phylotypes consisting of 18 bloom-associated clones (21% Lake Samsonvale; 6% Lake Ainsworth) were identified as members of the phylum Actinobacteria, a common division found in freshwaters of various trophic status, water chemistry and geographical location. Of the Actinobacteria-affiliated clones analysed, 89% were closely related to previously cloned 16S rRNA gene sequences. Four novel clusters (cynA, cynB, cynC and cynD) were identified with clones recovered from Lake Samsonvale (Figure 3.4). The cynA and cynB sub-clusters were affiliated with cluster acI, which presently consists of more than 200 environmental clones originating from various lakes, rivers and estuaries (Warnecke et al., 2004). In addition, clusters cynC and cynD were affiliated with cluster acIV, which presently contains more than 150 representatives dominated by sequences from freshwater environments (including cyanobacterial blooms), none of which are culturable (Warnecke et al., 2004). Clones from Lake Ainsworth additionally grouped in the Propionibacterineae and Corynebacteriaceae families. This is the first report of their association with cyanobacterial blooms. Greater than 25% of phylotypes grouped with a common freshwater Actinobacteria cluster, STA2-30, representatives of which have been detected in almost every freshwater habitat including cyanobacterial blooms (Zwart et al., 2002; Eiler and Bertilsson, 2004). Other Actinobacteria-affiliated clusters commonly represented in freshwater habitats, ACK-M1 and CL120-6, had no representatives in Lake Samsonvale or Lake Ainsworth. This is in contrast to results from previous studies of cyanobacterial bloom-associated communities, which found an abundance of sequences represented in these clusters (Eiler and Bertilsson, 2004).

3.3.2b Proteobacteria

A total of 29 phylotypes (43 clones) were affiliated with members of the Proteobacteria, which are the most abundant of the heterotrophic bacteria represented in the Domain Bacteria. Three of the five subphyla of the phylum Proteobacteria (α, β, and γ) were present in the two lakes, which is consistent with previous studies suggesting that these three subdivisions are the most common in freshwater (Glockner et al., 1999).

[Figure 3.4 tree image not reproduced. Labelled groups: CL120-6, ACK-M1, acI, STA2-30, cynA, cynB, cynC, acIV, cynD. Scale bar: 0.1. Outgroup: Thermus thermophilus, M26923.]

Figure 3.4: Phylogenetic tree of Actinobacteria-affiliated sequences.

Phylogenetic tree based on neighbour-joining analysis of partial 16S rDNA sequences from Lake Samsonvale (CYN-) and Lake Ainsworth (MCY-) affiliated to the phylum Actinobacteria and sequences extracted from GenBank. Bootstrap values greater than 500 are shown at nodes. Clusters defined by brackets are groups designated after Hiorns et al. (1997), Glockner et al. (2000), Urbach et al. (2001), Zwart et al. (2002), Warnecke et al. (2004) and Eiler and Bertilsson (2004). Novel freshwater clusters identified in this study are designated cyn (Lake Samsonvale) or mcy (Lake Ainsworth) according to the ecosystem from which they were recovered.

Both lakes studied contained similar proportions of α-Proteobacteria (16% Lake Ainsworth, 17% Lake Samsonvale) as indicated by cloned 16S rDNA sequences (Figure 3.2). This is in agreement with observations made by Glockner et al. (2000) and Eiler and Bertilsson (2004), who showed that α-Proteobacteria made up between 0 and 20% of the total population of bacteria in freshwater habitats. Phylotypes from Lake Samsonvale affiliated with cluster LD12 (Figure 3.5), which has previously been observed in most (70%) freshwater lakes (Zwart et al., 2002). No phylotypes from the acidic waters of Lake Ainsworth affiliated with this cluster (Figure 3.5); this cluster is thought to be more common in neutral to alkaline waters (Lindstrom and Leskinen, 2002), which our data further support. Three novel clusters, cynE, cynF and mcyA, were identified (Figure 3.5). Clusters cynE and mcyA lack any closely related sequences in the RDP and GenBank databases. Cluster cynE affiliates within the Rickettsiaceae family (Figure 3.5), which has not previously been reported in communities associated with cyanobacterial blooms. Cluster cynF consists of phylotypes from Lake Samsonvale and Lake Vallentunasjon (Sweden), which contained a cyanobacterial bloom dominated by Microcystis, Aphanizomenon and Anabaena species (Table 3.2). Two phylotypes from Lake Ainsworth affiliated with the previously described LiUU-9-115.2 cluster (Figure 3.5). This cluster is affiliated with the Acetobacteraceae, a family of acidophilic, gram-negative, aerobic, ellipsoidal to rod-shaped bacteria.

[Figure 3.5 tree image not reproduced. Labelled groups: LD12, cynE, Rickettsiaceae, Rhizobiales, cynF, Sphingomonadaceae, GOBB3-C201, mcyA, LiUU-9-115.2. Scale bar: 0.1. Outgroup: Thermus thermophilus (T), M26923.]

Figure 3.5: Phylogenetic tree of α-Proteobacteria-affiliated sequences.

Phylogenetic tree based on neighbour-joining analysis of partial 16S rDNA sequences from Lake Samsonvale (CYN-) and Lake Ainsworth (MCY-) affiliated to the subphylum α-Proteobacteria and sequences extracted from GenBank. Bootstrap values greater than 500 are shown at nodes. Clusters defined as in Figure 3.4.

Twenty-five percent of all clones from Lake Ainsworth were affiliated with members of the subphylum β-Proteobacteria, but only 5% of clones from Lake Samsonvale were affiliated with this subphylum (Figure 3.2). Two novel clusters, mcyB and mcyC, were identified (Figure 3.6). Cluster mcyC was affiliated with members of the family Comamonadaceae, which contains fifteen genera with diverse characteristics, including strains that produce sheaths (Mulder, 1989a, b), which are hollow tube-like structures surrounding a chain of cells. The sheaths help bacteria to attach to surfaces, obtain nutrients from slowly running water and resist predators. Cluster mcyC, consisting of two phylotypes from Lake Ainsworth and an isolate from a freshwater lake in Japan, lacked any close relatives (Figure 3.6). One phylotype recovered from Lake Samsonvale, CYN-2-14, was found to group in the LiUU-5-131 cluster, which has previously been inferred to be characteristic of cyanobacterial bloom events in Swedish lakes (Eiler and Bertilsson, 2004). Phylotypes from both Lake Samsonvale and Lake Ainsworth were affiliated with groups not previously reported with cyanobacterial blooms, including Ramlibacter, Hylemonella, Ralstonia and Azoarcus (Figure 3.6). Ralstonia species in particular occupy very diverse ecological niches including soils, sludge and wastewater as well as clinical samples. Phylotypes were also found to group in the Polynucleobacter necessarius cluster (Lake Samsonvale) and with sequences related to Rhodoferax sp. BAL47 (Lake Ainsworth). These clusters are commonly associated with freshwater environments (Zwart et al., 2002; Hahn, 2003) and their representatives are abundant in the communities so far reported to be associated with cyanobacterial blooms.

[Figure 3.6 tree image not reproduced. Labelled groups: Polynucleobacter necessarius, Ralstonia, LiUU-5-131, mcyB, LD28, Burkholderiaceae, mcyC, Hylemonella sp. WQH1, GKS-16, Comamonadaceae, Rhodoferax sp. BAL47, LiUU-11-174, LiUU-11-179.2. Scale bar: 0.1. Outgroup: Thermus thermophilus (T), M26923.]

Figure 3.6: Phylogenetic tree of β-Proteobacteria-affiliated sequences.

Phylogenetic tree based on neighbour-joining analysis of partial 16S rDNA sequences from Lake Samsonvale (CYN-) and Lake Ainsworth (MCY-) affiliated to the subphylum β-Proteobacteria and sequences extracted from GenBank. Bootstrap values greater than 500 are shown at nodes. Clusters defined as in Figure 3.4.

Within the γ-Proteobacteria, clones from Lake Ainsworth and Lake Samsonvale were identified, representing 2% and 5% of clones respectively (Figure 3.2). These low frequencies support previous observations that γ-Proteobacteria are less commonly represented in bacterial communities of freshwater habitats (Glockner et al., 1999; Eiler and Bertilsson, 2004). Groupings within the Legionellales were identified (Figure 3.7), which have been noted in cyanobacterial bloom-associated bacterial communities. One phylotype from Lake Ainsworth (MCY-93) exhibited 99% similarity to a Lysobacter isolate originating from soil, which is affiliated with the Xanthomonadaceae. To date no members of this family have been found in bacterial communities associated with cyanobacterial blooms.

3.3.2c Others

Phylotypes affiliated with members of the phylum Firmicutes were found only in Lake Ainsworth, representing 5% of clones in the library (Figure 3.2). Two phylotypes affiliated with members of the Bacillaceae, with one phylotype exhibiting close similarity to a freshwater isolate. To our knowledge clones representing this group have not been found in bacterial communities associated with cyanobacterial blooms. Phylotypes affiliated with members of the phylum Verrucomicrobia were found only in Lake Samsonvale, representing 5% of clones in this library (Figure 3.2). Previous studies have shown that specific clusters are widespread in freshwater lakes (Zwart et al., 2002) and communities associated with cyanobacterial blooms (Eiler and Bertilsson, 2004). However, no clones recovered from Lake Samsonvale affiliated with these clusters, instead grouping in two novel clusters, cynH and cynI (Figure 3.7). Two phylotypes retrieved from Lake Samsonvale were affiliated with members of the phylum Bacteroidetes, grouping within a novel cluster, cynG, consisting of an uncultured organism from a freshwater creek and 6 sequences from 3 different cyanobacterial bloom-affected lakes in Sweden (Figure 3.7).

[Figure 3.7 tree image not reproduced. Labelled groups: cynG, Bacteroidetes, Legionellales, gamma-Proteobacteria, Lysobacter, Firmicutes, Bacillus, Genera_incertae_sedis_OP10, Planctomycetes, cynH, cynI, Verrucomicrobia, Unclassified. Scale bar: 0.1. Outgroup: Thermus thermophilus, M26923.]

Figure 3.7: Phylogenetic tree of γ-Proteobacteria, Firmicutes, Verrucomicrobia, Bacteroidetes and candidate division OP10 (D) affiliated sequences.

Phylogenetic tree based on neighbour-joining analysis of partial 16S rDNA sequences from Lake Samsonvale (CYN-) and Lake Ainsworth (MCY-) affiliated to the subphylum γ-Proteobacteria and sequences extracted from GenBank. Bootstrap values greater than 500 are shown at nodes.

3.4 Discussion

3.4.1 Bloom species composition

The comparative analysis of 64 phylotypes obtained from the bacterioplankton of two contrasting bloom-affected water bodies showed that the vast majority affiliated with clusters consisting of sequences originating from freshwater. This adds further evidence that freshwater bacteria form lineages phylogenetically distinct from bacteria in neighbouring environments such as soil and sediment.

The major bacterial divisions represented in most freshwater sites are also mirrored in Lake Samsonvale and Lake Ainsworth, with Proteobacteria (alpha and beta subdivisions) and Actinobacteria representing the majority of heterotrophic bacteria from these two environments. A high percentage of total 16S rDNA clones consisted of sequences affiliated with Cyanobacteria (44% Lake Samsonvale, 37% Lake Ainsworth). Both lakes exhibited total counts of cyanobacteria greater than 100,000 cells per ml, so this high representation is not unexpected. Notable is the low percentage of clones affiliated with the Bacteroidetes (formerly known as the Cytophaga-Flavobacterium-Bacteroides (CFB) group), with only 3% of total clones in Lake Samsonvale and a complete absence in Lake Ainsworth (Figure 3.2). Members of the Bacteroidetes have been noted previously to be abundant in a variety of freshwater habitats, in particular four Swedish lakes affected by cyanobacterial blooms, where they were found to make up 30-50% of the total population (Eiler and Bertilsson, 2004). This division has been noted previously (Glockner et al., 1999; Cottrell and Kirchman, 2000; Eilers et al., 2000) to be frequently underrepresented, which may be due to bias in DNA extraction and/or the specificity of primers used in PCR, although universal bacterial primers were used.

3.4.2 Heterotrophic bacteria comparative analysis

Establishing the link between heterotrophic bacteria in cyanobacterial blooms and their specific functions and ecological properties has the potential to bring about a better understanding of the overall ecology of cyanobacterial blooms.

To date the bacterial communities of only 6 lakes affected by cyanobacterial blooms have been investigated: four lakes in Sweden and the two Australian lakes in this study. To extend the preliminary examination of 16S rDNA sequences from the two Australian lakes, we compared the 64 phylotypes identified in this study with their closest relatives and with the available cyanobacterial bloom-associated datasets.

A total of 12 novel clusters consisting of 22 sequences were noted, spanning all divisions represented in the analysis. In addition, 5 ‘isolate’ clusters distinct from Swedish lake clusters and not previously noted to be associated with cyanobacterial blooms were identified, further increasing knowledge of the bacterial groups represented in communities associated with cyanobacterial blooms. Seven of the 12 novel clusters (cynD, cynF, cynG, cynH, cynI, mcyA, mcyC) were found to lack any close relatives. Clusters cynF, cynH and cynI in particular consist of clones from Lake Samsonvale and several sequences from cyanobacterial bloom-affected lakes in Sweden. These findings suggest that sequences in these clusters may represent bacteria characteristic of bloom events. In addition, a clone from Lake Samsonvale was affiliated with the cluster LiUU-5-131 (Eiler and Bertilsson, 2004), which has been speculated previously to be characteristic of cyanobacterial bloom events. To learn more about the roles of such organisms in cyanobacterial blooms, specific 16S rDNA probes need to be developed and applied to determine the distribution of these organisms. Likewise, cultured representatives of these clusters are needed to learn more about the possible functional roles these bacteria may play in triggering bloom events and/or in secondary metabolite production.

It has been noted that although freshwater bacterial clusters are phylogenetically distinct from those of other environments, they exhibit a wide geographical distribution (Zwart et al., 1998; Glockner et al., 2000; Zwart et al., 2002). Only a few attempts have been made to relate community composition to chemical and physical parameters in lakes such as trophic status (Lindstrom, 2000), with results suggesting that the nutrient content of water bodies at least indirectly influences the structure of bacterial communities. To date, there are no reported studies investigating bacterial communities associated with blooms originating from waters of contrasting trophic status and humic content.

In this study the bloom-associated heterotrophic bacterial communities of two lakes of different trophic status and humic content were compared with each other and with the bacterial communities of four bloom-affected Swedish lakes. Phylogenetic analysis of sequences from these 6 lakes showed that 16% and 0% of the 16S rDNA phylotypes retrieved from the acidic, dystrophic, oligotrophic waters of Lake Ainsworth were affiliated with clusters containing sequences from the eutrophic waters of the 4 Swedish lakes and Lake Samsonvale, respectively. This is in contrast to the 16S rDNA phylotypes from Lake Samsonvale, of which 67% were affiliated with clusters that were restricted to or dominated by sequences retrieved from the eutrophic bloom-affected Swedish lakes. Despite the geographical separation, preliminary data suggest that the community composition of Lake Samsonvale shares a greater proportion of bacterial groups with bloom-affected Swedish lakes than does that of Lake Ainsworth, which may be attributed to the lake character of the latter being quite dissimilar to the other water bodies (Table 3.2). Another possible explanation for these differences is that the dominant Cyanobacteria species affect the heterotrophic bacterial population structure. However, there was no apparent pattern of heterotrophic bacteria grouping with the Cyanobacteria species dominating the sample from which they originated.

Although to date the composition of bacterial communities associated with cyanobacterial blooms has been investigated in only 6 water bodies, preliminary results indicate that differences in lake character appear to influence bacterial community composition. A more comprehensive investigation spanning the bacterial community composition of various lake types experiencing cyanobacterial blooms is needed to achieve a greater understanding of the effects of lake character on bacterial community composition, which ultimately may influence cyanobacterial blooms and the many events that are associated with them. Overall, the results shown here further support the finding of Eiler and Bertilsson (2004) that different cyanobacterial blooms are accompanied by dissimilar bacterial communities.

Table 3.2: Summary of physico-chemical characteristics and prevalent Cyanobacteria species present in Lake Samsonvale, Lake Ainsworth and 4 Swedish lakes used in comparative analysis.

| Characteristic | Lake Samsonvale | Lake Ainsworth | Swedish lakes (Ekoln, Erken, Limmaren, Vallentunasjon) |
| --- | --- | --- | --- |
| Lake surface area (km2) | 21.63 | 0.13 | 6-23 |
| Mean depth (m) | 9.4 | 4.0 | 2.7-17.1 |
| Lake character | Monomictic, eutrophic | Dystrophic, polymictic, oligotrophic | Dimictic, eutrophic (Vallentunasjon: hyper-eutrophic) |
| Water temperature (°C) | 26 | n/a | 21-23 |
| pH | Neutral-slightly alkaline | Acidic | Neutral-slightly alkaline |
| Prevalent Cyanobacteria species | Aphanizomenon spp., Cylindrospermopsis spp., Anabaena spp. | Microcystis spp., Anabaena spp. | Ekoln: Microcystis spp.; Erken: Gloeotrichia echinulata; Limmaren: Aphanizomenon spp., Anabaena spp.; Vallentunasjon: Microcystis spp., Anabaena spp., Aphanizomenon spp. |

3.4.3 Sample selection for metagenomic library construction

An abundance of 16S rDNA clones affiliated with the phylum Cyanobacteria and members of the toxin-producing genera Cylindrospermopsis and Aphanizomenon was noted (Figure 3.3). Species of both these genera are known to be associated with the production of the hepatotoxin cylindrospermopsin, which was additionally detected in the bloom sample. Little is known regarding the genes involved in its production and expression when compared to more commonly studied cyanobacterial metabolites such as microcystin (Section 1.3.4).

Additionally, it is important to note that a substantial proportion of cloned sequences from Lake Samsonvale affiliated with the phylum Actinobacteria (21%; Figure 3.2A). Previous studies involving Actinobacteria from Lake Samsonvale have shown that microorganisms from this group make up a high proportion (18-24%) of the total bacterial population in this reservoir, with implicated links to production of geosmin and MIB, secondary metabolites often linked to cyanobacteria (Klausen et al., 2004; Nielsen et al., 2006). None of the clones analysed from Lake Samsonvale were affiliated with cultured representatives (Table 3.1). Furthermore, the majority of phylotypes demonstrated affiliation to clusters of freshwater Actinobacteria such as acIV, for which there are as yet no cultured representatives. Therefore any genome information from the organisms from which these genes originated would be of value. The demonstrated high levels of cylindrospermopsin-producing Cyanobacteria and uncultured Actinobacteria were used to justify the selection of the bloom sample from Lake Samsonvale for further metagenomic analysis, so that insights into the genomes of these and other organisms may be gained.

CHAPTER 4 – METAGENOMIC LIBRARY CONSTRUCTION

4.1 Introduction

The few genetic studies that have previously been performed on cyanobacterial cultures and their associated metabolites have demonstrated that using molecular tools to characterize the genes involved in metabolite biosynthesis in cyanobacterial blooms is a particularly appealing way to help uncover the formation, expression and ecological relevance of metabolites (Kaebernick and Neilan, 2001). To better understand the community structure of cyanobacterial blooms in the real environment, as well as the function of these species in terms of the formation of blooms and expression of cyanobacterial secondary metabolites, information on the genes involved is required. Metagenomics is a new and exciting culture-independent approach that can theoretically access genes of any sequence or function in a particular cyanobacterial bloom. It potentially provides tools to identify species and characterize genes and/or gene clusters responsible for bloom formation and/or metabolite production in cyanobacterial blooms. It is hoped that such information would ultimately be used by water management to better predict, assess and possibly control bloom events.

Results from Chapter 3 indicated that DNA extracted from bloom samples collected from Lake Samsonvale was suitable for metagenomic analysis. This chapter describes the construction of a BAC library made with large random fragments of DNA isolated directly from the Lake Samsonvale cyanobacterial bloom sample. DNA used for this metagenomic library construction was initially screened for specific genes previously described to be directly involved in microbial function (PKS), with a separate single-gene clone library constructed from the resulting PCR fragments. Clones from the resulting library were sequenced and the diversity of genes implicated in the function of species in cyanobacterial blooms was subsequently assessed. Analysis of these results in relation to the 16S rDNA results in Chapter 3 (section 3.4.2) was used as a means to assess library quality and integrity.

4.2 Experimental procedures

4.2.1 Partial restriction enzyme digestion and size selection of random HMW DNA fragments

A 'wide-window' concentration-scale BamHI digest was used to identify restriction enzyme (RE) concentrations that gave ideally sized DNA fragments for library construction. Partial digestion of HMW DNA extracted from Lake Samsonvale (section 3.2.1) was performed to fragment the DNA before the desired size range could be selected. Experimental conditions were as per section 2.6.5. Digested DNA was separated by PFGE as per section 2.6.4 with a switch time of 30 seconds and a running time of 16 hours.

For size selection of DNA to be used in library construction, an area of the LMT agarose gel suspected of containing DNA of interest was excised after electrophoresis and prior to ethidium bromide staining, as per section 2.6.4. DNA ranging in size from 100-300 kb was excised from the gel using sterile scalpels and purified with 1 U of GELase (Epicentre) as per section 2.6.6.

4.2.2 Construction of a cyanobacterial bloom metagenomic library

Metagenomic library construction was as follows: 100 ng of size-selected DNA was ligated to a pre-dephosphorylated pIndigoBAC-5 (Epicentre) vector in a molar ratio of 10:1 using 4 U/µl T4 DNA ligase (Promega) (section 2.6.6). Dialyzed ligation products (2 µl; section 2.6.7) were used to transform 30 µl of commercial electrocompetent cells by electroporation (section 2.6.8). White, recombinant colonies were inoculated into 96-well plates, incubated for 24 hours and stored at -80°C as per section 2.6.9.

4.2.3 Insert size estimation

The quality and integrity of the metagenomic library was tested by analysis of a number of randomly selected clones. Plasmid preparations of BAC DNA were performed using Promega SV wizard™ miniprep spin columns (Promega) as per the manufacturer's protocol, with minor adjustments to accommodate the low-copy vector used (section 2.6.10). To assess insert size, BAC DNA of randomly selected clones was digested with 5 U Bgl II for 2 hours at 37°C and separated by PFGE (section 2.6.4) at 6 V/cm with a 5-15 s switch time for 16 h at 12°C.

4.2.4 PCR amplification, single-gene library construction and library analysis

HMW DNA, which was subsequently used as a template in BAC library construction, was screened for the presence of PKS genes as an additional test of library quality and integrity. Genes of interest were amplified using PCR (section 2.6.13) with PKS-cyanobacteria degenerate primers DKF/DKR (Schembri et al., 2001). Purified PCR fragments (section 2.6.13) were cloned in the pGEM-T vector using the pGEM-T easy cloning kit (section 2.4.15) and transformed into E. coli JM109 cells (Promega) as per the manufacturer's instructions. Plasmid DNA containing cloned PCR products was extracted (section 2.6.10) and DNA sequencing performed (section 2.6.14) with M13 forward and reverse primers. DNA sequences were edited using BioEdit (Hall, 1998) and the closest relative identified by sequence comparison with the GenBank databases using the BLASTn program (Altschul et al., 1990).

4.3 Results

4.3.1 Metagenomic library construction

To access information contained in the genes of cyanobacterial blooms, a BAC library, termed CBNPD1, was prepared from HMW DNA isolated from a natural assemblage (a cyanobacterial bloom collected from the waters of Lake Samsonvale). This metagenomic library consists of 2850 clones and is arrayed in thirty 96-well microtiter plates.

HMW DNA was subjected to various partial digestion conditions to determine the optimum for size fractionation (Figure 4.1). Undigested DNA (Figure 4.1, lane 2) contained a substantial amount of nucleic acid in the size range 50-250 kb. This sheared DNA masked the BamHI-digested DNA, so partial digests were assessed by monitoring the appearance of DNA smaller than 50 kb and an increase in concentration between 50 and 300 kb. To reduce concentrations of small restriction fragments, size fractionation of DNA was performed twice within a single gel (Figure 4.2). This was in preference to using a second gel, which was shown to result in large decreases in DNA recovery (data not shown).

Figure 4.1: PFGE size fractionation of partially digested environmental DNA.

Lane 1, lambda concatamer ladder; lane 2, no addition of BamHI; lanes 3-10, digestion with 0.5, 1, 2, 3, 4, 5, 15 and 20 units of BamHI per 100 μl reaction.

Figure 4.2: First and second size selection on one gel

The first and second size selections on the same gel were performed at 6 V/cm, 90 second pulse, 14°C for 4 hours (first size selection) and at 6 V/cm, 30 second pulse, 14°C for 12 hours (second size selection). After electrophoresis the desired DNA fragments were cut from the gel for ligation.

4.3.2 Metagenomic library quality: insert size

The diversity and average size of inserts generated from HMW fragments were examined by Bgl II digestion and PFGE of approximately 1.5% (n=40) of CBNPD1 clones. CBNPD1 contained inserts ranging in size from 5 kb to greater than 50 kb (Figures 4.3 and 4.4). The average insert size was approximately 27 kb, and more than 60% of inserts were larger than 20 kb. It is therefore estimated that CBNPD1 contains approximately 77 Mb of DNA. Assuming an average gene size of ~1 kb, CBNPD1 could contain as many as 77 000 genes (Rondon et al., 2000).
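The library-size estimate is simple arithmetic and can be reproduced as a quick check. The sketch below assumes that every clone carries an insert of the mean size and that genes are packed end to end at 1 kb each, so the gene count is an upper bound. The function names are illustrative and are not part of the original analysis.

```python
# Back-of-envelope estimate of the DNA captured in a clone library.
# Input values are those reported in the text; function names are hypothetical.

def library_size_mb(n_clones: int, mean_insert_kb: float) -> float:
    """Total cloned DNA in megabases (1 Mb = 1000 kb)."""
    return n_clones * mean_insert_kb / 1000.0


def estimated_gene_count(total_mb: float, mean_gene_kb: float = 1.0) -> int:
    """Upper-bound gene count, assuming back-to-back genes of mean_gene_kb."""
    return round(total_mb * 1000.0 / mean_gene_kb)


total_mb = library_size_mb(n_clones=2850, mean_insert_kb=27)
print(f"{total_mb:.2f} Mb")             # 76.95 Mb, i.e. approximately 77 Mb
print(estimated_gene_count(total_mb))   # 76950, i.e. roughly 77 000 genes
```

Because only 40 clones were sized, the 27 kb mean carries sampling error, and the totals should be read as order-of-magnitude figures.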

Figure 4.3: PFGE of randomly selected environmental BAC clones.

Clones digested with Bgl II. Electrophoresis was performed for 15 h at 12°C in 0.5x TBE buffer at 6 V cm-1 with 5-15 s pulses. Lane 1, lambda concatamer ladder (New England Biolabs); lane 2, lambda DNA digested with HindIII. Arrow indicates the 2 kb segment of the BAC vector.


Figure 4.4: Size distribution of BAC clones in CBNPD1. Clones within a range of 10 kb are grouped together.

4.3.3 Metagenomic library quality: PKS gene survey

To assess the quality of the template DNA used in library construction, in terms of genes involved in species diversity and cyanobacterial secondary metabolite production, PCR screens were conducted targeting genes that encode enzymes involved in secondary metabolite production. The screens used cyanobacterial PKS-specific primers (DKF/DKR) and various templates used in CBNPD1 library construction. PCR products (Figure 4.5) from two screens, one using the original HMW genomic DNA extraction as template and the other the reaction mixture used for ligation, were cloned and sequenced.

Twenty-three unique nucleotide sequences were obtained, and alignment of the predicted protein sequences with GenBank sequences revealed that 10 of them encoded regions of PKS genes (Table 4.1). Importantly, this analysis suggests that the diversity of DNA within this particular cyanobacterial bloom was carried over to the large pieces of DNA that were extracted from the bloom and cloned into vectors to form an environmental library.


Figure 4.5: Gel electrophoresis analysis of PCR products from library quality screens. Depicted are two positive PCR reactions using cyanobacterial PKS-specific primers DKF/DKR from two different DNA templates used in CBNPD1 BAC library construction: DNA extraction (lane 2) and ligation mix (lane 3) prior to transformation into the host strain. Lane 1: lambda HindIII DNA marker.

Table 4.1: Protein-coding PKS genes

Protein-coding PKS genes with highest similarity to sequenced clone insert DNA originating from the PCR screens. The screens were aimed at assessing templates used for CBNPD1 BAC library construction for genes involved in cyanobacterial secondary metabolite production. The species from which the genes originated are also listed.

Clone | Best BLAST hit | Putative function | Identity
PKS-2 | Microcystis spp. | PKS McyG | 98%
PKS-6 | Microcystis spp. | PKS McyD | 96%
PKS-7 | Nostoc spp. | Dethiobiotin synthetase | 56%
PKS-8 | Nostoc spp. | PKS modules | 81%
PKS-10 | Clostridium spp. | pksE | 58%
PKS-12 | Anabaena spp. | PKS modules | 69%
PKS-14 | Nostoc spp. | PKS modules | 75%
PKS-16 | Synechocystis spp. | acetyl-CoA acetyltransferase | 74%
PKS-18 | Anabaena spp. | PKS type 1 | 61%
PKS-19 | Oscillatoria spp. | PKS | 93%
PKS-21 | Microcystis spp. | PKS McyD | 98%
PKS-23 | Anabaena spp. | PKS | 69%

4.4 Discussion

The increasing occurrence of cyanobacterial blooms in freshwater systems, and the variety of secondary metabolites they produce that affect human health, warrant large-scale approaches to understanding the ‘bigger picture’ of cyanobacterial bloom population structure and function. Studying the metagenomes of cyanobacterial blooms should yield more information on bloom formation and the production of associated secondary metabolites. The complexity of these metagenomes, and particularly their inaccessibility via traditional methods, requires approaches that are culture-independent. The large-construct environmental BAC library described here (CBNPD1) demonstrates a means of large-scale, culture-independent access to the genomes of microorganisms in cyanobacterial blooms. Importantly, it has opened a route to population structure analysis of cyanobacterial blooms and offers opportunities to study not only individual genes involved in metabolite production but also gene clusters.

A concern with metagenomic analysis is that the methodology employed for library construction can result in a biased representation of the community microflora (Liles et al., 2003). In our case it is especially important that DNA originating from secondary metabolite-producing cyanobacteria is carried over to the large pieces of DNA captured in this BAC library. Alongside the BAC system we have used alternative culture-independent methods that focus specifically on extracting functional (PKS) or structural (16S rDNA) sequences from environmental DNA, using PCR and specific primers to access gene fragments. These libraries were constructed with the aim of assessing the ‘quality’ of the DNA used as a template for BAC library construction in terms of the presence of 16S rDNA and/or PKS genes affiliated with known secondary metabolite-producing cyanobacteria.

Results from chapter 3 demonstrated an abundance of secondary metabolite-producing Cyanobacteria and Actinobacteria. PKS genes affiliated with members of the toxin-producing genera Microcystis, Nostoc and Anabaena were also noted (Table 4.1). Our results demonstrate that DNA fragments from secondary metabolite-producing cyanobacteria are represented in the DNA extractions used for BAC library construction. Beyond this initial aim, these single-gene libraries have led to the discovery of novel 16S rRNA and PKS sequences (Table 3.1 and Table 4.1) from this particular cyanobacterial bloom sample, providing a window into the diversity of sequences involved in population structure and function. It is hoped that the BAC system will now move beyond these single gene sequences towards a greater understanding of the phylogenetic, physiological and functional properties of the microorganisms present in cyanobacterial blooms. Screening techniques described in chapter 5 will be used to build on the information gained here to identify clones that contain genes of interest.

CHAPTER 5 – METAGENOMIC LIBRARY SCREENING

5.1 Introduction

Several techniques have been used to identify genes of interest from metagenomic libraries, based either on metabolic activity (function-driven approach) or on nucleotide sequence (sequence-driven approach). PCR with target-specific primers is commonly used for sequence-based screening of metagenomic libraries to identify genes of functional interest such as polyketide synthases (Piel, 2002; Courtois et al., 2003; Ginolhac et al., 2004; Piel et al., 2004), proteorhodopsins (de la Torre et al., 2003) and histidine protein kinases (Hughes et al., 1997), or a phylogenetic anchor such as 16S rDNA. The advantage of identifying clones that harbour phylogenetic anchor genes on large inserts is that the DNA surrounding these genes can be sequenced. This enables partial genomic characterization of uncultivated microorganisms, providing possible insights into the physiology, ecological role and evolution of the organisms. This approach has been used successfully to characterize uncultivated members of the phylum Acidobacteria, which are abundant in soil but about which little is known (Liles et al., 2003; Quaiser et al., 2003).

Another method of sequence-based screening is random BAC-end sequencing, which involves obtaining a single sequence read from each end of the environmental DNA insert. BAC-end sequence data are collected from a large number of randomly selected clones in a given library, annotated, and used to infer to some extent the genomic potential and diversity of the large-insert library and of the environment from which the library was constructed. Metagenomics, and BAC cloning in particular, uses large constructs, so that a clone found to contain an interesting gene or phylogenetic marker in the survey can be sequenced to completion, providing additional information on genes or possibly gene clusters of the organism from which the DNA fragment originated.

This chapter describes the sequence-based screening of BAC library CBNPD1 using both a PCR-based approach to identify clones containing 16S rDNA as well as a random BAC-end survey. Clones found to contain genes of interest and those that are deemed appropriate will be selected for insert-sequencing to be completed and reported in chapter 6.

5.2 Environmental procedures

5.2.1 BAC-clone 'pooling' and plasmid extraction

BAC DNA from clones 481-960 was pooled in a 48x format to reduce the time and costs associated with library screening. BAC clones were inoculated onto LB+Ch agar plates in a 48x format (8x6 array) and incubated at 37°C for 24 hours. The resulting colonies were scraped off using sterile inoculation loops, resuspended in 10 ml TB+Ch and pelleted by centrifugation at 10000 rpm for 10 minutes. Plasmid DNA from pooled clone cultures was extracted using the altered small-scale plasmid purification procedure (section 2.5.11).

5.2.2 Digestion of chromosomal DNA

Although the majority of host chromosomal DNA is removed during the neutralization step of BAC DNA isolation, trace quantities may remain and compromise the specificity of subsequent PCR screens. To reduce the possibility of amplifying host DNA in these screens, BAC DNA was digested using a plasmid-safe ATP-dependent DNase (Epicentre) as per the manufacturer’s instructions.

5.2.3 PCR amplification

To screen the information contained in the clone library, a PCR approach with specific primers was used to identify clones containing 16S rRNA genes. PCR reactions were run as per section 2.5.14. Template DNA prepared as per section 4.2.1-2 was used with universal 16S rDNA primer FD1 and custom-designed primer CR01.

5.2.4 BAC-end sequencing

Numerous difficulties are associated with BAC DNA sequencing, mostly due to an insufficient concentration of BAC DNA. BAC DNA was therefore purified using the large-scale plasmid purification procedure (section 2.5.12) to ensure a higher concentration. Sequencing reactions were performed as per section 2.5.15 using vector primers T7 and RPII (Epicentre), and template DNA was added to a concentration of 5 ng/μl.

5.2.5 Computational analysis of DNA sequences

DNA sequences were edited to remove inaccurately called bases and vector sequence. BAC-end sequences were then analyzed by translated similarity searches against the GenBank nr database using BLASTx software. COG functions of genes found to affiliate with translated BAC-end sequences were identified using the Integrated Microbial Genomes (IMG) system v 1.3 (Markowitz et al., 2006). 16S rRNA sequences were compared against the GenBank database using BLASTn software. Sequence data were additionally processed with GeneMark (Benson et al., 2002) to identify open reading frames (ORFs).
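To illustrate what ORF identification involves, the following is a deliberately simplified sketch. It is not the algorithm used by the gene-finding software named above, which relies on statistical models of coding sequence. The sketch only scans the forward strand in three frames for an ATG followed in frame by a stop codon; the function name and the `min_codons` threshold are assumptions made for the example.

```python
# Naive ORF scan: ATG ... stop, forward strand only, all three reading frames.
# A teaching sketch, not a substitute for model-based gene prediction.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30):
    """Return (start, end) pairs, 0-based with end exclusive, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                    # look for the next ORF in this frame
    return orfs


toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
print(find_orfs(toy, min_codons=5))             # [(2, 20)]
```

A fuller treatment would also scan the reverse complement and allow alternative start codons, both of which matter for short BAC-end reads that may begin or end inside a gene.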

5.3 Results

5.3.1 PCR-based library screening

Screening of CBNPD1 was used to detect clones containing 16S rDNA so that a link between physiological and phylogenetic information of uncharacterized microorganisms in cyanobacterial blooms could be established. A PCR-based method was used to amplify 16S rRNA gene sequences from BAC DNA extracted from pooled cultures (Figure 5.1). Once a positive was identified from a given pool of 48 clones (Figure 5.2), the pool was broken down into pools of 12, and the clones of each positive pool of 12 were eventually examined individually (Figure 5.3) to identify the clone containing the 16S rRNA gene. This process provides evidence that the positive identified is encoded in a BAC clone and is not the result of host DNA being amplified.
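The saving offered by this 48x, then 12x, then single-clone strategy can be sketched as a small simulation. The code assumes an idealised PCR with no false positives or false negatives, which the variable template quality described in this section does not guarantee. The function name is hypothetical, and the two positive clone numbers are those reported in Table 5.1.

```python
def pooled_screen(clones, positives, pool=48, subpool=12):
    """Simulate a 48x -> 12x -> single-clone PCR screen.

    clones: ordered list of clone numbers.
    positives: set of clone numbers assumed to carry the target gene.
    Returns (number_of_reactions, clones_found).
    """
    reactions = 0
    found = []
    for i in range(0, len(clones), pool):
        big = clones[i:i + pool]
        reactions += 1                          # one PCR on the 48x pool
        if not positives.intersection(big):
            continue                            # negative pool: no further work
        for j in range(0, len(big), subpool):
            small = big[j:j + subpool]
            reactions += 1                      # one PCR on each 12x sub-pool
            if not positives.intersection(small):
                continue
            for clone in small:
                reactions += 1                  # one PCR per individual clone
                if clone in positives:
                    found.append(clone)
    return reactions, found


clones = list(range(481, 961))                  # clones 481-960
reactions, found = pooled_screen(clones, positives={545, 578})
print(reactions, found)                         # 42 [545, 578]
```

Under these assumptions 42 reactions replace the 480 that individual screening of the same clones would need. The positive 12x sub-pools in the simulation are clones 541-552 and 577-588, the same groups shown in Figure 5.3.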

Results from screening of the 48x and 12x pools were highly variable depending on the quality of the BAC DNA template. Host chromosome digestion was used to great effect to remove contaminating DNA and optimize PCR reactions, as indicated in Figure 5.2. However, plasmid-safe DNase treatment dilutes the BAC DNA template roughly 1 in 25, making it difficult to distinguish between a false negative and a true negative. We therefore screened CBNPD1 both with and without plasmid-safe DNase treatment.


Figure 5.1: BAC DNA extractions of 12x pooled cultures

Figure 5.2: PCR screening of 48x pools of CBNPD1 clones before and after treatment of BAC DNA template with plasmid-safe DNase

Plasmid-safe DNase treatment was used to remove host chromosomal DNA. The figure depicts PCR screenings before (A) and after (B) treatment. Clones 481-960, arranged in 48x pools, were used as template DNA.


Figure 5.3: PCR screening of CBNPD1 clones

Screenings of clones 577-588 (A) and 541-552 (B) to identify clones containing 16S rDNA. Lanes exhibiting a positive result represent amplified 16S rDNA products from individually examined clones, which were traced from positive 48x and 12x pool screenings. +ve con: positive control. λ: lambda DNA digested with HindIII.

Positives identified were sequenced from PCR products and a bacterial phylum/genus/species affiliation was assigned based on phylogenetic analysis. From the positives identified in pools of 48 clones (Figure 5.3) we recovered 2 clones containing sequences belonging to different genera, Pseudomonas and Roseateles, both from the phylum Proteobacteria (Table 5.1). These clones offered the opportunity to further investigate the functional biology of these organisms by sequence analysis of the entire BAC insert and functional analysis of the genes encoded within. Such information would be vital for improving knowledge of their ecological roles in cyanobacterial blooms and associated events.


Table 5.1: 16S rRNA sequences obtained from CBNPD1 BAC clones

Clone | Insert size (kb) | Phylogenetic affiliation (Phylum; Order / Genus) | Sim. %
545 | 30 | γ-Proteobacteria; Pseudomonadaceae / Pseudomonas | 99
578 | 22 | β-Proteobacteria; Uncharacterised bacterium | 99
 |  | β-Proteobacteria; Burkholderiales / Roseateles | 95

5.3.2 Random sequence library screening

DNA sequencing using T7 and RPII primers was performed on 36 BAC clones for a total of 72 sequencing reactions (Table 5.2). Five of these sequencing reactions failed to generate quality sequence data, perhaps as a result of failures in the sequencing reaction itself. No clone had both BAC-end sequences fail, which ruled out plasmid purification as a cause. In total 67 sequences were obtained and submitted for BLASTx analysis against the GenBank nr database, with the exception of the 7 sequences found to contain ORFs, which were submitted for BLASTp analysis. G+C composition ranged from 33.33 to 72.91%. Sequences with similarity scores greater than 80% were considered to be high matching. Putative COG function was assigned by searching the closest relative obtained by BLASTx or BLASTp against the COG database using the Integrated Microbial Genomes (IMG) system.

Table 5.2: CBNPD1 library BAC-end sequencing statistics

Number of DNA sequencing reactions performed | 72
Number of reactions that failed | 5
Number of sequence tags with high scoring matches | 29
Number of sequence tags matching known proteins | 36
Number of sequence tags matching hypothetical conserved proteins | 12
Number of sequence tags matching genera with no available genome | 15
Number of ORFs identified | 7
Estimated bases sequenced | 40.4 kb

Of all the microbial genomes that have been completed, only 8 phylotypes are within the phylum Cyanobacteria. Very little is known regarding the structural and functional diversity of Cyanobacteria, or of heterotrophic bacteria, within a cyanobacterial bloom. This BAC-end sequencing survey has provided an insight into the genetic potential of as yet uncharacterized microbes, with a total of 15 sequence tags found most similar to sequences affiliated to genera for which there is at present no available genome information (Table 5.2). Another 17 sequence tags were found most similar to sequences affiliated to genera with available genomes but had similarities of less than 80%. Detailed descriptions of the 67 annotated sequences from the randomly selected CBNPD1 clones are listed in Table 5.3. Based on analysis using the Integrated Microbial Genomes system, these BAC-end sequences have been classed into various COG functional categories. Sequence tags were found with affiliation to proteins involved in a wide array of cell metabolism processes including amino acid metabolism (e.g. methionine synthase), carbohydrate metabolism (cellulase), inorganic ion metabolism (nitrite/sulfite reductase) and lipid metabolism (fatty acid hydroxylase). A number of genes involved in cell structures (e.g. flagella), DNA processes, energy production (photosynthetic reaction center L subunit) and defense mechanisms (nucleases) were also affiliated to sequence tags.

Table 5.3: Cyanobacterial bloom derived BAC-end sequence matches

* Clones selected for insert sequencing completion and analysis (Chapter 6). These clones were selected on several criteria depending on the presence of a phylogenetic anchor, whether the sequence tag was affiliated to a phylum and/or genus that was underrepresented or not represented by a sequenced genome and sequence tags that had low matches or matches to proteins of interest.

Clone | Sequence length (bp) | G+C % | ORF | Genome avail. | Putative function | Closest relative (Phylum Genera) | Sim. %

Amino acid transport and metabolism
116-R | 643 | 65.47 | - | Yes | 4-hydroxyphenylpyruvate dioxygenase protein | α-Proteobacteria Bradyrhizobium | 71
662-T | 609 | 71.43 | - | Yes | O-succinylhomoserine sulfhydrylase | α-Proteobacteria Caulobacter | 80
1394-R | 693 | 33.62 | - | No | peptidase T | Bacteroidetes Croceibacter | 85
*2089-R | 882 | 65.19 | - | Yes | putative peptidase | Actinobacteria Streptomyces | 52
2485-T | 382 |  | - | Yes | methionine synthase | β-Proteobacteria Azoarcus | 73
2596-T | 498 | 63.65 | - | Yes | sulfate adenylyltransferase, small subunit | β-Proteobacteria Burkholderia | 87

Carbohydrate transport and metabolism
*67-T | 812 | 64.29 | - | Yes | inner membrane metabolite transport protein | α-Proteobacteria Bradyrhizobium | 74
116-T | 649 | 68.72 | - | Yes | putative UDP-glucose 4-epimerase | α-Proteobacteria Sinorhizobium | 75
403-R | 869 | 66.74 | - | Yes | cellulase-like protein | α-Proteobacteria Mesorhizobium | 80
2383-T | 160 |  | - | No | inositol monophosphatase | γ-Proteobacteria Azotobacter | 79

Cell motility
1051-R | 778 | 60.93 | - | Yes | motility protein, MotB family | γ-Proteobacteria Pseudomonas | 88
2108-T | 496 | 65.32 | - | Yes | flagellar basal body L-ring protein | α-Proteobacteria Magnetospirillum | 94

Cell wall/membrane/envelope biogenesis
*2089-T | 694 | 63.40 | - | Yes | putative UDP-N-acetylglucosamine 1-carboxyvinyltransferase | Actinobacteria Nocardia | 71
2097-T | 911 | 36.22 | - | No | membrane-associated zinc metalloprotease | Bacteroidetes Tenacibaculum | 91

Coenzyme transport and metabolism
1051-T | 896 | 58.37 | - | Yes | thiamin biosynthesis protein ThiC | γ-Proteobacteria Pseudomonas | 93

Defense mechanisms
1454-R | 878 | 51.59 | - | Yes | HNH nuclease | δ-Proteobacteria Anaeromyxobacter | 42

Energy production and conversion
662-R | 469 | 46.27 | - | Yes | RepE | γ-Proteobacteria Shigella | 98
858-R | 781 | 61.46 | - | Yes | cytochrome-c oxidase | γ-Proteobacteria Pseudomonas | 100
*1664-T | 790 | 72.91 | - | Yes | Oxidoreductase | α-Proteobacteria Mesorhizobium | 81
2326-R | 819 | 60.20 | - | No | photosynthetic reaction center L subunit | β-Proteobacteria Roseateles | 91

General function prediction only
1031-R | 639 | 64.95 | - | Yes | von Willebrand factor, type A | γ-Proteobacteria Pseudomonas | 96
2326-T | 779 | 67.52 | - | Yes | alpha/beta fold | α-Proteobacteria Bradyrhizobium | 68

Hypothetical proteins without assigned function
*142-R | 677 | 45.05 | - | Yes | uncharacterized conserved protein | γ-Proteobacteria Idiomarina | 80
182-R | 519 | 54.72 | - | Yes | hypothetical protein ELI_01985 | α-Proteobacteria Erythrobacter | 53
264-T | 936 | 60.58 | 1 | Yes | hypothetical protein Pfl_1118 | γ-Proteobacteria Pseudomonas | 87
1014-R | 699 | 33.33 | - | Yes | hypothetical protein BT1932 | Bacteroidetes Bacteroides | 75
1264-R | 775 | 70.71 | - | No | hypothetical protein Rgel02001825 | β-Proteobacteria Rubrivivax | 61
1454-T | 830 | 46.99 | 1 | Yes | hypothetical protein | δ-Proteobacteria Bdellovibrio | 70
1613-R | 780 | 60.64 | - | Yes | conserved hypothetical protein | γ-Proteobacteria Pseudomonas | 90
*1664-R | 751 | 67.64 | - | Yes | hypothetical protein SdenDRAFT_3459 | γ-Proteobacteria Shewanella | 47
2097-R | 805 | 36.52 | - | No | hypothetical protein MED217_18220 | Bacteroidetes Flavobacterium | 75
2210-R | 628 | 61.62 | 1 | No | protein of unknown function DUF427 | α-Proteobacteria Sphingopyxis | 81
2596-R | 773 | 56.92 | 1 | Yes | hypothetical protein bll6650 | α-Proteobacteria Bradyrhizobium | 67
2760-R | 768 | 69.92 | - | No | similar to uncharacterized conserved protein | β-Proteobacteria Polaromonas | 75

Inorganic ion transport and metabolism
290-R | 798 | 54.76 | 1 | Yes | bacterioferritin (cytochrome b1) | γ-Proteobacteria Pseudomonas | 85
1014-T | 668 | 37.42 | - | No | copper resistance protein | Bacteroidetes Ornithobacterium | 63
1264-T | 779 | 66.96 | - | No | nitrite/sulfite reductase | β-Proteobacteria Rhodoferax | 79
2760-T | 777 | 60.10 | - | Yes | COG1629: Outer membrane receptor proteins, mostly Fe transport | γ-Proteobacteria Pseudomonas | 50

Lipid transport and metabolism
*67-R | 858 | 62.47 | - | Yes | short-chain dehydrogenase/reductase SDR | β-Proteobacteria Burkholderia | 89
1031-T | 669 | 62.33 | - | Yes | fatty acid hydroxylase | γ-Proteobacteria Pseudomonas | 88
2383-R | 897 | 61.87 | - | Yes | acetyl CoA synthase | α-Proteobacteria Bradyrhizobium | 79

Posttranslational modification, protein turnover, chaperones
874-R | 829 | 60.80 | 1 | Yes | peptidyl-prolyl cis-trans isomerase, cyclophilin type | α-Proteobacteria Nitrobacter | 85
*142-T | 870 | 49.31 | - | Yes | peptidase S1 and S6, chymotrypsin/Hap | γ-Proteobacteria Shewanella | 68

Replication, recombination and repair
849-R | 873 | 46.96 | 1 | Yes | RepE | γ-Proteobacteria Erwinia | 99
2108-R | 901 | 58.93 | - | Yes | recombination protein; RecA | α-Proteobacteria Mesorhizobium | 93
2485-R | 503 | 60.83 | - | No | nuclease subunit of the excinuclease complex | δ-Proteobacteria Syntrophus | 59

Signal transduction mechanisms
1613-T | 829 | 61.76 | - | Yes | isocitrate dehydrogenase kinase/phosphatase | γ-Proteobacteria Pseudomonas | 94

Transcription
264-R | 827 | 59.85 | - | Yes | transcriptional Regulator, TetR family | γ-Proteobacteria Pseudomonas | 95
403-T | 820 | 62.07 | - | No | bacterial regulatory protein, LacI family | α-Proteobacteria Magnetococcus | 47
858-T | 734 | 62.53 | - | Yes | transcriptional Regulator, GntR family | γ-Proteobacteria Pseudomonas | 93

Translation, ribosomal structure and biogenesis
290-T | 508 | 54.72 | - | Yes | tyrosyl-tRNA synthetases | γ-Proteobacteria Pseudomonas | 93
778-R | 816 | 68.01 | - | No | 30S ribosomal protein S5 | Planctomycetes Rhodopirellula | 83
778-T | 829 | 72.22 | - | No | elongation factor G | Planctomycetes Rhodopirellula | 65
1394-T | 721 | 32.45 | - | No | putative RNA methyltransferase | Bacteroidetes Flavobacterium | 83
182-T | 814 | 56.88 | 1 | Yes | similar to Acetyltransferases including N-acetylases of ribosomal proteins | α-Proteobacteria Silicibacter | 67

The approach of sequencing random BAC ends relies on identifying a phylogenetic anchor to give an accurate indication of the species from which a DNA fragment originated. One sequence tag, 2108-R, was found to affiliate to a RecA gene, which can be used to give phylogenetic inference for the fragment.

The numerical distribution of highest sequence matches to phyla and genera is given in Table 5.4 and demonstrates a bias towards phyla that are represented by greater numbers of complete, annotated genome sequences. G+C composition for sequence tags ranged from 33-72% (Figure 5.4), showing that the host, E. coli, with an average G+C% of 51, exhibited no serious bias against DNA with a higher or lower G+C composition. Interestingly, several sequence tags exhibited a G+C composition outside the reported G+C range for the phylum Proteobacteria, subphylum α-Proteobacteria (Figure 5.4). No sequence tags were found to affiliate to the phylum Cyanobacteria, which is represented by 17 completed genomes, despite Cyanobacteria being well represented in terms of cell counts as well as 16S rDNA analysis (Chapter 3).
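For reference, the G+C comparison described here reduces to two small calculations, sketched below. The function names are illustrative; ambiguous base calls such as N are excluded from the denominator, which is an assumption of this sketch rather than a documented step of the original analysis. The reported range used in the example (27-68% for α-Proteobacteria) and the tag value (72.91% for 1664-T) are taken from Tables 5.4 and 5.3.

```python
def gc_percent(seq: str) -> float:
    """G+C content as a percentage of unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def outside_range(gc: float, low: float, high: float) -> bool:
    """True if a tag's G+C% falls outside a reported phylum or subphylum range."""
    return gc < low or gc > high


print(gc_percent("GGCCATAT"))             # 50.0
print(outside_range(72.91, 27.0, 68.0))   # True: tag 1664-T vs alpha-Proteobacteria
```

G+C content computed from a read of a few hundred bases is only a rough proxy for the G+C content of the source genome, so a value outside the reported range is suggestive rather than conclusive.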

A number of sequence tags were found to affiliate to proteins of interest, both ecologically and in terms of potential in industrial processes. A sequence tag was found to affiliate to an HNH nuclease (clone 1454). The HNH family includes several restriction enzymes, e.g. McrA, and other enzymes involved in defense functions, such as colicins. Sequence tags affiliating to genes with industrial applications include a number of peptidases (clones 142, 1394 and 2089), genes involved in thiamin (vitamin B1) synthesis (clone 1051), a cellulase (clone 403) and oxidoreductases (clone 1664).


Table 5.4: Summary of BAC-end sequences matches to phylum and genus level

Phylum (total genomes) | Phylum G+C% | Clone G+C% | Matching genus | Number of sequence clones | Genome available
Proteobacteria (134) | 22-69 | 45-72 |  | 45* | 
α-Proteobacteria (40) | 27-68 | 54-72 | Mesorhizobium | 3 | Yes
 |  |  | Bradyrhizobium | 3 | Yes
β-Proteobacteria (19) | 50-69 | 60-70 | Burkholderia | 2 | Yes
γ-Proteobacteria (68) | 22-67 | 45-67 | Pseudomonas | 11 | Yes
δ-Proteobacteria (7) | 46-60 | 46-60 | Syntrophus | 1 | No
Bacteroidetes (8) | 33-56 | 33-37 |  | 6 | 
 |  |  | Flavobacterium | 2 | No
Planctomycetes (1) | 55.4 | 68-72 |  | 4 | 
 |  |  | Rhodopirellula | 2 | Yes
Actinobacteria (21) | 65-69 | 63-65 |  | 2 | 
 |  |  | Streptomyces | 1 | Yes
 |  |  | Nocardia | 1 | Yes

* Proteobacteria 45 clones: α-16, β-7, γ-19, and δ-3

5.4 Discussion

Because of the complexity of the cyanobacterial bloom metagenome, and its current inaccessibility via traditional culturing methods, approaches that are large-scale and culture-independent are needed. Various screening techniques, including 16S rDNA-targeted PCR and random sequence tag generation, have been used to access the wealth of information contained within the metagenomic library CBNPD1. PCR screens identified 2 clones containing 16S rDNA genes affiliated to the phylum Proteobacteria. Randomly selected clones were partially sequenced to generate sequence tags, which were analysed to provide physiological insight into cyanobacterial blooms.

In order to describe the phylogenetic diversity within BAC libraries, most environmental genomic studies have depended on direct rDNA surveys of these libraries and sequencing of rDNA-containing BAC clones, or alternatively on surveys of PCR-amplified 16S rDNA from a parallel sample. The survey reported here is neither exhaustive nor a quantitative measurement of 16S rDNA abundance in the cyanobacterial bloom sample or the metagenomic library CBNPD1. Only 480 of 2850 clones were screened, in parallel with the PCR-amplified 16S rDNA survey conducted in chapter 3. The estimated number of 16S rDNA genes within CBNPD1 depends on a number of unknown variables. If the average microbial genome size is 4 Mb and CBNPD1 contains 77 Mb, then there are roughly 19 genome equivalents within CBNPD1 and therefore a minimum of 19 16S rDNA genes. Work by Klappenbach et al. (2000) suggests that the number of rRNA operon copies is directly proportional to growth rate in culture: the higher the growth rate, the more operon copies are present within the organism's genome. Cyanobacterial blooms are rapidly growing communities existing in a eutrophic state; we therefore speculate that, on average, the microorganisms represented within a cyanobacterial bloom metagenome have at least two copies of the 16S rDNA gene per genome.
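The reasoning in this paragraph can be turned into a rough expectation for the number of 16S rDNA genes in the screened subset. The sketch below assumes that 16S rDNA genes are spread evenly across clones, that every copy is amplifiable, and that the library holds the 2850 clones reported in section 4.3; none of these is guaranteed, and the function name is hypothetical.

```python
def expected_16s_in_screen(library_mb, genome_mb, copies_per_genome,
                           clones_screened, clones_total):
    """Expected number of 16S rDNA genes among the screened clones."""
    genome_equivalents = library_mb / genome_mb          # 77 / 4 = 19.25
    total_genes = genome_equivalents * copies_per_genome
    return total_genes * clones_screened / clones_total


for copies in (1, 2):
    n = expected_16s_in_screen(77, 4, copies, 480, 2850)
    print(copies, round(n, 1))
# prints:
# 1 3.2
# 2 6.5
```

On these assumptions the 480-clone screen would be expected to contain roughly three to six 16S rDNA genes, against which the 2 clones recovered can be compared.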

The potential biases and difficulties affecting the identification of BAC clones containing 16S rDNA include the specificity of primer sets, preferential amplification from the host rDNA template and potential toxicity of heterologously expressed rDNA operons. Improving primer design, both to target specific phylogenetic groups of interest and to avoid the host rDNA so that non-specific amplification is reduced, is seen as a means of improving screening methods. CRO1 was a 20 bp reverse primer designed using Primrose software (Ashelford et al., 2002) that binds to position 1201 and had only 60% homology to the 16S rDNA gene of E. coli. Although no 16S rDNA genes were found to affiliate to the host strain, interestingly none were found to affiliate to a member of the Cyanobacteria either, a phylum found to make up 44% of the PCR-amplified 16S rDNA gene survey performed in chapter 3.

Preferential amplification of host rDNA may be averted by increasing the copy number of BAC DNA, which raises the ratio of BAC DNA to host DNA and so results in less amplification from the host template. Inducible-copy BAC vectors have recently become available (Wild et al., 2002), as have rolling circle polymerases (Dean et al., 2001), which can rapidly replicate circular DNA. However, higher copy number carries the potential toxicity of heterologously expressed rDNA operons, which has been reported in instances where higher-copy vectors have been used (Liles et al., 2003), and this must be considered.

Approximately 40 kb of DNA sequence from a cyanobacterial bloom was generated from 67 random BAC-end sequences. Of these sequences, 36 affiliated to known proteins and 12 affiliated to hypothetical conserved proteins. Two-thirds of sequence tags affiliated to genes from organisms of the phylum Proteobacteria, which is not surprising as the group represents over 40% of all completed microbial genomes. No matches were found to the phylum Cyanobacteria, which was found in chapter 3 to dominate the population. This is most likely attributable to the under-representation of cyanobacteria in terms of genome information. In fact there are no complete genome representatives for Cylindrospermopsis raciborskii or Aphanizomenon spp., the dominant species in this particular cyanobacterial bloom. Although complete genome sequencing would provide more complete information about the metagenome of cyanobacterial blooms, the complexity of these habitats and their inaccessibility via culture-dependent methods make this currently infeasible. This sequence tag approach is a rapid, economical and effective means of studying the uncultured and genetically uncharacterised microorganisms of cyanobacterial blooms. The introduction of automation would further reduce time and costs, and allow a much more extensive survey than that carried out in the present study. For all its advantages, this method still has a serious limitation: the sequencing data rarely contain a phylogenetic marker to anchor the BAC to any specific taxon. These data let one consider the genomic potential of the large insert library, but BAC-end sequencing surveys cannot be used to infer the metabolic activities of any single bacterial taxon. For this reason the sequencing survey represented only one of the methodologies used in this study; the other was the 16S rDNA-targeted PCR screening, which was carried out so that metabolic activities and phylogenetic affiliation could be linked.

From these two methodologies and the results described here, 7 clones were selected for insert sequencing completion and analysis (Chapter 6) so that a greater insight into physiology, with possible links to phylogeny, could be gained within the cyanobacterial bloom metagenome. These clones of varying G+C content are listed in Table 5.5. They were selected on several criteria: the presence of a phylogenetic anchor; affiliation of the sequence tag to a phylum and/or genus that was under-represented or not represented by a sequenced genome; and sequence tags that had low matches or matches to proteins of interest. Clone 905 was selected due to its very low G+C content and the affiliation of its sequence tags to the phylum Bacteroidetes, a phylum that is thought to be dominant in freshwater systems and cyanobacterial bloom populations (Glockner et al., 1999; Eilers et al., 2000; Kirchman, 2001; Eiler and Bertilsson, 2004) and is much under-represented in terms of genome information. Clone 1664 was an exception and was selected because its G+C% was higher than that of reported complete genome sequences for the phylum Proteobacteria (by far the most represented), which implies that the fragment originated from a genetically uncharacterised microorganism.

Table 5.5: Clones selected for sequencing to completion and their justification for selection

Clone selected for insert sequencing completion | Insert size (kb) | Justification
67 | 29.5 | Interesting match: alcohol dehydrogenase oxidoreductase
142 | 9 | Interesting match: chymotrypsin/HAP
543 | 20 | Low matches/ Phyla poorly represented
578 | 22 | Phylogenetic anchor: 16S rDNA gene
905 | 16.3 | no genome representative
1664 | 20 | G+C% outside reported range
2089 | 23.5 | Low matches/ Phyla poorly represented

Although 2 clones were identified as containing a 16S rDNA gene, only clone 578 was selected for complete insert sequencing. The omission of clone 545 was justified by several findings, including the phylogeny of its 16S rDNA gene, which had a 99% affiliation to Pseudomonas fluorescens (Table 5.1), for which a genome sequence has recently been completed (Sanger Institute; ftp://ftp.sanger.ac.uk/pub/pathogens/pf/). Clone 578 was selected for insert sequencing despite its 16S rDNA sequence showing a high affiliation to its closest relative (an uncharacterised β-Proteobacterium). This was because the closest related characterised type culture was found to be Roseateles depolymerans (96%) (Table 5.1), the first bacteriochlorophyll (BChl) a-containing obligate aerobe to be classified in the beta-subclass of the Proteobacteria (Suyama et al., 1999). The strains were originally isolated as degraders of poly(hexamethylene carbonate) (PHC) but can additionally utilize some other biodegradable plastics (Suyama et al., 1999). Moreover, the similarity to the closest relative with a completed genome sequence (Rubrivivax gelatinosus; 96%) was also considerably lower, suggesting that clone 578 may originate from an uncharacterised, environmentally relevant microbe for which there is no genome representative, which justified its sequencing and subsequent analysis.

CHAPTER 6 – GENOMIC ANALYSIS OF 7 DNA FRAGMENTS FROM CYANOBACTERIAL BLOOM-ASSOCIATED BACTERIA

6.1 Introduction

Since the first microbial genome was sequenced in the mid-1990s there have been remarkable advances in genome sequencing technology, with over 460 genomes currently published and another 998 sequencing projects underway. Until recently, the inability to culture the majority of microorganisms from nature restricted the advance of microbial genome sequencing. With improved sequencing methods and the ability of metagenomics to access the genomes of as yet uncultured microorganisms, it has become feasible to sequence the entire genome of an environmental sample and hence the genomes of microorganisms within a given community. This has been accomplished in several instances concerning viral communities in the ocean and human faeces (Breitbart et al., 2002; Breitbart et al., 2003) and, more recently, microbial communities of nutrient-limited sea water (Venter et al., 2004) and a bacterial biofilm community containing only 6 phylotypes (Tyson et al., 2004). Communities that contain greater species richness, such as marine communities (100-200 species per ml) and soil (<4000 species per gram), require assembly efforts that are far more complex and require even faster and cheaper sequencing technology.

A less expensive and time-consuming alternative to complete genome sequencing of such organisms is sequencing random DNA fragments. Although this does not provide as much information as a whole genome sequence, it still produces physiological insights into previously uncharacterized and/or uncultured organisms. Furthermore, sequenced DNA fragments containing a phylogenetic anchor such as 16S rDNA allow a link between phylogenetic and physiological data to be established. Information gained from such studies is rapidly increasing our knowledge of the genetic and functional diversity of bacteria and archaea, such as the widespread nature of rhodopsins in marine bacteria, which were once thought limited to domain Archaea (Beja et al., 2000a). In this chapter the sequencing and analysis of BAC inserts constructed from a cyanobacterial bloom is described. Clones were selected for insert sequencing based on random BAC-end sequence information and on the detection of a 16S rDNA gene in PCR screens, as detailed in chapter 5, section 5.4.

6.2 Experimental procedures

6.2.1 BAC DNA purification and sequencing of insert DNA

BAC DNA was purified using large scale plasmid purification (Section 2.6.12). DNA inserts were sequenced using a primer-walking approach with sequence-derived primers. Sequencing reactions were carried out as per section 2.6.15, using the vector primers T7 and RPII (Epicentre) for the initial reactions. Sequence data obtained from the vector primers were used as a template for primer design with the Primer3™ software (Rozen and Skaletsky, 2000), and the cycle of sequencing and primer design was continued until insert sequencing was completed.
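
The cycle of sequencing and primer design can be illustrated with a rough sketch that estimates how many rounds are needed to close an insert. It assumes, purely for illustration, about 600 usable bases per read and a new primer placed about 100 bases before the end of the previous read, walking inwards from both vector ends; neither value is taken from this thesis, and the function name `primer_walk_rounds` is hypothetical.

```python
def primer_walk_rounds(insert_bp, read_len=600, primer_offset=100):
    """Estimate the number of sequencing rounds needed to close an insert.

    read_len and primer_offset are illustrative assumptions, not thesis values.
    Each round yields one read from each end of the remaining gap.
    """
    if read_len <= primer_offset:
        raise ValueError("read_len must exceed primer_offset")
    left_edge = 0            # position reached walking in from the T7 end
    right_edge = insert_bp   # position reached walking in from the RPII end
    rounds = 0
    while left_edge < right_edge:
        rounds += 1
        # Round 1 uses vector primers, so the whole read is new sequence;
        # later primers sit primer_offset bases inside known sequence.
        gain = read_len if rounds == 1 else read_len - primer_offset
        left_edge += gain
        right_edge -= gain
    return rounds


if __name__ == "__main__":
    # Insert sizes (bp) of the seven clones, as listed in Table 6.1.
    for clone, size in [(67, 29540), (142, 9532), (543, 20895), (578, 22169),
                        (905, 16235), (1664, 22765), (2089, 23177)]:
        print(clone, size, primer_walk_rounds(size))
```

Under these assumed read parameters the number of rounds grows linearly with insert size, which is consistent with primer walking being practical only for relatively short inserts.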

6.2.2 Computational analysis of sequencing data

Open reading frames were identified using BASys, a Glimmer-based online annotation web server (section 2.7.4). Several criteria were used to define genuine ORFs among all the potential ORFs detected. Only sequences longer than 40 amino acids with small or no overlaps were retained and, where putative ORFs were detected in different reading frames, those with known homologues were used. Sequence similarity between ORFs and published protein sequences was established using BLASTp (Altschul et al., 1997).
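
The ORF selection criteria could be expressed roughly as follows. This is a minimal sketch, not the procedure actually run through BASys: the `Orf` record, the `select_orfs` function and the 60 bp overlap limit are assumptions introduced here (the text specifies only the 40 amino acid minimum and says that overlaps should be small).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    start: int           # leftmost coordinate (bp), start <= end
    end: int             # rightmost coordinate (bp)
    aa_len: int          # length of the predicted protein
    has_homologue: bool  # True if BLASTp found a known homologue


MIN_AA = 40          # from the text: longer than 40 amino acids
MAX_OVERLAP_BP = 60  # assumption: the text only says overlaps must be "small"


def overlap_bp(a, b):
    """Number of base pairs shared by two ORFs (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def select_orfs(candidates):
    """Keep long ORFs; on conflict prefer those with known homologues."""
    long_enough = [o for o in candidates if o.aa_len > MIN_AA]
    # Visit ORFs with homologues first (then longer ones) so they win conflicts.
    ranked = sorted(long_enough, key=lambda o: (not o.has_homologue, -o.aa_len))
    kept = []
    for orf in ranked:
        if all(overlap_bp(orf, k) <= MAX_OVERLAP_BP for k in kept):
            kept.append(orf)
    return sorted(kept, key=lambda o: o.start)
```

The greedy ranking is a design choice of this sketch; any rule that resolves conflicting reading frames in favour of ORFs with homologues would satisfy the stated criteria.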

Phylogenetic analysis of the clone 578 16S rDNA was conducted by sequence alignment using ClustalW, followed by distance matrix construction by the method of Jukes and Cantor (Kuhner and Felsenstein, 1994) and tree construction by the neighbour-joining method using the TreeconW package (section 2.7.7). The presence of tRNA genes was explored using the program tRNAscan (Lowe and Eddy, 1997) and tandem repeats were detected with the program Tandem Repeats Finder (Benson, 1999). ORFs detected by BASys were additionally processed using the KEGG database (section 2.7.5); specifically, the KEGG automatic annotation server (KAAS) (Kanehisa and Goto, 2000; Kanehisa et al., 2006) was used to assign genes to pathways identified by KEGG.
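
For readers unfamiliar with the distance step, the Jukes and Cantor correction converts the observed proportion of differing sites p between two aligned sequences into an evolutionary distance d = -(3/4) ln(1 - 4p/3). The sketch below illustrates that formula only; it is not the TreeconW implementation, the function names are hypothetical, and skipping gapped columns is an assumption of this example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor distance for an observed proportion of differences p."""
    if p >= 0.75:
        raise ValueError("distance undefined for p >= 0.75")
    if p == 0:
        return 0.0
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For closely related 16S rDNA sequences the correction is small: a 4% observed difference corresponds to a corrected distance only slightly above 0.04.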

6.3 Results

6.3.1 Gene finding

Seven clones were sequenced to completion using a primer-walking method, and open reading frames were identified and analyzed using an array of molecular tools. A primer-walking method was chosen because no insert was larger than 30 kb, so orientation could be kept and data analyzed progressively. Clones were selected depending on the presence of a phylogenetic anchor, whether the sequence tag was affiliated to a phylum and/or genus that was under-represented or not represented by a sequenced genome, and whether sequence tags had low matches or matches to proteins of interest (Section 5.4). The six clones that lacked a phylogenetic anchor were grouped into 2 sections depending on their GC content: low GC and mid-high GC.

The size of sequenced inserts ranged from 9-30 kb and in total represented 144 kb of the cyanobacterial bloom metagenome. The number of open reading frames identified (130 in total), the percentage of hypothetical proteins, and the average gene length of all inserts are summarized in Table 6.1.
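
The totals quoted above can be checked directly against the per-clone values in Table 6.1. The short sketch below does only that arithmetic; the variable and function names are introduced here for illustration.

```python
# (clone, insert size in bp, number of ORFs), transcribed from Table 6.1.
TABLE_6_1 = [
    (67, 29540, 32),
    (142, 9532, 5),
    (543, 20895, 16),
    (578, 22169, 17),
    (905, 16235, 15),
    (1664, 22765, 19),
    (2089, 23177, 26),
]


def summarise(rows):
    """Return total sequence length, total ORF count and the size range."""
    sizes = [size for _, size, _ in rows]
    return {
        "total_bp": sum(sizes),
        "total_orfs": sum(orfs for _, _, orfs in rows),
        "min_bp": min(sizes),
        "max_bp": max(sizes),
    }


if __name__ == "__main__":
    print(summarise(TABLE_6_1))
```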

6.3.2 The Roseateles genomic fragment

Clone 578 was identified via 16S rDNA targeted PCR screens of BAC library CBNPD1 (section 5.3.1). The corresponding 16S rDNA sequence affiliated within the phylum Proteobacteria, showing a similarity of 99% to an uncharacterized β-Proteobacterium. The closest affiliation to a type culture recognised by the International Journal of Systematic and Evolutionary Microbiology was 96% similarity to Roseateles depolymerans DSM 11813T (Suyama et al., 1999). The completely sequenced insert of clone 578 was 22169 bp in length. It contained two genes for stable RNAs and 17 potential protein coding genes (Table 6.2). The average G+C content (64.7%) was not homogeneous over the entire segment, with ORFs 12-16 exhibiting a G+C content of 49-52%. ORFs 13 and 15 also exhibited phylogenetic profiles incoherent with the 16S rDNA tree, which could be suggestive of horizontal gene transfer (HGT). In comparison to conventional rrn operons (Hill, 1999), the Roseateles operon differs in that the 16S rDNA gene is separated from the 23S and 5S rDNA genes. Table 6.3 summarizes the identity and predicted functions of all predicted open reading frames of clone 578.

Table 6.1: Summary of the 7 environmental genome fragments analyzed

| Clone | Insert size (bp) | GC% | No of ORFs | Avg. gene length (bp) | Total % hypothetical | Phylum most represented | Phylogenetic anchor |
|---|---|---|---|---|---|---|---|
| 67 | 29540 | 66.1 | 32 | 830 | 6 | Proteobacteria (94%) | no |
| 142 | 9532 | 49.2 | 5 | 1646 | 20 | Proteobacteria (80%) | no |
| 543 | 20895 | 65.7 | 16 | 1075 | 9 | Proteobacteria (56%) | no |
| 578 | 22169 | 64.7 | 17 | 873 | 41 | Proteobacteria (88%) | yes, 16S rDNA |
| 905 | 16235 | 34.1 | 15 | 935 | 40 | Bacteroidetes (87%) | no |
| 1664 | 22765 | 71.2 | 19 | 1207 | 53 | Proteobacteria (47%) | no |
| 2089 | 23177 | 63.3 | 26 | 781 | 19 | Actinobacteria (62%) | no |
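
The comparison of per-ORF G+C content with the fragment average, used above as a hint of possible horizontal gene transfer, can be sketched as follows. The 5 percentage point cut-off and both function names are assumptions made for this illustration; the example values are the G+C percentages annotated for ORFs 7-11 in Table 6.3 and the fragment average of 64.7%.

```python
def gc_percent(sequence):
    """G+C content (%) of a nucleotide sequence, ignoring ambiguous bases."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def flag_atypical(orf_gc, fragment_gc, min_delta=5.0):
    """List ORFs whose G+C% differs from the fragment mean by min_delta or more.

    min_delta is an illustrative threshold, not a value from the thesis.
    """
    return [orf for orf, gc in orf_gc.items()
            if abs(gc - fragment_gc) >= min_delta]


if __name__ == "__main__":
    # G+C% values annotated for clone 578 ORFs 7-11 in Table 6.3.
    annotated = {7: 52.2, 8: 53.3, 9: 49.6, 10: 50.7, 11: 57.3}
    print(flag_atypical(annotated, fragment_gc=64.7))
```

A compositional deviation of this kind is only suggestive; it would normally be weighed together with the phylogenetic profile of the ORFs concerned.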


Figure 6.1: Linear ORF maps of the 7 completely sequenced BAC clones from the cyanobacterial bloom metagenome library. ORFs are colour coded according to their COG affiliations and to ribosomal genes, where they exist. The phenylacetyl-CoA catabolon is an 18.5 kb putative gene cluster detailed in Figure 5.7.

Table 6.2: Characteristics of the Roseateles genomic fragment (578)

| Size (bp) | G+C (%) | ORFs | rRNA | tRNA | Hypothetical proteins | Conserved hypothetical proteins | Described proteins | Homologues in Rubrivivax gelatinosus |
|---|---|---|---|---|---|---|---|---|
| 22169 | 64.72 | 17 | 16S rDNA | tRNAIle, tRNAAla | 0 | 7 | 10 | 4 |

Analysis using BASys found a number of gene-encoded proteins that could be involved in cell metabolism and membrane transport pathways (Table 6.3). Three dehydrogenases (ORF4, 6 and 7) were identified that are involved in oxidoreductase activity in cell metabolism and electron transport. These three ORFs appear functionally related and, being adjacent and in the same orientation, possibly form an operon. Two genes encoded component proteins of the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate transport system in bacteria. Two aminopeptidases (ORF1 and 3) from the M28 family were identified, as was a transcriptional regulator.
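
The reasoning that adjacent ORFs in the same orientation may form an operon can be sketched as a simple scan over gene coordinates. The example uses the positions of the first six ORFs as listed in Table 6.3 (ORF numbers in the sketch follow that table); the 50 bp maximum intergenic gap, the `Gene` record and the function name are assumptions of this illustration, and such a scan only proposes candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    orf: int
    start: int   # leftmost coordinate (bp)
    end: int     # rightmost coordinate (bp)
    strand: str  # '+' or '-'


MAX_GAP_BP = 50  # assumption: maximum intergenic gap within a candidate operon


def candidate_operons(genes, max_gap=MAX_GAP_BP):
    """Group neighbouring same-strand genes separated by at most max_gap bp."""
    runs, current = [], []
    for gene in sorted(genes, key=lambda g: g.start):
        same_run = (
            current
            and gene.strand == current[-1].strand
            and gene.start - current[-1].end - 1 <= max_gap
        )
        if same_run:
            current.append(gene)
        else:
            if len(current) > 1:
                runs.append([g.orf for g in current])
            current = [gene]
    if len(current) > 1:
        runs.append([g.orf for g in current])
    return runs


# Positions of ORFs 1-6 of clone 578, transcribed from Table 6.3.
FRAGMENT_578 = [
    Gene(1, 2416, 5073, "+"),
    Gene(2, 5125, 6489, "-"),
    Gene(3, 6486, 7514, "-"),
    Gene(4, 7507, 9063, "-"),
    Gene(5, 9079, 10173, "-"),
    Gene(6, 10290, 10775, "+"),
]

if __name__ == "__main__":
    print(candidate_operons(FRAGMENT_578))
```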

[Figure 6.2: neighbour-joining tree, image not reproduced. Taxa shown, with GenBank accession numbers: Schlegelella thermodepolymerans (T), AY152824; Aquabacterium citratiphilum (T), AF035050; uncultured Green Bay bacterium MNE1, AF293003; *Rubrivivax gelatinosus (T), D16213; Ideonella dechloratans (T), X72724; Ideonella sp. B508-1, AB049105; Leptothrix mobilis (T), X97071; Sphaerotilus sp. IF5, AF072915; CBNPD1 clone 578; beta proteobacterium A1004, AF236009; Roseateles depolymerans (T), AB003624; beta proteobacterium MBIC3293, AB022678; Pseudomonas saccharophila (T), AB021407; uncultured bacterium str. 42ds5, AY212737; Acidovorax anthurii (T), AJ007013; Escherichia coli, X80724.]

Figure 6.2: Phylogenetic tree of Roseateles-affiliated 16S rRNA gene

Phylogenetic tree based on neighbour-joining analysis of the 16S rDNA sequence (1500 bp) identified in clone 578 and the most similar sequences obtained from GenBank. Bootstrap values greater than 500 are shown at nodes. * Rubrivivax gelatinosus is the most similar organism, based on 16S rDNA comparisons, whose genome has been sequenced.

Table 6.3: Predicted RNA and protein coding genes of the Roseateles genomic fragment 578

| ORF | Position | D | Size (aa) | Putative function: BLASTp (COG accession number if available) | Closest relative (Phylum Genus) | Sim. % | Comments |
|---|---|---|---|---|---|---|---|
| 1 | 2416-5073 | + | 886 | Leucine aminopeptidase-related protein (COG2234) | Proteobacteria Caulobacter | 63 | Pfam04389; peptidase family M28, involved in proteolysis process. |
| 2 | 6489-5125 | - | 455 | D-lactate dehydrogenase (COG0277) | Proteobacteria Burkholderia | 52 | Involved in energy production and conversion |
| 3 | 7514-6486 | - | 343 | Conserved hypothetical protein | Proteobacteria Polaromonas | 71 | Cytoplasmic protein, unknown function |
| 4 | 9063-7507 | - | 519 | Aldehyde dehydrogenase (COG1012) | Proteobacteria Rhodoferax | 78 | Enzymes which oxidize a wide variety of aliphatic and aromatic aldehydes using NADP as a cofactor. |
| 5 | 10173-9079 | - | 365 | Putative saccharopine dehydrogenase, NAD-binding (COG1748) | Proteobacteria Burkholderia | 79 | Involved in amino acid transport and metabolism |
| 6 | 10290-10775 | + | 162 | Transcriptional regulator, AsnC family (COG1522) | Proteobacteria Pseudomonas | 71 | The AsnC family is a family of similar bacterial transcription regulatory proteins. |
| 7 | 11584-11928 | + | 115 | Conserved hypothetical protein | Proteobacteria Legionella | 79 | ORFs 7-11 exhibit significantly lower G+C% than the average ORF in the fragment. G+C%: 52.2 |
| 8 | 12578-12907 | + | 110 | Conserved hypothetical protein | Deinococcus-Thermus Deinococcus | 55 | G+C%: 53.3 |
| 9 | 13451-13831 | + | 127 | Conserved hypothetical protein | Proteobacteria Ralstonia | 83 | G+C%: 49.6 |
| 10 | 13815-13964 | + | 50 | Conserved hypothetical protein | Actinobacteria Arthrobacter | 70 | G+C%: 50.7 |
| 11 | 14136-14465 | + | 110 | Conserved hypothetical protein | Proteobacteria Mesorhizobium | 65 | G+C%: 57.3 |
| tRNA | 14308-14236 | - | | tRNA-Ala | | | tRNA-alanine |
| tRNA | 14403-14330 | - | | tRNA-Ile | | | tRNA-isoleucine |
| 16S rDNA | 14551-16070 | + | | 16S rRNA gene | Beta-Proteobacteria | 99 | Unlinked rrn operon |
| 12 | 15996-16472 | + | 159 | Conserved hypothetical protein (53.04) | Firmicutes Lactococcus | 76 | Cytoplasmic protein, unknown function |
| 13 | 18323-16572 | - | 584 | Phosphoenolpyruvate-protein kinase (COG1080) | Proteobacteria Rubrivivax | 78 | Transfers the phosphoryl group from phosphoenolpyruvate (PEP) to the HPr |
| 14 | 18633-18364 | - | 90 | Phosphocarrier HPr protein (COG1925) | Proteobacteria Nitrobacter | 82 | A component of the phosphotransferase system (PTS), a major carbohydrate transport system in bacteria |
| 15 | 19323-20786 | + | 488 | Signal transduction histidine kinase | Proteobacteria Rubrivivax | 50 | |
| 16 | 20803-21315 | + | 171 | Thiol-disulfide isomerase and thioredoxins (COG0526) | Proteobacteria Rubrivivax | 80 | |
| 17 | 22032-21328 | - | 235 | Lipoate synthase (COG0320) | Proteobacteria Rubrivivax | 90 | |

6.3.3 Low GC% genome fragment analysis

One genomic fragment of low G+C percentage was identified from the random BAC-end sequencing survey and selected for insert sequencing. Its size, number of ORFs and total % hypothetical proteins are summarized in Table 6.1. Clone 905 exhibited a high percentage (87%) of ORFs with similarity to homologues affiliated with members of the phylum Bacteroidetes. A total of 15 ORFs, 6 of which were identified as conserved hypothetical proteins, were assigned to 8 COG categories (Figure 6.2). Several zinc-dependent peptidases were identified, including one from the M48 family. A putative Mg2+/Co2+ transport gene was identified, as well as a beta-lactamase of the alpha/beta hydrolase superfamily, which is often implicated in antibiotic resistance. A two-component system regulatory protein (clone 905 ORF10) that regulates genes involved in autolysis and cell wall metabolism, a two-component system sensor protein without a kinase domain (clone 905 ORF8) and a transposase implicated in DNA transposition (clone 905 ORF7) were also identified. Several ORFs were found with functions related to energy production and conversion. Clone 905 ORF5 encoded a putative protein similar to a delta-1-pyrroline-5-carboxylate dehydrogenase, which oxidizes proline to glutamate for use as a carbon and nitrogen source.
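
The 87% figure for clone 905 follows from tallying the phylum of the closest relative of each ORF. The sketch below reproduces that tally from the phylum assignments listed for ORFs 1-15 in Table 6.4; the list and function names are introduced here for illustration.

```python
from collections import Counter

# Phylum of the closest relative for clone 905 ORFs 1-15 (Table 6.4).
CLONE_905_PHYLA = (
    ["Bacteroidetes"] * 7      # ORFs 1-7
    + ["Proteobacteria"]       # ORF 8
    + ["Bacteroidetes"]        # ORF 9
    + ["Proteobacteria"]       # ORF 10
    + ["Bacteroidetes"] * 5    # ORFs 11-15
)


def phylum_shares(phyla):
    """Percentage of ORFs per phylum, rounded to the nearest whole number."""
    counts = Counter(phyla)
    total = len(phyla)
    return {name: round(100 * n / total) for name, n in counts.most_common()}


if __name__ == "__main__":
    print(phylum_shares(CLONE_905_PHYLA))
```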

Table 6.4. Predicted protein coding genes encoded on a low G+C DNA genomic fragment isolated from a cyanobacterial bloom metagenome.

| Clone | ORF | Position | D | Size (aa) | Putative function: BLASTp (COG accession number if available) | Closest relative (Phylum Genus) | Sim. % | Comments |
|---|---|---|---|---|---|---|---|---|
| 905 | 1 | 2860-47 | - | 937 | Putative zinc protease | Bacteroidetes Bacteroides | 54 | Zn-dependent peptidase involved in proteolysis and peptidolysis |
| 905 | 2 | 3032-3589 | + | 185 | Conserved hypothetical protein | Bacteroidetes Flavobacterium | 80 | Cytoplasmic protein, unknown function |
| 905 | 3 | 3627-4295 | + | 222 | Conserved hypothetical protein | Bacteroidetes Flavobacterium | 79 | Cytoplasmic protein, unknown function |
| 905 | 4 | 4364-4750 | + | 128 | Protein of unknown function DUF525 (COG2967) | Bacteroidetes Flavobacterium | 94 | Uncharacterized protein affecting Mg2+/Co2+ transport |
| 905 | 5 | 4870-6498 | + | 542 | delta-1-pyrroline-5-carboxylate dehydrogenase 1 (COG1012) | Bacteroidetes Flavobacterium | 94 | Oxidizes proline to glutamate for use as a carbon and nitrogen source |
| 905 | 6 | 7739-6663 | - | 358 | Conserved hypothetical protein | Bacteroidetes Bacteroides | 51 | Cytoplasmic protein, unknown function |
| 905 | 7 | 7834-9063 | + | 409 | ISPg4, transposase (COG3385) | Bacteroidetes Psychroflexus | 82 | Necessary for efficient DNA transposition |
| 905 | 8 | 9433-10485 | + | 350 | Two-component system sensor protein, without kinase domain (COG2972) | Proteobacteria | 65 | Predicted signal transduction protein with a C-terminal ATPase domain. |
| 905 | 9 | 10519-11694 | + | 391 | beta-lactamase (COG0596) | Bacteroidetes Flavobacterium | 64 | Predicted hydrolases or acyltransferases (alpha/beta hydrolase superfamily). |
| 905 | 10 | 11733-12428 | + | 231 | Two-component system, regulatory protein | Proteobacteria | 65 | Member of the two-component regulatory system lytR/lytS that regulates genes involved in autolysis and cell wall metabolism. |
| 905 | 11 | 12647-12919 | + | 90 | Conserved hypothetical protein | Bacteroidetes Bacteroides | 56 | Cytoplasmic protein, unknown function |
| 905 | 12 | 12916-14010 | + | 364 | Conserved hypothetical protein | Bacteroidetes Bacteroides | 67 | Cytoplasmic protein, unknown function |
| 905 | 13 | 14519-14247 | - | 90 | Peptidase M48, Ste24p (COG0501) | Bacteroidetes Flavobacterium | 90 | Zn-dependent protease with chaperone function |
| 905 | 14 | 15079-14576 | - | 167 | Peptidase M48, Ste24p (COG0501) | Bacteroidetes Flavobacterium | 77 | |
| 905 | 15 | 15865-15236 | - | 209 | Glucose inhibited division protein (COG0357) | Bacteroidetes Flavobacterium | 92 | Predicted S-adenosylmethionine-dependent methyltransferase involved in bacterial cell division |

6.3.4 Mid-high GC% genome fragment analysis

Five genomic inserts of mid-high G+C percentage were identified from the random BAC-end sequencing survey and sequenced to completion. Their size, number of ORFs and total % hypothetical proteins are summarized in Table 6.1. G+C content ranged from 49.2 to 71.2%. A total of 98 ORFs were identified and assigned to 12 different COG functional categories. A total of 8 ORFs had no significant similarity to a homologue in the GenBank database; another 9 were identified as conserved hypothetical proteins. Several gene clusters were identified within clone 67. These included a cluster of 17 genes (ORFs 9-25), arranged in 5 putative operons, encoding predicted proteins involved in the phenylacetate and/or phenylacetyl-CoA degradation pathway, as well as a cluster of 4 genes (ORFs 28-31) involved in the high-affinity ABC-type transport system for molybdenum, an essential element for bacteria which serves as a cofactor for a number of enzymes involved in the metabolism of carbon, nitrogen and sulphur. Clone 1664 appeared to be a unique fragment from an uncharacterized microorganism, in that 7 of 19 ORFs had no significant similarity to any database sequence and similarities to homologues were low (average 53.7%). Nine ORFs from clone 1664 exhibited similarities to Proteobacteria, while individual ORFs showed similarities to homologues from Actinobacteria, Firmicutes and domain Eukaryota. Clone 2089 had 16 ORFs (55%) with greatest similarity to homologues affiliated with Actinobacteria, whereas clone 67 was dominated by ORFs with highest similarity to homologues affiliated with Proteobacteria (70%). The 16 ORFs identified in clone 543 had similarity to homologues affiliated to species from 5 different phyla, including Proteobacteria (8), Actinobacteria (3) and Planctomycetes (2).

COG functional categories allocated to ORFs from the cyanobacterial bloom genome fragments indicate that a large number of ORFs were involved in metabolism as well as energy production and conversion (Figure 6.3). A complete and a partial ATP synthase complex were identified on clones 2089 (ORF5-16) and 543 (ORF12-16), respectively. In addition, a zinc-dependent dehydrogenase family protein (clone 67 ORF6), an NADH dehydrogenase (clone 543 ORF11) and a ribose-5-phosphate isomerase (clone 2089 ORF2) were identified. As with the ORFs identified from cyanobacterial bloom genome fragments with low GC content, a number of putative hydrolases were identified, including a metallophosphoesterase (clone 543 ORF2), a predicted hydrolase of the metallo-beta-lactamase superfamily (clone 1664 ORF13) and a predicted hydrolase of the alpha/beta-hydrolase fold (clone 2089 ORF25). One ORF was found to be involved in carbohydrate metabolism, a putative fructose-1,6-bisphosphatase (clone 2089 ORF21).

Several ORFs were assigned to functions involved with amino acid transport and metabolism, including a small operon for glutamate biosynthesis (clone 142 ORF2-3) that consists of the small and large glutamate synthase subunits, as well as a suspected ABC-type branched-chain amino acid transport system operon (clone 67 ORF12-14). There were also ORFs with homology to a saccharopine dehydrogenase (L- Dehydrogenase; clone 543 ORF3), an oligoendopeptidase F gene (clone 543 ORF4), a probable class III aminotransferase (clone 543 ORF10) and a dipeptidyl aminopeptidase (clone 2089 ORF24) belonging to the S9 family.

ORFs implicated in lipid metabolism included a glycosyl transferase (clone 2089 ORF4), two enoyl-CoA hydratase/isomerases (clone 2089 ORF17; clone 67 ORF5), a 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (clone 543 ORF5), and a toluene sulfonate zinc-independent alcohol dehydrogenase oxidoreductase (clone 67 ORF1) thought to be involved in the first reduction step in a fatty acid biosynthesis pathway.

A number of genes for cellular processes were identified, including several ORFs showing homology to genes involved in signal transduction mechanisms (histidine kinases; clone 1664 ORF14 and clone 67 ORF4). Genes involved in inorganic ion transport and metabolism were identified, specifically an ammonium transporter (clone 1664 ORF9) and a putative cation-transporting P-type ATPase (clone 543 ORF9). Genes assigned to the COG functional category of coenzyme transport and metabolism included a riboflavin biosynthesis protein RIBA (clone 2089 ORF23) and a quinolinate synthetase (clone 543 ORF8). An extracellular serine protease (clone 2089 ORF25), which is implicated in posttranslational modification, protein turnover and chaperones, was identified. A peptidase S13, D-Ala-D-Ala carboxypeptidase C (clone 67 ORF3), and a penicillin-binding 1 (peptidoglycan synthetase) transmembrane protein (clone 543 ORF13), which are involved in cell wall/membrane/envelope biogenesis, were also identified.

COG functional categories related to information storage and processing were assigned to numerous ORFs from the cyanobacterial bloom genomic fragments. Several ORFs were implicated in translation, ribosomal structure and biogenesis. A haemolysin activator-related protein (clone 1664 ORF12), which is suspected of activating haemolysin, a toxin secreted across both the cytoplasmic and outer membranes of Gram-negative bacteria, and a lysyl-tRNA synthetase (clone 67 ORF7) were identified. Genes encoding proteins of function

104 Table 6.5: Predicted protein coding genes encoded on medium-high G+C DNA genomic fragments isolated from a cyanobacterial bloom metagenome ORF Position D Size Putative function: BLASTp (COG Closest relative Sim. Comments (aa) accession number if available) (Phylum Genera) % 1664 1 1361-255 - 368 Spidroin 2 No significant similarities found Belongs to the silk fibroin family 2 3880-1316 - 854 Peptidase S1, chymotrypsin Proteobacteria Magnetococcus 56 3 4746-3847 - 299 Hypothetical protein No significant similarities found - 4 4796-6370 + 524 Hypothetical protein No significant similarities found - 5 6915-7685 + 256 dynamin Eukaryota 40 6 7698-8186 + 162 CBS domain protein (COG0517) Proteobacteria Caulobacter 74 GMP biosynthesis from IMP; first step 7 9736-8168 - 522 Conserved hypothetical protein Proteobacteria Burkholderia 50 Cytoplasmic protein, unknown function 8 9740-10837 + 365 PilL Proteobacteria Xanthomonas 39 9 11122-9818 - 434 Putative ammonium transporter Proteobacteria 69 Evolutionarily-related protein involved in the (COG0004) transport of ammonium ions across membranes 10 11258-11800 + 180 Conserved hypothetical protein Actinobacteria Frankia 50 Cytoplasmic protein, unknown function 11 12162-13448 + 428 regulatory protein, TetR Proteobacteria Rhodopseudomonas 52 Regulation of transcription, DNA-dependent

12 15222-12127 - 1031 hemolysin activator-related protein Proteobacteria Escherichia 59 Activates haemolysin , a toxin secreted (COG0532) across both the cytoplasmic and outer membranes of Gram-negative bacteria 13 17333-15546 - 595 Hypothetical protein (COG0595) No significant similarities found Predicted hydrolase of the metallo-beta- lactamase superfamily 14 17502-18476 + 324 Hypothetical protein (COG0692) No significant similarities found Probable histidine kinase involved in signal transduction mechanisms 15 18537-17629 - 302 Conserved hypothetical protein Proteobacteria Burkholderia 49 Cytoplasmic protein, unknown function 16 18709-19545 + 278 Hypothetical protein No significant similarities found - 17 19533-21065 + 510 Conserved hypothetical protein Firmicutes Streptococcus 54 Cytoplasmic protein, unknown function 18 21729-21577 - 50 Hypothetical protein No significant similarities found - 19 22208-21723 - 161 Hypothetical extracellular tungstate Proteobacteria Vibrio 50 binding protein

2089 1 2882-1377 - 501 Conserved hypothetical protein Actinobacteria Mycobacterium 59 Involved in Lipid transport and metabolism (COG0183) 2 2861-3406 + 181 Ribose-5-phosphate isomerase 3 Actinobacteria Propionibacterium 75 In addition to its activity on D-ribose 5-

105 (COG0698) phosphate it probably also has activity on D-allose 6-phosphate 3 5727-3964 - 587 Conserved hypothetical protein Proteobacteria Burkholderia 56 Cytoplasmic protein, unknown function 4 5707-5970 + 87 Glycosyl transferase, family 4 Actinobacteria Nocardioides 58 Involved in lipid metabolism 5 6050-6385 + 111 ATP synthase F0, subunit I Proteobacteria Sulfitobacter 55 ORFs 5-16: ATP synthase complex, 6 6382-6846 + 154 Conserved hypothetical protein Actinobacteria Kineococcus 51 composed of a nine-subunit (A-G, F6, F8) 7 6906-7664 + 252 ATP synthase F0, A subunit (COG0356) Actinobacteria Frankia 63 transmembrane channel through which 8 7713-7997 + 94 ATP synthase C chain (COG0636) Actinobacteria Corynebacterium 78 protons are pumped (F0-complex), and a 9 8072-8683 + 203 ATP synthase B chain (COG0711) Actinobacteria Corynebacterium 60 five-subunit (alpha, beta, gamma, delta, 10 8680-9222 + 180 DegT/DnrJ/EryC1/StrS family protein Proteobacteria Oceanicaulis 60 epsilon) catalytic core for ATP synthesis 11 9219-9743 + 164 ATP synthase delta subunit (COG0711) Actinobacteria Symbiobacterium 52 (F1-ATPase). 
12 9774-11366 + 530 ATP synthase alpha subunit Actinobacteria Acidothermus 78 13 11372-11854 + 160 H+-transporting ATPase, gamma subunit Proteobacteria Polaromonas 63 14 11829-12332 + 167 H+-transporting ATPase, gamma subunit Actinobacteria Frankia 68 15 12348-13790 + 480 ATP synthase beta subunit Actinobacteria Propionibacterium 81 16 13790-14218 + 142 ATP synthase epsilon chain Actinobacteria Nonomuraea 61 17 14359-13862 - 165 Enoyl-CoA hydratase/isomerase Proteobacteria Azotobacter 41 Involved in fatty acid metabolism 18 14314-14994 + 226 Short chain dehydrogenase (COG0300) Actinobacteria Janibacter 59 Short-chain dehydrogenases of various substrate specificities 19 15032-15757 + 241 LuxR:Response regulator receiver Cyanobacteria Trichodesmium 77 A transcriptional activator for quorum- sensing control 20 15809-16462 + 217 Conserved hypothetical protein Actinobacteria Janibacter 47 Cytoplasmic protein, unknown function 21 16619-16488 - 43 putative fructose-1,6-bisphosphatase Actinobacteria Nocardia 88 Involved in carbohydrate metabolism 22 16593-19334 + 913 Peptidase S9, prolyl oligopeptidase Actinobacteria Crocosphaera 45 Serine-type peptidase activity (COG1506) 23 19346-20119 + 257 Riboflavin biosynthesis protein RIBA Firmicutes Clostridium 47 24 20888-20127 - 253 extracellular serine protease Deinococcus-Thermus Thermus 53 25 20839-21405 + 188 Predicted hydrolase of the alpha/beta- Proteobacteria Magnetospirillum 50 hydrolase fold 26 22970-21402 - 522 unnamed protein product (COG0477) Eukaryota Aspergillus 47 Cytoplasmic protein, unknown function

543 1 4409-2775 - 545 Class I peptide chain release factor Proteobacteria Paracoccus 42 Involved in translation, ribosomal structure and biogenesis 2 4708-5307 + 200 Metallophosphoesterase (COG1408) Proteobacteria Anaeromyxobacter 50 hydrolase activity 3 6387-5314 - 358 Saccharopine dehydrogenase (COG1748) Proteobacteria Hahella 67 Involved in amino acid transport and metabolism 4 8168-6384 - 595 Oligoendopeptidase F (COG1164) Bacteroidetes Cytophaga 60 Involved in amino acid transport and metabolism

5 | 9015-8077 | - | 313 | 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (COG1947) | Firmicutes | Desulfitobacterium | 53 | Catalyzes the phosphorylation of the position 2 hydroxy group of 4-diphosphocytidyl-2C-methyl-D-erythritol
6 | 9045-9224 | + | 60 | Putative S1/P1 nuclease | Proteobacteria | Alteromonas | 74 | Cleave RNA and single stranded DNA with no base specificity
7 | 12522-9868 | - | 885 | Putative RTX toxin | Bacteroidetes | Tenacibaculum | 52 | Belong to the Type I secretion system, and are important virulence factors in Gram-negative bacteria
8 | 12532-12909 | + | 126 | Quinolinate synthetase (COG0739) | Actinobacteria | Streptomyces | 67 | Catalyzes the condensation of iminoaspartate with dihydroxyacetone phosphate to form quinolinate
9 | 12944-16303 | + | 1120 | Putative cation-transporting P-type ATPase | Actinobacteria | Corynebacterium | 50
10 | 16320-17450 | + | 386 | Probable class III aminotransferase | Proteobacteria | Burkholderia | 41
11 | 18074-17517 | - | 186 | NADH dehydrogenase I chain I (COG1143) | Proteobacteria | Bdellovibrio | 57
12 | 18131-18475 | + | 115 | ATP synthase F0, subunit I | Proteobacteria | Roseovarius | 57 | May donate electrons to ubiquinone
13 | 18472-18930 | + | 153 | Penicillin-binding 1 (peptidoglycan synthetase) transmembrane protein | Proteobacteria | Azoarcus | 65
14 | 18976-19983 | + | 336 | Probable protein ATP synthase A chain (COG0356) | Planctomycetes | Rhodopirellula | 56 | Key component of the proton channel; it may play a direct role in the translocation of protons across the membrane
15 | 20084-20401 | + | 106 | F0F1-type ATP synthase C subunit/Archaeal/vacuolar-type H+-ATPase subunit K (COG0636) | Actinobacteria | Corynebacterium | 73 | This is one of the three chains of the nonenzymatic component (CF(0) subunit) of the ATPase complex
16 | 20160-20894 | + | 250 | Probable protein ATP synthase B chain (COG0711) | Planctomycetes | Blastopirellula | 53

Clone 67
1 | 45-266 | + | 74 | Toluene sulfonate zinc-independent alcohol dehydrogenase oxidoreductase (COG1028) | Proteobacteria | Xanthobacter | 91 | Fatty acid biosynthesis pathway; first reduction step
2 | 997-473 | - | 175 | Conserved hypothetical protein | Bacteroidetes | Flavobacterium | 54 | Cytoplasmic protein, unknown function
3 | 1113-1790 | + | 226 | Two component transcriptional regulator, winged helix family | Proteobacteria | Burkholderia | 71
4 | 1787-2806 | + | 340 | Signal transduction histidine kinase (COG0642) | Proteobacteria | Magnetospirillum | 58 | Member of a two-component regulatory system qseB/qseC. Activates the flagella regulon by activating transcription of flhDC
5 | 3951-3106 | - | 282 | Enoyl-CoA hydratase/carnitine racemase (COG1024) | Proteobacteria | Pseudomonas | 71 | Could possibly oxidize fatty acids using specific components
6 | 4032-5021 | + | 330 | Oxidoreductase, zinc-binding dehydrogenase family protein (COG0604) | Proteobacteria | Roseovarius | 83 | Involved in energy production and conversion
7 | 5398-5024 | - | 125 | Lysyl-tRNA synthetase | Eukaryota | Arabidopsis | 62
8 | 6108-5479 | - | 210 | Transcriptional regulator, TetR family | Proteobacteria | Rhodopseudomonas | 70 | Involved in nucleobase, nucleoside, nucleotide and nucleic acid metabolism
9 | 7479-6175 | - | 435 | Phenylacetate-CoA ligase (COG1541) | Proteobacteria | Rhodopseudomonas | 89 | ORFs 9-25: Components of a phenylacetic acid degradation gene cluster
10 | 7932-7483 | - | 150 | Phenylacetic acid degradation-related protein (COG2050) | Proteobacteria | Paracoccus | 78
11 | 8746-7934 | - | 271 | Phenylacetate degradation, enoyl-CoA hydratase paaB (COG1024) | Proteobacteria | Nitrobacter | 78
12 | 9014-10177 | + | 388 | Putative branched-chain amino acid ABC transporter system substrate-binding protein (COG0683) | Proteobacteria | Rhodopseudomonas | 85 | ORFs 12-16: Component of the high-affinity leucine, isoleucine, valine transport system I (LIV-I), which is operative without Na(+) and is specific for alanine and threonine, in addition to branched-chain amino acids. Involved in cell growth and/or maintenance
13 | 10253-11125 | + | 291 | Inner-membrane translocator (COG0599); putative branched-chain amino acid ABC transport system permease protein (COG4177) | Proteobacteria; Proteobacteria | Paracoccus; Rhodopseudomonas | 79; 85
14 | 11125-12072 | + | 316 | Permease protein LivM | Proteobacteria | Rhodopseudomonas | 83
15 | 12069-12872 | + | 268 | ABC transporter related LivG | Proteobacteria | Rhodopseudomonas | 79
16 | 12865-13563 | + | 233 | ABC transporter related LivF | Proteobacteria | Rhodopseudomonas | 77
17 | 13688-15730 | + | 681 | Phenylacetic acid degradation protein, PaaN/Z subunit | Proteobacteria | Rhodopseudomonas | 75
18 | 15727-17853 | + | 709 | Enoyl-CoA hydratase/isomerase/3-hydroxyacyl-CoA dehydrogenase | Proteobacteria | Roseobacter | 64
19 | 17901-19103 | + | 401 | Thiolase, PaaJ subunit | Proteobacteria | Mesorhizobium | 86
20 | 20268-19192 | - | 359 | Phenylacetate-CoA oxygenase/reductase, PaaK subunit | Proteobacteria | Rhodopseudomonas | 70
21 | 20817-20278 | - | 180 | Phenylacetate-CoA oxygenase, PaaJ subunit | Proteobacteria | Bradyrhizobium | 82
22 | 21573-20821 | - | 251 | Phenylacetate-CoA oxygenase, PaaI/C subunit | Proteobacteria | Rhodopseudomonas | 76
23 | 21860-21573 | - | 96 | Phenylacetic acid degradation B, PaaB | Proteobacteria | Rhodopseudomonas | 87
24 | 22870-21866 | - | 335 | Phenylacetate-CoA oxygenase, PaaG subunit | Proteobacteria | Rhodopseudomonas | 85
25 | 23748-22978 | - | 257 | Phenylacetic acid degradation operon negative regulatory protein, PaaX | Proteobacteria | Bradyrhizobium | 56
26 | 24408-23752 | - | 219 | Putative glutathione S-transferase | Proteobacteria | Sinorhizobium | 73

27 | 24714-24496 | - | 73 | Hypothetical protein | No significant similarity found
28 | 24724-25518 | + | 265 | ABC transporter, substrate binding protein, ModA subunit | Proteobacteria | Agrobacterium | 73 | ORFs 28-31: Components of a gene cluster that encodes an ABC-type, high-affinity molybdate transporter
29 | 25525-26217 | + | 231 | Molybdenum transport protein, ModB subunit | Proteobacteria | Mesorhizobium | 86
30 | 26221-26907 | + | 229 | Molybdenum import ATP-binding protein ModC | Proteobacteria | Agrobacterium | 73
31 | 26904-27314 | + | 137 | Molybdenum-binding transcriptional regulator, ModE family protein | Proteobacteria | Agrobacterium | 70
32 | 27589-28548 | + | 320 | RNA methyltransferase TrmH, group 3 | Proteobacteria | Xanthobacter | 73

Clone 142
1 | 95-973 | + | 292 | Peptidase S1/S6, chymotrypsin/Hap | Proteobacteria | Shewanella | 68 | -
2 | 2531-1119 | - | 470 | Glutamate synthase, NADH/NADPH, small subunit 2 (COG0493) | Proteobacteria | Pseudoalteromonas | 83 | ORFs 2 & 3: Involved in nitrogen metabolism and glutamate biosynthesis
3 | 7092-2548 | - | 1514 | Glutamate synthase, large subunit (COG0069) | Proteobacteria | Alteromonas | 81
4 | 8730-7777 | - | 317 | Uncharacterized conserved protein (COG1469) | Proteobacteria | Idiomarina | 84
5 | 9415-8963 | - | 150 | Isoaspartyl dipeptidase | Firmicutes | Bacillus | 63 | Catalyzes the hydrolytic cleavage of a subset of L-isoaspartyl (L-beta-aspartyl) dipeptides.

involved in transcription included two genes encoding the regulatory protein TetR, which were identified from two different fragments (clone 1664 ORF11; clone 67 ORF8), and a LuxR response regulator receiver, which is a transcriptional activator for quorum-sensing control (clone 2089 ORF19).

6.4 Discussion

This chapter describes the complete sequences of seven 9-30 kb genome fragments selected from initial screening of 2850 environmental BAC clones which were constructed from the metagenome of a toxic cyanobacterial bloom.

With the advent of global warming, cyanobacterial blooms are becoming far more frequent and are causing problems all over the world. However, very little is known about the population structure and function of cyanobacterial bloom communities and the intricate relationships that exist between cyanobacteria and heterotrophic bacteria. Similarly, the Actinobacteria group, which is thought to dominate water populations (Glockner et al., 2000; Sekar et al., 2003; Warnecke et al., 2004), is under-represented in terms of cultured species and genome information. A greater understanding would come steadily with information on the genomes of the cyanobacteria and heterotrophic bacteria that comprise a bloom. The major problem facing investigations into the genomes of bloom constituents, particularly Cyanobacteria, is that it is extremely difficult to grow these species in pure form in the laboratory.

Metagenomics is a powerful technique that can potentially access the genome of any microorganism without the need for culturing, and was therefore used to uncover new insights into cyanobacterial bloom function and diversity. The small genome fragments described in this chapter provide an interesting snapshot of the physiological and metabolic capabilities of the microorganisms that make up a cyanobacterial bloom, and may accelerate our ability to bring these microorganisms into culture as well as to better control and predict bloom events.

6.4.1 Phylogenetic assignment

Of the 7 fragments, only one insert was found to contain a phylogenetic anchor (16S rDNA: Section 6.3.3a), meaning it is difficult to assign a phylogenetic affiliation to the remaining fragments with any conviction. Often, clones that bear phylogenetic anchors (usually ribosomal RNA genes) are sought in metagenomic studies, but sometimes phylogenetic assignments have been based on the dominance of BLAST hits obtained with ORFs. A metagenomic report by Nesbø et al. (2005) found that the majority of predicted proteins in each genome fragment, 57–96%, agree with their respective rRNA genes. However, caution was also advised, given the finding that a significant fraction of the ORFs in each genome fragment (7–44%) has been acquired by lateral gene transfer. Using these findings as a guideline, together with G+C content, a ‘loose’ inference of phylogenetic affiliation has been provided at the phylum level for fragments that are dominated by homologues affiliated to a given phylum and are within the given G+C range.

There were several cases where inserts exhibited a high percentage of ORFs with similarity to homologues affiliated with members of a particular phylum, and a phylogenetic affiliation was subsequently suggested for the microorganism from which the fragment originated. These included clone 905, which contained a high percentage of predicted proteins affiliated to the Bacteroidetes group (87%); clones 67 and 142, which were dominated by ORFs with highest similarity to homologues affiliated with Proteobacteria (70% and higher); and clone 2089, which was dominated by ORFs that had greatest similarity to homologues affiliated with Actinobacteria (55%). All fragments given a phylogenetic affiliation will be referred to as ‘suspected’ fragments; for instance, clone 905, which had 87% of its 15 ORFs affiliated to homologues from Bacteroidetes, will be referred to as a suspected Bacteroidetes fragment. Clones 1664 and 543 exhibited no dominance of ORFs affiliated to one particular phylum and no phylogenetic inference was given (Section 6.3.4).
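
The dominance-based inference described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in this study: the function name dominant_phylum and the min_fraction cut-off of 0.5 are assumptions introduced here (no fixed threshold is stated in the text), and the G+C range check is not included. The example input is the best-hit phylum of each of the five ORFs listed for clone 142.

```python
from collections import Counter


def dominant_phylum(orf_phyla, min_fraction=0.5):
    """Return (phylum, fraction) for the most common best-hit phylum.

    orf_phyla: one entry per ORF; None marks an ORF with no significant hit.
    min_fraction: hypothetical cut-off; below it no affiliation is suggested.
    """
    total = len(orf_phyla)
    counts = Counter(p for p in orf_phyla if p is not None)
    if total == 0 or not counts:
        return None, 0.0
    phylum, n = counts.most_common(1)[0]
    fraction = n / total  # ORFs without a hit still count in the denominator
    if fraction < min_fraction:
        return None, fraction
    return phylum, fraction


# Best-hit phyla of the five ORFs listed for clone 142
clone_142 = ["Proteobacteria"] * 4 + ["Firmicutes"]
phylum, fraction = dominant_phylum(clone_142)
print(phylum, round(100 * fraction))
```

Because ORFs without a significant hit remain in the denominator, a fragment with many unassigned ORFs is less likely to receive an affiliation under this sketch; whether that is appropriate depends on the data set.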

6.4.1a Phylogenetic anchors: 16S rDNA

Phylogenetic assignment of genomic fragments is an important step in metagenomic analysis as it presents the opportunity to link the phylogeny of uncultivated or uncharacterized microorganisms to functional activities encoded on large fragments of DNA captured in BAC libraries. Identifying genes and their putative function could potentially allow hypotheses to be developed on the ecological roles these microorganisms play. To further investigate the functional biology of a partial genome sequence from a Burkholderiales taxon, I have sequenced the 22 kb insert from BAC clone 578, which contains an rRNA gene affiliated with the Comamonadaceae family, with 99% identity to an uncharacterized beta-Proteobacterium. This family contains fifteen genera with diverse characteristics, including organisms that produce sheaths (Mulder, 1989a, b), which are hollow tube-like structures surrounding a chain of cells that help bacteria attach to surfaces, obtain nutrients from slowly running water and resist predators. The highest identity to a characterized type culture was 96% to the 16S rRNA gene of Roseateles depolymerans DSM 11813T, the first bacteriochlorophyll (BChl) a-containing obligate aerobe to be classified in the beta-subclass of the Proteobacteria (Suyama et al., 1999). The strains were originally isolated as degraders of poly(hexamethylene carbonate) (PHC) but can additionally utilize some other biodegradable plastics (Suyama et al., 1999).

The closest relative to clone 578 with a completed genome sequence available is Rubrivivax gelatinosus strain PM1, with which it shares 96% 16S rDNA similarity. This is a phototrophic β-Proteobacterium that is found in freshwater, sewage and activated sludge (Sinha and Banerjee, 1997; Hanson et al., 1999). It can grow under aerobic conditions in the dark, or photosynthetically under anaerobic conditions, and can also grow on carbon dioxide or on hydrogen. It is a model organism for the study of photosynthetic processes due to its genetic tractability (Nagashima et al., 1994; Ouchane et al., 1996; Igarashi et al., 2001; Ouchane et al., 2002). One characteristic of this organism is the liquefaction of gelatin by extracellular proteases (Klemme and Pfleiderer, 1977). This strain is capable of degrading methyl tert-butyl ether (MTBE), a gasoline additive, and was isolated from a mixed culture that came from a compost biofilter from a water treatment plant (Hanson et al., 1999). Of the 17 ORFs identified, 4 exhibited highest similarity to homologues from R. gelatinosus and 9 ORFs had similarity to homologues that are not present in the R. gelatinosus genome.

In comparison to conventional rrn operons (Hill, 1999), the Roseateles rrn operon of clone 578 is different in that the 16S rDNA gene is separated from the other rRNA genes. Only a few exceptions with unlinked rrn operons are known, found in several genomes from species of the phyla Planctomycetes, Deinococcus-Thermus, γ-Proteobacteria, α-Proteobacteria and Euryarchaeota (Liesack and Stackebrandt, 1989; Ruepp et al., 2000; Tamas et al., 2002; Glockner et al., 2003; Henne et al., 2004) as well as in genome fragments analysed from Poribacteria (Fieseler et al., 2006). The genome of Rubrivivax gelatinosus is known to contain 3 copies of 16S rDNA with only one linked rrn operon. Unlinked rrn operons appear to be randomly distributed among different prokaryotic phyla, where they may be present in one or two copies, whereas conventional rrn operons can be present in as many as seven copies (Hill, 1999). The physiological relevance of unlinked rrn operons is as yet unknown; however, ecologically, a low copy number of 16S rDNA genes has been correlated with slow metabolism (Wolfe and Haygood, 1993).

Although most proteins in clone 578 exhibited phylogenetic profiles coherent with the 16S rDNA phylogenetic tree (Figure 6.2), a few did not, which could be suggestive of horizontal gene transfer. Moreover, the ORFs in question also had G+C percentages that were considerably lower than the average G+C% of the genome fragment and that of close relatives (Suyama et al., 1999). These comprised a group of 5 ORFs aligned in the same orientation that all had highest similarity to homologues with no assigned function. ORF8 in particular had only 4 homologues, with similarity of 55% or less, to hypothetical proteins from the phyla Deinococcus-Thermus, Actinobacteria and Firmicutes.
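
The comparison made above, between the G+C percentage of individual ORFs and that of the whole fragment, can be sketched as follows. This is an illustrative sketch only: the function names, the 1-based inclusive coordinate convention and the cut-off of 5 percentage points (max_drop) are assumptions introduced here, not values taken from this study, and the toy sequence is invented for demonstration.

```python
def gc_percent(seq):
    """G+C percentage of a nucleotide sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def low_gc_orfs(fragment, orfs, max_drop=5.0):
    """Names of ORFs whose G+C% is more than max_drop points below the fragment.

    orfs maps an ORF name to (start, end) in 1-based inclusive coordinates.
    Reverse-strand ORFs (start > end) are handled by sorting the coordinates,
    since G+C content is identical on both strands.
    """
    fragment_gc = gc_percent(fragment)
    flagged = []
    for name, (start, end) in orfs.items():
        low, high = sorted((start, end))
        orf_gc = gc_percent(fragment[low - 1:high])
        if fragment_gc - orf_gc > max_drop:
            flagged.append(name)
    return flagged


# Toy fragment: a GC-rich backbone with an AT-rich island at positions 61-80
toy_fragment = "GC" * 30 + "AT" * 10 + "GC" * 30
toy_orfs = {"orfA": (1, 60), "orfB": (80, 61)}
print(low_gc_orfs(toy_fragment, toy_orfs))
```

A low G+C percentage on its own is only suggestive of horizontal gene transfer, which is why it is considered here alongside the phylogenetic profiles rather than in isolation.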

A total of 17 ORFs were identified, with putative COG functions assigned in the broad themes of information storage and processing, cellular processes, and metabolism. Three dehydrogenases (ORF2, ORF4 and ORF5) were identified: two involved in energy production (an aldehyde dehydrogenase and a D-lactate dehydrogenase) and a saccharopine dehydrogenase, which is implicated in amino acid biosynthesis. Two genes (ORF14 and ORF15) encoded component proteins of the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate transport system in bacteria. The PTS catalyses the phosphorylation of incoming sugar substrates coupled with their translocation across the cell membrane, making it a link between the uptake and metabolism of sugars.

One of the most appealing applications of metagenomics is the identification of metabolic genes that could provide information about the physiology and requirements of uncultivated and uncharacterized microorganisms, which can ultimately lead to changes in cultivation techniques. The most interesting ORF in this sense was ORF2, which encodes a D-lactate dehydrogenase homologue with a domain arrangement that included FAD-linked oxidase C-terminal and N-terminal domains. D-lactate dehydrogenases belong to a group of respiratory enzymes that oxidize a variety of organic substrates (e.g. formate, succinate, lactate), passing the electrons to a variety of oxidants (e.g. oxygen, nitrate, fumarate) (Dym et al., 2000). The free energy available from these reactions is used by other membrane proteins for processes such as solute transport or ATP synthesis. D-lactate dehydrogenase in Proteobacteria catalyzes the oxidation of D-lactate (which it uses as a carbon source) to pyruvate, which is coupled to transmembrane transport of amino acids and sugars (Futai, 1992). This protein, which is not encoded in the R. gelatinosus genome, was found after amino acid sequence alignments and phylogenetic analysis to be most similar to D-lactate dehydrogenases and FAD-linked oxidases found in Burkholderia species (Figure 6.3). Caution should be used when interpreting the biological relevance of this putative protein without expression studies; however, there is justification for such studies to be pursued.

A much more common, but nonetheless ecologically important and versatile, protein was found to be encoded by ORF4, which had highest similarity to an aldehyde dehydrogenase homologue. These enzymes catalyze the oxidation (dehydrogenation) of a wide variety of aliphatic and aromatic aldehydes using NADP as a cofactor and have been found in nearly all forms of life. They are used in many carbohydrate, amino acid and fatty acid metabolism pathways, as illustrated by KEGG analysis, which assigned the putative aldehyde dehydrogenase (clone 578 ORF4) to a total of 16 different pathways.

Figure 6.3: Phylogenetic analysis of a D-lactate dehydrogenase protein

Putative D-lactate dehydrogenase protein encoded by clone 578 ORF2. Bootstrap values greater than 50 are shown at nodes. The tree was calculated on complete amino acid sequences. A D-lactate dehydrogenase isoform from Homo sapiens (NP_919417) was used as an outgroup.

6.4.2 Functional assignment

Gene finding and functional analysis identified many housekeeping proteins as well as physiologically and ecologically important proteins, some of which have been examined in more depth in the present study. These include several bacterial transcriptional regulators (section 6.4.3), alcohol dehydrogenases/oxidoreductases (section 6.4.5) and a putative RTX toxin (section 6.4.6). Several suspected operons involved in nitrogen metabolism and in phenylacetic acid degradation have been identified from the suspected Proteobacteria-affiliated genome fragments in clones 142 and 67, respectively. The putative phenylacetic acid degradation gene cluster will be further examined in section 6.4.7.

6.4.3 Response regulators

Response regulator proteins are involved in the regulation of transcription, specifically modulating the frequency, rate or extent of DNA-dependent transcription. They are involved in two-component signal transduction systems in bacteria that function to detect and respond to environmental changes. As the name suggests, the two-component system consists of two elements - a histidine kinase environmental sensor and its cognate response regulator. Upon detection of a specific stimulus the histidine kinase phosphorylates the signal receiver domain of a response regulator protein, which induces a conformational change in the response regulator. This activates the effector domain, which usually has DNA-binding activity, triggering the cellular response (Stock et al., 2000). The domains of the two-component proteins are highly modular, but the core structures and activities are maintained.

Two-component proteins can be responsible for a wide variety of biological processes in the environment, ranging from quorum sensing to the production of toxins. In the case of a complex community such as a cyanobacterial bloom, they could play a significant role in population structure and function through indirect signals or through the proteins whose production they induce. Consequently, they could be used as potential targets for bloom management if two-component systems are found that act as switches for ecologically relevant proteins that participate in bloom formation and/or toxin production. Several ORFs from different genome fragments were affiliated to proteins involved in the regulation of transcription. However, three of the more interesting were found in clone 2089 (ORF20) and clone 905 (ORF8 and ORF10), which encoded a putative DNA-binding response regulator from the LuxR family and a putative two-component system of the LytR/LytS family, respectively. These genes are examined further in sections 6.4.3a-b.

6.4.3a Bacterial regulatory protein, LuxR

Clone 2089 ORF20 encoded a 241 residue protein which had closest similarity to homologues of DNA-binding response regulators from the LuxR family. The highest similarities were to homologues of the LuxR domain, the closest being 77% to the cyanobacterium Trichodesmium. For the entire putative response regulator, including both signal receiver and effector domains, a homologue affiliated to the Firmicute Bacillus anthracis exhibited the highest similarity of 49%. DNA-binding response regulators from the LuxR family are generally characterised by a domain structure that consists of a LuxR effector domain and a signal receiver domain, which are involved in the two-component signal transduction system in bacteria. Homology searches against the GenBank database using a conserved domain search (Marchler-Bauer and Bryant, 2004) and InterProScan (Zdobnov and Apweiler, 2001) show that clone 2089 ORF20 has a similar domain pattern to response regulators from the LuxR family, containing a helix-turn-helix (HTH) DNA-binding effector domain and a signal receiver domain (Figure 6.4). In proteins, the helix-turn-helix is a major structural motif capable of binding DNA. It is composed of two α helices joined by a short strand of amino acids and is found in many proteins that regulate gene expression.

Figure 6.4: Domain structure of a putative LuxR response regulator. The domain arrangement in 2089ORF20, a putative bacterial regulatory protein belonging to the LuxR family.

The bacterial regulatory protein LuxR domain is an HTH DNA-binding domain of about 65 amino acids, present in transcription regulators of the LuxR/FixJ family of response regulators (Pao et al., 1994). The domain is named after Vibrio fischeri luxR, a transcriptional activator for quorum-sensing control of luminescence (Engebrecht et al., 1983; Fuqua et al., 1994). Most LuxR-type HTH domain regulators act as transcription activators via one of four different mechanisms and control a wide variety of activities in various biological processes, the most notable being quorum sensing. Some LuxR-type HTH domain regulators are activated when bound to N-acyl homoserine lactones, which are used as quorum-sensing molecules in a variety of Gram-negative bacteria (Pappas et al., 2004). This form of cell-cell communication regulates such dissimilar functions as exoenzyme synthesis, conjugation, antibiotic production, luminescence and biofilm formation (Swift et al., 1999; Greenberg, 2000; Parsek and Greenberg, 2000).

There are other notable mechanisms for these bacterial regulators, including those that belong to a two-component sensory transduction system, where the protein is activated through phosphorylation by a transmembrane kinase (Birck et al., 2002). Proteins of this nature play ecologically important roles, such as global regulators inducing the expression of nitrogen-fixation genes in microaerobiosis (Birck et al., 2002) and activators of the nitrate reductase operon in E. coli (Maris et al., 2002), and they are known to be involved in the regulation of exopolysaccharide biosynthesis in enteric and plant pathogens. Proteins from this family can also function as autonomous effector domain regulators, without a regulatory domain, such as the transcriptional regulator of spore formation in Bacillus subtilis (Ducros et al., 2001), as well as multiple ligand-binding regulators, which were found to activate the maltose operon in E. coli (Schlegel et al., 2002). Regardless of the specific mechanism of this bacterial regulator, its function is probably essential from an ecological perspective, not just for the species from which this genome fragment originated but possibly for species from the larger community.

6.4.3b Two-component regulatory system lytR/lytS

Clone 905 was found to contain a putative two-component signal transduction system. ORF10 encoded a 231 residue protein which was assigned as a response regulator of the LytR/AlgR family (COG3279), with highest similarity (65%) to a two-component system regulatory protein homologue affiliated to an uncharacterized γ-Proteobacterium. ORF8 encoded a 350 residue protein that had highest similarity (65%) to a two-component system sensor protein without a kinase domain. Interestingly, an affiliation to a signal transduction histidine kinase homologue from the LytS family was noted (similarity: 54%). In general, response regulators of microbial two-component signal transduction systems consist of a signal receiver domain and an effector domain, typically a DNA-binding HTH or winged-helix domain (Grebe and Stock, 1999; Stock et al., 2000; West and Stock, 2001). The LytR domain as defined by Nikolskaya and Galperin (2002) shows no significant similarity to the HTH or winged-helix domains. However, bacterial response regulators containing this LytR DNA-binding domain comprise a well defined family of proteins that are involved in biosynthesis of extracellular polysaccharides, fimbriation, expression of exoproteins, including toxins, and quorum sensing (Nikolskaya and Galperin, 2002). Clone 905 ORF10 demonstrated a similar domain pattern to transcriptional regulators that contain the LytR domain, consisting of a signal receiver domain and a LytR domain (Figure 6.5). Although the regulatory signal transmitted by this protein remains unknown at this stage, it gives a snapshot of the physiological potential of the microorganism from which this genome fragment originated, a low G+C bacterium loosely inferred as being affiliated to the Bacteroidetes. It gives additional reasons for further studies into this group of microorganisms, which is thought to dominate most freshwater populations yet is still very poorly understood.

Figure 6.5: Domain structure of a putative LytR/AlgR response regulator. The domain arrangement in 905ORF10, a putative response regulator of the LytR/AlgR family (COG3279).

6.4.5 Alcohol oxidoreductases

Natural microbial assemblages have long been used as a source of novel enzymes and biocatalysts to benefit industry and agriculture. Alcohol oxidoreductases constitute an interesting family of enzymes that displays a wide variety of substrate specificities as well as fulfilling several vital but quite different physiological functions. Enzymes of this type have numerous biotechnological applications ranging from alcohol and vinegar production to synthesis of carbonyl compounds, hydroxy acids, amino acids, and chiral alcohols (Fang et al., 1995; Hummel, 1997, 1999).

The interconversion of alcohols, aldehydes and ketones by alcohol oxidoreductases is involved in an astonishing array of essential metabolic reactions in microorganisms (Knietsch et al., 2003b). They can be divided into three major categories (Reid and Fewson, 1994): (1) the NAD- or NADP-dependent dehydrogenases, which can in turn be divided into the group I long-chain (approximately 350 amino acid residues) zinc-dependent enzymes, the group II short-chain (approximately 250 residues) zinc-independent enzymes and the group III "iron-activated" enzymes that generally contain approximately 385 amino acid residues; (2) NAD(P)-independent enzymes that use pyrroloquinoline quinone, haem or cofactor F420 as cofactor; and (3) oxidases that catalyze an essentially irreversible oxidation of alcohols.

Because of their use in biotechnology, there is interest in pursuing new or improved catalysts from environmental sources that exhibit alcohol dehydrogenase/oxidoreductase activity. A metagenomic study focusing on detecting NAD(H)-dependent alcohol oxidoreductase activity in enrichments from soil and freshwater sediment communities has previously identified novel alcohol dehydrogenases as well as putative oxidoreductases (Knietsch et al., 2003b). Here we report on 3 ORFs with similarity to homologues with oxidoreductase/dehydrogenase domains, which were identified from annotation of gene sequences from the cyanobacterial bloom metagenome. These include a putative oxidoreductase, zinc-dependent dehydrogenase family protein (clone 67 ORF6), a putative toluene sulfonate zinc-independent alcohol dehydrogenase oxidoreductase (clone 67 ORF1) and a putative short chain dehydrogenase (clone 2089 ORF18).

Clone 67 ORF6 encoded a 329 amino acid residue protein most similar to a protein belonging to the zinc-dependent alcohol dehydrogenase superfamily. Its putative gene ontology (GO) function entailed oxidoreductase activity, acting on the CH-OH group of donors with NAD or NADP as an acceptor. Likewise, it was assigned to COG0604, ‘NADPH: quinone reductase and related Zn-dependent oxidoreductases’, which is categorized as being involved in energy production and conversion. Clone 2089 ORF18 encoded a 223 amino acid residue protein most similar to a protein that belongs to the short-chain dehydrogenases/reductases (SDR) family. This family contains a wide variety of dehydrogenases of various substrate specificities, all of which nonetheless exhibit oxidoreductase activity. Clone 67 ORF1 encoded 74 residues most similar to a toluene sulfonate zinc-independent alcohol dehydrogenase oxidoreductase that is related to short-chain alcohol dehydrogenases. It was assigned to the lipid transport and metabolism COG category, more specifically the first reduction step in the fatty acid biosynthesis pathway. For all three genes, further proteomic work involving sub-cloning, protein expression, purification and protein crystallisation is needed before a complete physicochemical description can be made and their biotechnological potential realised.

6.4.6 A putative RTX toxin

Cyanobacterial blooms are often reported to be associated with toxic secondary metabolites but not with other harmful proteins. We have also demonstrated that genes associated with PKS were present in the cyanobacterial single gene library (Chapter 4, section 4.3.3). Here we report on the gene sequence from clone 543 (ORF 7), which had low similarity to a putative RTX (repeats in toxin) toxin. The RTX family of cytolytic toxins belongs to the Type I secretion system, which is used by some Gram-negative bacteria to export proteins (often proteases) across both inner and outer membranes to the extracellular medium. In vitro, RTX toxins mostly exhibit cytotoxic and often haemolytic activities. The known operons are similarly organized in a CABD pattern, where C codes for the activation protein, A encodes the structural toxin, and B and D code for proteins involved in the secretion of the toxin (Kuhnert et al., 1997). In addition, a haemolysin activator-related protein was identified (1664ORF12), which activates haemolysin, which is often quoted as the model for RTX toxins (pfam accession number: PF02382 (Finn et al., 2006)). Although an operon was not identified and only putative functions have been assigned to both predicted proteins, an insight has been gained, and with further investigation a clearer picture will ensue.

6.4.7 Aromatic compound degradation: the phenylacetyl-CoA catabolon

Aromatic compounds serve as a rich source of carbon and energy for a wide variety of microorganisms and plants. The microbial catabolism of aromatic compounds is carried out by many degradative pathways that facilitate the mineralization of these compounds. The enzymes that make up the aromatic catabolic pathways are useful for the biodegradation of environmental aromatic pollutants (Gibson and Harwood, 2002; Samanta et al., 2002; Diaz, 2004; Parales and Haddock, 2004; Ramos et al., 2005; Zhang and Bennett, 2005) and for chemical synthesis (Parales and Haddock, 2004; Boyd et al., 2005b; Boyd et al., 2005a; Poppe and Rétey, 2005).

Many predicted proteins were identified that could be of interest from a biotechnological or pharmaceutical perspective, including several implicated in hydrocarbon degradation. Clone 67, a suspected Proteobacteria-affiliated bacterial genome fragment, contained a cluster of 17 genes (ORFs 9-25) encoding predicted proteins most similar to homologues involved in the phenylacetate and/or phenylacetyl-CoA degradation pathway. Phenylacetate derives primarily from the amino acid phenylalanine, but it is also formed in degradation pathways that target a variety of environmental pollutants and aromatics of natural and synthetic origin, including styrene, 2-phenylethylamine, trans-styrylacetic acid, phenylacetaldehyde, phenylacetyl amides, phenylacetyl esters and n-phenylalkanoic acids containing an even number of carbon atoms (Figure 6.6). As phenylacetate is a central compound on which several different peripheral catabolic pathways converge (Figure 6.6), the phenylacetate degradation pathway is the central functional core of an important catabolon, termed the phenylacetyl-CoA catabolon (Luengo et al., 2001). The name derives from the first common intermediate (PhAc-CoA), which results from the activation of phenylacetate to phenylacetyl-CoA by a phenylacetyl-CoA ligase (Luengo et al., 2001).

Figure 6.6: Biochemical organization of the phenylacetyl-CoA catabolon

The enclosed box indicates the phenylacetyl-CoA catabolon core. Sty, styrene; TA, tropic acid; EB, ethylbenzene; PhEtNH2, 2-phenylethylamine; PhAc amide, phenylacetyl amide; PhAc ester, phenylacetyl ester; PhAc, phenylacetic acid; PhAs, phenylalkanoates; PHPhAs, poly-hydroxyphenylalkanoates; StyAc, trans-styrylacetic acid; PhAc-CoA, phenylacetyl-CoA; 2′-OH-PhAc-CoA, 2′-OH-phenylacetyl-CoA; TCA, tricarboxylic acid cycle. Source: (Luengo et al., 2001).

The complete phenylacetic acid (PhAc) catabolic pathway of P. putida (summarized in Figure 6.7) was found to be encoded within an 18 kb gene cluster (pha) that is composed of 15 genes organized in five contiguous operons, paaABCEF, paaGHIJK, paaLMN, paaY and paaX (Olivera et al., 1998). These genes encode 15 proteins grouped in six putative functional units: (i) a transport system (PaaL and PaaM); (ii) a PhAc-activating enzyme, i.e. the PhAc-CoA ligase (PaaF); (iii) a ring-hydroxylating complex (PaaG, PaaH, PaaI, PaaJ and PaaK); (iv) a ring-opening protein (PaaN); (v) a β-oxidation-like system (PaaA, PaaB, PaaC and PaaE); and (vi) two regulatory proteins (PaaX and PaaY) (Olivera et al., 1998; Luengo et al., 2001). Several clusters have been identified in other species, including E. coli (paa) (Ferrandez et al., 1997) and Pseudomonas sp. Y2 (Velasco et al., 1998), and numerous new putative genes and clusters have been identified from the many genomes that have been sequenced. In most cases, operons from different bacteria appear to be arranged differently from those of P. putida, suggesting that various DNA rearrangements have occurred during the evolution of the cluster in each particular host.
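
The six putative functional units listed above lend themselves to a simple lookup table. The sketch below is illustrative and not part of the original analysis: it records the P. putida gene-to-unit assignments as given in the text (Olivera et al., 1998; Luengo et al., 2001) and reports which units have no representative in a given gene list. The function names and the example gene list are hypothetical.

```python
# Gene-to-unit assignments for the P. putida cluster, as listed in the text
PAA_FUNCTIONAL_UNITS = {
    "transport": {"paaL", "paaM"},
    "PhAc-CoA ligase": {"paaF"},
    "ring hydroxylation": {"paaG", "paaH", "paaI", "paaJ", "paaK"},
    "ring opening": {"paaN"},
    "beta-oxidation-like": {"paaA", "paaB", "paaC", "paaE"},
    "regulation": {"paaX", "paaY"},
}


def unit_coverage(genes_found):
    """Map each functional unit to the sorted list of its genes that were found."""
    found = set(genes_found)
    return {unit: sorted(genes & found)
            for unit, genes in PAA_FUNCTIONAL_UNITS.items()}


def missing_units(genes_found):
    """Functional units with no representative among genes_found."""
    return [unit for unit, genes in unit_coverage(genes_found).items()
            if not genes]


# Hypothetical gene list: every P. putida gene except the two transport genes
all_genes = set().union(*PAA_FUNCTIONAL_UNITS.values())
example = all_genes - {"paaL", "paaM"}
print(missing_units(example))
```

Gene names differ between species, so homologues from another organism would first need to be mapped onto the P. putida nomenclature before a table of this kind could be applied.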

Clone 67 contained 12 paa genes, representing 5 of these functional units and arranged in 4 putative operons (Figure 6.7). As in the clusters of several other bacteria, the transport genes paaL and paaM were absent. These two genes encode a permease and a specific channel-forming protein for the uptake of phenylacetic acid, respectively. Five genes, LivKHMGF, implicated in branched-chain amino acid transport were located within this cluster; their role in relation to the phenylacetyl-CoA catabolic pathway is unknown. It has previously been suggested that, in some bacteria, missing gene products of the β-oxidation-like functional unit encoded by the paaGHIJK genes may be replaced by similar enzymes from other β-oxidation pathways in the cell (Jiménez et al., 2002). Interestingly, Azoarcus evansii and Burkholderia pseudomallei (β-Proteobacteria) possess a putative regulatory protein of the TetR family (the paaR gene product) instead of a paaX orthologue (Mohamed et al., 2002). It seems unlikely that some or all of the LivKHMGF genes play a role in the phenylacetyl-CoA catabolic pathway, as phenylalanine is not a branched-chain amino acid and it and its derivatives contain an aromatic ring. However, the position of LivKHMGF within the gene cluster, together with the fact that their putative function is similar to that of the missing genes paaL and paaM, warrants further investigation. It is also interesting to note that a transcriptional regulator of the TetR family lies upstream of the paa cluster. Whether its position is coincidental or whether it plays a functional role in the pathway is unknown and also merits further attention.
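The comparison between the P. putida functional units and the paa genes annotated on clone 67 amounts to a simple set comparison, which can be sketched as follows. This is an illustrative sketch only: the unit-to-gene mapping follows the P. putida assignments given above (Olivera et al., 1998; Luengo et al., 2001), the clone 67 gene list is read from the paa annotations in Appendix II, and the one-to-one mapping of those annotations onto P. putida gene names, as well as all variable and function names, are assumptions made for the example.

```python
# Functional units of the P. putida pha cluster, as listed in the text
# (Olivera et al., 1998; Luengo et al., 2001).
PUTIDA_UNITS = {
    "transport": {"paaL", "paaM"},
    "PhAc-CoA ligase": {"paaF"},
    "ring hydroxylation": {"paaG", "paaH", "paaI", "paaJ", "paaK"},
    "ring opening": {"paaN"},
    "beta-oxidation-like": {"paaA", "paaB", "paaC", "paaE"},
    "regulation": {"paaX", "paaY"},
}

# paa genes annotated on clone 67 (Appendix II). ASSUMPTION: each annotation
# maps one-to-one onto the P. putida gene of the same letter
# (e.g. "PaaN/Z" -> paaN, "PaaI/C" -> paaI).
CLONE_67_GENES = {
    "paaB", "paaC", "paaD", "paaE", "paaF", "paaG",
    "paaH", "paaI", "paaJ", "paaK", "paaN", "paaX",
}


def summarise_units(units, genes):
    """Return, for each functional unit, the genes present and missing."""
    return {
        unit: {
            "present": sorted(members & genes),
            "missing": sorted(members - genes),
        }
        for unit, members in units.items()
    }


summary = summarise_units(PUTIDA_UNITS, CLONE_67_GENES)
represented = [unit for unit, s in summary.items() if s["present"]]
absent = [unit for unit, s in summary.items() if not s["present"]]

print(len(CLONE_67_GENES), "paa genes;", len(represented), "units represented")
print("units with no gene on clone 67:", absent)
```

Under these assumptions the sketch reports 12 genes across 5 functional units, with only the transport unit unrepresented, which agrees with the description above. The paaD annotation has no counterpart among the 15 P. putida genes listed and is therefore not assigned to any unit.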

Figure 6.7: Organisation of a putative phenylacetyl-CoA degradation gene cluster

Gene cluster encoding the phenylacetyl-CoA catabolic pathway or catabolon core in CBNPD1 clone 67 and in several other microbes, illustrating differences in gene cluster organization. Colour combinations indicate the organization of these clusters in relation to the putative functional units reported in P. putida (Luengo et al., 2001). The LivKHMGF genes in CBNPD1 clone 67 are implicated in branched-chain amino acid transport; their role in relation to the phenylacetyl-CoA catabolic pathway is unknown.

The PhAc-CoA catabolon offers attractive possibilities for the development of biotechnological applications. The pathway has been utilized in the enzymatic synthesis of penicillins (Luengo, 1998), in improving the rate of penicillin G biosynthesis in P. chrysogenum (Minambres et al., 1996), in the biotransformation of PhAc to 2'-OH-PhAc, which is a common intermediate in the synthesis of many chemical compounds (Lee and Shieh, 1993; Ferrandez et al., 1998; Olivera et al., 1998), in styrene biodegradation and biotransformation (Panke et al., 1998; Panke et al., 1999; Panke et al., 2000) and in the production of new bioplastics (Luengo et al., 2001; Olivera et al., 2001).

CHAPTER 7 – CONCLUSIONS AND FUTURE RESEARCH

7.1 Conclusions

This study describes the microbial ecology of populations present in cyanobacterial blooms. Cyanobacterial blooms and the toxins they produce are becoming far more prevalent in Australian and international freshwaters and are causing large-scale problems for agriculture and human health. Owing to the difficulties associated with culturing cyanobacteria, a combination of culture-independent techniques was used to investigate the population structure and function of these microbial communities. The first part of this thesis (Chapter 3) characterised the population structure of 2 cyanobacterial blooms collected from contrasting Australian lakes using 16S rRNA gene amplification and cloning. Chapter 4 described the construction of a large-construct BAC library from HMW DNA from an Aphanizomenon/Cylindrospermopsis-dominated bloom sample, which was selected on the basis of the results from Chapter 3. Several clones were identified as containing genes of interest by the various sequence-based screening techniques undertaken in Chapter 5. These clones were sequenced to completion in Chapter 6 and over 140 kb of sequence data was annotated, with genes identified and discussed further.

The molecular study of the 2 different cyanobacterial bloom populations has revealed that many uncultivated species are still present within this environment. Phylogenetic comparisons identified distinct novel clusters that originated from Lake Samsonvale and contained sequences reported in other cyanobacterial bloom studies, and which may therefore be representative of cyanobacterial bloom events on a global scale. This is reinforced by the affiliation of sequences retrieved from Lake Samsonvale to previously defined clusters suspected to be characteristic of cyanobacterial bloom events. Preliminary results in this study suggest that the dissimilarities in bacterial communities associated with cyanobacterial blooms may also be attributable to lake character. Moreover, these findings add to the as yet limited understanding of interactions within bacterial communities associated with cyanobacterial blooms. It is likely that such interactions ultimately affect cyanobacterial bloom formation and/or toxin production.

Results from Chapter 3 demonstrated that the bloom from Lake Samsonvale was dominated by cylindrospermopsin-producing cyanobacteria and was a suitable sample for metagenomic analysis. The resultant BAC library (CBNPD1) is the first reported metagenomic library constructed from DNA extracted from a toxic cyanobacterial bloom. Library quality assessments verified that CBNPD1 contained large inserts (average: 27 kb) and was constructed from DNA containing novel 16S rRNA and PKS sequences.

Several clones were identified as containing genes of both physiological and phylogenetic interest using sequence-based screening of library CBNPD1. A BAC-end sequencing survey was found to be a useful tool not only for identifying clones for insert sequencing but also for generating sequence data that provided a snapshot of the physiological information contained within the BAC library. PCR screens targeting 16S rDNA were used so that a link between the physiological and phylogenetic information of uncharacterized microorganisms in cyanobacterial blooms could be established once the entire insert had been sequenced.

The major advantage of metagenomics, and in particular of large-construct BAC cloning, is that a clone found to contain an interesting gene or phylogenetic marker can be sequenced to completion, providing additional information on genes or possible gene clusters from the organism from which the DNA fragment originated. The complete sequencing of 7 clones from CBNPD1 has provided information on 130 genes from the cyanobacterial metagenome and is the first report of genetic information being recovered from a cyanobacterial bloom. The characterisation of a 22 kb fragment linked to a 16S rRNA gene affiliated with a member of the genus Roseateles is the first report on the genome of a microbe from this genus. Moreover, the functional description of these DNA fragments, in particular the in-depth investigation of several genes of ecological importance and of one 17.5 kb gene cluster, has provided further physiological insights into cyanobacterial blooms.

7.2 Future directions

The complex interacting microbial populations of heterotrophic bacteria and Cyanobacteria that make up cyanobacterial blooms remain a mystery despite the adverse effects they have on surrounding environments. The fact that it is extremely difficult to grow cyanobacteria in pure culture illustrates that there are many intricate interactions and relationships between bloom constituents. To understand completely the process of bloom formation and the production of ecologically relevant secondary metabolites such as toxins, much work needs to be directed towards characterising the microbial populations that make up a bloom. Studies identifying active and dormant populations and the physical organisation of such communities will be required to clarify the functional roles they play in cyanobacterial blooms. For this purpose, culture-independent molecular metagenomic methods are considered to be far more powerful than culture-based approaches. Here we report on the first culture-independent investigation into the metagenome of a freshwater toxic bloom dominated by Aphanizomenon and Cylindrospermopsis species, members of the phylum Cyanobacteria.

Investigations into the population structure of cyanobacterial blooms from two Australian lakes identified numerous novel clusters, several of which consisted only of sequences that originated from cyanobacterial bloom communities on a global scale. Preliminary results suggest that sequences affiliated with these clusters may have originated from bacteria that play ecologically important roles in such blooms and are representative of cyanobacterial blooms. Furthermore, it seems likely that DNA from these organisms is contained within the BAC library CBNPD1, as the same DNA template was used in both experiments. Additional investigation into the genomes of these as yet uncultured microbes is therefore desirable, as it would provide invaluable information regarding their physiology. Such information may allow gene(s) and/or pathways involved in bloom ecology to be identified. The metagenomic approach described in this study provides a basis for such investigations. The design and application of specific probes (based on the 16S rRNA genes identified) to detect large inserts from these clusters, followed by insert sequencing, should produce the first genomic information from these microbes and provide a base on which to build.

Although there is overwhelming potential for the use of culture-independent techniques such as metagenomics, it is still of utmost importance not to underestimate the power of culture-dependent strategies. As it stands today there are still major shortcomings with both approaches. It is widely accepted that whilst culture-dependent techniques grossly underestimate the microbial diversity present in nature, they are still necessary to gain a complete physiological and ecological profile of an organism. Metagenomics can greatly assist culturing by providing information on the metabolic capabilities of an organism, which in turn could lead to the development of new culture strategies. A combination of both approaches is seen as a step forward in overcoming the limitations of both culture-dependent and culture-independent molecular techniques.

The identification of gene(s) and/or gene clusters involved in secondary metabolite production has important implications for water management, in particular for predicting and/or manipulating the expression of metabolites, especially cyanotoxins. The inability to culture the majority of Cyanobacteria that exist in nature remains a major bottleneck in the search for and discovery of gene clusters responsible for cyanotoxin production. The ability to capture large fragments of DNA using metagenomics provides a method to uncover the wealth of hidden genetic information in cyanobacterial blooms and to identify large gene clusters responsible for cyanotoxin production. Only a few gene clusters involved in cyanotoxin production have been described to date. The gene cluster for the hepatotoxin cylindrospermopsin, which was detected in the bloom sample used for BAC library construction, remains incomplete. The genetic information contained within the BAC library CBNPD1 demonstrated the presence of novel PKS genes which could be part of gene clusters responsible for cyanotoxin production. In-depth screening using sequence-based PCR and/or hybridization targeting genes of interest could identify clones containing such clusters.

The possible mechanisms of bloom formation and/or toxin production are numerous and it is highly likely that several determinants exist. These may include, but are not limited to, metabolic substrate availability, cell-cell communication (quorum-sensing), metabolite exchange, vitamin and cofactor synthesis and exchange, and competition with other organisms (bacterial toxin production).

The results presented here examine the diversity of microorganisms present in a bloom and provide detailed sequences from select BAC clones. However, if the diversity of the bloom is, say, greater than 100 or so different microbial “species”, then the sequence data obtained (144 kb) represents only approximately 0.05 % of the total sequence present in the bloom. This makes the information too sparse to draw reliable and meaningful conclusions as yet. From our limited data several genes have been identified which have led to several highly speculative hypotheses on mechanisms for bloom production and toxin biosynthesis. Our studies have identified a number of bacterial regulatory switches, including those that belong to a two-component sensory transduction system, which are hypothesised to play ecologically important roles in bloom formation, toxin production and other biological processes. They could therefore be used as potential targets for predicting pre-blooming events and for bloom management by controlling or preventing bloom formation. Complete sequencing of the BAC library would enable a more thorough assessment of the bloom metagenome, allowing more detailed hypotheses to be developed on the mechanisms that trigger bloom production.
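The figure of approximately 0.05 % can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes a mean genome size of 3 Mb per species; this value is not stated in the text and is chosen here only because it is consistent with the quoted percentage. The function and constant names are likewise hypothetical.

```python
def sequenced_fraction(sequenced_bp, n_species, mean_genome_bp):
    """Fraction of the combined genome content covered by the sequence data."""
    return sequenced_bp / (n_species * mean_genome_bp)


SEQUENCED_BP = 144_000       # 144 kb of BAC insert sequence (from the text)
N_SPECIES = 100              # "100 or so" microbial species (from the text)
MEAN_GENOME_BP = 3_000_000   # ASSUMPTION: 3 Mb mean genome size, not given in the text

fraction = sequenced_fraction(SEQUENCED_BP, N_SPECIES, MEAN_GENOME_BP)
print(f"{fraction * 100:.3f} % of the estimated bloom metagenome")
```

With these inputs the result is 0.048 %, which rounds to the approximately 0.05 % quoted above. A larger assumed genome size or species count would lower the fraction further.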

Much attention is focussed on secondary metabolites when the hazards of cyanobacterial blooms are assessed. However, the identification of an RTX toxin implies that the health problems associated with cyanobacterial blooms may extend beyond the Cyanobacteria and cyanotoxins. This cytotoxin is thought to be produced by Gram-negative bacteria and could be commonly present in cyanobacterial blooms. Further investigations identifying the organisms that produce it, their distribution among cyanobacterial blooms and the levels at which the toxin is produced would be desirable to assist risk assessment strategies for water management.

The past 50 years have seen the commercialization of numerous secondary metabolites from microbial cultures. The recent availability of new and powerful molecular methods and their use in microbial ecology have provided evidence that an enormous reservoir of untapped biomolecular diversity exists in nature and remains to be elucidated. Beyond the cyanotoxins, cyanobacteria are responsible for an array of secondary metabolites of biotechnological importance. Likewise, microbes from many other phyla, such as the Actinobacteria, are known to produce natural products such as antibiotics.

The library constructed here is a base from which the novel molecular diversity of cyanobacterial blooms could be exploited and new natural products discovered. Screens for anticancer, antifungal, antibacterial and anti-protease properties of pharmaceutical interest, either directly by function-based activity assays or indirectly by probing for genes in the BAC library, are therefore desirable.

There is much biotechnological potential in cyanobacterial blooms beyond secondary metabolites. This study has identified several promising leads in the form of genes encoding putative proteins that could be of interest. Several genes were identified that encoded putative alcohol dehydrogenase/oxidoreductase proteins. Because of their use in biotechnology, ranging from alcohol and vinegar production to the synthesis of carbonyl compounds, hydroxy acids, amino acids and chiral alcohols, there is interest in pursuing new or improved enzymes that exhibit alcohol dehydrogenase/oxidoreductase activity. Likewise, the identification of a gene cluster encoding a putative PhAc-CoA catabolon offers attractive possibilities for the development of biotechnological applications. The pathway has been utilized in the enzymatic synthesis of penicillins, the biotransformation of PhAc to 2'-OH-PhAc (a common intermediate in the synthesis of many chemical compounds), styrene biodegradation and biotransformation, and the production of new bioplastics. Further work involving microarray technology is required to determine the possible operational pathways. Proteomic work involving sub-cloning, protein expression, purification and crystallisation of potential proteins of interest would also be advantageous.

Research on cyanobacterial bloom communities has much potential to clarify the ecological roles of Cyanobacteria and heterotrophic bacteria in freshwater systems. Moreover, it provides a greater understanding of microbial systems in nature and a source of valuable natural products, and it aids in the control and prevention of blooms, which is of obvious significance given the extensive health implications involved.

APPENDICES

Appendix I: GenBank accession numbers for single-gene clone libraries

Clone      Accession Number    Clone                 Accession Number
CYN-1-2    EF158335            CYN-2-20              EF158356
CYN-1-4    EF158336            CYN-2-23              EF158357
CYN-1-7    EF158337            CYN-2-31              EF158358
CYN-1-10   EF158338            CYN-2-32              EF158359
CYN-1-14   EF158339            CYN-2-33              EF158360
CYN-1-15   EF158340            CYN-2-36              EF158361
CYN-1-17   EF158341            CYN-2-44              EF158362
CYN-1-18   EF158342            CYN-2-46              EF158363
CYN-1-23   EF158343            CYN-2-47              EF158365
CYN-1-32   EF158344            CYN-2-49              EF158364
CYN-1-33   EF158345            MCY-1                 EF158366
CYN-1-34   EF158346            MCY-6                 EF158367
CYN-1-35   EF158347            MCY-8                 EF158368
CYN-1-37   EF158348            MCY-13                EF158369
CYN-1-38   EF158349            MCY-14                EF158370
CYN-1-40   EF158350            MCY-15                EF158371
CYN-1-41   EF158351            MCY-16                EF158372
CYN-1-42   EF158352            MCY-22                EF158373
CYN-1-44   EF158334            MCY-25                EF158374
CYN-1-46   EF158353            MCY-28                EF158375
CYN-1-50   EF158354            MCY-34                EF158376
CYN-2-14   EF158355            MCY-36                EF158377
MCY-39     EF158378            PKS-7                 EF157675
MCY-41     EF158379            PKS-8                 EF157676
MCY-46     EF158380            PKS-10                EF157677
MCY-52     EF158381            PKS-12                EF157678
MCY-59     EF158382            PKS-14                EF157679
MCY-60     EF158383            PKS-16                EF157680
MCY-72     EF158384            PKS-18                EF157681
MCY-81     EF158385            PKS-19                EF157682
MCY-85     EF158386            PKS-21                EF157683
MCY-93     EF158387            PKS-23                EF157684
PKS-2      EF157673            Clone 545 16S rDNA    EF190334
PKS-6      EF157674

Appendix II: Nucleotide sequence of Clone 67

LOCUS CBNPD1_Clone_67 29540 bp DNA linear ENV 30-NOV-2006 DEFINITION Uncultured organism CBNPD1 BAC clone 67. ACCESSION EF157666 VERSION KEYWORDS ENV. SOURCE Uncultured organism CBNPD1 BAC clone 67 ORGANISM Uncultured organism CBNPD1 BAC clone 67 Unclassified sequences; environmental samples. REFERENCE 1 (bases 1 to 29540) AUTHORS Pope,P.B. and Patel,B.K.C. TITLE Metagenomic Analysis of a Toxic Cyanobacterial Bloom JOURNAL Unpublished REFERENCE 2 (bases 1 to 29540) AUTHORS Pope,P.B. and Patel,B.K.C. TITLE Direct Submission JOURNAL Submitted (30-NOV-2006) The Microbial Gene Diversity and Discovery Research Unit, Griffith University, Kessels Road, Nathan, Brisbane, Queensland 4111, Australia FEATURES Location/Qualifiers source 1..29540 /organism="Uncultured organism CBNPD1 BAC clone 67" /mol_type="genomic DNA" /isolation_source="Toxic cyanobacterial bloom" /environmental_sample /plasmid="pIndigoBAC-5" /country="Australia" /metagenomic gene 3..224 139 /gene="67-1" CDS <3..224 /gene="67-1" /note="'Putative Toluene sulfonate zinc-independent alcohol dehydrogenase oxidoreductase'" /codon_start=1 /product="CLONE 67-1" /translation="VNCVAPVIGPTGLLEQFMGMPDTPENRAKFLGTIPLGRFSTPAD IANACLYLASDEAEFVTGVILEVDGGRTI" gene complement(431..955) /gene="67-2" CDS complement(431..955) /gene="67-2" /note="'Conserved Hypothetical Protein'" /codon_start=1 /product="CLONE 67-2" /translation="MKKWTMLAMLVLVGAGLMAAPFMTAGLAQQAAPQKNVNDWLLNA DNDTERFTKLQTFLRGFDQPMWEVGERYQRVYDALADGNYELAMYHWEKIKSSITTGY MKRPKRQPNADAMFIKNVYDPILASFKTKDSGKAWEGFDLGRNACMACHVAEEVGFMN DQPLFRRTAFPPKK" gene 1071..1748 /gene="67-3" CDS <1071..1748 /gene="67-3" /note="'Putative Two component transcriptional regulator, winged helix family'" /codon_start=1 /product="CLONE 67-3" /translation="LRLLLVEDERRLGNLLQASLHEAGFVTDRALDLADARDLARRER FDALLVDRGLPDGDGLELVRELRAAGNAVPVLVLTARDGVPDRVAGLENGADDYLVKP FAIEELIARVRAVLRRPGGALGQELRAGDIIYDAIERVVIIGARNLVLPRHELATLEC LMRRLGRVVTREILIDALYGRDGEPASNAVPVHIHNLRRKLQEAGAQARIITFRGLGY MLRAEPP" gene 
1745..2764 /gene="67-4" CDS 1745..2764 /gene="67-4" /note="'Putative Signal transduction histidine kinase'" /codon_start=1 /product="CLONE 67-4" /translation="MIDRQSLAWQVGIRLAIIMLALTAIGLGWLVWHTSVIEAAHGPS NLVHAVIQEFFLDLAWGVPVFIALIILIGAWALRRALLPLRQASAAALAIGPDALHQR IPTARLPSEIRPLAEAANDAFDRVEKAYRAQQRFVANAAHELRTPIAVARAAVERLPE SAEQRAVLADMVRLSHLASQLLELARLERPLPPSARLDAVPHLQEIACELAASSLNAR DVRVALEAPNSVFVRADAERLIAIIRNLVENAVHHEPRGGEILIRLTSDGLMTVDDRG PGIPAGDEKRIFERFERGSWTKTSGSGLGLAIVKEACATIGATIRVENRASGGARFSV EFPLA" gene complement(3064..3909) /gene="67-5" CDS complement(3064..3909) /gene="67-5" /note="'Putative Enoyl-CoA hydratase/carnithine racemase'" /codon_start=1 /product="CLONE 67-5" /translation="MSLASPGLSFRIVANVLVAGASAIPNIDIETRGKIALLTLNRPE RRNAIDDATIADIERFFTRPPKNVRAVVLRAAGEHFCAGLDLQEHFELGRGPVEFMRM CQNWHRAFDAMQFGSIPIIAALKGAVVGGGLEIAAATHVRVAEKSTFFALPEGQRGVF TGGGATVRVARIMTPARMIDMMIAARTYDVQRGYDVGLAHEIVENGDGEKRAIAIAEA AAEHYDVTNFAILTGISRINDMSTTDGLFAESLLAGMVMTNPDLRPSLEKFFARKAPR LEPKG" gene 3990..4979 /gene="67-6" CDS 3990..4979 /gene="67-6" 140 /note="'Putative Oxidoreductase'" /codon_start=1 /product="CLONE 67-6" /translation="MGFQALLVDKLADGRVNAMVTTLEDSALPPEGNVTVRVEYSTVN YKDGLCITGGGGLVRSYPHVPGIDFAGTVEASDDARYKPGDKVVLTGWRVGETWWGGY AEKARVKADWLVPLPAGLTTRQAMAVGTAGFTAMLAVMALEAHGLAPAKGEVLVTGAA GGVGSVATAILAKRGYAVAAVTGRPEQEDYLKGLGASRIVPRAELADAPARPLDRETW AGCIDNVGGTMLARVLSQMKYNGSVAAVGLAGGANLPTTVIPFLLRGVNLLGIDSVMK PYADRIVAWRRIAEDLAFDKLEAMIVPAKLHDLPELGKAILAGQIRGRVVVEI" gene complement(4982..5356) /gene="67-7" CDS complement(4982..5356) /gene="67-7" /note="'Putative Lysyl-tRNA synthetase'" /codon_start=1 /product="CLONE 67-7" /translation="METTIRHYAQLSIARGCGFGALAIATLMVGSASDLSLFFRSGGF STLLMSFILLIMAARADHVPVKKTEVWIMLPKESRPPVEVAARLIARARKDYLLRFAY VAAIVASVELGLDLALTVGRLA" gene complement(5437..6066) /gene="67-8" CDS complement(5437..6066) /gene="67-8" /note="'Putative Transcriptional regulator, TetR family'" /codon_start=1 /product="CLONE 67-8" 
/translation="MARPIAADHDDKRRAILKAAAKLFASHGFDRASMAEIALACGVS KALLYHYYASKDQLLFDIIRAHLDDLVAAIEAVPKTLAPRERLAAMINALLEEYRHAD EEHQIQISDLKRLPPKMKAELVERERVLVRQFSGALAGLEPILAARRELLTPLTMNLF GMMNWKFMWFRENGPVSHAAFAEMVMRSIEAAASTLVADLVPRASLAKR" gene complement(6133..7437) /gene="67-9" CDS complement(6133..7437) /gene="67-9" /note="'Putative Phenylacetate-CoA ligase, paaF'" /codon_start=1 /product="CLONE 67-9" /translation="MFSISTYKPLTEPIEHASRDQLSALQLKRLQWSLRHAYENVPAY RASFDAAGVHPDDCKTLADLAKFPFTTKADLRDNYPFGMFAVPREKVSRIHASSGTTG RPTVVGYTANDIQTWADVVARSLRASGVRPGMTVHVAYGYGLFTGGLGAHYGAEKLGC TVIPISGGMTERQVQLITDFRPDVIMVTPSYMLAILDEFRRVGLDPRASSLEVGVFGA EPWTNAMRQEIEAAFDMHAVDIYGLSEVIGPGVANECVETKDGLHIWEDHFYPEIIDP DTGAVLPDGEQGELVFTALTKEAMPIIRYRTRDLTRLLPGTARSMRRMEKITGRSDDM MIVRGVNVFPTQIEEMLLALPALSAHYQIVLTREGRMDEMEVKVEARIDHSDEANSAA KLLAKRIKDRIGITAWVNVLPPDGIERSQGKAKRIIDKRPRA" gene complement(7441..7890) /gene="67-10" CDS complement(7441..7890) /gene="67-10" /note="'Putative Phenylacetic acid degradation-related protein, paaD'" /codon_start=1 /product="CLONE 67-10" /translation="MTKMTADELARACAKAMWDEDKASAGLGMAITAVAPGQAEITMS ISEAMVNGHGTCHGGYIFPLADSAFAFACNSRNHRVVAQHCSITFLAPGRLGDKLRAV ARETSRAGRSGIYDIDIFNQDDVKIAIFRGHSRMVKGEFFPGAVVEG" gene complement(7892..8704) /gene="67-11" CDS complement(7892..8704) /gene="67-11" /note="'Putative Phenylacetate degradation, enoyl-CoA hydratase paaB'" /codon_start=1 /product="CLONE 67-11" 141 /translation="MAQEGSAGEPSPVLVEHRGSLTLLTLNRPDKLNSFNEAQHRALK AAVDAAAADETCRAVIITGSGRGFCAGQDLSDRVRPEGGRAPDLGETLDEYYNPLIRA IRAMPKPVIAAVNGVAAGAGANLALACDLVLAARSAKFVQAFSKIGLIPDSGGTWMLP RLVGEARAKALTMLAIPVPAEEAERIGMIYRAVDDSALMSEAIALGEALAKAPTRGLA ETKLLLQRAFTLSFDQQLDDEREAQRRLGRSADYAEGVRAFMEKREATFRGQ" gene 8972..10135 /gene="67-12" CDS 8972..10135 /gene="67-12" /note="'Putative branched-chain amino acid ABC transporter system, LivK'" /codon_start=1 /product="CLONE 67-12" /translation="MSSSIGFSRRAALLAGAFTLGAGLSPAFAQGEIKLGAVLSVTGP ASFLGDPEKRTLEMVVEDVNAKGGILGRKVRLIVYDDAGDANAARTFATRLLEEDKVT 
AMIGGSTTGTTMAMIPAFEDAQVPFISLAGAVQIIQPVKKWVFKTPHTDTMACEKIFE DMKKRGTTKIGMISGTDGFGKSMRDECVKVAGKYGIEILHEENYGPRDTDMTPQLTNI KNKAGIQAVINPGFGQGPAIVTRNYRQLGISVPLYQSHGVASKQFIELAGPAAEGVRL PAAALLVADKLPDNDPQKKVVTEYSTNYAKRTGQPVSTFGGHAYDGFMIWKEAVERAK TDNPAKVRDEIEKTKGYVGTGGVVNMSPADHMGLDLSAFRMLEIRNGDWTLVP" gene 10211..11068 /gene="67-13" CDS 10211..>11068 /gene="67-13" /note="'Putative Inner-membrane translocator, LivH'" /codon_start=1 /product="CLONE 67-13" /translation="MPELLQFAFSGLTVGAVYALVALGFTLIYNASDVVNFAQGEFVM LGGMATVFLHLAGVPLPLAAALAILITVGVGLALYAFAIDPARGSSAVTIIIITIGAS IFLRGVAQVIFDKRFHALPHWFGSEPIRLGGAAILPQSLVVLFGAGLIVALIFLVIDR TLLGRAILATAANRLAARLVGIPVRKIVALSFALSAAIGAVAGILVTPITLTSYDVGT LLALKGFAAAMLGGMGSAVGAVVGGLLLGLAESFGAGLISSKYKDAVAFLIILGVLFV RPPGLLGKKSAERV" gene 11083..12030 /gene="67-14" CDS 11083..12030 /gene="67-14" /note="'Putative Permease protein LivM'" /codon_start=1 /product="CLONE 67-14" /translation="MARLLSSRWAIVVGLGLVLAILPLLFPPNYYFRVAAIVNIFALA AVGLNLLMGYAGQVSLGHAGFIGLGAYSVAIGATHFGLHGLAAAGIGAVLSAVVAFLV GRPILRLKGHYLAVATLGFGLLIAMALTNEASWTGGPDGMGVPRIEVFGTRLRGPDTW YYISAGVLLLGVILALNLIESPTGRAMRAIHDSEIASRVAGIDVASYKLKIFVLSAVY ASVAGSLLAFLNGHITPDGTAGFLRSVELVTMVVLGGLGSVIGAITGAAVLVILPQAL TIFHDYEHLMLGLMIMVSMILLPRGIIPSLLARMREGKS" gene 12027..12830 /gene="67-15" CDS 12027..12830 /gene="67-15" /note="'Putative ABC transporter related LivG'" /codon_start=1 /product="CLONE 67-15" /translation="MSVPLLETENLSIAFGGVKAVDGVSLKVQPRRIHSIIGPNGAGK TTLFNLISGLYQPSSGRVRLAGEDVTGLPPDALARRGLSRSFQNLQIFFRMSALDNVM VGRHQHEKTSLLSHVFALPSVRRENARSREIAMALLARVGLADVAEKEAGSLSYGALK RLEIARALATEPRILLLDEPAAGCNAIETEEIDRVIQSVAQSGIAVLLVEHDMRLVMR ISDEIHVLERGRTLVAGTPAEVRDDSRVIAAYLGTFGSQEANRAEANRA" gene 12823..13521 /gene="67-16" CDS <12823..13521 /gene="67-16" /note="'Putative ABC transporter related LivF'" /codon_start=1 142 /product="CLONE 67-16" /translation="VLEIRRLRSAYGRIEVLHGLDLDVQRNEIVCLIGGNGAGKTTLL RAISGVQPVTSGSITFEGQDITGLSAEKRVALGIAQVPEGRQIFGGLTVEDNLNLGGW 
LHGGGTARERDDIYALFPILHEKRALQAGGLSGGQQQMLAIGRALMSRPRLLLLDEPS MGLAPVLVEQVFGVIESLQQRGITILLVEQNAAAALAKSDRGFVIETGEITHSGASAD LIGDPKLREAYLGH" gene 13646..15688 /gene="67-17" CDS 13646..15688 /gene="67-17" /note="'Putative Phenylacetic acid degradation protein, PaaN/Z subunit'" /codon_start=1 /product="CLONE 67-17" /translation="MSLIRLQSFAEGHWFSGDGDGSLLVSAIDASPVAEITARGLDFA GMMRFARAKAGPALRRLTFHERASLLKAVAQALTARKEELYALSALTGATRADRWIDI DGGIGTLFVYASKGKRELPNETFLIDGAVEGLSKNGTFLGQHIYVPREGVAVQINAYN FPVWGMLEKLGPSILAGLPVIAKPASATAWLAEKAARIIIESGILPEGAFQFIAGSTG DLFEHLTSQDHVAFTGSLATSTALQAHPVILKNAVHFTAERDSLNCSILGPDAVPGTP EFDLLIKEVAREMSTKAGQKCTAIRRIIVPAAEADAVANALSKRLSSLAIGDPRVETT RIGPLASLAQRADVRAQIARLQEECEILHGAPDKLDLPGLDTEKGAFLSPVLLAARDS RAAERPHDTEAFGPVATLLTYRDLDEAIGLARKGDGSLAGSLVTADAQIAREVVFGTG AFHGRLLLLNAECAAESTGHGSPMPHLVHGGPGRAGGGEELGGMRAVHHQMQRVALQG SPAFLTAITKSYVKGTPEPAAPAHPFRLPFEDLVPGQSYHSPERVISLGDIEHFAHFT GDTFYAHMDEEAVKGHPFFPGRVAHGYLLLSFAAGLFVEPERGPVLANYGLDNLRFLK PVAPGEAIRVRLTVKSKSPRNTDYGEVRWDVEIRTGAGDIAATYELLTMNAYREAS" gene 15685..17811 /gene="67-18" CDS 15685..17811 /gene="67-18" /note="'Putative Enoyl-CoA hydratase/isomerase/3-hydroxyacyl-CoA dehydrogenase, paaC'" /codon_start=1 /product="CLONE 67-18" /translation="MSAEISAAGDADEGETMPDVLRVWRDGEVAVLEIDNPPVNATSQ AVRRALFDAVQRAQSEPETLAILICGAGRTFTAGGDISEFGKPPREPHLPDLINRIEE SAKPIVVAWHGTALGGGCEIGLGAHKRIIAKDGFVGLPEVKLGLLPGAGGTQRLPRLV GPVAALDLIASGRMVGAQEALRLGLVDAIARKDLRAEAIELARGLIGQMQPRLSLRAT PAAEPEAWQAMVAKVNREARGCIAPQRAIELVSQSLTLPFSTGQPNERRTFFELMNSE QSRGLRHLFLAEREAGKRPELASVPPRDLHQIGIIGAGTMGSGIAVAFLDAGYKVMLV EADDTALQAGLERVNGLYLRSIRSGRIDENTRLNRLSNLVPTTEMSRLAACDLVIEAI FEDLGVKLELMKKLATLLPASTLIATNTSYLDLEPMADALPAPERFLGLHFFSPAHVM KLLEIVRARRTGLEALSTALAIGRKLKKIAVISGVCEGFIGNRILAKFRAQCEFMLEE GALPREIDAAMEAFGLAMGPFAVQDLAGLDIAWARRKRLAATRDPTDRDVPLVDRLCE QGRFGQKAGKGWYAYENGRRVPDPAVEAMVRAHAAATGRPQRSFSAQEIQTRVVTAMV NEGARILGEGIAARPADIDLVLVHGYGFPNWRGGPMHHADALGLKQVLAVAQNTEARD GRGFEVAPLLAELVAGQRHFSSLNGG" gene 17859..19061 /gene="67-19" CDS 17859..19061 
/gene="67-19" /note="'Putative Thiolase, PaaE subunit'" /codon_start=1 /product="CLONE 67-19" /translation="MPEAYLVDGTRTPIGRYAGALASLRADDMAAHVIRALMARFPPE MAAAVDEIILGSANQAGEDNRNVARMAGLLAGLPVEVPGTTVNRLCGSGLDAVGIAAR AIKAGEADLIIAGGVESMTRAPLVMGKAQEAFQRSAEVFDTTIGWRFVNPLMKAQYGI DSMPETAENVAEEFKVSREDQDSFALRSQARASRAQKNGRLAREITPVSISQKKGEPI IVDRDEHPRETTLEKLAALGTPFRKGGTVTAGNASGVNDGAAALLIASEAAVKRFGLT PLARVTGLATAGVAPRIMGIGPAPATQKLCARLGIKPSEFDVIELNEAFAAQGLASLR LLGLPDGADHINPNGGAIALGHPLGMSGARLALTAAIEMKDRGGRRACATMCIGVGQG IALALESA" gene complement(19150..20226) 143 /gene="67-20" CDS complement(19150..20226) /gene="67-20" /note="'Putative Phenylacetate-CoA oxygenase/reductase, PaaK subunit'" /codon_start=1 /product="CLONE 67-20" /translation="MSAPRFHTLTIRDIRRETPDAVSIAFEVPSELQQAFAFEQGQYL TLRTQIDGEEIRRSYSICAGEDDGELRVAVKEVAGGAFSTFANHALQPGAALDVMTPM GRFGATTRQAGGGHSVFFACGSGITPILSIIRTRLARDPDARLTLFYGNRNSGSILFR EALEDLKDRHLGRLALHHILSREAQDIDLLNGRMTPEKIALLVRTLGGARAIDDIYLC GPEEMTRAARSVLEEMGAEPSRIHVELFSTGAAPPRSGARTVVPEADNGVSLTVTHDG QSHSLMLHEGETVLEAAERAGLDVPYSCRGGMCCTCRAKVTEGSASMDLNFSLEPWEV EAGYVLTCQCRPTGAALAVDYDQV" gene complement(20236..20775) /gene="67-21" CDS complement(20236..20775) /gene="67-21" /note="'Putative Phenylacetate-CoA oxygenase, PaaJ subunit'" /codon_start=1 /product="CLONE 67-21" /translation="MAEIIHLREPVDERLHERLHERLMRARQAAGNVVDPEVPVLTIE DLGVLREVRFAGEAVEVVITPTYSGCPAMDLISLQVDLALEKAGFASRKVTLSLSPAW TTDWMSEAGKEKLRAYGIAPPVGRAKGRGALFGVDEIPCPHCGSAETERVSEFGSTAC KGLWRCRACREPFDYFKCI" gene complement(20779..21531) /gene="67-22" CDS complement(20779..21531) /gene="67-22" /note="'Putative Phenylacetate-CoA oxygenase, PaaI/C subunit'" /codon_start=1 /product="CLONE 67-22" /translation="METPLFAYTLRLADNALVLGHRLGEWCGHGPMLEEDLALANMAL DCIGQARNFYTYAGEIEGKGRSEDDLAYLRDCDEFRNILLVERPRGDFAFTILRQFLY AAFMHPCFERLATSKDERLAAIAAKAVKEMAYHLRHSAEWVIRLGDGTEESAARLQEA LADLWPFTGEMFEVDGTERALIEAGVAVDPAGIRPLWLDTVDRVFADALLNRPKDGWM QTGGRRGEHSEHLGHLLADLQFMQRAYPGSTW" gene complement(21531..21818) /gene="67-23" CDS 
complement(21531..21818) /gene="67-23" /note="'Putative Phenylacetic acid degradation, paaH'" /codon_start=1 /product="CLONE 67-23" /translation="MTEKNMPLWEVFIRSRTGLSHRHAGSLHAPDAEMALQNARDVYT RRGEGLSIWVVPSGAITASDPSDKDALFEPTATKIYRHPTFYEVPDEVGHM" gene complement(21824..22828) /gene="67-24" CDS complement(21824..22828) /gene="67-24" /note="'Putative Phenylacetate-CoA oxygenase, PaaG subunit'" /codon_start=1 /product="CLONE 67-24" /translation="MAAMYAQALNTVTEVSKDDPAKLAAFQARIDAEEKIEPNDWMPE GYRKTLQRQISQHAHSEIVGMLPEGNWITRAPSLKRKAALLAKVQDEGGHGLYLYSAA ETLGVSREELVDALHAGRAKYSSIFNYPTPSWADMGMIGWLVDGAAIMNQIPLCRTSF GPYARAMIRICKEESFHQRQGYEIVMTLARGTPAQKKMAQDSLNRWWWPAIMMFGPSD SASQHSDQNMKWKIKRFSNDALRQKFIDATVPQGHFLGLTFPDPDLRFNEATGHWEHG EIDWSEFNRVVKGEGPCNRERIKARIKAHEEGAWVREAALRHAEKRAARKAAARIAAE " gene complement(22936..23706) 144 /gene="67-25" CDS complement(22936..23706) /gene="67-25" /note="'Putative Phenylacetic acid degradation operon negative regulatory protein, PaaX'" /codon_start=1 /product="CLONE 67-25" /translation="MARPAALSALIAAHHGRVPPRTGSLIVSVFGALVLPAGEALRLS DLQEWLAALEIEPGLVRTALSRLVNDGTLLRERDGKAALYRLSARALKDFEAAGDLIF GRLLPRPTGDLDLLVIEDTGRRATLRAELVAMGFVPLAANLLIRSAWAGRAAPEVRGC LALSLHANADLALRARELWPLEQLAGGYRAVIAHAEAVQGGAFTPSEARLARLMLVHE FRRIVLRDPFLPEAVLPRDWPGSPARLAFDAALRHVQG" gene complement(23710..24366) /gene="67-26" CDS complement(23710..24366) /gene="67-26" /note="'Putative Glutathione S-transferase'" /codon_start=1 /product="CLONE 67-26" /translation="MVTLYGVTRSRASRNIWLMNELGQPYEQVPVIQAYRLANPEAPD APVNTRSPSFLAVNPNGHIPSMKDGDLVLHESLAINLYLAKKFGGPLAPANLAEEGEV AMWTLWAATEAEPHAIQILYHRLMKQGEERKPELADAAVVALRQPFQVLDGALAGNGC LVGGRFTVADINVAEVIRYASPAPELFEAAPHVRAWLVACQARPAFRAMMDKRNAEPA " gene complement(24454..24672) /gene="67-27" CDS complement(24454..24672) /gene="67-27" /note="'Hypothetical Protein'" /codon_start=1 /product="CLONE 67-27" /translation="MNRRDILRYIADLFPSSTKLSWGAAGRAALRTRDARIGEAAGDT TSGQMRAVADLFPRRLGMGQGTGTCKGE" gene 24682..25476 /gene="67-28" CDS 24682..25476 /gene="67-28" 
/note="'Putative ABC transporter, substrate binding protein, ModA subunit'" /codon_start=1 /product="CLONE 67-28" /translation="MPARRTVLAWSFAALAGLACPARAQSSGPVVFAAASLKTALDEI ATLWMRETGLPAPRLAYAGSNALARQIEQGAPADVFLSADLDWMDALAAKNLLRPGTR SNLLANRLALVAPVESKATIALQPGADLAAPLADGRLATANVDSVPAGKYAKAAFEKL GLWVSVKDRLAQAENVRAALLLVARGEAPLGVTYATDAAAEPKVRIITLFPEGSHPPI VYPIAMLRDSAHPQALRLLEFLKGGSARAIFERHGFTLPAPPRQGS" gene 25483..26175 /gene="67-29" CDS 25483..26175 /gene="67-29" /note="'Putative Molybdenum transport protein, ModB subunit'" /codon_start=1 /product="CLONE 67-29" /translation="MFAALTPEEWAALELSLKVATVATFASLPAGVAVACLLARGRFP GRSVLDALVHLPLILPPVVTGYLLLLAFGRKGPVGAWLEAHFGLVFSFRWTGAALAAA VMGFPLMVRAIRLSVEAVDRRLEQAAGTLGAGRMLTFLLVTLPLALPGILAGAVLAFA KAMGEFGATITFVANIPGETQTLPTAIYTFTQVPGGDAAAMRLVLISVAVSVLALVLS EWLARGLSRRMD" gene 26179..26865 /gene="67-30" CDS 26179..26865 /gene="67-30" /note="'Putative Molybdenum import ATP-binding protein ModC'" /codon_start=1 /product="CLONE 67-30" /translation="MLEIDIRHRVGALDLAVSLRAGGPVSALFGRSGAGKTTLLNLIA GLAAPDSGRIALDATVLFDRQKGIDVPSRRRRIGYVFQDARLFPHLTVKQNLLFGRWL ARQPLSDRTVEQVLALLDLAPLLARRPAHPSGGEKQRVALGRALLASPRLLLMDEPLA SLDAERKAEILAHLEAVRDEIGIPILYVSHAREEVRRLAHEVALIEHGQLAAFGPAAT LLPRLPEGEA" gene 26862..27272 /gene="67-31" CDS 26862..27272 /gene="67-31" /note="'Putative Molybdenum-binding transcriptional regulator, ModE family protein'" /codon_start=1 /product="CLONE 67-31" /translation="MTAPSAPAGRLSLRIDFPNGERLGPGKVRLLEEIARLGSISAAG RSMGMSYRRAWLLVDALNGMFDAPLVGSHQGGSGGGGAALTPRGTEVVRLYRSIEANA QTGTREALSCLAGYATDGGQNPGRSGSFPTEIGD" gene 27547..28506 /gene="67-32" CDS 27547..28506 /gene="67-32" /note="'Putative RNA methyltransferase TrmH'" /codon_start=1 /product="CLONE 67-32" /translation="MAHDSKRPYHPLDRQPKPAREGSRMGEKPAFKDKSRFAPKPKAP KPRLPDDTDVLYGMHSVKEAFANPRRRFRRIVATENAVQRLTEDGVNLPLAPELVKPE AIGRLLTPDAVHQGLYVEAERLPLLPLGKLPRDRIILALDQVTDPHNVGAILRSAAAF DVGGVIVTHRHSPEITGVLAKAASGALEHVPMAGVQNLARALTTLKDEGFAVIGLDSE
APEKMGDLPLKLPIVLVLGAEGRGLRPSTRELCSHLARLELPGAIKSLNVSNAAAISL YAVQQAITRMLKQEGSAPASRSGALGEAMRMGIRIAGSPLYLP" BASE COUNT 4951 a 9869 c 9653 g 5067 t ORIGIN 1 gcgtgaactg cgtcgccccg gtgatcggcg cgacgggcct gctcgaacaa ttcatgggca 61 tgccggatac gccggagaac cgggcgaaat tcctcggaac catcccgctc gggcgcttct 121 ccaccccggc cgacatcgcc aatgcctgcc tctatctcgc ctcggatgaa gccgaattcg 181 tcacgggcgt catcctcgag gtcgatggcg gccggacgat ctgacccccc ccgccgaacc 241 actgatcgag gccggcatgg ccgacccgag gacgacatcc cgcagatgcc atcaagggat 301 tggcgacagc ctcggcacgg gagaaaggcc ccgccgcccg catcgaacgt gaccgcccga 361 tgcgggagac cttgaacggg accttgcggc gggccgcgcg cgaggacctt ctcgccgcat 421 gatccgtgct tcacttcttc ggcgggaacg ccgtgcgccg aaagagcggt tgatcgttca 481 tgaagccgac ctcctcggcg acatggcagg ccatgcaggc attgcgaccg agatcaaagc 541 cctcccaggc cttgccggag tccttcgtct tgaaagaggc gaggatggga tcgtaaacgt 601 tcttgatgaa catcgcatcg gcattgggct ggcgtttggg ccgcttcatg tagccggtcg 661 tgatcgagga cttgatcttt tcccagtggt acatggcgag ttcgtaattg ccgtccgcca 721 acgcatcata gacccgctgg taacgttcgc cgacctccca catgggctga tcaaagccac 781 gaaggaaggt ttgcagcttc gtgaagcgct ccgtgtcatt gtcggcgttc agaagccagt 841 cattcacatt cttctggggc gcggcctgtt gcgccaggcc agccgtcatg aagggtgctg 901 ccatcagccc ggcaccaacc agcaccaaca ttgcgagcat tgtccatttc ttcattgttt 961 cccccatctt gccatcaaca cgtgacgcct cttgcacgag gcgtgctacc gtgatgcgct 1021 gaggttcagc gacgaagttt aaggcagttt tcaaagggga gtggcatcgc ttgcggctgt 1081 tgctggtgga ggatgagcgc cgattgggca atctcttgca ggcctcgctg catgaggcgg 1141 gttttgtgac tgatcgcgcc ctggatctcg ctgacgctcg cgatctcgcc cgccgggagc 1201 gtttcgacgc ccttctcgtc gatcgaggat tgccggatgg cgatgggctg gaacttgtcc 1261 gcgagttgcg cgctgccggc aacgcggtgc cggtgctcgt gctcacagcg cgcgatggcg 1321 taccggatcg ggtggccggg ctcgaaaacg gtgccgacga ttacctggtc aaaccttttg 1381 ccattgagga attgatcgcg cgggttcgcg ccgttctgcg tcgacccggc ggagcgctcg 1441 gccaggaact gagagcggga gacataattt atgatgcaat cgaacgcgtt gtcatcatcg 1501 gtgcccgaaa cctcgttttg ccccggcacg aactcgcaac gctcgaatgc ctgatgcggc 1561 gattgggcag 
ggtcgtcacg cgtgaaatcc tgatcgatgc cctgtatgga cgcgacgggg 1621 agcctgcctc caacgcagtg ccggtccata ttcacaatct gcgcaggaaa cttcaggaag 1681 caggagcaca agccagaatc atcaccttcc gggggctcgg ctacatgctc cgggccgagc 1741 ctccatgatc gaccgccagt cactggcctg gcaagtcggc attcgactgg ccatcatcat 146 1801 gctggcgctc actgccatcg ggctcggctg gctggtttgg cacaccagcg tgatcgaggc 1861 cgctcacggc ccttccaacc ttgtccatgc cgtgatccag gagtttttcc ttgatctcgc 1921 ctggggtgtc cccgtgttca tcgctctgat cattctgatc ggtgcctggg cgttgaggcg 1981 cgccctcctg cctctgaggc aggcttcggc ggcagcgctc gcgattggtc cggacgcgct 2041 gcatcagcgg attccgacag cccggctacc cagtgagatt cgaccgctgg cggaagccgc 2101 gaatgatgca ttcgaccgcg tggagaaggc ctatcgcgcg cagcagcgct ttgtcgccaa 2161 cgcggcgcat gaattgcgca ctccgattgc agtcgcgcgc gcagcggtgg agcgtctgcc 2221 ggaaagcgcc gaacagcgcg cagtgctggc ggacatggtc cggctctcgc acctcgcctc 2281 tcaattgctg gaactggccc gcctcgaacg tccgctgccc ccctctgcgc ggctcgacgc 2341 ggtcccccat ctgcaggaga tcgcgtgcga actggcggcg agttcgctca acgcccgcga 2401 tgtccgcgtt gcactcgagg ctccgaacag cgtgtttgtc cgcgcggatg ccgagcgtct 2461 gattgccatc atccgcaacc tagtcgagaa tgccgtacat catgaaccgc gcggcggcga 2521 aatcctgatc cgcctgacat ccgatgggct gatgacggtc gatgatcgcg gtccgggcat 2581 tccggccggc gacgaaaaac gcatcttcga gcgttttgag cgtggttcct ggaccaaaac 2641 aagcggttcc ggcctcggcc tggcgatcgt gaaagaagcc tgtgccacga tcggcgcgac 2701 gatccgcgtc gagaaccgcg cttccggcgg agcacggttc agcgttgaat tcccgctcgc 2761 ctgatccgct tttcccgatg gtttgccgca gggccgggcg aaccggtttc ctcgccatct 2821 cgatgatcca tttcacatgg aacccgaact tgccatgcgc cgtgtccgcg aggtggtgcg 2881 atgctcagac atggttgcgg cccgggtgca atcaggccgc gaggatttct ggagcgggcg 2941 acgggaatcg aacccgtgtc tctagcttgg aaggctaggg tcttaccatt acacaacgcc 3001 cgcgacgcga ccagagaggt aatcgcggcg acgcgccaag gcaaggggaa aggctgggct 3061 ttctcacccc ttcggttcca gcctcggcgc cttcctggcg aagaatttct cgaggctggg 3121 gcgcagatcc gggttggtca tcaccatgcc ggcgagcagg ctttcggcga agaggccatc 3181 ggtcgtcgac atgtcgttga tgcgtgaaat gccggtgaga atcgcgaaat tcgtcacgtc 3241 atagtgctcg 
gcagcagctt ccgcgatggc aatcgcgcgc ttctcgccgt ccccgttctc 3301 gacgatctcg tgggcgagac caacgtcata gccgcgctgc acgtcatagg tccgcgccgc 3361 gatcatcatg tcgatcatgc gcgcgggcgt catgatccgc gcgacgcgca ccgtggcccc 3421 accccccgtg aagaccccgc gctggccctc cggcaaggcg aagaaggtgg atttctcggc 3481 cacccgcaca tgcgtcgcgg cggcgatctc gaggccgccg cccaccaccg cgcccttcag 3541 cgccgcgatg atgggaatcg agccgaactg catcgcatcg aaggcgcgat gccagttctg 3601 gcacatgcgc atgaattcca ccgggccgcg gccaagttcg aaatgctcct gcaaatcgag 3661 gcccgcgcag aaatgttcgc ccgccgcgcg caacacgacc gcgcgcacat tcttcggcgg 3721 ccgggtgaag aagcgctcga tgtccgcgat tgtcgcgtca tcgatcgcgt tgcgccgctc 3781 cgggcggttc agcgtgagaa gcgcgatctt gccgcgggtt tcgatgtcga tgttgggaat 3841 ggcagaggca ccggctacga ggacattggc tacaattcga aacgaaagac ccggagacgc 3901 aaggctcatc gcagcctaag gttcaaaagg ccatcgatca agacggcagc gaaggcgcga 3961 aatcaaccgc atcgacaagg gagagaggaa tgggtttcca ggcattgctc gtggacaagc 4021 tggccgacgg gcgggtgaac gcgatggtca cgacgctgga ggacagcgcg ctgccgcccg 4081 agggcaatgt caccgttcgg gtcgagtatt ccacggtcaa ttacaaggat ggcctctgca 4141 tcaccggcgg cggcgggctc gtgcgcagct acccgcatgt gccgggcatt gattttgccg 4201 ggacagtcga ggcctccgat gatgcccgct acaagcccgg cgacaaggtc gtgctcaccg 4261 gctggcgcgt cggcgagacc tggtggggtg gctacgccga aaaggcgcgt gtgaaggcgg 4321 attggctcgt gccgcttccc gccggcctca ccacgcggca ggccatggcg gtcggcacgg 4381 ccggcttcac cgccatgctc gcggtgatgg cgctcgaagc gcatggcctc gcgcccgcga 4441 agggggaagt gctggtgacg ggtgccgcgg gcggcgtcgg ctcggtcgcg accgcgattc 4501 tcgcgaagcg cggctatgcg gtcgcggcgg tgacgggccg tccggagcag gaggattacc 4561 tcaagggcct cggcgcgagc cgcatcgtgc cgcgcgccga actcgcggat gcccccgcgc 4621 gcccgctcga cagggagacc tgggccggct gcatcgacaa tgtcggcggc acgatgctgg 4681 cgcgcgttct gagccagatg aagtataatg gctcggttgc ggccgtgggc cttgccggcg 4741 gcgcgaacct gccgacgacg gtcatcccct tcctgctgcg cggggtgaac ctgctgggca 4801 tcgacagcgt gatgaaaccc tatgccgacc gcatcgtcgc ctggcggcgc atcgcggagg 4861 acctggcatt cgacaagctg gaggccatga tcgtgccggc gaagctccat gacctgcccg 4921 aactcggcaa ggcgattctc 
gccgggcaga tccggggacg cgtggtggtg gagatctgac 4981 ctcaggccag ccggccgacg gtcagcgcga gatcgagccc cagttcgacg ctggccacga 5041 tggccgcgac ataggcgaag cgcaggagat aatccttccg tgcccgcgcg atcagtcgcg 5101 cggcaacctc gaccggcggc cggctctctt tcgggagcat gatccagacc tcggtctttt 5161 tcaccggcac atggtcggcg cgcgccgcca tgatcagcag gatgaagctc atcagcaacg 5221 tggaaaagcc gccggaccgg aagaagagcg agagatcgct ggcggaaccg accatcaaag 5281 tggcgatcgc cagcgccccg aaaccgcagc cccgcgcgat ggaaagctgc gcataatgcc 5341 ggatcgtcgt ctccatggcg aacccctttg ccggaagcac ccgtcacaat gccggaggct 5401 ccggtttctg gcaacgcgac cggagatcga gaggaatcac cttttcgcga gggatgcgcg 5461 cggcaccaga tccgccacca atgtcgaagc agccgcctcg atgctgcgca tgaccatctc 5521 ggcaaaagcg gcatggctca ccggtccgtt ctcgcggaac cacatgaatt tccagttcat 5581 catgccgaag aggttcatgg tcagcggggt gagaagctcg cgccgcgccg cgagaatcgg 147 5641 ctccagcccg gccagcgcgc cggagaactg ccgcaccagc acgcgctcgc gctccaccag 5701 ctcggccttc atcttgggcg gaaggcgctt caggtccgag atctggatct ggtgctcctc 5761 atccgcatgg cgatattcct cgagcagcgc gttgatcatc gccgccagcc gctcgcgcgg 5821 ggcaagcgtc ttcggcaccg cttcgatggc cgccacgaga tcatcgagat gggcgcggat 5881 gatgtcgaac aggagctgat ccttgctcgc gtagtaatga tagagcagcg ccttcgagac 5941 cccgcaggcc agcgcgatct cggccatgga ggcccgatcg aaaccgtggc tcgcaaagag 6001 cttcgcggca gccttgagga tggccctgcg cttgtcgtcg tgatcggcgg cgatggggcg 6061 tgccatgggt tcggttctgt ccttttgacg cgaattcctg atgcgggacg gctcccgctt 6121 cgcgtgaatt cgctacgccc tgggccgctt gtcgatgatg cgcttcgcct tgccctgcga 6181 acgttcgatc ccgtcgggcg gcagcacatt cacccaggcg gtgatgccga tcctgtcctt 6241 gatccgcttg gccaggagct ttgctgcgga attggcttcg tcggaatggt cgatgcgggc 6301 ctccaccttc acctccatct cgtccatgcg gccttcgcgc gtcagcacga tctggtaatg 6361 ggcggaaagc gcggggagtg ccagaagcat ctcctcgatc tgcgtgggga acacattcac 6421 cccgcgcacg atcatcatgt cgtcagagcg gcccgtgatc ttctccatcc gccgcatgga 6481 gcgcgcggtg cctggcagca gccgcgtgag atcgcgcgtg cggtagcgga tgatgggcat 6541 cgcttccttg gtgagcgcgg tgaagaccag ctccccctgc tcgccatcgg gcagaaccgc 6601 gcctgtgtcg gggtcgatga 
tctcggggta gaaatgatcc tcccagatgt ggaggccgtc 6661 cttcgtttcc acgcattcat tggcaacccc cggcccgatc acctccgaaa ggccgtagat 6721 atccacggca tgcatgtcga aggccgcctc gatttcctgc cgcatcgcat tcgtccaggg 6781 ctccgcgccg aagacgccga cctcaaggga tgaggcgcgc ggatcgagac cgaccctccg 6841 gaattcatcg agaatggcca gcatgtagct cggcgtgacc atgatcacgt ccggccggaa 6901 atcggtgatg agctgcacct gccgctcggt catgccgccg gaaatgggga tcaccgtgca 6961 gccgagcttc tccgcgccgt aatgcgcgcc gaggccgccg gtgaaaagcc cgtagccata 7021 ggcgacatgc actgtcattc cgggcctgac acccgaggcg cgcagggagc gcgccacaac 7081 atcggcccag gtctggatat cgttggcggt atagcccacc acggtcggcc tgccggtggt 7141 gccggaggag gcatggatgc ggctcacctt ctcgcgcggc acggcgaaca tgccgaaggg 7201 gtagttatcg cgcaggtccg ccttggtcgt gaaagggaat ttcgcgaggt ccgccagagt 7261 tttgcaatca tcggggtgaa cgcccgccgc atcaaaactg gcccgatagg ccggcacgtt 7321 ctcataagca tggcgcagcg accattgcag gcgcttcagt tgcagcgccg agagttgatc 7381 ccgcgaggca tgctcgatcg gctcggtcag cggcttgtag gtggaaatcg aaaacatggc 7441 tcagccctcc accaccgcgc cggggaagaa ttcgcccttc accatgcgcg aatgcccccg 7501 gaagatcgcg atcttcacat cgtcctggtt gaaaatgtcg atgtcgtaga tgcccgagcg 7561 gcccgcgcgg ctcgtctcgc gggccacggc gcgcagcttg tcacccagcc gccccggcgc 7621 gaggaaggtg atggagcaat gctgcgccac cacgcgatgg ttgcggctgt tgcaggcgaa 7681 ggcgaaggcc gaatccgcga gcgggaagat gtagccgcca tggcaggtgc cgtggccatt 7741 caccatcgcc tcgctgatcg acatggtgat ctccgcctgg ccgggcgcca cggcggtgat 7801 cgccatgccg aggccggcgg aagccttgtc ctcgtcccac atcgccttcg cgcaggcccg 7861 ggccagttcg tcggctgtca ttttcgtcat gtcattgccc ccggaaggtc gcttcgcgct 7921 tttccatgaa ggcgcgcacg ccttccgcat aatccgccga gcggccaagc cggcgctggg 7981 cctcccgctc gtcatcgagt tgctggtcga aggaaagcgt gaaggcccgc tgcagcagca 8041 gcttcgtttc ggccagcccg cgcgtcggtg ccttggcgag cgcctcgccc agcgcgatgg 8101 cctcgctcat cagcgcggaa tcgtccaccg cgcgatagat catgccgatg cgctcggcct 8161 cctcggccgg aacgggtatc gcgagcatcg tcagcgcctt ggcgcgcgct tcccccacga 8221 ggcgcgggag catccaggtg ccgcccgaat cggggatgag cccgatcttc gagaaggcct 8281 gcacgaactt ggcgcttctc gccgcgagca 
cgagatcgca ggcgagcgcc agattggcgc 8341 ccgcgcccgc cgccacgcca ttcaccgccg cgatcaccgg cttcggcatc gcgcggatcg 8401 cgcggatcag cgggttgtag tattcatcga gcgtctcgcc gagatcgggc gccctgccgc 8461 cctcggggcg cacgcgatcc gacagatcct gccccgcgca gaagccacgc cctgaaccgg 8521 tgatgatcac cgcccgacaa gtctcgtccg ccgcagccgc atccaccgcc gccttcagcg 8581 cgcggtgctg cgcctcgttg aaggaattca gcttgtccgg tcggttgagc gtgaggagcg 8641 tgagcgagcc ccgatgctcc accagaaccg gtgacggctc gcctgcgctg ccttcctgcg 8701 ccatgatccc ctcccctcga acagcgccgt cgcgttcgtg cttcgcgttt cccgccggcc 8761 gcttcctgaa ggatttcgtc cgcaacgatt ccgcggcgat ccctcttgac taactgaccg 8821 gtcggataat taaatgaaga agacagcgac aatgtcaacg ggctcctgaa acggaacccg 8881 catgaaacgg aaaacggcgc cggccagcac cggcgcctcc acggaaacgc cggatcgacc 8941 gccagagcgc ggcgcatggg agggaggaag catgagcagt tccatcgggt tttcacggcg 9001 cgcagcactt ctggcgggcg ccttcaccct gggcgcgggg ctgagcccgg ctttcgcgca 9061 gggcgagatc aagctcggcg cggttctctc ggtgacgggt cccgcctcgt tcctgggcga 9121 cccggaaaag cgcacgctcg aaatggtcgt cgaggatgtg aacgcgaagg gcggcattct 9181 cggccgcaag gttcgcctca tcgtctatga tgacgcgggc gacgccaatg ccgcgcgcac 9241 cttcgcgacg cgccttctcg aggaagacaa ggtgaccgcg atgatcggcg gctccaccac 9301 cggcaccacc atggcgatga tccccgcctt cgaggatgcg caggtgccct tcatcagcct 9361 cgccggcgcg gtgcagatca tccagccggt gaagaaatgg gtcttcaaaa ccccccacac 9421 cgacaccatg gcctgcgaga agattttcga ggacatgaag aagcgcggca ccacgaagat 148 9481 cggcatgatc tcgggcaccg acggcttcgg caaatccatg cgcgatgaat gcgtgaaggt 9541 cgcgggcaaa tacggcatcg agatcctcca cgaggagaat tacggcccgc gcgacaccga 9601 catgaccccg cagctcacca acatcaagaa caaggcgggc attcaggcgg tcatcaatcc 9661 gggcttcggc cagggcccgg cgatcgtcac gcgcaattac aggcagctcg gcatcagcgt 9721 tccgctctac cagagccatg gcgtggcctc gaagcagttc atcgagctcg ccggcccggc 9781 ggccgagggc gtgcgcctgc cggcggcggc gcttctggtg gccgacaagc tgcccgacaa 9841 cgacccgcag aagaaggtcg tgacggaata ttccaccaat tacgccaagc gcacgggcca 9901 gccggtctcg accttcggcg gccatgccta tgacggtttc atgatctgga aggaggccgt 9961 ggagcgcgcc aagaccgaca acccggccaa 
ggtgcgcgac gagatcgaga agaccaaggg 10021 ctatgtcggc accggcggcg tggtgaacat gtctcccgcc gaccatatgg gcctcgatct 10081 ctcggccttc cgcatgctgg aaatcaggaa cggcgactgg accctcgtcc cctgagcccg 10141 aaccggcgag gcggcccgcg ccgcctcgct tcccctcccc gctgacgccc gagcacggag 10201 catcccgcga atgcccgaac tcctgcaatt tgccttttcg ggcctgacgg tcggcgctgt 10261 ctatgcgctg gtcgcgctcg gcttcacgct gatctacaac gcttcggacg tcgtgaactt 10321 cgcgcaaggc gagttcgtga tgctcggcgg catggcgacg gtgttcctgc atctcgcggg 10381 cgtgccgctg ccgctcgccg ccgcgctggc catcctgatc acggtcggcg tgggccttgc 10441 gctctatgcc ttcgcgatcg acccggcacg gggttcgagc gcggtgacga tcatcatcat 10501 caccatcggg gcctcgatct tcctgcgcgg tgtggcgcag gtgatcttcg acaagcgctt 10561 ccatgccctg ccccattggt tcggctccga accgatccgt ctcggcggcg cggccatcct 10621 gccgcagagc ctcgtggtgc tcttcggcgc ggggctcatc gtggcgctga tcttcctcgt 10681 catcgaccgc acgcttctcg gccgcgcgat cctggcgacc gccgcgaacc gcctcgccgc 10741 gcgcctcgtg gggattcccg tgcgcaagat cgtggcgctc tccttcgcgc tgtcggccgc 10801 gattggcgcg gtcgcgggga ttctcgtcac gccgatcacg ctgacgagct atgatgtcgg 10861 cactcttctg gcgctcaagg gcttcgcggc tgcgatgctc ggcggcatgg gcagcgccgt 10921 gggcgcggtg gtgggcggcc tgctgctcgg cctcgccgag agcttcggcg cagggctgat 10981 ctcctcgaaa tacaaggatg ccgtcgcctt cctcatcatc ctcggggtgc ttttcgtgcg 11041 tccgcagggc ctgctgggca agaaaagcgt ggagcgcgtg tgatggcgcg gcttctttct 11101 tcccgctggg cgatcgtggt cggcctcggc ctcgtcctgg cgatcctgcc cctcctgttc 11161 ccgtcgaact actatttccg cgtcgcggcc atcgtgaaca tcttcgcgct ggcggccgtg 11221 ggtctgaacc tgctgatggg ctatgccggg caggtgagcc tgggccatgc cggcttcata 11281 gggctcggag cctattccgt ggcgatcggc gcgacacatt tcggcctcca cggtctcgcg 11341 gcggcgggaa tcggtgcggt gctgtcggcc gtggtggcct ttctcgtggg tcggccgatc 11401 ctgcgcctca agggccatta tctcgcagtc gcgaccctcg gcttcggcct gctcatcgcg 11461 atggcgctca ccaacgaggc gagctggacc ggcgggccgg atggcatggg cgtgccgcgc 11521 atcgaggtgt tcggcacgcg cctgcgcggg cccgatacct ggtattacat ctccgccggc 11581 gtgctgctcc tcggcgtcat cctcgcgctg aacctcatcg aaagccccac cggccgcgcc 11641 atgcgcgcga 
tccatgacag cgagatcgcg agccgtgtcg ccggcatcga tgtcgcgagc 11701 tacaaactga aaatcttcgt gctctcggcg gtctatgcct cggtggcggg ctccctgctg 11761 gcctttctga acgggcacat cacccccgat ggcacggccg gcttcctgcg ctccgtcgaa 11821 ctcgtgacca tggtggtgct gggggggttg ggctcggtca tcggcgcgat caccggcgcg 11881 gcggtgctgg tgatcctgcc gcaggcgctg acgatcttcc acgattacga gcacctcatg 11941 ctcggcctga tgatcatggt gagcatgatc ctgctgccgc gcggcatcat ccccagcctg 12001 ctggcccgca tgagggaagg gaaatcatga gcgtgcccct gcttgaaacc gaaaatctct 12061 cgatcgcctt cggcggcgtg aaggcggtcg atggcgtgtc gctgaaggtg cagccgcgcc 12121 gcatccattc gatcatcggg ccgaacggcg cgggcaagac cacgctcttc aacctgattt 12181 ccggcctcta ccagccttcc tcgggccgtg tgcgccttgc gggcgaggat gtgaccggcc 12241 tgccgcccga tgcgctggcg cggcgcggcc tctcgcgctc cttccagaac ctgcagatct 12301 tcttccgcat gagcgcgctc gacaatgtga tggtcggacg tcaccagcac gagaagacga 12361 gccttctgag ccatgtcttc gccttgccct cggtgcgccg cgagaatgcg cgctcccgcg 12421 agatcgccat ggcgctgctc gcgcgcgtgg gccttgcgga tgtcgcggaa aaggaagccg 12481 gctccctctc ctatggcgcg ctgaagcggc tggaaatcgc ccgcgcgctc gccactgaac 12541 cgcgcatcct gctgctggat gagcccgccg ccggctgcaa cgccatcgag accgaggaga 12601 tcgaccgcgt catccagtcg gtcgcgcaat ccggcatcgc ggtgctgctg gtcgaacacg 12661 acatgcgcct ggtgatgcgc atctcggacg agatccatgt gctcgaacgc ggccgcacgc 12721 tggttgccgg cacgccggcg gaggtgcgcg acgacagccg cgtcatcgcg gcctatctcg 12781 gaaccttcgg cagtcaggaa gcaaatcgcg cggaggcgaa ccgtgcttga aatccgccgg 12841 ctccgctcgg cctatggccg gatcgaggtg ctgcacggcc tcgatctcga cgtgcaacgg 12901 aacgaaatcg tctgcctcat cggcggcaat ggcgcgggga agacgacgct cctgcgtgcg 12961 atttccggcg tgcagccggt cacctccggc agcatcacct tcgaggggca ggatatcacg 13021 gggctttcgg ccgaaaagcg cgttgcgctc ggcatcgcgc aggtgccgga ggggcggcag 13081 attttcggcg gcctcacggt ggaggataac ctcaatctcg gcggctggct gcatggcggc 13141 ggaaccgccc gtgagcgcga cgacatctac gcgctcttcc ccatcctgca cgagaagcgc 13201 gccttgcagg ccggcggcct ttccggcggc cagcagcaga tgctcgccat cggccgcgcg 13261 ctgatgagcc gcccgcgcct gctgctgctc gacgagccga gcatgggcct cgcacccgtg 
13321 ctggtggagc aggtcttcgg cgtgatcgag agcctgcagc agcgcggcat caccatcctg 13381 ctcgtggagc agaacgcggc ggcggcactg gccaaatcgg atcgcggttt cgtgatcgag 13441 accggcgaaa tcacccattc cggcgcctcg gccgatctca tcggcgatcc gaagctgcgg 13501 gaagcctatc tgggccacta gagcggaatc cgatctgatt gaatcagacc gtccgttctc 13561 atttcttgtt atttcacgca ttttcttcgc ccaaccggta tccgatggat cggaaaatgc 13621 tccagaaacg cctgggagtt gaaacatgag cctcatccgc ctccaatcct tcgccgaggg 13681 ccactggttt tcgggcgacg gcgacggcag cttgctcgtg agcgcgattg atgccagccc 13741 cgtcgccgag atcaccgcca ggggcctcga tttcgcgggg atgatgcgtt tcgcgcgagc 13801 aaaggccggc cccgcgcttc gccgcctcac cttccatgag cgcgcctcgc tgctcaaggc 13861 ggtcgcgcag gcgctgaccg cgcggaagga ggaactctac gcgctttcag cgctcacggg 13921 tgccacacgc gccgatagct ggatcgacat cgatggcggc atcggcacgc tcttcgtcta 13981 tgcctcgaag ggcaaacgcg aattgccgaa cgagaccttc ctgatcgatg gcgcggtgga 14041 aggcctgtcg aagaacggaa ccttcctcgg ccagcacatc tacgtgccgc gcgagggcgt 14101 ggcggtgcag atcaacgcct ataacttccc ggtctggggc atgctggaaa agctcggccc 14161 ctcgattctc gcgggcctgc ctgtcatcgc gaaacccgcg agcgccacgg cctggctcgc 14221 ggaaaaggcc gcgcgcatca tcatcgaaag cggtattctg ccggaaggcg cgttccagtt 14281 catcgcgggc tcgaccggcg acctcttcga gcatctgacg agccaggacc acgtggcctt 14341 caccggctcg ctcgcgacct cgaccgcgct gcaggcgcat cccgtcatcc tgaagaacgc 14401 cgtgcatttc accgccgaac gcgacagcct gaattgctcc atcctcgggc cggatgccgt 14461 gccggggacg ccggaattcg atctcctcat caaggaggtc gccagagaga tgagcacgaa 14521 ggccgggcag aaatgcacgg ccatccgccg catcatcgtt cccgccgccg aggctgatgc 14581 ggtggctaac gccctctcga agcggctctc gagcctcgcg atcggcgatc ctcgcgtcga 14641 gacgacccgc atcggccccc tcgcctcgct ggcgcagcgc gcggatgtgc gcgctcagat 14701 cgcccgcctg caagaggaat gcgagatcct ccacggcgcc cccgacaagc tcgacctgcc 14761 cggcctcgat accgagaagg gcgcgttcct ttcgcccgtt ctgctcgcgg cacgcgattc 14821 ccgcgccgcc gaacgcccgc atgacaccga agccttcggc cccgtggcaa cgctcctgac 14881 ctatcgcgat cttgacgagg ccatcggcct cgcgcggaaa ggggatggct cgctcgcggg 14941 ctcgctggtg acggcggatg cgcagatcgc ccgtgaggtg
gtgttcggca cgggcgcctt 15001 ccacggccgg ctgctgctgc tgaacgccga atgcgcggcg gaatcgaccg gccacggctc 15061 gcccatgccg catctcgtgc atggcggccc cggccgcgcc ggaggcggcg aggaactcgg 15121 cggcatgcgt gccgtgcatc accagatgca gcgggtggcc ttgcagggat cgcccgcctt 15181 cctcacggcg atcacgaaaa gttacgtgaa gggcacgccc gagccggcgg cacccgcgca 15241 ccccttccgc ctccccttcg aggatctcgt gcccggccag agctaccact cgccggaacg 15301 cgtgatcagc cttggcgaca ttgaacattt cgcgcatttc accggagaca cgttctatgc 15361 ccacatggac gaggaggccg tgaagggcca cccgttcttc cccggccgcg tggcgcatgg 15421 ttatctgctg ctctccttcg ccgcgggcct gttcgtggag cccgagcgcg gcccggttct 15481 ggcgaattac gggcttgaca acctgcgatt cctgaagccc gttgcacctg gggaggccat 15541 cagggttcgg ctcacggtaa agtcgaaatc accgcgcaac acggattatg gtgaagtgcg 15601 ctgggatgtc gagatccgga ccggtgcagg ggatatcgcc gcgacctacg agcttctcac 15661 catgaacgcc tatcgtgagg cttcatgagt gcggagatct cggccgcagg cgatgcggac 15721 gaaggagaga cgatgcctga tgttctgcgg gtatggcggg acggcgaggt ggcggtcctc 15781 gagatcgaca acccgcccgt caacgcgacc tcgcaagcgg ttcgccgtgc cctgttcgat 15841 gccgtgcagc gggcgcaatc cgagccggag acgcttgcga tcctgatctg cggcgcgggc 15901 cgcaccttca ccgcgggcgg cgacatcagc gaattcggca agcccccccg cgaacctcac 15961 ctgcccgatc tcatcaaccg catcgaggag agcgcaaagc ccatcgtcgt cgcctggcat 16021 ggcaccgcgc ttggcggcgg ctgcgaaatc gggcttggcg cacacaagcg gatcatcgcg 16081 aaggatggct tcgtgggcct gcccgaggtg aagctcggcc tcctgccggg cgcaggcggc 16141 acgcagcgcc tgccgcggct cgtcgggccg gttgcggcgc tcgatctcat cgcctcgggc 16201 cgcatggtcg gcgcgcaaga ggctttgcgc ctcggcctcg tggatgccat cgcccggaag 16261 gacctgcgcg ccgaggcgat cgaactcgcg cgcgggctca tcggccagat gcagccgcgc 16321 ctgtcgctgc gcgccacccc cgccgccgag cccgaagcct ggcaggcgat ggtggcgaag 16381 gtgaaccgcg aggcccgcgg ctgcatcgcc ccgcaacggg cgatcgaact cgtctcgcag 16441 agcctgacgc tgcccttctc cacgggccag cccaacgagc gccgcacctt cttcgaactc 16501 atgaattccg agcaatcacg cgggcttcgc cacctcttcc tggccgagcg cgaggccggg 16561 aagcgccccg aactcgcgag cgtgccaccg cgcgatctgc accagatcgg catcatcggc 16621 gcgggcacta tgggctcggg 
catcgcggtt gccttcctcg atgccggcta caaggtgatg 16681 ctggtcgagg cggatgacac ggcgcttcag gccggactcg agcgcgtgaa cgggctttat 16741 ctgcgctcca tccgctcggg gcgcatcgac gagaataccc gcctcaaccg cctctcgaac 16801 ctcgtgccga cgaccgagat gagccggctc gcggcctgtg atctcgtgat cgaggcgatc 16861 ttcgaggatc tgggggtgaa gctcgaactg atgaagaagc tcgcgaccct gctgcccgcc 16921 tccacgctga ttgccacgaa tacgtcctat ctcgatctcg agccgatggc cgatgccctg 16981 ccggcgcccg agcgatttct cggcctgcat ttcttctcgc ccgcgcatgt gatgaagctg 17041 ctcgaaatcg tgcgcgcgcg ccgcacgggc ctcgaagccc tgtccaccgc tctggccatc 17101 gggcggaaac tcaagaaaat cgcggtgatt tccggcgttt gcgagggttt catcggcaat 150 17161 cgcatcctcg cgaaattccg cgcccaatgc gagttcatgc tggaggaagg cgcgctgccg 17221 cgcgaaatcg acgccgcgat ggaagccttc ggcctcgcca tggggccctt tgcggttcag 17281 gatctcgcgg gcctcgacat cgcctgggcg cgcaggaaac ggctcgccgc cacacgtgac 17341 cccaccgatc gcgacgtgcc gctggtggat cgcctgtgcg aacagggccg cttcgggcag 17401 aaggccggca agggttggta cgcttacgaa aatggccgcc gcgtccccga tcctgccgtg 17461 gaggcgatgg tgcgcgcgca tgccgccgct accgggcgtc ctcagcgttc cttcagcgcg 17521 caggaaatcc agacccgcgt cgtcaccgcg atggtgaatg agggcgcgag aattctgggc 17581 gaaggcatcg ccgcgcgccc tgcggatatc gatctcgtgc tcgtccacgg ctacggcttt 17641 ccgaactggc gcggtgggcc catgcaccat gccgatgcgc ttggcctgaa gcaggtgctc 17701 gccgtggcgc agaataccga ggcgcgcgat ggccgcggct tcgaggtcgc gccgctcctc 17761 gcggaactgg tggcgggcca gcgtcacttt tcttcgctca acggcggatg attctcgcct 17821 tgtggcgatt ccaaaccctg atgcaaggac accgaaccat gcccgaagcc tatctcgttg 17881 atggcacacg cacccctatc ggccgctatg ccggagcgct cgcgagcctg cgcgcggatg 17941 acatggctgc ccatgtcatc cgcgccctga tggcgcgttt tccaccggaa atggcggctg 18001 cggtggatga aatcatcctc ggttcggcca atcaggcggg cgaggacaac cgcaatgtcg 18061 cgcggatggc cgggcttctc gccggtttgc ccgtggaagt tccaggcacg accgtcaacc 18121 gcctctgcgg ttcggggctt gatgcggtgg ggattgccgc gcgcgcgatc aaggcgggcg 18181 aggccgatct catcatcgcg ggcggcgtcg agagcatgac gcgcgcgccc ctcgtgatgg 18241 gcaaggcgca ggaggccttc cagcgctcgg cggaggtgtt cgacaccacc atcggctggc 18301 
gcttcgtgaa ccccctgatg aaggcgcaat acggcatcga ttccatgccg gaaacagccg 18361 agaacgtcgc cgaggagttc aaggtctccc gcgaggatca ggacagcttc gcgcttcggt 18421 cgcaggcgcg cgcctcccgc gcgcagaaga atggccgtct ggcgcgcgag atcacgcccg 18481 tcagcatatc gcagaagaag ggcgaaccga tcatcgtgga ccgcgacgag catccgcgcg 18541 agacgacgct cgaaaagctg gccgccctcg gcacgccctt ccgcaagggt ggcacggtga 18601 ccgccggcaa cgcctcgggc gtgaacgacg gtgctgccgc cctgctgatc gccagcgagg 18661 ccgccgtgaa gcgcttcggc ctcaccccgc tcgcccgcgt caccgggctg gcgacggcgg 18721 gcgtggcccc gcgcatcatg ggcatcggcc cggcgccagc gacgcagaag ctctgcgcgc 18781 gcctcggcat caagccttcc gaattcgatg tcatcgaact caacgaggcc ttcgccgcgc 18841 agggcctcgc ttcccttcgc ctcctcgggc tgcccgatgg cgcggatcac atcaacccga 18901 atggcggcgc gattgcgctg ggccatccgc tgggcatgtc cggcgcgcgg ctcgcgctca 18961 cggcggcgat cgagatgaag gatcgcggcg gcaggcgcgc ctgcgccacc atgtgcatcg 19021 gcgtgggcca gggcatcgcg ctggcgctgg agagcgcgtg aatcaggaac cgaggaacca 19081 gaaccggaat gcgagggacc ggaatcgatc tctcgccttt ccgccgcaag ctgctgaatt 19141 gactccgttt cagacctgat cgtaatccac cgcgagagcg gctcccgtgg gccggcactg 19201 gcaggtcagc acatagcctg cctcgacctc ccagggctcg aggctgaaat tcaggtccat 19261 gctggcggag ccctccgtca ctttcgcgcg gcaggtgcag cacatcccgc cccggcagga 19321 atagggcaca tcgaggcccg cgcgctccgc cgcttccagc accgtctcgc cctcgtgcag 19381 catcagacta tggctctggc cgtcatgcgt gacggtgagg gacacgccgt tatcggcctc 19441 gggcacgacc gtgcgagcgc ccgagcgcgg cggggcagca ccggtcgaga acagctcgac 19501 atggatgcgc gaaggctccg cgcccatctc ctccagcacg gagcgggcgg ccctggtcat 19561 ctcttcggga ccgcagagat agatgtcatc gatggccctc gcgcccccca gcgtgcgcac 19621 cagcagcgcg atcttctccg gcgtcatccg gccattgagc aggtcgatat cctgcgcctc 19681 gcggctgagg atgtgatgca gcgcgagccg cccgagatgt cggtccttca ggtcttccag 19741 cgcctcgcgg aagagaatcg aaccggaatt gcggttgcca tagaagagcg tgagcctggc 19801 atccgggtcc cgcgcgaggc gcgtgcggat gatcgagagg atgggcgtga tgcccgatcc 19861 gcaggcgaag aacaccgaat gcccgccgcc ggcttgccgc gtcgtggcgc cgaagcgacc 19921 catcggggtc atcacgtcga gcgccgcccc gggctgcagc gcatgattgg 
cgaaggtgga 19981 gaaggcgccg cccgcgactt ccttcaccgc cacgcgcaat tcgccgtcat cctcgcccgc 20041 gcagatcgaa taggagcggc ggatctcctc gccgtcaatc tgtgtgcgca aggtgagata 20101 ctggccctgc tcgaaggcga aagcctgctg caattcgctg ggcacctcga aggcgatgga 20161 gacggcatcc ggcgtctcgc ggcggatgtc acggatcgtc agcgtgtgaa agcggggggc 20221 ggacatcggc gcacctcaga tgcacttgaa gtaatcgaag ggttcgcggc aggcgcggca 20281 gcgccaaagc gccttgcagg cggtggagcc gaattcggag acgcgctcgg tttcggccga 20341 accgcaatgc gggcacggga tttcatccac cccgaacagc gcgccccggc ctttcgcgcg 20401 cccaaccggg ggcgcgatgc cataggcgcg aagcttttcc ttgcccgctt cgctcatcca 20461 gtccgtcgtc caggcgggcg agagcgagag cgtgaccttg cggctcgcga aacccgcctt 20521 ttcgagcgcg agatcgacct gcagggagat caggtccatc gcggggcagc cggaataggt 20581 gggcgtgatg acgacctcca ccgcttcccc cgcgaagcgc acctctcgca gaacgcccaa 20641 atcctcgatc gtgagcaccg gcacttccgg gtccaccacg tttcccgcag cctggcgcgc 20701 gcgcatcagg cgctcatgga gacgctcatg gaggcgctca tcgaccggct cgcgcagatg 20761 gatgatctcg gccatgattc accaggtcga gccgggatag gcgcgctgca tgaactgcag 20821 atcggccagc agatgcccca gatgctcgga atgctcgccg cgccggccac ccgtctgcat 20881 ccagccatcc ttcggccggt tcagcagggc atcggcgaaa acgcgatcca ccgtatcgag 20941 ccagagcggg cggatgccgg cgggatcgac cgccacgccc gcctcgatca gcgcgcgctc 151 21001 ggtcccgtcc acctcgaaca tctctcccgt gaagggccag agatcagcga gcgcttcctg 21061 aagccgcgcg gcgctctcct cggtgccgtc tccaagccgg atcacccact cggccgaatg 21121 gcggaggtga taggccatct ccttcacggc cttggccgca atcgcagcca gacgctcatc 21181 cttcgaggtg gcgaggcgct cgaaacaggg atgcatgaaa gcagcgtaga ggaactgccg 21241 gaggatggtg aaggcgaaat cgccgcgcgg acgctcgacc agcaggatgt tccggaattc 21301 atcgcaatcc cgaaggtagg cgaggtcgtc ttccgaacgg cccttcccct cgatctcgcc 21361 ggcataggtg tagaaattgc gcgcctgccc gatgcagtcg agcgccatgt tggcgagcgc 21421 caggtcttcc tcgagcatcg gcccgtggcc gcaccactcg ccgaggcgat ggcccaggac 21481 cagcgcgtta tccgcgagcc gaagggtata ggcgaagagc ggcgtttcca tcacatatgc 21541 cccacttcgt ccggcacctc gtagaaggtg gggtggcggt agatcttcgt ggccgtgggc 21601 tcgaagagcg cgtccttgtc ggaagggtca 
gacgcggtga tcgcaccgga cggcacgacc 21661 cagatcgaaa gcccctcgcc ccggcgggtg taaacgtcgc gggcgttctg cagcgccatc 21721 tcggcatccg gcgcgtgcag cgagccggca tggcgatggc tgagccccgt gcgggagcgg 21781 atgaagactt cccagagcgg catgttcttc tcggtcatgg tggttactcc gcggcgatcc 21841 tggcggccgc cttgcgggcc gcgcgttttt cggcgtggcg aagcgctgcc tcgcgcaccc 21901 aggcgccttc ctcatgcgcc ttgatgcggg ccttgatgcg ctcgcgattg cagggcccct 21961 cgcccttcac gacgcggttg aactcggacc aatcgatttc gccatgctcc cagtggccgg 22021 tcgcctcgtt gaaccggagg tcggggtccg ggaaggtcag gccgaggaaa tgcccttgcg 22081 gcacggtggc atcgatgaat ttctggcgca aggcgtcgtt cgagaagcgc ttgatcttcc 22141 atttcatgtt ctggtcggaa tgctgggaag ccgaatccga cgggccgaac atcatgattg 22201 cgggccacca ccagcggttc aggctgtcct gcgccatctt cttctgcgcc ggcgtgccac 22261 gcgccagcgt catcacgatc tcgtagccct gccgctggtg gaaactctct tccttgcaga 22321 tgcggatcat cgcgcgggca tagggcccga acgaggtccg gcacaatgga atctggttca 22381 tgatggccgc gccatccacc agccagccga tcatgcccat atcggcccag gagggagtgg 22441 gataattgaa gatgctggaa tatttcgcgc ggcccgcatg aagggcgtcc accagttcct 22501 cgcgggagac gcccagcgtt tcggcggcgg aatagaggta gaggccgtgg ccaccctcat 22561 cctgcacctt ggcgagaagt gcggccttgc gcttgaggga aggcgcgcgg gtgatccagt 22621 tcccctcggg cagcatgccc acgatttccg aatgcgcatg ctgggagatc tgccgctgaa 22681 gcgtcttccg gtagccctcg ggcatccagt cattcggctc gatcttctcc tcggcatcga 22741 tccgcgcctg gaaggccgcg agcttcgcag gatcgtcctt ggagacttcc gtgaccgtgt 22801 tgagcgcctg cgcatacatc gcggccatcc tcccctgatg ctgtttcgaa gtagtatgtt 22861 acagattttt caaaagtcaa caattctcgt tacaaatatc gcctgcgacc cattggcggc 22921 tcgcaatcga gcccttcagc cctgcacatg ccgcaacgca gcatcgaaag cgagacgcgc 22981 cgggcttccc ggccagtccc tcggcagcac cgcctcgggc aggaacggat cgcgcagcac 23041 gatgcggcgg aactcgtgca ccagcatcag ccgggcgagg cgcgcttcgg aaggcgtgaa 23101 ggctccgccc tgcacggcct cggcatgggc gatcaccgcg cgatagccgc cggcgagctg 23161 ctccaacggc cagagttcac gagcgcgcag ggcgagatcg gcattggcgt gcagcgacag 23221 agcaaggcag ccccggactt ccggcgctgc ccgcccggcc caggccgagc ggatcaacag 23281 gttggcggcg 
agcgggacga agcccatcgc gacgagttcg gcgcgcaggg tcgcgcgacg 23341 ccccgtatcc tcgatcacca gcagatcgag atcgcccgtg gggcgcggca gcagccgccc 23401 gaagatgaga tcgcccgccg cctcgaaatc cttcagcgcc cgcgccgaga ggcgatagag 23461 cgcggctttt ccgtcgcgct cgcgcagcaa agtgccgtca ttcacgaggc gagagagcgc 23521 cgtgcgcacg aggccgggct cgatctccag cgccgccagc cattcctgca ggtcggagag 23581 gcgcaaggcc tcgcccgccg gcagcacgag cgcgccaaag accgagacga tcagcgagcc 23641 ggtgcgcggc ggcacacgcc cgtgatgcgc ggcgatgagc gccgagagcg ccgcgggccg 23701 agccatgcct caggccggct ccgcattgcg cttgtccatc atcgccctga aggccggtcg 23761 cgcctggcag gccacgagcc aagctctcac atgcggggcg gcctcgaaaa gctccggcgc 23821 ggggctcgca tagcggatga cctccgccac gttgatatcc gcgaccgtga agcgcccgcc 23881 gacaaggcag ccattcccgg ccagcgcgcc atccagcacc tggaagggct ggcgcagggc 23941 caccacggcg gcatccgcca gttccggctt gcgttcctcg ccctgcttca tcaggcgatg 24001 atagaggatc tggatcgcgt gcggctcggc ctcggtcgcg gcccagagcg tccacatcgc 24061 gacctctccc tcctccgcga gattggccgg agcgagcgga ccgccgaact tcttggcgag 24121 atagaggttg atcgccagcg attcatgcag gacgagatcg ccatccttca tcgaggggat 24181 atgcccgttg gggttcaccg ccagaaagga tggcgagcgg gtattcaccg gcgcatccgg 24241 agcttccggg ttcgcaaggc gataggcctg gatgaccggc acctgctcgt aaggctggcc 24301 gagttcgttc atgagccaga tgttgcggga agcgcggctg cgcgtgacgc cgtagagggt 24361 aaccatgagc gggaagcgcc tttttcaggg ttgggccggg accataatct tccccagaaa 24421 cgcagcgctg accaatcaaa ttcgctgatg ctgtcattcg cccttgcacg tgccggtgcc 24481 ctgccccatc ccgagccggc gcgggaagag atcggccacg gcccgcatct ggccggaggt 24541 cgtatccccc gctgcctcgc cgatcctggc atctcgagtg cgaagcgcgg ccctgcctgc 24601 agcgccccat gacagcttcg tgctggacgg aaaaagatcc gctatatatc gtaggatatc 24661 acgcctgttc atggaggccc catgcctgcc cgtcgcacgg tcctcgcctg gtctttcgcg 24721 gccctggcgg gcctcgcctg ccccgcccgg gcccaatcct ccgggccggt ggtcttcgcg 24781 gcggcgagcc tgaaaaccgc gctcgacgag atcgcaacgc tctggatgcg ggagaccggc 152 24841 cttcccgctc cgcggctcgc ctatgccggg agcaacgcgc tggcccggca gatcgagcag 24901 ggcgcgccgg ctgatgtctt tctttcggcc gatctcgact ggatggatgc 
gctggcggcg 24961 aagaacctcc tgcggcccgg gacgcgttcg aacctgctgg cgaaccggct ggcgctggtg 25021 gcgccggtgg agtcgaaggc gacgatcgcg ctccagccgg gcgccgatct cgcggccccg 25081 cttgcggatg gccgccttgc gacagccaat gtggacagcg ttccggccgg gaaatacgca 25141 aaggccgcct tcgagaagct cggcctctgg gtgagcgtga aggatcggct ggcgcaggcc 25201 gagaatgtgc gcgcggcgct gctgctggtg gcgcgcggcg aggcgccgct cggcgtgacc 25261 tatgccaccg atgcggccgc cgaaccgaag gtccggatca tcaccctctt cccggagggg 25321 agccacccgc ccatcgtcta cccgatcgcg atgctgaggg attcggcgca tccgcaggcc 25381 ctgcgcctcc tcgaattcct gaagggcggc agcgcgcgcg ccattttcga gcgtcacggc 25441 ttcacacttc ccgctccacc ccggcagggc agctgatcat ccatgttcgc cgccctcacg 25501 cccgaggaat gggccgcgct ggagctgagc ctgaaggtcg cgaccgtcgc gacgttcgca 25561 agcctgccgg ccggcgtcgc cgtggcctgt ctgctcgcgc gcgggcgctt cccagggcgg 25621 agcgtgctcg atgccctcgt gcatctgccg ctcatcctgc cgccggtggt gacgggctat 25681 ctcctgctgc tggccttcgg gcgcaagggt cccgtcggcg catggctgga ggcccatttc 25741 gggctggtct tctccttccg ctggacgggg gctgccctcg cggcggccgt gatgggcttt 25801 cccctgatgg tccgcgccat ccgcctctcc gtcgaggcgg tggaccggcg cctcgaacag 25861 gcggcgggca cgctgggtgc tgggcggatg ctcactttcc tgcttgtcac attgccgctg 25921 gccctgcccg gcattctggc cggggccgtt ctggccttcg ccaaggcgat gggcgaattc 25981 ggcgcgacca tcaccttcgt cgccaacatc cccggcgaaa cgcagacgct gcccaccgcc 26041 atctacacct tcacgcaggt gccgggcggc gatgccgctg cgatgcgcct cgtgctgatc 26101 tcggtggcgg tgtccgttct ggcgctcgtg ctgtcggaat ggctggcgcg cgggctcagc 26161 cggcggatgg attagcccat gctggaaatc gacatccgcc atcgcgtcgg cgcgctcgat 26221 cttgcggtca gcctgcgggc cgggggtccc gtctcggctt tgttcggccg ctcgggcgcg 26281 gggaagacca cgctgctgaa cctgatcgcc gggctggcgg cgcccgattc gggccggatc 26341 gcgctcgatg cgaccgtgct gttcgaccga cagaagggca tcgatgtgcc gagccggcgc 26401 cggcgcatcg gctatgtgtt tcaggatgcg cggctcttcc cgcatctcac ggtgaagcag 26461 aacctgctgt tcggccgctg gctcgcgcgc cagccgctca gcgaccgcac cgtggagcag 26521 gtgctggcct tgctcgatct tgccccgctt ctggcgcgtc gcccggccca tctctcgggc 26581 ggcgagaagc agcgcgtcgc gctgggccgg 
gcgctgctgg ccagcccgcg cctgctgctg 26641 atggacgaac cgctcgcctc gctcgatgcg gagcgcaagg ccgagatcct cgcccatctc 26701 gaagccgtgc gcgacgagat cggcattccc atcctctacg tctcccatgc gcgcgaagaa 26761 gtgcggcggc tggcccatga ggtcgcgctg atcgagcatg ggcagcttgc cgccttcggc 26821 ccggcggcga cgttgcttcc gcgcctgccg gagggcgagg catgaccgcc ccctccgcgc 26881 ccgccggacg actctcgctg cggatcgatt ttcccaatgg cgagcgattg gggcccggca 26941 aggtgcggct tctggaagaa atcgcgcgtc tcggctcgat ttcggcggcc gggcgcagca 27001 tgggcatgtc ctatcgccgg gcctggctgc tggtggatgc gctgaacggc atgttcgatg 27061 cgccgctggt cggctcgcat cagggcggct cgggcggcgg cggcgcggca ctcacgcccc 27121 gcggcacgga ggtcgtgcgg ctctaccgga gcatcgaggc aaacgcacag actgggacgc 27181 gcgaagcgct ctcatgcctc gcaggctatg caaccgacgg ggggcaaaat cccggtcgtt 27241 ctggaagttt cccgactgaa ataggtgatt aaacgaaact ggtgggagaa gtaggactcg 27301 aacctacgaa ggcttagcca gcggatttac agtccgcccc ctttgccgct cgggacattc 27361 tcccacgcgc gccggcgaac cggcgcggaa gaccggcctt atgggcaagg gttgggccca 27421 tgtcaacctg gaaagccgcc cggaaaccgc ttccggccaa gggtgagagt cagggtggga 27481 ggcagggtgg cagtccatgc gcgattgcgg ctcgcatgac gccgcgaact gccctatgaa 27541 ccgggcatgg cccatgacag caaacgtccc tatcacccgc tagaccgcca accgaaaccg 27601 gcccgcgagg gctcgcgcat gggcgaaaag cccgccttca aggataagtc ccgtttcgcg 27661 ccgaagccca aggcgccgaa gccacggctg ccggatgaca ccgatgtgct ctacggcatg 27721 cattcggtga aggaggcctt cgccaatccg cgccgccgct tccgtcgcat cgtcgcgacc 27781 gagaacgcgg tgcagcgcct cacggaagat ggcgtgaacc tgccccttgc gcccgaactc 27841 gtgaagcccg aggccatcgg caggctgctc acgccggatg ccgtgcatca gggcctttac 27901 gtggaggccg agcgcctgcc gctgctgccc ttgggcaaac tcccgcgtga tcgcatcatc 27961 ctcgcgctcg accaggtgac cgatccgcac aatgtcggcg cgatcctgcg ttcggctgcc 28021 gccttcgatg tcggcggggt gatcgtcacg catcgccaca gcccggaaat cacgggcgtg 28081 ctcgccaagg ccgcctcggg cgcgctggag catgtgccga tggccggggt gcagaacctc 28141 gcgcgcgcgc tcaccacgct gaaggatgag ggtttcgcgg tcatcggcct tgattccgaa 28201 gcgccggaga agatgggcga tcttccgctg aaactgccga tcgtgctggt tctcggggcg 28261 gaagggcgcg 
gtctgcggcc ctcgacgcgc gaactctgca gccatctcgc ccggctggag 28321 cttcccggcg cgatcaagag cctgaacgtc tcgaacgcgg cggccatcag cctctatgcc 28381 gtgcagcagg ccatcacccg gatgctgaaa caggaaggct ccgccccggc gagccggagc 28441 ggagctctgg gagaggcgat gcggatgggc atccgcatcg ccgggagccc gctttacctg 28501 ccgtaattat aggtctcgat gtcgcgattc ttggtttccg gcaccagcag gatgccgatc 28561 acgaaggtca tcagcgcgaa gccgatcgga taccacaggc cgaaatagat atcgccggtg 28621 gaggccacca tcgcgaagga ggtcgccggc aggaggccgc cgaaccagcc gttgccgata 153 28681 tggtagggca ggctcatggc cgtgtagcgg atgcgggtgg ggaagaactc caccagggcc 28741 gccgcgatcg ggccgtaaac cgccgtcaca tagagcacca ggatcgtgag gatcgcgatg 28801 atcgtcagcg cctgcgggtt gctcagcgcc gcgaagaagc ccgagacctt caccttggcc 28861 gcgtcatccg ccttggggta acccgccgcg atcgaggctg ccgtgatgtt cttggtgaaa 28921 tccgcgacgc ccgcaccata ggccacttcc gtgccgttca ccgtcaccgt caaaggcgcg 28981 cctgcagcgg cgggcgtgta ggtgtacttg atggcctgcg aggagagcac gcggcgggca 29041 atgtcgcagg gctccgagaa ctggcggata cccaccggat caaagagcga gccgcacttc 29101 gaggcatcgg ccgatacctc gaccttcacc gtctcgagcg ccctggcata ggtcgggttc 29161 gccgtttccg ccatcatctt gaagagcggg aagtaggtga gcgccgcgat caggcagccg 29221 ccgaggatga tcggcttgcg gccgatcttg tccgagagcg agccgaagag aacgaagccg 29281 cccgtgccga tgatcagcga ccaggcgatg agcacgttcg ccgtgagcag gtcgaccttc 29341 aggatcgcct gcaggaagaa cagcgcatag aactggcccg tataccacac caccgcctgg 29401 ccgatgacga ggccggtcaa cgcgatgatc acgatcttca ggttgcgcca ctggccgaag 29461 gcttccgaaa gcggagcctt cgattgggtg ccctcttcct tcatcttccg gaaggccggg 29521 ctctcatgca tctgcaggcg
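The sequence listings in these appendices interleave position counters with the bases and, where a page break fell inside a record, a printed page number as well (for example the 152 that precedes position 24841 above). The following is a minimal sketch, not part of the thesis, of how the bare sequence might be recovered and its bases tallied for comparison with a BASE COUNT line. It assumes the text passed in is only the ORIGIN portion of a record; the function names clean_origin and base_count are illustrative.

```python
import re
from collections import Counter


def clean_origin(origin_text: str) -> str:
    """Return only the nucleotide letters from a flattened ORIGIN block.

    Position counters and stray page numbers consist purely of digits,
    so keeping runs of a/c/g/t/n discards both. The input is assumed to
    contain nothing but the ORIGIN portion of a record.
    """
    return "".join(re.findall(r"[acgtn]+", origin_text.lower()))


def base_count(seq: str) -> dict:
    """Tally a, c, g and t, in the order used by a BASE COUNT line."""
    counts = Counter(seq)
    return {base: counts.get(base, 0) for base in "acgt"}


# A fragment copied from the listing above, page number included.
fragment = "ggagaccggc 152 24841 cttcccgctc"
cleaned = clean_origin(fragment)
print(cleaned)
print(base_count(cleaned))
```

Comparing base_count for a whole cleaned sequence with the declared BASE COUNT values, and its length with the LOCUS length, gives a quick check that no bases were lost during extraction.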

Appendix III: Nucleotide sequence of Clone 142

LOCUS       CBNPD1_Clone_142        9532 bp    DNA     linear   ENV 30-NOV-2006
DEFINITION  Uncultured organism CBNPD1 BAC clone 142.
ACCESSION   EF157667
VERSION
KEYWORDS    ENV.
SOURCE      Uncultured organism CBNPD1 BAC clone 142
  ORGANISM  Uncultured organism CBNPD1 BAC clone 142
            Unclassified sequences; environmental samples.
REFERENCE   1  (bases 1 to 9532)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Metagenomic Analysis of a Toxic Cyanobacterial Bloom
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 9532)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-NOV-2006) The Microbial Gene Diversity and
            Discovery Research Unit, Griffith University, Kessels Road,
            Nathan, Brisbane, Queensland 4111, Australia
FEATURES             Location/Qualifiers
     source          1..9532
                     /organism="Uncultured organism CBNPD1 BAC clone 142"
                     /mol_type="genomic DNA"
                     /isolation_source="Toxic cyanobacterial bloom"
                     /environmental_sample
                     /plasmid="pIndigoBAC-5"
                     /country="Australia"
                     /metagenomic
     gene            95..973
                     /gene="142-1"
     CDS             95..973
                     /gene="142-1"
                     /note="'Putative Peptidase, S1/S6 Family'"
                     /codon_start=1
                     /product="CLONE 142-1"
                     /translation="MKNRHIFLFSLTIQTLSFNALAIVIRHDKADQLYLATSKDFPPL
                     ATFYIDGAHGTLIRPTWVITAAHTTFCLFPGSWIDINQRLRQVKRIYVHPEHKTGVSH
                     DIALIELAEAVTDVVPASLYPDGDELGQRIWFIGAGGTGHGKEGVAVDLVANRGQLRK
                     AQNQVLAVAGPFIKFSFDAPPNALPLEGVSAGGDSGGPAYLVDKNQFSLLGISSRGDT
                     GAVGFYGSVDVYTRVSFFVPWALKLMDSVPRLRGQWSLDKMRTLPDGLTADNLTEFCR
                     QIGLKPERAGASSVHQ"
     gene            complement(1119..2531)
                     /gene="142-2"

     CDS             complement(1119..2531)
                     /gene="142-2"
                     /note="'Putative Glutamate synthase [NADPH] small chain'"
                     /codon_start=1
                     /product="CLONE 142-2"
                     /translation="MAQNVYQFIDVKRVDPPKKSHEDRTIKFVEIYQPMTDTQAAGQA
                     DRCLDCGNPYCEWKCPVHNYIPQWLKLANDGKIIEAAELSHKTNSLPEVCGRVCPQDR
                     LCEGACTLNEGFGAVTIGSIEKYITDKAFEMGWRPDLSAVVKTDKKVAVIGAGPAGLA
                     CADVLVRNGVTPVVFDKYPEIGGLLTFGIPPFKLEKGVMQLRREIFSAMGFEFRLNTT
                     VGVDVSFDSLLEEYDAVFLALGTYTPMAGGLVNEQAPGVYEALPYLIGNINNLMGWQT
                     EHQFVDLKGKNVVVLGGGDTAMDCVRTAIRQGASKVICAYRRDEANMPGSRKEVQNAR
                     EEGVEFMFNLQPLGIEVNNEGRACGVKVVSTALGAPDAKGRRSAEVVPGSEQVLAADA
                     VVIAFGFQPSPPKWLQDKGIELDSKGRVVAHENSLFPLQTNQQKIFAGGDMVRGSDLV
                     VTAIAQGRKAAEGMLDFLAV"
     gene            complement(2548..7092)
                     /gene="142-3"
     CDS             complement(2548..7092)
                     /gene="142-3"
                     /note="'Putative Glutamate synthase [NADPH] large chain'"
                     /codon_start=1
                     /product="CLONE 142-3"
                     /translation="MKAANTPASAMTTGIFECRTGWRVNVMGLYHHSQARENCGFGLI
                     AHLEGTASHRIVRTAISGLDRMQHRGGISADGKTGDGCGLLMQKPDSFFRAIAQENGW
                     TLGKKYGVGIFFLSQDPVKAEFAKHIAETEIARETLTLVGWREVPVDPSVLGPIALAS
                     MPKVVQVFVSAQPGWGDHDLERRLYMVRRRIEKQIQNDDDFYIPSFSSLVTVFKGLMM
                     PADLPGYYLDLADIRMETAICVFHQRFSTNTMPRWALAQPFRFLAHNGEINTITGNRQ
                     WARARQYKFASPLLPDIQNAAPFVGQKGSDSSSLDNMLELFLAGGMDLFRAMRLLVPP
                     AWQGNKVMDDDLKAFYEFNSMHMEPWDGPAGIVLTNGQHVACNLDRNGLRPARFVITK
                     DKLLTLASEVGIWDYTPDEVQEKGRVGPGEMLAVDTQTGKIWRSNEIDEDLRXRHPYR
                     QWLTQNVQRLVPYEKFEVDLIGKRVFTDDEMMVYHKLFNYSYEEIDQVITVLAKDGQE
                     AVGSMGDDTPMAVLSRKPRTFYDYFRQQFAQVTNPPIDPLREAHVMSLATSIGREHNV
                     FSETAGYARRIQFESPVMMYSDLKQLKTADEKYYKHQIFSLNYNPNEETLEQAVRRLA
                     QEAIVAVREHKVVLVILTDRRIAPELIPIPAAMAVGAVHHALVDAQLRCDSNIVIETA
                     TARDPHHFAVLVGXGATGIYPFLAYETVEQLCEKGSLNITAREAVINYRKGINKGLYK
                     IMSKMGISTVASYRCSNLFEAVGVNRNVMNLCFPDVPSRLGGADFEDFQQDVQQRANM
                     AWLKRKPLDHGGLLKYVHDGEYHAYNPDVVQSLQKAVKSGLYEDYKVYSDFVNQRPVA
                     HIRDLLTVVSEHVSPISLDEVAPAEQLYPRFDSAAMSIGALSPEAHEALAIAMNRLGG
                     RSNSGEGGEDPRRFGTEKNSKIKQVASGRFGVTPHYLVNAEVIQIKVAQGAKPGEGGQ
                     LPGEKVTAEIARLRYSVPGVTLISPPPHHDIYSIEDLSQLIFDLKQVNPQALISVKLV
                     SEPGVGTIATGVAKAYADLITISGYDGGTGASPLTSVKYAGSPWELGLAEVHQALVEN
                     GLRDRVRLQVDGGLKTGLDVVKAAILGAESFGFGTGPMVALGCKFLRICHLNNCATGV
ATQDATLRRDHFNGLPDMVMNYFKFLALEVRQLMAQLGVKEITDLIGRTDLLVQREGQ TARQVKLDLSPILQACGVKSDKPLHCMQNNDPFDSGPLNQHMVSKCVDMVKFKTGGRL SLPIRNTDRSVGAALSGLIARHHGNQGMATEPVEVQFVGTAGQSFGVWNAGGLHMSLQ GDANDYVGKGMTGGQLAIYPPKGVSFARDNATIVGNTCLYGATGGRFYAAGRAGERFA VRNSGAIAVIEGTGDNCCEYMTGGVVVVLGQTGVNFGAGMTGGFAYLFDEHQDLALRV NPELVECLEVSTPILQEHLRSLIHQHFEATGSERAHKILSSFQSCLTQFKLIKPKTSD VNTLLGHRARSSAELRVQAM" gene complement(7543..7728) /gene="142-4" CDS complement(7543..7728) /gene="142-4" /note="'Conserved Hypothetical Protein'" /codon_start=1 /product="CLONE 142-4" /translation="MVSSPEKAALAAFFLPENDKITIIRHKTVIKLQNTCLMLVSQQQ HLFAKSQLAIHRKNRYT" gene complement(7777..8730) /gene="142-5" CDS complement(7777..8730) /gene="142-5" /note="'Putative Isoaspartyl Dipeptidase'" /codon_start=1 /product="CLONE 142-5" 155 /translation="MQSTDLGEFMPTAMPDVANHSCATTEGVLDWVGMSNIEVPLMLA IAGEAPRQVSAHVEAFVNLKDPKAKGIHMSRLYLLLDDLSRESTLSHASLCDLLDGFI TSHEDLSNQAFVKFDFDLHLRRKALITEKQGWKAYPVTVTGQVLDKNLSVELAVKVPY SSTCPCSAALARQLIQEAFVAKFAGQQQISSELVVDWLGTTQGIVATPHSQRSVAEIK VKLHHNVTEFPIVALIDAIEDALKTPVQAAVKRADEQEFARLNGQNLMFCEDAARRLK HALNQLSEFDDFWLRVNHYESLHAHDAVAVTVKGLEGGYRA" BASE COUNT 2406 a 2444 c 2247 g 2435 t ORIGIN 1 cgccaaactt tggctcgcaa tcgcagcctg agatctctgc ggcgaaggcc tcggccgcca 61 atgctttagc gctataacga aaacatgtat ctatatgaaa aatagacata tatttctatt 121 ttcgctcaca atccagacgc taagttttaa tgcgctggcc atagtgatcc ggcacgataa 181 agctgatcaa ttgtatttgg caacaagcaa agacttcccg ccactggcaa ctttttatat 241 cgacggtgcc cacggtactt taatcaggcc cacatgggta ataacggcag cccataccac 301 tttttgtttg tttccaggca gttggatcga tatcaaccag cggctgcggc aggtgaagcg 361 aatttacgtg catccggagc acaaaacagg cgtcagtcat gatattgcat tgattgaact 421 ggctgaggct gtgactgatg tggttccggc cagtttgtat ccagatggcg acgaactcgg 481 ccagcgcatc tggtttattg gcgctggcgg cactggtcat ggcaaagaag gtgttgcagt 541 ggatctggtt gccaataggg ggcaattgcg taaagcgcaa aatcaagtgt tggcggtggc 601 cggccctttc ataaagttca gctttgacgc accgccaaat gcgctgcctt tggaaggtgt 661 ttctgccggt ggtgacagcg gtggcccagc ctatctcgtt gataaaaatc 
agttttcgct 721 tcttggtatt agctcccgag gtgacactgg cgcagttggt ttttatggtt cggttgatgt 781 gtatacccgg gtatcatttt ttgtaccctg ggcactaaaa ctgatggaca gtgtgcccag 841 gcttcgtggc cagtggagcc tggataaaat gcgcaccttg cctgacggcc tcactgcaga 901 caatttaact gagttttgtc gacagattgg cttaaaacca gagagggcgg gagcttcgtc 961 tgtacatcaa taagggtaac tggaagctgc atccttacga tgtttggcac agctataaaa 1021 aatccgggcc tgggcccgga tcatgagttt actgttaaaa cagctcagct aaaacagctt 1081 gtgttaaccc gctaaccgca gtgctacagc tgctgtgttc acaccgccaa aaagtccagc 1141 atgccttcag ctgcttttcg gccctgagcg atggcggtga ccactaagtc ggagccacgc 1201 accatatcac cgccggcaaa aatcttttgc tgattggttt gcagcggaaa caggctgttt 1261 tcgtgtgcaa caacgcggcc tttactgtcg agttcaatac ctttgtcctg caaccacttt 1321 ggcgggcttg gctgaaaacc aaaagcaatc actacagcat cagcagccag cacttgttca 1381 gagcctggta ccacttcagc agatcggcgg cctttggcat ccggcgcacc caaggcggta 1441 gaaacaactt taacgccaca agcgcggcct tcgttgttca cttcaatacc caaaggctgc 1501 aggttaaaca taaattcaac cccttcttcg cgggcgtttt gtacttcttt gcgtgaaccc 1561 ggcatattgg cttcgtcacg acgataagca caaatcactt tgctggcgcc ctgacgaatg 1621 gctgtacgta cgcagtccat ggcggtgtcg ccaccaccga gcaccaccac gtttttacct 1681 tttaaatcaa caaattgatg ttcggtttgc cagcccatca ggttattgat attgccaata 1741 agataaggca gggcttcata cacgccaggt gcctgttcat tgactaaacc tccggccata 1801 ggggtgtaag tgccaagggc taaaaatacc gcatcgtatt cttccagcag gctgtcaaaa 1861 ctgacatcta cacccacagt ggtatttaag cggaattcga agcccatagc gctgaaaatt 1921 tcgcgccgaa gctgcatcac gcctttttcc agtttaaaag gcggaatacc aaaagtcagc 1981 aaccccccaa tttccggata tttatcaaaa acaacagggg ttacaccgtt acgcaccagc 2041 acgtcggcac aagcgagtcc tgccgggcct gcaccaatca ctgctacttt tttatcggtt 2101 ttgaccacgg ccgataaatc cggccgccag cccatctcaa aagctttgtc ggtgatgtat 2161 ttttcaatag aaccaatagt gacagcacca aaaccttcgt tcagggtaca agcgccttca 2221 cagagccggt cttgtggaca aacccggcca cacacttcag gtaagctatt ggttttatgt 2281 gataactctg cagcttcaat aattttgccg tcgttggcaa gcttcaacca ctgcggaata 2341 tagttatgca ccgggcattt ccattcgcag taggggttgc cacaatccag gcaacgatcc 2401 
gcctgaccag cggcctgggt gtcggtcatc ggctgataaa tttcgacaaa cttaatggtg 2461 cgatcttcat gcgatttttt cggcggatcg acccgtttta catcaataaa ctgataaaca 2521 ttctgtgcca tggttaattc tccaccctta catcgcctga acccgcagct cggctgagga 2581 gcgtgcgcgg tgacctaata acgtattcac gtcactggtt tttggtttaa tcaacttaaa 2641 ctgggttaag cagctctgga acgacgacaa aattttgtgc gcgcgctctg agcctgttgc 2701 ttcaaaatgt tgatgaatca ggctacgcaa atgctcttgt aagattggcg tgctgacctc 2761 taaacattcc accagttcag ggttgaccct aagggctaaa tcctgatgtt catcaaataa 2821 ataagcaaaa ccacctgtca taccagcgcc aaaattcacg ccggtttgac caagcacgac 2881 caccacaccg ccagtcatat attcacagca gttatcgccg gtgccttcaa tcaccgcaat 2941 agcgcctgag ttacgcacgg caaaacgttc accggcgcgg ccagcagcgt aaaaacgacc 3001 gcctgtggca ccatataaac aggtgttgcc cacaatagtg gcgttgtcgc gggcgaaact 3061 cacgccttta ggcgggtaaa tagcgagttg tccgcctgtc atgcctttac cgacataatc 3121 gttggcgtcc ccttgcagcg acatatgcaa accaccggcg ttccagacac caaaactttg 3181 gccggcggtg ccgacaaact gcacttcaac cggctcagtt gccatgccct gattgccgtg 3241 atggcgggca attaaacctg acaaggccgc gccaacagag cggtcggtat tgcggatagg 3301 caaacttaac cggccaccgg ttttaaattt gaccatatca acacatttac tgaccatgtg 156 3361 ttgattcagc gggccagaat cgaaggggtc gttattttgc atacagtgca gcggtttgtc 3421 ggacttcacc ccacaagcct gcaaaattgg cgataaatcc agtttgacct gccgtgctgt 3481 ttggccttca cgctgcacca gtaaatcggt gcggccaatt aagtcggtga tttctttcac 3541 accaagctgt gccatcagct gccgtacttc aagcgctaaa aacttaaagt agttcatcac 3601 catgtccggc aagccgttaa agtggtcgcg acgaagagtt gcatcctgag tggcaacacc 3661 ggtggcgcag ttatttaagt ggcaaatccg taaaaactta caacccaaag ccaccatagg 3721 gcctgtgcca aaaccaaaac tttcagcacc taaaatggcg gctttcacca catcgaggcc 3781 ggtttttaaa ccaccatcaa cttgcaggcg aacacggtcg cgcaggccgt tttccactaa 3841 cgcctgatgc acttcggcaa ggccaagctc ccatggagag cctgcgtatt tgactgatgt 3901 gagcgggctg gcacctgtgc caccgtcata gcctgaaatg gtaatcaggt cggcataagc 3961 tttggcaaca cctgttgcta tggtaccaac acctggctct gataccagtt tcacggaaat 4021 gagcgcctgc ggattgacct gttttaggtc aaaaatcagc tgggataaat cttcaattga 4081 
ataaatatcg tgatgcggtg gtggcgaaat taaggtgaca cctggcactg aataacgaag 4141 tctggcgatt tcagcagtaa ctttttcacc cggcagctga ccaccttcac ccggcttagc 4201 gccttgcgcc acttttatct gaatgacttc ggcattcact aaataatgcg gtgtcacgcc 4261 aaaacggccg gacgccactt gtttaatttt gctgtttttt tcggtgccaa aacggcgcgg 4321 atcttcgcca ccttcacccg agttggaacg accgcccaaa cggttcatcg caatagccaa 4381 agcttcatgc gcttccggac ttaaagcacc aatcgacata gcggcgctgt caaaccttgg 4441 gtacagctgc tcagccggtg cgacttcatc cagtgaaata ggtgatacat gctcgctgac 4501 cacagttaac aaatcgcgga tatgggcaac agggcgctga tttacaaagt cggaatacac 4561 tttgtaatct tcatacaacc cacttttcac cgctttttgc aggctttgca ctacgtccgg 4621 gttgtaagca tggtattcac cgtcgtgcac atatttcagt aaaccaccgt gatccaatgg 4681 cttacgtttg agccaggcca tattggcgcg ttgttgtacg tcctgctgga aatcttcaaa 4741 gtcagcaccg cccaaacgtg aaggcacatc cgggaaacaa aggttcatca cgttgcggtt 4801 gacgccaacg gcttcaaaca ggttagagca gcggtaactg gccactgtag aaatgcccat 4861 tttggacata attttataaa ggcctttgtt aatgccttta cggtagttaa taacagcttc 4921 gcgcgcagtg atattcaacg agcctttttc acagagctgt tccacagttt catacgccaa 4981 aaacggataa ataccggtag cgccaaggcc aaccaacacc gcaaagtggt gtgggtcgcg 5041 cgctgtggca gtttctatca cgatgttgga atcacagcgc agctgtgcgt ccactaaggc 5101 atgatgcaca gcacccactg ccattgcggc cggaatagga atcagctctg gtgcaatacg 5161 ccggtcggtc agaataacca gcactacttt gtgctcgcgc acggccacaa tagcttcctg 5221 cgccagccgg cgtacggcct gctccagcgt ttcttcattc gggttgtaat tcaggctaaa 5281 aatctggtgc ttatagtatt tttcatcagc agtcttgagc tgtttcaaat ccgaatacat 5341 catcaccggc gattcaaact gaatacgacg ggcataacct gccgtttcgg aaaatacgtt 5401 gtgctcgcgg ccaatactgg tggccagcga catcacatga gcttcacgta atgggtcaat 5461 tggcgggttg gtgacctgcg caaactgctg gcggaaataa tcgtaaaagg ttctgggttt 5521 gcgcgacaac accgccattg gcgtgtcatc gcccatagag ccgacggctt cctgaccgtc 5581 tttggcaagc acggtaatca cctgatcaat ttcttcataa ctgtagttaa acagcttgtg 5641 atacaccatc atctcgtcgt cggtaaacac ccttttgccg atgaggtcca cttcaaactt 5701 ttcgtaaggg actaaacgct gcacgttttg tgtcagccac tggcggtacg ggtgacgggc 5761 acgtaaatct 
tcgtcaattt cgttggaacg ccagattttg ccggtttgtg tatcaactgc 5821 cagcatctcg cccggcccca cgcggccttt ttcctgcact tcatcaggtg tgtagtccca 5881 gatgccaact tcggaagcca gcgttaataa tttatctttg gtaatgacaa aacgggcagg 5941 gcgcaaaccg ttgcgatcca gattacaggc cacatgctgg ccgttggtca acacaatacc 6001 ggccggaccg tcccatggtt ccatatgcat ggagttaaat tcataaaagg ctttgaggtc 6061 gtcatccatc actttgttgc cttgccatgc cggcggcacc agcaaacgca tagcgcggaa 6121 caaatccatg ccacctgcta aaaacagctc cagcatattg tccagtgaac ttgaatctga 6181 acctttttgg ccaacaaaag gcgcggcatt ctgaatatcc ggcagcaaag gtgacgcaaa 6241 tttgtactga cgggcacgcg cccattggcg gttaccggtg atggtattaa tttcaccatt 6301 gtgcgccagg aaacggaaag gttgcgccaa agcccaacgt ggcatggtat tggtagaaaa 6361 ccgctggtgg aacacacaaa tcgcggtttc catgcggata tcagctaaat ccagataata 6421 acccggcaaa tcggccggca tcatcaggcc tttaaatacg gtcactaagc tggaaaaact 6481 cggaatataa aagtcatcgt cgttctgaat ttgtttttca atacgacggc gtaccatata 6541 taaacggcgc tctaaatcat ggtcacccca gcctggttgg gcgctgacaa acacctgcac 6601 tacttttggc atcgaagcca gtgcaatagg gccaagcacc gaagggtcta ccggcacttc 6661 acgccaaccc actaaagtta aggtttctcg ggctatttcg gtttcggcaa tatgcttggc 6721 gaattcggct ttcaccggat cctggcttaa gaagaaaata ccaacaccgt attttttacc 6781 taaagtccag ccattttctt gcgcgatagc gcgaaaaaag ctgtctggtt tttgcatcag 6841 caaaccacaa ccgtcaccgg ttttgccgtc tgctgaaatc ccgcccctat gctgcatccg 6901 gtccagaccg ctgatagcgg tacgaacgat gcgatggctg gcggtgccct ctaaatgtgc 6961 aatcaaacca aagccacagt tttctctggc ttgactgtgg tgatacaacc ccatgacgtt 7021 aaccctccaa cctgtcctgc attcaaaaat acccgtcgtc attgcggatg caggggtgtt 7081 ggccgccttc actcatgcca atcacatagt tcactatgct catggcaatt cgttcagttg 7141 ccgcctacct gcatccacaa tgacccggac ataaagctgc cgtcattctg gatgcagagg 157 7201 gtggttgcca ccttcgctga tgcctatcgc atagttttgt atgctgatgg tatgtcgttc 7261 agttgtcgaa gtatccaccc aggccaatct gtattcacaa caacttggga agcaaccctt 7321 tgtcgtcggg gattaggagg tttggccctt atgttctgcc tgcatctatc gctaatgact 7381 tcagtataaa tatgtttaat tagtgaatta ttattctaat ttttagtttt tttacgtgtg 7441 aaagaaaata 
acggggttat tttcaagggt caagggttgt tagtgttttt actctagtgt 7501 ggtcttggcg tcgtttagac gtctatacat aaagaccttg agttaggtat accgattttt 7561 tcggtgaatt gcaagctgac ttttggcgaa taagtgctgc tgctgtgaca cgagcataag 7621 gcatgtgttc tgtaatttaa ttacagtttt gtggcgaatt atggttattt tatcattttc 7681 aggcagaaaa aaagctgcca gagcagcttt ttcaggtgaa cttaccacaa agttatgcgg 7741 taattatgcg aatgacccgc attggttttt tttggatcag gcgcgataac cgccttccag 7801 tcctttgaca gtcactgcga ccgcatcatg ggcatgcaaa gattcgtagt ggttgacccg 7861 caaccagaaa tcatcaaatt cagacaactg attcagtgcg tgttttaaac gacgggcagc 7921 atcttcgcag aacatcagat tctgaccgtt taagcgggca aactcttgtt cgtcagcgcg 7981 tttaactgca gcttgtaccg gtgtttttaa ggcgtcttca atagcgtcaa ttaaagcaac 8041 tatcgggaac tcggtcacgt tgtgatgaag tttcacttta atttcagcaa ctgaacgctg 8101 gctatgcggg gtggcaacaa taccctgagt tgtacccaac cagtccacca ccaattcact 8161 gctgatttgt tgttgaccgg caaattttgc cacaaaagct tcctgaatca gctgacgtgc 8221 caaggctgcc gaacaagggc aggtggagga ataaggcact tttaccgcca gttccaccga 8281 caggttttta tcaagaacct gacctgtgac tgtcaccgga taagctttcc agccttgttt 8341 ttcagtgatc aacgctttac ggcgcaggtg cagatcaaaa tcaaacttta caaaagcctg 8401 attgcttaaa tcttcatggc tggtaataaa accgtccagt aaatcacaaa ggctggcatg 8461 gcttaaggtt gattcgcgtg ataaatcgtc cagcaacaaa tacagcctgg acatgtgaat 8521 gcccttggct tttggatctt ttaaatttac aaaagcttcg acatgggcac tgacctgacg 8581 cggtgcttca ccggcaattg ccagcatcag tggcacttca atattgctca ttccaaccca 8641 gtccagtacc ccttcggtgg tggcacaaga gtgattggct acatctggca tcgctgtcgg 8701 catgaattct ccaagatccg tcgattgcaa aaaaaggcgt tagctccggt ttatccgccg 8761 ttgtaaagcc ctaaagctgc aactggttaa ccgctggatg gctaaaaagg gctgcatttt 8821 accccagttc acaaatttgt cttatgaaaa atttgcaata cactcatcca ttttttcgag 8881 ccatggttta aaaaccagaa tacccgatta aattagtcgg gtattctact ccttctattt 8941 acgcactgac aagggatgtg gatcagtcgg cttttgtttc aaaggtgcca aataccactg 9001 ccttttgttg tttcaccata gtgcggcctt tggcaaaaac gctgtgaagg tgcaggtcgt 9061 ctttgtgaat aagcaccaaa tcggcatctt tgccaacggc aagtgtaccc ttgttgacca 9121 atccaagcac agcagcgggg 
gaggatgtaa tagaggctat agcagcaacg gggtttactt 9181 taaaatcaag catggcgctt ctggcacttt gatacagact gctgacttta cccacttcaa 9241 gcccaagcaa ttcaccctga gcattaaaaa gcggcaagct ggcattgccg tctgagctca 9301 atgtcaggtt tgtcacaggt atccgttgct ccagcgccat ggctaaagcc tgtgctgcgg 9361 ctacctcgcc gtgctgtaag tcataagctg tcgtgctggt agtaaaatca atcaccccac 9421 ctgctttggc aaagcgaata ccggcggcaa aaacgtcctg attacggtta atatgcgttg 9481 ggtaaaactg ttttagcgat aattcagtgc cgcttaccgc ctgatgcagc gg
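The feature table for Clone 142 gives locations either as a plain range (95..973) or wrapped in complement(...) when the gene lies on the reverse strand. The following is a minimal sketch, not taken from the thesis, of how such a location might be applied to a cleaned sequence string. It assumes 1-based inclusive coordinates, as in the GenBank flat-file format, and handles only these two simple forms (no join(...) locations); all function names are illustrative.

```python
import re

_COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_location(location: str):
    """Parse 'M..N' or 'complement(M..N)' into (start, end, on_reverse_strand).

    Partial-feature markers '<' and '>' are accepted and ignored.
    """
    match = re.fullmatch(
        r"(complement\()?[<>]?(\d+)\.\.[<>]?(\d+)\)?", location.strip()
    )
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    return int(match.group(2)), int(match.group(3)), match.group(1) is not None


def extract_feature(seq: str, location: str) -> str:
    """Return the feature's bases, read 5' to 3' on its own strand."""
    start, end, on_reverse = parse_location(location)
    sub = seq[start - 1:end]  # 1-based inclusive to Python slice
    return reverse_complement(sub) if on_reverse else sub


# First 120 bases of the Clone 142 sequence, counters removed.
FIRST_120 = (
    "cgccaaactt tggctcgcaa tcgcagcctg agatctctgc ggcgaaggcc tcggccgcca "
    "atgctttagc gctataacga aaacatgtat ctatatgaaa aatagacata tatttctatt"
).replace(" ", "")

print(extract_feature(FIRST_120, "95..106"))
```

With the first 120 bases of the listing, the location 95..106 returns atgaaaaataga, which opens with the atg start codon of the 142-1 CDS. Each of the five CDS spans listed for Clone 142 is also a multiple of three bases long, as expected for complete coding regions.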

Appendix IV: Nucleotide sequence of Clone 543

LOCUS       CBNPD1_Clone_543       20895 bp    DNA     linear   ENV 30-NOV-2006
DEFINITION  Uncultured organism CBNPD1 BAC clone 543.
ACCESSION   EF157668
VERSION
KEYWORDS    ENV.
SOURCE      Uncultured organism CBNPD1 BAC clone 543
  ORGANISM  Uncultured organism CBNPD1 BAC clone 543
            Unclassified sequences; environmental samples.
REFERENCE   1  (bases 1 to 20895)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Metagenomic Analysis of a Toxic Cyanobacterial Bloom
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 20895)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-NOV-2006) The Microbial Gene Diversity and
            Discovery Research Unit, Griffith University, Kessels Road,
            Nathan, Brisbane, Queensland 4111, Australia
FEATURES             Location/Qualifiers

     source          1..20895
                     /organism="Uncultured organism CBNPD1 BAC clone 543"
                     /mol_type="genomic DNA"
                     /isolation_source="Toxic cyanobacterial bloom"
                     /environmental_sample
                     /plasmid="pIndigoBAC-5"
                     /country="Australia"
                     /metagenomic
     gene            complement(2775..4409)
                     /gene="543-1"
     CDS             complement(2775..4409)
                     /gene="543-1"
                     /note="'Putative Chain Release Factor'"
                     /codon_start=1
                     /product="CLONE 543-1"
                     /translation="MGRSEADIDPHIQPLHTAIERGAVAGEIEVFESVAELAADLLGE
                     HRAKSSSREDEVLTVREVGSGPPRSLKAHGTKPGHAVHHVEVELDAAADAGHHCAKGP
                     RCGSTGGQIIHRHARAVPAHGTDRLNAEVHNGATIARDVAWLRRATAPWELLRAEGSP
                     QRGIELAACTRRRKTVRQPIAHRDIDRAEIARHGGSDRRIGRKGPTRQLHAPELAPGN
                     RQLTIERRFVAVQPKVSIDAESRTSKRAVRTHPRCPRGRCEVDAQRGIFLRPEQRHAL
                     HRCHIALEHRTLRRRRRLRRRRVLPCIRARLGEAADDQGHPDRSCATTDDRHLRRFAA
                     PMHIVSLPPADRPPNAIALGPRAWVRPHDLRFTFSRSGGPGGQNVNKIESRAQLRVAL
                     VAIAGVDDAARARIARLAGSHLVGADAPPHGSPPPAEAELLFACDEHRSQRANHDLCI
                     DRLRLLVLQGAVPPKRRRKTKPHSGIPRAKTRDQAHPIAAQANARRRRGLTDADQPIR
                     ASPLPFTTSPITHAFSFRRWSLFSMRLAASARHRIV"
     gene            4708..5307
                     /gene="543-2"
     CDS             4708..5307
                     /gene="543-2"
                     /note="'Putative Phosphoesterase'"
                     /codon_start=1
                     /product="CLONE 543-2"
                     /translation="MRQTKPDLVACTGDVVDLHVAGCEPVFEALGSIPAELGSYFVLG
                     NHDLLDSARRVLRLSRSAGLVTLRSSSIAVPRGLRIAGVDWSNSIAGNAALVRRALVG
                     SEAPHLLLAHNPKAFREAARLGVPLTLAGHTHGGQVSIRERGGVPRAVRRGSRLTVGH
                     YHDGRSHLFVTTGAGGWFPLRVNCPAEVVSLVVRSADAK"
     gene            complement(5314..6387)
                     /gene="543-3"
     CDS             complement(5314..6387)
                     /gene="543-3"
                     /note="'Putative Dehydrogenase'"
                     /codon_start=1
                     /product="CLONE 543-3"
                     /translation="MKKITVLGGGKIGRMVGFLLGTSGDYQVHVGDSDAAAATAAAQM
                     AKGRGSAVDFSNEAQVRAAVSGAWAVVSCAPFHCNPLIASCAKAAGAHYFDLTEDVAV
                     TKKVIELSAGARTVFAPQCGLAPGFITIAANHLASPFSVLDTLRLRVGALPLQPSNEL
                     GYNLTWSTEGVINEYINECEAVQDGTLVRVPALEHEERLIIDGIEFEAFNTSGGLGTL
                     AESMRGKVRNLNYKTIRFPGHARLMRFLLQELRLSEHRDELRRILERALPWTGADQVV
                     VFVSASGMQDGRLTESSYAKRIGARSIGGHHWTAIQITTAAGVCAIVDLVAQGKGAKE
                     GLVRMEDFRFDDFIANRFGAYYA"
     gene            complement(6384..8168)
                     /gene="543-4"
     CDS             complement(6384..8168)
                     /gene="543-4"
                     /note="'Putative Oligoendopeptidase F'"
                     /codon_start=1
                     /product="CLONE 543-4"
/translation="MRRSWRCRRLPFAHAQTQRRATAHPCSDLYDPQSMTAAQHFDPT RWEQLEPLYARLLQRAIHSAEDLERLLLDRSDLDASADEAQANLYIAMTCHTDDPARQ KAFLDFVQHVEPQLKKVGFELDRRIAESPFAAQLDAARYGTYLRGVRQEVRLFRAENV PLQTEATTHDQEYSRLSGEMTVEYDGREQTLPQMAKYLEDTDRGVREASWRCVVDRRL RDREAIDTIYDKLVALRHQMALNAGFPNFRDFQHQRMHRFDYTSEDCARFHDAVERTC MPVLKRLHRERAQALGLSALRPWDVKVDLHGRPPLRPFTDANDLVQRSSRAFHRLDGE LGALFDTMRTGECLDLDSRKGKAPGGYQYQRQHSRTPFIFMNASGLQRDVVTMVHEAG 159 HAFHSMLSRHDPLLAYRGSPIEFAEVASMSMELLTFPQLSEFYSEADAARAKRDLLEG LAGMLPWIATIDAFQHWVYTHPTHSREERTREWVALQKRFGGEIDWTGCEPALEAMWQ RQLHLFGSPFYYVEYGIAQLGALQVWSQSRRSPSEALANYKRALSLGGSRPLPALFEA AGMPFDFSASTVQRLMDDVQSELEAIPA" gene complement(8077..9015) /gene="543-5" CDS complement(8077..9015) /gene="543-5" /note="'4-diphosphocytidyl-2-C-methyl-D-erythritol kinase'" /codon_start=1 /product="CLONE 543-5" /translation="MIPCMSSPLFMQAPAKLNLALSVGTTRPDGMHPISSWMVTVDLF DDLHLERLPEGNFSLFATVWHRDALRRSDIDWSISKDLVNRAHDHLEAFVGRPLPVKV RVEKRIPIGGGLGGGSSDAAAMLRAMNALFELHIGDDDLRGLAANIGSDVPFLVSGGS ALVGGTGGDIQLLGDPPEVHAVLFFPAVACPTADVYRQFDRSAKAPMLRSDAVAALAS RPSIGPTDPFNDLAGPAMDIAPSIADDMEEIGLLTELPVHVCGSGSSLFVICSDGMHA QALAAHCEAKLALPAVAVRTCANPAPCDGASVQRSL" gene 9045..9224 /gene="543-6" CDS 9045..9224 /gene="543-6" /note="'Putative S1/P1 Nuclease'" /codon_start=1 /product="CLONE 543-6" /translation="MLTSLQIAARFMGGWLVSVCIASSCLAWGEDGHRIVGALAAERL TPEARRRRHRTVGKR" gene complement(9868..12522) /gene="543-7" CDS complement(9868..12522) /gene="543-7" /note="'Putative RTX Toxin'" /codon_start=1 /product="CLONE 543-7" /translation="MDKNARAIGPCTNGIHMFGTEQLVHTAVASPQQHIDMPNRLRSD PAVRTLRIPQRHLRRLATRTRIDLADEPTRMRRVSPEMLIWKKQHENPLAFRARLLPP AQRPLEHHASIGTCAARPAMRTDEGLDRRARVHICNGNDPPLLAPIVDRLADELPCVE RVVIRGHIRHRTACAEIRKNHADVFRREYVGGLRHEVHATEDDELRPTVRSLHRRSLT QLEAVTRQVRVGDHLILLIVMPEDQQPIPHLASPCFDALVQLIRARGAVRVGDGRLPQ HCGESVCTRAGGVRSAFFAPGRSAEGGKKQKGLDALGMSWRRCETTGPGESPMHTSFP YRFLIASLLVVSAPLSVAHAIPPILANKLVATDGAAGDQMGNAVALSNLAGITAAEPD IAVVGLRYDDSPQADRGSANVYRRNAQGQWVFEAKLVAADGVTSDAFGYSVAIYADTI 
VVGAPFDDITLADQGSAYIFKRNANGTWSQITKLVATDAATTDSFGASVAITNTGLGS SSFGDLVLVGTPLNDVGSLQDQGSVYVFKRNTSGTWASEGQIAVTGSDAGAFENLGWS VSAYGDRALIGAPFDTIAGAQYRGSSYMWRRSSSGVWTQEAKLVANDGAASDYFGYAV SLFGNQCIVGAPTDDIGAVADRGSAYVFQLSGTSTWLQEAKLVASDGAAIDNFGTFVA IQGDFAICGAPNDDGTVDGSSTSYADVGSAYSFTQTGSSSWAQQARIAGADGQTGRVF GNAVAVFGTTALIGANGEDIGNPIRDNQGSASVYNLLPPDCNNNGVPDDEDISSGIAQ DCDGNGRPDSCDLADGGADSDDDGKIDACEIAYGDFDLNGAISGTDLSVMLSSWGATD PPAGDLDGDSSVGGTDLTILLGRWGPIP" gene 12532..12909 /gene="543-8" CDS 12532..12909 /gene="543-8" /note="'Putative Quinolinate synthetase A'" /codon_start=1 /product="CLONE 543-8" /translation="MQEVVDAADLSGSTEFIIRTIQAASAGERWSVATEVHLVARIAR AAAERGVHVELLSGCQCLCTTMYRIDQPHLAWALDALAEGRVVNEIRVHKDARVLARS ALDRMLALTESSAVRSDSARLVD" gene 12944..16303 /gene="543-9" CDS 12944..16303 /gene="543-9" 160 /note="'Putative Cation-transporting P-type ATPase'" /codon_start=1 /product="CLONE 543-9" /translation="MSSSERETSARPRKRLLVWGGAAAGVLLVAIGLGIALLPTLLSG AWGRSLVVNAIAPSVQGSVSLKEMSLSWGGPQRISGLEITSASGDRIAADVQVDGSLW SLVKLSAPVRVRLSGAVRSGTADDGSLTILRLLRGSSPSSAPTGSTAPALPVATSTGT TNASLEALRGSVLEIVRVDLEVTGSGDRPAISVTNLHGTCTLETDGCSVDLNADTKVG ERAGNLSIRGSAAKCLLPSGEFAVDTMALDLSVQASALAVPAKGFPLVIDRLQWTVVS RALRDEVRVQGEIAVALPTGEQAVSTVDLKARTPFDAAARSVAGSVRIERLPTSALAP YLPRAMDAQRDIGPSITAVLVLDGTQGTLQLQADSIRVNAAASLEEGGTKLVLSTTSV QAAVQPAMVPSLGLAEPLAVQVDIASAALPLSIGGAIDWNACAVQGRIATGAAAIRCT DAILLPLGATTVDVRAAGHAVPMEIRIDGSVGGTAVTAVQSVTGWAQLLAGQSDALGA KGSLAIAPLDITKASWLPDDIRTMLLQSAVSSVAVVVEQDGTMASGRASLRCTLGDAV VSTRTTWDKDAVATEPIDCTLTVAPSLVARFAPASIALSEPARAEIRVDALRIPWQQI NAGQYLPPLVAGTVRVPMLAVARSPGLAAPATLRDVEVKGSFAPAVEAKGAALSAVLT ARVLAAANDAAQLRGSVEWPDLGAPSMRGSTEVTMAGGSALASLLDLGPAAALLAGPG SIRASVERATADQFDFDVSLPRLKLKGAGLATLDPKGATPMRIELKPTQCAFDLPAEV VEQWLGLRSGADWQQLMTAATGRAIRGTISVESCAWRGSIDDASAVATVKIAPGSIEA PGREKVQFDEVTLSVKSPRVAERAQASLDGSFRVGAGAPGTLTVALDARGDLRAIAST TASALSLKDSSLGIKVPGALAVELLRWSGLQSPSANGAVSPSIGDINLACNVRSLSLP MSSAAGGSAAMRLDLAPCAVQLPGKPRLGMGALEVVVRSSGLNRELNAAVQGTLAVGD 
AAPAPLTAAAALSGDLRALFGAPNIPITLADSEASLQMPGALALALVDLVNDDASASA ALRRLSPIDARMRVKSMTMPASGRSRQRVHR" gene 16320..17477 /gene="543-10" CDS 16320..17477 /gene="543-10" /note="'Putative class III aminotransferase'" /codon_start=1 /product="CLONE 543-10" /translation="MELTPSSGPAVSLGATSSKASTVRLGDAIDLQVTSSGGNAGTIR GTVVGRQLAASDGSLAPADALWNAQVQMNAVPTALVDAAAGQRGELAELLGPALDGSI DAVSSGSGPQRTTRIRATAKSQTLDLKAPQVDIAQGRCIVSPAAPFEATLALNEAVRR RILKPVSPILSEIVSAPPLRFTVSSLSAPIDGSLSSLDLDGRIDVGDVELRRQSDGIA MLAVAQQPGVTTVPAQIDPLILTVRKGRLTYANFIVRVAKVGPQWQQVLKLSGDVDLA KTPPFANAITIRYPIASIGRTATGAGSSQLPEVTEINRLIAQLPVDPGELLDVDVTFS GPLGKVDGKEVPLQRSTKLVFDPSAIDAKKIQKGIENLPQTIDTFKKLFGG" gene complement(17517..18074) /gene="543-11" CDS complement(17517..18074) /gene="543-11" /note="'Putative NADH-ubiquinone oxidoreductase'" /codon_start=1 /product="CLONE 543-11" /translation="MPIPDDQIVDVTPKPLSRGESLFVPSIIKGLGTTMRHFVENFGR NGSNAKNIWVVQYPEQKRDDRPVEQGGQFRPTFRGVHRLNKDEDGRVRCVACFMCATA CPANCIHIVAEESPWEDREKYPKQFDIDELRCIYCGMCEEACPVDAIELTPHYEVTGL TRAELIFDKSKLLEIYDHTAAEKPM" gene 18131..18475 /gene="543-12" CDS 18131..18475 /gene="543-12" /note="'Putative ATP synthase FO'" /codon_start=1 /product="CLONE 543-12" /translation="MDADGSFRKTSSSRGQDEDVRVGWRMAGLAWRFTTEVIAGAVLG WFLGSWLGDANMGALVGTGIGLAVAMYSLMRGALKLNASLDRLGGVRSNTKGTSKDFG FPSDRGRGGDAT" gene 18472..18930 /gene="543-13" CDS 18472..18930 /gene="543-13" /note="'Conserved Hypothetical Protein'" /codon_start=1 161 /product="CLONE 543-13" /translation="MNGESQEMQRAARTPQLGPPLSMRWVLVAVAMVGAAMLFGCAWG LIAPRGGWMGWPLAIRVVPVVLAIALLGWLAAMPWRPRPAVDWVTVWLGGTVARLLLT PAACFGLYSIFPCDTVQFTAAAGGCCFAVVLAEVGMIAATLHTRSGPAAR" gene 18976..19983 /gene="543-14" CDS 18976..19983 /gene="543-14" /note="'Putative ATP synthase a chain'" /codon_start=1 /product="CLONE 543-14" /translation="MTSGRARPSERRRNDFPACERVGPDVASGGQGVASVEHPGQARD AQHGHDVGDRHRARGASALCGEVHRHGSRVDGQQAVRHTVRVRGSDRGHGAVLARRDA RPGHGRKIDSEVPALPALDVLLHLGAEIIGLIPFVEVQEFISWRRGEQFNLAENTTLA 
IFGGAATASISVTGGMAVISFVLIQVQGFRELGVKGWLEHLCGGHDLVGGSPLLYPVA LLVFVVEFFGLFVKPAALAIRLFANMVAGHTLLIVFTSFGALAANAGLGWFGVSAITV VSAVGSMLITLLEVFVALLQAFVFMFLTAVFISLMAHEDHGQDEAHGEDAAHAHGAPA H" gene 20084..20401 /gene="543-15" CDS 20084..20401 /gene="543-15" /note="'Putative ATP synthase C subunit'" /codon_start=1 /product="CLONE 543-15" /translation="MKIAHLALATLVGTVCATAAQAADPVAPAASAANWGVGIGAGLA CGLAAIGAGFGIGRIGGSAVESIARQPEMAGRIFINMLLTAAFVEGVALFAVVAGFLN FGK" gene 20160..20894 /gene="543-16" CDS 20160..>20894 /gene="543-16" /note="'Putative ATP synthase B chain'" /codon_start=1 /product="CLONE 543-16" /translation="MRLPHLQPIGAWVSARVSPADSLRSARVSALVVSEVQPWSPSRA SLRWQAASSSTCCSPLPSSKVWRCSRWSLASSTSASDHRSTVRLAAVQTRRTDSPWRF PSMMLASANPLEFNLLPFITTMVVFGTVAAALGFFVWPKILKGLDDRNAKILGEIAAA EAARTAAAAKQKEFEQKLQEAMEESSRMIREAKAEAVRMGEELRVRSEAELAERARRA QDEIESARRTAVAELESHAATLAVSIASRIL" BASE COUNT 3514 a 6386 c 7068 g 3927 t ORIGIN 1 gctcggatgc gttcgttcgc cgtggtgatc gtctcgacat cgggcacctg cgccttcgag 61 agcatgcggc cgattgattc tcgcatgtcg ggaatgtcgc cgagcacgat cggtgcgccc 121 gcatcgatga tccgctgcag ttgctgcagg ccgcgctcga gattctccag tcgctgcgcc 181 cccccgtgca tgggtgcgcc ctcggcgttc acggtgccgt acgcgaacca gaagaggtag 241 tcgagcgcaa ggaccagcgt tggttgcgca tcgatcgcgc ggtcgacttc gcttcgggcc 301 acgggaccag gattggcgaa gaagaaacca ctggtgaagt gacggatggt cggcaggttg 361 ggcgcagtgt ctggcgtggg cagatcggtc gcagcaggtg tggagcgcgc gccagcatgc 421 cgcagaatca tcgcgagtga gacgctgacg ggcggcacgc gccgcggctt cgcggacgct 481 gcgccttcgg gttgcatggg cggcttggtg gcctcggacg actcggtcgg gtggatgaaa 541 accccaaacc catcggatgc gctcgcgccg atcacgacga tgcgctcgta cagattgcgc 601 gtctgcggtg gagatgattg cgtggaagga tcggcaacag gagtctgtgc cgggaccgcc 661 gtggcgaacg cgagagcgaa cgccagagca acggccaggg caacggccag agcaacaggc 721 gaggcaacaa tccgtgcgaa gcgcgcggca gttcgcgaca gggcgggggg accggcgcgc 781 agggcgtggt ggttcatgca agcgcgggga cattcacacg cagggcttcc gcggtgaggc 841 gccccatgtc cgcagggctc tccgccacaa tcacgccgca ggcgcgcagc gcttcgatct 901 tcgcctgcgc cgtgtcctcg gcaccgccga 
tgatcgcgcc ggcgtggccc atgcgtcgac 961 cgggcggtgc ggtacgacct gcgatgaagg cgatgacggg cttgcgcgaa tgcgcgcgaa 1021 tccagcggcc gccgatgacc tctgcattgc cgccgatttc accgatcatc acgatcgcgt 1081 cggtctgtgg atcggctgcg aacagttcaa ggacatcgag gaaatcgagt cccttcacgg 1141 ggtcgccgcc gatgccgaca caggtgcttt gcccaatgcc tcgcgccgag cactgccaga 1201 ccgcttcgta cgtcagtgtg cccgatcgac tgacgatgcc caccgacttg ccgctcggcg 1261 catcatcgcg gtgcgtgtgg atgtagccgg gcatgatgcc gatcttgcag ccgcccgcga 162 1321 agcgcccatc gtccgtcctg cggccaggcg tgatcacgcc tgggcagttc gggccgacaa 1381 gcgtgaccgc tggagagtgg tggagcatcg agcgcacgcg gatcatgtcc tgcactggaa 1441 tgccctcggt gatggcgcag atgagtgaga tgcctgcatc ggctgcctcg aggatcgcat 1501 cgcccgcgaa gggtggcggg acgaagatca tcgttgcgtt cgcgtcggtt gccttcaccg 1561 catcgtgcac cgtgtcgaag atcggcagtc cgtttggatc gacctgacca cccttgcccg 1621 gcgtggtgcc gcccaccatc cgcgttccgt agtcgagaca ccccttggta tggaacgcgc 1681 ccgctgcacc tgtgatgccc tgacagatca cgcgagtatt gccgtcaatc aatatggaca 1741 tgggacggag aagcataacc attgccgagc ccgtagaccg atctgccctg catcgtgtgt 1801 aatgcgggga tgacggcgat tcctgcccgc aagggcctga cctaccgaga cgcgggcgtg 1861 aacatcgatg cgggcgatga agtggtggag cgcatcgggc ctgtcctgaa gcgcacgcag 1921 gatgcgcgcg tgctcggtcg ccatggggga tttgcaggaa tgttccgcct cgtgcgcggc 1981 aagggcagcg gcaaaggttc gcgccgctgg cggaagcccg tgctggtcgg ttgcacggat 2041 ggcgtcggga cgaaggtgct gttggcggtg gagcgacgat cgttcgcggg catcggcatc 2101 gactgcgtcg cgatgaacgt gaacgacctc atcgtgcaag gtgcggagcc gctgttcttc 2161 ctggattatc tcggcctgtc ccgactgcaa cccgatgaga cggcgcagct gatcgagagt 2221 gtggcccgcg gatgtgagat cgccggttgt gcgctgctcg gcggcgagtg tgcggagatg 2281 ccggatatct acaagcccgg cgacttcgac atcgccggat tttgcgtcgg cgcgctggaa 2341 gagggacgaa tcatcgaccc aggccgtgtg aagcgcggtg atgtggtgat cggtctcgcc 2401 agttcggggg ttcactcgaa cgggtacacg cttgtccgcc gcatcgtgcg tgatgcaggg 2461 cttgatctgg gtgcttccta tcccgactgc gcgcgcgatc ggacactcgc ggatgtgctg 2521 ctcgagccga cgcgcatcta tgcgcgcacc gtgctcggtc tgctgggtga ttcgcggctg 2581 cgcaagtcga tcacgggcac ggcgcacatc 
acgggcggtg ggcttccagg caatgtgtgt 2641 cgagcgctgc catcgcgcgt ggatgtggtg atcgatccga agtcatggga ggtgcccggg 2701 atcttcccat tcctgcagcg gcacggtcag atcgacacgg aggagatgtt ccgtgtgttc 2761 aacatgggca tcggctacac gatcctgtgc cgcgccgacg cagcgagtcg catgctgaag 2821 aggctccaac gcctgaagga gaaggcgtgg gtgatcggtg aagtcgtgaa gggcagtggc 2881 gaggcgcgga tcggctgatc cgcgtccgtc aacccgcgtc gccgccgcgc attcgcttgc 2941 gctgcgattg gatgcgcttg gtctcgagtc ttcgctctcg ggatgcccga gtggggcttc 3001 gtcttccgtc gacgtttcgg agggaccgct ccttgcagca ccagcagccg caggcgatca 3061 atgcagagat catggttcgc gcgctgcgag cggtgttcgt cgcaggcgaa gagcaattcc 3121 gcctcggcgg gtggtggtga gccatggggc ggtgcatccg cgcccacaag atgggaacct 3181 gcgaggcgtg cgatccgcgc gcgcgctgcg tcgtccactc ctgcgatggc caccagcgcc 3241 acgcgcaact gcgcacgcga ctcaatcttg ttcacattct gtccgccggg tccgcccgag 3301 cgggaaaagg tgaatcgaag atcgtgtggt cgtacccagg cgcgtggacc gagcgcgatg 3361 gcgttcggtg gtcgatcggc tggcggtagg ctcacgatgt gcattggagc agcaaatcgt 3421 cgaaggtgtc gatcgtcggt ggtcgcgcag gaacgatcgg ggtggccctg atcgtcggca 3481 gcctcgccca gacgcgcgcg aatgcacggc aagacgcgcc gccgtcgcaa gcggcgtcgg 3541 cgacggaggg tgcggtgttc cagagcgatg tgacaacgat gaagagcgtg ccgttgctcg 3601 ggccgaagga agattccgcg ctgcgcatcg acttcgcacc tgccgcgtgg gcagcgcggg 3661 tgcgtgcgga cagctcgctt ggaggtccga ctttcagcgt cgatggagac cttgggctga 3721 acggctacga agcggcgttc aatggtgagt tggcgatttc ctggggcgag ttctggcgcg 3781 tgcagctgac gggttggacc ttttcgaccg atgcgtcgat cacttccacc gtgtcgggcg 3841 atttcggctc ggtcaatatc gcggtgggcg atcggctgtc gaacagtttt gcggcggcga 3901 gtgcaggcgg cgagttcaat gccacgctgt ggcgaccctt cagcacgcag cagttcccat 3961 ggggcggtcg cacgcctcag ccaggcaaca tcgcgagcga tggtcgcccc attgtggact 4021 tcggcattca ggcgatcggt gccgtgcgct ggtacagcgc gagcatgacg gtgaatgatc 4081 tgaccgccgg tacttccgca tcgtggtccc ttggcgcagt gatgcccggc atcggcggcg 4141 gcatcgagtt cgacttcaac atggtggacc gcgtgtcctg gcttcgttcc gtgcgctttg 4201 aggctgcggg ggggaccgga tccaacttct cgaacggtca gtacttcatc ttcgcgcgag 4261 ctggacttcg cgcgatgttc accgagcaga tcggcggcga 
gttcggctac cgactcgaag 4321 acttcaatct ctccggcaac ggcgcctcgt tcgatggcgg tgtgcagggg ctgtatgtgg 4381 ggatcgatgt ccgcttctga tcgtcccaca cgcgccaagt cgcttcgtga actgctcgcg 4441 cacggacgct ccgcggcgcg ttcggatgca aaggggcgtc tcgcgcgcgc cacgctccgg 4501 catgcgatgg tgacgaccct gccggatcgt ctcctcggcg gggtgctgat gcgaaggcat 4561 cttcggacca ccgtccgcct tcgtgagcaa gagttgtggg tgccgggatg gcctgcggcg 4621 ttcgacggtg ttcggattgc gcatgtcagc gacttgcatg tgggcgagtt gatgcccgtg 4681 gagcgagccg tggaagtcat cgacctcgtg cgccagacaa agcccgacct cgtggcgtgc 4741 acgggcgatg tcgtcgactt gcatgtcgct gggtgcgagc cggtcttcga ggcgctcgga 4801 tcgatcccgg cggaactcgg cagctatttc gtgctgggga atcacgatct gcttgacagc 4861 gcgcgtcgcg tgctgcgact gtcgcgcagc gctgggctcg tgacgctgcg ctcgagttcg 4921 atcgctgttc cgcgcggcct tcgaatcgcg ggtgtggact ggtccaattc catcgcggga 4981 aatgcggcgt tggtgcggcg ggcgctcgtc ggcagcgagg caccgcatct gctgctggcc 5041 cacaatccca aggcgttccg tgaagcggca cgtctcgggg tgccgctcac cttggcaggt 5101 catacgcatg gcgggcaggt gtcgattcgt gaacgcggcg gcgtgccgcg cgcggtccgt 163 5161 cgcggatcga ggttgacggt gggtcactac cacgatggca ggtcacacct gttcgtgaca 5221 accggtgcag gcgggtggtt tccgctgcgc gtgaactgtc ccgccgaggt cgtcagcctc 5281 gtggtgcgca gcgcggacgc gaagtgaggc gtgtcaggcg tagtacgcgc cgaaacggtt 5341 cgcaatgaag tcgtcgaagc ggaagtcttc catccgtacg agcccctctt tcgcaccctt 5401 gccttgcgcc acaagatcga cgatggcgca cacccccgcg gccgtggtga tctggatcgc 5461 ggtccaatgg tggccaccga tggaccgcgc gccaatccgc ttggcgtagc tgctttcggt 5521 cagtcgcccg tcttgcatgc cggatgcact gacgaacacc accacctgat cggcgcccgt 5581 ccacggcagc gcgcgttcga ggatgcgtcg gagttcgtct cgatgttcgc tgaggcggag 5641 ttcctgcagc aggaagcgca tcagtcgcgc atggcctggg aaacggatcg tcttgtagtt 5701 caggttgcga accttgccgc gcatggattc ggcaagcgtt ccaaggccgc ccgatgtgtt 5761 gaacgcttcg aactcgatgc catcgatgat gaggcgctct tcgtgttcga gtgcaggaac 5821 gcgcacgagc gttccatcct gcaccgcttc gcactcgttg atgtactcgt tgatcacccc 5881 ctcggtggac catgtcaggt tgtagccgag ttcgttggac ggctgcagcg gcagtgcgcc 5941 cacgcggagc cgcagcgtgt cgagcaccga gaagggagaa 
gcgagatggt ttgcggcaat 6001 ggtgatgaag cccggcgcca gcccacactg tggtgcgaaa acggttcgtg cgcccgcgga 6061 cagttcgatc accttcttgg tgacggcgac atcctcggtg aggtcgaagt agtgcgcgcc 6121 ggcagccttt gcgcaggagg cgatgagcgg attgcagtgg aacggcgcgc acgagaccac 6181 tgcccacgcg ccagacacgg ccgcacgcac ctgcgcttcg ttgctgaagt cgacggcact 6241 tccccttccc ttggccatct gcgcggcagc cgtggctgcg gctgcatcgc tgtcgccgac 6301 atgcacttga tagtcgccgc tggttccgag caggaacccg accatgcgac cgatcttccc 6361 tccaccgagc actgtgatct ttttcatgcg gggatcgcct cgagttcgga ttgcacatca 6421 tccatcagcc gctgcacggt cgaagcgctg aagtcgaatg gcattccagc ggcctcgaag 6481 agtgcgggca gtgggcgcga acctccgagg gacagtgcgc gcttgtagtt cgcgagcgct 6541 tcggatggac tgcgccgaga ctgcgaccag acctgcagcg cgccgagttg cgcgatcccg 6601 tactccacgt aatagaacgg actgccaaag agatgcagtt ggcgctgcca catcgcttcg 6661 agcgccggct cgcaaccggt ccaatcgatc tcgccaccga atcgcttctg cagggcgacc 6721 cactcccgcg tgcgctcctc gcgcgaatgc gtcggatgcg tgtagaccca gtgctggaag 6781 gcgtcgatcg tggcaatcca gggcagcatg cccgcaagcc cttcgagcaa gtcgcgcttg 6841 gcgcgcgcgg cgtccgcctc ggaatagaac tcgctcaact gcgggaaggt gagcaattcc 6901 atgctcatgg acgcgacctc ggcgaactcg atgggactgc cgcggtacgc cagcagcggg 6961 tcgtgccggc tcagcatgga gtggaacgca tgtcccgcct catgcaccat cgtcaccaca 7021 tcgcgctgca ggccggaagc gttcatgaag atgaaagggg tgcgcgagtg ttgccgctgg 7081 tactggtagc caccgggtgc cttgcccttg cgcgaatcca ggtccaggca ctcgcccgtt 7141 cgcatcgtgt cgaacaacgc gccgagttct ccgtcgagcc tgtggaacgc ccgtgaagag 7201 cgctgcacca gatcgttggc atcggtgaac gggcgcaggg gcgggcgacc gtggagatcg 7261 accttgacat cccatggacg gagcgcgctc aatccgagtg cttgcgcacg ctcgcggtga 7321 agccgcttca gcacgggcat gcaggtgcgt tcgaccgcat catggaaacg agcgcagtcc 7381 tccgaggtgt agtcgaagcg gtgcatgcgc tgatgctgga aatcacggaa gttcgggaag 7441 cctgcgttca gtgccatctg gtggcgcagc gccacgagct tgtcgtagat ggtgtcgatg 7501 gcctcgcggt cacgaaggcg gcggtcgacc acgcatcgcc aggatgcctc gcgcacgccg 7561 cggtccgtgt cttcgaggta cttcgccatc tgcgggagcg tttgctcacg accgtcatac 7621 tcgaccgtca tttcgccgct gagacggctg tactcctgat cgtgggtggt 
cgcctccgtc 7681 tgcagaggca cattctccgc acggaagagc cggacttcct gtcgaacgcc gcgcaggtag 7741 gtgccgtacc gcgccgcatc gagctgtgcg gcgaagggac tttcggcgat gcggcgatcc 7801 agttcgaatc cgaccttctt caactgcggc tcgacatgct gcacaaagtc caggaacgct 7861 ttctgccgcg ctggatcatc ggtgtggcag gtcatggcga tgtacaagtt cgcctgcgcc 7921 tcatccgcag aggcatcgag atcgcttcgg tcgagcagga gccgctccaa gtcctctgcg 7981 ctgtggatgg cgcgttgcaa caaacgcgcg tagagaggtt cgagctgttc ccagcgagtg 8041 gggtcgaagt gttgcgccgc ggtcatggac tgcggatcat agagatcgct gcacggatgc 8101 gccgtcgcac ggcgctgggt ttgcgcatgt gcgaacggca accgccggca acgccaactt 8161 cgcctcacag tgcgccgcca atgcctgcgc gtgcatcccg tcagagcaga tcacgaacag 8221 gcttgaaccg ctcccgcaca catggacggg cagttccgtg agcaggccaa tctcctccat 8281 gtcgtccgcg atggatggag cgatgtccat ggcggggccg gcaagatcgt tgaagggatc 8341 ggtgggcccg atcgatggac gcgaggcaag cgccgcgacg gcatcgctgc gcaacatcgg 8401 agccttcgca ctgcgatcaa actggcggta gacatccgcc gtcgggcagg cgactgcggg 8461 gaagaagagc acggcgtgca cctcgggcgg atcgccgagc aattgaatgt cgccgcccgt 8521 tccaccgacg agtgccgagc cgccggaaac gagaaacggg acatcgcttc caatgttggc 8581 agcgagtccg cgcagatcgt cgtcgccgat gtgaagctcg aacagcgcgt tcatggcgcg 8641 caacatggct gcagcgtcgc tcgatcctcc gccgagtccg ccgccgatgg gaatgcgctt 8701 ttcgacgcgc accttgacag gcagagggcg cccgacgaac gcctcgagat gatcatgtgc 8761 tcgattgacc aagtccttgg agatcgacca gtcgatgtcg ctccggcgga gcgcgtcgcg 8821 gtgccacacc gtggcgaaca gcgagaagtt tccctccggc aggcgttcga gatgcagatc 8881 atcgaacaga tcgaccgtca ccatccaact actgatgggg tgcatgccat cgggacgcgt 8941 ggtgccgacg gagagcgcca gattcagctt ggcgggcgcc tgcatgaaca ggggggagga 164 9001 catgcacgga atcataggac tccatcggta cctttcacgg ctgcatgctc acctctcttc 9061 agatcgcggc gcgcttcatg ggtggatggt tggtatcagt ctgcatcgca tcatcgtgtc 9121 ttgcgtgggg cgaggacgga caccgaatcg tcggcgcgct cgcggcagaa cgcctcacgc 9181 ccgaagcgcg ccgccgccgt catcgaactg ttgggaagcg atgacctcgc caccgcaggt 9241 ttgtgggcgg atcagatccg gggcgatgct tcatgggact gggccaagcc gctgcactat 9301 gtgaatcttc cccgagacgg ttcgtccttc gaagcggcgc gtgattgccc 
ggagggtcag 9361 tgtgtggtcg ccgcgatacc gcggtttctg gccgtcgcct cggacgcttc gcgtccgatg 9421 gccgagcgcc gcgatgcgct gcgctttgcg gtccacttca tcggcgactt gcaccagcca 9481 ctgcacgcgg gatacaagga tgatctcggc gggaatcgca ttcaggtgac ggcgttcggc 9541 gacattcgca ccaaccttca cgcactgtgg gacagcgtgc tgatccgtga tcgcatcgcg 9601 ggggagtggc agtctctggc gcagtcgctg cgcggctcga tcacgccagc gctggctgcg 9661 gagtgggcgt cgcagtcgga cgccgccgcc tgggcaaacg agtcagccgc gatcactcgg 9721 accatctaca cggagctgcc cgccgatgcg aaggtgggtg cggagtactc caagtccaac 9781 atgccaacga ttgaacgacg gctgtcgatg gcgggagtgc gcctcgctca cgcactcaac 9841 cgcgtcttcg caagttcccg ggtgggatca gggaatggga ccccagcgcc cgagcaggat 9901 ggtgagatcg gttccgccga ccgagctatc cccgtcgagg tccccagcgg gcgggtcggt 9961 ggctccccac gatgaaagca tcacggacag gtctgtgccg ctgatggcac cgttgagatc 10021 gaagtctccg taggcaatct cgcacgcgtc aatcttgccg tcatcgtcgc tgtccgcgcc 10081 gccatcggcg agatcgcagg aatccgggcg accattgcca tcacagtcct gtgcgatgcc 10141 cgatgagatg tcttcatcat ctggaacacc attgttgttg cagtcgggcg gcaacaggtt 10201 gtagaccgat gcggagccct gattgtcccg aatgggattg ccgatgtcct cgccgtttgc 10261 gccgatgagc gccgtggtgc cgaagacggc cactgcattg ccgaagacgc gccccgtctg 10321 tccatccgcg cccgcgatgc gagcttgctg cgcccaggag gaactgcccg tctgcgtgaa 10381 ggagtacgca gatccgacat cggcgtacga ggtggaactc ccatcaaccg tgccgtcatc 10441 attcggagcg ccgcagatgg cgaaatcacc ctgaatcgcg acgaaggtgc cgaagttgtc 10501 aatcgcagcg ccgtccgagg cgaccaactt cgcctcctgc agccatgtcg atgtgccgga 10561 tagctgaaac acatacgccg agcctcggtc ggcaacggcg ccgatgtcgt ccgtcggagc 10621 acccacgatg cactggttgc cgaacaggct cacggcgtaa ccgaagtagt cggatgccgc 10681 cccatcgttc gcaacgagct tcgcttcttg cgtccaaaca ccagacgatg agcgccgcca 10741 catgtaggaa gaaccgcggt actgcgcgcc ggcgatcgtg tcgaagggtg caccgatgag 10801 cgctcggtca ccgtaggctg acacggacca gccgaggttc tcgaatgcac ccgcatcgga 10861 gcctgtcacg gcgatctgac cctcactcgc ccaggtgccg ctcgtgtttc tcttgaacac 10921 atagacggag ccctgatcct gcagcgatcc cacatcgttg agcggggttc cgacgagtac 10981 gagatcaccg aaggaactgc tgccgagtcc cgtgttggtg 
atggccacgg atgcaccgaa 11041 gctgtccgtc gttgcggcat cggtcgccac cagtttggtg atctgggacc acgtgccatt 11101 ggcattgcgc ttgaagatgt aggcggagcc ttgatcggcg agcgtgatgt catcgaaggg 11161 tgcgccgacg acgattgtgt ccgcgtagat cgccaccgag tagccaaacg cgtcgctcgt 11221 caccccatcg gccgccacga gctttgcctc gaagacccac tgaccctgtg cgttccttcg 11281 gtagacattc gccgaaccac gatccgcttg cggactgtcg tcgtatcgaa gacccaccac 11341 ggcgatgtcg ggttctgcgg ccgtgatccc agcgaggttc gagagcgcca cggcgttccc 11401 catctgatcg ccagctgcgc cgtccgttgc aaccagcttg ttggcgagaa taggtggaat 11461 cgcgtgcgca accgacagtg gtgctgacac gacgagcaac gaggcgatca ggaatcgata 11521 ggggaaggaa gtgtgcatcg gtgactcgcc gggccccgta gtctcgcatc ttcgccacga 11581 cattccaagc gcgtccagtc ctttttgttt tttccccccc tccgccgacc gccccggcgc 11641 aaaaaaggcc gatcggaccc cacctgcgcg agtgcatacg ctttctccac agtgttgtgg 11701 caaccgacca tccccaactc ttaccgcgcc gcgagcccga atcagctgca cgagcgcatc 11761 gaagcacggc gacgcgagat ggggaatcgg ctgctgatcc tcgggcatca ctatcagcaa 11821 gatgaggtga tcgcccacgc ggacctgacg ggtgacagcc tcaagttgag tcaagctgcg 11881 gcggtggagg ctgcgcaccg tgggacggag ttcatcgtct tctgtggcgt gcacttcatg 11941 gcggagaccg ccgacatact cacgccggaa gacgtccgcg tgattcttcc ggatctcggc 12001 gcaggctgtt cgatggcgga tatggccgcg tatgacgacg cgctcgacgc atgggagttc 12061 atcggcaagg cgatccacga tgggcgcgag aagcggcggg tcgttcccat tacatatgtg 12121 aactcgagcg cggcgatcaa ggccttcgtc ggttcgcatg gcggggcgtg ctgcacaagt 12181 tccaatgctc gcatggtgtt cgagtgggcg ctgcgcgggg gggagaagcc gtgcgcgaaa 12241 ggcgagcgga ttctcgtgct gtttcttcca gatcagcatc tcgggcgaaa cacggcgcat 12301 gcgtgtgggc tcgtcagcga gatcgatgcg cgtgcgagtg gcaagccggc gcagatggcg 12361 ttgtgggatc cgaagcgtcc gcacggcggg atcactgcgg aggcggttcg gcatgtcgat 12421 gtgctgctgt gggctggcca ctgcagtgtg cacaagctgt tccgtcccga acatgtggat 12481 gccgttcgtg cacgggccga tggcacgcgc gttcttgtcc accccgaatg catgcaggaa 12541 gtcgtcgatg cagcggatct ctcggggtca acggagttca tcattcgcac gatccaagct 12601 gcgtccgcag gcgagcgttg gtctgtcgcg accgaagtgc atctcgttgc gcggattgcg 12661 cgcgctgcgg cggagcgggg 
cgttcacgtg gaactgctgt cgggttgtca gtgcctgtgc 12721 accacgatgt accgaatcga tcagccgcac ctggcgtggg ccctcgatgc cctcgccgag 12781 ggccgcgtcg tgaacgagat ccgcgttcac aaggatgctc gcgtgctcgc gcggagtgcg 165 12841 ctcgaccgca tgctcgcact caccgaatcg agcgcagtgc gttcggattc agcacgactg 12901 gtggactgat cgcagtcggt gcgggataca ctcgacgccc gccatgtcga gcagcgaacg 12961 agagacaagt gcccgtccac gaaagcgact gctggtgtgg ggcggtgctg ccgcgggcgt 13021 gctgttggtc gcaatcggtc ttgggatcgc gctccttccc acgctgctgt caggcgcgtg 13081 ggggcgatcg ttggtcgtga atgcgatcgc gccctcggtg caggggagcg tttcactgaa 13141 ggagatgtcg ctgtcgtggg gtggtccgca gcggatcagc ggactggaga tcacgagcgc 13201 aagcggggac cgaatcgcgg ccgatgtgca agtggatggc agcctgtggt cgctcgtgaa 13261 gttgagcgcg ccagtgcgag tgcgtctgtc gggtgccgtt cggagtggca ccgccgacga 13321 tggttcgctc acgatcctgc ggctgctgcg gggaagttcg ccgagcagtg cgccgaccgg 13381 gtccactgcg cccgcactgc ctgtggccac atccacggga accaccaacg cgtcgctcga 13441 ggcgctccgc ggttcggtac tcgagattgt tcgcgttgat ctggaggtga ctgggagcgg 13501 agaccgccct gcaatctccg tgacgaacct gcacggcacc tgcacgctgg agacggatgg 13561 atgttcggtg gatctcaacg cagacaccaa ggtcggcgag cgcgcaggca acctgagcat 13621 ccgagggagt gcggcaaagt gtctgctgcc gagcggcgag ttcgcggtgg acacgatggc 13681 gctcgacttg agtgtgcagg caagtgcgct cgcagtcccg gcgaaggggt tcccgctggt 13741 gatcgatcgt ctgcagtgga cggttgtctc gcgcgcactg cgtgacgagg tgcgtgtgca 13801 gggagaaatc gcggttgcgc ttcccacggg agagcaggcc gtgtccaccg tcgatctgaa 13861 ggctcgaacg cccttcgatg cggcagcgcg ttccgtcgcg ggatcggttc ggatcgagcg 13921 gttgcccacc tctgcgctcg ctccatatct cccgcgagcg atggatgcgc agcgtgacat 13981 cggtccttcc atcaccgccg tcctcgtgct ggatggaacg caagggacgc tgcagttgca 14041 ggcggattcc atccgtgtga acgccgccgc gagtctggag gagggtggca cgaagctcgt 14101 tctgtcgacc acgagcgtgc aggctgccgt gcaacctgcg atggttccaa gtctcgggct 14161 cgcggagccg ctggcggtgc aggtggatat cgccagcgcc gcgctgccgc tctcgatcgg 14221 cggtgccatc gactggaacg cttgcgcggt gcaggggcgg atcgcaacgg gcgctgcggc 14281 gattcgttgc acggacgcga tcctgctgcc gctgggggcg accacggttg atgtgcgtgc 14341 
cgcgggccat gcggtgccga tggagattcg catcgatgga tcggtagggg gtactgcggt 14401 gacagccgtg cagtcggtga cgggatgggc acagttgctc gccggtcagt cggacgcgct 14461 cggtgccaag ggaagtcttg cgattgcgcc actcgacatc accaaggcat cttggttgcc 14521 cgacgacatt cgaacgatgc tgctgcagtc tgccgtttcg agcgtggcgg tggtggtgga 14581 gcaggatgga accatggcga gcggtcgcgc atccctgcga tgcacgctcg gcgacgcagt 14641 cgtgtccaca cgcacgacgt gggacaagga tgcagtcgcg acggagccga tcgattgcac 14701 gctcacggtt gcgccgagtc tggtggcgcg ctttgcaccc gcgagcatcg ccctttcgga 14761 gccggcgcgc gcggagattc gcgtcgacgc gctgcggatt ccgtggcagc agatcaacgc 14821 cgggcagtat ctgccgccac tcgtcgccgg aaccgtccgc gttcccatgt tggcggtggc 14881 gcgctcgccg ggactcgctg cgcccgcgac gctgcgtgat gtcgaggtga agggttcgtt 14941 cgctcccgcc gtcgaggcga agggcgctgc actttctgcc gtcctgaccg cgcgtgtgct 15001 cgcggcagcg aatgacgcgg ctcagttgcg cggttccgtg gagtggccgg acctcggggc 15061 tccatccatg cgcggcagca ccgaggtgac gatggccggt ggatcggcgc ttgcgtcttt 15121 gctcgacctc ggacccgcgg ctgcgctgct cgcaggtcca ggctccattc gagcatcggt 15181 ggaacgcgcc accgccgatc aattcgactt cgatgtttcg ctgccgcgcc tcaagctcaa 15241 gggtgccggt ctcgcgacgc ttgatccgaa gggcgccacg cccatgcgca tcgaactgaa 15301 gccaacgcag tgtgcgttcg acttgcccgc ggaagtcgtc gagcaatggc tcggcctgcg 15361 aagcggcgca gactggcaac agttgatgac cgcagcaact ggaagggcaa tccgaggaac 15421 gatctccgtc gagtcgtgcg cctggcgcgg atcgatcgat gatgcatcgg cagtcgcaac 15481 ggtgaagatc gccccgggtt cgatcgaagc acctggtcgc gagaaggtgc agttcgacga 15541 ggtgacgctg tcggtgaaga gcccacgcgt tgccgagcga gcgcaggcca gcctcgacgg 15601 gagttttcgt gtcggcgctg gcgcccctgg gacattgacc gttgcgctgg atgcgcgcgg 15661 cgatctgcgt gcgatcgcgt cgaccaccgc gtcggcgctc tcgctgaagg acagttcgct 15721 ggggatcaaa gtccctggcg cgctggcggt cgaactgctg cgctggtcgg ggctgcagtc 15781 gccgagcgcg aacggcgccg tttccccatc gatcggcgac atcaaccttg catgcaatgt 15841 ccgctcgctg tcgctgccga tgtcgagcgc ggccggcgga agtgccgcca tgcgtctcga 15901 cttggctccc tgtgccgtgc agttgccggg gaagcctcgg cttggcatgg gcgcgctcga 15961 agtggtggtt cgatcatcgg gactgaaccg cgaactcaac gctgccgtcc 
agggaacgct 16021 ggcggtcggt gatgcagcgc ccgcaccgtt gaccgccgct gccgcgctca gcggcgattt 16081 gcgcgcgctc ttcggcgcgc cgaacatccc catcacgctc gccgacagcg aagcgagcct 16141 gcagatgcca ggcgcgttgg cgctcgccct cgtcgatctc gtcaatgacg atgcgagtgc 16201 atccgccgcg ctgcgtcgcc tcagtcccat cgatgcgcgc atgcgcgtga agtcgatgac 16261 gatgccagcg tcggggcgca gccggcagcg cgttcaccgg tgaactcact ctcgcgccgg 16321 tggaactcac gccgagcagc ggtcctgccg tgtcgcttgg cgcaacgagc agcaaggcga 16381 gcaccgtgcg cttgggcgat gcgatcgacc tccaagtcac gagcagcggt ggaaatgccg 16441 gcaccattcg cggcacggtg gttgggcggc agctggcggc ttctgacgga tcgctcgcgc 16501 ctgctgatgc gctgtggaac gcgcaggtgc agatgaacgc agtgccaact gcgctcgtgg 16561 atgccgcggc tgggcagcgg ggcgaactgg ccgagctgct cggtccagcc ctcgacggtt 16621 caatcgatgc ggtctcctcc ggcagcggac cacagcgaac cacgaggatc cgtgcgacgg 166 16681 caaagtcaca aacactcgac ctcaaggcgc cgcaggtgga cattgcccag gggcgctgca 16741 tcgtttcgcc cgccgctccc ttcgaggcaa cgcttgccct caacgaagcg gtgcggcgaa 16801 ggatcctcaa gccagtcagt ccgatcctgt cggagatcgt ctctgcgccg ccacttcgct 16861 tcacggtgag cagcctgtcg gcaccgatcg atggaagtct gtcttcgctc gacctcgatg 16921 gccgaatcga tgtgggcgat gtggaactcc ggcgccagag cgatggaatc gcgatgctcg 16981 ccgtcgcaca gcagcccggc gtgacgaccg tgccggccca gatcgaccca ctcatcctca 17041 ccgttcggaa ggggcgactc acctacgcaa acttcatcgt ccgcgtggcc aaggtcggac 17101 cacagtggca gcaggtgttg aaactgtcgg gtgatgtgga tctcgcgaaa accccgccgt 17161 tcgcgaatgc gatcaccatc cgctacccga tcgcatccat cggacgcacg gcaacgggag 17221 ctggttcttc gcaactgcca gaggtgacag agatcaatcg gctcatcgcg cagctgccgg 17281 tggatcccgg cgaacttctc gacgttgatg tgacattctc tggtccgctg ggcaaggtcg 17341 atgggaagga ggtcccgctg cagcgcagca ccaagttggt gttcgaccca tcggcgatcg 17401 atgcgaagaa gatccagaag gggatcgaga acctcccgca gaccatcgac accttcaaga 17461 agttgttcgg tggttgatcc cttcgagtcc gacgcagcca cgccggggca gcgcgatcac 17521 atcggctttt cagcagccgt gtggtcgtag atctcgagca acttcgactt gtcgaagatc 17581 agttccgcac gggtgagacc cgtcacttcg tagtgcggcg tcagttcgat cgcgtccacg 17641 gggcacgctt cctcgcacat gccgcagtag 
atgcagcgca actcatcgat gtcgaactgc 17701 ttcgggtact tctcgcggtc ctcccaaggg ctttcctccg cgacgatgtg aatgcagttg 17761 gcggggcagg cggtcgcgca catgaagcac gccacgcatc gaacgcgccc atcctcgtcc 17821 ttgttcaacc tgtgcacgcc gcgaaaggtc ggtcgaaact gaccaccctg ctcgactggt 17881 cggtcatcgc gcttctgctc ggggtactgc accacccaga tgttcttcgc attgcttccg 17941 ttgcgtccaa agttctccac gaagtggcgc atcgtcgtgc ccagcccctt gatgatgctg 18001 ggcacgaaca ggctttcgcc gcgcgagagc ggcttcggtg tgacatcgac gatctggtcg 18061 tcagggatgg gcatggcggg ggaaggatag ccgcgccgcg ccgtacgaag ccatcactac 18121 catcgcgagc atggatgccg acggctcttt ccggaagaca tcgtcgagcc gcggccaaga 18181 cgaggatgta cgcgtgggct ggcgaatggc tggactggcc tggcgtttca cgaccgaggt 18241 gatcgctggc gcagtcctcg gatggttcct gggatcgtgg ttgggagacg caaacatggg 18301 cgccctggtg ggcacgggta tcgggctcgc tgtcgccatg tacagcctga tgcgcggagc 18361 actgaagctg aacgcatcac tcgatcgcct cggcggggtg cgttcaaaca cgaagggcac 18421 ttcgaaggac ttcggattcc catcggatcg tgggcgcggt ggggatgcga catgaatggt 18481 gaatcgcagg agatgcaacg cgctgcccgg acgccccagc tcggaccgcc actttccatg 18541 cgctgggtgc tcgtcgccgt ggcgatggtc ggcgcggcga tgctgtttgg ctgcgcttgg 18601 gggttgattg cgcctcgtgg cggatggatg ggatggccac tcgcgatccg tgtggtgccc 18661 gtcgtgctcg cgatcgcgct cctgggttgg ttggccgcga tgccttggcg accccgtccc 18721 gcagtcgact gggtgacggt gtggcttggt ggaaccgtgg ctcggctgtt gctgacccca 18781 gcggcgtgct tcgggctata ctcgatcttc ccctgcgaca cggtgcagtt caccgccgca 18841 gcagggggat gctgcttcgc ggtcgtcctc gcggaagtcg gcatgatcgc cgccacgctg 18901 cacacccgaa gcggtccggc agcccgttga ggcgcggcat cgtggggacg agcgaagtcg 18961 agccgcggtc ggcggttgac ttcgggaaga gcccgaccga gtgagagacg acggaatgat 19021 ttccctgctt gcgagcgcgt cggacccgat gtcgcatctg gtggacaagg ggttgcatca 19081 gtggaacatc ctgggcaggc ccgtgatgct cagcatggtc acgatgttgg tgaccgccat 19141 cgtgctcgtg gtgcttctgc gctttgcggc gaggtccatc gccacgggtc ccgagtcgat 19201 gggcaacaag cggttcgtca cacggtccgc gttcggggga gtgatcgagg tcatggtgct 19261 gtacttgcgc gacgagatgc tcggcccggt catgggcgca agattgactc ggaagtacct 19321 gcccttcctg 
ctctcgatgt tcttcttcat cttggcgctg aaatcattgg gctgattccc 19381 ttcgtcgaag tgcaagagtt catttcctgg cgtcgcggcg agcagttcaa cctcgccgag 19441 aacaccacgc tcgccatctt tggcggcgca gcgacggcaa gcatctcggt cacgggaggc 19501 atggcggtca tctcgttcgt gctgatccaa gtgcagggct ttcgcgaatt gggcgtgaag 19561 ggttggcttg aacacttgtg cgggggccat gacctcgtcg gcggatcgcc gctcctctat 19621 cccgtcgcac tcttggtgtt cgtggtggag ttcttcggac tgtttgtcaa gccggctgca 19681 ttggcgatcc gcctcttcgc caacatggtg gccggtcaca ccctcctcat cgtgttcact 19741 tcgtttggcg cgttggcggc caacgcaggg ctcgggtggt ttggcgtcag cgcgatcacg 19801 gtggtcagcg ctgtgggctc gatgctgatc accctgctcg aagtgttcgt tgccctgctt 19861 caagcattcg tcttcatgtt cctcaccgcg gttttcatca gcctgatggc gcacgaggac 19921 catgggcagg atgaagccca cggggaagat gcggcgcatg cgcacggagc cccggcccac 19981 tgatttttgg ctatattctt tcccctccca cgggcgatgt tcgcctgtgg gctcgacctg 20041 agaaaccttc tgcaatctga ttcgcaacgg agatctgacc ccaatgaaaa tcgctcatct 20101 cgctctcgcc accctcgtcg gaacggtctg cgccacggct gcccaggcag ctgacccagt 20161 tgcgcctgcc gcatctgcag ccaattgggg cgtgggtatc ggcgcgggtc tcgcctgcgg 20221 actcgctgcg atcggcgcgg gtttcggcat tggtcgtatc ggaggttcag ccgtggagtc 20281 catcgcgcgc cagcctgaga tggcaggccg catcttcatc aacatgctgc tcaccgctgc 20341 cttcgtcgaa ggtgtggcgc tgttcgcggt ggtcgctggc ttcctcaact tcggcaagtg 20401 atcatcggtc gaccgttcgc ctcgctgcag tgcagacgag gcgaacggat tcgccttgga 20461 ggttcccgtc catgatgttg gcctcggcaa atcccctgga gttcaacctg ctccccttca 167 20521 tcaccacgat ggtggtgttt ggcacggtcg cggcggcgct tggcttcttt gtgtggccaa 20581 agatcctcaa ggggctggat gatcgcaatg caaagattct cggggagatc gcagctgcag 20641 aggcggctcg gacagccgct gctgcgaagc agaaggagtt cgagcagaag ttgcaggagg 20701 cgatggaaga gtccagccgc atgattcgcg aggcaaaagc ggaagcggtg cgcatgggcg 20761 aagagcttcg ggtacgcagc gaggcggagt tggccgagcg cgcccgccgc gctcaagacg 20821 agatcgagag cgcgcgtcgc acggcagtcg ccgaattgga atcgcacgcg gccacgcttg 20881 ccgtctccat cgcat
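The BASE COUNT line of the record above (3514 a, 6386 c, 7068 g, 3927 t) gives a simple check on the ORIGIN block: the four counts should sum to the number of bases recovered from the sequence text. The following is a minimal sketch of that check in Python, not part of the original analysis. The function names `clean_origin` and `base_counts` are illustrative assumptions, and the sketch assumes the text passed in starts after the ORIGIN keyword, so that every letter that remains is a base.

```python
import re
from collections import Counter


def clean_origin(text: str) -> str:
    """Return the bare sequence from the text of a GenBank ORIGIN block.

    Position indices, spaces, line breaks and any other non-letter
    characters (for example stray numerals left by text extraction)
    are dropped. Assumes the ORIGIN keyword itself is not included.
    """
    return re.sub(r"[^a-z]", "", text.lower())


def base_counts(sequence: str) -> dict:
    """Count a, c, g and t in a cleaned sequence."""
    counts = Counter(sequence)
    return {base: counts.get(base, 0) for base in "acgt"}


# BASE COUNT values as printed in the record for Clone 543.
declared = {"a": 3514, "c": 6386, "g": 7068, "t": 3927}
declared_total = sum(declared.values())

# First 70 bases of the ORIGIN block, including their position indices.
sample = (
    "1 gctcggatgc gttcgttcgc cgtggtgatc gtctcgacat cgggcacctg cgccttcgag "
    "61 agcatgcggc"
)
sample_sequence = clean_origin(sample)
sample_counts = base_counts(sample_sequence)
```

To check the whole record, pass the complete ORIGIN text to `clean_origin` and compare `base_counts(...)` with `declared`. A mismatch would point to characters lost or added when the sequence was copied.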

Appendix V: Nucleotide sequence of Clone 578

LOCUS       CBNPD1_Clone_578 22169 bp DNA linear ENV 30-NOV-2006
DEFINITION  Uncultured organism CBNPD1 BAC clone 578.
ACCESSION   EF157669
VERSION
KEYWORDS    ENV.
SOURCE      Uncultured organism CBNPD1 BAC clone 578
  ORGANISM  Uncultured organism CBNPD1 BAC clone 578
            Uncultured beta-Proteobacteria; environmental samples.
REFERENCE   1 (bases 1 to 22169)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Metagenomic Analysis of a Toxic Cyanobacterial Bloom
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 22169)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-NOV-2006) The Microbial Gene Diversity and
            Discovery Research Unit, Griffith University, Kessels Road,
            Nathan, Brisbane, Queensland 4111, Australia
FEATURES             Location/Qualifiers
     source          1..22169
                     /organism="Uncultured organism CBNPD1 BAC clone 578"
                     /mol_type="genomic DNA"
                     /isolation_source="Toxic cyanobacterial bloom"
                     /environmental_sample
                     /plasmid="pIndigoBAC-5"
                     /country="Australia"
                     /metagenomic
     gene            2416..5073
                     /gene="578-1"
     CDS             2416..5073
                     /gene="578-1"
                     /note="'Putative leucyl aminopeptidase precursor'"
                     /codon_start=1
                     /product="Clone 578-1"
                     /translation="MAPPLLELVPMRPLACTLAALAFQASAQSPAPTPVPSPPADPAL
                     APLLQAIDPALLRRHIDTLAAFGTRHTASDTASDTRGIGAARRWIEREWRACAKDTPL
                     EISTRSHTEPVGRRIAAPTEIVNVLATLPGRDPKRFLVVSGHYDSRNADVMDAQGDAP
                     GANDDASGTAAVMAMACTMARSTRQPEATLVFAAVSGEEQGLLGAAQLAKELDIDGQT
                     VEAMXTNDIVGSPSGAQGEHAPDAHSAVRRRPRPAAAPDAARRASRALAPSQSAASPG
                     RRWWRRRPARRAAGPPPGPRHRGLRAGHANRPHPAPRPHPARRRPPALPRTRPGRSAL
                     HRALRKLRQPAPERAPGKRPACGRPARVCGRGLPRPGDPRQPRRPGHFGLGSAAARAG
                     AGGRQRTEQRHPPDLGVQPRRQRLPRAVAPQRVGPLGMGPRPARQRPRGGDRRCVARR
                     RGLRRAGAGRSWPQQLGGVRTAGEPGALTARGLERSGEIKLGRALARSIQRPDESLRG
                     RHHAVLELVAGWRLVRLRLGPGHRGVIEQGLEHHRPHVGRHARHPQFAFVHGRPHQLT
                     LGVQVDPKGHPTRHPEALGQFGVAIVGLERRASRFATDPQQVGQSDGQALAKARQGLR
                     LTGRIALHLGPRQFVAQGPVVQQTAVQRTRPGDFAPLARPGLRAVAAQQARQGLQPDG
                     VAAAHPLDQCRHGLGLGHGQRTGLTPARQQGPLVRQGLGQSRCRQAAQTPQTLVHFRD
                     RLGHRVAGGLALQRLQHQGVVRRPQAVQAPQAKLVVDARHHPVGRQGRLGPLLLLAPP
                     GQHQRPHARQVLGHHGGGRLPSGLVVGGIHAPMRQQVLRDEPIHPGFGRRRLRFRRGG
GLSRPGGEACPHAQCAQQPQHRRTQAANP" gene complement(5125..6489) 168 /gene="578-2" CDS complement(5125..6489) /gene="578-2" /note="'Putative D-lactate dehydrogenase'" /codon_start=1 /product="CLONE 578-2" /translation="MMGLEAFIRAAGSVVDTEVEPRYLSGARYGTGGSPAVCRPGTAA DVAHLLQRAQEHGVRLLPQGAHTGLVKAATPQGEVVLSTERLRGVFELDALDRTLRVS AGFRLSEVNERLRPHGLQFAIDLSADPSIGGMLAHNTGGTRMCRYGDVRANTLALEVV LPDGEVLTLGRGLAKDNAALALQHLFIGSSGALGVITEATLRLQALPRHTAVALVAPA SLEAVWPLYQRFTQAWGGLVSAFEGLSANALAAAIEVRGGASPFAQGLPPYSLLIELA ADFDGVDLRALLHGELEAAFERGDVLDAALDKDDALWALRHAVSEGLRERGAVIGFDI SLPRRAVWAFRAEASDWLSGEFPPAQVCDFGHLGDGGQHFNLVWPRAAEPLEAQALQR LREGVYARVAAHGGSFSAEHGLGPLVQGTYRAHTASALRQASARVVAALTQGALGPGT FDFG" gene complement(6486..7514) /gene="578-3" CDS complement(6486..7514) /gene="578-3" /note="'Conserved Hypothetical Protein'" /codon_start=1 /product="CLONE 578-3" /translation="MSDAPRNGVHRLVQGLLGASAAEALVAQLEIDPRLIRPIGETVS RAEMAQALNMVLFFDLHERVPEGHDYVQDVMAAGQKITFDHGALRTVAWPSGALPPGE AAISRVLRPLGFTVADTYPLPRLKMTGRAWAHADFPETIAQFFVSELHPERFSAQFQA AVTRVLADSVDPLGPQDIGRLERLGRDAALPWSDALALMPKIVACFGRQHGGFALADY ETLLAESAEMAWIATEGNAFNHATDRVSDLQALSEAQKRKGRSIKDTIEVSGSGRVMQ TAFKAAQVAREFLHEGEWVTRHVPGSFYEFIQRERLQCGRLDLAFDAGNATAIFKMTT HEALEILN" gene complement(7507..9063) /gene="578-4" CDS complement(7507..9063) /gene="578-4" /note="'Putative aldehyde dehydrogenase'" /codon_start=1 /product="CLONE 578-4" /translation="MSTRCTKDRMTPMSSPTAASLLQGMGVAPALYTDGTLAARSPVD GAVTGRVVEASAQDMQAAIGRAHSAFLAWRVIPAPKRGELVRVFGEVLRAHKADLAAL VSLEAGKIASEGLGEVQEMIDICDFAVGLSRQLHGLTIASERPGHRMMEQWLPLGVVG IISAFNFPVAVWAWNAALALVAGDTCIWKPSEKTPLTALATQALFDKAVAAYSAAGHT APEGLSQVLNGGAAVGDALVTDARVPLISATGSTRMGRIVGPKVAARFGRCLLELGGN NALIVSDKADLELAVRAIAFGAWGTAGQRCTTTRRVIAHHSVHDALVERLDKVRVQLK IGHPLHDGTLVGPLVDRQAFEAMQTALDAARSQGGQVRGGERALAADHPEAYYATPAL VTMPAQTDVVCHETFAPILYVLKYSKLDEAIALQNAVPQGLSSAIFTTDLREAEAFMS ASGSDCGIANVNIGTSGAEIGGAFGGEKETGGGRESGSDAWRQYMRRVTNTVNYSNSL PLAQGVKFDV" gene complement(9079..10173) /gene="578-5" CDS complement(9079..10173) /gene="578-5" 
/note="'Putative Dehydrogenase'" /codon_start=1 /product="CLONE 578-5" /translation="MKIALLGAGHIGQXXXRLLHQSGDYRVTVVDKNAQYLSALAAEG IATAAVDTEDTAALAAQLRGQDAVLNALPYHLAITAATLAKECGCHYFDLTEDVAATK AIKDMADGAKTAFMPQCGLAPGFIGIVAHHLAKQFDEVRDVQMRVGALPAFPTNQLKY NLTWSVDGLINEYCHPCEAIHGGEFISALPLEGLEHFSLDGTEYEAFNTSGGLGTLCE TWAGKVRNLDYKTVRYPGHRGLMKFLLEDLGLAADQEKLKDIMRKSMPATMQDVVLVF VTVSGMKNGVLMQEVFARKIFADRDVKSPLSAIQITTAAGICAALDLFREGRLPQQGF IRQEEVALPDFLANRFGRAYQQSRQVESIA" gene 10290..10775 /gene="578-6" CDS 10290..10775 /gene="578-6" 169 /note="'Putative HTH-type transcriptional regulator'" /codon_start=1 /product="CLONE 578-6" /translation="MDVCFSFLYEMGKTMVHALDGLDRELIAALQSNARMSTTDLAKR LRVARTTVVSRLSRLEAQGVIVGYTVRLGATEGREGVQAFVNLSVSPKAARSVVDRLS LFPELRQLAAVSGEFDYLAVLRASSTQRLDALLDEIGEIDGVIRTTTSVLLAMRIDRI A" gene 11584..11928 /gene="578-7" CDS 11584..11928 /gene="578-7" /note="'Conserved Hypothetical Protein'" /codon_start=1 /product="CLONE 578-7" /translation="MALPYRTTGSLSPSFLPARLVSLAVKHAYAYALSARFPTVPSVP SNSSVTLWEETAPVKLPTIHCPQPGSRAKVRTSNTPGWYFNVGSMRTSVRTSKPPTYP TQICSKSNVKLQ" gene 12578..12907 /gene="578-8" CDS 12578..12907 /gene="578-8" /note="'Conserved Hypothetical Protein'" /codon_start=1 /product="CLONE 578-8" /translation="MIGTGILTCFPSATHLCLALGADSPYADERCVGNLALTARGLFT PFNATHVSIRTSDTSSRLHNPPSQAYRTLSYHAHCCASAASVTGLAPLHLPRRTTRSV SYYAFFK" gene 13451..13831 /gene="578-9" CDS 13451..13831 /gene="578-9" /note="'Conserved Hypothetical Protein'" /codon_start=1 /product="CLONE 578-9" /translation="MYACGFRIYFTPLPGFFSPFPHGTCSLSVNYEYLALEDGPPIFR QDYTCPALLVVRSVPQIAFLIQGYHLLWPDFPFRFDMQTAKSHRLLRFRSPLLSESRL MSVPRATEMFQFARFASTPYVFRC" gene 13815..13964 /gene="578-10" CDS 13815..13964 /gene="578-10" /note="'Conserved Hypothetical Protein'" /codon_start=1 /product="CLONE 578-10" /translation="MYSDVDTLAGGFPHSEISGSKLICQLPEAYRRLSRPSSPVIAKA STTCT" gene 14136..14465 /gene="578-11" CDS 14136..14465 /gene="578-11" /note="'Conserved Hypothetical Protein'" /codon_start=1 /product="CLONE 578-11" 
/translation="MLKSSSPGIAWFLANMNTHRALMFASVVTLRWWRMTGSNRRPPA CKAGALPAELIPQSRIKAVVGLVGFEPTTPALSRRCSNRLSYRPTSVLARSRQSTRLS LLCLCSR" tRNA complement(14236..14308) /product="tRNA-Ile" tRNA complement(14330..14403) /product="tRNA-Ala" rRNA 14551..16070 /product="16S ribosomal RNA" gene 15996..16472 /gene="578-12" 170 CDS 15996..16472 /gene="578-12" /note="'Conserved Hypothetical Protein'" /codon_start=1 /product="CLONE 578-12" /translation="MPRVTVRLACVRHAASVQSEPGSNSTVRSVSLNGIEVNFTSISV SASSKYSKLLRSCSTSSAHAYRLQIVKELRKTNQLLGVCCGALFVCSAQKRDYCSVLK RLSIPSKKFFEVAGLSRVPHPLSSAGQSSPKSRTVSPPPPAPPPSGQVRKQRSPSS" gene complement(16572..18323) /gene="578-13" CDS complement(16572..18323) /gene="578-13" /note="'Putative Phosphoenolpyruvate-protein phosphotransferase'" /codon_start=1 /product="CLONE 578-13" /translation="MSFQVFGIAVSQGVAIGRAVLVGAGRVDVAHYFVEPDRVAQEVA RCEGACAQVAEELVALRDELPADAPPELAALLDVHLMMARDEAFVGGAREWITERRYN AEWALSAQMEALGRQFDEMEDDYLRERKVDLEQVVERILAVLSRQAKGGGESLPAALP RDFGGDDPLLLVASDIAPADLIGFRQGVFRGFITDVGGRTSHTAIVARSMNIPAVVGA REASRLIRQDDWVIIDGDAGIAIVDPSPVVLAEYRFRQRQAELEQERLSLWRHRPAVT LDGHPVEMLANIELPADAEAALAAGAEGVGLFRTEFLFMNRGGKLPTEDEQYLAYRAA LENLQGRPLTIRTVDVGADKPLDRSHGSDHQLNPALGLRAIRWSLAEPVMFRQQLRAI LRAAQHGPVRLLIPMLAHQSELMEVRSTLDRVRAQWSTHSDQPCPRVDLGIMVEVPAA ALILDRVLPWVDFVSIGTNDLIQYTLAIDRADEAVAHLYDPWHPAVLQLIASVIRQAR SAGKAVCVCGEMAGDLAFTALLLGLGLRSFSMNPPQITLVKQRILRADAAKLQEMAAD VLAHEQPQERARQLGLA" gene complement(18364..18633) /gene="578-14" CDS complement(18364..18633) /gene="578-14" /note="'Putative Phosphocarrier protein'" /codon_start=1 /product="CLONE 578-14" /translation="MLTSVLTISNRLGLHARASAKLTKLAGSFASEIHLSRNGRRVNA KSIMGVMMLAAGMGTQVELEADGPDEAAALDAIAALVNDRFGEGE" gene 19323..20786 /gene="578-15" CDS 19323..20786 /gene="578-15" /note="'Putative Histidine Kinase'" /codon_start=1 /product="CLONE 578-15" /translation="MPGQGVEAAFGLALGEQARPARRGALLLGFAREASQHIRLGPTE LAPIVRRPAGGQDDDTEPGEPNCEQSKQGRRRAHRQTIQTPCSWGGRKTAEASHYLAF FCGPIVTLPARKPTFDPPDAPDGALSTELALLLTQLSHEAAQPLTEALDRMQSLLRTV 
DAGSAARLSMIREPMRRARDAVLLASQIGRLASGRVRPARDPVALHASVHQVAEQRRR EAMARGLQIRTQLLAAEAEADQALVNSLLHALLDWALWHTRSSIELSLVLTPWPVRSR LECRFAFRDLDQFDAPVQLPKLDDLRWRLVQQLCEVLGLECQRLDEGGVSVTRIDFPV PQLKEWVAQIDLSRTDIDPGLNTQPFAGMHALLLSADPGLVGEILPILERLGWTVDTV ASVDEAFQLCLDGLPQAVVTDAVLRGPDLDQWCTHVMAEAPSLCFVEVVSADSAALPF ARTRGLQRCQRERLSQDLPVLLRAALEPGEQALTLRL" gene 20803..21315 /gene="578-16" CDS 20803..21315 /gene="578-16" /note="'Putative Thioredoxin'" /codon_start=1 /product="CLONE 578-16" /translation="MPAMTPMQRRAALAALGLGAAGFGAWWWQAQALPLAPAVRFTLL DGSTHDSVAAWRNKVVLVNFWATSCATCVKEMPDIVSTHRRFAGRSFDTLAVAMQYDP PDYVVQFAQTRQLPFGVAIDNTGAIANAFGPVRMTPTTILIDRRGQIVKRILGEPQFA QLHELIESLL" gene complement(21328..22032) 171 /gene="578-17" CDS complement(21328..22032) /gene="578-17" /note="'Putative Lipoyl synthase'" /codon_start=1 /product="CLONE 578-17" /translation="MIMGDKCTRRCPFCDVGHGRPDPLDPEEPKNLAKTIGALKLNYV VITSVDRDDLKDGGAQHFVDCIRETRVASPQTKIEVLVPDFRGRDDRALDILKAAPPD VMNHNLETVPRLYKEARPGSDYAFSLNLLKKFKAAVPGVPTKSGLMVGLGETDEEILA VMRDLREHGVDMLTIGQYLAPSGHHLPVRRYVHPDTFKMFEREAAAMGFSHAAVGALV RSSYHADRQAEQAGAV" BASE COUNT 3922 a 7611 c 6730 g 3906 t ORIGIN 1 ctcgtccgcg gccagcccac ggccgtcgtc ggtccgcacg aaggtggcca cgccttgggc 61 atcgcggcgc aacaccacga gggggcgctg cgcgccccaa ctcgcgcgcg gcggcgcggc 121 gtcggtgggc acgctgaagt tcaaacgcat gtagtcgccc tgcatcaggg atcgcgggtc 181 gaccggggcc aggggcacga acaggggctc cccctgggcg atgagacgct ccttctgggc 241 gatgcccacg ttggccacgc ccagggtgag cgccagcgcg agcaaggcgc cccagcggcg 301 gtccaggccg gccgcgccag gcacccagcg cggccggacg atggcggcca ccagcaggcc 361 cagcgtcacc aggcccaggg ccttgtgcgc cagcggccag gccagttggt agtagaacgc 421 ccccaggacc cacagcgctg ccaggcccgc ggccgcagcc aggcgccagc gctgctggcg 481 cgcgcacagc gcggctgcca acgcgatggg gcccagcgcc ggcagcacca aggccaaggt 541 ggccagcagc agcgccaaca gcgcccaacg cgcagcgcgc aggggcggcc aggcgcgggc 601 cagcagggca aagcccagca ccgcagtggc gccgctgccc agggccagca cggcgctgcc 661 ccatggggtg tcgccacggt gcaagccgct caatcccgcc accaggaacg ggctgccgct 721 ttgcacggcc agcgcagcca gcaccgtcag 
cagccagccg gccccagtgg accaccaggc 781 ctgcgcgacg cgggccgaca ggagtgcacg cgccaccccg gcccatacag cgagcagggc 841 cgtggccacg gcccactgcg gccagcctgg caaggaccag ggccccgacc atggtttgct 901 cagtgcccaa cagcaagccg ccgcgccggc gcccagcaag gtgcgcagcc acggacgcgg 961 cagcggccag gccagcccaa ggcacagcag ggtcaaggcg gcccaggcgg gttcgtcggg 1021 caggctttcg tgcaggcgcc agcccacgca cagcaggccc accaacagca acggcacggc 1081 cagttgctcc acgaacaggg gcacggcacg gtcgcgcagc agcagcacgg ccacgacgat 1141 caagcccgcc cccacgaaga gcaaacccac gccgtcgcgc accagcgggc ccagcaagcc 1201 gccgatgaag gccagcaacg gcacagcggc aaaccacgcg cccacggccg tgagcaactg 1261 caccggccag ggccgctgct cggtgttggg ccacgccgcg gccttgggca gcaagccctg 1321 ggcctgggcg cgctgcagca gttcggcggg gctcatgcgg cacctccggc gcgatggcgg 1381 cgcatcagca ggctcacggt gccggccagc aggccggcag ccaccaagcc gaaaacaaag 1441 gccttgccga gccagtcgct tccactgccg ccctcgaaca accagcggcc gaagccggcc 1501 agtgcccaca cgttgatcgc cagggcggtg gacgacagca tcaccaggtc cccctggcgc 1561 cacagcgcag cgcccgcggc cagcagcagc ccgcccagcc aggcgtactg cggcccccag 1621 cgggtcgtga acacggccat caaggcccag ccactgatca ccaccacgcc ccacacgccc 1681 tgcgcgcgcc aggcccacaa gccggcgccg tgccaacgct gcgcccaggg cgccagcgct 1741 gcggccaggg ccaccagggc cgcggcgccc agcaactggg gcgtgagcgc gtgggcgtcc 1801 gaggcccagc gatgcccgct gtgcgcctgc acccacagca ccacggtcag cgtggccgtc 1861 agcgtccagg gcgcccacac ggcgtcgtgc cgcacggcca gcgccaaggg cagcgccagg 1921 gcagcccaca gcgcgaacaa ttgccaaggg tcggcgccgg tttggtaggt ctggccgaag 1981 taagcgaaca gggcgccctg ccccagccag gccagcaggc ccagcgcggg ccgcgcgtgc 2041 ggcagggtcc aggcgcccag caggctgctg gccaccaagc cttgcagcag cgcgaacttc 2101 gtggtccggc cccagtcggg ccagttggct gccacccaca tcaccacgcc caggcccacc 2161 agcaaggccg ccaccgcgcc cacggcgggc agcaggcgtt cggccaagcg gggcggcggc 2221 gcgtcttcca ccagggcaag cagggcgcga gtgcgctgcg gatcgaggcc gtggtccacg 2281 gccagggcgt gaagcccagg gcgacagtcg gtcatgcggt agcgggtaag gtgcaaggcc 2341 tcgatcctag gcaccctgcc gccccgctgc cccgcggaag cgccgagtcg ctccaagcgg 2401 taccgtgcga cgcggttggc gccacccctg ttggagctcg 
ttcccatgcg tcccctggcc 2461 tgcaccctcg ccgccttggc cttccaagcc tcggcccaaa gcccggcacc cacccccgtg 2521 ccctccccgc cggccgaccc ggcgctggcc ccgctgctgc aggccatcga ccctgcgctg 2581 ctgcgccgcc acatcgacac cctggccgcg ttcggcacgc gacacacggc gtcggacacg 2641 gcgagcgaca cccgcggcat cggcgcggcg cggcgttgga tcgagcgcga atggcgcgcc 2701 tgtgcgaagg acacaccgct ggaaatcagc acccgcagcc acaccgagcc cgtgggccgg 2761 cgcatcgcgg cgcccaccga gatcgtcaac gtgctggcca ccctgcccgg ccgcgacccc 2821 aagcgcttcc tcgtcgtcag cggtcattac gacagccgca acgccgacgt gatggacgcc 2881 caaggcgacg cccccggcgc caacgacgac gcctcgggca ccgccgccgt gatggccatg 2941 gcctgcacga tggcccgttc gacccgccaa cccgaggcca cgctggtgtt cgccgccgtg 3001 tcgggtgagg agcagggcct gctcggcgcc gcccaactgg ccaaggaact cgacatcgac 172 3061 ggccagaccg tcgaggccat ggtcaccaac gacatcgtgg gcagcccaag tggcgcccaa 3121 ggcgagcacg cccccgatgc gcattcggct gttcgccgac ggcctcgacc cgctgctgcg 3181 cctgatgctg cgaggcgcgc aagccgagcc ctcgccccct ctcaaagcgc agcaagccct 3241 ggccgccgct ggtggcgccg acgacctgcc cgccgcgcag ctgggccgcc acctggcccg 3301 cgccaccgag gcttacgtgc cgggcacgca aatcgacctc atccagcgcc gcgaccgcac 3361 cctgcgcggc ggcgaccacc tgcccttcct cgaacgcggc ctggccgcag tgcgcttcac 3421 cgagcccttc gaaaactacg ccaaccagca ccagaacgtg cgcctggaaa acggccggct 3481 tgcgggcgac ctgcccgcgt ttgtggacgt ggactacctc gcccgggtga cccgcgccaa 3541 cctcgccggc ctggccactt tggcctgggc tccgccgccg cccgcgcagg tgcaggtgga 3601 cgccagcgaa ctgagcaacg acacccgcct gatctgggcg tccagccccg gcgccagcgg 3661 ctaccgcgtg ctgtggcgcc gcagcgagtc ggcccgctgg gaatgggccc gcgacctgcc 3721 cgccagcgcc cgcgaggcgg tgatcgccgg tgtgtcgcgc gacgacgtgg tcttcggcgt 3781 gcaggcgctg ggcgctcatg gccacagcag cttggcggcg tacgcaccgc cggtgaaccc 3841 ggcgcgttga ccgcgcgggg gctcgaaagg tcaggggaaa tcaagctggg gcgggcgctc 3901 gcacgctcaa ttcagcgtcc agacgaaagc ctgcgtggcc gacatcacgc cgtccttgag 3961 cttgtcgctg gctggcgcct tgtacgtttg cgccttggcc cgggccaccg aggcgtcatc 4021 gagcaaggcc tcgaacacca ccggccgcac gtcggacggc acgcccggca cccgcagttc 4081 gcgttcgtgc acggtcgccc gcatcaactg acccttggcg 
tccaagtcga cccaaagggt 4141 cacccgaccc gtcaccccga agcgcttggg caattcgggg tagccatcgt cggacttgag 4201 cggcgtgcca gccgcttcgc gaccgatcca caacaggtcg ggcagtcgga tggccaggcc 4261 ctcgcgaaag cgcgccaagg cctgcgcttg acgggccgga tcgccctgca cctcgggccg 4321 cgccagttcg ttgcgcaggg cccagtggtg cagcagacag cggtccagcg cacgcgcccg 4381 ggcgacttcg cgccgctggc ccgcccgggc ctcagggccg tagctgcgca gcaggctcgg 4441 caagggctgc agcccgacgg tgtcgccgcg gcgcacccgc tcgatcagtg ccgccacggg 4501 ctggggctcg gtcacggcca gcgcacgggg ctgaccccag cgcgccagca aggcccgctc 4561 gtgcgccagg gcctcggcca aagccggtgc aggcaggccg cgcagacgcc ccaaacgctg 4621 gtccatttcc gtgaccggct cggccaccgg gtggcaggcg gtctcgctct gcagcgcctg 4681 cagcaccaag gcgtcgtgcg ccgaccccag gctgtccagg ctccacaagc caaactcgtt 4741 gtagatgcgc gccatcaccc ggtcggccgt cagggtcggc tggggccctt gctgctcctc 4801 gcgccaccag gccagcatca acgcccgcac gcgcggcagg tgctcggtca ccatggcggt 4861 ggccgcctgc cgagtgggct cgtcgtgggc ggcattcacg cgccaatgcg gcagcaagtt 4921 ttgcgcgatg aaccgatcca cccaggcttc gggcgacggc ggttgcgctt ccgccggggc 4981 ggcgggctca gccgcccagg tggagaagcc tgcccccatg cccaatgcgc acaacagccc 5041 cagcaccgcc gcacgcaggc ggccaaccca tgaacgattc atgccctcat tgtcggcgct 5101 gcgcagcggc cggcgcgggc gggttcagcc gaaatcgaag gtaccggggc ccagcgcccc 5161 ttgggtcagc gcggccacca cgcgtgcact ggcctgccgc agggcgctgg cggtgtgggc 5221 tcggtaggtg ccctgcacca agggccccag accatgctcg gcgctgaagc tgccaccatg 5281 cgccgccaca cgggcgtaaa cgccctcgcg caggcgctgc aaggcctgtg cctccaatgg 5341 ctcggctgca cggggccaca ccagattgaa atgctggcca ccgtcgccca aatggccaaa 5401 gtcgcacacc tgcgcgggtg gaaactcgcc cgacagccaa tcgctggcct ccgcccggaa 5461 ggcccacacg gcccggcgag gcaggctgat gtcgaagcca atgaccgccc cacgctcgcg 5521 caggccctcg cttaccgcat gccgcagggc ccagagggcg tcgtccttgt ccagcgccgc 5581 gtccagcaca tcaccgcgct cgaaagccgc ttccaactcg ccgtgcagca aagcgcgcag 5641 gtcgacaccg tcgaaatccg ccgccagctc gatcagcagg ctgtagggcg gcagcccttg 5701 cgcaaagggc gaggccccgc cccgcacctc gatggcggcg gccagggcgt tggcactcaa 5761 gccctcgaag gccgacacca aaccgcccca tgcctgggtg aagcgctggt 
acaggggcca 5821 caccgcctcc aggctcgcgg gcgccaccag ggccacggca gtgtgtcggg gcagggcttg 5881 caggcgcagc gtggcttcgg tgatgacgcc caaagcccca ctgctgccga tgaacagatg 5941 ctgcagcgcc agcgccgcgt tgtccttcgc caagccgcgg cccagcgtca acacctcgcc 6001 atcgggcagc accacctcca gggccaaggt gttggcccgc acatcgccat agcggcacat 6061 gcgcgtgccc ccggtgttgt gggccagcat gccgccgatg ctcgggtcgg cactgaggtc 6121 gatggcgaac tgcaaaccat gggggcgcag ccgctcgttc acctccgaca ggcggaatcc 6181 cgccgagacc cgcagggtgc ggtccagtgc atccagctcg aacacacccc gcagccgctc 6241 ggtgctcagc accacctcgc cctgcggcgt ggcagccttc accaggcccg tgtgtgcgcc 6301 ttgcggcaac aggcgcacgc cgtgctcctg ggcccgttgc agcaggtggg ccacatcggc 6361 cgcggtgccg gggcggcaaa cggccggcga gccgcccgtg ccgtaacgcg ccccgctcag 6421 gtagcggggt tccacctcgg tgtcgaccac cgaccccgcg gcccggatga aggcctccaa 6481 gcccatcaat tcaggatttc caacgcttca tgcgtcgtca ttttgaagat ggccgtcgcg 6541 ttgccggcgt cgaacgcgag atccagccga ccgcactgca gccgctcgcg ctggatgaac 6601 tcgtaaaagc tgcctggcac gtgccgcgtg acccattcgc cctcgtgaag gaactctcgc 6661 gccacctggg cggccttgaa ggcggtttgc atcacgcgtc cgctgccgct gacctcgatc 6721 gtgtccttga tcgagcgtcc cttgcgcttt tgcgcttcgc tcagcgcctg caggtcgctc 6781 acccggtcgg tggcgtggtt gaaggcattg ccctcggtgg caatccaggc catttccgcc 6841 gactcggcca gcagcgtttc gtaatcggcc agcgcaaaac ccccatgctg ccgaccgaag 173 6901 cacgccacga tcttgggcat cagggccaag gcgtcgctcc agggcaaagc ggcatcgcgc 6961 cccaggcgtt ccaggcgacc gatgtcttgc ggtcccagcg ggtccacgct gtcggccagc 7021 acgcgagtga ccgccgcttg gaactgcgca ctgaagcgct ccgggtgcaa ctcgctgacg 7081 aagaactgcg caatcgtttc cgggaaatcg gcgtgcgccc aagcgcggcc ggtcatcttc 7141 aagcggggca gcggataggt gtcggccacg gtgaaaccca aggggcgcaa cacgcgggag 7201 atcgccgcct cgccgggcgg cagggcgccg ctgggccagg ccaccgtgcg caaagcacca 7261 tggtcgaagg tgatcttttg gccggcggcc atcacatcct gaacgtagtc gtgcccttcg 7321 ggcacgcgct cgtgcaagtc gaagaacagc accatgttca aggcctgggc catttcggcg 7381 cggctgaccg tctcaccgat gggccggatc aagcgcgggt cgatctccaa ctgagccacc 7441 agggcctcgg ccgccgatgc ccccagcaag ccttgaacga ggcgatgaac 
gccattgcgt 7501 ggagcatcag acatcgaact tgaccccctg cgccaggggc aggctgttgc tgtagttcac 7561 cgtgttggtc acacggcgca tgtactgccg ccaggcgtcg gagcccgact cgcggccgcc 7621 gcccgtttcc ttttcgcccc cgaaggcgcc accgatctcg gcgcccgagg tgccgatgtt 7681 gacgttggcg atgccgcagt cgctgccgct ggcgctcatg aaggcctcgg cctctcggag 7741 gtcggtcgtg aagatggcgc tggaaagtcc ctgcggcacg gcgttttgca gggcgatggc 7801 ctcgtccaac ttgctgtact tcagcacgta caagatgggc gcgaaggttt cgtggcacac 7861 cacgtcggtc tgggccggca tggtcaccag ggccggggtg gcgtagtagg cctccgggtg 7921 atcggcggcc aaggcgcgct caccgccccg cacctgcccg ccctgggacc gcgctgcatc 7981 cagcgcggtt tgcatggcct cgaaagcctg ccgatccacc aaagggccca ccagcgtgcc 8041 gtcatgcagc gggtgcccga tcttcaattg cacccgcact ttgtccaaac gctcgaccag 8101 ggcgtcgtgc acgctgtggt gggcgatcac gcgccgcgtc gtcgtgcagc gctgacccgc 8161 cgtgccccag gcgccgaagg cgatggcgcg caccgccagt tccaagtcgg ctttgtcgct 8221 gacgatcagg gcattgttgc cccccagctc cagcaagcac cgaccaaagc gcgccgccac 8281 cttgggcccg acgatgcggc ccatgcgggt cgaccccgtg gcactgatca agggcacgcg 8341 ggcgtccgtc accaacgcat caccgaccgc ggcgccaccg ttcagcacct ggctcaagcc 8401 ctcgggtgcg gtgtgaccgg cagcgctgta cgcggcgacg gccttgtcga acagagcctg 8461 cgtggccaac gccgtcaacg gggtcttttc gctcggcttc caaatgcagg tgtccccggc 8521 caccaaggcc aaggcggcgt tccaagccca cacggccacc gggaagttga aggcgctgat 8581 gatgcccacc acgcccagcg gcaaccactg ttccatcatg cggtggcccg gccgctcgct 8641 ggcgatagtc aatccatgca actgccgcga caaaccgacg gcgaagtcgc agatgtcgat 8701 catctcctgc acctcaccca gcccttcgct ggcgatcttg ccggcctcca ggctcaccaa 8761 agcggccaaa tcggccttgt gcgcgcgcag cacctcaccg aagacgcgca ccaactcgcc 8821 ccgcttgggc gccgggatca cccgccatgc caagaaggcc gaatgcgcgc gcccgatggc 8881 cgcctgcatg tcctgcgcgg aggcctccac cacccggccg gtcacagcac catcaaccgg 8941 cgagcgggcg gccaaggtcc cgtcggtgta aagcgccggg gcgaccccca tgccttgcaa 9001 caaggaggcg gcggtgggcg aggacattgg ggtcatacgg tctttcgtgc agcgggtcga 9061 caaggtggga gacaggagtc aggcgatcga ctcgacctgg cgcgactgct ggtacgcccg 9121 accgaagcgg ttggctagga agtcgggcaa agcgacctcc tcttgccgga tgaagccttg 
9181 ctgcggcaac ctgccttcgc ggaacaggtc cagggccgcg caaatgccgg cggccgtggt 9241 gatctggatc gcgctcaggg gcgacttgac atcccggtcg gcaaagatct tgcgggcgaa 9301 cacctcttgc atgagcacgc cgttcttcat gccactcacc gtcacgaaca ccagcaccac 9361 gtcctgcatc gtcgcgggca tgctcttgcg catgatgtcc ttgagttttt cttggtcggc 9421 cgccaagccc aggtcctcca gcaggaactt catcaggccg cggtggccgg ggtaacgcac 9481 ggtcttgtaa tccagattgc gcaccttgcc cgcccaggtt tcacacagcg tgcccaaacc 9541 ccccgatgtg ttgaacgcct cgtactccgt gccatccaag ctgaagtgct ccagcccttc 9601 caacgggagc gccgaaatga actcaccgcc gtggatcgcc tcgcaggggt ggcaatactc 9661 gttgatcaac ccatccaccg accacgtcag gttgtacttc agctggttgg tcgggaaggc 9721 cggcaacgcc cccacgcgca tttgcacgtc acgcacctca tcgaactgct tcgccaagtg 9781 gtgggcgacg atcccaatga agcccggcgc cagtccgcac tgcggcatga aagcggtctt 9841 tgcgccgtct gccatgtcct tgatcgcctt cgtggcggcc acgtcttcgg tcaagtcgaa 9901 gtaatgacaa ccgcactcct tcgccagcgt ggcggcggtg atggccaagt ggtagggcaa 9961 ggcattcagc accgcgtcct gtccgcgcaa ttgggccgcc agtgccgccg tgtcctcggt 10021 gtcaaccgcc gctgtggcga tgccttcggc agccaaggca ctcaggtact gggcattttt 10081 gtccaccacc gtgacgcggt agtccccgct ttggtgcagc aatcgtgcgt tggtttgtcc 10141 gatgtggcca gcgccaagca gggcgatctt catggggaac tccgtggtct tgagaggttg 10201 agcgccagcg gcgcgttggc atcactttag ggcgggttca cgtcaaaata cagaagcaac 10261 tcggcgaaag ggagagccat ctcgtcgatt tggatgtttg tttcagcttt ttgtatgaaa 10321 tggggaaaac catggttcac gcactcgacg gcctcgaccg cgaactcatc gccgcgttgc 10381 agtccaacgc cagaatgtcg accaccgacc tggccaagcg cctgcgggtg gcgcgcacaa 10441 ccgtggtgtc ccgattgagt cgactggagg cccaaggcgt catcgtcggc tacacggtgc 10501 ggctgggcgc gaccgagggc cgggaaggcg tgcaggcctt tgtcaacctg agcgtcagcc 10561 ccaaggccgc ccgaagcgtg gtggatcgcc tcagcctgtt ccccgagcta cgccagctgg 10621 ccgccgtcag cggcgagttc gactacctcg ccgtgctgcg cgccagcagc acccaacgct 10681 tggacgctct gctcgacgaa atcggcgaaa tcgatggcgt catccgcacc accacatcgg 174 10741 tgctcctcgc catgcgcatc gatcgcatcg cctgagttcc accaaaccac cgcctaccgt 10801 cggcagacag caaaaagcaa aacccctcga cacgggaggt atcgaggggt 
tttggggggt 10861 aaatagcttg acgatgtcct actttcacac gggcaatccg cactatcatc ggcgctgagg 10921 tgtttcacgg tcctgttcgg gatgggaagg ggtgggacca cctcgctatg gtcgtcaagc 10981 ttgacttgtc ggctgcccgc tgaggcggac aacccaattc atagagtctg gatcagcatg 11041 attgaattgc gtcttcattg aagaacgact tacacagcct cgactgatgt caaggttata 11101 gggtcaagcc tcacgagcaa ttagtatcgg ttagcttaac gcattactgc gcttccacac 11161 ccgacctatc aacgtcctgg tctcgaacga ctcttcaggg ggctcaaggc cccggcaaga 11221 ctcatcttga gacgagtttc ccgcttagat gctttcagcg gttatctctt ccgcacttag 11281 ctactcggca atgccactgg cgtgacaacc gatacaccag aggtgcgtcc actccggtcc 11341 tctcgtacta ggagcaggct ctctcaatct tgcagcgccc acggaagata gggaccaaac 11401 tgtctcacga cgttttaaac ccagctcacg tacctcttta aatggcgaac agccataccc 11461 ttgggaccgg ctacagcccc aggatgagat gagccgacat cgaggtgcca aacaccgccg 11521 tcgatatgaa ctcttgggcg gtatcagcct gttatcccca gagtaccttt tatccgttga 11581 gcgatggccc ttccatacag aaccaccgga tcacttagtc ctagtttcct acctgctcga 11641 cttgtcagtc tcgcagtcaa gcacgcttat gcctatgcac tatcagcacg atttccgacc 11701 gtacctagcg taccttcgaa ctcctccgtt acactttggg aggagaccgc cccagtcaaa 11761 ctgcccacca tacactgtcc ccaacccgga tcacgggcca aggttagaac ctcaaacaca 11821 ccagggtggt atttcaacgt cggctccatg cgaactagcg tccgcacttc aaagcctccc 11881 acctatccta cacagatctg ttcaaagtcc aatgtaaagc tacagtaaag gttcatgggg 11941 tctttccgtc tttccgcggg gagattgcat catcacaaac atttcaactt cgctgagtct 12001 ctggaggaga cagtgtggcc atcgttacgc cattcgtgca ggtcggaact tacccgacaa 12061 ggaatttcgc taccttagga ccgttatagt tacggccgcc gtttactggg acttcaatca 12121 agagcttgca ccccatcatt taatcttcca gcaccgggca ggcgtcacac cctatacgtc 12181 gactttcgtc tttgcagagt gctgtgtttt tagtaaacag tcgcagccac cgattctctg 12241 cgaccccgtt gggctcccct tgtacaggtt cacctactga gggcacacct tcttccgaag 12301 ttacggtgtc aatttgccga gttccttctc cagagttctc tcaagcgcct tagaatactc 12361 atctcgcgca ccagtgtcgg tttgcggtac ggtcgtcaat agctgaagct tagtggcttt 12421 tcctggaagc agggtatcac tcacttcgtc tgcaagcaga ctcgttatca cccctcatct 12481 aagcccggcg gatttgccta ccaggcacga 
ctacaggctt gaaccgggac atccaacacc 12541 cggctgagct aaccttctcc gtccccacat cgcactattg atcggtacag gaatattgac 12601 ctgtttccca tcagctacgc atctctgcct cgccttaggg gccgactcac cctacgccga 12661 tgaacgttgc gtaggaaacc ttgcgcttac ggcgaggggg cttttcaccc cctttaacgc 12721 tactcatgtc agcattcgca cttctgatac ctccagcagg cttcacaacc caccttcaca 12781 ggcttacaga acgctctcct accacgcaca ttgctgtgca tccgcagctt cggtaactgg 12841 cttagccccg ttacatcttc cgcgcaggac gactcgatca gtgagctatt acgctttctt 12901 taaatgatgg ctgcttctaa gccaacatcc tgactgtttt agccttccca cttcgtttcc 12961 cacttagcca attttgggga ccttagctgg cggtctgggt tgtttccctc ttgtgtccgg 13021 acgttagcac ccggtgcact gtctcccaag ctgtactcat cggtattcgg agtttgcata 13081 ggtttggtaa gtcgccatga ccccctagcc taaacagtgc tctacccccg atggtaatac 13141 ttgaggcact acctaaatag ttttcggaga gaaccagcta tctccaggtt tgtttagcct 13201 ttcaccccta tccacagctc atccgctagt tttgcaacac tagtcggttc ggacctccag 13261 cacctgttac ggtaccttca tcctggccat ggatagatca cctggtttcg ggtctacacc 13321 cagcgactaa atcgccctgt tcggactcgg tttcccttcg ccttccctat tcggttaagc 13381 tcgccactga atgtaagtcg ctgacccatt atacaaaagg tacgcagtca cccttgcggg 13441 ctcctacttt ttgtatgcat gcggtttcag gatctatttc actcccctcc cggggttctt 13501 ttcgcctttc cctcacggta cttgttcact atcggtcaat tacgagtatt tagccttgga 13561 ggatggtccc cccatcttca gacaggatta cacgtgtccc gccctacttg tcgtacgctc 13621 agttccacag attgcatttc tcatacaggg ctatcaccta ctatggccgg actttccatt 13681 ccgttttgat atgcagactg ctaaatcgca caggctcctc cgatttcgct cgccactact 13741 ttcggaatct cggttgatgt ctgttcctcg agctactgag atgtttcagt tcgcccggtt 13801 cgcctcaaca ccctatgtat tcagatgttg atacccttgc gggtgggttt ccccattcgg 13861 aaatctccgg atcaaagcta atttgccagc tccccgaagc ttatcgcagg ctatcacgtc 13921 cttcgtcgcc tgtaattgcc aaggcatcca ccacatgcac ttagtcactt gaccctataa 13981 cgttgacatc gctgactgct gatgtcgtca aggactgctt tgcgttcgcc gttcttcaat 14041 tccgcctcga gccttgcggc tctcagctga tgtttgttga cgcaatccaa tgccatgaag 14101 ctcttgcttc atggcgctga tttcgactct acgaattgtt aaagagcagc agtccaggca 14161 tcgcctggtt 
cctggcaaac atgaacaccc atcgggcact catgtttgcc agcgtcgtga 14221 cgttgcgttg gtggaggatg acgggatcga accgacgacc ccctgcttgc aaagcaggtg 14281 ctctcccagc tgagctaatc ccccaatctc gtatcaaggc cgtggtgggt ctggttggat 14341 tcgaaccaac gacccccgcc ttatcaagac ggtgctctaa ccgactgagc tacagaccca 14401 cgtctgtcct ggcgaggtct cgccagtcca ctcgcttgtc actgctgtgt ctttgcagcc 14461 gataagtgtg ggcgcatcaa agttgagctt tgattcgact cgatctgcct cgccagcgtg 14521 ctgactcggt gatctcgcca ttttctagaa aggaggtgat ccagccgcac cttccgatac 175 14581 ggctaccttg ttacgacttc accccagtca cgaaccctgc cgtggtaatc gccccccttg 14641 cggttaggct aactacttct ggcagaaccc gctcccatgg tgtgacgggc ggtgtgtaca 14701 agacccggga acgtattcac cgcggcaagc tgatccgcga ttactagcga ttccgacttc 14761 acgcagtcga gttgcagact gcgatccgga ctacgaccgg gtttctggga ttggctcccc 14821 ctcgcgggtt ggcagccctc tgtcccggcc attgtatgac gtgtgtagcc ctacccataa 14881 gggccatgag gacctgacgt catccccacc ttcctccggt ttgtcaccgg cagtctcatt 14941 agagtgccct ttcgtagcaa ctaatgacaa gggttgcgct cgttgcggga cttaacccaa 15001 catctcacga cacgagctga cgacggccat gcagcacctg tgtgcaggtt ctcttgcgag 15061 cactcccata tctctacagg attcctgcca tgtcaagggt aggtaaggtt tttcgcgttg 15121 catcgaatta aaccacatca tccaccgctt gtgcgggtcc ccgtcaattc ctttgagttt 15181 caaccttgcg gccgtactcc ccaggcggtc aacttcacgc gttagctacg ttactgagaa 15241 gaaaccctcc caacaaccag ttgacatcgt ttagggcgtg gactaccagg gtatctaatc 15301 ctgtttgctc cccacgcttt cgtgcatgag cgtcagtaca ggtccaggga gttgccttcg 15361 ccatcggtgt tcctccgcat atctacgcat ttcactgcta cacgcggaat tccactcccc 15421 tctaccgtac tctagctgtg cagtcacaag tgcagttccc aagttgagct cggggatttc 15481 acacctgtct tgcacaaccg cctgcgcacg ctttacgccc agtaattccg attaacgctt 15541 gcaccctacg tattaccgcg gctgctggca cgtagttagc cggtgcttat tcttcaggta 15601 ccgtcatcct cccgaggtat taactcagaa gatttcttcc ctgacaaaag cggtttacaa 15661 cccgaaggcc ttcttcccgc acgcggcatg gctggatcag acttgcgtcc attgtccaaa 15721 attccccact gctgcctccc gtaggagtct gggccgtgtc tcagtcccag tgtggctggt 15781 cgtcctctca gaccagctac ggatcgtcgc cttggtgggc ctttacccca 
ccaactagct 15841 aatccgacat cggccgctct gatagcgaga ggtcttgcga tccccccctt tcaccctcag 15901 gtcgtatgcg gtattagctg ctctttcgag cagttatccc ccactaccag gcacgttccg 15961 atgcattact cacccgttcg ccactcgtcg ccaggttgcc ccgcgttacc gttcgacttg 16021 catgtgtaag gcatgccgcc agcgttcaat ctgagccagg atcaaactct acagttcgat 16081 ctgtttcact caacggaatc gaagtgaact tcacttctat ttccgtgagc gcttcaagca 16141 agtactccaa attgctccgg agttgctcaa cttcaagcgc ccacgcttat cggctgcaaa 16201 ttgttaaaga acttcgcaaa accaaccaac tacttggcgt ttgttgcggc gcgctgtttg 16261 tttgttcagc gcagaagcga gattattgca gcgttttgaa gcgactgtca ataccttcga 16321 agaaattttt tgaagtcgcc ggcctcagcc gcgtccccca ccctctttct tctgcaggcc 16381 agtcaagccc caagagccga accgtctcgc ctccgccacc agcgcccccg ccctctgggc 16441 aagttcgtaa gcagcgaagc ccgtcatcat agaacagttt ggagaagtct tgcaaatgtc 16501 gatgaaaatt tgcaaccggc gatgcgtggg gcggcgggac agggactcag gctttcgcca 16561 gtgtccccac atcaagccaa acccagttgt ctcgcccgct cttgaggctg ctcgtgcgcc 16621 agcacatcgg cggccatctc ttggagcttg gcggcatcgg cgcggaggat gcgttgcttg 16681 accagggtga tctgcggcgg gttcatgctg aatgaacgca agcccagccc gagcaggagc 16741 gccgtgaaag ccaggtcgcc cgccatctcg ccgcaaacgc agacggcttt gcctgccgat 16801 cgcgcctgcc gaatgacgct ggcaatgagt tgaagaacag cggggtgcca ggggtcgtac 16861 aggtgagcca ccgcctcgtc ggcgcggtcg atcgccagcg tgtactgaat caggtcattg 16921 gtgccgatgg aaacaaagtc gacccaaggc aaaacgcgat cgaggatcag cgctgcggct 16981 gggacctcca ccatgatgcc cagatcgacc ctggggcagg gttgatcgct gtgcgtggac 17041 cactgagcgc gcacccggtc cagcgtgctg cgcacttcca tcaactccga ctggtgggcc 17101 agcatcggaa tgagcaagcg caccggcccg tgctgggcag cacgcaggat ggcgcgcagt 17161 tgctggcgaa acatgaccgg ctcggccaag ctccagcgga tggcgcgcag cccgagcgct 17221 gggttgagtt ggtggtcgct gccatggctg cggtcgagcg gcttgtcggc gccgacgtcc 17281 acggtccgga tggtcaaggg gcggccttgc aagttttcca gcgccgcgcg gtacgccagg 17341 tactgctcgt cttccgtggg cagtttcccg ccgcgattca tgaagagaaa ctcggtccgg 17401 aagaggccaa cgccttcggc accggcggcg agggcggctt cggcgtcggc tggcaattcg 17461 atgttggcca gcatctcgac ggggtggcca 
tccaaggtga ccgccggccg gtggcgccaa 17521 aggctgagcc gctcttgttc cagttccgcc tggcgttgac ggaatcggta ctccgccagg 17581 accacgggcg aggggtcgac gatggcgatg ccggcgtccc cgtcgatgat gacccaatcg 17641 tcttggcgaa tgagtcggct ggcctcccgc gctcccacca cggccgggat gttcatgctg 17701 cgcgcgacga tcgccgtgtg cgaggtgcga ccaccgacgt cggtgatgaa gcctcggaac 17761 acgccttgac gaaacccgat caggtcggcc ggcgcgatgt cggaggccac cagcagcagc 17821 ggatcgtcac cgccgaagtc gcggggcaac gcggcgggaa gcgactcgcc gccgcccttc 17881 gcttgccggc tcagcaccgc caggatgcgc tcaaccacct gctcgagatc gaccttgcgc 17941 tcccgcaggt agtcgtcttc catctcgtca aactgacgcc ctaaggcttc catctgcgcc 18001 gaaagcgccc actcggcgtt gtagcggcgc tcggtgatcc actcccgggc gccacccacg 18061 aaggcctcgt cccgcgccat catcaggtgc acgtccagca gcgccgccag ttccggcggc 18121 gcatcggcgg gcagttcgtc tcgcagggca accagttctt cggcgacctg ggcacaagcg 18181 ccttcgcaac gcgccacctc ttgcgcaacc cgatcgggct ccacgaagta gtgggccaca 18241 tcgacccgtc ccgccccgac gaggaccgcc cggccgatgg cgacgccctg tgagacggcg 18301 atgccgaaga cctgaaagct caaggtgccg cccccgagcc agtgggcttg gagggtcggg 18361 cgctcactcc ccttctccga agcggtcgtt gaccaacgcc gcaatcgcgt ccagcgcggc 176 18421 cgcttcgtcg gggccgtcgg cttccagttc gacttgggtg cccatgcctg cggccaacat 18481 catgacgccc atgatgctct tggcattcac ccggcgacca ttgcggctga ggtggatttc 18541 gctggcgaat gagcccgcaa gcttggtgag tttggccgac gcgcgcgcat gcaggccgag 18601 tcggttggag atggtcaaaa cagacgtgag catggcggga gcttaatccg tggcgcccag 18661 gagacggatg cctgaaaacg cgccctctct ggccttgtcg gccatggccg acaagccttg 18721 tcgtcggtag gcaagagcac gccaaaccat gggcacgttc acgccctgca gcaccaaggc 18781 cccggcatcg cccgccagtg cgcggacgca ggcattgctc ggggtggctc cagcaacgtc 18841 gaccagcacc aaggtatcca ccccggcatc gcggtgcgct tggcgcaacc gaccgagtgc 18901 ttcggtggcc tcgtccaatc cggccgcagc cgggacatcc agggcctgca cctcgtccgc 18961 cgcatcggca aagacgtgct gagccacctg cttcagggcc gacgccagtg gtgcatgagc 19021 gacgatcagg agacgcactg gcttcatggc agtcgctgga cctggtcgaa aaatgcggca 19081 gcgcggccct gcggttctgg gagtgcctcg ccctccacca cggtggcctg gtaaacggtc 19141 aagccccaag 
cccacgccat gagctgcacg cgacgctggc cgtcaccccc cgaccagcga 19201 ccactgccgg gccacgccaa agcagcggcg ggcacggcac cgggcagcga cttggcttgc 19261 cctgaccacc gctgtgggat ggcggccaag ccgtcgcgcg ccgcctccgg cgtgtcgaag 19321 acttgccagg ccaaggcgta gaggcggcct tcggactcgc actgggcgag caagcccgcc 19381 cggcccggcg gggcgcgctg ctgctcgggt ttgcaaggga agcgagccag cacattcgcc 19441 tcgggccaac ggagctcgcg ccaatcgtcc ggcggccggc aggcggacag gatgacgaca 19501 ccgaacccgg cgagcccaat tgtgagcaat cgaaacaagg acggcggcgc gctcatagac 19561 agacgattca aaccccttgc tcctggggtg gacggaagac tgcggaagca tcacactatc 19621 tggccttctt ttgtggcccc atcgtgacac tgcctgcccg caagcccact ttcgatccgc 19681 cggatgcgcc tgacggcgcg ttgagcacgg aattggcctt gctgctgacg caactgtcgc 19741 acgaagcagc ccagcccctg accgaagccc tggaccgcat gcagtcgctg ctgcgcacgg 19801 tggatgccgg cagcgctgcc cgtctgagca tgattcgcga acccatgcgc cgggcgcgag 19861 atgctgtgct gctggccagt cagatcggcc gcttggccag tgggcgggtc cgcccagcac 19921 gcgaccccgt ggccctgcac gcctcggtgc accaagtagc ggagcaacgc cggcgggagg 19981 ccatggcgcg gggcctgcag atccgcaccc agttgctggc cgccgaagcg gaggccgatc 20041 aagccctggt gaacagcctg ctgcacgctc tgttggattg ggcgctgtgg cacacccgca 20101 gttccatcga gttgtccttg gtcttgacgc cttggccggt ccgatcgcga ctggagtgtc 20161 ggtttgcctt ccgtgacctc gatcagttcg acgccccagt tcaactgccc aagctcgacg 20221 acctgcggtg gcggctggtg cagcagctgt gcgaagtctt gggcttggaa tgccagcgcc 20281 tggacgaggg cggggtgtcg gtcacgcgga tcgacttccc ggtgccgcag ctcaaggaat 20341 gggtggctca aatcgacttg agccgcaccg acatcgatcc gggcctgaac acccagccct 20401 ttgccggcat gcatgccttg ttgttgagcg ccgatcctgg cttggtgggc gaaatcttgc 20461 ccatcctgga gcggttggga tggacggtgg acacggtggc gtcggtggac gaagccttcc 20521 aactgtgcct agacggcctg ccgcaagcgg tggtgaccga tgccgtgttg cgcggtccgg 20581 atctcgacca gtggtgcacc catgtgatgg ccgaggcgcc ctcgctgtgc tttgtcgagg 20641 tggtgtcggc agactcggct gcgctgccct ttgcacgcac gcgcggcctg cagcggtgcc 20701 agcgtgagcg cttgagccaa gacctccccg tgttgctgcg tgcggccttg gagccgggcg 20761 aacaagccct gaccttgcgg ctgtgaagcc cggctcggca caatgccggc catgactccc 
20821 atgcaacgtc gagcggcttt ggccgcgttg ggcctcgggg ccgcaggctt tggcgcttgg 20881 tggtggcagg cgcaggcgct gccgctggcc cccgcagtgc ggttcaccct gctggacggc 20941 agcacccacg acagcgttgc agcatggcgc aacaaggtgg tgctggtgaa tttttgggcc 21001 accagctgcg ccacgtgtgt gaaagagatg ccggacatcg tctccacgca ccggcgcttt 21061 gcgggccggt cgttcgacac gttggcggtg gccatgcagt acgacccgcc ggactacgtg 21121 gtccagttcg cgcaaacccg ccagctgcct tttggggtgg ccatcgacaa cacgggcgcg 21181 atcgcgaacg cgtttgggcc ggtgcgcatg acgcccacca ccatcctgat cgaccgccga 21241 gggcaaatcg tcaagcgcat cctgggcgaa ccccagttcg cccagttgca cgagctgatc 21301 gagtccctgc tgtaagccct ttgggggtca gactgcccca gcctgctcgg cttggcggtc 21361 ggcgtggtag ctggatcgca ccaaggcgcc cacggcggcg tggctgaagc ccatggcagc 21421 ggcttcgcgc tcgaacatct tgaacgtgtc gggatggacg taacggcgca ccggcaggtg 21481 gtgcccgctg ggggcaaggt actggccgat ggtcagcatg tccacaccgt gctcgcgcag 21541 atctcgcatc accgccaaga tttcctcgtc cgtttcaccc aagccgacca tcaagccgct 21601 cttggtgggc acgccgggca cagcggcttt gaacttcttg agcaaattca ggctgaaggc 21661 gtagtcggag ccgggccggg cttctttgta gaggcgcggc accgtctcga ggttgtggtt 21721 catcacgtcg ggcggggccg ccttcaagat gtccagcgcg cggtcgtcgc gcccccggaa 21781 gtcaggcacc agcacctcga tcttggtctg gggcgaagcc acccgggttt cgcggatgca 21841 gtccacgaag tgctgcgcgc cgccatcctt caggtcgtcg cggtcaacgc tggtgatgac 21901 gacgtagttg agcttcagcg ccccaatggt cttggccaga ttcttcggct cctcggggtc 21961 caaggggtca gggcgaccgt ggccaacatc gcaaaaaggg caacggcggg tgcacttgtc 22021 gcccatgatc atgaaggtgg ccgtgccgcc cccgaagcat tcgccgatgt tggggcaact 22081 ggcctcttcg cagacggtgt gcagcttgtg ctcgcgctgg atctgcttga tttcgtaaaa 22141 gcgggtgttg ggcgagcccg cttcacacg
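The internal consistency of the record above can be checked mechanically. The following Python sketch is illustrative only and is not part of the original submission: the helper names `base_count_total` and `location_length` are hypothetical, and the regular expressions assume the simple location forms (`start..end` and `complement(start..end)`) that appear in this record.

```python
import re


def base_count_total(base_count_line: str) -> int:
    """Sum the per-base tallies in a GenBank BASE COUNT line."""
    # Each tally is a number followed by one of the letters a, c, g or t.
    return sum(int(n) for n in re.findall(r"(\d+)\s+[acgt]\b", base_count_line))


def location_length(location: str) -> int:
    """Length in bp of a simple feature location.

    Assumption: only 'start..end' and 'complement(start..end)' are handled;
    joins and partial locations are outside the scope of this sketch.
    """
    match = re.fullmatch(r"(?:complement\()?(\d+)\.\.(\d+)\)?", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = (int(g) for g in match.groups())
    return end - start + 1


if __name__ == "__main__":
    # Values copied from the Clone 578 record above.
    print(base_count_total("BASE COUNT 3922 a 7611 c 6730 g 3906 t"))
    for loc in ("20803..21315", "complement(21328..22032)"):
        n = location_length(loc)
        # A complete CDS with codon_start=1 should span a multiple of 3 bp.
        print(loc, n, n % 3 == 0)
```

For this record the tallies sum to 22169, which agrees with the final ORIGIN line (29 bases starting at position 22141). Both CDS spans, 513 bp and 705 bp, are whole multiples of three.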

Appendix VI: Nucleotide sequence of Clone 905

LOCUS       CBNPD1_Clone_905       16235 bp    DNA     linear   ENV 30-NOV-2006
DEFINITION  Uncultured organism CBNPD1 BAC clone 905.
ACCESSION   EF157670
VERSION
KEYWORDS    ENV.
SOURCE      Uncultured organism CBNPD1 BAC clone 905
  ORGANISM  Uncultured organism CBNPD1 BAC clone 905
            Unclassified sequences; environmental samples.
REFERENCE   1  (bases 1 to 16235)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Metagenomic Analysis of a Toxic Cyanobacterial Bloom
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 16235)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-NOV-2006) The Microbial Gene Diversity and Discovery
            Research Unit, Griffith University, Kessels Road, Nathan,
            Brisbane, Queensland 4111, Australia
FEATURES             Location/Qualifiers
     source          1..16235
                     /organism="Uncultured organism CBNPD1 BAC clone 905"
                     /mol_type="genomic DNA"
                     /isolation_source="Toxic cyanobacterial bloom"
                     /environmental_sample
                     /plasmid="pIndigoBAC-5"
                     /country="Australia"
                     /metagenomic
     gene            complement(47..2860)
                     /gene="905-1"
     CDS             complement(47..2860)
                     /gene="905-1"
                     /note="'Putative zinc protease'"
                     /codon_start=1
                     /product="CLONE 905-1"
                     /translation="MKKINQFALGIMLFPVTILAQTMDYGQAIPFDTNVKTGKLDNGL
                     TYYIKKNAKPEKKVDLRLVVNAGSILEDDDQQGLAHFMEHMCFNGTKRFPKNQLVDYL
                     QSIGVKFGQHLNAYTSFDETVYFLPIPSDNPEKLEKGFQILEDWAFNTVLTPEEIDKE
                     RGVVLEEYRLGLGAQKRMMGRYLPKMMYNSKYANRLPIGQKEILEKFKYETLTRFYKD
                     WYRPNLMSVIVVGDIDVAEMEKKIKEHFASYKNPANEKVRNVFEVPNHKETFVAVESD
                     KEASNTQVQLVYKDYDAPKKITTVGDFRSYLVEGLFTTLLNNRLEELTNSATPPFTFG
                     YSYYGGTYARTKKAYQSVAMMAEDKQLSALKVLVTENERARKFGFTAGELDRAKADFL
                     AQIEKNYNDRTKTDSENFVEEIQLNFLEKEPVPGIEWTFETMKKILPTIALADVNGFI
                     KNYIKEDNRVVVFTGPQKDNLKKVTEQEVLETLKVNEADLKPYEDKAVATSLLRKEAK
                     AGTIVKRETNATLGTKTLVLSNGVKVVYRTTDFKNDEVLFEAVSLGGSNLYSNDEMKK
                     VQFANGALAEAGFSGLKLNDINKFMTGKIARVDPYIGQTTEGLRGNTTPKDLEYLFQM
                     VHAYFTDLNFDQEAFEGFKQKQASFFKNMASQPQYYFQQEFYAYLNKENPRFNGILPT
                     DKSWAETDYKLAYDKYKERFANAADFEFFFVGNVDDKTIEAFATKYLASLPTTAAKEK
                     TVDLGYRMLKGDLKKVVNKGTDPKSNVNIMFYGEAKYSPKEXLVMEALGEVLTIKLIE
                     QLRESESGVYGVSARGSMNKVPYGSYNFTINFPXGPDNAEKLTASALNELQKIITNGP
                     EEKDVAKYKEGELADYRKDSKENRFWLTNFTRSFLNESNPENALKYEAEVNAITAKDI
                     QEVAKKYLTKDKVIGMLMPESKS"
     gene            3032..3589
                     /gene="905-2"
     CDS             3032..3589
                     /gene="905-2"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 905-2"
                     /translation="MRYRLFKIILGVLGFIPIDTIEAQSLQEIIPPYNIKTVSFMDNG
                     QNVIPVFKLGEAFELNFDDLFGNEADYYYEVMHCDYNWVPSDIPKQEYLKGLDNQRII
                     NYSNSYNCLQLYSHYQLSIPNQFTQLALSGNYMIKILNDDREVVFSRKFIIYERPRQC
                     KITNQKRQKFKHHRKQTKPRFLDNV"
     gene            3627..4295
                     /gene="905-3"
     CDS             3627..4295
                     /gene="905-3"
                     /codon_start=1
                     /product="CLONE 905-3"
                     /translation="MLLLQNRQFNTGIKNIVPQYTIGNELIYKYDTETQFWAGNEFRF
                     FDNKEIRNAANNVAKISSNGGIYNSFLFTDDARANFPYSFTQDVNGNFVARNLNATNQ
                     KIEADYAWVYFSLSAPAFQIDKGIYINGMFNNYAIAPENKMEYNEKKKVFEKALLIKQ
                     GFTNYQYVVADAKGKIDSENSIDGNFFQTENEYEVIVYYRENTQRYDRVIGFVALTSV
                     GATN"
     gene            4364..4750
                     /gene="905-4"
     CDS             4364..4750
                     /gene="905-4"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 905-4"
                     /translation="MVSQITRGIKISVLTSFEGTYFKNYKIHFAFSYQITIENHSKDS
                     VQLTSRHWEIYDSLNDHEVVDGEGVIGKKPVLKPAEQHTYSSGCLLSSPYGAMKGNFN
                     MINFTTTKNFKVFVPNFRLCAPFALN"
     gene            4870..6498
                     /gene="905-5"
     CDS             4870..6498
                     /gene="905-5"
                     /note="'Putative delta-1-pyrroline-5-carboxylate dehydrogenase'"
                     /codon_start=1
                     /product="CLONE 905-5"
                     /translation="MLKGFFNVPKAVNEPVKGYAPNSPEKAAVLAAYKAMWNSKIDVP
                     LYVGSEEIKTGNTRNMTAPHDHQHVVGTYHLAEKAHVEKAIANALEARTAWANMAWEQ
                     RAAIFLKAAELIAGPYRAKINAATMIAQSKNIHQAEIDASCELIDFLRFNVEFMTQIY
                     ADQPASTSDMWNRLEYRPLEGFVYAITPFNFTAIAANLPASAAMMGNVVIWKPSDSQV
                     FSAKIIIDVFKEAGVPDGVINVVFGDALMVTDTVLASRDFAGIHFTGSTHVFKDIWAK
                     IGTNIHHYKTYPRIVGETGGKDFVVAHPSATVKQVATGIVRGAFEFQGQKCSAASRAY
                     VPQSMWPALKEQLITDTKSMKMGSPEDFGNFITAVIHEGSFDKLASYIDQAKAATDAE
                     IIVGGNYDKSVGYFIEPTIIVTSNPKYTTMETELFGPVITIYVYEDANWAETLKLVDT
                     TSEYALTGAIFSQDRYAIEEATVALQNAAGNFYINDKPTGAVVGMQPFGGARASGTND
                     KAGSALNLLRWVSPRTIKETFVTPVDYRYPFLGE"
     gene            complement(6663..7739)
                     /gene="905-6"
     CDS             complement(6663..7739)
                     /gene="905-6"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 905-6"
                     /translation="MKLLIKTVFGFLILLNYSNSFSQEIKYNPSSKSIKPTLFADLDE
                     TCQTPDGMALDKKGNLYLSVTNPTTFEKFGSKILTFDKNDKPITWFDKLPLHPEIKRV
                     HPMGMEFGPDGNLYVIDNQCFARKADISRLIRVIVKDGKPIESEVLVEGFNFSDGLRW
                     FKNRIYITDASFKNSTESGIYSFSLEELNKNKIVLDVSNKQKYLISTFRLKPETEKHS
                     LGIDGIAFDKKGNLYAGSFGDGVITKIEFFKNGKVKSKKVVFDSDKLKCCDGFFYDKE
                     RNAIFIANYDNNSVHQLDLNTNTITTIWGNSNNDGADGQLDNPCETIIYKGKLLVVNY
                     DTFKGPKNTEIDSFHTISSFKLEN"
     gene            7834..9063
                     /gene="905-7"
     CDS             7834..9063
                     /gene="905-7"
                     /note="'Putative transposase'"
                     /codon_start=1
                     /product="CLONE 905-7"
                     /translation="MGLFRRNKNIKKPVIRQIIDIVPRWMLESCAKKHSSDKGCSKYK
                     TYDQFVALTYGQLNKCYTLNDISTGIGVSETFIGDLGLIQSPARSTMSDGNKKRDWKV
                     FETLYYRLLKHYGRVLKSESQRSIIKEIKEQNIKLIDSTTISLCLNMFEWAKFRTAKG
                     GLKIHTCWDDNLQIPDLVNITEAKTHDRYGIGQLVFSKGTIVVEDRAYFDFALMLHRI
                     QAENVFVTRIKTNTLYESIQELDLPDEEDQDIIKDEIVVLNSSKAIETGINQEKLRLV
                     HVYKQDENKIIEIITNNLEWSARTIADLYKKRWDIESFFKAMKQNLQIKTFLGTSENA
                     VKSQIYIALICYLLLELINRTISKKTNSFSNLVEKIRICLVYYLSLDYICNQVGDGAK
                     KIRSDPELKFSSDLFSA"
     gene            9433..10485
                     /gene="905-8"
     CDS             9433..10485
                     /gene="905-8"
                     /note="'Putative two-component system sensor protein'"
                     /codon_start=1
                     /product="CLONE 905-8"
                     /translation="MNLLKNKDLKRVLMHCVYWISFLLLYVSGKSDDTTYYDFIFVYT
                     FKILTQATAGYGLIYWIIPQTLNKKKYLLFVVSALGWLYFVFALIMTLKYYYLEPKFP
                     GFFDDWLGHKMTIPERLTSFKLIFREFSFITYPIIILGFISFNRKQQRLLKLEEEKKS
                     MELKVLKNQLNPHFLFNTLNNLYTLTLKKDEKAPEVIAKLSEILDFVLYRCNEDFVAI
                     EKEIALIENYIALEKLRYNENRLDVLFTKEIQENNKISPLIILTFIENAFKHGVINET
                     EKATIRLHLESKKEKIIFPIENTKPQNGFERISDKSKIGLENVRKQLDLLYPKKHQLE
                     IDETLANYIVKLSLTA"
     gene            10519..11694
                     /gene="905-9"
     CDS             10519..11694
                     /gene="905-9"
                     /note="'Putative beta-lactamase'"
                     /codon_start=1
                     /product="CLONE 905-9"
                     /translation="MKNYFLISTIVFFSFISAKSQSKVIDTLVDVGNHKIHFKIIKGK
                     GIPILFDAGGGNDGSVWNSILDQTSKITNATLITYDRAGFGKSTIDTLQKDDPKHGII
                     SSVEYLEIGLKKLGYDKEIILVSHSYGGYLSALYASRHPKLVKGVVLIDVNHNYYEDG
                     VIEKVLATQDKLIPQWKRNNKGTXYMSATILETVKIMSKINIHSSIPVVDFVNGIPFL
                     KDPEEIERWKNCHKTYVENNPNVTGITASGCGHGIWMDNPPLVINTIAKIYANTSKQK
                     AEIQERAIQYAIQACNEERAQELAFNHSEDDLNTWGYELLRKNENQKAAEVLKLNMVL
                     NPNSSNAYDSYGEALLKIDRKEEAILMYKKSIELNPENKNGKEVLEKITKDNAKQLN"
     gene            11733..12428
                     /gene="905-10"
     CDS             11733..12428
                     /gene="905-10"
                     /note="'Putative Sensory transduction protein lytR'"
                     /codon_start=1
                     /product="CLONE 905-10"
                     /translation="MKYKCIIVDDEPLARELIESHLAHFDSFELINSFENALKTYTFL
                     ESNDVDLIFLDIEMPLLKGNDFLKKLKNPPKVIFTTAYREYALEGYELNVIDYLLKPI
                     TFDRFFVSIEKFRQLQTQKKEKKEVXEHHIFVISGNRNIKIILVEIFYIESLKDYITI
                     HLENGKSHHLKQNISTFEKVLDSNFVRIHRSYIIQTKKLTAYSKNEVEINAVEIPIGS
                     SYKENWLAYLKNK"
     gene            12647..12919
                     /gene="905-11"
     CDS             12647..12919
                     /gene="905-11"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 905-11"
                     /translation="MMTKIRRLSEFQYEIPIFTIDKDFDGFYAQFLKTDLGKIYLSTP
                     FYELAKSFKLKDAQKGTDCYFSPKGKIALMMLKNYYGCSDKNLSNF"
     gene            12916..14010
                     /gene="905-12"
     CDS             12916..14010
                     /gene="905-12"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 905-12"
                     /translation="MNGNIFMQFFCDILIPIDRPLTNFKIVSQIRMELSKSLNIRKSQ
                     EILAKNWIPYMKDLDKIFTDATCYESEVRFPTNQKLLWECVQWNYKQMESLCHLLKIK
                     LPRTKYIDWCRRYNEYSKKRKKQFKYRVKVTRGLLKLLYKLNGELSRIENQNPFEATA
                     KYKQQRAIIAKVYSQQAQIFKTGKSVPDRIVSISKSYIRPIVRGKEVKQVEFGAKVNK
                     IQIDGINFIEHIQYRAFNEGTRLQSSVFCAQNLTKTKVKMLGADAIYATNKNRTFTSN
                     HKIQTDFVRKGKAGKDEEQRKILAKEIKKERATRLEGSFGKEKEHYNLKKIKAKTQKS
                     EMLWIFFGIHTANALEIGRRIFQKQQTLAA"
     gene            complement(14247..14519)
                     /gene="905-13"
     CDS             complement(14247..14519)
                     /gene="905-13"
                     /note="'Putative Peptidase'"
                     /codon_start=1
                     /product="CLONE 905-13"
                     /translation="MQESEADLYSYDFMKRNGYNVNAVXSAFAILAKLSEGADASFLT
                     RITSSHPDAKERAQNARLRAEKDGLYKPYVKKSEASKTKSAAKKKK"
     gene            complement(14576..15079)
                     /gene="905-14"
     CDS             complement(14576..15079)
                     /gene="905-14"
                     /note="'Putative metalloprotease'"
                     /codon_start=1
                     /product="CLONE 905-14"
                     /translation="MLIKRKIVAFVVLFVFSMSSNAQKINDKILGSVSKAVKGFSFSN
                     EEAIALAKAAVDQMDKENPVADAKDKYDIRLRKIFGKHATENGLKLNFKVYKVKEVNA
                     FACADGSVRVYQGLMDAMDDNEVLAVIGHEIGHVANNDSQDAVKAAYKKEAFMDAIAS
                     HLIKLRP"
     gene            complement(15236..15865)
                     /gene="905-15"
     CDS             complement(15236..15865)
                     /gene="905-15"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 905-15"
                     /translation="MDEILKYFPNLTDLQKEQFAKLDILYHDWNEKINVISRKDIDAL
                     YTKHVLHSLGIAKINKFEPGTFVLDVGTGGGFPGIPLAILFPETRFYLIDVIAKKIKV
                     VKAVAEALELKNVKAEQIRAENVKGDFDXIVSRAVTNMPDFVSWVKTKIKKQQKHQLK
                     NGILYLKGGDLTEELKDFPKAXEYNLADIFEDEFFETKKVVHLPLKFVV"
BASE COUNT     5545 a   2926 c   2572 g   5192 t
ORIGIN
        1 tacttcaatg aaatagcagg
tttttatttt tatcctcttt tcggatttat gatttgcttt 61 caggcatcaa catcccaatt actttgtctt tagttaaata ctttttggcc acttcttgaa 121 tgtcttttgc agtgatagca ttcacttcgg cttcgtattt cagtgcgttt tcaggattac 181 tttcatttag gaaagatcta gtaaaattag tcaaccaaaa acggttttct ttactgtctt 241 ttctataatc ggccaattct ccttctttgt atttggctac atctttttcc tctggaccgt 301 tggtaataat cttttgtaac tcgttcaatg ccgatgctgt aagcttctct gcattatcag 361 gaccacaagg gaaattaata gtgaaattat aagatccgta aggcactttg ttcatactac 421 cacgagctga aaccccatat accccgcttt cagattcacg caattgctca atcaatttaa 481 tagtcaatac ttcacctaaa gcttccatta ctaaagcctc ttttggagaa tattttgcct 541 caccatagaa cataatgttc acattacttt ttggatcagt tcctttgttt accacttttt 601 tcaaatcacc ttttagcatt cggtaaccta agtctactgt tttttctttg gctgctgttg 661 taggcaaaga agctaagtac ttggtagcaa aagcttcgat agttttatca tctacatttc 721 ctacgaagaa aaattcaaaa tcggctgcat tagcaaaacg ctctttgtat ttgtcatatg 781 ctagtttgta gtctgtttcg gcccatgatt tatctgttgg taaaattcca ttaaatcttg 841 ggttttcttt attcaaatag gcataaaatt cttgttggaa gtaatattgt ggttgcgaag 901 ccatattttt gaagaaacta gcttgttttt gcttaaagcc ttcaaatgct tcttgatcaa 961 aattcaaatc cgtgaaataa gcatgcacca tttggaacaa atattctaaa tcttttgggg 1021 tagtatttcc tctaagtcct tctgttgttt gaccaatgta tggatcgaca cgtgcaattt 1081 ttcctgtcat gaatttgttg atgtcattca atttcaatcc agagaatccc gcttcagcaa 1141 gagcaccgtt agcaaattgt acttttttca tctcatcatt agaatacaaa ttgcttcctc 1201 ctagactcac agcttcaaaa agcacttcgt cgtttttgaa atctgtagtt ctgtatacaa 1261 cttttactcc attagacaag actaacgttt tggtacctaa agtagcatta gtttctcttt 1321 ttactatagt tcctgctttt gcctcttttc taagcaaact agtcgctacc gccttatctt 1381 cgtaaggttt caaatccgct tcatttactt ttagcgtttc taacacttct tgttccgtaa 1441 cttttttcaa gttatctttt tgtggcccag taaaaacaac tacacggtta tcttctttga 181 1501 tataattttt gataaaccca tttacatctg ccaaagcaat cgttggcaat atttttttca 1561 tggtttcaaa agtccattct attccaggta caggttcttt ttcaaggaaa ttcaattgta 1621 tttcttctac aaaattctcg gaatctgttt tagttctgtc gttgtagttc ttttcgattt 1681 gtgctaagaa atcagctttg gctctatcca attcacctgc 
agtaaaacca aattttctgg 1741 ctctttcatt ttcggtcacc aacactttca aagcgcttaa ttgcttgtct tcagccatca 1801 tggcaacaga ctgataggcc tttttagttc tagcatacgt ccctccataa taggaatatc 1861 caaaagtaaa gggcggtgta gcagaattag tcaattcctc aagacgattg ttcaataaag 1921 tggtaaacaa gccttctaca aggtaacttc taaaatctcc aactgttgta attttctttg 1981 gtgcatcata atctttgtag accaactgca cctgagtatt cgaagcttct ttatcacttt 2041 ctaccgcaac aaaagtttcc ttgtgattgg gtacttcaaa aacgtttctc actttttcgt 2101 tggctggatt tttataggaa gcaaaatgtt ctttaatttt cttttccatt tcggctacat 2161 caatatcacc aaccacaata acgctcatca aattgggacg gtaccaatct ttatagaaac 2221 gagttaaggt ttcgtatttg aatttttcta aaatctcttt ttgaccaatt ggcaatcgat 2281 tggcgtattt agaattgtac atcattttag gaagataacg ccccatcatt cttttttgag 2341 cccctaatcc taaacgatat tcttccagaa ctacacctcg ttctttatca atttcctcag 2401 gtgtcaatac ggtattgaat gcccaatcct ctaaaatttg gaatcctttc tctaattttt 2461 ctggattatc agaaggaatc ggaagaaaat aaacggtttc gtcaaagcta gtataagcat 2521 taaggtgctg accaaattta actccaatac tttgtaaata atctactaat tgattcttcg 2581 gaaaacgttt tgtgccgtta aaacacatat gctccataaa gtgagccaaa ccttgttggt 2641 catcgtcttc caaaatagaa cctgcgttta cgaccaatct aaggtctacc tttttttctg 2701 gtttggcatt ctttttaatg tagtaggtta acccattatc gagtttaccg gttttgacat 2761 tagtatcaaa tggaatagcc tgtccataat ccatagtttg ggctaatatt gttacaggaa 2821 acaacattat ccctagagca aattggttga tttttttcat aaaaaatgtt ttgttgatag 2881 cgtaaaacta tagatttaac gacaaactga aatctttatt acaaaaaaat taacagaaat 2941 tacttccaaa aaattaatat cctcgattta actacaaaaa aacactataa ataaagcggt 3001 ttagtatatt tacattacta aacaatcaaa aatgcggtac cgtttattta aaatcatctt 3061 gggagtactt ggttttatac ccatagatac tattgaagct cagtcccttc aagaaataat 3121 cccaccatac aacattaaga cagtttcctt tatggacaat ggtcaaaatg taattccggt 3181 ttttaaatta ggggaagctt tcgagcttaa ttttgatgac ttatttggga atgaagctga 3241 ttattattat gaagttatgc attgcgatta taattgggta ccctcagaca tacccaaaca 3301 agaatacctc aaaggtcttg ataatcaaag aatcatcaac tattccaatt cctacaactg 3361 ccttcaactc tattcacatt accaactcag tatacccaat cagtttacac 
aattggcttt 3421 gagtggcaat tatatgatca aaatattaaa tgatgataga gaagtcgttt tttctagaaa 3481 atttattata tacgaaagac ctcgccagtg taagattaca aatcaaaaga ggcagaaatt 3541 taagcaccat cgaaagcaaa caaaacctag atttctcgat aacgtctaag agccttactt 3601 ttcaaaatcc attacaaaat atcaaggtgc ttcttttaca aaaccgtcaa ttcaacacgg 3661 gcatcaaaaa tattgtacca caatacacca ttggcaatga attaatttac aaatatgata 3721 ccgaaaccca attttgggca ggcaatgaat tccgtttttt tgacaataaa gaaatcagaa 3781 atgccgccaa taatgtagct aaaattagtt ccaatggtgg gatttataac agtttcttat 3841 ttacggatga tgcccgagcc aatttcccat atagtttcac gcaagatgta aacggaaatt 3901 ttgtagcgcg caacctcaat gccaccaatc aaaaaataga agctgattac gcttgggttt 3961 attttagttt atcagctcct gcttttcaaa ttgataaagg tatttacatc aatggtatgt 4021 tcaataatta cgcaattgct cccgaaaata aaatggaata caatgaaaag aaaaaagtat 4081 tcgaaaaagc attattaatc aagcaaggct ttaccaatta tcaatatgta gtggccgatg 4141 ccaaaggaaa aattgattca gaaaacagca tcgatggcaa cttttttcaa accgaaaatg 4201 aatacgaagt gatcgtgtat tatagagaaa atacacaacg atatgataga gtcataggat 4261 ttgtcgcact aacatctgta ggtgcaacca attaagttaa aaaaaagaat tccaatacat 4321 ttttttgtaa ttttgtcatt attaagcacc tacttttcac aatatggttt cacaaataac 4381 aagaggcata aaaatatcgg ttttaactag ttttgaaggg acttacttca aaaactacaa 4441 aatacatttt gcctttagtt accaaataac aattgaaaat cacagcaaag attctgttca 4501 actgacctcg cgtcactggg aaatctatga ctcactcaat gatcacgaag tggtagacgg 4561 agaaggtgta attggaaaaa aacccgtctt aaaacctgca gagcaacata cgtatagttc 4621 tggttgttta ctatcctctc cttatggcgc catgaaaggg aatttcaaca tgattaattt 4681 tactaccact aaaaatttca aagtatttgt gcctaatttt agattatgcg ctccttttgc 4741 cttgaattaa tttgactttt attctatact ctagaatgaa aatttcattt ttactttgta 4801 cttttacctc aaattttgga aaaacccaat aatattaatt agtatttaca atcatctaaa 4861 ataaaaacca tgctaaaagg attttttaac gtacccaaag cggttaacga accagtaaag 4921 ggttacgccc caaactcacc tgaaaaagca gctgttcttg ccgcttacaa agccatgtgg 4981 aattcaaaaa ttgatgttcc tttatacgtt ggtagtgaag aaatcaaaac gggtaatact 5041 agaaatatga ccgcaccaca cgatcaccag catgtggtag gtacgtatca tttagccgaa 
5101 aaagcacatg ttgaaaaagc tattgctaac gctctagaag ctagaacagc atgggcaaac 5161 atggcatggg aacaacgtgc agctatattc ttaaaagctg ccgagttgat tgctggacct 5221 tacagagcca aaatcaacgc tgcaactatg attgcacaat ctaaaaatat tcatcaagct 5281 gaaatcgatg catcttgtga gttaattgac tttttacgtt tcaacgtaga gtttatgact 182 5341 caaatctatg ccgatcaacc tgcctccact tctgacatgt ggaaccgatt agaataccgt 5401 ccgttagaag gattcgtata cgcaatcact ccattcaact ttaccgctat cgctgccaac 5461 cttccagcaa gtgctgctat gatgggtaac gtggtaattt ggaaaccaag tgacagccaa 5521 gtattctctg caaaaatcat catcgatgta ttcaaagaag ccggagttcc tgatggcgta 5581 atcaatgtgg tttttggaga tgctttaatg gttaccgata cggttttggc aagtcgtgat 5641 tttgctggaa ttcattttac aggttcaaca catgtattta aagacatttg ggcaaaaatc 5701 ggaaccaaca ttcaccatta caaaacctac ccaagaattg taggagaaac tggtggtaaa 5761 gatttcgtgg tagcgcatcc aagtgctact gtaaaacaag tagcgactgg aattgttcgt 5821 ggcgctttcg agttccaagg acaaaaatgt tctgcagctt caagagctta tgtacctcaa 5881 agtatgtggc cagctttaaa agaacaattg attacagata ctaaatccat gaaaatgggt 5941 tctccagaag attttggaaa cttcattaca gccgtaattc acgaaggatc atttgacaaa 6001 ttagctagct atattgatca agctaaggca gctactgatg ctgaaattat tgttggtgga 6061 aattacgaca aatcagtggg gtattttatt gaacctacca ttatcgtaac cagcaatcct 6121 aagtacacta ctatggaaac cgaattattc ggtcctgtaa tcacaatcta tgtttacgaa 6181 gatgcgaatt gggcagaaac cttaaaattg gtagacacta cttctgagta cgctttaaca 6241 ggagccatct tcagccaaga ccgttacgcc atcgaagaag cgacagttgc tcttcaaaat 6301 gcagctggta atttctacat caacgacaaa ccaacgggcg ctgtagtagg tatgcaaccg 6361 tttggaggcg caagagcttc aggaaccaac gacaaagcag gttctgcctt gaacttattg 6421 cgttgggttt ctccaagaac tatcaaggaa acttttgtaa caccagttga ttacagatat 6481 ccatttttgg gagaataaaa atgtaaaaga ttgcagattt tggattaacg atttgtgatt 6541 taagatttaa aaccccacaa gaattcaaat tattttgagg tcttgtgggg tttgtttttg 6601 tcatgctctc caattttgta aattgcgata aaggagaaat cgtagcgtgc gagtttgcca 6661 ttttaatttt ccaacttaaa actagaaatt gtatgaaaac tatctatttc tgtgtttttt 6721 ggacctttaa atgtatcata gttcactacc aataattttc ctttataaat aatcgtctca 6781 
caaggattat ccaattgtcc gtcggcacca tcattgttac tgtttcccca aattgtagtg 6841 attgtgttag tatttaaatc cagttgatgc acactgttgt tatcataatt ggcgataaaa 6901 atagcattcc tttccttgtc ataaaaaaaa ccgtcacagc attttaattt atccgaatca 6961 aaaacaactt ttttagattt tacttttcca tttttaaaaa actcaatttt agtaattaca 7021 ccatctccga aacttcctgc atataaattg ccttttttat caaatgcgat tccgtcaatt 7081 ccaagtgaat gtttctcggt ttcaggcttc aatctaaaag tgctaattag atatttttgt 7141 ttatttgaaa catctaaaac aatcttattt ttatttaatt cttctaatga aaaactataa 7201 ataccacttt ccgtactatt tttaaaggat gcatctgtaa tataaattcg gtttttaaac 7261 caacgaagtc catcagaaaa attaaatcct tcgactaaaa cttctgactc aattggcttt 7321 ccatctttca caataaccct aattaatctt gaaatatccg ctttccttgc aaaacactga 7381 ttgtctatca catataaatt tccatcagga ccaaactcca ttcccattgg atgtactctt 7441 ttgatttctg gatgcaacgg caatttatcg aaccaagtga tcggtttatc gtttttatcg 7501 aaagttagaa ttttacttcc aaacttttca aaagttgttg gattggtaac cgacagatat 7561 aaatttcctt tcttgtccaa cgccattccg tctggcgttt ggcacgtttc gtctaaatca 7621 gcaaataatg ttggctttat cgattttgaa gaaggattat atttaatttc ttgcgaaaat 7681 gaatttgaat aatttagtaa aattagaaat ccaaatacag ttttaattag taatttcatg 7741 aagttttagt ttagtacatt gctatccgaa agtcataaat aaaaaatgcc tgagaatccc 7801 gtatttttga ttttgcgaaa caaataatac tttatgggac tcttcaggcg aaacaaaaat 7861 atcaaaaaac cagttattcg acaaattatt gacattgttc ctcgatggat gcttgaatct 7921 tgtgctaaaa aacacagttc ggacaaaggc tgtagtaaat acaagactta cgaccaattt 7981 gttgcattaa cttacggaca gcttaataaa tgctacactt taaatgatat ttccactgga 8041 attggggtca gtgagacttt tattggggat ttagggctga ttcaaagtcc agcacgatcc 8101 acaatgagtg atggcaataa aaaacgtgat tggaaagtat ttgaaacctt atattacagg 8161 cttttgaaac attatggaag agttttaaaa agtgaatctc aacgtagtat tatcaaagaa 8221 attaaagaac aaaacataaa attgatagac agtaccacaa tcagtttgtg tttgaacatg 8281 tttgaatggg caaagtttcg gacagcaaaa ggtggcttaa aaatacacac ctgctgggat 8341 gataatttac aaatcccgga tttggtaaat ataaccgaag ctaaaacaca cgatcggtat 8401 ggtataggcc agttagtttt ctcaaaagga accatcgttg tggaagacag agcttatttt 8461 gatttcgcat 
tgatgctgca tcgaattcaa gcagaaaatg tttttgtaac taggattaag 8521 acaaatactt tgtatgaatc catccaagag ttggatttgc cagatgaaga agaccaagac 8581 attatcaagg acgaaatagt tgttttaaac agtagtaaag ccatagagac aggtattaac 8641 caagaaaaac taagattagt ccatgtgtat aagcaagacg aaaataagat tatagaaata 8701 ataaccaata atttagaatg gtcagcaaga acaatagccg atttgtataa aaagagatgg 8761 gatatcgaat cgtttttcaa ggcgatgaaa caaaatctgc agataaaaac tttcttgggt 8821 acaagtgaaa atgcggttaa atcgcaaatt tatatcgctt tgatttgtta tttgttgctt 8881 gaattgataa acagaaccat atcaaaaaag acaaactcat tctccaatct ggttgaaaaa 8941 atcagaatct gtttagtgta ttatttaagt ttagattata tctgcaatca ggttggagat 9001 ggagccaaaa aaataagatc tgacccagaa ttaaaattca gttctgactt attttcagca 9061 taataaaaaa gtatacacct atttataatg gctaacaagg tattttatgg tcattcatta 9121 aaactttcgg atagcaatga gtttagtatt aatttttaaa ataaatttct tttcagctcg 183 9181 gttattacaa ataacaagaa gcacagaatg agttttatca tgctttatag ttttagcgaa 9241 tatagaacta gaataaatcg ctaaaattat gagcattcaa aaagcatgtt tttacattca 9301 aataatcaaa ataaaacaaa aactatattc gtgtgcgaaa agaccttttt tagaagcaaa 9361 aacgcattat tcatcttagc gaattagtcc aacttcaata ttttattatc ttgctcccaa 9421 ttaagatacc ttatgaattt attaaaaaac aaggacctga aaagagtctt aatgcattgt 9481 gtatattgga tttctttttt attattatat gtatctggaa aatctgacga taccacttat 9541 tacgatttta tttttgtata tacatttaaa attctgacac aagcaacagc cggctacggc 9601 ttgatttatt ggattattcc gcaaacactg aataaaaaga aatacctgct ttttgtcgtt 9661 tctgctttgg gttggctgta tttcgtcttt gctctaataa tgactttaaa atattattat 9721 cttgaaccta aatttcctgg cttttttgat gattggctag gacataaaat gactattccg 9781 gaaaggctga cttctttcaa attaattttt agagaatttt cttttatcac ctatcctatt 9841 ataattttag gttttattag cttcaacaga aaacaacaac gtcttttaaa acttgaagaa 9901 gagaaaaaat caatggaatt gaaggtttta aaaaaccaat tgaaccctca ttttcttttc 9961 aacaccctaa ataatctcta tactttaacg cttaaaaaag atgaaaaagc gcctgaagta 10021 attgcgaaat tatctgaaat tttggacttt gttttatacc gttgcaacga ggattttgtt 10081 gctattgaaa aagaaattgc cttaattgaa aattatattg ccttagaaaa attacgctac 10141 aacgaaaacc 
gattagatgt tttatttacc aaagagatcc aggaaaacaa taaaatttcg 10201 cctttgatta tattgacttt tatagaaaat gcctttaaac atggtgtaat taatgaaact 10261 gaaaaagcaa caatccgatt acatttagaa agtaaaaaag aaaaaatcat ttttcctatt 10321 gaaaacacaa aacctcaaaa tggctttgaa agaatttcgg ataaatctaa aattggttta 10381 gagaatgttc gaaaacaact cgatttatta tatccaaaaa aacatcaatt agagatcgac 10441 gagacgctgg ccaattatat cgttaaactt tctcttacag cttagaattt cttaaaaatc 10501 actcctcaaa attattacat gaaaaactac ttcttaatta gtactattgt cttcttcagt 10561 tttatttcgg ccaaaagcca atctaaagtc attgacactt tggtcgatgt tggcaatcat 10621 aaaattcatt ttaaaattat aaaaggaaaa ggaataccaa tactttttga tgccggagga 10681 ggaaacgatg gctcagtttg gaactcgatt ttagatcaga cttcaaagat tacaaatgct 10741 actttaatta cgtatgatcg tgccggtttt ggcaaaagta caatcgacac attacaaaag 10801 gacgatccaa aacacggaat tataagcagt gttgaatatt tagaaattgg tttaaaaaaa 10861 cttggttatg ataaagaaat catactggtt tcacattctt atggtggcta tctatcggca 10921 ctttacgctt ccagacatcc taaattagtt aaaggtgtcg tattaattga cgtgaatcat 10981 aattattatg aagatggtgt aatcgaaaaa gtactagcta cgcaagacaa gttaattcca 11041 cagtggaaga gaaacaacaa aggcacttat tacatgtcgg ctacaattct ggaaaccgtt 11101 aaaataatga gtaaaatcaa tatccattca agcattcccg tggttgattt tgttaatgga 11161 attccgttct taaaagatcc agaagaaatt gaacgctgga aaaattgtca caaaacgtat 11221 gtcgaaaata atccaaatgt taccggcatt acggcttcag gatgtggtca cggaatttgg 11281 atggacaatc cgcctttagt aataaatacc attgccaaaa tctacgcaaa cacttctaaa 11341 caaaaagctg aaattcagga acgcgcaata caatatgcca ttcaggcttg taacgaagaa 11401 agagctcagg aattggcttt taatcattcc gaagatgatt taaatacttg gggttatgaa 11461 ttattaagga aaaatgaaaa ccaaaaagcg gcagaagttt taaaacttaa catggtattg 11521 aatccaaata gttcaaacgc ctatgatagt tatggagaag cattactgaa aattgatcgg 11581 aaagaagaag caattcttat gtataaaaaa tctattgagt taaaccctga aaataaaaac 11641 ggaaaagaag ttttagaaaa aatcacaaaa gacaatgcaa aacaactaaa ttagattttt 11701 agaaatttag aaactactta ttcctctatt gtatgaaata caaatgtatc atcgttgacg 11761 acgaaccttt ggctagagaa ttaattgagt cacatttagc tcatttcgat agtttcgaat 
11821 taatcaattc tttcgaaaat gctcttaaaa catatacttt tttggaaagt aacgatgttg 11881 atttgatttt tttagatatc gaaatgcctt tattaaaagg aaacgatttt ctaaaaaaac 11941 taaagaatcc ccctaaagta attttcacaa cggcttacag agaatatgcg ttagaaggct 12001 acgaactcaa tgtgattgat tacctcttga agcccatcac ttttgatcga ttttttgttt 12061 caatagaaaa attcagacag ttacaaactc aaaaaaaaga aaaaaaagaa gttcctgaac 12121 atcacatttt tgtaatatca gggaacagga atatcaaaat tattttggtc gaaatttttt 12181 atatcgaaag tctaaaagat tacattacca ttcacctcga aaacggaaaa tcgcaccatc 12241 ttaagcagaa tatctctact ttcgaaaaag tactggattc aaattttgtc cgaatccatc 12301 gttcttatat tattcaaacc aaaaaactaa cggcttactc aaaaaacgaa gtggaaataa 12361 atgcagtaga aattccaatt ggaagtagtt ataaagaaaa ttggctagct tatttaaaaa 12421 ataaataata aaacccacaa gaattcaagt tgatgagttc atgtgggttt tatatttttt 12481 tacgaataaa aggttaaaac tataccaaac ccatttttcg aaaaagtacg taggcacatt 12541 cacaataaaa caatcacttg tttttcaata aaataggtgt tatattagaa gaaacaaacc 12601 ccgaaaatgc tcaaaaagag tcaaaaacgg ggttttcttt taaagtgtga tgacaaaaat 12661 aagacgttta agcgaatttc aatacgaaat tccaattttt accatcgata aagattttga 12721 tggtttttac gcccaatttc tcaaaactga tttgggtaaa atttaccttt ccacaccatt 12781 ttatgagctt gccaagagtt ttaagttgaa agatgcccaa aaaggcaccg attgttattt 12841 tagtccaaaa ggaaaaatag ccttgatgat gctcaaaaac tattacggct gttcggataa 12901 gaacttatcg aacttttgaa cggtaatatt tttatgcagt ttttttgcga tattctcatt 12961 ccaatcgaca gacctttgac caactttaaa atcgttagcc aaatcagaat ggaactttct 184 13021 aaatcattaa acattaggaa atctcaggaa atattggcta agaattggat tccttacatg 13081 aaagatttgg acaaaatctt taccgatgcc acttgctatg aaagcgaagt gcgttttccg 13141 accaatcaaa aattactgtg ggaatgtgtg caatggaatt ataaacagat ggaatcattg 13201 tgccatttgc tcaaaatcaa attgccaaga accaaatata tagattggtg cagacgctac 13261 aatgagtatt caaaaaagag aaaaaaacag tttaaatacc gtgtaaaagt aacccgaggg 13321 ttgctcaaat tattgtataa attaaatggc gaacttagcc gaattgagaa ccaaaatccc 13381 tttgaggcaa cggcaaaata caaacagcaa cgcgccataa ttgccaaagt ttatagtcaa 13441 caggcacaaa tatttaaaac tgggaaaagc gttcccgata 
gaattgtgag catcagcaaa 13501 agttacatcc gcccgatagt tcgagggaaa gaggttaagc aggtagagtt tggagccaaa 13561 gtcaacaaaa tccaaattga cggcatcaac tttatcgaac atattcaata ccgtgctttt 13621 aacgaaggaa cccgtttgca aagcagcgta ttttgcgcac aaaacctaac caaaaccaaa 13681 gtaaaaatgt tgggagccga tgctatttat gccacaaaca aaaaccgaac atttacctcg 13741 aaccataaaa tccaaaccga ttttgttaga aaaggaaaag caggcaaaga cgaagagcaa 13801 cgtaaaattt tagccaaaga aatcaagaaa gaacgcgcca ctcgcctaga aggaagtttt 13861 ggtaaggaaa aggaacatta caacctaaag aaaatcaagg ccaaaaccca aaaaagcgag 13921 atgttgtgga ttttcttcgg aatccataca gcaaacgctt tggaaatagg aagaaggatt 13981 tttcagaagc aacaaacttt agcggcttaa aaaaaacaaa aaaatacctc attggggaac 14041 gtgcctctag tcaacaccga aaaaatactt tttgacacac caaaatccat gaaaatcaaa 14101 aacgaccgat tttgaatttc aaaattggtc gtttttgata tcttttactc aaaagtgcgt 14161 ttacttctga atgtgcctac gtaatggaag gtcttggtgc acaaaaaaca tttgaatact 14221 agagaacgct cctaattcaa ccgctattat ttcttctttt tcgctgctga ctttgttttt 14281 gatgcctcag acttcttaac ataaggctta tacaaaccat ctttttcggc acggagtcta 14341 gcattttgcg ctctttcttt ggcgtctgga tgtgaactag tgattctggt taaaaaagaa 14401 gcatcagcac cttcacttaa tttggccaaa atagcaaatg ctgattctac ggcatttaca 14461 ttgtaaccat ttctcttcat aaaatcatag gaatacaaat ctgcttcaga ttcttgcatt 14521 cttgaatgtt tactatcaat aattactttc cctaattttc ctaaatcact actagttaag 14581 gccgcaattt tatcagatga gaagcaatcg catccataaa agcctctttc ttataggcgg 14641 ctttcacggc atcttgagag tcattattag caacatggcc aatttcgtga ccaatcacag 14701 ccaacacttc gttgtcatcc atcgcatcca tcaaaccttg gtacacacgt acacttccat 14761 ctgcgcaagc aaaagcattg acttctttta ctttgtagac tttgaaattt aacttaagac 14821 cattttctgt agcatgcttt ccgaaaattt ttctcagacg aatatcatac ttatctttcg 14881 catcagccac gggattttct ttatccatct gatcaaccgc agctttcgct aatgcaatcg 14941 cttcttcatt gctaaagcta aatcctttaa ccgctttact tacagaacct aaaattttgt 15001 cattgatttt ttgagcattc gaactcatag aaaatacaaa caaaactaca aaagcaacaa 15061 tttttctctt tatcaacata caaattttga tttaaaatac acctactaaa tttttcgcaa 15121 aagtattcaa aaagaagata 
aatcaacacc aaatattaaa tagaacaaga atattcgtca 15181 ttaaaatttc taaaaaaatt ctatttagtc cttccaattt tcatacttga tccttttaca 15241 ccacaaactt caacggtaaa tgcacaactt ttttagtttc gaaaaattca tcttcaaaaa 15301 tgtcagctaa attatattct gtagccttcg gaaaatcctt caactcctcg gtcaaatcgc 15361 ctccttttaa atacaaaata ccatttttta attggtgctt ttgttgcttt ttaattttag 15421 tttttaccca agaaacaaaa tcaggcatgt tggtcaccgc tcggcttaca ataaaatcaa 15481 aatcgccttt tacattttcg gcacgaattt gttctgcttt tacgtttttc aattccaaag 15541 cttccgccac cgctttgact acctttattt ttttggcaat cacatcaatc aaataaaaac 15601 gtgtttctgg aaacaaaata gctagtggaa ttccaggaaa accaccacct gtgcctacat 15661 ccaaaacaaa agttccgggc tcaaacttat taattttagc aattcctagc gaatgcaaaa 15721 catgtttagt gtacaatgca tcaatatctt ttctagaaat aacatttatt ttttcattcc 15781 aatcatgata taaaatatcc aatttggcaa actgctcttt ttgcaaatca gtcagattag 15841 gaaaatactt tagaatttcg tccatggtaa aatttttaac aaaagtactt attttagcgc 15901 ggaaattttt aagtctataa tgtataatcg aaaatttata atacttacat ttgcccctag 15961 ataaaaaatg ttatgaacac tacctcgccc actttttcta aacaagataa tttaaaattc 16021 tttagaacac tcaactctcg ggttaacaac tactttaagg aaaacaacct tcaaaaaaca 16081 ggtaactgga aactacattt aaaaacaatt attttgttca ctgtcttttt ggctccctat 16141 ttcataattc tcactcttaa tttgcctttt tgggcgtatt tattgctaac catcgttatc 16201 ggaattggaa tggcaggtgt tggaatgaat gtaat
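
The printed record above can be sanity-checked mechanically. The following is a minimal Python sketch, not part of the original analysis, that parses the BASE COUNT line for Clone 905 and strips position indices (and any other stray numeric tokens) from ORIGIN rows. The function names are illustrative assumptions. `origin_to_sequence` should only be given text that follows the ORIGIN keyword, since any stray word made solely of the letters a, c, g, t or n would otherwise be read as sequence.

```python
import re


def parse_base_count(line):
    """Return {base: count} from a GenBank-style BASE COUNT line."""
    return {base: int(n) for n, base in re.findall(r"(\d+)\s+([acgt])\b", line)}


def origin_to_sequence(origin_text):
    """Join the nucleotide blocks of an ORIGIN section.

    Tokens made only of a/c/g/t/n are kept; position indices and any
    other numeric tokens are dropped.
    """
    return "".join(re.findall(r"\b[acgtn]+\b", origin_text.lower()))


def gc_fraction(counts):
    """Fraction of G+C among all counted bases."""
    return (counts["g"] + counts["c"]) / sum(counts.values())


# BASE COUNT line as printed for Clone 905
counts_905 = parse_base_count("BASE COUNT 5545 a 2926 c 2572 g 5192 t")
total_905 = sum(counts_905.values())  # 16235

# Final ORIGIN row as printed for Clone 905
last_row = "16201 ggaattggaa tggcaggtgt tggaatgaat gtaat"
last_row_bases = origin_to_sequence(last_row)  # 35 bases

# The last printed position should equal the BASE COUNT total.
assert 16201 + len(last_row_bases) - 1 == total_905
```

For Clone 905 the four base counts sum to 16235, which agrees with the final ORIGIN row (index 16201 followed by 35 bases), and `gc_fraction(counts_905)` gives a G+C fraction of roughly 0.34.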

Appendix VII: Nucleotide sequence of Clone 1664

LOCUS       CBNPD1_Clone_1664 22765 bp DNA linear ENV 30-NOV-2006
DEFINITION  Uncultured organism CBNPD1 BAC clone 1664.
ACCESSION   EF157671
VERSION
KEYWORDS    ENV.
SOURCE      Uncultured organism CBNPD1 BAC clone 1664
  ORGANISM  Uncultured organism CBNPD1 BAC clone 1664
            Unclassified sequences; environmental samples.
REFERENCE   1 (bases 1 to 22765)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Metagenomic Analysis of a Toxic Cyanobacterial Bloom
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 22765)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-NOV-2006) The Microbial Gene Diversity and
            Discovery Research Unit, Griffith University, Kessels Road,
            Nathan, Brisbane, Queensland 4111, Australia
FEATURES             Location/Qualifiers
     source          1..22765
                     /organism="Uncultured organism CBNPD1 BAC clone 1664"
                     /mol_type="genomic DNA"
                     /isolation_source="Toxic cyanobacterial bloom"
                     /environmental_sample
                     /plasmid="pIndigoBAC-5"
                     /country="Australia"
                     /metagenomic
     gene            complement(255..1361)
                     /gene="1664-1"
     CDS             complement(255..1361)
                     /gene="1664-1"
                     /note="'Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-1"
                     /translation="MSRPMGEDEVGGVGVDLGEAFQIALGVAGGQAGGGLGGGGQARS
                     APAQDLRSLAQPGEAQVIRVLLGPVEGGVGADDPQAQAVRGADGHLAGPELAAGAAGP
                     AQAHLDVVVQSPTGDEAGGLGADRGRRLAGHEGRQVEGVGGDVAEGAAGARPVRIAAP
                     VGLAVAGQVAGQPVLGVLGLDHPDRPQDAPRHQVPGVPDHGIAAVVVGQDEGQAGAGD
                     EVREGPRLLHPGGQGLVADHRDPRLEEGPGGGEMGGVGGDDGDRVDPVRPGGLGPGHF
                     GEVAIAAVRVEPQGQAEGPRPLRIRGKGAGDQGPAIVQPGGHAVDGADEGALPAADHA
                     EAKAAVSGQGRASGGWRRRRRRWRRNRRSSCRSR"
     gene            complement(1316..3880)
                     /gene="1664-2"
     CDS             complement(1316..3880)
                     /gene="1664-2"
                     /note="'Putative Peptidase, S1 Family'"
                     /codon_start=1
                     /product="CLONE 1664-2"
                     /translation="MVRRPGGQGHVGQGRVHRGAGGHDRPVGEEEVGDLVRLAPAVHH
                     RGGGIAAHAGRAHLVDAVAGRGDVFGGLFPGVEIEAAGGREHLPGVPGRLAHHPALIG
                     AMGGVDDRHGHSPGVLARRREGDAVVGIGQALAEGQERHRPGQAARGLRPPGAHRPAL
                     HAEAAQVGLVRLRHIGVARDVDARRALAGIVLVLEARPAAGDGPGADMVHQVAPHLVA
                     AIGETPREGLRHRVQEDGRGVDAGRIEEDHPRRIGAPLAGVAVDHPHAGDSAESGVIL
                     DRGDDGVGHDGQVAGGQGGGKGGRLGREIGAQIAAVGASQLALAPPAAKILMQLGGAA
                     EVGGPTHDHAAGGEGRFDPGPHGLFDAVHRPGGKEVAVGQLTQAVIVAADPGEGLHMV
                     IPGGEVVIPDRPVHPVPVLQVGLEVLRRPAVGLPAPGEGAAAQLVAPDPAIGLAGRGL
                     VGVVEVAGPEGLVGLEQGVGDAHVIGVVLLALLRGDLVLAAAAAGGEVVAVVLPVANL
                     RPPLEHQDLQAPFAQLLGDPAAADPRPDDDGVKHAWLPGRRRRRRRQGGDGLPARHEF
                     MAKPLVIALAGEKAGDGGPVDLLPVVEVASPGGAGGVDMADQVEVGGQGADQVPLHQL
                     HVIAVVKEAQARVVQLTHDFGAEGRAVALVAGVVDLAVQELQEQAHAGALRVLHHGPE
                     ALDAGGDGRGVGAIADSRAGEDDNRLAAARGGVIDPAAHLVTQGGVPGGVVEAGLQGV
                     AAEGDDGQAEVLSHGPQVVGQHLHPPGAGAGGPREEGPGIGGGVHGEGPPGEALPDHA
                     VSWEREIQAPSAAALRTAAVMKATPLTPSRTPGRPARRSGRACPARWARMRSAASA"
     gene            complement(3847..4746)
                     /gene="1664-3"
     CDS             complement(3847..4746)
                     /gene="1664-3"
                     /note="'Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-3"
                     /translation="MGSSLTGWSFSGTPPPPNSFPGALRRRRAGPDRRSPQSPQVPQT
                     RLHAGPVGLGREVVFDEAVLHARLVAGGQEGGEVEGSGAHIVHAAVGPVVLQVQEAHA
                     AGQAADRVQGVLAAPGQPVDVQLEADEGRVRPGDQAVEGKGSVGVGGELEVVVVVGDA
                     EALGGRPAGEGVQAGDHRLPVALVGHDEGQDDPAAAEVAGHGQDGVGLLGHVVQAGVH
                     GGAVEADVAQVGAQVGEVHGQGGGELHGVVADLADAAQRPLGVGGEGLAQGVELQAVG
                     GHQRAPSRRRSAAWFAVRADRAM"
     gene            4796..6370
                     /gene="1664-4"
     CDS             4796..6370
                     /gene="1664-4"
                     /note="'Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-4"
                     /translation="MQIQGPGEGPGKVSAQVRQVQHPRLSAETRGRVGEVGGRGGFEV
                     EGPDPVAGRRQEASGVELGSDRPVTLAQPASEEGAGRFVLRSQLVRHVPIVGPENAPR
                     ARPLQRHQRLRVFRHGPVAGVPGAWREIRSGDRGGDAVQPGLHPGVIPEDEVGGGERI
                     HPPEALFRHPHRVDVQSPLARRQAGARGGGEPGAQAREAGEGPAARCDDGCVIESGPP
                     GVHLQHLGGGGRIGLGVRHDCGKVGEDGRPRAQRLRDLGQPGQNSVDVLHVHVAEDDA
                     QAGPGQAQGAKPLHQCGQLAVLPPDVAQGLQAARRVPGSVFEVGVEGGGADKLIFDQE
                     DEGPFRPRGDLQNDRCAHLVDFLCAMRGLAEPDDLRGRVEDPGGEGCGGVRSQAPRRR
                     IQEQGAHRRLEVALGGAEGAVAGNRTAAVGAGPPGPVRPERRQRSGEGRRAKRSQTRP
                     SREGQGRLAFAARPTAEEILRAGLTHGRSSCDRDNRRPSFGKAARRTATHVVEARPPW
                     GWSSWWWRQSSRLCGG"
     gene            6915..7685
                     /gene="1664-5"
     CDS             6915..7685
                     /gene="1664-5"
                     /note="'Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-5"
                     /translation="MDLLSLGRVVVLHRDQPDLPGAEVEGLEDVELGAFRVDGQVVDA
                     PRRAPVRQEVIERDGGHLVALALRPGAGQAVALGEGVDGGMGRDGRLVEDQGLPRPRP
                     DAPVVYEAGPVAPQGRVPDARRLGEDAVPAQPGLQGPGVGEPDPVRRPELHEEPAGPH
                     LRLVGHPKVLQELAVAPGGVIHAELEEGDLLVLAGIGIPQADHQEQEVGAADRHHVGH
                     SLFGLRARPSMARARRGDGDLRAKFGGPWPPVRSVFVV"
     gene            7698..8186
                     /gene="1664-6"
     CDS             7698..8186
                     /gene="1664-6"
                     /note="'Putative CBS domain protein'"
                     /codon_start=1
                     /product="CLONE 1664-6"
                     /translation="MRRRRNPTMVPPSPTGTGDFMLVSQILRAKGDAVFTVAPTDTVG
                     RVAELLHSRRVGAFVVTQGDRVVGIVSERDIVRAVAGGDVGVTSRPVSAYMTAEDLFA
                     APGESVDALLTRMTDRRIRHLPVLEAGRLVGIVSIGDLVKWKISEVEAEADGLKAYIA
                     GS"
     gene            complement(8168..9736)
                     /gene="1664-7"
     CDS             complement(8168..9736)
                     /gene="1664-7"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-7"
                     /translation="MRGRWTGARPRAGGGPPRLVREDHRLAAVQEDPVLQHPAHRPGQ
                     HPALDVAALAHQVLRLVAVADPLHVLLDDRALIEVGGDEVGGGADHLHPPGVGLMIGP
                     RALEAGQEGVVDVDASAGQVQGHVLGQDLHVARQDDEVGPRGGDQGAQAGLLVGLGLG
                     AHRQVVEGDALEIRMRKTRLRVVGDHRRDIHGQLALPPAPDQVGQAVIGLRGQDHHPS
                     AGLGVAQAPVHGKVGGQGLQPRTHRLRPAGAGGEDHPHEETAGLGVVELLGVDDVGAV
                     LEEQGGDAGDDAGPVGTGQGEDHAGAPGNGAADTAKGVGWEGAGRPPPATPEAVGIGP
                     PGVLWIAFPPGRDGTAGRREVGSEPAKQDGQDEALNSLQNEDGDNGRNVQAAQRRDNA
                     LKRRQDRRRQGLQNPVDPGHEAVAQVQDVEGRQPRHDGADDDGPDHHVDEAAEDKVQE
                     ATDHPRHISRPPLCDNPGPPSVLTARPQNRLFRAFLGAVAQLVERIVRNDEVRGSTPL
                     GSTRKPPPDQLPAI"
     gene            9740..10837
                     /gene="1664-8"
     CDS             9740..10837
                     /gene="1664-8"
                     /note="'Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-8"
                     /translation="MEEFQRLDGSPEGELPAKRGEGVSGGSVDVVGPLAVGAEVEAVV
                     DDLVAGPQARQLPDDGDQDQAGARRPGADHDDPESLAPGLAGQADALGIAGPAQEGVG
                     QPCEHQGPENAAHSVDREDIKAVIDLQPGLHQADGLEADIAAERPQEEARQGADEARS
                     GRDGGQAGNDSGDKADQAGPAKAAVLDQGPDQARRGRRHVCGGEGRAGVGAGRQGAAA
                     IESEPAHPQEAGPGHGHARIVRRRQVPGPAVPGSHRLRQDESADPRRGVDHQAPGIVL
                     DPQGRHPASAPDPVADGSVDQDQPAGAEGQEAGELHPLDHGADHQGRRDDGEGHLEGE
                     EDDLGRRPDDVASGQAHQADATEVADEGGGP"
     gene            11258..11800
                     /gene="1664-10"
     CDS             11258..11800
                     /gene="1664-10"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-10"
                     /translation="MEPGSEREIGKTARSGADFAVCRADCGLHRGGNLHGIGNSGFPL
                     TGRRLVKNEADPLSGFREQRGKAAGILDPRLPGPGRLDGPGAQRRPLRPGAPVAGWPA
                     LVDHQPECRQPSPAFGHPRRPGDRTARQRQLRDLPGLRPAPRDRGPAPGLRAEGRAAG
                     LPGLRRNREDRDHLVRPVHA"
     gene            complement(12127..15222)
                     /gene="1664-12"
     CDS             complement(12127..15222)
                     /gene="1664-12"
                     /note="'Putative haemolysin activator-related protein'"
                     /codon_start=1
                     /product="CLONE 1664-12"
                     /translation="MDAGDPGRLGRRDMQAGVDHLQRPEDVAVEINVQPLARQPLHHM
                     ALDVHAGAIPPVRPGLGGERQGGEAADHLLQVAGPQARLGVAGADIEARGMAHQLGHG
                     HGRGLRDQAPVGRLHLEACELRDVPFDGVVQLPEALLPELHHGDAGHRLGHGGDPPDG
                     VGPGRGAGLHVRKAPGLEVEDPPLPGRQGRDTGQLAAVHQFLQVGRERLQLLRIKAGR
                     PRRDHGQGRTRGEPGGAEGGQGGRQGGRRGGGDEKVAAGESHGHGSGDENANVSCDPA
                     RRRAPCATMGPAPPGAARRRPAARKRRGGPRPEPLQRPRARGGPAGPQPGLPAADPRP
                     APVRAPAGPLARRDPHARPPCGVRPWRNLDARGQVCGYARPGEGLPAGRLRLRHGELP
                     PAPAWAPPGGRPGHRRGRGLPAAGGEGLRPGSGADHPDGSFGRRPSGRTGGPGRVLSG
                     GGGRPAGGCPRRHPPGRLRPGHRAPCPRDGGPGLCPGLRSGPLRVGRHVAGHPRRAGP
                     AAAPRPDACGRRQRRDPGPGPGPGRSPGRRRRGGGDPRGQGRKPQQPQLRLRGARRRD
                     HPADPGLPGPAGALSAGLVRTGRRGLNRDQTSDRTGRKLHERPDAPHHGPADLRLQPA
                     RRHERPADERKSDPAVRARQALHRGDSRADAEGVPGAGRGAYRPLELCARPARRAGKG
                     EGPGQEGRPVELLPARRRDWRRPVQPRLRLYRRGARQGAHGLGGHELLRARHRQHGGA
                     GARRHARAEGKVAEAPAERRDPLGLRHDRAGGGLVRRQEHRHPGGAGRRRMGDQRREV
                     LHLRRRRPALQDHDHHGPHQSRRPAAQAAVADPGAHRCARGGDPRPDACLRRRPCAPR
                     PHAHPLHQCPGAQVQCPARRGPGLRXQPGPPRAGPDPPLHAHDRQGRAGPGPDGPPGS
                     LARGLRPQDRLARRQCGDHLPRPDRHRGHAPDGAEGGPRHGCAWQPRGPGLGVHGQGH
                     GAGAGLPDHRPGHPDARRHRHLPVDAARLHVHRRASPAVRRRPGRGPSPGGRPQRDPR
                     L"
     gene            12162..13448
                     /gene="1664-11"
     CDS             12162..13448
                     /gene="1664-11"
                     /note="'Putative regulatory protein, TetR'"
                     /codon_start=1
                     /product="CLONE 1664-11"
                     /translation="MMDLVRAVGEPQVTHVGVHGGERRPLGDAGGAVHLDGLVDDLAG
                     PLRHHGLDHGHPDPGLAVAKHIHGAGRLQHHQAHGLDVDPGAGDDLHIAAEPGDLAAE
                     GLARETPADHQVQGPLGLADRAHAVVDPARPEADLADXEAPALAEQDIGLGHPDIGEA
                     DVHVAAGRMVFAEDMHRAEDLHPGRIDGHQDLRLLLVRRGVGIGADHGDHDLAARVAG
                     AGDVVLLAVDHPFVAVQHRLGGDVLGVGRGHRRLGHGVGRADLAVQQGLQPLFLLLGR
                     ADALQHLHVAGVGRGAVHDLRGHGRLAELHGDIGVVEVGQAFASLGVGQEEVPQAFLL
                     GLVLRPFQHVELAGRKVPAVGMPLAQRLELLLHRHDCLRDEALDVLVQRGRFFAHPQV
                     VHVVVRVEAEGQRARGGGRRGVHGVSFLSGLMSGRD"
     gene            complement(15546..17333)
                     /gene="1664-13"
     CDS             complement(15546..17333)
                     /gene="1664-13"
                     /note="'Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-13"
                     /translation="MALRPRSVGRGLLRPWVFTPPDPETNLGIAPDLHRRRHPGRDRR
                     QDPAAGHRAGGALRPALPDHRGNPRRHPGQPLPRRPGRRPGGGLPAGRPVPLRHRPVL
                     HRHGRLDPDPRQAGRGRTPQGAREDGRLPDHHRSLLHRRDGRQDPAGHRRPRRAVRIR
                     PHRHPGHDPGHDAGQRPGRLPRPRDPEAGVPDPGPPDRRRPLPGHRRLDDRPDRGLGL
                     RPRAKAISPPRGSSRRSPVRGSAAPGDEDLDAGGDPAGGGVVPHVEGKAHPPPGVRVE
                     QHGALRHVAEGLDIRRGLGSAGHGQGDGLARPPAEGGEGFGGEVEDQVAIGPVRPDGA
                     GTCAVGADAERGEVQPDQGLRVDPRPDRPPVIAAGQAPHDRSKNVPAVEGGGPAGADH
                     PGLRGDLPGLDDGIARIEQGDQAIVGQDQALAPAGPGHDGPAGRTHAGIDHRQDHSVR
                     GQPGRGPGQEPGPGRHVVGRDPVGDVEEPGVRGEGEDHRPADGDGVVDLVEVRQETHH
                     PPRGGGAHRACDEGQDEAGGEQPPPGWTGGAHAAGVRRAAYRRRMLSSRTGGLPAPGM
                     LPLRFWIMPTKTMSFVGSIQNQVPKAPPQ"
     gene            17502..18476
                     /gene="1664-14"
     CDS             17502..18476
                     /gene="1664-14"
                     /note="'Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-14"
                     /translation="MILRCAPECPMCAPENIVPAGPGRERPMKVAAVKAPGGLDKLVI
                     EERAQPVAGPRQLAELPRLRRRRRHAARRRRAHPHVRRGRRGGRCGRGRHGLRRRGPG
                     PLDLLPQLGDGRPGPGTPHRRSRRPRRRLRGGVRRHAGPGLHXHARRLVLRRGGDPAL
                     RGPHRLARPDGRGADQARRRRPDPGDGRGLDLRPPAGQGGRRDGHLHLLLRREAGKAQ
                     GPRRGPPDQLSPDPGMGRRRRRPHRRPGRRRRGGDRRGRHPRPVGHRHPDRRPRLADR
                     RPGGFRRRGADRPDHVQERHREGRHRRLAAGPGRDDPRPGGLDPPAGD"
     gene            complement(17629..18537)
                     /gene="1664-15"
     CDS             complement(17629..18537)
                     /gene="1664-15"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-15"
                     /translation="MLLGDLMGEGGGDAVEGECGVNHRPEGRGLQGADHLFLVLPRAD
                     GDALHGDVLGHDQGGRRLAGETRQDADQRDVAADPGGGDRLGEGAGPADLHHDVDAPA
                     AGEGGGGGAPFRGLAIVDQVVRAEGPEPFQLLVGGGGGDDRRAGRLGQLEGEDRDPAR
                     PLGQDDVAGLDPRLDHQGAPGGEGRAGQGRRLGEGPAGGHAGEGPGRHGDELRREAVG
                     VVAGNADEAFQGRAARLPVGEEGREDPVPGGEARDALAHSDHLAGPVGHGDAPVGGGQ
                     HAGDDGEVVEVQRAGADRRRAGPALR"
     gene            18709..19545
                     /gene="1664-16"
     CDS             18709..19545
                     /gene="1664-16"
                     /note="'Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-16"
                     /translation="MEAVRHGGGDPARPGRLQGPRRRGQGPAAGADVVHKRHVRAPET
                     RIVRVLKADLPVPLAGLSGDQQAGPQPRRRRPGPGRALGVRPGDQVHAGPQSCEGRGQ
                     HLPGAGVQRPASRKHRRQVRRAVQVRLQGQHPVEGRGQDLSEDRLAQGLAVVEDPVLP
                     HVGEVGGDEVDARRPRPPQGVRRQQQGHQPHVRPGEAAVEEGLAGHRDGRDGAGLSVR
                     EAVRPDQAQGQAEGLGEASRRPGLVGKGVHGRAHARTNPLSSPAASQARAKSAGSSAC
                     RR"
     gene            19533..21065
                     /gene="1664-17"
     CDS             19533..21065
                     /gene="1664-17"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-17"
                     /translation="MQAVEGLAPADGVAPTLQEVEPRPRVGGRPGEARDPGDRQAVHA
                     RNHAVPRCGDLPDELTRLQAAGEGGGALGLGDGLQLGQGRAALDQFAGPGLARPARTV
                     GVQLDQPPGQADGLGLQVRRRPGLAAQHGQDVPRLQGRSHAPPHRVRSVGLQDLDLQP
                     DPVGDELEKLAKPQGRLRIAHLGARPDRQVQEEVGGARRQLLGEDRGHHLLGGVEGQG
                     PLHRDQDVVHRRQVHASAPDDAAAAALDHGAHLVLVQVHPGQHLHGVRRPGRGGDGPR
                     GRLGRQQAVGRRHGRDDQRGPVSRNAADAVLVGHQGTVPDQPATGAGHGAGQGQGLLG
                     GHEAGAGDQEGRDLDVRPAVVGQVMHHRLDLGVGQGRPGDLGAQGLGALRRRGGRDPD
                     LRPRRNAEPPEGRLRQPRLARPDQGGVVADHHGGHQGAPAGPHLHLARARQPLRPQGA
                     AVPGQEHGVLAEGVDGQGPDGQQHGASAGDHPLTGRPGQSPGPSGGALGSPRLSRLGR
                     RT"
     gene            complement(21577..21729)
                     /gene="1664-18"
     CDS             complement(21577..21729)
                     /gene="1664-18"
                     /note="'Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-18"
                     /translation="MVVVGAIRLDMAIVLVDPELEEGLAQRQEAELTGWSDQGAVADG
                     LGVGRA"
     gene            complement(21723..22208)
                     /gene="1664-19"
     CDS             complement(21723..22208)
                     /gene="1664-19"
                     /note="'Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 1664-19"
                     /translation="MCARAGNRANPQAARAKARRRRQTLISASVRGRSRERALPAVGE
                     ERKGVGCVVVVDPPRKLWGDPEPAIALVEVAEAGNILVVDREGAHGDDGDCGTTWVRV
                     RLEPVAGGDGPDAARSLGRLGVTEAGVXRVAIRRKPDQDHLGRRGRGSGSVKVASGMP
                     W"
BASE COUNT     3250 a   8295 c   7915 g   3305 t
ORIGIN
        1 ccggatgttc cacgtgaagg acgccgagtt ccggcccacc ggccgccagg gtgtctattc
       61 cggctaccag ccctggaagg accgcgccgg ccgcttccgc tcgctgggcg acggccaggt
      121 cgacttccgc gccatcttct
ccaagttcgc ccagtacggc tattccggct gggcggtgct 181 ggagtgggag tgcgccctca aggccccgga ggacggggcc cgggagggcg cgcccttcat 241 ccgcgaccac atcatcaacg tgaccggcaa gagcttcgac gatttcgtcg ccaccggcgt 301 cgacgccgcc gccaaccgcc ggatgctcgg ccttgaccgc tgaccgccgc cttcgcctcg 361 gcatggtcgg cggcggggag ggcgccttca tcggcgccgt ccaccgcatg gccgcccggc 421 tggacgatcg ctggaccctg gtcgccggcg ccttttcctc ggatccggag cggtcgcggg 481 ccttcggcct ggccctgggg ctcgacccgg accgcagcta tggcgacttc gccgaaatgg 541 cccgggccga ggccgcccgg ccggacggga tcgacgcggt cgccatcgtc acccccaacg 601 ccgcccattt cgcccccgcc cggaccttcc tcgaggcggg gatcgcggtg atctgcgaca 661 agcccctgac caccgggatg caggaggcgc gggccctctc ggacctcgtc gcccgcaccg 721 gcctgccctt cgtcctgacc cacaactaca gcggctatcc catggtccgg cacgcccggg 781 acctggtggc gcggggcgtc ctggggccga tccgggtggt ccaggccgag tacgcccagg 841 actggctggc ccgcgacctg cccggcaacc gccaggccga ctggcgcggc gatccggacc 901 gggcgggccc cggcggcgcc ctcggcgaca tcgccaccca cgccttccac ctggcggcct 961 tcgtgaccgg cgagacgcct tccgcggtca gcgccgagac ctcccgcttc gtcgccggtc 1021 ggcgactgga cgacgacgtc caggtgcgcc tgcgctgggc cggcggcgcc cgcggccagc 1081 tctgggccag ccaggtggcc atcggcgcct cgaacggcct gcgcctgcgg gtcatcggcg 1141 ccgacgccgc cctcgactgg tcccaggagg acccggatta cctgcgcttc gcccggttgg 190 1201 gcgaggctcc gcaggtcctg cgccggggcg gaccgggcct gtcccccgcc gccgaggccg 1261 ccacccgcct gcccgccggc caccccgagg gctatctgga aggcttcgcc cagatctacg 1321 ccgacgccgc cgacctcatc ctcgcccatc gggcgggaca agcgcgcccc gagcgcctcg 1381 ccggcctgcc cggggtgcgc gagggggtca agggcgtcgc cttcatcacc gccgccgtcc 1441 gcagcgccgc cgccgacggg gcctggattt ccctttccca ggagaccgca tgatccggca 1501 gagcttcgcc tggtggtcct tcgccatgga ccccgccacc gatccccggg ccctcctcgc 1561 gggggccgcc ggcgccggcg ccgggggggt ggagatgctg cccgacgacc tgtggcccgt 1621 ggctcaggac ctcggcctga ccatcgtcac cctcagcggc cacgccctgg agaccggctt 1681 caacgacccc gcccggcacg ccgccctgcg tgacgaggtg cgccgccgga tcgatgacgc 1741 cgccgcgggc ggctgcgagg cggttatcgt cttctccggc tcgcgaatcg gcgatggcgc 1801 cgacgccccg gccatcgccg cctgcgtcga gggcctcggg 
cccgtggtgg agcacgcgaa 1861 gggcgccggc gtgcgcctgc tcctggagct cctgaacagc aaggtcgacc accccggcca 1921 ccagtgcgac cgcacggcct tcggcgccga aatcgtgcgt cagctgaacg acccgggcct 1981 gcgcctcctt tacgacggct atcacatgca gctgatggag ggggacctga tccgcaccct 2041 gaccgccaac ctcgacctgg tcggccatgt ccacaccgcc ggcgcccccg ggcgacgcga 2101 cctcgacgac cggcaggaga tcaactggcc cgccatcgcc ggccttctcg cccgcaaggg 2161 ctatgaccag tggcttggcc atgaattcgt gccgcgcggg gagcccgtcg ccgccctggc 2221 gtcgtcgacg ccggcgccgt ccagggagcc aggcatgttt gacgccatcg tcgtcgggtc 2281 ggggatcagc ggcggctggg tcgccaagga gctgtgcgaa cggggcctga aggtcctgat 2341 gctcgagcgg gggcctgagg ttcgccacgg ggaggactac agcaacgacc tcgccgccag 2401 ccgccgccgc agcgaggacc agatcgcccc ggaggagcgc gagaagcact acccctatca 2461 cgtgggcgtc tcctacgccc tgttcgagtc caacaaggcc ttctgggcca gcgacttcga 2521 ccacccctac gagaccgcgc ccggcaagcc ctatcgctgg atccggggct accagctggg 2581 cggccgctcc ctcacctggg gccggcagtc ctaccgctgg gcgccgcagg acttcgagtc 2641 caacctgaag gacgggcacg gggtggactg gccgatccgg tatgacgacc tcgccccctg 2701 gtatgaccat gtggaggcct tcgccggggt cagcggcgac tatgacggcc tgtgtcagct 2761 gcccgacggc gacttccttc ccgcctggcc gatgaacagc gtcgaagagg ccgtgagggc 2821 cgggatcgaa gcggccttcc ccacccgccg catgatcatg ggtcggaccg ccaacctcag 2881 ccgcaccacc aagctgcatc aggatcttgg ccgcaggcgg tgcgagggca agctgcgatg 2941 cgcccacggc tgcaatctgg gcgcctattt ctcgacccag gcggccaccc ttcccgccgc 3001 cctggccacc ggcaacctga ccgtcgtgac cgacgccgtc gtcgcctcgg tcgagtatga 3061 ccccgctctc ggccgagtca ccggcgtgcg ggtggtcgac cgcaacaccc gcgaggggcg 3121 cacctatccg gcgcggatgg tcttcctcaa tgcgtccagc atcaacacct cgaccatcct 3181 cctgaactcg gtgtcggagg ccttcccgag gggtctcgcc aatggcagcg accaggtggg 3241 gcgcaacctg atggaccatg tcagcacccg gcccatcgcc ggccgctggc cgggcttcca 3301 ggaccagcac gatcccggcg agcgcccgac gggcatctac atcccgcgct acgccaatgt 3361 gacggagacg gacaagccct acctgcgcgg cttcggcatg cagggcgggg cggtgcgcac 3421 ccggcgggcg gaggccccgg gcggcctgcc ctggtcgatg tctctcctgc ccttcggcga 3481 gggcctgccc aatcccgaca accgcgtcac cctctcggcg acgcgcaagg 
actcctgggg 3541 aatgcccgtg ccgatcatcg acgccgccca tggccccaat gagcgccgga tgatgcgcga 3601 ggcggcccgg gacgcctggg agatgctcac ggccgccggc tgcctcgatc tcaacccctg 3661 ggaagaggcc gccgaacaca tcaccccgcc cggcgaccgc atccacgaga tgggcacggc 3721 ccgcatgggc cgcgatcccg ccacctcggt ggtgaacggc tggggccagg cgcacgaggt 3781 ccccaacctc ttcctctccg acggggcggt catggcctcc agcgcctcga tgaacccttc 3841 cctgacctac atggccctgt ccgcccggac ggcgaaccat gcggccgacc gcctccgcga 3901 cggagccctc tgatgcccgc ctacggcctg cagctctaca ccctgcgcaa gcccttcgcc 3961 gccgacccca aggggacgct gagccgcatc cgcgagatcg gctacgacgc cgtggagttc 4021 gccgcccccc tgtccatgga cttcgccgac ctgggcgccc acctgcgcga catcggcctc 4081 gactgcccct ccgtgcacgc cggcctggac gacatggccc agcagcccga cgccgtcctg 4141 gccatggccc gccacctcgg ctgccgctgg atcgtcctgc ccttcgtcat gcccgaccag 4201 cgcgactggg aggcggtgat cgccggcctg aacgccttcg cccgccgggc gaccgccgag 4261 ggcttccgcg tcgcctacca ccaccacgac ttcgagttcg cccccgacgc cgacggaacc 4321 cttcccttcg accgcctggt cgccgggacg gacccggccc tcgtcagctt cgagctggac 4381 gtctactggc tgacccgggg cggccaggac gccctggaca cgatccgccg cctggccggc 4441 cgcgtgcgcc tcctgcacct gaaggactac agggccgacg gcggcatgga caatgtgggc 4501 gccggaaccc tcgacttccc cgccctcctg accgccggcg accaggcggg cgtggagcac 4561 cgcttcgtcg aacacgactt cccgcccgag ccctactggc ccagcgtgga ggcgagtttg 4621 aggtacctgc ggggactggg gtgagcgccg gtcaggcccc gcccgccgcc ttcggagggc 4681 gccagggaag gagttcggcg gcggaggtgt tccggagaaa gaccagcccg tcaaactgct 4741 ccccaaggac gccgcgggag gggcggcgcg aggggtcgta gtcgcccccg atgctgtgca 4801 gatccaaggg ccgggcgagg gcccgggaaa ggtctccgct caggtccgtc aggtccagca 4861 tccccgcctg tcggcagaga cgcgcggccg cgttggcgaa gtaggagggc ggggaggctt 4921 tgaggttgaa ggtcctgatc ctgtcgccgg gcgcagacag gaagcttccg gcgtagaact 4981 cggttccgac cgcccggtaa ccctcgccca gccggcttcg gaggaaggcg ccggccggtt 191 5041 tgtcctcagg tcgcagctcg tgcgacacgt gcccattgtg ggcccagaaa acgccccgcg 5101 cgcccggccc ctccagcgcc accagcgcct ccgcgttttc cgccatggcc ccgtcgcggg 5161 cgtcccaggg gcctggcgcg agatccgcag cggcgatcgg ggcggcgatg 
ccgtccagcc 5221 aggccttcac ccgggggtca tccccgaaga cgaagttggc ggtggcgagc ggatccatcc 5281 tccagaggct ctgttccgcc acccgcaccg tgtggatgtg cagagcccgc tcgcgcgcag 5341 acaggccggg gcccggggcg gcggcgaacc cggcgcgcag gcgcgcgagg ccggcgaggg 5401 cccggcggcg cgctgcgacg atgggtgcgt aatcgaaagc gggccgccgg gcgtacacct 5461 tcaacatctc ggtggcggcg gacgcatagg cctcggtgtc cgccacgact gcggaaaggt 5521 cggcgaggac ggccggcccc gcgcccagcg cctccgcgac ctcggccagc cgggccagaa 5581 ttcggtcgat gtcctgcatg tccacgtcgc ggaagacgat gcgcaggccg ggcctggtca 5641 ggcgcagggc gcgaagccgc tccatcagtg tggccagctc gcggttcttc caccagatgt 5701 tgcgcagggc cttcaggccg cccgccgcgt ccccggttcc gttttcgagg taggcgtcga 5761 gggcggcggc gcggacaaac tcatcttcga tcaggaagac gaggggccct tccgcccgcg 5821 cggcgatctc cagaacgatc gctgcgcgca cctcgttgat ttcctttgtg ccatgcgtgg 5881 cctcgccgag cccgacgatc ttcgtggccg ggttgaagat ccaggcgggg agggctgcgg 5941 cggagtccgg agtcaggcgc cccgtcgccg gatccaggag cagggggcgc atcgccgcct 6001 cgaggtcgcc ctgggcggcg ccgagggggc tgtcgccggg aaccggactg cagccgtcgg 6061 cgccgggccg ccgggtcccg tccgcccgga acgccgccag agatccggcg agggccgccg 6121 cgccaagcgc tctcagaccc gcccttcgcg cgagggacag gggcgcctcg ccttcgcggc 6181 tcggcccacc gccgaggaga ttctccgggc cgggctcacc catggacgtt catcctgcga 6241 tagggataac cggcgcccgt ccttcggaaa agcagcgagg cggacggcga cgcacgtggt 6301 ggaggcgagg ccgccctggg ggtggtcatc ctggtggtgg cgacaatctt cgaggctttg 6361 cggcggttga agtcgttgac ccggccccag atcaggaaca ggtcccccag ccacaggatg 6421 ccgcaaccgc cgaagatcag cggcttcagg atctcgaagc cgaggtcgcc cactatgaag 6481 cggtcgcccc cctgtgtgca gaccaggacc gacaggacaa taatcagggt ccagtccttc 6541 ttttccgaag cggactgggc gttggaccga aactcctgcc gttcgttcag gtcccggatc 6601 agcatctctt gccccaggaa ggtgaattca gccatgggaa ggtccgcggg gtccggcgtg 6661 aagttcgacg agagtcgccg ctgcaagcgg cctgaacggg ataagggcaa ggactccggc 6721 gaggcggaac aggcagcctg gcgcgtggac gaccgcagga gtcccgaggc tgagggccca 6781 ggccgctaca gatcgcgcga cgtcggacga acctttccaa acctttcccc gcctcagtag 6841 gtccgcacgt accagtcatc gacgttggac agggtctcga acttccggcg gtagcccttc 
6901 gaggccagga aatcgtggat ctcctgtcgc ttggacgtgt ggttgtgctc caccgtgatc 6961 agccggatct cccaggcgct gaagtcgagg gcctggagga tgtcgagctc ggagccttcc 7021 gtgtcgatgg acaggtagtc gatgcgccgc ggcgcgcgcc agtgcgccag gaggtcattg 7081 agagagacgg tggtcacctc gtagcgctgg ccctccgccc gggtgccggc caggctgtcg 7141 ccctgggaga aggcgtcgat ggtggaatgg gccgcgatgg acggctggtt gaagaccagg 7201 gtctcccccg tccgcgtcca gacgcaccgg tcgtctatga agcaggcccg gttgcgccgc 7261 agggccgggt accagatgcg cgccggctcg gcgaggatgc cgttccagcc cagccgggtc 7321 tccagggccc gggtgttgga gaacctgatc ccgtccgtcg ccccgaactc cacgaagaac 7381 ccgccggtcc gcatcttcgt ctcgtaggcc acccaaaggt cctgcaggaa ctggccgtgg 7441 ctccaggcgg cgttatccat gcagaactgg aggaaggcga tctgctcgtc cttgccggga 7501 tcggaatccc tcaggccgac catcaggagc aggaagtcgg cgcggctgat cgccaccatg 7561 tcggtcattc tctcttcggg ttgagggcaa ggcctagcat ggcgcgggcg cgccggggag 7621 acggggactt gcgggcgaag ttcgggggac cctggccgcc ggtccgctcc gtcttcgtgg 7681 tttaacgatt cgttgaattg cgtcgtcggc ggaacccgac gatggttccg ccatcaccga 7741 cgggaacggg ggacttcatg ctggtctcac agattcttcg cgccaagggc gatgcggtgt 7801 tcaccgtcgc gcccacagac acggtcggcc gcgtggccga gctcctgcac tcccggcgcg 7861 tcggcgcctt tgtggtcacg cagggggatc gcgtggtggg catagtctcg gagcgcgaca 7921 tcgtccgggc ggtggccggc ggcgacgttg gcgtcacttc gcgcccggtc tccgcctaca 7981 tgacggcgga agacctcttc gccgcgccgg gtgaatccgt cgatgccctg ctcacgcgca 8041 tgaccgaccg gcgcatccgc cacctgccgg ttcttgaggc cgggcggctg gtcggcatcg 8101 tctccatcgg cgacctggtg aagtggaaaa tctccgaggt cgaagccgag gcggacggcc 8161 tgaaggccta tatcgcaggg agctgatctg gagggggctt cctggtggag ccgaggggag 8221 tcgaacccct gacctcatca ttgcgaacga tgcgctctac caactgagct acggccccaa 8281 ggaaggcgcg gaataagcgg ttttgggggc gggctgtcaa gacgcttggc gggcccgggt 8341 tgtcgcagag gggcgggcgg cttatgtggc ggggatgatc ggtagcttcc tgcactttgt 8401 cctcagcggc ctcatcgacg tgatggtctg ggccatcatc atcagcgccg tcatgtcgtg 8461 gctgacggcc ttcaacgtcc tgaacctgcg caacagcttc gtggccggga tccaccgggt 8521 tctggagacc ttgacggcgc cggtcctggc gccgcttcag cgcattatcc cgcctctggg 8581 
cggcctggac attacgccca ttatcgccat cctcgttctg cagggaattc agcgcttcat 8641 cctgcccgtc ctgcttggcg ggctctgagc ccacctccct gcgccccgcc gtcccgtcac 8701 gccccggggg gaatgcgatc cagaggacgc cgggcgggcc gatcccgacg gcctccggag 8761 tcgccggcgg cggacgcccc gcgccttccc agccgacgcc tttcgcggta tccgccgccc 8821 cgtttccggg agcccctgcg tgatcctcac cctgtcctgt cccgaccggc ccggcatcgt 192 8881 cgcccgcgtc tccaccctgc tcttcgagca cggcgccaac atcgtcgacg cccagcagtt 8941 caacgacgcc gagaccggcc gtttcttcat gcgggtggtc ttcacccccg gccccggcgg 9001 ggcggaggcg gtgcgtccgg ggctggaggc cctggccgcc gacctttcca tggactggag 9061 cctgcgcgac cccaaggccc gccgacgggt gatgatcctg gcctcgcaga ccgatcactg 9121 cctggccgac ctgatctggc gctggcggca gggcgagctg cccatggata tctcggcggt 9181 ggtctccaac cacccggagg cgagttttcc gcatacggat ctcaagggca tcgccttcca 9241 ccacctgccg gtgagccccg agaccaaggc cgaccaggag gcccgcctgc gcgccctgat 9301 cgccgccacg gggaccgacc tcgtcatcct ggcgcgctac atgcagatcc tgtccgagga 9361 catggccctg cacctggccg gccgatgcat caacatccac cactccttcc tgcccggctt 9421 caagggcgcg cggccctatc atcaggccca cgcccggggg gtgaaggtga tcggcgccac 9481 cgcccacttc gtcacccccg acctcgatga gggcccgatc atcgagcagg acgtggagcg 9541 gatcagccac cgcgaccagc cgaaggacct ggtgcgcaag ggccgcgaca tcgagcgccg 9601 ggtgctggcc cgggcggtgc gctgggtgct ggaggaccgg gtcctcctga acggccgcaa 9661 gacggtggtc ttcacggact aggcgcggtg gaccccctcc ggcccggggc cttgccccgg 9721 tccacctccc cctcaagagg tggaggaatt tcagcgtctt gatggctccc ccgaggggga 9781 gctccccgca aagcgggggg agggggtgag cggcggctca gttgatgtgg taggcccgct 9841 cgccgtgggc gccgaggtcg aggccgtcgt cgatgacctc gtcgctggcc cgcaggcccg 9901 tcagcttccg gacgatggcg accaggacca ggctggcgcc cgccgaccag gcgcagacca 9961 cgacgacccc gagagcctgg cgccaggcct ggccggccag gccgacgccc tcggcatagc 10021 cggtcccgcc caggaggggg tgggccagcc ctgcgagcat cagggtcccg agaatgccgc 10081 ccactccgtg gaccgcgaag acatcaaggc tgtcatcgac cttcagccgg gccttcacca 10141 ggctgacggc ctcgaagcag acatagctgc cgagcgcccc caggaagagg cccgccaggg 10201 ggctgatgaa gcccgaagcg ggcgtgacgg tggccaggcc ggcaatgact ccggtgacaa 10261 
ggccgaccag gctggcccgg ccaaagcggc ggtgctcgat cagggcccag accaggcccg 10321 ccgaggccgc cgccatgtgt gtggcggtga gggccgagcc ggcgtcggcg ctggccgtca 10381 gggcgctgcc gccattgaat ccgaaccagc ccacccacag gaggccggcc ccggtcatgg 10441 tcatgcccgg attgtgcggc ggcgacaggt cccggggcca gccgtcccgg ggtcccaccg 10501 ccttcgccag gatgagagcg ctgacccccg ccgtggcgtg gaccaccagg cccccggcat 10561 agtccttgac ccccagggtc gccatccagc ctccgcccca gatccagtgg cagacgggag 10621 cgtagaccag gaccagccag ccggcgcaga aggccaggag gccggcgaac ttcacccgct 10681 cgaccacggc gccgatcatc agggccggcg tgatgatggc gaaggccatc tggaaggcga 10741 agaagacgat ctcgggcgcc gtcccgatga cgtcgcctcg ggtcaggccc atcaggccga 10801 cgcgaccgag gtcgccgatg aagggggcgg tccctgaaaa ggccagggag tagccgaggg 10861 ccagccagac ccccgaggcc aggcaggcca ccgccgtgca ctgcatcagg accgacagga 10921 cgttcttggt gcgcaccagg ccgccgtaga aaagggccag tcccggcagg gtcatcagga 10981 gcaccagggc cgtggcgctg agcagccagg ccgtatcgcc ggagttcacc tccgaaggcg 11041 cctgggcgag ggccgggccc gccccgagga gccccaggcc cagggcgagg cccgtcgctt 11101 gcatgactct tgtgcgctgc attccccgct tccccccttg cgccgttccg gcggttgtcc 11161 tgaagtcctt ggctacagga tcaggcgtcg ctgggaaggc ccaaaaacgg ggccccgggc 11221 gccggtccgg caggcaggcg tccagcggcg cagacaggtg gagcccggaa gtgagcgaga 11281 aatcggcaaa acggcgcgat ctggcgcaga ttttgcggtc tgcagagcgg attgtggcct 11341 tcaccggggc gggaatctcc acggaatcgg gaattccgga tttccgctca ccgggcggcg 11401 tctggtcaaa aatgaagccg atcccctttc aggatttcgt gaacagcgag gaaaagcggc 11461 gggaatcctg gacccgcgcc ttccagggcc gggccggctg gacgggccgg gagcccaacg 11521 ccggccactt cgccctggag cgcctgtggc gggctggccg gctctcgtcg atcatcagcc 11581 agaatgtcga caaccttcac cagcattcgg gcatccccgc cgaccgggtg atcgaactgc 11641 acggcaacgc cagctacgcg acctgcctgg cctgcggcct gcgccacgag atcgaggacc 11701 tgcgcccggc cttcgagcag aggggcgagc tgccggcctg ccgggactgc ggcggaatcg 11761 tgaagaccgc gaccatctcg ttcggccagt ccatgcctga ggaacccatg cgccgggcgg 11821 aggcggagac cctgtcctgc gacctcttcc tcgtgctggg ctcaagcctc accgtctggc 11881 cggcggcggg ctttcccctc caggcgcggc gggccggcgc ccggctggtg 
atcctgaacc 11941 gcgaccccac ggacctcgac gaccacgccc acctcgtcat caacgacgag atcggcccga 12001 ccctgtccga ggtgatcccg gccaacttcc ccggctgatc ccgaaggcgg cgggcgccgc 12061 cccgggaccc cgccgcagga tcagttccgc cagaccaccg aggcggtgac gtcctcgccg 12121 ccgccctcat aggcgcggat ctcgttgcgg ccgaccacca ggtgatggac ctcgtccggg 12181 ccgtcggcga accgcaggtg acgcacgtcg gtgtacatgg aggcgagcgg cgtccactgg 12241 gagatgccgg tggcgccgtg catctggatg gcctggtcga tgatctggca ggcccgctcc 12301 ggcaccatgg ccttgaccat ggacacccag acccgggcct cgcggttgcc aagcacatcc 12361 atggcgcggg ccgccttcag caccatcagg cgcatggcct cgatgtcgat ccgggcgcgg 12421 gagatgatct ccacattgcc gccgagccag gcgatcttgc ggccgaaggc ctcgcgcgag 12481 agaccccggc ggaccatcag gtccagggcc cgctcggcct tgccgatcgt gcgcatgcag 12541 tggtggatcc ggcccggccc gaggcggacc tggctgatct cgaagccccg gccctcgccg 12601 agcaggacat tggacttggg cacccggaca ttggtgaagc ggatgtgcat gtggccgcgg 12661 ggcgcatggt cttcgccgaa gacatgcatc gggccgagga tctccacccc gggcgcatcg 193 12721 atgggcacca ggatctgcga ctgctgcttg tgcggcgggg cgtcgggatt ggtgcggacc 12781 atggtgatca tgatcttgca gcgcgggtcg ccggcgccgg agatgtagta cttctcgccg 12841 ttgatcaccc attcgtcgcc gtccagcacc gcctgggtgg cgatgttctt ggcgtcggac 12901 gaggccaccg ccggctcggt catggcgtag gccgagcgga tctcgccgtt cagcaggggc 12961 ttcagccact tttccttctg ctcgggcgtg ccgacgcgct ccagcacctc catgttgccg 13021 gtgtcgggcg cggagcagtt catgacctcc gaggccatgg gcgccttgcc gagctccacg 13081 gcgatatagg cgtagtcgag gttggacagg ccttcgccag tctcggcgtc gggcaggaag 13141 aagttccaca ggccttcctt cttggcctgg tccttcgccc tttccagcac gtcgagctgg 13201 ccgggcgcaa agttccagcg gtcggtatgc ccctcgccca gcgcctggaa ctccttctgc 13261 atcggcacga ctgtctccgc gatgaagcgc ttgacgtgct cgtacagcgg ggtcgctttt 13321 tcgctcatcc gcaggtcgtt catgtcgtcg tgcgggttga agccgaaggt cagcgggccc 13381 gtggtggggg gcgtcggggc gttcatggag tttccttcct gtccggtctg atgtctggtc 13441 gcgattaagg ccgcggcggc cggtcctgac aagcccggcg cttaacgcac ctgccgggcc 13501 aggaagtcca gggtccgccg ggtggtctcg tcgccgggcg ccccgaagcc gtagttgagg 13561 ctgttgtggc tttcgcccct ggcctcgtgg 
atctccgcct ccccgccggc ggcggccagg 13621 gcttcggcca gggcccgggc ctggcccggg gtctcgtcgt tgtcggcgtc cacatgcatc 13681 aggacggggg gcggcggccg gcccggcgcg gcgtgggtga ccggcgacat ggcggcccac 13741 ccggaggggt ccggaccgaa gacccgggca tagacccggt cccccgtctc gcggacatgg 13801 cgcgcgatgt ccaggccgta ggcgtccagg aggatggcgg cgcggacagc ctccggcggg 13861 acgcccgccc cctccagata ggactcgtcc agggccacca gtgcggccag atgggcgccg 13921 gccgaatgac ccatcaggat gatccgctcc ggatccaggc cgcagccctt cgcctcccgc 13981 cgcaggaagg ccacggcctc ggcgatgtcc cgggcggcct cctggggggg cccatgccgg 14041 agcaggcggt agttcaccgt ggcgtaggcg tagccggcct gccggaaggc cctcgcctgg 14101 ccgggcgtac ccgcatactt gcccccgtgc atccaggttc cgccatggac gaacaccaca 14161 aggggggcgg gcgtgggggt cgcgccgggc cagaggtcca gcaggtgcac ggaccggggc 14221 aggtcggggg tcggcggctg gtaagccagg ttgcggaccg gccggtccgc cccgggcgcg 14281 aggccgctgc aggggctctg ggcgtgggcc gccgcggcgc ttccgagcag cagggcggcg 14341 gcgagcggcg ccaggaggcg ccggacccat ggttgcgcaa ggcgcccggc gccgtgcagg 14401 atcgcaggag acgttcgcat tctcatcgcc ggaaccatgc ccatggcttt ctcccgccgc 14461 aaccttctcg tcgccgccgc cgcgtcggcc cccctggcga cccccctggc cgccttcggc 14521 tcccccgggt tcgccgcggg tccgcccctg cccatgatcg cgccgggggc ggccggcttt 14581 gatccggaga agctgaaggc gctctcgacc gacctgcagg aactggtgga ccgcggccag 14641 ctggccggtg tcacgaccct ggcggcccgg aaggggcggg tcttccactt cgagacccgg 14701 ggccttgcgg acgtggagac cggcgccccg gcccggcccg acaccatctg gcggatcgcc 14761 tccatgacca agccggtggc cggcgtcgcc atgatgcagc tctgggagaa gggcctctgg 14821 aagctggacg accccgtcga acggcacatc ccggagttcg caggcctcaa ggtgaaggcg 14881 cccgacggga gcctggtccc gcaggcctcg cccatgacca tggcccagct gatgagccat 14941 accgcgggct tcgatgtcag cgccggctac gcccaggcgg gcctgcgggc cggcgacctg 15001 aaggagatga tccgccgcct cgccgccctg ccgctcgccg cccagcccgg gtcggactgg 15061 cggtatggcc ccagcgtgga catccagggc catgtggtgg agcggctgtc gggcgagcgg 15121 ctggacgttt atctcgaccg ccacgtcttc aggcctctga agatggtcga caccggcctg 15181 catgtcccgg cggccaaggc ggcccgggtc gcccgcatcc atacctaccg ggagggccgg 15241 atcgtcgccg 
gtccgcccca ggccataccg gtcgcgccgc cggtcttcct gtcgggaagc 15301 gggggcctcc tgtccacggc ctcggactat ttccgcttct cccaggccct gctgaacgga 15361 ggcgccctgg acggcgcgcg gatcctgaag ccggagaccg tccggctcat gcggcgcagc 15421 gtcctggcgc ctggcgtgaa ggtggacctc tacggcccgg cccagtccgg gacgggcttc 15481 gggctcgact tcgccgtgat cgaggacccc gccgccgccg caacccccct gggccgcgac 15541 acctactact ggggcggagc cttcggcacc tggttctgga tcgatccgac gaatgacatc 15601 gtcttcgtgg gcatgatcca gaacctcaac ggcagcatcc ccggcgccgg gagtccccct 15661 gtgcgggagc taagcatccg gcgtctgtac gccgccctcc ggacgcccgc ggcgtgagcg 15721 ccccccgtcc agcccggcgg gggctgctcg ccgccggcct cgtcctgccc ctcgtcgcag 15781 gcccggtgcg cgccgccgcc gcggggcgga tggtgggttt cctgtcggac ttcgacgaga 15841 tcgacgacgc cgtcgccatc tgccgggcgg tgatcctctc cctcgccccg gacgcccggc 15901 tcctcgacat cacccaccgg gtcccgccct acgacgtgac ggccggggcc cggctcctgg 15961 ccgggtccgc gccctggctg accgcggacg ctgtgatcct ggcggtggtc gatcccggcg 16021 tgggttcggc ccgccgggcc atcgtggccc ggacccgccg gggccagagc ctggtcctgc 16081 ccgacaatgg cctgatcacc ctgctcgatg cgagcgatcc cgtcgtcgag gcccgggaga 16141 tcaccgcgga gtcctggatg atcggcgccc gccggtcctc caccttccac ggccgggaca 16201 tttttgctcc ggtcgtgggg cgcctggccc gcggcgatga ctggcgggcg gtcgggccgg 16261 ggatcgaccc gaagaccctg atccggctgg acctcgcccc gctccgcatc ggccccgacg 16321 gcgcacgtac ccgcgccatc gggacggacg ggccctatgg caacctgatc ctcgacctct 16381 ccgccgaagc cttcgccgcc ctcggctggc ggccgggcga gaccgtcccc ctgaccgtgg 16441 ccggccgacc cgagcccgcg ccgtatgtcc agaccttcag cgacgtgccg gagggcgccg 16501 tgctgctcta cccggactcc cgggggcgga tgagccttgc cctcaacatg gggaactacg 194 16561 ccgccgcccg cggggtcgcc cccggcgtcg aggtcttcat cccccggcgc cgctgacccc 16621 ctcacgggac ttcgccggga gctccccctc gggggagaaa tggccttggc gcgaggcctc 16681 agacccagcc cgcggtctgg acgatcatcc agacgccgat gaccaggaag aggccggcgg 16741 cgatccggcg gaccagggtc agggacaccc gcttcaggat ctcgtggccg aggaagacgg 16801 ccgggacgtt ggccagcatc atgcccaggg tcgtgcccag ggtgacgatg agggcggatt 16861 cgaaccgcgc gccgagggcg acggtggcca gctgggtctt gtcgcccatc 
tcgacgatga 16921 agaaggctgc ggtggtggtc aggaaggcgc ccatcttcgc gggcgccttg aggggttcgt 16981 cctcgtccag cttgtcgggg atcagggtcc aggcggccat ggcgatgaag gacagggcga 17041 tggcgtagcg gaacaggtcg ccctgcagga aggccgccgc ctgggcgccg accagggcgg 17101 cgaggaagtg gttggccagg gtggcgacga ggattccccc gatgatcggg aagggctgga 17161 cgaagcgcgc cgccagcacg atggccagca gctgggtctt gtcgccgatc tcggccaggg 17221 tgacgacggc ggtggaggtc aggagcgatt ccaaggttcg tttccgggtc gggcggcgtg 17281 aatacccatg gtcgcagcaa cccccgcccg acggaacggg gccgcagcgc catcggtctc 17341 gcccggaacg ccgccccgaa ggacggatcg ctcttcctgc gccatggccg cgccggccaa 17401 gtgtgttgac gcgagggtcc gccggggatg gcggagggct actccccttt gaccccttcc 17461 ggttacgccc gccgcccccg ccgcgcaagg gctgacgtcg cgtgatcttg cggtgcgcgc 17521 cggagtgtcc aatgtgcgcc ccggaaaaca tcgtccctgc ggggccaggc agggagaggc 17581 ccatgaaagt cgccgccgtg aaggccccgg gcgggctgga caagctggtc atcgaagagc 17641 gggcccagcc cgtcgccggt ccgcgccagc tcgctgaact tccacgactt cgccgtcgtc 17701 gccggcatgc tgcccgccgc cgacgggcgc atccccatgt ccgacggggc cggcgaggtg 17761 gtcgctgtgg gcgagggcgt cacgggcttc gccgccgggg accgggtcct ctcgaccttc 17821 ttccccaact gggagacggg cggcccggcc ctggaacgcc tcatcggcgt tcccggcgac 17881 cacgccgacg gcttcgcggc ggagttcgtc gccatgccgg cccgggcctt cacccgcatg 17941 cccgccggct ggtccttcgc cgaggcggcg accctgccct gcgcggccct caccgcctgg 18001 cgcgccctga tggtcgaggc gcggatcaag cccggcgacg tcgtcctgac ccaggggacg 18061 ggcggggtct cgatcttcgc cctccagctg gccaaggcgg ccggcgcgac ggtcatctcc 18121 acctcctcct ccgacgagaa gctggaaagg ctcagggccc tcggcgcgga ccacctgatc 18181 aactatcgcc agaccccgga atggggcgcc gccgccgccg ccctcaccgg cggccggggc 18241 gtcgacgtcg tggtggagat cggcggggcc ggcaccctcg cccagtcggt caccgccacc 18301 cggatcggcg gccacgtctc gctgatcggc gtcctggcgg gtttcgccgg cgaggcgccg 18361 accgccctga tcatgtccaa gaacgtcacc gtgaagggcg tcaccgtcgg ctcgcggcag 18421 gaccaggaag agatgatccg cgccctggag gcctcgaccc tccggccggt gattgactcc 18481 acattccccc tcgacggcat cgccgccgcc ttcgcccatc aggtctccca gaagcacttc 18541 ggcaagatct gcctggagat ctgaggcggc 
tcaggcggcg tcgagctccc gccaggcccg 18601 ggccaggacc ccggcggccg gcaggcgccg cccgggtccg gcgacggagt cggtggcgat 18661 gagtctgtcg acccccgcag ccttcagggc ccgggcggtc cgcgcgtcgt ggaggccgtg 18721 cgccacggcg gcggtgatcc ggcccgcccc ggccgccttc agggccctcg ccgccgcggc 18781 cagggtcccg ccgctggagc agatgtcgtc cacaagcgcc acgtgcgcgc cccggagacg 18841 aggatcgtcc gggtactgaa ggctgacctc ccggtccccc tggcgggtct ttccggcgac 18901 cagcaggccg gcccccagcc gcgccgccgc cgccccggcc cagggcgcgc tctcggcgtc 18961 cggcccggcg accaggtcca cgcggggccg cagtcgtgcg agggccgcgg ccagcatctc 19021 cccggcgccg gagttcagcg cccggcgtcc cggaaacacc gccgccaggt ccggcgtgcg 19081 gtgcaggtgc gcctccaggg tcagcacccg gtcgaaggcc gaggccagga cctgtccgaa 19141 gaccgcctgg ctcaggggct cgccgtcgtg gaagacccgg tcctgccgca tgtaggcgag 19201 gtagggggcg atgaggtcga tgcgcgacgc ccccgcccgc cgcagggcgt ccgccgccag 19261 cagcaggggc atcagccgca cgtccggccg ggcgaggctg cggtagagga gggccttgcc 19321 ggccaccggg acggtcgcga cggggcggga ctctccgtcc gggaagctgt gcggccggat 19381 caggcgcagg ggcaggccga gggcctcggc gaagcgtccc gccggcccgg cctcgtcggg 19441 aaaggcgtgc acggccgcgc tcatgcgagg acgaatccgc tgtcctcgcc cgccgcctcg 19501 caggccaggg cgaagtccgc cgggtcctcg gcgtgcaggc ggtagagggg ctcgccccgg 19561 cggacggcgt cgccccgacg cttcaggagg tcgagccccg cccccgggtc ggtgggcgcc 19621 ccggcgaggc gcgcgatccg ggcgatcgtc aggcagtcca cgcccgcaac cacgccgtcc 19681 cgcgatgcgg cgacctcccg gacgagctca cccggcttca ggccgccggc gaggggggag 19741 gggccctggg cctcggcgat ggcctccagc ttggccaggg ccgcgccgct ctcgaccagt 19801 tcgcgggccc gggcctcgcc cgccccgccc ggacagtcgg ggtccagctc gatcagccgc 19861 ccggccaggc ggatggactt ggccttcagg tccgccggcg ccccggcctc gccgcgcagc 19921 acggccagga cgtcccgcgc ctccagggcc ggtcccacgc cccgccccac cgggtgcgat 19981 ccgtcggtct ccaggatctc gaccttcagc ccgacccggt cggcgacgaa ctcgaaaagc 20041 ttgcgaagcc gcagggccgc ctccggatcg cgcatcttgg cgcccggccc gaccggcagg 20101 tccaggagga ggtgggtgga gcccgccgcc agcttcttgg agaggatcga ggccaccatc 20161 tgctcggggg tgtcgaggga caggggccgc tccaccgaga tcaggacgtc gtccaccggc 20221 gacaggttca 
cgcgtccgcc ccagacgatg cagccgccgc agcgctcgac cacggcgcgc 20281 atctcgtcct cgtccaggtc cacccgggcc aacacctcca tggtgtccgc cgtcccggcc 20341 ggggaggtga tggcccgcga ggacgtcttg ggcgtcagca ggccgtgggc cgccgccacg 195 20401 ggcgcgacga tcagcgaggt ccggtttccc ggaacgccgc cgatgcagtg cttgtcggcc 20461 accaggggac ggtcccagac cagccggcga ccggcgccgg ccatggcgcg ggtcagggcc 20521 agggtctcct cgggggtcat gaagctggcg caggcgacca ggaaggccga gatctcgatg 20581 tcagaccagc ggtggtcggc caggtcatgc accaccgcct cgacctcggc gtcggtcagg 20641 gtcgccccgg cgatcttggc gcgcagggcc tgggcgccct tcggcggcgg ggcgggcgtg 20701 atccggacct ccgcccccgc cggaacgccg agccgccgga aggccggctc cgtcagcccc 20761 gcctcgcccg gcccgaccag ggcggggtcg tcgcagatca ccacggcggc catcagggtg 20821 cgcccgccgg cccgcacctc caccttgcgc gcgcccgcca gccgctcagg ccgcagggcg 20881 cggcagtccc gggacaggag cacggtgttc tcgcggaagg tgtcgatggc caggggccgg 20941 acggtcagca gcatggcgcc tccgccgggg atcatccgct aacgggacga ccggggcaat 21001 ctccgggtcc ctcggggggc gccttggggt caccgcgcct ctctcgtctg ggaagacgca 21061 catgatccct ccacactcct tcgctaggcc acgaaaccga cagtcgcgcc agcgacgcct 21121 atgggaatgc ctgatatgaa gccaacctgg cccgctgtct tcaaggtcag cgctgcgggc 21181 tttgcgatcg cgctcctttg ttccgcctgc gccggcgaag ggtcgggaac gccggccccg 21241 aacgccggcg gagactccgg ttcggatggg accaagccca tcacaacagc ggatgttgtc 21301 tgcggcctca gcaaggtcgt gctcaatccc acgctcggcg tgacaagcca ggcgacctgg 21361 agttgctcca acggcgcccg aaccctttcg gccaacggta ttccggacca tgcggtgggg 21421 gtcttccgga atccggagaa ccggaatgcg atcaccgccc aggccgtgca attcacgacc 21481 actctgaccc ctaccgccag cacccggcag atggtcgccc ccgggcgccc gggttaccat 21541 ctcaacgggg tcaagtttga tccggtgacg gcgggctcat gcccggccaa cgccaagtcc 21601 gtcagcgact gcaccctgat cggaccaccc ggtcagttcg gcctcctggc gctgggccag 21661 tccctcttca agttcggggt cgaccaaaac aatggccatg tccagccgga tggcgcctac 21721 cactaccacg gcattccgga ggccaccttg accgagccgg agcctcgacc gcgacgccca 21781 agatggtctt gatcgggttt gcggcggatg gctacccgat gtacgcccgc ttcggttacg 21841 ccaaggcgac cgaggcttcg agcggcgtca ggaccatcgc ctccagctac 
cggctcaagg 21901 cgaacccgga cccaggtcgt cccgcagtct ccgtcatccc catgggcgcc ttcacggtcg 21961 actacgagta tgtttccggc ctcggcgacc tcgacgagtg caatggccgg ttcggggtca 22021 ccccagagtt tccgaggggg atctaccact actacgcaac cgacgccttt ccgttcctcc 22081 ccaactgctg gaagggcgcg ctcccgtgat cgccctcgga cgcttgcgga aatgagggtc 22141 tgcctgcgcc tcctcgcctt cgccctcgcg gcctgggggt tcgccctgtt tcccgcccgg 22201 gcgcacatgc tgccggatcg gacggtaacg ttgaacgtcg tgaatgcgga ggtctacacc 22261 gccgcatccg ttcccgtgtc cgccctgacc cgctgggaca ccgaccggga cggtcgcctc 22321 agtaccggag aactgtcccg cagcgagccc gggatcgcgt cggatttcgc cgcccggttc 22381 cgcatctcaa acgggggtcg cgcggcccag ccggtcggga ctgtcatcct ggcttcggag 22441 gccggcggcg cggggcatgc cgtggaccac gtgatcatcc tgcaaagatc ccgtctgccg 22501 gcgccgcctg aaaccctggt tgtagaggtg gacctcttcg gctctggtcc gggcggaggg 22561 gatatgaccc tcacggccac ccgcggccag gtctcggagg cggcgatctt cagccctctc 22621 acgcggcgcc atgtcttctt ccgaagtcct gtccgggtct ttctggacaa tgtcgtcgca 22681 ggagcggagc acatcctgcg gggcccggac cacctagtct tcctcctgac cctcctcctg 22741 gcgggcgcag gatggagaga ctggg

Appendix VIII: Nucleotide sequence of Clone 2089

LOCUS       CBNPD1_Clone_2089      23177 bp    DNA     linear   ENV 30-NOV-2006
DEFINITION  Uncultured organism CBNPD1 BAC clone 2089.
ACCESSION   EF157672
VERSION
KEYWORDS    ENV.
SOURCE      Uncultured organism CBNPD1 BAC clone 2089
  ORGANISM  Uncultured organism CBNPD1 BAC clone 2089
            Unclassified sequences; environmental samples.
REFERENCE   1  (bases 1 to 23177)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Metagenomic Analysis of a Toxic Cyanobacterial Bloom
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 23177)
  AUTHORS   Pope,P.B. and Patel,B.K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-NOV-2006) The Microbial Gene Diversity and Discovery
            Research Unit, Griffith University, Kessels Road, Nathan, Brisbane,
            Queensland 4111, Australia
FEATURES             Location/Qualifiers
     source          1..23177
                     /organism="Uncultured organism CBNPD1 BAC clone 2089"
                     /mol_type="genomic DNA"
                     /isolation_source="Toxic cyanobacterial bloom"
                     /environmental_sample
                     /plasmid="pIndigoBAC-5"
                     /country="Australia"
                     /metagenomic
     gene            complement(1377..2882)
                     /gene="2089-1"
     CDS             complement(1377..2882)
                     /gene="2089-1"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 2089-1"
                     /translation="MTLDPRTPVLVGCGQSLQRPDDPRAASFDTAADPATLMAEAIRL
                     AAADAGLAKVPDVDAIRVVALLSWKYGNPAWFVAQKLGITARQYGLSVNGGNTPQTLV
                     NSAALDIQSGRADIVVLTGGEASRTKQRAKKAGHNLAWPKADASTPDADVVSEDLSMA
                     GEQEIARRIVMPVQVYPMFETAIRARAGRSVAEHQVFISELWSRFSKVAAKNPNAWSR
                     RELSAEEIRTPGPDNRMVGFPYPKFMNSNNDVDMGAALIVCSVEKAESLGIPRDRWVF
                     IHSGSDCHEHQYISNRWTFAETPAIELGGKRALEFAGLTIGDIDIVDLYSCFPSAVQL
                     GAQSLGLDINSQLTRTGGLPFAGGPWNNYVMHAIATVMNDLRERPGAKGLVWANGGYT
                     TKHAFGVYSTEPPATAFRHAYPQDEIDAMPRRDVATIDQAAGLPATIEAYSVMHGRDG
                     APEMIHAACLLADGRRAWGESTDVELGRSMCTNEFVGRRVDLDAKGVFHVA"
     gene            2861..3406
                     /gene="2089-2"
     CDS             2861..3406
                     /gene="2089-2"
                     /note="'Putative Ribose-5-phosphate isomerase B'"
                     /codon_start=1
                     /product="CLONE 2089-2"
                     /translation="MCGGRVSCPKVMRCDGPGSYARGVSAASPRRSFRIAIGADHAGF
                     ELKQHLVAFLGAAGHKVDDFGTHSTESVDYPPICAAVGRAVRDGKADLGIVLGGSGQG
                     EQLAANKVRGVRAALCNCLYTAKMARSHNNANVLSMGARVVGVGLAEEIVTTFLATEF
                     EGGRHERRVTQLAEIEEEESK"
     gene            complement(3964..5727)
                     /gene="2089-3"
     CDS             complement(3964..5727)
                     /gene="2089-3"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 2089-3"
                     /translation="MAVAEIHEAMVEVALVRGREALAFVGTADDCEQHVENGDAENEK
                     RDEERGKEEVGLAAVVLRRRVGTASDDARRHREHEPQQQGSAVPHEDFRRVEIVRQKS
                     DARTARDGGDERADVWFGKKTEITEMAAVHEERRRCDGDDARRQAVETIDQVDSLGHD
                     EQPQNGDEGNPVARQHEDVEVRNAEVEHRHTRPHENDSGHHRARDFRRCRNFADVVDE
                     SKTADEDRCQHDAERLGVTAEQWIEPVHLPGQQQRNENADQECDAPDVWRWVRVHLAG
                     IGLGHPLASFGEPHHDGGCGKRRKGRRKRDDDVAGGVAHSATRLGLCSTQRVESGDRV
                     DGEPRAKCRDAGLHVCNGRRIGAVAQRRRNEFGDLLHLGFLHSLRGDRRCTDANAGGD
                     EWAAWIVRDGVLVEGDSGLVEHELGFLARELCVEGREVNHHHVVVGSTRDEAETLAGQ
                     CAGERACVLHDLVCVVAEHRRERLAERDGLGSDGVFEWSTLGSGEDSFVDLLGVFGLA
                     EDASTARTAQRLVRGESDDVGIRHGVRVCTTGDEAGQVGDVEHQQCTDFVGDLLEGFG
                     LEASCVRRRARHDHLRPVLQG"
     gene            5707..5970
                     /gene="2089-4"
     CDS             5707..5970
                     /gene="2089-4"
                     /note="'Putative Glycosyl transferase'"
                     /codon_start=1
                     /product="CLONE 2089-4"
                     /translation="MNLGHGQRRSVLILWLWTALLSGMALYPAISTSGGNIAPFGVAA
                     LALGLYTVLHPSVRRRREGEDGAGIDESADGSDPEDSAQLGVN"
     gene            6050..6385
                     /gene="2089-5"
     CDS             6050..6385
                     /gene="2089-5"
                     /note="'Putative ATP synthase F0, subunit I'"
                     /codon_start=1
                     /product="CLONE 2089-5"
                     /translation="MRLLPDPERIRSAARNSKSPSAAGDQGLGQGAEIAIGLLVFFGI
                     GAGIDWWAGTTPVFMIAFTIFCAIGQFVRVWYGYDARMRNLEADRAHNARAHQTSVAA
                     GDSSRAERP"
     gene            6382..6846
                     /gene="2089-6"
     CDS             6382..6846
                     /gene="2089-6"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 2089-6"
                     /translation="MNAPANPLTTRFEGPAPEAQVTRHMLTRGAMVIPVLVAIAGFIW
                     GADGALGCLYGVALVFVNFALXXGLISLTARISLALMMGAVLFGYLLRLGIIFLAVIL
                     VRDAGWVSLPALGFTIIVTHLGLLFWELRHVSATLAFPGLKPQSIPPARDHV"
     gene            6906..7664
                     /gene="2089-7"
     CDS             6906..7664
                     /gene="2089-7"
                     /note="'Putative ATP synthase a chain'"
                     /codon_start=1
                     /product="CLONE 2089-7"
                     /translation="MIALEFPPINAILRWQDVAPSFNKIAIIAVLATIIGCTIFTLAA
                     RADAKKAPRGARNLAETSVEFIENGVIMQTMGRDGLGWTPFLLTLFLFIYLCNVPGII
                     PFFQMPATARMAIPAALALMVWVVYNGAGIKHQGFGGYFKSVLFPPGVPKALYVLVTP
                     IEFISAIIVRPFSLAVRLFANMLAGHILLVTFALLSESLFQAKDKILIPFGVLPMFML
                     IFLTGFEVLVAFLQAYIFTILTAVYIGGAVHPEH"
     gene            7713..7997
                     /gene="2089-8"
     CDS             7713..7997
                     /gene="2089-8"
                     /note="'Putative ATP synthase C chain'"
                     /codon_start=1
                     /product="CLONE 2089-8"
                     /translation="MEALAKIVAKATEAAVDPADAKNQAAAASAGYAYGLAAIGPGIG
                     IGYLVGKAVEAMARQPEAAGMVRTTMFLGIAFTEALALIGFVVFILLKFA"
     gene            8072..8683
                     /gene="2089-9"
     CDS             8072..8683
                     /gene="2089-9"
                     /note="'Putative ATP synthase B chain'"
                     /codon_start=1
                     /product="CLONE 2089-9"
                     /translation="MATAVVTLSGGSVVSVRLASEKGVAAAPRAEEGAEPKDMGPNPI
                     SPELKELAWGFGSFLVFLVIMRLFLVPKVRKGMAERYDSIRADIEGADVAKSDARAEV
                     AKYEAALADVRTEAAKRLDAARTTLDRERGEAIAAANQRIAAKKAEAEAAAAAERAAA
                     RDQISAAVTSVTATATSIAVGKQADNSVITQAVAQAMQSTGAR"
     gene            8680..9222
                     /gene="2089-10"
     CDS             8680..9222
                     /gene="2089-10"
                     /note="'Hypothetical Protein atpF'"
                     /codon_start=1
                     /product="CLONE 2089-10"
                     /translation="MMRVLHLLAAEDPSQTHHWLLPETAEIIYGGIAALIIFAALYKF
                     ALPAAKKALAARTERIQKELDNAANTRSAAEAEASNIRKAVGDIASERARLLAEADQQ
                     AVSLLSEGRGRIAAEVADLETKAEADIAASRSRSSDELRAQIAQIASVAAQNAVTATL
                     NDGTKQELIEGFISSVGGAR"
     gene            9219..9743
                     /gene="2089-11"
     CDS             9219..9743
                     /gene="2089-11"
                     /note="'Putative ATP synthase delta chain'"
                     /codon_start=1
                     /product="CLONE 2089-11"
                     /translation="MSEARIEAYARALAEIASAEGNLDAVEKELYVVARAVESNDQLR
                     ATLTDESIPAVRRQAVVEALAGSASNTTAQLLGLIIGSGRGRDLPAIIDKIVKRASNA
                     QNKEVAEVRSAVALTADQQSRLASALERATGKAVNLKVVVDPSVLGGLIATVGDEVID
                     GSVRTRLDQVKSRL"
     gene            9774..11366
                     /gene="2089-12"
     CDS             9774..11366
                     /gene="2089-12"
                     /note="'Putative ATP synthase alpha chain'"
                     /codon_start=1
                     /product="CLONE 2089-12"
                     /translation="MADITLSASDIAAAITKGLEGYTPDLSKRTVGRVAEVGDGIARV
                     SGLPDCAVNELLEFEDGTVGLALNLDEDSIGAVVLGDADSIEEGQAVKATGRILSLPV
                     GDALLGRVINALGQPIDGKGDIVGATIRRVEVQAPGIMGRKPVHEPLQTGIKAIDAMI
                     PIGRGQRELIIGDRKTGKTTVAIDTIINQRGLGVKCIYVAIGQKGSTVAQTVETLRQA
                     GALEYTVVVAAPASDPAPFKYLAPYAGCALGQHWMENSEHALVVYDDLSKQAEAYRQM
                     SLLLRRPPGREAYPGDVFYLHSRLLERAAKLSDANGAGSLTALPVIETKAGDVSAYIP
                     TNVISITDGQVYLQDNLFKSGVRPAVDVGISVSRVGSAAQIKAMKSVSGTLKLDLAQF
                     RELEAFATFGSELDAVSKAQLERGYRLVELLKQSLNSPMPVEEQVVSIFAGTKGYLDD
                     IPVADVRRFESELLAHVKSRHGAMIAGIRQDPKADVPKDLADIVTAFKTQFKTSAPGS
                     GSPDASAQPVGEATTTKTLATE"
     gene            11372..11854
                     /gene="2089-13"
     CDS             11372..11854
                     /gene="2089-13"
                     /note="'Putative ATP synthase gamma chain'"
                     /codon_start=1
                     /product="CLONE 2089-13"
                     /translation="MAGGQERILRGRIRSVQATKKITRAMELIAASRIVKAQQRVAAA
                     VPYSEKITEVVKHLSDGGAVSQSPFLAGRSDVKTTCYVAITADRGLCGGYNSGVLRAT
                     EGRSPQGRRGGQELHDRSGRPKGRGLPPIPRIQHQPALRGFLRQPNHHRCGGNRTLRR"
     gene            11829..12332
                     /gene="2089-14"
     CDS             11829..12332
                     /gene="2089-14"
                     /note="'Putative ATP synthase gamma chain'"
                     /codon_start=1
                     /product="CLONE 2089-14"
                     /translation="MAIGRFVVDLFLSGEVDRVELVYTRFVSAGRQEVVRRPLVPLGG
                     DIAKAEGGDHGHEAPGAAGDYEYEPDPSAILDTLLPRYVEARTYAALLNAAASEHAFR
                     QRAMKSATDNAEELIKNLTRVMNRARQDSITTEIMEIVGGAEALGSGDDDDVVVGDNP
                     FVTSDRV"
     gene            12348..13790
                     /gene="2089-15"
     CDS             12348..13790
                     /gene="2089-15"
                     /note="'Putative ATP synthase beta chain'"
                     /codon_start=1
                     /product="CLONE 2089-15"
                     /translation="MTMTASNPTDQKDGRVVAIAGPVIDVEFPRGSLPELNTALEFTV
                     DVDGKPNIILAEVAQQLGNSRVRAVCMKPTDGLKRGTPVRNTGRGISVPVGSKTLGHV
                     WNVWGDVLDADPKEFENIERWEIHRSAPAFDTLEPSKRMFETGIKVIDLLTPYLAGGK
                     IGLFGGAGVGKTVLITEMINRVASKHGGVSVFAGVGERTREGTDLRLEMEESGVFEKA
                     ALVFGQMDEPPGVRLRVALSALTMAEYFRDVQNQDVLLFVDNIFRFVQAGSEVSTLLG
                     RMPSAVGYQPTLADEMGQLQERITSTRGRSITSLQAVYVPADDYTDPAPFTTFTFLDA
                     TTELSRQIASLGIYPAVDPLASTSTILTPETVGDRHYQVARRVQEILQRYKELQDIIA
                     ILGLDELSEEDRLTVARARKVQRFLSQPFYVAEVFTGVSGEYVPVAETVESFEALING
                     ELDEVPEQAFLNVGGVEQVLAKAKALQENA"
     gene            13790..14218
                     /gene="2089-16"
     CDS             13790..14218
                     /gene="2089-16"
                     /note="'Putative ATP synthase epsilon chain'"
                     /codon_start=1
                     /product="CLONE 2089-16"
                     /translation="MSSETTGFRVEVVSPERVVYSGEASQVITRTLGGGEIAFLPGHV
                     SFLGALVENHTRIYQADGKVVDAAVHGGFVEVSGTTVTILSDGAELASDIDVDRARRA
                     KERAEEHLRAEHDAETEGALRRAHARLSAAGGLTGANTGH"
     gene            complement(13862..14359)
                     /gene="2089-17"
     CDS             complement(13862..14359)
                     /gene="2089-17"
                     /note="'Putative Enoyl-CoA hydratase/isomerase'"
                     /codon_start=1
                     /product="CLONE 2089-17"
                     /translation="MHAIGHPCEPRRRRARRACPRMWTKHSPIPAEAPVTSARRNGKF
                     AIGLMAGVCTGQAARGGKARVSTTQCTLGFCVVFGAEMFFRALLGAPCTVDIDVARKF
                     CAVRQDRHRGSRYLDEATVHCSVDDLAIGLVDARVVFDQSTEERDVTGEKRDFSPTKR
                     AGDDL"
     gene            14314..14994
                     /gene="2089-18"
     CDS             14314..14994
                     /gene="2089-18"
                     /note="'Putative Oxidoreductase'"
                     /codon_start=1
                     /product="CLONE 2089-18"
                     /translation="MPCVVVARRGDRLRALADRYEGFEVLEADLQTDAGIARVADRLA
                     STINPVDLLVNNAGFGTSGPFHESGIERATGQVDLNVKALVALTHGAVNAFRRHGGGH
                     VLNVSSVASFQASPNMAVYAATKSFVTSFTEAVHEENRTFGVKVSALCPGFVATEFQA
                     VSGGADRMTRTPRFLWLNADKVARAGLDGVAKNRAIVVPGWQYAALPMLSKLTPRLVL
                     RRIAAKIL"
     gene            15032..15757
                     /gene="2089-19"
     CDS             15032..15757
                     /gene="2089-19"
                     /note="'Putative Response regulator, LuxR Family'"
                     /codon_start=1
                     /product="CLONE 2089-19"
                     /translation="MVTIVSSPSLQIFVTKRGMKNVLILFRQAYARDLAPAVIGSFDS
                     VRVEGIVGTTSEAIEILNSGAIDLIVMGPGFIESAQILANAHPGLVRPVFIVLHRGDD
                     ASVRLRAQLQEIKHVVSLASGLNDAMESIHTVLHENERTRCRIIDDLPDRCDRRVMIQ
                     VNDDADCEIVRLVATGFADRDIAEVVHMSHQTIRNRISRILGETGARNRTHLACLYFE
                     RVHEGLVPFETLDGPYRPVVTGL"
     gene            15809..16462
                     /gene="2089-20"
     CDS             15809..16462
                     /gene="2089-20"
                     /note="'Conserved Hypothetical Protein'"
                     /codon_start=1
                     /product="CLONE 2089-20"
                     /translation="MASLVCAIDEPDLIAKFSRSAVDLHNLDLLFVGSDPLSVKVHVA
                     EATVCGLCIPPMPVDRMLRLRHAVYGSRDDGANLRSVIAVERPDPAFVHYVLGYGIDD
                     VFDTSVPDAEFAAQLATFAGGRLRACADQLVAGVDLPITVMDGVIDYADEMDRKMVRL
                     ISVGYTDREIGEIIGFTHQAMRNRISHLMLRSGIRNRTQLAARYTFESINRGIAAQP"
     gene            complement(16488..16619)
                     /gene="2089-21"
     CDS             complement(16488..16619)
                     /gene="2089-21"
                     /note="'Putative Fructose-1,6-bisphosphatase'"

200 /codon_start=1 /product="CLONE 2089-21" /translation="MRFDSRGAHTQSLVMRSLSGTVRLIDAHHRLDKLGTYASVNYG" gene 16593..19334 /gene="2089-22" CDS 16593..19334 /gene="2089-22" /note="'Putative Prolyl Oligopeptidase Family Protein'" /codon_start=1 /product="CLONE 2089-22" /translation="MGTARIETHAAQQFAVGDPGRCEEAVVSAYEVVGGQHLAEVVTR GDRGVAFRFATGPEASLNFAAHALQRRGRHDSLGGATDSEQHVDAGLVPRGGDGAEDV AVLDELDPRPGAANFLDQVLVALAIENHDGEVAHRFVFGASYPTEVLCGAGGDVDRSG GIGADRDLLHVHARXRVEHRVTLGDGDHRNGIAQTQRGEGGAIDRIDGDVGLRAAAVA HGLAVEEHRCVVLLAFADDDEAVHRDGGEHGAHGLDCCAVGTVLVAAAHPSRCCQRSG FGDADQLECEVPVGLLRMAVGHGTTVVPPTPATDTLSAVPGPTVTRCLAQRDLSEPRL DPTGVRVAFGASHDGVGSLRLVELATGVESEWPLSPTPALGRGMSGGCFRWLPDGSGI VYAAKDGGVWEASLPDAGAAEACRIDTHHEGVFSSLHCIAVSEDARFVAAVDSLTSMV VIERATGITRKVVHQHEFSLDPMWHRDTVYWQSWSPPHMPWDEAEIWCATAPDFRARR IAALPGVSLQQVAVSVAGELGAMNDANGRLRPGVVDAGMVTPLLLSGEPCDCAGAQWG GGGRSWCWSPDGTRYLVARNLAGFGDLAVVDRASGAVTTLARAVHGGVSWTPHGIVAL RSGAVTPTQVVLFDPVSFERRVLAMSELPCIDGPWSACDLVEPTAGTVDGIPFRLYEP NDPSEASHLLVWVHGGPTDQWQVTFMPRIAHWVGRGCRVLVVDHRGTTGHGRAFQQAL NGHWGEYDTADVITVAEHAHAEGWGTPSRTVVIGGSAGGFSSLNAAGERPDLFAGVVA LYPVVDLADATERSWNYEQHSIAVLVGDTSVNSELYYRRSPLSKLDKLSQAKVLLMHG DQDEAVPLDHSILIADQLRRRGGDVAFHVFEGEGHGFRLRVNQEREYALIGEFLQTL" gene 19346..20119 /gene="2089-23" CDS 19346..20119 /gene="2089-23" /note="'Putative Riboflavin Biosynthesis Protein RIBA'" /codon_start=1 /product="CLONE 2089-23" /translation="MSDWNPLDPDAESVHYDLGAWNLDQRAAVAEVFAEAEIPHAWVG DEVVVPAELEEVADVLLDRLEQEFGVDGATVSTRGASFAIDEADDDDITEYELDDWAD NERARLSELLVASGIPFKWEGALLVTLTDFEDTVDELLDAVEAGDVTIVDSSGSGRGT VSVADSDVSGETLTQMFLAAERLQRDPLDADGLALLVRVLDDVEDGGTPFGVPVPAWR QALELADQLADALAGGDIPDEIGAMEVAQRLFVXLRPHV" gene complement(20127..20888) /gene="2089-24" CDS complement(20127..20888) /gene="2089-24" /note="'Putative Extracellular Serine Protease'" /codon_start=1 /product="CLONE 2089-24" /translation="MLPAPGNSTRPFALTHNSQFREYPARLGQVGNLTTCTILRTTVV GMHQTEPVSETLRDLALRRLDGARGKTQHFKSPGAGVGERRDRIRGIADEVRDEGCTD 
RPAPRHRLHEHRAECASGDRFQTFECVNGAEHLLQRIAVRTKDELGGVVFTHQSMNTS RKSVTIDLEAVTKPRLHDSVTALDLEDEALNVGVEVVVDFVEMACDDRAQQHSPKTGR RIGWQNQVAKRNTPSGRDGPRVPDLEFCEQHRRRR" gene 20839..21405 /gene="2089-25" CDS 20839..21405 /gene="2089-25" /note="'Predicted Hydrolase of the alpha/beta-hydrolase fold'" /codon_start=1 /product="CLONE 2089-25" /translation="MCVRAKGLVLFPGAGSNRDHSSLVSLEERLAPLPVARVDFPYRR AGRKAPDRAPVLVDCVVREVKEFAALNSCRSSSLVIGGRSMGGRMCSMAVADGLAAKG LVLISYPLHPPAKPQNLRVEHLSNIAVPTLFVHGTNDPFGSPAELRRHARRVTGDVTF QFIERGRHDLKGSDELIADVVGEWMASL" gene complement(21402..22970) /gene="2089-26" 201 CDS complement(21402..22970) /gene="2089-26" /note="'Conserved Hypothetical Protein'" /codon_start=1 /product="CLONE 2089-26" /translation="MGRPDIRTDRGRRIRVTTRQIGPNDADFSSILEPHSDIVVESVG STSADTTTFAQHNGPFATYERVVTRRADGAARETTTWSVDIPWFWWMFTPLISRHFAR HERRAKSTWWAPPQELNQREVLVLSLLAAASMASAFINTLFTQTVTYAAEAFNVGEQG QGVAAAIVRFGIVVAIPIAVLGDRIGRRRIIVSLAWIAPVIAALGALAPSFWLLTANQ TIARPLGLALDVALLVVAVEEMPKNARAYGLSILALASGLGAGIAVLALPLADSGENG WRIIYVVSLIWLVVAARLTRRLSETTRFTRHRTEGSRRVEARKAGSHAPRTLRVQGVI LVVAFLGNVFLATASIFQNRYLKDVRGYSAVLVAAFTLLTTTPSALGLIFGGRLADVY GRRVVAATCIPVGALLIAMSFVVDGPLMWVCAVTGGVFAATAYPAMAVYRGELSPTAK RQTSSFLVTVSALLGGSAGLIVGGTLIDHHWSYGTVMLTFSSVSLVIAALVWFFYPET AHRELEELNPEDAP" BASE COUNT 4217 a 7379 c 7292 g 4289 t ORIGIN 1 ccccgcgttc aacggcgtga gaatggcaac cgaacagccg gcgatgtcgg cgatgtgaac 61 gttcggagac acgagcatcg tctcggcgtt gttcgtgagt tcggggccgt cgacgacacg 121 gaacgtcgca cgcggcgcgc gcaacgacga ctcgcggacg accgtgcgac cacctccgta 181 cgcggcggtg cacaccgatt cgccagccgc aatgggtcga acgcccgtgg acccgagtgc 241 atacacacct tggaaccacg gggtcgacac gtggttcgct gcgaagatca cttcgtcgcc 301 gaccgacagc accgagcgca cgtgcagatg gtcgggtgtc gccgagacgc cgtcgatcag 361 aatgcgccgc actcctcggt cggcacacgt gacgagcgca ccccccgtat tgagcgctgg 421 cacaccgggt atcaagtcga cccatgcatc gttcgaatcg gtcccgatga cgctgcatgc 481 gcccgacgtt gggtcgatgc gaacgatggt caccgagcgt tggtcgcggg actgaacctg 541 ggcaatgagc cctgcggcat tccaggcaac cgagttgagg tattcgtgcg gcggcaggtc 
601 gacccggatg gccgcacccg tcgtcacatc gacgatgtgc aacgacacgt cggcgttggt 661 ggagccggca gcggggtagc ggtgttcgcc caccgcagcc gagggattcg cggggtcggc 721 gatgtaccaa acgttcaccc cggtcacgtc gacacgggtc gctgcgatga aaagtccgtc 781 cggcgcccac cagtaaccgg cgtgacggtt catttcttcg gcggcgatga actcggcgag 841 tccccacgtc acgttggggt cgtcgttacc tgcgagttct cgttcgttgc ccgctctgtc 901 cgagacgaac agtcgtccgg cgcgcgtgta ggcgatgagc gagccatccg gcgagggtcg 961 ggcatcgaag acaccgggat cgacggcgat ggcactggcc ccgcccgcga cgacgtcgac 1021 gacaaacaac tgaccgccga gcaccgtcgt tgccatcgtc acggccgaat cacacccata 1081 cgagacgatt ccgcctgcac cttcacgcgc gcgttcccgg cgtcggcgtt cttccccggt 1141 cagttcgcca tcggtatctt ctgcgttgcg ggggtcgaac agacaacttt cggcccccgt 1201 ggatacgtca agcgaccaca acgcattgac ggggtcgctg cccgttgccg accgcgaaaa 1261 aatgacgcgc ccgccgtcgg gcgacacctt gaagttgcgc ggctctccga gggtgaggcg 1321 ctgcgtgcgc gcgtgttggc gcgggaaagt gtccccggag cgtccggtgt cgcgactcat 1381 gcgacgtgaa acacgccttt tgcatcgaga tcaacacgac gaccgacgaa ttcattagtg 1441 cacattgagc gaccgagctc gacatcggtg ctttcacccc acgcgcgtcg accgtcagcc 1501 agcagacacg cggcgtggat catttccgga gcgccgtcgc ggccgtgcat caccgaatag 1561 gcctcgatcg tggccggcag acctgctgcc tggtcgatgg tcgcgacgtc gcgacgtggc 1621 atcgcgtcga tttcgtcctg tgggtacgcg tgacggaatg cggtggcagg gggttcggtc 1681 gaatagacgc cgaacgcgtg cttggttgtg tatccaccgt tggcccacac caggcccttc 1741 gcaccgggtc gctcgcgcag gtcgttcatc accgtcgcga tggcgtgcat gacgtagttg 1801 ttccacggtc cacccgcgaa tggcagacca cccgtacggg tgagctggga gttgatgtcg 1861 aggcccaggg attgggctcc gagttgcact gccgacggga agcacgagta caaatcgacg 1921 atgtcgatgt cgccgatggt gaggccggca aattcgagtg cgcgctttcc accgagttcg 1981 atggcgggcg tttcggcgaa tgtccaccgg ttcgagatgt actggtgttc gtggcaatcg 2041 gagcccgaat ggatgaacac ccatcggtcg cgggggatgc cgagcgactc ggccttttcg 2101 accgagcaca cgatgagggc agcgcccatg tcgacgtcgt tgttcgagtt catgaacttc 2161 ggatagggga aaccgaccat gcggttgtcg gggccgggcg tgcggatttc ctcggccgaa 2221 agttcacgac gcgaccacgc attcgggttt ttcgccgcga ccttgctgaa acgcgaccac 2281 agctcggaaa 
tgaagacctg atgttcggca accgaacggc cggcgcgcgc gcggatggcg 2341 gtttcgaaca tcgggtacac ctgaaccggc atcacgatgc gacgggcaat ttcctgttcg 2401 cccgccatcg acaggtcttc gctcaccacg tccgcatcgg gggtcgaagc gtcggccttg 2461 ggccatgcaa ggttgtggcc ggctttcttc gcgcgttgct tcgtgcgtga tgcctcgccg 2521 ccggtgagaa cgacgatgtc ggcacgaccc gactgaatgt cgagcgcggc cgagttcacg 2581 agcgtctggg gggtgttgcc accgttcacg ctgaggccgt actgccgcgc ggtgatgccg 2641 agtttttgcg cgacgaacca cgcggggttg ccgtacttcc acgacagcag tgcgacgacg 2701 cgaatcgcgt cgacatcggg aaccttcgcg agaccggcgt ctgctgcagc gagccgaatc 2761 gcttcggcca tcaacgtggc ggggtcggcc gccgtgtcga acgacgcagc ccgtggatcg 202 2821 tcgggtcgtt ggagagactg tccacagccg acaagaacgg gtgtgcgggg gtcgagtgtc 2881 atgcccgaag gttatgcgct gtgacggccc tggctcctat gctcggggcg tgtcagcagc 2941 atcgccacgt cggagttttc gcatcgcaat cggtgccgac catgcgggtt tcgaactgaa 3001 gcagcacctc gttgcgttcc tcggcgctgc gggccacaag gtcgacgact tcggcacgca 3061 ctccaccgaa agcgtcgact atccgcccat ctgcgcagcc gttggtcggg ccgtccgcga 3121 tggcaaagca gacctgggca tcgttctcgg cgggagcgga cagggcgaac aactcgcggc 3181 caacaaggtg cgcggagtgc gtgcagcgct ctgcaactgc ctctatacgg cgaagatggc 3241 gcgctcgcac aacaacgcca acgtgctgtc gatgggggcc cgggtcgtcg gcgtggggct 3301 cgccgaggaa atcgtgacca cgttccttgc caccgaattc gaaggtggcc gccacgagcg 3361 tcgtgtcacc cagttggccg aaatagaaga agaagagtcg aagtagcgcg agtcgaaagc 3421 gaaaggcaca cgacccccat gtccgaaatc aagaagagtc tgcgcggaat cgtccatgcc 3481 gcggcgaaga actgggcatc cgaactcggc gccgacgatc cggtgttcga tctcatcgac 3541 cgcgaggtgc agcgtcaaac gaccggtctc cagctgatcg caagcgaaaa cttcacctcg 3601 cccgccgtca tgcgcgccac aggctcggtg ctcaccaaca agtactccga gggctacccg 3661 gggaagcgct actacggcgg gaatgccgtt gtcgacgaca tcgaggcgat tgccatcgac 3721 cgcatcaaac agctcttcgg cgccgaacac gcaaacgtgc agccgcactc cggtgccagc 3781 gccaacatgg ccgtctacct ggctctgctg aatcccggtg acaccgttct cgggctcagc 3841 ctcgaccacg gcggccacct cacgcatggt tcgcccgtca acgcgagcgg catcctctac 3901 aacttcgtct cgtacaaggt gtcgtcgggc gaagagcgca tcgacatgaa cgaggtgcgt 3961 gatctagccc 
tgaagcaccg gccgaagatg atcgtggcgg gcacgacgtc gtacacacga 4021 cgcctcgagc ccgaaccctt caagaagatc gccgacgaag tcggtgcact gttgatgttc 4081 gacatcgccc acctggccgg cctcgtcgcc ggtggtgcac acccgaaccc cgtgccgtat 4141 gccgacgtcg tcactttcac cacgcacaag acgttgcgcg gtccgcgcgg tggatgcatc 4201 ctctgcaagg ccgaacacgc ccagaagatc gacaaaggaa tcttccccgg aacccagggt 4261 ggaccactcg aaaacgccat cgctgccaag gccatcgcgt tcggcgaggc gctcacgccg 4321 atgttccgcg actacgcaca ccagatcgtg aagaacgcaa gcgcgctcgc cggcgcactg 4381 gcccgccagg gtttccgcct cgtctcgggt ggaaccgaca accacatgat ggtggttgac 4441 ctcacgccct tcgacgcaga gctcacgggc aaggaagccc agctcgtgct cgaccaggcc 4501 ggaatcaccc tcaacaagaa caccatcccg aacgatccac gcagcccatt cgtcacctcc 4561 ggcgttcgca tcggtacacc gtcggtcacc acgcagggaa tgaaggaagc cgagatggag 4621 cagatcgccg aactcattgc gacggcgttg cgcaaccgca ccgattcggc ggccattgca 4681 gacgtgaagg cccgcgtcgc ggcactttgc gcgcggttcc ccgtctacgc gatcgcctga 4741 ttcgacccgc tgcgtcgagc aaagtcctag tcttgtagcg gaatgggcga caccaccggc 4801 tacgtcgtcg tcgcgcttgc ggcggccctt gcgacgtttg ccgcaacccc cgtcgtgatg 4861 cggctcgccg aacgacgcca gtggatggcc caacccgatc cccgcaaggt gcacacgaac 4921 ccaacgccag acatcggggg catcgcactc ttgatcggca ttctcgttgc gttgttgttg 4981 gcctggcaga tggacaggtt cgatccattg ttccgcggta actccgagac gctcggcgtc 5041 gtgctggcag cgatcttcat ctgcggtctt ggactcgtcg acgacatccg cgaaatttcg 5101 gcaccggcga aagtcgcggg cacggtggtg gccggaatcg ttctcgtgtg ggcgggtgtg 5161 acgatgttct acttccgcgt tccgtacttc gacgtcttcg tgttgtcgag cgactggatt 5221 cccctcgtca ccattctgtg gctgctcgtc atgaccgagg gaatcaacct gatcgatggt 5281 ctcgacggcc tggcggcggg catcgtcgcc atcgcatcgg cggcgttctt cgtgtacagc 5341 cgccatctcg gtgatctcgg tcttcttgcc gaaccaaaca tcggcccgct catcgccgcc 5401 atcacgtgcg gtgcgtgcat cggatttctg ccgcacaatt tcaaccctgc gaaaatcttc 5461 atgggggaca gcggagccct gttgctgggg ctcatgctcg cggtgtcgac gagcgtcgtc 5521 ggaggccgtg ccgaccctgc gacgcaggac tacagcagcc agacctactt cttccttgcc 5581 ccgctcttca tcccgcttct cattctcggc gtccccattt tcgacgtgtt gttcgcaatc 5641 gtccgccgtg cctacaaacg 
ccagagcttc gcgaccgcgg acaagggcca cctccaccat 5701 cgcctcatga atctcggcca cggccaacgc cgcagcgtcc tcattctctg gttgtggact 5761 gcgcttctct cgggcatggc cctgtatccc gccatctcaa cgtcgggtgg caatatcgca 5821 cctttcggcg tcgccgcctt ggcgctgggt ctctacacgg tgctgcaccc gagcgttcgc 5881 cgtcgccgcg agggtgagga cggcgccggg atcgatgaat cggccgacgg ttcggatccc 5941 gaggattcgg cccaacttgg cgtcaactga ctgctgcgcc tagattgcta aaggtttcac 6001 aagcggaggt gggccggcta ccaccgcttt tggccagctt gagcccccga tgagacttct 6061 gcccgatccc gagcgcattc gcagcgccgc acgcaacagc aagtcaccgt ccgcggcggg 6121 cgaccaaggc ctcggtcagg gcgcggaaat cgccattggg ctccttgttt tcttcggtat 6181 cggtgccggc atcgattggt gggccggaac cacccccgtc ttcatgatcg ccttcaccat 6241 cttttgcgcg atcggccagt tcgtgcgggt gtggtacggc tacgacgccc gtatgcggaa 6301 tctggaagcc gaccgcgcgc acaacgcgag agcgcaccaa acgagtgtcg ccgcagggga 6361 ttcgtcgcgg gcggagcggc cgtgaacgca cccgccaacc cgctgaccac gcgtttcgag 6421 ggcccggccc ccgaagccca ggtcactcgt cacatgctca cgcgaggggc catggtcatt 6481 cccgtgctcg tggcgattgc cgggttcatc tggggcgctg acggcgcact tggctgcctc 6541 tacggagtcg cactggtttt cgtcaacttt gcgcttgtgg cgggcctcat ctcgctgacc 6601 gcgcgcatct ccctcgcgct catgatgggg gccgtccttt tcggctattt gctgaggctc 203 6661 gggatcattt ttctcgcggt gattctcgtg cgagatgcgg gctgggtttc acttcccgcg 6721 ctcggattca ccatcatcgt gacgcacctc ggtctcttgt tttgggagtt gcgccacgta 6781 tcggcaacac tggcattccc cggactgaaa ccacagtcca tcccccccgc acgagaccac 6841 gtctgaacaa cgcacgcgaa gacacaggac gaaagagcac aggacgagag agttaggaaa 6901 caaacgtgat cgcactcgag ttccctccca tcaacgcaat cctccgctgg caggatgttg 6961 cgccgtcgtt caacaagatc gccatcatcg ccgttctcgc gacgatcatc ggatgcacga 7021 tcttcacgct cgccgcgcga gccgacgcga agaaggcacc gcgtggtgca cgcaacctcg 7081 ccgagacttc ggtcgagttc atcgagaacg gcgtcatcat gcagacgatg ggtcgtgacg 7141 gtctcggttg gacgccgttc ctgctcacgc tcttcctctt catctacctc tgcaacgtgc 7201 ccggcatcat tccgttcttc cagatgcccg cgaccgcgcg tatggcaatt ccggcggcgc 7261 tcgcattgat ggtgtgggtc gtctacaacg gcgccggcat caagcaccag ggcttcggtg 7321 ggtacttcaa gtcggtgctc 
ttcccgcccg gcgtgccgaa ggcgctctac gttctcgtga 7381 cgccgatcga attcatctcg gcaatcatcg tgcgcccgtt ctctctcgca gtgcgactct 7441 tcgccaacat gctcgccggc cacatccttc ttgtgacctt cgcgcttctc tccgagtcgc 7501 tcttccaggc caaggacaag atcctcatcc cgttcggcgt gctgccgatg ttcatgttga 7561 tcttcctcac cggcttcgag gttctcgtgg ccttccttca ggcctacatc ttcacgattc 7621 tgacggctgt gtacatcggt ggagcggtcc accccgaaca ctgatcgtcg caacggtctc 7681 cctattcatc atcatcacca ggagaaagaa aaatggaagc acttgccaag attgtcgcca 7741 aggcgaccga agcagcggtc gatcccgctg acgcgaagaa ccaggcagca gccgcttcgg 7801 ctggttacgc ctacggcctc gcggccatcg gaccaggcat cggcatcggt tacctcgtcg 7861 gcaaggccgt cgaggcgatg gcgcgtcagc ctgaggcagc cggcatggtc cgtacgacca 7921 tgttcctcgg catcgccttc accgaggcac tcgccctcat cggcttcgtc gtgttcattc 7981 tgctcaagtt cgcctgatcg cgaacgtcga gcaaccgcaa tccactgacc gttaacgcaa 8041 tcgccctgca cgagcaggat caggagtaca catggcaacc gcagtagtca ccctctcggg 8101 tggatccgtg gtgtcggtga ggctcgcctc cgaaaaaggc gtggccgccg cacccagagc 8161 cgaagagggc gccgaaccga aggacatggg gccgaacccg atttcccccg agttgaaaga 8221 actcgcatgg ggcttcgggt cgttcctcgt gttcctcgtc atcatgcgtc tgttcctcgt 8281 gccgaaggtg cgcaagggca tggccgagcg ctacgacagc attcgtgccg acatcgaggg 8341 cgccgacgtg gccaagtccg acgcacgcgc cgaggtggcc aagtacgaag ccgctctcgc 8401 cgacgtccgg accgaagccg ccaagcgtct cgatgcagct cgtaccacgc tcgaccgcga 8461 acgcggcgag gcaattgcgg cagcgaacca gcgcattgca gcgaagaagg ccgaggccga 8521 ggcggcagct gccgccgagc gtgccgcggc ccgcgaccaa atcagtgcgg cggtcacctc 8581 cgtcaccgcg acggcaacgt cgatcgcggt cggcaagcag gccgacaatt cggtcatcac 8641 ccaggcagtc gcgcaggcga tgcagagcac aggagcccga tgatgagagt tctccacctc 8701 ctcgcagccg aagatccctc gcagacacac cactggttgt tgccggaaac cgccgaaatc 8761 atctacggcg gtatcgccgc gttgatcatc ttcgccgcgc tctacaagtt cgccctgccg 8821 gccgcgaaga aggccctcgc agcgcgtacc gagcgcatcc agaaggagct cgacaacgcc 8881 gcgaacaccc gctcggccgc cgaggccgag gcgtcgaaca tccgcaaggc tgtcggggac 8941 atcgcttccg agcgtgcccg cctgcttgcc gaggccgacc aacaggcagt atcgcttttg 9001 tccgaaggtc gtggacgtat cgcggccgag 
gttgcggatc tcgaaaccaa ggccgaagcc 9061 gacatcgcgg catcgcggtc gcggtcgagc gacgaactgc gcgcacagat cgcacagatt 9121 gcctcggttg cggcgcagaa cgcggtgacc gccactctca acgacggaac gaagcaggaa 9181 ctcatcgagg gatttatcag ctcggtcgga ggtgcacgat gagcgaagca cgcatcgagg 9241 catacgcacg ggctctcgcg gaaatcgcct cggccgaggg caacctcgac gcggtcgaaa 9301 aagaactgta cgtggtcgcc cgtgccgtcg agtccaacga ccagttgcgc gcaacgctca 9361 ccgatgagtc gattccggcg gttcgccgtc aggccgtcgt cgaagcgctc gcaggttctg 9421 catcgaatac caccgctcag ctcctcggtc tcatcatcgg ttcaggtcgc ggccgcgacc 9481 tgcccgcaat catcgacaaa atcgtgaagc gtgcatcgaa cgcgcagaac aaggaagtcg 9541 ccgaagttcg ttcggccgta gccctcactg ccgaccagca gagccgcttg gcttccgccc 9601 tcgagcgcgc caccggcaag gcagtcaacc tgaaggtcgt cgtcgacccc agcgtcctcg 9661 ggggcctcat tgccaccgtc ggcgatgagg tcatcgacgg cagcgtccgt acccgcctcg 9721 atcaagtgaa gagccgcctc taggccggcc gaagtctcac gaagggaaca cacatggcag 9781 acatcacact gtccgcaagt gacatcgccg cagcgatcac gaagggtctc gagggttaca 9841 cccccgatct ttcgaagcgc accgtcggcc gcgtcgcaga agtcggcgac ggtattgccc 9901 gcgtgtcggg tcttcccgac tgcgcagtca acgagctcct cgagttcgaa gacggcaccg 9961 ttggtctcgc actcaacctc gacgaagact cgatcggcgc cgtcgttctc ggcgatgccg 10021 actcgattga agagggtcag gccgtcaagg ccacgggacg aatcctgtcg ctgcccgtcg 10081 gcgacgcact gctcggtcgc gtcatcaacg cgcttggcca gcccatcgac ggcaagggtg 10141 acatcgtcgg cgcgaccatc cgccgcgtcg aagttcaggc gcccggcatc atgggccgaa 10201 agcctgtgca cgagccgctg cagaccggca tcaaggccat cgacgccatg attccgatcg 10261 gtcgcggtca gcgcgagctc atcatcggcg accgcaagac cggcaagacc accgttgcga 10321 tcgacacgat catcaaccag cgtggcctcg gcgtgaagtg catctacgtc gccatcggcc 10381 agaagggctc gaccgttgca cagaccgtcg aaaccctgcg ccaggccggg gctctcgagt 10441 acaccgtcgt cgtcgcggct cctgcttccg acccggcacc gttcaagtac ctggccccgt 204 10501 acgcgggttg cgcgctcgga cagcactgga tggaaaacag cgaacacgcc ctcgtggtgt 10561 acgacgacct gtcgaagcag gccgaggcgt accgccagat gtcgctgctc cttcgccgcc 10621 caccgggccg cgaggcgtac cccggcgacg ttttctacct gcacagccgt cttcttgagc 10681 gcgccgcgaa gttgagcgac 
gcaaacggtg cgggctcgct cactgctctt ccggtcatcg 10741 aaacgaaggc cggtgacgtg tcggcgtaca ttccgacgaa cgtcatttcc atcaccgacg 10801 gccaggtgta cctgcaggac aacctcttca agtcgggtgt gcgtccggca gtcgacgtgg 10861 gcatctcggt gtcgcgcgtg ggcagcgccg cgcagatcaa ggcgatgaag agcgtgtcag 10921 gaaccctgaa actcgacctc gcgcagttcc gtgaactcga agccttcgca acgttcggtt 10981 cggaactcga cgccgtgtca aaggcgcagc tcgagcgcgg ctaccgcctc gttgaactgc 11041 tgaagcagtc gctgaacagc ccgatgccag ttgaagaaca ggtcgtttcg atcttcgccg 11101 gaaccaaggg ctacctcgac gacatcccgg tcgccgacgt ccgtcgtttc gaaagcgagc 11161 tgctcgccca cgtgaagtcg cgccacggtg cgatgatcgc cggaatccgt caagacccga 11221 aggccgacgt cccgaaggac ctcgctgaca tcgtcaccgc gttcaagacc cagttcaaga 11281 ccagtgcacc cggctcgggc tcgcccgatg catcggccca gccggtgggc gaagccacaa 11341 ccacgaagac cctcgcgacg gagtgatcta aatggcaggc ggtcaggaac gaattctgcg 11401 cgggcgtatc cgctcggtgc aggccacgaa gaaaatcacg cgtgcgatgg agctcattgc 11461 cgcatcgcgc atcgtgaagg cgcaacagcg cgtggcggcc gccgttccct acagcgaaaa 11521 gatcaccgag gtggtcaagc atctctccga cggcggtgcg gtcagccagt caccgttcct 11581 tgcgggccgc agcgatgtga agaccacgtg ctacgtggcc atcacggcgg atcgcgggtt 11641 gtgcggcgga tacaactccg gcgtcttgcg cgcaaccgag gggcgaagtc cgcaaggacg 11701 tcgcggcggg caagaactac atgatcgttc cggtcggccg aaaggccgag ggttacctcc 11761 gattccgcgg atacaacatc agccagccct tcgcgggttt ctccgacaac ccaaccatca 11821 ccgatgcggt ggcaatcgga cgcttcgtcg ttgacttgtt cctctcgggc gaagtcgacc 11881 gtgtcgaact ggtgtacacc cgtttcgtct cggccggtcg tcaagaagtg gtgcgtcgtc 11941 cactcgttcc actgggaggc gacatcgcca aggccgaagg cggcgaccac ggacacgagg 12001 ctccgggggc agcgggcgac tacgagtacg agcccgatcc gtcggccatt ctcgacacgt 12061 tgctgccccg ctacgtcgag gcccgtacgt acgccgcgct gctcaacgcc gccgcgtcgg 12121 aacacgcatt tcgccagcgc gcaatgaagt cggcaaccga caatgccgaa gaactcatca 12181 agaacctgac gcgcgtgatg aaccgcgccc gtcaagactc catcaccaca gaaatcatgg 12241 aaatcgtcgg tggtgccgaa gccctcggat ccggcgacga cgatgacgtc gttgtcggtg 12301 acaacccatt cgtaaccagc gatcgagtct gagcaggaga aactgacatg actatgacag 12361 
caagcaaccc caccgaccag aaagacggtc gcgtcgtcgc catcgccgga ccggtgatcg 12421 acgtcgagtt ccctcgtggc tcactgcccg agctgaacac ggcgctcgaa ttcaccgtcg 12481 acgtcgacgg caagccgaac atcatcctcg ccgaggtggc ccagcagctg ggcaacagcc 12541 gcgttcgtgc ggtctgcatg aagcccaccg acggcctgaa gcgcggcacg cccgtgcgca 12601 acacgggccg tggcatctcg gtgcccgtcg gatcaaagac gctcggtcat gtgtggaacg 12661 tgtggggcga cgtgcttgac gccgacccca aggaattcga gaacatcgag cgctgggaga 12721 tccatcgctc cgcgccggcg ttcgacaccc tcgagccctc gaagcgcatg ttcgaaacgg 12781 gcatcaaggt catcgacctt ctcacccctt acctcgcggg cggaaagatc ggcctgttcg 12841 gtggtgcggg cgtcggcaag accgtgctca tcaccgagat gatcaaccgc gtggcttcga 12901 agcacggtgg tgtgtcggtg ttcgccggcg tgggtgagcg cacccgcgaa ggaaccgacc 12961 tgcgtctcga gatggaagag tccggcgtgt tcgaaaaggc cgccttggtc ttcggccaga 13021 tggacgaacc accgggcgtg cgcctccgcg tcgcgttgtc ggccctcacc atggccgagt 13081 atttccgtga cgtccaaaac caggacgtgc tgttgttcgt cgacaacatc ttccgcttcg 13141 tccaggcggg ctcggaagtg tcgaccctgc tcggccgtat gccgtcggcg gtgggttacc 13201 agcccacgct cgccgacgaa atgggtcagt tgcaagagcg catcacctcg acgcgtggtc 13261 gttcgatcac gtcgctgcag gccgtgtacg tgcccgccga cgactacacc gacccggctc 13321 cgttcacgac gttcaccttc ctcgatgcaa cgacggaact ctcgcgtcag atcgcctcgc 13381 tcggtatcta cccggcggtc gacccgctgg cgtcgacctc gacgatcctc acgcccgaaa 13441 ccgtcggcga ccgtcactac caggtggccc gccgcgtgca ggaaattctg cagcgttaca 13501 aggaactcca ggacatcatc gccattctcg gtctcgacga actgtcggaa gaagaccgcc 13561 tcaccgttgc gcgcgcgcgc aaggtgcagc gtttcctctc gcagccgttc tacgtggccg 13621 aggtgttcac cggtgtctcg ggtgagtacg tgccggtcgc cgaaacggtc gagagcttcg 13681 aggccctcat caacggtgaa ctcgacgaag ttcccgagca ggcgttcttg aacgtcggtg 13741 gcgtcgaaca ggttcttgcc aaggccaagg cgctgcagga gaacgcctga tgtcatccga 13801 gaccaccggc tttcgtgtcg aagtggtctc gcccgagcgc gtcgtgtact cgggtgaggc 13861 ctcacaggtc atcacccgca cgcttggtgg gggagaaatc gcgtttctcc ccggtcacgt 13921 ctcgttcctc ggtgctctgg tcgaaaacca cacgcgcatc taccaggccg atggcaaggt 13981 cgtcgacgct gcagtgcacg gtggcttcgt cgaggtatcg ggaaccacgg 
tgacgatcct 14041 gtctgacggc gcagaacttg cgagcgacat cgatgtcgac cgtgcacggc gcgccaagga 14101 gcgcgcggaa gaacatctcc gcgccgaaca cgacgcagaa accgagggtg cactgcgtcg 14161 tgctcacgcg cgcctttccg ccgcgggcgg cctgaccggt gcaaacaccg gccattaggc 14221 cgatggcaaa ctttccgttc cgacgggcac tcgtcaccgg agcctccgcc ggaatcggcg 14281 aatgcttcgt ccacatcctc ggacaggccc ggcgtgccct gcgtcgtcgt ggctcgcagg 205 14341 ggtgaccgat tgcgtgcact tgcagaccgt tacgagggct tcgaagttct cgaggccgat 14401 cttcaaaccg atgcgggcat cgcccgagtg gcggatcgcc tcgcgtcgac gatcaacccg 14461 gtcgacttgt tggtcaacaa tgccggcttc ggcacgagcg gaccgtttca cgagagtggc 14521 attgagcggg cgacggggca ggttgatctc aacgtgaagg cgctcgtcgc cctcacgcac 14581 ggcgccgtca acgccttccg gaggcatggc ggaggtcacg ttctcaatgt ctcgagcgtg 14641 gcgagcttcc aggccagccc gaacatggcc gtctacgccg caacgaagag ttttgtcacc 14701 agtttcaccg aggccgtcca cgaagaaaac cgaacgtttg gtgtgaaagt cagcgcgctg 14761 tgcccgggct ttgtcgcaac cgaatttcaa gcggtgtccg gtggtgctga tcgaatgaca 14821 cgcacccctc gatttctctg gctgaacgcc gacaaggtgg cccgcgccgg tctcgacggc 14881 gttgcaaaga accgcgccat cgtcgtgccc ggttggcagt acgccgctct gccgatgttg 14941 tcgaagctga ccccgcgttt ggtcttgcgt cgcatcgccg cgaagattct ctgaagcaac 15001 tctgcaacac cgaggtaaat ggtgtgaccc attggtgaca attgtttcgt caccatctct 15061 ccaaatcttc gttaccaaaa gaggtatgaa gaacgtattg attctctttc gccaggccta 15121 cgctcgtgac ctcgcccccg cggtcatcgg ttcgttcgat tcggtgcggg tcgaaggaat 15181 cgtcgggacg acgagcgaag cgatcgaaat cctgaacagt ggcgcaattg acttgatcgt 15241 catggggccg ggcttcatcg agtcggcgca aattctcgcc aatgcccacc ccgggctggt 15301 tcggccggtt ttcatcgttc ttcaccgtgg cgacgacgcc tcggttcggc tgcgggcaca 15361 acttcaagaa atcaaacatg tggtcagtct ggcgagtggt ctcaacgacg ccatggaatc 15421 gatacacacc gttcttcatg agaacgagcg cacgcgttgc agaatcatcg acgacctgcc 15481 tgatcggtgt gatcgccggg tcatgattca ggtgaacgac gatgcggatt gcgagattgt 15541 ccgattggtg gcgaccgggt ttgccgaccg cgacatcgcc gaggtcgttc acatgtctca 15601 ccagaccatt cggaacagaa tcagtcgcat tctcggcgag accggagcgc gtaatcgaac 15661 ccatcttgcc tgcctgtatt tcgagcgcgt 
gcacgaaggc ctcgtaccct tcgaaactct 15721 cgacggccca tacaggccgg tcgtcaccgg tctctagcca ccgcaccccg ccacgggata 15781 ccctttcttc gtatcgaacg gggggggtat ggcgtccctt gtctgcgcaa tcgacgaacc 15841 ggatctgatc gcgaagttct cgcgcagcgc ggttgatcta cacaaccttg atctgctgtt 15901 cgtgggaagt gatccactgt cggtcaaggt gcacgtggcg gaagcgactg tttgcggatt 15961 gtgcattccg ccgatgccgg ttgatcgaat gttgcgccta cgccatgcag tgtacggatc 16021 acgcgatgac ggtgcgaacc ttcgctcggt catcgcagtc gagcgccccg accccgcctt 16081 cgtgcactac gtgttgggct atggcatcga tgacgtgttc gacacatctg ttcccgacgc 16141 tgagtttgcc gcgcaactgg cgacttttgc tggaggacgc ctgcgtgcat gtgctgatca 16201 acttgtggcc ggtgttgacc tcccgatcac ggtgatggac ggcgtcattg actatgccga 16261 tgagatggat cgcaagatgg tgcgattgat ttcggttggc tacaccgatc gggaaatcgg 16321 ggaaattatc gggtttaccc atcaggcaat gcgcaacaga atcagccacc tgatgttgcg 16381 ctcgggaatc cgcaatcgta cgcagttggc ggcgcgctac acgttcgagt cgatcaaccg 16441 cggcatcgct gctcagccgt agttcactgt gcccattgcg tgaactacta gccgtagttc 16501 actgatgcat acgtgccgag tttgtcgagg cggtggtgcg cgtcgatgag gcgtacggtg 16561 cccgacagtg atcgcatgac gaggctttgg gtgtgggcac cgcgcgaatc gaaacgcacg 16621 ccgcgcaaca gttcgccgtc ggtgaccccg gtcgctgcga agaagcagtt gtctccgcgt 16681 acgaggtcgt cggtggtcag caccttgctg aggtcgtaac ccgcggcgac cgcggcgttg 16741 cgttccgctt cgctacgggg ccagaggcgt ccctgaattt cgccgcccat gcacttcagc 16801 gccgcggccg acacgactcc ctcgggggtg ccaccgattc cgaacagcac gtcgacgccg 16861 gcctcgtgcc acgcggtggc gatggcgccg aagatgtcgc cgtccttgat gagcttgatc 16921 cgcgccccgg tgctgcgaac ttcctcgatc aggtccttgt ggcgctcgcg atcgagaatc 16981 atgacggtga ggtcgcgcac cgattcgttt ttggcgcgag ctatccaacg gaggttctgt 17041 gtggggctgg cggtgatgtc gatcgatccg gcggcatcgg ggccgaccgc gatcttctcc 17101 atgtacatgc acggacccgg gtcgaacatc gtgtcacgct cggagacggc gatcaccgaa 17161 atggcattgc ccagacccag cgaggtgagg gtggtgccat cgatcggatc gacggcgacg 17221 tcggtcttcg tgctgctgcc gtcgcccacg gcctcgccgt tgaagagcat cggtgcgtcg 17281 tccttctcgc cttcgccgat gacgacgagg ccgtccatcg ggacggtggc gagcacggtg 17341 cgcatggcct 
cgactgctgc gccgtcggca ccgttcttgt cgccgcggcc cacccatcgc 17401 gatgctgcca acgcagcgga ttcggtgacg cggaccaact cgagtgcgag gttccggtcg 17461 ggcttttgcg catggctgtc ggtcatggaa ccacggtagt tccaccgaca cccgccacag 17521 ataccctcag cgccgtgccg ggcccgacag tcacaagatg ccttgcgcag cgcgatctct 17581 ccgagccccg tctcgacccc accggcgtca gggtcgcgtt tggcgcctcg cacgacggcg 17641 tggggtcact gcggctggtt gagctcgcca ccggtgtcga aagtgagtgg ccgttgtcac 17701 ccacgcccgc actgggccgg ggaatgagcg gcgggtgttt ccgttggttg cccgacggct 17761 cgggaatcgt ctacgccgcc aaggatggtg gcgtgtggga ggcatccctg cccgacgccg 17821 gcgcggccga ggcgtgtcga atcgatacgc accacgaggg cgtgttctcg tcactgcatt 17881 gcatcgctgt gtccgaagat gcacggttcg tcgcggcggt tgattccttg acctcgatgg 17941 tcgtgatcga gcgggcgacg ggaatcacgc gaaaggtcgt gcaccagcat gaattttctc 18001 tggatccgat gtggcaccgc gacacggtct actggcagtc atggtcgccg ccccacatgc 18061 cgtgggatga agccgagatt tggtgtgcga cggcacccga cttccgtgcg cgtcgcattg 18121 ctgcgttgcc cggtgtctcg ctgcagcaag tcgccgtatc tgtcgcaggc gaattgggcg 206 18181 cgatgaacga tgcgaacggc cgactacgac ccggtgtggt tgatgccggc atggtgacac 18241 cgcttctctt gtccggcgag ccgtgcgatt gcgctggtgc gcagtggggt ggtggtggac 18301 gatcgtggtg ctggtcgccc gatggcacgc gatacctcgt cgcgcgcaat ctggcgggct 18361 ttggtgatct tgcggttgtc gaccgtgcat cgggcgccgt caccacactg gcgcgcgcgg 18421 tgcatggtgg cgtgtcgtgg acgccgcacg gaatcgtcgc gctgcgcagc ggtgcagtga 18481 cgccgactca ggtggtgttg tttgatccgg tgtcgttcga gcgtcgtgta ctcgcgatgt 18541 ctgaattacc ttgcatcgac ggaccgtgga gcgcctgcga cttggttgaa ccgacggccg 18601 gcacggtcga cggaattccg ttccgtctct acgaaccgaa cgacccatcg gaggcgtcac 18661 atctcctcgt gtgggtgcac ggtgggccga ccgatcaatg gcaggtcacg ttcatgccgc 18721 gtatcgcgca ctgggtgggt cgtgggtgcc gtgtgctggt ggtcgaccac cgtggcacga 18781 ccggtcacgg cagggcattc cagcaggcgc tgaacggcca ctggggggaa tacgacacgg 18841 ccgacgtgat caccgtcgcc gaacacgctc acgcggaagg atggggtacg ccgtctcgca 18901 ccgtcgtgat cggtgggtcg gccgggggtt tttcgtcgct gaatgccgca ggcgaacgcc 18961 ccgacctttt tgcgggggtc gtcgcgctgt atccggtggt cgaccttgcc 
gatgcgaccg 19021 aacggtcgtg gaactacgag cagcactcga ttgcggttct tgtcggagac acgtccgtca 19081 actcggaact gtattaccgc cgttcacccc tctcgaagct cgacaaactc tcgcaggcaa 19141 aggtgcttct gatgcacgga gatcaggacg aagctgttcc gctcgaccac agcatcctca 19201 tcgcggatca attgcgtcgg aggggtggag atgttgcgtt ccacgtgttc gagggtgaag 19261 gacacgggtt tcgtcttcgt gtcaaccaag aacgcgaata cgcattgatt ggtgagttcc 19321 tgcagacctt gtagcgttct cgttcatgtc cgactggaac cctctcgacc ccgatgcaga 19381 gagcgttcat tacgacctcg gtgcatggaa tctcgatcaa cgcgcggcgg ttgccgaggt 19441 gttcgcggaa gctgagatac ctcatgcctg ggtgggcgac gaggtcgtcg tgcccgccga 19501 gctcgaggag gttgccgacg tgcttctcga tcgacttgag caggagttcg gtgtcgacgg 19561 agcaacggtt tcgacgcgcg gagcatcttt cgccattgac gaagcggacg acgacgacat 19621 cacggaatac gaactcgatg attgggccga caacgagcgt gcacggctga gcgaactgct 19681 cgtcgcgtca ggtatcccct tcaaatggga gggtgctctt ctcgtgacgc tcaccgactt 19741 cgaagacacc gtcgatgaac tactcgatgc cgtcgaggca ggcgacgtca ccatcgtcga 19801 ctcttccggt tcgggtcgtg gcaccgtgtc ggtggcggat tcggacgtgt cgggggaaac 19861 gctgacgcag atgttcctcg ctgccgaacg gctgcaacgc gatccactcg acgccgacgg 19921 tcttgcgttg ttggtccggg ttctcgacga cgtggaagac ggtggcacgc cgttcggagt 19981 gcccgtgccc gcgtggcgcc aggctctgga actcgccgat cagttggctg atgccctcgc 20041 cggcggcgac atccccgacg agatcggtgc gatggaagtc gcacaacgat tgttcgtcac 20101 gctgaggccg cacgtctgat tcgccactag cgtcgcctgc gatgctgctc gcagaactcg 20161 agatctggca ctcgcggccc atcacgccca ctcggcgtgt tgcgcttggc cacctggttc 20221 tgccagccga tccgtcgccc ggttttgggg gagtgctgct gggcgcggtc atcgcacgcc 20281 atctctacga agtcgacgac gacctccacc ccgacattca acgcctcatc ctcgaggtcc 20341 aacgcggtga ccgagtcgtg cagccgcggc ttcgtcaccg cttccaggtc gatcgtcacg 20401 gactttcgcg aagtgttcat cgactgatgg gtgaagacga ctccgccgag ttcgtctttg 20461 gtacgcacgg caatcctctg cagcaggtgc tcggcgccat ttacgcactc gaacgtctgg 20521 aagcggtcac ccgacgcgca ctcggcccgg tgctcgtgaa gtcgatggcg tggcgcgggc 20581 cgatcggtgc atccttcatc tcgtacctcg tcggcaatgc cacgaattcg gtctcggcgc 20641 tcgccgaccc ccgcgcctgg gctcttgaag 
tgctgggttt tcccgcgggc accatcaagc 20701 cgtcgaagcg cgagatcacg caacgtttcc gagacaggct ccgtttggtg catcccgacc 20761 acggtggtgc gcaggatcgt gcaagtcgtg agattgccga cctggccgag gcgcgccggg 20821 tactcacgaa actgtgagtt gtgcgtgagg gcgaaaggtc tcgtgctgtt cccgggtgcg 20881 gggagcaacc gcgaccactc gtcactcgtt tcgttggaag agcgactcgc cccgcttccg 20941 gtcgcgcgcg tcgactttcc ctaccggcgg gccggtcgca aggctccgga ccgtgccccg 21001 gttctcgtcg attgcgtggt gcgcgaggtg aaggagttcg ccgcgctgaa ttcgtgtcgg 21061 tcgtcgtcac tcgtcattgg tggccggtcg atgggcggtc gtatgtgctc gatggcggtt 21121 gccgatggtc ttgcggcgaa gggattggtc ttgatctcgt accccttgca tccgccggcg 21181 aagccgcaga acctgcgagt cgagcacctg tcgaatatcg cggtgccgac gctgttcgtg 21241 cacggtacga acgatccgtt cgggtcgccc gccgagctgc gtcgccacgc gcgtcgggtg 21301 acaggcgatg tgacgttcca attcatcgaa cgtggacgcc acgacctgaa gggaagcgac 21361 gaattgatcg ctgacgtggt gggtgaatgg atggcgtcgc tctagggagc gtcttccgga 21421 ttcaattcct cgagttcacg gtgcgcggtt tcggggtaga agaaccacac aagggccgcg 21481 atcaccagcg acaccgacga gaacgtcaac atcacggttc cgtaggacca atggtggtcg 21541 atcagcgttc caccgacgat caggcccgcg cttccgccta ggagggccga gacggtgacg 21601 agaaacgaac tcgtctggcg tttggccgtg ggggacagct ctccgcgata gacggccatg 21661 gccggatagg cggtcgcggc gaacacacca cccgtgaccg cgcacaccca catcagtggt 21721 ccgtcgacga cgaaggacat cgcgatgaga agcgcgccga cgggaatgca cgtggcggcg 21781 acgacgcgtc ggccgtagac gtcggcgagg cgaccgccga agatgagacc gagcgccgac 21841 ggtgtggtgg tgagaagggt gaaggcggca acgagaacgg ccgagtagcc acgaacgtct 21901 ttcaggtagc ggttctggaa aatcgatgca gttgcgagaa agacgttgcc gagaaacgcc 21961 accaccaaga tcaccccctg caccctcaga gttctcggcg cgtggctacc cgcttttcgc 207 22021 gcctccacgc gccgagaacc ctcagtgcgg tggcgggtga agcgcgtcgt ttccgacaac 22081 ctgcgcgtca agcgggcagc aaccaccagc cagatcagcg acaccacata gatgatgcgc 22141 cagccgtttt cgcccgaatc ggcgagcgga agagcaagaa cggcgatgcc ggcacccagt 22201 ccgctggcga gggcaaggat gctgagcccg tacgcgcggg cgttcttcgg catctcttcc 22261 accgcgacga cgaggagtgc aacgtcgagt gccaggccga gcggtcgcgc aatcgtttga 22321 ttggcggtga 
gcagccagaa actcggtgcg agggcaccga gtgccgcgat caccggagcg 22381 atccatgcga gtgacacgat gatgcgccgg cggccgatgc ggtcgccgag aacggcgatg 22441 ggaatcgcaa cgacgatgcc gaagcggacg atggcggcgg ccacgccctg accctgctcg 22501 cccacgttga aggcttccgc ggcgtaggtc acggtctgcg tgaacagcgt gttgatgaag 22561 gcgctggcca tcgatgcggc agcaagaaga gaaagaacca gtacctcgcg ttggttcagt 22621 tcctgtggcg gtgcccacca cgtcgacttg gcgcggcgtt cgtgacgtgc gaagtgacgc 22681 gagatgagtg gtgtgaacat ccaccagaac cacgggatgt cgaccgacca cgtggttgtt 22741 tcgcgcgccg cgccgtcggc gcggcgggtg acgacgcgtt cgtaggtggc gaagggaccg 22801 ttgtgctgag caaatgtcgt cgtgtcggcg gaagtcgaac caaccgactc gacgacgatg 22861 tcgctgtgtg gttcgagaat cgaagaaaaa tcagcgtcat ttgggccgat ttggcgcgtc 22921 gtgacgcgaa ttcgacggcc gcggtccgtg cgaatgtcgg ggcggcccat accggcggga 22981 gcatactgac acggtgactt cgccgcgtat ccgggtgcga cccgggaacc gactcaccgg 23041 caccgtcgac gtgcccggtg cgaagaactc ggtgctcaag ttgatggccg cgtgcctgat 23101 ggccgacggc gacttcacgc tgacgaacgt ccccgacatc gacgacgtgt cgacgatgtc 23161 cgaccttctc accgcaa

REFERENCES

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990) Basic local alignment search tool. J Mol Biol 215: 403-410.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl Acids Res 25: 3389-3402.

Annila, A., Lehtimaki, J., Mattila, K., Eriksson, J.E., Sivonen, K., Rantala, T.T., and Drakenberg, T. (1996) Solution structure of nodularin - an inhibitor of serine/threonine-specific protein phosphatases. J Biol Chem 271: 16695-16702.

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J., Hermjakob, H., Hulo, N., Jonassen, I., Kahn, D., Kanapin, A., Karavidopoulou, Y., Lopez, R., Marx, B., Mulder, N., Oinn, T.M., Pagni, M., Servant, F., Sigrist, C.J., and Zdobnov, E.M. (2000) InterPro - an integrated documentation resource for protein families, domains and functional sites. Bioinformatics 16: 1145-1150.

Ashby, M.K., and Houmard, J. (2006) Cyanobacterial Two-Component Proteins: Structure, Diversity, Distribution, and Evolution. Microbiol Mol Biol Rev 70: 472-509.

Ashelford, K.E., Weightman, A.J., and Fry, J.C. (2002) PRIMROSE: a computer program for generating and estimating the phylogenetic range of 16S rRNA oligonucleotide probes and primers in conjunction with the RDP-II database. Nucl. Acids Res. 30: 3481-3489.

Bahr, M., Hobbie, J.E., and Sogin, M.L. (1996) Bacterial diversity in an Arctic lake: a freshwater SAR11 cluster. Aquat Microb Ecol 11: 271-277.

Barns, S.M., Takala, S.L., and Kuske, C.R. (1999) Wide distribution and diversity of members of the bacterial kingdom Acidobacterium in the environment. Appl. Environ. Microbiol. 65: 1731.

Basu, A., Kozikowski, A.P., and Lazo, J.S. (1992) Structural requirements of lyngbyatoxin A for activation and downregulation of protein kinase C. Biochemistry 31: 3824-3830.

Bauld, J. (1981) Occurrence of benthic microbial mats in saline lakes. Hydrobiologia 81: 87-111.

Beasley, V.R., Dahlem, A.M., Cook, W.O., Valentine, W.M., Lovell, R.A., Hooser, S.B., Harada, K., Suzuki, M., and Carmichael, W.W. (1989) Diagnostic and clinically important aspects of cyanobacterial (blue-green algae) toxicoses. J Vet Diagn Invest 1: 359-365.

Beja, O., Suzuki, M.T., Koonin, E.V., Aravind, L., and Hadd, A. (2000a) Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage. Environ. Microbiol. 2: 516.

Beja, O., Koonin, E.V., Aravind, L., Taylor, L.T., and Seitz, H. (2002) Comparative genomic analysis of archaeal genotypic variants in a single population and in two different oceanic provinces. Appl. Environ. Microbiol. 68: 335.

Beja, O., Aravind, L., Koonin, E., Suzuki, M., Hadd, A., Nguyen, L., Nguyen, L.P., Jovanovich, S., Gates, C.M., Feldman, R.A., and Spudich, J.L. (2000b) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902-1906.

Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Rapp, B.A., and Wheeler, D.L. (2002) GenBank. Nucleic Acids Res. 30: 17.

Benson, G. (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucl Acids Res 27: 573-578.

Birck, C., Malfois, M., Svergun, D., and Samama, J. (2002) Insights into signal transduction revealed by the low resolution structure of the FixJ response regulator. J Mol Biol 321: 447-457.

Blokhin, A.V., Yoo, H.D., Geralds, R.S., Nagle, D.G., Gerwick, W.H., and Hamel, E. (1995) Characterisation of the interaction of the marine cyanobacterial natural product curacin A with the colchicine site of tubulin and initial structure-activity studies with analogues. Mol Pharmacol 48: 523-531.

Boyd, D.R., Sharma, N.D., O'Dowd, C.R., Carroll, J.G., Loke, P.L., and Allen, C.C. (2005a) cis-Dihydrodiol, arene oxide and phenol metabolites of dictamnine: key intermediates in the biodegradation and biosynthesis of furoquinoline alkaloids. Chem Commun 31: 3989-3991.

Boyd, D.R., Sharma, N.D., Llamas, N.M., Malone, J.F., O'Dowd, C.R., and Allen, C.C. (2005b) Chemoenzymatic synthesis of carbasugars from iodobenzene. Org Biomol Chem 3: 1953-1963.

Breitbart, M., Salamon, P., Andresen, B., Mahaffy, J.M., and Segall, A.M. (2002) Genomic analysis of uncultured marine viral communities. Proc. Natl. Acad. Sci. USA 99: 14250.

Breitbart, M., Hewson, I., Felts, B., Mahaffy, J.M., and Nulton, J. (2003) Metagenomic analyses of an uncultured viral community from human feces. J. Bacteriol. 185: 6220.

Buckley, D.H., and Schmidt, T.M. (2002) Exploring the Diversity of Soil - a Microbial Rainforest. In Biodiversity of Microbial Life: Foundations of Earth's Biosphere. Staley, J.T., and Reysenbach, A.-L. (eds). New York: Wiley-Liss, pp. 183-208.

Burja, A.M., Banaigs, B., Abou-Mansour, E., Burgess, J.G., and Wright, P.C. (2001) Marine cyanobacteria - a prolific source of natural products. Tetrahedron 57: 9347-9377.

Campbell, B.J., Stein, J.L., and Cary, S.C. (2003) Evidence of chemolithoautotrophy in the bacterial community associated with Alvinella pompejana, a hydrothermal vent polychaete. Appl. Environ. Microbiol. 69: 5070.

Cane, D.E., and Walsh, C.T. (1999) The parallel and convergent universes of polyketide synthases and nonribosomal peptide synthetases. Chem Biol 6: 319-325.

Canfield, D.E., Phlips, E., and Duarte, C.M. (1989) Factors influencing the abundance of blue-green algae in Florida lakes. Canadian Journal of Fisheries and Aquatic Science 46: 1232-1237.

Cohen-Bazire, G., and Bryant, D.A. (1982) Phycobilisomes: composition and structure. In The biology of Cyanobacteria. Carr, and Whitton (eds). Berkeley: University of California Press, pp. 143-190.

Cole, J.J., Likens, G.E., and Strayer, D.L. (1982) Photosynthetically produced dissolved organic carbon: an important carbon source for planktonic bacteria. Limnol Oceanogr 27: 1080-1090.

Cole, J.R., Chai, B., Marsh, T.L., Farris, R.J., Wang, Q., Kulam, S.A., Chandra, S., McGarrell, D.M., Schmidt, T.M., Garrity, G.M., and Tiedje, J.M. (2003) The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy. Nucl Acids Res 31: 442-443.

Cottrell, M.T., and Kirchman, D.L. (2000) Community composition of marine bacterioplankton determined by 16S rRNA gene clone libraries and fluorescence in situ hybridization. Appl Environ Microbiol 66: 5116-5122.

Cottrell, M.T., Moore, J.A., and Kirchman, D.L. (1999) Chitinases from uncultured marine microorganisms. Appl. Environ. Microbiol. 65: 2553.

Courtois, S., Cappellano, C.M., Ball, M., Francou, F.X., and Normand, P. (2003) Recombinant environmental libraries provide access to microbial diversity for drug discovery from natural products. Appl. Environ. Microbiol. 69: 49.

Curtis, T.P., Sloan, W.T., and Scannell, J.W. (2002) Estimating prokaryotic diversity and its limits. Proc. Natl. Acad. Sci. USA 99: 10494.

de la Torre, J.R., Christianson, L.M., Beja, O., Suzuki, M.T., and Karl, D.M. (2003) Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc. Natl. Acad. Sci. USA 100: 12830.

Dean, F.B., Nelson, J.R., Giesler, T.L., and Lasken, R.S. (2001) Rapid Amplification of Plasmid and Phage DNA Using Phi29 DNA Polymerase and Multiply-Primed Rolling Circle Amplification. Genome Res. 11: 1095-1099.

Dey, B., Lerner, D.L., Lusso, P., Boyd, M.R., Elder, J.H., and Berger, E.A. (2000) Multiple Antiviral Activities of Cyanovirin-N: Blocking of Human Immunodeficiency Virus Type 1 gp120 Interaction with CD4 and Coreceptor and Inhibition of Diverse Enveloped Viruses. J. Virol. 74: 4562-4569.

Diaz, E. (2004) Bacterial degradation of aromatic pollutants: a paradigm of metabolic versatility. Int Microbiol 7: 173-180.

Dittmann, E., Neilan, B.A., and Borner, T. (2001) Molecular biology of peptide and polyketide biosynthesis in cyanobacteria. Appl Microbiol Biotechnol 57: 467-473.

Ducros, V.M., Lewis, R.J., Verma, C.S., Dodson, E.J., Leonard, G., Turkenburg, J.P., Murshudov, G.N., Wilkinson, A.J., and Brannigan, J.A. (2001) Crystal structure of GerE, the ultimate transcriptional regulator of spore formation in Bacillus subtilis. J Mol Biol 306: 759-771.

Dufresne, A., Salanoubat, M., Partensky, F., Artiguenave, F., Axmann, I.M., Barbe, V., Duprat, S., Galperin, M.Y., Koonin, E.V., Le Gall, F., Makarova, K.S., Ostrowski, M., Oztas, S., Robert, C., Rogozin, I.B., Scanlan, D.J., Tandeau de Marsac, N., Weissenbach, J., Wincker, P., Wolf, Y.I., and Hess, W.R. (2003) Genome sequence of the cyanobacterium Prochlorococcus marinus SS120, a nearly minimal oxyphototrophic genome. Proc Natl Acad Sci USA 100: 10020-10025.

Dykhuizen, D.E. (1998) Santa Rosalia revisited: Why are there so many species of bacteria? Antonie Van Leeuwenhoek International Journal of General and Molecular Microbiology 73: 25-33.

Dym, O., Pratt, E.A., Ho, C., and Eisenberg, D. (2000) The crystal structure of D-lactate dehydrogenase, a peripheral membrane respiratory enzyme. Proc Natl Acad Sci U S A 97: 9413-9418.

Eiler, A., and Bertilsson, S. (2004) Composition of freshwater bacterial communities associated with cyanobacterial blooms in four Swedish lakes. Environ Microbiol 6: 1228-1243.

Eilers, H., Pernthaler, J., Glockner, F.O., and Amann, R. (2000) Culturability and in situ abundance of pelagic bacteria from the North Sea. Appl Environ Microbiol 66: 3044-3051.

Engebrecht, J., Nealson, K., and Silverman, M. (1983) Bacterial bioluminescence: isolation and genetic analysis of functions from Vibrio fischeri. Cell 32: 773-781.

Entcheva, P., Liebl, W., Johann, A., Hartsch, T., and Streit, W.R. (2001) Direct cloning from enrichment cultures, a reliable strategy for isolation of complete operons and genes from microbial consortia. Appl. Environ. Microbiol. 67: 89.

Esser, M.T., Mori, T., Mondor, I., Sattentau, Q.J., Dey, B., Berger, E.A., Boyd, M.R., and Lifson, J.D. (1999) Cyanovirin-N Binds to gp120 To Interfere with CD4-Dependent Human Immunodeficiency Virus Type 1 Virion Binding, Fusion, and Infectivity but Does Not Affect the CD4 Binding Site on gp120 or Soluble CD4-Induced Conformational Changes in gp120. J. Virol. 73: 4360-4371.

Fang, J.-M., Lin, C.-H., Bradshaw, C.W., and Wong, C.-H. (1995) Enzymes in organic synthesis: oxidoreductions. J Chem Soc Perkin Trans 1: 967-978.

Ferrandez, A., Prieto, M.A., Garcia, J.L., and Diaz, E. (1997) Molecular characterization of PadA, a phenylacetaldehyde dehydrogenase from Escherichia coli. FEBS Lett 406: 23-27.

Ferrandez, A., Minambres, B., Garcia, B., Olivera, E.R., Luengo, J.M., Garcia, J.L., and Diaz, E. (1998) Catabolism of phenylacetic acid in Escherichia coli. Characterization of a new aerobic hybrid pathway. J Biol Chem 273: 25974-25986.

Fiandt, M. (2000) Construction of an environmental genomic DNA library from soil using the EpiFOS fosmid library production kit. Epicentre Forum 7: 6.

Fieseler, L., Quaiser, A., Schleper, C., and Hentschel, U. (2006) Analysis of the first genome fragment from the marine sponge-associated, novel candidate phylum Poribacteria by environmental genomics. Environ Microbiol 8: 612-624.

Finn, R.D., Mistry, J., Schuster-Bockler, B., Griffiths-Jones, S., Hollich, V., Lassmann, T., Moxon, S., Marshall, M., Khanna, A., Durbin, R., Eddy, S.R., Sonnhammer, E.L.L., and Bateman, A. (2006) Pfam: clans, web tools and services. Nucl. Acids Res. 34: D247-251.

Francis, G. (1878) Poisonous Australian Lake. Nature 18: 11-12.

Frankmolle, W.P., Knubel, G., Moore, R.E., and Patterson, G.M. (1992) Antifungal cyclic peptides from the terrestrial blue-green alga Anabaena laxa II. Structures of laxaphycins A, B, D and E. J Antibiot (Tokyo) 45: 1458-1466.

Frazier, K., Colvin, B., Styer, E., and Hullinger, G. (1998) Microcystin toxicosis in cattle due to overgrowth of blue-green algae. Vet Hum Toxicol 40: 23-24.

Fuqua, W.C., Winans, S.C., and Greenberg, E.P. (1994) Quorum sensing in bacteria: the LuxR-LuxI family of cell density-responsive transcriptional regulators. J Bacteriol 176: 269-275.

Futai, M. (1992) Membrane D-lactate dehydrogenase from Escherichia coli. Purification and properties. Biochemistry 12: 2468-2474.

Gabor, E.M., de Vries, E.J., and Janssen, D.B. (2003) Efficient recovery of environmental DNA for expression cloning by indirect extraction methods. FEMS Microbiol. Ecol. 44: 153.

Gabor, E.M., de Vries, E.J., and Janssen, D.B. (2004) Construction, characterization, and use of small-insert gene banks of DNA isolated from soil and enrichment cultures for the recovery of novel amidases. Environmental Microbiology 6: 948-958.

Galperin, M.Y. (2005) A census of membrane-bound and intracellular signal transduction proteins in bacteria: bacterial IQ, extroverts and introverts. BMC Microbiol 5: 35.

Garcia-Pichel, F., and Belnap, J. (1996) Microenvironments and microscale productivity of cyanobacterial desert crusts. J Phycol 32: 774-782.

Garcia-Pichel, F., Lopez-Cortes, A., and Nubel, U. (2001) Phylogenetic and morphological diversity of cyanobacteria in soil desert crusts from the Colorado plateau. Appl Environ Microbiol 67: 1902-1910.

Gibson, J., and Harwood, C.S. (2002) Metabolic diversity in aromatic compound utilization by anaerobic microbes. Annual Review of Microbiology 56: 345-369.

Gillespie, D.E., Brady, S.F., Bettermann, A.D., Cianciotto, N.P., and Liles, M.R. (2002) Isolation of antibiotics turbomycin A and B from a metagenomic library of soil microbial DNA. Appl. Environ. Microbiol. 68: 4301.

Ginolhac, A., Jarrin, C., Gillet, B., Robe, P., Pujic, P., Tuphile, K., Bertrand, H., Vogel, T.M., Perrière, G., Simonet, P., and Nalin, R. (2004) Phylogenetic Analysis of Polyketide Synthase I Domains from Soil Metagenomic Libraries Allows Selection of Promising Clones. Appl Environ Microbiol 70: 5522-5527.

Glockner, F.O., Fuchs, B., and Amann, R. (1999) Bacterioplankton composition of lakes and oceans: a comparison based on fluorescence in situ hybridization. Appl Environ Microbiol 65: 3721-3726.

Glockner, F.O., Kube, M., Bauer, M., Teeling, H., Lombardot, T., and Ludwig, W. (2003) Complete genome sequence of the marine planctomycete Pirellula sp. strain 1. Proc Natl Acad Sci U S A 100: 8298-8303.

Glockner, F.O., Zaichikov, E., Belkova, N., Denissova, L., Pernthaler, J., Pernthaler, A., and Amann, R. (2000) Comparative 16S rRNA analysis of lake bacterioplankton reveals globally distributed phylogenetic clusters including an abundant group of actinobacteria. Appl Environ Microbiol 66: 5053-5065.

Gons, H.J. (1977) On the light-limited growth of Scenedesmus protuberans. In: University of Amsterdam.

Grebe, T.W., and Stock, J.B. (1999) The histidine protein kinase superfamily. Adv Microb Physiol 41: 139-227.

Greenberg, E.P. (2000) Acyl-homoserine lactone quorum sensing in bacteria. J Microbiol 38: 117-121.

Grzymski, J.J., Carter, B.J., DeLong, E.F., Feldman, R.A., Ghadiri, A., and Murray, A.E. (2006) Comparative Genomics of DNA Fragments from Six Antarctic Marine Planktonic Bacteria. Appl. Environ. Microbiol. 72: 1532-1541.

Gupta, R., Beg, Q.K., and Lorenz, P. (2002) Bacterial alkaline proteases: molecular approaches and industrial applications. Appl Microbiol Biotechnol 59: 15-32.

Hagstrom, A., Pommier, T., Rohwer, F., Simu, K., Stolte, W., Svensson, D., and Zweifel, U.L. (2002) Use of 16S Ribosomal DNA for Delineation of Marine Bacterioplankton Species. Appl. Environ. Microbiol. 68: 3628-3633.

Hahn, M.W. (2003) Isolation of Strains Belonging to the Cosmopolitan Polynucleobacter necessarius Cluster from Freshwater Habitats Located in Three Climatic Zones. Appl. Environ. Microbiol. 69: 5248-5254.

Hall, T.A. (1998) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser 41: 95-98.

Hallam, S.J., Girguis, P.R., Preston, C.M., Richardson, P.M., and DeLong, E.F. (2003) Identification of methyl coenzyme M reductase A (mcr A) genes associated with methane-oxidizing archaea. Appl. Environ. Microbiol. 69: 5483.

Handelsman, J. (2004) Metagenomics: Application of Genomics to Uncultured Microorganisms. Microbiol. Mol. Biol. Rev. 68: 669-685.

Haney, J.F., Sasner, J.J., and Ikawa, M. (1995) Effects of products released by Aphanizomenon flos-aquae and purified saxitoxin on the movements of Daphnia carinata feeding appendages. Limnol Oceanogr 40: 263-272.

Hanson, J.R., Ackerman, C.E., and Scow, K.M. (1999) Biodegradation of Methyl tert-Butyl Ether by a Bacterial Pure Culture. Appl. Environ. Microbiol. 65: 4788-4792.

Harada, K., Oshikata, M., Uchida, H., Suzuki, M., Kondo, F., Sato, K., Ueno, Y., Yu, S.Z., Chen, G., and Chen, G.C. (1996) Detection and identification of microcystins in the drinking water of Haimen City, China. Nat Toxins 4: 277-283.

Healy, F.G., Ray, R.M., Aldrich, H.C., Wilkie, A.C., and Ingram, L.O. (1995) Direct isolation of functional genes encoding cellulases from the microbial consortia in a thermophilic, anaerobic digester maintained on lignocellulose. Appl. Microbiol. Biotechnol. 43: 667.

Hellingwerf, K.J. (2005) Bacterial observations: a rudimentary form of intelligence? Trends Microbiol 13: 152-158.

Hemscheidt, T., Rapala, J., Sivonen, K., and Skulberg, O.M. (1995) Biosynthesis of anatoxin-A in Anabaena flos-aquae and Homoanatoxin-A in Oscillatoria formosa. J Chem Soc Chem Commun 13: 1361-1362.

Henne, A., Daniel, R., Schmitz, R.A., and Gottschalk, G. (1999) Construction of environmental DNA libraries in Escherichia coli and screening for the presence of genes conferring utilization of 4-hydroxybutyrate. Appl. Environ. Microbiol. 65: 3901-3907.

Henne, A., Schmitz, R.A., Bomeke, M., Gottschalk, G., and Daniel, R. (2000) Screening of environmental DNA libraries for the presence of genes conferring lipolytic activity on Escherichia coli. Appl. Environ. Microbiol. 66: 3113-3116.

Henne, A., Bruggemann, H., Raasch, C., Wiezer, A., Hartsch, T., and Liesegang, H. (2004) The genome sequence of the extreme thermophile Thermus thermophilus. Nat Biotechnol 22: 547-553.

Hill, C.W. (1999) Large genomic sequence repetitions in bacteria: lessons from rRNA operons and Rhs elements. Res Microbiol 150: 665-674.

Hiorns, W.D., Methé, B.A., Nierzwicki-Bauer, S.A., and Zehr, J.P. (1997) Bacterial diversity in Adirondack mountain lakes as revealed by 16S rRNA gene sequences. Appl Environ Microbiol 63: 2957-2960.

Hoffmann, D., Hevel, J.M., Moore, R.E., and Moore, B.S. (2003) Sequence analysis and biochemical characterisation of the nostopeptolide A biosynthetic gene cluster from Nostoc sp. GSV224. Gene 311: 171-180.

Hoogenhout, H., and Amesz, J. (1965) Growth rates of photosynthetic microorganisms in laboratory cultures. Arch Microbiol 50.

Huber, T., Faulkner, G., and Hugenholtz, P. (2004) Bellerophon: a program to detect chimeric sequences in multiple sequence alignments. Bioinformatics 20: 2317-2319.

Hugenholtz, P. (2002) Exploring prokaryotic diversity in the genomic era. Genome Biology 3: 1-8.

Hugenholtz, P., Goebel, B., and Pace, N. (1998) Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. J Bacteriol 180: 4765-4774.

Hughes, D.S., Felbeck, H., and Stein, J.L. (1997) A histidine protein kinase homolog from the endosymbiont of the hydrothermal vent tubeworm Riftia pachyptila. Appl. Environ. Microbiol. 63: 3494.

Hummel, W. (1997) New alcohol dehydrogenases for the synthesis of chiral compounds. Adv Biochem Eng Biotechnol 58: 145-184.

Hummel, W. (1999) Large-scale applications of NAD(P)-dependent oxidoreductases: recent developments. Trends Biotechnol 17: 487-492.

Igarashi, N., Harada, J., Nagashima, S., Matsuura, K., Shimada, K., and Nagashima, K.V.P. (2001) Horizontal Transfer of the Photosynthesis Gene Cluster and Operon Rearrangement in Purple Bacteria. Journal of Molecular Evolution 52: 333-341.

Itou, Y., Suzuki, S., Ishida, K., and Murakami, M. (1999) Anabaenopeptins G and H, potent carboxypeptidase A inhibitors from the cyanobacterium Oscillatoria agardhii (NIES-595). Bioorg Med Chem Lett 9.

Izaguirre, G., and Taylor, W.D. (2004) A guide to geosmin- and MIB-producing cyanobacteria in the United States. Water Sci Technol 49: 19-24.

Jackson, A.R.B., McInnes, A., Falconer, I.R., and Runnegar, M.T.C. (1984) Chemical and pathological changes in sheep experimentally poisoned by the blue-green algae Microcystis aeruginosa. Vet Pathol 21: 102-113.

Jakobi, C., Oberer, L., Quiquerez, C., Konig, W.A., and Weckesser, J. (1995) Cyanopeptolin S, a sulfate containing depsipeptide from a water bloom of Microcystis sp. FEMS Microbiol Lett 129: 129-133.

Javor, B.J., and Castenholz, R.W. (1981) Laminated microbial mats, Laguna Guerrero Negro, Mexico. Geomicrobiol J 2: 237-274.

Jiménez, J.I., Miñambres, B., García, J.L., and Díaz, E. (2002) Genomic analysis of the aromatic catabolic pathways from Pseudomonas putida KT2440. Environ Microbiol 4: 824-841.

Jochimsen, E.M., Carmichael, W.W., An, J.S., Cardo, D.M., Cookson, S.T., Holmes, C.E., Antunes, M.B., Filho, D.A., Lyra, T.M., and Barreto, V.S. (1998) Liver failure and death after exposure to microcystins at a hemodialysis center in Brazil. N Engl J Med 338: 873-878.

Johri, B.M. (2005) Microbial diversity. Current Science 89: 3-4.

Joseph, S.J., Hugenholtz, P., Sangwan, P., Osborne, C.A., and Janssen, P.H. (2003) Laboratory Cultivation of Widespread and Previously Uncultured Soil Bacteria. Appl. Environ. Microbiol. 69: 7210-7215.

Kaebernick, M., and Neilan, B.A. (2001) Ecological and molecular investigations of cyanotoxin production. FEMS Microbiol Ecol 35: 1-9.

Kaebernick, M., Neilan, B.A., Borner, T., and Dittmann, E. (2000) Light and the transcriptional response of the microcystin biosynthetic gene cluster. Appl Environ Microbiol 66: 3387-3392.

Kaebernick, M., Dittmann, E., Borner, T., and Neilan, B.A. (2002) Multiple alternate transcripts direct the biosynthesis of microcystin, a cyanobacterial nonribosomal peptide. Appl Environ Microbiol 68: 449-455.

Kanehisa, M., and Goto, S. (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucl. Acids Res. 28: 27-30.

Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K.F., Itoh, M., Kawashima, S., Katayama, T., Araki, M., and Hirakawa, M. (2006) From genomics to chemical genomics: new developments in KEGG. Nucl. Acids Res. 34: D354-357.

Kirchman, D.L. (2001) The ecology of Cytophaga-Flavobacteria in aquatic environments. FEMS Microbiol Ecol 39: 91-100.

Klappenbach, J.A., Dunbar, J.M., and Schmidt, T.M. (2000) rRNA operon copy number reflects ecological strategies of bacteria. Appl. Environ. Microbiol. 66: 1328.

Klausen, C., Jørgensen, N.O.G., Burford, M.A., and O'Donohue, M. (2004) Actinomycetes may also produce taste and odour. Water 31: 45-49.

Klemme, J.-H., and Pfleiderer, C. (1977) Production of extracellular proteolytic enzymes by phototrophic bacteria. FEMS Microbiol Lett 1: 297-299.

Knietsch, A., Bowien, S., Whited, G., Gottschalk, G., and Daniel, R. (2003a) Identification and characterization of coenzyme B-12-dependent glycerol dehydratase- and diol dehydratase-encoding genes from metagenomic DNA libraries derived from enrichment cultures. Appl. Environ. Microbiol. 69: 3048.

Knietsch, A., Waschkowitz, T., Bowien, S., Henne, A., and Daniel, R. (2003b) Construction and screening of metagenomic libraries derived from enrichment cultures: generation of a gene bank for genes conferring alcohol oxidoreductase activity on Escherichia coli. Appl. Environ. Microbiol. 69: 1408.

Knietsch, A., Waschkowitz, T., Bowien, S., Henne, A., and Daniel, R. (2003c) Metagenomes of complex microbial consortia derived from different soils as sources for novel genes conferring formation of carbonyls from short-chain polyols on Escherichia coli. J. Mol. Microbiol. Biotechnol. 5: 46.

Kodani, S., Ishida, K., and Murakami, M. (1998a) Aeruginosin 103-A, a thrombin inhibitor from the cyanobacterium Microcystis viridis. J Nat Prod 61: 1046-1048.

Kodani, S., Ishida, K., and Murakami, M. (1998b) Dehydroradiosumin, a trypsin inhibitor from the cyanobacterium Anabaena cylindrica. J Nat Prod 61: 854-856.

Kuhner, M.K., and Felsenstein, J. (1994) A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol Biol Evol 11: 459-468.

Kuhnert, P., Heyberger-Meyer, B., Burnens, A.P., Nicolet, J., and Frey, J. (1997) Detection of RTX Toxin Genes in Gram-Negative Bacteria with a Set of Specific Probes. Appl Environ Microbiol 63: 2258-2265.

Lee, T.M., and Shieh, Y.J. (1993) o-Hydroxyphenylacetic acid-enhanced rice coleoptile elongation. Plant Sci 95: 49-54.

Liesack, W., and Stackebrandt, E. (1989) Evidence for unlinked rrn operons in the planctomycete Pirellula marina. J Bacteriol 171: 5025-5030.

Liles, M.R., Manske, B.F., Bintrim, S.B., Handelsman, J., and Goodman, R.M. (2003) A census of rRNA genes and linked genomic sequences within a soil metagenomic library. Appl. Environ. Microbiol. 69: 2684.

Lindstrom, E.S. (2000) Bacterioplankton Community Composition in Five Lakes Differing in Trophic Status and Humic Content. Microbial Ecology 40: 104-113.

Lindstrom, E.S., and Leskinen, E. (2002) Do Neighboring Lakes Share Common Taxa of Bacterioplankton? Comparison of 16S rDNA Fingerprints and Sequences from Three Geographic Regions. Microbial Ecology 44: 1-9.

Lopez-Garcia, P., Brochier, C., Moreira, D., and Rodriguez-Valera, F. (2004) Comparative analysis of a genome fragment of an uncultivated mesopelagic crenarchaeote reveals multiple horizontal gene transfers. Environ. Microbiol. 6: 19.

Lorenz, P., and Eck, J. (2005) Metagenomics and Industrial Applications. Nature Rev Microbiol 3: 510-516.

Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucl Acids Res 25: 955-964.

Luengo, J.M. (1998) Enzymatic synthesis of penicillins. In Comprehensive Natural Products Chemistry. Barton, D., and Nakanishi, K. (eds). New York: Pergamon Press, pp. 239-274.

Luengo, J.M., Garcia, J.L., and Olivera, E.R. (2001) The phenylacetyl-CoA catabolon: a complex catabolic unit with broad biotechnological applications. Mol Microbiol 39: 1434-1442.

Luesch, H., Yoshida, W.Y., Moore, R.E., Paul, V.J., and Mooberry, S.L. (2000) Isolation, structure determination, and biological activity of Lyngbyabellin A from the marine cyanobacterium Lyngbya majuscula. J Nat Prod 63: 611-615.

Majernik, A., Gottschalk, G., and Daniel, R. (2001) Screening of environmental DNA libraries for the presence of genes conferring Na+(Li+)/H+ antiporter activity on Escherichia coli: characterization of the recovered genes and the corresponding gene products. J. Bacteriol. 183: 6645.

Marahiel, M.A., Stachelhaus, T., and Mootz, H.D. (1997) Modular peptide synthetases involved in peptide synthesis. Chem Rev 97: 26511-26573.

Marchler-Bauer, A., and Bryant, S.H. (2004) CD-Search: protein domain annotations on the fly. Nucl Acids Res 32: 327-331.

Maris, A.E., Sawaya, M.R., Kaczor-Grzeskowiak, M., Jarvis, M.R., Bearson, S.M., Kopka, M.L., Schroder, I., Gunsalus, R.P., and Dickerson, R.E. (2002) Dimerization allows DNA target site recognition by the NarL response regulator. Nat Struct Biol 9: 771-778.

Markowitz, V.M., Korzeniewski, F., Palaniappan, K., Szeto, E., Werner, G., Padki, A., Zhao, X., Dubchak, I., Hugenholtz, P., Anderson, I., Lykidis, A., Mavromatis, K., Ivanova, N., and Kyrpides, N.C. (2006) The integrated microbial genomes (IMG) system. Nucl. Acids Res. 34: D344-348.

Martinez, A., Kolvek, S.J., Yip, C.L.T., Hopke, J., Brown, K.A., MacNeil, I.A., and Osburne, M.S. (2004) Genetically Modified Bacterial Strains and Novel Bacterial Artificial Chromosome Shuttle Vectors for Constructing Environmental Libraries and Detecting Heterologous Natural Products in Multiple Expression Hosts. Appl. Environ. Microbiol. 70: 2452-2463.

Minambres, B., Martinez-Blanco, H., Olivera, E.R., Garcia, B., Diez, B., and Barredo, J.L. (1996) Molecular cloning and expression in different microbes of the DNA encoding Pseudomonas putida phenylacetyl-CoA ligase. Use of this gene to improve the rate of benzylpenicillin biosynthesis in Penicillium chrysogenum. J Biol Chem 271: 33531-33538.

Mitchell, S.S., Faulkner, D.J., Rubins, K., and Bushman, F.D. (2000) Dolastatin 3 and two novel cyclic peptides from a Palauan collection of Lyngbya majuscula. J Nat Prod 63: 279-282.

Mizuno, T., Kaneko, T., and Tabata, S. (1996) Compilation of all genes encoding bacterial two-component signal transducers in the genome of the cyanobacterium Synechocystis sp. strain PCC 6803. DNA Res 3: 407-414.

Mohamed, M., Ismail, W., Heider, J., and Fuchs, G. (2002) Aerobic metabolism of phenylacetic acids in Azoarcus evansii. Archives of Microbiology 178: 180-192.

Moore, R.E. (1996) Cyclic peptides and depsipeptides from cyanobacteria: a review. J Ind Microbiol 16: 134-143.

Moreira, D., Rodríguez-Valera, F., and López-García, P. (2004) Analysis of a genome fragment of a deep-sea uncultivated Group II euryarchaeote containing 16S rDNA, a spectinomycin-like operon and several energy metabolism genes. Environmental Microbiology 6: 959-969.

Mori, T., Gustafson, K.R., Pannell, L.K., Shoemaker, R.H., Wu, L.W., McMahon, J.B., and Boyd, M.R. (1998) Recombinant production of cyanovirin-N, a potent human immunodeficiency virus-inactivating protein derived from a cyanobacterium. Protein Expr Purif 12: 151-158.

Mulder, E.G. (1989a) Sheathed bacteria - genus Leptothrix. In Bergey's manual of systematic bacteriology. Staley, J.T., Bryant, M.P., Pfennig, N., and Holt, J.G. (eds). Baltimore: Williams and Wilkins, pp. 1998-2003.

Mulder, E.G. (1989b) Sheathed bacteria - genus Sphaerotilus. In Bergey's manual of systematic bacteriology. Staley, J.T., Bryant, M.P., Pfennig, N., and Holt, J.G. (eds). Baltimore: Williams and Wilkins, pp. 1994-1998.

Mur, L.R., Skulberg, O.M., and Utkilen, H. (1999) Cyanobacteria in the environment. In Toxic Cyanobacteria in Water: A guide to their public health consequences, monitoring and management. Chorus, I., and Bartram, J. (eds). London: WHO.

Nagashima, K.V.P., Matsuura, K., Ohyama, S., and Shimada, K. (1994) Primary structure and transcription of genes encoding B870 and photosynthetic reaction center apoproteins from Rubrivivax gelatinosus. J Biol Chem 269: 1-8.

Namikoshi, M., and Rinehart, K.L. (1996) Bioactive compounds produced by cyanobacteria. J Ind Microbiol 17: 373-384.

Nesbø, C.L., Boucher, Y., Dlutek, M., and Ford Doolittle, W. (2005) Lateral gene transfer and phylogenetic assignment of environmental fosmid clones. Environ Microbiol 7: 2011-2026.

Neumann, U., Forchert, A., Flury, T., and Weckesser, J. (1997) Microginin FR1, a linear peptide from a water bloom of Microcystis species. FEMS Microbiol Lett 153: 475-478.

Nielsen, J.L., Klausen, C., Nielsen, P.H., Burford, M., and Jørgensen, N.O.G. (2006) Detection of activity among uncultured Actinobacteria in a drinking water reservoir. FEMS Microbiol Ecol 55: 432-438.

Nikolskaya, A.N., and Galperin, M.Y. (2002) A novel type of conserved DNA-binding domain in the transcriptional regulators of the AlgR/AgrA/LytR family. Nucl Acids Res 30: 2453-2459.

Nishizawa, T., Asayama, M., and Fujii, K. (1999) Genetic analysis of the peptide synthetase genes for a cyclic heptapeptide microcystin in Microcystis spp. J Biochem (Tokyo) 126: 520-529.

Nishizawa, T., Ueda, A., and Asayama, M. (2000) Polyketide synthase gene coupled to the peptide synthetase module involved in the biosynthesis of the cyclic heptapeptide microcystin. J Biochem 127: 779-789.

Ohmori, M., Ikeuchi, M., Sato, S., Wolk, C.P., Kaneko, T., Ogawa, T., Kanehisa, M., Goto, S., Kawashima, S., Okamoto, S., Yoshimura, H., Katoh, H., Fujisawa, T., Ehira, S., Kamei, A., Yoshihara, S., Narikawa, R., and Tabata, S. (2001) Characterization of genes encoding multi-domain proteins in the genome of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120. DNA Res 8: 271-284.

Olivera, E.R., Minambres, B., Garcia, B., Muniz, C., Moreno, M.A., and Ferrandez, A. (1998) Molecular characterization of the phenylacetic acid catabolic pathway in Pseudomonas putida U: the phenylacetyl-CoA catabolon. Proc Natl Acad Sci USA 95: 6419-6424.

Olivera, E.R., Carnicero, D., Garcia, B., Minambres, B., Moreno, M.A., and Canedo, L. (2001) Two different pathways are involved in the β-oxidation of n-alkanoic and n-phenylalkanoic acids in Pseudomonas putida U: genetic studies and biotechnological applications. Mol Microbiol 39: 863-874.

Ostensvik, O., Skulberg, O.M., Underdal, B., and Hormazabal, V. (1998) Antibacterial properties of extracts from selected planktonic freshwater Cyanobacteria - a comparative study of bacterial bioassays. J Appl Microbiol 84: 1117-1124.

Ouchane, S., Agalidis, I., and Astier, C. (2002) Natural Resistance to Inhibitors of the Ubiquinol Cytochrome c Oxidoreductase of Rubrivivax gelatinosus:

Sequence and Functional Analysis of the Cytochrome bc1 Complex. J.

Bacteriol. 184: 3815-3822.

Ouchane, S., Picaud, M., Reiss-Husson, F., Vernotte, C., and Astier, C. (1996)

Development of gene transfer methods for Rubrivivax gelatinosus S1: construction, characterization and complementation of a puf operon deletion strain. Molecular Genetics and Genomics 252: 379-385.

Panda, D., DeLuca, K., Williams, D., Jordan, M.A., and Wilson, L. (1998) Antiproliferative mechanism of action of cryptophycin-52: kinetic stabilization of microtubule dynamics by high-affinity binding to microtubule ends. Proc Natl Acad Sci USA 95: 9313-9318.

Panke, S., Witholt, B., Schmid, A., and Wubbolts, M.G. (1998) Towards a biocatalyst for (S)-styrene oxide production: characterization of the styrene degradation pathway of Pseudomonas sp. strain VLB120. Appl Environ Microbiol 64: 2032-2043.

Panke, S., Wubbolts, M.G., Schmid, A., and Witholt, B. (2000) Production of enantiopure styrene oxide by recombinant Escherichia coli synthesizing a two-component styrene monooxygenase. Biotechnol Bioeng 69: 91-100.

Panke, S., de Lorenzo, V., Kaiser, A., Witholt, B., and Wubbolts, M.G. (1999)

Engineering of a stable whole-cell biocatalyst capable of (S)-styrene oxide formation for continuous two-liquid-phase applications. Appl Environ Microbiol

65: 5619-5623.

Pao, G.M., Tam, R., Lipschitz, L.S., and Saier, M.H., Jr. (1994) Response regulators: structure, function and evolution. Research in Microbiology 145: 356-362.

Pappas, K.M., Weingart, C.L., and Winans, S.C. (2004) Chemical communication in proteobacteria: biochemical and structural studies of signal synthases and receptors required for intercellular signalling. Mol Microbiol 53:

755-769.

Parales, R.E., and Haddock, J.D. (2004) Biocatalytic degradation of pollutants.

Current Opinion in Biotechnology 15: 374-379.

Parsek, M.R., and Greenberg, E.P. (2000) Acyl-homoserine lactone quorum sensing in Gram-negative bacteria: A signaling mechanism involved in associations with higher organisms. PNAS 97: 8789-8793.

Piel, J. (2002) A polyketide synthase-peptide synthetase gene cluster from an uncultured bacterial symbiont of Paederus beetles. Proc. Natl. Acad. Sci. USA

99: 14002.

Piel, J., Hui, D., Fusetani, N., and Matsunaga, S. (2004) Targeting modular polyketide synthases with iteratively acting acyltransferases from metagenomes of uncultured bacterial consortia. Environ. Microbiol.

Pietra, F.A. (1990) A Secret World: Natural Products of Marine Life. Basel:

Birkhauser.

Poppe, L., and Rétey, J. (2005) Friedel-Crafts-Type Mechanism for the

Enzymatic Elimination of Ammonia from Histidine and Phenylalanine.

Angewandte Chemie International Edition 44: 3668-3688.

Potts, M. (1994) Desiccation tolerance of prokaryotes. Microbiol Rev 58: 755-805.

Pouria, S., de Andrade, A., Barbosa, J., Cavalcanti, R., Barreto, V., Ward, C., Preiser, W., Poon, G., Neild, G., and Codd, G. (1998) Fatal microcystin intoxication in haemodialysis unit in Caruaru, Brazil. Lancet 352: 21-26.

Quaiser, A., Ochsenreiter, T., Klenk, H.P., Kletzin, A., and Treusch, A.H. (2002)

First insight into the genome of an uncultivated crenarchaeote from soil.

Environ. Microbiol. 4: 603.

Quaiser, A., Ochsenreiter, T., Lanz, C., Schuster, S.C., and Treusch, A.H. (2003) Acidobacteria form a coherent but highly diverse group within the bacterial domain: evidence from environmental genomics. Mol. Microbiol. 50: 563.

Ramos, J.L., Gonzalez-Perez, M.M., Caballero, A., and Dillewijn, P.v. (2005)

Bioremediation of polynitrated aromatic compounds: plants and microbes put up a fight. Current Opinion in Biotechnology 16: 275-281.

Rappe, M.S., and Giovannoni, S.J. (2003) The uncultured microbial majority.

Annu. Rev. Microbiol. 57: 369.

Redburn, A.C., and Patel, B.K.C. (1993) Phylogenetic analysis of

Desulfotomaculum thermobenzoicum using polymerase chain-amplified 16S rRNA-specific DNA. FEMS Microbiol Lett 113: 81-86.

Rees, H.C., Grant, S., Jones, B., Grant, W.D., and Heaphy, S. (2003) Detecting cellulase and esterase enzyme activities encoded by novel genes present in environmental DNA libraries. Extremophiles 7: 415-421.

Reid, M.F., and Fewson, C.A. (1994) Molecular characterization of microbial alcohol dehydrogenases. Crit Rev Microbiol 20: 13-56.

Reynolds, C.S. (1984) The Ecology of Freshwater Phytoplankton. Cambridge: Cambridge University Press.

Reynolds, C.S. (1987) Cyanobacterial waterblooms. In Advances in Botanical

Research. Callow, P. (ed). London: Academic Press, pp. 17-143.

Reynolds, C.S., and Walsby, A.E. (1975) Water blooms. Biol Rev 50: 437-481.

Riesenfeld, C.S., Schloss, P.D., and Handelsman, J. (2004) METAGENOMICS:

Genomic Analysis of Microbial Communities. Annual Review of Genetics 38:

525-552.

Robarts, R.D., and Zohary, T. (1987) Temperature effects on photosynthetic capacity, respiration, and growth rates of bloom-forming cyanobacteria. NZ J

Mar Freshwat Res 21: 391-399.

Rocap, G., Larimer, F.W., Lamerdin, J., Malfatti, S., Chain, P., Ahlgren, N.A.,

Arellano, A., Coleman, M., Hauser, L., Hess, W. R., Johnson, Z. I., Land, M.,

Lindell, D., Post, A. F., Regala, W., Shah, M., Shaw, S. L., Steglich, C., Sullivan,

M. B., Ting, C. S., Tolonen, A., Webb, E. A., Zinser, E. R., and Chisholm, S. W.

(2003) Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature 424: 1042-1047.

Rondon, M., August, P., Bettermann, A., Brady, S., Grossman, T., Liles, M., Loiacono, K.A., Lynch, B.A., MacNeil, I.A., and Minor, C. (2000a) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541-2547.

Rozen, S., and Skaletsky, H.J. (2000) Primer3 on the WWW for general users and for biologist programmers. In Bioinformatics Methods and Protocols:

Methods in Molecular Biology. Krawetz, S., and Misener, S. (eds). Totowa, NJ:

Humana Press, pp. 365-386.

Ruepp, A., Graml, W., Santos-Martinez, M.L., Koretke, K.K., Volker, C., and Mewes, H. (2000) The genome sequence of the thermoacidophilic scavenger Thermoplasma acidophilum. Nature 407: 508-513.

Runnegar, M., Berndt, N., Kong, S.M., Lee, E.Y., and Zhang, L. (1995a) In vivo and in vitro binding of microcystin to protein phosphatases 1 and 2A. Biochem

Biophys Res Commun 216: 162-169.

Runnegar, M.T., Kong, S.M., Zhong, Y.Z., and Lu, S.C. (1995b) Inhibition of reduced glutathione synthesis by cyanobacterial alkaloid cylindrospermopsin in cultured rat hepatocytes. Biochem Pharmacol 49: 219-225.

Samanta, S.K., Singh, O.V., and Jain, R.K. (2002) Polycyclic aromatic hydrocarbons: environmental pollution and bioremediation. Trends in

Biotechnology 20: 243-248.

Sandler, S.J., Hugenholtz, P., Schleper, C., DeLong, E.F., and Pace, N.R. (1999) Diversity of radA genes from cultured and uncultured archaea: comparative analysis of putative RadA proteins and their use as a phylogenetic marker. J. Bacteriol. 181: 907.

Sangwan, P., Kovac, S., Davis, K.E.R., Sait, M., and Janssen, P.H. (2005)

Detection and Cultivation of Soil Verrucomicrobia. Appl. Environ. Microbiol. 71:

8402-8410.

Sano, T., and Kaya, K. (1996a) Oscillapeptin G, a tyrosinase inhibitor from toxic

Oscillatoria agardhii. J Nat Prod 59: 90-92.

Sano, T., and Kaya, K. (1996b) Oscillatorin. A chymotrypsin inhibitor from toxic Oscillatoria agardhii. Tetrahedron Lett 37: 6873-6876.

Schembri, M.A., Neilan, B.A., and Saint, C.P. (2001) Identification of genes implicated in toxin production in the cyanobacterium Cylindrospermopsis raciborskii. Environ Toxicol 16: 413-421.

Schlegel, A., Bohm, A., Lee, S.J., Peist, R., Decker, K., and Boos, W. (2002) Network regulation of the Escherichia coli maltose system. J Mol Microbiol Biotechnol 4: 301-307.

Schleper, C., Holben, W., and Klenk, H.P. (1997) Recovery of Crenarchaeotal ribosomal DNA sequences from freshwater-lake sediments. Appl. Environ.

Microbiol. 63: 321.

Schleper, C., DeLong, E., Preston, C., Feldman, R., Wu, K., and Swanson, R. (1998) Genomic analysis reveals chromosomal variation in natural populations of the uncultured psychrophilic archaeon Cenarchaeum symbiosum. J Bacteriol 180: 5003-5009.

Schmidt, T.M., DeLong, E.F., and Pace, N.R. (1991) Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing. J.

Bacteriol. 173: 4371.

Sebat, J.L., Colwell, F.S., and Crawford, R.L. (2003) Metagenomic profiling: microarray analysis of an environmental genomic library. Appl. Environ.

Microbiol. 69: 4927.

Sekar, R., Pernthaler, A., Pernthaler, J., Warnecke, F., Posch, T., and Amann, R. (2003) An improved protocol for quantification of freshwater Actinobacteria by fluorescence in situ hybridisation. Appl Environ Microbiol 69: 2928-2935.

Shizuya, H., Birren, B., Kim, U., Mancino, V., Slepak, T., Tachiiri, Y., and

Simon, M. (1992) Cloning and Stable Maintenance of 300-Kilobase-Pair

Fragments of Human DNA in Escherichia coli Using an F-Factor-Based Vector.

PNAS 89: 8794-8797.

Singh, C.D., Milligan, K.E., and Gerwick, W.H. (1999) Tanikolide, a toxic and antifungal lactone from the marine cyanobacterium Lyngbya majuscula. J Nat

Prod 62: 1333-1335.

Sinha, S.N., and Banerjee, R.D. (1997) Ecological role of thiosulfate and sulfide utilizing purple nonsulfur bacteria of a riverine ecosystem. FEMS Microbiology

Ecology 24: 211-220.

Sivonen, K., and Jones, G. (1999) Cyanobacterial toxins. In Toxic cyanobacteria in water: a guide to their public health consequences, monitoring, and management. Chorus, I., and Bartram, J. (eds). London: E&FN Spon., pp.

41-111.

Smith, C.D., Zhang, X., Mooberry, S.L., Patterson, G.M., and Moore, R.E.

(1994) Cryptophycin: a new antimicrotubule agent active against drug-resistant cells. Cancer Res 54: 3779-3784.

Song, J.S., Jeon, J.H., Lee, J.H., Jeong, S.H., Jeong, B.C., Kim, S.-J., Lee, J.-H., and Lee, S.H. (2005) Molecular Characterization of TEM-type beta-Lactamases Identified in Cold-Seep Sediments of Edison Seamount (South of Lihir Island, Papua New Guinea). J Microbiol 43: 172-178.

Stahl, D.A., Lane, D.J., Olsen, G.J., and Pace, N.R. (1984) Analysis of hydrothermal vent-associated symbionts by ribosomal RNA sequences.

Science 224: 409-411.

Stal, L. (1995) Physiological ecology of cyanobacteria in microbial mats and other communities. New Phytol 131: 1-32.

Stein, J.L., Marsh, T.L., Wu, K.Y., Shizuya, H., and DeLong, E.F. (1996) Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon. J. Bacteriol. 178: 591.

Stock, A.M., Robinson, V.L., and Goudreau, P.N. (2000) Two-component signal transduction. Annu Rev Biochem 69: 183-215.

Suyama, T., Shigematsu, T., Takaichi, S., Nodasaka, Y., Fujikawa, S., Hosoya,

H., Tokiwa, Y., Kanagawa, T., and Hanada, S. (1999) Roseateles depolymerans gen. nov., sp. nov., a new bacteriochlorophyll a-containing obligate aerobe belonging to the beta-subclass of the Proteobacteria. Int J Syst

Bacteriol 49: 449-457.

Swift, S., Williams, P., and Stewart, G.S.A.B. (1999) N-Acyl homoserine lactones and quorum sensing in proteobacteria. In Cell–Cell Signaling in

Bacteria. Dunny, G.M., and Winans, S.C. (eds). Washington DC: American

Society for Microbiology, pp. 291-313.

Tamas, I., Klasson, L., Canback, B., Naslund, A.K., and Eriksson, A.S. (2002) 50 million years of genomic stasis in endosymbiotic bacteria. Science 296: 2376.

Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice.

Nucl Acids Res 22: 4673-4680.

Tillett, D., Parker, D., and Neilan, B.A. (2000) Structural organisation of microcystin biosynthesis in Microcystis aeruginosa PCC7806: an integrated peptide-polyketide synthetase system. Chem Biol 7: 753-764.

Torsvik, V., Ovreas, L., and Thingstad, T.F. (2002) Prokaryotic Diversity: Magnitude, Dynamics, and Controlling Factors. Science 296: 1064-1066.

Tringe, S.G., von Mering, C., Kobayashi, A., Salamov, A.A., Chen, K., Chang,

H.W., Podar, M., Short, J.M., Mathur, E.J., Detter, J.C., Bork, P., Hugenholtz,

P., and Rubin, E.M. (2005) Comparative Metagenomics of Microbial

Communities. Science 308: 554-557.

Tyson, G.W., Chapman, J., Hugenholtz, P., Allen, E.E., and Ram, R.J. (2004)

Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428: 37.

Ueno, Y., Nagata, S., Tsutsumi, T., Hasegawa, A., Watanabe, M.F., Park, H., Chen, G.C., Chen, G., and Yu, S. (1996) Detection of microcystins, a blue-green algal hepatotoxin, in drinking water sampled in Haimen and Fusui, endemic areas of primary liver cancer in China, by highly sensitive immunoassay. Carcinogenesis 17: 1317-1321.

Urbach, E., Vergin, K.L., Young, L., Morse, A., and Giovannoni, S.J. (2001)

Unusual bacterioplankton community structure in ultra-oligotrophic Crater Lake.

Limnol. Oceanogr. 46: 557.

Van de Peer, Y., and De Wachter, R. (1994) TREECON for Windows: a software package for the construction and drawing of evolutionary trees for the

Microsoft Windows environment. Comput Applic Biosci 10: 569-570.

Van Liere, L., and Mur, L.R. (1979) Some experiments on the competition between a green alga and a cyanobacterium. In: University of Amsterdam.

Velasco, A., Alonso, S., Garcia, J.L., Perera, J., and Diaz, E. (1998) Genetic and functional analysis of the styrene catabolic cluster of Pseudomonas sp. strain Y2. J Bacteriol 180: 1063-1071.

Venter, J.C., Remington, K., Heidelberg, J.F., Halpern, A.L., Rusch, D., Eisen,

J.A., Wu, D., Paulsen, I., Nelson, K.E., Nelson, W., Fouts, D.E., Levy, S., Knap,

A.H., Lomas, M.W., Nealson, K., White, O., Peterson, J., Hoffman, J., Parsons,

R., Baden-Tillson, H., Pfannkoch, C., Rogers, Y-H., and Smith, H.O. (2004)

Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:

66-74.

Walsby, A.E. (1994) Gas vesicles. Microbiol Rev 58: 94-144.

Warnecke, F., Amann, R., and Pernthaler, J. (2004) Actinobacterial 16S rDNA genes from freshwater habitats cluster in four distinct lineages. Environ Microbiol 6: 242-253.

West, A.H., and Stock, A.M. (2001) Histidine kinases and response regulator proteins in two-component signaling systems. Trends Biochem Sci 26: 369-376.

Wild, J., Hradecna, Z., and Szybalski, W. (2002) Conditionally Amplifiable

BACs: Switching From Single-Copy to High-Copy Vectors and Genomic Clones.

Genome Res. 12: 1434-1444.

Woese, C.R. (1987) Bacterial evolution. Microbiol Rev 51: 221-271.

Wolfe, C.J., and Haygood, M.G. (1993) Bioluminescent symbionts of the Caribbean flashlight fish (Kryptophanaron alfredi) have a single rRNA operon. Mol Mar Biol Biotechnol 2: 189-197.

Yu, S.-Z. (1989) Drinking water and primary liver cancer. In Primary Liver

Cancer. Tang, Z.Y., Wu, M.C., and Xia, S.S. (eds). New York: China Academic

Publishers/Springer, pp. 30-37.

Zdobnov, E.M., and Apweiler, R. (2001) InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17: 847-848.

Zeidner, G., Preston, C.M., Delong, E.F., Massana, R., and Post, A.F. (2003) Molecular diversity among marine picophytoplankton as revealed by psbA analyses. Environ. Microbiol. 5: 212.

Zevenboom, W., and Mur, L.R. (1984) Growth and photosynthetic response of the cyanobacterium Microcystis aeruginosa in relation to photoperiodicity and irradiance. Arch Microbiol 139: 232-239.

Zhang, C., and Bennett, G.N. (2005) Biodegradation of xenobiotics by anaerobic bacteria. Applied Microbiology and Biotechnology 67: 600-618.

Zwart, G., Crump, B.C., Kamst-van Agterveld, M.P., Hagen, F., and Han, S.-K.

(2002) Typical freshwater bacteria: an analysis of available 16S rRNA gene sequences from plankton of lakes and rivers. Aquat. Microb. Ecol. 28: 141.

Zwart, G., Hiorns, W.D., Methe, B.A., Van Agterveld, M.P., Huisman, R., Nold, S.C., Zehr, J.P., and Laanbroek, H.J. (1998) Nearly identical 16S rDNA sequences recovered from lakes in North America and Europe indicate the existence of clades of globally distributed freshwater bacteria. Syst Appl Microbiol 21: 546-556.
