Ntougias S, Lapidus A, Han J, Mavromatis K, Pati A, Chen A, Klenk HP, Woyke T, Fasseas C, Kyrpides NC, Zervakis G. High quality draft genome sequence of sitiensis type strain (AW-6(T)), a diphenol degrader with genes involved in the catechol pathway. Standards in genomic sciences 2014, 9(3), 783-793.

Copyright:

©BioMed Central. This work is licensed under a Creative Commons Attribution 3.0 License

DOI link to article: http://dx.doi.org/10.4056/sigs.5088950

Date deposited:

20/03/2015

This work is licensed under a Creative Commons Attribution 3.0 Unported License

Newcastle University ePrints - eprint.ncl.ac.uk

Standards in Genomic Sciences (2014) 9: 783-793 DOI:10.4056/sigs.5088950

High quality draft genome sequence of Olivibacter siti- ensis type strain (AW-6T), a diphenol degrader with genes involved in the catechol pathway

Spyridon Ntougias1, Alla Lapidus2, James Han2, Konstantinos Mavromatis2, Amrita Pati2, Amy Chen3, Hans-Peter Klenk4, Tanja Woyke2, Constantinos Fasseas5, Nikos C. Kyrpides2,6, Georgios I. Zervakis7,*

1Democritus University of Thrace, Department of Environmental Engineering, Laboratory of Wastewater Management and Treatment Technologies, Xanthi, Greece 2Department of Energy Joint Genome Institute, Genome Biology Program, Walnut Creek, CA, USA 3Biological Data Management and Technology Center, Lawrence Berkeley National Labora- tory, Berkeley, California, USA 4Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany 5Agricultural University of Athens, Electron Microscopy Laboratory, Athens, Greece 6 King Abdulaziz University, Jeddah, Saudi Arabia 7Agricultural University of Athens, Laboratory of General and Agricultural Microbiology, Athens, Greece *Correspondence: Georgios I. Zervakis ([email protected]) Keywords: alkaline two-phase olive mill waste, , , hemicel- lulose degradation, β-1,4-xylanase, β-1,4-xylosidase

Olivibacter sitiensis Ntougias et al. 2007 is a member of the family Sphingobacteriaceae, phy- lum Bacteroidetes. Members of the genus Olivibacter are phylogenetically diverse and of sig- nificant interest. They occur in diverse habitats, such as rhizosphere and contaminated soils, viscous wastes, composts, biofilter clean-up facilities on contaminated sites and cave envi- ronments, and they are involved in the degradation of complex and toxic compounds. Here we describe the features of O. sitiensis AW-6T, together with the permanent-draft genome se- quence and annotation. The organism was sequenced under the Genomic Encyclopedia for and Archaea (GEBA) project at the DOE Joint Genome Institute and it is the first ge- nome sequence of a within the genus Olivibacter. The genome is 5,053,571 bp long and is comprised of 110 scaffolds with an average GC content of 44.61%. Of the 4,565 genes predicted, 4,501 were protein-coding genes and 64 were RNA genes. Most protein-coding genes (68.52%) were assigned to a putative function. The identification of 2-keto-4- pentenoate hydratase/2-oxohepta-3-ene-1,7-dioic acid hydratase-coding genes indicates in- volvement of this organism in the catechol catabolic pathway. In addition, genes encoding for β-1,4-xylanases and β-1,4-xylosidases reveal the xylanolytic action of O. sitiensis.

Introduction Toplou Monastery, Sitia, Greece [1]. The genus name derived from the Latin term oliva and the The genus Olivibacter currently contains six spe- Neo-Latin bacter, meaning a rod-shaped bacte- cies with validly published names, all of which are rium living in olives/olive processing by-products aerobic and heterotrophic, non-motile, rod- [1]. The Neo-Latin species epithet sitiensis per- shaped Gram-negative bacteria [1-3]. Strain AW- tains to the region Sitia (Crete, Greece) where the 6T (= DSM 17696T = CECT 7133T = CIP 109529T) is olive mill is operating [1]. The other species of the the type strain of Olivibacter sitiensis [1], which is genus are O. soli, O. ginsengisoli, O. terrae, O. the type species of the genus Olivibacter. The oleidegradans and O. jilunii [2-4]. O. soli and O. strain was isolated from alkaline alperujo, an olive ginsengisoli were isolated from soil of a ginseng mill sludge-like waste produced by two-phase field [2], O. terrae from a compost prepared of cow centrifugal decanters located in the vicinity of manure and rice straw [2], O. oleidegradans from a

The Genomic Standards Consortium

Olivibacter sitiensis AW-6T biofilter clean-up facility in a hydrocarbon- quently occurring genera were in order contaminated site [3] and O. jilunii from a DDT- (27.7%), (17.1%), contaminated soil [4]. O. sitiensis can be distin- Flavobacterium (8.5%), Olivibacter (6.4%), guished from O. soli, O. ginsengisoli and O. terrae Hymenobacter (6.4%), (4.3%), on the basis of temperature and NaCl concentra- Cytophaga (4.3%), Flectobacillus (4.3%), tion ranges for growth, in its ability to assimilate (2.1%), Pseudosphingobacterium N-acetyl-D-glucosamine, L-histidine, maltose and (2.1%) and ‘Hevizibacter’ (2.1%) (47 hits in total). sorbitol, and for expression of naphthol-AS-BI- The 16S rRNA gene sequence of O. sitiensis AW-6T phosphohydrolase, in the presence/absence of was the only hit on members of the species in iso-C15: 1 F, C16: 1 2-OH, anteiso-C17: 1 B and/or iso- INSDC (=EMBL/NCBI/DDBJ) under the accession C17: 1 I, and in by its DNA G+C content [1,2,4]. number DQ421387 (=NR_043805). Among all Moreover, it differs from O. soli in terms of L- other species, the two yielding the highest score arabinose assimilation and valine arylamidase ex- were Parapedobacter koreensis Jip14T pression, from O. ginsengisoli in terms of inositol, (DQ680836) [7] and Olivibacter ginsengisoli Gsoil mannitol and salicin assimilation and in oxidase 060T (AB267716) [2], showing similarity in 16S reaction test, and from O. terrae in terms of L- rRNA gene of 90.1% (both of them) and HSP arabinose and mannitol assimilation, and β- coverages of 99.8% and 99.9% respectively. It is glucuronidase and valine arylamidase expression noteworthy that the Greengenes database uses the [1,2,4]. O. sitiensis can be differentiated from O. INSDC (=EMBL/NCBI/DDBJ) annotation, which is oleidegradans on the basis of DNA G+C content, pH not an authoritative source for nomenclature or upper limit for growth, in the ability for assimila- classification. The highest-scoring environmental tion of D-adonitol, L-arabinose, N-acetyl-D- sequences was AM114441 ['Interactions U(VI) glucosamine, L-histidine, D-lyxose, maltose, added natural dependence on various incubation melezitoze, salicin and turanose, and for expres- conditions soil uranium mining waste pile clone sion of esterase, β-galactosidase, α-mannosidase, JG35+U2A-AG9'], which showed identity of 90.3% urease and valine arylamidase as well as in the with HSP coverage of 86.1%. The most frequently presence/absence of some minor fatty acid com- occurring keywords within the labels of all envi- ponents of membrane lipids, menaquinone-6 (as ronmental samples that yielded hits were 'rumen' minor respiratory quinone) and (23.1%), 'oil' (10.8%), 'water' (9.7%), 'soil' aminophospholipids (as cellular polar lipids) (9.7%), 'fluid' (9.1%) and 'gut' (9.1%) (186 hits in [1,3,4]. In addition, O. sitiensis can be distin- total). The most frequently occurring keywords guished from O. jelunii on the basis of DNA G+C within the labels of those environmental samples content, pH, temperature and NaCl concentration that yielded hits of a higher score than the highest upper limits for growth, lactose fermentation, in scoring species were 'waste' (50.0%) and 'soil' the ability for assimilation of acetate, L-arabinose, (50.0%) (4 hits in total), which are keywords with N-acetyl-D-glucosamine, L-histidine, malonate, biological meaning fitting the environment from maltose, D-mannose, salicin and L-serine, and for which O. sitiensis AW-6T was isolated. expression of α-mannosidase, oxidase and valine Figure 1 shows the phylogenetic neighborhood of arylamidase as well as in the presence/absence of O. sitiensis in the 16S rRNA gene sequence-based some minor fatty acid components of membrane trees constructed. Independently from the cluster- lipids, menaquinone-8 (as minor respiratory qui- ing method applied, all Olivibacter species togeth- none) and aminophospholipids (as cellular polar er with Pseudosphingobacterium domesticum and lipids) [1,4]. Here we present a summary classifi- ‘Sphingobacterium’ sp. 21 fell into a distinct clus- cation and a set of features for O. sitiensis AW-6T, ter, indicating the unique phylogenetic position of together with the description of the permanent- genus Olivibacter and the necessity for reconsider- draft genome sequencing and annotation. ing the taxonomic status of the genus Classification and features Pseudosphingobacterium. In addition, ‘Sphingobacterium’ sp. 21 should be assigned to The 16S rRNA gene sequence of O. sitiensis AW-6T the genus Olivibacter, and not to the genus was compared using NCBI BLAST under default Sphingobacterium. In the ML tree, members of the settings (e.g., considering only the high-scoring genus Parapedobacter branched together with O. segment pairs (HSPs) from the best 250 hits) with sitiensis, although the unique topology of the ge- the most recent release of the Greengenes data- nus was established by applying a character-based base [5] and the relative frequencies of taxa and (parsimony) method. As previously stated by keywords (reduced to their stem [6]) were deter- Ntougias et al. [1], S. antarcticum should be reas- mined and weighted by BLAST scores. The fre- signed to the genus Pedobacter. quency of genera that belonged to the family Sphingobacteriaceae was 61.8%. The most fre-

784 Standards in Genomic Sciences Ntougias et al. 0.1 76 OlivibacterOsitiensis sitiensis DSM 17696T, DQ421387 A ParapedobacterParapedo koreensis Jip14 T, DQ680836 28 98 ParapedobacterParapedsoli soli DCY14T, NR_044119 OlivibacterOginseng ginsengisoli Gsoil 060T, NR_041504 96 PseudosphingobacteriumPseudosphing domesticum DSM 18733T, AM407725 69 62 OlivibacterOsoli soli Gsoil 034T, NR_041503 58 OlivibacterOterrae terrae Jip13T, NR_041502 5058OlivibacterOoleidegr oleidegradans TBF2/20.2T, HM021726 OlivibacterOjilunii jilunii 14-2AT, HQ848949 100 ′Sphingo21Sphingobacterium´ sp. 21, CP002584** 91 SphingobacteriumSmultivorum multivorum DSM 11691T, NR_040953* 99 SphingobacteriumSdetergens detergens 6.2ST, JN01521 83 SphingobacteriumSthalpoph thalpophilum DSM 11723T, NR_042135* 97SphingobacteriumSanhuiense anhuiense CWT, NR_044477 63 SphingobacteriumSkitahirosh kitahiroshimense 10CT, NR_041636 100 SphingobacteriumSfaecium faeciumDSM 11690 T, NR_025537* 53 SphingobacteriumspirATCC33300 spiritivorum ATCC 33300T, ACHB01000006* 100 SphingobacteriumSspiritiv spiritivorum ATCC 33861T, NR_044077* 34 SphingobacteriumScomT en composti T5-12T, NR_041363 48 SphingobacteriumScompYoo composti NBRC 106383T, AB682399 SphingobacteriumSbambusae bambusae IBFC2009 T, GQ339910 80 43 SphingobacteriumSshayense shayense HS39T, FJ816788 SphingobacteriumSmizutaii mizutaii DSM 11724T, NR_042134* 93 SphingobacteriumSdaejeonen daejeonense TR6-04T, NR_041407 T 41 MucilaginibacterMoryzae oryzae DSM 19975 , NR_044404 MucilaginibacterMgossypii gossypii Gh-67T, EU672804 31 MucilaginibacterMgossypiico gossypiicola Gh-48 T, EU672805 40 50 MucilaginibacterMkameinon kameinonensis SCKT, NR_041615 79 MucilaginibacterMrigui rigui NBRC 101115T, AB681382 18 Mdorajii T 79 Mucilaginibacter dorajii DR-f4 , GU139697 MucilaginibacterMcomposti composti TR6-03T, AB267719 11 Mximonensis T 86 Mucilaginibacter ximonensis XM-003 , NR_044565 MucilaginibacterMucilagi paludis DSM 18603T, AM490402* 54 91 MucilaginibacterMgracilis gracilis DSM 18602T, NR_042598 MucilaginibacterMdaejeon daejeonensis Jip 10T, NR_041505 99 56 SolitaleaScanadensiscanadensisDSM 3403 T, CP003349** 100 SolitaleaSkoreensiskoreensis DSM 21342T, EU787448 PedobacterPcomposti composti TR6-06T, NR_041506 46 PedobacterPoryzae oryzae DSM 19973T, EU109726* 70 PedobacterPglucosidil glucosidilyticus DSM 23534 T, EU585748* T 84 PedobacterPsaltans saltans DSM 12145 , CP002545** PedobacterParcticus arcticus NRRL B-59457T, HM051286* 20 59 PedobacterPdaechung daechungensis Dae 13T, NR_041507 76 PedobacterPlentus lentus DS-40 T, NR_044218 63 PerricolaPedobacter terricola DS-45T, NR_044219 PedobacterPaquatilis aquatilis AR107 T, NR_042446 PedobacterPborealis borealis DSM 19626 T, NR_044381 65 T 62 47 PedobacterPagri agri DSM 19486 , NR_044339* PsandarakPedobacter sandarakinus DS-27 T, NR_043665 5957 PterraePedobacter terrae DSM 17933 T, NR_044005 45 88PedobacterPsuwonensissuwonensis DSM 18130T, NR_043543 65 PedobacterProseus roseus CL-GP80 T, NR_043555 PedobacterPalluvionis alluvionis DSM 19624T, NR_044382 PduraquaePedobacter duraquae DSM 19034 T, NR_042601 36 100 SantarcticSphingobacterium antarcticumDSM 15311T, FR733711 PpisciumPedobacter piscium DSM 11725T, NR_025536 63PedobacterPhartonius hartonius DSM 19033 T, NR_042604 2095 PedobacterPhimalay himalayensis DSM 16196T, NR_042204 T 80PcryoconitisPedobacter cryoconitis DSM 14825 , NR_025534 77 T 46 PedobacterPboryungen boryungensis BR-9 , HM640986 PedobacterPinsulae insulaeDSM 18684 T, NR_044083 88 T 15 75 PedobacterPkoreens koreensis WPCB189 , NR_043538 PcaeniPedobacter caeni DSM 16990 T, NR_042323 T 24 90PsteyniiPedobacter steynii DSM 19110 , NR_042605 T 18 PedobacterPmetabolip metabolipauper DSM 19035 , NR_042603 PafricanusPedobacter africanus DSM 12126T, NR_028977 30 T 21 PginsengisPedobacter ginsengisoli Gsoil 104 , NR_041374 PedobacterPpanaciter panaciterrae Gsoil 042T, NR_041371 T 66 PedobacterPheparinus heparinus DSM 2366 , CP001681** PedobacterPnyackens nyackensis DSM 19625 T, NR_044380 34 ArcticibacterArcticibact svalbardensis MN12-7T, JQ396621* ´ScomitansCandidatuscomitans´, X91814 21 zeaxanthinifaciens TDMA-5T, AB264126

http://standardsingenomics.org 785 Olivibacter sitiensis AW-6T

OlivibacterOsitiensis sitiensis DSM 17696T, DQ421387 53 100 PseudosphingobacteriumPseudosphing domesticum DSM 18733T, AM407725 OlivibacterOsoli soli Gsoil 034T, NR_041503 100 ´Sphingo21Sphingobacterium´ sp. 21, CP002584** 63 Olivibacter jilunii 14-2AT, HQ848949 98 Ojilunii OlivibacterOoleidegr oleidegradans TBF2/20.2T, HM021726 B 27 OlivibacterOginseng ginsengisoli Gsoil 060T, NR_041504 51 OlivibacterOterrae terrae Jip13T, NR_041502 ParapedobacterParapedsoli soli DCY14T, NR_044119 98 ParapedobacterParapedo koreensis Jip14T, DQ680836 92 SphingobacteriumSanhuiense anhuiense CWT, NR_044477 100 SphingobacteriumSkitahirosh kitahiroshimense 10CT, NR_041636 T 45 SphingobacteriumSfaecium faecium DSM 11690 , NR_025537* 99 SphingobacteriumSdetergens detergens 6.2ST, JN01521 SphingobacteriumSmultivorum multivorum DSM 11691T, NR_040953* 90 SphingobacteriumSthalpoph thalpophilum DSM 11723T, NR_042135* 51 SphingobacteriumSbambusae bambusae IBFC2009T, GQ339910 45 SphingobacteriumScompYoo composti NBRC 106383T, AB682399 85 SphingobacteriumSdaejeonen daejeonense TR6-04T, NR_041407 46 T 45 95 SphingobacteriumSmizutaii mizutaii DSM 11724 , NR_042134* SphingobacteriumSshayense shayense HS39T, FJ816788 SphingobacteriumScomTen composti T5-12T, NR_041363 37 SphingobacteriumSspiritiv spiritivorum ATCC 33861T, NR_044077* 100 SphingobacteriumspirATCC33300 spiritivorum ATCC 33300T, ACHB01000006* MucilaginibacterMcomposti composti TR6-03T, AB267719 78 MucilaginibacterMdorajii dorajii DR-f4T, GU139697 17 MucilaginibacterMrigui rigui NBRC 101115T, AB681382 86 MucilaginibacterM gracilis gracilis DSM 18602T, NR_042598 T 14 93 MucilaginibacterM ucilagi paludis DSM 18603 , AM490402* MucilaginibacterMgossypii gossypii Gh-67T, EU672804 34 T 79 MucilaginibacterM gossypiico gossypiicola Gh-48 , EU672805 Mucilaginibacter kameinonensis SCK, NR_041615 12 48 Mkameinon MucilaginibacterMoryzae oryzae DSM 19975T, NR_044404 MucilaginibacterM ximonensis ximonensis XM-003T, NR_044565 77 MucilaginibacterM daejeon daejeonensis Jip 10T, NR_041505 100 SolitaleaSkoreensiskoreensis DSM 21342T, EU787448 SolitaleaScanadensiscanadensis DSM 3403T, CP003349** PedobacterPafricanus africanus DSM 12126T, NR_028977 78 PedobacterPboryungenboryungensis BR-9T, HM640986 4 PedobacterPinsulae insulae DSM 18684T, NR_044083 5 T 33 68 PedobacterPkoreens koreensis WPCB189 , NR_043538 5 PedobacterPcaeni caeni DSM 16990T, NR_042323 8 86 PedobacterPsteynii steynii DSM 19110T, NR_042605 PedobacterPduraquae duraquae DSM 19034T, NR_042601 PedobacterPmetabolip metabolipauper DSM 19035T, NR_042603 T 80 PedobacterPcryoconitiscryoconitis DSM 14825 , NR_025534 PedobacterPhartonius hartonius DSM 19033T, NR_042604 72 82 T 48 PedobacterPhimalay himalayensis DSM 16196 , NR_042204 40 PedobacterPpiscium piscium DSM 11725T, NR_025536 T 4 100 SphingobacteriumSantarctic antarcticum DSM 15311 , FR733711 T 4 PedobacterPnyackens nyackensis DSM 19625 , NR_044380 PedobacterPheparinusheparinus DSM 2366T, CP001681** 8 67 PedobacterPpanaciter panaciterrae Gsoil 042T, NR_041371 6 PedobacterPginsengis ginsengisoli Gsoil 104T, NR_041374 PedobacterPalluvionis alluvionis DSM 19624T, NR_044382 PedobacterPborealis borealis DSM 19626T, NR_044381 73 T 58 30 PedobacterPsandarak sandarakinus DS-27 , NR_043665 PedobacterPsuwonensissuwonensis DSM 18130T, NR_043543 16 93 PedobacterPterrae terrae DSM 17933T, NR_044005 57 T 60 46 PedobacterPagri agri DSM 19486 , NR_044339* PedobacterProseus roseus CL-GP80T, NR_043555 PedobacterPaquatilis aquatilis AR107T, NR_042446 PedobacterPdaechung daechungensis Dae 13T, NR_041507 5 69 38 PedobacterPerricola terricola DS-45T, NR_044219 PedobacterParcticus arcticus NRRL B-59457T, HM051286* T 37 17 PedobacterPlentus lentus DS-40 , NR_044218 PedobacterPglucosidil glucosidilyticus DSM 23534T, EU585748* ´CandidatusScomitans comitans´, X91814 2 NubsellaNubsella zeaxanthinifaciens TDMA-5T, AB264126 11 T 8 ArcticibacterArcticibact svalbardensis MN12-7 , JQ396621* PedobacterPcomposti composti TR6-06T, NR_041506 T 43 PedobacterPoryzae oryzae DSM 19973 , EU109726* PedobacterPsaltans saltans DSM 12145T, CP002545**

Figure 1. Phylogenetic trees highlighting the position of O. sitiensis relative to the type strains of the species within the family Sphingobacteriaceae. The tree was inferred from 1,288 aligned characters [8,9] of the 16S rRNA gene sequence under (A) [previous page] the maximum likelihood (ML) [10] and (B) [this page] the maximum-parsimony criterion. In ML tree, the branches are scaled in terms of the expected number of substi- tutions per site. Numbers adjacent to the branches are support values from 100 ML bootstrap replicates (A) and from 1,000 maximum-parsimony bootstrap replicates (B) [11]. Lineages with strain genome sequencing projects registered in GOLD [12] are labeled with one asterisk, while those listed as 'Complete and Published' with two asterisks (e.g. Pedobacter heparinus [13] and P. saltans [14]).

786 Standards in Genomic Sciences Ntougias et al. Cells of O. sitiensis AW-6T are Gram-negative non- protocatechuate and D(+)-xylose, while L- motile rods [1] with a length of 1.0- cysteine, D(-)-fructose, D(+)-galactose, L-histidine, width of 0.2- lactose, sorbitol and sucrose are also utilized by temperature range for growth is 5-45°C,1.3 μm with and an a strain AW-6T [1]. O. sitiensis was found to be sensi- optimum at0.3 28 μm–32°C (Table [1]. 1 andO. Figuresitiensis 2). Theis tive to ampicillin, bacitracin, chloramphenicol, neutrophilic, showing no growth at 30 g L-1 NaCl penicillin, rifampicin, tetracycline and trime- [1]. The pH for growth ranges between 5 and 8, thoprim, and resistant to kanamycin, polymixin B with pH values of 6-7 being the optimum [1]. O. and streptomycin (antibiotics’ concentration of 50 sitiensis is strictly aerobic and chemo- -1) [1]. organotrophic; it assimilates mostly D(+)-glucose, μg ml

Figure 2. Electron micrograph of O. sitiensis AW-6T negatively-stained cells. Bar represents 1 μm. http://standardsingenomics.org 787 Olivibacter sitiensis AW-6T Table 1. Classification and general features of O. sitiensis AW-6T, according to the MIGS recommendations [15]. MIGS ID Property Term Evidence codea Domain Bacteria TAS [16] Phylum Bacteroidetes TAS [17,18] Class TAS [18,19] Current classification Order TAS [18,20] Family Sphingobacteriaceae TAS [21] Genus Olivibacter TAS [1] Species Olivibacter sitiensis TAS [1] Type-strain AW-6T TAS [1] Gram stain negative TAS [1] Cell shape rod TAS [1] Motility non-motile TAS [1] Sporulation non-sporulating TAS [1] Temperature range mesophile, 5-45°C TAS [1] Optimum temperature 28-32°C TAS [1] Salinity neutrophilic and non-halotolerant -no growth at TAS [1] 30 g l-1 NaCl MIGS-22 Oxygen requirement strictly aerobic TAS [1] Carbon source carbohydrates and amino-acids, utilization of TAS [1] protocatechuate and sorbitol Energy metabolism chemo-organotroph TAS [1] MIGS-6 Habitat olive mill waste TAS [1] MIGS-15 Biotic relationship free living TAS [1] MIGS-14 Pathogenicity none NAS Biosafety level 1 TAS [22] MIGS- Isolation alkaline two-phase olive mill waste (alkaline TAS [1] 23.1 alperujo) MIGS-4 Geographic location Toplou Monastery, Sitia, Crete, Greece TAS [1] MIGS-5 Sample collection time year 2003 NAS MIGS-4.1 Latitude 35.220 TAS [1] MIGS-4.2 Longitude 26.216 TAS [1] MIGS-4.3 Depth surface NAS MIGS-4.4 Altitude 161 m NAS

aEvidence codes - TAS: Traceable Author Statement (i.e. a direct report exists in the literature); NAS: Non- traceable Author Statement (i.e. not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [23]. If the evidence code is IDA, then the property was directly observed for a living isolate by one of the authors or an expert mentioned in the acknowledgements. C16:0 [1]. The only respiratory quinone found in O. Chemotaxonomy sitiensis is menaquinone with seven isoprene sub- The major polar lipids of O. sitiensis are units (MK-7) [1]. phosphatidylethanolamine (PE), phosphatidylmonomethylethanolamine (PME), Genome sequencing and annotation phosphatidylinositol mannoside (PIM), an un- known phospholipid (PL) and an unknown non- Genome project history phosphorylated lipid (UL) [4]. Moreover, the main This microorganism was selected for sequencing membrane fatty acids of O. sitiensis are C16: 1 on the basis of its phylogenetic position [24,25], and/or iso-C15:0 2-OH, iso-C15:0, iso-C17:0 3-OH and and is part of the Genomic Encyclopedia of Type ω7c 788 Standards in Genomic Sciences Ntougias et al. Strains, Phase I: the one thousand microbial ge- quencing, finishing and annotation were per- nomes (KMG) project [26] which aims in increas- formed by the DOE Joint Genome Institute (JGI) ing the sequencing coverage of key reference mi- using state of the art sequencing technology [27]. crobial genomes. The genome project is deposited A summary of the project information is presented in the Genomes On Line Database [12] and the ge- in Table 2. nome sequence is available from GenBank. Se-

Table 2. Genome sequencing project information. MIGS ID Property Term MIGS-31 Finishing quality High-Quality Draft MIGS-29 Sequencing platforms Illumina MIGS-31.2 Sequencing coverage 120× MIGS-30 Assemblers ALLPATHS v. r41043 MIGS-32 Gene calling method Prodigal 2.5 Genbank ID ATZA00000000 Genbank Date of Release September 5, 2013 GOLD ID Gi11724 NCBI project ID 165253 Database: IMG 2515154027 MIGS-13 Source material identifier DSM 17696T Project relevance GEBA-KMG, Tree of Life, Biodegradation Growth conditions and DNA isolation Velvet contigs using wgsim [32] (iii) Illumina O. sitiensis strain AW-6T was grown aerobically in reads were assembled with simulated read pairs DSMZ medium 92 (trypticase soy yeast extract using Allpaths–LG (version r41043) [33]. The final medium) [28] at 28°C. DNA was isolated from 0.5- draft assembly contained 110 contigs in 110 scaf- 1 g of cell paste using Jetflex Genomic DNA purifi- folds. The total size of the genome is 5.1 Mbp and cation kit (Genomed_600100) following the the final assembly is based on 605.8 Mbp of standard protocol as recommended by the manu- Illumina data, which provides an average 120.0× facturer but applying a modified cell lysis proce- coverage of the genome. dure (1 hour incubation at 58°C with additional 50 µl proteinase K followed by overnight incubation Genome annotation on ice with additional 200 µl PPT-buffer). DNA is Genes were identified using Prodigal [34] as part available via the DNA Bank Network [29]. of the DOE-JGI Annotation pipeline [35]. The pre- dicted CDSs were translated and used to search Genome sequencing and assembly the National Center for Biotechnology Information The draft genome of Olivibacter sitiensis DSM (NCBI) non-redundant database, UniProt, 17696 was generated at the DOE Joint genome In- TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro stitute (JGI) using the Illumina technology. An databases. Additional gene prediction analysis and Illumina Standard shotgun library was construct- functional annotation was performed within the ed and sequenced using the Illumina HiSeq 2000 Integrated Microbial Genomes (IMG-ER) [36]. platform, which generated 13,155,872 reads total- ing 1,973.4 Mbp. All general aspects of library Genome properties construction and sequencing performed at the JGI The genome is 5,053,571 bp long and comprises can be found at the JGI website [30]. All raw 110 scaffolds with an average GC content of Illumina sequence data were passed through DUK, 44.61% (Table 3). Of the 4,565 genes predicted, a filtering program developed at JGI, which re- 4,501 were protein-coding genes and 64 RNA moves known Illumina sequencing and library genes. Most protein-coding genes (68.52%) were preparation artifacts (Mingkun L, unpublished). assigned to a putative function, while the remain- The following steps were then performed for as- ing ones were annotated as hypothetical proteins. sembly: (i) filtered Illumina reads were assembled The distribution of genes into COGs functional cat- using Velvet (version 1.1.04) [31], (ii) 1–3 Kbp egories is presented in Table 4. simulated paired end reads were created from http://standardsingenomics.org 789 Olivibacter sitiensis AW-6T Table 3. Genome statistics. Attribute Value % of totala Genome size (bp) 5,053,571 100.00% DNA coding region (bp) 4,534,282 89.72% DNA G+C content (bp) 2,254,441 44.61% DNA scaffolds 110 Total genes 4,565 RNA genes 64 1.40% tRNA genes 47 1.03% Protein-coding genes 4,501 98.60% Genes with function prediction (pro- teins) 3,128 68.52% Genes in paralog clusters 1,777 38.93% Genes assigned to COGs 3,062 67.08% Genes assigned Pfam domains 3,471 76.04% Genes with signal peptides 501 10.97% Genes with transmembrane helices 1,124 24.62% CRISPR repeats 0 aThe total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome Table 4. Number of genes associated with the 25 general COG functional categories Code Value %agea Description J 159 4.7 Translation, ribosomal structure and biogenesis A 1 0.0 RNA processing and modification K 283 8.4 Transcription L 190 5.7 Replication, recombination and repair B 1 0.0 Chromatin structure and dynamics D 22 0.6 Cell cycle control, cell division, chromosome partitioning Y 0 0.0 Nuclear structure V 99 2.9 Defense mechanisms T 197 5.9 Signal transduction mechanisms M 274 8.1 Cell wall/membrane biogenesis N 7 0.2 Cell motility Z 0 0.0 Cytoskeleton W 0 0.0 Extracellular structures U 63 1.9 Intracellular trafficking and secretion, and vesicular transport O 120 3.6 Posttranslational modification, protein turnover, chaperones C 168 5.0 Energy production and conversion G 259 7.7 Carbohydrate transport and metabolism E 211 6.3 Amino acid transport and metabolism F 61 1.8 Nucleotide transport and metabolism H 148 4.4 Coenzyme transport and metabolism I 107 3.2 Lipid transport and metabolism P 238 7.1 Inorganic ion transport and metabolism Q 52 1.5 Secondary metabolites biosynthesis, transport and catabolism R 419 12.5 General function prediction only S 280 8.3 Function unknown - 1,503 32.9 Not in COGs

790 Standards in Genomic Sciences Ntougias et al. aThe total is based on the total number of protein coding genes in the annotated genome. Based on genomic analysis of the metabolic fea- the US Department of Energy Office of Science, Bi- tures, O. sitiensis is an auxotroph for L-alanine, L- ological and Environmental Research Program, arginine, L-histidine, L-isoleucine, L-leucine, L- and by the University of California, Lawrence lysine, L-phenylalanine, L-proline, L-serine, L- Berkeley National Laboratory under contract No. tyrosine, L-tryptophan and L-valine, and a proto- DE-AC02-05CH11231, Lawrence Livermore Na- troph for L-aspartate, L-glutamate and glycine. tional Laboratory under Contract No. DE-AC52- Selenocysteine and biotin cannot be synthesized 07NA27344, as well as German Research Founda- by O. sitiensis. Strain AW-6T can utilize L-arabinose tion (DFG) INST 599/1-2. and maltose (via orthophosphate activation), whereas no maltose hydrolysis is achieved [1]. References Genome analysis revealed the genetic and molecu- 1. Ntougias S, Fasseas C, Zervakis GI. Olivibacter lar bases of the degradation of recalcitrant com- sitiensis gen. nov., sp. nov., isolated from alkaline pounds by O. sitiensis. The ability of O. sitiensis to olive-oil mill wastes in the region of Sitia, Crete. degrade phenolic compounds is verified by the Int J Syst Evol Microbiol 2007; 57:398-404. distribution of genes encoding oxidoreductases PubMed http://dx.doi.org/10.1099/ijs.0.64561-0 that act on diphenols and related substances and 2. Wang L, Ten LN, Lee HG, Im WT, Lee ST. by the 2-keto-4-pentenoate hydratase/2- Olivibacter soli sp. nov., Olivibacter ginsengisoli oxohepta-3-ene-1,7-dioic acid hydratase-coding sp. nov. and Olivibacter terrae sp. nov., from soil of a ginseng field and compost in South Korea. Int genes that are involved in the catechol pathway. J Syst Evol Microbiol 2008; 58:1123-1127. Genes encoding β-1,4-xylanases and β-1,4- PubMed http://dx.doi.org/10.1099/ijs.0.65299-0 xylosidases were also identified in the genome of 3. Szabó I, Szoboszlay S, Kriszt B, Háhn J, Harkai P, T strain AW-6 , indicating that O. sitiensis is a Baka E, Táncsics A, Kaszab E, Privler Z, Kukolya J. xylanolytic bacterium involved in the cleavage of Olivibacter oleidegradans sp. nov., a hydrocarbon β-1,4-xylosic bonds in hemicelluloses. The exist- degrading bacterium isolated from a biofilter ence of protocatechuate 3,4-dioxygenase cleanup facility on a hydrocarbon-contaminated (dioxygenase_C)-coding genes are indicative of the site. Int J Syst Evol Microbiol 2011; 61:2861- ability of this bacterium to degrade benzoate and 2865. PubMed 2,4-dichlorobenzoate. Genes encoding http://dx.doi.org/10.1099/ijs.0.026641-0 carboxymethylenebutenolidase were distributed 4. Chen K, Tang SK, Wang GL, Nie GX, Li QF, in the genome of O. sitiensis, indicating its poten- Zhang JD, Li WJ, Li SP. Olivibacter jilunii sp. nov., tial for hexachlorocyclohexane and 1,4- isolated from a DDT-contaminated soil. Int J Syst dichlorobenzene degradation. Oxidoreductases re- Evol Microbiol 2013; 63:1083-1088. PubMed lated to aryl-alcohol dehydrogenases were pre- http://dx.doi.org/10.1099/ijs.0.042416-0 dicted, showing that O. sitiensis may be also in- 5. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, volved in biphenyl and toluene/xylene degrada- Andersen GL. Greengenes, a chimera-checked tion. This is also strengthened by the identification 16S rRNA gene database and workbench of an uncharacterized protein, possibly involved in compatible with ARB. Appl Environ Microbiol aromatic compounds catabolism. Moreover, puta- 2006; 72:5069-5072. PubMed tive multicopper oxidases with possible laccase- http://dx.doi.org/10.1128/AEM.03006-05 like activity were identified. Mercuric reductase- 6. Porter MF. An algorithm for suffix stripping. and arsenate reductase-coding genes as well as Program: electronic library and information organic solvent tolerance and chromate transport systems 1980; 14:130-137. proteins encoded in the genome indicate possible 7. Kim MK, Na JR, Cho DH, Soung NK, Yang DC. resistance of O. sitiensis to the presence of heavy Parapedobacter koreensis gen. nov., sp. nov. Int J metals and organic solvents. Syst Evol Microbiol 2007; 57:1336-1341. PubMed http://dx.doi.org/10.1099/ijs.0.64677-0 Acknowledgments 8. Lee C, Grasso C, Sharlow MF. Multiple sequence We would like to gratefully acknowledge the help alignment using partial order graphs. of Brian J. Tindall and his team for growing O. Bioinformatics 2002; 18:452-464. PubMed sitiensis cultures, and Evelyne-Marie Brambilla for http://dx.doi.org/10.1093/bioinformatics/18.3.452 DNA extraction and quality control (all at DSMZ). 9. Castresana J. Selection of conserved blocks from This work was performed under the auspices of multiple alignments for their use in phylogenetic http://standardsingenomics.org 791 Olivibacter sitiensis AW-6T analysis. Mol Biol Evol 2000; 17:540-552. (eds), Bergey's Manual of Systematic PubMed Bacteriology, Second Edition, Volume 4, http://dx.doi.org/10.1093/oxfordjournals.molbev.a Springer-Verlag, New York, 2011, p. 330. 026334 20. Kampfer P. Order I. Sphingobacteriales ord. nov. 10. Stamatakis A, Hoover P, Rougemont J. A rapid In: Krieg NR, Staley JT, Brown DR, Hedlund BP, bootstrap algorithm for the RAxML web servers. Paster BJ, Ward NL, Ludwig W, Whitman WB Syst Biol 2008; 57:758-771. PubMed (eds), Bergey's Manual of Systematic http://dx.doi.org/10.1080/10635150802429642 Bacteriology, Second Edition, Volume 4, 11. Gouy M, Guindon S, Gascuel O. SeaView Springer-Verlag, New York, 2011, p. 330. Version 4: A multiplatform graphical user 21. Steyn PL, Segers P, Vancanneyt M, Sandra P, interface for sequence alignment and Kersters K, Joubert JJ. Classification of phylogenetic tree building. Mol Biol Evol 2010; heparinolytic bacteria into a new genus, 27:221-224. PubMed Pedobacter, comprising four species: Pedobacter http://dx.doi.org/10.1093/molbev/msp259 heparinus comb. nov., Pedobacter piscium comb. 12. Pagani I, Liolios K, Jansson J, Chen IMA, nov., Pedobacter africanus sp. nov. and Smirnova T, Nosrat B, Markowitz VM, Kyrpides Pedobacter saltans sp. nov. proposal of the family NC. The Genomes OnLine Database (GOLD) v.4: Sphingobacteriaceae fam. nov. Int J Syst Bacteriol Status of genomic and metagenomic projects and 1998; 48:165-177. PubMed their associated metadata. Nucleic Acids Res http://dx.doi.org/10.1099/00207713-48-1-165 2012; 40:D571-D579. PubMed 22. Bundesanstalt für Arbeitsschutz und http://dx.doi.org/10.1093/nar/gkr1100 Arbeitsmedizin (BAuA), Classification of 13. Han C, Spring S, Lapidus A, Glavina Del Rio T, prokaryotes (bacteria and archaea) into risk Tice H, Copeland A, Cheng JF, Lucas S, Chen F, groups. Technical Rule for Biological Agents 466 Nolan M, et al. Complete genome sequence of (TRBA 466), Germany, 2010, p. 157. Pedobacter heparinus type strain (HIM 762-3T). 23. Ashburner M, Ball CA, Blake JA, Botstein D, Stand Genomic Sci 2009; 1:54-62. PubMed Butler H, Cherry JM, Davis AP, Dolinski K, http://dx.doi.org/10.4056/sigs.22138 Dwight SS, Eppig JT, et al. Gene ontology: tool for 14. Liolios K, Sikorski J, Lu M, Nolan M, Lapidus A, the unification of biology. The Gene Ontology Lucas S, Hammon N, Deshpande S, Cheng JF, Consortium. Nat Genet 2000; 25:25-29. PubMed Tapia R, et al. Complete genome sequence of the http://dx.doi.org/10.1038/75556 gliding, heparinolytic Pedobacter saltans type 24. Klenk HP, Göker M. En route to a genome-based strain (113T). Stand Genomic Sci 2011; 5:30-40. classification of Archaea and Bacteria? Syst Appl PubMed http://dx.doi.org/10.4056/sigs.2154937 Microbiol 2010; 33:175-182. PubMed 15. Field D, Garrity G, Gray T, Morrison N, Selengut http://dx.doi.org/10.1016/j.syapm.2010.03.003 J, Sterk P, Tatusova T, Thomson N, Allen MJ, 25. Wu D, Hugenholtz P, Mavromatis K, Pukall R, Angiuoli SV, et al. The minimum information Dalin E, Ivanova NN, Kunin V, Goodwin L, Wu about a genome sequence (MIGS) specification. M, Tindall BJ, et al. A phylogeny-driven Genomic Nat Biotechnol 2008; 26:541-547. PubMed Encyclopaedia of Bacteria and Archaea. Nature http://dx.doi.org/10.1038/nbt1360 2009; 462:1056-1060. PubMed 16. Woese CR, Kandler O, Wheelis ML. Towards a http://dx.doi.org/10.1038/nature08656 natural system of organisms: proposal for the 26. Kyrpides NC, Woyke T, Eisen JA, Garrity G, domains Archaea, Bacteria, and Eucarya. Proc Lilburn TG, Beck BJ, Whitman WB, Hugenholz P, Natl Acad Sci USA 1990; 87:4576-4579. PubMed Klenk HP. Genomic Encyclopedia of Type Strains, http://dx.doi.org/10.1073/pnas.87.12.4576 Phase I: the one thousand microbial genomes 17. Krieg NR, Ludwig W, Euzeby J, Whitman WB. (KMG-I) project. Stand Genomic Sci 2013; 9:628- Phlum XIV. Bacteroidetes phyl. nov. In: Krieg NR, 634; http://dx.doi.org/10.4056/sigs.5068949. Staley JT, Brown DR, Hedlund BP, Paster BJ, 27. Mavromatis K, Land ML, Brettin TS, Quest DJ, Ward NL, Ludwig W, Whitman WB (eds), Copeland A, Clum A, Goodwin L, Woyke T, Bergey's Manual of Systematic Bacteriology, Lapidus A, Klenk HP, et al. The fast changing Second Edition, Volume 4, Springer-Verlag, New landscape of sequencing technologies and their York, 2011, p. 25. impact on microbial genome assemblies and 18. Editor L. Validation List No. 143. Int J Syst Evol annotation. PLoS ONE 2012; 7:e48837. PubMed Microbiol 2012; 62:1-4. http://dx.doi.org/10.1371/journal.pone.0048837 19. Kampfer P. Class III. Sphingobacteriia class. nov. 28. List of growth media used at DSMZ. In: Krieg NR, Staley JT, Brown DR, Hedlund BP, http://www.dsmz.de/catalogues/catalogue- Paster BJ, Ward NL, Ludwig W, Whitman WB

792 Standards in Genomic Sciences Ntougias et al. microorganisms/culture-technology/list-of-media- 108:1513-1518. PubMed for-microorganisms.html. http://dx.doi.org/10.1073/pnas.1017351108 29. Gemeinholzer B, Dröge G, Zetzsche H, 34. Hyatt D, Chen GL, Locascio PF, Land ML, Haszprunar G, Klenk HP, Güntsch A, Berendsohn Larimer FW, Hauser LJ. Prodigal: Prokaryotic WG, Wägele JW. The DNA Bank Network: the gene recognition and translation initiation site start from a German initiative. Biopreserv Biobank identification. BMC Bioinformatics 2010; 11:119. 2011; 9:51-55. PubMed http://dx.doi.org/10.1186/1471-2105-11- http://dx.doi.org/10.1089/bio.2010.0029 119 30. DOE Joint Genome Institute. 35. Mavromatis K, Ivanova NN, Chen IM, Szeto E, http://www.jgi.doe.gov Markowitz VM, Kyrpides NC. The DOE-JGI 31. Zerbino DR, Birney E. Velvet: algorithms for de Standard operating procedure for the annotations novo short read assembly using de Bruijn graphs. of microbial genomes. Stand Genomic Sci 2009; Genome Res 2008; 18:821-829. PubMed 1:63-67. PubMed http://dx.doi.org/10.1101/gr.074492.107 http://dx.doi.org/10.4056/sigs.632 32. wgsim. https://github.com/lh3/wgsim 36. Markowitz VM, Mavromatis K, Ivanova NN, Chen 33. Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, IM, Chu K, Kyrpides NC. IMG ER: a system for Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, microbial genome annotation expert review and Sykes S, et al. High–quality draft assemblies of curation. Bioinformatics 2009; 25:2271-2278. mammalian genomes from massively parallel PubMed sequence data. Proc Natl Acad Sci USA 2011; http://dx.doi.org/10.1093/bioinformatics/btp393

http://standardsingenomics.org 793