MARGEN-00353; No of Pages 10 Marine xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Marine Genomics

journal homepage: www.elsevier.com/locate/margen

Marine metagenomics as a source for bioprospecting

Rimantas Kodzius a,c,d,⁎,TakashiGojoboria,b,d a Computational Bioscience Research Center (CBRC), Saudi Arabia b Biological and Environmental Sciences and Engineering Division (BESE), Saudi Arabia c Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Saudi Arabia d King Abdullah University of Science and Technology (KAUST), Saudi Arabia article info abstract

Article history: This review summarizes usage of -editing technologies for metagenomic studies; these studies are used Received 31 March 2015 to retrieve and modify valuable microorganisms for production, particularly in marine metagenomics. Organisms Received in revised form 15 June 2015 may be cultivable or uncultivable. Metagenomics is providing especially valuable information for uncultivable Accepted 1 July 2015 samples. The novel , pathways and can be deducted. Therefore, metagenomics, particularly Available online xxxx genome engineering and system biology, allows for the enhancement of biological and chemical producers and the creation of novel bioresources. With natural resources rapidly depleting, genomics may be an effective Keywords: fi fi Microorganism way to ef ciently produce quantities of known and novel foods, livestock feed, fuels, pharmaceuticals and ne Producer or bulk chemicals. Bioprospecting © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license Metabolic engineering (http://creativecommons.org/licenses/by/4.0/).

Contents

1. Introduction...... 0 2. Metagenomicsasatooltostudytheenvironment...... 0 2.1. Diversityofmicroorganisms...... 0 3. Bioprospecting—theprocessofdiscoveryandcommercializationofnewproductsbasedonbiologicalresources...... 0 3.1. Sequence-basedscreening...... 0 3.2. Assessmentofmetabolicpotentialbyfunction-basedscreening...... 0 4. Metabolicengineeringasatoolforcellfactories...... 0 4.1. Establishedproductionstrains...... 0 4.1.1. E. coli asaproducer...... 0 4.1.2. Algaeasaproducer...... 0 4.1.3. S. cerevisiae asaproducer...... 0 4.2. Toolstoimprovethefunctionofstrains...... 0 4.2.1. Geneticengineeringusingmolecularbiologytools...... 0 4.2.2. Neworganismscreatedbysyntheticbiology...... 0 5. Conclusion/outlook...... 0 References...... 0

1. Introduction

Microbes are ubiquitous and an essential part of all life on Earth. They may be considered a major bioresource, acting as powerful chem- ical factories that transform environmental chemicals (carbon, hydro- ⁎ Corresponding author at: King Abdullah University of Science and Technology, gen, nitrogen, oxygen, phosphorus and sulfur) into more biologically Thuwal 23955-6900, Saudi Arabia. E-mail addresses: [email protected] (R. Kodzius), accessible molecules, which are then used by higher organisms [email protected] (T. Gojobori). (Adkins et al., 2012). All higher life forms, including plants and animals,

http://dx.doi.org/10.1016/j.margen.2015.07.001 1874-7787/© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Please cite this article as: Kodzius, R., Gojobori, T., Marine metagenomics as a source for bioprospecting, Mar. Genomics (2015), http://dx.doi.org/ 10.1016/j.margen.2015.07.001 2 R. Kodzius, T. Gojobori / Marine Genomics xxx (2015) xxx–xxx host over orders of magnitude from millions to trillions of microbes. Some to a mosaic of multiple populations (Sharon and Banfield, 2013; Culley make necessary nutrients and vitamins or help to digest and convert food. et al., 2006). By simplifying the complex environmental population Microbes also benefit the environment by removing contaminants and “composite genomes,” can be obtained and sequence variation can be pollutants from soil, groundwater, sediment and surface water. However, studied (Allen and Banfield, 2005)orgenomecontentscanbeinferred our understanding of these complex communities remains limited. (Tyson et al., 2004; Martin et al., 2006). Perhaps not surprisingly, Marine environments comprise over 70% of the earth's surface, rang- less complex communities were obtained from samples of extreme en- ing from habitats in the freezing Arctic and Antarctic to the warm vironments such as acid mines (high acidity) and thermal geysers (high waters of the tropics. Microorganisms thrive throughout oceans, temperature) (Zarraonaindia et al., 2013). Communities can be further reaching depths of 11,000 m (mean depth 3200 m), with pressures fractionated by cell size (filtration through defined pore size filters), exceeding 100 MPa and temperatures higher than 100 °C in deep-sea DNA composition (GC% gradients), flow cytometry (fluorescence hydrothermal vents (Kennedy et al., 2010). Typically, they associate activated cell sorting), affinity purification or by particular metabolic with other organisms: countless communities comprise , processes (Suenaga, 2012). Recent advancements permit the isolation , protists, fungi and viruses. Furthermore, single-celled and of single cells that can be lysed to extract genomic DNA (or RNA) multicellular microorganisms are responsible for 98% of primary followed by amplification and sequencing. Single-cell genomics is an productivity in marine ecosystems, making them integral to marine emerging method, getting more and more popularity due to the food chains and specifically to carbon and energy cycles (Sogin et al., constantly lowering sequencing costs (Kodzius and Gojobori, accepted 2006). Their evolution over the past 3.5 billion years has produced a for publication). Quake sequenced five single cells of an ammonia- high diversity in genetic and phenotypic variation. Marine meta- oxidizing archaea Nitrosoarchaeum limnia isolated from low-salinity genomics is, therefore, an excellent tool for reading the abundance of sediments in the San Francisco Bay (Blainey et al., 2011). By combining novel genetic information and unlocking the immensity of metabolic single-cell genomic and metagenomic approaches a high-quality draft diversity available from microorganisms. genome revealing features unique to N. limnia's low-salinity environ- ment was obtained. Because in single-cell genomics, genes expressed 2. Metagenomics as a tool to study the environment in the cell can be decoded and complete metabolic pathways can be established in a single genome (rather than in multiple distinct Antonie van Leeuwenhoek observed single-celled organisms discov- genomes), Quake could prove that N. limnia uses a modified version of ering microorganisms using microscopic lenses already in the 17th the 3-hydroxypropionate/4-hydroxybutyrate pathway for carbon century. Now, microbes are used by industries to cultivate various live- fixation (Blainey et al., 2011). While computational tools can help to stock feeds and nutrients; they are also used as energy sources and in solve certain scientific questions, the combination of metagenomics the production of pharmaceuticals. In the late 20th century, prokaryotic and single-cell genomics on a sample undoubtedly provides the communities were typically deciphered using 16S rRNA sequenc- means to more easily assemble contigs and a clearer view of the data. ing from total genomic DNA extracted from the whole community. For example, the combined single cell/metagenomic approach enabled Moreover, 18S rRNA gene sequencing was introduced to translate the Quake to test the single cell assemblies for completeness and the eukaryotic community (Rivas et al., 2004). metagenome for contamination (Blainey et al., 2011). Of course, the sample handling, storage and choice of DNA extraction At present, there is a shift from a gene-centric towards a more method have impact on the final multiple community genomic DNA organism-centric approach to encompass community genomics (i.e., representation. The environmental strains can vary in yield of DNA ecogenomics or environmental genomics) (DeLong et al., 2006; following even the same extraction protocol. The disruption of the cell Worden et al., 2006; McMahon, 2015). For example, DeLong et al. depends on the structure. For example, it is known that used sequence variation to analyze surface to near sea-floor depths of Gram-positive bacteria have a much thicker layer (20–80 nm) of pepti- planktonic microbial communities in the North Pacific Subtropical doglycan that encircles the cell, compared to Gram-negative bacteria Gyre (DeLong et al., 2006). From the genetic material obtained, they (10 nm). Additionally teichoic acid in Gram-positive bacteria stabilizes were able to infer both phylogeny and function of the material collected, the cell wall and makes it stronger. While it is relatively simple to dis- including uncultivable microbes. Knowledge gained from comparative rupt the Gram-negative cells (for example by bead-beating), additional genomic analysis of microbial communities provides insight freezing and thawing are used for more efficient disruption of Gram- into higher-order community organization and dynamics. McMahon positive bacterial envelope. Chemical and enzymatic lysis methods are recognizes the importance of generating the draft genomes from used in combination with mechanical and temperature control to weak- metagenomes, then pairing these reference genomes with the en and disrupt the cell wall (El Bali et al., 2014; Wesolowska-Andersen metatranscriptome datasets. Summarizing the current advancements, et al., 2014). The spores, yeast and mycobacteria are even more difficult McMahon calls all these “Metagenomics 2.0” (McMahon, 2015). Interest to break, as the cell wall is more complicated than Gram-negative and is also growing in “high-resolution genomics”, where multiple Gram-positive bacteria (Vingataramin and Frost, 2015). The extraction metagenomic samples (usually from the same source) are sequenced, methods are adapted to deal with the presence of possible inhibitors in and sequencing reads are assembled into genes. The groups of genes the soil (such as humic acids), also metabolites of the cells (Greco et al., that covary in abundance are thought to belong to the same genome 2014). There are additional challenges with possibility to gain additional (Mick and Sorek, 2014). For example, the covariation of gene abundance information by separately isolating intracellular and extracellular can be used to resolve a structure of a complex microbial sample. (Alawi et al., 2014). To increase the sample processing throughput, auto- The reconstruction of whole genomes that is possible using data mated DNA extraction platforms are offered by several companies. An ar- from metagenomics enables not only sequencing of complete genes ticle by Oldham et al. describes a comparison of such platforms for the and metabolic pathways, but also the construction of evolutionary extraction of DNA from seawater and oil samples (Oldham et al., 2012). trees. The combination of deep sequencing and bioinformatic While amplicon sequencing is still the dominating method, we approaches allows for metagenome-based genome recovery, even foresee that with the development of next generation sequencing from very complex systems. For example, both bacterial phylum TM7 methods, the standard will be to conduct shotgun sequencing of micro- and phylum bacteriodetes were assembled separately into a single bial community DNA and cDNA (to access expressed genes) (Tringe continuous sequence (one contig), basically as a single circular chromo- et al., 2005). Shotgun metagenomic sequencing allows not only the some that is complete genome. The relative metagenome abundance of sequencing of microbial, eukaryotic and archaeal genomes, but also of the completely assembled TM7 and bacteriodetes is ~(0.47–1.58)% viral genomes. In 2006, Culley et al. analyzed coastal RNA viral commu- (Albertsen et al., 2013). In addition, complete genomes have been nities. However, assembly of sequenced reads was problematic and led obtained from organisms that constitute ~1% of the community in

Please cite this article as: Kodzius, R., Gojobori, T., Marine metagenomics as a source for bioprospecting, Mar. Genomics (2015), http://dx.doi.org/ 10.1016/j.margen.2015.07.001 R. Kodzius, T. Gojobori / Marine Genomics xxx (2015) xxx–xxx 3 oceans, sediments and even from the adult human gut (Sharon and a niche of favorable conditions, recently discovered extremely large Banfield, 2013; Albertsen et al., 2013). Although genomes can be obtain- DNA viruses such as the pandoravirus and the mimivirus, which are ed from uncultivable cells, the average genome completeness is lower considered to exist in a parasitic fourth (Claverie, 2013; than that for cultivable cells. Rinke et al. recovered 201 partial genomes Philippe et al., 2013). By mining the Global Ocean Sampling environ- with 40–55% genome completeness, ranging from a few percent to mental sequence database, the closest relatives to Mimivirus were greater than 90% (Rinke et al., 2013). The variation may be attributed found in the sea (Claverie, 2013). to the bias of the single cell DNA amplification reaction — freshwater and marine samples yielded the highest percentages of successfully am- 3. Bioprospecting—the process of discovery and commercialization plified genomes (up to 40%), whereas the success rates for soil samples of new products based on biological resources tended to be low (b10%) (Rinke et al., 2014). Single-cell genome sequencing is a complimentary approach to metagenomics (Kodzius Metagenomics is not a new tool in the science. It is already and Gojobori, accepted for publication; Rinke et al., 2013; Kashtan established in the scientific field, revealing secrets of the earth's micro- et al., 2014). We expect that single-cell genomics will improve by parallel bial communities. In addition to studies of microbial taxonomic diversi- sequencing a few of the same cells, better covering fragmented genomes. ty, ecology and evolution, metagenomics is an important tool for It has been suggested that genomes with possible variations be stored as environmental resources (earth and life sciences, bioenergy, bioremedi- reference genomes rather than as individual sequences. This would allow ation, and biotechnology), agriculture (food production and safety), easier storage, handling of the data and genome comparison for the devel- biomedicine (biomedical sciences, biotechnology, biodefense and opment of evolutionary trees (Kahn, 2011; Nelson et al., 2010). Because microbial forensics), sustainability and ecology (N.R. Council, 2007). most microbes cannot be predictively cultured, rRNA phylotyping and metagenomics are favorable approaches to explain the microbial commu- 3.1. Sequence-based screening nity thriving in a particular environment. In a sequence-driven analysis, the PCR primers are designed based 2.1. Diversity of microorganisms on conserved DNA sequences. The metagenomics library is cloned into vector DNA and transformed into the producer. The clones are screened Using metagenomics, the genes and pathways of both cultivable and for the sequence of interest (Schloss and Handelsman, 2003). Of course, uncultivable organisms can be discovered. Estimates from deposited such sequence-driven approach has limitation. We can expect more total 16S rRNA sequences indicate that cultivable organisms constitute novel information obtained by sequencing environmental samples much less than 1% (12,000) (Yarza et al., 2013; Cole et al., 2014)ofan without prior knowledge on known sequences in databases. For exam- estimated total microbial populations (4 million) (Quast et al., 2013). ple, metagenome sequencing data from bacterial communities helps to may comprise 106 to 108 separate genospecies (Sleator identify genes that encode for natural products. Global sequencing of et al., 2008). Konstantinidis recommended that prokaryotic organisms samples to assess biosynthetic diversity has indicated that the biosyn- be classified (especially the uncultivable) and that their vast but finite thetic potential of microorganisms remains untapped, particularly diversity be described using metagenomics (Konstantinidis and because a large number of producers are uncultivable (Zhang and Rosselló-Móra, 2015). There have been some studies to assess the envi- Moore, 2015; Wilson et al., 2014). Often the potential producers are ronmental sample diversity. A study by Zachary Charlop-Powers et al. cohabitating in the closed environment. This can be observed in the have investigated soil samples from around the world and found that marine environment, such as microbial symbionts in sponges (Zhang there is little overlap between samples collected from different global et al., 2015)orcorals(Lema et al., 2012). Wilson et al. used single- locations (less than 3%); the strongest sequence homology was detected cell- and metagenomic-based approaches to discover a group of in samples collected in close proximity (Zhang and Moore, 2015; producers known as Entotheonella. The candidate genus Entotheonella Charlop-Powers et al., 2015). Kashtan et al. applied metagenomics and is a co-inhabitant of the chemically and microbially rich marine sponge single-cell genomics by sequencing 90 individual cell genomes on the Theonella swinhoei. The Entotheonella producer DNA was detected by globally abundant marine cyanobacterium Prochlorococcus in Bermuda PCR primers specific for genes encoding the respective pathways. The (Kashtan et al., 2014). A cell-by-cell genomic diversity assessment authors proved that a single member of the highly diverse microbiome evidenced the coexistence of Prochlorococcus subpopulations, where of the sponge host T. swinhoei is the source of almost all the polyketides they maintained a relatively stable population size in highly mixed and peptides that have been isolated from this sponge (Wilson et al., habitats (Kashtan et al., 2014).Using metagenomics approach, there is 2014). no need to isolate or cultivate the microorganisms. Directly isolated Using a metagenomic library (Fig. 1) simplifies the discovery of nuclei acids provide information on the metabolic and functional genes. In fact, it has facilitated the discovery of many novel enzymes capacity of a specific microbial community (Simon and Daniel, 2011). and biocatalysts from uncultivable bacteria. There are two main strate- Metatranscriptomics helps to explain which metabolic pathways and gies to identify and assign sequence tags to the related species genes are expressed in a given place at a given time. In parallel, it is (Sharpton, 2014). In binning, every metagenomic sequence is assigned possible to prepare and sequence both genomic DNA and total RNA to a taxonomic group. It is done through comparison to the referential libraries, and there are several reports on metatrascriptome studies data or clustering into groups based on shared characteristics (such as performed on samples taken from marine waters (Mason et al., 2012; GC content). In assembly, the sequencing reads are assembled by possi- Poretsky et al., 2009; Shi et al., 2009). Going forward, we believe that bly generating longer sequences, allowing later easier mapping and standard sample collection and processing protocols will include not bioinformatics analysis. Here the strain or species can be identified in only the analysis of DNA, but also of RNA, proteins and metabolites. In metagenomes using genome-specific markers, such as k-mers this way not only the taxonomic, but also functional diversity of the (Tu et al., 2014). After the sequence pre-processing and clustering, environmental communities, as well as the expressed genes and bioinformatics is used for gene prediction and for further functional metabolic activity in the chosen environment can be assessed. annotation (Richardson and Watson, 2013). Resolving and quantifying Metagenomics goes hand in hand with next generation sequencing taxonomic diversity allows to understand the biological function of and high-performance supercomputing, all of which enable broad the community, as the presence of specific species will indicate the access to microorganism diversity and function (Knight et al., 2012). function of described taxa. We envision improved bioinformatics Metagenomics is a useful tool to explore and discover the unknown. analysis by combination of short reads (such as Illumina) with long It is believed that there are other undiscovered branches on the tree of ones (for example from Pacific Biosciences), allowing better species life (Woyke and Rubin, 2014). While the RNA world may still exist in identification and gene prediction.

Please cite this article as: Kodzius, R., Gojobori, T., Marine metagenomics as a source for bioprospecting, Mar. Genomics (2015), http://dx.doi.org/ 10.1016/j.margen.2015.07.001 4 R. Kodzius, T. Gojobori / Marine Genomics xxx (2015) xxx–xxx

Fig. 1. The discovery process involves marine sampling, DNA sequencing and contig generation. Previously unknown genes, pathways and even whole genomes are being discovered.

The development of commercially available compounds favors the 3′-ends of a gene can be sequenced by high-throughput methods to use of large sequence databases (Sugawara et al., 2008; Kodama et al., recover many clusters and discover a large variety of genes (Iwai et al., 2012; Kryukov et al., 2012) to look for previously unknown membrane 2010). proteins, and especially enzymes (Gabor et al., 2004; Gupta et al., 2002). Enzymes are used by industries in foods, agriculture and 3.2. Assessment of metabolic potential by function-based screening livestock feed, paper and leather production, textile processing, deter- gents, personal care products, as well as in the production of fine and In sequence-based screening, the isolation of newly discovered bulk chemicals (DeSantis et al., 2002). enzymes is based on homological sequences, which often reveal only While newly discovered enzymes and other useful biomolecules can partial sequences: it is limited to known sequences of interest that do be found in metagenomic sequence reads and assembled contigs, a not allow for the discovery of information about the biochemical func- sequence-based approach is to design DNA primers derived from tions of the encoded enzymes. Activity-based assays can be performed conserved regions, amplifying the variety of novel variants of proteins to isolate genes that encode for novel biomolecules or biological activi- by PCR. Numerous genes have been found to encode for several newly ties. This “function-based screening” uses metagenomic libraries discovered enzymes such as nitrile reductases, chitinases, dioxygenases, constructed in expression vectors to express in a chosen host; sequence hydrogenases, glycerol dehydratases and specific enzyme-degrading information is not required. On the other hand functional screens enable compounds (Simon and Daniel, 2011). In the absence of sequencing, the identification of novel gene classes that encode for previously quantitative PCR alone can be used to analyze the diversity and abun- unknown or known functions (Ferrer et al., 2009; Handelsman, 2005; dance of certain genes. For example, Zaprasis found and quantified Riesenfeld et al., 2004). However, function-driven approaches are (per gram of soil) many previously unknown herbicide-degrading much slower because genes must be expressed in a selected vector, dioxygenases by quantitative PCR (Zaprasis et al., 2010). This is done where enzymes are ensured to be correctly folded. Although function- by designing the degenerated primer sets from conserved regions of based approaches are becoming more popular, their set-up and devel- genes of interest (such as dioxygenase). While the quantitative PCR is opment are time consuming and tedious; and because hit rates are a tool to analyze the gene expression of certain genes, the analysis of low, high-throughput screening is required. Miniaturization and the certain genes might well be used for diversity studies of process- application of microfluidic technologies are key to the success of geno- associated bacteria, determining the diversity of genotypes represented mics in the future (Wu et al., 2012, 2014; F.Q. Li et al., 2014). Reliable in gene libraries (Zaprasis et al., 2010). Furthermore, PCR can be and sensitive enzymatic assays also require sophisticated substrates combined with high-throughput sequencing, which allowed Varaljay and highly sensitive analytical methods (such as high-performance to obtain 62,000 sequences gathered into N700 clusters of environmen- liquid chromatography) to detect biomolecules. On the other hand, tal dimethylsulfoniopropionate demethylase (dmdA) sequences function-based screening permits the detection of clones that contain (Varaljay et al., 2010). This was all done using primers designed from previously unknown, active enzymes, from which conclusions about the DNA of one free-living marine bacterioplankton. Because PCR prod- enzyme physicochemical parameters and activities can be drawn ucts are of limited size and sequencing usually reveals only incomplete (Rabausch et al., 2013; Parachin and Gorwa-Grauslund, 2011). Subse- gene sequences, Iwai et al. introduced “gene-targeted metagenomics” quently, protein engineering through in vitro evolution can be used to to recover full versions of the target genes. They did this by designing create enzymes with improved properties by mutational exploration probes from collected sequence information from which the 5′-and and selection of the best candidates.

Please cite this article as: Kodzius, R., Gojobori, T., Marine metagenomics as a source for bioprospecting, Mar. Genomics (2015), http://dx.doi.org/ 10.1016/j.margen.2015.07.001 R. Kodzius, T. Gojobori / Marine Genomics xxx (2015) xxx–xxx 5

In addition to enzymes, compound screens have also been shown to building blocks), fuels and bulk chemicals (ethanol, solvents, polymer be successful. For example, a study that combined single-cell- and building blocks and feed additives like amino acids). metagenomic-based approaches discovered that the bacterial symbiont Entotheonella species expresses unique chemical bioactive compounds 4.1.1. E. coli as a producer including bioactive polyketides and peptides (Wilson et al., 2014). E. coli is a well-studied organism with a complex system of 4627 genes, 5827 regulatory interactions (including transcription initiation, 4. Metabolic engineering as a tool for cell factories transcription attenuation, regulation of translation and enzyme modulation) and 2528 unique compounds (Keseler et al., 2013; Feist 4.1. Established production strains et al., 2007); its metabolism is well understood. E. coli is an ideal candi- date for both metabolic engineering and industrial-scale production of Industries are using various strains of bacteria, yeast and algae to desirable bioproducts. E. coli encompasses excellent properties such as deliver valuable products such as cheese, wine and beer as well as rapid doubling time and growth rate, facilitation of high-cell-density other biotechnological and pharmaceutical products on a large scale. fermentation and low-cost production (Chen et al., 2013). E. coli is For this reason, cost-saving approaches have the potential for a huge im- known for the production of various (, bioethanol, pact on industries. Before the 1990s, microorganisms were genetically 1-butanol, isopropanol, 1-propanol, acetate, pyruvate, lactic and modified by chemicals, and the mutant strain that overexpressed the succinic acids); amino acids (phenylalanine, threonine, tryptophan, desired metabolite was used in production. However, random mutation tyrosine and valine); sugars and alcohols (mannitol, xylitol); diols and is problematic because the modified genes or DNA location was not polymers (such as precursors for polyesters) (Chen et al., 2013). As known, and untargeted pathways were affected (Torres and Voit, biosynthetic pathways in microorganisms thriving in marine waters 2002). Other early technologies used the random insertion of specific are uncovered, we will likely see E. coli used more and more as a genes into a living cell. This blind insertion method is similarly limited chemical producer. Ongley demonstrated a high-titer heterologous by unknown effects on unrelated pathways and subsequent unwanted production of lyngbyatoxin in E. coli, a protein kinase C activator from effects. Any changes in cell metabolic pathways may have drastic effects an uncultivable marine cyanobacterium (Ongley et al., 2013). The on the viability of the cell (Vemuri and Aristidou, 2005). With the following are a few examples of companies using E. coli to produce advancement of genome engineering and genome-editing technologies, high volumes of various metabolites for industrial purposes: it is now possible to remove, insert or replace specific DNA sequences in Dupont (1,3-propanediol, butanol and ethanol from lignocellulosics); the cell. Instead of a direct deletion or an overexpression of the genes DSM (the cephalexin); BASF (vitamin B2 and riboflavin); that encode the related metabolic enzymes, it is now more common Novozymes and Cargill (3-hydroxypropionic acid); and Gevo to target regulatory networks. In this way, particular pathways and/or (isobutanol) (Hong, 2012; Hong and Nielsen, 2012). the maximum output of a desired metabolite can be enhanced or suppressed in a cell (even nonnative pathways), with the possibility of 4.1.2. Algae as a producer scaling up entire processes, such that the cell remains viable under In the presence of sunlight, CO2, water and nutrients (nitrogen, production conditions. potassium and phosphorous) algae can produce an oil used for biofuels. Typically, the desired product to be synthesized is identified, and the Algae can be single-celled or multicellular; some algae have short sexual related reactions and pathways that can produce the desired product cycles and rapid growth that allows for easy manipulation, and some are selected. The most suitable organism to synthesize the product algae can grow in wastewater, salty water or nonarable land (Hannon is chosen from genetic and chemical information. It is important to et al., 2010). Algal strains vary in their requirements for growth medi- consider how easily the organism's pathways can be modified, how um, their growth rate and their biomass productivity; they are not all easily the organism is to grow and maintain and the volume, value photosynthetic, and they have various lipid profiles and levels of resis- and cleanliness of the final product. A wide range of microorganisms tance to pathogens (Georgianna and Mayfield, 2012). Algal strains established as cell factories for industrial purposes are currently used should be optimized for tolerance to high temperatures, various light for metabolic engineering: recombinant bacterial strains of Bacillus conditions, salinity, high oxygen concentration and nutrient composi- subtilis, Corynebacterium glutamicum, , filamentous tion. Ideally, cells should be able to grow and produce lipids simulta- fungi Aspergillus niger and Aspergillus oryzae, various microalgae (e.g., neously, and it would be best if the lipids produced were excreted Arthrospira, Botryococcus, Chaetoceros calcitrans, Chara vulgaris, Chorella outside cells. From the total 40,000 algal species (with some estimates protothecoides, Chlorella vulgaris, Clamydomonas, , of up to 72,500 algal species worldwide (Guiry, 2012)), only a few thou- Dunaliella, Nannochloris, Nannochloropsis, Ostreococcus, Pavlova lutheri, sand strains (~3000) are kept in collections and only half-a-dozen are Phaeodactylum tricornutum, Porphyridium cruentum, Thalassiosira cultivated in industrial quantities. To date, only approximately 10 differ- pseudonana, Thraustochytrium spp.) and the yeast Saccharomyces ent algal species can be transformed and fewer than 50 whole genomes cerevisiae. The availability of references and databases about the have been sequenced (Walker et al., 2005; Wijffels and Barbosa, 2010). genomic and chemical information, reactions and metabolic pathways for the production of product or a wanted result is essential. Each of 4.1.3. S. cerevisiae as a producer the producers has their own advantages and suffers from some prob- S. cerevisiae's genome was the first eukaryotic organism to be lems. For example, B. subtilis is capable producing riboflavin, butanediol, sequenced, and, therefore, it has been used widely as a isobutanol, cellulase and other substances. Production of these in many fundamental studies. The yeast S. cerevisiae naturally metabo- substances can be enhanced by gene modification (Hao et al., 2013). lizes hexoses (glucose, fructose and galactose) and a few dimers such The genome-scale metabolic models of B. subtilis are very useful for as sucrose and maltose. Yeasts can tolerate high ethanol and sugar the rapid and accurate prediction of the cellular response to gene knock- concentrations (resistant to osmotic stress), a high cell density, high out, media conditions and environmental changes. It is important to be temperatures, low pH values and low sulfite concentrations (Basso able to scale up the production from the laboratory to the industrial et al., 2008). For these reasons yeasts are able to grow under environ- scale. The volume of a product produced by engineered cells varies. mental conditions where other organisms cannot survive. In addition, High-value products (and usually low volume) include pharmaceuticals yeasts are immune to contamination by phages (compared to E. coli). (recombinant proteins, statins and other natural products) and food However, there are certain limitations to S. cerevisiae,suchasan ingredients (vitamins, antioxidants and flavors). On the other hand, inability to metabolize pentoses present in hemicelluloses, their conver- biorefineries seek larger volume products (even if the value of the sion and growth rates are lower than those for E. coli and it is more product is lower) such as fine chemicals (antibiotics, enzymes and chiral difficult to engineer yeast than E. coli (Hong and Nielsen, 2012). Despite

Please cite this article as: Kodzius, R., Gojobori, T., Marine metagenomics as a source for bioprospecting, Mar. Genomics (2015), http://dx.doi.org/ 10.1016/j.margen.2015.07.001 6 R. Kodzius, T. Gojobori / Marine Genomics xxx (2015) xxx–xxx these limitations, many strains of S. cerevisiae have been developed engineered to overproduce (5-fold) the normal amount of isoprenoid for the production of biofuels (ethanol, biodiesels and butanol); lycopene, an antioxidant linked to anticancer properties that is bulk chemicals (pyruvic acid, succinic acid, 1-lactic acid, 1,2- produced industrially. His work was completed in three days using a propanediol, d-ribose and ribitol and polyhydroxy-alkanoates); fine complex pool of synthetic DNA, creating over 4.3 billion combinatorial chemicals (1-ascorbic acid, antifungal diterpene, methylmalonyl- genomic variants (Wang et al., 2009). The accelerated growth of E. coli coenzyme A, nonribosomal peptides and Se-methylselenocysteine); was integral to the speed at which Church was able to improve and and protein drugs (immunoglobulin G, hepatitis B virus surface antigen, discover new properties (Wang et al., 2009). epidermal growth factor, insulin-like growth factor, single-chain anti- Producers in marine waters have been discovered and improved. For bodies and glucagon) (Hong and Nielsen, 2012). Many genes discovered example, a marine actinomycete Marinactinospora thermotolerans found in marine life can be expressed in S. cerevisiae; for example, a gene in deep-sea marine sediment was genetically modified to increase its encoding fatty acid desaturase was cloned from the marine fungus yield by ∼25-fold compared to the wild-type strain for its use in the pro- Thraustochytrium sp. and expressed at a very high conversion rate duction of antimicrobial nucleoside antibiotic A201A. M. thermotolerans (56.40%) in S. cerevisiae (Huang et al., 2011). was also shown to be a source of antibacterial agents (Zhu et al., 2012).

4.2.1. Genetic engineering using tools 4.2. Tools to improve the function of strains It is known that many of the organisms identified through metagenomics are unculturable. Advances in metagenomics and micro- We seek to improve wild-type producers to better suit our needs. biology allow to identify new media, expanding the list of potential The evolution of agriculture, for example, began around 10,000 BC, producers. Genomic mining of the metagenomics data allows to identify when humans transitioned from hunting to farming. Over time, crops biochemical diversity of specific gene families. For example, 31 beneficial to agricultural output have been selected. Although unaware cyanobactin gene clusters (family of cyclic ribosomal peptides) from of genomic DNA, people began to understand that genetic material is 126 genomes of cyanobacteria were identified by Leikoski et al. passed from parents to their offspring. Therefore, genetic material (2013). The gene families and pathways found can be inserted into can be altered by either random mutation events, such as irradiation the known producer strains. by X-rays, or by genome-targeted editing technologies, to make improvements to strains in a short period of time (Fig. 2). Similarly, established microbial strains have been improved in many 4.2.1.1. Meganucleases as a tool for genome engineering. Meganucleases ways— from random strain mutations to advanced methods, such as are enzymes capable of recognizing and cutting large DNA sequences metabolic engineering, when specific genes and pathways are changed. (from 12 to 40 bp); they provide a more sophisticated approach to Advances in molecular biology and genetic sciences have developed genome engineering than do random mutation techniques (Stoddard, genome engineering, which enables targeted mutations in genomes. 2005). However, naturally occurring meganucleases do not serve for In 2009, Church automated a process called multiplex-automated the purpose of engineering because it is impossible to match an exact genome engineering, which generated a diverse set of genetic changes meganuclease to act on a specific DNA sequence. Therefore, many (mismatches, deletions and/or insertions) that enabled E. coli to be meganucleases are created (Smith et al., 2006)byeitherbymutagenesis

Fig. 2. The typical steps followed to improve a producer strain using genetic engineering tools to modify genomic DNA by mutagenesis and pathway perturbation, followed by high- throughput screening. The best producer is used to scale up the production process.

Please cite this article as: Kodzius, R., Gojobori, T., Marine metagenomics as a source for bioprospecting, Mar. Genomics (2015), http://dx.doi.org/ 10.1016/j.margen.2015.07.001 R. Kodzius, T. Gojobori / Marine Genomics xxx (2015) xxx–xxx 7

(Seligman et al., 2002) or by combinatorial assembly, whereby protein mutations have been successfully introduced in the genomes of Strepto- subunits from different enzymes are fused (Arnould et al., 2006). coccus pneumoniae and E. coli using CRISPR system (Jiang et al., 2013). Currently, a fully rational design process (“directed nuclease editor”) Currently, targeted by CRISPR can be applied to micro- is being used to create highly specific engineered meganucleases for organisms harboring inducible (Jiang et al., 2013). any location in a genome (Gao et al., 2010). The more stringent the Today, the CRISPR system is recognized as the most efficient and DNA sequence recognition is of the meganuclease, the less toxic it is easiest system for genome engineering. CRISPR should be used to create for the cell. and modify producer strains and to improve and enhance the produc- Actinomycetes are Gram-positive mycelial bacteria, known to tion of a chemical of interest. A high-throughput system environment produce a wide variety of industrially and medically relevant com- is essential for CRISPR system function, and combined with single-cell pounds (antibiotics, chemotherapeutics, fungicides, herbicides and technologies, for example, the best performing producers could be immunosuppressants). Using meganuclease I-SceI from S. cerevisiae, quickly and easily selected. In the near future, we expect these methods the gene cluster for the red-pigmented tripyrrole compound that to lead to more producer strains for various industrially relevant output remains associated with the mycelium was deleted in actinomycetes products. Marine metagenomics is not only a source to explain bacterial Streptomyces coelicolor. The changed genotype and phenotype were and archaeal CRISPR-related gene sequences and their acquired resis- confirmed (Fernandez-Martinez and Bibb, 2014). tance to viruses (Heidelberg et al., 2009; Anderson et al., 2011), but it is also a source for identifying ways to improve parts of the CRISPR 4.2.1.2. ZNF and TALEN as a tool for genome engineering. Zinc finger system (Anderson et al., 2011; Zhang et al., 2014; Smedile et al., nucleases (ZFNs) and transcription activator-like effectors (TALEs) 2013); new producer strains can be obtained from marine sources and (Baker, 2012) are two DNA sequence-recognizing peptides that can modified/improved using CRISPR. CRISPR/Cas systems can contain be combined with a FokI endonuclease catalytic domain, which genes that encode highly divergent proteins (Makarova et al., 2011). have different recognition and cleaving sites. As a result, a construct All these are a valuable source for the synthetic biology. There are at with multiple recognition domains for recognizing multiple compos- least two studies which presented CRISPR-based transcriptional ite sequences over 24 bp can be created. Although ZNF (Kim et al., cascades for synthetic circuits, such as transcriptional activators and 1996)andTALEN(Li et al., 2011; Christian et al., 2010) are time con- repressors (Vogt, 2014). Voltage-dependent changes led to the change suming and expensive technologies, they have been used successful- to fluorescence resonance energy transfer (FRET) signals or changes to ly to genetically engineer many species, validating their role as endogenous protein fluorescence. While this was done in mammalian protein engineers. cells, it is also applicable for transfer to prokaryotes or other cells. The TALEN has successfully been applied to genetically modified sea shift from observation to application is beginning to take place. anemone Nematostella vectensis (Ikmi et al., 2014), sea urchin embryos (Hosoi et al., 2014), amphioxus Branchiostoma belcheri (G. Li et al., 4.2.2. New organisms created by synthetic biology 2014), marine annelid Platynereis dumerilii (Bannister et al., 2014)and Synthetic biology opens the door to a new era of producer improve- ascidian Ciona intestinalis (Yoshida et al., 2014). ment that could even lead to the creation of new organisms (Channon et al., 2008). Advances in oligonucleotide synthesis, where longer pieces 4.2.1.3. RE-TOFs as a tool for genome engineering. However, ZNF are being produced, are enabling the recreation of entire genomic DNA and TALEN, are limited by the fact that their constructs must be re- from certain cells. In addition to the development of pathways and engineered for each new DNA target. On one hand, restriction enzyme components, biological parts can be standardized (Baker et al., 2006), triple-helix-forming oligonucleotide (RE-TFO) conjugates offer an and the functional units are being introduced into organisms, with the easier approach to performing genome engineering (Silanskas et al., potential to construct entire organisms de novo. 2012; Eisenschmidt et al., 2005), where only DNA triple-helix-forming In 2000, a 9.6 kbp hepatitis C virus genome was synthesized (Blight nucleotides are used for binding specificity (no protein engineering is et al., 2000). Two years of work later, a synthetic 7.7 kbp poliovirus required). On the other hand, RE-TFO conjugates are restricted by the genome was developed (Couzin, 2002). The next year, the synthesis complicated methodology required to conjugate oligonucleotides to and construction of a 5.4 kbp bacteriophage Phi X 174 genome took the nuclease module, by the slow formation of the triple helix and by only two weeks (Smith et al., 2003). In 2006, the J. Institute the low-targeting frequency. constructed a synthetic genome of a newly discovered minimal bacteri- um laboratorium (Gibson et al., 2008). In 2010, Venter 4.2.1.4. CRISPR as a tool for genome engineering. The latest development demonstrated the synthetic assembly of a 1.08-M-bp Mycoplasma in genome-editing technology is the CRISPR/Cas system. CRISPR is an mycoides genome (Gibson et al., 2010). Eukaryotic algal RNA-based bacterial defense mechanism (in other words bacterial and of up to 500 kb have been assembled in yeast at the Craig Venter archaeal immune systems) designed to recognize and eliminate foreign Institute (Karas et al., 2013). It is likely that fully synthetic eukaryotic DNA from invading bacteriophages and (Horvath and producers will soon be synthesized and assembled. Other developments Barrangou, 2010). RNA-guided DNA endonuclease uses guide RNA for include expanding the natural (which consists of four target-site recognition, and Cas endonuclease cleaves the DNA. The bases A, C, G and T) to an extra (denoted by X and Y artificial main advantage to using CRISPR is the speed and simplicity of this nucleotides) (Malyshev et al., 2014). The production of new life forms assay — no protein engineering is required and only synthesized RNA and new proteins encoded from previously unknown and from devel- is used. Multiplexing is possible, which would allow for more changes oped amino acids is expected in the future (Suzuki et al., 2001; in the genome with one experiment (Cong et al., 2013; Gasiunas and Nakamura et al., 2000; Suzuki and Gojobori, 1999). Siksnys, 2013). The CRISPR approach has shown high potential and The first synthetic cell created by Venter cost 40 million USD genome-editing flexibility that can be applied to numerous DNA (Sleator, 2010). However, as DNA synthesis and sequencing technolo- targeting applications including transcriptional control (Barrangou gies improve and prices fall, we can expect not only further discovery and Marraffini, 2014). To date, CRISPR has been primarily used in the of naturally occurring organisms, but also the development of more engineering of eukaryotic cell lines; however, we can expect more engineered microorganisms. Many companies are involved in creating applications for genetic engineering in microorganisms in the near synthetic cells to capture CO2 and produce renewable fuels such as future. Although there is evidence that CRISPR-mediated cleavage of Joule Unlimited in Cambridge, LS9 Inc. in San Francisco, Amyris Biotech- chromosomes leads to cell death in many bacteria (Marraffini and nologies in California, Inc. in California (a company Sontheimer, 2010; Edgar and Qimron, 2010; Bikard et al., 2012)and founded by Dr. Venter) and the Exxon Mobil Corporation in Texas. archaea (Gudbergsdottir et al., 2011; Fischer et al., 2012), precise Synthetic biology serves to benefitfrommarinemetagenomicsbecause

Please cite this article as: Kodzius, R., Gojobori, T., Marine metagenomics as a source for bioprospecting, Mar. Genomics (2015), http://dx.doi.org/ 10.1016/j.margen.2015.07.001 8 R. Kodzius, T. Gojobori / Marine Genomics xxx (2015) xxx–xxx

Table 1 Baker, D., Group, B.F., Church, G., Collins, J., Endy, D., Jacobson, J., Keasling, J., Modrich, P., Challenges posed by the application of marine metagenomic methods and suggested Smolke, C., Weiss, R., 2006. Engineering life: building a fab for biology. Sci. Am. 294, – solutions. 44 51. Bannister, S., Antonova, O., Polo, A., Lohs, C., Hallay, N., Valinciute, A., Raible, F., Tessmar- Nr. Expected challenge Offered solution Raible, K., 2014. TALENs mediate efficient and heritable mutation of endogenous genes in the marine annelid Platynereis dumerilii. Genetics 197, 77–89. 1. Selection of sampling site Go to extreme environments to make new Barrangou, R., Marraffini, L.A., 2014. CRISPR–Cas systems: prokaryotes upgrade to adap- discoveries tive immunity. Mol. Cell 54, 234–244. 2. Huge data generation Use computer power to construct analysis Basso, L.C., de Amorim, H.V., de Oliveira, A.J., Lopes, M.L., 2008. Yeast selection for fuel eth- pipelines anol production in Brazil. FEMS Yeast Res. 8, 1155–1163. 3. Host species selection Select from available pathways based on what Bikard, D., Hatoum-Aslan, A., Mucida, D., Marraffini, L.A., 2012. CRISPR interference can should be expressed prevent natural transformation and virulence acquisition during in vivo bacterial – 4. Host improvement Select CRISPR as easiest and fastest method infection. Cell Host Microbe 12, 177 186. 5. Selection of best performing Use high-throughput methods such as Blainey, P.C., Mosier, A.C., Potanina, A., Francis, C.A., Quake, S.R., 2011. Genome of a low- salinity ammonia-oxidizing archaeon determined by single-cell and metagenomic producer cells single-droplet microfluidics analysis. PLoS One 6, e16626. Blight, K.J., Kolykhalov, A.A., Rice, C.M., 2000. Efficient initiation of HCV RNA replication in . Science 290, 1972–1974. Channon, K., Bromley, E.H.C., Woolfson, D.N., 2008. Synthetic biology through biomolecu- lots of the components for synthetic biology are derived from marine lar design and engineering. Curr. Opin. Struct. Biol. 18, 491–498. microorganisms (Wang and Mathews, 2009). Charlop-Powers, Z., Owen, J.G., Reddy, B.V.B., Ternei, M.A., Guimarães, D.O., de Frias, U.A., Pupo, M.T., Seepe, P., Feng, Z., Brady, S.F., 2015. Global biogeographic sampling of bacterial secondary metabolism. 5. Conclusion/outlook Chen, X., Zhou, L., Tian, K., Kumar, A., Singh, S., Prior, B.A., Wang, Z., 2013. Metabolic engineering of Escherichia coli: a sustainable industrial platform for bio-based – With the constant advancement of sequencing technologies, chemical production. Biotechnol. Adv. 31, 1200 1223. fl Christian, M., Cermak, T., Doyle, E.L., Schmidt, C., Zhang, F., Hummel, A., Bogdanove, A.J., delivering longer reads at lower costs, we can expect a ood of informa- Voytas, D.F., 2010. Targeting DNA double-strand breaks with TAL effector nucleases. tion about the genes, pathways and whole genomes of microorganisms Genetics 186, 757-U476. in various environmental populations. The limiting step will be reaching Claverie, J.-M., 2013. Giant virus in the sea: extending the realm of Megaviridae to Viridiplantae. Commun. Integr. Biol. 6, e25685. extreme locations to collect some of the most specially adapted Cole, J.R., Wang, Q., Fish, J.A., Chai, B.L., McGarrell, D.M., Sun, Y.N., Brown, C.T., Porras- organisms to discover not only new genomes, but also potentially Alfaro, A., Kuske, C.R., Tiedje, J.M., 2014. Ribosomal Database Project: data and tools undiscovered branches on the tree of life (Woyke and Rubin, 2014). for high throughput rRNA analysis. Nucleic Acids Res. 42, D633–D642. fl Cong, L., Ran, F.A., Cox, D., Lin, S.L., Barretto, R., Habib, N., Hsu, P.D., Wu, X.B., Jiang, W.Y., High-throughput screening utilizing micro uidic droplet technology Marraffini, L.A., Zhang, F., 2013. Multiplex genome engineering using CRISPR/Cas will allow us to select the best enzymes and best biomolecular pro- systems. Science 339, 819–823. ducers. Many opportunities remain to catalog genomes and pathways Couzin, J., 2002. Virology: active poliovirus baked from scratch. Science 297, 174–175. fi Culley, A.I., Lang, A.S., Suttle, C.A., 2006. Metagenomic analysis of coastal RNA virus com- for the simpli ed selection of necessary parts for genetic engineering. munities. Science 312, 1795–1798. Researchers at the Massachusetts Institute of Technology recently DeLong, E.F., Preston, C.M., Mincer, T., Rich, V., Hallam, S.J., Frigaard, N.-U., Martinez, A., presented CRISPR-based biological parts to perform logical functions Sullivan, M.B., Edwards, R., Brito, B.R., Chisholm, S.W., Karl, D.M., 2006. Community fi for synthetic biology (Rusk, 2014). Both the CRISPR-based transcrip- genomics among strati ed microbial assemblages in the ocean's interior. Science 311, 496–503. tional activators (Nissim et al., 2014) and repressors (Kiani et al., DeSantis, G., Zhu, Z., Greenberg, W.A., Wong,K.,Chaplin,J.,Hanson,S.R.,Farwell,B., 2014) were designed and demonstrated to be successful for synthetic Nicholson,L.W.,Rand,C.L.,Weiner,D.P.,Robertson,D.E.,Burk,M.J.,2002.An circuits in mammalian cells; very likely this technology will soon be enzyme library approach to biocatalysis: development of nitrilases for enantioselective production of carboxylic acid derivatives. J. Am. Chem. Soc. applied to microbial cells. Both the standardization of the sample 124, 9024–9025. collection technique and bioinformatics pipeline will speed up data Edgar, R., Qimron, U., 2010. The Escherichia coli CRISPR system protects from lambda production and analysis (Table 1.). Large quantities of data are expected lysogenization, lysogens, and prophage induction. J. Bacteriol. 192, 6291–6294. Eisenschmidt, K., Lanio, T., Simoncsits, A., Jeltsch, A., Pingoud, V., Wende, W., Pingoud, A., to become a challenge for bioinformatics; solutions so far include 2005. Developing a programmed restriction endonuclease for highly specific DNA storing reference genomes with their possible variations (Kahn, 2011; cleavage. Nucleic Acids Res. 33, 7039–7047. Nelson et al., 2010). El Bali, L., Diman, A., Bernard, A., Roosens, N.H.C., De Keersmaecker, S.C.J., 2014. Comparative study of seven commercial kits for human DNA extraction from urine We will see more synthetic organisms using environmental pollut- samples suitable for DNA biomarker-based public health studies. J. Biomol. Tech. ants (e.g.,CO2) or industrial waste to produce food, livestock feed, 25, 96–110. fuels, small molecules and pharmaceuticals. Considerable interest Feist, A.M., Henry, C.S., Reed, J.L., Krummenacker, M., Joyce, A.R., Karp, P.D., Broadbelt, L.J., Hatzimanikatis, V., Palsson, B.O., 2007. A genome-scale metabolic reconstruction for from industry is expected to speed up solutions for scaling up produc- Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic infor- tion. Interest from both industry and academia promises to move mation. Mol. Syst. Biol. 3. developments in marine metagenomics along at an impressive pace. Fernandez-Martinez, L.T., Bibb, M.J., 2014. Use of the Meganuclease I-SceI of Saccharomy- ces cerevisiae to select for gene deletions in actinomycetes. Sci. Rep. 4. Ferrer, M., Beloqui, A., Timmis, K.N., Golyshin, P.N., 2009. Metagenomics for mining new References genetic resources of microbial communities. J. Mol. Microbiol. Biotechnol. 16, 109–123. Adkins, J., Pugh, S., McKenna, R., Nielsen, D.R., 2012. Engineering microbial chemical Fischer,S.,Maier,L.K.,Stoll,B.,Brendel,J.,Fischer,E.,Pfeiffer,F.,Dyall-Smith,M., factories to produce renewable ‘biomonomers’. Front. Microbiol. 3. Marchfelder, A., 2012. An archaeal immune system can detect multiple Alawi, M., Schneider, B., Kallmeyer, J., 2014. A procedure for separate recovery of extra- protospacer adjacent motifs (PAMs) to target invader DNA. J. Biol. Chem. 287, and intracellular DNA from a single marine sediment sample. J. Microbiol. Methods 33351–33363. 104, 36–42. Gabor, E.M., De Vries, E.J., Janssen, D.B., 2004. Construction, characterization, and use of Albertsen, M., Hugenholtz, P., Skarshewski, A., Nielsen, K.L., Tyson, G.W., Nielsen, P.H., small-insert gene banks of DNA isolated from soil and enrichment cultures for the 2013. Genome sequences of rare, uncultured bacteria obtained by differential recovery of novel amidases. Environ. Microbiol. 6, 948–958. coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533 − +. Gao, H., Smith, J., Yang, M., Jones, S., Djukanovic, V., Nicholson, M.G., West, A., Bidney, D., Allen, E.E., Banfield, J.F., 2005. Community genomics in microbial ecology and evolution. Falco, S.C., Jantz, D., Lyznik, L.A., 2010. Heritable targeted mutagenesis in maize using Nat. Rev. Microbiol. 3, 489–498. a designed endonuclease. Plant J. 61, 176–187. Anderson, R.E., Brazelton, W.J., Baross, J.A., 2011. Using CRISPRs as a metagenomic tool to Gasiunas, G., Siksnys, V., 2013. RNA-dependent DNA endonuclease Cas9 of the CRISPR identify microbial hosts of a diffuse flow hydrothermal vent viral assemblage. FEMS system: Holy Grail of genome editing? Trends Microbiol. 21, 562–567. Microbiol. Ecol. 77, 120–133. Georgianna, D.R., Mayfield, S.P., 2012. Exploiting diversity and synthetic biology for the Arnould, S., Chames, P., Perez, C., Lacroix, E., Duclert, A., Epinat, J.C., Stricher, F., Petit, A.S., production of algal biofuels. 488, 329–335. Patin, A., Guillier, S., Rolland, S., Prieto, J., Blanco, F.J., Bravo, J., Montoya, G., Serrano, L., Gibson, D.G., Benders, G.A., Andrews-Pfannkoch, C., Denisova, E.A., Baden-Tillson, H., Duchateau, P., Paques, F., 2006. Engineering of large numbers of highly specific Zaveri, J., Stockwell, T.B., Brownley, A., Thomas, D.W., Algire, M.A., Merryman, C., homing endonucleases that induce recombination on novel DNA targets. J. Mol. Young, L., Noskov, V.N., Glass, J.I., Venter, J.C., Hutchison, C.A., Smith, H.O., 2008. Biol. 355, 443–458. Complete chemical synthesis, assembly, and cloning of a Baker, M., 2012. Gene-editing nucleases. Nat. Methods 9, 23–26. genome. Science 319, 1215–1220.

Please cite this article as: Kodzius, R., Gojobori, T., Marine metagenomics as a source for bioprospecting, Mar. Genomics (2015), http://dx.doi.org/ 10.1016/j.margen.2015.07.001 R. Kodzius, T. Gojobori / Marine Genomics xxx (2015) xxx–xxx 9

Gibson, D.G., Glass, J.I., Lartigue, C., Noskov, V.N., Chuang, R.Y., Algire, M.A., Benders, G.A., Lema, K.A., Willis, B.L., Bourne, D.G., 2012. Corals form characteristic associations with Montague, M.G., Ma, L., Moodie, M.M., Merryman, C., Vashee, S., Krishnakumar, R., symbiotic nitrogen-fixing bacteria. Appl. Environ. Microbiol. 78, 3136–3144. Assad-Garcia, N., Andrews-Pfannkoch, C., Denisova, E.A., Young, L., Qi, Z.Q., Segall- Li, G., Feng, J., Lei, Y., Wang, J., Wang, H., Shang, L.-K., Liu, D.-T., Zhao, H., Zhu, Y., Wang, Y.- Shapiro, T.H., Calvey, C.H., Parmar, P.P., Hutchison, C.A., Smith, H.O., Venter, J.C., Q., 2014b. Mutagenesis at specific genomic loci of amphioxus Branchiostoma belcheri 2010. Creation of a bacterial cell controlled by a chemically synthesized genome. Sci- using TALEN method. J. Genet. Genomics (Yi Chuan Xue Bao) 41, 215–219. ence 329, 52–56. Li, T., Huang, S., Jiang, W.Z., Wright, D., Spalding, M.H., Weeks, D.P., Yang, B., 2011. TAL Greco, M., Sáez, C.A., Brown, M.T., Bitonti, M.B., 2014. A simple and effective method for nucleases (TALNs): hybrid proteins composed of TAL effectors and FokI DNA- high quality co-extraction of genomic DNA and total RNA from low biomass cleavage domain. Nucleic Acids Res. 39, 359–372. Ectocarpus siliculosus, the model brown alga. PLoS One 9, e96470. Li, F.Q., Kodzius, R., Gooneratne, C.P., Foulds, I.G., Kosel, J., 2014a. Magneto-mechanical Gudbergsdottir, S., Deng, L., Chen, Z.J., Jensen, J.V.K., Jensen, L.R., She, Q.X., Garrett, R.A., trapping systems for biological target detection. Microchim. Acta 181, 1743–1748. 2011. Dynamic properties of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems Makarova, K.S., Haft, D.H., Barrangou, R., Brouns, S.J.J., Charpentier, E., Horvath, P., Moineau, when challenged with vector-borne viral and plasmid genes and protospacers. Mol. S., Mojica, F.J.M., Wolf, Y.I., Yakunin, A.F., van der Oost, J., Koonin, E.V., 2011. Evolution Microbiol. 79, 35–49. and classification of the CRISPR–Cassystems.Nat.Rev.Microbiol.9,467–477. Guiry, M.D., 2012. How many species of algae are there? J. Phycol. 48, 1057–1063. Malyshev, D.A., Dhami, K., Lavergne, T., Chen, T.J., Dai, N., Foster, J.M., Correa, I.R., Gupta, R., Beg, Q.K., Lorenz, P., 2002. Bacterial alkaline proteases: molecular approaches Romesberg, F.E., 2014. A semi-synthetic organism with an expanded genetic and industrial applications. Appl. Microbiol. Biotechnol. 59, 15–32. alphabet. Nature 509, 385 − +. Handelsman, J., 2005. Metagenomics: application of genomics to uncultured microorgan- Marraffini, L.A., Sontheimer, E.J., 2010. Self versus non-self discrimination during CRISPR isms (vol 68, pg 669, 2004). Microbiol. Mol. Biol. Rev. 69, 669–685. RNA-directed immunity. Nature 463, 568-U194. Hannon, M., Gimpel, J., Tran, M., Rasala, B., Mayfield, S., 2010. Biofuels from algae: Martin, H.G., Ivanova, N., Kunin, V., Warnecke, F., Barry, K.W., McHardy, A.C., Yeates, C., He, challenges and potential. Biofuels 1, 763–784. S., Salamov, A.A., Szeto, E., Dalin, E., Putnam, N.H., Shapiro, H.J., Pangilinan, J.L., Hao, T., Han, B., Ma, H., Fu, J., Wang, H., Wang, Z., Tang, B., Chen, T., Zhao, X., 2013. Rigoutsos, I., Kyrpides, N.C., Blackall, L.L., McMahon, K.D., Hugenholtz, P., 2006. metabolic engineering of Bacillus subtilis for improved production of riboflavin, Egl- Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) 237, (R,R)-2,3-butanediol and isobutanol. Mol. BioSyst. 9, 2034–2044. sludge communities. Nat. Biotechnol. 24, 1263–1269. Heidelberg, J.F., Nelson, W.C., Schoenfeld, T., Bhaya, D., 2009. Germ warfare in a microbial Mason, O.U., Hazen, T.C., Borglin, S., Chain, P.S., Dubinsky, E.A., Fortney, J.L., Han, J., mat community: CRISPRs provide insights into the co-evolution of host and viral Holman, H.Y., Hultman, J., Lamendella, R., Mackelprang, R., Malfatti, S., Tom, L.M., genomes. PLoS One 4 e4169. Tringe, S.G., Woyke, T., Zhou, J., Rubin, E.M., Jansson, J.K., 2012. Metagenome, Hong, K.K., 2012. Advancing Metabolic Engineering Through Combination of Systems metatranscriptome and single-cell sequencing reveal microbial response to Deepwa- Biology and Adaptive Evolution. Chalmers University of Technology. ter Horizon oil spill. ISME J. 6, 1715–1727. Hong, K.-K., Nielsen, J., 2012. Metabolic engineering of Saccharomyces cerevisiae:akeycell McMahon, K., 2015. Metagenomics 2.0. Environ. Microbiol. Rep. 7, 38–39. factory platform for future biorefineries. Cell. Mol. Life Sci. 69, 2671–2690. Mick, E., Sorek, R., 2014. High-resolution metagenomics (vol 32, pg 750, 2014). Nat. Horvath, P., Barrangou, R., 2010. CRISPR/Cas, the immune system of bacteria and archaea. Biotechnol. 32, 750–751. Science 327, 167–170. N.R. Council, 2007. The new science of metagenomics: revealing the secrets of our Hosoi, S., Sakuma, T., Sakamoto, N., Yamamoto, T., 2014. Targeted mutagenesis in sea microbial planet. The National Academies Press, Washington, DC. urchin embryos using TALENs. Develop. Growth Differ. 56, 92–97. Nakamura, Y., Gojobori, T., Ikemura, T., 2000. Codon usage tabulated from international Huang, J.-Z., Jiang, X.-Z., Xia, X.-F., Yu, A.-Q., Mao, R.-Y., Chen, X.-F., Tian, B.-Y., 2011. Cloning DNA sequence databases: status for the year 2000. Nucleic Acids Res. 28, 292. and functional identification of delta5 fatty acid desaturase gene and its 5′-upstream re- Nelson, K.E., Weinstock, G.M., Highlander, S.K., Worley, K.C., Creasy, H.H., Wortman, J.R., gion from marine fungus Thraustochytrium sp. FJN-10. Mar. Biotechnol. (N.Y.) 13, 12–21. Rusch, D.B., Mitreva, M., Sodergren, E., Chinwalla, A.T., Feldgarden, M., Gevers, D., Ikmi, A., McKinney, S.A., Delventhal, K.M., Gibson, M.C., 2014. TALEN and CRISPR/Cas9- Haas, B.J., Madupu, R., Ward, D.V., Birren, B., Gibbs, R.A., Methe, B., Petrosino, J.F., mediated genome editing in the early-branching metazoan Nematostella vectensis. Strausberg, R.L., Sutton, G.G., White, O.R., Wilson, R.K., Durkin, S., Gujja, S., Howarth, Nat. Commun. 5, 5486. C., Kodira, C.D., Kyrpides, N., Madupu, R., Mehta, T., Mitreva, M., Muzny, D.M., Iwai, S., Chai, B.L., Sul, W.J., Cole, J.R., Hashsham, S.A., Tiedje, J.M., 2010. Gene-targeted- Pearson, M., Pepin, K., Pati, A., Qin, X., Yandava, C., Zeng, Q.D., Zhang, L., Berlin, A.M., metagenomics reveals extensive diversity of aromatic dioxygenase genes in the Chen, L., Hepburn, T.A., Johnson, J., McCorrison, J., Miller, J., Minx, P., Nusbaum, C., environment. ISME J. 4, 279–285. Russ, C., Sutton, G.G., Sykes, S.M., Tomlinson, C.M., Young, S., Warren, W.C., Badger, J., Jiang, W., Bikard, D., Cox, D., Zhang, F., Marraffini, L.A., 2013. RNA-guided editing of Crabtree, J., Madupu, R., Markowitz, V.M., Orvis, J., Rusch, D.B., Sutton, G.G., Cree, A., bacterial genomes using CRISPR–Cas systems. Nat. Biotechnol. 31, 233–239. Ferriera, S., Gillis, M., Hemphill, L.D., Joshi, V., Kovar, C., Wetterstrand, K.A., Abouellleil, Kahn, S.D., 2011. On the future of genomic data. Science 331, 728–729. A., Wollam, A.M., Buhay, C.J., Ding, Y., Dugan, S., Fulton, L.L., Fulton, R.S., Holder, M., Karas, B.J., Molparia, B., Jablanovic, J., Hermann, W.J., Lin, Y.C., Dupont, C.L., Tagwerker, C., Hostetler, J., Sutton, G.G., Allen-Vercoe, E., Badger, J., Clifton, S.W., Earl, A.M., Farmer, Yonemoto, I.T., Noskov, V.N., Chuang, R.Y., Allen, A.E., Glass, J.I., Hutchison, C.A., Smith, C.N., Giglio, M.G., Liolios, K., Surette, M.G., Sutton, G.G., Torralba, M., Xu, Q., Pohl, C., H.O., Venter, J.C., Weyman, P.D., 2013. Assembly of eukaryotic algal chromosomes in Durkin, S., Sutton, G.G., Wilczek-Boney, K., Zhu, D.H., Jumpstart, H.M., 2010. Acatalog yeast. J. Biol. Eng. 7. of reference genomes from the human microbiome. Science 328, 994–999. Kashtan, N., Roggensack, S.E., Rodrigue, S., Thompson, J.W., Biller, S.J., Coe, A., Ding, H., Nissim, L., Perli, S.D., Fridkin, A., Perez-Pinera, P., Lu, T.K., 2014. Multiplexed and program- Marttinen, P., Malmstrom, R.R., Stocker, R., Follows, M.J., Stepanauskas, R., Chisholm, mable regulation of gene networks with an integrated RNA and CRISPR/Cas toolkit in S.W., 2014. Single-cell genomics reveals hundreds of coexisting subpopulations in human cells. Mol. Cell 54, 698–710. wild Prochlorococcus. Science 344, 416–420. Oldham, A., Drilling, H., Stamps, B., Stevenson, B., Duncan, K., 2012. Automated DNA ex- Kennedy, J., Flemer, B., Jackson, S.A., Lejon, D.P.H., Morrissey, J.P., O'Gara, F., Dobson, traction platforms offer solutions to challenges of assessing microbial biofouling in A.D.W., 2010. Marine Metagenomics: new tools for the study and exploitation of oil production facilities. AMB Express 2, 60. marine microbial metabolism. Mar. Drugs 8, 608–628. Ongley, S.E., Bian, X., Zhang, Y., Chau, R., Gerwick, W.H., Müller, R., Neilan, B.A., 2013. Keseler, I.M., Mackie, A., Peralta-Gil, M., Santos-Zavaleta, A., Gama-Castro, S., Bonavides- High-titer heterologous production in E. coli of lyngbyatoxin, a protein kinase C acti- Martínez, C., Fulcher, C., Huerta, A.M., Kothari, A., Krummenacker, M., Latendresse, vator from an uncultured marine cyanobacterium. ACS Chem. Biol. 8, 1888–1893. M., Muñiz-Rascado, L., Ong, Q., Paley, S., Schröder, I., Shearer, A.G., Subhraveti, P., Parachin, N.S., Gorwa-Grauslund, M.F., 2011. Isolation of xylose isomerases by sequence- Travers, M., Weerasinghe, D., Weiss, V., Collado-Vides, J., Gunsalus, R.P., Paulsen, I., and function-based screening from a soil metagenomic library. Biotechnol. Biofuels 4. Karp, P.D., 2013. EcoCyc: fusing model organism databases with systems biology. Philippe, N., Legendre, M., Doutre, G., Coute, Y., Poirot, O., Lescot, M., Arslan, D., Seltzer, V., Nucleic Acids Res. 41, D605–D612. Bertaux, L., Bruley, C., Garin, J., Claverie, J.M., Abergel, C., 2013. Pandoraviruses: amoe- Kiani, S., Beal, J., Ebrahimkhani, M.R., Huh, J., Hall, R.N., Xie, Z., Li, Y.Q., Weiss, R., 2014. ba viruses with genomes up to 2.5 Mb reaching that of parasitic eukaryotes. Science CRISPR transcriptional repression devices and layered circuits in mammalian cells. 341, 281–286. Nat. Methods 11, 723-U155. Poretsky, R.S., Hewson, I., Sun, S., Allen, A.E., Zehr, J.P., Moran, M.A., 2009. Comparative Kim, Y.G., Cha, J., Chandrasegaran, S., 1996. Hybrid restriction enzymes: zinc finger fusions day/night metatranscriptomic analysis of microbial communities in the North Pacific to Fok I cleavage domain. Proc. Natl. Acad. Sci. U. S. A. 93, 1156–1160. subtropical gyre. Environ. Microbiol. 11, 1358–1375. Knight, R., Jansson, J., Field, D., Fierer, N., Desai, N., Fuhrman, J.A., Hugenholtz, P., van der Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J., Glockner, F.O., Lelie, D., Meyer, F., Stevens, R., Bailey, M.J., Gordon, J.I., Kowalchuk, G.A., Gilbert, J.A., 2013. The SILVA ribosomal RNA gene database project: improved data processing and 2012. Unlocking the potential of metagenomics through replicated experimental web-based tools. Nucleic Acids Res. 41, D590–D596. design. Nat. Biotechnol. 30, 513–520. Rabausch, U., Juergensen, J., Ilmberger, N., Böhnke, S., Fischer, S., Schubach, B., Schulte, M., Kodama, Y., Mashima, J., Kaminuma, E., Gojobori, T., Ogasawara, O., Takagi, T., Okubo, K., Streit, W.R., 2013. Functional screening of metagenome and genome libraries for detec- Nakamura, Y., 2012. The DNA Data Bank of Japan launches a new resource, the DDBJ tion of novel flavonoid-modifying enzymes. Appl. Environ. Microbiol. 79, 4551–4563. Omics Archive of functional genomics experiments. Nucleic Acids Res. 40, D38–D42. Richardson, E.J., Watson, M., 2013. The automatic annotation of bacterial genomes. Brief. Kodzius, R., Gojobori, T., 2015. Single-cell technologies in environmental omics studies. Bioinform. 14, 1–12. gene (accepted for publication). Riesenfeld, C.S., Schloss, P.D., Handelsman, J., 2004. Metagenomics: genomic analysis of Konstantinidis, K.T., Rosselló-Móra, R., 2015. Classifying the uncultivated microbial majority: microbial communities. Annu. Rev. Genet. 38, 525–552. a place for metagenomic data in the Candidatus proposal. Syst. Appl. Microbiol. Rinke,C.,Lee,J.,Nath,N.,Goudeau,D.,Thompson,B.,Poulton,N.,Dmitrieff,E.,Malmstrom,R., Kryukov, K., Sumiyama, K., Ikeo, K., Gojobori, T., Saitou, N., 2012. A new database (GCD) Stepanauskas, R., Woyke, T., 2014. Obtaining genomes from uncultivated environmental on genome composition for eukaryote and genome sequences and their microorganisms using FACS-based single-cell genomics. Nat. Protoc. 9, 1038–1048. initial analyses. Genome Biol. Evol. 4, 501–512. Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N.N., Anderson, I.J., Cheng, J.-F., Darling, A., Leikoski, N., Liu, L., Jokela, J., Wahlsten, M., Gugger, M., Calteau, A., Permi, P., Kerfeld, C.A., Malfatti, S., Swan, B.K., Gies, E.A., Dodsworth, J.A., Hedlund, B.P., Tsiamis, G., Sievert, Sivonen, K., Fewer, D.P., 2013. Genome mining expands the chemical diversity of the S.M., Liu, W.-T., Eisen, J.A., Hallam, S.J., Kyrpides, N.C., Stepanauskas, R., Rubin, E.M., cyanobactin family to include highly modified linear peptides. Chem. Biol. 20, Hugenholtz, P., Woyke, T., 2013. Insights into the phylogeny and coding potential 1033–1043. of microbial dark matter. Nature 499, 431–437.

Please cite this article as: Kodzius, R., Gojobori, T., Marine metagenomics as a source for bioprospecting, Mar. Genomics (2015), http://dx.doi.org/ 10.1016/j.margen.2015.07.001 10 R. Kodzius, T. Gojobori / Marine Genomics xxx (2015) xxx–xxx

Rivas, R., Velázquez, E., Zurdo-Piñeiro, J.L., Mateos, P.F., Martınezmolina, E., 2004. Identifi- Vemuri, G.N., Aristidou, A.A., 2005. Metabolic engineering in the -omics era: elucidating cation of microorganisms by PCR amplification and sequencing of a universal and modulating regulatory networks. Microbiol. Mol. Biol. Rev. 69 (197 − +). amplified ribosomal region present in both prokaryotes and eukaryotes. Vingataramin, L., Frost, E.H., 2015. A single protocol for extraction of gDNA from bacteria J. Microbiol. Methods 56, 413–426. and yeast. Biotechniques 58, 120–125. Rusk, N., 2014. Synthetic biology: CRISPR circuits. Nat. Methods 11, 710–711. Vogt, N., 2014. Visualizing voltage. Nat. Methods 11, 710–711. Schloss, P.D., Handelsman, J., 2003. Biotechnological prospects from metagenomics. Curr. Walker, T.L., Collet, C., Purton, S., 2005. Algal transgenics in the genomic ERA1. J. Phycol. Opin. Biotechnol. 14, 303–310. 41, 1077–1093. Seligman, L.M., Chisholm, K.M., Chevalier, B.S., Chadsey, M.S., Edwards, S.T., Savage, J.H., Wang, G., Mathews, J., 2009. Potential applications of synthetic biology in marine Veillet, A.L., 2002. Mutations altering the cleavage specificity of a homing endonucle- microbial functional ecology and biotechnology, systems biology and synthetic ase. Nucleic Acids Res. 30, 3870–3879. biology. John Wiley & Sons, Inc., pp. 577–592. Sharon, I., Banfield, J.F., 2013. Genomes from metagenomics. Science 342, 1057–1058. Wang, H.H., Isaacs, F.J., Carr, P.A., Sun, Z.Z., Xu, G., Forest, C.R., Church, G.M., 2009. Sharpton, T.J., 2014. An introduction to the analysis of shotgun metagenomic data. Front. Programming cells by multiplex genome engineering and accelerated evolution. Plant Sci. 5. Nature 460, 894–898. Shi, Y., Tyson, G.W., DeLong, E.F., 2009. Metatranscriptomics reveals unique microbial Wesolowska-Andersen, A., Bahl, M., Carvalho, V., Kristiansen, K., Sicheritz-Ponten, T., small RNAs in the ocean's water column. Nature 459, 266–269. Gupta, R., Licht, T., 2014. Choice of bacterial DNA extraction method from fecal Silanskas, A., Zaremba, M., Sasnauskas, G., Siksnys, V., 2012. Catalytic activity control of material influences community structure as evaluated by metagenomic analysis. restriction endonuclease—triplex forming oligonucleotide conjugates. Bioconjug. Microbiome 2, 19. Chem. 23, 203–211. Wijffels, R.H., Barbosa, M.J., 2010. An outlook on microalgal biofuels. Science 329, Simon, C., Daniel, R., 2011. Metagenomic analyses: past and future trends. Appl. Environ. 796–799. Microbiol. 77, 1153–1161. Wilson, M.C., Mori, T., Ruckert, C., Uria, A.R., Helf, M.J., Takada, K., Gernert, C., Steffens, Sleator, R.D., 2010. The story of JCVI-syn1.0: the forty million dollar U.A.E., Heycke, N., Schmitt, S., Rinke, C., Helfrich, E.J.N., Brachmann, A.O., Gurgui, C., microbe. Bioeng. Bugs 1, 229–230. Wakimoto, T., Kracht, M., Crusemann, M., Hentschel, U., Abe, I., Matsunaga, S., Sleator, R.D., Shortall, C., Hill, C., 2008. Metagenomics. Lett. Appl. Microbiol. 47, 361–366. Kalinowski, J., Takeyama, H., Piel, J., 2014. An environmental bacterial taxon with a Smedile, F., Messina, E., La Cono, V., Tsoy, O., Monticelli, L.S., Borghini, M., Giuliano, L., large and distinct metabolic repertoire. Nature 506, 58–62. Golyshin, P.N., Mushegian, A., Yakimov, M.M., 2013. Metagenomic analysis of Worden, A.Z., Cuvelier, M.L., Bartlett, D.H., 2006. In-depth analyses of marine microbial hadopelagic microbial assemblages thriving at the deepest part of Mediterranean community genomics. Trends Microbiol. 14, 331–336. Sea, Matapan-Vavilov Deep. Environ. Microbiol. 15, 167–182. Woyke, T., Rubin, E.M., 2014. Searching for new branches on the tree of life. Science 346, Smith, J., Grizot, S., Arnould, S., Duclert, A., Epinat, J.C., Chames, P., Prieto, J., Redondo, P., 698–699. Blanco, F.J., Bravo, J., Montoya, G., Paques, F., Duchateau, P., 2006. A combinatorial Wu, J.B., Kodzius, R., Cao, W.B., Wen, W.J., 2014. Extraction, amplification and detection of approach to create artificial homing endonucleases cleaving chosen sequences. DNA inmicrofluidic chip-based assays. Microchim. Acta 181, 1611–1631. Nucleic Acids Res. 34. Wu, J.B., Kodzius, R., Xiao, K., Qin, J.H., Wen, W.J., 2012. Fast detection of genetic Smith, H.O., Hutchison, C.A., Pfannkoch, C., Venter, J.C., 2003. Generating a synthetic information by an optimized PCR in an interchangeable chip. Biomed. Microdevices genome by whole genome assembly: phi X174 bacteriophage from synthetic 14, 179–186. oligonucleotides. Proc. Natl. Acad. Sci. U. S. A. 100, 15440–15445. Yarza, P., Sproer, C., Swiderski, J., Mrotzek, N., Spring, S., Tindall, B.J., Gronow, S., Pukall, R., Sogin, M.L., Morrison, H.G., Huber, J.A., Welch, D.M., Huse, S.M., Neal, P.R., Arrieta, J.M., Klenk, H.P., Lang, E., Verbarg, S., Crouch, A., Lilburn, T., Beck, B., Unosson, C., Cardew, Herndl, G.J., 2006. Microbial diversity in the deep sea and the underexplored “rare S., Moore, E.R.B., Gomila, M., Nakagawa, Y., Janssens, D., De Vos, P., Peiren, J., Suttels, biosphere”. Proc. Natl. Acad. Sci. U. S. A. 103, 12115–12120. T., Clermont, D., Bizet, C., Sakamoto, M., Iida, T., Kudo, T., Kosako, Y., Oshida, Y., Stoddard, B.L., 2005. Homing endonuclease structure and function. Q. Rev. Biophys. 38, Ohkuma, M., Arahal, D.R., Spieck, E., Roeser, A.P., Figge, M., Park, D., Buchanan, P., 49–95. Cifuentes, A., Munoz, R., Euzeby, J.P., Schleifer, K.H., Ludwig, W., Amann, R., Suenaga, H., 2012. Targeted metagenomics: a high-resolution metagenomics approach Glockner, F.O., Rossello-Mora, R., 2013. Sequencing orphan species initiative (SOS): for specific gene clusters in complex microbial communities. Environ. Microbiol. 14, filling the gaps in the 16S rRNA gene sequence database for all species with validly 13–22. published names. Syst. Appl. Microbiol. 36, 69–73. Sugawara, H., Ogasawara, O., Okubo, K., Gojobori, T., Tateno, Y., 2008. DDBJ with new Yoshida, K., Treen, N., Hozumi, A., Sakuma, T., Yamamoto, T., Sasakura, Y., 2014. Germ cell system and face. Nucleic Acids Res. 36, D22–D24. mutations of the ascidian Ciona intestinalis with TALE nucleases. Genesis (New York) Suzuki, Y., Gojobori, T., 1999. A method for detecting positive selection at single amino 52, 431–439. acid sites. Mol. Biol. Evol. 16, 1315–1328. Zaprasis, A., Liu, Y.J., Liu, S.J., Drake, H.L., Horn, M.A., 2010. Abundance of novel and diverse Suzuki, Y., Gojobori, T., Nei, M., 2001. ADAPTSITE: detecting natural selection at single tfdA-like genes, encoding putative phenoxyalkanoic acid herbicide-degrading amino acid sites. Bioinformatics 17, 660–661. dioxygenases, in soil. Appl. Environ. Microbiol. 76, 119–128. Torres, N.V., Voit, E.O., 2002. Pathway analysis and optimization in metabolic engineering. Zarraonaindia, I., Smith, D.P., Gilbert, J.A., 2013. Beyond the genome: community-level Cambridge University Press. analysis of the microbial world. Biol. Philos. 28, 261–282. Tringe, S.G., von Mering, C., Kobayashi, A., Salamov, A.A., Chen, K., Chang, H.W., Podar, M., Zhang, J.J., Moore, B.S., 2015. Digging for biosynthetic dark matter. Elife 4. Short, J.M., Mathur, E.J., Detter, J.C., Bork, P., Hugenholtz, P., Rubin, E.M., 2005. Zhang, F., Blasiak, L.C., Karolin, J.O., Powell, R.J., Geddes, C.D., Hill, R.T., 2015. Phosphorus Comparative metagenomics of microbial communities. Science 308, 554–557. sequestration in the form of polyphosphate by microbial symbionts in marine Tu, Q., He, Z., Zhou, J., 2014. Strain/species identification in metagenomes using genome- sponges. Proc. Natl. Acad. Sci. 112, 4381–4386. specific markers. Nucleic Acids Res. Zhang, Q., Doak, T.G., Ye, Y., 2014. Expanding the catalog of cas genes with metagenomes. Tyson, G.W., Chapman, J., Hugenholtz, P., Allen, E.E., Ram, R.J., Richardson, P.M., Solovyev, Nucleic Acids Res. 42, 2448–2459. V.V., Rubin, E.M., Rokhsar, D.S., Banfield, J.F., 2004. Community structure and metab- Zhu, Q., Li, J., Ma, J., Luo, M., Wang, B., Huang, H., Tian, X., Li, W., Zhang, S., Zhang, C., Ju, J., olism through reconstruction of microbial genomes from the environment. Nature 2012. Discovery and engineered overproduction of antimicrobial nucleoside 428, 37–43. antibiotic A201A from the deep-sea marine actinomycete Marinactinospora Varaljay, V.A., Howard, E.C., Sun, S.L., Moran, M.A., 2010. Deep sequencing of a thermotolerans SCSIO 00652. Antimicrob. Agents Chemother. 56, 110–114. dimethylsulfoniopropionate-degrading gene (dmdA) by using PCR primer pairs designed on the basis of marine metagenomic data. Appl. Environ. Microbiol. 76, 609–617.

Please cite this article as: Kodzius, R., Gojobori, T., Marine metagenomics as a source for bioprospecting, Mar. Genomics (2015), http://dx.doi.org/ 10.1016/j.margen.2015.07.001