Deliverable D6.1 EMBRIC (Grant Agreement No. 654008)

Grant Agreement Number: 654008

EMBRIC

European Marine Biological Research Infrastructure Cluster to promote the Blue Bioeconomy

Horizon 2020 – the Framework Programme for Research and Innovation (2014-2020), H2020-INFRADEV-1-2014-1

Start Date of Project: 01.06.2015 Duration: 48 Months

Deliverable D6.1 EMBRIC showcases: prototype pipelines from the microorganism to product discovery (M36)

HORIZON 2020 - INFRADEV

Deliverable D6.1 EMBRIC showcases: prototype pipelines from the microorganism to product discovery Page 1 of 85

Implementation and operation of cross-cutting services and solutions for clusters of ESFRI

Grant agreement no.: 654008
Project acronym: EMBRIC
Project website: www.embric.eu
Project full title: European Marine Biological Research Infrastructure cluster to promote the Bioeconomy
Project start date: June 2015 (48 months)
Submission due date: May 2018
Actual submission date: May 2018
Work Package: WP 6 Microbial pipeline from environment to active compounds

Lead Beneficiary: CABI
Version: 9.0
Authors: SMITH David, GOSS Rebecca, OVERMANN Jörg, BRÖNSTRUP Mark, PASCUAL Javier, BAJERSKI Felizitas, HENSLER Michael, WANG Yunpeng, ABRAHAM Emily

Project funded by the European Union’s Horizon 2020 research and innovation programme (2015-2019)

Dissemination Level
PU Public
PP Restricted to other programme participants (including the Commission Services)
RE Restricted to a group specified by the consortium (including the Commission Services)
CO Confidential, only for members of the consortium (including the Commission Services) X

Abstract

The objective is to develop coherent chains of high-quality services for access to biological, analytical and data resources, deploying common underpinning technologies and practices for the route to useful compounds from marine bacteria. This prototype pipeline brings together the expertise and facilities of partners from the MIRRI, EMBRC and EU-OPENSCREEN research infrastructures to target and release the potential of microorganisms from isolation, through characterisation, to end product. Bacteria are key components of the marine environment, performing a wide range of biogeochemical and ecological functions, yet we know very little about them. It is estimated that less than 1 percent have been seen in culture, and their vast potential remains locked away. Molecular ecology has provided us with a picture of what might be there and an idea of the chemistry they may perform. The pipeline aimed to unlock this potential. Utilizing the specialist expertise and facilities of consortium laboratories, organisms that are difficult to cultivate or have yet to be grown were targeted, characterised and prepared for scale-up and production of active compounds. DSMZ undertook the task of making difficult-to-culture bacterial strains amenable to subsequent natural compound analyses. Optimal growth conditions for some of these organisms were determined and, as a result, bacterial strains with novel properties were isolated and grown. Following de-replication of potential clones, DSMZ isolated 264 species of slow-growing bacteria from seawater, sediment and sponges of the Pacific Ocean, representing the phyla Actinobacteria, Bacteroidetes and Rhodothermaeota. The most interesting of these will be selected for cosmid library production. Two Arcobacter species were fermented and organic extracts were supplied to HZI for analysis. USTAN had the role of improving the efficiency and effectiveness of a harmonised natural product discovery pipeline.
Cosmid libraries were established for test strains and put through heterologous platforms to determine the presence of compounds. Nine gene clusters and a series of compounds were detected. Improvements to the process were made using elicitors and promoters, and some difficult cyclic peptides were characterised. HZI improved the efficiency of different extraction and characterization techniques for use on different types of organisms and, as a result, isolates producing interesting compounds were discovered. Once the pipeline through the partner infrastructures had been fine-tuned, fully sequenced samples were put through it, demonstrating that interesting compounds could be discovered. CABI focused on the regulatory environment, addressing issues around access to genetic resources. The prototype pipeline resulted in a flexible system for accessing expertise and facilities across Research Infrastructures to meet specific user-defined demands of the research community. In a single experiment to develop the EMBRIC microbial prototype pipeline, 264 rarely isolated species of bacteria were made available for study of their bioactive compounds. This demonstrates that coordinated and targeted isolation programmes, engaging research teams from the many partners in the Research Infrastructures and orchestrated for specific bioindustry needs, could yield many thousands of candidate organisms. The potential for discovery of new, interesting bioactive compounds is thus substantially increased.

Contents

Abstract
1. Introduction
1.1 Microorganism prototype pipeline
2. Protocols for isolation, selection, growth of strains and extract supply
2.1. Sample collection and processing
2.3. Taxonomic affiliation of isolates
2.4. Maintenance procedures
2.5. Fermentation of bacteria for natural product production
3. The genomic characterisation, cosmid library production and analysis
3.1 Gene Clusters Prediction
3.2 Genomic DNA Preparation
3.3 Vector DNA Preparation
3.4 Construction of Cosmid Library
3.5 Transformation of Gene Clusters and Secondary Metabolite Heterologous Expression
4. Analysis of extracts
4.1 Analysis of extracts at USTAN
4.2 Analysis of extracts at HZI
5. Regulatory Environment
5.1 Regulatory Guidance and Community Best Practices
5.2 Regulations impacting on collection, handling, use and distribution of organisms
5.3 Health and Safety
5.4 Microorganisms as hazardous substances
5.5 Classification of Microorganisms on the Basis of Hazard
5.6 Quarantine regulations
5.7 Postal Regulations and Safety
5.8 Packaging
5.9 Regulations governing distribution of cultures
5.10 Control of Distribution of Dangerous Organisms
5.11 Biological Weapons Convention
5.12 Export Licensing Measures
5.13 Control procedures: UKNCC control of Dangerous Pathogens
5.14 Convention on Biological Diversity
5.15 Ownership of Intellectual Property Rights (IPR)
5.16 Safety information provided to the recipient of microorganisms
5.17 Areas Beyond National Jurisdiction (ABNJ)
6. Outputs from the microorganism prototype pipeline
6.1 Strains collected from the Pacific Ocean by DSMZ
6.2 Cosmid libraries and biosynthetic gene clusters at USTAN
6.3 Characterisation of extracts at HZI
6.4 Significance of outputs
7. EMBRIC pathways to discovery
8. Summary
9. References and bibliography

1. Introduction

The European Marine Biological Research Infrastructure Cluster (EMBRIC) is designed to accelerate the pace of scientific discovery and innovation from marine bioresources. EMBRIC aims to promote new applications derived from marine organisms in fields such as drug discovery, novel foods and food ingredients, aquaculture selective breeding, bioremediation, cosmetics and bioenergy (http://www.embric.eu/). Researchers design their own experiments to lead to the discovery of novel properties or products, but such programmes can be enhanced by using the facilities available through European Research Infrastructures. EMBRIC addresses the steps to discovery, from selection and isolation of marine microorganisms to the characterisation and isolation of bioactive compounds. EMBRIC tests the different routes that can be followed and demonstrates the advantages of engaging with the most appropriate technologies available. Additionally, there are numerous areas where policy, regulations or legislation affect the isolation, use and distribution of organisms. These include health and safety, quarantine regulations, shipping, packaging, the distribution of cultures (including dangerous pathogens), export licensing measures, ownership of Intellectual Property Rights (IPR) and more. Compliance with this regulatory environment may be complicated and ever-changing, but it is not new. Individuals and their institutions must address it, and in many cases community best practices or common approaches have helped them. For example, culture collections, through the European Culture Collections’ Organisation (ECCO) and the World Federation for Culture Collections (WFCC), have supported their members in compliance, particularly in keeping track of shipping regulations and, more recently, the Nagoya Protocol. In the context of EMBRIC, two of the infrastructures (MIRRI and EMBRC) have developed extensive support tools and advice.
This report examines existing initiatives and identifies gaps in support in order to facilitate research and development within a compliant operational framework. The main focus of EMBRIC, however, was the pipeline itself: demonstrating how appropriate technologies could be harnessed with user needs in mind to help accelerate the discovery process. An initial small group of four laboratories was selected to work together to demonstrate how workflows through different centres and across Research Infrastructures could be optimised.

1.1 Microorganism prototype pipeline

Discovery and exploitation of natural compounds from bacteria and fungi

The aim of the EMBRIC microorganism prototype pipeline was to coordinate the expertise, technologies and facilities of Research Infrastructures to harness the most appropriate processes to accelerate discovery of new products from marine bioresources. Individual research institute or company pipelines have been developed, but they are often restricted by the in-house technologies and expertise available. The objective of the EMBRIC initiative was to develop coherent chains of high-quality services for access to biological, analytical and data resources, deploying common underpinning technologies and practices for the route to useful secondary metabolites from marine bacteria. The prototype pipeline brought together expertise and facilities of partners from the MIRRI, EMBRC and EU-OPENSCREEN research infrastructures to target and release the potential of the organisms from isolation, through characterisation, to end product. Bacteria are key components of the marine environment, performing a wide range of biogeochemical and ecological functions, yet we know very little about them. It is estimated that less than 1 percent have been seen in culture, and their vast potential remains locked away. Population genomics has provided us with a picture of what might be there and an idea of the chemistry they may perform. EMBRIC’s microorganism prototype pipeline (Fig. 1) aimed to unlock this potential. Utilizing the specialist expertise and facilities of key laboratories, organisms that are difficult to cultivate or have yet to be grown were targeted, characterised and prepared for scale-up and production of active compounds.

Figure 1. An overview of the EMBRIC microorganism prototype pipeline

Cultivation and culture collection
● Improved targeted isolation and characterization of strains (substrate pulse experiment)
● Biomass production
● Fermentation and extraction scale up

EMBRIC partners are enhancing and developing new tools for isolating yet-to-be-cultured microorganisms. For example, DSMZ has used substrate pulse techniques on marine microbial communities to identify active cells, and has also employed a biofilm tool that has facilitated microbial isolation. Challenging cells with specific substrates can provide some indication of what may be required in growth media and can also help target organisms with specific properties. The expertise of the microbiologists across EMBRC and MIRRI can be harnessed to facilitate the creation of sufficient biomass from organisms hitherto difficult or impossible to cultivate.

Determination of organism potential

Working with slow-growing marine organisms can be laborious and expensive in time and resources, so it is important that effort focuses on the organisms with potential. Numerous methodologies can be applied to characterise and discover potential; the microorganism prototype pipeline explores and evaluates some of them, for example:
● Characterizing the metabolome; GC/MS and LC/MS(/MS) pipeline; correlating metabolism with gene expression data;
● Genome reading to determine biosynthetic gene clusters (BGCs);
● Comparing approaches to unlocking natural product biosynthesis using:
  o Chemical elicitors (rapid but untargeted);
  o Heterologous expression (cosmid and BAC libraries): slow but precise and affording opportunity for further manipulation.
● Liquid chromatography-tandem mass spectrometry (LC-MS/MS), Proton (Hydrogen) nuclear magnetic resonance (HNMR), Carbon-13 nuclear magnetic resonance (CNMR), Heteronuclear Single Quantum Correlation (HSQC) and Nuclear Overhauser Effect Spectroscopy (NOESY) characterization

Metabarcoding technology can characterise the species compositions of mass samples of environmental DNA.

Extract production
● Test efficiency of different extraction and characterization techniques;
● Selection of phenotypic assays; peptide arrays; mutant generation and sequencing; chemical probes;
● Develop specific purification procedures for each molecule (extraction/isolation of example natural products).

The extraction method limits the compounds available in the sample for assay; EMBRIC expertise can identify the appropriate stage of the organism’s life cycle or the triggers needed to express the target chemistry. Identifying the active component of the extract requires different technologies; the resources of RIs can be accessed to carry out processes efficiently whilst utilizing their expertise, e.g. EU-OPENSCREEN provides infrastructure for Chemical Biology and its translation to medicine, agriculture and bioindustries. HZI has protocols for the detection and identification of different metabolite classes, from small volatile compounds (e.g. short-chain fatty acids) to larger polar and nonpolar compounds of primary and secondary metabolism. Different state-of-the-art routes to access, and then heterologously express, identified biosynthetic gene clusters encoding novel chemistries are being explored and compared. Selected metabolites are purified by reversed-phase and normal-phase chromatography and structurally characterised by NMR, and their biological profile is established in phenotypic assays (e.g. antibacterial, antifungal, cytotoxic).
Purification of characterised compounds
● Mode of action analysis for isolated and structurally characterised compounds
● Characterise the metabolome by untargeted GC/MS and LC/MS
● Correlate metabolism with gene expression data
● Test efficiency of different extraction and characterization techniques
● Test extracts and pure compounds in cellular bioassays
● For bioactive compounds: mode of action studies, peptide arrays, mutant generation and sequencing
● Links to relevant data sets outside the partners to add value

Compound identification in GC-MS analyses is carried out using the NIST’11 spectral library; a GC-MS in-house library Golm Metabolome database is available for comparative purposes. For the LC-MS/MS workflow, HZI uses publicly available and commercial (metaboBASE) databases, 610 in-house standards and a self-programmed MS2 spectra clustering algorithm.

Legal framework

There is extensive legislation concerning the safe handling, use and distribution of microorganisms at the national, regional and international levels. Microorganisms of hazard groups 2, 3 and 4 are hazardous substances and as such fall under the EU Biological Agents Directive. They are also dangerous goods as defined by the International Air Transport Association (IATA) Dangerous Goods Regulations, which define the requirements for their packaging. The potential misuse of microorganisms has led to biosecurity control measures that restrict access to dangerous pathogens. Most recently, the Nagoya Protocol is being implemented by nations to ensure that benefit sharing occurs as a result of accessing genetic resources. The legislation and supporting documents are often difficult to find and understand. EMBRIC offers advice and interpretation to help the implementation of such legislation. Examples are:
● Health and Safety
● Classification of Microorganisms on the Basis of Hazard
● Quarantine regulations
● Postal Regulations and Safety
● Packaging
● Regulations governing distribution of cultures
● Legislation on the Proliferation, Distribution and Misuse of Dangerous Pathogens
● Export Licensing Measures
● Convention on Biological Diversity and the Nagoya Protocol
● Ownership of Intellectual Property Rights (IPR)
● Safety information provided to the recipient of microorganisms

Access Providers available for the full microorganism prototype pipeline:
● EMBRC: University of St Andrews, School of Chemistry
● MIRRI: Leibniz-Institut DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Germany; CABI - Centre for Agriculture and Biosciences International, UK
● EU-OPENSCREEN: Helmholtz Centre for Infection Research, Germany

2. Protocols for isolation, selection, growth of strains and extract supply

The cultivation of bacteria is highly biased toward a few phylogenetic groups. However, many of the currently underexplored bacterial lineages are likely to have novel biosynthetic pathways (Overmann et al., 2017). Access to those enigmatic taxa would facilitate the discovery of bioactive molecules with new scaffolds or targets, avoiding the re-discovery of known natural products. In this regard, we have developed improved cultivation methods based on a better understanding of the ecology of previously uncultured bacteria. Below, we detail the methodology followed for the isolation of bacterial strains from marine samples, and their subsequent taxonomic identification, preservation, selection for further steps, fermentation and generation of an organic extract collection.

2.1. Sample collection and processing

Seawater samples were collected close to the water surface (1 m depth) in sterile Nalgene bottles. Marine sediments were sampled by divers or with a small crane and directly transferred to sterile 50 ml Falcon tubes. Samples were kept at 4 ºC and processed within 10 h after sampling. Subsamples were fixed in 2% (v/v) glutaraldehyde. For bacterial isolation, sediment samples were dispersed by vortexing in 10 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) buffered at pH 7.3.

2.2. Cultivation strategies

For bacterial isolation, three complementary strategies were applied in order to maximise the diversity of isolated strains.

2.2.1 Single dilution high-throughput in liquid media

This strategy applied a high-throughput cultivation approach based on (i) liquid oligotrophic media, (ii) a low concentration of inocula, in order to exclude less abundant but fast-growing bacteria, and (iii) long incubation times. These three factors allow access to difficult-to-grow bacteria (Connon and Giovannoni, 2002; Pascual et al., 2015; Overmann et al., 2017).
Parallel liquid cultures were set up in 96-well microtiter plates. Before inoculation, total bacterial cell numbers were determined for each natural sample by fluorescence microscopy after staining with SYBR Green I (Life Technologies, Ltd, Paisley, UK). Each well of the microtiter plates was filled with 180 µl of medium and subsequently inoculated with 20 µl of inoculum containing 10 or 50 cells (Bruns et al., 2003). The plates were filled and inoculated either by hand using multichannel pipettes or automatically using the Thermo Scientific™ Multidrop™ Combi Reagent Dispenser. The outer wells of each plate (36 wells) were not inoculated and served as negative controls. Five different culture media were used for the bacterial enrichment and isolation: (i) Artificial Sea Water (ASW)/HD 1:10; (ii) medium “polymer mix”; (iii) medium “insoluble humic analogs”; (iv) medium “soluble humic analogs” and (v) medium Soil Extract (SE)/HD 1:10 (see Annexe 1 - Additional Information). Plates were incubated at 15 °C in the dark for 6-12 weeks. After incubation, the bacterial community grown in each well was analyzed by a barcoded Illumina paired-end sequencing method targeting the 16S ribosomal RNA V1-2 hypervariable region (Camarinha-Silva et al., 2014). The taxonomy of the reads was assigned against the SILVA database (v.128) (Quast et al., 2013) with UCLUST (Edgar, 2010). A selective isolation strategy was then carried out according to the taxonomic structure of the bacterial community in each well. Aliquots of each culture were plated on the above-described medium solidified with 0.8% gelrite (w/v) [SERVA, Heidelberg, Germany]. After incubation for 4-6 weeks, several representative colonies were picked from each plate and purified by three additional passages on the corresponding solidified medium.

2.2.2 Direct plating method

The direct plating method was used for the enrichment of slow-growing marine bacteria that need interactions with other microorganisms for growth. Those interactions may be related to the synthesis and subsequent release of metabolites or signal molecules into the surrounding medium (Overmann, 2013). This approach is limited to bacteria able to produce (micro-)colonies on solid media. Five different culture media solidified with 0.8% gelrite (w/v) were used for the bacterial enrichment: (i) ASW/HD 1:10; (ii) medium “polymer mix”; (iii) medium “insoluble humic analogs”; (iv) medium “soluble humic analogs” and (v) medium Soil Extract (SE)/HD 1:10 (see Annexe 1 - Additional Information). Experiments were carried out in 90 mm Ø Petri dishes. Tenfold serial dilutions of the natural samples were performed in Artificial Sea Water medium (ASW; modified from Bruns et al., 2003) or Artificial Brackish Water (ABW) (see Additional Information). Subsequently, 100 µl of the 10⁻³ or 10⁻⁴ dilution was added to the culture medium surface and spread with a Drigalski spreader. Plates were incubated at 15 °C in the dark for 6-12 weeks. After incubation, several representative colonies were picked from each plate and purified by three additional passages on the corresponding solidified medium.
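
The inoculation scheme of sections 2.2.1 and 2.2.2 rests on simple dilution arithmetic. The following Python sketch is illustrative only and is not part of the DSMZ protocol: it estimates how strongly a natural sample would have to be diluted so that a 20 µl inoculum carries the target number of cells, and lists the outer wells of a 96-well plate that serve as negative controls. The function names and the example cell count are hypothetical.

```python
from string import ascii_uppercase


def inoculum_dilution_factor(cells_per_ml, target_cells, inoculum_ul=20.0):
    """Fold-dilution of the sample so that `inoculum_ul` holds `target_cells`.

    `cells_per_ml` is the total cell count of the natural sample
    (e.g. from SYBR Green I counts); the value used below is hypothetical.
    """
    if cells_per_ml <= 0 or target_cells <= 0 or inoculum_ul <= 0:
        raise ValueError("all arguments must be positive")
    # Concentration (cells per mL) needed in the diluted inoculum.
    target_cells_per_ml = target_cells * 1000.0 / inoculum_ul
    return cells_per_ml / target_cells_per_ml


def outer_wells(rows=8, cols=12):
    """Names of the wells on the rim of a plate (A1 ... H12 for 96 wells)."""
    names = []
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                names.append(f"{ascii_uppercase[r]}{c + 1}")
    return names


if __name__ == "__main__":
    # Hypothetical sample with 1e6 cells per mL, target of 10 cells per well.
    print(inoculum_dilution_factor(1e6, 10))
    print(len(outer_wells()))
```

For a standard 8 × 12 plate the rim contains 36 wells, which matches the number of negative-control wells stated in section 2.2.1.
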
2.2.3 Growth in biofilms

For the enrichment and isolation of biofilm-forming bacteria, the methodology described by Gich et al. (2012) was adapted to marine samples. Solid surfaces may stimulate cell division and growth of starved bacteria (Kjelleberg et al., 1982; Overmann, 2013). Strips consisting of different inert solid materials (steel, glass, polypropylene or polystyrene) were employed and incubated in 20 ml glass vials (Fig. 2). Solid surfaces were incubated in ASW/HD 1:10 and inoculated with 1000 cells from the natural samples. Vials were incubated at 15 °C for 8 weeks. Three sequential enrichments were done, transferring the solid surfaces to fresh medium every third month and incubating the cultures at room temperature. Finally, the biofilm that had formed on the surface of the strips was spread onto ASW/HD 1:10 solidified with gelrite [0.8% (w/v)], and cultures were purified by subsequent re-streaking.

Figure 2. 20 ml-glass vials used for the enrichment and isolation of biofilm-forming bacteria - Two parallel steel strips are submerged in ASW/HD 1:10.

2.2.4 Chemotaxis chambers

Another cultivation approach for the enrichment and subsequent isolation of marine bacteria exploited the chemotactic responses of bacteria to specific nutrients (Adler 1973; Fröstl & Overmann, 1998; Overmann, 2005). Although restricted to motile and chemotactically active microorganisms, this technique can cover a considerable fraction of species, particularly in bacterioplankton communities (Overmann, 2005). For the chemotaxis assays, glass capillaries loaded with defined substrate solutions were inserted into a suspension of motile microorganisms, and the accumulation of cells at the opening of or within the capillary was monitored by direct or indirect methods (Overmann, 2005). The substrates used for the isolation of marine bacteria are listed in Annexe 1 as Additional Information. The concentration of attractants has been found to be a critical parameter for observing a chemotactic response of bacteria (Jaspers, 2000). Experiments were set up in small microscopic chambers (Fig. 3; modified from Overmann, 2005), which were prepared using small 21 × 21 × 0.17-mm coverslips as spacers between the microscope slide and the lid, which consisted of another 60 × 24 × 0.17-mm coverslip. Spacers and the lid were fixed by sealing the two short edges and one of the long edges of the chamber with a paraffin/mineral oil mixture (4:1, v/v). Flat rectangular glass capillaries with a length of 50 mm, inside dimensions of 0.1 × 1.0 mm and a capacity of 5 µL (Vitrocom, Mountain Lakes, NJ) were used for most applications. These capillaries fit exactly into the opening of the chemotaxis chamber, and their specific geometry permits direct light microscopic examination of their contents. For marine samples, the small microscopic chambers were incubated at room temperature for 3 hours. After incubation, the capillaries were removed from the chambers.
For direct microscopy of the accumulated microorganisms, the open end of each capillary was immediately sealed with plasticine. Subsequently, bacterial cells trapped in the capillaries could be transferred to Petri dishes or 96-well plates filled with ASW/HD 1:10 by exerting positive pressure with a pipette from one end of the capillary.
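
The stated capillary capacity follows directly from the stated inner dimensions, since 1 mm³ equals 1 µL. A minimal Python check, assuming a perfectly rectangular inner cross-section (the function name is illustrative and not from the source):

```python
def capillary_volume_ul(length_mm, inner_height_mm, inner_width_mm):
    """Inner volume of a flat rectangular capillary in microlitres.

    Assumes an ideal rectangular cross-section; 1 mm^3 equals 1 uL.
    """
    return length_mm * inner_height_mm * inner_width_mm


if __name__ == "__main__":
    # Dimensions given in section 2.2.4: 50 mm long, 0.1 x 1.0 mm inside.
    print(round(capillary_volume_ul(50.0, 0.1, 1.0), 3))
```

The result agrees with the 5 µL capacity quoted for the capillaries.
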

Figure 3. Laboratory‐made microscopic chamber - Two sides and the back of the chamber are sealed with paraffin/mineral oil (not shown). Flat glass capillaries are loaded with substrate solutions, sealed at one end with plasticine, and inserted into the chamber through the opening in the front.

2.3. Taxonomic affiliation of isolates

The taxonomic affiliation of all axenic bacterial isolates was investigated by sequencing their 16S rRNA gene. The almost full-length 16S rRNA gene of each strain was amplified directly by colony PCR using the primer pair 8f (5´-AGAGTTTGATCCTGGCTCAG-3´) (Galkiewicz et al., 2008) and 1492r (5´-GGTTACCTTGTTACGACTT-3´). PCR mixtures included 2.0 µL PCR buffer (10x), 0.8 µL MgCl2 (25 mM), 0.4 µL BSA (20 mg mL-1), 0.4 µL dNTPs (10 mM each), 0.08 µL each of forward and reverse primers (50 pmol µL-1), 0.08 µL Dream Taq DNA polymerase (5 U µL-1, Thermo Scientific) and 1.0 µL template in a total volume of 20 µL. The template was prepared by picking colonies, adding a stab of each colony to 20 µL of water and applying three freeze/thaw cycles (-20°C/microwave oven). The thermal cycling program consisted of: (i) 10 min at 94ºC; (ii) 32 cycles of 30 s at 94ºC, 30 s at 56ºC and 1 min at 72ºC; and (iii) a final elongation step of 7 min at 72 ºC. PCR products were purified and Sanger sequenced using the above primer pair and the internal primers 1055f (5´-ATGGCTGTCGTCAGCT-3´) (Lane, 1991) and 341r (5´-CTGCTGCCTCCCGTAGG-3´) (Muyzer et al., 1993), employing the AB 3730 DNA analyser (Applied Biosystems, Foster City, CA) and the AmpliTaq® FS BigDye® Terminator Cycle Sequencing Kit. Subsequently, the 16S rRNA sequences were analyzed with the online database EzBioCloud (Yoon et al., 2017). A first selection of interesting strains was made based on their taxonomic assignment and novelty. Isolates belonging to underrepresented phyla (e.g. Acidobacteria, Gemmatimonadetes, Verrucomicrobia, Chloroflexi and Armatimonadetes, among others), as well as those representing novel species, genera or higher taxonomic ranks, were prioritised for successive analyses. The taxonomic criterion was complemented with molecular data derived from genome sequencing and cosmid library production (see chapter 3 “The genomic characterization, cosmid library production and analysis”).

2.4. Maintenance procedures

Strains were routinely grown in ASW/HD 1:10 at 15 ºC and sub-cultured every 1.5–2 months. For cryopreservation in liquid nitrogen or at −80°C, cell suspensions in ASW/HD 1:10 were supplemented with 5% DMSO (v/v) or 25% glycerol (v/v) and immediately shock-frozen.
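
The colony PCR recipe in section 2.3 can be tabulated and scaled for a batch of isolates. The Python sketch below is illustrative and not part of the source protocol. It uses the per-reaction volumes given above, and it makes two assumptions that the source does not state: that the remainder of the 20 µL reaction is water, and that a 10% pipetting excess is added to the master mix. It also sums the nominal hold times of the cycling program, ignoring ramping.

```python
REACTION_VOLUME_UL = 20.0
TEMPLATE_UL = 1.0  # added per tube, not part of the master mix

# Per-reaction volumes (uL) as listed in section 2.3.
COMPONENTS_UL = {
    "PCR buffer (10x)": 2.0,
    "MgCl2 (25 mM)": 0.8,
    "BSA (20 mg/mL)": 0.4,
    "dNTPs (10 mM each)": 0.4,
    "primer 8f (50 pmol/uL)": 0.08,
    "primer 1492r (50 pmol/uL)": 0.08,
    "Dream Taq polymerase (5 U/uL)": 0.08,
}


def water_per_reaction():
    """Volume left for water; assumes water fills the reaction to 20 uL."""
    return REACTION_VOLUME_UL - TEMPLATE_UL - sum(COMPONENTS_UL.values())


def master_mix(n_reactions, excess=0.10):
    """Master-mix volumes (uL) for n reactions plus a pipetting excess.

    The 10% default excess is an assumption, not a value from the source.
    """
    scale = n_reactions * (1.0 + excess)
    mix = {name: round(vol * scale, 2) for name, vol in COMPONENTS_UL.items()}
    mix["water (assumed)"] = round(water_per_reaction() * scale, 2)
    return mix


def cycling_minutes(cycles=32):
    """Nominal hold time: 10 min, then cycles of 30 s + 30 s + 1 min, then 7 min."""
    return 10.0 + cycles * (0.5 + 0.5 + 1.0) + 7.0


if __name__ == "__main__":
    for name, volume in master_mix(24).items():
        print(f"{name}: {volume} uL")
    print(f"nominal run time: {cycling_minutes()} min")
```
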

2.5. Fermentation of bacteria for natural product production Growth ranges and optima of temperature, pH and salinity were determined for each strain as well as its growth kinetic. All analyses were carried out in liquid ASW/HD 1:10 under toxic conditions as described before (Pascual et al., 2015). Growth was determined by measuring the optical density at 660 nm (OD660). The optimum growth was defined as ≥75% of the highest growth rate achieved. For the fermentation of marine strains, 2 × 100 mL cultures containing liquid ASW/HD 1:10 (in 250 mL Erlenmeyer flasks) were inoculated with 3.0 ml (3% v/v final culture volume) of a seed culture. Depending on the growth kinetics and optimum condition of each strain, the cultures were fermented at 15 - 25°C for 3-15 days, on a rotary shaker at 180 rpm. The growth medium was supplemented with 2% pre-washed Amberlite® 16 XAD resin (w/v). This hydrophobic polyaromatic resin adsorbs and releases ionic species through hydrophobic and polar interactions, increasing the diversity and amount of specific secondary metabolites released by the strain during growth (González-Menendez et al., 2014). The resins were washed in twice their volume of methanol by stirring for 1 h, followed by six washes with distilled water, and were finally incubated in distilled water at 4°C for 48 h. Afterward, the resin was vacuum filtered through a 125 mm diameter and 6 µm pore size filter. Once filtered, the resins were oven dried at 75°C, and 0.3 g of each resin were added to the fermentation vials before autoclaving for 21 min at 121°C. After fermentation, the well-grown culture (2 × 100 ml) was sieved through a metal sieve and the bacterial cells and adsorptive polymeric resin were washed with distilled water and subsequently dried with paper. The dried resin and biomass were added to an Erlenmeyer flask containing 70 ml of acetone and shaken at 180 rpm at 25ºC in dark chamber overnight. 
The acetone was then filtered through a folded filter into a 250 mL round-bottom flask, dried in a rotavapor at 40 °C and finally re-suspended in 2.2 mL methanol. 600 µL of the organic extract was centrifuged for 3 min at 11,000 rpm. The resulting supernatant was used for biological testing and LC-MS analysis. For LC-MS analysis, 0.1 mL of the organic extract was diluted with an additional 0.1 mL of methanol and stored at -20 °C until further use. For biological testing, the organic extracts were either used directly for inhibition zone assays or dried again and subsequently re-dissolved in 100 µL of medium for a tetrazolium reduction (MTT) assay. For longer periods, the organic extract collection was stored at -80 °C. Additional information on the media formulae used at DSMZ is given in Annex 1.
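The optimum-growth criterion used in this section (≥75% of the highest growth rate achieved) can be sketched as a small helper. This is an illustrative sketch only: the function name and the example growth rates are hypothetical and are not data from this deliverable.

```python
def optimum_conditions(growth_rates, threshold=0.75):
    """Return the conditions whose growth rate is >= threshold * highest rate.

    growth_rates maps a condition (e.g. temperature in °C) to a growth rate.
    """
    if not growth_rates:
        return []
    best = max(growth_rates.values())
    return sorted(c for c, rate in growth_rates.items() if rate >= threshold * best)


# Hypothetical growth rates (per hour) measured at different temperatures (°C)
rates_by_temperature = {10: 0.02, 15: 0.06, 20: 0.09, 25: 0.10, 30: 0.05}
print(optimum_conditions(rates_by_temperature))
```

The same helper could be applied unchanged to pH or salinity series, since only the relative growth rates matter.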

3. The genomic characterisation, cosmid library production and analysis

The bacterial genome sequences were analysed using antiSMASH to identify putative novel gene clusters and to predict their products. Genomic DNA was then extracted and used to construct a cosmid library for each strain. Gene clusters predicted by antiSMASH were then targeted for heterologous expression. To do this, the cosmid library was screened by PCR. Once a cosmid containing a gene cluster of interest had been identified, it was conjugated into Streptomyces coelicolor and Streptomyces lividans for the heterologous production of novel natural products. The new compounds were then analysed using LC-MS/MS.

3.1 Gene Clusters Prediction

Secondary metabolite biosynthesis gene clusters encoded within bacterial and fungal genomes can be identified, annotated and analysed rapidly using antiSMASH, which integrates and cross-links with a large number of in silico secondary metabolite analysis tools. The genome sequence of each strain is first submitted to antiSMASH. The software predicts all encoded secondary metabolite biosynthesis gene clusters and the key genes for secondary metabolite production, and also reports each gene cluster's similarity to known gene clusters and the possible structure of the secondary metabolite. After the prediction, cosmid libraries of Saccharothrix espanaensis DSM 44229 and Amycolatopsis japonica MG417-CF17 were constructed, and a BAC library of S. espanaensis DSM 44229 was constructed by the company Bio S&T (see section 6.3 for further information).

3.2 Genomic DNA Preparation

3.2.1 Isolation of Genomic DNA

Genomic DNA was isolated from each bacterial strain in order to make a cosmid library for that strain. Large DNA fragments are required for constructing a cosmid library: the genomic DNA must be around 150 kb before digestion. Bacterial cells are harvested by

centrifugation (4000 rpm) for 5 min. The cells are then re-suspended in 10 ml lysis solution (0.3 M sucrose, 25 mM EDTA, 25 mM Tris-HCl pH 7.5, 2 U RNase). 10 mg lysozyme is added to the cell suspension, which is incubated at 37 °C for 30 min. Next, 1 ml 10% (w/v) SDS containing 5 mg proteinase K is added and the mixture is incubated at 55 °C for 1.5 h. 3.6 ml NaCl (5 M) and 15 ml chloroform are added and the tube is rotated for 20 min. The aqueous phase is then transferred into a clean tube and 1 volume of isopropanol is added to precipitate the DNA. Finally, the DNA is washed with 70% ethanol, dried at room temperature and dissolved in water.

3.2.2 Partial Digestion of the Genomic DNA

To make a cosmid library, the genomic DNA is partially digested into fragments of 30-42 kb, which are ligated into the cosmid vector and packaged into bacteriophage particles. The partial digestion conditions need to be optimised for each bacterial strain. As a test reaction, 10 μg of genomic DNA and 10 μl of 10× restriction digest buffer were added to a microcentrifuge tube and water was added to a final volume of 100 μl. Samples were pre-equilibrated at 37 °C for 5 minutes and 0.5 U of the frequent cutter Sau3A I was added. Next, 15 μl aliquots were removed from the digestion at 0, 5, 10, 20, 30 and 45 minutes and analysed by gel electrophoresis. After the optimal digestion time had been determined, the optimised partial digest was performed on 100 μg of genomic DNA in a 1 ml total reaction volume. DNA fragments in the range of 30-42 kb were then gel purified.

3.2.3 Dephosphorylation of the Partially Digested Genomic DNA

The genomic DNA was dephosphorylated so that it could be ligated into the SuperCos 1 vector to make the cosmid library. Once the partially digested genomic DNA had been purified, 2 μg of DNA was dephosphorylated by adding 2 U of FastAP and diluting the mixture to a final volume of 100 μl.
The reaction mixture was incubated at 37 °C for 1 hour and then at 68 °C for 15 minutes to inactivate the FastAP. Finally, the dephosphorylated genomic DNA was purified and re-suspended to 1 μg/μl in water.

3.3 Vector DNA Preparation

SuperCos 1 was used as the cosmid vector. SuperCos 1 is a 7.9 kb cosmid vector that contains bacteriophage promoter sequences flanking a unique cloning site (see Fig. 4). The SuperCos 1 vector is also engineered to contain genes for the amplification and expression of cosmid clones in eukaryotic cells.
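Returning to the partial digestion of section 3.2.2, the step from the 100 μl test reaction to the preparative digest of 100 μg DNA can be written out as simple arithmetic. The sketch below assumes that buffer, enzyme and total volume are all scaled in proportion to the amount of DNA; the report states only the DNA amount and the 1 ml final volume for the preparative digest, so the scaled enzyme and buffer amounts are assumptions, and the function name is hypothetical.

```python
def scale_digest(test_reaction, target_dna_ug):
    """Scale every component of a test digest in proportion to the DNA amount."""
    factor = target_dna_ug / test_reaction["dna_ug"]
    return {component: amount * factor for component, amount in test_reaction.items()}


# Test reaction as described in section 3.2.2
test_reaction = {
    "dna_ug": 10.0,
    "buffer_10x_ul": 10.0,
    "enzyme_u": 0.5,
    "total_volume_ul": 100.0,
}

# Preparative digest on 100 µg of genomic DNA (proportional scaling is assumed)
preparative = scale_digest(test_reaction, 100.0)
print(preparative)
```

Under this assumption the DNA concentration stays at 0.1 μg/μl, which is consistent with the 1 ml total reaction volume given in the text.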

Figure 4. Circular map and features of the SuperCos 1 cosmid vector

To prepare the vector, 25 μg of the SuperCos 1 cosmid vector was digested with 10 U/μg of Xba I and 1 U/μg of FastAP in a total volume of 500 μl under standard buffer conditions for 1 hour at 37 °C. The reaction mixture was then incubated for 15 min at 65 °C. Next, 10 U/μg of BamHI was added and the mixture was incubated for a further 1 hour at 37 °C. Finally, the DNA was purified and dissolved in sterile water.

3.4 Construction of Cosmid Library

3.4.1 Ligation of DNA

1 μg of digested SuperCos 1 vector and 2.5 μg of partially digested gDNA were ligated using 2 U of T4 DNA ligase, incubating at 24 °C overnight.

3.4.2 Packaging of DNA

100 μl of packaging extract was thawed quickly by holding the tube until its contents just began to thaw. 4 μl of ligated DNA was added to the packaging extract, and the tubes were incubated at room temperature (22 °C) for 2 hours. Next, 500 μl of SM buffer (5.8 g of NaCl, 2.0 g of MgSO4 · 7H2O, 50.0 ml of 1 M Tris-HCl (pH 7.5), 5.0 ml of 2% (w/v) gelatin, water to a final volume of 1 L) was added to the tube. Then 20 μl of chloroform was added and the tube was spun briefly. The supernatant containing the phage was then ready for titering.

3.4.3 DNA Transfer and inoculation of cosmid library

The VCS257 glycerol stock was streaked onto LB agar plates, which were incubated overnight at 37 °C. The next day, LB medium supplemented with 10 mM MgSO4 and 0.2% (w/v) maltose was inoculated with a single colony and grown at 37 °C with shaking for 4–6 hours. The culture was then centrifuged at 500 × g for 10 minutes and the

cells were gently re-suspended in half the original volume of sterile 10 mM MgSO4 and then diluted to an OD600 of 0.5 with sterile 10 mM MgSO4. A 1:10 and a 1:50 dilution of the cosmid packaging reaction were prepared in SM buffer. 25 μl of each dilution was mixed with 25 μl of the appropriate bacterial cells at an OD600 of 0.5 in a microcentrifuge tube and incubated at room temperature for 30 minutes. 200 μl of LB broth was added to each sample, followed by incubation for 1 hour at 37 °C. Using a sterile spreader, the cells were spread on LB agar plates containing the required amount of the appropriate antibiotic, and the plates were incubated overnight at 37 °C. Next, each single colony was inoculated into a well of a 96 deep-well plate containing 1.9 ml LB with 50 ng/ml kanamycin and incubated overnight at 37 °C. The cosmid library was stored in 50% glycerol and the cosmid DNA was extracted for gene cluster screening.

3.4.4 Gene Clusters Screening

Using antiSMASH, putative gene clusters were predicted for each bacterial strain. Using the sequence data, novel gene clusters were targeted by PCR. Two sets of primers were designed for the screening: one to amplify the beginning of the gene cluster and one to amplify its end.

3.5 Transformation of Gene Clusters and Secondary Metabolite Heterologous Expression

3.5.1 Introduction of cosmid clone into E. coli BW25113/pIJ790

For the cosmid to be transferred into the heterologous hosts, it was modified using a PCR targeting method to contain the apramycin resistance cassette and oriT. Once the cosmid had been identified and purified, it was transferred into the strain E. coli BW25113/pIJ790. For this, E. coli BW25113/pIJ790 was grown overnight at 30 °C in 10 ml LB containing chloramphenicol (25 μg/ml). 100 μl of the overnight culture was then inoculated into 10 ml LB containing 20 mM MgSO4 and chloramphenicol (25 μg/ml).
After 3-4 h at 30 °C the OD600 reached 0.4 and the cells were centrifuged at 3000 rpm for 10 min at 4 °C. The supernatant was discarded and the cells were re-suspended in an equal volume of chilled KMESI buffer (60 mM CaCl2, 25 mM MES (2-(N-morpholino)ethanesulphonic acid), 5 mM MgCl2, 5 mM MnCl2, pH 5.8). The cells were kept on ice for 1-1.5 hours before being centrifuged again and re-suspended in 0.1 volume of chilled KMESII buffer (KMESI with 10% glycerol). This cell suspension is competent for transformation. Next, 50 µl of competent cell suspension was mixed with ~500 ng (5 µl) of cosmid DNA and the cosmid DNA was introduced into the competent cells by chemical transformation. The transformation mixture was incubated overnight at 30 °C on LB. The next day, a single colony was picked and used to inoculate an overnight culture to generate competent cells (Fig. 5).

3.5.2 Introduction of resistance cassette into E. coli BW25113/pIJ790 containing clusters

The aac(3)IV-oriT-attP(φC31)-int(φC31) cassette was amplified from plasmid pIJ10702 using the primers ScospIJ10702DVF2 5’-aga tct gat caa gag aca gga tga gga tcg ttt cgc atg

gat aag ttt atc acc acc ga and ScospIJ10702DVR2 5’-tcg ctt ggt cgg tca ttt cga acc cca gag tcc cgc tca gaa ctt ttc gat cag aaa c. The cassette was then transferred into E. coli BW25113/pIJ790 containing the appropriate cosmid using the same method as above. After the transformation, the plates were incubated at 37 °C overnight. The apramycin cassette recombines with the cosmid, replacing the kanamycin resistance gene of the cosmid. The cosmid containing the putative novel gene cluster can therefore be transferred into Streptomyces by conjugation (Fig. 5).

3.5.3 Transfer of the mutant cosmids into Streptomyces

Mutant cosmids were transferred into E. coli ET12567/pUZ8002 using the same methods as above. E. coli ET12567/pUZ8002 containing the cosmid was incubated at 37 °C overnight. Next, 100 µl of overnight culture was inoculated into 10 ml fresh LB plus antibiotics as above and grown for ~4 h at 37 °C to an OD600 of 0.4. The cells were washed twice with 10 ml of LB to remove the antibiotics. While the E. coli cells were being washed, 10 µl (10⁸) Streptomyces spores were added to 500 µl 2 × YT broth for each conjugation. The spores were heat shocked at 50 °C for 10 min and then allowed to cool. Next, 0.5 ml of E. coli cell suspension and 0.5 ml of heat-shocked spores were mixed and spun briefly. Most of the supernatant was discarded and the pellet was re-suspended in the 50 µl of residual liquid. A dilution series from 10⁻¹ to 10⁻⁴ was then made from the conjugation mixture, and each dilution was plated on MS agar + 10 mM MgCl2 (without antibiotics) and incubated at 30 °C for 16-20 h. Next, each plate was overlaid with 1 ml water containing 0.5 mg nalidixic acid (20 μl of 25 mg/ml stock; selectively kills E. coli) and 1.25 mg apramycin (25 μl of 50 mg/ml stock). A spreader was used to lightly distribute the antibiotic solution and the incubation was continued at 30 °C.
(Fig. 5) Finally, single colonies were inoculated into suitable cultures for secondary metabolite production and the compounds were analysed using LC-MS.
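The stock volumes quoted for the antibiotic overlay in section 3.5.3 follow from dividing the required mass by the stock concentration. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the water volume rests on the assumption that the 1 ml overlay volume includes the antibiotic stocks, which the text does not state explicitly.

```python
def stock_volume_ul(mass_mg, stock_mg_per_ml):
    """Volume of stock solution (µl) that contains mass_mg of antibiotic."""
    return mass_mg * 1000.0 / stock_mg_per_ml


nalidixic_ul = stock_volume_ul(0.5, 25.0)    # 0.5 mg from a 25 mg/ml stock
apramycin_ul = stock_volume_ul(1.25, 50.0)   # 1.25 mg from a 50 mg/ml stock

# Assumption: the stocks are part of the 1 ml overlay, so water makes up the rest.
water_ul = 1000.0 - nalidixic_ul - apramycin_ul
print(nalidixic_ul, apramycin_ul, water_ul)
```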

Figure 5. Flow chart of cosmid library construction
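The two-primer-set screen of section 3.4.4 amounts to looking for library clones that give a PCR product with both the "start" and the "end" primer pair. A minimal sketch of that bookkeeping is given below. The well identifiers and function names are hypothetical, and a clone that is positive for both ends can only carry the complete cluster if the cluster fits within a single cosmid insert.

```python
def clones_with_both_ends(start_positive, end_positive):
    """Clones positive with both primer sets (candidate full-length clusters)."""
    return sorted(set(start_positive) & set(end_positive))


def clones_with_one_end(start_positive, end_positive):
    """Clones positive with only one primer set (cluster probably truncated)."""
    return sorted(set(start_positive) ^ set(end_positive))


# Hypothetical PCR results, keyed by 96-well plate position
start_hits = ["A01", "B07", "C03"]
end_hits = ["B07", "C03", "H12"]

print(clones_with_both_ends(start_hits, end_hits))
print(clones_with_one_end(start_hits, end_hits))
```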

4. Analysis of extracts

4.1 Analysis of extracts at USTAN

4.1.1 Secondary metabolites production at USTAN

Single colonies of the engineered Streptomyces strains (S. coelicolor and S. lividans) carrying putative gene clusters, and of the wild type, were inoculated into 5 ml of 2 × YT medium (16 g/L tryptone, 10 g/L yeast extract and 5 g/L NaCl, pH 7.2). After 2 days of cultivation, the cultures were inoculated into 50 ml of ISP2 medium (4 g/L yeast extract, 10 g/L malt extract and 4 g/L glucose, pH 7.2) and incubated for 4 days. The cultures were then used for LC-MS/MS detection.

4.1.2 Secondary metabolites extraction

10 ml methanol was added to each 50 ml culture, which was left shaking for 20 minutes to break the cells. Then 50 ml ethyl acetate was added to the methanol-treated culture for the extraction of new compounds. The cultures were inverted to mix and the mixture was centrifuged to separate the layers. After centrifugation, the organic layer was collected and dried overnight in a vacuum drier (GENEVAC LTD, UK). The dried sample was dissolved in 2 ml 50% methanol and spun down for LC-MS detection.

4.1.3 LC-HRMS detection

LC-HRMS was carried out on a Thermo Scientific Velos Pro / Orbitrap Velos Pro with an H-ESI source and a Thermo Scientific Dionex UltiMate 3000 RS chromatography system. The HPLC system was equipped with a Waters XBridge C18 3.5 μm, 2.1 × 100 mm column at 40 °C. An injection volume of 5 μL was used for all samples. HPLC analysis was carried out with 0.1% formic acid in water and acetonitrile at a flow rate of 350 μL/min. The following gradient was used: 0.00 – 1.50 min 5% acetonitrile, 1.50 – 8.00 min 5 – 95% acetonitrile, 8.00 – 10.00 min 95% acetonitrile, 10.00 – 10.50 min 95 – 5% acetonitrile. UV absorbance was measured between 220 and 800 nm at 2 nm resolution. The first minute of the run was diverted to waste. After 1 minute the eluent was passed to the H-ESI source.
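The gradient programme above can be expressed as a table of breakpoints, from which the acetonitrile percentage at any time point is obtained by interpolation. The sketch below assumes linear ramps between the stated breakpoints, which is usual for HPLC gradients but is not stated explicitly in the text; the function names are hypothetical.

```python
# (time in min, % acetonitrile) breakpoints taken from the gradient described above
GRADIENT = [(0.0, 5.0), (1.5, 5.0), (8.0, 95.0), (10.0, 95.0), (10.5, 5.0)]


def percent_acetonitrile(t_min):
    """Acetonitrile percentage at time t_min, assuming linear ramps."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time outside the gradient programme")
    for (t0, p0), (t1, p1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)


def sent_to_source(t_min):
    """The first minute goes to waste; afterwards the eluent reaches the H-ESI source."""
    return t_min >= 1.0


print(percent_acetonitrile(4.75), sent_to_source(0.5))
```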
The HRMS was set up with the following parameters: positive ionisation mode using 250 °C heater temperature, 350 °C capillary temperature, 40 U sheath gas flow, 20 U aux gas flow, 2 U sweep gas flow, 3.5 kV ionization voltage and 50% RF lens power. A background ion corresponding to the [M+H]+ of n-butyl benzenesulfonamide was used as a lock mass for internal scan-by-scan calibration.

4.1.4 Network analysis

For analysis of the heterologously expressed gene clusters, the MS/MS data are analysed through the web platform GNPS (Global Natural Products Social Molecular Networking) and then visualised using Cytoscape. This allows the visualisation of molecular networks, in which sets of spectra from related molecules are displayed as spectral networks. This helps with the deconvolution of the data, as the LC-MS traces are very complex. In these networks each MS/MS spectrum is shown as a node, and spectrum-to-spectrum relatedness is represented by the connections between the nodes.
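The node-and-edge idea can be illustrated with a toy example. The sketch below scores pairs of spectra with a plain cosine similarity on binned fragment peaks and keeps an edge when the score exceeds a threshold. It is not the scoring used by GNPS; the spectra, bin width, threshold and function names are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def binned(spectrum, bin_width=0.01):
    """Turn a list of (m/z, intensity) peaks into a sparse vector keyed by m/z bin."""
    vector = {}
    for mz, intensity in spectrum:
        key = round(mz / bin_width)
        vector[key] = vector.get(key, 0.0) + intensity
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse intensity vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_network(spectra, min_cosine=0.7):
    """Return edges (node_a, node_b, score) for spectrum pairs above the threshold."""
    vectors = {name: binned(peaks) for name, peaks in spectra.items()}
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= min_cosine:
            edges.append((a, b, score))
    return edges


# Hypothetical MS/MS spectra: s1 and s2 share two fragments, s3 shares none
spectra = {
    "s1": [(100.05, 100.0), (150.10, 50.0), (200.15, 20.0)],
    "s2": [(100.05, 90.0), (150.10, 60.0), (210.20, 10.0)],
    "s3": [(80.02, 100.0), (120.07, 30.0)],
}
edges = build_network(spectra)
print(edges)
```

In this toy case only s1 and s2 are connected, so they would appear as one small network and s3 as an unconnected node.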

Thus, key networks can be visualised in which one parent mass has been fragmented (Fig. 6, Section 6.3 below). The differences in masses within the network can help to elucidate the final structure; for example, there may be a loss of an amino acid. This is also linked with the genome analysis to try to elucidate the final structure of the unknown natural product. The spectral masses are also searched against the GNPS database to help de-replicate the data so that only novel compounds are isolated.

4.2 Analysis of extracts at HZI

The extracts are tested in bio-assays designed to identify compounds that could be new drugs, or novel molecules for the cosmetic and nutraceutical industries. Bio-assays are performed at the HZI laboratory in Germany, where the extracts are tested for antibacterial activity against disease-relevant strains. Promising extracts identified in the bio-assays will then be further fractionated, and the fractions will be tested again in the bio-assay, ideally to identify the molecule responsible for the activity. However, large-scale bioprofiling in dozens or even hundreds of assays is not possible for practical reasons. Therefore, WP6 aims to characterise the chemical diversity of extracts irrespective of a disclosed bioactivity. For this purpose, the metabolome, i.e. the ensemble of small molecules produced by a given microorganism, is investigated by ultra-high performance liquid chromatography – tandem mass spectrometry (UPLC-MS/MS). First, dried extracts obtained by DSMZ were re-suspended to a final concentration of 10 mg/mL in either the extraction solvent or a DMSO/acetonitrile (1:1, v/v) mixture, to make them compatible with the aqueous conditions used for liquid chromatography. Separation was performed on a C18 reversed-phase column using a water/acetonitrile gradient.
Two different ionization modes (positive and negative electrospray ionization) were applied to cover a broad range of compounds, as ionization efficiency depends on the polarity of the compound and the charge applied in the instrument. For our approach, a QTOF mass spectrometer (maXis HD, Bruker) was operated in a data-dependent MS/MS mode, applying collision-induced dissociation to the three most abundant ions in each scan (see Annex 2). Hence we obtained a metabolic profile as well as MS/MS data for structural elucidation of the most abundant compounds. Besides high-resolution mass spectrometry data, UV spectra were also recorded, which can give additional useful information for compound identification. The analysis of these data is still ongoing.

Data were processed using open-source tools (e.g. XCMS Online) and commercial software (Bruker Data Analysis). This involved several steps, such as peak picking and filtering of detected features (retention time – m/z pairs). Sum formulae were not assigned automatically because, especially for higher masses, several formulae are compatible with a given mass and the error of measurement; in addition, the inclusion or exclusion of less typical chemical elements such as P, S or halogens is hard to define in a generic manner, especially for marine metabolites. As we applied an external calibration before and after every run and had an internal lock mass in every scan, a small mass error (in most cases <1 ppm) is expected, which should lead to very few sum formulae, in most cases just one, for compounds with masses <400 Da. For the identification of compounds, measured spectra are compared with an in-house library that contains spectra obtained from around 600 pure standards. However, since this library is mainly based on primary metabolites from bacteria and humans, and since

other freely-available databases (e.g. Metlin) still lack spectra from secondary metabolites of marine microorganisms, only a small number of compounds could be clearly identified so far. Often an unambiguous prediction of the sum formula is not possible (especially in the case of larger molecules). Therefore we analysed the MS2 spectra by clustering approaches, as outlined below.

MS2 clustering is emerging as a promising computer-based approach to visualise and organise tandem MS data sets and to automate database searches for metabolite identification within complex mixtures. For a large-scale search for similarities between MS2 spectra across all runs, we developed a clustering algorithm in R, named CLUMSID. Each analytical run provides about 500 unique MS2 spectra. These were clustered based on either product ions or neutral losses. We found that the product ion analysis generated clearer results. Thus, compounds with structural similarities can easily be recognised by CLUMSID, which facilitates the further annotation process (Fig. 6).

One major advantage of a clustering-based de-replication strategy is the possibility to identify known compounds and putative analogues. This allows the isolation workflow to be prioritised and guided toward unknown molecules, as a selection of interesting extracts and fractions will later enter the chemical process leading to purification by the usual techniques (VLC, HPLC) and structural determination (HRMS, 1D and 2D NMR) of microbial compounds. The success of this strategy in achieving the rapid characterization of known compounds, which function as anchor points of the clusters, relies greatly on the availability and quality of MS/MS compound databases. Existing libraries (MassBank, Metlin, ReSpect, NIST) host MS/MS data for almost 23 000 individual compounds. We observed that they are far from extensive enough to allow a full de-replication of complex mixtures.
The database of compounds and associated MS/MS data being produced by WP6 will therefore represent a valuable resource.
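To make the two spectrum representations mentioned above concrete, the sketch below derives a product-ion set and a neutral-loss set from the same MS2 spectrum and compares two spectra with a Jaccard index. It is a toy illustration and not the CLUMSID algorithm; the m/z values, the rounding to 0.01 and the function names are hypothetical.

```python
def product_ions(fragments):
    """Product-ion representation: fragment m/z values, keyed to 0.01 precision."""
    return {round(mz * 100) for mz in fragments}


def neutral_losses(precursor_mz, fragments):
    """Neutral-loss representation: precursor m/z minus each fragment m/z."""
    return {round((precursor_mz - mz) * 100) for mz in fragments if mz < precursor_mz}


def jaccard(a, b):
    """Shared elements divided by all elements of the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two hypothetical spectra: (precursor m/z, fragment m/z list)
spec_a = (300.20, [282.19, 150.10, 100.05])
spec_b = (314.2157, [296.2057, 150.10, 100.05])

similarity_products = jaccard(product_ions(spec_a[1]), product_ions(spec_b[1]))
similarity_losses = jaccard(neutral_losses(*spec_a), neutral_losses(*spec_b))
print(similarity_products, similarity_losses)
```

In this contrived pair the two spectra share two of four product ions but only one of five neutral losses, so the choice of representation changes how similar they appear.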

Figure 6. An example of results from a clustering analysis of a single extract (taken from WP7). Based on product ions, the recorded unique MS/MS spectra were clustered and plotted accordingly. Left: clustering results with additional heatmap to display similar spectra. Right: results are displayed as circular hierarchical plot. Each spectrum has a unique identifier (comprised of retention time and m/z) used for labelling in these plots. Furthermore, all MS-spectra (named by the identifier) are automatically extracted and converted into txt-file format. The MS/MS spectra will be provided in the database to allow easy conversion of the format into any spectral library format used by different MS libraries.

For each extract, molecular components are characterised with respect to their retention time, UV spectrum, accurate mass and fragmentation spectrum in two LC-MS/MS runs, one in positive and one in negative ionization mode. The analytical information is made available through a database, which will allow follow-up analyses, e.g. metabolite structural assignments on the basis of derived sum formulae and fragmentation spectra, or the search for marine producer(s) of a given metabolite across a wide range of strains. For sample registration, the BioSamples database (https://www.ebi.ac.uk/biosamples) was chosen; the EBI partner (WP4) adapted it by creating a specific field to register EMBRIC samples and defining appropriate attributes. Based on the metabolomic analyses (chemical fingerprints), a selection of interesting extracts and fractions will enter the chemical process leading to the purification of microbial compounds by the usual techniques (VLC, HPLC). The structures of the pure compounds will be elucidated by means of High Resolution Mass Spectrometry (HRMS) and extensive Nuclear Magnetic Resonance (NMR) techniques.
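Assigning a sum formula from an accurate mass comes down to comparing the measured m/z with the theoretical m/z of a candidate formula and expressing the difference in ppm. The sketch below shows this check for a protonated ion. Caffeine (C8H10N4O2) and the measured value are used purely as an illustrative example and are not results from this project; the function names are hypothetical.

```python
# Monoisotopic masses (Da) of the elements used in the example
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647  # mass of a proton (Da)


def mz_protonated(formula):
    """Theoretical m/z of [M+H]+ for a formula given as {element: count}."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON


def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}   # illustrative candidate formula
theoretical = mz_protonated(caffeine)
error = ppm_error(195.0878, theoretical)       # 195.0878 is a made-up measurement
print(round(theoretical, 4), error)
```

A candidate would be retained when the absolute error lies within the expected tolerance, for example the <1 ppm mentioned in section 4.2.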

5. Regulatory Environment

The EMBRIC pipelines to access products from marine environments run from sampling, isolation and characterisation of organisms through extraction, purification and assaying of potentially bioactive compounds, with the goal of placing products on the market. Table 1 lists some of the relevant regulations and provides links to the sources. Table 2 presents the regulatory controls that are in place for each stage of the pipeline, which include collecting from in situ, biological diversity of areas beyond national jurisdiction (BBNJ), export to another country, import, handling (manipulation; growth; genetic manipulation), deposit as part of a patent process, shipping, storage and distribution. Regulatory guidance is listed in Table 3 and community best practice in Table 4.

Table 1. List of some of the regulations governing microbiological activities

Regulation | Source
Cartagena Protocol on Biosafety | http://www.biodiv.org/biosafety/protocol.asp
Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure (done at Budapest on April 28, 1977 and amended on September 26, 1980) | http://www.cnpat.com/worldlaw/treaty/budapest_en.htm
Convention on Biological Diversity | http://www.biodiv.org/convention/articles.asp
EC Directive 93/88/EEC on Biological Agents | http://eur-op.eu.int/opnews/395/en/r3633.html for purchase through Celex
EC Directive 90/679/EEC setting mandatory control measures for laboratories | http://eur-op.eu.int/opnews/395/en/r3633.html for purchase through Celex
Accord européen relatif au transport international des marchandises dangereuses par route (ADR; European Agreement concerning the International Carriage of Dangerous Goods by Road) | http://eur-op.eu.int/opnews/395/en/r3633.html for purchase through Celex
IATA Dangerous Goods Regulations (DGR) | http://www.iata.org/cargo/dg/dgr.htm
UK Management of Health and Safety at Work (MHSW) Regulations 1992 (Anon, 1992) | http://www.hmso.gov.uk/cgi-bin/htm_hl3?URL=http://www.hmso.gov.uk/si/si1999/19993242.htm&STEMMER=en&WORDS=management+health+safeti+work+&COLOUR=Red&STYLE=s#muscat_highlighter_first_match
UK Control of Substances Hazardous to Health (COSHH) regulations (1988) | http://www.hmso.gov.uk/cgi-bin/htm_hl3?URL=http://www.hmso.gov.uk/si/si1988/Uksi_19881657_en_2.htm&STEMMER=en&WORDS=control+substanc+hazard+health+&COLOUR=Red&STYLE=s#muscat_highlighter_first_match
EU Council Regulation 3381/94/EEC on the Control of Exports of Dual-Use Goods from the Community of 19th December 1994 (Official J. L 367, p1) and amendments | http://eur-op.eu.int/opnews/395/en/r3633.html
EEC Directives 90/219/EEC. Contained use of genetically modified microorganisms (GMO's), *L117 Volume 33, 8 May 1990. | http://biosafety.ihe.be/Menu/BiosEur1.html
EEC Directives 90/220/EEC. Release of GMO's, *L117 Volume 33, 8 May 1990. | http://biosafety.ihe.be/Menu/BiosEur1.html

Table 2. Regulatory control of microbiology

Action | Requirement | Law, Regulation, Convention | Further information
Collecting from in situ near-shore marine environments | Prior Informed consent from a recognised authority | Convention on Biological Diversity | http://www.biodiv.org
Collecting from in situ near-shore marine environments | Mutually agreed terms on use | Convention on Biological Diversity | http://www.biodiv.org
Collecting from in situ near-shore marine environments | Access and Benefit Sharing | Nagoya Protocol | https://www.cbd.int/abs/about/
Collecting from in situ from waters beyond National jurisdiction: biological diversity of areas beyond national jurisdiction (BBNJ) | Under consideration: Marine global regime to better address the conservation and sustainable use of marine biological diversity of areas beyond national jurisdiction | The new legal instrument would fall under the 1982 United Nations Convention on the Law of the Sea, which has, since its entry into force in 1994 | http://www.un.org/sustainabledevelopment/blog/2017/07/countries-agree-to-recommend-elements-for-new-treaty-on-marine-biodiversity-of-areas-beyond-national-jurisdiction/
Export to another country | Some plant and animal pathogens require export licences | Quarantine regulations |
Export to another country | Dangerous organisms with potential for dual use | Export Licences for dangerous organisms |
Import | Non-indigenous plant pathogens require licenses from country authority | Quarantine regulations |
Import | Human and animal pathogens can often only be imported to specified laboratories | Health and Safety |
Handling: Manipulation; Growth | Containment dependent on hazard | Control of Biological Agents - Health and Safety; EC Directive 93/88/EEC on Biological Agents | http://eur-op.eu.int/opnews/395/en/r3633.html
Genetic manipulation | Containment of manipulated organisms | EEC Directives 90/219/EEC. Contained use of genetically modified microorganisms (GMO's), *L117 Volume 33, 8 May 1990. | http://biosafety.ihe.be/Menu/BiosEur1.html
Genetic manipulation | Containment of manipulated organisms | EEC Directives 90/220/EEC. Release of GMO's, *L117 Volume 33, 8 May 1990. | http://biosafety.ihe.be/Menu/BiosEur1.html
Genetic manipulation | Containment of manipulated organisms | Cartagena Protocol on Biosafety | http://www.biodiv.org/biosafety/protocol.asp
Deposit as part of a patent process | | Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure | http://www.cnpat.com/worldlaw/treaty/budapest_en.htm
Shipping | Packaging and transport considerations | IATA Dangerous Goods Regulations (DGR) | http://www.iata.org/cargo/dg/dgr.htm
Shipping | Dangerous organisms | EU Council Regulation 3381/94/EEC on the Control of Exports of Dual-Use Goods from the Community of 19th December 1994 | http://eur-op.eu.int/opnews/395/en/r3633.html
Storage | Appropriate containment | Health and Safety |
Storage | Licence to hold pathogens | Security |
Distribution | Sovereign rights over the strains | Convention on Biological Diversity | http://www.biodiv.org
Distribution | Access and benefit sharing | Bonn Guidelines | http://www.biodiv.org
Distribution | Intellectual property Right ownership | |

EMBRIC work package 2 Deliverable 2.2 addresses Best Practice Methods for Biological Resource Centres. Section 11 of that report covers the compliance of Microorganism Biological Resource Centres with national and international law. Here the elements covered are extended and additional guidance on compliance is reviewed.

5.1 Regulatory Guidance and Community Best Practices

Table 3. Regulatory Guidance

Regulatory Guidance document URL area

Micro-Organisms Sustainable use and Access regulation International Code of www.belspo.be/bccm/mosaicc Conduct Convention on Guidance document on the scope of http://eur-lex.europa.eu/legal- Biological application and core obligations of content/EN/TXT/?uri=CELEX%3A52016XC0827% Diversity Regulation (EU) No 511/2014 on ABS 2801%29 http://www.mirri.org/user-service/nagoya- MIRRI user service on Nagoya Protocol protocol.html

Organisation for Economic Co- http://www.oecd.org/dataoecd/4/4/34932656.pdf operation and Development (OECD) United Nations Industrial Development Organisation (UNIDO) Bio-safety www.who.org/emc/biosafe/index.htm Information Network and Advisory Service (BINAS) International Centre for Genetic Engineering and Biotechnology www.aphisweb.aphis.usda.gov.biotech (ICGEB) Hygiene and US Animal and Plant Health Inspection Bio-safety www.nal.usda.gov/bic/ Service (APHIS) US Food and Drug Administration http://www.fda.gov/ (FDA) World Health Organization (WHO) http://www.who.int/csr/labepidemiology/projects/bio Biosafety Programme safetymain/en/index.html U.S. Departments of Health and Human http://www.cdc.gov/od/sap/final_rule.htm Services (HHS) and Agriculture (USDA) rules implementing USA PATRIOT Act and Public Health Security and


Bioterrorism Preparedness and Response Act of 2002 Centre for Food Safety and Applied http://vm.cfsan.fda.gov/list.html Nutrition (CFSAN) Belgian Biosafety Server www.biosafety.be The Dutch Genetically Modified www.rivm.nl/csr/bggo.html Organism Bureau Biotechnology Information Centre (BIC) of the US Department of Agriculture www.nal.usda.gov/bic/ (USDA) UK Advisory Committee on Releases www.environment.detr.gov.uk/acre/index.htm into the Environment (ACRE) National Chemical Emergency www.eat.co.uk/ncec/complian/bibliog/bibliog.htm Response UK American Biological Safety Association http://www.absa.org (ABSA) European Biosafety Association http://www.ebsaweb.eu (EBSA) International Biosafety Working Group http://www.internationalbiosafety.org/english/index. (IBWG) asp Advisory Committee on Dangerous http://www.doh.gov.uk/bioinfo.htm Pathogens

Budapest Treaty on the International http://www.wipo.int/treaties/en/registration/budapes Patents recognition of the Deposit of t/ Microorganisms First generation guidelines for NCI- Supported Biorepositories – Federal http://biospecimens.cancer.gov/biorepositories/guid register Vol. 71, Number 82, Page elines_full_formatted.asp 25814, April 28, 2006. Transport and Harmonisation of UN documents etc. www.hazmat.dot.gov/rules shipping http://ibis.ib.upu.org; http://unicc/unece/tra; Universal Postal Union www.de/facil/upustr.htm WHO Guidance on Regulations for the http://www.who.int/csr/resources/publications/biosa Transport on Infectious Substances fety/WHO_CDS_CSR_LYO_2005_22/en/


Table 4. Community Best Practice

Source Item URL Overview http://www.embrc.eu/sites/embrc Business Plan promoting Outlines of intentions to EMBRC .eu/files/public/EMBRC_Busines best practice share best practice s_Plan_web.pdf http://www.embrc.eu/sites/embrc Data management Plan .eu/files/public/EMBRC_data_m anagement_plan.pdf Key principles from available best practice http://www.embric.eu/sites/defau Deliverable 2.2 in the management of lt/files/deliverables/D2.1_Best% Best Practice Methods Biological Resource EMBRIC 2520practice%2520procedures for Biological Centres (BRCs) that %2520for%2520harmonized%2 Resource Centres 520access%5B1%5D.pdf supply marine microorganisms. (MIRRI, EMBRC) https://zenodo.org/record/28488 1 Specific guidance on MIRRI Access and https://absch.cbd.int/search/refer compliance with ABS MIRRI Benefit Sharing (ABS) enceRecords?schema=modelCo requirements in Manual ntractualClause microbiology focussed on mBRC activity ABSCH-A19A20-SCBD-208213- 1 Organisation for Covers general regulation Economic OECD Best Practice http://www.oecd.org/sti/biotech/o compliance and detailed Development and Guidelines for Biological ecdbestpracticeguidelinesforbiol best practice in biosecurity Co-operation Resource Centres ogicalresourcecentres.htm for BRCs (OECD)

Common Access Guidelines for Collection to Biological http://www.cabri.org/guidelines.h CABRI Guidelines Quality Resources and tml Information: Management Standards Public Deliverables: A BRC operational standard based on the http://www.embarc.eu/deliverabl OECD best practice es.html European guidelines for Biological http://www.embarc.eu/deliverabl Consortium of Resource Centres as a es/EMbaRC_D.NA1.2.1_2.28_B Microbial working draft for an ISO RC_standard.pdf Resources Standard http://www.embarc.eu/deliverabl Centres Overview of existing es/EMbaRC_D.NA1.3.2_D2.34_ legislation, guidelines, BiosecCoCfinal%20for%20Exec best practices etc. CommAug2012.pdf connected with biosecurity ECCO core Material Transfer Agreement for Provide model clauses and https://www.eccosite.org/ecco- European Culture the supply of samples of recommendations for MTA core-mta/ Collections’ biological material from content Organisation the public collection (ECCO) The Budapest Treaty: http://bccm.belspo.be/document Practical details on how Code of practice for IDAs s/files/deposit/code-of-practice- IDAs should operate to


– International Depositary for-idas.pdf comply with the Budapest authority under the Treaty on the International Budapest Treaty Recognition of the Deposit Budapest Treaty on the of Microorganisms for the International Recognition Purposes of Patent of the Deposit of Procedure (1977) Microorganisms for the Purposes of Patent Procedure (1977) World Federation for Culture Collections Basic quality management Guidelines for the guidance for culture Establishment and http://www.wfcc.info/guidelines/ collections; Section 17 Operation of Collections covers general aspects in of Cultures of compliance with regulation Microorganisms The resource covers: EBRCN information resource on transport World Federation EBRCN Quarantine for Culture Collections Regulations.doc (WFCC) EBRCN Alternative Safety WFCC Library provides http://www.wfcc.info/index.php/w Data Sheet.doc an information resource fcc_library/contribution/ EBRCN BRC Compliance from the EBRCN project with the CBD.doc September.draft-EBRCN Health and Safety Requiements.doc EBRCN Resource Legislation.

5.2 Regulations impacting on collection, handling, use and distribution of organisms

The reach of a laboratory’s health and safety procedures extends beyond the laboratory where the work is carried out to cover all those who may come into contact with substances and products from that laboratory. A microorganism in transit may put carriers, postal staff, freight operators and recipients at risk; some organisms are relatively hazard free whilst others are quite dangerous. It is essential that safety and shipping regulations are followed to ensure safe transit. There are several other pieces of legislation restricting the distribution of microorganisms of which a microbiologist must be aware. Information is presented on how a Biological Resource Centre (BRC) or culture collection should comply with:

● Health and Safety requirements
● Classification of Microorganisms on the Basis of Hazard
● Quarantine regulations
● Postal Regulations and Safety
● Packaging requirements
● Regulations governing distribution of cultures
● Legislation on the Proliferation, Distribution and Misuse of Dangerous Pathogens
● Convention on Biological Diversity
● Ownership of Intellectual Property Rights (IPR)
● Safety information provided to the recipient of microorganisms

It is critical that biological resource centres operate to high standards. Some guidelines are currently available for adoption and use (see Table 4 above) and are summarised in EMBRIC Deliverable 2.2 Best Practice Methods for Biological Resource Centres (http://www.embric.eu/sites/default/files/deliverables/D2.1_Best%2520practice%2520procedures%2520for%2520harmonized%2520access%5B1%5D.pdf). A goal of the Research Infrastructure is to implement regulatory compliant best practice in the delivery of infrastructure services in line with the appropriate operations of biological resource centres. The latter are defined in guidance such as that of the OECD, the implementation of ISO standards and various community approaches such as:

● CABRI: http://www.cabri.org/guidelines/gl-framed.html
● WFCC: http://www.wfcc.info/guidelines/
● OECD: http://www.oecd.org/sti/biotech/oecdbestpracticeguidelinesforbiologicalresourcecentres.htm

In the process of isolation, handling, storage and distribution of microorganisms and cell cultures there are many stages where compliance with the law, regulations or voluntary international conventions is required (Tables 1 and 2).

5.3 Health and Safety

Health and safety are covered by Directive 89/391/EEC, which puts in place measures to improve safety and health at work. It is designed to encourage improvements in occupational health and safety in all sectors of activity, both public and private; to promote workers' rights to make proposals relating to health and safety, to appeal to the competent authority and to stop work in the event of serious danger; and to protect workers adequately and ensure that they return home in good health at the end of the working day (http://ec.europa.eu/social/main.jsp?catId=148).
There are also numerous national requirements. The EU legal framework in the area of occupational safety and health (OSH) is outlined in the OSH Strategic Framework 2014-2020, which can be consulted at http://ec.europa.eu/social/main.jsp?catId=151&langId=en. A risk assessment of the handling and supply of organisms is required. It should cover all hazards involved: not just infection but also others, such as the production of toxic metabolites and the ability to cause allergic reactions. Organisms that produce volatile toxins or aerosols of spores or cells present a greater risk. It is the responsibility of the microbiologist to provide such assessment data to the recipient of a culture to ensure its safe handling and containment. Whether for compliance with the law or as the duty of a caring employer, the basic requirements for establishing a safe workplace are:

● Adequate assessment of risks
● Provision of adequate control measures
● Provision of health and safety information
● Provision of appropriate training
● Establishment of record systems to allow safety audits to be carried out
● Implementation of good working procedures

Good working practice requires assurance that correct procedures are actually being followed, and this requires a sound and accountable safety policy (http://www.hse.gov.uk/pubns/hsc13.htm). A BRC must put in place procedures to manage the health and safety of all who may be put at risk by its activities. This requires a suitable and sufficient assessment of the risks to health and safety to which any person, whether employed by the BRC or not, may be exposed through its work (Anon, 1996a). These assessments must be recorded and must be reviewed regularly, and additionally whenever changes in procedures or regulations demand. The distribution of microorganisms to others outside the workplace extends these duties to protect others. In Europe the protection of workers from risks related to exposure to biological agents at work is addressed by Directive 2000/54/EC of the European Parliament and of the Council of 18 September 2000 (https://osha.europa.eu/en/legislation/directives/exposure-to-biological-agents/77). This Directive lays down minimum requirements for the health and safety of workers exposed to biological agents at work.

5.4 Microorganisms as hazardous substances

The UK Management of Health and Safety at Work (MHSW) Regulations 1992 (http://www.legislation.gov.uk/uksi/1992/2051/contents/made) are all-encompassing and general in nature but overlap with and lead into many specific pieces of legislation. The Control of Substances Hazardous to Health (COSHH) regulations (http://www.hse.gov.uk/coshh/) require that every employer makes a suitable and sufficient assessment of the risks to health and safety to which any person, whether employed by them or not, may be exposed through their work. These assessments must be reviewed regularly, and additionally whenever changes in procedures or regulations demand, and must be recorded when the employer has five or more employees. The organism must be assessed for all types of hazard it presents, not only infection but also toxin production, for example mycotoxins or bacterial toxins. At the European level this is also covered by Directive 2000/54/EC - biological agents at work.

5.5 Classification of Microorganisms on the Basis of Hazard

Various classification systems exist, including those of the World Health Organisation (WHO), the United States Public Health Service (USPHS), the Advisory Committee on Dangerous Pathogens (ACDP), the European Federation of Biotechnology (EFB) and the European Community (EC). In Europe, the EC Directive on Biological Agents (93/88/EEC, since codified in Directive 2000/54/EC) sets a common baseline which has been strengthened and expanded in many of the individual member states. In the UK the definition and minimum handling procedures of pathogenic organisms are set by the ACDP, which lists four hazard groups (1-4) with corresponding containment levels. Microorganisms are normally classified by their human pathogenicity, that is their potential to cause disease, into four groups (Anon, 1996a; https://www.hseni.gov.uk/publications/categorisation-biological-agents-according-hazard-and-categories-containment):


Group 1: A biological agent that is most unlikely to cause human disease.

Group 2: A biological agent that may cause human disease and which might be a hazard to laboratory workers but is unlikely to spread in the community. Laboratory exposure rarely produces infection and effective prophylaxis or treatment is available.

Group 3: A biological agent that may cause severe human disease and present a serious hazard to laboratory workers. It may present a risk of spread in the community but there is usually effective prophylaxis or treatment.

Group 4: A biological agent that causes severe human disease and is a serious hazard to laboratory workers. It may present a high risk of spread in the community and there is usually no effective prophylaxis or treatment.

A BRC must ensure that all strains are assigned to appropriate risk/hazard groups; this includes a positive assignment to risk/hazard group 1 unless the strain is otherwise considered hazardous. Hazard information must be recorded and made available to recipients of this material.

The classification systems are documented in the following sources:

● World Health Organisation (WHO): http://www.who.int/csr/resources/publications/biosafety/WHO_CDS_CSR_LYO_2004_11/en/
● NIH Guidelines for Research Involving Recombinant or Synthetic Nucleic Acid Molecules (April 2016), Appendix B, Table 1: https://osp.od.nih.gov/wp-content/uploads/2013/06/NIH_Guidelines.pdf
● Advisory Committee on Dangerous Pathogens (ACDP): http://www.hse.gov.uk/pubns/misc208.pdf
● Safe biotechnology. 7. Classification of microorganisms on the basis of hazard. Working Party "Safety in Biotechnology" of the European Federation of Biotechnology: https://www.ncbi.nlm.nih.gov/pubmed/8987466
● Directive 2000/54/EC - biological agents at work: https://osha.europa.eu/en/legislation/directives/exposure-to-biological-agents/77

Directive 2000/54/EC sets the common European baseline, which has been strengthened and expanded in many of the individual member states. As in the UK, other European countries have advisory committees. In Germany the ZKBS advises on how individual Genetically Engineered Microorganisms (GEMs) and Genetically Modified Organisms (GMOs) should be classified, and the Trade Corporation Association of the Chemical Industry (BG Chemie) is involved in both. The Advisory Committee on Genetic Manipulation (ACGM) prescribes separate but similar regulations for those organisms that have been genetically modified.
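The requirement that every strain carries a positive hazard group assignment, and that hazard information reaches the recipient, can be illustrated with a minimal Python sketch. The `Strain` record, its field names and both helper functions are hypothetical and are not taken from any BRC catalogue system; only the four groups themselves come from the text above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class HazardGroup(IntEnum):
    """The four hazard groups described in section 5.5."""
    GROUP_1 = 1  # most unlikely to cause human disease
    GROUP_2 = 2  # may cause disease; unlikely to spread in the community
    GROUP_3 = 3  # may cause severe disease; treatment usually available
    GROUP_4 = 4  # severe disease; usually no effective treatment


@dataclass
class Strain:
    # Hypothetical catalogue record; field names are illustrative only.
    accession: str
    name: str
    hazard_group: Optional[HazardGroup] = None  # None means not yet assessed


def unassigned_strains(catalogue):
    """Return the strains that lack a positive hazard group assignment."""
    return [s for s in catalogue if s.hazard_group is None]


def hazard_statement(strain):
    """Build the hazard line to pass on to the recipient of a strain."""
    if strain.hazard_group is None:
        raise ValueError(f"{strain.accession} has no hazard group assigned")
    return f"{strain.accession} {strain.name}: hazard group {int(strain.hazard_group)}"
```

Treating a missing value as "not yet assessed" rather than defaulting to group 1 reflects the point that group 1 is itself a positive assignment.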


EU legislation on GMOs:

● Council Directive 90/219/EEC of 23 April 1990 on the contained use of genetically modified microorganisms
● Council Directive 90/220/EEC of 23 April 1990 on the deliberate release into the environment of genetically modified organisms
● Commission Directive 94/51/EC of 7 November 1994 adapting to technical progress Council Directive 90/219/EEC on the contained use of genetically modified microorganisms
● Commission Decision 94/730/EC of 4 November 1994 establishing simplified procedures concerning the deliberate release into the environment of genetically modified plants pursuant to Article 6(5) of Council Directive 90/220/EEC

EU legislation concerning the transport of GMOs or pathogenic organisms, and that pertaining to biotechnological safety, can be found at: https://ec.europa.eu/food/plant/gmo/legislation_en

United Nations Industrial Development Organisation

● Voluntary code of conduct for the release of organisms into the environment. The text can be found at http://www.ask-force.org/web/Regulation/UNIDO-WHO-UNEP-Release-Organisms-Environment-Volutary-20010518.pdf

5.6 Quarantine regulations

Clients who wish to obtain cultures of non-indigenous plant pathogens must first obtain a permit to import, handle and store them from the appropriate Government Department. Under the terms of such a licence the shipper is required to see a copy of the Ministry permit before such strains can be supplied. The BRC must do its best to ensure that non-indigenous pathogens are not distributed unless the recipient has a current licence. Quarantine legislation is in place in countries world-wide, restricting the import of non-indigenous plant and animal pathogens. Those who wish to import such organisms must hold the relevant import permit, which can be obtained from the relevant national authority, for example in Canada, the UK and the USA. Information on the transport of plant pathogens throughout Europe can be obtained from the European and Mediterranean Plant Protection Organisation (EPPO; https://www.eppo.int/).

5.7 Postal Regulations and Safety

Countries have their own regulations governing the packaging and transport of biological material in their domestic mail. International Postal Regulations regarding the postage of human and animal pathogens are very strict on account of the safety hazard they present. Several organisations set regulations controlling the international transfer of such material. These include the International Air Transport Association (IATA), the International Civil Aviation Organisation (ICAO), the United Nations Committee of Experts on the Transport of Dangerous Goods, the Universal Postal Union (UPU) and the World Health Organisation (WHO). It is commonplace to send microorganisms by post, as this is more convenient and less expensive than air freight.
However, many countries prohibit the movement of biological substances through their postal services. The International Bureau of the UPU in Berne publishes all import and export restrictions for biological materials by national postal services.

5.8 Packaging

The IATA Dangerous Goods Regulations (DGR) require that packaging used for the transport of hazard group 2, 3 or 4 organisms meets defined standards, namely IATA packing instruction 602 (class 6.2) (IATA, 2002). Packaging for hazard group 1 organisms must meet EN 829 triple containment requirements. Microorganisms that qualify as dangerous goods (class 6.2) must be in UN certified packages. These packages must be sent by air freight if the postal services of the countries through which they pass do not allow the organisms in their postal systems; they can only be sent by airmail if the national postal authorities accept them. If the carrier does not have its own fleet, the package and documentation must be checked at the airport DGR centre, which adds costs above the freight charges and package costs.

5.9 Regulations governing distribution of cultures

The IATA DGR require that shippers of microorganisms of hazard groups 2, 3 or 4 are trained by IATA certified and approved instructors. They also require that shipper's declaration forms accompany the package in duplicate and that specified labels are used for organisms in transit by air (IATA, 2002). As set out in section 5.8, packaging for hazard group 2, 3 or 4 organisms must meet IATA packing instruction 602 (class 6.2) (IATA, 2002; see http://www.gbf.de/dsmz/shipping/shipping.htm), and packaging for hazard group 1 organisms must meet EN 829 triple containment requirements (Anon, 1996b). A BRC must ensure that staff responsible for the distribution of cultures hold a current IATA shipper training certificate and that organisms are packed and shipped in accordance with IATA requirements.
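The packaging and shipping rules in sections 5.8 and 5.9 amount to a small decision table, sketched below in Python. This is an illustration only, not an implementation of the IATA DGR: the function name and the `postal_accepts` flag are hypothetical, hazard groups 2 to 4 are treated as class 6.2 dangerous goods as a simplification, and applying the postal acceptance rule to hazard group 1 as well is an assumption.

```python
def shipping_requirements(hazard_group, postal_accepts):
    """Summarise the rules stated in sections 5.8 and 5.9.

    hazard_group   -- 1 to 4, as defined in section 5.5
    postal_accepts -- hypothetical flag: True if every national postal
                      authority on the route accepts the organism
    """
    if hazard_group not in (1, 2, 3, 4):
        raise ValueError("hazard group must be 1, 2, 3 or 4")

    if hazard_group == 1:
        # Hazard group 1: EN 829 triple containment only.
        packaging = ["EN 829 triple containment"]
        shipper = []
    else:
        # Simplification: groups 2-4 are treated as class 6.2 dangerous goods.
        packaging = [
            "IATA packing instruction 602 (class 6.2)",
            "UN certified package",
        ]
        shipper = [
            "IATA-certified shipper training",
            "shipper's declaration in duplicate",
            "specified labels",
        ]

    # Assumption: the postal rule is applied to every hazard group.
    modes = ["air freight"]
    if postal_accepts:
        modes.insert(0, "airmail")

    return {"packaging": packaging, "shipper": shipper, "modes": modes}
```

For example, `shipping_requirements(3, postal_accepts=False)` lists the UN certified package and leaves air freight as the only transport mode.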
There are several other regulations that impose export restrictions on the distribution of microorganisms. These include controls on the distribution of agents that could be used in biological warfare, Council Regulation (EC) No 3381/94 on the control of export of dual-use goods (Official J. L 367, p1), regulations on the transport of goods by road and, more generally, the Access Regulations to Genetic Resources that countries are currently implementing under the Convention on Biological Diversity. It is critical that microbiologists are aware of and follow such legislation; see Guidance on regulations for the Transport of Infectious Substances 2007–2008. WHO/CDS/EPR/2007.2 World Health Organization 2007 http://www.who.int/ihr/publications/WHO_CDS_EPR_2007_2cc.pdf

5.10 Control of Distribution of Dangerous Organisms

There is considerable concern over the transfer of selected infectious agents capable of causing substantial harm to human health. There is potential for such organisms to be passed to parties not equipped to handle them or to persons who may make illegitimate use of them. Of special concern are pathogens and toxins causing anthrax, botulism, brucellosis, plague, Q fever, tularemia and all agents classified for work at Biosafety Level 4 (hazard group 4). The ‘Australia Group’ of countries has strict controls for movement outside their group but has lower restrictions within. A BRC has procedures to check the validity of customers that wish to receive dangerous organisms and, if in doubt, does not supply. There is control legislation in place and some of this is described below.

The American Society for Microbiology (ASM) provides information; for example, "Contact Information for Select Agent Preservation" is available at the WFCC site http://www.wfcc.info/index.php/wfcc_library/agents/. The document addresses the preservation of (dangerous) agents for research for human welfare. The title of the menu in the Web page is "US legislation concerning select agents". The relevant links are embedded in the document.

5.11 Biological Weapons Convention

The Biological Weapons Convention (BWC), also known as the Biological and Toxin Weapons Convention (BTWC), was signed in London, Moscow and Washington on 10 April 1972 and entered into force on 26 March 1975. There are currently 162 country signatories of which 18 are still to ratify (https://www.un.org/disarmament/wmd/bio/).
● Convention on the prohibition of the development, production and stockpiling of bacteriological (biological) and toxin weapons and on their destruction. The text can be found at http://disarmament.un.org/treaties/t/bwc

Following the signing of the BTWC, countries have introduced new control legislation or procedures to prevent unauthorised access to strains that could be misused in this way. An Ad Hoc Group is currently discussing a verification protocol for the BWC; such a protocol is now in place for the Chemical Weapons Convention (CWC) (https://www.un.org/disarmament/wmd/chemical/).

5.12 Export Licensing Measures

Article III of the BWC obliges the States Parties not to transfer to any recipient whatsoever, directly or indirectly, and not in any way to assist, encourage, or induce any State, group of States or international organizations to manufacture or otherwise acquire any of the agents, toxins, weapons, equipment or means of delivery specified in Article I of the Convention. This is a legally binding obligation. A number of countries have implemented national export licensing measures as an effective means of implementing these obligations and to avoid the possibility of the inadvertent supply of any item which could be used in a BW program. Export licenses are not bans. They operate to deter proliferation by monitoring trade in relevant materials, and provide authority to stop a sale in the infrequent cases where a prospective export is likely to contribute to a BW program. It is also in the interest of industry and research institutes to ensure that they are not inadvertently supplying pathogens and dual-use equipment for use in the production of BW.

Geneva Protocol for the Prohibition of the Use in War of Asphyxiating, Poisonous or Other Gases, and of Bacteriological Methods of Warfare: https://www.un.org/disarmament/wmd/bio/1925-geneva-protocol/

USA

The USA has rules that include a comprehensive list of infectious agents, registration of facilities that handle them, and requirements for transfer, verification and disposal; the rules carry criminal and civil penalties. In the UK all facilities handling hazard group 2, 3 or 4 organisms must be registered. Strict control of hazard group 3 and 4 organisms is in place.

The US Biological Safety Requirements for Facilities Transferring or Receiving Select Agents

In response to a Congressional mandate in 1997, CDC promulgated a regulation, Additional Requirements for Facilities Transferring or Receiving Select Agents. The select agents were drawn from the Australia List and include forty or so microorganisms, a dozen toxins, and certain recombinant DNA molecules. Institutions were provided with Registration Packages that included a self-assessing laboratory survey form based on the CDC/NIH publication Biosafety in Microbiological and Biomedical Laboratories. Each registered facility is to be inspected by personnel from the OHS during a three-year cycle. To date there are 67 laboratories registered. CDC is currently working to develop a plan to address the role of Public Health in meeting the growing national concerns about bioterrorism.

Additional Requirements for Facilities Transferring or Receiving Select Agents: 42 CFR Part 72.6, 1996; https://www.gpo.gov/fdsys/granule/CFR-2004-title42-vol1/CFR-2004-title42-vol1-sec72-6

Public Health, Chapter I: Public Health Service, Department of Health and Human Services. Part 71: Foreign Quarantine. Subpart F: Importations. Sec. 71.54 Etiological agents, hosts, and vectors.

(a) A person may not import into the United States, nor distribute after importation, any etiological agent or any arthropod or other animal host or vector of human disease, or any exotic living arthropod or other animal capable of being a host or vector of human disease unless accompanied by a permit issued by the Director.

(b) Any import coming within the provisions of this section will not be released from custody prior to receipt by the District Director of the U.S. Customs Service of a permit issued by the Director.
European

The European Union has adopted a common position on the Biological and Toxin Weapons Convention: http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A31999E0346

Delivery of microorganisms which could be used as biological weapons is governed by Council Regulation (EC) No 837/95 of 10 April 1995 amending Regulation (EC) No 3381/94 setting up a Community regime for the control of exports of dual-use goods: http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A31995R0837

France

France has introduced regulations on highly pathogenic strains (AFSSAPS). Information can be found in French at http://ansm.sante.fr/.

5.13 Control procedures: UKNCC control of Dangerous Pathogens

The UK has put in place a code of practice for UK public service collections to follow. The Australia Group has strict controls for movement outside their group of countries but has lower restrictions within. The list of Australia Group controlled organisms is modified periodically. The UK National Culture Collections are implementing a system involving the registration of customers to ensure bona fide supply when there is any doubt associated with giving access to potentially dangerous organisms. Access to the organisms on the Australia Group list, plus additional organisms of concern, is controlled. A list of such organisms has been compiled by the UKNCC, which includes plant, animal and human pathogens.
See http://www.hse.gov.uk/aboutus/meetings/committees/acdp/

Some biosafety related sites:

● The European Federation of Biotechnology Working Party on Safety in Biotechnology: http://www.biokemi.org/biozoom/issues/489/articles/1913
● The European Biological Safety Association: https://ebsaweb.eu/
● The American Biological Safety Association: https://ehs.msu.edu/
● The Belgian Biosafety Server with lots of links: https://www.biosafety.be/
● World Health Organisation (WHO): http://www.who.int/en/
● UNIDO Biosafety Information Network and Advisory Service: http://binas.unido.org/
● International Centre for Genetic Engineering and Biotechnology (ICGEB) in Trieste: http://www.icgeb.trieste.it/~bsafesrv/

Some BTWC related sites:

● The Department of Peace Studies at the University of Bradford: https://www.bradford.ac.uk/social-sciences/peace-studies/
● Pugwash Study Group on CBW: http://fas-www.harvard.edu:80/~hsp/pugwash.html
● CBIAC (The Chemical and Biological Defence Information Analysis Center): https://www.hsdl.org/?abstract&did=1099
● SIPRI (Stockholm International Peace Research Institute) Chemical and Biological Warfare Project: http://www.sipri.se
● The Henry L. Stimson Center, Chemical and Biological Weapons Nonproliferation Project: http://www.stimson.org
● Biological and Toxin Weapons Working Group, Federation of American Scientists: http://www.fas.org/bwc/index.html

5.14 Convention on Biological Diversity

Text: https://www.cbd.int/convention/text/

The Convention on Biological Diversity requires that microbiologists seek prior informed consent from the country in which they wish to collect organisms. They will be required to agree terms on which benefits will be shared should they accrue from the use of the organisms. The benefit sharing may include monetary elements but may also include information, technology transfer and training. A BRC must ensure transparency, retaining the link between country of origin and end user of genetic resources. Biological materials must be received and supplied within the spirit of the CBD, ensuring material transfer agreements are in place. A BRC must maintain contact with, and follow the recommendations of, its national CBD Contact Point and National Focal Point.

The Convention on Biological Diversity (CBD) was established to support the conservation and utilisation of biodiversity, ensuring fair and equitable sharing of benefits arising from the latter. The CBD assigns sovereign rights to the country of origin and requires that Prior Informed Consent (PIC) is received from the country in which access to organisms is requested. Mutually Agreed Terms (MAT) must be put in place, covering the conditions under which access is granted and on which benefits will be shared should they accrue from the use of the organisms. Benefit sharing may include monetary elements but may also include information, technology transfer and training. The supply of organisms must also take place under terms agreed in Material Transfer Agreements (MTA) between supplier and recipient to ensure benefit sharing with, at least, the country of origin. This is a significant role for public service collections, which potentially bear critical responsibilities for ensuring traceability. Many culture collections have operated benefit sharing since they began, giving organisms in exchange for deposits and re-supplying the depositor with the strain if a replacement is required.
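The PIC, MAT and MTA requirements described above can be pictured as a simple traceability record. The following is a minimal sketch only: the class name `AccessRecord`, its fields and the example identifiers are all hypothetical and are not taken from any BRC system or from this report. It assumes that a strain should be supplied only when all three agreements are on file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccessRecord:
    """Hypothetical traceability record kept by a BRC for one strain."""
    strain_id: str
    country_of_origin: str
    pic_reference: Optional[str] = None  # Prior Informed Consent
    mat_reference: Optional[str] = None  # Mutually Agreed Terms
    mta_reference: Optional[str] = None  # Material Transfer Agreement

    def missing_documents(self) -> List[str]:
        """List the agreements that are not yet on file."""
        checks = [
            ("PIC", self.pic_reference),
            ("MAT", self.mat_reference),
            ("MTA", self.mta_reference),
        ]
        return [name for name, reference in checks if not reference]

    def ready_to_supply(self) -> bool:
        """True only when PIC, MAT and MTA are all recorded."""
        return not self.missing_documents()


# Illustrative values only; none of these identifiers come from the report.
record = AccessRecord(
    strain_id="EXAMPLE-0001",
    country_of_origin="Country A",
    pic_reference="PIC-2016-01",
    mat_reference="MAT-2016-01",
)
print(record.missing_documents())
print(record.ready_to_supply())
```

In this example the record reports the MTA as missing, so the strain would not yet be released to a recipient. Keeping the country of origin on the same record is one possible way of retaining the link between origin and end user.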
An EU DG XII project, Microorganisms Sustainable Use and Access Regulation International Code of Conduct (MOSAICC) (http://bccm.belspo.be/projects/mosaicc), produced standard material transfer agreements to facilitate access to genetic resources whilst adhering to the spirit of the CBD and the national and international law governing the distribution of microorganisms (Davison et al. 1998). Many countries are putting in place legislation to control access to their genetic resources. This normally includes certification processes and can help by identifying the relevant authorities for PIC and by setting the terms and conditions of access. However, some legislation has been bureaucratic and has served to restrict access. National Focal Points on access and benefit sharing are beginning to appear, which should make the process simpler. To date most collections have adopted a wait-and-see attitude as their role is still not clear.

5.15 Ownership of Intellectual Property Rights (IPR)

Organisms originating from different habitats all over the world are deposited in collections. On deposit, the issue of ownership of the intellectual property associated with them must be addressed. The CBD bestows sovereign rights over genetic resources on the country of origin, but the intellectual property rights covering their use in processes are another matter. The CBD requires that the country of origin has a share in benefits accruing from such use, but there may be several other stakeholders. These may include the landowner where the organism was isolated, the collector, those involved in purifying and growing the organism, the discoverer of the intellectual property, the owner of the collection where the organism was preserved and the developer of the process. It is clear that not all stakeholders have an equal stake; this will depend upon the input of each one to the discovery or process. This has implications for the sharing of benefits arising from exploitation of the genetic resource. The collection has a role to play in the protection of IPR, even if it is merely informing the recipient of any existing material transfer agreement or of the citation of the strain in a patent. The implementation of the CBD is still being discussed by delegates from the signatory countries, who meet at the Conference of the Parties and in its workgroups. Information on the progress of these discussions can be found on the CBD web site (https://www.cbd.int/).

Patents may be taken out as a form of protecting IPR. In many cases the organism involved must be part of the disclosure, and many countries either recommend or require by law that a written disclosure of an invention involving the use of organisms be supplemented by the deposit of the organism in a recognised culture collection. Most patent lawyers recommend that the organism is deposited, regardless of whether this is a requirement, to avoid the possibility of the patent being rejected.
To remove the need for deposit of organisms in a collection in every country where patent protection is desired, the “Budapest Treaty on the International Recognition of the Deposit of Micro-organisms for the Purpose of Patent Procedure” was concluded in 1977 and came into force towards the end of 1980 (www.wipo.int/treaties/en/registration/budapest/). This recognises named culture collections as “International Depository Authorities” (IDA), and a single deposit made in any one of them is accepted by every country party to the treaty. Any collection can become an IDA provided it has been formally nominated by a contracting state and meets certain criteria. There are 47 IDAs around the world, including 28 in Europe, which accept patent deposits of human and animal cell lines, algae, bacteria, cyanobacteria, fungi, nematodes, non-pathogenic protozoa, plant seeds and yeasts (http://www.wipo.int/export/sites/www/treaties/en/registration/budapest/pdf/idalist.pdf). It is quite clear that every intermediary in an improvement or development process is entitled to a share of the IPR, which adds another dimension to ownership. Therefore, it is critical that clear procedures on access, mutually agreed terms on fair and equitable sharing of benefits and sound material transfer agreements are in place to protect the interested parties.

5.16 Safety information provided to the recipient of microorganisms

A safety data sheet must be despatched with an organism, indicating which hazard group it belongs to and what containment and disposal procedures are necessary; in Europe, see the Code of Practice for Biological Agents 1994 (Anon, 1994). Article 10 of the EU Directive 90/379/EEC regulates that manufacturers, importers, distributors and suppliers must provide safety data sheets in a prescribed format. A safety data sheet accompanying a microorganism must include:
● The hazard group of the organism being despatched.
● A definition of the hazards and assessment of the risks involved in handling the organism.
● Requirements for the safe handling and disposal of the organism.
  - Containment level
  - Opening cultures and ampoules
  - Transport
  - Disposal
  - Procedures in case of spillage

A BRC issues an appropriate safety data sheet with every culture consignment.

A safety data sheet must be despatched with an organism, indicating which hazard group it belongs to and what containment and disposal procedures are necessary. In the UK, microorganisms are covered by the Control of Substances Hazardous to Health (COSHH) Regulations (1988) and HSW Act s.6(4)(c), and are subject to the Approved Code of Practice for Biological Agents 1994. Article 10 of the EU Directive 90/379/EEC regulates that manufacturers, importers, distributors and suppliers must provide safety data sheets in a prescribed format. A safety data sheet accompanying a microorganism must include:
● The hazard group of the organism being despatched, as defined by EU Directive 90/679/EEC Classification of Biological Agents and by the national variation of this legislation; for example, in the UK, as defined in the Advisory Committee on Dangerous Pathogens (ACDP) Categorisation of biological agents, 4th edition, and the Approved Code of Practice (ACOP) for Biological Agents.
● A definition of the hazards and assessment of the risks involved in handling the organism.
● Requirements for the safe handling and disposal of the organism.
  - Containment level
  - Opening cultures and ampoules
  - Transport
  - Disposal
  - Procedures in case of spillage

Safety data sheet content is described in EU Directive 93/112/EC of 10 December 1993 (OL L314/93), which amends Directive 91/155/EC.
See http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A31993L0112

5.17 Areas Beyond National Jurisdiction (ABNJ)

Marine Areas Beyond National Jurisdiction (ABNJ), commonly called the high seas, are those areas of ocean for which no one nation has sole responsibility for management. Action is being taken to improve management of fisheries and strengthen protection of related ecosystems. This is being undertaken, particularly by the Global Environment Facility (GEF) (https://www.thegef.org/topics/areas-beyond-national-jurisdiction), to prevent devastating impacts on marine biodiversity, socio-economic well-being and food security for millions of people directly dependent on those fisheries.

The United Nations Convention on the Law of the Sea (UNCLOS) provides that the areas beyond the limits of national jurisdiction (ABNJ) include:
1. the water column beyond the Exclusive Economic Zone (EEZ), or beyond the Territorial Sea where no EEZ has been declared, called the High Seas (Article 86); and
2. the seabed which lies beyond the limits of the continental shelf, established in conformity with Article 76 of the Convention, designated as "the Area" (Article 1).

Marine ABNJ are increasingly being exploited by human activities including shipping, commercial fishing, deep sea mining and, in the case of EMBRIC, bio-prospecting. The GEF is also associated with many global and regional multilateral agreements that deal with international waters or transboundary water systems (https://www.thegef.org/partners/conventions). As such, the GEF assists its recipient countries with international waters issues as they undertake work under the following conventions:
● The Global Ship Ballast Water Treaty
● The UN Law of the Sea Treaty
● The MARPOL treaty for shipping (International Convention for the Prevention of Pollution From Ships)
● The UN Agreement on conservation and management of straddling fish stocks and highly migratory fish stocks.

The GEF also supports various UN Agency Action Programs such as the Barbados Programme of Action, the Global Programme of Action (GPA) and the Code of Conduct for Fisheries. More often, however, the GEF supports and helps negotiate regional conventions such as the Barcelona, Cartagena, Bucharest and Danube Conventions.
A written submission of the EU and its member states on marine genetic resources, including questions on the sharing of benefits, was made on 22 February 2017: Development of an International Legally-Binding Instrument Under UNCLOS on the Conservation and Sustainable Use of Marine Biological Diversity of Areas Beyond National Jurisdiction, http://www.un.org/depts/los/biodiversity/prepcom_files/rolling_comp/EU_Written_Submission_on_Marine_Genetic_Resources.pdf. The document explains that benefit-sharing under the UNCLOS Implementing Agreement should be in line with the pragmatic approach it outlines. The objective of the future treaty on the conservation and sustainable use of marine biological diversity in areas beyond national jurisdiction (ABNJ) should be built around the conservation of biodiversity. Determination of the legal status of marine genetic resources (MGRs) is not a precondition for addressing relevant provisions concerning potential benefit-sharing with respect to MGRs in a future Implementing Agreement. The Agreement should not regulate the management of fish stocks and fisheries but should cover fish and other biological resources used for research on their genetic properties. IPRs, including disclosure-of-origin requirements in patent applications, should not be within the scope of the UNCLOS IA, as this issue has to be dealt with within the existing institutional frameworks competent in this subject area (WIPO and WTO). In the end, benefit-sharing should be in line with the overall goal of the UNCLOS Implementing Agreement. Hence, it should be conducive to the conservation and sustainable use of marine biodiversity in areas beyond national jurisdiction, to marine scientific research conducted in accordance with UNCLOS, and to the promotion of knowledge generation and innovation. Noting that to date there have been no concrete proposals that demonstrate how such benefit-sharing would operate in practice, the EU and its Member States remain ready to consider any specific proposals that delegations may wish to put forward.

EMBRIC must not only keep a watching brief on the development of this agreement but must also take part in the process, championed by EMBRC, of providing input so that the outcome does not restrict our ability to find new and interesting products from the sea without compromising its biodiversity.


6. Outputs from the microorganism prototype pipeline

Well over a thousand samples were collected from the Pacific Ocean, providing the opportunity to isolate many thousands of strains. Without the application of selective techniques this would have resulted in many of the same common organisms being grown and tested. DSMZ’s focus was to select those organisms that are rarely isolated because of their particular growth requirements and that may produce unique, new and interesting compounds that could be utilised by bioindustry. The methodologies used to achieve this are described in sections 2, 3 and 4 above. Here the results of the targeted isolation programmes and of the characterisation through the microbial prototype pipeline are presented.
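As an illustration of how a list such as Table 5 could be screened for less commonly isolated taxa, the sketch below ranks isolates by their similarity to the closest described relative. It is an assumption-laden example rather than DSMZ's actual selection procedure: the five rows are copied from Table 5, the function name `novelty_candidates` is hypothetical, and the 98.7 % cut-off is a commonly cited 16S rRNA gene threshold that is assumed here to be applicable to the similarity values in the table.

```python
from typing import List, NamedTuple


class Isolate(NamedTuple):
    strain: str
    closest_relative: str
    similarity: float  # percent similarity to the closest relative


# Five rows copied from Table 5 for illustration.
ISOLATES = [
    Isolate("JAB_HD_90", "Brevibacterium picturae", 100.00),
    Isolate("2CW2_PP3", "Wandonia haliotis", 88.36),
    Isolate("ACS3C_E5", "Arcobacter venerupis", 94.71),
    Isolate("3CS2_G3", "Labrenzia aggregata", 100.00),
    Isolate("2CS2_PP5", "Marinifilum flexuosum", 95.28),
]


def novelty_candidates(isolates: List[Isolate], threshold: float = 98.7) -> List[Isolate]:
    """Return isolates below the (assumed) similarity threshold, least similar first."""
    below = [isolate for isolate in isolates if isolate.similarity < threshold]
    return sorted(below, key=lambda isolate: isolate.similarity)


for candidate in novelty_candidates(ISOLATES):
    print(candidate.strain, candidate.closest_relative, candidate.similarity)
```

With these rows, the isolates related to Wandonia haliotis, Arcobacter venerupis and Marinifilum flexuosum are returned, in that order, while the two isolates at 100 % similarity are filtered out. A real selection would also weigh growth requirements and predicted biosynthetic potential, which this sketch ignores.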

6.1 Strains collected from the Pacific Ocean by DSMZ

DSMZ collected samples of sea water, sediment and sponges from the Pacific Ocean in May 2016 and isolated many strains, represented here by 264 species from the phyla Actinobacteria, Bacteroidetes, Proteobacteria and Rhodothermaeota, selected on the basis of their diversity and after de-replicating potential clones (Table 5). The most interesting of these will be selected for cosmid library production. Two Arcobacter species were grown in a fermenter to produce biomass for sequencing and for the production of organic extracts for HZI to analyse. This preliminary test of the pipeline workflows was needed to ensure the protocols selected were optimised. A selection of the most interesting strains will be made from the list shown in Table 5 for fermentation to create biomass and provide organic extracts for analysis throughout April 2018, giving a total of 200 samples; the majority will be subjected to low-level analysis but a few will be selected and analysed more deeply.

6.2 Cosmid libraries and biosynthetic gene clusters at USTAN

As described above in section 3, the bacterial genome sequences provided by DSMZ were analysed using antiSMASH to identify putative novel gene clusters and to predict the gene cluster products. The genomic DNA was then extracted and used to construct a cosmid library for each strain. Gene clusters predicted using antiSMASH were then targeted for heterologous expression. Biosynthetic gene clusters were discovered and an indication of potential bioactive compounds was observed.

After the prediction, cosmid libraries of Saccharothrix espanaensis DSM 44229 and Amycolatopsis japonica MG417-CF17 were constructed, and a BAC library of S. espanaensis DSM 44229 was constructed by the company Bio S&T. The predicted gene clusters screened from the BAC and cosmid libraries are presented in Table 6. The network analysis of the five heterologously expressed gene clusters, viewed in Cytoscape, is shown in Figure 7.
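The step of matching predicted gene clusters to library clones can be sketched as an interval check: a clone is a candidate for heterologous expression of a cluster only if its insert spans the whole cluster. The example below is a minimal sketch under stated assumptions. All coordinates and clone names are invented, the function `clones_covering` is hypothetical, and it is assumed that insert positions on the genome are already known (for example from end sequencing); the report does not describe how the screening was actually performed.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) positions on the genome, in bp


def clones_covering(cluster: Interval, clones: Dict[str, Interval]) -> List[str]:
    """Return the names of clones whose insert fully contains the cluster."""
    cluster_start, cluster_end = cluster
    return sorted(
        name
        for name, (start, end) in clones.items()
        if start <= cluster_start and cluster_end <= end
    )


# Invented example: one predicted cluster and three library clones.
predicted_cluster = (120_000, 150_000)
library = {
    "cosmid_01": (100_000, 140_000),  # ends inside the cluster
    "cosmid_02": (115_000, 155_000),  # spans the whole cluster
    "bac_01": (50_000, 200_000),      # larger BAC insert, also spans it
}

print(clones_covering(predicted_cluster, library))
```

Here `cosmid_02` and `bac_01` are returned and `cosmid_01` is rejected. Because cosmid inserts are typically only around 40 kb, larger clusters may be recoverable only from a BAC library, which is one plausible reason for using both library types.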


Table 5. List of strains collected from the Pacific by DSMZ following dereplication of potential clones

The last four columns (Phylum, Closest relative, Accession number of the closest relative, Similarity) give the taxonomic classification.

Strain name | Sample source | Cultivation strategy | Isolation medium | Phylum | Closest relative | Accession number of the closest relative | Similarity (%)
JAB_HD_11a | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Aeromicrobium ginsengisoli | AB245394 | 99.70
JAB_HD_20 | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Aeromicrobium marinum | NR_025681 | 99.70
JAB_HD_22c | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Aeromicrobium ponti | AM778683 | 99.10
JAB_HD_90 | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Brevibacterium picturae | NR_025614 | 100.00
JAB_HD_28a | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Frigoribacter sp. | Y18807 | 100.00
2i2 | Sponge | Chemotaxis assay | ASW salts + Fatty acid mix | Actinobacteria | Gordonia terrae | KF410339 | 99.73
JAB_HD_35 | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Marmoricola scoriae | FN386750 | 97.80
JAB_HD_95a | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Microbacterium maritypicum | AM181506 | 100.00
JAS_HD_14a | Water | Multiwell plate | ASW salts + 1:10 HD | Actinobacteria | Microbacterium paraoxydans | AJ491806 | 100.00
PNS81A_F11b | Water | Multiwell plate | ASW salts + Polymer mix | Actinobacteria | Micrococcus luteus | AF542073 | 99.73
JAS_HD_29a | Water | Multiwell plate | ASW salts + 1:10 HD | Actinobacteria | Micrococcus yunnanensis | FJ214355 | 100.00
ABW_HD_4_2 | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Nocardia coeliaca | NR_104776 | 100.00
3RW5_G1 | Water | Biofilm assay | ASW salts + 1:10 HD | Actinobacteria | Rhodococcus kyotonensis | AB269261 | 98.60
2RW5_G2 | Water | Biofilm assay | ASW salts + 1:10 HD | Actinobacteria | Rhodococcus yunnanensis | AY602219 | 99.04
JAS_HD_26 | Water | Multiwell plate | ASW salts + 1:10 HD | Actinobacteria | Thermoleophilum minutum | HQ223108 | 99.80
ABW Poly | Water | Multiwell plate | ABW salts + Polymer mix | Actinobacteria | Yonghaparkia alkaliphila | NR_043675 | 97.00
JAB_HD_38 | Water | Multiwell plate | ABW salts + 1:10 HD | Bacteroidetes | Algoriphagus aquimarinus | NR_025602 | 96.90
JAB_HD_10b | Water | Multiwell plate | ABW salts + 1:10 HD | Bacteroidetes | Algoriphagus namhaensis | HQ401024 | 97.80
3RW5_S4b | Water | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Altuibacter lentus | JQ362482 | 96.82
JAB_HD_101 | Water | Multiwell plate | ABW salts + 1:10 HD | Bacteroidetes | Arenibacter troitsensis | AB080771 | 99.90
CS3_PP3b | Sediment | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Cellulophaga geojensis | 315111068 | 99.50
JAB_HD_110 | Water | Multiwell plate | ABW salts + 1:10 HD | Bacteroidetes | Cellulophaga tyrosinoxydans | EU443205 | 100.00
ACS3C_G4 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Bacteroidetes | Dokdonia donghaensis | DQ003276 | 99.45
PCS2D_A12 | Sediment | Multiwell plate | ASW salts + Polymer mix | Bacteroidetes | Draconibacterium orientale | JQ683778 | 97.94
ACS2D_F2 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Bacteroidetes | Flaviramulus ichthyoenteri | JX412958 | 95.76
JAB_HD_16a | Water | Multiwell plate | ABW salts + 1:10 HD | Bacteroidetes | Flavivirga amylovorans | HM475138 | 97.10
CS2_G3a | Sediment | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Formosa spongicola | NR_116612 | 98.52
3RW5_PP6 | Water | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Jejudonia soesokkakensis | KC792554 | 96.29
3CS3_G3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Lacinutrix sp. | FN377744 | 97.31
2RS2_G3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Lutibacter litoralis | AY962293 | 98.10
3RW5_PP3b | Water | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Maribacter dokdonensis | AY960749 | 98.53
3RW5_PP1 | Water | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Maribacter forsetii | NR_042627 | 97.43
2RW5_PS4 | Water | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Maribacter stanieri | EF536747 | 99.69
2CS2_PP5 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Marinifilum flexuosum | HE613737 | 95.28
3CS2_G2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Marinifilum sp. | HE613737 | 97.06
2CW2_G1 | Water | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Mesoflavibacter zeaxanthinifaciens | AB265181 | 98.60
3CW3_G4 | Water | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Nonlabens arenilitoris | JX291103 | 99.73
2CS2_PS1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Olleya marilimosa | JN175350 | 99.86
JAS_HD_7 | Water | Multiwell plate | ASW salts + 1:10 HD | Bacteroidetes | Planomicrobium okeanokoites | AB680292 | 99.70
RW1 10^4 2A light yellow | Water | Multiwell plate | ASW salts + 1:10 HD | Bacteroidetes | Polaribacter porphyrae | NR_114321 | 95.00
2CS3_PS5 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Psychroserpens mesophilus | DQ001321 | 97.82
2CS2_PP2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Sphingobacteria bacterium | AB362263 | 97.41
ACS3D_E6 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Bacteroidetes | Ulvibacter antarcticus | EF554364 | 95.87
2CW2_PP3 | Water | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Wandonia haliotis | FJ424814 | 88.36
ACW3C_G4 | Water | Multiwell plate | ASW salts + 1:10 HD | Bacteroidetes | Winogradskyella aquimaris | HM368527 | 98.18
3RS2_PS2b | Sediment | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Winogradskyella damuponensis | HQ336488 | 98.15
4RS2_PS8 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Winogradskyella rapida | U64013 | 98.01
4RW5_S2 | Water | Biofilm assay | ASW salts + 1:10 HD | Bacteroidetes | Zobellia galactanivorans | NR_074684 | 99.09
RW6 10^3 1 | Water | Multiwell plate | ASW salts + 1:10 HD | Bacteroidetes | Tenacibaculum caenipelagi | NR_125675 | 98.00
JAB_HD_87b | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Albirhodobacter marinus | NR_126203 | 99.00
PRS2_10^3 | Water | Multiwell plate | ASW salts + Polymer mix | Proteobacteria | Alcanivorax borkumensis | NR_074890 | 99.00
MAW6_4 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Alcanivorax venustensis | NR_025145 | 99.39
JAB_HD_31b | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Altererythrobacter epoxidivorans | DQ304436 | 96.10
RW2 10^4 2* | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Alterierythrobacter sp. | FM177586 | 96.00
ARW1_1H2 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Alteromonas australica | FJ595485 | 99.09
ARW3b_1E7_(1small) | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Alteromonas genovensis | NR_042667 | 99.00
RW2 10^3 A1 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Alteromonas hispanica | NR_043274 | 100.00
3CW3_PP2(1) | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Alteromonas marina | AF529060 | 99.07
ARW1_1H1 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Alteromonas stellipolaris | AJ295715 | 99.00
ACW2A_D7 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Amphritea atlantica | AM156910 | 99.10
ACS2D_F3 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Amphritea balenae | AB330883 | 99.50
2RS2_G5 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Antarctobacter heliothermus | DQ915602 | 99.75
RW4 10^3 1 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Arcobacter ellisii | NR_117105 | 97.00
PCS2D_E6 | Sediment | Multiwell plate | ASW salts + Polymer mix | Proteobacteria | Arcobacter marinus | EU512920 | 99.73
ACS2C_E8 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Arcobacter molluscorum | FR675874 | 96.76
ARW1_2G2 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Arcobacter nitrofigilis | CP001999 | 95.61
ACS1D_H8 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Arcobacter sp. | FR717550 | 96.97
ACS2D_F11 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Arcobacter suis | FJ573216 | 94.87
ACS3C_E5 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Arcobacter venerupis | NR_117569 | 94.71
4RW5_G6 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Aurantimonas coralicida | AY065627 | 99.44
3RW5_G2a | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Aurantimonas litoralis | AY178863 | 100.00
ACS1D_D11 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Bordetella parapertussis | U04949 | 97.12
JAB_HD_102a1 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Brevundimonas basaltis | EU143355 | 100.00
JAB_HD_109b | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Brevundimonas denitrificans | AB899817 | 99.50
2CS3_PP4 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Catenococcus thiocycli | HE582778 | 99.85
JAB_HD_37a2 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Caulobacter fusiformis | AJ227759 | 99.00
ASW_HD_1_3 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Celeribacter baekdonensis | NR_117908 | 97.00
3RS2_G1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Cobetia amphilecti | AB646236 | 99.82
2CS3_PP1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Cobetia litoralis | AB646234 | 99.88
RS2_PP4 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Colwellia aestuarii | DQ055844 | 99.75
4CS1_PP4 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Colwellia psychroerythraea | AB011364 | 97.63
JAB_HD_19a | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Devosia insulae | EF012357 | 98.60
JAB_HD_105 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Erythrobacter aquimaris | AY461441 | 99.00
4RW5_G5 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Erythrobacter citreus | AF118020 | 99.60
ACS3C_D9 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Erythrobacter seohaensis | AY562219 | 98.09
ACS3C_D9 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Erythrobacter vulgaris | AY706935 | 97.53
JAB_HD_81b | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Gemmobacter aquatilis | FR733676 | 97.40
JAB_HD_39 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Gemmobacter tilapiae strain Ruye_53 | NR_109053 | 98.80
3RW5_PP8 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Glaciecola agarilytica strain NO2 | DQ784575 | 99.73
2RW5_PP1b | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Glaciecola chathamensis | AB247623 | 99.86
4CW2_PS3 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Glaciecola lipolytica strain E3 | EU183316 | 97.87
2CW3_PS1 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Halocynthiibacter namhaensis | JWIF01000056 | 95.08
RS2_G_1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Halomonas alkaliphila | NR_042256 | 100.00
2CW3_PP1 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Halomonas sp. | AJ876733 | 100.00
MAW8_1 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Halomonas titanicae | NR_116997 | 99.60
3CS2_G1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Halomonas venusta | AJ306894 | 99.87
3RS2_PS2a | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Hoeflea alexandrii | AJ786600 | 99.37
RS2_PS_5 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Hoeflea halophila | NR_108835 | 99.00
4RS2_PP1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Hoeflea marina | AY598817 | 99.82
2RS2_PP3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Hoeflea phototrophica | JF957616 | 98.58
JAB_HD_18 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Hydrogenophaga taeniospiralis | NR_028716 | 97.00
ASW_HD_1_1_1 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Hyphomonas jannaschiana | KF863146 | 99.00
JAB_HD_33b | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Hyphomonas oceanitis | KF863148 | 99.90
MAW7_1 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Hyphomonas sp. | KF863148 | 97.76
3RS2_PS3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Idiomarina abyssalis | NR_024891 | 99.55
2CS1_PP4 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Idiomarina seosinensis | AY635468 | 99.29
ABW_HD_1_3 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Janthinobacterium lividum | NR_026365 | 99.00
3CS2_G3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Labrenzia aggregata | AB681109 | 100.00
CW3_PP4 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Labrenzia alba | NR_042378 | 100.00
2CS3_S1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Leisingera methylohalidivorans | DQ915607 | 97.93
ASW_UV_1_3 | Water | Multiwell plate | ASW salts + Soil extract | Proteobacteria | Leisingera nanhaiensis | NR_116593 | 97.00
ACW2D_F8 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Lentibacter algarum | FJ436732 | 100.00
JAS_HD_27 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Limnobacter litoralis | AB366174 | 100.00
JAB_HD_21 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Limnobacter thiooxidans | NR_025421 | 99.70
JAB_HD_26a | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Limnohabitans parvus | FM165536 | 97.30
2RW5_PP2 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Litoreibacter meonggei | JN021667 | 96.37
ACW3A_B6 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Loktanella koreensis | DQ344498 | 97.67
2CS2_S1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Loktanella soesokkakensis | KC987356 | 100.00
RW3 10^3 2B | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Loktanella tamlensis | NR_115814 | 98.00
JAB_HD_1 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Loktanella vestfoldensis | AJ582226 | 99.20
RW2 10^3 2B | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Marinobacter adhaerens | NR_074765 | 100.00
3RW5_G2b | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Marinobacter algicola | NR_042807 | 99.09
3RS2_PS4a | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Marinobacter antarcticus | FJ196022 | 98.24
3RW5_PP7 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Marinobacter flavimaris | AY517632 | 100.00
2CS1_S4 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Marinobacter lipolyticus | NR_025671 | 100.00
MAW2_1 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Marinobacter salarius | KJ547705 | 100.00
3RS2_PP2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Marinobacter sediminum | NR_029028 | 99.82
PCS3D_B11 | Sediment | Multiwell plate | ASW salts + Polymer mix | Proteobacteria | Marinobacterium rhizophilum | EF192391 | 99.47
Marinobacterium Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria EU573966 98.69 ACS1C_C5a sediminicola Chemotaxis ASW salts +glucose Sponge Proteobacteria Marinomonas alcarazii EU188442 96.76 Gluc_a2CH2 assay 2mM ASW salts + Polymer Marinomonas Sediment Multiwell plate Proteobacteria EU188447 99.34 PCS2C_F6 mix aquiplantarum Chemotaxis ASW salts + glucose Sponge Proteobacteria Marinomonas arctica DQ492749 98.10 a2CH2Gluc assay 2mM Chemotaxis ASW salts + glucose Sponge Proteobacteria Marinomonas foliarum EU188444 98.42 ASW_Sug_1CH4 assay 2mM ASW salts + Polymer Marinomonas Sediment Multiwell plate Proteobacteria AB242868 96.81 PCS2D_E7 mix ostreistagni ASW salts + Polymer Marinomonas Water Multiwell plate Proteobacteria EU188445 98.70 PRW1_1D5 mix posidonica Chemotaxis ASW salts + glucose Sponge Proteobacteria Marinomonas rhizomae EU188443 98.84 ASW_Sug_1CH1 assay 2mM Maritimibacter Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria AB681686 97.40 4RS2_PS5 alkaliphilus ABW_HD_1_1_1 Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Massilia aurea NR_042502 99.00 Methylobacterium Water Multiwell plate ABW salts + 1:10 HD Proteobacteria AB175634 100.00 JAB_HD_31a1 fujisawaense Neptuniibacter Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria GQ131677 95.53 CS2_G2b halophilus Neptunomonas Water Multiwell plate ASW salts + 1:10 HD Proteobacteria NR_114018 99.00 ARW1_2D4 naphthovorans 4RW5_S4 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Nereida ignava DQ915613 97.83 JAB_HD_22b Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Nevskia ramosa AJ001010 99.90 2CW3_S4 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Nisaea nitritireducens DQ665839 99.70 Nitratireductor Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria HQ176467 99.87 3CS1_G3 aquimarinus ASW salts + Polymer Sediment Multiwell plate Proteobacteria Oceanisphaera sp. 
FN377705 98.12 PCS2D_B10 mix ACS1C_B8a Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria Oceanospirillum linum AB680860 98.09

ANS211A_A11 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Octadecabacter DQ915618 97.70


arcticus ARW3B_1D6 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Oricola cellulosilytica KF582604 99.00 CS1_PS1 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Paracoccus caeni GQ250442 99.09 Paracoccus Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria AB681242 97.57 CS1_PS2b seriniphilus CS1_PS2 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Paracoccus siganidrum JX398976 97.85 Parasphingopyxis Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria AB524074 99.84 3RS2_G4a lamellibrachiae Parrarhodobacter Sediment biofilm assay ASW salts + 1:10 HD Proteobacteria AM403160 97.39 RS2_PS_4 aggregans 2CW3_G5 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Pelagicola litoralis EF192392 99.30 RW5_PP4 Water biofilm assay ASW salts + 1:10 HD Proteobacteria Phaeobacter arcticus NR_043888 99.00 2CS2_G3 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Phaeobacter leonis NR_117639 99.43 Photobacterium Water Biofilm assay ASW salts + 1:10 HD Proteobacteria DQ534014 98.84 CW3_PP3 lutimaris 2CS2_PP1 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Ponticoccus litoralis EF211829 98.02 Porphyrobacter Water Multiwell plate ABW salts + 1:10 HD Proteobacteria DQ011529 99.20 JAB_HD_87a dokdonensis Porphyrobacter Water Multiwell plate ABW salts + 1:10 HD Proteobacteria NR_025816 99.00 ABW 1:10 HD 88 rw donghaensis Primorskyibacter Water Biofilm assay ASW salts + 1:10 HD Proteobacteria AB550558 100.00 4CW3_PP1 sedentarius Prosthecomicrobium Water Multiwell plate ABW salts + 1:10 HD Proteobacteria GQ221761 98.30 JAB_HD_19b enhydrum Pseudoalteromonas Water Biofilm assay ASW salts + 1:10 HD Proteobacteria NR_025509 99.00 RW5_PP1 agarivorans Chemotaxis ASW salts + Tween Pseudoalteromonas Sponge Proteobacteria AJ417594 99.82 Tween_4CH3 assay mix agarovorans Pseudoalteromonas Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria AB576636 100.00 CS2_G1 arabiensis Pseudoalteromonas Water biofilm assay ASW salts + 1:10 HD 
Proteobacteria NR_026218 99.00 RW5_G_3 atlantica Pseudoalteromonas Water Biofilm assay ASW salts + 1:10 HD Proteobacteria X82136 100.00 3RW5_S3a carrageenovora


Pseudoalteromonas Water Biofilm assay ASW salts + 1:10 HD Proteobacteria AB681736 100.00 CW3_G1 espejiana Pseudoalteromonas Water Multiwell plate ASW salts + 1:10 HD Proteobacteria NR_029285 99.00 ARW5b_1A1 espejiana Pseudoalteromonas Water Biofilm assay ASW salts + 1:10 HD Proteobacteria AF316144 100.00 2CW2_PS2 issachenkonii Pseudoalteromonas Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria FJ404721 99.81 CS2_PP1 lipolytica Pseudoalteromonas Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria AY563031 99.55 ACS2D_D3 marina ABW HD 2_1 89 b Pseudoalteromonas Water Multiwell plate ABW salts + 1:10 HD Proteobacteria NR_028992 99.00 27.02.14 mariniglutinosa Pseudoalteromonas Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria AF316891 99.11 3CS1_PP3 ruthenica Pseudoalteromonas Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria AB720724 99.87 4CS1_S2 shioyasakiensis Pseudoalteromonas Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria AJ507251 99.05 2CS2_PP3 sp. Pseudoalteromonas Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria JN578478 100.00 ACS2D_B11 sp. Pseudoalteromonas Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria AY040230 98.10 4CS1_PS1 sp. Pseudoalteromonas Water Biofilm assay ASW salts + 1:10 HD Proteobacteria NR_114187 100.00 RW5_PP2 tetraodonis Water Multiwell plate ABW salts + 1:10 HD Proteobacteria AY017341 100.00 JAB_HD_57 chloritidismutans Pseudomonas Water Multiwell plate ABW salts + 1:10 HD Proteobacteria NR_115115 99.00 ABW_HD_2_7_1 chloritidismutans Pseudomonas Water Multiwell plate ABW salts + 1:10 HD Proteobacteria EU791281 99.70 JAB_HD_29 cuatrocienegasensis Pseudomonas Water Multiwell plate ABW salts + 1:10 HD Proteobacteria AJ583501 100.00 ABW_HD_4_3 extremaustralis Pseudomonas Water Multiwell plate ABW salts + 1:10 HD Proteobacteria AF074384 100.00 JAB_HD_50 gessardii JAB_HD_42 Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Pseudomonas guineae NR_042607 99.40


ABW salts + Soil Water Multiwell plate Proteobacteria Pseudomonas migulae NR_114223 99.00 ABW UV _5_3 101 a extract Pseudomonas Water Multiwell plate ASW salts + 1:10 HD Proteobacteria NR_040991 99.00 RW1 10^4 2AB pachastrellae Chemotaxis ASW salts + Tween Sponge Proteobacteria Pseudomonas peli AM114534 98.61 4d1tw assay mix Chemotaxis ASW salts + Tween Sponge Proteobacteria Pseudomonas sp. AJ272544 99.28 Tween_4D1 assay mix ABW_HD_2_5 Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Pseudomonas veronii NR_028706 100.00 ABW 1:10 HD Pseudorhodobacter Water Multiwell plate ABW salts + 1:10 HD Proteobacteria NR_113810 99.00 31.07.13 66 ferrugineus ABW salts + Soil Pseudorhodobacter Water Multiwell plate Proteobacteria NR_109461 99.00 ABW_UV_5_2 extract wandonensis Pseudoruegeria Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria NR_116620 96.90 CS3_PS3b lutimaris Pseudovibrio Water Biofilm assay ASW salts + 1:10 HD Proteobacteria AB681198 99.48 3RW5_S1a ascidiaceicola Chemotaxis ASW salts + Nitrogen Sponge Proteobacteria Pseudovibrio japonicus NR_041391 99.74 1l4 assay compounds 3RW5_PS2 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Pseudovibrio sp. 
HQ647029 98.32 Psychrosphaera Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria AB545807 97.03 ACS3C_D5 saromensis Psychrosphaera Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria AB545807 97.12 ACS3D_F7 saromensis CS1_G3 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria R.algocolus X78315 99.64 4RW5_PS4 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria R.fascians X79186 99.25 ABW salts + Soil Rhizobium Water Multiwell plate Proteobacteria NR_116445 99.00 ABW_UV_2_2 extract rosettiformans Rhizobium Water Multiwell plate ABW salts + 1:10 HD Proteobacteria EF440185 99.20 JAB_HD_56 selenitireducens ABW_HD_1_2 Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Rhodobacter ovatus NR_115057 97.00 Roseibacterium Water Multiwell plate ASW salts + 1:10 HD Proteobacteria FN667962 97.80 JAS_HD_21 elongatum 4CS3_PP2 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Roseobacter sp. AF098495 99.65


Chemotaxis ASW salts + Nitrogen Sponge Proteobacteria Roseovarius aestuarii EU156066 98.76 2l2 assay compounds 4RW5_G2 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Roseovarius litoreus JQ390520 94.63 3CS1_PS2 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Roseovarius pacificus DQ120726 99.12 Roseovarius Water Biofilm assay ASW salts + 1:10 HD Proteobacteria JQ739459 97.32 3RW5_S5a sediminilitoris 2CS1_PS3 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Ruegeria arenilitoris JQ807219 98.32 ACW3D_G5 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Ruegeria atlantica AB255399 100.00 Chemotaxis ASW salts + Nitrogen Sponge Proteobacteria Ruegeria meonggei KF740534 99.75 2l4 assay compounds 3CS1_PS1 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Ruegeria mobilis AB255401 100.00 Ruegeria Water Biofilm assay ASW salts + 1:10 HD Proteobacteria AM905330 100.00 3RW5_PS3 scottomollicae 2CW2_PS3 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Sagittula marina HQ336489 100.00 2CS1_PS4 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Sagittula stellata DQ915628 98.80 4CS1_PP2 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Salipiger mucescens AY527274 99.58 Seohaeicola Water Multiwell plate ABW salts + 1:10 HD Proteobacteria EU221274 100.00 JAB_HD_104 saemankumensis MAS2_1 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Shewanella arctica NR_117528 100.00 ACS3C_D11b Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria Shewanella basaltis EU143361 97.70 3RS2_PP1 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Shewanella kaireitica AB094598 98.10 Shewanella Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria AB081757 98.11 4CS3_G2 marinintestina Shewanella Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria CP000472 98.00 RS2_PP1 piezotolerans ASW salts + DMSO Chemotaxis Sponge 1% and Thiosulfate Proteobacteria Shewanella violacea D21225 98.36 assay 5c3 1mM ABW salts + 
Polymer Sphingomonas Water Multiwell plate Proteobacteria NR_116043 97.20 JAB_Poly_15 mix histidinilytica Sphingomonas Water Multiwell plate ABW salts + 1:10 HD Proteobacteria NR_118124 98.10 JAB_HD_89 starnbergensis JAB_HD_100 Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Sphingomonas wittichii AB021492 98.60


Chemotaxis ASW salts + DMSO Sphingomonas Sponge Proteobacteria X94098 98.81 ASW_DMSO_5D3 assay 1% xenophaga JAB_HD_40 Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Sphingopyxis flavimaris AY554010 99.50 JAB_HD_47b Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Spongiibacter borealis HQ199599 99.90 3RS2_PS1b Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Stappia marina AY628423 99.81 4RW5_S5 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Sulfitobacter delicatus AY180103 100.00 2RS2_G4 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Sulfitobacter dubius AY180102 99.73 ACS3D_G9b Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria Sulfitobacter marinus DQ683726 99.63 3RW5_PP3a Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Sulfitobacter pontiacus Y13155 99.10 4RS2_S3 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Tateyamaria pelophila AJ968651 97.46 2RS2_S3 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Thalassobacter arenae EU342372 95.11 4RS2_PP4 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Thalassobius aestuarii AY442178 97.18 2CS1_S2 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Thalassococcus lentus JX090308 98.78 Thalassospira Water Biofilm assay ASW salts + 1:10 HD Proteobacteria NR_115011 99.84 2CW2_PS1 lucentensis Thalassospira Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria FJ860275 100.00 2CS2_PP4 permensis Thalassospira Water Biofilm assay ASW salts + 1:10 HD Proteobacteria AB548215 100.00 2CW3_PP4 povalilytica Chemotaxis ASW salts + Yeast Sponge Proteobacteria Tropicibacter litoreus NR_117647 98.43 1b5b assay extract Tropicibacter Water Biofilm assay ASW salts + 1:10 HD Proteobacteria HE860710 98.49 2CW2_PS4 mediterraneus CS1_S3b Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria V.pelagius X74722 98.86 JAB_HD_12a Water Multiwell plate ABW salts + 1:10 HD Proteobacteria ginsengisoli AB245358 99.80 JAB_HD_32 Water Multiwell plate 
ABW salts + 1:10 HD Proteobacteria Variovorax ginsengisoli NR_112562 99.70 2CS1_G1 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria azureus NR_041683 99.18 ARW1_1A1 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Vibrio chagasii NR_117891 99.00 Chemotaxis ASW salts + Mix of Sponge Proteobacteria Vibrio cyclitrophicus AM162656 99.49 1d2 assay sugars I ARW1_2A7 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Vibrio gigantis AJ582810 100.00


ASW salts + Polymer Water Multiwell plate Proteobacteria Vibrio hemicentroti JX204734 99.70 PRW2_1E10 mix RW3 10^4 2 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Vibrio lentus NR_028926 99.00 Chemotaxis ASW salts + Mix of Vibrio Sponge Proteobacteria AB680329 99.85 5k2 assay sugars III parahaemolyticus 2CS1_PP1 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Vibrio probioticus AJ345063 98.70 CS1_G2 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Vibrio proteolyticus AB680395 95.87 2CS1_PP2 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Vibrio sp. AJ316171 98.51 3CS2_PP3 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Vibrio sp. AJ316193 99.46 Chemotaxis ASW salts + Mix of Sponge Proteobacteria Vibrio toranzoniae HE978310 99.60 1k3 assay sugars III ARW1_2H4 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Vibrio xuii NR_025478 99.00 Zhongshania Water Multiwell plate ABW salts + 1:10 HD Proteobacteria NR_126306 100.00 JAB_HD_49a aliphaticivorans ABW salts + Soil Water Multiwell plate Proteobacteria Zhongshania antarctica NR_108450 98.00 ABW_UV_6_3 extract Rhodothermae AQXH01000 Water Biofilm assay ASW salts + 1:10 HD Balneola vulgaris 93.96 2CW3_G4 ota 003


Using PCR screening, 18 predicted gene clusters were located in the BAC and cosmid library of S. espanaensis DSM 44229, and 4 of them were transferred into Streptomyces (S. coelicolor and S. lividans) for heterologous expression (Table 6).

Table 6. Predicted gene clusters located by screening the BAC and cosmid library (√ = heterologously expressed in Streptomyces; - = no homologous known gene cluster listed)

| Cluster | Type | Size (kb) | Homologous known gene cluster (similarity level) | Heterologous expression in Streptomyces |
|---|---|---|---|---|
| 1 | Terpene | 22.5 | Lividomycin (6%) | √ |
| 2 | Lantipeptide | 22.6 | Erythreapeptin (75%) | |
| 3 | Terpene | 25.3 | Isorenieratene (42%) | |
| 5 | Furan | 11.1 | Asukamycin (19%) | √ |
| 6 | Nrps | 55.7 | - | |
| 8 | Nrps | 69.8 | Bacillibactin (38%) | |
| 11 | Nrps | 88.3 | Skyllamycin (14%) | |
| 12 | T1pks-Otherks | 82 | - | |
| 17 | Lantipeptide | 22.8 | Kinamycin (5%) | √ |
| 19 | Melanin | 10.5 | - | |
| 23 | Nrps-T1pks | 87.3 | Splenocin (12%) | |
| 24 | T1pks | 68.4 | Streptazone E (66%) | |
| 30 | Thiopeptide | 29.2 | - | √ |
| 31 | Terpene-Lassopeptide-T2pks-Nrps-T1pks | 85.3 | Fluostatin (23%) | |
| 32 | Terpene-Lantipeptide-Nrps-T1pks | 99.4 | Azinomycin B (17%) | |
| 34 | Terpene | 21.8 | - | |
| 35 | Oligosaccharide | 25.3 | - | |
| 36 | Terpene | 22 | SF2575 (6%) | √ |
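As an illustration of how the entries of Table 6 might be ordered when looking for the least characterised clusters, the sketch below sorts them by their similarity to known gene clusters. It is only a sketch: treating a missing hit ("-") as 0% similarity, and using similarity alone as a proxy for novelty, are assumptions made here and not the selection procedure used in the project. The names `CLUSTERS` and `novelty_rank` are hypothetical.

```python
# Rows transcribed from Table 6:
# (cluster, type, size_kb, closest known cluster, similarity_percent, expressed)
# similarity_percent is None where the table shows "-" (no known cluster listed).
CLUSTERS = [
    (1, "Terpene", 22.5, "Lividomycin", 6, True),
    (2, "Lantipeptide", 22.6, "Erythreapeptin", 75, False),
    (3, "Terpene", 25.3, "Isorenieratene", 42, False),
    (5, "Furan", 11.1, "Asukamycin", 19, True),
    (6, "Nrps", 55.7, None, None, False),
    (8, "Nrps", 69.8, "Bacillibactin", 38, False),
    (11, "Nrps", 88.3, "Skyllamycin", 14, False),
    (12, "T1pks-Otherks", 82.0, None, None, False),
    (17, "Lantipeptide", 22.8, "Kinamycin", 5, True),
    (19, "Melanin", 10.5, None, None, False),
    (23, "Nrps-T1pks", 87.3, "Splenocin", 12, False),
    (24, "T1pks", 68.4, "Streptazone E", 66, False),
    (30, "Thiopeptide", 29.2, None, None, True),
    (31, "Terpene-Lassopeptide-T2pks-Nrps-T1pks", 85.3, "Fluostatin", 23, False),
    (32, "Terpene-Lantipeptide-Nrps-T1pks", 99.4, "Azinomycin B", 17, False),
    (34, "Terpene", 21.8, None, None, False),
    (35, "Oligosaccharide", 25.3, None, None, False),
    (36, "Terpene", 22.0, "SF2575", 6, True),
]


def novelty_rank(rows):
    """Sort clusters by ascending similarity to a known cluster.

    Assumption (not from the report): a missing hit counts as 0 % similarity.
    Ties are broken by cluster number so the order is deterministic.
    """
    return sorted(rows, key=lambda row: (row[4] if row[4] is not None else 0, row[0]))


if __name__ == "__main__":
    for cluster, kind, size_kb, hit, similarity, expressed in novelty_rank(CLUSTERS):
        label = f"{hit} ({similarity}%)" if hit else "no known homologue"
        flag = "expressed" if expressed else ""
        print(f"cluster {cluster:>2}  {kind:<40} {size_kb:>5} kb  {label:<25} {flag}")
```

Other criteria, such as cluster size or biosynthetic class, could be added to the sort key in the same way.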

To try to improve production of the heterologously expressed compounds, the Streptomyces promoter ermE* was inserted into the cosmids containing clusters 17, 30 and 36. So far, the promoter has been inserted in one direction facing cluster 36 and in both directions flanking cluster 17. The modified cosmids were then cloned into Streptomyces, and production levels will be examined to determine whether they are enhanced as expected.


Figure 7. Network analysis of the five heterologously expressed gene clusters, viewed in Cytoscape (panels A, B and C)

MS/MS spectra were collected, and the molecular network was generated using cosine scores: edges connect nodes and their scores measure the relatedness of the MS/MS spectra. Each node is colour coded for ease of visualisation.
A: Purple nodes indicate masses that are present only in samples where gene cluster 1 (predicted terpene) was heterologously expressed, and not in any other samples, such as the controls or samples in which other gene clusters were heterologously expressed. Green nodes indicate masses present only in samples where gene cluster 5 (predicted furan) was heterologously expressed. Teal nodes indicate masses present only in samples where gene cluster 17 (predicted lanthipeptide) was heterologously expressed. Orange nodes indicate masses present only in samples where gene cluster 30 (predicted thiopeptide) was heterologously expressed. Yellow nodes indicate masses present only in samples where gene cluster 36 (predicted terpene) was heterologously expressed. Black nodes indicate masses present only in untransformed S. coelicolor and S. lividans. Grey nodes indicate masses present in more than one heterologously expressed gene cluster and/or in a control sample.
B: Example networks from heterologously expressed gene cluster 1. The number on each node represents the parent mass.
C: Example networks from heterologously expressed gene cluster 17. The number on each node represents the parent mass.
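The two ideas in the caption above, a cosine score between MS/MS spectra and a colour assigned according to the samples in which a mass occurs, can be sketched as follows. This is a minimal illustration and not the software used to produce Figure 7: the fixed-width binning, the `bin_width` value, the sample labels (`"cluster_1"`, `"control"`, ...) and the function names are all assumptions made here, and dedicated molecular-networking tools generally use more elaborate scoring.

```python
import math
from collections import defaultdict

# Colours as described in the figure caption; the sample labels are hypothetical.
CLUSTER_COLOURS = {
    "cluster_1": "purple",
    "cluster_5": "green",
    "cluster_17": "teal",
    "cluster_30": "orange",
    "cluster_36": "yellow",
}


def bin_spectrum(peaks, bin_width=0.01):
    """Sum the intensities of (m/z, intensity) peaks that fall in the same m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_score(peaks_a, peaks_b, bin_width=0.01):
    """Plain cosine similarity between two binned spectra (0 = unrelated, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def node_colour(samples):
    """Colour a node from the set of samples in which its mass was detected."""
    samples = set(samples)
    if len(samples) == 1:
        (only,) = samples
        if only in CLUSTER_COLOURS:
            return CLUSTER_COLOURS[only]  # unique to one expressed cluster
        if only == "control":
            return "black"                # only in the untransformed hosts
    return "grey"                         # shared between samples


if __name__ == "__main__":
    spectrum_1 = [(85.03, 40.0), (127.04, 100.0), (145.05, 60.0)]
    spectrum_2 = [(85.03, 35.0), (127.04, 90.0), (163.06, 20.0)]
    print(round(cosine_score(spectrum_1, spectrum_2), 3))
    print(node_colour({"cluster_17"}), node_colour({"cluster_1", "control"}))
```

In a network built this way, an edge would be drawn between two nodes only when their score exceeds a chosen threshold; the threshold itself is a tuning choice not stated in the text.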


6.3 Characterisation of extracts at HZI

A first pilot extract, called ARW1, was obtained by DSMZ and profiled successfully, demonstrating the feasibility in principle of the approach for measuring extracts of rare bacteria (see Fig. 7). The bulk of the study of extracts is to be completed between May 2018 and the end of the project, when 100 – 200 samples will be provided by partners and analysed at low level. A selection of promising samples will then undergo in-depth analysis and annotation.

The ARW1 strain was processed as follows: cells and supernatant were collected and analysed separately (each in duplicate). Metabolites in the supernatant were enriched on an XAD resin and eluted prior to analysis. Each sample was analysed by reversed-phase C18 chromatography (water/acetonitrile gradient) in positive (ESI+) and negative (ESI-) ionization modes with data-dependent acquisition of MS2 spectra. The MS2 spectra were clustered using the CluMSID algorithm. The metabolites were annotated using the HZI in-house bacterial metabolome library. A table with spectral features and additional information (sum formula, signal intensities, adduct formation) was prepared for upload into the database.
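Because each sample is measured in both ESI+ and ESI- modes, the same metabolite is expected at different m/z values in the two runs. The sketch below shows the arithmetic for the two simplest singly charged ions, [M+H]+ and [M-H]-. It is an illustration only: the function names are hypothetical, other adducts are deliberately left out, and glucose is used merely as a familiar example compound, not as a result from the ARW1 analysis.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (standard physical constant, rounded)


def mz_protonated(neutral_monoisotopic_mass):
    """Expected m/z of the singly charged [M+H]+ ion (positive mode, ESI+)."""
    return neutral_monoisotopic_mass + PROTON_MASS


def mz_deprotonated(neutral_monoisotopic_mass):
    """Expected m/z of the singly charged [M-H]- ion (negative mode, ESI-)."""
    return neutral_monoisotopic_mass - PROTON_MASS


def same_metabolite(mz_pos, mz_neg, tolerance_da=0.005):
    """Check whether an ESI+ and an ESI- feature could share one neutral mass.

    The tolerance is an arbitrary illustrative value, not an instrument setting
    taken from the report.
    """
    neutral_from_pos = mz_pos - PROTON_MASS
    neutral_from_neg = mz_neg + PROTON_MASS
    return abs(neutral_from_pos - neutral_from_neg) <= tolerance_da


if __name__ == "__main__":
    glucose = 180.06339  # monoisotopic mass of C6H12O6 in Da
    print(f"[M+H]+ {mz_protonated(glucose):.4f}")
    print(f"[M-H]- {mz_deprotonated(glucose):.4f}")
    print(same_metabolite(mz_protonated(glucose), mz_deprotonated(glucose)))
```

Pairing features across the two ionization modes in this way is one simple means of cross-checking an annotation before it is entered in a feature table.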

Figure 7. Example output data from analysis of extracts at HZI


In summary, typical compounds from bacterial primary metabolism could be identified in cell extracts from ARW1. There was significant overlap between the metabolomes of the cell extract and the supernatant. However, the metabolite annotation is far from complete. To enable a broad group of users to explore these data further, both primary and processed data will be written to the EBI database. In an analogous manner, the bulk of the study, comprising 100 – 200 samples, will be conducted between May 2018 and the end of the project.

6.4 Significance of outputs

● The designed media (formulae in Annexe 1) enabled the isolation of 264 species of rarely isolated bacteria.
● Flow cytometry and biofilm technologies were used to select the marine organisms in samples that had the greatest potential as candidates for biomass production.
● A pulse delivery method to isolate cells with specific properties, involving Multiple Displacement Amplification (MDA), was tried and tested and proved invaluable. This led to new cultivation media to augment the diversity of strains for the pipeline. The technique also allows cells with specific properties to be targeted according to user needs.
● A combined cosmid and BAC library approach to heterologous expression was developed.
● Genome-informed use of isotopes to track new metabolites.
● USTAN generated cosmid libraries for a series of more tractable, sequenced test strains. Biosynthetic gene cluster analysis prioritized 9 clusters according to novelty. Cosmids containing them were taken into strains of Streptomyces coelicolor and Streptomyces lividans for heterologous expression. Fermentation broth extracts revealed heterologously produced metabolites.


● Larger biosynthetic gene clusters were accessed via a complementary BAC library approach, removing the need for cluster stitching prior to heterologous expression.
● The original bottleneck of obtaining sufficient DNA for analysis was overcome, shifting the bottlenecks to:
  - Expression level: addressed by the introduction of promoters, the use of different heterologous hosts and the use of small-molecule elicitors.
  - Metabolite de-replication and assignment: addressed with cluster analysis of LC-MS/MS data, aiding the identification of series of closely related compounds.
  - Cyclic peptides, which are difficult to characterize: new approaches were explored involving pre-incubation with an enzyme.
● HZI has optimized analysis for different metabolite classes, e.g. from short-chain fatty acids to larger polar and nonpolar compounds of primary and secondary metabolism. Lessons learned from the metabolome analysis of microalgae in WP7 accelerate their application to the bacteria.
● Metabolite clustering analysis, with a tool made available as an R package, was used to assist in the identification of novel suites of compounds.
● Three interesting, fully sequenced strains from Institut Pasteur were examined to test the WP6 prototype pipeline. One strain is producing interesting compounds; extracts are being provided to HZI for further analysis.
  - Psychrobacter glacincola CIP 105313T; 1997, ACAM <- D. Nichols, Tasmania Univ., Australia; Shelf Sea Ice
  - Psychrobacter maritimus CIP 108811T; 2005, DSMZ <- L. A. Romanenko: strain Pi 2-20; Sea-ice sample
  - Gillisia hiemivivida CIP 108528T; 2004, J. P. Bowman, Tasmania, Australia: strain IC154; Sea-ice algal assemblage


7. EMBRIC pathways to discovery

The European Marine Biological Research Infrastructure Cluster (EMBRIC) has brought together six Research Infrastructures from the biological sciences to promote the use of marine resources in research and development of biological products.
● EMBRC - European Marine Biological Resource Centre: www.embrc.eu/
● MIRRI - Microbial Resource Research Infrastructure: www.mirri.org/
● EU-OPENSCREEN - European Infrastructure of Open Screening Platforms for Chemical Biology: www.eu-openscreen.eu/
● ELIXIR - A distributed infrastructure for life-science information: https://www.elixir-europe.org/
● AQUAEXCEL - Aquaculture infrastructures for excellence in European fish research: www.eatip.eu
● RISIS - Research Infrastructure for research and information policy studies: http://risis.eu/

The microbial prototype pipeline described here has demonstrated a tried and tested route through some of the facilities of the partners of these infrastructures, leading from marine samples to compounds. EMBRIC offers many routes for researchers to facilitate and accelerate the discovery of potential new biological products. Each participating Research Infrastructure (RI) and its associated centres of excellence provide access to their facilities, expertise and technologies. Researchers can select the centres that offer technologies they do not have access to, including harmonised multidisciplinary workflows that will support their work, thus initiating joint activities to overcome obstacles and bottlenecks. There are numerous routes that may be taken in the discovery process, and EMBRIC has demonstrated how working with the biological and medical science research infrastructures can aid in the selection of the most appropriate screening process, thereby accelerating discovery. The first step is to select the location to sample, followed by the prioritisation of organisms and selective sampling to enrich for organisms with the desired properties. The oceans are vast, their potential is enormous and the organisms are frequently slow growing, so critical decisions should be made to use time and resources effectively. A typical workflow begins at the source, where there are two options: i) isolation directly from the environment; or ii) selection of organisms already isolated and stored in ex situ biological resource collections, such as the microbial domain Biological Resource Centres (mBRCs) coordinated by MIRRI and EMBRC. MIRRI brings together 31 international mBRCs which, together, hold a total of >330K microorganisms, ranging from bacteria (including cyanobacteria) to yeasts, filamentous fungi and algae.
The marine resources from these collections are listed in EMBRIC Deliverable D3.1 but can also be accessed via the EMBRC (http://www.embrc.eu/) and MIRRI (https://www.mirri.org/home.html) websites. There are also resources outside these Research Infrastructures; see the World Data Centre for Microorganisms (WDCM), http://www.wdcm.org/. The first question for a user to consider is: how do researchers identify the potential that exists to be exploited? In some respects, identifying potential is easier if the organism has


been isolated and fully characterised. Some of these data may be available in online collection catalogues or databases, but often contact should be made with the collection itself, or it may be necessary to search literature databases to discover which microorganisms may have the desired properties; EMBRIC, particularly MIRRI and EMBRC, can help with this selection. However, the question remains: how does one identify the potential of organisms in the environment that may be difficult to isolate and maintain in the laboratory? How can the most likely candidates be selected in situ for targeting and investment in order to deliver new and useful properties? EMBRIC has begun the process of mapping the mechanisms currently available to help facilitate such choices. The elements identified for consideration, from source to active molecules, are as follows:
● Identifying the potential of organisms in nature and collections
● Targeting the organisms and selection of the sampling regime
● Identifying collection targets: the organisms, environment and location
● Screening environmental consortia
● Isolating the organisms
● Identifying the organisms, in axenic mono-culture
● Appropriate characterisation, for example through genomic DNA analysis using NGS approaches, e.g. total sample DNA for the yet uncultured
● Selection and application of characterisation technology
● Data analysis and use
● Scaling up
● Extraction
● Purification and delivery of chemically defined compounds
● Compliance with the regulatory environment
EMBRIC Deliverable D3.1, Map of centres of expertise and best practices, provides detailed coverage of the EMBRIC cluster (http://www.embric.eu/deliverables). It describes a web-based tool to provide relevant information and links (http://www.embric.eu/node/121). The deliverable outlines the identification, networking and integration of existing capacities (technologies, knowledge and skills) both within and outside the partnering RIs.
Table 7 (below) describes some of the technologies available across EMBRIC and their application. There is, however, an additional advantage to be gained by using the EMBRIC cluster. As the deliverable describes, pooling expertise allows a critical mass to be focused on specific user problems, such as the development of dedicated isolation protocols (e.g. selective media to access microorganisms that cannot yet be cultivated), thus taking advantage of the full potential of the combined RIs. By accessing the research infrastructures, the loss of resources and time through rediscovery of known and already patented compounds can be avoided (i.e. de-replication). Additionally, this approach supports compliance with regulations, ensuring that permissions to collect are secured where


needed and legal obligations are met. As EMBRIC has moved forward, it has demonstrated through its three pipelines (this one on microorganisms, focussing on bacteria; one on microalgae; and a third on finfish) that platforms and gateways to technologies and collaborations generate new knowledge and give access to new compounds.

Table 7: Some of the technologies available across EMBRIC and their uses (extracted from Deliverable 3.1)

| Technology | Examples of where these are available | Specific use |
|---|---|---|
| Antibodies for carbohydrates | See EU-OPENSCREEN Screening and Chemistry Centres | Antibodies to carbohydrate antigens are critical for the study of bacteria cell-cell adhesion interactions; for the analysis of viral, hormone, and toxin receptors; analysis of the glycosylation of recombinant proteins |
| Assay technology, spectrophotometry, GFP / fluorescence methods | EU-OPENSCREEN | High-throughput screening for early stage discovery |
| RT² Profiler PCR Arrays for gene expression analysis of targeted cell death, inflammation and antioxidant pathways | EMBRC, e.g. SZN | Identification of signalling pathways targeted by marine natural products for their potential therapeutic applications (e.g. anticancer, anti-inflammatory, anti-neurodegenerative) |
| Chemical probes | EU-OPENSCREEN e.g. HZI - Helmholtz Centre for Infection Research; http://www.chemicalprobes.org/ | To establish the relationship between a molecular target with which it interacts and the broader biological consequences of modulating the target in cells or organisms |
| CNMR - Carbon-13 Nuclear Magnetic Resonance | EMBRIC e.g. USTAN - University of St. Andrews | Allows the identification of carbon atoms in an organic molecule |
| Cosmid/BAC/PAC library generation | EMBRIC e.g. USTAN | To identify candidate genes stemming from functional traits |
| GC-MS - Gas Chromatography Mass Spectrometry | See EU-OPENSCREEN Screening and Chemistry Centres | Crude extracts, environmental samples (solvent extracted and treated), complex mixtures of chemicals may be separated, identified and quantified |
| Generation of novel vectors for gene sequence to protein | EMBRIC e.g. USTAN | The identification of novel transcripts |
| HNMR - High Resolution Nuclear Magnetic Resonance | EMBRIC e.g. USTAN | Chemical structure determination |
| HPLC - High Performance Liquid Chromatography | | Used to separate, identify, and quantify each component in a mixture |
| HSQC - Heteronuclear Single Quantum Coherence | EMBRC e.g. USTAN | NMR spectroscopy of organic molecules and is of particular significance in the field of protein NMR. Used to determine chemical structure |
| Identification of the biosynthetic cluster encoding | EMBRIC e.g. USTAN | Target novel activities |
| LC-MS - Liquid chromatography–mass spectrometry | | Differentiate chemical mixtures on the basis of mass |
| Mode of action analysis for isolated and structurally characterised compounds | EU-OPENSCREEN e.g. HZI | Help researchers and technologists to identify potential |
| Morphological keys and intelligent software, Lucid | | Identification of organisms |
| Mutant generation and sequencing | EU-OPENSCREEN e.g. HZI | Modification of expression to generate products |
| NOESY - Nuclear Overhauser Effect Spectroscopy | EMBRIC e.g. USTAN | Characterizing and refining organic chemical structures |
| Phenotypic assays | EU-OPENSCREEN e.g. HZI | Targeting potential source organisms |
| Peptide arrays | EU-OPENSCREEN e.g. HZI | Enzyme profiling |
| Programme to unlock silent clusters | EMBRIC e.g. USTAN | Unlocking the Potential of Bacterial Gene Clusters to Discover New Antibiotics e.g. Seyedsayamdost, M. R. (2014). |
| Sanger sequencing of known regions | MIRRI e.g. CABI, DSMZ | Bar-coding, e.g. Cox, ITS, 16S, 18S; identification of organisms |
| NGS | Numerous sites within EMBRC, ELIXIR and MIRRI | Sequence and annotate genomes at a much faster rate; study variation, expression and DNA binding at a genome-wide level |
| Raman Spectroscopy | | Measuring the chemical identity and structure of materials |
| MALDI-tof Mass Spectrometry | Numerous sites within EMBRC, ELIXIR and MIRRI | Matrix-assisted laser desorption/ionization (MALDI) is a soft ionization technique used in mass spectrometry, allowing the analysis of biomolecules (biopolymers such as DNA, proteins, peptides and sugars) and large organic molecules (such as polymers, dendrimers and other macromolecules); also used for strain definition |
| Mobilization and integration of data on characteristics of individual microorganisms using controlled vocabulary | DSMZ (BacDive) | One stop shop - all analysis of all existing phenotypic, biochemical and molecular information of target microorganisms; extensive search functions across large numbers of different taxa for desired properties |

The microbial prototype pipeline has explored ways to integrate and mine metabolomic, genomic and metagenomic data. The technologies available enable us to analyse samples before isolation and to identify cells of interest by challenging them in the sample with substrates and isolating those individual cells that show activity. Organisms normally difficult to isolate from marine samples have been grown and characterised, and interesting compounds identified. A prototype pipeline utilizing expertise and facilities in three European Research Infrastructures demonstrates the value for a researcher of accessing facilities not normally available to them. In May 2016, over 900 samples from the Pacific Ocean were collected; many responded positively to substrate challenges, particularly to cysteine, isovaleric acid, spermidine and Tween. The candidate strains isolated were made available for compound discovery and were used to test the original workflows of the pipeline. This resulted in a revised pipeline to identify and access the strains of greatest potential (see Fig. 9).

Figure 9. Revised pipeline following initial testing

As reported above, USTAN generated cosmid libraries for a series of more tractable and sequenced test strains. Biosynthetic gene cluster analysis prioritised 9 clusters according to novelty. Cosmids containing them were incorporated into strains of Streptomyces coelicolor and Streptomyces lividans for heterologous expression. Fermentation broth extracts revealed heterologously produced metabolites. Larger biosynthetic gene clusters were accessed via a complementary BAC library approach, removing the need for cluster stitching prior to heterologous expression. The methodologies are shared in this deliverable to enable researchers to follow similar approaches, but it is undoubtedly more efficient to involve the appropriate microbial domain Biological Resource Centres in the study.

The workflow in the microbial prototype pipeline tested here continues with difficult-to-culture microbes sequenced at DSMZ, selection of the most interesting strains for cosmid and BAC library generation, and heterologous expression by USTAN. Many such routes can be envisaged, using different technologies and expertise at the many different centres within the EMBRIC RI cluster, depending on the end products or properties being sought. Once the potential activities and compounds have been identified, extracts are analysed and compounds isolated by HZI. Again, methodologies may differ: HZI extracts different metabolite classes, from short-chain fatty acids to larger polar and nonpolar compounds of primary and secondary metabolism.

EU-OPENSCREEN (http://www.eu-openscreen.eu/) is a distributed RI which integrates high-capacity screening platforms throughout Europe. Each can be engaged to help in compound analysis and characterization. They jointly use a rationally selected compound collection, comprising up to 140,000 commercial and proprietary compounds collected from European chemists. EU-OPENSCREEN offers researchers from academic institutions, SMEs and industrial organisations open access to its shared resources, and will collaboratively develop novel molecular tool compounds with external users from various disciplines of the life sciences.

Active compound discovery, isolation and characterization are not straightforward, and various hurdles and bottlenecks present themselves. The EMBRIC partners work together to resolve or circumvent these bottlenecks, helping researchers to accelerate their route to discovery (see Fig. 10).

Figure 10. Bottlenecks in the microbial prototype pipeline


Figure 11. Overview of routes to discovery

[Figure 11: table-style diagram with the columns Source, Microbial Pipeline Prototype, Best Practice and Research Infrastructure partners. Its rows are the stages Sourcing organisms; Cultivation and biomass production; Determination of organism potential; Extract production and exploration; Rapid identification of new compounds ensuring mitigation against wasted resource on compound rediscovery; Purification of characterised compounds; Protection of investment and future use.]

Once fully tested and operational, the microbial prototype pipeline was tested with 3 interesting fully sequenced strains from Institut Pasteur. These strains were isolated from sea-ice and were considered to have potential new compounds to be exploited:

● Psychrobacter glacincola CIP 105313T; 1997, ACAM <- D. Nichols, Tasmania University, Australia; isolated from shelf sea ice
● Psychrobacter maritimus CIP 108811T; 2005, DSMZ <- L. A. Romanenko: strain Pi 2-20; sea-ice sample
● Gillisia hiemivivida CIP 108528T; 2004, J. P. Bowman, Tasmania, Australia: strain IC154; sea-ice algal assemblage

The isolates entered the pipeline at the characterisation level with full genome sequences and went to USTAN, where a cosmid library was generated, heterologous expression was carried out and novel biosynthetic gene clusters were mined. One strain was found to produce interesting compounds; extracts were provided to HZI for further analysis.

Many more routes to discovery are possible via various RI nodes or partner institutions, given the comprehensive capacities of the EMBRIC partners (Brennecke et al. 2018; Smith et al. 2018). Figure 11 provides an overview of this potential. The 900 samples collected from the Pacific entered the prototype pipeline established by EMBRIC at DSMZ, USTAN and HZI. Here the focus was on difficult-to-isolate bacteria and on DNA sequencing to discover biosynthetic genes, with the hope of discovering bioactive metabolites. The starting point and ultimate goals of a screening project might be quite different, but would not be beyond the expertise and facilities of EMBRIC and its component RIs.
Figure 11 outlines the EMBRIC microorganism prototype pipeline and its stepwise process:

● Cultivation and biomass production
● Determination of organism potential
● Extract production and exploration
● Rapid identification of new compounds ensuring mitigation against wasted resource on compound rediscovery
● Purification of characterised compounds
● Protection of investment and future use

Microorganisms are a group that includes unicellular and filamentous organisms such as Archaea and Bacteria (prokaryotes), as well as yeasts, filamentous fungi, microalgae and unicellular protists and protozoans (eukaryotes). The EMBRIC partner with the most appropriate expertise would be chosen to help isolate and grow the organism to create the necessary biomass. The researcher or bioprospecting company would then choose which technologies they need to discover the molecules they are looking for. In the majority of cases genome technologies would be utilised; these are available not only in EMBRIC via MIRRI, EMBRC and ELIXIR but also from many other sources outside EMBRIC. Once properties are discovered, the molecules of interest need to be extracted, purified and characterized. Again, appropriate technologies can be selected from the EMBRIC partners, bridging potential gaps in the researcher’s facilities and presenting new and different approaches to ease the path to discovery.
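The stepwise process listed above, together with the observation that material can enter the pipeline at different levels, can be sketched as an ordered list of stages. This is only an illustrative sketch: the stage names are taken from the list above, while the names `PIPELINE_STAGES` and `remaining_stages` are hypothetical and not part of any EMBRIC software.

```python
# Ordered stages of the microorganism prototype pipeline, as listed in the text.
PIPELINE_STAGES = [
    "Cultivation and biomass production",
    "Determination of organism potential",
    "Extract production and exploration",
    "Rapid identification of new compounds",
    "Purification of characterised compounds",
    "Protection of investment and future use",
]


def remaining_stages(entry_stage):
    """Return the stages still to be run when material enters at entry_stage.

    Hypothetical helper: raises ValueError if the stage name is unknown.
    """
    start = PIPELINE_STAGES.index(entry_stage)
    return PIPELINE_STAGES[start:]


# Example: material that already has biomass and genome data might enter
# at the second stage (an assumption made for illustration only).
todo = remaining_stages("Determination of organism potential")
for step in todo:
    print(step)
```

Representing the route as a plain ordered list keeps the sketch independent of which partner performs each stage, which the text leaves open.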


8. Summary

The overarching objective of Work Package 6 is to demonstrate that the components of the pipeline, from the microorganisms in the environment to the active molecules and products on the market, are effective, taking relevant input from the work of EMBRIC WPs 2-5. This required the development of coherent chains of high-quality services for access to biological, analytical and data resources, and the deployment of common underpinning technologies and practices for the route to useful secondary metabolites from marine bacteria. A prototype pipeline to demonstrate the cross-infrastructure possibilities brought together the expertise and facilities of single partners from the MIRRI, EMBRC and EU-OPENSCREEN research infrastructures. Although EMBRIC (specifically, EMBRC and MIRRI) covers most types of microorganism, bacteria were chosen as the starting point for the microorganism prototype pipeline as they are key components of the marine environment. They perform a wide range of biogeochemical and ecological functions, yet we know very little about them. It is estimated that less than 1 percent have been seen in culture, and their vast potential remains locked away. Population genomics has provided us with a picture of what might be there and an idea of the chemistry they may perform. EMBRIC’s microorganism prototype pipeline demonstrates how this potential can be unlocked. It utilises the specialist expertise and facilities of key laboratories, and targets selected organisms that are difficult to cultivate or have yet to be grown. DSMZ recovered strains from samples taken from the Pacific Ocean, and these were characterised and prepared for scale-up and production of active compounds. DSMZ determined optimal growth conditions for some of these organisms, which were then sequenced so that bacterial strains with novel properties could be targeted and studied further.
In one single experiment to develop this microbial prototype pipeline, 264 rarely isolated species of bacteria were made available for study of their bioactive compounds. This demonstrates that coordinated and targeted isolation programmes, engaging research teams from the many partners in the Research Infrastructures and orchestrated for specific bioindustry needs, could yield many thousands of candidate organisms. The potential for discovery of new and interesting bioactive compounds is thus greatly increased. USTAN produced cosmid libraries and detected 9 gene clusters and a series of compounds. Improvements to the process were made using elicitors and promoters, and some difficult cyclic peptides were characterised. HZI improved the efficiency of different extraction and characterization techniques for use on different types of organisms, and as a result isolates producing interesting compounds were discovered. The prototype pipeline was then tested using fully sequenced strains from Institut Pasteur. One strain produced a very interesting profile of compounds and is being studied further. The microorganism prototype pipeline has resulted in a flexible system for accessing cross-Research Infrastructure expertise and facilities to meet the specific, user-defined demands of the research community.

Such research must be performed in compliance with the regulatory environment. In the interests of the progress of science, microbiologists must be able to exchange the organisms upon which their hypotheses and results are based, but they must do this in a way that presents minimum risk to those who come into contact with the organism. They must not fall foul of the laws that control the shipping of microorganisms, as this will inevitably result in even more restrictive legislation that will make their exchange impossible. Health and safety, packaging and shipping, and controlled distribution legislation may be extensive and sometimes cumbersome, but it is there to protect us and must be followed.


9. References and bibliography

Adler J (1973) A method for measuring chemotaxis and use of the method to determine optimum conditions for chemotaxis by Escherichia coli. J Gen Microbiol 74: 77–91.
Anon (1994) Approved Code of Practice for Biological Agents 1994. Health and Safety Executive. Sudbury: HSE Books.
Anon (1996a) Categorisation of pathogens according to hazard and categories of containment. Fourth edition. Advisory Committee on Dangerous Pathogens (ACDP). London: HMSO.
Anon (1996b) European Standard EN 829:1996 E: Transport packages for medical and biological specimens, Requirements, tests. Brussels: CEN, European Committee for Standardisation.
Brennecke P, Ferrante MI, Johnston IA and Smith D (2018) A collaborative European approach to accelerating translational marine science. Springer’s Marine Biotechnology Journal. In the press.
Bruns A, Hoffelner H and Overmann J (2003) A novel approach for high throughput cultivation assays and the isolation of planktonic bacteria. FEMS Microbiol Ecol 45: 161–71.
Camarinha-Silva A, Jauregui R, Chaves-Moreno D, Oxley AP, Schaumburg F, Becker K, et al. (2014) Comparing the anterior nare bacterial community of two discrete human populations using Illumina amplicon sequencing. Environ Microbiol 16: 2939–2952.
Connon SA and Giovannoni SJ (2002) High-throughput methods for culturing microorganisms in very-low-nutrient media yield diverse new marine isolates. Appl Environ Microbiol 68: 3878–3885.
Davison A, Brebandere J de and Smith D (1998) Microbes, collections and the MOSAICC approach. Microbiology Australia 19(1): 36–37.
Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460–2461.
EEC Directives 93/88/EEC. Protection of workers from risks related to biological agents.
Fröstl JM and Overmann J (1998) Physiology and tactic response of the phototrophic consortium “Chlorochromatium aggregatum”. Arch Microbiol 169: 129–135.
Galkiewicz JP and Kellogg CA (2008) Cross-kingdom amplification using bacteria-specific primers: complications for studies of coral microbial ecology. Appl Environ Microbiol 74: 7828–7831.
Gich F, Janys MA, König M and Overmann J (2012) Enrichment of previously uncultured bacteria from natural complex communities by adhesion to solid surfaces. Environ Microbiol 14: 2984–2997.
González-Menéndez V, Asensio F, Moreno C, de Pedro N, Monteiro MC, de la Cruz M et al. (2014) Assessing the effects of adsorptive polymeric resin additions on fungal secondary metabolite chemical diversity. Mycology 5: 179–191.
IATA - International Air Transport Association (2002) Dangerous Goods Regulations. 43rd edition. Montreal; Geneva: IATA.
Jaspers E (2000) “Zur ökologischen Bedeutung der Diversität planktischer Bakterien: Erkenntnisse aus der Analyse von Reinkulturen.” Ph.D. Dissertation, University of Oldenburg.
Kjelleberg S, Humphrey BA, Marshall KC (1982) Effect of interfaces on small, starved bacteria. Appl Environ Microbiol 43: 1166–1172.
Lane DJ (1991) 16S/23S rRNA sequencing. In: Stackebrandt E, Goodfellow M (eds.), Nucleic Acid Techniques in Bacterial Systematics, John Wiley and Sons, New York, vol. pp. 115–175.
Muyzer G, de Waal EC, Uitterlinden AG (1993) Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Appl Environ Microbiol 59: 695–700.
Overmann J (2005) Chemotaxis and behavioral physiology of not-yet-cultivated microbes. Methods Enzymol 397: 133–147.
Overmann J (2013) Principles of enrichment, isolation, cultivation, and preservation of prokaryotes. In The prokaryotes (pp. 149–207). Springer Berlin Heidelberg.
Overmann J, Abt B and Sikorski J (2017) Presence and future of culturing bacteria. Annu Rev Microbiol 71: 711–730.
Pascual J, Wüst PK, Geppert A, Foesel BU, Huber KJ and Overmann J (2015) Novel isolates double the number of chemotrophic species and allow the first description of higher taxa in Acidobacteria subdivision 4. Syst Appl Microbiol 38: 534–544.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P et al. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 41: D590–596.
Seyedsayamdost MR (2014) High-throughput platform for the discovery of elicitors of silent bacterial gene clusters. Proc Natl Acad Sci 2014, http://www.pnas.org/content/111/20/7266.abstract
Smith D, Buddie AG, Goss R, Overmann J, Lepleux C, Brönstrup M, Kloareg B, Meiners T, Brennecke P, Ianora A, Bouget F-Y, Gribbon P and Pina M (2018) Marine resources for discovery pipelines – an ocean of opportunity for biotechnology? Biotechnology and Bioprocess Engineering. In the press.

Useful websites

World Federation for Culture Collections: http://www.wfcc.info/
World Data Centre for Microorganisms: www.wdcm.org/
ACM – Asian Consortium for the Conservation and Sustainable Use of Micro-organisms: www.acm-mrc.asia/
ECCO, European Culture Collection Organisation: http://www.eccosite.org
European Commission DGVII – Transport: https://ec.europa.eu/transport/about-us_en
Food and Agriculture Organization (FAO): http://www.fao.org/home/en/
World Animal Health Organization (OIE): http://www.oie.int/eng/en_index.htm
International Plant Protection Convention (IPPC): https://www.ippc.int/en/
The Australia Group: http://www.australiagroup.net/
Biological Weapons Convention (BWC): https://www.un.org/disarmament/wmd/bio/
MIRCEN - global network of Microbial Resources Centres: http://www.ejbiotechnology.info/content/mircen/index.html
WIPO - World Intellectual Property Organization: http://www.wipo.int
ISO - International Organization for Standardization: https://www.iso.org/home.html
International Air Transport Association: http://www.iata.org/Pages/default.aspx; Dangerous Goods regulations: http://www.iata.org/publications/dgr/pages/index.aspx
International Civil Aviation Authority: https://www.icao.int/
Pipeline and Hazardous Materials Safety Administration: https://www.phmsa.dot.gov/international-program/international-civil-aviation-organization
Maritime Law: https://www.investopedia.com/terms/m/maritime-law.asp; https://www.thegef.org/partners/conventions

Useful bibliography

Anon (1994). Approved Code of Practice for Biological Agents 1994. Health and Safety Executive. Sudbury: HSE Books.
Anon (1996b). European Standard EN 829:1996 E: Transport packages for medical and biological specimens, Requirements, tests. Brussels: CEN, European Committee for Standardisation.
Cartagena Protocol on Biosafety to the Convention on Biological Diversity, https://bch.cbd.int/protocol.
EC Council Directive 2000/29/EEC on protective measures against the introduction into the Member States of harmful organisms of plant or plant products. OJ No. L. 169, p.1 of 10.07.2000.
EC Council Regulation 1504/2004 amending and updating Regulation 1334/2000.
EC Council Directive 95/44/EC on establishing the conditions under which certain harmful organisms, plants, plant products and other objects listed in Annexes I to V to Council Directive 77/93/EEC may be introduced into or moved within the Community or certain protected zones thereof, for trial or scientific purposes and for work on varietal selections.
EC Council Directives 90/219/EEC and 98/81/EC on contained use of genetically modified organisms.
EC regulation 1946/2003 on the transboundary movement of genetically modified organisms (pertains to Cartagena Protocol on Biosafety).
EC Council Directive 2000/54/EEC on the protection of workers from risks related to exposure to biological agents at work. OJ No. L. 262, pp.21-45 of 18.09.2000.
EN 1619:1996 Biotechnology – large-scale process and production – General requirements for management and organisation for strain conservation procedures.
EC Council Regulation No 1334/2000 of 22 June 2000 setting up a Community regime for the control of exports of dual-use items and technology. OJ No L 159 of 30.6.2000 (Amended by: EC Council Regulation 149/2003 of 27 January 2003, OJ L 30 of 05.02.2003, corrigendum OJ L 52 of 27.02.2003).
IATA - International Air Transport Association (2005) Dangerous Goods Regulations. 47th edition. Montreal; Geneva: IATA.
ISO 17025:2005, General requirements for the competence of testing and calibration laboratories.
ISO 7218:2000, Microbiology of food and animal feeding stuffs. General rules for microbiological examinations.
ISO 9001:2000, Quality Management Systems – Requirements.
Kirsop, B.E. & Doyle, A. (eds) (1991). Maintenance of Microorganisms and Cultured Cells: A Manual of Laboratory Methods, London: Academic Press.
OECD (2001). Biological Resource Centres. Underpinning the future of life sciences and biotechnology.
OECD (2006). Creation and Governance of Human Genetic Research Databases.
Smith, D, Rohde, C (2002). The implication of the biological and toxin weapons convention and other related initiatives for WFCC members. WFCC Newsletter 34: 4-11.
Technical Instructions for the Safe Transport of Dangerous Goods by Air. Doc 9284-AN/905. Council of ICAO, International Civil Aviation Organisation.
United Kingdom National Culture Collection (1998) Quality manual. UKNCC Secretariat, CABI Bioscience UK Centre, Egham, UK.
Universal Postal Convention, Compendium of Information, Bern (International Bureau), Universal Postal Union, Beijing, 2000.
World Federation for Culture Collections (1999). Guidelines for the establishment and operation of collections of cultures of microorganisms. UK: WFCC Secretariat. (2nd ed.).
WHO World Health Organization, Geneva, Non-serial Publication, ISBN: 92 4 154650 6. Laboratory Biosafety Manual, Third Edition, English, 2004.


Annexe 1. Additional information on media formulae used at DSMZ

A) Medium “Artificial Sea Water (ASW)/HD 1:10”
Peptone 0.5 g
Glucose 0.1 g
Yeast extract 0.25 g
ASWᵃ 1000.00 mL
Adjust pH to 7.3
Add to 1000 mL of medium after autoclaving:
Trace element solution SL-10ᵇ 1.00 mL
Vitamin solutionᶜ 1.00 mL
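Because the recipes in this annexe are given per 1000 mL, preparing a smaller or larger batch is a linear scaling of each amount. The following is a minimal sketch of that arithmetic for medium A; the amounts are copied from the recipe above, while the dictionary keys and the helper name `scale_recipe` are hypothetical and chosen only for illustration.

```python
# Medium A amounts per 1000 mL of artificial sea water (ASW), from the recipe.
MEDIUM_A_PER_LITRE = {
    "peptone_g": 0.5,
    "glucose_g": 0.1,
    "yeast_extract_g": 0.25,
    "trace_element_solution_SL10_mL": 1.0,  # added after autoclaving
    "vitamin_solution_mL": 1.0,             # added after autoclaving
}


def scale_recipe(per_litre, volume_ml):
    """Scale per-litre amounts linearly to the requested batch volume in mL."""
    if volume_ml <= 0:
        raise ValueError("volume_ml must be positive")
    factor = volume_ml / 1000.0
    return {name: amount * factor for name, amount in per_litre.items()}


# Example: amounts for a 250 mL batch.
batch = scale_recipe(MEDIUM_A_PER_LITRE, 250)
for name, amount in batch.items():
    print(f"{name}: {amount:g}")
```

The pH adjustment to 7.3 is not part of the scaling and still has to be done on the prepared batch.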

B) Medium “polymer mix”
Peptone 1.0 g
Chitin 1.0 g
Cellulose 1.0 g
Xylan 1.0 g
Curdlan 1.0 g
Basal mediumᵃ 1000.00 mL
Adjust pH to 7.3
Add to 1000 mL of medium after autoclaving:
Trace element solutionᵇ 1.00 mL
Vitamin solutionᶜ 1.00 mL

C) Medium “insoluble humic analogs”
Abietic acid 500 µM
Quercetin 500 µM
Coumestrol 500 µM
Methyl cinnamate 500 µM
ASWᵃ 1000.00 mL
Adjust pH to 7.3
Add to 1000 mL of medium after autoclaving:
Trace element solution SL-10ᵇ 1.00 mL
Vitamin solutionᶜ 1.00 mL

D) Medium “soluble humic analogs”
Salicylate 500 µM
Phthalic acid 500 µM
AQDS 500 µM
Furfural 500 µM
Hydroxymethylfurfural 500 µM
Lignosulfonate 500 µM
Basal mediumᵃ 1000.00 mL
Adjust pH to 7.3
Add to 1000 mL of medium after autoclaving:
Trace element solution SL-10ᵇ 1.00 mL
Vitamin solutionᶜ 1.00 mL

E) Medium “Soil Extract (SE)/HD 1:10”
Peptone 0.50 g
Yeast extract 0.25 g
Glucose 0.10 g
4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) 10 mM
Soil extract mediumᵈ 1000 mL
Add to 1000 mL of medium after autoclaving:
Trace element solution SL-10ᵇ 1.00 mL
Vitamin solutionᶜ 1.00 mL

ᵃ Depending on the salinity of the seawater samples, media were based either on artificial sea water (ASW, modified from Bruns et al., 2003) or artificial brackish water (ABW).

● Artificial Sea Water (ASW; modified from Bruns et al., 2003)
NaCl 23.6 g
MgCl2·7H2O 4.53 g
CaCl2·2H2O 1.3 g
KCl 0.64 g
MgSO4·7H2O 5.94 g
Na2HPO4 0.01 g
NH4NO3 2.1 mg
4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) 2.3 g
Distilled water 1000.00 mL

● Artificial Brackish Water (ABW)
NaCl 5.53 g
MgCl2·7H2O 2.76 g
CaCl2·2H2O 0.33 g
KCl 0.15 g
Na2SO4 0.90
NaHCO3 0.20 g
KBr 22.8 mg
H3BO3 5.7 mg
SrCl2 5.4 mg
NH4Cl 4.9 mg
KH2PO4 1.2 mg
NaF 0.7 mg
HEPES 2.38 g
Distilled water 1000.00 mL

ᵇ Trace element solution
HCl (25%; 7.7 M) 10 mL
FeCl2·4H2O 1.50 g
ZnCl2 70.0 mg
MnCl2·4H2O 100.0 mg
H3BO3 6.0 mg
CoCl2·6H2O 190.0 mg
CuCl2·2H2O 2.0 mg
NiCl2·6H2O 24.0 mg
Na2MoO4·2H2O 36.0 mg
Distilled water 990 mL
First dissolve FeCl2 in HCl, then dilute with water, add and dissolve the other salts. Finally make up to 1000 mL.

ᶜ Vitamin solution
Biotin 2.0 mg
Folic acid 2.0 mg
Pyridoxine-HCl 10.0 mg
Thiamine-HCl·2H2O 5.0 mg
Riboflavin 5.0 mg
Nicotinic acid 5.0 mg
D-Ca-pantothenate 5.0 mg
Vitamin B12 0.10 mg
p-aminobenzoic acid 5.0 mg
Lipoic acid 5.0 mg
Distilled water 1000 mL

ᵈ Soil extract medium (SE)
The SE medium was prepared following an established procedure (DSM medium 12; http://www.dsmz.de/), using the sediment samples. After autoclaving and centrifugation, the resulting supernatant was filtered through a 0.22-µm filter and then autoclaved a second time.

ᵉ Chemoattractants used in the chemotaxis experiments
• Tween 0.001%
• DMSO 1%
• Mix sugars I (trehalose, cellobiose, maltose) 2 mM each
• Mix sugars II (gentiobiose, sucrose) 2 mM each
• Mix sugars III (N-acetylglucosamine, mannitol, rhamnose) 2 mM each
• KH2PO4 2 mM
• 20 amino acids 2 mM
• Fatty acid mix (formate, acetate, valerate, propionate, butyrate) 2 mM each
• TCA mix (lactate, succinate, citrate, pyruvate, oxaloacetate, α-ketoglutarate) 2 mM each
• Nitrogen compounds (NH4+, TMAO, urea) 1 mM each
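Several components in this annexe are specified as molar concentrations (for example HEPES at 10 mM in medium E), whereas the ASW and ABW recipes list weighed amounts. The conversion between the two can be sketched as below. The helper name `grams_needed` is hypothetical; the molar mass of HEPES (238.30 g/mol) is a standard value and is not taken from this deliverable. Molar masses of other components would have to be supplied by the user.

```python
HEPES_MOLAR_MASS = 238.30  # g/mol; standard literature value, not from the source


def grams_needed(conc_mM, molar_mass_g_per_mol, volume_ml):
    """Mass in grams required for a target concentration in a given volume.

    conc_mM is the target concentration in mmol/L, volume_ml the final volume.
    """
    moles = (conc_mM / 1000.0) * (volume_ml / 1000.0)
    return moles * molar_mass_g_per_mol


# HEPES at 10 mM in 1000 mL of medium.
hepes_g = grams_needed(10, HEPES_MOLAR_MASS, 1000)
print(f"HEPES for 10 mM in 1 L: {hepes_g:.3f} g")
```

The result of about 2.38 g per litre agrees with the HEPES amount listed for the artificial brackish water, which suggests the weighed amounts correspond to roughly 10 mM.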


Annexe 2. Details of applied UPLC-MS/MS method

1. Final preparation of lyophilised extracts for UPLC-MS analysis
Re-suspend extracts in the corresponding extraction/fractionation solvent to reach a concentration of 10 mg/mL. Transfer 20 µL of this into an LC-MS analysis vial with glass insert. In the case of dichloromethane/methanol (1:1, v/v), cyclohexane and ethyl acetate, these 20 µL have to be evaporated under a stream of nitrogen, since these solvents are not compatible with the later chromatography conditions. After evaporation of the solvent, add 20 µL of DMSO/acetonitrile (1:1, v/v) for resuspension.

2. Liquid chromatography – tandem mass spectrometry
The injection volume was 4 µL. Extracts were separated by ultra-high performance liquid chromatography on a Dionex Ultimate 3000 UPLC (Thermo Fisher Scientific, Waltham, MA) using a 150 mm Kinetex C18 reversed phase column with 1.7 µm particle size and 2.1 mm inner diameter (Phenomenex, Aschaffenburg, Germany) at a flow rate of 300 µL/min. Gradient elution with water with 0.1% (v/v) formic acid as eluent A and acetonitrile with 0.1% (v/v) formic acid as eluent B was run as follows: 1% B from t = 0 min to t = 2 min, linear gradient from 1% B to 100% B from t = 2 min to t = 20 min, hold at 100% B until t = 25 min, and linear gradient from 100% B to 1% B from t = 25 min to t = 30 min. Samples were analysed by positive mode and negative mode electrospray ionization quadrupole time-of-flight mass spectrometry on a maXis™ HD QTOF (Bruker, Bremen, Germany) in full scan mode (50–1500 Da). Data-dependent MS/MS was performed by collision-induced dissociation of the three most abundant ions in each scan, making use of Bruker’s “smart exclusion” functionality to minimise multiple fragmentation of the same ion. The collision energy was ramped from 80% to 200% of the default auto-MS/MS collision energy in order to obtain more information-rich spectra.
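The gradient programme described above is piecewise linear, so the eluent composition at any time can be computed directly from the four segments. The following sketch encodes the stated time points and percentages; the function name `percent_b` is hypothetical and the code is only an illustration of the arithmetic, not instrument control software.

```python
def percent_b(t_min):
    """Return the percentage of eluent B at time t_min (minutes, 0 to 30)."""
    if not 0 <= t_min <= 30:
        raise ValueError("gradient is defined for 0 to 30 min")
    if t_min <= 2:
        return 1.0                                        # hold at 1% B
    if t_min <= 20:
        return 1.0 + 99.0 * (t_min - 2) / (20 - 2)        # ramp 1% -> 100% B
    if t_min <= 25:
        return 100.0                                      # hold at 100% B
    return 100.0 - 99.0 * (t_min - 25) / (30 - 25)        # ramp 100% -> 1% B


# Print the composition at a few time points across the run.
for t in (0, 2, 11, 20, 25, 27.5, 30):
    print(f"t = {t:>4} min: {percent_b(t):.1f}% B, {100 - percent_b(t):.1f}% A")
```

Eluent A is simply the remainder to 100%, since the method uses a two-solvent system.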

Both an external calibration before and after every run and an internal lock mass calibration in every scan were applied, which leads to a small mass error (in most cases <1 ppm). This should in turn yield very few candidate sum formulae, in most cases just one, for compounds with masses <400 Da. In addition, the isotopic pattern is taken into account to predict or narrow down the potential sum formula. However, automated sum formula assignment was not implemented.
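To illustrate what a mass error below 1 ppm means in absolute terms, the sketch below applies the standard definition of relative mass error in parts per million. The function names `ppm_error` and `mass_window` are hypothetical, and the example m/z values are illustrative rather than measured data from this work.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def mass_window(theoretical_mz, tol_ppm=1.0):
    """Lower and upper m/z bounds for a given ppm tolerance."""
    delta = theoretical_mz * tol_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


# At m/z 400, a 1 ppm tolerance corresponds to +/- 0.0004 m/z units.
low, high = mass_window(400.0, tol_ppm=1.0)
print(f"1 ppm window at m/z 400: {low:.4f} to {high:.4f}")
print(f"error of an illustrative reading: {ppm_error(400.0004, 400.0):.2f} ppm")
```

Such a narrow window is the reason so few sum formulae remain plausible for compounds below 400 Da.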
