Grant Agreement Number: 654008

EMBRIC

European Marine Biological Research Infrastructure Cluster to promote the Blue Bioeconomy

Horizon 2020 – the Framework Programme for Research and Innovation (2014-2020), H2020-INFRADEV-1-2014-1

Start Date of Project: 01.06.2015 Duration: 48 Months

Deliverable D6.1 b EMBRIC showcases: prototype pipelines from the microorganism to product discovery (Revised 2019)

HORIZON 2020 - INFRADEV Implementation and operation of cross-cutting services and solutions for clusters of ESFRI


Grant agreement no.: 654008
Project acronym: EMBRIC
Project website: www.embric.eu
Project full title: European Marine Biological Research Infrastructure cluster to promote the Bioeconomy (Revised 2019)
Project start date: June 2015 (48 months)
Submission due date: May 2019
Actual submission date: Apr 2019
Work Package: WP 6 Microbial pipeline from environment to active compounds

Lead Beneficiary: CABI [Partner 15]
Version: 1.0
Authors:
SMITH David [CABI Partner 15]
GOSS Rebecca [USTAN Partner 10]
OVERMANN Jörg [DSMZ Partner 24]
BRÖNSTRUP Mark [HZI Partner 18]
PASCUAL Javier [DSMZ Partner 24]
BAJERSKI Felizitas [DSMZ Partner 24]
HENSLER Michael [HZI Partner 18]
WANG Yunpeng [USTAN Partner 10]
ABRAHAM Emily [USTAN Partner 10]
FIORINI Federica [HZI Partner 18]

Project funded by the European Union’s Horizon 2020 research and innovation programme (2015-2019)

Dissemination Level
PU Public: X
PP Restricted to other programme participants (including the Commission Services)
RE Restricted to a group specified by the consortium (including the Commission Services)
CO Confidential, only for members of the consortium (including the Commission Services)


Abstract

Deliverable D6.1b replaces Deliverable 6.1, “EMBRIC showcases: prototype pipelines from the microorganism to product discovery”, with the goal of refining the technologies used and, more specifically, delivering the results of the microbial discovery pipeline. The objective of work package 6, “Microbial pipeline from environment to active compounds”, was to develop coherent chains of high-quality services for access to biological, analytical and data resources, and to deploy common underpinning technologies and practices for the discovery of useful secondary metabolites from marine bacteria. This prototype pipeline brings together the expertise and facilities of partners from the MIRRI, EMBRC and EU-OPENSCREEN research infrastructures (RIs) to target and release the potential of microorganisms from isolation through characterisation to end product. Deliverable D3.1, “Map of centres of expertise and best practices”, from EMBRIC work package 3 provides comprehensive coverage of the cluster capabilities. Work package 6 demonstrated how the discovery pipelines can utilize the partner RIs and, specifically, how some of the services can be linked to produce specific end products. EMBRIC has established an overarching and operational structure to facilitate the integration of the multidisciplinary value chains of services. The partner RIs are committed to retaining these activities as a legacy of the EMBRIC project.

Bacteria are key components of the marine environment, performing a wide range of biogeochemical and ecological functions, yet we know very little about them. It is estimated that we have seen less than 1 percent in culture, and the vast potential remains locked away. Population genomics has provided us with a picture of what might be there and an idea of the chemistry they may perform. EMBRIC’s microorganism pipeline aims to unlock this potential.
Utilizing the specialist expertise and facilities of key laboratories, organisms that are difficult to cultivate or have yet to be grown were targeted, characterised and prepared for scale-up and production of active compounds. DSMZ undertook to make difficult-to-culture bacteria amenable to subsequent natural compound analyses. Optimal growth conditions for some of these organisms were determined to enable strains with novel properties to be grown. Following de-replication, DSMZ isolated 264 species of slow-growing bacteria from sea water, sediment and sponges of the Pacific Ocean, representing phyla including Bacteroidetes and Rhodothermaeota. The most interesting were selected for cosmid library production. Two Arcobacter strains are described as novel species in the genus; these exemplary strains were fermented and organic extracts were supplied to HZI for analysis. USTAN had the role of improving the efficiency and effectiveness of a harmonised natural product discovery pipeline. Cosmid libraries were established for test strains and put through heterologous expression platforms to determine the presence of compounds. Nine gene clusters and a series of compounds were detected. Improvements to the process were made using elicitors and promoters, and some difficult cyclic peptides were characterised. HZI improved the efficiency of different extraction and characterization techniques for use on different types of organisms and, as a result, isolates producing interesting compounds were discovered. Once the pipeline through the partner infrastructures had been fine-tuned, fully sequenced user samples were put through the microbial pipeline, demonstrating that interesting compounds could be discovered. CABI focused on the regulatory environment, addressing issues around access to genetic resources and, in particular, access and benefit sharing.
The microorganism prototype pipeline resulted in a flexible system for accessing expertise and facilities across Research Infrastructures to meet the specific, user-defined demands of the research community. In one single experiment to develop the EMBRIC microbial prototype pipeline, a total of 264 rarely isolated species of bacteria were made available for study of their bioactive compounds. This demonstrates that coordinated and targeted isolation programmes, engaging research teams from the many partners in the Research Infrastructures and orchestrated for specific bioindustry needs, could yield many thousands of candidate organisms. The potential for discovery of new and interesting bioactive compounds is thus greatly increased.


Contents

Abstract
1. Introduction
1.1 Microorganism prototype pipeline
    Cultivation and culture collection
    Determination of organism potential
    Extract production
    Purification of characterised compounds
    Legal framework
2. Protocols for isolation, selection, growth of strains and extract supply
2.1. Sample collection and processing
2.2. Cultivation strategies
2.2.1 Single dilution high-throughput in liquid media
2.2.2 Direct plating method
2.2.3 Growth in biofilms
2.2.4 Chemotaxis chambers
2.3. Taxonomic affiliation of isolates
2.4. Maintenance procedures
2.5. Fermentation of bacteria for natural product production
3. The genomic characterisation, cosmid library production and analysis
3.1 Gene Clusters Prediction
3.2 Genomic DNA Preparation
3.3 Vector DNA Preparation
3.4 Construction of Cosmid Library
3.5 Transformation of Gene Clusters and Secondary Metabolite Heterologous Expression
4. Analysis of extracts
4.1 Analysis of extracts at USTAN
4.1.1 Secondary metabolites production at USTAN
4.1.2 Secondary metabolites extraction
4.1.3 LC-HRMS detection
4.1.4 Network analysis
4.2 Analysis of extracts at HZI
5. Regulatory Environment
5.1 Regulatory Guidance and Community Best Practices
5.2 Regulations impacting on collection, handling, use and distribution of organisms
5.3 Health and Safety
5.4 Microorganisms as hazardous substances
5.5 Classification of Microorganisms on the Basis of Hazard
5.6 Quarantine regulations
5.7 Postal Regulations and Safety
5.8 Packaging
5.9 Regulations governing distribution of cultures
5.10 Control of Distribution of Dangerous Organisms
5.11 Biological Weapons Convention
5.12 Export Licensing Measures
    USA
    European
5.13 Control procedures: UKNCC control of Dangerous Pathogens
5.14 Convention on Biological Diversity
5.15 Ownership of Intellectual Property Rights (IPR)
5.16 Safety information provided to the recipient of microorganisms
5.17 Areas Beyond National Jurisdiction (ABNJ)
6. Outputs from the microorganism prototype pipeline
6.1 Strains collected by DSMZ
6.2 Cosmid libraries and biosynthetic gene clusters at USTAN
6.3 Characterisation of extracts at HZI
6.4 Significance of outputs
7. EMBRIC pathways to discovery
8. Summary
9. References and bibliography
Annexe 1. Additional information on media formulae used at DSMZ
Annexe 2. Details of applied UPLC-MS/MS method


1. Introduction

The European Marine Biological Research Infrastructure Cluster (EMBRIC) is designed to accelerate the pace of scientific discovery and innovation from marine bio-resources. EMBRIC aims to promote new applications derived from marine organisms in fields such as drug discovery, novel foods and food ingredients, aquaculture selective breeding, bioremediation, cosmetics and bioenergy (http://www.embric.eu/). Researchers design their own experiments to discover novel properties or products, but such programmes can be enhanced by using the facilities available through European Research Infrastructures. EMBRIC addresses the steps to discovery, from selection and isolation of the marine microorganisms to the characterisation and isolation of bioactive compounds. EMBRIC tests the different routes that can be followed and demonstrates the advantages of engaging with the most appropriate technologies available.

Additionally, there are numerous areas where policy, regulations or legislation impact on the isolation, use and distribution of organisms. These include health and safety, quarantine regulations, shipping, packaging, regulations governing the distribution of cultures (including dangerous pathogens), export licensing measures, ownership of Intellectual Property Rights (IPR) and more. Compliance with this regulatory environment may be complicated and ever changing, but it is not new. Individuals and their institutions must address it, and in many cases community best practices or common approaches have helped them. For example, culture collections, through the European Culture Collections’ Organisation (ECCO) and the World Federation for Culture Collections (WFCC), have supported their members in compliance, particularly in keeping track of shipping regulations and, more recently, the Nagoya Protocol. In the context of EMBRIC, two of the infrastructures, MIRRI and EMBRC, have developed extensive support tools and advice.

This report examines existing initiatives and identifies gaps in support in order to facilitate research and development in a compliant operational framework. However, it is the pipeline itself that EMBRIC focussed on, in order to demonstrate how appropriate technologies could be harnessed with user needs in mind to help accelerate the discovery process. An initial small group of four laboratories was selected to work together to demonstrate how workflows through different centres and across Research Infrastructures could be optimised.

1.1 Microorganism prototype pipeline

Discovery and exploitation of natural compounds from bacteria and fungi

The aim of the EMBRIC microorganism prototype pipeline was to coordinate the expertise, technologies and facilities of Research Infrastructures to harness the most appropriate processes to accelerate discovery of new products from marine bioresources. Individual research institute or company pipelines have been developed, often restricted by the in-house technologies and expertise available. The objective of the EMBRIC initiative was to develop coherent chains of high-quality services for access to biological, analytical and data resources, and to deploy common underpinning technologies and practices for the route to useful secondary metabolites from marine bacteria. The prototype pipeline brought together expertise and facilities of partners from the MIRRI, EMBRC and EU-OPENSCREEN research infrastructures to target and release the potential of the organisms from isolation through characterisation to end product.

Bacteria are key components of the marine environment, performing a wide range of biogeochemical and ecological functions, yet we know very little about them. It is estimated that we have seen less than 1 percent in culture, and the vast potential remains locked away. Population genomics has provided us with a picture of what might be there and an idea of the chemistry they may perform. EMBRIC’s microorganism prototype pipeline (Fig. 1) aimed to unlock this potential. Utilizing the specialist expertise and facilities of key laboratories, organisms that are difficult to cultivate or have yet to be grown were targeted, characterised and prepared for scale-up and production of active compounds.

Figure 1. An overview of the EMBRIC microorganism prototype pipeline (see section 5 for further information on legal framework)

Cultivation and culture collection

• Improved targeted isolation and characterization of strains (substrate pulse experiment)
• Biomass production
• Fermentation and extraction scale up

EMBRIC partners are enhancing and developing new tools for isolating yet-to-be-cultured microorganisms. For example, DSMZ has used substrate pulse techniques on marine microbial communities to identify active cells, and has also employed a biofilm tool that has facilitated microbial isolation. Challenging cells with specific substrates can not only indicate what may be required in growth media but can also help target organisms with specific properties. The expertise of the microbiologists across EMBRC and MIRRI can be harnessed to facilitate the creation of sufficient biomass from organisms hitherto difficult or impossible to cultivate.

Determination of organism potential

Working with slow-growing marine organisms can be laborious and expensive in time and resources, so it is important that effort focuses on the organisms with potential. There are numerous methodologies that can be applied to characterise and discover potential; the microorganism prototype pipeline explores and evaluates some of them, for example:

• Characterizing the metabolome; GC/MS and LC/MS(/MS) pipeline; correlating metabolism with gene expression data
• Genome sequencing to determine biosynthetic gene clusters (BGCs)
• Comparing approaches to unlocking natural product biosynthesis using:
  o Chemical elicitors (rapid but untargeted)
  o Heterologous expression (cosmid and BAC libraries): slow but precise, and affording opportunity for further manipulation
• Liquid chromatography-tandem mass spectrometry (LC-MS/MS), proton nuclear magnetic resonance (1H NMR), carbon-13 nuclear magnetic resonance (13C NMR), Heteronuclear Single Quantum Correlation (HSQC) and Nuclear Overhauser Effect Spectroscopy (NOESY) characterization
• Metabarcoding technology, which can characterise the species compositions of mass samples of environmental DNA

Extract production

• Test efficiency of different extraction and characterization techniques
• Selection of phenotypic assays; peptide arrays; mutant generation and sequencing; chemical probes
• Develop specific purification procedures for each molecule, e.g. extraction/isolation of example natural products

The extraction method limits the compounds available in the sample for assay; EMBRIC expertise can identify the appropriate stage of the organism’s life cycle or the triggers needed to express the target chemistry. Identifying the active component of the extract requires different technologies; the resources of the RIs can be accessed to carry out the process efficiently whilst utilizing their expertise, e.g. EU-OPENSCREEN provides infrastructure for Chemical Biology and its translation to medicine, agriculture and bioindustries. HZI has protocols for the detection and identification of different metabolite classes, from small volatile compounds (e.g. short-chain fatty acids) to larger polar and nonpolar compounds of primary and secondary metabolism. Different state-of-the-art routes to access, and then heterologously express, identified biosynthetic gene clusters encoding novel chemistries are being explored and compared.

Purification of characterised compounds

• Mode of action analysis for isolated and structurally characterised compounds
• Characterise the metabolome by untargeted GC/MS and LC/MS
• Correlate metabolism with gene expression data
• Test efficiency of different extraction and characterization techniques
• Test extracts and pure compounds in cellular bioassays
• For bioactive compounds: mode of action studies, peptide arrays, mutant generation and sequencing
• Links to relevant data sets outside the partners to add value

Compound identification in GC-MS analyses is carried out using the NIST’11 spectral library; the Golm Metabolome Database is available as an in-house GC-MS library for comparative purposes. For the LC-MS/MS workflow, HZI uses publicly available and commercial databases (metaboBASE, Metlin, HMDB and GNPS), 610 in-house standards and a self-programmed MS2 spectra clustering algorithm.

Legal framework

There is extensive legislation concerning the safe handling, use and distribution of microorganisms at the national, regional and international levels. Microorganisms of hazard groups 2, 3 and 4 are hazardous substances and as such fall under the EU Biological Agents Directive. They are also dangerous goods as defined by the International Air Transport Association (IATA) Dangerous Goods Regulations, which define the requirements for their packaging. The potential misuse of microorganisms has led to biosecurity control measures that restrict access to dangerous pathogens. Most recently, the Nagoya Protocol is being implemented by nations to ensure that benefit sharing occurs as a result of accessing genetic resources. The legislation and supporting documents are often difficult to find and understand. EMBRIC offers advice and interpretation to help the implementation of such legislation. Examples are:


• Health and Safety
• Classification of Microorganisms on the Basis of Hazard
• Quarantine regulations
• Postal Regulations and Safety
• Packaging
• Regulations governing distribution of cultures
• Legislation on the Proliferation, Distribution and Misuse of Dangerous Pathogens
• Export Licensing Measures
• Convention on Biological Diversity and the Nagoya Protocol
• Ownership of Intellectual Property Rights (IPR)
• Safety information provided to the recipient of microorganisms

The samples collected in the course of the studies of work package 6 are either from the DSMZ collection or directly from marine Areas Beyond National Jurisdiction (ABNJ), e.g. the Pacific Ocean. ABNJ, commonly called the high seas, are those areas of ocean for which no one nation has sole responsibility for management (see section 5.17 below). Action is being taken to improve management of fisheries and strengthen protection of related ecosystems, but at the time of collection there were no access controls.

The intellectual property generated by work package 6 (and indeed the whole EMBRIC project) is subject to section 8 of the EMBRIC Consortium Agreement. Section 8.3.1.1 states that Own Results are the exclusive property of the Party who generated the Result alone. Each Party is free to protect Own Results as deemed necessary. Section 8.3.1.2 (Joint Results) states that in the event that Results are generated by the staff of two or more Parties inseparably (Joint Result), the Joint Results are co-owned by the Parties in proportion to their intellectual, human, material and financial contributions, unless the Parties expressly agree otherwise. Any Joint Result consisting of a new patent, software or other knowledge protected by intellectual property rights will be regulated by a joint ownership agreement, which will be established between the co-owning Parties. The joint ownership agreement shall be signed by all of the co-owning Parties before any industrial or commercial use of the Joint Results. This joint ownership agreement shall specify, inter alia, the applicable arrangements in case of the extension of rights, as well as those applicable to the allocation and assumption of expenses in connection with the requested protection.

Access Providers available for the full microorganism prototype pipeline:

• EMBRC: University of St Andrews, School of Chemistry
• MIRRI: Leibniz-Institut DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Germany; CABI - Centre for Agriculture and Biosciences International, UK
• EU-OPENSCREEN: Helmholtz Centre for Infection Research, Germany


2. Protocols for isolation, selection, growth of strains and extract supply

The cultivation of bacteria is highly biased toward a few phylogenetic groups. However, many of the currently underexplored bacterial lineages likely have novel biosynthetic pathways (Overmann et al., 2017). The access to those enigmatic taxa would facilitate the discovery of bioactive molecules with new scaffolds or targets, avoiding the re-discovery of known natural products. In this regard, we have developed improved cultivation methods based on a better understanding of the ecology of previously not-cultured bacteria. Below, we detail the methodology followed for the isolation of bacterial strains from marine samples, and their subsequent taxonomic identification, preservation, selection for further steps, fermentation and generation of an organic extract collection.

2.1. Sample collection and processing

Seawater samples were collected close to the water surface (1 m depth) in sterile Nalgene bottles. Marine sediments were sampled by divers or with a small crane and transferred directly to sterile 50 ml reaction tubes. Samples were kept at 4 ºC and processed within 10 h after sampling. Subsamples were fixed in 2% (v/v) glutaraldehyde. For the bacterial isolation, sediment samples were dispersed by vortexing in 10 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) buffered at pH 7.3.

2.2. Cultivation strategies

For bacterial isolation, several complementary strategies were applied in order to maximise the diversity of isolated strains.

2.2.1 Single dilution high-throughput in liquid media

This strategy applied a high-throughput cultivation approach based on (i) liquid oligotrophic media, (ii) a low concentration of inocula in order to dilute out less abundant but fast-growing bacteria and (iii) long incubation times. These three factors allow access to difficult-to-grow bacteria (Connon and Giovannoni, 2002; Pascual et al., 2015; Overmann et al., 2017). Parallel liquid cultures were set up in 96-well microtiter plates. Before inoculation, total bacterial cell numbers were determined for each natural sample by fluorescence microscopy after staining with SYBR Green I (Life Technologies, Ltd, Paisley, UK). Each well of the microtiter plates was filled with 180 µl of medium and subsequently inoculated with 20 µl of inoculum containing 10 or 50 cells (Bruns et al., 2003). The plates were filled and inoculated either by hand using multichannel pipettes or automatically using the Thermo Scientific™ Multidrop™ Combi Reagent Dispenser. The outer wells of each plate (36 wells) were not inoculated and served as negative controls.
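The plate layout and inoculum described above can be checked with a short calculation. The following is a minimal sketch, not part of the original protocol: the function names and the example cell density of 1e6 cells per ml are hypothetical, while the 96-well format, the 20 µl inoculum and the targets of 10 or 50 cells per well are taken from the text.

```python
import string

PLATE_ROWS = 8             # standard 96-well plate: rows A-H
PLATE_COLS = 12            # columns 1-12
INOCULUM_VOLUME_UL = 20.0  # inoculum volume per well, from the protocol


def outer_wells(rows=PLATE_ROWS, cols=PLATE_COLS):
    """Return the names of the edge wells (e.g. 'A1'), used as negative controls."""
    names = []
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                names.append(f"{string.ascii_uppercase[r]}{c + 1}")
    return names


def dilution_factor(sample_cells_per_ml, target_cells_per_well,
                    inoculum_ul=INOCULUM_VOLUME_UL):
    """Fold dilution of the sample so that one inoculum holds the target cell number."""
    target_cells_per_ml = target_cells_per_well * 1000.0 / inoculum_ul
    return sample_cells_per_ml / target_cells_per_ml


controls = outer_wells()
inoculated_wells = PLATE_ROWS * PLATE_COLS - len(controls)

# Hypothetical sample density from the SYBR Green I count (assumption).
example_density = 1.0e6  # cells per ml
fold_for_10_cells = dilution_factor(example_density, 10)
fold_for_50_cells = dilution_factor(example_density, 50)
```

Under these assumptions, the edge of an 8 × 12 plate contains 36 wells, which matches the number of negative controls stated in the protocol and leaves 60 wells per plate for inoculation.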
Five different culture media were used for the bacterial enrichment and isolation: (i) Artificial Sea Water (ASW)/HD 1:10; (ii) medium “polymer mix”; (iii) medium “insoluble humic analogs”; (iv) medium “soluble humic analogs” and (v) medium Soil Extract (SE)/HD 1:10 (see Annexe 1 - Additional Information). Plates were incubated at 15 °C in the dark for 6-12 weeks. After incubation, the bacterial community grown in each well was analysed by a barcoded Illumina paired-end sequencing method targeting the V1-V2 hypervariable region of the 16S rRNA gene (Camarinha-Silva et al., 2014). The taxonomy of the reads was assigned against the SILVA database (v.128) (Quast et al., 2013) with UCLUST (Edgar, 2010). An isolation strategy was then selected according to the taxonomic structure of the bacterial community in each well. Aliquots of each culture were plated on the medium described above, solidified with 0.8% gelrite (w/v) [SERVA, Heidelberg, Germany]. After incubation for 4-6 weeks, several representative colonies were picked from each plate and purified by three additional passages on the corresponding solidified medium.

2.2.2 Direct plating method

The direct plating method was used for the enrichment of slow-growing marine bacteria which need interactions with other microorganisms for growth. Those interactions may be related to the synthesis and subsequent release of metabolites or signal molecules to the surrounding medium (Overmann, 2013). This approach is limited to bacteria able to produce (micro-)colonies on solid media. Five different culture media solidified with 0.8% gelrite (w/v) were used for the bacterial enrichment: (i) ASW/HD 1:10; (ii) medium “polymer mix”; (iii) medium “insoluble humic analogs”; (iv) medium “soluble humic analogs” and (v) medium Soil Extract (SE)/HD 1:10 (see Annexe 1 - Additional Information). Experiments were carried out in 90 mm Ø Petri dishes. Tenfold serial dilutions of the natural samples were performed in Artificial Sea Water medium (ASW; modified from Bruns et al., 2003) or Artificial Brackish Water (ABW) (see Annexe 1 - Additional Information). Subsequently, 100 µl of the 10⁻³ or 10⁻⁴ dilution was added to the culture medium surface and spread with a Drigalski spreader. Plates were incubated at 15 °C in the dark for 6-12 weeks. After incubation, several representative colonies were picked from each plate and purified by three additional passages on the corresponding solidified medium.

2.2.3 Growth in biofilms

For the enrichment and isolation of biofilm-forming bacteria, the methodology described by Gich et al. (2012) was adapted to marine samples. Solid surfaces may stimulate cell division and growth of starved bacteria (Kjelleberg et al., 1982; Overmann, 2013). Strips consisting of different inert solid materials (steel, glass, polypropylene or polystyrene) were employed and incubated in 20 ml glass vials (Fig. 2).
Solid surfaces were incubated in ASW/HD 1:10 and inoculated with 1000 cells from the natural samples. Vials were incubated at 15°C for 8 weeks. Three sequential enrichments were done, transferring the solid surfaces to fresh medium every third month, and incubating the cultures at room temperature. Finally, the biofilm that was formed over the surface of the strips was spread onto ASW/HD 1:10 solidified with gelrite [0.8% (w/v)] and cultures were purified by subsequent re-streaking.

Figure 2. 20 ml-glass vials used for the enrichment and isolation of biofilm-forming bacteria - Two parallel steel strips are submerged in ASW/HD 1:10.

2.2.4 Chemotaxis chambers

Another cultivation approach for the enrichment and subsequent isolation of marine bacteria exploited the chemotactic responses of bacteria to specific nutrients (Adler 1973; Fröstl & Overmann, 1998; Overmann, 2005). Although restricted to motile and chemotactically active microorganisms, a considerable fraction of species can be covered with this technique, particularly in bacterioplankton communities (Overmann, 2005). For the chemotaxis assays, glass capillaries loaded with defined substrate solutions were inserted in a suspension of motile microorganisms, and the accumulation of cells at the opening of or within the capillary was monitored by direct or indirect methods (Overmann, 2005). The substrates used for the isolation of marine bacteria are listed in Annexe 1 as Additional Information. The concentration of attractants has been found to be a critical parameter for observing a chemotactic response of bacteria (Jaspers, 2000).

Experiments were set up in small microscopic chambers (Fig. 3; modified from Overmann, 2005), which were prepared using small 21 × 21 × 0.17 mm coverslips as spacers between the microscope slide and the lid, which consisted of another 60 × 24 × 0.17 mm coverslip. Spacers and the lid were fixed by sealing the two short edges and one of the long edges of the chamber with a paraffin/mineral oil mixture (4:1, v/v). Flat rectangular glass capillaries with a length of 50 mm, inside dimensions of 0.1 × 1.0 mm, and a capacity of 5 µL (Vitrocom, Mountain Lakes, NJ) were used for most applications. These capillaries fit exactly into the opening of the chemotaxis chamber. The specific geometry of these capillaries permits a direct light microscopic examination of their contents. For marine samples, the small microscopic chambers were incubated at room temperature for 3 hours. After incubation, the capillaries were removed from the chambers. For direct microscopy of the accumulated microorganisms, the open end of each capillary was immediately sealed with plasticine. Subsequently, bacterial cells trapped in the capillaries could be transferred to Petri dishes or 96-well plates filled with ASW/HD 1:10 by exerting positive pressure with a pipette from one end of the capillary.

Figure 3. Laboratory-made microscopic chamber - Two sides and the back of the chamber are sealed with paraffin/mineral oil (not shown). Flat glass capillaries are loaded with substrate solutions, sealed at one end with plasticine, and inserted into the chamber through the opening in the front.
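The stated capillary capacity follows directly from the capillary geometry given in section 2.2.4. The sketch below is an illustrative check only; the function names and the example cell count are hypothetical and do not come from the original protocol.

```python
def capillary_volume_ul(length_mm, inner_height_mm, inner_width_mm):
    """Internal volume of a flat rectangular capillary (1 mm^3 equals 1 µL)."""
    return length_mm * inner_height_mm * inner_width_mm


def cells_per_ml(cells_counted, volume_ul):
    """Convert a cell count within a known volume to a density in cells per mL."""
    return cells_counted * 1000.0 / volume_ul


# Dimensions from the text: 50 mm long, 0.1 x 1.0 mm inside.
capacity_ul = capillary_volume_ul(50.0, 0.1, 1.0)

# Hypothetical count of accumulated cells in one capillary (assumption).
example_density = cells_per_ml(2500, capacity_ul)
```

With the dimensions from the text, the calculated capacity is 5 µL, in agreement with the capacity quoted for the capillaries.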

2.3. Taxonomic affiliation of isolates

The taxonomic affiliation of all axenic bacterial isolates was investigated by sequencing their 16S rRNA gene. The almost full-length 16S rRNA gene of each strain was amplified directly by colony PCR using the primer pair 8f (5´-AGAGTTTGATCCTGGCTCAG-3´) (Galkiewicz et al., 2008) and 1492r (5´-GGTTACCTTGTTACGACTT-3´). PCR mixtures included 2.0 µL PCR buffer (10x), 0.8 µL MgCl2 (25 mM), 0.4 µL BSA (20 mg mL-1), 0.4 µL dNTPs (10 mM each), 0.08 µL each of the forward and reverse primers (50 pmol µL-1), 0.08 µL DreamTaq DNA polymerase (5 U µL-1, Thermo Scientific) and 1.0 µL template in a total volume of 20 µL. The template was prepared by picking colonies, adding a stab of each colony to 20 µL of water and applying three freeze/thaw cycles (-20 °C/microwave oven). The thermal cycling program consisted of: (i) 10 min at 94 ºC; (ii) 32 cycles of 30 s at 94 ºC, 30 s at 56 ºC and 1 min at 72 ºC; and (iii) a final elongation step of 7 min at 72 ºC. PCR products were purified and sequenced using the above primer pair and the internal primers 1055f (5´-ATGGCTGTCGTCAGCT-3´) (Lane, 1991) and 341r (5´-CTGCTGCCTCCCGTAGG-3´) (Muyzer et al., 1993), by Sanger sequencing employing the AB 3730 DNA analyser (Applied Biosystems, Foster City, CA) and the AmpliTaq® FS BigDye® Terminator Cycle Sequencing Kit. Subsequently, the 16S rRNA gene sequences were analyzed with the online database EzBioCloud (Yoon et al., 2017).

A first selection of interesting strains was made based on their taxonomic affiliation and novelty. Isolates belonging to underrepresented phyla (e.g. Bacteroidetes, Actinobacteria and Proteobacteria, among others), as well as those representing novel species, genera or higher taxonomic ranks, were prioritised for successive analyses. The taxonomic criterion was complemented with molecular data derived from genome sequencing and cosmid library production (see chapter 3 “The genomic characterisation, cosmid library production and analysis”).
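The colony PCR recipe in section 2.3 lists reagent volumes but not the water needed to reach 20 µL or the resulting final concentrations. The following is a minimal sketch for deriving them and for scaling a master mix; the function names and the 10% pipetting excess are assumptions for illustration, while the volumes, stock concentrations and cycling times are taken from the text.

```python
REACTION_VOLUME_UL = 20.0

# name: (volume per reaction in µL, stock concentration, unit), as listed in section 2.3
COMPONENTS = {
    "PCR buffer": (2.0, 10.0, "x"),
    "MgCl2": (0.8, 25.0, "mM"),
    "BSA": (0.4, 20.0, "mg/mL"),
    "dNTPs (each)": (0.4, 10.0, "mM"),
    "primer 8f": (0.08, 50.0, "pmol/µL"),
    "primer 1492r": (0.08, 50.0, "pmol/µL"),
    "DreamTaq polymerase": (0.08, 5.0, "U/µL"),
    "template": (1.0, None, None),
}


def water_volume_ul():
    """Water needed to bring one reaction to the total volume."""
    return REACTION_VOLUME_UL - sum(vol for vol, _, _ in COMPONENTS.values())


def final_concentration(name):
    """Final concentration of a component in the reaction, in its stock unit."""
    vol, stock, _unit = COMPONENTS[name]
    return stock * vol / REACTION_VOLUME_UL


def master_mix(n_reactions, excess=0.10):
    """Volumes (µL) for a master mix without template; 'excess' is an assumed pipetting reserve."""
    scale = n_reactions * (1.0 + excess)
    mix = {name: vol * scale for name, (vol, _, _) in COMPONENTS.items()
           if name != "template"}
    mix["water"] = water_volume_ul() * scale
    return mix


def cycling_minutes(cycles=32):
    """Programmed hold time: 10 min + cycles x (30 s + 30 s + 60 s) + 7 min."""
    return 10.0 + cycles * (0.5 + 0.5 + 1.0) + 7.0


water = water_volume_ul()
mgcl2_mM = final_concentration("MgCl2")
mix_for_96 = master_mix(96)
run_minutes = cycling_minutes()
```

Under these figures, each reaction needs about 15.16 µL of water, the final MgCl2 concentration is 1 mM, and the programmed hold times add up to 81 min (ramping time excluded).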

2.4. Maintenance procedures

Strains were routinely grown on ASW/HD 1:10, ABW/HD 1:10, KM14, SSE/HD 1:10, SSE HP and MB media at 15 ºC. Strains were sub-cultured every 1.5–2 months. For cryopreservation in liquid nitrogen or at −80 °C, cell suspensions were supplemented with 5% DMSO (v/v) or 25% glycerol (v/v) and immediately shock-frozen.

2.5. Fermentation of bacteria for natural product production

Growth ranges and optima of temperature, pH and salinity, as well as the growth kinetics, were determined for each strain. All analyses were carried out in liquid ASW/HD 1:10 under oxic conditions as described before (Pascual et al., 2015). Growth was determined by measuring the optical density at 660 nm (OD660). Optimum growth was defined as ≥75% of the highest growth rate achieved.

For the fermentation of strains, 4 × 100 mL cultures of liquid ASW/HD 1:10 (in 250 mL Erlenmeyer flasks) were each inoculated with 3.0 mL of a seed culture (3% v/v of the final culture volume). Depending on the growth kinetics and optimum conditions of each strain, the cultures were fermented at 15–28 °C for 3–15 days on a rotary shaker at 180 rpm. Two of the culture vessels were supplemented with 2% (w/v) pre-washed Amberlite® XAD-16 resin. This hydrophobic polyaromatic resin adsorbs and releases ionic species through hydrophobic and polar interactions, increasing the diversity and amount of specific secondary metabolites released by the strain during growth (González-Menendez et al., 2014). Before use, the resin was washed in twice its volume of methanol by stirring for 1 h, followed by six washing steps with distilled water. It was finally autoclaved and stored in distilled water at 4 °C.

After fermentation, the well-grown culture (2 × 100 mL) was passed through a metal sieve (mesh size 270 µm). The dried resin and biomass were separately added to Erlenmeyer flasks containing 70 mL of acetone and shaken at 180 rpm at 21 ºC in a dark chamber for 3 h. The acetone was then filtered through a folded filter into a 250 mL round-bottom flask, dried in a rotary evaporator at 40 ºC and finally re-suspended in 1 mL acetone. The supernatant of the cultures was extracted with ethyl acetate in a separating funnel until the organic phase stayed clear. Afterwards, the organic phase was dried over sodium sulphate and evaporated to dryness in a rotary evaporator at 40 °C. 500 µL of each organic extract was centrifuged for 3 min at 11,000 rpm. The resulting supernatant was used for biological testing and LC-MS analysis.
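The 75% criterion for optimum growth and the 3% (v/v) inoculum can be expressed as a short calculation. This is an illustrative sketch only: the growth rates in the example are invented placeholder values, not measurements from this work, and the function names are hypothetical.

```python
def optimum_range(growth_rates, threshold=0.75):
    """Return the (lowest, highest) condition with near-maximal growth.

    growth_rates maps a condition value (e.g. temperature in °C) to the
    growth rate measured at that condition. A condition counts as optimal
    when its rate is >= threshold * the highest rate observed.
    """
    best = max(growth_rates.values())
    if best <= 0:
        raise ValueError("no growth observed under any condition")
    optimal = sorted(c for c, r in growth_rates.items() if r >= threshold * best)
    return optimal[0], optimal[-1]


def inoculum_volume_ml(culture_volume_ml, fraction=0.03):
    """Seed culture volume for a given final culture volume (3% v/v by default)."""
    return culture_volume_ml * fraction


if __name__ == "__main__":
    # Placeholder growth rates (per hour) at different temperatures (°C).
    example = {10: 0.02, 15: 0.05, 20: 0.08, 25: 0.07, 28: 0.03}
    print("optimum temperature range:", optimum_range(example))
    print("inoculum for 100 mL:", inoculum_volume_ml(100), "mL")
```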

Deliverable D6.1 EMBRIC (Grant Agreement No. 654008)

3. The genomic characterisation, cosmid library production and analysis

USTAN

The bacterial genome sequences were analysed using antiSMASH to identify putative novel gene clusters and to predict the gene cluster products. Genomic DNA was then extracted and used to construct a cosmid library for each strain. Gene clusters predicted by antiSMASH were targeted for heterologous expression. To do this, the cosmid library was screened by PCR. Once the cosmids containing the gene clusters of interest had been identified, each cosmid containing a gene cluster was conjugated into Streptomyces coelicolor and Streptomyces lividans for the heterologous production of novel natural products. The new compounds were then analysed using LC-MS/MS.

3.1 Gene Clusters Prediction

Secondary metabolite biosynthesis gene clusters encoded within bacterial and fungal genomes can be identified, annotated and analysed rapidly using antiSMASH, which integrates and cross-links with a large number of in silico secondary metabolite analysis tools. The genome sequence of each strain is first submitted to antiSMASH. The software predicts the encoded secondary metabolite biosynthesis gene clusters and the key genes for secondary metabolite production, reports each gene cluster’s similarity to known gene clusters, and proposes a possible structure for the secondary metabolite. After the prediction, cosmid libraries of Saccharothrix espanaensis DSM 44229 and Amycolatopsis japonica MG417-CF17 were constructed, and a BAC library of S. espanaensis DSM 44229 was constructed by the company Bio S&T (see section 6.3 for further information).

Deliverable D6.1 EMBRIC showcases: prototype pipelines from the microorganism to product discovery Page 14 of 80 Deliverable D6.1 EMBRIC (Grant Agreement No. 654008)

3.2 Genomic DNA Preparation

3.2.1 Isolation of Genomic DNA

Genomic DNA was isolated from each bacterial strain in order to make a cosmid library for that strain. Large DNA fragments are required for constructing a cosmid library: the genomic DNA must be around 150 kb before digestion. Bacterial cells are harvested by centrifugation (4000 rpm) for 5 min. The cells are then re-suspended in 10 ml lysis solution (0.3 M sucrose, 25 mM EDTA, 25 mM Tris-HCl pH 7.5, 2 U RNase). 10 mg lysozyme is added to the cell suspension, which is incubated at 37 °C for 30 min. Next, 1 ml 10% (w/v) SDS containing 5 mg proteinase K is added and the mixture is incubated at 55 °C for 1.5 h. 3.6 ml NaCl (5 M) and 15 ml chloroform are added and the tube is rotated for 20 min. The aqueous phase is then transferred into a clean tube and 1 volume of isopropanol is added to precipitate the DNA. Finally, the DNA is washed with 70% ethanol, dried at room temperature and dissolved in water.

3.2.2 Partial Digestion of the Genomic DNA

To make a cosmid library, the genomic DNA is partially digested into fragments of 30-42 kb, which are packaged by a bacteriophage into cosmid vectors. The partial digestion conditions need to be optimised for each bacterial strain. Therefore, as a test reaction, 10 μg of genomic DNA and 10 μl of 10× restriction digest buffer were added to a microcentrifuge tube and water was added to a final volume of 100 μl. Samples were pre-equilibrated at 37 °C for 5 minutes and 0.5 U of the frequent cutter Sau3A I was added. Next, 15 μl aliquots were removed from the digestion at the 0-, 5-, 10-, 20-, 30- and 45-minute time points and analysed by gel electrophoresis. After the optimal time interval had been determined, the optimised partial digest was performed on 100 μg of genomic DNA in a 1 ml total reaction volume. DNA fragments in the range of 30-42 kb were then gel purified.

3.2.3 Dephosphorylation of the Partially Digested Genomic DNA

Once the partially digested genomic DNA had been purified, it was dephosphorylated so that it could be ligated into the SuperCos 1 vector to make the cosmid library. 2 μg of DNA was dephosphorylated by adding 2 U of FastAP and diluting the mixture to a final volume of 100 μl. The reaction mixture was incubated at 37 °C for 1 hour and then at 68 °C for 15 minutes to inactivate the FastAP. Finally, the dephosphorylated genomic DNA was purified and re-suspended to 1 μg/μl in water.
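The scale-up from the 100 μl test digest to the 1 ml preparative digest can be sketched as a proportional calculation. The text gives the DNA amount and total volume of the preparative reaction but not the enzyme or buffer amounts, so keeping every component at the same ratio to DNA is an assumption made here for illustration, as are the names used in the code.

```python
# Test reaction from section 3.2.2.
TEST_DIGEST = {
    "dna_ug": 10.0,
    "buffer_10x_ul": 10.0,
    "sau3ai_units": 0.5,
    "total_volume_ul": 100.0,
}
TIME_POINTS_MIN = [0, 5, 10, 20, 30, 45]
ALIQUOT_UL = 15.0


def scale_digest(test, target_dna_ug):
    """Scale every component linearly with the amount of DNA (assumed)."""
    factor = target_dna_ug / test["dna_ug"]
    return {key: value * factor for key, value in test.items()}


def aliquot_volume_needed(time_points, aliquot_ul):
    """Total volume withdrawn over the time course."""
    return len(time_points) * aliquot_ul


if __name__ == "__main__":
    withdrawn = aliquot_volume_needed(TIME_POINTS_MIN, ALIQUOT_UL)
    print(f"time course uses {withdrawn} of {TEST_DIGEST['total_volume_ul']} µL")
    print(scale_digest(TEST_DIGEST, 100.0))
```

Under this assumption the scaled total volume matches the 1 ml stated for the preparative digest; the digestion time itself would still come from the gel analysis of the time course.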

3.3 Vector DNA Preparation

SuperCos 1 was used as the cosmid vector. SuperCos 1 is a 7.9 kb cosmid vector that contains bacteriophage promoter sequences flanking a unique cloning site (see Fig. 4). The SuperCos 1 vector is also engineered to contain genes for the amplification and expression of cosmid clones in eukaryotic cells.

Figure 4. Circular map and features of the SuperCos 1 cosmid vector

To prepare the vector, 25 μg of the SuperCos 1 cosmid vector was digested with 10 U/μg of Xba I and 1 U/μg of FastAP enzyme in a total volume of 500 μl at standard buffer conditions for 1 hour at 37 °C. The reaction mixture was then incubated for 15 min at 65 °C. Next, 10 U/μg of BamHI was added and the mixture was incubated for a further 1 hour at 37 °C. Finally, the DNA was purified and dissolved in sterile water.

3.4 Construction of Cosmid Library

3.4.1 Ligation of DNA

1 μg of digested SuperCos 1 vector and 2.5 μg of partially digested gDNA were ligated using 2 U T4 DNA ligase, incubating at 24 °C overnight.

3.4.2 Packaging of DNA

100 μl of packaging extract was thawed quickly by holding the tube until its contents just began to thaw. 4 μl of ligated DNA was added to the packaging extract and the tube was incubated at room temperature (22 °C) for 2 hours. Next, 500 μl of SM buffer (5.8 g of NaCl, 2.0 g of MgSO4 · 7H2O, 50.0 ml of 1 M Tris-HCl (pH 7.5), 5.0 ml of 2% (w/v) gelatin, water to a final volume of 1 L) was added to the tube, followed by 20 μl of chloroform, and the tube was spun briefly. The supernatant containing the phage was then ready for titering.

3.4.3 DNA Transfer and inoculation of cosmid library

The VCS257 glycerol stock strain was streaked onto LB agar plates, which were incubated overnight at 37 °C. The next day, LB medium supplemented with 10 mM MgSO4 and 0.2% (w/v) maltose was inoculated with a single colony and grown at 37 °C with shaking for 4–6 hours. The cells were then centrifuged at 500 × g for 10 minutes, gently re-suspended in half the original volume of sterile 10 mM MgSO4 and diluted to an OD600 of 0.5 with sterile 10 mM MgSO4. A 1:10 and a 1:50 dilution of the cosmid packaging reaction were prepared in SM buffer. 25 μl of each dilution was mixed with 25 μl of the bacterial cells at an OD600 of 0.5 in a microcentrifuge tube and incubated at room temperature for 30 minutes. 200 μl of LB broth was added to each sample, which was then incubated for 1 hour at 37 °C. Using a sterile spreader, the cells were spread on LB agar plates containing the required amount of the appropriate antibiotic, and the plates were incubated overnight at 37 °C. Next, each single colony was inoculated into a 96-deep-well plate so that each well contained 1.9 ml LB with 50 ng/ml kanamycin, and the plates were incubated overnight at 37 °C. The cosmid library was stored in 50% glycerol and the cosmid DNA was extracted for gene cluster screening.

3.4.4 Gene Clusters Screening

Putative gene clusters were predicted for each bacterial strain using antiSMASH. Using the sequence data, novel gene clusters were targeted by PCR. Two sets of primers were designed for the screening: one primer set to amplify the beginning of the gene cluster, and one primer set to amplify the end of the gene cluster.
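How many cosmid clones need to be picked for a library to be likely to contain a given gene cluster can be estimated with the standard Clarke and Carbon formula, N = ln(1 − P) / ln(1 − f), where P is the desired probability and f is the insert size divided by the genome size. The sketch below is an illustration and not part of the reported method; the 9 Mb genome size is a placeholder assumption, and the 36 kb insert is simply the midpoint of the 30-42 kb range used above.

```python
import math


def clones_required(genome_size_bp, insert_size_bp, probability=0.99):
    """Clarke-Carbon estimate of clones needed to include any given sequence."""
    fraction = insert_size_bp / genome_size_bp
    if not 0 < fraction < 1 or not 0 < probability < 1:
        raise ValueError("fraction and probability must lie between 0 and 1")
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


def plates_required(n_clones, wells_per_plate=96):
    """Number of 96-well plates needed to array the clones."""
    return math.ceil(n_clones / wells_per_plate)


if __name__ == "__main__":
    # Placeholder genome size (9 Mb); replace with the sequenced genome size.
    n = clones_required(genome_size_bp=9_000_000, insert_size_bp=36_000)
    print(f"{n} clones, i.e. {plates_required(n)} plates of 96 wells")
```

Because a gene cluster can be larger than a single insert, the two-primer-set screen described in 3.4.4 remains necessary to confirm that one cosmid carries both ends of the cluster.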

3.5 Transformation of Gene Clusters and Secondary Metabolite Heterologous Expression

3.5.1 Introduction of cosmid clone into E. coli BW25113/pIJ790

In order for the cosmid to be transferred into the heterologous hosts, it was modified using a PCR targeting method to contain the apramycin resistance cassette and oriT. E. coli BW25113/pIJ790 was grown overnight at 30 °C in 10 ml LB containing chloramphenicol (25 μg/ml). 100 μl of this overnight culture was then inoculated into 10 ml LB containing 20 mM MgSO4 and chloramphenicol (25 μg/ml). After 3-4 h at 30 °C the OD600 reached 0.4 and the cells were centrifuged at 3000 rpm for 10 min at 4 °C. The supernatant was discarded and the cells were re-suspended in an equal volume of chilled KMESI buffer (60 mM CaCl2, 25 mM MES (2-(N-morpholino)ethanesulphonic acid), 5 mM MgCl2, 5 mM MnCl2, pH 5.8). The cells were kept on ice for 1-1.5 hours before being centrifuged again and re-suspended in 0.1 volume of chilled KMESII buffer (KMESI with 10% glycerol). This cell suspension is competent for transformation. Once the cosmid had been identified and purified, 50 µl of the competent cell suspension was mixed with ~500 ng (5 µl) of cosmid DNA and the cosmid was introduced into the cells by chemical transformation. The transformation mixture was incubated overnight at 30 °C on LB. The next day a single colony was picked and used to inoculate an overnight culture to generate competent cells (Fig. 5).

3.5.2 Introduction of resistance cassette into E. coli BW25113/pIJ790 containing clusters

The aac(3)IV-oriT-attP(φC31)-int(φC31) cassette was amplified from plasmid pIJ10702 using the primers ScospIJ10702DVF2 5’-aga tct gat caa gag aca gga tga gga tcg ttt cgc atg gat aag ttt atc acc acc ga and ScospIJ10702DVR2 5’-tcg ctt ggt cgg tca ttt cga acc cca gag tcc cgc tca gaa ctt ttc gat cag aaa c. The cassette was then transferred into E. coli BW25113/pIJ790 containing the appropriate cosmid using the same method as above. After the transformation the plates were incubated at 37 °C overnight. The apramycin cassette recombines with the cosmid and replaces its kanamycin resistance gene. As a result, the cosmid containing the putative novel gene cluster can be transferred into Streptomyces through conjugation (Fig. 5).

3.5.3 Transfer of the mutant cosmids into Streptomyces

Mutant cosmids were transferred into E. coli ET12567/pUZ8002 using the same methods as above. E. coli ET12567/pUZ8002 containing the cosmid was incubated at 37 °C overnight. Next, 100 µl of the overnight culture was inoculated into 10 ml fresh LB plus antibiotics as above and grown for ~4 h at 37 °C to an OD600 of 0.4. The cells were washed twice with 10 ml of LB to remove antibiotics. While the E. coli cells were being washed, 10 µl (10⁸) Streptomyces spores were added to 500 µl 2 × YT broth for each conjugation. The spores were heat shocked at 50 °C for 10 min and then allowed to cool. Next, 0.5 ml of E. coli cell suspension and 0.5 ml of heat-shocked spores were mixed and spun briefly. Most of the supernatant was discarded and the pellet was re-suspended in the 50 µl of residual liquid. A dilution series from 10⁻¹ to 10⁻⁴ was made from the conjugation mixture, and each dilution was plated on MS agar + 10 mM MgCl2 (without antibiotics) and incubated at 30 °C for 16-20 h. Each plate was then overlaid with 1 ml water containing 0.5 mg nalidixic acid (20 μl of 25 mg/ml stock; selectively kills E. coli) and 1.25 mg apramycin (25 μl of 50 mg/ml stock). A spreader was used to lightly distribute the antibiotic solution and the incubation was continued at 30 °C (Fig. 5). Finally, single colonies were inoculated into suitable cultures for secondary metabolite production and the compounds were analysed using LC-MS.
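The stock volumes quoted for the antibiotic overlay and the ten-fold dilution series of the conjugation mixture follow from simple arithmetic, which the sketch below reproduces as a check. The function names are hypothetical and chosen for illustration only.

```python
def stock_volume_ul(mass_mg, stock_mg_per_ml):
    """Volume of stock solution (µL) that contains the requested mass."""
    return mass_mg / stock_mg_per_ml * 1000.0


def serial_dilutions(steps=4, factor=10):
    """Dilution factors of a serial dilution, e.g. 10^-1 ... 10^-4."""
    return [factor ** -(i + 1) for i in range(steps)]


if __name__ == "__main__":
    # Overlay per plate: 0.5 mg nalidixic acid and 1.25 mg apramycin in 1 ml water.
    print("nalidixic acid:", stock_volume_ul(0.5, 25), "µL of 25 mg/ml stock")
    print("apramycin:", stock_volume_ul(1.25, 50), "µL of 50 mg/ml stock")
    print("dilution series:", serial_dilutions())
```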

Figure 5. Flow chart of cosmid library construction

4. Analysis of extracts

4.1 Analysis of extracts at USTAN

4.1.1 Secondary metabolites production at USTAN

Single colonies of the engineered Streptomyces (S. coelicolor and S. lividans) carrying putative gene clusters, and of the wild type, were inoculated into 5 ml of 2 × YT medium (16 g/L tryptone, 10 g/L yeast extract and 5 g/L NaCl, pH 7.2). After 2 days of cultivation the cultures were inoculated into 50 ml of ISP2 medium (4 g/L yeast extract, 10 g/L malt extract and 4 g/L glucose, pH 7.2) and incubated for 4 days. The cultures were then used for LC-MS/MS detection.

4.1.2 Secondary metabolites extraction

10 ml methanol was added to the 50 ml cultures, which were left shaking for 20 minutes to break the cells. Then 50 ml ethyl acetate was added to the methanol-treated cultures to extract new compounds. The cultures were inverted to mix and the mixture was centrifuged to separate the layers. After centrifugation, the organic layer was collected and dried overnight using a vacuum drier (GENEVAC LTD, UK). The dried sample was dissolved in 2 ml 50% methanol and spun down for LC-MS detection.

4.1.3 LC-HRMS detection

LC-HRMS was carried out on a Thermo Scientific Velos Pro / Orbitrap Velos Pro with a H-ESI source and a Thermo Scientific Dionex UltiMate 3000 RS chromatography system. The HPLC system was equipped with a Waters XBridge C18 3.5 μm 2.1 x 100 mm column at 40 °C. A 5 μL injection volume was used for all samples. HPLC analysis was carried out with 0.1% formic acid in water and acetonitrile at a flow rate of 350 μL/min. The following gradient was used: 0.00 – 1.50 min 5% acetonitrile, 1.50 – 8.00 min 5 – 95% acetonitrile, 8.00 – 10.00 min 95% acetonitrile, 10.00 – 10.50 min 95 – 5% acetonitrile. UV absorbance was measured between 220 – 800 nm at 2 nm resolution. The first minute of the run was diverted to waste. After 1 minute the eluent was passed to the H-ESI source.
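The gradient programme can be written as a piecewise-linear function of time, which is convenient when converting a retention time into the approximate solvent composition at elution. This is a minimal sketch assuming linear ramps between the stated time points; instrument dwell volume is ignored, and the function names are illustrative.

```python
# (time in min, % acetonitrile) break points taken from the gradient in 4.1.3.
GRADIENT = [(0.0, 5.0), (1.5, 5.0), (8.0, 95.0), (10.0, 95.0), (10.5, 5.0)]
DIVERT_TO_WASTE_UNTIL_MIN = 1.0
FLOW_UL_PER_MIN = 350.0


def acetonitrile_percent(t_min):
    """Programmed % acetonitrile at time t_min, assuming linear ramps."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, p0), (t1, p1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)


def reaches_ms_source(t_min):
    """True once the eluent is no longer diverted to waste."""
    return t_min >= DIVERT_TO_WASTE_UNTIL_MIN


if __name__ == "__main__":
    for t in (0.5, 1.5, 4.75, 9.0, 10.5):
        print(f"{t:5.2f} min: {acetonitrile_percent(t):5.1f}% MeCN, "
              f"to MS: {reaches_ms_source(t)}")
    print("eluent per run:", FLOW_UL_PER_MIN * GRADIENT[-1][0] / 1000.0, "mL")
```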
The HRMS was set up with the following parameters: positive ionisation mode using 250 °C heater temperature, 350 °C capillary temperature, 40 U sheath gas flow, 20 U aux gas flow, 2 U sweep gas flow, 3.5 kV ionisation voltage and 50% RF lens power. A background ion corresponding to the [M+H]+ of n-butyl benzenesulfonamide was used as a lock mass for internal scan-by-scan calibration.

4.1.4 Network analysis

For analysis of the heterologously expressed gene clusters, the MS/MS data are analysed through the web platform GNPS (Global Natural Products Social Molecular Networking) and then visualised using Cytoscape. This allows the visualisation of molecular networks, in which sets of spectra from related molecules are displayed as spectral networks. This helps with the deconvolution of the data, as the LC-MS traces are very complex. In the networks each MS/MS spectrum is shown as a node, and the spectrum-to-spectrum relatedness is represented by the connections between the nodes. Thus, key networks can be visualised in which one parent mass has been fragmented (Fig. 6 Section 6.3 below). The differences in masses within the network can help to elucidate the final structure; for example, there may be a loss of an amino acid. This is also linked with the genome analysis to try to elucidate the final structure of the unknown natural product. The spectral masses are also searched against the GNPS database to help de-replicate the data so that only novel compounds are isolated.
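The idea of reading an amino acid loss from the mass difference between two connected nodes can be illustrated with a small lookup against monoisotopic residue masses. This is a sketch only, assuming singly charged ions; the tolerance, the two example m/z values and the function name are hypothetical, and only a subset of residues is listed.

```python
# Monoisotopic residue masses (amino acid minus H2O), in Da.
RESIDUE_MASSES = {
    "Gly": 57.02146,
    "Ala": 71.03711,
    "Ser": 87.03203,
    "Pro": 97.05276,
    "Val": 99.06841,
    "Thr": 101.04768,
    "Leu/Ile": 113.08406,
    "Asp": 115.02694,
    "Glu": 129.04259,
    "Phe": 147.06841,
    "Tyr": 163.06333,
    "Trp": 186.07931,
}


def match_residue(mz_a, mz_b, tolerance_da=0.01):
    """Return residues whose mass matches the difference between two nodes.

    Assumes both precursor ions are singly charged, so the m/z difference
    equals the mass difference.
    """
    delta = abs(mz_a - mz_b)
    return [name for name, mass in RESIDUE_MASSES.items()
            if abs(delta - mass) <= tolerance_da]


if __name__ == "__main__":
    # Hypothetical precursor m/z values of two neighbouring nodes.
    print(match_residue(500.2500, 429.2129))
```

A match is only a hypothesis about the structural difference; as noted above, it still has to be reconciled with the biosynthetic logic predicted from the genome.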

4.2 Analysis of extracts at HZI

WP6 aimed to characterise the chemical diversity of extracts irrespective of their bioactivity. For this purpose, the metabolome, i.e. the ensemble of small molecules produced by a given microorganism, is investigated by ultra-high-performance liquid chromatography – tandem mass spectrometry (UHPLC-ESI-QTOF-MS(/MS)) at HZI. This technique combines the physical separation capabilities of liquid chromatography with the mass analysis of the mass spectrometer. The untargeted metabolomics approach allows an unbiased analysis aimed at measuring the most abundant chemistries present in the endo-metabolomes (metabolite content of cell pellets) and in the exo-metabolomes (metabolite content of the supernatant) in the extracts of each culture, under the growth conditions used at DSMZ. For this, 25 extracts obtained from DSMZ were dried and re-suspended to a final concentration of 0.5 mg/mL in a 20% (v/v) acetonitrile mixture, to make them compatible with the aqueous conditions used for liquid chromatography. Separation was performed on a C18 reversed-phase column using a water/acetonitrile gradient. Positive electrospray ionisation mode was applied first for the analysis of the large set of extracts, and an analysis in the negative ionisation mode will follow. A QTOF mass spectrometer (maXis HD, Bruker) was operated in a data-dependent MS/MS mode applying collision-induced dissociation to the five most abundant ions in each scan (see Annex 2). A metabolic profile was thus obtained, together with MS/MS data of the most abundant compounds for structural elucidation and matching. In addition to high-resolution mass spectrometry data, UV spectra were recorded, which can give useful additional information for compound identification. Data were first processed using the commercial instrument software (Bruker Data Analysis) and the open-source software RStudio. This involved several steps such as chromatographic peak detection, retention time correction, and filtering of detected features (retention time – m/z pairs).

The assignment of a sum formula was not performed automatically because, especially for higher masses, several formulae are compatible with a given mass and the error of measurement; in addition, the inclusion or exclusion of untypical chemical elements like P, S, or halogens is hard to define in a generic manner, especially for marine metabolites. As we applied an external calibration before and after every run and had an internal lock mass in every scan, a small mass error (in most cases <1 ppm) is expected, which should lead to very few sum formulae, in most cases just one, for compounds with masses <400 Da.

For the identification of known metabolites (i.e. dereplication) present in the data set, measured spectra were compared with the in-house library available at HZI, which contains spectra from approximately 600 pure standards. However, since this library is mainly based on primary metabolism from bacteria and human species, other freely available online databases (e.g. Metlin and HMDB) were searched to improve the annotation of known metabolites via manual dereplication through MS2 spectral matching.
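The relationship between a candidate sum formula, its theoretical [M+H]+ and the measured mass error in ppm can be sketched as follows. This is an illustration rather than the processing used at HZI: the element table covers only a few elements, the simple formula parser does not handle brackets or charges, and the example compound (caffeine, C8H10N4O2) and the "measured" value are chosen here purely as placeholders.

```python
import re

# Monoisotopic masses (Da) of a few common elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
    "S": 31.9720707,
}
PROTON = 1.007276  # mass of a proton, Da


def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple sum formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def mz_protonated(formula):
    """Theoretical m/z of the singly protonated molecule [M+H]+."""
    return monoisotopic_mass(formula) + PROTON


def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


if __name__ == "__main__":
    theoretical = mz_protonated("C8H10N4O2")
    measured = 195.08785  # placeholder value for illustration
    print(f"theoretical [M+H]+ {theoretical:.5f}, "
          f"error {ppm_error(measured, theoretical):.2f} ppm")
```

Candidate formulae whose theoretical m/z falls outside the accepted ppm window can be discarded; the number of candidates that remain grows with mass, which is why automatic assignment was avoided for larger compounds.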

Furthermore, other methodologies were applied to confirm and improve the annotation of known metabolites. One of these is “CluMSID”, an open-source tool developed at HZI (Depke et al.). This is an R package which aids the identification of features in untargeted LC-MS/MS analysis by clustering them based on MS2 spectral similarity and unsupervised statistical methods. MS2 clustering has emerged as a promising computer-based approach to visualise and organise tandem MS data sets and to automate database searches for metabolite identification within complex mixtures. The unique MS2 spectra provided by each analytical run (approx. 500) were clustered based on their product ions. Thus, compounds with structural similarities could be grouped together to facilitate the annotation process (Figure 6). The major advantage of such a clustering-based strategy is the possibility to confirm the annotation of known compounds and putative analogues based on the identity of their neighbouring features, when this is known, thereby providing a guide towards unknown molecules. The success of this strategy in achieving the rapid characterisation of known compounds - functioning as anchor points of the cluster - relies greatly on the availability and quality of MS/MS compound databases. Existing libraries (MassBank, Metlin, ReSpect, NIST) host MS/MS data of almost 23 000 individual compounds. We observed that they are far from comprehensive enough to allow a full dereplication of complex mixtures, in particular because the unique marine resources of EMBRIC are underrepresented in such databases. The database of compounds and associated MS/MS data being produced within WP6 will therefore represent a valuable resource.

A molecular networking tool, available from GNPS online, was applied to the whole data set in order to obtain a visual display of the chemical space and structural similarity present in the MS experiments. This methodology allows faster and improved annotation of known metabolites, together with the identification of unique metabolites potentially present in the data set. Data are converted to mzXML format (a text-based format used to represent MS data, describing the scan number, precursor m/z, and the m/z and intensity of each ion observed in MS/MS) and are then subjected to molecular networking, which groups MS2 spectra based on spectral similarity. To reduce the complexity of the resulting network, identical MS/MS spectra are combined into consensus spectra by comparing parent masses and fragment peaks (here the mass tolerance for fragment peaks was set to 0.02 Da and the parent mass tolerance to 0.005 Da). Each consensus spectrum is visualised as a node in the resulting molecular network.

Nodes are then connected by edges; here the thickness attribute of the edges was defined to reflect similarity between nodes, with thicker lines indicating higher similarity. The file generated from GNPS online is imported into Cytoscape for visualization as a network.
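The principle behind the nodes and edges can be sketched with a plain cosine score between two MS/MS spectra, matching fragment peaks within the 0.02 Da tolerance quoted above. This is a simplified illustration and not the GNPS implementation, which uses a modified cosine that also accounts for the precursor mass difference; the 0.7 edge threshold, the example spectra and the function names are assumptions made for this sketch.

```python
import math
from itertools import combinations


def cosine_score(spec_a, spec_b, fragment_tol=0.02):
    """Plain cosine similarity of two spectra given as [(m/z, intensity), ...].

    Peaks are paired greedily, highest intensity product first, and each
    peak is used at most once.
    """
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    candidates = sorted(
        ((ia * ib, a, b)
         for a, (mza, ia) in enumerate(spec_a)
         for b, (mzb, ib) in enumerate(spec_b)
         if abs(mza - mzb) <= fragment_tol),
        reverse=True,
    )
    used_a, used_b, dot = set(), set(), 0.0
    for product, a, b in candidates:
        if a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            dot += product
    return dot / (norm_a * norm_b)


def build_edges(spectra, min_score=0.7):
    """Return (id_a, id_b, score) for every spectrum pair above the threshold."""
    edges = []
    for (id_a, spec_a), (id_b, spec_b) in combinations(spectra.items(), 2):
        score = cosine_score(spec_a, spec_b)
        if score >= min_score:
            edges.append((id_a, id_b, round(score, 3)))
    return edges


if __name__ == "__main__":
    # Hypothetical consensus spectra, for illustration only.
    spectra = {
        "node1": [(100.05, 50.0), (150.10, 100.0), (200.12, 30.0)],
        "node2": [(100.06, 45.0), (150.11, 90.0), (210.00, 10.0)],
        "node3": [(80.00, 100.0), (95.00, 60.0)],
    }
    print(build_edges(spectra))
```

The score attached to each edge is the kind of value that can be mapped to line thickness when the network is displayed in Cytoscape.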

Figure 6. An example of results from a clustering analysis of a single extract (strain M55, Proteobacteria). The recorded unique MS/MS spectra were clustered and displayed as a circular hierarchical plot. Each spectrum has a unique identifier used for labelling in these plots. The MS/MS spectra will be provided in the database in a way that allows easy conversion into any spectral library format used by different MS libraries.

For each extract, molecular components are characterised with respect to their retention time, their UV spectrum, their accurate mass and their fragmentation spectrum. The analytical information is made available through a database, which will allow follow-up analysis, e.g. metabolite structural assignments on the basis of derived sum formulae and fragmentation spectra, or the search for marine producer(s) of a given metabolite across a wide range of strains. For sample registration, the BioSamples database (https://www.ebi.ac.uk/biosamples) was chosen; the EBI partner (WP4) adapted it by creating a specific field to register EMBRIC samples and defining appropriate attributes.

5. Regulatory Environment

The EMBRIC pipelines to access products from marine environments run from sampling, isolation and characterisation of organisms through to the extraction, purification and assaying of potential bioactive compounds, with the goal of placing products on the market. Table 1 lists some of the relevant regulations and provides links to the source. Table 2 presents the regulatory controls that are in place for each stage of the pipeline, which includes collecting from in situ, biological diversity of areas beyond national jurisdiction (BBNJ), export to another country, import, handling (manipulation; growth; genetic manipulation), deposit as part of a patent process, shipping, storage and distribution. Community guidance and best practice is listed in Table 3.

Table 1. List of some of the regulations governing microbiological activities

Regulation | Source
Cartagena Protocol on Biosafety | http://www.biodiv.org/biosafety/protocol.asp
Budapest Treaty on the International Recognition of the Deposit of Micro-organisms for the Purposes of Patent Procedure (done at Budapest on April 28, 1977 and amended on September 26, 1980) | http://www.cnpat.com/worldlaw/treaty/budapest_en.htm
Convention on Biological Diversity | http://www.biodiv.org/convention/articles.asp
EC Directive 93/88/EEC on Biological Agents | http://eur-op.eu.int/opnews/395/en/r3633.html for purchase through Celex
EC Directive 90/679/EEC setting mandatory control measures for laboratories | http://eur-op.eu.int/opnews/395/en/r3633.html for purchase through Celex
Accord Européen relatif au transport international des marchandises dangereuses par route (ADR) | http://eur-op.eu.int/opnews/395/en/r3633.html for purchase through Celex
IATA Dangerous Goods Regulations (DGR) | http://www.iata.org/cargo/dg/dgr.htm
UK Management of Health and Safety at Work (MHSW) Regulations 1992 (Anon, 1992) | http://www.hmso.gov.uk/cgi-bin/htm_hl3?URL=http://www.hmso.gov.uk/si/si1999/19993242.htm&STEMMER=en&WORDS=management+health+safeti+work+&COLOUR=Red&STYLE=s#muscat_highlighter_first_match
UK Control of Substances Hazardous to Health (COSHH) regulations (1988) | http://www.hmso.gov.uk/cgi-bin/htm_hl3?URL=http://www.hmso.gov.uk/si/si1988/Uksi_19881657_en_2.htm&STEMMER=en&WORDS=control+substanc+hazard+health+&COLOUR=Red&STYLE=s#muscat_highlighter_first_match
EU Council Regulation 3381/94/EEC on the Control of Exports of Dual-Use Goods from the Community of 19th December 1994 (Official J. L 367, p1) and amendments | http://eur-op.eu.int/opnews/395/en/r3633.html
EEC Directives 90/219/EEC. Contained use of genetically modified microorganisms (GMO's), *L117 Volume 33, 8 May 1990. | http://biosafety.ihe.be/Menu/BiosEur1.html
EEC Directives 90/220/EEC. Release of GMO's, *L117 Volume 33, 8 May 1990. | http://biosafety.ihe.be/Menu/BiosEur1.html

Table 2. Regulatory control of microbiology

Action | Requirement | Law, Regulation, Convention | Further information
Collecting from in situ near-shore marine environments | Prior Informed consent from a recognised authority | Convention on Biological Diversity | http://www.biodiv.org
 | Mutually agreed terms on use | Convention on Biological Diversity | http://www.biodiv.org
 | Access and Benefit Sharing | Nagoya Protocol | https://www.cbd.int/abs/about/
Collecting from in situ from waters beyond National jurisdiction: biological diversity of areas beyond national jurisdiction (BBNJ) | Under consideration: Marine global regime to better address the conservation and sustainable use of marine biological diversity of areas beyond national jurisdiction | The new legal instrument would fall under the 1982 United Nations Convention on the Law of the Sea, which has, since its entry into force in 1994 | http://www.un.org/sustainabledevelopment/blog/2017/07/countries-agree-to-recommend-elements-for-new-treaty-on-marine-biodiversity-of-areas-beyond-national-jurisdiction/
Export to another country | Some plant and animal pathogens require export licences | Quarantine regulations |
 | Dangerous organisms with potential for dual use | Export Licences for dangerous organisms |
Import | Non-indigenous plant pathogens require licenses from country authority | Quarantine regulations |
 | Human and animal pathogens can often only be imported to specified laboratories | Health and Safety |
Handling: Manipulation; Growth | Containment dependent on hazard | Control of Biological Agents - Health and Safety; EC Directive 93/88/EEC on Biological Agents | http://eur-op.eu.int/opnews/395/en/r3633.html
Genetic manipulation | Containment of manipulated organisms | EEC Directives 90/219/EEC. Contained use of genetically modified microorganisms (GMO's), *L117 Volume 33, 8 May 1990; EEC Directives 90/220/EEC. Release of GMO's, *L117 Volume 33, 8 May 1990; Cartagena Protocol on Biosafety | http://www.biodiv.org/biosafety/protocol.asp; http://biosafety.ihe.be/Menu/BiosEur1.html
Deposit as part of a patent process | | Budapest Treaty on the International Recognition of the Deposit of Micro-organisms for the Purposes of Patent Procedure | http://www.cnpat.com/worldlaw/treaty/budapest_en.htm
Shipping | Packaging and transport considerations | IATA Dangerous Goods Regulations (DGR) | http://www.iata.org/cargo/dg/dgr.htm
 | Dangerous organisms | EU Council Regulation 3381/94/EEC on the Control of Exports of Dual-Use Goods from the Community of 19th December | http://eur-op.eu.int/opnews/395/en/r3633.html
Storage | Appropriate containment | Health and Safety |
 | Licence to hold pathogens | Security |
Distribution | Sovereign rights over the strains | Convention on Biological Diversity | http://www.biodiv.org
 | Access and benefit sharing | Bonn Guidelines | http://www.biodiv.org
 | Intellectual property Right ownership | |

EMBRIC work package 2 Deliverable 2.2 addresses Best Practice Methods for Biological Resource Centres. Section 11 of that report covers the compliance of Microorganism Biological Resource Centres with national and international law. Here the elements covered are extended and additional guidance on compliance is reviewed.

Deliverable D6.1 EMBRIC showcases: prototype pipelines from the microorganism to product discovery Page 25 of 80 Deliverable D6.1 EMBRIC (Grant Agreement No. 654008)

5.1 Regulatory Guidance and Community Best Practices

Table 3. Regulatory Guidance

| Regulatory area | Guidance document | URL |
|---|---|---|
| Convention on Biological Diversity | Micro-Organisms Sustainable use and Access regulation International Code of Conduct | www.belspo.be/bccm/mosaicc |
| | Guidance document on the scope of application and core obligations of Regulation (EU) No 511/2014 on ABS | http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52016XC0827%2801%29 |
| | MIRRI user service on Nagoya Protocol | http://www.mirri.org/user-service/nagoya-protocol.html |
| Hygiene and Bio-safety | Organisation for Economic Co-operation and Development (OECD) | http://www.oecd.org/dataoecd/4/4/34932656.pdf |
| | United Nations Industrial Development Organisation (UNIDO) Bio-safety Information Network and Advisory Service (BINAS) | www.who.org/emc/biosafe/index.htm |
| | International Centre for Genetic Engineering and Biotechnology (ICGEB) | www.aphisweb.aphis.usda.gov.biotech |
| | US Animal and Plant Health Inspection Service (APHIS) | www.nal.usda.gov/bic/ |
| | US Food and Drug Administration (FDA) | http://www.fda.gov/ |
| | World Health Organization (WHO) Biosafety Programme | http://www.who.int/csr/labepidemiology/projects/biosafetymain/en/index.html |
| | U.S. Departments of Health and Human Services (HHS) and Agriculture (USDA) rules implementing USA PATRIOT Act and Public Health Security and Bioterrorism Preparedness and Response Act of 2002 | http://www.cdc.gov/od/sap/final_rule.htm |
| | Centre for Food Safety and Applied Nutrition (CFSAN) | http://vm.cfsan.fda.gov/list.html |
| | Belgian Bio-safety Server | www.biosafety.be |
| | The Dutch Genetically Modified Organism Bureau | www.rivm.nl/csr/bggo.html |
| | Biotechnology Information Centre (BIC) of the US Department of Agriculture (USDA) | www.nal.usda.gov/bic/ |
| | UK Advisory Committee on Releases into the Environment (ACRE) | www.environment.detr.gov.uk/acre/index.htm |
| | National Chemical Emergency Response UK | www.eat.co.uk/ncec/complian/bibliog/bibliog.htm |
| | American Biological Safety Association (ABSA) | http://www.absa.org |
| | European Biosafety Association (EBSA) | http://www.ebsaweb.eu |
| | International Biosafety Working Group (IBWG) | http://www.internationalbiosafety.org/english/index.asp |
| | Advisory Committee on Dangerous Pathogens | http://www.doh.gov.uk/bioinfo.htm |
| Patents | Budapest Treaty on the International recognition of the Deposit of Micro-organisms | http://www.wipo.int/treaties/en/registration/budapest/ |
| Transport and shipping | First generation guidelines for NCI-Supported Biorepositories – Federal register Vol. 71, Number 82, Page 25814, April 28, 2006. | http://biospecimens.cancer.gov/biorepositories/guidelines_full_formatted.asp |
| | Harmonisation of UN documents etc. | www.hazmat.dot.gov/rules |
| | Universal Postal Union | http://ibis.ib.upu.org; http://unicc/unece/tra; www.de/facil/upustr.htm |
| | WHO Guidance on Regulations for the Transport on Infectious Substances | http://www.who.int/csr/resources/publications/biosafety/WHO_CDS_CSR_LYO_2005_22/en/ |

Table 4. Community Best Practice

| Source | Item | URL | Overview |
|---|---|---|---|
| EMBRC | Business Plan promoting best practice | http://www.embrc.eu/sites/embrc.eu/files/public/EMBRC_Business_Plan_web.pdf | Outlines of intentions to share best practice |
| | Data management Plan | http://www.embrc.eu/sites/embrc.eu/files/public/EMBRC_data_management_plan.pdf | |
| EMBRIC | Deliverable 2.2 Best Practice Methods for Biological Resource Centres | http://www.embric.eu/sites/default/files/deliverables/D2.1_Best%2520practice%2520procedures%2520for%2520harmonized%2520access%5B1%5D.pdf | Key principles from available best practice in the management of Biological Resource Centres (BRCs) that supply marine microorganisms. (MIRRI, EMBRC) |
| MIRRI | MIRRI Access and Benefit Sharing (ABS) Manual | https://zenodo.org/record/284881; https://absch.cbd.int/search/referenceRecords?schema=modelContractualClause (ABSCH-A19A20-SCBD-208213-1) | Specific guidance on compliance with ABS requirements in microbiology focussed on mBRC activity |
| Organisation for Economic Co-operation and Development (OECD) | OECD Best Practice Guidelines for Biological Resource Centres | http://www.oecd.org/sti/biotech/oecdbestpracticeguidelinesforbiologicalresourcecentres.htm | Covers general regulation compliance and detailed best practice in biosecurity for BRCs |
| Common Access to Biological Resources and Information | CABRI Guidelines | http://www.cabri.org/guidelines.html | Guidelines for Collection Quality Management Standards |
| European Consortium of Microbial Resources Centres | Public Deliverables | http://www.embarc.eu/deliverables.html; http://www.embarc.eu/deliverables/EMbaRC_D.NA1.2.1_2.28_BRC_standard.pdf; http://www.embarc.eu/deliverables/EMbaRC_D.NA1.3.2_D2.34_BiosecCoCfinal%20for%20ExecCommAug2012.pdf | A BRC operational standard based on the OECD best practice guidelines for Biological Resource Centres as a working draft for an ISO Standard; Overview of existing legislation, guidelines, best practices etc. connected with biosecurity |
| European Culture Collection's Organisation (ECCO) | ECCO core Material Transfer Agreement for the supply of samples of biological material from the public collection | https://www.eccosite.org/ecco-core-mta/ | Provide model clauses and recommendations for MTA content |
| Budapest Treaty | The Budapest Treaty: Code of practice for IDAs – International Depository authority under the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure (1977) | http://bccm.belspo.be/documents/files/deposit/code-of-practice-for-idas.pdf | Practical details on how IDAs should operate to comply with the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure (1977) |
| World Federation for Culture Collections (WFCC) | World Federation for Culture Collections Guidelines for the Establishment and Operation of Collections of Cultures of Microorganisms | http://www.wfcc.info/guidelines/ | Basic quality management guidance for culture collections; Section 17 covers general aspects in compliance with regulation |
| | WFCC Library provides an information resource from the EBRCN project | http://www.wfcc.info/index.php/wfcc_library/contribution/ | The resource covers: EBRCN information resource on transport; EBRCN Quarantine Regulations.doc; EBRCN Alternative Safety Data Sheet.doc; EBRCN BRC Compliance with the CBD.doc; September.draft-EBRCN Health and Safety Requiements.doc; EBRCN Resource Legislation. |

5.2 Regulations impacting on collection, handling, use and distribution of organisms

The reach of a laboratory's health and safety procedures extends beyond the laboratory where the work is carried out to cover all those who may come into contact with substances and products from that laboratory. A microorganism in transit puts carriers, postal staff, freight operators and recipients at risk; some organisms are relatively hazard free whilst others are quite dangerous. It is essential that safety and shipping regulations are followed to ensure safe transit. Several other pieces of legislation restrict the distribution of microorganisms, and a microbiologist must be aware of them. Information is presented on how a Biological Resource Centre (BRC) or culture collection should comply with:
• Health and Safety requirements
• Classification of Microorganisms on the Basis of Hazard
• Quarantine regulations
• Postal Regulations and Safety
• Packaging requirements
• Regulations governing distribution of cultures
• Legislation on the Proliferation, Distribution and Misuse of Dangerous Pathogens
• Convention on Biological Diversity
• Ownership of Intellectual Property Rights (IPR)
• Safety information provided to the recipient of microorganisms

It is critical that biological resource centres operate to high standards. Some guidelines are currently available for adoption and use (see Table 4 above); they are summarised in EMBRIC Deliverable 2.2 Best Practice Methods for Biological Resource Centres (http://www.embric.eu/sites/default/files/deliverables/D2.1_Best%2520practice%2520procedures%2520for%2520harmonized%2520access%5B1%5D.pdf). A goal of the Research Infrastructure is to implement regulatory compliant best practice in the delivery of infrastructure services in line with the appropriate operations of biological resource centres. The latter are defined in guidance such as that of the OECD, the implementation of ISO standards and various community approaches such as:
• CABRI: http://www.cabri.org/guidelines/gl-framed.html
• WFCC: http://www.wfcc.info/guidelines/
• OECD: http://www.oecd.org/sti/biotech/oecdbestpracticeguidelinesforbiologicalresourcecentres.htm

In the process of isolation, handling, storage and distribution of microorganisms and cell cultures there are many stages where compliance with the law, regulations or voluntary international conventions is required (Tables 1 and 2).

5.3 Health and Safety

Health and safety are covered by Directive 89/391/EEC, which puts in place measures to improve safety and health at work. It is designed to encourage improvements in occupational health and safety in all sectors of activity, both public and private; to promote workers' rights to make proposals relating to health and safety, to appeal to the competent authority and to stop work in the event of serious danger; and to protect workers adequately and ensure that they return home in good health at the end of the working day (http://ec.europa.eu/social/main.jsp?catId=148). There are also numerous national requirements. The EU legal framework in the area of occupational safety and health (OSH) is outlined in the OSH Strategic Framework 2014-2020, which can be consulted at http://ec.europa.eu/social/main.jsp?catId=151&langId=en.

A risk assessment of the handling and supply of organisms is required. It should cover all hazards involved: not just infection but also others, such as the production of toxic metabolites and the ability to cause allergic reactions. Organisms that produce volatile toxins or aerosols of spores or cells present a greater risk. It is the responsibility of the microbiologist to provide such assessment data to the recipient of a culture to ensure its safe handling and containment. Whether for compliance with the law or as the duty of a caring employer, the basic requirements for establishing a safe workplace are:
• Adequate assessment of risks
• Provision of adequate control measures
• Provision of health and safety information
• Provision of appropriate training
• Establishment of record systems to allow safety audits to be carried out
• Implementation of good working procedures

Good working practice requires assurance that correct procedures are actually being followed, and this requires a sound and accountable safety policy (http://www.hse.gov.uk/pubns/hsc13.htm). A BRC must put in place procedures to manage the health and safety of all who may be put at risk by its activities. This requires a suitable and sufficient assessment of the risks to health and safety to which any person, whether employed by the BRC or not, may be exposed through its work (Anon, 1996a). These assessments must be reviewed regularly, and additionally when changes in procedures or regulations demand, and must be recorded. The distribution of microorganisms to others outside the workplace extends these duties to protect others. In Europe the protection of workers from risks related to exposure to biological agents at work is addressed by Directive 2000/54/EC - biological agents at work of the European Parliament and of the Council of 18 September 2000 (https://osha.europa.eu/en/legislation/directives/exposure-to-biological-agents/77). This Directive lays down minimum requirements for the health and safety of workers exposed to biological agents at work.

5.4 Microorganisms as hazardous substances

The UK Management of Health and Safety at Work (MHSW) Regulations 1992 (http://www.legislation.gov.uk/uksi/1992/2051/contents/made) are all-encompassing and general in nature but overlap with and lead into many specific pieces of legislation. The Control of Substances Hazardous to Health (COSHH) regulations (http://www.hse.gov.uk/coshh/) require that every employer makes a suitable and sufficient assessment of the risks to health and safety to which any person, whether employed by them or not, may be exposed through their work. These assessments must be reviewed regularly, and additionally when changes in procedures or regulations demand, and must be recorded when the employer has more than five employees. The distribution of microorganisms to others outside the workplace extends these duties to protect others. The organism must be assessed for all types of hazard it presents, not only infection but also toxin production, for example mycotoxins or bacterial toxins. At the European level this is also covered by Directive 2000/54/EC - biological agents at work.

5.5 Classification of Microorganisms on the Basis of Hazard

Various classification systems exist, including those of the World Health Organisation (WHO); United States Public Health Service (USPHS); Advisory Committee on Dangerous Pathogens (ACDP); European Federation of Biotechnology (EFB) and European Community (EC). In Europe, the EC Directive (93/88/EEC) on Biological Agents sets a common base line which has been strengthened and expanded in many of the individual member states. In the UK the definition and minimum handling procedures of pathogenic organisms are set by the ACDP, which lists four hazard groups (1-4) with corresponding containment levels. Microorganisms are normally classified on their potential to cause disease, that is their human pathogenicity, into four groups (Anon, 1996a; https://www.hseni.gov.uk/publications/categorisation-biological-agents-according-hazard-and-categories-containment):
• Group 1: A biological agent that is most unlikely to cause human disease.
• Group 2: A biological agent that may cause human disease and which might be a hazard to laboratory workers but is unlikely to spread in the community. Laboratory exposure rarely produces infection and effective prophylaxis or treatment is available.
• Group 3: A biological agent that may cause severe human disease and present a serious hazard to laboratory workers. It may present a risk of spread in the community but there is usually effective prophylaxis or treatment.
• Group 4: A biological agent that causes severe human disease and is a serious hazard to laboratory workers. It may present a high risk of spread in the community and there is usually no effective prophylaxis or treatment.

A BRC must ensure that all strains are assigned to appropriate risk/hazard groups; this includes a positive assignment to risk/hazard group 1 unless otherwise considered hazardous. Hazard information must be recorded and made available to recipients of this material.
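The requirement for a positive hazard group assignment can be sketched in code. The following is a minimal illustration only: the `StrainRecord` layout and the function names are hypothetical and not taken from any BRC database, and the one-to-one mapping of hazard group to containment level is a simplifying assumption based on the "corresponding containment levels" wording above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class HazardGroup(IntEnum):
    """The four hazard groups described above (1 = lowest hazard)."""
    GROUP_1 = 1  # most unlikely to cause human disease
    GROUP_2 = 2  # may cause disease, unlikely to spread, treatment available
    GROUP_3 = 3  # may cause severe disease, may spread, treatment usual
    GROUP_4 = 4  # severe disease, high risk of spread, usually no treatment


@dataclass
class StrainRecord:
    # Hypothetical record layout used only for this illustration.
    strain_id: str
    hazard_group: Optional[HazardGroup] = None  # None = not yet assessed
    hazard_notes: str = ""  # e.g. toxin production, allergenicity


def ready_for_supply(record: StrainRecord) -> bool:
    """True only if the strain has a positive hazard group assignment.

    An unassessed strain is not silently treated as group 1.
    """
    return record.hazard_group is not None


def containment_level(record: StrainRecord) -> int:
    """Containment level, assumed here to equal the hazard group number."""
    if record.hazard_group is None:
        raise ValueError(f"{record.strain_id}: hazard group not assigned")
    return int(record.hazard_group)
```

Keeping "not yet assessed" distinct from "group 1" means that a missing assessment blocks supply instead of being mistaken for a low-hazard classification.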
The classification systems and their sources include:
• World Health Organisation (WHO); http://www.who.int/csr/resources/publications/biosafety/WHO_CDS_CSR_LYO_2004_11/en/
• NIH Guidelines for Research Involving Recombinant or Synthetic Nucleic Acid Molecules (April 2016) Appendix B - Table 1; https://osp.od.nih.gov/wp-content/uploads/2013/06/NIH_Guidelines.pdf
• Advisory Committee on Dangerous Pathogens (ACDP); http://www.hse.gov.uk/pubns/misc208.pdf
• Safe biotechnology. 7. Classification of microorganisms on the basis of hazard. Working Party "Safety in Biotechnology" of the European Federation of Biotechnology; https://www.ncbi.nlm.nih.gov/pubmed/8987466
• Directive 2000/54/EC - biological agents at work; https://osha.europa.eu/en/legislation/directives/exposure-to-biological-agents/77

In Europe, Directive 2000/54/EC - biological agents at work sets a common base line which has been strengthened and expanded in many of the individual member states. In the UK the definition and minimum handling procedures of pathogenic organisms are set by the ACDP, which lists four hazard groups (1-4) with corresponding containment levels. Similarly, other European countries have advisory committees: in Germany the ZKBS advises on how individual Genetically Engineered Microorganisms (GEMs) and Genetically Modified Organisms (GMOs) should be classified, and the Trade Corporation Association of the Chemical Industry (BG Chemie) is involved in both. The Advisory Committee on Genetic Manipulation (ACGM) prescribes separate but similar regulations for those organisms that have been genetically modified.
EU legislation on GMOs:
• Council Directive 90/219/EEC of 23 April 1990 on the contained use of genetically modified micro-organisms
• Council Directive 90/220/EEC of 23 April 1990 on the deliberate release into the environment of genetically modified organisms
• Commission Directive 94/51/EC of 7 November 1994 adapting to technical progress Council Directive 90/219/EEC on the contained use of genetically modified micro-organisms
• Commission Decision 94/730/EC of 4 November 1994 establishing simplified procedures concerning the deliberate release into the environment of genetically modified plants pursuant to Article 6 (5) of Council Directive 90/220/EEC

EU legislation concerning the transport of GMOs or pathogenic organisms, and that pertaining to biotechnological safety, can be found at: https://ec.europa.eu/food/plant/gmo/legislation_en

United Nations Industrial Development Organisation
• Voluntary code of conduct for the release of organisms into the environment. The text can be found at http://www.ask-force.org/web/Regulation/UNIDO-WHO-UNEP-Release-Organisms-Environment-Volutary-20010518.pdf

5.6 Quarantine regulations

Clients who wish to obtain cultures of non-indigenous plant pathogens must first obtain a permit to import, handle and store them from the appropriate Government Department. Under the terms of such a licence, the shipper is required to see a copy of the Ministry permit before such strains can be supplied. The BRC must do its best to ensure that non-indigenous pathogens are not distributed unless the recipient has a current licence. Quarantine legislation is in place in countries worldwide restricting the import of non-indigenous plant and animal pathogens. Those who wish to import such organisms must hold the relevant import permit, which can be obtained from the relevant country authority, for example in Canada, the UK and the USA. Information on the transport of plant pathogens throughout Europe can be obtained from the European and Mediterranean Plant Protection Organisation (EPPO; https://www.eppo.int/).
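The permit check described in this section could be expressed as a simple pre-dispatch test. This is a sketch under stated assumptions: the `ImportPermit` fields and the `may_supply` function are hypothetical, and real permits and their conditions differ between national authorities.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ImportPermit:
    # Hypothetical fields; real permits vary by issuing authority.
    holder: str
    organism: str
    expires: date


def may_supply(organism: str, non_indigenous_pathogen: bool, recipient: str,
               permit: Optional[ImportPermit], today: date) -> bool:
    """Quarantine check only: may this culture be released to the recipient?"""
    if not non_indigenous_pathogen:
        # The quarantine check does not apply; other controls may still do so.
        return True
    if permit is None:
        # The shipper must see a copy of the permit before supplying.
        return False
    # The permit must name this recipient and organism and still be current.
    return (permit.holder == recipient
            and permit.organism == organism
            and permit.expires >= today)
```

Passing `today` in explicitly, instead of reading the system clock inside the function, keeps the check reproducible when an order is audited later.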

5.7 Postal Regulations and Safety

Countries have their own regulations governing the packaging and transport of biological material in their domestic mail. International Postal Regulations regarding the postage of human and animal pathogens are very strict on account of the safety hazard they present. Several organisations set regulations controlling the international transfer of such material. These include the International Air Transport Association (IATA), International Civil Aviation Organisation (ICAO), United Nations Committee of Experts on the Transport of Dangerous Goods, the Universal Postal Union (UPU) and the World Health Organisation (WHO). It is commonplace to send microorganisms by post, as this is more convenient and less expensive than air freight. However, many countries prohibit the movement of biological substances through their postal services. The International Bureau of the UPU in Berne publishes all import and export restrictions for biological materials by national postal services.

5.8 Packaging

IATA Dangerous Goods Regulations (DGR) require that packaging used for the transport of hazard group 2, 3 or 4 organisms must meet defined standards, IATA packing instruction 602 (class 6.2) (IATA, 2002). Packaging for hazard group 1 organisms must meet EN 829 triple containment requirements. Microorganisms that qualify as dangerous goods (class 6.2) must, however, be in UN certified packages. These packages must be sent by air freight if the postal services of the countries through which they pass do not allow the organisms in their postal systems. They can only be sent by airmail if the national postal authorities accept them. If the carrier does not have its own fleet, the package and documentation must be checked at the airport DGR centre, which incurs costs in addition to the freight charges and package costs.
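The packaging rules in this section amount to a lookup by hazard group. The sketch below is illustrative only: the function names are hypothetical, and it assumes for simplicity that all hazard group 2, 3 and 4 organisms are shipped as class 6.2 dangerous goods.

```python
from typing import List


def packaging_requirements(hazard_group: int) -> List[str]:
    """Packaging rules from this section for a given hazard group."""
    if hazard_group not in (1, 2, 3, 4):
        raise ValueError("hazard group must be 1, 2, 3 or 4")
    if hazard_group == 1:
        return ["EN 829 triple containment"]
    # Assumption: groups 2-4 are treated as class 6.2 dangerous goods.
    return [
        "IATA packing instruction 602 (class 6.2)",
        "UN certified package",
    ]


def transport_route(post_accepted_on_whole_route: bool) -> str:
    """Route for a class 6.2 package.

    Airmail is possible only if every postal service on the route accepts
    the organism; otherwise the package goes by air freight.
    """
    return "airmail" if post_accepted_on_whole_route else "air freight"
```

For example, `packaging_requirements(3)` lists both the packing instruction and the UN certified package, while `transport_route(False)` selects air freight.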

5.9 Regulations governing distribution of cultures

The IATA Dangerous Goods Regulations (DGR) require that shippers of microorganisms of hazard groups 2, 3 or 4 must be trained by IATA certified and approved instructors. They also require shipper's declaration forms, which should accompany the package in duplicate, and specified labels to be used for organisms in transit by air (IATA, 2002). IATA DGR also requires that packaging used for the transport of hazard group 2, 3 or 4 organisms must meet defined standards, IATA packing instruction 602 (class 6.2) (IATA, 2002). See http://www.gbf.de/dsmz/shipping/shipping.htm. Packaging must meet EN 829 triple containment requirements for hazard group 1 organisms (Anon, 1996b). A BRC must ensure that staff responsible for distribution of cultures have a current IATA Shippers training certificate and that organisms are packed and shipped in accordance with IATA requirements.

Several other regulations impose export restrictions on the distribution of microorganisms. These include controls on the distribution of agents that could be used in biological warfare (EU Council Regulation 3381/94/EEC on the control of export of dual-use goods, Official J. L 367, p1), the Access Regulations to Genetic Resources that countries are currently implementing under the Convention on Biological Diversity, and regulations on the transport of goods by road. It is critical that microbiologists are aware of and follow such legislation; see Guidance on regulations for the Transport of Infectious Substances 2007–2008. WHO/CDS/EPR/2007.2 World Health Organization 2007 http://www.who.int/ihr/publications/WHO_CDS_EPR_2007_2cc.pdf

5.10 Control of Distribution of Dangerous Organisms

There is considerable concern over the transfer of selected infectious agents capable of causing substantial harm to human health. There is potential for such organisms to be passed to parties not equipped to handle them or to persons who may make illegitimate use of them. Of special concern are pathogens and toxins causing anthrax, botulism, brucellosis, plague, Q fever, tularemia and all agents classified for work at Biosafety Level 4 (hazard group 4). The 'Australia Group' of countries has strict controls for movement outside their group but has lower restrictions within. A BRC has procedures to check the validity of customers that wish to receive dangerous organisms and, if in doubt, does not supply. There is control legislation in place and some of this is described below.

The American Society for Microbiology (ASM) provides information; for example, "Contact Information for Select Agent Preservation" is available at the WFCC site http://www.wfcc.info/index.php/wfcc_library/agents/. The document addresses the preservation of (dangerous) agents for research for human welfare. On the web page the menu title is "US legislation concerning select agents", and the relevant links are embedded in the document.

5.11 Biological Weapons Convention

The Biological Weapons Convention (BWC), also known as the Biological and Toxin Weapons Convention (BTWC), was signed in London, Moscow and Washington on 10 April 1972 and entered into force on 26 March 1975. There are currently 162 country signatories, of which 18 are still to ratify (https://www.un.org/disarmament/wmd/bio/).

• Convention on the prohibition of the development, production and stockpiling of bacteriological (biological) and toxin weapons and on their destruction. The text can be found at http://disarmament.un.org/treaties/t/bwc

Following the signing of the BTWC, countries have introduced new control legislation or procedures to prevent unauthorised access to strains that could be misused in this way. An Ad Hoc Group is currently discussing a verification protocol for the BWC; such a protocol is now in place for the Chemical Weapons Convention (CWC) (https://www.un.org/disarmament/wmd/chemical/).

5.12 Export Licensing Measures

Article III of the BWC obliges the States Parties not to transfer to any recipient whatsoever, directly or indirectly, and not in any way to assist, encourage, or induce any State, group of States or international organizations to manufacture or otherwise acquire any of the agents, toxins, weapons, equipment or means of delivery specified in article I of the Convention. This is a legally binding obligation. A number of countries have implemented national export licensing measures as an effective means of implementing these obligations and to avoid the possibility of the inadvertent supply of any item which could be used in a BW program. Export licenses are not bans; they operate to deter proliferation by monitoring trade in relevant materials, and provide authority to stop a sale in the infrequent cases where a prospective export is likely to contribute to a BW program. It is also in the interest of industry and research institutes to ensure that they are not inadvertently supplying pathogens and dual-use equipment for use in the production of BW. See also the Geneva Protocol for the Prohibition of the Use in War of Asphyxiating, Poisonous or Other Gases, and of Bacteriological Methods of Warfare (https://www.un.org/disarmament/wmd/bio/1925-geneva-protocol/).

USA

The USA has rules that include a comprehensive list of infectious agents, registration of facilities that handle them, and requirements for transfer, verification and disposal; they carry criminal and civil penalties. In the UK all facilities handling hazard group 2, 3 or 4 organisms must be registered. Strict control of hazard group 3 and 4 organisms is in place.

The US Biological Safety Requirements for Facilities Transferring or Receiving Select Agents: in response to a Congressional mandate in 1997, CDC promulgated a regulation, Additional Requirements for Facilities Transferring or Receiving Select Agents.
The select agents were drawn from the Australia List and include forty or so microorganisms, a dozen toxins, and certain recombinant DNA molecules. Institutions were provided Registration Packages that included a self-assessing laboratory survey form based on the CDC/NIH publication, Biosafety in Microbiological and Biomedical Laboratories. Each registered facility is to be inspected by personnel from the OHS during a three-year cycle. To date there are 67 laboratories registered. This presentation will provide a status report on the activities associated with implementing this regulation. CDC is currently working to develop a plan to address the role of Public Health to meet the growing national concerns about bioterrorism.

Additional Requirements for Facilities Transferring or Receiving Select Agents: 42 CFR Part 72.6, 1996; https://www.gpo.gov/fdsys/granule/CFR-2004-title42-vol1/CFR-2004-title42-vol1-sec72-6

Public Health Chapter I--Public Health Service, Department of Health and Human Services Part 71--Foreign Quarantine-Subpart F--Importations Sec. 71.54 Etiological agents, hosts, and vectors.
(a) A person may not import into the United States, nor distribute after importation, any etiological agent or any arthropod or other animal host or vector of human disease, or any exotic living arthropod or other animal capable of being a host or vector of human disease unless accompanied by a permit issued by the Director.
(b) Any import coming within the provisions of this section will not be released from custody prior to receipt by the District Director of the U.S. Customs Service of a permit issued by the Director.
European Union

The European Union has adopted a common position on the Biological and Toxin Weapons Convention (http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A31999E0346). Delivery of microorganisms which could be used as biological weapons is governed by Council Regulation (EC) No 837/95 of 10 April 1995 amending Regulation (EC) No 3381/94 setting up a Community regime for the control of exports of dual-use goods (http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A31995R0837).

France

France has introduced regulations on highly pathogenic strains (AFSSAPS). Information can be found in French at http://ansm.sante.fr/.

5.13 Control procedures: UKNCC control of Dangerous Pathogens

The UK has put in place a code of practice for UK public service collections to follow. The Australia Group has strict controls for movement outside its group of countries but has lower restrictions within. The list of Australia Group controlled organisms is modified periodically. The UK National Culture Collections (UKNCC) are implementing a system involving the registration of customers to ensure bona fide supply when there is any doubt associated with giving access to potentially dangerous organisms. Access to the organisms on the Australia Group list, plus additional organisms of concern, is controlled. A list of such organisms has been compiled by the UKNCC, which includes plant, animal and human pathogens. See http://www.hse.gov.uk/aboutus/meetings/committees/acdp/

Some biosafety related sites:
• The European Federation Biotechnology Working Party on Safety in Biotechnology: http://www.biokemi.org/biozoom/issues/489/articles/1913
• The European Biological Safety Association: https://ebsaweb.eu/
• The American Biological Safety Association: https://ehs.msu.edu/
• The Belgian Biosafety Server with lots of links: https://www.biosafety.be/
• World Health Organisation (WHO): http://www.who.int/en/
• UNIDO Biosafety Information Network and Advisory Service: http://binas.unido.org/
• International Centre for Genetic Engineering and Biotechnology (ICGEB) in Trieste: http://www.icgeb.trieste.it/~bsafesrv/

Some BTWC related sites:
• The Department of Peace Studies at the University of Bradford: https://www.bradford.ac.uk/social-sciences/peace-studies/
• Pugwash Study Group on CBW: http://fas-www.harvard.edu:80/~hsp/pugwash.html
• CBIAC (The Chemical and Biological Defence Information Analysis Center): https://www.hsdl.org/?abstract&did=1099
• SIPRI (Stockholm International Peace Research Institute) Chemical and Biological Warfare Project: http://www.sipri.se
• The Henry L. Stimson Center, Chemical and Biological Weapons Non-proliferation Project: http://www.stimson.org
• Biological and Toxin Weapons Working Group, Federation of American Scientists: http://www.fas.org/bwc/index.html

5.14 Convention on Biological Diversity

Text: https://www.cbd.int/convention/text/

The Convention on Biological Diversity requires that microbiologists seek prior informed consent from the country in which they wish to collect organisms. They will be required to agree terms on which benefits will be shared should any accrue from the use of the organisms. Benefit sharing may include monetary elements but may also include information, technology transfer and training. A BRC must ensure transparency by retaining the link between the country of origin and the end user of genetic resources. Biological materials must be received and supplied within the spirit of the CBD, ensuring that material transfer agreements are in place. A BRC must maintain contact with, and follow the recommendations of, its national CBD Contact Point and National Focal Point.

The Convention on Biological Diversity (CBD) was established to support the conservation and utilisation of biodiversity, ensuring fair and equitable sharing of benefits arising from the latter. The CBD assigns sovereign rights to the country of origin and requires that Prior Informed Consent (PIC) is received from the country in which access to organisms is requested. Mutually Agreed Terms (MAT) must be put in place, covering the conditions under which access is granted and on which benefits will be shared should any accrue from the use of the organisms. Benefit sharing may include monetary elements but may also include information, technology transfer and training. The supply of organisms must also take place under agreed terms, set out in Material Transfer Agreements (MTA) between supplier and recipient, to ensure benefit sharing with, at least, the country of origin. This is a significant role for public service collections, which potentially bear critical responsibilities for ensuring traceability. Many culture collections have operated a form of benefit sharing since they began, giving organisms in exchange for deposits and re-supplying the depositor with the strain if a replacement is required.

An EU DG XII project, Microorganisms Sustainable Use and Access Regulation International Code of Conduct (MOSAICC) (http://bccm.belspo.be/projects/mosaicc), produced standard material transfer agreements to facilitate access to genetic resources whilst adhering to the spirit of the CBD and to national and international law governing the distribution of microorganisms (Davison et al. 1998). Many countries are putting in place legislation to control access to their genetic resources. This normally includes certification processes and can help by identifying the relevant authorities for PIC and by setting the terms and conditions of access. However, some legislation has been bureaucratic and has served to restrict access. National Focal Points on access and benefit sharing are beginning to appear, which should make the process simpler. To date, most collections have adopted a wait-and-see attitude because their role is still not clear.

5.15 Ownership of Intellectual Property Rights (IPR)

Organisms originating from different habitats all over the world are deposited in collections. On deposit, the ownership of the intellectual property associated with them must be addressed. The CBD bestows sovereign rights over genetic resources on the country of origin, but intellectual property rights covering their use in processes are another matter. The CBD requires that the country of origin has a share in benefits accruing from such use, but there may be several other stakeholders. These may include the landowner where the organism was isolated, the collector, those involved in purifying and growing the organism, the discoverer of the intellectual property, the owner of the collection where the organism was preserved and the developer of the process. Clearly, not all stakeholders have an equal stake; this will depend upon the input of each one to the discovery or process. This has implications for the sharing of benefits arising from exploitation of the genetic resource. The collection has a role to play in the protection of IPR, even if it is merely informing the recipient of any existing material transfer agreement or of the citation of the strain in a patent. The implementation of the CBD is still being discussed by delegates from the signatory countries, who meet at the Conference of the Parties and in its working groups. Information on the progress of these discussions can be found on the CBD web site (https://www.cbd.int/).

Patents may be taken out as a form of protecting IPR. In many cases the organism involved must be part of the disclosure, and many countries either recommend or require by law that a written disclosure of an invention involving the use of organisms be supplemented by the deposit of the organism in a recognised culture collection. Most patent lawyers recommend that the organism is deposited, regardless of whether this is a requirement, to avoid the possibility of the patent being rejected. To remove the need to deposit organisms in a collection in every country where patent protection is desired, the “Budapest Treaty on the International Recognition of the Deposit of Micro-organisms for the Purpose of Patent Procedure” was concluded in 1977 and came into force towards the end of 1980 (www.wipo.int/treaties/en/registration/budapest/). The treaty recognises named culture collections as “International Depositary Authorities” (IDA), and a single deposit made in any one of them is accepted by every country party to the treaty. Any collection can become an IDA provided it has been formally nominated by a contracting state and meets certain criteria. There are 47 IDAs around the world, including 28 in Europe, which accept patent deposits of human and animal cell lines, algae, bacteria, cyanobacteria, fungi, nematodes, non-pathogenic protozoa, plant seeds and yeasts (http://www.wipo.int/export/sites/www/treaties/en/registration/budapest/pdf/idalist.pdf).

Every intermediary in an improvement or development process is entitled to a share of the IPR, which adds another dimension to ownership. It is therefore critical that clear procedures on access, mutually agreed terms on fair and equitable sharing of benefits and sound material transfer agreements are in place to protect the interested parties.

5.16 Safety information provided to the recipient of microorganisms

A safety data sheet must be despatched with an organism, indicating which hazard group it belongs to and what containment and disposal procedures are necessary; in Europe, see the Code of Practice for Biological Agents 1994 (Anon, 1994). Article 10 of the EU Directive 90/379/EEC requires manufacturers, importers, distributors and suppliers to provide safety data sheets in a prescribed format. A safety data sheet accompanying a microorganism must include:

• The hazard group of the organism being despatched

• A definition of the hazards and assessment of the risks involved in handling the organism.

• Requirements for the safe handling and disposal of the organism.
  - Containment level
  - Opening cultures and ampoules
  - Transport
  - Disposal
  - Procedures in case of spillage

A BRC issues an appropriate safety data sheet with every culture consignment. A safety data sheet must be despatched with an organism, indicating which hazard group it belongs to and what containment and disposal procedures are necessary. In the UK, microorganisms are covered by the Control of Substances Hazardous to Health (COSHH) regulations (1988) and the HSW Act s.6(4)(c), and are subject to the Approved Code of Practice for Biological Agents 1994. Article 10 of the EU Directive 90/379/EEC requires manufacturers, importers, distributors and suppliers to provide safety data sheets in a prescribed format.

A safety data sheet accompanying a microorganism must include:

• The hazard group of the organism being despatched, as defined by EU Directive 90/679/EEC Classification of Biological Agents and by the national variation of this legislation; for example, in the UK, as defined in the Advisory Committee on Dangerous Pathogens (ACDP) Categorisation of biological agents, 4th edition, and the Approved Code of Practice (ACOP) for Biological Agents.

• A definition of the hazards and assessment of the risks involved in handling the organism.

• Requirements for the safe handling and disposal of the organism.
  - Containment level
  - Opening cultures and ampoules
  - Transport
  - Disposal
  - Procedures in case of spillage

The content of safety data sheets is described in EU Directive 93/112/EC of 10 December 1993 (OJ L314/93), which amends Directive 91/155/EEC. See http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A31993L0112
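The required items listed above can be captured as a simple record so that an incomplete sheet is caught before a consignment leaves the collection. The following is a minimal sketch only: the class name, field names and the `missing_items` helper are illustrative assumptions and do not reproduce the prescribed EU format. The only rule taken from the legislation is that hazard groups run from 1 to 4.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SafetyDataSheet:
    """Hypothetical record of the items a safety data sheet must include."""
    organism: str
    hazard_group: int            # biological agents are classified into groups 1 to 4
    hazards_and_risks: str       # definition of hazards and assessment of risks
    containment_level: str
    opening_instructions: str    # opening cultures and ampoules
    transport: str
    disposal: str
    spillage_procedure: str

    def missing_items(self) -> List[str]:
        """Return the names of required items that are empty or invalid."""
        missing = [name for name, value in vars(self).items()
                   if isinstance(value, str) and not value.strip()]
        if self.hazard_group not in (1, 2, 3, 4):
            missing.append("hazard_group")
        return missing
```

A collection could call `missing_items()` on each sheet and refuse to despatch the culture while the returned list is non-empty.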

5.17 Areas Beyond National Jurisdiction (ABNJ)

Marine Areas Beyond National Jurisdiction (ABNJ), commonly called the high seas, are those areas of ocean for which no one nation has sole responsibility for management. Action is being taken to improve the management of fisheries and to strengthen the protection of related ecosystems. This work, supported in particular by the Global Environment Facility (GEF) (https://www.thegef.org/topics/areas-beyond-national-jurisdiction), aims to prevent devastating impacts on marine biodiversity, socio-economic well-being and food security for the millions of people directly dependent on those fisheries. The United Nations Convention on the Law of the Sea (UNCLOS) provides that the areas beyond the limits of national jurisdiction (ABNJ) include:

1. the water column beyond the Exclusive Economic Zone (EEZ), or beyond the Territorial Sea where no EEZ has been declared, called the High Seas (Article 86); and
2. the seabed which lies beyond the limits of the continental shelf, established in conformity with Article 76 of the Convention, designated as "the Area" (Article 1).

ABNJ are increasingly being exploited by human activities including shipping, commercial fishing, deep sea mining and, in the case of EMBRIC, bio-prospecting. For the EMBRIC sampling in the Pacific Ocean, DSMZ checked the geographic position of every sampling station of the research vessel (RV Sonne) cruise individually before sampling. Only sites outside the EEZ of the nearest states and, in the case of sediment sampling, beyond the continental shelf were chosen for acquiring samples and isolating bacterial strains.
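A station-by-station position check of this kind can be sketched in a few lines. The example below is a simplified illustration and not the procedure DSMZ used: it assumes that each nearby state's baseline is approximated by a handful of coordinate points (hypothetical inputs), and it tests only the 200 nautical mile maximum breadth of the EEZ under UNCLOS. A real check would rely on official maritime boundary data and, for sediment samples, on the declared limits of the continental shelf.

```python
import math

EEZ_LIMIT_NM = 200.0                 # maximum EEZ breadth under UNCLOS, nautical miles
EARTH_RADIUS_NM = 6371.0 / 1.852     # mean Earth radius converted from km to nautical miles


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def outside_eez(station, baseline_points, limit_nm=EEZ_LIMIT_NM):
    """True if the station lies farther than limit_nm from every baseline point.

    station: (lat, lon) in degrees; baseline_points: iterable of (lat, lon)
    tuples approximating the baselines of nearby states (assumed input).
    """
    lat, lon = station
    return all(haversine_nm(lat, lon, b_lat, b_lon) > limit_nm
               for b_lat, b_lon in baseline_points)
```

Calling `outside_eez((0.0, -140.0), [(0.0, -130.0)])` tests one hypothetical station against one hypothetical baseline point; each cruise station would be checked the same way before sampling.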

The GEF is also associated with many global and regional multilateral agreements that deal with international waters or transboundary water systems (https://www.thegef.org/partners/conventions). As such, the GEF assists its recipient countries with international waters issues as they undertake work under the following conventions:

• The Global Ship Ballast Water Treaty
• The UN Law of the Sea Treaty
• The MARPOL treaty for shipping (International Convention for the Prevention of Pollution from Ships)
• The UN Agreement on conservation and management of straddling fish stocks and highly migratory fish stocks.

The GEF also supports various UN Agency Action Programs such as the Barbados Programme of Action, the Global Programme of Action (GPA) and the Code of Conduct for Fisheries. More often, however, the GEF supports and helps negotiate regional conventions such as the Barcelona, Cartagena, Bucharest and Danube Conventions.

On 22 February 2017 the EU and its Member States made a written submission on marine genetic resources, including questions on the sharing of benefits: Development of an International Legally-Binding Instrument Under UNCLOS on the Conservation and Sustainable Use of Marine Biological Diversity of Areas Beyond National Jurisdiction http://www.un.org/depts/los/biodiversity/prepcom_files/rolling_comp/EU_Written_Submission_on_Marine_Genetic_Resources.pdf. The document explains that benefit-sharing under the UNCLOS Implementing Agreement (IA) should follow the pragmatic approach it outlines. The objective of the future treaty on the conservation and sustainable use of marine biological diversity in ABNJ should be built around the conservation of biodiversity. Determining the legal status of marine genetic resources (MGRs) is not a precondition for addressing provisions on potential benefit-sharing with respect to MGRs in a future Implementing Agreement. The Agreement should not regulate the management of fish stocks and fisheries, but it should cover fish and other biological resources used for research on their genetic properties. IPRs, including disclosure-of-origin requirements in patent applications, should not fall within the scope of the UNCLOS IA, as this issue has to be dealt with within the existing institutional frameworks competent in this subject-area (WIPO and WTO). In keeping with the overall goal of the UNCLOS Implementing Agreement, benefit-sharing should be conducive to the conservation and sustainable use of marine biodiversity in areas beyond national jurisdiction, to marine scientific research conducted in accordance with UNCLOS, and to the promotion of knowledge generation and innovation. Noting that to date there have been no concrete proposals that demonstrate how such benefit-sharing would operate in practice, the EU and its Member States remain ready to consider any specific proposals that delegations may wish to put forward.

EMBRIC must not only keep a watching brief on the development of this agreement but must also contribute, through the process championed by EMBRC, to ensure that the outcome does not restrict our ability to find new and interesting products from the sea without compromising the biodiversity in it.

6. Outputs from the microorganism prototype pipeline

Well over a thousand samples were collected from the Pacific Ocean alone, providing the opportunity to isolate many thousands of strains. Without the application of selective techniques, many of the same common organisms would have been grown and tested repeatedly. DSMZ therefore focused on organisms that are rarely isolated because of their particular growth requirements and that may produce unique, new and interesting compounds of use to bioindustry. The methodologies used to achieve this are described in sections 2, 3 and 4 above. The results of the targeted isolation programmes and of the characterisation through the microbial prototype pipeline are presented here.

6.1 Strains collected by DSMZ

DSMZ collected samples of sea water, sediment and sponges from the Atlantic Ocean, Mediterranean Sea, Baltic Sea and Pacific Ocean during sampling campaigns between 2016 and 2018 and isolated over 900 strains. These are represented here by 264 species from the phyla Actinobacteria, Bacteroidetes, Proteobacteria and Rhodothermaeota, selected on the basis of their diversity and after de-replicating potential clones (Table 5). The most interesting of these will be selected for cosmid library production. Two Arcobacter strains are described as novel species in the genus, and these exemplary strains were also grown in a fermenter to produce biomass for sequencing and organic extracts for HZI to analyse. This preliminary test of the pipeline workflows was needed to ensure that the selected protocols were optimised. A selection of the most interesting strains from Table 5 was fermented throughout 2018 and 2019 to create biomass and provide organic extracts for analysis, giving a total of 54 samples (Table 6).
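One simple way to shortlist strains from Table 5 that may represent new taxa is to rank them by similarity to their closest relative. The sketch below illustrates the idea on five rows copied from Table 5. The 98.7% cutoff and the names `Strain` and `candidates` are assumptions made for illustration; the report does not state the selection criterion DSMZ applied.

```python
from typing import List, NamedTuple


class Strain(NamedTuple):
    name: str
    closest_relative: str
    similarity: float  # similarity (%) to the closest relative, as listed in Table 5


# A few rows taken from Table 5.
ROWS = [
    Strain("JAB_HD_90", "Brevibacterium picturae", 100.00),
    Strain("2CW2_PP3", "Wandonia haliotis", 88.36),
    Strain("ACS3C_E5", "Arcobacter venerupis", 94.71),
    Strain("ACS2D_F11", "Arcobacter suis", 94.87),
    Strain("3RW5_G2a", "Aurantimonas litoralis", 100.00),
]


def candidates(rows: List[Strain], cutoff: float = 98.7) -> List[Strain]:
    """Return strains below the (assumed) similarity cutoff, least similar first."""
    hits = [row for row in rows if row.similarity < cutoff]
    return sorted(hits, key=lambda row: row.similarity)


for strain in candidates(ROWS):
    print(f"{strain.name}: {strain.similarity:.2f}% to {strain.closest_relative}")
```

Lowering or raising `cutoff` changes how many strains are shortlisted; similarity alone is only a first filter and would normally be followed by fuller taxonomic characterisation.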

6.2 Cosmid libraries and biosynthetic gene clusters at USTAN

As described in section 3 above, the bacterial genome sequences provided by DSMZ were analysed using antiSMASH to identify putative novel gene clusters and to predict their products. Genomic DNA was then extracted and used to construct a cosmid library for each strain. Gene clusters predicted by antiSMASH were then targeted for heterologous expression. Biosynthetic gene clusters were discovered and indications of potential bioactive compounds were observed.

After the prediction, cosmid libraries of Saccharothrix espanaensis DSM 44229 and Amycolatopsis japonica MG417-CF17 were constructed, and a BAC library of S. espanaensis DSM 44229 was constructed by the company Bio S&T. The predicted gene clusters screened from the BAC and cosmid libraries are presented in Table 7. The network analysis of the five heterologously expressed gene clusters, viewed in Cytoscape, is shown in Figure 7.

Table 5. List of strains collected by DSMZ following dereplication of potential clones Sample Strain name Cultivation Strategy Isolation medium Taxonomic classification source Accession Similarity Phylum Closest relative number of the (%) closest relative JAB_HD_11a Water Multiwell plate ABW salts + 1:10 HD Actinobacteria ginsengisoli AB245394 99.70 JAB_HD_20 Water Multiwell plate ABW salts + 1:10 HD Actinobacteria Aeromicrobium marinum NR_025681 99.70 JAB_HD_22c Water Multiwell plate ABW salts + 1:10 HD Actinobacteria Aeromicrobium ponti AM778683 99.10 JAB_HD_90 Water Multiwell plate ABW salts + 1:10 HD Actinobacteria Brevibacterium picturae NR_025614 100.00 JAB_HD_28a Water Multiwell plate ABW salts + 1:10 HD Actinobacteria Frigoribacter sp. Y18807 100.00 2i2 Sponge Chemotaxis assay ASW salts + Fatty acid mix Actinobacteria Gordonia terrae KF410339 99.73 JAB_HD_35 Water Multiwell plate ABW salts + 1:10 HD Actinobacteria Marmoricola scoriae FN386750 97.80 JAB_HD_95a Water Multiwell plate ABW salts + 1:10 HD Actinobacteria Microbacterium maritypicum AM181506 100.00 JAS_HD_14a Water Multiwell plate ASW salts + 1:10 HD Actinobacteria Microbacterium paraoxydans AJ491806 100.00 PNS81A_F11b Water Multiwell plate ASW salts + Polymer mix Actinobacteria Micrococcus luteus AF542073 99.73 JAS_HD_29a Water Multiwell plate ASW salts + 1:10 HD Actinobacteria Micrococcus yunnanensis FJ214355 100.00 ABW_HD_4_2 Water Multiwell plate ABW salts + 1:10 HD Actinobacteria Nocardia coeliaca NR_104776 100.00 3RW5_G1 Water Biofilm assay ASW salts + 1:10 HD Actinobacteria Rhodococcus kyotonensis AB269261 98.60 2RW5_G2 Water Biofilm assay ASW salts + 1:10 HD Actinobacteria Rhodococcus yunnanensis AY602219 99.04 JAS_HD_26 Water Multiwell plate ASW salts + 1:10 HD Actinobacteria Thermoleophilum minutum HQ223108 99.80 ABW Poly Water Multiwell plate ABW salts + Polymer mix Actinobacteria Yonghaparkia alkaliphila NR_043675 97.00 JAB_HD_38 Water Multiwell plate ABW salts + 1:10 HD Bacteroidetes 
Algoriphagus aquimarinus NR_025602 96.90 JAB_HD_10b Water Multiwell plate ABW salts + 1:10 HD Bacteroidetes Algoriphagus namhaensis HQ401024 97.80 3RW5_S4b Water Biofilm assay ASW salts + 1:10 HD Bacteroidetes Altuibacter lentus JQ362482 96.82 JAB_HD_101 Water Multiwell plate ABW salts + 1:10 HD Bacteroidetes Arenibacter troitsensis AB080771 99.90 CS3_PP3b Sediment Biofilm assay ASW salts + 1:10 HD Bacteroidetes Cellulophaga geojensis 315111068 99.50 JAB_HD_110 Water Multiwell plate ABW salts + 1:10 HD Bacteroidetes Cellulophaga tyrosinoxydans EU443205 100.00 ACS3C_G4 Sediment Multiwell plate ASW salts + 1:10 HD Bacteroidetes Dokdonia donghaensis DQ003276 99.45 PCS2D_A12 Sediment Multiwell plate ASW salts + Polymer mix Bacteroidetes Draconibacterium orientale JQ683778 97.94 ACS2D_F2 Sediment Multiwell plate ASW salts + 1:10 HD Bacteroidetes Flaviramulus ichthyoenteri JX412958 95.76 JAB_HD_16a Water Multiwell plate ABW salts + 1:10 HD Bacteroidetes Flavivirga amylovorans HM475138 97.10 CS2_G3a Sediment Biofilm assay ASW salts + 1:10 HD Bacteroidetes Formosa spongicola NR_116612 98.52 3RW5_PP6 Water Biofilm assay ASW salts + 1:10 HD Bacteroidetes Jejudonia soesokkakensis KC792554 96.29 3CS3_G3 Sediment Biofilm assay ASW salts + 1:10 HD Bacteroidetes Lacinutrix sp. FN377744 97.31 2RS2_G3 Sediment Biofilm assay ASW salts + 1:10 HD Bacteroidetes Lutibacter litoralis AY962293 98.10 3RW5_PP3b Water Biofilm assay ASW salts + 1:10 HD Bacteroidetes Maribacter dokdonensis AY960749 98.53 3RW5_PP1 Water Biofilm assay ASW salts + 1:10 HD Bacteroidetes Maribacter forsetii NR_042627 97.43 2RW5_PS4 Water Biofilm assay ASW salts + 1:10 HD Bacteroidetes Maribacter stanieri EF536747 99.69 2CS2_PP5 Sediment Biofilm assay ASW salts + 1:10 HD Bacteroidetes Marinifilum flexuosum HE613737 95.28 3CS2_G2 Sediment Biofilm assay ASW salts + 1:10 HD Bacteroidetes Marinifilum sp. 
HE613737 97.06 Mesoflavibacter Water Biofilm assay ASW salts + 1:10 HD Bacteroidetes AB265181 98.60 2CW2_G1 zeaxanthinifaciens 3CW3_G4 Water Biofilm assay ASW salts + 1:10 HD Bacteroidetes Nonlabens arenilitoris JX291103 99.73 2CS2_PS1 Sediment Biofilm assay ASW salts + 1:10 HD Bacteroidetes Olleya marilimosa JN175350 99.86 Planomicrobium Water Multiwell plate ASW salts + 1:10 HD Bacteroidetes AB680292 99.70 JAS_HD_7 okeanokoites RW1 10^4 2A light yellow Water Multiwell plate ASW salts + 1:10 HD Bacteroidetes Polaribacter porphyrae NR_114321 95.00 2CS3_PS5 Sediment Biofilm assay ASW salts + 1:10 HD Bacteroidetes Psychroserpens mesophilus DQ001321 97.82

Deliverable D6.1 EMBRIC showcases: prototype pipelines from the microorganism to product discovery Page 42 of 80 Deliverable D6.1 EMBRIC (Grant Agreement No. 654008)

Sample Strain name Cultivation Strategy Isolation medium Taxonomic classification source Accession Similarity Phylum Closest relative number of the (%) closest relative 2CS2_PP2 Sediment Biofilm assay ASW salts + 1:10 HD Bacteroidetes Sphingobacteria bacterium AB362263 97.41 ACS3D_E6 Sediment Multiwell plate ASW salts + 1:10 HD Bacteroidetes Ulvibacter antarcticus EF554364 95.87 2CW2_PP3 Water Biofilm assay ASW salts + 1:10 HD Bacteroidetes Wandonia haliotis FJ424814 88.36 ACW3C_G4 Water Multiwell plate ASW salts + 1:10 HD Bacteroidetes Winogradskyella aquimaris HM368527 98.18 Winogradskyella Sediment Biofilm assay ASW salts + 1:10 HD Bacteroidetes HQ336488 98.15 3RS2_PS2b damuponensis 4RS2_PS8 Sediment Biofilm assay ASW salts + 1:10 HD Bacteroidetes Winogradskyella rapida U64013 98.01 4RW5_S2 Water Biofilm assay ASW salts + 1:10 HD Bacteroidetes Zobellia galactanivorans NR_074684 99.09 RW6 10^3 1 Water Multiwell plate ASW salts + 1:10 HD Bateroidetes Tenacibaculum caenipelagi NR_125675 98.00 JAB_HD_87b Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Albirhodobacter marinus NR_126203 99.00 PRS2_10^3 Water Multiwell plate ASW salts + Polymer mix Proteobacteria Alcanivorax borkumensis NR_074890 99.00 MAW6_4 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Alcanivorax venustensis NR_025145 99.39 Altererythrobacter Water Multiwell plate ABW salts + 1:10 HD Proteobacteria DQ304436 96.10 JAB_HD_31b epoxidivorans RW2 10^4 2* Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Alterierythrobacter sp. 
FM177586 96.00 ARW1_1H2 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria australica FJ595485 99.09 ARW3b_1E7_(1small) Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Alteromonas genovensis NR_042667 99.00 RW2 10^3 A1 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Alteromonas hispanica NR_043274 100.00 3CW3_PP2(1) Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Alteromonas marina AF529060 99.07 ARW1_1H1 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Alteromonas stellipolaris AJ295715 99.00 ACW2A_D7 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Amphritea atlantica AM156910 99.10 ACS2D_F3 Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria Amphritea balenae AB330883 99.50 2RS2_G5 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Antarctobacter heliothermus DQ915602 99.75 RW4 10^3 1 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Arcobacter ellisii NR_117105 97.00 PCS2D_E6 Sediment Multiwell plate ASW salts + Polymer mix Proteobacteria Arcobacter marinus EU512920 99.73 ACS2C_E8 Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria Arcobacter molluscorum FR675874 96.76 ARW1_2G2 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Arcobacter nitrofigilis CP001999 95.61 ACS1D_H8 Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria Arcobacter sp. 
FR717550 96.97 ACS2D_F11 Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria Arcobacter suis FJ573216 94.87 ACS3C_E5 Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria Arcobacter venerupis NR_117569 94.71 4RW5_G6 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Aurantimonas coralicida AY065627 99.44 3RW5_G2a Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Aurantimonas litoralis AY178863 100.00 ACS1D_D11 Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria Bordetella parapertussis U04949 97.12 JAB_HD_102a1 Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Brevundimonas basaltis EU143355 100.00 JAB_HD_109b Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Brevundimonas denitrificans AB899817 99.50 2CS3_PP4 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Catenococcus thiocycli HE582778 99.85 JAB_HD_37a2 Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Caulobacter fusiformis AJ227759 99.00 ASW_HD_1_3 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Celeribacter baekdonensis NR_117908 97.00 3RS2_G1 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Cobetia amphilecti AB646236 99.82 2CS3_PP1 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Cobetia litoralis AB646234 99.88 RS2_PP4 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria aestuarii DQ055844 99.75 4CS1_PP4 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Colwellia psychroerythraea AB011364 97.63 JAB_HD_19a Water Multiwell plate ABW salts + 1:10 HD Proteobacteria insulae EF012357 98.60 JAB_HD_105 Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Erythrobacter aquimaris AY461441 99.00 4RW5_G5 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Erythrobacter citreus AF118020 99.60

Deliverable D6.1 EMBRIC showcases: prototype pipelines from the microorganism to product discovery Page 43 of 80 Deliverable D6.1 EMBRIC (Grant Agreement No. 654008)

Sample Strain name Cultivation Strategy Isolation medium Taxonomic classification source Accession Similarity Phylum Closest relative number of the (%) closest relative ACS3C_D9 Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria Erythrobacter seohaensis AY562219 98.09 ACS3C_D9 Sediment Multiwell plate ASW salts + 1:10 HD Proteobacteria Erythrobacter vulgaris AY706935 97.53 JAB_HD_81b Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Gemmobacter aquatilis FR733676 97.40 Gemmobacter tilapiae strain Water Multiwell plate ABW salts + 1:10 HD Proteobacteria NR_109053 98.80 JAB_HD_39 Ruye_53 Glaciecola agarilytica strain Water Biofilm assay ASW salts + 1:10 HD Proteobacteria DQ784575 99.73 3RW5_PP8 NO2 2RW5_PP1b Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Glaciecola chathamensis AB247623 99.86 4CW2_PS3 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Glaciecola lipolytica strain E3 EU183316 97.87 Halocynthiibacter Water Biofilm assay ASW salts + 1:10 HD Proteobacteria JWIF01000056 95.08 2CW3_PS1 namhaensis RS2_G_1 Sediment biofilm assay ASW salts + 1:10 HD Proteobacteria Halomonas alkaliphila NR_042256 100.00 2CW3_PP1 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Halomonas sp. 
AJ876733 100.00 MAW8_1 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Halomonas titanicae NR_116997 99.60 3CS2_G1 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Halomonas venusta AJ306894 99.87 3RS2_PS2a Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Hoeflea alexandrii AJ786600 99.37 RS2_PS_5 Sediment biofilm assay ASW salts + 1:10 HD Proteobacteria Hoeflea halophila NR_108835 99.00 4RS2_PP1 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Hoeflea marina AY598817 99.82 2RS2_PP3 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Hoeflea phototrophica JF957616 98.58 Hydrogenophaga Water Multiwell plate ABW salts + 1:10 HD Proteobacteria NR_028716 97.00 JAB_HD_18 taeniospiralis ASW_HD_1_1_1 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Hyphomonas jannaschiana KF863146 99.00 JAB_HD_33b Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Hyphomonas oceanitis KF863148 99.90 MAW7_1 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Hyphomonas sp KF863148 97.76 3RS2_PS3 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Idiomarina abyssalis NR_024891 99.55 2CS1_PP4 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Idiomarina seosinensis AY635468 99.29 ABW_HD_1_3 Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Janthinobacterium lividum NR_026365 99.00 3CS2_G3 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Labrenzia aggregata AB681109 100.00 CW3_PP4 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Labrenzia alba NR_042378 100.00 Leisingera Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria DQ915607 97.93 2CS3_S1 methylohalidivorans ASW_UV_1_3 Water Multiwell plate ASW salts + Soil extract Proteobacteria Leisingera nanhaiensis NR_116593 97.00 ACW2D_F8 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Lentibacter algarum FJ436732 100.00 JAS_HD_27 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria litoralis AB366174 100.00 JAB_HD_21 Water Multiwell 
plate ABW salts + 1:10 HD Proteobacteria Limnobacter thiooxidans NR_025421 99.70 JAB_HD_26a Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Limnohabitans parvus FM165536 97.30 2RW5_PP2 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Litoreibacter meonggei JN021667 96.37 ACW3A_B6 Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Loktanella koreensis DQ344498 97.67 2CS2_S1 Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Loktanella soesokkakensis KC987356 100.00 RW3 10^3 2B Water Multiwell plate ASW salts + 1:10 HD Proteobacteria Loktanella tamlensis NR_115814 98.00 JAB_HD_1 Water Multiwell plate ABW salts + 1:10 HD Proteobacteria Loktanella vestfoldensis AJ582226 99.20 RW2 10^3 2B Water Multiwell plate ASW salts + 1:10 HD Proteobacteria adhaerens NR_074765 100.00 3RW5_G2b Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Marinobacter algicola NR_042807 99.09 3RS2_PS4a Sediment Biofilm assay ASW salts + 1:10 HD Proteobacteria Marinobacter antarcticus FJ196022 98.24 3RW5_PP7 Water Biofilm assay ASW salts + 1:10 HD Proteobacteria Marinobacter flavimaris AY517632 100.00

Deliverable D6.1 EMBRIC showcases: prototype pipelines from the microorganism to product discovery Page 44 of 80 Deliverable D6.1 EMBRIC (Grant Agreement No. 654008)

| Strain name | Sample source | Cultivation strategy | Isolation medium | Phylum | Closest relative | Accession number of the closest relative | Similarity (%) |
|---|---|---|---|---|---|---|---|
| 2CS1_S4 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Marinobacter lipolyticus | NR_025671 | 100.00 |
| MAW2_1 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Marinobacter salarius | KJ547705 | 100.00 |
| 3RS2_PP2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Marinobacter sediminum | NR_029028 | 99.82 |
| PCS3D_B11 | Sediment | Multiwell plate | ASW salts + Polymer mix | Proteobacteria | Marinobacterium rhizophilum | EF192391 | 99.47 |
| ACS1C_C5a | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Marinobacterium sediminicola | EU573966 | 98.69 |
| Gluc_a2CH2 | Sponge | Chemotaxis assay | ASW salts + glucose 2mM | Proteobacteria | Marinomonas alcarazii | EU188442 | 96.76 |
| PCS2C_F6 | Sediment | Multiwell plate | ASW salts + Polymer mix | Proteobacteria | Marinomonas aquiplantarum | EU188447 | 99.34 |
| a2CH2Gluc | Sponge | Chemotaxis assay | ASW salts + glucose 2mM | Proteobacteria | Marinomonas arctica | DQ492749 | 98.10 |
| ASW_Sug_1CH4 | Sponge | Chemotaxis assay | ASW salts + glucose 2mM | Proteobacteria | Marinomonas foliarum | EU188444 | 98.42 |
| PCS2D_E7 | Sediment | Multiwell plate | ASW salts + Polymer mix | Proteobacteria | Marinomonas ostreistagni | AB242868 | 96.81 |
| PRW1_1D5 | Water | Multiwell plate | ASW salts + Polymer mix | Proteobacteria | Marinomonas posidonica | EU188445 | 98.70 |
| ASW_Sug_1CH1 | Sponge | Chemotaxis assay | ASW salts + glucose 2mM | Proteobacteria | Marinomonas rhizomae | EU188443 | 98.84 |
| 4RS2_PS5 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Maritimibacter alkaliphilus | AB681686 | 97.40 |
| ABW_HD_1_1_1 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Massilia aurea | NR_042502 | 99.00 |
| JAB_HD_31a1 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Methylobacterium fujisawaense | AB175634 | 100.00 |
| CS2_G2b | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Neptuniibacter halophilus | GQ131677 | 95.53 |
| ARW1_2D4 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Neptunomonas naphthovorans | NR_114018 | 99.00 |
| 4RW5_S4 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Nereida ignava | DQ915613 | 97.83 |
| JAB_HD_22b | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Nevskia ramosa | AJ001010 | 99.90 |
| 2CW3_S4 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Nisaea nitritireducens | DQ665839 | 99.70 |
| 3CS1_G3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Nitratireductor aquimarinus | HQ176467 | 99.87 |
| PCS2D_B10 | Sediment | Multiwell plate | ASW salts + Polymer mix | Proteobacteria | Oceanisphaera sp. | FN377705 | 98.12 |
| ACS1C_B8a | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Oceanospirillum linum | AB680860 | 98.09 |
| ANS211A_A11 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Octadecabacter arcticus | DQ915618 | 97.70 |
| ARW3B_1D6 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Oricola cellulosilytica | KF582604 | 99.00 |
| CS1_PS1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Paracoccus caeni | GQ250442 | 99.09 |
| CS1_PS2b | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Paracoccus seriniphilus | AB681242 | 97.57 |
| CS1_PS2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Paracoccus siganidrum | JX398976 | 97.85 |
| 3RS2_G4a | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Parasphingopyxis lamellibrachiae | AB524074 | 99.84 |
| RS2_PS_4 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Parrarhodobacter aggregans | AM403160 | 97.39 |
| 2CW3_G5 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pelagicola litoralis | EF192392 | 99.30 |
| RW5_PP4 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Phaeobacter arcticus | NR_043888 | 99.00 |
| 2CS2_G3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Phaeobacter leonis | NR_117639 | 99.43 |
| CW3_PP3 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Photobacterium lutimaris | DQ534014 | 98.84 |
| 2CS2_PP1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Ponticoccus litoralis | EF211829 | 98.02 |
| JAB_HD_87a | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Porphyrobacter dokdonensis | DQ011529 | 99.20 |
| ABW 1:10 HD 88 rw | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Porphyrobacter donghaensis | NR_025816 | 99.00 |
| 4CW3_PP1 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Primorskyibacter sedentarius | AB550558 | 100.00 |
| JAB_HD_19b | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Prosthecomicrobium enhydrum | GQ221761 | 98.30 |

| Strain name | Sample source | Cultivation strategy | Isolation medium | Phylum | Closest relative | Accession number of the closest relative | Similarity (%) |
|---|---|---|---|---|---|---|---|
| RW5_PP1 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas agarivorans | NR_025509 | 99.00 |
| Tween_4CH3 | Sponge | Chemotaxis assay | ASW salts + Tween mix | Proteobacteria | Pseudoalteromonas agarovorans | AJ417594 | 99.82 |
| CS2_G1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas arabiensis | AB576636 | 100.00 |
| RW5_G_3 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas atlantica | NR_026218 | 99.00 |
| 3RW5_S3a | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas carrageenovora | X82136 | 100.00 |
| CW3_G1 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas espejiana | AB681736 | 100.00 |
| ARW5b_1A1 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas espejiana | NR_029285 | 99.00 |
| 2CW2_PS2 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas issachenkonii | AF316144 | 100.00 |
| CS2_PP1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas lipolytica | FJ404721 | 99.81 |
| ACS2D_D3 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas marina | AY563031 | 99.55 |
| ABW HD 2_1 89 b 27.02.14 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas mariniglutinosa | NR_028992 | 99.00 |
| 3CS1_PP3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas ruthenica | AF316891 | 99.11 |
| 4CS1_S2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas shioyasakiensis | AB720724 | 99.87 |
| 2CS2_PP3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas sp. | AJ507251 | 99.05 |
| ACS2D_B11 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas sp. | JN578478 | 100.00 |
| 4CS1_PS1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas sp. | AY040230 | 98.10 |
| RW5_PP2 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoalteromonas tetraodonis | NR_114187 | 100.00 |
| JAB_HD_57 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Pseudomonas chloritidismutans | AY017341 | 100.00 |
| ABW_HD_2_7_1 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Pseudomonas chloritidismutans | NR_115115 | 99.00 |
| JAB_HD_29 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Pseudomonas cuatrocienegasensis | EU791281 | 99.70 |
| ABW_HD_4_3 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Pseudomonas extremaustralis | AJ583501 | 100.00 |
| JAB_HD_50 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Pseudomonas gessardii | AF074384 | 100.00 |
| JAB_HD_42 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Pseudomonas guineae | NR_042607 | 99.40 |
| ABW UV _5_3 101 a | Water | Multiwell plate | ABW salts + Soil extract | Proteobacteria | Pseudomonas migulae | NR_114223 | 99.00 |
| RW1 10^4 2AB | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Pseudomonas pachastrellae | NR_040991 | 99.00 |
| 4d1tw | Sponge | Chemotaxis assay | ASW salts + Tween mix | Proteobacteria | Pseudomonas peli | AM114534 | 98.61 |
| Tween_4D1 | Sponge | Chemotaxis assay | ASW salts + Tween mix | Proteobacteria | Pseudomonas sp. | AJ272544 | 99.28 |
| ABW_HD_2_5 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Pseudomonas veronii | NR_028706 | 100.00 |
| ABW 1:10 HD 31.07.13 66 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Pseudorhodobacter ferrugineus | NR_113810 | 99.00 |

| Strain name | Sample source | Cultivation strategy | Isolation medium | Phylum | Closest relative | Accession number of the closest relative | Similarity (%) |
|---|---|---|---|---|---|---|---|
| ABW_UV_5_2 | Water | Multiwell plate | ABW salts + Soil extract | Proteobacteria | Pseudorhodobacter wandonensis | NR_109461 | 99.00 |
| CS3_PS3b | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudoruegeria lutimaris | NR_116620 | 96.90 |
| 3RW5_S1a | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudovibrio ascidiaceicola | AB681198 | 99.48 |
| 1l4 | Sponge | Chemotaxis assay | ASW salts + Nitrogen compounds | Proteobacteria | Pseudovibrio japonicus | NR_041391 | 99.74 |
| 3RW5_PS2 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Pseudovibrio sp. | HQ647029 | 98.32 |
| ACS3C_D5 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Psychrosphaera saromensis | AB545807 | 97.03 |
| ACS3D_F7 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Psychrosphaera saromensis | AB545807 | 97.12 |
| CS1_G3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | R.algocolus | X78315 | 99.64 |
| 4RW5_PS4 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | R.fascians | X79186 | 99.25 |
| ABW_UV_2_2 | Water | Multiwell plate | ABW salts + Soil extract | Proteobacteria | Rhizobium rosettiformans | NR_116445 | 99.00 |
| JAB_HD_56 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Rhizobium selenitireducens | EF440185 | 99.20 |
| ABW_HD_1_2 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Rhodobacter ovatus | NR_115057 | 97.00 |
| JAS_HD_21 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Roseibacterium elongatum | FN667962 | 97.80 |
| 4CS3_PP2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Roseobacter sp. | AF098495 | 99.65 |
| 2l2 | Sponge | Chemotaxis assay | ASW salts + Nitrogen compounds | Proteobacteria | Roseovarius aestuarii | EU156066 | 98.76 |
| 4RW5_G2 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Roseovarius litoreus | JQ390520 | 94.63 |
| 3CS1_PS2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Roseovarius pacificus | DQ120726 | 99.12 |
| 3RW5_S5a | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Roseovarius sediminilitoris | JQ739459 | 97.32 |
| 2CS1_PS3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Ruegeria arenilitoris | JQ807219 | 98.32 |
| ACW3D_G5 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Ruegeria atlantica | AB255399 | 100.00 |
| 2l4 | Sponge | Chemotaxis assay | ASW salts + Nitrogen compounds | Proteobacteria | Ruegeria meonggei | KF740534 | 99.75 |
| 3CS1_PS1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Ruegeria mobilis | AB255401 | 100.00 |
| 3RW5_PS3 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Ruegeria scottomollicae | AM905330 | 100.00 |
| 2CW2_PS3 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | marina | HQ336489 | 100.00 |
| 2CS1_PS4 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Sagittula stellata | DQ915628 | 98.80 |
| 4CS1_PP2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Salipiger mucescens | AY527274 | 99.58 |
| JAB_HD_104 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Seohaeicola saemankumensis | EU221274 | 100.00 |
| MAS2_1 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Shewanella arctica | NR_117528 | 100.00 |
| ACS3C_D11b | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Shewanella basaltis | EU143361 | 97.70 |
| 3RS2_PP1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Shewanella kaireitica | AB094598 | 98.10 |
| 4CS3_G2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Shewanella marinintestina | AB081757 | 98.11 |
| RS2_PP1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Shewanella piezotolerans | CP000472 | 98.00 |
| 5c3 | Sponge | Chemotaxis assay | ASW salts + DMSO 1% and Thiosulfate 1mM | Proteobacteria | Shewanella violacea | D21225 | 98.36 |
| JAB_Poly_15 | Water | Multiwell plate | ABW salts + Polymer mix | Proteobacteria | histidinilytica | NR_116043 | 97.20 |
| JAB_HD_89 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Sphingomonas starnbergensis | NR_118124 | 98.10 |
| JAB_HD_100 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Sphingomonas wittichii | AB021492 | 98.60 |
| ASW_DMSO_5D3 | Sponge | Chemotaxis assay | ASW salts + DMSO 1% | Proteobacteria | Sphingomonas xenophaga | X94098 | 98.81 |
| JAB_HD_40 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Sphingopyxis flavimaris | AY554010 | 99.50 |

| Strain name | Sample source | Cultivation strategy | Isolation medium | Phylum | Closest relative | Accession number of the closest relative | Similarity (%) |
|---|---|---|---|---|---|---|---|
| JAB_HD_47b | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Spongiibacter borealis | HQ199599 | 99.90 |
| 3RS2_PS1b | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Stappia marina | AY628423 | 99.81 |
| 4RW5_S5 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Sulfitobacter delicatus | AY180103 | 100.00 |
| 2RS2_G4 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Sulfitobacter dubius | AY180102 | 99.73 |
| ACS3D_G9b | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Sulfitobacter marinus | DQ683726 | 99.63 |
| 3RW5_PP3a | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Sulfitobacter pontiacus | Y13155 | 99.10 |
| 4RS2_S3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Tateyamaria pelophila | AJ968651 | 97.46 |
| 2RS2_S3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Thalassobacter arenae | EU342372 | 95.11 |
| 4RS2_PP4 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Thalassobius aestuarii | AY442178 | 97.18 |
| 2CS1_S2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Thalassococcus lentus | JX090308 | 98.78 |
| 2CW2_PS1 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Thalassospira lucentensis | NR_115011 | 99.84 |
| 2CS2_PP4 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Thalassospira permensis | FJ860275 | 100.00 |
| 2CW3_PP4 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Thalassospira povalilytica | AB548215 | 100.00 |
| 1b5b | Sponge | Chemotaxis assay | ASW salts + Yeast extract | Proteobacteria | Tropicibacter litoreus | NR_117647 | 98.43 |
| 2CW2_PS4 | Water | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Tropicibacter mediterraneus | HE860710 | 98.49 |
| CS1_S3b | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | V.pelagius | X74722 | 98.86 |
| JAB_HD_12a | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | ginsengisoli | AB245358 | 99.80 |
| JAB_HD_32 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Variovorax ginsengisoli | NR_112562 | 99.70 |
| 2CS1_G1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | azureus | NR_041683 | 99.18 |
| ARW1_1A1 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Vibrio chagasii | NR_117891 | 99.00 |
| 1d2 | Sponge | Chemotaxis assay | ASW salts + Mix of sugars I | Proteobacteria | Vibrio cyclitrophicus | AM162656 | 99.49 |
| ARW1_2A7 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Vibrio gigantis | AJ582810 | 100.00 |
| PRW2_1E10 | Water | Multiwell plate | ASW salts + Polymer mix | Proteobacteria | Vibrio hemicentroti | JX204734 | 99.70 |
| RW3 10^4 2 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Vibrio lentus | NR_028926 | 99.00 |
| 5k2 | Sponge | Chemotaxis assay | ASW salts + Mix of sugars III | Proteobacteria | Vibrio parahaemolyticus | AB680329 | 99.85 |
| 2CS1_PP1 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Vibrio probioticus | AJ345063 | 98.70 |
| CS1_G2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Vibrio proteolyticus | AB680395 | 95.87 |
| 2CS1_PP2 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Vibrio sp. | AJ316171 | 98.51 |
| 3CS2_PP3 | Sediment | Biofilm assay | ASW salts + 1:10 HD | Proteobacteria | Vibrio sp. | AJ316193 | 99.46 |
| 1k3 | Sponge | Chemotaxis assay | ASW salts + Mix of sugars III | Proteobacteria | Vibrio toranzoniae | HE978310 | 99.60 |
| ARW1_2H4 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Vibrio xuii | NR_025478 | 99.00 |
| JAB_HD_49a | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Zhongshania aliphaticivorans | NR_126306 | 100.00 |
| ABW_UV_6_3 | Water | Multiwell plate | ABW salts + Soil extract | Proteobacteria | Zhongshania antarctica | NR_108450 | 98.00 |
| 2CW3_G4 | Water | Biofilm assay | ASW salts + 1:10 HD | Rhodothermaeota | Balneola vulgaris | AQXH01000003 | 93.96 |

Table 6. List of selected strains for fermentation to create biomass and provide organic extracts for analysis.

| Strain name | Sample source | Cultivation strategy | Isolation medium | Phylum | Closest relative | Accession number of the closest relative | Similarity (%) |
|---|---|---|---|---|---|---|---|
| PCS2D_E7 | Sediment | Multiwell plate | ASW salts + 1:10 HD + Polymer | Proteobacteria | Marinomonas ostreistagni | AB242868 | 96.81 |
| ACS3C_E5 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Arcobacter venerupis strain F67_11 | NR_117569.1 | 94.71 |
| CS3_PS3b | Sediment | Biofilm | ASW salts + 1:10 HD + Polymer | Proteobacteria | Pseudoruegeria lutimaris strain HD_43 | NR_116620 | 96.9 |
| ACS3D_E6 | Sediment | Multiwell plate | SSE + 1:10 HD | Bacteroidetes | Ulvibacter antarcticus strain IMCC3101 | EF554364.1 | 95.87 |
| ACS3C_D11 | Sediment | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Shewanella inventionis strain KX27 | KT781407 | 93.99 |
| 2CW3_G4 | Water | Biofilm | ASW salts + 1:10 HD + Polymer | Rhodothermaeota | Balneola vulgaris | NR_042991.1 | 93.96 |
| ARW1_2F2 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Arcobacter venerupis F67-11(T) | ??? | 95.35 |
| ARW1_2G2 | Water | Multiwell plate | ASW salts + 1:10 HD | Proteobacteria | Arcobacter nitrofigilis DSM 7299(T) | ??? | 95.61 |
| RS2_PS_4 | Sediment | Biofilm | ASW salts + 1:10 HD + Polymer | Proteobacteria | Roseicitreum antarcticum strain ZS2-28 | NR_116571.1 | 96.03 |
| RW5_G4 | Water | Biofilm | ASW salts + 1:10 HD + Glass | Proteobacteria | Pseudoruegeria sp. KMM 6708 | ??? | 99 |
| 3RW5_PP6 | Water | Biofilm | ASW salts + 1:10 HD + Polymer | Bacteroidetes | Jejudonia soesokkakensis strain SSK1 | KC792554 | 96.29 |
| JAB_HD_38 | Water | Multiwell plate | ASW salts + 1:10 HD | Bacteroidetes | Algoriphagus aquimarinus strain LMG 21971 | NR_025602.1 | 96.9 |
| M20 | Water | Biofilm | KM14 | Actinobacteria | Rubrobacter indioceani strain SCSIO 08198 | MF919580.1 | 94 |
| M09 | Water | Biofilm | MB | Proteobacteria | Altererythrobacter aquiaggeris strain KEM-3 | NR_158024.1 | 99 |
| M55 | Water | Biofilm | MB | Proteobacteria | Altererythrobacter epoxidivorans strain JCS350 | NR_043706.1 | 99 |
| M62 | Water | Biofilm | MB | Firmicutes | Bacillus toyonensis strain BCT-7112 | NR_121761.1 | 100 |
| M64 | Water | Biofilm | KM14 | Bacteroidetes | Flavobacterium glaciei strain 0499 | NR_043891.1 | 99 |
| M66 | Water | Biofilm | KM14 | Bacteroidetes | Flavobacterium terriphilum strain CUG00004 | NR_152048.1 | 99 |
| M68 | Water | Biofilm | ABW salts + 1:10 HD | Bacteroidetes | Arenibacter algicola strain TG409 | NR_116561.1 | 100 |
| M72 | Water | Biofilm | ABW salts + 1:10 HD | Bacteroidetes | Algoriphagus jejuensis strain CNU040 | NR_108184.1 | 99 |
| JAB_HD_127b | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Rhodococcus qingshengii strain JCM 15477 (T) | NR_043535.1 | 99.04 |
| PCS2D_E11 | Sediment | Multiwell plate | ASW salts + 1:10 HD + Polymer | Proteobacteria | Oceanisphaera psychrotolerans strain LAM-WHM-ZC | NR_137213.1 | 99.78 |
| JAB_HD_128b | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Marinobacter sediminum strain R65 16S ribosomal RNA | NR_029028.1 | 99.76 |
| JAB_HD_118 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Sphingomonas wittichii strain RW1 | NR_074268.1 | 97.37 |
| 2CW3_G4a | Water | Biofilm | ASW salts + 1:10 HD + Glass | Bacteroidetes | Balneola alkaliphila strain CM41_14b | NR_044367.1 | 92.55 |
| JAB_HD_2a | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Rhodococcus qingshengii strain JCM 15477 (T) | NR_115708.1 | 99.88 |
| JAB_HD_137a | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Rhodococcus koreensis strain DNP505 | NR_114500.1 | 99.46 |
| JAB_HD_121 | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Microbacterium maritypicum strain DSM 12512 | NR_114986.1 | 99.87 |
| 4RS2_G3b | Sediment | Biofilm | ASW salts + 1:10 HD + Glass | Proteobacteria | Aliidiomarina taiwanensis strain AIT1 | NR_118000.1 | 94.35 |
| 4RW5_PS1 | Water | Biofilm | ASW salts + 1:10 HD + Polymer | Proteobacteria | Phaeobacter marinintestinus strain UB-M7 | NR_148290.1 | 95.93 |
| CS1_PP3 | Sediment | Multiwell plate | ASW salts + 1:10 HD + Polymer | Proteobacteria | Pseudoalteromonas shioyasakiensis strain SE3 | NR_125458.1 | 99.47 |
| 4CH2(twe | Sponge | Chemotaxis | ASW salts + 1:10 HD | Proteobacteria | Pseudomonas peli strain R-20805 | AM114534 | 98.57 |
| JAB_HD_4a2 | Water | Multiwell plate | ABW salts + 1:10 HD | Actinobacteria | Aeromicrobium ginsengisoli strain Gsoil 098 | NR_041384.1 | 99.49 |
| 4RS2_G3a | Sediment | Biofilm | ASW salts + 1:10 HD + Glass | Proteobacteria | Halomonas alkaliphila strain 18bAG | NR_042256.1 | 100 |
| 4RW5_PS3 | Water | Biofilm | ASW salts + 1:10 HD | Proteobacteria | Pseudovibrio axinellae strain Ad2 | NR_118255.1 | 98.05 |
| Rhodobacteraceae sp. D100-Iso2 | Alga | Direct plating | MB | Proteobacteria | Celeribacter manganoxidans strain DY25 | CP021404.1 | 92.89 |
| Rhodobacteraceae sp. MEBiC05055 | Alga | Direct plating | MB | Proteobacteria | Tateyamaria omphalii strain MKT107 | NR_125446 | 98.81 |
| Marinovum algicola FF3 DSM-10251T | Alga | Direct plating | MB | Proteobacteria | Marinovum algicola FF3 DSM10251T | AB289592.1 | 100 |
| Marinovum algicola DG898 DSM-27768 | Alga | Direct plating | MB | Proteobacteria | Marinovum algicola FF3 DSM10251T | AB289592.1 | 100 |
| Oceanicella actignes PRQ-68 DSM-22674T | Alga | Direct plating | MB | Proteobacteria | Oceanicella actignis strain CLW | NR_109312.1 | 99.93 |
| A11D_105 | Alga | Direct plating | MB | Proteobacteria | Sulfitobacter porphyrae SCM-1T | AB758574 | 99.9 |
| A05D_005 | Alga | Direct plating | MB | Proteobacteria | Aquicoccus porphyridii L1 8-17T | MF113254 | 100 |
| C05C_116 | Alga | Direct plating | L1ZM10 | Proteobacteria | Sulfitobacter pseudonitzschiae SMR1 | CP022415.1 | 100 |
| C05C_110 | Alga | Direct plating | MB | Proteobacteria | Hoeflea alexandrii AM1V30T | AJ786600 | 98.5 |
| H01Y_008A | Alga | Direct plating | MB | Proteobacteria | Fretibacter rubidus JC2236T | JQ965646 | 96 |
| 3RW5_S4aa | Water | Biofilm | ASW salts + 1:10 HD + Steel | Bacteroidetes | Maribacter forsetii strain KT02ds18-6 | NR_042627.1 | 97.84 |
| JAB_HD_102a2 | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Pseudomonas salina strain XCD-X85 | NR_137210.1 | 98.19 |
| 4RW5_PP2 | Water | Biofilm | ASW salts + 1:10 HD + Polymer | Proteobacteria | Jiella aquimaris strain LZB041 | NR_134752.1 | 99.67 |
| 4d1(twe) | Sponge | Chemotaxis | ASW salts + 1:10 HD | Proteobacteria | Pseudomonas peli strain R-20805 | AM114534 | 98.66 |
| 4RS2_G7 | Sediment | Biofilm | ASW salts + 1:10 HD + Glass | Proteobacteria | Lacimonas salitolerans strain TS-T30 | NR_145871.1 | 97.49 |
| JAB_HD_109a | Water | Multiwell plate | ABW salts + 1:10 HD | Proteobacteria | Pseudorhodobacter sinensis strain Y1R2-4 | NR_151911.1 | 99.39 |
| RW5_G2 | Water | Biofilm | ASW salts + 1:10 HD + Glass | Bacteroidetes | Altibacter lentus strain JLT2010 | NR_126240.1 | 95.66 |
| CS1PS2a | Sediment | Biofilm | ASW salts + 1:10 HD + Polymer | Proteobacteria | Paracoccus homiensis strain DD-R11 | NR_043733.1 | 97.64 |
| 4RW5_G1A | Water | Biofilm | ASW salts + 1:10 HD + Glass | Proteobacteria | Marinobacter litoralis strain SW-45 | NR_028841.1 | 99.77 |

Using PCR screening, 18 predicted gene clusters were located in the BAC and cosmid library of S. espanaensis DSM 44229, and 4 of them were transferred into Streptomyces (S. coelicolor and S. lividans) for heterologous expression (Table 7).

Table 7 Predicted gene clusters screened from the BAC and cosmid library

| Cluster | Type | Size (kb) | Homologous known gene clusters (similarity level) | Heterologous expression in Streptomyces |
|---|---|---|---|---|
| 1 | Terpene | 22.5 | Lividomycin (6%) | √ |
| 2 | Lantipeptide | 22.6 | Erythreapeptin (75%) | |
| 3 | Terpene | 25.3 | Isorenieratene (42%) | |
| 5 | Furan | 11.1 | Asukamycin (19%) | √ |
| 6 | Nrps | 55.7 | - | |
| 8 | Nrps | 69.8 | Bacillibactin (38%) | |
| 11 | Nrps | 88.3 | Skyllamycin (14%) | |
| 12 | T1pks-Otherks | 82 | - | |
| 17 | Lantipeptide | 22.8 | Kinamycin (5%) | √ |
| 19 | Melanin | 10.5 | - | |
| 23 | Nrps-T1pks | 87.3 | Splenocin (12%) | |
| 24 | T1pks | 68.4 | Streptazone E (66%) | |
| 30 | Thiopeptide | 29.2 | - | √ |
| 31 | Terpene-Lassopeptide-T2pks-Nrps-T1pks | 85.3 | Fluostatin (23%) | |
| 32 | Terpene-Lantipeptide-Nrps-T1pks | 99.4 | Azinomycin B (17%) | |
| 34 | Terpene | 21.8 | - | |
| 35 | Oligosaccharide | 25.3 | - | |
| 36 | Terpene | 22 | SF2575 (6%) | √ |

To try to improve production of the heterologously expressed compounds, the Streptomyces promoter ermE* was inserted into the cosmids containing clusters 17, 30 and 36. So far, the promoter has been inserted in one direction, facing cluster 36, and in both directions, flanking cluster 17. The modified cosmids were then transferred into Streptomyces, and production levels will be examined to see whether they are enhanced as expected.

Figure 7. Network analysis of the five heterologously expressed gene clusters, viewed in Cytoscape

MS/MS spectra were collected, and the molecular network was generated using cosine scores; edges connect nodes and reflect the relatedness of their MS/MS spectra. Each node is colour coded for ease of visualisation.

A: Purple nodes indicate masses present only in samples where gene cluster 1 (predicted terpene) was heterologously expressed, and not in any other sample, such as the controls or samples in which other gene clusters were heterologously expressed. Green nodes indicate masses present only in samples where gene cluster 5 (predicted furan) was heterologously expressed. Teal nodes indicate masses present only in samples where gene cluster 17 (predicted lantipeptide) was heterologously expressed. Orange nodes indicate masses present only in samples where gene cluster 30 (predicted thiopeptide) was heterologously expressed. Yellow nodes indicate masses present only in samples where gene cluster 36 (predicted terpene) was heterologously expressed. Black nodes indicate masses present only in untransformed S. coelicolor and S. lividans. Grey nodes indicate masses present in more than one heterologously expressed gene cluster and/or in a control sample.

B: Example networks from heterologously expressed gene cluster 1. The number on each node represents the parent mass.

C: Example networks from heterologously expressed gene cluster 17. The number on each node represents the parent mass.
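The colour legend of Figure 7 amounts to a simple set rule on the samples in which a mass was detected. The following minimal Python sketch illustrates that rule only; the sample labels (`cluster_1`, `control`, and so on) and the function name `node_colour` are hypothetical and are not taken from the project's Cytoscape configuration.

```python
from typing import Set

# Colours follow the Figure 7 legend; the sample labels are hypothetical.
CLUSTER_COLOURS = {
    "cluster_1": "purple",   # predicted terpene
    "cluster_5": "green",    # predicted furan
    "cluster_17": "teal",    # predicted lantipeptide
    "cluster_30": "orange",  # predicted thiopeptide
    "cluster_36": "yellow",  # predicted terpene
}
CONTROL = "control"  # untransformed S. coelicolor / S. lividans


def node_colour(samples: Set[str]) -> str:
    """Return the legend colour for a node, given the samples it was detected in."""
    if samples == {CONTROL}:
        return "black"  # present only in the untransformed hosts
    if len(samples) == 1:
        (only,) = samples
        # Unique to one heterologously expressed cluster.
        return CLUSTER_COLOURS.get(only, "grey")
    # Shared between clusters and/or with a control sample.
    return "grey"
```

For instance, a mass seen only in the cluster 17 sample would be coloured teal, while a mass seen in both the cluster 17 sample and a control would be grey.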

6.3 Characterisation of extracts at HZI

A first pilot extract was obtained by DSMZ and profiled successfully, demonstrating the feasibility in principle of the approach for measuring extracts of rare bacteria (see Figure 8). The bulk of the extract study was carried out between February and April 2019, when 25 extracts (Table 8) were delivered from DSMZ and analysed at HZI. Between April 2019 and the end of the project, the in-depth analysis and annotation will be completed for a selection of promising samples; LC-MS/MS measurement and proof-of-concept analysis of the remaining 39 samples will be carried out once these extracts have been delivered to HZI.

Table 8 List of extracts from selected strains measured and analysed at HZI

| Strain name | Isolation medium | Phylum | Closest relative | Accession number of the closest relative | Similarity (%) |
|---|---|---|---|---|---|
| PCS2D_E7 | ASW salts + 1:10 HD + Polymer | Proteobacteria | Marinomonas ostreistagni | AB242868 | 96.81 |
| ACS3C_E5 | ASW salts + 1:10 HD | Proteobacteria | Arcobacter venerupis strain F67_11 | NR_117569.1 | 94.71 |
| CS3_PS3b | ASW salts + 1:10 HD + Polymer | Proteobacteria | Pseudoruegeria lutimaris strain HD_43 | NR_116620 | 96.9 |
| ACS3D_E6 | SSE + 1:10 HD | Bacteroidetes | Ulvibacter antarcticus strain IMCC3101 | EF554364.1 | 95.87 |
| 2CW3_G4 | ASW salts + 1:10 HD + Polymer | Rhodothermaeota | Balneola vulgaris | NR_042991.1 | 93.96 |
| ARW1_2F2 | ASW salts + 1:10 HD | Proteobacteria | Arcobacter venerupis F67-11(T) | ??? | 95.35 |
| ARW1_2G2 | ASW salts + 1:10 HD | Proteobacteria | Arcobacter nitrofigilis DSM 7299(T) | ??? | 95.61 |
| RS2_PS_4 | ASW salts + 1:10 HD + Polymer | Proteobacteria | Roseicitreum antarcticum strain ZS2-28 | NR_116571.1 | 96.03 |
| 3RW5_PP6 | ASW salts + 1:10 HD + Polymer | Bacteroidetes | Jejudonia soesokkakensis strain SSK1 | KC792554 | 96.29 |
| M20 | KM14 | Actinobacteria | Rubrobacter indioceani strain SCSIO 08198 | MF919580.1 | 94 |
| M09 | MB | Proteobacteria | Altererythrobacter aquiaggeris strain KEM-3 | NR_158024.1 | 99 |
| M55 | MB | Proteobacteria | Altererythrobacter epoxidivorans strain JCS350 | NR_043706.1 | 99 |
| M62 | MB | Firmicutes | Bacillus toyonensis strain BCT-7112 | NR_121761.1 | 100 |
| M64 | KM14 | Bacteroidetes | Flavobacterium glaciei strain 0499 | NR_043891.1 | 99 |
| M66 | KM14 | Bacteroidetes | Flavobacterium terriphilum strain CUG00004 | NR_152048.1 | 99 |
| M68 | ABW salts + 1:10 HD | Bacteroidetes | Arenibacter algicola strain TG409 | NR_116561.1 | 100 |
| M72 | ABW salts + 1:10 HD | Bacteroidetes | Algoriphagus jejuensis strain CNU040 | NR_108184.1 | 99 |
| AEG42_23 | SSE + 1:10 HD | Bacteroidetes | Ferruginibacter paludis strain HME8881 | NR_136802.1 | 96 |
| HEG41_64b | SSE + 1:10 HD | Bacteroidetes | Niastella hibisci strain THG-YS3.2.1 | NR_153698.1 | 96 |
| SEG27_44 | SSE + 1:10 HD | Bacteroidetes | Pseudobacter ginsenosidimutans strain Gsoil 221 | NR_108590.1 | 94 |
| SEG27_28 | SSE + 1:10 HD | Bacteroidetes | Niveitalea solisilvae strain 6-4 | NR_156888.1 | 94 |
| AEG42_46 | SSE + 1:10 HD | Bacteroidetes | Flavitalea sp. AN120636 | KX762320.1 | 99 |
| AEG42_13 | SSE + 1:10 HD | Actinobacteria | Nocardioides marinquilinus strain CL-GY44 | NR_109667.1 | 96 |
| SEG27_38 | SSE + 1:10 HD | Bacteroidetes | Chitinophaga sp. BN130233 | KP419742.1 | 95 |
| AEG42_45 | SSE + 1:10 HD | Actinobacteria | Sporichthya polymorpha strain DSM 43042 | NR_024727.1 | 95 |

Sample processing workflow: metabolic profiling

• Sample preparation: extracts were dried to obtain a dry biomass, and the resulting pellet was resuspended in 20% ACN at a concentration of 0.5 mg/ml.
• Untargeted LC-MS/MS measurements were performed on a Bruker maXis HD UHR-TOF mass spectrometer equipped with an Apollo II electrospray source. LC separation was carried out on an Ultimate3000RS UHPLC system using a Kinetex 1.7 µm C18 100 Å, 150 × 2.1 mm column (water-ACN gradient) at a flow rate of 300 µl/min. The overall run time was approximately 30 min per sample.
• Pre-processing for analysis was performed as previously described in section 4.2. Briefly, several steps of chromatographic peak detection, retention time correction, peak-picking and filtering of detected features (retention time – m/z pairs) were carried out.
• Known metabolites were annotated via in-house library search, manual dereplication based on spectral matching against databases available online, and clustering based on CluMSID and, in particular, molecular networking.
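The figures quoted in the workflow above (resuspension at 0.5 mg/ml, roughly 30 min of instrument time per sample) lend themselves to a quick planning calculation. The sketch below is only a worked illustration of that arithmetic; the function names are hypothetical, and the 2.0 mg dry mass used in the example is an invented value, not a measurement from the study.

```python
def resuspension_volume_ml(dry_mass_mg: float, target_mg_per_ml: float = 0.5) -> float:
    """Volume of 20% ACN needed to reach the target concentration."""
    if dry_mass_mg < 0 or target_mg_per_ml <= 0:
        raise ValueError("mass must be non-negative and concentration positive")
    return dry_mass_mg / target_mg_per_ml


def total_run_time_hours(n_samples: int, minutes_per_sample: float = 30.0) -> float:
    """Approximate instrument time, assuming about 30 min per LC-MS/MS run."""
    return n_samples * minutes_per_sample / 60.0


# Invented example: a 2.0 mg dried extract needs 4.0 ml of solvent.
example_volume = resuspension_volume_ml(2.0)
# 50 extracts at about 30 min each correspond to roughly 25 h of measurement,
# not counting media controls or blanks.
example_hours = total_run_time_hours(50)
```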

Figure 8. Schematic representation of the workflow approach carried out at HZI for the measurement and analysis of extracts.

Molecular networking analysis was used to discriminate between endo-metabolomes (from cell pellets), exo-metabolomes (from supernatant) and the isolation media used as controls (Figure 9A). This tool proved useful and effective for extracting information from the large data set (50 extracts from cell pellets and supernatant, plus media controls), which comprised potentially very different samples, as would be expected from distinct bacterial strains. Nodes were coloured according to the origin of the extract (cell pellets in pink, supernatant in blue and media control in grey). By navigating the MS/MS-detectable chemical space of the available extracts, it was possible to assess the distribution of nodes across the extractions (Figure 9B). In the Euler diagram derived from the global molecular network (Figure 9B), the majority of “unique nodes” (found in only one subgroup) came from the cell pellet extractions. This suggested that endo-metabolomes have a higher potential for uniqueness than exo-metabolomes; the in-depth analysis was therefore focused on this promising class of cell pellet extracts.
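The node distribution summarised in the Euler diagram can be thought of as a count of nodes per combination of extract origins. The following Python sketch shows one way such counts could be derived; the group labels (`pellet`, `supernatant`, `media`), the node identifiers and the function names are hypothetical, and the example data are invented rather than taken from Figure 9.

```python
from collections import Counter
from typing import Dict, Set


def euler_counts(node_groups: Dict[str, Set[str]]) -> Counter:
    """Count nodes for each exact combination of groups they occur in."""
    return Counter(frozenset(groups) for groups in node_groups.values())


def unique_nodes_per_group(node_groups: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count the "unique nodes", i.e. nodes found in only one group."""
    counts: Counter = Counter()
    for groups in node_groups.values():
        if len(groups) == 1:
            counts[next(iter(groups))] += 1
    return dict(counts)


# Invented toy data: node identifier -> extract origins in which it was detected.
toy_nodes = {
    "n1": {"pellet"},
    "n2": {"pellet", "supernatant"},
    "n3": {"media"},
    "n4": {"pellet"},
    "n5": {"supernatant"},
}
```

In this toy example, two nodes are unique to the cell pellets, one to the supernatant and one to the media control, and one node is shared between pellets and supernatant.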

Figure 9 A: Molecular network obtained for the entire data set, originating from the extracts of 25 bacterial strains. Nodes from cell pellet extractions are depicted in pink, from supernatant in blue and from media control in grey. B: Euler diagram of node distribution based on the global molecular network in A.

The extensive analysis of the cell pellet extracts, performed individually on each extract via a combination of MS/MS in-house library matching and manual dereplication (using commercial and open-source online databases), allowed between 10 and 25% of the detected features to be annotated as known metabolites. An example list of annotated known metabolites, for the cell pellet extract of strain M55, is shown in Figure 10.

Figure 10 List of annotated features (known metabolites) for the cell pellet extract of strain M55

However, manual curation of a large data set can become very time-consuming and addresses only one aim of the work, the annotation of known metabolites, whereas the identification of unknown molecules remains unaddressed. Moreover, such work, performed individually on each measured extract, cannot provide comparative information, e.g. on unique metabolites potentially present in the whole data set. Clustering of highly similar MS2 spectra was used to reduce the total number of spectra while maintaining the number of unique spectra. The LC-MS/MS measurements from the cell pellet extracts of the 25 bacterial strains were subjected to molecular networking in GNPS, yielding a metabolite-level view of the data and of chemical structure similarity. The analysis yielded 1722 nodes representing unique spectra; each node represents a single chemical entity present in one or more bacterial strains. In addition to molecular networking, these nodes were subjected to dereplication by MS2 spectral matching against the online libraries available in GNPS; the dereplication was manually curated and combined with the annotations previously obtained from the in-house library. The 1722 nodes composing the global molecular network were grouped into 106 subnetwork families (with more than 2 nodes per cluster); we were able to identify members of 9 subnetworks (Figure 11).
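To make the idea of grouping nodes into subnetwork families more concrete, the sketch below links spectra whose cosine score exceeds a threshold and then collects connected groups of more than 2 nodes. It is a simplified illustration and not the GNPS implementation: spectra are represented as hypothetical dictionaries of integer m/z bins, a plain cosine score is used instead of the modified cosine and additional filters applied by GNPS, and the threshold of 0.7 is an assumption.

```python
import math
from itertools import combinations
from typing import Dict, List, Set

Spectrum = Dict[int, float]  # integer m/z bin -> intensity (simplified representation)


def cosine_score(a: Spectrum, b: Spectrum) -> float:
    """Plain cosine similarity between two binned MS/MS spectra."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def subnetwork_families(
    spectra: Dict[str, Spectrum],
    threshold: float = 0.7,  # assumed cut-off, not taken from the report
    min_nodes: int = 3,      # "more than 2 nodes per cluster"
) -> List[Set[str]]:
    """Connect nodes with cosine >= threshold and return the connected components."""
    adjacency: Dict[str, Set[str]] = {name: set() for name in spectra}
    for u, v in combinations(spectra, 2):
        if cosine_score(spectra[u], spectra[v]) >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen: Set[str] = set()
    families: List[Set[str]] = []
    for start in spectra:
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_nodes:
            families.append(component)
    return families


# Invented toy spectra: three closely related nodes and one unrelated node.
toy_spectra = {
    "a": {100: 1.0, 150: 0.5},
    "b": {100: 1.0, 150: 0.4},
    "c": {100: 0.9, 150: 0.5},
    "d": {300: 1.0},
}
toy_families = subnetwork_families(toy_spectra)
```

With these toy spectra, nodes a, b and c form a single family, while the unrelated node d remains a singleton and is not reported.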

Figure 11 A: Molecular network visualization of cell pellet extracts from 25 bacterial strains. Subnetworks containing annotated metabolites are boxed and their annotations are reported in B. C: Enlarged view of subnetwork 19b with annotated nodes. Nodes are colour-coded by origin (strain); nodes in blue represent chemical entities found in 2 or more strains.

Considering that microorganisms have evolved to survive in every niche on the planet, it can be postulated that the production of specialized chemistries has played a role in this adaptation; in particular, the specialization of primary and secondary metabolism can lead to the production of diverse molecules that are optimized for the producer’s niche. We therefore hypothesized that a metabolite (node) found infrequently is more likely to belong to a group of specialized chemistries, potentially indicating a novel metabolite. Following this hypothesis, we targeted molecular families produced by a single strain, with the aim of finding unknown metabolites. Colour-coding of nodes by origin (strain) made it possible to visualize chemical entities originating from a single strain. This analysis revealed several non-annotated nodes belonging to the same sample (strain M62) in a subnetwork family in which related compounds were annotated (#44 – choline derivatives). These nodes could not be dereplicated using the tools already discussed, which suggests a possible novel group of chemistries; the analysis of MS spectra from these nodes is still ongoing.
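The prioritisation logic described above can be sketched as a simple filter over the network nodes. This is a hedged illustration, not the procedure actually run at HZI: the `Node` structure, the `prioritise` function and the node identifiers are hypothetical, and the example records are invented (only the strain label M62 and family #44 echo the text).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    node_id: str
    family: int                        # subnetwork family of the node
    strains: frozenset                 # strains in which the feature was detected
    annotation: Optional[str] = None   # None means not dereplicated


def prioritise(nodes):
    """Return, per (family, strain), the non-annotated nodes detected in that
    single strain only, restricted to families that also contain at least one
    annotated node."""
    annotated_families = {n.family for n in nodes if n.annotation}
    hits = defaultdict(list)
    for n in nodes:
        single_strain = len(n.strains) == 1
        if n.annotation is None and single_strain and n.family in annotated_families:
            (strain,) = n.strains
            hits[(n.family, strain)].append(n.node_id)
    return {key: sorted(ids) for key, ids in hits.items()}


# Invented example records for illustration only.
nodes = [
    Node("n1", 44, frozenset({"M55", "M62"}), "choline derivative"),
    Node("n2", 44, frozenset({"M62"})),
    Node("n3", 44, frozenset({"M62"})),
    Node("n4", 44, frozenset({"M55", "M62"})),   # shared by two strains, skipped
    Node("n5", 7, frozenset({"M55"})),           # family without annotations, skipped
]
candidates = prioritise(nodes)
```

Restricting the search to families that already contain an annotated member gives a chemical context for the unknown nodes; dropping that restriction would instead surface subnetworks made up entirely of non-annotated nodes from one strain.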

Moreover, the molecular network visualization also showed a subnetwork belonging to a single strain and containing non-annotated nodes, pointing to potentially novel compounds. The analysis of these nodes is currently ongoing at HZI.

6.4 Significance of outputs

• The designed media (formulae in Annexe 1) enabled the isolation of 264 species of rarely isolated bacteria.
• Flow cytometry and biofilm technologies were used to select, from the samples, the marine organisms with the greatest potential as candidates for biomass production.
• A pulse delivery method to isolate cells with specific properties, involving Multiple Displacement Amplification (MDA), was tried and tested and proved invaluable. This led to new cultivation media that augment the diversity of strains for the pipeline. The technique also allows cells with specific properties to be targeted according to user needs.
• A combined cosmid and BAC library heterologous expression approach was developed.
• Genome-informed use of isotopes to track new metabolites.
• USTAN generated cosmid libraries for a series of more tractable and sequenced test strains. Biosynthetic gene cluster analysis prioritized 9 clusters according to novelty. Cosmids containing them were taken into strains of Streptomyces coelicolor and Streptomyces lividans for heterologous expression. Fermentation broth extracts revealed heterologously produced metabolites.
• Larger biosynthetic gene clusters were accessed via a complementary BAC library approach, removing the need for cluster stitching prior to heterologous expression.
• The original bottleneck of obtaining sufficient DNA for analysis was overcome, shifting the bottlenecks to:
- Expression level: addressed by the introduction of promoters, the use of different heterologous hosts and the use of small-molecule elicitors.
- Metabolite dereplication and assignment: addressed with cluster analysis of LC-MS/MS data, aiding the identification of series of closely related compounds.
- Cyclic peptides that are difficult to characterize: new approaches explored by pre-incubation with an enzyme.
• HZI has optimized analysis for different metabolite classes, e.g. from short-chain fatty acids to larger polar and nonpolar compounds of primary and secondary metabolism. Lessons learned from the metabolome analysis of microalgae in WP7 accelerate their application to the bacteria.
• Use of metabolite clustering analysis, with a tool made available as an R package, to assist in the identification of novel suites of compounds.
• Use of the molecular networking tool from GNPS, combined with manual dereplication and matching with the in-house library, allowed faster and more efficient annotation of known metabolites. The visualization of the chemical space at the molecular level gave insights into the structural relationships between chemical entities in the whole data set, pointing towards potential novel compounds (unknown metabolites).
• Three interesting fully sequenced strains from the Institut Pasteur were examined to test the WP6 prototype pipeline. One strain is producing interesting compounds; extracts are being provided to HZI for further analysis.
- Psychrobacter glacincola CIP 105313T; 1997, ACAM <- D. Nichols, Tasmania Univ., Australia; shelf sea ice
- Psychrobacter maritimus CIP 108811T; 2005, DSMZ <- L. A. Romanenko: strain Pi 2-20; sea-ice sample
- Gillisia hiemivivida CIP 108528T; 2004, J. P. Bowman, Tasmania, Australia: strain IC154; sea-ice algal assemblage

7. EMBRIC pathways to discovery

The European Marine Biological Research Infrastructure Cluster (EMBRIC) has brought together six Research Infrastructures from the biological sciences to promote the use of marine resources in research and development of biological products.

• EMBRC - European Marine Biological Resource Centre: www.embrc.eu/
• MIRRI - Microbial Resource Research Infrastructure: www.mirri.org/
• EU-OPENSCREEN - European Infrastructure of Open Screening Platforms for Chemical Biology: www.eu-openscreen.eu/
• ELIXIR - A distributed infrastructure for life-science information: https://www.elixir-europe.org/
• AQUAEXCEL - Aquaculture infrastructures for excellence in European fish research: www.eatip.eu
• RISIS - Research Infrastructure for research and information policy studies: http://risis.eu/

The microbial prototype pipeline described here has demonstrated a tried and tested route through some of the facilities of the partners of these infrastructures, leading from marine samples to compounds. EMBRIC offers researchers many routes to facilitate and accelerate the discovery of potential new biological products. Each participating Research Infrastructure (RI) and its associated centres of excellence provide access to their facilities, expertise and technologies. A researcher can select the centres that offer technologies they do not have access to, including harmonised multidisciplinary workflows that will support their work, thus initiating joint activities to overcome obstacles and bottlenecks. Numerous routes may be taken in the discovery process, and EMBRIC has demonstrated how working with the biological and medical science research infrastructures can aid in the selection of the most appropriate screening process, thereby accelerating discovery.

The first step is to select the location to sample, followed by the prioritisation of organisms and selective sampling to enrich for organisms with the desired properties. The oceans are vast and their potential enormous, but the organisms are frequently slow growing, so critical decisions should be made to use time and resources effectively. A typical workflow begins at the source, where there are two options: i) isolation directly from the environment; or ii) selection of organisms already isolated and stored in ex situ biological resource collections, such as the microbial domain Biological Resource Centres (mBRC) coordinated by MIRRI and EMBRC. MIRRI brings together 31 international mBRCs which together hold more than 330,000 microorganisms, ranging from bacteria (including cyanobacteria) to yeasts, filamentous fungi and algae. The marine resources from these collections are listed in EMBRIC Deliverable D3.1 but can also be accessed via the EMBRC (http://www.embrc.eu/) and MIRRI (https://www.mirri.org/home.html) websites. There are also resources outside these Research Infrastructures; see the World Data Centre for Microorganisms (WDCM), http://www.wdcm.org/.

The first question for a user to consider is: how do researchers identify the potential that exists to be exploited? In some respects, identifying potential is easier if the organism has been isolated and fully characterised. Some of these data may be available in online collection catalogues or databases, but often the collection itself should be contacted, or it may be necessary to search literature databases to discover which microorganisms may have the desired properties; EMBRIC, particularly MIRRI and EMBRC, can help with this selection. However, the question remains: how does one identify the potential of organisms in the environment that may be difficult to isolate and maintain in the laboratory? How can the most likely candidates be selected in situ, so that investment is targeted at delivering new and useful properties? EMBRIC has begun to map the mechanisms currently available to help facilitate such choices. The elements identified for consideration, from source to active molecules, are as follows:

• Identifying the potential of organisms in nature and collections

• Targeting the organisms and selection of the sampling regime

• Identifying collection targets, the organisms, environment and location

• Screening environmental consortia

• Isolating the organisms

• Identifying the organisms, in axenic mono-culture

• Appropriate characterisation, for example through genomic DNA analysis using NGS approaches – e.g. total sample DNA for the yet uncultured

• Selection and application of characterisation technology

• Data analysis and use

• Scaling up

• Extraction

• Purification and delivery of chemically defined compounds

• Compliance with the regulatory environment

EMBRIC Deliverable D3.1 Map of centres of expertise and best practices provides detailed coverage of the EMBRIC cluster (http://www.embric.eu/deliverables). It describes a web-based tool to provide relevant information and links (http://www.embric.eu/node/121). The deliverable outlines the identification, networking and integration of existing capacities (technologies, knowledge and skills) both within and outside the partnering RIs. Table 7 (below) describes some of the technologies available across EMBRIC and their application. However, there is an additional advantage to be gained by utilising the EMBRIC cluster. As the deliverable describes, by pooling expertise a critical mass can be focused on specific user problems, such as the development of dedicated isolation protocols (e.g. selective media to access the microorganisms that cannot yet be cultivated), thus taking advantage of the full potential of the combined RIs. By accessing the research infrastructures, the loss of resources and time through rediscovery of known and already patented compounds can be avoided (i.e. dereplication). Additionally, this approach ensures compliance with regulations, ensuring that permissions to collect are secured where needed and legal obligations are met. As EMBRIC moved forward, it has demonstrated in its three pipelines (this one on microorganisms, focussing on bacteria, one on microalgae and a third on finfish) that platforms and gateways to technologies and collaborations generate new knowledge and access new compounds.

Table 7: Some of the technologies available across EMBRIC and their uses (extracted from Deliverable 3.1)

| Technology | Examples of where these are available | Specific use |
|---|---|---|
| Antibodies for carbohydrates | See EU-OPENSCREEN Screening and Chemistry Centres | Antibodies to carbohydrate antigens are critical for the study of bacteria cell-cell adhesion interactions; for the analysis of viral, hormone, and toxin receptors; analysis of the glycosylation of recombinant proteins |
| Assay technology, Spectrophotometry, GFP / fluorescence methods | EU-OPENSCREEN | High-throughput screening for early stage discovery |
| RT² Profiler PCR Arrays for gene expression analysis of targeted cell death, inflammation and antioxidant pathways | EMBRC, e.g. SZN | Identification of signalling pathways targeted by marine natural products for their potential therapeutic applications (e.g. anticancer, anti-inflammatory, anti-neurodegenerative) |
| Chemical probes | EU-OPENSCREEN e.g. HZI - Helmholtz Centre for Infection Research; http://www.chemicalprobes.org/ | To establish the relationship between a molecular target with which it interacts and the broader biological consequences of modulating the target in cells or organisms |
| CNMR - Carbon-13 Nuclear Magnetic Resonance | EMBRIC e.g. USTAN - University of St. Andrews | Allows the identification of carbon atoms in an organic molecule |
| Cosmid/BAC/PAC library generation | EMBRIC e.g. USTAN | To identify candidate genes stemming from functional traits |
| GC-MS - Gas Chromatography Mass Spectrometry | See EU-OPENSCREEN Screening and Chemistry Centres | Crude extracts, environmental samples (solvent extracted and treated), complex mixtures of chemicals may be separated, identified and quantified |
| Generation of novel vectors for gene sequence to protein | EMBRIC e.g. USTAN | The identification of novel transcripts |
| HNMR - High Resolution Nuclear Magnetic Resonance | EMBRIC e.g. USTAN | Chemical structure determination |
| HPLC - High Performance Liquid Chromatography | | Used to separate, identify, and quantify each component in a mixture |
| HSQC - Heteronuclear Single Quantum Coherence | EMBRC e.g. USTAN | NMR spectroscopy of organic molecules and is of particular significance in the field of protein NMR. Used to determine chemical structure |
| Identification of the biosynthetic cluster encoding | EMBRIC e.g. USTAN | Target novel activities |
| LC-MS - Liquid chromatography–mass spectrometry | | Differentiate chemical mixtures on the basis of mass |
| Mode of action analysis for isolated and structurally characterised compounds | EU-OPENSCREEN e.g. HZI | Help researchers and technologists to identify potential |
| Morphological keys and intelligent software, Lucid | | Identification of organisms |
| Mutant generation and sequencing | EU-OPENSCREEN e.g. HZI | Modification of expression to generate products |
| NOESY - Nuclear Overhauser Effect Spectroscopy | EMBRIC e.g. USTAN | Characterizing and refining organic chemical structures |
| Phenotypic assays | EU-OPENSCREEN e.g. HZI | Targeting potential source organisms |
| Peptide arrays | EU-OPENSCREEN e.g. HZI | Enzyme profiling |
| Programme to unlock silent clusters | EMBRIC e.g. USTAN | Unlocking the potential of bacterial gene clusters to discover new antibiotics, e.g. Seyedsayamdost, M. R. (2014) |
| Sanger sequencing of known regions, e.g. Cox, ITS, 16S, 18S | MIRRI e.g. CABI, DSMZ | Bar-coding; identification of organisms |
| NGS | Numerous sites within EMBRC, ELIXIR and MIRRI | Sequence and annotate genomes at a much faster rate; study variation, expression and DNA binding at a genome-wide level |
| Raman Spectroscopy | | Measuring the chemical identity and structure of materials |
| MALDI-TOF Mass Spectrometry | Numerous sites within EMBRC, ELIXIR and MIRRI | Matrix-assisted laser desorption/ionization (MALDI) is a soft ionization technique used in mass spectrometry, allowing the analysis of biomolecules (biopolymers such as DNA, proteins, peptides and sugars) and large organic molecules (such as polymers, dendrimers and other macromolecules); also used for strain definition |
| Mobilization and integration of data on all characteristics of individual microorganisms using controlled vocabulary | DSMZ (BacDive) | One stop shop - analysis of all existing phenotypic, biochemical and molecular information of target microorganisms; extensive search functions across large numbers of different taxa for desired properties |

The microbial prototype pipeline has explored ways to integrate and mine metabolomic, genomic and metagenomic data. The technologies available make it possible to analyse samples before isolation and to identify the cells that may be of interest by challenging them in the sample with substrates and isolating the individual cells that show activity. Organisms that are normally difficult to isolate from marine samples have been grown and characterised, and interesting compounds have been identified. A prototype pipeline using expertise and facilities in three European Research Infrastructures demonstrates the value to researchers of accessing facilities not normally available to them. In May 2016, over 900 samples from the Pacific Ocean were collected; many responded positively to substrate challenges, particularly to cysteine, isovaleric acid, spermidine and Tween. The candidate strains isolated were made available for compound discovery and were used to test the original workflows of the pipeline. This resulted in a revised pipeline to identify and access the strains of greatest potential (see Fig. 9).

Figure 9. Revised pipeline following initial testing

As reported above, USTAN generated cosmid libraries for a series of more tractable and sequenced test strains. Biosynthetic gene cluster analysis prioritised 9 clusters according to novelty. Cosmids containing them were incorporated into strains of Streptomyces coelicolor and Streptomyces lividans for heterologous expression. Fermentation broth extracts revealed heterologously produced metabolites. Larger biosynthetic gene clusters were accessed via a complementary BAC library approach, removing the need for cluster stitching prior to heterologous expression. The methodologies are shared in this deliverable to enable researchers to follow similar approaches, but it is undoubtedly more efficient to involve the appropriate microbial domain Biological Resource Centres in the study. The workflow in the microbial prototype pipeline tested here continues with difficult-to-culture microbes sequenced at DSMZ, selection of the most interesting strains for cosmid and BAC library generation, and heterologous expression by USTAN. Many such routes can be envisaged, using different technologies and expertise at the many different centres within the EMBRIC RI cluster, depending on the end products or properties being sought. Once the potential activities and compounds have been identified, extracts are analysed and compounds isolated by HZI. Again, methodologies may differ; HZI extracts different metabolite classes, e.g. from short-chain fatty acids to larger polar and nonpolar compounds of primary and secondary metabolism.

EU-OPENSCREEN (http://www.eu-openscreen.eu/) is a distributed RI which integrates high-capacity screening platforms throughout Europe. Each can be engaged to help in compound analysis and characterization. They jointly use a rationally selected compound collection, comprising up to 140,000 commercial and proprietary compounds collected from European chemists. EU-OPENSCREEN offers researchers from academic institutions, SMEs and industrial organisations open access to its shared resources. EU-OPENSCREEN will collaboratively develop novel molecular tool compounds with external users from various disciplines of the life sciences. Active compound discovery, isolation and characterization are not straightforward, and various hurdles and bottlenecks present themselves. The EMBRIC partners work together to resolve or circumvent these bottlenecks, helping researchers to accelerate their route to discovery (see Fig. 10).

Figure 10. Bottlenecks in the microbial prototype pipeline

Figure 11. Overview of routes to discovery. For each stage of the microbial pipeline (sourcing organisms; cultivation and biomass production; determination of organism potential; extract production and exploration; rapid identification of new compounds ensuring mitigation against wasted resource on compound rediscovery; purification of characterised compounds; protection of investment and future use), the figure lists the source Research Infrastructure, the prototype partners, the work carried out, the best practice and the legal framework.

Once fully operational, the microbial prototype pipeline was tested with 3 interesting fully sequenced strains from the Institut Pasteur. These strains were isolated from sea ice and were considered to have the potential to yield new compounds for exploitation:

• Psychrobacter glacincola CIP 105313T; 1997, ACAM <- D. Nichols, Tasmania University, Australia; isolated from shelf sea ice
• Psychrobacter maritimus CIP 108811T; 2005, DSMZ <- L. A. Romanenko: strain Pi 2-20; sea-ice sample
• Gillisia hiemivivida CIP 108528T; 2004, J. P. Bowman, Tasmania, Australia: strain IC154; sea-ice algal assemblage

The isolates entered the pipeline at the characterisation level with full genome sequences and went to USTAN, where a cosmid library was generated, heterologous expression was carried out and the genomes were mined for novel biosynthetic gene clusters. One strain was found to produce interesting compounds; extracts were provided to HZI for further analysis. Given the comprehensive capacities of the EMBRIC partners, many more routes to discovery are possible via various RI nodes or partner institutions (Brennecke et al. 2018; Smith et al. 2018). Figure 11 provides an overview of this potential. The samples collected from the Pacific entered the prototype pipeline established by EMBRIC at DSMZ, USTAN and HZI. Here the focus was on difficult-to-isolate bacteria and on DNA sequencing to discover biosynthetic genes, in the hope of discovering bioactive metabolites. The starting point and ultimate goals of a screening programme might be quite different, but not beyond the expertise and facilities of EMBRIC and its component RIs.

Figure 11 outlines the EMBRIC microorganism prototype pipeline through the following stepwise process:

• Cultivation and biomass production
• Determination of organism potential
• Extract production and exploration
• Rapid identification of new compounds ensuring mitigation against wasted resource on compound rediscovery
• Purification of characterised compounds
• Protection of investment and future use

Microorganisms are a group that includes unicellular and filamentous organisms: Archaea and Bacteria (prokaryotes), and yeasts, filamentous fungi, microalgae and unicellular protists and protozoans (eukaryotes). The EMBRIC partner with the most appropriate expertise would be chosen to help isolate and grow the organisms to create the necessary biomass. The researcher or bioprospecting company would then choose which technologies they need to discover the molecules they are looking for. In the majority of cases genome technologies would be utilised, and these are available not only in EMBRIC via MIRRI, EMBRC and ELIXIR but also from many other sources outside EMBRIC. Once properties are discovered, the molecules of interest need to be extracted, purified and characterized. Again, appropriate technologies can be selected from the EMBRIC partners, bridging potential gaps in the researcher’s facilities and presenting new and different approaches to ease the path to discovery.

8. Summary

The overarching objective of work package 6 was to demonstrate that the components of the pipeline from the microorganisms in the environment to the active molecules and products on the market (taking relevant input from the work of EMBRIC WPs 2-5) were effective. This required the development of coherent chains of high-quality services for access to biological, analytical and data resources, and the deployment of common underpinning technologies and practices for the route to useful secondary metabolites from marine bacteria. A prototype pipeline to demonstrate the cross-infrastructure possibilities brought together the expertise and facilities of single partners from the MIRRI, EMBRC and EU-OPENSCREEN research infrastructures. Although EMBRIC (specifically, EMBRC and MIRRI) covers most types of microorganism, bacteria were chosen as the starting point for the microorganism prototype pipeline because they are key components of the marine environment. They perform a wide range of biogeochemical and ecological functions, yet we know very little about them. It is estimated that less than 1 percent have been seen in culture, and the vast potential remains locked away. Population genomics has provided a picture of what might be there and an idea of the chemistry they may perform. EMBRIC’s microorganism prototype pipeline demonstrates how this potential can be unlocked. It uses the specialist expertise and facilities of key laboratories, and targets selected organisms that are difficult to cultivate or have yet to be grown. DSMZ recovered strains from samples taken from the Pacific Ocean, and these were characterised and prepared for scale-up and production of active compounds. DSMZ determined optimal growth conditions for some of these organisms, which were sequenced so that bacterial strains with novel properties could be targeted and studied further.
In a single experiment to develop this microbial prototype pipeline, 264 rarely isolated species of bacteria were made available for the study of their bioactive compounds. This demonstrates that coordinated and targeted isolation programmes, engaging research teams from the many partners in the Research Infrastructures and orchestrated for specific bioindustry needs, could yield many thousands of candidate organisms. The potential for discovery of new and interesting bioactive compounds is thus greatly increased. USTAN produced cosmid libraries and detected 9 gene clusters and a series of compounds. Improvements to the process were made using elicitors and promoters, and some difficult cyclic peptides were characterised. HZI improved the efficiency of different extraction and characterization techniques for use on different types of organisms, and as a result isolates producing interesting compounds were discovered. The application of molecular networking, combined with manual dereplication and matching with the HZI in-house library, not only led to the annotation of several known metabolites but also enabled untargeted strain prioritization. As a result, one extract from a DSMZ strain producing potential new metabolites is being studied further. This approach allowed effective visualization of the structural relationships between chemical entities in the whole data set, pointing towards potential novel compounds (unknown metabolites).

The microorganism prototype pipeline has resulted in a flexible system for accessing expertise and facilities across Research Infrastructures to meet specific, user-defined demands of the research community. Such research must be performed in compliance with the regulatory environment. In the interests of scientific progress, microbiologists must be able to exchange the organisms on which their hypotheses and results are based, but they must do so in a way that presents minimum risk to those who come into contact with the organism. They must not fall foul of the laws that control the shipping of microorganisms, as this will inevitably result in even more restrictive legislation that will make their exchange impossible. Legislation on health and safety, packaging and shipping, and controlled distribution may be extensive and sometimes cumbersome, but it is there to protect us and must be followed.

9. References and bibliography

Adler J (1973) A method for measuring chemotaxis and use of the method to determine optimum conditions for chemotaxis by Escherichia coli. J Gen Microbiol 74: 77–91.
Anon (1994) Approved Code of Practice for Biological Agents 1994. Health and Safety Executive. Sudbury: HSE Books.
Anon (1996a) Categorisation of pathogens according to hazard and categories of containment. Fourth edition. Advisory Committee on Dangerous Pathogens (ACDP). London: HMSO.
Anon (1996b) European Standard EN 829:1996 E: Transport packages for medical and biological specimens, Requirements, tests. Brussels: CEN, European Committee for Standardisation.
Brennecke P, Ferrante MI, Johnston IA and Smith D (2018) A collaborative European approach to accelerating translational marine science. Springer's Marine Biotechnology Journal. In press.
Bruns A, Hoffelner H and Overmann J (2003) A novel approach for high throughput cultivation assays and the isolation of planktonic bacteria. FEMS Microbiol Ecol 45: 161–171.
Camarinha-Silva A, Jauregui R, Chaves-Moreno D, Oxley AP, Schaumburg F, Becker K et al. (2014) Comparing the anterior nare bacterial community of two discrete human populations using Illumina amplicon sequencing. Environ Microbiol 16: 2939–2952.
Connon SA and Giovannoni SJ (2002) High-throughput methods for culturing microorganisms in very-low-nutrient media yield diverse new marine isolates. Appl Environ Microbiol 68: 3878–3885.
Davison A, Brebandere J de and Smith D (1998) Microbes, collections and the MOSAICC approach. Microbiology Australia 19(1): 36–37.
Depke T, Franke R and Brönstrup M (2017) J Chromatogr B Analyt Technol Biomed Life Sci 1071: 19–28.
Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460–2461.
EEC Directive 93/88/EEC. Protection of workers from risks related to biological agents.
Fröstl JM and Overmann J (1998) Physiology and tactic response of the phototrophic consortium "Chlorochromatium aggregatum". Arch Microbiol 169: 129–135.
Galkiewicz JP and Kellogg CA (2008) Cross-kingdom amplification using bacteria-specific primers: complications for studies of coral microbial ecology. Appl Environ Microbiol 74: 7828–7831.
Gich F, Janys MA, König M and Overmann J (2012) Enrichment of previously uncultured bacteria from natural complex communities by adhesion to solid surfaces. Environ Microbiol 14: 2984–2997.
González-Menéndez V, Asensio F, Moreno C, de Pedro N, Monteiro MC, de la Cruz M et al. (2014) Assessing the effects of adsorptive polymeric resin additions on fungal secondary metabolite chemical diversity. Mycology 5: 179–191.
IATA - International Air Transport Association (2002) Dangerous Goods Regulations. 43rd edition. Montreal; Geneva: IATA.
Jaspers E (2000) "Zur ökologischen Bedeutung der Diversität planktischer Bakterien: Erkenntnisse aus der Analyse von Reinkulturen." Ph.D. Dissertation, University of Oldenburg.
Kjelleberg S, Humphrey BA and Marshall KC (1982) Effect of interfaces on small, starved bacteria. Appl Environ Microbiol 43: 1166–1172.
Lane DJ (1991) 16S/23S rRNA sequencing. In: Stackebrandt E, Goodfellow M (eds.), Nucleic Acid Techniques in Bacterial Systematics. John Wiley and Sons, New York, pp. 115–175.

Deliverable D6.1 EMBRIC showcases: prototype pipelines from the microorganism to product discovery Page 72 of 80 Deliverable D6.1 EMBRIC (Grant Agreement No. 654008)

Muyzer G, de Waal EC and Uitterlinden AG (1993) Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Appl Environ Microbiol 59: 695–700.
Overmann J (2005) Chemotaxis and behavioural physiology of not-yet-cultivated microbes. Methods Enzymol 397: 133–147.
Overmann J (2013) Principles of enrichment, isolation, cultivation, and preservation of prokaryotes. In: The Prokaryotes (pp. 149–207). Springer Berlin Heidelberg.
Overmann J, Abt B and Sikorski J (2017) Presence and future of culturing bacteria. Annu Rev Microbiol 71: 711–730.
Pascual J, Wüst PK, Geppert A, Foesel BU, Huber KJ and Overmann J (2015) Novel isolates double the number of chemotrophic species and allow the first description of higher taxa in Acidobacteria subdivision 4. Syst Appl Microbiol 38: 534–544.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P et al. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 41: D590–596.
Seyedsayamdost MR (2014) High-throughput platform for the discovery of elicitors of silent bacterial gene clusters. Proc Natl Acad Sci. http://www.pnas.org/content/111/20/7266.abstract
Smith D, Buddie AG, Goss R, Overmann J, Lepleux C, Brönstrup M, Kloareg B, Meiners T, Brennecke P, Ianora A, Bouget F-Y, Gribbon P and Pina M (2018) Marine resources for discovery pipelines – an ocean of opportunity for biotechnology? Biotechnology and Bioprocess Engineering. In press.

Useful websites

World Federation for Culture Collections: http://www.wfcc.info/
World Data Centre for Micro-organisms: www.wdcm.org/
ACM, Asian Consortium for the Conservation and Sustainable Use of Micro-organisms: www.acm-mrc.asia/
ECCO, European Culture Collection Organisation: http://www.eccosite.org
European Commission DGVII – Transport: https://ec.europa.eu/transport/about-us_en
Food and Agriculture Organization (FAO): http://www.fao.org/home/en/
World Animal Health Organization (OIE): http://www.oie.int/eng/en_index.htm
International Plant Protection Convention (IPPC): https://www.ippc.int/en/
The Australia Group: http://www.australiagroup.net/
Biological Weapons Convention (BWC): https://www.un.org/disarmament/wmd/bio/
MIRCEN, global network of Microbial Resources Centres: http://www.ejbiotechnology.info/content/mircen/index.html
WIPO, World Intellectual Property Organization: http://www.wipo.int
ISO, International Organization for Standardization: https://www.iso.org/home.html
International Air Transport Association: http://www.iata.org/Pages/default.aspx; Dangerous Goods Regulations: http://www.iata.org/publications/dgr/pages/index.aspx
International Civil Aviation Organization (ICAO): https://www.icao.int/
Pipeline and Hazardous Materials Safety Administration (page on ICAO): https://www.phmsa.dot.gov/international-program/international-civil-aviation-organization
Maritime Law: https://www.investopedia.com/terms/m/maritime-law.asp; https://www.thegef.org/partners/conventions

Useful bibliography

Anon (1994) Approved Code of Practice for Biological Agents 1994. Health and Safety Executive. Sudbury: HSE Books.


Anon (1996b) European Standard EN 829:1996 E: Transport packages for medical and biological specimens, Requirements, tests. Brussels: CEN, European Committee for Standardisation.
Cartagena Protocol on Biosafety to the Convention on Biological Diversity. https://bch.cbd.int/protocol
EC Council Directive 2000/29/EC on protective measures against the introduction into the Member States of harmful organisms of plant or plant products. OJ No. L 169, p. 1 of 10.07.2000.
EC Council Regulation 1504/2004 amending and updating Regulation 1334/2000.
EC Council Directive 95/44/EC on establishing the conditions under which certain harmful organisms, plants, plant products and other objects listed in Annexes I to V to Council Directive 77/93/EEC may be introduced into or moved within the Community or certain protected zones thereof, for trial or scientific purposes and for work on varietal selections.
EC Council Directives 90/219/EEC and 98/81/EC on contained use of genetically modified organisms.
EC Regulation 1946/2003 on the transboundary movement of genetically modified organisms (pertains to the Cartagena Protocol on Biosafety).
EC Council Directive 2000/54/EC on the protection of workers from risks related to exposure to biological agents at work. OJ No. L 262, pp. 21–45 of 18.09.2000.
EN 1619:1996 Biotechnology – large-scale process and production – General requirements for management and organisation for strain conservation procedures.
EC Council Regulation No 1334/2000 of 22 June 2000 setting up a Community regime for the control of exports of dual-use items and technology. OJ No L 159 of 30.6.2000 (amended by EC Council Regulation 149/2003 of 27 January 2003, OJ L 30 of 05.02.2003, corrigendum OJ L 52 of 27.02.2003).
IATA - International Air Transport Association (2005) Dangerous Goods Regulations. 47th edition. Montreal; Geneva: IATA.
ISO 17025:2005, General requirements for the competence of testing and calibration laboratories.
ISO 7218:2000, Microbiology of food and animal feeding stuffs. General rules for microbiological examinations.
ISO 9001:2000, Quality Management Systems – Requirements.
Kirsop BE and Doyle A (eds) (1991) Maintenance of Microorganisms and Cultured Cells: A Manual of Laboratory Methods. London: Academic Press.
OECD (2001) Biological Resource Centres. Underpinning the future of life sciences and biotechnology.
OECD (2006) Creation and Governance of Human Genetic Research Databases.
Smith D and Rohde C (2002) The implication of the biological and toxin weapons convention and other related initiatives for WFCC members. WFCC Newsletter 34: 4–11.


Technical Instructions for the Safe Transport of Dangerous Goods by Air. Doc 9284-AN/905. Council of ICAO, International Civil Aviation Organisation.
United Kingdom National Culture Collection (1998) Quality manual. UKNCC Secretariat, CABI Bioscience UK Centre, Egham, UK.
Universal Postal Convention, Compendium of Information, Bern (International Bureau), Universal Postal Union, Beijing, 2000.
World Federation for Culture Collections (1999) Guidelines for the establishment and operation of collections of cultures of micro-organisms (2nd ed.). UK: WFCC Secretariat.
World Health Organization (2004) Laboratory Biosafety Manual, Third Edition, English. Geneva: WHO, Non-serial Publication. ISBN 92 4 154650 6.


Annexe 1. Additional information on media formulae used at DSMZ

A) Medium "Artificial Sea Water (ASW)/HD 1:10"
   Peptone                              0.5 g
   Glucose                              0.1 g
   Yeast extract                        0.25 g
   ASW (a)                              1000.00 mL
   Adjust pH to 7.3.
   Add to 1000 mL of medium after autoclaving:
   Trace element solution SL-10 (b)     1.00 mL
   Vitamin solution (c)                 1.00 mL

B) Medium "polymer mix"
   Peptone                              1.0 g
   Chitin                               1.0 g
   Cellulose                            1.0 g
   Xylan                                1.0 g
   Curdlan                              1.0 g
   Basal medium (a)                     1000.00 mL
   Adjust pH to 7.3.
   Add to 1000 mL of medium after autoclaving:
   Trace element solution (b)           1.00 mL
   Vitamin solution (c)                 1.00 mL

C) Medium "insoluble humic analogs"
   Abietic acid                         500 µM
   Quercetin                            500 µM
   Coumestrol                           500 µM
   Methyl cinnamate                     500 µM
   ASW (a)                              1000.00 mL
   Adjust pH to 7.3.
   Add to 1000 mL of medium after autoclaving:
   Trace element solution SL-10 (b)     1.00 mL
   Vitamin solution (c)                 1.00 mL

D) Medium "soluble humic analogs"
   Salicylate                           500 µM
   Phthalic acid                        500 µM
   AQDS                                 500 µM
   Furfural                             500 µM
   Hydroxymethylfurfural                500 µM
   Lignosulfonate                       500 µM
   Basal medium (a)                     1000.00 mL
   Adjust pH to 7.3.
   Add to 1000 mL of medium after autoclaving:
   Trace element solution SL-10 (b)     1.00 mL
   Vitamin solution (c)                 1.00 mL

E) Medium "Soil Extract (SE)/HD 1:10"
   Peptone                              0.50 g
   Yeast extract                        0.25 g
   Glucose                              0.10 g
   4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES)   10 mM
   Soil extract medium (d)              1000 mL
   Add to 1000 mL of medium after autoclaving:
   Trace element solution SL-10 (b)     1.00 mL
   Vitamin solution (c)                 1.00 mL
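All formulae above are given per 1000 mL of medium. As a minimal sketch of how such a recipe could be scaled to another batch volume, the snippet below encodes medium A and multiplies every amount by the volume ratio. The names `Component`, `ASW_HD_1_10` and `scale_recipe` are illustrative assumptions and are not part of the DSMZ protocol; the 250 mL batch is only an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    amount: float  # amount per 1000 mL of medium
    unit: str


# Medium A (ASW/HD 1:10) as listed in this annex, per 1000 mL.
# The last two entries are added after autoclaving.
ASW_HD_1_10 = [
    Component("Peptone", 0.5, "g"),
    Component("Glucose", 0.1, "g"),
    Component("Yeast extract", 0.25, "g"),
    Component("ASW", 1000.0, "mL"),
    Component("Trace element solution SL-10", 1.0, "mL"),
    Component("Vitamin solution", 1.0, "mL"),
]


def scale_recipe(recipe, batch_volume_ml, reference_volume_ml=1000.0):
    """Return a copy of the recipe scaled linearly to batch_volume_ml."""
    if batch_volume_ml <= 0:
        raise ValueError("batch volume must be positive")
    factor = batch_volume_ml / reference_volume_ml
    return [Component(c.name, c.amount * factor, c.unit) for c in recipe]


if __name__ == "__main__":
    # Hypothetical 250 mL batch, chosen only for illustration.
    for c in scale_recipe(ASW_HD_1_10, 250.0):
        print(f"{c.name:<32s}{c.amount:10.3f} {c.unit}")
```

Linear scaling only covers the weighed and pipetted amounts; the pH adjustment and the order of addition (supplements after autoclaving) still follow the recipe text.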

(a) Depending on the salinity of the seawater samples, media were based either on artificial sea water (ASW, modified from Bruns et al., 2003) or on artificial brackish water (ABW).

• Artificial Sea Water media (ASW; modified from Bruns et al., 2003)
   NaCl                                 23.6 g
   MgCl2·7H2O                           4.53 g
   CaCl2·2H2O                           1.3 g
   KCl                                  0.64 g
   MgSO4·7H2O                           5.94 g
   Na2HPO4                              0.01 g
   NH4NO3                               2.1 mg
   4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES)   2.3 g
   Distilled water                      1000.00 mL

• Artificial Brackish Water (ABW)
   NaCl                                 5.53 g
   MgCl2·7H2O                           2.76 g
   CaCl2·2H2O                           0.33 g
   KCl                                  0.15 g
   Na2SO4                               0.90 g
   NaHCO3                               0.20 g
   KBr                                  22.8 mg
   H3BO3                                5.7 mg
   SrCl2                                5.4 mg
   NH4Cl                                4.9 mg
   KH2PO4                               1.2 mg
   NaF                                  0.7 mg
   HEPES                                2.38 g
   Distilled water                      1000.00 mL

(b) Trace element solution
   HCl (25%; 7.7 M)                     10 mL
   FeCl2·4H2O                           1.50 g
   ZnCl2                                70.0 mg
   MnCl2·4H2O                           100.0 mg
   H3BO3                                6.0 mg
   CoCl2·6H2O                           190.0 mg
   CuCl2·2H2O                           2.0 mg
   NiCl2·6H2O                           24.0 mg
   Na2MoO4·2H2O                         36.0 mg
   Distilled water                      990 mL
   First dissolve FeCl2 in HCl, then dilute with water, and add and dissolve the other salts. Finally make up to 1000 mL.

(c) Vitamin solution
   Biotin                               2.0 mg
   Folic acid                           2.0 mg
   Pyridoxine-HCl                       10.0 mg
   Thiamine-HCl·2H2O                    5.0 mg
   Riboflavin                           5.0 mg
   Nicotinic acid                       5.0 mg
   D-Ca-pantothenate                    5.0 mg
   Vitamin B12                          0.10 mg
   p-Aminobenzoic acid                  5.0 mg
   Lipoic acid                          5.0 mg
   Distilled water                      1000 mL

(d) Soil extract medium (SE)
The SE medium was prepared following an established procedure (DSM medium 12; http://www.dsmz.de/), using the sediment samples. After autoclaving and centrifugation, the resulting supernatant was filtered through a 0.22-µm filter and then autoclaved a second time.

(e) Chemoattractants used in the chemotaxis experiments
• Tween 0.001%
• DMSO 1%
• Mix sugars I (trehalose, cellobiose, maltose) 2 mM each
• Mix sugars II (gentiobiose, sucrose) 2 mM each
• Mix sugars III (N-acetylglucosamine, mannitol, rhamnose) 2 mM each
• KH2PO4 2 mM
• 20 amino acids 2 mM
• Fatty acid mix (formate, acetate, valerate, propionate, butyrate) 2 mM each
• TCA mix (lactate, succinate, citrate, pyruvate, oxaloacetate, α-ketoglutarate) 2 mM each
• Nitrogen compounds (NH4+, TMAO, urea) 1 mM each


Annexe 2. Details of applied UPLC-MS/MS method

1. Final preparation of lyophilised extracts for UPLC-MS analysis.
Re-suspend the extracts in the corresponding extraction/fractionation solvent to reach a concentration of 0.5 mg/mL. Transfer 20 µL of this solution into an LC-MS analysis vial with a glass insert. For dichloromethane/methanol (1:1, v/v), cyclohexane and ethyl acetate, these 20 µL have to be evaporated under a stream of nitrogen, since these solvents are not compatible with the later chromatography conditions. After evaporation of the solvent, add 20 µL of 20% acetonitrile for resuspension.

2. Liquid chromatography – tandem mass spectrometry
The injection volume was 4 µL. Extracts were separated by ultra-high performance liquid chromatography on a Dionex Ultimate 3000 UPLC (Thermo Fisher Scientific, Waltham, MA) using a 150 mm Kinetex C18 reversed-phase column with 1.7 µm particle size and 2.1 mm inner diameter (Phenomenex, Aschaffenburg, Germany) at a flow rate of 300 µL/min. Gradient elution with water with 0.1% (v/v) formic acid as eluent A and acetonitrile with 0.1% (v/v) formic acid as eluent B was run as follows: 1% B from t = 0 min to t = 2 min, linear gradient from 1% B to 100% B from t = 2 min to t = 20 min, hold at 100% B until t = 25 min, and linear gradient from 100% B to 1% B from t = 25 min to t = 30 min.
Samples were analysed by positive-mode and negative-mode electrospray ionization quadrupole time-of-flight mass spectrometry on a maXis™ HD QTOF (Bruker, Bremen, Germany) in full scan mode (50–1500 Da). Data-dependent MS/MS was performed by collision-induced dissociation of the three most abundant ions in each scan, making use of Bruker's "smart exclusion" functionality to minimise multiple fragmentation of the same ion. The collision energy was ramped from 80% to 200% of the default auto-MS/MS collision energy in order to obtain more information-rich spectra.
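The gradient programme described in step 2 is piecewise linear, so the eluent composition at any time point follows from interpolation between its breakpoints. The sketch below is an illustration only: the function names `percent_b` and `eluent_b_volume_ml` are assumptions and do not come from the instrument software. The breakpoints and the 300 µL/min flow rate are taken from the text.

```python
# Breakpoints of the gradient: (time in min, % eluent B), from the method text.
GRADIENT = [(0.0, 1.0), (2.0, 1.0), (20.0, 100.0), (25.0, 100.0), (30.0, 1.0)]


def percent_b(t_min, program=GRADIENT):
    """Linearly interpolate the percentage of eluent B at time t_min."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time is outside the gradient programme")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient programme has a gap")  # unreachable if contiguous


def eluent_b_volume_ml(flow_ul_per_min=300.0, program=GRADIENT):
    """Volume of eluent B consumed per run (trapezoidal integration)."""
    area = 0.0  # integral of %B over time, in % * min
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        area += 0.5 * (b0 + b1) * (t1 - t0)
    return area / 100.0 * flow_ul_per_min / 1000.0


if __name__ == "__main__":
    for t in (0, 2, 11, 20, 25, 27.5, 30):
        print(f"t = {t:>4} min: {percent_b(t):6.1f} % B")
    print(f"Eluent B per run: {eluent_b_volume_ml():.4f} mL")
```

At 300 µL/min over 30 min each run uses 9 mL of mobile phase in total; the integration gives the share of that volume drawn from eluent B.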
Both an external calibration before and after every run and an internal lock-mass calibration in every scan were applied, which leads to a small mass error (in most cases <1 ppm). This should in turn yield very few candidates, in most cases just one sum formula, for compounds with masses <400 Da. In addition, the isotopic pattern is taken into account to narrow down the potential sum formula. However, an automated sum formula assignment was not implemented.
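The mass error quoted above is the relative deviation between measured and theoretical mass, expressed in parts per million. The following sketch shows the arithmetic and the width of the acceptance window that a 1 ppm tolerance implies at 400 Da. The function names and the example values (400.0000 and 400.0003) are hypothetical and are not measurements from this project.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def mass_window(theoretical_mz, tol_ppm=1.0):
    """Lower and upper m/z bounds for a given ppm tolerance."""
    delta = theoretical_mz * tol_ppm * 1e-6
    return theoretical_mz - delta, theoretical_mz + delta


def within_tolerance(measured_mz, theoretical_mz, tol_ppm=1.0):
    """True if the measured value matches the candidate within tol_ppm."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    theoretical, measured = 400.0000, 400.0003
    low, high = mass_window(theoretical, tol_ppm=1.0)
    print(f"error: {ppm_error(measured, theoretical):.2f} ppm")
    print(f"1 ppm window at 400 Da: {low:.4f} to {high:.4f}")
    print("match" if within_tolerance(measured, theoretical) else "no match")
```

At 400 Da a 1 ppm tolerance corresponds to a window of ±0.0004 Da, which is why only very few sum formulae remain as candidates in this mass range.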
