Project Deliverable
Ref. Ares(2013)2985750 - 03/09/2013

Project number: 317871
Project acronym: BIOBANKCLOUD
Project title: Scalable, Secure Storage and Analysis of Biobank Data
Project website URL: http://www.biobankcloud.com/
Project Coordinator Name and Organisation: Jim Dowling, KTH
E-mail: [email protected]

WORK PACKAGE 8: Dissemination
Work Package Leader Name and Organisation: Jan-Eric Litton, Karolinska Institute (KI)
E-mail: [email protected]

PROJECT DELIVERABLE D8.1: Dissemination plan
Deliverable Due date (and month since project start): 2013-08-31, m9
Deliverable Version: v0.2

BiobankCloud D8.1 page 1/22 317871

Document history
Version Date Changes By Reviewed
Lora Dimitrova Karin Zimmermann 0.1 2013-08-14 First draft Michael Hummel Jörgen Brandt Ulf Leser
Lora Dimitrova 0.2 2013-08-26 Final Roxana Merino Martinez Michael Hummel Jan-Eric Litton

Abstract

This deliverable sets out a plan for disseminating the BiobankCloud. Two approaches will be applied: active and passive dissemination. Academic and industrial partners will be contacted directly, while internet tools, meetings and workshops, and publications in major scientific journals will be used for passive dissemination. D8.1 describes these dissemination options in detail.

Table of Contents

Dissemination
1. Active Dissemination
1.1 Companies
1.1.1 Next Generation Sequencing Equipment Suppliers
1.1.2 Bioinformatics application developers
1.2 Academic Users
1.2.1 Large-scale Next Generation Sequencing Users
1.2.2 Small-scale Next Generation Sequencing Users
1.2.3 Scientific Consortia
2. Passive Dissemination
2.1 Internet Presence of the BiobankCloud
2.2 Conferences and Meetings
2.2.1 Biobanking Conferences and Meetings
2.2.2 Next Generation Sequencing Conferences and Meetings
2.3 Workshops
2.4 Publications in Major Journals
3. Conclusions
4. References

Dissemination

To disseminate knowledge about the BiobankCloud we will use two approaches: active and passive dissemination. Active dissemination means that we will contact scientists, institutes or companies and present to them the advantages of the BiobankCloud for storing and analysing their next generation sequencing (NGS) data.
Passive dissemination means that we will introduce the BiobankCloud to the scientific community through the internet, meetings and workshops, and publications, and will develop various ways for future users to find and contact us. Diagram 1 gives a schematic overview of the different possibilities for dissemination of the BiobankCloud.

Diagram 1: Overview of the Different Dissemination Strategies.

1. Active Dissemination

Two groups of users will be actively contacted: academic and industrial users. The following sections describe these two groups in detail.

1.1 Companies

Two groups of potential industrial users will be actively contacted: NGS equipment suppliers and bioinformatics application developers.

1.1.1 Next Generation Sequencing Equipment Suppliers

Several companies worldwide provide NGS machines, and most of them have developed solutions for storing and processing the generated NGS data. This section briefly describes the NGS platforms and uses Illumina as an example to explain how companies manage NGS data.

Illumina, Inc. (San Diego, CA, USA)

Illumina (https://basespace.illumina.com/home) has developed a technology called sequencing by synthesis, which is implemented in four different platforms: the HiSeq series, HiScan SQ, MiSeq and the Genome Analyzer IIx system. Runs carried out with these systems have a maximal output of 8.5–600 Gb of data, and run durations range from 4 h to 11 days. These platforms support many applications, including DNA sequencing (whole genome/whole exome sequencing, targeted resequencing, de novo sequencing, amplicon sequencing, single nucleotide polymorphism (SNP) discovery, copy number variation (CNV) identification and analysis of chromosomal abnormalities), RNA sequencing (transcriptome analysis, small RNA discovery), ChIP-Seq and methylation analysis.
For analysis, archiving and sharing of the generated NGS data, Illumina has developed the BaseSpace cloud platform. The platform is built on Amazon Web Services, a leading provider of cloud-based infrastructure. All sequencers of the MiSeq and HiSeq series have the BaseSpace platform integrated, so when a run is started the sequencing data can be sent directly to BaseSpace in real time, which saves time on data transfer. In the cloud the data can be stored securely and shared with collaborators. Storage of the first TB of data is free forever. Additionally, BaseSpace allows ownership of data to be transferred; in this way NGS core facilities, for example, can provide sequencing data to their customers very quickly without having to transfer the data itself. Many BaseSpace apps are available for analysing the NGS data. Alignment and variant processing of the data are free, whereas the integrated apps have different prices.

454 Life Sciences / Roche Applied Science (Penzberg, Germany)

454 Life Sciences/Roche (http://www.454.com/) introduced the first commercially available next generation sequencer (Genome Sequencer 20) in 2005. In the following years the company continued to develop NGS based on pyrosequencing of DNA captured on beads, and two NGS systems are now available on the market. The GS Junior System generates shorter reads (400 bp average length, longest reads up to 600 bp) and has a run time of 10 h, while a GS FLX+ System run takes 23 h but generates reads of up to 1,000 bp (mode read length 700 bp). The GS Data Analysis Software has been developed for analysis of the NGS data. The platforms are used for whole genome sequencing, targeted resequencing, metagenomics, epigenomic and methylation analysis as well as RNA-Seq.

Applied Biosystems / Life Technologies, Inc. (Carlsbad, CA, USA)

Applied Biosystems (ABI; http://de-de.invitrogen.com/site/de/de/home/brands/Applied-Biosystems.html) has launched three 5500 Series Genetic Analysis Systems: the 5500 System, the 5500xl System with 1.0 µm microbeads and the 5500xl System with 0.75 µm nanobeads. With these platforms genome, exome, transcriptome and other analyses can be carried out in parallel within the same run with high accuracy. The systems generate from 7 to more than 20 Gb of data per day, and runs last from 2 days (exome sequencing) to 7 days (human genome sequencing). For analysis of the generated data ABI has developed the LifeScope™ Genomic Analysis Software.

Ion Torrent / Life Technologies, Inc. (Carlsbad, CA, USA)

Based on semiconductor sequencing technology, Ion Torrent (http://de-de.invitrogen.com/site/de/de/home/brands/Ion-Torrent.html) has developed two systems for NGS: the ION PROTON™ SEQUENCER and the ION PGM™ SEQUENCER. The ION PROTON™ SEQUENCER is used for exome, whole transcriptome and human genome sequencing, while the ION PGM™ SEQUENCER is suitable for small-genome, gene set and ChIP-Seq sequencing as well as for gene expression analysis. Data analysis, annotation and archiving are performed with the Ion Reporter™ Software.

Pacific Biosciences of California, Inc. (Menlo Park, CA, USA)

Pacific Biosciences (http://www.pacificbiosciences.com/) has developed a single molecule DNA sequencing system that sequences DNA molecules in real time, without prior amplification, in 30–120 minutes. This technology produces the longest read lengths of all sequencing platforms (average length of 3,000–5,000 bp, longest reads > 20,000 bp), and the data generated have high accuracy and sensitivity. Another advantage of the PacBio RS II system is the possibility to analyze DNA base modifications such as methylated adenine, which cannot be done with the other platforms. The instrument is supplied with software tools for processing the generated data (SMRT Analysis software).