EpiFlu™ Database v1.1
www.gisaid.org

Joachim Büch1*, Kirsten Roomp2*, Trevor Bedford3, Naomi Komadina4, Rebecca J. Kondor5, Richard Neher6, Catherine Smith5, Sebastian Maurer-Stroh7**, Gunter Bach8**

1 Max-Planck-Institute for Informatics, Saarbrücken, Germany
2 Luxembourg Centre for Systems Biomedicine, University of Luxembourg
3 Fred Hutchinson Cancer Research Center, Seattle, USA
4 WHO Collaborating Centre for Reference & Research on Influenza, Melbourne, Australia
5 Centers for Disease Control and Prevention, Atlanta, USA
6 Biozentrum, University of Basel, Switzerland
7 Bioinformatics Institute, Agency for Science, Technology and Research, Singapore
8 The GISAID Initiative, Munich, Germany
*Joint first authors, **Joint last authors

BACKGROUND

The Global Initiative on Sharing All Influenza Data (GISAID) provides a sharing mechanism for its publicly accessible EpiFlu™ database that incentivizes the rapid exchange of influenza virus data by providing open access for researchers and for the development of medical interventions in a transparent manner, while protecting the inherent rights of data submitters.

GISAID fosters benefit sharing and collaboration between submitters and users of data. Scientists' reticence to share data prior to publication via public-domain archives such as GenBank, where use of data takes place anonymously and without any enforceable conditions to safeguard contributors' rights, prompted the creation of GISAID in 2008.

OVERVIEW

The EpiFlu™ database application, developed by the Max-Planck-Institute for Informatics, enables the analysis of the world's most complete collection of influenza viruses, from the latest seasonal viruses to new animal viruses. The application is based on proprietary software code and Oracle software. Extensive metadata are also collected for most isolates. EpiFlu™ also provides features for searching and filtering specific datasets for download, as well as upload functionality.

All users of EpiFlu™ have agreed to positively identify themselves. While access is free of charge, all users have also agreed that they will not attach any restrictions to the data, but will acknowledge both the originator of the specimen and the submitter of the data, and seek to collaborate with the Originating Laboratory.

FEATURES IN DETAIL

• Contains influenza sequences and associated metadata with each isolate
• Genetic, clinical, epidemiological & geographical data for human isolates, plus species-specific data associated with non-human isolates
• Batch and single upload functions
• Ability of submitter to edit data submitted to the platform
• Each isolate accompanied by an audit trail with a history of edits
• Submitted data available for view to other users immediately
• Download facility for metadata associated with isolates in Excel format
• Sequence download in FASTA format with user-defined headers
• BLAST, alignment tool, FluSurver, nextflu phylogenetic trees (Figure 3)

Figure 3. FluSurver mutation analysis connected to nextflu phylogenetic trees. Panel labels: "FluSurver in a nutshell"; "Get list of identified mutations directly from GISAID's EpiFlu"; "Get exhaustive annotation for each mutation" (alternative numberings, geographic, structure models, temporal, effect with literature links, structural interactions).

Browse Functions

• Can choose to browse all isolates in the database, user-submitted isolates, or isolates not available on other databases (Figure 1)
• Can browse using one or more filters for:
  • Type, subtype, isolate name, genes, species, region, country of origin
  • Lineage: swl or seasonal for H1N1, Yamagata or Victoria for Type B influenza viruses
  • Isolate originating laboratory
  • Isolate sequence submitting laboratory
  • Isolate specimen date
  • Isolate submission date
• Can define fields to be displayed

Figure 1. The browser menu
Figure 2. The search results page

Upload Functions

• Can upload data via the single upload function or the batch upload sheet (Figure 4) for multiple isolates
• Metadata only needs to be entered once for each isolate
• Once isolates are uploaded, extra gene sequences can be added via the edit function
• For isolates added via the batch upload sheet, multiple extra gene sequences can be added by reusing the initial batch upload sheet
• Batch upload occurs in real time, with an immediate response
• Error messages are displayed for isolates & sequences not uploaded
• Duplicate sequences & isolates are flagged and not added
• Successful uploads are flagged, and isolate accession and segment accession numbers are added to the returned batch upload sheet

Figure 4. The batch upload facility, successfully uploaded isolates

RESULTS

As of August 2019, 9,000 participants rely on data from 1,200 laboratories entrusted to GISAID from 198 nations. The EpiFlu™ database remains essential for the Global Influenza Surveillance and Response System (GISRS) and the WHO biannual vaccine strain selection. Data in EpiFlu™ comprise 1.2 million nucleotide sequences from nearly 300,000 influenza virus strains. Among GISAID's contributors are OIE and FAO Reference Laboratories for […], as well as all WHO National Influenza Centers and WHO Collaborating Centers for Surveillance, Epidemiology and Control of Influenza.

To provide a complete picture of circulating influenza strains, data from public-domain archives are routinely imported.

OUTLOOK & CONCLUSION

Since 2010, the Federal Republic of Germany has been an important pillar for the sustainability of GISAID through its ongoing commitment to host the EpiFlu™ database. GISAID is in the process of developing a new and advanced database application software (v3.0) to address the needs of the user community for advanced bioinformatics capabilities. This development will also extend the spectrum of data analysis tools. The functionality of the database will also be expanded to include more data types.

GISAID EpiFlu™ has become not only a trusted but an indispensable resource for the global scientific community of influenza researchers, as well as for public and veterinary health officials.
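
The batch upload behaviour listed under Upload Functions (duplicates flagged and not added, error messages for rows that could not be uploaded, accession numbers returned for successful rows) can be illustrated with a minimal Python sketch. It is not EpiFlu™ code. The field names `isolate_name` and `sequence`, the `DEMO_ISL_` accession prefix, and the rule that a sequence may contain only A, C, G, T and N are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BatchResult:
    """Outcome of one hypothetical batch upload."""
    accepted: dict = field(default_factory=dict)    # isolate name -> assigned accession
    duplicates: list = field(default_factory=list)  # isolate names flagged and not added
    errors: list = field(default_factory=list)      # (row number, message) pairs


def process_batch(rows, existing_names, next_id=1):
    """Check each row of a batch sheet and report a per-row outcome.

    rows: list of dicts with the assumed keys "isolate_name" and "sequence".
    existing_names: set of isolate names already stored in the database.
    """
    result = BatchResult()
    seen = set(existing_names)
    for row_number, row in enumerate(rows, start=1):
        name = row.get("isolate_name", "").strip()
        sequence = row.get("sequence", "").strip().upper()
        if not name:
            result.errors.append((row_number, "missing isolate name"))
            continue
        if name in seen:
            # Duplicate of a stored isolate or of an earlier row: flag, do not add.
            result.duplicates.append(name)
            continue
        if not sequence or set(sequence) - set("ACGTN"):
            result.errors.append(
                (row_number, "sequence is empty or contains characters other than ACGTN")
            )
            continue
        # Hypothetical accession format, used only for this sketch.
        result.accepted[name] = f"DEMO_ISL_{next_id}"
        next_id += 1
        seen.add(name)
    return result
```

Calling `process_batch(rows, existing_names)` returns all three outcome groups in one object. This mirrors the idea of a returned batch sheet in which every row is marked as uploaded, duplicate, or rejected with a message.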