Chicco and Hoffman Genome Biology (2017) 18:5
DOI 10.1186/s13059-016-1135-5

MEETING REPORT Open Access

Genome Informatics 2016

Davide Chicco1 and Michael M. Hoffman1,2,3*

*Corresponding author
1Princess Margaret Cancer Centre, Toronto, Canada
2Department of Medical Biophysics, University of Toronto, Toronto, Canada
Full list of author information is available at the end of the article

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Abstract
A report on the Genome Informatics conference, held at the Wellcome Genome Campus Conference Centre, Hinxton, United Kingdom, 19–22 September 2016.

We report a sampling of the advances in computational genomics presented at the most recent Genome Informatics conference. As at Genome Informatics 2014 [1], speakers presented research on personal and medical genomics, transcriptomics, epigenomics, and metagenomics, as well as new sequencing techniques and new computational algorithms to crunch ever-larger genomic datasets. Two changes were notable. First, there was a marked increase in the number of projects involving single-cell analyses, especially single-cell RNA-seq (scRNA-seq). Second, while participants continued the practice of presenting unpublished results, a large number of the presenters had previously posted preprints of their work on bioRxiv (http://www.bioRxiv.org) or elsewhere. Although Berg et al. [2] wrote earlier in 2016 that “preprints are currently used minimally in biology”, this conference showed that in genome informatics, at least, they are already used quite widely.

Personal and medical genomics
Several talks covered systems and new technologies that clinicians, patients, and researchers can use to understand human genomic variation. Jessica Chong (University of Washington, USA) described MyGene2 (http://mygene2.org), a website that allows families to share their de-identified personal data and find other families with similar traits. Jennifer Harrow (Illumina, UK) discussed using BaseSpace (https://basespace.illumina.com/) for the analysis of clinical sequencing data. Deanna Church (10x Genomics, USA) presented Linked-Reads, a technology that makes it easier to find variants in less accessible genomic regions such as the HLA locus.

Several presenters showed new methods to identify the functional effects of sequence variants. Konrad Karczewski (Massachusetts General Hospital, USA) presented the Loss Of Function Transcript Effect Estimator (LOFTEE, https://github.com/konradjk/loftee). LOFTEE uses a support vector machine to identify sequence variants that significantly disrupt a gene and potentially affect biological processes. Martin Kircher (University of Washington, USA) discussed lentiMPRA [3], a massively parallel reporter assay (MPRA) that uses a lentivirus for genomic integration. He used lentiMPRA to predict enhancer activity and, more generally, to measure the functional effect of non-coding variants. William McLaren (European Bioinformatics Institute, UK) presented Haplosaurus, a variant effect predictor that uses haplotype-phased data (https://github.com/willmclaren/ensembl-vep).

Two presenters discussed genome informatics approaches to the analysis of cancer immunotherapy response. Meromit Singer (Broad Institute, USA) performed single-cell RNA profiling of dysfunctional CD8+ T cells. She identified metallothioneins as drivers of T cell dysfunction and revealed novel sub-populations of dysfunctional T cells [4]. Christopher Miller (Washington University, St Louis, USA) tracked the response to cancer immunotherapy in patients' genomes [5].

In a keynote lecture, Elaine Mardis (Washington University, St Louis, USA) described computational tools and databases created to collect and process cancer-specific mutation datasets. A substantial increase in the amount of clinical sequencing performed as part of cancer diagnosis and treatment necessitated the development of these tools. She emphasized the shift in how cancers are categorized: oncologists previously classified cancers by tissue, but increasingly they classify cancers by which genes are mutated. Mardis suggested that we should instead describe cancers by the affected metabolic and regulatory pathways, which can provide insight even into previously unseen disruptions. Such disruption can arise from genetic mutations, but it can also manifest as other changes to cellular state, which must be measured with other techniques, such as RNA-seq. The tools Mardis described help interpret the mutations identified by sequencing. These include the Database of Curated Mutations (DoCM). She also described Personalized Variant Antigens by Cancer Sequencing (pVAC-seq), a tool for identifying tumor neoantigens from DNA-seq and RNA-seq data, and Clinical Interpretations of Variants in Cancer (CIViC), a platform for crowd-sourcing data on the clinical consequences of genomic variants. CIViC has 1565 evidence items describing the interpretation of genetic variants, and Mardis announced a forthcoming Variant Curation Hackathon to identify more.

Variant discovery and genome assembly
Several speakers presented tools and methods for analyzing genome assemblies and exploring sequence variants. Jared Simpson (Ontario Institute for Cancer Research, Canada) started the second session with an overview of base calling for Oxford Nanopore sequencing data and his group's contribution to this field, Nanocall (http://github.com/mateidavid/nanocall). Simpson also discussed Nanopolish, which can detect 5-methylcytosine directly from Oxford Nanopore sequencing data, without bisulfite conversion. Kerstin Howe (Wellcome Trust Sanger Institute, UK) presented her work with the Genome Reference Consortium on producing high-quality assemblies for different strains of mouse and zebrafish. Ideally, future work will integrate graph assemblies. Frank Nothaft (University of California, Berkeley, USA) described ADAM (https://github.com/bigdatagenomics/adam), a library for distributed computing on genomics data, and Toil, a workflow management system. These systems are about 3.5 times faster than standard Genome Analysis Toolkit (GATK) pipelines.

Some presenters discussed genome assembly tools and datasets that the wider community might use. Andrew Farrell (University of Utah, USA) introduced RUFUS (https://github.com/jandrewrfarrell/RUFUS), a method for efficiently detecting de novo mutations using k-mer counting instead of reference-guided alignment. Alicia Oshlack (Murdoch Childrens Research Institute, Australia) presented the SuperTranscript model for enhancing transcriptome visualization (https://github.com/Oshlack/Lace/wiki). Jouni Sirén (Wellcome Trust Sanger Institute, UK) presented a method to index

In a keynote lecture, Richard Durbin (Wellcome Trust Sanger Institute, UK) discussed genome reference assemblies and the pitfalls of using a single flat reference sequence. Genomicists use the reference genome for mapping sequencing reads, as a coordinate system for reporting and annotation, and as a framework for describing known variation. While the reference genome makes many analyses simpler, it biases them towards what has been seen before. Durbin briefly discussed the advantages of the newest human reference assembly, GRCh38, which fixes many previous problems and includes alternate loci to capture complex genetic variation. But to work more effectively with this variation, Durbin said, we need to switch from a flat reference to a “pan-genome” graph that includes much of the known variation [8]. To do this, we will need a new ecosystem of graph genome file formats and analysis software. Durbin discussed the work of the Global Alliance for Genomics and Health to evaluate proposed systems for working with graph genomes.

Epigenomics and the non-coding genome
Speakers described new methods for epigenomic and transcriptomic data, such as DNase-seq (deoxyribonuclease sequencing), ChIP-seq (chromatin immunoprecipitation sequencing), and RNA-seq data. Christopher Probert (Stanford University, USA) presented DeepNuc, a deep learning technique that determines nucleosome positioning from paired-end ATAC-seq datasets. Michael Hoffman (Princess Margaret Cancer Centre, Canada) described a method to analyze ChIP-seq and RNA-seq datasets and classify transcription factor binding sites into four binding variability categories: static, expression-independent, expression-sensitive, and low [9]. Anshul Kundaje (Stanford University, USA) described a deep learning approach that integrates epigenomic datasets (such as DNase-seq or ATAC-seq) to predict transcription factor binding sites across diverse cell types. Kundaje also presented a new way to interpret the learned model (https://github.com/kundajelab/deeplift).

Several presenters described the analysis of transcription factor binding sites and enhancers. Katherine Pollard (University of California, San Francisco,