SINGLE-CELL ANALYSIS ENTERS THE MULTIOMICS AGE
A rapidly growing collection of software tools is helping researchers to analyse multiple huge ‘-omics’ data sets.
Work / Technology & tools

[Image caption: Multiomics data are increasingly being combined with spatial information. Credit: 10x Genomics]

By Jeffrey M. Perkel

614 | Nature | Vol 595 | 22 July 2021 | Corrected online 21 July 2021 | ©2021 Springer Nature Limited. All rights reserved.

It takes about 20 days for a mouse to grow from fertilized egg to newborn pup. Ricard Argelaguet and his colleagues were interested in what exactly happens inside the cells of a mouse embryo between days 4.5 and 7.5, when the stem cells shift into three layers: the ectoderm, which develops into the nervous system; the mesoderm, which develops into muscle and bone; and the endoderm, which develops into the gut and internal organs. Researchers can easily distinguish between these three layers by looking at which genes are expressed in individual cells. But the team wanted a more nuanced picture. So, in 2019, the researchers combined the gene-expression data with two other sources of information (ref. 1). The first was methylation, a chemical modification that alters how genes are expressed. The second was chromatin accessibility: how modifications to chromatin, the knotty complex of proteins and DNA in eukaryotic nuclei, affect which parts of the DNA are accessible for transcription into RNA. Both are factors in epigenetics, the non-genetic elements that influence how genes are expressed.

Combining the three data sources revealed something unexpected: in the absence of external stimuli, embryonic stem cells will become ectoderm. “This was the most essential contribution of the paper,” Argelaguet says. It showed “that there is kind of a hierarchy of cell fate specification at the epigenetic level”. Argelaguet, a computational biologist at the Babraham Institute in Cambridge, UK, was one of four first authors on the study, which was supervised by Babraham investigator Wolf Reik, as well as John Marioni at the EMBL-European Bioinformatics Institute in nearby Hinxton, and Oliver Stegle at the German Cancer Research Center in Heidelberg.

Their result explains the decades-old observation that embryonic stem cells in culture will preferentially differentiate into neurons. And it’s a finding, says Argelaguet, that would have been impossible to make using just a single type of data.

Genomics explosion

The past decade has witnessed an explosion in single-cell genomics. Single-cell RNA sequencing (RNA-seq), which profiles gene expression, is the most common technique. Other methods detail processes such as methylation, genetic variation, protein abundance and chromatin accessibility.

Now, researchers are increasingly combining these methods, and the resulting layers of data, in ‘multiomics’ experiments. Argelaguet, for instance, combined gene-expression profiling, methylation and chromatin accessibility in a technique called scNMT-seq. Another technique, CITE-seq, profiles both transcription and protein abundance. And G&T-seq captures both genomic DNA and RNA. Whatever the acronym, all these techniques aim to glean complex biological insights that might be undetectable using any single method. But the task is computationally challenging, and making sense of the resulting data even more so. A fast-growing suite of software tools can help.

Almost all single-cell studies contain visualizations, sometimes called t-SNE or UMAP plots, which represent single cells as points on a 2D plane. Studying how those points aggregate, or cluster, can help researchers to discern biological structures. But the visualizations aren’t easy to create.

For one thing, single-cell data sets have quickly become enormous. Back in 2019, Argelaguet captured individual cells in microtitre plates using a fluorescence-activated cell sorter, which limited him to analysing 200–300 cells per week (ref. 1). Now, he can process thousands of cells, thanks in part to a microfluidics platform developed by the biotechnology company 10x Genomics in Pleasanton, California. And a 2020 atlas of human fetal gene expression supervised by genome scientists Cole Trapnell and Jay Shendure at the University of Washington, Seattle, included four million cells (ref. 2). The result is basically a table with 80 billion entries: 4 million rows of cells by 20,000 genes.

And yet, “the overwhelming majority of the entries in that matrix are zero”, Trapnell says. This represents a key statistical and computational challenge, as scientists work to distinguish true zeroes (for instance, genes that are actually not expressed) from dropouts that result from sample handling or sensitivity issues. One option is to use imputation methods, which ‘borrow’ data from similar cells in the data set to fill in the gaps. As Stegle puts it: “Your neighbours tell you something about the unknown.”

Difficulty squared

Combining modalities only multiplies the difficulty, says Argelaguet. “All the weaknesses, all the noise, all the challenges from each technology, it just gets exacerbated by combining […]

[…] at Harvard University in Cambridge, Massachusetts, says that some of his team’s computations for a study describing a method called SHARE-seq took weeks to complete (ref. 3).

The pay-off of the added detail, says Bernd Bodenmiller, who studies single-cell tumour biology at the University of Zurich, Switzerland, is that it helps researchers to “understand the biology”. And they can do so using existing data sets, such as the Human Cell Atlas and its 13.5 million cell profiles. In a preprint (ref. 4) published in June, Chengxiang Qiu, a graduate student in Shendure’s lab, and his colleagues combined 1.4 million cells from 4 published atlases to tease apart how one cell type arises from another across 10 days of mouse development. The resulting “trajectories of mammalian embryogenesis” revealed more than 500 transcription factors that could have a role in cell-type specification.

Software tools

The information can be integrated in three main ways, depending on what features (or ‘anchors’) the data sets have in common, says Marioni, who has published a review (ref. 5) on the topic. ‘Horizontal integration’ is used for data sets of the same type: two RNA-seq data sets, for instance. In that case, genes act as the anchors, “because you’re measuring the same set of genes in every population of cells”, Marioni says.

“We are piling up our knowledge and methods and tweaking them, so we don’t reinvent the wheel.”

‘Vertical integration’ involves data sets collected from the same cells, such as for RNA-seq and chromatin accessibility. And ‘diagonal integration’ involves molecular measurements made across unrelated populations of cells. “The question is, what’s the common feature that you’re going to use?” Marioni says. One approach to vertical integration is to associate sites of chromatin accessibility with the genes they regulate, and then compute a probable gene-expression profile from the data. “So, basically, you’re making it into a horizontal-integration problem, where genes become the anchors again,” Marioni says.

Integrating data sets, Trapnell says, is like aligning DNA sequences. “You’re assuming that a population of cells that you can see with one modality is visible with the other, and that for most cells or cell populations, there’s going to be a one-to-one mapping.” The trick, he says, is to align the sets so that you can be […]

Dozens of tools have been developed to achieve this, and many are indexed on the community-driven ‘awesome-multi-omics’ and ‘awesome-single-cell’ lists on GitHub. Seurat, for instance, developed by Rahul Satija’s team at the New York Genome Center, effectively aligns UMAP visualizations of two data sets to create a “shared, low-dimensional” space, says Tim Stuart, a computational biologist in Satija’s group. “That enables you to find neighbours of one data set in the other data set, and vice versa.” Other popular options include: Argelaguet’s MOFA, which he describes as a “kind of a multiomics generalization” of principal-component analysis; Harmony, from Soumya Raychaudhuri’s team at Harvard Medical School in Boston, Massachusetts; and LIGER, developed by Joshua Welch’s team at the University of Michigan in Ann Arbor. According to Welch, just as online retailers can mine their customers’ purchase histories to identify products that a user is likely to want, LIGER uses ‘integrative non-negative matrix factorization’ to identify related cells and cell clusters.

Spatial hackathon

With such a fast-growing tool set, researchers can struggle to know what they should use for which questions, and how to go about it. To help close those gaps, Elana Fertig at Johns Hopkins University in Baltimore, Maryland; Aedin Culhane at Harvard T. H. Chan School of Public Health in Boston; and Kim-Anh Lê Cao at the University of Melbourne, Australia, organized a virtual conference on single-cell omics data integration. As part of that event, held in June 2020, the organizers provided three curated data sets and challenged attendees to apply whichever algorithms and workflows they liked to integrate and interpret the data, in a series of ‘hackathons’. Unlike in-person hackathons, in which researchers intensively collaborate on software projects in short bursts over a few hours or days, these were virtual events held over a month, with collaborators dispersed around the globe. One event focused on Argelaguet’s mouse-embryo data set; the others concentrated on spatial-data-integration problems.

“We were interested to see what would be the challenges we should anticipate in multiomics,” Lê Cao says. “We thought that it would be good to gather the different experts in the field and see how they would approach the analysis of multiomic studies in single cells.”

Conventional single-cell experiments detail thousands of molecules at the expense of positional information. Spatial methods capture molecular identity without that dissociation step.
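
To make the imputation idea from the article concrete (zeros in the cell-by-gene table are filled by ‘borrowing’ values from similar cells), here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used by any of the tools or studies named in the article: the function names `nearest_neighbours` and `impute_zeros`, the toy count matrix, the choice of Euclidean distance and the value of `k` are all hypothetical.

```python
import math


def nearest_neighbours(matrix, cell_index, k):
    """Return the indices of the k cells closest to `cell_index`.

    Closeness is plain Euclidean distance on raw counts, which is an
    assumption made for this toy example only.
    """
    target = matrix[cell_index]
    distances = []
    for other_index, other in enumerate(matrix):
        if other_index == cell_index:
            continue  # a cell is not its own neighbour
        distances.append((math.dist(target, other), other_index))
    distances.sort()
    return [index for _, index in distances[:k]]


def impute_zeros(matrix, k=2):
    """Replace every zero with the mean of that gene in the cell's k neighbours."""
    imputed = [list(row) for row in matrix]  # copy, so the input is untouched
    for cell_index, row in enumerate(matrix):
        neighbours = nearest_neighbours(matrix, cell_index, k)
        for gene_index, value in enumerate(row):
            if value == 0:
                borrowed = [matrix[n][gene_index] for n in neighbours]
                imputed[cell_index][gene_index] = sum(borrowed) / len(borrowed)
    return imputed


# Hypothetical toy data: rows are cells, columns are genes.
counts = [
    [5, 0, 3],  # gene 1 reads zero here, although similar cells express it
    [4, 2, 3],
    [6, 2, 4],
    [0, 9, 0],
    [0, 8, 1],
    [0, 7, 1],
]
imputed = impute_zeros(counts, k=2)

# Size of the table described in the article: 4 million cells by 20,000 genes.
atlas_entries = 4_000_000 * 20_000

print(imputed[0])
print(imputed[3])
print(atlas_entries)
```

In this toy case the zero for gene 1 in the first cell is filled from its two neighbours, whereas the zeros for gene 0 in the last three cells stay at zero because their neighbours also show none. The sketch cannot tell whether a filled value was really a dropout, which is the ambiguity between true zeroes and dropouts that the article describes.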