<<

XO FILES SCIENCE eXtraordinary Opportunity PHILANTHROPY ALLIANCE

Creating a Census of Human Cells For the first time, new techniques make possible a systematic description of the myriad types of cells in the human body that underlie both health and disease.

Aviv Regev

Imagine you had a way to cure cancer that involved taking a molecule from a tumor and engineering the body’s immune cells to recognize and kill any with that molecule. But before you could apply this approach, you would need to be sure that no healthy cell also expressed that molecule. Given the 20 trillion cells in a human body, how would you do that? This is not a hypothetical example, but was one recently posed to our laboratory. More fundamentally, without a map of different cell types and where they are found within the body, how could you systematically study changes in the map associated with different diseases, or High resolution images of retinal tissue from the human eye, showing ganglion cells (in green). The circular cell body is attached to thin dendrites on which nerve synapses form—in different understand where associated with tissue layers depending on the cell type and function. Credit: Lab of Joshua Sanes, Harvard. disease are active in our body or analyze the regulatory mechanisms that govern the production of different cell types, or genetic profile of an individual cell— proteins they use and that information sort out how different cell types combine identifying which genes are turned synthesized with the location into an to form specific tissues? on, actively making proteins. It is image that locates the cells of interest. We suggest the answer to these in effect the zip code for cell types.  Sampling algorithms and Big Data questions is to create a Human Cell Atlas Newly-invented automated tools can computational techniques that enable that organizes cells by type and location now process thousands of cells for creation of an overall cell census. These in specific tissues. What makes this now sequencing per second for a low cost. approaches are new to biology on the possible is the recent development of  The ability to map the location of scale proposed here. With appropriate three new tools: specific cell types within living tissue at sampling, analyzing just 50 million  The ability to rapidly determine cell high resolution. One technique uses cells—one for every 400,000 in the types by rapid capture, processing ion beams that scatter cell particles body—can give a detailed draft and RNA sequencing of single cells. from specific locations in a tissue; picture of human cell types. Big The RNA sequence reveals the the cell types are identified by the Data techniques can then be used XO FILE | Creating a Census of Human Cells

to combine zip codes and physical locations into a unique and invaluable reference database. Twenty-five years ago, scientists first proposed

Cells are the basic unit of life, yet the Human Genome Project to systematically they vary enormously. Huge quantities discover all of the cellular components encoded of new red blood cells are made every day, whereas nerve cells—especially the by our genes. At the time, it seemed an audacious neurons that are the processors of the brain— are made early in life and new goal, but one that proved achievable. We now ones are rarely born thereafter. The types of cells also vary widely from one tissue propose a similar systematic effort to define the to another. The lining of the gut contains cells that absorb nutrients, immune cells that underlie human health and disease. cells to fend off harmful microbes, and neurons—as well as cells of the beneficial bacteria that colonize us. The Another study of a particular class of T-cells 4. Capture the key characteristics of retina at the back of the eye functions associated with autoimmune diseases found cells during transitions, such as as a kind of digital camera, capturing an subtle differences in cells taken from the gut differentiation (from a ) or image and shipping it off to the brain and from the brain, changes that appear activation; and for analysis—and it contains more than to stem from fats in the diet and that may 5. Trace the history of cells through a 100 different types of neurons; one kind suggest new drug targets for treating these lineage—such as from a predecessor of neuron can be important to identify autoimmune diseases. Analyzing tens of stem cell in bone marrow to a when the light is turned on, another thousands of retina cells led to discovery of functioning red blood cell. for when the light is turned off, and so two new cell types that have eluded decades on. The T-cells of our immune systems of meticulous research. Just as with the Human Genome come in different forms, depending on Some 25 years ago, scientists first Project, the task is large but finite and whether they are found in the blood, proposed the Human Genome Project to can only be done successfully within in the gut, in the mouth, or in nasal systematically discover all of the cellular the context of a unified project that passages. Moreover, variations in specific components encoded by our genes. At the engages a broad community of biologists, genes that can lead to disease typically time it seemed an audacious goal, but one technologists, physicists, computational manifest themselves in specific cells, that proved achievable. We now propose a scientists and mathematicians. those cells where the genes would similar systematic effort to define the cells Some factors that point toward normally be active—muscular dystrophy that underlie human health and disease. success of the project include: in skeletal muscle cells, for example. Specifically, within five years we Both this enormous variety from one Manageable Scale. The number of propose to generate a detailed first draft type of cell to another, and the mix of human cell types depends on the level of a molecular atlas of cells in the human cells from tissue to tissue are critical to of resolution at which they are defined. body. This Human Cell Atlas will: the functioning of our body, but have A few hundred types are often quoted, not been fully studied or characterized. 1. Catalog all cell types and sub-types; but just the blood and immune system Already, in preliminary studies of 2. Distinguish cell states (e.g. a naive alone may have over 300 molecularly and the type proposed here, our lab and immune cell that has not yet functionally distinct sub-types. While the collaborators have discovered a completely encountered a pathogen compared to number appears daunting at first, there are unknown type of dendritic cells—immune the same immune cell type after it is multiple cell “copies” of the same type, and cells that constitute our first line of defense activated by encountering a bacterium); thus this is a sampling problem. Statistical against pathogens—that make up only 3. Map cell types to their location within considerations and mathematical theory 4 of every 10,000 cells in the blood. tissues and within the body; suggest that we can sample a manageable XO FILE | Creating a Census of Human Cells

The eXtraordinary Opportunity | How to Create a Human Cell Atlas

The field of genomics has substantial genomic profile of a single cell. That means in sequencing and storage costs, and a focus experience in large-scale projects such as determining which RNA molecules it expresses on RNA measurements as the first line of proposed here, but there are important from its DNA, which proteins are expressed characterization). We propose to analyze 50 differences. Genetic studies often focus on from the RNA, and related information such million cells in a five year initial effort. differences in the DNA between individuals, as how the cell’s DNA is decorated with This initial phase of the Human Cell but cannot tell the critical differences additional molecules that control it. With Atlas will also define markers for different between individual cells, including where recent breakthrough technologies this can be cell types, for which antibodies and other the genetic differences manifest themselves. done for large numbers of cells very quickly probes can be developed to find specific cell Indeed, within an individual, nearly every cell and inexpensively. This data characterizes types within a tissue. It will provide a direct has the same DNA, but it uses (or “expresses”) which genes are active in a given cell—in view of living human tissue—removing only a portion of it. In contrast, a Human effect, which proteins it produces, and what distorting effects of on which Cell Atlas focuses directly on the differences the cell does. much current knowledge is based. It will among cell types and is thus more diverse These innovations—based on advances provide a way to integrate a large body of and complex—tracking several very in molecular biology, microfluidics, droplet legacy data. Moreover, the Human Cell Atlas different types of data—and requires more technology and computation—now enable will help uncover the regulatory processes technological and computational innovation. massively-parallel assays that can process that control cell differentiation and cell What makes such a project possible is very hundreds of thousands of cells at very low interactions. Finally, and non-trivially, the recent advances in the ability to analyze the cost; we estimate a cost of about $0.17 project will generate standardized, tested, per cell. A second emerging method of and broadly applicable experimental and characterization involves imaging cells computational methods that will be useful in inside tissues at high resolution. Finally, many other contexts. new experimental and computational A level of support of $100M over five years techniques couple molecular profiling (of would support the initial organization and RNA or proteins) with ion beams to high execution of this ambitious effort. Federal resolution spatial information about their funding on this scale is unlikely, and multiple location within a tissue or even within a grants would not allow the integration of Human tissue Indication of compute resources sample cell, providing a unique characterization of expertise across many groups of investigators the structure of tissues.Indication We ofestimate compute that resources and the complex coordination needed to Single high-memory, overallmulti-core cost server for characterizing cells by these ensure comparable and reproducible results. combined methods will be $1.00 per cell (an The Human Cell Atlas is thus an extraordinary Massivelyestimate parallel on computing that assumesgrid continuingSingle reductionhigh-memory, opportunity for private philanthropy. multi-core server

Massively parallel on computing grid

Cells Oil

Illumina Mi-Seq Sequencer Big Data Tools Map Barcoded Droplet reads cell RNA and synthesize and categorize shows cell types and abundances Cells bead formation primers establishes cell type information to create a map

Microfluidic device The process shown here, which can sort and categorize 5,000 cells per second at very low cost, illustrates sorts and process cells the power of single cell genomics. The map illustrates the abundance and variety (by color) of an analysis of 44,000 human retinal cells, distinguishing 39 separate clusters of distinct cell types.

Microfluidic device used in Drop-Seq. Beads (brown in image), suspended in a Illumina Mi-Seq lysis agent, enter the device from the central channel; cells enter from the top and bottom. Laminar flow prevents mixing of the two aqueous inputs prior to droplet formation; this is evident in the image from the refraction of light along the interface of the two flows.

Image courtesy of Oni Basu, Regev lab. XO FILE | Creating a Census of Human Cells

number of cells and still recover fine data collection that is comparable across the pilot, would analyze more cells per distinctions with confidence. systems. Within such a consortium, tissue, additional tissues, expand work to be defined through a community in model , and deploy more Sample Collection. Experience has shown process, there will be working groups for measurement techniques; it could also how to acquire excellent collections human samples, model organisms, and extend analysis to disease tissues. of human tissue samples with a well- technology development, in addition concerted effort, even by individual Having a complete Human Cell Atlas to centralized data acquisition and labs. And, unlike genetic studies, a large would be like having a unique zip code management. We would expect multiple number of individuals is not required. We of each cell type combined with a three- analytics efforts. propose to complement human sampling dimensional map of how cell types weave with limited similar studies of model Appropriate Staging. A Human Cell Atlas together to form tissues, the knowledge of organisms—primates, mice and others— is an endeavor of new scale and type. A how the map connects all body systems, to obtain otherwise inaccessible samples pilot phase that can be established quickly and insights as to how changes in the and to relate knowledge from human cells and serve to test alternative strategies map underlie health and disease. This to that obtained from lab experiments, for and to evaluate the basic premises of resource would not only facilitate existing which there is extensive legacy knowledge the work would likely be particularly biological research but also open new from decades of scientific research. effective. We propose a pilot phase with landscapes for investigation. A Human a relatively sparse survey of 100,000 cells Cell Atlas will provide both foundational Inclusive Organization. We envision a from each of 50 carefully chosen tissues biological knowledge on the composition community-wide effort that balances from human and mouse, complemented of multicellular organisms as well as the need for domain-expertise in a by a much deeper survey in a few well- enable the development of effective biological system with opportunities for chosen complementary systems, such as medical diagnostics and therapies. new technologies (more so than in past peripheral blood and bone marrow, gut, genomics projects), and yet also enables and liver. A full-scale project, building on

SCIENCE PHILANTHROPY ALLIANCE 480 S. California Ave. #304, Palo Alto, CA 94306 | 650-338-1200 | [email protected] | www.xofiles.org