
technology feature

A dream of single-cell proteomics

As single-cell proteomics emerges, perhaps labs can avoid the need to infer protein levels from mRNA abundances.

Vivien Marx

Nowadays, labs can generate massive sets of single-cell genomic and single-cell transcriptomic data. In proteomics, high-throughput single-cell methods have not yet arrived. The nascent field of single-cell proteomics (sc-proteomics) is bringing change, and perhaps helping to avoid the need to infer protein levels from cellular mRNA levels. It’s early days, but it’s not a distant dream to be able to tally the proteins in single cells, says Ruedi Aebersold, a proteomics researcher at ETH Zurich and the University of Zurich. To make that dream an everyday reality, labs push hurdles out of the way. Proteins are tougher to work with than RNA or DNA (for example, they’re stickier), but eventually researchers might be able to integrate single-cell mRNA and single-cell proteomic measurements.

[Figure: It’s not a faraway dream to be able to tally proteins in single cells. Credit: S. Larochelle, E. Dewalt, Springer Nature]

Over 20 years ago, says Aebersold, Richard Smith at Pacific Northwest National Laboratory and his colleagues characterized hemoglobin from a single red blood cell with a technique called capillary electrophoresis-Fourier transform ion cyclotron resonance mass spectrometry1. The red blood cell was a special case, says Aebersold, since it mainly consists of hemoglobin. But this was single-cell analysis.

Fast forward to a recent approach that Aebersold and his colleague Ben Collins call a “marriage across the ages”2. Developed in the labs of Edward Marcotte, Eric Anslyn and colleagues, at the University of Texas at Austin, the technique yields the amino acid sequence of individual proteins in a highly parallelized fashion3. A spinout company, Erisyon, has been launched to commercialize a single-molecule protein sequencer.

The approach involves Edman sequencing, with which proteins were sequenced in the days before mass spec. Proteins are cleaved and the peptides are labeled with identifying fluorescent tags; the tagged peptides are then immobilized on a glass cover-slip. Successive rounds of Edman degradation chemistry remove one amino acid at a time and the peptides are imaged at each round. The team found dyes that handled the process but some mishaps occurred: dyes fell off, didn’t attach well or provided inadequate fluorescence. In their published study, the team analyzed a zeptomolar mixture of proteins, but they note the approach is “inherently single molecule” and thus “there are reasonable prospects for decreasing sample volumes and protein abundance requirements.” They state that once fluorescent labeling of low-abundance proteins can be achieved, this method has application potential, such as for single-cell proteomics experiments.

It’s far from single-cell analysis, says Aebersold, but “the method certainly has potential to do single cells, potentially much faster and in ways that the mass spectrometer would have a hard time doing.” As a student, he used Edman sequencing and he is intrigued to see its revival for single-molecule fluorescence. The method piggybacks on flow-cell technology used in high-throughput sequencers, he says. It will take work to make this a routine application, he says, and other labs have such approaches in their sights, too. This study is an important proof of principle.

Single molecules

It’s hard to speculate how fast technology for sc-proteomics will develop, says Harvard Medical School researcher Peter Kharchenko, but “I am most excited about the ability to quantify [...] and other modifications.” This would move the field “beyond simple abundance-based models to more accurate dynamic descriptions.” Humboldt University researcher Rune Linding hopes such approaches might open up new ways to analyze phosphorylation dynamics.

Proteomics and [...] labs pursue different questions “but they clearly provide different views on the same system,” says Aebersold. “In proteomics, we’re always kind of limping a while behind the genomics field,” he says. Single-cell genomics and transcriptomics can capture “a kind of genealogy of the cell,” and track cells as they evolve and change through mutations, he says. Proteomics labs can now analyze

Nature Methods | VOL 16 | SEPTEMBER 2019 | 809–812 | www.nature.com/naturemethods 809 technology feature

small numbers of cells. “It’s a beginning,” he says. This is how single-cell transcriptomics started, followed by massive multiplexing and barcoding strategies that allowed resolution of large numbers of cells. Such trends will take a while to develop in proteomics.

Labs are exploring cytometry-related ways to reap single-cell data. A team that includes researchers at the University of Ottawa and the University of Oxford used single-cell mass cytometry (CyTOF) to capture the cell-fate decisions during hematopoiesis, and tracked how transcription factor expression changed during a cell’s lineage commitment at 13 time-points. They measured 27 proteins simultaneously in single cells. Another single-cell CyTOF effort by a team at the University of Zurich, along with colleagues at other institutions, profiled tumor and immune cells from 144 human breast tumor samples.

In mass cytometry/CyTOF, cells are prepped for analysis with isotope-conjugated antibodies. In sc-proteomics, it will likely be key to integrate different technologies and physical principles, as with the combination of Edman sequencing and microscopy, says New York University researcher Christine Vogel, who was a postdoctoral fellow in the Marcotte lab.

Rethinking sample prep

“My ideology is to make this as accessible as possible,” says Nikolai Slavov, at Northeastern University, about his approach to sc-proteomics methods development. In keeping with his training in genetics and [...], and interests in math, chemistry and physics, Slavov runs cross-disciplinary sc-proteomics meetings: mass spec veterans attend along with researchers lacking such expertise, as well as physicians, computational scientists and industry researchers.

Slavov is happy to see how CyTOF, single-cell Westerns and immunoassays are enabling quantification of proteins in single cells. To move the possibilities for identification and quantification beyond these techniques, his lab developed single cell proteomics by mass spectrometry (SCoPE-MS)4 for identifying and quantifying peptides from mammalian cells with liquid chromatography and tandem mass spectrometry (LC-MS/MS). They rethought sample preparation: cell lysis, [...], digestion and clean-up. And there’s mass spec instruments’ aversion to the clean-up chemicals to consider, says Slavov. In standard mass spectrometry, sample is always lost as it can, for example, stick to the sides of chromatography columns. Bulk sample analysis helps labs cope with that. But, when assessing single mammalian cells and their few hundred picograms of proteins each, little if any cargo should go missing.

[Figure (panel labels: water; freeze–heat cycle; protein is digested to peptides; 96 or 384-well plate): The Slavov lab is testing their mass spec sample-prep method to lyse cells in pure water, using a freeze–heat cycle. Credit: Slavov lab, Northeastern Univ.]

Slavov, his graduate student Harrison Specht, and the team hunted for a different way to extract proteins efficiently for LC-MS/MS analysis. “We kept trying different approaches, most of which didn’t work,” says Slavov. Sonication to lyse cells with focused acoustic waves led them to develop SCoPE-MS. When they validated the method, a student in the lab held the tube in the sonicator to lyse cells one at a time. The team mixed labeled peptides with labeled ‘carrier’ peptides to avoid the “never-ending chase” to quantify sample losses from the clean-up process. They used isobaric tandem mass tags (TMT), which bind to all peptides, including the carrier peptides. The TMT tags all have the same molecular weight so “when they enter the instrument, they’re going to correspond to a single peak in m/z space,” says Slavov.

Vogel likes SCoPE-MS and says it’s still necessary to test new buffers and optimize the approach, which her lab is currently doing. In SCoPE-MS, carrier cells act as a kind of internal reference, says Linding. “It works,” he says, but eventually labs will want to avoid a ‘carrier proteome’. Other methods may emerge for analyzing proteins and modifications, such as phosphorylation. Right now, sample preparation and labeling technology are limiting sc-proteomics, he says, and he believes CyTOF will play an important role in validation and imaging.

Because Slavov wanted a more affordable method, in SCoPE2 the team replaced sonication by lysing cells with a freeze–heat cycle in pure water5. The team is testing the efficiency of this sample preparation technique and “it appears to be at least as good, if not better than, urea lysis,” he says. He and his team used this method to quantify 2,000 proteins in 356 cells, a sample containing both monocytes and macrophages.

Vogel says that pure water might avoid artifacts from chemicals, but she wonders how soluble proteins without ion content are in pure water. Speaking more generally about sample preparation for sc-proteomics, she says that proteins are trickier than RNA and DNA: they cannot be amplified, they’re sticky and they degrade easily. “Until someone invents something to ‘amplify proteins’ that’ll always be the problem,” she says. Many methods, including hers and others, address sample preparation by using techniques such as hydrostatic pressure or engineered surfaces for enrichment.

Multiplexing

With TMT, around 20 barcodes can be used. But labeling a protein or peptide introduces complexities, says Aebersold: the reaction must be just right, excess [...] has to be removed and there’s clean-up. Multiplexing is a scale-up that makes workflow more complex.

Transcriptome analysis is readily available to biologists with commercially well-supported techniques, while proteome analysis is mainly done in expert labs and “cannot easily reach the throughput, robustness and reproducibility of transcriptome analysis,” as Aebersold and his colleague Ben Collins point out. But it doesn’t have to stay that way.

The 200–300 picograms of protein in one mammalian cell cannot yet be tallied, says Aebersold. A properly tuned and optimized mass spec instrument can detect all or most proteins expressed in a cell only when many cells are analyzed concurrently. His lab has detected 500 to 1,000 distinct proteins in a single cell from a sample equivalent to a single cell using SWATH-MS, a type of data-independent analysis.

They have not yet ‘processed’ a single cell but they injected a proportional fraction from a small number of cells into the mass spec. The goal is to eliminate sample handling losses, he says, such as material sticking to the surface of a microtiter plate or Eppendorf tube. The team hunted for the least absorbent material, chose polydimethylsiloxane (PDMS) and built microfabricated devices that “work quite well,” he says.

Cells flow into the device with one cell per compartment, which is confirmed with imaging. Cells can be manipulated and lysed,

proteins can be washed, and the sample can then be worked up for mass spec. The team is still testing the approach. Aebersold likes that it only requires cell sorting, with no chemicals needed.

“Eventually, in single-cell analysis, each cell is a singleton,” says Aebersold. No two cells are entirely identical. They might resemble one another closely in terms of biochemical function, but appear to be dissimilar due to differing cell-cycle phases. To address such variability, Linding says he and his team try to synchronize cell cycles in their cell lines. Even then, cells are “highly heterogeneous,” he says, which makes analysis tough. Much understanding about the cell cycle is based on population-level data but sc-proteomics-based measurements might deliver new insights about cell cycle stages.

Until his and other sc-proteomics techniques mature, and labs can analyze large numbers of single cells quickly, the techniques will not yield biologically interesting results, says Aebersold. That is why he and his team pursue an intermediate goal: analysis of small numbers of cells. They cluster cells using multi-parameter fluorescence-activated cell sorting (FACS) with 8–10 fluorescent colors, and group them according to similarity. “Rather than doing single cells and then averaging them or combining them, we combine them first and then measure the average,” he says.

A year ago, the team needed around 20,000 cells; now the method works with 100 cells or fewer. “Any cell population that can be sorted with a FACS sorter, even with very low numbers, is proteomically accessible,” says Aebersold. A lab might be looking at a rare type of cell or one that is only available in low numbers. For example, his team plans to work with a neuroscience lab to analyze mouse neuronal stem cells. One can obtain around 100 such cells per animal and the goal is to use as few animals as possible.

Aebersold acknowledges this intermediate approach may seem less exciting. At an annual conference on mass spec or other technology-focused event, a lab presenting protein measurement from single cells might make a big impression. At a [...] meeting, however, an sc-proteomic analysis from a tumor might reap a different reaction. “They’re not going to be impressed, because they’re going to say: ‘so what have you learned?’” In experiments with small numbers of cells, such as when exposing cells to a drug or other perturbation, some cell types might disappear and others appear. “This is interesting information,” says Aebersold.

Computed proteins

Slavov says that many researchers in single-cell transcriptomics want to adapt their software for sc-proteomics. All tools in this space will need to be benchmarked, he says, “so that people don’t fool themselves.” Labs need to diagnose data quality and identify what needs trouble-shooting. He and his team have developed data-driven optimization of MS (DO-MS)6 to visualize and analyze data. It’s programmed in R, built as a Shiny app and available online. It can help, for example, when elution profiles are not sampled at their apex, which is needed to maximize the number and purity of ions. Slavov foresees much development work ahead for computational tools in sc-proteomics, as well as the need for standards and benchmarking to make sure quantification is well-performed.

At Harvard Medical School, Peter Kharchenko and his team have been developing a computational tool for single-cell RNA-seq data analysis to address heterogeneity issues. These challenges have grown now that RNA-seq is being applied in complex study types involving many measurements, on numerous samples, from different people. A graphing tool from his lab, written in R, is clustering on network of samples (CONOS)7, which tracks cell types across these heterogeneous datasets and clusters similar cell sub-groups. It can also be applied to sc-proteomics, says Kharchenko.

“We’ve designed it to be very tolerant with respect to diversity of samples,” says Kharchenko, so users can do joint analysis related to perturbations or across different tissues. A number of integration methods are emerging. At the same time, integration has ‘subproblems’, he says, such as technical variation, variability across individuals or tissues, molecular modalities or species. With sc-proteomics data analysis, the “relative nature of the signal” is challenging and these data have more complex dropouts than transcriptomic data.

In Linding’s view, given that sc-proteomics data are ‘richer’ than single cell RNA-seq data, “what you need, is an error model.” Cellular proteins are plentiful but in classic proteomic analysis, a cell’s single peptides are often thrown out. He’s addressing that with a new algorithm for making more accurate error assessments8.

[Figure (panel labels: normalized feature values, treatment A, treatment B; deep learning; biological forecasting): In single-cell proteomics, tools are emerging to quantify and integrate data about cell behavior, signaling and regulatory networks, says Rune Linding. Credit: Linding lab, Humboldt Univ.]

In proteomics, it can seem that insufficient numbers of the cell’s proteins are detected or that only the most abundant ones are seen, says Linding, but mass spec does detect plenty. Unlike increased detection sensitivity for mRNAs, which are much less abundant than cellular proteins, even small increases in the sensitivity of protein detection make a big difference.

Given that sc-proteomics is in its infancy, many challenges also remain for the computational analysis, says Jürgen Cox from the Max Planck Institute of Biochemistry. In sc-proteomics using mass spec, isobaric labelling is quite promising, as several single-cell channels can be multiplexed for a single mass-spec measurement. Including additional channels, such as multi-cell samples, enhances signal detection in the mass spectrometer and helps establish quantification standards.

It remains challenging in computational proteomics to interpret the multiplexed quantification channels, given that “isobaric labeling techniques are notoriously plagued by co-fragmentation signals,” says Cox.


Fragmented peptides are “measured involuntarily” and add unwanted contributions to the signals from peptides of interest. “We are working on normalization methods and signal modeling that will improve the situation,” he says. Missing values in the data matrix are inherent to all single-cell technologies and usually need more attention than with bulk ‘[...]’ data. Taken together with [...], he says, special algorithms are needed for these issues.

Big physics

When physicists work with particle accelerators, probabilities are applied to the likelihood something is detected. “That, I think, we are lacking in biology,” says Linding. A mass spec instrument is a little like a particle accelerator: when a signal of a certain size is detected, it might or might not be a true signal. He asks, “Can we assign uncertainties to events?” Labs juggling big data should work with probabilities, not averages, he says. Models can then help with data integration, and with explaining cellular behavior, to tease out how different quantitative data are related or to determine causality. “To do all this, we need probability-based models,” he says, also for predicting behaviors of protein networks.

More labs can now use large datasets to, for example, predict how proteins are interacting. Beyond measuring protein abundance or concentration, cancer research labs will want to track [...] activity quantitatively, learn how many phosphorylation sites there are and how many target proteins can be phosphorylated. “It is the activity of the molecule that is important, not necessarily the abundance,” says Linding. Biology is changing: more tools are emerging to quantify different aspects of signaling and of the regulatory networks in and between cells, and sc-proteomics helps with that. Traveling between data captured at different scales is non-trivial, says Linding, which calls for new algorithms, and for machine learning combined with more traditional mathematical models.

Kharchenko agrees with the need for probabilistic data interpretation. In many cases, the probability of detection will be close to 1, he says, but the uncertainty in abundance will remain. It will take more experiments to figure out the structure of variation in such measurements. Models are needed to estimate this resulting uncertainty.

Counting mRNAs and proteins

At one point, labs will want to integrate mRNA and single-cell protein measurements. That will bring a longstanding discussion to the fore. The transcriptome correlates with the proteome but, given a certain mRNA concentration, it’s hard to model how many proteins from that RNA are present. Beyond cell-to-cell variability in both transcription and translation, and varying levels of protein turnover, there are phenomena such as transcriptional bursts and many external factors that influence cells, mRNAs and proteins. Much about the mRNA–protein relationship has yet to be determined.

Aebersold, along with Jürg Bähler of University College London, and colleagues, quantified the transcriptome and proteome in fission yeast by looking at two cell-states: quiescence and rapid proliferation9. “There were a lot of surprises,” says Aebersold, that shed light on the relationship between gene regulation, transcription, translation and physiology. When fission yeast shifts to dormancy, proteins and mRNA are down-regulated, each in specific ways.

Proliferating cells had around 41,000 mRNAs per cell, with a mean of around 3 transcripts per gene. Protein-coding genes produced a mean of around 5,000 protein copies per cell, with a dynamic range covering five orders of magnitude up to a total of around 60 million protein molecules per cell.

Quiescent cells had a much-reduced transcriptome, with a little over 7,400 mRNAs, and around 31 million protein molecules per quiescent cell. When adjusted for the lower cell volume in quiescent cells, protein numbers dropped by around 10% of the levels in proliferating cells. The mRNA levels drop, and the protein levels less so: when a good food source comes, “they’re ready to run,” says Aebersold.

The shift to quiescence is accompanied by proteome remodeling: nearly half of all proteins changed their copy numbers around twofold. Protein levels drop but not indiscriminately, and those the cell will need when it emerges from quiescence stay at higher levels. Overall, in quiescent cells, mRNAs were between 10,000- and 60,000-fold less abundant than the corresponding proteins, with 1 to 10 mRNAs per gene. The results highlight an sc-proteomics challenge, says Aebersold. For example, a twofold difference in mRNA level can lead a lab to interpret the cells as different, which they might be, or their cell physiology may have merely shifted, temporarily and stochastically.

A human cell contains hundreds of mRNA copies with proteins numbering in the tens of thousands, says Aebersold. One might find only a few copies per cell of a lowly expressed mRNA, which makes it ever more important to determine what kind of “stochastic time-space” a cell is in, he says, and the risk of drawing false conclusions again rears its ugly head. “I think there’s a lot of pitfalls in the interpretation, from a biological point of view, of single-cell data, which one would have to be aware of.” Techniques matter, but when developing and using them one must heed “what they’re good for,” he says. “Then it gets interesting when they can uncover something new.”

Sc-next

As sc-proteomics emerges, labs take a diverse set of paths. As Vogel says, “it might be a combination of different advances that will lead to near single-cell proteomics.” One needs efficient protein extraction from cells, enrichment via specific surfaces, sensitive mass spec needing less and less sample, tagging tricks to enhance identification, and advances in mass spec data acquisition and computational processing. Nevertheless, Vogel is happy about some recent developments, including new mass spec data acquisition methods and more libraries of known spectra, which increase the possibilities of mined data.

“There is something to be gained by having different approaches, particularly in the early phase,” says Linding. Given that it’s early days in the field, he says, this is not the time to bet only on one horse. ❐

Vivien Marx
Technology editor for Nature Methods.
e-mail: [email protected]

Published online: 12 August 2019
https://doi.org/10.1038/s41592-019-0540-6

References
1. Hofstadler, S. A. et al. Anal. Chem. 67, 1477–1480 (1995).
2. Collins, B. C. & Aebersold, R. Nat. Biotechnol. 36, 1051–1053 (2018).
3. Swaminathan, J. et al. Nat. Biotechnol. 36, 1076–1082 (2018).
4. Budnik, B., Levy, E., Harmange, G. & Slavov, N. Genome Biol. 19, 161 (2018).
5. Specht, H., Emmott, E., Koller, T. & Slavov, N. Preprint at bioRxiv https://doi.org/10.1101/665307 (2019).
6. Huffman, R. G., Chen, A., Specht, H. & Slavov, N. J. Proteome Res. 18, 2493–2500 (2019).
7. Barkas, N. et al. Nat. Methods https://doi.org/10.1038/s41592-019-0466-z (2019).
8. Robin, X. et al. Preprint at bioRxiv https://doi.org/10.1101/621961 (2019).
9. Liu, Y., Beyer, A. & Aebersold, R. Cell 165, 535–550 (2016).
