Jeanne F. Loring, Ph.D. Professor and Director Center for Regenerative Medicine Department of Chemical Physiology 10550 North Torrey Pines Road MC SP30-3021 La Jolla CA 92037 Tel: 858-784-7767 Fax: 858-784-7333 To: ICOC members E-mail: [email protected] From: Jeanne Loring, Ph.D. January 24, 2014 Request for consideration of the merits of the Center for Advanced Genomics; application number C1R-06709: Stem Cell Genomics Centers of Excellence RFA 12-06: application by Scripps Research Institute and Illumina, Inc.

Dear members of the Independent Citizen's Oversight Committee:

I am writing this letter to ask the ICOC to consider the unique merits of our proposal for a Stem Cell Genomics Center of Excellence. The application was ranked in Tier 1, recommended for funding, by the Grants Working Group. The proposal is a joint program shared by the Scripps Research Institute and the San Diego genomics company, Illumina, Inc., with whom I have had long-term collaborations, and we have the goal of together developing cutting edge genomics technologies to support the California stem cell community in research and clinical studies. It may be of interest that my Scripps co-investigator, Dr. Nicholas Schork, has just moved to the new J. Craig Venter Institute, giving us even broader access to genomic expertise. Since I have attended many of the ICOC's meetings, I am sensitive to the time constraints of the committee members who devote a large part of their time to oversight of CIRM's activities, so I will keep this message brief. The ICOC has stewardship of CIRM's funds and is developing plans to maximize the long-term impact of CIRM's investments in stem cell research and development. A key element of your future plans is to form partnerships with strong industrial organizations in order to sustain stem cell research and clinical development beyond CIRM's current time frame. I attended the January 22, 2014 meeting of the Citizens Financial Accountability Oversight Committee, the government committee headed by the State Controller's Office tasked with advising CIRM on its financial practices. At that meeting, and at the prior meeting a year ago, the dominant theme was that CIRM must form more partnerships with stable organizations such as industry. Specifically, last year CIRM was advised to "Establish a platform to enable grantees, industry, other government agencies, disease foundations, venture capitalists and others to continue to pursue CIRM’s mission upon the expiration of CIRM’s bond funding" My partnership with Illumina resonates with CIRM's mission to have long-term influence on human health. Illumina is the world leader in inventing and implementing genomics technologies. The mission of Illumina's new Chief Medical Officer, Dr. Rick Klausner, who was director of NIH's National Cancer Institute and Director for Global Health at the Bill and Melinda Gates Foundation, is to improve human health by enabling researchers and medical centers to have access to Illumina's genomics technology and scientific talent. Dr. Klausner enthusiastically supports our collaboration. Illumina is successful because it develops cutting edge genomics instruments and technologies and continually drives down the price of genomics tools; it recently announced a new instrument that can sequence a human genome for the landmark cost of $1,000. The goal of our partnership is for CIRM to have the greatest possible ability to participate in the future development of stem cell genomics. The practical value of the stem cell-genomics partnership with Illumina would be immense and swift. Genomics-based diagnostic tests are being developed at a rapid pace, replacing less informative tests, and Illumina's sequencer is the only one so far approved by the FDA for clinical diagnostics. With Illumina as a partner, stem cell scientists in California would be able to access clinically relevant genomics tools in development that can have immediate impact on their translational and clinical studies, including diagnostics for cancers, heart disease, inherited diseases, and importantly, HLA typing that can be used for determining the transplant compatibility of cells in stem cell banks worldwide. We plan to use the same technology to develop stem cell- specific assays that will enable standardized tests for qualifying cells through FDA approval. The single cell analysis methods we plan to develop will allow us for the first time to examine the heterogeneity of cell cultures that are used for transplantation. Our application was recommended for funding by the Grants Working Group in Tier 1. The reviewers viewed the partnership of Scripps with Illumina positively: "The organization of the proposed Genomics Center is well conceived as a collaboration between highly qualified investigators from an academic institution and an industry partner, representing a diversity of competencies. The balance between expertise in stem cell biology and genomics technologies is a particular strength". Although our reviews were almost all positive, the wide range of reviewers' scores (70 to 88) indicates that some of the reviewers were far more impressed than others. The issues that concerned the lower scoring reviewers are specifically addressed in a fact-based appeal of the working group's review that I submitted to CIRM on January 20. Here, for your information, is a brief summary of the appeal; there were 4 concerns raised by the reviewers. 1. I did not promise additional money from TSRI. This was noted as the only serious concern for the reviewers; however, institutional contributions were not part of the evaluation criteria in the RFA. We believe that it was an inappropriate review criterion because it is not based in scientific merit. As CIRM staff emphasized in the review report: "the GWG's scores and recommendations were based solely on scientific merit" That issue aside, since Illumina has committed to supporting this project and currently has a market cap of $18 billion, we are certain that we will not lack the infrastructure to carry out our proposed goals. 2. PluriTest, our user-friendly genomics-based diagnostic tool developed with CIRM funding, was praised by some reviewers as valuable for providing "objective standards for assessing cell fate and for quality control of cell populations." But some reviewers claimed that it was not widely used. In our appeal, we pointed out that PluriTest has been used more than 7,500 times by researchers in 29 countries and was the most highly cited pluripotency assay for iPSC lines in 2013, eclipsing the mouse teratoma assay. 3. Some reviewers were concerned that Illumina's tools would be too expensive or too difficult to use. The separate letter from my Illumina co-PD explains that this is incorrect. 4. Some reviewers thought that the Program Directors are too committed to devote enough time to the Center. But both Dr. Ronaghi and I stated clearly that we would make the Center our highest priority. My lab currently works almost entirely on stem cell genomics, and I stated that I would transfer my current grants to other PIs so that I could focus primarily on the Center. All other issues aside, we believe that our partnership with Illumina is critical at this point in CIRM's tenure. If you do not support this joint project, you will be walking away from the best possible opportunity to rapidly propel stem cell technology and genomic medicine forward in California. Scripps' reputation for excellence in research and Illumina's stability and great track record would ensure the longevity of CIRM's investment. You are in the unusual position of having to select the program with the best value from the Tier 1 applications recommended by the Grants Working Group. I urge you to judge our academic-industry partnership on the basis of our history of strong collaboration and productivity, and its power to continue CIRM's legacy for many years into the future. I understand that the ICOC is able to consider other options, such as funding two complementary Centers in Northern California and Southern California. Or you may think that combining our program with another of the Center proposals would be the most expeditious choice. I know that you will thoughtfully determine the best opportunity possible. I know that the CIRM President is currently planning to recommend another application, a consortium of 7 academic institutions and non-profits dispersed throughout the state. We deliberately kept our Center small to keep administration costs low and have an efficient decision process. This strategy is designed to maximize the benefit to all California stem cell researchers who wish to use genomics technologies. I have not had the opportunity to discuss the value of industry partnership with the President, but I'm certain that he knows that it is a priority. I know that it is difficult for you to disagree with the recommendation of the President. But I ask you to independently consider what is best for California in the long term. You have disagreed with the President in the past when you believed that you were acting for the greater good of Californians. You are the individuals who are accountable for CIRM's future, and I appreciate that you will make your decision in the best interest of California's citizens.

The decision is in your hands, and I know that you will take your roles as independent judges and the guardians of CIRM's legacy very seriously.

Thank you for your attention, and please do not hesitate to contact me if I can clarify any questions you may have.

With best regards,

Jeanne F. Loring, Ph.D. Professor and Director Illumina, Inc. 5200 illumina way San Diego, CA 92122 tel 858.202.4500 fax 858.202.4545 www.illumina.com

Mostafa Ronaghi, Ph.D. Chief Technology Officer & Senior Vice President tel: 858.202.4510 [email protected]

Diane Donohue Executive Assistant tel: 858.202.4735 [email protected] October 28, 2012

January 20, 2014

Dear Members of the Independent Citizen's Oversight Committee (ICOC):

I am writing this letter to provide the ICOC with assurance that tools developed by Illumina as a partner in the Center for Advanced Stem Cell Genomics will be accessible, both in price and in support, to the stem cell community of California. Illumina is the global leader in genomics technologies, with 80% of the next generation sequencing market. Illumina has a strong track record of driving down the price of genomics tools; last week we announced new instrumentation that will allow the human genome to be completely sequenced for $1,000. Each machine can sequence about 20,000 genomes/year. We already have orders for 30 of these instruments. At Illumina, we are committed to provide integrated solutions to advance the understanding of genetics and human health. It is Illumina’s strong interest to make a contribution to this CIRM genome initiative and plan to have both stem cell researchers and Illumina succeed in this partnership. Instead of providing sequencing and other genomics platforms to researchers in the field through our regular sales channel, we chose to participate in this program more intimately, such that our latest developments in DNA sequencing and genetic analysis can have a significant impact even before they become commercial products. Through working with our collaborators and the CIRM community, we will also learn a great deal about the technology requirements for this fast- developing field, which will in turn help us to develop the best products to serve the entire research community. We proposed to design new technologies specifically for the needs of stem cell scientists for preclinical research and clinical studies; these include stem-cell specific genomic diagnostic tools to measure genomic stability, epigenetic state, and cell lineage identification, as well as single cell technologies to analyze clinical stem cell preparations. We received a very positive review from the Grants Working Group, saying that our plans "are at the forefront of technical advances", and would "deliver technologies extremely valuable to the stem cell community. Another comment said: "the PI and team are exceptionally well qualified to deliver on this project." The reviewers praised our partnership with Scripps: "the balance between expertise in stem cell biology and genomics technologies is a particular strength" and "the teams from the two applicant institutions have a well-established, strong working relationship; reviewers considered this an important attribute of this proposal". They said that "the organization of the proposed Genomics Center is well conceived as a collaboration between highly qualified investigators from an academic institution and an industry partner, representing a diversity of competencies," and "reviewers considered it a strength that... these projects will both create novel tools and technologies and validate them". Some comments made it clear that it was understood that Illumina would make their technologies accessible: One comment said: "The industry partner institution...is well positioned to develop novel cutting edge genomics technologies and make them accessible to customers". Another said: "...the Center is well designed to support collaborative research projects and to make relevant state-of-the-art genomics technologies readily accessible to investigators with primary expertise in stem cell biology or translational research". But we are concerned that some of the reviewers were still questioning the accessibility of our technology for stem cell researchers: There was some concern that some of the tools may not be made easily and widely available. Concern was expressed about whether potential collaborators who have limited experience in genomics would receive adequate assistance in designing their proposed studies. Concern was expressed about whether the new technologies would be specifically disseminated to California investigators and whether their cost might be prohibitive to many researchers.

I would like to specifically address these concerns. We are successful because we make cutting edge genomic tools that customers can afford to use. We are committed to the stem cell community as we have been to cancer researchers and large-scale genomics efforts. In the past, Illumina has been a partner in many NIH-supported large scientific initiatives such as the International HapMap project, numerous GWAS studies and the 1000-Genome project. We specified in the grant application that more than 30% of the funds would be dedicated to grants for California researchers and physicians. Through the granting program Illumina and Scripps will provide help in planning experiments, the tools for both genomics and stem cell biology, and help with analysis. We are extremely enthusiastic about the opportunity to work with the CIRM community to build a first class stem cell genomics infrastructure that will help position California as the world leader in both basic and translational stem cell research. Please do not hesitate to contact me if you are interested in specific projects or Illumina's plans.

With best regards,

Mostafa Ronaghi

Jeanne F. Loring, Ph.D. Professor and Director Center for Regenerative Medicine Department of Chemical Physiology 10550 North Torrey Pines Road MC SP30-3021 La Jolla CA 92037 Tel: 858-784-7767 Fax: 858-784-7333 E-mail: [email protected]

To: CIRM President Alan Trounson, Ph.D., Gil Sambrano, Ph.D., and CIRM Scientific Staff From: Jeanne Loring, Ph.D. January 20, 2014

Appeal of GWG Review and Request for Consideration of the CIRM President's Recommendation for RFA 12-06, Stem Cell Genomics Centers of Excellence.

Dear Dr. Trounson, Dr. Sambrano, and CIRM Scientific Staff: As Program Director for the proposed project, "Center for Advanced Stem Cell Genomics," from the Scripps Research Institute and Illumina, Inc., I request reconsideration of our application. Please note a change in status of one of our subcontractors: Nicholas Schork, Ph.D. has relocated from Scripps to the J. Craig Venter Institute, which adds JCVI as a member of our program's team. We want to point out that we are bringing up two types of issues: factual errors by some of the reviewers, and the basis for the recommendation by CIRM's President. First, we understand that the grounds for appeal of the GWG decision are strictly limited to a “material dispute of fact, which does not include disagreements over interpretation or analysis of facts by the GWG or b specialist reviewers." Since our application was scored in Tier 1 and recommended for funding by the GWG, we are not appealing their positive decision, but rather in the section below we will raise factual errors that we believe had a negative effect on the scoring by individual reviewers. Second, four applications were recommended for funding in Tier 1. We are aware of the others, but will not mention them here. In all other previous award competitions, all of the Tier 1 applications have been considered favorably for funding by the ICOC, which usually approves all or almost all of them. In this case, however, there will probably be only one awardee, and the CIRM's President will recommend his preferred applicant to the ICOC. He has not chosen our application for his recommendation, but this does not mean that our application is inferior and should not be considered; if this were the case, it would have been placed in Tier 2 or 3. Because the decision is solely the President's, we must ask the President and the ICOC to carefully examine the pros and cons of choosing a particular applicant, and to consider the unique merits of our proposed Center. We will not discuss this further in this letter; we are separately asking the ICOC members to use their own judgment in this case rather than relying solely on the President's recommendation. Appeal of GWG scoring Our scores ranged from 70 to 88. The positive reviews highlighted the tremendous track records of the PDs, our well-established strong working relationship, the balance between excellence in genomics technologies and stem cell biology, the inclusion of a well-established industry partner, and the expectation that the proposed Center would accomplish its goals. We believe that the lower scores were made by reviewers who were mistaken about some of the facts we presented.

1 We highlight 4 instances of factual misunderstanding: 1. The only serious concern in our review was that the PD's institution did not contribute additional funds. " Although a letter from leadership indicates enthusiastic institutional support from the academic institution, no additional funds or specific dedicated space have been designated. Reviewers expressed serious concern about this lack of material commitment". However, there was no requirement in the RFA for the PD's institution to contribute additional funds. This comment in our review suggests that other applications did offer to provide extra money for their Centers. Since there was no written request for additional funds, and we were not informed that contributions would be expected or considered as a measure of scientific merit, we were put at a significant disadvantage. It is clear from our review that this additional funding was a scientific merit criterion for the GWG, not only because of their "serious concern" about our lack of additional money, but because as was stated in the message from CIRM to applicants, "the GWG's scores and recommendations were based solely on scientific merit." We question the motives of any institutions that offered additional monetary support and ask if they were informed privately that this would be considered positively in the scientific merit review. It seems appropriate to comment here that even without explicit promise of additional funds, Illumina, with its current market cap of $17.2 billion and its stated intention to provide access to stem cell tools, would be expected to be invested in the success of the joint project.

2. Some reviewers thought that our web-based genomics-based pluripotency assay, PluriTest, has not been widely adopted and therefore was not useful. "Reviewers' opinions about the utility of an already existing analytical tool, to be further developed under this award, were divided. Some judged it positively as an important tool that has been made freely available in its current form and were enthusiastic about the plan for dissemination of the updated version. Conceptually, they considered the proposed approach to be very valuable, as it has the potential to provide objective standards for assessing cell fate and for quality control of cell populations. Other reviewers expressed concern that the current tool has not been widely adopted in the stem cell community, calling into question its usefulness." The statement about the current tool not being widely adopted is factually incorrect. PluriTest, our user-friendly genomics-based pluripotency diagnostic tool, was developed under a CIRM grant and became the most highly cited pluripotency assay for iPSC lines in 2013, used for more lines than the classic teratoma assay. Notably, publications in Nature, PNAS, and Cell Stem Cell used PluriTest as their sole pluripotency assay. At the time of the review in October, there were 484 unique registered users and data from 7,428 microarray analyses had been uploaded to www.pluritest.org. As of January 20, 2014, there are 518 unique registered users, and 7,851 files have been uploaded for analysis. A recent survey of the registered users shows that researchers in 29 countries use PluriTest: Australia, Brazil, Canada, China, Denmark, Finland, France, Germany, Greece, Hong Kong, India, Ireland, Israel, Italy, Japan, Korea, the Netherlands, Norway, Poland, Portugal, Russia, Scotland, Singapore, Spain, Sweden, Switzerland, Taiwan, the UK (Great Britain), and the US. Among the groups that have adopted PluriTest are the Wellcome Trust/Sanger Institute, 6 Australian institutions, 5 Japanese institutions, ATCC and Coriell cell banks, Max Planck Institutes, the Broad Institute, Harvard, MIT, Rutgers, and Yale universities, and multiple biotechnology and pharmaceutical companies. In California, all of the major institutions receiving CIRM funding for pluripotent cells use PluriTest.

3. Some reviewers thought that Illumina would not make novel technologies available to stem cell researchers. Most of the reviewers thought that having Illumina as a partner was a strongly positive aspect of the proposal, and that the stem cell genomic tools developed under the award would be available to the stem cell community: "(Illumina) is well positioned to develop novel cutting edge genomics technologies and make them accessible to

2 customers...The Center is well designed to support collaborative research projects and to make relevant state- of-the-art genomics technologies readily accessible to investigators with primary expertise in stem cell biology or translational research". However, some of the reviewers thought that the tools would not be widely available, that those new to genomics would not receive adequate support, and that the new technologies might cost too much. "There was some concern that some of the tools may not be made easily and widely available." "Concern was expressed about whether potential collaborators who have limited experience in genomics would receive adequate assistance in designing their proposed studies" and "...about whether the new technologies would be specifically disseminated to California investigators and whether their cost might be prohibitive to many researchers". These comments have no basis in fact. We explained, explicitly and multiple times, that Illumina would make the tools developed under this award available at reasonable cost to California researchers, and that in fact much of the 30% of the award devoted to the collaborative projects would be in the form of Illumina's products and services, which includes help with experimental design and interpretation. Illumina has been successful because it has not only invented new genomics technologies, but has focused on driving down the cost of genomic analysis, and this strategy has earned the company a current market value of $17.2 billion. Even in the cold-blooded logic of commerce, it is a well-known marketing strategy that successful collaborators often grow into successful customers. Illumina has promised to support the stem cell community and can afford to do so, and, indeed, expects us to become valued customers as the field blossoms.

4. The level of commitment of the PD and co-PD were questioned. "Some reviewers expressed concern that both the PD and co-PD are already heavily committed individuals and questioned whether they would have the capacity to fully provide a strong commitment to this project". It is NOT true that we are too committed to "fully provide a strong commitment to this project". We have all successfully managed large, multicenter projects. We do this by working in teams, and our teams work together in a well-established effective collaboration. We are no different from other management-level academic and industry scientists who oversee large groups and multiple related projects. In this case, the most important fact is that we stated in our application that the Center would be our highest priority. I stated that I would reduce my effort on my other grants or transfer them to co-investigators so that I could devote all of my research efforts to the Center. All of my lab's research is already focused on human stem cell genomics, so we are well prepared to expand these efforts to encompass the needs of the Center.

In summary, we are grateful to the reviewers for providing an unusually thoughtful and useful critique of our application. Of the many comments and suggestions, we found that only the four listed above were the result of factual misunderstanding by the reviewers. Thank you for your attention to this request, and for the hard work that you do to help us in our efforts to improve stem cell research and develop cures.

With best regards,

Jeanne F. Loring, Ph.D.

3 © 2012 Nature America, Inc. All rights reserved. Rickard Rickard Sandberg ideally as small as single cells. A protocol initially developed for for developed initially protocol studies microarray A single-cell cells. single as RNA, small of as amounts devel ideally starting toward lower require pushed that methods consistently of opment been have techniques These and early embryonic cells embryonic oocytes early and mouse individual for data generate to used La La Jolla, California, USA. Hayward, California, USA. 1 that can be analyzed to accurately quantify expression levels fragments of cDNAs sequence short millions generates (mRNA-Seq) of sequencing parallel massively through of Analyses genome-wide profiling transcriptome in rare cells. for melanoma circulating tumor cells. Our protocol will be useful for addressing fundamental biological problems requiring Seq to circulating tumor cells from melanomas, we identified distinct gene expression patterns, including candidate biomarkers have increased noise, hundreds of expressed differentially genes could be identified using few cells per cell type. Applying Smart- by transcriptomics evaluating it on total RNA dilution series. We found that although gene expression estimates from single cells Wepolymorphisms. single-nucleotide determined the sensitivity and quantitative accuracy of Smart-Seq for single-cell read coverage across transcripts, which enhances detailed analyses of alternative transcript isoforms and of identification protocol (Smart-Seq) that is applicable down to single cell levels. Compared with existing methods, Smart-Seq has improved but it has been technically challenging to generate expression profiles from single cells. Here we describe a robust mRNA-Seq Genome-wide analyses transcriptome are routinely used to monitor tissue-, disease- and cell gene type–specific expression, Daniel Ramsköld and individual circulating tumor cells Full-length m nature biotechnology nature USA. Received Received 14 November 2011; accepted 22 May 2012; published online 22 July 2012; expressed in mouse oocytes had been detected, and it yielded yielded it and microarrays with detected, compared been sensitivity had increased oocytes mouse in expressed transcripts new multi-exon genes are subject to alternative RNAprocessing alternative to subject are genes multi-exon mammalian most As transcripts. full across coverage read generates through reads mapping to reads mRNA through 5 transcripts quantifies that introduced been has RNA-Seq multiplexed for single-cell method a Recently, events. splicing distal identify amplified 3 the preferentially also method mRNA-Seq expres initial in This differences sion. find to power the limits variation technical faithfully represent the RNA population before amplification and how transcriptomes single-cell whether remained Therefore, the question RNA. of amounts small cDNA with to starting when intrinsic protocols is amplification that variation technical the from cells ferent controls, dif between variation biological technical to distinguish it impossible making lacked experiment mRNA-Seq single-cell first Gregory Gregory A Daniels Ludwig Ludwig Institute for Cancer Research, Stockholm, Sweden. 5 ′ Rebecca Rebecca and John Moores Cancer Center, San Diego, La Jolla, California, USA. ends of mRNAs, and hence the data could only be used to to used be only could data the hence and mRNAs, of ends 2 , 3 n ivsiae lent RA processing RNA alternate investigate and 1 7 1 4 These These authors contributed equally to this work. should Correspondence be addressed to R.S. ([email protected]). , 5 2 Department Department of Chemical Physiology, Center for Regenerative Medicine, The Scripps Research Institute, San Diego, La Jolla, California, , 2 , , Irina Khrebtukova , 7 7 , , Shujun Luo

, 8 . Using the method, thousands of genes genes of thousands method, the Using . advance online publication online advance 6 has been adapted for mRNA-Seq and mRNA-Seq for adapted been has R ′ ends NA- 9 3 . Neither of these methods . methods of Neither these , 7 , , Yu-Chieh Wang S 3 , , Jeanne F Loring 2 7 eq eq from single-cell levels of Department Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden. . However, this this However, . 1 , assemble 4

, 5 , there there , 4 4 , , , Robin Li 5 - - - .

4 using either Covaris shearing followed by ligation of adaptors (PE) (PE) adaptors sequencing of ligation by followed Illumina shearing Covaris either standard using construct to cDNA fied ampli Wethe cDNA. of used preamplification PCR of cycles 12–18 techn switching template SMART and priming For Smart-Seq, first we lysed each cell in hypotonic solution and and solution poly(A) hypotonic in converted cell each lysed we first Smart-Seq, For sequencing RNA single-cell robust and Efficient RESULTS transcriptome mapping important cells. in clinically individual, melanoma patient to demonstrate how Smart-Seq enables high-quality to putative circulating tumor cells (CTCs) captured from the blood of a assay we applied this method, of this importance biological strate the To demon cells. single or few from start that experiments of design and post-transcriptional studies, and provide valuable insights into the demonstrate the power of single-cell RNA-Seq for both transcriptional expression vary with different amounts of starting material. Our results differing of detection and sensitivity,how variability assess hensively well dilution as of as series well-defined total purified RNAs, to compre cells, mammalian individual many we from protocol, mRNAs the this Using sequenced mRNAs. of ends the just than more from cDNAs samples which coverage, transcriptome improved markedly alleles. and variants transcript of detection efficient for coverage the provide and expression gene quantify both to used be can that method transcriptome single-cell a for need a is , Louise , C Louise Laurent 6 Department Department of Reproductive Medicine, University of California, San Diego, doi:10.1038/nbt.228 Here we introduce a single-cell RNA-sequencing protocol with with protocol RNA-sequencing single-cell a introduce we Here 3 , , Qiaolin Deng + RNA to full-length cDNA using oligo(dT) oligo(dT) using cDNA full-length to RNA 2 6 , Gary P , Schroth Gary 1 , , Omid R Faridani 3 & o

1 ticles e l c i rt A R oy floe by followed logy, , 3

Illumina, Illumina, Inc., NA NA

libraries libraries

 - - -

© 2012 Nature America, Inc. All rights reserved. PC3 and T24) obtained equally good read coverage ( coverage read good equally obtained T24) and PC3 LNCaP, from each cells (four lines cell cancer from transcriptomes we achieved close to 40% coverage at the 5 cell, eukaryotic small a RNA of in amount ( RNA total compared with previous single-cell transcriptome methods. transcriptome single-cell previous with compared We conclude that Smart-Seq has substantially improved read coverage enabled full-length transcript reconstruction ( coverage our read genes multi-exon for of expressed, 25% all indeed, 1 to ng 100 the from mRNA-Seq to standard using close observed reaching coverage dilutions nanogram with amounts, starting increased with obtained was coverage better even that showed tions Fig. 2a (non-amplified; (non-amplified; mRNA-Seq standard with analyzed LNCaP cells; (cancer Smart-Seq using analyzed origin, line cancer bladder and prostate of cells human individual ( bottom). to ( s.d. bars, Errors (non-amplified). RNA brain mouse of ng 100 on mRNA-Seq standard from data included we comparison, For down. line uppermost from listed are numbers sample and sets, data separate as shown are laboratories) different from data (including series dilution Independent RNA. brain mouse of amounts diluted from generated data Smart-Seq for transcripts over ( replicates. biological among s.d. represent bars Error expected. is coverage read in a decline which after transcripts included shortest the of length the showing line gray dashed vertical the with 3 the from a distance as transcripts the over coverage read the We display right). (top indicated ranges length transcript the with separately, analyzed and lengths annotated n 7; (ref. data transcriptome oocyte mouse age of all transcripts longer than 1 kb ( kb 1 than longer transcripts all of age cover full-length improved considerably has Smart-Seq that strated In previous mRNA-sequencingsingle-cell studies transcripts across coverage improves Smart-Seq RNA. total of micrograms few a to ng 100 from libraries Table 1 We cally generatingtypi >20 samples). million platform, uniquely mapping Illumina reads 20 ( the on library (UHRR, sequencing each RNA sequenced reference human universal and samples) (28 brain mouse samples), (16 brain human from derived RNA total of series dilution from libraries 64 generated we addition and human in or cells, mouse from 42 individual libraries Smart-Seq oh f hs lbay rprto mtos nbe adm shot random ( cDNAs enable of methods sequencing gun preparation library these of Both (Tn5). or Tn5-mediated ‘tagmentation’ technology the Nextera using ( oocytes mouse Smart-Seq–analyzed for transcripts over ( transcripts. 1 Figure s e l c i rt A  single-cell data single-cell oocyte mouse published with comparison direct a mouse enable to from oocytes transcriptomes single-cell sequenced We transcripts. from a pronounced 3 = 2). Transcripts were grouped according to according grouped were Transcripts = 2).

– ). For comparison, we also made several standard mRNA-Seq standard ). we For several made also comparison,

n h Smart-Seq read coverage across across coverage read Smart-Seq = 3) and previously published published previously and = 3) ). Smart-Seq analyses of analyses mouse brain ). RNASmart-Seq at dilu different c a Fig. 1 Fig. ) Read coverage (as in in (as coverage ) Read ) Comparison of read coverage coverage read of ) Comparison n = 12) and for prostate cell line line cell prostate for and = 12) n 7 . Analyses . of Analyses read coverage across demon transcripts n = 4). Error bars, s.d. bars, Error = 4). = 5, 3, 4 and 4 for lines top top lines 4 for 4 and 3, = 5, b ). From only 10 pg input amounts (the estimated estimated (the amounts input pg 10 only From ). b ′ -end -end bias that limited analysis across full-length ) Mean read coverage coverage read ) Mean Supplementary Fig. 1 Fig. Supplementary b ) for 12 ) for ′ end, end,

Fig. 1 Fig.

′ end. Analyses of single-cell Supplementary Table 2 Supplementary Supplementary Supplementary Fig. 3 a and and 7 b a , 8

, , the data suffered Read coverage (%) Read coverage (%) Read coverage (%) ). We generated generated We ). Supplementary Supplementary 100 100 100 Supplementary 20 40 60 80 50 50 0 0 0 Fig. 1 Fig. 0 0 5′ end ′ end(kb) Distancefrom3 Distancefrom3 ′ end(kb) Distancefrom3′end(kb) Distance from3′end(kb) ed(b Distancefrom3′end(kb) Distance′end(kb) from3 Distancefrom3′end(kb) Distance from3′end(kb) 2 1 c ) and, and, ) µ ), ), 4 3 ). ). g - - - - -

mRNA-Seq 4–5 kb 0–1 kb Transcript (%) Non-amplified Smart-Seq and previous mouse oocyte data oocyte mouse previous and Smart-Seq ( reads mapped uniquely million 3 after stabilizing levels expression with cell, per reads ping map uniquely at above a levels million detection on transcript effect ( used RNA had a larger impact on sensitivity than the number of PCR cycles 4a Fig. ( cells analyzed of number the with increasing and ( RNA total starting of pg ~20 with obtained that to similar was cells cancer individual the ( profiles single-cell all in detected reproducibly were transcripts, detected for million mappable reads), which roughly equals the median expression transcripts expressed at 10 RPKM (reads per kilobase exon model and of ~76% that showed LNCaP, the lines) from T24 and each PC3 cells s ihy erdcbe n hs o tcncl variation technical low has and reproducible highly is mRNA-Seq using cells of millions from expression gene of Analyses transcriptomics single-cell of assessment Quantitative sensitivity ( sensitivity to single-cell levels decreased the detection rate of less abundant abundant less of ( rate transcripts detection the decreased levels single-cell to amounts starting the lowering However, mRNA-Seq. standard with compared sensitivity of in ng decline 1 minimal or no or found we RNA, ng total 10 with Starting levels. different at expressed scripts RNA. total reference of amounts microgram to ng 100 from ies total RNA. For comparison, we generated standard mRNA-Seq librar of amounts low on Smart-Seq of transcripts expressed differentially of detection and variability technical sensitivity, assess to Smart-Seq applied and levels picogram and nano- to RNA down total reference of amounts microgram diluted We therefore methods. single-cell of components pre-amplification cDNA the to intrinsic variation nical tech the measured has study mRNA-Seq single-cell no knowledge, First, we addressed the sensitivity of the method in detecting tran in of detecting the method the sensitivity we First, addressed 5 1 0 0 Supplementary Fig. 5) Fig. Supplementary ) Furthermore, we observed that the starting amount of total total of amount starting the that observed we Furthermore, ) advance online publication online advance 7 6 5 2 1 Fig. 2 Fig. Smart-Seq Supplementary Fig. 2i Fig. Supplementary Fig. 2 Fig. 4 3 100 pg 10 pg 0.5–10 ng 3 2 1 2 1 b ). We found that the sensitivity of gene detection for detection of gene We ). sensitivity the that found 1–2 kb 3′ end a 5–7 kb ). Analyses of the 12 cancer cell line cells (four (four cells line cell cancer 12 the of Analyses ). Fig. 2 Fig. Supplementary Fig. 4c Fig. Supplementary 0 0 b c and that the sequence depth had little little had depth sequence the that and

), with ~8,000 genes detected per cell cell per detected genes ~8,000 with ), Read coverage (%) 6 3 100 20 40 60 80 0 5′ end , j ). We conclude that transcript transcript that conclude We ). 7–9 kb 2–3 kb nature biotechnology nature 9 7 0 demonstrated similar similar demonstrated Transcript (%) Oocytes (ref.7) Oocytes (Smart-Seq) 0 , d 215 12 6 3 ). Comparisons of ). Comparisons 2 1 Smart-Seq Supplementary Supplementary mRNA-Seq Cancer cells Non-amplified 9 1 , 9–15 kb 4 3–4 kb . To our our To . 4 3 3′ end - - - -

© 2012 Nature America, Inc. All rights reserved. from log from and Smart-Seq generated data ( data generated Smart-Seq and ( RNA input ng 100 on data mRNA-Seq standard from estimated levels expression gene brain and UHRR human between differences relative the ( within replicates in bins of genes sorted according to expression levels. Error bars, s.e.m. ( s.e.m. bars, Error levels. expression to according sorted genes of bins in replicates within ( (yellow). lines cell cancer different from cells single among comparisons and blue), and (purple line cell same the of cells multiple among (blue), line cell cancer same the from cells single among comparisons pair-wise We show cells. T24 and PC3 in (as level expression to according binned pairs, replicate within detected ( data. mRNA-Seq standard from RNA brain and UHRR human of a comparison as well as (non-amplified) RNA input ng 100 with protocols mRNA-Seq standard using analyzed UHRR human of replicates technical of a comparison both added we controls, As indicated. as RNA total UHRR human of amounts diluted from generated data Smart-Seq We used interval. confidence 90% and mean the report and replicates of groups within comparisons pair-wise all We performed level. expression to according binned pairs, replicate in detected reproducibly generated from diluted RNA and individual cells. Comparison of of Comparison data and previous mouse oocyte Smart-Seq cells. individual and RNA diluted from generated cells. single in even detected reliably are transcripts expressed of and majority vast highly the of majority low-abundance the still but transcripts, low-abundance of loss random to lead that of RNA amounts starting by limiting affected is sensitivity detection 2 Figure nature biotechnology nature from oocyte to oocyte) across the whole range of expression levels levels ( expression of range whole the across oocyte) to data oocyte in from variability (lower Smart-Seq with expression of estimation as as standard mRNA-Seq and that Smart-Seq on 1 ng total RNA showed on 10 that ng Smart-Seq RNAtotal variability had the same technical ( level of as the expression a function variability the we computed levels, expression on transcript depends ( among dilution replicates at 10 pg (Pearson correlations of 0.65–0.75) than type same the of cells individual among 0.75–0.85) of relations cor (Pearson correlations higher observed we data, RNA-dilution the against cells single the from data Comparing RNA. of amounts replicates of diluted RNA showed concordance increasing with larger d Supplementary Supplementary Fig. 6 Supplementary Fig. 2k Fig. Supplementary ) Standard deviation in gene expression estimates in replicates (as in in (as replicates in estimates expression gene in deviation ) Standard Second, we determined the reproducibility in expression levels levels expression in reproducibility the determined we Second, a c Standard deviation, log10 Gene detection (%) 100 0.4 0.8 1.2 1.6 2

20 40 60 80 0 transformed relative gene expression profiles, together with nonlinear loess regression curves (green) and and (green) curves regression loess nonlinear with together profiles, expression gene relative transformed Sensitivity and variability in Smart-Seq from few or single cells. ( cells. single or few from Smart-Seq in variability and Sensitivity 0 . . .1525303.5 3.0 2.5 2.01.5 1.0 0.5 0 . . .1525303.5 3.0 2.5 2.01.5 1.0 0.5 0 Smart-Seq 10 pg 20 pg 100 pg 1 ng 10 ng Gene expression(RPKM),log Gene expression(RPKM),log ). ). As variability in measurements of expression mRNA-Seq ). Correlation analyses between technical technical between analyses Correlation ).

Brain versusUHRR Non-amplified advance online publication online advance Smart-Seq mRNA-Seq y axis) starting from 1 ng total RNA ( RNA total 1 ng from starting axis) 10 ng Brain versusUHRR Non-amplified 10 pg 20 pg 100 pg 1 ng Fig. 2c Fig. 10 10 c 7 , ) Standard deviation in gene expression estimates estimates expression gene in deviation ) Standard demonstrated improved demonstrated d ). This analysis showed ). analysis This d b Standard deviation, log10 Gene detection (%) 100 0.2 0.4 0.6 0.8 20 40 60 80 0 0 b . . .1525303.5 3.0 2.5 2.01.5 1.0 0.5 0 . . 2.01.5 1.0 0.5 0 ) Percentage of genes reproducibly reproducibly genes of ) Percentage Dilution series

Data fromFig.2b e ), 100 pg total RNA ( RNA total pg 100 ), Gene expression(RPKM),log Gene expression(RPKM),log c ). ( ). - a a e ) Percentage of genes genes of ) Percentage ) for human LNCaP, human ) for – g in technical variability, particularly for less abundantly expressed expressed ( abundantly transcripts less for particularly variability, technical in increase clear a was there levels, picogram to down amounts input ( noise technical in increase modest a only Smart-Seq datafrom Smart-Seq eitherhuman brain sample reflecting orUHRR, expressionin distorted with transcriptspopulations of two identified ( correlation good overall showedRNAinput pg 10 with ( 0.77 and correlations0.87 Spearmanof had relative expression in Smart-Seq and standard mRNA-Seq, respectively, ( genes expressed highly in biological variation extensive revealed origin different of cells cancer vidual mRNA-Seq standard using UHRR ( and samples brain human of comparisons in found variation biological the to RNA total of levels concordance( Smart-Seq with different amounts of input RNA, we again found a from highestimated those mRNA-Seqto standard minususing estimated brain) (UHRR levels expression gene relative Comparing profiles. expression original the of representative were material preamplified Fig. 2 ) Scatter plots showing showing plots ) Scatter ial, e sesd hte snl-el xrsin rfls from profiles expression single-cell whether assessed we Finally, Between celllines Within celllines Dilution series Between celllines Within celllines c ). ). Notably, analyses of gene-expression variation between indi 80 cells Data fromFig.2a Single cells Single cells 10 cells 80 cells Single cells Single cells 10 cells n f . . 3.5 3.0 2.5 Fig. 2 Fig.

) and 10 pg total RNA ( RNA total pg 10 ) and ≥ ≥ Fig. 2e Fig. 10) 10) 10 10

c

). We compared technical variability at picogram picogram at variability We). technical compared

g

). Starting with 1 ng or 100 pg total RNA,the total pg 100 or ng 1 with Starting ). e g f Smart-Seq, 10 pg (RPKM, log2) Smart-Seq, 100 pg (RPKM, log2) Smart-Seq, 1 ng (RPKM, log2) −10 −10 −10 −5 −5 −5 10 10 10 0 5 0 5 0 5 1 50510 5 0 −5 −10 1 50510 5 0 −5 −10 1 50510 5 0 −5 −10 g y Pearson r=0.58 Spearman Pearson r=0.88 Spearman r=0.87 Pearson r=0.79 Spearman r=0.77 = ). Correlation coefficients computed computed coefficients Correlation ). Fig. 2 Fig. x lines (red). lines mRNA-Seq (RPKM,log mRNA-Seq (RPKM,log mRNA-Seq (RPKM,log d r =0.57 Fig. 2 Fig. ). Fig. 2e Fig. c ticles e l c i rt A ). When lowering lowering When ). ,f ). Comparisons ). ) but 2g) Fig. 2 2 2 ) ) ) x axis) axis)  -

© 2012 Nature America, Inc. All rights reserved. mouse ESCs levels in sequence-depth matched single-cell mRNA-Seq data. Smart-Seq data from diluted mouse brain RNA (green) compared with previously published the arrows ( variance. The numbers of significantly differentially expressed genes per pair-wise cell line comparison are shown next to global gene expression profiles. Projections are shown based on the first two dimensions that capture most of the conducted for 12 individual cancer cells (four cells each from the PC3, LNCaP and T24 cancer cell lines) based on to cell line of origin using single-cell Smart-Seq transcriptomes. Singular-value decomposition (SVD) analysis was line cells using Smart-Seq. ( ( approximated1 data the from estimated slopes loess the is, that ing, LNCaP and T24 cell lines from single-cell Smart-Seq analysis on four cells per cell line as a function of estimated false discovery rate. coverage is shown as a heatmap with darker blue indicating higher read coverage. ( measurements are plotted. ( skewed but transcriptome preserved overall the foundamounts) and starting microarray study had analyzed PCR-amplified cDNA (from picogram ( biassystematic no found we but disproportionateto lead amplificationtranscripts, also short ofcould with such minute RNA amounts ( starting when transcriptslow-abundance of mostly losses, stochastic Figure 3 s e l c i rt A  Seq data ( could be assessed, compared to previously published single-cell mRNA- twofold increase in the number of potentialindividual alternatively cells. The splicedimproved exonsread coveragethat withinfer Smart-Seqexon inclusion resulted levels in fora known alternatively spliced exons in the 12 q hundreds ( lines cell three identified the among we genes expressed and differentially of origin of line cell to according tered clus line) cell each from (four cells individual 12 of expression gene (PC3 and and cell cancer (T24) line The LNCaP) cells. bladder global prostate from transcriptomes single-cell on analyses our focused we methods, published previously with RNA of compared amounts low on Smart-Seq of performance improved the demonstrated Having differences post-transcriptional and Transcriptional ences in expression for detected transcripts. relativediffer preserved general, in cells, single or few fromanalyses included exon 13 of the single-cell transcriptomes, as seen in readtype–specificsplicing.alternative Cellinferred from coveragesplicingbe could across the differentially cells. We used the Bayesian mixture of isoforms framework (MISO) has hampered the ability to identify alternative splicing differences in single expression of much lower levels in T24 cells (15% mean inclusion levels) whereas low included in LNCaP cells (93% mean inclusion level) but was included at Fig. < 0.05 ANOVA; 0.05 <

a

The pronounced 3 SVD component 2 –100 100 150 –50 50 2e 0 –100 1

– 0 Transcriptional and post-transcriptional analyses of cancer cell g . Our data from 1 ng and 100 pg total RNA showed no skew Fig. 3 ). Together, these results demonstrated that transcriptome that demonstrated resultsTogether, ). these P 8 < 0.05, 1-way ANOVA and Tukey post-hoc test). ( and ESC-derived cells (red) and 12 Smart-Seq–analyzed individual prostate and bladder cell line cells (purple). Individual RNA or cell T24 (n=4) PC3 (n=4) NEDD4L –50 433 b ), substantially improving our ability to detect alternative P SVD component1 < 0.05 post-hoc test). post-hoc 0.05 < ′ -end bias of previous single-cell mRNA-Seq studies 0 in PC3 cells precluded inclusion level estimation. NEDD4L 731 c a ) Single-cell Smart-Seq reads mapping to a portion of the ) Categorization of individual cells according 50 660 gene ( LNCaP (n=4) Fig. 2a Supplementary Fig. 7 Fig.Supplementary 100 Fig. 3 , g 150 ). Preamplification of cDNA c ). This exon was frequently b Number of detected exons (×1,000) 10 15 20 25 30 35 40 45 0 5 g10p 0p 10pg 50pg 100pg 1 ng ). A previous A ). Mouse brain b Total RNA ) Mean number of exons with sufficient read coverage for MISO analyses of exon inclusion Fig. 3 Fig. 1

1 to

a - - - ;

ES/other Mouse Single cells cells of mary melanocytes ( we also generated Smart-Seq libraries from single cells derived from pri of melanocyte lineage–specific markers, as all NG2 melanocytic origin of the putative melanoma CTCs came from analyses Smart-Seq or mRNA-Seq (data not shown). Additional support for the from the human brain RNA samples that were previously analyzed with celltypeoforigin ( of gene expression levels showed a clear clustering of cells according to line cells. Unsupervised hierarchical clustering and correlation analyses instead were highly similar to primary melanocytes and melanoma cell ode and white blood cell samples), as well as embryonic stem cells, and purification with a MagSweeper instrument drawn(Illumina) from a patient with recurrent melanoma using immunomagnetic on our unbiased selection of the 100 transcripts most strongly associated As the NG2 UACC257, mRNAs levels for NG2 lymphomacell lines (BL41 and BJAB) tocompare them toblood cells. The putative CTCs were distinct from to peripheral blood lymphocytes ( six single NG2 biomarker identification. To this end, we generated transcriptomesvide data fromto support the use of this method for unbiased cancer-specific analysesputativeof tumorpro couldrevealoriginandtheirCTCsof ducible single-cell transcriptomes, we asked Havingwhether demonstrated globalthat Smart-Seq generatestranscriptome quantitative and repro transcriptomes cell tumor circulating of Analyses alternative RNA processing in single cells. We conclude that Smart-Seq considerably improves our ability to detect hn % as dsoey ae ( rate discovery false 1% than differentialexoninclusion levelsamong three thecelllines,less witha comparisonthisIn threeofcancer celllines, foundwe exons100 with d MITF ) Number of differentially included exons identified among the PC3, + Human cells expressed high levels of melanoma-associated genes (based cancer NEDD4L Single cells cells advance online publication online advance 1 6 but not immune markers such as n + = 3) cells and from human embryonic stem cells (ESCs, NM_001144967 putative CTCs were isolated from blood, it was important LNCaP c + gene locus from four individual T24 and LNCaP cells. Read NM_015277 putative melanoma CTCs isolated from peripheral blood T24 MLANA cell 4 cell 3 cell 2 cell 1 n cell 4 cell 3 cell 2 cell 1 Fig.4 = 2), melanoma cancer cell line (SKMEL5, 55,996 kb NEDD4L a 1 and 4 55,998 kb , TYR

SupplementaryFig. 8 i. 3 Fig.

1 Supplementary Fig. 9 5 600k 602k 604k 606k 56,008kb 56,006kb 56,004kb 56,002kb 56,000 kb and the melanocyte specific m-form d

1 d and 3

and immune tissues (lymphn Number of exons

100 200 300 PTPRC nature biotechnology nature 0 upeetr Tbe 3 TableSupplementary 0 False discoveryrate(%) LNCaP versusT24 T24 versusPC3 LNCaP versusPC3 01 025 20 15 10 5 + ( cells expressed high Fig. 4 1 2 ), and), separation . For comparison, ). Furthermore, b ), in contrast n

= n

4 and = 8). ). - - - -

© 2012 Nature America, Inc. All rights reserved. putative CTCs that support the reference (G) or risk (A) allele for the melanoma-associated SNP (rs1126809). SNP melanoma-associated the for allele (A) risk or (G) reference the support that CTCs putative transcriptomes as in in as transcriptomes melanoma associated tumor antigens ( antigens tumor associated melanoma (BL). ( (BL). from transcriptomes single-cell in PTPRC marker immune and MITF, MLANA) (PMEL, and TYR ( (ESC). cells stem embryonic human and (T24) line cell cancer bladder PC3), and (LNCaP lines cell cancer prostate (UACC), UACC257 and (SKMEL) SKMEL5 lines cell melanoma (PM), melanocytes primary (CTC), CTCs melanoma putative from cells and samples) node lymph and cells blood white and BJAB, and BL41 lines cell lymphoma (Burkitt’s samples immune human include analyzed Samples (percentage). values bootstrap with indicated are clusters in confidence the and clusters high-order indicates Coloring RPKM). (>100 genes expressed CTC identity for the NG2 patterns of melanoma-associated transcripts clearly support test).ranksumThus, transcriptomes theirglobalaboth melanomaandexpression Downregulated genes were enriched for regulators of cell death and and death cell of regulators for enriched were genes Downregulated ( categories additional and genes cycle cell genessimilarselectedamanner in (Fig. 4 with melanoma; see Online Methods), but not immune cell–associated 4 Figure nature biotechnology nature that have been repeatedly found to be upregulated in cancer ( antigens melanoma-associated were significantly (Benjamini-Hochberg adjusted hoc test) lower levels ( cytes, and 436 genes with significantly ( test) higher expression in the putative CTCs than the primary melano tified 289 genes with significantly ( gene expression profilestheir with of those of individualComparison primary tumor. melanocytes iden melanoma a from originating of a Correlation We next investigated whether the NG2 1.0 0.8 0.6 0.4 0.2 c ) Gene expression levels in CTCs for an unbiased set of 100 immune and melanoma markers. ( markers. melanoma and immune 100 of set unbiased an for CTCs in levels expression ) Gene samples Immune

g e d Single-cell transcriptomes of circulating tumor cells. ( cells. tumor circulating of transcriptomes Single-cell 100 Chr11: KE ACP CTC PM UACC SKMEL Scale KE ACP CTC PM UACC SKMEL F TYR ---> 91908075 91908076 89017970 89017965 89017960 89017955 89017950 C T G C C A C G G A A G C C T C G G T G A C G A G 4 T MSMLUACC SKMEL PM CTC UCSC genes b G A E with the addition of more immune samples (W, white blood cells; L, lymph node). ( node). lymph L, cells; (W, blood samples white immune more of addition the with 100 Supplementary Table 4 1,133 + 1 99 cells. PM Q

95 78 advance online publication online advance 0 SNP (dbSNP132)foundin≥1%ofsamples 152 Fig. 4 Fig. W 24 0 100 q < 0.05 ANOVA; d 117 ), upregulated plasma-membrane proteins ( proteins plasma-membrane upregulated ), 69 0 rs1126809 d q < 0.05 ANOVA; L and + 47 0 putative CTCs showed signs c 0 CTC , 100 Supplementary Table 5 Supplementary P 198 Supplementary Fig. 10 Fig. Supplementary < 3.7 × 10 ×3.7 < ). The upregulated genes 1 ESC R S BL ESC P 56 1 < 0.05) enriched for ESC P < 0.05 post-hoc R 100 0 0 LB L W BJ BL Immune: −15 P < 0.05 post- , Wilcoxon, H a 1 MAGEB2 MAGEA3 MAGEA6 MAGEC2 MAGEA12 MAGEA2,2B MAGEA10,A5 7 ) Hierarchical clustering of human samples based on gene expression of highly highly of expression gene on based samples human of clustering ) Hierarchical 1 , , mitotic

T24 GJB1 RPS3 CRIM1 PSMB1 ABCG5 PRAME GPR126 ADAM17 SCL20A1 100 R ). ). - - ) 87 LNCaP

100 gens, provides strong support for the conclusion that the NG2 the that conclusion the for support strong provides gens, anti tumor melanoma known of expression elevated with together eaoye ( melanocytes primary to compared CTCs the in transcripts membrane–associated melanocytes and immune primary cells. Weto compared identified nine upregulatedCTCs plasma melanoma-derived in expressed tively selec proteins membrane for screen to analysis transcriptome CTC plasma- that membraneassumption proteins priori would be a good diagnostic the biomarkers. using We tumors used the different from a from melanoma. originated that were CTCs associated with melanomas ( melanomas with associated which are not expressed in immune cells and have not been pr eaoa PAE ws ihy xrse i NG2 in expressed highly was (PRAME) melanoma in antigen expressed preferentially the Notably, genes. I class MHC expression of plasma-membrane proteins identified 37 genes with with significantly genes ( 37 identified proteins plasma-membrane of expression 0 In recent years, there has been a strong interest in CTCs identifying e PC3 ), and downregulated plasma-membrane proteins ( proteins plasma-membrane downregulated and ), 4 f 100 KE ACP CTC PM UACC SKMEL c b q < 0.05 ANOVA; q 00 ANOVA; 0.05 < a Melanoma KE ACP CTC PM UACC SKMEL with Burkitt’s lymphoma cell lines BL41 and BJAB BJAB and BL41 lines cell lymphoma Burkitt’s with Immune d – 0 f ) Heatmaps showing relative expression of expression relative showing ) Heatmaps g ) Number of reads from individual PMs and and PMs individual from reads of ) Number Expression (RPKM,log b 0 ) Expression of melanocyte makers makers melanocyte of ) Expression Fig. 4 Fig. P 010150 100 50 < 0.05 post-hoc test) lower CTC geneexpression(RPKM) P e < 0.05 post-hoc test), many of of many test), post-hoc 0.05 < ). Similarly, screening for loss of loss for screening Similarly, ). Relativeexpression(log –5 105 S BL ESC 2 ) Expression (RPKM,log 0 S BL ESC ticles e l c i rt A 1 50 f ) in single-cell single-cell ) in 2 ) + NF2 APP CD81 DPP4 CDH1 DSG2 GNAL PLIN2 HRAS HLA-B HLA-C HLA-H HLA-G IFITM2 RAB31 TTYH3 TRPM1 VAMP8 ANXA6 ABCB5 HDAC6 MGST2 SLC7A8 GPR143 TGFB1l1 SEMA5A SLC16A6 CYSLTR2 cells, which which cells, 200 2

expression *** e 3 10 viously TYR MITF PMEL PTPRC MLANA + ) cells cells  - -

© 2012 Nature America, Inc. All rights reserved. (rs1126809) the in SNP melanoma-associated the example, for SNPs, documented with coincided sites high-confidence of the percent two Ninety- shown). not (data sites editing RNA A-to-G of (9%) subset shown to correlate with melanoma aggressiveness and metastasis and aggressiveness melanoma with correlate to shown been has TRPM1 of expression low Notably, surveillance. immune gene changes expression might enable the CTCs these to from escape ( genes HLA five including surveillance, immune from escape the with associated genes of tion cells, which should be useful for both basic and clinical studies. clinical and basic both for useful be should which cells, of individual of transcriptomes the improved representation provide protocol Smart-Seq the with obtained sets data conclusion, In tions. muta and SNPs biomarkers, candidate identifying for informative tran CTC our on scriptome areusing analyses results, Smart-Seq single-cell Based also highly more splicing. enables alternative of which analyses transcripts, detailed across coverage read improved cells. LNCaP of culture a in active genes the of most detected cell, each in we, as information, useful yielded cell single a of sequencing Even differences. melanoma meaningful CTCs biologically identified cell type; for example, comparing only two primary melanocytes to six per cells a individual only few using genes expressed of differentially quantitative transcriptome data from single cells. We found hundreds and robust generates Smart-Seq approaches. which single-cell milieus, necessitates heterogeneous in often and quantities rare in exist tens to hundreds of cells. quantitiesof populationsavailable in homogeneous cell on studies to applied be could method this Therefore,variability. expression-level in increase twofold) than (less minor a only showed cells) 50–100 to roughly (corresponding ng 1 with starting whereas mRNA-Seq, ard onng of10 RNAtotal indistinguishable practically was from stand a orlaser-capture results techniques. Our showed that using Smart-Seq cells can be either individually picked or identified through cell sorting numbers of cells will have many applications for studying small and cells single raretranscriptomesfromhigh-coverage Generating cells; such DISCUSSION for only SNPsregions cells. using few and mutations in transcribed likely artifactual, sites ( only supported by calls a single cell showed genotype an excess of whereas previously unidentified, CTCs, two least at in allele alternative an for support with sites genomic high-confidence 4,312 identified we method, Smart-Seq the by provided coverage read improved the With cancers. other or melanomas with other associated and variants (SNPs) genetic polymorphisms single-nucleotide for mined be RNA-Seq. with cells CTC single of studying utility the highlight results these and invasiveness, and migration CTC of understanding our enhance likely will proteins membrane these of studies Future Accession code. Accession the pape the of in version available are references associated any and Methods M metastasis and invasion proliferation, increasing by progression cancer to contribute to thought is CDH1 of loss and CTCs, the in expression no showed (CDH1) 1 Cadherin ( melanocytes primary than CTCs the in s e l c i rt A  read data). read

ethods Smart-Seq is a robust method for single-cell RNA-Seq with with RNA-Seq single-cell for method robust a is Smart-Seq types cell important clinically and biologically many However, Lastly, we investigated whether Smart-Seq transcriptome data could 2 0 ( Fig. 4 Gene Expression Omnibus: Omnibus: Expression Gene r . g ). We conclude that Smart-Seq enables screening Supplementary Fig. 11 i. 4 Fig. f ), and and ), 1 8 . Wedownregula . found also Fig. 4 Fig. TRPM1 ) together with a smaller G SE f ). Of note, epithelial epithelial note, Of ). 3 8 , suggesting that that suggesting , 4 9 5 (sequencing (sequencing TYR online online gene gene 1 9 - - - - . version of the r pape The authors declare competing interests:financial details are available in the R.S. designed the study and prepared the manuscript, with input from other authors. libraries. L.C.L. and G.P.S. contributed to study design and manuscript text. and melanoma cell line cells. O.R.F. and Q.D. contributed additional sequencing Y.-C.W., G.A.D. and J.F.L. prepared melanoma circulating tumor cells, melanocytes developed protocols and created libraries. I.K. and S.L. did primary data analysis. prepared figures, tables and methods, and contributed manuscript text. S.L. and R.L. D.R. designed and performed the computational analyses of sequencing reads, for Strategic Research (FFL4) and Åke Wiberg Foundation (756194131). Council (starting grant 243066), Swedish Research Council (2008-4562), Foundation RT1-01108, TR1-01250, and RN2-00931). R.S. was supported by European Research R33MH87925 and California Institute for Regenerative Medicine (CL1-00502, National Institutes of Health (NIH) K12HD001259. J.F.L. was supported by NIH by a fellowship from the Marie Mayer Foundation. L.C.L. was supportedlaboratory by US (Stockholm) for assistance with MiSeq sequencer. Y.-C.W.and G. Cann was forsupported assistance with the Magsweeper, members of the ScienceReview forBoard Lifeprotocol preparation and aquisition of clinical samples, A.A.and TalasazJ. Cotton at the University of California San Diego for their help in InternalWe thank C. Burge and G. Winberg for critical reading of the manuscript, T. Juarez Note: Supplementary information is available in the 20. 19. 18. 17. 16. 15. 14. 13. 12. 11. 10. 9. 8. 7. 6. 5. 4. 3. 2. 1. reprints/index.html. R Published online at CO AU A eprints and permissions information is available online at http://www.nature.com/ at online available is information permissions and eprints ck

T M high-densityoligonucleotide microarray analysis. ubatsn D.F. Gudbjartsson, Duncan, L.M. Duncan, Tang,A. P.Chomez, of regulation in factor microphthalmia of V.Role Setaluri, & D. Fang, Tomita, Y., Montague, Hearing, V.J. P.M. Anti-T4-tyrosinase & monoclonal antibodies– A.A. Jungbluth, S. Shukla, A.H. Talasaz, RNA of design and Analysis C.B. Burge, & E.M. Airoldi, E.T., Wang, Y., Katz, N.N. Iscove, Islam, S. F.Tang, Tang, F. Kurimoto, K. Pan, Q., Shai, O., Lee, L.J., Frey, B.J. & Blencowe, B.J. Deep surveying of alternative E.T. Wang, C. Trapnell, M. Guttman, Mortazavi, A., Williams, B., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying Acad. Sci. USA Sci. Acad. device. sweeper magnetic a using blood from cells rare other and cells nnoae tasrps n ioom wthn drn cl differentiation. cell during switching isoform Biotechnol. and Nat. transcripts unannotated to splicing. to potential for melanoma metastasis. melanoma for potential RNA-seq. multiplex cutaneous melanoma and basal cell carcinoma. cell basal and melanomacutaneous exponentially from sub-picogram quantities of mRNA. mRNA. of quantities (2002). 940–943 sub-picogram from exponentially , 377–382 (2009). 377–382 6, melanocyte differentiation marker TRP-1. marker differentiation melanocyte specific markers for pigmented melanocytes. regulation. isoform identifying (2010). 1009–1015 for experiments sequencing sequencing. high-throughput Genet. Nat. by transcriptome human the in complexity splicing mammaliantranscriptomes RNA-Seq.by all human members of the family. the of members human all mass by single-cell RNA-Seq analysis. RNA-Seq single-cell by mass keratinocytes in vitro. in keratinocytes Nature ea-/AT1 n nimoioa. n muoitceia ad rt-PCR and immunohistochemical An analysis. angiomyolipomas. in Melan-A/MART-1 (2010). 503–510 28, lincRNAs. of structure multi-exonic conserved the reveals mouse 657–663 (1999). 657–663 NT HOR CO n P owl ET advance online publication online advance I , 470–476 (2008). 470–476 456, N et al. mRNA-Seq whole-transcriptome analysis of a single cell. rcn te eiain f mroi se cls rm h inr cell inner the from cells stem embryonic of derivation the Tracing al. et E-cadherin is the major mediator of human melanocyte adhesion to adhesion melanocyte human of mediator major the is E-cadherin al. et Virchows Arch. Virchows et al. Characterization of the single-cell transcriptional landscape by highly e G G FI dg CTCF-promoted RNA polymerase II pausing links DNA methylation DNA links pausing II polymerase RNA CTCF-promoted al. et Nature lentv ioom euain n ua tsu transcriptomes. tissue human in regulation isoform Alternative al. et An overview of the MAGE gene family with the identification of identification the with family gene MAGE the of overview An al. et , 1413–1415 (2008). 1413–1415 40, et al. et Transcript assembly and quantification by RNA-Seq reveals RNA-Seq by quantification and assembly Transcript al. et t al. et Ab initio reconstruction of cell type–specific transcriptomes in transcriptomes type–specific cell of reconstruction initio Ab al. et sltn hgl erce ppltos f icltn epithelial circulating of populations enriched highly Isolating al. et Down-regulation of the novel gene melastatin correlates with correlates melastatin gene novel the of Down-regulation al. et N ment RIBU

http://www.nature.com/doifinder/10.1038/nbt.228 , 3970–3975 (2009). 3970–3975 106, . A xrsin f eaoyeascae mres p10 and gp-100 markers melanocyte-associated of Expression al. et , 511–515 (2010). 511–515 28, Representation is faithfully preserved in global cDNA amplified amplified cDNA global in preserved faithfully is Representation N n improved single-cell amplification efficient cDNA An for method 3 t al. et Genome Res. Genome CIAL CIAL I , 74–79 (2011). 74–79 , T s J. Cell Sci. Cell J. IO , 429–435 (1999). 429–435 434, N SP n TR imnain ains soit with associate variants pigmentation TYR and ASIP NTE S R

, 983–992 (1994). 983–992 107, 21 E Cancer Res. Cancer Cancer Res. Cancer S , 1160–1167 (2011). 1160–1167 , T S Cell Stem Cell Stem Cell Nat.Methods J.Invest. Dermatol. Biochem. Biophys. Res. Commun. Res. Biophys. Biochem.

NucleicAcidsRes. , 5544–5551 (2001). 5544–5551 61, Nat. Genet. Nat. nature biotechnology nature online version of the pape , 1515–1520 (1998). 1515–1520 58,

5

, 468–478 (2010). 468–478 6, , 621–628, (2008).

a. Biotechnol. Nat. 40

85 , 886–891 (2008). 886–891 , , 426–430, (1985). a. Methods Nat.

34 Nat. Biotechnol. Nat. 2 , e42 (2006).e42, . Nat. Methods

Proc. Natl. Proc.

r . online

256, 20 7, ,

© 2012 Nature America, Inc. All rights reserved. GAGTACATrGrGrG-3 rmr (5 primer RNAwas reverse-transcribed through oligo(dT) tailed priming using CDSthe The carefully designed SMARTer II A oligo (5 2 U/ 4 to media or single-cell is detailed in or from single cells. The exact number of cycles used for each dilution replicate of total RNA, 15 cycles for 100 pg of total RNA, and 18 cycles for second-strand10 pgsynthesis. totalThe cDNARNA was then amplified using 12 cycles for 1 ng the mRNA as well as an anchor sequence that serves as oligonucleotide.a universal The resulting primingfull-length cDNA sitecontains forthe complete 5 transcriptase then switches templates and continues transcribing to with thethese additional endC nucleotides,of creating the an extended template. The reverse nontemplatednucle few C a MMLVofadds Generation and amplification of Smart-Seq cDNA. METHODS ONLINE doi:10.1038/nbt.2282 of 10 SMARTerTranscriptaseReverseandoligo) Aand SmartScribe II totalvolumea in primer (CDS oligos inihibitor,RNAse mM), (10 mix dNTP mM), (100 MgCl mM 30 and KCl mM 375 8.3, pH Tris-HCl mM (250 Buffer Strand First 5× of addition the with out first-strand carried was generationcDNA The RT). (MMLV transcriptase reverse virus leukemia murine represents A, C or G) directly in total RNA or a whole cell lysate using Moloney lysis and stabilization of the RNA through RNase inhibitors. Then, poly(A) Then, inhibitors. RNase through RNA the of stabilization and lysis The deposition of an intact cell in the hypotonic lysis buffer leads to immediate RNA, see amountsstartingof on (depending PCR of cycles amplification12–18 then in ligationadaptors,IlluminaPEbase, to A singleaddition andloweda ofbythe resultinginstrument.wereshearingend-repaired,fragmentsacousticThe fol col). With the PE protocol, the amplified cDNA was fragmented using a Covaris fication of Epicentre’s Nextera DNA sample preparationeither Illumina’sprotocol (the ‘Tn5’Ultra Low protoInput mRNA-Seq Guide cDNA(the (~5 ‘PE’ng cDNA) protocol)was used to construct orIllumina sequencing a librariesmodi using sequencinglibraries. andsequencingof Smart-Seq Construction 500–5,000 bp (although lengths of mRNAs differ between cell types). High Sensitivity kits on a Bioanalyzer (Agilent), expectingusing monitored was cDNAa amplified distinctof distribution peak length aroundThe cDNA. fied Mix (Clontech) and Nuclease-Free water, resulting in a few nanograms of ampliprimer (5 1 max in added was RNA)control (or cell each applications, cell single RNAoftotalngpg–10 100 manualnowthatisincludedthe For kit.for in this or cell(s) from cDNA generating for instructions detailed the in reflected protocol is our available, commercially became kit the before generated were RNA Kit for Illumina sequencing. Although the libraries all in this manuscript become available in a kit marketed by Clontech called the SMARTer Ultra Low eration and amplification methods developed for this manuscript have recently 50 tion reaches the 5 the reaches tion 5 min in a 20- ated). With the Tn5 protocol, the amplified cDNA was ‘tagmentated’ at 55 °C for the transposase off the DNA, and the tagmentated DNA wasreaction buffer.purified We withadded 35 88 ated mRNA-Seq transcriptome data following the Illumina mRNA-Seq kit kit 1 and ng 100 from mRNA-Seq Illumina the following data transcriptome mRNA-Seq ated libraries. mRNA-Seq standard of sequencing and Construction specified in the figure legend. figures of this manuscript were generated using the PE protocol unlessin includedare otherwise cell single and sequencing platform and library construction method for each dilution replicate depth,sequence the onDetails intowere filterfastqexported files. that passed Illumina’sclusterseitheron all instruments, and MiSeq or GAIIx 2000, HiSeq DNA 1000 kits on a Bioanalyzer (Agilent), and the librariesusing confirmed werewas quality thenLibrary PCR. Nexterastandardsequenced of cycles nine by SPRI XP beads (sample to beads ratio of 1:1.6). Purified DNA was then amplified µ l reaction volumes with Advantage 2 PCR Buffer (Clontech), dNTP mix, PCR µ µ l of ribonuclease (RNase) inhibitors (Clontech, 2313B) in RNase free water. l (see Clontech manual for details). Once the reverse transcription reac ′ Supplementary Table 1 -AAGCAGTGGTATCAACGCAGAGT-3 ′ –AAGCAGTGGTATCAACGCAGAGTACT(30)VN–3 µ µ l of hypotonic lysis buffer consisting of 0.2% Triton X-100 and X-100 Triton 0.2% of consisting buffer lysis hypotonic of l l reaction with 0.25 ′ end of an RNA molecule, the terminal transferase activity transferase terminal the molecule, RNA an of end µ g of total RNA, as detailed in in detailed as RNA, total of g ′ , where r indicate rib Supplementary Table 1 µ l of PB to the tagmentation reaction mix to strip Supplementary TableSupplementary 1 for detailed instructions of all libraries gener µ l of transposase and 4 o o ′ nucleotide bases) then base-pairs -AAGCAGTGGTATCAACGCA tides to the 3 the to tides . The PCR was performed in Supplementary Table 1 Supplementary ′ The Smart-Seq cDNA gen ), Advantage 2 Polymerase . All data shown in the the in shown data All . µ l of 5× HMW Nextera end of the cDNA. the of end ′ 2 ), dithiothreitol ), ′ wee V where , We gener Amplified ′ end of µ µ l of l l of . + ------

est est expression level of the two samples, and was considered detected if it had an pairs of technical replicates or individual cells. Genes were binned by the high ( detection gene lated by of of each reads 5 10 using two pg Analyses different million samples. pendently aligned using Bowtie using aligned pendently transcriptome. and genome to reads short of Alignment above. described as lysis cDNA constructed and into buffer picked manually were cells Individual Norway). Kabi, (Fresenius healthy human from volunteers were isolated on Ficoll lymphocytes gradients using LymphoPrep blood Peripheral above. described as constructed cDNA and buffer, dilution in lysed picked, manually were hyaluronidase by oocytes Single removed digestion. were cells cumulus and oviduct the of ampulla the of dissection by treatment hCG after h 14–15 isolated were oocytes MII later. h 48 hCG of IU 5 of injection by followed PMSG, IU 5 of superovulated injection by were Mice mice. female CAST/EiJ old 4-week from lymphocytes. isolated human and oocytes mouse of Isolation libraries. Smart-Seq of preparation until °C −80 at stored and Biolabs) England (New inhibitor RNase unit/ml 4,000 2.5 into placed were cells individual The profiling. molecular for cells isolate to performed was profile Calcein-positive/CD45-negative/bead-attached desired showing cells viable of picking Manual cells. viable identify to min 20 for HBSS in Technologies) reads were converted to binary BAM files using Samtools using files BAM binary to converted were mapped reads uniquely The mapping. transcriptome initial the in gene same the of transcripts to map to for different reads allowing while and transcriptome, genome the both across unique were reads mapped that ensured procedure This location. genomic unique a to map that reads identify to reads mapped ­coordinates to genomic coordinates and thereafter compared with the genome from analyses. from Samples analyses. with 20 pg of total RNA in (used Broad Institute) using the histogram visualization for transcriptome data were using visualized the Integrated Genome Viewer (IGV, s PM aus n ra cut uig rpkmforgenes using counts read and values RPKM as variation. and sensitivity of comparisons technical and estimation level Expression for visualization heatmap and tively). Transcriptome mapped reads were converted from transcriptome transcriptome from converted were reads mapped Transcriptome tively). respec 2010, December 13 and 2011 May 16 downloaded were annotations mouse and human (Ensembl, sequences transcriptome and mm9) or (hg19 10 million uniquely mappable reads (a few ESCs few than (a reads fewer mappable uniquely with million Samples 10 levels. exon and gene in variation and sensitivity mapped reads were per sample used to compare selected randomly 10 million Only locus. gene per value RefSeq expression one giving Overlapping collapsed were mouse. transcripts for files uniqueness in-house–computed and human for (wgEncodeCrgMapabilityAlign50mer.bigWig) track Mappability ENCODE the using normalizations length transcript for positions mappable uniquely considered only calculations RPKM respectively. 2010, December 13 and 2011 August 31 the on downloaded were mouse and human for tions cell suspension using the MagSweeper instrument (Illumina) as previously described previously as (Illumina) instrument MagSweeper the using from suspension beads cell the harvest to sweeping magnetic on based captured were The cells h. 2 for °C 4 at beads magnetic MG980A streptavidin-conjugated with reacted and HBSS, with washed h, 2 for °C 4 at IgG mouse NG2) as known (also CSPG4 anti-human biotinylated with reacted subsequently were cells with Phycoerythrin-conjugated anti-human CD45 IgG to label leukocytes. The stained were cells EDTA.5 mM and nucleated BSA 1% The containing HBSS of ml 1 in resuspended EDTA, mM 5 pelleted, and BSA 1% containing HBSS 10 min at room temperature. The nucleated cells were pelleted, resuspended in sample were lysed with BD Pharm Lyse lysing solution Dickinson) for(Becton in was 3 within 4.5 processed h ml of The of erythrocytes the collection. blood Dickinson). Molecular of Characterization Circulating Melanoma Cells’.and (Becton ‘Detection The blood sample #101330, tubes IRB UCSD under collected collection were CTCs Melanoma blood EDTA K2 using melanoma blood. metastatic recurrent, from with a patient male was collected blood peripheral peripheral from CTCs individual of Isolation 1 Gene expression levels for Refseq transcripts were summarized summarized were transcripts Refseq for levels expression Gene 2 . The collected cells were stained with 5 5 with stained were cells collected The . Fig. 2a Fig. , b and µ l of Superblock (Thermo Scientific) containing containing Scientific) (Thermo Superblock of l Supplementary Fig. 4b Fig. Supplementary Figure 3 Figure 2 1 against the respective genome assembly assembly genome respective the against c . nature biotechnology nature 8 ) were therefore discarded discarded therefore were ) Supplementary Supplementary Figure µ , g/ml Calcein AM (Life (Life AM Calcein g/ml c ) were calculated over ) were calculated Fig. Fig. 2b 2 MII oocytes were were oocytes MII Ten milliliters of of milliliters Ten 3 Reads were inde were Reads Rfe annota RefSeq . 2 2 . The resulting resulting The . , d ) ) were simu

3 - - - - -

© 2012 Nature America, Inc. All rights reserved. in log in sion levels as log - expres relative or absolute using computed were correlations Spearman and Pearson package. graphics the using regression nonlinear loess and package) with 0.886. Scatter plots were generated in R using (geneplotter smoothScatter brain and LNCaP data for comparison for data LNCaP and brain ( expressed to Weanalyses the isoform. one of at limited exons least annotated all throughout structure exon-intron correct obtained we which for those as genes reconstructed full-length and we defined annotations, on RefSeq based least ten mapped reads. Analyses of transcript full-length reconstructions were SVDMAN using computed SVD and the to length unit were normalized in RPKM levels expression The to transcriptomes. (SVD) the in patterns decomposition fundamental value the determine singular using analyzed were cells cancer decomposition. value Singular erated erated from 10 LNCaP cells ( improvevariation with a larger numbers of data we cells, gen Smart-Seq used cells are often log normally distributed normally log often are cells genes below 0.1 RPKM in either sample. As gene expression levels excluding across single expression, log of mean the by genes binning samples, of pairs on ( of variation WaldAnalyses adjusted method. the using deviation standard a with within group together was replicates used of technical pairs possible for mean The all samples. in 0.1 both above RPKM nature biotechnology nature line comparisons, and in in and comparisons, line in ( post-hoc test with evaluated Tukey were by Benjamini-Hochberg) FDR, followed (5% corrections log2) (RPKM, levels testing multiple after significant genes Only test in R/Bioconductor. post-hoc expression on performed was expression. of differential Analyses ( expression gene in similarity overall the visualize to nents transcripts length of 1.5 kb (ref. (ref. kb 1.5 of length transcripts monocyte SAGE data, we converted 1.5 RPM to 1 RPKM, assuming an average of expression values each gene in putative the CTCs. Toindividual include the melanoma and for the immune cell combination. We the then evaluated mean the for markers strongest 100 the selected and metric this to according genes only. We replicate one ranked in values expression outlier by dif driven avoid ferences to samples, the of any in value expression highest the by divided ( samples node lymph and cells blood white in in shown were replicates over s.d. and mean The coverage. read largest bin the with the by division the through normalized later was bin per calculated coverage The read up. summed were genes gene different that the for all count for read bins the total before by divided was gene each for bin per count read The gene. and bin each for counted was reads of number the and bins, sized equally 40 into divided was transcript Each read. per hits 10 to up for allowing Bowtie using genome, the to than rather directly transcripts RefSeq to mapped were Reads transcripts. RefSeq mouse and human on based were of transcriptome. across read Analyses coverage on. so and samples 3-cell from 375,000 samples, matched to by random from reads reads, 10 single-cell using 125,000 millions a total of 80 cells per each of the two sample pools. These were sequence-depth isolated using the EPCAM marker, and 2 single-cell LNCaP samples, achieving 10 and into donor’sspiked 5 healthy cells from samples LNCaP cell and blood combined samples using 25, 10 and 3 cell samples from picked LNCaP cells, 25, the effect of using larger numbers of cells (used in in (used cells of numbers larger using of effect the between melanoma samples melanoma expression gene between mean in difference the calculated initially we respectively, cells, immune and melanoma with associated strongly most transcripts 100 cells. immune and for melanoma genes of marker Selection cells. line cell cancer bladder ≥ 0.1 RPKM) and multi-exon ( multi-exon and RPKM) 0.1 Supplementary Table 4 Supplementary Figure Figure P 10 < 0.05). Lists of significantly differently expressed genes are available available are genes expressed differently significantly of Lists 0.05). < expression values and s.d. by multiplying mean variation in a bin bin a in variation mean multiplying by s.d. and values expression 2 1 8 and and . . Each cell was then projected onto the two strongest SVD compo 2 RPKM values. We included publicly available human UHRR, Supplementary Figure 2 Figure Supplementary Figure 3 Figure for CTC, primary melanocyte and melanoma cell cell melanoma and melanocyte primary CTC, for Supplementary TableSupplementary 1 2 9 and a combination of monocytes of combination a and ≥ 2 The global transcript expression values for for values expression transcript global The 2 exons) genes. exons) 2 3 ). a One-way analysis of variance (ANOVA)of variance analysis One-way for comparisons between prostate and and prostate between comparisons for 4 , 2 25– 4 , we calculated absolute difference difference absolute calculated we , , including all transcripts with at with transcripts all including , 2 7 . To analyze how sensitivity and and sensitivity how To . analyze Fig. 4 Fig. Fig. 2b Fig. The read coverage analyses Fig. 2b Fig. a ). ). To obtain estimates for ). The differences were were differences The ). , d ) were also calculated calculated ) also were , d Fig. 3 Fig. ), we created two two created we ), To the identify 3 a 0 ). , T cells T , 3 1 - - - ,

Analysis Toolkit Analysis of overlap with known SNPs were based on dbSNP build 132 (ref. (ref. 132 build dbSNP on based SNPs were known with overlap of false positives arising from mapping of reads with partial poly(A) tail. Analyses models, and the last 35 base pairs of transcripts were not considered to remove gene RefSeq using regions transcribed to Weanalyses the limited threshold). (see score threshold for sites of 500 and requiring detection in two or more samples Picard (h 29. 28. 27. 26. 25. 24. 23. 22. 21. rmdup irrhcl lseig analyses. clustering Hierarchical sample. per reads mapped uniquely million 10 sampling randomly by depth sequence the matched we ( exons across coverage read of assessment fair a For them. between alternative the to mapping reads exon or upstream or the immediate exon downstream or 20 exon-exon junctions least at require which settings, MISO We default the used intervals. confidence with levels inclusion exon calculate t the calculating when variation)/2) gene + variation percentile ((90th term variance the in included was variation the of percentile 90th the line) (cell group sample each For replicates. biological across levels inclusion exon the on based exon alternative each for calculated and cDNA data cDNA and EST from identified previously exons skipped of alternatively for a collection exons. spliced alternatively of Detection inclusion levels of alternative exons we applied a inclusion. exon differential of Analyses recovered. was branch each times of we and the percentage counted from these clustered, that were independently ness) of each branchpoint, we generated thousand bootstrap gene set replicates (or robust To scipy (hcluster). python using the significance linkage evaluate 20 RPKM genes) were (3,690 clustered by and correlation Spearman complete known to counteract false positives in microarray analyses microarray in positives false counteract to known BWA scriptome (Ensembl, annotations 16 downloaded May 2011) and genome with SNP and mutation detection. in are rates presented discovery false exons at different significant by the number of significant events with correct sample groups. The numbers of divided shuffles exons in random of number significant as the estimated then was rate discovery false The groups). sample of splitting random the vary to exons nificant was counted using the of sig number group, sample the and other the from half with combined and half in split randomly were groups sample the rates, discovery false estimate of conversion the the compute shuffle each for and repeatedly ping) statistic was calculated by shuffling the sample labels (cell-to-cell line map line (cell-to-cell labels sample the shuffling by calculated was statistic

egr M.F. decomposition Berger, value SVDMAN–singular T.S. Brettin, & P.A. Dyck, M.E., Wall, Sam, L.T. statistical of Evaluation S. Dudoit, & K.D. Hansen, E., Purdom, J.H., Bullard, K.F.,Au, Xing,Jiang,Wong,Y.L.,Lin,H.,W.H.& Detection splice junctionsof from profiling expression Gene M. P.Kubista, Rorsman, & A., Ståhlberg, M., Bengtsson, Ramsköld, D., Wang, E.T., Burge, C.B. & Sandberg, R. An abundance of ubiquitously H. Li, Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient Genome Res. Genome analysis of microarray data. microarray of analysis Bioinformatics BMC experiments. mRNA-Seq in expression differential and normalization for methods paired-endRNA-seq SpliceMap.databy levels. mRNA of in single cells from the pancreatic islets of Langerhans reveals lognormal distribution Biol. data. sequence transcriptome tissue by revealed genes expressed alignment of short DNA sequences to the human genome. human the to sequences DNA short of alignment (2009). of cancer transcriptomes. cancer of , 2078–2079 (2009). 2078–2079 25, Supplementary Fig. 11 Fig. Supplementary 3 3 2 , allowing for no indels and removing multi-mapping reads. Samtools Samtools reads. multi-mapping removing and indels no for allowing , 2 , e1000598 (2009). e1000598 5, was used to filter PCR duplicates, and BAM files were reordered by reordered were files BAM and duplicates, PCR filter to used was t t h Sqec AinetMp omt n SAMtools. and format Alignment/Map Sequence The al. et p : / et al. A comparison of single molecule and amplification based sequencing / p i c 4 , 413–427 (2010). 413–427 20, 3 a . We used the mixture of isoforms (MISO) framework (MISO) isoforms of mixture the Weused . nertv aayi o te eaoa transcriptome. melanoma the of analysis Integrative al. et 4 r jointly on reads from all six CTC samples, with a quality quality a with samples, CTC six all from reads on jointly t d Genome Res. Genome statistics to to statistics . s o , 94 (2010). 94 11, u r c e PLoS ONE PLoS f o for more detailed information on varying these these varying on information detailed more for Bioinformatics r g CTC RNA-Seq Fastq files were mapped to - tran e , 1388–1392 (2005). 1388–1392 15, . P n values for the cancer-cell comparison. To comparison. cancer-cell the for values e t / Genes with average expression above above expression average with Genes ). Variant sites were called by the Genome t , e17305 (2011). e17305 6, statistics introduced above (repeatedly, introduced statistics NucleicAcidsRes. t statistic. The null distribution of the distribution null The statistic. We analyzed exon inclusion levels levels inclusion exon We analyzed To find significant differences in in differences significant find To , 566–568 (2001). 566–568 17, t -test with variance shrinkage, t statistics, thus allowing allowing thus statistics,

38 Genome Biol. Genome doi:10.1038/nbt.2282 , 4570–4578, (2010). 3 2 . A variance was was variance A . LS Comput. PLOS Bioinformatics Figure 3 Figure 35). Fig. 3 Fig. , R25 10, 1 1 b to to d ), ), . - - -

© 2012 Nature America, Inc. All rights reserved. 32. 31. 30. doi:10.1038/nbt.2282 monocyte subset. monocyte lio, .. Ci X, ae GP & arpu, . irary aa analysis: data Microarray M. Sabripour, & G.P. Page, X., Cui, D.B., Allison, B.E. Bernstein, A.M. Zawada, rm iary o osldto ad consensus. and consolidation (2006). to disarray from Nat. Biotechnol. Nat. ueSG eiec fr D4C1+ ooye a a third a as monocytes CD14.CD16+ for evidence SuperSAGE al. et , 1045–1048 (2010). 1045–1048 28, h NH oda eieois apn consortium. mapping roadmap NIH The al. et Blood , e50–e61 (2011). e50–e61 118, a. e. Genet. Rev. Nat.

55–65 7,

35. 34. 33.

next-generation DNA sequencing data. sequencing DNA next-generation hry ST, ad M & iokn K dSPdtbs fr ige nucleotide single for dbSNP-database K. Sirotkin, & M. Ward, S.T., Sherry, McKenna, A. Burrows–Wheeler with alignment read short accurate and Fast R. Durbin, & H. Li, oyopim ad te cass f io gntc variation. genetic minor (1999). 677–679 of classes other and polymorphisms transform. Bioinformatics et al. The genome analysis toolkit: a mapreduce framework for analyzing , 1754–1760 (2009). 1754–1760 25, Genome Res. Genome nature biotechnology nature

20 , 1297–1303 (2010). 1297–1303 , eoe Res. Genome 9,