Electronic Sorting of Immune Cell Subpopulations Based on Highly Plastic Genes
Total Page:16
File Type:pdf, Size:1020Kb
Electronic Sorting of Immune Cell Subpopulations Based on Highly Plastic Genes This information is current as Pingzhang Wang, Wenling Han and Dalong Ma of October 2, 2021. J Immunol 2016; 197:665-673; Prepublished online 10 June 2016; doi: 10.4049/jimmunol.1502552 http://www.jimmunol.org/content/197/2/665 Downloaded from Supplementary http://www.jimmunol.org/content/suppl/2016/06/10/jimmunol.150255 Material 2.DCSupplemental References This article cites 56 articles, 18 of which you can access for free at: http://www.jimmunol.org/ http://www.jimmunol.org/content/197/2/665.full#ref-list-1 Why The JI? Submit online. • Rapid Reviews! 30 days* from submission to initial decision • No Triage! Every submission reviewed by practicing scientists by guest on October 2, 2021 • Fast Publication! 4 weeks from acceptance to publication *average Subscription Information about subscribing to The Journal of Immunology is online at: http://jimmunol.org/subscription Permissions Submit copyright permission requests at: http://www.aai.org/About/Publications/JI/copyright.html Email Alerts Receive free email-alerts when new articles cite this article. Sign up at: http://jimmunol.org/alerts The Journal of Immunology is published twice each month by The American Association of Immunologists, Inc., 1451 Rockville Pike, Suite 650, Rockville, MD 20852 Copyright © 2016 by The American Association of Immunologists, Inc. All rights reserved. Print ISSN: 0022-1767 Online ISSN: 1550-6606. The Journal of Immunology Electronic Sorting of Immune Cell Subpopulations Based on Highly Plastic Genes Pingzhang Wang, Wenling Han, and Dalong Ma Immune cells are highly heterogeneous and plastic with regard to gene expression and cell phenotype. In this study, we categorized genes into those with low and high gene plasticity, and those categories revealed different functions and applications. We proposed that highly plastic genes could be suited for the labeling of immune cell subpopulations; thus, novel immune cell subpopulations could be identified by gene plasticity analysis. For this purpose, we systematically analyzed highly plastic genes in human and mouse immune cells. In total, 1,379 human and 883 mouse genes were identified as being extremely plastic. We also expanded our previous immunoinformatic method, electronic sorting, which surveys big data to perform virtual analysis. This approach used correlation analysis and took dosage changes into account, which allowed us to identify the differentially expressed genes. A test with human CD4+ T cells supported the method’s feasibility, effectiveness, and predictability. For example, with the use of human non- regulatory T cells, we found that FOXP3hiCD4+ T cells were highly expressive of certain known molecules, such as CD25 and Downloaded from CTLA4, and that this process of investigation did not require isolating or inducing these immune cells in vitro. Therefore, the sorting process helped us to discover the potential signature genes or marker molecules and to conduct functional evaluations for immune cell subpopulations. Finally, in human CD4+ T cells, 747 potential immune cell subpopulations and their candidate signature genes were identified, which provides a useful resource for big data–driven knowledge discoveries. The Journal of Immunology, 2016, 197: 665–673. http://www.jimmunol.org/ any phenotypes and immune cell subpopulations have that growing numbers of subpopulations will be discovered. One been identified, either from developmental commit- important question is whether these potential immune cell sub- ments or simply as representatives of certain func- populations and their functions can be predicted. Generally, for M + tional states (1–7). For example, CD4 T cells can be classified wet experiments, immune cell subsets must first be isolated by into distinct subsets based on their signature cytokines (8, 9), such FACS or be induced through in vitro culture and then used in gene as Th1, Th2, Th9, Th17, Th22, regulatory T cells (Tregs), and expression analyses or functional tests. As yet, no immunoinformatic follicular helper T (Tfh) cells. Several studies also revealed that method, using the virtual sorting of immune cells, was reported to different immune cell subsets are capable of conversion (e.g., the function with immune cell subpopulations. transitions from Th17 or Th2 to Th1 or from Tregs or Tfh to Th17), However, the rapid accumulation of publicly available high- by guest on October 2, 2021 which suggests phenotypic flexibility and plasticity (10, 11). throughput omics data, such as data from microarrays and next- Immune cell subpopulations are generally denoted by a set of generation sequencing, allows the direct generation of “big signature genes or marker molecules, especially cytokines and cell data.” These big data imply that a great deal of knowledge will be surface markers. More than 35,000 potential protein-coding genes discovered through data mining. Previously, we used immuno- exist in the human genome (12), and .3,700 genes potentially logically relevant transcriptome data from microarrays to analyze encode cell surface transmembrane proteins (13). Because gene gene coexpression (15), and we evaluated marker molecules based expression is highly plastic, and because each gene has its char- on gene plasticity (GPL) (14). GPL is the degree of change in the acteristic gene expression profile under various experiments, expression of a gene that happens in response to various envi- treatments, or pathophysiological conditions (14), we can expect ronmental or genetic influences. The GPL score is a quantitative measurement that can be used for the evaluation of biomarkers Department of Immunology, School of Basic Medical Sciences, Peking University (14). Highly plastic genes often have large GPL scores; thus, they Health Science Center, Beijing 100191, China; Peking University Center for Human are not suitable as lineage-specific biomarkers of immune cells. Disease Genomics, Beijing 100191, China; and Key Laboratory of Medical Immu- In this study, we evaluated immune cell subpopulations based on nology, Ministry of Health, Beijing 100191, China GPL analysis. We expanded the meaning of electronic sorting, which Received for publication December 14, 2015. Accepted for publication May 17, 2016. previously described the retrieval of cell states or experimental conditions based on gene expression intensity from miscellaneous This work was supported by Grant 31270948 from the National Natural Science Foundation of China and by Leading Academic Discipline Project of Beijing Edu- immune cell samples (14). As a method to “virtually sort big data cation Bureau (BMU20110254). samples to do analysis,” we could also call this process “virtual Address correspondence and reprint requests to Associate Prof. Pingzhang Wang, sorting” or “in silico sorting,” because no actual immune cells were Peking University Health Science Center, 38 Xueyuan Road, Beijing 100191, China. sorted as they are in the FACS process. First, we systematically E-mail address: [email protected] analyzed highly plastic genes in human and mouse immune cells and The online version of this article contains supplemental material. found that the genes with high plasticity were well suited for labeling Abbreviations used in this article: APA rate, absent-present and present-absent rate; + ARS, average rank score; DEG, differentially expressed gene; GO, gene ontology; immune cell subpopulations. Then, we used human CD4 T cells as a GPL, gene plasticity; KEGG, Kyoto Encyclopedia of Genes and Genomes; PP, model to perform electronic sorting. Our results revealed the feasi- present-present; RNA-seq, RNA-sequencing; Tfh, follicular helper T; Treg, regulatory bility, effectiveness, and predictability of electronic sorting as a Tcell. means to discover the potential signature genes or marker molecules, Copyright Ó 2016 by The American Association of Immunologists, Inc. 0022-1767/16/$30.00 and this method was also suitable for other immune cells. www.jimmunol.org/cgi/doi/10.4049/jimmunol.1502552 666 ELECTRONIC SORTING OF IMMUNE CELLS Materials and Methods samples were first sorted based on rank score, and three rounds of elec- Data sets tronic sorting were performed based on the quantile values (Q1, Q2, and Q3, respectively) (Fig. 1). During each round, the samples were divided All microarray data and immune cell groups were described in a previous into two groups (or virtual subpopulations). The difference (i.e., d value) in study (14). The 619 samples (Dataset 1) of human CD4+ T cells in the the average rank score (ARS) for each gene at the probe set level in both ImmuSort database (http://immusort.bjmu.edu.cn or http://immu.tipsci. subpopulations was calculated using the ARS in the high-expression com/Account/) were used. To improve reliability by cross-validation, six subpopulation subtracted from that in the low-expression subpopulation. additional data sets were prepared. Dataset 2 (816 samples) was a newly Thus, each round produced a gene list of 41,477 probe sets with unique updated version of Dataset 1, which contained all available human CD4+ annotation. After three rounds of sorting, three gene lists (List1, List2, and T cell samples included in the Gene Expression Omnibus database as of List3) were produced and used for further identification of differentially December 20, 2015 (16). Dataset 3 (361 samples) and Dataset 4 (470 expressed genes (DEGs). samples) were derived from healthy individuals in Datasets 1 and 2, re- For upregulated (or