bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Contextual diversity of the human cell-essential proteome

Thierry Bertomeu1*, Jasmin Coulombe-Huntington1*, Andrew Chatr-Aryamontri1*, Karine Bourdages1, Yu Xia2 and Mike Tyers1

1Institute for Research in Immunology and Cancer, Department of Medicine, University of Montreal, Montreal, Quebec H3C 3J7, Canada

2Department of Bioengineering, McGill University Montreal, Quebec H3A 0C3, Canada

*Co-first authors

Correspondence: [email protected]; tel 1-514-343-6668

Keywords: CRISPR/Cas9 screen, essentiality, hypothetical , alternative splicing, protein complex

Abbreviations: universal essential (UE), contextual essential (CE), lone essential (LE), non- essential (NE), single-guide RNA (sgRNA)

running title (<50 char): Variation in the human cell-essential proteome

Bertomeu et al 1 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Abstract Essential define central biological functions required for cell growth, proliferation and survival, but the nature of gene essentiality across human cell types is not well understood. We assessed essential gene function in a Cas9-inducible human B-cell lymphoma cell line using an extended knockout (EKO) library of 278,754 sgRNAs that targeted 19,084 RefSeq genes, 20,852 alternatively-spliced exons and 3,872 hypothetical genes. A new statistical analysis tool called RANKS identified 2,280 essential genes, 234 of which had not been reported previously. Essential genes exhibited a bimodal distribution across 10 cell lines screened in different studies, consistent with a continuous variation in essentiality as a function of cell type. Genes essential in more lines were associated with more severe fitness defects and encoded the evolutionarily conserved structural cores of protein complexes. Genes essential in fewer lines tended to form context-specific modules and encode subunits at the periphery of essential complexes. The essentiality of individual protein residues across the proteome correlated with evolutionary conservation, structural burial, modular domains, and protein interaction interfaces. Many alternatively-spliced exons in essential genes were dispensable and tended to encode disordered regions. We also detected a significant fitness defect for 44 newly evolved hypothetical reading frames. These results illuminate the nature and evolution of essential gene functions in human cells.

Bertomeu et al 2 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Introduction Essential genes underpin the genetic architecture and evolution of biological systems (Giaever et al. 2002). In all organisms, essential genes are needed for survival and proliferation, while in multicellular organisms additional essential genes function in different tissues at various stages of development. In the budding yeast Saccharomyces cerevisiae, only 1,114 of the ~6,000 genes encoded in the genome are essential for growth in nutrient-rich conditions (Giaever et al. 2002). The non-essential nature of most genes suggests that the genetic landscape of the cell is shaped by redundant gene functions (Hartman et al. 2001). Consistently, systematic screens in S. cerevisiae have uncovered more than 500,000 binary synthetic lethal interactions (Tong et al. 2001; Costanzo et al. 2010; Costanzo et al. 2016). In parallel, context-specific chemical screens have revealed that virtually every gene can be rendered essential under the appropriate condition (Hillenmeyer et al. 2008). The prevalence of non-essential genes has been verified by systematic genetic analysis in other single-celled organisms including Schizosaccharomyces pombe (Kim et al. 2010), Candida albicans (Roemer et al. 2003) and Escherichia coli (Baba et al. 2006). In metazoans, the knockdown of gene function by RNAi in the nematode worm Caenorhabditis elegans revealed that 1,170 genes are essential for development (Kamath et al. 2003), while in the fruit fly Drosophila melanogaster at least 438 genes are required for cell proliferation in vitro (Boutros et al. 2004). In the mouse Mus musculus, ~25% of genes tested to date are required for embryonic viability (Dickinson et al. 2016). Essential genes tend to be highly conserved, to interact with one another in local modules, and be highly connected in protein and genetic interaction networks (Hirsh and Fraser 2001; Zotenko et al. 2008; Costanzo et al. 2016). Although essentiality is often framed as an all-or-none binary phenotype, in reality the loss of gene function causes a spectrum of fitness defects that depend on developmental and environmental contexts. For example, in S. cerevisiae an additional ~600 genes are required for optimal growth in rich medium (Giaever et al. 2002) and in C. elegans most genes are required for fitness at the whole organism level (Ramani et al. 2012). The definition of essentiality is thus dependent on context and the experimental definition of fitness thresholds. The clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR- associated (CAS) protein system directs the cleavage of specific DNA sequences in prokaryotes, where it serves as an adaptive immunity mechanism against infection by foreign phage DNA (Samson et al. 2013). The Cas9 endonuclease can be targeted toward a specific DNA sequence by a single-guide RNA (sgRNA) that contains a 20 nucleotide match to the of interest (Jinek et al. 2012). The co-expression of Cas9 and an sgRNA leads to a blunt end double-strand break (DSB) at a precisely specified position in the genome. Repair of the DSB

Bertomeu et al 3 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

by error-prone non-homologous end joining (NHEJ) leads to random DNA insertions or deletions (indels) that cause frame-shift mutations and an ersatz knockout of the target gene with high efficiency (Cong et al. 2013). CRISPR/Cas9 technology has been recently adapted to perform large-scale functional gene knockout screens in human cells (Shalem et al. 2015). A pooled library of 64,751 sgRNAs that targets 18,080 genes was used in screens against melanoma and stem cell lines (Sanjana et al. 2014; Shalem et al. 2014), while a library of 182,134 sgRNAs that targets 18,166 genes was used in screens against chronic myelogenous leukemia and Burkitt’s lymphoma cell lines (Wang et al. 2015). A third library of 176,500 sgRNAs that targets 17,661 genes was used in screens against colon cancer, cervical cancer, glioblastoma and hTERT-immortalized retinal epithelial cell lines (Hart et al. 2015). In parallel, genome-scale screens based on transposon- mediated gene-trap technology have been performed in two haploid human cell lines (Blomen et al. 2015; Wang et al. 2015). Each of these genome-wide screens identified on the order of 1,500-2,200 essential genes that were enriched in functions for metabolism, DNA replication, transcription, splicing and protein synthesis (Blomen et al. 2015; Hart et al. 2015; Wang et al. 2015). Importantly, the CRISPR/Cas9 and gene-trap screens exhibit a strikingly high degree of overlap and overcome the limitations of previous RNAi-based gene knockdown approaches (Blomen et al. 2015; Hart et al. 2015; Wang et al. 2015). Currently available -wide CRISPR libraries target most well- characterized gene loci. Two libraries were designed to target the well-validated RefSeq gene collection (Sanjana et al. 2014; Wang et al. 2015) while a third library targets confirmed protein coding gene regions in the GENCODE v17 assembly (Hart et al. 2015). To extend functional CRISPR/Cas9 screens to less well characterized regions of the genome at potential sub-gene resolution, we generated a high complexity extended knockout (EKO) library of 278,754 sgRNAs that targets 19,084 RefSeq genes, 20,852 unique alternative exons and 3,872 hypothetical genes. The EKO library also includes 2,043 control sgRNAs with no match to the human genome to estimate screen noise. We used the EKO library to identify essential genes in the NALM-6 human pre-B cell lymphocytic leukemia line, including alternatively-spliced exons and hypothetical genes not previously tested in CRISPR/Cas9 screens. Our analysis revealed the broad influence of Cas9-induced double-stranded breaks on cell viability, the residue-level determinants of essential protein function, the prevalence of non-essential exons, and the evolution of new essential genes. Integration of our data with nine previous genome-wide CRISPR/Cas9 screens revealed a bimodal distribution of essential genes across cell lineages and a distribution of subunit essentiality across protein complexes, consistent with a continuous

Bertomeu et al 4 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

distribution of gene essentiality. High resolution CRISPR/Cas9 genetic screens can thus uncover the organization and evolution of the essential human proteome.

Results Design and generation of the EKO library We generated a custom sgRNA library that targeted more genomic loci than previously published libraries, including additional RefSeq genes, alternatively-spliced exons and hypothetical protein-coding genes (Fig. 1A; see Table S1 for all sgRNA sequences). The core set of the EKO library was comprised of 233,268 sgRNAs that target 19,084 RefSeq protein- coding genes and 17,139 alternatively-spliced exons within these coding regions. An extended set of 43,443 sgRNAs within the EKO library was designed to target loci predicted to be potential protein coding regions by AceView (Thierry-Mieg and Thierry-Mieg 2006) or GENCODE (Harrow et al. 2012), as well as additional alternatively-spliced exons predicted by AceView that we validated by independent RNA-seq evidence. Each gene in the EKO library was targeted by approximately 10 sgRNAs and each alternative exon by 3 sgRNAs. The combined gene list represented in the EKO library contains almost 5,000 more candidate genes than any sgRNA library screened to date, including almost 1,000 additional RefSeq genes (Sanjana et al. 2014; Shalem et al. 2014; Hart et al. 2015; Wang et al. 2015). To estimate the effect of noise in our screens, we also designed 2,043 sgRNAs that had no sequence match to the human genome.

Identification of cell fitness genes with the EKO library We used the EKO library in a viability screen to assess fitness defects caused by loss of gene function in the human pre-B cell acute lymphoblastic leukemia NALM-6 line, which is pseudodiploid and grows rapidly in suspension culture (Hurwitz et al. 1979). A tightly regulated doxycycline-inducible Cas9 clone of NALM-6 for use in library screens was generated by random integration of a lentiviral construct. The EKO library was transduced at low multiplicity of infection (MOI) and selected for lentiviral integration on blasticidin for 6 days, after which Cas9 expression was induced for 7 days with doxycycline, followed by outgrowth for 15 more days in the absence of doxycycline (Fig. 1B). Changes in sgRNA frequencies across the entire library were monitored by Illumina sequencing at six different time points during the screen. The EKO library pool was well represented after blasticidin selection (i.e., the day 0 time point for the screen), with 94% of all sgRNA read counts falling within a 10-fold range relative to one another (Fig. 1C; Fig. S1A). Little change in sgRNA frequency was observed after only 3 days of

Bertomeu et al 5 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

doxycycline induction (Fig. S1B), likely due to the lag in Cas9 induction, the kinetics of indel generation, and the time required for effective protein depletion. A progressively larger spread in sgRNA read frequencies was observed over the 21 day time course (Fig. S1C-F). We observed that genes with greater sgRNA depletion at earlier time points tended to encode proteins that were more disordered (Fig. 1D) and that had a shorter half-life (Fig. 1E). However, genes depleted by day 21 did not possess significantly longer half-lives than those depleted by day 15 (p>0.05, Wilcoxon test, data not shown), and genes that scored in the top 2,000 most depleted at day 21 but not at day 15 sample were less likely to be validated by gene-trap scores in the HAP1 cell line (p=2.37e-6, Wilcoxon test; Fig. S1G). Based on these results, we chose to use the sgRNA read frequencies at day 15 for our subsequent analyses, with the day 0 frequencies serving as the reference time point.

Calculation of significant sgRNA depletion by RANKS To compute the relative level of depletion of each sgRNA, we developed a custom tool called Robust Analytics and Normalization for Knockout Screens (RANKS), which enables statistical

analysis of any pooled CRISPR/Cas9 library screen or shRNA screen. First, the log2-ratio of the sgRNA read frequency at the day 15 versus day 0 time points, normalized by the total read count ratio, was used to quantify the relative abundance of each sgRNA (Fig. 1G). The average

log2-ratio for all ~10 sgRNAs that target each gene was used to generate a gene score (Wang et al. 2015), which reflected the potential fitness defect for each gene knockout. To account for

experimental variation in sgRNA read counts within any given screen, instead of using the log2-

ratio of the sgRNA itself, we used RANKS to estimate a p-value based on how each log2-ratio compared to a selected set of non-targeting control sgRNAs. The resulting gene log p-value score was then obtained from the average of the log p-values for each of the 10 sgRNAs per gene. As applied to our dataset, RANKS performed better than previous other methods (Li et al. 2014; Hart et al. 2015; Wang et al. 2015) as judged by established correlates (Fig. S2; Table S2). Subsequent analyses were based on the RANKS statistical scores for each gene, exon or hypothetical gene represented in the EKO library.

Genome-scale correlates of sgRNA depletion The results of the NALM-6 screen revealed that fitness scores correlated well with features known to be associated with essential genes. We examined correlates with mutation rate as estimated by protein sequence conservation across 46 vertebrate species (Blanchette et al. 2004), node degree in protein-protein interaction (PPI) networks (Chatr-Aryamontri et al. 2015),

Bertomeu et al 6 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

mRNA expression level in NALM-6 cells, essentiality detected by the gene-trap method (Blomen et al. 2015), essentiality detected by a whole-genome CRISPR/Cas9 screen in the near haploid KBM7 cell line (Wang et al. 2015), and DNA accessibility as assessed by DNAse1 hypersensitivity peak density in naïve B cells (Fig. 2A; Fig. S3). For every feature, we observed a highly significant difference between the top-ranked 2,000 genes in our screen (1-2000) and the next 2,000 ranked genes (2,001-4,000), with p-values <0.05 (Wilcoxon test), such that each feature correlated strongly with gene essentiality. The strong agreement between depletion scores from an independent genome-wide CRISPR screen with our ranked gene list over the entire distribution confirmed the reproducibility of the CRISPR/Cas9 method. For each feature, we also compared the union of the fourth and fifth bins (6,001-10,000) to the union of the last two bins (16,001-19,034). This comparison revealed that mutation rate and gene-trap score differences flattened out after the first 2,000 to 4,000 genes (p-values >0.05, two-tailed Wilcoxon test), whereas the other features still correlated with gene rank in the latter bins to varying degrees (PPI degree, p=0.00335; mRNA expression, p=0.0072, KBM7 score, p=3.94e-266; DNAse 1 hypersensitivity, p=1.27e-46). The tight concordance between the NALM-6 and KBM7 scores over all bins was expected given that the same sgRNAs were used to target all RefSeq genes included in both libraries. The strong correlation between rank score in the NALM-6 screen and DNA accessibility suggested a potential non-specific effect of sgRNA-directed DSB formation. The two additional significant correlations may be explained by the fact that mRNA expression level also correlated with DNA accessibility (Spearman’s rank correlation coefficient=0.34, p<2.2e-16), and protein interaction degree in turn correlated with (Pearson correlation coefficient=0.22, p<2.2e-16). These correlations are likely explained by unrepaired Cas9-mediated cleavage events in a fraction of cells leading to a DNA damage-dependent growth arrest and depletion from the pool independent of indel formation. Highly accessible DNA is more likely to be cleaved by Cas9 (Chari et al. 2015) and sgRNAs with multiple potential cleavage sites in amplified regions cause a greater fitness defect (Wang et al. 2015; Aguirre et al. 2016; Munoz et al. 2016), although this latter trend has not been observed in other screens (Tzelepis et al. 2016). To test this idea on a genome-wide scale we compared the predicted number of sgRNA matches in the genome to the level of depletion in the pool and indeed found a strong overall correlation (Fig. 2B). For this same reason, non- targeting control sgRNAs in the EKO library were actually enriched compared to the rest of the targeting sgRNAs in the pool (Fig. 1F), and therefore not an ideal control distribution. For subsequent analysis, we instead redefined the control set for RANKS as the 213,886 sgRNAs that targeted genes never reported as essential in any published screen, and we also removed

Bertomeu et al 7 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

all sgRNAs with more than two potential off-target cleavage sites and applied a fixed correction factor to sgRNAs with one or two potential off-target sites (see Supplemental Information and Fig. S4). This adjustment of the control sgRNA set allowed significance values to be reliably established, but left the relative gene rank order virtually unchanged (R-squared=0.978).

Universal essential genes The EKO library screen identified a total of 2,280 fitness genes in NALM-6 cells below a false discovery rate (FDR) of 0.05, which we generically termed essential genes (Table S3). Of these essential genes, 269 were specific to the NALM-6 screen: 225 corresponded to well-validated RefSeq genes, including 19 RefSeq genes not assessed previously, and 44 hypothetical genes annotated only in AceView or GenCode. The set of essential RefSeq genes in NALM-6 cells overlapped strongly with three previously identified sets of essential genes using different libraries, cell lines and methods (Fig. 3A) (Blomen et al. 2015; Hart et al. 2015; Wang et al. 2015). For direct comparative purposes between studies, we examined a set of 16,996 RefSeq genes shared between the EKO library and two published sgRNA libraries (Hart et al. 2015; Wang et al. 2015). We identified 486 genes that were uniformly essential across our NALM-6 screen and 9 previous screens in different cell lines (Hart et al. 2015; Wang et al. 2015), referred to here as universal essential (UE) genes (Fig. 3B; Table S3). This UE gene set was smaller than previously reported common essential genes in previous screens (Blomen et al. 2015; Hart et al. 2015; Wang et al. 2015) but was similarly enriched for processes required for cell proliferation and survival, including transcription, translation, energy metabolism, DNA replication and cell division (Fig. 3C). To investigate the nature of essential genes as a function of cell type, we plotted the number of shared essential genes across the 10 different cell lines. For clarity, we use the term contextually essential (CE) to refer to genes essential in more than one cell line but fewer than in all cell lines and lone essential (LE) to designate any gene that is uniquely essential in a single line. As opposed to a simple monotonic decay in shared essential genes as a function of cell line number, we observed a bimodal distribution whereby the number of shared essentials rapidly declined as the number of cell lines increased but then plateaued and peaked slightly at the maximum number of lines (Fig. 3D). To assess the potential effect of score threshold on the bimodal distribution, we examined the relative distribution of scores for essential versus non- essential genes as a function of cell line number (Fig. S5A). This analysis revealed that CE genes, especially those essential across many lines, often scored only slightly below the FDR threshold in other lines, consistent with genuine fitness defects that failed to reach significance

Bertomeu et al 8 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

in particular screens. However, it was also clear that many CE genes had no apparent fitness defect in particular cell lines, suggesting that bimodality was not merely driven by random effects in borderline essential genes (Fig. S5A). Indeed, we observed that bimodality was preserved over a wide range of essentiality thresholds (Fig. S5B). Variable essentiality across cell types may also reflect the expression of partially redundant paralogs (Wang et al. 2015). Consistently, genes with one or more close paralogs (>30% protein sequence identity) tended to have significantly lower essentiality scores in NALM-6 cells (P=3.6e-121, Wilcoxon test) and to be essential in fewer lines overall (P=1.5e-121, Wilcoxon test, Fig. S5C), but again this effect accounted for only a small fraction of the observed CE genes. We assessed three different possible models that might help explain the observed bimodality of gene essentiality across cell lines. A random model represented the situation in which each gene was equally likely to be essential in any given cell line. A binary model corresponded to the scenario in which each gene was partitioned as either essential in all cell lines or not essential in any cell line, with an arbitrary constant for experimental noise. A continuous model represented the case in which each gene was assigned a specific probability of being essential in any given cell line (see Supplemental Information and Fig. S5D). The continuous model provided the best fit to the observed distribution as it was the only model that accounted for the prevalence of essential genes in the medial fraction of cell lines (Fig. 3D). This result suggested that gene essentiality is far from an all-or-none effect across different human cell types.

Features of essential genes Model organism studies have shown that essential protein coding genes are required for evolutionarily conserved processes in cell metabolism, macromolecular biosynthesis, proliferation and survival. Consistently, and as reported previously (Blomen et al. 2015; Hart et al. 2015), we observed that the more cell lines in which a gene was essential, the higher the probability that this gene possessed a budding yeast ortholog and that the ortholog was also essential in yeast (Fig. 3E) (Giaever et al. 2002; NCBIResourceCoordinators 2016). We also examined the converse question of why almost half of the essential genes in yeast were not universally essential in human cell lines. Out of 444 essential yeast genes with a human ortholog tested in all 10 cell lines, 387 orthologs were essential in at least one cell line, with a tendency to be essential in multiple cell lines. For the remaining 57 yeast genes that appeared to be non-essential in humans, 40 of these had GO terms linked to more specialized features of yeast biology. We also observed that as the number of cell lines in which a gene was essential

Bertomeu et al 9 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

increased, the greater the depletion of its sgRNAs from the library pool (Fig. 3F), such that UE genes were associated with significantly greater fitness defects. Consistent with the enrichment for crucial cellular functions, the set of GO terms associated with essential genes became progressively more restricted as essentiality spread over more cell lines (Fig. S6). Proteins encoded by essential genes tend to cluster together within interaction networks in yeast (Hart et al. 2007), a feature also shown recently for human cells (Blomen et al. 2015; Wang et al. 2015). Using human protein interaction data from the BioGRID database (Chatr- Aryamontri et al. 2015), we observed that UE proteins tended to interact with each other more often than with random proteins (p<0.05, Fisher’s exact test) and associated preferentially within maximally connected subnetworks, referred to as cliques (for clique n=3, p<0.05, Fisher’s exact test) (Fig. 3G). These results suggested that clusters of UE genes carry out a limited set of indispensable cellular functions.

Essentiality in human protein complexes Essential genes tended to encode subunits of protein complexes as shown previously (Blomen et al. 2015; Hart et al. 2015; Wang et al. 2015), with a similar overall distribution as yeast essential genes (Fig. S7A,B; Table S4). We assessed the propensity of CE genes to interact with each other, and found that CE gene pairs essential in the same cell line were far more likely to encode subunits of the same protein complex than gene pairs that were essential in different cell lines (Fig. 4A). This result suggested that like UE genes, CE genes tended to encode essential modules in the proteome. We predicted that shared essential genes should strongly cluster cell lines by cell type identity, but found that a number of cell lines did not segregate with similar lineages (Fig. S7C). However, when cell lines were clustered by shared essential complexes, defined as complexes with at least one essential subunit, the hematopoietic cell lines NALM-6, KBM7, Raji, Jiyoye and the colon cell lines DLD1 and HCT116 were all precisely grouped together (Fig. 4B; Fig. S7C,D). The functions carried out by essential complexes thus correlate with cell type identity more closely than the complete spectrum of essential genes. At the level of individual complexes, a small number were comprised entirely of subunits essential in one or more cell lines, such as for the highly conserved SRB- and MED-containing cofactor complex (SMCC), exosome, the Rad51 homologous recombination repair complex and the DNA replicative helicase (MCM) complex (Fig. 4C). However, the vast majority of complexes contained a mixture of essential and non-essential (NE) subunits (Fig. 4D; Table S4). In order to assess whether the variation in essentiality between subunits of the same complex reflected the

Bertomeu et al 10 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

evolutionary history of the complex, we examined protein sequence conservation across 46 vertebrate species and found that subunits essential in more cell lines tended to be more conserved and expressed in more tissues than other subunits of the same complex (Fig. 4E,F). To test the notion that essentiality may reflect centrality in protein complex structure, we estimated the physical proximity of subunits for known complex structures in the PDB with at least four mapped subunits (Rose et al. 2015). Protein subunits that were essential in a greater number of cell lines tended to form more direct contacts with other subunits (Fig. 4G). As an example, the conserved KEOPS complex that mediates an essential tRNA modification reaction (Wan et al. 2016) contained a core catalytic subunit (Kae1) that was essential in all lines, a tightly linked core subunit (TP53RK) essential in nine lines, and three auxiliary subunits essential in four (LAGE3), one (C14ORF142) and zero (TPRKB) lines (Fig. 4H). The core Kae1- TP53RK subcomplex is flanked by the other subunits such that essentiality parallels structural centrality. The evolutionary plasticity of subunit essentiality is illustrated by the observation that Kae1 alone is sufficient for function in the mitochondrion, that three KEOPS subunits are essential in bacteria, and all five KEOPS subunits are essential in yeast (Wan et al. 2016).

NALM-6-specific essential genes Of all the genes tested in common between our study and nine previous CRISPR/Cas9 screens (Hart et al. 2015; Wang et al. 2015), 218 were uniquely essential to the NALM-6 screen. These LE genes had more protein interaction partners than NE genes (Fig. 5A, B) and exhibited higher expression levels in NALM-6 cells (Fig. 5A, C), two defining features of essential genes identified in other cell lines and model organisms (Blomen et al. 2015; Hart et al. 2015; Wang et al. 2015). However, LE proteins unique to NALM-6 cells were not as highly clustered in the protein interaction network as UE proteins (Fig. S8A, B), and had no more tendency to interact with UE proteins than NE proteins (Fig. S8C, D). Across all cell lines, while UE and CE proteins showed a strong propensity to interact, LE proteins were no more likely to interact with the UE core than NE proteins (Fig. 5D). These results suggested that LE proteins carry out a diversity of functions not strongly connected to the UE core, potentially as a consequence of synthetic lethal interactions with cell line-specific mutations. The NALM-6 line bears an A146T mutation in NRAS (Bamford et al. 2004), analogous to mutations that activate KRAS in some leukemias (Tyner et al. 2009). As NALM-6 cells required the NRAS gene for optimal growth (Table S2), other NALM-6 specific essentials may be required to buffer the effects of oncogenic NRAS signaling. For example, two of the 11 components of cytochrome C oxidase (mitochondrial complex IV), COX6A1 and COX8A, were

Bertomeu et al 11 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

essential for survival exclusively in NALM-6 cells, as were two cytochrome C oxidase assembly factors, COA6 and COX16 (Table S3). Cytochrome C oxidase is known to be activated by oncogenic RAS and is required for survival of other cancer cell lines that bear an activated RAS allele (Telang et al. 2012). In addition to NALM-6, the Raji and Jiyoye cell lines screened previously are each derived from the B-cell lineage (Wang et al. 2015). Surprisingly, of the 351 genes uniquely essential to one or more of these B cell lines (Table S2), only four genes were essential to all three lines, namely EBF1, CYB561A3, PAX5 and MANF. Two of these genes, PAX5 and EBF1 are key transcription factors that specify the B-cell lineage (Somasundaram et al. 2015), and were identified previously as shared essential genes between the Raji and Jiyoye cell lines (Wang et al. 2015). MANF, encodes mesencephalic astrocyte-derived neurotrophic factor and is expressed at high level in secretory tissues such as the pancreas and B-cells. MANF helps cells cope with high levels of protein folding stress in the endoplasmic reticulum (Lindahl et al. 2017) and also activates innate immune cells to facilitate tissue regeneration (Neves et al. 2016). CYB561A3, encodes a poorly characterized ascorbate-dependent cytochrome b561 family member implicated in transmembrane electron transfer and iron homeostasis (Asard et al.) but its role in B-cell proliferation or survival is not characterized. We also examined all of the genes that were essential only in B-cell lines and found that these tended to be more highly expressed in NALM-6 cells than genes essential in an equivalent number of other cell lines (Fig. 5E), consistent with the higher expression of genes involved in B-cell proliferation in NALM-6 cells. These essential genes were significantly more likely to participate in the B-cell receptor (BCR) signaling pathway (Fig. 5F), which is often hyper-activated in chronic lymphocytic leukemia (Croft et al. 2011; Seda and Mraz 2015). Concordantly, disruption of this pathway reveals vulnerabilities specific to the B-cell lineage. For example, the BCR pathway components TSC2, PI3KCD, CD79B and CD19 were all essential in two of the three B-cell lines tested to date (Table S3).

Residue-level features predict phenotypic effects of sgRNA-directed in-frame mutations A fraction of indels introduced into genomic DNA following error-prone repair of a Cas9- mediated DSB will span a multiple of 3 base pairs such that the phenotypic effect will depend on the precise function of the mutated residue. For each sgRNA in the EKO library that targeted the 2,236 essential genes in NALM-6 cells, we identified the codon that would be subjected to Cas9 cleavage and hence the residue most likely to be affected by in-frame indels. We found that sgRNAs targeting predicted domain-coding regions were significantly more depleted from the

Bertomeu et al 12 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

pool than other sgRNAs targeting the same gene (p=1.8e-45, Fig. 6A). This result confirms a previous focused study on high density sgRNA-mediated targeting of ~200 genes (Shi et al. 2015) and generalizes the effect to the diverse classes of domains encoded by the entire genome. Based on the high degree of significance of this result, we asked whether other protein-level features would also correlate with sgRNA targeting sites. We found that sgRNAs targeting disordered protein regions were significantly less depleted than other sgRNAs targeting the same gene (Fig. 6B). The significance of this trend also held when the analysis was restricted to regions outside of Pfam domains (p=8.98e-6, data not shown). Disruption of more conserved regions caused significantly more depletion than for less conserved regions of the same gene (Fig. 6C). Targeted regions that encoded a-helices or b-sheets were also significantly more depleted than other regions (Fig. 6D). Buried residues within a protein structure were more depleted than accessible residues (Fig. 6E). Finally, sgRNAs targeting interfacial regions between two protein subunits were more depleted than non-interfacial regions of the same protein (Fig. 6F). Integration of each variable into a single linear multi-variate model, together with the number of potential off-target cleavage sites and predicted sgRNA efficiencies, revealed that every variable was significantly and independently correlated with relative sgRNA depletion (Table 1). This result indicated that each residue-level feature contributed to the phenotypic effects of in-frame mutations (Fig. 6G).

Essentiality of alternatively-spliced exons The EKO library was designed in part to target specific alternatively-spliced exons, many of which are found in essential gene loci. From the depletion of sgRNAs targeting these exons, we were able to classify individual exons within essential genes as either essential or non-essential (Table S5). We analyzed the 2,143 alternative exons within RefSeq-defined coding regions of essential genes in the NALM-6 screen, each of which was covered by at least 3 sgRNAs with ≥20 reads, to identify 462 exons with sgRNAs that were significantly depleted (FDR<0.05). When we compared these essential alternative exons to non-essential exons (N=592, FDR>0.3) in essential genes, we found that the essential exons were more likely to overlap protein domains (Fig. 7A), less likely to contain long disordered regions (Fig. 7B), more conserved across 46 vertebrate species (Fig. 7C), and more highly expressed at both the protein (Fig. 7D) and mRNA (Fig. 7E) level. Importantly, mRNA expression analysis showed that all 462 essential exons were expressed in NALM-6 cells. We also mapped the exons to full-length isoforms in the IsoFunct database, which assigns functions to individual protein isoforms (Panwar et al. 2016). Isoforms that contained essential exons were more likely to be functional

Bertomeu et al 13 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

(i.e., higher IsoFunct scores) than isoforms with non-essential exons (Fig. 7F). The non- essential nature of particular exons may reflect structural features or protein interactions associated with the exon encoded region. For example, the anaphase-promoting complex/cyclosome (APC/C), contains a non-essential exon in the essential gene ANAPC5, which interacts with ANAPC15, itself a non-essential component of the complex (Fig. 7G) (Chang et al. 2015). These results suggested that the EKO library can effectively distinguish essential from non-essential alternatively-spliced exons within essential gene loci, and that many alternatively spliced exons of essential genes are non-essential.

Genetic analysis of hypothetical genes The EKO library was also designed to target unvalidated hypothetical genes that are currently absent from the RefSeq database (NCBIResourceCoordinators 2016). We identified these hypothetical genes from the AceView and GenCode databases, which annotate loci on the basis of expressed sequence tag (EST) and/or RNA-seq evidence. Because many hypothetical genes may be expressed pseudogenes present in more than one copy in the genome, we excluded all sgRNAs with close mismatches to the genome in order to avoid depletion effects due to multiple cleavage events. When we considered only sgRNAs with a single potential cleavage site, we identified 44 essential genes (FDR<0.05) that were absent from RefSeq (Table S3). Similar to most poorly annotated hypothetical genes, these genes tended to encode short polypeptides, with a median length of 153 residues and a maximum length of 581 residues (Fig. 8A). We found that these essential hypothetical genes were more highly expressed across a range of tissues than the 2,000 hypothetical genes with the least depleted sgRNAs (Fig. 8B). We also analyzed a comprehensive mass spectrometry-based proteomics dataset that covers 73 tissues and body fluids (Wilhelm et al. 2014) and found that the 500 hypothetical genes with the strongest sgRNA depletion scores were more likely to have evidence of protein expression than the 2,000 genes with the lowest sgRNA depletion scores (Fig. 8C). However, alignments across 46-vertebrate genomes revealed that the essential hypothetical genes were not more conserved than their non-essential counterparts (Fig. 8D), suggesting that the essential functions were acquired through recent evolution. These results suggest that at least a fraction of newly evolved uncharacterized hypothetical genes are likely to be expressed and perform important functions.

Discussion Genome-wide collections of genetic reagents have allowed the identification of essential genes

Bertomeu et al 14 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

and insights into the functional architecture of model organisms. The first genome-wide CRISPR/Cas9 and gene-trap screens have defined a draft map of essential gene functions across a variety of human cell types (Blomen et al. 2015; Hart et al. 2015; Wang et al. 2015). Here, we have applied the high-complexity EKO library to define new essential features at the level of protein residues, alternatively spliced exons, previously uncharacterized hypothetical coding regions, and protein complexes.

A continuum of gene essentiality We combined our screen data with two previously published CRISPR/Cas9 screen datasets to define a minimal set of 486 UE genes across 10 different cell lines of diverse origins. As opposed to a simple monotonic convergence on a core set of essential genes, we observed an unexpected bimodal distribution of essential genes as a function of the number of lines screened. This distribution was best explained by a continuous variation in the probability of gene essentiality as opposed to a simple all-or-none binary model. This continuous probability model likely reflects genetic and/or epigenetic background effects, wherein different cell line contexts provide different levels of buffering against the loss of potentially essential genes. In this sense, the number of cell line backgrounds in which a gene is essential can be thought of as a form of genetic interaction degree, whereby the interaction occurs between any gene and a complex genetic background, as opposed to between two genes. In yeast, strain background markedly affects gene essentiality due to high order genetic interactions (Dowell et al. 2010), such that the essentialome of a particular cell must be defined in the context of a precise set of genetic and/or epigenetic parameters. The essentialome may be thought of as an onion with multiple layers that become progressively more context-specific. As we show here, UE genes at the center of the onion make quantitatively greater fitness contributions than progressively more cell line-specific essential genes. The middle layers of the onion correspond to the plateau in the bimodal distribution in which genes are essential in specific cell lines. The definition of essentiality is obviously influenced by the definition of experimental thresholds, but the continuum of essentiality is nevertheless apparent regardless of the specific threshold. Based on the current but limited analysis, we estimate that this consensus essentialome for 8 or more cell lines will encompass approximately 1,000 genes. This number intriguingly is close to the number of 1,114 essential genes required for yeast growth under optimal growth conditions (Giaever et al. 2002). It is important to note that this quantitative model of essentiality does not rule out the existence of a set of true UE genes. In fact, the continuum model predicts a smooth decline out

Bertomeu et al 15 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

to the maximal number of lines, in contrast to the sharp step increase observed experimentally at the maximum. This observation suggests the existence of a small set of UE genes that may be qualitatively distinct from the continuum of essentiality captured by the model. This set of essential functions likely corresponds to the structural and enzymatic core of the proteome, to which other essential functions have been added and subtracted through the course of evolution.

A continuum of essential subunits in protein complexes As in model organisms and as suggested by previous studies, the proteins encoded by UE genes tend to cluster together into modules in the protein-protein interaction network. This correlation between centrality and lethality is a robust feature of biological networks. Although originally posited to reflect the central location of essential proteins in network graphs (Jeong et al. 2001), this topological argument has been overturned in favor of the idea that essential proteins perform their functions as complexes (Zotenko et al. 2008). This notion has in turn been supported by the claim that protein complexes exhibit a tendency to be composed of mainly essential or mainly non-essential subunits (Hart et al. 2007; Ryan et al. 2013), such that the interactions of essential protein subunits would naturally cluster together. A surprising outcome of our analysis across multiple different human cell lines is the extent to which subunit essentiality for any given complex varies between cell lines. Our analysis of structural data suggests that UE subunits form the functional and structural core of essential protein complexes, to which CE, LE and NE subunits are appended. This observation probably reflects the fact that protein machines have acquired progressively more subunits through evolution, and that more recently evolved subunits tend to lie outside the essential structural core (Kim et al. 2006; Blomen et al. 2015). This variable essentiality of protein subunits is consistent with the lethality-centrality relationship (Fraser et al. 2002; Zotenko et al. 2008), but suggests that network evolution may also help drive the observed correlation (Coulombe-Huntington and Xia 2017).

Cell line-specific essential genes and synthetic lethality Only 4 essential genes were uniquely shared between the B-cell derived Raji, Jiyoye and NALM-6 lines, and only 35 genes were uniquely essential in B-cells and shared between two of the three lines. Similarly, a recent study identified only five essential genes shared between five acute myeloid leukemia (AML) cell lines, and 66 shared genes between at least three of the five lines (Tzelepis et al. 2016). These results suggest that genetic and epigenetic variation between

Bertomeu et al 16 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

lines may dominate the cell line-specific essentialome. Indeed, the majority of essential genes identified in human genome-wide screens to date are unique to single cell lines. In comparisons of wild type and laboratory yeast strains, which exhibit similar sequence variation as any two human individuals (Engel et al. 2016), strain-specific essential genes have a complex contextual basis due to multiple undefined genetic modifiers (Dowell et al. 2010). It seems likely that many cell type essential genes will reflect specific synthetic lethal interactions associated with the unique spectrum of cancer-associated mutations in any given cancer cell line. If on average each mutation yields 20 synthetic lethal interactions (Blomen et al. 2015; Costanzo et al. 2016), only 10 cell line-specific loss of function mutations would be needed to account for a specific essentiality profile of 200 LE genes. As an example, several components of cytochrome C oxidase and its assembly factors were specifically essential for survival in NALM-6 cells, potentially as a consequence of oncogenic RAS signaling (Telang et al. 2012), which is known to cooperatively trigger senescence in conjunction with mitochondrial defects (Nakamura et al. 2014). From a therapeutic perspective, the variable subunit essentiality of protein complexes suggests that a window of genetic sensitivity may exist for essential functions that are partially compromised in cancer cells (Hartwell et al. 1997; Hartman et al. 2001).

Evolution of new functions Recent RNA-seq and proteomics studies have illuminated the magnitude of alternative splicing in mammals and the attendant proteomic and phenotypic diversity generated by this mechanism (Barbosa-Morais et al. 2012). Different splice isoforms can have radically different functions, as for example pro- or anti-apoptotic splice variants of BCL2 (Kontos and Scorilas 2012). Limited systematic studies on dozens to hundreds of isoforms suggest that exon composition can often dramatically alter protein interaction profiles (Ellis et al. 2012; Yang et al. 2016). Our genome- wide screen with the EKO library suggests that a large fraction of alternatively spliced exons in essential genes are not required for cell survival. This result suggests that alternative splicing has evolved as a means to diversify protein structure and interactions without compromising essential functions. Our data also shows that essential exons in general tend to encode structured protein domains that are more highly conserved and highly expressed, whereas non- essential exons tend to encode intrinsically disordered regions. Non-essential exons likely represent the first step towards the evolution of new functions, some of which may be then destined to become essential. In contrast to the generation of new genes by duplication (Ohno 1970), the de novo appearance of new protein-coding genes is a poorly understood but nevertheless important

Bertomeu et al 17 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

evolutionary mechanism. For example, in mammals 1,828 known genes are unique to the primate lineage and 3,111 unique to rodents (Zhang et al. 2010). Proto-genes are genetic loci that produce very short proteins unique to each species and it has been suggested that a large pool of such potential genes exist in yeast (Carvunis et al. 2012). If a proto-gene assumes a beneficial function, it will be subjected to selective pressure and co-evolve across related species, such that even young genes can rapidly acquire essential functions (Chen et al. 2010). However, the detection of such proto-genes is hampered by the absence of sequence conservation and short length, which often precludes detection by mass spectrometry (Bruford et al. 2015). As shown here, CRISPR/Cas9-based screens allow the systematic functional analysis of hypothetical human genes. We detected an essential function for 44 hypothetical genes that was consistent with mRNA and protein expression data. These essential hypothetical gene functions were also evident across 36 different EKO library screens under various chemical stress conditions (data not shown). Although we cannot exclude the possibly that some of these loci may be non-coding RNA genes or pseudogenes, it is probable that these short reading frames represent newly evolved human genes. The biochemical interrogation of the corresponding proteins should help identify the function of these new candidate genes. Although we restricted our analysis to currently annotated hypothetical loci, it will be feasible to systematically query all short open reading frames in the human genome with dedicated sgRNA libraries.

High resolution disruption of the human genome Genome-wide CRISPR/Cas9-based screens have ushered in a new era of systems genetics in human cells (Shalem et al. 2015). The EKO library demonstrates the capacity of high resolution CRISPR/Cas9 screens to define gene essentiality across scales in the human proteome. Analogous deep coverage sgRNA libraries and variant CRISPR/Cas9 strategies also enable the precise mapping of regulatory and non-coding regions in the human genome (Wright and Sanjana 2016). Systematic functional analysis of these vast unexplored regions of the genome and proteome will provide insights into biological mechanisms and the evolution of phenotypic complexity in human cells.

Bertomeu et al 18 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Methods Genome-wide sgRNA library A set of 181,130 sgRNA sequences that target most RefSeq genes (~10 sgRNAs per gene) were as reported previously (Wang et al. 2014). Similar design rules were used to target additional genes from the latest RefSeq, Aceview and GENCODE releases (~10 sgRNAs per gene), as well as 20,852 alternatively-spliced exons derived from 8,744 genes (~3 sgRNA per exon). A set of 2,043 non-targeting sgRNA sequences with no detectable match to the human genome were randomly generated. The complete set of 278,754 sgRNAs was divided into three sub-library pools of 92,918 sgRNAs for array-based oligonucleotide synthesis as 60 mers (Custom Array), each containing 3-4 sgRNAs per gene, 1 sgRNA per alternative exon and 681 non-targeting sgRNAs. Each pool was amplified by PCR, cloned by Gibson assembly into the pLX-sgRNA plasmid (Wang et al. 2014), expanded in plasmid format and converted to a lentiviral pool by transfection into 293T cells (see Supplemental Information and Table S6).

Screen for essential genes A doxycycline-inducible Cas9 clonal cell line of NALM-6 was generated by infection with a Cas9-FLAG lentiviral construct and puromycin selection, followed by FACS sorting and immunoblot with anti-FLAG antibody to select a tightly regulated clone. The inducible Cas9 cell line was infected with pooled lentivirus libraries at MOI = 0.5 and 500 cells per sgRNA. After 6 days of blasticidin selection, 140 million cells were induced with doxycycline for 7 days, followed by periods of outgrowth in the absence of doxycycline. sgRNA sequences were recovered by PCR of genomic DNA, re-amplified with Illumina adapters, and sequenced on an Illumina HiSeq 2000 instrument.

Scoring of gene essentiality A RANKS score was calculated for every gene targeted by ≥4 sgRNAs represented by ≥20 reads in one sample. A single-tailed p-value for each sgRNA was obtained by comparing its

ratio to that of the control sgRNAs. The RANKS score was determined as the average loge p- value of the sgRNAs targeting the gene or exon. Gene/exon p-values were generated by comparing the RANKS score to a control distribution of an equal number of control sgRNAs for 5 million samples with FDR correction (Benjamini and Hochberg 1995). The 2,043 non-targeting sgRNAs were used as controls in calculating the original RANKS score for scoring method comparisons and the binned rank analyses. To control for non-specific effects of Cas9 cleavage, the entire EKO library was used as a control after removing sgRNAs that targeted

Bertomeu et al 19 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

previously documented essential genes (Blomen et al. 2015; Hart et al. 2015; Wang et al. 2015) and sgRNAs with ≥2 predicted off-target cleavage sites. A fixed correction factor was applied to sgRNAs with 1 or 2 predicted mismatches. For the identification of essential non-RefSeq genes, only sgRNAs with unique matches to the genome were used to avoid potential confounding cleavage events at paralogous loci.

Further details of experimental, computational and statistical methods are provided in the Supplemental Information. All materials are available upon request and code for RANKS may be obtained at https://github.com/JCHuntington/RANKS.

Acknowledgements We thank Driss Boudeffa, Bobby-Joe Breitkreutz, Manon Lord, Jennifer Huber, Philippe Daoust and Alfredo Staffa for technical support, and Traver Hart, Jason Moffat, Luisa Izzi and other members of the Tyers laboratory for helpful discussions. JC-H was supported by a Canadian Institutes of Health Research (CIHR) postdoctoral fellowship, KB by Cole Foundation and Fonds de recherche du Québec Santé studentships, YX by a Canada Research Chair in Computational and Systems Biology, and MT by a Canada Research Chair in Systems and Synthetic Biology. Funded by grants from the Natural Sciences and Engineering Research Council of Canada (RGPIN-2014-03892 to YX), the CIHR (MOP-126129 to MT), the Canadian Cancer Society Research Institute (703906 to MT), the National Institutes of Health (R01RR024031 to MT), and by an award from the Ministère de l’enseignement supérieur, de la recherche, de la science et de la technologie du Québec through Génome Québec to MT.

Author contributions TB and JC-H designed the EKO library; TB built the EKO library and performed the screen with assistance from KB; JC-H developed the RANKS algorithm and performed all statistical analyses; AC-A performed protein interaction analyses; TB, JC-H, A-CA, YX and MT wrote the manuscript.

Disclosure declaration The authors declare no conflicts of interest.

Bertomeu et al 20 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

References Aguirre AJ, Meyers RM, Weir BA, Vazquez F, Zhang CZ, Ben-David U, Cook A, Ha G, Harrington WF, Doshi MB et al. 2016. Genomic copy number dictates a gene- independent cell response to CRISPR-Cas9 targeting. Cancer Discov 6: 914-929. Asard H, Barbaro R, Trost P, Berczi A. 2013. Cytochromes b561: ascorbate-mediated trans- membrane electron transport. Antioxid Redox Signal 19: 1026-1035. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H. 2006. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2: 2006 0008. Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, Flanagan A, Teague J, Futreal PA, Stratton MR et al. 2004. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer 91: 355-358. Barbosa-Morais NL, Irimia M, Pan Q, Xiong HY, Gueroussov S, Lee LJ, Slobodeniuc V, Kutter C, Watt S, Colak R et al. 2012. The evolutionary landscape of alternative splicing in vertebrate species. Science 338: 1587-1593. Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Statist Soc Series B 57: 289–300. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 14: 708-715. Blomen VA, Majek P, Jae LT, Bigenzahn JW, Nieuwenhuis J, Staring J, Sacco R, van Diemen FR, Olk N, Stukalov A et al. 2015. Gene essentiality and synthetic lethality in haploid human cells. Science 350: 1092-1096. Boutros M, Kiger AA, Armknecht S, Kerr K, Hild M, Koch B, Haas SA, Paro R, Perrimon N, Heidelberg Fly Array C. 2004. Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 303: 832-835. Bruford EA, Lane L, Harrow J. 2015. Devising a consensus framework for validation of novel human coding loci. J Proteome Res 14: 4945-4948. Carvunis AR, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, Charloteaux B, Hidalgo CA, Barbette J, Santhanam B et al. 2012. Proto-genes and de novo gene birth. Nature 487: 370-374. Chang L, Zhang Z, Yang J, McLaughlin SH, Barford D. 2015. Atomic structure of the APC/C and its mechanism of protein ubiquitination. Nature 522: 450-454.

Bertomeu et al 21 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Chari R, Mali P, Moosburner M, Church GM. 2015. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nat Methods 12: 823-826. Chatr-Aryamontri A, Breitkreutz BJ, Oughtred R, Boucher L, Heinicke S, Chen D, Stark C, Breitkreutz A, Kolas N, O'Donnell L et al. 2015. The BioGRID interaction database: 2015 update. Nucleic Acids Res 43: D470-478. Chen S, Zhang YE, Long M. 2010. New genes in Drosophila quickly become essential. Science 330: 1682-1685. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X, Jiang W, Marraffini LA et al. 2013. Multiplex genome engineering using CRISPR/Cas systems. Science 339: 819- 823. Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S et al. 2010. The genetic landscape of a cell. Science 327: 425-431. Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353. Coulombe-Huntington J, Xia Y. 2017. Network centrality analysis in fungi reveals complex regulation of lost and gained genes. PLoS One 12: e0169459. Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath G, Jassal B et al. 2011. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res 39: D691-697. Dickinson ME, Flenniken AM, Ji X, Teboul L, Wong MD, White JK, Meehan TF, Weninger WJ, Westerberg H, Adissu H et al. 2016. High-throughput discovery of novel developmental phenotypes. Nature 537: 508-514. Dosztanyi Z, Csizmok V, Tompa P, Simon I. 2005. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21: 3433-3434. Dowell RD, Ryan O, Jansen A, Cheung D, Agarwala S, Danford T, Bernstein DA, Rolfe PA, Heisler LE, Chin B et al. 2010. Genotype to phenotype: a complex problem. Science 328: 469. Ellis JD, Barrios-Rodiles M, Colak R, Irimia M, Kim T, Calarco JA, Wang X, Pan Q, O'Hanlon D, Kim PM et al. 2012. Tissue-specific alternative splicing remodels protein-protein interaction networks. Mol Cell 46: 884-892.

Bertomeu et al 22 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Engel SR, Weng S, Binkley G, Paskov K, Song G, Cherry JM. 2016. From one to many: expanding the Saccharomyces cerevisiae reference genome panel. Database (Oxford) 2016. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. 2002. Evolutionary rate in the protein interaction network. Science 296: 750-752. Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B et al. 2002. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418: 387-391. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S et al. 2012. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res 22: 1760-1774. Hart GT, Lee I, Marcotte ER. 2007. A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality. BMC Bioinformatics 8: 236. Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, Mis M, Zimmermann M, Fradet-Turcotte A, Sun S et al. 2015. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell 163: 1515-1526. Hartman JLt, Garvik B, Hartwell L. 2001. Principles for the buffering of genetic variation. Science 291: 1001-1004. Hartwell LH, Szankasi P, Roberts CJ, Murray AW, Friend SH. 1997. Integrating genetic approaches into the discovery of anticancer drugs. Science 278: 1064-1068. Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, Proctor M, St Onge RP, Tyers M, Koller D et al. 2008. The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science 320: 362-365. Hirsh AE, Fraser HB. 2001. Protein dispensability and rate of evolution. Nature 411: 1046-1049. Hurwitz R, Hozier J, LeBien T, Minowada J, Gajl-Peczalska K, Kubonishi I, Kersey J. 1979. Characterization of a leukemic cell line of the pre-B phenotype. Int J Cancer 23: 174- 180. Jeong H, Mason SP, Barabasi AL, Oltvai ZN. 2001. Lethality and centrality in protein networks. Nature 411: 41-42. Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E. 2012. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337: 816- 821.

Bertomeu et al 23 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Kamath RS, Fraser AG, Dong Y, Poulin G, Durbin R, Gotta M, Kanapin A, Le Bot N, Moreno S, Sohrmann M et al. 2003. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421: 231-237. Kim DU, Hayles J, Kim D, Wood V, Park HO, Won M, Yoo HS, Duhig T, Nam M, Palmer G et al. 2010. Analysis of a genome-wide set of gene deletions in the fission yeast Schizosaccharomyces pombe. Nat Biotechnol 28: 617-623. Kim PM, Lu LJ, Xia Y, Gerstein MB. 2006. Relating three-dimensional structures to protein networks provides evolutionary insights. Science 314: 1938-1941. Kontos CK, Scorilas A. 2012. Molecular cloning of novel alternatively spliced variants of BCL2L12, a new member of the BCL2 gene family, and their expression analysis in cancer cells. Gene 505: 153-166. Li W, Xu H, Xiao T, Cong L, Love MI, Zhang F, Irizarry RA, Liu JS, Brown M, Liu XS. 2014. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol 15: 554. Lindahl M, Saarma M, Lindholm P. 2017. Unconventional neurotrophic factors CDNF and MANF: Structure, physiological functions and therapeutic potential. Neurobiol Dis 97: 90- 102. Munoz DM, Cassiani PJ, Li L, Billy E, Korn JM, Jones MD, Golji J, Ruddy DA, Yu K, McAllister G et al. 2016. CRISPR screens provide a comprehensive assessment of cancer vulnerabilities but generate false-positive hits for highly amplified genomic regions. Cancer Discov 6: 900-913. Nakamura M, Ohsawa S, Igaki T. 2014. Mitochondrial defects trigger proliferation of neighbouring cells via a senescence-associated secretory phenotype in Drosophila. Nat Commun 5: 5264. NCBIResourceCoordinators. 2016. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 44: D7-D19. Neves J, Zhu J, Sousa-Victor P, Konjikusic M, Riley R, Chew S, Qi Y, Jasper H, Lamba DA. 2016. Immune modulation by MANF promotes tissue repair and regenerative success in the retina. Science 353: aaf3646. Ohno S. 1970. Evolution by gene duplication. Springer, Berlin. Panwar B, Menon R, Eksi R, Li HD, Omenn GS, Guan Y. 2016. Genome-wide functional annotation of human protein-coding splice variants using multiple instance learning. J Proteome Res 15: 1747-1753.

Bertomeu et al 24 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Ramani AK, Chuluunbaatar T, Verster AJ, Na H, Vu V, Pelte N, Wannissorn N, Jiao A, Fraser AG. 2012. The majority of animal genes are required for wild-type fitness. Cell 148: 792- 802. Roemer T, Jiang B, Davison J, Ketela T, Veillette K, Breton A, Tandia F, Linteau A, Sillaots S, Marta C et al. 2003. Large-scale essential gene identification in Candida albicans and applications to antifungal drug discovery. Mol Microbiol 50: 167-181. Rose PW, Prlic A, Bi C, Bluhm WF, Christie CH, Dutta S, Green RK, Goodsell DS, Westbrook JD, Woo J et al. 2015. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res 43: D345-356. Ruepp A, Waegele B, Lechner M, Brauner B, Dunger-Kaltenbach I, Fobo G, Frishman G, Montrone C, Mewes HW. 2010. CORUM: the comprehensive resource of mammalian protein complexes--2009. Nucleic Acids Res 38: D497-501. Ryan CJ, Krogan NJ, Cunningham P, Cagney G. 2013. All or nothing: protein complexes flip essentiality between distantly related eukaryotes. Genome Biol Evol 5: 1049-1059. Samson JE, Magadan AH, Sabri M, Moineau S. 2013. Revenge of the phages: defeating bacterial defences. Nat Rev Microbiol 11: 675-687. Sanjana NE, Shalem O, Zhang F. 2014. Improved vectors and genome-wide libraries for CRISPR screening. Nat Methods 11: 783-784. Seda V, Mraz M. 2015. B-cell receptor signalling and its crosstalk with other pathways in normal and malignant cells. Eur J Haematol 94: 193-205. Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, Mikkelsen TS, Heckl D, Ebert BL, Root DE, Doench JG et al. 2014. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343: 84-87. Shalem O, Sanjana NE, Zhang F. 2015. High-throughput functional genomics using CRISPR- Cas9. Nat Rev Genet 16: 299-311. Shi J, Wang E, Milazzo JP, Wang Z, Kinney JB, Vakoc CR. 2015. Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nat Biotechnol 33: 661-667. Somasundaram R, Prasad MA, Ungerback J, Sigvardsson M. 2015. Transcription factor networks in B-cell differentiation link development to acute lymphoid leukemia. Blood 126: 144-152. Telang S, Nelson KK, Siow DL, Yalcin A, Thornburg JM, Imbert-Fernandez Y, Klarer AC, Farghaly H, Clem BF, Eaton JW et al. 2012. Cytochrome c oxidase is activated by the oncoprotein Ras and is required for A549 lung adenocarcinoma growth. Mol Cancer 11: 60.

Bertomeu et al 25 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Thierry-Mieg D, Thierry-Mieg J. 2006. AceView: a comprehensive cDNA-supported gene and transcripts annotation. Genome Biol 7 Suppl 1: S12 11-14. Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294: 2364-2368. Tyner JW, Erickson H, Deininger MW, Willis SG, Eide CA, Levine RL, Heinrich MC, Gattermann N, Gilliland DG, Druker BJ et al. 2009. High-throughput sequencing screen reveals novel, transforming RAS mutations in myeloid leukemia patients. Blood 113: 1749-1755. Tzelepis K, Koike-Yusa H, De Braekeleer E, Li Y, Metzakopian E, Dovey OM, Mupo A, Grinkevich V, Li M, Mazan M et al. 2016. A CRISPR dropout screen identifies genetic vulnerabilities and therapeutic targets in acute myeloid leukemia. Cell Rep 17: 1193- 1205. Wan LC, Maisonneuve P, Szilard RK, Lambert JP, Ng TF, Manczyk N, Huang H, Laister R, Caudy AA, Gingras AC et al. 2016. Proteomic analysis of the human KEOPS complex identifies C14ORF142 as a core subunit homologous to yeast Gon7. Nucleic Acids Res doi:10.1093/nar/gkw1181. Wang T, Birsoy K, Hughes NW, Krupczak KM, Post Y, Wei JJ, Lander ES, Sabatini DM. 2015. Identification and characterization of essential genes in the human genome. Science 350: 1096-1101. Wang T, Wei JJ, Sabatini DM, Lander ES. 2014. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343: 80-84. Wilhelm M, Schlegl J, Hahne H, Gholami AM, Lieberenz M, Savitski MM, Ziegler E, Butzmann L, Gessulat S, Marx H et al. 2014. Mass-spectrometry-based draft of the human proteome. Nature 509: 582-587. Wright JB, Sanjana NE. 2016. CRISPR Screens to Discover Functional Noncoding Elements. Trends Genet 32: 526-529. Yang X, Coulombe-Huntington J, Kang S, Sheynkman GM, Hao T, Richardson A, Sun S, Yang F, Shen YA, Murray RR et al. 2016. Widespread expansion of protein interaction capabilities by alternative splicing. Cell 164: 805-817. Zhang YE, Vibranovski MD, Landback P, Marais GA, Long M. 2010. Chromosomal redistribution of male-biased genes in mammalian evolution with two bursts of gene gain on the X . PLoS Biol 8.

Bertomeu et al 26 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Zotenko E, Mestre J, O'Leary DP, Przytycka TM. 2008. Why do hubs in the yeast protein interaction network tend to be essential: reexamining the connection between the network topology and essentiality. PLoS Comput Biol 4: e1000140.

Bertomeu et al 27 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure Legends Figure 1. Genome-wide sgRNA library generation and screen for essential genes. (A) Composition of the EKO library. Some sgRNAs were used to evaluate both an alternative exon and the entire gene. (B) Experimental design for screen. A NALM-6 doxycycline-inducible Cas9 cell line was transduced with the EKO library at MOI of ~0.5, followed by sgRNA vector selection, population outgrowth and determination of sgRNA frequencies at the indicated time points. (C) Number of sgRNA reads pre-doxycycline induction (day 0) superimposed on distributions after doxycycline induction (day 7) and a further 8 days outgrowth in media only (day 15). A constant factor was used to center the two distributions. (D) Effect of protein intrinsic disorder on kinetics of sgRNA depletion. Average probability of disordered stretches over the entire protein as predicted by IUPred (Dosztanyi et al. 2005) for the 1,000 genes with most highly depleted sgRNAs at day 7 versus versus day 15, two-tailed Wilcoxon test. (E) Effect of target protein half-life on kinetics of sgRNA depletion. Half-life of mouse orthologs of the 1,000 most highly depleted genes after day 7 (N=645) was compared to day 15 (N=724). P-value,

two-tailed Wilcoxon test. (F) Distribution of log2 sgRNA frequency changes for day 0 versus day 15 for all previously described non-essential sgRNAs (Blomen et al. 2015; Hart et al. 2015; Wang et al. 2015) compared to non-targeting sgRNAs in this study. An sgRNA targeting an essential gene in this study (RUVBL1) is shown for reference. Single-tailed p-values for each sgRNA were calculated as the area under the curve of the control distribution to the left of the sgRNA score divided by the total area under the control curve.

Figure 2. Features correlated with gene depletion. (A) RefSeq genes were ranked in bins of 2,000 genes from most to least depleted sgRNAs. Mutation rate was the fraction of aligned residues that differed from the human sequence across 45 vertebrate species in a 46-way Multi- Z whole-genome alignment. Protein interaction was the number of partners reported in the

BioGRID database (3.4.133 release). mRNA expression was the log2 RNA-seq reads per million (RPM) in the NALM-6 cell line. HAP1 gene-trap score was the ratio of sense to anti-sense

intronic insertions (Blomen et al. 2015). KBM7 CRISPR score was the average log2 read frequency change (Wang et al. 2015). DNAse1 hypersensitivity was the number of read peaks per kbp in naïve B-cells from ENCODE. (B) Correlation of gene depletion with potential sgRNA

target sites. Log2 read frequency fold-change for day 0 to day 15 was binned by number of perfect or single base mismatches (within the first 12 bases of sgRNA) to the human genome.

Figure 3. Universal essential genes. (A) Overlap between genes defined as essential in this

Bertomeu et al 28 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

study and three other studies (Blomen et al. 2015; Hart et al. 2015; Wang et al. 2015). The indicated overlap is between all essential genes identified in each study, i.e., an amalgam across all cell lines, in order to assess inter-library reproducibility. (B) Clustergram of essential genes across 10 different cell lines from CRISPR studies in panel A. (C) Biological processes enriched in UE genes. (D) Number of essential genes shared between cell lines. Experimentally observed essential genes are shown as a histogram. Indicated models to account for shared essential genes were fitted using maximum likelihood. See text for details. (E) Fraction of essential genes with orthologous non-essential or essential yeast genes as a function of cell line number. (F) Average score of each essential gene as a function of cell line number. (G) Fraction and relative enrichment of UE proteins in specific protein-protein interaction network motifs from BioGRID.

Figure 4. Essential subunits of protein complexes. (A) Pairs of essential genes in the same cell line tend to encode subunits of the same protein complex compared to essential genes unique to different cell lines. All possible pairs of cell lines were sampled by randomly choosing two pairs of genes per sample, one in which each gene was essential in line 1 but not in line 2 and then one additional gene essential in line 2 but not line 1. Probability of the first and second gene belonging to the same complex and that of the first and third gene were estimated from 10 million trials. P-value, Fisher’s exact test. Self-interacting proteins were excluded. (B) Cell lines clustered as function of shared essential protein complexes. All protein complexes were taken from the CORUM database release 17.02.2012 (Ruepp et al. 2010) and are listed in Table S4. (C) Distribution of essential subunits in protein complexes. (D) Fraction of essential subunits as a function of cell line number. Dot size is proportional to the number of complexes. (E) Mean fraction of mutated residues across 45 vertebrate species for all pairs of proteins that are part of a common CORUM complex and essential in at least one cell line. P-value, paired Wilcoxon rank-sum test. (F) Expression of subunits of CORUM complexes across 30 different tissues types as function of essentiality. (G) Proximity of subunits with at least 4 mapped subunits in a common PDB structure as a function of essentiality. (H) Variable essentiality of subunits of the KEOPS complex (Wan et al. 2016).

Figure 5. Cell type-specific essential genes. (A) Expression level as a function of protein interaction degree for essential genes unique to the NALM-6 screen (red) versus UE genes (blue). (B) Average interaction degree for proteins encoded by UE genes, LE genes in NALM-6 and NE genes. (C) Expression levels of UE genes, LE genes in NALM-6 and NE genes. (D)

Bertomeu et al 29 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Interactions between UE, CE, LE and NE proteins. Counts were normalized by number of genes and interactions per class. (E) mRNA expression in NALM-6 cells of essential genes either unique to one or more B-cell-derived cell lines or unique to a randomly chosen equivalent number of other cell lines. (F) Participation in the B-cell receptor pathway (BCR) of essential genes either unique to one or more B-cell-derived cell lines or unique to a randomly chosen equivalent number of other cell lines.

Figure 6. Correlation of sgRNA depletion with sub-gene and protein level features. Average log p-value of sgRNAs linked to each feature were normalized to the gene average for all sgRNAs. (A) Overlap with Pfam domains. (B) Predicted protein disorder of >40% probability as determined by UIPred (Dosztanyi et al. 2005). (C) Protein-level conservation of a 30 residue window, either more conserved or less conserved than gene average. (D) Residue burial, defined as the number of non-hydrogen atoms from non-adjacent residues present within 6 Å of the residue, for residues more or less buried than the average for the protein. (E) Residue proximity to a protein-protein interface. (F) Location of residues within an a-helix, an extended b-strand or neither as annotated from PDB. (G) Summary of protein level features that influence sgRNA depletion from the library pool.

Figure 7. Alternatively spliced exons in essential genes. (A) Fraction of exons that overlap a Pfam domain (e-value <10-5) for essential exons (FDR<0.05) and non-essential exons (FDR>0.3) in essential genes. (B) Average IUPred predicted probability of exon residues belonging to a long disordered region. (C) Average fraction of residues mutated relative to human in 45 vertebrate species. (D) Average number of matched mass spectra per residue

across human tissues. (E) Log2 RNA-seq read density in the exon normalized to gene average. (F) Average IsoFunc fold-change, representing the likeliness that a specific isoform codes for a given GO function for isoforms of essential genes. (G) Non-essential alternative exon (blue) in the essential gene ANAPC5, within the anaphase promoting complex interacts with ANAPC15, a non-essential component (green) of the complex.

Figure 8. Correlated features of predicted hypothetical genes. (A) ORF length distribution of RefSeq genes, hypothetical ORFs from AceView or GenCode covered by the EKO library. The 44 hypothetical ORFs with significant (FDR<0.05) essentiality scores in the NALM-6 screen are indicated separately. For genes with multiple transcripts, only the length of the longest predicted

protein was considered. (B) Mean log2(reads/kb) across 16 tissues from the Human BodyMap

Bertomeu et al 30 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

2.0 for 44 essential and the 2,000 least essential hypothetical genes. (C) Fraction of genes with uniquely matched mass spectra at FDR<0.001 for the 500 most essential and the 2,000 least essential hypothetical genes. (D) Fraction of residues conserved across 46-vertebrate genomes for the 44 essential and the 2,000 least essential hypothetical genes.

Table 1. Multi-variate Pearson correlation between sgRNA features and gene-normalized sgRNA depletion score.

sgRNA feature p-value Overlapping domain 0.0018 Relative conservation 1.33e-6 Relative burial 3.23-6 Overlapping alpha-helix 8.27e-5 Overlapping beta-strand 8.33e-6 Overlapping PPI interface 0.0034 Overlapping protein disorder 3.30e-5

Bertomeu et al 31 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A EKO library: 278,754 sgRNAs B

19,084 RefSeq genes (192,202 sgRNAs) day 0 day 3 day 7 day 11 day 15 day 21

20,852 Alternatively-spliced + doxycycline media only exons (58,615 sgRNAs) 3,872 predicted genes (33,982 sgRNAs) comparison of sgRNA frequencies by Illumina sequencing non-targeting (2043 sgRNAs) P=3.8e-27 C 40000 D day 0 1.0 35000

30000 0.8 25000 day 15 0.6 20000

# sgRNAs 15000 depleted 0.4 10000 sgRNAs protein disorder

5000 0.2

0 1 2 3 5 10 15 32 64 137 200340 600 0 # reads per sgRNA depleted at depleted at day 7 vs day 0 day 15 vs day 7

E P=1.38e-4 F 20000 non-essential 100

1000 non-targeting 90 16000 500 non-essential 80 below selected 70 12000 200 60 100 50 8000 40 50 selected sgRNA 30 4000 (RUVBL1) 20

20 # sgRNAs targeting non-essential genes

protein half-life (h) 10 10 # non-targeting sgRNAs 0 0 5 -6 210-1-2-3-4-5 3 sgRNA score depleted at depleted at sgRNA reads at day 15 total reads at day 15 sgRNA score = log day 7 vs day 0 day 15 vs day 7 2()sgRNA reads at day 0 ( total reads at day 0 )

Figure 1 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A P=9.9e-42 P=0.7

Color Key

0.12 0.16 Value Color Key

mutation rate 1 low high

0.12 0.16

2 P=1.1e-57 P=3.4e-3 Value

Color Key 1 2 3 4 5 6 7 8 9 10 10 20 30 Value Color Key

protein interactions 1 low high

10 20 30

P=2.4e-167 P=7.2e-3 2 Value Color Key 9 −6 0 4 1 2 3 4 5 6 7 8 Value 10 Color Key

mRNA expression 1 low high

−6 0 4 P=1.8e-244 P=5.4e-2 2 Value Color Key

0.2 0.35 0.5 1 2 3 4 5 6 7 8 9 Value 10 Color Key

HAP1 gene-trap score 1 low high

P<2.2e-208 P=3.9e-266 0.2 0.35 0.5

Color Key 2 Value

−1.5 0 1 Value Color Key 1 2 3 4 5 6 7 8 9 10

KBM7 CRISPR score 1 low high

Color Key −1.5 0 1 2 Value

P=4.2e-4 P=1.3e-43 1 2 3 4 5 6

0.12 0.2

Value 1 2 3 4 5 6 7 8 9 10 Color Key

DNAse 1 hypersensitivity 1 low high

0.12 0.2 Value

2 1-2000 1 2 3 4 5 6 2001-40004001-60006001-8000 8001-10000 1 2 3 4 5 6 7 8 9 10001-1200012001-1400014001-1600016001-1800018000-1903410 gene depletion rank B 1 2 3 4 5 6 1.7e-4 1.7e-4 1.7e-4 4.7e-58 1.0e-28 2.0e-257 2

1 1 2 3 4 5 6 0

-1

-2 sgRNA score sgRNA 1 2 3 4 5 6 -3

-4 210 3 4 5 >5 # DNA cleavage sites # sgRNAs 1 2 3 4 5 6 1129 2043 9090 3192 1746 2028 in EKO library 259276

Figure 2 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Color Key A this study Blomen et al. B (CRISPR) (gene trap) 283 NALM-6 HAP1 & KBM7 −1 0 1 32 Value 168 77 86 CIRH1A INTS8NSFCHMP2APABPN1PSMD2KIF23PSMA7 ALYREFNHP2HYNCAPD2GUK1HSELAOU1PA9C2 NGDNBYSLWDR46RARSWDR33FARSAIMP4 C9orf114HINFPRBMXNOP16KIF11CDC5LGLE1 NOP9ALG2SCO2NOL6LTV1SNRNP200DOLK CDC20SOSGEPTICRRDHX8TBCDPMPCBPATA5L1 TRMT112PRPF4ZNF259RPP30GMPPBPAK1IP1CKAP5 NDOR1SNAPC4CENRPL31PRPF8CCNA2PSMA3PA PSMB2NUDCIKBKAPNOP2POLR2BRPS18HEATR1 SPOLR2INDNL2POLR1ARANGAP1CPSF2MED11PATA5 XPO1TCP1CCT3IARSNDURFC4NCAPHFAB1 DDOSTTTC27CHEK1PSMB7ECT2KANSL3SAE1 ZNHIT2VRPS13NELFBTUFMATP6V0CDHX16ARS NSL1SFPQNOL9DGCR8SF3A3CDT1SUPT6H EIF2S1PNNTTC1PAFAH1B1RBBP6RCL1MYBBP1A DDX56RACGAP1NUP62DBR1COPZ1GGPS1DIMT1 MRP63UBA1ATP6V1ATRAPPC1EIF3ACSE1LMRPL39 CCDC84CASC5DDX23TOP1SAMED17PRPF19CM1L ATP5A1RCC1HNRNPKEIF5BVPS25WDR61PTPMT1 FEN1IPO13RNPC3PRPF31DCTN5POLR3HNARFL POLR2LPOLA1MCM3RABGGTASONQARSLAS1L SART3CDC73SF3B3NUP85PRPF6CDC123NIP7 NUP214MEPCEVCPNAT10CEBPZPCID2TAF1C DIS3ARCN1RNGTTRPS4XPSMC5TSR2POLD2 PSMA1TIMM44RFC5RAE1TIMM23RUVBL1DPAGT1 TSR1RABGGTBUPF2CCT2RPS11NDC80ORC3 ACTL6AANAPC5EIF3IUSP5RPL7U2AF1PSMB3 SNIP1RIOK1CLP1KPNB1RFC3DNAPSMB4JA3 universal EIF6CCT7WDR74CHAF1ACARSNUP133COPA DRNMTCDC6WDR70LRR1EIF4A3METTL16AD1 POLR1BGRPEL1RRM2ATP2A2GINS2PRMT5MARS ZMAT2TONSLPSMC3UTP15COPB1CDC16PSMD14 GSPT1PSMD7PSMB1SSU72ATTI2TRRAPURKB MPHOSPH10TBL3KANSL2SRBD1EXOSC8DDX47GPS1 CIANAPC4DDX24RPL38EFTUD2TIMELESSWDR3AO1 NUP155UTP6DDX54RU2AF2MFAP1POLR2ATFDC1 RFC2DC10orf2GTPBP4EEF2TPSMA2ARSARS PSMD6EXOSC6PWP2TRAPPC5NARSKINNCBP1 DDX46ARL2ERCC2EXOSC2USP39RPS21RPL13 LSM7NUP107ORAOV1CDK9HARFT1DNMT1US5 GTF2BPOLR1CPOLR2GHADIEXFSMC2EXOSC7US3 KRPOLR3CDDX49POLA2CPSF1CLTCSEC61A1I1 SLMO2THOC7PRPF38BCCT4RPS2RPA1PPIL2 EIF2B3TUBGCP6DKC1NLE1INTS3ATP6V0BMED20 SF3B2TXNL4ASF3A2CENPMAAMPRAD51BUB3 C3orf17SF1POLR3ADTLSNORD95MCM6RPL27A PKMYT1MMS22LNAZBTB11MED27UTP23DYNC1H1PA TPX2DCTN2WDR75POP4NUDT21KARSFTSJ3 PSMG3PHBEIF3GSNRPD3NOL10NFS1WDR77 EIF2B1FARSBERCC3RPN1NUBP1NVLGTF2H4 ISCUTIMM13GINS3YARS2SRSF2PSMB6PTPN23 RPL8ALG11DDX21SARSDDX20PSMD1PES1 TNPO3SRSF7PLK1FAM96BNOC4LPOLE2SART1 MAK16PDCD11CINPLARSATP5DRAD51CTAF6 EXOSC4TUBG1TTI1COPB2ATP6V1DPSMD3PUF60 UBL5DDX41SNRPFGTF3C1RBM25SPC24PSMD13 POLR2DDDX51ANAPC2MYCWDR43POLR2CBIRC5 RPL4YGPN3EIF2B2GEMIN5SRRTUPF1ARS GRNAA25SMC1ASLC7A6OSDHDDSTHOC5MASTLWD1 DDX18RPL12UTP3SRSF1DHX38TUT1SNRPE BRF2ARMC7PRMT1TWISTNBPSMD4DHX15HNRNPU ECDWBSCR22RPL11NUP98AFG3L2PCNATIMM10 DDB1WEE1GPKOWPPANDONSONNCBP2HDAC3 BMDN1RRM1COPS5SMG1MCM7RPP21UD31 SUPT5HEIF1ADISG20L2SDNSA2WRPL18ARSAD1 MED30SMC4RUVBL2MRPS6NUP93INTS9MCM4 HSRPAP1FNTBAQRPOP1TOP2AC1orf109PA5 NOP58RPP14EBNA1BP2CDC23SF3B4ARPC4WDR5 USPL1OGTNOL12SSRP1DGCR14PPP2CAYTHDC1 essential genes WDR55AKIRIN2HAMRPL9NAA10NPLOC4RPUSD4US4 SF3B5DDX52CEP192RPN2 212 PSMA5NAA15SKIV2L2EXOSC9RAD51DPMPCADNAJC9 PREBSLC35B1CHAF1B ZNRD1FNBP4C15orf41SEH1LKIAA0947RPP40TTK RIOK2POP7SYMPKBRIX1SOD1NAA50WBSCR16 UFD1LRNF113AMFN2EIF2B5PSMD11N6AMT1GBF1 MRPL17RPL3RBM28RFC1TMEM199RPL19SNAPC2 DAP3GARTADSLYY1CDC45RAD21RPL10A NUDCD3PGDWDR7COASYMVDUMPSCTPS1 NARS2AURKAIP1GMPSINO80ATICEMC1MRPL16 MTEIF2B4DTYMKDLSTPFASAHCYDHFRPAP BCS1LCRLS1MTIF2PCYT1AMTHFD1GPIGNB1L GPN2AARSCSNK2BTAF10RBBP5REXO2CAD MAT2AOSBPNRDE2DNLZUBE2IC21orf59YAE1D1 NOB1PKMMACPSF3LEXOSC5RANMTORU2 CNPRIM1EIF3DETF1CCDC86INTS5TXN2OT1 POLR3ECDK1DNAJC17IDH3ASIN3ACWF19L2PGS1 ADSSGAPDHTPT1NOL11KAT5UBE2MRPL36 RPS8PATP5OEEF1A1SMU1RBM39TUBBPAT DDX10PPA1NAMPTCHORDC1SNRNP27RPL18ACOX7C HNRNPLRPS16MCRS1SAP30BPRPL7AXRCC6RBX1 TPI1NOC2LHARPL15SUGP1PWP1MAGOHUS1 ANAPC11FPGSZNF335NUP188STRIP1TYMSFDXR RPL27KIAA0391PPP1CBCOG4VMP1ACLYPSMD12 SNRNP35CCDC94MIS18BP1NUP88KANSL1ZNF407SNAPC5 LUC7L3SYS1CHMP6POLR3KTSEN54MRPS18BPPRC1 SNUPNMAD2L2CRNKL1WDR92RBM17RRP12EXOSC10 YKT6TBCATRAIPNAE1HJURPPNPT1CENPW PFDN6UBA2ATP5BNSMCE1RPLP2NRF1RINT1 TUBGCP3UTP14AOBFC1XRCC3CTNNBL1WNK1DLD ATP1A1HGSTRAPPC3ABCF1MED18PNO1RTF1 CTC1CCDC115CLNS1ABRD2URM1ZNF622RTEL1 MDC1VPS45HSD17B12GON4LTRIM28SPRTNSCGB1C1 58 ABCE1MOCS3NAA30SLC25A3MRPL42PDCD2VPS16 BAG6 C9orf78TMEM258NAALADL1SP1MCL1TXNCOX11 CDC7CENPLANKRD49FCF1GEMIN2SURF1ABHD11 ZFC3H1MYBL2WRBPMVKFKBPLSLC31A1REV3L THOC1SMC5BRF1EXOSC3CUL3TAANAPC15CC3 SRPRBWDHD1SNF8ILF2CPSF6EIF1NOM1 PSMG1CPSF4NSMCE2ATP6V1C1UTP11LUBR4NUP43 COG3PPP1R35TRMT6PDCD7NSMCE4ATOP3AC19orf52 PPP1R7TTF1NUP153ZC3H8PSMD8C12orf45OIP5 CENPPPPIL4LONP1C18orf21EIF4EZBTB17RPF1 TIMM22TAF12VHLSFSWAPCTDP1SURF6RSL1D1 ISY1BRD4TINF2NAA20RANBP2ZCCHC9CRCP TBCETRAPPC4PLK4COPG1SETD1AINTS6SLC39A7 BTAF1PPP1R15BMRPS27RBBP8GAR1SAP18E4F1 METTL3SMC6MRPS18CMRPL15NDUMRPL14COX6B1FA2 UQCRFS1CSNK1A1XYLT2GAKTPRCOQ2COX10 ORC2RRP15SEPSECSLIASUSP36BRD8EXOSC1 OXSMBRCA2GLRX5LSG1NCAPGPSTKTUBD1 COQ3NDUFS2NDUNDUFB6PPP2R4UNC45AERCC4FA11 EEFSECTSEN34STK11RNF4CDK13MTRF1LTADA3 TAZC19orf43CASC3TFB2MRBM10ASYF2CTR5 FHCOQ6PET112MTG1IMPDH2METTL14UQCRC2 METTL1VPS72TUBE1WDR4TOMM70AARID1APALB2 POLD3ZNF574SAMD4BDIDO1CWC25MRPL52TRA2B BRCA1MED19SMNDC1NDUFB8DDX27FAM210AHTATSF1 GTPBP8PRPF39SNAPC3GLEXOC1POLR3DURODTSCR2 ATP6V1G1TRAPPC8PDCLCEP85EXOC4PGGT1BSMARCA5 TSFMASNRNSUN4C16orf80RNF40PAF1CAD9PA FBXO5UBAP1GTF2F2PARS2NAA35PNISRMRPL19 227 MRPL35STILTNIPBLMRPL24EZH2BRPF1OMM22 EXOC2SDHDYLPM1DPY30WDR24 1335 C2orf66THOP1SIVA1TTC37ZNF511ALPK3HOXA13 TMEM144RBM12BSLC26A8 PYCR1AMOTL1ABCG4P2RX2TRHSV2CSPATA7 TMEM194ABHLHE22FAM101ACNMSNMAP2K5EHD2OT7 CXCL17KIF4APDZD7APOL1FRAT1EPC2USP12 CLDN9TRA2APOLR1DPOLHSLC51AMOSPD3CALML6 MBD2JSRP1AVPR2ZNF629CLINT1DNAJC5ACAA1 PDGFRAFTCDCTF1MCHR2GNL1PPP4R4FOXK2 HNRNPADRG2MCCC1ZNF251FADS3ID1TZ1PA3 ARL6IP6WHSC1IL12RB1SLMAPTMEM214ZNF581VSNL1 NASPDDX31DPTFDP1LRPAP1MAMDC2RESTPA5 ZNF324NT5C3BPLAC8L1KNDC1NINJ2WDFY2CREB3L4 SREK1METTL23FNDC8PAK2TSR3SLX4KRTAP20−3 VSIG10LINCA1BZRAP1USP38TPM1TCHHMIPOL1 ADAM11DNASE1PRKCZRAB11FIP1ALKBH8CLC14orf142UAP1 C15orf62OSBPL5LDB3DOCK10PROL1TRAF7PRMT3 NHLRC4AMDHD2FAM124ATMEM5SYNGR3ARHGAP15HDAC2 C8orf58TCF7NAV2CCSER2LNP1COL8A2EVA1A LILRB5GALNT7C5orf15MSL1TMEM42FRMD4ASLC16A14 INSFAM212AWWTR1C11orf94MLF2DOCK7ARL4C SH3BGRL3DNMT3AANKRD37SAPCD2HNRNPDBTRCPLOD1 TRIM16CRYGBATE1PPM1GB4GALT2KLHL35PCDH18 AWNT6PDCD2LDMOB1BCYP2A7WBP2NLTRNL1YDC2 IFIT3FAM120AOSCCDC85BC17orf96IL36BQPCTLPRSS27 CPSF4LBLOC1S4RAB24TMEM255AHOPRKCEHOXD13XD9 KPNA1CDH2DDX17FCRL1RASGEF1CPODXLGLUD2 DCHS1BLNKCCDC22ZNF609ABI1CAMK1NPPB JPH3LMAN2LRSAM1CAADSSL1IGFBP6SETD6CNB1 CLRN3ZDHHC2PIH1D1FERD3LSEC23IPITGA2BPTTG1IP HKDC1NIF3L1PRPSAP2AP2B1C5orf47C2orf50DEPDC1 NEURLECHDC3HIST1H3BSLC25A29SERPINB5CRKTPGS2 NAP1L2MAPK3GPACNFNSLURP1LRP6ADOTCH4 DUPD1BCAT1RBFPRR25AJAP1WDR44ANGPTL5OX2 HPCAALKBH1H3F3BRBM6TMEM82ARMC5SPOCD1 HINT3MKL1VGFC6orf89PDPRBNIP3FLYWCH1 GEMVCLRCE1C10orf76THZSCAN29SLK CSF2RBGAS2L2EGLN1ERCC5CDIP1ALKBH2CNP ERP44KAZALD1MAP1LC3BDSAMD12MBP2RY6AND5OAT4 FBXO45DNTTASB16SRGNCYP26A1MGABCG5AT2 NR6A1HRDEKSPECC1LIER3IP1ACSL4HTATIP2 TMUB2TNRC6ATMEM185BUBN1DSG2C14orf166BTSPAN4 CDKN2AIPPPAPDC1BARMC6CDK5R1STAMBPL1PRLHYIPF2 BBS10TPM4WDR48ALDH7A1ZNF785PXNC1orf74 CASKIN2MAFKCHRODF4ZNF532C17orf85AIFM2AC1 BSDC1CATMED7CCNDBP1ATXN10GNG3C19orf66CNG4 FERMT2HMGXB4TM9SF4ESCO2UBE2CPDK2ADD1 SFRP5COL1A1NCSTNOPTCMTA1GRAMD2FKBP11 MT4BCL9LHYLS1VGLL1CALCBARL6IP1XPO6 RAPGEF6RANBP9ARFFJX1NCCD9AR1HGEF12OA1 THRAMPDU1PTDSS1C14orf2ELOVL5C5orf34ZBTB40 RPRD1ANRTNFBXO11SHHMRRFSLC6A9SRSF6 TSEN15FUT4UIMC1C6orf48MYL5TGIF1ANKRD17 SIRT1INTS12MRPS17NAMND1TUBA1CHAUS6CA GOLT1BFZD2FAM204ASTAPVDAHDAARLU1C2C9 NCLARP4DNAJB9LDLRTPRKBZNF579PDAP1OA6 SPICE1KDM6AEIF4A2VBP1S1PR1CAPNS1YES1 FUSMETTL4SRP14MPRIPRPS24CEP44G3BP2 CDC42HAND2SSR2MEMAEAMCTS1CFDRTK CHPHF21ASCARF1NBNPPP1R27ODC1CXorf56TOP SUB1FTSJ1PSMC3IPEFNA5NRLKIAA0430RASSF8 BANPPRDM4CLCF1PUM2KDM3BNFE2L2SLC25A5 RREB1WIBGRGS9BPALMS1CEP41ASPMAFF4 HSPH1PRRC2CNEDD9DIAPH2POC1ASUCOARMC8 CYTH1DNAJC7MZT1MAP2K6NUSAP1MAGOHBAKAP8L PPUBE2E3KITMBIPARHGEF26POMPFOXP1ARG 130 CSPG5SLAIN2TSC22D2NDST1AKPNA3FGFR1OP2CVR2A FGFR2OTX2CEP63PRDM12C1DDYNLL1 SLC39A10ZC3H15SZRD1CCND2PARVBWASLTLK2 ATP5HSTMN1YWHAZB4GALT5NTN3MSL2MSMO1 PPIL1PTTG1TXLNGC7orf73DEPDC1BAPI5WNK3 CSTF2AKT3COG5CPEB1ARCLRRFIP1WDR89 AEBP2IL17REAPEX1RPRD2RMI1ANKHD1NOX1 NF2RPL28BTF3MPHOSPH6COG7KNSTRNMAP3K4 PLIN1LIN28BC2CD3KIAA1731PIGHTRIM71TMEM238 LENG9TNKSASNSD1EPN3PTMACDKN3RLF LOXL1CCNL1PLCL2DRG1ZNF148ZNHIT1NUP35 PANK3MPIMCM9UBE2KARHGAP1CCDC47SLC25A32 GALEGFI1STOML2C6orf57STATP53I13PDXKC2 RAD54BETV6CLPBKIF15SLC28A1CHD5SUV39H1 HELQPPIP5K2PEX12COMMD5ETAA1MORF4L1BRD1 CHD7TYRO3TRAF2TNIP1CPSF7KRASSLC7A11 IKZF2PAPOLGZFP36L2PEX14SDHAF1BIRC2IL16 SCAF11STRNLMAN1LNCSMARCD2SPTLC1SPAG7OA2 MAN2C1SAMSN1RARAPEX7ZFP36L1KIAA1199ZBTB45 FAF2THAP4STAG3CSNK1G1RBM5TRAPPC9USP14 PPIBRPGRIP1COMMD4ATPAF2IRF2BP1TRIP4WHAMM NCLNBCL2CLK3ERLIN2AGPSTMEM209SIAH1 TLE3KRTWBP4SEPT1SIK2UBL7TAT5BAP10−9 PPP1CCTMEM147TMEM30ASARNPPEX3CCDC33C1orf86 ANKRD52IWS1APEX2MAD2L1BPFAM96AMTHFD1LKPNA4 QRFPPTPN1UNC5BCDK4CEP350HPPSME3RT1 RIF1CAMK2GFAM149B1PFDN5KCNMA1LYL1MAP2K3 AP1M1ARPC2GNB1PDIA3CSDE1CBC22orf29FA2T3 NSRP1TBRG4PGRMC2IKBKGCCD79APLECD2BP2 LEO1BMBNL1RHKIAA0196KIAA1033SPTLC2UD13OA SASS6MDM4GNB2MAP3K2C17orf50CCL5NDUFB1 CELA2BZNF48TCERG1NELFERBMX2SAFBC12orf10 TECRGPS2C19orf24PRRC2APAIP2GNG12MCPH1 RNASE1DPEP3CAGDI2NINUDT16L1ZNF597PA1CUL1 NUFIP1RBM34PRKAR1ACNZRANB1SCYL2HSD11B1LOT2 SKA2ENOPH1CEP152MDM2SLC51BCENPFPTK2 ITGB5PDPK1NPM3TRAPPC13KIF2CNAPGATRX RHPN1LIN7CZWILCHFAM89BCOL9A3RSAD1STRN3 ERCC6LPRPF40ACUL2POLE3KIAA0020PPM1DCPOX BRAPARGLU1LAGE3DDX5USP7BARD1RRP36 SPDL1MED6CCND1TLN1ORC5SLC25A10UTP18 ILKPPP3R1C14orf178NCKAP1EWSR1MED4DSN1 ITGAVMED9ZNF408KCTD10HM13RBM12ATF7 SLC39A9SENP6SUPT4H1NAA38RSRC2CYFIP1STX5 DAZAP2RAMP2ARFGAP2SLC38A4SLC2A4RGIFT57SOGA1 RCN1GRIN1ANKRD32LCN15ZFYVE19AKIP1ARHGEF17 TMEM151ATDP2CDH12PTGES2CATSPER1PEX11BSIKE1 CRIP1TMEM242TFAP2CATP6V1B1METRNLRNF11PHIP IRAK4OPRL1CATWDR38MYLK2PRPHLINS AWSLC25A31NFRKBCCRL2CNNPDC1EIF4EBP2OT4AT2 BCAR3PPFIA1REDDR1CTSFDNAJC4PRSS35LT HNF4ATXLNAPHF8NTMT1RAPSNRFX1ZNF750 VWCEASB6CST2TMA7MAPK8IP1TRERF1GALR2 HNRNPH2RTN4RL2AAAR2RFX5RHBDD3HNRNPUL2CR KRT75CABP4NANPEAPPPLCB2UBXN1SLC25A25 ENDOGINO80CPIAS3PMELIKZF4WFDC3WBSCR27 INPP4BPCNRARPSLC15A3PAK6CEBPBGAL3ST3 CDK5RAP1SAMD10STT3BSYT12LRFN4B3GNT1WDR54 CDK2AP2TBC1D21STK32CIGDCC3DAKFXYD4COL27A1 FNDC3BZNF341DPNOCFIBPMUC1LCN6ALRD3 SBSPONCASC1ZER1TNNC2SHCBP1LC1QTNF4RBM38 RBPJLEIF1BRPL3LPCBP4NFAT5TNFRSF17B4GALT7 CASC4WFDC2TMCO6ZBTB3BBS1RNF114CRTC3 BCAS4LAMB2SLC39A13PPDPFMYO10SDCBPC7orf43 SLC5A12C6orf15ARAP1OTUB1PNPLA7PPP1R14DSUOX 63 DDX25TPRNNPBWR2TM9SF3SLC29A2PURAGRM2 TTC33HAPLN2TTC4 C20orf201BAFUBP3B3GAT3ZNF428SHC4SYPL2CH2 RAD54LAAK1TBC1D10BHDDC3SC6orf226CDH16PATA2 PHGR1FAM13BNECAB3C11orf86SS18L1IQGAP3SH3GLB1 C11orf84MYRFCAPN3TGFBR2TTC9CLRRC26FGF19 UBE3ATGFBR1SPAG4FGF3C2orf68NUDT8GJD2 ASRGL1ZDHHC12ALG12C15orf53TBC1D30REM1REEP2 GALK1CNIH2CELF1FAM83HAPOBEC4SLC35A4PUM1 CRY2ASTLARF1IRS2ROPN1LTP53I11PRKAG1 CLDN15SNX32GPR137CRHUBA5ELL3C1orf56 SCAND1SPINT1FMGARPZNF281TERF2IPTAM83DGOLN2 BHLHA15KCNH3SRRDKBTBD4KLHL42ERGIC3HOXC5 GPAA1MAPK15PIGMKDM1APEX6SLC35C1AZGP1 CT62SAMHD1DPM3MAPKAP1TM9SF2PIGQPIGU LMNB1TM7SF2BNIP1PIGVPRKRATMEM41BMROH6 MESDC2NTSR1FGFR1RHBDF1MAP4K2SCNM1RAB7A PKN2TOR1APRPPM1JCOMMD3ROPN1FADDOCR CHYIF1AKREMEN1LAMTFBXL6ELMO2EFNA3AC2OR5 KLF1KPNA6DPF2ZFPL1DUSP12YAP1RBBP8NL PCNXL3ARL1EIF4BIGSF3MACRRASGRP2MAP3K11OD1 MYT1SIPA1CDC42BPGKEAP1PEX16SYT15LEMD2 ZRSR2PGPPMM2SETD2CITEXD1OGFR LRCITED2VCPIP1BICD2CREB3SMG9HNRNPH3WD1 159 INO80EDCAF15CNTRPRD1BFAM122ASHARPINIRF2BP2ROB CAND1EXTL3 HEXIM1ANKDD1AMTHFD2GATA2VPS4APPP6R3HIST2H2AB SUGT1INTS7DCTN4SCYL1DYNC1LI1CCDC59MGAT1 KDSRZZZ3SKA1COG8NAF1ZNHIT6SKA3 SEC16ATRIP13EIF3JRNASEH2BSEC63RNASEH2AMECR CDK2PRKRIRPGM3T1POLGNR2C2APELP2 USP8MESUDS3GID8TIGD3RNF8KIAA0100TAP1 C11orf31TWIST2IFRD1SLC6A14USP10UBE4BRNASEH2C CYP3A43CLEC3BLPIN3LCE3CPGAM2GIT2CPN1 PRR11GXYLT1KLC3LCE6AGGHAHRSLC6A13 AGGF1USF2TRMT12SRIENDOURNF166BST2 SH3RF2SFT2D3AFTPHIL31RAZFKLRG2FZD9AND2A GMCL1PNPLA1LYPD2ACOT2SLC4A9TLE2HOXD8 318 IL31ZNF789PRKKCTD11MUS81RSBN1LMS4A7AG2 FRMPD1TPST1 YWHWDTC1CRYBA2IFITM2RSPH6AWDR20BEST3AG BCL7ALIG4LBHUFSP2FILIP1LCBX4TMEM229A RABL5ZPBPZNF394C19orf25ARHGEF25KIAA1737FOXJ1 MKLN1SLC31A2EIF4ENIF1C17orf98ITGB1BP2SLC5A1KIF5A METTL25RNF139FLPTP4A3FAM134CERBB3ABHD10VCR2 C2orf62P4HA2KIAA1279SNX2TBXA2RKLHDC8ADPM2 UCHL5PDCD10PLEKHD1CHP1TRIP6TMEM243FGFBP3 PIN1SOX11AQP5ZNF100ONDUBMP2KTUD6BFA5 COG1CD40LGPORTRIM65KCTD16IGFBP3ARHGAP9 DPYSL4FZD6HCN3C9orf116AATKC8orf22LACRT C11orf54FAM20CIL1APPP1R15AFAM179BIGIPATP9A 154 STK35KBTBD2C6orf1ATP6V0E1TRPS1HSP90AA1AGK ARL16GGT7B3GNTL1CCDC104PPP1R21ZNF580 MBANAPC13FAM183AZNF524CNPY2GIMAP8ZC3HAV1OAT2 ENPEPSUGP2G3BP1CLDN3TCEB3GSDMDNR2F1 KDELR2SLC38A6C12orf68IGFBP1ACOX1ARHGEF40CELSR3 MMP14TBX21STK31ADCY4PDS5BGIMAP5ZNF692 GDF15RNF19AHHIPL1CYBRD1RBP5MTFR2CD164 LYPLA1CCL18PRDM16FBXO43ITGB1BCL2L11DLX6 CHL1ASIPGATAD2BSHFB3GALT2PLEKHH1JUND SLC22A17ZEB1XKR8SLC2A13SCAMP4CYP26B1CAMKV CAP1SLC6A5IL25KDM4BSQLEVWFKLHL7 RND1PDE4CSTK39TNFRSF1BC7orf10KCTD2MDK LRRC23SERHL2RBM43HELLSEDARTBL2SLC35A1 PRNDMARK2SDR39U1CXADRMCFD2IER3GNS TMEM174GPR22SLC25A39STX1ANAT6NRP1PROB1 ZCWPW1KCTD5ZNF662ARL3DKK3STRA13AKIRIN1 SLC17A7WBP1MEIS2TEX19COMMD10CREG2GABARAP MAP3K12PURBFKBP6RNF126BCL2L2AREL1PARP2 C1orf27FAM49BCUTCHES4CISD3CCSHOXD11 RPS6KB1USP32SLC39A1PPP1R18DVL1MKI67PILRA CABP7ZNF784SLC16A5CYP21A2ABCA8BCAR1ABCF2 CYP4Z1ZYXZFP41BCDIN3DCD63LIMS2USP42 RNF5AMZ1PAX4HEBP1C1orf100SECTM1YJEFN3 UFM1C1orf229TSPAN12GK2HIST1H3JZNF263URGCP EIF2ACUEDC1SGPL1POC5SGTACMSS1RGS6 LIX1LLRRC43PDCD6IPAPOPT1TERF1CSPYGL CAPRIN1SLC27A5ENPP7AATF1ABRANUDT1CTN4 SPHK1ANKRD53PPIFGLULPOC1BC7orf31PPFIBP1 FBXL13C7orf61VEZF1LRIG1MFN1TMEM150AWSB2 CPED1TP53TMEM17DLX2AMZ2SCOCTMEM179B FSBPMYL6LRRC17SALL1MSI2MAP3K6MTERF NPHP1C19orf60GTF2IB4GALT4DDIT4CD69FAM76B CCDC65GRM8PARALGAPA1ARIH2OSKRT79CLMPCS1 MRC2PRKCICD300CTOP2BUPP1PPPARD6BAPDC3 DLX1EPAS1PHF17HID1PRKAB2CADM4SPR SDK2MGCUL5YY1AP1PGBD2PSME4DCAF13AT5B UFL1KLHL6GPR162SLC52A3FAM175APLCG1FAM104A RHBDD2GALNT5ZNF660METTL21BCOMMD1SPDYCKLHL36 ADAM23TMEM44CLDN4MPZL1BIRC3IL3GABRD ANP32EDUS4LWDR11ATG13PGA5CAZNF419CNB3 TTLL6EIF4HCNN2GTPBP3COPLANGRO7LY1C9 NEK2DPY19L4TTC30BELMOD3NPBGPAUSH1GTCH1 ELNUBCCC2D1BDDX50ZNHIT3HLCSSOX9 MAP7D1ZNF74BIDPYGMZNF623ZBED3MMD ITPRIPNBR1YPEL1ANKS3C22orf39MTRCNNM4 FNTAMAVSGSTP1CHI3L2KCNQ4SERTDGCR2AD3 MYO7AIVLMBD3PLCB3RHOCPRAMEFOXE3 BRMS1FHOD3CUL9CCT8L2TRMT2ADUSP26GDPD5 SORT1HIC2SHELPCAT3OXER1FERMT3FAM71F1 SP3ASCC1WDR26TXNRD2WRAP73DRGXTANGO2 RAB36P2RX6GCNT1DNAJC14TAFAM78ACDK8CR2 DGCR6LSLFNL1GRHPRPIM1YBX1KIAA1211LSP7 LAMC3SLC1A5AIFM3CD81SMCHD1KLHL22CELSR2 contextual NPAS1UBL4BSDF2L1CSF1ZNF280AFADS2ARHGEF4 1483 FAM155ACOMMD6 ZMYM3MADDGNAZAZI1PHGDH PLAUCLTCL1YDJCGP1BBC1orf162TCTN1MMACHC AGMATSCARF2KDM2BC2AIF1LARHGAP22ARVCF KIAA1432ATG2AKIFC1CNTN2TRAFD1EMC8TOP3B FRMPD2PPP5CFAM212BTBC1D2FLOSBPL7MPLVCR1 GSE1GPC5CIDECETHE1C16orf71FIBCD1VWA7 HVCN1RCCD1RGP1RTN4RZSCAN2KCNK4COX4I2 CLDN5TSSK2SYNPO2LABCC1SSBP3INPPL1ZBTB47 SLC26A10HSP90AB1TET3MAPKAPK5BTF3L4PHLDB1PIAS4 KCNA2UBE2ASNAP29ANAPC7YIPF3EDNRBKCTD21 ZCCHC7GSC2USP22AKNAD1GPR75SGMS1WAC GATA1ZFPM2TSPYL2RARGPLEKHG2SERPIND1BAZ2A PAOVGP1ZBTB2SLC12A9ABCD4FHL2ANP32BTL1 UFC1PSRC1PTPRHSLC7A4NFE2PRDX2FOXA3 DLG5CTDSP2RCC2TCOF1ATL3THAP7NSUN5 PTPN11SLC25A45TARBP2KCMF1MAPK1SEPHS1MAPK14 SLBPANKRD11C19orf70PRR13FLIIPCM1GRB2 TMEM165FECHPNPLA8POLE4PSLIRPFANCD2AN2 KDM5CUBE2G2ZC3H10RTN4IP1MDH2NDUFB2SIX5 MTBPMON2AMBRA1STAG1AZBED2FOUP1XRED1 USP48HSP90B1RDH5ALG6KIF14IGF1RCHCHD1 CXXC5TJP3DNAJC12GSG2TXNDC17DERL1LRP10 QKIEAF1CEP95GNAI2RBM20DNAJC21CIB3 CCDC110EME1POMCC8orf46ATMINZNF143NARG2 SOX2FUT11BCL7BCCL26ADHFE1USP1FBLN7 C20orf144CYB5R4ARF3SRPK2MRM1USHBP1TMEM184A DNAAF2C2orf72MUTYHUBTD1KRTCAP2USTITGA5 ITGB1BP1SCIMPSLC13A2CHMP4ADAZAP1CDC25BAGAP2 PAOXTBL1XR1C12orf40IPO4TIPCPEB3T10ARP COMMD9ANKRD13BOPN4PPME1AGAREMLTIMP2TRN C11orf30CCDC93FAM186BING3IFI35COPRSCCP110 TMEM52CREB1ZMYND10SNX4SEC62RASSF1PTMS RAATRIM37SH3PXD2BB3GALT4SMIM7SLC11A1CHEC1 ALG8DDNDTX4CDC25ACOMTD1CEBPEPFDN1 EIF3KVPS53CDK6SKP2EHMT1ATP13A1CNBP GFM2MRPS26RNF168ANKLE1MTA2MNTVPS54 MRPS9COQ5MALSU1CDK12DDI2YBEYGET4 C17orf70CAB39TOP1MTPOLG2ING5TIPRLZMYND8 CHAMP1MBTPS1TIMMDC1RAD1FDOLPP1ADRM1ANCA GNB3UBE2TNDUFB10VPS33AFBXW7ERAL1LIN54 essential genes MED28MED29SLC25A19ANLNFPPP1R12ABRIP1ANCL ZMIZ1OFKPNA2PHF12ANCCTUD5 KAT2APPP6CCHTF8NUBP2PEX10DCTN1NUFIP2 SFXN4SIGIRRHIST1H3IKRT2MAD1L1PFKPCACNA1A PREPLZW10HIST1H2BMAKAP6RAVER1TEFRILPL1 RFFLSSH2CACARD10MICU1CLEC11APLD3CNA1S ZACOL2A1HK1NALCN10PPFIA4COX8ACNCAD MLCOL4A2SYKTTC14GJC3SLC19A1CELF6LT6 MPP3NCKAP1LS1PR4SNDURBM27FBXW9PATS2FA3 ZFGATA3DDIT3CLCN6FMNL2DNAH10UAP1AND5 KIAA1614KIAA0141MRPL50BAI2SH3BP1ZNF687COX16 PKNOX1NUGGCRBP3RSPH1ME2TPK1HDAC8 C6orf47ITGAXSLC23A3MICU2GPR88MYLK3IDI1 VASNMAST4COA6FATP2C1MFSD12HNRNPUL1ANCG GLP2RHS3ST4CMSCUBE1FAM71E1ATP1A2SLC38A7YA5 FAM83EKRTKIF3CINSIG1KRT78TRIM24ATP5J2AP10−8 HCLS1DNAJC19GJB5EHMT2KCTD17ADARRFXANK SHBZNF219C9orf163NFAHERPUD1NOXA1COX6A1TC4 SLC52A1NDUDISP2AKT1S1TSSC1PRR12THRAP3FA9 ZNF106ARID3BMCOLN1SGSM2PRMT8AXIN1PPP1R13L UBR5GPSM3UGP2GLNGRNMYH2SLAMF1TSCR1 C6orf132LARP1ATMLAKAP1HRCT1MFRPAD3ALT1 ZNF641CCDC90BREREPLA2G4FPIANPPTRAF3IP1AN3 XPO4FIGNL1GP1BAMEX3CLMOD1C1orf177BASP1 MMP17AFF1EPS8L3RNF121CANT1ANKRD40NRAS PLEKHA4ATXN1PHF20GOT2ZFATC14orf93RAI1 NFE2L1CCNFTRIT1HNRNEIF2AK1PIGOLINGO4PA1 EGR2FCN2TTC31XRCC1TLE4DNAJC16VAT1L C1orf112VRTNFLAD1GNPNAT1FDX1NOANCBTCH2 IRF4FAM136AMYEVPS8ANKS1AACSF2ARID3COV FAM20ASLC52A2ATP1A3SMR3AMYH7BKAT6APBXIP1 TADA1ERHLIG3TIPINGCH1KIAA1211HIST1H4B NDUERGDNAGIGYF2CBFBFFUBP1ANCMJA4FAF7 C15orf39PEX5CLPXRPT1SMARCAL1UNX1AR1 FTSJ2SUPT20HFASTKD5TAF15ACTG2CCNT1CCAR1 SMARCE1SLC25A1SMARCC1SRMMSTCTDSPL2AMM41TO1 NFU1DERL2MED13CCND3ID3HMHA1VMA21 Wang et al. GOT1ATP6V0A2FGFR1OPSLC33A1SLC2A1COX14CROCC RHOHSTX4TSC2MYH4SNAP23CD79BTPM2 ALAS1NDUSNAPC1PRCCZWINTALADCDC27FAF1 GABPB1MRPL3RPS7HSF1MOGSMAF1BECN1 FDPSAMRPS21LSM2SLC20A1GFPT1RPS14URKA TADA2BPPCDCRPL14PAM16EMC7RPL35RRN3 RPS19FAUUQCRHAHNRNPCRPL37ACOX17CIN1 PSAPMYBRAD17EIF5AGCMTERFD2RBM8ALC PPP2R5BCLPTM1CAPN9USP35GANABCHGACSRNP1 PEX26MYT1LCCDC116SMTNAGTBATP5LATPAAT5 AHCYL1DDX11PABPN1LCHCHD3UBE2NEDC4AAAS ZFP91SLC6A17MED24C14orf166TRIM52DOCK6SSBP1 WTAPAMHR2SUPT7LLCMT1PDE1BSPENTSPAN31 LMO2GAB2TCEA1ABL1EXTL1SREBF1RAD23B DDX39ABRD9CMIPSHMT2TUBB8UBE2SBCR TRMT10CRANBP3AZEB2LDB1GSSPPCSCBD4 HECTD3CLCN3MAP3K7TAB1NOSCAF1MED15TUM AKT2RELBTAS1R3RING1ESPNADAMTSL4SUCNR1 POLQBCL6PCSK7CHUKPDE4AACSL1SP9 CHRM4MIEN1PDHA1LARP4BTMEM189IFITM5ELOVL1 KCNIP3RAB21TBC1D17CCDC43RBCK1RAB35TRIM46 SCAF8IKBKBSCRN2ZMYM4CILP2GADD45BMAP2K1 PDZK1IP1HDLBPMPP2PQLC2NCOR1TMPRSS9TMEM183A DMXL1PIK3CDPOGZFGRPOU2F2MTMR4B3GALT6 MEF2BCHMPOU2AF1SH3GL1RRCD19TTC7AAGC VPS35TFGVPS37APHF15VPS41UXS1SEC23B HFE2COA4SLC7A5AGPRJTBATP2C2UNEPAT1 MED25FLCNFDFT1FZR1LSSNAMEF2CCC1 RBPJCHD8RNF31PDS5ACALRRAB34MICALL2 RELANOSIPZC3H13HAWDR62EGLN2DUTUS2 ASUNRPS10CHKARPL5ZCCHC14ASCRIBCTR2 ALG5ISCA1AZIN1SLC38A10RHEBITPK1RPS25 ZYG11BTAF1DANKRD24UBE2Q1RASA3GPBP1PIM2 HEMK1NCALG9ETS1KCNJ12UBE2J1ZNF330OA3 AGTPBP1AP3D1EIF4E2IPMKTMEM39BDNAJC3SCAMP3 CKAP2LARF6KCNQ5TGIF2ARID4BPTF1AMTCH1 CARNS1IRF8DENND4AZNF296FGD5YPEL5EFNA2 TACDC26CDK5RAP2ZC3H11AREC8DIRC2FOXO1CO1 PDSS1NRBP2PRKAA1VEZTC17orf89ENY2SLTM IREB2EPG5CYP11A1TAF11STEMC4KDM4AARD7 RABIFCDX2CEBPDSPOPRBM18WIPI2TFPT (CRISPR) STAG2TAF5LPCBP2AK2MAPAPD5EMC6TR3 MYH9ARRTRMT1ATG9ADCP2CTBAGA RMGPPM1FEIF2AK4STIP1MBTPS2SCAPTDR1AT3 MRPS16CAPZBALDH18A1STRAPFIS1MTCH2PIAS1 MED10MRPS31VPS29MTO1MED22RPSAMRPL33 WDR18PPP2R1AHSCBZBTB7ATTF2FASNLAMTOR4 ORC6PRPF3CTU2COPS3KTI12XRN2GFER RRP1MRPL20ELP6CPSF3ALDOATRIAP1TRUB2 MMS19CCT6ATHG1LPDE4DIPCD3EAPPCF11TRMT5 RBM14NOP56ROMO1RRP9CSTF3DHX33UBA3 INCENPELP3RRS1DDX39BCSTF1VPS13DMCMBP TRAPPC11MCM5ATRIPNTP53RKGNL2PGK1PAT TRMT61AGNL3PRC1RAD9AARIH1WDR83TOMM40 SNRPBCWC22PCBP1TKTELP5EIF3BGTF3C3 DYNLRB1DDX55CDCA5RPP38HCFC1APOLR3FCTR10 SNRNP48ZNF131SRFBP1ALG14PPP1R10NOP14URB1 AASDHPPTLSM11COPS6SHQ1NCLDNTTIP2OPA1 G6PDPRPF18CDCA8SRSF11UBTFNOL7CDAN1 TAF1ATSG101DCLRE1BYEATS4NUMA1POP5ELP4 IPO7C14orf80HUS1KRR1PNKPMED12HSPA14 IMMTSGOL1BCAS2ASCC3NCAPH2NOC3LMED14 PSMC6ATP6AP2TNPO1GCN1L1SHFM1BCCIPACTR6 EIF2S3UXTTAF1RPA2ATP6V1E1MED7HNRNPH1 LENG8DCTN3LIG1SRSF3MED26MRTO4NRBP1 HMGCS1ANAPC10GTF2E2PSMC1SRP19SUPT16HMED1 SMG5QRSL1MRPL38HARS2HMGCRMRPL28COX4I1 ENO1EIF4G1METTL17MRPS24MRPL45ACOX5BCO2 MRPL41TARS2MRPL21HSD17B10TAF6LMVKMRPS25 COQ4DNM1LLARS2YRDCMARS2MRPL46MRPS14 GPN1WARS2RPS9MRPL12LAMTLRPPRCMRPL53OR2 ATP6V1B2MRPS15RARS2SMG6SNRPD2MRPL4COX15 MRPS35COA5OGDHMRPS18ASARS2MRPL34UQCRC1 MRPL44SUPV3L1DARS2CARS2GADD45GIP1EIF3LTRNAU1AP PET117MRPL43SOD2MRPS34IBA57MRPL37PTCD3 CYC1FBXW11MED8FARS2MRPL23C1QBPMIPEP SYVN1HSNDUFB9SEL1LMRPL49NDUFS1SMG7PA8 COA3MRPL18POLRMTTAF13CDC37POLR3BRPLP1 TMED2GFM1PELOVPS18MRPL22MRPL13ATP5C1 CHERPWDR82INTS10MRPS10ATP5F1TMED10BCL2L1 FBLGARSU2SURPILF3HAAARS2TUBGCP4US7 ABCB7POLR2EPSMA4KAT8ZMAT5CTR9DHPS Jiyoye, K562, SRP68ATP6V1HNOP10LIN52PDRG1EPRSSNRNP70 EIF4A1NCAPG2SCFD1TAF5 BORACCDC130TUBGCP5HAPSMB5PFDN2FIP1L1US8 KIAA1429PAXBP1XPO5BUB1BNKAPEFTUD1RPS29 CCNB1PRPF4BEIF3MCNSNW1CTCFMRPS11OT3 MICALL1THEMIS2RETNADAT2PHAXKLF16TEFM FERMT1RNASEH1GRB10KCNK5MCCD1EIF2AK2APOLD1 DIS3LBLOC1S3IP6K3PLAB4GALNT4MPV17LSENP2UR PIK3R2PAIP2BPIGKBEAN1L1CAMPPP4R1USP6 TNFRSF6BSMPDL3BSCAF4AFAP1L2ATG7SLC1A7HTRA2 GUF1NUDT14NCZRANB2AGC11orf21CCDC151PAOA4T2 WDR53RHLRRC32HBMLEGFRPEX19TN1OT2 RAD54L2OSBPL9RAPGEF1TRAF3DHX29NF1HOXA5 ATP2B1ZBTB1ARF5IFNW1SFAM228APRNPPATA24 SP2ASCC2ATG14TAS2R13MTX2PYDC1LCE5A SEC23AUBE2G1MYLIPSHC1RAD51AP1GPR62DEFB129 DET1TRAPPC2LKLK9SLC12A7SPPL3MYO18AAXIN2 ZNF646RIOK3TSTA3DDX19BARDLL3CN9UNX2 ZDHHC18LAMTPUS1FOXB1SLC25A41CORINWDR59OR1 TM4SF1CNGB1CCDC117ANGPTL6ZNF343SMIM1TOR1AIP2 Hart et al. TAGLN2ACRV1GOSR1ESRP2GPRC5CPRPS2PNPLA2 LAMFRNME6YRM7 LSM10RFPL2ATG3PTRH2UBE2J2IGHMBP2PIGA SERPINB6SEC24CMFSD5BRD7INPP5AMSMBDDA1 TMED9UBE2HSREBF2RAB18FLAD1FBXO2TMC8 PIGTILVBLCASP9PYCARDNFX1SRA1TPSG1 FCHO2RBGPR20KLF5FAM73BGLRBGATA6FA NR2F2OXLD1RB1CC1EPS15L1BAIAP2SLC25A42COX19 TENC1NONOGHDCFITM2DPH2TSGA10CHCHD2 MBLPEX1FLT3LGKLHL11LMO4CSTBAVENAC1 GLRX3USE1UNC119URI1PAXIP1HEPIDDATR6 SUV420H1DMNEMFACSL3HIST1H4AC2orf49PRSS33RTC2 SSSCA1DOT1LACANOLC1SNCBPDCD5COG2CA GASTDNAPINX1BMI1STUB1WDR73BRAT1JA2 KREMEN2ZNF837SERF2RAB11FIP4SPRSLC25A22PIK3R4YD3 KRTLRRC41REXO1KRT86SH2D3CUBE3DGNA11AP13−4 MRE11APHF23EXOC7TAF9ZSCAN10CNIL8OT11 GLSCNKHSRPUSP9XUBALD1GTF2F1BOD1L1OT10 THOC6ARID2RMND1TRMUMEAF6MTF1MTERFD1 VPS36ADCK4CDH15ACTG1PPIHHEC15orf52ATR2 SMARCD1MED16TAF4XRN1STX18NHLRC2HNRNPA2B1 HIRAGGA3CLPPCABIN1MED13LMED23NPEPPS KBM7 & Raji SELRC1UVRSLC30A9NXT1MRPS23OXA1LNDUFC2AG KATNB1TFAMDR1 LXRCC2MRPS28CDCA3ATP5IMRPL47MORC2YRM4 PDE12RNMTL1RPAP3GTPBP10SETDB1RBM33TFB1M DYRK1AHBS1LMRPL40SNRNP40SLC25A26SEC61BTIMM17A NDUFS8NDUNDUFS7GRSF1PTBP1MRPL54RPEFA10 NDUNDUFB11NDUFS3BAZ1BUROSFA6FA8FAF5 RPUSD3NDUNUBPLNELNEDD1VPS39WAPALFAFAF6 ALG1ESF1SF3B1NUP205EIF3HDMAP1CUL1 GTF3C5RPF2TAF2GTF2A1PPWD1PLRG1CCNK DDX59HNRNPMMESPCS3SRP72CENPNRNF20TAP2 ANKLE2ARFRP1CFDP1SLU7TOPBP1RBM48RPS19BP1 DCTN6PSMG2GTF2A2MRGBPTAF1BNDUFB7IGBP1 CSRP2BPTXNL4BRPP25LPIGSBOP1VWA9MIS18A CENPJZNF283CENPQCEP97SLC25A28HCFC1R1PARN TIMM50ORC4VRK1BLMMCM10C11orf57C1orf131 DNAJC2INO80BFAM98BTOE1WRAP53ALG13LENG1 MPHOSPH8GMNNCTDNERIC8ADDX3XCTU1CHMP3P1 ATL2URB2TEX10SHOC2BAP1ERCC1ORAI3 SRCAPINTS4TSEN2PSMC4RPS15ASEC13SPC25 (CRISPR) AATFNOL8NUP160GTF2E1RBM19DHX9BMS1 MRPL51CLASRPELLUTP20RPLP0CCDC101XAB2 POLR1EATRSAMM50MRPS5HARSMCM2CCNH SF3A1ZNF830NHP2L1TANGO6CCT5FAM50AACTR8 WDR12POLEGNL3LDNM2SDHASMC3SMARCB1 RPL23AGOSR2SRRM1PMF1INTS1GTF2H3WDR36 TBCBTFIP11RPCOPEZC3H3NUF2PPP4CTOR TELO2ATP6AP1SDHBPTCD1AP2S1MNRPS15AT1 MRPL10EEF1GRPS6TFRCPPIEZBTB8OSCDIPT MMGT1PRELID1DOHHYME1L1HK2SMARCA4EP300 ZC3H18EIF5RBM22NUTF2TMX2GINS1ZC3H4 NUP54IPO11MEN1C7orf26GTF3C4MRPS12WDR25 NMNPPA2GTF2H1MED21SRD5A3C16orf59CLSPNAT1 RPL13AUSP37CHMP4BSDE2AHCTF1SRRM2MIOS TAF8VPS51PI4KBVPS52POLR2FDHX30GEMIN4 PPP2R3CPOT1TRPM7NMD3C19orf53EMC3PFN1 T11DDX6COPS4CDS2PSMF1MIS12NFYC SNRPCNCAPD3RBBP4CACTINKIF20ASRFRTTN EMC2RICPOLD1SRPRNUPL1FDX1LDHX35TOR CCNCTCF3SDHCEIF3EDICER1GATCSKIV2L KDM2AEFR3AATP1B3NDRG3SLC16A1BSGCDK7 KAT7SPI1ORC1ABT1LETM1ASH2LMRPL48 INTS2ATP6V0D1PPP1R8MLST8LSM4DDX28ATP6V1F PAX5ATXN2LMKI67IPSLC4A7DYNC1I2MAD2L1GEMIN8 DNAJC11ZNF207TAF3PABPC1MANFCYB561A3EBF1 ZNRF2LY6DIKMAXAP2M1MRPL55MRPS30 C17orf82HESX1SMIM14STRADAZXDBKCTD14GLRX2 FBXO48NPNUPL2FAM69APIGPRABEP2UGT2B17PA PYURFPLSCR2RBM15MSH4JMJD6UBA6ZC3HC1 VDAHIST1H2BBMYO1EZNF585BMTERFD3CIR1ATN1C1 CYP17A1CDKL3UBE2ZUSB1HIST1H1CKIAA1430NDUFAF4 LCE3ACARD9LMTK3MYL12ATNFRSF12AIL1F10FAM24B CLIC4SPA17MRAP2NDE1ZNF347LY96BUB1 CYP3A7LAIR2HCN2PRMT2CEP170INSL6HDAC1 PDZD11CHPFWIF1LARP7FBXO36PFDN4ACOT1 DDTKLRC2BRMS1LSSBBET1LKIAA0319LBCL7C COVTI1BTHEM5PI3SNRPB2DNAJC8VPREB1RO1A RTBDNNOA1CFHR4GTF3C6C1orf227ING1OBP2A DEFB136ZBTB9HES3CAMLGTRIM8GSTO1FAIM UCHL3SMURF2YBX3EPGNFABP4PPHLN1TUBB3 ZNF490FAM209BSLC16A13MAST1GPX8ZNF141SYT8 DLD1, GBM, CTNNB1CLK4BRICD5TPSD1FABP12TAPT1ERP29 ELMOD2ENSACLCNKBPAOPN1LWSMPD4ARMS2QR4 ZNF205KIF20BHPS6GNA13JMJD8DRD5FOSL1 RNF14TIAL1S100GKRT6APRMT7GSK3AC3orf35 C17orf49CPNE7RSU1UBIAD1C4orf6GALR3ARHGEF19 TNFSF4BET1PRCPABHD17AARHGEF11CYP51A1WRN C17orf59NUDT15ADNP2CTRCCHADSLC2A3CDK11A XPOTZNF562PRDM9PRDM7IKBKECCDC54FAM46D TIMM8BCCDC174SLC30A7IGFL1HMGA1BLMHMBOAT7 CKMAC15orf40KIAA0586MBLIN28APTP4A2CSM2B UNC119BZP3IFFO1CAPZA2C6orf203HABP2NOXO1 RCBTB2AQP11MEIS3BLIDRAB14DLL4DENR PRDM8C16orf86MTFR1RNH1NAIPHIST1H1EMZF1 PIM3ODF3CEP135LSM5AENUBXN6MED31 MBTD1NDUFS5TFAP4PLEKHM1XPNPEP3ZFRBCL2A1 MRPL1UBAP2LNFYBBCLAF1SLC7A1FXNCOX18 SDHAF2NDUFV2DDTLMRPL2MPV17L2MCECSITAT C17orf53DDX19AAMD1LIPT2MRPS33FASTKD2PAPOLA APRTRRP7APDSS2BPTFATPRPF38ACEP55AD5 MTFMTSLC3A2B3GNT2POU2F1EEDC9orf41GCSH BRAFZDHHC8BIRC6FAM32ACAND2CREBBPIPPK COPS8CXXC1MOB4CIAPIN1RPS6KA4DLGAP5COMT TRIOFAM178AEHD4ZFXMS4A5CHCHD4MEA1 TMA16TULP1REV1SNAPINCARHSP1TOR2AATP5G2 C11orf88HIST1H2AHTADA2ASTT3ARSPNHOXC13USC1 DNTTIP1BBIP1IDH2APH1BZNF696XBP1HOXC4 ZCRB1TRNT1TCEB2CMPK1CENPHSLC4A1FURIN RPS20NDUFC1ICT1NDUFB4NDUCOQ7TAF7FA1 ATP5SLSUCLG1PCYT2UQCR10ZNF85IPO5PCNT TEAD3PRDM10EXOC5GPX4ARAD50COX5ACD FTHRSPIGLL1MCUR1CCDC137ZNF236PPP1R11ANCE SUZ12TGS1TUBA1BWBP11COPS2LMNANKTR HECEPT1KIF22HNRNPFSAP130CHTF18NFYAATR3 PARAB1AYTHDF2RBM42IQCCRNPS1SRP9GR1 FXYD6LPC5orf22TCF7L2PLAPIK3CATMEM70AR2C8 DNAATXN7L3JUNBRPL22L1MRPL32HCCSAPCJA1 SSBP4DPM1CRIPTYEATS2UQCRQNDUFB5MRPL27 CENPOTLDC2UFSP1MRPL11SCO1SNX22HOXC6 DRAP1PRKRIP1ATLCD1IER2RFWD2PGLSCTR3 RPIAASNSDPH5LIN9NFATC2IPCBLL1C12orf65 KIAA1524FNDUFV1DCPSIPO9DSCC1BAHD1ANCI HeLa, HCT 116 KNTC1KIF18AMGEA5RAD51BNADKAIFM1FANCF E2F3RFWD3 NUDT4PPP1CACEP57AGO2AQRICH1UQCRBCTR1A SPAG5NDUISCA2CHD1SPTSSAPPP2R2AHMBSFAF3 BDP1NBASDPH1DHX36SCDRPL29MRPS7 PPP1R2RANBP1CDC40KDM8PIK3C3RAF1CRKL DAXXC16orf72RPAINPSMG4LIPT1SETD8MINOS1 & RPE1 Raji Hela GBM K562 DLD1 Kbm7 RPE1 Raji Jiyoye Hela NALM6 GBM HCT116 K562 DLD1 RPE1 KBM7 Jiyoye NALM-6 C HCT116 significance (-log10(p-value)) 0 10 20 30 40 50 60 nitrogen metabolic process D mRNA processing 12000 ribosome biogenesis observed 3500 cell cycle DNA metabolic process 3000 random

ribonucleoprotein assembly 2500 tRNA metabolic process binary 2000 translation 1500 biosynthetic process continuous nucleocytoplasmic transport number essential 1000

chromosome organization 500 mitotic nuclear division 0 0 1 2 3 4 5 6 7 8 9 10 # screens

E 50 45 % with yeast homologs 40 % with essential yeast homologs 35 30 G 25 20 0.16 15 ? ? ? ? ?

homologs (%) 10 genes with yeast 0.14 5 4 0 0 1 2 3 4 5 6 7 8 9 10 0.12 # screens 0.10 3

0.08 2 F 0.06 2000

0.04 fold enrichment 1

1500 fraction universal genes 0.02 1000 0 500 gene with neighbors of clique neighbors of clique interactions UE genes neighbors of 2 UE genes neighbors of

average gene rank 0 UE genes 2 UE genes 12345678910 BioGRID interaction features # screens any gene UE gene ? queried gene

Figure 3 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

P=1.0e-33 A B 1.6

1.4 60 80 100 1.2 1 Jiyoye 0.8 Raji 0.6 NALM-6

0.4 KBM7 0.2 % pairs in same complex K562 0 essential in essential in DLD1 same line different lines HCT116 C HeLa 20 RPE1 18 SMCC 20S proteasome GBM 16 exosome 14 Raji GBM K562 12 MCM RPE1 HeLa DLD1 KBM7 Jiyoye HCT 116 NALM-6 10 RAD51 8

6 4 D ribosome essential in all cell lines 2 20S proteasome MCM

0 0 2 4 6 8 10 12 14 16 18 20 all subunits 100 essential in ≥ 1 cell line 140 80 spliceosome 120 60

100 40 80 20 60 all subunits essential

ribosome % essential subunits 40 0 non-essential complexes # essential subunits 20 random expectation 20 4 6 8 10 0 0 20 40 60 80 100 120 140 160 # cell lines # subunits

E F P=1.75e-85 G H 6 10 cell lines 0.105 18 P=1.04e-8 P=1.47e-46 9 cell lines 5 4 cell lines 0100 17 1 cell line 0 cell lines 4 TPRKB 0.095 16 3

0.090 # tissues TP53RK 2 15 OSGEP 0.085 1 LAGE3 fraction residues mutated # direct interaction partners

0.080 14 0 C14ORF142 essential essential essential essential essential essential in more in fewer in more in fewer in more in fewer cell lines cell lines cell lines cell lines cell lines cell lines

Figure 4 C E A bioRxiv preprint not certifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailable

mRNA expression mRNA expression 10 10 10

mRNA expression 0 1 2 10 -5 0 5 ENALM-6 UE doi: in essential P=2.0e-6 1 to3lines https://doi.org/10.1101/107797 B-cells degree (log) P<0.05 LE P<0.05 in P=1.9e-5 essential 2 to3lines other NALM-6 NE under a ; this versionpostedFebruary10,2017. CC-BY-NC-ND 4.0Internationallicense D B F

degree 10 10 10 10 10 0 1 2 3 4 UNIVERSAL ESSENTIAL ESSENTIAL LONE fraction in BCR pathway UE 0,12 0,14 0,02 0,04 0,06 0,08 The copyrightholderforthispreprint(whichwas 0,1 . 0 in essential NALM-6 P<0.05 1 to3lines P=1.3e-4 LE B-cells P<0.05 NALM-6 CONTEXTUAL in ESSENTIAL ESSENTIAL NE essential P=9.3e-4 2 to3lines NON other Figure 5 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A P=1.8e-45 B P=8.9e-10 C P=1.5e-48

3 3 3

2 2 2

1 1 1

0 0 0

-1 -1 -1

-2 -2 -2

relative sgRNA depletion score relative sgRNA -3 depletion score relative sgRNA -3 depletion score relative sgRNA -3 in disordered not in disordered more less inside domain outside domain region region conserved conserved

D P=7.9e-9 E F P=6.1e-3 P=1.6e-8 P = 2.3e-5

3 3 3

2 2 2

1 1 1

0 0 0

-1 -1 -1

-2 -2 -2 relative sgRNA depletion score relative sgRNA relative sgRNA depletion score relative sgRNA relative sgRNA depletion score relative sgRNA -3 -3 -3

α-helix β-strand other more buried less buried inside interface outside interface

G 0.4

0.3

0.2

non-essential 0.1

0

H2O -0.1 - - + H2O + -0.2

relative sgRNA depletion score relative sgRNA -0.3 essential -0.4

0.5

solvent α-helix β-strand domain conserved disordered accessible protein-proteininterface

Figure 6 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

essential isoform

START STOP non-essential isoform

A B C P=2.9e-22 P=8.3e-18 0.9 0.45 P=2.2e-12 0.8 0.4 0.14 0.7 0.35 0.12

0.6 0.3 0.1 0.5 0.25 0.08 0.4 0.2 0.06 0.3 0.15 mutation rate

protein disorder 0.04 0.2 0.1 0.1 0.05 0.02 protein domain content 0 0 0 essential non-essential essential non-essential essential non-essential

6 E 4 D P=3.4e-15 0.12 2

0.1 0

0.08 -2

0.06 -4

0.04 -6 P=5.0e-9 exon expression level

alternative exon 0.02 -8

% of spectra matching 0 -10 essential non-essential -8 -7 -6 -5 -4 -1-2-3 0 essentiality score

F P=5.0e-6 2.25 G

2.20 NE exon

2.15 ANAPC15 2.10

IsoFunc score ANAPC5 2.05

2.00 essential non-essential

Figure 7 bioRxiv preprint doi: https://doi.org/10.1101/107797; this version posted February 10, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A B mRNA expression 0 2 4 6 8

0,2 adipose adrenal 0,18 all hypothetical genes brain 0,16 RefSeq breast 0,14 44 essential colon hypothetical genes 0,12 kidney non-RefSeq heart 0,1 essentials liver 0,08 2,000 least fraction of genes lung essential 0,06 lymph node 0,04 prostate 0,02 muscle

0 WBC 8 16 32 64 128 256 512 1024 2048 4096 ovary # residues testis thyroid

C P=3.9e-4 D P>0.05 7

6 1

5 0.8

4 0.6

3 0.4 2 fraction conserved 0.2

% with peptide evidence 1 0 0 500 2,000 non-RefSeq 2,000 least most essential least essential essential essential

Figure 8