An Interrogation of ORF Versus CRISPRa Pooled-Screening Technologies Used to Define Cancer Drug-Resistance Landscapes.


Citation Goodale, Amy Brown. 2020. An Interrogation of ORF Versus CRISPRa Pooled-Screening Technologies Used to Define Cancer Drug-Resistance Landscapes. Master's thesis, Harvard Extension School.

Citable link https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364877

Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

An Interrogation of ORF Versus CRISPRa Pooled-Screening Technologies Used To Define Cancer Drug-Resistance Landscapes.

Amy Goodale

A Thesis in the Field of Biotechnology

for the Degree of Master of Liberal Arts in Extension Studies

Harvard University

March 2020

Copyright 2019 [Amy Goodale]

Abstract

Resistance to cancer therapies is an ever-present problem, and preemptively understanding the underlying genetic causes will improve patient care, help predict clinical response rates, and elucidate new drug targets. Within the last several years, studying the drug-resistance landscape of a cancer type has been made easier with two gain-of-function pooled genetic-screening systems: open reading frames (ORFs) and CRISPR activation (CRISPRa). The ORF and CRISPRa screening systems produce the same overexpression phenotypes, but with ORF technology the gene of interest is exogenously expressed as a cDNA. Although directed overexpression of single genes provides valuable information, the genes are expressed at non-physiological levels, which can cause artifacts and non-meaningful biological insights. CRISPRa technology has the advantage of activating the endogenous gene transcript, and its splice isoforms, while expanding the capacity to conduct large-scale genetic screens. Despite these differences between the two technologies, both have proven successful in unfolding some of the unknown mechanisms of drug resistance in cancer.

However, recent work has shown that the two gain-of-function mechanisms provide minimally concordant gene hit lists when screened in the same NRAS-mutant melanoma cell line. This study examines why the two technologies yield discrepancies by generating follow-up, pooled ORF and CRISPRa libraries. These libraries contain the same set of gene hits from the primary screens, and secondary screens are performed in the same cell line. The data show that genes identified with only one of the methodologies in the primary screens are confirmed more strongly with the same methodology in the secondary screens. Both ORF and CRISPRa validate the same percentage of gene hits, but unique sets of ‘ORF-only’ and ‘CRISPRa-only’ genes are defined. Follow-up assays show that certain genes do not emerge in an ORF screen because the ORF itself is lethal or is not expressed post-transduction. Other genes do not score with CRISPRa because the overall significance of the hit is decreased by averaging the magnitudes of all the sgRNAs for the gene. Altogether, the two activation systems produce distinctive results due to ineffective ORF and CRISPRa constructs, library design, and unaccountable biological factors specific to the cancer model of choice.

Dedication

I dedicate this work to my father, who shaped me into the scientist and person I am today. Missing you always.


Acknowledgments

I would like to first thank Max Herman for the unconditional love and support throughout this entire process.

I would like to extend my sincerest gratitude towards my incredible Thesis Director, Federica Piccioni. Thank you for everything, but most importantly, for all the laughs.

This work could not have been completed without the amazing talents of Briana Fritchman, Desiree Hernandez, Nicole Persky, Marissa Feeley, and Mudra Hegde. Thank you for helping me on this journey.

Finally, thank you to John Doench and David Root for extending me the offer to complete this project in the Genetic Perturbation Platform at the Broad Institute. It has been a wonderful six years.


Table of Contents

Dedication
Acknowledgments
List of Figures
Chapter I. Introduction
Combating Cancer Drug Resistance and Identifying New Therapies
Pooled Genetic Screens to Study Cancer Drug Resistance
Overexpression Perturbation Technologies
Overexpression Screens in the Context of a MEK Inhibitor
Comparing ORF versus CRISPRa
Chapter II. Materials and Methods
Cell Culture
Secondary Library Generation
Lentivirus Production and Titering
CRISPRa cell line generation and activity testing
Secondary Library Viral Titrations
Secondary ORF and CRISPRa Screens
Genomic DNA Isolation, PCR and Sequencing
V5 Immunoassay
CRISPRa sgRNA Cloning and Western Blots for Gene Activation
Chapter III. Results
Comparing Gene Hits Lists from Primary ORF and CRISPRa Screens
Designing the Secondary Libraries
Assay Development for Secondary CRISPRa Screen
Post-Screen Assessment
Quality Control Analyses of the Secondary Screening Data
Analysis of Secondary Screening Data
Chapter IV. Discussion
References

List of Figures

Figure 1. Schematic representation of a pooled, positive-selection drug screen.
Figure 2. Schematic representation of the CRISPR-Cas9 system.
Figure 3. Schematic representation of the CRISPRa Calabrese system.
Figure 4. Comparing genome-wide primary screens shows minimal overlap.
Figure 5. Distribution of primary screen fold-changes by hit type.
Figure 6. Primary screen fold-changes for constructs of representative genes.
Figure 7. Secondary screen design and analysis scheme.
Figure 8. Alignment of the truncated ORF and wild-type NRAS sequences.
Figure 9. Schematic representation of the dCas9-VP64 vector.
Figure 10. Gene expression is activated with CRISPRa in MelJuso.
Figure 11. Titration results of the secondary ORF and CRISPRa lentiviral libraries.
Figure 12. Schematic representation of secondary ORF and CRISPRa screens.
Figure 13. Cumulative population doublings of MelJuso during secondary screens.
Figure 14. gDNA samples were successfully amplified by PCR.
Figure 15. All samples recovered sufficient sequencing reads.
Figure 16. Not all ORFs are successfully represented in the screen.
Figure 17. Replicates of secondary screens show strong pairwise correlations.
Figure 18. Non-targeting control sgRNAs show minimal fold-change.
Figure 19. Volcano plots of secondary screens.
Figure 20. Trametinib and Selumetinib treatments are highly correlated across secondary screens.
Figure 21. Heat map of secondary screen z-scores.
Figure 22. Validated gene sets by drug treatment.
Figure 23. Validation rates of secondary screens grouped by primary screen p-values.
Figure 24. Comparing delta z-scores across drug treatments.
Figure 25. V5 fluorescent signal of individual ORF constructs.
Figure 26. Log2-fold changes of individual sgRNAs of ‘ORF-only’ gene hits.
Figure 27. Gene activation varies by sgRNA and by gene.

Chapter I.

Introduction

Resistance to cancer therapies is an ongoing problem in the clinic, and functional genomic studies designed to interrogate the genetics behind the resistance have become more popular. Two pooled-screening approaches, ORF and CRISPRa, used in the same model system, identify common genes involved in cancer drug resistance but also unique genes specific to the technology used. The goal of this work is to shed light on the discrepancies between the two different technologies. Identifying these underlying causes can ultimately drive the identification of unrevealed key targets for cancer eradication.

Combating Cancer Drug Resistance and Identifying New Therapies

The basic principles of cancer have been very well established. Cancer is a result of specific genetic modifications in genes involved in the cell cycle, cell growth and cell differentiation. Genetic mutations in oncogenes, which are responsible for cell division and growth, can drive the cells to continuous proliferation. Inactivating mutations in tumor suppressor genes, as well as alterations in apoptotic or anti-apoptotic genes, can also contribute to tumor development. Inactivation of DNA repair genes can affect the normal elimination of cells carrying cancer-promoting mutations. Genetic changes, like chromosomal translocations, point mutations, deletions, amplifications, and insertions, can also occur in most, or all, of these genes independently of one another. Therefore, the escape of a tumor from controlled growth can proceed through different pathways, making the treatment of the disease exceedingly intricate (Hanahan & Weinberg, 2011).

During the past two decades, cancer treatments evolved from cytotoxic agents to selective targeted therapies and cancer immunotherapies. Chemotherapies are non-specific, carry significant toxicity profiles, and cause dramatic side effects. Targeted therapies, on the other hand, block specific molecular pathways or mutant proteins required for tumor cell growth and survival, and immunotherapies activate the host’s immune response. Although treatments are improving, certain cancers with specific mutations still lack adequate targeted therapies. Additionally, resistance to cytotoxic chemotherapies and single-agent targeted therapies is the main cause of treatment failure in metastatic cancer. Identifying causes that mediate resistance to a specific therapy can help predict which patients will show a clinical response, while also providing new drug targets.

Unfortunately, cancer genomes in the clinic are particularly taxing to study. The genetic makeup from cancer type to cancer type, patient to patient, and tumor cell to tumor cell varies greatly, and over time patients can develop resistance through different genetic means. Changes in drug transport mechanisms and metabolism, alterations in a drug target, fluctuations in the level of expression of a drug target, activation of alternative signaling pathways, and evasion of cell death are some of the many ways resistance can arise (Holohan, Schaeybroeck, Longley & Johnston, 2013). Despite the difficulty, understanding the importance of each gene, and each genetic mutation, in the context of cancer is a critical first step in fighting the ever-growing drug-resistance problem. Interrogation strategies that provide insights into the drug-resistance landscapes of different cancers will significantly improve patient care and help to reach the ultimate goal of personalized medicine (Wilson et al., 2015).

Pooled Genetic Screens to Study Cancer Drug Resistance

Studying drug resistance in cancer is not a simple task, but it has been made easier with recent advancements in the field of functional genomics. The function of the genome is most often studied by perturbing the flow of information from genotype to phenotype on a global scale with various technologies (Hartenian & Doench, 2015). Scientists studying the genomes of bacteria and yeast perfected high-throughput genetic screening, and the strategy has since transitioned to mammalian cell-based models (Piccioni, Younger & Root, 2018). Screening can be carried out in an arrayed format, where each genetic perturbation is delivered in a separate well (Sanjana, 2017). Alternatively, the perturbations can be pooled and delivered together, which is a powerful approach to interrogate large numbers of genes and substantially simpler than an arrayed screen.

Typically, pooled genetic screening involves the development of a barcoded library of perturbations in a lentiviral expression vector. Modified lentivectors allow for the delivery of genetic information to a broad number of cell lines, including primary cells, and are capable of transducing dividing cells. Lentivirus particles then incorporate the DNA sequences of the perturbations into the host cell’s genome, and the perturbations are stably expressed throughout cell divisions (Hartenian & Doench, 2015). The expression vectors often contain an antibiotic-resistance marker for cell selection, and the perturbations are typically barcoded to identify them within the pool. This pooled-screening technique does, however, rely on the integration of only one perturbation per cell so that cells with the desired phenotype can be uniquely identified at the end of the screen (Piccioni, Younger & Root, 2018).
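The one-perturbation-per-cell requirement is typically approached by transducing at a low multiplicity of infection (MOI). As a rough illustration, and assuming that the number of integrations per cell follows a Poisson distribution (a common modeling assumption that is not stated in this thesis), the expected fractions of cells can be sketched as follows. The function name and the example MOI values are illustrative only.

```python
import math


def poisson_integration_fractions(moi: float) -> dict:
    """Expected cell fractions at a given MOI under a Poisson model.

    Assumption (not from the thesis): integrations per cell ~ Poisson(moi).
    """
    if moi <= 0:
        raise ValueError("moi must be positive")
    uninfected = math.exp(-moi)          # P(0 integrations)
    single = moi * math.exp(-moi)        # P(exactly 1 integration)
    multiple = 1.0 - uninfected - single  # P(2 or more integrations)
    return {
        "uninfected": uninfected,
        "single": single,
        "multiple": multiple,
        # Among cells that survive antibiotic selection (>= 1 integration),
        # the fraction carrying exactly one perturbation.
        "single_among_infected": single / (1.0 - uninfected),
    }


if __name__ == "__main__":
    for moi in (0.1, 0.3, 0.5, 1.0):
        fractions = poisson_integration_fractions(moi)
        print(moi, round(fractions["single_among_infected"], 3))
```

Under this model, lowering the MOI raises the share of selected cells that carry a single perturbation, at the cost of needing more starting cells to maintain library representation.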

Pooled screens are generally described as either negative or positive selection. In a negative-selection screen, perturbations are depleted out of the general population. A classic example of this is a cancer screen to identify perturbations that cause cell death over time, revealing genetic vulnerabilities and dependencies (Cheung et al., 2011).

Positive-selection screens rely on enriching a population of cells for a phenotype of interest after the introduction of a stringent selective pressure. For example, cells that have differential fluorescence intensities of a marker are sorted out of the general population by fluorescence-activated cell sorting (FACS) (Hartenian & Doench, 2015).

Pooled positive-selection screens are also a powerful tool for studying mechanisms of drug resistance in cancer (Figure 1). Cells are perturbed by a genetic library and then cultured in the presence of a small-molecule drug. The drug is toxic to the general population of cells, but cells with specific genetic perturbations may survive or even proliferate (Guenther et al., 2018; Iniguez et al., 2018). Then, whether performing a negative or positive-selection screen, the final cell population is collected, genomic DNA is purified, and barcoded perturbations are sequenced to measure changes in perturbation abundance.
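The final step, measuring changes in perturbation abundance from barcode read counts, can be sketched in a few lines. This is a minimal illustration and not the analysis pipeline used in this work: the normalization (reads per million plus one, then log2) and all names and counts below are assumptions chosen for the example.

```python
import math


def log_normalize(counts: dict) -> dict:
    """Convert raw barcode read counts to log2(reads per million + 1).

    The reads-per-million-plus-one scheme is an assumed normalization.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {bc: math.log2(n / total * 1e6 + 1.0) for bc, n in counts.items()}


def log2_fold_changes(treated: dict, control: dict) -> dict:
    """Log2 fold change (treated minus control) for every control barcode."""
    norm_treated = log_normalize(treated)
    norm_control = log_normalize(control)
    # A barcode absent from the treated sample is treated as zero reads.
    zero = math.log2(1.0)
    return {
        bc: norm_treated.get(bc, zero) - norm_control[bc]
        for bc in norm_control
    }


if __name__ == "__main__":
    # Hypothetical counts: barcode "a" is enriched under drug treatment.
    control = {"a": 100, "b": 100}
    treated = {"a": 300, "b": 100}
    print(log2_fold_changes(treated, control))
```

In a positive-selection drug screen, barcodes with large positive values are candidates for resistance-conferring perturbations; in a negative-selection screen, the depleted barcodes are the ones of interest.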

[Figure 1 schematic. Panel labels: pooled lentiviral library; library transduction; antibiotic selection; mixed population of cancer cells; library-selected cells; passage without drug (control); passage with drug (drug treated); collect for sequencing.]

Figure 1. Schematic representation of a pooled positive-selection drug screen.

Cancer cells are transduced with a pool of lentiviral perturbations with low multiplicity of infection. After selection, the cell population is split into the control and drug-treatment arms. After several weeks in culture, the majority of cells succumb to the drug treatment, while a few with resistance-conferring perturbations proliferate.

Overexpression Perturbation Technologies

In the past few years, genetic perturbation tools have evolved to next-generation lentiviral libraries. These genetic perturbations, or constructs, typically fall into one of three categories: knock-down, knock-out, or overexpression. Knock-down of a gene is mediated through the RNA interference pathway, where a double-stranded RNA molecule is delivered into the cell, triggering the cleavage of complementary mRNAs. Expression of the corresponding gene is then decreased but not completely eliminated. Knock-out of a gene has recently been simplified with developments in CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat) technology in which an RNA-guided Cas protein induces a double-stranded DNA break. Overexpression of a gene can be achieved through the introduction of ectopic open reading frame (ORF)/cDNA sequences or activation of endogenous genes with a modified CRISPR system called CRISPR activation (CRISPRa) (Hartenian & Doench, 2015).

An ORF is a continuous stretch of codons that has the ability to be translated into a protein. In a lentiviral system, the ORF sequence is integrated into the host cell’s genome and expressed in addition to the endogenous sequence. A gene of interest, or a pool of genes, can therefore be overexpressed within a cell population regardless of the endogenous expression state of the cell lineage. ORF overexpression libraries were limited in the early 2000s, and the reagents that did exist were inconsistent in terms of gene representation and sequence validation (Yang et al., 2011). In 2011, Yang et al. characterized a nearly genome-wide Human ORFeome Collection and created an ORF lentiviral-expression library. Each ORF construct was derived from a single bacterial colony, fully sequence verified, barcoded and then cloned into the pLX_317 expression vector. The ORFeome library contains approximately one construct per gene, in addition to constructs for some common cancer mutant sequences, for a total of 17,255 ORFs. The final library is not fully genome-wide, however, as the packaging of larger ORFs into a lentiviral vector system is often unsuccessful. Regardless, the human ORFeome library has been favorably used to identify pathways and individual genes responsible for drug resistance in specific cancer contexts (Iniguez et al., 2018; Guenther et al., 2018).

CRISPR activation (CRISPRa) is a recent advancement in the field of CRISPR technology. Traditional CRISPR was developed after the discovery of prokaryotic loci containing repeating palindromic sequences (Ishino, Shinagawa, Makino, Amemura & Nakata, 1987) that are interspaced with viral DNA (Bolotin, Quinquis, Sorokin & Ehrlich, 2005). It was later determined that these loci are used as a defense mechanism against bacteriophages (Barrangou et al., 2007). The inserted viral DNA is first transcribed into a CRISPR RNA (crRNA) and then linked to a trans-activating crRNA (tracrRNA). The hybridized RNA then guides a Cas enzyme to a complementary DNA sequence, which is then cleaved and neutralized (Jinek et al., 2012).

This prokaryotic system was adapted for mammalian gene-editing purposes in 2012 by Jinek et al. by fusing together the crRNA and tracrRNA molecules into a single guide RNA (sgRNA) with a 20-nucleotide targeting sequence (Figure 2). The sgRNA can base-pair to any DNA region of interest upstream of a protospacer adjacent motif (PAM), a specific sequence that each Cas enzyme recognizes. A commonly used Cas derivative is Cas9 from Streptococcus pyogenes, which binds to NGG PAM sites. In cells expressing Cas9, the target DNA is cleaved if there is sufficient sequence complementarity to the sgRNA. The induced double-strand break is most often repaired with the non-homologous end joining pathway, frequently resulting in the loss of function of the protein. CRISPRa uses a similar sgRNA and Cas9 system, but both components have been modified to promote gene activation instead of gene knockout (Hartenian & Doench, 2015).
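The targeting rule described above (a 20-nucleotide sequence immediately upstream of an NGG PAM) can be illustrated with a short scan over a DNA string. This is a simplified sketch: it searches only the forward strand, ignores on-target and off-target scoring, and the function name and example sequence are hypothetical.

```python
import re


def find_protospacers(sequence: str, guide_len: int = 20) -> list:
    """Return (start, protospacer, pam) for each forward-strand NGG site.

    Simplification: the reverse strand is not searched.
    """
    seq = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence: one 20-nt protospacer followed by an AGG PAM.
    example = "ACGTACGTACGTACGTACGT" + "AGG" + "CC"
    for start, protospacer, pam in find_protospacers(example):
        print(start, protospacer, pam)
```

A practical design tool would also scan the reverse complement and rank candidates, which is beyond the scope of this illustration.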


Figure 2. Schematic representation of the CRISPR-Cas9 system.

The Cas9 enzyme is guided by the sgRNA to the target DNA sequence. After base-pairing upstream of the PAM sequence, the Cas9 cleaves the target DNA, causing a double-stranded break.

With CRISPRa, the Cas9 enzyme can be engineered with several point mutations to inactivate its nuclease domain. This dead Cas9 (dCas9) therefore acts as an RNA-guided, DNA-binding protein but can no longer cleave DNA (Qi et al., 2013). With the fusion of an activation domain, such as VP16, the dCas9 is then able to recruit transcriptional machinery to the target sequence (Gilbert et al., 2013). Alternatively, the tracrRNA region of the sgRNA can also be adapted to include structural loops that recruit additional transcriptional factors (Shechner, Hacisuleyman, Younger & Rinn, 2015).

Konermann et al. (2015) developed a highly efficient CRISPRa system by combining both of these strategies. Their dCas9 protein is fused to the activation domain VP64, while their tracrRNA is modified to contain two hairpins that selectively bind to the activating bacteriophage coat-protein MS2. Additionally, Konermann et al. (2015) introduced sequences for the activating subunit of NF-κB (p65), the activation domain of the human heat-shock factor 1 (HSF1), and MS2 into their expression vector. The combination of these three elements led to the development of the synergistic activation mediator (SAM) system for CRISPRa.

The SAM system was then further adapted into a genome-wide CRISPRa library named Calabrese. The sgRNAs within the library contain a modified tracrRNA sequence (tracr14), which incorporates four stem loops: two MS2 and two PP7 (an alternative bacteriophage coat protein) (Figure 3). The expression vector of the library, named pXPR_502, was also modified to contain sequences for PP7, p65, and HSF1. For the library design, the transcriptional start sites (TSS) of all protein-coding genes in the human genome were initially annotated with a database called FANTOM. As many sgRNAs as possible near NGG PAMs were then designed to target the region 150 to 75 nucleotides upstream of the TSS. The six best sgRNAs for each gene, those with the highest potential for on-target effects and the lowest potential for off-target effects, were then chosen and divided into two sets of three, Set A and Set B. The Calabrese library was tested in the melanoma cell line A375 to validate its use in a drug-resistance screen. The cells were engineered to express dCas9-VP64, transduced with Calabrese, and then cultured with the small molecule vemurafenib for several weeks. The screen was able to recover a large number of gene hits with significant p-values, including the positive control EGFR, a known driver of vemurafenib resistance (Sanson et al., 2019).
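The selection logic described for the library design (keep candidates in the window 150 to 75 nucleotides upstream of the TSS, take the six best, split them into two sets of three) can be sketched as below. This is an illustration only and not the published design procedure: the single combined `score` field and the alternating assignment to Set A and Set B are assumptions, since the text does not specify how on-target and off-target potential are combined or how the sets are divided.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sequence: str
    upstream_of_tss: int  # distance in nucleotides upstream of the TSS
    score: float          # hypothetical combined on/off-target score


def pick_guides(candidates, window=(75, 150), n_guides=6):
    """Filter by TSS window, rank by score, and split into Set A and Set B."""
    low, high = window
    in_window = [c for c in candidates if low <= c.upstream_of_tss <= high]
    ranked = sorted(in_window, key=lambda c: c.score, reverse=True)[:n_guides]
    # Assumed rule: alternate ranked guides between the two sets.
    set_a = ranked[0::2]
    set_b = ranked[1::2]
    return set_a, set_b


if __name__ == "__main__":
    # Hypothetical candidates; sequences are placeholders.
    pool = [Candidate(f"guide_{i}", 70 + 10 * i, float(i)) for i in range(10)]
    set_a, set_b = pick_guides(pool)
    print([c.sequence for c in set_a], [c.sequence for c in set_b])
```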


Figure 3. Schematic representation of the CRISPRa Calabrese system. sgRNAs with modified tracrRNAs are cloned into the pXPR_502 vector, which contains the activating proteins PP7, p65 and HSF1. The dCas9-VP64 enzyme is guided by the CRISPRa sgRNA to the target gene’s transcriptional start site. PP7 stem loops in the tracrRNA recruit the activating proteins and upregulate gene expression.

The ORF and CRISPRa screening systems produce the same overexpression phenotypes, but the intrinsic biology behind the end result varies. CRISPRa technology can activate the endogenous gene transcript and its splice isoforms, while an ORF construct is derived from a single cDNA sequence (Hartenian & Doench, 2015). However, expression can only be upregulated via CRISPRa if the cell type of interest is already expressing the gene transcript. For example, genes that have been permanently silenced due to chromatin structure may not be activated, whereas an artificial expression ORF would be transcribed. Despite these differences between the two technologies, both have proven successful in unraveling some of the mysteries of drug-resistance mechanisms in cancer.


Overexpression Screens in the Context of a MEK Inhibitor

Currently, there is a lack of effective treatment options for melanoma patients that harbor an NRAS mutation, and patients often become resistant to the therapies that are available. Through several preclinical studies it was concluded that mutant-NRAS disrupts the MAPK signaling pathway. Developing a compound that binds NRAS has been challenging, thus the inhibition of the downstream kinase MEK was thought to be an effective treatment option. However, treatment with MEK inhibitors yielded limited clinical response. To preemptively investigate the resistance landscape of NRAS-mutant melanoma in the context of a MEK inhibitor, Hayes et al. (2019) recently completed several genetic screens using the human ORFeome library.

One ORF screen was carried out in the NRAS Q61L-mutant cell line MelJuso under the treatment of Trametinib, a MEK1/2 small molecule inhibitor (Hayes et al., 2019). Within the same time frame, a Calabrese CRISPRa screen was also completed in MelJuso with the MEK inhibitor Selumetinib. Both of these screens were executed with similar protocols and therefore presented an opportunity for data comparison across the two overexpression screening technologies. Surprisingly, there was minimal overlap between the two gene hit lists considering the thousands of genes the ORF and CRISPRa libraries have in common. Some well-known oncogenes did emerge from both screens as MEK inhibitor resistance-drivers, but the data do not appear to be completely concordant (Sanson et al., 2019).


Comparing ORF versus CRISPRa

There is a great deal of research regarding the efficacy of ORF and CRISPRa screens as interrogation methods to elucidate gene function. Yet, little is known about whether one system provides more accurate, or more biologically relevant, results. As mentioned previously, the two systems activate genes differently. An ORF screen can ectopically activate most genes in any cell lineage, but the constructs are derived from a limited number of transcripts. CRISPRa can activate multiple splice isoforms, but if the cell does not intrinsically express the gene it will not be activated. Despite these inherent differences, the end phenotype is the same, and therefore the lack of overlapping gene hits between ORF and CRISPRa is striking. Researchers interested in a drug-resistance overexpression screen are left unsure of which screening technology to pursue, so determining if one of the systems yields more dependable results will help them prioritize their efforts. Additionally, limiting the number of genome-wide screens required at the beginning of a project will greatly cut down on cost and labor requirements, allowing for more detailed follow-up studies and significant findings.

There are several possible explanations for the different results across the two screening technologies. For example, genes may not be overexpressed in an ORF screen due to the size of the cDNAs and the packaging limit of lentiviral vectors. Alternatively, a gene may not be activated by CRISPRa sgRNAs because it is not expressed in the screening cell model. However, the most likely reason for the discrepancies is a flaw in the design of the constructs. For example, the ORF library contains mutant versions of several genes. These mutants may have a competitive advantage in the presence of a drug regardless of the endogenous expression state, indicating the genes would score via ORF but not CRISPRa. Additionally, the ORF constructs could contain a loss-of-function mutation that renders their overexpression insignificant, implying they would score via CRISPRa but not ORF. The design of some CRISPRa constructs could also be faulty in the sense that the sgRNA does not target the correct TSS of the gene. These particular genes may not score as hits with CRISPRa but could with ORF.

To determine if one screening approach is ultimately more reliable due to ineffective construct design in the other, secondary libraries will be generated to perform follow-up screens in MelJuso. The secondary screens will cross-examine the same set of gene hits with both overexpression systems and both MEK inhibitors so that validation rates can be determined. The analysis of these secondary screens may reveal a more robust understanding of how these two technologies perform and why they give varying results.


Chapter II.

Materials and Methods

Secondary ORF and CRISPRa pooled libraries were generated containing top gene hits. MelJuso cells were re-screened using each library, and the data were deconvoluted and checked for quality. Additionally, low-throughput assays at the protein level were performed for gene validation. The experimental techniques and protocols are outlined in this section.

Cell Culture

MelJuso cells were obtained from the Cancer Cell Line Encyclopedia, and 293T and A549 cells were obtained from ATCC. All cells were kept in a humidity-controlled incubator at 37°C and 5% CO2. Cells were passaged every 2-3 days to maintain <100% confluence depending on their doubling time. 293T and A549 cells were tested for mycoplasma prior to lentivirus production and viral titering, and MelJuso cells were tested at the beginning and end of the secondary screens. A549 and 293T cells were grown in DMEM + 10% FBS, with and without 1% penicillin-streptomycin, respectively. MelJuso cells were grown in RPMI + 10% FBS and 1% penicillin-streptomycin, and MelJuso-dCas9-VP64 cells were maintained with a final concentration of 2 ug/mL blasticidin. During transductions, the A549 and MelJuso media were supplemented with final concentrations of 8 and 2 ug/mL polybrene, respectively. A549 and MelJuso cells were selected post-transduction with final concentrations of 1.5 and 1 ug/mL puromycin, respectively.

Secondary Library Generation

For the secondary ORF library, the 491 chosen ORF constructs were identified in the Human ORFeome Collection and randomly arranged to fill six 96-well plates, with a minimum of one empty well per plate. Barcoded tubes containing normalized DNA of the constructs were then consolidated into racks, mirroring the predetermined plate layouts. DNA tube barcodes were then scanned in their new positions for later construct identification. 100 ng of DNA per well was then stamped into 96-well polypropylene storage plates containing 10 uL per well of sterile water. The DNA plates were stored at -20°C until future use.
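A randomized plate layout of this kind can be generated programmatically. The sketch below is illustrative and does not reproduce the layout tool actually used: the even distribution of constructs across plates, the fixed random seed, the well-naming scheme, and the construct identifiers are all assumptions.

```python
import random
import string


def assign_plate_layout(constructs, n_plates=6, rows=8, cols=12,
                        min_empty_per_plate=1, seed=0):
    """Randomly place constructs into wells, leaving empty wells on each plate.

    Returns a dict mapping (plate_number, well_name) -> construct.
    """
    wells_per_plate = rows * cols
    capacity = n_plates * (wells_per_plate - min_empty_per_plate)
    if len(constructs) > capacity:
        raise ValueError("not enough wells for the requested constructs")

    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    shuffled = list(constructs)
    rng.shuffle(shuffled)

    wells = [f"{string.ascii_uppercase[r]}{c + 1:02d}"
             for r in range(rows) for c in range(cols)]

    # Assumed rule: spread constructs as evenly as possible across plates.
    base, extra = divmod(len(shuffled), n_plates)
    layout, index = {}, 0
    for plate in range(1, n_plates + 1):
        n_here = base + (1 if plate <= extra else 0)
        for well in rng.sample(wells, n_here):
            layout[(plate, well)] = shuffled[index]
            index += 1
    return layout


if __name__ == "__main__":
    # Hypothetical construct identifiers standing in for the 491 ORFs.
    layout = assign_plate_layout([f"ORF_{i:03d}" for i in range(491)])
    print(len(layout), "constructs placed")
```

With 491 constructs and six 96-well plates, this places 81 or 82 constructs per plate, so every plate retains well over the required single empty well.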

For the secondary CRISPRa library, selected sgRNA sequences were affixed to BsmBI recognition sites and overhang sequences necessary for cloning into the pXPR_502 expression vector. Additionally, forward and reverse primers were attached 5’ and 3’ to the cloning sites for PCR amplification of the library. The finalized sgRNA sequences were synthesized by Genscript on an oligo chip. The oligonucleotides were amplified using 5 uL of a primer mix (forward primer 5’ CAGCGCCAATGGGCTTTCGA 3’; reverse primer 5’ AGCCGCTTAAGAGCCTGTCG 3’) at a final concentration of 0.5 uM, 25 uL of New England Biolabs (NEB) next PCR master mix (2X), 40 ng of the oligonucleotide pool, and 17 uL water. PCR was performed over 24 cycles using the following conditions: 98°C for 30 s; 53°C for 30 s; 72°C for 30 s. The PCR product was then purified using the Qiagen PCR-cleanup kit and quantified via Nanodrop. The pXPR_502 vector was digested with BsmBI by incubating 20 uL of NEB buffer 3.1, 10 uL of BsmBI, 20 ug of the vector, and water in a 200 uL total reaction at 55°C for 4-6 hours. The digested vector was purified on a 1% agarose gel run at 120 V for 3 hours, excised via the Qiaquick Gel Extraction Kit, isopropanol precipitated to remove the high salt content, and stored at -20°C.

Next, 100 ug of purified PCR product was cloned into 500 ug of digested vector using Golden Gate cloning with Esp3I (Fisher Scientific) and T7 ligase (Epizyme). The cloning reaction was cycled 100 times with the following conditions: 37°C for 5 min; 20°C for 5 min. The ligated product was isopropanol precipitated, and then 400 ng was electroporated into 100 uL of STBL4 electrocompetent cells (Thermo Fisher Scientific). After 16 hours of growth on agar plates with 100 ug/mL carbenicillin at 30°C, the colonies were scraped and harvested in cold LB media. Library plasmid DNA (pDNA) was isolated using a HiSpeed Plasmid Maxi prep (Qiagen) and 1 ng was submitted for MiSeq50 Illumina sequencing to confirm adequate representation of all intended sgRNAs.


Lentivirus Production and Titering

For 96-well plate lentivirus production of the secondary ORF library: 2.2x10e4 293T packaging cells per well were seeded in 100 uL of growth media in 96-well tissue culture plates. The plates were left out at room temperature for one hour before being placed in the incubator overnight. The following day, a plasmid mix containing the second-generation packaging plasmid psPAX2 (Addgene), the envelope plasmid pCMV_VSVG (Addgene), and Opti-MEM media was prepared such that each well would receive 100 ng, 10 ng, and 10 uL of the reagents, respectively. 10 uL per well of the plasmid mix was then dispensed into the 96-well plates containing the ORF DNA. A second mix of TransIT-LT1 transfection reagent (Mirus) and Opti-MEM media was then made such that each well would receive 0.6 uL and 10 uL of the reagents, respectively. The TransIT-LT1 and Opti-MEM mix was left for five minutes at room temperature before 10 uL per well was added to the DNA plates. The transfection plates were left at room temperature for 30 minutes, and the entire volume of each well was then carefully added onto the pre-seeded 293T cells. The cell plates were incubated for six hours, then the media was removed and replaced with 170 uL per well of viral harvest media (DMEM, 10% heat-inactivated FBS, 1% BSA, 1% penicillin-streptomycin). After a 42-hour incubation, the lentiviral supernatant was removed. Half of the virus was aliquoted into 96-well polypropylene storage plates and immediately stored at -80°C, while the remaining half was pooled into a trough. The pooled virus was thoroughly mixed before being aliquoted and stored at -80°C.


For flask-based lentivirus production of the secondary CRISPRa library and dCas9-VP64 vector: 1.8x10e7 293T packaging cells were seeded into a T175 flask in a total volume of 25 mL growth media and incubated overnight. The following day, a mix of 6 mL of Opti-MEM media and 305 uL of TransIT-LT1 was left at room temperature for 3 minutes followed by the addition of 40 ug of library pDNA, 50 ug of psPAX2, and 5 ug of pCMV_VSVG. The transfection mix was left at room temperature for an additional 27 minutes before being added dropwise onto the pre-seeded 293T cells. The flask was incubated for 6 hours, the media was replaced with 40 mL of viral harvest media, and the flask was incubated again. For the secondary CRISPRa library, 36 hours post media-change, the lentiviral supernatant was removed from the flask, briefly centrifuged at 230 g for 1 min to pellet any cell debris, aliquoted, and immediately stored at -80°C.

For the dCas9-VP64 vector lentivirus, the supernatant was concentrated to improve viral titer. A Centricon Plus-70 Centrifugal Filter with a 3 kDa membrane (Sigma Aldrich) was first sterilized by adding 25 mL of 70% ethanol and centrifuging at 4000 g for 25 minutes at 30°C. Excess ethanol was removed by inverting the filter and centrifuging for 5 minutes at 4000 g, followed by two 10-minute washes with sterilized water at the same speed and temperature. 60 mL of sterilized water was then added, and the filter was centrifuged at 4000 g for one hour at 30°C to hydrate the membrane. The viral supernatant was collected from the flask and directly filtered through a 0.45 um PVDF membrane to remove excess debris prior to concentrating. The clarified virus was then added to the filter column and centrifuged at 4000 g for one hour at 30°C.


Concentrated virus was eluted off the membrane by inverting the filter and centrifuging at 1000 g for 2 min at 30°C.

For small-scale lentivirus production of the individual sgRNAs: 1x10e6 293T cells per 6-well were seeded in 1.35 mL per well of growth media. Plates were incubated overnight, and the following day, 16.3 uL of TransIT-LT1 was mixed with 322 uL of Opti-MEM per 6-well and left at room temperature for 3 minutes. 2.45 ug of sgRNA pDNA, 2.45 ug of psPAX2, and 0.245 ug of pCMV_VSVG per 6-well were then added to the Opti-MEM/TransIT-LT1 and left for an additional 27 minutes at room temperature. The transfection mix was added dropwise to the pre-seeded 293Ts, and the plates were incubated for six hours, after which the media was replaced with 2.5 mL per 6-well of viral harvest media. After a 36-hour incubation, the viral supernatant was removed, aliquoted and immediately stored at -80°C.

To ensure the transfections were successful, all viruses were titered post-production. 3,000 A549 cells in 100 uL growth media per well were seeded into black-walled, clear-bottomed 96-well plates and left at room temperature for one hour prior to incubating overnight. The following day, a virus of known titer and the viruses of unknown titer were serially diluted 2-fold in growth media for several iterations. The media was removed from the cell plates and replaced with 35 uL per well fresh growth media supplemented with polybrene, followed by 10 uL per well of each virus dilution in triplicate. The plates were centrifuged at 1200 g for 30 minutes at 37°C and incubated for 48 hours. 100 uL per well of growth media supplemented with puromycin was then added for another 48 hours. The puromycin media was removed from the cell plates and replaced with 50 uL per well growth media supplemented with 10% alamarBlue®


(Biosource). Plates were incubated for ~3 hours and then scanned on a fluorescent plate reader. Fluorescence readings from the virus of known titer were used to generate a standard curve for relative titer comparison.

CRISPRa cell line generation and activity testing

MelJuso cells stably expressing dCas9-VP64 were generated with a 12-well plate spin-transduction at a low multiplicity-of-infection (MOI). A master mix of parental cells, polybrene, virus, and growth media was made such that 1.5x10e6 cells and 20 uL of concentrated dCas9-VP64 virus were seeded per 12-well in a total volume of 2 mL. The plate was then centrifuged at 930 g for 2 hours at 30°C. After the spin, 2 mL per well of fresh growth media was added dropwise to dilute the polybrene and viral media, and the plate was incubated overnight. The following day, the media was aspirated, and the cells were washed with PBS and trypsinized. All subsequent spin-transductions followed this same protocol but used varying volumes of virus depending on the viral titer. The transduced cells were seeded into a flask, and 24 hours post-seeding, growth media supplemented with blasticidin was added. The MelJuso-dCas9-VP64 cells were maintained for 2 weeks with blasticidin, after which the cells were tested for Cas9 activity and banked.

To test the activity of dCas9-VP64 in MelJuso, the cells were spin-transduced in a 12-well plate at a low MOI (~0.5) with activating sgRNAs for CD4 and CD45 (sgCD4, sgCD45). The sgRNAs had been previously cloned into the pXPR_502 vector. A no-infection control (NIC) well was also added to the plate, containing 1.5x10e6 cells,


polybrene, and growth media in a total volume of 2 mL. Post-transduction, the NIC, sgCD4 and sgCD45-cells were seeded into separate flasks. 24 hours after seeding, growth media supplemented with puromycin was added to the sgCD4 and sgCD45-cells. The cells were maintained with puromycin-supplemented media for an additional 72 hours, after which duplicate wells of NIC-cells and one well each of sgCD4 and sgCD45-cells were seeded into a 96-well V-bottomed plate with 2.5x10e5 cells per well. The plate was centrifuged at 1000 g for 5 minutes and the media supernatant removed. One of the NIC wells was resuspended in 100 uL of flow buffer (PBS, 2% FBS, 5 uM EDTA), and the remaining three wells were resuspended in 90 uL of flow buffer and 5 uL each of anti-CD4 and anti-CD45 antibodies per well. The plate was left on ice for 30 minutes, followed by three washes with flow buffer. The cells were then run through a flow cytometer. Live cells were gated using FSC/SSC, and the NIC-unstained control cells were used to gate the non-fluorescent population.

Secondary Library Viral Titrations

To determine the virus concentrations to use during the secondary library transductions, MelJuso cells were spin-transduced with varying volumes of virus in 12-well plates. MelJuso parental and dCas9-VP64 cells were collected separately and secondary ORF and CRISPRa library viruses were added, respectively, with the following range of volumes: 0, 100, 200, 300, 400 and 500 uL. The following day, 1.5x10e5 cells from each population were seeded in duplicate in 6-wells to create in-line


titer plates. 24 hours post-seeding, growth media supplemented with puromycin was added to one 6-well of each condition in the in-line plates. 48 hours after puromycin addition, the in-line plates were counted and transduction efficiencies were calculated for each virus volume with the following formula: # of cells (+)puromycin / # of cells (-) puromycin.
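The efficiency formula above can be sketched in a few lines of Python. The function names and the cell counts are illustrative assumptions, not values from these screens, and the Poisson conversion from efficiency to MOI is a common approximation that the protocol itself does not state.

```python
import math


def transduction_efficiency(cells_plus_puro, cells_minus_puro):
    """Fraction of cells surviving puromycin, relative to the unselected well."""
    if cells_minus_puro <= 0:
        raise ValueError("the unselected well must contain cells")
    return cells_plus_puro / cells_minus_puro


def approximate_moi(efficiency):
    """Assumption: integrations per cell are Poisson distributed,
    so the infected fraction equals 1 - exp(-MOI)."""
    if not 0 <= efficiency < 1:
        raise ValueError("efficiency must be in [0, 1)")
    return -math.log(1.0 - efficiency)


# Hypothetical in-line titer counts: virus volume (uL) -> (cells +puro, cells -puro)
example_counts = {100: (1.0e5, 1.0e6), 300: (4.0e5, 1.0e6), 500: (6.0e5, 1.0e6)}

for volume_ul, (plus, minus) in sorted(example_counts.items()):
    eff = transduction_efficiency(plus, minus)
    print(f"{volume_ul} uL: efficiency {eff:.2f}, approx. MOI {approximate_moi(eff):.2f}")
```

Under the Poisson assumption, an efficiency of roughly 40% corresponds to an MOI near 0.5, which matches the low MOI targeted in the transductions described here.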

Secondary ORF and CRISPRa Screens

For the secondary screens, MelJuso parental and dCas9-VP64 cells were collected separately and spin-transduced in 12-well plates at a low MOI (~0.5). For each screen, three separate master mixes of cells, polybrene, growth media and library virus were made to create the three biological replicates. Each replicate comprised enough cells to maintain a representation of 1,000 and 500 cells per construct for ORF and CRISPRa, respectively, taking into account the transduction efficiencies of the library viruses. Additionally, a NIC-well was seeded for each library transduction. Post-transduction, the cells from each replicate were pooled together and counted, and the

NIC-cells were collected and counted separately. A 6-well in-line titer plate for each replicate was seeded, as previously described. The remaining cells were seeded into a T225 flask per replicate to maintain a healthy confluency during selection and expansion. 24 hours after seeding, growth media supplemented with puromycin was added to the flasks of cells and to one well of each condition in the in-line titer plate. 48 hours after


puromycin addition, the in-line titer plate was counted to confirm a low MOI for each transduction.
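The representation requirement for the transduction can be made concrete with a short calculation. This is a minimal sketch under stated assumptions: the function names are hypothetical, the 50% efficiency is an illustrative value, and the 1.5x10e6 cells per 12-well is carried over from the spin-transduction protocol described for cell line generation.

```python
import math

CELLS_PER_12_WELL = 1.5e6  # assumed from the dCas9-VP64 spin-transduction protocol


def cells_to_transduce(n_constructs, cells_per_construct, efficiency):
    """Cells to infect so that the transduced survivors still cover the library."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(n_constructs * cells_per_construct / efficiency)


def wells_needed(n_cells):
    """Number of 12-wells required to hold n_cells at the assumed density."""
    return math.ceil(n_cells / CELLS_PER_12_WELL)


# 491 ORF constructs at 1,000 cells per construct; the efficiency is hypothetical.
orf_cells = cells_to_transduce(491, 1000, 0.5)
print(orf_cells, "cells in", wells_needed(orf_cells), "well(s) per replicate")
```

The same function applies to the CRISPRa library with 500 cells per construct once the number of sgRNAs in that library is supplied.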

The cells were maintained in growth media supplemented with puromycin for an additional 48 hours to ensure complete selection. A minimum of 2,000 cells per construct per replicate was then split into the DMSO control arm and the Trametinib (final concentration 10 nM) and Selumetinib (final concentration 1.5 uM) drug-treatment arms.

The cells were seeded at a density to maintain a confluency of <100% after 2-4 days of growth. The remaining cells were pelleted by centrifugation and stored in PBS at -20°C as the early time point (ETP) reference samples. The screens were passaged and drug was refreshed every 2-4 days for a total of 14 days in treatment. The cells were counted at each passage to calculate population doublings and to maintain a minimum of 2,000 cells per construct per replicate. During every passage, the extra cells not re-seeded were pelleted and stored in the same manner as the ETP samples.
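The bookkeeping at each passage reduces to two small calculations: population doublings from the seeded and counted cell numbers, and the minimum number of cells to carry forward. The sketch below uses hypothetical function names and made-up counts, and it assumes doublings are computed as log2 of the ratio of counted to seeded cells, which the text does not spell out.

```python
import math


def population_doublings(cells_seeded, cells_counted):
    """Doublings between seeding and the next count (assumed log2 ratio)."""
    if cells_seeded <= 0 or cells_counted <= 0:
        raise ValueError("cell numbers must be positive")
    return math.log2(cells_counted / cells_seeded)


def minimum_cells(n_constructs, cells_per_construct=2000):
    """Cells needed per replicate to keep 2,000 cells per construct."""
    return n_constructs * cells_per_construct


# Hypothetical passage record: (cells seeded, cells counted at the next passage)
passages = [(1.0e6, 4.0e6), (1.0e6, 8.0e6), (1.0e6, 2.0e6)]
cumulative = sum(population_doublings(seeded, counted) for seeded, counted in passages)

print(f"cumulative doublings: {cumulative:.1f}")
print(f"minimum cells for 491 ORFs: {minimum_cells(491)}")
```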

Genomic DNA Isolation, PCR and Sequencing

The ORF ETP pellets and the day-14 pellets from the DMSO, Trametinib and Selumetinib treatment arms from the ORF and CRISPRa screens were chosen for sequencing. Genomic DNA (gDNA) from these samples was isolated using either Mini, Midi or Maxi NucleoSpin Blood kits (Macherey-Nagel, Clontech) for cell pellets of <5x10e6 cells, 5-20x10e6 cells, and >20x10e6 cells, respectively. After isolation, gDNA was stored at 4°C prior to PCR.


For the Mini kits: samples were thawed, centrifuged and resuspended in 200 uL of PBS. 25 uL of Proteinase K and 200 uL of lysis Buffer B3 were then added to each sample. The mixtures were vortexed and then incubated at 70°C overnight. The following day, the samples were cooled and 1 uL of 20 mg/mL RNase A (Clontech) was added per sample, followed by a 5-minute incubation at room temperature. 210 uL of absolute ethanol was then added per sample to precipitate the DNA. The lysate and DNA mix was loaded onto one column per sample and centrifuged at 11000 g for 1 minute to bind the DNA. Each column was then washed with 500 uL of Buffer BW and 600 uL of Buffer B5. After a 1-minute centrifugation at 11000 g to dry the membrane, 100 uL of heated elution buffer was added per sample and incubated at room temperature for approximately 15 minutes. The columns were then centrifuged at 11000 g for 1 minute to elute the gDNA. The gDNA was then quantified via Qubit and diluted in elution buffer to a concentration of 200 ng/uL.

For the Midi/Maxi kits, respectively: samples were thawed, centrifuged and resuspended in 2 mL/10 mL of PBS. 150 uL/500 uL of Proteinase K and 2 mL/10 mL of lysis Buffer BQ1 were then added to each sample. The mixtures were vortexed and incubated at 70°C overnight. The following day, the samples were cooled and 4.1 uL/20 uL of 20 mg/mL RNase A (Clontech) was added per sample, followed by a 5-minute incubation at room temperature. 2 mL/10 mL of absolute ethanol was added per sample to precipitate the DNA. The lysate and DNA mix was loaded onto one column per sample and centrifuged at 3250 g for 3 minutes to bind the DNA. Each column was then washed twice with 2 mL/7.5 mL of Buffer BQ2 and centrifuged for 10 minutes at 3250 g to dry the membrane. 200 uL/1000 uL of heated elution buffer was added per sample, and the


columns were left to incubate at room temperature for approximately 15 minutes. The columns were then centrifuged at 3250 g for 2 minutes to elute the gDNA. The eluted gDNA was quantified via Qubit and diluted in elution buffer to a concentration of 200 ng/uL.

gDNA from each sample was then distributed across multiple wells of a P7 primer plate, using one plate for each secondary screen. The Gonzo and Kermit P7 plates contain 96 uniquely barcoded primers that are specific for the pLX_317 and pXPR_502 vectors, respectively. Each well is preloaded with 10 uL of the barcoded primer at a concentration of 5 uM. The number of PCR wells to load per sample was calculated by maintaining a representation of 2,000 cells per construct per replicate and assuming an average of 6.6 pg of gDNA per cell. 50 uL of normalized gDNA was added per well for a maximum of 10 ug of gDNA. For each plate, a PCR master mix was made containing 75 uL of ExTaq DNA polymerase (Clontech), 1000 uL of 10x ExTaq buffer, 800 uL of dNTPs, 2075 uL water and 50 uL of the P5 primer mix. The P7 primer plates and the P5 stagger mix were previously made by Integrated DNA Technologies (IDT). 40 uL of PCR master mix was added per 96-well for a total reaction volume of 100 uL. The plates were then sealed and 100 pg of CRISPRa secondary library pDNA was spiked into four empty wells in the Kermit PCR plate. The PCR was then run with the following cycling conditions: 1 minute at 95°C; then 28 repeating cycles of 30 seconds at 94°C, 30 seconds at 52.5°C, 30 seconds at 72°C; followed by 10 minutes at 72°C.
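The well-count and master-mix arithmetic in this paragraph can be checked with a brief sketch. The constants come from the text (6.6 pg of gDNA per cell, a 10 ug maximum per well, the listed master-mix volumes, and 40 uL of mix per well); the function names are hypothetical, and the 3,000-construct library size in the example is an illustrative value only.

```python
import math

PG_GDNA_PER_CELL = 6.6   # assumed average stated in the text
MAX_UG_PER_WELL = 10.0   # maximum gDNA loaded per PCR well
MIX_PER_WELL_UL = 40     # master mix added to each well

# Master-mix components per plate, in uL, as listed in the text
MASTER_MIX_UL = {
    "ExTaq": 75,
    "10x ExTaq buffer": 1000,
    "dNTPs": 800,
    "water": 2075,
    "P5 primer mix": 50,
}


def gdna_required_ug(n_constructs, cells_per_construct=2000):
    """gDNA mass (ug) that represents every construct at the target coverage."""
    return n_constructs * cells_per_construct * PG_GDNA_PER_CELL / 1e6


def pcr_wells(n_constructs, cells_per_construct=2000):
    """PCR wells needed per sample when each well holds at most 10 ug."""
    return math.ceil(gdna_required_ug(n_constructs, cells_per_construct) / MAX_UG_PER_WELL)


total_mix_ul = sum(MASTER_MIX_UL.values())
reactions_covered = total_mix_ul // MIX_PER_WELL_UL

print(f"491 ORFs: {gdna_required_ug(491):.2f} ug gDNA, {pcr_wells(491)} well(s)")
print(f"hypothetical 3000-construct library: {pcr_wells(3000)} wells")
print(f"master mix: {total_mix_ul} uL, enough for {reactions_covered} reactions")
```

The listed components sum to 4000 uL, which at 40 uL per well covers a full 96-well plate with a small excess.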

Each well containing gDNA input was then run on a 1% agarose E-gel (Invitrogen) for 10 minutes (E-Base Electrophoresis Device, Invitrogen) with a 1kb DNA ladder (New England Biolabs) to confirm amplification of the product. 30 uL of each


PCR-amplified sample was then mixed together in a trough to create the pooled PCR product. To purify the pooled product prior to sequencing, 100 uL was added to 2-3 wells of a 96-well round bottom plate (Costar). 100 uL of the AMPure XP magnetic beads (Beckman Coulter) was added to each well and gently mixed, followed by a 5-minute incubation at room temperature. The plate was then placed on a magnet (Alpaqua) for 5 minutes to separate the beads from the solution. While still on the magnet, the supernatant was removed and replaced with 100 uL of 70% ethanol to wash the beads. The wash was repeated, and the plate was removed from the magnet and left to dry for 2-3 minutes. To elute the PCR product, 50 uL of TE buffer was added per well. The plate was then placed back on the magnet for 2 minutes, and the purified supernatant was collected. The purified pooled PCR product was then quantitated via Qubit, and 20 uL was submitted to the Broad Institute’s Genomics Platform for HiSeq200 Illumina sequencing.

V5 Immunoassay

The assay was initially optimized to obtain a maximum V5 expression-signal. MelJuso parental cells were seeded in a range of seeding densities chosen to reach 90-100% confluency after 72 and 96 hours. The cells were then transduced with a range of lentivirus volumes of the control blue-fluorescent protein (BFP) ORF. This matrix of conditions identified an ideal cell seeding density, high multiplicity-of-infection virus volume, and readout time-point at which to see the strongest immunofluorescent signal.


During this optimization assay, the BFP ORF was confirmed to have a strong V5 signal and was therefore considered the positive control.

The protocol was finalized as follows: 10,000 MelJuso parental cells per well were seeded across seven black-walled, clear-bottomed cell culture 96-well plates (Costar) in 65 uL of growth media per well. The plates were left at room temperature for 45 minutes to help the cells settle evenly across the wells. 25 uL of virus per well from the arrayed secondary ORF library plates was then stamped into the cell plates, followed by 10 uL per well of 10X polybrene in growth media. Empty wells in the ORF plates, which did not contain any virus, received 25 uL of growth media instead. The seventh cell plate was used for controls, with several empty and BFP wells. The plates were centrifuged for 30 minutes at 1200 g and 37°C and left at room temperature for an additional 45 minutes before incubating overnight.

The following day, the media was aspirated and replaced with 90 uL per well of fresh growth media. 24 hours after the media change, 10 uL per well of 10X puromycin in growth media was added to the plates. 48 hours after puromycin addition, the media was removed, and the cells were fixed at room temperature with 100 uL per well of 4% paraformaldehyde and 0.1% Triton X-100 in PBS for 30 minutes. The fixative was then removed, and the cells were washed three times with PBS. Next, the cells were blocked with 100 uL per well of blocking buffer (Li-Cor) for 1 hour at room temperature with shaking. The buffer was then replaced with 50 uL per well of primary mouse V5 antibody (1:5000, Invitrogen) with 0.1% Tween 20 in blocking buffer. The cells were incubated overnight at 4°C with shaking and then washed three times with 0.1% Tween 20 in water. The cells were then incubated in the dark with 50 uL per well of goat anti-mouse


secondary antibody (1:800, Li-Cor) in blocking buffer for 1 hour at room temperature with shaking. The cells were washed three times with 0.1% Tween 20 in water and once with PBS before drying. The plates were then scanned with the Li-Cor Odyssey and individual well fluorescence-intensities were quantitated with the ImageJ ReadPlate2.1 plugin.

CRISPRa sgRNA Cloning and Western Blots for Gene Activation

Chosen sgRNA sequences for the genes of interest, and a non-targeting sgRNA, were appended with overhangs such that they could be ligated into the pXPR_502 vector: 5’ CACCG to the forward oligo, 5’ AAAC to the reverse oligo, and 3’ C to the reverse oligo. Forward and reverse oligos for the sgRNAs were then synthesized by IDT in a 96-well plate format and resuspended in double distilled water (ddH2O) to a concentration of 40 uM. 1.5 uL of the forward and reverse oligos for each sgRNA, 5 uL of 10X NEB buffer 3.1, and 42 uL of ddH2O were combined in a PCR-compatible 96-well plate. The oligos were then annealed in a thermocycler with the following conditions: 5 minutes at 95°C; 5 minutes at 72°C; followed by 5-minute intervals lowering the temperature by 5°C until room temperature. 1 uL of the annealed oligo pair was then added to 20 ng of the pre-digested pXPR_502 vector, 2 uL of 10X ligase buffer (NEB), 13 uL of water, and 1 uL of T4 ligase (NEB). The ligation was then incubated at 16°C for 3-4 hours, after which 2 uL of the ligated product was added to 25 uL of competent DH5α cells. After 30 minutes on ice, the cell mixture was heat shocked for 45 seconds at 42°C, and then incubated on ice for another 2 minutes. The transformed cells were recovered with SOC media (Invitrogen)


for 1 hour at 37°C and then streaked onto agar dishes containing 100 ug/mL carbenicillin. The dishes were incubated upside down overnight at 37°C.
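The overhang rule used for the sgRNA oligos (5’ CACCG on the forward oligo, 5’ AAAC and 3’ C on the reverse oligo) can be illustrated with a short Python sketch. It assumes the reverse oligo carries the reverse complement of the guide, which is the usual convention for annealed-oligo cloning but is not stated explicitly in the text. The function names and the guide sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(guide):
    """Return (forward, reverse) oligos with the cloning overhangs appended."""
    guide = guide.upper()
    if not guide or set(guide) - set("ACGT"):
        raise ValueError("guide must contain only A, C, G, T")
    forward = "CACCG" + guide
    # Assumption: the reverse oligo is the reverse complement of the guide,
    # so its 3' C pairs with the G of the forward 5' CACCG overhang.
    reverse = "AAAC" + reverse_complement(guide) + "C"
    return forward, reverse


# Hypothetical 20-nt guide, for illustration only
forward, reverse = design_oligos("GATTACAGATTACAGATTAC")
print("forward 5'-" + forward + "-3'")
print("reverse 5'-" + reverse + "-3'")
```

After annealing, the CACC and AAAC ends remain single-stranded, which is what allows ligation into the digested vector.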

The following day, 2-4 colonies per sgRNA were collected and individually added to 2 mL of TB media with 100 ug/mL carbenicillin. The colonies were then grown at 37°C with shaking for 17 hours, after which 150 uL of glycerol was added to 850 uL of each culture and stored at -80°C. The sgRNA sequences of the colonies were then sequence-confirmed using the protocol described previously for gDNA. Briefly, 1 uL of glycerol stock per colony was added per well to a Kermit P7 barcoded primer plate, followed by a PCR master mix containing Ex-Taq, Ex-Taq buffer, dNTPs, P5 primer, and water. The plate was PCR-cycled and the products were pooled together, purified, and submitted for MiSeq50 Illumina sequencing. The colony with the highest percentage of sequencing reads matching the sgRNA was chosen for DNA preparation.

To isolate plasmid DNA from the glycerol stocks, first 1.2 mL per well of TB media containing 100 ug/mL carbenicillin was added to a deep 96-well sterile growth plate (Corning), followed by 5 uL of each glycerol stock. The plate was then sealed and shaken at 37°C for 17 hours. 40 uL of each culture was added to 80 uL of fresh TB media in a new growth plate and centrifuged at 1500 g for 8 minutes at 4°C. The media was then aspirated and 200 uL per well of resuspension buffer with RNaseA (50 mM Tris-HCl, 10 mM EDTA, 0.1 mg/mL RNaseA, pH 8.0) was added and vortexed. 210 uL of lysis buffer (200 mM NaOH, 1% SDS) was added and incubated for 4 minutes at room temperature, followed by 300 uL of neutralization/binding buffer (3.75 M guanidinium hydrochloride, 0.9 M KOAc, 1.4 M HOAc, pH 4.35). The plate was then centrifuged at 3000 g for 30 minutes at 4°C. Next, the lysate supernatant was added to a clarification


filter plate on top of a deep-well collection block and centrifuged at 3000 g for 5 minutes at 4°C. The filtered lysate was then transferred to the pDNA binding plate stacked on top of a new deep-well collection block. After a 2-minute centrifugation at 1800 g at 4°C, 120 uL of wash buffer (15 mM NaCl, 40 mM Tris-HCl, 25 mM Tris, pH 6.65) was added to 480 uL of absolute ethanol, and the pDNA was washed twice with 600 uL. The pDNA plate was then dried with a 5-minute centrifugation at 1800 g and 4°C and placed on top of a DNA storage plate. The pDNA was eluted with 140 uL of elution buffer (Tris-HCl, pH 8.0) for 10 minutes at room temperature and collected after a 5-minute centrifugation at 1800 g and 4°C. The pDNA was quantified via Nanodrop, and lentivirus was produced for each sgRNA using the 6-well protocol previously described.

MelJuso-dCas9-VP64 cells were transduced in 12-well plates, as previously described. The following day, 5x10e5 cells per well per sgRNA were seeded in a 6-well plate, as well as 2.5x10e5 MelJuso parental cells. 24 hours post-seeding, growth media supplemented with puromycin was added to the transduced cells, and 48 hours after puromycin addition, the wells were 100% confluent. Media was aspirated, and the cells were washed with cold PBS. The cells were then lysed with 200 uL per well cold 1% NP40 lysis buffer (150 mM NaCl, 50 mM Tris pH 7.5, 2 mM EDTA pH 8, 25 mM NaF, 1% NP40) containing phosphatase and protease inhibitors (Roche). The plates were left on ice for 10 minutes before collecting the cell lysate from each sgRNA separately. The samples were centrifuged for 10 minutes at 4°C, and the supernatant was collected into a new microcentrifuge tube.

To quantify the protein content of each lysate, the Thermo Pierce BCA Assay Kit was used with standards of 2, 1.5, 1 and 0.75 mg/mL BSA. A mix of 1 uL of standard or


lysate sample and 100 uL of ddH2O was added per well to a 96-well plate, as well as a blank sample containing only water. 100 uL of a 98%:2% solution of BCA Reagent A to Reagent B was added to each well, and the plate was incubated at 37°C for 25-30 minutes. The plate was scanned to measure absorbance at 560 nm, and the protein content of each lysate was calculated based on the BSA standard curve. The concentrations of the samples were then normalized in NP40 buffer and 6X loading dye (NEB) was added to a final concentration of 14.3%. The samples were boiled at 95°C for 2 minutes prior to loading into an SDS-PAGE gel (Invitrogen), along with the protein ladder (BioRad). The gel was run at 140 V for ~45 minutes with a 1X MES running buffer (50 mM MES, 50 mM Tris base, 0.1% SDS, 1 mM EDTA, pH 7.3). The gel was then added to an iBlot2 stack package (Invitrogen) and transferred to the membrane with the iBlot2 transfer system (Invitrogen) at 20 V for 7 minutes.
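The standard-curve step can be sketched as an ordinary least-squares line through the BSA standards, followed by interpolation of the lysate readings. The absorbance values below are invented for illustration, the linear model and the use of the water blank as a zero standard are assumptions, and the function names are hypothetical. The dye calculation simply solves for the volume of 6X dye that makes up 14.3% of the final mixture.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_mg_ml(absorbance, slope, intercept):
    """Invert the standard curve to estimate protein concentration."""
    return (absorbance - intercept) / slope


def dye_volume_ul(sample_ul, final_fraction=0.143):
    """Volume of loading dye so that dye / (dye + sample) equals final_fraction."""
    return final_fraction * sample_ul / (1.0 - final_fraction)


# BSA standards from the text (mg/mL) plus the water blank as 0 (assumption)
standards = [2.0, 1.5, 1.0, 0.75, 0.0]
# Hypothetical absorbance readings at 560 nm
readings = [1.10, 0.85, 0.60, 0.475, 0.10]

slope, intercept = fit_line(standards, readings)
print(f"lysate at A560 = 0.70: {concentration_mg_ml(0.70, slope, intercept):.2f} mg/mL")
print(f"dye for a 60 uL sample: {dye_volume_ul(60):.1f} uL")
```

A final dye concentration of 14.3% corresponds to about one volume of 6X dye per six volumes of sample.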

The samples for each gene of interest were then cut separately from the membrane and incubated in blocking buffer (Li-Cor) for 1 hour at room temperature with shaking. The blocking buffer was then replaced with a 1:1000 dilution of mouse or rabbit primary antibody for the gene of interest and a 1:3000 dilution of vinculin primary antibody from the opposite species (1% Tween 20 in blocking buffer, Cell Signaling Technology). After an overnight incubation at 4°C with shaking, the membranes were washed three times with 1X TBST buffer (10% Tris-buffered saline, 0.1% Tween 20) for 5 minutes at room temperature with shaking. The membranes were then incubated with a 1:2000 dilution of fluorescently-labeled mouse anti-rabbit and rabbit anti-mouse secondary antibodies (1% Tween 20 in blocking buffer, Li-Cor) for ~1 hour at room temperature with shaking. The 1X TBST washes were repeated, and the membranes were scanned with the Li-Cor Odyssey.


Chapter III.

Results

Secondary screens were successfully performed and revealed sets of validated genes that consistently score with only one overexpression technology. Underlying features of the ORF and CRISPRa constructs likely cause this divergence. The data collected from these screens and the analyses used are described in this section.

Comparing Gene Hit Lists from Primary ORF and CRISPRa Screens

To identify genetic modulators of resistance to MEK1/2 inhibition in melanoma, a near genome-wide pooled ORF overexpression screen was performed in MelJuso, an NRAS-mutant melanoma cell line that is sensitive to the MEK inhibitor Trametinib. Cells overexpressing cDNAs from the pooled ORFeome library were treated with Trametinib and cultivated for several weeks. A similar screen was conducted in MelJuso cells expressing dCas9-VP64 with the CRISPRa library Calabrese in the presence of a second-generation MEK inhibitor, Selumetinib. At the end of the drug treatment, the genomic DNA was harvested and the construct barcodes were amplified by PCR. The abundance of each ORF and sgRNA was then quantified by Illumina sequencing.

The raw sequencing data from the primary ORF Trametinib and CRISPRa

Selumetinib screens were initially analyzed at the individual construct level. The


sequencing reads for each ORF or sgRNA construct were first normalized by dividing the reads per construct by the total read count of each condition (i.e. each replicate of each treatment arm) and then multiplying by 1x10e6 to obtain a reads-per-million (RPM) value. The number one was added to each RPM, and then the values were log2-transformed. Typically in a drug screen, the DMSO control arm is used as the reference sample; therefore, for each construct, the DMSO log2-RPM value was subtracted from the drug-treated log2-RPM value to calculate the log2-fold change (LFC). The LFC values were averaged across the biological replicates of each primary screen, and the average LFCs were then used to calculate a gene-level p-value from the hypergeometric distribution (New Analysis for CRISPR-based Screening, 2019). This statistic is equivalent to a one-sided Fisher’s exact test; it collapses the LFCs of constructs for the same gene and yields an associated p-value based on their rank order. The primary screens were then analyzed at the gene level.
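The construct-level normalization described above (RPM, add one, log2, subtract DMSO, average replicates) can be written compactly. The following is a minimal sketch with hypothetical function names and made-up read counts; it does not implement the hypergeometric gene-level statistic.

```python
import math


def log2_rpm(counts):
    """log2(reads-per-million + 1) for each construct in one condition."""
    total = sum(counts.values())
    return {name: math.log2(n / total * 1e6 + 1) for name, n in counts.items()}


def log2_fold_change(drug_counts, dmso_counts):
    """Per-construct LFC: drug-treated log2-RPM minus DMSO log2-RPM."""
    drug, dmso = log2_rpm(drug_counts), log2_rpm(dmso_counts)
    return {name: drug[name] - dmso[name] for name in drug}


def average_lfc(replicates):
    """Mean LFC per construct across biological replicates."""
    names = replicates[0].keys()
    return {name: sum(rep[name] for rep in replicates) / len(replicates) for name in names}


# Made-up read counts for two constructs in two replicates
dmso_rep1 = {"construct_A": 500, "construct_B": 500}
drug_rep1 = {"construct_A": 750, "construct_B": 250}
dmso_rep2 = {"construct_A": 1000, "construct_B": 1000}
drug_rep2 = {"construct_A": 1500, "construct_B": 500}

lfc = average_lfc([
    log2_fold_change(drug_rep1, dmso_rep1),
    log2_fold_change(drug_rep2, dmso_rep2),
])
for name, value in sorted(lfc.items()):
    print(f"{name}: average LFC {value:+.2f}")
```

Normalizing to RPM before the log transform makes replicates with different sequencing depths comparable, and the added one avoids taking the log of zero for constructs with no reads.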

The ORFeome and Calabrese CRISPRa libraries contain constructs for approximately 12,860 and 18,000 unique genes, respectively. To compare the hits from these primary screens, the gene lists were first filtered to find the genes present in both libraries. The 11,960 overlapping genes were then ranked from highest to lowest average LFC value in each primary screen. Genes with the highest LFCs are considered resistance-conferring when overexpressed in the MelJuso cell line under treatment with a MEK inhibitor. The top 100 Trametinib-resistant genes from the ORF screen ranged in average LFC value from 9.18 to 0.965 and were considered ORF-hits. Similarly, the top 100 Selumetinib-resistant genes from the CRISPRa screen ranged in average LFC value from 2.51 to 0.740 and were considered CRISPRa-hits (Figure 4).


Genes ranking in the middle of both primary screens were considered non-scoring and had LFC values clustered around zero. Surprisingly, only 12 genes appeared in both the top 100 ORF-hits and the top 100 CRISPRa-hits; these were considered double-hits.
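The grouping of genes into double-hits, ORF-hits and CRISPRa-hits amounts to ranking the shared genes in each screen and intersecting the top lists. The sketch below uses hypothetical function names and toy gene names and LFC values that are not screen data.

```python
def top_hits(avg_lfc, shared_genes, n=100):
    """Return the n shared genes with the highest average LFC."""
    ranked = sorted(sorted(shared_genes), key=lambda gene: avg_lfc[gene], reverse=True)
    return set(ranked[:n])


def classify_hits(orf_lfc, crispra_lfc, n=100):
    """Split the top-n genes of each screen into double, ORF-only and CRISPRa-only hits."""
    shared = set(orf_lfc) & set(crispra_lfc)  # genes present in both libraries
    orf_hits = top_hits(orf_lfc, shared, n)
    crispra_hits = top_hits(crispra_lfc, shared, n)
    return {
        "double": orf_hits & crispra_hits,
        "orf_only": orf_hits - crispra_hits,
        "crispra_only": crispra_hits - orf_hits,
    }


# Toy, made-up average LFCs keyed by placeholder gene names
orf_lfc = {"GENE_A": 9.0, "GENE_B": 5.0, "GENE_C": 0.1, "GENE_D": -0.2, "GENE_E": 3.0}
crispra_lfc = {"GENE_A": 2.5, "GENE_B": 0.0, "GENE_C": 1.8, "GENE_D": 0.1, "GENE_F": 2.0}

groups = classify_hits(orf_lfc, crispra_lfc, n=2)
for label, genes in groups.items():
    print(label, sorted(genes))
```

With n=100 and 12 genes in the intersection, each screen is left with 88 technology-specific hits, which matches the group sizes reported in the Figure 4 caption.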

[Figure 4 image: scatter plot of average fold change (log2) in the ORFeome screen (x-axis) against average fold change (log2) in the Calabrese screen (y-axis). Legend: double-hit, ORF-hit, CRISPRa-hit, non-scoring. Labeled genes: RAF1, NFE2L2, ABCB1, EGFR, ICAM1, WWTR1, HNF4A, HRAS, HNF4G, ERBB2, RPS6KA6, HAND1.]

Figure 4. Comparing genome-wide primary screens shows minimal overlap.

Primary ORFeome and CRISPRa libraries were screened with Trametinib and Selumetinib MEK inhibitors, respectively, in MelJuso cells. Few genes confer drug-resistance in both screens (green, n=12). Others confer resistance in the ORF screen only (blue, n=88) or the CRISPRa screen only (red, n=88). Non-scoring genes (yellow, n=337) rank in the middle of both screens. Average log2-fold changes are calculated relative to the DMSO control arm and averaged across biological replicates and constructs for a given gene.

After grouping the genes into double-hits, ORF-hits, CRISPRa-hits, and non-scoring, the genes were reassessed at the individual construct level (Figure 5). Double-hit ORFs and sgRNAs scored as the highest ranking constructs in both primary screens. CRISPRa-scoring constructs showed little fold-change in the primary ORF screen, and similarly, ORF-scoring constructs showed little fold-change in the primary CRISPRa screen. RAF1, BRAF, and MAP2K6 were chosen as a representative double-hit, ORF-hit,


and CRISPRa-hit, respectively, and the average LFC values of their individual constructs from each primary screen were reviewed (Figure 6). All ORF and sgRNA constructs for RAF1 (double-hit) had positive LFC values in both primary screens. The wild-type and V600E-mutant BRAF (ORF-hit) constructs showed a resistance phenotype in the primary ORFeome screen, while the mutant construct with a large deletion had the opposite phenotype, as expected. The MAP2K6 (CRISPRa-hit) sgRNAs all showed positive LFCs, while the BRAF (ORF-hit) sgRNAs in the primary CRISPRa screen clustered around an LFC of zero.

While the rescue effect was fairly consistent with ORF constructs fully expressed in the cells, the enrichment of the sgRNAs was highly variable. For RAF1, the best-ranking sgRNA had an LFC of ~5, and the lowest-ranking sgRNA had an LFC around zero. For MAP2K6, the best-ranking sgRNA had an LFC around two, and three sgRNAs had no effect, with LFCs around zero. The variability of the sgRNA effect in the screen suggests that the sgRNA design might play an important role in the ability to activate the endogenous gene, and therefore in the drug-resistance phenotype.


[Figure 5 image: box plots of average fold change (log2) in the ORFeome and Calabrese screens, grouped by hit type: double-hit, ORF-hit, CRISPRa-hit, non-scoring.]

Figure 5. Distribution of primary screen fold-changes by hit type.

Boxes include all constructs for the double-hit genes (n=12, green), top-scoring ORF genes (n=88, blue), top-scoring CRISPRa genes (n=88, red), and non-scoring genes (n=337, yellow). ORF and CRISPRa constructs have greater log2-fold change values in their respective screens. Average log2-fold changes are calculated relative to the DMSO control arm and averaged across biological replicates. The boxes represent the 25th, 50th, and 75th percentiles; whiskers show 10th and 90th percentiles. Dashed line indicates a log2-fold change value of zero.

[Figure 6 plot. a) ORFeome, y-axis: Avg. fold change (log2). ORF constructs: RAF1_wt clone 1, RAF1_wt clone 2, RAF1_F648Y, BRAF_wt clone 1, BRAF_wt clone 2, BRAF_V600E, BRAF_Q201H; 204-766 del, MAP2K6_wt clone 1, MAP2K6_wt clone 2, MAP2K6_L335 ins. b) Calabrese, y-axis: Avg. fold-change (log2). sgRNAs for RAF1, BRAF, MAP2K6. Legend: Double-hit, ORF-hit, CRISPRa-hit.]

Figure 6. Primary screen fold-changes for constructs of representative genes.

RAF1 (green), BRAF (blue), and MAP2K6 (red) are representative double-hit, ORF-hit, and CRISPRa-hit genes, respectively. ORF and CRISPRa constructs have greater log2-fold change values in their respective screens. Average log2-fold changes are calculated relative to the DMSO control arm. a) Annotations of ORF constructs from the primary ORFeome library are noted: wild-type (wt), mutations (X###Y), insertions (ins) or deletions (del). Error bars indicate standard deviations across four biological replicates. b) Dots represent the average of two biological replicates for each of the six sgRNAs/gene included in Calabrese. Dashed lines indicate log2-fold changes of zero.

Designing the Secondary Libraries

To further elucidate why so few genes scored across both ORF and CRISPRa technologies, the strategy of screening with secondary libraries was used (Figure 7). Performing secondary screens with a more focused gene set increases the resolution of the data, as there are fewer non-scoring genes contributing noise. Screening with secondary libraries also allows for identification of false-positives and calculation of validation rates to identify true ‘ORF-only’ and ‘CRISPRa-only’ genes. Additionally, re-screening the MelJuso cell line enables comparison of the two different MEK inhibitors used in each of the primary screens. The gene list chosen for the secondary libraries included the top 88 scoring ORF-hits, the top 88 scoring CRISPRa-hits, the 12 double-hits, and the 337 non-scoring genes for a total of 525 genes.

[Figure 7 schematic. Primary screens: ORFeome (DMSO, Trametinib) and Calabrese (DMSO, Selumetinib); Top 100 + Top 100. Secondary screens: ORF (491 constructs, 1 ORF per gene) and CRISPRa (6,700 constructs, 12 sgRNAs per gene), each with DMSO, Trametinib, and Selumetinib arms. Analysis: calculate fold changes (Trametinib - DMSO, Selumetinib - DMSO); calculate z-scores; collapse to genes (ORF, 478 genes; CRISPRa, 561 genes); validate genes by drug (Trametinib, 83 genes; Selumetinib, 102 genes); validated gene list, 143 genes.]

Figure 7. Secondary screen design and analysis scheme.

The top 100 genes from the primary ORFeome and CRISPRa screens were compiled for the secondary libraries. Secondary screens are split into DMSO, Trametinib and Selumetinib arms for both ORF and CRISPRa. Log2-fold changes are calculated using the DMSO arm as the reference sample and converted into z-scores. Gene lists are validated by drug treatment and then consolidated.


Not every available construct in the Human ORFeome Collection has a perfect percent match to the human protein. Therefore, candidate ORFs for the 525 genes were initially filtered for an 80% protein match or higher. Many of the genes that fell below the 80% threshold were identified as non-scoring controls (n=37) and were removed from the secondary library. The remaining genes (n=13) were designated as ORF-hits and CRISPRa-hits and were not removed, in order to retain as much likeness to the secondary CRISPRa library as possible. In cases where there was more than one ORF construct per gene in the primary library, the first or second highest ranking construct was chosen to be included in the secondary library. An ORF for eGFP was also included as an additional non-scoring control.

One of the strongest Selumetinib-resistant genes in the primary CRISPRa screen was NRAS; however, it failed to score in the top 100 in the primary ORFeome screen. Considering the primary screens were performed in an NRAS-mutant melanoma cell line, it was unclear why NRAS would not score using a cDNA library. Further analysis of the DNA sequence of the NRAS construct included in the primary ORFeome library revealed a guanine insertion causing a frameshift. Therefore, only the first 31 amino acids of the protein are translated, resulting in an early truncation with a 16% protein match (Figure 8). To avoid re-screening with the truncated ORF, an alternative wild-type NRAS construct was included in the secondary library. Additionally, two mutant-NRAS constructs, NRAS Q61R and Q61L, were also included. These mutations are known drivers of drug resistance in RAS-mutant cancers in the clinic and were included to serve as positive controls (Hobbs, Der & Rossman, 2016). After finalizing the gene list and design parameters, the secondary ORF library was initially made in a 96-well arrayed format. DNA of the 491 ORFs was randomly distributed across six 96-well plates, leaving some empty wells for plate identification. Lentivirus was then made from the 96-well plates and homogeneously pooled to create the library.

[Figure 8 image. DNA sequence alignment of truncated NRAS and wild-type NRAS.]

Protein sequence

Truncated NRAS 1 MTEVQTGGGWSRWCWEKRTDNPANPEPLCR. 31

Wild-type NRAS 1 MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQE EYSAMRDQYMRTGEGFLCVFAINNSKSFADINLYREQIKRVKDSDDVPMVLVGNKCDLPT RTVDTKQAHELAKSYGIPFIETSAKTRQGVEDAFYTLVREIRQYRMKKLNSSDDGTQGCMG LPCVVM. 189

Figure 8. Alignment of the truncated ORF and wild-type NRAS sequences.

The canonical reference sequence is from the UniProt database, and the sequences were aligned using the DNASTAR Lasergene 15 package. The DNA sequence of the NRAS construct included in the ORFeome library reveals a guanine insertion near the N-terminus. The frameshift results in an early truncation of the NRAS protein.

The secondary CRISPRa library consisted of the original six sgRNAs included in Calabrese Set A and Set B, in addition to six newly designed sgRNAs, for the 525 genes. The new sgRNAs were designed with the same criteria used in designing the Calabrese sgRNAs: the target window was 150-75 nucleotides upstream from the transcriptional start site, the on-target potential was maximized, and the number of off-target sequence matches was restricted. However, for some genes, one or more of these criteria were loosened to reach a quota of six new sgRNAs. Also, for approximately 15% of the genes, fewer than six new sgRNAs could be designed due to a limited number of NGG PAM sequences in the protein-coding region. In addition to the gene-targeting sgRNAs, 419 non-targeting sgRNAs were also included, bringing the secondary library total size to 6,700 sgRNAs. These non-targeting sgRNAs have no sequence homology with any site in the human genome and therefore will not activate gene transcription. The sgRNAs were not available as individual constructs, so the secondary library was synthesized as a pool of oligos, from which lentiviral particles were generated.
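The target-window and PAM criteria described above can be illustrated with a minimal sketch. It is not the design tool used for Calabrese or for the secondary library: the function names are hypothetical, only the forward strand is scanned, and on-target and off-target scoring are omitted entirely.

```python
import re

def target_window(tss, strand, near=75, far=150):
    """Genomic interval lying `near` to `far` nt upstream of the TSS."""
    if strand == "+":
        return (tss - far, tss - near)
    # For a minus-strand gene, "upstream" means higher coordinates
    return (tss + near, tss + far)

def forward_protospacers(seq):
    """(offset, 20-mer) for each 20-mer immediately followed by an NGG PAM."""
    pattern = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")  # lookahead allows overlaps
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]

# Hypothetical TSS coordinate and a made-up toy sequence
window = target_window(tss=10_000, strand="+")
toy = "A" * 20 + "TGG" + "C" * 5
hits = forward_protospacers(toy)
print(window, hits)
```

A window that contains few NGG motifs yields few candidate protospacers, which is one way a gene can end up with fewer than six new sgRNAs.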

Assay Development for Secondary CRISPRa Screen

In order to perform the secondary CRISPRa screen, a stable cell line expressing dead-Cas9 (dCas9) fused to the VP64 activator domain needed to be generated. Thus, the MelJuso cells were transduced with the pXPR_109 lentiviral vector, which contains dCas9-VP64 under an EF1α promoter, followed by a blasticidin resistance cassette linked through a 2A sequence (Figure 9). The cells were transduced at an efficiency of 20-30% to favor a single integrant of dCas9-VP64 per cell. The cells were then grown in media containing blasticidin for two weeks to ensure proper selection of a dCas9-VP64-positive culture.

Figure 9. Schematic representation of the dCas9-VP64 vector.

The dCas9-VP64 enzyme is expressed under the EF1α promoter, and transduced cells are selected for with the blasticidin resistance cassette.


To determine the gene upregulation activity of the engineered MelJuso cells, the dCas9-VP64 enzyme was then tested using activating sgRNAs targeting CD4 and CD45 in the CRISPRa vector (sgCD4 and sgCD45). These two cell-surface signaling proteins are not highly expressed in melanoma cell types, but amplification of their expression levels can be easily detected. The sgCD4 and sgCD45-transduced cells were stained using fluorophore-labeled CD4 and CD45 antibodies, and fluorescence readings were recorded via flow cytometry (Figure 10). For each cell-surface marker, approximately 20% of the transduced cells showed increased expression when compared to the non-transduced cells. This level of activation was not particularly high, but the amount of activation needed to obtain significant CRISPRa screening results is unknown. Therefore, the level of dCas9-VP64 activity was deemed acceptable to continue forward with the screen.

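[Figure 10 flow cytometry plots. Gate percentages as shown in each panel:]

a. Non-transduced: CD4 stained 100.0% / 0.0%; CD45 stained 99.9% / 0.1%

b. sgCD4: CD4 stained 77.7% / 22.3%; CD45 stained 99.0% / 1.0%

c. sgCD45: CD4 stained 99.8% / 0.2%; CD45 stained 82.7% / 17.3%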

Figure 10. Gene expression is activated with CRISPRa in MelJuso.

Cells were a) not transduced, b) transduced with an activating sgRNA for CD4, or c) transduced with an activating sgRNA for CD45. Cells were stained with fluorescently-tagged antibodies against CD4 and CD45 and evaluated with flow cytometry. Gates were drawn using the not-transduced, stained cell population. CD4 and CD45 expression is activated in ~20% of cells, denoted by a right-shift in the cell population.

Prior to screening with the secondary libraries, the transduction efficiency of each pooled lentiviral library was tested in the MelJuso cells. For downstream analysis of pooled screens, it is imperative that the multiplicity-of-infection (MOI) is less than one (i.e. a transduction efficiency less than or equal to 30%) to obtain a single perturbation per cell. The cells were therefore titrated with varying volumes of each library lentivirus, selected with puromycin and then assessed for cell viability. With the secondary ORF library virus, parental MelJuso cells plateaued at a transduction efficiency of 15%. The secondary CRISPRa library virus yielded higher transduction efficiencies with MelJuso-dCas9-VP64 cells, plateauing around 30% (Figure 11). The discrepancy in the transduction efficiencies is attributable to the difference in the viral titer of the two libraries. Taking into account the efficiencies from the titrations, the secondary library transductions were then performed with enough cells to ensure a representation of at least 500 cells per construct per replicate.
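The cell numbers implied by this representation target can be estimated with a short calculation. The sketch below assumes that the fraction of cells surviving selection equals the transduction efficiency, and it uses a Poisson model to relate efficiency to MOI. The function names are illustrative and are not taken from the screening protocol.

```python
import math

def moi_from_efficiency(efficiency):
    """Poisson estimate of MOI from the fraction of transduced cells."""
    # P(at least one integrant) = 1 - exp(-MOI)
    return -math.log(1.0 - efficiency)

def cells_to_transduce(n_constructs, cells_per_construct, efficiency):
    """Cells needed at transduction so that enough survive selection."""
    return math.ceil(n_constructs * cells_per_construct / efficiency)

# Library sizes and plateau efficiencies reported for the secondary screens
orf_cells = cells_to_transduce(491, 500, 0.15)
crispra_cells = cells_to_transduce(6700, 500, 0.30)
moi_at_30 = moi_from_efficiency(0.30)
print(orf_cells, crispra_cells, round(moi_at_30, 2))
```

Under these assumptions, roughly 1.6 million cells per replicate would be needed for the ORF library and roughly 11.2 million for the CRISPRa library, and a 30% efficiency corresponds to an MOI of about 0.36.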

[Figure 11 plot. X-axis: Virus Volume (uL), 0 to 500. Y-axis: Transduction Efficiency (% viable cells), 0 to 50. Series: ORF Secondary, CRISPRa Secondary. Dashed lines at 15% and 30%.]

Figure 11. Titration results of the secondary ORF and CRISPRa lentiviral libraries.

Cells were spin-transduced in 12-well plates with varying volumes of virus and counted for cell viability post-selection. Secondary ORF and CRISPRa library viruses yielded 15% and 30% transduction-efficiencies with 500 and 390 uL, respectively (dashed lines).

Post-Screen Assessment

For each of the secondary screens, the conditions mimicked those used in the primary screens as much as possible: puromycin selection occurred at the same time post-transduction, the cells were seeded at the same initial density, and the same concentrations of the MEK inhibitors were used (Figure 12). This consistency in the design allows for an accurate calculation of validation rates. There were only two variations between the primary and secondary screens, one of them being the number of cells per construct re-seeded during each passage. The primary ORF and CRISPRa screens were passaged at 1,000 and 500 cells per construct per replicate, respectively (Hayes et al., 2019; Sanson et al., 2019), while the secondary screens were passaged at 2,000. The condensed sizes of the secondary libraries allowed for more cells per construct to be maintained in culture, which in turn increased the statistical power of the screening data. The second difference in screening design was that each secondary screen was split into a vehicle control arm (DMSO), as well as two drug-treatment arms, Trametinib and Selumetinib. A confounding factor when comparing gene hits from the primary ORFeome and CRISPRa screens was that each screen used a unique MEK inhibitor. Therefore, screening both libraries with both small molecules allows for a true cross-comparison of primary versus secondary and ORF versus CRISPRa.

[Figure 12 schematic. Vector elements shown: pXPR_109 (EF1a, dCas9-VP64, 2A-BlastR); pXPR_502 (U6, sgRNA, tracr14, PGK, pp7-p65-HSF, 2A-PuroR). Timeline: Day -14, transduce with dCas9-VP64 (pXPR_109), blasticidin; Day 0, transduce with secondary ORF library (pLX_317) or secondary CRISPRa library (pXPR_502), puromycin; Day 7, split into drug arms (DMSO, Trametinib, Selumetinib), collect ETP; Day 21, pellet screen; genomic DNA, PCR, Illumina seq.]

Figure 12. Schematic representation of secondary ORF and CRISPRa screens.

For the CRISPRa screen, cells were transduced with dCas9-VP64 and selected on blasticidin for two weeks prior to screening. After transduction with the secondary libraries, cells were selected with puromycin for seven days before being split into drug arms. Cells were in drug treatment for two weeks and then pelleted for sequencing.

The growth of the cells was monitored during the course of the secondary screens (Figure 13). Every 2-4 days the cells were trypsinized, collected and counted, and then replated at the appropriate density with and without drugs. The Trametinib and Selumetinib-treated cells exhibited a slower growth rate than the DMSO control cells, indicating that the selective pressure necessary for a drug-resistance screen was applied. There was a decreased growth rate in the two drug-treatment arms in the secondary CRISPRa screen as compared to the secondary ORF. The slower growth could be attributed to the MelJuso cells being previously selected for dCas9-VP64 expression with the antibiotic blasticidin. Another plausible explanation is the design itself of the secondary libraries. The secondary CRISPRa library included six new sgRNAs per gene of unknown effect, and the slower growth in the secondary CRISPRa screen suggests that the new sgRNAs did not confer any growth advantage to the cells.

[Figure 13 plot. X-axis: Days in Drug, 0 to 14. Y-axis: Cumulative Population Doublings, 0 to 14. Series: DMSO_ORF, Trametinib_ORF, Selumetinib_ORF, DMSO_CRISPRa, Trametinib_CRISPRa, Selumetinib_CRISPRa.]

Figure 13. Cumulative population doublings of MelJuso during secondary screens.

Total cell yields were recorded at each passage to calculate cumulative doublings over 14 days in culture. Cell growth in Trametinib (purple) and Selumetinib (orange) treatment was inhibited compared to DMSO control cells (gray). Secondary ORF (solid lines) and CRISPRa (dashed lines) cells had varying growth rates compared to one another. Results are representative of the three biological replicates per screen.

After two weeks of culture in drug treatment, the cells were pelleted and the genomic DNA (gDNA) was isolated using column extraction kits. gDNA was also extracted from the secondary ORF screen early time point pellets that were collected prior to initiating drug treatment. The gDNA was isolated using three different size columns, Mini, Midi, and Maxi, depending on the number of cells pelleted. The performance of the different sized columns is known to vary; therefore, after extraction the amount of gDNA recovered was calculated to make sure there was minimal sample loss on the columns. With an assumed amount of 6.6 pg of gDNA per cell, the average percent recovery for the Mini, Midi, and Maxi column kits, respectively, was 40% ± 19% (n=9), 55% ± 11% (n=6), and 85% ± 11% (n=6). Sample recoveries less than 100% were acceptable because the cells were pelleted two to ten-fold in excess to ensure gDNA from a minimum of 2,000 cells per construct per replicate was successfully isolated.
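The percent-recovery calculation can be written out as a short sketch. The 6.6 pg per cell figure is the assumption stated above. The pellet size and recovered mass in the example are hypothetical values chosen for illustration, not measurements from the screens.

```python
PG_PER_CELL = 6.6  # assumed gDNA mass per cell, as stated in the text

def expected_gdna_ug(n_cells):
    """Expected gDNA yield in micrograms for a pellet of n_cells."""
    return n_cells * PG_PER_CELL / 1e6  # 1 ug = 1e6 pg

def percent_recovery(recovered_ug, n_cells):
    """Recovered gDNA as a percentage of the expected yield."""
    return 100.0 * recovered_ug / expected_gdna_ug(n_cells)

# Hypothetical pellet: 2e7 cells yielding 112.2 ug after column extraction
expected = expected_gdna_ug(2e7)
recovery = percent_recovery(112.2, 2e7)
print(round(expected, 1), round(recovery, 1))
```

Because the pellets contained two to ten-fold more cells than required, a recovery well below 100% can still leave gDNA from the minimum number of cells per construct.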

gDNA from every sample was then divided across multiple wells of two 96-well PCR plates, one plate specifically for each screen. Each of the wells in the PCR plates was pre-loaded with a uniquely barcoded P7 primer for sample identification during data deconvolution. Additionally, a P5 primer containing sequences of staggered length necessary for Illumina sequencing was included in the PCR master mix. Plasmid DNA (pDNA) was also added to the CRISPRa PCR plate as the reference sample for the library and to act as a PCR positive-control. After executing the PCR, amplified product was run on an agarose gel to determine if the reaction was successful. The gels were imaged and analyzed by eye for band intensity and for the expected band sizes of 280 and 350 bp for ORF and CRISPRa, respectively. The PCR was successful in every well containing gDNA input, and, as expected, there was no amplification in the no template control (NTC) wells (Figure 14). All of the amplified PCR products were then pooled together in equal proportions, purified using magnetic beads, and submitted for HiSeq200 Illumina sequencing.

[Figure 14 gel images. ORF secondary lanes: DMSO, Trametinib, Selumetinib, Early Time Point, NTC, Ladder. CRISPRa secondary lanes: DMSO, Selumetinib, Trametinib, NTC, pDNA, Ladder. Ladder bands marked at 100, 200, 300, and 400 bp.]

Figure 14. gDNA samples were successfully amplified by PCR.

Specific primers for the ORF and CRISPRa vectors were used, and the amplified products were run on a gel and imaged. All samples amplified with the correct band size (280 bp for ORF, 350 bp for CRISPRa) and with the same band intensity. The ladder used was a 1 kb DNA ladder.

Quality Control Analyses of the Secondary Screening Data

The sequencing results were deconvoluted using a software tool called PoolQ (PoolQ, 2019), which is able to map raw sequencing reads to a specific construct barcode within a specific input sample. First, the software filters the raw reads for those that contain specific 5’ header sequences: GACGA to positively identify an ORF barcode and CACCG to positively identify an sgRNA. The 20-24 nucleotides that follow the header sequence are the specific barcode of the construct. In the case of CRISPRa technology, the 20-nucleotide barcode is the sequence of the sgRNA itself. Next, the barcodes found in the raw sequencing data are mapped to a reference file, which contains all possible barcodes present in the secondary ORF and CRISPRa libraries. Finally, the sequencing reads are matched to a specific gDNA sample on the basis of the uniquely barcoded P7 primer included in each well of the 96-well PCR plate.
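The header-then-barcode logic described above can be sketched in a few lines. This is a simplified illustration and not PoolQ's actual implementation: the function names are hypothetical, the reads and sgRNA sequences are made up, sample demultiplexing by the P7 barcode is omitted, and only exact matches are counted.

```python
from collections import Counter

def extract_barcode(read, header, barcode_len):
    """Return the barcode that follows the first occurrence of header, or None."""
    start = read.find(header)
    if start == -1:
        return None
    begin = start + len(header)
    barcode = read[begin:begin + barcode_len]
    return barcode if len(barcode) == barcode_len else None

def count_constructs(reads, header, barcode_len, reference):
    """Count reads whose barcode exactly matches a construct in the reference."""
    counts = Counter()
    for read in reads:
        barcode = extract_barcode(read, header, barcode_len)
        if barcode in reference:
            counts[reference[barcode]] += 1
    return counts

# Toy data: made-up 20-nt sgRNA sequences, not real library members
reference = {"ACGTACGTACGTACGTACGT": "sgRNA_1", "TTTTGGGGCCCCAAAATTTT": "sgRNA_2"}
reads = [
    "NNCACCGACGTACGTACGTACGTACGTGTTT",
    "CACCGTTTTGGGGCCCCAAAATTTTGTTT",
    "NCACCGACGTACGTACGTACGTACGTGT",
    "GGGGGGGGGGGGGGGGGGGGGGGGGGGG",  # no header: discarded
]
counts = count_constructs(reads, "CACCG", 20, reference)
print(counts)
```

The fraction of reads that end up in `counts` corresponds to the percent of matching reads used as a quality metric in the next section.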

To check the quality of the sequencing results, the total read counts of each sequencing lane and the percent of matching reads were reviewed. Every sequencing lane in a HiSeq200 Illumina flow cell should produce 120 to 180 million reads, and 60-90% of the reads should match to barcodes within the secondary ORF and CRISPRa libraries. The total numbers of reads for the secondary ORF and CRISPRa sequencing lanes were 230 million and 182 million, respectively, indicating that the lanes were possibly overloaded with sample but still of high quality. The percent matching reads for the secondary ORF and CRISPRa sequencing lanes were 83% and 75%, respectively, again indicating high quality sequencing data. Additionally, the percent of matching reads in the NTC wells was sufficiently low, as expected.

To ensure there was enough sequencing coverage, the reads per construct from PCR wells with the same gDNA input were summed together. The expected result is one read per cell (Piccioni, Younger & Root, 2018) such that the total number of reads recovered can be compared to the total number of cells loaded into the PCR. The secondary ORF screen recovered many more reads than the minimum required for all experimental conditions, including the early time point (ETP) reference sample (Figure 15). The secondary CRISPRa screen recovered fewer reads than anticipated for all samples and for the pDNA. However, enough reads were recovered to maintain 800 cells per sgRNA per replicate, which is more than the recommended minimum of 500 (Piccioni, Younger & Root, 2018). Therefore, the sequencing coverage from both secondary screens was acceptable to continue on with data analysis.
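The coverage check reduces to simple arithmetic under the one-read-per-cell assumption. In the sketch below, the library size and the 500-cell minimum come from the text, while the total read count is a hypothetical number chosen so that the average works out to 800 reads per sgRNA.

```python
def reads_required(n_constructs, cells_per_construct):
    """Minimum reads needed per sample, assuming one read per cell."""
    return n_constructs * cells_per_construct

def coverage(total_reads, n_constructs):
    """Average reads (taken as cells) recovered per construct."""
    return total_reads / n_constructs

minimum_crispra = reads_required(6700, 500)
example_cov = coverage(5_360_000, 6700)  # hypothetical read total
print(minimum_crispra, example_cov)
```

A sample passes the check when its summed matching reads exceed the minimum, here 3.35 million reads for the 6,700-sgRNA library.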

[Figure 15 plot. X-axis samples: NTC, ETP, pDNA, DMSO, Trametinib, Selumetinib. Y-axis: Avg. raw read count, 0 to 5.0×10^7. Series: ORF Secondary, CRISPRa Secondary. Dashed line annotations: 11,000 cells / ORF; 800 cells / sgRNA.]

Figure 15. All samples recovered sufficient sequencing reads.

Secondary ORF (blue) and CRISPRa (red) screening samples were sequenced with HiSeq200 Illumina sequencing, and the number of recovered reads per condition was summed (NTC – no template control; pDNA – plasmid DNA; ETP – early time point). Minimal reads were recovered from the no-template controls, and each ORF and CRISPRa sample recovered a minimum of 11,000 and 800 reads (i.e. cells) per construct, respectively (dashed line). Where applicable, results are representative of the three biological replicates per condition with error bars indicating standard deviations.

The raw sequencing reads were converted into reads-per-million (RPM) and log2-transformed, as previously described. Briefly, the summed reads per construct were divided by the total read count of each experimental condition and multiplied by 1×10^6. One was then added to each RPM value, in the event that the value was zero, and the values were log2-transformed. Next, the distributions of the log2-RPM in the ORF ETP and CRISPRa pDNA reference samples were reviewed since proper representation of each construct is critical for accurate interpretation of the data. Several ORF constructs from the ETP sample (n=13) had log2-RPM values of zero, indicating they were not represented at the beginning of the screen (Figure 16). It is possible that the wells containing these ORFs were not pooled properly when making the secondary library lentiviral stock, or simply that these cDNAs were not transduced into the MelJuso cells. To avoid skewing the results, these underrepresented ORFs were removed from all subsequent data analyses. The log2-RPM of the CRISPRa pDNA were normally distributed with all values greater than zero (data not shown), and therefore no data were excluded.
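The normalization described above can be sketched as follows. The construct names and read counts are hypothetical, and the function names are illustrative rather than taken from the analysis pipeline.

```python
import math

def log2_rpm(counts):
    """Convert raw read counts for one sample into log2(RPM + 1) values."""
    total = sum(counts.values())
    return {name: math.log2(n / total * 1e6 + 1) for name, n in counts.items()}

def unrepresented(log2_values):
    """Constructs whose log2(RPM + 1) is zero, i.e. no reads in the sample."""
    return sorted(name for name, v in log2_values.items() if v == 0.0)

# Toy early-time-point counts (hypothetical construct names and numbers)
etp_counts = {"ORF_A": 500, "ORF_B": 1500, "ORF_C": 0}
etp_log2 = log2_rpm(etp_counts)
missing = unrepresented(etp_log2)
print(missing)  # ['ORF_C']
```

Adding one before the log transform maps a construct with zero reads to a log2-RPM of exactly zero, which is how the unrepresented ORFs appear in the reference sample.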

[Figure 16 histogram. X-axis: Avg. reads-per-million (log2), 0 to 14. Y-axis: Number of ORFs, 0 to 125.]

Figure 16. Not all ORFs are successfully represented in the screen.

Raw sequencing reads were converted to log2-reads per million and plotted to identify frequency distribution. Several ORF constructs (n=13) have log2-reads per million of zero in the early time point sample. Dashed line indicates fitted normal distribution. Results are representative of the three biological replicates for the early time point.

The data were then converted into log2-fold changes (LFC) by subtracting the log2-RPM of the reference sample from the log2-RPM of the experimental samples (two weeks in DMSO, Trametinib and Selumetinib treatment). For this initial assessment of the secondary ORF screening data, the reference sample used was the ETP isolated on the day drug treatment began. As previously mentioned, the transduction efficiencies of certain ORFs in specific cell lines and screening contexts can vary. Therefore, it is imperative to use transduced cells as the baseline to ensure an accurate analysis of construct enrichment. For the secondary CRISPRa screening data, the initial reference sample was the pDNA of the library. sgRNAs are an average of 100 base-pairs in length and package very efficiently into lentiviral particles. Therefore, the log2-RPM of the pDNA are an accurate representation of the sgRNAs present in the cells post-transduction.
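As a minimal sketch of the subtraction described above, the example below computes LFCs from log2-RPM values. The construct names and values are hypothetical, and restricting the result to constructs present in the reference is an assumption made for illustration.

```python
def log_fold_change(experimental, reference):
    """LFC per construct: log2-RPM(experimental) - log2-RPM(reference)."""
    return {c: experimental[c] - reference[c] for c in experimental if c in reference}

# Hypothetical log2-RPM values for an early time point and a drug arm
etp = {"ORF_A": 8.0, "ORF_B": 9.5}
trametinib = {"ORF_A": 11.0, "ORF_B": 9.0, "ORF_C": 7.0}
lfc = log_fold_change(trametinib, etp)
print(lfc)  # {'ORF_A': 3.0, 'ORF_B': -0.5}
```

A positive LFC indicates enrichment relative to the reference sample, and a negative LFC indicates depletion.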

The LFC values of the three biological replicates were then pairwise correlated as an additional means of assessing the quality of the screens. Replicate correlations ranged from R=0.93 to 0.95 and R=0.62 to 0.80 for the secondary ORF and CRISPRa screens, respectively (Figure 17). Despite the lower replicate correlations with CRISPRa, possibly due to the lower representation after sequencing, the screens were ultimately technical successes. Therefore, all subsequent analyses use the average of the three biological replicates. As a final quality control metric, the LFCs of the non-targeting sgRNAs (n=419) were compared to the LFCs of the gene-targeting sgRNAs (n=6,281) in the secondary CRISPRa screens. As expected, the non-targeting LFCs clustered more tightly around zero (Figure 18).
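The pairwise replicate comparison can be sketched with a plain Pearson correlation. The LFC values below are made up for illustration, and it is an assumption here that the reported R values are Pearson coefficients from the linear fits.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-construct LFCs for three biological replicates
replicates = {
    "A": [2.0, 0.1, -1.0, 3.5],
    "B": [1.8, 0.0, -1.2, 3.9],
    "C": [2.2, -0.3, -0.8, 3.1],
}
pairwise = {
    (p, q): pearson_r(replicates[p], replicates[q])
    for p, q in combinations(replicates, 2)
}
for pair, r in pairwise.items():
    print(pair, round(r, 2))
```

Three replicates give three pairwise comparisons (A-B, A-C, B-C) per treatment arm.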

[Figure 17 scatter plot matrices. Pairwise coefficients of correlation between biological replicates A, B, and C:]

ORF Secondary, Trametinib: A-B R = 0.95; A-C R = 0.95; B-C R = 0.94

ORF Secondary, Selumetinib: A-B R = 0.95; A-C R = 0.93; B-C R = 0.93

CRISPRa Secondary, Trametinib: A-B R = 0.80; A-C R = 0.62; B-C R = 0.62

CRISPRa Secondary, Selumetinib: A-B R = 0.79; A-C R = 0.62; B-C R = 0.64

Figure 17. Replicates of secondary screens show strong pairwise correlations.

Log2-fold change values for Trametinib and Selumetinib treatment arms are calculated using the early time point and plasmid DNA reference samples for the secondary ORF and CRISPRa screens, respectively. The three biological replicates are noted as A, B, and C. All axes range from -10 to 10 log2-fold change with fitted linear regressions and coefficients of correlation in red.

[Figure 18 box plot. Groups: Non-targeting sgRNAs, Gene-targeting sgRNAs. Y-axis: Average fold change (log2), -6 to 6. Legend: DMSO, Trametinib, Selumetinib.]

Figure 18. Non-targeting control sgRNAs show minimal fold-change.

sgRNAs from the secondary CRISPRa library were separated into non-targeting (n=419) and gene-targeting (n=6,281) and by drug treatment arm – DMSO (gray), Trametinib (purple), and Selumetinib (orange). Non-targeting sgRNAs are less depleted and enriched compared to gene-targeting sgRNAs. Average log2-fold changes are calculated relative to the plasmid DNA and averaged across biological replicates. The boxes represent the 25th, 50th, and 75th percentiles; whiskers show 10th and 90th percentiles. Dashed line indicates a log2-fold change value of zero.

Analysis of Secondary Screening Data

After verifying the technical quality of the screens, the LFC values were recalculated using the DMSO control arm as the reference sample to mirror the analysis used for the primary screens. Then, the LFC values of all sgRNAs per gene were averaged to assess the CRISPRa data at the gene level. The LFCs of the ORF constructs could not be collapsed, as there was only one ORF per gene present in the secondary library. Next, to assign a p-value to each gene, the hypergeometric distribution was calculated, as previously described. For the secondary CRISPRa screens, the 419 non-targeting sgRNAs were randomly distributed into groups of 11-12, and their LFCs were averaged, so that p-values could be assigned to ‘dummy’ negative control genes. The average p-value was then plotted against the average LFC value for every gene to generate volcano plots (Figure 19).
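The grouping of non-targeting sgRNAs into ‘dummy’ genes can be sketched as below. The choice of 35 groups is an assumption that happens to yield groups of 11 or 12 from 419 guides, the LFC values are randomly generated placeholders, and the hypergeometric p-value calculation itself is not reproduced here.

```python
import random

def make_dummy_genes(guide_lfcs, n_groups, seed=0):
    """Shuffle non-targeting guides and split them into near-equal groups."""
    guides = list(guide_lfcs.items())
    random.Random(seed).shuffle(guides)
    base, extra = divmod(len(guides), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)  # first `extra` groups get one more
        groups.append(guides[start:start + size])
        start += size
    return groups

# 419 placeholder LFCs drawn from a narrow normal distribution (made-up values)
rng = random.Random(1)
non_targeting = {f"NT_{i}": rng.gauss(0.0, 0.3) for i in range(419)}
groups = make_dummy_genes(non_targeting, n_groups=35)
dummy_lfc = [sum(v for _, v in g) / len(g) for g in groups]
sizes = sorted({len(g) for g in groups})
print(sizes)  # [11, 12]
```

Each dummy gene then has the same number of constructs as a real gene in the library, so it can be scored with the same gene-level statistic.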

One of the most significantly enriched genes (positive LFC) across both screening technologies and drug treatments was RAF1, the top double-hit from the primary screens. While not the focus of this study, BDKRB2, a G-protein coupled receptor never before associated with melanoma, was one of the top depleted sensitizer genes (negative LFC) in all four experimental conditions as well. It can also be noted that the secondary CRISPRa library has the benefit of including true negative controls in the form of non-targeting sgRNAs. These sequences do not target anywhere in the genome and score with an average LFC near zero. For the secondary ORF screens, the p-values are less significant than those of the secondary CRISPRa screens due to there only being one construct per gene in the library.

[Figure 19 volcano plots. Columns: ORF Secondary, CRISPRa Secondary. Rows: Trametinib, Selumetinib. X-axis: Avg. fold change (log2). Y-axis: Avg. p-value (-log10). Labeled genes include RAF1, KRAS, ABCB1, ABCG2, WWTR1, HNF4G, IFNG, BDKRB2, SSBP4, MC3R, ABL1, OTX1, and LPAR3.]

Figure 19. Volcano plots of secondary screens.

Average p-values are calculated by the hypergeometric distribution (one-sided Fisher’s exact test) using the rank orders of all constructs for a gene. Average log2-fold changes are calculated relative to the DMSO control arm and averaged across biological replicates and constructs for a given gene. Similar genes are enriched (green) and depleted (red) across all four secondary screens.

As mentioned previously, the secondary screens were divided into Trametinib and Selumetinib treatment arms to determine if gene hits were unique to each MEK inhibitor. The average LFC values were compared across the two small molecules, and linear regressions yielded correlations of 0.94 and 0.89 in the secondary ORF and CRISPRa screens, respectively (Figure 20). Given these strong correlations, it is unlikely that the lack of concordance in the primary screening data was due to the different drug treatments. The one gene that scored distinctly higher in Trametinib in the secondary CRISPRa screen was ABCB1, a drug efflux pump known to be a driver of multidrug-resistance. An alternative drug efflux pump, ABCG2, did not score in Trametinib, possibly due to its different molecular structure. The two drug transporters likely competitively exported Selumetinib, whereas ABCB1 exclusively expelled Trametinib, giving rise to a LFC value greater than three. It can also be noted that overexpression of ABCG2 did not drive resistance to Trametinib in the secondary ORF screen either. Additionally, the two NRAS-mutants, Q61L and Q61R, known to drive resistance in RAS-mutant cancers in the clinic, and included in the secondary ORF library as positive controls, scored significantly in both MEK inhibitors.

Figure 20. Trametinib and Selumetinib treatments are highly correlated across secondary screens.

Each dot represents the average log2-fold value for all constructs for a given gene relative to the DMSO control arm. Non-targeting sgRNAs and the eGFP negative control ORF are colored in gray. Dashed lines indicate log2-fold change values of zero, and drug treatment coefficients of correlation are noted.


To compare the two screening technologies, the LFC values were converted into z-scores, as the data from each secondary screen had a different signal strength. First, the z-scores were calculated at the individual construct level, and then they were collapsed and averaged per gene – for CRISPRa this resulted in the average of 11-12 sgRNAs. Genes were then grouped by their designated hit type from the primary screen – ORF-hit, CRISPRa-hit, or double-hit – and compared (Figure 21). The top 20 scoring ORF-hits all yielded average z-scores greater than 1.3 in the secondary ORF screen, but their z-scores clustered around zero in the secondary CRISPRa screen. Similarly, the top 20 CRISPRa-hits scored more strongly in the secondary CRISPRa screen than in the secondary ORF screen. Almost all of the double-hits scored in both secondary screens with positive z-scores, but the magnitudes of the z-scores were higher in the secondary ORF screen than in the secondary CRISPRa screen. Surprisingly, the wild-type ORF construct for NRAS included in the secondary library did not score as a double-hit, and the gene itself was only a moderate hit in CRISPRa (average z-score of 1.31). The z-score for HRAS was not calculated for the secondary ORF screen because the ORF construct had zero sequencing reads in the ETP sample.
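The two-step z-score procedure described above (construct level first, then averaged per gene) can be sketched as follows. The sketch assumes each construct is standardized against the mean and standard deviation of all constructs in the same screen arm; the reference population is not stated here, so that choice, like the function name and the example values, is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev

def gene_z_scores(construct_lfc):
    """construct_lfc: list of (gene, construct_id, lfc) tuples for one screen arm.

    Returns {gene: average z-score across that gene's constructs}.
    """
    values = [lfc for _, _, lfc in construct_lfc]
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD over all constructs in the arm
    if sigma == 0:
        raise ValueError("all LFC values are identical; z-scores are undefined")
    per_gene = defaultdict(list)
    for gene, _, lfc in construct_lfc:
        per_gene[gene].append((lfc - mu) / sigma)  # construct-level z-score
    # Collapse to one value per gene by averaging the construct-level z-scores.
    return {gene: mean(zs) for gene, zs in per_gene.items()}

# Hypothetical sgRNA-level LFC values (illustration only).
example = [
    ("GENE_A", "sg1", 2.0), ("GENE_A", "sg2", 4.0),
    ("GENE_B", "sg1", 0.0), ("GENE_B", "sg2", -2.0),
]
z_by_gene = gene_z_scores(example)
print(z_by_gene)
```

For the ORF library, which carries one construct per gene, the per-gene average reduces to the single construct's z-score.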

[Figure 21 heat map. Columns: ORF secondary and CRISPRa secondary screens, each with Trametinib and Selumetinib arms. Rows: genes grouped as ORF-hit, CRISPRa-hit, or double-hit. Color scale: average z-score, 0 to 5.]

Figure 21. Heat map of secondary screen z-scores.

Average z-scores were calculated from the average log2-fold change values, and genes were separated by their designation from the primary screen – ORF-hit, CRISPRa-hit or double-hit. The top 20 ORF-hits, the top 20 CRISPRa-hits, and all double-hits are ranked by their secondary screen z-score. Genes score more significantly with the same screening technology from primary to secondary. Darker blue indicates a higher average z-score.

To assess the validation rate of the secondary screening data, z-scores and the hypergeometric distribution of the primary data were first calculated. Next, primary z-scores of control genes were compiled. For the primary ORFeome screen, the controls comprised the 300 non-scoring genes included in the secondary library, and for the primary CRISPRa screen the controls comprised both the 337 non-scoring genes included in the secondary library and 165 ‘dummy’ non-targeting genes. These sets of controls were then used to derive a false discovery rate (FDR) of <5% for each screening technology. Secondary screen z-scores above the primary FDR thresholds were considered partially validated. From this partially validated set, genes with discordant z-score polarities between the primary and secondary screens were filtered out to finalize the validated gene sets. However, these genes were only considered validated in the drug treatment used in the primary screen – Trametinib in the primary ORFeome and Selumetinib in the primary CRISPRa. NRAS could not be validated in Trametinib because the ORF construct included in the secondary library was not the same as the one in the primary, but it was validated in Selumetinib.
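The validation logic above can be sketched in two steps. The sketch assumes the <5% FDR threshold is the control z-score that leaves fewer than 5% of control genes above it; the exact derivation is not spelled out here, so that rule, the function names, and the example values are assumptions. The 0.47 cutoff in the usage example is the ORFeome threshold reported in Figure 22.

```python
def fdr_threshold(control_z, fdr_percent=5):
    """Return a control z-score cutoff that leaves fewer than fdr_percent
    of the control genes strictly above it (assumed rule, see lead-in)."""
    if not control_z:
        raise ValueError("control gene set is empty")
    ordered = sorted(control_z)
    n = len(ordered)
    # Largest integer k with k / n < fdr_percent / 100; integer math avoids float rounding.
    k = (fdr_percent * n - 1) // 100
    return ordered[n - k - 1]

def validate(primary_z, secondary_z, threshold):
    """Genes whose secondary z-score exceeds the threshold and whose
    primary and secondary z-scores have the same polarity."""
    validated = []
    for gene, z2 in secondary_z.items():
        z1 = primary_z.get(gene)
        if z1 is None or z2 <= threshold:
            continue  # not partially validated
        if (z1 > 0) != (z2 > 0):
            continue  # discordant polarity between primary and secondary
        validated.append(gene)
    return sorted(validated)

# Hypothetical z-scores (illustration only).
primary = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 3.0}
secondary = {"GENE_A": 1.2, "GENE_B": 0.9, "GENE_C": 0.1}
print(validate(primary, secondary, threshold=0.47))
```

In this example GENE_B clears the threshold but is dropped for discordant polarity, and GENE_C is dropped for falling below the threshold.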

Overall, 83 and 102 genes validated in Trametinib and Selumetinib, respectively (Figure 22). 64% of the Trametinib-validated genes were designated as ORF and double-hits, and similarly, 58% of the Selumetinib-validated genes were designated as CRISPRa and double-hits from the primary screens. Therefore, approximately one-third of the scoring genes included in the secondary libraries were false positives. Genes that validated were then grouped based on their primary screen’s average p-value. In general, genes with more significant p-values in the primary screens validated at a higher rate than those with less significant p-values (Figure 23). The two gene sets were then de-duplicated, and four of the Selumetinib-validated genes were removed because they had zero ETP sequencing reads in Trametinib. The final list included 143 validated genes.

Figure 22. Validated gene sets by drug treatment.

Genes with z-scores higher than the <5% false discovery rate thresholds are considered validated from the primary to secondary screens in Trametinib (purple, n=83) and Selumetinib (orange, n=102). False discovery rates (dashed lines) are calculated from negative control gene sets from the primary ORFeome (z-score of 0.47) and Calabrese (z-score of 0.15) screens. Average z-scores are calculated from average log2-fold change values.

Figure 23. Validation rates of secondary screens grouped by primary screen p-values.

Average p-values are derived from the hypergeometric analysis of the primary ORFeome and CRISPRa screening data. The most significant genes from the primary screens generally validate at a higher rate in the secondary screens. False discovery rates of <5% are calculated from negative control gene sets from the primary ORFeome and CRISPRa screens. The number of genes in each group is indicated.

To focus on the genes that show different effects in the two screening technologies, delta z-scores for each drug condition for the 143 validated genes were calculated using the following formula: delta z-score = (average z-score ORF) – (average z-score CRISPRa). Contrasting the delta z-scores across the two drug treatments yielded a linear regression correlation of R=0.95 and revealed 12 significant ‘ORF-only’ genes (delta z-scores >2), KRAS, HNF4G, BRD3, ARAF, BRAF, SAMD4A, GPR132, MAPK3, HNF4A, NTRK2, INSR, and CSF1R, and two significant ‘CRISPRa-only’ genes (delta z-scores <-2), ELOVL1 and ZNRF4 (Figure 24). KRAS is the most significant ‘ORF-only’ hit with delta z-scores around six, and of note, the ORF constructs expressing KRAS are G13V-mutants known to drive drug resistance to MAPK inhibition. ELOVL1 is the strongest ‘CRISPRa-only’ gene with delta z-scores around -2. NRAS was confirmed to be a moderate, but not statistically significant, ‘CRISPRa-only’ gene with an average delta z-score of -0.65.
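The delta z-score formula and the ±2 cut-offs can be illustrated with a short sketch for a single drug arm. The function name and the example z-scores are hypothetical; only the formula and the cut-offs are taken from the text.

```python
def delta_z_scores(orf_z, crispra_z, cutoff=2.0):
    """Compute delta z-score = (average z-score ORF) - (average z-score CRISPRa)
    for genes present in both screens, and label each gene by the cut-off."""
    result = {}
    for gene in sorted(orf_z.keys() & crispra_z.keys()):
        delta = orf_z[gene] - crispra_z[gene]
        if delta > cutoff:
            label = "ORF-only"
        elif delta < -cutoff:
            label = "CRISPRa-only"
        else:
            label = "not significant"
        result[gene] = (delta, label)
    return result

# Hypothetical gene-level average z-scores in one drug arm (illustration only).
orf = {"GENE_A": 5.0, "GENE_B": 0.5, "GENE_C": -1.0}
crispra = {"GENE_A": 0.5, "GENE_B": 0.2, "GENE_C": 1.5}

for gene, (delta, label) in delta_z_scores(orf, crispra).items():
    print(f"{gene}: delta z = {delta:+.2f} ({label})")
```

Because the delta is a difference, a gene can reach the ‘CRISPRa-only’ range either by scoring well with CRISPRa or by scoring negatively with its ORF.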

[Figure 24 scatter plot. x-axis: delta avg. z-score, Trametinib; y-axis: delta avg. z-score, Selumetinib; both axes span -4 to 6; R = 0.95. Labeled genes: KRAS, HNF4G, SAMD4A, HNF4A, BRD3, NTRK2, GPR132, INSR, BRAF, CSF1R, ARAF, MAPK3, NRAS, ZNRF4, ELOVL1.]

Figure 24. Comparing delta z-scores across drug treatments.

For each of the 143 validated genes, the average z-score from the secondary CRISPRa screen was subtracted from the average z-score from the secondary ORF screen. Delta z-scores >2 and <-2 (dashed lines) indicate genes that score significantly with only ORF or with only CRISPRa technology, respectively. ‘ORF-only’ genes (blue) and ‘CRISPRa-only’ (red) genes are labeled.

A plausible explanation for why certain genes score as ‘ORF-only’ hits is that the sgRNAs for the genes were not well represented in the secondary CRISPRa library pDNA. To investigate this, the average pDNA log2-RPM for every gene in the secondary CRISPRa library was calculated. The median log2-RPM for the ‘ORF-only’ hits was identical to the median log2-RPM of all of the genes present in the secondary library (data not shown). The ‘ORF-only’ genes were therefore not hindered from scoring with CRISPRa due to technicalities when generating the library. In the same vein, the ‘CRISPRa-only’ hits did not lack representation in the secondary ORF screen because genes with low ETP sequencing reads were previously filtered out. However, for a few genes with negative delta z-scores (‘CRISPRa-only’ hits), the ORF constructs included in the secondary ORF library had protein sequence matches below 80%. It is therefore likely that these genes did not score as ORF hits because the altered protein sequence could not confer drug resistance.

Another possibility for why genes score as ‘CRISPRa-only’ hits is that the ORF itself is simply not expressed in the cells post-transduction. To test this theory, a unique feature of the Human ORFeome Collection was utilized. Many of the ORFs included in the collection contain an epitope tag sequence for V5, a short, 14-amino acid peptide from a paramyxovirus. ORF expression levels can therefore be analyzed via V5 detection. To perform the assay, MelJuso cells were seeded in 96-well plates and transduced with the ORF constructs in their arrayed lentivirus format so that each ORF could be interrogated individually. The transduced cells were then fixed in the 96-well plates, permeabilized and stained with an antibody against V5 to quantify protein expression (Figure 25). The ORFs that did not contain the V5 tag (n=87, including NRAS) were filtered out of the analysis. For the remaining ORFs (n=391), the average fluorescence intensity of each construct was compared to empty wells, used to determine background fluorescence, and positive control wells. The average fluorescence intensity of the empty wells (n=34) and positive control wells (n=26) was 9.49 ± 3.23 and 57.18 ± 15.37, respectively. The signal present in the empty wells was caused by auto-fluorescence of the plastic plates, while the wide range of values seen in the positive control wells is likely due to technical variations introduced by assaying six plates at the same time.
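As a rough illustration of how a well's signal can be read against the empty-well background, the sketch below treats the background range as the empty-well mean plus or minus one standard deviation (9.49 ± 3.23, as reported above). The library-wide mean and the well intensities used in the example are hypothetical, and the three category names are introduced only for illustration.

```python
EMPTY_MEAN = 9.49  # mean fluorescence of empty wells, from the text
EMPTY_SD = 3.23    # standard deviation of empty wells, from the text

def classify_well(intensity, library_mean, bg_mean=EMPTY_MEAN, bg_sd=EMPTY_SD):
    """Place one well's mean fluorescence relative to background and to the
    average of all V5-expressing ORFs (library_mean is supplied by the caller)."""
    if intensity <= bg_mean + bg_sd:
        return "within background"
    if intensity < library_mean:
        return "below average"
    return "at or above average"

# Hypothetical values (illustration only).
hypothetical_library_mean = 30.0
for value in (10.0, 20.0, 45.0):
    print(value, classify_well(value, hypothetical_library_mean))
```

With these inputs the upper edge of the background range is 9.49 + 3.23 = 12.72, so a well at 10.0 is indistinguishable from an empty well.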

[Figure 25 scatter plot. x-axis: ORF (0 to 400); y-axis: mean well-fluorescence (0 to 100). Labeled points: NRF1, ELOVL1, ZNRF4.]

Figure 25. V5 fluorescent signal of individual ORF constructs.

Cells were transduced with the contents of the secondary ORF library in 96-well plate format and stained for the V5 epitope expression after 96 hours. Many genes designated as ‘CRISPRa-only’ hits (red dots) exhibited low V5 expression. The average well-fluorescence of all V5-expressing ORFs (including the positive controls) is indicated by the dashed line. The average well-fluorescence of the empty wells (n=34) ± 1 st. dev. is indicated by the gray box. The blue dots are the blue-fluorescent protein positive-control wells (n=26).

In this experiment, the entire secondary ORF library was tested, and many ‘CRISPRa-only’ genes (delta z-scores < 0) had fluorescence intensities below average (n=23); several even fell within the range of the empty wells (n=13). To confirm that there was adequate cell viability in the wells containing the low V5-expression ‘CRISPRa-only’ genes, the 96-well plates were visually inspected. Over half of the wells (n=13) were below 15% confluent, including the two most significant genes ELOVL1 and ZNRF4, indicating that the majority of the cells died post-transduction with the ORF. The average log2-RPM values from the secondary ORF screen ETP samples were then cross-referenced, and none of the 13 genes were underrepresented (data not shown). It is therefore likely that a high multiplicity-of-infection resulting in multiple integrants of these ORFs was lethal to the cells. For the other ten genes with fluorescence intensities below the mean, the ORF constructs were simply not expressed post-transduction in the MelJuso cell line.

To investigate why many genes scored as ‘ORF-only’ hits, the LFC values of the individual sgRNAs from the secondary CRISPRa screen were reassessed. In order to focus on the strongest phenotypes, a more stringent delta z-score cut-off of >3 was picked, limiting the list to nine genes: KRAS, HNF4G, BRD3, ARAF, BRAF, SAMD4A, GPR132, HNF4A, and NTRK2. These genes were then divided into two categories: false negatives or true negatives (Figure 26). All genes in the false-negative category have at least one sgRNA showing strong enrichment (LFC >1), indicating that these individual constructs could possibly activate gene expression. However, the majority of the sgRNAs do not have any significant effect. When the LFC values of all the sgRNAs are averaged together, the gene’s significance as a CRISPRa-hit is lowered. The genes in the true-negative category do not have any individual constructs score higher than an LFC of 0.5, and the average LFC of all sgRNAs is negative.
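The two categories described above can be expressed as a simple rule on the per-sgRNA LFC values. This is a minimal sketch using the stated cut-offs (LFC >1 for a strongly enriched sgRNA; no sgRNA above 0.5 and a negative average for a true negative). The function name, the "unclassified" fallback, and the sgRNA values are hypothetical, except that 2.92 and -0.32 are the BRAF values quoted later in the text.

```python
from statistics import mean

def classify_orf_only_gene(sgrna_lfcs, strong=1.0, weak=0.5):
    """Classify an 'ORF-only' gene from the LFC values of its sgRNAs."""
    if not sgrna_lfcs:
        raise ValueError("no sgRNA values supplied")
    top = max(sgrna_lfcs)
    if top > strong:
        return "false negative"  # at least one sgRNA is strongly enriched
    if top <= weak and mean(sgrna_lfcs) < 0:
        return "true negative"   # no sgRNA scores and the average is negative
    return "unclassified"        # hypothetical fallback, not defined in the text

# Mostly hypothetical sgRNA LFC values (2.92 and -0.32 are quoted for BRAF in the text).
braf_like = [2.92, -0.32, 0.10, -0.45, 0.05, -0.20]
inactive_like = [0.30, -0.40, -0.10, -0.60, 0.20, -0.50]

print(classify_orf_only_gene(braf_like), round(mean(braf_like), 2))
print(classify_orf_only_gene(inactive_like), round(mean(inactive_like), 2))
```

The first example shows the dilution effect: one strongly enriched sgRNA is averaged with several inactive ones, so the gene-level value is far lower than the top sgRNA's LFC.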

To confirm these genes are false and true negatives outside of a pooled-screen setting, a few genes were further tested in a low-throughput manner. The top-scoring sgRNA and an average-scoring sgRNA for BRAF, HNF4A, and ARAF, as well as a non-targeting sgRNA, were individually cloned into the CRISPRa vector. MelJuso-dCas9-VP64 cells were then transduced with the constructs, and gene activation was analyzed post-transduction by assessing protein expression via western blotting.

[Figure 26 dot plot. Panels: false negatives and true negatives. Genes shown: BRAF, KRAS, BRD3, ARAF, HNF4G, HNF4A, NTRK2, SAMD4A, GPR132, and NT sgRNA. y-axis: avg. fold change (log2), -4 to 4. Legend: top sgRNA, average sgRNA.]

Figure 26. Log2-fold changes of individual sgRNAs of ‘ORF-only’ gene hits.

Genes with delta z-scores >3 are labeled either false negatives (with at least one significant sgRNA), true negatives (no significant sgRNAs), or non-targeting (NT). Each dot represents the average log2-fold change value of Trametinib and Selumetinib for the 12 sgRNAs per gene in the secondary CRISPRa library. Horizontal dashed line indicates a log2-fold change value of zero. Dark blue and light blue dots indicate the top-scoring and average-scoring sgRNAs chosen for western blotting, respectively.

The levels of protein expression mirrored the differences in LFC values that were seen in the secondary CRISPRa screens (Figure 27). BRAF, which belongs to the false-negative category, is endogenously expressed in the MelJuso cell line. The non-targeting sgRNA did not alter the baseline expression, as expected. The top-scoring sgRNA for BRAF, with an LFC of 2.92, greatly increased protein expression levels as compared to the average sgRNA with an LFC value of -0.32. HNF4A, another gene in the false-negative category, did not show a band in the western blot in the parental MelJuso cells, possibly due to the detection limit for lowly expressed genes. However, HNF4A was activated by both sgRNA constructs. The top-ranking sgRNA (LFC value of ~2) increased expression more than the average sgRNA (LFC value of 0.37). ARAF, which belongs to the true-negative category, was also endogenously expressed in the parental cells, but neither sgRNA was effective, confirming the lack of gene activation in the secondary CRISPRa screens.

[Figure 27 western blots. Lanes: Parental, Non-targeting, Top sgRNA, Avg. sgRNA. Blots: m-BRAF with r-VINC; r-HNF4A with m-VINC; r-ARAF with m-VINC.]

Figure 27. Gene activation varies by sgRNA and by gene.

MelJuso-dCas9-VP64 cells transduced with sgRNA constructs were tested for gene expression levels via western blotting. Activation levels vary as expected based on the magnitude of the individual LFC values for the sgRNAs. The top sgRNA had the highest LFC value, and the avg. sgRNA had an average LFC value for the 12 sgRNAs per gene. Parental MelJuso cells and a non-targeting sgRNA are controls for baseline gene expression. Primary antibodies were either mouse (m) or rabbit (r), and secondary antibodies were fluorescently tagged. Vinculin was used as the protein loading control.

Chapter IV

Discussion

Two common gain-of-function technologies in the field of functional genomics, ORF and CRISPRa, can be used to interrogate the genetics underlying cancer drug resistance. The ORF and CRISPRa methodologies produce the same overexpression phenotypes but with some differences. With ORFs, the gene of interest is exogenously expressed as a cDNA and achieves non-physiological levels of expression, while with CRISPRa, both the endogenous gene transcript and its splice isoforms can be activated. Both technologies have expanded the capacity to conduct large-scale genetic screens and proven successful in resolving some previously elusive drug-resistance mechanisms in cancer. However, when directly compared, the two technologies reveal a minimal number of common genes, and the lack of consistency raises the question of whether something fundamental to the technologies themselves inhibits correlation. The aim of this study was to provide a deeper understanding of why ORF and CRISPRa screens provide different results when asking the same biological question.

Two previously performed genome-wide screens in the MelJuso NRAS-mutant melanoma cell line provided the framework to investigate this question. One project chose to use ORF and the other CRISPRa to study drug resistance, but both technologies were screened with similar MEK inhibitors. They identified a number of common genes, but also numerous distinct hits. There are several possible explanations for the different results across the two screening systems: i) the genes could not be overexpressed in ORF and CRISPRa due to a technical failure, ii) endogenous genes could not be activated by sgRNAs because they are not expressed in the screening cell model, or iii) the design of the ORF or sgRNA constructs is imperfect. To compare the primary genome-wide results in a more focused manner, secondary screens were performed with validation libraries containing only the top ORF and top CRISPRa gene hits, as well as the few genes that overlapped. The secondary ORF library contained only one construct per gene while the secondary CRISPRa library contained 12 sgRNAs per gene. In silico analysis of the secondary screening data and follow-up in vitro assays have provided some insights into the technological and biological reasons for the different results.

Despite the limited number of available constructs, the ORF screens yielded very significant drug-resistant gene hits. One of the top scoring genes in both ORF and CRISPRa was RAF1, a known driver of resistance to MAPK inhibition in cancer. However, the primary and secondary ORF screens generated values roughly 5.5 and 2.3 z-scores higher, respectively, than the CRISPRa screens (primary z-scores not shown). This trend was not just seen in the double-hits, but in the ORF and CRISPRa-hits as well. Genes designated as ORF-hits in the primary screen scored more significantly in the secondary ORF screen than CRISPRa-designated hits scored in the secondary CRISPRa.

When comparing the significance of the hits with p-values rather than LFCs, however, the ORF screening data is weaker, as there is only one ORF construct per gene contributing to the biological effect. Also, including only one ORF construct per gene in the library had repercussions, as several ORFs were not expressed in the cells post-transduction, including the double-hit HRAS. Including multiple ORF constructs per gene was not always a possibility due to the contents of the Human ORFeome Collection.

For some genes, the inclusion of many sgRNAs in the secondary CRISPRa library increased the p-value significance of the hit. The resistance-driving capability of genes like ABCB1 in Trametinib is likely to be very biologically significant as all 12 sgRNAs scored within the top 0.8% of the library (data not shown).

The smaller ORF library size may, however, have contributed to the reproducibility of the data. The secondary ORF screens recovered 10-fold more sequencing reads than required, and the high sequencing coverage likely contributed to the strong correlations across biological replicates. Additionally, the small size of the secondary ORF library allowed for a higher representation of cells per construct at the transduction, passaging and sequencing phases. The reduced replicate correlations in the CRISPRa screens are likely to be explained by the lower number of recovered sequencing reads but may also hint at an important biological consideration – genes can only be activated if they are accessible to the transcriptional machinery. The heterogeneity of chromatin structure within a cancer cell line could contribute to unequal activation levels.

Both secondary screens successfully validated several gene hits from the primary screens. Overall, the secondary CRISPRa screen validated a larger number of genes than the secondary ORF, but the FDR threshold was set lower due to the incorporation of the ‘dummy’ non-targeting genes in the control gene set. When comparing the z-score polarities of genes between the primary and secondary screens, there were only seven discordant genes with ORF compared to 48 with CRISPRa (data not shown). Factoring in these slight differences, neither screening technology proved to validate genes at a higher rate or more significantly than the other. The overall validation rate does, however, highlight the importance of performing secondary screens with a more focused library. Taken at face value, the primary screening data would suggest that the top 100 scoring genes are biologically significant. In reality, many of the gene hits were chance occurrences in the setting of a genome-wide pooled screen. Some of the validated gene hits from the secondary screening data could even further prove non-significant if tested as individual constructs in an arrayed screen.

Genes identified as ‘CRISPRa-only’ hits likely do not score with ORF technology due to an intrinsic property of the constructs in the MelJuso cell line. As shown by the V5-expression assay, many of the ORFs are simply not expressed despite adequate transduction efficiencies (e.g. ELOVL1 and ZNRF4). The reasons behind this lack of expression remain unclear. For ORFs that are highly expressed in the cells but do not score as hits, such as ABCB1, the construct could be expressing a splice isoform that is incapable of conferring resistance to a MEK inhibitor in NRAS-mutant melanoma. Similarly, many ORFs were in fact deleterious when expressed within MelJuso, eliminating all possibility of them scoring in an ORF screen. ZNRF4, for example, was originally considered a non-scoring control gene from the primary screens. It then scored as a sensitizer-gene in the secondary ORF and as a weak resistant-gene in the secondary CRISPRa. The delta between the LFC values drives ZNRF4 to appear as a strong ‘CRISPRa-only’ hit, when in fact the ORF may simply be lethal if expressed at too high a level. Competition assays between various ORFs for ‘CRISPRa-only’ genes could reveal if certain subsets of ORF constructs are consistently not expressed or are toxic. Analyses of these competition assays could identify a pattern of ORF sequences that should be avoided in future libraries.

The delta z-score analysis revealed many ‘ORF-only’ genes that failed to score significantly with CRISPRa. A deeper look into the single sgRNA constructs unveiled one possible reason why – many genes showed 1-3 sgRNAs with strong enrichment, while the remaining had no effect. Therefore, including a higher number of sgRNAs in the library to create redundancy lowers the significance of the gene as a whole even if one or more sgRNAs can activate expression. Had only the top two or three sgRNAs been included in the library, or if the data were analyzed differently, these false-negative ‘ORF-only’ genes may have instead scored as double-hits. This was confirmed in the protein expression analysis via western blot where the top-scoring sgRNA for BRAF significantly increased the protein expression level while the average-scoring sgRNA did not. The protein expression analysis in HNF4A revealed the gene was activated by both sgRNA constructs, with the top-ranking sgRNA increasing expression subtly more than the average-sgRNA.

On the other hand, true-negative genes like ARAF had no scoring sgRNAs in the secondary CRISPRa library, despite the fact that the constructs were designed with the same criteria used for all other genes. Knowing that the parental cells endogenously express ARAF opens up the possibility that there is an alternative transcriptional start site that was not accurately annotated. Tertiary libraries separating out false and true-negative ‘ORF-only’ hits could elucidate if the design of the secondary CRISPRa library impeded these genes from scoring with CRISPRa. The false-negative tertiary library would reduce the number of sgRNAs per gene in an attempt to raise the significance of the LFC values, while the true-negative tertiary library could possibly use an alternative transcriptional start site annotation database and include more sgRNAs per gene.

An alternative reason for why certain genes score as ‘ORF-only’ hits is the fact that some mutant ORF constructs were included in the secondary library. KRAS is the most prominent example, as it is the top scoring ‘ORF-only’ gene and its ORF sequence contains the well-studied G13V activating mutation (Hobbs et al., 2016). The ORFs for CSF1R, BRD3, SAMD4A, and INSR also contain various types of mutations that could be resistance-driving in the context of an NRAS-mutant melanoma line treated with a MEK inhibitor. The ‘ORF-only’ gene-set may very well differ if the primary ORFeome or secondary ORF libraries were redesigned to include only wild-type constructs.

The outcomes of this study confirm the phenomenon originally identified with the primary screens: ORF and CRISPRa are both relevant screening technologies for identifying drug-resistance landscapes, but they offer different results. The double-hit genes remained double-hits in both secondary screens, implying that either system could be used to identify the top resistance-drivers in a cancer type of interest. However, the less significant, and possibly more interesting, gene hits only modestly overlap. Based on the data presented, a combination of inactive constructs and faulty library design partially contributed to the lack of correlation across screening systems.

After a few more follow-up studies, new criteria for secondary library generation could be developed to reduce the likelihood of false negative and positive gene hits. For the results that remain ambiguous, the outcomes are likely attributable to unknown biological factors specific to the MelJuso cells. Different ORFs could have been more highly expressed, or different sgRNAs could have activated different genes if the screens had been performed in an alternative cell line. Therefore, for now, to identify genes that contribute to drug resistance, overexpression screens should still be performed using both ORF and CRISPRa and in multiple cancer models for the disease of interest.

References

Barrangou R., Fremaux C., Deveau H., Richards M., Boyaval P., Moineau S., Romero D.A. & Horvath P. (2007) CRISPR provides acquired resistance against viruses in prokaryotes. Science. 315: 1709–1712.

Bolotin A., Quinquis B., Sorokin A. & Ehrlich S.D. (2005) Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology. 151, 2551–256.

Broad Institute. (2019) New Analysis for CRISPR-based Screening Data. Retrieved from https://portals.broadinstitute.org/gpp/public/analysis-tools/crispr-gene-scoring.

Broad Institute. (2019) PoolQ. Retrieved from https://gpplims.broadinstitute.org/screening/poolq/create.

Cheung H.W., Cowley G.S., Weir B.A., Boehm J.S., Rusin S., Scott J.A., East A., Ali L.D., Lizotte P.H., Wong T.C. et al. (2011) Systematic investigation of genetic vulnerabilities across cancer cell lines reveals lineage-specific dependencies in ovarian cancer. Proc Natl Acad Sci USA. 108 (30): 12372-12377.

Doench, J.G., Fusi, N., Sullender, M., Hegde, M., Vaimberg, E.W., Donovan, K.F., Smith, I., Tothova, Z., Wilen, C., Orchard, R. et al. (2016) Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nature Biotechnology. 34 (2):184-191.

Gilbert, L.A., Larson M.H., Morsut, L., Liu, Z., Brar, G.A., Torres, S.E., Stern-Ginossar, N., Brandman, O., Whitehead, E.H., Doudna, J.A. et al. (2013) CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell. 154 (2): 442-451.

Guenther, L.M., Dharia, N.V., Ross, L., Conway, A., Robichaud, A.L., Catlett II, J.L., Wechsler, C.S., Frank, E.S., Goodale, A., Church A.J. et al. (2018) A Combination CDK4/6 and IGF1R Inhibitor Strategy for Ewing Sarcoma. Clinical Cancer Research. January 4 2019. DOI: 10.1158/1078-0432.CCR-18-0372.

Hanahan, D. & Weinberg, R.A. (2011) Hallmarks of cancer: the next generation. Cell. 144 (5): 646-674.

Hartenian, E. & Doench, J.G. (2015) Genetic screens and functional genomics using CRISPR/Cas9 technology. The FEBS Journal. 282: 1383-1393.

Hayes, T. K., Luo, F., Cohen, O., Goodale, A., Lee, Y., Pantel, S., Bagul, M., Piccioni, F., Root, D.E., Garraway, L.A., Meyerson, M. & Johannessen, C.M. (2019) A functional landscape of resistance to MEK1/2 and CDK4/6 inhibition in NRAS-Mutant melanoma. Cancer Research. 79 (9): 2352-2366.

Hobbs, G.A., Der, C.J. & Rossman, K.L. (2016) RAS isoforms and mutants in cancer at a glance. Journal of Cell Science. 129: 1287-1292.

Holohan, C., Van Schaeybroeck, S., Longley, D.B., & Johnston, P.G. (2013) Cancer drug resistance: an evolving paradigm. Nature Reviews Cancer. 13: 714-726.

Iniguez, A.B., Alexe, G., Wang, E.J., Roti, G., Patel, S., Chen, L., Kitara, S., Conway, A., Robichaud, A.L., Stolte, B. et al. (2018) Resistance to Epigenetic-Targeted Therapy Engenders Tumor Cell Vulnerabilities Associated with Enhancer Remodeling. Cancer Cell. 34 (6): 922-938.

Ishino Y., Shinagawa H., Makino K., Amemura M. & Nakata A. (1987) Nucleotide sequence of the iap gene, responsible for alkaline phosphatase isozyme conversion in Escherichia coli, and identification of the gene product. Journal of Bacteriology. 169: 5429–5433.

Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A. & Charpentier E. (2012) A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 337: 816–821.

Konermann, S., Brigham, M.D., Trevino, A.E., Joung, J., Abudayyeh, O.O., Barcena, C., Hsu, P.D., Habib, N., Gootenberg, J.S., Nishimasu, H. et al. (2015). Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature. 517 (7536): 583-588.

Piccioni, F., Younger, S.T. & Root, D.E. (2018) Pooled Lentiviral-Delivery Genetic Screens. Current Protocols in Molecular Biology. 121: 32.1.1-32.1.21.

Qi, L.S., Larson, M.H., Gilbert, L.A., Doudna, J.A., Weissman, J.S., Arkin, A.P. & Lim, W.A. (2013) Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. 152 (5): 1173-1183.

Sanjana, N.E. (2017) Genome-scale CRISPR pooled screens. Analytical Biochemistry. 532: 92-99.

Sanson, K.R, Hanna, R.E., Hegde, M., Donovan, K.F., Strand, C., Sullender, M.E., Vaimberg, E.W., Goodale, A., Root, D.E., Piccioni, F. et al. (2018) Optimized libraries for CRISPR-Cas9 genetic screens with multiple modalities. Nature Communications. 9 (5416).

Shechner, D.M, Hacisuleyman, E., Younger, S.T. & Rinn, R.L. (2015) Multiplexable, locus-specific targeting of long RNAs with CRISPR-Display. Nature Methods. 12 (7): 664-670.

Wilson, F.H., Johannessen, C.M., Piccioni, F., Tamayo, P., Kim, J.W., Van Allen, E.M., Corsello, S.M., Capelletti, M., Calles, A., Butaney, M., Sharifnia, T., et al. (2015) A Functional Landscape of Resistance to ALK Inhibition in Lung Cancer. Cancer Cell. 27: 397-408.

Yang, X., Boehm, J.S., Yang, X., Salehi-Ashtiani, K., Hao, T., Shen, Y., Lubonja, R., Thomas, S.R., Alkan, O., Bhimdi, T. et al. (2011) A public genome-scale lentiviral expression library of human ORFs. Nature Methods. 8 (8): 659-664.
