Integrins are key mediators of cell state and tumour heterogeneity

by

Allison Michelle Lorraine Nixon

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Graduate Department of Molecular Genetics

University of Toronto

© Copyright by Allison Michelle Lorraine Nixon, 2018

Integrins are key mediators of cell state and tumour heterogeneity

Allison M. L. Nixon

Doctor of Philosophy

Department of Molecular Genetics

University of Toronto

2018

Abstract

Human cancers display a high degree of intratumoural heterogeneity (ITH), largely driven by heterogeneous genetic/epigenetic alterations and interactions with the cellular microenvironment, resulting in functionally diverse cell subpopulations with differing clinical implications.

In order to better isolate, study and potentially target clinically relevant tumour subpopulations, I developed a methodology for simultaneous target discovery and synthetic antibody generation, coined CellectAb. This approach allowed me to discover, develop and characterize antibodies against CD133-positive colorectal cancer initiating cells (CICs). Surprisingly, two of the three antibodies I discovered and characterized recognize members of the integrin family of cell adhesion molecules: integrins alpha (α) 7 and beta (β) 6. These antibodies enrich for self-renewing CICs, and genetic ablation of integrin β6 impedes colorectal CIC function. Together, these findings validate the CellectAb approach, identify two integrins as novel colorectal CIC markers, and provide a rationale for therapeutic targeting of integrin β6.

Integrins play critical roles in many biological processes in cancer, including survival, proliferation, invasion and metastasis. While seven drugs targeting leukocyte/platelet integrins have progressed to market, drugs against solid tissue integrins, especially αV, fail to show clinical efficacy despite promising preclinical results. In Chapter 3, I hypothesize that a contributing reason for this failure may be that without αV function, cells relate very differently to the extracellular matrix (ECM) microenvironment. Specifically, loss of αV robustly causes a loss of adherent growth and a switch to a viable three-dimensional sphere state in cell types ranging from fibroblasts to epithelial cancers. Surprisingly, this phenotype is completely reversed when exogenous ECM is provided or metabolic redox conditions are perturbed. This unexpected discovery stemmed from parallel functional genetic screening approaches to identify genes regulating proliferative interaction with the ECM.

Together, my thesis contributes to a better understanding of how cancer cells functionally interact with their microenvironment through various integrins, and provides a new technology for marker discovery on rare cell populations.


Acknowledgments

Firstly, I would like to express my sincere gratitude to my thesis supervisor, Professor Jason Moffat, for his continuous support throughout my PhD work. His unique balance of active instructional support and creative freedom has facilitated so many aspects of my scientific and professional development. My time in his lab has been an incredible growing experience that I value greatly.

I would also like to thank my thesis supervisory committee, Professors Sean Egan and Brent Derry, for their insightful comments and encouragement, but also for the balanced feedback and hard questions which pushed me to grow as a scientist and kept me, if not completely focused, more focused than I would have been if left to my own devices.

My sincere thanks also go to Dr. Catherine O’Brien and Dr. Sheila Singh who, as collaborators and successful women in science, gave me so much to which to aspire.

I thank my fellow labmates and MoGen program-mates for the stimulating discussions and for all the fun we have had over the last several years. Especially to Patti who keeps the whole show running and to Alejandro for continuing to show up and give our projects his all. Every. Single. Day. I could not have done it all without you.

Last but not least, I would like to thank all of my family and friends. Especially my father for being a stable and supportive presence, and my mother for instilling in me from a young age the ability and drive to think and do things for myself. I am incredibly fortunate to be surrounded by such incredible people. Your listening (or at least nodding along) while I have excitedly celebrated my successes, or bemoaned my lab-related struggles, has been instrumental in keeping me sane. Thank you.


Table of Contents

Abstract ...... ii

Acknowledgments ...... iv

Table of Contents ...... v

List of Tables ...... x

List of Figures ...... xii

List of Abbreviations ...... xv

Chapter 1 ...... 1

1 Introduction ...... 1
1.1 Intratumoural heterogeneity ...... 1
1.1.1 The War on Cancer ...... 1
1.1.2 Cancer is Heterogeneous ...... 1
1.1.3 Differentiation state ITH and Cancer Stem Cells (CSCs) ...... 2
1.1.4 Epigenetics as a molecular mechanism for ITH ...... 5
1.1.5 Tumour microenvironmental causes of ITH ...... 5
1.1.6 Genetic ITH ...... 6
1.1.7 A functional subpopulation of Cancer Initiating Cells (CICs) ...... 8
1.1.8 Culture methods for CICs ...... 8
1.1.9 Cell surface markers of CICs ...... 9
1.1.10 Discovery of novel markers to understand and target CICs ...... 11
1.2 The Extracellular Matrix (ECM) ...... 11
1.2.1 Cell Adhesion ...... 11
1.2.2 Function of the ECM ...... 11
1.2.3 Major ECM components ...... 12
1.2.4 The Matrisome ...... 13
1.3 ECM receptors ...... 14
1.3.1 Discoidin domain receptors ...... 14
1.3.2 Other ECM receptors ...... 14
1.3.3 Integrins ...... 15
1.3.4 Integrin signaling ...... 15
1.3.5 The integrin adhesome ...... 19
1.4 Cell-ECM interactions in human disease ...... 20
1.4.1 Mutations in ECM components leading to disease ...... 20
1.4.2 Mutations in ECM receptors leading to disease ...... 21
1.4.3 Accumulation of ECM leads to Fibrosis ...... 21
1.4.4 Increased ECM breakdown pathology ...... 22
1.4.5 Cancer ...... 23
1.4.6 Anoikis and anchorage independence in cancer ...... 23
1.5 Conclusions ...... 25

Chapter 2 ...... 27

2 Cell-based Synthetic Antibody Selection (CellectAb) Yields New Tools for Detecting Functional Subpopulations of Cancer Cells ...... 27
2.1 Contributions ...... 27
2.2 Abstract ...... 28
2.3 Introduction ...... 29
2.3.1 Antibodies for biomedical research ...... 29
2.3.2 Antibodies for target discovery ...... 31
2.3.3 A fully synthetic antibody ...... 32
2.3.4 CD133 positive Colorectal Cancer Initiating Cells ...... 32
2.3.5 Rationale and Approach ...... 33
2.4 Methods ...... 33
2.4.1 Cell culture ...... 33
2.4.2 Isolation of AC133+ CIC-enriched fraction ...... 34
2.4.3 Antibody Selection Strategy ...... 34
2.4.4 Flow cytometry/FACS ...... 35
2.4.5 Integrin overexpression ...... 35
2.4.6 Immunoprecipitation (IP) ...... 37
2.4.7 Mass Spectrometry (MS) ...... 38
2.4.8 RNA interference (RNAi) experiments ...... 38
2.4.9 CRISPR/Cas9 single knockout generation ...... 39
2.4.10 ITGA3, ITGA6, ITGA7 full gene triple knockout generation ...... 39
2.4.11 ELISA ...... 42
2.4.12 Immunohistochemistry ...... 42
2.4.13 Western Blot ...... 42
2.4.14 qPCR ...... 43
2.5 Results ...... 44
2.5.1 Isolation of presumptive colorectal cancer CICs ...... 44
2.5.2 Selection of CIC-specific synthetic antibodies ...... 44
2.5.3 Antibodies bind preferentially to the AC133high POP92 population ...... 46
2.5.4 AN01, AN02 and AN03 recognize distinct cell surface antigens on a variety of cell types ...... 48
2.5.5 Identification of prospective targets for AN01, AN02 and AN03 ...... 51
2.5.6 Validation of the protein targets for AN01, AN02 and AN03 ...... 53
2.5.7 ITGA7 and ITGB6 antibodies enrich for CRC CICs ...... 55
2.5.8 Integrin α7 antibody enriches for other types of stem cells ...... 55
2.5.9 Staining with antibody has no functional effect ...... 57
2.5.10 Integrin β6 is required for sphere formation ...... 61
2.5.11 CellectAb antibodies are species cross reactive ...... 61
2.5.12 CellectAb antibodies bind structural epitopes ...... 63
2.5.13 CellectAb can identify protein-protein interaction partners of targets ...... 66
2.6 Discussion ...... 69
2.7 Conclusion ...... 71

Chapter 3 ...... 73

3 Genetic Screens for Adherent Cell Growth Identify ITGAV as a Central Regulator of Cell State Preference ...... 73
3.1 Contributions ...... 73
3.2 Abstract ...... 75
3.3 Introduction ...... 76
3.3.1 Functional genetic screens ...... 76
3.3.2 CRISPR-Cas9 pooled screening ...... 76
3.3.3 Cell culture systems ...... 77
3.3.4 Modeling the microenvironment ...... 77
3.3.5 Role of the ECM in cell fitness ...... 78
3.3.6 Rationale and Approach ...... 78
3.4 Methods ...... 79
3.4.1 Computational identification of adherence fitness genes ...... 79
3.4.2 Identifying context fitness adhesion genes using published data ...... 83
3.4.3 Cell culture ...... 83
3.4.4 CRISPR-Cas9 Extracellular Matrix Substrate Screens ...... 84
3.4.5 Generation of CRISPR Knockout cell lines ...... 86
3.4.6 Proliferation assays ...... 87
3.4.7 Adherence assays ...... 87
3.4.8 Immunofluorescence ...... 88
3.4.9 RNAi ...... 88
3.4.10 Nutrient supplementation experiments ...... 88
3.4.11 Defined media experiments ...... 89
3.4.12 Flow cytometry/FACS ...... 89
3.4.13 Transmission Electron Microscopy (TEM) Imaging ...... 90
3.4.14 Generation of antibodies against αVβ5 ...... 90
3.4.15 Fab-phage cellular ELISAs ...... 90
3.4.16 Expression and purification of IgG ...... 91
3.4.17 Validation of IgG specificity ...... 91
3.4.18 Antibody treatment ...... 91
3.5 Results ...... 117
3.5.1 Systematic identification of adherent-related fitness genes (AFGs) ...... 117
3.5.2 Genome-wide genetic screens identify ECM-dependent fitness genes ...... 121
3.5.3 Parallel approaches identify core processes regulating adherent cell state ...... 125
3.5.4 Loss of integrin αV results in sphere formation that is masked by ECM ...... 125
3.5.5 αVKO cells grow as spheres independent of genotype ...... 130
3.5.6 Pyruvate modulates cell state in αVKO cells ...... 134
3.5.7 Availability of electron acceptors determines cell state in αVKO cells ...... 134
3.5.8 Availability of glutathione and biotin determine cell state in αVKO cells ...... 140
3.5.9 Multiple αV-containing heterodimers contribute to the phenotype of αVKO cells ...... 142
3.5.10 Knocking out many integrin heterodimers causes non-spheroid suspension growth ...... 145
3.5.11 Integrin targeted biologics recapitulate genetic model ...... 151
3.6 Discussion ...... 154
3.7 Conclusion ...... 159

4 Chapter 4: Discussion and Conclusions ...... 160

References ...... 166

List of Tables

Chapter 1:

Table 1 – Solid tumour CIC markers…………………………………………………………………….10

Table 2 – ECM related diseases……………………………………………………………………….....26

Chapter 2:

Table 3 – CellectAb identifies many potential colorectal CIC binders …………………………………45

Table 4 – AN01, AN02 and AN03 expression patterns across cancer cells from many tumour types……………………………………………………………………………………………………...48

Table 5 – IP-MS results 1- Potential targets of AN01, AN02, and AN03…………………………...…..52

Table 6 – IP-MS results 2- AN01 pulls down integrin α7β1 in complex with CD98…….……………..67

Chapter 3:

Table 7 – 52 Adherence-related fitness genes (AFGs) ………………………………...……………..…93

Table 8 – αVKO Sanger sequencing data confirmation………………………………………....……..110

Methods Tables:

Table M1 – Chapter 2 overexpression plasmids……………………………………………...….………36

Table M2 – Chapter 2 RNA interference (RNAi) reagents….…………………………………..………38

Table M3 – Chapter 2 CRISPR reagents……………………………….………………………………..39

Table M4 – Chapter 2 PCR primers…………………………………………………………..…………40

Table M5 – Chapter 3 CRISPR reagents………………………………….……………………………..87

Table M6 – Chapter 3 RNAi reagents………………………………………………………...…………88

Table M7 – Chapter 3 antibodies………………………………………………………………………...89


Supplementary Tables: (electronic only)

Table S1 – Full Illumina sequencing results from CellectAb on POP92 CICs (expanded version of Table 3)

Table S2 – Full IP-MS results for AN01, AN02 and AN03 scIgG (expanded version of Table 5)

Table S3 – Full IP-MS results for AN01 Flag-tagged and Avi-tagged Fabs (expanded version of Table 6)

Table S4 – Matrix of published CRISPR screen data binarized for fitness effect

Table S5 – Log2 fold change matrix for ECM CRISPR screens

Table S6 – Log2 differential fold change matrix for ECM CRISPR screens relative to TCP control


List of Figures

Chapter 1:

Figure 1 – Cancer is heterogeneous…..…………………………………………………………..3

Figure 2 – Integrins “integrate” adhesive and signaling processes across the plasma membrane……………………………………….………………………………………………..16

Chapter 2:

Figure 3 – Phage-Fab library design and CellectAb selection strategy …………………………30

Figure 4 – AN01, AN02 and AN03 bind selectively to the CD133+ presumptive CIC population ……………………………………………………………………………………………………47

Figure 5 – AN01, AN02 and AN03 do not bind CD133, and recognize distinct cell surface targets ..…………………………………………………………………………………………..50

Figure 6 – Validation of cell surface protein targets of AN01, AN02 and AN03 ……...…………………………………………………………………………………………….54

Figure 7 – AN01 and AN03 enrich for self-renewing CICs ………………...…………………..56

Figure 8 – AN01 (integrin α7) marks human muscle stem cells ………………………….….…58

Figure 9 – AN01 (integrin α7) marks human glioblastoma (GBM) stem cells………………….59

Figure 10 – Antibody binding is not cytotoxic……………………………………………....…..60

Figure 11 – Integrin β6 is required for sphere formation………………………………………..62

Figure 12 – AN01 and AN03 are species cross-reactive………………………………………...64

Figure 13 – AN03 is specific for integrin β6 in IHC and co-localizes with TGFβ activation….…………………………………………………………………………………...…65

Figure 14 – AN01 does not bind directly to CD98, however CD98 is co-IPed with α7β1 …..…68


Chapter 3:

Figure 15 – Data mining reveals Adherence-related Fitness Genes (AFGs)………………..……81

Figure 16 – Parallel genome-wide CRISPR screens on ECM components……………..……….97

Figure 17 – ECM masks or sensitizes hundreds of genetic dependencies in HAP-1 cells..…...... 99

Figure 18 – AFGs and ECM-dependent genetic dependencies converge on cellular processes, including the integrin adhesome…….……………………………………………………….…101

Figure 19 – Dependence on ITGAV is masked by all tested ECM components in HAP-1 cells ……..……………………………………………………………………………………….…...103

Figure 20 – Loss of ITGAV leads to sphere growth state, regardless of TP53 status…………..106

Figure 21 – Dependence on ITGAV is masked by ECM in HCT116 CRC cells…………....….108

Figure 22 – Loss of integrin αV leads to sphere formation in many cell backgrounds……..….112

Figure 23 – Pyruvate is required for sphere growth state……...……………………….….…...113

Figure 24 – Exogenous electron acceptors are required for sphere growth state…….…...…....114

Figure 25 – In a defined media system, glutathione and biotin are required for sphere growth state..…………….……………………………………………………………………………...116

Figure 26 – ITGAV knockout phenotype is mediated by loss of multiple integrin αV heterodimers..……….…………………………………………………………………………..118

Figure 27 – Integrin heterodimers determine cell growth state in HAP-1……….……………..121

Figure 28 – Transmission Electron Microscopy (TEM) reveals cell junction and morphology abnormalities associated with integrin loss …………………………………………...... ……..123

Figure 29 – Loss of all αV- and β1-integrin heterodimers leads to suspension growth state……………………………………………………………………………………………..124

Figure 30 – Effects of integrin inhibitory antibodies are masked by ECM………………….....126


Figure 31 – A model of cell growth state determination by integrin heterodimers and metabolic conditions…………………...……………...…………………………………………………...127


List of Abbreviations

7AAD – 7-Aminoactinomycin D

AC133 – mouse monoclonal antibody against CD133 antigen

aFC – Absolute log-2 fold change

AFG – Adherence-related fitness gene

α – alpha

α1KO – Integrin α1 (ITGA1) genetic knockout

α2KO – Integrin α2 (ITGA2) genetic knockout

α3KO – Integrin α3 (ITGA3) genetic knockout

α4KO – Integrin α4 (ITGA4) genetic knockout

α5KO – Integrin α5 (ITGA5) genetic knockout

α6KO – Integrin α6 (ITGA6) genetic knockout

α7KO – Integrin α7 (ITGA7) genetic knockout

α9KO – Integrin α9 (ITGA9) genetic knockout

αVKO – Integrin αV (ITGAV) genetic knockout

αVβ1KO – Integrin αV (ITGAV) and Integrin β1 (ITGB1) genetic knockout

AKB – α-ketobutyrate

ALT – Alanine transaminase

APC – Allophycocyanin

ATP – Adenosine triphosphate


β – beta

β1KO – Integrin β1 (ITGB1) genetic knockout

β3KO – Integrin β3 (ITGB3) genetic knockout

β4KO – Integrin β4 (ITGB4) genetic knockout

β1β5KO – Integrin β1 (ITGB1) and Integrin β5 (ITGB5) genetic knockout

β5KO – Integrin β5 (ITGB5) genetic knockout

β6KO – Integrin β6 (ITGB6) genetic knockout

BF – Bayes Factor

bFGF – basic fibroblast growth factor

BiTE – Bi-specific T-cell engager

Ca2+ – Calcium ion

CAF – Cancer-associated fibroblast

CAR-T cell – Chimeric antigen receptor T-Cell

Cas9 – CRISPR associated protein 9 from Streptococcus pyogenes

CD133 – Cluster of differentiation molecule 133, gene symbol PROM1

CD98hc – heavy chain of CD98 heterodimer, gene symbol SLC3A2

CD98lc – light chain of CD98 heterodimer, gene symbol SLC7A5

CDR – Complementarity determining region

CDRH3 – Region three of the heavy chain of the complementarity determining region

CDRL3 – Region three of the light chain of the complementarity determining region


CFSE – Carboxyfluorescein succinimidyl ester

CIC – Cancer Initiating Cell

CML – Chronic myeloid leukemia

CRC – Colorectal carcinoma

CRISPR – Clustered Regularly Interspaced Short Palindromic Repeats

CSC – Cancer Stem Cell

DCA – Dichloroacetic acid

DDR – Discoidin domain receptor

dFC – Differential log-2 fold change

DM – Serum-free defined medium

DNA – Deoxyribonucleic Acid

ECM – Extracellular Matrix

EDTA – Ethylenediaminetetraacetic acid

EGF – Epidermal growth factor

EMT – Epithelial to mesenchymal transition

ESC – Embryonic stem cell

FA – Focal adhesion

Fab – Fragment antigen-binding

FACIT – fibril associated collagens with interrupted triple helices

FACS – Fluorescence-Activated Cell Sorting


FAK – Focal adhesion kinase (gene symbol PTK2)

FBS – Fetal bovine serum

Fc – Fragment crystallizable region

FDR – False discovery rate

FITC – Fluorescein isothiocyanate

GFP – Green fluorescent protein

GO – Gene Ontology

GTP – Guanosine triphosphate

DAPI – 4’, 6-Diamidino-2-Phenylindole, Dihydrochloride

DB – Non-enzymatic dissociation buffer

DSC – Donnelly Sequencing Centre

ELISA – Enzyme-linked immunosorbent assay

fgKO – full gene knockout

GAG – Glycosaminoglycan

GBM – Glioblastoma multiforme

GSEA – Gene set enrichment analysis

GSH – Glutathione, reduced

HA – Hyaluronic acid

HEK293T – Human embryonic kidney 293 expressing large T antigen

HLA – human leukocyte antigen

IF – Immunofluorescence

IgG – Immunoglobulin G

ILK – integrin linked kinase (gene symbol ILK)

IP – Immunoprecipitation

iPSC – Induced pluripotent stem cell

IPTG – Isopropyl β-D-1-thiogalactopyranoside

ITH – Intratumoural heterogeneity

LAP – Latency-associated peptide

LAP-TGFβ – complex of LAP and TGFβ

LDH – lactate dehydrogenase

LiCl – Lithium chloride

Mg2+ – Magnesium ion

MHC – Major histocompatibility complex

MLC – Myosin light chain

mRNA – Messenger RNA

MS – Mass Spectrometry

mTOR – Mammalian target of rapamycin

mTORC2 – Mammalian target of rapamycin complex 2

NAD+ – Nicotinamide adenine dinucleotide oxidized

NADH – Nicotinamide adenine dinucleotide reduced

NADP+ – Nicotinamide adenine dinucleotide phosphate oxidized

NADPH – Nicotinamide adenine dinucleotide phosphate reduced

PBS – phosphate buffered saline

PCC – Pearson correlation coefficient

PCR – Polymerase chain reaction

PDK – Pyruvate dehydrogenase kinase

PE – Phycoerythrin

PFA – Paraformaldehyde

PG – Proteoglycans

PPP – Pentose phosphate pathway

qRT-PCR – quantitative real-time PCR

RAC1 – Ras-related C3 botulinum toxin substrate 1

RGD – Arginine-Glycine-Aspartic acid tripeptide motif

RIPA – Radioimmunoprecipitation buffer

RNA – Ribonucleic Acid

RNAi – RNA interference

ROC – Receiver operator characteristic

RTK – Receptor tyrosine kinase

SCC7 – Single cell clone 7 (α3/α6/α7 fgKO)

scIgG – Single-chain immunoglobulin G

SD – Standard deviation

SEM – Standard error of the mean

shRNA – short hairpin RNA

sgRNA – single guide RNA

TAA – Tumour-associated antigen

TBST – Tris-buffered saline with tween-20

TCP – Tissue culture-treated plastic

TFA – Trifluoroacetic acid

TGFβ – Transforming growth factor β

Chapter 1

1 Introduction

1.1 Intratumoural heterogeneity

1.1.1 The War on Cancer

As the incidence of and death rates from various microbial and viral diseases have declined following the widespread use of vaccines and antibiotics, cancer has emerged as a leading cause of death worldwide [1]. When causative agents (bacteria, fungi, viruses, etc.) for such diseases were identified, it was relatively straightforward to target the responsible pathogen with minimal detrimental effect on host cells. Indeed, in many cases, nature has provided such antimicrobials. However, the journey to curb cancer deaths has not been so smooth. In 1971, President Richard Nixon signed into law the National Cancer Act. This act is generally viewed as the beginning of the ‘war on cancer’. Now, 46 years later, this war shows no signs of slowing. Despite significant advances in detection and treatment of certain early stage cancers, cancer in general remains a major cause of death in the developed world. The collective efforts of countless intelligent and inspired people in both the public and private sectors in dozens of countries have not made significant progress in targeting late stage disease. However, recent advances in genomic and therapeutic technologies have renewed hope that we may yet win this war.

1.1.2 Cancer is Heterogeneous

It is of note that particular cancers are, in fact, cured every day. For example, some haematological malignancies can be cured with allogeneic stem cell transplants, and in early stage solid tumours, surgical intervention with or without chemo- or radiotherapy is sometimes curative. However, for patients with advanced disease, metastasis and/or cancers at inoperable sites, slowing disease progression and prolonging survival is too often the best hope. Arguably, the major reason so little progress has been made towards eradicating advanced disease is that cancer is incredibly heterogeneous. It is not a single disease, but rather a collection of more than one hundred unique diseases. This figure relates only to the number of different tissue types (e.g. brain cancer compared to breast cancer) or subtypes (e.g. glioblastoma multiforme compared to medulloblastoma brain tumours). Beyond these tissue and cell type classifications, the underlying somatic genetic and external environmental differences between patients add exponentially to the collective between-patient, or intertumoural, heterogeneity (Figure 1A). Cancer heterogeneity goes even further, with cancer cells from within the same tumour often showing extensive intratumoural heterogeneity (hereafter ITH). This cell-to-cell variability includes genetic, epigenetic and microenvironmental heterogeneity between cancer cells in a single patient (Figure 1B). These levels of ITH contribute to functional ITH, wherein subpopulations of cancer cells vary in proliferation rate, differentiation status, self-renewal, and ability to seed a xenograft tumour, metastasize and/or survive various therapeutic interventions. This functional ITH is therapeutically relevant, as curative therapy requires eradicating all populations of tumour cells. That advanced disease often displays extensive ITH raises the questions of how and why this heterogeneity arises. There are three non-mutually exclusive hypotheses to help explain the existence of ITH: 1) cellular differentiation state drives heterogeneity (i.e. cancer stem cells / epigenetics); 2) microenvironmental factors drive heterogeneity (e.g. aberrant ECM, immune infiltrate, metabolic conditions, hypoxia); and 3) genetic factors drive heterogeneity (i.e. clonal evolution). These three theories are rooted in history and have evolved with increased understanding of the molecular mechanisms underlying cancer development and progression.

1.1.3 Differentiation state ITH and Cancer Stem Cells (CSCs)

Before cancer was characterized as a genetic disease, or the field of epigenetics was conceived, other hypotheses were put forward to explain the etiology of cancer. The ancient Greeks first posited that cancer was caused by an imbalance of humors, specifically an excess of black bile. This belief was maintained for over 2000 years until, in the mid-1800s, the similarities between embryonic development and cancer were noted. Rudolf Virchow, often designated the father of pathology, observed that certain tumours termed teratomas (from the Greek, “pertaining to monsters”) comprised heterogeneous mixtures of fetal and adult tissue that resembled monstrously malformed organs [2, 3]. This led to the “embryonal rest theory of cancer”, which purported that undifferentiated embryonic tissue which persisted into adulthood was responsible for cancer initiation [4]. It was thought that a change in the environment of the surrounding tissue would allow this embryonic tissue to “wake up” and resume proliferation, leading to tumour formation. A similar yet distinct theory was also put forward, proposing that cancer was the result of dedifferentiation of adult cells back to an embryonic state [5]. Emerging studies around this time showing that cancer could be caused by chemicals, infectious parasites and loss of inhibitory influences on body tissue were considered supportive of the dedifferentiation hypothesis [5, 6]. By the turn of the 20th century, the embryonal rest and dedifferentiation theories were generally discredited [7]. However, in the early 1970s, there was a resurgence of interest in cancer as a developmental disease, and teratocarcinoma researchers established the basis for the modern stem cell theory of cancer. The seven initial principles of the stem cell theory of cancer, paraphrased from Stewart Sell [3], are as follows:

[Figure 1. Panel A: Intertumour Heterogeneity (tissue type; individual tumours). Panel B: Intratumour Heterogeneity (genetic; epigenetic; microenvironmental: ECM, nutrients, pH, gases) and Functional ITH / Cancer Initiating Cell (CIC) Model (engraftment, sphere, therapy).]

Figure 1 – Cancer is heterogeneous. A) Intertumoural heterogeneity encompasses heterogeneity by tissue type of origin including brain, breast, lung and colon (top) and between individual patients (bottom), including tumour subtype, driver mutations, somatic differences, etc. B) Intratumoural heterogeneity (ITH) refers to differences between individual cells in an individual tumour. It includes, but is not limited to, genetic, epigenetic and microenvironmental ITH. Microenvironmental ITH includes other cell types present, the extracellular matrix, availability of nutrients, growth factors and gases (top). These various types of intratumoural heterogeneity contribute collectively to functional heterogeneity, which can be tested experimentally using the cancer initiating cell (CIC) model (bottom). Presumptive CICs are isolated and tested for functional properties including sphere formation, engraftment into immunocompromised mice, treatment with chemo- or radiotherapy, etc.

i) Cancers arise from tissue stem cells;
ii) Location in an abnormal place allows cancer stem cells to express malignant phenotypes;
iii) Cancers contain the same cell populations as normal tissue hierarchies (stem cells, transit amplifying/progenitor cells and differentiated cells);
iv) Cancers can be transplanted by stem cells, not transit amplifying cells;
v) Products of the cancer cells that reflect states in fetal development can be used as markers for diagnosis, prognosis and treatment (i.e. onco-developmental markers);
vi) Malignant cells can become benign (i.e. differentiation therapy);
vii) Therapy targets transit amplifying cells and cancer regrows from resistant cancer stem cells.

Sell’s controversial contentions have been hotly debated in the literature, and many have in essence been debunked. Evaluating the validity of each point individually is beyond the scope of this thesis. Suffice it to say that, in response to conflicting results, the field has simplified its contentions, and the current CSC hypothesis states only that stem-like cancer cells drive tumour progression and that only CSCs can self-renew and differentiate into bulk cells [8]. While the CSC model generally fits well with hematological malignancies, solid tissue tumours have shown inconsistent results. These include observations of CSC plasticity (i.e. a non-CSC can become a CSC in some contexts), which calls into question the relevance of a hierarchical model. There is much debate as to the relevance of cancer as a stem cell/differentiation state disease; however, it is undeniable that most tumours contain more than one functionally distinct population of cells. Whether or not these appear to be stem-like depends heavily on the definition of what a CSC is, and may be highly context dependent in terms of tumour type and other factors. In order to examine this further, we must dig into the molecular mechanisms that underlie normal or pathological stem cell states, namely epigenetic gene regulation and the cell niche.


1.1.4 Epigenetics as a molecular mechanism for ITH

Epigenetics is the study of heritable changes in gene function independent of changes in gene sequence. While all somatic cells in an individual share a common genome, cells are programmed to behave a certain way based on their epigenome. This is highlighted by the differences between pluripotent stem cells and differentiated cell types. The mechanisms contributing to epigenome alteration include DNA methylation, histone posttranslational modifications and chromatin remodeling. As these processes have profound effects on gene expression and cell function, epigenetic changes are common in human cancer [9]. Epigenetic ITH involves heterogeneous changes in gene expression or function between cells within a tumour, without a change in DNA sequence. Importantly, in 2010 Sharma and colleagues identified a rare subpopulation of reversibly drug-tolerant cells, driven by changes in the epigenome, demonstrating the importance of epigenetic changes in tumour cell drug resistance [10]. Evidence is accumulating that these so-called drug-tolerant persisters may drive drug resistance in many cancer types (reviewed in [11]). This description of rare, therapy-resistant cells may help explain the molecular mechanism behind the CSC hypothesis. Epigenetic alterations clearly have a large impact on cell function; however, this does not happen in isolation in either normal or diseased tissues. Indeed, while most of Sell’s points have been debated or debunked, his second principle, stating that the location (i.e. microenvironment or niche) impacts the phenotype and function of a given cell, is highly supported by the literature.

1.1.5 Tumour microenvironmental causes of ITH

The microenvironmental theory of cancer was first described as the “seed and soil” hypothesis, wherein the cell is the seed, and the surrounding environment the soil, and both need to be appropriate for cancer to initially develop, or a distant metastasis to arise. In modern language, we call the soil the tumour microenvironment. The microenvironment is the physical location in which cancer cells reside (i.e. the cell niche), which includes both cellular and non-cellular components, discussed in more detail below. The interactions of tumour cells with each other and their microenvironment are critical to biological processes including growth, wound healing, fibrosis, and cancer development and metastasis [12]. The inherent nature of a disorganized solid tumour means there is ITH at the spatial level. Thus, where a cell is located determines its interactions with cellular and non-cellular components of the microenvironment. Through many interdependent molecular mechanisms this can determine signaling pathway activation, metabolic state, gene expression, cell phenotype and cell function.

The cells of the microenvironment include fibroblasts, immune cells, and endothelial cells of the vasculature. Many studies have focused on cellular components of the microenvironment, particularly on immunological infiltrates such as macrophages (reviewed in [13]) and lymphocytes (reviewed in [14]), or on cancer-associated fibroblasts (CAFs; reviewed by Kalluri and Zeisberg in [15]). These ‘support cells’ may interact directly with tumour cells through cell surface receptors, or may secrete factors that modulate the biochemical and physical nature of the niche. For example, CAFs secrete extracellular matrix (ECM) components that provide a substrate for cancer cells to attach to or migrate along, and may secrete paracrine factors that support growth and proliferation.

The non-cell components include all growth factors and structural proteins secreted by these microenvironmental cells, as well as available nutrients, gases and pH provided by the vasculature. ITH is generated at this level by the heterogeneous distribution of tumour stromal cells, and by the nature of the three dimensional tumour structure. Solid tumours generally have disorganized, leaky and/or insufficient vasculature. This means that some cells may have more limited access to nutrients and gases, and therefore may suffer nutrient deprivation or hypoxic stress, even to the point of necrosis. This in turn can initiate changes in gene expression, including a shift in epigenetic state leading to a phenotypic and functional cell response. Many studies have also focused on angiogenesis, hypoxia and other biochemical characteristics of the solid tumour microenvironment (reviewed in [16-18]). It is increasingly clear that the cell and non-cell components of the microenvironment are highly interrelated and collectively contribute to ITH.

1.1.6 Genetic ITH

Although epigenetics and microenvironment clearly have a profound impact on cancer ITH, cancer is also a genetic disease. In 1976, Dr. Peter C. Nowell first proposed the clonal evolution model of carcinogenesis to explain how a heterogeneous tumour could arise from a single cell. He contended that genetic instability generates diversity, which when acted on by natural selection, drives cancer progression [19]. Just over a decade later, Fearon, Hamilton and Vogelstein discovered that cancers arising in females display monoclonal patterns of X-inactivation [20]. As females generally show mosaicism in X-inactivation, this is consistent with a single cell origin for cancer. The next year, they reported that genetic alterations accumulate over time as cancer progresses [21]. Together, these findings are consistent with a monoclonal nature of cancer with subsequent evolution of genetic heterogeneity. At this time, it was generally thought that clonal evolution was punctuated by sweeps wherein previous genetic sub-clones were largely pushed out by newly arising, better adapted clones.

It was not until 2012 that a seminal paper published by Gerlinger, Swanton and colleagues first conclusively identified ITH generated by branching evolution of genetic sub-clones in renal cell carcinoma (RCC) [22]. They sequenced multiple spatially separated samples from within primary RCC and their associated metastatic sites. Phylogenetic reconstruction revealed distinct branches that underwent convergent evolution. This, and subsequent publications by Swanton and others, report the presence of multiple genetic sub-clonal populations across many tumour types, confirming the widespread presence of genetic ITH [23-26]. Indeed, Andor and colleagues estimate that 86% of tumours across 12 tumour types comprise at least two genetic sub-clones. The presence of multiple genetic sub-clones and high levels of ITH have been associated with poor survival across many cancer types [26, 27]. Given that genomic instability is a hallmark of cancer [28], it is unsurprising that most cancers comprise multiple genetic sub-clones. However, the presence of sub-clones, each carrying a different spectrum of mutations, does not necessarily imply a difference in function or phenotype between cells. Nevertheless, genetic ITH has important clinical ramifications, as sub-clonal mutations may exist in driver genes. Some examples where ITH has been observed in druggable driver genes (known as actionable driver mutations) are BRAF-V600E, IDH1-R132H and epidermal growth factor receptor EGFR-L858R [29]. The presence of multiple sub-clones with different mutations in actionable driver genes may give rise to resistance to targeted therapy. Curative cancer treatment may depend on eradication of all clones. Genetic ITH, especially in actionable targets, must therefore be taken into consideration.

A classic example of this is EGFR inhibitor resistance. Many lung cancers present with activating driver mutations in EGFR [30]. These tumours depend on this oncogene, and are thus sensitive to the EGFR inhibitors gefitinib and erlotinib. However, the secondary mutation T790M (i.e. threonine 790 to methionine), the so-called ‘gatekeeper mutation’, leads to resistance. This mutation can sometimes be detected at a sub-clonal level in the primary tumour and predicts impending resistance [30]. It is also notable that some treatment regimens, including mutagenic DNA-damaging therapies, lead to increased genetic ITH and therapy resistance. For example, in the case of EGFR, the T790M mutation is typically undetected before treatment and emerges after therapy [30]. It remains formally possible that the mutation was present in the treatment-naïve tumour at levels below the limit of detection, that the mutation occurred sporadically, or that the therapy itself induced the mutation [30]. While current methods may underestimate the degree of genetic heterogeneity, emerging techniques such as single cell sequencing are likely to provide a clearer picture of the extent of cellular heterogeneity within a given tumour. Overall, genetic ITH is arguably the easiest to measure and best understood type of ITH, and is of demonstrated clinical importance.

1.1.7 A functional subpopulation of Cancer Initiating Cells (CICs)

Regardless of the terminology used or explanatory hypotheses proffered, cancer is composed of heterogeneous cell populations with varying potentials for expansion. Arguably, this functional ITH is what ultimately needs to be understood and targeted. Given the current understanding of genetic, epigenetic and micro-environmental mechanisms of ITH, the existence of a CSC may not be of primary importance. This thesis will focus on specific, measurable, functional properties of cancer cells, and I will use the term cancer-initiating cell (CIC) to mean a cell with the ability to form a spheroid in an in vitro surrogate assay for three-dimensional tumour growth, or to establish a tumour in vivo in an immunocompromised mouse. While the conceptual difference between a CIC and CSC is largely semantic, this functional definition allows for clearer hypothesis testing. In keeping with this, below I outline strategies used to define and study CICs.

1.1.8 Culture methods for CICs

In order to characterize and study functional subpopulations of CICs, they must be enriched to a measurable level of purity. To do this, researchers have generally employed two major strategies: 1) select for CICs using culture methods, or 2) prospectively identify CICs using markers. In general, culture conditions that enrich for a tumourigenic state involve growth under serum-free conditions with defined media in tissue culture vessels that do not support attachment and adherent growth. It is worth noting that some stem cell/CSC/CIC models use extracellular matrix (ECM) support. For example, human embryonic stem cells (ESCs) or induced pluripotent stem cells (iPS cells) are often supplied with irradiated fibroblast feeder cells or an ECM substrate such as vitronectin. While there are many individual formulations used by different labs for different cell types, most defined media contain a carbon source, amino acids, vitamins, a pH indicator, growth factors including epidermal growth factor (EGF) and basic fibroblast growth factor (bFGF), and various complex supplements developed for stem cell culture [31-33]. Controlling the composition of the cell microenvironment in terms of metabolites and access to extracellular matrix is a common thread in stem cell/CSC/CIC research. This suggests that metabolic conditions and access to ECM may be important determinants of these cell states. In Chapter 3, I will elaborate on this idea and discuss the role of various components common to these defined media.

1.1.9 Cell surface markers of CICs

Often combined with CIC-enriching culture conditions, antibodies against specific cell surface epitopes (i.e. markers) have been extensively used to isolate CICs. For these types of experiments, cells are separated based on their expression of specific marker(s), then assessed for various functional properties, including their ability to form spheres or to initiate xenograft tumours in immunocompromised mice. There has been intense interest in discerning targetable signaling pathways, processes and gene requirements that discriminate functional CICs from bulk tumour cells. Various approaches to study the transcriptome, drug sensitivity, genetic dependencies etc. have been undertaken to this end. The ultimate goal of using these markers is to either learn more about CICs in order to target them, or to use the CIC marker itself to target these cells.

The pentaspan transmembrane glycoprotein, CD133, is one example of a cell surface marker used to enrich for CICs in a variety of cancer types (see Table 1 for a non-exhaustive list of CIC markers) [34]. Unfortunately, there is no one-size-fits-all CIC marker. In many cases, CIC markers are used in combination; for example, in breast cancer CD44+/CD24low cells have been described as CSCs [35]. It is also of note that non-cell surface CIC markers exist, such as ALDH1, which marks cells based on a specific enzymatic activity [36]. Despite initial successes applying CIC markers generally, evidence has accumulated that markers often show highly context-specific utility [34]. Indeed, not all cells that express a CIC marker display CIC characteristics. For example, while some CD133 positive cells may function as CICs, not all CD133 positive cells do, and CD133 negative cells may also be CICs. Furthermore, as the term suggests, some markers only “mark” these cells while others have known mechanistic function (see Table 1 for examples). With the ultimate goal of eradicating these clinically relevant cells, efforts are being made to translate CIC-targeted agents into the clinic.

Table 1 - Solid tumour CIC markers

A non-exhaustive look at commonly used CIC markers, the cancer types they are used for, as well as the functional significance and tested targeting strategies. Modalities that use the marker to deliver a non-specific drug to the cells are referred to as "as a marker", and reagents aiming to directly block the function of the marker protein are referred to as functional.

| Marker (localization) | Cancer type(s) | Functional significance? | Targeting strategy |
|---|---|---|---|
| ALDH1 [36] (intracellular) | Breast, Colon, Head and Neck, Lung, Melanoma, Pancreatic, Prostate | Yes, enzymatic activity | Functional |
| CD24 [35] (surface) | Breast, Colon, Liver, Melanoma, Ovarian, Pancreatic, Prostate | ?, high or low expression in CSCs | ? |
| CD44 [35] (surface) | Breast, Colon, Head and Neck, Liver, Ovarian, Pancreatic, Prostate | Yes, multifunctional receptor | Functional |
| CD133 [37] (surface) | Breast, Colon, Glioma, Liver, Lung, Melanoma, Ovarian, Pancreatic, Prostate | Largely unknown, in some cases through regulation of β-catenin | As a marker/Functional |
| Integrin α6 [38] (surface) | Breast, Glioma, Prostate | Yes, laminin receptor | Functional |
| LGR5 [39] (surface) | Colon | Yes | As a marker, Functional |

1.1.10 Discovery of novel markers to understand and target CICs

The causes and underlying mechanisms of genetic, epigenetic, and micro-environmental ITH are clearly complex, and they act in an interdependent way. There are many possible strategies to unravel this complexity, and in this thesis I will take two very different approaches: 1) I will describe a new technology to identify cell surface epitopes on a population of CICs in order to better understand this functional subpopulation; and 2) I will delve into genetic dependencies involved in tumour cell matrix interaction. More specifically, in chapter two, I will describe a novel unbiased barcoded antibody-based methodology that I developed and applied to a model of colorectal CICs to simultaneously generate antibodies against them and discover new markers. The output of this was a series of antibodies, two of which were specific for unique subunits from the integrin family of cell adhesion molecules that act as receptors for the extracellular matrix (ECM), thus implicating the ECM microenvironment in regulation of a CIC phenotype. In chapter three, I describe a functional genetic approach to investigate genetic interactions with ECM components. Before delving into my results, I will examine in more detail what the ECM is, how the cell senses it, and what is known about the function of these interactions.

1.2 The Extracellular Matrix (ECM)

1.2.1 Cell Adhesion

Interaction of cells with each other and their surrounding structural support is necessary for the existence of multicellular organisms. Cells do not simply ‘stick’ together, but rather are organized precisely into very diverse and highly distinctive patterns. There is a wide variety of cell adhesion mechanisms that facilitate the assembly of cells into the overall architecture of each tissue. These involve dynamic cell-cell and cell-matrix connections at the plasma membrane, and require linkage to the cytoskeleton. Much research has delineated types, functions and pathologies associated with normal and aberrant cell-cell contact (reviewed in [40-42]). Below, I describe major themes of how cells interact with the ECM, as this will be much of the focus of Chapter 3.

1.2.2 Function of the ECM

The ECM is a collection of soluble and insoluble molecules secreted by cells to provide structural and biochemical support. There are several core ECM components and hundreds of more minor ECM players (collectively the ‘matrisome’) encoded in the genome, and their expression is highly regulated in order to meet the complex and dynamic requirements of a particular tissue. Once thought to act as a passive cushion, the ECM is now appreciated to play a complex and crucial role in multicellular organism development, tissue homeostasis and disease. As such, the composition of the ECM is pivotal to cellular phenotype and function through regulation of cellular adhesion, morphology, motility and signaling [12, 43, 44]. Recently, it has also become clear that mechanical characteristics of the ECM, including stiffness and deformability, also provide input to cell behavior. The role of ECM components in cell adhesion and in signaling to cells through adhesion receptors, including integrins, has been extensively reported [40, 44-47]. Beyond the direct interaction of ECM proteins themselves with cells, it has become clear that the ECM acts as a reservoir for soluble growth factors including the FGF, VEGF, Wnt and TGFβ families of signaling molecules [47]. As such, the ECM regulates the physical distribution, activation status and presentation of soluble growth factors to cells. Thus, the ECM integrates complex adhesion and soluble signaling in an organized manner to control cell phenotype (reviewed in [47]). Beyond complex and dynamic differences in ECM between normal tissues, the ECM evolves with age and is drastically reorganized in wound healing or diseased tissues. The importance of ECM is exemplified by the wide range of syndromes, from minor to severe, that arise from genetic abnormalities in ECM-associated proteins (discussed below). It has been argued that signals from ECM proteins are at least as important as soluble signals in governing the processes of differentiation, proliferation, survival, polarity and migration of cells during development and disease [47].

1.2.3 Major ECM components

The ECM has two major forms, the basement membrane and the interstitial matrix. Very generally, the basement membrane is a specialized form of ECM to which cells anchor, and the interstitial matrix is the three dimensional, porous ECM that surrounds and supports cells. The hundreds of ECM components can be divided into three main classes of macromolecules: fibrous proteins, proteoglycans and glycosaminoglycans.

The main fibrous proteins are collagens, elastins, and fibronectin. Collagens constitute the main structural element of the ECM, making up approximately 30% of the total protein mass of a multicellular animal [12]. Twenty-nine types of collagen have been described, falling into three categories based on their structure. Fibrillar collagen is by far the most abundant in the body, but non-fibrillar and fibril-associated collagens with interrupted triple helices (FACIT collagens) also play critical roles in tissue integrity and function [48]. Fibrillar collagen forms a triple-stranded helix that assembles into fibrils and further into networks of fibrils. This provides the tensile strength of tissues, and regulates cell adhesion, migration and tissue development. Most collagen is secreted and organized into sheets and cables by tissue-resident fibroblasts. Within an individual tissue, there is generally a mixture of collagens with one type predominating. Elastin works with collagen to provide stretch and recoil to tissues. Fibronectin is secreted as a dimer which contains binding sites allowing association with other fibronectin dimers, collagen, heparin and integrins. In this way, fibronectin plays a role in organizing the ECM and mediating cell attachment and function. Cellular traction forces through the cytoskeleton can stretch and unwind fibronectin, revealing cryptic sites for cell adhesion receptors such as integrins.

Proteoglycans (PGs) are composed of glycosaminoglycan (GAG) chains covalently linked to a protein core. PGs may be present in the basement membrane, or bound to the cell surface. GAG chains are unbranched polysaccharide chains that may be sulfated (chondroitin sulfate, heparan sulfate, keratan sulfate) or non-sulfated (hyaluronic acid) [12]. Basement membrane bound PGs include perlecan, agrin and collagen type XVIII. Cell surface PGs include syndecans and glypicans [12]. These molecules are extremely hydrophilic, allowing hydrogel formation that enables the ECM to withstand high compressive forces. Additionally, GAGs, such as hyaluronan, may have no core protein [12].

1.2.4 The Matrisome

While collagens, laminins, fibronectin and the above-mentioned proteoglycans represent major components of the ECM, there are more than 1,000 ECM and ECM-associated proteins that make up what is known as the ‘matrisome’. These have been compiled from in vivo proteomic as well as in silico data [49]. Importantly, characteristic differences in ECM composition can be identified between tissues and tumours. The matrisome, combined with proteomic data from various tissue types and disease states, has allowed for derivation of matrix molecular signatures [49]. This, along with improved techniques to detect and quantify matrix proteins in various tissues [50], sets the stage to better understand the role of ECM in cell phenotype.

1.3 ECM receptors

1.3.1 Discoidin domain receptors

Discoidin domain receptors (DDRs) are non-integrin collagen receptors, and they serve both as adhesion molecules and receptor tyrosine kinases (RTKs) [51]. These receptors bind to both fibrillar and non-fibrillar collagens through the GVMGFO motif [51]. Through ligand specificity and kinase activity, DDRs function as sensors for ECM and regulate a wide range of cell functions including migration, proliferation and ECM homeostasis. There are two mammalian DDR genes (DDR1 and DDR2) that are widely expressed during development and adulthood. DDR1 expression is restricted to epithelial cells, whereas DDR2 shows mesenchymal specificity [51]. Unlike growth factor RTKs, DDRs undergo slow auto-phosphorylation, on the order of hours instead of minutes, in response to ligand binding [51]. These receptors can activate the ERK pathway, and other downstream signaling cascades. Also, through crosstalk with growth factor receptors or integrins, DDR1 and DDR2 can act as pro- or anti-tumourigenic receptors depending on tissue type and stage of cancer. They have also been implicated in developmental diseases, atherosclerosis, osteoarthritis and fibrosis [51].

1.3.2 Other ECM receptors

Besides integrins and DDRs, there are many well-described and emerging ECM receptors. Three examples are dystroglycan, syndecans and the multifunctional CIC marker, CD44. Dystroglycan, encoded by the DAG1 gene, is expressed as a heterodimer, with an extracellular α-subunit and a transmembrane β-subunit. These subunits are tightly associated with each other, and act as part of a larger dystroglycan complex to connect the ECM to the actin cytoskeleton. Dystroglycan has been studied mainly in muscle cells where its ligands include laminin, agrin and perlecan [52].

Syndecans have been reported to act as independent adhesion receptors, or as co-receptors for G protein-coupled receptors [53]. There are four mammalian syndecans, all transmembrane receptors whose extracellular region is modified by GAG chains. These chains can interact with a wide variety of molecules including growth factors, other adhesion receptors and ECM components. The syndecans have been implicated in some cancers as mediating proliferation, adhesion and motility [54], and they have also been reported to organize plasma membrane microdomains, thus regulating other ECM receptors [54].

CD44 is a cell surface glycoprotein that acts as a receptor for hyaluronic acid (HA) and can also interact with other ECM ligands such as collagens, osteopontin, fibrin and matrix metalloproteinases. CD44 plays a role in the hematopoietic system in lymphocyte activation and homing, as well as in solid tissues and cancer. Importantly, CD44 has also been implicated as a CIC marker in many tumour types (see Table 1).

1.3.3 Integrins

The most prominent of the ECM receptors are the integrin family of heterodimeric cell adhesion molecules. Integrins are found on all human cells, except normal erythrocytes, where they “integrate” cytosolic processes with the extracellular environment by acting as bi-directional signaling molecules (Figure 2A) [55]. The integrin family comprises 18 α- and 8 β-subunits which non-randomly pair to form 24 different heterodimers of varying ligand specificity and tissue expression (Figure 2B) [55, 56]. These heterodimers have specific, non-redundant functions, in part due to their affinity for various ligands, and in part due to specific and restricted expression patterns. In this way, integrin heterodimers can be sub-grouped by primary ligand specificity or by cell type expression. The RGD receptor group, as the name implies, primarily recognizes the tri-peptide Arginine-Glycine-Aspartic acid (RGD) sequence present in numerous ECM components including vitronectin and fibronectin. The collagen and laminin receptors primarily recognize and bind to collagens and laminins, respectively. Finally, the leukocyte-specific receptors are found only on hematopoietic lineage cells where they mediate specific cell-cell and cell-matrix interactions. Importantly, almost every heterodimer can bind to additional ligands beyond the primary specificity, as illustrated in Figure 2B.

1.3.4 Integrin signaling

Related to the role of integrins in mediating cell adhesion to various ECM ligands, integrins link cytosolic processes with the extracellular environment through bi-directional signaling [55]. In a process known as ‘outside-in’ signaling, integrins transduce information from the extracellular environment to modulate a variety of cell responses (Figure 2C) [55, 57].

[Figure 2, panels A-D: A) ECM ligands, integrins, structural proteins, kinases, signaling and the actin cytoskeleton; B) integrin subunit pairing and ligand specificity; C) outside-in signaling; D) inside-out signaling (inactive to active)]

Figure 2 – Integrins “integrate” adhesive and signaling processes across the plasma membrane. A) The extracellular matrix (ECM) is a collection of fibrous proteins and proteoglycans that serve as biochemical support for cells in a multicellular organism. The ECM includes collagens, laminins, fibronectin, etc. Integrins are cell adhesion molecules that serve as receptors for various ECM ligands. Integrins link the ECM to the actin cytoskeleton and downstream signaling. B) The integrin family comprises 18 α- and 8 β-subunits which pair non-randomly to make 24 heterodimers of varying ligand specificity. Heterodimer pairing is illustrated with a connecting wedge, and the colour of the wedge indicates ligand specificity. Light grey background shading indicates leukocyte/platelet-specific integrin heterodimers. C) Integrin adhesion and signaling activities are regulated by both outside-in and D) inside-out signaling events. For outside-in signaling, integrin ligation triggers increased ligand affinity, clustering and recruitment of various structural components including talin and kindlin, and kinases including integrin linked kinase (ILK), focal adhesion kinase (FAK) and Src. This leads to the formation of focal adhesions and activates downstream signaling through AKT, GSK3 etc., ultimately promoting survival, proliferation and growth, as well as cytoskeleton remodeling through CRK, RHO, RAC and MLC, and thus cell motility, spreading, migration and invasion. Conversely, for inside-out signaling, integrins are initially present in an inactive, bent conformation. Intracellular signaling events lead to talin activation by RIAM and active RAP, and subsequent association with the β integrin cytoplasmic tail. This interaction triggers the extension of the integrin heterodimer to a high affinity state, allowing ligand binding and subsequent signaling.

Specifically, upon ligand engagement, various structural components and kinases are recruited to the intracellular tail of the integrin. First, talin and kindlin associate with the β integrin cytoplasmic domain, establishing a structural link between the ECM and actin cytoskeleton [58]. Focal adhesion kinase (FAK) is recruited to this nascent adhesion complex where it autophosphorylates and recruits Src, resulting in activation of both kinases. FAK activity plays important roles in many cellular pathways and thus acts as a major signaling integrator linking ECM-engagement to signaling. In canonical integrin signaling, FAK/Src activates Crk which subsequently activates Ras-related C3 botulinum toxin substrate 1 (RAC1) and other kinases and pathways including PAK, JNK and NFκB [59]. Alternatively, FAK can recruit GRB2, which can activate RAS leading to activation of RAF-MEK-ERK or recruit Ras-proximate-1 (RAP1), which subsequently activates ERK and MAPK [59]. Integrin linked kinase (ILK) can also be recruited and activated by nascent adhesion complexes and stimulates phosphorylation of AKT and glycogen synthase kinase 3 (GSK3) [60]. Importantly, integrin ligation promotes clustering, which allows for amplification of the outside-in signals. Activation of these various pathways can ultimately promote survival, proliferation and growth, as well as cytoskeleton remodeling through RHO, RAC, Arp2/3, the WAVE complex and myosin light chain (MLC) resulting in cell spreading, motility and invasion [61].

During the related but opposing process of “inside-out” signaling, integrins are initially present in an inactive, bent conformation and are activated from inside the cell (Figure 2D) [58, 62]. This occurs when intracellular signaling events lead to talin activation by RIAM and active RAP, followed by association with β integrin cytoplasmic tails [63]. This interaction triggers a large conformational change and extension of the integrin heterodimer to a high affinity state, allowing ligand binding and subsequent signaling (Figure 2D).

Through both inside-out and outside-in signaling, ECM-integrin interactions function to exert control over many cellular processes including adhesion, signaling and cytoskeletal rearrangements. Much insight into the structure-function relationship of integrin activation has been gained through studies on the platelet integrin αIIbβ3 and its ligand fibrinogen. αIIbβ3 is expressed on circulating platelets in its inactive state. This form does not facilitate binding to soluble fibrinogen in the plasma, which is of critical importance as binding would lead to platelet aggregation and thrombosis. In a clear example of ‘outside-in’ signaling, surface-bound fibrinogen can engage inactive αIIbβ3 on platelets, thus allowing non-activated platelets to join hemostatic events that are already in progress. Conversely, as an example of ‘inside-out’ signaling, αIIbβ3 undergoes a conformational change to its high affinity state upon platelet activation by thrombin, allowing for binding to fibrinogen or other ligands and resulting in platelet adherence to the vessel wall. This change in activation state is now understood to originate from the cytoplasmic portion of the β-chain, leading to a conformational change in the large extracellular domain whereby the integrin switches from a bent to an extended conformation. While these structural activation studies were carried out in specific leukocyte-platelet model heterodimers, the molecular mechanisms involved are thought to be similar for all integrin heterodimers. For more details, please see reviews by Hynes in 1992 and 2002 [55, 64].

1.3.5 The integrin adhesome

Integrins link the ECM to the cytoskeleton, and act to transduce a multitude of complex signals through their cytoplasmic domains, which physically associate with adaptors, signaling molecules and cytoskeletal components. In order to understand integrin-mediated functions, including cell adhesion and outside-in and inside-out signaling, it is important to identify prospective cytoplasmic players. To this end, several research groups have sought to identify all proteins that physically associate with integrins in adhesion complexes. In short, the general proteomic approach has been to induce integrin adhesion complex formation by incubating cells with ligand (i.e. fibronectin) coated beads, isolate the complexes formed and use mass spectrometry to identify proteins [65]. To date, there are seven published proteomic studies aiming to characterize integrin adhesion complex members. The data from these papers, derived from different cell types with slightly different experimental methods, was recently integrated into a systems level analysis to identify the consensus integrin adhesome [65]. The authors of this resource describe a meta-adhesome of 2,412 proteins, each identified in at least one of seven experiments, and a consensus or core adhesome of 60 proteins identified in multiple integrin adhesion complexes. Interestingly, only 10 proteins were identified in all seven experimental data sets: three integrins (αV, β1 and α5), the adaptor vinculin (VCL), integrin linked kinase (ILK), actin regulators VASP and LASP1, transglutaminase 2, and scaffold protein PDLIM5. The small number of proteins identified across all seven datasets, and the large amount of variation likely indicates that the composition of integrin adhesion complexes is highly context- dependent. The composition may vary based on the cell type and integrin heterodimer expression


pattern therein, as well as ECM type and signaling status of the cell. These proteomic approaches have been highly informative in identifying core and context-specific integrin-associated proteins; however, these analyses were limited to fibronectin-induced integrin adhesion complexes. This type of proteomic analysis is especially useful when combined with the literature-curated adhesome to infer function. The functional categories identified in the literature-curated adhesome include: kinase, phosphatase, adhesion receptor, adaptor, chaperone, actin regulatory protein, GTPase or GTPase regulator, protease and molecules involved in RNA/DNA metabolism [65]. While these approaches identify prospective players and pathways in integrin adhesion, they do not measure the functional importance of specific proteins in the complex.
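The distinction between the meta-adhesome, the consensus adhesome and the proteins found in every dataset can be illustrated with simple set logic. The following is a minimal sketch only: the function name, the `consensus_min` threshold and the toy protein lists are assumptions made for illustration, and they do not reproduce the criteria or data used in [65].

```python
from collections import Counter


def summarize_adhesome(datasets, consensus_min=2):
    """Summarize protein identifications across proteomic datasets.

    datasets: list of sets, one set of protein names per experiment.
    consensus_min: minimum number of datasets a protein must appear in
        to count as "consensus" (an illustrative threshold only).
    """
    # Count in how many datasets each protein was identified.
    counts = Counter(protein for ds in datasets for protein in set(ds))
    meta = set(counts)  # identified in at least one dataset
    consensus = {p for p, n in counts.items() if n >= consensus_min}
    universal = {p for p, n in counts.items() if n == len(datasets)}
    return meta, consensus, universal


# Toy input: three hypothetical experiments, not published data.
toy_datasets = [
    {"ITGB1", "VCL", "ILK", "TLN1"},
    {"ITGB1", "VCL", "ILK", "PXN"},
    {"ITGB1", "VCL", "ACTN4"},
]
meta, consensus, universal = summarize_adhesome(toy_datasets)
print(len(meta), sorted(consensus), sorted(universal))
```

In this toy case the set of proteins found in every experiment is much smaller than the union, which mirrors the context dependence described above.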

1.4 Cell-ECM interactions in human disease

1.4.1 Mutations in ECM components leading to disease

The critical nature of the ECM for a myriad of cellular processes and functions is clearly illustrated by the large number of human genetic diseases caused by mutations in ECM-associated genes. While certain loss of function (LOF) mutations in fibronectin and some collagens are embryonic lethal [44], many lessons can be learned from human patients with non-lethal mutations.

Genetic defects in collagen (i.e. collagenopathies) are the largest group of ECM genetic diseases. Collagenopathies affect almost every tissue and organ system in the body, depending on the distribution of expression for the affected collagen molecule (reviewed in [48]). Mutations in any of the collagen genes can lead to serious pathologies that most often affect the bones, joints and/or connective tissues (see Table 2 for example collagenopathic manifestations).

Lamininopathies are diseases caused by mutations in laminins and are not to be confused with laminopathies, which are caused by mutations in the nuclear envelope proteins, lamins [66]. Like collagen mutations, lamininopathies also display a wide range of pathological manifestations in various tissues depending on expression pattern. In general, lamininopathies cause abnormalities of the skin, muscle and nerve tissues. Mutations in some laminin genes disrupt dermal-epidermal cohesion and lead to a skin blistering phenotype known as epidermolysis bullosa. This disease can also manifest when autoantibodies against these laminins are present, which end up phenocopying


the genetic disease. Another example includes mutations in the gene encoding laminin α2 (LAMA2), which disrupt linkages between the ECM and cytoskeleton in muscle cells and lead to congenital muscular dystrophy [66]. Together, these collagenopathies and lamininopathies demonstrate the imperative role of undisrupted ECM for tissue homeostasis.

1.4.2 Mutations in ECM receptors leading to disease

Many ECM receptors are mutated in human genetic diseases and their impact largely parallels that of mutated ECM ligands. For example, mutations in some laminin-binding integrin genes phenocopy laminin mutations, leading to epidermolysis bullosa [67]. Other diseases associated with integrin mutations include congenital muscular dystrophy, leukocyte adhesion deficiency, kidney disease, and thrombasthenia, just to name a few (see Table 2). In a particularly well understood example, loss of function mutations in the gene encoding integrin α3 (i.e. ITGA3), which pairs with integrin β1 to make the α3β1 heterodimeric laminin receptor, cause congenital nephrotic syndrome, interstitial lung disease and skin fragility, all of which result in lethality [68]. Mutation in, or loss of, another laminin-binding integrin, α7 (encoded by ITGA7), causes congenital myopathy in human patients and mouse models, respectively [69, 70]. The classic form of congenital myopathy, Duchenne muscular dystrophy, is caused by mutation of the gene encoding dystrophin (i.e. DMD), which serves as a cytoplasmic linkage between the ECM and the cytoskeleton in muscle and works with the aforementioned ECM receptor, dystroglycan. There are many more examples, but it is abundantly clear that ECM receptors and their associated linkage molecules are critical for tissue homeostasis.

1.4.3 Accumulation of ECM leads to Fibrosis

Beyond genetic syndromes, some of the most common diseases associated with the cell-ECM interaction are caused by pathological quantity and composition of the ECM. Fibrosis is the formation of excess fibrous ECM (such as collagen and fibronectin) in and around inflamed or damaged tissue as part of an aberrant healing process. Fibrosis can be caused by severe acute tissue injury, repeated tissue insult, or by inflammation resulting from many chronic autoimmune diseases, including scleroderma, rheumatoid arthritis, Crohn’s disease, ulcerative colitis, myelofibrosis and systemic lupus erythematosus [71]. The cycle of inflammation and increased ECM deposition can lead to permanent scarring, organ malfunction and, ultimately, death, as seen in end-stage liver disease, kidney disease, idiopathic pulmonary fibrosis (IPF) and heart


failure [71]. Fibrosis also increases the risk of developing cancer in the affected tissue. For example, chronic insult to the liver resulting in inflammation can lead to cirrhosis which in itself can be fatal, and can lead to hepatocellular carcinoma [45].

Many immunological and molecular mechanisms contribute to initiation and progression of fibrosis, including dysregulated innate and adaptive immune responses. The result of these inflammatory processes is TGFβ pathway-mediated activation of myofibroblasts to aberrantly produce ECM [45]. TGFβs are pleiotropic cytokines that can be locally activated to mediate tissue remodeling and wound healing. The TGFβ pathway is normally tumour suppressive; however, with prolonged signaling it leads to fibrosis and/or, somewhat paradoxically, to cancer [72]. TGFβ signaling induces translocation of the SMAD2-SMAD3-SMAD4 transcription factor complex to the nucleus where it directly activates expression of about 60 ECM-related genes including the collagens COL1A1 and COL3A1 [45]. The TGFβ pathway is activated by TGFβ ligands through TGFβ receptors (TGFBR1, TGFBR2 and TGFBR3). However, the ligand is secreted in complex with latency-associated peptide (LAP), adding another level of control to this pathway where the ligand remains inactive until mechano-chemically activated by integrins (αVβ3, αVβ6 and αVβ8), or through cleavage by certain matrix metalloproteinases (MMPs) [72]. Collectively, these processes are of great interest for the treatment of fibrosis.

1.4.4 Increased ECM breakdown pathology

The abnormal breakdown of ECM is also associated with disease, just as excess ECM deposition or mutation in ECM-associated genes leads to various pathologies. ECM breakdown is mediated by proteinases including a disintegrin and metalloproteinases (ADAMs), ADAMs with a thrombospondin motif (ADAMTS), meprins, and matrix metalloproteinases (MMPs) [45]. One example of ECM breakdown pathology involves the overexpression of ADAMTS4 or ADAMTS5 leading to cartilage destruction and osteoarthritis. In another example, overexpression of heart-specific MMP1 leads to loss of cardiac collagen and cardiomyopathy. Mechanisms behind proteinase up-regulation are not completely clear, but may involve receptors in the cell-ECM interaction such as syndecans or integrins, both of which can activate MMPs or ADAMs to promote ECM breakdown. Conversely, mutations in the gene coding for MMP2 lead to deficiency of the protease and cause an osteolytic and arthritic syndrome [73]. While MMP2 deficiency should, in principle, cause less ECM breakdown, this very severe disease pathology is

similar to that of increased ECM breakdown, highlighting the importance of balanced ECM breakdown and synthesis in normal physiology.

1.4.5 Cancer

Many ECM components, receptors and regulators are significantly deregulated during initiation and progression of cancer. It is important to note that in the normal tissue context, the ECM restrains cancer development. For example, expression of a high molecular weight hyaluronan in naked mole rats protects against cancer [74]. In humans, the basement membrane layer of ECM provides the basis for epithelial cell apicobasal polarity [44]. Loss of this polarity is a hallmark of cancer and may be secondary to oncogenic transformation or directly caused by basement membrane disruption. It has been repeatedly observed in normal and cancer cells that changes in the biochemical composition of available ECM can cause drastic changes in cell morphology and behaviour including inappropriate proliferation, migration and invasion.

In established tumours, progression through the so-called metastatic cascade is highly mediated by cell-ECM interaction. Metastatic cancer cells use MMP and ADAM proteases to degrade the ECM in order to facilitate migration and ultimately intravasation and detachment. In order to form a secondary tumour at a distant metastatic site, cells must extravasate, survive anchorage independent conditions, and ultimately reintegrate into a new tissue microenvironment. As in initial tumour formation, metastatic cancer remodels the ECM microenvironment so that it is suitable to support tumour progression. Tropism of various cancers for one site or another in the body may also be mediated by composition of the ECM at each site. For example, the survival and growth of lung cancer cells in the liver is supported by high collagen IV overexpression [75]. Interestingly, ECM gene signatures can be predictive of patient outcome. For example, breast tumours with high expression of MMPs correlate with poor prognosis whereas tumours with high expression of protease inhibitors correlate with good prognosis [76]. The ECM can also promote angiogenesis, indirectly supporting tumour progression. Taken together, it is apparent the ECM plays a critical role in cancer initiation and progression.

1.4.6 Anoikis and anchorage independence in cancer

Normal epithelial cells require attachment to the ECM for survival. In the absence of attachment, or attachment to incorrect ECM, cells may undergo a process of programmed cell death known as


anoikis [77]. This process has been of great interest to the cancer field, because in order to metastasize through the vasculature, cancer cells must overcome anoikis and become anchorage independent. In normal mammary epithelial cells, β1 integrins and EGFR are down-regulated at the cell surface upon detachment from the ECM. This results in a loss of growth and survival signals, namely decreased MAPK signaling, leading to the de-repression of apoptotic regulators, cytochrome c release from the mitochondria, caspase activation and cell death [78]. Conversely, in cancer cells, oncogenic alterations resulting in the constitutive activation of various kinases and small GTPases, overexpression of anti-apoptotic proteins, and the promotion of an epithelial to mesenchymal transition (EMT)-like program all promote resistance to classic anoikis (reviewed in [79]).

During the past decades, many studies have revealed that loss of attachment to the ECM leads to a variety of distinct cellular and molecular alterations, independent of anoikis [79]. Cancer cells may overcome or take advantage of these specific processes to become anchorage independent. Specifically, three major changes occur when cells detach from the ECM: metabolic defects, autophagy and entosis. Normal mammary epithelial cells detached from the ECM have a diminished capacity to import glucose, leading to decreases in flux through glycolysis and the pentose phosphate pathway (PPP), causing a drop in ATP and NADPH production, respectively. This results in increased levels of reactive oxygen species (ROS), which impede AMPK activation and fatty acid oxidation, leading to compounded energy stress [79]. ECM detachment and subsequent nutrient deprivation also lead to activation of autophagy, a catabolic process wherein cells recycle organelles, proteins, and lipids to survive times of energy and nutrient stress [79]. Finally, a process known as entosis involves the invasion of an ECM-detached cell into a neighboring cell, where it is ultimately degraded by the lysosome and its components repurposed. Entosis was initially observed in breast cancer cells [80], and has since been observed in additional cancer types [81]. In agreement with these contentions, several groups have reported that cancer cells reprogram their metabolism specifically to balance redox stress and mitigate ROS in order to promote anchorage independent growth [82-84]. Thus, even when cancer has activated oncogenic signaling and has overcome anoikis, it must also implement a variety of metabolic changes to survive ECM detachment. Another way to look at the same idea is that attachment to the ECM promotes a specific and stable metabolic program, and cells must adapt following detachment from a substrate. Understanding the role of cell-ECM interactions in

determining survival, proliferation and metabolic state may facilitate the targeting of cancer-specific processes. In summary, there is a wealth of literature studying the effects of ECM detachment on cell processes [75, 77-80, 82, 85]; however, further study is required to understand how ECM or absence of ECM contributes to a cancer phenotype through specific signaling and metabolic processes supporting survival, growth, proliferation, invasion and metastasis.

1.5 Conclusions

While the cell is the basic unit of life, it is clear that extracellular factors, including the extracellular matrix, are crucial for multicellular organism function. Many human diseases are associated with mutations in ECM components and this highlights the importance of cell-ECM interaction for proper tissue phenotype and cell function. Accordingly, targeting cell-ECM interaction is an attractive therapeutic approach for many diseases including fibrosis and cancer. Components of the ECM and ECM receptors, as well as their associated complexes, are being increasingly well defined using various ‘omic’ approaches. The description of a consensus matrisome, as well as established techniques to characterize the matrisome of individual tissues or tumours, is also helping to define the tumour microenvironment. The expression of ECM-associated genes in tumours has already shown prognostic value and it stands to reason that matrisome proteins could also be used at the diagnostic level. The consensus integrin adhesome has also been described. While the role of the ECM and its receptors has been well-characterized in developmental models, much remains to be explored about the way the cell-ECM interaction functions in cancer. In chapter three I will take a functional genetic approach to understand the effect of the cell-ECM relationship on proliferation using CRISPR-Cas9 mediated drop out screens on various ECM components. This will lead to a better understanding of the cell-ECM interaction with potential therapeutic implications.


Table 2 – ECM-related diseases

Various classes of human pathologies caused by mutation or aberrant cell-ECM interaction. For a more comprehensive picture of ECM-related disease pathologies, see reviews and articles referenced in the table.

Disease cause | Genes/Pathways | Disease manifestation
Collagen mutation | At least 23 (including COL1A1, COL1A2, COL2A1 etc.) [48] | Osteogenesis imperfecta (4 types), Ehlers–Danlos syndrome (13 types), Stickler syndrome (3 types), Caffey disease, Fibrochondrogenesis, Marshall syndrome, Achondrogenesis/hypochondrogenesis, Platyspondylic lethal skeletal dysplasia (Torrance type), Spondyloepiphyseal dysplasia congenita, Spondyloepimetaphyseal dysplasia (Strudwick type), Kniest dysplasia, Osteoarthritis, Porencephaly, Ullrich congenital muscular dystrophy, Epidermolysis bullosa, etc.
Laminin mutation | LAMA2, LAMA3, LAMB3, LAMC2 [66] | Congenital merosin-deficient muscular dystrophy, Epidermolysis bullosa
Integrin mutation | ITGA3 [68], ITGB3, ITGB4 [86] | Congenital nephrotic syndrome, interstitial lung disease, skin fragility, Epidermolysis bullosa, congenital muscular dystrophy, leukocyte adhesion deficiency, Glanzmann's thrombasthenia
ECM breakdown | ADAMs, ADAMTSs, MMPs [45] | Cardiomyopathy, Osteolytic syndrome, Arthritic syndrome, Cancer, Metastasis
Excess ECM/stiffening | Inflammation, TGFβ pathway, oncogenic transformation [45] | Cirrhosis, Fibrosis (cardiac, pulmonary, kidney, liver cirrhosis, etc.), Cancer, Metastasis

Chapter 2

2 Cell-based Synthetic Antibody Selection (CellectAb) Yields New Tools for Detecting Functional Subpopulations of Cancer Cells

2.1 Contributions:

All animal handling was performed by Yadong Wan and initial FACS sorting of POP92 was performed with Dr. Nicholas Pedley, both from Dr. Catherine O’Brien’s laboratory. Data analysis of deep sequencing data was performed by Dr. Traver Hart. With the exception of muscle stem cell and glioblastoma work, I planned and carried out all experiments, with technical assistance from Alejandro Duque, a summer student and technician. I analyzed all wet lab experimental data, generated and formatted all data panels, with the exception of the muscle stem cell FACS. I ran all flow cytometry experiments for data acquisition, with assistance from the Faculty of Medicine Flow Cytometry Facility. I imaged all immunofluorescence at the Microscope Imaging Laboratory facility at the University of Toronto. All immunoprecipitation mass spectrometry experiments were set up by myself, lyophilized with the assistance of Dr. Jiefei Tong, run on the instrument and analyzed in Scaffold by Dr. Jonathan Krieger in the laboratory of Dr. Michal Moran.


2.2 Abstract

Failure of conventional cancer therapies has been extensively linked to rare subpopulations of self-renewing, aggressive, metastatic tumour cells. In order to isolate, study and potentially target these clinically relevant tumour cell subpopulations, I developed a methodology for simultaneous target discovery and antibody generation, coined CellectAb. Stem cell-like populations of colorectal cancer (CRC) initiating cells (CICs) were isolated, and panned with a custom high-complexity library of synthetic fully human antibody fragments (Fabs) displayed on the M13 bacteriophage. Hundreds of binders specific for the CIC population were identified by deep sequencing and a handful of these were recovered for validation studies. I focused on the top three antibodies (termed AN01, AN02 and AN03) and found they bound preferentially to the colorectal CIC population. I identified the protein targets for AN01, AN02, and AN03 as integrin α7, HLA-A1 and integrin β6, respectively. I show that these antibodies can be used to enrich for self-renewing colorectal CICs, and that my integrin α7 antibody can prospectively identify human muscle and glioblastoma stem cells. I also observe that genetic ablation of integrin β6 impedes colorectal CIC function. My CellectAb methodology can be readily applied to other cancer and stem cell subpopulations to facilitate rapid identification of novel targets with simultaneous generation of high quality, multifunctional recombinant antibodies.

2.3 Introduction

2.3.1 Antibodies for biomedical research

Antibodies are large, modular proteins secreted by B-cells as part of the mammalian adaptive immune response. Antibodies were first discovered by Emil von Behring and Shibasaburo Kitasato in 1890 when they showed that transferring serum from an animal immunized against diphtheria to an infected animal could be curative [87]. Later on, antibodies were understood to function by spatial complementarity similar to the “lock and key” analogy first postulated by Emil Fischer in 1894, where antibody antigen binding regions are highly variable and can be thought of as “keys”, which only work against specific antigens or “locks”.

Antibodies are Y-shaped molecules comprising two identical heavy chains and two light chains. Each arm of an antibody contains a Fab fragment, with structural regions of the heavy and light chains at the proximal end, as well as variable regions known as complementarity determining regions (CDRs) at the distal end (Figure 3A). These CDRs dictate specificity and affinity for unique epitopes on their target molecules, called antigens. The remainder of a full antibody immunoglobulin (Ig) comprises a fragment crystallizable region (Fc), which interacts with immune cell Fc-receptors or the complement system to mediate the immune function of antibodies. Antibodies bind tightly and specifically to their cognate antigen. In their natural roles, antibodies function to neutralize their targets by blocking function and/or by engaging an immune response. Because of their high degree of affinity and specificity, antibodies have been exploited for the purposes of identifying, isolating and targeting proteins in biomedical research.

Antibodies have typically been generated in two forms: polyclonal and monoclonal. Both are made by taking advantage of naturally occurring animal immune repertoires. Briefly, a naïve animal is immunized with an antigen of interest. For polyclonal antibodies, the serum is later harvested from the animal, and the Ig fraction binding to antigen is purified. With this approach, many different and unique B-cell clones will generate antibodies against potentially different epitopes on the same antigen (i.e. they bind to different sequences in the same protein). The output is a mixture of antibodies against the same antigen, but different epitopes. This approach is fast and low cost and can allow for large amplification of signal from the target because many antibodies may bind to each protein. However, polyclonal antibody mixtures often suffer from non-specific contaminating Igs, and show high batch to batch variability that can

Figure 3 – Phage-Fab library design and CellectAb selection strategy A) Schematic and structure of an antibody B) Library I structural design features. PyMOL illustrations of the full Fab (left) or variable regions (right). Framework residues are gray tubes and complementarity-determining region (CDR) residues are shown as spheres. Diversified residues are coloured purple for CDR-L3 and red for CDR-H3 C) Schematic of phage-displayed synthetic antibody selections using FACS-sorted AC133 positive CICs as positive selection, and AC133 negative bulk tumour cells as negative selection. In short, naïve library was pre-absorbed on bulk tumour cells to remove sticky binders, then unbound phage-Fab was incubated with the CIC population. The cell pellet was washed, bound phage eluted, and expanded in an E. coli host. This process was repeated for a total of four rounds of selection. In parallel, sticky binders were amplified through four rounds of selection by direct incubation with bulk tumour cells, followed by elution, expansion and re-selection on bulk tumour cells. After the selection process, CDRs from the phage-Fab round four output libraries from the positive and negative selections were PCR amplified and deep sequenced to determine Fab sequence abundance in both pools and associated enrichment (see Chapter two methods for details).


hinder reproducibility. To facilitate generation of more reliable and defined antibodies, monoclonal antibodies were invented. For monoclonal antibodies, splenocytes are harvested from an immunized animal and these are immortalized by fusion with myeloma cells. These resulting “hybridomas” are clonally expanded and screened for antibody production. This reduces batch variability and provides a high degree of specificity and scalability. However, the process involved in making monoclonal antibodies is time consuming, expensive and requires a high degree of technical expertise.

2.3.2 Antibodies for target discovery

The adaptive immune system has long been exploited to develop antibodies for biological research, including for the discovery of cell surface markers [88]. For example, the CIC marker AC133 is a mouse monoclonal antibody originally generated against CD34+ hematopoietic stem cells [89, 90]. To generate AC133 antibody, naïve mice were immunized with a CD34+ population of hematopoietic stem cells to generate an immune response. Resultant hybridomas were screened for secretion of antibodies capable of binding to the CD34+ population. Indeed, AC133 antibody bound to the CD34+ population, but did not bind directly to CD34 itself. The target of AC133 was later identified as the transmembrane glycoprotein CD133 encoded by the PROM1 gene [90]. The AC133 antibody has since been used to isolate stem and cancer stem cells in a multitude of tissue types (Table 1 and [37]).

Current drug development efforts for cancer largely include the development of targeted therapies, either as small molecules or biologic macromolecules to modulate protein function, localization or interactions. Antibody therapeutics have emerged as a major drug class for numerous diseases including cancer [91, 92]. In order to use animal-derived monoclonal antibodies in a human patient without causing an unwanted immune response, the structural portion of the antibody must be humanized. The success of humanized monoclonal antibodies in the clinic, particularly for highly specific, targeted therapies (e.g. Trastuzumab for breast cancer), has sparked a major shift in drug development efforts [91, 93]. Recently, the AC133 antibody was humanized by cloning the antibody variable regions (CDRs) from the original hybridoma onto a human scaffold. Importantly, humanized AC133 maintained its specificity and showed anti-tumour effects in preclinical models [94]. This and other similar efforts have served as a proof of principle for using antibodies for CIC target discovery and potentially for translation


into the clinic. This basic strategy for target discovery and development is still highly useful, however it is time consuming, taking up to 30 weeks until antibody is purified. Also, it requires a great deal of expertise and labor [88]. Further, as the output is an animal antibody, more time and effort is required to generate a humanized antibody derivative for potential use in preclinical models.

2.3.3 A fully synthetic antibody

In order to overcome these challenges, researchers have developed in vitro approaches to antibody generation. Advancements in synthetic biology and protein engineering have led to the development of yeast- and phage-displayed synthetic antibody libraries of greater diversity and specificity than naturally occurring immune repertoires for use in vitro [95, 96]. The physical linkage between the genotype (i.e. the sequence of antibody variable regions) and phenotype (i.e. binding specificity) allows for deep sequencing of pools of antibodies with specific binding characteristics [97, 98]. These synthetic antibody libraries have been applied to generate antibodies specific for all manner of immobilized recombinant antigen targets, and against more complex antigens expressed in their native form on the cell surface, that are otherwise difficult to purify [98]. To target cell surface antigens, binders are enriched on a cell line engineered to overexpress the protein of interest and a matched negative cell population is used to deplete sticky binders. Individual binders can be cloned out from these pools in under a week, and, in parallel, pools of phage-Fabs bound to populations of interest can be deep sequenced and enrichment calculated computationally. These approaches have permitted the rapid, effective development of specific, high-affinity antibodies against virtually any antigen of interest [97, 98]. However, to my knowledge, a synthetic antibody selection approach has never been used to discover new cell surface targets on a cell subpopulation of interest.

2.3.4 CD133 positive Colorectal Cancer Initiating Cells

Most solid and hematological malignancies have been found to comprise functionally diverse subpopulations of cells that differ in their potential for proliferation, self-renewal, therapy resistance and metastasis formation [8, 99-101]. This heterogeneity presents major challenges to both diagnosis and treatment, positioning itself as the next frontier in cancer biology [102, 103]. The AC133 antibody can identify a functional subpopulation of cells called cancer initiating cells (CICs) in a wide variety of tumour types, including colorectal cancer [34]. This aggressive


subpopulation of cells shows the functional ability to initiate a tumour spheroid in vitro, or a tumour in vivo [104, 105]. Despite much success applying AC133 and other CIC markers to isolate, study and target CICs, in many cases conflicting results suggest that CIC markers, like AC133, have highly context-specific utility [34, 37]. It has been repeatedly shown, particularly in the blood system, that cell populations homogeneous for certain markers, but remaining functionally heterogeneous, can be further dissected with the discovery of new markers that fractionate into biologically distinct subpopulations [106]. Therefore, understanding and targeting functional CIC subpopulations may hinge on uncovering novel markers and tools to enrich CIC subpopulations to purity.

2.3.5 Rationale and Approach

I describe a novel approach termed CellectAb, inspired by the animal immunization technique for marker discovery, that links target discovery to antibody generation. I applied this new approach to AC133-positive human colorectal (CRC) CICs and identified hundreds of CIC-specific binders by deep sequencing. I validated three candidate synthetic antibodies as binding specifically to the CRC CIC population, identified the protein targets and demonstrated their utility. My CellectAb methodology can be readily applied to other cancer and stem cell subpopulations to facilitate the rapid identification of novel targets with simultaneous generation of high quality, multifunctional antibodies.

2.4 Methods

2.4.1 Cell culture

The POP92 colorectal cancer spheroid line was originally derived from a stage IV sigmoidal colon adenocarcinoma from a 45-year-old female patient donor xenografted in an NSG mouse. After xenograft harvest and dissociation, POP92 cells were maintained in vitro in suspension flasks (Sarstedt, Cat.# 83.3911.502) in serum-free media, supplemented with EGF and bFGF as previously described [32]. For expansion, POP92 spheres were passaged every four to seven days. In short, spheres were harvested by gentle centrifugation (200 × g) and trypsinized with 0.25% Trypsin-EDTA (ThermoFisher, Cat# 25200056) for 5 minutes at 37 °C. Trypsin was washed out with serum-free DMEM/F12 (ThermoFisher, Cat# 11320) and cells were triturated at least 30 times against the bottom of the tube with a 5 mL pipette to achieve a single cell suspension. Any


remaining clumps of cells were removed with a 40 µm cell strainer (Falcon, Cat# 352340). Cells were counted with a Z2 Coulter Counter (Beckman Coulter, Cat# 6605700) and re-seeded into freshly prepared media.

HAP1 integrin knockout cells were obtained from Horizon Discovery (formerly Haplogen Genomics): ITGA1 knockout, α1 KO (Cat# HZGHC001039c001); ITGA2 knockout, α2 KO (Cat# HZGHC001073c010); ITGA3 knockout, α3 KO (Cat# HZGHC001043c008); ITGA4 knockout, α4 KO (Cat# HZGHC001077c009); ITGA5 knockout, α5 KO (Cat# HZGHC001085c001); ITGA6 knockout, α6 KO (Cat# HZGHC001052c010); ITGA7 knockout, α7 KO (Cat# HZGHC001054c012); ITGA9 knockout, α9 KO (Cat# HZGHC001050c02); ITGB1 knockout, β1 KO (Cat# HZGHC001029c003); ITGB4 knockout, β4 KO (Cat# HZGHC001027c011); and ITGB5 knockout, β5 KO (Cat# HZGHC001028c001). Cells were maintained in IMDM (ThermoFisher Cat# 12440079) supplemented with 10% foetal bovine serum (FBS) (ThermoFisher Cat# 10437010) and 1% penicillin/streptomycin.

All additional adherent cell lines were maintained in the ATCC recommended media supplemented with 10% FBS and 1% penicillin/streptomycin in tissue culture treated vessels and passaged every 2-4 days by trypsinization. Cell lines were tested regularly for mycoplasma.

2.4.2 Isolation of AC133+ CIC-enriched fraction

POP92 has been previously validated to contain a population of AC133-positive CICs [32]. To isolate the AC133-positive CIC population, POP92 spheroids were mechanically and enzymatically dissociated with 0.25% trypsin, washed well and strained to a single cell suspension, then stained with AC133 antibody conjugated to Alexa-488 fluorophore (Miltenyi Cat# 130-105-225) and 7AAD viability dye (Biolegend Cat# 420404). Live, singlet cells were FACS sorted using a BD FACSAria into the top 10% of AC133-positive cells (CIC-enriched fraction) and the bottom 20% of AC133-negative cells (low CIC fraction).

2.4.3 Antibody Selection Strategy

A library of highly diversified (>10^11) human Fab fragments expressed on M13 bacteriophage was created and prepared as described [91]. The naïve library was pre-absorbed on AC133-negative cells to remove sticky binders. Unbound phage-Fab was incubated with the AC133-positive CIC population. The cell pellet was washed, and bound phage were eluted and expanded in an E. coli host. This process was repeated for a total of 4 rounds of selection. In parallel, sticky binders were amplified through 4 rounds of selection by direct incubation with AC133-negative cells, followed by elution and expansion. After the selection process, complementarity-determining regions (CDRs) from the phage-Fab round 4 output libraries were PCR amplified and deep sequenced. Sequences were processed as in [91], and top hits were selected based on high read counts in the positive selection pool and low or no read counts in the negative selection pool. The top candidate hit sequences were cloned into IPTG-inducible Fab vectors, expressed and purified from BL21 E. coli.
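The hit-selection criterion described above (high read counts in the positive pool, low or no read counts in the negative pool) can be illustrated as a ranking on a log2 read-count ratio. This is only a minimal sketch: the pseudocount, the coverage cut-off, the function names and the toy read counts are all assumptions made for illustration, and it is not the processing pipeline of [91].

```python
import math


def enrichment_score(pos_reads, neg_reads, pseudocount=1.0):
    """log2 ratio of positive-pool to negative-pool reads.

    The pseudocount (an assumption) keeps the ratio finite for
    sequences with zero reads in the negative pool.
    """
    return math.log2((pos_reads + pseudocount) / (neg_reads + pseudocount))


def rank_candidates(counts, min_pos_reads=100):
    """Rank CDR sequences by enrichment, dropping low-coverage ones.

    counts maps a sequence label to (positive reads, negative reads).
    min_pos_reads is a hypothetical coverage cut-off.
    """
    scored = [
        (label, enrichment_score(pos, neg))
        for label, (pos, neg) in counts.items()
        if pos >= min_pos_reads
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical read counts, for illustration only.
toy_counts = {
    "cdr_a": (50000, 0),
    "cdr_b": (8000, 7000),
    "cdr_c": (1200, 3),
    "cdr_d": (40, 0),
}
ranked = rank_candidates(toy_counts)
```

In this toy example, a sequence that is abundant in both pools (cdr_b) ranks below a less abundant sequence that is nearly absent from the negative pool (cdr_c), which is the behaviour the selection criterion is meant to capture.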

2.4.4 Flow cytometry/FACS

POP92 cells were processed as in section 2.4.2 above. All adherent cell lines were harvested by aspirating media, washing in PBS, then fully dissociating in sterile non-enzymatic dissociation buffer, DB (1 mM EDTA, 137 mM NaCl, 6.7 mM NaHCO3, 5 mM KCl and 5 mM D-Glucose), for 5-10 minutes. Samples were then diluted with PBS 2% FBS, filtered through 40 µm mesh and stained with the appropriate antibody dilution and viability dye. Data acquisition was performed on a BD LSR Fortessa X20 (BD Biosciences) and FACS sorting was performed using a BD FACS Aria or BD FACS Influx.

2.4.5 Integrin overexpression

HEK293T cells were seeded at 25,000 cells per well into 12-well plates. In parallel, a large batch of control cells was seeded into a 15 cm plate. After 24 hours, 12-well plates were co-transfected with the indicated integrin subunits (see Methods Table M1) using XtremeGene9 (Roche Cat.#63650809001) to the manufacturer's specifications. 48 hours after transfection, cells were dissociated with non-enzymatic dissociation buffer (DB) containing 1 mM EDTA, 137 mM NaCl, 6.7 mM NaHCO3, 5 mM KCl and 5 mM D-Glucose for 5-10 minutes at room temperature. Untransfected in-tube control cells were incubated with a 1:15,000 dilution of carboxyfluorescein succinimidyl ester (CFSE) CellTrace dye (ThermoFisher, Cat.# C34570) for 30 minutes at 37°C. Excess CFSE was washed away with PBS containing 2% FBS, and control cells were dissociated with DB as above. Each transfection was split into two tubes, and CFSE-stained untransfected control cells were added in equal amounts to all tubes. For each transfection, one tube was stained with 2 µg/ml AN01 scIgG and the other with 2 µg/ml AN03 scIgG, followed by anti-human APC secondary (Jackson Cat.#109-136-097) and DAPI viability dye. Data were acquired on a BD LSRFortessa X-20. Cells were gated for live singlets and separated into control and query cells on the basis of CFSE staining. The mean APC fluorescence intensity was calculated for both the query and control cells in each tube. The difference between these measures was calculated as a % maximum fold change. This experiment was performed in biological triplicate.
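One plausible reading of the "% maximum fold change" calculation is sketched below: the APC MFI of the query cells is divided by that of the in-tube CFSE-labelled control cells, and each fold change is then scaled to the largest value observed. The exact normalization is not spelled out in the text, so the formula, the function names and the MFI values here are assumptions.

```python
def fold_change(query_mfi, control_mfi):
    """Ratio of APC MFI on query cells to the CFSE-labelled control
    cells stained in the same tube."""
    if control_mfi <= 0:
        raise ValueError("control MFI must be positive")
    return query_mfi / control_mfi


def percent_of_max(fold_changes):
    """Scale each fold change to the largest one observed (= 100%)."""
    top = max(fold_changes.values())
    return {name: 100.0 * fc / top for name, fc in fold_changes.items()}


# Hypothetical (query MFI, control MFI) pairs per transfection.
toy_mfi = {
    "ITGA7+ITGB1": (5200.0, 260.0),
    "ITGA3+ITGB1": (300.0, 250.0),
    "ATP1A1": (255.0, 255.0),
}
folds = {name: fold_change(q, c) for name, (q, c) in toy_mfi.items()}
scaled = percent_of_max(folds)
```

Using the in-tube control as the denominator means each ratio is computed from cells that saw the same antibody dilution, wash and acquisition settings, which is the purpose of spiking CFSE-labelled cells into every tube.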

Methods Table M1: Chapter 2 overexpression plasmids

cDNA Origene Catalogue Number

ATP1A1 (control surface protein) SC119714

ITGA1 SC107298

ITGA1 SC119742

ITGA10 SC117827

ITGA11 SC124146

ITGA2 SC118747

ITGA3 SC303656

ITGA4 SC119587

ITGA5 SC118748

ITGA6 transcript variant 1 SC315867

ITGA6 transcript variant 2 SC120055

ITGA7 SC303148

ITGA8 SC127886

ITGaIIb SC300066

ITGAL SC124031

ITGAM SC315229

ITGAV SC118750

ITGB1 SC111935

ITGB2 SC320165

ITGB3 SC120057

ITGB5 SC118751

ITGB6 SC128124

ITGB8 SC118752

2.4.6 Immunoprecipitation (IP)

Cells were washed 2x in ice-cold PBS, then lysed in lysis buffer (10% Glycerol, 50 mM HEPES-KOH pH 8.0, 100 mM KCl, 2 mM EDTA, 0.1% NP-40, 1x Protease inhibitor (Sigma #), 10 mM NaF, 0.25 mM NaOVO3, 5 nM Okadaic acid, 5 nM Calyculin A, 50 mM β-glycerophosphate) for 60 minutes with intermittent vortexing. Lysates were frozen at -80°C, thawed on ice, then centrifuged at >15,000 RPM for 60 minutes. Each supernatant was transferred to a new tube and spun at >15,000 RPM for an additional 30 minutes. Protein supernatants were incubated with 10 µg of antibody overnight at 4°C on a nutator. Immunocomplexes were precipitated with Protein G (for IgGs), FLAG-M2 (FLAG-tagged Fabs) or streptavidin beads (Avi-tagged Fabs), as indicated (beads purchased from Pierce Cat#53125, Sigma Cat#A2220 and Pierce Cat# 20347, respectively). Beads were washed well with lysis buffer, followed by H2O. Proteins were eluted in SDS-PAGE sample buffer (NuPage gel system, ThermoFisher Cat# NP0301) for western blot, or processed as below in 2.4.7 for mass spectrometry (MS).

2.4.7 Mass Spectrometry (MS)

Bead-bound proteins for MS were eluted in 0.15% trifluoroacetic acid (TFA), then adjusted to ~pH 8.0 with 1M NH4HCO3. Samples were reduced with 1/10 volume of 45 mM DTT at 60°C for 20 minutes. Samples were then cooled to room temperature and incubated with cysteine blocking reagent (BioShop Cat.#IOD500.5) for 15 minutes in the dark. Peptides were digested with Biolab modified trypsin (TPCK Cat.#P8101S) at room temperature overnight. Each sample was reacidified to 1% TFA and purified using C18 stageTip (Thermo Scientific, Cat.# SP301) to the manufacturer's specifications, dried using a speed vacuum, and run on a mass spectrometer.

2.4.8 RNA interference (RNAi) experiments

pLKO vectors containing the indicated shRNAs (see Methods Table M2) were packaged into lentiviral particles. Cells were infected at MOI <1 and stably transduced cells were selected in 2 µg/mL puromycin.

Methods Table M2: Chapter 2 RNA interference (RNAi) reagents

shRNA ID Target Sequence TRCN Number

shITGB1#1 GCCTTGCATTACTGCTGATAT TRCN0000029645

shITGB1#2 GCCCTCCAGATGACATAGAAA TRCN0000029648

shITGA7#1 GCCCTGGACTATGTGTTAGAT TRCN0000057709

shITGA7#2 CCTCCGGGATTTGCTACCTTT TRCN0000057712

shHLAA#1 CCCTTCCCTTTGTGACTTGAA TRCN0000057238

shHLAA#2 CACACCATCCAGATAATGTAT TRCN0000057239

shITGB6#1 GAAACATTTATGGGCCTTATT TRCN0000057706

shITGB6#2 GCCAACCCTTGCAGTAGTATT TRCN0000057707

2.4.9 CRISPR/Cas9 single gene knockout generation

Indicated sgRNAs (see Methods Table M3) were cloned into Cas9-containing vector, LentiCRISPR v2 (Addgene Cat#52961). The resulting vectors were packaged into lentiviral particles and cells stably transduced by selection in 2µg/ml puromycin for 48 hours, allowed to recover for 24 hours, then knockout cells were FACS sorted.

Methods Table M3: Chapter 2 CRISPR reagents

sgRNA ID CRISPR Target sequence

sgITGA7#1 (also N-terminal) AGAGTGGACATCGACCAGGG

sgITGA7#3 CAGGGATCGTCCCATGGCCG

sgITGB6#4 ACTGCGGTCTGAGGTGGAAC

sgITGB6#8 AAAGGAACTGCGGTCTGAGG

sgITGB6#9 AAGTGTGTTTGCACAAACCC

sgITGA3_N-terminal AGTGAAGGAGGCCGGGAACC

sgITGA3_C-terminal AGCCACAGCTCGATTTCGGC

sgITGA6_N-terminal CAGCTGCGACATCACCGCCC

sgITGA6_C-terminal CTTTCCGGATCCTTACAGCA

sgITGA7_C-terminal AGACCGACAGCAGTTCAAGG

2.4.10 ITGA3, ITGA6, ITGA7 full gene triple knockout generation

Due to residual transcript and concerns that some functional or non-functional protein was still expressed, the entire coding locus was removed for ITGA3, ITGA6 and ITGA7 by simultaneous targeting of the N- and C-terminal exons. To create full knockouts, gRNAs were designed to target the 5’ (N-terminal) and 3’ (C-terminal) ends of the coding region of each gene. A mixture of six gRNAs targeting the N- and C-terminal coding portions of ITGA3, ITGA6 and ITGA7 (sgITGA3_N-terminal, sgITGA3_C-terminal, sgITGA6_N-terminal, sgITGA6_C-terminal, sgITGA7_N-terminal, sgITGA7_C-terminal) was co-transfected into HAP1 wildtype parental cells. After 24 hours, transfectants were selected in 2 µg/ml of puromycin for a further 24 hours. Puromycin was removed and cells were allowed to recover for 48-96 hours. As ITGA3, ITGA6 and ITGA7 encode laminin receptors, a triple knockout, resulting in the loss of three laminin-binding integrin heterodimers (α3β1, α6β1 and α7β1), is expected to be impaired in its ability to adhere to laminin. Recovered transfectants were dissociated with DB and incubated for 30 minutes in growth medium on laminin-coated plates. Adherent cells were discarded and floating cells were kept. This phenotypic enrichment strategy selected for cells with impaired laminin adhesion due to disrupted laminin-binding integrins. The floating cells were then stained with AN01, and AN01-negative cells were FACS sorted into single cell clones. Clones were expanded for ~2 weeks, viable cells and pellets for gDNA were banked, and 77 clones were screened for AN01 binding by immunofluorescence in CellCarrier plates (PerkinElmer, Cat#6005550) with a PerkinElmer Opera QEHS high-throughput confocal microscope. In parallel, these clones were screened in a laminin spreading assay, which calculates the difference in confluence between equal cell numbers of each clone plated on uncoated plastic or laminin for 24 hours. Clones that were AN01 negative and showed low laminin spreading were selected for follow-up. gDNA was isolated from these presumptive triple-negative clones and an ABC-style PCR over the expected gene knockout site was performed (see Methods Table M4 for primers). In this way, clones with a true full gene knockout were identified by a PCR product bridging the genomic location from which the gene was removed, and clones without a full gene knockout were identified by a product of a different size (from the gene itself, if it was still present). The majority of clones contained indel mutations but retained at least one integrin gene of interest. However, one clone, named SCC#7, had a full gene knockout for all three genes.
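The logic of reading an ABC-style PCR can be sketched as a simple band-calling rule: a product of the size expected for the bridging amplicon indicates deletion of the locus, whereas a product of the size expected from the primer inside the gene indicates that the gene is still present. The expected product sizes, the size tolerance and the function names below are hypothetical; the real sizes depend on primer positions, which are not given here.

```python
def call_genotype(bands, bridge_size, internal_size, tolerance=25):
    """Classify one locus from the band sizes (bp) seen on a gel.

    bridge_size:   expected product spanning the deletion junction
    internal_size: expected product from the primer inside the gene
    Both sizes and the tolerance are hypothetical inputs.
    """
    def seen(expected):
        return any(abs(band - expected) <= tolerance for band in bands)

    has_bridge, has_internal = seen(bridge_size), seen(internal_size)
    if has_bridge and not has_internal:
        return "full gene knockout"
    if has_internal and not has_bridge:
        return "gene present"
    if has_bridge and has_internal:
        return "mixed"
    return "no call"


def is_triple_knockout(calls):
    """True only if every locus was called as a full gene knockout."""
    return all(call == "full gene knockout" for call in calls.values())


# Hypothetical expected sizes (bridge, internal) and observed bands.
expected = {"ITGA3": (400, 650), "ITGA6": (450, 700), "ITGA7": (380, 600)}
observed = {"ITGA3": [405], "ITGA6": [448], "ITGA7": [602]}
calls = {
    gene: call_genotype(observed[gene], *expected[gene]) for gene in expected
}
```

In this toy clone, ITGA3 and ITGA6 are called as full gene knockouts but ITGA7 is still present, so the clone would not count as a triple knockout.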

Methods Table M4: Chapter 2 PCR primers

Name Sequence Temp. Purpose

ITGA3fgko_F1 GAGCAGGTGAACAGGTCCTC 63C Full gene KO verification

ITGA3fgko_R1 TAGGACAGAGTCAGAGGCCC 63C Full gene KO verification

ITGA3fgko_C GCGTCCAGACAGGTTCTGGC 63C Full gene KO verification

ITGA6fgko_F2 GGAGCTACACCAGTTGTCCC 63C Full gene KO verification

ITGA6fgko_R2 GGAGGATGTCACCTGAGTGC 63C Full gene KO verification

ITGA6fgko_C TCGTTATCAAACTCGATCCG 63C Full gene KO verification

ITGA7fgko_F1 GCAGTGGGTCCGTATCTAGC 63C Full gene KO verification

ITGA7fgko_R1 TCTCTGGGGAAGGGATGGAG 63C Full gene KO verification

ITGA7fgko_C TCCGGCTGGTCTCGAACTCC 63C Full gene KO verification

qITGA3_F2 CTACCACAACGAGATGTGCAA 60C qPCR

qITGA3_R2 CCGAAGTACACAGTGTTCTGG 60C qPCR

qITGA6_F1 GGCGGTGTTATGTCCTGAGTC 60C qPCR

qITGA6_R1 AATCGCCCATCACAAAAGCTC 60C qPCR

qITGA7_F GCTGTGAAGTCCCTGGAAGTGATT 60C All isoforms qPCR

qITGA7_R GCATCTCGGAGCATCAAGTTCTT 60C All isoforms qPCR

qITGA7_F_long GTTGAGCCTGGAGGAGACTG 60C Long isoform qPCR

qITGA7_R_long CACTGACTCCCAACCACTGG 60C Long isoform qPCR

qGAPDH_F ACCCAGAAGACTGTGGATGG 60C qPCR control

qGAPDH_R TCTAGACGGCAGGTCAGGTC 60C qPCR control

2.4.11 ELISA

A 384-well immunosorbent plate was coated with purified recombinant protein antigens: ATP1A1 (Origene, Cat.# TP301009), CD98 full length (Abnova, Cat# H00006520-G01), CD98 transcript variant 1 (Origene, discontinued) or mouse integrin α7β1 (R&D, Cat# 7958-A7), each diluted in PBS, overnight at 4°C with gentle shaking. The plate was blocked with PBS 0.5% BSA at room temperature for 60 minutes. Block was removed, and antibodies diluted in PBT (PBS with 0.5% BSA and 0.05% Tween-20) were incubated in wells for 60 minutes. The primary antibody was washed away at least 4x in PT (PBS 0.05% Tween-20). HRP secondary antibody was added at 1:5000 in PBT for 30 minutes. The plate was then washed with PT at least 7x, developed with TMB substrate kit (ThermoFisher, Cat# 34021) according to the manufacturer's specifications, and read at 450 nm in the Biotek Powerwave XS plate reader.

2.4.12 Immunohistochemistry

Tumours were surgically isolated and immediately immersed in 10% formalin (Sigma Cat# HT501128) for at least 48 hours before being transferred into 70% ethanol. Tissue was dehydrated sequentially in increasing concentrations of ethanol, then xylene, embedded in paraffin, sectioned and placed on slides. Sections were deparaffinized and rehydrated, and antigen retrieval was performed in citrate buffer pH 6 (for ITGB6 antibodies AN03 and MATF1037, Monash University, Australia) or Tris-EDTA pH 9 (for anti-phospho-Smad3 S424 +S425, Abcam Cat#Ab52903). Primary antibody MATF1037 was used at a dilution of 1:10,000 for one hour, AN03 was used at 15 µg/ml and phospho-Smad3 at 1:50 overnight. Staining was detected with species-specific secondary antibodies conjugated to HRP using the IHC DAB polymer detection kit. Slides were counterstained with hematoxylin, imaged on the Zeiss Axioscan slide scanner, and analyzed in Zen Blue.

2.4.13 Western Blot

Whole cell lysates were prepared using radioimmunoprecipitation assay (RIPA) buffer (ThermoFisher, Cat# 89900) supplemented with HALT protease and phosphatase inhibitor (ThermoFisher, Cat# 78443). Cells were lysed on ice for at least 60 minutes with regular vortexing and pipetting, and the insoluble pellet was spun down at maximum speed for 30 minutes. Soluble proteins were quantified using BCA assay (ThermoFisher, Cat# 23225). Equal protein amounts were mixed with NuPage sample buffer and DTT, boiled for 5 minutes and run on 4-12% gradient Bis-Tris NuPage gels (ThermoFisher, Cat# NP321). Proteins were wet transferred to activated PVDF membranes (GE Healthcare Cat#10600023), and probed overnight at 4°C with the indicated primary antibodies. The following day, membranes were washed in Tris-buffered saline with 0.2% Tween (TBST) and incubated with the appropriate secondary antibody for 60 minutes at room temperature. Membranes were washed in TBST and developed on the MicroChemi 4.2 (FroggaBio, Cat#95-25-00) using Pierce ECL chemiluminescent substrate (ThermoFisher, Cat#32106). Images were processed in Adobe Photoshop for rotation, cropping, contrast and brightness enhancements. For IP-Western analysis, all input samples were prepared as above in 2.4.13, and run in the same way.

2.4.14 qPCR

RNA was extracted from log phase cells using the RNeasy Plus mini kit (Qiagen, Cat#74134) to the manufacturer's specifications. RNA was converted into cDNA using SuperScript VILO Master Mix (ThermoFisher, Cat# 11755050), to the manufacturer's specifications. qPCR was performed in quadruplicate using Maxima 2x SYBR Master Mix, cDNA, and the indicated primer combination (see Methods Table M4). Samples were run for relative quantification on the Applied Biosystems StepOnePlus qPCR instrument. Relative transcript abundance was calculated by delta CT.
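As a worked illustration of the delta CT calculation, the sketch below averages technical replicate CT values for a target gene and for the GAPDH control, then converts the difference to a relative abundance. It assumes approximately 100% amplification efficiency (one doubling per cycle); the function name and the CT values are hypothetical.

```python
from statistics import mean


def relative_abundance(target_cts, reference_cts):
    """Relative transcript abundance by the delta-CT method.

    Technical replicate CT values are averaged, then
    abundance = 2 ** -(CT_target - CT_reference).
    Assumes one doubling of product per cycle.
    """
    delta_ct = mean(target_cts) - mean(reference_cts)
    return 2.0 ** (-delta_ct)


# Hypothetical quadruplicate CT values, for illustration only.
itga7_cts = [24.1, 23.9, 24.0, 24.0]
gapdh_cts = [18.0, 18.1, 17.9, 18.0]
abundance = relative_abundance(itga7_cts, gapdh_cts)
```

With these toy values the target crosses threshold about six cycles after GAPDH, which corresponds to roughly 1/64 of the GAPDH transcript level.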

2.5 Results

2.5.1 Isolation of presumptive colorectal cancer CICs

I sought to couple target discovery to antibody affinity reagent development against a functional subpopulation of cancer cells. To do this, I exploited a patient-derived xenograft (PDX) colorectal cancer (CRC) model, known as POP92. Importantly, POP92 maintains heterogeneity and stem-like properties wherein AC133 can enrich for functional CICs [32]. POP92 xenografts were isolated and subsequently maintained in defined culture medium as spheroids by Dr. Nick Pedley in the laboratory of Dr. Catherine O’Brien (University Health Network). To isolate the CIC population, spheroids were dissociated to a single cell suspension, stained with AC133 antibody and FACS sorted into AC133-positive presumptive CICs and an AC133-negative bulk cell population (Figure 3C). In order to generate antibodies against native antigens, live cells were used. Previous cell-based selection methodologies have required a minimum of 10^7 cells per condition [98]. However, these selections generally exploited commercial cell lines engineered to express a particular antigen of interest, so the cells could be expanded to achieve virtually any number desired. Because we relied on FACS to isolate a rare subpopulation, we were technically limited in the maximum achievable number of live cells. Over an 8-hour period, under low pressure, we were able to reliably isolate at least 2x10^6 AC133-positive and at least 4x10^6 AC133-negative bulk, viable cells by FACS. These represented the top 10% of AC133-positive CICs and the bottom 20% of the AC133-negative bulk population. The negative cells were split in half: 2x10^6 were used for negative enrichment, and the other 2x10^6 for pre-absorbing the positive pool. In this way, the positive and negative populations were FACS sorted each day, for four consecutive days.

2.5.2 Selection of CIC-specific synthetic antibodies

A custom high-diversity library of synthetic antibody fragments (Fabs) displayed on M13 bacteriophage, called Library I (Figure 3B), was first panned on the surface of bulk, AC133-negative cells to remove sticky binders. Nonbinding phage-Fab was immediately transferred to the AC133-positive population and incubated to allow for binding. Non-specific phage-Fab was washed off, and specific binders were eluted for subsequent amplification (positive enrichment, Figure 3C). In parallel, sticky binders were eluted from the AC133-negative population and amplified (negative enrichment). For the second round, the positive enrichment pool was again counter-selected on bulk cells, and non-binders were transferred to the AC133-positive population. The negative enrichment pool was directly panned on bulk AC133-negative cells each day, and binders were eluted and amplified. Selections were carried out in this way for four rounds over four consecutive days (Figure 3C). Through repeated rounds of selection, phage-Fabs with binding specificity for the AC133-positive, presumptive CIC population become enriched in the positive pool and depleted in the negative pool. The unique sequences of each antibody's complementarity-determining regions (CDRs) determine its binding specificity and biophysical properties. These CDRs, encoded in the variable regions of Fab constructs, also serve as molecular barcodes that can be read by deep sequencing, thus linking genotype to phenotype. Following the final round of selection, the CDRs from pools of phage-Fabs bound to the AC133-positive cells (positive pool) or to bulk cells (negative pool) were PCR amplified and deep sequenced. In parallel to deep sequencing, I amplified and sequenced 96 individual phage clones from each of the positive and negative pools. Many unique Fab sequences were present a single time in either pool; however, one sequence, AN01, was identified three times (~3% frequency) in the positive pool and zero times in the negative pool. This top hit was identified approximately one week after beginning the selection process, allowing immediate follow-up characterization. The deep sequencing data both confirmed my single clone approach by identifying AN01 as the most enriched binder, and provided a list of many hundreds of additional candidate binders specific to the positive pool (see Table 3 and Table S1). Prospective CIC binders were selected based on relative abundance; I subcloned their CDRs into Fab expression constructs, then expressed and purified these Fabs for follow-up experiments. This work led to the development of seven Fabs, which I named AN01-AN07 and numbered in order of enrichment in the positive selection phage pool (see Table 3).

Table 3: CellectAb identifies many potential colorectal CIC binders.

Phage-Fab output pools from the positive enrichment and negative enrichment were collected after the fourth round of selection. Complementarity-determining regions (CDRs) were PCR amplified, and the CDR heavy chain region 3 (CDRH3) and CDR light chain region 3 (CDRL3) were deep sequenced and translated to amino acid sequence. Translated CDR sequences are shown except for AN01, AN02 and AN03, which are redacted to show sequence length but not composition. The total number of reads in the positive (+) enrichment is shown in column 4, and column 5 shows the enrichment score calculated as: log2 read counts in the positive (+) selection pool / log2 read counts in the negative selection pool.

Antibody ID, CDRH3 amino acid sequence, CDRL3 amino acid sequence, Read count (+ pool), Enrichment score

AN01 X[18] X[6] 579593 18.61

AN02 X[10] X[8] 15347 13.37

AN03 X[19] X[6] 12736 13.1

AN04 YHWYAM GSGLI 10027 12.75

AN05 YPVASFGWAPPYSWYYYGF SSHSLF 5187 11.8

AN06 PAFRPSYSYWYYSWAL SYSGLI 1579 10.09

AN07 YYSPYSAL SYYDYLF 1182 9.67

2.5.3 Antibodies bind preferentially to the AC133-high POP92 population

I first assessed binding of these seven Fabs to the surface of live, bulk POP92 cells by flow cytometry. Of these, the top three enriched Fab sequences (AN01, AN02 and AN03) bound robustly to the cell surface (Figure 4A). To test if these antibodies bind specifically to AC133-positive presumptive CICs, I co-stained pairwise with each Fab and AC133 and analyzed by flow cytometry. The top and bottom 10% of AC133-expressing cells were gated, as during the selection process (Figure 4B top), and Fab-binding mean fluorescence intensity (MFI) was compared between populations (Figure 4B bottom). The observed MFI for each of AN01, AN02 and AN03 was significantly higher on AC133-positive compared to AC133-negative cells (Figure 4C, p<0.01, unpaired t-test with Welch’s correction, N=7), validating the ability of my selection strategy to yield antibodies with preferential specificity for the AC133-positive subpopulation. Interestingly, I did observe binding, albeit at a reduced level, to the AC133-negative population, demonstrating that my methodology allows successful selection against antigens expressed on a continuum between the subpopulations of interest.
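The gating comparison described above can be sketched as follows: events are ranked by AC133 signal, the top and bottom 10% are taken as the AC133-high and AC133-low gates, and the Fab MFI of the two gates is compared as a fold increase. This is an illustrative simplification of flow cytometry gating, and the event data and function names are hypothetical.

```python
def gate_fraction(events, fraction, top=True):
    """Return the events in the top (or bottom) fraction by AC133 signal.

    Each event is an (ac133_signal, fab_signal) pair for one cell.
    """
    ordered = sorted(events, key=lambda event: event[0], reverse=top)
    n = max(1, round(len(ordered) * fraction))
    return ordered[:n]


def mfi(events):
    """Mean fluorescence intensity of the Fab channel."""
    return sum(fab for _, fab in events) / len(events)


def fold_increase(events, fraction=0.10):
    """Fab MFI on AC133-high cells relative to AC133-low cells."""
    high = gate_fraction(events, fraction, top=True)
    low = gate_fraction(events, fraction, top=False)
    return mfi(high) / mfi(low)


# Hypothetical events in which Fab signal rises with AC133 signal.
toy_events = [(float(i), 100.0 + 2.0 * i) for i in range(100)]
ratio = fold_increase(toy_events)
```

A ratio above 1 indicates preferential binding to the AC133-high gate, while a non-zero MFI in the low gate corresponds to the reduced but detectable binding noted in the text.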

[Figure 4: panels A and B, flow cytometry histograms (Counts vs. Antibody-APC) for AN01, AN02, AN03 and secondary-only control on POP92 cells, with CD133+ and CD133- gates; panel C, bar chart of fold increase in mean fluorescence intensity on CD133+ CICs for AN01, AN02, AN03 and control.]

Figure 4 – AN01, AN02 and AN03 bind selectively to the CD133+ presumptive CIC population A) Live, surface staining of POP92 cells with AN01, AN02 and AN03 antibodies followed by anti-human IgG conjugated to APC. Histogram from a representative flow cytometry experiment showing binding to POP92 cells. B) Histograms from a representative flow analysis of POP92 cells co-stained with AC133-FITC and AN0#-APC. The top and bottom 10% of AC133-expressing cells are compared for AN0# binding. C) Bar chart of the mean fold increase in AN01, AN02 or AN03 mean fluorescence intensity (MFI) on AC133-positive as compared to AC133-negative cells (normalized to 1 in each experiment). Error bars are standard error of the mean (SEM) and significance was calculated using a Student’s t-test. N=7 (AN01), N=6 (AN02) and N=7 (AN03) biological replicates.

2.5.4 AN01, AN02 and AN03 recognize distinct cell surface antigens on a variety of cell types

As I selected antibodies on CD133-positive cells, I may have identified new CD133-specific binders. To test this, I mixed HEK293 cells overexpressing a CD133-GFP fusion protein and a matched empty-vector control at a 1:1 ratio, stained with each Fab, and analyzed by flow cytometry. The matched lines were gated separately based on GFP expression, and no change in Fab binding between the CD133 overexpression line (GFP+) and negative control (GFP-) was observed, indicating that my Fabs do not target the CD133 protein (Figures 5A and B). AC133 antibody was used as a control for CD133 overexpression. I went on to characterize binding patterns for each Fab on a panel of cell lines including additional CRC lines as well as lines derived from breast, prostate and pancreatic cancer. I observed unique binding patterns for each Fab that did not correlate directly with CD133 status (Table 4 and Figure 5C). Together, these observations indicate that AN01, AN02 and AN03 do not bind directly to CD133, do not bind ubiquitously to the cell surface, and each bind to a distinct antigen. The sequence corresponding to each Fab CDR was also cloned into a human single-chain IgG vector for expression in mammalian cells, and similar binding patterns were observed.

Table 4: AN01, AN02 and AN03 expression pattern across cancer cells from many tumour types.

A panel of cell lines derived from a variety of normal and cancer tissue types was assessed for CD133 expression status and for cell surface staining pattern with AN01, AN02 and AN03 antibodies by flow cytometry. Cell lines are classified as either CD133 positive (+) or CD133 negative (-) by transcript and staining with AC133 antibody. Grey shading indicates CD133 positivity. Cell lines are classified for AN01, AN02 and AN03 by percent positive staining for the indicated antibody. “+” = 1-25% positive cells, “++” = 25-75% positive cells, and “+++” = 75-100% positive cells. “-” is <1% staining. All percent positives are calculated as % live cells shifting above the secondary-only control stain.

Relative Binding

Cancer type Cell Line CD133 status AN01 AN02 AN03

Breast MAC-LS-2 + ++ ++ +++

BT549 - - - -

MCF7 - ++ + -

HCC1954 - - + +

BT20 - + - -

Colon POP92 + ++ +++ +++

HCT116 + ++ ++ -

Caco2 + ++ - -

Pancreas GP3A - - +++ -

KP2 - + +++ +++

KP4 - - ++ -

HPAC + - - ++

PL45 + - +++ -

Prostate LNCaP - ++ - -

PC3 - +++ ++ -

DU145 - - - -

Leukemia HAP1 - +++ ++ -

Non-cancerous HUVEC - + +++ -

293T - +++ - -

293T CD133-GFP + +++ - -
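The scoring scheme in the Table 4 caption can be written as a small binning function. The caption's bins share their edges (25% and 75%), so assigning an edge value to the higher bin is an assumption made here, as are the function name and the example percentages.

```python
def binding_class(percent_positive):
    """Map percent positive cells to the symbols used in Table 4.

    Edge values (25 and 75) are assigned to the higher bin, which
    is an assumption; the caption does not say which bin they
    belong to.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive < 1:
        return "-"
    if percent_positive < 25:
        return "+"
    if percent_positive < 75:
        return "++"
    return "+++"


# Hypothetical percentages, for illustration only.
examples = {0.4: "-", 12.0: "+", 25.0: "++", 60.0: "++", 98.5: "+++"}
results = {pct: binding_class(pct) for pct in examples}
```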

[Figure 5: panel A, gating of a 50% empty vector / 50% CD133-eGFP HEK293T mixture (forward scatter vs. eGFP); panel B, histograms (Counts vs. Antibody-APC) of anti-CD133, AN01, AN02 and AN03 binding on eGFP+ and eGFP- populations; panel C, histograms of CD133 and AN0#-APC staining with secondary-only controls on HEK293T, HUVEC, KP2, GP3A and MAC-LS-2 cells.]

Figure 5 – AN01, AN02 and AN03 do not bind CD133, and recognize distinct cell surface targets A) HEK293T cells stably expressing empty vector or CD133-eGFP fusion were mixed 50:50, stained with antibody and gated on eGFP expression. Binding of CD133 antibody, AC133, confirms expression exclusively on the eGFP-positive population. B) Histograms show binding of each antibody in populations gated on eGFP expression. C) A panel of cell lines including HEK293T, HUVEC human vascular endothelial cells, KP2 and GP3A pancreatic ductal adenocarcinoma and MAC-LS-2 breast cancer cells were stained with AC133, AN01, AN02 or AN03 to assess binding patterns. Histograms from representative flow cytometry experiments show CD133 binding (top) and AN01, AN02 and AN03 binding patterns (bottom).

2.5.5 Identification of prospective protein targets for AN01, AN02 and AN03

Next, I aimed to identify the cell surface protein to which each antibody was binding. To do this, I exploited the binding specificities and immunoglobulin framework of my antibodies to immunoprecipitate (IP) their respective targets. Briefly, POP92 whole cell lysates were incubated with antibody, and immunocomplexes were precipitated using conjugated beads, eluted and digested with trypsin; peptides were then detected and identified by mass spectrometry (MS). Lysate incubated with beads only served as a universal negative control, and because these antibodies bind different targets, each IP could serve as a negative control for the others. All three antibodies pulled down reasonable peptide counts for human IgG light and heavy chains, indicating that the IPs were successful (Table 5 and Table S2). To identify specifically bound peptides, results were filtered for zero counts in the bead-only control, and top unique counts were filtered for each of AN01, AN02 and AN03.
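The filtering step can be sketched with the peptide counts reported in Table 5: proteins must have zero counts in the beads-only control (BOC), a non-zero count for the antibody of interest, and little or no signal in the other two IPs. The cut-off for counts tolerated in the other IPs (max_other) and the function name are assumptions for illustration; the text does not state the exact uniqueness threshold used.

```python
# Peptide counts copied from Table 5: (BOC, AN01, AN02, AN03).
TABLE_5 = {
    "Ig Kappa V-I": (0, 44, 39, 59),
    "Ig Heavy V-III": (0, 7, 17, 18),
    "ITGA7": (0, 31, 0, 0),
    "ITGB1": (0, 25, 0, 1),
    "ITGA7-X1B": (0, 1, 0, 0),
    "SLC3A2": (0, 0, 0, 6),
    "ITGAV": (0, 0, 0, 2),
    "ITGB6": (0, 0, 0, 2),
    "HLA-A1": (0, 0, 33, 2),
}
ANTIBODIES = ("AN01", "AN02", "AN03")


def top_hits(table, antibody, max_other=2):
    """Proteins absent from the BOC and enriched in one antibody's IP.

    max_other (an assumed threshold) is the highest peptide count
    tolerated in the other antibodies' IPs.
    """
    idx = ANTIBODIES.index(antibody) + 1  # column 0 is the BOC
    hits = []
    for protein, counts in table.items():
        others = [c for i, c in enumerate(counts[1:], start=1) if i != idx]
        if counts[0] == 0 and counts[idx] > 0 and max(others) <= max_other:
            hits.append((protein, counts[idx]))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


hits_by_antibody = {ab: top_hits(TABLE_5, ab) for ab in ANTIBODIES}
```

Because the IgG light and heavy chains are recovered in all three IPs, the uniqueness filter removes them automatically, leaving only proteins specific to a single antibody.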

The top cell surface protein identified as unique to AN01 IPs, with 31 peptides, was integrin alpha (α) 7 (encoded by ITGA7), followed closely by integrin beta (β) 1 (encoded by ITGB1) with 25 peptides (Table 5 and Table S2). Together, these α and β subunits form a heterodimeric laminin receptor and cell adhesion molecule, integrin α7β1.

The top cell surface protein uniquely pulled down by AN02, with 33 peptides, was the human leukocyte antigen (HLA) major histocompatibility complex (MHC) class I antigen A-1 (HLA-A1, encoded at the HLA-A gene locus) (Table 5 and Table S2). MHC class I molecules are cell surface receptors, expressed on all nucleated cells, that act as part of the adaptive immune system. Their purpose is to present peptides from the cytoplasm to surveilling immune cells. In the case of infection or cancer, these peptides may be identified by cytotoxic T-cells as “non-self”, targeting the aberrant cell for destruction. There are over two thousand unique HLA-A haplotypes, broken down by serotype (e.g. HLA-A1, HLA-A2) and further by allele (e.g. the HLA-A1 serotype is broken down into the HLA*0101 and HLA*0102 alleles, etc.). Each diploid cell expresses up to two unique haplotypes from the HLA-A locus, and these are used for tissue matching during transplantation [107]. The vast majority of HLA-A peptides identified represent the HLA-A1 serotype, but the allele could not be specified further at the given MS coverage. This haplotype, and the ability of AN02 to discriminate between various cancer cell lines, suggest that my antibody is a serotype-specific binder for HLA-A1 and not for other HLA-A serotypes.

The AN03 IP produced somewhat less clear-cut results than those for AN01 and AN02, in that there were low peptide counts for unique, cell surface prospective targets. However, the integrin α subunit V (encoded by ITGAV) and integrin β subunit 6 (encoded by ITGB6), which together make integrin αVβ6 (Table 5 and Table S2), were specifically pulled down with AN03. This heterodimeric cell adhesion molecule acts as a receptor for fibronectin, vitronectin and Latency-Associated Peptide bound TGFβ (LAP-TGFβ) [72, 108].

Table 5: IP-MS results 1- Potential targets of AN01, AN02 and AN03.

Selected IP-MS results. POP92 whole cell lysate was incubated with scIgG, followed by Protein G bead pull-down. BOC is beads only control. Peptide counts for top hits are shown. See Table S2 for full IP-MS results.

Protein Description BOC AN01 AN02 AN03

Ig Kappa V-I IgG light chain 0 44 39 59

Ig Heavy V-III IgG heavy chain 0 7 17 18

ITGA7 Integrin α7 0 31 0 0

ITGB1 Integrin β1 0 25 0 1

ITGA7 -X1B Integrin α7 isoform X1B 0 1 0 0

SLC3A2 CD98 heavy chain 0 0 0 6

ITGAV Integrin αV 0 0 0 2

ITGB6 Integrin β6 0 0 0 2

HLA-A1 HLA-A, serotype1 0 0 33 2

2.5.6 Validation of the protein targets for AN01, AN02 and AN03

To test if the proteins identified by IP-MS represent the true targets of my antibodies, I tested whether perturbing expression of each prospective gene target could alter binding. First, I used RNA interference (RNAi) to knock down expression of the purported antibody targets in POP92 cells and assessed binding by flow cytometry. I observed a dramatic reduction in binding of AN01 to POP92 cells stably transduced with two independent short hairpin RNAs (shRNAs) against ITGA7 or ITGB1, compared to control non-targeting shRNA (Figure 6A and 6D). Similarly, I observed a dramatic reduction in AN02 binding when POP92 cells were treated with shRNAs against HLA-A (Figure 6B and 6F), and shRNAs against ITGB6 reduced binding of AN03 to POP92 cells (Figure 6C). These results were consistent with the IP-MS results and strongly support the notion that AN01 binds specifically and directly to α7β1, AN02 to HLA-A, and AN03 to αVβ6.

The integrin family is composed of 18 α and 8 β subunits expressed as a total of 24 unique obligate αβ heterodimers [55]. Integrins are assembled as nonrandom pairs; there are promiscuous “hub” subunits (ITGB1 and ITGAV) and monopairing “spokes” (including ITGA7 and ITGB6), which are generally rate-limiting (Figure 2B). The human integrin repertoire shows a high degree of sequence homology between members of the α and β subunit families, likely due to a common evolutionary ancestor. Indeed, C. elegans has only two α subunits and one β subunit, making only two distinct integrin heterodimers [109]. To unequivocally test the integrin heterodimer specificity of my antibodies, I overexpressed twenty different integrin heterodimers by transfection in HEK293T cells and assessed antibody binding by flow cytometry. As an internal control, I spiked labeled control-transfected cells into each tube before staining with AN01 or AN03. I observed a very high degree of specificity of AN01 for α7β1, and of AN03 for αVβ6, suggesting that these antibodies do not cross-react with other integrins (Figure 6E). Further, that AN01 binds to α7β1 but not to other β1-subunit-containing heterodimers (i.e. α1β1, α2β1 etc.) confirms that AN01 binds to the α7 subunit. Similarly, that AN03 binds to αVβ6 and not to other αV-subunit-containing heterodimers (i.e. αVβ1, αVβ5 etc.) confirms that AN03 is specific for β6. However, I cannot rule out antibody binding to a combined epitope at the interface of the α- and β-subunits of these heterodimers, because α7 must be paired with β1, and similarly β6 must be paired with αV, in order for these requisite heterodimers to be expressed at the cell surface.

[Figure 6, panels A–G: flow cytometry histograms (counts vs. AN01-APC, AN02-APC or AN03-APC) for shRNA-treated POP92 cells; bar charts of % maximum fold increase over control by integrin heterodimer transfection, median fluorescent intensity (% shLacZ), and % parental MFI across the HAP1 knockout panel.]

Figure 6 – Validation of cell surface protein targets of AN01, AN02 and AN03. A) Histograms showing indicated antibody binding to POP92 cells treated with control shLacZ or shRNA targeting ITGA7, B) HLA-A, C) ITGB6, or D) ITGB1. E) Bar chart illustrating relative AN01 and AN03 binding on HEK293T cells co-transfected with indicated integrin subunit pairs. MFIs for each transfection were calculated relative to in-tube untransfected control cells. This measurement is presented as % maximum staining for each antibody. In all panels, error bars represent standard deviation (SD) from at least three independent biological replicates. F) Bar chart illustrating AN02 binding to POP92 cells stably transduced with shRNA against HLA-A or shLacZ negative control. Data are represented as % shLacZ MFI. G) Bar chart illustrating the mean fluorescent intensity (MFI) staining of AN01 on a panel of HAP1 cells with various integrin subunits knocked out (KO), presented as % parental MFI from an in-tube parental control. Significance was calculated for each comparison to control by Student’s t-test. *p<0.05, ** p<0.01, *** p<0.001.


Consistent with this, my IP-MS experiments showed peptides from α and β subunits for both targets.

To test for non-specific binding to all cells (i.e. stickiness), I employed a panel of integrin knockout lines derived from the human haploid cell line, HAP1. Parental HAP1 cells show high levels of AN01 binding by flow cytometry. When ITGA7 or ITGB1 is knocked out, binding of AN01 is completely abrogated, while knockout of other integrin subunits causes minimal changes in expression of integrin α7, and thus in AN01 surface binding, in this line (Figure 6G). HAP1 cells do not express HLA-A1 or integrin β6, thus no binding of AN02 or AN03 was observed across the entire integrin KO panel (data not shown). Together, these findings clearly show that AN01 is specific to cell surface integrin α7β1, AN03 to αVβ6, and AN02 to HLA-A1.
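The normalizations behind the binding data in Figure 6E and 6G (signal relative to an in-tube control, then expressed as a percentage of the maximum or of the parental line) can be sketched as follows. This is a minimal illustration assuming simple ratios of fluorescence intensities; the function names and the example values are hypothetical and are not taken from the experiments.

```python
def fold_over_control(sample_mfi, control_mfi):
    """Fluorescence of the test population divided by the in-tube control."""
    if control_mfi <= 0:
        raise ValueError("control MFI must be positive")
    return sample_mfi / control_mfi


def percent_of_maximum(folds):
    """Rescale fold-over-control values so the strongest condition is 100%."""
    top = max(folds.values())
    return {name: 100.0 * value / top for name, value in folds.items()}


def percent_of_parental(knockout_mfi, parental_mfi):
    """Knockout-line signal as a percentage of the in-tube parental control."""
    return 100.0 * fold_over_control(knockout_mfi, parental_mfi)


# Hypothetical (test MFI, in-tube control MFI) pairs for one antibody.
example = {
    "a7b1": (5000.0, 100.0),
    "a5b1": (150.0, 100.0),
    "Control": (100.0, 100.0),
}
folds = {name: fold_over_control(s, c) for name, (s, c) in example.items()}
print(percent_of_maximum(folds))
print(percent_of_parental(25.0, 100.0))
```

Because the control cells are stained in the same tube as the test cells, the ratio cancels tube-to-tube differences in staining before conditions are compared.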

2.5.7 ITGA7 and ITGB6 antibodies enrich for CRC CICs

Non-adherent sphere formation is an in vitro surrogate assay for CIC function and self-renewal. To test the ability of my antibodies to prospectively identify CRC CICs, I stained POP92 cells and FACS sorted the high and low populations into an in vitro sphere formation assay (SFA) (Figure 7A). As single stains, both AN01 and AN03 were able to significantly enrich for sphere forming cells (Figures 7B and 7D), while AN02 showed a trend towards enrichment but did not achieve statistical significance (Figure 7C). To test for functional self-renewal capacity, these spheres were dissociated, counted, and serially passaged into a secondary SFA. The increase in sphere forming efficiency (SFE) was maintained on secondary passage for all three targets, indicating that these antibodies may enrich for self-renewing, stem-like CICs in the POP92 model (Figures 7B-D). Together, these findings validate my CellectAb approach for the identification of binders specific for a cellular subpopulation of interest. Moreover, these binders can be readily used to identify protein targets and, as single agents, to enrich for a functionally relevant cell population.
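The readout used in these assays, sphere forming efficiency as the percentage of seeded cells that give rise to a sphere, and the paired comparison of antibody-high and antibody-low fractions can be sketched as below. The replicate counts are hypothetical, and the helper names are assumptions made for illustration; only the t statistic and degrees of freedom are computed here.

```python
from math import sqrt
from statistics import mean, stdev


def sphere_forming_efficiency(spheres_counted, cells_seeded):
    """Percentage of seeded cells that gave rise to a sphere."""
    if cells_seeded <= 0:
        raise ValueError("cells_seeded must be positive")
    return 100.0 * spheres_counted / cells_seeded


def paired_t_statistic(high, low):
    """t statistic and degrees of freedom for paired replicates.

    Each position in `high` and `low` must come from the same sort, so the
    test is run on the per-replicate differences.
    """
    if len(high) != len(low) or len(high) < 2:
        raise ValueError("need at least two matched replicates")
    diffs = [h - l for h, l in zip(high, low)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (spread / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical sphere counts from three sorts, 200 cells seeded per well.
high_sfe = [sphere_forming_efficiency(s, 200) for s in (36, 30, 40)]
low_sfe = [sphere_forming_efficiency(s, 200) for s in (12, 14, 10)]
t_stat, dof = paired_t_statistic(high_sfe, low_sfe)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A two-tailed p-value then follows from the t distribution with the reported degrees of freedom, for example via `scipy.stats.ttest_rel` if SciPy is available.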

2.5.8 Integrin α7 antibody enriches for other types of stem cells

While it has not been previously implicated in CRC, the laminin receptor integrin α7 is a well-described human muscle stem cell marker [110]. In collaboration with Dr. Penney Gilbert’s lab, Sadegh Davoidi used AN01 on a human muscle biopsy to isolate human muscle stem cells, and found that it was comparably effective to a commercially available antibody (Figure 8A

[Figure 7, panels A–D: A) workflow schematic (dissociate and stain; sort for positive vs. negative; sphere formation assay (SFA); count spheres; secondary SFA) for 1) AN01 (integrin α7), 2) AN02 (HLA-A1) and 3) AN03 (integrin β6). B–D) Bar charts of % sphere forming efficiency in primary and secondary SFA for AN01+/AN01-, AN02+/AN02- and AN03+/AN03- populations, with a no-stain control.]

Figure 7 – AN01 and AN03 enrich for self-renewing CICs. A) Schematic of FACS sorting approach. POP92 spheres were dissociated, stained with indicated antibody and viability dye, and top and bottom 10% of expressing cells were sorted into sphere formation assays (SFA). Primary sphere formation was assessed by sphere counting. Primary spheres were dissociated, cells counted and equal numbers seeded into secondary SFAs. B) Bar charts showing the mean sphere forming efficiency (SFE) as % of cells giving rise to a sphere, for FACS sorted high and low staining AN01 (integrin α7) and C) AN02 (HLA-A1) and D) AN03 (integrin β6), with individual biological replicates indicated as unfilled circles. The primary SFA is shown to the left of the dotted vertical line, and the secondary SFA is shown to the right. The FACS sorted, but unstained control is shown in (B) to illustrate the average SFE of bulk POP92. Error bars represent SEM, and significance was calculated by paired, two tailed t-test. *p<0.05, ** p<0.01, *** p<0.001.


and 8B). Given that CD133 is an established marker for GBM stem cells, that growth on laminin supports stemness in GBM and other types of stem cells, and that the α7-related integrin, α6, marks GBM stem cells [38, 111, 112], I hypothesized that the closely related integrin α7 may also play a role in GBM. To this end, I shared my antibody with Dr. Sheila Singh’s laboratory, where Nicholas Yelle used two patient-derived GBM models, which maintain the heterogeneity of the patient, to test whether the combination of CD133 and integrin α7 could isolate GBM stem cells. In short, GBM spheres were dissociated, co-stained for CD133 and integrin α7 and sorted into four quadrants: CD133+/α7+, CD133-/α7+, CD133+/α7- and CD133-/α7-. Sphere formation and proliferation were assessed. In both models, double negative cells (CD133-/α7-) formed the fewest spheres and, as expected, the double positive population (CD133+/α7+) formed significantly more spheres than the double negative cells (Figures 9A and 9C). Interestingly, in both models the α7 single positive (CD133-/α7+) cells formed more spheres, and were more proliferative, than the CD133 single positive cells (CD133+/α7-), indicating that α7 may be a bona fide GBM stem cell marker with even greater utility than CD133 (Figure 9). Very recently, Haas and colleagues also identified integrin α7 as a functional marker in GBM [113].
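The four-quadrant sort and the normalization to the double negative population (the scale used on the axes of Figure 9) can be sketched as below. This is an illustrative sketch only: the gate positions, intensities and readout values are hypothetical, and real gates would be set from single-stain and unstained controls on the sorter.

```python
def quadrant(cd133, itga7, cd133_gate, itga7_gate):
    """Assign one cell to a CD133/ITGA7 quadrant from two intensities."""
    c = "+" if cd133 >= cd133_gate else "-"
    a = "+" if itga7 >= itga7_gate else "-"
    return f"CD133{c}/ITGA7{a}"


def percent_of_double_negative(readout):
    """Express each quadrant's readout relative to the double negatives."""
    baseline = readout["CD133-/ITGA7-"]
    if baseline <= 0:
        raise ValueError("double negative readout must be positive")
    return {q: 100.0 * value / baseline for q, value in readout.items()}


# Hypothetical (CD133, ITGA7) intensities for four cells, gates at 1000.
cells = [(5000, 4000), (200, 3500), (4200, 150), (90, 80)]
print([quadrant(c, a, 1000, 1000) for c, a in cells])

# Hypothetical sphere counts per quadrant.
spheres = {
    "CD133+/ITGA7+": 30,
    "CD133-/ITGA7+": 24,
    "CD133+/ITGA7-": 15,
    "CD133-/ITGA7-": 10,
}
print(percent_of_double_negative(spheres))
```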

2.5.9 Staining with antibody has no functional effect

AN01-AN03 isolate a functionally relevant cell population from POP92, and collaborators have successfully used AN01 to isolate stem-like cells in two additional models. For those experiments, the presence of integrins α7, β6 and HLA-A1 was used simply to identify, mark and isolate cells of interest by antibody binding and FACS. One concern with this strategy might be that high staining with antibody could alter the function of the target antigen, perhaps leading to a biological effect. For example, staining with an agonistic antibody might increase self-renewal. In contrast, an antagonistic antibody might block signaling and reduce self-renewal. To test whether antibody staining had any functional effect on cell viability and/or stemness, I treated POP92 cultures with each antibody at concentrations up to 50-fold over the staining concentration (up to 100 µg/mL). I observed no significant effect of antibody treatment on growth or sphere formation (Figures 10A and 10B). These results indicate that while these antibodies mark functionally relevant cells, they themselves do not have an overt functional effect on proliferation or self-renewal.

[Figure 8, panels A–B: A) flow cytometry gating plots of human muscle biopsy cells (7AAD vs. FSC; CD45-CD11b-GlyA-CD31 vs. CD34; CD56 vs. ITGA7) stained with AN01 or the commercial antibody. B) Overlaid histograms of ITGA7 staining (unit area) for AN01 and the commercial antibody.]

Figure 8 – AN01 (integrin α7) marks human muscle stem cells. A) Isolated human muscle biopsy cells stained with the appropriate muscle stem cell antibody panel and with AN01 (integrin α7), compared to a commercial integrin α7 antibody. B) Flow cytometry plot of AN01 binding to isolated muscle stem cells compared directly to the commercial antibody.

[Figure 9, panels A–D: sphere formation and Presto Blue proliferation (% double negative population) for BT799 (A, B) and GBM87 (C, D) cells sorted by ITGA7 (AN01) and CD133 status.]

Figure 9 – AN01 (integrin α7) marks human glioblastoma (GBM) stem cells. Two patient-derived GBM stem cell lines were co-stained with AN01 and anti-CD133 and FACS sorted into sphere formation and proliferation assays. A) Boxplot showing sphere formation for CD133 and AN01 quadrant sorted BT799 GBM patient tumour cells and B) bar plot showing associated proliferation. C) Sphere formation and D) proliferation for GBM87. Data is from one representative experiment, N=2.

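[Figure 10, panels A–B: A) bright field images for negative control, positive control, AN01, AN02 and AN03 treatments. B) Bar plot of sphere forming efficiency (%) for PBS, AN01, AN02, AN03, negative control and positive control treatments (ns).]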

Figure 10 – Antibody binding is not cytotoxic. A) Representative bright field images for POP92 cells treated with 50 µg/mL of indicated antibody. A non-binding IgG raised against maltose binding protein serves as a negative control, and anti-ROBO4 antibody serves as a positive control. B) Bar plot showing percent sphere forming efficiency in the indicated antibody treatments. Error bars represent SD.

2.5.10 Integrin β6 is required for sphere formation

I next hypothesized that expression of the protein targets of my antibodies may promote sphere formation and self-renewal. To test this, I used the CRISPR-Cas9 system to knock out the ITGA7 or ITGB6 genes in POP92 cells and assessed sphere formation. Cells were treated with multiple short guide RNAs (sgRNA) against each target over a 72-hour period to allow for genome editing to occur. At this time, sgITGA7 cells were stained with AN01, and sgITGB6 cells with AN03. Then the presumptive knockout populations (integrin α7 KO and integrin β6 KO, respectively) and control cells were FACS sorted into sphere formation assays. I observed no significant effect of loss of integrin α7 on POP92 sphere formation with two validated sgRNAs compared to sgLacZ sorted control cells, indicating that this integrin is not required for in vitro sphere formation in CRC CICs (Figure 11A). However, cells without integrin β6 were severely impaired in sphere formation (Figure 11C). I also used shRNA against HLA-A and FACS sorted for AN02 (HLA-A1) low cells and observed a significant reduction in sphere formation (Figure 11B). These findings suggest that while integrin α7 may mark a functional subpopulation of cells, α7 does not play a direct role in mediating sphere formation or self-renewal in this cell type. In contrast, HLA-A1 and integrin β6 both mark self-renewing cells to some extent and also play a functional role in sphere formation, at least in vitro.

2.5.11 CellectAb antibodies are species cross reactive

To test whether my integrin antibodies can also recognize species orthologs, I tested for binding on mouse cell populations. Mouse and human ITGB6 have ~90% amino acid sequence identity, while ITGA7 has ~87%. I found that AN01 was able to bind to mouse splenocytes and infiltrating mouse stromal cells within the POP92 xenograft tumour (Figures 12A and 12B). This is consistent with AN01 binding integrin α7 at an epitope conserved between mouse and human. Notably, I observed only ~2% integrin α7 positive human tumour cells in the POP92 xenograft, representing a drastic reduction from POP92 spheres cultured in vitro (Figure 12B). Given that integrin α7 is low in xenograft tumour cells, but is expressed highly on mouse cancer-associated stromal cells, perhaps this integrin plays a complex role in mediating xenograft-microenvironment interactions. Conversely, AN03 was unable to bind to mouse splenocytes or mouse cells infiltrating the tumour (not shown). It is unclear whether AN03 is unable to bind mouse integrin β6, or if β6 is not expressed on these cell types. Given the finding that integrin β6

[Figure 11, panels A–C: bar plots of sphere formation (% control) for A) sgLacZ vs. sgITGA7 #1 and #2 (n=3), B) shLacZ vs. shHLAA #1 and #2 (n=3), and C) sgLacZ vs. sgITGB6 #1, #2 and #3 (n=4).]

Figure 11 – Integrin β6 is required for sphere formation. A) Bar plot illustrating mean sphere forming efficiency (SFE) for FACS sorted AN01 negative POP92 stably transduced with unique gRNAs against ITGA7 (sgITGA7) or a control guide (sgLacZ). N=3. B) Mean SFE for FACS sorted AN02 negative POP92 cells stably transduced with shHLA-A. N=3. C) Mean SFE for FACS sorted AN03 negative POP92 cells stably transduced with sgITGB6. N=4. All error bars represent SEM and significance was calculated by paired Student’s t-test. *p<0.05, ** p<0.01, *** p<0.001


may be a relevant therapeutic target, I assessed the AN03 antibody for binding to the Macaca fascicularis (cynomolgus monkey) integrin αVβ6 heterodimer overexpressed in CHO cells. M. fascicularis is used extensively in medical experiments because its physiology is close to that of humans, which allows for toxicology studies. Importantly, AN03 is cross-reactive with M. fascicularis integrin β6 (Figure 12C), demonstrating that CellectAb can quickly and robustly generate highly specific, species cross-reactive antibodies that could be tested further in preclinical models, including the cynomolgus monkey.

2.5.12 CellectAb antibodies bind structural epitopes

In order to investigate epitope specificity and the utility of these antibodies in standard biochemical assays, I tested them in various applications including recombinant ELISA and western blot. AN01 bound specifically, albeit weakly, to recombinant mouse integrin α7β1 in one experiment (Figure 12D). In contrast, AN03 could not recognize its recombinant protein target in vitro, even with fresh antigen, suggesting that it binds to a conformation-specific epitope that requires heterodimers to be displayed on the plasma membrane (not shown). Additionally, none of the three antibodies were functional in a denaturing western blot, again suggesting that they do not bind to linear peptides, but rather to structural epitopes (not shown). With the Department of Pathology at Toronto General Hospital, AN03 (integrin β6) was optimized and tested for binding by immunohistochemistry (IHC) on paraffin-embedded xenograft samples. AN03 performed comparably to the commercial β6 antibody MATF1037, and was highly specific for β6-positive POP92 tumour, as compared to a β6-negative breast cancer xenograft (Figure 13A). The αVβ6 integrin acts as a receptor for latency-associated peptide (LAP) associated with TGFβ, thus freeing the active form of TGFβ and promoting downstream signaling. To test whether αVβ6 is associated with TGFβ signaling in POP92 xenografts, anti-phospho-SMAD3 was used as a readout for TGFβ activation. Specifically, sequential tumour sections were stained by IHC with the commercial β6 antibody, AN03 and phospho-SMAD3. This revealed that areas in the tumour with high nuclear localized phospho-SMAD3 (i.e. high TGFβ activation) also expressed integrin β6, with either antibody (Figure 13B, see arrows). Conversely, areas with no phospho-SMAD3 staining showed low β6, suggesting that αVβ6 activates TGFβ in this model, and that AN03 can be used to identify TGFβ activated cells. Taken together, these observations demonstrate that

[Figure 12, panels A–D: A) histogram (counts vs. Alexa-488) of mouse splenocytes stained with control IgG or AN01. B) Xenograft cells gated as human or mouse by anti-mouse H2K-647, with histograms of control IgG and AN01 binding. C) Mean fluorescent intensity of secondary, anti-integrin β6 and AN03 staining on wildtype CHO and CHO cyno αVβ6 cells. D) ELISA absorbance (450 nm) for secondary, anti-CD98 and AN01 against BSA, mA7B1, ATP1A1, CD98 full and CD98 iso1.]

Figure 12 – AN01 and AN03 are species cross-reactive. A) Representative histogram from flow cytometry of live, singlet isolated mouse splenocytes stained with AN01. B) POP92 isolated xenograft cells stained with anti-mouse H2K Alexa647 and gated for human versus mouse cells. Histograms show AN01 and control IgG binding to each population. C) Parental CHO cell line and CHO cells overexpressing M. fascicularis integrin αVβ6 stained with commercial β6 antibody or AN03. D) Bar chart from one ELISA experiment showing AN01 and commercial CD98 antibody binding to various recombinant proteins. Error bars are SD from 4 technical replicates.

[Figure 13, panels A–B: A) IHC images of MCF7 (integrin β6 negative) and POP92 (integrin β6 positive) xenografts stained with AN03 (ITGB6) or MATF1037 (commercial anti-integrin β6). B) Sections stained with MATF1037, p-SMAD and AN03, with inset zoom. Scale bar = 50 μm.]

Figure 13 – AN03 is specific for integrin β6 in immunohistochemistry and co-localizes with TGFβ activation. A) Representative IHC images of AN03 and commercial integrin β6 antibody (MATF1037) staining of MCF7 (integrin β6 negative) breast tumour compared to POP92 (integrin β6 positive) colon tumour. B) Semi-sequential sections from POP92 tumours stained by IHC for integrin β6 and phospho-SMAD3 (S423 + S425), a marker of TGFβ signaling activation. Arrow indicates area of integrin β6 and SMAD3 negative tumour cells.


cell-based selections are highly amenable to the creation of antibodies that bind cell-surface targets specifically in physiologically relevant conformations, and that could be useful in diagnostic tests, including IHC. They also suggest a functional role for integrin β6 in mediating CRC CIC function through TGFβ signaling.

2.5.13 CellectAb can identify protein-protein interaction partners of targets

In order to identify the target of AN01, two different tagged versions of AN01 Fab (Flag-tag and Avi(biotin)-Tag) were employed in IP-MS experiments. Of note, SLC3A2 and SLC7A5 peptides were present in AN01 IPs across multiple cell lines (Table 6). These peptides correspond to the heavy (CD98hc) and light (CD98lc) chains, respectively, of the multifunctional heterodimeric cell surface amino acid transporter molecule CD98 [114]. Of five shRNAs against SLC3A2 (CD98hc), two were verified for target knockdown by western blot (Figure 14A), and these were further confirmed by flow cytometry (Figure 14B). While there was a drastic reduction in CD98 antibody binding to cells expressing shCD98hc, there was no change in AN01 binding (Figure 14C). These observations suggest that CD98 physically interacts with α7β1 and thus is co-precipitated with α7 by AN01. To confirm this, I pulled down AN01 from whole cell lysates and observed endogenous co-IP of CD98 with α7β1 by western blot in multiple cell lines (Figure 14D). The CD98 large amino acid transporter, LAT1, is known to interact with β1 integrins [115], and my data confirm this in CICs. Together, these findings demonstrate the utility of my methodology for identifying not only novel cell surface targets, but also interacting partners.


Table 6: IP-MS results 2- AN01 pulls down integrin α7β1 in complex with CD98.

Tables of selected IP-MS results for two separate AN01 Fabs used to IP their presumptive targets and interacting partners from colorectal cancer cell lines POP92 and Caco-2 and breast cancer cell line MAC-LS-2. Flag tagged AN01 Fab was precipitated with anti-FLAG beads, and Biotin (Avi) tagged AN01 Fab was precipitated with streptavidin conjugated beads. BOC is beads only control. Peptide counts for top validated hits are shown. See Table S3 for full IP-MS results.

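| Target | Protein | POP92 Avi BOC | POP92 Avi AN01 | POP92 FLAG BOC | POP92 FLAG AN01 | Caco-2 Avi BOC | Caco-2 Avi AN01 | Caco-2 FLAG BOC | Caco-2 FLAG AN01 | MAC-LS-2 Avi BOC | MAC-LS-2 Avi AN01 | MAC-LS-2 FLAG BOC | MAC-LS-2 FLAG AN01 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Integrin α7β1 | ITGB1 | 0 | 25 | 0 | 19 | 0 | 13 | 0 | 12 | 0 | 0 | 0 | 1 |
| Integrin α7β1 | ITGA7 | 0 | 28 | 0 | 20 | 0 | 16 | 0 | 9 | 0 | 0 | 0 | 0 |
| CD98 | SLC3A2 | 2 | 19 | 0 | 12 | 1 | 23 | 0 | 19 | 1 | 19 | 0 | 10 |
| CD98 | SLC7A5 | 0 | 3 | 0 | 2 | 0 | 3 | 0 | 3 | 0 | 2 | 0 | 3 |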

[Figure 14, panels A–D: A) western blot of shLacZ and shCD98hc #1–#5 lysates probed with anti-CD98 and anti-β actin. B) Histogram (counts vs. anti-CD98-PE) for secondary, shLacZ, shCD98hc #1 and shCD98hc #5. C) Histogram (counts vs. AN01-APC) for the same samples. D) Protein G IP western blots from Caco-2 and MAC-LS-2 (BOC, AN01, IgG, 1% input) probed with anti-integrin β1 and anti-CD98.]

Figure 14 – AN01 does not bind directly to CD98, however CD98 is co-IPed with integrin α7β1. A) Whole cell lysate from POP92 cells treated with shRNA against the CD98 heavy chain (CD98hc) (gene name SLC3A2) run on a western blot for verification of CD98 knockdown. B) Flow cytometry on POP92 cells treated with the two best shCD98hc, stained with anti-CD98-PE, and C) in parallel, AN01. D) IP-Western using AN01 or control IgG (AN02) probed for integrin β1 and CD98 in additional cell lines Caco-2 (colorectal cancer) and MAC-LS-2 (breast cancer).


2.6 Discussion

CellectAb represents a valuable new approach for antibody development and target discovery. The success of my top three candidates, as ranked by total number of read counts in the positive pool, suggests that abundance in positive selection was highly correlated with presumptive-CIC specificity. The failure of additional hits to bind to cells could be due to a loss of specificity upon conversion from the phage-Fab to the Fab modality, to poor antibody properties, or to spurious amplification of lower abundance sequences. Further, given the success of my approach using a population of ~2 × 10^6 CIC-enriched cells to select for binders, it may be possible to scale down and robustly select binders on a smaller cell population of interest.

Beyond the successful technological advancement, this undertaking has yielded three new affinity reagents, one specific for HLA-A1, and two for distinct integrin subunits, α7 and β6, all of which are overexpressed on CD133-positive POP92 cells. Importantly, despite sequence homology between various integrin subunits, my antibodies show no cross-reactivity to other α or β subunit isoforms. In addition, AN02 targets HLA-A1 and shows minimal cross-reactivity with other HLA-A serotypes, validating the CellectAb approach as capable of generating highly selective and specific antibodies against critical epitopes. To my knowledge, overexpression of MHC has not been reported in CRC CICs. While antibody binding trended towards enriching for CICs in this model, it did not achieve statistical significance in primary sphere forming assays. However, in the secondary sphere forming assay the trend was significant, indicating that overexpression of HLA-A may be associated with self-renewal. Further, that knockdown of HLA-A reduced sphere formation suggests that HLA-A may also play an unknown role in CIC function. I also speculate that overexpression of this, or other, MHC may be reflective of an immuno-stimulatory state that could perhaps be exploited with immunotherapy.

Integrin α7 has not previously been implicated in CRC CICs. In my model system, integrin α7 was co-expressed with CD133 on the CIC fraction, and enriched for sphere forming cells. However, ITGA7 was not required for sphere formation or proliferation under the tested conditions. Furthermore, integrin α7 was reduced when these cells were grown in vivo as xenografts, whereas supporting mouse stromal cells did express α7. Together, these results give a conflicting picture of the potential function of integrin α7 in CRC CICs. This is perhaps unsurprising as


in other cancer models, the role of integrin α7 is disputed, with several publications suggesting a tumour suppressor role [116]. As a cell adhesion molecule and laminin receptor, α7 may be dispensable in vitro, but may play a greater role in the tumour microenvironment of a xenograft or patient. This will need to be further investigated in other CRC models. Another layer of complexity in microenvironmental ITH is the expression of integrins on non-cancer cells, including cancer/tumour associated fibroblasts (CAFs or TAFs). This is of growing interest, particularly for therapeutic translation of integrin-targeted drugs [117]. One cancer type where the role of integrin α7 is becoming clear is GBM. For in vitro culture, GBM stem cells, like embryonic stem cells, are often cultured on laminin to maintain a pluripotent state [111, 112]. Also, integrin α6, another laminin-binding integrin and the closest homolog of α7, has been described as a GBM stem cell marker [38]. Since initiation of this study, another group has described integrin α7 as a functional marker for GBM stem cells, and they propose it as a therapeutic target [113]. This group also undertook an antibody screening approach, albeit a much more labour-intensive one, and used mouse monoclonal antibodies rather than synthetic human antibodies, as in our case. In contrast to cancer, integrin α7 plays well-established roles in muscle stem cells, where loss-of-function mutation in ITGA7 leads to a form of muscular dystrophy in mouse models and human patients [70]. Overall, the role of integrin α7 in cancer, particularly CRC, remains to be clarified. It seems likely that other laminin-binding integrins could be functionally redundant in CRC and that perhaps targeting multiple such integrins would be required to see a functional effect in this model.

Integrin β6 has been previously implicated in invasion and metastasis for many tumours, including colon and breast [118, 119], but only linked to CICs directly in squamous carcinoma [120]. In our model, β6 was co-expressed with CD133 and could be used to prospectively isolate CRC CICs. Furthermore, perturbing β6 (and thus the αVβ6 heterodimer) blocked sphere formation, indicating that this model requires this integrin to grow as self-renewing spheroids. The αVβ6 heterodimer is an RGD-binding integrin whose ligands include fibronectin and vitronectin, but its principal ligand is the latency-associated protein of transforming growth factor β (LAP-TGFβ) [72, 108]. αVβ6 binding to LAP induces a conformational change that releases activated TGFβ, which in turn can activate TGFβ signaling [72]. The link between β6 and CRC CICs, described above, may be mediated by one or more of these ligands. IHC


experiments showed co-localization between β6 expression and phospho-SMAD3, a readout of TGFβ activation [121]. This suggests that the role of αVβ6 in our CRC CICs may be through TGFβ, although additional experimentation will be necessary to confirm this. As αVβ6 activates TGFβ, and TGFβ signaling activates transcription of various ECM genes, my identification of αVβ6 as a CRC CIC marker may indicate a role of the CIC in the maintenance of its own ECM microenvironment. β6 is up-regulated in tissues undergoing development and wound healing, and in malignant cancers, especially those with metastasis [118, 119]. However, unlike other integrin subunits, β6 has restricted expression, with low to undetectable levels in most cells of healthy tissue [122], making it a promising cancer associated antigen (CAA), or direct therapeutic target. While AN03 does not block the function of β6 or show an anti-proliferative effect in vitro, it (or another β6 biologic) represents a promising agent to target CRC CICs as an antibody-drug conjugate, or to engage the immune system as a bispecific T-cell engager (BiTE) or CAR T-cell modality. Indeed, our results support what others have suggested, that β6 is a viable target for immunotherapy [123].

CD98 was present in immunocomplexes precipitated by the CellectAb antibodies AN01 (α7β1) and, in one experiment, AN03 (αVβ6) (a novel finding). Thus the CellectAb methodology is capable not only of identifying novel CIC markers, but also, with no additional experimental effort, of identifying protein complexes associated with these markers. In the case of CD98, this finding may point to a potential importance of CD98 in CIC function through association with specific integrin heterodimers. CD98 was originally discovered as an antigen on lymphocytes and is essential for clonal expansion during adaptive immune responses [114]. Similarly, clonal expansion allows CICs to proliferate rapidly and to form a tumour in vivo or a spheroid in vitro. The co-IP of CD98 demonstrates the utility of the CellectAb methodology for identification of protein-protein interaction partners in CICs. The finding that CD98 co-IPs with integrin αVβ6 will also need to be investigated further.

2.7 Conclusion

My CellectAb methodology enables rapid generation of cell subpopulation specific antibodies coupled to the discovery of novel targets. I identified three antibodies with preferential binding to AC133-high CICs. I identified and validated their targets as integrin α7, HLA-A1 and integrin β6. I show that integrin β6 plays a functional role in these POP92 CRC CICs. I propose this technology for the discovery of novel targets and antibodies against rare cell populations in other cancer and stem cell systems.

Chapter 3

3 Genetic Screens for Adherent Cell Growth Identify ITGAV as a Central Regulator of Cell State Preference

3.1 Contributions

This chapter is the result of a highly collaborative effort between myself, other members of the Moffat lab, other laboratories and facilities at the University of Toronto, and a lab at the University of Minnesota. Detailed attribution of contributions follows below. This data chapter is adapted from a manuscript entitled: “Genetic screens for adherent cell growth identify ITGAV as a central regulator of cell state preference”, submitted to Cell on October 17, 2017.

All bioinformatic analyses of published CRISPR screens were performed in collaboration with Dr. Maximillian Billman, from the laboratory of Dr. Chad Myers at the University of Minnesota. The data analysis approach and basic methodology approach were devised as a team effort between myself, Dr. Billman and our supervisors Drs. Myers and Moffat. The ECM CRISPR screens were performed by myself and Keshna Sood, from Dr. Sachdev Sidhu’s laboratory, with guidance from Dr. Michael Aregger and Megha Chandrashekhar. All gDNA extraction, PCR and library prep for guide RNA sequencing was done by myself with assistance from Ms. Sood. Mapping and analysis of raw sequencing data was carried out by the CRISPR Database online tool, and further analysis was carried out by Dr. Billman, in consultation with myself and Dr. Moffat. Rough figures containing published CRISPR screen and ECM CRISPR screen data were generated in RStudio by Dr. Billman after discussion within the team, and I adapted them for presentation. I prepped samples for RNAseq experiments and sequencing was done at the Donnelly Sequencing Center (DSC). Sequencing results were quality controlled and mapped, and fold change differences were calculated, by Dr. Kevin Brown. I designed and planned all wet lab experiments for the remainder of the thesis. I participated in performing the majority of experiments, some alone and some with the assistance of Alejandro Duque. An exception was the derivation and characterization of the ITGB1-ITGB5 double knockout cells, carried out by Dr. Taras Makhnevych. Dr. Barbara Mair assisted with maintenance of the many HAP1 and HCT116 integrin knockout clones in the p53 mutant backgrounds. Mutations in clonal lines were sequenced by myself or a summer student, Yuxi Yao. I personally analyzed all wet lab experimental data and generated and formatted all data panels. I ran all flow cytometry experiments for data acquisition, but all FACS sorting experiments were performed with assistance from a technician at the Faculty of Medicine Flow Cytometry Facility. I imaged all immunofluorescence at the Microscope Imaging Laboratory facility at the University of Toronto.

3.2 Abstract

The extracellular matrix (ECM) is vital for cellular morphology and function, as well as higher order tissue organization. The integrin family acts as receptors for the ECM, and thus plays critical roles in many cancer biological processes including survival, proliferation, invasion and metastasis. While seven drugs targeting leukocyte/platelet integrins have progressed to market, drugs against solid tissue integrins, including integrin αV, fail to show clinical efficacy despite promising preclinical results. In this chapter we took parallel functional genetic approaches to identify genes that regulate the cell-ECM interaction. We uncover integrin αV as a core regulator of adherent growth and proliferation. Specifically, loss of αV robustly mediates the loss of adherent growth and a switch to a three-dimensional sphere state in cell types ranging from fibroblasts to epithelial cancers. This sphere phenotype is completely reversed when exogenous ECM is provided or, surprisingly, when metabolic redox conditions are perturbed. We further delineate the specific integrin heterodimers that regulate cell-matrix and cell-cell contacts in these conditions and genetically define adherent, sphere and single-cell suspension growth states. We go on to show that cell growth state preferences depend on a balance of integrins, ECM and metabolic conditions. We propose that αV-targeted drugs may have failed clinical trials because loss or inhibition of integrin αV leads to drastic changes in how cells interact with their ECM and metabolic microenvironment, which were unaccounted for in preclinical models. Our panel of human integrin knockouts also provides a novel, human model system to better study the complex roles of integrin crosstalk, ECM sub-strata and metabolism on various cancer-relevant phenotypes.

3.3 Introduction

3.3.1 Functional genetic screens

The ability to disrupt genes in human cells is crucial for elucidating gene function. Genome-scale, pooled gene perturbation screens allow the cataloguing of gene functions under various contexts (cell line, growth conditions, drug selection) and thus have great potential for finding therapeutic targets for diseases such as cancer. Until recently, RNA interference (RNAi) was the best available tool for systematic, genome-scale gene disruption. However, despite various improvements in shRNA reagent design and analytical approaches, its utility for measuring gene function is limited by imperfect mRNA knockdown and confounding off-target effects [124, 125]. Pooled negative selection functional genomic screens use the fold-change in abundance of the shRNAs targeting a gene relative to an initial population. A strong negative fold change is used to imply a growth defect in cells harbouring an shRNA against this gene, and in turn, the gene is described as having an essential function in the given context. For shRNA screens, incomplete knockdown is a confounding factor in the determination of gene effect.

3.3.2 CRISPR-Cas9 pooled screening

The ease of genome engineering, particularly gene knockouts, in yeast has helped to establish a conceptual framework for systematic genetics to understand gene function. However, for a long time, this could not be applied to human cells on a large scale. The application of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats / CRISPR associated) system to human cells changed that. Pooled screens that take advantage of the CRISPR/Cas system presented a major improvement over RNAi and have allowed more sensitive functional genetic screens. The CRISPR/Cas system was originally discovered as part of the adaptive immune systems of bacteria and archaea. In these organisms, the genomic CRISPR array serves as immune system memory, and each segment can be transcribed into RNA, associate with a Cas endonuclease protein, and target invading pathogen DNA for destruction. This linking of a single guide RNA (sgRNA) targeting a specific DNA sequence to an endonuclease has now been extensively exploited for eukaryotic genome engineering. The precise nature of CRISPR-Cas, and the ease of generating large libraries of sgRNAs targeting human genes, has quickly facilitated functional genetic screens in human cells with greater resolution and sensitivity than previously imagined. Pooled CRISPR-Cas approaches present a major opportunity for systematic identification of human genes that regulate cell fitness under various contexts. The Moffat lab, and others, have now performed many iterations of sgRNA library optimization, developed innovative computational analysis techniques, and screened hundreds of individual cancer cell lines [126-132]. These data sets present a major opportunity for the identification of genes that regulate cell fitness under various contexts. These contexts may be clinically relevant, as in the recent publications that used CRISPR/Cas to identify genetic determinants of sensitivity to immunotherapy [133] or to oncogenic Ras [134].

3.3.3 Cell culture systems

Mammalian cells are generally grown in vitro either adherently (as monolayers, colonies, or semi-adherent clusters), or in suspension cultures (as spheroids, organoids, clusters, or single cells). Adherent cells are typically propagated on tissue culture treated polystyrene that has been rendered “wettable” by treatment with gas plasma. This high-energy plasma breaks apart polystyrene bonds and oxidizes the surface with polar groups [135, 136]. Serum proteins interact with this surface, ultimately facilitating adherence and monolayer growth. Suspension cell lines are cultured in similarly treated or untreated polystyrene vessels; however, they do not adhere, and proliferate readily without anchorage. Suspension and adherent cell lines include both normal and malignant cells. In stark contrast to these culture methods, in vivo solid tissue resident cell types are embedded and largely immobilized in three-dimensional (3D) structures made up of various fibrous proteins and glycoproteins. When these cells are propagated in vitro, they are generally supplied with no exogenous matrix components (other than what is present in serum), or stromal cells, and are required to grow as a monolayer, without the extensive cell-cell and cell-matrix contacts that make up a tissue or solid tumour.

3.3.4 Modeling the microenvironment

As the role of the microenvironment has been increasingly appreciated, researchers have moved towards systems that better model normal and diseased tissues. In particular, the stem cell and cancer stem cell fields rely on various modifications to standard tissue culture to maintain pluripotency, promote self-renewal and maintenance of a dedifferentiated state [8, 137, 138]. These include serum-free defined media, rich in growth factors [139], feeder cell layers [140, 141] or ECM coated plates [142] and 3D culture methods including spheres [139] or organoids [39]. Such efforts to optimize the microenvironment are increasingly applied outside the stem cell fields. In particular, the emergence of cancer as a metabolic disease has directed current efforts to closely examine and often revise the metabolite composition of the medium in which we culture cells [126, 143-146]. Indeed, metabolite concentration has shown a striking ability to rewire cancer cell metabolism and drastically impact sensitivity to the systemic cancer therapies 5-fluorouracil (5FU) or metformin ([147] and [126], respectively). As media conditions approach physiological optimization, we wanted to explore the role of the cellular interaction with the ECM on genetic dependencies for growth and proliferation.

3.3.5 Role of the ECM in cell fitness

The extracellular matrix or ECM is a collection of soluble and insoluble molecules secreted by cells that serves as a scaffold upon which tissues are organized and provides critical biochemical and biomechanical cues that direct many biological processes. All human cells have some interaction with the ECM, either in the context of a tissue or during the process of cell migration. The relationship between the ECM and disease biology is complex, as the composition, biomechanics and anisotropy of the ECM are exquisitely tuned to reflect the physiological state of a tissue [43, 55]. For example, cancer progresses within a dynamically evolving ECM that modulates virtually every aspect of a tumour, including associated tumour cell types, such as stromal fibroblasts or immune infiltrating cells. Tumours develop from heterogeneous cell populations, which is also reflected in the tumour-associated ECM. In other words, there is a large degree of variability seen in ECM deposition and stiffening in a single tumour, conditions which could explain why therapeutics targeting ECM-associated features have had limited success in clinical trials [43, 148]. To date, the precise role of ECM in cellular fitness is not clear, and a systematic approach to unveiling ECM-related fitness genes may help account for dynamically evolving ECM in disease models.

3.3.6 Rationale and Approach

Many large-scale efforts have reported various types of genomic and functional genomic data across hundreds of cancer cell lines, in order to catalogue and annotate gene functions [126, 132, 149-155]. For example, the application of CRISPR-Cas9 technology to systematically knock out genes in mammalian cells has facilitated the definition of core- and context-specific fitness genes in human cell lines [126, 127, 131, 132, 134]. A fitness gene can also be considered as a cell essential gene; that is, a gene which is required for the reproductive success of a cell line derived from a multicellular organism. In these studies, cells were generally maintained in nutrient-rich, serum-containing media in polystyrene vessels treated with gas plasma, hereafter referred to as tissue culture plastic (TCP). Typically, under these cell culture conditions, cells either use serum ECM proteins to stick to TCP and proliferate adherently, or they proliferate non-adherently as clusters or single cell suspensions. As a consequence of this collection of fitness screens, an opportunity exists to dissect the genes required for adherent growth. It is important to differentiate between “cell state preference”, which refers to growth status (i.e. adherent or suspension), and “anchorage dependence”, which refers to the requirement of cells for a stable surface and ECM to grow, function and divide. The present study examines the current catalogue of fitness genes in human cells and the relationship between genes that are required for adherent growth (i.e. cell state preference) and dependence on ECM.

3.4 Methods

3.4.1 Computational identification of adherence fitness genes

3.4.1.1 Scoring and quality control of published CRISPR/Cas9 screens

Several human cell lines that grow adherently or in suspension have been subjected to genome-wide loss-of-function CRISPR/Cas9 screens in different labs, using different gRNA libraries. Fold-change data for 60 published genome-wide CRISPR/Cas9 screens were downloaded from GenomeCRISPR [156] to assess differential phenotypes between cell lines that grow in suspension and those that grow adherently (Figure 15A). A supervised approach described in [129] was used to score gene fitness for each of these screens. This approach provided a Bayes factor (BF) score for each gene, which measures the relative probability that a given gene is essential in a given screen. To control screening quality, we calculated the area under the ROC curve (AUC) for each screen using a set of 1580 previously determined essential genes [127]. Based on this analysis, screens in L33 and PC3 cells showed an AUC < 0.8 and were excluded from further analysis (Figure 15B).
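The quality-control step above can be illustrated with a minimal sketch. It assumes that each screen is available as a mapping from gene to fitness score (higher meaning more essential) and that the gold-standard essential genes are given as a set; the function names roc_auc and passing_screens are hypothetical and are not taken from the analysis code used in this chapter.

```python
def roc_auc(scores, is_essential):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: per-gene fitness scores (e.g. Bayes factors), higher = more essential.
    is_essential: parallel booleans, True for gold-standard essential genes.
    """
    pairs = sorted(zip(scores, is_essential), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at i and give it the average rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, e in pairs if e)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, e) in zip(ranks, pairs) if e)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def passing_screens(screens, gold_standard, min_auc=0.8):
    """Return the names of screens whose AUC is at least min_auc."""
    kept = []
    for name, gene_scores in screens.items():
        genes = list(gene_scores)
        auc = roc_auc([gene_scores[g] for g in genes],
                      [g in gold_standard for g in genes])
        if auc >= min_auc:
            kept.append(name)
    return kept
```

With the 0.8 cut-off, a screen whose scores separate the gold-standard genes poorly (as reported for L33 and PC3) would be dropped from the returned list.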

3.4.1.2 Two-step batch correction

After quantile normalization of the 58 high quality screens, we observed strong batch effects mainly driven by the gRNA libraries used for each screen. A comparison of the six cell lines that had been screened with two different libraries (in different labs) showed that within-batch screening data correlation follows a similar trend (Figure 15C). Pairwise similarity of screen fitness profiles was also dominated by batch effects. To normalize these batch effects, we defined the five different libraries that had been used across all screens as batches. We observed that the adherent or suspension growth feature was largely confounded with these batches: two batches contained 36 adherent cell lines and 1 that had been grown in suspension, whereas the other three batches contained 20 suspension cell lines and 1 that had been grown adherently. To perform a batch correction without removing phenotypic differences between suspension and adherent cell lines, we used a two-step approach. Using the sva R package [157], we first performed batch correction separately on the batches containing screens mostly (36 of 37) done in adherent cell lines and on those batches that mostly (20 of 21) contained suspension cell lines (Figure 15D). Next, we merged the two batch-corrected data sets and repeated the batch correction. This two-step batch correction approach removed library-driven batch effects between the 58 genome-wide screening data sets (Figure 15E). Overall, this procedure provided us with fitness phenotypes for 15673 genes across 58 screens.
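The quantile normalization that precedes the batch correction can be sketched as follows. This is only an illustration under simplifying assumptions: every screen is a list of scores over the same genes in the same order, ties are broken by position, and the function name quantile_normalize is hypothetical. The batch correction itself was done with the sva R package and is not reproduced here.

```python
def quantile_normalize(columns):
    """Quantile-normalize equally long columns (one list of gene scores per screen).

    Each value is replaced by the mean, across screens, of the values that
    share its rank, so that all screens end up with the same distribution.
    """
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across screens.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n_rows)]
    normalized = []
    for col in columns:
        # Indices of this screen's genes, ordered from lowest to highest score.
        order = sorted(range(n_rows), key=lambda idx: col[idx])
        out = [0.0] * n_rows
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

After this step every screen has an identical score distribution, so remaining differences between screens reflect the ordering of genes rather than scale.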

3.4.1.3 Identifying high confidence fitness genes in published data sets

[Figure 15, panels A to I. Recoverable panel labels: A) mining of genome-wide CRISPR screening data (6 publications, 58 screens); B) area under ROC per screen (AuthorYearCellLine), with PC3 and L33 marked; C) quantile-normalized, D) first-step batch-corrected and E) second-step batch-corrected pairwise PCC heatmaps; F) binarized matrix of 3,435 fitness genes (core and context fitness) across adherent and suspension screens (n=58); G) suspension fitness factor versus adherence fitness factor, coloured by number of genes; H) ITGAV co-dependency in adherent and suspension screens; I) ratio HAP1 versus ratio KBM7 for initial AFG, expanded AFG, HAP1 essential and other genes.]

Figure 15 – Data Mining Reveals Adherence-related Fitness Genes (AFGs). A) Schematic of data mining approach for identification of AFGs. Fold-change data for all screens was downloaded from the GenomeCRISPR (Rauscher et al.) database (except those screened in our lab), and a Bayes factor (BF) was assigned to each gene (Hart and Moffat, 2016). B) Initial quality control using the area under the precision-recall ROC curve of gene fitness scores (BF) and the set of 1580 fitness genes (Hart et al., 2015) as a gold standard. Screens scoring less than 0.8 (L33 and PC3) were excluded from further analysis. Screen IDs contain the name of the cell line in which the screen was performed as well as an abbreviation of the first author and year of the publication: ag16 – (Aguirre et al., 2016); ha15 – (Hart et al., 2015); tz16 – (Tzelepis et al., 2016); wa15 – (Wang et al., 2015); wa17 – (Wang et al., 2017); st16 – (Steinhart et al., 2016). C) All pairwise similarities by Pearson correlation coefficient (PCC) between the 58 genome-wide screens following quantile normalization, then D) after the first step of batch correction, and E) after the second step of batch correction. F) Core and context-specific fitness genes were identified across 58 screens (black = fitness effect). Screens were classified as adherent or suspension based on reported growth conditions. (See Table S6) G) Scatter plot showing all context-specific genes plotted as proportion of fitness events in adherent screens (adherence fitness factor) as the x-coordinate (n=37) and proportion of fitness events in suspension screens (suspension fitness factor) as the y-coordinate (n=21). Large grey circles indicate genes with FDR<50% and color represents the number of genes falling at the given coordinates. Several AFGs are indicated. (See Table 7 for list) H) Cell lines dependent on ITGAV were enriched for co-dependence on PTK2, CRK and ITGB5 (FDR 50%). ITGB1 and ITGB7 cell line dependencies are shown for comparison (lower panel). I) Comparison of fitness phenotypes of identified AFGs in gene trap screening data from matched human haploid lines HAP1 (adherent) versus KBM7 (suspension) (Blomen et al., 2015). Shown are gene trap sense by antisense insertion ratios for each of the 511 genes that were essential in at least one of those cell lines and were present in our context fitness gene set (see Table 7). A black circle indicates a gene that was essential in HAP1, but not KBM7 as defined by Blomen and colleagues. Red and orange dots represent initial and extended AFGs, respectively. Several AFGs are labeled.

To identify high-confidence fitness effects of genes across the 58 screens, we assumed a fraction of fitness effects similar to screens reported in Hart et al. 2015 [127]. In this study, ~12% of all tested genes showed essential phenotypes in at least two (out of five) screens. We assumed that the same fraction (~12%) of most extreme fitness scores in a given screen would provide a high-confidence list of fitness genes in each of the 58 screens. We further considered a gene for subsequent analysis when it fell into the high-confidence fitness effect fraction in at least 10% (at least 6/58) of all screens. This yielded 3435 genes, which were further subdivided into core and context fitness genes using hierarchical clustering (based on a Euclidean distance metric) of the binary fitness profiles (fitness effect yes/no) (see Figure 15F).
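The selection of high-confidence fitness genes (the top ~12% of scores per screen, retained when a gene is a hit in at least 10% of screens) can be sketched as below. The sketch assumes that higher scores indicate stronger fitness effects and that each screen is a mapping from gene to score; the function names are hypothetical.

```python
import math


def binarize_top_fraction(scores, fraction=0.12):
    """Return the genes in the most extreme `fraction` of one screen.

    scores: dict gene -> fitness score (assumed: higher = stronger fitness effect).
    """
    n_top = math.floor(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_top])


def recurrent_fitness_genes(screens, fraction=0.12, min_screen_fraction=0.10):
    """Genes that are high-confidence hits in at least min_screen_fraction of screens."""
    # For 58 screens and a 10% threshold this rounds up to 6 screens.
    min_screens = math.ceil(len(screens) * min_screen_fraction)
    counts = {}
    for scores in screens.values():
        for gene in binarize_top_fraction(scores, fraction):
            counts[gene] = counts.get(gene, 0) + 1
    return {g for g, c in counts.items() if c >= min_screens}
```

The per-screen hit sets produced by binarize_top_fraction correspond to the binary fitness profiles (fitness effect yes/no) that are clustered into core and context fitness genes.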

3.4.2 Identifying context fitness adhesion genes using published data

For the context-specific fitness genes, we divided their binary fitness profiles into subprofiles measured in adherent and suspension cell lines. To identify genes that are required in either adherent or suspension cells, we performed two-tailed Fisher’s exact tests and adjusted p-values for multiple testing by controlling the false discovery rate (FDR), applying the method of Benjamini-Hochberg [158]. This yielded our first-tier list of adhesion fitness genes (AFG) (Figure 15G). To extend this list, we defined cell lines depending on the central adhesion molecules TLN1 or ITGAV as a high-confidence adherence set. We tested dependencies of those cell lines again using two-tailed Fisher’s exact testing and p-value adjustment as described above. Two observations validate this two-tier approach: (i) the AFGs identified were highly enriched for the GO term integrin-mediated signaling (FDR < 5%) and (ii) the high confidence non-adherent (not dependent on TLN1) cell lines were dependent on BCL2, which is upregulated to allow cells to grow non-adherently [85].
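The per-gene test described above can be sketched in a few lines. For each gene, the 2x2 table is assumed to hold the number of adherent screens with and without a fitness effect (a, b) and the number of suspension screens with and without a fitness effect (c, d). The implementation below is a plain standard-library illustration with hypothetical function names, not the code used for the analysis.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

As a usage example, a gene with a fitness effect in 30 of 37 adherent screens and 1 of 21 suspension screens would be tested with fisher_exact_two_tailed(30, 7, 1, 20), and the resulting p-values for all genes would then be passed to benjamini_hochberg.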

3.4.3 Cell culture

HAP1 cells, including knockouts (acquired from Horizon Discovery: ITGB1 KO Cat# HZGHC001029c003 and ITGB5 KO Cat# HZGHC001028c001), were maintained in IMDM (ThermoFisher #12440061) containing 10% FBS and penicillin streptomycin (ThermoFisher Cat# 15140163), unless otherwise indicated. HCT116 cells were maintained in McCoy's 5A (ThermoFisher Cat# 16600082), HEK293T and MCF7 cells in DMEM (ThermoFisher Cat# 11995065), all supplemented with 10% FBS and 1% penicillin-streptomycin (ThermoFisher Cat# 15140122). Cells were passaged by trypsinization every 3-4 days and seeded into fresh medium at a ratio between 1:5 and 1:20 onto uncoated tissue culture treated polystyrene (TCP) or, where indicated, TCP coated for 1 hr at 37°C with ECM substrates: 50 µg/ml collagen I (ThermoFisher Cat# A1048301), 10 µg/ml laminin (Sigma Cat# L2020), 10 µg/ml fibronectin (Sigma Cat# F1141) or 5-10% Matrigel (Corning Cat# 354234), diluted in PBS plus 0.2% BSA.

3.4.4 CRISPR-Cas9 Extracellular Matrix Substrate Screens

3.4.4.1 Screening approach

Functional genetic screens were performed using the CRISPR-Cas9 knockout (KO) system in human haploid cells as described [127]. A clonal population of Cas9-expressing HAP1 cells with high editing efficiency was infected at a low MOI, selected and cultured in parallel on ECM components (collagen, fibronectin, laminin or Matrigel) or on standard tissue culture (TC) plastic. Cells were trypsinized and re-plated on fresh substrate every 3 days for a total of 18 days, maintaining >200-fold average sgRNA representation at all times. Genomic DNA was extracted from cell pellets comprising 2 × 10^7 cells (~200-fold representation) collected from the initial pool of knockouts and from biological triplicates of subsequent time points, day 12 (T12 or intermediate timepoint) and day 18 (T18 or endpoint), for each substrate condition, using the QiaAmp Maxi kit (Qiagen Cat# 51194). sgRNA regions were PCR amplified using Kapa HiFi Hotstart ready mix (Kapa Biosystems Cat# KK2601) and custom barcoded indexing primers as in [127]. PCR amplicons were quantified, pooled and run on the Illumina HiSeq2500. A depth of at least 4 × 10^7 raw reads was required for the initial pool, and 1 × 10^7 to 2 × 10^7 reads were required per biological replicate sample.
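The representation figures quoted above follow from simple arithmetic, sketched here using the library size of 90,824 gRNAs reported in the screening quality control section. The helper names are hypothetical and the numbers are back-of-the-envelope checks rather than part of the protocol.

```python
LIBRARY_SIZE = 90_824  # gRNAs in the library, as reported in section 3.4.4.2


def cells_for_coverage(n_guides, fold):
    """Number of cells needed so that each gRNA is carried by `fold` cells on average."""
    return n_guides * fold


def fold_coverage(n_units, n_guides):
    """Average number of cells (or sequencing reads) per gRNA."""
    return n_units / n_guides


# About 1.8 x 10^7 cells are needed for 200-fold representation,
# consistent with pellets of 2 x 10^7 cells.
cells_needed = cells_for_coverage(LIBRARY_SIZE, 200)

# 1 x 10^7 to 2 x 10^7 reads per replicate give roughly 110 to 220 reads per gRNA.
reads_per_guide_low = fold_coverage(1e7, LIBRARY_SIZE)
reads_per_guide_high = fold_coverage(2e7, LIBRARY_SIZE)
```

The same calculation applied to the initial pool (at least 4 × 10^7 reads) gives roughly 440 reads per gRNA.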

3.4.4.2 CRISPR-Cas9 screening quality control

Aligned raw readcounts were obtained for 90,824 gRNAs with an average coverage between 100 and 250 readcounts for intermediate (T12) and endpoint (T18) samples and ~500 readcounts for the initial T0 sample. Each sample was normalized for sequencing depth, and the resulting data was log2 transformed. 900 gRNAs had a readcount of less than 30 in the T0 sample and were excluded from further analysis. For the relative dropout of each gRNA from the pool, the log2 fold-change was computed by pairwise subtraction of the log2-transformed readcounts in the T0 reference from those in each T12 and T18 sample. The log2 fold-changes for the 89,924 gRNAs were approximately normally distributed with a slight left skew and a wider distribution at the T18 timepoint (Figure 16E). To test screening quality, we assessed the ability of each replicate to identify the 1580 core fitness genes from [127] by averaging gRNA log2 fold-changes per gene and computing the area under the ROC curve.
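A minimal sketch of the readcount processing described above is given here. It assumes that each sample is a mapping from gRNA to raw readcount with the same gRNAs in every sample; the pseudocount and the scaling to reads per million are illustrative assumptions that are not stated in the text, and the function names are hypothetical.

```python
import math


def log2_normalized(counts, pseudocount=1.0, scale=1e6):
    """Depth-normalize raw gRNA readcounts and log2-transform them.

    counts: dict gRNA -> raw readcount for one sample.
    pseudocount and scale are assumptions made for this illustration.
    """
    total = sum(counts.values())
    return {g: math.log2(c / total * scale + pseudocount) for g, c in counts.items()}


def log2_fold_changes(t0_counts, sample_counts, min_t0_reads=30):
    """log2 fold-change of each gRNA relative to T0.

    gRNAs with fewer than min_t0_reads reads at T0 are excluded.
    """
    kept = [g for g, c in t0_counts.items() if c >= min_t0_reads]
    t0 = log2_normalized(t0_counts)
    sample = log2_normalized(sample_counts)
    return {g: sample[g] - t0[g] for g in kept}
```

Negative values indicate gRNAs that dropped out of the pool relative to T0, and positive values indicate enrichment.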

3.4.4.3 Heteroscedastic variance stabilization

The variance of gRNA log2 fold-change data between the biological triplicate measurements in each condition and at each timepoint depended on the initial gRNA representation at T0. This means that gRNAs with low readcounts and thus lower representation at T0 generally showed a higher standard deviation (SD). This is known as heteroscedastic behavior. To correct for this trend, we adjusted the log2 fold-change gRNA data to ensure uniform SD regardless of the initial gRNA representation at T0. For each condition and timepoint, we fitted a degree 1 local polynomial regression (Loess) curve predicting the SD as a function of the initial gRNA readcount and used this prediction as a scale factor for the log2 fold-change data.
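The idea behind this correction can be sketched with a small, self-contained local linear regression. This is a simplified stand-in for a full Loess fit: the span value, the use of tricube weights, and the choice to divide each log2 fold-change by the predicted SD are assumptions made for illustration, and the function names are hypothetical. The loop is quadratic in the number of gRNAs and is meant for clarity rather than speed.

```python
import math
import statistics


def local_linear_fit(x, y, x0, span=0.5):
    """Degree-1 local regression estimate of y at x0 using tricube weights."""
    n = len(x)
    k = max(2, math.ceil(span * n))
    dists = sorted(abs(xi - x0) for xi in x)
    # Slightly inflate the bandwidth so the k-th neighbour keeps a non-zero weight.
    bandwidth = dists[k - 1] * 1.0001 + 1e-12
    w = [(1 - min(abs(xi - x0) / bandwidth, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def stabilize_variance(t0_log_counts, replicate_lfcs, span=0.5):
    """Scale each gRNA's log2 fold-changes by the SD predicted from its T0 readcount.

    t0_log_counts: list of log2 T0 readcounts, one per gRNA.
    replicate_lfcs: list of per-gRNA lists holding the replicate log2 fold-changes.
    """
    sds = [statistics.stdev(reps) for reps in replicate_lfcs]
    predicted = [local_linear_fit(t0_log_counts, sds, x0, span) for x0 in t0_log_counts]
    floor = 1e-6  # avoid dividing by a zero or negative prediction
    return [[v / max(p, floor) for v in reps]
            for reps, p in zip(replicate_lfcs, predicted)]
```

After scaling, gRNAs with low and high T0 representation have comparable replicate SDs, so downstream gene-level summaries are not dominated by the noisier, poorly represented gRNAs.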

3.4.4.4 Estimation of absolute and differential gRNA effects for substrate screens

Since all replicate screens on TCP and ECM substrates were done from the same library infection, we summarized triplicates’ log2 fold-change values for each gRNA per time point and condition. This provided per-gRNA absolute fitness measurements that were summarized to a gene-level score as described below. Subsequently, differential gRNA effects were computed by taking the difference between each gRNA’s log2 fold-change measured on an ECM substrate and on TCP.
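The absolute and differential gRNA effects described above can be sketched as follows. The text does not state how triplicates were summarized, so taking the mean is an assumption of this sketch, and the function names are hypothetical.

```python
from statistics import mean


def summarize_replicates(replicate_lfcs):
    """Collapse replicate log2 fold-changes to one value per gRNA.

    replicate_lfcs: dict gRNA -> list of replicate log2 fold-changes.
    Using the mean is an assumption; the summary statistic is not specified.
    """
    return {g: mean(values) for g, values in replicate_lfcs.items()}


def differential_effects(ecm_lfcs, tcp_lfcs):
    """Differential effect per gRNA: log2 fold-change on ECM minus that on TCP."""
    ecm = summarize_replicates(ecm_lfcs)
    tcp = summarize_replicates(tcp_lfcs)
    return {g: ecm[g] - tcp[g] for g in ecm if g in tcp}
```

Under this sign convention, a positive differential value means the gRNA dropped out less on the ECM substrate than on TCP, and a negative value means it dropped out more.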

3.4.4.5 Guide RNA quality control

After excluding lowly represented gRNAs (see above) and genes that were represented by less than 3 gRNA sequences as well as non-targeting controls, our library targeted 15,617 genes with an average of 5.5 gRNA sequences per gene. To improve the sensitivity and accuracy of the ECM-specific gene score, we further identified and excluded gRNA sequences that showed the least agreement with the other gRNA sequences targeting the same gene. To do this, for each gene, we computed correlations of the log2 fold-change profiles across our complete set of profiles, including TCP and ECM substrates at all time points, to measure the agreement of the set of gRNAs targeting the same gene with each other. To retain variance in the profiles of gRNAs that completely dropped out of the population in each condition at both time points, a 0-value was added to each profile. Each gRNA was assigned an agreement score A that equaled the sum of profile correlations across all gRNAs targeting the same gene (excluding self-correlation). A gRNA i targeting a gene j was excluded if

$A_{ij} < \max_{i} (A_{ij}) - 1$

For each gene, we removed the gRNAs exhibiting the lowest similarities that satisfied this condition, but kept at least three gRNA reagents per gene. This metric identified per-gene agreement of gRNAs in HAP1 cells rather than deriving a general assumption based on a specific gRNA sequence composition or the effect strength. Specifically, in contrast to previous methods used to prioritize gene silencing reagents, the agreement score A (i) can possibly retain all reagents targeting one gene if they show sufficient agreement, (ii) does not make a judgment about reagents targeting genes with limited effect strength and (iii) avoids artificially skewing the data population.
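The agreement-score filter can be illustrated with a short sketch for the gRNAs of a single gene. It assumes Pearson correlation as the profile correlation and treats a completely flat profile as contributing no agreement; both choices, and the function names, are assumptions of this illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat profile contributes no agreement
    return sxy / sqrt(sxx * syy)


def filter_guides(profiles, min_guides=3):
    """Keep gRNAs of one gene unless A_ij < max_i(A_ij) - 1, retaining at least min_guides.

    profiles: dict gRNA -> list of log2 fold-changes across conditions and time points.
    """
    # Append a 0 so that profiles of fully dropped-out gRNAs retain some variance.
    padded = {g: list(p) + [0.0] for g, p in profiles.items()}
    agreement = {g: sum(pearson(p, q) for h, q in padded.items() if h != g)
                 for g, p in padded.items()}
    cutoff = max(agreement.values()) - 1
    ranked = sorted(padded, key=agreement.get, reverse=True)
    kept = [g for g in ranked if agreement[g] >= cutoff]
    # Re-admit the best of the flagged gRNAs if fewer than min_guides remain.
    for g in ranked:
        if len(kept) >= min_guides:
            break
        if g not in kept:
            kept.append(g)
    return kept
```

Because the cut-off is relative to the best-agreeing gRNA of the same gene, all gRNAs are retained when they agree well with one another, which matches point (i) above.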

3.4.4.6 Differential and absolute gene fitness scores

Absolute and differential gene fitness scores were estimated considering gRNAs that passed the filters described above. While absolute gene fitness scores quantify the essentiality of each gene on TCP or each ECM substrate, differential scores quantify the masking or sensitizing effect of an ECM substrate compared to TCP. Averaging measurements for the different gRNAs targeting the same gene generated the fitness scores for each gene. To test the significance of all pair-wise genetic interactions, p-values were calculated by the moderated t-test (limma), which estimates the mean and standard errors of the four interaction scores for each gene pair, followed by empirical Bayes shrinkage of the SEM [159]. p-values for each gene were then adjusted for multiple testing by controlling the false discovery rate (FDR) using the method of Benjamini-Hochberg [158].
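As a rough illustration of the scoring steps above, the sketch below averages gRNA-level log2 fold-changes per gene, takes the ECM minus TCP difference, and applies the Benjamini-Hochberg adjustment. It does not reproduce the limma moderated t-test; the function names and the dictionary-based input layout are assumptions for illustration.

```python
import numpy as np


def gene_scores(guide_lfc):
    """Average gRNA-level log2 fold-changes into one score per gene.

    guide_lfc: dict mapping gene symbol -> list of gRNA log2 fold-changes
               (hypothetical input layout).
    """
    return {gene: float(np.mean(values)) for gene, values in guide_lfc.items()}


def differential_scores(ecm_scores, tcp_scores):
    """Differential score per gene: fitness on ECM minus fitness on TCP."""
    return {g: ecm_scores[g] - tcp_scores[g] for g in ecm_scores if g in tcp_scores}


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by n / k.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted
```

With this sign convention, a gene whose loss is less detrimental on an ECM substrate than on TCP receives a positive differential score, and a gene whose loss is more detrimental receives a negative one.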

3.4.5 Generation of CRISPR Knockout cell lines

Parental cell lines were either transiently transfected with pLCKOv3 constructs targeting ITGAV and ITGB1 or infected with the same constructs packaged into lentivirus at a multiplicity of infection <1. Knockouts were identified by PCR amplification and Sanger sequencing of CRISPR-targeted genomic loci using the indicated primers (see Methods Table M5). Overlaid chromatograms resulting from heterozygous indel mutations were resolved using the CRISP-ID online tool (http://crispid.gbiomed.kuleuven.be/) [160].

Methods Table M5 – Chapter 3 CRISPR reagents

Guide ID    Target sequence        Primer 1                    Primer 2                    Sequencing primer

sgITGAV#1   AGCATCTGTGAGGTCGAAAC   TGCCCCACCGAAAGATCTTC        GAGCAAGACTCCGTCTCAGG        ACACTGAGACACTTCTGGGTC

sgITGAV#2   TAGCCGGATGTTTCTTCTCG   CCCTCAGTGAATAGCAAATGGTTTT   GCTCCCTATTCTCAGAAAAGGTACA   TGCATCAAATTCAATTGGC

sgITGB1#1   ACTTCCTCCGTAAAGCCCAG   ATGTCCCAAAATATGACCTGCCT     GCAAATTGTCAGAAGGCGTAACA     CCAATGTAGCTCTGTCACTG

sgITGB1#2   ATCTCCAGCAAAGTGAAACC   CCCCTACTTACCAAAACAGCAAA     ATTGAGAGATACGGATGGTCAGG     TCACCCACAAGCTCTTAGTA

3.4.6 Proliferation assays

Cells were seeded on TCP and/or Matrigel in triplicate. Every 24 hours for four days, one plate was trypsinized and counted. For all cases of potential sphere/suspension cells, both floating and adherent cells were trypsinized, pooled and counted. Cell counts were averaged and plotted as fold increase in cell number with individual replicates shown; error bars represent ± standard deviation.

3.4.7 Adherence assays

Cells were dissociated with DB, counted and seeded at 1×10⁶ cells per well in 12 well plates for 20 minutes at 37°C. Non-adherent cells were washed away with PBS and adherent cells were fixed with methanol, stained with crystal violet and scanned. For long term adherence assays, cells were seeded at 1.5×10⁴ per well in 12 well plates and allowed to grow normally. After four days, all non-adherent cells were washed away gently in PBS and adherent cells were fixed with methanol and stained with crystal violet.

3.4.8 Immunofluorescence

Single cell suspension was seeded onto Nunc Lab-Tek II CC2™ treated Chamber slide system (ThermoFisher #154917) uncoated or coated with indicated ECM. After three days, cells were fixed in 4% paraformaldehyde, permeabilized briefly and stained with 1:40 phalloidin-555 (ThermoFisher #A12380), 1:100 tubulin-488 (CST #8058) and DAPI (CST #4083). Samples were sealed with a coverslip in mounting media (CST #9071) and imaged using a Zeiss LSM 880 laser scanning confocal microscope.

3.4.9 RNAi

MCF7 cells were infected with shRNAs targeting ITGAV or LACZ control (see Methods Table M6). Stably transduced cells were selected with 2 µg/ml puromycin for 48 hours. Polyclonal populations were assessed for integrin αV protein level by western blot, and assessed for morphology 7-10 days after transduction.

Methods Table M6 – Chapter 3 RNAi reagents

Hairpin ID Target sequence TRC Number

shLacZ CCGTCATAGCGATAACGAGTT TRCN0000072235

shITGAV#1 GACTGAGCTAATCTTGAGAAT TRCN0000003239

shITGAV#2 CTCTGTTGTATATCCTTCATT TRCN0000003240

shITGAV#3 CACTCCAAGAACATGACTATT TRCN0000003241

shITGAV#4 CGACAGGCTCACATTCTACTT TRCN0000010769

shITGAV#5 GTGAGGTCGAAACAGGATAAA TRCN0000010768

3.4.10 Nutrient supplementation experiments

Cells were trypsinized to single cell suspension, washed well, counted and reseeded into minimal medium (Wisent Bioproducts #391-062 supplemented with 5% FBS, 1mM glutamine, 10mM glucose) containing the indicated supplement (1mM sodium pyruvate (ThermoFisher #11360070), 1mM L-Alanine (Sigma #A7627), 1mM L-Lactate (Sigma #L7022), 1mM α-ketobutyrate (AKB) (Sigma #K0875), or 5mM dichloroacetate (DCA) (Sigma #347795)).

3.4.11 Defined media experiments

Cells were trypsinized, washed well and seeded into complete serum-free defined media or the indicated variation (DMEM (Wisent Bioproducts #391-062) base supplemented with 1% PenStrep, 10mM glucose, 3mM glutamine, 10mM HEPES (ThermoFisher #15630080), 1x NEAAs (ThermoFisher #11140050), 0.2% Lipid mixture 1 (Sigma L0288), 1x N2 Supplement A (STEMCELL Tech. #07152), 1x NeuroCult SM1 (STEMCELL Tech. #05711), 10ng/mL bFGF (ThermoFisher #13256029), 20ng/mL EGF (Gibco #PHG0311), 4µg/ml Heparin (Sigma #H1027) and 1mM Pyruvate). Some samples received additional supplementation with 3mM reduced glutathione (GSH) (Sigma #G6013) and 10µM biotin (Sigma #B4639) as indicated.

3.4.12 Flow cytometry/ FACS

Adherent cells were harvested by aspirating media, while non-adherent or semi-adherent cells were harvested by spinning at low speed (250g, 5 mins) to pellet spheres/aggregates. All cells were washed in PBS then fully dissociated in sterile non-enzymatic dissociation buffer, DB (1mM EDTA, 137 mM NaCl, 6.7mM NaHCO3, 5mM KCl and 5mM D-Glucose) for 5-10 minutes. Samples were then diluted with sorting buffer (PBS with 1mM EDTA, 25mM HEPES pH 7.0 and 1% BSA), filtered through 40 µm mesh and stained with appropriate antibody dilution and viability dye. Data acquisition was performed on a BD LSR Fortessa X20 and FACS sorting was performed using a BD FACS Aria IIu.

Methods Table M7 – Chapter 3 antibodies

Antibody Dilution Company/ Cat#

ITGB1-FITC 1:50 Abcam / ab21845

ITGAV-PE 1:100 R&D Systems / FAB1219P

αVβ5 2ug/ml CCAB / 1461

ITGAV 1:1000 CST / 4711

ITGB1 1:1000 CST / 9699

ITGB5 1:1000 CST / 3629

GAPDH 1:5000 CST / 2118

β-Actin 1:5000 Abcam / Ab3280

3.4.13 Transmission Electron Microscopy (TEM) Imaging

Cells were cultured and harvested as indicated (either as a monolayer or pelleted from suspension) and fixed with 4% paraformaldehyde and 1% glutaraldehyde in 0.1M phosphate buffer, followed by a secondary fixation with 1% osmium tetroxide and then a five step ethanol dehydration series. After a three step epon resin infiltration series, samples were sectioned and then stained with 5% uranyl acetate and Reynolds lead citrate. Imaging was carried out on a Hitachi H-7000 at 75 kV (25µA beam current).

3.4.14 Generation of antibodies against αVβ5

Fab-Phage from library-F [161] were cycled through four rounds of binding selection on human recombinant αVβ5 protein (R&D Cat#2528) as in [162]. Fab-Phage output pools were analyzed by clonal ELISA for binding specificity.

3.4.15 Fab-phage cellular ELISAs

After four rounds of cellular selection, phages were produced from individual clones grown in a 96-well format. Specifically, colonies of E. coli Omnimax harboring phagemids were inoculated directly into 450 µl of 2YT broth supplemented with carbenicillin and M13-KO7 helper phage; the cultures were grown overnight at 37°C in a 96 well format. Culture supernatants containing Fab-phage were diluted two-fold in PBS buffer supplemented with 1% BSA and incubated for 15 minutes at room temperature. The mixtures were transferred to plates coated with antigen or controls and incubated for 30 minutes at room temperature. The plates were washed well with PBS then incubated for 30 minutes with horseradish peroxidase/anti-M13 antibody conjugate (Sigma-Aldrich) in PBS buffer supplemented with 1% BSA. The plates were washed, developed with TMB Microwell Peroxidase Substrate Kit (KPL Inc.), quenched with 1.0 M phosphoric acid, and absorbance determined at a wavelength of 450 nm. Positive binding clones were determined by setting a threshold of 1.5-fold or greater signal of antigen expressing cells over parental cells. All positive clones were then subjected to Sanger DNA sequence analysis.

3.4.16 Expression and purification of IgG proteins

Fab-phage DNA sequences of variable domains were subcloned into two mammalian expression vectors for heavy chain and light chain expression as previously described. The plasmids were co-transfected into Expi293 cells (ThermoFisher) using the FuGENE® 6 Transfection Reagent (Promega), according to the manufacturer's instructions. The cell culture media was harvested 5–6 days following transfection, and applied to an rProtein A affinity column (GE Healthcare). IgG proteins were eluted with 25 mM H₃PO₄, pH 2.8, 100 mM NaCl and neutralized with 0.5 M Na₃PO₄ pH 8.6. Eluted fractions of interest were combined, concentrated and buffer exchanged into PBS.

3.4.17 Validation of IgG specificity

HEK293T cells were co-transfected with the indicated overexpression plasmids for integrin subunits αIIb (Origene #SC300066), αV (#SC118750), β1 (#SC111935), β3 (#SC120057), β5 (#SC118751), β6 (#SC128124) and β8 (#SC118752) using X-tremeGene9 according to the manufacturer’s protocol, and were assessed for 1461 binding after 72 hours. The untransfected control was used to set a gate for overexpressing cells.

3.4.18 Antibody treatment

Cells were harvested using DB, counted and re-suspended in media containing 20µg/ml of indicated IgG. They were incubated in suspension at 37°C for 30 minutes and seeded in parallel into ECM coated and uncoated wells. Media was subsequently supplemented daily with an additional 20µg/ml IgG.

3.5 Results

3.5.1 Systematic identification of adherent-related fitness genes (AFGs)

We sought to identify adherent-related fitness genes, a class of genes generally required for adherent cell growth. To do so, we re-analyzed 60 genome-wide CRISPR screens from six studies completed in 54 unique human cell lines from diverse cancer types that proliferate adherently or in suspension. Roughly half of the 54 cell lines proliferate adherently, and the other half in suspension. Quality control measures and a 2-step batch correction approach (Figure 15A-E, see methods for details) yielded relative fitness measurements for 15,673 genes across 58 screens; we report 3,435 genes as showing extreme fitness effects in at least 10% of the 58 screens (Table S4). Hierarchical clustering of these genes’ fitness profiles resulted in two major clusters: 1,602 core and 1,833 context-specific fitness genes (Figure 15F). As expected, the core fitness gene set recapitulated findings of previous studies; for example, 29 of 30 proteasome components (Fisher’s exact test; p < 4-9), all ten RNA polymerase II subunits (p < 0.0005), as well as 39 of 41 large and small ribosomal protein encoding genes (p < 3-11). The context-specific gene set was specifically examined for genes that were important for fitness in adherent, but not suspension cell lines. We define adherent-related fitness genes (referred to as AFGs hereafter) to include genes with proportionally more fitness effects in adherent cell lines than suspension cell lines (Figure 15G). At an FDR of 25%, we identified 12 AFGs, including a number of known or suspected genes involved in cell adhesion pathways such as the integrin ITGAV and TLN1, which encodes talin, a central adaptor protein connecting integrins with the cytoskeleton and downstream integrin signaling [163, 164]. To expand on this list and validate the approach, we queried for co-dependencies in ITGAV or TLN1-dependent cells and identified an additional 40 genes (FDR < 45%). These included CRK, CRKL, FERMT2, PTK2, EGFR, DOCK5, ANKRD52, ITGB5 and TNS3 (see Table 7), and GO term analysis of the resulting genes showed strong enrichment for cell matrix adhesion, integrin-mediated signaling, and intracellular signal transduction (FDR < 5%). Notably, cell lines that were TLN1-independent in our analysis strongly depended on the anti-apoptotic factor BCL2, which has previously been shown to be upregulated to protect cells from anoikis [85]. Together, this provided a total list of 52 AFGs (Table 7). We further validated our results using published gene trap screening data in human haploid matched adherent HAP1 and suspension KBM7 cell lines [150]. Confirming our approach, the AFGs were more likely to be identified as fitness genes in HAP1 but not in KBM7 cells (p = 1.9e-5, two-sided Fisher’s exact test; Figure 15I and Table 7).
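The comparison of fitness effects between adherent and suspension screens can be illustrated with a two-sided Fisher's exact test on a 2x2 table. The sketch below is a plain-Python illustration, not the original analysis code. The example counts (31 of 37 adherent screens and 0 of 21 suspension screens) are an assumption inferred from the fitness factors tabulated for CCND1 in Table 7 (0.838 and 0.000); they are not stated in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: adherent / suspension screens.
    Columns: screens with / without a fitness effect for the gene.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x effects among the row-1 screens.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Assumed counts for CCND1: effect in 31 of 37 adherent and 0 of 21
# suspension screens.
p_ccnd1 = fisher_exact_two_sided(31, 6, 0, 21)
# p_ccnd1 is approximately 8.86e-11, in line with the p-value tabulated
# for CCND1 in Table 7.
```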

Table 7 – 52 Adherence-related fitness genes (AFGs)

Genes with proportionally more fitness effects in adherent cell lines than suspension cell lines at an FDR of ≤ 25% and genes with co-dependencies in ITGAV or TLN1-dependent cells at an FDR ≤ 45%. The resultant list of 52 AFGs was also compared for essentiality by gene trap in isogenic suspension cell line KBM7 and adherent cell line HAP1 using data from [150]. ns = not significant. nd = not determined in the study.

Columns: Blomen et al. gene trap essential? (HAP1, KBM7); gene symbol; adherence analysis (adherence fitness factor, suspension fitness factor, p-value, BH-adj. p-value); TLN1 co-dependent? (FDR).

HAP1  KBM7  Gene symbol  Adherence fitness factor  Suspension fitness factor  p-value   BH-adj. p-value  TLN1 co-dependent? (FDR)

nd    nd    CCND1        0.838                     0.000                      8.86E-11  8.12E-08         ns
nd    nd    VRK1         0.865                     0.333                      7.61E-05  9.97E-03         ns
nd    nd    DOCK7        0.432                     0.000                      1.82E-04  1.75E-02         ns
nd    nd    TRIM37       0.730                     0.238                      3.92E-04  3.26E-02         0.431
nd    nd    C2orf49      0.541                     0.095                      7.52E-04  5.51E-02         ns
nd    nd    TLN1         0.568                     0.143                      2.09E-03  1.16E-01         0.000
nd    nd    SYBU         0.297                     0.000                      4.67E-03  2.14E-01         ns
nd    nd    MESDC2       0.405                     0.048                      4.81E-03  2.15E-01         0.283
nd    nd    ANKRD52      0.730                     0.381                      1.27E-02  4.56E-01         0.283
1     0     ITGAV        0.459                     0.000                      1.53E-04  1.70E-02         0.386
1     0     STT3A        0.811                     0.333                      5.02E-04  4.00E-02         0.431
1     0     PKN2         0.378                     0.000                      8.92E-04  6.06E-02         0.431
1     1     RAC1         0.649                     0.238                      5.64E-03  2.35E-01         ns
1     0     FERMT2       0.243                     0.000                      1.97E-02  5.63E-01         0.091
1     0     PTK2         0.378                     0.095                      3.11E-02  7.31E-01         0.105
nd    nd    DOCK5        0.189                     0.000                      4.13E-02  7.98E-01         0.192
nd    nd    FITM2        0.189                     0.000                      4.13E-02  7.98E-01         0.431
1     0     ITGB1        0.189                     0.000                      4.13E-02  7.98E-01         0.431
1     0     KLF5         0.189                     0.000                      4.13E-02  7.98E-01         0.431
nd    nd    C17orf96     0.270                     0.048                      4.37E-02  8.06E-01         0.325
nd    nd    EGFR         0.243                     0.048                      7.69E-02  8.99E-01         0.163
nd    nd    CRK          0.162                     0.000                      7.74E-02  8.99E-01         ns
1     0     EAF1         0.162                     0.000                      7.74E-02  8.99E-01         0.283
1     0     ITGB5        0.162                     0.000                      7.74E-02  8.99E-01         0.283
nd    nd    TNS3         0.162                     0.000                      7.74E-02  8.99E-01         0.283
nd    nd    IPO5         0.432                     0.190                      8.66E-02  9.17E-01         0.431
nd    nd    DET1         0.486                     0.238                      9.41E-02  9.82E-01         0.163
1     0     NCKAP1       0.514                     0.286                      1.07E-01  1.00E+00         0.431
1     0     CRKL         0.486                     0.381                      5.84E-01  1.00E+00         0.091
1     1     IMMT         0.486                     0.381                      5.84E-01  1.00E+00         0.386
1     1     ACACA        0.486                     0.429                      7.86E-01  1.00E+00         0.431
nd    nd    GGA3         0.432                     0.381                      7.86E-01  1.00E+00         0.400
nd    nd    DNAJC21      0.432                     0.429                      1.00E+00  1.00E+00         0.283
1     0     THAP7        0.405                     0.238                      2.56E-01  1.00E+00         0.431
nd    nd    UFL1         0.405                     0.286                      4.08E-01  1.00E+00         0.163
1     0     ZC3H3        0.405                     0.476                      7.83E-01  1.00E+00         0.431
1     1     USE1         0.378                     0.381                      1.00E+00  1.00E+00         0.431
0     1     RBM27        0.351                     0.381                      1.00E+00  1.00E+00         0.343
1     0     RAPGEF1      0.324                     0.143                      2.12E-01  1.00E+00         0.346
0     1     CFLAR        0.324                     0.190                      3.65E-01  1.00E+00         0.431
1     1     VBP1         0.324                     0.238                      5.60E-01  1.00E+00         0.386
nd    nd    EIF4ENIF1    0.324                     0.286                      1.00E+00  1.00E+00         0.431
nd    nd    RGP1         0.297                     0.190                      5.35E-01  1.00E+00         0.346
nd    nd    SOX9         0.270                     0.143                      3.38E-01  1.00E+00         0.316
0     1     UXS1         0.270                     0.190                      5.44E-01  1.00E+00         0.431
nd    nd    IKBKG        0.216                     0.048                      1.35E-01  1.00E+00         0.283
nd    nd    ABCD4        0.216                     0.095                      3.01E-01  1.00E+00         0.431
nd    nd    PGRMC2       0.216                     0.095                      3.01E-01  1.00E+00         0.431
1     0     WWTR1        0.216                     0.095                      3.01E-01  1.00E+00         0.431
nd    nd    ELMSAN1      0.162                     0.048                      4.03E-01  1.00E+00         0.431
1     0     DLG5         0.135                     0.048                      4.02E-01  1.00E+00         0.283
nd    nd    PIAS1        0.135                     0.095                      1.00E+00  1.00E+00         0.431

Upstream of TLN1, integrin complex dependency appears to stem from ITGAV and ITGB5, which are known to heterodimerize and mediate downstream signaling through PTK2 (also known as FAK) and CRK [59, 165]. Additional integrin genes, including ITGB1 and ITGB7, had extreme fitness scores in several cell lines, but did not show co-dependency with ITGAV (Figure 15H), indicating that while ITGAV integrins are essential in many adherent cell lines, there are exceptions that depend on other integrins. Consistent with these findings was a recent large-scale RNA interference screening study of 501 cell lines, which highlighted a co-dependency network that included TLN1, ITGAV, PTK2, the transcription factor TEAD1, and the glycosyltransferase RPN2 [151]. Together, these analyses pinpoint a small set of integrins at the center of adherent-related fitness, and raise the question as to how the ECM plays a role in cell fitness.

3.5.2 Genome-wide genetic screens identify ECM-dependent fitness genes

In a parallel effort to define ECM-dependent fitness genes, we performed genome-wide pooled CRISPR/Cas9 screens in human HAP1 cells cultured on TCP, collagen, fibronectin, laminin or Matrigel, a basement membrane mixture (Figure 16A). The log2 fold-changes (fitness scores) for the 89,924 gRNAs were approximately normally distributed with a slight left skew and a wider distribution at the T18 timepoint (Figure 16E). Gene level fitness scores were calculated for each gene at intermediate and endpoint for TCP and each of the four substrates (Figure 16B, Table S5). To test screening quality, we assessed the ability of each replicate to identify the 1580 core fitness genes from [127] by averaging gRNA log2 fold-changes per gene and computing the area under the ROC curve, and found that all screen replicates performed well at both intermediate and endpoint (Figures 16C, 16D). We also performed principal component (PC) analysis on each individual replicate screen and found that PC1 accounted for the largest share of the variance, approximately 25% (Figures 16F, 16G). All screening data were compiled into gene level fitness scores (log2FC) for each substrate and the associated significance. We expected to find genes where loss-of-function mutations changed cellular fitness differentially on ECM relative to TCP. To identify them, we assessed the differential fitness for each gene between substrates by computing differential log2FC (dFC) and the associated significance. Mutant genes that reduced fitness on ECM relative to TCP were considered ‘sensitizing’ (negative dFC), whereas mutant genes that restored fitness on ECM relative to TCP were considered ‘masking’ (positive dFC). For an illustration of example sensitized and masked genes, see Figure 16B. From this analysis, we identified 884 (5.7%; FDR 15%) genes with either masking (n=375) or sensitizing (n=509) effects on at least one ECM condition relative to TCP (Figures 17A, 17B). We observed ~100 sensitizing and masking effects for each of the four ECM conditions (Figure 17B; FDR 15%). The majority of these effects were not shared across all four ECM conditions (Figures 17C, 17E), suggesting that there are likely many ECM-specific genetic regulators. A small number of genes showed differential effects across all of the different ECM substrates (Figures 17D, 17F). We determined the strongest sensitizing and masking effects by plotting the maximum gene dFC for any substrate against the -log10(p-value) (Figure 16H). The strongest sensitizers included MYL6 and FBXO11 (Figure 16H). MYL6 encodes a myosin alkali light chain that is involved in cytoskeletal functions [166], whereas FBXO11 encodes an F-box protein family member belonging to the Fbxs class, which can function as an arginine methyltransferase and adaptor protein to mediate neddylation of TP53, which leads to suppression of TP53 function [167]. The strongest masking effects were observed for THAP3 and ITGAV (Figure 16H). THAP3 encodes an uncharacterized THAP-domain containing protein that associates with HCFC1 and the O-GlcNAc transferase OGT [168], while ITGAV encodes the major alpha (α) integrin subunit, first identified as the vitronectin receptor [169], but later shown to heterodimerize with multiple beta (β) integrins to mediate attachment and signaling with the ECM [165].

Figure 16 – Parallel Genome-wide CRISPR Screens on ECM Components A) Experimental setup for ECM-regulated fitness genes using genome-wide CRISPR/Cas9 screens in HAP1 cells grown on tissue culture plastic (TCP) or each of four different ECM substrates. Expected phenotypes for core, non-fitness, and ECM-regulated genetic dependencies are indicated. B) Fitness effect (average log2 fold-change) for PSMA5, representative of core fitness effect, FBXO11 representative of fitness effects sensitized by ECM, and ITGAV representative of fitness effect masked by ECM. These observed trends are illustrated for intermediate (left) and endpoints (right) on each substrate (colour). Black horizontal line illustrates the respective fitness effect on TCP and dashed black line indicates a neutral (log2FC = 0) fitness effect. Filled circles indicate a significant differential fitness effect (delta log2FC or dFC) between substrate and TCP, while open circles are not significant. C) and D) Quality control using the area under the ROC curve of gene fitness scores and the set of 1580 fitness genes (Hart et al., 2015) as a gold standard. Per-gene scores summarize all gRNA log2 fold-changes targeting the same gene without prior quality control and per replicate for C) intermediate and D) endpoint. E) Log2 fold-change of the 90k TKOv1 gRNA pool for each biological triplicate of each ECM screen at the endpoint (solid lines) and intermediate time point (dashed lines). The log2 fold-change was estimated by comparing gRNA read counts between a given time point with the measurement taken after library infection and selection (T0). F) and G) Distinction between replicates on TCP and ECM substrates at intermediate and endpoint. Shown is the first principal component (any further component explained less than 10% of the variance, and singled out one replicate screen). H) Volcano plot illustrating the maximum differential fitness effect (dFC) on any ECM substrate, plotted on the x-axis (dot size is proportional to –log10(p)*|dFC|). Black outlined dots are significant at FDR < 15%. Dot darkness represents conservation of differential effect over time (darkest shade corresponds to effects that are also present at the intermediate time measurement). Pie charts show the ECM substrates on which the selected genes exhibited significant effects, with the smaller pie charts showing presence and strength of differential effects at the intermediate time point and larger pie charts at the endpoint. The pie chart colors are consistent with panels B-G. Dashed outline circles represent no significant effects at intermediate time point.

Figure 17 – ECM Masks or Sensitizes Hundreds of Genetic Dependencies in HAP1 cells A) Masking (yellow) and sensitizing (blue) significant (FDR < 15%) differential gene effects at intermediate and endpoints of screens on Collagen, Fibronectin, Laminin or Matrigel. B) Total number of significant masking and sensitizing gene effects at intermediate (dashed lines) and endpoint (full bars). C) Overlap of masking effects between substrates at intermediate (in brackets) and endpoint (bold), and D) genes masked by all four ECM substrates at intermediate (in brackets) and endpoint (bold). E) Overlap of sensitizing effects, and F) genes sensitized by all four ECM substrates.
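The replicate quality check described in this section, the area under the ROC curve for recovering a reference set of core fitness genes from per-gene mean log2 fold-changes, can be sketched as follows. This is an illustrative implementation rather than the code used for the screens; it uses the pairwise-comparison definition of the AUC, and the function and argument names are hypothetical.

```python
def roc_auc(gene_lfc, is_reference):
    """AUC for separating reference fitness genes from all other genes.

    gene_lfc:     per-gene mean log2 fold-change (more negative means a
                  stronger fitness defect).
    is_reference: booleans, True if the gene is in the reference set.

    The AUC equals the probability that a randomly chosen reference gene
    has a lower log2 fold-change than a randomly chosen non-reference
    gene, with ties counted as one half.
    """
    ref = [s for s, r in zip(gene_lfc, is_reference) if r]
    other = [s for s, r in zip(gene_lfc, is_reference) if not r]
    if not ref or not other:
        raise ValueError("both reference and non-reference genes are required")

    wins = 0.0
    for r in ref:
        for o in other:
            if r < o:
                wins += 1.0
            elif r == o:
                wins += 0.5
    return wins / (len(ref) * len(other))
```

The pairwise loop is quadratic in the number of genes, which keeps the sketch easy to read; a rank-based formulation would be preferable for a full genome-wide gene list.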

3.5.3 Parallel approaches identify core processes regulating adherent cell state

We next examined the overlap between our parallel approaches to illuminate key bioprocesses involved in adherence- and ECM-related fitness, and found a strong coherence among AFGs identified through data mining approaches and ECM regulated fitness genes identified by direct screening (p = 1.04e-5, two-sided Fisher’s exact test). Major functional categories and genes involved in adherence-related fitness include: cytoskeletal regulators (e.g. WAVE complex member NCKAP1, PTK2, TLN1, TNS3, FERMT2, CRK, PKN2, RAC1, RHOA, CRKL, DOCK5, DOCK7, RAPGEF1, ARHGAP24), ECM components (e.g. COL5A1, CLEC3B, HSPG2), cell adhesion receptors (e.g. ITGAV, ITGB1, ITGB5, LYPD3), growth signaling pathways (e.g. EGFR, PDGFRB, CTGF), ubiquitin-modulation (e.g. DET1, FBXO11, TRIM37), transcriptional regulators (e.g. BRD2, CXXC1, ESRRA, KLF5, MED16, NOBOX, SMAD5, SOX3, SOX9, SRA1, THAP3, THAP7), protein chaperone or trafficking (e.g. MESDC2, VBP1, DOPEY2, GGA3, RGP1, SYBU), cell cycle regulators (e.g. CCND1, CDK14), mitochondrial (e.g. IMMT, SLC25A28, TACO1) and genes involved in fat metabolism (e.g. ACACA, FASN, FITM2) (Figure 18). Overall, these observations highlight an integrated network of bioprocesses that regulate adherence-related fitness, including a central role for ECM receptors.

3.5.4 Loss of integrin αV results in sphere formation that is masked by ECM

To date, several studies have defined ITGAV, which encodes the integrin αV protein, as a fitness gene [151, 170, 171]. Our data indicate that sgITGAV dropout can be masked by addition of any tested exogenous ECM component, including collagen, fibronectin, laminin or Matrigel (Figure 101

ECM components: COL5A1 Integrin β1 CLEC3B Growth factor HSPG2 receptors: Legend EGFR PDGFRB Adherence fitness gene (AFG) ECM screen hit Integrin αVβ5 AFG and ECM hit

TNS3 Myosin WAVE Complex Secreted: MYL6 CTGF NCKAP1 Talin FERMT2 MYH9 DLG5 NF2 Cell adhesion TLN1 PKN2 receptors: PTK2 CRK RHOA Chaperone: ITGAV Actin MESDC2 FAK RAC1 VBP1 ITGB1 CRKL ITGB5 GDP GTP LYPD3 Ubiquitin- RAC GEF/GAP: modulators DOCK5 DOCK7 including: RAPGEF1 ARHGAP24 IPO5 Cell cycle: Mitochondrial:Mitochondrial: DET1 CCDN1 IMMIMMTT FBXO11 Lysosomal: Transcriptional regulators: CDK14 SLC25A28 TRIM37 ABCD4 BRD2 CXXC1 ESRRA TACO1 KLF5 MED16 NOBOX Fat metabolism: SMAD5 SOX3 SOX9 Trafficking: ACACA mRNA SRA1 THAP3 THAP7 DOPEY2 GGA3 FASN export RGP1 SYBU FITM2 ZC3H3

Figure 18 – AFGs and ECM-dependent Genetic Dependencies Converge on Cellular Processes Including the Integrin Adhesome. Illustration of highlighted AFGs (burgundy), ECM-regulated fitness genes (purple) and genes identified through both approaches (teal).

102

16H and 19A). Matrigel, which mimics the basement membrane present in vivo, showed the strongest masking effect for ITGAV knockout (Figure 19A). Of the six sgRNAs targeting ITGAV in the screened CRISPR library, five showed strong dropout (negative log2FC) on TCP and no drop out (neutral or positive log2FC) on Matrigel (Figure 19B). ITGAV encodes one of eighteen integrin α proteins, which heterodimerize with integrin β partners to bind to an extracellular ligand (Figures 2B and 19C). To validate our screen results, we constructed ITGAV knockout cell lines by introducing CRISPR guide RNAs (gRNAs) against ITGAV and Cas9 into HAP1 cells cultured on Matrigel. After 7 days, the resultant polyclonal populations showed approximately ~2:1 ratio of αV knockouts (αVKO) to wildtype cells, which is equal to the expected ratio of CRISPR induced frameshift mutations, suggesting there is no survival disadvantage due to loss of αV in the presence of Matrigel (Figure 19D). Single cell αVKO clones were then isolated by cell sorting and verified by western blot and targeted sequencing (Figure 19E). Interestingly, αVKOs formed three dimensional (3D) spheroid structures on TCP, but grow as adherent monolayer (2D) cultures when provided with Matrigel, or any other tested ECM substrate (Figure 19G). Moreover, ITGAV knockout cells proliferated at an indistinguishable rate compared to parental HAP1 cells when both were grown on Matrigel (Figure 19I). However, while there is no significant difference in cell number or viability for parental cells grown on TCP compared to Matrigel, there is a significant reduction in αVKO cell number and viability on TCP compared to Matrigel (Figures 19F, 19H). We characterized multiple independent αVKO clones and observed their proliferation on TCP and found that all clones grew as spheres, with variable growth rates (Figure 19J). 
We also assessed αVKO for adherence to TCP or Matrigel under normal growth conditions and found that with short or long term incubation period, parental HAP1 cells adhered readily to TCP or Matrigel, while αVKOs cells only adhered to Matrigel (Figure 20A). Despite the observation that αVKO cells proliferate indistinguishably from parental HAP1 cells on Matrigel, the morphology of αVKO cells was abnormal on Matrigel (Figure 19G). To extend this observation and examine underlying cytoskeletal differences in aVKO cells compared to parental HAP1 cells with or without Matrigel, we stained for actin filaments (F-actin) and microtubules. In the absence of Matrigel, αVKOs grow as spheres with reduced or undetectable actin filaments and minimal cytoplasm (Figure 20B). In the presence of Matrigel, αVKOs show an adherent, spread morphology like parental HAP1 cells; however, they appear to have reduced F-actin (Figure 20B). These observations support a model

G H D Gene level fold AB

αVKO change (log2)

Counts -1.5 -1.0 -0.5 0.5 αV-PE

Alamar blue fluorescence WT ~30% ~70% 0 % control TCP TCP 100 150 Collagen

50

0 Fibronectin

WT n.s. TCP αVKO-1 Laminin **

Matrigel TCP Matrigel Endpoint Intermediate αVKO-2 ** αVKO-2 αVKO-1 WT Secondary

Fold change in olgnFboetnLaminin Fibronectin Collagen Loading cell number I 100 150 200 250 50 0 αV β1

EF fold change in sgRNA 01234 abundance (log2) -3 -2 -1 0 1 2 3 WT

Matrigel Hap1 123456 αVKO-1 sgITGAV #

αVKO Control αVKO-2

Fold change in J TCP cell number Matrigel 20 40 60 80

0 Fold change in cell number 01234 100 150 200 50 0 C 01234 Days Matrigel TCP Days WT Secondary KO Primary KO Absent Heterodimer αE adhesion receptors αD αD leukocyte specific β7 Matrigel αL α L α9 α11 α αM α α4 β2β M 4 2 100 150 200 1A6 1A2 2N3 2A1 LC2 α10 50 α8 0 αX 01234 α5 β1 α1 αV α2 β6 αVKO Days α7 α3 α6 103 β3 β5 αIIb ** β8 β4 *** 104

Figure 19 – Dependence on ITGAV is Masked by all tested ECM components in HAP1 Cells. A) Mean gene-level ITGAV log2 fold change (FC) on each ECM substrate at intermediate (light grey) and endpoint (dark grey). B) log2FC in sgRNA abundance for each of the library’s six ITGAV guides on TCP (unfilled bar) and Matrigel (black bar), at endpoint. Points illustrate three individual biological replicates and bars represent the mean. sgRNA#3 was excluded from the gene-level calculation due to poor correlation with other ITGAV guides across all screens. C) Schematic diagram of integrin subunit pairing indicating the effect of knocking out ITGAV, which encodes integrin αV, on all other integrin pairs. Indirect “secondary knockouts” are encircled in orange. Solid connecting lines indicate potential for expressed heterodimers, while dashed connectors indicate indirect secondary knockouts. Light grey background shading indicates leukocyte/platelet-specific integrin heterodimers. D) Histogram of flow cytometry data illustrating the proportion of cells staining for αV-PE for HAP1 parental and polyclonal knockout populations generated with two orthogonal sgRNAs. E) Western blot for integrins αV and β1 and loading control for HAP1 parental and αVKO cells. F) Growth curves for parental and αVKO cells cultured on TCP (unfilled circles) or Matrigel (black squares) over four days. Individual points indicate three biological replicates from a representative KO clone, and the black line indicates the mean. Significance was calculated with a Student’s t-test. G) Bright field images of HAP1 parental and a representative αVKO clone cultured on various ECM substrates. Black scale bar = 100 μm. H) Viability of HAP1 parental and αVKO cells cultured on TCP (unfilled bar) or Matrigel (black bar) for four days, assessed using background-corrected alamarBlue fluorescence expressed as % of control on TCP. Error bars represent standard deviation of four replicates from a representative experiment. Significance was calculated for each cell population between TCP and Matrigel (Student’s t-test). I) Growth curve for parental (red) and αVKO clones (blue), both cultured on Matrigel over four days. Individual points indicate the mean from separate experiments, and a line connects the overall mean. Student’s t-test indicated no significant differences in cell number. J) Growth curve for five unique, confirmed HAP1 parental αVKO clones cultured on TCP over four days. (*p<0.05, **p<0.01, ***p<0.001)


where, under normal adherent growth, integrin αV links the ECM to the cytoskeleton, allowing formation of actin filaments. While αV is the key mediator of adhesion, αVKO cells are viable, and provision of exogenous ECM can reverse their defects in proliferation and adhesion.

3.5.5 αVKO cells grow as spheres independent of genotype

In order to rule out that the αVKO phenotypes were specific to HAP1 parental cells, we generated and sequence-validated additional αVKO clones in HAP1 parental and HAP1-TP53 knockout cells and observed similar adhesion and growth state phenotypes (Figure 20C and Table 8). To determine whether the ITGAV-dependent sphere formation phenotype is present in other cell lines, we generated αVKO clones in HCT116 colorectal cancer cells (Figure 21A and Table 8) and HEK293T human embryonic kidney-derived cells (Figure 22A), each grown on Matrigel. We found that multiple αVKO clones derived from these cell lines grow as spheroids on TCP and as adherent monolayers on various ECM components (Figures 21B and 22B). Neither HCT116 nor HEK293T parental cells showed a difference in cell number when cultured on Matrigel as opposed to TCP (Figures 21C and 22C). Like their HAP1 counterparts, HCT116 αVKO cells cultured on Matrigel showed a significant increase in cell number compared to sphere growth on TCP (Figure 21C). In contrast, HEK293T αVKO cells showed a small but significant increase in cell number on Matrigel at days 2 and 3, but no difference at day 4 (Figure 22C). Importantly, both HCT116 and HEK293T αVKO cells showed little increase in confluence over time on TCP (i.e. they grew in 3D as spheres) and significantly increased confluence when cultured on Matrigel (Figures 21D and 22D). For both HCT116 and HEK293T, the αVKO confluence on Matrigel was comparable to the respective WT control confluence (Figures 21D and 22D), indicating that these cells proliferate as spheres on TCP, but can spread and grow adherently on Matrigel. TP53 status also did not alter the morphological phenotypes imparted by knocking out ITGAV in HCT116 cells, whether the HCT116 parental cells were TP53-deficient or harbored the TP53-R248W point mutation (Figure 20C). Lastly, we used RNA interference to deplete αV in the breast cancer line MCF7, and observed morphological phenotypes similar to those caused by knocking out ITGAV in other cell lines, with the degree of spheroid formation correlating with the degree of protein reduction by shRNAs targeting ITGAV (Figures 22E and 22F). Together, these findings indicate that the main role of ITGAV in cultured cells is to promote adherent growth and suppress


[Figure 20 image. Recoverable labels: adhesion (20 mins) and growth (4 days) assays on TCP and Matrigel for WT and αVKO cells; tubulin, actin, DAPI and overlay channels on uncoated and Matrigel-coated surfaces; % clones scored as adherent, mixture or spheres, with and without Matrigel, by p53 status (null, S215G, WT) in HAP1 and HCT116 (n=13, n=16, n=9).]

Figure 20 – Loss of ITGAV Leads to Sphere Growth State, Regardless of TP53 Status. A) Representative crystal violet stained images (n=3) of a 20-minute cell adhesion assay, or 4-day growth assay. B) Immunofluorescence microscopy of parental and αVKO HAP1 cells cultured with and without Matrigel. Microtubules are coloured green, actin filaments are red and nuclei are blue. White scale bar = 10 μm. C) Cumulative bar plot summary of the proportion of unique, confirmed αVKO clones growing adherently (green), as non-adherent spheres (blue) or as a mixture of spheroids with adherent growth (red) across three TP53 genotypes on TCP or Matrigel. The number of clones assessed is indicated above the bars.

[Figure 21 image. Recoverable labels: Western blot lanes WT, αVKO-1 and αVKO-2 probed for αV, β1 and loading control; bright field images of HCT116 WT and αVKO-1 on TCP, collagen, fibronectin, laminin and Matrigel; fold change in cell number against days (0 to 4) for WT and αVKO; % confluence against days for WT and αVKO on TCP and Matrigel.]

Figure 21 – Dependence on ITGAV is Masked by ECM in HCT116 CRC Cells. A) Western blot for integrins αV and β1 and loading control for HCT116 parental and αVKO cells. B) Representative bright field images for HCT116 parental and αVKO cells cultured on various ECM substrates. Black scale bar = 100 μm. C) Growth curves for parental and αVKO cells cultured on TCP (unfilled circle) or Matrigel (black square) for four days, assessed by fold change in cell number. Student’s t-test indicated no significant difference in cell number for parental cells cultured on TCP compared to Matrigel. D) Confluence over time for HCT116 parental and αVKO cells on TCP and Matrigel. Parental cells are shown in blue (TCP) and red (Matrigel); αVKO cells are shown in green (TCP) and purple (Matrigel). Confluence was assessed using an Incucyte instrument and software. Plates were scanned every two hours; points and connecting lines illustrate the mean confluence (assessed across three biological replicates with four technical replicate images per well), and error bars are standard deviation of the mean. (*p<0.05, **p<0.01, ***p<0.001)


sphere formation, and that upon loss of ITGAV, cells grow as spheres regardless of genotype, but provision of exogenous ECM can restore the adherent cell state.

3.5.6 Pyruvate modulates cell state in αVKO cells

In the course of characterizing our HAP1 αV knockout clones, we observed that HAP1 αVKO cells can grow adherently on TCP under minimal nutrient conditions; that is, in media with reduced glucose and glutamine and no pyruvate (Figure 23A, Supplemental Movies 1, 2). To ascertain the specific nutrient responsible for this observation, we added each of the reduced or absent components back independently (and pairwise) and observed that αVKO cells grow as 3D spheroids following the addition of 1 mM pyruvate to the minimal medium (Figure 23B). This indicates that deprivation of pyruvate, like provision of ECM, can mask the requirement for integrin αV in adherent growth, and suggests that pyruvate is required for sphere formation.

3.5.7 Availability of electron acceptors determines cell state in αVKO cells

To define how pyruvate can promote 3D sphere formation in αVKO cells, we considered the different metabolic fates of pyruvate. Pyruvate is the end product of glucose breakdown through glycolysis (Figure 23C), yet additional glucose did not substitute for pyruvate in promoting sphere formation (Figure 23C). Pyruvate can also be converted to alanine or lactate through the activity of alanine aminotransferase (ALT) or lactate dehydrogenase (LDH), respectively (Figure 23C), yet supplementation of the minimal medium with alanine or lactate was insufficient to promote 3D sphere formation in αVKO cells (Figure 24A). We next tested whether NAD+ generation is an important trigger for spheroid growth in αVKO cells by exploiting α-ketobutyrate (AKB), an alternative electron acceptor that can be used to regenerate NAD+ from NADH through LDH [172], and found that AKB could trigger 3D spheroid growth (Figure 24A). While AKB is neither a carbon nor an energy (ATP) source, it promotes NAD+ generation to maintain redox homeostasis (Figure 24C) [172]. Notably, the pyruvate dehydrogenase kinase (PDK) inhibitor dichloroacetate (DCA) [173-175] blocked pyruvate-stimulated spheroid formation, but failed to block AKB-induced sphere formation (Figure 24A), implying that pyruvate supports redox homeostasis to promote 3D sphere formation. Interestingly, these treatments showed no significant effect on cell proliferation (Figure 24B). So, as the continued production of pyruvate from glycolysis consumes NAD+ [172], the requirement for exogenous electron acceptors is


Table 8 – αVKO Sanger sequencing data confirmation

Genomic DNA was isolated from integrin αVKO single cell clones derived from HAP1 and HCT116, and the genomic locus targeted by the ITGAV-targeting sgRNA was PCR amplified and subjected to Sanger sequencing. Any resultant overlaid chromatograms indicative of multiple distinct allelic mutations were de-convoluted into individual allele sequence predictions using the CRISPR-ID online tool [160]. These sequences were then mapped onto the ITGAV open reading frame (ORF) and summarized by the specific mutation. Only clones with frameshift mutations in all alleles, and negative for integrin αV by flow cytometry, were used for subsequent experiments. Each clone was then scored for growth state phenotype on TCP at 4 days and for whether spheres formed after extended growth on either TCP or Matrigel. bp = base pairs, del. = deletion, ins. = insertion, A = adherent, S = all spheres, M = mixture of adherent and spheres, Y = yes, N = no.

Short ID  Cell line  ITGAV guide #  Allele #1   Allele #2   Allele #3   4 day TCP phenotype  TCP spheres?  Matrigel spheres?
1A2       HAP1       1              10bp del.                           A                    Y             N
1A3       HAP1       1              10bp del.                           S                    Y             N
1A4       HAP1       1              23bp del.                           S                    Y             N
1A5       HAP1       1              8bp del.                            S                    Y             N
1A6       HAP1       1              13bp del.                           M                    Y             N
1N1       HAP1       1              77bp del.                           M                    Y             N
1N2       HAP1       1              1bp ins.                            A                    Y             N
1N3       HAP1       1              1bp ins.    1bp del.                S                    Y             N
1N4       HAP1       1              181bp del.                          M                    Y             N
1N6       HAP1       1              1bp ins.                            M                    Y             N
2A2       HAP1       2              1bp ins.                            S                    Y             N
2A5       HAP1       2              40bp del.                           S                    Y             N
2A6       HAP1       2              4bp del.    10bp del.               M                    Y             N
2N3       HAP1       2              2bp del.                            M                    Y             N
2N5       HAP1       2              2bp del.                            S                    Y             N
2N6       HAP1       2              2bp del.                            M                    Y             N
1C1       HCT116     1              1bp ins.    1bp ins.                M                    Y             N
1C2       HCT116     1              26bp del.   26bp del.   19bp del.   S                    Y             N
1C3       HCT116     1              10bp del.   10bp del.               S                    Y             N
1C5       HCT116     1              1bp ins.    10bp del.               S                    Y             N
1C6       HCT116     1              31bp del.   1bp ins.                S                    Y             N
2C1       HCT116     2              13bp del.                           S                    Y             N
2C2       HCT116     2              26bp del.   8bp del.    19bp del.   M                    Y             N
2C3       HCT116     2              2bp del.                            S                    Y             N
2C6       HCT116     2              22bp del.                           S                    Y             N
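The selection rule stated for Table 8 (frameshift mutations in all alleles) can be illustrated with a short sketch. This is only an illustration and not the procedure used in this work: it assumes that an insertion or deletion shifts the reading frame whenever its length is not a multiple of three, and the function names are hypothetical. The indel lengths are copied from a few rows of Table 8.

```python
def is_frameshift(indel_bp: int) -> bool:
    """Return True if an indel of this length shifts the reading frame.

    Assumption: an indel is frame-disrupting when its length in base
    pairs is not a multiple of 3 (a codon is three bases long).
    """
    if indel_bp <= 0:
        raise ValueError("indel length must be a positive number of base pairs")
    return indel_bp % 3 != 0


def all_alleles_frameshift(indel_lengths: list[int]) -> bool:
    """A clone passes only if every sequenced allele carries a frameshift."""
    return len(indel_lengths) > 0 and all(is_frameshift(n) for n in indel_lengths)


# Indel lengths (bp) per allele for a few clones, copied from Table 8.
clones = {
    "1A2": [10],          # HAP1, 10bp del.
    "2A6": [4, 10],       # HAP1, 4bp del. / 10bp del.
    "1C2": [26, 26, 19],  # HCT116, three alleles
    "2C2": [26, 8, 19],   # HCT116, three alleles
}

# Map each clone ID to whether it satisfies the all-alleles criterion.
passing = {clone: all_alleles_frameshift(lengths) for clone, lengths in clones.items()}
```

Under this assumption, a hypothetical clone carrying one in-frame allele (for example a 3bp deletion) would be rejected even if its other alleles were frameshifted.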

[Figure 22 image. Recoverable labels: HEK293T Western blot lanes WT, αVKO-1 and αVKO-2 probed for αV, β1 and loading control; bright field images of WT and αVKO cells on TCP and Matrigel; fold change in cell number against days (0 to 4) for WT and αVKO; % confluence against days for WT and αVKO on TCP and Matrigel; MCF7 Western blot for control and shITGAV hairpins 1 to 5 probed for αV, β1 and loading control; MCF7 images for shLacZ and shRNAs 1 to 5 against ITGAV.]

Figure 22 – Loss of Integrin αV Leads to Sphere Formation in Many Cell Backgrounds. A) Western blot for integrins αV and β1 and loading control for HEK293T parental and αVKO cells. B) Representative bright field images for HEK293T parental and αVKO cells cultured on TCP versus Matrigel. Black scale bar = 100 μm. C) Growth curves for parental and αVKO cells cultured on TCP (unfilled circle) or Matrigel (black square) for four days, assessed by fold change in cell number. Student’s t-test indicated no significant difference in cell number for parental cells cultured on TCP compared to Matrigel. D) Confluence over time for HEK293T parental and αVKO cells on TCP and Matrigel. Parental cells are shown in blue (TCP) and red (Matrigel); αVKO cells are shown in green (TCP) and purple (Matrigel). Error bars are standard deviation of the mean. E) Western blot for integrins αV and β1 and loading control of MCF7 cells treated with control or one of five unique hairpins against integrin αV. F) Representative bright field images for MCF7 cells treated with shITGAV and cultured on TCP. (*p<0.05, **p<0.01, ***p<0.001)

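[Figure 23 image. Recoverable labels: daily images (days 1 to 4) in rich media and minimal media; images in rich media, minimal media, and minimal media plus glucose, plus glutamine or plus pyruvate; pathway diagram linking glucose, pyruvate, alanine (ALT), lactate (LDH, NADH/NAD+), oxaloacetate (PC) and acetyl-CoA (PDC, inhibitory phosphorylation by PDK, inhibited by DCA) through glycolysis and gluconeogenesis.]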

Figure 23 – Pyruvate is Required for Sphere Growth State. A) Time-lapse imaging of integrin αVKO cells cultured in rich or minimal media (MM) conditions, shown daily for four days. B) Representative bright field images of αVKO cells cultured in minimal media with the indicated metabolite supplemented back individually. C) Metabolic pathways converge on pyruvate (from top, counter-clockwise). Pyruvate is generated from glucose through glycolysis, and can be converted to glucose through gluconeogenesis. Pyruvate and the amino acid alanine can be interconverted through the action of alanine transaminase (ALT). Through the action of lactate dehydrogenase (LDH), pyruvate and lactate can be interconverted, a process that occurs along with a change in the redox status of NAD+/NADH. Pyruvate can be converted into oxaloacetate through the action of pyruvate carboxylase (PC). Pyruvate dehydrogenase complex (PDC) catalyzes the inter-conversion of pyruvate and acetyl-CoA. The conversion to acetyl-CoA is regulated by inhibitory phosphorylation of PDC by pyruvate dehydrogenase kinase (PDK). Treatment with dichloroacetate (DCA) inhibits PDK, thus relieving repression of PDC and funneling pyruvate to acetyl-CoA.

[Figure 24 image. Recoverable labels: images for treatments Minimal, Pyruvate, Alanine, Lactate and AKB, in control and DCA rows; % proliferation for untreated and DCA-treated cells across Minimal, Pyruvate, Alanine, Lactate and AKB; diagram of AKB conversion to AHB by LDH or other dehydrogenases, coupled to NADH/NAD+.]

Figure 24 – Exogenous Electron Acceptors are Required for Sphere Growth State. A) Representative bright field images of αVKO cells cultured in MM supplemented with the indicated metabolite (top) and in the presence of DCA (bottom). AKB is α-ketobutyrate. B) Viability of cells from A) measured with alamarBlue. The mean viability for each supplement alone is illustrated with a black bar, and with DCA treatment as a grey bar. Individual biological replicates for all conditions are shown as unfilled circles, and error bars are standard deviation of the mean. Student’s t-test indicated no significant differences. C) AKB is a substrate of LDH and other dehydrogenases and can be converted to α-hydroxybutyrate (AHB) along with a change in the redox status of NAD+/NADH.

consistent with our observation that glucose alone cannot support spheroid formation in αVKO cells. In other words, our results show that redox stress can mask the genetic dependency on ITGAV for an adherent cell state.

3.5.8 Availability of glutathione and biotin determine cell state in αVKO cells

Our data indicate that sphere formation is determined by the interplay between integrin expression, ECM availability and metabolic conditions. The experiments described so far were performed in serum-containing media, in which many factors are undefined (e.g. growth factors, nutrients, metabolites, ECM components). Starting from a serum-free defined medium (DM), we built a culture system to dissect the nutrient requirements for cell state preferences (see Methods). We observed that with no ECM coating, DM supports spheroid growth, regardless of ITGAV status (Figure 25A) [176]. ITGAV status likely has no impact on sphere formation in DM for two reasons: 1) there are no ECM or attachment factors present in DM; and 2) the nutrient conditions present in DM support sphere formation. To identify DM conditions where cells grow adherently, we supplemented individual ECM components (collagen, laminin, fibronectin) and observed that in DM plus ECM, parental HAP1 cells expressing αV grew adherently, while the αVKO cells grew as spheres, regardless of pyruvate deprivation (Figure 25A). These results demonstrate that αV is a major regulator of adherent growth, even in the absence of serum. However, since pyruvate (or AKB) was required for sphere formation in serum-containing medium, our observations indicate that DM provides other factor(s) that trigger sphere formation.

To determine what element of DM facilitates sphere formation in αVKO cells, we systematically removed DM components and observed that αVKO cells grow adherently in the absence of the SM1 supplement (Figure 25B). SM1 is a complex supplement based on the B27 formulation that contains insulin, vitamin A, antioxidants and other additives [177]. Critically, we found that replacing SM1 with glutathione and biotin triggered sphere formation when αVKO cells were grown in DM (Figure 25B), linking redox homeostasis to cell state preference. Overall, our data demonstrate that HAP1 cells expressing αV grow adherently in all conditions, except in the complete absence of serum-resident and exogenously supplied ECM (Figure 25B). In contrast, αVKO cells exhibit a complex cell state determination that requires pyruvate, or some source of electron acceptors, in serum-containing medium; and glutathione and biotin (or


[Figure 25 image. Recoverable labels: images of wildtype and αVKO cells with and without pyruvate on TCP and on ECM coating; WT cell state flow chart with decision nodes "Serum?" and "ECM coating?" leading to adherent or spheres; αVKO images in complete medium, minus SM1 supplement, plus glutathione and plus biotin; αVKO cell state flow chart with decision nodes "Serum?", "ECM coating?", "Pyruvate or AKB?", "SM1?" and "Biotin and GSH?" leading to adherent or spheres.]

Figure 25 – In a Defined Media System, Glutathione and Biotin are Required for Sphere Growth State. A) Representative bright field images of HAP1 parental and αVKO cells cultured in serum-free defined medium in the indicated condition with and without pyruvate. Black scale bars = 100 μm. B) Flow chart summarizing cell state preference for WT parental cells in various tested ECM and metabolic conditions. C) Representative bright field images of αVKO cells cultured in serum-free defined media conditions with the indicated supplements. D) Flow chart summarizing cell state preference for αVKO cells.


SM1) in defined medium, for sphere formation (Figure 25D). Thus, αV integrin and conditions impacting redox homeostasis mediate cell state in both serum-containing and serum-free media conditions.
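The cell state preferences summarized in this section can be written as a small decision function. The sketch below is a simplified reading of the observations reported here for HAP1 cells, not a validated model: the class, field and function names are hypothetical, and for αVKO cells in serum-free medium the ECM coating is deliberately ignored, on the assumption (taken from the DM plus ECM observation above) that ECM does not restore adherence in that setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Culture:
    """Hypothetical description of one culture condition."""
    serum: bool              # serum-containing medium?
    ecm_coating: bool        # exogenous ECM (e.g. Matrigel) supplied?
    electron_acceptor: bool  # pyruvate or AKB present?
    sm1_or_gsh_biotin: bool  # SM1, or glutathione plus biotin, present?


def cell_state(av_knockout: bool, c: Culture) -> str:
    """Return "adherent" or "spheres" for the given genotype and condition."""
    if not av_knockout:
        # Parental cells: adherent unless both serum-resident and
        # exogenously supplied ECM are absent.
        return "adherent" if (c.serum or c.ecm_coating) else "spheres"
    if c.serum:
        # aVKO in serum: exogenous ECM masks the knockout; without ECM,
        # spheres form only when an electron acceptor is available.
        if c.ecm_coating:
            return "adherent"
        return "spheres" if c.electron_acceptor else "adherent"
    # aVKO in serum-free defined medium (simplifying assumption: ECM
    # coating is not consulted here): SM1, or glutathione plus biotin,
    # is needed for sphere formation.
    return "spheres" if c.sm1_or_gsh_biotin else "adherent"


# Example: aVKO cells on uncoated plastic in serum-containing medium with pyruvate.
example = cell_state(True, Culture(serum=True, ecm_coating=False,
                                   electron_acceptor=True, sm1_or_gsh_biotin=False))
```

Writing the summary this way makes explicit which variable is consulted in each branch; conditions that were not tested in this chapter are not covered by the function.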

3.5.9 Multiple αV-containing heterodimers contribute to the phenotype of αVKO cells

Integrin α subunits must pair with a requisite β subunit, forming an αβ heterodimer, in order to translocate to the plasma membrane and exert function. The integrin family comprises 24 possible heterodimers: 7 are leukocyte/platelet-specific and 17 are not, and 5 of the latter include αV (Figures 19C, 26A). We sought to pinpoint the specific β subunit(s) that pair with αV to mediate the aforementioned ITGAV knockout phenotypes. In our meta-analysis of 54 different human cell lines, ITGAV-dependent cell lines co-depended on only a single β integrin, ITGB5 (Figure 15H), and both ITGAV and ITGB5 showed strong masking phenotypes over time in the genome-wide ECM screens across several substrates (Figure 16H). Thus, we hypothesized that the αVβ5 heterodimer was mediating adherence-related fitness phenotypes. To test whether knocking out ITGB5 mimics ITGAV knockout phenotypes such as sphere formation, we examined ITGB5 knockout cells (β5KO cells) on TCP and Matrigel. We observed that while Matrigel supported the adherent cell state, β5KO cells on TCP grew as a mixture of monolayer and 3D clusters that remained anchored to the plastic and showed abnormal actin staining patterns (Figure 26E), yet were similar to parental HAP1 cells in adhesion assays (Figure 26C). Similar to αVKO cells, β5KO cells showed a significant proliferation defect on TCP that is masked by Matrigel (Figure 26D). This phenotype is intermediate between parental cells and the adherence-related phenotypes observed in αVKO cells, and suggests that other αV heterodimers can, to some extent, substitute for the loss of αVβ5 (Figure 26A).

ITGB1 encodes β1 integrin, which can heterodimerize with 12 different α subunits, including αV [56]. To test whether integrin β1 contributes to sphere formation, we examined ITGB1 knockout (β1KO) cells (Figure 26B) on TCP and different ECM substrates and observed that β1KO cells grew adherently regardless of substrate, and showed abnormal actin staining patterns (Figures 26E, 26F). β1KO cells also have an impaired ability to adhere to Matrigel, and showed a mild, but significant, reduction in proliferation on Matrigel compared to TCP (Figure 26D). Thus,


[Figure 26 image. Recoverable labels: table of the 24 integrin heterodimers scored Y (present) or N (absent) in wildtype, αV, β5, β1, β1β5 and αVβ1 knockout cells, with secondary knockouts and leukocyte/platelet heterodimers marked; flow cytometry histograms (counts against ITGβ1-FITC and ITGβ5-APC) for DAPI only, WT, β5KO and β1KO; adherence assay for β5 KO and β1 KO on TCP and Matrigel; fold change in cell number against days (0 to 4) on TCP and Matrigel; brightfield, tubulin, actin, DAPI and overlay images of β5 KO and β1 KO cells on uncoated and Matrigel-coated surfaces.]

Figure 26 – ITGAV Knockout Phenotype is Mediated by Loss of Multiple Integrin αV Heterodimers. A) All 24 potential integrin heterodimers (Y in yellow) and missing heterodimers (N in blue) are indicated according to gene knockout. Light grey background shading indicates leukocyte/platelet-specific heterodimers. B) Flow cytometry of HAP1 parental (WT), β1 and β5 KO cells validating loss of surface staining. Solid green histogram is WT cells stained with FITC-conjugated antibody against ITGβ1, solid grey histogram is negative control, purple line histogram is β1KO stained with anti-ITGβ1-FITC, solid pink histogram is WT with anti-ITGβ5-APC and black line is β5KO stained with anti-ITGβ5-APC. C) Representative images of crystal violet staining of adherent β1KO or β5KO cells after a 20-minute adhesion assay. D) Cell proliferation curves for the indicated knockout cells on TCP (unfilled circles) and Matrigel (black squares) over the course of four days. Individual points indicate representative biological replicates (n=3), and black lines indicate the means. E) Representative bright field images of integrin β1 and β5 KOs on the indicated ECM substrate. Black scale bar = 100 μm. F) Immunofluorescence microscopy of integrin β1 and β5 KOs on the indicated substrate. Microtubules are coloured green, actin filaments are red and nuclei are blue. White scale bar = 10 μm. (* p<0.05, ** p<0.01, *** p<0.001)

knocking out either ITGB1 or ITGB5 alone did not mimic the phenotypes observed in ITGAV knockout cells.

3.5.10 Knocking out many integrin heterodimers causes non-spheroid suspension growth

The ITGB1 and ITGB5 single mutant phenotypes suggest that multiple αV/β pairs are important for adherent growth. To test this idea by knocking out a broader set of 16 potential α/β pairs, we generated ITGAV-ITGB1 double knockout cells, which lack all non-leukocyte-specific integrins with the exception of the α6β4 laminin receptor (Figure 26A). The ITGAV-ITGB1 double knockout cells (αVβ1KO cells) (Figure 27A) do not adhere to TCP and cannot be coaxed to adhere by addition of collagen, laminin, fibronectin or Matrigel, or by pyruvate deprivation (Figure 27F). Strikingly, these cells are viable and grow in suspension as loose clusters (Figures 27F, 27G). This loosely associated suspension cell state stands in contrast to αVKO cells, which grow in suspension as tightly packed spheroids. The proliferation of αVβ1KO cells was also unaffected by ECM, and similar to the proliferation of β1KO cells (Figures 27C, 27D). Transmission electron microscopy of αVβ1KO cells revealed disrupted cell-cell junctions, reduced cytoplasm and shedding of large cytoplasmic vesicles (Figure 28). We also generated ITGAV-ITGB1 double knockout HEK293T cells and observed phenotypes similar to those of HAP1 αVβ1KO cells (Figures 29A-D). Together, these results suggest that β1-containing integrin heterodimers: 1) are generally not required for adherence to TCP; 2) regulate ECM masking of ITGAV knockout; 3) regulate adherence in conditions of pyruvate deprivation; and 4) contribute to the tight cell-cell interactions in αVKO spheroids. These observations suggest that epithelial integrins are required for cell-matrix contacts and for the tight cell-cell contacts observed in αVKO spheroids, but are largely dispensable for survival, growth and proliferation. That loss of β1 integrins in the αVKO background mitigates the pyruvate requirement for non-adherent growth suggests that NAD+/NADH redox homeostasis may regulate β1 integrins in adherence.
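The bookkeeping behind the "16 potential α/β pairs" can be checked with a short sketch that enumerates the 24 heterodimers and removes every pair containing a knocked-out subunit. The pairings follow the canonical integrin family as summarized in Figure 26A; the variable and function names are hypothetical, and the sketch only counts potential heterodimers without saying anything about expression in a given cell line.

```python
# Leukocyte/platelet-specific heterodimers (7 pairs).
LEUKOCYTE_PLATELET = {
    ("αL", "β2"), ("αM", "β2"), ("αD", "β2"), ("αX", "β2"),
    ("α4", "β7"), ("αE", "β7"), ("αIIb", "β3"),
}

# β1 pairs with α1 to α11 and with αV (12 partners).
ALPHAS_WITH_BETA1 = [f"α{i}" for i in range(1, 12)] + ["αV"]

# All 24 potential heterodimers as (alpha, beta) pairs.
HETERODIMERS = (
    {(alpha, "β1") for alpha in ALPHAS_WITH_BETA1}
    | {("αV", beta) for beta in ("β3", "β5", "β6", "β8")}
    | {("α6", "β4")}
    | LEUKOCYTE_PLATELET
)


def remaining(knocked_out: set[str]) -> set[tuple[str, str]]:
    """Heterodimers that can still form when the given subunits are absent."""
    return {pair for pair in HETERODIMERS if not (set(pair) & knocked_out)}


def remaining_non_leukocyte(knocked_out: set[str]) -> set[tuple[str, str]]:
    """Same as remaining(), restricted to non-leukocyte/platelet integrins."""
    return remaining(knocked_out) - LEUKOCYTE_PLATELET


# Pairs lost in the ITGAV-ITGB1 double knockout, and what is left.
lost_in_av_b1 = HETERODIMERS - remaining({"αV", "β1"})
left_in_av_b1 = remaining_non_leukocyte({"αV", "β1"})
```

In this enumeration, the ITGAV-ITGB1 double knockout removes 16 pairs and leaves α6β4 as the only non-leukocyte/platelet heterodimer, matching the description above.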

As loss of either β1 or β5 integrins did not recapitulate αVKO phenotypes, we posited that αVβ5 and αVβ1 heterodimers are functionally redundant. To test this, we generated ITGB1-ITGB5 double knockout (β1β5KO) HAP1 cells (Figure 27B). Compared to β1KO cells, the β1β5KO

[Figure 27 image. Recoverable labels: flow cytometry histograms (counts against β1-FITC and αV-PE) for αVβ1 double KOs, with secondary-only, WT, β1αV KO and αVβ1 KO samples; flow cytometry (counts against ITGβ1-FITC) for HAP1 β5β1 double KOs, with DAPI only, β5KO, sgβ1#1 and sgβ1#2 samples; fold change in cell number against days (0 to 4) for αVβ1 KO and β5β1 KO on TCP and Matrigel; alamarBlue fluorescence (% control on TCP) for β1KO, αVβ1 KO and β1αV KO; adherence assay for αVβ1 and β5β1 on TCP and Matrigel; images of HAP1 αVβ1 KO on TCP, collagen, fibronectin, laminin and Matrigel; brightfield images on TCP and Matrigel and tubulin, actin, DAPI and overlay images in suspension for αVβ1 KO and β5β1 KO.]

Figure 27 – Integrin Heterodimers Determine Cell Growth State in HAP1. A) Flow cytometry of HAP1 αVβ1 double KOs. β1αV KO was generated by knocking out αV in β1 KO cells, and αVβ1 KO was generated from the parental (WT) HAP1 line using a simultaneous KO transfection. B) Flow cytometry of the HAP1 β5KO parental line and derived β1β5 double KOs. C) Cell growth curves for the indicated double knockout cells on TCP (unfilled circles) and Matrigel (black squares) over the course of four days. Individual points indicate representative biological replicates (n=3), and black lines indicate the means. D) Proliferation of double knockouts on TCP compared to Matrigel after four days, assessed by alamarBlue and expressed as % of β1 KO control on TCP. E) Representative adherence assay for αVβ1 and β1β5 double KOs stained with crystal violet. F) Representative bright field images of HAP1 αVβ1 KO cultured on TCP, collagen, fibronectin, laminin or Matrigel. G) Representative bright field images of double KOs cultured on TCP or Matrigel, after gentle agitation, and immunofluorescence microscopy of double KOs stained in suspension. Microtubules are coloured green, actin filaments are red and nuclei are blue. White scale bar = 10 μm. (* p<0.05, ** p<0.01, *** p<0.001)

[Figure 28 image. Recoverable labels: TEM images at 2500x, 6000x and zoom for wildtype, αVKO and αVβ1KO cells; scale bar 4 μm.]

Figure 28 – Transmission Electron Microscopy (TEM) Reveals Cell Junction and Morphology Abnormalities Associated with Integrin Loss. Representative TEM images of parental WT HAP1, αVKO and αVβ1KO cells cultured on plastic (WT cells are adherent, αVKO cells are spheres and αVβ1KO cells grow fully in suspension). Black scale bar = 4 μm.

[Figure 29 image. Recoverable labels: HEK293T αVβ1 double KO flow cytometry histograms (counts against β1-FITC and αV-PE) for secondary-only, WT, β1αV KO and αVβ1 KO samples; alamarBlue fluorescence (% control on TCP) for WT, αVβ1-1 and αVβ1-2 on TCP and Matrigel; bright field images of HEK293T αVβ1 double KOs on TCP and Matrigel; adherence assay for WT, αVβ1-1 and αVβ1-2 on TCP and Matrigel.]

Figure 29 – Loss of all αV- and β1-Integrin Heterodimers Leads to Suspension Growth State. A) Flow cytometry of HEK293T αVβ1 double KOs. Two individual clones were generated from parental (WT) cells using a simultaneous KO transfection. B) Proliferation of double knockouts on TCP compared to Matrigel after four days, assessed by alamarBlue and expressed as % of WT HEK293T control on TCP. C) Representative bright field images of HEK293T double knockouts cultured on TCP or Matrigel, after gentle agitation. D) Representative 20-minute adherence assay for parental WT and αVβ1 double knockout HEK293T cells.

cell line lacks only the αVβ5 heterodimer, so phenotypic differences between β1KO and β1β5KO cells can be attributed to αVβ5 (Figure 26A). Intriguingly, β1β5KO cells could not adhere to the tested substrates and grew as non-adherent cell clusters (Figures 27E, 27F). The phenotypes of β1β5KO cells were highly reminiscent of αVβ1KO cells, despite the potential of β1β5KO cells to express αVβ3, αVβ6 and αVβ8 heterodimers, supporting the model that αVβ1 and αVβ5 heterodimers are the key regulators of cell state preference.

3.5.11 Integrin targeted biologics recapitulate genetic model

To validate our sphere formation phenotype using an alternative approach to gene disruption, we generated a panel of human recombinant immunoglobulin G antibodies (IgGs) against the integrin heterodimer αVβ5, and validated their binding specificity to αVβ5 by flow cytometry on cells over-expressing defined pairs of integrins (Figure 30A). To test the ability of the αVβ5 IgGs to inhibit αVβ5 function and recapitulate the corresponding genetic knockout phenotypes, we treated HAP1 parental and β1KO cells with these antibodies and observed that HAP1 parental cells were resistant to αVβ5 IgG treatment (Figure 30B), while β1KO cells became non-adherent and grew in suspension as loosely associated cell clusters, regardless of exogenous ECM (Figure 30C). These observations confirmed that inhibiting αVβ5 in β1KO cells using a different perturbation modality mimics the β1β5 double mutant phenotype. We also treated HCT116 cells with our panel of αVβ5 IgGs and observed sphere formation and a concomitant decrease in proliferation (Figure 30D). Again, Matrigel completely masked the sphere formation, proliferation and confluence effects associated with our αVβ5 IgGs in HCT116 cells (Figures 30C-E), suggesting that sensitivity to integrin inhibitors may be highly dependent on microenvironmental ECM. The observation that HAP1 parental cells are largely resistant to αVβ5 inhibitory IgGs while HCT116 cells are sensitive to these IgGs highlights that different cell lines have slightly different integrin heterodimer dependencies. Overall, our results support a model where specific integrin αVβ# heterodimers regulate cell line growth preferences and cell state (Figure 31), and where the presence and composition of ECM can limit the potential effectiveness of integrin-targeted biologics.

[Figure 30 image. Recoverable labels: % of cells staining high for 1461 IgG after transfection with αVβ1, αVβ3, αVβ5, αVβ6, αVβ8, αIIbβ3 or untransfected control; fold change in cell number for HAP1 treated with 1461 or control on TCP and Matrigel; images of HAP1 WT, HAP1 β1KO and HCT116 treated with anti-β5 IgG #1461 on TCP and Matrigel; fold change in HCT116 cell number for IgGs 1461, 1476, 1479, 1480 and control on TCP and Matrigel; % confluence against time elapsed (0 to 96 h) for control and 1461-treated cells on TCP and Matrigel.]

Figure 30 – Effects of Integrin Inhibitory Antibodies are Masked by ECM. A) Flow cytometry data of HEK293T cells co-transfected with the indicated integrin subunits. The percentage of cells staining high for 1461 IgG is indicated for each transfection. B) Fold change in cell number for HAP1 parental control cells and cells treated with 1461 IgG for 3 days. No significant effects were observed (Student’s t-test). C) Representative bright field images of the indicated cell line treated with integrin β5 blocking IgG #1461, on the indicated substrate. Black scale bar = 100 μm. D) Mean fold change in HCT116 cell number following treatment with the indicated anti-β5 inhibitory IgGs or control non-targeting IgG for four days, on TCP (unfilled bars) and Matrigel (black bars). Individual biological replicates are shown as open symbols and error bars indicate standard deviation of the mean. Significance was calculated between control and antibody treatment for TCP and Matrigel (Student’s t-test). When cultured on Matrigel, there was no significant difference in cell number between control and any antibody treatment. E) Confluence over time for HCT116 control and 1461-treated cells cultured on TCP or Matrigel for four days. Error bars represent standard deviation for four images per well for three biological replicates. (* p<0.05, ** p<0.01, *** p<0.001)

[Figure 31 image. Recoverable labels: 2D adherent state and 3D spheroid and suspension states; transitions annotated with αVβ5, αVβ1, other β1 integrins, ECM, permissive redox conditions, and loss of αV & β1 or β1 & β5 integrins.]

Figure 31 – A Model of Cell Growth State Determination by Integrin Heterodimers and Metabolic Conditions. Cells can reversibly switch between adherent, sphere and suspension states depending on genetic or biologic integrin perturbations, ECM and metabolic conditions. Specifically, αVβ1 and αVβ5 seem to be key mediators of 2D growth under standard tissue culture conditions. When these are lost, cells can switch to a spheroid phenotype if they are exposed to permissive redox conditions. This spheroid phenotype is reversible by supplying exogenous ECM. However, if all αV and β1 heterodimers are lost, cells grow fully in suspension and their growth becomes insensitive to ECM or redox conditions.

3.6 Discussion

We mined published functional genetic data and identified a high-confidence set of genes that mediate core processes required for cell fitness in an adherent state. In parallel, we performed genome-wide loss-of-function CRISPR screens on different ECM substrates and identified a small set of genes that are central regulators of cell fitness on all four ECM substrates tested, and a larger set of genes that are substrate-specific regulators of cell fitness. Notably, there was a high degree of concordance between our two approaches, suggesting that in some cases, a cellular requirement for certain AFGs can be modulated by the ECM. As a central regulator of adherent cell fitness and ECM-dependent growth, we went on to extensively characterize the role of ITGAV and found that this gene regulates a cell state transition from adherent to suspension sphere growth. ITGAV is generally read out as a fitness gene (or essential gene) in many genome-wide functional genetic screens simply because adherence is a prerequisite for cell fitness for adherent cell lines.

Integrins are heterodimeric cell-surface adhesion molecules found on all nucleated cells and they integrate processes in the intracellular compartment with the extracellular environment in a bidirectional fashion. Integrin activation is regulated by both ‘outside-in’ and ‘inside-out’ signaling, wherein heterodimers undergo large conformational changes in response to either extracellular stimulus or signaling events inside cells, respectively [55]. These processes are thought to actuate their adhesive function and downstream signaling events. In this way, the presence and composition of ECM and various intracellular sensors and pathways can mediate cell state. Our data support the wealth of literature implicating integrin αV as a key mediator of adhesion [178-180]; however, our data also support a model where integrin αV suppresses sphere formation and 3D growth. Indeed, it is increasingly accepted that monolayer growth is not highly representative of the types of cell-ECM and cell-cell contacts present in intact and diseased tissues. For this reason, 3D in vitro culture systems have been used for decades as intermediate models between in vitro 2D cell culture models and 3D in vivo tumour models [181]. Although much research has illuminated cell signaling pathways and changes in metabolism that accompany 3D in vitro cell systems [82, 83, 181], little has been done to build an understanding of the genetic regulators of these different cell states. We used CRISPR to genetically define transitions from adherent monolayer (2D) culture to the formation of tightly associated sphere (3D) cultures mediated by loss of ITGAV, and further to a loosely associated suspension culture mediated by loss of both ITGAV and ITGB1, or loss of both ITGB1 and ITGB5. Importantly, the β1 or αV integrin subunits are present in 16 out of 24 possible integrin heterodimers, representing all of the non-leukocyte/platelet integrin pairs except for α6β4 [56]. Perhaps the most striking observation in our study is that αVβ1 double knockout cells are alive and proliferate in culture, raising a number of questions about focusing on αV and β1 integrins as therapeutic targets, particularly for cancer.

The 18 α- and 8 β-integrin subunits form 24 different heterodimers, each having functional and tissue specificity [55]. Integrins have central roles in almost all phases of human biology as well as in the pathobiology of many diseases, making them a focus of the biotechnology and pharmaceutical industries as potential therapeutic targets. Despite substantial effort and investment, including some 480 integrin-targeted drugs entering clinical trials, only seven integrin drugs have reached the market (< 2%), and all successful candidates have targeted the leukocyte/platelet integrins [148]. This rate of translational success is well below the average: an estimated 10% of drugs entering Phase 1 clinical trials go on to successfully pass Phase 3 [148]. Whereas drugs targeting several leukocyte/platelet integrin pairs have clear clinical efficacy, biologics and small molecules that effectively targeted biological mechanisms of the αV integrins in in vitro and in vivo models have consistently failed to achieve clinical success. For example, the pan-β3 inhibitor ReoPro is used for percutaneous coronary intervention and the α4β1 and α4β7 inhibitor Tysabri is approved to treat Crohn’s disease [148]. In contrast, pan-αV integrin biologics including abituzumab and intetumumab, and the αVβ3 and αVβ5 inhibitor cilengitide, have shown disappointing results in late-phase clinical trials [148].
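As a quick check of the figures quoted above, the short Python sketch below recomputes the market success rate from the two counts given in the text (seven marketed drugs out of roughly 480 entering clinical trials) and compares it with the 10% benchmark. The variable names are illustrative only, and because 480 is an approximate count the result should be read as an estimate.

```python
# Counts quoted in the text from [148]; 480 is approximate ("some 480").
marketed_integrin_drugs = 7
integrin_drugs_in_trials = 480
benchmark_success_rate = 0.10  # estimated average for drugs entering Phase 1

integrin_success_rate = marketed_integrin_drugs / integrin_drugs_in_trials
fold_below_benchmark = benchmark_success_rate / integrin_success_rate

print(f"Integrin drug success rate: {integrin_success_rate:.1%}")
print(f"Fold below benchmark: {fold_below_benchmark:.1f}")
```

Under these assumptions the rate comes to about 1.5%, consistent with the "< 2%" stated above and roughly sevenfold below the benchmark.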

In a physiological context, αV integrins are widely expressed on many cell types, where different associated β subunits show distinct patterns of expression through development [182]. This, the wide range of potential ligands (e.g. vitronectin, fibronectin, osteopontin, LAP-TGFβ), and the changes in integrin subunit expression in wound healing and disease states imply diverse functions for the αV integrin sub-family. The αV integrins became of great pharmaceutical interest about 24 years ago when it was first reported that antagonizing αVβ3 reversed angiogenesis, shrinking tumours [183, 184]. The interpretation of this finding was that αV integrins play an important role in angiogenesis and thus may be viable cancer targets. This finding was expounded upon over the following decade, and a humanized monoclonal antibody against integrin αV, CNTO95 (which would go on to clinical trials as intetumumab), was shown to have anti-angiogenic and anti-tumour activity in vivo [185]. Despite these promising findings, genetically engineered mouse models were providing contrasting observations as to the physiological role of integrin αV. Two decades ago the ITGAV KO mouse was characterized and it was observed that 80% of embryos die in mid-gestation due to placental defects, while 20% were born live but suffered from extensive intra-cerebral and intestinal hemorrhaging [186]. Counter to expectations, these mice showed considerable organogenesis and extensive vasculature and angiogenesis, necessitating a reevaluation of the importance of αV integrins for angiogenesis [186]. Subsequently, integrin αV was specifically deleted in the vascular endothelium and showed no effect on cerebral blood vessel development, further calling into question its role in angiogenesis [187]. Despite this, the apparent efficacy of integrin αV antagonists in inhibiting angiogenesis suggests that genetic deletion may have different effects than inhibition. When αV is deleted, it is possible that other integrins are expressed to compensate, and that this compensation is reduced in an inhibition model. Despite conflicting results about the primacy of the role of αV in angiogenesis, many studies have shown that this integrin may be a viable target for solid tumours [119, 185, 188-190]. A plausible explanation for this is that the function of αV is different under normal physiological conditions than in the conditions of a tumour xenograft. For example, in a human patient, from the time of tumour initiation to treatment, the ECM microenvironment has dynamically coevolved with the tumour cells.
This is very different from a transplanted xenograft model. Even the best orthotopic model lacks coevolved, species-specific ECM, CAFs and endothelial cells. Despite this wealth of research into αV as a cancer target, very little work has been done to examine how the cancer requirement for αV integrins depends on extracellular microenvironment conditions. In a notable exception, it was recently reported that αVβ3 is recycled by two distinct mechanisms depending on whether experiments were performed in serum-containing or serum-free conditions [191]. This finding is very interesting in light of our findings about the role of redox homeostasis in serum versus serum-free medium on the cellular requirement of αV for adherence. Further study will be required to understand the differential requirement for αV under different microenvironments.

Our screen results, genetic models of αVKO, and αVβ5-targeted biologics suggest that, at least in vitro, apparent efficacy is dictated by the cell culture microenvironment. For example, in the case of solid cancers, our results suggest that, depending on the ECM and metabolic microenvironment, integrin loss could either have little effect or lead to suspension growth (either as spheres or single cell clusters). In fact, this unwanted effect could provide cells the opportunity to thrive in a different niche or microenvironment and even directly result in increased invasion and metastasis. Indeed, loss of integrin αV in the mouse eyelid and conjunctiva has been shown to cause squamous carcinomas [192], suggesting that, at least in some cases, αV may serve a tumour suppressive function. We speculate that in patient tumours, the in situ tissue microenvironment, including ECM and possibly metabolic conditions, may at best mediate resistance to these drugs, and at worst these drugs could have tumour promoting effects. In the case of fibrosis, integrin inhibition may be a good strategy for dissociating fibrotic tissue, as has been observed with targeting integrin αV in certain organs [193], but cells may remain alive and cause other secondary problems if they are not systemically cleared. Identifying biomarkers and/or combination therapies to overcome these challenges will be important for translating the non-leukocyte/platelet integrins as therapeutic targets.

Anchorage-dependent cells require attachment to some surface in order to survive and grow. The cells can be adhered to other cells, ECM, or TCP. Importantly, if an anchorage-dependent cell is not adhered to a substrate, it is unable to function, grow and divide. Adherent growth, on the other hand, simply refers to the growth state that cells prefer to adopt for managing a particular genetic program and/or environmental condition. It is worth noting that our implication of redox homeostasis in non-adherent growth is consistent with published studies [82, 83]. However, to our knowledge, no studies have implicated such conditions as directly mediating cell state determination through integrins. We speculate that the metabolic state of a cell, including NAD+/NADH balance, or potentially NADP+/NADPH, as regulated by glutathione and biotin, may signal to specific integrin heterodimers through ‘inside-out’ signaling to determine a preference for 2D or 3D growth, at least in in vitro culture conditions. Specifically, we have observed that in the absence of αV integrins, β1 integrins mediate the determination of cells to grow adherently or in a sphere state. Given that metabolic conditions impacting redox homeostasis can also modulate this state preference, we propose that in the absence of exogenous pyruvate (or AKB), reduced NAD+ levels may signal to β1 integrins to promote adherence. It remains to be determined whether this is through increased β1 integrin expression, inside-out activation, increased recycling, or increased secretion of ECM.

It is well understood that the distinct integrin heterodimers show high levels of cross-talk and functional redundancy for adhesion to specific substrates. Indeed, both integrin α5β1 and all αV-class integrins serve as receptors for the RGD motif found in fibronectin and are known to exert both specific and redundant functions in vivo [194]. In order to tease apart the distinct roles of β1 and αV-class integrins, pan-integrin-null mouse fibroblasts were generated, reconstituted with either β1 or αV-class integrins [195], and analyzed for their phenotype in adherence to fibronectin. This system revealed that α5β1 integrins accomplish the force generation and αV-class integrins mediate structural adaptations to the force (i.e. large FAs), which together cooperate to sense rigidity and generate full traction forces on fibronectin [195]. This mouse fibroblast integrin reconstitution model was recently used to demonstrate that αV integrins exert dual roles on α5β1 integrins to strengthen adhesion to a fibronectin substrate [196]. In this model, single-cell force spectroscopy was used to determine the strength of adhesion in fibroblasts expressing the various integrins. Interestingly, αV integrins outcompeted α5β1 integrins for adhesion and, once engaged, signaled to α5β1 integrins to establish additional fibronectin contacts. This points to a role for initial dominance by αV integrins leading to cooperative engagement for full strength adhesion. This fits well with my observations in various models of human αVKO cell lines where αV-class integrins are required for adherence under normal conditions. In the absence of αV integrins to establish initial adherence, it appears that β1 integrins are unable to promote adhesion under normal tissue culture conditions. However, when exogenous ECM is present, the requirement for αV integrins for adhesion is superseded. Unexpectedly, I uncovered a role for cellular redox state in determining this requirement for integrin αV.
Specifically, in the absence of pyruvate or AKB, cells switch to an adherent state in a β1 integrin dependent manner. This suggests that redox stress is sufficient to mitigate the requirement of αV integrins for initial adherence. Based on the short timescale over which this occurs, I hypothesize that redox stress signals to activate β1 integrins through inside-out signaling and/or promotes clustering and strong adhesion complexes even without αV integrins to establish traditional FAs. Furthermore, under serum-free defined media conditions, cells require GSH and biotin to grow as spheres. In the absence of either of these metabolites, cells become adherent to collagen. As α5β1 is not a receptor for collagen, this suggests that other β1-pairing α integrins also crosstalk with αV-class integrins. It is also possible that redox stress signals to other cell adhesion molecules, in particular the DDR family of collagen receptors. This will need to be further investigated by subsequent studies.

To our knowledge, our αVβ1KO cells represent the first human near-pan-integrin knockout cells. In combination with our αVKO, β1KO and αVβ5KO cells, this human cell line model will allow us and others to address a variety of pertinent questions, both as to how various integrins crosstalk and how metabolism and various ECM impact cancer proliferation and growth state preference. Of particular interest are the mechanisms by which redox stress may change cell surface expression to trigger cell growth state changes. The removal of αV integrins and our extensive characterization of the resulting growth state preferences provide a new model system to study β1 integrin activation. Since αV integrins are known to outcompete β1-class integrins in fibronectin adhesion [196], I suggest that our panel of multiple αVKO cancer and normal cell lines will allow us to screen for and uncover novel regulators of β1 integrin activation. One way we are pursuing this is through a genome-scale screen for suppressors of the αVKO sphere phenotype. Specifically, a population of genetic KOs will be cultured and sequentially enriched for genes whose KO causes cells to grow adherently, even in permissive redox conditions. I hypothesize that this screen approach will yield known negative regulators of β1 integrin activation, because their loss will allow activation and thus adherence, as well as positive regulators of anchorage-independent growth, including genes that enable supporting metabolic adaptations.

3.7 Conclusion

Overall, our data support a model where cell state preference is governed by an interplay of three factors: integrin heterodimer expression, ECM availability and metabolic conditions. Our data suggest that cells have extensive capacity to stimulate integrin backup mechanisms and grow in different cell states, which may help explain some of the difficulties in translating integrin targets for cancer, fibrosis and other indications. Our panel of various integrin KOs also provides valuable tools for hypothesis generation experiments pertaining to integrin crosstalk and regulation of integrin activation in human cancer and normal cell lines.

4 Chapter 4: Discussion and Conclusions

When I began my PhD, the landscape of biomedical research was very different than it is today. Over the past six years, the cancer research field has been largely overhauled by disruptive technological innovations including CRISPR-Cas9 mediated gene editing, single-cell genomic technologies and the decreasing cost of sequencing. In this concluding chapter, I will revisit some of my key findings, discuss the advantages and disadvantages of my approaches, and propose future project directions.

As the understanding of CSCs and CICs has evolved, it is more apparent than ever that markers for these clinically relevant cell populations are insufficient to describe functional ITH. Current approaches to finding new markers still rely on animal immune systems. For example the group that also discovered integrin α7 as a marker for GBM stem cells screened thousands of hybridomas to find one specific clone [113]. My CellectAb methodology is a critical update to this historical strategy for antibody development and target discovery relying, as it did, on naïve animal immunization, hybridoma generation and characterization. CD133, and indeed most cluster of differentiation (CD) molecules, were discovered using the traditional approach [90]. While hybridoma technology has been highly effective, my CellectAb approach has several distinct advantages. First, it is rapid and cheap; within a week antibodies are generated against cells of interest and antibody production can be scaled up within one month. This is rapid and reduces the work load significantly compared to a 30 week timeline from immunization to scaled production from a hybridoma [88]. Second, as the antibody framework is already fully human in sequence, no humanization is required for translation into pre-clinical and clinical models. The third advantage is that in vitro selection and synthetic repertoires allow for production of species cross-reactive antibodies. Species cross reactivity may be a desirable antibody trait for various reasons, including that translational medicine toxicology studies in model organisms require cross reactivity. However, when a mouse host is immunized with a human cell surface protein, it is virtually impossible to obtain an antibody that cross reacts with the mouse version of this protein. 
This is because, unless the mouse is a knockout for the protein of interest, any natural immune repertoire against self-antigen should be destroyed through clonal deletion during immune system development. This means any in vivo preclinical work in mice would be unable to assess on-target, off-tumour toxicity. For highly conserved target proteins, this represents a large loss of epitope space, and may preclude targeting highly evolutionarily conserved functional epitopes, such as a ligand binding site. The CellectAb fully synthetic approach is not limited by clonal deletion and could be used to target any antigen, even a highly conserved surface protein. Using knockout mice to generate monoclonal antibodies has been successful, in particular to develop an antibody against the ligand binding domain of β6 [108]. However, immunization of knockout mice is not possible for targeting proteins with essential functions, and is impractical for target discovery. The final advantage of the fully recombinant approach is that it yields a fully sequenced, completely modular antibody framework. This allows for simplified cloning and expression of multiple modalities including Fab, IgG and bi-specifics, including bi-specific T-cell engagers (BiTEs) and chimeric antigen receptor T-cells (CAR-Ts). This is especially relevant in the current cancer research climate where immunotherapy has emerged as arguably the most promising weapon in the war against cancer. In short, BiTEs are bi-specific antibody molecules where one antibody arm binds to the tumour-associated antigen (TAA), and the second arm binds to CD3 on a T-cell. This bridging activates the T-cell to kill the TAA-expressing tumour cell. In contrast, CAR-T cells are T-cells manipulated ex vivo to target a specific TAA and thus to kill tumour cells upon reinjection. TAAs which are expressed on tumours and not on healthy tissues make the best targets for such modalities. Of my antibodies, AN03 against integrin β6 likely represents the most promising candidate for BiTE or CAR-T therapy (as also suggested by Whilding in 2016, [123]), because integrin β6 is expressed on CICs and generally not expressed in healthy tissue. While immunotherapy models were beyond the scope of my project, I have been collaborating with Dr. Hunsang Lee in Dr. Mikko Taipale’s lab to attach PE40 toxin to AN03 and to test whether toxin conjugation could also be a viable strategy to target β6-expressing CICs.

In my CellectAb-based study, the time it took to FACS sort CICs was the limiting factor for achieving the cell numbers necessary for antibody selection. A drawback of this was that I was only able to use a single marker (AC133) to purify the cell population of interest. An alternative, faster approach not limited by cell number would have been to use AC133-conjugated beads to pull down CD133 positive cells. However, in the POP92 model system, most cells express at least a low level of CD133, and a simple bead pull down would not have given the resolution necessary to isolate the top 10% of CD133 expressing CICs. Thus a bead pull down would have decreased the purity of CICs to an unacceptably low level.

Going forward, in order to apply this to very rare (~1% or lower) cell populations using multiple markers, a technological advance will be necessary. Although not available at the time I undertook my project, emerging technologies may allow for better, faster isolation of rare subpopulations amenable to the CellectAb workflow. For example, Dr. Shana Kelley’s lab here at the University of Toronto recently developed a cutting-edge microfluidics-based technology using magnetic nanoparticles to rank and rapidly isolate very rare cell populations. Dr. Kelley and colleagues use a chip-based device to profile circulating tumour cells, present at parts per billion in whole blood, as described in their recent Nature Nanotechnology article [197]. This technology can be adapted to retrieve live cells, and anecdotally can profile billions of cells and retrieve millions of high-purity cells from a rare population of interest in just a few hours. I anticipate this technology being used to isolate subpopulations of interest, including circulating tumour cells, CICs, and various blood, solid tissue or embryonic/induced pluripotent stem cell lineages, based on complex marker expression. This would reduce the barrier to achieving reasonable cell quantities for very rare cells, as needed for applications including CellectAb.

Before CRISPR, or “BC” as we in the Moffat lab refer to this time, RNAi was the best functional genetic tool we had to perform genome-wide pooled screens in human cells. The inherent noisiness and lack of resolution in RNAi screens meant we could not easily discern subtle growth defects of many genes. Application of the CRISPR-Cas9 system to human cells for gene editing allowed all of us to think bigger than we ever had before. After becoming interested in the integrin family of cell adhesion receptors and the interaction they mediate with the ECM (chapter 2), I hypothesized that growth on different ECM would mediate different cell adherence and proliferation pathways, thus dictating certain genetic dependencies. I anticipated that many of these differences would be subtle, and thus without CRISPR-Cas9 technology and haploid cells to generate true functional knockouts, I would not have undertaken the project detailed in chapter 3. The genome-scale CRISPR-Cas9 drop out screens that I performed on regular tissue culture plastic (TCP), collagen I, fibronectin, laminin and the basement membrane mixture, Matrigel, revealed hundreds of genes whose fitness effect (i.e. essentiality) is significantly dictated by ECM composition. This was not unexpected; however, in these lists we were unable to identify many patterns of enrichment in annotated gene ontology (GO) terms. This suggests to me that current GO terms for pathways and biological processes do not include robust gene sets that mediate interaction with the ECM. Future work in the cell-ECM interaction field may, in time, validate additional hits from my screens and lead to a better understanding of how interaction with ECM contributes to cell phenotype and function.

The strength of my screen was in its simplicity; coating plastic dishes with ECM was cost-effective and allowed me to test four major ECM components in parallel. A weakness of the approach was that only one cell line was screened, so neither general effects of ECM on fitness genes nor cell type specific effects could be determined. At the time of this work, the Moffat lab only performed screens in clonal Cas9-expressing lines that demonstrated highly efficient editing. This ensured good screen quality, but theoretically could lead to clonal effects in the results. The cell line model used, human haploid HAP1 cells, was reprogrammed from a chronic myeloid leukemia (CML) cell line to allow for adherent growth [198]. In our lab, HAP1 cells are a workhorse for a project aimed at creating a map of all genetic interactions in human cells. For this project, many HAP1 derivative lines have been generated to probe for genetic interactions with genes of interest. For example, to better understand the metabolic wiring of a human cell, HAP1 knockout cells for various key metabolic genes have been screened. This project has generated a great deal of data and information about HAP1 cells, and performing my screens in this model allows for additional comparisons. Thus, while it is possible that HAP1’s previous growth phenotype may have some lingering effect on genes mediating interaction with the ECM, the benefit of using a well-defined, haploid system that we can compare across genotypes outweighed the drawbacks. Indeed, the top hit, integrin αV, has thus far validated in every cell line we have tried, demonstrating that HAP1 is a great model to study cell-ECM interactions.

Going forward, there are many questions still to address as to how the loss of integrin αV affects cells. A large open question is which changes in signaling pathway activation and dependency occur when αV is absent, and when cells transition from adherent to sphere growth phenotypes. We know that the adherent to sphere transition requires supportive redox conditions, but we do not yet understand the mechanism for this. A two-pronged approach will be taken to answer these crucial questions: 1) hypothesis-driven experimentation into growth and metabolic signaling pathways, and 2) unbiased functional genetic screening in αVKO cells under various ECM and metabolic conditions. Specifically, given that activation of multiple signaling pathways, including MAPK, ERK and PI3K-AKT, is known to be required for anchorage-independent growth, we hypothesize they may be required for our spheroid growth phenotype in αVKO cells. As such, we are in the process of testing a panel of kinase inhibitors for their effects on the spheroid growth phenotype. We are also performing genome-wide CRISPR screens on αVKO cells cultured as spheres versus adherently on Matrigel in order to delineate specific fitness genes required in these cell growth states and, when compared to wildtype screens, to identify synthetic lethal hits that could be co-targeted with αV inhibitors. Further, we are performing a positive selection screen in αVKOs cultured on TCP in rich medium and selecting for double KOs that promote adherence/block sphere formation by These approaches will allow us to uncover genetic regulators of cell state determination and β1 integrin activation/engagement for adherence in the absence of integrin αV. The finding that I could knock out both integrin αV and β1, and that cells were still viable and proliferative without these genes, was one of my more surprising observations. We will also be performing CRISPR screens on this fully suspension line in order to define and characterize genetic dependencies (i.e. fitness genes) for all three described phenotypes: adherent, sphere and suspension.

With the very recent publication of an additional 342 CRISPR-Cas9 drop out screens across a wide panel of cancer cell lines, additional computational analyses are also being performed. These screens were mostly performed in adherent cell lines and thus have the potential to add a lot of power to our analysis of adherence-related fitness genes. Additionally, in this large data set I have noticed two classes of adherent cell lines with respect to αV requirement: in about half of all lines, αV is a strong fitness gene, and thus guides against αV drop out during screening. In the other half, αV has very little role in fitness. The next question is what distinguishes integrin αV-dependent lines from independent lines. I hypothesize that these cells may suppress αV dependency for adherence for one or more of four possible reasons: 1) αV-independent cells may secrete and deposit high levels of their own ECM, and thus are not washed away during screening, 2) screening conditions lacked pyruvate and thus αVKOs grew adherently, 3) αV-independent cell lines rely on other integrins or cell adhesion molecules for adherence, or finally 4) αV-dependent and -independent cells are programmed differently in some way to respond to the above conditions. Addressing these questions and looking for integrin αV co-dependencies and anti-dependencies may give clues that will help define how loss of αV changes the way cells relate to their microenvironment. This work may have impact in the fields of cancer, fibrosis and developmental diseases such as collagenopathies.

In chapter 2, I developed a methodology that enabled rapid generation of subpopulation-specific antibodies coupled to the discovery of novel targets. On applying this methodology to a colorectal CIC model, I developed three antibodies against these cells, two of which are specific for distinct integrin subunits, α7 and β6. In chapter 3, I identified genetic dependencies regulated by interaction with ECM. The key finding was a dependency on ITGAV (encoding integrin αV), which is masked by ECM. Further characterization revealed that ablation of αV drastically changes the way cells interact with their microenvironment. In this model, a combination of metabolic and ECM conditions dictates the growth state of cells. In conclusion, both projects undertaken for this thesis highlighted a critical role for various integrins in mediating cancer cell interaction with the microenvironment.

References

1. Stewart, B.W., et al., World cancer report 2014. 2014, Lyon, France Geneva, Switzerland: International Agency for Research on Cancer WHO Press. xiv, 630 pages.

2. Virchow, R., Die krankhaften Geschwülste. Onkologie, 1863. Vol II(Pt 1).

3. Sell, S., On the stem cell origin of cancer. Am J Pathol, 2010. 176(6): p. 2584-494.

4. Cohnheim, J., Congenitales, quergestreiftes Muskelsarkom der Nieren. Virchows Arch, 1875: p. 65:64.

5. Rippert, H., Geschwulstelehre fur Aerzte und Studierende. Bonn. 1904.

6. Paget, J., Lectures on Surgical Pathology. 1853, Philadelphia: Lindsay & Blakiston.

7. Bainbridge, W.S., The cancer problem. 1914, New York: The Macmillan Company. 3 p.

8. Reya, T., et al., Stem cells, cancer, and cancer stem cells. Nature, 2001. 414(6859): p. 105-111.

9. Jones, P.A. and S.B. Baylin, The fundamental role of epigenetic events in cancer. Nat Rev Genet, 2002. 3(6): p. 415-28.

10. Sharma, S.V., et al., A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell, 2010. 141(1): p. 69-80.

11. Mazor, T., et al., Intratumoral Heterogeneity of the Epigenome. Cancer Cell, 2016. 29(4): p. 440-451.

12. Frantz, C., K.M. Stewart, and V.M. Weaver, The extracellular matrix at a glance. J Cell Sci, 2010. 123(Pt 24): p. 4195-200.

13. Lewis, C.E. and J.W. Pollard, Distinct role of macrophages in different tumor microenvironments. Cancer Res, 2006. 66(2): p. 605-12.

14. Lanca, T. and B. Silva-Santos, The split nature of tumor-infiltrating leukocytes: Implications for cancer surveillance and immunotherapy. Oncoimmunology, 2012. 1(5): p. 717-725.

15. Kalluri, R. and M. Zeisberg, Fibroblasts in cancer. Nat Rev Cancer, 2006. 6(5): p. 392-401.

16. Pouyssegur, J., F. Dayan, and N.M. Mazure, Hypoxia signalling in cancer and approaches to enforce tumour regression. Nature, 2006. 441(7092): p. 437-43.

17. Gatenby, R.A. and R.J. Gillies, Why do cancers have high aerobic glycolysis? Nat Rev Cancer, 2004. 4(11): p. 891-9.

18. Carmeliet, P. and R.K. Jain, Angiogenesis in cancer and other diseases. Nature, 2000. 407(6801): p. 249-57.

19. Nowell, P.C., The clonal evolution of tumor cell populations. Science, 1976. 194(4260): p. 23-8.

20. Fearon, E.R., S.R. Hamilton, and B. Vogelstein, Clonal analysis of human colorectal tumors. Science, 1987. 238(4824): p. 193-7.

21. Vogelstein, B., et al., Genetic alterations during colorectal-tumor development. N Engl J Med, 1988. 319(9): p. 525-32.

22. Gerlinger, M., et al., Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med, 2012. 366(10): p. 883-892.

23. Morrissy, A.S., et al., Spatial heterogeneity in medulloblastoma. Nat Genet, 2017. 49(5): p. 780-788.

24. Linch, M., et al., Intratumoural evolutionary landscape of high-risk prostate cancer: the PROGENY study of genomic and immune parameters. Ann Oncol, 2017. 28(10): p. 2472-2480.

25. Jamal-Hanjani, M., et al., Tracking the Evolution of Non-Small-Cell Lung Cancer. N Engl J Med, 2017. 376(22): p. 2109-2121.

26. Andor, N., et al., Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nat Med, 2016. 22(1): p. 105-13.

27. Morris, L.G., et al., Pan-cancer analysis of intratumor heterogeneity as a prognostic determinant of survival. Oncotarget, 2016. 7(9): p. 10051-63.

28. Hanahan, D. and R.A. Weinberg, Hallmarks of cancer: the next generation. Cell, 2011. 144(5): p. 646-74.

29. McGranahan, N., et al., Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci Transl Med, 2015. 7(283): p. 283ra54.

30. Huang, L. and L. Fu, Mechanisms of resistance to EGFR tyrosine kinase inhibitors. Acta Pharm Sin B, 2015. 5(5): p. 390-401.

31. Singh, S.K., et al., Identification of a cancer stem cell in human brain tumors. Cancer Res, 2003. 63(18): p. 5821-5828.

32. O'Brien, C.A., et al., ID1 and ID3 regulate the self-renewal capacity of human colon cancer-initiating cells through p21. Cancer Cell, 2012. 21(6): p. 777-92.

33. International Stem Cell Initiative Consortium, et al., Comparison of defined culture systems for feeder cell free propagation of human embryonic stem cells. In Vitro Cell Dev Biol Anim, 2010. 46(3-4): p. 247-58.

34. Medema, J.P., Cancer stem cells: the challenges ahead. Nat Cell Biol, 2013. 15(4): p. 338-44.

35. Al-Hajj, M., et al., Prospective identification of tumorigenic breast cancer cells. Proc Natl Acad Sci U S A, 2003. 100(7): p. 3983-8.

36. Huang, E.H., et al., Aldehyde dehydrogenase 1 is a marker for normal and malignant human colonic stem cells (SC) and tracks SC overpopulation during colon tumorigenesis. Cancer Res, 2009. 69(8): p. 3382-9.

37. Li, Z., CD133: a stem cell biomarker and beyond. Exp Hematol Oncol, 2013. 2(1): p. 17.

38. Lathia, J.D., et al., Integrin alpha 6 regulates glioblastoma stem cells. Cell Stem Cell, 2010. 6(5): p. 421-32.

39. Sato, T., et al., Single Lgr5 stem cells build crypt villus structures in vitro without a mesenchymal niche. Nature, 2009. 459(7244): p. 262-265.

40. Gumbiner, B.M., Cell adhesion: the molecular basis of tissue architecture and morphogenesis. Cell, 1996. 84(3): p. 345-57.

41. Pokutta, S. and W.I. Weis, Structure and mechanism of cadherins and catenins in cell-cell contacts. Annu Rev Cell Dev Biol, 2007. 23: p. 237-61.

42. Cavey, M. and T. Lecuit, Molecular bases of cell-cell junctions stability and dynamics. Cold Spring Harb Perspect Biol, 2009. 1(5): p. a002998.

43. Pickup, M.W., J.K. Mouw, and V.M. Weaver, The extracellular matrix modulates the hallmarks of cancer. EMBO Rep, 2014. 15(12): p. 1243-53.

44. Rozario, T. and D.W. DeSimone, The extracellular matrix in development and morphogenesis: a dynamic view. Dev Biol, 2010. 341(1): p. 126-40.

45. Bonnans, C., J. Chou, and Z. Werb, Remodelling the extracellular matrix in development and disease. Nat Rev Mol Cell Biol, 2014. 15(12): p. 786-801.

46. Hansen, N.U., et al., The importance of extracellular matrix for cell function and in vivo likeness. Exp Mol Pathol, 2015. 98(2): p. 286-94.

47. Hynes, R.O., The extracellular matrix: not just pretty fibrils. Science, 2009. 326(5957): p. 1216-9.

48. Jobling, R., et al., The collagenopathies: review of clinical phenotypes and molecular correlations. Curr Rheumatol Rep, 2014. 16(1): p. 394.

49. Naba, A., et al., The extracellular matrix: Tools and insights for the "omics" era. Matrix Biol, 2016. 49: p. 10-24.

50. Naba, A., et al., Characterization of the Extracellular Matrix of Normal and Diseased Tissues Using Proteomics. J Proteome Res, 2017. 16(8): p. 3083-3091.

51. Borza, C.M. and A. Pozzi, Discoidin domain receptors in disease. Matrix Biol, 2014. 34: p. 185-92.

52. Barresi, R. and K.P. Campbell, Dystroglycan: from biosynthesis to pathogenesis of human disease. J Cell Sci, 2006. 119(Pt 2): p. 199-207.

53. Cheng, B., et al., Syndecans as Cell Surface Receptors in Cancer Biology. A Focus on their Interaction with PDZ Domain Proteins. Front Pharmacol, 2016. 7: p. 10.

54. Couchman, J.R., Syndecans: proteoglycan regulators of cell-surface microdomains? Nat Rev Mol Cell Biol, 2003. 4(12): p. 926-37.

55. Hynes, R.O., Integrins: bidirectional, allosteric signaling machines. Cell, 2002. 110(6): p. 673-87.

56. Humphries, J.D., A. Byron, and M.J. Humphries, Integrin ligands at a glance. J Cell Sci, 2006. 119(Pt 19): p. 3901-3.

57. Bianconi, D., M. Unseld, and G.W. Prager, Integrins in the Spotlight of Cancer. Int J Mol Sci, 2016. 17(12).

58. Moser, M., et al., The tail of integrins, talin, and kindlins. Science, 2009. 324(5929): p. 895-9.

59. Seguin, L., et al., Integrins and cancer: regulators of cancer stemness, metastasis, and drug resistance. Trends Cell Biol, 2015. 25(4): p. 234-40.

60. Hannigan, G., A.A. Troussard, and S. Dedhar, Integrin-linked kinase: a cancer therapeutic target unique among its ILK. Nat Rev Cancer, 2005. 5(1): p. 51-63.

61. DeMali, K.A., K. Wennerberg, and K. Burridge, Integrin signaling to the actin cytoskeleton. Curr Opin Cell Biol, 2003. 15(5): p. 572-82.

62. Ginsberg, M.H., X. Du, and E.F. Plow, Inside-out integrin signalling. Curr Opin Cell Biol, 1992. 4(5): p. 766-71.

63. Lagarrigue, F., C. Kim, and M.H. Ginsberg, The Rap1-RIAM-talin axis of integrin activation and blood cell function. Blood, 2016. 128(4): p. 479-87.

64. Hynes, R.O., Integrins: versatility, modulation, and signaling in cell adhesion. Cell, 1992. 69(1): p. 11-25.

65. Horton, E.R., et al., Definition of a consensus integrin adhesome and its dynamics during adhesion complex assembly and disassembly. Nat Cell Biol, 2015. 17(12): p. 1577-87.

66. McGowan, K.A. and M.P. Marinkovich, Laminins and human disease. Microsc Res Tech, 2000. 51(3): p. 262-79.

67. Kambham, N., et al., Congenital focal segmental glomerulosclerosis associated with beta4 integrin mutation and epidermolysis bullosa. Am J Kidney Dis, 2000. 36(1): p. 190-6.

68. Has, C., et al., Integrin alpha3 mutations with kidney, lung, and skin disease. N Engl J Med, 2012. 366(16): p. 1508-14.

69. Hayashi, Y.K., et al., Mutations in the integrin alpha7 gene cause congenital myopathy. Nat Genet, 1998. 19(1): p. 94-7.

70. Mayer, U., et al., Absence of integrin alpha 7 causes a novel form of muscular dystrophy. Nat Genet, 1997. 17(3): p. 318-23.

71. Wynn, T.A. and T.R. Ramalingam, Mechanisms of fibrosis: therapeutic translation for fibrotic disease. Nat Med, 2012. 18(7): p. 1028-40.

72. Munger, J.S., et al., The integrin alpha v beta 6 binds and activates latent TGF beta 1: a mechanism for regulating pulmonary inflammation and fibrosis. Cell, 1999. 96(3): p. 319-28.

73. Mosig, R.A., et al., Loss of MMP-2 disrupts skeletal and craniofacial development and results in decreased bone mineralization, joint erosion and defects in osteoblast and osteoclast growth. Hum Mol Genet, 2007. 16(9): p. 1113-23.

74. Tian, X., et al., High-molecular-mass hyaluronan mediates the cancer resistance of the naked mole rat. Nature, 2013. 499(7458): p. 346-9.

75. Burnier, J.V., et al., Type IV collagen-initiated signals provide survival and growth cues required for liver metastasis. Oncogene, 2011. 30(35): p. 3766-83.

76. Jinga, D.C., et al., MMP-9 and MMP-2 gelatinases and TIMP-1 and TIMP-2 inhibitors in breast cancer: correlations with prognostic factors. J Cell Mol Med, 2006. 10(2): p. 499-510.

77. Frisch, S.M. and H. Francis, Disruption of epithelial cell-matrix interactions induces apoptosis. J Cell Biol, 1994. 124(4): p. 619-26.

78. Reginato, M.J., et al., Integrins and EGFR coordinately regulate the pro-apoptotic protein Bim to prevent anoikis. Nat Cell Biol, 2003. 5(8): p. 733-40.

79. Buchheit, C.L., K.J. Weigel, and Z.T. Schafer, Cancer cell survival during detachment from the ECM: multiple barriers to tumour progression. Nat Rev Cancer, 2014. 14(9): p. 632-41.

80. Overholtzer, M., et al., A nonapoptotic cell death process, entosis, that occurs by cell-in-cell invasion. Cell, 2007. 131(5): p. 966-79.

81. Durgan, J. and O. Florey, Cancer cell cannibalism: Multiple triggers emerge for entosis. Biochim Biophys Acta, 2018. 1865(6): p. 831-841.

82. Schafer, Z.T., et al., Antioxidant and oncogene rescue of metabolic defects caused by loss of matrix attachment. Nature, 2009. 461(7260): p. 109-13.

83. Jiang, L., et al., Reductive carboxylation supports redox homeostasis during anchorage-independent growth. Nature, 2016. 532(7598): p. 255-8.

84. Stalnecker, C.A., A.A. Cluntun, and R.A. Cerione, Balancing redox stress: anchorage-independent growth requires reductive carboxylation. Transl Cancer Res, 2016. 5(Suppl 3): p. S433-S437.

85. Martin, S.S. and K. Vuori, Regulation of Bcl-2 proteins during anoikis and amorphosis. Biochim Biophys Acta, 2004. 1692(2-3): p. 145-57.

86. Danen, E.H. and A. Sonnenberg, Integrins in regulation of tissue development and function. J Pathol, 2003. 201(4): p. 632-41.

87. von Behring, E. and S. Kitasato, [The mechanism of diphtheria immunity and tetanus immunity in animals. 1890]. Mol Immunol, 1991. 28(12): p. 1317, 1319-20.

88. Greenfield, E.A., Antibodies: a laboratory manual. Second edition. 2014, Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press. xxi, 847 pages.

89. Miraglia, S., et al., A novel five-transmembrane hematopoietic stem cell antigen: isolation, characterization, and molecular cloning. Blood, 1997. 90(12): p. 5013-21.

90. Yin, A.H., et al., AC133, a novel marker for human hematopoietic stem and progenitor cells. Blood, 1997. 90(12): p. 5002-12.

91. Adams, G.P. and L.M. Weiner, Monoclonal antibody therapy of cancer. Nat Biotechnol, 2005. 23(9): p. 1147-57.

92. Schaedel, O. and Y. Reiter, Antibodies and their fragments as anti-cancer agents. Curr Pharm Des, 2006. 12(3): p. 363-78.

93. Plosker, G.L. and S.J. Keam, Trastuzumab: a review of its use in the management of HER2-positive metastatic and early-stage breast cancer. Drugs, 2006. 66(4): p. 449-75.

94. Yoshida, K., Anti-Prominin-1 antibody having ADCC activity or CDC activity. 2014, Google Patents.

95. Bradbury, A.R., et al., Beyond natural antibodies: the power of in vitro display technologies. Nat Biotechnol, 2011. 29(3): p. 245-54.

96. Shim, H., Synthetic approach to the generation of antibody diversity. BMB Rep, 2015. 48(9): p. 489-94.

97. Miersch, S. and S.S. Sidhu, Synthetic antibodies: concepts, potential and practical considerations. Methods, 2012. 57(4): p. 486-98.

98. Sidhu, S., et al., Molecular Display Method. 2013, Google Patents.

99. Hermann, P.C., et al., Distinct populations of cancer stem cells determine tumor growth and metastatic activity in human pancreatic cancer. Cell Stem Cell, 2007. 1(3): p. 313-23.

100. Visvader, J.E. and G.J. Lindeman, Cancer stem cells in solid tumours: accumulating evidence and unresolved questions. Nat Rev Cancer, 2008. 8(10): p. 755-68.

101. Jones, R.J. and S.A. Armstrong, Cancer stem cells in hematopoietic malignancies. Biol Blood Marrow Transplant, 2008. 14(1 Suppl 1): p. 12-6.

102. Pribluda, A., C.C. de la Cruz, and E.L. Jackson, Intratumoral Heterogeneity: From Diversity Comes Resistance. Clin Cancer Res, 2015. 21(13): p. 2916-23.

103. Turtoi, A., A. Blomme, and V. Castronovo, Intratumoral heterogeneity and consequences for targeted therapies. Bull Cancer, 2015. 102(1): p. 17-23.

104. O'Brien, C.A., et al., A human colon cancer cell capable of initiating tumour growth in immunodeficient mice. Nature, 2007. 445(7123): p. 106-10.

105. Ren, F., W.Q. Sheng, and X. Du, CD133: a cancer stem cells marker, is used in colorectal cancers. World J Gastroenterol, 2013. 19(17): p. 2603-11.

106. Notta, F., et al., Distinct routes of lineage development reshape the human blood hierarchy across ontogeny. Science, 2016. 351(6269): p. aab2116.

107. Choo, S.Y., The HLA system: genetics, immunology, clinical testing, and clinical implications. Yonsei Med J, 2007. 48(1): p. 11-23.

108. Huang, X., et al., The integrin alphavbeta6 is critical for keratinocyte migration on both its known ligand, fibronectin, and on vitronectin. J Cell Sci, 1998. 111(Pt 15): p. 2189-95.

109. Gettner, S.N., C. Kenyon, and L.F. Reichardt, Characterization of beta pat-3 heterodimers, a family of essential integrin receptors in C. elegans. J Cell Biol, 1995. 129(4): p. 1127-41.

110. Pasut, A., P. Oleynik, and M.A. Rudnicki, Isolation of muscle stem cells by fluorescence activated cell sorting cytometry. Methods Mol Biol, 2012. 798: p. 53-64.

111. Lathia, J.D., et al., Laminin alpha 2 enables glioblastoma stem cell growth. Ann Neurol, 2012. 72(5): p. 766-78.

112. Rodin, S., et al., Long-term self-renewal of human pluripotent stem cells on human recombinant laminin-511. Nat Biotechnol, 2010. 28(6): p. 611-5.

113. Haas, T.L., et al., Integrin alpha7 Is a Functional Marker and Potential Therapeutic Target in Glioblastoma. Cell Stem Cell, 2017. 21(1): p. 35-50 e9.

114. Cantor, J.M. and M.H. Ginsberg, CD98 at the crossroads of adaptive immunity and cancer. J Cell Sci, 2012. 125(Pt 6): p. 1373-82.

115. Cai, S., et al., CD98 modulates integrin beta1 function in polarized epithelial cells. J Cell Sci, 2005. 118(Pt 5): p. 889-99.

116. Han, Y.C., et al., Interaction of integrin-linked kinase and miniature chromosome maintenance 7-mediating integrin alpha7 induced cell growth suppression. Cancer Res, 2010. 70(11): p. 4375-84.

117. Longmate, W. and C.M. DiPersio, Beyond adhesion: emerging roles for integrins in control of the tumor microenvironment. F1000Res, 2017. 6: p. 1612.

118. Moore, K.M., et al., Therapeutic targeting of integrin alphavbeta6 in breast cancer. J Natl Cancer Inst, 2014. 106(8).

119. Yang, G.Y., et al., Integrin alpha v beta 6 mediates the potential for colon cancer cells to colonize in and metastasize to the liver. Cancer Sci, 2008. 99(5): p. 879-87.

120. Dang, D. and D.M. Ramos, Identification of alphavbeta6-positive stem cells in oral squamous cell carcinoma. Anticancer Res, 2009. 29(6): p. 2043-9.

121. Nakao, A., et al., TGF-beta receptor-mediated signalling through Smad2, Smad3 and Smad4. EMBO J, 1997. 16(17): p. 5353-62.

122. Breuss, J.M., et al., Expression of the beta 6 integrin subunit in development, neoplasia and tissue repair suggests a role in epithelial remodeling. J Cell Sci, 1995. 108(Pt 6): p. 2241-51.

123. Whilding, L.M., S. Vallath, and J. Maher, The integrin alphavbeta6: a novel target for CAR T-cell immunotherapy? Biochem Soc Trans, 2016. 44(2): p. 349-55.

124. Moffat, J., J.H. Reiling, and D.M. Sabatini, Off-target effects associated with long dsRNAs in Drosophila RNAi screens. Trends Pharmacol Sci, 2007. 28(4): p. 149-51.

125. Echeverri, C.J., et al., Minimizing the risk of reporting false positives in large-scale RNAi screens. Nat Methods, 2006. 3(10): p. 777-9.

126. Aguirre, A.J., et al., Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting. Cancer Discov, 2016. 6(8): p. 914-29.

127. Hart, T., et al., High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell, 2015. 163(6): p. 1515-26.

128. Hart, T., et al., Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens. G3 (Bethesda), 2017. 7(8): p. 2719-2727.

129. Hart, T. and J. Moffat, BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics, 2016. 17: p. 164.

130. Wang, T., et al., Genetic screens in human cells using the CRISPR-Cas9 system. Science, 2014. 343(6166): p. 80-4.

131. Wang, T., et al., Identification and characterization of essential genes in the human genome. Science, 2015. 350(6264): p. 1096-101.

132. Meyers, R.M., et al., Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat Genet, 2017.

133. Patel, S.J., et al., Identification of essential genes for cancer immunotherapy. Nature, 2017. 548(7669): p. 537-542.

134. Wang, T., et al., Gene Essentiality Profiling Reveals Gene Networks and Synthetic Lethal Interactions with Oncogenic Ras. Cell, 2017. 168(5): p. 890-903 e15.

135. Amstein, C.F. and P.A. Hartman, Adaptation of plastic surfaces for tissue culture by glow discharge. J Clin Microbiol, 1975. 2(1): p. 46-54.

136. Hall, J.R., et al., Activated gas plasma surface treatment of polymers for adhesive bonding. J Appl Polym Sci, 1969. 13(10): p. 2085-2096.

137. Evans, M.J. and M.H. Kaufman, Establishment in culture of pluripotential cells from mouse embryos. Nature, 1981. 292(5819): p. 154-6.

138. Martin, G.R., Isolation of a pluripotent cell line from early mouse embryos cultured in medium conditioned by teratocarcinoma stem cells. Proc Natl Acad Sci U S A, 1981. 78(12): p. 7634-8.

139. Reynolds, B.A. and S. Weiss, Generation of neurons and astrocytes from isolated cells of the adult mammalian central nervous system. Science, 1992. 255(5052): p. 1707-10.

140. Hovatta, O., et al., A culture system using human foreskin fibroblasts as feeder cells allows production of human embryonic stem cells. Hum Reprod, 2003. 18(7): p. 1404-1409.

141. Takahashi, K., et al., Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell, 2007. 131(5): p. 861-872.

142. Xu, C., et al., Feeder-free growth of undifferentiated human embryonic stem cells. Nat Biotechnol, 2001. 19(10): p. 971-974.

143. Birsoy, K., et al., Metabolic determinants of cancer cell sensitivity to glucose limitation and biguanides. Nature, 2014. 508(7494): p. 108-12.

144. Favaro, E., et al., Glucose utilization via glycogen phosphorylase sustains proliferation and prevents premature senescence in cancer cells. Cell Metab, 2012. 16(6): p. 751-64.

145. Kamphorst, J.J., et al., Human pancreatic cancer tumors are nutrient poor and tumor cells actively scavenge extracellular protein. Cancer Res, 2015. 75(3): p. 544-53.

146. Schug, Z.T., et al., Acetyl-CoA synthetase 2 promotes acetate utilization and maintains cancer cell growth under metabolic stress. Cancer Cell, 2015. 27(1): p. 57-71.

147. Cantor, J.R., et al., Physiologic Medium Rewires Cellular Metabolism and Reveals Uric Acid as an Endogenous Inhibitor of UMP Synthase. Cell, 2017. 169(2): p. 258-272 e17.

148. Raab-Westphal, S., J.F. Marshall, and S.L. Goodman, Integrins as Therapeutic Targets: Successes and Cancers. Cancers (Basel), 2017. 9(9).

149. Barretina, J., et al., Subtype-specific genomic alterations define new targets for soft-tissue sarcoma therapy. Nat Genet, 2010. 42(8): p. 715-21.

150. Blomen, V.A., et al., Gene essentiality and synthetic lethality in haploid human cells. Science, 2015. 350(6264): p. 1092-6.

151. Tsherniak, A., et al., Defining a Cancer Dependency Map. Cell, 2017. 170(3): p. 564-576 e16.

152. McDonald, E.R., 3rd, et al., Project DRIVE: A Compendium of Cancer Dependencies and Synthetic Lethal Relationships Uncovered by Large-Scale, Deep RNAi Screening. Cell, 2017. 170(3): p. 577-592 e10.

153. Barretina, J., et al., The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 2012. 483(7391): p. 603-7.

154. Garnett, M.J., et al., Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature, 2012. 483(7391): p. 570-5.

155. Iorio, F., et al., A Landscape of Pharmacogenomic Interactions in Cancer. Cell, 2016. 166(3): p. 740-754.

156. Rauscher, B., et al., GenomeCRISPR - a database for high-throughput CRISPR/Cas9 screens. Nucleic Acids Res, 2017. 45(D1): p. D679-D686.

157. Leek, J.T., et al., The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics, 2012. 28(6): p. 882-3.

158. Klipper-Aurbach, Y., et al., Mathematical formulae for the prediction of the residual beta cell function during the first two years of disease in children and adolescents with insulin-dependent diabetes mellitus. Med Hypotheses, 1995. 45(5): p. 486-90.

159. Smyth, G.K., Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol, 2004. 3: p. Article3.

160. Dehairs, J., et al., CRISP-ID: decoding CRISPR mediated indels by Sanger sequencing. Sci Rep, 2016. 6: p. 28973.

161. Fellouse, F.A., et al., High-throughput generation of synthetic antibodies from highly functional minimalist phage-displayed libraries. J Mol Biol, 2007. 373(4): p. 924-40.

162. Gakhal, A.K., et al., Development and characterization of synthetic antibodies binding to the cystic fibrosis conductance regulator. MAbs, 2016. 8(6): p. 1167-76.

163. Haining, A.W., T.J. Lieberthal, and A. Del Rio Hernandez, Talin: a mechanosensitive molecule in health and disease. FASEB J, 2016. 30(6): p. 2073-85.

164. Klapholz, B. and N.H. Brown, Talin - the master of integrin adhesions. J Cell Sci, 2017. 130(15): p. 2435-2446.

165. Harburger, D.S. and D.A. Calderwood, Integrin signalling at a glance. J Cell Sci, 2009. 122(Pt 2): p. 159-63.

166. Lenz, S., et al., The alkali light chains of human smooth and nonmuscle myosins are encoded by a single gene. Tissue-specific expression by alternative splicing pathways. J Biol Chem, 1989. 264(15): p. 9009-15.

167. Cook, J.R., et al., FBXO11/PRMT9, a new protein arginine methyltransferase, symmetrically dimethylates arginine residues. Biochem Biophys Res Commun, 2006. 342(2): p. 472-81.

168. Mazars, R., et al., The THAP-zinc finger protein THAP1 associates with coactivator HCF-1 and O-GlcNAc transferase: a link between DYT6 and DYT3 dystonias. J Biol Chem, 2010. 285(18): p. 13364-71.

169. Suzuki, S., et al., cDNA and amino acid sequences of the cell adhesion protein receptor recognizing vitronectin reveal a transmembrane domain and homologies with other adhesion protein receptors. Proc Natl Acad Sci U S A, 1986. 83(22): p. 8614-8.

170. Marcotte, R., et al., Essential gene profiles in breast, pancreatic, and ovarian cancer cells. Cancer Discov, 2012. 2(2): p. 172-189.

171. Marcotte, R., et al., Functional Genomic Landscape of Human Breast Cancer Drivers, Vulnerabilities, and Resistance. Cell, 2016. 164(1-2): p. 293-309.

172. Sullivan, L.B., et al., Supporting Aspartate Biosynthesis Is an Essential Function of Respiration in Proliferating Cells. Cell, 2015. 162(3): p. 552-63.

173. Michelakis, E.D., L. Webster, and J.R. Mackey, Dichloroacetate (DCA) as a potential metabolic-targeting therapy for cancer. Br J Cancer, 2008. 99(7): p. 989-94.

174. Stacpoole, P.W., The pharmacology of dichloroacetate. Metabolism, 1989. 38(11): p. 1124-44.

175. Sutendra, G. and E.D. Michelakis, Pyruvate dehydrogenase kinase as a novel therapeutic target in oncology. Front Oncol, 2013. 3: p. 38.

176. Kreso, A., et al., Self-renewal as a therapeutic target in human colorectal cancer. Nat Med, 2014. 20(1): p. 29-36.

177. Brewer, G.J., et al., Optimized survival of hippocampal neurons in B27-supplemented Neurobasal, a new serum-free medium combination. J Neurosci Res, 1993. 35(5): p. 567-76.

178. Ata, R. and C.N. Antonescu, Integrins and Cell Metabolism: An Intimate Relationship Impacting Cancer. Int J Mol Sci, 2017. 18(1).

179. Desgrosellier, J.S. and D.A. Cheresh, Integrins in cancer: biological implications and therapeutic opportunities. Nat Rev Cancer, 2010. 10(1): p. 9-22.

180. Ley, K., et al., Integrin-based therapeutics: biological basis, clinical use and new drugs. Nat Rev Drug Discov, 2016. 15(3): p. 173-83.

181. Weiswald, L.B., D. Bellet, and V. Dangles-Marie, Spherical cancer models in tumor biology. Neoplasia, 2015. 17(1): p. 1-15.

182. Yamada, S., K.E. Brown, and K.M. Yamada, Differential mRNA regulation of integrin subunits alpha V, beta 1, beta 3, and beta 5 during mouse embryonic organogenesis. Cell Adhes Commun, 1995. 3(4): p. 311-25.

183. Brooks, P.C., R.A. Clark, and D.A. Cheresh, Requirement of vascular integrin alpha v beta 3 for angiogenesis. Science, 1994. 264(5158): p. 569-71.

184. Brooks, P.C., et al., Integrin alpha v beta 3 antagonists promote tumor regression by inducing apoptosis of angiogenic blood vessels. Cell, 1994. 79(7): p. 1157-64.

185. Trikha, M., et al., CNTO 95, a fully human monoclonal antibody that inhibits alphav integrins, has antitumor and antiangiogenic activity in vivo. Int J Cancer, 2004. 110(3): p. 326-35.

186. Bader, B.L., et al., Extensive vasculogenesis, angiogenesis, and organogenesis precede lethality in mice lacking all alpha v integrins. Cell, 1998. 95(4): p. 507-19.

187. McCarty, J.H., et al., Selective ablation of alphav integrins in the central nervous system leads to cerebral hemorrhage, seizures, axonal degeneration and premature death. Development, 2005. 132(1): p. 165-76.

188. Zhao, Y., et al., Tumor alphavbeta3 integrin is a therapeutic target for breast cancer bone metastases. Cancer Res, 2007. 67(12): p. 5821-30.

189. van der Horst, G., et al., Targeting of alpha(v)-integrins in stem/progenitor cells and supportive microenvironment impairs bone metastasis in human prostate cancer. Neoplasia, 2011. 13(6): p. 516-25.

190. van der Horst, G., et al., Targeting of alpha-v integrins reduces malignancy of bladder carcinoma. PLoS One, 2014. 9(9): p. e108464.

191. Cui, Y., et al., Serum free or not: Two distinct recycling mechanisms mediated by alpha v beta 3 integrin. Curr Pharm Biotechnol, 2017.

192. McCarty, J.H., et al., Genetic ablation of alphav integrins in epithelial cells of the eyelid skin and conjunctiva leads to squamous cell carcinoma. Am J Pathol, 2008. 172(6): p. 1740-7.

193. Henderson, N.C., et al., Targeting of alphav integrin identifies a core molecular pathway that regulates fibrosis in several organs. Nat Med, 2013. 19(12): p. 1617-24.

194. Yang, J.T., et al., Overlapping and independent functions of fibronectin receptor integrins in early mesodermal development. Dev Biol, 1999. 215(2): p. 264-77.

195. Schiller, H.B., et al., beta1- and alphav-class integrins cooperate to regulate myosin II during rigidity sensing of fibronectin-based microenvironments. Nat Cell Biol, 2013. 15(6): p. 625-36.

196. Bharadwaj, M., et al., alphaV-class integrins exert dual roles on alpha5beta1 integrins to strengthen adhesion to fibronectin. Nat Commun, 2017. 8: p. 14348.

197. Poudineh, M., et al., Tracking the dynamics of circulating tumour cell phenotypes using nanoparticle-mediated magnetic ranking. Nat Nanotechnol, 2017. 12(3): p. 274-281.

198. Carette, J.E., et al., Ebola virus entry requires the cholesterol transporter Niemann-Pick C1. Nature, 2011. 477(7364): p. 340-3.