
The Pennsylvania State University

The Graduate School

Department of Molecular Medicine

IDENTIFICATION OF SOMATIC MUTATIONS IN LGL LEUKEMIA THROUGH

WHOLE GENOME SEQUENCING AND CORRELATION OF STAT3 Y640F

MUTATION WITH TREATMENT RESPONSE TO METHOTREXATE.

A Dissertation in

Molecular Medicine

by

Thomas Lynn Olson

© 2013 Thomas Lynn Olson

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

August 2013

The dissertation of Thomas L. Olson was reviewed and approved* by the following:

Thomas P. Loughran Jr.
Physician Director of Penn State Cancer Institute
Professor of Medicine
Dissertation Advisor

Rosalyn B. Irby
Associate Professor of Medicine
Chair of Committee

Richard J. Courtney
Professor Emeritus Chair and Distinguished Educator of Microbiology and Immunology

Thomas E. Spratt
Associate Professor of Biochemistry and Microbiology
Director, IBIOS-Chemical Biology Option
Director, Biochemistry and Molecular Biology Program

Bruce A. Stanley
Director, Section of Research Resources
Director, Proteomics and Mass Spectrometry Facility
Co-Director, Translational Technologies, Penn State Clinical and Translational Sciences Institute
Director, Basic Science Cores, Penn State Hershey Cancer Institute
Special Member

Charles Lang
Vice Chair, Distinguished Professor of Cellular and Molecular Physiology, and Surgery
Director of Molecular Medicine Graduate Program

*Signatures are on file in the Graduate School


ABSTRACT

The rare disorder large granular lymphocyte (LGL) leukemia is characterized by a clonal expansion of cytotoxic cells. Normal LGL are critical to the removal of virally infected or tumor cells from the body. These cells normally expand when there is an immune challenge and then rapidly undergo apoptosis when the antigen is cleared, but leukemic LGL are resistant to apoptosis. Serological evidence indicates that LGL leukemia may be driven by chronic infection with an unknown retrovirus. Patients with LGL leukemia frequently experience autoimmune conditions such as rheumatoid arthritis and also experience cytopenias requiring medical intervention. The study of LGL leukemia is therefore important both from a patient standpoint and as a model of normal LGL. We have recently discovered and characterized multiple somatic mutations in exons 20 and 21 of STAT3 from exome sequencing of an LGL leukemia patient and confirmed these mutations to be present in both natural killer and T cell leukemias. A small percentage of patients was also found to have mutations in STAT5B, with some associated with an aggressive phenotype. Concurrently, we have undertaken whole genome sequencing to determine what other mutations may contribute to LGL leukemia and to expand our knowledge of STAT3 activation. Given the antigen driven nature of LGL leukemia, it is unclear what mutations are necessary for the lack of apoptosis in these cells and whether these will resemble those seen in other leukemias.

The current treatment of choice for LGL leukemia is immunosuppressive methotrexate.

Firstly, we present the first large prospective trial of immunosuppressive therapy in LGL leukemia. In this trial we identify a correlation between a particular Y640F mutation in STAT3 and a favorable response to first-line treatment with methotrexate. We measure and report 27 serum cytokines collected for this trial, none of which were a priori predictive of treatment response. Through microarray analysis we identified an expression signature indicative of response. Interestingly, this gene signature does not appear to be driven by STAT3 mutation in all samples. We propose that there are other methods of STAT activation in these samples and advance a model of how the Y640F mutation enforces response.

Secondly, we collaboratively sequenced six genomes, as paired leukemic/saliva samples, from three LGL patients at a high rate of coverage. Two patient genomes contain a direct STAT3 mutation. In the third patient we observe a mutation in the 3' UTR of IL6R that could activate JAK-STAT signaling if it is shown to regulate translation. Mutations in the histone modifiers MLL2, MLL3 and NCOR1 are supported in the sequence reads of all three genomes. Alterations in this pathway would be a new observation for LGL leukemia. We identify and catalog numerous other mutations which may give insight into LGL biology that could allow us to better understand how symptoms arise and differ between patients and why current treatments may fail.

Lastly, we present data identifying important cis-regulatory modules in LGL leukemia gathered through ChIP-Seq of genomic regions occupied by the master regulator TBET. We report evidence of binding in the promoter region of numerous constituents of lytic granules which may indicate these are important downstream effectors of TBET. Overlap between the patients used for genome work above and ChIP-Seq is allowing us to determine whether mutations can have an effect on TBET binding and subsequent transcription of nearby genes. As we collect the footprints of additional factors, this holds the promise of elucidating the potential impact of the tens of thousands of non-coding mutations observed in these samples.

In summary, we present the results of large collaborative efforts to sequence mutations in LGL leukemia and determine their effect on LGL biology. We advance the knowledge of STAT3 mutations in LGL biology while identifying and stratifying other mutations for the future study of their potential impacts. Bioinformatic methods of analysis were focused on publicly available resources such as Galaxy for which new tools, file formats, and workflows were developed. This allows easy reproduction of methods by other groups and facilitates visualization and future analysis by our own.


TABLE OF CONTENTS

List of Figures ...... vii

List of Tables ...... ix

Abbreviations ...... x

Acknowledgements ...... xiii

Chapter 1 Introduction to LGL Leukemia ...... 1

Clinical Entity ...... 1
Diagnosis and Symptoms ...... 1
Treatment ...... 3
Natural Born Killers ...... 3
LGL Are Resistant to Apoptosis ...... 4
LGL Are Terminal Effector Memory Cells ...... 5
Viral Antigen ...... 6
Survival Network of LGL Leukemia ...... 8
STAT3 Dysregulation and Somatic Mutation ...... 10

Chapter 2 STAT3 Y640F Mutation Predicts Response to Immunosuppressive Therapy of LGL Leukemia: Results of a Prospective Multicenter Phase II Study of Initial Treatment with Methotrexate by the Eastern Cooperative Oncology Group (E5998) ..... 14

Introduction ...... 14
Methods ...... 15
Eligibility Criteria ...... 15
Objectives ...... 16
Study Design ...... 16
Response Criteria ...... 17
Statistical Methods ...... 18
Array Data ...... 18
Cytokine Studies ...... 20
STAT3 Mutational Analysis ...... 21
STAT3 Mutagenesis ...... 22
STAT3 Reporter Assay ...... 22
Western Blot ...... 22
Results ...... 23
Demographics ...... 23
Treatment Compliance/Toxicity ...... 25
Response ...... 26
Survival and Progression-Free Survival ...... 26
Laboratory Correlates ...... 28
Discussion ...... 40
Future directions ...... 43


Chapter 3 Whole Genome Sequencing of Paired Normal/Leukemic Samples from Three LGL Leukemia Patients Reveals Additional Mutational Basis for Aberrant Survival of LGL...... 45

Introduction ...... 45
Methods ...... 47
Sample Selection and Preparation ...... 47
Sequencing and Alignment ...... 48
Variant Filtering and Identification of Transcripts Affected by Mutation ...... 49
Results ...... 52
Sample Collection ...... 52
Demographics ...... 52
Sequencing and Alignment ...... 56
Identification of Somatic Variants Affecting Transcripts ...... 58
Presence of Mutation Potentially Affecting Activation of JAK-STAT Signaling Pathway ...... 59
Network and Pathway Analysis of Potential Mutations in Known ...... 60
Discussion ...... 70
Future Directions ...... 72

Chapter 4 Discovering the Function of the TBET in Terminal Effector Memory Cells Via ChIP-Seq of Large Granular Lymphocytes...... 74

Introduction ...... 74
Methods ...... 77
Patient Selection and Sample Preparation ...... 77
Chromatin Immunoprecipitation ...... 77
GREAT ...... 81
Motif Discovery ...... 81
Manipulation of Intervals ...... 81
Results ...... 82
Regulatory Regions Bound by TBET ...... 84
Pathway Analysis ...... 85
Identification of Motifs Found in Occupied Segments ...... 90
Occupied Segments Unique to Genome Patient Two Containing Somatic Variants ...... 93
Discussion ...... 95
Future Directions ...... 98

Chapter 5 Concluding Remarks ...... 100

Appendix ...... 105

References ...... 139


LIST OF FIGURES

Figure 1-1: Features of the Four Basic Subsets of CD8+ Cells...... 6

Figure 1-2: The T-LGL Survival Signaling Network...... 9

Figure 1-3: Ribbon Diagram of Mutations on a Dimerized STAT3 Molecule...... 12

Figure 1-4: Expression Profiles of LGL with Mutant and Wildtype STAT3...... 13

Figure 2-1: Overall Survival (n=55)...... 29

Figure 2-2: Progression-Free Survival (n=55)...... 30

Figure 2-3: Heatmap of Baseline Sample Gene Expression in MTX Responders Versus Nonresponders...... 35

Figure 2-4: Evidence for Increased Transcriptional Activity of the Y640F STAT3 Mutation...... 39

Figure 2-5: Model of Methotrexate Responsiveness and STAT3 Mediated Signaling in LGL leukemia...... 41

Figure 3-1: Example of SNV Data, the STAT3 D661 Mutations...... 52

Figure 3-2: Coverage Across TCR-Beta Gene Locus in Genome Patient Two...... 57

Figure 3-3: Targetscan Prediction of miRNA Binding Sites in One Region of the IL6R 3' UTR...... 60

Figure 3-4: IPA De Novo Network of Genes Mutated in LGL Patient Genomes...... 65

Figure 3-5: Altered Proteins Involved in the Regulation of Nuclear Factor kappa B (NFkB)...... 67

Figure 3-6: Histone Modifying Proteins Altered in LGL...... 67

Figure 3-7: Fibroblast Growth Factor Signaling Mutations...... 68

Figure 4-1: Initial Assessment of ChIP-Seq Datasets from Three Patients ...... 83

Figure 4-2: Enrichment of TBET Peaks (from TEMRA) in Active Chromatin States...... 84

Figure 4-3: Coverage of TCR Signaling Network by Genes in Proximity to TBET bound regions...... 88

Figure 4-4: Mediated Cytotoxicity Pathway Genes Proximal to TBET Occupied Segments in Leukemic LGL ...... 89

Figure 4-5: Motifs Enriched in Sites Occupied by TBET...... 91


Figure 5-1: Model of Proposed Future Work ...... 104

Figure Appendix-1: KEGG Annotated Pathway of Natural Killer Cell Mediated Cytotoxicity...... 138


LIST OF TABLES

Table 2-1: Step 1 On-Study Characteristics (n=55) ...... 24

Table 2-2: Best Confirmed Response in Step 1 (Eligible Patients) ...... 27

Table 2-3: Best Confirmed Response in Step 2 (Eligible Patients) ...... 27

Table 2-4: LGL Serum Cytokines ...... 31

Table 2-5: Pathways Upregulated in Responders to Methotrexate in Baseline Samples...... 37

Table 2-6: Pathways Upregulated in Y640F Mutants Versus D661 Mutants...... 37

Table 3-1: Characteristics of Three LGL Leukemia patients selected for Whole Genome Sequencing ...... 54

Table 3-2: Raw Read and Total Base Counts Obtained for Each of Six Whole Genomes. .... 56

Table 3-3: Alignment Statistics for Whole Genome Data Aligned to hg19 ...... 57

Table 3-4: Substitution Variants Between LGL Patients and the Human Reference Genome ...... 58

Table 3-5: Summary Statistics of PROVEAN and SIFT Analysis of Filtered SNVs...... 62

Table 3-4: Predicted Mutations Not Present in dbSNP...... 63

Table 4-1: DNA lengths of TBET Libraries ...... 80

Table 4-2: Clustering of TBET Bound Intervals...... 84

Table 4-3: Pathway Enrichment in Genes Associated with TBET Bound Regions...... 86

Table 4-4: TBET occupied Segments Near Transcriptional Start Sites of JAK-STAT Pathway Genes ...... 87

Table Appendix-1: Genome One Predicted Changes...... 105

Table Appendix-2: Genome Two Predicted Protein Changes...... 117

Table Appendix-3: Genome Three Predicted Protein Changes...... 130


ABBREVIATIONS

AA  amino acid
AcCD8  activated CD8+ cells
AICD  activation-induced cell death
AKT  protein kinase B
ALC  absolute lymphocyte count
AML  acute myeloid leukemia
ANC  absolute neutrophil count
AQP7  aquaporin 7
BID  BH3 interacting domain death agonist
BWA  Burrows-Wheeler Alignment Tool
CACNA1C  calcium channel, voltage-dependent, L type, alpha 1C subunit
CD  cluster of differentiation (as expressed on T-cells)
CCR  C-C chemokine receptor
ChIP  chromatin immunoprecipitation
ChIP-Seq  chromatin immunoprecipitation sequencing
CI  confidence interval
CLB  cold lysis buffer
CLPD-NK  chronic lymphoproliferative disorders of NK cells
CR  complete response
CTL  cytotoxic lymphocyte(s)
Cy  cyclophosphamide
CyA  cyclosporine
D661Y/V/H  aspartate at position 661 mutated to tyrosine, valine, or histidine
DAVID  Database for Annotation, Visualization and Integrated Discovery
dbSNP  single nucleotide polymorphism database
DREME  Discriminative DNA Motif Discovery
dx  diagnosis
EASE  Expression Analysis Systematic Explorer
ECOG  Eastern Cooperative Oncology Group
EGF  epidermal growth factor
EOMES  eomesodermin
ERK  Extracellular signal related kinase
ES  enrichment signal
FAS  CD95
FASL  CD95 ligand
FASTA  FAST-All
FDR q-val  false discovery rate q-value
GAPDH  Glyceraldehyde 3-phosphate dehydrogenase
GI  gastrointestinal
GREAT  Genomics Regions Enrichment of Annotations Tool
GSEA  Gene Set Enrichment Analysis
GSK3  glycogen synthase kinase 3
HDAC  histone deacetylase
HNF4A  hepatocyte nuclear factor 4 alpha
HTLV  human T-lymphotrophic virus
ICAM  intercellular adhesion molecule


IFN  interferon
IL  interleukin
IL6R  interleukin-6 receptor
IPA  Ingenuity Pathway Analysis
IPDB  immunoprecipitation dilution buffer
JAK-STAT  janus kinase signal transducer and activator of transcription
KEGG  Kyoto Encyclopedia of Genes and Genomes
LGL  large granular lymphocyte/lymphocytic
MACS  model-based analysis of ChIP-Seq
MAPK  mitogen activated protein kinase
MCL  myeloid cell leukemia
MDS  myelodysplastic syndrome
MEK  mitogen-activated protein kinase kinase
MIP  macrophage inflammatory protein
miRNA/mir  micro RNA
MLL  myeloid/lymphoid or mixed-lineage leukemia
MTX  methotrexate
N642H  asparagine at position 642 mutated to histidine
N647I  asparagine at position 647 mutated to isoleucine
NC  no change
NCD8  normal CD8+ cells
NCOR1  nuclear receptor co-repressor 1
ND  not determined
NFkB  nuclear factor kappa B
NK  natural killer
NK-LGL  natural killer large granular lymphocytic
NES  normalized enrichment signal
NLB  nuclear lysis buffer
NOM p-val  nominal p-value
NOS3  nitric oxide synthase 3
OS  occupied segment(s)
PARP  poly ADP ribose polymerase
PBMC  peripheral blood mononuclear cells
PBS  phosphate buffered saline
PDCD1  programmed cell death protein 1
PDGF  platelet derived growth factor
PDGF-BB  platelet derived growth factor with 2 B chains
PFS  progression-free survival
PI3K  phosphatidylinositide 3-kinases
po  orally
PR  partial response
PROG  progress
PROVEAN  Protein Variation Effect Analyzer
PTPRN2  protein receptor-type tyrosine-protein phosphatase N2
PTPRT  protein receptor-type tyrosine-protein phosphatase T
RA  rheumatoid arthritis
RANTES  Regulated on Activation, Normal T cell Expressed and Secreted
RAS  rat sarcoma family of proteins
RT-PCR  real-time polymerase chain reaction


RUNX  runt-related transcription factor
S1PR5  sphingosine-1-phosphate receptor 5
S614R  serine at position 614 mutated to arginine
SAMtools  sequence alignment/map tools
SELEX  systematic evolution of ligands by exponential enrichment
SGOT (AST)  serum glutamic oxaloacetic transaminase (aspartate transaminase)
SH2  Src homology 2
SIFT  sorting intolerant from tolerant
SNV  single nucleotide variant(s)
STAT  signal transducer and activator of transcription
TBET  T-box expressed in T-cells
TCR  T-cell receptor
TCR-beta  T-cell receptor beta
TEM  effector memory T-cells
TEMRA  effector memory T-cells expressing CD45RA
TGF-beta  transforming growth factor-beta
TIA1  T-cell restricted intracellular antigen-1
T-LGL  cytotoxic-T large granular lymphocytic
TNFAIP3/A20  Tumor Necrosis Factor Alpha Inducible Protein 3
TOMTOM  motif comparison tool
TRAIL  TNF-related apoptosis-inducing ligand
TSS  transcriptional start site
UBE2V  ubiquitin conjugating E2 variant 1
UTR  untranslated region
V995M  valine at position 995 mutated to methionine
VCAM  vascular cell adhesion molecule
VEGF  vascular endothelial growth factor
VIP  vasoactive inhibitory protein
WB  Western blot
WBC  white blood cell
Y640F  tyrosine at position 640 mutated to phenylalanine
Y665F  tyrosine at position 665 mutated to phenylalanine


ACKNOWLEDGEMENTS

First and foremost I would like to thank Dr. Thomas Loughran, Jr., for the opportunity to work on many highly interesting and very collaborative projects. These projects were at times challenging and painfully slow to develop but ultimately very rewarding. I would also like to thank Rosalyn Irby for marshalling me through all of the processes required to finish this degree.

I thank the other members of the Loughran lab for their support and helpful comments and my doctoral committee for these things as well.

I would like to thank Lynette Zickl and other members of the Eastern Cooperative Oncology Group, including again Dr. Loughran, for doing their part in the generation of clinical trial data and for assistance in the statistical analysis of the correlates which I contributed. With regard to genome work I would like to thank Drs. Stephan Schuster and Lynn Tomsho for their work in sequencing the genomes and Drs. Webb Miller, Aakrosh Ratan, and Richard Burhans for the initial alignments of these data. I would also like to thank Dr. Ross Hardison for the initial ChIP-Seq and bioinformatic protocols for the TBET project and Dr. Cheryl Keller-Capone for the preparation and sequencing of these libraries. Belinda Giardine and Oscar Bedoya Reina were most helpful in data visualization and troubleshooting bioinformatic methods, respectively. I also thank our collaborators in Helsinki and at the Cleveland Clinic for numerous productive collaborations.

I thank my family for their tolerance of my extended absence from them and their efforts to share and keep me a part of their lives. And finally I thank my wife for both her personal support and insights into the writing process.

Chapter 1

Introduction to LGL Leukemia

Clinical Entity

Large granular lymphocyte (LGL) leukemia is a rare lymphoproliferative disorder, initially described as the clonal accumulation of cytotoxic cells in the marrow, spleen and blood (Loughran, Kadin et al. 1985). LGL are so named because they are large (15-18 micrometers in diameter) and contain abundant azurophilic granules. Cytotoxic cells chiefly comprise two subtypes, natural-killer large granular lymphocytes (NK-LGL) and cytotoxic-T large granular lymphocytes (T-LGL). Both of these cell types can and do give rise to LGL leukemia (Loughran 1993). The World Health Organization recognizes aggressive NK-cell leukemia as being distinct from LGL leukemia and in 2008 instituted a provisional entity of chronic lymphoproliferative disorders of NK cells (CLPD-NK) (Campo, Swerdlow et al. 2011).

An aggressive CD56+ T-LGL leukemia is infrequently seen (Gentile, Uner et al. 1994). The bulk of this dissertation will be centered on studies of chronic T-LGL leukemia but may be applicable to CLPD-NK as it displays a similar clinical course. Potential insight into why these diseases are similar is also given.

Diagnosis and Symptoms

The normal LGL count in peripheral blood is 0.25 × 10^9 LGL/L. Currently, levels diagnostic of LGL leukemia are sustained counts of 0.4-2 × 10^9 LGL/L (Semenzato, Zambello et al. 1997). This contrasts with earlier studies where counts indicative of LGL leukemia were frequently observed to range from 2-10 × 10^9 LGL/L (Pandolfi, Loughran et al. 1990; Loughran 1993). Flow cytometry is routinely performed on these patients and a high percentage of CD3+, CD16+, CD8+, and CD57+ cells is indicative of T-LGL leukemia. In patients with lower counts, it is easier to diagnose those patients with other clinical or hematological features. In either instance, clonal rearrangement studies of the T cell receptor (TCR) can be performed on blood or marrow to determine if clonal LGL are present (Lamy and Loughran 2010). TCR rearrangement is a process unique to T cells whereby portions of the genome are recombined to create novel transcripts. Staining of azurophilic granule components, within linear arrays of CD8+ cells, in the marrow can be one way to prove LGL involvement (Morice, Kurtin et al. 2002; Burks and Loughran 2006; Osuji, Beiske et al. 2007). These granule components, including granzymes, perforin and the RNA binding protein T-cell-restricted intracellular antigen-1 (TIA1), have been shown to have expression restricted to the granules of NK and T-LGL (Oshimi, Shinkai et al. 1990; Zambello, Trentin et al. 2000). This fact is pertinent to Chapter 4 as we will demonstrate that the transcription factor TBET (T-box expressed in T cells) may be regulating the expression of these molecules.

Rheumatoid arthritis (RA), anemia, neutropenia, and other autoimmune conditions are common in LGL leukemia. Severe neutropenia (<2,000 neutrophils/µL) occurs in roughly 50% of patients as does anemia, with most patients experiencing at least some neutropenia. Involvement of the spleen (50%), liver (23%) and bone marrow (88%) is quite common while lymph nodes are almost never involved (Loughran 1993). It is recommended that patients with unexplained RA or cytopenias be examined for LGL leukemia. The observation that marrow is involved may explain the association (Huh, Medeiros et al. 2009) between LGL leukemia and myelodysplastic syndrome (MDS). MDS consists of one or more cytopenias that come as a result of dysplasia. Dysplasia in a hematological sense indicates a failure in early development of a lineage as opposed to destruction in the periphery. Felty’s syndrome and RA may be closely related to LGL leukemia. Felty’s syndrome is a polyclonal expansion of LGL that arises in longstanding RA patients and is defined by an enlarged spleen and low neutrophil counts. Although not as easily observed, minor clonal expansions can also occur in RA patients (Liu and Loughran 2011). Our lab believes that these may represent a spectrum of presentation of a disease with similar underlying mechanisms of causation. Roughly 30 percent of LGL patients have concomitant RA, and as many as 60 percent will have some autoimmune component to their disease.

Treatment

Therapy for LGL often utilizes immunosuppressive agents (Lamy and Loughran 2010), such as methotrexate (MTX), which is also a drug of choice for RA. Cyclophosphamide (Cy) can also be used when the initial presentation involves anemia or if MTX is not successful. Although often successful in the short term, Cy poses a duration-of-exposure dependent risk of inducing a separate malignancy that would pose a greater risk to the patient than LGL leukemia. A third alternative for first or second line use is cyclosporine (CyA). All three treatments have shown success in reducing the LGL clone and ameliorating symptoms, but none of them can be considered curative. One major goal of the Loughran lab is to discover more effective and/or targeted therapeutics for LGL leukemia.

Natural Born Killers

Understanding the normal function of LGL is critical to understanding how their abnormal expansion causes autoimmune symptoms. LGL of both the NK and T cell variety are critical in the elimination of virally infected and malignantly transformed cells from the body. Upon encountering these aberrant cells, LGL make contact and eliminate them through the induction of apoptosis (van Lier, ten Berge et al. 2003). Apoptosis can be triggered by the delivery of cytotoxic granules into the target cell. The contents of these granules include the pore forming protein perforin, various granzymes, and TIA1. Once perforin has created a pore in the target cell, the granzyme family of serine proteases cleaves various target proteins such as poly ADP ribose polymerase (PARP), BH3 Interacting Domain Death Agonist (BID), and caspases to induce apoptosis (Smyth, Kelly et al. 2001). TIA1 has been shown to fragment the DNA of the target cell (Tian, Streuli et al. 1991). Another way that LGL induce apoptosis is through the CD95/CD95 ligand axis, otherwise known as FAS and FASL. Many cells throughout the body express FAS, but the expression of FASL is exclusive to LGL. The ligation of FAS on target cells by FASL triggers a potent apoptotic cascade. This allows LGL to maintain surveillance over widely different types of cells (Krammer 2000).

LGL Are Resistant to Apoptosis

In the course of a normal infection, immune cells including LGL expand greatly to eliminate the infected cells but then die through a process known as activation induced cell death (AICD). LGL leukemia cells appear to be internally resistant to this process, which leads to their accumulation (Yang, Epling-Burnette et al. 2008). They have also been shown to secrete high levels of soluble FAS which may be acting as a decoy to prevent their own destruction (Liu, Wei et al. 2002). In contrast, secretion of soluble FASL by LGL has been shown to contribute to the neutropenia that occurs in many patients (Liu, Wei et al. 2000). The soluble isoform of FAS is created by the actions of TIA1, noted above, which results in altered exon usage (Izquierdo and Valcarcel 2007). We confirm elevated levels of both FAS and FASL in serum of LGL patients in Chapter 2 as part of a 27 cytokine profile of LGL.

LGL Are Terminal Effector Memory Cells

CD8+ T cells consist of four subtypes: naïve, central memory, effector memory, and terminal effector memory. After exposure to antigen during an infection or immunogenic challenge, a small number of cells remain as memory cells. Central memory cells retain expression of markers CD62L and CCR7 which allow them to enter lymphoid organs. These cells have little effector (killing) function but expand rapidly in response to antigen. Effector memory cells lose these markers and remain in the periphery (Sallusto, Lenig et al. 1999; Masopust, Vezys et al. 2001). Effector memory (EM) T cells fall into two groups, CD45RO expressing TEM and CD45RA expressing TEMRA. Important to LGL biology, these different subsets express different levels of the cytolytic effector molecules, which indicates there may be functional differences between these types of cells (Takata and Takiguchi 2006). LGL leukemia cells express the markers of TEMRA (Yang, Epling-Burnette et al. 2008), which makes them uniquely suitable to study the regulation of these molecules by the transcription factor TBET in Chapter 4.

Displaying TEMRA markers is considered a sign of previous antigen stimulation, but there is some controversy as to when that stimulation would have needed to occur. Some report that TEMRA appear to be generated in culture by interleukin 7 (IL-7) or interleukin 15 (IL-15) stimulation rather than antigen. However, the culture methods used in that study did not result in appreciable expansion (Geginat, Lanzavecchia et al. 2003). Although they may not rapidly expand, multiple lines of evidence indicate that TEM and TEMRA are inherently resistant to apoptosis, indicating that this may not be a feature unique to leukemic LGL (Strauss, Knape et al. 2003; Gupta and Gollapudi 2007). The proposed mechanisms for this resistance are listed in Figure 1-1. The identification of mutations in LGL leukemia and the classification of their effects as activating or inhibiting the two hypothetical drivers of LGL, cytokine stimulation or TCR stimulation, may indicate which is more important to LGL survival. Although we have recently seen STAT3 mutations consistent with cytokine driven growth (Koskela, Eldfors et al. 2012), these could represent mutations that give selective growth advantage to subsets of antigen driven LGL rather than the initiating event.

Figure 1-1: Features of the Four Basic Subsets of CD8+ Cells. Figure taken from (Gupta and Gollapudi 2007) which describes effector memory cells to be resistant to apoptosis. Potential mechanisms of this resistance include reduced activity, upregulation of nuclear factor kappa-B (NFKB) and altered levels of apoptotic regulators. Phenotypic markers are also displayed. LGL leukemia cells are exclusively the fourth category (boxed).

Viral Antigen

Evidence that LGL leukemia is antigen driven extends beyond cell surface markers. The rearrangement of the TCR is what gives rise to diversity in the antigens that can be recognized by our immune system. It creates billions of cells with TCRs that are unique to that cell alone. Cells that are specific to an antigen are expanded greatly in response to the presence of that antigen along with other stimulatory factors (Geginat, Lanzavecchia et al. 2003). Evidence for a common antigen among patients is extrapolated from the observation that any given patient's TCRs are often similar and in one case identical to those from another patient (Loughran, Starkebaum et al. 1988; Wlodarski, O'Keefe et al. 2005). In addition to T cell disorders, unusual expansions of B cells are noted in LGL leukemia. Dyscrasia or abnormal skewing of the B cell repertoire is present in 27% of patients. When dyscrasia, hypogammaglobulinemia, hypergammaglobulinemia, and B cell lymphomas are considered, up to 43% of LGL leukemia patients have some abnormality in this compartment. These disorders could plausibly be caused by exposure to a common antigen or could be a byproduct of a transformed T cell population (Viny, Lichtin et al. 2008). A viral nature of this antigen is posited by the observation that a much larger number of LGL patients react serologically to antigens from the human T-lymphotrophic virus 1 (HTLV-1). Initially shown to interact with Western blot (WB) of viral lysates (Starkebaum, Loughran et al. 1987), this analysis was later refined to a specific peptide region noted as BA21 (Loughran, Hadlock et al. 1998). Quite intriguingly, this increased seroreactivity is also noted in aplastic anemia, MDS and paroxysmal nocturnal hemoglobinuria (Nyland, Krissinger et al. 2012), which may have LGL involvement. The identity of the putative virus is not known and may constitute an undiscovered human retrovirus, as infection with HTLV-1/2 is not at all common in LGL leukemia (Loughran, Sherman et al. 1994). The persistence of antigen and whether that persistence is important to the continued survival of leukemic LGL is also undetermined. Minor evidence for a continued role of antigen is the fact that we have yet to find deactivating abnormalities of the TCR. Further complicating this analysis is the concept of antigenic drift whereby the offending antigen may no longer be present, but the responding cells have gradually altered affinity to now recognize a new antigen that may be endogenous in nature. A key related question is whether this antigen persistence contributes to the inability to effect permanent cures in many LGL leukemia patients.

Survival Network of LGL Leukemia

A novel approach to understanding what factors drive apoptotic resistance and promote survival was undertaken by others in the lab in 2008 and has provided the basis for this thesis and other studies. The approach pulled important molecules from a literature search of LGL leukemic cells and their normal counterparts and constructed a network of their interactions. This network was collapsed to the most important nodes and analyzed with discrete dynamic modeling. In this approach each molecule or node is assigned a state (ON or OFF) that determines whether or not it is active. Extracellular stimulation is traced through the network to determine the contribution of each node to the outcomes of proliferation and apoptosis. This identifies nodes that must be on or off to support a particular outcome (Figure 1-2). The importance of this diagram is that it recapitulates the signaling abnormalities previously seen in STAT3, PI3K/AKT (phosphatidylinositide 3-kinase/protein kinase B) and RAS/MEK/ERK (rat sarcoma, mitogen activated protein kinase kinase, extracellular signal related kinase) recently reviewed in (Leblanc, Zhang et al. 2012; Zhang and Loughran 2012). Furthermore, it also predicted LGL to be driven by platelet derived growth factor (PDGF), which was later experimentally confirmed (Yang, Liu et al. 2010). Chapter 4 describes further studies on TBET, predicted to be a key node here, while Chapter 2 provides further insight into STAT3 and nuclear factor kappa B (NFKB) activation through mutation.

Included in that network was sphingolipid signaling, which was identified as being important to LGL cell survival in microarray analyses (Shah, Zhang et al. 2008). Rather than de novo pathway assembly, expression data were interrogated to determine which pathways demonstrate upregulated aggregate expression in LGL leukemia when compared to normal CD8s or PBMCs. Utilizing EASE or GSEA, a number of important pathways were discovered, including sphingolipid signaling, which is of importance to our lab as we were the first to clone the sphingosine-1-phosphate receptor 5 (S1PR5) in T cells (Kothapalli, Kusmartseva et al. 2002). This receptor is important in promoting the pro-survival effects of S1P, which acts to counteract the apoptotic stimulus, ceramide. Multiple pathways of generation and conversion of ceramide to sphingosine 1 phosphate (S1P) exist, and the relative levels of these compounds act as a “sphingolipid rheostat” that determines cell survival.

Figure 1-2: The T-LGL Survival Signaling Network. (Zhang, Shah et al. 2008) Node and edge color represents the current knowledge of the signaling abnormalities in T-LGL leukemia. Up-regulated or constitutively active nodes are in red, down-regulated or inhibited nodes are in green, nodes that have been suggested to be deregulated (either up-regulation or down-regulation) are in blue, and the states of white nodes are unknown or unchanged compared with normal. Blue edge indicates activation and red edge indicates inhibition. The shape of the nodes indicates the cellular location: rectangular indicates intracellular components, ellipse indicates extracellular components, and diamond indicates receptors. Conceptual nodes (Stimuli, Cytoskeleton signaling, Proliferation, and Apoptosis) are labeled orange. Asterisks highlight molecules that are important to this dissertation.

STAT3 Dysregulation and Somatic Mutation

The first indication that activated STAT3 was involved in LGL leukemia came in 2001 (Epling-Burnette, Liu et al. 2001). In this study, STATs (a small number showed STAT5) were found to be constitutively active in LGL leukemia and to promote survival through the expression of myeloid cell leukemia sequence 1 (MCL1). STAT3 has been shown to enhance cell growth and survival through signals relayed through the interleukin 6 family of cytokines (Hirano, Ishihara et al. 2000). Recently, a multicenter collaborative effort between our group and those in Finland and the Cleveland Clinic reported the presence of STAT3 mutations in 40% of T-LGL leukemia patients, discovered through exome sequencing. The major mutations identified were Y640F and D661Y/V. Both were found to increase STAT3 activity by luciferase assay. A diagram of the mutations from this publication is presented as Figure 1-3. In this study, a correlation was found between mutation and increased prevalence of rheumatoid arthritis and neutropenia. Consistent with our previous work, expression profiles of STAT3 downstream effectors were similar between mutated and wildtype patient samples when compared to normal CD8+ T cells (Figure 1-4). An independent group found a much higher rate of mutation (62.9%), consisting of similar point mutations, in their LGL leukemia cohort and interestingly did not find it in any other leukemia they tested (Fasan, Kern et al. 2012). We then collaborated to screen 50 CLPD-NK and 120 T-LGL patients, identifying mutations in 30% and 28% of patients, respectively (Jerez, Clemente et al. 2012). This confirms the clonal nature of both diseases and unifies their pathologies. Additional symptoms and treatment complications were again noted for mutation-positive patients. For this reason we integrated this analysis into the first prospective trial of immunosuppressive agents in LGL leukemia, presented in Chapter 2. We are now also extending these findings into what are considered LGL-related diseases without observed major expansions of clonal LGL. The sensitive technique of allele-specific PCR by Jerez was used to detect mutations in patients with aplastic anemia and myelodysplastic syndrome (Jerez, 2013, in submission). In this study I performed pyrosequencing to confirm these mutations and to compare methodologies. The results suggest that amplicon sequencing and pyrosequencing are less sensitive than allele-specific PCR.

Efforts employed concurrent to the exome studies to find STAT3 and other JAK-STAT activating mutations through whole genome sequencing are presented in Chapter 3.

Understanding the nature of STAT3 activation in STAT3 mutation-negative patients is a very active area of research. We very recently discovered (Rajala, Eldfors et al. 2013) the presence of STAT5B mutations in 4 of 198 samples (40 CLPD-NK, 158 T-LGL). The mutation pattern observed consisted of two patients having a Y665F mutation and two patients having an N642H mutation, with both N642H patients being submitted by the Loughran lab. Fortuitously, these were not true CLPD-NK but consisted of CD56+ T cell LGL. As noted, CD56+ T-LGL leukemias are associated with an aggressive clinical course. We have since found this mutation in 5 of 5 additional aggressive patient samples from our registry and are searching for more similar cases to test.

In Chapter 3 we present another possible mechanism of STAT3 activation, that involving the IL6 receptor. The further use of exome sequencing with our collaborators has found a mutation, V995M, in the protein receptor-type tyrosine-protein phosphatase T (PTPRT) that may be acting to activate STAT3, and a handful of other mutations related to T cell signaling (Andersson, 2013, in submission, contributing author). These mutations were found only in the patients whose exomes were sequenced and were not found in any of the 113 samples that were resequenced. We observe in Chapter 3 a mutation in receptor type protein tyrosine phosphatase N2 (PTPRN2) in a similar region to that observed in PTPRT. Taken together, this may indicate that STAT3 activation is caused through varied mechanisms in those 70% of STAT3 mutation-negative patients, which may make their identification tedious and time consuming. Additionally, we have also observed as many as 4 separate mutations in the same patient using a deep sequencing approach. This approach sequences 10,000 or more individual strands of DNA so that mutational frequency can be counted. The mutations did not co-occur on the same read and therefore represent separate clonal populations. Numerous single, subdominant STAT3 mutations were also observed, as defined by the fact that the percentage of mutated reads did not approach at least half of the estimated purity of the sample. For example, a patient with 40% CD8+, CD3+ PBMCs demonstrating only 5% of reads with D661Y mutations contains a clonal population of 5-10% STAT3 mutation-positive cells, once one accounts for the fact that the mutation could be homozygous or heterozygous. We are currently determining what happens to the clonal percentages in longitudinal studies of patients who have undergone treatment (Rajala, 2013, in preparation, contributing author).
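The arithmetic behind the 5-10% example can be sketched as follows. This is an illustrative simplification, not the analysis pipeline used in the study: the function names are hypothetical, copy-number changes are ignored, and "subdominant" is read here as a mutant read fraction below half of the estimated sample purity.

```python
def clone_fraction_range(mutant_read_fraction):
    """Return (low, high) estimates of the fraction of mutation-positive cells.

    If the mutation is homozygous, every allele in a mutant cell carries it,
    so the cell fraction equals the read fraction. If it is heterozygous,
    only half of the alleles in a mutant cell carry it, so the cell fraction
    is twice the read fraction (capped at 1.0).
    """
    homozygous = mutant_read_fraction
    heterozygous = min(1.0, 2 * mutant_read_fraction)
    return homozygous, heterozygous


def is_subdominant(mutant_read_fraction, purity):
    """Assumed reading of the text: mutated reads fall below half of the purity."""
    return mutant_read_fraction < purity / 2


# Example from the text: 5% D661Y reads in a sample with 40% CD8+, CD3+ cells.
low, high = clone_fraction_range(0.05)
print(low, high, is_subdominant(0.05, 0.40))
```

With these inputs the range is 5% to 10% of cells, and the clone is flagged as subdominant because 5% is well below half of the 40% purity.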

Figure 1-3: Ribbon Diagram of Mutations on a Dimerized STAT3 Molecule. Diagram is taken from (Koskela, Eldfors et al. 2012). Mutations are noted to occur in amino acids that are part of the SH2 domain. (A) Three-dimensional modeling indicates the location of these mutations to be in the dimerization interface. Six STAT3 mutations are shown in one of the two subunits (magenta), as is the in-frame insertion mutation (yellow). D661Y or V and Y640F were the most frequently found mutations. Similar mutations were found in LGL of NK cell origin. Y665F and N642H in STAT5B molecules would be predicted to fall into this same domain (not shown). Boxed mutations were highly prevalent in the clinical trial cohort presented in Chapter 2.


Figure 1-4: Expression Profiles of LGL with Mutant and Wildtype STAT3. Data reported in (Koskela, Eldfors et al. 2012), collected and analyzed by this author. Results are from a portion of the samples included in the microarray experiment in Chapter 2. What can be observed, for many supposed targets of the transcription factor STAT3, is that leukemic LGL consistently demonstrate higher levels of expression than normal CD8+ T cells regardless of mutation. Those two showing altered expression in mutant versus non-mutant are boxed in blue. I bars represent standard errors. One asterisk denotes P<0.05, two asterisks P<0.01, and three asterisks P<0.001.

Chapter 2

STAT3 Y640F Mutation Predicts Response to Immunosuppressive Therapy of LGL Leukemia: Results of a Prospective Multicenter Phase II Study of Initial Treatment with Methotrexate by the Eastern Cooperative Oncology Group (E5998)

Introduction

Large granular lymphocyte (LGL) leukemia is characterized by clonal expansion of cytotoxic T cells (CTL) (Loughran, Kadin et al. 1985; Loughran 1993). Prominent clinical features include neutropenia, anemia, and rheumatoid arthritis. The terminal effector memory (TEMRA) phenotype (CD3+/CD8+/CD57+/CD45RA+/CD62L-) of leukemic LGL suggests a pivotal chronic antigen-driven immune response (Yang, Epling-Burnette et al. 2008). Leukemic LGL survival is promoted by PDGF and IL-15, resulting in global dysregulation of apoptosis and resistance to normal pathways of activation-induced cell death (Epling-Burnette, Liu et al. 2001; Schade, Powers et al. 2006; Shah, Zhang et al. 2008; Zhang, Shah et al. 2008; Yang, Liu et al. 2010). These pathogenic features explain in part why treatment of LGL leukemia is based on immunosuppressive therapy.

No standard therapy for LGL leukemia has been established due to the absence of large prospective trials. There have been six large retrospective studies (>40 patients) of immunosuppressive treatment in LGL leukemia (Pandolfi, Loughran et al. 1990; Loughran 1993; Dhodapkar, Li et al. 1994; Semenzato, Zambello et al. 1997; Neben, Morice et al. 2003; Bareau, Rey et al. 2010). The three most commonly used immunosuppressives have been methotrexate (MTX), cyclophosphamide (Cy), and cyclosporine (CyA). Overall response rates of 56% for MTX (n = 96), 61% for Cy (n = 85), and 56% for CyA (n = 123) have been reported (Lamy and Loughran 2011). We present herein results of the only large prospective trial of immunosuppressive therapy in LGL leukemia. Our objective was to determine the overall effectiveness of the immunosuppressant agent methotrexate and to develop a means to identify, prior to treatment, those patients who will not respond. As such, correlative laboratory studies (microarray, serum cytokines and mutational analysis) were conducted to determine if biomarkers or genetic analysis could predict therapeutic response. We were particularly interested in the STAT3 pathway as we had found STAT3 to be constitutively activated in leukemic LGL (Epling-Burnette, Liu et al. 2001). Moreover, STAT3 was predicted to be a key node in a network model of leukemic LGL survival (Zhang, Shah et al. 2008). Most recently, we demonstrated somatic mutations that activate STAT3 in 40% of LGL leukemia patients (Koskela, Eldfors et al. 2012) and subsequently analyzed their potential association with response to immunosuppressive agents in this study.

Methods

Eligibility Criteria

Eligibility for the Eastern Cooperative Oncology Group (ECOG) 5998 study included a diagnosis of the T cell form of LGL leukemia as determined by: 1) phenotypic studies from peripheral blood showing CD3+CD57+ cells greater than 400/mm3 or CD8+ cells greater than 650/mm3 in the eight weeks prior to registration and 2) evidence for clonal T cell receptor gene rearrangement within one year prior to registration. Patients needed to meet one of the following indications for treatment: category a) severe neutropenia less than 500/mm3; b) neutropenia associated with recurrent infection; c) symptomatic anemia; and/or d) transfusion-dependent anemia. Patients were 18 years or older and signed institutional review board informed consent in accordance with the Declaration of Helsinki. Other eligibility criteria included bilirubin < 2.0 mg/dl, serum glutamic oxaloacetic transaminase/aspartate transaminase (SGOT/AST) < 1.5 times normal, creatinine < 2.0 mg/dl, ECOG performance status of 0-2, no previous or concurrent malignancies (except inactive non-melanoma skin cancer, in situ carcinoma of the cervix, or other cancer if the patient had been disease free for over five years), no other serious medical illness, and for female patients, not pregnant or breastfeeding.

Objectives

The primary objectives of this study were 1) to estimate the complete response (CR) rate, partial response (PR) rate, and overall response rate of MTX therapy in LGL leukemia patients treated for neutropenia or anemia and 2) to estimate CR rate, PR rate, and overall response rate of Cy treatment in patients failing MTX, for treatment indications of neutropenia or anemia.

Secondary objectives were to conduct correlative studies to better define LGL leukemia pathogenesis as well as to correlate with therapeutic response.

Study Design

Step 1 consisted of MTX at 10 mg/m2 orally (po) in divided doses once weekly. One cycle of therapy consisted of four weeks of treatment. Prednisone was given at 1 mg/kg po daily for 30 days and then tapered off over the subsequent 24 days. Patients not responding to MTX received Cy at 100 mg po daily with the same prednisone schedule (Step 2). Patients achieving PR in either step received MTX or Cy, respectively, for a maximum of one year. Patients achieving CR in either step received MTX or Cy, respectively, for one additional month after documentation of CR. Protocol treatment was discontinued in patients failing Step 2 therapy. Since the primary treatment indication was neutropenia or anemia and there was a potential for differential response rates, we conducted studies of identical design in each stratum defined by the primary symptom. Simon’s optimal two stage design was employed to allow for early termination of the study if this treatment demonstrated no beneficial effects with respect to response. For the first stage, 17 eligible patients were enrolled. The study was designed to terminate if fewer than 4 patients achieved a complete or partial response. If 4 or more achieved a response, the study continued to the second stage, where an additional 23 patients were enrolled. The final accrual was 59 patients. The study was designed to test the null hypothesis of a 20% response rate versus an alternative of 40% with 90% power and an overall type I error rate of 0.08.
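The first-stage stopping rule described above can be examined with a short binomial calculation. The sketch below is illustrative only and was not part of the trial analysis; the function names are hypothetical. It computes the probability that a stratum stops after stage one (fewer than 4 responses among 17 patients) under the null (20%) and alternative (40%) response rates.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k responses among n patients with response rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_early_termination(n1, r1, p):
    """Probability of fewer than r1 responses among the n1 first-stage patients."""
    return sum(binom_pmf(k, n1, p) for k in range(r1))


# Stage-one rule from the text: stop if fewer than 4 of 17 patients respond.
pet_null = prob_early_termination(17, 4, 0.20)  # true response rate 20%
pet_alt = prob_early_termination(17, 4, 0.40)   # true response rate 40%
print(f"P(stop early | 20%) = {pet_null:.3f}")
print(f"P(stop early | 40%) = {pet_alt:.3f}")
```

Under the null rate the stratum would stop early slightly more than half of the time, while under the alternative rate early stopping is unlikely, which is the intended behavior of a two-stage design.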

Response Criteria

Treatment response was assessed after four cycles of therapy. CR was defined as attainment of normal complete blood count (CBC) (absolute neutrophil count (ANC) > 1500/mm3; lymphocyte count < 4,000/mm3; hemoglobin > 11 g/dl; platelet count > 100,000/mm3). In addition, LGL counts as determined by repeat flow cytometry needed to be in the normal range. Partial response (PR) was defined as improvement in hematologic parameters in the absence of CR: 1) ANC > 500, as long as this represented a 50% increase (treatment category a); 2) improvement in ANC > 50% over baseline (treatment category b); 3) increase in hemoglobin by > 1 g/dl for at least four months duration (treatment category c); and 4) decrease in monthly transfusion requirements of > 50% for at least four months duration (treatment category d).

Progressive disease was defined as worsening of hematologic parameters in patients previously achieving PR/CR. No response was defined as lack of CR/PR. Complete molecular remission was determined by showing absence of T cell clone using repeat T cell receptor gene rearrangement studies. Thomas Loughran, Jr., served as the referee of response for this trial and also principal investigator for correlative studies.
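The numeric thresholds in the response criteria lend themselves to a small rule check. The following is a minimal sketch, assuming the units given in the text (cells/mm3 and g/dl); the function and parameter names are hypothetical, only the CR rule and PR category (a) are shown, and the duration-based categories (c) and (d) are omitted. Response in the trial was adjudicated by a referee, not by code.

```python
def is_complete_response(anc, lymphocytes, hemoglobin, platelets, lgl_normal):
    """CR: normal CBC by the stated thresholds plus a normal LGL count.

    anc, lymphocytes, platelets are in cells/mm3; hemoglobin is in g/dl;
    lgl_normal is True when repeat flow cytometry shows LGL in the normal range.
    """
    return (
        anc > 1500
        and lymphocytes < 4000
        and hemoglobin > 11
        and platelets > 100_000
        and lgl_normal
    )


def is_partial_response_category_a(baseline_anc, anc):
    """PR for treatment category (a): ANC above 500 and at least 50% above baseline."""
    return anc > 500 and anc >= 1.5 * baseline_anc


# Hypothetical patient values, for illustration only.
print(is_complete_response(1800, 2500, 12.4, 210_000, True))
print(is_partial_response_category_a(baseline_anc=300, anc=650))
```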

Statistical Methods

Univariate associations between dichotomous variables were evaluated by Fisher’s Exact test (1990). Associations involving ordered categorical variables were evaluated by the Wilcoxon Rank Sum test. Overall survival (OS) was defined as the time from study registration to death from any cause or date last known alive. Progression free survival (PFS) was defined as the time from registration to progression or to death without documentation of progression (censored). Patients who were alive without a progression were censored at the date of last contact. The methods of Kaplan and Meier (1958) were used to estimate survival curves, and significance was tested by logrank tests. P-values were reported for two-sided tests. No adjustments were made for multiple comparisons. The median follow-up was 6.3 years (range, 0.5–12 years) for the 34 surviving patients. Lynette Zickl from ECOG performed the bulk of the statistical analysis on the trial results and assisted in the statistical analysis of laboratory correlates.
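For readers unfamiliar with the Kaplan-Meier product-limit estimate, a minimal sketch is given below. It is not the software used for the trial analysis, the function name is hypothetical, and the follow-up times are toy values rather than study data.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve


# Toy data: five patients, two of them censored.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk up to their last contact but never produce a step in the curve, which is how the estimator uses incomplete follow-up.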

Array Data

Raw and normalized array results are deposited under Gene Expression Omnibus accession number GSE42664.

CD8+ T cells were prepared from normal donor lymphocyte filters from blood donations utilizing a Rosette-Sep negative isolation protocol (Stemcell Technologies) and confirmed to be >85% pure by flow cytometry (CD3/CD8 double positive).


TEMRA samples were additionally depleted of CD45RO positive cells by the addition of anti-CD45RO tetrameric antibody complexes and magnetic particles (Stemcell Technologies) to yield CD8+, CD3+, CD45RA+ and CCR7- cells. The use of age-matched controls limited the amount of contaminating naïve cells to less than 15 percent by flow cytometry. RNA was extracted in Trizol, phase-separated and re-suspended in water. RNA samples were verified to be of high quality by analysis on the Agilent 2100 Bioanalyzer prior to the generation of cRNA libraries. Libraries were placed on Illumina Human HT-12 version 4 revision 2 bead arrays, which were prepared and read in the functional genomics core facility at Penn State Hershey Medical Center on the BeadArray Reader from Illumina. Raw data were then quantile normalized using GenomeStudio from Illumina and then transferred to me for the remainder of the analysis. Probes that did not yield a signal greater than 100 for at least 66% of samples were excluded. Data were then imported into Gene Pattern (Subramanian, Tamayo et al. 2005) and missing values were imputed utilizing a K-nearest neighbor algorithm with settings to use 10 neighbor genes, after excluding probes that did not show expression in at least 28 of 37 registry samples. Genes with multiple probes were collapsed so that only the probe with the highest intensity was utilized. Samples were then analyzed for comparative markers and gene set enrichment, utilizing t-test as the method for scoring in gene lists. A total of 10,000 permutations were performed to establish p values between response groups, but mutation-specific results were not permuted because there were fewer than 10 samples per class. Gene Set Enrichment Analysis was then conducted on samples stratified by response and then by mutation type, with t-test again used for ranking, datasets collapsed to gene symbol on highest intensity probe, gene set as the type of permutation, and all other settings left at their default values.
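The probe filtering and probe-to-gene collapsing steps described above can be sketched in a few lines. This is an illustration only, not the GenomeStudio or Gene Pattern implementation: the function names and probe identifiers are hypothetical, and "highest intensity" is assumed here to mean the highest mean intensity across samples.

```python
def filter_probes(expr, min_signal=100.0, min_fraction=0.66):
    """Keep probes whose signal exceeds min_signal in at least min_fraction of samples.

    expr maps a probe ID to its list of intensities, one value per sample.
    """
    kept = {}
    for probe, values in expr.items():
        n_pass = sum(v > min_signal for v in values)
        if n_pass >= min_fraction * len(values):
            kept[probe] = values
    return kept


def collapse_to_gene(expr, probe_to_gene):
    """For each gene keep only the probe with the highest mean intensity (assumed rule)."""
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene[probe]
        mean = sum(values) / len(values)
        if gene not in best or mean > best[gene][0]:
            best[gene] = (mean, values)
    return {gene: values for gene, (_, values) in best.items()}


# Hypothetical three-sample example with made-up probe IDs.
expr = {"probe_1": [150, 200, 90], "probe_2": [50, 60, 300], "probe_3": [500, 400, 450]}
genes = {"probe_1": "GENE_A", "probe_2": "GENE_B", "probe_3": "GENE_A"}
print(collapse_to_gene(filter_probes(expr), genes))
```

In this toy input probe_2 is dropped by the signal filter, and probe_3 represents GENE_A because its mean intensity is higher than that of probe_1.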

Cytokine Studies

Sera from healthy anonymous donors were collected by Florida Blood Services (St Petersburg, FL) and by the Hershey Medical Center Blood Bank (Hershey, PA) and consisted of 39 normal donors, 16 females (average age: 57, range: 23-83) and 23 males (average age: 68, range: 49-100). All patient sera for these studies were from baseline draws prior to the initiation of the treatment regimen.

Cytokines were measured in a multiplex format utilizing the Bio-Plex 200 instrument (BioRad). Analytes were measured by me, grouped into 4 panels. The first panel included soluble FAS, soluble FAS ligand, soluble V-CAM, and soluble I-CAM analytes as part of a custom Human Sepsis Panel 1 from Millipore. The second panel included TGF-Betas 1, 2, and 3. Samples were acidified and then neutralized to release TGF proteins from platelets prior to this assay. The third panel consisted of the following Group 1 cytokine panel (Bio-Rad) analytes: IL-1Beta, IL-1ra, IL-2, IL-4, IL-5, IL-6, IL-8, IL-9, IL-10, IL-12(p70), IL-13, IL-15, IL-17A, IFN-gamma, MIP-1alpha, MIP-1beta, PDGF-BB, and RANTES. Lastly, the fourth panel consisted of three Group 2 cytokine panel (Bio-Rad) analytes: IL-18, IFN-alpha2 and TRAIL.

All serum samples were diluted to manufacturer’s recommendations with appropriate buffers, and manufacturer’s recommendations were also followed for gate setting and reporter PMT settings for the Bio-Plex 200. Serial dilutions from standards of known concentration for each analyte were used to construct a five parameter linear regression model. Mean fluorescent intensity values for each bead region that corresponded to the analytes studied were fit to this model to determine concentrations in picograms per milliliter for each analyte measured.

STAT3 Mutational Analysis

The presence of mutant STAT3 sequences was detected by standard Sanger sequencing of STAT3 exon 20 from either DNA or RNA. Nucleic acids were extracted from PBMC samples that were collected as part of the participant's baseline blood draw or, in a few instances, shortly thereafter. DNA and RNA were extracted with the Wizard SV genomic DNA kit (Promega) or Trizol, respectively. RNA was converted to cDNA utilizing the Transcriptor cDNA synthesis kit and a specific primer located in exon 23. Samples were then amplified with PCR primers specific to the RNA or DNA sequences. Specific bands were gel-extracted and sequenced with BigDye™ v.1.1 Cycle Sequencing kit and the ABI PRISM® 3730xl DNA Analyzer. Sequences were analyzed using ChromasPro and BLAST search. Assays were completed in duplicate by both our lab and our collaborators Hannah Rajala (Koskela) and Satu Mustjoki.

Primers

DNA set

PCR forward      5' GCCAGGCCACTGAACAGGGTG
PCR reverse      5' TCCCATTCCCAGGGATAACTGAGGA

cDNA set

cDNA synthesis   5' GCACTCCGAGGTCAACTCCATGTCAAA
PCR forward      5' ATGGGCTTTATCAGTAAGGAGCGGGAG
PCR reverse      5' GCTGGCAAGGGCTTCTCCTTCTGGGTC
Sequencing       5' GGGTTCAGCACCTTCACCATT

Custom Taqman gene expression assay for mir-223 precursor transcript.

Forward          5' CCAGGGCAGATGGGATATGA
Reverse          5' GTGCTTGGTGAGCATCCTTGT
Taqman probe     5' TGGACTGCCAGCTGG
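A quick sanity check of oligonucleotide sequences such as those listed above can be scripted. The sketch below is not part of the original methods; the helper names are hypothetical, and it only reports length, GC fraction, and reverse complement for the Taqman probe sequence given in the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Taqman probe sequence as listed above.
probe = "TGGACTGCCAGCTGG"
print(len(probe), round(gc_content(probe), 2), reverse_complement(probe))
```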

STAT3 Mutagenesis

An OmicsLink™ expression clone containing the human STAT3 coding sequence (NM_003150) was obtained from GeneCopoeia™. Y640F mutant was obtained from the Hematology Research Unit Finland. S614R, D661Y and D661V mutants were created by PCR site-directed mutagenesis using PfuUltra High-Fidelity DNA polymerase AD (Stratagene) or In-Fusion cloning system (Clontech). Mutations were confirmed by DNA sequencing. This work was performed by Dan Zhang in the Loughran lab.

STAT3 Reporter Assay

The day before the luciferase assay, HEK293T cells were plated in 96-well plates at 20,000 cells/well and maintained in DMEM with 10% fetal bovine serum (Life Technologies). Six hours after plating, the cells were transiently co-transfected with empty OmicsLink™ expression vector pReceiver-M02 or pReceiver-M02 STAT3 plasmids (200 ng/well) with the Cignal STAT3 Reporter (Qiagen) according to the manufacturer’s protocol using Lipofectamine 2000 (Life Technologies). 24 hours after transfection, the Dual-Glo Luciferase Assay kit (Promega) was used to detect the luciferase activity following the manufacturer’s recommendations. To ensure that the expression level of different STAT3 variants was similar, we performed Western blots with a STAT3 antibody (1:000) and a β-actin antibody (1:5000) (Cell Signaling) on parallel whole cell lysates.

Western Blot

Total PBMC or enriched CD8+ samples were lysed with RIPA buffer (Sigma) and run on NuPage 10% Bis-Tris gels (Invitrogen) and then transferred onto PVDF membranes using a Trans-Blot SD transfer cell (Bio-Rad). Blots were washed in 0.1% TBS-Tween and incubated with antibodies in either 5% milk or BSA TBS-T. Antibodies to -1 (3742), C- (9402) and GAPDH (2118) were purchased from Cell Signaling. Blots were developed with the Clarity ECL reagent (Bio-Rad) and imaged using the Chemidoc XRS+ system (Bio-Rad). Bands were quantified with the Quantity One software (Bio-Rad). Bench work was performed by Zainul Hasanali.

Results

Demographics

The study accrued 59 patients between July 16, 1999 and March 24, 2009. Per the two stage design, a response analysis was conducted after the first 17 eligible patients were accrued to each stratum. Since there were more than four responses, accrual continued to a total of 59 patients. The study terminated with 59 patients on March 24, 2009 due to slower than expected accrual.

Of the 59 patients enrolled, four patients were ineligible for Step 1: one patient did not satisfy indications for treatment; two patients did not meet eligibility criteria for diagnosis, having too few LGL; and in one patient, eligibility labs were performed more than four weeks after registration. There were 16 patients enrolled in Step 2 with two patients being ineligible: one patient did not satisfy indications for treatment and the other patient received less than four cycles of MTX therapy in Step 1. Therefore, response and survival analyses were based on 54 eligible patients with data available as of August 2012.

Baseline demographic characteristics of the study patients are shown in Table 2-1. Treatment indication was anemia in 29 (53%) patients and neutropenia in 25 (47%). The median age of the patients was 70 years (range: 20-89 years).

Table 2-1: Step 1 On-Study Characteristics (n=55)

Characteristic                              N       (%)
Gender
  Male                                      30      55%
  Female                                    25      45%
Race
  White                                     52      95%
  Black                                     1       2%
  Other                                     2       4%
Treatment Indication
  Anemia                                    29      53%
  Neutropenia                               26      47%
Performance Status
  0                                         16      29%
  1                                         33      60%
  2                                         6       11%
Age (years)
  Median                                    70
  Range                                     20–89
Hemoglobin (g/dl)
  Median                                    10.3
  Range                                     7.0–16.3
Platelets (K/mm3)
  Median                                    205
  Range                                     49–556
WBC (K/mm3)
  Median                                    5.5
  Range                                     1.0–24.8
Total # of LGL cells/mm3 (n=50)
  Median                                    1724
  Range                                     231–9856
Percentage of cells CD3+/CD57+ (n=49)
  Median                                    46%
  Range                                     3.0–99.5%

Treatment Compliance/Toxicity

Fifty-four patients began MTX therapy while one patient was excluded because of insufficient data. Of the 54 patients on therapy, 81% received at least 4 cycles of MTX treatment and 57% received at least 4 cycles without dose adjustment or omission. The median number of cycles for Step 1 was 5 (range: 1–14). Fourteen (26%) patients began Step 2 therapy and received a median of 3.5 (range: 1–12) cycles of Cy treatment. Seven (50%) of the patients received at least 4 cycles of Cy and 5 (36%) received at least 4 cycles without dose adjustment or omission.

During Step 1, excessive complication or toxicity was the predominant off-treatment reason, followed by progressive disease and patient withdrawal; during Step 2, excessive complication or toxicity was the most common off-treatment reason.

Toxicity was assessed separately for patients with neutropenia versus anemia. Hematologic toxicities were excluded in the calculation of the worst degree for all toxicity, as it is difficult to distinguish treatment-related nadirs from cytopenia due to disease. In Step 1 therapy for 25 neutropenic patients, there was one grade 5 toxicity in a patient with infection associated with neutropenia. Grade 4 toxicities included infection without neutropenia (one patient) and dyspnea (one patient). In Step 1 therapy for 31 anemia patients, there was one grade 5 toxicity in a patient with pneumonitis/pulmonary infiltrates after one cycle of therapy. Grade 4 non-hematologic toxicities included fatigue (two), hyperglycemia (one), hyponatremia (one), dyspnea (one), hypoxia (one), and pneumonitis/pulmonary infiltrates (one).

Sixteen patients were analyzed for toxicity in Step 2 therapy. There were no grade 5 toxicities. Grade 4 non-hematologic toxicity included melena/GI bleeding (one) and increased SGOT (one).

Response

Tables 2-2 and 2-3 summarize the best confirmed response for patients by treatment indication for Step 1 and 2, respectively. Among the 55 eligible patients in Step 1, three (5%) achieved a complete response, eighteen (33%) had a partial response, twenty three (42%) had stable disease, one (2%) had progressive disease, nine (16%) were unevaluable, and one (2%) was unknown. Unevaluable patients did not complete four cycles of therapy for a variety of reasons: early death (three), stroke (one), patient withdrawal (one), loss to follow-up (one), progressive disease after one cycle (one), and toxicity (two). The estimated overall response rate to MTX was 39% with 95% CI: 26%, 53%. For patients with neutropenia, the overall response rate was 44%. Patients with anemia had an overall response rate of 34%.

Among the 14 eligible patients in Step 2, three (21%) achieved a complete response, six (43%) achieved a partial response, two (14%) had stable disease, and three (21%) were unevaluable. The overall response rate for Cy was 64% with 95% CI: 35%, 87%. For patients with neutropenia, the overall response rate was 51%. Patients with anemia had an overall response rate of 83%. Of the six eligible patients who achieved a CR in Step 1 or Step 2, one patient had a molecular remission.
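The confidence interval quoted for the Cy response rate is consistent with an exact (Clopper-Pearson) binomial interval for 9 responders (three CR plus six PR) among 14 patients. The sketch below illustrates that calculation under the assumption that an exact interval was used; the text does not name the method, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, increasing):
    """Bisection on [0, 1] for a function f that is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for x responders among n patients."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


low, high = exact_ci(9, 14)
print(f"9/14 = {9 / 14:.0%}, 95% CI: {low:.0%}, {high:.0%}")
```

The same function can be applied to the Step 1 counts to compare against the interval reported for MTX.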

Survival and Progression-Free Survival

Survival time was defined as the time from registration to the date of death or the date last known alive (censored). The median overall survival for patients presenting with anemia was 69 months. The lower bound of the 95% CI was 29 months, but the upper bound has not been reached to date. The median overall survival for patients presenting with neutropenia has not been reached as of August 2012. The overall survival Kaplan-Meier plot by treatment indication is presented in Figure 2-1. The median follow-up time of 34 surviving patients was 76 months.

Table 2-2: Best Confirmed Response in Step 1 (Eligible Patients)

                           Neutropenia (n=26)    Anemia (n=29)    All Patients (n=55)
Best Confirmed Response    N      Percent        N      Percent   N      Percent
Complete Response          2      8%             1      3%        3      5%
Partial Response           9      35%            9      31%       18     33%
No Change/Stable           11     42%            12     41%       23     42%
Progressive Disease        0      0%             1      3%        1      2%
Unevaluable                3      12%            6      21%       9      16%
Unknown                    1      4%             0      0%        1      2%

Table 2-3: Best Confirmed Response in Step 2 (Eligible Patients)

                           Neutropenia (n=8)     Anemia (n=6)     All Patients (n=14)
Best Confirmed Response    N      Percent        N      Percent   N      Percent
Complete Response          1      13%            2      33%       3      21%
Partial Response           3      38%            3      50%       6      43%
No Change/Stable           2      25%            0      0%        2      14%
Progressive Disease        0      0%             0      0%        0      0%
Unevaluable                2      25%            1      17%       3      21%
Unknown                    0      0%             0      0%        0      0%

Progression-free survival (PFS) was defined as the time from registration to progression or to death without documentation of progression (censored). The Kaplan-Meier plot for PFS is presented in Figure 2-2 for 55 eligible patients in Step 1. The median PFS for patients with anemia was 29 months with a 95% CI: 21, 62 months. For patients presenting with neutropenia, the median PFS has not been reached as of August 2012, but the lower bound of the 95% CI was 23 months. Patients who were alive without a progression were censored at the date of last contact.

Laboratory Correlates

Serum Biomarkers

Of the 27 serum cytokines measured in 41 patients and 37 age-matched normal controls, 9 differed from the normal samples tested (Wilcoxon p-value less than 0.0019; Table 2-4). We confirmed the involvement of proteins known to be dysregulated in LGL leukemia, such as Fas ligand (FASL) and interleukin 18 (Liu, Wei et al. 2000; Kothapalli, Nyland et al. 2005). We also found elevated serum levels of soluble intercellular adhesion molecule (s-ICAM) and soluble vascular cell adhesion molecule (s-VCAM), which have not been previously reported in LGL leukemia. None of these serum biomarkers was predictive of therapeutic response. We were also interested in whether biomarker expression differed between LGL leukemia patients with neutropenia and those with anemia. Of the 9 serum cytokines shown to differ between LGL and normal samples, we observed higher levels of FASL in anemic patients (unadjusted Wilcoxon p = 0.051).
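The 0.0019 cutoff quoted above is consistent with a Bonferroni correction across the 27 cytokines measured. A minimal sketch of that arithmetic follows, assuming the conventional family-wise error rate of 0.05 (the text does not state it explicitly); the function names are illustrative.

```python
N_CYTOKINES = 27   # serum cytokines measured (from the text)
ALPHA = 0.05       # assumed family-wise error rate


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that holds the family-wise error rate at alpha."""
    return alpha / n_tests


def is_significant(p_value, threshold):
    """True when a p-value survives the corrected cutoff."""
    return p_value < threshold


threshold = bonferroni_threshold(ALPHA, N_CYTOKINES)
print(f"{threshold:.4f}")  # 0.0019

# The unadjusted FASL comparison between anemic and neutropenic patients
# (p = 0.051) does not pass this corrected cutoff.
fasl_passes = is_significant(0.051, threshold)
```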

Figure 2-1: Overall Survival (n=55).

Indication     Total    Dead    Alive    Median
Anemia         29       15      14       69 months
Neutropenia    26       6       20       Not yet reached


Figure 2-2: Progression-Free Survival (n=55).

Indication     Total    Failed(a)    Alive(b)    Median
Anemia         29       20           9           29 months
Neutropenia    26       11           15          Not yet reached

(a) This includes patients who progressed or died without a progression.

(b) This includes patients alive without a progression.


Table 2-4: LGL Serum Cytokines (* = p-value<0.05, ** = p-value < 0.0019 and significant after Bonferroni’s correction)

Cytokine                                  Statistic    Normal (N=37)    LGL (N=41)    All (N=78)
TGFBeta1**                                Median       58410            44531         51044
                                          Mean         58600            43929         50889
                                          Std          15984            17459         18226
TGFBeta2*                                 Median       2773             2437          2561
                                          Mean         2731             2441          2578
                                          Std          305              435           404
TGFBeta3**                                Median       5397             4708          4949
                                          Mean         5323             4681          4986
                                          Std          670              889           851
Soluble VCAM**                            Median       395897           803559        572803
                                          Mean         426268           846367        641569
                                          Std          208747           373890        369315
Soluble ICAM**                            Median       127498           184941        144770
                                          Mean         139361           201269        171089
                                          Std          62455            90956         83874
Soluble FAS*                              Median       6359             7018          6552
                                          Mean         6091             7183          6651
                                          Std          1670             2119          1979
Soluble FASL**                            Median       59               111           80
                                          Mean         61               157           110
                                          Std          37               217           164
IFN-alpha2**                              Median       0                72            29
                                          Mean         18               55            37
                                          Std          29               43            41
IL-18**                                   Median       111              528           212
                                          Mean         120              570           357
                                          Std          45               377           355
TRAIL**                                   Median       59               176           93
                                          Mean         59               185           126
                                          Std          32               92            94
IL-8*                                     Median       933              132           580
                                          Mean         2287             34414         19174
                                          Std          4769             116889        85843
IFN-gamma*                                Median       370              453           412
                                          Mean         506              702           609
                                          Std          483              617           562
RANTES** (Mean Fluorescent Intensity)     Median       11910            10850         11531
                                          Mean         11895            10048         10924
                                          Std          1186             2603          2245
IL-6*                                     Median       20               28            22
                                          Mean         334              1231          806
                                          Std          1325             3945          3018

STAT3 Mutational Analyses.

Sanger sequencing of DNA or RNA samples from 50 of the 55 eligible patients was performed to detect the recently discovered mutations in exon 21 of the gene encoding the transcription factor STAT3 (Koskela, Eldfors et al. 2012). In this cohort, 24 of 50 patients (48%) had STAT3 mutations. Mutations resulting in a variant protein were predominantly the amino acid changes Y640F (22%) or D661Y (16%). Other mutations included D661V (4%), N647I (4%) and D661H (2%). Overall, patients with STAT3 mutations were more likely to respond to treatment (p = 0.044). We were then interested in whether the dominant mutations in this study, D661Y or Y640F, correlated with response. Of interest, 8 of 11 (73%) patients with Y640F mutations responded to MTX (Fisher exact p-value 0.036), whereas 3 of 8 patients with D661Y mutations had a response (p-value 0.67). All non-responsive patients with the Y640F mutation were classified as unevaluable, based on an inability to complete at least four courses of MTX. This mutation was therefore 100% predictive of response in those instances where a full course of MTX could be administered.
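The mutation frequencies above can be cross-checked with simple arithmetic. In this sketch the Y640F (11) and D661Y (8) patient counts come from the text; the counts for D661V, N647I and D661H are not stated directly and are inferred here from the reported percentages of 50 patients, which is an assumption.

```python
SEQUENCED_PATIENTS = 50

# Patients per STAT3 variant. The last three counts are inferred from the
# reported percentages (4%, 4%, 2% of 50) rather than stated in the text.
mutation_counts = {"Y640F": 11, "D661Y": 8, "D661V": 2, "N647I": 2, "D661H": 1}


def percent(count, total):
    """Whole-number percentage, as reported in the text."""
    return round(100 * count / total)


total_mutated = sum(mutation_counts.values())
frequencies = {name: percent(n, SEQUENCED_PATIENTS)
               for name, n in mutation_counts.items()}
overall_frequency = percent(total_mutated, SEQUENCED_PATIENTS)

# Response to MTX among carriers of the two dominant mutations.
y640f_response_rate = percent(8, 11)
d661y_response_rate = percent(3, 8)
```

Under these inferred counts the individual variants sum to the 24 mutated patients reported for the cohort.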

Gene expression correlation with response and mutation status.

Gene Set Enrichment Analysis (GSEA) of microarray data has been informative in elucidating survival signaling pathways in LGL leukemia (Shah, Zhang et al. 2008). Therefore, microarray analysis was performed on PBMCs from 37 patient samples, including 19 MTX responders and 18 non-responders. Ranking gene changes in GenePattern, we identified 127 genes with a Z-score greater than 3. The overwhelming majority of the genes ranked as significantly changed were upregulated in the responder phenotype (Figure 2-3). GSEA of KEGG, BioCarta and GenMAPP pathways identified a number of pathways as enriched in patients who responded to MTX. Gene sets with a normalized enrichment score greater than 1.6 are listed in Table 2-5. Myc is a common constituent of several highly ranked gene sets.

Myc, Fos and Pim1 have been shown to be transcriptional targets of Stat3 that mediate cell growth and proliferation (Gronowski, Zhong et al. 1995; Shirogane, Fukada et al. 1999). Our array results indicated these genes tended to be up-regulated in the responders by at least 50 percent (Fos 1.58 fold, Myc 1.76 and Pim1 1.57).

The pathway indicated as VIP is a gene set containing genes that contribute to the inhibition of activation-induced cell death (AICD) by vasoactive intestinal peptide (VIP). Inhibition of AICD, not necessarily through VIP involvement, has been shown previously to be involved in the pathogenesis of LGL leukemia (Zhang, Shah et al. 2008).


Figure 2-3: Heatmap of Baseline Sample Gene Expression in MTX Responders Versus Nonresponders.

The top 126 genes by Z-score are displayed. Red shading indicates upregulation of mRNAs which are listed by official gene symbol at right. Samples are divided into 4 main groups from left to right, normal CD8+, normal TEMRA, patients with response to MTX and lastly those not responding. Individual patient lanes are labeled by mutation type, type of response and trial accession number.

PR = partial response; CR = complete response; NC = no change; PROG = progression.


The upregulation of genes related to oxidative phosphorylation could indicate increased energy demands by the cell or the replication of mitochondria in preparation for division. In support of the latter, one of the most consistently elevated genes in responders is Dynamin 1-like (DNM1L), a gene with an established role in mitochondrial fission (Smirnova, Griparic et al. 2001). This is consistent with the observed ability of Stat3 to increase electron transport chain activity in support of Ras transformation (Gough, Corlett et al. 2009); the Ras pathway also shows upregulation in the responder phenotype. The cellular location and mechanism of the interaction that brings this about remain controversial (Phillips, Reilley et al. 2010). Dynamin 1-like has also been shown to interact with glycogen synthase kinase 3 (GSK3) (Hong, Chen et al. 1998), which also demonstrated gene set enrichment in the responder phenotype.

Pathway enrichment by major mutation types.

As noted before, the Y640F mutated genotype strongly correlated with response to therapy with MTX, whereas the D661 mutated genotype did not. Gene set analysis identical to that performed between responders and nonresponders was conducted on seven Y640F mutant samples and eight samples with D661 mutations, predominantly D661Y. Intriguingly, both the Oxidative Phosphorylation and Aminosugars Metabolism pathways, previously found to be upregulated in MTX responders, were differentially upregulated in Y640F mutants (Table 2-6). Another related key pathway upregulated in Y640F mutants was Purine Metabolism, which may explain the differential MTX response, as inhibition of purine biosynthesis is a primary mechanism of action of low-dose MTX (Smolenska, Kaznowska et al. 1999).


Table 2-5: Pathways Upregulated in Responders to Methotrexate in Baseline Samples.

PATHWAY NAME                          SIZE    ES          NES         NOM p-val    FDR q-val    Myc
HSA00190_OXIDATIVE_PHOSPHORYLATION    94      0.612149    2.343721    0            0
OXIDATIVE_PHOSPHORYLATION             46      0.615098    2.107513    0            0.001189
VIP PATHWAY                           19      0.686932    1.951678    0            0.007198     yes
NTHI PATHWAY                          19      0.654488    1.869974    0            0.019713
HSA05216_THYROID_CANCER               21      0.631448    1.862768    0.001319     0.018062     yes
P38MAPK PATHWAY                       27      0.591922    1.819187    0.001267     0.029064     yes
HSA00530_AMINOSUGARS_METABOLISM       24      0.598269    1.809769    0            0.028299
KREBS_TCA_CYCLE                       27      0.561241    1.722913    0.00246      0.087624
GSK3 PATHWAY                          17      0.613683    1.717902    0.006658     0.082197
FMLP PATHWAY                          28      0.549093    1.700237    0.003686     0.093788
RAS PATHWAY                           21      0.580599    1.690651    0.002581     0.093672
HSA03050_PROTEASOME                   21      0.57235     1.666294    0.007762     0.115172
HSA05221_ACUTE_MYELOID_LEUKEMIA       44      0.490264    1.656008    0.004739     0.119373     yes
MRNA_PROCESSING_REACTOME              90      0.434236    1.643548    0            0.126395

Table 2-6: Pathways Upregulated in Y640F Mutants Versus D661 Mutants.

PATHWAY NAME                          SIZE    ES          NES         NOM p-val    FDR q-val
HSA00190_OXIDATIVE_PHOSPHORYLATION    94      0.567646    2.042502    0            0.001062
OXIDATIVE_PHOSPHORYLATION             46      0.536051    1.757709    0            0.167077
PTDINSPATHWAY                         18      0.637419    1.744706    0.001264     0.13437
MRNA_PROCESSING_REACTOME              90      0.485078    1.741168    0            0.106182
RIBOSOMAL_PROTEINS                    78      0.482901    1.703704    0            0.13044
HSA03050_PROTEASOME                   21      0.602159    1.687073    0.002478     0.135277
HSA00530_AMINOSUGARS_METABOLISM       24      0.569262    1.661507    0.00885      0.15462
PURINE_METABOLISM                     79      0.472723    1.66001     0.002125     0.13758
PROTEASOME                            16      0.627454    1.654199    0.005013     0.131303
HSA04940_TYPE_I_DIABETES_MELLITUS     31      0.538567    1.640318    0.004717     0.138087
ERK PATHWAY                           21      0.570421    1.622       0.015971     0.155567

Pathways from KEGG, GenMAPP and BioCarta as identified in GSEA. Size = gene set size, ES = enrichment score, NES = normalized enrichment score, NOM p-val = nominal p-value, FDR q-val = false discovery rate q-value.
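The overlap between the two pathway tables can be checked programmatically. The sketch below copies the gene set names and NES values from Tables 2-5 and 2-6, confirms that every listed set exceeds the 1.6 NES cutoff used for reporting, and lists the gene sets enriched in both comparisons. The variable names are illustrative only.

```python
NES_CUTOFF = 1.6  # reporting cutoff stated in the text

# Gene set name -> normalized enrichment score, copied from Table 2-5.
responder_nes = {
    "HSA00190_OXIDATIVE_PHOSPHORYLATION": 2.343721,
    "OXIDATIVE_PHOSPHORYLATION": 2.107513,
    "VIP PATHWAY": 1.951678,
    "NTHI PATHWAY": 1.869974,
    "HSA05216_THYROID_CANCER": 1.862768,
    "P38MAPK PATHWAY": 1.819187,
    "HSA00530_AMINOSUGARS_METABOLISM": 1.809769,
    "KREBS_TCA_CYCLE": 1.722913,
    "GSK3 PATHWAY": 1.717902,
    "FMLP PATHWAY": 1.700237,
    "RAS PATHWAY": 1.690651,
    "HSA03050_PROTEASOME": 1.666294,
    "HSA05221_ACUTE_MYELOID_LEUKEMIA": 1.656008,
    "MRNA_PROCESSING_REACTOME": 1.643548,
}

# Gene set name -> normalized enrichment score, copied from Table 2-6.
y640f_nes = {
    "HSA00190_OXIDATIVE_PHOSPHORYLATION": 2.042502,
    "OXIDATIVE_PHOSPHORYLATION": 1.757709,
    "PTDINSPATHWAY": 1.744706,
    "MRNA_PROCESSING_REACTOME": 1.741168,
    "RIBOSOMAL_PROTEINS": 1.703704,
    "HSA03050_PROTEASOME": 1.687073,
    "HSA00530_AMINOSUGARS_METABOLISM": 1.661507,
    "PURINE_METABOLISM": 1.66001,
    "PROTEASOME": 1.654199,
    "HSA04940_TYPE_I_DIABETES_MELLITUS": 1.640318,
    "ERK PATHWAY": 1.622,
}

all_pass_cutoff = all(nes > NES_CUTOFF
                      for table in (responder_nes, y640f_nes)
                      for nes in table.values())

# Gene sets enriched both in responders and in Y640F mutants.
shared_gene_sets = sorted(set(responder_nes) & set(y640f_nes))
```

Five gene sets appear in both tables, including the two oxidative phosphorylation sets and aminosugars metabolism discussed in the text.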

STAT3 Y640F mutant has strongest transcriptional activity.

We speculated that expression of the gene sets enriched in Y640F mutants might result from stronger transcriptional activity of the Y640F mutant relative to the D661 mutants. Both the D661V and Y640F mutations of STAT3 have been shown to increase the transcriptional activity of STAT3, with the Y640F mutation being considerably stronger (Koskela, Eldfors et al. 2012). However, the major D661 mutation observed in this study was D661Y. For this reason, we performed luciferase assays to determine the transcriptional strength of this mutation. D661Y had increased transcriptional activity when compared to an equally expressed wild type, but much less than that observed for the Y640F mutation (15-fold versus 131-fold, respectively). We additionally report the activity of an S614R mutation from exon 20 that was found in two patients from this study (Figure 2-4A).


Figure 2-4: Evidence for Increased Transcriptional Activity of the Y640F STAT3 Mutation. A) The Y640F mutant greatly increased STAT3 transcriptional activity compared to wild type STAT3 and other mutants. HEK293T cells were co-transfected with the Cignal STAT3 reporter, which harbors an SIE response element upstream of a luciferase reporter, and vector alone (vector), wild type STAT3 (WT), or STAT3 mutants (S614R, Y640F, D661Y or D661V). (Bottom) Western blot detecting expression of the different STAT3 variants using a human STAT3 antibody, with a β-actin antibody as loading control. B) Quantitative RT-PCR indicating a reduction of the mir-223 precursor transcript in patient cells harboring the Y640F mutation. Relative expression is normalized to the mean of the Y640F sample group. C) Western blot of MYC and E2F1 in leukemic LGL, normal unactivated control CD8+ cells (NCD8, negative control) and activated CD8+ cells (AcCD8, positive control). Two baseline patient samples from each mutation class are displayed. Bar graphs depict the average level of expression of E2F1 and MYC as a proportion of GAPDH, normalized to unactivated control CD8+ samples.


Additional but indirect evidence of strong STAT3 activity is our finding that the mir-223 precursor transcript was the most significantly downregulated transcript in the Y640F mutation group (2.2-fold downregulated, t-test p-value of 0.0004). We confirmed this array finding with quantitative RT-PCR (Figure 2-4B). Mir-223 has been shown to repress E2F1 translation by binding the 3' UTR of its transcript, and E2F1 has been shown to reciprocally repress the transcript for mir-223 through promoter binding and inhibition of transcription (Pulikkan, Dengler et al. 2009). Potentially, very high E2F1 levels, such as those present prior to replication, could be achieved that would silence mir-223. An increase in E2F1 in leukemic LGL could be the result of increased Myc levels driven by STAT3 (Shirogane, Fukada et al. 1999) (diagrammed in Figure 2-5). Indeed, we observed increased Myc and E2F1 protein levels in LGL cells compared to normal CD8+ controls (Figure 2-4C). LGL counts of 3280/µL in patients with a STAT3 mutation, compared to 1576/µL in patients not harboring a mutation, provide additional support for a more proliferative state in samples harboring mutations (Wilcoxon p-value of 0.051).

Discussion

We describe here the results of the first large prospective study of immunosuppressive agents for the treatment of LGL leukemia. Overall, the efficacy of the first line of treatment with MTX was 39%. This compares to a previously reported response rate of 56% when combining data from several small retrospective studies (Lamy and Loughran 2011). A design feature of the study was to stratify patients according to the indication for treatment, i.e. anemia versus neutropenia. Similar response rates to MTX were observed whether the treatment indication was severe/symptomatic neutropenia or severe/symptomatic anemia. For those patients not responding to MTX, Cy proved to be an effective second line of treatment, with 64% of patients achieving at least a partial response. Although this response rate appeared higher than that for MTX, the difference was not statistically significant given the small number of patients receiving Cy. Given the high rate of response to the second line of treatment, it can be inferred that prior failed treatment with MTX does not negatively influence future response to Cy.

Our correlative laboratory studies confirm and extend previous observations indicating that production of proinflammatory cytokines is characteristic of LGL leukemia. We previously showed high serum levels of soluble Fas, Fas ligand, IFN-gamma, IFN-alpha2, IL-6, IL-8 and IL-18 in patients with LGL leukemia (Kothapalli, Nyland et al. 2005). A fundamental pathogenic mechanism in LGL leukemia is resistance to Fas-mediated death despite high expression of both Fas and Fas ligand in leukemic LGL. Blockade of Fas signaling by soluble Fas is one potential mechanism leading to apoptotic resistance (Liu, Wei et al. 2002). Of interest, serum levels of TRAIL, a pro-apoptotic molecule similar to Fas ligand, were also markedly elevated. Inhibition of TRAIL signaling by TRAIL decoy receptors and subsequent resistance to TRAIL-induced apoptosis has been observed in a number of cancers. We note that LGL leukemic sera have high levels of such decoy receptors (unpublished observation). For the first time in LGL leukemia, we report the presence of elevated soluble VCAM and ICAM, which have been implicated in the pathogenesis of rheumatoid arthritis (Littler, Buckley et al. 1997) and are the subject of potential therapeutic targeting.

[Figure 2-5 diagram. Labeled elements: chronic antigen exposure; inflammatory cytokines; clone size; STAT3; MTX; MYC; mir-223; E2F1; purine pool; proliferation; apoptosis. Numbered steps 1 to 7 are described in the caption. Legend: strong, sustained STAT3 signal resulting from activating mutation; weaker, intermittent STAT3 signal dependent on external stimuli.]

Figure 2-5: Model of Methotrexate Responsiveness and STAT3-Mediated Signaling in LGL Leukemia. LGL leukemia is thought to arise in the context of chronic antigen exposure that maintains LGL survival associated with production of multiple inflammatory cytokines. Downstream effectors including STAT3 amplify this autoimmune loop (1) by increasing the release of inflammatory cytokines through direct transcriptional regulation and indirectly through the support of larger clonal populations. STAT3 mutations (2), Y640F mutations in particular, provide stronger and more sustained STAT3 signaling. This could lead to larger clone size and greater cytokine production, resulting in more symptomatic disease. MTX and other immunosuppressives reduce the levels of such inflammatory cytokines (6). Key downstream targets of STAT3, such as MYC and E2F1 (3), were upregulated in patients who responded to MTX. Levels of mir-223 were low in Y640F mutant samples, consistent with high E2F1 expression (4), as the two are reciprocally regulated. The E2F1 transcriptional program pushes cell cycle transition while monitoring for proper conditions, including available purines (5). We postulate that high E2F1 expression maintained by strong and sustained STAT3 signaling in Y640F mutation-positive responders sensitizes leukemic LGL to apoptosis in the setting of purine depletion caused by low-dose MTX (7).

A major finding of this study was that a mutated STAT3 Y640F genotype predicts therapeutic response to MTX. All patients with this mutation who completed at least 4 cycles of MTX responded to therapy. We propose that purine depletion by MTX in the presence of E2F1 expression leads to apoptosis of leukemic LGL as they attempt to enter the cell cycle (Figure 2-5). This model suggests that increased transcriptional activity of STAT3 may lead to increased sensitivity to MTX. Indeed, we showed that the Y640F mutation associated with clinical response had much stronger transcriptional activity than the other most common STAT3 mutation observed in this study (D661Y). Also supportive of this model is our finding that Y640F mutants displayed 2.2-fold lower expression of the precursor for mir-223. E2F1 and mir-223 have been shown to negatively regulate each other (Pulikkan, Dengler et al. 2009). Activation of this regulatory loop would lead to sustained high levels of E2F1 and low levels of mir-223, as demonstrated in leukemic LGL with a mutated Y640F genotype (Figure 2-4). It is also of interest that not all of the patients with a responder gene signature harbored mutations in STAT3. Our previous work showed that STAT3 was a key hub in the LGL leukemia survival network (Zhang, Shah et al. 2008). Therefore, it is conceivable that mutations in other genes may allow those genes to activate STAT3.

In summary, we report the results of the first prospective trial of immunosuppressive therapy for the treatment of LGL leukemia and identify mutated STAT3 Y640F genotype as predictive of response to MTX. We further postulate a functional role for STAT3 activated via mutation in the molecular response to MTX therapy in LGL leukemia.

Future directions

The general pattern in the responder phenotype is the upregulation of pathways demonstrated to be downstream of STAT3 in support of growth and proliferation. It is unclear why these pathways would show increased levels in the responder subset. One possible explanation is that, because these signals are for the most part driven by a mutated Stat3 molecule, they are not subject to the normal ebb and flow of survival signaling from external stimuli to which cells without inborn errors are subject. If this is true, it is very likely that those samples displaying this gene expression pattern, but devoid of mutations in exon 21 of Stat3, harbor other mutations leading to enforced activation of STAT3. These mutations could be present in other protein domains of STAT3 or in molecules upstream of STAT3 activation.

An alternative hypothesis is that the non-responder cells rely on decreased apoptosis as opposed to increased proliferation. A second alternative is that the symptoms that lead these patients to seek treatment could be less dependent on circulating LGL cell numbers and instead reflect, for example, secretion of factors contributing to cell-contact-independent reductions in RBCs and neutrophils.

Elucidation of the molecular pathways involved and proof of these concepts will require the development of a proper molecular model. Stable transfection of mutant STAT3 molecules into cell types similar to LGL leukemic cells has not been successful to date.

Chapter 3

Whole Genome Sequencing of Paired Normal/Leukemic Samples from Three LGL Leukemia Patients Reveals Additional Mutational Basis for Aberrant Survival of LGL

Introduction

The initial discovery of LGL leukemia benefitted from the presence of chromosomal abnormalities in two of three cases that helped indicate that the expansions of LGL were clonal and therefore leukemic in origin (Loughran, Kadin et al. 1985). Additional cases that were examined did not reveal chromosomal abnormalities to be a common occurrence in LGL leukemia (Loughran 1993). Because of this, T cell receptor spectratyping or gene rearrangement studies are the test of choice to prove clonal expansion in LGL leukemia.

Increased sero-reactivity of LGL leukemia patients to antigens from retroviruses may indicate that these LGL expansions are driven by an unknown chronic retrovirus infection (Loughran, Hadlock et al. 1998; Sokol, Agrawal et al. 2005). LGL leukemia cells display the markers of cells that have been chronically stimulated by antigen (Yang, Epling-Burnette et al. 2008). In this context it was unclear whether leukemic LGL have acquired and require somatic mutations for the aberrant survival signaling and resistance to apoptosis that are characteristic of leukemic LGL.

Constitutive STAT activation in all LGL leukemia has been known since 2001 (Epling-Burnette, Liu et al. 2001). Recently, we discovered an activating mutation in the transcription factor STAT3 that is present in 40% of chronic T-LGL cases (Koskela, Eldfors et al. 2012). In other cohorts this number has been shown to be as high as 62.9%, and the mutation is highly specific for LGL leukemia (Fasan, Kern et al. 2012). This mutation has also been shown to be present in chronic LGL of natural killer (NK) cell origin (Jerez, Clemente et al. 2012), thus unifying the two disorders. This result has been extended to the LGL-related diseases aplastic anemia and myelodysplastic syndrome (MDS), even in instances where no clear clonal expansion of LGL is observed (Jerez, 2013, in submission to Blood, contributing author). Most recently, a few percent of LGL patients who are not STAT3 mutation positive have been shown to have an activating mutation in STAT5B (Rajala, Eldfors et al. 2013). Two of the four patients in that study had disease of an aggressive nature. Both of these patients had a particular N642H mutation that we have since confirmed in five additional patients with aggressive disease (unpublished observation). Together, these findings provide compelling evidence for the ability of mutations to drive LGL proliferation and to affect disease severity.

Chapter Two of this dissertation notes that Y640F mutations in STAT3 identify a favorable outcome group with regard to disease response to the immunosuppressant methotrexate. However, STAT3 mutations were not present in all patients who responded to treatment, so there may be other mutations affecting treatment outcome. These other responders displayed a gene expression profile potentially consistent with an increase in activated STAT3 as compared to nonresponders, indicating that other mechanisms of STAT3 activation may be at work. The study measured therapeutic response based upon amelioration of the primary presenting symptom, anemia or neutropenia. At present we do not know why different patients display different symptoms or whether mutations play a role in this matter.

Our initial collaborative study (Koskela, Eldfors et al. 2012) and the follow-up studies that determined the prevalence of mutations in patient cohorts have, at their broadest, surveyed exomes for mutations and, at their narrowest, consisted of targeted Sanger sequencing. The following study is the first whole genome sequencing project in LGL leukemia. It was undertaken to answer several questions. Is the normal cascade and character of mutations found in other leukemias present in LGL leukemia? Is STAT activation in those patients without direct STAT mutation explained by mutations in other proteins? Is there a mutational basis for the different symptoms observed in LGL patients? The objective of this study was to identify somatic mutations and predict the biological effects of these mutations within the framework of what is already known about LGL leukemia. We expected to see mutations supportive of STAT3 activation and of other cell signaling pathways that promote proliferation and apoptotic resistance in LGL leukemia. Indeed, we found evidence for this: potential alterations in the interleukin 6 receptor axis were found, as well as alterations in fibroblast growth signaling. We also unexpectedly found alterations that may be of an epigenetic nature.

This chapter presents one portion of the data, the single nucleotide changes, and what effects I predict they may have on LGL leukemic biology.

Methods

Sample Selection and Preparation

LGL leukemia patients were chosen from an existing registry maintained by the Loughran lab dedicated to the elucidation of the mechanisms leading to the pathology of LGL leukemia. All patients in this study additionally consented to the use of high throughput sequencing to collect their genetic information and to the deposition of this information into publicly accessible databases in a de-identified manner. See Table 3-1 for complete patient characteristics. All studies were approved by the Penn State Hershey Institutional Review Board. The first two patients were selected because CD3 and CD8 percentages were greater than 85% in their PBMC samples, and they were sequenced concurrently with the discovery of the STAT3 mutation in exome data. The third patient was selected from a group of known STAT3 mutation-negative samples. PBMC DNA was extracted by DNA Wizard columns (Promega) from Ficoll-Paque (GE Healthcare) separated white cell layers. Saliva DNA was collected as a normal control using Oragene saliva DNA collection kits and reagents (DNA Genotek).

Sequencing and Alignment

Four micrograms of DNA from each sample were submitted to the laboratory of Stephan Schuster. DNA libraries were then created and sequenced by Lynn Tomsho. DNA was first sheared in an ultrasonicator (Covaris) and then ligated to the standard sequencing adaptors for the Illumina HiSeq. Insert sizes for paired-end libraries were selected to be 250-300 base pairs. Each genome was sequenced on 4 lanes of the Illumina HiSeq with a target of paired 101 base pair reads. Technical difficulties prematurely ended half of the reads in the second pair of genomes at the 48th base pair, but still allowed for the reading of 76 and 91 billion bases from the saliva and LGL samples of this genome pair, respectively (Table 3-2).
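As a rough sanity check, the raw base counts reported for the second genome pair can be converted to an approximate average depth of coverage. This sketch assumes a genome length of about 3.1 billion base pairs for hg19, a figure not stated in the text, and it ignores unmapped or duplicate reads, so the values are upper-bound estimates rather than the coverage reported in Table 3-2.

```python
GENOME_SIZE_BP = 3.1e9  # assumed approximate length of the hg19 reference


def mean_coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Average depth if every sequenced base mapped uniquely to the genome."""
    return total_bases / genome_size


saliva_coverage = mean_coverage(76e9)  # 76 billion bases from saliva
lgl_coverage = mean_coverage(91e9)     # 91 billion bases from LGL

print(f"saliva: ~{saliva_coverage:.1f}x, LGL: ~{lgl_coverage:.1f}x")
```

Even with half of the reads truncated, this estimate places both samples in the range of roughly 25x to 30x raw depth.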

Mapping and alignment portions of the analysis were then performed with the expert assistance of Webb Miller, Richard Burhans and Aakrosh Ratan. Sequences were aligned to human genome build 19 (hg19) utilizing the Burrows-Wheeler Alignment tool (BWA). BWA utilizes the Burrows-Wheeler transformation, a reversible rearrangement of sequence information into a compressible form, to increase the speed at which a very large number of relatively short sequencing reads can be aligned against a large reference such as the human genome (Li and Durbin 2009).
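To make the transformation concrete, the following toy sketch computes a Burrows-Wheeler transform by sorting all rotations of a short string and then inverts it. This is a teaching illustration only: BWA itself builds a compressed index over the transformed reference rather than sorting rotations naively, and the function names here are not part of BWA.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (toy, quadratic memory).

    The sentinel must not occur in text and must sort before every
    character in it.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Recover the original text by repeatedly prepending and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


encoded = bwt("GATTACA")
decoded = inverse_bwt(encoded)
```

The transform tends to group identical characters together, which is what makes the result compressible, and the round trip shows that no sequence information is lost.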

We used SAMtools version (Li et al., 2009) to call the single substitution variants from these data. Consistently, 3.5 million variants were seen for each genome. The numbers of variants called for each sample are enumerated in Table 3-3. The variants were filtered to remove ones that exceeded a depth-of-coverage limit calculated using the average coverage for the sample and the Lander-Waterman equation (Lander and Waterman 1988). This is helpful in distinguishing the variant calls from structural variants.
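The depth-of-coverage filter can be sketched as follows. Under the Lander-Waterman model, per-base depth is approximately Poisson-distributed around the sample's average coverage, so unusually deep positions are more likely to reflect collapsed repeats or structural variants. The exact cutoff rule used in this study is not stated; the tail probability of 1e-4 below is a hypothetical choice, as are the function names.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return min(total, 1.0)


def depth_ceiling(mean_depth, tail=1e-4):
    """Smallest depth k with P(depth > k) < tail (tail value is assumed)."""
    k = int(mean_depth)
    while 1.0 - poisson_cdf(k, mean_depth) >= tail:
        k += 1
    return k


def passes_depth_filter(depth, mean_depth, tail=1e-4):
    """Keep variant calls whose depth does not exceed the ceiling."""
    return depth <= depth_ceiling(mean_depth, tail)


ceiling_at_30x = depth_ceiling(30.0)
```

A call at several times the average depth fails this filter, while calls near the average pass.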

Variant Filtering and Identification of Transcripts Affected by Mutation.

The output files from SAMtools were imported into Galaxy: "an open, web-based platform for data intensive biomedical research" (www.galaxyproject.org). In our instance, a private mirror of Galaxy was maintained and nicknamed Tadpole. The output of SAMtools for each called variant includes the reference and variant alleles at that base, the number of reads supporting each allele, a called genotype, and a probability that the genotype call is correct. Numerous programs have been designed for the detection of somatic mutations, but they have not been shown to perform well (Pabinger, Dander et al. 2013), so I predicted mutations using simple rules in Galaxy rather than using these programs. A new data format called gd_SNP was created by Webb Miller's group to import these data into Galaxy. This format and the related gd_SAP are now being applied to numerous types of genomic data, and Thomas Loughran and I contributed example workflows for working with cancer genomes (Reina, 2013, in submission). Variants were selected based on the following criteria: the genotypes called for saliva and LGL did not match, the Phred-scaled genotype quality was greater than 20 for both genotypes (less than 1 in 100 probability of an incorrect call), and the read depth at that base was at least 8 reads.

The 'Sorting Intolerant from Tolerant' (SIFT) algorithm was used to determine overlap between variants and known transcripts and to predict whether amino acid changes were damaging to protein function (Kumar, Henikoff et al. 2009). The SIFT algorithm creates a scaled probability matrix of possible amino acids at a position, normalized by the probability of the most frequent amino acid at that position in homologous proteins. If the scaled probability of the substituted amino acid is below a certain probability score, that substitution is predicted to be damaging. This algorithm relies on the assumption that substitutions at highly conserved positions are more likely to be damaging.
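The variant selection rules described above (mismatched genotypes, Phred-scaled quality above 20, and at least 8 reads) can be written as a short filter. This is a sketch under stated assumptions: the record fields are hypothetical and do not reflect the actual gd_SNP column layout, and the depth requirement is assumed to apply to both the saliva and the LGL sample.

```python
from dataclasses import dataclass

MIN_GENOTYPE_QUALITY = 20  # Phred-scaled; calls must exceed this value
MIN_READ_DEPTH = 8         # reads covering the base (assumed per sample)


@dataclass
class GenotypeCall:
    """One sample's call at one position (field names are hypothetical)."""
    genotype: str   # for example "AA" or "AG"
    quality: float  # Phred-scaled genotype quality
    depth: int      # number of reads covering the base


def phred_to_error_probability(quality):
    """Convert a Phred score Q to an error probability, 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


def is_candidate_somatic(saliva, lgl):
    """Apply the three selection rules to a saliva/LGL pair of calls."""
    return (
        saliva.genotype != lgl.genotype
        and saliva.quality > MIN_GENOTYPE_QUALITY
        and lgl.quality > MIN_GENOTYPE_QUALITY
        and saliva.depth >= MIN_READ_DEPTH
        and lgl.depth >= MIN_READ_DEPTH
    )


# Illustrative calls only, not data from this study.
normal_call = GenotypeCall("AA", 45.0, 28)
tumor_call = GenotypeCall("AG", 38.0, 31)
shallow_tumor_call = GenotypeCall("AG", 38.0, 5)
```

A Phred score of 20 corresponds to an error probability of 0.01, which is the 1 in 100 threshold quoted in the text.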

SIFT was implemented in Galaxy to generate a list of transcripts for gene list profiling. This implementation returns both protein coding changes and substitutions in untranslated regions (UTRs). The gene lists were submitted from Galaxy to g:Profiler (freely available at http://biit.cs.ut.ee/gprofiler/) (Reimand, Kull et al. 2007; Reimand, Arak et al. 2011), and genes with mutations in the Jak-Stat pathway were examined.
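The scaled-probability idea behind SIFT can be illustrated with a deliberately simplified sketch. Here the probabilities are raw residue frequencies in one alignment column, scaled by the most frequent residue; the real SIFT algorithm additionally applies sequence weighting and pseudocounts, which are omitted. The 0.05 cutoff is SIFT's conventional default and is an assumption here, since the text does not state the cutoff used, and the alignment column is invented for illustration.

```python
from collections import Counter

DAMAGING_CUTOFF = 0.05  # conventional SIFT cutoff; assumed, not from the text


def scaled_probabilities(column):
    """Residue frequencies in an alignment column, scaled so the most
    frequent residue has a value of 1.0."""
    counts = Counter(column)
    most_common = max(counts.values())
    return {residue: n / most_common for residue, n in counts.items()}


def predict_effect(column, substituted_residue):
    """Label a substitution using its scaled probability in the column."""
    score = scaled_probabilities(column).get(substituted_residue, 0.0)
    return "damaging" if score < DAMAGING_CUTOFF else "tolerated"


# Hypothetical column from 40 homologous proteins at one position.
toy_column = "D" * 30 + "E" * 9 + "N" * 1

effect_of_e = predict_effect(toy_column, "E")  # seen often in homologs
effect_of_n = predict_effect(toy_column, "N")  # seen once
effect_of_y = predict_effect(toy_column, "Y")  # never seen
```

The more conserved the column, the fewer substitutions score above the cutoff, which reflects the assumption that changes at conserved positions are more likely to be damaging.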

Although using the SIFT implementation in Galaxy has the added benefit of providing UTR annotations, its genome annotations are based upon Ensembl build 63. Protein Variation Effect Analyzer (PROVEAN) is an updated version of SIFT that is based upon Ensembl build 66 (Choi, Sims et al. 2012). This program was used to predict protein coding mutations for the purpose of de novo network generation in Ingenuity IPA (http://www.ingenuity.com/products/ipa). This allowed a more seamless integration with the selection of exons for targeted resequencing and will allow insertion and deletion (indel) data to be analyzed in the same manner when they are available. Predicted single nucleotide variants (SNVs) were filtered prior to network generation to exclude those predicted to be synonymous and therefore not affecting protein sequence. Somatic mutations that are present in the Single Nucleotide Polymorphism Database (dbSNP), and therefore less likely to cause severe protein changes, were also excluded, but these SNVs are still reported in the appendix with dbSNP annotation. SNVs passing these filters were combined into one list and submitted to Ingenuity IPA. Ingenuity settings were as follows: network size of 140; Reference set: Ingenuity Knowledge Base (Genes Only); Relationship to include: Direct and Indirect; Includes Endogenous Chemicals; Filter Summary: Consider only relationships where confidence = Experimentally Observed; number of genomes with mutation imported as observation, other.

These settings represent a high stringency for the edges (connections) in the network: each connection between molecules, whether binding (direct) or alteration of transcription (indirect), is backed by published experimental data. These networks can be queried by clicking the nodes and edges to retrieve the reference on which the interaction is based. Importing the number of genomes mutated as "other" data allowed us to colorize these graphs with increasingly intense shades of red to denote more prevalent mutation. These existing networks and references are easily transferable to other IPA users and can be rapidly updated to incorporate new interaction data.
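The SNV filtering that precedes network generation, and the per-gene count of mutated genomes imported as an observation, can be sketched as follows. The record fields, gene names and patient labels are placeholders invented for illustration; they do not reflect the PROVEAN output format or any finding of this study.

```python
from collections import defaultdict


def filter_snvs(snvs):
    """Drop synonymous SNVs and those already catalogued in dbSNP."""
    return [
        snv for snv in snvs
        if snv["effect"] != "synonymous" and not snv["in_dbsnp"]
    ]


def genomes_per_gene(snvs):
    """Count how many distinct genomes carry a retained SNV in each gene."""
    patients_by_gene = defaultdict(set)
    for snv in snvs:
        patients_by_gene[snv["gene"]].add(snv["patient"])
    return {gene: len(patients) for gene, patients in patients_by_gene.items()}


# Placeholder records (hypothetical genes and patients).
example_snvs = [
    {"gene": "GENE_A", "patient": "P1", "effect": "nonsynonymous", "in_dbsnp": False},
    {"gene": "GENE_A", "patient": "P2", "effect": "nonsynonymous", "in_dbsnp": False},
    {"gene": "GENE_B", "patient": "P1", "effect": "synonymous", "in_dbsnp": False},
    {"gene": "GENE_C", "patient": "P3", "effect": "nonsynonymous", "in_dbsnp": True},
    {"gene": "GENE_D", "patient": "P3", "effect": "nonsynonymous", "in_dbsnp": False},
]

retained = filter_snvs(example_snvs)
observation_counts = genomes_per_gene(retained)
```

The resulting per-gene counts play the role of the "number of genomes with mutation" value used to shade nodes in the network.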

Various tracks were then uploaded, with the help of Belinda Giardine, to a local mirror of the UCSC Genome Browser (http://genome.ucsc.edu/). Any member of our lab can currently use this mirror to see what genome data are available for their region of interest. Information was uploaded in a de-identified form and with restricted access to processed bioinformatic data. Upon publication, these data will be freely distributed. Individual aligned reads, coverage depth, SNV and indel calls can all be easily accessed. Figure 3-1 shows the portion of the SNV tracks covering the STAT3 mutations on chromosome 17 to demonstrate how these can be easily read by someone with little bioinformatic training. Indels, when present, are displayed in a similar fashion, with sequence variants and the proportional number of reads supporting each variant being displayed. Figure 3-2 provides a relevant example of coverage depth.


Figure 3-1: Example of SNV Data, the STAT3 D661 Mutations. One representative saliva sample and all three LGL samples are pictured. Heterozygous mutations are indicated for D661V in patient 2 and D661Y in patient 1. Shading of individual bars indicates the fractional proportion of the reads supporting the labeled nucleotide. The black line was added for clarity of the printed image.

Results

Sample Collection

DNA prepared from PBMC was of adequate quality for whole genome sequencing, which was not surprising given the routine use of the kit employed (Wizard DNA, Promega). Fewer than 10 million cells were needed to produce the 4 micrograms of DNA required for library preparation. The use of saliva collection kits (DNA Genotek) provided a non-invasive and simple collection technique that yielded up to 200 micrograms of DNA from a single 4 milliliter collection. The collection consists of 2 milliliters of saliva that is mixed with 2 milliliters of reagent upon closing the tube. Both sample types yielded the high molecular weight genomic DNA needed for library construction, as determined by Bioanalyzer (Agilent).

Demographics.

Two male patients and one female patient were selected for whole genome sequencing based upon presentation with chronic T-LGL leukemia. Although CD3+ and CD8+ cell counts exceeded 85% for all three genome patients, no constitutional or aggressive symptoms of disease were observed. Patient One presented with anemia, with a hemoglobin level of 9.2 g/dL, whereas the normal range for males is between 14 and 18 g/dL. Patient Two has had chronic neutropenia. All three patients display some reduction in neutrophils if the normal level is defined as two thousand neutrophils per microliter, but all remain above the threshold for severe neutropenia of five hundred neutrophils per microliter. All have elevated lymphocyte levels, defined as exceeding four thousand lymphocytes per microliter, with Patient One being the most profound at over five times that level. Patient One has since failed treatment with Cytoxan and methotrexate given as separate courses. Patients One and Two were sequenced first, and their initial analysis indicated mutations in STAT3 as shown in Figure 3-1. Patient Three was then selected from a pool of patients shown to be devoid of mutations in STAT3. All three patients were shown to have T cell receptor (TCR) rearrangement, indicating clonal disease manifestation. None of the three patients was undergoing treatment of disease during the collection of PBMC for these studies. Complete demographic information is contained in Table 3-1.

Table 3-1: Characteristics of Three LGL Leukemia Patients Selected for Whole Genome Sequencing (WBC = white blood cell, ANC = absolute neutrophil count, ALC = absolute lymphocyte count)

Patient  Age at dx  Gender  Year of dx  Clonality  WBC (K/ul)  Hemoglobin (g/dL)  Platelet (K/ul)  ANC (K/ul)  ALC (K/ul)
1        53         Male    2011        TCR+       23.96       9.2                19.6             0.72        22.76
2        67         Female  2006        TCR+       8.3         12.7               244              0.6         7.6
3        56         Male    2010        TCR+       8.07        15.1               228              1.62        5.98

Cytogenetics
Patient 1: Abnormal. 47XY,+Y[2]/46,XY[19]
Patient 2: Normal
Patient 3: ND

Notes
Patient 1: Autoimmune coombs negative macrocytic hemolytic anemia. Splenectomy in Sep 2010, with resulting beneficial increase in ANC. Lack of constitutional symptoms.
Patient 2: Chronic neutropenia. Lack of constitutional symptoms.
Patient 3: Lack of constitutional symptoms.

Peripheral Blood Staining
Patient 1: ND
Patient 2: ND
Patient 3: Minimal anisopoikilocytosis. There is absolute lymphocytosis (5.98 K/uL) consisting primarily of LGLs. The neutrophils have toxic granulation.

Bone Marrow Biopsy
Patient 1: 90% cellular, mild reticulin fibrosis +1 to +2, normal hematopoietic maturation.
Patient 2: ~30% to 40% cellularity. Diffuse interstitial lymphoid infiltrate at around 40% of the marrow. Mild increase in reticulin fibers. Complete maturation of all lineages was seen.
Patient 3: ND

Sequencing and Alignment

Over 1 billion reads were obtained for each of the first four genome samples, with that number approaching 1.5 billion reads for the last two. As indicated previously, the second read of each paired-end library for Patient Two was prematurely truncated at the 48th base pair. Total read bases per genome ranged from 76 billion at the lowest to 153 billion at the highest (Table 3-2).

Table 3-2: Raw Read and Total Base Counts Obtained for Each of Six Whole Genomes.

Individual  Sample  Read Length (read1,read2)  Number of reads  Number of bases
1           Saliva  101,101                    1,037,381,724    104,775,554,124
1           LGL     101,101                    1,203,855,660    121,589,421,660
2           Saliva  101,48                     1,021,967,880    76,136,607,060
2           LGL     101,48                     1,228,611,657    91,531,568,473
3           Saliva  101,101                    1,411,696,650    142,581,361,650
3           LGL     101,101                    1,520,270,670    153,547,337,670
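The relationship between read counts, read lengths and total bases in Table 3-2 can be checked with a short calculation. The sketch below assumes that half of the reads in each library are read 1 and half are read 2, which is an assumption of this example rather than something stated in the text; the function and variable names are illustrative only.

```python
# Rows copied from Table 3-2:
# (individual, sample, read 1 length, read 2 length, reads, reported bases)
TABLE_3_2 = [
    (1, "Saliva", 101, 101, 1_037_381_724, 104_775_554_124),
    (1, "LGL",    101, 101, 1_203_855_660, 121_589_421_660),
    (2, "Saliva", 101,  48, 1_021_967_880,  76_136_607_060),
    (2, "LGL",    101,  48, 1_228_611_657,  91_531_568_473),
    (3, "Saliva", 101, 101, 1_411_696_650, 142_581_361_650),
    (3, "LGL",    101, 101, 1_520_270_670, 153_547_337_670),
]


def expected_bases(reads, read1_len, read2_len):
    """Total bases, assuming equal numbers of read 1 and read 2."""
    return reads * (read1_len + read2_len) / 2


def discrepancies(rows):
    """Reported minus expected base count for each sample."""
    return {
        (individual, sample): reported - expected_bases(reads, r1, r2)
        for individual, sample, r1, r2, reads, reported in rows
    }


for key, diff in discrepancies(TABLE_3_2).items():
    print(key, diff)
```

Under this assumption the truncated second read in the Patient Two libraries lowers the mean read length from 101 to 74.5 bases, which accounts for the smaller base totals for that pair despite read counts similar to those of Patient One.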

A high percentage of the reads were both aligned and mapped to the human genome (Table 3-3). Percentages of alignable reads were similar between saliva and LGL genomes, but lower for the saliva genome in two of the three pairs (Patients Two and Three). This could indicate a low amount of contamination by non-human DNA in these samples from sources such as high molecular weight DNA of the bacterial flora. The highest alignment percentage was 91.53 percent. The 8.5 percent or more of reads that do not align are likely due to gaps in the current framework of the human genome and represent areas that are not properly annotated or included in hg19 (Stephan Schuster, personal communication). Only reads with unique alignments were mapped in the instances where a read aligned to more than one region. Overall, greater than 98% coverage of hg19 was achieved for every sample, with average base coverage ranging from 23.15 to 46.89.

Table 3-3: Alignment Statistics for Whole Genome Data Aligned to hg19

Individual  Sample  Generated bases (Gb)  Aligned bases (Gb)  Mapped bases (Gb)  Reference covered  Coverage
1           Saliva  104.77                92.23 (88.02%)      88.97 (84.92%)     99.87%             32.23
1           LGL     121.59                105.66 (86.90%)     101.64 (83.59%)    99.81%             36.93
2           Saliva  76.14                 66.23 (86.99%)      63.38 (83.24%)     99.18%             23.15
2           LGL     91.53                 83.78 (91.53%)      80.27 (87.70%)     99.20%             29.28
3           Saliva  142.58                119.33 (83.69%)     114.85 (80.55%)    99.20%             41.72
3           LGL     153.54                134.10 (87.33%)     128.57 (83.73%)    98.27%             46.89
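The percentages in parentheses in Table 3-3 appear to be the aligned and mapped bases expressed as a fraction of the generated bases. The following sketch recomputes them from the rounded gigabase values in the table; it is a consistency check with illustrative names, not part of the original analysis, and small differences in the last digit are expected because the inputs are already rounded.

```python
# Rows copied from Table 3-3:
# (individual, sample, generated Gb, aligned Gb, aligned %, mapped Gb, mapped %)
TABLE_3_3 = [
    (1, "Saliva", 104.77,  92.23, 88.02,  88.97, 84.92),
    (1, "LGL",    121.59, 105.66, 86.90, 101.64, 83.59),
    (2, "Saliva",  76.14,  66.23, 86.99,  63.38, 83.24),
    (2, "LGL",     91.53,  83.78, 91.53,  80.27, 87.70),
    (3, "Saliva", 142.58, 119.33, 83.69, 114.85, 80.55),
    (3, "LGL",    153.54, 134.10, 87.33, 128.57, 83.73),
]


def percent(part_gb, whole_gb):
    """Express part as a percentage of whole."""
    return 100.0 * part_gb / whole_gb


def recomputed(rows):
    """Return (sample key, aligned %, mapped %) recomputed from the Gb columns."""
    return [
        ((ind, sample), percent(aligned, generated), percent(mapped, generated))
        for ind, sample, generated, aligned, _, mapped, _ in rows
    ]


for key, aligned_pct, mapped_pct in recomputed(TABLE_3_3):
    print(key, round(aligned_pct, 2), round(mapped_pct, 2))
```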

Figure 3-2: Coverage Across TCR-Beta Gene Locus in Genome Patient Two. A lower read depth across this locus (area under bar) in the LGL genome indicates gene rearrangement has occurred. No decrease is observed in the saliva genome. The area of no coverage may represent a structural variation as it is not observed in any of the other four genome samples (not shown).


Genome coverage (read depth) was examined in the region of chromosome 7q34, the T cell receptor beta chain locus. A decrease in coverage was observed for the leukemic samples of all three genome pairs. No coverage decrease was observable in the saliva samples, providing at least partial evidence that leukemic LGL were not present in these normal control samples. A visual representation for one sample pair is presented in Figure 3-2.

Identification of Somatic Variants Affecting Transcripts.

Approximately 3.5 million substitution variants relative to hg19 were identified per genome, a number consistent between LGL and saliva samples and across patients. An appreciable percentage of these variants were unique to individual patients, resulting in 5.5 million variants being detected in the entire dataset. The vast majority of these SNVs represent normal human variation between individual patients and the human reference genome. After filtering to retain SNVs that were somatic in nature, fewer than 1% of the original 3.5 million SNVs per genome passed the filter. Roughly 30,000 had altered allelic ratios in the LGL genomes, compared to the saliva genomes, in the first two patients. Fourteen thousand had altered ratios in the third genome patient (Table 3-4). These SNVs represent allelic ratios that are the result of somatic mutation and the coincident introduction of a new allele, as well as those that are the result of loss-of-heterozygosity.

Table 3-4: Substitution Variants Between LGL Patients and the Human Reference Genome

Genome  Sample  Substitution Variants  Altered Allele Ratio in LGL
1       Saliva  3,516,755              -
1       LGL     3,507,966              30,023
2       Saliva  3,509,405              -
2       LGL     3,524,117              30,490
3       Saliva  3,537,118              -
3       LGL     3,549,673              14,207
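The statement that fewer than 1% of substitution variants show an altered allele ratio can be verified directly from the counts in Table 3-4. The sketch below is a worked calculation using those counts; the dictionary layout and function name are chosen for this example only.

```python
# LGL rows of Table 3-4: genome -> (substitution variants, altered allele ratio)
TABLE_3_4_LGL = {
    1: (3_507_966, 30_023),
    2: (3_524_117, 30_490),
    3: (3_549_673, 14_207),
}


def altered_percent(substitution_variants, altered):
    """Percentage of substitution variants with an altered allele ratio in LGL."""
    return 100.0 * altered / substitution_variants


percents = {
    genome: altered_percent(total, altered)
    for genome, (total, altered) in TABLE_3_4_LGL.items()
}

for genome, pct in percents.items():
    print(f"genome {genome}: {pct:.2f}% of substitution variants")
```

By this calculation the third genome has a noticeably smaller fraction of altered variants than the first two, in line with the reduced number of candidate somatic SNVs reported for that patient.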

Presence of Mutation Potentially Affecting Activation of JAK-STAT Signaling Pathway.

Coincident with the discovery of STAT3 mutations in exome data (Koskela, Eldfors et al. 2012), STAT3 mutations were found in patient genomes One and Two, corresponding to amino acid changes of D661Y and D661V respectively. As noted in Figure 3-1 and consistent with the Sanger sequencing screening of genome pair three, no mutation in STAT3 was observed in the filtered SNVs of genome Patient Three. g:Profiler did not report any additional mutations in the JAK-STAT pathway for Patient Three and reported only the STAT3 mutation for the first two. The analysis was expanded in SIFT to examine regulatory regions. Reads supporting 3' UTR mutations were then noted in the interleukin 6 receptor (IL6R, :154437915 G/C) and hepatocyte nuclear factor 4 alpha (HNF4A, chromosome 20:43058344 A/C). Targetscan 6.2 (Lewis, Burge et al. 2005) (www.targetscan.org, queried 5/24/2013) was used to determine the proximity of these mutations to miRNA binding sites. Only the IL6R mutation lay in close proximity to a highly conserved site, corresponding to a potential binding site for the miR-34a/miR34c-5p/miR449a/miR449b family. Figure 3-3 illustrates the potential binding site alteration, the biological effect of which must still be validated experimentally. The G to C transversion may be altering a conserved motif recently identified and predicted to affect the response of the RNA induced silencing complex (Helwak, Kudla et al. 2013).


Figure 3-3: Targetscan Prediction of miRNA Binding Sites in One Region of the IL6R 3' UTR. The top line represents genomic sequence and the bottom line represents the indicated miRNA. The mutation observed in genome patient three corresponds to the leftmost G of the top strand, which becomes heterozygous G/C (red underline). The white area indicates the seed region used to predict binding sites of miRNAs, as it is common to observe complete complementarity in this region (Lewis, Burge et al. 2005). Binding in the 3' regions of the miRNA is poorly predicted in silico as there are frequent bulges between miRNA and target, but there is evidence that a GC containing motif is frequently conserved in the 5' region of the template (Helwak, Kudla et al. 2013).

Network and Pathway Analysis of Potential Mutations in Known Proteins.

Corresponding to the reduction in SNVs observed in genome three, a reduced number of potentially altered proteins was also observed. Table 3-5 contains the summary statistics of PROVEAN analysis for the filtered SNVs from all three genomes. Performance of the earlier used SIFT program and PROVEAN differed slightly, in most instances due to the usage of different genome annotations. Complete results, including altered proteins, amino acid changes, predicted effect on protein and whether the suspected mutation encompasses a known SNP, are included in Appendix A for all three whole genome pairs. Within the novel mutations, network and pathway analysis was restricted to those genes whose mutations were predicted by PROVEAN to be deleterious or were present in more than one LGL patient, a total of 77 distinct genes. Table 3-6 specifically lists the genes that passed these filters.

Figure 3-4 contains the largest and highest scoring network created in this manner. Networks are scored based upon the interconnectedness of the molecules (nodes) by experimental evidence of direct and indirect interactions (edges). Genes not included in the mutated list but annotated as interacting are filled in to complete the network. Primary observation of the largest network reveals a core consisting of several molecules that are kinases or are downstream of kinases. Extracellular signal-regulated kinase (ERK), mitogen activated protein kinase (MAPK) and nuclear factor kappa B (NFKB) are three examples of downstream effectors. Platelet derived growth factor (PDGF) and the TCR co-receptor molecule CD3 are examples of upstream regulators. As these are well-studied molecules, they make up a dense core of interactions. Of these core molecules, STAT3 appears to be unique in that it is the only core molecule found to be mutated. Nitric oxide synthase 3 (NOS3) and protein kinase C gamma (PRKCG) do not have nearly as many edges as STAT3 but are somewhat interior as well. The network was examined to find those mutated genes with multiple connections to the interior core. Ubiquitin modifying proteins altering the activation of NFKB and ERK, among other cell signaling molecules, are presented in Figure 3-5. Alterations in fibroblast growth factor signaling are presented in Figure 3-7. A cluster of histone modifying enzymes (Figure 3-6) was found to affect multiple samples, including the only mutations supported by sequence reads in all three samples. Dysregulation of neither fibroblast growth factors nor histone modifying enzymes has been previously published in regard to LGL leukemia. Largely independent of these graphs, Table 3-7 details those genes found to be mutated in more than one LGL genome. A total of 13 genes were predicted to be mutated in more than one sample.
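Finding genes mutated in more than one genome amounts to grouping calls by gene and counting distinct genomes. The sketch below illustrates this with a handful of (gene, genome) pairs taken from the mutation table in this chapter; the function names and the `min_genomes` parameter are hypothetical, and the actual analysis was performed on the full filtered SNV lists.

```python
from collections import defaultdict

# A small subset of (gene symbol, genome) pairs from the mutation table.
CALLS = [
    ("NCOR1", 1), ("NCOR1", 2), ("NCOR1", 3),
    ("MLL2", 1), ("MLL2", 2),
    ("STAT3", 1), ("STAT3", 2),
    ("FGFR2", 1),
    ("FGF7", 3),
    ("TNFAIP3", 2),
]


def genomes_per_gene(calls):
    """Map each gene to the set of genomes in which it is mutated."""
    seen = defaultdict(set)
    for gene, genome in calls:
        seen[gene].add(genome)
    return dict(seen)


def recurrent_genes(calls, min_genomes=2):
    """Genes mutated in at least min_genomes distinct genomes, sorted by symbol."""
    return sorted(
        gene
        for gene, genomes in genomes_per_gene(calls).items()
        if len(genomes) >= min_genomes
    )


print(recurrent_genes(CALLS))
print(recurrent_genes(CALLS, min_genomes=3))
```

Counting distinct genomes rather than rows matters for genes such as MUC4, where several different variants are reported within the same genome.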

Table 3-5: Summary Statistics of PROVEAN and SIFT Analysis of Filtered SNVs. Bolded numbers are those selected for network analysis and present in Table 3-6.

Genome  Type                Total  dbSNP+  dbSNP- (novel)  PROVEAN neutral  PROVEAN Deleterious  PROVEAN NA  SIFT tolerated  SIFT damaging  SIFT NA
1       Protein coding      229    140     89              174              50(27)               5           162             56             11
1       Single AA Change    154    90      64              104              50                   0           96              55             3
1       Synonymous          70     48      22              70               0                    0           66              1              3
1       Nonsense            5      2       3               0                0                    5           0               0              5
1       Input error         0      0       0               0                0                    0           0               0              0
1       Non protein coding  30132  17552   12580           0                0                    30132       0               0              30132
2       Protein coding      243    144     99              186              49(28)               8           168             57             18
2       Single AA Change    159    92      67              110              49                   0           96              57             6
2       Synonymous          76     49      27              76               0                    0           72              0              4
2       Nonsense            8      3       5               0                0                    8           0               0              8
2       Input error         0      0       0               0                0                    0           0               0              0
2       Non protein coding  30537  16409   14128           0                0                    30537       0               0              30537
3       Protein coding      136    88      48              107              28(13)               1           97              27             12
3       Single AA Change    88     54      34              60               28                   0           55              27             6
3       Synonymous          47     34      13              47               0                    0           42              0              5
3       Nonsense            1      0       1               0                0                    1           0               0              1
3       Input error         0      0       0               0                0                    0           0               0              0
3       Non protein coding  14224  8892    5332            0                0                    14224       0               0              14224

Table 3-6: Predicted Mutations Not Present in dbSNP. Limited to those that are novel and predicted deleterious or novel and mutated in more than one patient. Input string for PROVEAN: chromosome number, 1-based coordinate, reference allele, variant allele, genotype (saliva_LGL).

Input                 Gene Symbol     Amino Acid Change  PROVEAN Prediction  Genome Sample
5,148637949,G,T,2_1   ABLIM3          K645N              Deleterious         2
9,33385733,C,T,2_1    AQP7            M218I              Neutral             2
9,33385235,T,G,2_1    AQP7            Y265S              Neutral             3
22,42089768,T,G,1_2   C22orf46        V173G              Deleterious         1
6,2623686,T,C,2_1     C6orf195        N124S              Deleterious         1
12,2602398,C,T,2_1    CACNA1C         T320M              Deleterious         1
12,2760864,G,A,2_1    CACNA1C         R1322H             Deleterious         2
15,93522449,C,G,2_1   CHD2            L938V              Deleterious         1
2,97474433,T,G,2_1    CNNM4           I182S              Deleterious         1
3,148577673,C,T,2_1   CPB1            R380*              NA                  2
3,98538108,G,A,2_1    DCBLD2          A342V              Deleterious         2
Y,15025754,G,T,2_1    DDX3Y           C218F              Deleterious         1
1,182812436,T,G,2_1   DHX9            V40G               Deleterious         3
1,21605755,A,C,2_1    ECE1            V56G               Deleterious         1
1,226027704,C,A,2_1   EPHX1           Y299*              NA                  1
2,97757296,C,T,1_2    FAHD2B          G50R               Deleterious         1
20,26061956,C,A,2_1   FAM182A         A103E              Deleterious         3
20,25755770,C,G,1_2   FAM182B         Q59H               Deleterious         2
20,25755889,A,G,1_2   FAM182B         C23R               Deleterious         2
15,49776539,A,T,2_1   FGF7            E83D               Deleterious         3
10,123298190,C,G,2_1  FGFR2           V222L              Deleterious         1
1,240255773,A,C,2_1   FMN2            S122R              Deleterious         3
2,153486204,T,A,2_1   FMNL2           I812K              Deleterious         1
9,132662285,A,G,1_2   FNBP1           L549P              Deleterious         1
4,79461793,C,T,2_1    FRAS1           R3852*             NA                  2
7,151699853,A,G,1_2   GALNTL5         E238G              Deleterious         2
15,20740330,C,A,2_1   GOLGA6L6        E474*              NA                  1
3,72957607,A,G,2_1    GXYLT2          E122G              Deleterious         2
X,153224057,A,G,1_2   HCFC1           L589P              Deleterious         2
12,123333353,A,G,2_1  HIP1R           D104G              Deleterious         3
6,32549589,A,C,2_1    HLA-DRB1        S133A              Neutral             1
6,32549589,A,C,2_1    HLA-DRB1        S133A              Neutral             2
1,245021425,T,C,2_1   HNRNPU          Q238R              Deleterious         2
17,39254133,G,T,2_1   KRTAP4-8        S68R               Deleterious         3
4,41678420,C,G,2_1    LIMCH1          P769A              Deleterious         2
16,33961779,C,A,2_1   LINC00273       G221V              Deleterious         2
16,33961779,C,A,2_1   LINC00273       G221V              Deleterious         3
11,46898750,C,T,2_1   LRP4            G1093R             Deleterious         3
X,148798267,T,G,1_2   MAGEA11         V345G              Deleterious         2
X,26235772,T,G,2_1    MAGEB5          N118K              Deleterious         2
5,162940595,G,A,2_1   MAT2B           R87K               Deleterious         2
15,94901745,A,G,2_1   MCTP2           E402G              Deleterious         1
9,5897614,T,G,1_2     MLANA           C45W               Deleterious         1
9,5897614,T,G,1_2     MLANA           C45W               Deleterious         2
12,49440443,C,A,2_1   MLL2            C1456F             Deleterious         1
12,49420606,C,T,2_1   MLL2            R5048H             Deleterious         2
7,151932945,C,T,1_2   MLL3            R909K              Neutral             1
7,151932945,C,T,1_2   MLL3            R909K              Neutral             2
3,195507053,A,C,1_2   MUC4            S3800A             Neutral             1
3,195507062,C,T,1_2   MUC4            D3797N             Neutral             1
3,195508510,A,T,2_1   MUC4            L3314H             Neutral             1
3,195508249,A,G,1_2   MUC4            V3401A             Neutral             3
3,195509974,A,G,2_1   MUC4            F2826S             Neutral             3
3,195510266,A,C,1_2   MUC4            S2729A             Neutral             3
3,195513515,C,T,2_1   MUC4            A1646T             Neutral             3
17,16068377,C,G,2_1   NCOR1           K178N              Deleterious         1
17,16068377,C,G,2_1   NCOR1           K178N              Deleterious         2
17,16068377,C,G,2_1   NCOR1           K178N              Deleterious         3
19,1391014,A,T,2_1    NDUFS7          T44S               Deleterious         3
7,150707313,C,T,2_1   NOS3            R875W              Deleterious         1
X,100103684,C,A,1_2   NOX1            Q501H              Deleterious         1
11,59480654,A,G,2_1   OR10V1          I222T              Deleterious         3
7,143747712,C,A,2_1   OR2A5           A73D               Deleterious         2
9,77746692,A,G,1_2    OSTF1           E68G               Deleterious         2
8,101721839,C,A,2_1   PABPC1          V365L              Deleterious         1
3,195994232,A,G,1_2   PCYT1A          L41P               Deleterious         2
12,11461553,T,C,1_2   PRB4            R122G              Neutral             1
12,11461553,T,C,2_1   PRB4            R122G              Neutral             3
1,203452536,G,T,1_2   PRELP           R75L               Deleterious         1
19,54403903,G,A,2_1   PRKCG           G492E              Deleterious         1
7,142458929,G,C,2_1   PRSS1           K70N               Neutral             2
7,142460339,G,A,2_1   PRSS1           C171Y              Deleterious         1
9,33796799,A,T,2_1    PRSS3           T81S               Neutral             1
9,33798574,G,A,1_2    PRSS3           S239N              Neutral             2
7,157341693,C,T,2_1   PTPRN2          A958T              Deleterious         1
3,49137448,T,G,2_1    QARS            D403A              Deleterious         1
16,53644898,G,A,2_1   RPGRIP1L        Q1228*             NA                  3
X,20205967,C,T,2_1    RPS6KA3         W223*              NA                  2
3,46539703,G,A,2_1    RTP3            A51T               Deleterious         2
19,52033038,T,G,2_1   SIGLEC6         T318P              Deleterious         2
7,150937277,T,C,1_2   SMARCD3         E352G              Deleterious         1
10,112361502,G,T,2_1  SMC3            D918Y              Deleterious         2
17,40474420,C,A,2_1   STAT3           D661Y              Neutral             1
17,40474419,T,A,2_1   STAT3           D661V              Neutral             2
20,2397899,T,G,2_1    TGM6            V453G              Deleterious         2
7,150500797,C,A,2_1   TMEM176A        Y144*              NA                  1
6,159050783,A,G,2_1   TMEM181         Y449C              Deleterious         2
21,19698793,C,A,2_1   TMPRSS15        G626V              Deleterious         1
12,29936515,A,G,2_1   TMTC1           I57T               Deleterious         3
6,138196024,G,A,2_1   TNFAIP3         W113*              NA                  2
16,88926080,C,A,2_1   TRAPPC2L        Y72*               NA                  2
14,81610012,A,G,2_1   TSHR            H537R              Deleterious         2
2,179647637,C,T,2_1   TTN             R953H              Deleterious         3
6,35467830,A,G,2_1    TULP1           Y422H              Deleterious         1
20,48700749,C,T,2_1   UBE2V1          G28R               Deleterious         2
1,13219565,C,T,2_1    WI2-2994D6.2.1  V7M                Deleterious         1
19,44933653,A,T,2_1   ZNF229          C435S              Deleterious         2
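The Input column of the table above follows the comma-separated layout given in its caption: chromosome, 1-based coordinate, reference allele, variant allele and a saliva_LGL genotype code. A minimal parser for that layout might look like the sketch below. The class and field names are hypothetical, and the genotype codes are carried through as text because their meaning is not defined in this chapter.

```python
from typing import NamedTuple


class VariantInput(NamedTuple):
    chromosome: str       # "1" to "22", "X" or "Y", kept as text
    position: int         # 1-based coordinate
    ref: str              # reference allele
    alt: str              # variant allele
    saliva_genotype: str  # genotype code for the saliva sample (uninterpreted)
    lgl_genotype: str     # genotype code for the LGL sample (uninterpreted)


def parse_input(text: str) -> VariantInput:
    """Parse one 'chrom,pos,ref,alt,saliva_LGL' string."""
    chromosome, position, ref, alt, genotype = text.strip().split(",")
    saliva_genotype, lgl_genotype = genotype.split("_")
    return VariantInput(chromosome, int(position), ref, alt,
                        saliva_genotype, lgl_genotype)


# The two STAT3 entries from the table.
d661y = parse_input("17,40474420,C,A,2_1")
d661v = parse_input("17,40474419,T,A,2_1")
print(d661y)
print(d661v.position - d661y.position)
```

Parsed this way, the two STAT3 variants fall on adjacent bases of chromosome 17, consistent with both changes affecting the same D661 codon.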

Figure 3-4: IPA De Novo Network of Genes Mutated in LGL Patient Genomes. Proteins found to be mutated in LGL genomes are represented in red, with increasing intensity indicating a larger number of genomes with a mutation in that gene. Solid lines indicate direct interactions such as protein binding, whereas indirect interactions such as induction of expression are indicated by dotted lines. The heavily connected central core (C) consists of receptors, co-receptors, and downstream effectors of tyrosine kinases. Although not visible in this representation, examples include MAPK, ERK, NFkB, PDGF, and CD3. The only mutated protein in this core is STAT3 (S), but several outer mutated nodes have multiple connections acting inward: UBE2V1 (U), TNFAIP3 (T), and FGF7/FGFR2 (F). Proteins with nuclear function (N), MLL2, MLL3 and NCOR1, do not appear to be multiply connected to the interior of the network. The U, T and F regions are expanded in subsequent figures.


Figure 3-5: Altered Proteins Involved in the Regulation of Nuclear Factor kappa B (NFkB). Tumor necrosis factor, alpha-induced protein 3 (TNFAIP3, also known as A20) and Ubiquitin-conjugating enzyme E2 variant 1 (UBE2V1, also known as CROC1) are shown with their network connections. In genome patient two alone, sequencing data support a premature stop codon at W113 in TNFAIP3 and a deleterious G28R mutation in UBE2V1. UBE2V1 is needed for activation of NFkB by TNF receptor associated factors (Andersen, Zhou et al. 2005; Syed, Andersen et al. 2006). TNFAIP3 has been shown to inhibit NFkB (Song, Rothe et al. 1996).

Gene    Patient 1  Patient 2  Patient 3
MLL2    C1456F     R5048H     -
MLL3    R909K      R909K      -
NCOR1   K178N      K178N      K178N

Figure 3-6: Histone Modifying Proteins Altered in LGL. Myeloid/lymphoid or mixed lineage leukemia 2 and 3 (MLL2, MLL3) are histone lysine-4 methyltransferases and form a complex involved in transcriptional activation. This locus has been shown to be frequently deleted in leukemias (FitzGerald and Diaz 1999; Ruault, Brun et al. 2002). Nuclear receptor co-repressor 1 (NCOR1) is the only gene predicted to be mutated in all three LGL patients. This protein has been shown to recruit histone deacetylases (HDACs), which was first realized in acute myeloid leukemia (AML) (Wang, Hoshino et al. 1998).


Gene    Patient 1  Patient 2  Patient 3
FGFR2   V222L      -          -
FGF7    -          -          E83D

Figure 3-7: Fibroblast Growth Factor Receptor Signaling Mutations. A predicted V222L mutation in fibroblast growth factor receptor 2 (FGFR2) in genome patient 1, who also carries a STAT3 mutation, could be acting to strengthen STAT3 signaling. In contrast, a predicted E83D mutation in fibroblast growth factor 7 (FGF7) could be compensating for the lack of STAT3 mutation in genome patient three. FGFR axis activity has not been previously reported for LGL leukemia and may be a new area of research.

Table 3-7: Genes with Evidence of Mutation in 2 or More LGL Genomes.

Symbol (# of genomes), Gene name: Proposed function and relation to LGL leukemia.

AQP7 (2/3), Aquaporin 7: Aquaporin 7 has been recently shown to be involved in the uptake of antigen and migration of dendritic cells (Hara-Chikuma, Sugiyama et al. 2011); perhaps it similarly affects migration of T-LGL. Aquaporin 3 has been shown by this group to affect migration of T cells as well (Hara-Chikuma, Chikuma et al. 2012).

CACNA1C (2/3), Calcium Channel, Voltage-Dependent, L Type, Alpha 1C Subunit: Apoptotic resistance of LGL leukemia is well known. Intracellular calcium has been shown in some instances to induce T cell apoptosis (Qu, Al-Ansary et al. 2011); therefore an alteration in this protein could affect this pathway in LGL leukemia.

HLA-DRB1 (2/3), Major Histocompatibility Complex, Class II, DR Beta 1: Previous studies have demonstrated an association with a region in linkage disequilibrium with this locus, but not any particular allele, and the presence of Felty's syndrome or LGL leukemia with rheumatoid arthritis (Coakley, Brooks et al. 2000). MHC-II molecules present antigen to immune cells that express TCRs.

LINC00273 (2/3), Long Intergenic Non-Protein Coding RNA 273: Uncertain; non-coding RNAs may affect the translation and stability of other RNAs in some instances.

MLANA (2/3), Melanoma Antigen Recognized By T-Cells 1: Most commonly known as an antigen that, with proper orientation, allows melanoma cells to be attacked by normal LGL of the immune system (Kawakami, Eliyahu et al. 1994). Important to the development of melanosomes by the stabilization of G-protein receptor (GPR) 143 (Giordano, Bonetti et al. 2009). This protein could affect recognition of LGL by immune cells or stabilize a similar GPCR.

MUC4 (2/3), Mucin 4, Cell Surface Associated: A protective glycoprotein of epithelia that has also been shown to activate Erythroblastic Leukemia Viral Oncogene Homolog (ERBB2, also known as HER2) (Carraway, Perez et al. 2002).

PRB4 (2/3), Proline-Rich Protein BstNI Subfamily 4: A proline rich salivary protein (Lyons, Azen et al. 1988). Relation to T cells or immunity is not known.

PRSS1 (2/3), Protease, Serine, 1, and PRSS3 (2/3), Protease, Serine, 3: These two genes are located within the region that is removed during TCR genomic rearrangement (see Figure 3-2). It is conceivable that these mutations are a consequence of that action or an artifact due to the unusual alignments this creates.

STAT3 (2/3), Signal transducer and activator of transcription 3: Constitutive activation of STAT3 is a hallmark of LGL leukemia (Epling-Burnette, Liu et al. 2001).

NCOR1 (3/3), Nuclear receptor co-repressor 1; MLL2 (2/3), Myeloid/Lymphoid Or Mixed-Lineage Leukemia 2; MLL3 (2/3), Myeloid/Lymphoid Or Mixed-Lineage Leukemia 3: Histone modifiers; more detail in Figure 3-6.

Discussion

This chapter describes the sequencing and bioinformatic analyses of whole genome SNVs from three matched normal saliva/LGL leukemia genome pairs. To date, the comparison of a leukemic sample to a normal genome derived entirely from a saliva sample has not been published. Sequencing characteristics of this new sample type appear adequate for a normal control, and the data generated are of sufficient quality to make comparisons and discover somatic changes. Galaxy tools proved to be robust enough to carry out the analysis of somatic mutations utilizing data formats and tools new to this analysis (Reina, 2013, in preparation, contributing author). The most proximal need in analyzing these data was to determine which of the tens of thousands of suspected mutations should be further scrutinized. In this chapter, pathway analysis, de novo network generation and, quite simply, looking for genes mutated in more than one genome were used to pare down the large list of mutations to a more manageable handful of pathways for further scrutiny. As additional LGL genomes are sequenced we will continue to add their mutations to these networks.

The ultimate goal of these investigations is to develop additional therapies for LGL leukemia. The finding of mutations that may alter the translation of the IL6 receptor in the third genome patient, as well as the observation of STAT3 mutations in the first two, underscores the appropriateness of targeting the STAT3 pathway in LGL leukemia. It is consistent with observations of constitutive activation of STAT3 in all LGL leukemia patients (Epling-Burnette, Liu et al. 2001). The identification of mutations in FGFR signaling may also provide clues as to why the majority of LGL patients have no direct STAT3 mutation (Koskela, Eldfors et al. 2012). This receptor tyrosine kinase could be acting upstream of STAT3 to provide an aberrant growth signal due to alterations in the receptor or growth factors themselves.

The finding of mutations in TNFAIP3 and UBE2V1 may identify NFkB as another area of therapeutic intervention, consistent with a prediction of constitutively active NFkB in LGL leukemia (Zhang, Shah et al. 2008). UBE2V1 is needed for activation of NFkB by TNF receptor associated factors (Andersen, Zhou et al. 2005; Syed, Andersen et al. 2006). TNFAIP3 has been shown to inhibit NFkB (Song, Rothe et al. 1996). The T cell receptor (TCR) and TCR co-receptor molecule CD3 are important for the activation of all T cells including LGL. Additionally present in this network, platelet-derived growth factor (PDGF) and transforming growth factor beta (TGFB) have both been shown to play a role in the biology of LGL leukemia (Yang, Liu et al. 2010; Saadatpour, Wang et al. 2011). TNFAIP3 and UBE2V1 may therefore be contributing heavily to the disease phenotype in LGL leukemia. From this network it appears that UBE2V1 and TNFAIP3 sit at a key point to regulate multiple signals important to leukemic LGL as they converge on NFkB. Intervention against ERK has been previously shown to be effective in restoring normal apoptosis in NK-LGL (Epling-Burnette, Bai et al. 2004). Enhancing or restoring the activity of TNFAIP3 may be an effective way to inhibit multiple dysregulations in LGL leukemia.

The discovery of alterations in the histone modifiers MLL2, MLL3, and NCOR1 represents a new and exciting area for LGL leukemia research. The influence of these factors in other leukemias is well known, and numerous HDAC and methyltransferase inhibitors are already in clinical trials for other types of T cell lymphomas (Molife and de Bono 2011).

Experiments are ongoing to determine, in an 80 patient cohort of LGL leukemia patients, the frequency of every mutation that is predicted to be deleterious or is present in more than one genome. We consider this our first tier of mutations to confirm and determine prevalence, with a second tier of mutations not predicted to be damaging to follow.

Mutations in proteins with unclear function that are present in LGL leukemic samples may allow us to identify new proteins with potential importance to leukemic and normal LGL biology. An example could be AQP7, as only recently has an emerging role for aquaporins in antigen presentation been observed (Hara-Chikuma, Sugiyama et al. 2011). Another example could be the role of CACNA1C in calcium induced apoptosis of T lymphocytes.

In summary, whole genome sequencing of three LGL leukemia samples was undertaken to determine the presence of mutations with potential contribution to the biology of LGL leukemia. We observed additional mutations with the potential to contribute to previously observed STAT3 activation and new evidence that mutations contribute to other signaling pathways known to be dysregulated in LGL leukemia. The finding of mutated proteins involved in histone modification identifies a new and exciting area of study in LGL leukemia.

Future Directions

Foremost, validating the presence of putative mutations in these genomes, followed by determining their prevalence in a larger LGL leukemia patient cohort, must be completed. This will help identify those that may affect a large number of patients and are appropriate targets for development of treatment modalities. Assay design and the production of capture oligos for the MiSeq instrument (Illumina) can be completed in 1-2 months, and sequencing can be completed in a matter of days. This will provide us a targeted depth of coverage averaging 1000X for the regions of interest, allowing very clear mutation calls. Data indicating a high rate of false positive calls in the genome data, if present, will be used to refine the methods we use to call somatic mutations.

At present we have sequenced only one genome pair from a patient without a STAT3 mutation. The realization that the majority of LGL leukemia patients do not have these mutations, yet display active STAT3, makes the whole genome sequencing of these mutation-negative patients a definite priority. The observation of lower cell counts for these patients in Chapter 2 indicates that careful attention to sample purity may be needed along with selective enrichment.

Mechanistic studies are needed to determine the effect of these mutations on protein function, or on protein translation in the case of the IL6 receptor, and to confirm a biological effect before additional effort can be justified on the development of treatments to counter these mutations. The correlation between clinical data and mutation status in larger LGL leukemia cohorts should also be examined to identify those mutations that predispose to certain symptomatic manifestations such as anemia, neutropenia and rheumatoid arthritis. We are heavily annotating our sequencing cohort with the necessary clinical information to make these determinations.

Our understanding of the structure of the human genome is constantly evolving, making occasional repeat analysis of SNVs on updated gene annotation tracks a potentially worthwhile endeavor. The current structure of our analysis pipeline will allow us to pull in these other datasets and treat them in the same manner presented here, allowing rapid turnaround of a list of target genes for resequencing. The data presented in this chapter were restricted to SNVs in coding transcripts but many additional analyses are possible. The identification of small insertions and deletions followed by similar analyses is one example. Another would be the examination of mutations that may be affecting various non-coding transcripts. Large structural changes and accompanying fusion transcripts or copy number variation of genes remain to be determined. At this moment, these data have been processed but not annotated with regard to affected genes. These analyses can be completed in a timeframe of days to weeks, when resources required for validation become available.

Finally, studies must be undertaken to determine what, if any, is the consequence of the many mutations that fall into regions of the genome with no presently defined function. The identification of TBET binding regions in the next chapter provides one method of identifying important regions in LGL genomes that should be closely scrutinized for mutation.

Chapter 4

Discovering the Function of the Transcription Factor TBET in Terminal Effector Memory Cells Via ChIP-Seq of Large Granular Lymphocytes.

Introduction

CD8+ T cells consist of four subtypes: naïve, central memory, effector memory, and terminal effector memory (Sallusto, Lenig et al. 1999; Masopust, Vezys et al. 2001; Gupta, Bi et al. 2004). Leukemic LGL represent a striking example of a pure population of clonally expanded, antigen challenged, terminal effector memory (TEMRA) cytotoxic lymphocytes (CTL), (Yang,

Epling-Burnette et al. 2008) as determined by expression of the markers CD3+, CD8+, CCR7-, and CD45RA+. We have observed purities of these cells in LGL patients exceeding 85% of unsorted peripheral blood mononuclear cells (PBMCs) (Yang, Epling-Burnette et al. 2008).

Using network theory and discrete dynamic modeling, others in the lab have constructed an LGL leukemia survival network (Zhang, Shah et al. 2008) which identified TBET (T-box expressed in

T cells) as a master regulator in LGL leukemia. This network is reviewed in Chapter 1.

Chromatin immunoprecipitations (ChIP) have been conducted numerous times for TBET, including in CD4+ T cells (Jenner, Townsend et al. 2009) and CD8+ T cells (Beima, Miazgowicz et al. 2006), but never in CD8+ TEMRA. Our efforts to isolate these cells from normal donors generally yield 2.5 million cells, far fewer than the 50-75 million cells needed (Ross Hardison, personal communication) for quality ChIP-Seq (sequencing). The abundance of these cells in the leukemic PBMCs of LGL puts our lab in a unique position to further study this transcription factor both to advance knowledge of LGL leukemia and normal CTL.

There was reason to believe that this study would yield results pertinent to LGL leukemia. A similar T-box protein eomesodermin (EOMES) has been shown to bind to the promoter of the sphingosine-1 phosphate receptor 5 (S1PR5) in natural killer cells (Jenne, Enders

et al. 2009). S1PR5 has strong ties to LGL biology as it was initially cloned from a list of differentially expressed sequence tags from a microarray of leukemic LGL (Kothapalli,

Kusmartseva et al. 2002). Additionally, there is evidence that LGL leukemia is the result of chronic antigen exposure (Loughran, Hadlock et al. 1998; Sokol, Agrawal et al. 2005). It is therefore of interest that TBET has been shown to be essential for sustained response to through promoting expression of programmed cell death 1 (PDCD1) (Kao, Oestreich et al. 2011).

PDCD1 may be essential to the prevention of autoimmunity (Fife and Pauken 2011).

Identification of the other members of these cellular programs may provide additional insight into these processes.

Transcription factors with the T-box motif recognize the DNA consensus sequence of

TCACACCT. In general, T-box proteins will bind this sequence if it is present twice or more, however the specificity for which T-box protein binds depends on the orientation of the DNA sequence. They can act as both activators and repressors (Wu, Cheng et al. 2011). T-box proteins have been shown to interact with histone-modifying enzymes H3K4-methyltransferase and H3K27-demethylase (Miller and Weinmann 2009). TBET has been shown to bind to many regions independent of cell type, with widely varying consequences (Beima, Miazgowicz et al.

2006). This may reflect a requirement for the presence or absence of other factors which would be dependent on cell type. In regard to the immune system, TBET is a master regulator, involved in controlling both CD8+ and CD4+ T cell and B cell fate. TBET deficiency can lead to an inability of T cells to differentiate into T helper 1 cells (Szabo, Kim et al. 2000), a process that requires TBET to displace or block the binding of the GATA-type transcription factor GATA3 (Jenner,

Townsend et al. 2009). B cells fail to undergo antibody class switching (Peng, Szabo et al. 2002) leading to deficiencies in these lineages. By contrast, TBET is not required for the generation of

CD8+ cells, but loss of TBET leads to the generation of CD8+ cells with reduced capacity for cell-mediated killing (Szabo, Sullivan et al. 2002). This may be at least partially due to

compensation by another T-box protein, EOMES (Pearce, Mullen et al. 2003). We find EOMES to be over-expressed in LGL compared to PBMC (data contained but not explicitly reported in

(Shah, Zhang et al. 2008)). Based on these findings we thought it important to determine the T-box bound regions of chromatin in LGL leukemia. Taken together, these references highlight the myriad roles for TBET in immunity and also demonstrate that simply identifying TBET bound regions is but the first step in determining the genes that it controls. Our objective for this study was to identify genomic regions occupied by TBET in each of three LGL leukemia samples. We anticipated finding numerous bound regions in proximity to genes with known roles in LGL leukemia as well as discovering new relationships to provide insight into how TBET controls effector function.

I present the first step in determining the role of TBET in LGL leukemia. I performed

ChIP-Seq in three patient samples in collaboration with others in the lab of Ross Hardison and have prepared the data to be easily viewed by other researchers. We identify the active chromatin states associated with TBET occupied segments and I performed pathway analysis to identify overall themes of genes in proximity to these segments, confirming a heavy association with NK and T cell biology. Motif searching revealed numerous other transcription factors that should be investigated for their contribution to LGL leukemia in future experiments, most notably the runt related transcription factors (RUNX). As these samples are leukemic in nature, it is important to determine which occupied segments may be the result of disease. We identify numerous somatic mutations in TBET occupied segments wherein both the mutation and TBET occupancy are unique to Patient 2 from the genome work. Further work is needed to integrate and analyze these datasets to determine which have functional consequences.

Methods

Patient Selection and Sample Preparation.

Three chronic T-cell-type LGL leukemia patients were selected from an existing registry in accordance with Institutional Review Board protocols at the Penn State Milton S. Hershey

Cancer Institute. De-identified patient charts indicated greater than 85% expression of the cell surface markers CD8 and CD3 and the Ficoll-Paque (GE Healthcare) separated PBMC layer was used without specific enrichment. Sample aliquots were saved to repeat flow cytometry confirming high levels of CD3 and CD8 (antibodies, BD Biosciences) and 95% purity of the

CD8+/CD3+ gated fraction for the TEMRA markers CD197- and CD45RA+ (antibodies,

Ebioscience) with four color flow cytometry performed on the FACSCalibur (BD Biosciences).

Patients 1 and 2 in this experimental series correspond to genome samples of the same numbering, whereas Patient 3 from each experiment is not the same.

Chromatin Immunoprecipitation.

Aliquots of 75 million cells were suspended in 200 milliliters 1X phosphate-buffered saline (PBS). Thirty seven percent formaldehyde (all chemicals are from Fisher Scientific unless noted) was added under gentle stirring to a final concentration of 0.4% and stirring was continued for 10 minutes. Two grams of glycine was then added to quench the reaction followed by gentle stirring for 5 minutes. Suspensions were centrifuged at 500xg for 10 minutes at 4 degrees

Celsius. All remaining steps were performed at 4 degrees Celsius unless otherwise noted.

Supernatant was removed from the pellets by careful aspiration to not disturb the pellets which were then washed with 1 milliliter of PBS and centrifuged at 2000 revolutions per minute (RPM)

in a tabletop microcentrifuge. One additional PBS wash was performed and cells were immediately frozen at negative 80 degrees Celsius or used in the protocol without stopping.

To each pellet 500 microliters of cold Cell Lysis Buffer (CLB, 10 millimolar Tris pH 8.0,

10 millimolar NaCl, 0.2% NP40) was added and the cells were resuspended and lysed on ice for

10 minutes. 1X P8340 protease inhibitor without metal chelator (Sigma) was added to CLB,

NLB and IPDB solutions. The pellet was centrifuged at 2500 RPM in a microcentrifuge for 5 minutes, washed with 1 milliliter of PBS and recentrifuged. 1 milliliter Nuclear Lysis Buffer

(NLB, 50 millimolar Tris pH 8.0, 10 millimolar ethylenediaminetetraacetic acid (EDTA), 1% sodium dodecyl sulfate (SDS)) was then added and nuclei were allowed to lyse on ice for 10 minutes. Six hundred microliters of cold IP Dilution Buffer (IPDB, 20 millimolar Tris pH 8.0, 2 millimolar EDTA, 150 millimolar NaCl, 1% Triton X-100, 0.01% SDS) was added prior to sonication. A Falcon 2059 tube in an ice filled beaker was the vessel used for sonication under the microprobe of a Misonix 4000. Pulse settings: amplitude 30, one second on, one second off, total pulse time 30 seconds. The pulsing program was repeated 8-10 times with a target fragment size of 300 basepairs. Care was taken to avoid foaming or heating the sample. Debris was pelleted after sonication by spinning at 2500 RPM in microcentrifuge tubes for 5 minutes.

Supernatants were removed and diluted with 3.4 milliliters of IPDB. Chromatin was then pre-cleared by the addition of 200 microliters Protein G agarose (Invitrogen) and 20 micrograms of normal rabbit antibody (all antibodies Santa Cruz Biotechnology). Rabbit polyclonal anti-TBET or normal rabbit immunoglobulin was prebound to Protein G agarose with 10 micrograms of antibody being added to 50 microliters of beads in 1 milliliter PBS also under gentle rotation in a cold room. A total of 4 milliliters was made of polyclonal anti-TBET complexes but only 1 milliliter of normal polyclonal rabbit was prepared. Both bindings were performed under gentle rotation overnight in a cold room.

Pre-cleared chromatin was centrifuged at 1000xg for minutes and the pre-bound antibody complexes spun at 7500 RPM. Two hundred microliters of pre-cleared chromatin was retained and the rest was split among the five tubes of antibody-bead complexes. Chromatin and complexes were rotated for four hours in a cold room and then centrifuged at 7500 RPM for 3 minutes. The supernatant was aspirated and then the beads were washed and centrifuged 1 time with IP wash buffer (20 millimolar Tris pH 8.0, 2 millimolar EDTA, 50 millimolar NaCl, 1%

Triton X-100 and 0.1% SDS). Then the beads were washed twice with the same solution except for NaCl being increased to 500 millimolar (high salt IP wash buffer). One IP wash with buffer II

(10 millimolar Tris pH 8.0, 1 millimolar EDTA, 250 millimolar LiCl, 1% NP40 detergent, 1% deoxycholate) was then followed by two washes with 1X TE (10 millimolar Tris pH 8.0, 1 millimolar EDTA). Two elutions using 100 microliters of room temperature elution buffer (100 millimolar NaHCO3, 1% SDS) were combined with 16 microliters of 5 molar NaCl.

Samples including retained input chromatin were then digested with RNase A (Ambion) overnight at 65 degrees Celsius. Ten seconds of 300 RPM agitation was provided every 6 minutes throughout the incubation using a programmed Thermomixer (Eppendorf). After overnight incubation, 3 microliters of 20 milligram/milliliter (Promega) was added and incubated at 45 degrees Celsius for 2 hours. 1.1 milliliters of buffer PB (Qiagen) and 15 microliters of sodium acetate were added to each reaction prior to purifying the DNA of the input, antibody control and TBET on a MinElute PCR purification column (Qiagen). All four anti-TBET reactions were applied to the same column and then eluted in 34 microliters of

RNase/DNase-free water.

To prepare the sequencing libraries, the input control (unenriched chromatin) and ChIP

DNA fragments were repaired to generate blunt ends, and a single A nucleotide was added to each end. Sequencing work was performed by Cheryl Keller-Capone in the laboratory of Ross

Hardison. Illumina genomic adaptors were then ligated to both ends of the fragments, ligation

products were amplified by 18 cycles of PCR, and the PCR products between approximately 250 and 500 bp were gel purified according to standard Illumina protocols. The quantity and quality of each library were evaluated by qPCR and Bioanalyzer (Agilent Technologies, Santa Clara, CA), respectively, to make sure it met Illumina standards. The Bioanalyzer was also used to measure the aggregate length of each library as shown in the following table:

Table 4-1: DNA lengths of TBET Libraries

Sample type   Sample ID   Date library completed   Sequencing date   DNA length
Input         1           4/27/2011                8/26/2011         307 bp
Tbet          1           4/27/2011                8/26/2011         310 bp
Input         2           8/11/2011                10/28/2011        485 bp
Tbet          2           8/11/2011                10/28/2011        493 bp
Input         3           8/11/2011                10/28/2011        500 bp
Tbet          3           8/11/2011                10/28/2011        433 bp

The ChIP DNA library was sequenced in single-read mode on the Illumina HiSeq 2000 platform. Cluster generation and sequencing chemistry were performed using Illumina-supplied kits as appropriate. The resulting sequence reads were mapped to the human genome (hg19 assembly) using the Burrows-Wheeler Aligner (BWA) (Li and Durbin 2009) in an automated

Galaxy workflow. The peak-calling program MACS (Model-based Analysis for ChIP-Seq)

(Zhang, Liu et al. 2008) was applied to the Tbet ChIP-Seq and input sequence data to identify

4,202, 5,513, and 9,815 potential Tbet occupied segments (Tbet OSs) in the three samples, respectively. BigWig files of peaks and interval tracks of corresponding intervals were uploaded to a mirror of the UCSC genome browser for ease of visualization.

GREAT

The 9,815 binding sites from the single best patient sample were analyzed by me utilizing the Genomic Regions Enrichment of Annotations Tool (GREAT) hosted on the web by Stanford

University (great.Stanford.edu, accessed 6/2/2013). Analysis was completed utilizing GREAT version 2.01 with the following settings: species assembly hg19, Association rule: Basal + extension: 5000 bp upstream, 1000 bp downstream, 1,000,000 bp max extension with curated regulatory domains included, unless otherwise indicated.

Motif Discovery

The 9815 intervals from Patient Three were used for motif discovery. These intervals varied in length from two to five hundred basepairs, whereas the ideal size for motif searching in

DREME (Bailey 2011) is one hundred basepairs. New one hundred basepair intervals were created in Galaxy centered around the coordinate mean of the original intervals. The sequences from the hg19 canonical male genome corresponding to these intervals were retrieved in FASTA format using the "Extract Genomic DNA" tool. They were then submitted to DREME

(http://meme.nbcr.net/meme/cgi-bin/dreme.cgi, accessed 6/3/2013) using default settings.
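The recentering step described above can be sketched in a few lines. This is only an illustration, not the Galaxy workflow actually used: the function name `center_window`, the example coordinates, and the assumption of zero-based, half-open (BED-style) intervals are all hypothetical.

```python
def center_window(start, end, half_width=50):
    """Return a window of 2 * half_width bp centered on the interval midpoint.

    Assumes zero-based, half-open coordinates (an assumption, not stated
    in the text). The window start is clamped so it never goes below zero.
    """
    midpoint = (start + end) // 2            # integer coordinate mean
    new_start = max(0, midpoint - half_width)
    return new_start, new_start + 2 * half_width


# Hypothetical occupied segments of 300 and 500 bp.
intervals = [("chr1", 1000, 1300), ("chr2", 500, 1000)]
windows = [(chrom, *center_window(s, e)) for chrom, s, e in intervals]
print(windows)
```

Every output window is exactly one hundred basepairs long regardless of the original interval length, which is the property the motif search relies on.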

Outputs were submitted to TOMTOM to query the transcription factor binding motif database experimentally determined through systematic evolution of ligands by exponential enrichment

(SELEX) followed by ChIP-Seq (Jolma, Yan et al. 2013).

Manipulation of Intervals

Operations on the intervals, such as clustering and coverage, were performed in

Galaxy (Goecks, Nekrutenko et al. 2010) using the "Work with Genomic Intervals" tools.

Results

We identified 4202, 5513, and 9815 potential TBET occupied segments in samples 1-3 respectively. A high rate of overlap of the identified intervals was observed for samples 2 and 3 when the much higher number of intervals identified in sample 3 were taken into consideration.

The term interval refers to the data type of the occupied segments. It reflects the start and end of the segments as they are overlaid onto the hg19 scaffold. Intervals represent a simple data type that can be easily uploaded for display on the UCSC genome browser. Sample 1 intervals did not overlap well with the other two samples with over half of the identified intervals being unique.

By contrast, 73% of the intervals represented in sample 2 were also identified in sample 3.

Sample 3 had the highest number of intervals. The peak data and intervals for all of human genome build 19 are now available and accessible to our lab. Individual genes of interest can be called up in a mirror of the UCSC genome browser and displayed with TBET bound regions to facilitate future study. Representative graphs are included in Figure 4-1. Consistently, we observed more peaks for Sample 3 in any particular genic region compared to the other two samples. As a way of measuring this on a global scale, sample 3 had an increased number of clusters defined by less than 15 kilobases separating two intervals. An increased number of intervals per cluster was also observed (Table 4-2). The significance of this is unclear. It remains to be determined whether these differences are related to variations in samples or sample preparation conditions, although all samples were treated in an identical manner.

Scarcity of putative TBET binding motifs in these additional intervals could indicate that they are the result of additional proteins being present in the complex, an interaction that is only being captured in this one sample. This has not been investigated fully at this point.
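The clustering measure used above (two or more intervals separated by less than 15 kilobases) can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the Galaxy tool: the function name `cluster_intervals`, the example coordinates, and the choice to measure the gap from the furthest end seen so far in the cluster are all hypothetical.

```python
from collections import defaultdict


def cluster_intervals(intervals, max_gap=15_000):
    """Group intervals on the same chromosome whose gap is below max_gap.

    Only groups with at least two members are reported as clusters.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    clusters = []
    for chrom in sorted(by_chrom):
        ivs = sorted(by_chrom[chrom])
        current, current_end = [ivs[0]], ivs[0][1]
        for start, end in ivs[1:]:
            if start - current_end < max_gap:
                # Close enough to the running cluster: extend it.
                current.append((start, end))
                current_end = max(current_end, end)
            else:
                # Gap too large: close the cluster, keep it if it has >= 2 members.
                if len(current) >= 2:
                    clusters.append((chrom, current))
                current, current_end = [(start, end)], end
        if len(current) >= 2:
            clusters.append((chrom, current))
    return clusters


# Hypothetical intervals: two neighbours on chr1, one distant, one alone on chr2.
example = [("chr1", 100, 400), ("chr1", 5_000, 5_300),
           ("chr1", 30_000, 30_300), ("chr2", 0, 200)]
clusters = cluster_intervals(example)
in_clusters = sum(len(members) for _, members in clusters)
print(len(clusters), in_clusters, in_clusters / len(clusters))
```

The three printed values correspond to the rows of Table 4-2: number of clusters, intervals in clusters, and intervals per cluster.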


Figure 4-1: Initial Assessment of ChIP-Seq Datasets from Three Patients. The top panel indicates overlap between intervals in each of three datasets. The sizes of the circles representing datasets are exactly proportional, but overlapping regions are not. Parenthetical values are the total number of intervals for each sample. The bottom panels show intervals for binding in all three datasets and peak data for Sample 3 around the fas ligand gene (FASLG) and the sphingosine 1 phosphate receptor 5 (S1PR5). Peak height is scaled to approximately 50 reads at the highest peak. Similar visualizations of these data are available across the whole of hg19.


Table 4-2: Clustering of TBET Bound Intervals.

Sample                               1       2       3
Total intervals                      4202    5513    9815
15 kilobase clusters                 252     740     1792
Intervals in 15 kilobase clusters    537     1749    4949
  (as a share of total intervals)    13%     32%     50%
Intervals per cluster                2.13    2.36    2.76

Regulatory Regions Bound by TBET.

There are no annotated datasets to define regulatory regions in LGL leukemia genomes or their normal counterparts that would allow us to associate these intervals with regions such as enhancers or insulators. However, if we restrict ourselves to looking at the larger scale, some knowledge can be gained by examining the most similar cell type for which these data are available. The B cell line GM12878 is one such possibility. Figure 4-2 demonstrates the enrichment of TBET in various active chromatin states. Strong enrichment (8 to 16 fold) is observed in all three samples for enhancers, weak enhancers and the transcriptional start site.

Figure 4-2: Enrichment of TBET Peaks (from TEMRA) in Active Chromatin States. CTCF (associated with insulators), E (enhancer), WE (enhancer-associated but with weaker histone modification signal) and TSS (promoter). The states are ascertained here in a lymphoblastoid cell line GM12878.

Pathway Analysis

The examples noted in Figure 4-1 are encouraging in that they have known relation to the biology of LGL leukemia. The only way to examine such large datasets, apart from looking at genes of known interest, is to look for enrichment of themes. It is difficult to assign chromatin regions to the genes they regulate as proximity is not always the best indicator (Li, Ruan et al.

2012). However, genes of similar function are often grouped together in genomic regions.

GREAT can be used to take advantage of this feature to determine the function of a DNA binding molecule (McLean, Bristor et al. 2010). We used the third sample as it was the largest dataset.

Entering 9815 intervals and applying the default closest gene assignment distance of 1000 kilobases resulted in 5,049 of the 17,744 genes annotated in hg19 being associated with a TBET occupied segment. Assignment of all but 48 intervals was achieved in this manner. A shorter distance such as 5 kilobases would have designated 2,618 genes as being potentially regulated by

TBET. 1361 occupied segments were within 250 basepairs of the transcriptional start site in either direction. Pathways identified as enriched were similar using any of these filters (not shown). Table 4-3 lists the top 10 MSigDB and Panther DB gene lists identified as enriched.

MSigDB pathways contain numerous lists consistent with an established role of TBET in immune function. The top two pathways represent signaling events in the types of cells that give rise to both types of LGL leukemia. Figure 4-3 graphically depicts the numerous members of the T cell receptor signaling pathway. More hits are noted in the table than viewed in the figure representing how multiple occupied segments are in proximity to the same gene and multiple isoforms with occupied segments nearby. Among PantherDB pathways, apoptosis tops the list, with JAK/STAT, PDGF, and Ras of interest to LGL leukemic biology. Occupied segments are in close proximity to multiple STATs, some within a few hundred basepairs of the transcriptional

start site (TSS). Members of the JAK-STAT pathway in proximity to occupied segments are detailed in Table 4-4.

Table 4-3: Pathway Enrichment in Genes Associated with TBET Bound Regions. GREAT version 2.0.2 Association rule: Single nearest gene: 1000000 bp max extension, curated regulatory domains included

MSigDB

Binom   Binom Raw    Binom FDR    Binom Fold   Binom Region   Binom Region
Rank    P-Value      Q-Val        Enrichment   Hits           Set Coverage   Term Name
1       1.64E-110    1.45E-107    5.638235     272            0.027713       Natural killer cell mediated cytotoxicity
2       3.15E-99     1.39E-96     4.750871     287            0.029241       T cell receptor signaling pathway
3       4.27E-85     1.25E-82     2.857271     465            0.047376       Genes involved in Signaling in Immune system
4       6.44E-50     1.42E-47     5.342703     127            0.012939       Genes involved in Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell
5       1.09E-47     1.93E-45     2.934486     248            0.025267       Chemokine signaling pathway
6       9.47E-47     1.39E-44     4.243027     150            0.015283       B cell receptor signaling pathway
7       1.44E-45     1.80E-43     4.807947     128            0.013041       Keratinocyte Differentiation
8       4.83E-38     5.31E-36     3.704545     142            0.014468       Genes involved in TCR signaling
9       7.98E-38     7.81E-36     4.160749     123            0.012532       T Cell Receptor Signaling Pathway
10      4.90E-35     4.31E-33     2.165355     317            0.032298       Genes involved in Hemostasis

PantherDB

Binom   Binom Raw    Binom FDR    Binom Fold   Binom Region   Binom Region
Rank    P-Value      Q-Val        Enrichment   Hits           Set Coverage   Term Name
1       4.85E-63     6.79E-61     4.422968     195            0.019868       Apoptosis signaling pathway
2       2.89E-46     2.02E-44     3.704899     174            0.017728       T cell activation
3       4.28E-43     2.00E-41     4.324499     135            0.013754       B cell activation
4       1.12E-38     3.91E-37     2.543281     254            0.025879       mediated by chemokine and cytokine signaling pathway
5       5.48E-28     1.53E-26     2.491761     188            0.019154       EGF receptor signaling pathway
6       1.33E-27     3.11E-26     2.342055     209            0.021294       PDGF signaling pathway
7       1.57E-26     3.14E-25     6.993193     52             0.005298       JAK/STAT signaling pathway
8       1.91E-23     3.35E-22     2.898347     120            0.012226       VEGF signaling pathway
9       8.47E-23     1.32E-21     2.187798     197            0.020071       Integrin signaling pathway
11      1.53E-12     1.95E-11     2.280179     93             0.009475       Ras Pathway
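As a reading aid for the columns of Table 4-3, the sketch below shows how the region set coverage and a binomial upper-tail probability of the kind GREAT reports could be computed. It is an illustration under stated assumptions rather than GREAT's implementation: the function names are hypothetical, and the per-term genome fraction `p_term` is not reported in the table, so it is back-calculated here from the fold enrichment on the assumption that fold enrichment equals observed hits divided by expected hits.

```python
import math


def region_set_coverage(hits, n_regions):
    """Fraction of all input regions that hit the term."""
    return hits / n_regions


def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    largest = max(logs)
    return math.exp(largest) * sum(math.exp(x - largest) for x in logs)


n_regions = 9815               # intervals submitted from sample 3
hits, fold = 272, 5.638235     # first MSigDB row of Table 4-3

coverage = region_set_coverage(hits, n_regions)
# Assumption: fold = hits / (n_regions * p_term), so p_term is back-calculated.
p_term = hits / (n_regions * fold)
p_value = binom_upper_tail(hits, n_regions, p_term)
print(f"coverage={coverage:.6f} p_term={p_term:.6f} p_value={p_value:.3g}")
```

The coverage value reproduces the last column of the table directly (272 of 9815 regions), while the tail probability depends on the back-calculated `p_term` and should be read only as an approximation of the reported P-value.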

Table 4-4: TBET occupied Segments Near Transcriptional Start Sites of JAK-STAT Pathway Genes GREAT version 2.0.2 Association rule: Single nearest gene: 1000000 bp max extension, curated regulatory domains included. Closest segment to TSS is in bold.

Symbol    Descriptor                                            Distance to TSS relative to direction of transcription
JAK1      Janus kinase 1                                        (+5609), (+42967), (+83740), (+102198)
JAK3      Janus kinase 3                                        (+1650)
MAPK14    Mitogen-activated protein kinase 14                   (-58), (+1287)
PIAS1     Protein inhibitor of activated STAT 1                 (+1300)
PTPRC     Protein tyrosine phosphatase, receptor type, C        (-41192), (-17567), (-11134), (+17), (+17753), (+21884), (+30180), (+40674), (+43022), (+76951), (+149673), (+281030), (+283831), (+296108), (+402701), (+450948), (+524845), (+530058), (+534485), (+579244)
SOCS1     Suppressor of cytokine signaling                      (+2663), (+4003)
STAT1     Signal transducer and activator of transcription 1    (-35798), (-24301), (-15316), (-433), (+15042)
STAT3     Signal transducer and activator of transcription 3    (-255), (+1207), (+26499)
STAT4     Signal transducer and activator of transcription 4    (-1443), (+517), (+2638), (+23010), (+25020), (+36025), (+37059), (+38913)
STAT5A    Signal transducer and activator of transcription 5A   (+50150)
STAT5B    Signal transducer and activator of transcription 5B   (+3719), (+9268), (+23623), (+26207), (+27598)
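The sign convention in Table 4-4 (distance relative to the direction of transcription) can be illustrated with a short sketch. The function name, the use of the segment midpoint, and the coordinates below are hypothetical and are not taken from the GREAT output; the sketch only shows why the same genomic offset changes sign for a gene on the minus strand.

```python
def signed_distance_to_tss(segment_mid, tss, strand):
    """Distance from TSS to a segment, positive when downstream of the TSS.

    'Downstream' is defined relative to the direction of transcription,
    so the genomic offset is negated for genes on the minus strand.
    """
    offset = segment_mid - tss
    return offset if strand == "+" else -offset


# Hypothetical coordinates: a segment 500 bp to the right of the TSS.
print(signed_distance_to_tss(1_500, 1_000, "+"))   # downstream of a plus-strand gene
print(signed_distance_to_tss(1_500, 1_000, "-"))   # upstream of a minus-strand gene
```

Under this convention, a negative entry such as (-255) for STAT3 indicates a segment upstream of the transcriptional start site, and a positive entry indicates a segment within or beyond the transcribed region.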


Figure 4-3: Coverage of TCR Signaling Network by Genes in Proximity to TBET bound regions. Genes in proximity to TBET occupied segments are shaded in grey. Genes with double rings represent groups with multiple members. Pathway was generated with Ingenuity IPA.


Figure 4-4: Natural Killer Cell Mediated Cytotoxicity Pathway Genes Proximal to TBET Occupied Segments in Leukemic LGL. Red stars denote genes with an assigned TBET occupied segment. Numerous internal molecules common to TCR signaling pathways are represented. Surprisingly, numerous NK cell specific surface marker genes are occupied by TBET despite the fact that the ChIP was performed in TEMRA. KEGG Pathway drawn in DAVID bioinformatic tools. A duplicate figure with unobstructed gene names is located in Appendix B.

Identification of Motifs Found in Occupied Segments

We selected the largest dataset (Sample 3) to determine what other DNA binding proteins may be associated with these occupied segments by looking for the enrichment of motifs. TBET bound intervals were shrunk to 50 basepairs on either side of their coordinate mean and submitted to DREME (Bailey 2011), which discovers short regular expression motifs. We limited the discovery to the top twenty most enriched motifs. The 28 most statistically enriched motifs are presented in Table 4-5. We used TOMTOM (Gupta, Stamatoyannopoulos et al. 2007) to quantify the similarity between our experimentally determined motifs and those of curated transcription factor binding motif databases. The fact that the top motif was not associated with TBET was somewhat surprising but can be explained by the fact that our approach may not be properly quantifying every TBET binding site. TBET binding sites may be represented in more than one of the short motifs found to be enriched by DREME. Both GTGDKAA and STGTKA can be associated with TBET binding according to TOMTOM. Complete associated transcription factors for all enriched motifs are displayed in Table 4-5. The bottom panel illustrates the criteria used to compare experimentally observed motifs and those of known transcription factors. Both nucleotide frequency and degree of conservation are compared to make the correlation.
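The motifs reported by DREME use IUPAC nucleotide codes, which map directly onto regular expression character classes. The sketch below, which is not part of DREME or TOMTOM, shows one way to expand such a motif and count its occurrences on both strands; the function names and the test sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_to_regex(motif):
    """Expand an IUPAC motif such as GTGDKAA into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(motif, seq):
    """Count non-overlapping matches on both strands.

    Note: a palindromic motif would be counted once per strand.
    """
    pattern = re.compile(iupac_to_regex(motif))
    seq = seq.upper()
    return len(pattern.findall(seq)) + len(pattern.findall(reverse_complement(seq)))


print(iupac_to_regex("GTGDKAA"))
print(count_sites("MGGAAR", "TTAGGAAGTT"))   # hypothetical 10 bp sequence
```

This makes explicit why a single transcription factor binding site can be represented by more than one short motif: overlapping degenerate patterns such as GTGDKAA and STGTKA can match parts of the same underlying sequence.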

Examining the top motif indicates that binding sequences for Ets-related transcription factors are frequent in TBET bound regions. Recognizing the MGGAAR sequence to be a halfsite for

STAT3 but not present in the output, we determined that it was not one of the transcription factors examined in the 2013 Jolma dataset which relied on cloned transcription factor proteins.

We therefore queried an alternative transcription factor database in TOMTOM, Jaspar

(http://jaspar.cgb.ki.se/) and this motif comparison is shown in Figure 4-5. Runt related factors

(RUNX) have not been examined in LGL leukemia. Microarray data from Chapter 2 indicates

RUNX1 is 2.4 fold higher in leukemic LGL than the combined class of normal CD8 and TEMRA


(t-test p-value .009, n=37 LGL, n=8 CD8 and TEMRA). RUNX3 appears to be marginally (25%) lower in LGL (t-test p-value=.03) with very strong raw probe signals in both groups (mean fluorescent intensity of 7594) which would indicate that transcript is being produced.

Figure 4-5: Motifs Enriched in Sites Occupied by TBET. Correlates made by TOMTOM of enriched motifs and known binding sequences of transcription factors determined by SELEX. Total column height represents the amount of conservation of that base position. Fractional bases are proportional to the amount of representation of each base in the bound sequence where more than one base is tolerated. The top logo represents that known for the binding site while that underneath is from the TBET bound region.

Table 4-5: DNA Binding Proteins Associated with Motifs Enriched in TBET Bound Segments

Factors are identified in Jolma (Jolma, Yan et al. 2013) except for STAT3 (Jaspar). DBD = DNA binding domain, implying a truncated protein was used.

Motif  E-value  Factors
MGGAAR  8.90E-241  Elf5_DBD, ELF5_full, ELF5_DBD, SPIC_full, Spic_DBD, SPIB_DBD, ERG_DBD, ETS1_DBD, ERF_DBD, ERG_full, STAT3halfsite(Jaspar)
RVCCACA  7.00E-221  RUNX3_full, RUNX2_DBD_3, RUNX3_DBD_2, GLI2_DBD_2, RUNX3_DBD_3, RUNX3_DBD, RUNX2_DBD_2, RUNX2_DBD, ZNF143_DBD, GLI2_DBD
GTGDKAA  1.30E-92  TBR1_DBD, TBX21_full_2, TBX1_DBD, TBR1_full, TBX20_full, TBX2_full_2, TBX21_DBD_2, TBX15_DBD, TBX20_DBD_3, TBX20_DBD
CYCCDCCC  8.00E-62  SP1_DBD, SP3_DBD, ZNF740_full, KLF16_DBD, ZNF740_DBD, SP8_DBD, FOXO3_full_3, Zfp740_DBD, EGR2_full, EGR3_DBD
TGAGTCAB  8.50E-55  JDP2_full, Jdp2_DBD, NFE2_DBD, JDP2_DBD, MAFF_DBD, MAFK_full_2, MAFG_full, Mafb_DBD_2, MAFK_DBD_2, ATF4_DBD
DAAATR  3.50E-44  E2F2_DBD, E2F2_DBD_2, E2F3_DBD, MEF2A_DBD, MEF2D_DBD
ACCRC  3.20E-39  GLI2_DBD_2, ZBTB7A_DBD, ZBTB7C_full, RUNX3_full, GLI2_DBD, ZBTB7B_full, RUNX2_DBD_3, GMEB2_DBD_2, RUNX3_DBD_2, Nkx3-1_DBD
STGTKA  2.60E-37  MGA_DBD, TBX4_DBD, TBX15_DBD_2, TBX1_DBD_3, TBX5_DBD, TBR1_DBD, TBX21_DBD_2, TBX21_full_2, TBX20_full, TBR1_full
SAGAWA  1.70E-21  SPDEF_full_3, SPDEF_DBD_3
RCSTCA  1.60E-20  TFAP2B_DBD_2, TFAP2A_DBD_2, Tcfap2a_DBD_2, POU6F2_DBD, TFAP2C_full_3, THRA_FL, TFAP2C_DBD_2, RARA_full_2, RORA_DBD, THRB_DBD_3
GGCGKGR  2.50E-19  E2F7_DBD, SP1_DBD, SP3_DBD, KLF16_DBD, KLF14_DBD, SP4_full, EGR3_DBD, SP8_DBD, EGR4_DBD, EGR2_full
ACATCCTS  3.70E-15  SPDEF_DBD, SPDEF_full, ETS1_full_2, ETV5_DBD, ETS1_DBD_2, ELF5_DBD, ELF5_full, Elf5_DBD, ETV4_DBD, SPIC_full
AAAARAAA  3.30E-11  Hoxc10_DBD_2, HOXA13_full, HOXC13_DBD, HOXB13_DBD, Hoxd13_DBD, HOXD13_DBD, Foxj3_DBD_4, FOXJ3_DBD
GRTGWCA  2.70E-11  MEIS3_DBD_2, Meis2_DBD_2, PKNOX2_DBD, Meis3_DBD_2, Pknox2_DBD, PKNOX1_DBD, TGIF2LX_full, TGIF2_DBD, TBX5_DBD, TBX4_DBD
CGCAKGCG  2.50E-10  NRF1_full, TFAP2B_DBD_2, TFAP2A_DBD_2, Tcfap2a_DBD_2, MESP1_DBD, TFAP2C_full_3, TFAP2C_DBD_2, TCF3_DBD, EGR2_DBD, TCF4_DBD
MTGAKAA  8.50E-10  GATA5_DBD
CCTCCY  1.80E-08  Rara_DBD, ZIC3_full, ZIC4_DBD, ZIC1_full
KGGTTTY  9.10E-08  IRF5_full_2, RUNX3_DBD_3, GRHL1_full, RUNX2_DBD_2, RUNX3_full, IRF4_full, IRF5_full, MSX2_DBD, RUNX2_DBD_3
GATCKCGC  1.10E-05
AAATTAGC  1.80E-05  POU3F3_DBD, POU6F2_full, POU3F2_DBD_2, HOXD8_DBD, POU3F1_DBD, GBX1_DBD, NOTO_DBD, VSX1_full, ALX3_DBD, LHX2_DBD
CMGCC  8.90E-05  Hic1_DBD, YY2_full
GSTGCTA  2.10E-04  RFX2_DBD_2, RFX3_DBD_2, Rfx2_DBD_2, Barhl1_DBD_3, BARHL2_full_3, RFX4_DBD_2, Ascl2_DBD, BARHL2_DBD_3, Rhox11_DBD
GGAYTACA  3.10E-04  PITX1_full, Otx1_DBD_2, PITX1_DBD, OTX1_DBD_2, PITX3_DBD, OTX2_DBD, OTX2_DBD_2, GSC2_DBD, DMBX1_DBD, Otx1_DBD
CKCTGC  4.10E-04
TCGCGAGA  2.80E-03  ZBED1_DBD, SREBF2_DBD, Srebf1_DBD, ARNTL_DBD, MLX_full, Mlx_DBD, BHLHE41_full, Rarg_DBD_3
AGTAGCTG  3.10E-03  Ascl2_DBD, NHLH1_full, NHLH1_DBD, TFAP4_DBD, TFAP4_full
ACGTGACY  3.20E-03  ARNTL_DBD, BHLHE41_full, TFEB_full, SREBF2_DBD, TFE3_DBD, Srebf1_DBD, Bhlhb2_DBD, BHLHB3_full, USF1_DBD, BHLHB2_DBD
RTAAACA  4.30E-03  Foxj3_DBD_3, FOXL1_full, FOXJ2_DBD_2, FOXO1_DBD, Foxg1_DBD_3, Foxk1_DBD_2, FOXO4_DBD_2, FOXP3_DBD, Foxc1_DBD_2, FOXO6_DBD_2
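The motifs in this table are written with IUPAC ambiguity codes (for example, M stands for A or C and R for A or G). As a reading aid, the following minimal Python sketch shows how such a degenerate motif can be expanded into a regular expression and scanned against a sequence. It is an illustration only and is not the motif discovery software used in this study; the function names iupac_to_regex and find_motif and the example sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate a degenerate motif such as 'MGGAAR' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all forward-strand matches.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence containing one instance of the top motif.
    print(iupac_to_regex("MGGAAR"))
    print(find_motif("MGGAAR", "TTAGGAAGTT"))
```

A reverse-strand scan would apply the same function to the reverse complement of the sequence; that step is omitted here for brevity.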

Occupied Segments Unique to Genome Patient Two Containing Somatic Variants.

In order to find segments unique to Patient 2, and therefore potentially caused by mutation, the occupied segments for Patients 1 and 3 were subtracted from those of Patient 2. This yielded segments covering 608,507 basepairs of hg19. We determined coverage of these intervals by the 30,780 small nucleotide variants identified in Chapter 3. Overall, 45 SNVs were detected in 24 intervals. Assuming a total genome size of 3 billion bases, these intervals had 7.2 times more SNVs than the average segment of the human genome (1 SNV for every 13,522 basepairs in the occupied segments versus 1 for every 97,466 basepairs in the total genome). Using the default basal plus extension settings of GREAT (5 kb upstream, 1 kb downstream and a distal extension of up to 1,000 kb until the nearest gene was encountered), 22 regions were associated with 2 genes, 22 with 1 gene and 1 region was not associated. Table 4-6 lists these regions and associated variants from Patient 2. It has been established that sequence variants can affect binding and subsequently transcription (Reddy, Gertz et al. 2012). We are working to create alignments of the reads supporting these altered segments to determine whether there is a bias towards the mutated allele being bound, which would indicate that the variants are causing sites to become active. The variant closest to a TSS, at 1,572 basepairs, is associated with defensin A1 (DEFA1), identified as a toxic granule component of neutrophils (Linzmeier, Michaelson et al. 1993). It is not known if this protein is present in the toxic granules of LGL. There is also a potential mutation 15 kb from the TSS of the oncogene PIM3 (proviral integration of Moloney murine leukemia virus 3). This is of interest to LGL in that PIMs can be activated by STATs (Shirogane, Fukada et al. 1999) and can in turn phosphorylate RUNX factors such as those identified in this study (Aho, Sandholm et al. 2006).
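The segment subtraction and the enrichment arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names are hypothetical, intervals are treated as half-open (start, end) pairs on a single chromosome, and the subtraction is done base by base (whether whole segments or only overlapping bases were removed in the actual analysis is not specified here). The numeric inputs are the figures quoted in the text.

```python
def subtract_intervals(target, others):
    """Remove from each (start, end) in `target` any bases covered by `others`.

    Intervals are half-open [start, end) and assumed to be on one chromosome.
    """
    result = []
    others = sorted(others)
    for start, end in sorted(target):
        cursor = start
        for o_start, o_end in others:
            if o_end <= cursor:
                continue          # this interval lies entirely before the cursor
            if o_start >= end:
                break             # no later interval can overlap the target
            if o_start > cursor:
                result.append((cursor, o_start))
            cursor = max(cursor, o_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


def bases_per_variant(total_bases, n_variants):
    """Average number of bases per observed variant."""
    return total_bases / n_variants


# Figures quoted in the text above.
segment_rate = bases_per_variant(608_507, 45)            # unique occupied segments
genome_rate = bases_per_variant(3_000_000_000, 30_780)   # assumed 3 billion base genome
fold_enrichment = genome_rate / segment_rate

if __name__ == "__main__":
    print(round(segment_rate), round(genome_rate), round(fold_enrichment, 1))
```

The fold enrichment is simply the ratio of the two densities, so it depends directly on the assumed genome size of 3 billion bases.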

Table 4-6: TBET Occupied Segments Unique to, and Associated with Somatic Variants in Patient Two.

GREAT version 2.0.2. Association rule: Basal + extension: 5000 bp upstream, 1000 bp downstream, 1000000 bp max extension, curated regulatory domains included.
SNV name = chromosome, hg19 coordinate, reference allele/variant allele, reference allele count saliva/reference allele count LGL (distance to TSS in basepairs).

ACTR3BP2  chr2,91693098,T/C,1/2 (-436362)
ADRA2C  chr4,4212836,T/C,2/1 (+444474)
ANKRD20A3  chr9,68306720,G/A,1/0 (+379993)
ANKRD30B  chr18,14265968,G/A,2/1 (-482277)
ARHGAP11B  chr15,30988357,T/C,1/2 (+69363)
BC068290  chr16,33962816,T/C,2/1 (+179084)
CRELD2  chr22,50339088,C/A,2/1 (+26727), chr22,50339084,A/G,1/2 (+26727)
CWH43  chr4,49228597,A/C,2/1 (+240366)
DEFA1  chr8,6877567,C/T,1/2 (-1572), chr8,6877468,G/C,2/1 (-1572)
FAN1  chr15,30988357,T/C,1/2 (-207834)
FOXD4L6  chr9,68306720,G/A,1/0 (+895450)
GGT1  chr22,25020329,C/T,2/1 (+40770), chr22,25020340,A/G,2/1 (+40770), chr22,25020318,T/C,1/2 (+40770)
GPR39  chr2,133026668,C/A,2/1 (-147460)
HERC2P3  chr15,20467242,C/T,2/1 (+243855), chr15,20467511,T/C,1/2 (+243855), chr15,20468050,A/G,1/2 (+243855)
KCNJ18  chr17,21327941,T/G,1/2 (+18835), chr17,21326721,G/A,2/1 (+18835)
LOC150776  chr2,133026668,C/A,2/1 (+776301)
LOC440563  chr1,12924410,A/C,2/1 (+259723), chr1,12924727,A/G,1/2 (+259723), chr1,12924418,A/G,1/0 (+259723)
LOC649330  chr1,12924727,A/G,1/2 (-16366), chr1,12924418,A/G,1/0 (-16366), chr1,12924410,A/C,2/1 (-16366)
MBP  chr18,74679576,G/A,2/1 (+165303), chr18,74679541,G/C,2/1 (+165303)
METTL11A  chr9,132172739,A/C,1/2 (-215775), chr9,132172690,G/C,2/1 (-215775)
MTRNR2L1  chr17,21327941,T/G,1/2 (-695154), chr17,21326721,G/A,2/1 (-695154)
OR11H1  chr22,16170781,A/C,1/2 (+279118)
OR11H12  chr14,19747195,G/T,2/1 (+370164)
OR11H2  chr14,19747195,G/T,2/1 (+434733)
OTOP1  chr4,4212836,T/C,2/1 (+15851)
PIM3  chr22,50339088,C/A,2/1 (-15133), chr22,50339084,A/G,1/2 (-15133)
PIWIL3  chr22,25020340,A/G,2/1 (+150195), chr22,25020329,C/T,2/1 (+150195), chr22,25020318,T/C,1/2 (+150195)
PPP2R4  chr9,132172690,G/C,2/1 (+299432), chr9,132172739,A/C,1/2 (+299432)
SPIN4  chrX,61724589,C/G,1/2 (+867537), chrX,61724747,T/C,1/2 (+867537), chrX,61721804,G/C,1/2 (+867537), chrX,61686913,A/G,1/2 (+867537), chrX,61686997,G/C,2/1 (+867537), chrX,61685569,G/C,2/1 (+867537), chrX,61719163,C/G,1/2 (+867537), chrX,61682047,T/A,2/1 (+867537), chrX,61685903,G/A,1/2 (+867537), chrX,61715620,T/G,2/1 (+867537)
SPNS3  chr17,4306989,T/C,2/1 (-30464)
TEKT4P2  chr21,10212056,A/C,2/1 (-243546)
TPTE  chr21,10212056,A/C,2/1 (+778781)
TUBA3C  chr13,19382667,C/T,1/2 (+373197)
UBE2G1  chr17,4306989,T/C,2/1 (-36786)
ZNF236  chr18,74679576,G/A,2/1 (+143355), chr18,74679541,G/C,2/1 (+143355)
ZNF519  chr18,14265968,G/A,2/1 (-133473)
ZXDA  chrX,58563447,C/T,2/1 (-635539), chrX,58564990,C/T,2/1 (-635539)
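For readers who wish to work with Table 4-6 programmatically, the SNV naming convention given in the table legend can be parsed as sketched below. This is a hedged illustration, not code from this study: the SnvEntry type, its field names and the parse_snv_entries function are hypothetical, and the two counts are labelled following the legend (saliva first, LGL second).

```python
import re
from typing import NamedTuple


class SnvEntry(NamedTuple):
    chrom: str
    pos: int            # hg19 coordinate as printed in the table
    ref: str            # reference allele
    alt: str            # variant allele
    saliva_count: int   # first count in the legend (saliva)
    lgl_count: int      # second count in the legend (LGL)
    tss_distance: int   # signed distance to TSS in basepairs


# chromosome,coordinate,ref/alt,count/count (signed distance)
_ENTRY = re.compile(
    r"(chr\w+),(\d+),([ACGT])/([ACGT]),(\d+)/(\d+)\s*\(([+-]\d+)\)"
)


def parse_snv_entries(cell):
    """Parse every SNV entry found in one table cell into SnvEntry records."""
    return [
        SnvEntry(chrom, int(pos), ref, alt, int(saliva), int(lgl), int(dist))
        for chrom, pos, ref, alt, saliva, lgl, dist in _ENTRY.findall(cell)
    ]


if __name__ == "__main__":
    # The DEFA1 row of Table 4-6.
    for entry in parse_snv_entries(
        "chr8,6877567,C/T,1/2 (-1572), chr8,6877468,G/C,2/1 (-1572)"
    ):
        print(entry)
```

Parsing into named fields makes it straightforward to, for example, sort entries by absolute distance to the TSS.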

Discussion

We identified more than 10,000 potential sites occupied by the transcription factor TBET. It is already apparent that a great number of these will have relevance to LGL leukemia. It is beyond the scope of this dissertation to describe every one of them in detail, but we have deposited the intervals and peaks in a way that lab members can examine their gene of interest for TBET occupation nearby. The first two regions for which we showed peak data, FASLG and S1PR5, show strong evidence of occupancy by TBET. Neither of these has been previously reported in CD8+ T cells, and both will add to our network model. In the case of S1PR5, the binding of EOMES in NK cells is mirrored by the binding of TBET in CD8+ TEMRA. Pathway analysis indicates that quite a few bound regions may overlap between these two cell types, as the top pathways were NK cell mediated toxicity and TCR signaling. Significant overlap occurs in the effectors that are downstream of the cell surface markers, and an association with toxic granule and death inducing components is also expected to be similar between the cell types. What was not expected is the number of NK cell surface markers that are in proximity to TBET bound segments. Both inhibitory and activating surface markers are represented, indicating that the role of TBET in activation may also be to regulate the molecules that help to discriminate self. An example of this is the killer inhibitory receptors (KIR), which inhibit activation when ligated by class I major histocompatibility complex (MHC) members (Wagtmann, Rajagopalan et al. 1995). KIR have been shown to be abnormally regulated in NK-LGL leukemia (Lamy and Loughran 2003). Of note, the activating receptor NKp46 has TBET occupation near the TSS and has been shown to be a marker for the subset of cells that give rise to leukemic LGL in interleukin-15 transgenic mice (Yu, Mitsui et al. 2011). Experimental validation of these interactions should be undertaken as they may have high value.

Not determined in this study was whether the differences between TBET binding sites in individual patients were due to the mutations detected. Elucidating whether these differences are functionally relevant will require expression data and a more accurate way to define which segments are interacting with which promoters. A large number of these sites were several hundred kilobases from the nearest gene but could still affect transcription because of the complex structure of chromatin. Experiments are planned to map these interactions through methods similar to ChIP-Seq (Li, Ruan et al. 2012). Our system of data display will allow us to layer these different sets as tracks in the UCSC genome browser, allowing us to examine complex interactions. Expression data will also be obtained through next generation sequencing to avoid the bias introduced by methods that are restricted to currently annotated genes. Current methods such as microarrays require probe design to capture an RNA of interest, whereas RNA-Seq can be used to sequence the entire RNA population.

TBET was found to be associated with segments consistent with enhancers, weak enhancers and the transcriptional start site. Admittedly, these data may not be entirely accurate for T-LGL, as they were ascertained in a B lymphocyte cell line. The effect that TBET has in these regions will depend on the chromatin state and on what other factors are present. We are currently planning to map these chromatin states in LGL leukemia samples if future attempts to gain funding are successful. Motif analysis has identified several enriched motifs that may be indicative of the other factors present in complex with TBET. Demonstrating the presence of these factors in protein form, along with their occupancy patterns, will add to our current model. The fact that these factors may reciprocally bind the same regions as TBET (Jenner, Townsend et al. 2009) warrants that their nuclear localization be proved prior to ChIP-Seq of these factors. The contribution of Ets related transcription factors has not been determined in regard to LGL leukemia. The fact that this motif is identical to the halfsite of STAT3 takes on importance because halfsite binding by unphosphorylated STAT3 in conjunction with NFKB has been experimentally observed (Timofeeva, Chasovskikh et al.). Another extension of this interaction shows TBET occupied regions near many members of the JAK-STAT signaling pathway. The second most common motif may implicate RUNX transcription factors in LGL leukemia. RUNX leads to lymphoma in transgenic mice that overexpress it. It was shown to regulate multiple factors in the sphingolipid pathway, which confers a survival advantage, and this advantage was shown to be similar to that obtained with exogenous sphingosine-1-phosphate (S1P) (Kilbey, Terry et al. 2010). Others in the lab have previously shown the sphingolipid rheostat, that being the ratio of pro-survival S1P to the apoptotic trigger ceramide, to be altered in LGL leukemia (Shah, Zhang et al. 2008). The regulation of this pathway by TBET and RUNX should be determined and probed for therapeutic opportunities.

A simultaneous advantage and flaw of this work is that it takes place in leukemic cells. Occupied segments that are present in the leukemic cell due to mutation would be considered falsely positive in the normal cells. These segments may contribute to the pathogenesis of LGL leukemia and would in that sense be quite valuable. In Patient 2, 24 regions were identified as uniquely occupied and containing somatic mutations. Many of these segments were quite distal to the nearest gene. This indicates that additional correlative information such as chromatin structure and expression data would be quite helpful to determine which of these may be having an effect on which gene. LGL represent a rare and unique cell type for which this information is not currently available, but collecting it will be the focus of considerable future effort. Consistent with the idea that they may be conferring a survival advantage, a 7-fold enrichment of SNVs was observed in the TBET occupied segments unique to Patient 2. How these altered sites affect transcription remains to be determined.

Future Directions

For the type of comprehensive analysis we wish to perform in the future, the current datasets must be completed. This will involve the sequencing of somatic mutations in ChIP-Seq Sample 3 and the collection of additional expression data. We currently plan to sequence six additional genome pairs within the next six months. Expression data will be gathered by direct sequencing of RNA (RNA-Seq) to avoid biasing against genes that are not currently annotated. Samples for this have been selected and prepared as part of an effort to sequence 24 LGL patient samples.

The next step is to further identify genomic regions of importance in LGL samples. We will do this by performing ChIP-Seq for additional factors and for the marks of active chromatin. Methods will have to be developed to immunoprecipitate (IP) important factors like NFKB and RUNX and will of course depend on which RUNX is expressed in LGL. The marks of active chromatin such as enhancers and insulators are well known and the antibodies are well characterized. We plan to measure DNA associated with histone modifications consistent with active promoters (H3K4me3), distal CRMs (H3K4me1) and one type of repressed chromatin (H3K27me3) by performing ChIP-Seq of these marks in LGL leukemia samples. These biochemical features mark any gene or region in these functional categories, and thus provide a broader view of the chromatin landscape than mapping the specific TFs.

Lastly, the mapping of distal elements with the genes they regulate must be undertaken. We will use the strategy of cross-linking fresh LGL PBMCs followed by the immunoprecipitation of RNA polymerase in a manner similar to regular ChIP-Seq. We will then construct a paired end sequencing library where each end is theoretically derived from the two pieces of DNA that were brought into proximity at the promoter. After sequencing we will map each end to the genome in such a manner that the interaction can be observed. Although technically difficult, this method has been published and should be within our capabilities.

Chapter 5

Concluding Remarks

The focus of this dissertation was LGL leukemia, which represents a clonal expansion of cytotoxic lymphocytes (CTLs). These studies are therefore of interest to those wishing to develop treatments for LGL leukemia and to those with interests related to CTL biology. CTLs are a part of the immune system that controls virally infected and cancerous cells. In this context, abnormalities related to LGL leukemia, such as prolonged survival and apoptotic resistance mechanisms, may be considered advantageous adaptations for cells cultured for CTL therapy. Others in our lab and in the LGL leukemia field have identified numerous abnormal signaling pathways, but the contribution of mutation to these pathways is little known except for the recently discovered mutations in STAT3. Determining the contribution of mutations is beneficial in that it may help to identify those traits of leukemic LGL that are not present in normal LGL. Additionally, therapies against these pathways can be directed at the mutated proteins, which may help to spare normal processes and cell types within the body. This would hopefully lead to more efficacious drugs with fewer side effects.

The studies presented in this dissertation progress from most to least translational: Chapter 2 presents the results of a recent clinical trial correlated with a known mutation, Chapter 3 identifies new mutations with unvalidated changes in function, and Chapter 4 identifies new regions in which we are looking for mutations. Those furthest from direct patient impact will build the base for future studies to find important axes that will eventually lead to new treatments for LGL leukemia.

The main purpose of Chapter 1 was to highlight the known clinical course and dysregulated pathways in LGL leukemia. Much of this information has been repeatedly reviewed and is common knowledge in the field. The revelations about the presence of STAT3 mutations in LGL leukemia and related diseases are a large step forward, as they represent the first indication that mutations are present in LGL leukemia. The finding that both chronic lymphoproliferative disease of NK cells (CLPD-NK) and T cell LGL leukemia (T-LGL) have these mutations unifies these two diseases at the molecular level. The discovery of these mutations in LGL related diseases without significant LGL expansions indicates that this mechanism may extend into a wide range of diseases. Future studies will seek to extend these findings to rheumatoid arthritis (RA). RA is many times more prevalent than LGL leukemia, which increases the potential significance of these findings immensely. I am personally honored to be among the collaborative group that has made these discoveries. Unanswered questions remain, in that we do not know at what stage of transformation these mutations occur or when they occur in relation to any potential antigen driven expansion. Analysis of the demographics for the original paper (Koskela, Eldfors et al. 2012) indicates that patients with mutations may be older than those without. This may give some indication that mutation is a later event. Regardless of when it occurs, the mutation is important in that it directs treatment outcome, as we discovered in Chapter 2.

Chapter 2 describes a decade-long clinical trial, the first prospective trial of immunosuppressives in LGL leukemia. Methotrexate (MTX) was found to be 38% effective as first line treatment whereas cyclophosphamide (Cy) was found to be 64% effective as second line treatment. The incomplete response to either of these drugs highlights the need for more effective therapeutics for LGL leukemia. My major contribution in this area was the correlation of gene signatures and somatic mutations with response to MTX. These were determined from samples collected prior to treatment; it is therefore plausible that some form of gene signature or mutational status test could be used to identify those patients who should not receive MTX as first line treatment. Although Cy appears to have a better treatment response than MTX, even though it was used on cases refractory to MTX, it carries a major risk of adverse side effects that MTX does not have. Individual correlates of the possibly STAT driven gene signature, such as mir-223, may in the future give insight into the pathology of LGL leukemia.

In Chapter 3 we see additional evidence for mutational involvement in LGL leukemia. This includes additional mechanisms of STAT3 activation as well as other potential survival signal initiating mutations, including those in fibroblast growth factor and nuclear factor kappa B signaling. It will be interesting to determine in future studies if these mutations contribute to the responder phenotype in Chapter 2 in the absence of direct STAT3 mutation. The presence of mutations in histone modifying proteins may be common to all LGL leukemia patients, but the effect of the mutations remains to be determined. In contrast to commonalities, it is possible that other mutations that were found, such as those in killer inhibitory receptors, may explain the highly variable symptomatic presentations of individual LGL patients. LGL leukemia is a highly survivable disease, but symptom management and quality of life are still key concerns.

Chapter 4 represents the culmination of earlier work to expand our knowledge of network models of LGL leukemia, which identified TBET as a key node. I was able to identify a large number of TBET bound regions that will be of use in identifying downstream effectors of TBET. This will require future proof that TBET binding is important to expression or repression of these proximal genes. By determining enriched motifs, I also identified a number of factors, such as runt related transcription factors (RUNX), that may be binding with or in opposition to TBET. Determining the involvement of these factors may substantially expand our knowledge if they are in any way as important as TBET appears to be.

Another goal of Chapter 4 was to demonstrate proficiency with regard to ChIP-Seq. This will assist our future goals of determining a comprehensive map of factor interactions and the contribution of mutations to the LGL leukemia phenotype. Figure 5-1 displays a model of the overarching goals for our group for the next 3-5 years of LGL leukemia research. This involves the identification of important genomic regions (Aim 1), which will include studies of the binding of trans factors known to be important to LGL leukemia as well as the determination of chromatin states. This will allow us to build our interaction network as well as narrow down regions of the genome deemed to be important for resequencing studies. We will correlate this with expression data to determine which regions are specifically active or inactive in LGL leukemia, which will indicate their importance to the phenotype. In Aim 2 we will determine the influence of mutation to establish which observed reactions may be unique to LGL leukemia, as opposed to normal LGL, and are therefore potential targets of therapeutic intervention. Lastly, in Aim 3 we will prove the relevance of individual mutations. This will include biochemical studies to show that mutations do cause change in protein function as one example, as well as epidemiological studies to determine which mutations affect individual patient outcomes.


Figure 5-1: Model of Proposed Future Work. This figure outlines three specific aims of our future work, which builds on the foundation of preliminary data detailed in this dissertation. Areas shaded in grey are considered complete.

In conclusion, this dissertation presented one concrete example of how LGL mutation data may be used to positively affect patient outcome and then went on to describe efforts to increase our knowledge of the contribution of mutation to LGL leukemia.

Appendix

Table Appendix-1: Genome One Predicted Protein Changes. Input string for PROVEAN: chromosome number, 1-based coordinate, reference allele, variant allele, genotype (saliva_LGL)

Input  Accession number  Gene Symbol  Amino Acid Change  PROVEAN prediction  dbSNP
1,12887606,C,G,1_2 ENSP00000328783 PRAMEF11 C125S Deleterious rs58074988
1,12887606,C,G,1_2 ENSP00000391839 PRAMEF11 C84S Deleterious rs58074988
1,12887606,C,G,1_2 ENSP00000439551 PRAMEF11 C84S Deleterious rs58074988
1,12887612,T,C,1_2 ENSP00000328783 PRAMEF11 H123R Neutral rs60558629
1,12887612,T,C,1_2 ENSP00000391839 PRAMEF11 H82R Neutral rs60558629
1,12887612,T,C,1_2 ENSP00000439551 PRAMEF11 H82R Neutral rs60558629
1,12919552,T,C,2_1 ENSP00000240189 PRAMEF2 W98R Deleterious rs74056159
1,12919672,T,C,1_2 ENSP00000240189 PRAMEF2 C138R Deleterious rs139382628
1,12921332,T,C,1_2 ENSP00000240189 PRAMEF2 C375R Neutral rs17039307
1,13219565,C,T,2_1 ENSP00000414869 WI2-2994D6.2.1 V7M Deleterious
1,16383682,G,C,1_2 ENSP00000332055 CLCNKB A431P Neutral rs1057854
1,17087103,G,C,2_1 ENSP00000445850 MST1P9 S105C Neutral
1,17087116,C,T,2_1 ENSP00000445850 MST1P9 V101M Neutral
1,18692054,C,T,2_1 ENSP00000251296 IGSF21 P293L Neutral
1,18692054,C,T,2_1 ENSP00000388681 IGSF21 P246L Neutral
1,21605755,A,C,2_1 ENSP00000264205 ECE1 V67G Deleterious
1,21605755,A,C,2_1 ENSP00000349581 ECE1 V58G Deleterious
1,21605755,A,C,2_1 ENSP00000364028 ECE1 V70G Deleterious
1,21605755,A,C,2_1 ENSP00000388439 ECE1 V70G Deleterious
1,21605755,A,C,2_1 ENSP00000405088 ECE1 V54G Deleterious
1,21605755,A,C,2_1 ENSP00000432860 ECE1 V53G Deleterious
1,21605755,A,C,2_1 ENSP00000436633 ECE1 V56G Deleterious
1,22304428,C,G,1_2 ENSP00000363798 CELA3B P14A Neutral rs12074015
1,22304429,C,T,1_2 ENSP00000363798 CELA3B P14L Neutral rs12074016
1,23219398,A,C,2_1 ENSP00000363755 EPHB2 T484P Neutral rs116848191
1,23219398,A,C,2_1 ENSP00000363758 EPHB2 T479P Deleterious rs116848191
1,23219398,A,C,2_1 ENSP00000363761 EPHB2 T484P Deleterious rs116848191
1,23219398,A,C,2_1 ENSP00000363763 EPHB2 T484P Deleterious rs116848191
1,23219398,A,C,2_1 ENSP00000383053 EPHB2 T484P Deleterious rs116848191


1,27106111,A,C,1_2 ENSP00000320485 ARID1A T1908P Deleterious rs75329807
1,27106111,A,C,1_2 ENSP00000363267 ARID1A T1525P Deleterious rs75329807
1,27106111,A,C,1_2 ENSP00000387636 ARID1A T1691P Deleterious rs75329807
1,27106111,A,C,1_2 ENSP00000390317 ARID1A T805P Deleterious rs75329807
1,27106111,A,C,1_2 ENSP00000442437 ARID1A T236P Deleterious rs75329807
1,52825531,C,T,2_1 ENSP00000284376 CC2D1B S263N Neutral
1,52825531,C,T,2_1 ENSP00000360628 CC2D1B S177N Neutral
1,52825531,C,T,2_1 ENSP00000360642 CC2D1B S263N Neutral
1,52825531,C,T,2_1 ENSP00000395921 CC2D1B S50N Neutral
1,89449434,T,C,2_1 ENSP00000318415 RBMXL1 T26A Neutral rs2893084
1,89449434,T,C,2_1 ENSP00000446099 RBMXL1 T26A Neutral rs2893084
1,89449483,C,G,2_1 ENSP00000318415 RBMXL1 K9N Deleterious rs74100106
1,89449483,C,G,2_1 ENSP00000446099 RBMXL1 K9N Deleterious rs74100106
1,108771721,T,C,1_2 ENSP00000359055 NBPF4 Q494R Neutral rs1992354
1,108771721,T,C,1_2 ENSP00000389237 NBPF4 Q494R Neutral rs1992354
1,108771721,T,C,1_2 ENSP00000389741 NBPF4 Q523R Neutral rs1992354
1,117142700,C,A,2_1 ENSP00000321184 IGSF3 S651I Neutral rs75947003
1,117142700,C,A,2_1 ENSP00000358495 IGSF3 S651I Neutral rs75947003
1,117142700,C,A,2_1 ENSP00000358498 IGSF3 S631I Neutral rs75947003
1,144815953,A,G,1_2 ENSP00000342975 NBPF9 N184D Neutral rs320825
1,144815953,A,G,1_2 ENSP00000364702 NBPF9 N183D Neutral rs320825
1,144815953,A,G,1_2 ENSP00000390934 NBPF9 N184D Neutral rs320825
1,144815968,G,A,1_2 ENSP00000342975 NBPF9 V189I Neutral
1,144815968,G,A,1_2 ENSP00000364702 NBPF9 V188I Neutral
1,144815968,G,A,1_2 ENSP00000390934 NBPF9 V189I Neutral
1,144828764,G,T,2_1 ENSP00000281815 NBPF9 E202* NA rs114576842
1,144828764,G,T,2_1 ENSP00000342975 NBPF9 E604* NA rs114576842
1,144828764,G,T,2_1 ENSP00000364702 NBPF9 E678* NA rs114576842
1,145296448,T,A,1_2 ENSP00000345684 NBPF10 Y124N Neutral rs4996268
1,145296448,T,A,1_2 ENSP00000358345 NBPF10 Y124N Neutral rs4996268
1,145296448,T,A,1_2 ENSP00000414194 NBPF10 Y49N Neutral rs4996268
1,161480669,T,G,1_2 ENSP00000271450 FCGR2A V222G Neutral
1,161480669,T,G,1_2 ENSP00000356949 FCGR2A V221G Neutral
1,167097450,G,A,2_1 ENSP00000271385 DUSP27 E1028K Neutral
1,167097450,G,A,2_1 ENSP00000354483 DUSP27 E1028K Neutral
1,167097450,G,A,2_1 ENSP00000404874 DUSP27 E1028K Neutral
1,201180100,C,G,1_2 ENSP00000334714 IGFN1 R2027G Neutral rs4915223
1,203452536,G,T,1_2 ENSP00000343924 PRELP R75L Deleterious
1,226027704,C,A,2_1 ENSP00000272167 EPHX1 Y299* NA
1,226027704,C,A,2_1 ENSP00000355802 EPHX1 Y299* NA
1,248436456,C,A,1_2 ENSP00000324687 OR2T33 A221S Neutral rs111275277


1,248436527,A,C,1_2 ENSP00000324687 OR2T33 M197R Deleterious rs4474294
1,248436809,G,A,2_1 ENSP00000324687 OR2T33 P103L Neutral rs61832700
2,29268211,C,T,2_1 ENSP00000368876 FAM179A A886V Neutral
2,29268211,C,T,2_1 ENSP00000384699 FAM179A A831V Neutral
2,74273904,A,G,1_2 ENSP00000233310 TET3 E152G Neutral
2,74273904,A,G,1_2 ENSP00000307803 TET3 E194G Neutral
2,74273904,A,G,1_2 ENSP00000386869 TET3 E152G Neutral
2,97474433,T,G,2_1 ENSP00000366275 CNNM4 I695S Deleterious
2,97474433,T,G,2_1 ENSP00000444806 CNNM4 I182S Deleterious
2,97757296,C,T,1_2 ENSP00000272610 FAHD2B G50R Deleterious
2,97757296,C,T,1_2 ENSP00000410470 FAHD2B G50R Deleterious
2,97757296,C,T,1_2 ENSP00000444599 FAHD2B G50R Deleterious
2,107049392,T,A,1_2 ENSP00000303659 RGPD3 K823M Deleterious rs832373
2,107049392,T,A,1_2 ENSP00000386588 RGPD3 K823M Deleterious rs832373
2,107049392,T,A,1_2 ENSP00000413577 RGPD3 K581M Deleterious rs832373
2,107049447,C,T,1_2 ENSP00000303659 RGPD3 A805T Neutral rs700866
2,107049447,C,T,1_2 ENSP00000386588 RGPD3 A805T Neutral rs700866
2,107049447,C,T,1_2 ENSP00000413577 RGPD3 A563T Neutral rs700866
2,112615888,C,G,1_2 ENSP00000339109 ANAPC1 Q451H Neutral rs79100806
2,130831867,G,C,2_1 ENSP00000350052 POTEF Q1060E Neutral
2,130831867,G,C,2_1 ENSP00000386786 POTEF Q1060E Neutral
2,132905741,G,A,1_2 ENSP00000386398 ANKRD30BL T247M Neutral rs111770980
2,153486204,T,A,2_1 ENSP00000288670 FMNL2 I812K Deleterious
2,153486204,T,A,2_1 ENSP00000401393 FMNL2 I293K Deleterious
2,153486204,T,A,2_1 ENSP00000418959 FMNL2 I187K Deleterious
2,240969483,G,A,1_2 ENSP00000384563 OR6B2 R122C Deleterious rs10176036
3,49137448,T,G,2_1 ENSP00000307567 QARS D414A Deleterious
3,49137448,T,G,2_1 ENSP00000390015 QARS D403A Deleterious
3,75714041,G,A,1_2 ENSP00000312299 FRG2C G84R Neutral rs71244678
3,75714041,G,A,1_2 ENSP00000419432 FRG2C G83R Neutral rs71244678
3,111918215,C,T,2_1 ENSP00000306627 SLC9A10 G826S Neutral rs28516377
3,111918215,C,T,2_1 ENSP00000420688 SLC9A10 G778S Neutral rs28516377
3,111918235,G,T,1_2 ENSP00000306627 SLC9A10 T819K Neutral rs76044261
3,111918235,G,T,1_2 ENSP00000420688 SLC9A10 T771K Neutral rs76044261
3,167506966,C,A,2_1 ENSP00000295777 SERPINI1 T17K Neutral
3,167506966,C,A,2_1 ENSP00000397373 SERPINI1 T17K Neutral
3,167506966,C,A,2_1 ENSP00000420133 SERPINI1 T17K Neutral
3,167506966,C,A,2_1 ENSP00000420561 SERPINI1 T17K Neutral
3,195447886,G,C,2_1 ENSP00000325431 MUC20 C3S Neutral rs7627924
3,195447886,G,C,2_1 ENSP00000396774 MUC20 C3S Neutral rs7627924
3,195447886,G,C,2_1 ENSP00000414350 MUC20 C3S Neutral rs7627924


3,195507053,A,C,1_2 ENSP00000417397 MUC4 S3800A Neutral
3,195507053,A,C,1_2 ENSP00000417498 MUC4 S3800A Neutral
3,195507053,A,C,1_2 ENSP00000417657 MUC4 S3800A Neutral
3,195507053,A,C,1_2 ENSP00000417722 MUC4 S3800A Neutral
3,195507053,A,C,1_2 ENSP00000417757 MUC4 S3800A Neutral
3,195507053,A,C,1_2 ENSP00000418306 MUC4 S3800A Neutral
3,195507053,A,C,1_2 ENSP00000419798 MUC4 S3800A Neutral
3,195507053,A,C,1_2 ENSP00000419989 MUC4 S3800A Neutral
3,195507053,A,C,1_2 ENSP00000420243 MUC4 S3800A Neutral
3,195507053,A,C,1_2 ENSP00000420439 MUC4 S3800A Neutral
3,195507062,C,T,1_2 ENSP00000417397 MUC4 D3797N Neutral
3,195507062,C,T,1_2 ENSP00000417498 MUC4 D3797N Neutral
3,195507062,C,T,1_2 ENSP00000417657 MUC4 D3797N Neutral
3,195507062,C,T,1_2 ENSP00000417722 MUC4 D3797N Neutral
3,195507062,C,T,1_2 ENSP00000417757 MUC4 D3797N Neutral
3,195507062,C,T,1_2 ENSP00000418306 MUC4 D3797N Neutral
3,195507062,C,T,1_2 ENSP00000419798 MUC4 D3797N Neutral
3,195507062,C,T,1_2 ENSP00000419989 MUC4 D3797N Neutral
3,195507062,C,T,1_2 ENSP00000420243 MUC4 D3797N Neutral
3,195507062,C,T,1_2 ENSP00000420439 MUC4 D3797N Neutral
3,195508510,A,T,2_1 ENSP00000417397 MUC4 L3314H Neutral
3,195508510,A,T,2_1 ENSP00000417498 MUC4 L3314H Neutral
3,195508510,A,T,2_1 ENSP00000417657 MUC4 L3314H Neutral
3,195508510,A,T,2_1 ENSP00000417722 MUC4 L3314H Neutral
3,195508510,A,T,2_1 ENSP00000417757 MUC4 L3314H Neutral
3,195508510,A,T,2_1 ENSP00000418306 MUC4 L3314H Neutral
3,195508510,A,T,2_1 ENSP00000419798 MUC4 L3314H Neutral
3,195508510,A,T,2_1 ENSP00000419989 MUC4 L3314H Neutral
3,195508510,A,T,2_1 ENSP00000420243 MUC4 L3314H Neutral
3,195508510,A,T,2_1 ENSP00000420439 MUC4 L3314H Neutral
3,195515242,A,G,1_2 ENSP00000417397 MUC4 V1070A Neutral rs2948678
3,195515242,A,G,1_2 ENSP00000417498 MUC4 V1070A Neutral rs2948678
3,195515242,A,G,1_2 ENSP00000417657 MUC4 V1070A Neutral rs2948678
3,195515242,A,G,1_2 ENSP00000417722 MUC4 V1070A Neutral rs2948678
3,195515242,A,G,1_2 ENSP00000417757 MUC4 V1070A Neutral rs2948678
3,195515242,A,G,1_2 ENSP00000418306 MUC4 V1070A Neutral rs2948678
3,195515242,A,G,1_2 ENSP00000419798 MUC4 V1070A Neutral rs2948678
3,195515242,A,G,1_2 ENSP00000419989 MUC4 V1070A Neutral rs2948678
3,195515242,A,G,1_2 ENSP00000420243 MUC4 V1070A Neutral rs2948678
3,195515242,A,G,1_2 ENSP00000420439 MUC4 V1070A Neutral rs2948678
4,4190595,G,C,2_1 ENSP00000296358 OTOP1 P592A Deleterious rs2310687


5,741736,G,T,1_0 ENSP00000442373 ZDHHC11B A314D Neutral rs61128505
5,741736,G,T,1_0 ENSP00000445280 ZDHHC11B A303D Neutral rs61128505
5,141248963,T,G,2_1 ENSP00000287008 PCDH1 H25P Neutral rs12515587
5,141248963,T,G,2_1 ENSP00000350122 PCDH1 H36P Neutral rs12515587
5,141248963,T,G,2_1 ENSP00000378043 PCDH1 H25P Neutral rs12515587
5,141248963,T,G,2_1 ENSP00000403497 PCDH1 H25P Neutral rs12515587
5,141248963,T,G,2_1 ENSP00000424163 PCDH1 H3P Neutral rs12515587
5,141248963,T,G,2_1 ENSP00000424667 PCDH1 H25P Neutral rs12515587
5,141248963,T,G,2_1 ENSP00000438825 PCDH1 H3P Neutral rs12515587
5,161529565,C,T,2_1 ENSP00000410732 GABRG2 R213C Neutral
6,2623686,T,C,2_1 ENSP00000296847 C6orf195 N124S Deleterious
6,32549589,A,C,2_1 ENSP00000353099 HLA-DRB1 S133A Neutral
6,32725567,C,T,1_2 ENSP00000396330 HLA-DQB2 R247H Neutral rs77504727
6,32725567,C,T,1_2 ENSP00000410512 HLA-DQB2 R247H Neutral rs77504727
6,35467830,A,G,2_1 ENSP00000229771 TULP1 Y475H Deleterious
6,35467830,A,G,2_1 ENSP00000319414 TULP1 Y422H Deleterious
6,36393724,A,G,1_2 ENSP00000419944 PXT1 S46P Neutral
6,43164501,G,A,1_0 ENSP00000252050 CUL9 G902R Neutral rs147443571
6,43164501,G,A,1_0 ENSP00000346490 CUL9 G792R Neutral rs147443571
6,43164501,G,A,1_0 ENSP00000361730 CUL9 G902R Neutral rs147443571
6,109767930,G,T,2_1 ENSP00000351385 MICAL1 A672E Neutral rs9320288
6,109767930,G,T,2_1 ENSP00000351664 MICAL1 A758E Neutral rs9320288
6,109767930,G,T,2_1 ENSP00000357948 MICAL1 A777E Neutral rs9320288
6,109767930,G,T,2_1 ENSP00000357953 MICAL1 A282E Neutral rs9320288
6,109767931,C,T,2_1 ENSP00000351385 MICAL1 A672T Neutral
6,109767931,C,T,2_1 ENSP00000351664 MICAL1 A758T Neutral
6,109767931,C,T,2_1 ENSP00000357948 MICAL1 A777T Neutral
6,109767931,C,T,2_1 ENSP00000357953 MICAL1 A282T Neutral
6,136589448,C,A,2_1 ENSP00000229446 BCLAF1 R748L Neutral rs78267720
6,136589448,C,A,2_1 ENSP00000376159 BCLAF1 R748L Neutral rs78267720
6,136589448,C,A,2_1 ENSP00000434826 BCLAF1 R748L Neutral rs78267720
6,136589448,C,A,2_1 ENSP00000435210 BCLAF1 R750L Neutral rs78267720
6,136589448,C,A,2_1 ENSP00000435441 BCLAF1 R750L Neutral rs78267720
6,136589448,C,A,2_1 ENSP00000436501 BCLAF1 R577L Neutral rs78267720
6,136589448,C,A,2_1 ENSP00000437333 BCLAF1 R5L Deleterious rs78267720
6,136594292,T,C,2_1 ENSP00000229446 BCLAF1 N627S Deleterious rs7381749
6,136594292,T,C,2_1 ENSP00000376159 BCLAF1 N627S Deleterious rs7381749
6,136594292,T,C,2_1 ENSP00000431734 BCLAF1 N629S Neutral rs7381749
6,136594292,T,C,2_1 ENSP00000433505 BCLAF1 N629S Deleterious rs7381749
6,136594292,T,C,2_1 ENSP00000434826 BCLAF1 N627S Deleterious rs7381749
6,136594292,T,C,2_1 ENSP00000435210 BCLAF1 N629S Deleterious rs7381749


6,136594292,T,C,2_1 ENSP00000435441 BCLAF1 N629S Deleterious rs7381749 6,136594292,T,C,2_1 ENSP00000436142 BCLAF1 N627S Deleterious rs7381749 6,136594292,T,C,2_1 ENSP00000436216 BCLAF1 N629S Deleterious rs7381749 6,136594292,T,C,2_1 ENSP00000436501 BCLAF1 N456S Neutral rs7381749 6,167591954,G,A,1_2 ENSP00000283507 TCP10L2 R194H Neutral rs2297463 6,167591954,G,A,1_2 ENSP00000355797 TCP10L2 R194H Neutral rs2297463 7,72418963,T,C,1_2 ENSP00000378687 POM121 L985P Neutral rs1107 7,72418963,T,C,1_2 ENSP00000393020 POM121 L985P Neutral rs1107 7,100549505,T,C,1_2 ENSP00000368771 AC118759.1 I29T Neutral rs74588241 7,100549516,C,A,1_2 ENSP00000368771 AC118759.1 P33T Neutral rs76951301 7,100551433,T,C,2_1 ENSP00000324834 MUC3A F62L Neutral rs73398732 7,100552018,T,G,2_1 ENSP00000324834 MUC3A S257A Neutral rs76249962 7,142458929,G,C,1_2 ENSP00000417854 PRSS1 K70N Neutral 7,142460339,G,A,2_1 ENSP00000308720 PRSS1 C171Y Deleterious 7,142460339,G,A,2_1 ENSP00000417854 PRSS1 C185Y Deleterious 7,142460339,G,A,2_1 ENSP00000419912 PRSS1 C121Y Deleterious 7,142460339,G,A,2_1 ENSP00000434456 PRSS1 C161Y Deleterious 7,150500797,C,A,2_1 ENSP00000004103 TMEM176A Y144* NA 7,150500797,C,A,2_1 ENSP00000417626 TMEM176A Y144* NA 7,150500797,C,A,2_1 ENSP00000417834 TMEM176A Y96* NA 7,150500797,C,A,2_1 ENSP00000420081 TMEM176A Y85* NA 7,150500797,C,A,2_1 ENSP00000420818 TMEM176A Y85* NA 7,150707313,C,T,2_1 ENSP00000297494 NOS3 R875W Deleterious 7,150707313,C,T,2_1 ENSP00000417143 NOS3 R669W Deleterious 7,150707313,C,T,2_1 ENSP00000418245 NOS3 R169W Deleterious 7,150937277,T,C,1_2 ENSP00000173385 SMARCD3 E317G Deleterious 7,150937277,T,C,1_2 ENSP00000262188 SMARCD3 E365G Deleterious 7,150937277,T,C,1_2 ENSP00000349254 SMARCD3 E352G Deleterious 7,150937277,T,C,1_2 ENSP00000376558 SMARCD3 E352G Deleterious 7,151932945,C,T,1_2 ENSP00000262189 MLL3 R909K Neutral 7,151932945,C,T,1_2 ENSP00000347325 MLL3 R909K Neutral 7,151932945,C,T,1_2 ENSP00000403483 MLL3 R65K Deleterious 7,151935871,C,A,2_1 
ENSP00000262189 MLL3 W858L Deleterious rs111493987 7,151935871,C,A,2_1 ENSP00000347325 MLL3 W858L Deleterious rs111493987 7,151935910,C,T,2_1 ENSP00000262189 MLL3 G845E Deleterious rs4024419 7,151935910,C,T,2_1 ENSP00000347325 MLL3 G845E Deleterious rs4024419 7,157341693,C,T,2_1 ENSP00000374064 PTPRN2 A946T Deleterious 7,157341693,C,T,2_1 ENSP00000374067 PTPRN2 A958T Deleterious 7,157341693,C,T,2_1 ENSP00000374069 PTPRN2 A975T Deleterious 7,157341693,C,T,2_1 ENSP00000385464 PTPRN2 A998T Deleterious 7,157341693,C,T,2_1 ENSP00000387114 PTPRN2 A937T Deleterious


8,101721839,C,A,2_1 ENSP00000174661 PABPC1 V365L Deleterious 8,101721839,C,A,2_1 ENSP00000313007 PABPC1 V365L Deleterious 8,101721839,C,A,2_1 ENSP00000427914 PABPC1 V234L Deleterious 8,101721839,C,A,2_1 ENSP00000428948 PABPC1 V56L Deleterious 8,101721839,C,A,2_1 ENSP00000429395 PABPC1 V333L Deleterious 8,101721839,C,A,2_1 ENSP00000429594 PABPC1 V320L Deleterious 8,101721839,C,A,2_1 ENSP00000430068 PABPC1 V198L Deleterious 9,5897614,T,G,1_2 ENSP00000370880 MLANA C45W Deleterious 9,5897614,T,G,1_2 ENSP00000370885 MLANA C45W Deleterious 9,5897614,T,G,1_2 ENSP00000370886 MLANA C45W Deleterious 9,33385287,T,C,2_1 ENSP00000297988 AQP7 N249D Neutral rs74557595 9,33385287,T,C,2_1 ENSP00000368821 AQP7 N248D Neutral rs74557595 9,33385287,T,C,2_1 ENSP00000410138 AQP7 N157D Neutral rs74557595 9,33796799,A,T,2_1 ENSP00000340889 PRSS3 T81S Neutral 9,33796799,A,T,2_1 ENSP00000354280 PRSS3 T124S Neutral 9,33796799,A,T,2_1 ENSP00000368715 PRSS3 T67S Neutral 9,33796799,A,T,2_1 ENSP00000401249 PRSS3 T79S Neutral 9,33796799,A,T,2_1 ENSP00000401828 PRSS3 T60S Neutral 9,43849812,T,G,1_2 ENSP00000340890 CNTNAP3B C573G Deleterious rs3864761 9,43849812,T,G,1_2 ENSP00000366784 CNTNAP3B C622G Deleterious rs3864761 9,43849812,T,G,1_2 ENSP00000366787 CNTNAP3B C573G Deleterious rs3864761 9,78790122,G,A,2_1 ENSP00000365958 PCSK5 M659I Neutral 9,78790122,G,A,2_1 ENSP00000379415 PCSK5 M659I Neutral 9,132662285,A,G,1_2 ENSP00000347907 FNBP1 L520P Deleterious 9,132662285,A,G,1_2 ENSP00000361492 FNBP1 L549P Deleterious 9,132662285,A,G,1_2 ENSP00000361493 FNBP1 L549P Deleterious 9,132662285,A,G,1_2 ENSP00000389117 FNBP1 L177P Deleterious 9,132662285,A,G,1_2 ENSP00000407548 FNBP1 L540P Deleterious 9,132662285,A,G,1_2 ENSP00000413625 FNBP1 L549P Deleterious 9,132662285,A,G,1_2 ENSP00000415602 FNBP1 L501P Deleterious 10,30915708,T,G,1_2 ENSP00000364467 LYZL2 N92T Deleterious rs142897252 10,30918549,G,C,2_1 ENSP00000364467 LYZL2 A29G Neutral 10,46999604,A,G,0_1 ENSP00000363433 GPRIN2 R242G Neutral 
rs3127683 10,46999604,A,G,0_1 ENSP00000363436 GPRIN2 R242G Neutral rs3127683 10,47207813,T,C,1_2 ENSP00000347372 AGAP10 H157R Neutral rs79933604 10,47207813,T,C,1_2 ENSP00000392206 AGAP10 H132R Neutral rs79933604 10,47207813,T,C,1_2 ENSP00000407436 AGAP10 H228R Neutral rs79933604 10,75239194,T,C,2_1 ENSP00000343147 PPP3CB K56R Neutral 10,75239194,T,C,2_1 ENSP00000353881 PPP3CB K56R Neutral 10,75239194,T,C,2_1 ENSP00000378299 PPP3CB K56R Neutral 10,75239194,T,C,2_1 ENSP00000378305 PPP3CB K56R Neutral


10,75239194,T,C,2_1 ENSP00000378306 PPP3CB K56R Neutral 10,111893087,A,G,2_1 ENSP00000277900 ADD3 N579S Neutral 10,111893087,A,G,2_1 ENSP00000348381 ADD3 N611S Neutral 10,111893087,A,G,2_1 ENSP00000353286 ADD3 N579S Neutral 10,123298190,C,G,2_1 ENSP00000263451 FGFR2 V222L Deleterious 10,123298190,C,G,2_1 ENSP00000309878 FGFR2 V222L Deleterious 10,123298190,C,G,2_1 ENSP00000337665 FGFR2 V133L Deleterious 10,123298190,C,G,2_1 ENSP00000348559 FGFR2 V107L Deleterious 10,123298190,C,G,2_1 ENSP00000350166 FGFR2 V133L Deleterious 10,123298190,C,G,2_1 ENSP00000351276 FGFR2 V222L Deleterious 10,123298190,C,G,2_1 ENSP00000352309 FGFR2 V222L Deleterious 10,123298190,C,G,2_1 ENSP00000353262 FGFR2 V133L Deleterious 10,123298190,C,G,2_1 ENSP00000358052 FGFR2 V222L Deleterious 10,123298190,C,G,2_1 ENSP00000358054 FGFR2 V222L Deleterious 10,123298190,C,G,2_1 ENSP00000358055 FGFR2 V107L Deleterious 10,123298190,C,G,2_1 ENSP00000358056 FGFR2 V222L Deleterious 10,123298190,C,G,2_1 ENSP00000358057 FGFR2 V222L Neutral 10,123298190,C,G,2_1 ENSP00000358058 FGFR2 V222L Deleterious 10,123298190,C,G,2_1 ENSP00000410294 FGFR2 V222L Deleterious 10,135440222,C,T,2_1 ENSP00000401310 FRG2B D9N Neutral rs35781983 10,135440222,C,T,2_1 ENSP00000408343 FRG2B D9N Neutral rs35781983 11,1011490,G,A,1_0 ENSP00000328024 AP2A2 R431K Neutral rs7950955 11,1018295,T,C,2_1 ENSP00000406861 MUC6 I1502M Neutral rs73403297 11,4976077,T,G,1_2 ENSP00000369729 OR51A2 K289N Neutral rs2570573 11,47857261,T,A,1_2 ENSP00000367721 NUP160 Y348F Neutral 11,47857261,T,A,1_2 ENSP00000412204 NUP160 Y98F Neutral 11,47857261,T,A,1_2 ENSP00000432367 NUP160 Y234F Neutral 11,47857261,T,A,1_2 ENSP00000433590 NUP160 Y234F Neutral 11,48346961,A,T,2_1 ENSP00000321419 OR4C3 N157Y Neutral rs75493089 11,48346961,A,T,2_1 ENSP00000378660 OR4C3 N20Y Neutral rs75493089 11,48346962,A,G,2_1 ENSP00000321419 OR4C3 N157S Neutral rs74589050 11,48346962,A,G,2_1 ENSP00000378660 OR4C3 N20S Neutral rs74589050 11,48387237,C,T,1_2 ENSP00000321338 OR4C5 
V261I Neutral rs72898877 11,71544246,A,G,1_2 ENSP00000333234 DEFB108B M1V Deleterious rs72616146 11,89774540,C,A,1_2 ENSP00000388299 TRIM49L2 T394N Neutral rs75119043 12,2602398,C,T,2_1 ENSP00000266376 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000323129 CACNA1C T161M Deleterious 12,2602398,C,T,2_1 ENSP00000329877 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000336982 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000341092 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382500 CACNA1C T320M Deleterious


12,2602398,C,T,2_1 ENSP00000382504 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382506 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382510 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382512 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382515 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382526 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382530 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382537 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382542 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382546 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382547 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382549 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382552 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382557 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000382563 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000385724 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000385896 CACNA1C T320M Deleterious 12,2602398,C,T,2_1 ENSP00000437936 CACNA1C T320M Deleterious 12,11285909,G,C,1_2 ENSP00000444736 TAS2R30 T312R Neutral rs78329184 12,11286088,A,C,2_1 ENSP00000444736 TAS2R30 F252L Neutral rs2599404 12,11461553,T,C,1_2 ENSP00000279575 PRB4 R122G Neutral 12,11461553,T,C,1_2 ENSP00000442834 PRB4 R122G Neutral 12,11461570,G,T,2_1 ENSP00000279575 PRB4 P116H Neutral rs59189129 12,11461570,G,T,2_1 ENSP00000442834 PRB4 P116H Neutral rs59189129 12,49440443,C,A,2_1 ENSP00000301067 MLL2 C1456F Deleterious 12,50746514,G,C,1_2 ENSP00000329995 FAM186A H1367Q Neutral rs71466905 12,50746514,G,C,1_2 ENSP00000441337 FAM186A H1367Q Neutral rs71466905 12,133697130,C,G,2_1 ENSP00000380480 ZNF891 V459L Neutral rs2173970 12,133697130,C,G,2_1 ENSP00000437590 ZNF891 V459L Neutral rs2173970 13,24468329,A,G,1_2 ENSP00000371489 C1QTNF9B C60R Neutral rs4067961 13,24468329,A,G,1_2 ENSP00000371572 C1QTNF9B C60R Neutral rs4067961 13,24468329,A,G,1_2 ENSP00000371575 C1QTNF9B C60R 
Neutral rs4067961 13,24468329,A,G,1_2 ENSP00000371580 C1QTNF9B C60R Neutral rs4067961 14,20010197,C,T,1_2 ENSP00000340144 POTEM V321I Neutral 14,20010197,C,T,1_2 ENSP00000389253 POTEM V406I Neutral 14,20010197,C,T,1_2 ENSP00000450853 POTEM V321I Neutral 14,20010197,C,T,1_2 ENSP00000452296 POTEM V321I Neutral 15,20740330,C,A,2_1 ENSP00000398615 GOLGA6L6 E474* NA 15,23685647,C,T,2_1 ENSP00000454407 GOLGA6L2 E659K Neutral 15,28566548,T,C,1_2 ENSP00000261609 HERC2 Q11R Neutral rs2920545 15,28566548,T,C,1_2 ENSP00000456237 HERC2 Q11R Deleterious rs2920545


15,28566562,G,C,1_2 ENSP00000261609 HERC2 F6L Neutral rs2525965 15,28566562,G,C,1_2 ENSP00000456237 HERC2 F6L Neutral rs2525965 15,42433802,A,G,1_2 ENSP00000400843 PLA2G4F L857P Neutral rs2280248 15,78633595,T,G,1_2 ENSP00000385978 CRABP1 V79G Neutral rs2139440 15,91290630,C,G,1_2 ENSP00000347232 BLM A3G Neutral 15,91290630,C,G,1_2 ENSP00000453359 BLM A3G Neutral 15,91290630,C,G,1_2 ENSP00000454158 BLM A3G Neutral 15,93522449,C,G,2_1 ENSP00000377747 CHD2 L938V Deleterious 15,93522449,C,G,2_1 ENSP00000451366 CHD2 L938V Deleterious 15,94901745,A,G,2_1 ENSP00000350377 MCTP2 E402G Deleterious 15,94901745,A,G,2_1 ENSP00000395109 MCTP2 E402G Deleterious 16,55862824,C,T,2_1 ENSP00000353720 CES1 V39I Neutral rs3826192 16,55862824,C,T,2_1 ENSP00000355193 CES1 V38I Neutral rs3826192 16,55862824,C,T,2_1 ENSP00000390492 CES1 V38I Neutral rs3826192 16,69921959,A,C,2_1 ENSP00000348283 WWP2 T241P Neutral 16,69921959,A,C,2_1 ENSP00000352069 WWP2 T241P Neutral 16,69921959,A,C,2_1 ENSP00000396871 WWP2 T241P Neutral 16,69921959,A,C,2_1 ENSP00000445616 WWP2 T125P Neutral 16,69921959,A,C,2_1 ENSP00000455311 WWP2 T241P Neutral 17,12883487,C,T,2_1 ENSP00000262444 ARHGAP44 P84S Neutral 17,12883487,C,T,2_1 ENSP00000342566 ARHGAP44 P620S Neutral 17,12883487,C,T,2_1 ENSP00000368994 ARHGAP44 P626S Neutral 17,12883487,C,T,2_1 ENSP00000437542 ARHGAP44 P282S Neutral 17,15517284,G,C,1_2 ENSP00000261644 RP11-385D13.1.1 A245G Neutral rs3809727 17,15517284,G,C,1_2 ENSP00000379242 RP11-385D13.1.1 A245G Neutral rs3809727 17,15517284,G,C,1_2 ENSP00000402644 RP11-385D13.1.1 A570G Neutral rs3809727 17,16068377,C,G,2_1 ENSP00000268712 NCOR1 K178N Deleterious 17,16068377,C,G,2_1 ENSP00000379189 NCOR1 K69N Deleterious 17,16068377,C,G,2_1 ENSP00000379190 NCOR1 K69N Deleterious 17,16068377,C,G,2_1 ENSP00000379192 NCOR1 K178N Deleterious 17,16068377,C,G,2_1 ENSP00000387727 NCOR1 K178N Deleterious 17,16068377,C,G,2_1 ENSP00000395091 NCOR1 K178N Deleterious 17,16068377,C,G,2_1 ENSP00000407998 NCOR1 K178N
Deleterious 17,18682399,A,G,1_2 ENSP00000306937 FBXW10 T930A Neutral rs1318979 17,18682399,A,G,1_2 ENSP00000310382 FBXW10 T992A Neutral rs1318979 17,18682399,A,G,1_2 ENSP00000379025 FBXW10 T983A Neutral rs1318979 17,18682399,A,G,1_2 ENSP00000379026 FBXW10 T982A Neutral rs1318979 17,19318132,G,A,2_1 ENSP00000454919 RNF112 R353Q Neutral 17,19318132,G,A,2_1 ENSP00000457295 RNF112 R236Q Neutral 17,20769954,T,G,2_1 ENSP00000328054 CCDC144NL T160P Neutral rs77065992


17,40474420,C,A,2_1 ENSP00000264657 STAT3 D661Y Neutral 17,40474420,C,A,2_1 ENSP00000373923 STAT3 D563Y Neutral 17,40474420,C,A,2_1 ENSP00000384943 STAT3 D661Y Neutral 17,45266522,T,C,1_0 ENSP00000066544 CDC27 E6G Deleterious rs62077279 17,45266522,T,C,1_0 ENSP00000432105 CDC27 E6G Deleterious rs62077279 17,45266522,T,C,1_0 ENSP00000432211 CDC27 E6G Deleterious rs62077279 17,45266522,T,C,1_0 ENSP00000433330 CDC27 E6G Deleterious rs62077279 17,45266522,T,C,1_0 ENSP00000434614 CDC27 E6G Deleterious rs62077279 17,45266522,T,C,1_0 ENSP00000437339 CDC27 E6G Deleterious rs62077279 19,22575777,T,C,2_1 ENSP00000350418 ZNF98 Y87C Neutral 19,22868629,A,T,1_2 ENSP00000342595 AC011467.1 S221C Neutral rs56047731 19,23544625,C,T,1_2 ENSP00000300619 ZNF91 A386T Deleterious rs403356 19,23544625,C,T,1_2 ENSP00000380272 ZNF91 A354T Neutral rs403356 19,35863226,G,T,1_2 ENSP00000410925 GPR42 R322L Neutral rs150552589 19,51919949,G,A,1_0 ENSP00000342389 SIGLEC10 A226V Neutral rs9304711 19,51919949,G,A,1_0 ENSP00000345243 SIGLEC10 A226V Neutral rs9304711 19,51919949,G,A,1_0 ENSP00000348646 SIGLEC10 A226V Neutral rs9304711 19,51919949,G,A,1_0 ENSP00000389132 SIGLEC10 A168V Neutral rs9304711 19,51919949,G,A,1_0 ENSP00000395475 SIGLEC10 A168V Neutral rs9304711 19,51919949,G,A,1_0 ENSP00000408387 SIGLEC10 A168V Neutral rs9304711 19,51919949,G,A,1_0 ENSP00000414324 SIGLEC10 A178V Neutral rs9304711 19,51919949,G,A,1_0 ENSP00000431444 SIGLEC10 A226V Neutral rs9304711 19,51919949,G,A,1_0 ENSP00000435281 SIGLEC10 A40V Neutral rs9304711 19,51920196,G,T,1_2 ENSP00000342389 SIGLEC10 Q144K Neutral rs73049612 19,51920196,G,T,1_2 ENSP00000345243 SIGLEC10 Q144K Neutral rs73049612 19,51920196,G,T,1_2 ENSP00000348646 SIGLEC10 Q144K Neutral rs73049612 19,51920196,G,T,1_2 ENSP00000396742 SIGLEC10 Q144K Deleterious rs73049612 19,51920196,G,T,1_2 ENSP00000431444 SIGLEC10 Q144K Neutral rs73049612 19,51920196,G,T,1_2 ENSP00000433838 SIGLEC10 Q111K Neutral rs73049612 19,53770764,G,A,1_0 ENSP00000310856 VN1R4 
A52V Deleterious rs74429916 19,54403903,G,A,2_1 ENSP00000263431 PRKCG G492E Deleterious 19,54403903,G,A,2_1 ENSP00000438090 PRKCG G379E Deleterious 19,54403903,G,A,2_1 ENSP00000443493 PRKCG G492E Deleterious 19,54726241,C,T,1_2 ENSP00000245620 LILRB3 M88I Neutral rs77279742 19,54726241,C,T,1_2 ENSP00000345184 LILRB3 M88I Neutral rs77279742 19,54726241,C,T,1_2 ENSP00000375630 LILRB3 M88I Neutral rs77279742 19,54726241,C,T,1_2 ENSP00000388199 LILRB3 M88I Deleterious rs77279742 19,54726241,C,T,1_2 ENSP00000412771 LILRB3 M88I Neutral rs77279742 19,54726241,C,T,1_2 ENSP00000416920 LILRB3 M88I Deleterious rs77279742 19,56663058,C,G,1_2 ENSP00000365447 AC024580.1 A65P Neutral 19,58385748,G,A,2_1 ENSP00000410545 ZNF814 A337V Neutral rs145250945


20,26061865,G,T,1_2 ENSP00000246000 FAM182A E73* NA rs79051971 20,26061865,G,T,1_2 ENSP00000365580 FAM182A E73* NA rs79051971 20,26061865,G,T,1_2 ENSP00000388057 FAM182A E14* NA rs79051971 21,14982886,G,A,1_0 ENSP00000299443 POTED G113S Neutral rs6517869 21,19698793,C,A,2_1 ENSP00000284885 TMPRSS15 G626V Deleterious 22,18835523,T,C,1_2 ENSP00000342394 AC008132.13.1 V360A Neutral rs662899 22,18835523,T,C,1_2 ENSP00000393689 AC008132.13.1 V360A Neutral rs662899 22,32586926,T,C,1_2 ENSP00000248980 RFPL2 S263G Neutral rs136470 22,32586926,T,C,1_2 ENSP00000248983 RFPL2 S234G Neutral rs136470 22,32586926,T,C,1_2 ENSP00000383095 RFPL2 S234G Neutral rs136470 22,32586926,T,C,1_2 ENSP00000383096 RFPL2 S324G Neutral rs136470 22,42089768,T,G,1_2 ENSP00000385467 C22orf46 V173G Deleterious 22,42523636,C,A,2_1 ENSP00000351927 CYP2D6 R278L Deleterious rs3915951 22,42523636,C,A,2_1 ENSP00000353820 CYP2D6 R329L Deleterious rs3915951 22,42523636,C,A,2_1 ENSP00000374620 CYP2D6 R329L Deleterious rs3915951 22,42523636,C,A,2_1 ENSP00000414432 CYP2D6 R275L Deleterious rs3915951 22,42523636,C,A,2_1 ENSP00000445289 CYP2D6 R278L Deleterious rs3915951 X,55185656,C,A,2_1 ENSP00000333394 FAM104B R9I Deleterious rs5003001 X,55185656,C,A,2_1 ENSP00000364101 FAM104B R9I Deleterious rs5003001 X,55185656,C,A,2_1 ENSP00000397188 FAM104B R9I Deleterious rs5003001 X,55185656,C,A,2_1 ENSP00000420895 FAM104B R9I Deleterious rs5003001 X,55185656,C,A,2_1 ENSP00000421161 FAM104B R6I Deleterious rs5003001 X,55185656,C,A,2_1 ENSP00000423164 FAM104B R8I Deleterious rs5003001 X,100103684,C,A,1_2 ENSP00000217885 NOX1 Q452H Deleterious X,100103684,C,A,1_2 ENSP00000362051 NOX1 Q464H Deleterious X,100103684,C,A,1_2 ENSP00000362057 NOX1 Q501H Deleterious X,135958704,G,C,2_1 ENSP00000359645 RBMX P167A Deleterious rs112089728 X,135958704,G,C,2_1 ENSP00000405117 RBMX P169A Deleterious rs112089728 X,135958704,G,C,2_1 ENSP00000415250 RBMX P154A Deleterious rs112089728 X,135958704,G,C,2_1 ENSP00000454674 RBMX P32A 
Deleterious rs112089728 X,135958704,G,C,2_1 ENSP00000457051 RBMX P167A Deleterious rs112089728 X,135958704,G,C,2_1 ENSP00000457866 RBMX P39A Deleterious rs112089728 X,135958730,C,A,2_1 ENSP00000359645 RBMX G158V Deleterious rs78702689 X,135958730,C,A,2_1 ENSP00000405117 RBMX G160V Deleterious rs78702689 X,135958730,C,A,2_1 ENSP00000415250 RBMX G145V Deleterious rs78702689 X,135958730,C,A,2_1 ENSP00000454674 RBMX G23V Deleterious rs78702689 X,135958730,C,A,2_1 ENSP00000457051 RBMX G158V Deleterious rs78702689 X,135958730,C,A,2_1 ENSP00000457866 RBMX G30V Deleterious rs78702689 Y,6955386,G,T,1_2 ENSP00000328879 TBL1Y E453D Neutral Y,6955386,G,T,1_2 ENSP00000347289 TBL1Y E453D Neutral Y,6955386,G,T,1_2 ENSP00000372499 TBL1Y E453D Neutral


Y,6955386,G,T,1_2 ENSP00000437721 TBL1Y E295D Neutral
Y,15025754,G,T,2_1 ENSP00000336725 DDX3Y C221F Deleterious
Y,15025754,G,T,2_1 ENSP00000353284 DDX3Y C221F Deleterious
Y,15025754,G,T,2_1 ENSP00000398953 DDX3Y C221F Deleterious
Y,15025754,G,T,2_1 ENSP00000400377 DDX3Y C218F Deleterious
Y,15025754,G,T,2_1 ENSP00000443710 DDX3Y C218F Deleterious
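
Because the rows of these appendix tables share one fixed layout (PROVEAN input string, Ensembl protein accession, gene symbol, amino acid change, PROVEAN prediction, optional dbSNP identifier), they can be read back into structured records. The following is a minimal Python sketch, not the pipeline used in this work. The names `ProveanRow`, `ROW_RE`, and `parse_rows` are hypothetical, and the pattern assumes that every record follows the layout above, that the genotype field is two single digits joined by an underscore, and that the prediction is one of Deleterious, Neutral, or NA. The sample rows are copied from the table.

```python
import re
from typing import List, NamedTuple, Optional


class ProveanRow(NamedTuple):
    """One table row; field names are illustrative, not from PROVEAN itself."""
    chrom: str
    pos: int              # 1-based coordinate
    ref: str              # reference allele
    alt: str              # variant allele
    genotype: str         # genotype code as printed in the table, e.g. "2_1"
    protein_id: str       # Ensembl protein accession
    gene: str
    aa_change: str        # e.g. "A314D" or "Y144*"
    prediction: str       # "Deleterious", "Neutral", or "NA"
    dbsnp: Optional[str]  # None when the row lists no dbSNP identifier


# Assumed record layout; the dbSNP column is optional.
ROW_RE = re.compile(
    r"\b(?P<chrom>\d{1,2}|X|Y),(?P<pos>\d+),(?P<ref>[ACGT]),(?P<alt>[ACGT]),"
    r"(?P<genotype>\d_\d)\s+"
    r"(?P<protein_id>ENSP\d+)\s+(?P<gene>\S+)\s+"
    r"(?P<aa_change>[A-Z]\d+[A-Z*])\s+"
    r"(?P<prediction>Deleterious|Neutral|NA)"
    r"(?:\s+(?P<dbsnp>rs\d+))?"
)


def parse_rows(text: str) -> List[ProveanRow]:
    """Extract every record from text in which rows may run together."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    rows = []
    for match in ROW_RE.finditer(flat):
        fields = match.groupdict()
        fields["pos"] = int(fields["pos"])
        rows.append(ProveanRow(**fields))
    return rows


SAMPLE = (
    "5,741736,G,T,1_0 ENSP00000442373 ZDHHC11B A314D Neutral rs61128505 "
    "5,161529565,C,T,2_1 ENSP00000410732 GABRG2 R213C Neutral "
    "7,150500797,C,A,2_1 ENSP00000004103 TMEM176A Y144* NA"
)

if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row.gene, row.aa_change, row.prediction, row.dbsnp)
```

Parsing with `finditer` over whitespace-normalized text, rather than splitting on line breaks, is a deliberate choice here: it tolerates records that wrap across lines or pages.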

Table Appendix-2: Genome Two Predicted Protein Changes. Input string for PROVEAN: chromosome number, 1-based coordinate, reference allele, variant allele, genotype (saliva_LGL)

Input    Accession number    Gene Symbol    Amino Acid Change    PROVEAN prediction    dbSNP

1,12887606,C,G,1_2 ENSP00000328783 PRAMEF11 C125S Deleterious rs58074988
1,12887606,C,G,1_2 ENSP00000391839 PRAMEF11 C84S Deleterious rs58074988
1,12887606,C,G,1_2 ENSP00000439551 PRAMEF11 C84S Deleterious rs58074988
1,12887612,T,C,1_2 ENSP00000328783 PRAMEF11 H123R Neutral rs60558629
1,12887612,T,C,1_2 ENSP00000391839 PRAMEF11 H82R Neutral rs60558629
1,12887612,T,C,1_2 ENSP00000439551 PRAMEF11 H82R Neutral rs60558629
1,12919672,T,C,1_2 ENSP00000240189 PRAMEF2 C138R Deleterious rs139382628
1,12919682,C,T,1_2 ENSP00000240189 PRAMEF2 T141M Neutral rs17038667
1,12942138,C,G,1_2 ENSP00000235349 PRAMEF4 E138Q Neutral rs3877226
1,18807897,C,G,2_1 ENSP00000383505 KLHDC7A P141R Neutral rs2992755
1,18807897,C,G,2_1 ENSP00000439465 KLHDC7A P78R Neutral rs2992755
1,62350011,C,T,2_1 ENSP00000255202 INADL A1021V Neutral
1,62350011,C,T,2_1 ENSP00000326199 INADL A1021V Neutral
1,62350011,C,T,2_1 ENSP00000360198 INADL A1021V Neutral
1,62350011,C,T,2_1 ENSP00000360200 INADL A1021V Neutral


1,62350011,C,T,2_1 ENSP00000378889 INADL A1021V Neutral 1,62350011,C,T,2_1 ENSP00000433669 INADL A480V Neutral 1,89449434,T,C,2_1 ENSP00000318415 RBMXL1 T26A Neutral rs2893084 1,89449434,T,C,2_1 ENSP00000446099 RBMXL1 T26A Neutral rs2893084 1,89449483,C,G,2_1 ENSP00000318415 RBMXL1 K9N Deleterious rs74100106 1,89449483,C,G,2_1 ENSP00000446099 RBMXL1 K9N Deleterious rs74100106 1,92447210,G,A,2_1 ENSP00000354568 BRDT V634I Neutral 1,92447210,G,A,2_1 ENSP00000359416 BRDT V561I Neutral 1,92447210,G,A,2_1 ENSP00000378038 BRDT V588I Neutral 1,92447210,G,A,2_1 ENSP00000384051 BRDT V634I Neutral 1,92447210,G,A,2_1 ENSP00000387822 BRDT V634I Neutral 1,117142736,A,G,2_1 ENSP00000321184 IGSF3 I639T Deleterious rs75067537 1,117142736,A,G,2_1 ENSP00000358495 IGSF3 I639T Deleterious rs75067537 1,117142736,A,G,2_1 ENSP00000358498 IGSF3 I619T Deleterious rs75067537 1,148004649,C,G,1_2 ENSP00000309907 NBPF14 E895Q Neutral rs16826711 1,148004649,C,G,1_2 ENSP00000358221 NBPF14 E889Q Neutral rs16826711 1,148004649,C,G,1_2 ENSP00000358375 NBPF14 E237Q Neutral rs16826711 1,148004784,T,C,1_2 ENSP00000309907 NBPF14 S850G Neutral rs144889269 1,148004784,T,C,1_2 ENSP00000358221 NBPF14 S844G Neutral rs144889269 1,148004784,T,C,1_2 ENSP00000358375 NBPF14 S192G Neutral rs144889269 1,152190945,C,T,1_2 ENSP00000357791 HRNR E1054K Neutral rs76549374 1,152276671,C,T,2_1 ENSP00000357789 FLG R3564H Neutral 1,152278814,C,T,2_1 ENSP00000357786 FLG G112S Neutral rs2184952 1,152278814,C,T,2_1 ENSP00000357789 FLG G2850S Neutral rs2184952 1,165179968,C,T,2_1 ENSP00000294816 LMX1A V239I Neutral 1,165179968,C,T,2_1 ENSP00000340226 LMX1A V239I Neutral 1,165179968,C,T,2_1 ENSP00000356868 LMX1A V239I Neutral 1,206516261,C,T,2_1 ENSP00000295713 SRGAP2 R22W Deleterious rs1769156 1,206516261,C,T,2_1 ENSP00000408089 SRGAP2 R22W Deleterious rs1769156 1,245021425,T,C,2_1 ENSP00000283179 HNRNPU Q461R Deleterious 1,245021425,T,C,2_1 ENSP00000393151 HNRNPU Q442R Deleterious 1,245021425,T,C,2_1 ENSP00000410728 HNRNPU 
Q238R Deleterious 1,245021425,T,C,2_1 ENSP00000416455 HNRNPU Q386R Deleterious 1,248085011,C,G,1_2 ENSP00000326225 OR2T8 A231G Neutral rs4595394 2,55771161,A,G,0_1 ENSP00000295117 CCDC104 D243G Neutral rs1045910 2,55771161,A,G,0_1 ENSP00000342699 CCDC104 D268G Neutral rs1045910 2,55771161,A,G,0_1 ENSP00000385376 CCDC104 D214G Neutral rs1045910 2,95542377,C,T,2_1 ENSP00000295201 TEKT4 R391C Deleterious rs72817671 2,96521640,C,T,2_1 ENSP00000403302 ANKRD36C G1457R Neutral rs78178577 2,96521640,C,T,2_1 ENSP00000407838 ANKRD36C G484R Neutral rs78178577 2,96521640,C,T,2_1 ENSP00000415231 ANKRD36C G708R Neutral rs78178577


2,96525736,T,C,2_1 ENSP00000403302 ANKRD36C R1257G Neutral rs78585559 2,96525736,T,C,2_1 ENSP00000407838 ANKRD36C R284G Neutral rs78585559 2,96525736,T,C,2_1 ENSP00000415231 ANKRD36C R508G Neutral rs78585559 2,96593000,A,G,2_1 ENSP00000403302 ANKRD36C I634T Neutral rs111976783 2,96593016,C,A,2_1 ENSP00000403302 ANKRD36C D629Y Neutral rs79307257 2,96593025,C,T,2_1 ENSP00000403302 ANKRD36C D626N Neutral rs75189823 2,130925160,T,G,1_2 ENSP00000397278 SMPD4 T44P Neutral 2,130925160,T,G,1_2 ENSP00000416208 SMPD4 T44P Neutral 2,130952038,G,A,2_1 ENSP00000318197 TUBA3E A126V Neutral rs13000721 2,220521273,G,C,2_1 ENSP00000444067 DNPEP L25V Neutral 3,46539703,G,A,2_1 ENSP00000296142 RTP3 A51T Deleterious 3,72957607,A,G,2_1 ENSP00000374268 GXYLT2 E122G Deleterious 3,75714034,C,G,1_2 ENSP00000312299 FRG2C S81R Neutral rs75456211 3,75714034,C,G,1_2 ENSP00000419432 FRG2C S80R Neutral rs75456211 3,75714819,G,A,2_1 ENSP00000312299 FRG2C R159Q Neutral rs74497996 3,75714819,G,A,2_1 ENSP00000419432 FRG2C R158Q Neutral rs74497996 3,75714917,G,A,2_1 ENSP00000312299 FRG2C A192T Neutral rs75138472 3,75714917,G,A,2_1 ENSP00000419432 FRG2C A191T Neutral rs75138472 3,75787302,T,C,1_2 ENSP00000383643 ZNF717 K484R Neutral 3,75787302,T,C,1_2 ENSP00000409514 ZNF717 K491R Neutral 3,75787302,T,C,1_2 ENSP00000419377 ZNF717 K441R Neutral 3,75790838,C,T,1_2 ENSP00000383643 ZNF717 W29* NA rs113991634 3,75790838,C,T,1_2 ENSP00000409514 ZNF717 W36* NA rs113991634 3,75790838,C,T,1_2 ENSP00000417902 ZNF717 W36* NA rs113991634 3,75790838,C,T,1_2 ENSP00000418187 ZNF717 W36* NA rs113991634 3,98538108,G,A,2_1 ENSP00000321573 DCBLD2 A342V Deleterious 3,98538108,G,A,2_1 ENSP00000321646 DCBLD2 A342V Deleterious 3,122630371,C,A,2_1 ENSP00000195173 SEMA5B A1018S Neutral 3,122630371,C,A,2_1 ENSP00000350215 SEMA5B A1020S Neutral 3,122630371,C,A,2_1 ENSP00000377208 SEMA5B A1020S Neutral 3,122630371,C,A,2_1 ENSP00000389588 SEMA5B A1074S Neutral 3,122630371,C,A,2_1 ENSP00000400828 SEMA5B A66S Neutral 
3,122630371,C,A,2_1 ENSP00000413917 SEMA5B A926S Neutral 3,122630371,C,A,2_1 ENSP00000417570 SEMA5B A1020S Neutral 3,148577673,C,T,2_1 ENSP00000282957 CPB1 R380* NA 3,148577673,C,T,2_1 ENSP00000417222 CPB1 R380* NA 3,193386239,C,T,2_1 ENSP00000411699 OPA1 L75F Neutral rs12494482 3,195453257,G,A,2_1 ENSP00000325431 MUC20 A424T Neutral rs3828406 3,195453257,G,A,2_1 ENSP00000371380 MUC20 A406T Neutral rs3828406 3,195453257,G,A,2_1 ENSP00000396774 MUC20 A595T Neutral rs3828406 3,195453257,G,A,2_1 ENSP00000397774 MUC20 A7T Neutral rs3828406


3,195453257,G,A,2_1 ENSP00000405629 MUC20 A560T Neutral rs3828406 3,195453257,G,A,2_1 ENSP00000414350 MUC20 A595T Neutral rs3828406 3,195507491,C,T,2_1 ENSP00000417397 MUC4 A3654T Neutral rs150551454 3,195507491,C,T,2_1 ENSP00000417498 MUC4 A3654T Neutral rs150551454 3,195507491,C,T,2_1 ENSP00000417657 MUC4 A3654T Neutral rs150551454 3,195507491,C,T,2_1 ENSP00000417722 MUC4 A3654T Neutral rs150551454 3,195507491,C,T,2_1 ENSP00000417757 MUC4 A3654T Neutral rs150551454 3,195507491,C,T,2_1 ENSP00000418306 MUC4 A3654T Neutral rs150551454 3,195507491,C,T,2_1 ENSP00000419798 MUC4 A3654T Neutral rs150551454 3,195507491,C,T,2_1 ENSP00000419989 MUC4 A3654T Neutral rs150551454 3,195507491,C,T,2_1 ENSP00000420243 MUC4 A3654T Neutral rs150551454 3,195507491,C,T,2_1 ENSP00000420439 MUC4 A3654T Neutral rs150551454 3,195508130,C,G,2_1 ENSP00000417397 MUC4 V3441L Neutral rs75675109 3,195508130,C,G,2_1 ENSP00000417498 MUC4 V3441L Neutral rs75675109 3,195508130,C,G,2_1 ENSP00000417657 MUC4 V3441L Neutral rs75675109 3,195508130,C,G,2_1 ENSP00000417722 MUC4 V3441L Neutral rs75675109 3,195508130,C,G,2_1 ENSP00000417757 MUC4 V3441L Neutral rs75675109 3,195508130,C,G,2_1 ENSP00000418306 MUC4 V3441L Neutral rs75675109 3,195508130,C,G,2_1 ENSP00000419798 MUC4 V3441L Neutral rs75675109 3,195508130,C,G,2_1 ENSP00000419989 MUC4 V3441L Neutral rs75675109 3,195508130,C,G,2_1 ENSP00000420243 MUC4 V3441L Neutral rs75675109 3,195508130,C,G,2_1 ENSP00000420439 MUC4 V3441L Neutral rs75675109 3,195511369,A,G,1_2 ENSP00000417397 MUC4 V2361A Neutral rs9755674 3,195511369,A,G,1_2 ENSP00000417498 MUC4 V2361A Neutral rs9755674 3,195511369,A,G,1_2 ENSP00000417657 MUC4 V2361A Neutral rs9755674 3,195511369,A,G,1_2 ENSP00000417722 MUC4 V2361A Neutral rs9755674 3,195511369,A,G,1_2 ENSP00000417757 MUC4 V2361A Neutral rs9755674 3,195511369,A,G,1_2 ENSP00000418306 MUC4 V2361A Neutral rs9755674 3,195511369,A,G,1_2 ENSP00000419798 MUC4 V2361A Neutral rs9755674 3,195511369,A,G,1_2 ENSP00000419989 MUC4 V2361A Neutral 
rs9755674 3,195511369,A,G,1_2 ENSP00000420243 MUC4 V2361A Neutral rs9755674 3,195511369,A,G,1_2 ENSP00000420439 MUC4 V2361A Neutral rs9755674 3,195994232,A,G,1_2 ENSP00000391405 PCYT1A L41P Deleterious 4,41678420,C,G,2_1 ENSP00000316891 LIMCH1 P770A Deleterious 4,41678420,C,G,2_1 ENSP00000316974 LIMCH1 P1153A Deleterious 4,41678420,C,G,2_1 ENSP00000371172 LIMCH1 P603A Deleterious 4,41678420,C,G,2_1 ENSP00000379840 LIMCH1 P615A Deleterious 4,41678420,C,G,2_1 ENSP00000421242 LIMCH1 P610A Deleterious 4,41678420,C,G,2_1 ENSP00000421656 LIMCH1 P604A Deleterious 4,41678420,C,G,2_1 ENSP00000422864 LIMCH1 P603A Deleterious 4,41678420,C,G,2_1 ENSP00000424437 LIMCH1 P782A Deleterious


4,41678420,C,G,2_1 ENSP00000424645 LIMCH1 P770A Deleterious 4,41678420,C,G,2_1 ENSP00000424825 LIMCH1 P769A Deleterious 4,41678420,C,G,2_1 ENSP00000425222 LIMCH1 P623A Deleterious 4,41678420,C,G,2_1 ENSP00000425631 LIMCH1 P1154A Deleterious 4,41678420,C,G,2_1 ENSP00000426334 LIMCH1 P610A Deleterious 4,41678420,C,G,2_1 ENSP00000440255 LIMCH1 P122A Deleterious 4,52883388,G,T,2_1 ENSP00000341944 LRRC66 P131H Neutral 4,79461793,C,T,2_1 ENSP00000264895 FRAS1 R3852* NA 4,79461793,C,T,2_1 ENSP00000422834 FRAS1 R2081* NA 4,88537035,A,G,2_1 ENSP00000282478 DSPP D1074G Neutral 4,88537035,A,G,2_1 ENSP00000382213 DSPP D1074G Neutral 4,175898994,T,C,1_2 ENSP00000352177 ADAM29 M773T Neutral rs113485638 4,175898994,T,C,1_2 ENSP00000384229 ADAM29 M773T Neutral rs113485638 4,175898994,T,C,1_2 ENSP00000414544 ADAM29 M773T Neutral rs113485638 4,175898994,T,C,1_2 ENSP00000423517 ADAM29 M773T Neutral rs113485638 4,190862204,C,T,2_1 ENSP00000226798 FRG1 L14F Neutral rs113582021 4,190862204,C,T,2_1 ENSP00000436535 FRG1 L14F Deleterious rs113582021 5,148637949,G,T,2_1 ENSP00000310309 ABLIM3 K678N Deleterious 5,148637949,G,T,2_1 ENSP00000315841 ABLIM3 K583N Deleterious 5,148637949,G,T,2_1 ENSP00000420855 ABLIM3 K645N Deleterious 5,148637949,G,T,2_1 ENSP00000425394 ABLIM3 K678N Deleterious 5,148637949,G,T,2_1 ENSP00000430150 ABLIM3 K164N Deleterious 5,148637949,G,T,2_1 ENSP00000439230 ABLIM3 K163N Deleterious 5,162940595,G,A,2_1 ENSP00000280969 MAT2B R87K Deleterious 5,162940595,G,A,2_1 ENSP00000325425 MAT2B R98K Deleterious 5,162940595,G,A,2_1 ENSP00000397371 MAT2B R33K Deleterious 5,162940595,G,A,2_1 ENSP00000428046 MAT2B R98K Neutral 6,32287955,T,C,1_2 ENSP00000303292 C6orf10 R203G Neutral 6,32549340,C,G,2_1 ENSP00000353099 HLA-DRB1 E216Q Neutral rs17856143 6,32549589,A,C,2_1 ENSP00000353099 HLA-DRB1 S133A Neutral 6,32609312,A,C,1_2 ENSP00000339398 HLA-DQA1 Y103S Neutral rs1129808 6,32609312,A,C,1_2 ENSP00000364087 HLA-DQA1 Y103S Neutral rs1129808 6,32609312,A,C,1_2 ENSP00000378767 
HLA-DQA1 Y103S Neutral rs1129808 6,32609312,A,C,1_2 ENSP00000378768 HLA-DQA1 Y103S Neutral rs1129808 6,32609312,A,C,1_2 ENSP00000437183 HLA-DQA1 Y76S Neutral rs1129808 6,32609312,A,C,1_2 ENSP00000437302 HLA-DQA1 Y103S Neutral rs1129808 6,137323213,G,A,2_1 ENSP00000314976 IL20RA L382F Neutral rs1342642 6,137323213,G,A,2_1 ENSP00000356722 IL20RA L271F Neutral rs1342642 6,137323213,G,A,2_1 ENSP00000437843 IL20RA L333F Neutral rs1342642 6,138196024,G,A,2_1 ENSP00000237289 TNFAIP3 W113* NA 6,138196024,G,A,2_1 ENSP00000401562 TNFAIP3 W113* NA


6,138196024,G,A,2_1 ENSP00000439665 TNFAIP3 W113* NA 6,138196024,G,A,2_1 ENSP00000441330 TNFAIP3 W113* NA 6,138196024,G,A,2_1 ENSP00000442207 TNFAIP3 W113* NA 6,138196024,G,A,2_1 ENSP00000442647 TNFAIP3 W113* NA 6,138196024,G,A,2_1 ENSP00000444718 TNFAIP3 W113* NA 6,159050783,A,G,2_1 ENSP00000323755 TMEM181 Y449C Deleterious 6,159050783,A,G,2_1 ENSP00000356057 TMEM181 Y542C Deleterious 7,36278645,G,T,2_1 ENSP00000242108 EEPD1 K310N Neutral 7,36278645,G,T,2_1 ENSP00000442692 EEPD1 K310N Neutral 7,82997024,C,T,2_1 ENSP00000303212 SEMA3E D736N Neutral 7,82997024,C,T,2_1 ENSP00000405052 SEMA3E D676N Neutral 7,82997024,C,T,2_1 ENSP00000437365 SEMA3E D736N Neutral 7,84751182,A,T,2_1 ENSP00000284136 SEMA3D L9H Neutral 7,84751182,A,T,2_1 ENSP00000401366 SEMA3D L9H Neutral 7,100549505,T,C,2_1 ENSP00000368771 AC118759.1 I29T Neutral rs74588241 7,100549516,C,A,2_1 ENSP00000368771 AC118759.1 P33T Neutral rs76951301 7,100552412,T,C,2_1 ENSP00000324834 MUC3A M388T Neutral rs73714259 7,137569710,A,G,2_1 ENSP00000403550 CREB3L2 V434A Neutral 7,142458929,G,C,2_1 ENSP00000417854 PRSS1 K70N Neutral 7,143747712,C,A,2_1 ENSP00000386208 OR2A5 A73D Deleterious 7,151699853,A,G,1_2 ENSP00000376548 GALNTL5 E238G Deleterious 7,151699853,A,G,1_2 ENSP00000392582 GALNTL5 E238G Deleterious 7,151932945,C,T,1_2 ENSP00000262189 MLL3 R909K Neutral 7,151932945,C,T,1_2 ENSP00000347325 MLL3 R909K Neutral 7,151932945,C,T,1_2 ENSP00000403483 MLL3 R65K Deleterious 7,151935910,C,T,2_1 ENSP00000262189 MLL3 G845E Deleterious rs4024419 7,151935910,C,T,2_1 ENSP00000347325 MLL3 G845E Deleterious rs4024419 8,19315956,G,T,2_1 ENSP00000310891 CSGALNACT1 Q278K Neutral 8,19315956,G,T,2_1 ENSP00000330805 CSGALNACT1 Q278K Neutral 8,19315956,G,T,2_1 ENSP00000381084 CSGALNACT1 Q278K Neutral 8,19315956,G,T,2_1 ENSP00000411816 CSGALNACT1 Q278K Neutral 8,19315956,G,T,2_1 ENSP00000428216 CSGALNACT1 Q278K Neutral 8,19315956,G,T,2_1 ENSP00000429809 CSGALNACT1 Q278K Neutral 8,19315956,G,T,2_1 ENSP00000442155 CSGALNACT1 Q278K 
Neutral 8,48511523,G,T,1_2 ENSP00000297423 KIAA0146 V437L Neutral 8,48511523,G,T,1_2 ENSP00000429193 KIAA0146 V119L Neutral 8,48511523,G,T,1_2 ENSP00000429487 KIAA0146 V377L Neutral 8,48511523,G,T,1_2 ENSP00000430091 KIAA0146 V126L Neutral 8,48511523,G,T,1_2 ENSP00000444061 KIAA0146 V367L Neutral 8,118830678,C,A,2_1 ENSP00000367446 EXT1 S543I Neutral 9,5897614,T,G,1_2 ENSP00000370880 MLANA C45W Deleterious

9,5897614,T,G,1_2 ENSP00000370885 MLANA C45W Deleterious 9,5897614,T,G,1_2 ENSP00000370886 MLANA C45W Deleterious 9,33385733,C,T,2_1 ENSP00000297988 AQP7 M219I Neutral 9,33385733,C,T,2_1 ENSP00000368817 AQP7 M155I Neutral 9,33385733,C,T,2_1 ENSP00000368820 AQP7 M218I Neutral 9,33385733,C,T,2_1 ENSP00000368821 AQP7 M218I Neutral 9,33385733,C,T,2_1 ENSP00000396111 AQP7 M162I Neutral 9,33385733,C,T,2_1 ENSP00000410138 AQP7 M127I Neutral 9,33385733,C,T,2_1 ENSP00000412868 AQP7 M87I Neutral 9,33385733,C,T,2_1 ENSP00000438860 AQP7 E88K Neutral 9,33385733,C,T,2_1 ENSP00000439534 AQP7 M219I Neutral 9,33385733,C,T,2_1 ENSP00000441619 AQP7 M127I Neutral 9,33798042,G,C,1_2 ENSP00000340889 PRSS3 C153S Deleterious rs141382822 9,33798042,G,C,1_2 ENSP00000354280 PRSS3 C196S Deleterious rs141382822 9,33798042,G,C,1_2 ENSP00000368715 PRSS3 C139S Deleterious rs141382822 9,33798042,G,C,1_2 ENSP00000401249 PRSS3 C151S Deleterious rs141382822 9,33798042,G,C,1_2 ENSP00000401828 PRSS3 C132S Deleterious rs141382822 9,33798574,G,A,1_2 ENSP00000340889 PRSS3 S196N Neutral 9,33798574,G,A,1_2 ENSP00000354280 PRSS3 S239N Neutral 9,33798574,G,A,1_2 ENSP00000368715 PRSS3 S182N Neutral 9,33798574,G,A,1_2 ENSP00000401828 PRSS3 S175N Neutral 9,36400098,C,A,2_1 ENSP00000259605 RNF38 C3F Neutral 9,36400098,C,A,2_1 ENSP00000335239 RNF38 C3F Neutral 9,39088517,C,A,1_2 ENSP00000297668 CNTNAP3 M1041I Neutral 9,39088517,C,A,1_2 ENSP00000350863 CNTNAP3 M953I Neutral 9,39088517,C,A,1_2 ENSP00000366884 CNTNAP3 M960I Neutral 9,67968476,C,T,2_1 ENSP00000366697 ANKRD20A1 R679C Neutral rs4055530 9,77746692,A,G,1_2 ENSP00000340836 OSTF1 E68G Deleterious 9,113192279,A,T,2_1 ENSP00000363593 SVEP1 S1823T Neutral rs142508835 9,113192279,A,T,2_1 ENSP00000384917 SVEP1 S1846T Neutral rs142508835 9,130475442,A,C,2_1 ENSP00000362392 C9orf117 Y483S Neutral rs497632 10,37433929,G,T,2_1 ENSP00000354432 ANKRD30A R411M Neutral 10,37433929,G,T,2_1 ENSP00000363792 ANKRD30A R411M Neutral 10,50340304,C,A,1_2 ENSP00000308292 FAM170B 
R69L Neutral rs17773851 10,58118630,T,C,2_1 ENSP00000322850 ZWINT R67G Neutral rs2241666 10,58118630,T,C,2_1 ENSP00000363055 ZWINT R187G Neutral rs2241666 10,58118630,T,C,2_1 ENSP00000378801 ZWINT R187G Neutral rs2241666 10,89118125,C,T,1_2 ENSP00000328439 FAM22D H106Y Neutral rs77153116 10,89118125,C,T,1_2 ENSP00000371116 FAM22D H35Y Neutral rs77153116 10,89118125,C,T,1_2 ENSP00000396080 FAM22D H35Y Neutral rs77153116 10,112361502,G,T,2_1 ENSP00000354720 SMC3 D918Y Deleterious

10,126681848,T,A,1_2 ENSP00000311825 CTBP2 H861L Deleterious rs113189640 10,126681848,T,A,1_2 ENSP00000338615 CTBP2 H321L Deleterious rs113189640 10,126681848,T,A,1_2 ENSP00000357816 CTBP2 H389L Deleterious rs113189640 10,126681848,T,A,1_2 ENSP00000410474 CTBP2 H321L Deleterious rs113189640 10,126681848,T,A,1_2 ENSP00000434630 CTBP2 H321L Deleterious rs113189640 10,126681848,T,A,1_2 ENSP00000436285 CTBP2 H321L Deleterious rs113189640 10,126682443,G,T,1_2 ENSP00000311825 CTBP2 H838N Deleterious rs76582415 10,126682443,G,T,1_2 ENSP00000338615 CTBP2 H298N Deleterious rs76582415 10,126682443,G,T,1_2 ENSP00000357816 CTBP2 H366N Deleterious rs76582415 10,126682443,G,T,1_2 ENSP00000410474 CTBP2 H298N Deleterious rs76582415 10,126682443,G,T,1_2 ENSP00000434630 CTBP2 H298N Deleterious rs76582415 10,126682443,G,T,1_2 ENSP00000436285 CTBP2 H298N Deleterious rs76582415 10,126683071,G,C,2_1 ENSP00000311825 CTBP2 N789K Deleterious rs61870306 10,126683071,G,C,2_1 ENSP00000338615 CTBP2 N249K Deleterious rs61870306 10,126683071,G,C,2_1 ENSP00000357816 CTBP2 N317K Deleterious rs61870306 10,126683071,G,C,2_1 ENSP00000410474 CTBP2 N249K Deleterious rs61870306 10,126683071,G,C,2_1 ENSP00000434630 CTBP2 N249K Deleterious rs61870306 10,126683071,G,C,2_1 ENSP00000436285 CTBP2 N249K Deleterious rs61870306 10,126683075,T,A,2_1 ENSP00000311825 CTBP2 H788L Deleterious rs80273852 10,126683075,T,A,2_1 ENSP00000338615 CTBP2 H248L Deleterious rs80273852 10,126683075,T,A,2_1 ENSP00000357816 CTBP2 H316L Deleterious rs80273852 10,126683075,T,A,2_1 ENSP00000410474 CTBP2 H248L Deleterious rs80273852 10,126683075,T,A,2_1 ENSP00000434630 CTBP2 H248L Deleterious rs80273852 10,126683075,T,A,2_1 ENSP00000436285 CTBP2 H248L Deleterious rs80273852 11,45793154,A,C,2_1 ENSP00000368055 CTD-2210P24.4.1 T7P Neutral rs3740704 11,48346961,A,T,2_1 ENSP00000321419 OR4C3 N157Y Neutral rs75493089 11,48346961,A,T,2_1 ENSP00000378660 OR4C3 N20Y Neutral rs75493089 11,48346962,A,G,2_1 ENSP00000321419 OR4C3 N157S Neutral 
rs74589050 11,48346962,A,G,2_1 ENSP00000378660 OR4C3 N20S Neutral rs74589050 11,48387900,G,A,2_1 ENSP00000321338 OR4C5 Q40* NA rs74338058 11,49186274,G,A,1_2 ENSP00000256999 FOLH1 H475Y Neutral rs61886492 11,49186274,G,A,1_2 ENSP00000344086 FOLH1 H167Y Neutral rs61886492 11,49186274,G,A,1_2 ENSP00000344131 FOLH1 H460Y Neutral rs61886492 11,49186274,G,A,1_2 ENSP00000349129 FOLH1 H475Y Neutral rs61886492 11,49186274,G,A,1_2 ENSP00000374374 FOLH1 H478Y Neutral rs61886492 11,49186274,G,A,1_2 ENSP00000431463 FOLH1 H460Y Neutral rs61886492 11,56143250,G,A,2_1 ENSP00000304188 OR8U1 A51T Neutral rs76390346 11,56468440,G,T,2_1 ENSP00000309012 OR9G1 G193C Neutral rs12421330 11,60059810,A,G,2_1 ENSP00000338648 MS4A4A K52E Deleterious rs10750931 11,60059810,A,G,2_1 ENSP00000347252 MS4A4A K33E Deleterious rs10750931 11,60059810,A,G,2_1 ENSP00000378462 MS4A4A K33E Deleterious rs10750931

11,60059810,A,G,2_1 ENSP00000434506 MS4A4A K52E Deleterious rs10750931 11,105789549,A,G,2_1 ENSP00000282499 GRIA4 K461E Neutral 11,105789549,A,G,2_1 ENSP00000376835 GRIA4 K461E Neutral 11,105789549,A,G,2_1 ENSP00000432180 GRIA4 K461E Neutral 11,105789549,A,G,2_1 ENSP00000435775 GRIA4 K461E Neutral 11,123886865,T,A,2_1 ENSP00000325076 OR10G4 V195E Neutral rs4084209 11,123893987,G,A,1_2 ENSP00000364164 OR10G9 A90T Neutral rs28734900 12,2760864,G,A,2_1 ENSP00000266376 CACNA1C R1383H Deleterious 12,2760864,G,A,2_1 ENSP00000323129 CACNA1C R1165H Deleterious 12,2760864,G,A,2_1 ENSP00000329877 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000336982 CACNA1C R1360H Deleterious 12,2760864,G,A,2_1 ENSP00000341092 CACNA1C R1357H Deleterious 12,2760864,G,A,2_1 ENSP00000382500 CACNA1C R1324H Deleterious 12,2760864,G,A,2_1 ENSP00000382504 CACNA1C R1324H Deleterious 12,2760864,G,A,2_1 ENSP00000382506 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000382510 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000382512 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000382515 CACNA1C R1355H Deleterious 12,2760864,G,A,2_1 ENSP00000382526 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000382530 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000382537 CACNA1C R1352H Deleterious 12,2760864,G,A,2_1 ENSP00000382542 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000382546 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000382547 CACNA1C R1363H Deleterious 12,2760864,G,A,2_1 ENSP00000382549 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000382552 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000382557 CACNA1C R1322H Deleterious 12,2760864,G,A,2_1 ENSP00000382563 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000385724 CACNA1C R1335H Deleterious 12,2760864,G,A,2_1 ENSP00000385896 CACNA1C R1335H Deleterious 12,31256905,T,C,2_1 ENSP00000384703 DDX11 C951R Neutral rs1046458 12,31256905,T,C,2_1 ENSP00000440402 DDX11 C951R Neutral rs1046458 
12,42512830,T,A,1_2 ENSP00000280876 GXYLT1 D122V Deleterious rs76740071 12,42512830,T,A,1_2 ENSP00000381666 GXYLT1 D153V Deleterious rs76740071 12,49420606,C,T,2_1 ENSP00000301067 MLL2 R5048H Deleterious 12,50746514,G,C,1_2 ENSP00000329995 FAM186A H1367Q Neutral rs71466905 12,50746514,G,C,1_2 ENSP00000441337 FAM186A H1367Q Neutral rs71466905 12,58220823,C,T,1_2 ENSP00000381148 CTDSP2 V104M Neutral rs111346934 12,58220823,C,T,1_2 ENSP00000448299 CTDSP2 V74M Neutral rs111346934 12,58220841,C,T,1_2 ENSP00000381148 CTDSP2 D98N Deleterious rs74343811 12,58220841,C,T,1_2 ENSP00000448299 CTDSP2 D68N Deleterious rs74343811

12,78401155,G,A,2_1 ENSP00000228327 NAV3 A613T Neutral 12,78401155,G,A,2_1 ENSP00000266692 NAV3 A613T Neutral 12,78401155,G,A,2_1 ENSP00000381007 NAV3 A613T Neutral 12,78401155,G,A,2_1 ENSP00000446132 NAV3 A613T Neutral 12,78401155,G,A,2_1 ENSP00000446628 NAV3 A613T Neutral 12,123186880,C,T,2_1 ENSP00000375066 HCAR2 M317I Neutral rs2454727 12,123186880,C,T,2_1 ENSP00000443556 HCAR2 M317I Neutral rs2454727 14,81610012,A,G,2_1 ENSP00000298171 TSHR H537R Deleterious 14,81610012,A,G,2_1 ENSP00000410839 TSHR H184R Deleterious 14,81610012,A,G,2_1 ENSP00000441235 TSHR H537R Deleterious 15,23603641,T,A,1_2 ENSP00000455298 RP11-529J17.2.1 V111E Neutral rs587867 15,40716408,C,A,2_1 ENSP00000453146 IVD Q204K Neutral 15,40716408,C,A,2_1 ENSP00000454145 IVD Q56K Neutral 15,75982085,C,T,2_1 ENSP00000312506 CSPG4 E441K Deleterious rs79463888 16,4935505,T,C,2_1 ENSP00000340510 PPL K1051E Deleterious rs142943814 16,4935505,T,C,2_1 ENSP00000456963 PPL K1049E Deleterious rs142943814 16,33961465,G,A,2_1 ENSP00000443070 LINC00273 R326C Deleterious rs74932417 16,33961779,C,A,2_1 ENSP00000443070 LINC00273 G221V Deleterious 16,55862691,G,A,2_1 ENSP00000353720 CES1 S83L Deleterious rs62028647 16,55862691,G,A,2_1 ENSP00000355193 CES1 S82L Deleterious rs62028647 16,55862691,G,A,2_1 ENSP00000390492 CES1 S82L Deleterious rs62028647 16,55862791,T,C,2_1 ENSP00000353720 CES1 I50V Neutral rs3826193 16,55862791,T,C,2_1 ENSP00000355193 CES1 I49V Neutral rs3826193 16,55862791,T,C,2_1 ENSP00000390492 CES1 I49V Neutral rs3826193 16,57080528,C,A,2_1 ENSP00000262510 NLRC5 Q1105K Neutral rs289723 16,57080528,C,A,2_1 ENSP00000308886 NLRC5 Q1105K Neutral rs289723 16,57080528,C,A,2_1 ENSP00000389739 NLRC5 Q1105K Neutral rs289723 16,57080528,C,A,2_1 ENSP00000437583 NLRC5 Q46K Neutral rs289723 16,57080528,C,A,2_1 ENSP00000438548 NLRC5 Q136K Neutral rs289723 16,57080528,C,A,2_1 ENSP00000441727 NLRC5 Q1105K Neutral rs289723 16,57080528,C,A,2_1 ENSP00000442906 NLRC5 Q858K Neutral rs289723 16,57080528,C,A,2_1 
ENSP00000443825 NLRC5 Q833K Neutral rs289723 16,57080528,C,A,2_1 ENSP00000444773 NLRC5 Q256K Neutral rs289723 16,88926080,C,A,2_1 ENSP00000301021 TRAPPC2L Y72* NA 16,88926080,C,A,2_1 ENSP00000455100 TRAPPC2L Y72* NA 16,88926080,C,A,2_1 ENSP00000455892 TRAPPC2L Y62* NA 16,88926080,C,A,2_1 ENSP00000457684 TRAPPC2L Y42* NA 16,88926080,C,A,2_1 ENSP00000457728 TRAPPC2L Y72* NA 17,1183354,T,C,1_0 ENSP00000329548 TUSC5 F20S Neutral 17,7722365,C,T,0_1 ENSP00000353818 DNAH2 T3561I Neutral rs7213894 17,7722365,C,T,0_1 ENSP00000373825 DNAH2 T3600I Neutral rs7213894

17,16068343,G,A,2_1 ENSP00000268712 NCOR1 R190* NA rs78230791 17,16068343,G,A,2_1 ENSP00000379189 NCOR1 R81* NA rs78230791 17,16068343,G,A,2_1 ENSP00000379190 NCOR1 R81* NA rs78230791 17,16068343,G,A,2_1 ENSP00000379192 NCOR1 R190* NA rs78230791 17,16068343,G,A,2_1 ENSP00000387727 NCOR1 R190* NA rs78230791 17,16068343,G,A,2_1 ENSP00000395091 NCOR1 R190* NA rs78230791 17,16068343,G,A,2_1 ENSP00000407998 NCOR1 R190* NA rs78230791 17,16068377,C,G,2_1 ENSP00000268712 NCOR1 K178N Deleterious 17,16068377,C,G,2_1 ENSP00000379189 NCOR1 K69N Deleterious 17,16068377,C,G,2_1 ENSP00000379190 NCOR1 K69N Deleterious 17,16068377,C,G,2_1 ENSP00000379192 NCOR1 K178N Deleterious 17,16068377,C,G,2_1 ENSP00000387727 NCOR1 K178N Deleterious 17,16068377,C,G,2_1 ENSP00000395091 NCOR1 K178N Deleterious 17,16068377,C,G,2_1 ENSP00000407998 NCOR1 K178N Deleterious 17,18314913,T,C,1_2 ENSP00000385341 AL353997.1 S22P Neutral rs62074237 17,18682399,A,G,1_2 ENSP00000306937 FBXW10 T930A Neutral rs1318979 17,18682399,A,G,1_2 ENSP00000310382 FBXW10 T992A Neutral rs1318979 17,18682399,A,G,1_2 ENSP00000379025 FBXW10 T983A Neutral rs1318979 17,18682399,A,G,1_2 ENSP00000379026 FBXW10 T982A Neutral rs1318979 17,21318760,G,T,1_2 ENSP00000328150 KCNJ12 V36L Neutral rs74880280 17,21318770,G,A,2_1 ENSP00000328150 KCNJ12 R39Q Neutral rs3752033 17,21319079,C,A,1_2 ENSP00000328150 KCNJ12 T142N Deleterious rs76518282 17,21319436,G,A,1_2 ENSP00000328150 KCNJ12 R261H Deleterious rs77270326 17,40474419,T,A,2_1 ENSP00000264657 STAT3 D661V Neutral 17,40474419,T,A,2_1 ENSP00000373923 STAT3 D563V Neutral 17,40474419,T,A,2_1 ENSP00000384943 STAT3 D661V Neutral 18,12122470,C,T,2_1 ENSP00000326572 ANKRD62 S456L Neutral rs74988693 18,12122470,C,T,2_1 ENSP00000405628 ANKRD62 S192L Neutral rs74988693 18,22805862,G,A,2_1 ENSP00000354794 ZNF521 P674S Neutral 18,22805862,G,A,2_1 ENSP00000382352 ZNF521 P674S Neutral 18,22805862,G,A,2_1 ENSP00000440768 ZNF521 P708S Neutral 19,9025654,T,G,1_2 ENSP00000381008 MUC16 K12267T Neutral 
19,44352665,T,C,2_1 ENSP00000327314 ZNF283 C638R Neutral rs2356437 19,44833633,C,G,1_2 ENSP00000253426 ZNF285 S231T Neutral rs75646262 19,44833633,C,G,1_2 ENSP00000337081 ZNF285 S232T Neutral rs75646262 19,44833633,C,G,1_2 ENSP00000346305 ZNF285 S226T Neutral rs75646262 19,44833633,C,G,1_2 ENSP00000392867 ZNF285 S232T Neutral rs75646262 19,44833633,C,G,1_2 ENSP00000441990 ZNF285 S249T Neutral rs75646262 19,44933653,A,T,2_1 ENSP00000291187 ZNF229 C435S Deleterious 19,49674674,C,A,1_2 ENSP00000252826 TRPM4 T286K Neutral 19,49674674,C,A,1_2 ENSP00000347944 TRPM4 T3K Neutral

19,49674674,C,A,1_2 ENSP00000407492 TRPM4 T286K Neutral 19,52033038,T,G,2_1 ENSP00000344064 SIGLEC6 T291P Deleterious 19,52033038,T,G,2_1 ENSP00000345907 SIGLEC6 T318P Deleterious 19,52033038,T,G,2_1 ENSP00000353071 SIGLEC6 T329P Deleterious 19,52033038,T,G,2_1 ENSP00000375674 SIGLEC6 T302P Deleterious 19,52033038,T,G,2_1 ENSP00000401502 SIGLEC6 T318P Deleterious 19,52033038,T,G,2_1 ENSP00000410679 SIGLEC6 T266P Deleterious 19,55258808,C,T,1_2 ENSP00000342215 KIR2DL3 P229L Deleterious rs35861855 19,55284824,G,C,1_2 ENSP00000291633 KIR2DL1 R37P Neutral rs35509911 19,55284824,G,C,1_2 ENSP00000336769 KIR2DL1 R37P Neutral rs35509911 19,55284908,T,C,2_1 ENSP00000291633 KIR2DL1 M65T Neutral 19,55284908,T,C,2_1 ENSP00000336769 KIR2DL1 M65T Neutral 19,55512232,C,A,1_0 ENSP00000263437 NLRP2 A1049E Neutral rs1043673 19,55512232,C,A,1_0 ENSP00000344074 NLRP2 A1030E Neutral rs1043673 19,55512232,C,A,1_0 ENSP00000375601 NLRP2 A1028E Neutral rs1043673 19,55512232,C,A,1_0 ENSP00000402474 NLRP2 A1029E Neutral rs1043673 19,55512232,C,A,1_0 ENSP00000409370 NLRP2 A1052E Neutral rs1043673 19,55512232,C,A,1_0 ENSP00000440601 NLRP2 A1030E Neutral rs1043673 19,55512232,C,A,1_0 ENSP00000441133 NLRP2 A1028E Neutral rs1043673 19,55512232,C,A,1_0 ENSP00000445135 NLRP2 A1052E Neutral rs1043673 19,58385748,G,A,2_1 ENSP00000410545 ZNF814 A337V Neutral rs145250945 20,2397899,T,G,2_1 ENSP00000202625 TGM6 V453G Deleterious 20,2397899,T,G,2_1 ENSP00000370831 TGM6 V453G Deleterious 20,14474130,G,A,2_1 ENSP00000217246 MACROD2 A93T Neutral 20,14474130,G,A,2_1 ENSP00000309809 MACROD2 A93T Neutral 20,25755751,G,A,1_2 ENSP00000365585 FAM182B R69W Neutral 20,25755751,G,A,1_2 ENSP00000365586 FAM182B R66W Neutral 20,25755770,C,G,1_2 ENSP00000365585 FAM182B Q62H Deleterious 20,25755770,C,G,1_2 ENSP00000365586 FAM182B Q59H Deleterious 20,25755889,A,G,1_2 ENSP00000365585 FAM182B C23R Deleterious 20,25755889,A,G,1_2 ENSP00000365586 FAM182B C20R Deleterious 20,48700749,C,T,2_1 ENSP00000340305 UBE2V1 G95R 
Deleterious 20,48700749,C,T,2_1 ENSP00000344166 TMEM189-UBE2V1 G295R Neutral 20,48700749,C,T,2_1 ENSP00000346343 UBE2V1 G28R Deleterious 20,48700749,C,T,2_1 ENSP00000360739 UBE2V1 G72R Deleterious 20,48700749,C,T,2_1 ENSP00000360742 UBE2V1 G95R Deleterious 20,48700749,C,T,2_1 ENSP00000395264 UBE2V1 G28R Deleterious 20,48700749,C,T,2_1 ENSP00000407770 UBE2V1 G28R Deleterious 20,48700749,C,T,2_1 ENSP00000450635 TMEM189 G295R Neutral 21,47754471,G,A,2_1 ENSP00000338675 PCNT R143H Neutral rs58106867 21,47754471,G,A,2_1 ENSP00000352572 PCNT R143H Neutral rs58106867

22,29611572,G,A,2_1 ENSP00000335481 EMID1 R91H Neutral 22,29611572,G,A,2_1 ENSP00000384452 EMID1 R91H Neutral 22,29611572,G,A,2_1 ENSP00000385414 EMID1 R91H Neutral 22,29611572,G,A,2_1 ENSP00000399760 EMID1 R91H Neutral 22,29611572,G,A,2_1 ENSP00000403816 EMID1 R91H Neutral 22,39388486,T,A,1_2 ENSP00000385060 APOBEC3B L489Q Neutral rs150925968 X,13061061,T,A,2_1 ENSP00000334430 FAM9C T64S Neutral X,13061061,T,A,2_1 ENSP00000369999 FAM9C T64S Neutral X,13061061,T,A,2_1 ENSP00000409506 FAM9C T24S Neutral X,13061061,T,A,2_1 ENSP00000439185 FAM9C T64S Deleterious X,20205967,C,T,2_1 ENSP00000368865 RPS6KA3 W222* NA X,20205967,C,T,2_1 ENSP00000368884 RPS6KA3 W251* NA X,20205967,C,T,2_1 ENSP00000407655 RPS6KA3 W222* NA X,20205967,C,T,2_1 ENSP00000440220 RPS6KA3 W223* NA X,20205967,C,T,2_1 ENSP00000444837 RPS6KA3 W223* NA X,26235772,T,G,2_1 ENSP00000368315 MAGEB5 N118K Deleterious X,106358698,C,G,2_1 ENSP00000203616 RBM41 C136S Neutral X,106358698,C,G,2_1 ENSP00000361557 RBM41 C136S Neutral X,106358698,C,G,2_1 ENSP00000361560 RBM41 C136S Neutral X,106358698,C,G,2_1 ENSP00000361565 RBM41 C136S Neutral X,106358698,C,G,2_1 ENSP00000405522 RBM41 C134S Neutral X,106358698,C,G,2_1 ENSP00000433251 RBM41 C136S Neutral X,140785696,T,C,2_1 ENSP00000359546 SPANXD K74E Neutral rs2983592 X,140785714,C,G,2_1 ENSP00000359546 SPANXD V68L Neutral rs5953618 X,140993722,A,C,2_1 ENSP00000285879 MAGEC1 I178L Neutral rs80314937 X,148798267,T,G,1_2 ENSP00000328177 MAGEA11 V345G Deleterious X,148798267,T,G,1_2 ENSP00000347358 MAGEA11 V374G Deleterious X,153224057,A,G,1_2 ENSP00000309555 HCFC1 L589P Deleterious X,153224057,A,G,1_2 ENSP00000346174 HCFC1 L520P Deleterious X,153224057,A,G,1_2 ENSP00000359001 HCFC1 L589P Deleterious

Table Appendix-3: Genome Three Predicted Protein Changes. Input string for PROVEAN: chromosome number, 1-based coordinate, reference allele, variant allele, genotype (saliva_LGL)
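The Input column in these tables follows the comma-separated layout given in the caption. As a minimal sketch of how such a string could be split into its parts, the following assumes that the genotype field always has the form saliva_LGL with a single underscore; the names `ProveanInput` and `parse_provean_input` are hypothetical and are not part of PROVEAN itself, and the genotype codes are kept as uninterpreted strings.

```python
from typing import NamedTuple


class ProveanInput(NamedTuple):
    """One parsed Input string (hypothetical helper type, not a PROVEAN API)."""
    chromosome: str       # kept as text so that "X" is handled like "1".."22"
    position: int         # 1-based coordinate
    ref: str              # reference allele
    alt: str              # variant allele
    saliva_genotype: str  # code before the underscore (saliva sample)
    lgl_genotype: str     # code after the underscore (LGL sample)


def parse_provean_input(field: str) -> ProveanInput:
    """Split 'chrom,pos,ref,alt,saliva_LGL' into named parts."""
    chromosome, position, ref, alt, genotype = field.strip().split(",")
    # Assumption: exactly one underscore separates the two genotype codes.
    saliva, lgl = genotype.split("_")
    return ProveanInput(chromosome, int(position), ref, alt, saliva, lgl)


if __name__ == "__main__":
    print(parse_provean_input("1,12921332,T,C,1_2"))
```

Keeping the chromosome as a string avoids special handling for the X chromosome rows, and the two genotype codes are left uninterpreted because their meaning is not defined in the caption.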

Input                Accession number  Gene Symbol  Amino Acid Change  PROVEAN prediction  dbSNP
1,12921332,T,C,1_2   ENSP00000240189   PRAMEF2      C375R              Neutral             rs17039307
1,12921539,T,G,1_2   ENSP00000240189   PRAMEF2      F444V              Neutral             rs142594496
1,17087465,G,A,2_1   ENSP00000438833   MST1P9       P67L               Neutral             rs11260924
1,17087465,G,A,2_1   ENSP00000439273   MST1P9       P67L               Neutral             rs11260924
1,17087465,G,A,2_1   ENSP00000445850   MST1P9       P37L               Neutral             rs11260924
1,39879289,G,A,1_2   ENSP00000431179   KIAA0754     A982T              Neutral
1,152279406,T,G,1_2  ENSP00000357789   FLG          E2652D             Neutral             rs192116923
1,155583937,A,G,1_2  ENSP00000245564   MSTO1        K529R              Neutral             rs3844257
1,155583937,A,G,1_2  ENSP00000357325   MSTO1        K483R              Neutral             rs3844257
1,182812436,T,G,2_1  ENSP00000356520   DHX9         V40G               Deleterious
1,182812436,T,G,2_1  ENSP00000440408   DHX9         V40G               Deleterious
1,240255773,A,C,2_1  ENSP00000318884   FMN2         S122R              Deleterious
1,248436456,C,A,2_1  ENSP00000324687   OR2T33       A221S              Neutral             rs111275277
1,248436582,T,C,1_2  ENSP00000324687   OR2T33       T179A              Neutral             rs71535238
1,248436912,T,C,2_1  ENSP00000324687   OR2T33       M69V               Neutral             rs71538181
1,248814052,T,A,0_1  ENSP00000342008   OR2T27       K45M               Neutral             rs28533004

2,97845632,T,C,2_1 ENSP00000391950 ANKRD36 I566T Neutral 2,97845632,T,C,2_1 ENSP00000419530 ANKRD36 I566T Neutral 2,132044842,G,A,1_2 ENSP00000412278 CYP4F31P R54K Neutral 2,179647637,C,T,2_1 ENSP00000340554 TTN R953H Deleterious 2,179647637,C,T,2_1 ENSP00000343764 TTN R999H Deleterious 2,179647637,C,T,2_1 ENSP00000348444 TTN R953H Deleterious 2,179647637,C,T,2_1 ENSP00000352154 TTN R953H Deleterious 2,179647637,C,T,2_1 ENSP00000354117 TTN R999H Deleterious 2,179647637,C,T,2_1 ENSP00000434586 TTN R953H Deleterious 3,195508249,A,G,1_2 ENSP00000417397 MUC4 V3401A Neutral 3,195508249,A,G,1_2 ENSP00000417498 MUC4 V3401A Neutral 3,195508249,A,G,1_2 ENSP00000417657 MUC4 V3401A Neutral 3,195508249,A,G,1_2 ENSP00000417722 MUC4 V3401A Neutral 3,195508249,A,G,1_2 ENSP00000417757 MUC4 V3401A Neutral 3,195508249,A,G,1_2 ENSP00000418306 MUC4 V3401A Neutral 3,195508249,A,G,1_2 ENSP00000419798 MUC4 V3401A Neutral 3,195508249,A,G,1_2 ENSP00000419989 MUC4 V3401A Neutral 3,195508249,A,G,1_2 ENSP00000420243 MUC4 V3401A Neutral 3,195508249,A,G,1_2 ENSP00000420439 MUC4 V3401A Neutral 3,195509974,A,G,2_1 ENSP00000417397 MUC4 F2826S Neutral 3,195509974,A,G,2_1 ENSP00000417498 MUC4 F2826S Neutral 3,195509974,A,G,2_1 ENSP00000417657 MUC4 F2826S Neutral 3,195509974,A,G,2_1 ENSP00000417722 MUC4 F2826S Neutral 3,195509974,A,G,2_1 ENSP00000417757 MUC4 F2826S Neutral 3,195509974,A,G,2_1 ENSP00000418306 MUC4 F2826S Neutral 3,195509974,A,G,2_1 ENSP00000419798 MUC4 F2826S Neutral 3,195509974,A,G,2_1 ENSP00000419989 MUC4 F2826S Neutral 3,195509974,A,G,2_1 ENSP00000420243 MUC4 F2826S Neutral 3,195509974,A,G,2_1 ENSP00000420439 MUC4 F2826S Neutral 3,195510266,A,C,1_2 ENSP00000417397 MUC4 S2729A Neutral 3,195510266,A,C,1_2 ENSP00000417498 MUC4 S2729A Neutral 3,195510266,A,C,1_2 ENSP00000417657 MUC4 S2729A Neutral 3,195510266,A,C,1_2 ENSP00000417722 MUC4 S2729A Neutral 3,195510266,A,C,1_2 ENSP00000417757 MUC4 S2729A Neutral 3,195510266,A,C,1_2 ENSP00000418306 MUC4 S2729A Neutral 3,195510266,A,C,1_2 
ENSP00000419798 MUC4 S2729A Neutral 3,195510266,A,C,1_2 ENSP00000419989 MUC4 S2729A Neutral 3,195510266,A,C,1_2 ENSP00000420243 MUC4 S2729A Neutral 3,195510266,A,C,1_2 ENSP00000420439 MUC4 S2729A Neutral 3,195513515,C,T,2_1 ENSP00000417397 MUC4 A1646T Neutral 3,195513515,C,T,2_1 ENSP00000417498 MUC4 A1646T Neutral

3,195513515,C,T,2_1 ENSP00000417657 MUC4 A1646T Neutral 3,195513515,C,T,2_1 ENSP00000417722 MUC4 A1646T Neutral 3,195513515,C,T,2_1 ENSP00000417757 MUC4 A1646T Neutral 3,195513515,C,T,2_1 ENSP00000418306 MUC4 A1646T Neutral 3,195513515,C,T,2_1 ENSP00000419798 MUC4 A1646T Neutral 3,195513515,C,T,2_1 ENSP00000419989 MUC4 A1646T Neutral 3,195513515,C,T,2_1 ENSP00000420243 MUC4 A1646T Neutral 3,195513515,C,T,2_1 ENSP00000420439 MUC4 A1646T Neutral 4,367169,G,A,2_1 ENSP00000240499 ZNF141 E315K Deleterious rs145966198 4,1389156,T,C,1_2 ENSP00000323978 CRIPAK M286T Neutral rs71614972 4,1389156,T,C,1_2 ENSP00000372402 CRIPAK M228T Neutral rs71614972 4,5749975,C,T,2_1 ENSP00000264956 EVC T347M Deleterious rs34947207 4,5749975,C,T,2_1 ENSP00000372120 EVC T347M Deleterious rs34947207 4,5749975,C,T,2_1 ENSP00000426774 EVC T347M Deleterious rs34947207 4,88537232,A,G,1_2 ENSP00000282478 DSPP N1140D Neutral rs140656082 4,88537232,A,G,1_2 ENSP00000382213 DSPP N1140D Neutral rs140656082 5,140050940,C,T,2_1 ENSP00000445366 DND1 E334K Neutral rs77880328 6,6321119,C,T,2_1 ENSP00000416295 F13A1 G4R Neutral 6,47649574,A,G,1_2 ENSP00000296862 GPR111 K427E Neutral rs10807372 6,47649574,A,G,1_2 ENSP00000381727 GPR111 K359E Neutral rs10807372 6,47649574,A,G,1_2 ENSP00000422934 GPR111 K359E Neutral rs10807372 6,47649574,A,G,1_2 ENSP00000425269 GPR111 K359E Neutral rs10807372 7,100551566,T,C,1_2 ENSP00000324834 MUC3A M106T Neutral rs78538898 7,138601662,C,T,2_1 ENSP00000242365 KIAA1549 A854T Neutral rs117908080 7,138601662,C,T,2_1 ENSP00000406661 KIAA1549 A904T Neutral rs117908080 7,138601662,C,T,2_1 ENSP00000416040 KIAA1549 A904T Neutral rs117908080 8,101730064,G,A,1_2 ENSP00000174661 PABPC1 T147M Deleterious rs72681442 8,101730064,G,A,1_2 ENSP00000313007 PABPC1 T147M Deleterious rs72681442 8,101730064,G,A,1_2 ENSP00000427914 PABPC1 T19M Deleterious rs72681442 8,101730064,G,A,1_2 ENSP00000429395 PABPC1 T115M Deleterious rs72681442 8,101730064,G,A,1_2 ENSP00000429594 PABPC1 T102M Deleterious 
rs72681442 8,101730064,G,A,1_2 ENSP00000429892 PABPC1 T94M Deleterious rs72681442 8,101730064,G,A,1_2 ENSP00000430159 PABPC1 T102M Deleterious rs72681442 8,101730073,T,C,1_2 ENSP00000174661 PABPC1 H144R Deleterious rs72681443 8,101730073,T,C,1_2 ENSP00000313007 PABPC1 H144R Deleterious rs72681443 8,101730073,T,C,1_2 ENSP00000427914 PABPC1 H16R Deleterious rs72681443 8,101730073,T,C,1_2 ENSP00000429395 PABPC1 H112R Deleterious rs72681443 8,101730073,T,C,1_2 ENSP00000429594 PABPC1 H99R Deleterious rs72681443 8,101730073,T,C,1_2 ENSP00000429892 PABPC1 H91R Deleterious rs72681443 8,101730073,T,C,1_2 ENSP00000430159 PABPC1 H99R Deleterious rs72681443 9,33385235,T,G,2_1 ENSP00000297988 AQP7 Y266S Neutral

9,33385235,T,G,2_1 ENSP00000368821 AQP7 Y265S Neutral 9,33385690,G,T,1_2 ENSP00000297988 AQP7 R234S Deleterious rs139024279 9,33385690,G,T,1_2 ENSP00000368817 AQP7 R170S Deleterious rs139024279 9,33385690,G,T,1_2 ENSP00000368820 AQP7 R233S Deleterious rs139024279 9,33385690,G,T,1_2 ENSP00000368821 AQP7 R233S Deleterious rs139024279 9,33385690,G,T,1_2 ENSP00000396111 AQP7 R177S Deleterious rs139024279 9,33385690,G,T,1_2 ENSP00000410138 AQP7 R142S Deleterious rs139024279 9,33385690,G,T,1_2 ENSP00000412868 AQP7 R102S Deleterious rs139024279 9,33385690,G,T,1_2 ENSP00000438860 AQP7 P102Q Neutral rs139024279 9,33385690,G,T,1_2 ENSP00000439534 AQP7 R234S Deleterious rs139024279 9,33385690,G,T,1_2 ENSP00000441619 AQP7 R142S Deleterious rs139024279 9,43861081,T,G,1_2 ENSP00000340890 CNTNAP3B L652R Neutral rs62536501 9,43861081,T,G,1_2 ENSP00000366784 CNTNAP3B L701R Neutral rs62536501 9,43861081,T,G,1_2 ENSP00000366787 CNTNAP3B L652R Neutral rs62536501 9,43861081,T,G,1_2 ENSP00000385153 CNTNAP3B L558R Neutral rs62536501 9,43861081,T,G,1_2 ENSP00000432883 CNTNAP3B L559R Neutral rs62536501 9,97087857,G,A,1_2 ENSP00000253262 FAM22F L126F Neutral rs75315722 9,97087857,G,A,1_2 ENSP00000335067 FAM22F L126F Neutral rs75315722 9,97087857,G,A,1_2 ENSP00000343865 FAM22F L126F Neutral rs75315722 9,97087857,G,A,1_2 ENSP00000364496 FAM22F L126F Neutral rs75315722 11,1017596,T,G,1_2 ENSP00000406861 MUC6 Q1735H Neutral 11,1643049,A,C,2_1 ENSP00000331603 KRTAP5-4 V92G Neutral 11,1643049,A,C,2_1 ENSP00000382590 KRTAP5-4 V92G Neutral 11,18267463,A,G,2_1 ENSP00000256733 SAA2 V75A Deleterious rs71469388 11,18267463,A,G,2_1 ENSP00000416716 SAA2 V75A Deleterious rs71469388 11,18267463,A,G,2_1 ENSP00000432370 SAA2 V75A Deleterious rs71469388 11,18267463,A,G,2_1 ENSP00000435659 SAA2 V75A Deleterious rs71469388 11,18267463,A,G,2_1 ENSP00000436126 SAA2 V75A Deleterious rs71469388 11,18267463,A,G,2_1 ENSP00000437162 SAA2 V75A Deleterious rs71469388 11,46898750,C,T,2_1 ENSP00000367888 LRP4 G1093R 
Deleterious 11,59480654,A,G,2_1 ENSP00000302199 OR10V1 I222T Deleterious 11,76372464,C,T,2_1 ENSP00000260061 LRRC32 G58E Neutral 11,76372464,C,T,2_1 ENSP00000384126 LRRC32 G58E Neutral 11,76372464,C,T,2_1 ENSP00000385766 LRRC32 G58E Neutral 11,76372464,C,T,2_1 ENSP00000413331 LRRC32 G58E Neutral 12,3147203,C,T,1_0 ENSP00000351184 TEAD4 P280S Neutral rs11550887 12,3147203,C,T,1_0 ENSP00000352926 TEAD4 P323S Neutral rs11550887 12,3147203,C,T,1_0 ENSP00000380311 TEAD4 P194S Neutral rs11550887 12,11286249,T,C,1_2 ENSP00000444736 TAS2R30 I199V Neutral rs77777159 12,11461553,T,C,2_1 ENSP00000279575 PRB4 R122G Neutral 12,11461553,T,C,2_1 ENSP00000442834 PRB4 R122G Neutral

12,26383750,C,T,2_1 ENSP00000242729 SSPN T158I Neutral 12,26383750,C,T,2_1 ENSP00000396087 SSPN T55I Neutral 12,26383750,C,T,2_1 ENSP00000400971 SSPN T132I Neutral 12,26383750,C,T,2_1 ENSP00000442893 SSPN T55I Neutral 12,26383750,C,T,2_1 ENSP00000445360 SSPN T55I Neutral 12,29936501,C,T,2_1 ENSP00000442046 TMTC1 D62N Deleterious rs76424334 12,29936501,C,T,2_1 ENSP00000448112 TMTC1 D62N Deleterious rs76424334 12,29936501,C,T,2_1 ENSP00000449043 TMTC1 D62N Deleterious rs76424334 12,29936515,A,G,2_1 ENSP00000442046 TMTC1 I57T Deleterious 12,29936515,A,G,2_1 ENSP00000448112 TMTC1 I57T Deleterious 12,29936515,A,G,2_1 ENSP00000449043 TMTC1 I57T Deleterious 12,49950251,C,G,2_1 ENSP00000257981 KCNH3 P856R Neutral rs113209368 12,123333353,A,G,2_1 ENSP00000253083 HIP1R D104G Deleterious 14,44974225,C,A,2_1 ENSP00000344579 FSCB A656S Neutral 14,44974225,C,A,2_1 ENSP00000446012 FSCB A549S Neutral 15,23407031,T,C,2_1 ENSP00000453436 RP11-467N20.5.1 K602R Neutral 15,23685190,G,A,1_2 ENSP00000454407 GOLGA6L2 A811V Neutral rs148212312 15,42643529,T,C,2_1 ENSP00000326227 GANC F845S Neutral rs7181742 15,42643529,T,C,2_1 ENSP00000455511 GANC F54S Neutral rs7181742 15,42643529,T,C,2_1 ENSP00000457604 GANC F54S Neutral rs7181742 15,49776539,A,T,2_1 ENSP00000267843 FGF7 E141D Deleterious 15,49776539,A,T,2_1 ENSP00000453048 FGF7 E141D Deleterious 15,49776539,A,T,2_1 ENSP00000453980 FGF7 E83D Deleterious 16,33961386,G,A,2_1 ENSP00000443070 LINC00273 A352V Deleterious rs76705741 16,33961779,C,A,2_1 ENSP00000443070 LINC00273 G221V Deleterious 16,53644898,G,A,2_1 ENSP00000262135 RPGRIP1L Q1148* NA 16,53644898,G,A,2_1 ENSP00000369257 RPGRIP1L Q1228* NA 16,53644898,G,A,2_1 ENSP00000456534 RPGRIP1L Q1182* NA 16,53644898,G,A,2_1 ENSP00000457889 RPGRIP1L Q1194* NA 16,69988319,T,C,1_2 ENSP00000288040 CLEC18A L100P Neutral rs3869427 16,69988319,T,C,1_2 ENSP00000377304 CLEC18A L100P Neutral rs3869427 16,69988319,T,C,1_2 ENSP00000413990 CLEC18A L100P Neutral rs3869427 16,69988319,T,C,1_2 
ENSP00000442945 CLEC18A L100P Neutral rs3869427 16,69988319,T,C,1_2 ENSP00000454685 CLEC18A L100P Neutral rs3869427 16,89016735,A,T,0_1 ENSP00000367598 RP11-830F9.6.1 Q70L Deleterious rs12930980 17,5036274,C,A,1_2 ENSP00000250066 USP6 H89N Neutral rs78465432 17,5036274,C,A,1_2 ENSP00000328010 USP6 H89N Neutral rs78465432 17,5036281,G,C,1_2 ENSP00000250066 USP6 S91T Neutral rs76236903 17,5036281,G,C,1_2 ENSP00000328010 USP6 S91T Neutral rs76236903 17,16068377,C,G,2_1 ENSP00000268712 NCOR1 K178N Deleterious

17,16068377,C,G,2_1 ENSP00000379189 NCOR1 K69N Deleterious 17,16068377,C,G,2_1 ENSP00000379190 NCOR1 K69N Deleterious 17,16068377,C,G,2_1 ENSP00000379192 NCOR1 K178N Deleterious 17,16068377,C,G,2_1 ENSP00000387727 NCOR1 K178N Deleterious 17,16068377,C,G,2_1 ENSP00000395091 NCOR1 K178N Deleterious 17,16068377,C,G,2_1 ENSP00000407998 NCOR1 K178N Deleterious 17,16097825,T,G,1_2 ENSP00000268712 NCOR1 Y20S Deleterious rs73281920 17,16097825,T,G,1_2 ENSP00000379189 NCOR1 Y20S Deleterious rs73281920 17,16097825,T,G,1_2 ENSP00000379190 NCOR1 Y20S Deleterious rs73281920 17,16097825,T,G,1_2 ENSP00000379192 NCOR1 Y20S Deleterious rs73281920 17,16097825,T,G,1_2 ENSP00000387727 NCOR1 Y20S Deleterious rs73281920 17,16097825,T,G,1_2 ENSP00000395091 NCOR1 Y20S Deleterious rs73281920 17,16097825,T,G,1_2 ENSP00000407998 NCOR1 Y20S Deleterious rs73281920 17,16097825,T,G,1_2 ENSP00000410784 NCOR1 Y20S Deleterious rs73281920 17,21207813,T,G,2_1 ENSP00000319139 MAP2K3 L219W Deleterious rs74575904 17,21207813,T,G,2_1 ENSP00000345083 MAP2K3 L215W Deleterious rs74575904 17,21207813,T,G,2_1 ENSP00000355081 MAP2K3 L186W Deleterious rs74575904 17,21207813,T,G,2_1 ENSP00000378869 MAP2K3 L186W Deleterious rs74575904 17,21319523,C,T,2_1 ENSP00000328150 KCNJ12 T290M Neutral rs77987694 17,39197601,T,C,1_2 ENSP00000305975 KRTAP1-1 S17G Neutral 17,39197601,T,C,1_2 ENSP00000439330 KRTAP1-1 S17G Neutral 17,39254133,G,T,2_1 ENSP00000328444 KRTAP4-8 S68R Deleterious 17,39254133,G,T,2_1 ENSP00000414561 KRTAP4-8 S68R Deleterious 18,11610349,G,A,1_2 ENSP00000408366 SLC35G4 S213N Neutral rs56067329 18,11610382,C,A,1_2 ENSP00000408366 SLC35G4 T224N Neutral rs55927946 18,12122470,C,T,1_2 ENSP00000326572 ANKRD62 S456L Neutral rs74988693 18,12122470,C,T,1_2 ENSP00000405628 ANKRD62 S192L Neutral rs74988693 18,13884764,C,T,2_1 ENSP00000333821 MC2R A252T Neutral 18,13884764,C,T,2_1 ENSP00000382718 MC2R A252T Neutral 18,14779986,G,A,1_2 ENSP00000351875 ANKRD30B R483Q Neutral rs76927023 18,14779986,G,A,1_2 
ENSP00000399031 ANKRD30B R483Q Neutral rs76927023 18,52265308,A,C,1_2 ENSP00000315265 C18orf26 T189P Neutral rs9947055 19,1391014,A,T,2_1 ENSP00000233627 NDUFS7 T125S Deleterious 19,1391014,A,T,2_1 ENSP00000364262 NDUFS7 T125S Deleterious 19,1391014,A,T,2_1 ENSP00000388398 NDUFS7 T125S Deleterious 19,1391014,A,T,2_1 ENSP00000406630 NDUFS7 T155S Deleterious 19,1391014,A,T,2_1 ENSP00000439466 NDUFS7 T44S Deleterious 19,1391014,A,T,2_1 ENSP00000440348 NDUFS7 T125S Deleterious 19,1391014,A,T,2_1 ENSP00000441075 NDUFS7 T44S Deleterious 19,1391014,A,T,2_1 ENSP00000443273 NDUFS7 T125S Deleterious 19,1391014,A,T,2_1 ENSP00000443388 NDUFS7 T44S Deleterious


19,1391014,A,T,2_1 ENSP00000445422 NDUFS7 T44S Deleterious
19,4511746,A,T,1_2 ENSP00000301286 PLIN4 N728K Deleterious rs62115192
19,4538599,G,A,1_2 ENSP00000302621 LRG1 P133S Neutral rs966384
19,7056571,G,T,2_1 ENSP00000333183 MBD3L3 A130D Deleterious rs111605618
19,22155783,C,T,1_2 ENSP00000380315 ZNF208 V685I Neutral
19,22155783,C,T,1_2 ENSP00000408886 ZNF208 V585I Neutral
19,22375868,C,T,2_1 ENSP00000380310 ZNF676 G27E Deleterious rs8104929
19,36002381,C,T,2_1 ENSP00000342012 DMKN G284S Neutral rs12981076
19,36002381,C,T,2_1 ENSP00000376043 DMKN G284S Neutral rs12981076
19,36002381,C,T,2_1 ENSP00000388404 DMKN G284S Neutral rs12981076
19,36002381,C,T,2_1 ENSP00000394908 DMKN G284S Neutral rs12981076
19,36002381,C,T,2_1 ENSP00000409513 DMKN G284S Neutral rs12981076
19,36002381,C,T,2_1 ENSP00000414743 DMKN G284S Neutral rs12981076
19,36002381,C,T,2_1 ENSP00000415277 DMKN G284S Neutral rs12981076
19,55286665,T,A,1_2 ENSP00000291633 KIR2DL1 L140Q Neutral rs1049287
19,55286665,T,A,1_2 ENSP00000336769 KIR2DL1 L140Q Neutral rs1049287
20,23965954,T,C,1_2 ENSP00000278765 GGTLC1 T193A Neutral
20,23965954,T,C,1_2 ENSP00000286890 GGTLC1 T193A Neutral
20,23965954,T,C,1_2 ENSP00000337587 GGTLC1 T193A Neutral
20,26061956,C,A,2_1 ENSP00000246000 FAM182A A103E Deleterious
20,26061956,C,A,2_1 ENSP00000365580 FAM182A A103E Deleterious
20,26061956,C,A,2_1 ENSP00000388057 FAM182A A44E Deleterious
22,21044631,G,A,2_1 ENSP00000443399 POM121L4P D105N Neutral
22,24325062,G,A,0_1 ENSP00000215780 GSTT2 V118M Neutral rs2301423
22,24325062,G,A,0_1 ENSP00000385765 GSTT2 V118M Neutral rs2301423
22,25010828,G,A,1_2 ENSP00000248923 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000383231 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000383232 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000383233 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000385975 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000387499 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000387796 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000389935 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000393537 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000395271 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000398589 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000400621 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000415024 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000415068 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000415553 GGT1 G84S Deleterious rs77018131
22,25010828,G,A,1_2 ENSP00000417044 GGT1 G84S Deleterious rs77018131


22,42951058,G,A,1_2 ENSP00000331376 SERHL2 S46N Neutral rs926333
22,42951058,G,A,1_2 ENSP00000342425 SERHL2 S46N Neutral rs926333
22,42951058,G,A,1_2 ENSP00000413512 SERHL2 S46N Neutral rs926333
22,42951058,G,A,1_2 ENSP00000435091 SERHL2 S46N Neutral rs926333
X,34148882,C,T,1_2 ENSP00000345029 FAM47A R505H Neutral rs5973089
X,140993716,G,C,2_1 ENSP00000285879 MAGEC1 V176L Neutral rs78700965
X,140993755,C,A,2_1 ENSP00000285879 MAGEC1 P189T Neutral rs74748246


Figure Appendix-1: KEGG Annotated Pathway of Natural Killer Cell Mediated Cytotoxicity.


VITA

Thomas L. Olson

EDUCATION:

The Pennsylvania State University College of Medicine (Hershey, PA)
Ph.D., Molecular Medicine, 2013

University of Nebraska (Lincoln, NE)
B.S., Biology, December 2000

EXPERIENCE:

Quest Diagnostics/BryanLGH Medical Center (Lincoln, NE)
Medical Technologist in Microbiology, 2001-2005

GRANT AND AWARD:

Judy S. Finklestein Memorial Student Research Award, 2008

PUBLICATIONS:

Olson TL, Ratan A, Burhans R, Zhang D, Rajala HLM, Mustjoki S, Miller W, Schuster S, Loughran TP (2013) "Whole Genome Sequencing Reveals Numerous Driver Mutations in LGL Leukemia." (In preparation)

Olson TL, Giardine BL, Keller-Capone C, Hardison R, Loughran TP (2013) "Defining the Role of the Transcription Factor T-bet in Terminal Effector Memory Cells Via ChIP-Seq of Large Granular Lymphocytic Leukemia." (In preparation)

Loughran TP, Zickl L, Olson TL, Zhang D, Rajala HLM, Hasanali Z, Bennett JM, Lazarus HM, Litzow MR, Evens AM, Mustjoki S, Tallman MS. (2013) "STAT3 Y640F Mutation Identifies Favorable Response Group to Immunosuppressive Therapy of LGL Leukemia: Results of a Prospective Multicenter Phase II Study of Initial Treatment with Methotrexate by the Eastern Cooperative Oncology Group (E5998)." (In preparation)

Andersson EI, Rajala HLM, Eldfors S, Ellonen P, Olson T, Jerez A, Clemente MJ, Kallioniemi O, Porkka K, Heckman C, Loughran TP, Maciejewski JP, Mustjoki S (2013) "Novel Somatic Mutations in Large Granular Lymphocyte Leukemia affecting the STAT-pathway and T-cell activation." In preparation for submission to Blood.

Bedoya-Reina OC, Ratan A, Burhans R, Giardine B, Riemer C, Olson TL, Loughran TP, Perry GH, Schuster SC, Miller W (2013) "Galaxy Tools to Study Genome Diversity." Submitted to Gigascience.

Jerez A, Clemente MJ, Makishima H, Rajala HLM, Gómez-Seguí I, Olson T, McGraw K, Przychodzen B, Kulasekararaj AG, Afable MG, Husseinzadeh HD, Hosono N, LeBlanc F, Lagström S, Zhang D, Ellonen P, Lichtin AE, Wodnar-Filipowicz A, Mufti GJ, List AF, Mustjoki S, Loughran TP, Maciejewski JP (2013) "STAT3-Mutations Indicate the Presence of Subclinical Self-Reactive Clones in Aplastic Anemia and Myelodysplastic Syndromes." In revision at Blood.

Rajala HL, Eldfors S, Kuusanmäki H, van Adrichem AJ, Olson T, Lagström S, Andersson EI, Jerez A, Clemente MJ, Yan Y, Zhang D, Awwad A, Ellonen P, Kallioniemi O, Wennerberg K, Porkka K, Maciejewski JP, Loughran TP Jr, Heckman C, Mustjoki S. (2013 Apr 17) "Discovery of somatic STAT5b mutations in large granular lymphocytic leukemia." Blood.

Jerez A, Clemente MJ, Makishima H, Koskela H, Leblanc F, Peng Ng K, Olson T, Przychodzen B, Afable M, Gomez-Segui I, Guinta K, Durkin L, Hsi ED, McGraw K, Zhang D, Wlodarski MW, Porkka K, Sekeres MA, List A, Mustjoki S, Loughran TP, Maciejewski JP. (2012) "STAT3 mutations unify the pathogenesis of chronic lymphoproliferative disorders of NK cells and T-cell large granular lymphocyte leukemia." Blood 120(15):3048-57.

Koskela HL, Eldfors S, Ellonen P, van Adrichem AJ, Kuusanmäki H, Andersson EI, Lagström S, Clemente MJ, Olson T, Jalkanen SE, Majumder MM, Almusa H, Edgren H, Lepistö M, Mattila P, Guinta K, Koistinen P, Kuittinen T, Penttinen K, Parsons A, Knowles J, Saarela J, Wennerberg K, Kallioniemi O, Porkka K, Loughran TP Jr, Heckman CA, Maciejewski JP, Mustjoki S. (2012) "Somatic STAT3 mutations in large granular lymphocytic leukemia." N Engl J Med 366(20):1905-13.

Wang BD, Kline CL, Pastor DM, Olson TL, Frank B, Luu T, Sharma AK, Robertson G, Weirauch MT, Patierno SR, Stuart JM, Irby RB, Lee NH. (2010) "Prostate apoptosis response protein 4 sensitizes human colon cancer cells to chemotherapeutic 5-FU through mediation of an NF kappaB and microRNA network." Mol Cancer 9:98.

Pastor DM, Poritz LS, Olson TL, Kline CL, Harris LR, Koltun WA, Chinchilli VM, Irby RB. (2010) "Primary cell lines: false representation or model system? a comparison of four human colorectal tumors and their coordinately established cell lines." Int J Clin Exp Med 3(1):69-83.

Kline CL, Olson TL, Irby RB. (2009) "Src activity alters alpha3 integrin expression in colon tumor cells." Clin Exp Metastasis 26(2):77-87.