저작자표시-비영리-변경금지 2.0 대한민국

이용자는 아래의 조건을 따르는 경우에 한하여 자유롭게

l 이 저작물을 복제, 배포, 전송, 전시, 공연 및 방송할 수 있습니다.

다음과 같은 조건을 따라야 합니다:

저작자표시. 귀하는 원저작자를 표시하여야 합니다.

비영리. 귀하는 이 저작물을 영리 목적으로 이용할 수 없습니다.

변경금지. 귀하는 이 저작물을 개작, 변형 또는 가공할 수 없습니다.

l 귀하는, 이 저작물의 재이용이나 배포의 경우, 이 저작물에 적용된 이용허락조건 을 명확하게 나타내어야 합니다. l 저작권자로부터 별도의 허가를 받으면 이러한 조건들은 적용되지 않습니다.

저작권법에 따른 이용자의 권리는 위의 내용에 의하여 영향을 받지 않습니다.

이것은 이용허락규약(Legal Code)을 이해하기 쉽게 요약한 것입니다.

Disclaimer

이학박사 학위논문

Patient-specific genomic profiling for advanced cancers in young adults

진행성 암을 지닌 젊은 성인 환자의

개별 유전체 분석

2016 년 8 월

서울대학교 대학원

협동과정 종양생물학과전공

차 수 진 A Thesis of the Degree of Doctor of Philosophy

진행성 암을 지닌 젊은 성인 환자의

개별 유전체 분석

Patient-specific genomic profiling for advanced cancers in young adults

August 2016

The Department of Cancer biology

Seoul National University

College of Medicine

Soojin Cha ABSTRACT

차 수 진 (Cha Soojin) 종양생물학과 (Cancer biology) The Graduate School Seoul National University

Purpose: Although adolescent and young adult (AYA) cancers are characterized by biological features and clinical outcomes distinct from those of other age groups, the molecular profile of AYA cancers has not well been defined. In this study, I analyzed cancer genomes from rare types of metastatic AYA cancer patients to identify candidate driving and/or druggable genetic alterations.

Methods: Pattern-based heuristic annotation was developed to identify candidate driving genetic alterations from processed sequencing data of individual cancer genome. The reliability of the approach was validated by applying to whole exome sequencing (WES) data of acute myeloid leukemia

(AML) from The Cancer Genome Atlas (TCGA). Then, prospectively collected AYA tumor samples from seven different patients were analyzed using three different genomics platforms (whole-exome sequencing, whole- transcriptome sequencing or OncoScanTM). By using well-known bioinformatics tools (bwa, Picard, GATK, MuTect, and Somatic Indel

i

Detector) and the pattern-based heuristic annotation approach with open access databases (DAVID and DGIdb), I processed sequencing data and identified candidate driving genetic alterations with their druggability specific to each patient.

Results: Using pattern-based heuristic annotation, all candidate were confirmed which was described in published data of TCGA AML study as well as additional candidates such as MYC, NF1 and ARID2. In AYA cancers, the mutation frequencies were lower than those of other adult cancers

(median=0.56), except for a germ cell tumor with hypermutation. Patient- specific genetic alterations in candidate driving genes were idendified: RASA2 and NF1 (prostate cancer), TP53 and CDKN2C (olfactory neuroblastoma),

FAT1, NOTCH1, and SMAD4 (head and neck cancer), KRAS (urachal carcinoma), EML4-ALK (lung cancer), and MDM2 and PTEN (liposarcoma).

Under consult of clinicians, potential drugs were suggested for each patient according to his or her altered genes or related pathways. By comparing candidate driving genes between AYA cancers and those from all age groups for the same type of cancer, different driving genes were identified in prostate cancer and a germ cell tumor in AYAs compared with all age groups, whereas three common alterations (TP53, FAT1, and NOTCH1) in head and neck cancer were identified in both groups.

Conclusion: This study identified patient-specific genetic alterations with druggability of seven rare types of AYA cancers and showed importance and

ii feasibility of analysis of individual cancer genome. Additionally, it showed genetic alterations in cancers from AYA and those from all age groups varied by cancer type.

------

Key words: adolescent and young adult (AYA) cancer, next-generation sequencing (NGS), whole exome sequencing, precision medicine, genomics

Student number: 2012-31154

iii

CONTENTS

Abstract ...... i Contents ...... iv List of tables and figures ...... vi

General Introduction ...... 1

Chapter 1. Individual analysis of cancer genomes using next-generation sequencing Introduction ...... 4 Material and Methods Pattern-based heuristic annotation: Alternative statistical method to select candidate driving genetic alterations in rare types of cancer ...... 6 Validation of pattern-based heuristic annotation using TCGA AML data ...... 10 Results Pattern-based heuristic annotation could detect all candidate driving genes defined in published TCGA AML study and additional candidates ...... 13 Discussion ...... 24

Chapter 2. Application of the individual analysis to adolescence and young adult (AYA) cancer genomes Introduction ...... 27 Material and Methods Study design and sample information ...... 29 Tumor samples and extraction of genomic DNA and RNA ... 29

iv

Whole exome sequencing (WES) ...... 30 Processing WES data to analyze SNVs and indels ...... 31 Whole transcriptome sequencing (WTS) ...... 32 Analysis of copy number variations (CNVs) by OncoScanTM .. 33 Analysis of CNVs by VarScan2 for AYA02 Results ...... 33 Analysis of mutation frequency and mutation spectrum ...... 34 Pathway-drug Analysis ...... 34 Fusion analysis ...... 34 Concurrence of RasGAPs ...... 35 Validation of RASA2 and NF1 in AYA01 ...... 35 Validation of EML4-ALK in AYA09 ...... 35 Results AYA cancer samples, platforms and generation of data from WES ...... 36 Mutation frequency and mutation spectrum ...... 38 Individual AYA cancer analysis: patient-specific genetic alterations ...... 41 AYA01, prostate cancer ...... 63 AYA02, olfactory neuroblastoma ...... 65 AYA04, Head and neck squamous cell carcinoma ...... 66 AYA06, urachal carcinoma ...... 66 AYA07, germ cell tumor ...... 67 AYA09, lung cancer ...... 67 AYA10, liposarcoma ...... 69

Comparison of candidate driving genes from AYA cancers and

cancers in other age groups ...... 69

Discussion ...... 71 References ...... 73 Abstract in Korean ...... 81 v

LIST OF TABLES AND FIGURES

Chapter 1

Figure 1-1. CVE list ...... 7 Figure 1-2. Pattern-based heuristic annotation for single sample (SNV/Indel) ...... 8 Figure 1-3. Pattern-based heuristic annotation for single sample (CNVs) ...... 10 Figure 1-4. Pattern-based heuristic annotation for large-scale samples ...... 10 Figure 1-5. Summary of analysis of TCGA AML data using pattern-based heuristic annotation ...... 12 Table 1-1. Candidate driving genes defined in published AML study and assigned levels according to pattern-based heuristic annotation ...... 13 Table 1-2. Additional candidate driving genes identified by pattern-based heuristic annotation ...... 15 Table 1-3. Mutation pattern of TCGA AML study ...... 16 Table 1-4. References of Class II candidate cancer genes of AML .... 17 Table 1-5. Patient-specific genetic alterations identified by published TCGA AML study (NEJM) and pattern-based heuristic annotation for large-scale samples (Pattern study) ...... 18 Table 1-6. Genetic alterations only defined by pattern-based heuristic annotation for single sample ...... 23

vi

Chapter 2

Figure 2-1 WES pipeline for this study ...... 32 Table 2-1. Sample information of AYAs cancer patients ...... 36 Table 2-2. Sequencing information for WES data ...... 37 Figure 2-2. Mutation frequency of AYA cancers ...... 38 Table 2-3. Mutation frequency of AYAs and major 12 cancer types 39 Figure 2-3. Mutation spectrum of AYA cancers ...... 39 Table 2-4. Mutation frequency of WES data for AYA cancers ...... 40 Table 2-5. Processed WES data ...... 42 Figure 2-4. Candidate driving genetic alterations of AYA cancer and their druggability ...... 60 Table 2-6. Candidate driving genetic alterations of AYA cancers .... 61 Table 2-7. References of Class II candidate cancer genes of AYA cancers ...... 62 Figure 2-5. Analysis of CNVs in AYA cancers ...... 62 Figure 2-6. Copy number variations of AYA cancers from OncoScanTM and VarScan2 ...... 63 Figure 2-7. Sequencing validation of RASA2 and NF1 in AYA01 sample ...... 64 Figure 2-8. Concurrency of RasGAPs in large-scale study analyzed by using c-Bio portal ...... 65 Figure 2-9.EML4-ALK validation in AYA09 cells ...... 68

Table 2-8. Comparison of candidate driving genes of same cancer type from AYAs with those from all age group ...... 70

vii

GENERAL INTRODUCTION

Cancer is one of the leading causes of death worldwide. Abnormal genetic alterations followed by the uncontrolled growth of somatic cells initiate cancer. Although most genetic alterations are passenger mutations that do not contribute to tumorigenesis, an individual cell can proliferate and become a tumor if it acquires a sufficient set of driving mutations. Therefore, finding cancer-driving mutations and targeting the encoded abnormal and related pathways via cancer therapeutics are important strategies to delay cancer progression and prevent metastasis (1).

With emergence of advanced technology, next generation sequencing (NGS), many cancer genomes were analyzed explosively since the method improved the efficiency of DNA sequencing by more than a million-fold at a cost of nearly $1,000 per genome in 2015 (2, 3). Also, NGS can analyze various regions of the genome including whole genome, whole exome (~2% region of the whole genome), and whole transcriptome (sum of mRNAs) as well as identify various types of mutation including point mutation, insertion and deletion (indel), copy number alterations (homozygous deletion, hemizygous deletion, and amplification), and even translocation breakpoint (4).

In cancer field, the explosion of the cancer genomics data which was primarily generated by The Cancer Genome Atlas (TCGA) and International

Cancer Genome Consortium (ICGC) with large-scale approach (5). The purpose of the generating big data is to integrate understanding of cancer data using the generated information including point mutation, copy number, 1 expression and even clinical data (6, 7). As a result, we were able to have more comprehensive pictures of cancer that have been uncovered before NGS development. It is uncovered that the mutation pattern of somatic mutations

(single nucleotide variations, insertions and deletions, and copy number variations) in various cancer types and mutational signature of several cancer types (8-10). The new classification of cancers was also identified that showed mutation-dominant cancer types (M class) and copy number variation-dominant cancer types (C class) (11).

2

CHAPTER 1

Individual analysis of cancer genomes using next-generation sequencing

3

INTRODUCTION

Although the contribution of the large-scale analyses of cancer genomes was obvious as described, in respect of identifying drivers of cancer, it is insufficient approach for several reasons. A saturation analysis emphasized the need of sampling thousands of tumors to find out driving genetic alterations which had very low (less than 0.5%) frequency (12). However, analyzing cancers belong to rare type is limited to large-scale sampling that has an incidence of one in 1,500 to 2,500 patients (13). In spite of their low incidence, identifying driver mutation of rare cancer may be extremely valuable since the studies of rare cancer showed that targeted therapy in these cancers obtained response rate of >70% (13, 14).

Also, the results generated by large-scale analyses overlooked the inter-patient heterogeneity inevitably. Unsupervised clustering analysis of large-scale study on 12 major cancer types showed variation of significant candidate driving genes between patients whose cancer type was same (10). In addition, the recent paper mentioned that heterogeneity between patients who belong to the same type of cancer may induce different clinical outcomes as a consequence of the difference of somatic mutations between the patients (15).

Finally, it is often shown the inconsistency of the results from large-scale studies. Three different large-scale genomics studies for head and neck cancer showed inconsistency of candidate driving genes although several genes were common (16-19). The studies indicated the difficulty of interpretation with the result of large-scale studies, even though each study had novel meaning itself. 4

Identification of driving genetic alterations is a primary issue in precision oncology which pursued the genomics-driven individualized medicine (20).

Several studies have shown successful single sample/patient-focused analyses using whole exome sequencing (WES) different from the approach of the large-scale studies. A study for a patient with stage IV melanoma responsive to ipilimumab (anti-CTLA4 antibody) showed that the immune response was activated by mutated ATR-specific T-cell response and the mutation was identified by using WES (21). Also, a study for a patient of metastatic cholangiocarcinoma have shown that interferon gamma (IFN-γ) secretion of

CD4+ T helper 1 (TH1) cells was increased when the cells were primed with mutated erbb2 interacting (ERBB2IP) which was detected in tumor cell by WES. Also, after adoptive transfer of tumor-infiltrating lymphocytes

(TIL) containing ERBB2IP-reactive TH1 cells, the patient showed tumor regression (22).

In this study, I have developed the pattern-based heuristic annotation to identify driving genetic alterations among somatic mutations detected by

WES and copy number analysis. And the reliability of the approach was established by applying of the approach to published TCGA AML study (23).

5

MATERIALS AND METHODS

Pattern-based heuristic annotation: Alternative statistical method to select candidate driving genetic alterations in rare types of cancer

Because selecting candidate driving genetic alterations by evaluating statistical significance of the candidate mutations from passenger mutations is difficult for rare types of cancer due the low number of available samples, alternative methods are needed to find driving genetic alterations in rare types of cancers. Therefore, I used pattern-based heuristic annotation as an alternative method to identify candidate driving genetic alterations as follows.

The approach was based on a pattern-based mutation analysis of large-scale studies that described mutations in tumor suppressor genes (TSG), which tend to be truncations (deleterious mutation), whereas mutations in oncogenes (OG) tend to be clustered in hot-spot regions of the genes (15, 24).

1) Cancer gene detection and classification (Class I/II cancer gene)

Prior to the pattern analysis of variants (mutations of mutated genes), I first detected cancer genes among the sequencing results. To identify cancer genes,

I used three well-known cancer gene lists (the CVE list; the Cancer gene census from COSMIC (25), the Vogelstein list from a pattern-based mutation study with manual curation (15), and the Elledge list from a pattern-based mutation study using a computational algorithm) (Figure 1-1) (24).

6

Figure 1-1. CVE list. Three of well-known cancer gene lists were represented, ‘CVE’ list (C for cancer gene census, V for cancer gene list from Vogelstein group, and E for cancer gene list from Elledge group (15, 24, 25). Comparing genes in the CVE list with our sequenced mutated genes from processed WES data, I scored each mutated gene from 1 to 3 by counting the number of lists that contained the same genes. (For example, NF1 was mutated in our sample, and the gene also appeared in the three cancer gene lists; therefore, the CVE score for NF1 was 3). Genes with a CVE score of 2 or greater were classified as ‘Class I Cancer Genes’. Such genes are well- known cancer genes, because at least two of the cancer gene lists indicated them as cancer genes. Other mutated genes that scored less than 2 were classified as ‘Class II Cancer Genes’, if their function as cancer genes (TSG or OG) was confirmed by published in vitro or in vivo experiments using transformation assays, proliferation assays, or metastasis assays for any cancer type.

2) Identification of mutation types and functions of the mutated genes

After confirming cancer genes (Class I/II cancer genes), I analyzed their mutation type and functional characteristics (TSG/OG). The mutation types were divided into truncation mutations (nonsense, splicing mutation, and frameshift indels) and subtle alterations (missense and in-frame indels) as

7 shown in pattern-based mutation studies (15, 24). The functional characteristics (TSG/OG) of the mutated genes were defined by descriptions in the CVE list (Class I) or the literature (Class II) (Figure 1-2A).

3) Investigation of variant recurrence

Because clustered patterns of mutations were characteristics of OG (15), I analyzed the clustering pattern of the OG variants from our data by counting the recurrence of the variants in any cancer types archived in the cBio portal

(http://www.cbioportal.org/) (26). Although the recurrence of the variants was less important than the OG, I analyzed the recurrence of TSG because the meaning of recurrent variants may be important (Figure 1-2B).

Figure 1-2. Pattern-based heuristic annotation for single sample (SNV/Indel). (A) Detection of cancer gene from the processed WES data was shown. Class I cancer gene indicated well-known cancer gene, and Class II cancer gene indicated less-known cancer gene. (B) Class I/II cancer genes were questioned by how much they contribute to tumorigenesis considering their mutation type (truncation, subtle alteration), functional characteristics (TSG/OG) and recurrence of the mutations. Finally, they were assigned to levels.

8

4) Assignment of levels to the variants

Finally, I assigned levels to each variant based on their mutation types

(truncation/subtle alteration), functional characteristics (TSG/OG) and the frequency of recurrence. The levels indicated the significance of the variants as a cancer driver (level-1; strong, level-2; moderate, level-3; modest). TSG variants with truncated mutations were assigned to level-1 (Lv1) TSG based on their importance, irrespective of their recurrence. Because the influence of subtle alterations (missense/in-frame indels) on the tumor-suppressing ability of TSG was uncertain when only in silico data were used, TSG variants with subtle alterations were used to assign levels according to their classes and recurrence frequencies.

Considering the significance of clustering mutation, OG variants with subtle, recurring alterations were assigned to Lv1 OG. Because a loss-of-function is not a general characteristic of OG and the significant function of OG was based only on in silico data, OG variants with truncation mutations were assigned to levels (not level-1) according to their recurrence frequencies

(Figure 1-2B).

5) Identification of candidate copy number variations

CNVs from OncoScanTM were investigated to find focal level CNVs based on their mutation types (amplification or deletion/LOH), CVE score, functional characteristics (TSG/OG) described in the literature, and the recurrence of the alterations assessed using a list of well-known CNVs in various cancer types

9 via a large-scale analysis (27). Then, the candidates were assigned (Figure 1-

3).

Figure 1-3. Pattern-based heuristic annotation for single sample (CNVs). Copy number alteration was analyzed by heuristic way similar to WES data considered mutation type, CVE score, functional characteristics and recurrence of the alterations. Recurrent copy number alterations were determined by comparing the results of copy number variation from various cancer types as described in ref. (27)

Validation of pattern-based heuristic annotation using TCGA AML data

To validate reliability of my pattern-based annotation to find driving genetic alterations, I applied it to published AML data (maf format) uploaded in

TCGA portal (23). I downloaded level-2 WES data (download date: 2014-12-

11) and annotated data using ANNOVAR (28) and analyzed the result using pattern-based heuristic annotation for large-scale samples (Figure 1-4).

Figure 1-4. Pattern-based heuristic annotation for large-scale samples. It was similar with the scheme of pattern-based heuristic annotation for single sample (Figure 1-2), except for the consideration of recurrent variant between samples (R).

10

The analysis of large-scale samples should consider recurrent mutation between samples that may possibly be driver genes to the cancer type. So I added a filter (R≥3) to identify the recurrent mutated genes between samples.

2,585 variants were annotated by ANNOVAR and 758 variants were filtered out due to they were variants in non-coding regions or synonymous mutations or ‘unknown’ annotations. The rest 1,827 variants were annotated and investigated to identify driving genetic alterations (Figure 1-5). Total number of patient was 200, but data from three patients was excluded since the data had no exonic mutations when I annotated it by ANNOVAR. Since cBio portal information involved the result of AML study when I analyzed, I excluded data from AML study in step of evaluating recurrence of the variants.

11

Figure 1-5. Summary of analysis of TCGA AML data using pattern-based heuristic annotation

12

RESULTS

Pattern-based heuristic annotation could detect all candidate driving genes defined in published TCGA AML study and additional candidates

As processing data with ANNOVAR and pattern-based heuristic annotation, I identified all driver genes (n=23) defined in published TCGA AML study as shown in Table 1-1.

Table 1-1. Candidate driving genes defined in published TCGA AML study and assigned levels according to pattern-based heuristic annotation

# Class Gene Level 1 C2S BRINP3 Lv2/3 OG 2 C1T/S CEBPA Lv1/3 TSG 3 C1T/S DNMT3A Lv1/3 OG 4 C1T/S EZH2 Lv3 OG 5 C1T/S FLT3 Lv1/3 OG 6 C2T HNRNPK Lv1 TSG 7 C1S IDH1 Lv1 OG 8 C1S IDH2 Lv1 OG 9 C1T/S KIT Lv1/3 OG 10 C1S KRAS Lv1 OG 11 C1T/S NPM1 Lv1/3 TSG 12 C1S NRAS Lv1 OG 13 C1T/S PHF6 Lv1/3 TSG 14 C1S PTPN11 Lv1/3 OG 15 C1T RAD21 Lv1 TSG 16 C1T/S RUNX1 Lv1/3 TSG 17 C2T/S SMC1A Lv2/3 OG 18 C2T/S SMC3 Lv2/3 OG 19 C1T STAG2 Lv1 TSG 20 C1T/S TET2 Lv1/3 TSG 21 C1T/S TP53 Lv1 TSG 22 C1S U2AF1 Lv1/3 OG 23 C1T/S WT1 Lv1/3 TSG

13

Identified candidate driving genes defined in published AML study was described and assigned levels for each gene were shown. Well-known driver genes such as TP53, KRAS, NRAS, IDH1 and IDH2 were assigned level-1 mutation consistent with their functional contribution to tumorigenesis. Many candidate genes have different variants including CEBPA, DNMT3A and

FAM5C so they were assigned different levels as their mutation type and recurrence of the variants.

14

Also, I identified 50 of additional candidate genetic alterations (Table 1-2).

Table 1-2. Additional candidate driving genes identified by pattern-based heuristic annotation

# Class Gene Level 1 C1S ABL1 Lv3 OG 2 C2S ARHGAP35 Lv3 TSG 3 C1S ARID1A Lv3 TSG 4 C1T ARID2 Lv1 TSG 5 C1T ASXL1 Lv1 TSG 6 C1T BCOR Lv1 TSG 7 C2S CAMTA1 Lv3 TSG 8 C2T CAPRIN2 Lv3 OG 9 C1T CBFB Lv1 TSG 10 C1T/S CBL Lv1/3 OG 11 C1S CIC Lv3 TSG 12 C2S CSF3R Lv3 OG 13 C1T/S CSMD1 Lv1/3 TSG 14 C2T CTCF Lv1 TSG 15 C1S DAXX Lv3 TSG 16 C1S EGFR Lv1/3 OG 17 C2S ELF4 Lv2 OG 18 C2S EPHA2 Lv3 OG 19 C2S FAT1 Lv3 TSG/OG 20 C1S GATA2 Lv3 OG 21 C2S HDAC2 Lv2 OG 22 C1S IKZF1 Lv3 TSG 23 C1S JAK2 Lv1 OG 24 C1S JAK3 Lv1 OG 25 C2T/S KDM3B Lv1/2 TSG 26 C1T/S KDM6A Lv1/3 TSG 27 C1S MAX Lv1 OG 28 C1S MDM2 Lv3 OG 29 C1T/S MED12 Lv1/3 OG 30 C1S MLLT4 Lv3 OG 31 C1S MPL Lv3 OG 32 C2S MYB Lv3 OG 33 C1S MYC Lv1 OG 34 C1T/S NF1 Lv1 TSG 35 C1S NOTCH1 Lv1 TSG 36 C1S NOTCH2 Lv3 TSG 37 C2S PAX3 Lv2 OG 38 C1S PDGFRA Lv3 OG 39 C2S PRDM16 Lv3 OG 40 C1S PTCH1 Lv1 TSG 41 C1S SETBP1 Lv1/3 OG 42 C1T SETD2 Lv1 TSG 43 C1S SF3B1 Lv1 OG 44 C2S SMG1 Lv2/3 TSG 45 C1S SRSF2 Lv3 OG 46 C1T/S SUZ12 Lv1/3 TSG 47 C2S TCL1A Lv3 OG 48 C1T/S THRAP3 Lv1/3 TSG 49 C2S TUSC3 Lv2 TSG 50 C2S USP6 Lv3 OG

15

With pattern-based heuristic annotation, the number of candidate oncogenes and tumor suppressor genes was similar (n=38 vs. 34) (Table 1-3).

Interestingly, oncogenes tend to show more recurrent subtle alteration in same position than tumor suppressor genes (n=9/14 vs. 2/12) as expected and the tendency was prominent in Class I group. Recurrent variants of the oncogenes in subtle alteration group were including DNMT3A (R882H), FLT3 (D835E),

IDH1 (R132H), IDH2 (R172K), KRAS (G12D) and NRAS (G13D, G12D,

Q61K) (full data was not shown due to limit of space, but the result was identified in downloaded TCGA AML study with annotation by ANNOVAR).

Truncation group of oncogenes showed no recurrent variants.

Table 1-3. Mutation pattern of TCGA AML study Number Subtle alteration Truncation of Gene Gene ≥ 2 Recurrent Gene ≥ 2 Recurrent Class I 26 12 9 4 0 OG Class II 12 2 0 1 0 Class I 26 9 2 12 4 TSG Class II 8 3 0 1 0 Class I 0 0 0 0 0 OG/TSG Class II 1 0 0 0 0 Total 73 26 11 18 4

Tumor suppressor genes tend to show more truncated mutation than oncogenes (n=13/18 vs. 5/18) as expected. Interestingly, recurrent truncated mutation was shown only in tumor suppressor genes, not in oncogenes

(n=4/13 vs. 0/5). These patterns of the data proved the rationality of the pattern-based heuristic annotation. Functionality of the genes were validated by literatures (Table 1-4)

16

Table 1-4. References of Class II candidate cancer genes of AML

# Class Gene Level Ref. (PMID) 1 C2S ARHGAP35 Lv3 TSG 12600941 2 C2S BRINP3 Lv2/3 OG 17138656 3 C2S CAMTA1 Lv3 TSG 21857646 4 C2T CAPRIN2 Lv3 OG 24912477 5 C2S CSF3R Lv3 OG 23656643 6 C2T CTCF Lv1 TSG 21465478 7 C2S ELF4 Lv2 OG 19380490 8 C2S EPHA2 Lv3 OG 11280802 22986533 9 C2S FAT1 Lv3 TSG/OG 24590895 10 C2S HDAC2 Lv2 OG 24448241 11 C2T HNRNPK Lv1 TSG 17483488 12 C2T/S KDM3B Lv1/2 TSG 11687974 13 C2S MYB Lv3 OG 21828272 14 C2S PAX3 Lv2 OG 11221862 15 C2S PRDM16 Lv3 OG 25082879 16 C2T/S SMC1A Lv2/3 OG 23638217 17 C2T/S SMC3 Lv2/3 OG 17081288 18 C2S SMG1 Lv2/3 TSG 24107632 19 C2S TCL1A Lv3 OG 12381789 20 C2S TUSC3 Lv2 TSG 24435307 21 C2S USP6 Lv3 OG 22081069

Using pattern-based heuristic annotation for large-scale sample, I identified patient-specific genetic alterations of TCGA AML samples (Table 1-5).

17

Table 1-5. Patient-specific genetic alterations identified by published TCGA AML study (NEJM) and pattern-based heuristic annotation for large-scale samples (Pattern study)

NEJM pt# NEJM & Pattern study Pattern study only only 2802 DNMT3A (1), IDH1 (1), NPM1 (1), PTPN11 (3) -- -- 2803 ------2804 PHF6 (1) -- -- 2805 IDH2 (1), RUNX1 (1) -- -- 2806 -- -- KDM6A (1) 2807 IDH2 (1), RUNX1 (3) -- ASXL1 (1), USP6 (3) 2808 CEBPA (3), NRAS (1) -- CSF3R (3) 2809 DNMT3A (1), NPM1 (1) -- -- 2810 IDH2 (1), NPM1 (1) -- -- 2811 DNMT3A (1), FLT3 (3), NPM1 (1), SMC1A (2) -- PTCH1 (1) 2812 FLT3 (3), NPM1 (1) -- -- 2813 TP53 (1) -- -- 2814 FLT3 (3) -- -- 2816 DNMT3A (1), FLT3 (3), NPM1 (1), NRAS (1) -- -- 2817 EZH2 (3), FAM5C (2) -- CBFB (1) 2818 DNMT3A (3), FLT3 (3), NPM1 (1), RAD21 (1) -- -- 2819 KIT (3) -- -- 2820 TP53 (1) -- SUZ12 (1) 2821 IDH1 (1), IDH2 (1), RUNX1 (1), U2AF1 (3) -- ASXL1 (1) 2822 DNMT3A (1), IDH1 (1), SMC1A (3), TET2 (1) -- MED12 (3) 2823 ------2824 DNMT3A (3), NPM1 (1), SMC1A (2) -- -- 2825 DNMT3A (1), FLT3 (1), NPM1 (1) -- -- 2826 IDH2 (1), KRAS (1) , NPM1 (1) -- -- 2827 -- -- MYC (1) 2828 ------2829 TP53 (1) -- -- 2830 DNMT3A (1), FLT3 (3), TET2 (1) -- -- 2831 DNMT3A (3) -- -- 2832 ------2833 DNMT3A (1) -- CBFB (1) 2834 FLT3 (3) -- -- 2835 NPM1 (1) -- -- 2836 FLT3 (3), NPM1 (1) -- -- 2837 NPM1 (1) -- -- 2838 FAM5C (2/3), NRAS (1) -- -- CEBPA (1), DNMT3A (3), NPM1 (1), NRAS (1), PTPN11 2839 -- -- (3), SMC3 (3), WT1 (1) 2840 FLT3 (3) -- -- 2841 ------2842 ------18

2843 KIT (1/3), U2AF1 (1) -- -- 2844 SMC1A (3), WT1 (1) -- -- CAPRIN2 (3), EPHA2 2845 CEBPA (1/3), TET2 (1) -- (3) 2846 KIT (3), WT1 (1) -- -- 2847 U2AF1 (1) -- -- 2848 NPM1 (1) -- -- 2849 -- -- KDM3B (2), THRAP3 (3) 2850 IDH2 (1), RUNX1 (1), STAG2 (1) -- NOTCH1 (1), SRSF2 (3) 2851 DNMT3A (1), FLT3 (3), SMC3 (1) -- -- 2853 DNMT3A (1), FLT3 (3), NPM1 (1), SMC3 (2) -- -- 2854 ------2855 PHF6 (1), PTPN11 (1) -- -- 2857 TP53 (1) -- -- 2858 SMC1A (3) -- -- 2859 DNMT3A (1), HNRNPK (1), NPM1 (1) -- -- 2860 TP53 (1) -- -- 2861 DNMT3A (1), KRAS (1), NPM1 (1), U2AF1 (1) -- HDAC2 (2), TUSC3 (2) 2862 ------2863 CEBPA (3), DNMT3A (1), IDH1 (1) -- EGFR (3) 2864 DNMT3A (1), IDH2 (1), KRAS (1) -- ASXL1 (1) 2865 EZH2 (3), KRAS (1), RUNX1 (1/3), TET2 (1) -- JAK2 (1) 2866 IDH2 (1) -- -- 2867 DNMT3A (3), IDH1 (1) -- KDM6A (3) 2868 TP53 (1) -- -- 2869 DNMT3A (1), FLT3 (1), NPM1 (1) -- PAX3 (2) 2870 FLT3 (3), NRAS (1) -- -- 2871 FLT3 (3), NPM1 (1), PTPN11 (1), STAG2 (1) -- ELF4 (2), THRAP3 (1) 2872 -- -- TCL1A (3) 2873 DNMT3A (1), TET2 (1) -- -- 2874 CEBPA (3), IDH2 (1), WT1 (1) -- SMG1 (3) 2875 FLT3 (3) -- -- 2876 TET2 (1) -- -- 2877 FLT3 (3), IDH2 (1), NPM1 (1) -- -- 2878 TP53 (1) -- EGFR (1) 2879 FLT3 (1), NPM1 (1), TET2 (1) -- -- 2880 CEBPA (1), FLT3 (1) -- -- 2881 KIT (3) -- -- 2882 U2AF1 (1) -- SMG1 (3) 2883 -- -- KDM6A (1) 2884 DNMT3A (1), IDH1 (1), NPM1 (1), PTPN11 (1) -- -- 2885 PTPN11 (1), TP53 (1) -- -- 2886 RAD21 (1) -- JAK3 (1) 2887 DNMT3A (3), EZH2 (3), IDH1 (1), NRAS (1) -- -- 2888 KIT (3) -- -- 2889 ------19

2890 RUNX1 (1) -- -- 2891 DNMT3A (1/3), IDH2 (1) -- -- 2892 NRAS (1) -- -- 2893 ------2894 ------2895 DNMT3A (1), FLT3 (1), NPM1 (1) -- PRDM16 (3) 2896 DNMT3A (1), NPM1 (1) -- -- 2897 -- -- MLLT4 (3) 2898 DNMT3A (1/3), IDH2 (1) -- -- 2899 RUNX1 (3) -- SETBP1 (1) 2900 CEBPA (1), FLT3 (3), NPM1 (1), SMC1A (2) -- NOTCH2 (3) 2901 IDH1 (1) -- -- 2903 NPM1 (1) -- -- 2904 TP53 (1) -- -- 2905 WT1 (1) -- CSMD1 (1) 2906 FLT3 (3) -- IKZF1 (3) 2907 IDH2 (1), RUNX1 (1) -- ASXL1 (1) 2908 DNMT3A (1), SMC3 (3), TET2 (1), TP53 (1) -- ARIDI1A (3), KDM3B (1) 2909 FLT3 (3) -- -- 2910 FLT3 (3) -- CIC (3) 2911 ------2912 DNMT3A (1), PHF6 (1), RUNX1 (3), SMC3 (3), U2AF1 (1) -- -- 2913 FLT3 (3), NPM1 (1), STAG2 (1), WT1 (1) -- -- 2914 PTPN11 (1) -- CBL (1) 2915 FLT3 (3), NPM1 (3) -- -- 2916 DNMT3A (1) -- CSMD1 (3), MYB (3) 2917 KRAS (1) -- MED12 (1), SETBP1 (3) 2918 FLT3 (3) -- -- 2919 DNMT3A (3), IDH1 (1), NPM1 (1), WT1 (3) -- -- 2920 -- -- NF1 (1) 2921 FLT3 (3) -- -- 2922 FLT3 (3) -- -- 2923 NRAS (1), TET2 (1) -- -- 2924 FLT3 (3), NPM1 (1) -- -- 2925 DNMT3A (1), FLT3 (3), NPM1 (1) -- CSMD1 (3), GATA2 (3) 2926 FLT3 (3), IDH1 (1) -- -- 2927 RUNX1 (1/3) -- ASXL1 (1) 2928 DNMT3A (3), FLT3 (1), IDH1 (1), TET2 (1/3) -- -- 2929 KRAS (1) -- SF3B1 (1) 2930 FLT3 (3), WT1 (1) -- -- 2931 DNMT3A (1), FLT3 (3), NPM1 (1) -- -- 2932 NPM1 (1), NRAS (1) -- SETD2 (1) 2933 RUNX1 (3) -- -- 2934 DNMT3A (3), FLT3 (3), IDH2 (1) -- -- 2935 TP53 (1) -- --

20

2936 IDH2 (1), RUNX1 (1) -- -- 2937 HNRNPK (1), KIT (3), TET2 (1) -- -- 2938 DNMT3A (3), TP53 (1) -- -- 2939 KIT (3) -- -- 2940 CEBPA (1/3) -- -- 2941 TP53 (1) -- -- 2942 FLT3 (3) -- -- 2943 TP53 (1) -- -- 2945 DNMT3A (1), FLT3 (3), IDH1 (1), KIT (3), NPM1 (1) -- -- 2946 ------2947 DNMT3A (1), FLT3 (3), NPM1 (1) -- -- 2948 IDH2 (1) -- -- 2949 DNMT3A (1), IDH1 (1), RUNX1 (1) -- SUZ12 (1) 2950 SMC3 (1) -- ARHGAP35 (3) 2952 CEBPA (1), NRAS (1), TP53 (1) -- FAT1 (3), MDM2 (3) 2954 ------2955 CEBPA (1/3), DNMT3A (1) -- GATA2 (3) 2956 ------2957 FLT3 (3) -- -- 2959 IDH2 (1), PHF6 (1), RUNX1 (1) -- -- 2963 FLT3 (3), NPM1 (1), PTPN11 (1), SMC1A (3) -- -- 2964 STAG2 (1), TET2 (1/3) -- CAMTA1 (3) 2965 DNMT3A (1), FAM5C (3), FLT3 (3), NPM1 (1), TET2 (1) -- -- 2966 DNMT3A (3), IDH2 (1), KRAS (1) -- SMG1 (2) 2967 DNMT3A (3), NPM1 (1), NRAS (1), RAD21 (1) -- -- 2968 DNMT3A (1/3), NRAS (1), U2AF1 (1) -- BCOR (1) 2969 DNMT3A (1), FLT3 (3), IDH1 (1), NPM1 (1) -- -- 2970 FLT3 (3), RUNX1 (1), WT1 (1) -- BCOR (1) 2971 TET2 (1/3) -- -- 2972 NPM1 (1), PTPN11 (1), STAG2 (1) -- -- 2973 IDH2 (1), NPM1 (1) -- -- 2974 DNMT3A (1), FLT3 (3), NPM1 (1) -- -- 2975 DNMT3A (1), RAD21 (1) -- -- 2976 FAM5C (3), FLT3 (3), NPM1 (1), PHF6 (3), WT1 (1) -- -- 2977 IDH1 (1) -- -- 2978 NRAS (1), RUNX1 (1), STAG2 (1), TET2 (1) -- -- 2979 CEBPA (1/3) -- -- 2980 FLT3 (3) -- -- 2981 DNMT3A (1), FLT3 (3), NPM1 (1) -- -- 2982 ------2983 RUNX1 (3) -- -- 2984 IDH1 (1), NPM1 (1), NRAS (1) -- -- 2985 NRAS (1) -- -- 2986 FLT3 (1), NPM1 (1), RAD21 (1) -- -- 2987 DNMT3A (1), KRAS (1), NPM1 (1) -- PDGFRA (3)

21

2988 DNMT3A (3), NPM1 (1) -- -- 2989 NPM1 (1), WT1 (1) -- CBL (3) 2990 IDH1 (1), NPM1 (1) -- -- 2991 -- -- ARID2 (1), MAX (1) 2992 IDH1 (1), NPM1 (1) -- -- 2993 DNMT3A (3), FLT3 (3), NPM1 (1), SMC3 (3) -- -- 2994 FLT3 (3) -- -- 2995 ------2996 TET2 (1/3), U2AF1 (1) -- MPL (3) 2997 ------2998 FAM5C (2), FLT3 (3) -- -- 2999 -- -- ABL1 (3) 3000 CEBPA (1/3) -- -- 3001 ------3002 IDH2 (1) -- -- 3005 ------3006 FLT3 (1), TET2 (1) -- -- 3007 FLT3 (3) -- -- 3008 CEBPA (3) -- -- CTCF (1), DAXX (3), 3009 PHF6 (3), RUNX1 (1), WT1 (1) -- NF1 (1), SUZ12 (3) 3011 IDH1 (1), NPM1 (1) -- -- 3012 ------Note TSG: Gene name OG: Gene name TSG/OG: Gene name

Among data, since 30 patient samples showed no candidate genes I performed

the pattern-based heuristic annotation for single sample to identify candidate

genetic alterations more deeply (Figure 1-2). Interestingly, 12 patient samples

showed candidate genes such as NF1, MYC and ARID2 with level-1 (Table 1-

6). Also, one of the 27 driver genes only defined by my approach, CSF3R

(T618I), was proven to have oncogenic property through transformation assay

in hematological cancer in a recent study (29). Those results assured the

reliability of the approach and pointed out the importance of individual

analysis of cancer genome.

22

Table 1-6. Genetic alterations only defined by pattern-based heuristic annotation for single sample

Pt # NEJM & Pattern study NEJM only Pattern study only 2803 -- -- POU5F1 (3) 2806 -- -- KDM2B (3), KDM6A (1) 2823 ------2827 -- -- MYC (1), SELENBP1 (3) 2828 ------2832 -- -- VASH1 (3) 2841 -- -- DAG1 (3) 2842 ------2849 -- -- KDM3B (2), THRAP3 (3) 2854 ------2862 ------2872 -- -- TCL1A (3) 2883 -- -- KDM6A (1) 2889 ------2893 ------2894 ------2897 -- -- MLLT4 (3) 2911 ------2920 -- -- NF1 (1) 2946 ------2954 ------2956 ------2982 ------2991 -- -- ARID2 (1), MAX (1) 2995 ------2997 ------2999 -- -- ABL1 (3) 3001 ------3005 ------3012 ------Note TSG: Gene name OG: Gene name TSG/OG: Gene name

23

DISCUSSION

Individualized genomic analysis: another approach to identify driving genetic alterations

This study showed importance of the individual genomic analysis of cancer and reliability of the pattern-based heuristic annotation to identify driving genetic alterations which was validated by TCGA AML data.

After facing era of NGS, most cancer genomes were analyzed to investigate the mutations driving tumors by large-scale sampling and well-constructed methodologies based on the statistics and mathematics with multi-dimensional data (7, 12). Through the approach, mutational signatures specific to many cancer types were identified, however, it is not sufficient to detect driving genetic alterations in individual cancer samples since the large scale-analyses does not consider inter-patient heterogeneity (15). Therefore, as another approach, individual analysis of cancer genome was needed to identify candidate driving genetic alterations.

The importance and reliability of individual approach was validated by applying pattern-based heuristic annotation to TCGA AML study. Thirty among 200 AML patients who were included within TCGA AML study were inexplicable with the approach of large-scale analysis (23). My approach, pattern-based heuristic annotation, could identify candidate driving genetic alterations such as NF1, ARID2, MYC and MAX as level-1 candidates in 12 among the thirty (Table 1-6). Also, CSF3R mutation (T6181) was detected in a sample (patient #2808) through my approach that was clarified as an 24 oncogenic property in hematological tumors indicating the possibility of contribution of the mutation to tumorigenesis although other mutations

(CEBPA and NRAS) were also detected in the sample (Table 1-5) (29). This finding also showed the possible oncogenic property of candidate genes even though they were level-3.

This study considered mutation type, recurrence of the variant and functionality of the mutated genes as well as published knowledge and open access database, different from other individual analyses (30, 31). My approach, however, may include false-positive candidates since some literature information is not sufficiently validated and classification of oncogene and tumor suppressor gene may be varied to cell context or cancer type.

25

CHAPTER 2

Application of the individual analysis to adolescence and young adult (AYA) cancer genomes

26

INTRODUCTION

Cancer driving genetic alterations were identified in various cancer types via large-scale analyses led by The Cancer Genome Atlas (TCGA) and

International Cancer Genome Consortium (ICGC) (5). Although large-scale analyses unveiled frequently altered driving mutations in many cancer types, such as BRAF (V600E) in melanoma and colorectal cancer, finding less frequently altered mutations is a challenge using large-scale analyses, especially in uncommon cancer types (5, 12, 15).

Adolescent and young adult (AYA) cancer is a rare type of malignant disease that arises in patients aged 15 to 39 years and is characterized by biological features, therapeutic outcomes, and survival rates that are distinct from those observed in other age groups. Although determining the genomic profiles of

AYA cancer is important to investigate the causes of these distinct characteristics, large-scale genomic studies or molecular data for AYA cancer are not available due to the rarity of the disease and the difficulty of collecting tumor samples (32, 33).

In this study, I analyzed seven different AYA cancers from patients with metastatic tumors using three different genomics platforms (WES, whole- transcriptome sequencing (WTS), and OncoScanTM) with my colleagues. We identified single nucleotide variations (SNVs) and insertion and deletions

(indels) by WES and detected fusions by using WTS. For copy number variations (CNVs), we used OncoScanTM that is the genomics platform for analysis of copy number variations which had high performance with samples 27 from FFPE, especially (34). We processed the WES data with well-known bioinformatics tools (bwa, Picard, GATK, MuTect, and Somatic Indel

Detector), as other studies described and processed WTS data with fusion detection tools (17, 30, 35). Then, candidate genes were identified and potential drugs were suggested that are specific to the genetic alterations of each patient. Finally, we compared candidate genes for AYA cancers with the same types of cancers from all age groups using published data.

28

MATERIALS AND METHODS

Study design and sample information

This study was reviewed and approved by the Institutional Review Board

(IRB) of Seoul National University Hospital (1206-086-414). Samples from seven different tumors, prostate cancer, olfactory neuroblastoma, head and neck squamous cell carcinoma (HNSCC), urachal carcinoma, germ cell tumor, lung cancer, and liposarcoma, were prospectively obtained in three different forms (fresh-frozen tissue, formalin-fixed paraffin-embedded (FFPE), and pleurisy). The samples were analyzed using three different genomics platforms (WES, WTS, and OncoScanTM) as the tumor sources permitted. We first intended to analyze samples from ten patients, but three samples were excluded because the amount of provided tumor sample was insufficient

(AYA03) or sufficient DNA/RNA for a genome-scale analysis was not obtained (AYA05, and 08). For sample AYA04 (HNSCC), the HPV infection status was identified by IHC staining (data not shown).

Tumor samples and extraction of genomic DNA and RNA

This study used different types of tumor tissue: fresh-frozen tissue, formalin- fixed paraffin-embedded (FFPE) tissue and pleurisy. Fresh-frozen tissue samples were resected from gun-biopsy followed by storage in liquid nitrogen, and pleurisy sample was obtained by cell pellet using centrifugation. Genomic

DNA of each tumor samples was extracted by QIAamp DNA Micro Kit

(Qiagen, Valencia, CA, USA) for fresh-frozen tissue samples and Maxwell®

16 FFPE Plus LEV DNA purification kit (Promega, Madison, WI, USA) for 29

FFPE samples, then they were quantified using Qubit® 2·0 Fluorometer (Life

Technologies, Carlsbad, CA, USA). Genomic DNA from pleurisy sample was extracted by GeneAll ExgeneTM Cell SV kit (GeneAll, Seoul, Korea) and quantified by Eon™ Microplate Spectrophotometer (BioTek Instruments,

Winooski, VT, USA). Matched normal genomic DNA was obtained from peripheral blood of each patient. Normal genomic DNA was extracted using

Maxwell® 16 LEV Blood DNA Kit (Promega) and quantified by Eon™

Microplate Spectrophotometer. RNA of tumor samples was extracted by

PureLink RNA mini kit (Ambion) for fresh-frozen tissue samples and pleurisy sample. Extracted samples were quantified by Qubit® 2·0

Fluorometer for fresh-frozen tissue and Eon™ Microplate Spectrophotometer

(BioTek Instruments) for pleurisy.

Whole exome sequencing (WES)

A minimum of 3 μg of genomic DNA was randomly fragmented by Covaris, and the sizes of the library fragments were mainly distributed between 250 bp and 300 bp. adapters were then ligated to both ends of the fragments.

Extracted DNA was amplified by ligation-mediated PCR (LM-PCR) and then purified and hybridized to the SureSelect XT Human All Exon v4+UTR 71

Mb (Agilent Technologies, Santa Clara, CA, USA) for enrichment according to the manufacturer's recommended protocol. After loading each captured library on the Hiseq2000 platform (Illumina, San Diego, CA, USA), we performed high-throughput sequencing for each captured library. Raw image files were processed by Illumina CASAVA v1.8.2 for base-calling with

30 default parameters, and the sequences from each individual were generated as

101-bp pair-end reads.

Processing WES data to analyze SNVs and indels

WES data were processed using a series of steps. We aligned the sequenced files (Fastq file) to the reference genome (human reference genome g1k v37) using the Burrows-Wheeler Aligner (BWA v0.7.5a)(36) and then sorted the output and removed PCR duplicates using PICARD v1.95(37). Using the typical GATK workflow (The Genome Analysis Toolkit v2.6-5), we processed the data for local indel realignment and base quality recalibration

(38). For variant calling, we used MuTect v1.1.6 for single nucleotide variants

(SNVs) and Somatic Indel Detector (from GATK v2.2-8) for indels (39).

Whereas we called the SNVs with the default setting value, we altered the tumor indel fraction from 0.3 to 0.05 (T_INDEL_F <0.05) for indel calling after considering false-negatives. The called variants interpreted as somatic mutations were tagged with “KEEP” or “SOMATIC” with MuTect and

Somatic Indel Detector, respectively, and used for further study. To avoid false-positive indel variants, we filtered out variants with tumor alterative reads less than 6. All somatic variants were annotated by ANNOVAR (28).

The variants that passed through the steps were called ‘processed WES data’

(Figure 2-1).

31

Figure 2-1. WES pipeline for this study. Whole exome sequencing data were processed by this pipeline comprised of well-known bioinformatics tools.

Whole transcriptome sequencing (WTS)

Libraries for RNA sequencing were generated using the TruSeq RNA Sample

Preparation kit (Illumina) following the manufacturer’s protocol. Briefly, mRNA was purified using poly-T oligo-attached magnetic beads and subsequently fragmented. The cleaved RNA fragments were reverse transcribed into first strand cDNA using random primers. Next, second strand cDNA was synthesized using DNA Polymerase I and RNase H. The second strand cDNA fragments were subject to an end repair, the addition of a single

‘A’ base, and ligation of the adapters. The products were then purified and enriched with PCR to create the final cDNA library. Sequencing (2 × 101 bp) was carried out on the HiSeq 2000 platform (Illumina).

32

Analysis of copy number variations (CNVs) by OncoScanTM

We used the 330-k OncoScanTM FFPE platform (Affymetrix, Santa Clara, CA,

USA) to identify candidate CNVs (amplification/deletion and loss-of- heterozygosity (LOH)). AYA02 was excluded because the amount of DNA was insufficient for OncoScanTM. A minimum of 80 ng of DNA from each sample was used for the OncoScanTM platform. The Nexus Express

(Affymetrix) software was used to analyze the data and find CNVs. We filtered out CNVs with a CN ≤ 2.5, which were considered insignificant amplification, and analyzed -level CNVs and focal level CNVs.

Analysis of CNVs by VarScan2 for AYA02

To analyze CNVs in AYA02, we used VarScan2 as an alternative method to

OncoScanTM. After processing data up to the ‘Realignment/Recalibration’ step in WES processing (Figure 2-1), we processed data based on the manufacturer’s recommendations. We generated mpileup data from recalibrated BAM files of both tumor and normal using SAMtools and used the ‘Copy caller’ module of VarScan2 to generate the relative copy number change (C), which was determined as follows: C = log2 ((DT/DN)*(IN/IT)), where ‘D’ stands for the average depth, ‘I’ for the number of uniquely mapped bases, ‘N’ for normal, and ‘T’ for tumor (40, 41). After filtering out mapping quality values <15, we adjusted the relative copy number using the re-centering option in ‘Copy caller’ and segmented copy number regions based on the circular binary segmentation algorithm. After merging, the results were represented by IGV (Integrative Genomics Viewer) (42). Because 33 the VarScan2 results covered only exon regions, not all genome regions, we analyzed only chromosome-level CNVs that were similar to those obtained with OncoScanTM and did not analyze focal-level CNVs. To create graphs of relative copy number changes, we used data from re-centered relative copy number changes displayed on a log2 scale.

Analysis of mutation frequency and mutation spectrum

The mutation frequency was analyzed by counting the number of variants annotated by ANNOVAR from WES data as nonsynonymous SNVs, synonymous SNVs, nonsense mutations, stop-loss mutations, splicing mutations, frameshift insertions/deletions (indels), in-frame indels, and noncoding RNA in exonic regions. These mutations had previously been described in published data from 12 major cancer studies (10). To analyze the mutation spectrum, we used SNVs processed with MuTect in all sequenced regions not limited to coding regions.

Pathway-drug Analysis

After assigning the levels to the variants by pattern-based heuristic annotation, we investigated the biologic pathways of variants using DAVID or the literature (43). To analyze the druggability of the variants, we concentrated mainly on level-1 (strong) variants using DGIdb (44).

Fusion analysis

WTS data from four samples (AYA01, 02, 09, and 10) was analyzed to identify cancer driving fusions using three different fusion tools, FusionMap, deFuse and ChimeraScan (45-47). From the results, candidate cancer-driving 34 fusions were selected by using a fusion gene list archived in COSMIC

(download date: 2015-03-03).

Concurrence of RasGAPs

Using cBio portal, co-occurrence of RasGAPs (RASA1, RASA2, RASA3,

RASA4, RASAL1, RASAL2, RASAL2, NF1, SynGAP1 and DAP2IP) was analyzed in large study of various cancer types (N > 90 cases, analysis date:

2015-3-11) (48). All data were composed of exome data matched tumor and normal. After entering RasGAPs gene sets, I obtained the result of concurrent tendency across cancer types.

Validation of RASA2 and NF1 in AYA01

Candidate genetic alterations in DNA of AYA01, RASA2 and NF1, were validated by deep sequencing (using HiSeq2000) or Sanger sequencing.

Validation of EML4-ALK in AYA09

We obtained tumor cells from pleural effusion of AYA09 (lung cancer) through centrifugation and extracted RNA as described above. Then, we performed RT-PCR with SuperScript III First-strand synthesis system for RT-

PCR (Invitrogen, San Diego, CA, USA) and pairs of primers (forward: 5’-

TAC CAG TGC TGT CTC AAT TGC AGG-3’, reverse: 5’-ATC CAG TTC

GTC CTG TTC AGA GC-3’) under condition as follows: 95°C for 10min, followed by 35 cycling at 95°C for 30 sec, 62°C for 30 sec, and 72°C for 30 sec. As control of EML4-ALK fusion, we used H2228 cell line (American

Type Culture Collection, Rockville, MD, USA) with variant 3a/b of EML4-

ALK. The product of PCR was sequenced by Sanger sequencing. 35

RESULTS

AYA cancer samples, platforms and generation of data from WES

Samples analyzed in this study were collected from AYA patients (median

age=32) who had metastatic tumors. Seven different tumor samples were

obtained in three different forms (fresh-frozen tissues, FFPE and pleurisy) and

analyzed using three different genomics platforms (WES, WTS and

OncoScanTM) (Table 2-1).

Table 2-1. Sample information of AYAs cancer patients

No. Ag Se Tumor type Tissue Previous Platform AYA e x type treatment† * WES WTS Onco- Scan #01 30 M Prostate cancer Fresh- Docetaxel+P v v v frozen d #02 30 M Olfactory Fresh- ICE v v - neuroblastoma frozen Op PORT

#04 33 M Head and neck Fresh- Op v - v squamous cell frozen CCRT carcinoma DP Cetuximab FP Op #06 32 M Urachalcarcinom FFPE Op v - v a #07 21 M Germ cell tumor Fresh- Op v - v frozen BEP IE #09 34 M Lung cancer pleurisy - - v -

#10 33 M Liposarcoma Fresh- Op v v v frozen * #03: exclusion, because of no tumor samples provided; #05, 08: exclusion, because of insufficient samples to sequencing †Pd, prednisolone; ICE, ifosfamide+carboplatin+etoposide; Op, operation; PORT, postoperative radiotherapy; CCRT, concurrent chemoradiotherapy; DP, docetaxel+cisplatin; FP, 5-fluorouracil+cisplatin; BEP, bleomycin+etoposide+cisplatin; IE, ifosfamide+etoposide

36

We processed WES data using well-known bioinformatics tools described in

Figure 2-1 following previously published studies (17, 35). From the WES data of six tumors and matched normal blood, we generated total of 95 gigabases (Gb, range 12.5-20.0/sample) and 106 Gb (range 12.7-27.0/sample) of mapped sequences, respectively. The mean target coverage was 135X

(range, 101X-190X) and 150X (range, 115X-205X) for tumor samples and normal blood, respectively. The coverage of mean target bases exceeded 30X for 89.8% and 92.5% of the tumor and normal blood samples, respectively

(Table 2-2).

Table 2-2. Sequencing information for WES data

Sample AYA01_Normal AYA01_Tumor AYA02_Normal AYA02_Tumor Target Regions 199,259 199,259 199,259 199,259 Bases in Target Region 70,369,848 70,369,848 70,369,848 70,369,848 Bases Sequenced 14,340,725,784 13,394,620,000 17,853,383,170 15,593,444,136 Mean Target Coverage 145.01 129.82 118.97 103.26 % of targets at coverage 30X 92.61 91.91 90.29 87.03

Sample AYA04_Normal AYA04_Tumor AYA06_Normal AYA06_Tumor Target Regions 199,259 199,259 199,259 199,259 Bases in Target Region 70,369,848 70,369,848 70,369,848 70,369,848 Bases Sequenced 26,970,195,236 19,997,019,694 20,503,746,592 19,863,262,162 Mean Target Coverage 205.08 190.36 194.76 164.54 % of targets at coverage 30X 96.04 91.00 95.55 93.93

Sample AYA07_Normal AYA07_Tumor AYA10_Normal AYA10_Tumor Target Regions 199,259 199,259 199,259 199,259 Bases in Target Region 70,369,848 70,369,848 70,369,848 70,369,848 Bases Sequenced 13,348,069,706 13,796,972,488 12,669,135,182 12,476,740,282 Mean Target Coverage 119.20 121.96 115.38 101.3 % of targets at coverage 30X 89.44 88.84 90.8 86.03

AYAs_Normal AYAs_Tumor Bases Sequenced 105,685,255,670 95,122,058,762 Mean Target Coverage 149.73 135.21 % of targets at coverage 30X (mean) 92.46 89.79

37

Mutation frequency and mutation spectrum

We analyzed the mutation frequency and mutation spectrum of six samples from processed WES data (Figure 2-2, Table 2-3 and Figure 2-3).

Figure 2-2. Mutation frequency of AYA cancers. Somatic mutation frequencies of pediatric, AYA and adult cancers are shown. Mutation frequencies of AYA cancers were assessed using somatic mutations annotated as nonsynonymous SNVs, synonymous SNVs, nonsense mutations, stop-loss mutations, splicing mutations, frameshift indels, in-frame indels, and noncoding RNA. Mutation frequencies for other cancers were derived from published data from the same regions (10).

38

Table 2-3. Mutation frequency of AYAs and major 12 cancer types

Class Tumor Type Median Mutation Freq Q1 Mutation Q3 Mutation (/Mb) Freq Freq Pediatric AML 0.29 0.16 0.41 AYA AYA01 0.31 - - AYA AYA10 0.32 - - AYA AYA11 0.37 - - AYA AYA06 0.45 - - AYA AYA02 0.46 - - AYA AYA14 0.62 - - AYA AYA16 0.93 - - Adult solid BRCA 1.02 0.69 1.71 AYA AYA04 1.37 - - Adult solid OV 1.49 1.03 2.01 Adult solid KIRC 1.73 1.29 2.25 Adult solid UCEC 1.89 1.33 9.66 Adult solid GBM 1.89 1.53 2.32 Adult solid COAD/READ 3.09 2.19 4.74 Adult solid HNSC 3.80 2.41 5.60 Adult solid BLCA 6.01 3.61 10.03 Adult solid LUAD 6.69 3.23 12.54 AYA AYA07 6.70 - - Adult solid LUSC 8.16 6.05 11.03

Figure 2-3. Mutation spectrum of AYA cancers. The mutation spectrum of AYA cancers (transition and transversion frequency) was assessed using SNVs processed with MuTect in whole-exome regions.

39

Except for AYA07, which contained a hypermutation, the coding regions

(exon and splicing regions) of a total of 276 somatic mutations (median, 40; range, 26-133) were detected (Table 2-4).

Table 2-4. Mutation frequency of WES data for AYA cancers

AYA01 AYA02 AYA04 AYA06 AYA07 AYA10 Somatic mutation/Mb 0.37 0.69 1.87 0.56 9.37 0.39 Number of somatic mutations 26 49 133 40 665 28 Synonymous SNV 1 8 28 3 157 5 Nonsynonymous SNV 10 13 83 15 441 12 Nonsense 3 1 6 1 5 0 Stoploss 0 0 0 0 2 0 Splicing mutation 2 2 1 2 14 1 ncRNA mutation 3 8 8 5 34 0 Indels 7 17 7 14 12 10 DNA repair genes TDG TP53 TP53 MSH3 DDB1/ns - /Mutation context* /ns /ns /sp /ns LIG3/ns MNAT1/ns POLE/ns POLG/ns POLQ/ns

Allele freq. of 0.05 0.58 0.28 0.1 0.06 (DDB1) - DNA repair genes 0.06 (LIG3) 0.08 (MNAT1) 0.06 (POLE) 0.06 (POLG) 0.27 (POLQ) *ns- nonsynonymous SNV, sp: splicing alteration

Compared with the mutation frequency of 12 major cancer types obtained from published data, the somatic mutation frequency of most samples was low

(median, 0.56/Mb; range, 0.37-1.87), except for AYA07 (Figure 2-2) (10).

This finding is consistent with data showing that somatic mutations are accumulated with age (1). Although the mutation frequency of AYA07 was high (9.37/Mb), the overall mutation frequency of AYA cancers is low: a recent large-scale study of testicular germ cell tumors (TGCTs), the same

40 tumor type as that of AYA07, demonstrated a low mutation frequency (mean

0.5/Mb) (49). Additionally, three AYA cancers with higher mutation frequencies (AYA02, 04, and 07) harbored mutations in TP53 or in several

DNA repair genes compared with other samples (mean, 3.98/Mb vs. 0.44/Mb, respectively), as shown in large studies (10).

Except for AYA07, the mutation spectrum of AYA cancers showed prominent C>T transitions, as has been demonstrated in many solid cancers

(Figure 2-3) (9). Interestingly, AYA07 showed a high proportion of C>A transversions (79.7%). This result may be related to an over-representation of

C>A transversions in the TCGT study (49) or multiple cycles of chemotherapy for the AYA07 patient, because increased C>A transversions were observed in all samples obtained from 8 relapsed AML patients after chemotherapy (50).

Individual AYA cancer analysis: patient-specific genetic alterations

After processing the WES data, we annotated variants using ANNOVAR as described in Figure 2-1. All annotated variants are described in Table 2-5. We then selected driving genetic alterations using my pattern-based heuristic annotation, because it is limited to analyzing rare types of cancers using the same statistical methods to select driving genetic alterations used in large- scale studies (Figure 1-1 – 1-3) (12).

41

Table 2-5. Processed WES data

(A) AYA01

dbSNP138 CVE list T_ T_ Mutation Chr Start End Ref Alt Altered gene Non- Ref Alt All C V E context** Flagged* 11005021 1 110050210 G GC AMIGO1 106 6 - - - - - FS ins 0 13112790 8 131127909 C A ASAP1 85 7 - - - - - nonsense 9 17 29162451 29162451 G A ATAD5 71 10 - - - - - nsSNV 14 96730408 96730408 C T BDKRB1 120 24 - - - - - nsSNV 13576280 9 135762805 T G C9orf9 12 4 - - - - - nsSNV 5 rs3758 rs375808 16 69155012 69155012 G A CHTF8 122 8 - - - nsSNV 08540 540 8 37888131 37888131 G GC EIF4EBP1 45 6 - - - - - FS ins 22 29630189 29630189 G T EMID1 31 3 - - - - - splicing 11 68700794 68700794 C A IGHMBP2 23 5 - - - - - nsSNV 19 46387815 46387815 C T IRF2BP1 119 17 - - - - - sSNV 1 15386680 15386680 G GCC KAZN 248 15 - - - - - FS ins 11 65266568 65266568 T C MALAT1 41 7 - - - - - ncRNA 22 51048683 51048683 G GC MAPK8IP2 84 6 - - - - O UN 20 29637976 29637976 G C MLLT10P1 14 4 - - - - - ncRNA 10064573 7 100645731 G A MUC12 44 3 - - - - - nsSNV 1 13 77642780 77642780 G A MYCBP2 56 5 - - - - - nonsense 20399140 2 203991403 C A NBEAL1 158 12 - - - - - nsSNV 3 17 29588833 29588834 GC G NF1 216 45 - - O O O FS del 24815449 1 248154493 T C OR2L1P 8 3 - - - - - ncRNA 3 17 7411682 7411682 C T POLR2A 142 11 - - - - - nsSNV 15086910 X 150869103 C A PRRG3 32 11 - - - - - nsSNV 3 14132754 3 141327541 T A RASA2 31 3 - - - - - splicing 1 14127881 3 141278819 A AT RASA2 108 11 - - - - - FS ins 9 21 33043756 33043756 C A SCAF4 97 5 - - - - - nonsense 11 64323714 64323714 G GC SLC22A11 167 9 - - - - - FS ins GGAG GCGG CGCC 22 26879946 26879967 G SRRD 77 25 - - - - - IF del CCGG GGGA GA 10437670 rs6193 rs619376 12 104376700 A C TDG 78 4 - - - nsSNV 0 7630 30

42

(B) AYA02

dbSNP138 CVE list Mutati Altered T_ T_ on Chr Start End Ref Alt Non- gene Ref Alt All C V E context Flagged* ** 7 152513619 152513619 G A ACTR3B 71 10 - - - - - sSNV 7 134132780 134132783 ATCC A AKR1B1 35 37 - - - - - IF del 17 79954560 79954560 G GC ASPSCR1 114 6 - - O - - FS ins 3 10401667 10401667 G A ATP2B2 71 4 - - - - - sSNV 16 84495698 84495698 G A ATP2C2 38 3 - - - - - sSNV 1 28564378 28564378 G GA ATPIF1 30 72 - - - - - FS ins 12 112036746 112036749 GGGC G ATXN2 4 7 - - - - - IF del ACTG 13 70713509 70713509 A ATXN8OS 4 9 - - - - - ncRNA CTG 9 121929604 121929604 G GTGT BRINP1 44 61 - - - - - IF ins 7 2963980 2963980 G GT CARD11 23 18 - - O O - FS ins 17 78165189 78165189 G A CARD14 41 6 - - - - - nsSNV 16 1484849 1484849 G A CCDC154 22 20 - - - - - nsSNV CCAGCG CCTGCT 22 37964408 37964429 C CDC42EP1 16 14 - - - - - IF del GCAAAC CCCT nonsen 1 51436116 51436116 C T CDKN2C 10 20 - - O O - se 14 105352853 105352853 C CG CEP170B 51 6 - - - - - FS ins 11 46342080 46342081 GT G CREB3L1 111 6 - - O - - splicing 11 46321710 46321710 C T CREB3L1 47 6 - - O - - sSNV rs38 rs385614 1 225528183 225528183 C A DNAH14 132 6 5614 - - - nsSNV 5 5 16 3706179 3706179 G A DNASE1 155 6 - - - - - nsSNV TAGCAG 4 88537072 88537081 T DSPP 4 10 - - - - - IF del TGAC rs14 rs145673 3 58084486 58084486 A G FLNB 9 17 5673 - - - nsSNV 747 747 HMMR- 5 162909996 162909996 T C 103 31 - - - - - ncRNA AS1 12 56428760 56428760 C CG IKZF4 93 6 - - - - - FS ins 5 128440934 128440934 A G ISOC1 51 35 - - - - - sSNV 16 919000 919000 G A LMF1 36 25 - - - - - sSNV LOC10272 10 29776325 29776327 TAA T 10 6 - - - - - ncRNA 4316 2 66667039 66667039 C CG MEIS1 84 7 - - - - - FS ins 17 4796837 4796837 G A MINK1 26 47 - - - - - nsSNV MIRLET7B 22 46507545 46507547 AAG A 86 8 - - - - - ncRNA HG 12 40918902 40918902 T C MUC19 42 3 - - - - - UN 5 16694605 16694605 G GC MYO10 158 10 - - - - - FS ins NCBP2- 3 196669594 196669594 C T 9 19 - - - - - ncRNA AS2 15 24921468 24921468 T TG NPAP1 83 8 - - - - - FS ins 9 131222915 131222915 A AC ODF2 80 9 - - - - - FS ins 16 67695546 67695546 G GC PARD6A 184 14 - - - - - FS ins 11 3845151 3845151 C CG PGAP2 120 12 - - - - - FS ins 17 6374655 6374655 T A PITPNM3 3 12 - - - - - nsSNV 7 75157156 75157156 T TC PMS2P3 61 7 - - - - - ncRNA 22 46631080 46631080 A C PPARA 13 4 - - - - - nsSNV 9 127920662 127920662 C T PPP6C 57 10 - - - - O splicing 14 24605476 24605476 C T PSME1 34 3 - - - - - nsSNV 7 127729577 127729577 C T SND1 41 19 - - - - - nsSNV 15 100889519 100889519 C T SPATA41 188 123 - - - - - ncRNA 4 124323691 124323691 G T SPRY1 21 42 - - - - - nsSNV 19 56029967 56029967 C CT SSC5D 174 15 - - - - - FS ins 12 65269550 65269550 T C TBC1D30 24 54 - - - - - sSNV 3 30664743 30664743 G T TGFBR2 109 10 - - - - O nsSNV 17 7577572 7577572 T C TP53 6 8 - - O O O nsSNV 19 21926904 21926904 G T ZNF100 68 4 - - - - - sSNV 10 43014402 43014404 AAG A ZNF37BP 43 6 - - - - - ncRNA 43

(C) AYA04

dbSNP138 CVE list Altered T_ T_ Mutation Chr Start End Ref Alt Non- gene Ref Alt All C V E context** Flagged* 10 114181766 114181766 C T ACSL5 135 29 - - - - - sSNV 10 127755315 127755315 C A ADAM12 177 13 - - - - - nsSNV 5 7826867 7826867 C T ADCY2 183 48 - - - - - sSNV rs148315 rs148315 1 980882 980882 G A AGRN 157 19 - - - nsSNV 419 419 2 131520783 131520783 G A AMER3 198 49 - - - - - nsSNV 4 114275811 114275811 C T ANK2 63 12 - - - - - nsSNV 4 125631581 125631581 A T ANKRD50 249 62 - - - - - nsSNV 2 219114594 219114594 G C ARPC2 232 51 - - - - - sSNV 7 150721080 150721080 G A ATG9B 100 6 - - - - - UN 15 50190393 50190393 C T ATP8B4 100 6 - - - - - nsSNV 1 193149870 193149870 G A B3GALT2 216 22 - - - - - nsSNV 9 135458462 135458462 C A BARHL1 119 7 - - - - - nsSNV 17 65924573 65924573 G T BPTF 234 11 - - - - - nsSNV 17 65924590 65924590 A G BPTF 233 12 - - - - - sSNV 1 11008227 11008227 C T C1orf127 254 65 - - - - - sSNV 20 1187715 1187715 G T C20orf202 129 7 - - - - - nsSNV 9 35674208 35674208 T C CA9 63 4 - - - - - sSNV 12 2705120 2705120 T C CACNA1C 771 32 - - - - - nsSNV 12 1996184 1996184 T C CACNA2D4 15 10 - - - - - nsSNV 12 7653878 7653878 T G CD163 808 40 - - - - O nsSNV 10 85974315 85974315 C A CDHR1 191 13 - - - - - nsSNV 17 30815356 30815356 G C CDK5R1 236 18 - - - - - nsSNV 17 46053334 46053334 A G CDK5RAP3 102 5 - - - - - sSNV 22 17669307 17669309 GCA G CECR1 227 22 - - - - - FS del rs371741 1 6166747 6166747 G A CHD5 53 11 - - - - nsSNV 094 2 101009849 101009849 G A CHST10 120 50 - - - - - nsSNV 12 122794338 122794338 T A CLIP1 24 7 - - - - - nsSNV 3 130187801 130187801 T G COL6A5 282 22 - - - - - nsSNV 22 37322114 37322114 T A CSF2RB 130 10 - - - - - nsSNV 2 80085214 80085214 G A CTNNA2 130 24 - - - - - nsSNV 10 104596845 104596845 C T CYP17A1 237 18 - - - - - nsSNV 10 94834660 94834660 A T CYP26A1 103 6 - - - - - nsSNV 11 6662114 6662114 G C DCHS1 175 12 - - - - - nsSNV 4 155163852 155163852 A C DCHS2 143 20 - - - - - sSNV 12 9574060 9574060 A T DDX12P 186 6 - - - - - ncRNA 3 38080791 38080791 G A DLEC1 151 45 - - - - - sSNV 6 170592616 170592616 C CG DLL1 107 6 - - - - - FS ins 17 76457652 76457652 C A DNAH17 105 13 - - - - - nonsense 17 11513721 11513721 C T DNAH9 142 11 - - - - - nsSNV rs140156 rs140156 1 65867548 65867548 G A DNAJC6 147 10 - - - nsSNV 759 759 4 88534407 88534409 AAT A DSPP 102 15 - - - - - FS del 6 74228438 74228438 T A EEF1A1 128 5 - - - - - nsSNV 2 27587675 27587675 G A EIF2B4 135 8 - - - - - nonsense 6 65301613 65301613 A C EYS 392 20 - - - - - nsSNV 4 187524594 187524594 G GT FAT1 326 82 - - - - O FS ins 11 92534689 92534689 G C FAT3 87 7 - - - - - nsSNV 3 13612717 13612717 G C FBLN2 27 6 - - - - - nsSNV 1 171121179 171121179 G A FMO6P 414 26 - - - - - ncRNA 12 8195263 8195263 C T FOXJ2 738 98 - - - - - nsSNV 3 138665096 138665096 G C FOXL2 47 20 - - O O - nsSNV GCSAML- 1 247694105 247694106 GC G 34 12 - - - - - ncRNA AS1 4 44709876 44709876 T C GNPDA2 171 19 - - - - - nsSNV GOLGA6L1, rs200513 rs200513 15 22743407 22743407 G A 35 3 - - - nsSNV GOLGA6L22 306 306 16 57608811 57608811 G T GPR114 54 7 - - - - - sSNV 5 89923515 89923515 A C GPR98 112 6 - - - - - nsSNV 3 141497247 141497247 G A GRK7 98 35 - - - - - nsSNV 44

X 51487056 51487056 C T GSPT2 43 19 - - - - - nsSNV 6 29856463 29856463 T G HLA-H 38 4 - - - - - ncRNA 6 55443760 55443760 G C HMGCLL1 216 17 - - - - - nsSNV 2 177016663 177016663 C T HOXD4 22 3 - - - - - nsSNV 2 234079144 234079144 G A INPP5D 46 6 - - - - - UN 3 12977592 12977592 C T IQSEC1 81 6 - - - - - sSNV 2 227662160 227662162 CCA C IRS1 135 8 - - - - - FS del 12 2926407 2926407 G A ITFG2 254 18 - - - - - nsSNV 17 73736508 73736508 A T ITGB4 126 14 - - - - - nsSNV 4 6037730 6037730 A T JAKMIP1 93 7 - - - - - sSNV 1 39880094 39880094 C T KIAA0754 202 20 - - - - - nsSNV 22 33700389 33700389 C T LARGE 122 6 - - - - - nsSNV 13 53277762 53277762 A C LECT1 297 32 - - - - - nsSNV 1 202266700 202266700 G T LGR6 190 21 - - - - - nonsense 6 33554802 33554802 A AAC LINC00336 5 6 - - - - - ncRNA 19 1556803 1556803 C A MEX3D 132 15 - - - - - nonsense 15 89453045 89453045 G A MFGE8 32 4 - - - - O sSNV 7 7612489 7612489 G C MIOS 179 15 - - - - - nsSNV 7 7612490 7612490 G T MIOS 176 16 - - - - - nsSNV 4 4864581 4864581 C T MSX1 63 6 - - - - - nsSNV 11 1080576 1080576 G A MUC2 90 4 - - - - - sSNV 16 48595053 48595053 G A N4BP1 227 65 - - - - - nsSNV 17 7227216 7227216 G A NEURL4 277 27 - - - - - sSNV 14 36988746 36988746 T TC NKX2-1-AS1 227 12 - - - - - ncRNA 1 162040068 162040068 C T NOS1AP 222 23 - - - - - nsSNV 11 123887008 123887008 C A OR10G4 210 13 - - - - - nsSNV 11 5443470 5443470 T G OR51Q1 841 60 - - - - - nsSNV X 99662510 99662510 G A PCDH19 87 59 - - - - - sSNV 4 30724475 30724475 G T PCDH7 234 13 - - - - O nsSNV 8 52744090 52744090 T C PCMTD1 45 4 - - - - O nsSNV 21 44180483 44180483 C T PDE9A 224 48 - - - - - nsSNV 16 20387485 20387485 G A PDILT 79 24 - - - - - nonsense 20 8639311 8639311 G T PLCB1 129 8 - - - - - nsSNV 20 8782781 8782781 C T PLCB1 87 10 - - - - - nsSNV 2 68613666 68613666 C T PLEK 170 14 - - - - - nonsense 1 204226768 204226768 C T PLEKHA6 54 6 - - - - O nsSNV 11 65046261 65046261 C T POLA2 76 30 - - - - - sSNV 2 132021766 132021766 A C POTEE 79 7 - - - - - nsSNV 7 150069952 150069952 G C REPIN1 140 16 - - - - - nsSNV 10 62645914 62645914 C T RHOBTB1 494 101 - - - - - nsSNV 2 114369818 114369818 G A RPL23AP7 4 3 - - - - - ncRNA 8 73984036 73984036 G C SBSPON 100 10 - - - - - nsSNV 11 62189812 62189812 G A SCGB1A1 126 8 - - - - - nsSNV 2 165950946 165950946 A T SCN3A 141 7 - - - - O nsSNV 6 76412647 76412647 T C SENP6 138 17 - - - - - nsSNV 2 180008480 180008480 C T SESTD1 74 8 - - - - O nsSNV 19 51170416 51170416 G A SHANK1 51 6 - - - - - nsSNV 19 51918521 51918521 A C SIGLEC10 73 6 - - - - - nsSNV 20 37356934 37356934 C T SLC32A1 218 20 - - - - - sSNV 6 118635218 118635218 T G SLC35F1 414 18 - - - - - nsSNV 15 67482790 67482790 G A SMAD3 56 5 - - - - - sSNV 17 61910454 61910454 G T SMARCD2 76 19 - - - - - nsSNV 2 1168851 1168851 C T SNTG2 62 5 - - - - - sSNV 8 48641564 48641564 C A SPIDR 51 5 - - - - - sSNV X 138529572 138529572 A T SRD5A1P1 28 3 - - - - - ncRNA 4 119979077 119979078 AC A SYNPO2 193 12 - - - - - FS del X 100531469 100531472 CATT C TAF7L 265 36 - - - - - IF del 12 6566795 6566795 C T TAPBPL 211 10 - - - - - sSNV 19 3600211 3600211 G A TBXA2R 14 6 - - - - - nsSNV 14 24725214 24725214 G A TGM1 116 6 - - - - - nsSNV 15 39885852 39885852 G A THBS1 40 3 - - - - - nsSNV 15 63054504 63054504 G A TLN2 481 42 - - - - - nsSNV 22 22314781 22314781 G C TOP3B 35 3 - - - - - nsSNV 17 7579592 7579592 T C TP53 60 23 - - O O O splicing 22 20100239 20100239 C T TRMT2A 72 7 - - - - - sSNV 9 131073180 131073180 G A TRUB2 317 38 - - - - - nsSNV 2 179453333 179453333 G T TTN 127 18 - - - - - nsSNV 45

2 179589251 179589251 C A TTN 49 5 - - - - - nsSNV 19 4944397 4944397 C T UHRF1 193 71 - - - - - UN 19 17751431 17751431 C T UNC13A 53 10 - - - - - sSNV rs749001 rs749001 17 5037195 5037195 G A USP6 114 6 O - - nsSNV 03 03 X 65252418 65252418 A T VSIG4 62 5 - - - - - nsSNV 12 6161801 6161801 C T VWF 190 11 - - - - - sSNV 12 863316 863316 G A WNK1 101 13 - - - - - sSNV 3 55508509 55508509 G A WNT5A 53 4 - - - - - sSNV 14 77605591 77605591 G A ZDHHC22 60 21 - - - - - nsSNV 5 180277628 180277628 G A ZFP62 340 47 - - - - - sSNV 6 87964581 87964581 C A ZNF292 388 62 - - - - O nsSNV 19 21992579 21992579 C G ZNF43 89 11 - - - - - nsSNV 1 182026125 182026125 C T ZNF648 224 34 - - - - - nsSNV

46

(D) AYA06

dbSNP138 CVE list Mutation Altered T_ T_ Chr Start End Ref Alt Non- context* gene Ref Alt All C V E Flagged* * 13 25745349 25745349 G GC AMER2 95 8 - - - - - FS ins 9 95599345 95599345 G GA ANKRD19P 6 7 - - - - - ncRNA 13 70713509 70713509 A ACTG ATXN8OS 27 7 - - - - - ncRNA 19 51892382 51892382 G GC C19orf84 186 16 - - - - - FS ins GGGCGCC 18 20716014 20716023 G CABLES1 15 9 - - - - - IF del GGC 6 7389434 7389434 C T CAGE1 126 6 - - - - - splicing 7 93055822 93055822 C T CALCR 297 11 - - - - - nsSNV 8 82644888 82644888 A AG CHMP4C 188 10 - - - - - FS ins 22 37769495 37769495 T TC ELFN2 80 6 - - - - - FS ins 17 71205858 71205861 ATGC A FAM104A 102 6 - - - - - IF del rs102465 rs102465 17 18682505 18682505 T C FBXW10 32 4 - - - nsSNV 7 7 4 153896047 153896047 A AC FHDC1 97 7 - - - - - FS ins rs312065 rs312065 1 152285137 152285137 G T FLG 104 5 - - - nsSNV 4 4 rs404234 rs199794 15 30905994 30905994 G A GOLGA8H 36 5 - - - nsSNV 7 386 1 13183492 13183492 T C HNRNPCL2 11 4 - - - - - sSNV 2 170374777 170374777 G A KLHL41 59 4 - - - - - nsSNV rs112445 12 25398281 25398281 C T KRAS 128 10 - O O O nsSNV 441 TGGCAGC 17 39305775 39305775 T AGCTGGG KRTAP4-5 29 14 - - - - O IF ins GC 17 39346595 39346595 T TA KRTAP9-1 155 13 - - - - - FS ins AGGGCA 15 90320134 90320146 GGGGCA A MESP2 3 21 - - - - - IF del G 3 196733535 196733535 C CG MFI2 77 7 - - - - - FS ins rs200167 rs200167 5 79950724 79950724 G C MSH3 45 5 - - - nsSNV 5 5 11 55579145 55579145 C T OR5L1 31 3 - - - - - nsSNV 1 158532510 158532510 C A OR6P1 304 10 - - - - - nsSNV 19 7607678 7607678 C T PNPLA6 125 5 - - - - - nsSNV 1 79002163 79002163 C T PTGFR 238 8 - - - - - nonsense 1 64605927 64605927 G A ROR1 472 13 - - - - - nsSNV 19 39923952 39923952 G A RPS16 29 3 - - - - - sSNV 12 123005948 123005948 C T RSRC2 193 7 - - - - - nsSNV 21 36410869 36410871 CCT C RUNX1-IT1 42 10 - - - - - ncRNA GGAGTG GTCTGTG 12 109017650 109017680 CCTCCGT G SELPLG 272 85 - - - - - IF del GGGCACT GGTT 21 34927468 34927468 C G SON 40 6 - - - - - sSNV 18 12449779 12449779 G A SPIRE1 340 11 - - - - - nsSNV 12 22354743 22354743 G A ST8SIA1 284 9 - - - - - nsSNV 9 113137744 113137745 CT C SVEP1 46 6 - - - - - splicing 15 90118864 90118864 G GC TICRR 117 9 - - - - - FS ins X 73045808 73045808 G A TSIX,XIST 182 10 - - - - - ncRNA rs749001 rs749001 17 5037195 5037195 G A USP6 67 6 O - - nsSNV 03 03 8 146201230 146201233 TACA T ZNF252P 91 6 - - - - - ncRNA 19 58600100 58600100 T TGG ZSCAN18 121 10 - - - - - FS ins

47

(E) AYA07

dbSNP138 CVE list Altered T_ T_ Mutation Chr Start End Ref Alt Non- gene Ref Alt All C V E context** Flagged* 19 58864015 58864015 G A A1BG-AS1 17 4 - - - - - ncRNA 2 169791885 169791885 C T ABCB11 36 25 - - - - - sSNV 16 16282712 16282712 G T ABCC6 25 4 - - - - - sSNV 3 100566447 100566447 C A ABI3BP 133 5 - - - - - sSNV 17 61559009 61559009 G T ACE 73 5 - - - - - nsSNV 17 61559937 61559937 G T ACE 124 7 - - - - - nsSNV 17 61568684 61568684 G T ACE 32 4 - - - - - nsSNV 11 76709805 76709805 A T ACER3 101 40 - - - - - splicing 11 76709806 76709806 G T ACER3 102 39 - - - - - splicing 1 120436968 120436968 C A ADAM30 93 5 - - - - - sSNV rs7758 rs775815 5 33549374 33549374 G T ADAMTS12 121 8 - - - nsSNV 1578 78 5 33891883 33891883 C A ADAMTS12 69 5 - - - - - nsSNV 9 18906820 18906820 C A ADAMTSL1 40 5 - - - - - nsSNV 9 136402633 136402633 G T ADAMTSL2 39 4 - - - - - nsSNV 6 147136309 147136309 C A ADGB 60 5 - - - - - nsSNV 4 100232023 100232023 A T ADH1B 162 38 - - - - - sSNV 8 26721911 26721911 G A ADRA1A 20 22 - - - - - sSNV 2 236957852 236957852 G T AGAP1 59 5 - - - - - nsSNV 12 58130904 58130904 C A AGAP2 37 3 - - - - - nsSNV 6 151670599 151670599 C A AKAP12 28 4 - - - - - nsSNV 15 86287887 86287887 G T AKAP13 46 4 - - - - - nsSNV 19 15484021 15484021 C A AKAP8 61 5 - - - - - nsSNV 9 117118302 117118302 G T AKNA 65 5 - - - - - sSNV 11 77832169 77832169 C G ALG8 226 74 - - - - - nsSNV 19 36501574 36501574 G A ALKBH6 135 53 - - - - - sSNV 1 21887134 21887134 C A ALPL 55 5 - - - - - nsSNV 3 46718456 46718456 G T ALS2CL 63 5 - - - - - nsSNV 12 31850856 31850856 G T AMN1 201 8 - - - - - nsSNV 12 31841951 31841951 G T AMN1 137 7 - - - - - sSNV 11 113270241 113270241 G T ANKK1 64 6 - - - - - nsSNV 5 72851348 72851348 C A ANKRA2 149 7 - - - - - nsSNV 16 89354947 89354947 C A ANKRD11 79 5 - - - - - nsSNV 19 33095298 33095298 C A ANKRD27 80 5 - - - - - nsSNV 5 79855084 79855084 G A ANKRD34B 42 16 - - - - - nsSNV 15 40574762 40574762 G T ANKRD63 37 4 - - - - - nsSNV 8 124748071 124748071 C A ANXA13 57 5 - - - - - nsSNV 16 71766978 71766978 G T AP1G1 39 4 - - - - - nsSNV 1 114440492 114440492 G T AP4B1 66 5 - - - - - sSNV 14 20924871 20924871 C CA APEX1 55 34 - - - - - FS ins 19 36365496 36365496 C A APLP1 69 6 - - - - - nsSNV rs1785 rs178531 14 75140777 75140777 G T AREL1 69 5 - - - nsSNV 3116 16 20 47641993 47641993 G A ARFGEF2 96 27 - - - - - sSNV 17 12887975 12887975 C A ARHGAP44 83 5 - - - - - sSNV 1 17930026 17930026 C A ARHGEF10L 39 4 - - - - - nsSNV 2 39185184 39185184 G T ARHGEF33 56 17 - - - - O nsSNV 2 39185185 39185185 G T ARHGEF33 55 17 - - - - O nsSNV 2 97217432 97217432 G T ARID5A 25 4 - - - - - nsSNV 6 109220994 109220994 G T ARMC2 24 4 - - - - - nsSNV 3 35778792 35778792 C A ARPP21 36 4 - - - - - nsSNV 14 104577930 104577930 C A ASPG 24 4 - - - - - nsSNV 8 124408451 124408451 G T ATAD2 43 3 - - - - O sSNV 14 96800086 96800086 C A ATG2B 85 5 - - - - - sSNV 6 106764022 106764022 G A ATG5 83 10 - - - - O nsSNV 5 160042911 160042911 G T ATP10B 126 8 - - - - - nsSNV 16 84459400 84459400 G T ATP2C2 45 4 - - - - - nsSNV 2 71185508 71185508 G A ATP6V1B1 50 18 - - - - - nsSNV ATP6V1G2- rs3705 rs370511 6 31514448 31514448 C A 303 14 - - - ncRNA DDX39B 11629 629 13 52511698 52511698 G T ATP7B 65 6 - - - - - nsSNV 13 52516578 52516578 G T ATP7B 29 4 - - - - - nsSNV 10 116853763 116853763 G T ATRNL1 40 4 - - - - - nsSNV 48

7 70255271 70255271 C A AUTS2 107 6 - - - - - sSNV 19 41765733 41765733 C A AXL 230 8 - - - - - nsSNV 11 134251848 134251848 G T B3GAT1 81 5 - - - - - nsSNV 11 372919 372919 G A B4GALNT4 14 10 - - - - - nsSNV 17 79007094 79007094 C A BAIAP2-AS1 57 5 - - - - - ncRNA 1 147096518 147096518 C A BCL9 134 7 - - O - - nsSNV 20 55840829 55840829 G T BMP7 147 7 - - - - - nsSNV 19 15350596 15350596 G T BRD4 24 4 - - O - - nsSNV 19 15378291 15378291 G T BRD4 41 4 - - O - - sSNV rs1451 rs145107 5 878551 878551 A G BRD9 91 10 - - - nsSNV 07515 515 16 2260174 2260174 G T BRICD5 31 4 - - - - - nsSNV 9 121930383 121930383 C A BRINP1 31 7 - - - - - nsSNV 10 93788546 93788546 G A BTAF1 45 21 - - - - - splicing 11 57512463 57512463 C T BTBD18 168 46 - - - - - nsSNV 6 26385475 26385475 G T BTN2A2 71 6 - - - - - nsSNV 5 180420152 180420152 G T BTNL3 32 4 - - - - - nsSNV 10 50533403 50533403 C A C10orf71 81 5 - - - - - nsSNV 14 29261243 29261243 G T C14orf23 85 5 - - - - - ncRNA 15 76496640 76496640 G T C15orf27 38 4 - - - - - nsSNV 15 80215201 80215201 C A C15orf37 183 7 - - - - - ncRNA 1 11008789 11008789 C A C1orf127 34 4 - - - - - nsSNV 1 60520883 60520883 T A C1orf87 111 54 - - - - - nsSNV 17 77044053 77044053 G C C1QTNF1 50 16 - - - - - nsSNV 15 62360729 62360729 G C C2CD4A 23 17 - - - - - nsSNV 6 30615198 30615198 C A C6orf136 34 4 - - - - - nsSNV 8 146278141 146278141 C A C8orf33 23 4 - - - - - nsSNV 2 27444100 27444100 G T CAD 51 4 - - - - - sSNV 17 76993222 76993222 C A CANT1 186 8 - - O - - nsSNV 6 17507933 17507933 G T CAP2 104 7 - - - - - nsSNV 15 42682292 42682292 C A CAPN3 81 7 - - - - - sSNV 12 75678861 75678861 C T CAPS2 206 29 - - - - - splicing 19 5751734 5751734 C A CATSPERD 17 3 - - - - - nsSNV 19 48800504 48800504 G T CCDC114 83 5 - - - - - nsSNV 5 68616173 68616173 C A CCDC125 118 6 - - - - - nsSNV 3 42798707 42798707 C A CCDC13 123 6 - - - - - splicing X 50052253 50052253 C A CCNB3 33 5 - - - - - nsSNV X 50089805 50089805 G T CCNB3 68 5 - - - - - nsSNV 1 160811244 160811244 C A CD244 87 6 - - - - - nsSNV 15 73995241 73995241 G T CD276 6 4 - - - - - nsSNV 3 107770787 107770787 C A CD47 117 6 - - - - - nsSNV 14 103447248 103447248 C T CDC42BPB 182 47 - - - - - sSNV 20 58569523 58569523 T A CDH26 115 19 - - - - - nsSNV 20 60348183 60348183 C A CDH4 29 4 - - - - - nsSNV 19 42083630 42083630 G T CEACAM21 89 7 - - - - - nsSNV 16 81061864 81061864 C A CENPN 17 3 - - - - - nsSNV 3 101484336 101484336 G T CEP97 112 6 - - - - - nsSNV 2 202014412 202014412 G T CFLAR-AS1 130 6 - - - - - ncRNA 1 151506546 151506546 G A CGN 12 7 - - - - - sSNV 12 6701875 6701875 G T CHD4 82 5 - - - - O nsSNV rs1429 rs142966 22 29121232 29121232 C A CHEK2 172 8 O - - nsSNV 66756 756 2 175614897 175614897 C T CHRNA1 19 5 - - - - - nsSNV 15 78894408 78894408 G T CHRNA3 65 39 - - - - - sSNV 18 24604117 24604117 C A CHST9 32 4 - - - - - nsSNV 16 57473198 57473198 G T CIAPIN1 82 6 - - - - - nsSNV 19 42798827 42798827 C A CIC 142 6 - - O O O nsSNV 7 139228997 139228997 G T CLEC2L 45 4 - - - - - nsSNV 5 157243839 157243839 C A CLINT1 178 7 - - - - - splicing 19 45494142 45494142 C A CLPTM1 31 4 - - - - - nsSNV 3 140178505 140178505 C A CLSTN2 86 5 - - - - - sSNV 5 79026940 79026940 G A CMYA5 140 71 - - - - - sSNV 16 57931733 57931733 G A CNGB1 58 43 - - - - - nsSNV 6 116442151 116442151 C A COL10A1 66 6 - - - - - nsSNV 6 33140415 33140415 C A COL11A2 108 6 - - - - - splicing 10 71712646 71712646 G A COL13A1 21 10 - - - - - sSNV 13 111109773 111109773 G T COL4A2 64 5 - - - - O nsSNV 2 238249368 238249368 G T COL6A3 121 6 - - - - - nsSNV 49

3 99513773 99513773 G A COL8A1 37 11 - - - - - nsSNV 3 128979575 128979575 G T COPG1 35 4 - - - - - sSNV 20 30226879 30226879 G T COX4I2 16 4 - - - - - nsSNV 4 15067827 15067827 G T CPEB2 84 5 - - - - O nsSNV 4 15005287 15005287 C A CPEB2 94 6 - - - - O sSNV 20 2776413 2776413 G T CPXM1 93 5 - - - - - nsSNV 20 20023059 20023059 G A CRNKL1 269 70 - - - - - sSNV 12 53554023 53554023 C A CSAD 91 5 - - - - - sSNV 1 34554624 34554624 G A CSMD2 33 22 - - - - - sSNV 12 51457820 51457820 C T CSRNP2 78 19 - - - - - sSNV 7 117431318 117431318 G T CTTNBP2 40 3 - - - - - sSNV 7 101892210 101892210 G T CUX1 24 3 - - - - - nsSNV 2 172398266 172398266 G T CYBRD1 12 3 - - - - - nsSNV 8 143956518 143956518 G T CYP11B1 68 4 - - - - - nsSNV 19 41383099 41383099 G T CYP2A7 117 7 - - - - - nsSNV 8 59409400 59409400 G T CYP7A1 86 6 - - - - - nsSNV 9 124522316 124522316 G T DAB2IP 119 7 - - - - - nsSNV 17 42807281 42807281 G T DBF4B 38 4 - - - - - nsSNV 1 168012337 168012337 C A DCAF6 83 5 - - - - - nsSNV 11 6644232 6644232 C A DCHS1 37 3 - - - - - nsSNV 11 6655252 6655252 C A DCHS1 131 7 - - - - - splicing 1 85816130 85816130 C A DDAH1 56 5 - - - - - nsSNV 11 61070099 61070099 G T DDB1 80 5 - - - - - nsSNV rs1406 rs140621 22 24313802 24313802 G A DDT 390 17 - - - nsSNV 21020 020 9 118163487 118163487 C A DEC1 22 5 - - - - - nsSNV 7 24745809 24745809 G T DFNA5 40 4 - - - - - nsSNV 17 9680569 9680569 C A DHRS7C 57 18 - - - - - nsSNV 15 40660326 40660326 G C DISP2 13 4 - - - - - sSNV DKFZP434H 16 56228120 56228120 C A 74 6 - - - - - ncRNA 168 8 42234516 42234516 C A DKK4 68 5 - - - - - sSNV 1 64014766 64014766 G T DLEU2L 94 6 - - - - - ncRNA 3 57430951 57430951 C A DNAH12 118 27 - - - - - UN 10 129209158 129209158 G T DOCK1 30 4 - - - - - nsSNV 10 129242541 129242541 C A DOCK1 40 3 - - - - - nonsense 2 225661798 225661798 T G DOCK10 27 17 - - - - - nsSNV 17 1939927 1939927 C A DPH1 83 5 - - - - - nsSNV 21 42064876 42064876 C A DSCAM 74 4 - - - - - nsSNV 15 45454039 45454039 C A DUOX1 107 6 - - - - - sSNV 1 161719802 161719802 G T DUSP12 45 4 - - - - - nsSNV 2 71791262 71791262 C A DYSF 113 8 - - - - - sSNV 6 139210111 139210111 G T ECT2L 41 4 - - O - - nsSNV 14 23829065 23829065 C T EFS 17 6 - - - - - nsSNV 14 34419953 34419953 G A EGLN3 54 37 - - - - - sSNV 6 31856031 31856031 G T EHMT2 92 5 - - - - - nsSNV 11 107501263 107501263 G T ELMOD1 105 5 - - - - - sSNV 3 10044615 10044615 A T EMC3-AS1 138 68 - - - - - ncRNA 14 89087518 89087518 C A EML5 213 8 - - - - - nsSNV 1 225692753 225692753 C A ENAH 36 3 - - - - O nsSNV 3 40464384 40464384 G T ENTPD3 27 3 - - - - - nsSNV 8 23290466 23290466 G T ENTPD4 35 3 - - - - - sSNV 22 41573677 41573677 G T EP300 105 7 - - O O O nsSNV 12 15776223 15776224 TA T EPS8 121 12 - - - - - splicing 12 15774299 15774299 G A EPS8 101 18 - - - - - sSNV 9 27284961 27284961 A C EQTN 20 16 - - - - - nsSNV 6 170176597 170176597 C A ERMARD 65 5 - - - - - nsSNV 12 53680501 53680501 C A ESPL1 83 5 - - - - - sSNV 7 13935516 13935516 G T ETV1 129 7 - - O - - nsSNV 4 5624568 5624568 G T EVC2 85 5 - - - - - nsSNV 22 29695224 29695224 G T EWSR1 7 3 - - O - - sSNV 19 17000816 17000816 C A F2RL3 25 3 - - - - - nsSNV GGCAG 3 197880130 197880139 G FAM157A 8 8 - - - - - IF del CAGCA 11 6245450 6245450 G T FAM160A2 31 4 - - - - - nsSNV 4 17710901 17710901 C A FAM184B 41 14 - - - - - nsSNV 4 2698202 2698202 G T FAM193A 114 7 - - - - - nsSNV 9 34832628 34832628 G T FAM205B 38 4 - - - - - ncRNA 50

FAM24B- 10 124591789 124591789 G C 133 5 - - - - - ncRNA CUZD1 X 34961658 34961658 C A FAM47B 41 4 - - - - - nsSNV 17 80053270 80053270 G T FASN 56 5 - - - - - nsSNV 4 187521205 187521205 C A FAT1 55 5 - - - - O nsSNV 22 45921519 45921519 C A FBLN1 8 3 - - - - - sSNV 3 33416815 33416815 G T FBXL2 38 4 - - - - - nsSNV 2 73496555 73496555 G T FBXO41 42 4 - - - - - sSNV 22 23605315 23605315 G T FBXW4P1 169 11 - - - - - ncRNA 19 40362919 40362919 C A FCGBP 127 6 - - - - - nsSNV 1 157769873 157769873 T A FCRL1 34 25 - - - - - nonsense 14 24601561 24601561 G A FITM1 43 26 - - - - - sSNV FLJ10038, 15 50646423 50646423 C A GABPB1- 43 4 - - - - - ncRNA AS1 5 180047702 180047702 C A FLT4 126 7 - - - - - nsSNV 12 2983409 2983409 G T FOXM1 180 8 - - - - - nsSNV 14 89647120 89647120 C A FOXN3 46 5 - - - - - nsSNV 9 14747005 14747005 C A FREM1 21 4 - - - - - nsSNV 3 69246102 69246102 C T FRMD4B 111 38 - - - - - nonsense 5 132939620 132939620 C A FSTL4 35 13 - - - - - nsSNV 16 53907763 53907763 C A FTO 95 5 - - - - - nsSNV 9 133501811 133501811 T A FUBP3 38 19 - - - - - nsSNV 9 133501812 133501812 G T FUBP3 38 19 - - - - - nsSNV 11 94278261 94278261 G T FUT4 186 7 - - - - - nsSNV 7 151818052 151818052 G T GALNT11 41 4 - - - - - nsSNV 9 101570334 101570334 C A GALNT12 16 3 - - - - - sSNV 11 62398136 62398136 C T GANAB 151 35 - - - - - nonsense 21 34889797 34889797 C A GART 225 8 - - - - - nsSNV 3 128199862 128199862 C A GATA2 83 5 - - O O - stoploss 16 19528385 19528385 T C GDE1 64 38 - - - - - nsSNV 5 154287202 154287202 C A GEMIN5 90 5 - - - - - nsSNV 6 13365080 13365080 C A GFOD1 78 5 - - - - - sSNV 17 73238423 73238423 C G GGA3 130 81 - - - - - sSNV 5 42689093 42689093 G A GHR 34 9 - - - - - nsSNV 7 100281860 100281860 C A GIGYF1 28 4 - - - - - nsSNV 2 233674453 233674453 G T GIGYF2 44 4 - - - - O sSNV 2 220107545 220107545 G A GLB1L 111 68 - - - - - nsSNV 7 42004664 42004664 C A GLI3 132 7 - - - - - nsSNV X 102979529 102979529 G T GLRA4 11 3 - - - - - sSNV 19 48182705 48182705 G T GLTSCR1 12 9 - - - - O nsSNV 2 70057081 70057081 G T GMCL1 52 21 - - - - - nsSNV 5 180664645 180664645 G T GNB2L1 93 6 - - - - - sSNV 11 62476192 62476192 C A GNG3 180 8 - - - - - nsSNV 12 21659829 21659829 G T GOLT1B 270 10 - - - - - nsSNV 3 100374024 100374024 G T GPR128 212 9 - - - - - nsSNV X 150349070 150349070 C A GPR50 93 5 - - - - - nsSNV 1 110086520 110086520 G T GPR61 111 6 - - - - - sSNV 12 66911756 66911756 C A GRIP1 132 7 - - - - - nsSNV 3 141535841 141535841 G T GRK7 64 5 - - - - - sSNV 19 42744186 42744186 C A GSK3A 172 9 - - - - - nsSNV 9 124044097 124044097 C A GSN-AS1 91 5 - - - - - ncRNA 8 30557600 30557600 G T GSR 65 5 - - - - - sSNV 6 43593255 43593255 C A GTPBP2 90 5 - - - - - nsSNV 17 7909924 7909924 C A GUCY2D 33 3 - - - - - sSNV 6 105298872 105298872 C A HACE1 32 23 - - - - - nsSNV 6 105298873 105298873 T C HACE1 32 23 - - - - - nsSNV 19 17160821 17160821 G T HAUS8 61 5 - - - - O sSNV 6 135359042 135359042 C A HBS1L 25 3 - - - - - nonsense X 153222389 153222389 C A HCFC1 23 3 - - - - - nsSNV X 153228808 153228808 C A HCFC1 26 3 - - - - - nsSNV 14 73976017 73976017 C A HEATR4 114 6 - - - - - nsSNV 12 112630465 112630465 G T HECTD4 59 5 - - - - - nsSNV 7 43490477 43490477 C A HECW1 43 4 - - - - - nsSNV 2 197106886 197106886 C A HECW2 36 3 - - - - - nsSNV 17 80395108 80395108 G T HEXDC 40 4 - - - - - nsSNV 12 123342759 123342759 G T HIP1R 42 4 - - - - - sSNV 1 114483207 114483207 C A HIPK1 87 5 - - - - - nsSNV 51

6 26017368 26017368 C A HIST1H1A 82 5 - - - - - nsSNV 6 26107800 26107800 C A HIST1H1T 191 8 - - - - - nsSNV 1 42049707 42049707 C A HIVEP3 78 5 - - - - - nsSNV 1 185972857 185972857 C A HMCN1 121 6 - - - - O sSNV 5 149431301 149431301 C A HMGXB3 142 6 - - - - - nsSNV 10 43883315 43883315 C A HNRNPF 39 4 - - - - - nsSNV 17 46688008 46688008 G T HOXB7 89 5 - - - - - sSNV 12 54383022 54383022 C A HOXC10 72 5 - - - - - nsSNV 15 58983868 58983872 TCTTC T HSP90AB4P 37 6 - - - - - ncRNA 11 122929117 122929117 C A HSPA8 72 5 - - - - - nsSNV 1 23519665 23519665 G T HTR1D 70 5 - - - - - nsSNV 1 19992892 19992892 C A HTR6 50 4 - - - - - nsSNV 11 118925253 118925253 G T HYOU1 53 5 - - - - - nsSNV 9 95003198 95003198 G T IARS 33 4 - - - - - nsSNV 3 50326697 50326697 G T IFRD2 31 4 - - - - - sSNV 3 129221570 129221570 C A IFT122 80 5 - - - - - nsSNV 16 1841727 1841727 C A IGFALS 23 3 - - - - - nsSNV 7 50444295 50444295 G C IKZF1 49 5 - - O O - sSNV 2 102782717 102782717 C A IL1R1 112 5 - - - - - sSNV 2 86371667 86371667 G C IMMT 57 9 - - - - - sSNV 10 121583332 121583332 C A INPP5F 39 5 - - - - - nsSNV 20 20350147 20350148 CG C INSM1 57 13 - - - - - FS del 1 153727215 153727215 G A INTS3 16 9 - - - - - sSNV 5 61897648 61897648 C A IPO11 80 5 - - - - - sSNV 12 30829838 30829838 G T IPO8 131 6 - - - - - nsSNV 3 51812713 51812713 G T IQCF6 87 4 - - - - - sSNV X 53268413 53268413 G T IQSEC2 67 5 - - - - - nsSNV 5 150227805 150227805 G A IRGM 177 64 - - - - - sSNV 16 54319495 54319495 C A IRX3 72 4 - - - - - nsSNV 16 71958692 71958692 T TAAG IST1 31 16 - - - - - IF ins 16 31424454 31424454 C A ITGAD 60 5 - - - - - nsSNV 10 33200428 33200428 C A ITGB1 74 6 - - - - - nsSNV 15 41795205 41795205 C G ITPKA 39 17 - - - - - sSNV 3 4687151 4687151 G T ITPR1 86 5 - - - - - nsSNV 8 1951175 1951175 C A KBTBD11 21 3 - - - - - nsSNV 20 47989847 47989847 C A KCNB1 123 6 - - - - - sSNV 20 62045472 62045472 G T KCNQ2 47 5 - - - - - nsSNV 20 62103545 62103545 C A KCNQ2 138 6 - - - - - nsSNV 1 196438199 196438199 C A KCNT2 31 4 - - - - O splicing 8 110984623 110984623 G T KCNV1 118 7 - - - - - sSNV 12 109907350 109907350 C A KCTD10 38 4 - - - - - nsSNV 14 24901016 24901016 C A KHNYN 57 5 - - - - - sSNV 16 15698195 15698195 G C KIAA0430 95 33 - - - - - nsSNV 19 18375496 18375496 G T KIAA1683 98 6 - - - - - sSNV 2 8934036 8934036 C A KIDINS220 68 5 - - - - - nsSNV 20 16360543 16360544 AG A KIF16B 478 86 - - - - - FS del 14 104638926 104638926 G T KIF26A 43 4 - - - - - sSNV 6 33373304 33373304 C A KIFC1 52 4 - - - - - sSNV 8 145694727 145694727 C A KIFC2 67 5 - - - - - nsSNV 2 10188314 10188314 C A KLF11 239 9 - - - - - nsSNV 7 130418131 130418131 G T KLF14 60 5 - - - - - nsSNV 9 110250541 110250541 C A KLF4 17 3 - - O O - nsSNV 1 899746 899746 C A KLHL17 29 4 - - - - - sSNV 3 47361167 47361167 C A KLHL18 72 5 - - - - - nsSNV 7 151842298 151842298 G T KMT2C 115 6 - - - - O nsSNV 12 49432280 49432280 C A KMT2D 58 5 - - - - O nsSNV 7 98786122 98786122 G T KPNA7 128 6 - - - - - nsSNV 17 20405802 20405802 T C KRT16P3 19 3 - - - - - ncRNA 12 53298675 53298675 A C KRT8 9 4 - - - - - nsSNV rs2018 rs201807 12 53298699 53298699 G A KRT8 15 4 - - - nsSNV 07576 576 17 39296476 39296476 G C KRTAP4-6 11 4 - - - - - nsSNV 17 39346595 39346595 T TA KRTAP9-1 94 8 - - - - - FS ins 14 56122813 56122813 G T KTN1 80 5 - - O - - nsSNV 14 59950741 59950741 C A L3HYPDH 86 7 - - - - - nsSNV 20 42142257 42142257 C A L3MBTL1 130 6 - - - - - sSNV 20 9498807 9498807 T C LAMP5 52 9 - - - - - nsSNV 52

5 154173230 154173230 C A LARP1 94 7 - - - - - nsSNV 19 11226816 11226816 G A LDLR 10 7 - - - - - nsSNV 13 53277821 53277821 G T LECT1 92 6 - - - - - nsSNV 4 109000717 109000717 G T LEF1 41 4 - - - - - nsSNV 7 103969609 103969609 C A LHFPL3 111 6 - - - - - nsSNV 17 33316503 33316503 G T LIG3 84 5 - - - - - nsSNV 12 50589631 50589631 G T LIMA1 90 6 - - - - - nsSNV 12 81239693 81239693 C T LIN7A 85 18 - - - - - nsSNV 22 42354487 42354489 GGA G LINC00634 34 8 - - - - - ncRNA 15 66977723 66977723 G T LINC01169 69 5 - - - - - ncRNA 19 42928360 42928360 G A LIPE-AS1 46 10 - - - - - ncRNA 7 97823042 97823042 C A LMTK2 76 5 - - - - - nsSNV LOC100129 3 122606292 122606292 G T 73 23 - - - - - ncRNA 550 LOC101926 20 25826811 25826811 C G 39 6 - - - - - ncRNA 935 12 53447253 53447253 G T LOC283335 41 4 - - - - - ncRNA 22 48934923 48934923 G T LOC284933 152 7 - - - - - ncRNA 12 54520018 54520018 G C LOC400043 100 27 - - - - - ncRNA 1 99771557 99771557 C A LPPR4 135 6 - - - - - nsSNV 12 57548081 57548081 G T LRP1 179 7 - - - - - nsSNV 12 57599434 57599434 A T LRP1 62 29 - - - - - nsSNV 2 141609230 141609231 TC T LRP1B 45 27 - - - - - FS del 20 6022609 6022609 C A LRRN4 83 5 - - - - - nsSNV 3 54952849 54952849 A T LRTM1 40 18 - - - - - sSNV 11 65306808 65306808 G T LTBP3 21 3 - - - - - nsSNV 1 23418666 23418666 G T LUZP1 90 5 - - - - - sSNV 6 6347040 6347040 T A LY86-AS1 93 5 - - - - - ncRNA 6 10775640 10775640 G T MAK 50 5 - - - - - sSNV TTGCTG 11 95825374 95825383 T MAML2 24 57 - - O - - IF del CTGC 9 139992324 139992324 C A MAN1B1 55 6 - - - - - nsSNV 12 53881007 53881007 C A MAP3K12 56 5 - - - - - nsSNV X 20031173 20031173 C A MAP7D2 76 5 - - - - - nsSNV 17 19285244 19285244 G T MAPK7 23 3 - - - - - nsSNV 3 50679749 50679749 C A MAPKAPK3 24 4 - - - - - nsSNV 18 32720314 32720314 C A MAPRE2 38 4 - - - - - nsSNV 17 60814520 60814520 G T MARCH10 134 6 - - - - - nsSNV 16 71668649 71668649 G T MARVELD3 32 4 - - - - - nsSNV 20 3842045 3842045 C A MAVS 60 5 - - - - - nsSNV 11 119182607 119182607 C A MCAM 62 4 - - - - - nsSNV 3 127340071 127340071 G A MCM2 36 13 - - - - - sSNV 6 90358006 90358006 G T MDN1 70 5 - - - - - nsSNV 19 42865018 42865018 G A MEGF8 120 25 - - - - - sSNV 7 141756685 141756685 G T MGAM 92 5 - - - - - sSNV 1 62121191 62121192 TA T MGC34796 13 6 - - - - - ncRNA 19 51321990 51321990 C A MGC45922 7 4 - - - - - ncRNA 22 18273574 18273574 C T MICAL3 22 4 - - - - - nsSNV 11 5011037 5011037 G T MMP26 69 5 - - - - - nsSNV 14 61201592 61201592 G T MNAT1 45 4 - - - - - nsSNV 1 113235503 113235503 C A MOV10 25 4 - - - - - sSNV 17 56356929 56356929 G T MPO 87 5 - - - - - nsSNV 8 142490048 142490048 G A MROH5 32 17 - - - - - UN 20 35757529 35757529 G A MROH8 30 7 - - - - - UN 11 66204369 66204369 G A MRPL11 60 12 - - - - - nsSNV 7 100648065 100648065 C T MUC12 58 146 - - - - - nsSNV 19 9076707 9076707 G T MUC16 87 5 - - - - - nsSNV 11 1084822 1084822 G T MUC2 74 4 - - - - - nsSNV 11 1016871 1016871 G A MUC6 154 11 - - - - - nsSNV 11 1029572 1029572 G T MUC6 32 4 - - - - - sSNV 22 36702009 36702009 G T MYH9 45 4 - - O - - nsSNV 2 211167494 211167494 G T MYL1 39 20 - - - - - nsSNV 17 18025208 18025208 C A MYO15A 69 5 - - - - - nsSNV 17 73610680 73610680 C A MYO15B 40 5 - - - - - ncRNA 13 109793496 109793496 C A MYO16 65 5 - - - - - nsSNV 2 128387231 128387231 C A MYO7B 84 5 - - - - - nsSNV 9 138903489 138903489 T TC NACC2 69 6 - - - - - FS ins 16 5078940 5078940 C A NAGPA 85 6 - - - - - sSNV 53

8 18080245 18080245 G T NAT1 108 7 - - - - - nsSNV 12 78362412 78362412 C A NAV3 86 12 - - - - - nsSNV 3 47033160 47033160 C A NBEAL2 70 5 - - - - - nsSNV 11 113144386 113144386 C T NCAM1-AS1 47 13 - - - - - ncRNA 11 134028302 134028302 C A NCAPD3 90 5 - - - - - nsSNV 12 54902293 54902293 C A NCKAP1L 121 6 - - - - - nsSNV 2 24952591 24952591 G T NCOA1 156 7 - - O - - nsSNV 12 124826411 124826411 G T NCOR2 78 5 - - - - O nsSNV 1 160322739 160322739 C A NCSTN 32 4 - - - - - sSNV 14 21491046 21491046 G T NDRG2 36 3 - - - - - sSNV 10 21101819 21101819 G T NEBL 32 4 - - - - - sSNV 8 91804192 91804192 G T NECAB1 75 5 - - - - - nsSNV 1 72241872 72241872 C A NEGR1 31 4 - - - - - nsSNV 9 127088628 127088628 G A NEK6 72 35 - - - - - nsSNV 17 46135741 46135741 C A NFE2L1 161 8 - - - - - nsSNV 1 115828958 115828958 C A NGF 92 6 - - - - - nsSNV 14 52473306 52473306 G T NID2 78 5 - - - - - nsSNV 15 23006250 23006250 G T NIPA2 43 4 - - - - - sSNV 8 23540374 23540374 C A NKX3-1 43 5 - - - - - nsSNV 17 7319276 7319276 G T NLGN2 128 7 - - - - - nsSNV X 6069396 6069396 G T NLGN4X 86 5 - - - - - nsSNV 19 56220344 56220344 G T NLRP9 87 5 - - - - - sSNV 11 119052907 119052907 G T NLRX1 83 6 - - - - - nsSNV 7 150711126 150711126 G T NOS3 93 5 - - - - - nsSNV X 100118486 100118486 C A NOX1 37 4 - - - - - nsSNV 11 66192488 66192488 C A NPAS4 109 6 - - - - - sSNV 3 132419238 132419238 C A NPHP3 87 5 - - - - - sSNV 7 98256544 98256544 G T NPTX2 43 3 - - - - - nsSNV 22 39239513 39239513 C A NPTXR 18 3 - - - - - sSNV 16 67920001 67920003 ACT A NRN1L 97 63 - - - - - FS del 11 64428417 64428417 C A NRXN2 95 5 - - - - - nsSNV 10 104859696 104859696 C A NT5C2 106 7 - - O - - nsSNV 12 104179244 104179244 G T NT5DC3 60 5 - - - - - nsSNV 11 120097672 120097672 C A OAF 163 8 - - - - - nsSNV 15 76019739 76019739 C A ODF3L1 77 5 - - - - - nsSNV 3 9791994 9791994 C A OGG1 105 6 - - - - - sSNV 11 132812898 132812898 G T OPCML 30 3 - - - - - sSNV 12 48596213 48596213 G T OR10AD1 76 5 - - - - - nsSNV 11 123908858 123908858 G T OR10G7 95 5 - - - - - nsSNV 11 48346563 48346563 C A OR4C3 122 53 - - - - - nsSNV 3 97869022 97869022 C A OR5H14 30 5 - - - - - nsSNV 7 142750049 142750049 T G OR6V1 622 18 - - - - - nsSNV 11 124180053 124180053 C T OR8D1 41 17 - - - - - nsSNV 11 55904809 55904809 G T OR8J3 104 6 - - - - - nsSNV 12 48920257 48920257 C A OR8S1 215 8 - - - - - sSNV 12 58114219 58114219 G T OS9 43 4 - - - - - nsSNV 3 161221576 161221576 C T OTOL1 15 5 - - - - - nsSNV 12 56726729 56726729 G T PAN2 72 4 - - - - - sSNV 16 50188133 50188133 C A PAPD5 38 4 - - - - - nsSNV 19 39589134 39589134 G T PAPL 86 5 - - - - - nsSNV 14 73733234 73733234 C A PAPLN 44 4 - - - - - nsSNV 14 97000899 97000899 G T PAPOLA 142 6 - - - - - nsSNV 20 49366736 49366736 C A PARD6B 142 7 - - - - - nsSNV 3 122259629 122259629 C A PARP9 159 7 - - - - - nsSNV 2 223066881 223066881 C A PAX3 46 4 - - O - - nsSNV 3 52643508 52643508 C A PBRM1 144 7 - - O O O nsSNV 11 66638627 66638627 G T PC 102 6 - - - - - nsSNV 3 51993966 51993966 C A PCBP4 40 4 - - - - - nsSNV 4 30724727 30724727 C G PCDH7 51 12 - - - - O nsSNV 5 140209865 140209865 G T PCDHA6 56 20 - - - - - nsSNV 5 140558897 140558897 G T PCDHB8 200 8 - - - - - nsSNV 5 140711946 140711946 C A PCDHGA1 68 5 - - - - - sSNV 20 17446037 17446037 C T PCSK2 256 24 - - - - - sSNV 19 1487169 1487169 G T PCSK4 45 4 - - - - - nsSNV 5 149498364 149498364 G T PDGFRB 151 7 - - O - - sSNV 12 48535726 48535726 G T PFKM 188 11 - - - - - nsSNV 3 169896685 169896685 G T PHC3 106 9 - - - - - nsSNV 54

1 6680129 6680129 G T PHF13 38 4 - - - - - nsSNV 18 60630608 60630608 C A PHLPP1 4 3 - - - - - nsSNV 13 73409508 73409508 T TA PIBF1 1 9 - - - - - splicing 3 138403499 138403499 G T PIK3CB 98 6 - - - - O sSNV 1 46543247 46543247 C T PIK3R3 42 24 - - - - - nsSNV 19 40873657 40873657 G T PLD3 81 4 - - - - - sSNV 8 144992582 144992582 G T PLEC 35 4 - - - - - nsSNV 8 144997967 144997967 G T PLEC 89 5 - - - - - nsSNV 3 126746882 126746882 C A PLXNA1 148 7 - - - - - nsSNV 1 208255771 208255771 G T PLXNA2 98 6 - - - - - nsSNV 3 48457835 48457835 C A PLXNB1 21 3 - - - - - nsSNV 1 166819553 166819553 A T POGK 50 36 - - - - - nsSNV 22 42981894 42981894 G T POLDIP3 70 5 - - - - - nsSNV 12 133233739 133233739 G T POLE 85 5 - - - - O nsSNV 15 89871948 89871948 C A POLG 80 5 - - - - - nsSNV 3 121168195 121168195 A C POLQ 74 27 - - - - - nsSNV 22 38363139 38363139 G A POLR2F 13 13 - - - - - sSNV 3 119378978 119378978 C A POPDC2 41 3 - - - - - nsSNV 3 87325472 87325472 T A POU1F1 17 8 - - - - - sSNV 6 99283522 99283522 C A POU3F2 82 5 - - - - - nsSNV 2 44436479 44436479 C G PPM1B 93 38 - - - - - nsSNV 11 61250211 61250211 C A PPP1R32 12 4 - - - - - sSNV 12 108134927 108134927 G T PRDM4 174 7 - - - - - nsSNV X 23685907 23685907 C A PRDX4 30 4 - - - - - nsSNV 13 39586310 39586310 C A PROSER1 26 3 - - - - - sSNV 20 62614440 62614440 C A PRPF6 66 5 - - - - - nsSNV 19 50098982 50098982 G T PRR12 39 5 - - - - - nsSNV 9 134305561 134305561 G T PRRC2B 50 5 - - - - - nsSNV 9 134351326 134351326 C A PRRC2B 164 10 - - - - - sSNV rs3765 rs376581 8 143763339 143763339 G A PSCA 86 25 - - - nsSNV 81990 990 19 43237008 43237008 C T PSG3 22 6 - - - - - nsSNV 17 38151512 38151512 C A PSMD3 125 6 - - - - - nsSNV 9 130885257 130885257 C A PTGES2 66 6 - - - - - nsSNV 8 141678378 141678378 G T PTK2 150 7 - - - - - nsSNV 6 43109749 43109749 G A PTK7 82 36 - - - - - nsSNV 12 71139637 71139637 G T PTPRR 94 6 - - - - - nsSNV 3 191179074 191179074 C T PYDC2 84 22 - - - - - sSNV 16 640372 640372 G T RAB40C 44 4 - - - - - nsSNV X 103080708 103080708 C A RAB9B 35 4 - - - - - nsSNV 20 20553610 20553610 C T RALGAPA2 26 7 - - - - - nsSNV 3 50150859 50150859 G T RBM5 66 5 - - - - - nsSNV 6 111696294 111696294 G T REV3L 80 5 - - - - - sSNV 10 62648340 62648340 C A RHOBTB1 186 7 - - - - - nsSNV 14 93118367 93118367 C A RIN3 86 5 - - - - - nsSNV 1 40705314 40705314 G T RLF 92 5 - - - - - nsSNV 8 101270957 101270957 G T RNF19A 95 6 - - - - - nsSNV 14 24616983 24616983 G T RNF31 30 3 - - - - - nsSNV 2 89037502 89037502 G T RPIA 32 4 - - - - - sSNV RPL17, 18 47015879 47015879 C A RPL17- 26 4 - - - - - sSNV C18orf32 19 12936690 12936690 C T RTBDN 39 13 - - - - - nsSNV 22 20229905 20229905 C A RTN4R 85 28 - - - - - nsSNV 9 91616420 91616420 A T S1PR3 109 48 - - - - - nsSNV 2 224463385 224463385 C A SCG2 93 5 - - - - - nsSNV 7 12675689 12675689 C A SCIN 60 16 - - - - - sSNV 11 118014739 118014739 G T SCN4B 90 6 - - - - - nsSNV 2 167301483 167301483 G T SCN7A 92 5 - - - - - nsSNV 11 9090981 9090981 C A SCUBE2 77 5 - - - - - nsSNV 1 156146641 156146641 G T SEMA4A 75 5 - - - - - sSNV 22 21138358 21138358 G T SERPIND1 136 6 - - - - - nsSNV 16 30991124 30991124 C A SETD1A 35 4 - - - - - sSNV 3 47098785 47098785 G T SETD2 128 6 - - O O O sSNV 1 150915493 150915493 T A SETDB1 32 26 - - - - - nsSNV 16 28884061 28884061 C A SH2B1 45 4 - - - - - sSNV 19 51192455 51192455 C A SHANK1 14 3 - - - - - nsSNV 19 422156 422156 G T SHC2 25 4 - - - - - nsSNV 55

rs7304 rs730496 19 51920196 51920196 G T SIGLEC10 44 5 - - - nsSNV 9612 12 19 51645659 51645659 G T SIGLEC7 90 5 - - - - - nsSNV 5 133502930 133502930 C A SKP1 83 5 - - - - - nsSNV 1 160718048 160718048 C A SLAMF7 34 4 - - - - - sSNV 16 56906304 56906304 C A SLC12A3 51 13 - - - - - sSNV 3 124826587 124826587 C A SLC12A8 140 7 - - - - - nsSNV 19 49933799 49933799 G A SLC17A7 16 4 - - - - - nsSNV 5 36680612 36680612 C A SLC1A3 169 8 - - - - - nsSNV 11 64329527 64329527 G T SLC22A11 82 5 - - - - - sSNV 11 62763553 62763553 C A SLC22A8 171 8 - - - - - nsSNV 7 87488025 87488025 C A SLC25A40 107 6 - - - - - nsSNV 8 92378848 92378848 C A SLC26A7 83 5 - - - - - nsSNV 15 45560541 45560541 G A SLC28A2 31 10 - - - - - nsSNV 10 73111419 73111419 C A SLC29A3 39 4 - - - - - nsSNV 6 44222724 44222724 G T SLC35B2 70 5 - - - - - nsSNV 17 42335094 42335094 C A SLC4A1 38 4 - - - - - nsSNV 20 3211679 3211679 C A SLC4A11 81 6 - - - - - sSNV rs1500 rs150040 3 27453191 27453191 C T SLC4A7 46 15 - - - nsSNV 40978 978 8 145584265 145584265 G T SLC52A2 90 5 - - - - - sSNV 5 149578908 149578908 G T SLC6A7 131 7 - - - - - nsSNV 12 21427436 21427436 C A SLCO1A2 159 7 - - - - - nsSNV 4 20490610 20490610 G T SLIT2 33 4 - - - - - nsSNV 10 105762531 105762531 G T SLK 58 5 - - - - - nsSNV 20 57610099 57610099 C T SLMO2 20 12 - - - - - nsSNV 19 11129675 11129675 C A SMARCA4 80 5 - - O O O sSNV 4 90756714 90756714 C A SNCA 40 4 - - - - - nsSNV 18 67992266 67992266 C A SOCS6 24 3 - - - - - nsSNV 2 171573149 171573149 C A SP5 50 4 - - - - - sSNV 13 24825897 24825897 C A SPATA13 81 6 - - - - - sSNV 6 44320494 44320494 G T SPATS1 139 6 - - - - - nsSNV 1 16254813 16254813 G T SPEN 84 5 - - - - O nsSNV 1 16254891 16254891 G A SPEN 61 39 - - - - O nsSNV 5 136320830 136320830 C A SPOCK1 109 6 - - - - - nsSNV 17 43923312 43923312 G T SPPL2C 115 6 - - - - - nsSNV 14 65239344 65239344 C A SPTB 34 4 - - - - - nsSNV 14 65239665 65239665 C A SPTB 76 4 - - - - - nsSNV 16 30724921 30724921 C A SRCAP 82 5 - - - - - sSNV 7 104766715 104766715 C A SRPK2 24 3 - - - - - nsSNV GGAGG CGGCG 22 26879946 26879967 CCCCGG G SRRD 69 27 - - - - - IF del GGGAG A 22 37603027 37603027 G T SSTR3 80 5 - - - - - sSNV 1 44303930 44303930 G A ST3GAL3 156 62 - - - - - sSNV 7 116862001 116862001 C A ST7 35 4 - - - - - nsSNV 8 74351277 74351277 G T STAU2-AS1 194 8 - - - - - ncRNA X 7177590 7177590 C A STS 84 5 - - - - - nsSNV STX16- 20 57290463 57290463 C A 129 6 - - - - - ncRNA NPEPL1 16 28603371 28603371 A G SULT1A2 15 9 - - - - - nsSNV 16 28603372 28603372 G T SULT1A2 14 9 - - - - - nsSNV 10 70968510 70968510 G T SUPV3L1 134 7 - - - - - nsSNV 9 93636492 93636492 C A SYK 116 6 - - O - - nsSNV 9 93641094 93641094 G T SYK 91 5 - - O - - nsSNV 14 74876409 74876409 G T SYNDIG1L 17 4 - - - - - sSNV 14 64678695 64678695 G T SYNE2 88 5 - - - - - nsSNV 4 119948372 119948372 G T SYNPO2 62 5 - - - - - nsSNV 1 43896734 43896734 C A SZT2 71 6 - - - - - nsSNV 4 1729975 1729975 C A TACC3 19 3 - - - - - sSNV 16 30002820 30002820 C A TAOK2 15 4 - - - - - sSNV 1 234601455 234601456 CT C TARBP1 73 12 - - - - - splicing 1 19180971 19180971 G T TAS1R2 76 5 - - - - - sSNV 19 50390826 50390826 G T TBC1D17 81 5 - - - - - nsSNV 11 12901388 12901388 G T TEAD1 8 3 - - - - - nsSNV 5 167674392 167674392 C A TENM2 101 6 - - - - - nsSNV 11 78440644 78440644 C A TENM4 38 4 - - - - - sSNV 56

TFAP2A- 6 10415107 10415107 G A 18 5 - - - - - ncRNA AS1 1 92149414 92149414 C A TGFBR3 32 4 - - - - - nsSNV 15 71175212 71175212 G T THAP10 86 6 - - - - - sSNV 22 21354468 21354468 G T THAP7 39 4 - - - - - nsSNV X 122758005 122758005 C A THOC2 23 3 - - - - - nsSNV 16 3074360 3074360 G T THOC6 82 6 - - - - - sSNV 2 138329991 138329991 G T THSD7B 91 4 - - - - - UN 8 144681483 144681483 G T TIGD5 92 5 - - - - - sSNV 9 35708418 35708418 G A TLN1 3 8 - - - - - nsSNV 4 38798572 38798572 G T TLR1 89 5 - - - - - sSNV 20 2616601 2616601 C G TMC2 38 16 - - - - - nsSNV 17 76109255 76109255 G A TMC6 29 5 - - - - - sSNV 1 20009821 20009821 G T TMCO4 43 4 - - - - - sSNV 12 57457942 57457942 C A TMEM194A 68 5 - - - - - sSNV 9 140099861 140099861 G T TMEM203 27 4 - - - - - sSNV 1 212550933 212550933 C A TMEM206 118 6 - - - - - nsSNV TMEM51- 1 15443766 15443766 G T 76 6 - - - - - ncRNA AS1 1 16069533 16069533 C A TMEM82 36 3 - - - - - sSNV 15 52192437 52192437 C A TMOD3 53 29 - - - - - nsSNV 17 26671453 26671453 C A TNFAIP1 38 4 - - - - - nsSNV 17 18217952 18217952 C A TOP3A 25 5 - - - - - nsSNV 1 179815593 179815593 C A TOR1AIP2 79 5 - - - - - sSNV 9 123673617 123673617 G T TRAF1 12 3 - - - - - nsSNV 6 116821669 116821669 C T TRAPPC3L 25 5 - - - - - nsSNV 17 57141768 57141769 TA T TRIM37 43 7 - - - - - splicing 3 160156679 160156679 G T TRIM59 94 6 - - - - - nsSNV 11 8643262 8643262 T A TRIM66 49 25 - - - - - nsSNV 5 14507230 14507230 G T TRIO 75 5 - - - - - splicing 20 5921750 5921750 C A TRMT6 58 5 - - - - - nsSNV 7 98562363 98562363 C A TRRAP 88 5 - - - - - nsSNV 7 98576461 98576461 C A TRRAP 46 5 - - - - - sSNV 7 100075040 100075040 G T TSC22D4 45 4 - - - - - sSNV 20 51872809 51872809 C A TSHZ2 199 9 - - - - - nsSNV 15 43044918 43044918 C G TTBK2 56 28 - - - - - nsSNV 3 180328173 180328173 G T TTC14 38 5 - - - - - nsSNV 17 40094865 40094865 G T TTC25 40 3 - - - - - UN 2 179615173 179615173 G T TTN 43 4 - - - - - nsSNV 2 179444880 179444880 G T TTN 91 6 - - - - - sSNV 10 93711 93711 C T TUBB8 31 4 - - - - - sSNV 10 135107202 135107202 C A TUBGCP2 42 5 - - - - - nsSNV 1 154242713 154242713 C A UBAP2L 69 5 - - - - - nsSNV 11 5529288 5529288 T A UBQLN3 30 4 - - - - - nsSNV 11 5529289 5529289 C A UBQLN3 31 4 - - - - - nsSNV 2 170897395 170897395 G T UBR3 136 8 - - - - - sSNV 17 73838963 73838963 C A UNC13D 80 5 - - - - - nsSNV 10 73051211 73051211 C A UNC5B 152 7 - - - - - sSNV 2 210704103 210704103 C A UNC80 124 6 - - - - - nsSNV 12 6971705 6971705 G T USP5 100 6 - - - - - nsSNV 8 67563754 67563754 C A VCPIP1 126 6 - - - - - nsSNV 2 219293043 219293043 C A VIL1 42 4 - - - - - nsSNV 8 100883761 100883761 G T VPS13B 133 9 - - - - - nsSNV 15 41192668 41192668 G T VPS18 80 5 - - - - - nsSNV 11 124620732 124620732 C A VSIG2 38 4 - - - - - nsSNV 10 50311832 50311832 G T VSTM4 29 4 - - - - - sSNV 12 6076656 6076656 G T VWF 106 7 - - - - - nsSNV 2 224746696 224746696 C A WDFY1 18 5 - - - - - nsSNV 19 990330 990330 G T WDR18 104 7 - - - - - nsSNV 2 128471517 128471517 T C WDR33 31 7 - - - - O nsSNV 9 131396134 131396134 C A WDR34 64 4 - - - - - sSNV X 48933211 48933211 G T WDR45 19 4 - - - - - nsSNV 19 33623327 33623327 G T WDR88 83 6 - - - - - nsSNV 1 3553560 3553560 C A WRAP73 67 4 - - - - - nsSNV 14 75245330 75245330 C A YLPM1 89 5 - - - - O nsSNV 10 27420789 27420789 C A YME1L1 99 7 - - - - - nsSNV 3 114058131 114058131 G C ZBTB20 80 6 - - - - - sSNV 8 135621100 135621100 C T ZFAT 58 48 - - - - - sSNV 57

10 81070801 81070804 CCTT C ZMIZ1 60 9 - - - - - IF del 19 19790848 19790848 C A ZNF101 78 6 - - - - - sSNV 19 44831632 44831632 G T ZNF112 78 5 - - - - - nsSNV 19 58944746 58944746 C A ZNF132 172 8 - - - - - nsSNV 16 3190790 3190790 C A ZNF213 18 3 - - - - - sSNV 19 44680640 44680640 C A ZNF226 153 7 - - - - - nsSNV 8 146201231 146201232 AC A ZNF252P 117 13 - - - - - ncRNA 19 58452381 58452381 C A ZNF256 85 5 - - - - - nsSNV 14 74371680 74371680 C T ZNF410 23 15 - - - - - sSNV 5 121488780 121488780 G T ZNF474 294 10 - - - - - stoploss 19 53015183 53015183 G T ZNF578 43 4 - - - - - nsSNV 4 87034 87034 G T ZNF595 12 4 - - - - - UN rs3698 rs369892 16 2049619 2049619 C T ZNF598 71 5 - - - nsSNV 92816 816 3 40558080 40558080 G T ZNF620 81 6 - - - - - nsSNV 16 31090723 31090723 G T ZNF646 90 5 - - - - - nsSNV 19 53668299 53668299 C T ZNF665 108 23 - - - - - nsSNV 8 146067311 146067311 G T ZNF7 113 6 - - - - - sSNV 19 57755340 57755340 G T ZNF805 125 6 - - - - - nsSNV 19 53056925 53056925 C A ZNF808 57 5 - - - - - sSNV 16 71894454 71894454 G A ZNF821 108 8 - - - - - nsSNV 16 71894455 71894455 C G ZNF821 107 8 - - - - - sSNV 14 102792656 102792656 C A ZNF839 61 5 - - - - - nsSNV 19 53855540 53855540 C A ZNF845 87 6 - - - - - nsSNV 19 52888464 52888464 G T ZNF880 71 5 - - - - - nsSNV 1 238053762 238053762 C A ZP4 117 6 - - - - - nsSNV 1 45553675 45553675 C A ZSWIM5 116 6 - - - - - nsSNV

58

(F) AYA10

dbSNP138 CVE list Altered T_ T_ Mutation Chr Start End Ref Alt Non- gene Ref Alt All C V E context** Flagged* 20 47601265 47601265 G C ARFGEF2 44 5 - - - - - splicing 12 58005746 58005746 C T ARHGEF25 44 31 - - - - - sSNV 13576280 9 135762805 T G C9orf9 14 6 - - - - - nsSNV 5 19 45912489 45912492 CAAG C CD3EAP 98 6 - - - - - IF del 17 45234301 45234301 C A CDC27 21 3 - - - - O nsSNV rs7535 rs753536 17 45234707 45234707 T A CDC27 25 4 - - O nsSNV 3677 77 1 22902839 22902839 G A EPHA8 64 4 - - - - - nsSNV 14 76905825 76905825 C CG ESRRB 46 7 - - - - - FS ins 12636982 4 126369825 A G FAT4 137 5 - - - - - nsSNV 5 15 51675998 51675998 C T GLDN 15 3 - - - - - sSNV AGCCGCC 4 3076665 3076665 A HTT 19 7 - - - - - IF ins ACC rs7713 rs771346 17 39412119 39412119 C A KRTAP9-9 49 5 - - - nsSNV 4625 25 17 39412093 39412093 C A KRTAP9-9 42 5 - - - - - sSNV 11 95825371 95825374 CTGT C MAML2 43 13 - - O - - IF del 11 95825362 95825362 C T MAML2 37 3 - - O - - sSNV 12 5603608 5603608 G GC NTF3 77 8 - - - - - FS ins 12 80665560 80665560 C A OTOGL 89 26 - - - - - nsSNV rs2004 rs200469 16 70154480 70154480 A G PDPR 26 3 - - - nsSNV 69748 748 20 9389295 9389295 A T PLCB4 121 8 - - - - - nsSNV 1 3322119 3322119 G GC PRDM16 95 13 - - O - - FS ins 12 15677895 15677895 A C PTPRO 78 6 - - - - - nsSNV rs3752 rs146958 22 42910199 42910199 G A RRP7A 51 6 - - - nsSNV 505 654 5 77717632 77717632 A T SCAMP1 192 11 - - - - - UN 14 92920290 92920290 C T SLC24A4 64 6 - - - - - sSNV 16 2816115 2816115 A AC SRRM2 455 26 - - - - - FS ins 15208336 1 152083369 T TC TCHH 227 13 - - - - - FS ins 9 GCAG 16 67876787 67876796 CAGC G THAP11 28 25 - - - - - IF del AA 16696453 4 166964538 C A TLL1 67 5 - - - - O nsSNV 8 7 43917136 43917136 A AC URGCP 157 10 - - - - - FS ins

*NonFlagged: dbSNP files subtracting Flagged dbSNP entries. Flagged SNPs include SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated". **nsSNV: non-synonymous SNV, sSNV: synonymous SNV, FS ins: frameshift insertion, FS del: frameshift deletion, IF ins: in-frame insertion, IF del: in-frame deletion, ncRNA: non-coding RNA, UN: Unknown. transcript maps to multiple locations, all as "coding transcripts", but none has a complete ORF.

Except for the hypermutations in AYA07, we focused on level-1 variants to identify driving genetic alterations of AYA cancers that are specific to each patient and may be druggable (Figure 2-4 and Table 2-6). References of

Class II candidate genes were shown in Table 2-7. CNVs were analyzed to identify candidate driving CNVs (OncoScanTM) and chromosome-level

CNVs (OncoScanTM or VarScan2) (Figure 2-5 and Figure 2-6).

59

Figure 2-4. Candidate driving genetic alterations of AYA cancer and their druggability. An analysis of WES/WTS and OncoScanTM with our heuristic annotation identified level-1 candidate genetic alterations. By analyzing DAVID and DGIdb, the representative pathway of AYA cancers and druggability were also identified. The druggability is indicated by illustrations of pills; red indicates a direct inhibitor of a candidate target gene, and blue/yellow indicates an inhibitor of a pathway that includes the candidate alterations. AYA07 was excluded from the candidate gene search due to the hypermutation. All candidate genetic alterations are described in Table 2-5.

60

Table 2-6. Candidate driving genetic alterations of AYA cancers

No. Cancer Type Altered Mutation Allele Protein R† Class Level AYA gene type Freq. change #01 Prostate cancer ATAD5 nsSNV 0.12 G451D 0 C2 Lv3 TSG NF1 FS del 0.17 S1561fsǂ 0 C1 Lv1 TSG NF1 LOH - - - CNV Lv1 TSG RASA2 splicing 0.09 A742_splice 0 C2 Lv1 TSG RASA2 FS ins 0.09 D282fs 0 C2 Lv1 TSG SUZ12 LOH - - - CNV Lv2 TSG #02 Olfactory BRINP1 IF ins 0.58 681_682insT 0 C2 Lv3 TSG neuroblastoma CARD11 FS ins 0.44 Y609fs 0 C1 Lv3 OG CDKN2C nonsense 0.67 Q26*ǂ 0 C1 Lv1 TSG MEIS1 FS ins 0.08 R102fs 0 C2 Lv3 OG MINK1 nsSNV 0.64 E837Kǂ 0 C2 Lv3 TSG PPP6C splicing 0.15 G80_spliceǂ 0 C2 Lv1 TSG TGFBR2 nsSNV 0.08 R49S 0 C2 Lv3 TSG /Lv3 OG TP53 nsSNV 0.57 M237Vǂ >30 C1 Lv1 TSG #04 Head and neck ANK2 nsSNV 0.16 H2013Y 0 C2 Lv3 TSG squamous cell AXIN1 LOH - - - CNV Lv2 TSG carcinoma BAP1 LOH - - - CNV Lv2 TSG CDH1 LOH - - - CNV Lv2 TSG CHD5 nsSNV 0.17 R1891W 0 C2 Lv3 TSG FAT1 FS ins 0.20 L3696fs - C2 Lv1 TSG /Lv3 OG FOXL2 nsSNV 0.30 P157A 0 C1 Lv3 OG ITGB4 nsSNV 0.10 E839Vǂ 0 C2 Lv3 OG LECT1 nsSNV 0.10 W325Gǂ 0 C2 Lv3 TSG MSX1 nsSNV 0.09 S208L 1 C2 Lv2 TSG NOTCH1 Focal del - - - CNV Lv1 TSG NOTCH1 LOH - - - CNV Lv1 TSG PTCH1 LOH - - - CNV Lv2 TSG SETD2 Focal del - - - CNV Lv2 TSG SETD2 LOH - - - CNV Lv2 TSG SMAD4 Focal del - - - CNV Lv1 TSG TP53 splicing 0.28 S33_spliceǂ >5 C1 Lv1 TSG USP6 nsSNV 0.05 R133K 2 C2 Lv2 OG #06 Urachal CDH1 LOH - - - CNV Lv2 TSG carcinoma KRAS nsSNV 0.07 G13Dǂ >100 C1 Lv1 OG USP6 nsSNV 0.08 R133K 2 C2 Lv2 OG #09 Lung cancer EML4-ALK Fusion - - - - OG #10 Liposarcoma MDM2 Focal amp - - - CNV Lv1 OG PRDM16 FS ins 0.12 A365fsǂ 1 C2 Lv2 OG PTEN Focal del - - - CNV Lv1 TSG PTPRO nsSNV 0.07 K680Tǂ 1 C2 Lv2 TSG URGCP FS ins 0.06 F643fsǂ 1 C2 Lv2 OG

61

Table 2-7. References of Class II candidate cancer genes of AYA cancers

Sample Gene Class Level Ref. (PMID) AYA01 ATAD5 C2S Lv3 TSG 21901109 AYA01 RASA2 C2T Lv1 TSG 16564017 AYA01 RASA2 C2T Lv1 TSG 16564017 AYA02 BRINP1 C2S Lv3 TSG 11420708 AYA02 MEIS1 C2T Lv3 OG 24958854 AYA02 MINK1 C2S Lv3 TSG 16337592 AYA02 PPP6C C2T Lv2 TSG 25486434 24336330 AYA02 TGFBR2 Lv3 TSG/Lv3 OG 23661553 AYA04 ANK2 C2S Lv3 TSG 21042036 AYA04 CHD5 C2S Lv3 TSG 17289567 22986533 AYA04 FAT1 C2T Lv1 TSG/Lv3 OG 24590895 AYA04 ITGB4 C2S Lv3 OG 23348745 AYA04 LECT1 C2S Lv3 TSG 19480713 AYA04 MSX1 C2S Lv2 TSG 15705871 AYA04 USP6 C2S Lv2 OG 22081069 AYA06 USP6 C2S Lv2 OG 22081069 AYA10 PRDM16 C2T Lv2 OG 25082879 AYA10 PTPRO C2S Lv2 TSG 22851698 AYA10 URGCP C2T Lv2 OG 12082552

Figure 2-5. Analysis of CNVs in AYA cancers. (A) Distributions of relative copy number change (C) in AYA cancers, shown on a log2 scale. (B) Chromosome-level alterations are shown and were processed by VarScan2. Similar patterns were detected by OncoScanTM (Figure 2-6). (C) OncoScanTM identified a focal amplification of MDM2 in AYA10.

62

Figure 2-6. CNVs of AYA cancers from OncoScanTM and VarScan2. Since whole exome sequencing generated data from only exome regions (~2% of whole genome), processed data from VarScan2 were limited to detect candidate focal CNVs. However, this figure validated the feasibility of VarScan2 to analysis of chromosome-level CNVs due to similar patterns were shown between data from OncoScanTM platform (A) and VarScan2 (B).

AYA01, prostate cancer: aberrant activation of the RAS pathway

AYA01 showed concurrent loss-of-function in genes of the RasGAP family

(NF1 and RASA2). A frameshift deletion and LOH were detected in NF1, and a frameshift insertion and splicing mutation were detected in RASA2 that were validated by sequencing (Figure 2-7). Interestingly, concurrent mutations in

RasGAPs have been identified in several types of cancers according to the cBio portal (Figure 2-8). Furthermore, a recent study demonstrated the synergistic oncogenic effects of non-canonical Ras mutations in the context of loss-of-function in RasGAP (51). Because RasGAPs contribute to

63 tumorigenesis, we suggested an MEK inhibitor (as a single agent or in combination) for the treatment of AYA01 (52, 53). On the other hand, ATAD5

(Lv3 TSG) was mutated in AYA01 that featured haploinsufficient tumor suppressor contributed to generate various tumors in mice. (43)

Figure 2-7. Sequencing validation of RASA2 and NF1 in AYA01 sample. Representative potential candidate drivers of AYA01 were validated by NGS or Sanger sequencing. (A) Insertion of RASA2 was validated by NGS (Illumina Hiseq2000) with very deep depth (>10,000,000X) since allelic frequency of the mutation was under 0.1. (B) Splicing mutation of RASA2 was validated by Sanger sequencing. (C) Deletion of NF1 was validated by Sanger sequencing.

64

Figure 2-8. Concurrency of RasGAPs in large-scale study analyzed by using c-Bio portal. Ten of RasGAPs (48) were analyzed to their tendency of alteration whether they were altered together or mutually exclusively. WES data of 14 cancer types were selected in this analysis which included more than 90 samples and at least 10% alteration of 10 RasGAPs.

AYA02, olfactory neuroblastoma: chromosome instability and loss-of-function of CDKN2C

AYA02 harbored a chromosome-level alteration with a TP53 missense mutation that contributed to chromosome instability (54). Interestingly,

AYA02 showed a double peak of a relative copy number change and arm- level alterations, which differed from other tumors (Figure 2-5A and 2-5B). A loss-of-function in CDKN2C was identified with high-allelic frequency

(0.667). Given the tumor suppressor function of CDKN2C in breast cancer, a loss-of-function of CDKN2C may have driven tumor formation in AYA02;

65 therefore, we selected CDK4/6 inhibitors as a potential drug (palbociclib and

LY2835219) (55).

AYA04, Head and neck squamous cell carcinoma: alteration of the Wnt and

NOTCH pathways

AYA04 harbored TP53 mutations with alterations in FAT1, NOTCH1 and

SMAD4 that have been recurrently discovered by several large-scale studies of head and neck cancer (18, 19, 56). Specifically, AYA04 harbored concurrent mutations in Wnt pathway genes, such as FAT1, MSX1 and AXIN1, which were reported in a recent large-scale study of HNSCC with HPV (-) (19). Also, recent study showed that Wnt pathway contributed to tumorigenesis of

HNSCC by activating Oct4 (57). We suggested potential drugs (LGK974 and

γ secretase inhibitor) for AYA04 based on the importance of the Wnt and

NOTCH pathways.

AYA06, urachal carcinoma: alteration in noted KRAS mutation

Because AYA06 showed only one level-1 variant in KRAS (G13D) with no candidate CNVs, we selected an MEK inhibitor (selumetinib) as a potential drug. However, a missense mutation in USP6 (R133K, Lv2 OG) was detected in AYA06 and AYA04. USP6 is known to be able to initiating tumorigenesis either in cell lines or in mice via the activation of the NF-κB pathway, although the function of the R133K variant remains elusive (58).

66

AYA07, germ cell tumor: alteration in DNA repair genes and genome instability

AYA07 was excluded from the identification of candidate driving genetic alterations, because AYA07 showed a high mutation frequency with many

CNVs may be caused by missense mutations in six DNA repair genes (DDB1,

LIG3, MNAT1, POLE, POLG and POLQ) (Figure 2-2 and Table 2-3) (59).

The most frequently mutated gene, KIT, which was found in a large-scale study of TGCT, was not detected in AYA07 (49).

AYA09, lung cancer: well-known fusion, EML4-ALK

AYA09 was analyzed by WTS only because an IHC result for ALK was positive (data not shown). Because the EML4-ALK detected in our fusion processing is well known in lung cancer, crizotinib and ceritinib were recommended as potential drugs for AYA09 (Figure 2-9 and Table 2-5). The patient was treated with crizotinib and showed clinically significant tumor shrinkage, as expected.

67

Figure 2-9. EML4-ALK validation in AYA09 cells. (A) Process of EML4-ALK fusion (variant 3) was shown. The fusion was composed of exon 1-6 of EML4 and exon 20-28 of ALK. Variant 3 of EML4-ALK was sub-divided into two formats (3a and 3b) that are different sequence of 33 bp as represented. (B) Variant 3a of EML4-ALK (left) and variant 3b of EML4-ALK (right) were shown that had common ALK sequence. (C) RT-PCR result of variant 3 of EML-ALK was shown. RNA was extracted from cells of AYA09 and H2228 (which was cell line known to having variant 3a/b of EML4-ALK). (D) Sequence of the result of RT-PCR was identified by Sanger sequencing. AYA09 showed only variant 3b of EML4-ALK, but H2228 showed mixture of variant 3a and 3b of EML4-ALK. Color bar represented origin of the sequence from EML4 (blue), different sequence between variant 3a and 3b (grey) or ALK (red). (E) EML4-ALK fusion was detected by using break-apart fluorescent in situ hydridyzation (FISH). Positive signals were defined as split signals ≥ 15%.

68

AYA10, liposarcoma: amplification of MDM2 with PTEN deletion

Although level-1 SNV/indel alterations were not detected, CNVs in MDM2 and PTEN, which play a role in the p53 pathway, were identified in AYA10

(Figure 2-4). Target-specific drugs for MDM2 amplification, DS-3032b and

RO6839921, plus an mTOR inhibitor, everolimus, were recommended.

Comparison of candidate driving genes from AYA cancers and cancers in other age groups

To investigate differences in the genomic profiles between AYA cancers and cancers found in other age groups, we compared candidate driving genes between AYAs and all age groups for the same cancer based on published data from analyses similar to ours (Table 2-7). The mutation pattern of

AYA01 (prostate cancer) differed from that shown in a large-scale study analysis. Whereas AYA01 harbored alterations in the Ras pathway (NF1 and

RASA2), prostate cancer from other age groups showed recurrent mutations in

SPOP, TP53, and PTEN (60). However, several commonly altered genes in

HNSCC, such as TP53, FAT1, and NOTCH1, were identified in both AYA04 and in large-scale studies, whereas other mutations differed (19).

69

Table 2-8. Comparison of candidate driving genes of same cancer type from AYAs with those from all age group

Prostate cancer HNSCC AYA01 Barbieri, et al. AYA04 TCGA Sequencing Platform WES (n=1) WES (n=112) WES (n=1) WES (n=279) Exome capture kit Agilent Agilent Agilent Agilent SureSelect SureSelect SureSelect SureSelect Sequencing Illumina HiSeq Illumina HiSeq Illumina HiSeq Illumina HiSeq instrument Depth (mean) 138X 118X 198X 95X Age (median) 30 63 33 61 Bioinformatic pipeline bwa (hg19) Alignment Picard Deduplication GATK Realignment GATK Recalibration Variant calling MuTect SNVs Somatic Indel Detector/Indelocator Indels Selection of Heuristic MutSig Heuristic MutSig candidate SNV/Indels annotation annotation CNVs detection Affymetrix Affymetrix Affymetrix Affymetrix OncoScan SNP 6.0 OncoScan SNP 6.0 Selection Heuristic GISTIC Heuristic GISTIC of candidate CNVs annotation annotation Candidate driving genes

SNVs/Indels NF1 SPOP TP53 CDKN2A RASA2 TP53 FAT1 FAT1 ATAD5 PTEN MSX1 TP53 FOXA1 USP6 CASP8 CDKN1B ANK2 AJUBA ZNF595 CHD5 PIK3CA THSD7B FOXL2 NOTCH1 MED12 ITGB4 KMT2D NIPA2 LECT1 NSD1 PIK3CA HLA-A C14orf9 TGFBR2 SCN11A CNVs* NF1 NA† NOTCH1 FADD SUZ12 SMAD4 CDKN2A SETD2 CSMD1 PTCH1 SOX2 AXIN1 LRP1B BAP1 EGFR CDH1 FAT1 *Emphasized results from published paper (q < 0.0001). † There was not available emphasized result of CNVs

70

DISCUSSION

Genomic profiling of AYA cancers

In this study, we described the genomic profiles of seven different rare types of AYA cancers using three different genomics platforms (WES, WTS and

OncoScanTM). After processing genomics data, we identified potential druggable targets for each cancer and selected existing anti-cancer drugs to treat individual patients (Figure 2-4 and Table 2-6).

We identified candidate driving genetic alterations specific to each patient using logical manual curation (pattern-based heuristic annotation) as alternative to statistical method (Figure 1-1 – 1-3). It is needed to alternative method to select candidate genes in rare type of cancers, like AYA cancers, since low number of samples is limited to the selection of candidate driving genes using statistical method as shown in large-scale studies (12).

Because the features of AYA cancers are distinct from those of other age groups, including the incidences and clinical outcomes, studying the genomic profiles of AYA cancers is important to identify the unique features of AYA cancers (32, 33). When comparing candidate genes between AYAs and all age groups for the same cancer type, we identified different candidate genes in prostate cancer (AYA01) and a germ cell tumor (AYA07), although several common candidate genes (TP53, FAT1, and NOTCH1) were found in HNSCC

(AYA04) in both AYAs and all age groups (Table 2-7). These results showed that AYA-specific genetic alterations may be different from those in other age

71 groups; thus, further study is needed to define the significance of the differences in the genetic alterations between AYAs and other age groups.

Clinical implication

In this study, we analyzed individual cancer genomes and suggested potential drugs for each patient based on his or her genetic alterations. Characterizing the genomes of patients and genomics-driven knowledge enabled personalized medicine and advanced cancer genomics for clinical implications (20).

Moreover, to establish the clinical validity of genetic tests, especially for NGS data, the FDA discussed ‘post-marketing pursuit’ to define the clinical implications of variants generated from NGS, which have remained unknown

(61). Therefore, we expect many prospective genomic studies, such as our study, to link the patient to therapy as well as diagnosis, prognosis, and monitoring (62).

72

REFERENCES

1. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature.

2009;458(7239):719-24.

2. Mardis ER. A decade's perspective on DNA sequencing technology.

Nature. 2011;470(7333):198-203.

3. National Research Institute. Available from: http://www.genome.gov/sequencingcosts/.

4. Simon R, Roychowdhury S. Implementing personalized cancer genomics in clinical trials. Nat Rev Drug Discov. 2013;12(5):358-69.

5. Garraway LA, Lander ES. Lessons from the cancer genome. Cell.

2013;153(1):17-37.

6. Cancer Genome Atlas Research N, Weinstein JN, Collisson EA, Mills

GB, Shaw KR, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nature genetics. 2013;45(10):1113-20.

7. Kristensen VN, Lingjaerde OC, Russnes HG, Vollan HK, Frigessi A,

Borresen-Dale AL. Principles and methods of integrative genomic analyses in cancer. Nature reviews Cancer. 2014;14(5):299-313.

8. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S,

Biankin AV, et al. Signatures of mutational processes in human cancer.

Nature. 2013;500(7463):415-21.

9. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K,

Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499(7457):214-8.

73

10. Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, et al.

Mutational landscape and significance across 12 major cancer types. Nature.

2013;502(7471):333-9.

11. Ciriello G, Miller ML, Aksoy BA, Senbabaoglu Y, Schultz N, Sander

C. Emerging landscape of oncogenic signatures across human cancers. Nature genetics. 2013;45(10):1127-33.

12. Lawrence MS, Stojanov P, Mermel CH, Robinson JT, Garraway LA,

Golub TR, et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature. 2014;505(7484):495-501.

13. Munoz J, Kurzrock R. Targeted therapy in rare cancers--adopting the orphans. Nature reviews Clinical oncology. 2012;9(11):631-42.

14. Braiteh F, Kurzrock R. Uncommon tumors and exceptional therapies: paradox or paradigm? Molecular cancer therapeutics. 2007;6(4):1175-9.

15. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA, Jr.,

Kinzler KW. Cancer genome landscapes. Science. 2013;339(6127):1546-58.

16. Kang H, Kiess A, Chung CH. Emerging biomarkers in head and neck cancer in the era of genomics. Nature reviews Clinical oncology.

2015;12(1):11-26.

17. Stransky N, Egloff AM, Tward AD, Kostic AD, Cibulskis K,

Sivachenko A, et al. The mutational landscape of head and neck squamous cell carcinoma. Science. 2011;333(6046):1157-60.

18. Agrawal N, Frederick MJ, Pickering CR, Bettegowda C, Chang K, Li

RJ, et al. Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1. Science. 2011;333(6046):1154-7.

74

19. Cancer Genome Atlas N. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature. 2015;517(7536):576-82.

20. Garraway LA, Verweij J, Ballman KV. Precision oncology: an overview. Journal of clinical oncology : official journal of the American

Society of Clinical Oncology. 2013;31(15):1803-5.

21. van Rooij N, van Buuren MM, Philips D, Velds A, Toebes M,

Heemskerk B, et al. Tumor exome analysis reveals neoantigen-specific T-cell reactivity in an ipilimumab-responsive melanoma. Journal of clinical oncology : official journal of the American Society of Clinical Oncology.

2013;31(32):e439-42.

22. Tran E, Turcotte S, Gros A, Robbins PF, Lu YC, Dudley ME, et al.

Cancer immunotherapy based on mutation-specific CD4+ T cells in a patient with epithelial cancer. Science. 2014;344(6184):641-5.

23. Cancer Genome Atlas Research N. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. The New England journal of medicine. 2013;368(22):2059-74.

24. Davoli T, Xu AW, Mengwasser KE, Sack LM, Yoon JC, Park PJ, et al. Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell. 2013;155(4):948-62.

25. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of human cancer genes. Nature reviews Cancer. 2004;4(3):177-

83.

75

26. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, et al.

Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Science signaling. 2013;6(269):pl1.

27. Zack TI, Schumacher SE, Carter SL, Cherniack AD, Saksena G,

Tabak B, et al. Pan-cancer patterns of somatic copy number alteration. Nature genetics. 2013;45(10):1134-40.

28. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic acids research.

2010;38(16):e164.

29. Maxson JE, Gotlib J, Pollyea DA, Fleischman AG, Agarwal A, Eide

CA, et al. Oncogenic CSF3R mutations in chronic neutrophilic leukemia and atypical CML. The New England journal of medicine. 2013;368(19):1781-90.

30. Van Allen EM, Wagle N, Stojanov P, Perrin DL, Cibulskis K,

Marlow S, et al. Whole-exome sequencing and clinical interpretation of formalin-fixed, paraffin-embedded tumor samples to guide precision cancer medicine. Nature medicine. 2014;20(6):682-8.

31. Roychowdhury S, Iyer MK, Robinson DR, Lonigro RJ, Wu YM, Cao

X, et al. Personalized oncology through integrative high-throughput sequencing: a pilot study. Science translational medicine.

2011;3(111):111ra21.

32. Bleyer A, Barr R, Hayes-Lattin B, Thomas D, Ellis C, Anderson B, et al. The distinctive biology of cancer in adolescents and young adults. Nature reviews Cancer. 2008;8(4):288-98.

76

33. Tricoli JV, Seibel NL, Blair DG, Albritton K, Hayes-Lattin B. Unique characteristics of adolescent and young adult acute lymphoblastic leukemia, breast cancer, and colon cancer. J Natl Cancer Inst. 2011;103(8):628-35.

34. Foster JM, Oumie A, Togneri FS, Vasques FR, Hau D, Taylor M, et al. Cross-laboratory validation of the OncoScan(R) FFPE Assay, a multiplex tool for whole genome tumour profiling. BMC Med Genomics. 2015;8:5.

35. Ho AS, Kannan K, Roy DM, Morris LG, Ganly I, Katabi N, et al. The mutational landscape of adenoid cystic carcinoma. Nature genetics.

2013;45(7):791-8.

36. Li H, Durbin R. Fast and accurate short read alignment with Burrows-

Wheeler transform. Bioinformatics. 2009;25(14):1754-60.

37. PICARD. Available from: http://broadinstitute.github.io/picard/.

38. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K,

Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research.

2010;20(9):1297-303.

39. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D,

Sougnez C, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature biotechnology. 2013;31(3):213-9.

40. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome research. 2012;22(3):568-76.

77

41. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.

The Sequence Alignment/Map format and SAMtools. Bioinformatics.

2009;25(16):2078-9.

42. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES,

Getz G, et al. Integrative genomics viewer. Nature biotechnology.

2011;29(1):24-6.

43. Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols. 2009;4(1):44-57.

44. Griffith M, Griffith OL, Coffman AC, Weible JV, McMichael JF,

Spies NC, et al. DGIdb: mining the druggable genome. Nature methods.

2013;10(12):1209-10.

45. Ge H, Liu K, Juan T, Fang F, Newman M, Hoeck W. FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution. Bioinformatics. 2011;27(14):1922-8.

46. McPherson A, Hormozdiari F, Zayed A, Giuliany R, Ha G, Sun MG, et al. deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data.

PLoS computational biology. 2011;7(5):e1001138.

47. Iyer MK, Chinnaiyan AM, Maher CA. ChimeraScan: a tool for identifying chimeric transcription in sequencing data. Bioinformatics.

2011;27(20):2903-4.

48. King PD, Lubeck BA, Lapinski PE. Nonredundant functions for Ras

GTPase-activating proteins in tissue homeostasis. Science signaling.

2013;6(264):re1.

78

49. Litchfield K, Summersgill B, Yost S, Sultana R, Labreche K, Dudakia

D, et al. Whole-exome sequencing reveals the mutational spectrum of testicular germ cell tumours. Nature communications. 2015;6:5973.

50. Ding L, Ley TJ, Larson DE, Miller CA, Koboldt DC, Welch JS, et al.

Clonal evolution in relapsed acute myeloid leukaemia revealed by whole- genome sequencing. Nature. 2012;481(7382):506-10.

51. Stites EC, Trampont PC, Haney LB, Walk SF, Ravichandran KS.

Cooperation between Noncanonical Ras Network Mutations. Cell reports.

2015.

52. McGillicuddy LT, Fromm JA, Hollstein PE, Kubek S, Beroukhim R,

De Raedt T, et al. Proteasomal and genetic inactivation of the NF1 tumor suppressor in gliomagenesis. Cancer cell. 2009;16(1):44-54.

53. Min J, Zaslavsky A, Fedele G, McLaughlin SK, Reczek EE, De Raedt

T, et al. An oncogene-tumor suppressor cascade drives metastatic prostate cancer by coordinately activating Ras and nuclear factor-kappaB. Nature medicine. 2010;16(3):286-94.

54. Negrini S, Gorgoulis VG, Halazonetis TD. Genomic instability--an evolving hallmark of cancer. Nature reviews Molecular cell biology.

2010;11(3):220-8.

55. Pei XH, Bai F, Smith MD, Usary J, Fan C, Pai SY, et al. CDK inhibitor p18(INK4c) is a downstream target of GATA3 and restrains mammary luminal progenitor cell proliferation and tumorigenesis. Cancer cell.

2009;15(5):389-401.

79

56. Martin D, Abba MC, Molinolo AA, Vitale-Cross L, Wang Z, Zaida M, et al. The head and neck cancer cell oncogenome: a platform for the development of precision molecular therapies. Oncotarget. 2014;5(19):8906-

23.

57. Lee SH, Koo BS, Kim JM, Huang S, Rho YS, Bae WJ, et al.

Wnt/beta-catenin signalling maintains self-renewal and tumourigenicity of head and neck squamous cell carcinoma stem-like cells by activating Oct4.

The Journal of pathology. 2014;234(1):99-107.

58. Pringle LM, Young R, Quick L, Riquelme DN, Oliveira AM, May MJ, et al. Atypical mechanism of NF-kappaB activation by TRE17/ubiquitin- specific protease 6 (USP6) oncogene and its requirement in tumorigenesis.

Oncogene. 2012;31(30):3525-35.

59. Wood RD, Mitchell M, Sgouros J, Lindahl T. Human DNA repair genes. Science. 2001;291(5507):1284-9.

60. Barbieri CE, Baca SC, Lawrence MS, Demichelis F, Blattner M,

Theurillat JP, et al. Exome sequencing identifies recurrent SPOP, FOXA1 and

MED12 mutations in prostate cancer. Nature genetics. 2012;44(6):685-9.

61. Evans BJ, Burke W, Jarvik GP. The FDA and Genomic Tests -

Getting Regulation Right. The New England journal of medicine. 2015.

62. Lee SH, Sim SH, Kim JY, Cha S, Song A. Application of cancer genomics to solve unmet clinical needs. Genomics & informatics.

2013;11(4):174-9.

80

Abstract in Korean

연구 목적: 젊은 성인 암 환자의 생물학적 특징 및 치료 결과는 다른

나이 그룹의 암 환자들과 비교할 때 독특한 특성이 있다고 알려져

있으나 이들의 분자적 특성에 대한 연구는 많지 않다. 본 연구에서는

전이성 희귀 암을 지닌 젊은 성인 환자들의 암 유전체를 개별적으로

분석하여 각 환자들의 암 유발 유전 변이 후보들과 이들의 약물 적용

가능성을 논하고자 한다.

연구 방법: 젊은 성인 암 환자 샘플을 분석하기에 앞서 시퀀싱

데이터에서 암 유발 유전 변이 후보들을 찾기 위하여 패턴에 근거한

주석 방법 (pattern-based heuristic annotation approach)을

고안했다. 이 방법은 The Cancer Genome Atlas (TCGA)에서

제공하는 급성골수성백혈병의 전체 엑솜 염기 서열 분석법 (whole exome sequencing) 데이터를 성공적으로 분석하여 신뢰성을

확보했다. 이 후, 세 개의 유전체 분석 플랫폼 (전체 엑솜 염기 서열

분석법, 전체 전사체 염기 서열 분석법 (whole transcriptome sequencing), OncoScanTM)을 이용하여 전향적으로 수집한 일곱

명의 젊은 성인 희귀 암 환자들의 종양 샘플들을 분석하였다. 잘

알려진 생물 정보학 도구들 (bwa, Picard, GATK, MuTect, Somatic

Indel Detector)과 자체적으로 개발한 패턴에 근거한 주석 방법

(pattern-based heuristic annotation approach), DAVID 나

81

DGIdb 와 같은 오픈 액세스 데이터베이스를 활용하여 염기 서열

데이터를 가공하고 각 환자들의 암 유발 유전 변이 후보들과 약물 적용

가능성을 분석했다.

연구 결과: 패턴에 근거한 주석 방법은 TCGA 의 급성골수성백혈병

연구에서 확인된 모든 암 유발 유전 변이 후보들을 찾을 뿐 아니라,

MYC, NF1, ARID2 와 같이 잘 알려진 암 유발 유전 변이 후보들도

추가적으로 찾아냄으로써 신뢰성을 확보했다. 과변이 (hyper mutation)를 보인 생식세포종양 (germ cell tumor)을 제외하고, 젊은

성인 암의 변이 빈도는 다른 성인 암들의 변이 빈도보다 적었다

(median= 0.56). 환자 특이적인 암 유발 유전 변이 후보는 다음과

같았다: RASA2 와 NF1 (전립선암; prostate cancer), TP53 과

CDKN2C (후각신경아세포종; olfactory neuroblastoma), FAT1,

NOTCH1, SMAD4 (두경부암; head and neck cancer), KRAS

(요막관종양; urachal carcinoma), EML4-ALK (폐암; lung cancer), MDM2 와 PTEN (지방육종; liposarcoma). 임상의들의

자문을 통해 각 환자들의 암 유발 유전 변이 후보들의 유전자 혹은

관련된 경로 (pathway)에 대한 약물 적용 가능성을 제시하였다. 또한,

본 연구에서 진행한 젊은 성인 암과 같은 종류의 암을 지닌 다른 나이

그룹을 비교 시, 전립선암과 생식세포종양은 서로 다른 암 유발 유전

변이 후보들을 가지지만, 두경부암에서는 TP53, FAT1, NOTCH1 이

공통적인 암 유발 유전 변이 후보들인 것으로 나타났다.

82

연구 결론: 본 연구는 일곱 명의 젊은 성인 희귀 암의 암 유발 유전

변이 후보들을 분석하고 이들의 약물 적용 가능성을 제시함으로써 암

유전체의 개별 분석의 중요성과 가능성을 보여주었다. 또한 같은 암

종이라도 젊은 성인 암의 암 유발 유전 변이 후보들은 다른 나이

그룹의 것들과 서로 다르게 나타날 수도 있음을 확인하였다.

83