THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE

DEPARTMENT OF BIOMEDICAL ENGINEERING

TARGETED SEQUENCING OF CHROMOSOME 9 GENES ASSOCIATED WITH PAPILLARY TYPE 1 RENAL CELL CARCINOMA

MICHAEL BELSKY SPRING 2015

A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Biomedical Engineering with honors in Biomedical Engineering

Reviewed and approved* by the following:

William O Hancock Professor of Biomedical Engineering Thesis Supervisor

Peter J Butler Professor of Biomedical Engineering Honors Adviser

William A LaFramboise Associate Professor of Pathology University of Pittsburgh

* Signatures are on file in the Schreyer Honors College.

ABSTRACT

We have completed a targeted sequencing study of 6 patient Papillary Type 1 tumors. Fragment libraries were created for 456 exons of 43 genes on the p-arm of chromosome 9 between reference positions 1,051,050 and 14,150,350 using biotinylated DNA hybridization probes to enrich 112.5 kb of sequence. The selected fragments were collected, concentrated, and subjected to a second round of hybridization, capture, wash, and amplification, followed by sequencing on the Ion Torrent PGM using the 318 v2 chip. Analysis was performed using the GenomAnalytics (GenomOncology, Westlake, OH) server, and all findings were confirmed by manual curation of the sequence using the Integrative Genomics Viewer (Broad Institute, Cambridge, MA). We previously identified significant copy number losses specific to Papillary Type 2 renal tumors in the GLDC, SLC1A1, and CDC37L1 genes of each tumor. We also identified 3 single nucleotide substitutions that were identical in every tumor sample within the coding regions of 3 genes (TPD52L3, PDCD1LG2, and KIAA0020) in this genomic domain. Papillary Type 1 tumors were sequenced to determine whether they exhibit different single nucleotide and copy number variants than those identified in Papillary Type 2 samples. The long-range goal is to develop a low-cost, high-resolution test requiring minimal substrate to rapidly distinguish between Papillary Type 1 and Type 2 cancers using small biopsy specimens or fine needle aspirates.

TABLE OF CONTENTS

List of Figures

List of Tables

Acknowledgements

Chapter 1 Introduction
    Papillary Type 1 Renal Cell Carcinoma
    Next Generation Sequencing
    Hallmarks of Cancer
    The Study

Chapter 2 Methods
    Tissue Samples
    DNA Extraction
    Fragment Library Construction
    Targeted Sequencing of Matched Tumor-Normal Specimens
    Data Analysis

Chapter 3 Results and Discussion

Chapter 4 Conclusions and Future Work
    Future Work

Bibliography

Appendix A. Ion Torrent Run Reports

Appendix B. Fragment Library Construction Documents

Appendix C. Variant Caller Output

List of Figures

Figure 1 - RCC Subtypes

Figure 2 - Copy Number Profiles of RCC Subtypes

Figure 3 - Extraction Process (QIAGEN)

Figure 4 - Ion Torrent PGM Semiconductor (Life Technologies)

Figure 5 - Significant Copy Number Deletions

Figure 6 - Significant Copy Number Amplifications

Figure 7 - Ion Torrent Run Report Samples 1-6

Figure 8 - Ion Torrent Run Report Samples 7-12

Figure 9 - Sample Bioanalyzer Output

List of Tables

Table 1 - Tumor Specific Variants Found In All Tumor Samples

Table 2 - Genes With Variants

Table 3 - Dilution Guide for Fragment Library Construction

Table 4 - Tumor Specific Damaging SNVs Called by VARSCAN2

Table 5 - Tumor Specific Damaging DELs Called by VARSCAN2

Table 6 - Damaging SNVs Found in All Tumors by Ion Torrent Plugin

Table 7 - Damaging SNVs Found in All Normal Samples by Ion Torrent Plugin

Table 8 - Damaging Variants in All Tumors Called by GATK UGT

Table 9 - Damaging Variants in All Normal Samples Called by GATK UGT

Acknowledgements

I have been so fortunate to have worked with and learned from several members of the Biomedical Engineering Department during my time at the Pennsylvania State University. Thank you to Dr. William Hancock, my thesis advisor, for all of his support and guidance for this work. Thank you as well to Dr. Peter Butler and Dr. Keefe Manning, my two honors advisors, for their help in planning my undergraduate education.

Thank you to Dr. Rajiv Dhir and the UPMC Shadyside Pathology Department and Tissue Bank for their expertise in providing and classifying tissue samples used in the study.

Paramount thanks go to the members of the Clinical Genomics Facility at Shadyside Hospital in Pittsburgh, PA: Dr. William LaFramboise, Patti Petrosko, Maureen Lyons, and Christin Sciulli. This group worked with me each summer of my undergraduate education, and much of my ability to conduct thorough, valuable research and my knowledge of cancer biology and genetics can be credited to their teaching. Special thanks are given to Dr. LaFramboise, who has been my greatest mentor and role model as I have learned what it means to be a great scientist.

Lastly, I would like to thank my family (Mom, Dad, and Kara) and friends (my roommates, the Penn State Glee Club, OGC, and the THON 2015 Family Relations Committee) for their support, encouragement, and love. While I have relied on many colleagues and teachers throughout this process, you all have kept me grounded, sane, and most importantly, happy.

Without the people mentioned above, this thesis would not have come to completion. I cannot imagine this process without any one of them.

Chapter 1

Introduction

Papillary Type 1 Renal Cell Carcinoma

Renal Cell Carcinoma (RCC) is a prevalent condition, accounting for 2-3% of all malignant disease in adults (Rini, Campbell, & Escudier, 2009). The annual incidence in the United States has increased steadily, with 58,000 new diagnoses in 2010. Partial or radical nephrectomy for small and large tumors is the standard treatment for the disease. Preventative methods for managing RCC tend to be overlooked, as kidney cancer is characterized by late diagnosis. Patients often do not present symptoms (blood in the urine, a lump in the abdomen, localized pain) until the disease has progressed substantially. Diagnoses of RCC typically come serendipitously: a patient will undergo a CT scan or MRI for an unrelated ailment in the vicinity of the kidneys, and a physician will discover the tumor by chance. The effectiveness of immunotherapy as a course of treatment for RCC is still controversial, and it is mostly reserved for patients expected to have positive outcomes.

RCC is classified into 5 tumor subtypes with different morphological traits, courses of treatment, and clinical outcomes. Clear-cell carcinoma is the most common subtype, accounting for 82% of RCC. This is a renal cortical tumor characterized by malignant epithelial cells with clear cytoplasm and a compact-alveolar growth pattern interrupted by complex vasculature. Papillary Type 1 (PAP1) and Type 2 (PAP2) RCC account for 11% of cases. Type 1 tumors are lined by small cells with clear cytoplasm and single layers of small nuclei, while Type 2 tumors are lined by large cells with large nuclei (Klatte et al., 2009). Type 1 tumors tend to be low grade and low stage, while Type 2 tumors are usually high grade and caught at a much higher stage. Chromophobe tumors account for 5% of RCC cases, tend to be larger than clear cell tumors, and are commonly diagnosed at an early stage (Vera-Badillo, Conde, & Duran, 2012). Finally, the oncocytoma subtype accounts for the remainder of RCC cases and is generally a benign neoplasm (Geramizadeh et al.). Below are sample hematoxylin and eosin (H&E) stains of the five major RCC subtypes.

Figure 1 - RCC Subtypes

Shown above are hematoxylin and eosin stains for the five major subtypes of RCC. Panel A shows Chromophobe, panel B shows Clear Cell, panel C shows Oncocytoma, panel D shows Papillary Type 1 (PAP1), and panel E shows Papillary Type 2 (PAP2).

Currently, RCC tumors are classified post-surgically to determine the best course of follow-up treatment after partial or radical nephrectomy. Diagnosis of RCC subtypes relies almost completely on morphological analysis of specimens by pathologists. A molecular test to differentiate RCC subtypes via less invasive techniques such as a fine-needle biopsy would better inform oncologists and allow them to choose the best course of treatment before turning to surgery. For example, Papillary Type 2 tumors are extremely aggressive, metastasize quickly, and need to be treated equally aggressively, whereas Papillary Type 1 tumors are usually localized and benign.

In a study of the genetic landscape of the RCC subtypes, vast genomic changes were identified across types (Krill-Burger et al., 2012). Certain regions commonly affected by copy number variations were identified for each subtype. In particular, an array analysis of the p-arm of chromosome 9 found common copy number losses in Papillary Type 2 tumors in a region previously implicated as harboring an active tumor suppressor in lung cancer cell lines. In a deep sequencing study of these regions, we identified common single nucleotide variants (SNVs) and small insertions and deletions (INDELs) in 16 out of 43 studied genes. Some of these single base mutations were found in all samples in the study, creating a powerful genetic fingerprint for distinguishing Papillary Type 2 RCC through a minimal, targeted panel.

This research inspired the study presented in this thesis. The same sequence of 43 genes from the p-arm of chromosome 9 was studied in Papillary Type 1 RCC samples. Our hope is to identify copy number variants, single nucleotide variants, and small insertions and deletions that can be used to genetically distinguish Papillary Type 1 RCC from Papillary Type 2 RCC. The long-term goal is the development of a genetic panel to differentiate the subtypes through a small needle biopsy and rapid sequencing of a targeted DNA region. Quick, non-invasive methods for differentiating these morphologically similar but clinically disparate subtypes would be invaluable to physicians in determining the most effective treatment plan for RCC patients.

Next Generation Sequencing

Next generation sequencing provides a method for sequencing millions of DNA strands simultaneously, giving extremely high throughput and allowing ever larger samples of DNA to be studied at ever greater depth (the number of reads covering each base), which improves accuracy (Walker, 2009). Next generation sequencing operates on the principle of recording the signal given off when a single base is incorporated into a fragment of DNA, but it does so in a massively parallel way, allowing for the sequencing of entire genomes or exomes from various organisms. Targeted sequencing using next generation sequencing allows a small subset of genes or any defined region in a genome to be studied: researchers can choose the regions most relevant to a particular disease instead of sifting through the entire genome (Harismendy et al., 2009).
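The relationship between throughput, target size, and depth can be made concrete with a short calculation. The following Python sketch estimates mean depth as total on-target bases divided by target size. The read count, read length, and on-target fraction are hypothetical values chosen for illustration; only the 112.5 kb target size comes from this study.

```python
def mean_depth(n_reads, mean_read_len, target_bp, on_target=1.0):
    """Expected mean depth = reads * read length * on-target fraction / target size."""
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    return n_reads * mean_read_len * on_target / target_bp


# Hypothetical run: 500,000 reads of 200 bp, 60% of them on target,
# spread over a 112.5 kb targeted region.
depth = mean_depth(500_000, 200, 112_500, on_target=0.6)
print(f"approximate mean depth: {depth:.0f}x")
```

The same number of reads spread over a whole genome would give a depth several orders of magnitude lower, which is the practical argument for restricting sequencing to a small targeted region.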

Next generation sequencing can be built on three different technological models and methods for collecting data (Mardis, 2008). The first is called pyrosequencing, which records the incorporation of each nucleotide into a template strand of DNA by measuring light produced by an enzyme called luciferase. DNA fragment libraries are first created via PCR. Next, microscopic beads are added to the solution and emulsion PCR is used to decorate the beads with copies of the template strands. These beads are loaded into tiny wells in a sequencer, where the light emitted during incorporation can be measured. The Ion Torrent Personal Genome Machine (PGM) used in this study operates on a similar principle, but instead measures minuscule changes in pH due to the incorporation of bases into a template strand.

The second most common method takes a slightly different approach. Genomic DNA is prepared and then attached at one end to a flat surface, the inside of flow cell channels. Bridge amplification is used to create clusters of strands adhered to the surface. DNA polymerase is then used to extend each strand one fluorescently labeled nucleotide at a time, and optical devices are used to determine the sequence.

The third method combines elements of the first two. DNA strands are bound to magnetic beads, which are then deposited onto a flow cell rather than into wells. A ligase-mediated approach is used instead of a DNA polymerase approach, in which two bases are encoded in quick succession. A more advanced optical system of lasers is used to excite fluorescently labeled nucleotides after they are incorporated into the strands, and this fluorescence is used to determine which pair of bases was incorporated, rather than single bases. This dual-base encoding allows sequencing errors to be distinguished from true mutations.
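The error-checking property of two-base encoding can be illustrated with a small sketch. It assumes the commonly described four-color scheme in which each color is the XOR of 2-bit codes for two adjacent bases (A=0, C=1, G=2, T=3). The function names and example sequences are illustrative and are not taken from any vendor software.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode(seq):
    """Color i is the XOR of the 2-bit codes of bases i and i+1."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Rebuild the bases from a known first base plus the color calls."""
    seq = [first_base]
    for c in colors:
        seq.append(BASE[CODE[seq[-1]] ^ c])
    return "".join(seq)


def color_mismatches(ref, colors):
    """Indices where the observed colors differ from the reference colors."""
    return [i for i, (r, c) in enumerate(zip(encode(ref), colors)) if r != c]


ref = "ACGTAC"
snv = "ACTTAC"                      # one substituted base (G>T at index 2)
print(color_mismatches(ref, encode(snv)))   # a true SNV changes two adjacent colors
print(color_mismatches(ref, [1, 3, 2, 3, 1]))  # an isolated color change suggests a read error
```

Because every base contributes to two neighboring colors, a real single-base substitution alters two adjacent color calls, whereas a single miscalled color appears as one isolated mismatch against the reference.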

Working with the expansive datasets produced by massively parallel sequencing creates a unique and complicated problem in data analysis and management. First, the large files created are cumbersome to manipulate and require an expansive IT infrastructure to safely store and transfer for various analyses. Second, data sets of this magnitude require unique statistical techniques for analysis (Coe et al., 2014). Third, many of the publicly available algorithms call bases using completely different statistical models, which gives seemingly random concordance and discordance between variant callers (Li & Olivier, 2013). Manual curation and verification via methods such as Sanger sequencing remain the most reliable methods for calling variants. In previous research endeavors, we worked alongside a software company (GenomOncology, Westlake, OH) to develop a shell program that runs a battery of variant calling algorithms and then compares the results, allowing a user to find variants called in common across specific subsets of samples and the tools used to analyze them. This software package also exports data to outside databases that can predict sequence changes based on variants, whether these variants are predicted to have deleterious effects on a gene product, and whether variants are implicated in other diseases and cancers.
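The cross-caller comparison described above amounts to a set intersection over variant calls. The following Python sketch illustrates the idea only. It is not the GenomOncology implementation, and the caller names, variant keys, and positions are hypothetical.

```python
from collections import Counter


def concordant(calls_by_caller, min_callers=None):
    """Return variants reported by at least `min_callers` callers.

    calls_by_caller maps a caller name to a set of variants, each keyed as
    (chromosome, position, reference allele, alternate allele).
    By default a variant must be reported by every caller.
    """
    if min_callers is None:
        min_callers = len(calls_by_caller)
    counts = Counter(v for calls in calls_by_caller.values() for v in set(calls))
    return {v for v, n in counts.items() if n >= min_callers}


# Hypothetical calls, for illustration only.
calls = {
    "caller_a": {("chr9", 1000, "A", "G"), ("chr9", 2000, "C", "T")},
    "caller_b": {("chr9", 1000, "A", "G"), ("chr9", 3000, "G", "A")},
    "caller_c": {("chr9", 1000, "A", "G"), ("chr9", 2000, "C", "T")},
}
print(concordant(calls))             # reported by all three callers
print(sorted(concordant(calls, 2)))  # reported by at least two callers
```

Requiring agreement among all callers trades sensitivity for specificity: it discards calls that only one statistical model supports, at the cost of possibly missing true variants that some callers fail to detect.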

Hallmarks of Cancer

Tumorigenesis is believed to occur through mutations in specific genes known as “driver” genes. These genes can be tumor suppressors or oncogenes. A tumor suppressor is any gene that keeps a cell from developing into a cancerous one; mutation of a tumor suppressor such that its function is lost can lead to cancer. An oncogene is any gene involved with cell growth, differentiation, or division that has been modified in a way that causes cells to become malignant. Rather than occurring in one gene, mutations may occur in multiple genes that converge on a single pathway leading to tumorigenesis. Identification of mutations in subtypes of cancer can not only provide new molecular biomarkers to differentiate disease, but may also provide new therapeutic targets that could be treated to stop cancer cell proliferation.

There are six biological capabilities acquired during the development of cancer cells. Mutations found through our study to converge on genes or pathways involved in these six “hallmarks” of cancer could act as potential targets for molecular therapy (Hanahan & Weinberg, 2011). The following is a summary of the capabilities identified by Hanahan and Weinberg in the seminal paper, “Hallmarks of Cancer: The Next Generation.”

Sustaining proliferative signaling – This involves the ability of cancer cells to replicate indefinitely, one of their defining characteristics. Normal cells are controlled by growth-promoting or growth-suppressing hormones that keep them confined within a typical cell growth and division cycle. Cancer cells find a way to interfere with these signals and therefore control their own fate. For example, cancer cells can continually produce growth factors that cause them to keep proliferating at expansive rates, or they can influence other factors such as energy metabolism. Not much is known about the release of these signals in normal cells and the complicated pathways that they act through, since these growth factors are transmitted via paracrine signaling (cells release hormones so that the closest cells receive the most hormone and more distant cells receive significantly less), which can be very difficult to study experimentally. These mechanisms are much better understood in cancer cells. Cancer cells may sustain proliferative signaling by producing more growth factors themselves, stimulating tumor-supportive cells nearby, producing an excess of receptors that makes a cell much more responsive to what should be a limited amount of growth factor, or altering the structure of these receptors to increase their affinity for a growth factor.

Evading growth suppressors – Cancer cells often find ways to avoid the signals that are designed to discourage uncontrolled proliferation. They learn to work around cellular programs that negatively regulate this behavior: essentially, cancer cells learn how to beat tumor suppressors. These tumor suppressors may work to limit cell growth or proliferation. Two classic examples of tumor suppressors are the RB and TP53 genes. RB integrates signals from sources both inside and outside of the cell and decides whether or not a cell should continue through the cell cycle. TP53 receives signals when a cell is stressed and can halt progression through the cell cycle until those stresses have been resolved. If a cell lacks a functional copy of either of these genes, it is easy to see how it could ignore signals to stop proliferating and simply continue to grow. Cells also learn to evade contact inhibition, a mechanism by which cell-to-cell contact inhibits further growth: if other cells are nearby, there is no need to continue to proliferate. Merlin, the protein product of the NF2 gene, controls contact inhibition by coupling E-cadherin molecules to transmembrane EGF receptors. Merlin strengthens the adhesion of these two molecules and limits cells’ ability to emit mitogenic signals. If this gene is not functioning properly, contact inhibition cannot provide the negative feedback used to prevent further proliferation.

Resisting cell death – Apoptosis, or programmed cell death, is a well-known mechanism used to prevent cancer development. However, cells may acquire the ability to ignore the signals used to induce apoptosis (stresses the cancer cells experience during tumorigenesis or even in response to chemotherapy) and, therefore, can become highly malignant and even resist therapy. Typical stressors that would normally induce apoptosis include heavily damaged DNA and very high oncogene signaling. The TP53 tumor suppressor induces apoptosis by upregulating the expression of proteins in the apoptotic pathway. Inactivation of this gene will keep these protein levels low and, therefore, apoptosis will not occur. TP53 acts as a “sensor” for critical damage, and a lack of awareness of critical damage due to its inactivation can encourage tumorigenesis. Cells may also die by necrosis, in which cells become bloated, explode, and release their contents to the local environment. Although necrotic cells are known to release pro-inflammatory signals that activate the immune system, there is also evidence to suggest that inflammatory cells may actually be tumor promoting: they can foster angiogenesis, proliferation, invasiveness, etc. Necrotic cells may also release factors that stimulate proliferation through paracrine signaling, similar to that discussed above.

Enabling replicative immortality – Cancer cells can acquire unlimited replicative potential that allows them to form large tumors. This is in stark contrast to the majority of cells in the body, which are only able to live through a fixed number of cell cycles before they die out. This limit has been linked to two major processes, senescence and crisis. Senescence is a state of nonproliferation in which the cell is still living, while crisis involves cell death. Cells within a tumor can be induced into senescence, where they will live on until a crisis is reached and the majority of the cell population dies. However, sometimes one cell can survive this crisis and, as a result, exhibit an unlimited and unstoppable replicative ability, or immortalization. Telomeres on the ends of chromosomes have been found to be central components of this process. Telomerase, a specialized DNA polymerase that adds to telomeres, is overexpressed in about 90% of all cancer cells. Extending DNA on the telomeres counters the typical erosion of telomeres that, after a certain point, is used to signal cell death.

Inducing angiogenesis – Tumors require nutrients, oxygen, and methods for disposing of waste just as normal cells do. Therefore, the ability of a tumor to increase vasculature in its microenvironment can provide cancer cells with all of the raw materials needed for further proliferation. Mutations that “turn on” angiogenesis in a tumor can be vital in helping it grow and expand, and once the switch is on, it usually does not turn off. Genes typically involved in embryonic development may be activated in cancerous cells to induce angiogenesis and feed tumors. For example, the VEGF-A gene encodes ligands that help control new blood vessel growth during embryonic development. If this gene is upregulated by an oncogene, tumor cells may increase blood vessel growth in their immediate area, allowing them to grow larger and proliferate more quickly. The blood vessels in tumors are unlike those in normal tissues. Their vasculature is characterized by capillary sprouting, twisted and excessive vessel branching, distorted vessels, enlarged vessels, microhemorrhaging and leaking, etc. Cells do possess angiogenesis inhibitors that work as tumor suppressors against the formation of this vasculature, but if these genes are inactivated, their gate-keeping function is lost.

Activating invasion and metastasis – This is the hallmark most critical to tumors becoming extremely malignant and spreading throughout the body. Reduction of E-cadherin on the surface of cancer cells is known to increase the likelihood of invasion, as E-cadherin binding typically suppresses growth and keeps a cancerous cell from invading an epithelial layer. Genes encoding cell-to-cell interactions and cell-to-endothelial interactions have been demonstrated to be affected in some of the most aggressive forms of cancer. Invasion and metastasis have commonly been described as a sequence of discrete steps that must be completed in order, beginning with local invasion, then movement into nearby blood and lymph vessels, then movement through the lymphatic and circulatory systems, and finally escape of the cancer cells from these vessels at a distant location in the body and the growth of a metastasis. One particular program, the epithelial to mesenchymal transition, has recently been implicated as a method by which transformed cells can acquire the ability to invade new regions, resist cell death, etc. Essentially, a cell regresses to a less-differentiated state and can then acquire new traits that allow it to invade and metastasize. Studies also suggest that specific types of cancer have specific invasion patterns.

The Study

We previously identified a possible genomic “hotspot” in Renal Cell Papillary Type 2 Carcinoma based on high sensitivity SNP and microarray analysis (Krill-Burger et al., 2012). All 6 tumor samples we evaluated demonstrated similar copy number losses in a 19.9 kb domain of chromosome 9. This region is of interest as a prospective diagnostic biomarker and as a genomic domain that may contribute to underlying mechanisms of renal tumorigenesis. The need for a molecular biomarker to help classify Renal Papillary neoplasms (Type 1 vs. Type 2) is particularly important because a) it is often challenging to distinguish between papillary cancers based on morphology alone and b) Papillary Type 1 tumors are slow growing while Papillary Type 2 specimens are aggressive with unpredictable growth patterns.

Using next generation DNA sequencing on the Ion Torrent Personal Genome Machine, we conducted a targeted sequencing study of Papillary Type 1 Renal Cell Carcinoma (RCC) focusing on the same hotspot area identified in our previous study of Papillary Type 2 RCC. Six paired tumor and normal samples were studied. We sought to identify single nucleotide variants (SNVs), small insertions and deletions (INDELs), and copy number variants (CNVs) that could be used to create a genetic panel to differentiate the papillary subtypes. Additionally, the results of the study may identify possible therapeutic targets for treating Papillary Type 1 Renal Cell Carcinoma and for understanding the process of tumorigenesis in this particular cancer.

Figure 2 - Copy Number Profiles of RCC Subtypes

The profiles above were found in a study by Krill-Burger et al. in 2012. PAP2 RCC samples shared common copy number deletions in the p-arm of chromosome 9. This “hotspot” region became the focus of a targeted sequencing study on PAP2 RCC that inspired the study on PAP1 RCC presented below.

Chapter 2

Methods

Tissue Samples

Six paired samples (12 samples total) of Papillary Type 1 Renal Cell Carcinoma tumors and matched adjacent normal tissue were obtained from the University of Pittsburgh Health Sciences Tissue Bank. Surgical pathologists classified tumor specimens as Papillary Type 1. Tissue specimens were frozen, embedded in Optimal Cutting Temperature (OCT) blocks, and stored at -80° C. Tissue was bulk dissected from the cutting blocks before DNA was extracted.

DNA Extraction

DNA was extracted from tissue and purified using the Qiagen Genomic Tip protocol. The Genomic-tip 100/G “midi-prep” and 20/G “mini-prep” were used, which typically extract 10-100 μg of DNA (binding capacity 100 μg) and 1-20 μg (binding capacity 20 μg), respectively. In this protocol, tissue is lysed and an anion-exchange resin and serial washes are used to first bind DNA to the resin, wash the DNA free of impurities, and then elute the DNA from the resin, after which it is precipitated and collected by centrifugation. DNA purification is based on the interaction between the negatively charged phosphate groups in the DNA backbone and positively charged diethylaminoethyl (DEAE) groups on the resin’s surface. Buffers of varying pH and salt concentrations are used either to wash the DNA of impurities (proteins, RNA, etc.) or, finally, to elute the DNA off of the column and collect it.

First, tissue was homogenized in a 2 mL tube in 1.5 mL of the G2 buffer from the Genomic Tip kit. This solution was transferred to a 15 mL conical tube and enough G2 buffer was added to bring the volume of the solution to 2 mL. 100 μL of Proteinase K was added to the mixture, which was left to incubate for 2 hours at 50° C. After incubation, each sample was vortexed for 20 seconds, and 5 of the samples were placed into mini Genomic-tip tubes while 1 (644N) was placed in a midi-sized tube. 644N was transferred to a larger tube to avoid clogging the anion-exchange resin because a large tissue sample was used. Before transfer, each mini-prep tube was equilibrated with 1 mL of Buffer QBT while the midi-prep tube was equilibrated with 4 mL of Buffer QBT, and the tubes were emptied via gravity flow. This step ensured that the resin was wet before the sample was loaded. After loading the sample, the tube was again allowed to empty by gravity flow. The mini Genomic-tips were then washed 3 times with 1.0 mL of Buffer QC and the midi Genomic-tip was washed 2 times with 7.5 mL of Buffer QC. The DNA was eluted into 6 new 15 mL conical tubes with pre-warmed (50° C) Buffer QF (2 x 1 mL for mini, 1 x 5 mL for midi). The DNA was precipitated by the addition of isopropanol (1.4 mL for mini, 3.5 mL for midi). Each tube was then centrifuged (6500 x g, 4° C, 15 minutes) and the supernatant removed, leaving a pellet of DNA at the bottom of each conical tube. 1 or 2 mL of ethanol was added to each tube (mini or midi, respectively), and the samples were vortexed and spun again (6500 x g, 4° C, 10 minutes). Excess ethanol was removed by pipette, the samples were dried for 10 minutes, and 50 μL of low TE buffer (very low levels of EDTA) was used to resuspend the samples overnight.

Figure 3 - Extraction Process (QIAGEN)

Above is a graphic representation of the Qiagen Genomic Tip DNA extraction process. Tissue is lysed and placed into the top of the flow column, which contains an anion-exchange resin that binds to the phosphate backbone of DNA. Buffers of varying pH and salt concentrations are washed over the column to clean the DNA and finally elute it from the column.

Fragment Library Construction

Fragment libraries for the 6 tumors (12 paired samples) were created using biotinylated DNA hybridization probes to enrich for 112.5 kilobases of targeted sequence. These regions comprised 43 genes on the p-arm of chromosome 9 (p24.3 to p22.3). The Qubit 2.0 fluorometer was used to determine the concentration and quality of the DNA (260/280 ratio > 1.8), and molecular size was determined via the Bioanalyzer 2100. After determining concentrations, 1 μg of double-stranded DNA was enzymatically sheared using the Ion Shear Enzyme kit to generate 200-300 bp DNA fragments. The enzymes must be empirically tested to determine sufficient yet limited reaction times to achieve the correct fragment size, and the Qubit and Bioanalyzer must be used after each enzymatic shearing test to determine fragment sizes. After the DNA has been sheared, a P1 adapter and barcode adapters are ligated to the DNA fragments. The P1 adapter is a nucleotide sequence that allows the fragments to be amplified before sequencing. The barcode adapters are nucleotide sequences attached to the libraries so that tumor and normal samples can be sequenced simultaneously under the same conditions, while their sequences can still be separated during data processing. The fragmented, adapter-ligated DNA was size selected using gel electrophoresis to be approximately 330 bp in length, including approximately 80 bp of adapters.
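The way barcode adapters let pooled samples be separated during data processing can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the barcode sequences, sample names, and reads are hypothetical (real Ion Xpress barcode sequences differ), and matching is exact, whereas production software typically tolerates sequencing errors in the barcode.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode.

    reads: iterable of read sequences (strings)
    barcodes: dict mapping barcode sequence -> sample name
    Reads that match no barcode are collected under 'unassigned'.
    """
    bins = {name: [] for name in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        for bc, name in barcodes.items():
            if read.startswith(bc):
                bins[name].append(read[len(bc):])  # keep only the insert
                break
        else:
            bins["unassigned"].append(read)
    return bins


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTAC": "tumor_1", "TGCATG": "normal_1"}
reads = ["ACGTACGGATTC", "TGCATGCCTAAG", "GGGGGGAAAAAA"]
bins = demultiplex(reads, barcodes)
for sample, seqs in bins.items():
    print(sample, seqs)
```

Because tumor and normal libraries share one chip and one run, any run-to-run variation affects both equally, and the barcode alone determines which sample a read is credited to.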

These libraries were amplified in the GeneAmp PCR System 9700 (12 cycles; 95° C: 15 sec; 58° C: 15 sec; 72° C: 1 min) using the Ion Xpress Plus gDNA Fragment Library Kit, and equimolar amounts (250 ng/μL) of normal and matched tumor DNA were used so that after amplification they could still be compared when determining copy number fold changes. The DNA was captured on streptavidin beads, washed, and subjected to another round of amplification (5 cycles; 95° C: 2 min; 58° C: 15 sec; 72° C: 1 min). It was then eluted in 50 μL low TE and subjected to a second round of hybridization, capture, wash, and amplification, identical to the first except that the second amplification was 12 cycles in length.

Targeted Sequencing of Matched Tumor-Normal Specimens

The OneTouch 2 carried out emulsion PCR on the libraries. Essentially, DNA fragments are loaded into the device and amplified to decorate beads called Ion Sphere Particles (ISPs). The goal is to create oil emulsion droplets that each contain 1 bead covered in copies of 1 unique DNA fragment. Each droplet contains free nucleotides and enzymes for the amplification process. If multiple DNA fragments are in one droplet, a polyclonal bead is produced that cannot be sequenced.
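The trade-off between empty, monoclonal, and polyclonal beads can be illustrated with a simple statistical sketch. It assumes, as a common simplification that is not taken from the instrument documentation, that the number of template fragments sharing a droplet with a bead follows a Poisson distribution with mean `lam`. The values of `lam` below are hypothetical.

```python
import math


def droplet_fractions(lam):
    """Poisson model of templates per bead-containing droplet (mean = lam)."""
    p0 = math.exp(-lam)        # no template: the bead stays empty
    p1 = lam * math.exp(-lam)  # exactly one template: monoclonal bead
    return {"empty": p0, "monoclonal": p1, "polyclonal": 1.0 - p0 - p1}


# Hypothetical template-to-bead ratios.
for lam in (0.1, 0.5, 1.0):
    fractions = droplet_fractions(lam)
    print(lam, {k: round(v, 3) for k, v in fractions.items()})
```

Under this model, diluting the library lowers the polyclonal fraction but leaves more beads empty, which is consistent with the need to enrich for template-positive ISPs before sequencing.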

The enriched ISPs were sequenced on the Ion Torrent Personal Genome Machine (PGM). The PGM was initialized to a pH of 7.5. ISPs are loaded into a chip with micro-wells in it, one ISP to one well. The chip is then placed onto the PGM and run through 500 cycles of washes with nucleotides. Each time a nucleotide washes over the ISPs, DNA polymerase incorporates it into the strand (when appropriate) and a proton is released. A semiconductor on the bottom of the chip senses the tiny changes in pH, and when a base is incorporated, the PGM records which nucleotide was washed over the well and uses that to determine the sequence of the DNA on the ISP.
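The flow-by-flow logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the PGM's base caller: it assumes a noise-free signal proportional to the number of bases incorporated per flow, and it works directly with the strand being synthesized. The flow order is the one listed under Analysis Details in the run reports of Appendix A; the example strand (starting with the library key TCAG) and the function names are made up for illustration.

```python
# Flow order as listed in the Appendix A run reports.
FLOW_ORDER = "TACGTACGTCTGAGCATCGATCGATGTACAGC"

def simulate_flowgram(synthesized, flow_order=FLOW_ORDER, n_flows=32):
    """Return (nucleotide, bases incorporated) for each nucleotide flow.

    A homopolymer run is incorporated in a single flow, so the idealized
    signal equals the run length; a non-matching flow gives 0.
    """
    pos = 0
    flowgram = []
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        flowgram.append((base, run))
    return flowgram

def call_bases(flowgram):
    """Rebuild the sequence from the per-flow incorporation counts."""
    return "".join(base * run for base, run in flowgram)

strand = "TCAGGATTC"  # hypothetical strand: library key TCAG + made-up insert
flowgram = simulate_flowgram(strand)
called = call_bases(flowgram)
```

In this idealized model the called sequence matches the input strand; on a real instrument the signal is noisy, and long homopolymer runs are a typical source of insertion and deletion errors.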

Figure 4 - Ion Torrent PGM Semiconductor (Life Technologies)

The figure above shows the interface between the Ion Torrent sequencing chip and PGM. Decorated beads fill the micro-machined wells, and the wells are washed with nucleotides. When a nucleotide is incorporated into bead strands, the ion-sensitive layer measures the release of H+ ions and the proprietary Ion sensor records the information to generate a sequence for the DNA bound to the bead in each well.

Data Analysis

Sequence base calls from the PGM were mapped and aligned to HG19 (human genome version 19) using the Burrows-Wheeler Aligner (BWA). After mapping, the data were stored in BAM files, which are binary versions of SAM files. A SAM file is a tab-delimited text file that contains sequence alignment data. Each BAM file has an associated BAI index file that lets data analysis software locate and read regions of the corresponding BAM file. A battery of variant calling algorithms was implemented in Galaxy, a platform developed here at Penn State: GATK’s Unified Genotyper v2.3; IonTorrent Variant Caller plugin v3.4; and MuTect v1.1.4. Variant call files (.vcf) were created and loaded into the GenomOncology software. GenomOncology allows users to upload multiple variant call files generated by a variety of software to test their concordance and find variants common to all of the variant callers. Because each program calls a wide range of variants, we looked for calls common to all of the algorithms to avoid false positives. We compared the variants against HG19 and annotated them as non-synonymous single nucleotide variants (SNVs) or small insertions and deletions (INDELs, <5 bases), by somatic or germline origin, and by whether they are included in the dbSNP database. Somatic variants are those found in the tumor sequences but not the paired normal sequences, while germline variants are found in both. We filtered SNVs using PolyPhen-2 and SIFT (Sorting Intolerant From Tolerant) to determine whether the variants were deleterious or benign. INDELs were analyzed using SIFT-INDEL to determine whether they were tolerated or frameshift mutations. SNVs and INDELs were confirmed by direct, manual curation of the BAM file sequences using the Integrative Genomics Viewer (IGV) to further guard against erroneous calls.

Copy number variants (CNVs) were identified using exon base pileups from IGV restricted to coding domains only. Briefly, the exons were ranked from highest to lowest average number of base calls in the normal samples and compiled in bins of 30 exons, and each exon value was normalized to the total counts within its bin. This process was repeated for each matching tumor, ranked in concordance with its paired normal sample. Statistical comparisons of normalized exon base counts were performed using Student’s paired t-test to evaluate each exon among the 6 normal-tumor pairs, with the level of significance corrected for multiple testing (false discovery; p < 0.02).
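The binning and normalization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in the study: the function names are hypothetical, the inputs are plain lists of per-exon base counts, and only the paired t statistic is computed (the p-value and the multiple-testing correction are left to a statistics package).

```python
from math import sqrt
from statistics import mean, stdev

def normalize_in_bins(normal_counts, tumor_counts, bin_size=30):
    """Rank exons by normal coverage (high to low), group them into bins,
    and divide each exon's count by the total of its bin.
    The tumor sample reuses the ranking of its paired normal sample."""
    order = sorted(range(len(normal_counts)),
                   key=lambda i: normal_counts[i], reverse=True)
    norm_n = [0.0] * len(order)
    norm_t = [0.0] * len(order)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        total_n = sum(normal_counts[i] for i in members)
        total_t = sum(tumor_counts[i] for i in members)
        for i in members:
            norm_n[i] = normal_counts[i] / total_n if total_n else 0.0
            norm_t[i] = tumor_counts[i] / total_t if total_t else 0.0
    return norm_n, norm_t

def paired_t_statistic(normal_values, tumor_values):
    """Paired t statistic for one exon across the normal-tumor pairs."""
    diffs = [t - n for n, t in zip(normal_values, tumor_values)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Toy counts for four exons, with a bin size of 2 to keep the example small.
norm_n, norm_t = normalize_in_bins([100, 50, 200, 50], [80, 50, 200, 70], bin_size=2)
t_stat = paired_t_statistic([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Normalizing within bins of similarly covered exons keeps a few very deep exons from dominating the comparison, which is presumably why the exons are ranked before binning.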

Chapter 3

Results and Discussion

Initially, we used three different variant callers to identify mutations in the DNA sequences: the IonTorrent variant caller plugin, the Genome Analysis Toolkit (GATK) Unified Genotyper, and Washington University’s VARSCAN2 algorithm. VARSCAN2 has been shown to be the most reliable variant caller in detecting tumor-specific mutations, so it was the primary algorithm used in detecting mutations for the study. The Ion Torrent variant caller and GATK were used to verify mutations found by VARSCAN2. When analyzing the sample pairs for tumor-specific variants, we found none using the Ion Torrent variant caller. The Ion Torrent variant caller is designed to be used at the clinical level and is much more stringent than necessary for initial research, while the GATK algorithm is too forgiving in making variant calls. VARSCAN2 found 12 single nucleotide variants (SNVs) across 5 genes and 35 deletions across 22 genes that were common to all 6 tumors sequenced for the study. No insertions were found that were common to all 6 tumors. All variants were designated as possibly or probably damaging by SIFT for SNVs and by SIFT-INDEL for deletions. Manual curation of the data revealed that these variants were also detected in the paired normal specimens, although at much lower frequencies. The presence of these mutations in the paired adjacent normal specimens could result from cancer field effects, the phenomenon in which areas outside of the primary tumor have still been affected by carcinogenic agents. Additionally, heterogeneity in the adjacent normal tissue sample could be responsible for the observed mutations in the normal specimen. The variants identified by VARSCAN2 are listed in Table 1 below.

Table 1 - Tumor Specific Variants Found In All Tumor Samples

Gene       VT    Position    Gene              VT    Position
SMARCA2    Del   2123873     TPD52L3           Del   6328715
SMARCA2    Del   2182210     TPD52L3           Sub   6328947
VLDLR      Del   2643641     UHRF2             Del   6413577
VLDLR      Del   2645677     UHRF2             Sub   6413582
KIAA0020   Sub   2804365     UHRF2             Sub   6413584
KIAA0020   Del   2824788     UHRF2             Sub   6413591
KIAA0020   Del   2837296     UHRF2             Sub   6413621
RFX3       Del   3225134     UHRF2             Sub   6481710
GLIS3      Del   3932361     GLDC              Sub   6556188
GLIS3      Del   4117832     GLDC              Del   6587248
GLIS3      Del   4118472     GLDC              Sub   6604594
C9orf68    Del   4605351     GLDC              Del   6610227
C9orf68    Del   4661905     GLDC              Del   6610249
JAK2       Del   5054799     JMJD2C (KDM4C)    Del   6893227
JAK2       Del   5089728     C9orf123          Del   7798579
INSL4      Del   5233654     C9orf123          Sub   7799526
C9orf46    Del   5361173     C9orf123          Sub   7799583
CD274      Del   5462913     C9orf123          Sub   7799653
KIAA1432   Del   5753614     PTPRD             Del   8319957
ERMP1      Del   5801292     PTPRD             Del   8484199
KIAA2026   Del   5920370     TYRP1             Del   12694009
KIAA2026   Del   5920456     MPDZ              Del   13175754
KIAA2026   Del   5968044     MPDZ              Del   13206081
RANBP6     Del   6014890     MPDZ              Del   13217183

Table 1 shows tumor specific substitutions and deletions found using the VARSCAN2 variant caller. All substitutions were marked as possibly damaging or probably damaging by SIFT. All deletions were marked as frameshift deletions by SIFT-INDEL. Blue highlight indicates that the same variant was detected by both the IonTorrent variant caller and GATK UGT as well. Yellow indicates that the same variant was also detected in our lab’s study of PAP2. Green indicates that it was called by other variant callers and is also in the PAP2 dataset.
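The cross-caller check and the somatic versus germline distinction described in the Data Analysis section can be sketched with set operations. The coordinates and the per-caller call lists below are toy values invented for illustration and do not correspond to any result in this study; the function names are hypothetical, and each variant is represented as a (chromosome, position, reference, alternate) tuple.

```python
def concordant_calls(calls_by_caller):
    """Return the variants reported by every caller."""
    call_sets = [set(calls) for calls in calls_by_caller.values()]
    return set.intersection(*call_sets)

def classify_origin(variant, tumor_calls, normal_calls):
    """Somatic: called in the tumor only. Germline: called in both."""
    if variant not in tumor_calls:
        return "not called in tumor"
    return "germline" if variant in normal_calls else "somatic"

# Toy example data (made up): three callers, three candidate variants.
calls = {
    "caller_a": [("9", 100, "C", "T"), ("9", 200, "G", "A"), ("9", 300, "A", "-")],
    "caller_b": [("9", 100, "C", "T"), ("9", 300, "A", "-")],
    "caller_c": [("9", 100, "C", "T"), ("9", 200, "G", "A")],
}
shared = concordant_calls(calls)

tumor = {("9", 100, "C", "T"), ("9", 200, "G", "A")}
normal = {("9", 200, "G", "A")}
origin_100 = classify_origin(("9", 100, "C", "T"), tumor, normal)
origin_200 = classify_origin(("9", 200, "G", "A"), tumor, normal)
```

Requiring agreement among all callers trades sensitivity for specificity: a real variant missed by the most stringent caller is dropped, which matches the observation above that strict intersection with the Ion Torrent caller would have left no tumor-specific calls.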

One of the mutations found in all PAP1 tumor specimens was also found in Papillary Type 2 (PAP2) specimens in a previous study done by the lab. With the exception of this variant, all of those listed in the table above could comprise a future genetic panel that could be used to analyze a fine-needle biopsy of a kidney tumor of undetermined type to classify it as PAP1 or PAP2. Additionally, since all of the mutations listed above were identified as having possible deleterious effects during translation, any of the variants could be involved in the final pathway leading to PAP1 tumorigenesis. Table 2 summarizes the genes in which variants were found and whether they are implicated in other cancer types.

Table 2 - Genes With Variants

Gene        Expression In Other Cancer Types
SMARCA2     Lung adenocarcinoma, clear cell RCC
VLDLR       Clear cell RCC
KIAA0020    N/A
RFX3        N/A
GLIS3       N/A
C9orf68     N/A
JAK2        Renal cancer
INSL4       Breast cancer
C9orf46     Breast cancer, ductal carcinoma
CD274       Large B-cell lymphoma, cholangiocarcinoma
KIAA1432    N/A
ERMP1       N/A
KIAA2026    N/A
RANBP6      N/A
TPD52L3     N/A
UHRF2       Breast cancer
GLDC        N/A
KDM4C       Esophagus squamous cell carcinoma, prostate cancer
C9orf123    N/A
PTPRD       Glioblastoma, lung, head and neck squamous cell cancer
TYRP1       Melanoma
MPDZ        N/A

The genes above contained mutations common to all 6 tumor samples sequenced for the study. Ten of them have been previously implicated in other cancer types.

A short description of the genes identified above follows:

SMARCA2 is an SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily A, member 2. Members of this gene family show helicase and ATPase activities and are believed to regulate the transcription of other genes by altering the chromatin structure around those genes (Genecards). SMARCA2 (also known as BRM) has been found to be essential for the growth of tumor cells in BRG1/SMARCA4-deficient cancers such as lung adenocarcinomas. SWI/SNF complexes are implicated in many different cancer subtypes, including clear cell RCC (Hoffman et al., 2014).

VLDLR (Very Low Density Lipoprotein Receptor) plays a role in triglyceride metabolism by mediating increased uptake and regulation of lipids and the reelin signaling pathway (Genecards). Upregulated VLDLR has been found in studies of clear cell RCC and may even be responsible for their distinct morphology (Sundelin et al., 2012).

GLIS3 (GLIS family zinc finger 3) is a transcription factor involved in kidney development, but not necessarily in cancer (Kang, Beak, Kim, Herbert, & Jetten, 2009).

JAK2 (Janus kinase 2) is implicated in polycythemia vera, essential thrombocythemia, and myeloid metaplasia when it is constantly activated (Gilliland, 2005). A JAK2 effector, erythropoietin, has been found to promote renal tumor proliferation under hypoxic conditions (Miyake et al., 2013).

INSL4 (Insulin-like 4) is a gene involved in embryonic development in early placental cytotrophoblasts and syncytiotrophoblasts (Genecards). Growth factor pro-EPIL was found to be overexpressed in breast cancer cells but not surrounding stromal cells, increasing levels of INSL4 products (Brandt et al., 2002).

C9orf46, also known as PLGRKT (plasminogen receptor, C-terminal lysine transmembrane protein), is involved in the regulation of inflammatory responses and regulates monocyte chemotactic migration (Genecards). PLGRKT is highly expressed in metastatic breast cancer and ductal carcinoma (Mueller & Miles, 2014).

CD274 (CD274 molecule) is involved in the costimulatory signal and is essential for T-cell proliferation and production of IL10 and IFNG (Genecards). It is implicated in primary mediastinal large B-cell lymphoma (Genecards), and suppressive expression of CD274 causes increased tumorigenesis in cholangiocarcinoma (Tamai et al., 2014).

UHRF2 is an E3 ubiquitin-protein ligase, an intermolecular hub protein in the cell cycle network. It may contribute to epigenetic control of gene expression in differentiated animal cells. This ligase ubiquitinates cyclins in an apparently phosphorylation-independent manner to induce G1 arrest. It is implicated in several pathways important to cancer, such as cell cycle regulation, cell proliferation, cell differentiation, and apoptosis (Genecards). UHRF2 is a known oncogene: when it was knocked out in a study of breast cancer cells, further amplification was prevented (Wu et al., 2012).

KDM4C (lysine-specific demethylase 4C) is a histone demethylase associated with esophagus squamous cell carcinoma and androgen independence in prostate cancer (Crea et al., 2012) (Byrne et al., 1998).

PTPRD stands for receptor-type tyrosine-protein phosphatase delta. It is involved in cell adhesion molecule binding and receptor binding. It is implicated in neuron differentiation via presynaptic neural assembly and protein dephosphorylation as well (Genecards). Deactivated contact inhibition is common in cancer cells, so disruptions to cell signaling pathways could be critical in tumorigenesis. PTPRD is a tumor suppressor frequently inactivated in glioblastoma and also implicated in head and neck squamous cell carcinomas and lung cancers (Veeriah et al., 2009).

TYRP1 (tyrosinase-related protein 1) encodes a melanosomal enzyme that is directly involved in the production of melanin in skin (Genecards). This gene has been implicated as a prognostic marker in melanoma metastases (Journe et al., 2011).

MPDZ (multiple PDZ domain protein) is a member of the NMDAR signaling complex that may play a role in synaptic plasticity in excitatory synapses (Genecards). This gene has been implicated in neoplasm formation and is listed in the COSMIC database (the Catalogue of Somatic Mutations in Cancer) (Forbes et al., 2010).

KIAA0020 (Minor Histocompatibility Antigen), RFX3 (Regulatory Factor X 3), C9orf68/SPATA6L (spermatogenesis associated 6-like), KIAA1432 (connexin-43-interacting protein), C9orf123 (transmembrane protein 261), ERMP1 (endoplasmic reticulum metallopeptidase), KIAA2026 (uncharacterized protein), RANBP6 (Ran-binding protein 6), TPD52L3 (tumor protein D52-like 3) and GLDC (glycine dehydrogenase) are not directly implicated in any cancers.

The following figures show copy number fold changes for exonic deletions and amplifications, respectively. Only one statistically significant (p < 0.02) copy number difference, a deletion, was found with a value below the set cut-off value of -1.25. All other significant deletions and all significant amplifications had values very close to +/- 1. Copy number amplifications of single exons are most likely not biologically relevant, as all of the exons that code for a single gene product would need to be amplified to change overall protein expression. The deletion of a single exon, though, could have a significant impact, since losing any part of a protein could alter its structure in such a way as to alter its function as well. It is not unexpected that so few regions of copy number amplification and deletion were found, since the regions sequenced were chosen because they contain copy number differences characteristic of PAP2 RCC; it would have been surprising if there had been many copy number differences in PAP1 samples.


Figure 5 - Significant Copy Number Deletions

Copy number deletions found commonly in all 6 tumor samples (p < 0.02) are shown above. Fold change values indicate the relative depth of coverage comparing tumor to normal samples. For example, a fold change of -2 indicates half the depth of coverage in the tumor sample as in the normal sample.

Figure 6 - Significant Copy Number Amplifications

Copy number amplifications found commonly in all 6 tumor samples (p < 0.02) are shown above. Fold change values indicate the relative depth of coverage comparing tumor to normal samples. For example, a fold change of 2 indicates twice the depth of coverage in the tumor sample as in the normal sample.

RCL1 (RNA Terminal Phosphate Cyclase-Like 1) plays a role in 40S-ribosomal-subunit biogenesis in early pre-rRNA processing and may be implicated in lung squamous cell carcinoma (GeneCards), but the clinical significance of mutations in this gene remains uncertain.
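The signed fold-change convention used in the captions of Figures 5 and 6 can be written out explicitly. The sketch below assumes the convention implied by the two caption examples (a tumor-to-normal depth ratio of 0.5 is reported as -2, and a ratio of 2 as 2); the function name is hypothetical.

```python
def signed_fold_change(tumor_depth, normal_depth):
    """Signed fold change of tumor versus normal coverage.

    Ratios of 1 or more are reported as-is (amplification); ratios
    below 1 are reported as the negative reciprocal (deletion), so
    no value falls strictly between -1 and 1.
    """
    ratio = tumor_depth / normal_depth
    return ratio if ratio >= 1 else -1 / ratio

half = signed_fold_change(50, 100)      # half the depth in the tumor
double = signed_fold_change(200, 100)   # twice the depth in the tumor
cutoff = signed_fold_change(80, 100)    # tumor at 80% of normal depth
```

Under this convention the deletion cut-off of -1.25 corresponds to the tumor having 80% of the normal depth of coverage, and values "very close to +/- 1" correspond to essentially no change.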

Chapter 4

Conclusions and Future Work

Next-generation DNA sequencing of Papillary Type 1 Renal Cell Carcinoma tumor samples and paired normal specimens revealed a set of single nucleotide substitutions and deletions common to all tumors. 11 substitutions across 5 genes and 35 deletions across 22 genes could form a genetic panel that distinguishes PAP1 tumors from other RCC subtypes, particularly PAP2. This panel could be used to create a set of targets for sequencing fine needle biopsy samples of tumors, allowing physicians to use this minimally invasive technique to classify an RCC tumor before proceeding with radical treatments such as chemotherapy or complete nephrectomy. Further targeted sequencing studies on larger sample sizes of PAP2 and PAP1 tumors could confirm the efficacy of this proposed panel as a set of diagnostic biomarkers for RCC. Additionally, many of the genes affected by these single nucleotide substitutions and deletions have been implicated in other cancer types. If confirmed to exist in large samples of PAP1 tumors, these mutations could be important therapeutic targets.

Copy number variant analysis of the PAP1 samples did not generate many results, although one copy number deletion in the RCL1 gene occurred in all 6 samples. The RCL1 gene is implicated in lung squamous cell carcinoma, so it may play a role in tumorigenesis in PAP1 samples. While several other exonic regions were found to have copy number deletions and amplifications across the samples, they did not occur at high enough levels to have downstream effects on the amounts of protein products. While copy number fingerprints could form a potential biomarker panel as well, it is not as convenient to sequence an entire exon when single base locations could be used to form a differential genetic test.

Future Work

Increasing the number of samples will be the first step in moving the discovered genetic mutations into a true diagnostic test for differentiating subtypes of RCC. Studying more samples would most likely narrow the list of common single nucleotide substitutions and deletions, but these variants would hold more statistical power in distinguishing tumor types.

Once a finalized list of biomarkers is developed, the mutations would need to be verified by different sequencing methods. This list of biomarkers can be validated by comparison with sequencing data archives such as The Cancer Genome Atlas (TCGA), but should also be validated via alternate accepted methods of sequencing such as Sanger sequencing.

Bibliography

Brandt, B., Roetger, A., Bidart, J. M., Packeisen, J., Schier, K., Mikesch, J. H., … Buerger, H. (2002). Early placenta insulin-like growth factor (pro-EPIL) is overexpressed and secreted by c-erbB-2-positive cells with high invasion potential. Cancer Research, 62(4), 1020–1024.

Byrne, J. A., Nourse, C. R., Basset, P., & Gunning, P. (1998). Identification of homo- and heteromeric interactions between members of the breast carcinoma-associated D52 protein family using the yeast two-hybrid system. Oncogene, 16, 873–881.

Coe, B. P., Witherspoon, K., Rosenfeld, J. A., van Bon, B. W. M., Vulto-van Silfhout, A. T., Bosco, P., … Eichler, E. E. (2014). Refining analyses of copy number variation identifies specific genes associated with developmental delay. Nature Genetics, 46(10). http://doi.org/10.1038/ng.3092

Crea, F., Sun, L., Mai, A., Chiang, Y. T., Farrar, W. L., Danesi, R., & Helgason, C. D. (2012). The emerging role of histone lysine demethylases in prostate cancer. Molecular Cancer, 11, 52–62.

Forbes, S. A., Tang, G., Bindal, N., Bamford, S., Dawson, E., Cole, C., Kok, C. Y., Jia, M., Ewing, R., Menzies, A., Teague, J. W., Stratton, M. R., & Futreal, P. A. (2010). COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer. Nucleic Acids Research, 38(Database issue), D652–D657.

Geramizadeh, B., Ravanshad, M., & Rahsaz, M. (2008). Useful markers for differential diagnosis of oncocytoma, chromophobe renal cell carcinoma and conventional renal cell carcinoma. Indian Journal of Pathology & Microbiology, 51(2), 167–171. http://doi.org/10.4103/0377-4929.41641

Genecards. (2015). http://www.genecards.org

Gilliland, G. (2005). After Discovering the JAK2 Mutation: Next Steps in Developing Targeted Treatments for PV, ET and MF. http://www.mpnresearchfoundation.org/After-Discovering-the-JAK2-Mutation-3A-Next-Steps-in-Developing-Targeted-Treatments-for-PV-2C-ET-and-MF

Hanahan, D., & Weinberg, R. A. (2011). Hallmarks of cancer: The next generation. Cell, 144(5), 646–674. http://doi.org/10.1016/j.cell.2011.02.013

Harismendy, O., Ng, P. C., Strausberg, R. L., Wang, X., Stockwell, T. B., Beeson, K. Y., … Frazer, K. A. (2009). Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biology, 10(3), R32. http://doi.org/10.1186/gb-2009-10-3-r32

Hoffman, G. R., Rahal, R., Buxton, F., Xiang, K., McAllister, G., Frias, E., … Jagani, Z. (2014). Functional epigenetics approach identifies BRM/SMARCA2 as a critical synthetic lethal target in BRG1-deficient cancers. Proceedings of the National Academy of Sciences of the United States of America, 111(8), 2–7. http://doi.org/10.1073/pnas.1316793111

Journe, F., Id Boufker, H., Van Kempen, L., Galibert, M.-D., Wiedig, M., Salès, F., … Ghanem, G. (2011). TYRP1 mRNA expression in melanoma metastases correlates with clinical outcome. British Journal of Cancer, 105(11), 1726–32. http://doi.org/10.1038/bjc.2011.451

Kang, H. S., Beak, J. Y., Kim, Y.-S., Herbert, R., & Jetten, A. M. (2009). Glis3 is associated with primary cilia and Wwtr1/TAZ and implicated in polycystic kidney disease. Molecular and Cellular Biology, 29(10), 2556–2569. http://doi.org/10.1128/MCB.01620-08

Klatte, T., Pantuck, A. J., Said, J. W., Seligson, D. B., Rao, N. P., LaRochelle, J. C., … Belldegrun, A. S. (2009). Cytogenetic and molecular tumor profiling for type 1 and type 2 papillary renal cell carcinoma. Clinical Cancer Research: An Official Journal of the American Association for Cancer Research, 15(4), 1162–9. http://doi.org/10.1158/1078-0432.CCR-08-1229

Krill-Burger, J. M., Lyons, M. A., Kelly, L. A., Sciulli, C. M., Petrosko, P., Chandran, U. R., … LaFramboise, W. A. (2012). Renal cell neoplasms contain shared tumor type-specific copy number variations. American Journal of Pathology, 180(6), 2427–2439. http://doi.org/10.1016/j.ajpath.2012.01.044

Li, W., & Olivier, M. (2013). Current analysis platforms and methods for detecting copy number variation. Physiological Genomics, 45(1), 1–16. http://doi.org/10.1152/physiolgenomics.00082.2012

Life Technologies. (2015). http://www.lifetechnologies.com/us/en/home/brands/ion-torrent.html

Mardis, E. R. (2008). Next-generation DNA sequencing methods. Annual Review of Genomics and Human Genetics, 9, 387–402. http://doi.org/10.1146/annurev.genom.9.081307.164359

Miyake, M., Goodison, S., Lawton, A., Zhang, G., Gomes-Giacoia, E., & Rosser, C. J. (2013). Erythropoietin is a JAK2 and ERK1/2 effector that can promote renal tumor cell proliferation under hypoxic conditions. Journal of Hematology & Oncology, 6(1), 65. http://doi.org/10.1186/1756-8722-6-65

Mueller, B. M., & Miles, L. A. (2014). A Novel Plasminogen Receptor in Breast Cancer. National Institutes of Health.

QIAGEN Genomic DNA Handbook. (2001).

Rini, B. I., Campbell, S. C., & Escudier, B. (2009). Renal cell carcinoma. Lancet, 373(9669), 1119–32. http://doi.org/10.1016/S0140-6736(09)60229-4

Sundelin, J. P., Ståhlman, M., Lundqvist, A., Levin, M., Parini, P., Johansson, M. E., & Borén, J. (2012). Increased Expression of the Very Low-Density Lipoprotein Receptor Mediates Lipid Accumulation in Clear-Cell Renal Cell Carcinoma. PLoS ONE, 7(11), 1–5. http://doi.org/10.1371/journal.pone.0048694

Tamai, K., Nakamura, M., Mizuma, M., Mochizuki, M., Yokoyama, M., Endo, H., … Tanaka, N. (2014). Suppressive expression of CD274 increases tumorigenesis and cancer stem cell phenotypes in cholangiocarcinoma. Cancer Science, 105(6), 667–674. http://doi.org/10.1111/cas.12406

Veeriah, S., Brennan, C., Meng, S., Singh, B., Fagin, J. A., Solit, D. B., … Chan, T. A. (2009). The tyrosine phosphatase PTPRD is a tumor suppressor that is frequently inactivated and mutated in glioblastoma and other human cancers. Proceedings of the National Academy of Sciences of the United States of America, 106(23), 9435–9440. http://doi.org/10.1073/pnas.0900571106

Vera-Badillo, F. E., Conde, E., & Duran, I. (2012). Chromophobe renal cell carcinoma: a review of an uncommon entity. International Journal of Urology : Official Journal of the Japanese Urological Association, 19(10), 894–900. http://doi.org/10.1111/j.1442-2042.2012.03079.x

Walker, J. M. (Series Ed.). (2009). Methods in Molecular Biology (Vol. 531). Humana Press. Retrieved from http://books.google.com/books?id=Ku2wPAAACAAJ

Wu, J., Liu, S., Liu, G., Dombkowski, A., Abrams, J., Martin-Trevino, R., … Yang, Z.-Q. (2012). Identification and functional analysis of 9p24 amplified genes in human breast cancer. Oncogene, 31(3), 333–341. http://doi.org/10.1038/onc.2011.227

Appendix A. Ion Torrent Run Reports

Figure 7 - Ion Torrent Run Report Samples 1-6

Run Report for Auto user WIC-3-RCC PAP1 Chr9 TargetSeq BC1-6 7 29 14 156

Run Summary

Total Bases      1.4 G
Key Signal       68
Total Reads      6,116,853
Read Length      Mean 228 bp, Median 246 bp, Mode 254 bp
ISP Loading      81%
Usable Reads     67%
ISP Density      [heat map]

ISP Summary

Addressable Wells          11,303,427
With ISPs                   9,164,218    81.1%
Live                        9,162,670   100.0%
Test Fragment                  16,110    00.2%
Library                     9,146,560    99.8%

Library ISPs                9,146,560
Filtered: Polyclonal        2,216,327    24.2%
Filtered: Low Quality         813,229    08.9%
Filtered: Primer Dimer            151    00.0%
Final Library ISPs          6,116,853    66.9%

Barcode Name     Sample          Bases          ≥ Q20          Reads        Mean Read Length
No barcode       None            22,101,641     20,230,139     94,293       234 bp
IonXpress 001    TP08-S00463T    225,867,093    206,402,291    978,641      230 bp
IonXpress 002    TP08-S00463N    233,101,413    212,426,246    971,749      239 bp
IonXpress 003    TP08-S00644T    226,034,642    206,009,769    948,395      238 bp
IonXpress 004    TP08-S00644N    236,424,357    217,073,936    1,033,973    228 bp
IonXpress 005    TP08-S00724T    154,715,382    142,679,594    717,294      215 bp
IonXpress 006    TP08-S00724N    300,943,467    275,807,513    1,368,641    219 bp

Test Fragment    Reads     Percent 50AQ17    Read Length Histogram
TF A             14,055    93%               [histogram]

Alignment Summary (aligned to Homo sapiens HG19 chr9)

Total Alignment Bases                    1.3 G
Average Coverage Depth of Reference      9.6X
Mean Raw Accuracy 1x                     99.1%

                               AQ17     AQ20     Perfect
Total Number of Bases [Mbp]    1.2 G    1.1 G    871 M
Mean Length [bp]               224      213      163
Longest Alignment [bp]         452      452      408
Mean Coverage Depth            9.1      8.4      6.2

Plugin output: coverageAnalysis, variantCaller [plots]

Analysis Details

Run Name          R 2014 07 29 15 21 19 user WIC-3-RCC PAP1 Chr9 TargetSeq BC1-6 7 29 14
Run Date          July 29, 2014, 3:21 p.m.
Run Flows         900
Projects          RCC PAP2 internal
Sample            TP08-S00724N, TP08-S00644T, TP08-S00463T, TP08-S00644N, TP08-S00463N, TP08-S00724T
Reference PGM     WICKS
Flow Order        TACGTACGTCTGAGCATCGATCGATGTACAGC
Library Key       TCAG
TF Key            ATCG
Chip Check        Passed
Chip Type         318C
Chip Data         single
Barcode Set       IonXpress
Analysis Name     Auto user WIC-3-RCC PAP1 Chr9 TargetSeq BC1-6 7 29 14 156
Analysis Date     July 30, 2014, 2:17 a.m.
Analysis Flows    900
runID             M9S5A

Software Version

Torrent Suite     4.0.1
host              11M JFP1
ion-analysis      4.0.5-1
ion-dbreports     4.0.21-1
ion-gpu           4.0.0-1
ion-pipeline      4.0.6-1
ion-plugins       4.0.21-1
ion-torrentr      4.0.4-1
Script            21.5.5
LiveView          545
DataCollect       462
OS                20
Graphics          35

Figure 8 - Ion Torrent Run Report Samples 7-12

Run Report for Auto user WIC-14-Chromosome 9 Ion TargetSeq RCCPAP1 3prs 12-5-2014 167

Run Summary

Total Bases      1.6 G
Key Signal       68
Total Reads      7,106,921
Read Length      Mean 228 bp, Median 244 bp, Mode 256 bp
ISP Loading      87%
Usable Reads     73%
ISP Density      [heat map]

ISP Summary

Addressable Wells          11,302,449
With ISPs                   9,823,471    86.9%
Live                        9,822,488   100.0%
Test Fragment                  20,869    00.2%
Library                     9,801,619    99.8%

Library ISPs                9,801,619
Filtered: Polyclonal        1,911,628    19.5%
Filtered: Low Quality         782,907    08.0%
Filtered: Primer Dimer            163    00.0%
Final Library ISPs          7,106,921    72.5%

Barcode Name     Sample        Bases          ≥ Q20          Reads        Mean Read Length
No barcode       None          21,395,115     19,447,181     92,483       231 bp
IonXpress 007    116 Tumor     227,168,054    207,494,397    1,015,255    223 bp
IonXpress 008    116 Normal    322,972,161    292,265,837    1,368,101    236 bp
IonXpress 009    299 Tumor     261,595,864    241,080,434    1,187,700    220 bp
IonXpress 010    299 Normal    263,896,992    241,300,327    1,141,230    231 bp
IonXpress 011    979 Tumor     263,949,540    239,571,845    1,123,234    234 bp
IonXpress 012    979 Normal    259,104,345    238,103,678    1,173,287    220 bp

Test Fragment    Reads     Percent 50AQ17    Read Length Histogram
TF A             18,879    92%               [histogram]

Alignment Summary (aligned to Homo sapiens HG19 chr9)

Total Alignment Bases                    1.4 G
Average Coverage Depth of Reference      10.5X
Mean Raw Accuracy 1x                     98.7%

                               AQ17     AQ20     Perfect
Total Number of Bases [Mbp]    1.3 G    1.2 G    886 M
Mean Length [bp]               214      201      150
Longest Alignment [bp]         358      358      316
Mean Coverage Depth            9.6      8.8      6.3

Plugin output: coverageAnalysis, variantCaller [plots]

Analysis Details

Run Name          R 2014 12 05 10 25 48 user WIC-14-Chromosome 9 Ion TargetSeq RCCPAP1 3prs 12-5-2014
Run Date          Dec. 5, 2014, 10:25 a.m.
Run Flows         900
Projects          RCC PAP1 internal
Sample            116 Tumor, 299 Tumor, 299 Normal, 116 Normal, 979 Normal, 979 Tumor
Reference PGM     WICKS
Flow Order        TACGTACGTCTGAGCATCGATCGATGTACAGC
Library Key       TCAG
TF Key            ATCG
Chip Check        Passed
Chip Type         318C
Chip Data         single
Barcode Set       IonXpress
Analysis Name     Auto user WIC-14-Chromosome 9 Ion TargetSeq RCCPAP1 3prs 12-5-2014 167
Analysis Date     Dec. 5, 2014, 8:39 p.m.
Analysis Flows    900
runID             V0RKE

Software Version

Torrent Suite     4.0.1
host              11M JFP1
ion-analysis      4.0.5-1
ion-dbreports     4.0.21-1
ion-gpu           4.0.0-1
ion-pipeline      4.0.6-1
ion-plugins       4.0.21-1
ion-torrentr      4.0.4-1
Script            21.5.5
LiveView          545
DataCollect       462
OS                20
Graphics          35

The run reports shown in Appendix A give all of the data processing and software information used by the Ion Torrent PGM. First, basic information on the chip loading density is shown in heat map format (darker reds indicate higher density chip loading and lighter blues indicate lower density chip loading). The report also shows how much of the original fragment library was sequenced and yielded usable data for copy number and variant analysis. The average length of sequenced fragments is also shown.
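The usable-read figures in the run reports follow directly from the ISP filter counts, which can be checked with a short calculation. The sketch below uses only the numbers printed in the two ISP Summary sections of Appendix A; the function name is hypothetical.

```python
def final_library_isps(library, polyclonal, low_quality, primer_dimer):
    """Library ISPs remaining after the three filters are subtracted."""
    return library - polyclonal - low_quality - primer_dimer

# Samples 1-6 (Figure 7)
run1_final = final_library_isps(9_146_560, 2_216_327, 813_229, 151)
run1_pct = round(100 * run1_final / 9_146_560, 1)

# Samples 7-12 (Figure 8)
run2_final = final_library_isps(9_801_619, 1_911_628, 782_907, 163)
run2_pct = round(100 * run2_final / 9_801_619, 1)
```

Both results reproduce the reported Final Library ISPs (and therefore Total Reads), which shows that polyclonal beads account for most of the reads lost in each run.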

Information on the alignment of the base calls (whether or not they can be mapped to the human reference genome) is shown next. These data are broken up by each barcoded sample sequenced in the run to show average depth of coverage, mean lengths of reads, and what portion of the reads were aligned to the reference. Individual plugin output (here, only the variant caller plugin was used) is shown as well.

Finally, all information on what software versions were used on the PGM and within the Ion Torrent Software Suite is listed at the end of each report. This information is useful for verification of findings by other groups, since different iterations of algorithms and software can potentially return different variant calls.

Appendix B. Fragment Library Construction Documents

Table 3 - Dilution Guide for Fragment Library Construction

7/28/14 RCC PAP1 Ch 9 Target Seq Libraries

BC     ng/uL    dilute to    ng/uL    pM         ratio       actual     TDF     pM/TDF    dil #1    dil #2    dil #3
       HSQ      500pg/uL     BioA     HS BioA    HSQ/BioA    conc pM                      3:57      5:45      5:
1/2    4.71     1:9          0.361    1573       13.0        20523      13.5    1520.2    76.0      7.6       33.0
3/4    3.92     1:8          0.265    1175       14.8        17381      13.5    1287.5    64.4      6.4       27.2
5/6    2.96     1:6          0.283    1331       10.5        13921      13.5    1031.2    51.6      5.2       20.8

Table 3 above shows a sample spreadsheet used during fragment library creation to plan dilutions. Multistep dilutions using low-retention micropipette tips are critical to achieving equimolar amounts of genomic DNA for libraries. The High Sensitivity Qubit (HSQ) Fluorometer is the main device used to measure DNA concentrations, while the Bioanalyzer (BioA) is used to measure the concentrations of DNA fragments of specific lengths. The ratio between these two measured concentrations can be used to find the actual picomolar amount of DNA. All libraries must be diluted down to the same final total dilution factor (TDF), so a series of dilutions is planned for each sample.
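The arithmetic behind one row of Table 3 can be sketched as follows. This is a minimal illustration, not the spreadsheet actually used: the function and parameter names are hypothetical, and the 20-fold and 10-fold steps are an assumption made by reading the "3:57" and "5:45" column headers as sample-to-diluent volumes. The third dilution is left out because its header is truncated in the table.

```python
def plan_dilutions(hsq_ng_ul, bioa_ng_ul, bioa_pm, tdf, step_folds=(20, 10)):
    """Reproduce the column arithmetic of one row of the dilution guide.

    hsq_ng_ul  : Qubit (HSQ) concentration of the library, ng/uL
    bioa_ng_ul : Bioanalyzer concentration, ng/uL
    bioa_pm    : Bioanalyzer concentration, pM
    tdf        : total dilution factor applied to every library
    step_folds : ASSUMED fold change of each serial dilution step
                 (3 uL into 57 uL = 20-fold, 5 uL into 45 uL = 10-fold)
    """
    ratio = hsq_ng_ul / bioa_ng_ul   # "ratio" column
    actual_pm = bioa_pm * ratio      # "actual conc pM" column
    pm_per_tdf = actual_pm / tdf     # "pM/TDF" column

    # Concentration remaining after each serial dilution step.
    steps = []
    conc = pm_per_tdf
    for fold in step_folds:
        conc /= fold
        steps.append(conc)
    return ratio, actual_pm, pm_per_tdf, steps


# Barcode pair 1/2 from Table 3.
ratio, actual_pm, pm_per_tdf, steps = plan_dilutions(4.71, 0.361, 1573, 13.5)
print(f"{ratio:.1f} {actual_pm:.0f} {pm_per_tdf:.1f} {steps[0]:.1f} {steps[1]:.1f}")
# prints: 13.0 20523 1520.2 76.0 7.6
```

Under these assumptions the printed values match the first row of Table 3, which suggests the columns are related in this way.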

Figure 9 shows a sample Bioanalyzer output for one run. The plots show DNA concentration on the y-axis and fragment size on the x-axis. Two peaks for the standards (approximately 35 and 10380 bp in length) flank the data points for the sample. Ideally, the middle peak should fall between 200 and 300 bp. In all of the plots in Figure 9 the fragments were larger than 300 bp, so the enzymatic shearing reaction used in fragment library construction must be run longer in order to create smaller fragments for libraries.

Figure 9 - Sample Bioanalyzer Output

Appendix C. Variant Caller Output

Table 4 – Tumor Specific Damaging SNVs Called by VARSCAN2

Chr  Start     End       VT   Ref  Alt  In Gene?  In CDS?
9    2804365   2804365   snv  C    T    KIAA0020  G638E
9    6328947   6328947   snv  T    C    TPD52L3   F118L
9    6413582   6413582   snv  G    A    UHRF2     R31Q
9    6413584   6413584   snv  G    A    UHRF2     V32M
9    6413591   6413591   snv  C    T    UHRF2     A34V
9    6413621   6413621   snv  G    A    UHRF2     R44H
9    6481710   6481710   snv  A    G    UHRF2     K410E
9    6556188   6556188   snv  G    A    GLDC      Q723*
9    6604594   6604594   snv  A    G    GLDC      V351A
9    6610249   6610249   snv  T    C    GLDC      N193S
9    7799526   7799526   snv  C    T    C9orf123  R70Q
9    7799583   7799583   snv  C    T    C9orf123  R51H
9    7799653   7799653   snv  G    T    C9orf123  P28T

Table 5 - Tumor Specific Damaging DELs Called by VARSCAN2

Chr  Start     End       VT   Ref  Alt  In Gene?  In CDS?
9    2123873   2123873   del  G    -    SMARCA2   Del-FS
9    2182210   2182210   del  A    -    SMARCA2   Del-FS
9    2643641   2643641   del  C    -    VLDLR     Del-FS
9    2645677   2645677   del  G    -    VLDLR     Del-FS
9    2824788   2824788   del  C    -    KIAA0020  Del-FS
9    2837296   2837296   del  T    -    KIAA0020  Del-FS
9    3225134   3225134   del  G    -    RFX3      Del-FS
9    3932361   3932361   del  T    -    GLIS3     Del-FS
9    4117832   4117832   del  T    -    GLIS3     Del-FS
9    4118472   4118472   del  C    -    GLIS3     Del-FS
9    4605351   4605351   del  T    -    SPATA6L   Del-FS
9    4661905   4661905   del  A    -    SPATA6L   Del-FS
9    5054799   5054799   del  T    -    JAK2      Del-FS
9    5089728   5089728   del  G    -    JAK2      Del-FS
9    5233654   5233654   del  A    -    INSL4     Del-FS
9    5361173   5361173   del  T    -    PLGRKT    Del-FS
9    5462913   5462913   del  G    -    CD274     Del-FS
9    5753614   5753614   del  A    -    KIAA1432  Del-FS
9    5801292   5801292   del  T    -    ERMP1     Del-FS
9    5920370   5920370   del  T    -    KIAA2026  Del-FS
9    5920456   5920456   del  C    -    KIAA2026  Del-FS
9    5968044   5968044   del  T    -    KIAA2026  Del-FS
9    6014890   6014890   del  C    -    RANBP6    Del-FS
9    6328715   6328715   del  G    -    TPD52L3   Del-FS
9    6413577   6413577   del  C    -    UHRF2     Del-FS
9    6587248   6587248   del  G    -    GLDC      Del-FS
9    6610227   6610227   del  C    -    GLDC      Del-FS
9    6893227   6893227   del  A    -    KDM4C     Del-FS
9    7798579   7798579   del  A    -    C9orf123  Del-FS
9    8319957   8319957   del  A    -    PTPRD     Del-FS
9    8484199   8484199   del  C    -    PTPRD     Del-FS
9    12694009  12694009  del  A    -    TYRP1     Del-FS
9    13175754  13175754  del  G    -    MPDZ      Del-FS
9    13206081  13206081  del  A    -    MPDZ      Del-FS
9    13217183  13217183  del  T    -    MPDZ      Del-FS

Table 6 - Damaging SNVs Found in All Tumors by Ion Torrent Plugin

Chr  Start     End       VT   Ref  Alt  In Gene?  In CDS?
9    2810359   2810359   snv  C    A    KIAA0020  E570*
9    2810410   2810410   snv  C    G    KIAA0020  A553P
9    6328947   6328947   snv  T    C    TPD52L3   F118L
9    6460622   6460622   snv  C    T    UHRF2     R232*
9    6460645   6460645   snv  G    A    UHRF2     W239*
9    6475450   6475450   snv  T    C    UHRF2     I308T
9    6481674   6481674   snv  G    T    UHRF2     E398*
9    6481701   6481701   snv  A    G    UHRF2     K407E
9    6481710   6481710   snv  A    G    UHRF2     K410E
9    6482011   6482011   snv  G    A    UHRF2     R435H
9    6482014   6482014   snv  C    T    UHRF2     T436M
9    6482034   6482034   snv  C    T    UHRF2     P443S
9    6497223   6497223   snv  C    A    UHRF2     P544T
9    6497230   6497230   snv  A    G    UHRF2     D546G
9    6497253   6497253   snv  C    T    UHRF2     R554W
9    6497289   6497289   snv  C    T    UHRF2     R566C
9    6498057   6498057   snv  T    C    UHRF2     F603L
9    6498063   6498063   snv  G    A    UHRF2     V605I
9    6498129   6498129   snv  C    T    UHRF2     R627W
9    6504641   6504641   snv  G    A    UHRF2     E738K
9    6504679   6504679   snv  C    G    UHRF2     H750Q
9    6536114   6536114   snv  T    C    GLDC      I930V
9    6553502   6553502   snv  G    A    GLDC      H775Y
9    6556188   6556188   snv  G    A    GLDC      Q723*
9    6558582   6558582   snv  C    T    GLDC      D677N
9    6558645   6558645   snv  C    T    GLDC      A656T
9    6558663   6558663   snv  C    T    GLDC      A650T
9    6558671   6558671   snv  G    A    GLDC      P647L
9    6587205   6587205   snv  G    A    GLDC      R596*
9    6602137   6602137   snv  T    A    GLDC      K376M
9    6602146   6602146   snv  C    T    GLDC      R373Q
9    6602149   6602149   snv  A    G    GLDC      I372T
9    6602180   6602180   snv  G    A    GLDC      R362C
9    6620238   6620238   snv  T    C    GLDC      Y139C

Table 7 – Damaging SNVs Found in All Normal Samples by Ion Torrent Plugin

Chr  Start     End       VT   Ref  Alt  In Gene?  In CDS?
9    2807887   2807887   snv  G    A    KIAA0020  L581F
9    2810410   2810410   snv  C    G    KIAA0020  A553P
9    6328947   6328947   snv  T    C    TPD52L3   F118L
9    6460610   6460610   snv  G    A    UHRF2     D228N
9    6460622   6460622   snv  C    T    UHRF2     R232*
9    6460645   6460645   snv  G    A    UHRF2     W239*
9    6477685   6477685   snv  C    T    UHRF2     S346F
9    6477688   6477688   snv  G    T    UHRF2     C347F
9    6477697   6477697   snv  G    A    UHRF2     C350Y
9    6481674   6481674   snv  G    T    UHRF2     E398*
9    6481701   6481701   snv  A    G    UHRF2     K407E
9    6481710   6481710   snv  A    G    UHRF2     K410E
9    6482011   6482011   snv  G    A    UHRF2     R435H
9    6482014   6482014   snv  C    T    UHRF2     T436M
9    6482034   6482034   snv  C    T    UHRF2     P443S
9    6486885   6486885   snv  C    T    UHRF2     A486V
9    6497223   6497223   snv  C    A    UHRF2     P544T
9    6497230   6497230   snv  A    G    UHRF2     D546G
9    6497253   6497253   snv  C    T    UHRF2     R554W
9    6497289   6497289   snv  C    T    UHRF2     R566C
9    6498057   6498057   snv  T    C    UHRF2     F603L
9    6498063   6498063   snv  G    A    UHRF2     V605I
9    6498129   6498129   snv  C    T    UHRF2     R627W
9    6499865   6499865   snv  G    A    UHRF2     G647R
9    6499928   6499928   snv  G    A    UHRF2     D668N
9    6504641   6504641   snv  G    A    UHRF2     E738K
9    6506113   6506113   snv  G    A    UHRF2     M781I
9    6506120   6506120   snv  A    C    UHRF2     N784H
9    6506174   6506174   snv  C    T    UHRF2     R802*
9    6536114   6536114   snv  T    C    GLDC      I930V
9    6556188   6556188   snv  G    A    GLDC      Q723*
9    6558582   6558582   snv  C    T    GLDC      D677N
9    6558645   6558645   snv  C    T    GLDC      A656T
9    6558663   6558663   snv  C    T    GLDC      A650T
9    6558671   6558671   snv  G    A    GLDC      P647L
9    6587205   6587205   snv  G    A    GLDC      R596*
9    6602137   6602137   snv  T    A    GLDC      K376M
9    6602146   6602146   snv  C    T    GLDC      R373Q
9    6602149   6602149   snv  A    G    GLDC      I372T
9    6602180   6602180   snv  G    A    GLDC      R362C
9    6620238   6620238   snv  T    C    GLDC      Y139C

Table 8 - Damaging Variants in All Tumors Called by GATK UGT

Chr  Start     End       VT   Ref  Alt  In Gene?  In CDS?
9    2645677   2645677   del  G    -    VLDLR     Del-FS
9    2804365   2804365   snv  C    T    KIAA0020  G638E
9    2824788   2824788   del  C    -    KIAA0020  Del-FS
9    4118472   4118472   del  C    -    GLIS3     Del-FS
9    4118656   4118656   del  A    -    GLIS3     Del-FS
9    4605351   4605351   del  T    -    SPATA6L   Del-FS
9    4719173   4719173   snv  G    A    AK3       R136*
9    4719193   4719193   snv  C    T    AK3       W129*
9    4719209   4719209   snv  G    A    AK3       R124C
9    4719250   4719250   snv  G    T    AK3       T110K
9    4719260   4719260   snv  G    A    AK3       Q107*
9    4719299   4719299   snv  T    A    AK3       R94W
9    5054799   5054799   del  T    -    JAK2      Del-FS
9    5077536   5077536   del  A    -    JAK2      Del-FS
9    5304561   5304561   del  A    -    RLN2      Del-FS
9    5361787   5361787   del  A    -    PLGRKT    Del-FS
9    5753614   5753614   del  A    -    KIAA1432  Del-FS
9    5919861   5919861   del  T    -    KIAA2026  Del-FS
9    5944874   5944874   ins  -    T    KIAA2026  Ins-FS
9    6328715   6328715   del  G    -    TPD52L3   Del-FS
9    6328947   6328947   snv  T    C    TPD52L3   F118L
9    6413552   6413552   snv  G    A    UHRF2     R21H
9    6413582   6413582   snv  G    A    UHRF2     R31Q
9    6413584   6413584   snv  G    A    UHRF2     V32M
9    6413591   6413591   snv  C    T    UHRF2     A34V
9    6413621   6413621   snv  G    A    UHRF2     R44H
9    6420914   6420914   snv  G    T    UHRF2     L52F
9    6499865   6499865   snv  G    A    UHRF2     G647R
9    6536114   6536114   snv  T    C    GLDC      I930V
9    6536216   6536216   snv  A    G    GLDC      S896P
9    6536221   6536221   snv  G    A    GLDC      T894I
9    6556188   6556188   snv  G    A    GLDC      Q723*
9    6587205   6587205   snv  G    A    GLDC      R596*
9    6610227   6610227   del  C    -    GLDC      Del-FS
9    6610249   6610249   snv  T    C    GLDC      N193S
9    6620238   6620238   snv  T    C    GLDC      Y139C
9    7799526   7799526   snv  C    T    C9orf123  R70Q
9    8319957   8319957   del  A    -    PTPRD     Del-FS
9    8341848   8341848   snv  T    C    PTPRD     T1598A
9    8341886   8341886   snv  A    G    PTPRD     V1585A
9    8341890   8341890   snv  G    A    PTPRD     H1584Y
9    8341898   8341898   snv  A    T    PTPRD     I1581N
9    8341965   8341965   snv  G    A    PTPRD     R1559W
9    13125277  13125277  del  T    -    MPDZ      Del-FS
9    13206081  13206081  del  A    -    MPDZ      Del-FS
9    13217183  13217183  del  T    -    MPDZ      Del-FS

Table 9 - Damaging Variants in All Normal Samples Called by GATK UGT

Chr  Start     End       VT   Ref  Alt  In Gene?  In CDS?
9    2643641   2643641   del  C    -    VLDLR     Del-FS
9    2811366   2811366   snv  C    G    KIAA0020  G544R
9    2824788   2824788   del  C    -    KIAA0020  Del-FS
9    4117832   4117832   del  T    -    GLIS3     Del-FS
9    4118472   4118472   del  C    -    GLIS3     Del-FS
9    4118656   4118656   del  A    -    GLIS3     Del-FS
9    4605351   4605351   del  T    -    SPATA6L   Del-FS
9    4719209   4719209   snv  G    A    AK3       R124C
9    4719250   4719250   snv  G    T    AK3       T110K
9    4719260   4719260   snv  G    A    AK3       Q107*
9    4719287   4719287   snv  G    A    AK3       Q98*
9    4719299   4719299   snv  T    A    AK3       R94W
9    5054799   5054799   del  T    -    JAK2      Del-FS
9    5304561   5304561   del  A    -    RLN2      Del-FS
9    5753614   5753614   del  A    -    KIAA1432  Del-FS
9    5811133   5811133   del  T    -    ERMP1     Del-FS
9    5919861   5919861   del  T    -    KIAA2026  Del-FS
9    6328947   6328947   snv  T    C    TPD52L3   F118L
9    6413515   6413515   snv  G    C    UHRF2     D9H
9    6413552   6413552   snv  G    A    UHRF2     R21H
9    6413582   6413582   snv  G    A    UHRF2     R31Q
9    6413584   6413584   snv  G    A    UHRF2     V32M
9    6413591   6413591   snv  C    T    UHRF2     A34V
9    6413621   6413621   snv  G    A    UHRF2     R44H
9    6477774   6477774   snv  C    A    UHRF2     P376T
9    6499865   6499865   snv  G    A    UHRF2     G647R
9    6506113   6506113   snv  G    A    UHRF2     M781I
9    6506174   6506174   snv  C    T    UHRF2     R802*
9    6536114   6536114   snv  T    C    GLDC      I930V
9    6536221   6536221   snv  G    A    GLDC      T894I
9    6556188   6556188   snv  G    A    GLDC      Q723*
9    6587205   6587205   snv  G    A    GLDC      R596*
9    6604644   6604644   del  A    -    GLDC      Del-FS
9    6610227   6610227   del  C    -    GLDC      Del-FS
9    6610249   6610249   snv  T    C    GLDC      N193S
9    6620238   6620238   snv  T    C    GLDC      Y139C
9    7174684   7174684   del  T    -    KDM4C     Del-FS
9    7799526   7799526   snv  C    T    C9orf123  R70Q
9    7799583   7799583   snv  C    T    C9orf123  R51H
9    7799653   7799653   snv  G    T    C9orf123  P28T
9    8319957   8319957   del  A    -    PTPRD     Del-FS
9    8484199   8484199   del  C    -    PTPRD     Del-FS
9    13206081  13206081  del  A    -    MPDZ      Del-FS
9    13217183  13217183  del  T    -    MPDZ      Del-FS


Tables 4-9 show the output of the battery of variant callers used to identify single nucleotide variants and small insertions and deletions. Three variant callers were used: VARSCAN2, the Ion Torrent Variant Caller Plugin in the Torrent Suite software package, and the Genome Analysis Toolkit (GATK) Unified Genotyper (UGT). The tables show large differences in the number of variants identified by each caller. These differences arise from the distinct algorithms and the different default stringency settings of each variant caller. VARSCAN2 was used to identify only tumor-specific variants, insertions, and deletions, whereas the Ion Torrent Plugin and GATK UGT do not by default restrict their calls to tumor-specific mutations. The mutations reported for these two callers were found in all tumor specimens studied; however, they are not necessarily absent from the paired normal specimens.
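One way to compare the three call sets is to key each variant by position and allele and count how many callers report it. The sketch below is illustrative only and was not part of the analysis pipeline; the function names are hypothetical, and the input is a small excerpt of rows copied from Tables 4, 6, and 8 rather than the full call sets.

```python
from collections import defaultdict


def caller_support(calls_by_caller):
    """Map each variant key to the set of callers that reported it."""
    support = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].add(caller)
    return dict(support)


def concordant(calls_by_caller):
    """Return, in sorted order, the variants reported by every caller."""
    support = caller_support(calls_by_caller)
    n_callers = len(calls_by_caller)
    return sorted(v for v, callers in support.items() if len(callers) == n_callers)


# Excerpt only: (chromosome, position, ref, alt) keys taken from
# Table 4 (VARSCAN2), Table 6 (Ion Torrent plugin), and Table 8 (GATK UGT).
excerpt = {
    "VARSCAN2": {("9", 6328947, "T", "C"), ("9", 6556188, "G", "A"),
                 ("9", 6413582, "G", "A")},
    "IonTorrentPlugin": {("9", 6328947, "T", "C"), ("9", 6556188, "G", "A"),
                         ("9", 6587205, "G", "A")},
    "GATK_UGT": {("9", 6328947, "T", "C"), ("9", 6556188, "G", "A"),
                 ("9", 6413582, "G", "A"), ("9", 6587205, "G", "A")},
}

for variant in concordant(excerpt):
    print(variant)
# prints:
# ('9', 6328947, 'T', 'C')
# ('9', 6556188, 'G', 'A')
```

In this excerpt the TPD52L3 F118L and GLDC Q723* substitutions are reported by all three callers, while the other two variants are each reported by only two.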

ACADEMIC VITA

Michael A. Belsky
218 S Sparks St., Apt. 409, State College, PA 16801
(724) 553-6333
[email protected]

Education

The Pennsylvania State University, University Park, PA
BS in Biomedical Engineering, Schreyer Honors College
Expected Graduation Date: 05/2015

Research Experience

05/2014 – 08/2014: Research Intern
Cancer Biomarkers Facility, UPMC Shadyside, Pittsburgh, PA
Led senior thesis project with the goal of identifying potential genetic biomarkers and therapeutic targets for the treatment of Type I papillary renal cell carcinoma.

05/2013 – 08/2013: Research Intern
Cancer Biomarkers Facility, UPMC Shadyside, Pittsburgh, PA
Developed and troubleshot DNA sequencing data analysis software package to be implemented in all further research projects. Utilized software to identify deleterious mutations in cholangiocarcinoma tissue samples.

05/2012 – 08/2012: Research Intern
Cancer Biomarkers Facility, UPMC Shadyside, Pittsburgh, PA
Established IT infrastructure and created analysis pipeline for IonTorrent sequencing platform data. Studied single-base and copy number variant mutations in Type II papillary renal cell carcinoma samples.

Volunteer Experience

09/2014 – 04/2015: THON Bereaved Family Contact
The Pennsylvania State University, University Park, PA
As a leader for the Penn State IFC Panhellenic Dance Marathon, liaised with families whose children passed away due to cancer. Provided outstanding emotional support to families. Relayed all information about fundraisers and events to families.

Work Experience

08/2014 – 04/2015: Physiology Laboratory Teaching Assistant
The Pennsylvania State University, University Park, PA
Currently teaching two classes of approximately 16 undergraduate students through weekly experiments. Topics include electroencephalography, electromyography, electrocardiography, the respiratory system, and exercise physiology.