Thesis

Genomic profiling of B-cell lymphoid neoplasms

PORETTI, Giulia

Abstract

Cancer is a genetic disease of somatic cells characterized by the accumulation of genomic abnormalities. We performed array-based comparative genomic hybridization and gene expression studies on about 200 B-cell lymphomas to identify candidate genes with increased DNA levels associated with overexpression. In multiple myeloma, the integrative analysis of genome and transcriptome profiles identified relevant candidates, to be evaluated in further studies with more samples. The analysis of the genomic profiles of several lymphomas revealed a recurrent amplification at 11q23.1. The amplification was validated by qPCR and by fluorescence in situ hybridization, which showed complex chromosomal rearrangements. Tiling arrays detected increased transcription in the region, but the targets of the amplification were not discovered. We hypothesized either a certain fragility of the region or the existence of novel transcription activity overlapping the known genes and interfering with their function.

Reference

PORETTI, Giulia. Genomic profiling of B-cell lymphoid neoplasms. Thèse de doctorat : Univ. Genève, 2009, no. Sc. 4054

URN : urn:nbn:ch:unige-50961 DOI : 10.13097/archive-ouverte/unige:5096

Available at: http://archive-ouverte.unige.ch/unige:5096


UNIVERSITÉ DE GENÈVE
FACULTÉ DES SCIENCES
Section des Sciences Pharmaceutiques
Professeur Leonardo Scapozza

ONCOLOGY INSTITUTE OF SOUTHERN SWITZERLAND (IOSI)
Laboratory of Experimental Oncology
Francesco Bertoni, MD

Genomic Profiling of B-cell Lymphoid Neoplasms

THESIS

presented to the Faculté des sciences of the Université de Genève to obtain the degree of Doctor of Sciences, specialization in pharmaceutical sciences

by

GIULIA PORETTI

from

Chiasso (TI)

Thesis No. 4054

LUGANO Fondazione OTAF Sorengo 2009

To my grandmother Martina, who taught me that “Chi la dura la vince.” “He who perseveres wins at last.” and to all smart, great people I was lucky enough to meet.

“La natura non fa nulla di inutile.”

Aristotele

RÉSUMÉ

Cancer is a very complex genetic disease of somatic cells. It is characterized by the accumulation of numerous genomic abnormalities leading to deregulation of the transcriptome and proteome and to altered cell biology. Genetic aberrations affect tumor cells with alterations at the nucleotide level, such as point mutations and microsatellite instability, as well as at the chromosome level, with aneuploidy, translocations, insertions, duplications/amplifications, inversions and deletions. Establishing a comprehensive inventory of the causative mutations in carcinogenesis is not straightforward, given that these mutations are often accompanied by secondary genetic events. These are acquired at random during tumor progression and are not directly responsible for the evolution of the disease. Nevertheless, the study of recurrent genetic aberrations has led to the discovery of cancer-related genes, which have been classified into oncogenes and tumor suppressor genes. In addition, high-throughput, high-resolution techniques such as microarrays now allow genome-wide analyses of DNA copy number abnormalities and transcriptome alterations in tumor samples. This global approach is an appropriate response to the complexity of human cancers. Genomic profiling studies have thus enabled the identification of diagnostic and prognostic factors, highlighted new therapeutic targets and, for some tumors, proposed a classification system based on biological characteristics. We performed genomic profiling studies on B-cell lymphomas to discover gene dosage alterations and the genes targeted by copy number variations. 
The aim of our research was to identify candidate genes with a probable role in tumor development and to characterize them. To improve the selection of candidate genes, we hypothesized that causative mutations are recurrent, and we selected candidate genes with alterations at both the genome and the transcriptome level. We therefore looked for frequent changes in DNA dosage associated with altered gene expression. We were particularly interested in genes with DNA gain or amplification associated with overexpression, two features typically observed as a mechanism of oncogene activation and very important for potential therapeutic applications. We began with investigations based on array comparative genomic hybridization (aCGH). We used high-resolution microarrays interrogating single nucleotide polymorphisms (SNP arrays) to analyze the genome profiles of about 200 B-cell tumors. The project presented here concerns the molecular characterization of multiple myeloma (MM) and the subsequent investigations based on the results obtained. MM is an incurable plasma cell malignancy characterized by several genomic abnormalities (aneuploidy, DNA copy number abnormalities, translocations mainly involving the IgH gene at 14q32). We analyzed MM samples (patients and cell lines) with microarrays to obtain genome-wide DNA and gene expression profiles. The main aim of this analysis was to identify new candidate genes for therapeutic approaches. To represent genome-wide profiles of gains or losses and of loss of heterozygosity (LOH), we used heat maps, while the frequencies of genomic aberrations were presented graphically. Among the main features observed, we noted a gain of chromosome arm 1q, frequently increased DNA levels of chromosomes 3, 5, 7, 9, 11, 15 and 19, and recurrent DNA losses accompanied by LOH at 1p, 13q and 17p. These findings were consistent with previous knowledge, which confirmed the reliability of the method used. To find DNA copy number changes affecting the expression of important genes, we integrated the results of genome-wide and transcriptome-wide profiling using two methods: a combined filter on both DNA and gene expression data, or the detection of genes with outlier expression according to the COPA algorithm together with an increased DNA level. Our interest lay in selecting candidate genes characterized by an increased DNA level and simultaneous overexpression. 
The integrative analysis identified important, recurrent candidates, including transcripts involved in MM pathogenesis, genes involved in other tumor types, known oncogenes not yet associated with MM, and candidates concordant across several MM data sets. The analysis of the genomic profiles of many B-cell tumors carried out in our laboratory with 10K SNP arrays, comprising the MM samples already mentioned as well as mantle cell lymphoma (MCL) and diffuse large B-cell lymphoma (DLBCL) samples, identified three 10K profiles with a similar amplification at 11q23.1. The 11q23.1 DNA amplification was found in three cell lines: JJN3 (MM), KARPAS422 (DLBCL) and JEKO1 (MCL). This recurrent amplification was located in a region often rearranged in hematological malignancies and was chosen for further analyses. We used the more recent, higher-resolution 250K SNP arrays, allowing a more precise definition of the amplified region. The results confirmed an overlapping amplification at 11q23.1 in the JJN3 and KARPAS422 cell lines, whereas JEKO1 showed a DNA gain flanked by amplified regions. Another DLBCL cell line, U2932, was analyzed with the 250K arrays and showed an 11q23.1 amplification resembling that of JJN3 and KARPAS422. The minimal common region of amplification among the JJN3, KARPAS422 and U2932 cell lines was delimited by the JJN3 amplicon and covered 330 kb. This amplification was validated by qPCR. Fluorescence in situ hybridization (FISH) analyses performed on the four cell lines with bacterial artificial chromosome clones specific for the amplified region showed four different amplification patterns, with complex chromosomal rearrangements. To identify the targets of the DNA amplification, we analyzed the transcriptome of the cell lines by RT-PCR. The aim was to identify overexpressed transcripts, including annotated transcripts, predicted transcripts and microRNAs (mir-34b and mir-34c). The transcription level was apparently not influenced by the DNA amplification. Transcriptome mapping with tiling arrays revealed increased transcription activity only near the genes PPP2R1B (JJN3 and KARPAS422), POU2AF1 (KARPAS422) and SNF1LK2 (KARPAS422), and ruled out any transcription activity outside annotated genes. We selected POU2AF1 and PPP2R1B as putative amplification targets and characterized them further through loss-of-function studies. A functional evaluation of their biological effect was not possible because we did not reach a satisfactory level of mRNA knockdown. The gene expression profiles revealed by the tiling arrays during the characterization of the 11q23.1 region also provided a detailed map of the transcription activity of chromosomes 8, 11 and 12 in the cell lines JJN3, JEKO1 and KARPAS422. We catalogued the putative transcription units according to the underlying DNA level. 
We defined genomic regions where increased transcription activity was combined with an increased DNA level (up genomic intervals, UGIs), and others where reduced transcription activity was combined with a decreased DNA level (down genomic intervals, DGIs). Comparing the UGIs with current genomic annotations led to the identification of regions with entirely novel transcription activity. In particular, we studied the newly detected transcription activity at 11p12 in the JJN3 cell line. The putative transcript at 11p12 was distal (>10 kb) to annotated genes and consisted of 15 consecutive, un-annotated UGIs, completely outside any pre-existing genomic annotation. The JJN3 UGIs at 11p12 covered almost 128 kb, and we hypothesized that they were potentially part of a single novel transcript. Of the 15 UGIs, two were validated by RT-PCR. Others overlapped recently described novel RNA transcripts and/or aligned perfectly with regions highly conserved among mammals. These results are interesting, but further studies are needed to understand the true nature of the genomic regions with novel transcription activity. The use of microarrays to perform comparative genomic hybridization and gene expression studies on B-cell lymphomas allowed us to select candidate genes in MM, to identify a recurrent amplification at 11q23.1, and to discover unexpected transcription activity. Regarding the candidate genes in MM, further studies with a larger number of MM samples are needed to understand the possible importance of the selected transcripts in the pathogenesis of the disease and the possibility of exploiting them as new therapeutic targets. As for the amplified region at 11q23.1, the analyses performed allowed a precise characterization of the region at the DNA and RNA level, but the results did not clearly reveal the targets of the amplification. On the one hand, we hypothesized the existence of novel transcription activity overlapping the annotated genes and interfering with their function. On the other hand, a certain fragility of the region could be responsible for the complex chromosomal rearrangements, as confirmed by the FISH results. In conclusion, the tiling arrays made it possible to appreciate the architectural complexity of the human genome. The molecular and functional characterization of the putative novel transcript at 11p12 and of other un-annotated UGIs could reveal interesting, unknown aspects of the organization of the human genome.

TABLE OF CONTENTS

ABBREVIATIONS

SUMMARY

1 GENERAL INTRODUCTION
1.1 Genomic profiling in cancer
1.1.1 Cancer and genetic aberrations
1.1.1.1 Cancer
1.1.1.2 Genetic instability in cancer
1.1.1.3 Genetic aberrations in cancer
1.1.2 Genomic profiling
1.1.2.1 Array comparative genomic hybridization (aCGH)
1.1.2.2 Gene expression profiling
1.2 B-cell lymphoid neoplasms
1.2.1 Normal B-cell ontogeny
1.2.2 B-cell tumors

2 GENERAL AIM AND EXPERIMENTAL PLAN

3 GENOMIC PROFILING OF MULTIPLE MYELOMA
3.1 Introduction
3.2 Aim
3.3 Materials and methods
3.4 Results
3.4.1 Genome profiling
3.4.2 Integrative analysis of expression and genomic profiles
3.5 Discussion

4 MOLECULAR AND FUNCTIONAL CHARACTERIZATION OF A RECURRENT DNA AMPLIFICATION AT 11q23.1
4.1 Introduction
4.2 Aim
4.3 Materials and Methods
4.4 Results
4.4.1 Genome profiling
4.4.2 Transcriptome profiling
4.4.3 FISH experiments
4.4.4 Loss-of-function studies
4.5 Discussion

5 TILING GENE EXPRESSION PROFILES ANALYSIS
5.1 Introduction
5.2 Aim
5.3 Materials and methods
5.4 Results
5.5 Discussion

6 GENERAL DISCUSSION

7 BIBLIOGRAPHY

ACKNOWLEDGEMENTS

SUPPLEMENTARY MATERIAL

APPENDIX

CURRICULUM VITAE

POSTERS AND PUBLICATIONS

ABBREVIATIONS

AB       antibody
(a)CGH   (array) comparative genomic hybridization
AG       antigen
ASCT     autologous (hematopoietic) stem cell transplantation
BAC      bacterial artificial chromosome
BCR      B-cell receptor
BM       bone marrow
BMSC     bone marrow stromal cell
CIN      chromosomal instability
CN       (DNA) copy number
CNAT     GeneChip Chromosome Copy Number Analysis Tool (Affymetrix)
COPA     Cancer Outlier Profile Analysis
CSR      class switch recombination
DGI      down genomic interval (see chapter 5 for definition)
DLBCL    diffuse large B-cell lymphoma
ds       double-stranded
DSB      double-stranded break
EST      expressed sequence tag
FISH     fluorescence in situ hybridization
GC       germinal centre
GCOS     GeneChip Operating Software (Affymetrix)
GDAS     GeneChip DNA Analysis Software (Affymetrix)
GEP      gene expression profile
HMCL     human multiple myeloma cell line
HMM      Hidden Markov Model
HSR      homogeneously staining region
Ig       immunoglobulin
IgH      immunoglobulin heavy chain
LOH      loss of heterozygosity
MAS      Affymetrix MicroArray Software
MCL      mantle cell lymphoma
MGUS     monoclonal gammopathy of undetermined significance
miRNA    microRNA
MM       multiple myeloma
mmix     master mix
ncRNA    non-coding RNA
NHL      non-Hodgkin’s lymphoma
NTC      no template control (negative reaction)
PC       plasma cell
(p)UPD   (partial) uniparental disomy
RACE     rapid amplification of cDNA ends
RMA      robust multi-array average
SHM      somatic hypermutation
SNP      single nucleotide polymorphism
ssDNA    single-stranded DNA
TUF      transcript of unknown function
UGI      up genomic interval (see chapter 5 for definition)
WGSA     whole genome sampling assay
WT       whole transcript

SUMMARY

Cancer is a very complex genetic disease of somatic cells characterized by the accumulation of multiple lesions at the genome level, which cause deregulation at the transcriptome and proteome levels and alter cell biology. Genetic aberrations affect cancer cells with changes at the nucleotide level, including point mutations and microsatellite instability, and at the chromosome level, comprising aneuploidy, translocations, insertions, duplications/amplifications, inversions, and deletions. Gaining comprehensive knowledge of the causative mutations involved in tumorigenesis is challenging, since these are usually accompanied by secondary genetic events, acquired at random during disease progression and irrelevant for the development of malignancy. However, the study of recurrent genetic alterations led to the discovery of genes associated with cancer, further classified into oncogenes and tumor suppressor genes. In addition, the advent of high-throughput and high-resolution techniques like microarray platforms enabled genome-wide analyses of DNA copy number (CN) abnormalities and gene expression alterations in tumor samples, providing an approach adequate to the complexity of the disease. Genomic profiling studies have been successfully used for the identification of diagnostic and prognostic factors, for the detection of new therapeutic targets, and for the development of tumor classification systems reflecting the underlying biology. We performed genomic profiling studies in B-cell lymphoid neoplasms to discover gene dosage alterations and to detect the genes targeted by CN changes. Our aim was to identify tumor-specific candidate genes with a putative role in tumor development, and to further characterize them. To improve the selection of candidate cancer genes, we followed the assumption that causative mutations should be recurrent, and we selected candidate genes with changes at both the DNA and RNA levels. 
We therefore looked for frequent CN changes associated with deregulated gene expression. In particular, we were interested in genes with DNA gain or amplification associated with upregulated gene expression, strong indicators of oncogenic behaviour and interesting targets for the development of new therapeutic approaches. We performed aCGH (array comparative genomic hybridization) studies with high-resolution SNP (single nucleotide polymorphism) arrays to investigate the genome-wide DNA profiles of almost 200 B-cell tumors. Here we present the molecular characterization of multiple myeloma (MM) and the subsequent investigations prompted by observations made in the course of this study. MM is an incurable plasma cell malignancy linked to heterogeneous genomic abnormalities, such as ploidy status alterations, chromosomal copy number changes and non-random chromosomal translocations mainly involving the IgH (immunoglobulin heavy chain) gene locus at 14q32. To improve the molecular characterization of MM and identify new candidate cancer genes exploitable as therapeutic targets we obtained, with DNA microarrays, the genome-wide DNA profiles and gene expression profiles of both MM clinical samples and human MM cell lines (HMCLs). We created heat-maps for visualizing genome-wide DNA CN and loss of heterozygosity (LOH), and we presented the frequencies of genomic aberrations through frequency plots. As major characteristics we observed 1q gain and frequently increased DNA level of odd chromosomes (3, 5, 7, 9, 11, 15, 19), and recurrent DNA losses and LOH events at 1p, 13q and 17p, in concordance with previous knowledge, thus confirming the validity of the adopted method. To identify DNA CN changes influencing the expression of relevant genes we integrated global gene expression and genomic profiling data by applying a matched filter on both DNA and expression data, or by the COPA-based detection of genes with outlier expression and underlying increased DNA CN. We were particularly interested in the selection of candidate oncogenes presenting increased DNA CN coupled with overexpression. The integrative analysis allowed the identification of strong, recurrent candidates, as evidenced by the presence of transcripts with a proven role in MM pathogenesis, but also genes targeted in other tumor types, known oncogenes not yet implicated in MM and candidates concordant across various MM data sets. Inspection of the genome-wide DNA profiles of a series of B-cell tumors collected at that time in our laboratory by means of 10K SNP arrays, including the previously mentioned MM, but also mantle cell lymphoma (MCL) and diffuse large B-cell lymphoma (DLBCL) samples, led us to the identification of three 10K profiles with a similar 11q23.1 amplification, belonging to the cell lines JJN3 (MM), KARPAS422 (DLBCL) and JEKO1 (MCL). The recurrent amplification was localized in a region often rearranged in haematological malignancies and was therefore selected for further characterization. 
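The COPA-based integration mentioned above can be illustrated with a minimal sketch. It assumes the commonly described COPA transformation (median-centering each gene and scaling by its median absolute deviation) and omits the percentile-based ranking of genes. The function names, the cutoffs `expr_cutoff` and `cn_cutoff`, and the toy numbers are hypothetical and are not taken from the thesis.

```python
import numpy as np


def copa_transform(expr):
    """Median-center each gene (row) and scale it by its median absolute deviation."""
    expr = np.asarray(expr, dtype=float)
    centered = expr - np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(centered), axis=1, keepdims=True)
    mad = np.where(mad == 0, np.nan, mad)  # genes with zero spread get NaN scores
    return centered / mad


def outliers_with_gain(expr, copy_number, expr_cutoff=3.0, cn_cutoff=2.5):
    """Flag (gene, sample) pairs with outlier expression AND increased DNA copy number.

    Both cutoffs are hypothetical illustration values.
    """
    transformed = copa_transform(expr)
    gained = np.asarray(copy_number, dtype=float) > cn_cutoff
    return (transformed > expr_cutoff) & gained


# Toy data: 2 genes x 5 samples (made-up numbers, not thesis data).
expr = [[1.0, 1.1, 0.9, 1.0, 5.0],
        [2.0, 2.1, 1.9, 2.0, 2.05]]
copy_number = [[2, 2, 2, 2, 4],
               [2, 2, 2, 2, 2]]
hits = outliers_with_gain(expr, copy_number)
print(hits)
```

In this toy example only the first gene in the last sample is flagged, because it is the only value that is both an expression outlier and located in a region of increased copy number.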
The updated 250K SNP arrays were used to better define the amplified region, and we confirmed an overlapping amplicon at 11q23.1 in JJN3 and KARPAS422, whereas JEKO1 showed a DNA gain flanked by amplified regions. A fourth cell line, the DLBCL line U2932, was analyzed with 250K arrays and exhibited an amplification similar to JJN3 and KARPAS422. The minimal common region of amplification was delimited by the amplicon of JJN3, and covered 330 kb. The 11q23.1 amplification was validated by qPCR. Fluorescence in situ hybridization (FISH) analyses were performed on the four cell lines using BAC clones overlapping the amplified region, and showed four different amplification patterns with complex chromosome rearrangements. To identify the amplification target(s), we analyzed the transcriptome of the cell lines by RT-PCR, looking for the overexpression of known and predicted transcripts and microRNAs (miR-34b and miR-34c). The transcription level was apparently not influenced by the underlying DNA amplification. Transcriptome mapping with tiling arrays was performed on JJN3, JEKO1 and KARPAS422, and revealed increased transcription activity only in correspondence with the genes PPP2R1B (in JJN3 and KARPAS422), POU2AF1 (in KARPAS422), and SNF1LK2 (in KARPAS422), excluding the presence of transcription activity outside annotated transcripts. The putative amplification targets POU2AF1 and PPP2R1B were selected for further characterization by loss-of-function studies. Unfortunately, functional evaluation of their biological effect was not possible because satisfactory mRNA knockdown could not be reached, and we could not indicate a clear amplification target. Besides the characterization of the transcription activity at 11q23.1, transcriptome mapping with tiling arrays provided a detailed map of the gene expression profiles on chromosomes 8, 11, and 12 of the cell lines JJN3, JEKO1 and KARPAS422. We catalogued the putative transcribed units with respect to the cell line-specific DNA CN changes and we defined genomic regions where increased transcription activity matched increased DNA level (up genomic intervals, UGIs), and other regions where decreased transcription activity matched decreased DNA level (down genomic intervals, DGIs). A precise comparison of the UGIs with currently known annotation tracks led to the identification of UGIs located outside any annotation, thus representing novel transcription activity. Among the various regions with novel transcription activity, we concentrated on the one detected at 11p12 in the cell line JJN3. The 11p12 putative novel transcript was located distal (>10 kb) to previously annotated genes and consisted of 15 un-annotated, consecutively mapped UGIs, not interrupted by any known transcript. JJN3 UGIs at 11p12 covered about 128 kb and we assumed that they were part of a unique, long putative novel transcript. Two UGIs were validated by RT-PCR; others overlapped previously described novel RNA transcripts and/or aligned perfectly with high mammalian conservation scores. These results are promising, but further investigations are needed to reveal the real nature of the genomic regions presenting novel transcription activity. 
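The UGI/DGI definition above lends itself to a short sketch. It assumes each genomic interval has already been assigned a direction of change for transcription activity and for DNA level (+1, 0 or -1); the `Interval` class, the helper names and all coordinates are hypothetical illustrations, not the procedure or data of the thesis.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Interval:
    chrom: str
    start: int
    end: int
    transcription: int  # +1 increased, -1 decreased, 0 unchanged
    dna_level: int      # +1 gain/amplification, -1 loss, 0 normal


def classify(iv: Interval) -> Optional[str]:
    """Return 'UGI' or 'DGI' when transcription and DNA level change concordantly."""
    if iv.transcription > 0 and iv.dna_level > 0:
        return "UGI"
    if iv.transcription < 0 and iv.dna_level < 0:
        return "DGI"
    return None


def overlaps(iv: Interval, annotation: Tuple[str, int, int]) -> bool:
    """True if the interval shares at least one base with an annotated region."""
    chrom, start, end = annotation
    return iv.chrom == chrom and iv.start < end and start < iv.end


def unannotated_ugis(intervals: List[Interval],
                     annotations: List[Tuple[str, int, int]]) -> List[Interval]:
    """Keep the UGIs that lie outside every annotated region."""
    return [iv for iv in intervals
            if classify(iv) == "UGI"
            and not any(overlaps(iv, a) for a in annotations)]


# Made-up coordinates for illustration only.
intervals = [
    Interval("chr11", 1000, 2000, +1, +1),  # UGI overlapping an annotation
    Interval("chr11", 5000, 6000, +1, +1),  # UGI outside any annotation
    Interval("chr11", 8000, 9000, -1, -1),  # DGI
    Interval("chr11", 9500, 9900, +1, 0),   # expression change without CN change
]
annotations = [("chr11", 1500, 3000)]
novel = unannotated_ugis(intervals, annotations)
print([(iv.start, iv.end) for iv in novel])
```

Half-open coordinates are assumed here, so two regions that merely touch at a boundary are not counted as overlapping.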
In conclusion, the use of microarrays to perform aCGH and gene expression analyses in B-cell lymphoid neoplasms allowed us to select candidate genes in MM, to detect a recurrent amplification at 11q23.1, and to discover unexpected transcription activity. Concerning the candidate genes in MM, further studies involving larger series of MM samples are needed to elucidate whether they can be considered relevant candidates in the pathogenesis of the disease, and whether they may contribute toward the identification of novel therapeutic targets. In the case of the amplified region at 11q23.1, the analyses carried out provided a precise characterization of the region at both the DNA and RNA level, but they did not succeed in indicating strong amplification targets. We concluded that this could be due either to the existence of novel transcription activity overlapping annotated genes and interfering with their normal function, or to the relative fragility of the region causing complex chromosome rearrangements, as observed in the FISH results. Finally, the analysis of the tiling gene expression profiles allowed us to appreciate the complex architecture of the human genome. Molecular and functional characterization of the putative novel transcript at 11p12 and of other un-annotated UGIs could reveal interesting, unknown aspects of the genomic organization.
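The notion of a minimal common region of amplification used in this summary can be sketched as the intersection of the amplified intervals observed in each sample. The function name, sample labels and coordinates below are hypothetical and only illustrate the idea; they are not the coordinates measured in the thesis.

```python
from typing import Dict, Optional, Tuple


def minimal_common_region(amplicons: Dict[str, Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Intersect per-sample amplicons (start, end) located on the same chromosome.

    Returns None when the amplicons share no common segment.
    """
    start = max(s for s, _ in amplicons.values())
    end = min(e for _, e in amplicons.values())
    return (start, end) if start < end else None


# Made-up amplicon coordinates (in kb) for three hypothetical samples.
amplicons = {
    "sample_A": (100, 430),
    "sample_B": (50, 700),
    "sample_C": (80, 600),
}
mcr = minimal_common_region(amplicons)
print(mcr)
```

Here the common region coincides with the amplicon of `sample_A`, the shortest one, in the same way that a single cell line can delimit the minimal common region when its amplicon is contained in all the others.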

1 GENERAL INTRODUCTION

1.1 Genomic profiling in cancer

1.1.1 Cancer and genetic aberrations

1.1.1.1 Cancer

Cancer is defined by the World Health Organization as a large group of diseases that can affect any part of the body and are characterized by uncontrolled growth and spread of abnormal cells beyond their usual boundaries [7]. Cancer is a very complex genetic disease of somatic cells, characterized by the accumulation of multiple alterations at the genome level. The concept that neoplasia originates from genetic changes was first proposed by Boveri [8] in 1914 and is generally referred to as the somatic mutation theory of tumorigenesis. It was first supported experimentally in 1927 by Muller’s discovery that ionizing radiation, already considered a potent carcinogen, had mutagenic activity [9], and it is supported today by a variety of experimental data. Genetic abnormalities may cause deregulation at the transcriptome and proteome levels and affect cell proliferation, differentiation and survival. The genetic aberrations found in cancer are due to the influence of environmental factors, to mistakes in cell replication and to aberrant DNA repair mechanisms. Environmental factors comprise physical carcinogens (e.g. UV and ionizing radiation), chemical carcinogens (e.g. asbestos, tobacco smoke components) and biological carcinogens, such as infections with oncogenic viruses (e.g. DNA viruses like HPV, SV40; RNA viruses like HIV) or certain bacteria (e.g. Helicobacter pylori). It is possible to distinguish between germline mutations or polymorphisms and somatic mutations. Germline mutations are mainly recessive (i.e. both alleles need to be mutated to have an impact on the phenotype) and predispose to cancer. Somatic mutations represent the majority of cancer mutations and are acquired during pathogenesis (de novo mutations) [10]. 
Clearly, recurrent somatic mutations, defined as those found in at least two cases of the same morphologic entity [11] (URL: http://cgap.nci.nih.gov/Chromosomes/Mitelman), play an important role in diagnosis and prognosis, and represent interesting therapeutic targets. The study of mutational events linked to tumor development has a great challenge to overcome: the distinction between “driver” or causative mutations, i.e. those conferring functional advantages to tumor cells, and “passenger” or random mutations, which represent secondary genetic events, acquired during disease progression and without biologic significance [10]. Causative mutations are expected to be recurrent within a sample batch but also among different data sets on the same pathology [12]. Moreover, the type of genome aberration also has to be considered, with high-level DNA copy number changes probably having a greater impact than low-level changes [12]. Even if every cancer type presents specific alterations, it is possible to establish some common biological principles leading to cell transformation [13]. Essentially, the following acquired capabilities are hallmarks of cancer cells: self-sufficiency in growth signals, insensitivity to growth-inhibitory signals, evasion of apoptosis, limitless replicative potential, sustained angiogenesis, and tissue invasion and metastasis [13]. An additional, typical attribute of cancer cells is genetic instability. The role of genetic instability as a driving force in the development of sporadic cancers is still debated. On the one hand, a so-called “mutator phenotype”, i.e. a cell with increased genomic instability, is considered necessary for carcinogenesis [14,15]. The “mutator phenotype” model proposes that the initiating event is the occurrence of mutations in DNA polymerases that render them error-prone, and/or mutations in enzymes involved in DNA repair mechanisms, or more generally in genes that control genomic integrity [16,17]. This phenotype has no direct selective advantage but increases the mutation rate of other genes. It corresponds to the situation observed in inherited human cancer syndromes with germline mutations in genes involved in maintaining genome integrity (e.g. xeroderma pigmentosum, ataxia telangiectasia, hereditary non-polyposis colorectal cancer). On the other hand, in a model comparing the tumorigenesis process to a form of somatic evolution, the selection of advantageous mutations, rather than an increase in the mutation rate, is considered the driving force of sporadic tumor development [18]. According to this model, tumor growth is initiated by one or more mutations giving the cell a growth advantage. 
The clone will acquire successive mutations and, under the selective pressure of the tumor microenvironment, the cells bearing advantageous mutations undergo successive waves of clonal expansion. This was described, for example, in sporadic colorectal cancer development [19]. In this context, the existence of many genetic alterations in a tumor does not necessarily mean that it has a “mutator phenotype”. In fact, instability is a matter of rate, and the presence of a mutated state provides no information about the rate of its occurrence [20]. Although an inherited tendency to genomic instability clearly drives tumorigenesis and even if genetic instability is often seen in cancer cells, the development of sporadic cancers probably occurs on the basis of a normal mutation rate and is mainly driven by the selection of advantageous mutations [21,22]. Genetic instability might arise as a secondary effect of mutations, but the direct selective effect of such mutations overrides it, and genetic instability only indirectly contributes to the somatic evolution of cancer [22]. Recently, another hypothesis has been made for cancer development, based on an oncogene-induced DNA damage model (reviewed in Halazonetis et al. [23]). The authors suggest that activated oncogenes induce perturbations in the DNA replication machinery, which in turn lead to the formation of DNA breaks. According to this model, the presence of DNA damage characterizes both precancerous lesions and cancers. The malignant transformation results from the inactivation of p53 or, less often, of other DNA damage checkpoint components, which impairs the DNA damage response pathway normally leading to cell cycle arrest, apoptosis or senescence. This model is therefore based on two key features of most cancers, namely genomic instability and p53 mutations.

1.1.1.2 Genetic instability in cancer

Even if the causative role of genetic instability in tumor development is open to discussion, it is widely accepted that numerous genetic aberrations accumulate in neoplastic cells, providing evidence for the genetic basis of human cancer. The consequences of genomic instability are genetic aberrations, ranging from point mutations to complex chromosome rearrangements. Two levels of genetic instability, affecting the vast majority of cancers, have been proposed20. In a small subset of cases the instability is observed as subtle sequence alterations at the nucleotide level, whereas in most other cancers it is found at the chromosome level. Sequence errors at the nucleotide level can be generated by polymerases during DNA replication or by mutagens. In normal cells they are corrected by DNA repair mechanisms, such as nucleotide-excision repair (NER), base-excision repair (BER) and mismatch repair (MMR). NER is mainly involved in repairing covalent modifications of DNA caused by exogenous mutagens. NER defects result, for example, in high susceptibility to skin tumors due to exposure to a widespread mutagen such as ultraviolet light20. The inactivation of MMR genes gives rise to microsatellite instability (MIN). Microsatellites are short sequences of DNA repeats scattered throughout the genome, and MIN leads to expansions and contractions of these repeats. Another typical consequence of genetic instability at the nucleotide level is given by point mutations, involving substitution, deletion or insertion of a few nucleotides. At the chromosome level, genetic instability involves both structural and numerical chromosome abnormalities and is the result of both chromosomal instability and chromosomal rearrangements. Chromosomal instability (CIN) involves changes in chromosome number, i.e. chromosome gains or losses through nondisjunction, a phenomenon known as aneuploidy.
CIN is caused by failures in mitotic chromosome segregation, normally controlled both by the entire mitotic machinery and by a spindle checkpoint24. The mitotic spindle checkpoint is a quality-control mechanism preceding anaphase entry, which ensures that all pairs of sister chromatids have achieved bipolar attachment to the mitotic spindle for correct segregation. In addition to mutations affecting the spindle checkpoint, defects in structural components of the mitotic spindle, in particular in the regulation of centrosome number, seem to play a key role in CIN. In contrast, instability leading to chromosomal rearrangements refers to events changing the genetic linkage of two DNA fragments, resulting in translocations, insertions, duplications, inversions or deletions25. The common feature of these instability events is that they are generated by DNA breaks and are mainly associated with replication stress, i.e. with collapse of the replication machinery.
During the S-phase of the cell cycle, DNA is most vulnerable because the parental duplex is unwound to allow access to the replication machinery. Stalling of the replication process causes the formation of single-stranded DNA (ssDNA) gaps and double-stranded breaks (DSBs). Many checkpoint functions, such as replication, repair and S-phase checkpoints, are involved in monitoring genome integrity during replication and in preventing DNA damage. If the controlled process is disrupted by replication stress, DNA breaks accumulate and chromosomal rearrangements are generated by error-prone DNA DSB repair mechanisms. The latter typically use homologous recombination and/or non-homologous end joining pathways to resolve DSBs (reviewed in Jackson et al.26). Replication stress can be induced by replication inhibition and/or S-phase checkpoint inactivation but, as previously mentioned, can also be the consequence of oncogene activation23. Nevertheless, not all DSBs arise as a result of replication stress. Another mechanism involves telomere erosion. Telomeres are complex nucleoprotein structures located at the ends of linear chromosomes, critical for maintaining genome integrity. Telomere erosion is associated with massive genomic instability during the “cellular crisis” state, i.e. the period of massive cell death due to telomere dysfunction before the acquisition of telomerase activity by the surviving, malignant cells27. Two other elements that play a role in genetic instability leading to rearrangements are fragile sites and highly transcribed DNA sequences25. Fragile sites are DNA sequences spanning approximately 50 kb to 1 Mb that show gaps and breaks due to partial inhibition of DNA synthesis, particularly if exposed to physiological stress28. Their instability is probably linked to their DNA structure, which usually presents trinucleotide repeats. They are frequently associated with rearrangements like translocations, amplifications, and integrations of exogenous DNA.
Transcription-associated instability is based on the fact that transcription of a DNA sequence involves the formation of ssDNA intermediates, that are chemically more unstable than dsDNA.

1.1.1.3 Genetic aberrations in cancer

As previously discussed, a key feature of cancer is the presence of non-random genetic aberrations (Fig. 1). Genomic changes typically observed in cancer cells can be numerical or structural, and can coexist in complex karyotypes29. Numerical aberrations comprise aneuploidy, DNA copy number (CN) changes, and unbalanced translocations, whereas structural aberrations include balanced translocations and inversions. However, in situations like extra-chromosomal amplifications, unbalanced translocations, insertions and deletions, the distinction between numerical and structural aberrations is not clear-cut. Structural changes are defined as balanced if they involve an equal exchange of material between two chromosomal regions, whereas the term unbalanced refers to a non-reciprocal exchange leading to gain or loss of genome portions.

As already mentioned, aneuploidy corresponds to an abnormal chromosome number with respect to the typical number of 46 human chromosomes. This is due to gains or losses of whole chromosomes by CIN. Translocations arise from the exchange of chromosomal segments between non-homologous chromosomes, and can be balanced or unbalanced. Spatial proximity and are factors probably contributing in increasing the propensity of translocated chromosome partners to rearrange29. DNA copy number changes comprise deletions, gains or amplifications of genomic material. Amplification is a neoplasia-associated mechanism we were particularly interested in, and is therefore reviewed in a separate chapter (see next sub-chapter). CN aberrations are also referred to as gene dosage alterations, i.e. alterations in the number of copies of a given sequence found in a cell.

[Figure 1 panels: (A) failed cytokinesis; (B) segregation errors at anaphase; (C) high-level amplification (C1, C2); (D) translocations (D1, D2); (E) loss of heterozygosity, including UPD]

Fig. 1: Acquisition of genetic aberrations in cancer cells (modified from Bayani et al.29 and Albertson et al.56). A normal cell (left) is represented with three chromosome pairs (yellow, fuchsia, green). (A) Polyploidy: failure to undergo cytokinesis after the diploid chromosome set has doubled leads to polyploidy (here a cell tetraploid for the yellow, fuchsia and green chromosomes); (B) aneuploidy: segregation errors at anaphase result in two aneuploid daughter cells, one monosomic for the green chromosome and the other trisomic; (C) amplification: gene amplification by double minutes (C1) or homogeneously staining regions (C2); (D) translocation: unbalanced (D1) or balanced (D2); (E) loss of heterozygosity (LOH): denotes the loss of one allele in a heterozygous pair due to deletion or mutation (black spot) or, more rarely, to uniparental disomy (UPD).

Further genetic anomalies involved in cancer are somatic point mutations, loss of heterozygosity (LOH) and epigenetic modifications, all reviewed in the following paragraphs. Somatic point mutations, where one or a few nucleotides are altered by sequence error and/or deletion or insertion, can result in nonsense, missense or frameshift mutations. In nonsense mutations the new codon causes the protein to terminate prematurely, leading to a shorter and usually non-functional product. Missense mutations introduce an incorrect amino acid into the protein sequence. The effect on protein function depends on the site of the mutation and the nature of the amino acid replacement. In general, mutations that do not affect the protein sequence or function are called silent or synonymous mutations. Frameshift mutations cause the affected codon, and subsequently all the following codons, to be misread, leading to a very different and often non-functional product. Classic LOH is defined as the loss of one allele of a heterozygous locus. The genetic mechanisms leading to LOH are highly varied. An LOH event can be caused by deletion, which can involve the whole chromosome or smaller regions, by mitotic nondisjunction, or by mitotic recombination between two homologous chromosomes, including break-induced replication and gene conversion (interstitial mitotic recombination event)33-35 (Figs. 2-3). Interchromosomal recombinations like translocations are sometimes cited as a possible cause of LOH, since they can lead to deletions as a consequence of DSBs. As shown in figure 2, there may be a series of events leading to the final, selected clone with the observed LOH pattern35.

Fig. 2: Mechanisms leading to LOH (modified from Murthy et al.35). (see text for explanations)

The result of LOH is the loss of the wild-type allele with the consequent functional alteration (Fig. 3). As discussed later (chapter 1.1.2.1), LOH has also been observed without an accompanying copy number change, the so-called copy-neutral LOH, and has been associated with uniparental disomy (UPD)35-37. UPD refers to the situation in which both copies of a chromosome pair have originated from one parent, either as isodisomy, in which two identical segments from one parent homologue are present, or heterodisomy, where sequences from both homologues of the transmitting parent are present38. This can occur during transmission of chromosomes from parents to gametes or during early cell divisions in the zygote (germline recombination resulting in constitutive UPD), but also as a consequence of somatic recombination during mitotic cell divisions (mitotic or somatic recombination). Constitutive UPD is quite a rare phenomenon, whereas UPD occurring in adult cells by somatic recombination is probably more common, with an incidence increasing with cell division rate. Two mechanisms are thought to be involved (Fig. 4): in UPD of whole chromosomes, mitotic nondisjunction is followed by duplication of the remaining homologue in the monosomic cell or loss of one chromosome in the trisomic cell; partial uniparental disomy (pUPD) involves mitotic recombination events between chromatids, such as reciprocal exchange of chromosome material between two homologues. pUPD causing interstitial regions of LOH by multiple recombination events is also called gene conversion.

[Figure 3 panels: deletion; gene conversion; recombination; translocation; nondisjunction; chromosome loss and duplication]

Fig. 3: Mechanisms leading to LOH (modified from de Nooij-van Dalen et al.33). In cells carrying one wild-type allele (T) and one mutant allele (t) of a tumor suppressor gene, LOH events may lead to the expression of the recessive mutation.


Fig. 4: Mechanisms of uniparental disomy (UPD) (modified from Walker et al.125). In a normal cell (i) two homologous chromosomes with both parental alleles are present. During mitosis they are duplicated, so that each chromosome consists of two sister chromatids (iv). Mitotic nondisjunction can occur (ii), resulting in cells that are either monosomic or trisomic for a given chromosome. UPD of a whole chromosome can then occur if the remaining chromosome in the monosomic cell is duplicated or if the outnumbered allele in the trisomic cell is deleted. Alternatively, mitotic recombination between homologous chromatids can occur, resulting in partial UPD involving a chromosome arm (v), a telomeric region (vi), or interstitial regions (vii, viii).

The most common epigenetic modifications found in cancer cells are DNA hyper-/hypomethylation and alterations in the histone acetylation/deacetylation balance (reviewed in ref. 39). DNA methylation involves the addition of a methyl group to the cytosine ring (carbon 5 position) by DNA methyltransferases. In the human genome, methylation is primarily found in repetitive DNA elements to protect them against recombination. Hypomethylation of the genome results in an increased mutation rate and activation of otherwise silenced genes. In contrast, hypermethylation of gene promoter regions causes their transcriptional repression. In tumors, hypomethylation can result in increased expression of oncogenes and hypermethylation in silencing of tumor suppressor genes40,41. Another relevant epigenetic modification involved in chromatin remodelling is histone acetylation, which consists of the addition of acetyl groups to lysines in the amino-terminal tail of core histones. Lysine acetylation and deacetylation reactions are catalyzed by histone acetyltransferases (HAT) and histone deacetylases (HDAC), respectively. Acetylated histones enhance chromatin decondensation and DNA accessibility; therefore the acetylation state correlates with transcriptional activation. The disruption of the acetylation/deacetylation balance due to alterations of HATs or HDACs has been detected in many cancers. Moreover, malignant upregulation of histone deacetylation processes has been successfully targeted by therapeutic interventions with HDAC inhibitors.

The identification of recurrent genetic alterations by cytogenetics has led to the discovery of previously unknown genes associated with cancer, also named cancer genes10. The first spectacular success of cancer cytogenetics goes back to the 1960s, when the Philadelphia chromosome (Ph) was discovered in patients with chronic myeloid leukaemia (CML)42.
With the advent of molecular genetic techniques during the 1980s11, the breakpoints of genetic aberrations started to be characterized at the molecular level, and the affected genes were discovered. The cancer genes located at frequently rearranged genomic sites were divided into two functional classes: the “dominant” oncogenes (i.e. a single mutated allele is sufficient to contribute to tumorigenesis) and the “recessive” tumor suppressor genes (i.e. both alleles need to be mutated). Chromosomal rearrangements were recognized as mechanisms of cancer gene deregulation, resulting in abnormal gene expression profiles and aberrant growth and proliferation of cancer cells. The first molecular consequence of a genetic aberration to be explained was the activation of the MYC oncogene by translocations involving the MYC locus at chromosome 8 and immunoglobulin (Ig) gene loci (chromosome 2: IgK locus; chromosome 14: IgH locus; chromosome 22: IgL locus)43. Due to the translocation, MYC expression is increased by the Ig transcriptional enhancer. Another oncogenic deregulation mechanism is active in the above-mentioned Ph-positive CML, where the t(9;22) leads to the formation of a BCR/ABL fusion transcript that activates the ABL tyrosine kinase inappropriately44. The functional consequences of recurrent chromosomal rearrangements at the molecular level are of two major types: they can create a chimeric gene coding for a tumorigenic fusion transcript, or juxtapose a gene to the regulatory elements of a partner gene, such as gene promoters or transcriptional enhancers, resulting in deregulated expression. The two main groups of genes deregulated (i.e. activated but also inactivated) by the formation of a fusion gene are those encoding tyrosine kinases (mainly activated) and transcription factors (transcriptional activity can be enhanced, aberrant, or repressed)45.
Inactivating mechanisms of tumor suppressor genes were discovered later, mainly by studying familial cancers and in particular childhood tumors, where losses of tumor suppressor loci were demonstrated46. The widely described, simplified alteration scheme of tumor suppressor genes involves first a mutation of one allele (inactivating mutation) and then the loss of the wild-type allele (LOH, Fig. 3). According to this model, two successive mutational events are required in the same cell in order to inactivate tumor suppressor genes, as proposed by Knudson with the two-hit hypothesis in retinoblastoma47. Briefly, Knudson’s hypothesis states that inactivation of a tumor suppressor gene follows either inheritance or occurrence of a spontaneous mutation, with subsequent loss of the other, wild-type allele. As previously reported, the genetic mechanisms leading to LOH are more varied than this simplistic representation (Figs. 2-3). In general, cancer genes are characterized by a redundancy of deregulation mechanisms. Oncogenes can be activated by gene dosage alterations, point mutations, translocations, promoter hypomethylation, constitutively active transcription factors, or absence of control by a tumor suppressor gene, if the latter is inactivated. Tumor suppressor genes can be inactivated by gene dosage alterations, LOH events, point mutations, or epigenetic silencing by promoter hypermethylation. While the functional consequences of genetic aberrations like somatic point mutations, epigenetic changes, LOH, CN alterations of small genomic regions or chromosomal rearrangements with breakpoints are reasonably clear, the pathogenetic significance of aneuploidy or CN imbalances affecting large genomic regions containing multiple genes remains difficult to explain. A recent version (January 22, 2007) of the Cancer Gene Census of the Cancer Genome Project at the Sanger Institute, originally described by Futreal et al.10, contains 363 cancer genes whose aberrations are considered causal in the development of specific cancers. Of these, 70 are tumor suppressor genes, 292 are oncogenes and one can act as both. Only seven (2%) of these oncogenes were shown to be predominantly activated by amplification, compared to 268 (92%), which are activated mainly by chromosomal translocation. However, a recent publication suggests that amplification is a more common mechanism of oncogene activation than previously believed48. In recent years, advanced high-throughput technologies have provided an overview of the wide variety of somatic mutations49 and copy-number alterations50-52 found in human cancers. In addition, large-scale association studies have uncovered genome variations that determine the genetic susceptibility of individuals to various types of cancer53,54. In conclusion, the identification of genetic alterations can lead to the discovery of genes, transcripts and pathways that play a relevant role in cancer. Moreover, neoplasia-associated genomic abnormalities are clinically useful, since they can be used for diagnosis and tumor classification, can be helpful in the selection of appropriate treatment modalities and, in some cases, can be exploited as therapeutic targets.

DNA amplification

Amplification refers to an increase of at least four copies of an intrachromosomal DNA segment that is less than 20 Mb in length20. Amplified DNA can be organized as repeated units at a single genomic locus or as units scattered throughout various chromosomes, but also as extra-chromosomal elements55. DNA amplifications can be detected at the cytogenetic level by means of standard microscopic techniques, and at the molecular level by fluorescence in situ hybridization (FISH), comparative genomic hybridization (CGH), and arrayCGH (aCGH). Cytogenetic manifestations of DNA amplification are homogeneously staining regions (HSRs) and extrachromosomal acentric DNA fragments, such as double minutes (DMs) and episomes56. HSRs are made of inverted tandem repeats within a chromosome, whereas DMs are circular, extra-chromosomal elements, a few Mb in size, that replicate autonomously. Episomes are only ~250 kb in length and can be identified only with molecular methods. Frequently amplified chromosomal loci have been detected using computational modelling on published CGH data57. They mainly co-localize with well-known oncogenes such as BCL2, typical for non-Hodgkin’s lymphoma (NHL), EGFR in glioma and non-small cell lung cancer, MYCN in neuroblastoma, ERBB2 in breast and ovarian cancer, and MYC in various cancers. In fact, DNA amplification is one of the mechanisms by which oncogenes are activated, as demonstrated by integrating gene dosage with expression level. The downstream effect of DNA amplification is the increased expression of the targeted genes. However, its impact on gene expression may vary among different cancer types30,31,58. Gene amplification is a mechanism of oncogene deregulation mainly observed in solid tumors. In contrast, amplifications occur with lower frequencies in haematological malignancies, where translocation events are thought to be prevalent57. In a recent survey of amplifications in 104 cell lines of various tissues of origin, leukaemias and lymphomas had on average 10 amplicons, whereas epithelial cancers had an average of 36 amplicons48. Amplifications probably arise through DNA DSBs at both ends of the amplified region, as predicted by models explaining their formation, such as the breakage–fusion–bridge cycle and the excision and unequal segregation of extrachromosomal DNA fragments59. According to the breakage–fusion–bridge (BFB) model59, which leads to the formation of HSRs, the double-stranded DNA break is induced by telomere erosion/dysfunction or fragile sites. The two uncapped sister chromatids then fuse, apparently as a consequence of aberrant DNA repair mechanisms. During mitosis, at anaphase, the dicentric, fused chromatids form a bridge, where DNA is arranged in a head-to-head position. If this structure divides asymmetrically, the daughter cells will receive either a chromatid with duplicated genetic material, or a chromatid from which DNA was deleted. BFB cycles can thus result in amplified inverted copies of a genomic region. On the other hand, the excision-relocation model59 describes the selection of circular extrachromosomal chromatin bodies (DMs and episomes) at cell division as a gene amplification mechanism. These circular elements are probably formed by excision following loop formation, in some cases due to replication bubbles, or by circularization of fragments derived from the breakdown of HSRs59. They can persist extrachromosomally, or can be transposed to another chromosome as HSRs or distributed insertions.
DNA amplifications might be promoted by the presence of specific genome sequences (e.g. fragile sites, repetitive elements), defects in DNA replication or telomere dysfunction55, as well as by environmental factors59. Fragile sites are chromosomal regions prone to breakage under conditions of replication stress, and it was shown that common fragile sites co-localize with amplifications60. Hellman et al. demonstrated that the breakpoints of the intrachromosomal amplification of the MET oncogene in a human gastric carcinoma are located within the common fragile site FRA7G region. Since their discovery, fragile sites have been thought to facilitate genetic recombination events giving rise to cancer. For example, the FHIT gene, a particularly large gene discovered at the most active common fragile region FRA3B, plays a role in tumor development as an inactivated tumor suppressor gene, as does the WWOX gene at 16q23.161. We recently observed that FHIT, WWOX and other genes at fragile sites are preferentially targeted by genomic deletions in HIV-NHL4 (manuscript in preparation). A statistical study evaluating the influence of fragile sites, large genes and cancer genes on the preferential localization of DNA amplifications co-localized the amplification hot spots with known fragile sites, cancer genes, and virus integration sites, but statistical significance could not be assessed57. These findings suggest that the location of DNA amplifications is mainly directed by the presence of oncogenes and of large genes, while fragile sites might be involved in the DNA amplification mechanism without affecting its precise localization. Similar results are reported in a study involving 104 cancer cell lines48. No statistically significant association was shown between amplifications and known fragile sites in the human genome, but amplifications co-localized at some hot spots with known cancer genes and fragile sites. DNA amplification usually appears in advanced cancers, where p53-mediated maintenance of genomic integrity is lost62, but this is not a universal rule. In fact, amplification was also observed in early or pre-malignant stages, demonstrating that it is not simply a feature of the highly rearranged genomes of advanced tumors55. The biological specificity and significance of gene amplifications make them attractive targets for clinical applications, such as prognostic evaluation63,64, diagnosis and therapy65, as also shown by us1. In fact, the gain-of-function effect of gene amplifications makes them ideal targets for therapeutic intervention, due to the direct nature of their activation and the fact that a tumor can become addicted to the enhanced expression of the affected gene65. There is increasing interest in the genome-wide study of amplified regions in cancer samples as putative loci that harbour genes important for tumor development. The study of gene amplification has the potential to lead to the identification of novel oncogenes, and to characterize tumors from a prognostic and diagnostic point of view. The difficulty in studying amplified regions lies in the identification of the driver gene, because several genes might map to the investigated region. One possibility to improve the prediction of candidate oncogenes is based on the simultaneous evaluation of their expression level: overexpression linked to increased DNA level is a strong indicator of oncogene behaviour.
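
The idea of linking overexpression to increased DNA level can be illustrated with a minimal sketch. It assumes, purely for illustration, that per-sample log2 copy-number ratios and log2 expression values are available for each gene mapping to an amplified region, and it ranks the genes by the Pearson correlation between the two. The gene names, the toy values and the choice of Pearson correlation as a score are hypothetical and are not taken from the analyses presented in this thesis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation: correlation undefined, treated as 0 here
    return sxy / sqrt(sxx * syy)


def rank_candidates(copy_number, expression):
    """Rank genes by the correlation between DNA level and expression.

    Both arguments map a gene name to a list of per-sample log2 values,
    with the samples in the same order in both dictionaries.
    """
    scores = {g: pearson(copy_number[g], expression[g]) for g in copy_number}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy data: three genes in one amplicon, five samples,
# two of which (samples 3 and 4) carry the amplification.
cn = {
    "GENE_A": [0.0, 0.1, 1.5, 2.0, 0.0],
    "GENE_B": [0.0, 0.1, 1.5, 2.0, 0.0],
    "GENE_C": [0.0, 0.1, 1.5, 2.0, 0.0],
}
expr = {
    "GENE_A": [5.0, 5.2, 8.1, 9.0, 5.1],  # expression follows gene dosage
    "GENE_B": [6.0, 6.1, 6.0, 5.9, 6.2],  # expression flat despite the gain
    "GENE_C": [7.0, 4.0, 5.5, 5.0, 7.5],  # expression unrelated to dosage
}

ranking = rank_candidates(cn, expr)
for gene, r in ranking:
    print(f"{gene}\t{r:.2f}")
```

In this toy data set only GENE_A shows an expression level that follows its DNA level, so it ranks first. With real data, a ranking of this kind would only prioritize candidates for validation by independent techniques.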

Genetic aberrations in non-coding regions

Besides the deregulation of cancer genes, genetic aberrations may also perturb the function of non-coding RNAs (ncRNAs). This RNA class seems to be mainly involved in translational regulation, and its altered expression has been associated with cancer and other complex diseases (e.g. Alzheimer’s disease, mental retardation, psychiatric disorders)66. Well-characterized ncRNAs with known function include ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs), which are involved in mRNA translation, small nuclear RNAs (snRNAs), which are involved in splicing, small nucleolar RNAs (snoRNAs), which are involved in the modification of rRNAs, and microRNAs (miRNAs). Recently, another type of ncRNA has been described, the ultraconserved genes (UCGs)67. Two types of ncRNA are known to play a role in tumorigenesis: miRNAs and the recently discovered UCGs. Both miRNAs68,69 and UCGs67 are frequently located at rearranged regions involved in cancer, and the expression level of miRNAs can be influenced by the underlying DNA copy number69.

MicroRNA genes encode small, non-coding RNA molecules. Once mature they are about 22 nt long, and are involved in transcriptional and translational regulation of gene expression. Many miRNAs have sequences conserved between distantly related organisms, suggesting an essential role in physiological processes, which has been demonstrated for some pathways linked to development, differentiation, cell cycle regulation, apoptosis and metabolism70. Approximately 50% of all miRNAs are embedded within introns of protein-coding genes or non-coding RNA transcripts71. miRNAs bind to target mRNAs by base pairing at partially complementary sites (in plants, by perfect base pairing), mainly located in the 3’-untranslated region (UTR)72,73. Each miRNA has the potential to regulate the UTRs of a large number of genes, and several miRNAs can also have the same target, giving rise to a complex regulatory network of miRNAs and miRNA targets. miRNAs are very similar to siRNAs in their function. The main difference lies in the biogenesis process (reviewed in Kim et al.74). Both siRNAs and miRNAs are double-stranded before incorporation into the RNA-induced silencing complex (RISC), but prior to insertion one strand is removed. The strand of the miRNA duplex stably associated with RISC represents the mature miRNA, which determines the fate of the miRNA and its target mRNA. In case of perfect complementarity between mature miRNA and target mRNA (as in plants), the latter is cleaved and degraded. In contrast, the regulatory mechanism observed in case of imperfect base pairing is mainly translational silencing, but a reduction of the target mRNA level was also observed75. There is growing interest in the role of specific microRNAs in cancer. In fact, evidence has shown that losses of miRNAs with tumor-suppressive activity may contribute to tumorigenesis. For example, in chronic lymphocytic leukaemia, 13q14.3 deletion has been associated76 with loss of mir-15a and mir-16-1.
mir-15a and mir-16-1 are located in a region without known coding genes, which is also frequently deleted in mantle cell lymphoma, multiple myeloma and prostate cancer. They were found to negatively regulate the expression of the anti-apoptotic BCL2 protein77, therefore playing a role in B-cell survival, but these results have not been widely confirmed78. An oncogenic role of overexpressed miRNAs in cancer has also been established. For example, this is the case for mir-155, both in solid and in haematological malignancies79,80, or for the mir-17-92 cluster, located at 13q31-32, a region frequently amplified in malignant B-cell lymphoma81. In our laboratory we detected82 a high copy number amplification at 13q31 in the human mantle cell lymphoma cell line JEKO1, targeting both MYC and mir-17-92. To conclude on the role of miRNAs in cancer, they can potentially act both as oncogenes and as tumor suppressor genes, depending on the specific set of conditions they are subjected to. Moreover, miRNA expression profiles in human cancers can be significantly informative and can potentially help in tumor classification, diagnosis and prognosis prediction83.

Ultraconserved regions (UCRs) are a subset of conserved sequences located both in intra- and intergenic regions, which are strictly conserved among orthologous regions of the human, rat, and mouse genomes84. Calin et al. described their alteration at the gene expression level in human cancers (leukaemia and carcinomas) and their location at known regions involved in cancer67. Moreover, they proved in a cancer model that a differentially expressed UCG had oncogenic properties, as decreasing its overexpression induced apoptosis in colon cancer cells. UCGs are a relatively new field of research, and new investigations are needed to better define the functional significance of their alterations in human cancer. In conclusion, genetic aberrations in tumors affecting these two ncRNA classes, miRNAs and UCGs, support a model of carcinogenesis in which both coding and non-coding genes play a role67,85.

1.1.2 Genomic profiling

One of the central aims of cancer research is the comprehensive knowledge of the genomic alterations driving oncogenesis. Due to the complexity of the disease, this implies the use of global approaches and the integration of genome-wide analyses at DNA, RNA and functional levels86. With the available human genome sequence87,88 and the advent of high-throughput techniques like microarray platforms, it is now possible to survey genome-wide DNA copy number abnormalities and gene expression alterations at high resolution89. Genomic profiling plays a very important role, both for the identification of diagnostic and prognostic factors and new therapeutic targets, and for the development of tumor classification systems that reflect the underlying biology. Basically, genomic profiling enables high-throughput screening of the genomic and gene expression alterations in tumor samples. Once candidate genes or transcripts are identified, other independent techniques are applied to confirm the results and to study the pathological mechanisms linked to the observed alterations.

1.1.2.1 Array comparative genomic hybridization (aCGH)

Genome-wide DNA profiling, sometimes simply referred to as genomic profiling in the context of DNA characterization studies, is a powerful initial approach for cancer research. It allows the simultaneous identification of multiple alterations at the genomic level from various specimens in an unbiased manner and in a short time. Array comparative genomic hybridization (aCGH) is based upon conventional CGH. The latter emerged as a valid approach to detect DNA copy number changes due to duplications/amplifications, deletions and unbalanced translocations. In CGH, the DNA to be tested and a reference DNA are differentially labelled and hybridized on a glass slide with normal metaphase chromosome spreads90. The ratio of the hybridization signal intensity of test and reference samples at any interrogated locus is detected. The assumption is that the relative amount of test and reference DNA bound to a given chromosomal locus is determined by the relative abundance of that sequence in the two DNA samples. After measurement of the ratio of the intensities of the two different dyes along the target chromosomes, regions of altered DNA content can be detected. CGH studies provided many useful data on genetic events occurring in lymphoid neoplasms91-94. A drawback of CGH is that the DNA of the chromosomal spreads on the slide is still highly condensed and supercoiled, so the resolution of the technique is limited: approximately 10-12 Mb for deleted regions and approximately 2 Mb only for high-level (> 5-10 fold) amplifications. CGH is also unable to identify small intragenic alterations, ploidy changes, alterations affecting pericentromeric and heterochromatic regions, and balanced translocations.

To overcome some of these limits, microarray-based CGH, or aCGH, was implemented. Solinas-Toldo et al. performed matrix-based CGH on glass slides with spotted probe DNAs of different types: cosmids, chromosome-specific DNA libraries, P1 or PAC clones of genomic fragments (75-130 kb)95. They demonstrated that matrix-based CGH was at least as efficient as CGH on metaphase chromosomes in detecting high-copy-number amplifications and low-copy-number changes. Moreover, PAC probes greatly enhanced the resolution and allowed the detection of imbalances not identified by chromosomal CGH. The possibility of spotting many DNA probes in an ordered manner on a glass or nylon substrate opened this technology to automation and high-throughput screening, making aCGH a simpler and more efficient procedure to detect DNA copy number changes. Two examples are a very recent work on diffuse large B-cell lymphoma (DLBCL)96 and another on acute lymphoblastic leukaemia (ALL)50. The study of 203 DLBCL samples by high-resolution, genome-wide copy number analysis identified 272 recurrent chromosomal aberrations that were associated with gene expression alterations96.
With both gene expression and genomic profiles available, the authors demonstrated that 30 chromosomal alterations were differentially distributed among DLBCL subtypes previously defined by different gene expression profiles, termed germinal center B-cell-like (GCB) DLBCL, activated B-cell-like (ABC) DLBCL, and primary mediastinal B-cell lymphoma (PMBL)94. For example, an amplicon on chromosome 19 was detected in 26% of ABC DLBCLs but only in 3% of GCB DLBCLs and PMBLs. A highly upregulated gene in this amplicon was SPIB, which encodes an ETS family transcription factor and is probably involved in the pathogenesis of ABC DLBCL. The data presented in that work provide genetic evidence that the DLBCL subtypes are distinct diseases that use different oncogenic pathways. Another recent genome-wide analysis of genetic alterations in ALL reaffirms the power of high-resolution genome-wide approaches as an initial step to detect new oncogenic lesions50. Two hundred and forty-two pediatric ALL patients were analyzed by means of SNP (single nucleotide polymorphism) arrays and genomic DNA sequencing. The work revealed the involvement of principal regulators of B-cell development and differentiation in ALL pathogenesis. PAX5 was identified as the most frequent target of somatic mutations (deletions, translocations, point mutations) leading to downregulation of the functional protein.
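The ratio principle underlying CGH and aCGH, as described above, can be illustrated with a minimal sketch. It assumes that test and reference intensities are already background-corrected and normalized; the function names and the ±0.3 log2 thresholds are hypothetical choices made for illustration and are not taken from any of the cited studies.

```python
import math

def log2_ratios(test, reference):
    """Return log2(test / reference) per locus (intensities must be positive)."""
    return [math.log2(t / r) for t, r in zip(test, reference)]

def call_state(log_ratio, gain=0.3, loss=-0.3):
    """Label a locus with illustrative (hypothetical) thresholds."""
    if log_ratio >= gain:
        return "gain"
    if log_ratio <= loss:
        return "loss"
    return "normal"

# Toy intensities for four loci: balanced, mild gain, single-copy loss, amplification.
test_signal = [1000.0, 1500.0, 500.0, 4000.0]
reference_signal = [1000.0, 1000.0, 1000.0, 1000.0]

ratios = log2_ratios(test_signal, reference_signal)
states = [call_state(r) for r in ratios]
print(list(zip(ratios, states)))
```

Working on the log2 scale makes gains and losses symmetric around zero, which is why a halved signal and a doubled signal appear as -1 and +1 respectively.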

Nowadays, different aCGH platforms can be used, including cDNA microarrays, genomic clone arrays, and oligonucleotide arrays. cDNA microarrays were initially designed for gene expression profiling. In a study by Pollack et al., human cDNA arrays were successfully adopted for both DNA copy number and gene expression analyses58. They took advantage of the availability of previously annotated expressed gene sequences to create microarrays containing more than 5,000 probes representing approximately 5,000 human genes. About half of the cDNAs were either known genes or similar to known genes in other organisms; the rest were anonymous ESTs. Using tumor cell lines with known amplifications and deletions, they demonstrated the efficacy of the approach. Moreover, cDNA microarrays performed well in discriminating low-level gene amplifications and single-copy deletions. The increased resolution with respect to CGH or aCGH with large genomic clones was demonstrated by the detection of previously unreported DNA abnormalities. Besides the availability of cDNA probes and the increased resolution and genome coverage, another advantage of cDNA microarrays is the possibility of analyzing in parallel both the DNA copy number and the gene expression of the same sample.

Pinkel et al. used DNA from cosmids, P1, BACs (bacterial artificial chromosomes) and large insert genomic clones as array probes97. They were able to measure single-copy decreases and increases from diploid status with a resolution of approximately 40 kb, and they also detected new copy number aberrations of chromosome 20 in breast cancer clinical samples. The laborious preparation and spotting of BAC clones led to the development of approaches using representations of BACs, such as ligation-mediated PCR and a modified DOP-PCR (degenerate oligonucleotide primer-PCR) protocol.
Ligation-mediated PCR was proposed by Snijders and colleagues98, based on the technique developed by Klein et al.99 for global amplification of DNA from small samples to create a complete representation of the human genome. Briefly, DNA isolation by proteinase K is followed by restriction enzyme digestion of the DNA. This step leaves an overhang on the DNA fragments that is necessary for adapter ligation. After a first round of PCR amplification, adapter-specific primers and labelled nucleotides allow single-primer PCR amplification and labelling. The second method, the modified DOP-PCR, was proposed by Fiegler et al. and enabled the amplification of large genomic clones (BAC and PAC) with three primers predicted by a bioinformatic approach to be highly specific for human DNA100. This decreased the impact of E. coli DNA contamination with respect to previous DOP-PCR approaches and avoided costly and time-consuming procedures for the preparation of DNA clones. Snijders et al. assembled a BAC array for the measurement of DNA copy number across the human genome that comprised approximately 2,400 BAC clones with an average resolution of about 1.4 Mb, used for measurements in both cell lines and clinical material. They were able to reliably detect and quantify high-level amplifications and single-copy alterations in diploid, polyploid and heterogeneous genetic backgrounds.

Technical limitations of BAC aCGH regarding array preparation, resolution and performance could in part be solved. Improvements of the BAC aCGH technique were presented in several cancer research studies. Greshock et al. assembled a high-resolution array with a set of more than 4,000 selected BAC clones101. The BAC clones were optimized based on genome alignment, aCGH performance, cancer gene coverage and resolution. They included only clones that were unambiguously mapped and showed good aCGH performance, measured as hybridization reliability, reproducibility and sensitivity, allowing a mean resolution of about 1 Mb across the human genome while covering as many cancer-related genes as possible. Inazawa et al. were able to detect novel cancer-related genes through aCGH analyses performed with different BAC arrays in various tumors102. In particular, they used an array with about 4,500 BAC clones for genome-wide analysis at a resolution of about 0.7 Mb and an array consisting of 800 BAC clones covering 800 known cancer-related genes for the diagnosis of cancer-specific copy number aberrations. The construction of a tiling array consisting of more than 32,000 overlapping BAC clones covering the entire human genome greatly improved the resolution of BAC arrays and increased the ability to identify unknown genetic alterations that may be associated with multiple tumor types103.

cDNAs and BAC clones presented some technical problems, such as repetitive motifs in some sequences, cross-hybridization to similar targets and possible probe contamination with unspecific PCR products. The deeper knowledge of the human genome resulting from the various sequencing efforts87,88,104 opened the way to new approaches for genomic profiling. Namely, array elements started to be designed to target known sequences and selected genomic regions.
Oligonucleotide arrays made of single-stranded 25 to 85-mer elements, designed by taking advantage of the published draft of the human genome sequence, were developed. With oligonucleotide arrays the exact sequence and length of the probes are known, and repetitive sequences and cross-hybridization can be avoided. The disadvantage of small oligonucleotides is the high noise level, resulting in a lower signal-to-noise ratio for each probe. In a short report, Carvalho and colleagues demonstrated the increased performance and the simplified handling of aCGH by means of spotted 60-mer oligonucleotides with respect to BAC arrays108. The 60-mer sequences were selected from a library of human oligonucleotides. Later they optimized the in-house spotted oligo aCGH platform with a new protocol and an extended oligonucleotide library (Human Release 2.0 oligonucleotide library)109. The average spacing between the oligonucleotides was about 100 kb, corresponding to an average resolution of about 100 kb and allowing the detection of single copy number changes and of homozygous as well as heterozygous deletions. An alternative method called representational oligonucleotide microarray analysis (ROMA) was presented by Lucito et al.110. They assembled arrays made of oligonucleotide probes derived from a protocol for creating genome representations. The technique was previously used for the preparation of both DNA samples and probes to perform aCGH111. It is based on restriction enzyme digestion of genomic DNA followed by ligation-mediated PCR112. With the use of the draft of the human genome sequence, restriction sites could be predicted, probes could be designed to be complementary to the expected representations and the location of the resulting oligonucleotides could be precisely mapped. In their work, Lucito et al. also demonstrated that the minimal oligonucleotide length giving the maximal performance, in particular in terms of signal-to-noise ratio, was 70 nucleotides. They created a very high-density representational oligonucleotide microarray consisting of 85,000 70-mers, achieving an average resolution of 30 kb.

Oligonucleotide probes can be synthesized directly on the array surface, but they can also be spotted onto it afterwards. Various techniques are used for their synthesis, such as light-directed chemical synthesis (photolithography, Affymetrix)105, inkjet technology (Agilent) or maskless array synthesis technology (NimbleGen Systems)106. Illumina also offers BeadChips, where oligonucleotide probes are adsorbed to silica beads arranged on a microarray107. Nowadays, commercial oligonucleotide arrays are offered by many companies. Some examples are presented here, with the main focus on the development of Affymetrix SNP arrays and their use in aCGH approaches for the analysis of chromosomal copy number imbalances and LOH events. Oligonucleotide SNP arrays by Affymetrix are made of 25-mer oligonucleotides directly synthesized on a glass slide by photolithography105. Oligonucleotide SNP arrays were originally designed for large-scale SNP genotyping113,114. Later they were used for LOH analysis115,116 and, since the demonstration that the fluorescent hybridization signal intensity shows a dosage response to variation in DNA CN117,118, they started to be used for combined CN and LOH analysis in cancer studies. These arrays are described in more detail in the next sub-chapter.
The Agilent Technologies long oligonucleotide microarrays are composed of 60-mers directly synthesized on the array, giving a median resolution of about 70 kb. The 60-mer platform proved to be useful for the detection of copy number alterations with high sensitivity and without the need to generate reduced-complexity representations of total genomic DNA. Barrett et al. also created a custom 60-mer array representing unique genomic sequences of chromosomes 16, 17, 18 and X, with an average resolution ranging from 10.5 kb to 31 kb and with an improved performance for aCGH studies119,120. Another design for the production of oligonucleotide microarrays was proposed by Selzer et al.121. They introduced the use of probes with the same melting temperature of 76°C but varying sizes (45- to 85-mers), to optimize the hybridization conditions and minimize the hybridization bias. Probes were synthesized by light-directed chemical synthesis directly on the array, following the NimbleGen manufacturing process. In their study they used two array types: a whole-genome array with a 6 kb median probe spacing, and a tiling array for selected genomic regions with either 50 or 140 bp median probe spacing.

SNP arrays and Affymetrix GeneChip Mapping Assay

SNP arrays designed to perform genome-wide genotyping were assembled from predictions of SNP-containing fragments of the genome obtained after in silico digestion of genomic DNA with restriction enzymes (e.g. EcoRI, BglII and XbaI)114. Most SNPs were chosen from public (e.g. The SNP Consortium (TSC)) and Perlegen SNP (500K) databases according to their presence on restriction enzyme fragments of the human genome. Additional criteria for their inclusion on the array were high reproducibility, call rate, accuracy and a physical map position allowing homogeneous coverage of the entire human genome. The selected SNPs were validated across many individuals from multiple populations, including Caucasian, Asian, African-American, African, and South American. A single high-density oligonucleotide array was created for the simultaneous genotyping of over 10,000 SNPs122 with the same one-primer amplification assay as previously presented by Kennedy et al.114. To increase assay sensitivity and specificity, 40 different 25-mer oligonucleotides per SNP were placed on the array, with the SNP in the centre position of the 25-mer. Each SNP position (position 0) was interrogated by probe pairs consisting of a perfect match and a one-base central mismatch for either allele A or allele B. Mismatch probes were added to differentiate between informative signal (perfect match) and noise (mismatch). Probe redundancy was achieved by adding offset probe pairs at four additional nucleotide positions flanking the SNP (positions -1, -4, +1, +4). Both the forward and the reverse orientation of the polymorphic site were represented on the array (Fig. 5). As a whole, 40 distinct probes representing one SNP were scattered throughout the array as probe pairs (perfect match and mismatch) to mitigate the effects of array variation.

[Fig. 5 schematic: perfect match (PM) and mismatch (MM) probes for alleles A and B on the sense and antisense strands, shown for the genotypes AA, AB and BB.]

Fig. 5: Allele-specific hybridization with 40 probes/SNP (Affymetrix). On the array there are 25-mer probes corresponding to both of the two possible alleles at each SNP position. There are perfect match (PM) probes for the A and B allele sequences, and mismatch (MM) probes, with a single base-pair mismatch at the centre position, added to determine the specificity of binding. In total, each SNP is interrogated by 40 different 25-mer probes. According to the hybridization pattern observed, it is possible to determine whether a SNP is AA, AB or BB. Simultaneously, the intensity of the signal is detected and allows copy number estimation.
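The count of 40 probes per SNP follows from the design described above: five positions (0, ±1, ±4), two strands, two alleles and a PM/MM pair. The sketch below enumerates this layout and adds a toy genotype call based on the relative PM-MM signal of the two alleles. The calling rule and its 0.3 band are hypothetical simplifications for illustration only, not the algorithm used by the Affymetrix software.

```python
from itertools import product

OFFSETS = (-4, -1, 0, 1, 4)        # probe shift relative to the SNP position
STRANDS = ("sense", "antisense")
ALLELES = ("A", "B")
MATCH_TYPES = ("PM", "MM")         # perfect match / central mismatch

def probe_set():
    """Enumerate every (offset, strand, allele, match type) combination."""
    return list(product(OFFSETS, STRANDS, ALLELES, MATCH_TYPES))

def call_genotype(signal_a, signal_b, het_band=0.3):
    """Toy call from mean PM-MM signals of allele A and allele B.

    het_band is a hypothetical cut-off on the relative allele signal.
    """
    a, b = max(signal_a, 0.0), max(signal_b, 0.0)
    if a + b == 0:
        return "NoCall"
    relative_a = a / (a + b)
    if relative_a >= 1 - het_band:
        return "AA"
    if relative_a <= het_band:
        return "BB"
    return "AB"

probes = probe_set()
print(len(probes))                     # number of probes per SNP
print(call_genotype(400.0, 450.0))     # both alleles give signal
```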

Genomic sample DNA was prepared following the whole genome sampling assay (WGSA), a complexity reduction method similar to the one proposed by Lucito et al.112. After digestion of total genomic DNA with a restriction enzyme and ligation to an adaptor recognizing the 4 bp overhang left by the restriction enzyme, PCR amplification was performed using a single, adaptor-specific primer. Amplified DNA was purified, fragmented with DNase and biotin-labelled, before allele-specific hybridization to the oligonucleotide array was performed (Fig. 6).

[Fig. 6 workflow: 250 ng genomic DNA; RE digestion (Xba); adapter ligation; PCR with single primer amplification (complexity reduction); fragmentation and labeling; hybridization; wash and stain; scan.]

Fig. 6: GeneChip Mapping Assay overview (Affymetrix). Total genomic DNA is digested with a restriction enzyme (RE) and ligated to an adaptor. A generic primer recognizing the adaptor sequence is used to amplify ligated DNA fragments. Purified, amplified DNA is fragmented, labelled and hybridized to a GeneChip array, which is then washed and stained on a GeneChip Fluidics Station and scanned by a GeneChip Scanner.

The Affymetrix approach is based on the hybridization of a single sample to the array (single-channel hybridization). The hybridization signal intensity (phycoerythrin fluorescence) is detected by a scanner after the hybridization buffer has been washed away from the array and the three-step staining procedure has been performed. The staining process is designed to amplify the signal of the annealed probe and comprises a first staining with streptavidin, followed by an incubation with biotinylated anti-streptavidin antibody and finally with streptavidin-phycoerythrin (SAPE) conjugates. The results are obtained through software applications that compare the acquired hybridization signal intensities of the analyzed genome with built-in average values from normal reference samples. SNP arrays proved to be effective high-throughput, genome-wide SNP genotyping platforms. Considering the increasing number of SNPs being discovered and the advancements in array technology, assay and algorithm development, the scalability potential of this approach was more than evident. The first SNP array allowing the simultaneous genotyping of over 10,000 SNPs was the Affymetrix GeneChip Mapping 10K Array122.

[Fig. 7 labels: plastic cartridge, notch, septa, probe array on glass substrate; front and back views.]

Fig. 7: GeneChip microarray (Affymetrix).

Affymetrix GeneChip Arrays consist of a square glass substrate mounted in a plastic cartridge, with a chamber that acts as a reservoir where hybridization and washing occur (Fig. 7). Each 10K Array is made of 8 μm x 8 μm features, each feature consisting of more than 1 million copies of a 25-mer oligonucleotide probe of defined sequence, synthesized in parallel by a manufacturing process that combines photolithography and combinatorial chemistry105. For each SNP, redundancy of probes is assured by 40 different oligonucleotides, each with a slight variation in perfect matches, mismatches and flanking sequence around the SNP. The median and average distances between SNPs are 113 kb and 258 kb, respectively. An improved version of this assay was proposed again by Matsuzaki et al., with a pair of oligonucleotide arrays for the genotype analysis of over 100,000 SNPs using two different restriction enzymes (XbaI and HindIII)123 (Affymetrix GeneChip Mapping 100K Set). Further improvements in genome coverage and in the range of questions that can be addressed were developed on the basis of the same protocol, from the Affymetrix GeneChip Mapping 500K Array Set to the Affymetrix Genome-Wide Human SNP Array 5.0 and 6.0 (www.affymetrix.com). The GeneChip Mapping 500K Array Set consists of two independent arrays (GeneChip Human Mapping 250K Nsp and Sty arrays) and two independent reagent kits that, combined, enable the genotyping of more than 500,000 SNP loci with a single primer each. Each 250K Array is made of 6.5 million 5 μm x 5 μm features, each feature consisting of more than 1 million copies of a 25-mer oligonucleotide probe of defined sequence. For each SNP, redundancy of probes is assured by 24 or 40 different oligonucleotides tiled around the SNP. In general, 14 offset quartets made of perfect match and mismatch probes for both A and B alleles were tested and the six best performing quartets, i.e. 24 probes, were chosen to represent a SNP.
For SNPs with higher genetic importance, 10 quartets, i.e. 40 probes, were selected. The median and average distances between SNPs are 2.5 kb and 5.8 kb, respectively. The SNP Array 5.0 is a single microarray covering all SNPs from the original two-chip Mapping 500K Array Set, as well as 420,000 additional non-polymorphic probes that can measure other genetic differences, such as copy number variants. The new Affymetrix Genome-Wide Human SNP Array 6.0 features 1.8 million genetic markers, including more than 906,600 SNPs and more than 946,000 probes for the detection of copy number variation. In the presented work, we adopted the arrays Affymetrix GeneChip Human Mapping 10K Xba 142 2.0 and GeneChip Human Mapping 250K Nsp for aCGH studies.

As already mentioned, SNP arrays were created for genome-wide genotyping, but they also started to be used for LOH115,116 and DNA CN analysis117,118. The original methods used for LOH detection, such as allelotyping using restriction fragment length polymorphism (RFLP) or polymorphic microsatellite markers (short tandem repeats, STRs; simple sequence length polymorphism, SSLP), are long and labour-intensive processes for large-scale studies, require a large amount of DNA and have a low resolution because of the low abundance of microsatellite markers. In contrast, allelotyping with SNP arrays makes it possible to study a large number of loci at one time with a small amount of sample material. In the studies by Mei et al.115 and Lindblad-Toh et al.116, LOH was detected at informative SNP positions, corresponding to heterozygous SNPs in the normal DNA and homozygous or no call in the tumor DNA. It was experimentally verified that a tumor purity of 90% gives essentially the same results as 100%116, and the identification of allelic imbalances, but not of true LOH, was possible with a background of up to 50% normal DNA115. Moreover, it was demonstrated that high-density SNP arrays allow the identification of LOH in tumor samples also in the absence of matched normal DNA124. This is possible because, by considering each SNP independently, the probability that a contiguous stretch of homozygous calls happens by chance can be computed. Chromosome regions with LOH are then inferred from this probability, based on the assumption that the occurrence of long stretches of homozygous calls along a chromosome is very unlikely and therefore may indicate a region of LOH.
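The reasoning behind LOH inference without matched normal DNA can be illustrated with a minimal sketch. It assumes that SNPs are independent and that a population heterozygosity rate is known for each SNP; the function names, the example rate of 0.3 and the significance cut-off are hypothetical and are not taken from the cited method.

```python
import math

def prob_homozygous_run(het_rates):
    """Chance that every SNP in a stretch is homozygous, assuming independence.

    het_rates: per-SNP probability of a heterozygous call in normal DNA.
    """
    return math.prod(1.0 - h for h in het_rates)

def looks_like_loh(het_rates, alpha=1e-3):
    """Flag a stretch whose chance probability falls below a hypothetical cut-off."""
    return prob_homozygous_run(het_rates) < alpha

short_run = [0.3] * 10   # ten consecutive homozygous calls
long_run = [0.3] * 20    # twenty consecutive homozygous calls

print(prob_homozygous_run(short_run), looks_like_loh(short_run))
print(prob_homozygous_run(long_run), looks_like_loh(long_run))
```

Under these assumptions the probability shrinks geometrically with the length of the stretch, which is why only long runs of homozygous calls are taken as evidence of LOH.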
This LOH analysis has a lower resolution than the one performed with paired normal and tumor samples. With the increasing density of SNP arrays it becomes more difficult to identify regions of acquired LOH in tumor samples without a paired control sample125. Since the algorithms used to compute significant regions of LOH are based on the identification of regions of contiguous homozygous calls, with an increased number of SNPs it becomes more likely that such regions occur naturally. It is still possible to infer LOH using freely available control group data, but this requires more careful manipulation and more complex algorithms126.

The first study exploring the possibility of generating both genotype and copy number data with the same platform used the WGSA114 and the Affymetrix p501 array117. The Affymetrix p501 array was designed as a prototype high-density oligonucleotide array for WGSA (it was in fact also referred to as the WGSA p501 array) and contained oligonucleotide probes representing more than 6,000 SNPs meeting stringent selection criteria. The probes were designed based on SNPs predicted to be contained in the 400-800 bp fragments of XbaI-digested genomic DNA and had a median spacing of 260 kb. Each SNP was interrogated by a set of probes according to the same array design presented in previous works114,127. After target preparation following the WGSA and hybridization to the array, the fluorescent intensity of the annealed probes was measured. For each SNP, the average intensity difference between perfect match and mismatch probes was calculated. A log transformation was introduced to make the distribution more Gaussian. After this calculation had been performed for all the SNPs on the array, the resulting values were scaled back to a mean of zero. The DNA CN was estimated after smoothing the obtained values with a running average over five SNPs (the test SNP and the four flanking positions). This was converted into CN by calculating the fluorescence ratio over the mean signal intensity from a reference series of 29 normal DNAs. In addition, to evaluate the significance of the variation in CN of the sample with respect to the reference set, a p-value was calculated indicating how likely the observed CN change was to occur in the normal population. The CN estimation procedure allowed the identification in cancer cell lines of amplifications (reported by at least three consecutive SNPs with CN>5) and homozygous deletions (reported by at least three consecutive SNPs with CN=0), but also of more subtle changes such as a CN of 3 or 1.

Zhao et al. presented a similar work on the simultaneous characterization of CN and allelic alterations in the cancer genome with high-density SNP arrays118. They analyzed cell lines with a defined DNA CN and clinical samples of solid tumors with the XbaI mapping assay114. The invariant set normalization method128 was used to normalize the probe intensities of all arrays to a common baseline probe intensity level, to allow a comparison of the signals across different arrays. Both observed and inferred DNA CN were determined. To assess the observed CN, the mean signal corresponding to a DNA CN of two was calculated for each SNP by averaging the signal values of all normal cell lines (B-lymphoblast cell lines used as normal control for each cancer sample analyzed).
On the other hand, the Hidden Markov Model (HMM) was applied to the raw signal data to obtain the inferred DNA CN. They could demonstrate that observed and inferred CN correlated well with the known DNA CN of the cell lines. The analytical method based on the HMM to infer the CN of each SNP from the hybridization intensity was implemented in the dChip software129. With the SNP array hybridization signal and the dChip model they could detect low- and high-level amplifications as well as hemizygous and homozygous deletions (defined as at least two SNPs covering >1 kb with an inferred CN of 0), also in human cancer-derived samples. Tumor purity proved to be an important issue, in particular for the detection of deletions. In fact, CN amplifications could still be identified in the presence of 60% tumor DNA, but 90-100% tumor purity was required for finding deletions. Interestingly, CN analyses gave similar results using either the paired normal DNA (B-lymphoblast cell lines used as normal control for each cancer sample analyzed) or a pooled normal DNA as reference sample. The possibility of identifying chromosomal CN changes in tumor samples in the absence of matched normal DNA was also demonstrated by others124. It is possible to estimate the CN of individual SNPs by comparing the signal intensity of each SNP from the tumor sample with the mean of the corresponding SNP in a reference set containing more than 100 normal individuals (Affymetrix CNAT software).

The use of high-density SNP arrays to detect CN alterations in cancer cell lines and clinical samples117,118 opened the way to the simultaneous genome-wide measurement of both DNA CN and LOH, which gives a clear advantage over previous aCGH approaches and allows a deeper insight into the nature of the underlying genomic alterations in cancer cells. As mentioned above, high-density SNP arrays can be used to discriminate between LOH mechanisms by analyzing CN changes125. A work on the identification of the mechanisms leading to LOH, based on microsatellite analysis and FISH of two tumor suppressor genes in breast carcinoma, proposed UPD as the potential LOH mechanism for 8% of the samples (four out of 50)35. A more recent study with Affymetrix 10K SNP arrays on acute myeloid leukaemia uncovered large regions of homozygosity with normal DNA CN in 20% of the samples (12 out of 64 AML samples). A comparison between paired tumor DNA and DNA at remission revealed that LOH was somatically acquired in leukemic cells due to pUPD36. A similar result was obtained in a study of 14 basal cell carcinomas and paired control samples37. Also in this case, UPD was identified as the somatic recombination event leading to copy-neutral LOH in 42% of the cases.
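Two ideas from this section, a smoothed CN estimate relative to a normal reference and the combined reading of CN and LOH, can be sketched together. This is a simplified variant written for illustration: it assumes that the signal scales linearly with copy number, it smooths log2 ratios rather than reproducing the exact order of operations of the cited studies, and the function names and the tolerance of 0.3 copies are hypothetical.

```python
import math

def running_mean(values, window=5):
    """Average each value with its neighbours (the window shrinks at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def estimate_cn(sample, reference_mean, normal_cn=2.0, window=5):
    """Smoothed copy number per SNP from sample and reference mean signals.

    Assumes signal is proportional to copy number (an idealisation).
    """
    log_ratios = [math.log2(s / r) for s, r in zip(sample, reference_mean)]
    return [normal_cn * 2 ** x for x in running_mean(log_ratios, window)]

def loh_mechanism(has_loh, cn, normal_cn=2.0, tol=0.3):
    """Interpret an LOH call in the light of the estimated copy number."""
    if not has_loh:
        return "no LOH"
    if cn < normal_cn - tol:
        return "LOH by deletion"
    if cn > normal_cn + tol:
        return "LOH with copy number gain"
    return "copy-neutral LOH (consistent with UPD)"

reference = [100.0] * 7
balanced = estimate_cn([100.0] * 7, reference)   # same signal as the reference
one_copy = estimate_cn([50.0] * 7, reference)    # half the reference signal

print(loh_mechanism(True, balanced[3]))
print(loh_mechanism(True, one_copy[3]))
```

Separating the CN estimate from the LOH interpretation mirrors the point made above: a region of LOH with a normal copy number points to UPD rather than to a deletion.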

1.1.2.2 Gene expression profiling

Gene expression profiling for characterizing changes in global expression levels between normal and cancer cells has proven useful in the identification of distinct signatures composed of representative transcripts for tumor classification and prognosis prediction. As an example, we can cite the two major molecular subtypes of diffuse large B-cell lymphoma (DLBCL), known as "germinal centre B-cell-like" (GCB) and "activated B-cell-like" (ABC), which have been defined by gene expression profiling studies130,131. They differ from a pathogenetic and clinical point of view. Moreover, gene expression profiling identified primary mediastinal B-cell lymphoma (PMBL) as an additional clinically and pathogenetically distinctive subgroup of DLBCL, strongly supporting a relationship between PMBL and Hodgkin lymphoma132,133. Very recently, a genomic profiling of DLBCL samples by high-resolution, genome-wide CN analysis coupled with gene expression profiling revealed different oncogenic pathways that are used by the DLBCL subtypes, reinforcing the view that they represent pathogenetically distinct diseases96. Another example is given by a study that established a 70-gene signature in breast cancer as a more powerful predictor of the outcome of disease in young patients than standard systems based on clinical and histologic criteria134. The 70-gene prognosis profile has been translated into a customized microarray (MammaPrint) suitable for high-throughput processing, and for the first time microarray technology is being validated as a reliable diagnostic tool to predict disease outcome in breast cancer patients135.

The principles of microarray technology and, in particular, of oligonucleotide arrays have already been presented in the previous chapter. The only difference is the target hybridized to the array: in aCGH studies the starting material is genomic DNA, whereas for gene expression studies it is mRNA or any other RNA species resulting from the process of gene expression. It is important to mention that the possibility of detecting some small RNAs, such as miRNAs, depends both on the array design (e.g. probe design, coverage) and on the target preparation, determined by the assay. This introduction will mainly focus on tiling arrays, which are a relatively new array type developed for high-resolution, unbiased (i.e. independent of previous annotation) transcriptome mapping. Usual expression arrays use probes designed at the 3'-end of a gene or on coding regions (surrogate strategy); others have probes interrogating each exon (annotation strategy), thus allowing the detection of splice variants. Tiling arrays systematically interrogate, at regular intervals, the non-repetitive part of one strand of the genome (Fig. 8). The standard target strand is the one in the reverse (R) orientation, which means that tiling arrays labelled with "R" are complementary to the reverse (-) strand of the genome. The great advantage they offer is their high resolution, which is measured from the central position of adjacent oligonucleotide probes (Fig. 8).

[Figure 8: schematic of Affymetrix array designs relative to known exons and unknown (unannotated) sequence: surrogate strategy (probes on the 3’ end), annotation strategy (probes on exons) and tiling strategy (unbiased); Affymetrix Tiling array with 35 bp resolution.]

Fig. 8: Affymetrix array design and Human Tiling array with 35 bp resolution. Tiling arrays are microarrays designed to cover whole chromosomes or genomes at regular intervals, regardless of genome annotation. Affymetrix Tiling arrays are made of 25-mer oligonucleotides, here presented with a 35 bp resolution, i.e. leaving on average a 10 bp gap between oligonucleotide probes. Array resolution is measured from the central position of adjacent probes.

Tiling arrays have proved useful for detecting transcripts outside the currently annotated genes (such as well-characterized exons from RefSeq, mRNA from GenBank and publicly available ESTs)136-140. Novel transcription activity was described within intronic regions of known genes, within intergenic genomic regions, and also overlapping annotated transcripts141. In one study, 31% of the transcription was detected in intergenic regions and 26% in intronic regions139. This scenario is further complicated by the issue of transcriptional direction, as discussed later. Hypotheses on the nature of this novel transcription activity include the following possibilities: ncRNAs, novel protein-coding transcripts or new splice variants, biological “noise”, or experimental artefacts. The term “transcript of unknown function” (TUF) was proposed by the ENCODE consortium as an alternative to ncRNA to denote novel, un-annotated eukaryotic transcripts with reduced protein-coding potential, not yet validated as ncRNAs142.

Three potential categories of TUFs can be identified, based on their relationship to annotated transcripts141. The first category includes antisense TUFs occurring in cis (i.e. overlapping sense transcripts on the opposite strand and presenting sequences completely complementary to exonic and/or intronic portions of the sense transcript) or in trans (i.e. synthesized at a genomic site distal from the sense transcript and only partially complementary to it) relative to the sense transcript. In the human genome it is predicted that at least 20% of the coding genes might have at least one antisense transcript143. The second category of TUFs consists of isoforms of annotated protein-coding transcripts, which therefore have the same transcriptional orientation as the known transcript. They present new exons or a new transcription start site. The third category corresponds to sense TUFs, either overlapping intronic regions of annotated genes or found in intergenic positions.

The assumption that novel transcripts possess some biological function and are not to be considered only as biological “noise” is supported by the fact that they seem to be conserved between the human and mouse genomes138,144. Moreover, it is known that mammalian genome conservation is significant also outside protein-coding sequences145. Some studies also evaluated the strand-specificity of novel transcription activity138,146.
As a result, tiling arrays revealed the existence of unknown transcripts overlapping with annotated genes, both on the same strand and on the opposite strand147. In protein-coding genes, sense transcription is more frequent than antisense, but it seems that for non-coding transcripts antisense transcription plays a more important role148. Some of these new transcripts have already been validated with independent approaches, such as (strand-specific) RT-PCR136,138,146,149, rapid amplification of cDNA ends (RACE)139, a combination of (strand-specific) RACE and hybridization of the obtained products to tiling arrays (RACE/array)139,147, PCR product cloning and sequencing136,139,146,147 or northern blot analysis146. Since these transcripts are present at low copy number per cell, their detection can be difficult and can require some technical adaptations, such as nested PCRs or 40 amplification cycles in PCRs.

In summary, tiling array studies of genome-wide transcription activity have identified novel RNA species transcribed from regions currently not annotated and have revealed that the proportion of the genome that is transcribed is larger than previously expected. There is a highly overlapping, complex transcription activity in the human genome, where one base pair can be part of many transcripts emanating from both strands of the genome. The functional importance of all the un-annotated transcripts is not yet known.

Oligonucleotide-based expression arrays

We will present gene expression profiling data collected with Affymetrix GeneChip Expression arrays called Human Genome U133 (HG-U133) and Affymetrix GeneChip Human Tiling arrays, both oligonucleotide arrays. The principles, the design and the manufacturing of oligonucleotide arrays have already been presented in the previous chapter. The application of oligonucleotide arrays for gene expression profiling implies rigorous sample preparation. The starting material is high-quality sample RNA. Sample RNA is reverse transcribed into cDNA, which in turn is used to prepare biotin-labelled complementary RNA (cRNA) targets. Finally, biotinylated cRNA is hybridized to the oligonucleotide arrays.

Affymetrix GeneChip Expression Array HG-U133

The Human Genome U133 (HG-U133) set consists of two GeneChip arrays (HG-U133A and HG-U133B) and contains almost 45,000 probe sets representing more than 39,000 transcripts derived from approximately 33,000 human genes. The design of the probes was based on the GenBank, dbEST, and RefSeq databases. The HG-U133A array includes representations of the RefSeq database sequences and probe sets related to sequences previously represented on the Human Genome U95Av2 array. The HG-U133B array contains primarily probe sets representing EST clusters. For our studies we adopted the HG-U133A arrays.

Affymetrix GeneChip Tiling Arrays

Tiling arrays from Affymetrix provide 25-mer probes at an average resolution of 35 bp, meaning that a gap of approximately 10 bp is left between adjacent oligonucleotide probes (Fig. 8). Sequences used in the design of the Human Tiling arrays were selected from the NCBI human genome assembly (Build 34), after removal of repetitive elements by RepeatMasker. Both coding and non-coding sequences were considered. Because of their high resolution, a set of several arrays is needed to cover all human chromosomes. The GeneChip Human Tiling 1.0R Array Set is designed for transcript mapping and includes 14 arrays with perfect match and mismatch probes, which cannot be purchased separately. The 14-array set contains approximately 45 million oligonucleotide probe pairs, and each array in the set contains over 3.25 million perfect match-mismatch probe pairs to specifically interrogate genomic regions. The GeneChip Human Tiling 2.0R Array Set consists of 7 arrays covering the entire genome with approximately 45 million oligonucleotide probes. Each array in the set contains over 6.5 million perfect match probes only; the set is intended for chromatin immunoprecipitation (ChIP) experiments but can be adapted for transcriptome mapping. The Human Tiling 2.0R Array Set is available as a whole-genome set of seven arrays, or individual arrays from the set may be purchased separately. In the presented work we used Human Tiling 2.0R F arrays, which cover chromosomes 8, 11, and 12.
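The relationship between probe length, resolution and inter-probe gap described above can be checked with a small calculation. The following Python sketch is only an illustration under simplifying assumptions: probes are placed at perfectly regular intervals (real arrays skip repeat-masked sequence, so 35 bp is an average), and the function names and the 200 bp example region are hypothetical. The per-array probe counts are the lower bounds quoted in the text.

```python
PROBE_LENGTH = 25   # bp, 25-mer probes (from the text)
RESOLUTION = 35     # bp between centres of adjacent probes (from the text)


def probe_intervals(region_start, region_end,
                    probe_length=PROBE_LENGTH, resolution=RESOLUTION):
    """Return half-open (start, end) intervals of idealized, evenly spaced probes."""
    intervals = []
    start = region_start
    while start + probe_length <= region_end:
        intervals.append((start, start + probe_length))
        start += resolution
    return intervals


def centre(interval):
    """Central position of a probe interval."""
    return (interval[0] + interval[1]) / 2


# Hypothetical 200 bp repeat-free region.
probes = probe_intervals(0, 200)
pairs = list(zip(probes, probes[1:]))

# Gap = start of next probe minus end of previous probe.
gaps = [nxt[0] - prev[1] for prev, nxt in pairs]
# Resolution = distance between centres of adjacent probes.
centre_spacings = [centre(nxt) - centre(prev) for prev, nxt in pairs]

# Set-level totals implied by the per-array lower bounds in the text.
total_1_0r = 14 * 3_250_000   # probe pairs in the 14-array 1.0R set
total_2_0r = 7 * 6_500_000    # perfect match probes in the 7-array 2.0R set

print(len(probes), gaps, centre_spacings)
print(total_1_0r, total_2_0r)
```

Under these assumptions the gap equals the resolution minus the probe length (35 - 25 = 10 bp), and both array sets come out at roughly 45 million features, in line with the figures quoted above.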

1.2 B-cell lymphoid neoplasms

1.2.1 Normal B-cell ontogeny

B-cells are bone marrow-derived lymphocytes. They are responsible for the production and secretion of soluble immunoglobulin (Ig) molecules or antibodies (ABs) that recognize foreign antigens (AGs), thus playing a crucial role in adaptive, humoral immunity. They express clonally diverse cell surface Ig receptors (or B-cell receptors) that recognize specific AG epitopes. The B-cell receptor (BCR) is an integral membrane protein complex composed of two Ig heavy (H) chains and two Ig light (L) chains joined by disulfide bonds. Each chain contains variable and constant regions. Other components of the BCR are the CD79A and CD79B molecules, which contain cytoplasmic activation motifs mediating intracellular signalling after BCR crosslinking. Like Ig receptors on the B-cell surface, soluble AB molecules are composed of two heavy and two light chains, and are subdivided into five classes according to their heavy chain constant regions (IgM, IgD, IgG, IgA and IgE). The variable regions of the heavy and light chains together form a domain that specifically recognizes the AG, and the heavy chain constant region determines how the AG will be removed from the body, the so-called effector function. The variable regions of the IgH chain are assembled from a variable (V), a diversity (D) and a junctional (J) gene among the ones present at the IgH gene locus on chromosome 14; the variable regions of the IgL chains, which can be of κ or λ type, are assembled from V and J elements located at the Igκ or Igλ gene locus on chromosome 2 and 22, respectively. This process of gene segment reshuffling of the Ig variable regions is known as V(D)J recombination.

In humans, B-cell development starts from lymphoid progenitors in primary lymphoid tissues, i.e. fetal liver and fetal or adult bone marrow (BM), and is followed by functional maturation in secondary or peripheral lymphoid organs, such as lymph nodes, spleen, and mucosa-associated lymphoid tissues (MALT), leading to terminally differentiated plasma cells (PCs) or memory B-cells (Fig. 9). PCs are non-proliferating, AB-secreting cells that circulate in the blood and lymphatic system. Memory B-cells have self-renewal capacity: they proliferate slowly and can rapidly respond to AG re-exposure by differentiating into both PCs and germinal centre (GC) B-cells in another round of affinity maturation. Self-renewal is a process by which a daughter cell is generated that maintains the same properties as the parental cell. In the hematopoietic system, self-renewal is observed in stem cells and memory cells of the immune system.

[Figure 9: schematic of B-cell development from the bone marrow to secondary lymphoid organs, including the germinal centre (GC).]

Fig. 9: B-cell development (modified from Sagaert et al.188). (see text for explanations)

B-cell maturation is dependent on a complex network of cytokines and transcription factors, some of them still to be uncovered, that regulate all maturation stages by influencing the gene expression of B-cells. Ig gene diversification is a key element of B-cell ontogeny and is based on developmentally regulated genetic instability processes. It is accomplished by three error-prone DNA remodelling mechanisms: V(D)J recombination, taking place at early stages in the BM, and class switch recombination (or isotype switching) and somatic hypermutation, both taking place within GCs of secondary lymphoid organs.

The earliest B-cells in the BM are the progenitor B-cells (pro-B-cells), characterized by their ability to rearrange Ig genes by V(D)J recombination. As already mentioned, V(D)J recombination is a process involving combinatorial rearrangements of V, D, and J segments of the IgH and IgL gene loci, resulting in individual variable domains within heavy and light chains150,151. This process is catalysed by the products of the recombination activating genes 1 and 2 (RAG-1/2), which are lymphocyte-specific endonucleases cutting the rearranging gene segments at specific recombination signal sequences (RSS)152. Recombination of the V, D, and J gene segments of the IgH chain locus gives rise to pre-B-cells. Subsequent rearrangement of the V and J gene segments of the IgL chain locus leads to immature B-cells with a vast repertoire of functional VDJH and VJL rearrangements encoding diverse surface IgM. Since these cells have not yet been exposed to external AGs, they are referred to as naïve B-cells. Immature B-cells subsequently give rise to mature B-cells, which express both IgM and IgD at the cell surface. Mature naïve B-cells with a functional BCR leave the BM, whereas B-cell precursors that fail to express a BCR undergo apoptosis153.

The majority of mature B-cells home to secondary lymphoid tissues, where AG encounter takes place. Upon AG encounter, mature B-cells are activated. They proliferate and can either directly differentiate into AB-secreting cells (including both proliferating plasmablasts and non-dividing PCs) (primary immune response), enter the GC reaction (secondary immune response), or undergo clonal expansion in extrafollicular sites (T-independent maturation)154. A first-line immune reaction is provided by AG-activated B-cells not entering the GC reaction, which transform into large lymphoblasts that proliferate in loco and differentiate into short-living PCs, responsible for the rapid release of low-affinity ABs. In contrast, lymphoblasts that migrate into the GC generate long-living PCs that produce high-affinity ABs, as well as memory B-cells (reviewed in Klein et al.155). GCs are highly specialized histological structures in secondary lymphoid tissues that are generated during the primary immune response. Within approximately two weeks the GC reaches its maximal size, then the structure slowly disappears. In GCs, BCR-activated B-cells undergo clonal expansion under the selective pressure of a given antigenic epitope, which drives the affinity maturation of the BCR.
During the GC reaction, Ig genes are exposed to a second wave of mutations and rearrangements through somatic hypermutation (SHM) of genes encoding the Ig variable regions of the heavy and light chains, and class switch recombination (CSR) at the IgH locus. SHM is associated with DNA double-strand breaks156,157 and randomly adds point mutations: mainly single nucleotide exchanges, but also deletions and duplications158. CSR results in a change of the Ig heavy chain constant region, whereby the AB class changes from IgM and IgD to either IgG, IgA or IgE, but its specificity is maintained159. CSR occurs by DNA recombination involving DNA double-strand breaks followed by non-homologous end-joining processes between specific repetitive regions preceding the IgH constant region. Both CSR and SHM are mediated by the B-cell specific enzyme activation-induced cytidine deaminase (AID)160,161. AID is a DNA-modifying enzyme that catalyses the targeted deamination of deoxycytidine residues into deoxyuridine, thus introducing a mismatch that is subsequently repaired by error-prone mechanisms with different outcomes. This represents a unique and specialized case of transcription-associated genetic instability necessary for the generation of genetic diversity.

GC B-cells can be divided into two subtypes: centroblasts and centrocytes155. Centroblasts are proliferating GC B-cells in which the variable regions of Ig genes are undergoing SHM, whereas centrocytes represent the progeny of centroblasts. Centrocytes undergo Ig CSR and are selected on the basis of the improved affinity of the Ig for a specific AG. Centroblasts are characterized by a fast proliferation rate, required for the Ig maturation process and accompanied by a dramatic upregulation of genes associated with cell proliferation and the downregulation of genes encoding negative regulators of clonal expansion162,163. Notably, DNA damage detection and repair systems are specifically suppressed in centroblasts164. Moreover, they present unlimited replicative potential thanks to telomerase expression165. On the other hand, centroblasts are particularly prone to apoptosis, probably because of the need to rapidly eliminate B-cells with disadvantageous Ig mutations. Specifically, GC B-cells that express self-recognizing receptors, fail to express a functional receptor, or acquire disadvantageous Ig mutations are eliminated by apoptosis. Only a few centrocytes acquire mutations in the BCR that increase their affinity for a specific AG and are selected for survival as post-GC B-cells. They differentiate into long-living, AB-secreting PCs or memory B-cells. The differentiation process leading to PCs is probably driven by the expression of a high-affinity BCR166, whereas differentiation to the memory pool seems to be a stochastic event during the GC reaction167, but the signals that induce post-GC differentiation are largely unknown.

B-cell lymphoma 6 (BCL-6) has been proposed as a major regulator of GC B-cell development155. Among the different B-cell stages, BCL-6 is particularly expressed by GC B-cells168,169. BCL-6 is a member of a large family of transcription factors containing zinc-finger motifs, which mediate specific DNA binding and cause transcriptional repression of target genes. In centroblasts it is involved in the repression of multiple cellular processes (e.g. negative cell-cycle regulation, genotoxic stress responses, PC differentiation, memory B-cell differentiation) to permit high proliferation and high mutation rates by CSR and SHM.
The expression of BCL-6 is, in turn, regulated by the level of DNA damage, which is sensed by a signaling pathway that promotes BCL-6 phosphorylation and ubiquitin-proteasome degradation when DNA damage accumulates170. The repression program of BCL-6 is probably downregulated by various mechanisms that allow, among others, the expression of B-lymphocyte-induced maturation protein 1 (BLIMP1), necessary for PC differentiation171,172, and of still unknown factors that lead to memory B-cell differentiation173. PC commitment has also been linked to the functional inactivation of the transcriptional repressor paired box protein 5 (PAX5), expressed in naïve, GC and memory B-cells174. An alternative pathway to acquire SHM is given by T-cell independent AG activation in extrafollicular areas of secondary lymphoid tissues (e.g. marginal zone of spleen or lymph nodes)154.

B-cell development is clearly characterized by complex and tightly regulated processes involving error-prone DNA remodelling events. Perturbations in normal B-cell ontogeny and selection can lead to immunodeficiency syndromes, autoimmune diseases, and B-cell tumors.

1.2.2 B-cell tumors

B-cell-derived malignancies comprise leukemias, lymphomas and multiple myeloma (MM). In principle, every B-cell developmental stage potentially represents the cellular origin of a distinct B-cell tumor, which in turn reflects the expansion of an aberrant, undifferentiated subclone175. For example, malignant PCs found in MM derive from post-GC PCs. The introduction of gene expression studies comparing the expression profiles of normal B-cell subsets representing different developmental stages with that of a B-cell tumor enabled further insights into the cellular origin of B-cell malignancies. This approach led, for example, to the identification of two major molecular subtypes of diffuse large B-cell lymphoma (DLBCL), known as “germinal center B-cell-like” (GCB) and “activated B-cell-like” (ABC)130, indicating a different pathogenesis96.

V(D)J recombination, SHM, and CSR mechanisms are of crucial importance in the pathogenesis of B-cell tumors. These DNA modification processes, necessary for Ig diversification, might cause mutations and/or chromosomal translocations if aberrant, since they intrinsically generate double-strand DNA breaks. The major oncogenic consequence of errors in these processes is the generation of chromosomal translocations involving one of the Ig loci (IgH, IgLκ or IgLλ) and a proto-oncogene176. As a result, the oncogene comes under the control of the Ig transcriptional enhancer, which leads to its deregulated, constitutive expression. Chromosomal translocations involving V(D)J recombination occur early in B-cell development and present translocation breakpoints close to the RSS site (e.g. t(14;18) affecting BCL2 in follicular lymphoma). Translocation breakpoints located in IgH switch regions are indicative of aberrant CSR mechanisms (e.g. t(4;14) deregulating FGFR3/MMSET or t(14;16) deregulating MAF in multiple myeloma), whereas translocation breakpoints within or adjacent to rearranged variable genes (e.g. c-MYC/IgH or IgL translocations in Burkitt’s lymphoma) suggest the involvement of SHM mechanisms. The SHM process mainly generates nucleotide exchanges but also deletions and duplications.

Besides chromosomal translocations involving Ig genes, other factors are considered to play important roles in the pathogenesis of B-cell tumors. Aberrant SHM could also target non-Ig genes and lead to tumorigenic mutations and translocations177-179. Moreover, the dependency on BCR signalling for survival, typical of normal B-cells, might also promote survival and proliferation of most malignant B-cells180. The activation of the BCR in B-cell tumors was found to be mediated by autoantigens181,182, foreign AGs from chronic infections by bacteria or viruses183-185, or synergistic effects of both foreign AGs and autoantigens, as proposed for gastric MALT lymphomas180. Interestingly, B-cell tumors not expressing a functional BCR are frequently found to be infected by Epstein-Barr virus (EBV), and it has been described that the expression of the EBV-encoded latent membrane protein 2A (LMP2A) can replace the BCR-mediated signals in murine B-cells186,187. Finally, in many B-cell malignancies the survival and/or proliferation stimuli are likely to be provided also by interactions with the tumor microenvironment180.

2 GENERAL AIM AND EXPERIMENTAL PLAN

A central aim of cancer research is to identify altered genes involved in the development of malignant cells. One strategy to detect those genes is the analysis of recurrent genetic aberrations and the identification of the targeted locus. Due to the complexity of cancer, it is advantageous to obtain a global view of both DNA CN and RNA changes for a better understanding of the impact of genomic changes on cancer biology. Moreover, integrative CN and expression analysis provides an efficient first-pass tool for distinguishing potential cancer genes from genes randomly affected by DNA CN changes without any effect at the transcriptional level. Microarray applications have proven to be the most accessible, high-throughput and high-resolution detection systems for genomic profiling studies, including both aCGH and gene expression analyses. They enable the detection of genetic alterations leading to altered chromosomal CN and altered gene expression profiles. It is known that changes in DNA CN can influence the expression and the activity of cancer-associated genes30-32, thus representing an important oncogenic mechanism.

Our aim was to discover DNA CN changes associated with B-cell lymphoid neoplasms and to detect candidate genes targeted by genetic lesions. Candidate genes could be helpful for improving the molecular characterization of B-cell tumors and for the identification of novel therapeutic strategies. In particular, we were interested in the discovery of candidate oncogenes activated by an increased DNA CN, which could represent relevant therapeutic targets. To identify candidate cancer genes we decided to perform aCGH studies. We used high-resolution SNP arrays to investigate the genome-wide DNA profiles of almost 200 B-cell tumors, including mantle cell lymphoma (MCL)1, diffuse large B-cell lymphoma (DLBCL)2, hairy cell leukaemia (HCL)3, HIV-related B-cell lymphoma4, and multiple myeloma (MM)5 samples.
We present here the project regarding the molecular characterization of MM (chapter 3) and the subsequent investigations that started from observations made in the course of the study (chapters 4 and 5). To improve our knowledge of the genetic lesions in MM and possibly identify new candidate cancer genes, we obtained the genome-wide DNA and gene expression profiles of MM samples. We focused our search for altered genes on recurrent aberration sites, since they probably do not occur at random but play a causative role. An additional criterion we considered to obtain more stringent results was gene expression behaviour. To identify those CN changes influencing the expression of relevant candidate genes we integrated global gene expression profiling and genomic profiling data. To select strong candidate genes among the transcripts with paired DNA and RNA levels, we focused our attention on recurrent candidates in our sample batch that were concordant with previously published MM data sets and/or already linked to MM or cancer. We principally concentrated on gained or amplified candidates, likely to be indicative of oncogenes.

In the second part of the work we characterized a recurrent amplification at chromosome 11 that we found in the course of aCGH studies with 10K genomic profiles. We decided to confirm and better resolve it with the updated 250K arrays. In particular we wanted to understand the biological role of 11q23.1 instability in B-cell tumors, and to identify candidate genes targeted by the recurrent amplification. Expression analysis was performed, looking for overexpressed transcripts. We used tiling arrays for transcriptome mapping to identify candidate genes, and we subsequently tested them by quantitative real-time RT-PCR (qRT-PCR) in a larger panel of B-cell tumors. Moreover, on the basis of the qRT-PCR results, we selected putative amplification targets to be further investigated by loss-of-function studies. Additionally, FISH analyses with BAC clones specific for the amplified region were performed on all four cell lines to investigate the DNA amplification patterns.

In the third part we analyzed the detailed maps of putative transcribed units of chromosomes 8, 11, and 12 obtained with tiling gene expression arrays during the study of the 11q23.1 amplification. Our aim was, on the one hand, to characterize the transcription activity identified on chromosomes 8, 11, and 12 with respect to the underlying DNA level, and, on the other hand, to exploit the high-resolution transcription mapping data to uncover new transcripts. The tiling expression profiles were combined with the genomic profiles. We wanted to localize genomic regions where increased transcription activity matched an increased DNA level and decreased transcription activity matched a decreased DNA level. The presence of novel transcription activity was evaluated by comparing the obtained regions with known annotation tracks.
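The idea of retaining only features whose DNA level and transcription level change in the same direction can be illustrated with a short Python sketch. It is a minimal example under stated assumptions and not the procedure used in this work: the log2-ratio inputs, the thresholds (0.3 for CN, 1.0 for expression), the function name and the gene labels are all hypothetical.

```python
def concordant_features(cn_log2, expr_log2, cn_threshold=0.3, expr_threshold=1.0):
    """Label features whose DNA copy number and expression change concordantly.

    cn_log2 and expr_log2 map a feature name to a log2 ratio versus a reference.
    Thresholds are illustrative assumptions, not values from this work.
    """
    labels = {}
    for name, cn in cn_log2.items():
        if name not in expr_log2:
            continue  # no paired expression value: cannot be integrated
        expr = expr_log2[name]
        if cn >= cn_threshold and expr >= expr_threshold:
            labels[name] = "gain/overexpressed"
        elif cn <= -cn_threshold and expr <= -expr_threshold:
            labels[name] = "loss/underexpressed"
        # Discordant or unchanged features are left out.
    return labels


# Hypothetical example data (log2 ratios).
cn = {"GENE_A": 0.8, "GENE_B": 0.7, "GENE_C": -0.6, "GENE_D": 0.0}
expr = {"GENE_A": 2.1, "GENE_B": 0.1, "GENE_C": -1.5, "GENE_D": 1.8}

result = concordant_features(cn, expr)
print(result)
```

In this toy data set GENE_B is gained but not overexpressed and GENE_D is overexpressed without a CN change, so neither is retained; only features with matched DNA and RNA changes remain as candidates.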

3 GENOMIC PROFILING OF MULTIPLE MYELOMA

Abstract

In this chapter we present the molecular characterization of multiple myeloma (MM) by genomic profiling approaches. Part of the presented results has already been published5. MM is an incurable plasma cell malignancy linked to heterogeneous genomic abnormalities, such as ploidy status alterations, chromosomal CN changes and non-random chromosomal translocations mainly involving the IgH gene locus at 14q32. To improve the molecular characterization of MM and identify new candidate cancer genes exploitable as therapeutic targets we obtained, with DNA microarrays, the genome-wide DNA profiles and gene expression profiles of 30 MM samples, comprising both cell lines (17 HMCLs) and clinical samples (13 DNA profiles, 12 gene expression profiles). As major characteristics we observed 1q gain and frequently increased DNA level of odd chromosomes (3, 5, 7, 9, 11, 15, 19), and recurrent DNA losses and LOH events at 1p, 13q and 17p, in concordance with previous knowledge, thus confirming the efficacy of the adopted method. To identify DNA CN changes influencing the expression of relevant genes we integrated global gene expression and genomic profiling data, either by applying a matched filter to both DNA and expression data or by the COPA-based6 detection of genes with outlier expression and underlying increased DNA CN. We were particularly interested in the selection of candidate oncogenes presenting increased DNA CN coupled with overexpression, which represent preferred targets for the development of novel therapeutic approaches. Moreover, we focused on candidate genes recurrent in our data set and in concordance with previous MM publications. We identified strong, recurrent candidates, as evidenced by the presence of transcripts with a proven role in MM pathogenesis, but also genes targeted in other tumor types, known oncogenes not yet implicated in MM and candidates concordant across various MM data sets.
Our data further support the evidence of the genomic complexity of MM and reinforce the role of integrative genomic approaches for improving the understanding of the molecular pathogenesis of MM. Further studies involving larger series of MM samples are needed to elucidate whether our selected transcripts can be considered relevant candidates in the pathogenesis of the disease, and whether they may contribute toward the identification of novel therapeutic targets.
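As a rough illustration of outlier-based detection, the sketch below applies the usual COPA-style transformation (centring each gene on its median and scaling by the median absolute deviation) and then keeps only outlier samples that also show increased DNA CN. It is a simplified sketch under assumptions: the cutoffs, the function names and the example values are hypothetical, and the COPA-based procedure actually applied in this chapter may differ (for instance, COPA typically ranks genes by a high percentile of the transformed values).

```python
from statistics import median


def copa_scores(values):
    """Median-centre one gene's expression values and scale by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]  # no spread: no sample can be an outlier
    return [(v - med) / mad for v in values]


def outliers_with_gain(expression, cn_log2, score_cutoff=5.0, cn_cutoff=0.3):
    """Indices of samples with outlier expression and increased DNA CN.

    Both cutoffs are illustrative assumptions, not values from this work.
    """
    scores = copa_scores(expression)
    return [i for i, (s, cn) in enumerate(zip(scores, cn_log2))
            if s >= score_cutoff and cn >= cn_cutoff]


# Hypothetical gene measured in five samples: expression and CN log2 ratio.
expression = [10, 12, 9, 11, 60]
cn_log2 = [0.0, 0.1, 0.0, 0.5, 0.8]

scores = copa_scores(expression)
hits = outliers_with_gain(expression, cn_log2)
print(scores, hits)
```

Because the median and the MAD are insensitive to a few extreme values, a gene strongly overexpressed in only a subset of samples still stands out, which is the property that makes this kind of transformation attractive for heterogeneous tumor series.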

3.1 Introduction

Multiple myeloma

Multiple myeloma (MM) is a B-cell tumor, specifically a malignant plasma cell (PC) disorder. It accounts for approximately 10% of all haematological cancers189. In Europe, as well as in the United States, it affects 15,000 new patients per year190. Although presenting a relatively homogeneous morphology, it is characterized by heterogeneous genomic abnormalities, such as ploidy status alterations, deletions/amplifications at different chromosomes and non-random chromosomal translocations mainly involving the IgH gene locus at 14q32. Despite high-dose chemotherapy regimens and stem cell transplantation, MM remains an incurable malignancy.

To analyze the possible causes of the malignant transformation of PCs, it is worth considering the biology of normal PCs191-193 (chapter 1.2). PCs are terminally differentiated B-cells. Like all B-cells, they undergo three DNA remodelling mechanisms necessary for the generation of AG-specific Ig: V(D)J recombination, somatic hypermutation (SHM) and class switch recombination (CSR). Malignant PCs develop from post-GC B-cells, thus from B-cells that have undergone V(D)J recombination, SHM and CSR. As previously discussed, these processes are believed to play a key role in the pathogenesis of B-cell tumors. These DNA modification mechanisms involve DNA double-strand breaks and might cause chromosomal translocations and somatic mutations. These, in turn, might activate oncogenes or inactivate tumor suppressor genes. Hallmarks of normal, mature PCs are the large amount of Ig produced, the high expression of syndecan-1 (CD138) and the expression of CD19 and CD38. PCs are rarely found in peripheral blood and usually represent 0.2% to 2.8% of bone marrow (BM) mononuclear cells. In contrast to highly proliferating plasmablasts (>30% DNA synthesizing cells), PCs have a very low proliferation rate.
MM is characterized by the accumulation of monoclonal PCs in the BM, at multiple intramedullary sites, and by elevated serum and urine monoclonal protein (Ig or part of the Ig protein) leading to end organ sequelae. Like healthy PCs, the majority of tumor cells have a low proliferation rate (<1% DNA synthesizing cells), which makes their analysis by conventional cytogenetics difficult. Only in advanced stages are malignant PCs found in the peripheral blood and highly proliferative. This condition is called plasma cell leukemia (PCL) and represents a very aggressive form of the disease.

Myeloma stem cells have been described as a minority of cancer cells with clonogenic potential, presenting self-renewal ability and resistance to toxic agents, like normal memory B-cells194. Myeloma cancer stem cells are regarded as the possible cause of disease relapse, but experimental proof is still needed to clearly address this topic.

MM is characterized by a heterogeneous clinical course, ranging from monoclonal gammopathy of undetermined significance (MGUS) to extra-medullary MM, with genomic instability increasing with disease stage195,196 (Fig. 10). MGUS is a premalignant condition with a low level of monoclonal PCs in the BM (<10%) and a low level of monoclonal Ig in the serum (M-Ig 0.5-3 g/dl). It is a relatively stable condition, with no associated symptoms. Progression to MM is unpredictable and occurs at a rate of 1%/year197. Malignant expansion, as in smoldering MM (SMM), is defined by a tumor content in the BM higher than 10%. SMM is an asymptomatic stage (no osteolytic lesions or organ complications) with a slightly increased level of PC proliferation and serum Ig, and with a high probability of progression to overt MM. Symptomatic MM is linked to clinical complications. Typical manifestations are lytic bone disease, hypercalcemia, cytopenia, immunodeficiency, anemia, renal dysfunction, hyperviscosity and peripheral neuropathy198. Bone destruction and impairment of BM physiology are due to a PC infiltration higher than 10%. End organ failure, e.g. of the kidneys, and increased blood viscosity are caused by highly increased M-Ig serum levels. MM might further develop into the extramedullary form if PC proliferation involves tissues other than the BM, such as blood, pleural fluid and skin. In case of blood involvement with 2 × 10^9/l or more circulating malignant PCs the disease is called PCL, which can also arise de novo. Human myeloma cell lines (HMCLs) are usually established from this stage of the disease, where tumor cells are more convenient to collect and have a high proliferation rate.

Fig. 10: Disease stages and timing of oncogenic events (from Bergsagel et al.196). (see text for explanation)

MM and MGUS are twice as common in blacks compared with whites and slightly more common in males than females189. Despite evidence for some clustering of MGUS and MM within families, the roles of genetic predisposition and environment remain unclear199-201.

Multiple myeloma therapy

A timeline depicting the history and treatment of MM from 1844 to the present is shown in figure 11 as proposed by Kyle et al.189.

Fig. 11: Timeline of history and treatment of multiple myeloma (from Kyle et al.189). (see text for explanation)

The first successful steps in MM therapy were made with alkylators and corticosteroid agents. The alkylating agent melphalan came into use in 1958202, to patients’ benefit. A few years later corticosteroids were also tested, in particular prednisone. The best results were obtained with the classic regimen of melphalan plus prednisone (MP), which improved survival by six months with respect to melphalan alone203. Other therapeutic modalities were introduced, combining various alkylating agents such as carmustine, melphalan, and cyclophosphamide with prednisone, but despite an improvement in response rate, this combination chemotherapy did not change response duration and overall survival. For this reason, MP remained the principal therapeutic approach against MM for decades. In 1983, the first autologous BM transplantation was reported in MM, preceded by a high-dose melphalan regimen204. Later studies used high-dose chemotherapy and total body irradiation prior to autologous or allogeneic BM transplantation205. Subsequent development of intense treatment programs using autologous transplantation helped in establishing high-dose therapy schedules and stem cell rescue as standard therapy for MM. Despite high-dose chemotherapy and stem cell transplantation, MM remains an incurable malignancy owing to intrinsic or acquired drug resistance. However, important advances in MM treatment are based on the introduction of therapeutic modalities targeting both tumor

cells and BM microenvironment. New therapeutic agents such as bortezomib, thalidomide or its analog lenalidomide can achieve responses in relapsed or refractory MM206. Thalidomide was first described as an antiangiogenic agent in 1994, in a rabbit cornea micropocket assay207. The increasing awareness of the importance of angiogenesis in MM as well linked thalidomide to this pathology. Tested in a clinical trial, thalidomide proved to be the first new drug with single-agent activity for MM208. The combination of thalidomide with steroids improved the response rates in relapsed disease to approximately 50%, and further to 65% in the combination with steroids and cyclophosphamide189. Lenalidomide is a 4-amino substituted analog of thalidomide. It was developed to improve the pharmacologic properties of thalidomide, and belongs to a class of molecules called immunomodulatory drugs. Lenalidomide activity was evaluated in preclinical studies, which revealed the following properties: induction of apoptosis or growth arrest even in resistant MM cell lines and patient cells; decreased binding of MM cells to bone marrow stromal cells (BMSCs); inhibition of the production, in the BM milieu, of cytokines mediating growth and survival of MM cells (IL-6, VEGF, TNF-alpha); blockade of angiogenesis; stimulation of host anti-MM natural killer cell immunity209-211. On the basis of these results, lenalidomide was tested in MM212. Lenalidomide plus dexamethasone is now used in relapsed patients. Although increased angiogenesis in MM BM and the antiangiogenic effect of thalidomide formed the empiric basis for its use in MM, in vitro studies demonstrated that thalidomide-derived immunomodulatory drugs trigger activation of caspase-8, enhance MM cell sensitivity to FAS-induced apoptosis, and downregulate nuclear factor (NF)-kappa B activity as well as expression of cellular inhibitor of apoptosis213. Bortezomib is a proteasome inhibitor.
It was shown that inhibition of proteasome pathway leads to cellular apoptosis and that malignant, proliferating cells are particularly susceptible to it214. Anti-myeloma activity of bortezomib was demonstrated in clinical studies, also in refractory cases215,216. Current MM treatment is based on stratification of patients according to their age, the stage of the disease and other prognostic factors indicative for high-risk myeloma like: deletion of chromosome 13 or hypodiploidy on conventional karyotyping; deletion of chromosome arm 17p or the presence of t(4;14) or t(14;16) on molecular genetic studies; high proliferating cells with labelling index ≥ 3%217,218. The median survival of high-risk patients is only two to three years, compared with five or more years in standard-risk cases189. At diagnosis, patients are evaluated for eligibility for transplantation. In the majority of cases autologous hematopoietic stem cell transplantation (ASCT) is undertaken219. However, allogeneic stem cell transplantation, if feasible, offers the advantage of a possibly more favourable outcome due to a graft-versus-tumor effect. The decision is principally based on patient’s age, and the limit for transplantation intervention is usually set at 65/70 years. A standard-risk transplant candidate will receive a pre-transplantation induction regimen based on novel agents219, like bortezomib/dexamethasone (BD) or bortezomib/thalidomide/

dexamethasone (BTD), both superior to thalidomide/dexamethasone (TD)220. After induction treatment, ASCT is performed, and maintenance therapy with thalidomide prolongs event-free survival and overall survival220. Patients not eligible for transplantation are treated with standard alkylating agents using one of the most common combinations, i.e. melphalan/prednisone/thalidomide (MPT), melphalan/prednisone/bortezomib (MPV) or melphalan/prednisone/lenalidomide (MPR). An alternative to these MP combinations would be lenalidomide/dexamethasone, particularly using low-dose dexamethasone220.

Genetic events in multiple myeloma

The genetics of MM is still an important field of research. Techniques such as classical cytogenetics, FISH and CGH allowed the identification of recurrent genetic aberrations. Structural and numerical chromosomal aberrations found in MM are reminiscent of epithelial tumors rather than hematological malignancies193. However, the ratio of balanced to unbalanced translocations is higher in MM than in epithelial tumors193. The first report on the existence of two patterns of cytogenetic abnormalities in MM, distinguishing hyperdiploid from non-hyperdiploid cases, was published by Smadja et al. in 1998221 and was confirmed by other studies196,222,223. Approximately 55-60% of patients have a hyperdiploid karyotype, as defined by a chromosome number ranging from 48 to 74. In general, hyperdiploid cases present trisomies of odd chromosomes, in particular 3, 5, 7, 9, 11, 15, 19, 21. The remaining cases belong to the so-called non-hyperdiploid group, with fewer than 48 or more than 74 chromosomes, including hypodiploid, near-diploid, pseudodiploid or near-tetraploid chromosome numbers. Ploidy status rarely changes with disease progression224, and it has been demonstrated that hyperdiploid cases tend to have a better prognosis than non-hyperdiploid ones222. Deletions at chromosome 13 (del13) occur in approximately 70%-75% of non-hyperdiploid MM, but only in 35%-40% of hyperdiploid cases. It remains to be clarified whether ploidy and del13 are independent prognostic factors196. In general, hyperdiploid cases are not affected by primary chromosomal translocations involving the IgH locus, whereas these are frequent among non-hyperdiploid cases225. Secondary translocations, which appear as late events, seem to occur with a similar prevalence in hyperdiploid and non-hyperdiploid tumors196. A publication by Carrasco et al. identified, by means of outcome-annotated genomic profiles, four different subgroups of patients226.
Traditional distinction of hyperdiploid (with odd chromosomes gained) vs non-hyperdiploid (with losses of 1p, 8p, 13, 16q, and 1q amplification) MM cases was confirmed. Interestingly, among hyperdiploid MM cases, two additional molecular subclasses with different clinical behaviour were described: hyperdiploid cases with gain of chromosome 11 showed a more favorable outcome than hyperdiploid cases with 1q gain and loss of chromosome 13. As previously anticipated, another characteristic feature of MM is the presence of chromosomal translocations involving the IgH locus at 14q32.3227, and less frequently the IgL

locus at 2p12 for κ light-chain or at 22q11 for λ light-chain228. Translocations involving the IgH switch region are detected both in MGUS and in MM, suggesting that they occur early in the disease process. They are defined as primary translocations, to differentiate them from the secondary translocation events typical of disease progression. Primary translocations are mediated mainly by errors in IgH switch recombination, but sometimes also by errors in SHM during GC reaction228,176. The prevalence of IgH translocations increases with disease stage: 50% in MGUS and in smoldering MM, 55-73% in intramedullary MM and 85% in PCL196. In HMCLs more than 90% of the cases bear IgH translocations. The most common translocation partners are chromosome 4p16.3 with the genes MMSET and FGFR3 (fibroblast growth factor receptor 3) (15%), 11q13 with CCND1 (cyclin D1) (15%-20%), 16q23 with MAF (5%-10%), 6p21 with CCND3 (cyclin D3) (5%) and 20q11 with MAFB (5%)196. These genes become overexpressed owing to juxtaposition to the strong Ig enhancer. t(4;14) causes a simultaneous deregulation of two proteins: MMSET on der(4) and FGFR3 on der(14)229,230. FGFR3 is a receptor tyrosine kinase that behaves as an oncogene in MM231. MMSET, also known as WHSC1, is a nuclear SET domain protein. Proteins with a SET domain are thought to be involved in chromatin remodelling. Interestingly, MLL1, another member of the mammalian SET domain protein family, is targeted by chromosomal translocations in acute leukemia232. The oncogenes MAF and MAFB are members of the same family of transcription factors with basic and leucine zipper domains, which also includes c-JUN and c-FOS. The centromeric 16q23 breakpoints are positioned within the introns of a very large gene, WWOX, located on a fragile site (FRA16D). The best studied oncogenic consequences of IgH translocations are those caused by MAF and FGFR3 deregulation.
c-MAF promotes MM cell proliferation and increases MM cell adhesion to BM stromal cells233. Studies on FGFR3 showed that inhibition of the receptor leads to differentiation and apoptosis of t(4;14) MM cells234. However, there are also cases with t(4;14) that show IGH/MMSET fusion transcript and lack FGFR3 expression235. Thus, the real role of FGFR3 is still to be clarified. Deregulated or increased expression of one of the cyclins D1 (CCND1), D2 (CCND2) or D3 (CCND3) is observed in virtually all MM tumors, suggesting a major role of cyclin D in MM pathogenesis236. Cyclin D family is involved in the regulation of physiologic cell cycle progression from G1 to S phase, therefore its involvement in tumor development is not surprising. However, it is strange to observe high levels of cyclin D1, D2 and D3 mRNA in virtually all MM tumors236, that, on the other side, are characterized by a very low proliferation index. The proliferation rate of MM PCs is comparable to the one of healthy PCs, but the latter express low levels of CCND2 and CCND3 and little or no CCND1236. CCND1 is often translocated, and deregulated in the majority of hyperdiploid tumors by gene amplification or polysomy. CCND3 is also targeted by IgH translocations, and CCND2 was indicated as a target of MAF transcription factors. A classification system based on the integration of gene

expression profiling data and cytogenetics data to identify patterns of translocations and cyclin D expression (TC classification) was proposed236. Briefly, MM samples were subdivided into eight groups196, referred to as “11q”, “6p”, “MAF”, “4p”, “D1”, “D1 + D2”, “D2”, and “none”. Tumors of the groups “11q” and “6p” express high levels of either cyclin D1 or cyclin D3 as a result of t(11;14) or t(6;14), respectively; tumors of the “D1” group ectopically express low to moderate levels of cyclin D1, despite the absence of a t(11;14) translocation, and the “D1 + D2” group shows, in addition, cyclin D2 expression. The group “D2” represents a mixture of tumors that do not fall into any of the other groups and express cyclin D2; the group “none” does not express D-type cyclins. Tumors of the “4p” group express high levels of cyclin D2 and also MMSET (and in most cases FGFR3) as a result of a t(4;14) translocation; the “MAF” group shows the highest expression levels of cyclin D2 and also high levels of either c-MAF or MAFB, consistent with the possibility that both MAF transcription factors upregulate the expression of cyclin D2. Supervised hierarchical clustering of gene expression profiles based on TC classification identifies homogeneous groups of tumors with distinctive patterns of gene expression and phenotype. Moreover, the TC classification system is primarily focused on oncogenic events considered by the authors to be very early, if not initiating, with the possible exception of the “D1 + D2” subgroup. Investigations on the influence of cytogenetic aberrations on MM patients’ outcome revealed that IgH chromosomal translocations are linked to prognosis. The major distinction can be made between t(4;14) and t(14;16), which are associated with a shorter survival, and t(11;14), which is associated with a longer survival with respect to other genetic subtypes237,238.
The partner loci are still unknown in a fraction of cases with IgH translocations239, thus new deregulated genes are still to be identified. In addition to the alterations of ploidy status and the presence of chromosomal translocations, gains and losses of specific chromosomal regions are another typical event in MM. Regions of chromosomal CN changes were described240-242, mainly including gain of 1q, 6p, 9q, 11q, 12q, 15q, 17q and 19q (gain of odd chromosomes associated with hyperdiploidy) and deletion of 1p, 6q, 8p, 13q, 16q, 17p and 22q. Some DNA CN aberrations are linked to prognosis222. Monoallelic loss of 13q (del13) is a common event in MM and is associated with short survival and low response rate. Interestingly, del13 detected by karyotype analysis (done on metaphases) corresponds to a worse prognosis than del13 detected by interphase FISH (done on nuclei), because karyotype detection requires a proliferative tumor in order to obtain metaphase cells. The identification of the minimal deleted region at 13q14243 did not reveal the role of del13 in MM pathogenesis. Although this is the locus where the tumor suppressor gene RB1 maps, inactivation of RB1 is improbable. The inactivation of its tumor suppressor function requires elimination of both alleles, and in MM biallelic deletion, inactivating mutations and lack of RB expression occur only rarely244. It is likely that more than one single gene is targeted by the deletion, and the biological explanation for worse prognosis linked to del13 remains

controversial. Losses of 17p, where the tumor suppressor gene TP53 is located, and of 1p245 are also related to poor prognosis. More recently, gains or amplifications of 1q were similarly linked to bad prognosis245-248. In particular, the candidate genes CKS1B, BCL9 and PDZK1 at 1q21 were suggested to be deregulated125,247,248. Deletion and LOH of chromosome arm 16q, and the involved genes WWOX and CYLD, were also described as being important in conferring poor prognosis249. MM progression is associated, from a molecular point of view, with increased instability at the genetic level. Some frequent events observed in advanced stages are mutational activation of NRAS and KRAS oncogenes, inactivation of CDKN2A, CDKN2C, CDKN1B and/or PTEN tumor suppressor genes and other mutational events including inactivation of TP53 and secondary translocations involving MYC, both linked to poor prognosis193. This explains the more aggressive nature of advanced stages. The two members of the RAS family, NRAS and KRAS, are mutated in 40-50% of MM patients, a percentage increasing from MGUS to MM250. The activation of RAS genes leads to increased activation of the mitogen-activated protein kinase (MAPK) pathway251. MAPK is known to play a role in MM cell proliferation and survival, and might also be involved in disease progression. No activating mutations of HRAS were reported. Interestingly, KRAS2 mutation is associated with a shorter survival, whereas the prognosis of patients with NRAS mutation is similar to that of non-mutated cases252. MYC abnormalities were identified by FISH analysis as complex translocations and insertions, and are frequently found in MM cell lines and in patients with aggressive disease and poor outcome253-255. MYC gene (c- >> N- > L-) dysregulation occurs by its juxtaposition to an Ig locus (IgH ~ Igλ >> Igκ) or to other less characterized chromosomal loci.
Besides the most frequent MYC, other genes are targeted by secondary translocations, like for example MUM1/IRF4, IRTA1/IRTA2, PAX5, MLL1, and CCND2193.
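The ploidy definition given earlier in this section (hyperdiploid karyotype: 48 to 74 chromosomes; non-hyperdiploid: fewer than 48 or more than 74) can be written as a short rule. The Python sketch below is only an illustration of that definition; the function names and the input format (a mapping from chromosome number to copy number) are hypothetical and not taken from any of the cited studies.

```python
# Odd chromosomes typically gained as trisomies in hyperdiploid MM (from the text).
TYPICAL_TRISOMIES = (3, 5, 7, 9, 11, 15, 19, 21)


def ploidy_group(chromosome_count: int) -> str:
    """Return 'hyperdiploid' for 48-74 chromosomes, else 'non-hyperdiploid'."""
    if 48 <= chromosome_count <= 74:
        return "hyperdiploid"
    # Covers hypodiploid, near-diploid, pseudodiploid and near-tetraploid counts.
    return "non-hyperdiploid"


def typical_trisomies_present(copies_by_chromosome: dict) -> list:
    """List the typical odd chromosomes present in exactly three copies.

    Chromosomes missing from the mapping are assumed to be disomic.
    """
    return [c for c in TYPICAL_TRISOMIES if copies_by_chromosome.get(c, 2) == 3]
```

For example, `ploidy_group(52)` falls in the hyperdiploid range, whereas a near-tetraploid count such as 85 is assigned to the non-hyperdiploid group.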

Gene expression profiling in multiple myeloma

DNA microarray technology was used in various recent studies to characterize the gene expression profiles (GEP) of malignant PCs245,256-263. GEP of more than 400 newly diagnosed patients led to a classification of MM into seven groups262, some of them clearly driven by the underlying IgH translocation or the ploidy status. One group (PR, proliferation) was characterized by the overexpression of genes involved in cell cycle and proliferation, and of cancer-testis antigen genes (e.g. MAGE, GAGE). Another group (LB, low bone disease) presented distinctively lower expression of interferon-induced genes, like IFI27, and of genes involved in bone disease, like the WNT signaling antagonists Dickkopf 1 (DKK1) and Frizzled B (FRZB), matching a lower number of focal lesions as defined by magnetic resonance. The MF group included t(14;16)(c-MAF) and t(14;20)(MAFB) samples, characterized by overexpression of MAF targets, such as CX3CR1 and ITGB7. The reciprocal t(4;14) led to activation of both FGFR3 and MMSET genes and clustered in the MS group,

including also overexpressed PBX1. The CD-1 group included t(11;14), leading to CCND1 overexpression, whereas CD-2 included t(6;14) samples with CCND3 overexpression. CD-1 and CD-2 shared similar transcription profiles, although they could be divided into two distinct groups by significantly differentially expressed genes. The HY group comprised hyperdiploid cases, overexpressing GNG11, the WNT signaling antagonists FRZB and DKK1, and several interferon-induced genes among others. The proliferation group was linked to disease progression. Moreover, the proliferation and MS groups were characterized by overexpression of 1q genes and poor prognosis. The same series of data was analyzed to define a molecular signature associated with short survival, and 70 genes with deregulated expression were linked to bad prognosis (70-gene high-risk signature)245. They comprised 51 upregulated genes and 19 downregulated genes, mainly mapping to chromosome 1: the upregulated genes were located at 1q, and the downregulated ones at 1p. Seventeen of these 70 genes were sufficient for prediction of bad prognosis. Moreover, supervised clustering with the 70 genes applied to healthy donors’ PCs, MGUS patients, MM patients and HMCLs revealed in the low-risk myeloma group a pattern similar to that of MGUS and normal PCs, whereas in the high-risk group the pattern was more similar to that of HMCLs. Several studies compared the GEPs of MM cells with those of MGUS and of PCs from healthy individuals. In a study by Zhan et al., BM PCs from 74 patients with newly diagnosed MM were compared to those of five MGUS patients and 31 healthy individuals as normal counterpart258. At least two classes of MM were identified: on one hand a GEP more similar to MGUS and, on the other hand, a GEP resembling that of HMCLs, with overexpression of genes involved in DNA metabolism and cell cycle control. In addition, 120 novel candidate disease genes were detected that discriminate normal and malignant PCs.
Many are involved in adhesion, apoptosis, cell cycle, drug resistance, growth arrest, oncogenesis, signalling, and transcription. A microarray expression analysis of MGUS and MM samples and their comparison to normal samples and between each other, allowed the identification of genes and pathways involved in the transformation from normal cells to MGUS and to MM260. Supervised analysis of the gene expression profiles of PCs from five healthy donors, seven MGUS and 22 MM patients revealed 263 genes differentially expressed between normal (N) and MGUS samples (172 downregulated, 91 upregulated), 380 between normal and MM samples (252 downregulated, 128 upregulated; 197 in common with the differentially expressed N to MGUS) and only 74 between MGUS and MM (52 downregulated, 22 upregulated). Most genes differentially expressed between normal and MGUS/MM PCs were downregulated and the differences of gene expression is, as expected, bigger between normal and either MGUS or MM than between MGUS and MM. This could be linked to the central feature of PC development, namely transcriptional repression. Gene expression profiles of PCs isolated from seven MGUS, 39 MM and six PCL patients were analyzed by means of DNA microarrays to investigate the contribution of specific genetic

lesions in MM biology264. Unsupervised analysis of GEP revealed molecular patterns associated with distinct IgH translocations. Supervised analysis of MGUS vs PCL expression profiles identified 120 differentially expressed probesets, linked to DNA replication and metabolism. Supervised analysis of MM vs MGUS expression profiles identified only 15 differentially expressed genes. Supervised analysis of differentially expressed genes in translocation subgroups revealed the overexpression of CCND2 and genes involved in cell adhesion pathways in cases with deregulated MAF and MAFB; upregulation of apoptosis-related genes in cases with t(4;14); downregulation of the alpha-subunit of the IL6 receptor in patients with the t(11;14). Moreover, a set of cancer germline antigens was specifically expressed in a subgroup of MM patients characterized by an aggressive clinical evolution. An interesting work presented by Munshi et al. compared the GEP of normal and malignant PCs purified from the BM of genetically identical twins261. One was a MM patient, the other was healthy and was used as normal reference. Two hundred and ninety-six genes were upregulated and 103 genes were downregulated at least 2-fold in MM cells versus normal twin PCs. Genes significantly upregulated in MM cells included cell survival genes, transcription factors, cell-cycle-related genes, stress response- and ubiquitin/proteasome pathway-related genes and various ribosomal genes reflecting increased metabolic and translational activity. Genes that were downregulated in MM cells versus healthy twin PCs included RAD51, killer cell immunoglobulin-like receptor protein, and apoptotic protease activating factor. More recently, whole-genome microarray analysis on a large cohort of patients identified genes differentially expressed between PCs from healthy donors, MGUS patients and MM patients263.
Fifty two genes involved in important pathways related to cancer were differentially expressed in the PCs of healthy subjects vs patients with MGUS and symptomatic MM. Among MM patients, unsupervised hierarchical clustering detected a subgroup with MGUS-like signature, that exhibited favorable prognosis. All these publications about microarrays analysis of gene expression in MM revealed new molecular subclasses, some of which linked to better prognosis, and improved the understanding of MM pathogenesis.

Integrative analysis of expression and genomic profiles

It was demonstrated that DNA CN changes are a mechanism that leads to altered gene expression in cancer cells30-32. Integrative studies combining genomic and gene expression profiles were undertaken by different research groups to clarify the consequences of genomic alterations in MM cell biology and to focus on a more restricted number of genes probably involved in the pathogenesis of MM5,125,226,265. Largo et al. characterized genes amplified and overexpressed in nine HMCLs (6/9 shared with our HMCL panel), with the aim of providing putative molecular targets for MM therapy265. They used FISH and cDNA microarrays for parallel CGH and gene expression analysis. The in-house CNIO oncochip containing 6,000 cancer-related transcripts was applied. With FISH analysis they classified the HMCLs into TC groups236, and through supervised analysis they identified 166 differentially expressed genes. By means of aCGH they identified recurrently amplified regions on five chromosomes (1, 7, 8, 11, 18). Recurrent gains were described at 1q, 4q, 7q, 8q, 11q, 15q, 18q and losses at 6q, 9p, 13q, 14q. To find genes whose expression levels were elevated because of increased DNA CN they combined CN analysis and expression profiles. Sixty genes amplified and overexpressed in at least two cell lines and 29 deleted and downregulated in at least two cell lines were presented. Finally, they described co-amplification and overexpression for MALT1 and BCL2. Carrasco et al. performed a comprehensive and integrated analysis of recurrent amplifications and deletions with associated gene expression alteration both in HMCLs (43) and outcome-annotated MM patients (67)226. By means of unsupervised classification of outcome-annotated genomic profiles using non-negative matrix factorization (NMF), they identified four distinct subgroups of patients. The traditional distinction of hyperdiploid (with odd chromosomes gained) vs non-hyperdiploid (with losses of 1p, 8p, 13, 16q; 1q amplification) cases was confirmed. Surprisingly, the expected difference in outcome between the two groups, i.e. better outcome in hyperdiploid vs non-hyperdiploid, was not significant. Interestingly, among hyperdiploid MM cases, two additional molecular subclasses with different clinical behaviour were described: hyperdiploid cases with gain of chromosome 11 had a more favorable outcome than hyperdiploid cases with 1q gain and loss of chromosome 13. Genomic profiles of 43 HMCLs and 67 MM patients presented recurrent gains of 1q, 3, 5, 7, 9, 11, 15, 19, 21 and losses of 1p and 13.
They identified 298 minimal common regions (MCR) in at least two samples among 43 HMCLs and 67 MM cases. After the application of a high priority filter on MCR based on their presence in primary tumors and the occurrence of an high amplitude event, 87 MCRs were selected, comprising 40 deletions and 47 amplifications. Within the 87 prioritized MCRs they proposed lists of MM candidate genes (oncogenes or tumor suppressor genes) defined as the ones with matching CN and RNA level and with altered gene expression also in the absence of gene dosage alterations. Moreover, they indicated known viral integration sites and human miRNAs residing within prioritized MCRs. Additionally, in supplementary table 2 of226 were listed the candidate oncogenes (amplified and overexpressed) mapping to 1q21-q23 with significantly increased expression in the hyperdiploid subgroup linked to poor prognosis. The same was done for candidate tumor suppressor genes (deleted and downregulated) located at 13q, that were significantly decreased in their expression in the hyperdiploid subgroup linked to poor prognosis (supplementary table 3 of226). Walker et al. investigated myeloma genomics of 30 patients by SNP-based mapping arrays and gene expression arrays to discover genes important in MM pathogenesis125. The most frequent CN changes they detected were gains of odd chromosomes due to hyperdiploidy and of chromosome arms 1q and 6p, and losses of 1p, 6q, 8p, 13q, 16q. In 12 cases they

had paired normal controls available, which were used to study LOH patterns. A total of 117 LOH regions present in 10% of the patients were described. They covered 3,041 genes. Through comparison of these genes with a previous list of transcripts involved in the progression from normal to MGUS and MM260, they reduced the number of common genes to 47: 23 differentially expressed in MGUS vs normal samples; 38 in MM vs normal; eight in MGUS vs MM. By integration of expression and SNP mapping data they found the most frequent alterations at 1p, 1q, 6q, 8p, 13 and 16q. At 1p 51 genes were deleted and downregulated; at 1q 94 genes were amplified and overexpressed; at 6q 30 genes were deleted and downregulated; at 8p 46 genes had matching LOH and downregulation; at 13q 48 genes were deleted and downregulated; at 16q 60 genes were deleted and downregulated. In a work we recently published on the molecular characterization of a panel of 23 HMCLs, we used a similar integrative genomic approach5. To extend and improve the knowledge of useful experimental models like cell lines, we combined FISH with gene expression and genome-wide profiling. Some results will be reported here. Briefly, the expression profiles of the genes targeted by the main IgH translocations showed that WHSC1/MMSET was expressed at different levels in all of the HMCLs and that the expression of the MAF gene was not restricted to the HMCLs carrying t(14;16)(q32;q23), thus suggesting that molecular mechanisms other than chromosomal translocation could be responsible for their activation. MAFB overexpression correlated with translocation events involving its locus at chromosome 20. Cyclin D1 was only expressed in t(11;14)-positive cell lines, cyclin D2 expression was observed in correlation with MAF and MAFB expression but also in the absence of their upregulation, and cyclin D3 was present in all HMCLs but was particularly elevated in case of t(6;14) translocation.
Genome-wide DNA profiling revealed remarkable genomic instability involving large DNA regions. DNA profiles identified 1q gain in 88% of the analyzed cell lines, together with recurrent gains on 8q, 18q, 7q, and 20q; the most frequent deletions affected 1p, 13q, 17p, and 14q; almost all cell lines (94%) presented LOH on chromosome 13. Recurrent amplifications, defined as CN higher than four in at least two cell lines, were identified. The most recurrent ones were localized at 1q21.1-q24.2 and 8q24.21-q24.3. Two hundred and twenty-two genes were found to be simultaneously overexpressed and amplified in our panel. In particular, 23 genes mainly located on chromosomes 1q, 8, 13, and 18, were overexpressed and amplified in at least two cell lines, including the BCL2 locus at 18q21.33.
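The selection of genes that are both amplified and overexpressed in at least two cell lines can be sketched as a simple filter. The Python example below is a minimal illustration under stated assumptions, not the pipeline used in the cited studies: the function name and input layout are hypothetical, overexpression calls are taken as a precomputed input (the text does not specify how they were made), and the sketch assumes that amplification (CN higher than four, as defined above) and overexpression must coincide in the same cell line.

```python
def amplified_and_overexpressed(copy_number, overexpressed_in,
                                cn_threshold=4, min_cell_lines=2):
    """Return genes amplified and overexpressed together in enough cell lines.

    copy_number:      dict gene -> dict cell_line -> DNA copy number
    overexpressed_in: dict gene -> set of cell lines where the gene is
                      called overexpressed (calls made upstream; assumption)
    """
    hits = {}
    for gene, cn_by_line in copy_number.items():
        over = overexpressed_in.get(gene, set())
        # Keep cell lines where CN exceeds the threshold AND the gene is overexpressed.
        lines = {line for line, cn in cn_by_line.items()
                 if cn > cn_threshold and line in over}
        if len(lines) >= min_cell_lines:
            hits[gene] = lines
    return hits
```

With toy data in which a placeholder gene has CN 6 and 5 in two hypothetical cell lines and is overexpressed in both, the gene is retained; a gene amplified in only one line, or amplified in two lines but overexpressed in one, is filtered out.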

The bone marrow microenvironment

Besides the genomic alterations in MM cells, another crucial feature of MM pathogenesis is represented by the BM microenvironment. The BM microenvironment is important for survival, growth and differentiation of both normal and malignant PCs. It is known that the interactions between PCs and their microenvironment promote tumor growth, survival, drug resistance and migration to the BM.

The BM consists of extracellular matrix (ECM) and cellular components266. The ECM proteins include fibronectin, collagen, laminin and osteopontin, whereas the cellular fraction is constituted by haematopoietic stem cells with progenitors and precursor cells, cells of the immune system, erythrocytes, osteoclasts and osteoblasts, bone marrow stromal cells (BMSCs) and bone marrow endothelial cells (BMECs). Cell-cell or cell-protein interactions are mediated by adhesion molecules on the MM cell surface, for example CD44, very late antigen 4 (VLA4), VLA5, leukocyte function-associated antigen 1 (LFA1), neural cell adhesion molecule (NCAM), intercellular adhesion molecule 1 (ICAM1), syndecan 1, MPC1266. Additional important players are cytokines and growth factors. Some of the most cited are: interleukin 6 (IL6), insulin-like growth factor 1 (IGF1), B-cell activating factor (BAFF), fibroblast growth factor (FGF), stromal cell-derived factor 1α (SDF1α; receptor on MM cells: CXCR4), tumor necrosis factor-α (TNF-α), IFN-α, IL10, transforming growth factor-β (TGFβ), hepatocyte growth factor (HGF), WNT family, epidermal growth factor (EGF) family, vascular endothelial growth factor (VEGF)190,266. These cytokines are usually provided by the microenvironment in a paracrine manner, but some of them can also be produced by the MM cells themselves in an autocrine stimulation loop initiated by cell-cell interactions or owing to genetic aberrations of the malignant PCs. The interactions of MM cells with surrounding cells or secreted cytokines and growth factors activate signalling pathways that mediate growth, survival, drug resistance, migration, osteoclastogenesis and angiogenesis267. The major pathways involved are: NF-kB, PI3K-AKT, RAS-RAF-MAPK, JAK/STAT3266. As a consequence, cytoplasmic sequestration of transcription factors, upregulation of cell-cycle regulatory proteins and anti-apoptotic proteins, and increased activity of telomerase are observed195.
Investigations into the regulation of these pathways in MM led to the identification of various cytokines and growth factors with major roles, and GEP in MM contributed substantially to their discovery190. Three categories of growth factors were identified in relation to myeloma cell survival and proliferation, subdivided according to the signalling pathways they activate in PCs268. First, IL6, cytokines of the IL6 family and interferon-α activate the JAK/STAT and MAP kinase pathways. STAT3 binding elements are found in the promoters of several genes encoding antiapoptotic proteins, such as MCL1, BCL2 and BCL-XL. Second, the PI-3K/AKT pathway is activated by IGF1 and heparin-binding factors, and mainly leads to inhibition of apoptosis. Besides these growth factors, the PI-3K/AKT pathway is also regulated by the tumor suppressor gene PTEN. The fact that PTEN can be deleted or mutated in MM underlines the importance of the PI-3K/AKT pathway in this pathology269. Heparin-binding factors interact with MM cells via syndecan-1 (a heparan sulfate proteoglycan), which presents them to their specific receptors. They comprise EGF family members, which activate ErbB receptors; hepatocyte growth factor (HGF) and its receptor MET270; and fibroblast growth factors (FGFs) and their receptors (FGFRs). The EGF/EGF-receptor family includes four receptors (ErbB1-4) and 10 ligands (EGF, AREG, TGF-α to ErbB1 (EGFR); HB-EGF, BTC, EPR to ErbB1 and 4; NRG1, NRG2 to ErbB3 and 4; NRG3 and 4 to ErbB4). The expression of ErbB

receptors is altered in many epithelial tumors but had not previously been implicated in haematological malignancies. In a GEP study, the expression and biological activity of the EGF/EGF-receptor family were studied by comparing PCs from 65 MM patients and 7 normal individuals to plasmablasts and B-cells. Five EGF-family genes were expressed in MM: NRG2 and NRG3 were expressed by MM only, while NRG1, AREG and TGF-α were expressed both in MM and in normal plasma cells271. The third category of growth factors comprises the tumor necrosis factor family members BAFF and APRIL, which bind to tumor necrosis factor (TNF) receptors such as BAFF-R, BCMA and TACI, and activate the NF-kB pathway. BAFF is involved in the survival of normal and malignant PCs; APRIL is highly expressed in several tumor-surrounding tissues, where it stimulates tumor growth272. TACI and BCMA are overexpressed in malignant PCs with respect to their normal counterpart, whereas their ligands, BAFF and APRIL, are produced by the tumor microenvironment273. Recently, by means of aCGH and GEP, two groups identified a heterogeneous panel of genetic alterations that activate the NF-kB pathway in about 20% of MM patients and 40% of MM cell lines274,275. They observed overexpression and/or gain of function of genes activating the pathway, such as NIK, NFKB1, NFKB2, CD40, LTBR and TACI. Conversely, inactivating abnormalities of NF-kB negative regulators, such as TRAF3, cIAP1, cIAP2, CYLD, TRAF2 and BIRC2/BIRC3, have also been described. The two studies disagree only on whether the canonical or the noncanonical NF-kB pathway is mainly deregulated. In normal PCs, NF-kB is also activated, but in a controlled manner190.
From this complex picture of the interactions of MM cells with their surrounding tissue, it is clear that the malignant behaviour of MM PCs is due both to genetic alterations leading to deregulated gene expression in the PCs and to the activating signals provided by the microenvironment. It is not surprising, therefore, that adhesion molecules, cytokines and all the interactions of MM cells with the BM microenvironment are regarded as potential therapeutic targets, as demonstrated by the clinical trials exploiting some of them266. Another important feature of the interaction of MM cells with the BM microenvironment is that advanced stages of the disease become increasingly independent of the stimuli provided by the surrounding BM milieu, owing to the accumulation of diverse genetic aberrations. In fact, as already discussed, late stages are more aggressive and exhibit an increased rate of proliferation. This explains the expansion of malignant PCs of advanced MM from the BM to the peripheral blood or other tissues193.

3.2 Aim

The aim of our project was to improve the molecular characterization of MM and possibly identify new cancer genes exploitable as therapeutic targets. To this end, we used DNA microarrays to obtain genome-wide DNA and gene expression profiles of 29 MM samples, comprising both cell lines (17) and clinical samples (12). Our major goal was the identification of candidate oncogenes; we therefore focused mainly on amplified regions. As already discussed, amplification is an important mechanism that can activate genes relevant for tumorigenesis, and activated, overexpressed genes are favoured targets for the development of novel therapeutic approaches. With the available data we could describe the recurrent aberrations found in the genomic profiles of MM samples. To identify the aberrations that influence the expression of relevant genes, we integrated global gene expression profiling and genomic profiling data. Our working hypothesis was that causative lesions in the MM genome are those that have a biological effect at the transcription level, are recurrent in our data set and are concordant with previously published data sets. Recurrent aberrations are more likely to play a causative role than to be the consequence of random events. We therefore undertook a comprehensive and integrated analysis of gains and deletions together with their associated alterations at the gene expression level in all myeloma samples. To focus on a more restricted list of candidate genes, we filtered the resulting lists of selected transcripts with matching DNA and RNA levels for the recurrent ones. Finally, we considered candidate genes more relevant if they were concordant with other MM data sets and if they had previously been linked to MM or, more generally, to cancer.

3.3 Materials and methods

Clinical samples

Bone marrow aspirates from 12 untreated MM patients and one untreated PCL patient were collected at the Fondazione IRCCS, Policlinico and Dipartimento Scienze Mediche, University of Milan, Italy, after informed consent during standard diagnostic procedures. PCs were purified from bone marrow samples, after red blood cell lysis with 0.86% ammonium chloride, using CD138 immunomagnetic microbeads according to the manufacturer’s instructions (MidiMACS system, Miltenyi Biotec, Auburn, CA, USA)264,276,277. Briefly, 10 μl of CD138 microbeads and 90 μl of buffer M (PBS, pH 7.2, BSA 0.5%, EDTA 2 mM) per 10 x 10^6 mononuclear cells were added and incubated for 45’ at 4°C. The coated cells were then labelled with 10 μl of fluorescein isothiocyanate (FITC)-conjugated CD138 antibody for 10’ at 4°C. After washing, the cells were layered on an MS+ separation column following the manufacturer’s instructions (Miltenyi Biotec, Bergisch Gladbach, Germany). The CD138+ cells were recovered from the column; their purity, assessed by cytofluorimetric evaluation of cells expressing CD138, was on average 71% (range: 45% - 98%; table 1A, chapter 3.4.1).

Cell lines

HMCLs were obtained from the following sources: DSMZ, German Collection of Microorganisms and Cell Cultures, Germany (NCI-H929, OPM2, JJN3, RPMI8226, U266 and KMS12); Dr T. Otsuki, Kawasaki Medical School, Okayama, Japan (KMS28, KMS34, KMS18, KMS11, KMS26, KMM1 and KMS20); Dr F. Malavasi, Department of Genetics, University of Torino, Italy (LP1, SKMM1); or were established at the Fondazione IRCCS, Policlinico and Dipartimento Scienze Mediche, University of Milan, Italy (CMA02 and CMA03)278. The cell lines were cultured in Iscove’s modified Dulbecco’s medium (IMDM) (GIBCO, Invitrogen AG, Basel, Switzerland) supplemented with 10% fetal bovine serum (GIBCO) and maintained at 37°C in a humidified atmosphere with 5% CO2. CMA02 and CMA03 were cultured in the presence of 20 U/ml recombinant human IL-6 (R&D Systems, Minneapolis, MN, USA).

Fluorescence in situ hybridization (FISH)

FISH analysis was used to characterize the MM samples (patients and cell lines) for the presence of the IgH chromosomal translocations typical of MM. FISH was performed at the Fondazione IRCCS, Policlinico and Dipartimento Scienze Mediche, University of Milan, Italy, as previously described5,255,276. Briefly, bone marrow cell suspensions were cultured in RPMI-1640 medium supplemented with 15% fetal calf serum, whereas cell lines were cultured in their respective growth medium. Both primary cells and cell lines were incubated for 24 h at 37°C in 5% CO2 without any mitogen. The cells were then treated with colcemid (0.1 μg/ml) (Celbio, Milan, Italy) for 90 minutes, harvested using hypotonic potassium chloride, fixed with a 3:1 (v/v) ratio of methanol/glacial acetic acid and stored at -20°C. The slides were hybridized in situ with probes labelled with biotin or the fluorochrome Cy3 (Amersham, Little Chalfont, UK) by nick translation, as previously described279,280. The biotin-labelled DNA was detected using FITC-conjugated avidin as a green signal (Vector Laboratories, Burlingame, CA); direct Cy3 was detected as a red signal. The chromosomes were stained with 4’,6-diamidino-2-phenylindole dihydrochloride (DAPI). Digital images were collected using a Leica DMR epifluorescence microscope (Leica Imaging Systems Ltd., Cambridge, UK) equipped with a CCD camera (Cohu Inc., San Diego, CA), and the FITC-avidin, Cy3 and DAPI fluorescence signals were detected using specific filters. The images were recorded and merged using QFISH software (Leica Imaging Systems Ltd., Cambridge, UK).

Genome-wide DNA profiles

Genome-wide profiles were obtained with the GeneChip Mapping 10K Xba 2.0 assay in conjunction with the GeneChip Human Mapping 10K Array Xba 142 2.0 according to the manufacturer’s instructions (Affymetrix Inc., Santa Clara, CA, USA). This assay enables genotyping of more than 10,000 SNP loci on a single array. The following paragraphs describe the procedures for DNA target preparation, target hybridization, fluidics station setup, array scanning and data analysis.

Genomic DNA preparation: isolation, quality control and quantification

Human genomic DNA was isolated using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. Genomic DNA samples were quantified by spectrophotometric measurements using a Nanodrop (NanoDrop Technologies, Wilmington, DE, USA). DNA concentration was adjusted to 50 ng/μl with reduced EDTA TE Buffer (Appendix). DNA integrity was verified by electrophoresis of at least 250 ng of total genomic DNA on a 1% agarose gel prepared and run in TBE buffer (Appendix). After electrophoresis, DNA was stained in an aqueous solution of ethidium bromide (0.5 μg/ml) (AppliChem GmbH, Darmstadt, Germany) and visualized using the AlphaImager 3400 (AlphaInnotech Corporation, Fremont, CA, USA). Genomic DNA degradation was detected as a smear produced by DNA fragments of variable lengths. In contrast, high quality genomic DNA ran as a major band at approximately 10-20 kb. Degraded genomic DNA samples were discarded.

GeneChip Mapping 10K 2.0 assay protocol

250 ng of genomic DNA (50 ng/μl) were digested with a mix prepared on ice containing 1x NE Buffer 2 (New England Biolabs, Beverly, MA, USA), 1x BSA (New England Biolabs) and 0.5 U/μl Xba I (New England Biolabs) in a final volume of 20 μl. The digestion mix was incubated at 37°C for 120’, then at 70°C for 20’, and cooled to 4°C in a MJR Thermal Cycler 200 (MJ Research, Cambridge, MA, USA). Digested DNA was ligated to the Xba adaptor by adding a reaction mix prepared on ice and containing 0.25 μM Adaptor Xba, 1x T4 DNA Ligase buffer (New England Biolabs) and T4 DNA Ligase (New England Biolabs) diluted 1:20, in a final volume of 25 μl. For good ligation efficiency, it is essential to mix the ligase buffer thoroughly before use and to hold it at 4°C. The ligation program was run in a MJR Thermal Cycler 200 as follows: 16°C for 120’, 70°C for 20’ and final cooling to 4°C. The ligated DNA was diluted 1:4 with water (Aqua ad injectabilia, B.Braun) prior to PCR. PCR amplification of ligated DNA fragments was performed in four replicates for each sample. The PCR Master Mix (mmix) was prepared on ice with the AmpliTaq Gold PCR Kit (Applied Biosystems, Rotkreuz, Switzerland) and contained 1x PCR Buffer, 250 μM each dNTP (Invitrogen, Carlsbad, CA, USA), 2.5 mM MgCl2, 0.75 μM PCR Primer Xba and 0.1 U/μl AmpliTaq Gold in a final volume of 100 μl, with 10 μl of diluted ligated DNA for each replicate. The PCR program, run on a MJR Thermal Cycler 200 with heated lid, was: initial denaturation at 95°C for 3’, followed by 35 cycles of 95°C for 20’’, 59°C for 15’’ and 72°C for 15’’, and a final step at 72°C for 7’. The quality of the PCR products was checked on a 2% agarose gel (TBE). The expected average size of the PCR products is between 250 and 1,000 bp.

PCR products were purified and eluted with the QIAGEN MinElute 96 UF PCR Purification Kit (Qiagen) according to the manufacturer’s instructions. Briefly, a 96 UF PCR Purification plate was connected to a vacuum source and the four PCR reactions per sample were consolidated into one well of the plate. Under vacuum, the wells were dried and washed twice with water. The vacuum was released and the purified PCR products were eluted by adding 40 μl of EB buffer to each well and shaking the plate at room temperature for 5’. The purified PCR product was recovered by pipetting the eluate out of each well and was quantified with a Nanodrop (NanoDrop Technologies). 20 μg of purified PCR product in a volume of 45 μl were needed for the subsequent fragmentation step. Fragmentation was performed in a final volume of 55 μl containing, besides the 45 μl of purified PCR product, 1x Fragmentation Buffer and 0.24 U Fragmentation reagent. The Fragmentation reagent (DNase I) is very sensitive; for good efficiency it requires rapid handling, fresh preparation and low temperature (work on ice). Fragmentation was performed at 37°C for 30’, followed by a DNase inactivation step at 95°C for 15’, on a MJR Thermal Cycler 200. The fragmentation results were checked on a 4% agarose gel (TBE). The expected average fragment size is less than 180 bp. The fragments were then labelled with 0.143 mM GeneChip DNA Labelling Reagent, 1x TdT Buffer and 1.5 U/μl Terminal deoxynucleotidyl Transferase (TdT). The GeneChip DNA Labelling Reagent contains a proprietary biotinylated label and is used to generate labelled DNA targets for hybridization to GeneChip arrays. The mix was prepared on ice, and the incubation at 37°C for 120’ and 95°C for 15’ was performed on a MJR Thermal Cycler 200.
Buffers and solutions for array hybridization, washing and staining were prepared according to the GeneChip Mapping 10K 2.0 Assay Manual and, if recommended, were filtered with Stericup vacuum driven disposable filtration systems (Millipore AG, Zug, Switzerland). The Hybridization Cocktail was prepared mixing 70 μl of fragmented and labelled DNA target with 56 mM MES, 5% DMSO (Sigma-Aldrich Chemie GmbH, Buchs, Switzerland), 2.5 x Denhardt’s Solution (Sigma-Aldrich Chemie GmbH), 5.77 mM EDTA (Ambion Europe Ltd., Huntingdon, UK), 0.115 mg/ml Herring Sperm DNA (Invitrogen), 1x Oligonucleotide Control Reagent, 11.5 μg/ml Human Cot-1 (Invitrogen), 0.0115% Tween-20 (Sigma-Aldrich Chemie GmbH), 2.69 M TMACL (Tetramethyl Ammonium Chloride) (Sigma-Aldrich Chemie GmbH), to a final volume of 260 μl. The Hybridization Mix was heated at 95°C for 10’, cooled down on ice for 10’’ and centrifuged at 2,000 rpm for 1’. After an incubation at 48°C for 2’, 80 μl of denatured Hybridization Mix were injected into the GeneChip Human Mapping 10K Array, that was placed at 48°C in a GeneChip Hybridization Oven 640 (Affymetrix), at 60 rpm and incubated for 16-18 hours. To wash, stain, and scan a probe array, we first registered the experiment in GeneChip Operating Software (GCOS). Briefly, each experiment was assigned with a name, the probe

array type used (for 10K arrays: Mapping10K_Xba142), the name and other specifications of the sample, and the name of the project. Washing and staining were performed with the Affymetrix Fluidics Station 400. The scripts for the fluidics station were downloaded from the Affymetrix web site at www.affymetrix.com/support/technical/fluidics_scripts.affx. After 16-18 hours of hybridization, the hybridization cocktail was removed from the probe array, which was completely filled with Array Holding Buffer. We then followed the fluidics protocol “Mini_Mapping10Kv1”. It consisted of initial cycles of washing with Wash Buffer A and B, followed by a first stain with Streptavidin Phycoerythrin (SAPE), a wash with Wash Buffer A, a second stain with biotinylated anti-streptavidin antibody and a third stain again with SAPE. The second and third stains amplified the fluorescent signal. The staining reagents were prepared fresh according to the manual: the SAPE solution mix contained 1x Stain Buffer and 10 μg/ml Streptavidin Phycoerythrin (SAPE) (Molecular Probes Inc., Eugene, OR, USA); the Antibody Solution mix contained 1x Stain Buffer and 5 μg/ml biotinylated anti-streptavidin antibody (goat) (Vector Laboratories, Inc., Burlingame, CA, USA). After the final wash cycles, at the end of the wash and stain procedure, the array was filled with Array Holding Buffer and the probe array window was checked for bubbles. If large bubbles were present, we manually drained the Array Holding Buffer from the array and refilled it with fresh Holding Buffer. The arrays were finally scanned using the Affymetrix GeneChip Scanner 3000. The library files “Mapping10K_Xba142” required to scan and analyze the 10K Xba 2.0 arrays were downloaded from the Affymetrix web site at www.affymetrix.com/support/technical/libraryfilesmain.affx.
Data acquisition was performed by GCOS, which automatically generated cell intensity data (CEL files) from each array. This is possible thanks to the presence of the Oligonucleotide Control Reagent (OCR) in the hybridization mix. OCR is composed of the Hybridization Control Stock and the Control Oligo B2. The Hybridization Control Stock consists of premixed, biotin-labelled non-eukaryotic transcripts in staggered concentrations, which bind to specific probe sets and serve as controls for proper hybridization, washing and staining procedures. Control Oligo B2 is a synthetic biotinylated oligonucleotide that provides alignment signals for image analysis and enables automatic grid alignment on the hybridization signal intensities collected by scanning. The creation of a CEL file indicates good hybridization and successful grid alignment, which allow the acquisition and evaluation of intensity data.

Data analysis

Data acquisition was performed using GCOS v1.2 and data analysis was done with GeneChip DNA Analysis Software (GDAS) v3.0, both Affymetrix software applications.

CHP files with raw perfect match and mismatch intensities and genotype calls for each interrogated SNP were generated for each array by GDAS, starting from the raw intensity data (CEL files). The GDAS calculation is based on the modified partitioning around medoids (MPAM) mapping algorithm114. Subsequent data analysis was performed using the open-source statistical program R and the Bioconductor packages281. Mapping data for probes were derived from the National Center for Biotechnology Information (NCBI) Human Genome Build 35, as provided by Affymetrix for the GeneChip Mapping 10K Array Xba 142 2.0 (release date: 22 September 2005). A total of 172 non-annotated probes were discarded, along with a further 75 probes that had identical raw CN across the whole dataset, indicating a possible probe failure. The GeneChip Chromosome Copy Number Analysis Tool (CNAT) v2.1β282 was used to extract CN and loss of heterozygosity (LOH) probability starting from the GDAS-derived raw CHP files. In the absence of paired tumor and normal DNA from the same sample, CN and LOH were inferred by the software with respect to an integrated reference set containing more than 100 normal individuals. The raw CN data were pre-processed by taking the ratio of each raw SNP value to the average signal of three strictly normal references, in order to reduce multiplicative noise from the probes. The resulting value is referred to as normalized CN. Raw LOH data were not pre-processed. After pre-processing, chromosome CN profiles were estimated using the DNAcopy Bioconductor package, which looks for optimal breakpoints through successive statistical comparison of chromosomal segments by applying the circular binary segmentation (CBS) algorithm283. The resulting CN is referred to as estimated CN and was used for further analyses of genome-wide DNA CN profiles unless otherwise specified.
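The pre-processing step described above (ratio of each raw SNP value to the average of the normal references, plus removal of probes with identical raw CN in every sample) can be illustrated with a minimal Python sketch. The function names and toy values are hypothetical and are not taken from the thesis, whose analysis was run in R; the CBS segmentation performed by DNAcopy is not reproduced here.

```python
from statistics import mean


def normalize_cn(raw_sample, raw_references):
    """Divide each raw SNP value by the mean reference value at that SNP.

    raw_sample: list of raw CN values, one per SNP.
    raw_references: list of reference profiles (three in the text),
    each in the same SNP order as raw_sample.
    """
    normalized = []
    for i, value in enumerate(raw_sample):
        reference_mean = mean(ref[i] for ref in raw_references)
        normalized.append(value / reference_mean)
    return normalized


def constant_probes(raw_by_probe):
    """Return indices of probes whose raw CN is identical in every sample.

    raw_by_probe: list of rows, one row per probe, one value per sample.
    """
    return [i for i, row in enumerate(raw_by_probe) if len(set(row)) == 1]


# Toy data (hypothetical): two SNPs and three normal references.
references = [[2.0, 2.0], [2.0, 1.0], [2.0, 3.0]]
print(normalize_cn([2.0, 3.0], references))
print(constant_probes([[1.5, 1.5, 1.5], [2.0, 2.1, 2.0]]))
```

Dividing by a per-SNP reference mean removes probe-specific multiplicative effects, which is the stated purpose of the normalization.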
The median of the estimated profiles was then scaled back to a nominal multiplicity of two, or to some other known value if ploidy had been determined by cytogenetic analyses. Estimated CN values were discretized into four major CN states: CN ≤ 1.7 for deletions, 1.7 < CN ≤ 2.3 for normal, CN > 2.3 for gains and CN > 4 for amplifications. We defined regions with significant LOH as those with a LOH probability value higher than 4. These thresholds corresponded to a statistical significance interval of p<0.005 that was evaluated separately using a larger data set of 85 profiles of mixed B-cell neoplasms. All the data obtained with the Affymetrix GeneChip Mapping 10K 2.0 assay were collected in an Excel table listing, for each interrogated SNP (rows) and each processed sample (columns), the genotype call, the normalized CN, the estimated CN and the LOH probability. After grouping all MM samples in a single Excel table, or MM patients and HMCLs separately, two columns were added for each sample: one reporting the CN state assigned to each SNP and the other the LOH state, both defined according to the above-mentioned thresholds. In addition, a conditional formatting procedure was applied to these latter columns for easier visualization of the

results. For CN, 1 (green) corresponded to deletion, 2 to normal and 3 (red) to gain; whereas for LOH, 2 (grey) corresponded to significant LOH. Three additional columns were added in every Excel table (30 MM samples, 17 HMCLs or 13 MM samples), in which the percentage of cases with gain, loss and significant LOH among the analyzed samples was calculated for each SNP. Frequency tables were assembled from these calculations according to the following criteria. First, chromosome X was not taken into consideration. Second, regions affected by gain, deletion or LOH in at least 40% of the cases (38% in 13 MM patients and 41% in 17 HMCLs) were defined as recurrent and were reported in the frequency tables as delimited by the SNP-ID coverage. Finally, the minimal affected interval was reported for each recurrent region with the corresponding frequency level. A complete characterization of the genome-wide CN changes of the 17 HMCLs, as determined by the thresholds on estimated CN, was summarized in a table. This table lists all the chromosomal regions, except those on chromosome X, affected by gain or deletion in each cell line, according to intervals delimited by SNP-ID coverage. It was created as a useful instrument for choosing, among the available cell lines, one carrying a defined aberration under investigation. Heat-maps and frequency plots were created with the open-source software R for all MM samples together, or for MM patients and HMCLs separately, according to the thresholds on the estimated CN and the calculated frequencies. The heat-map is a graphical representation that uses different colors to distinguish regions with different DNA CN or LOH probability.
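The scaling, discretization and recurrence steps described above can be sketched in Python as follows. This is a minimal sketch, not the Excel and R workflow used in the thesis: the function names are hypothetical, amplifications are assumed to count as gains when frequencies are computed, and the merging of adjacent SNPs into regions delimited by SNP-ID coverage is not shown.

```python
from statistics import median


def rescale_to_ploidy(estimated_cn, ploidy=2.0):
    """Scale a profile so that its median equals the nominal ploidy."""
    centre = median(estimated_cn)
    return [value * ploidy / centre for value in estimated_cn]


def cn_state(cn):
    """Map an estimated CN value to one of the four states in the text."""
    if cn > 4:
        return "amplification"
    if cn > 2.3:
        return "gain"
    if cn <= 1.7:
        return "deletion"
    return "normal"


def is_gain(cn):
    # Assumption: amplified SNPs (CN > 4) are also counted as gained.
    return cn > 2.3


def is_loss(cn):
    return cn <= 1.7


def recurrent_snps(cn_by_sample, predicate, min_fraction=0.40):
    """Indices of SNPs where predicate holds in >= min_fraction of samples.

    cn_by_sample: list of per-sample CN profiles in the same SNP order.
    """
    n_samples = len(cn_by_sample)
    n_snps = len(cn_by_sample[0])
    recurrent = []
    for i in range(n_snps):
        hits = sum(1 for profile in cn_by_sample if predicate(profile[i]))
        if hits / n_samples >= min_fraction:
            recurrent.append(i)
    return recurrent


# Toy data (hypothetical): five samples, three SNPs.
profiles = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 2.0],
    [2.0, 2.0, 2.5],
    [2.0, 2.0, 2.0],
    [1.5, 2.0, 5.0],
]
print(recurrent_snps(profiles, is_loss))  # SNP 0 is lost in 3 of 5 samples
print(recurrent_snps(profiles, is_gain))  # SNP 2 is gained in 3 of 5 samples
```

Passing the state test as a predicate keeps one frequency routine for gains, losses and, with a suitable predicate on LOH probability, significant LOH.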

Quantitative real-time PCR (qPCR)

Relative gene copy numbers of selected amplified regions (CN > 4) identified by the aCGH analysis were determined by quantitative real-time polymerase chain reaction (qPCR). Regions were chosen for validation either because of an associated increase in gene expression level or because they mapped to a region (11q23.1) also identified as amplified in concomitant projects (chapter 4). qPCR was performed using the ABI PRISM 7000 Sequence Detection System and the SYBR GREEN PCR mmix (Applied Biosystems). Primers were designed using Primer Express 2.0 software (Suppl. Tab. 1). Primers LINE-1 F and LINE-1 R, specific for the repetitive element of the same name, Line-1, were used as endogenous control118,284. All primers were synthesized by Sigma-Genosys (Steinheim, Germany). PCRs were performed in 96-well plates with optical adhesive covers (Applied Biosystems) in a final volume of 20 μl with 1x SYBR GREEN PCR mmix and 10 ng of genomic DNA. Cycling parameters were: 95°C for 10 min, followed by 40 cycles of 95°C for 15’’ and 60°C for 1’. All samples were analyzed in triplicate, together with a negative control containing all PCR reagents without any template DNA (no template control, NTC) and a DNA sample from a healthy donor as normal reference. The standard curve method was used to calculate target gene copy number in the tumor DNA sample normalized to the

repetitive element Line-1. Normal DNA was used as calibrator for relative quantification. The copy number of normal DNA was set to two in order to facilitate the interpretation of the results. Dissociation curve analysis was performed to verify the absence of non-specific amplification.
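The relative quantification described above can be sketched in Python. This is a minimal sketch under stated assumptions: each amplicon has a linear standard curve Ct = slope x log10(quantity) + intercept, triplicate Ct values are averaged before conversion, and all names, slopes, intercepts and Ct values are hypothetical rather than taken from the thesis.

```python
from statistics import mean


def quantity_from_ct(ct, slope, intercept):
    """Invert a standard curve Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def relative_copy_number(sample_ct, normal_ct, curves, normal_cn=2.0):
    """Target copy number in a sample, calibrated on normal DNA.

    sample_ct / normal_ct: dicts with replicate Ct lists under the keys
    "target" and "line1" (the Line-1 endogenous control).
    curves: dict mapping the same keys to (slope, intercept) tuples.
    """
    def normalized(ct_values):
        target = quantity_from_ct(mean(ct_values["target"]), *curves["target"])
        line1 = quantity_from_ct(mean(ct_values["line1"]), *curves["line1"])
        return target / line1

    # The normal calibrator is set to two copies, as in the text.
    return normal_cn * normalized(sample_ct) / normalized(normal_ct)


# Hypothetical standard curves and triplicate Ct values.
curves = {"target": (-3.0, 30.0), "line1": (-3.0, 30.0)}
tumor = {"target": [24.0, 24.0, 24.0], "line1": [27.0, 27.0, 27.0]}
normal = {"target": [27.0, 27.0, 27.0], "line1": [27.0, 27.0, 27.0]}
print(relative_copy_number(tumor, normal, curves))
```

In this toy example the tumor target amplifies three cycles earlier than in normal DNA at equal Line-1 signal, which with a slope of -3 corresponds to a tenfold excess, i.e. about 20 copies.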

Gene expression profiles

Gene expression profiling data were generated at the Fondazione IRCCS, Policlinico and Dipartimento Scienze Mediche, University of Milan, Italy, with GeneChip Human Genome U133A (HG-U133A) arrays according to the manufacturer’s instructions (Affymetrix). HG-U133A is one of the two arrays constituting the HG-U133 Set (U133A + B) and covers the best-characterized genes. Bone marrow specimens from 102 MM patients, 11 MGUS and 9 PCL patients, together with four healthy donors, were collected and processed with the same assay by the collaborating group in Milan285. The gene expression levels of the transcripts selected by integration of genomic and gene expression profiles are presented.

Total RNA preparation: isolation, quality control and quantitation

Total RNA was extracted from PCs using the TRIZOL reagent (Invitrogen) and purified using the RNeasy total RNA Isolation Kit (Qiagen, Valencia, CA, USA). RNA integrity was verified with the Agilent 2100 Bioanalyzer (Agilent Technologies, Waldbronn, Germany).

GeneChip Human Genome U133A assay protocol

Biotin-labelled cRNA was prepared following the GeneChip Expression Analysis Technical Manual protocol (Affymetrix) and hybridized on GeneChip Human Genome U133A (HG-U133A) arrays according to the manufacturer’s instructions (Affymetrix) and previously published works264,286. Briefly, 5 μg of total RNA were converted into double-stranded cDNA using the Custom SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen). The cDNA was extracted with Phase Lock Gel Light (Eppendorf, Hamburg, Germany), which is based on the phenol-chloroform phase separation technique. Biotinylated cRNA was generated in an in vitro transcription reaction with the Enzo BioArray HighYield RNA transcript labelling kit (Enzo Life Sciences, Inc., Farmingdale, NY, USA) and purified with the RNeasy total RNA Isolation kit (Qiagen). 15 μg of fragmented cRNA were hybridized on HG-U133A arrays after quality checking on GeneChip Test3 arrays (Affymetrix). The arrays were scanned using the GeneArray Scanner G2500A (Agilent).

Data analysis

The expression probes were annotated using the latest annotation for the HG-U133A chip set (release date 17 September 2005). A total of 1530 non-annotated probes were disregarded, along with a further 2029 probe sets of the special types "_x", "_r" and "_f", which are not unique for a single gene sequence. Probe level signals were converted to expression values using the Bioconductor package for Robust Multi-array Average (RMA) analysis287, in which perfect

match intensities are background adjusted, normalized using quantile normalization, and log2 transformed. Detection calls were calculated using the default parameters of the Affymetrix MAS 5.0 software package. Data with absent calls in all arrays were filtered out, whereas no filtering procedure was applied to the intensity levels. The images were processed using Affymetrix MicroArray Suite (MAS) 5.0 software to generate gene expression intensity values.
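The quantile normalization, log2 transformation and absent-call filter mentioned above can be illustrated with a simplified Python sketch. It is not the Bioconductor RMA implementation: background adjustment and probe-set summarization are omitted, ties are broken by position rather than averaged, and the function names, the "A" call code and the toy data are assumptions made for illustration.

```python
from math import log2


def quantile_normalize(columns):
    """Give every array (column) the same empirical intensity distribution.

    columns: list of arrays, each a list of intensities in the same probe
    order. Ties are broken by position, a simplification.
    """
    n_probes = len(columns[0])
    sorted_columns = [sorted(column) for column in columns]
    rank_means = [
        sum(column[rank] for column in sorted_columns) / len(columns)
        for rank in range(n_probes)
    ]
    result = []
    for column in columns:
        order = sorted(range(n_probes), key=lambda i: column[i])
        normalized = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            normalized[probe_index] = rank_means[rank]
        result.append(normalized)
    return result


def log2_transform(columns):
    """Apply log2 to every intensity (all values must be positive)."""
    return [[log2(value) for value in column] for column in columns]


def drop_all_absent(calls_by_probe, absent="A"):
    """Keep probe sets with at least one non-absent call across arrays."""
    return [
        probe for probe, calls in calls_by_probe.items()
        if any(call != absent for call in calls)
    ]


arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))
print(drop_all_absent({"probe_1": ["A", "A"], "probe_2": ["A", "P"]}))
```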

Integrative analysis of expression and genomic profiles

Using the annotation tables of the Affymetrix GeneChip Mapping 10K 2.0 Array and the GeneChip HG-U133A Array, we located the chromosomal position of each HG-U133A expression probe (also referred to as probe-ID). Using a bisection search algorithm, we found the two SNP probes of the 10K array (also referred to as SNP-IDs) lying immediately to the left of its consensus start position and immediately to the right of its end position, respectively. Subsequently, a table of SNP-expression probe pairs was created by pairing each expression probe with these two SNP probes and with any other SNP probe in between; a total of 38,539 SNP-expression probe pairs were found. We used two different integrative approaches to find transcripts with altered gene expression due to DNA CN changes, that is gained or amplified genes matching overexpressed transcripts and deleted genes matching downregulated transcripts, which we will refer to as selected transcripts. In the first integrative approach, we applied a matched filter on both the DNA CN and the gene expression value. Gene expression probes were first pre-filtered for a variation of at least 0.2. Then, for each SNP-expression probe pair, the matched filter selected the cases with amplification (CN > 4) and simultaneous overexpression of the paired gene of at least 2-fold above the average expression value of all cases. The same was done for DNA CN gain (CN > 2.3) and for DNA CN loss (CN ≤ 1.7), in the latter case looking for simultaneous downregulation of the paired gene of at least 2-fold below the average expression value of all cases.
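The probe pairing and the matched filter described above can be sketched in Python with the standard `bisect` module. This is a minimal sketch under stated assumptions: SNP positions are sorted along a single chromosome, expression values are on the log2 scale so that a 2-fold change relative to the average corresponds to a difference of 1, and all function names and toy values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def flanking_snps(snp_positions, start, end):
    """Indices of the SNP just left of start, just right of end, and between.

    snp_positions: sorted positions of the SNP probes on one chromosome.
    """
    left = bisect_left(snp_positions, start) - 1   # last SNP before start
    right = bisect_right(snp_positions, end)       # first SNP after end
    low = max(left, 0)
    high = min(right, len(snp_positions) - 1)
    return list(range(low, high + 1))


def matched_filter(cn, log2_expr, cn_test, direction):
    """Cases where the CN test holds and expression deviates >= 2-fold.

    cn, log2_expr: per-case values for one SNP-expression probe pair.
    direction: "up" for overexpression, "down" for downregulation.
    """
    average = sum(log2_expr) / len(log2_expr)
    hits = []
    for case, (copy_number, expression) in enumerate(zip(cn, log2_expr)):
        delta = expression - average
        expressed = delta >= 1.0 if direction == "up" else delta <= -1.0
        if cn_test(copy_number) and expressed:
            hits.append(case)
    return hits


# Toy data (hypothetical): five SNPs and a transcript at 250-350.
print(flanking_snps([100, 200, 300, 400, 500], 250, 350))

cn = [2.0, 5.0, 2.5, 1.5]
expr = [7.0, 9.0, 7.0, 5.0]
print(matched_filter(cn, expr, lambda c: c > 4, "up"))       # amplified
print(matched_filter(cn, expr, lambda c: c <= 1.7, "down"))  # lost
```

Bisection on the sorted positions finds each flank in logarithmic time, which matters when every expression probe has to be paired against about 10,000 SNPs.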
Our second method was based on the identification of genes with an outlier expression profile according to the COPA (Cancer Outlier Profile Analysis) algorithm proposed by Tomlins et al.6. The Cancer Outlier Profile Analysis (COPA) is a method developed to identify candidate cancer genes as those presenting large deviations from the median expression value in a subset of cases, i.e. outliers for their gene expression profile, due to underlying genetic events such as translocations or genomic amplifications. Simple numerical transformations based on the calculation of the median and Median Absolute Deviation

(MAD) are applied to the expression value of each gene-specific probe in a group of samples. First, the samples are ordered by ascending expression level for a specific gene. Then, the probe-specific COPA score is calculated for a chosen percentile “n” as the difference between the n-th percentile expression value and the median expression value of the gene across the samples, divided by the MAD value (Fig. 12).

Fig. 12: The COPA transformation based on Tomlins et al.6. COPA 90th-score is calculated as the fold-change in the gene expression value of the sample at the 90th percentile with respect to the median expression value, normalized with the Median Absolute Deviation (MAD).
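The COPA transformation summarized in Fig. 12 can be written as a short Python sketch. The percentile is computed here by linear interpolation between ordered values, which is an assumption (the text does not specify the percentile convention), and the function names and toy expression values are hypothetical.

```python
from statistics import median


def percentile(sorted_values, p):
    """p-th percentile of sorted values, by linear interpolation."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


def copa_score(values, p=90):
    """(p-th percentile - median) / MAD; None when the MAD is zero."""
    ordered = sorted(values)
    centre = median(ordered)
    mad = median(abs(value - centre) for value in ordered)
    if mad == 0:
        return None
    return (percentile(ordered, p) - centre) / mad


def rank_outliers(expression_by_probe, p=90, top=100):
    """Probe IDs ordered by decreasing COPA score, limited to `top`."""
    scored = []
    for probe, values in expression_by_probe.items():
        score = copa_score(values, p)
        if score is not None:
            scored.append((score, probe))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [probe for _, probe in scored[:top]]


# Toy data (hypothetical): probe "b" is strongly expressed in two cases.
expression = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "b": [1, 2, 3, 4, 5, 6, 7, 8, 9, 30, 40],
}
print(rank_outliers(expression, p=90))
```

Because the median and the MAD are insensitive to a few extreme cases, a probe that is strongly overexpressed in only a subset of samples keeps a high score instead of having it diluted, which is the property the method relies on.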

Finally, COPA scores of the n-th percentile are rank-ordered to provide a prioritized list of genes with outlier expression profiles. Since we were interested in the influence of gene dosage alterations on the gene expression level, we searched for transcription outliers among overexpressed genes and selected those with an increased DNA level. We determined the top 100 unique outliers at the 75th and 90th percentiles and then selected the ones with recurrent DNA gain (CN > 2.3), i.e. gain in at least five cases. With both the matched filter and the COPA-based combination of genomic and gene expression profiles, we often found repetitions of the same gene in the resulting list of selected transcripts. This can occur because several probe pairs represent the same gene, with different SNP-IDs paired with the same probe-ID or vice versa. We adopted a filtering procedure to obtain lists of unique genes. Among the matched filter results, only the probe pair found in the majority of samples was considered, while the remaining probe pairs representing the same transcript with a lower frequency were discarded. Among COPA outliers, the transcript with the best-ranking COPA score (highest among the top 100) and the highest frequency was considered. For completeness, the numbers of unique SNP-IDs, unique probe-IDs and unique genes identified with each integrative approach are reported in the results (Tabb. 5-6, chapter 3.4.2). Additional tables were created from the obtained lists of selected transcripts by applying a filter on the number of cases showing a defined association between DNA CN and gene

60 expression level (frequency filtered results). This was done to narrow down the lists to recurrent candidate genes, increasing the significance of the results. Selected genes showing amplification paired with overexpression were filtered for genes shared by at least two cases, whereas gains or losses matching with overexpression or downregulation gave longer lists that were filtered for genes found in at least five cases.
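The selection of unique COPA outliers with recurrent gain can be illustrated with the following sketch. It assumes a hypothetical data layout (dictionaries keyed by probe ID) and simplifies the procedure described above: when several probes represent the same gene, only the best-ranking COPA score is used to choose among them, without the additional frequency criterion.

```python
def top_unique_outliers(copa, gene_of, top=100):
    """Return the `top` best-scoring genes as (gene, probe, score) tuples.

    copa:    probe ID -> COPA score (hypothetical layout)
    gene_of: probe ID -> gene symbol (hypothetical layout)
    """
    best = {}
    ranked = sorted(copa.items(), key=lambda item: item[1], reverse=True)
    for probe, score in ranked:
        gene = gene_of[probe]
        if gene not in best:  # the first hit is the best-ranking probe of the gene
            best[gene] = (gene, probe, score)
    return list(best.values())[:top]


def with_recurrent_gain(outliers, cn, gain=2.3, min_cases=5):
    """Keep outliers whose paired SNP has CN > gain in at least min_cases samples.

    cn: probe ID -> list of per-sample copy numbers of the paired SNP
    """
    return [
        (gene, probe, score)
        for gene, probe, score in outliers
        if sum(value > gain for value in cn[probe]) >= min_cases
    ]


# Toy data: two probes for GENE_A, one probe for GENE_B, six samples.
copa = {"p1": 9.0, "p2": 7.5, "p3": 3.0}
gene_of = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
cn = {
    "p1": [2.5, 2.6, 2.4, 3.0, 2.8, 2.0],
    "p2": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    "p3": [2.5, 2.6, 2.0, 2.0, 2.0, 2.0],
}
unique = top_unique_outliers(copa, gene_of)
print(with_recurrent_gain(unique, cn))
```

In the toy data only GENE_A is retained, because its best-ranking probe is paired with a SNP showing CN > 2.3 in five of the six samples.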

Functional analysis with DAVID Bioinformatic Resources 2008

Lists of selected transcripts, defined as those with a positive association between DNA CN and gene expression level, were functionally annotated with the web-accessible program DAVID 2008 (http://david.abcc.ncifcrf.gov/). We analyzed the complete lists of unique transcripts identified with the matched filter integrative analysis (gained and overexpressed; amplified and overexpressed; lost and downregulated) and with the COPA-based integrative analysis (75th and 90th COPA outliers with gain), as well as the frequency filtered lists.

The Database for Annotation, Visualization and Integrated Discovery (DAVID) is a web-accessible program of the National Institute of Allergy and Infectious Diseases (NIAID, National Institutes of Health, NIH) for the functional annotation of microarray analysis results. DAVID combines annotations from different public databases “to understand biological meaning behind large list of genes”. It identifies enriched biological themes in the submitted gene list, in particular Gene Ontology (GO) categories such as Biological Process (BP), Molecular Function (MF) and Cellular Component (CC), and analyzes the distribution of genes on biochemical pathways such as the BioCarta, KEGG and Panther pathways288. The input file can be a plain or tab-delimited text file with the gene identifier or probe_ID in the first column and no column headings. DAVID considers only unique gene identifiers from the submitted gene list, so repetitions of the same gene are discarded.

To perform the functional annotation analysis, we prepared text files listing the Probe-IDs (i.e. the identifiers of the HG-U133A expression probes) of the selected transcripts and submitted them to DAVID through the “Functional Annotation” tool. The procedure was as follows: “Select identifier: AFFY-ID”; “Upload gene list”, choosing the list to be submitted from the previously created text files; the list was then compared to the “Background: Homo sapiens”. Annotation was done with the Panther category “Biological Process (BP)” and the pathways “Biocarta” and “Panther Pathway”.

PANTHER is an acronym for Protein Analysis THrough Evolutionary Relationships (www.pantherdb.org). It is a database initially designed by Celera for high-throughput analysis of protein sequences that offers, as its key feature, a simplified ontology of protein function, mainly used for the classification of new sequences289. The PANTHER Classification System uses published experimental evidence and evolutionary relationships to predict function, even in the absence of direct experimental proof. The applications of PANTHER to protein classification and protein function (ontology terms and pathways) have increased with time, and it can now also be used for the functional analysis of large transcript lists, including data generated from high-throughput expression experiments290. Lists of genes can be analyzed in terms of statistically over- or under-represented categories, such as molecular function, biological process or pathway, with respect to a reference list. Among the annotation terms, we considered as significant only those with an FDR ≤ 0.1.
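The input format described above for DAVID (plain text, identifier in the first column, no column headings) can be produced with a few lines of Python. This is only a sketch: the function name, the file name and the probe identifiers are placeholders, and removing duplicates before upload is optional, since DAVID discards repeated identifiers anyway.

```python
import os
import tempfile
from pathlib import Path


def write_david_list(probe_ids, path):
    """Write one probe ID per line, without header, keeping the first occurrence of each ID."""
    cleaned = (probe.strip() for probe in probe_ids)
    unique = list(dict.fromkeys(probe for probe in cleaned if probe))
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique


# Placeholder identifiers standing in for HG-U133A probe IDs.
selected = ["probe_0001_at", "probe_0002_s_at", "probe_0001_at"]
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "mf_gain_all.txt")
    print(write_david_list(selected, target))
```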

3.4 Results

3.4.1 Genome profiling

The MM samples were characterized by FISH for the presence of the typical IgH chromosomal translocations (Tab. 1).

Tab. 1: FISH analysis of typical MM translocations. A 13 MM clinical samples264,276. Numbering of samples adopted for genomic profiling with 10K SNP arrays is given, together with corresponding numbering adopted for gene expression analysis with HG-U133A arrays. %CD138: PC purity as assessed by cytofluorimetry. POS: indicates positivity for a given translocation as resulting from FISH analysis. NA: not analyzed. A t(4;14) t(6;14) t(11;14) t(14;16) t(14;20) 10K HG-U133A %CD138 FGFR3/MMSET CCND3 CCND1 MAF MAFB 05-099 MM-016 82 05-089 MM-026 65 POS 05-094 MM-031 64 POS 05-095 MM-037 87 POS 05-090 MM-038 45 05-098 MM-039 56 05-100 MM-040 53 05-093 MM-042 98 POS 05-097 MM-049 68 05-102 MM-070 57 POS 05-092 MM-079 89 05-088 PCL004 60 POS 05-091 NA 97 NA NA NA NA NA

B 17 HMCLs5,255. POS: indicates positivity for a given translocation as resulting from FISH analysis. B t(4;14) t(14;16) t(14;20) t(11;14) t(6;14) t(8;14) FGFR3/MMSET MAF MAFB CCND1 CCND3 MYC KMS28 POS POS KMS34 POS NCI-H929 POS KMS18 POS t(2;8)(qter;24) LP1 POS OPM2 POS t(?;20) POS KMS11 POS POS POS KMS26 POS POS JJN3 POS POS c-MYC insertion on RPMI8226 t(16;22)(q32;q11):der(16) CMA-02 POS POS

Tab. 1B, continuation: SKMM1 POS POS CMA-03 t(?;20) POS U266 POS KMS12 POS KMM1 t(6p25;14) POS KMS20 POS

Among the clinical samples, t(11;14)(CCND1) was detected in three patients; t(4;14)(FGFR3 and MMSET/WHSC1), t(6;14)(CCND3) and t(14;16)(MAF) were each detected in one case; t(14;20)(MAFB) was not present. In one clinical sample FISH was not performed, and in six cases no known translocation was detected. Neither CCND1 translocations at 11q13, t(11;?), nor MAFB translocations at 20q, t(20;?), not involving the IgH locus were detected in our series. Among the cell lines, t(4;14)(FGFR3 and MMSET/WHSC1) was detected in eight cell lines, t(14;16)(MAF) in five, t(14;20)(MAFB) in three, t(11;14)(CCND1) in two and t(6;14)(CCND3) in one. Three cell lines had t(4;14) and a second translocation, either t(14;16) (in two of them) or t(14;20). Moreover, rearrangements involving the MYC locus were detected in 10 out of 17 HMCLs.

Thirty genome-wide DNA profiles of MM samples were obtained, comprising 17 HMCLs and 13 MM clinical samples. All the chromosome regions affected by DNA gains or losses in each cell line were previously presented in a paper on the molecular characterization of HMCLs5 and are summarized in supplementary table 2. Figures 13-15 show the heat-maps for CN and LOH probability of patients (Fig. 13), cell lines (Fig. 14) and all samples together (Fig. 15).


Fig. 13: Heat-maps based on genomic profiles of 13 MM clinical samples.

Fig. 14: Heat-maps based on genomic profiles of 17 HMCLs.

Fig. 15: Heat-maps based on genomic profiles of 30 MM samples.

Figg. 13-15: In the upper panel, heat-map of DNA copy number (CN) changes: black=normal CN, green=deletion, red=gain, orange=amplification. In the lower panel, heat-map of significant LOH probability: black=no LOH, grey=significant LOH. The x-axis shows the physical mapping along all 23 chromosomes, with the centromeres indicated as dashed lines. The y-axis shows the MM samples.

Frequencies of genomic aberrations among 13 MM patients (Fig. 16), 17 HMCLs (Fig. 17) and all 30 MM samples (Fig. 18) are presented as frequency plots.


Fig. 16: Frequency plots of DNA gains (red), losses (black) and LOH (lower panel) in 13 MM clinical samples.

Fig. 17: Frequency plots of DNA gains (red), losses (black) and LOH (lower panel) in 17 HMCLs.

Fig. 18: Frequency plots of DNA gains (red), losses (black) and LOH (lower panel) in 30 MM genomic profiles.

Figg. 16-18: The upper panel presents the frequency of gains (upper, red part) and deletions (lower, black part) at a defined chromosomal location among the analyzed samples. The lower panel presents the percentage of cases with significant LOH at a defined chromosomal position. The x-axis shows the physical mapping along all 23 chromosomes, with the centromeres indicated as vertical dashed lines. The y-axis shows the frequency of cases with a given genomic aberration; the horizontal dashed line defines the 40% threshold for recurrent aberrations.

It is known that in MM, besides structural rearrangements and ploidy aberrations, chromosomal CN changes are often observed. Gains of 1q, 6p, 9q, 11q, 12q, 15q, 17q, 19q and deletions of 1p, 6q, 8p, 13q, 16q, 17p and 22q have been previously described240-242. Recurrent regions of gain, deletion and LOH were defined as chromosomal aberrations found in at least 40% of all the MM genomic profiles (38% in patients and 41% in HMCLs). Recurrent genomic aberrations are summarized in tables 2-4, together with the minimally affected region of each chromosome arm.

Among MM patients, recurrent DNA gains were identified at 1q, 3, 5, 7, 9, 11, 15 and 19 (Tab. 2). The frequency profiles of chromosomes 3, 5, 7, 9, 11, 15 and 19 are indicative of trisomies of odd-numbered chromosomes. The small peaks of CN gain on 6q and 12q are probably due to polymorphisms: the Database of Genomic Variants (http://projects.tcag.ca/variation/cgi-bin/gbrowse/hg18) was consulted, and a copy number variation site291 has been described at 6q25.3, whereas at 12q14.1 no known gene, transcript or genomic variant has been described. The most frequent DNA loss in MM patients was detected on 13q (Tab. 2). A few patients presented extended regions of LOH (Fig. 13), none of which was recurrent.

The HMCLs showed recurrent DNA gains at 1q, 3q, 7p and 7q, 8p and 8q, 11p and 11q, 12q, 18q, 19p, 20p and 20q (Tab. 3). Recurrent DNA losses were detected at 1p, 4p and 4q, 6q, 8p, 9p, 12p and 12q, 13q, 14q, 17p and 22q (Tab. 3). The genome-wide frequency profile (Fig. 17) shows that the most common genomic aberrations among HMCLs were gain of chromosome arm 1q and losses of chromosome arms 1p, 13q and 17p, all known to be associated with MM. Finally, recurrent LOH in HMCLs was detected at 1p and 1q, 2q, 7p, 8p and 8q, 10q, 11q, 12p and 12q, 13q, 14q, 16q, 17p, 20p and 22q (Tab. 3). The most frequent regions of LOH matched the most commonly deleted regions, i.e. chromosome arms 1p, 13q and 17p.

As expected, the major characteristics of the genomic profiles of all 30 samples were 1q gain and frequently increased DNA levels of odd-numbered chromosomes (3, 5, 7, 9, 11, 15, 19) on the one hand, and recurrent DNA losses and LOH events at 1p, 13q and 17p on the other (Tab. 4).
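The recurrence thresholds quoted above correspond to whole numbers of cases: 38% of 13 patients is at least 5 cases, 41% of 17 HMCLs is at least 7 cases and 40% of 30 samples is at least 12 cases. The following sketch shows how recurrent SNP positions could be flagged from per-sample CN values. The gain and loss cut-offs (CN > 2.3 and CN < 1.7) are borrowed from the matched filter analysis and are an assumption here, as are the function names and the data layout.

```python
def min_cases(percent, n_samples):
    """Smallest number of cases reaching `percent` of `n_samples` (integer ceiling)."""
    return -(-percent * n_samples // 100)


def recurrent_snps(cn_by_sample, percent=40, gain=2.3, loss=1.7):
    """Return the indices of SNPs with recurrent gain and with recurrent loss.

    cn_by_sample: one list of CN values per sample, all in the same SNP order
    """
    needed = min_cases(percent, len(cn_by_sample))
    gains, losses = [], []
    for index, column in enumerate(zip(*cn_by_sample)):
        if sum(value > gain for value in column) >= needed:
            gains.append(index)
        if sum(value < loss for value in column) >= needed:
            losses.append(index)
    return gains, losses


print(min_cases(38, 13), min_cases(41, 17), min_cases(40, 30))

# Toy data: five samples, three SNPs.
toy = [
    [2.0, 2.5, 1.5],
    [2.4, 2.6, 1.6],
    [2.0, 2.0, 2.0],
    [2.0, 2.4, 1.0],
    [2.0, 2.0, 2.0],
]
print(recurrent_snps(toy))
```

Counting cases with integers avoids rounding problems that can arise when the threshold is compared as a floating-point fraction.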

Tab. 2: Recurrent regions of gain and loss in 13 MM patients. Recurrent regions were defined as those present in at least 38% of the clinical samples. In bold are highlighted minimal affected intervals.

chr.  physical mapping        cytoband          frequency
GAINS
1     157283331-177327853     q23.2-q25.3       ≥ 38%
      184480646-189598984     q31.1-q31.2       ≥ 38%
      158900780-158900901     q23.3             62%
3     653347-199368514        p26.3-q29         ≥ 38%
5     1677351-179796426       p15.33-q35.3      ≥ 38%
6     157941077-157941351     q25.3             ≥ 38%
7     1838367-144617298       p22.3-q35         ≥ 38%
9     239391-38288618         p24.3-p13.2       ≥ 38%
      67050707-135054595      q12-q34.3         ≥ 38%
      67050707-68511255       q12-q13           54%
11    2573106-134402514       p15.5-q25         ≥ 38%
12    57210556-57210983       q14.1             ≥ 38%
15    21490270-100169952      q11.2-q26.3       ≥ 38%
19    3542590-63437743        p13.3-q13.43      ≥ 38%
LOSSES
13    18478972-96206841       q12.11-q32.1      ≥ 38%

Tab. 3: Recurrent regions of gain, loss and LOH in 17 HMCLs. Recurrent regions were defined as those present in at least 41% of the cell lines. In bold are highlighted minimal affected intervals.

chr.  physical mapping        cytoband          frequency
GAINS
1     143780476-219529663     q21.1-q41         ≥ 41%
      229110842-245119787     q42.2-q44         ≥ 41%
3     150938349-199368514     q25.1-q29         ≥ 41%
7     48206690-52923342       p12.3-p12.1       ≥ 41%
      70110441-158466134      q11.22-q36.3      ≥ 41%
      102065731-104971619     q22.1-q22.3       59%
8     8160867-11422122        p23.1-p23.1       ≥ 41%
      40156906-142072236      p11.21-q24.3      ≥ 41%
11    51247890-51319510       p11.12            ≥ 41%
      68986287-78714313       q13.3-q14.1       ≥ 41%
      103441794-126153350     q22.3-q24.2       ≥ 41%
12    52546417-57090576       q13.13-q14.1      ≥ 41%
18    27278564-68366009       q12.1-q22.3       ≥ 41%
19    3542590-11658308        p13.3-p13.2       ≥ 41%
20    25127155-61366354       p11.21-q13.33     ≥ 41%
LOSSES
1     48687466-119786066      p33-p12           ≥ 41%
      103276462-104902902     p21.1             76%
4     17354869-22577616       p15.32-p15.31     ≥ 41%
      44387939-73200795       p13-q13.3         ≥ 41%
      54415402                q12               47%
6     103171750-110952941     q16.3-q21         ≥ 41%
8     19947049-34400536       p21.3-p12         ≥ 41%
9     7560393-9589450         p24.1-p23         ≥ 41%
12    5952374-21587716        p13.31-p12.1      ≥ 41%
      128620076-130547441     q24.33            ≥ 41%
13    18478972-114040116      q12.11-q34        ≥ 41%
14    20257888-62251395       q11.2-q23.2       ≥ 41%
      65838135-68137391       q23.3-q24.1       ≥ 41%
      77579733-79548254       q24.3-q31.1       ≥ 41%
17    3565047-20147237        p13.2-p11.2       ≥ 41%
22    15685581-29258154       q11.1-q12.2       ≥ 41%
      44038578-48881270       q13.31-q13.33     ≥ 41%
      20306803-21026845       q11.21-q11.22     47%
LOH
1     60629099-119786066      p31.3-p12         ≥ 41%
      143780476               q21.1             ≥ 41%
      169749920-172396602     q25.1             ≥ 41%
      208782115-230304605     q32.3-q42.2       ≥ 41%
      82994226-87815233       p31.1-p22.3       65%
2     107065557-111682453     q12.3-q13         ≥ 41%
      120958137-124731332     q14.2-q14.3       ≥ 41%
7     2969574-11131340        p22.2-p21.3       ≥ 41%
8     1130901-50311386        p23.3-q11.21      ≥ 41%
      16951688                p22               59%
10    67363165-133792769      q21.3-q26.3       ≥ 41%
      108301686-109138839     q25.1             59%
11    92389234-97426119       q21-q22.1         ≥ 41%
12    6755671-29246991        p13.31-p11.22     ≥ 41%
      95555959-99762926       q23.1             ≥ 41%
13    25812728-114040116      q12.13-q34        ≥ 41%
14    21434309-71234961       q11.2-q24.2       ≥ 41%
      76323852-96104558       q24.3-q32.2       ≥ 41%
16    53771594-56247384       q12.2-q13         ≥ 41%
17    3565280-20147237        p13.2-p11.2       ≥ 41%
      12271548-14432937       p12               71%
20    7223683-11897114        p12.3-p12.2       ≥ 41%
22    25935191-35835576       q12.1-q13.1       ≥ 41%
      30001008-30783771       q12.2-q12.3       47%

Tab. 4: Recurrent regions of gain, loss and LOH in 30 MM samples. Recurrent regions were defined as those present in at least 40% of the cases. In bold are highlighted minimal affected intervals.

chr.  physical mapping        cytoband          frequency
GAINS
1     143780476-210055219     q21.1-q32.3       ≥ 40%
      208098497-210055219     q32.3
      158900780-158900901     q23.3             77%
3     150938349-199368514     q25.1-q29         ≥ 40%
6     157941077-157941351     q25.3             ≥ 40%
7     48206690-52923342       p12.3-p12.1       ≥ 40%
      70110441-158466134      q11.22-q36.3      ≥ 40%
      102065731-104971619     q22.1-q22.3       50%
8     69712752-142072236      q13.2-q24.3       ≥ 40%
11    51247890-51319510       p11.12            ≥ 40%
      68986287-83333212       q13.3-q14.1       ≥ 40%
      103441794-126153350     q22.3-q24.2       ≥ 40%
      76728362-78652934       q13.5-q14.1       47%
12    57210556-57210983       q14.1             ≥ 40%
15    59210481-64658022       q22.2-q22.31      ≥ 40%
      65083141-67859182       q22.33-q23        ≥ 40%
      78368329-99686708       q25.1-q26.3       ≥ 40%
18    49959698-59173095       q21.2-q21.33      ≥ 40%
19    3542590-24163969        p13.3-p12         ≥ 40%
      37946718-52292441       q13.11-q13.32     ≥ 40%
LOSSES
1     81454907-108974013      p31.1-p13.3       ≥ 40%
      117206862-117866019     p13.1-p12         ≥ 40%
      103276462-104902902     p21.1             47%
13    18867463-114040116      q12.11-q34        ≥ 40%
17    3565047-12558426        p13.2-p12         ≥ 40%
LOH
1     82994226-87815233       p31.1-p22.3       ≥ 40%
13    25812728-114040116      q12.13-q34        ≥ 40%
17    6211340-15053487        p13.2-p12         ≥ 40%
      12271548-14432937       p12               43%

By comparing the gain/loss and LOH heat-maps, it was possible to identify different LOH patterns. One of the advantages of using high-density SNP arrays for genomic profiling is indeed the possibility to discriminate between different LOH mechanisms by analyzing LOH and CN changes simultaneously. The most common example of LOH in MM identified using SNP arrays is on chromosome 13, due to monosomy292. Other regions of LOH are typically found on 1p, 6q, 8p and 16q292. LOH is usually coupled with a deletion, but it can also be generated by a high increase in the CN of one allele relative to the other (e.g. a DNA amplification). Moreover, UPD mechanisms can lead to copy-neutral LOH (chapter 1.1.1.3).

The few patients we analyzed presenting LOH at 1p, 13q and 17p had a deletion-induced LOH. In contrast, among the cell lines only the LOH at 17p was consistently matched by a deletion. At 1p and 13q the majority of HMCLs exhibited LOH accompanied by deletion, but LOH in the absence of a reduction in CN (copy-neutral LOH) was also observed. In two cell lines (KMM1 and JJN3) we detected 1pter LOH matching with normal DNA CN. On chromosome 13, copy-neutral LOH was found in six cell lines (U266, OPM2, KMS28, KMS26, KMS18, KMS11). Moreover, in two cell lines (KMS26, JJN3) LOH was identified in correspondence with DNA CN gain. All these cases were reminiscent of UPD due to mitotic recombination events between chromatids, eventually followed by DNA replication events. We observed UPD of an entire chromosome arm (at 13q, in the cell lines KMS26, KMS11, JJN3) with additional gain or amplification events (KMS26, JJN3); UPD of a part of a chromosome arm extending to the telomere (at 1p in KMM1 and JJN3); and interstitial UPD regions generated by multiple recombination events. In more detail, we analyzed the patterns of all the recurrent LOH regions found in the HMCL panel. As a whole, a series of complex rearrangements was observed, including amplification-driven LOH (1q) and different UPD mechanisms (interstitial, whole chromosome arm, telomeric) coupled with changes in DNA CN.

Since we were particularly interested in amplifications as one of the mechanisms by which oncogenes are activated, we validated the amplified regions identified by SNP arrays with real-time qPCR. We could confirm the amplifications at 1p21.2 (CGI-30) in KMS26, 13q31.3 (miR-17-92 cluster) in KMS11, 13q33.3 (TNFSF13b) in KMS12 and 11q23.1 in JJN3 (Fig. 19).
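The LOH patterns distinguished above follow from reading the LOH call and the CN value of the same region together. A minimal sketch of this classification is given below; the cut-offs (CN < 1.7 for loss, CN > 2.3 for gain, CN > 4 for amplification) are taken from the matched filter analysis and their use for LOH classification is an assumption, as are the function name and the labels.

```python
def classify_loh(loh, cn, loss=1.7, gain=2.3, ampl=4.0):
    """Label a region by combining its LOH call with its DNA copy number."""
    if not loh:
        return "no LOH"
    if cn < loss:
        return "deletion-induced LOH"
    if cn > ampl:
        return "LOH with amplification"
    if cn > gain:
        return "LOH with gain"
    return "copy-neutral LOH"


# Toy regions: (significant LOH call, DNA copy number).
for region in [(True, 1.1), (True, 2.0), (True, 3.0), (True, 5.2), (False, 2.0)]:
    print(region, classify_loh(*region))
```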

[Fig. 19, four panels: 11q23.1 (normal vs. JJN3), 1p21.2 (normal vs. KMS26), 13q31.3 (normal vs. KMS11), 13q33.3 (normal vs. KMS12). Y-axis: DNA copy number; bars: qPCR and 10K array.]

Fig. 19: Validation by quantitative real-time PCR (qPCR) of amplified regions identified by genome-wide DNA profiling with 10K arrays.

3.4.2 Integrative analysis of expression and genomic profiles

To identify new potential therapeutic targets affected by DNA CN changes, we matched the genome-wide DNA and gene expression profiles of 29 MM samples, comprising 12 MM patients and 17 HMCLs. We also present here the results of the matched filter integrative analysis of expression and genomic profiles of the 17 HMCLs, which we previously published5. Two different methods were used. First, we applied a matched filter to both DNA and expression data to identify transcripts that were overexpressed and mapped in gained/amplified DNA regions, or downregulated and mapped in lost regions. Second, we applied the COPA algorithm6 followed by a filter at the DNA level to select genes with outlier overexpression and DNA CN gain. The resulting lists of selected transcripts point to candidate oncogenes, in the case of increased DNA CN and overexpression, or to candidate tumor suppressor genes, in the case of decreased DNA level and downregulation.
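The matched filter criteria used in this chapter (CN > 2.3 or CN > 4 with at least 2-fold overexpression; CN < 1.7 with at least 2-fold downregulation) can be sketched as follows. The expression value is assumed to be given as a ratio to the average expression of the gene, so that 2-fold downregulation corresponds to a ratio of 0.5 or less; the labels, function names and data layout are hypothetical.

```python
from collections import Counter


def matched_filter(cn, ratio, loss=1.7, gain=2.3, ampl=4.0, fold=2.0):
    """Return the matched filter categories met by one gene in one sample.

    cn:    DNA copy number of the paired SNP
    ratio: expression relative to the average expression value (assumed layout)
    """
    labels = set()
    if ratio >= fold:
        if cn > gain:
            labels.add("gain_up")
        if cn > ampl:  # amplified genes are a subset of the gained ones
            labels.add("ampli_up")
    elif ratio <= 1.0 / fold and cn < loss:
        labels.add("loss_down")
    return labels


def recurrent_genes(samples, label, min_cases):
    """Count, per gene, the samples meeting `label`; keep genes found in >= min_cases."""
    counts = Counter(
        gene
        for sample in samples
        for gene, (cn, ratio) in sample.items()
        if label in matched_filter(cn, ratio)
    )
    return {gene: n for gene, n in counts.items() if n >= min_cases}


# Toy data: three samples, each mapping gene -> (CN, expression ratio).
samples = [
    {"GENE_A": (4.5, 3.0), "GENE_B": (1.2, 0.4)},
    {"GENE_A": (2.5, 2.0), "GENE_B": (2.0, 0.4)},
    {"GENE_A": (4.2, 2.5), "GENE_B": (1.5, 0.5)},
]
print(recurrent_genes(samples, "gain_up", 3))
print(recurrent_genes(samples, "ampli_up", 2))
print(recurrent_genes(samples, "loss_down", 2))
```

The frequency filter is applied per category, which mirrors the use of different case thresholds for amplified genes (at least two cases) and for gained or lost genes (at least five cases).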

Tab. 5: Summary of the results obtained with matched filter integrative analysis on 29 MM samples.

MATCHED FILTER        unique SNP-ID   unique probe-ID   unique genes   recurrent unique genes
CN<1.7 ^ fc≥2 down    807             1,229             1,150          30 (in at least 5 cases)
CN>2.3 ^ fc≥2 up      1,477           3,195             2,908          112 (in at least 5 cases)
CN>4.0 ^ fc≥2 up      225             422               412            47 (in at least 2 cases)

Tab. 6: Summary of the results obtained with COPA-based integrative analysis on 29 MM samples.

COPA-BASED ANALYSIS       COPA score range of top 100 unique outliers with gain   unique genes in at least 5 cases (COPA rank range)
CN>2.3 ^ COPA 75th up     14.06 to 2.15                                           66 (2 to 100)
CN>2.3 ^ COPA 90th up     35.40 to 7.43                                           61 (2 to 100)

To identify the most relevant candidate genes, we compared our lists with the ones published in three recent papers in which similar integrative approaches were applied to study genetic aberrations involved in MM pathogenesis125,226,265. Transcripts shared between our candidates and the published ones125,226,265 were found (genes highlighted with # in tables 9-14). With the matched filter approach we identified a total of 2,908 unique genes, represented by 3,195 unique HG-U133A probes and 1,477 unique SNP-IDs, with matching DNA CN gain and overexpression (Tab. 5). Functional annotation revealed over-representation of various biological processes (BP) (Tab. 7). They mainly belonged to the parental category of nucleic acid metabolism, but they also comprised other categories such as protein metabolism, development, signal transduction, immunity, cell cycle, lipid/amino acid metabolism, cell motility, cell adhesion and apoptosis. The majority of transcripts were involved in mRNA transcription and its regulation. Two BIOCARTA pathways were over-represented: control of gene expression by vitamin D receptor and signal transduction by ErbB2 (Tab. 8).

Tab. 7: Over-represented Panther Biological Processes (BP) among the lists of unique transcripts selected by integrative matched filter analysis on 29 MM samples. Only significant terms were considered according to FDR≤0.1. Count: number of transcripts involved; %: percentage of transcripts involved; mf_ampli_all: 412 unique genes amplified and overexpressed; mf_gain_all: 2,908 unique genes gained and overexpressed; mf_loss_all: 1,150 unique genes lost and downregulated.

list  over-represented panther BP (FDR≤0.1)  count  %
mf_ampli_all  BP00044:mRNA transcription regulation  184  44.34%
mf_ampli_all  BP00289:Other metabolism  82  19.76%
mf_ampli_all  BP00063:Protein modification  67  16.14%
mf_gain_all  BP00044:mRNA transcription regulation  1320  45.22%
mf_gain_all  BP00040:mRNA transcription  797  27.30%
mf_gain_all  BP00071:Proteolysis  743  25.45%
mf_gain_all  BP00150:MHCI-mediated immunity  644  22.06%
mf_gain_all  BP00031:Nucleoside, nucleotide and nucleic acid metabolism  601  20.59%
mf_gain_all  BP00143:Cation transport  577  19.77%
mf_gain_all  BP00102:Signal transduction  545  18.67%
mf_gain_all  BP00104:G-protein mediated signaling  533  18.26%
mf_gain_all  BP00286:Cell structure  491  16.82%
mf_gain_all  BP00060:Protein metabolism and modification  484  16.58%
mf_gain_all  BP00289:Other metabolism  482  16.51%
mf_gain_all  BP00149:T-cell mediated immunity  476  16.31%
mf_gain_all  BP00063:Protein modification  446  15.28%
mf_gain_all  BP00151:MHCII-mediated immunity  368  12.61%
mf_gain_all  BP00064:Protein phosphorylation  361  12.37%
mf_gain_all  BP00103:Cell surface receptor mediated signal transduction  350  11.99%
mf_gain_all  BP00142:Ion transport  325  11.13%
mf_gain_all  BP00193:Developmental processes  262  8.98%
mf_gain_all  BP00067:Protein glycosylation  254  8.70%
mf_gain_all  BP00036:DNA repair  240  8.22%
mf_gain_all  BP00285:Cell structure and motility  218  7.47%
mf_gain_all  BP00197:Spermatogenesis and motility  213  7.30%
mf_gain_all  BP00141:Transport  210  7.19%
mf_gain_all  BP00199:Neurogenesis  153  5.24%
mf_gain_all  BP00179:Apoptosis  145  4.97%
mf_gain_all  BP00276:General vesicle transport  144  4.93%
mf_gain_all  BP00034:DNA metabolism  141  4.83%
mf_gain_all  BP00125:Intracellular protein traffic  139  4.76%
mf_gain_all  BP00273:Chromatin packaging and remodeling  135  4.62%
mf_gain_all  BP00019:Lipid, fatty acid and steroid metabolism  129  4.42%
mf_gain_all  BP00248:Mesoderm development  128  4.39%
mf_gain_all  BP00020:Fatty acid metabolism  127  4.35%
mf_gain_all  BP00272:Phospholipid metabolism  119  4.08%
mf_gain_all  BP00250:Muscle development  117  4.01%
mf_gain_all  BP00041:General mRNA transcription activities  105  3.60%
mf_gain_all  BP00124:Cell adhesion  104  3.56%
mf_gain_all  BP00204:Cytokinesis  104  3.56%
mf_gain_all  BP00032:Purine metabolism  99  3.39%
mf_gain_all  BP00014:Amino acid biosynthesis  98  3.36%
mf_gain_all  BP00292:Other carbon metabolism  96  3.29%
mf_gain_all  BP00201:Skeletal development  83  2.84%
mf_gain_all  BP00112:Calcium mediated signaling  79  2.71%
mf_gain_all  BP00203:Cell cycle  76  2.60%
mf_gain_all  BP00054:tRNA metabolism  75  2.57%
mf_gain_all  BP00108:Receptor protein tyrosine kinase signaling pathway  73  2.50%
mf_gain_all  BP00056:Metabolism of cyclic nucleotides  69  2.36%
mf_gain_all  BP00126:Exocytosis  65  2.23%
mf_gain_all  BP00207:Cell cycle control  62  2.12%
mf_gain_all  BP00137:Protein targeting and localization  62  2.12%
mf_gain_all  BP00069:Protein disulfide-isomerase reaction  59  2.02%
mf_gain_all  BP00268:Antioxidation and free radical removal  56  1.92%
mf_gain_all  BP00017:Amino acid catabolism  56  1.92%
mf_gain_all  BP00033:Pyrimidine metabolism  55  1.88%
mf_gain_all  BP00049:mRNA polyadenylation  54  1.85%
mf_gain_all  BP00070:Protein-lipid modification  51  1.75%
mf_gain_all  BP00114:MAPKKK cascade  50  1.71%
mf_gain_all  BP00128:Constitutive exocytosis  39  1.34%
mf_loss_all  BP00044:mRNA transcription regulation  537  46.33%
mf_loss_all  BP00040:mRNA transcription  342  29.51%
mf_loss_all  BP00071:Proteolysis  296  25.54%
mf_loss_all  BP00104:G-protein mediated signaling  231  19.93%
mf_loss_all  BP00143:Cation transport  222  19.15%
mf_loss_all  BP00031:Nucleoside, nucleotide and nucleic acid metabolism  216  18.64%
mf_loss_all  BP00060:Protein metabolism and modification  208  17.95%
mf_loss_all  BP00286:Cell structure  193  16.65%
mf_loss_all  BP00063:Protein modification  188  16.22%
mf_loss_all  BP00289:Other metabolism  174  15.01%
mf_loss_all  BP00064:Protein phosphorylation  151  13.03%
mf_loss_all  BP00036:DNA repair  98  8.46%
mf_loss_all  BP00179:Apoptosis  83  7.16%
mf_loss_all  BP00034:DNA metabolism  59  5.09%
mf_loss_all  BP00001:Carbohydrate metabolism  49  4.23%
mf_loss_all  BP00203:Cell cycle  33  2.85%
mf_loss_all  BP00128:Constitutive exocytosis  23  1.98%
mf_loss_all  BP00110:Other receptor mediated signaling pathway  21  1.81%

Tab. 8: Over-represented BIOCARTA pathways among the list of 2,908 unique genes gained and overexpressed selected by integrative matched filter analysis. Only significant terms were considered according to FDR≤0.1. Count: number of transcripts involved; %: percentage of transcripts involved.

list  over-represented BIOCARTA pathway  count  %
mf_gain_all  h_vdrPathway:Control of Gene Expression by Vitamin D Receptor  17  0.58%
mf_gain_all  h_her2Pathway:Role of ERBB2 in Signal Transduction and Oncology  16  0.55%

Among the 2,908 unique gained and overexpressed genes, 112 were shared by at least five samples out of 29 (Tab. 9). The 112 recurrent genes were mostly located on chromosome arm 1q (36/112), followed by chromosome 8 (16/112) and 7q (11/112). Genes known to be involved in MM pathogenesis, such as HGF, IL6R, NCAM1, MAF and MCL1, were identified and were considered a proof of the reliability of the adopted approach (positive controls, marked with ^ in Tab. 9). Besides these established oncogenes, another 17 genes implicated in MM pathogenesis were observed, as well as three genes described in other cancer types (indicated with * in Tab. 9). HGF, IL6R and MCL1 were shared with the integrative analysis previously published by Carrasco et al.226, and a further 17 genes were in common with either Carrasco et al.226, Largo et al.265 or Walker et al.125 (highlighted with # in Tab. 9). Interestingly, MRPL13 was recurrent in three lists: ours, Carrasco's and Largo's. It encodes a mitochondrial ribosomal protein that contributes to protein synthesis within the mitochondrion.

Tab. 9: Genes selected by integrative analysis with matched filter in all MM samples. Matched-filter analysis selected 112 unique genes with DNA CN > 2.3 and expression value of the paired gene ≥ 2-fold above the average expression value, occurring in at least five cases out of 29 (17 cell lines + 12 patients). #: indicates common genes with published integrative analyses on MM125,226,265; ^: highlights genes known to be involved in MM pathogenesis (positive controls); *: indicates genes known to play a role in cancer but not yet described in relationship to MM. In bold are reported genes shared with COPA-based integrative analysis.

chromosome  cytoband  symbol  cases  % cases
8  chr8q24.1  SQLE  10  34%
1  chr1q31  ASPM^  9  31%
1  chr1q21.2  ANP32E  8  28%
7  chr7q35-q36  EZH2^  8  28%
1  chr1q32.1  KIF14^  8  28%
1  chr1q23  PBX1^  8  28%
8  chr8q24.3  PTP4A3  8  28%
1  chr1q21  S100A6 #  8  28%
1  chr1q23.3  BAT2D1  7  24%
1  chr1q25  C1orf24  7  24%
7  chr7q21.1  HGF^ #  7  24%
19  chr19p13.3-p13.2  ICAM3 #  7  24%
7  chr7q36  INSIG1  7  24%
7  chr7q21  MAGI2  7  24%
8  chr8q11.2  MCM4^  7  24%
1  chr1q21  S100A11 #  7  24%
1  chr1q21  S100A4 #  7  24%
3  chr3q29  TFRC  7  24%
7  chr7q31.1  CAV1  6  21%
1  chr1q32-q41  CENPF  6  21%
11  chr11q24-q24  CHEK1  6  21%
8  chr8q24.12  DCC1  6  21%
1  chr1q32.3  DTL  6  21%
8  chr8p23.1-p22  FDFT1  6  21%
19  chr19q13.32  FOSB*  6  21%
20  chr20q13.2-q13.3  GNAS  6  21%
11  chr11q23.2  IGSF4  6  21%
1  chr1q21  IL6R^ #  6  21%
15  chr15q26  ISG20^  6  21%
8  chr8p22-q22.3  LACTB2  6  21%
7  chr7q21.3-q22.1  MCM7^  6  21%
7  chr7q32  MEST  6  21%
8  chr8q22.1-q22.3  MRPL13 #  6  21%
8  chr8q22  MYBL1^  6  21%
11  chr11q23.1  NCAM1^  6  21%
1  chr1q32.1  NUCKS1  6  21%
9  chr9q13  PIP5K1B  6  21%
18  chr18q21.32  PMAIP1  6  21%
20  chr20p11.21  PSF1  6  21%
3  chr3q27  RFC4^  6  21%
1  chr1q21  S100A10*  6  21%
3  chr3q27-q28  ST6GAL1  6  21%
20  chr20q13.2-q13.3  STK6^  6  21%
1  chr1q32.1  TIMM17A  6  21%
17  chr17q21-q22  TOP2A  6  21%
20  chr20q11.2  TPX2  6  21%
1  chr1q23  UCK2 #  6  21%
8  chr8p11.23  ADAM9  5  17%
11  chr11q12.2  AHNAK  5  17%
7  chr7q21.3  ASK #  5  17%
8  chr8q24.13  ATAD2  5  17%
1  chr1q24-q25  C1orf22  5  17%
1  chr1q32.1  CAMSAP1L1 #  5  17%
7  chr7q31.1  CAV2  5  17%
8  chr8q22.1  CCNE2  5  17%
19  chr19p13-q13.4  CD37  5  17%
9  chr9q22.33  CDC14B  5  17%
18  chr18q11.2  CDH2  5  17%
1  chr1q21.2  CKIP-1  5  17%
1  chr1q24  CREG1  5  17%
8  chr8q24.12  DEPDC6 #  5  17%
11  chr11q13.2-q13.5  DHCR7  5  17%
15  chr15q26.3  DMN  5  17%
5  chr5q34  DUSP1  5  17%
3  chr3q26.1-q26.2  ECT2  5  17%
19  chr19q13.3  FCGRT  5  17%
1  chr1q22  FDPS #  5  17%
8  chr8q12.3  GGH #  5  17%
1  chr1q31  GLUL  5  17%
19  chr19p13.3  GNG7  5  17%
17  chr17q25.1  HN1  5  17%
3  chr3q29  IQCG  5  17%
5  chr5q13.3  IQGAP2  5  17%
12  chr12q13.13  ITGB7^  5  17%
15  chr15q22.31  KIAA0101  5  17%
19  chr19p13.13-p13.11  KLF2^  5  17%
1  chr1q42.1  LBR  5  17%
1  chr1q21.2-q21.3  LMNA  5  17%
1  chr1q32.3  LPGAT1  5  17%
7  chr7q36.3  LUZP5  5  17%
16  chr16q22-q23  MAF^  5  17%
1  chr1q21  MCL1^ #  5  17%
11  chr11q21  MGC5306^  5  17%
15  chr15q21  MYO5A^  5  17%
8  chr8q24.3  NDRG1  5  17%
1  chr1q23.1  NES #  5  17%
14  chr14q32.31  PAPOLA  5  17%
8  chr8p21.2  PBK  5  17%
10  chr10p15.3-p15.2  PFKP^  5  17%
1  chr1q25  PLA2G4A  5  17%
16  chr16p12.1  PLK1  5  17%
20  chr20q11.23  PPP1R16B  5  17%
15  chr15q26.1  PRC1  5  17%
19  chr19q13.3  PRKD2  5  17%
1  chr1q32  RAB7L1  5  17%
8  chr8q24  RAD21^  5  17%
1  chr1q21  RFX5  5  17%
1  chr1q25.3  RGL1  5  17%
1  chr1q31  RGS1  5  17%
11  chr11q13.1  SAC3D1  5  17%
16  chr16q11.2  SHCBP1  5  17%
3  chr3q26.1  SMC4L1  5  17%
1  chr1q25  SOAT1  5  17%
7  chr7q22-q31.1  SRPK2 #  5  17%
1  chr1q21.2  SYT11  5  17%
18  chr18q21.1  TCF4 #  5  17%
1  chr1q32  TRAF5*  5  17%
18  chr18p11.32  TYMS^  5  17%
20  chr20q13.12  UBE2C^  5  17%
20  chr20q13.1  YWHAB #  5  17%
19  chr19q13.1  ZFP36 #  5  17%
1  chr1q32.1  ZNF281 #  5  17%

Four hundred and twelve of the 2,908 genes with coupled DNA gain and overexpression were amplified (Tab. 5). They mostly represented BP linked to nucleic acid metabolism, in particular mRNA transcription regulation (Tab. 7). Forty-seven amplified and overexpressed genes were found in at least two samples, mainly distributed on chromosome arms 1q (20/47) and 8q (8q24) (15/47) (Tab. 10). The list included MCL1, considered a positive control, and four other genes already described as implicated in MM (indicated with ^ in Tab. 10). Moreover, 14 genes were shared with either Carrasco et al.226, Largo et al.265 or Walker et al.125 (highlighted with # in Tab. 10). POLR3C and RBM8A were found by us, Walker et al. and Carrasco et al., and four genes known to be involved in tumors were detected (marked with * in Tab. 10).

Tab. 10: Genes selected by integrative analysis with matched filter in all MM samples. Matched-filter analysis selected 47 unique genes with DNA CN > 4 and expression value of the paired gene ≥ 2-fold above the average expression value, occurring in at least two cases out of 29 (17 cell lines + 12 patients). #: indicates common genes with published integrative analyses on MM125,226,265; ^: highlights genes known to be involved in MM pathogenesis (positive controls); *: indicates genes known to play a role in cancer but not yet described in relationship to MM. In bold are reported genes shared with COPA-based integrative analysis. chromosome cytoband symbol cases % cases 1 chr1q23 PBX1^ 4 14% 1 chr1q23 UCK2 # 4 14% 8 chr8q24.3 BOP1 3 10% 8 chr8q24.3 CYC1 # 3 10% 8 chr8q24.21 FAM49B 3 10% 1 chr1q23.3 PFDN2 # 3 10% 1 chr1q21 ARNT # 2 7% 8 chr8q24.13 ATAD2 2 7% 1 chr1q24 ATP1B1^ 2 7% 8 chr8q24.3 C8orf33 2 7% 1 chr1q24 CREG1 2 7% 13 chr13q34 CUL4A* 2 7% 18 chr18q12 CXXC1 2 7% 8 chr8q24.1-q24.2 DDEF1* 2 7% 8 chr8q24 EIF2C2 # 2 7% 12 chr12p12.1 ETNK1 2 7% 8 chr8q24.3 EXOSC4^ 2 7% 1 chr1q22 FDPS # 2 7% 13 chr13q34 FLJ11305 2 7%

Tab. 10: continuation.

chromosome  cytoband          symbol           cases  % cases
1           chr1q21           GBA /// GBAP #   2      7%
8           chr8q24.22        KIAA0143         2      7%
12          chr12p12.1        KIAA0528         2      7%
13          chr13q34          LAMP1            2      7%
3           chr3q25.32        LXN              2      7%
18          chr18q21          MBD2*            2      7%
1           chr1q21           MCL1^ #          2      7%
8           chr8q24.3         NDRG1            2      7%
1           chr1q23           NDUFS2 #         2      7%
1           chr1q21.3         PMVK             2      7%
18          chr18q21.1        POLI             2      7%
1           chr1q21.1         POLR3C #         2      7%
1           chr1q12           RBM8A #          2      7%
8           chr8q24.3         RECQL4           2      7%
1           chr1q21           RFX5             2      7%
1           chr1q21           S100A10*         2      7%
1           chr1q21           S100A11 #        2      7%
1           chr1q21           S100A6 #         2      7%
1           chr1q22           SEMA4A           2      7%
18          chr18q21.1        SETBP1           2      7%
8           chr8q24.2-qtel    SIAHBP1          2      7%
8           chr8q24.3         SLC39A4 #        2      7%
18          chr18q21.1        SMAD2^           2      7%
8           chr8q24.1         SQLE             2      7%
8           chr8q24.22        ST3GAL1          2      7%
13          chr13q34          TUBGCP3          2      7%
1           chr1q21           TUFT1            2      7%
1           chr1q23.3         UAP1             2      7%
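The selection criteria stated in the caption of Tab. 10 (DNA CN > 4, expression ≥ 2-fold above the average, at least two cases) can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function name `matched_filter`, the toy arrays, and the reading of "average expression value" as the mean of each gene over all samples on a linear scale are assumptions made for illustration only.

```python
import numpy as np

def matched_filter(copy_number, expression, cn_min=4.0, fold=2.0, min_cases=2):
    """Flag genes with DNA CN > cn_min and expression >= fold x the gene's mean.

    Both inputs are genes x samples arrays with matching rows and columns.
    Returns (indices of selected genes, number of matching cases per gene).
    """
    cn = np.asarray(copy_number, dtype=float)
    expr = np.asarray(expression, dtype=float)
    # Assumption: "average expression value" = mean of the gene over all samples.
    mean_expr = expr.mean(axis=1, keepdims=True)
    # A case counts only if gain and overexpression occur in the same sample.
    hits = (cn > cn_min) & (expr >= fold * mean_expr)
    cases = hits.sum(axis=1)
    return np.flatnonzero(cases >= min_cases), cases

# Hypothetical toy data: 3 genes x 5 samples.
cn = np.array([[5, 5, 2, 2, 2],
               [5, 2, 2, 2, 2],
               [5, 5, 2, 2, 2]])
expr = np.array([[400, 400, 50, 50, 50],
                 [500, 10, 10, 10, 10],
                 [100, 100, 100, 100, 100]])
selected, cases = matched_filter(cn, expr)
print(selected, cases)
```

In this toy input only the first gene is amplified and overexpressed in two samples; the second matches in a single sample and the third is amplified without being overexpressed.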

Integrative analysis of expression and genomic profiles with the matched filter approach applied to 17 HMCLs [5] led to the identification of 222 unique genes with matching overexpression and DNA amplification, represented by 290 HG-U133A probes and 226 SNP-IDs (Suppl. Tab. 3). In particular, 23 selected transcripts were found in at least two cell lines, distributed on chromosome arms 1q (14/23), 8q (4/23), 13q (2/23), and 18q (3/23) (Tab. 11). The MM gene MCL1 was detected, together with PBX1, the latter recently identified as overexpressed in a subgroup of MM samples based on unsupervised hierarchical clustering of gene expression profiles [262]. MCL1 was shared, together with five other genes, with previously published studies [125,226,265] (highlighted with # in Tab. 11). Four more genes linked to cancer were identified (marked with * in Tab. 11).

Tab. 11: Genes selected by integrative analysis with matched filter in HMCLs. Twenty-three genes mapped to regions with a DNA CN > 4 and expression value of the paired gene ≥ 2-fold above the average expression value, occurring in two or more HMCLs. #: indicates common genes with published integrative analyses on MM [125,226,265]; ^: highlights genes known to be involved in MM pathogenesis (positive controls); *: indicates genes known to play a role in cancer but not yet described in relationship to MM. In bold are reported genes shared with COPA-based integrative analysis.

Gene Symbol  Chromosomal Location  Cell Line
ARNT #       chr1q21               H929/LP1/OPM2
MCL1^ #      chr1q21               H929/KMS20
RFX5         chr1q21               KMS20/LP1
S100A10*     chr1q21               H929/LP1
S100A6 #     chr1q21               H929/KMS20
SHC1 #       chr1q21               OPM2/H929
TUFT1        chr1q21               KMS20/OPM2
TXNIP        chr1q21.1             H929/KMS20
ARHGEF2      chr1q21-q22           H929/OPM2
SEMA4A       chr1q22               H929/KMS20
PBX1^        chr1q23               CMA02/KMS20/OPM2
SLAMF7 #     chr1q23.1-q24.1       H929/KMS20/OPM2
UAP1         chr1q23.3             H929/OPM2
ATP1B1       chr1q24               CMA02/KMS20/OPM2
DDEF1*       chr8q24.1-q24.2       JJN3/KMS18
FAM49B       chr8q24.21            CMA02/JJN3/KMS18
KIAA0143     chr8q24.22            JJN3/KMS18
NDRG1        chr8q24.3             CMA02/JJN3
CUL4A*       chr13q34              JJN3/KMS12
TUBGCP3      chr13q34              JJN3/KMS12
MBD2*        chr18q21              CMA02/KMS26
POLI         chr18q21.1            CMA02/KMS26
BCL2 #       chr18q21.33           CMA02/KMS12

Notably, the anti-apoptotic gene BCL2 was overexpressed and amplified in two cell lines, CMA02 and KMS12. Given the previous observation of BCL2 amplification and overexpression in one HMCL (1/9; SKMM2, not included in our panel) by Largo et al. [265], its oncogenic behaviour in five patients (5/67) and 10 HMCLs (10/43) reported by Carrasco et al. [226], and its relevance in lymphoid malignancies, we characterized the involvement of BCL2 in HMCLs [5] (Fig. 20). FISH analysis confirmed a remarkable amplification of the locus in CMA02 and a gain in KMS12 [5] (Fig. 20A). The marked amplification in CMA02 was not detected in malignant plasma cells from the corresponding primary tumor (PCL-04), suggesting that it may be a secondary event related to in vitro cell line establishment. Overall, a good concordance between FISH signals and CN was observed in all HMCLs. As for BCL2 expression, the gene expression profiles showed overexpression in four cell lines (LP1, KMS12, CMA02 and CMA03) when compared to a threshold level derived from the marrow plasma cells of four healthy donors [277] (Fig. 20B).


Fig. 20: Involvement of BCL2 in HMCLs [5]. (A) FISH analysis of KMS12 and CMA02 cell lines and patient PCL-04 (05-088) from whom CMA02 was derived. Green signal: chromosome 18 Alpha Satellite probe; red signal: probe specific for the BCL2 locus. (B) BCL2 expression profiles (HG-U133A probe-ID: 203684_s_at) in 17 HMCLs. The dotted line indicates the arbitrary BCL2 expression threshold obtained from the plasma cells of healthy donors (median + 3 standard deviations). FISH and copy number results for each cell line are shown below the histogram: N = no extra copies of BCL2 in comparison with the number of chromosome 18 centromeric signals; G = extra copies of BCL2; D = fewer BCL2 signals than chromosome 18 centromeric signals; A = amplification.
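The expression threshold described in the caption of Fig. 20B (median of the healthy-donor values plus three standard deviations) can be written as a short Python sketch. The numbers and the sample names `line_A` and `line_B` are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the caption does not specify the estimator.

```python
import statistics

def expression_threshold(normal_values, n_sd=3.0):
    """Median of the normal samples plus n_sd sample standard deviations."""
    return statistics.median(normal_values) + n_sd * statistics.stdev(normal_values)

# Hypothetical expression values for four healthy-donor plasma cell samples.
normal = [100.0, 110.0, 90.0, 100.0]
# Hypothetical expression values for two cell lines.
cell_lines = {"line_A": 450.0, "line_B": 120.0}

threshold = expression_threshold(normal)
overexpressed = sorted(name for name, value in cell_lines.items() if value > threshold)
print(round(threshold, 2), overexpressed)
```

With only four normal samples the standard deviation is poorly estimated, which is consistent with the caption calling the threshold arbitrary.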

Association between loss and downregulation according to the matched filter method was detected in 1,150 unique genes, represented by 1,229 unique HG-U133A probes and 807 unique SNP-IDs (Tab. 5). Among them, functional annotation analysis highlighted the over-representation of BP linked to nucleic acid metabolism and to protein metabolism (Tab. 7). Together with the results obtained for gained and overexpressed transcripts, this indicates that both metabolic processes are among the most deregulated in MM. Thirty transcripts were lost and downregulated in at least five samples (Tab. 12). They were mainly distributed on chromosome arms 1p (8/30) and 13q (6/30). Five genes were shared with Carrasco et al. [226], Largo et al. [265] or Walker et al. [125] (highlighted with # in Tab. 12), and three transcripts have already been described in relationship to MM (indicated with ^ in Tab. 12).

Tab. 12: Genes selected by integrative analysis with matched filter in all MM samples. Matched-filter analysis selected 30 unique genes with DNA CN ≤ 1.7 and expression value of the paired gene ≥ 2-fold below the average expression value, occurring in at least five cases out of 29 (17 cell lines + 12 patients). #: indicates common genes with published integrative analyses on MM [125,226,265]; ^: highlights genes known to be involved in MM pathogenesis (positive controls); *: indicates genes known to play a role in cancer but not yet described in relationship to MM.

chromosome  cytoband           symbol     cases  % cases
13          chr13q22           EDNRB      11     38%
13          chr13q14.11        RGC32      10     34%
1           chr1p13            AMPD1      7      24%
1           chr1p31            PDE4B^     7      24%
17          chr17p13.3         ATP2A3     6      21%
4           chr4q13.2          BRDG1 #    6      21%
1           chr1p13            CD53       6      21%
8           chr8p21.3          ChGn #     6      21%
14          chr14q12-q13       COCH       6      21%
14          chr14q11.2         ISGF3G     6      21%
13          chr13q13           NBEA #     6      21%
13          chr13q14.3         RCBTB2     6      21%
12          chr12p12.3         ARHGDIB    5      17%
4           chr4p15            CD38       5      17%
17          chr17p13.1         CENTB1     5      17%
1           chr1p22            DPYD       5      17%
12          chr12p12.1         ETNK1      5      17%
1           chr1p21            EXTL2      5      17%
14          chr14q24.3         FUT8       5      17%
1           chr1p31.2-p31.1    GADD45A^   5      17%
4           chr4p16.3          IDUA       5      17%
14          chr14q32           IFI27^     5      17%
1           chr1p32-p31        JUN        5      17%
4           chr4p15.2          KIAA0746   5      17%
13          chr13q14.3         LCP1       5      17%
16          chr16q22-q23       MAF #      5      17%
6           chr6q22            MAN1A1     5      17%
14          chr14q32.33        MGC27165   5      17%
13          chr13q32.2         RANBP5 #   5      17%
1           chr1pter-p22.1     RNF11      5      17%

The COPA algorithm was designed to identify genes that appear as outliers in their expression profile because of underlying genetic events such as translocations or genomic amplifications [6]. We adopted the COPA approach to look for transcription outliers among genes overexpressed because of gene dosage alterations. We applied the COPA algorithm at both the 75th (COPA 75th) and 90th (COPA 90th) percentiles. The top 100 unique outlier genes identified by COPA 75th and COPA 90th and characterized by DNA gain are reported in supplementary tables 4 and 5, respectively. A summary of the results obtained with the COPA-based integrative approach is presented in table 6. Functional annotation of the top 100 unique outlier genes could not detect significant over-representation of any biological process or pathway, due to the low number of transcripts analyzed. Table 13 (COPA at 75th) and table 14 (COPA at 90th) show the unique genes among the top 100 outliers with DNA CN > 2.3 in at least five cases out of 29. COPA 75th identified 66 outliers with recurrent DNA gain, whereas COPA 90th found 61 outliers with recurrent DNA gain; 20 genes were detected with both the COPA 75th- and COPA 90th-based integrative approaches (Tabs. 13-14, in italics). Among the 66 unique genes selected from the COPA 75th top 100 outliers characterized by DNA CN > 2.3 in five or more cases, we found the positive controls CCND1, HGF, and MAF and another six genes involved in MM pathways (indicated with ^ in Tab. 13). Three genes were in common with previously published works presenting integrative analyses in MM: HGF and SLC43A3 with Carrasco et al. [125,226], and HIST1H2BG with Largo et al. [265] (highlighted with # in Tab. 13). Seven genes relevant in other tumor types were also identified (marked with * in Tab. 13). Among the 61 unique genes selected from the COPA 90th top 100 outliers with DNA CN > 2.3 in five or more cases, we found CCND1, MAFB, and six additional genes related to MM (indicated with ^ in Tab. 14), and five genes linked to other cancer types (marked with * in Tab. 14). Comparing the selected COPA 90th outliers with published candidate oncogene lists [125,226,265], we identified only one gene, NES, shared with Carrasco et al. [226]. Concerning the genes playing a role in cancer, COPA 75th and COPA 90th candidates both contained genes coding for the cancer germline antigen BAGE and for calcium-binding proteins of the S100 family.
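The COPA-based selection described above can be sketched in Python as follows. The transformation follows the commonly described COPA recipe (median-centre each gene, scale by its median absolute deviation, score by a chosen percentile); the function names, the toy data, the unscaled MAD and the way the DNA filter is applied (counting all cases with CN > 2.3 for a gene, regardless of which samples are expression outliers) are assumptions made for illustration, not a description of the exact procedure used here.

```python
import numpy as np

def copa_scores(expression, percentile=75):
    """COPA score per gene for a genes x samples expression matrix."""
    expr = np.asarray(expression, dtype=float)
    med = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - med), axis=1, keepdims=True)
    mad = np.where(mad == 0, np.nan, mad)  # genes with zero MAD get a NaN score
    transformed = (expr - med) / mad
    return np.nanpercentile(transformed, percentile, axis=1)

def gained_outliers(scores, copy_number, top_n=100, cn_min=2.3, min_cases=5):
    """Top-ranked outlier genes that also show CN > cn_min in >= min_cases samples."""
    order = np.argsort(-scores)[:top_n]  # descending score, NaN scores sorted last
    cases = (np.asarray(copy_number, dtype=float) > cn_min).sum(axis=1)
    return [int(i) for i in order if cases[i] >= min_cases]

# Hypothetical toy data: 2 genes x 8 samples.
expr = np.array([[1, 2, 3, 4, 5, 6, 7, 8],      # no outliers
                 [1, 2, 3, 4, 5, 6, 50, 60]])   # two outlier samples
cn = np.array([[2, 2, 2, 2, 2, 2, 3, 3],        # gain in 2 cases
               [3, 3, 3, 3, 3, 2, 2, 2]])       # gain in 5 cases
scores = copa_scores(expr, percentile=75)
print(scores, gained_outliers(scores, cn, top_n=2))
```

Because the score is a percentile of the scaled profile, a gene that is strongly overexpressed in only a minority of samples still ranks high, which is the behaviour a fold-change against the mean tends to miss.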

Tab. 13: Genes selected by COPA-based integrative analysis in all MM samples. COPA analysis at the 75th percentile selected 66 unique genes among the top 100 outliers with DNA CN > 2.3 in at least five cases out of 29 (17 cell lines + 12 patients). #: indicates common genes with published integrative analyses on MM [125,226,265]; ^: highlights genes known to be involved in MM pathogenesis (positive controls); *: indicates genes known to play a role in cancer but not yet described in relationship to MM. In bold are reported genes shared with matched filter integrative analysis, whereas in italics the genes in common with COPA 90th outliers with paired DNA gain.

chr.  cytoband               symbol       COPA.75 score  COPA rank  cases w. gain  % cases w. gain
1     chr1q21                S100A8*      11,59          2          19             66%
7     chr7q21                PEG10        7,67           5          12             41%
7     chr7q21-q22            SGCE         6,80           6          12             41%
15    chr15q11.2-q12         NDN          5,94           8          10             34%
8     chr12q15/chr19q13.4    LYZ/LILRB1   5,31           10         10             34%
21    chr21p11.1             BAGE*        5,13           11         6              21%
1     chr1q22-q23            SLAMF1       5,12           12         20             69%
1     chr1q21                S100A12*     4,97           13         19             66%
3     chr3q21-q24            PCOLCE2      4,42           14         11             38%
19    chr19q13.32            FOSB*        4,23           17         12             41%
3     chr3p22.3              OSBPL10      3,90           18         8              28%
21    chr21p11               TPTE         3,75           20         6              21%
1     chr1q23                FCGR2B       3,57           22         20             69%
12    chr12q12               PDZRN4       3,55           23         7              24%
16    chr16q22-q23           MAF^         3,37           27         5              17%
11    chr11p15.5             PHLDA2       3,23           28         7              24%
1     chr1q41                ESRRG        3,08           29         11             38%

Tab. 13: continuation.

chr.  cytoband             symbol        COPA.75 score  COPA rank  cases w. gain  % cases w. gain
11    chr11q13             CCND1^        3,05           32         12             41%
7     chr7q11.22           AUTS2         2,93           33         11             38%
7     chr7q32.3            FLJ43663      2,89           36         14             48%
7     chr7p15-p14          CPVL          2,89           37         11             38%
19    chr19q13.4           LILRB4        2,88           38         10             34%
8     chr8p21.3            ChGn          2,87           39         7              24%
14    chr14q24.3           FOS*          2,85           40         5              17%
3     chr3q26.1-q26.2      BCHE          2,84           42         12             41%
5     chr5q32              SH3TC2        2,84           43         7              24%
1     chr1q42-q43          KCNK1         2,80           44         11             38%
1     chr1q21              S100A9*       2,80           45         19             66%
9     chr9q31              TMEFF1        2,77           47         10             34%
9     chr9q31              KLF4^         2,76           48         11             38%
8     chr8q13.2-q13.3      SULF1         2,70           49         14             48%
11    chr11p15             TRIM22^       2,67           51         7              24%
9     chr9q34              FCN1          2,66           52         10             34%
9     chr9p12              DDX58^        2,66           53         7              24%
7     chr7q21.1            HGF^ #        2,65           54         12             41%
11    chr11q11             SLC43A3 #     2,64           55         11             38%
6     chr6p21.3            HIST1H2BG #   2,64           56         6              21%
3     chr3q13.2            ZBTB20        2,63           57         7              24%
9     chr9q22              SYK*          2,63           58         10             34%
11    chr11q12-q13.1       FADS2         2,60           59         11             38%
8     chr8p21-p12          CLU           2,60           60         7              24%
3     chr3q12-q13          CD200         2,56           62         7              24%
5     chr5q12-q13.3        ENC1          2,51           63         10             34%
17    chr17q23             TBX2          2,49           64         8              28%
16    chr16p13.3           IL32          2,48           65         5              17%
1     chr1q23              FCGR2A        2,47           66         20             69%
1     chr1q44              TRIM58^       2,47           68         11             38%
17    chr17q21.33          CROP          2,44           69         8              28%
11    chr11q12.3           HRASLS2       2,43           70         11             38%
2     chr2p24              DTNB          2,40           72         5              17%
2     chr2p25              SOX11^        2,36           75         5              17%
2     chr2p25.2            RSAD2         2,34           76         5              17%
17    chr17q21.2           SUI1          2,33           77         7              24%
1     chr1p36.13-q42.3     C1orf73       2,30           82         12             41%
19    chr19p13             TNFSF7        2,28           84         14             48%
15    chr15q26             STARD5        2,24           87         12             41%
17    chr17q25.1           CD300A        2,23           88         7              24%
1     chr1q23              PBX1^         2,23           89         19             66%
1     chr1q25.2-q25.3      PTGS2         2,23           90         15             52%
1     chr1q25.3            RGL1          2,21           91         16             55%
1     chr1q21.2            SYT11         2,20           93         19             66%
18    chr18q12             MOCOS         2,20           94         9              31%
17    chr17p11.2           EPN2          2,20           95         6              21%
9     chr9q33-q34.1        LHX2          2,15           98         11             38%
7     chr7q22              RELN          2,15           99         15             52%
21    chr21q22.3           ABCG1         2,15           100        7              24%

Tab. 14: Genes selected by COPA-based integrative analysis in all MM samples. COPA analysis at the 90th percentile selected 61 unique genes among the top 100 outliers with DNA CN > 2.3 in at least five cases out of 29 (17 cell lines + 12 patients). #: indicates common genes with published integrative analyses on MM [125,226,265]; ^: highlights genes known to be involved in MM pathogenesis (positive controls); *: indicates genes known to play a role in cancer but not yet described in relationship to MM. In bold are reported genes shared with matched filter integrative analysis, whereas in italics the genes in common with COPA 75th outliers with paired DNA gain.

chr.  cytoband               symbol       COPA.90 score  COPA rank  cases w. gain  % cases w. gain
1     chr1q31.2              RGS13        30,22          2          16             55%
1     chr1q21                S100A8*      17,49          8          19             66%
15    chr15q11.2             HDGFRP3      16,30          12         11             38%
5     chr5q13                F2RL1        14,53          14         10             34%
5     chr5q14                NR2F1        13,97          16         8              28%
21    chr21p11.1             BAGE*        13,68          17         6              21%
11    chr11q13.2             GAL          13,54          18         12             41%
14    chr14q32.33            IGHD         13,51          19         6              21%
18    chr18q22               MC4R         12,95          21         11             38%
2     chr2p25.1              GREB1        12,46          24         5              17%
1     chr1q25.2-q25.3        PTGS2        11,85          28         15             52%
7     chr7q21                PEG10        11,81          29         12             41%
8     chr12q15/chr19q13.4    LYZ/LILRB1   11,57          31         10             34%
17    chr17q25               LGALS3BP^    11,25          33         7              24%
1     chr1q21                S100A12*     11,13          34         19             66%
3     chr3q26.31             NLGN1        11,08          35         12             41%
16    chr16p13.3             IL32         10,94          36         5              17%
9     chr9p24.1              NFIB         10,82          38         8              28%
11    chr11p15.5             PHLDA2       10,33          39         7              24%
14    chr14q32.33            IGHM         9,99           41         6              21%
8     chr8p22                MSR1         9,98           42         6              21%
22    chr22q12.3             PVALB        9,97           43         5              17%
11    chr11q13               CCND1^       9,85           45         12             41%
2     chr2p25.2              RSAD2        9,82           46         5              17%
8     chr8q13.2-q13.3        SULF1        9,75           47         14             48%
8     chr8q22.1              FLJ20171     9,61           50         13             45%
3     chr3q21-q24            PCOLCE2      9,55           53         11             38%
15    chr15q11.2-q12         NDN          9,48           54         10             34%
11    chr11p13               C11orf8      9,36           56         9              31%
16    chr16p13.3             PRSS21       9,26           57         5              17%
2     chr2p25                SOX11^       9,24           58         5              17%
11    chr11q23.3             MCAM         8,98           62         11             38%
1     chr1q22-q23            SLAMF1       8,97           63         20             69%
7     chr7q21-q31            SEMA3C       8,90           64         12             41%
8     chr8p21.2              PNMA2        8,80           66         7              24%
3     chr3q26.2              TNIK         8,73           67         12             41%
7     chr7q21-q22            SGCE         8,71           68         12             41%
21    chr21p11               TPTE         8,60           69         6              21%
12    chr12q12               PDZRN4       8,59           70         7              24%
5     chr5q33.1-qter         LCP2         8,57           71         7              24%
8     chr8p11.2-p11.1        FGFR1*       8,56           72         9              31%
19    chr19q13.42            NALP2        8,55           73         10             34%
15    chr15q26               NR2F2        8,52           74         12             41%
2     chr2p24.1              MYCN*        8,48           75         5              17%
7     chr7q31-q32            GNG11^       8,45           76         12             41%

Tab. 14: continuation.

chr.  cytoband              symbol        COPA.90 score  COPA rank  cases w. gain  % cases w. gain
15    chr15q24-q25          CTSH          8,38           78         10             34%
8     chr8q22               RUNX1T1^      8,19           80         13             45%
8     chr8q24.1             ENPP2         8,18           81         13             45%
3     chr3q25.32-q25.33     SCHIP1        8,05           83         12             41%
8     chr8p22               DLC1          8,00           85         8              28%
8     chr8p23.1             DEFA1/DEFA3   7,94           87         8              28%
20    chr20q11.2-q13.1      MAFB^         7,93           88         10             34%
1     chr1q23.1             NES #         7,85           90         19             66%
7     chr7q31.3             PTPRZ1        7,70           92         14             48%
9     chr9q33-q34.1         LHX2          7,69           93         11             38%
1     chr1q42-q43           KCNK1         7,67           94         11             38%
19    chr19q13.33           MYBPC2        7,66           95         11             38%
5     chr5q13.3             KIAA0888      7,62           96         10             34%
9     chr9q21.13            ANXA1^        7,59           97         9              31%
3     chr3p21|3p21.3        CX3CR1^       7,46           99         8              28%
1     chr1q42.11            CDC42BPA      7,43           100        10             34%

The following seven genes were found by both the COPA-based and the matched filter integrative analyses: PBX1, FOSB, HGF, MAF, NES, RGL1 and SYT11. Interestingly, two of them, HGF and NES, were also cited by Carrasco et al. as candidate oncogenes.

The expression level of some selected transcripts with increased DNA and RNA levels could be validated across a large data-set of HG-U133A gene expression profiles obtained from 102 MM patients, 11 MGUS, nine PCL, and four normal samples, collected by the collaborative group in Milan [285]. MM samples were stratified into five translocation/cyclin (TC) classes using previously described criteria [277]. The five groups were characterised as follows: TC1 (22 cases) showed the t(11;14) or t(6;14) translocation, with consequent overexpression of CCND1 or CCND3; TC2 (22 cases) showed low to moderate levels of the CCND1 gene in the absence of any primary IgH translocation; TC3 (34 cases) included tumours that did not fall into any of the other groups, most of which expressed CCND2; TC4 (19 cases) showed high CCND2 levels and the presence of the t(4;14) translocation; and TC5 (five cases) expressed the highest levels of CCND2 in association with either t(14;16) or t(14;20). Results are presented in figure 21. We looked for increased expression of a given transcript in MGUS, MM, or PCL sample subtypes with respect to its expression in the normal samples. A sample was considered to overexpress a transcript when its expression was at least 2-fold higher than the highest expression value among the four normal samples.

The gene expression profiles of PBX1, HGF, and MAF are reported. As mentioned above, these genes were detected as gained/amplified and overexpressed by both the COPA-based and the matched filter integrative analyses (Tabs. 9-10, 13). HGF was mainly overexpressed in TC2 samples (64%, 14/22) and in MGUS (55%, 6/11), but TC1, TC3, TC4 and PCL also presented some overexpressed cases. MAF overexpression was principally observed in TC5 (80%, 4/5) and PCL (33%, 3/9), followed by TC4 (16%) and TC3 (6%). Overexpression of PBX1 was limited, and confined to PCL (22%, 2/9), TC5 (20%, 1/5) and TC4 (11%, 2/19). UCK2 appeared both among the gained and overexpressed genes and among the amplified and overexpressed transcripts (Tabs. 9-10). It was overexpressed in all PCL and in the majority of TC4 (74%, 14/19), but its overexpression was detected in all groups of samples, including TC3 (47%), MGUS (45%), TC2 (41%) and TC1 (27%). The genes PFDN2, CYC1 and BOP1 (Tab. 10) were particularly overexpressed in PCL samples. PFDN2 was overexpressed in 56% (5/9) of PCL, 16% of TC4, and in a pair of TC1 and TC3 samples. CYC1 presented overexpression in 44% (4/9) of PCL and 16% of TC4. BOP1 was overexpressed in 44% of PCL, followed by 22% of TC3, 14% of TC2 and 11% of TC4. The overexpression of ZBTB20, detected among the gained COPA 75th outliers (Tab. 13), was found mainly in TC1 (41%, 9/22) and TC2 (36%, 8/22), and to a lesser extent in all the other groups except PCL.

Too little is known about the correlation of TC classes with clinical behaviour to draw any conclusion on the impact of the mentioned transcripts on the pathology. The study by Agnelli et al. showed that a unique signature characterizes the TC2 group, with overexpression of genes involved in protein biosynthesis at the translational level, such as ribosomal protein genes related to the large and small ribosome subunits, translational initiation factor 3 (eIF3k), and translational elongation factor 2 (EEF2) [277]. However, further investigations including more samples are needed to better define the peculiarities of every MM subtype. By analyzing the HG-U133A expression profiles of some selected transcripts we could validate their altered gene expression in a larger data-set that also included MGUS and PCL samples, and we could observe how the overexpressed cases were distributed among the TC groups. Some transcripts were deregulated across MGUS, MM, and PCL samples alike (e.g. HGF, UCK2), whereas others were concentrated in a few subtypes (e.g. MAF, PBX1, CYC1).
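The validation criterion used above (expression at least 2-fold higher than the highest value among the normal samples) and the resulting per-group percentages can be sketched as follows. The function name `overexpression_by_group` and all numbers are hypothetical; only the rule itself is taken from the text.

```python
from collections import defaultdict

def overexpression_by_group(values, groups, normal_label="N", fold=2.0):
    """Return {group: (overexpressed cases, total cases, rounded percentage)}."""
    # Cutoff: fold x the highest expression value among the normal samples.
    cutoff = fold * max(v for v, g in zip(values, groups) if g == normal_label)
    counts = defaultdict(lambda: [0, 0])
    for value, group in zip(values, groups):
        if group == normal_label:
            continue
        counts[group][1] += 1
        if value >= cutoff:
            counts[group][0] += 1
    return {g: (k, n, round(100 * k / n)) for g, (k, n) in counts.items()}

# Hypothetical expression values for one probe.
values = [100, 120, 90, 110,                          # four normal samples
          300, 250, 100, 500, 240,                    # five TC5 cases
          500, 300, 260, 100, 90, 80, 70, 60, 50]     # nine PCL cases
groups = ["N"] * 4 + ["TC5"] * 5 + ["PCL"] * 9
result = overexpression_by_group(values, groups)
print(result)
```

Anchoring the cutoff on the maximum of only four normal samples makes the rule conservative but sensitive to a single high normal value, which is worth keeping in mind when comparing percentages between probes.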

[Fig. 21, panels: HG-U133A expression values per sample for HGF (210755_at), MAF (206363_at), PBX1 (212148_at), PFDN2 (218336_at), CYC1 (201066_at), BOP1 (212563_at), UCK2 (209825_s_at) and ZBTB20 (205383_s_at). Samples on the x-axis are grouped as N, MGUS, TC1, TC2, TC3, TC4, TC5 and PCL.]

Fig. 21: Gene expression analysis of transcripts selected by integration of genomic and gene expression profiles. The panels show HG-U133A gene expression results for some selected transcripts with matching DNA and RNA profiles, collected by Agnelli et al. [285] across a large data-set comprising 102 MM patients, 11 MGUS, 9 PCL and 4 normal samples. MM samples were stratified into five translocation/cyclin (TC) classes according to previously described criteria [277]: TC1 includes 22 cases; TC2 22 cases; TC3 34 cases; TC4 19 cases; and TC5 5 cases. The HG-U133A probe-ID corresponding to the expression level of the observed gene is given in each panel (at the top, to the right).

3.5 Discussion

Known genomic imbalances of MM240-242 were identified, confirming the efficacy of the adopted method. The 30 MM samples we analyzed by aCGH exhibited as major

88 characteristics 1q gain and frequently increased DNA level of odd chromosomes on one hand, and recurrent DNA losses and LOH events at 1p, 13q and 17p on the other hand. Our data further support the evidence of the genomic complexity of MM, and reinforce the role of an integrated genomic approach for improving our understanding of the molecular pathogenesis of the disease. Recurrent genomic aberrations among MM patients reflected the expected situation from previous observations: gain at 1q, trisomy of chromosomes 3, 5, 7, 9, 11, 15, 19, and deletion at 13q. Trisomies of odd chromosomes together with 1q gain are typical for hyperdiploid MM. In our patients series hyperdiploid cases seem to be well represented. In fact, translocations, probably corresponding to non-hyperdiploid cases, were detected by FISH in six patients out of 13. The most common genomic aberrations in HMCLs were gains of chromosome arm 1q and losses accompanied by LOH at chromosome arms 1p, 13q and 17p, all typical for MM. As all four aberrations are known to be associated with poor prognosis293, it is not surprising to find them with a high prevalence in HMCLs. Notably, among the different types of IgH translocation, t(4;14) was the most frequently associated with genomic gains: in particular 1q and 8q24.23-q24.3. Almost half of the HMCLs with the t(4;14) investigated by SNP arrays showed 1q DNA gains, thus underlining the association between the translocation125 and chromosome 1q gain, which may contribute to the poor prognosis of t(4;14)-positive cases226,293. LOH events were particularly frequent among HMCLs, but not in MM patients. In HMCLs we could observe different LOH patterns, underlying different mechanisms of LOH formation: LOH linked to DNA loss, to a gain/amplification, or copy-neutral LOH. LOH matching with losses is compatible with deletions occurring early in the transformation process, before changes in ploidy status. 
The occurrence of copy-neutral LOH, thus in the absence of a reduction in CN, suggests the involvement of UPD as a consequence of somatic recombination during mitotic cell division. As previously observed by Walker et al.125, we also found UPD in regions of DNA gain/amplification, such as the long arm of chromosomes 1 and 8. This is probably due to a DNA replication event following UPD mechanisms. Indeed, although regions of UPD more commonly underline the presence of tumour suppressor genes37, they might also contain cancer genes, as recently described in myeloproliferative disorders associated with JAK2 mutations294. A comparison between the heat-maps and frequency plots of patients and HMCLs demonstrated that cell lines have an higher genomic instability with respect to clinical samples. In general, cell lines established from malignancies might not precisely reflect the in vivo situation of tumor cells as far as genetic abnormalities are considered. However, the highly proliferative HMCLs can be viewed as the ultimate stage of the tumor progression, thus providing a repository of all genetic changes accumulated during tumor progression, as well as some additional genetic alterations that they might have acquired during in vitro culture193. So, even though most HMCLs are established from cells taken during advanced or extramedullary disease phases, they faithfully reflect the heterogeneity found in MM patients,

89 particularly in relation to IgH chromosomal translocations. The availability of HMCLs has significantly helped to elucidate many of the molecular and biological aspects of MM, such as the identification of the most recurrent IgH translocations and the complex network of cytokines affecting PC growth and angiogenesis196,239. HMCLs represent therefore a helpful model of the heterogeneous genetic lesions found in MM, that can be used to investigate the function of the genes targeted by genetic aberrations, as well as the activity of novel therapeutic agents. Moreover, limitations of currently used MM mouse models in mimicking all aspects of the human disease193 render HMCLs a valid and simpler experimental tool. The DNA CN profiles led us to the identification of amplified regions, that were confirmed by qPCR. On one hand, the qPCR validation demonstrated the reliability of the adopted aCGH approach in precisely defining DNA CN alterations. On the other hand, we were able to detect in an unbiased manner genomic regions potentially harboring candidate oncogenes, thus representing interesting hints to be further investigated. We validated the amplification of already known cancer transcripts like BAFF (TNFSF13b) at 13q33.3 in the cell line KMS12 and the miR-17-92 cluster, located at 13q31.3, amplified in KMS11. BAFF is a growth factor belonging to the TNF superfamily involved in survival of normal and malignant B-cells. Aberrant BAFF signaling was described in MM, where an autocrine loop of tumor cells stimulation was suggested295,296. BAFF secretion is 3 to 10-fold higher in BMSCs than in MM cells. Moreover, adhesion of tumor cell to BMSC mediates BAFF upregulation in BMSCs, which in turn increases the adhesion of myeloma cells to BMSCs in a dose-dependent manner297. 
DNA amplification could represent a mechanism leading to BAFF overexpression, rendering the malignant cells independent of the BM microenvironment for growth and survival through the autocrine stimulation loop. The miR-17-92 cluster is frequently amplified and overexpressed in malignant B-cell lymphoma298,299. The mantle cell lymphoma cell line JEKO1 has previously been reported to concomitantly amplify both MYC and miR-17-92 (ref. 82). Co-amplification was not detected in KMS11. Furthermore, we confirmed the amplification at 11q23.1 found in JJN3, and additionally revealed in two other cell lines, KARPAS422 and JEKO1, by aCGH studies with 10K SNP arrays performed in our laboratory1,2. Interestingly, the amplification was located in a region often rearranged in haematological malignancies300-303. Chromosome 11 is frequently affected by ploidy changes, gains and translocation events in MM. Besides the three cell lines JJN3, KARPAS422 and JEKO1, which presented a similar amplification pattern with a delimited region of highly increased DNA CN, we also observed six MM patients and six HMCLs affected by 11q23.1 gain (data not shown). We therefore selected the recurrent amplification at 11q23.1 for deeper investigation, as presented in the second part of this work.

The integrative analysis matching genome-wide DNA and gene expression profiles of 29 MM samples identified genetic elements with strong biological meaning, as evidenced by the presence in our lists of recurrent gained/amplified and overexpressed genes with a proven role in MM pathogenesis (positive controls, e.g. MCL1, NCAM1, MAF, CCND1, MAFB), genes in common with previously published candidates and also cancer genes relevant for other tumor types but not yet implicated in MM pathogenesis. Two different strategies were used: a matched filter analysis and a COPA-based integrative analysis. We applied a matched filter on both DNA profiles and expression data to identify transcripts with paired DNA and RNA levels. We selected transcripts that were overexpressed and mapped in gained DNA regions, overexpressed and amplified, or downregulated and lost. The other approach was based on the COPA algorithm6 combined with a subsequent filter at the DNA level to select genes with outlier overexpression and DNA CN gain. The resulting lists of selected transcripts are indicative of candidate oncogenes, in the case of increased DNA CN and overexpression, or candidate tumor suppressor genes, in the case of decreased DNA level and downregulation. Since our principal aim was the identification of candidate oncogenes, we mainly focused our attention on the amplified, overexpressed genes resulting from the matched filter analysis and listed in table 10. Among them, we detected MCL1, an anti-apoptotic gene involved in the pathogenesis of MM304. MCL1 was among the candidate oncogenes previously proposed by Carrasco et al.226 and was highly upregulated in patient MM cells with respect to normal twin PCs261. Other candidate oncogenes shared with those of Carrasco et al.226 include ARNT, FDPS, GBA/GBAP, S100A6 and S100A11. All these genes were considered strong candidate oncogenes mapping at 1q21-q23 and exhibiting significantly increased expression in the hyperdiploid subgroup linked to poor prognosis, i.e. with 1q gain and 13q deletion (supplementary table S2 of ref. 226). ARNT is targeted by chromosomal translocations in acute leukemias305, which interfere with its activity as the central partner of several heterodimeric transcription factors.
FDPS, farnesyl diphosphate synthase, is involved in the mevalonate pathway. Inhibitors of the mevalonate pathway have antiproliferative effects in vitro, principally mediated by prevention of the synthesis of isoprenoids (e.g. farnesyl pyrophosphate and geranylgeranyl pyrophosphate) that are used for post-translational modifications, leading to cell cycle arrest and apoptosis306. Several studies have established that bisphosphonates effectively induce apoptosis of HMCLs307. Newer nitrogen-containing bisphosphonates seem to have anti-tumor potential due, at least in part, to the inhibition of farnesyl diphosphate synthase and the consequent disruption of the mevalonate pathway308. GBA/GBAP, glucosidase beta and glucosidase beta pseudogene, encodes a lysosomal membrane protein that cleaves the beta-glucosidic linkage of glycosylceramide, an intermediate in glycolipid metabolism. Mutations in this gene cause Gaucher disease, a lysosomal storage disease characterized by an accumulation of glucocerebrosides309. Altered expression of genes coding for calcium-binding proteins of the S100 family has been reported in various tumors, including lymphomas310-312. They are multifunctional signaling proteins, involved in the regulation of diverse cellular processes and distributed both in intracellular compartments and in the extracellular space310. Some of them, such as S100A4, S100A8, S100A9, S100A12 and S100A13, are secreted and act in a cytokine-like manner310. Many are localized at chromosome 1q21 in the so-called epidermal differentiation cluster and are designated with the stem symbol S100A followed by consecutive Arabic numbers. This cluster is an area frequently rearranged in human tumors311. S100A6 is expressed in many epithelial tumors; S100A8 is not expressed in normal epithelia, but strong expression of S100A8 and S100A9 in lymphocytes is linked to inflammatory processes; S100A9 is expressed in squamous epithelium and in some cases of squamous carcinoma and invasive breast cancer; S100A11 is expressed in most epithelia and many lymphocytes, with strong expression in many tumor types311. S100A10 and S100A11 are overexpressed and amplified in anaplastic large cell lymphoma cell lines and tissue samples312. Amplification at 1q12-q22 linked to overexpression of IRTA2, PDZK1, and S100A6 is detected in HMCLs247. We identified the amplification and overexpression of S100A10, S100A11, and S100A6, the latter two in common with the candidate oncogenes linked to poor prognosis presented in supplementary table S2 of ref. 226. Moreover, we identified S100A4 among the recurrently gained and overexpressed genes; it is also cited in supplementary table S2 of ref. 226 and upregulated in patient MM cells with respect to normal twin PCs261. Finally, COPA-based integrative analysis revealed overexpression of S100A8, S100A9 and S100A12 with underlying DNA gain in 66% of MM samples. These results strongly suggest an involvement of the S100A cluster in MM, at least in cases with 1q21 gain. Probably the most interesting finding was the amplification and overexpression of POLR3C and RBM8A, shared with both Walker et al.125 and Carrasco et al.226. POLR3C encodes the polypeptide C of RNA polymerase III. RNA polymerase III and RNA polymerase I are essential for protein synthesis.
In fact, they are involved in the transcription of rRNA and tRNA, necessary for ribosome biogenesis. Since the availability of ribosomes is in many cases the limiting factor on the rate of protein synthesis313, deregulated RNA polymerases in tumors have a great impact on the growth and proliferation potential of cancer cells. Protein synthesis, and therefore cell growth, can be stimulated by various oncogenic mechanisms: for example abnormally active ERK (extracellular signal-regulated kinases), most commonly due to RAS mutations, which stimulates transcription by Pol I and III via different targets314; MYC activation, which directly leads to activation of RNA polymerase III315; and compromised p53 function, i.e. the absence of p53 repression of Pol III transcription316. Overexpression of RNA polymerase III might be another mechanism leading to malignant activation of protein synthesis, although in our case we observed the alteration of a single Pol III subunit. However, this observation could be indicative of deregulation of ribosome biogenesis and protein synthesis pathways in MM, supported by the following additional findings. The amplified, overexpressed RBM8A (RNA binding motif protein 8A) encodes a ribosomal protein with a conserved RNA-binding motif. Another RNA binding motif protein, RBM10, was described, together with EXOSC4, among RNA metabolism genes associated with proliferation and upregulated in PCL with respect to MGUS264. EXOSC4 was present in our list of recurrent amplified, overexpressed genes as well, and was upregulated in the 70-gene MM high-risk signature245. Notably, a strong candidate among the overexpressed, gained genes was MRPL13, in common with the lists of Carrasco et al.226 and Largo et al.265. It is a mitochondrial ribosomal protein that helps in protein synthesis within the mitochondrion. Another ribosomal protein, MRPL32, was previously described among differentially expressed transcripts in MM and MGUS patients with respect to healthy subjects263. It was progressively upregulated following the transition from MGUS to MM. Besides genes involved in ribosome metabolism, we detected the amplification and overexpression of a translation initiation factor, EIF2C2, similar to EIF3S3 (eukaryotic translation initiation factor 3 subunit H), previously described as gained and overexpressed by Largo et al.265. EIF2C2, eukaryotic translation initiation factor 2C2 or AGO2, encodes a member of the Argonaute family of proteins, which play a key role in miRNA-mediated degradation of mRNA and seem to be required for tissue-specific developmental processes. Interestingly, EIF2C2 belonged to the 70-gene MM high-risk signature245 and was recently linked to a transformed phenotype in a breast cancer cell line317. Additional evidence supporting these observations is the previously reported oncogenic deregulation of pathways responsible for protein synthesis in MM, including the deregulation of ribosomal proteins226,256,258. Further candidate genes recurrent in our samples and in common with published candidates were UCK2 (ref. 125), PFDN2 (ref. 125), NDUFS2 (ref. 125) and CYC1 (ref. 265). UCK2, uridine-cytidine kinase 2, is involved in the synthesis of nucleic acids. A minimally gained segment at 1q was detected in Burkitt lymphoma and pediatric high hyperdiploid acute lymphoblastic leukemia, and UCK2 was significantly overexpressed in cases with 1q gain318. Expression of the UCK2 protein has been correlated with sensitivity to a few anticancer drugs319.
PFDN2, prefoldin subunit 2, is a gene that encodes a subunit of prefoldin, a molecular chaperone complex that binds and stabilizes newly synthesized polypeptides, thereby allowing them to fold correctly. NDUFS2 codes for a subunit of the mitochondrial complex I, i.e. the first multimeric enzymatic complex of the respiratory chain, and CYC1 encodes cytochrome c1. By integrative analysis of expression and genomic profiles in 17 HMCLs, we previously identified BCL2 overexpression in two HMCLs with DNA amplification of the same locus5, in concordance with the results obtained by Largo et al.265 and Carrasco et al.226. In the integrative analysis of DNA and RNA profiles presented here including both 17 HMCLs and 12 clinical samples we could identify BCL2 as amplified and overexpressed only with the matched filter approach and only in one case, the cell line CMA02 (data not shown). BCL2 amplification in KMS12 was not detected due to the new threshold used to delimit overexpression, calculated on the average expression value of 29 MM cases. However, our recent observation in HMCLs5, together with Largo’s and Carrasco’s ones, suggests that BCL2 gain/amplification and overexpression may play a role in myelomagenesis.
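The matched filter selection discussed above can be illustrated with a minimal sketch. It only shows the pairing logic (DNA state matched with RNA state); the copy number cut-offs, the fold factor applied to the cohort mean, and all names in the code are hypothetical placeholders and not the values or the implementation used in this work.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-offs, chosen only for illustration.
CN_LOSS, CN_GAIN, CN_AMP = 1.5, 2.5, 4.0
FOLD = 2.0  # assumed fold change relative to the cohort mean expression


@dataclass
class Transcript:
    gene: str
    copy_number: float  # DNA CN estimate at the transcript locus, one sample
    expression: float   # expression value of the transcript, same sample


def dna_state(cn: float) -> str:
    """Classify the DNA copy number of one locus in one sample."""
    if cn >= CN_AMP:
        return "amplified"
    if cn >= CN_GAIN:
        return "gained"
    if cn <= CN_LOSS:
        return "lost"
    return "normal"


def rna_state(expr: float, cohort_mean: float) -> str:
    """Classify expression relative to the mean across all samples."""
    if expr >= cohort_mean * FOLD:
        return "overexpressed"
    if expr <= cohort_mean / FOLD:
        return "downregulated"
    return "unchanged"


def matched_filter(t: Transcript, cohort_mean: float) -> Optional[str]:
    """Return the matched DNA/RNA class, or None if the two levels disagree."""
    pair = (dna_state(t.copy_number), rna_state(t.expression, cohort_mean))
    classes = {
        ("amplified", "overexpressed"): "amplified and overexpressed",
        ("gained", "overexpressed"): "gained and overexpressed",
        ("lost", "downregulated"): "lost and downregulated",
    }
    return classes.get(pair)
```

Because the expression class depends on a cohort-wide reference, changing the set of samples used to compute that reference can move a transcript in or out of the selection, which is the effect described above for BCL2 in KMS12.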

Besides the aforementioned genes shared with previous publications, other amplified and overexpressed transcripts selected by our matched filter analysis on DNA and RNA profiles have been linked to cancer and may be of interest in myeloma. PBX1 (pre-B-cell leukemia homeobox 1) is a target of chromosomal translocations in acute leukemias320. In a work on the molecular classification of more than 400 MM patients based on unsupervised hierarchical clustering of gene expression profiles, the B-cell oncogene PBX1 was overexpressed in the MS group, represented by cases with t(4;14) and thus activation of FGFR3 and MMSET262. DDEF1, which codes for an ADP ribosylation factor-GTPase activating protein, has been reported to be amplified and overexpressed in uveal melanoma321. CUL4A, a member of the cullin protein family that forms part of a ubiquitin-protein ligase E3 complex, is amplified and overexpressed in hepatocarcinoma322. The gene coding for the methylated DNA-binding protein 2 (MBD2) seemed to act as a transcriptional repressor of methylated genes, and treatment with MBD2 anti-sense oligonucleotides inhibited the growth of tumor xenografts in mice323. Even though our main interest was the identification of candidate oncogenes among amplified and overexpressed genes, it is worth considering some gained and overexpressed genes selected by the matched filter approach as well. These include genes of known pathogenetic relevance in MM, such as the previously mentioned MCL1; the hepatocyte growth factor (HGF), a potent MM growth and survival factor324; the interleukin 6 receptor (IL6R), which in soluble form can also function as an agonist325; and MAF, known to be activated in MM by the t(14;16) that causes its overexpression. The fact that we detected MAF among genes with increased expression and simultaneously increased DNA level could suggest gene dosage alteration as an additional mechanism deregulating it.
In the previous work on HMCLs we reported that the expression of MAF was not restricted to cell lines with t(4;14)5. The same was also observed in a very small fraction of MM patients overexpressing MAF without any evidence of the t(14;16)276. We also detected ITGB7, which is a target of MAF previously found overexpressed together with MAF in MM samples262. Moreover, we identified genes linked to cell cycle and proliferation, such as DNA mismatch repair genes MCM4 and MCM7, previously found upregulated in PCL vs MGUS samples264, and ASK, already cited as a candidate oncogene by Carrasco et al.226 and overexpressed in MGUS and MM samples with respect to normal PCs326. Of interest is the fact that developmental processes were quite common among the over-represented Panther Biological Processes revealed by functional analysis of gained and overexpressed transcripts. This is concordant with previous observations made by Davies et al.260. They analyzed the multi-step malignant transformation from MGUS to MM by comparing the gene expression profiles of healthy PCs vs MGUS vs MM, and found that developmental pathways were involved in malignant transformation. It is known that developmental pathways are typical of stem cells, and the expression of developmental programs could be linked to the presence of MM stem cells194. We found the involvement in MM samples of NES, the gene coding for NESTIN. NESTIN is a cytoskeletal intermediate filament protein. It is considered a marker of stem and progenitor cells in adult tissues, which are characterized by proliferation, differentiation and migration capacity327. Increased expression in different tumors was reported, sometimes correlated with advanced disease328. In our panel of MM samples, NES was gained and overexpressed according to both matched filter and COPA 90th analysis and was previously presented as a strong candidate oncogene in MM cases with 1q gain (supplementary table S2 of ref. 226). Additionally, functional analysis highlighted the involvement of the vitamin D receptor (VDR) in the control of gene expression. VDR acts as a ligand-activated transcription factor. In a recent aCGH study of HIV-related B-cell lymphoma4 (manuscript in preparation), we identified in HIV-PEL a recurrent gain affecting the locus of VDR. From the literature it is known that PEL gene expression resembles that of malignant PCs329,330. Similar to malignant PCs, PEL samples exhibit increased expression and protein level of the vitamin D receptor and are sensitive to a vitamin D analogue drug329. In MM we observed VDR overexpression associated with underlying gain only in three samples (data not shown). The involvement of the VDR pathway was revealed by functional annotation of all genes selected by matching DNA gain and RNA overexpression. In general, the overlap between our candidate oncogenes and three recently published lists of overexpressed and gained/amplified genes125,226,265 was not negligible if we consider the biological variability affecting small case series and the technical aspects that may account for large differences among similar approaches. The microarrays used in our work and in the published ones are different, and discrepancies in genomic profiling are to be expected when different techniques are used331.
Interestingly, great overlap was detected for genes relevant for protein synthesis, such as POLR3C, RBM8A and MRPL13. The overall involvement of mRNA transcription regulation and protein synthesis in the transcripts we selected by integrative analysis was confirmed by the functional annotation analysis. The latter revealed that mRNA transcription regulation was the most deregulated biological process, being the most over-represented among amplified and overexpressed, gained and overexpressed, and deleted and downregulated transcripts. The COPA-based integrative analysis enabled the selection of genes with marked overexpression with respect to the median expression level and characterized by recurrent DNA gain. The more the expression level is increased, the higher the COPA score assigned to the gene. Among both COPA 75th and COPA 90th outliers with high COPA ranking there were genes coding for calcium-binding proteins of the S100 family and BAGE, which codes for the B melanoma antigen and belongs to the cancer germline antigen genes, also called cancer/testis antigens (CT antigens). The latter are normally expressed only in the human germline, but are also found in various tumor types332. They encode tumor-specific antigenic peptides that can be recognized by cytotoxic T cells, thus clearly representing an interesting target for cancer immunotherapy. Moreover, CT antigens expressed in tumors could have a role in the similarities observed between germline and cancer development, influencing processes like signaling, transcription, translation and chromosomal recombination333. Their aberrant expression in cancer cells is probably due to hypomethylation, and might lead to abnormal chromosome segregation and aneuploidy, but experimental data are still needed to demonstrate it332. CT antigens include, among others, genes of the MAGE (melanoma antigen), BAGE (B melanoma antigen), and GAGE (G antigen) families. Gene expression profiling (GEP) of MGUS, MM, and PCL samples revealed overexpression of GAGE and MAGE in some MM cases characterized by the absence of IgH translocation or locus amplification and by a transcriptional profile similar to PCL, therefore associated with a more aggressive disease264. A similar correlation between MAGE-type genes and MM disease stage was already observed in MM patients334. Moreover, the proliferation subgroup detected by molecular classification of MM samples based on GEP presented overexpression of the tumor-specific antigen genes GAGE and MAGE, and was associated with disease progression and poor prognosis262. In our previous study on the molecular characterization of HMCLs we detected high expression of GAGE genes in six cell lines5. Here we show the overexpression of BAGE in a subset of MM samples and the recurrent gain of its locus. Although CT antigen expression is believed to be mainly driven by hypomethylation, DNA CN changes could be an additional mechanism deregulating their expression. The identification of marked overexpression of BAGE in a subset of samples suggests a possible pathogenic role for such proteins in MM and also confirms that they could be attractive therapeutic targets. The S100A cluster was recurrent among the candidate oncogenes selected with both the matched filter and the COPA-based integrative strategy, strongly suggesting an important role in MM, as previously discussed.
As expected, COPA outliers included genes known to be deregulated by chromosomal rearrangements in MM, such as CCND1, detected in both COPA 75th and COPA 90th analysis, MAF (COPA 75th) and MAFB (COPA 90th), apparently linked also to increased DNA levels. Additionally, in COPA 75th we detected some interesting candidate genes like FOS and FOSB, SYK, and the above discussed HGF and PBX1. FOSB was gained and overexpressed in both matched filter and COPA 75th analysis. FOS appeared among the upregulated genes in PCs of the MM twin with respect to normal PCs of the healthy twin261. FOS genes encode leucine zipper proteins that form transcription factor complexes and have been implicated in the regulation of cell proliferation, differentiation, and transformation. In a previous study with genomic and expression profiling of mantle cell lymphoma samples we identified amplification and overexpression of SYK in the cell line JEKO1, and low-level amplification with protein overexpression was demonstrated in a small subset of patients1. SYK is a tyrosine kinase involved in B-cell receptor signalling. Functional studies suggested SYK inhibition as a new therapeutic strategy to be explored in lymphomas and possibly in MM patients1. COPA 90th outliers included GNG11, previously found overexpressed in hyperdiploid samples262, and CX3CR1, a target of MAF transcription factors262.
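For readers unfamiliar with COPA, the following is a minimal sketch of the transformation as it is generally described in the literature: the expression values of each gene are centred on their median, scaled by the median absolute deviation (MAD), and the score is a chosen percentile (here the 75th or 90th) of the transformed values. The subsequent DNA-level filter is shown only schematically; the rule used here to call outlier samples and the function names are assumptions made for illustration, not the implementation used in this work.

```python
from statistics import median
from typing import List


def copa_transform(values: List[float]) -> List[float]:
    """Centre one gene's expression values on the median and scale by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the gene cannot be scaled")
    return [(v - med) / mad for v in values]


def percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between the two closest ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def copa_score(values: List[float], q: float = 90.0) -> float:
    """COPA score of one gene: q-th percentile of the transformed values."""
    return percentile(copa_transform(values), q)


def outliers_with_gain(values: List[float], gained: List[bool],
                       q: float = 90.0) -> List[int]:
    """Indices of samples at or above the q-th percentile that also carry a gain.

    Assumption for illustration: a sample counts as an outlier when its
    transformed value reaches the COPA score of the gene.
    """
    transformed = copa_transform(values)
    score = percentile(transformed, q)
    return [i for i, (t, g) in enumerate(zip(transformed, gained))
            if t >= score and g]
```

Scaling by the MAD rather than the standard deviation keeps a small group of strongly overexpressing samples from inflating the scale, so such samples stand out as outliers instead of being absorbed into the spread.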

Our attempt to identify tumor suppressor candidates with matched filter integrative analysis revealed three genes already cited as differentially expressed in MM samples: PDE4B was downregulated in MM vs MGUS263; the interferon-induced gene IFI27 was downregulated in the low bone disease subgroup262; and GADD45B, a member of the DNA damage-inducible gene family, was downregulated in MM and PCL vs MGUS264. GADD45 is a family of genes usually upregulated following DNA damage and involved in growth arrest. GADD45A is regulated by several key tumor suppressors and oncogenes that are implicated in controlling genomic stability (e.g. p53, ATM, MYC), and deletion of GADD45A leads to centrosome amplification and consequent genomic instability, characterized by abnormal mitosis and aneuploidy335. In conclusion, our integrated genomic approaches identified several deregulated genes in the context of genetic aberrations in MM samples: transcripts with a proven role in MM pathogenesis, but also genes targeted in other tumor types, known oncogenes not yet implicated in MM and candidates concordant across various data sets. Integrated copy number and expression analysis provided an efficient tool for the selection of potential cancer genes among the ones involved in CN changes. Candidate genes recurrent in various studies are more likely to play a causative role in the disease than other alterations, which are more probably due to random events. Moreover, the detection among our candidates of established MM genes and molecular targets currently under evaluation in clinical trials for the treatment of MM266 (like SLAMF7 (CS1), Tab. 11; NCAM1 (CD56), Tab. 9; IL6R, Tab. 9) is a proof of the efficacy of integrative genomic approaches in the identification of molecular and clinical strategies to target MM and its microenvironment.
We adopted two different strategies to combine genome-wide DNA and RNA profiles, and the limited overlap of the results confirmed their complementarity in the identification of relevant, candidate genes. In general, our findings are consistent with the genomic complexity and widespread molecular changes associated with MM. However, it has to be considered that alterations of DNA CN are a common cause but not the only cause of deregulated gene expression in a cancer cell. Genetic changes such as balanced translocations, point mutations or epigenetic modifications fail to be detected by aCGH approaches but are as important as CN changes in generating abnormal gene expression profiles or abnormal protein functions. Further studies involving large series of MM samples are needed to elucidate whether our selected transcripts can be considered relevant candidates in the pathogenesis of the disease, and whether they may contribute toward the identification of novel therapeutic targets. Moreover, it is necessary to validate the observed expression changes at the protein level, and functional studies are needed to reveal the real role of candidate oncogenes.

4 MOLECULAR AND FUNCTIONAL CHARACTERIZATION OF A RECURRENT DNA AMPLIFICATION AT 11q23.1

Abstract

During the observation of the genome-wide DNA profiles of a series of B-cell tumors collected at that time by means of 10K SNP arrays, including the previously mentioned MM (chapter 3), mantle cell lymphoma (MCL) and diffuse large B-cell lymphoma (DLBCL) samples, we detected a recurrent amplification at 11q23.1. Three 10K profiles with a similar 11q23.1 amplification were identified, belonging to the cell lines JJN3 (MM), KARPAS422 (DLBCL) and JEKO1 (MCL). Amplifications are one of the mechanisms leading to oncogene activation and tumor development, and amplified genes can be exploited as prognostic markers and therapeutic targets. Clearly, the recurrent 11q23.1 amplification represented an interesting finding to be further characterized from a molecular and functional point of view. The updated 250K SNP arrays were used to better define the amplified region, and we confirmed an overlapping amplicon at 11q23.1 in JJN3 and KARPAS422, whereas JEKO1 showed a DNA gain flanked by amplified regions. A fourth cell line, U2932 (DLBCL), analyzed with 250K arrays exhibited an amplification similar to JJN3 and KARPAS422. The minimal common region of amplification was delimited by the amplicon of JJN3, and covered 330 kb. 11q23.1 amplification was validated by qPCR, and fluorescence in situ hybridization (FISH) analyses performed on the four cell lines using BAC clones overlapping the amplified region showed four different amplification patterns of complex chromosome rearrangements. To identify the amplification target(s), we analyzed the transcriptome of the cell lines by RT-PCR, looking for the overexpression of known and predicted transcripts and microRNAs (mir-34b and mir-34c). The transcription level was apparently not influenced by the underlying DNA amplification.
Transcriptome mapping with tiling arrays revealed increased transcription activity only in correspondence with the genes PPP2R1B (JJN3 and KARPAS422), POU2AF1 (KARPAS422), and SNF1LK2 (KARPAS422), and excluded the presence of transcription activity outside annotated transcripts. The putative amplification targets POU2AF1 and PPP2R1B were selected for further characterization by loss-of-function studies. Functional evaluation of their biological effect was not possible, due to the inability to reach satisfactory mRNA knockdown. Our attempt to identify the target(s) of the 11q23.1 amplification did not detect strong candidates. We proposed the existence of novel transcription activity overlapping annotated genes. In particular, we hypothesized that the putative tumor suppressor gene PPP2R1B could be targeted by endogenous cis-antisense transcripts, altered by the 11q23.1 amplification and possibly causing its deregulation. In addition, among alternative explanations for the 11q23.1 amplification, there is the relative fragility of the region causing chromosome rearrangements, as confirmed by FISH results.
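As an illustration of how a minimal common region such as the one mentioned in this abstract can be derived, the sketch below intersects the amplicon intervals of several cell lines. The coordinates are invented placeholders in kb on an arbitrary scale (chosen only so that the shortest interval spans 330 kb, as stated above) and are not the measured breakpoints; the function name is likewise hypothetical.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) in kb, arbitrary origin


def minimal_common_region(amplicons: Dict[str, Interval]) -> Optional[Interval]:
    """Return the interval shared by all amplicons, or None if they do not overlap."""
    if not amplicons:
        return None
    start = max(s for s, _ in amplicons.values())  # right-most start
    end = min(e for _, e in amplicons.values())    # left-most end
    return (start, end) if start < end else None


# Invented placeholder coordinates, for illustration only.
example = {
    "JJN3": (400, 730),
    "KARPAS422": (0, 1100),
    "U2932": (200, 900),
}

mcr = minimal_common_region(example)
```

When one amplicon lies entirely within the others, as in this toy example, the minimal common region coincides with that amplicon.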

4.1 Introduction

Amplifications are considered a quite common mechanism by which oncogenes are activated48,57,336. Moreover, amplified genes can represent important prognostic markers63,64 and therapeutic targets65,337. Our interest in the 11q23.1 region derived from the observation of the genome-wide DNA profiles of a series of B-cell tumors analyzed by aCGH with Affymetrix Human Mapping 10K arrays. A screening of the available 10K genomic profiles comprising clinical cases and cell lines, and including MM (chapter 3), mantle cell lymphoma (MCL)1 and diffuse large B-cell lymphoma (DLBCL)2 samples, led to the identification of three cell lines with a similar 11q23.1 amplification: JJN3 (MM), KARPAS422 (DLBCL) and JEKO1 (MCL). 11q23 is frequently associated with chromosome aberrations in hematological malignancies300. Aberrations of the long arm of chromosome 11, such as translocations and deletions, are very common in lymphoproliferative disorders (LPDs). Among the genes that reside at translocation breakpoints, the MLL gene (11q23.3) is frequently involved in acute leukemias301, and has also been described in a few cases of NHL338. Various chromosome breakpoint sites indicative of translocations, deletions or additional chromosome material are described on 11q22-q23.1 and 11q25302,339. 11q23.1 is frequently deleted in hematological malignancies340. The tumor suppressor gene ATM is involved in chronic lymphocytic leukaemia (CLL)341 and MCL342,343 cases with 11q deletions, but more genes of this critical region are probably important344. A novel translocation, t(11;13)(q23;q12), has been identified at 11q23.1 in a patient with CLL303. The breakpoint is located between the genes BTG4 and POU2AF1, making them interesting as candidate genes. In another translocation, t(X;11)(q13;q23), the breakpoints have been cloned345: the gene ARHGAP20 is affected at 11q23 and the gene BRWD3 at Xq13, both being disrupted by the translocation.
Quantitative expression analysis in a panel of 22 CLL samples has revealed significant ARHGAP20 upregulation. As previously discussed, chromosomal translocations involving the 11q23 region and leading to MLL deregulation are described in both acute lymphoblastic leukaemia (ALL) and acute myeloid leukaemia (AML). Besides structural aberrations, 11q23 is amplified at the genomic level in patient samples with AML, as detected using MLL-specific FISH probes346. Some genes located at 11q23-q25 have been reported as amplified candidate genes in hematologic malignancies, such as ETS1347, FLI1, DDX6, SRPK, NFRKB, and KCNJ5348-350. MLL has been identified by means of expression analyses as the prominent target gene of the recurrent 11q23 amplicon in AML and myelodysplastic syndrome (MDS)349. Moreover, complex patterns of co-amplification with MLL have been detected at 11q13.5 and 11q23-24 in AML and MDS cases, thus increasing the number of candidate target genes350.

Aberrations of chromosome 11 are frequently found in MM patients, in particular at 11q23, which is frequently gained as detected by FISH240,351 and CGH352 analyses. This supports a putative role of genes located on 11q23 for MM pathogenesis. It is well known that fragile sites are associated with chromosomal instability and amplification mechanisms. Two fragile sites are described at 11q23.3: FRA11G, which is common (i.e. being probably a constitutional feature of all individuals), and FRA11B, which is rare (i.e. occurring in less than 5% of all individuals)353.

4.2 Aim

Our aim was to understand the biological role of 11q23.1 instability in B-cell tumors, and to identify candidate genes targeted by the recurrent amplification detected in the 10K genomic profiles of JJN3, KARPAS422 and JEKO1. As soon as the updated Affymetrix Human Mapping 250K array was available, we used it to confirm and better characterize the recurrent amplification first identified with the 10K array. Moreover, we performed quantitative real-time PCR to validate the 11q23.1 amplification. In order to identify the amplification target(s) we analyzed the transcriptome of the cell lines. Our hypothesis was that the putative target(s) would be found among the genes mapping in the amplified region as overexpressed transcript(s). We initially performed reverse transcription (RT)-PCR, and then used tiling arrays for high-resolution mapping of the transcriptome, looking for all possible transcripts. Finally, we decided to test the expression of some candidate genes identified through the tiling array profiles. We therefore performed quantitative real-time RT-PCR (qRT-PCR) in a larger panel of B-cell tumors (cell lines and patients). By analysing the results of qRT-PCR we noticed a DLBCL cell line, U2932, with an expression pattern similar to KARPAS422. From the literature we learned that U2932 is affected by a chromosomal rearrangement of 11q21-q23, linked to DNA amplification354. We therefore obtained the genome-wide DNA profile of U2932 with the Affymetrix Human Mapping 250K array and we validated the 11q23.1 amplification by qPCR. Additionally, we investigated the patterns of DNA amplification by FISH in the cell lines JJN3, KARPAS422, JEKO1, and U2932. On the basis of the qRT-PCR results we selected two putative amplification targets, POU2AF1 and PPP2R1B, and we tried to characterize them from a functional point of view by loss-of-function studies with RNA interference.
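To make the qPCR validation step more concrete, the sketch below shows one widely used way of converting Ct values into a relative DNA copy number, the comparative Ct (2^-ddCt) calculation. This is an assumption made for illustration: the quantification procedure actually applied in this work is not described in this section, and the function name, the diploid calibrator and the assumed 100% amplification efficiency are all hypothetical choices.

```python
def relative_copy_number(ct_target_sample: float,
                         ct_reference_sample: float,
                         ct_target_calibrator: float,
                         ct_reference_calibrator: float,
                         calibrator_copies: float = 2.0) -> float:
    """Estimate the copy number of a target locus by the comparative Ct method.

    Assumptions (illustrative only): both assays amplify with ~100% efficiency,
    the reference locus is present in two copies in sample and calibrator, and
    the calibrator DNA carries `calibrator_copies` copies of the target.
    """
    # Normalize the target to the reference locus within each DNA.
    dct_sample = ct_target_sample - ct_reference_sample
    dct_calibrator = ct_target_calibrator - ct_reference_calibrator
    # Compare the sample with the calibrator.
    ddct = dct_sample - dct_calibrator
    # Each cycle of difference corresponds to a two-fold change in template.
    return calibrator_copies * 2.0 ** (-ddct)
```

With these assumptions, a target that crosses the threshold three cycles earlier than in the calibrator (after normalization) corresponds to an eight-fold increase in template, i.e. about 16 copies for a diploid calibrator.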

4.3 Materials and Methods

Cell lines

JJN3 was kindly provided by A. Neri (Fondazione IRCCS, Policlinico and Dipartimento Scienze Mediche, University of Milan, Italy); KARPAS422 by J. A. Martínez Climent (Department of Hematology and Medical Oncology, Clinical Hospital, University of Valencia, Spain); U2932 by R. E. Davies (U.S. National Cancer Institute, Washington, DC, USA); and JEKO1 was obtained from the DSMZ (German Collection of Microorganisms and Cell Cultures, Germany; http://www.dsmz.de). The MCL and DLBCL cell lines were maintained in RPMI-1640 medium (GIBCO) supplemented with 10% (JEKO1, U2932) or 20% (KARPAS422) fetal bovine serum (GIBCO); JJN3 was cultured in IMDM (GIBCO) supplemented with 10% fetal bovine serum. The cell lines were incubated at 37°C in a humidified atmosphere with 5% CO2. The cells grew in suspension, usually forming clumps, with the exception of JEKO1, which was mainly found as single cells or very small clumps. Table 15 reports the publicly available cytogenetic data of the cell lines JJN3, KARPAS422, U2932, and JEKO1 registered at the DSMZ and also presented in the “Guide to Leukemia-Lymphoma Cell Lines”355.

Tab. 15: Cytogenetic data of the cell lines JJN3, KARPAS422, U2932, and JEKO1 registered at the “German Collection of Microorganisms and Cell Cultures” (http://www.dsmz.de) and presented in the “Guide to Leukemia-Lymphoma Cell Lines”355. Aberrations involving chromosome 11 are highlighted in bold.

JJN3: flat-moded hypotriploid karyotype with 9% polyploidy; 58-67<3n>XX. +1, -2, +5, +8, +8, -9, -11, -12, -13, -15, -17, +20, add(1)(p22), der(1)t(1;?3)(q41;p26)x2, add(3)(p26), add(5)(p15)x1-2, i(5p), del(6)(q13), del(7)(q32), der(7)t(7;11)(q36;q13), add(8)(p1?)x2, t(12;19)(p13;q13), der(14)add(14)(p11)t(14;16)(q32;q23), der(14)t(14;16)(q32;q23), der(16)t(14;16)(q32;q23)x2, del(22)(q12)

KARPAS422: Hyperdiploid karyotype, 10% polyploidy; 47(44-48)<2n>XX. +14, t(2;10)(p23;q22), t(4;11)(q21;q24), t(4;16)(q21;p13), der(14)t(14;18)(q32;q21)x2. Molecular karyotype (FISH): t(4;16;10)(q21;p13;q23), no t(4;11)(q21;q23), t(14;18)(q32;q21), +der(14)t(14;18)(q32;q21)

U2932: 45, X, -X. der(1)del(1)(q23)dup(1)(q12q23), del(2)(q11), der(3)ins(3;18)(q2?7;q21)-?hsr(18)(q21), der(6)t(6;18)(q16;q?), der(7)t(7;2;15)(q22;q24q34;q22), t(9;19)(q34;q13.1), der(10)t(7;10)(q22;21), der(11)t(1;11)(q12q23;q23)?hsr(11)(q21q23), del(13)(q12), der(14)t(14;3;18;3)(q32;q13.2q2?7;q21;q2?7)?hsr(18)(q21), der(18)t(1;18)(q23;q12), der(18)t(3;18)(q2?7;q23)?hsr(18)(q21), der(21)t(15;21)(q22;p11)

JEKO1: highly rearranged flat-moded hypertriploid; 70-78<3n>XXXX. +X, +1, +2, +2, +4, +6, +7, -8, +10, +11, -12, -12, +13, +14, +15, -17, +18, -19, -19, -22, +2-4mar, del(X)(p21)/add(X)(p11.2)x2, der(X)t(X;11)(q26-28;q13)t(11;?10)(q24;?p12), der(1)t(1;2)(p21;q22)x2, add(2)(q2?), del(2)(q11), add(3)(q27), der(5)t(5;12)(p15;p11-12)x2, add(6)(p21)x2, der(6;13)(p10;q10)x1-2, der(7)t(4;7)(q2?6;q32.1)x2, add(9)(p2?), der(9)t(9;14)(q34-35;q32-33)x2, der(10)t(10;11)(p12;?)x2, der(11)t(11;14)(q13;q32)x1-2, add(13)(p11), der(13)t(13;?)(p11;?)t(?;13)(?;q11), add(14)(p11), der(14)t(14;17)(p12;q11)t(14;6)(q24;?p11), der(14)t(11;14)(?;p11)t(11;?8)(?;?), del(16)(p11), add(17)(p11), ins(19;12)(q13.1;q13.2q2?4)x2

Genome-wide DNA profiles

Genomic profiles were obtained with the GeneChip Mapping 500K assay in conjunction with the GeneChip Human Mapping 250K Array NspI according to the manufacturer’s instructions (Affymetrix). This assay enables genotyping of more than 250,000 SNP loci on a single array. The following paragraphs describe the procedures for DNA target preparation, target hybridization, fluidics setup, array scanning and data analysis.

Genomic DNA preparation: isolation, quality control and quantification

Human genomic DNA was extracted using the QIAamp DNA Mini kit (Qiagen), following the manufacturer’s instructions. During the extraction protocol, RNA was digested with Ribonuclease A (USB Corporation, Ohio, USA) and DNA was eluted in DEPC-water. DNA quantification, quality control and preparation were done as previously explained (chapter 3.3, “Genome-wide DNA profiles”).

GeneChip Mapping 500K assay protocol

We followed the low-throughput protocol presented in Appendix C of the GeneChip Mapping 500K Assay Manual. Essentially, we performed the same procedures as previously explained for the GeneChip Mapping 10K 2.0 assay (chapter 3.3, “Genome-wide DNA profiles”). We report here only the updates. The initial digestion of total genomic DNA was performed with Nsp I (New England Biolabs) at 37°C for 120’ and 65°C for 20’. The ligation mix contained 1.5 μM Adaptor Nsp I, 1x T4 DNA Ligase buffer and 32 U/μl T4 DNA Ligase (New England Biolabs), and the ligation program was: 16°C for 180’ and 70°C for 20’. PCR amplification of ligated DNA fragments was performed in three replicates for each sample (four replicates in case of scarce yield). The PCR Master Mix was prepared on ice with the Titanium DNA Amplification Kit (Clontech Laboratories, Inc./Takara Bio Company, CA, USA) and contained 1x Clontech Titanium Taq PCR Buffer, 1 M GC-Melt, 350 μM each dNTP, 4.5 μM PCR Primer 002, and 1x Clontech Titanium Taq DNA Polymerase in a final volume of 100 μl with 10 μl diluted ligated DNA for each replicate. The PCR program, run on an MJR Thermal Cycler 200 with heated lid, was: initial denaturation at 94°C for 3’, followed by 30 cycles of 94°C for 30’’, 60°C for 30’’ and 68°C for 15’’, then finally 68°C for 7’. PCR products were purified with the DNA Amplification Clean-Up Kit (Clontech, Takara Bio Europe) using a QIAvac Multiwell unit (Qiagen). Briefly, a Clean-up plate was connected to a vacuum source, 8 μl 0.1 M EDTA were added to each PCR reaction, and all the PCR reactions for each sample were consolidated into one well of the plate. Under vacuum, the wells were dried and washed three times with water. The vacuum was released and the purified PCR products were eluted by adding 45 μl RB buffer to each well and shaking the plate at room temperature for 10’.
The purified PCR product was recovered by pipetting the eluate out of each well, and was quantified with a Nanodrop (NanoDrop Technologies). 90 μg of purified PCR product in a volume of 45 μl (RB buffer) were needed for the next fragmentation step. Fragmentation was performed in a final volume of 55 μl containing, besides 45 μl of purified PCR product, 1x Fragmentation Buffer and 0.25 U Fragmentation Reagent. Fragmentation was performed at 37°C for 35’, followed by a DNase inactivation step at 95°C for 15’. DNA fragments were then labelled with 0.857 mM GeneChip DNA Labelling Reagent, 1x TdT Buffer and 1.5 U/μl Terminal deoxynucleotidyl transferase (TdT). The mix was prepared on ice and incubated at 37°C for 240’ and 95°C for 15’. The Hybridization Cocktail was prepared as previously explained, denatured at 99°C for 10’ in a heat block, cooled on ice for 10’’ and centrifuged at 2,000 rpm for 1’. After an incubation at 49°C for 1’, 200 μl of denatured Hybridization Mix were injected into the GeneChip Human Mapping 250K Array NspI, which was placed at 49°C in a GeneChip Hybridization Oven 640 (Affymetrix) at 60 rpm and incubated for 16-18 hours. Buffers and solutions for array hybridization, washing and staining were prepared according to Chapter 5 of the GeneChip Mapping 500K Assay Manual and, if recommended, were filtered with a Stericup vacuum-driven disposable filtration system (Millipore AG). To wash, stain, and scan a probe array, we first registered the experiment in GCOS. Briefly, each experiment was assigned a name, the probe array type used (Mapping250K_Nsp), the name and other specifications of the sample, and the name of the project. Washing and staining were performed with the GeneChip Fluidics Station 450. The scripts for the fluidics station were downloaded from the Affymetrix web site at www.affymetrix.com/support/technical/fluidics_scripts.affx. We followed the fluidics protocol “Mapping500k1v2_450”. The arrays were finally scanned using the GeneChip Scanner 3000 7G. Library files “Mapping250K_Nsp” required to scan and analyze the 250K NspI arrays were downloaded from the Affymetrix web site at www.affymetrix.com/support/technical/libraryfilesmain.affx.
GCOS automatically generated cell intensity data (CEL files) from each array as previously explained.

Data analysis

Data acquisition was performed using GCOS v1.4 and the Genotyping Analysis Software (GTYPE) v4.1. CHP files with raw perfect match and mismatch intensities and genotype calls of every interrogated SNP on each array were generated by applying the GTYPE “Batch Analysis” tool to the raw intensity data (CEL files) of all the processed arrays. The GTYPE Batch Analysis calculation is based on the Dynamic Model (DM) mapping algorithm356. CHP files constitute the input to the CN analysis pipeline of the GTYPE Chromosome Copy Number Analysis Tool (CNAT) v4.01. CNAT4 implements a Hidden Markov Model (HMM)-based algorithm357. “CNAT4 Batch Analysis” was run and the following steps were performed. First, filtering was applied at the probe and SNP levels, considering only perfect match probes for CN analysis. Then, probe-level normalization of signal intensity was performed across the batch of arrays to reduce experimental noise. “Quantile normalization” was selected, as recommended for NspI 250K arrays and un-paired analysis. We followed the un-paired analysis mode, comparing tumor samples with a global reference set. The publicly available genomic profiles derived from 46 Caucasian healthy females from the HapMap project were used as the global reference set. The CEL intensity data of the 46 Caucasian females obtained with the 250K NspI array (CEU_sample identifier_NSP.CEL) were downloaded from http://www.hapmap.org/downloads/raw_data/affy500k, and the corresponding raw perfect match and mismatch intensities and genotype calls (CHP files) to be used as reference were calculated by applying the DM algorithm as previously explained. Then, for each SNP the relative raw CN was estimated, based on the log2 ratio of the normalized signals of a test sample over the global reference set. Additionally, a PCR normalization step was performed to correct for potential artefacts introduced by the PCR process.
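The log2 ratio underlying the raw CN estimate can be illustrated with a minimal sketch. It is not the CNAT4 implementation: the function names are hypothetical, and summarizing the reference set by its arithmetic mean is an assumption made for this example only.

```python
import math


def log2_ratio(test_signal, reference_signals):
    """Log2 ratio of one SNP's normalized test signal over the reference set.

    Assumption: the reference set is summarized by its arithmetic mean.
    """
    reference_mean = sum(reference_signals) / len(reference_signals)
    return math.log2(test_signal / reference_mean)


def expected_log2_ratio(copy_number, baseline=2):
    """Idealized log2 ratio for a pure sample with the given copy number."""
    return math.log2(copy_number / baseline)


# A SNP whose signal is twice the reference mean has a log2 ratio of 1.
print(log2_ratio(200.0, [100.0, 100.0]))
# Idealized values for one, three and four copies against a diploid baseline.
print([round(expected_log2_ratio(cn), 3) for cn in (1, 3, 4)])
```

Measured log2 ratios are usually compressed relative to these idealized values, which is one reason why thresholds are estimated from the data rather than taken from theory.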

Subsequently, Gaussian smoothing was applied to the raw log2 ratio data to improve the signal-to-noise ratio. We adopted the default genomic smoothing bandwidth of 100 kb (default log2 ratio). Finally, a five-state HMM (homozygous deletion CN=0; hemizygous deletion CN=1; normal diploid CN=2; single copy gain CN=3; amplification CN≥4) was applied for further smoothing and segmentation of the CN data, in order to determine the state with the maximum likelihood of being the true underlying CN state. Default settings were also adopted for HMM-based segmentation. As a result, a CNT file was created for each sample. CN profiles can be exported and visualized on genome browsers (e.g. Integrated Genome Browser, at http://www.affymetrix.com/support/developer/tools/download_igb.affx; UCSC Genome Browser358 at http://genome.ucsc.edu/).
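As a rough illustration of the smoothing step, the sketch below applies a Gaussian kernel along the genomic coordinate. It does not reproduce CNAT4: the function name is hypothetical, and treating the 100 kb bandwidth as the standard deviation of the kernel is an assumption, since the exact definition used by the software is not given here.

```python
import math


def gaussian_smooth(positions_bp, log2_ratios, sigma_bp=100_000):
    """Kernel-weighted average of log2 ratios around each SNP position.

    Assumption: sigma_bp plays the role of the "genomic smoothing bandwidth".
    Quadratic in the number of SNPs, so intended for small examples only.
    """
    smoothed = []
    for center in positions_bp:
        weights = [
            math.exp(-0.5 * ((pos - center) / sigma_bp) ** 2)
            for pos in positions_bp
        ]
        weighted_sum = sum(w * v for w, v in zip(weights, log2_ratios))
        smoothed.append(weighted_sum / sum(weights))
    return smoothed


# An isolated outlier is pulled towards its neighbours.
positions = [0, 50_000, 100_000]
print(gaussian_smooth(positions, [0.0, 1.0, 0.0]))
```

A larger `sigma_bp` suppresses more noise but also blurs sharp CN transitions, which is why breakpoints are better read from unsmoothed data.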

Estimation of DNA CN thresholds for cell lines

The definition of clear CN states, such as deletion (CN=1), baseline CN (CN=2), gain (CN=3) and amplification (CN≥4), from the genomic profile is problematic in cell lines because of their highly rearranged genomes. Cell lines do not have a defined ploidy status that is homogeneous across all chromosomes. This makes it impossible to determine a unique baseline copy number for all cell lines from which thresholds for deletion, gain and amplification of DNA material can be estimated. Therefore, thresholds delimiting the different DNA CN states were defined separately for each cell line. The open-source statistical program R was used to create histograms representing the genome-wide distribution of log2 ratio values (default genomic smoothing) of the cell lines JJN3 (Fig. 22), KARPAS422 (Fig. 23), U2932 (Fig. 24) and JEKO1 (Fig. 25). Thresholds were empirically estimated from the histograms. We considered each peak as representative of a different CN state and adjusted the thresholds accordingly. Whenever possible, thresholds were set at the minimum between two overlapping curves. We observed the distribution of copy number states in the histogram and hypothesized that the highest peak, i.e. the most frequent DNA CN level, corresponded to the baseline CN. The baseline CN is normally defined as two, but in non-diploid cell lines it can differ from two. The hypothesized baseline CN was confirmed by looking at the distribution of CN levels in the genomic profiles (Figs. 28-31, chapter 4.4.1). The most frequent DNA CN level corresponded to the baseline in the cell lines KARPAS422, U2932 and JEKO1. In contrast, JJN3 presented a histogram that was difficult to interpret, and the genomic profile (Fig. 28.1, chapter 4.4.1) revealed that the most frequent CN state in JJN3 was deletion and not baseline as expected. Once the baseline CN level is established, deletion and gain/amplification states are easily defined on the histogram as the preceding and the following peaks, respectively. Where different CN states were not delimited by clear borders, as observed for example in U2932 and JEKO1, thresholds were defined from the genomic profiles by visual interpretation of regions with different CN levels.
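The idea of placing a threshold at the minimum between two histogram peaks can be sketched in code. The thresholds in this work were estimated visually, so the following is only a hypothetical automated analogue; the function name, the toy counts and the requirement that the two peak bins are supplied by the user are all assumptions of the example.

```python
def valley_threshold(bin_centers, counts, peak_a, peak_b):
    """Return the center of the least populated bin between two peak bins.

    peak_a and peak_b are indices of the bins holding the two peaks.
    """
    low, high = sorted((peak_a, peak_b))
    if high - low < 2:
        raise ValueError("the peaks must be separated by at least one bin")
    interior = range(low + 1, high)
    valley = min(interior, key=lambda i: counts[i])
    return bin_centers[valley]


# Toy histogram of log2 ratios with peaks in bins 2 and 6.
centers = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
counts = [1, 5, 9, 4, 2, 6, 8, 3]
print(valley_threshold(centers, counts, 2, 6))
```

When two CN states overlap strongly, the valley may be shallow or absent, which corresponds to the cases where the thresholds had to be set from the genomic profiles instead.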

Figs. 22-25: Histograms of DNA copy number (CN) distribution. The x-axis shows the default log2 ratio obtained by the CNAT4 calculation with default smoothing on the 250K array hybridization data; the y-axis shows the frequency, i.e. the number of array probes with a defined CN state. Dashed vertical lines represent the CN thresholds (JJN3: -0.02, 0.23, 0.47; KARPAS422: -0.25, 0.13, 0.5; U2932: -0.18, 0.26, 0.65; JEKO1: -0.23, 0.27, 0.58). Fig. 22: JJN3; Fig. 23: KARPAS422; Fig. 24: U2932; Fig. 25: JEKO1.

Table 16 summarizes the cell line-specific, visually estimated thresholds. JJN3 had a baseline DNA CN in the range -0.02 ≤ log2 ratio ≤ 0.23, whereas log2 ratio < -0.02 was indicative of loss, log2 ratio > 0.23 of gain, and log2 ratio > 0.47 of amplification. KARPAS422 had a baseline DNA CN in the range -0.25 ≤ log2 ratio ≤ 0.13, with losses delimited by log2 ratio < -0.25, gains by log2 ratio > 0.13, and amplifications by log2 ratio > 0.5. U2932 showed baseline DNA CN at -0.18 ≤ log2 ratio ≤ 0.26, losses at log2 ratio < -0.18, gains at log2 ratio > 0.26, and amplifications at log2 ratio > 0.65. Finally, JEKO1 had baseline DNA CN at -0.23 ≤ log2 ratio ≤ 0.27, losses at log2 ratio < -0.23, gains at log2 ratio > 0.27, and amplifications at log2 ratio > 0.58.

Tab. 16: Estimated copy number thresholds on default log2 ratio for delimiting DNA loss, gain, and amplification in the cell lines JJN3, KARPAS422, U2932, and JEKO1.

Cell line    Loss (CN=1)          Gain (CN=3)         Amplification (CN≥4)
JJN3         log2 ratio < -0.02   log2 ratio > 0.23   log2 ratio > 0.47
KARPAS422    log2 ratio < -0.25   log2 ratio > 0.13   log2 ratio > 0.5
U2932        log2 ratio < -0.18   log2 ratio > 0.26   log2 ratio > 0.65
JEKO1        log2 ratio < -0.23   log2 ratio > 0.27   log2 ratio > 0.58
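The cell line-specific thresholds of Table 16 translate directly into a classification rule. The sketch below is illustrative only: the dictionary layout and the function name are hypothetical, while the numeric thresholds are those listed in the table.

```python
# (loss, gain, amplification) thresholds on the default log2 ratio, from Tab. 16.
THRESHOLDS = {
    "JJN3": (-0.02, 0.23, 0.47),
    "KARPAS422": (-0.25, 0.13, 0.50),
    "U2932": (-0.18, 0.26, 0.65),
    "JEKO1": (-0.23, 0.27, 0.58),
}


def classify_cn_state(cell_line, log2_ratio):
    """Assign a CN state to a log2 ratio using the cell line's own thresholds."""
    loss, gain, amplification = THRESHOLDS[cell_line]
    if log2_ratio > amplification:
        return "amplification"
    if log2_ratio > gain:
        return "gain"
    if log2_ratio < loss:
        return "loss"
    return "baseline"


# The same log2 ratio can fall into different states in different cell lines.
for name in THRESHOLDS:
    print(name, classify_cn_state(name, 0.25))
```

The amplification test comes first because every value above the amplification threshold also exceeds the gain threshold.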

Definition of amplification breakpoints upon raw genome-wide DNA profiles

According to the 250K genomic profiles (Fig. 32, chapter 4.4.1), the amplicon of the cell line JJN3 delimited the minimal common amplified region at 11q23.1. Therefore, the breakpoints of the minimal common amplified region were estimated from the raw CN profile of JJN3. This was obtained with Affymetrix CNAT 4.01 after disabling the Gaussian smoothing option (i.e. genomic smoothing was set to 0 kb). In this case the CN data are smoothed only by the HMM algorithm. To define the amplified region we analyzed the raw HMM median log2 ratio at 11q23.1 of the cell line JJN3.
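One way to read breakpoints off an unsmoothed profile is to look for contiguous runs of SNPs whose log2 ratio exceeds the amplification threshold. The exact procedure applied to the JJN3 profile is not detailed above, so the following is a hypothetical sketch: the function name and the toy data are assumptions, and the positions are expected to be sorted along the chromosome.

```python
def amplified_segments(positions_bp, log2_ratios, threshold):
    """Return (first, last) SNP positions of each run above the threshold.

    Assumes positions_bp is sorted in ascending genomic order.
    """
    segments = []
    start = None
    previous = None
    for position, value in zip(positions_bp, log2_ratios):
        if value > threshold:
            if start is None:
                start = position  # a new run begins here
            previous = position
        elif start is not None:
            segments.append((start, previous))  # the run ended at the last SNP
            start = None
    if start is not None:
        segments.append((start, previous))
    return segments


# Toy profile with two runs above the JJN3 amplification threshold (0.47).
positions = [1, 2, 3, 4, 5, 6]
ratios = [0.1, 0.6, 0.7, 0.2, 0.9, 0.8]
print(amplified_segments(positions, ratios, 0.47))
```

The reported boundaries are the outermost SNPs inside each run; the true breakpoint lies somewhere between such a SNP and its nearest neighbour outside the run, so the resolution is bounded by the local SNP spacing.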

Quantitative real-time PCR (qPCR)

qPCR was performed on genomic DNA of a healthy donor (normal DNA) and of the cell lines using the ABI PRISM 7000 Sequence Detection System and the SYBR GREEN PCR Master Mix (Applied Biosystems). Primers SYBR11 F and SYBR11 R were designed with Primer Express 2.0 software and were used to amplify the common region of amplification at 11q23.1 (Fig. 33, chapter 4.4.1; Suppl. Tab. 1). Primers LINE-1 F and LINE-1 R were used as endogenous control as previously cited118,284 (chapter 3.3, “qPCR”). All primers were synthesized by Sigma-Genosys (Steinheim, D). PCRs were performed as previously described (chapter 3.3, “qPCR”). All samples were analyzed in triplicate, together with a no-template control (NTC) containing all PCR reagents without any template DNA and a DNA sample from a healthy donor as normal reference. The standard curve method was applied to determine the 11q23.1 CN in DNA samples, normalized to the repetitive element LINE-1. Normal DNA was used as calibrator for relative quantification of 11q23.1 DNA. The normal DNA CN for the 11q23.1 region was set to two. Dissociation curve analysis was performed to verify the absence of unspecific amplification.
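The arithmetic of the standard curve method with a LINE-1 normalizer and a normal DNA calibrator can be sketched as follows. This is a generic formulation, not the instrument software: the function names are hypothetical, and the slope, intercept and Ct values in the usage lines are invented numbers chosen only to make the calculation easy to follow.

```python
def quantity_from_ct(ct, slope, intercept):
    """Invert a standard curve of the form Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def relative_copy_number(sample_cts, normal_cts, target_curve, control_curve,
                         normal_cn=2.0):
    """Copy number of the target region relative to normal DNA.

    sample_cts and normal_cts are (target Ct, LINE-1 Ct) pairs;
    target_curve and control_curve are (slope, intercept) pairs.
    """
    def normalized(cts):
        target_quantity = quantity_from_ct(cts[0], *target_curve)
        control_quantity = quantity_from_ct(cts[1], *control_curve)
        return target_quantity / control_quantity

    # The calibrator (normal DNA) is defined to carry normal_cn copies.
    return normal_cn * normalized(sample_cts) / normalized(normal_cts)


# Hypothetical curves and Ct values, for illustration only.
curve = (-3.0, 30.0)
print(relative_copy_number((21.0, 24.0), (24.0, 24.0), curve, curve))
```

Normalizing to LINE-1 corrects for differences in the amount of input DNA, and dividing by the normal DNA ratio expresses the result on the scale where a healthy donor carries two copies.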

Fluorescence in situ hybridization (FISH)

FISH analysis was performed at the Department of Pathology of the Ospedale di Circolo in Varese (University of Insubria, Varese, IT).

Cells were cultured in their respective growth medium at 37°C in 5% CO2. At a convenient growth phase, cells were exposed overnight to colcemid (0.09 μg/ml) (Celbio, Milan, Italy). Cells were harvested using hypotonic treatment and methanol/glacial acetic acid (3:1) fixation. Metaphase spreads were conventionally prepared and FISH experiments were carried out on metaphases as previously described359-361. The following probes were used: whole chromosome painting probes for chromosomes 4, 11 and 7 (WCP4, WCP11, WCP7), directly labelled with red and green fluorochromes; BAC clones RP11-794P6, RP11-759M17 and RP11-404A3 (http://www.biologia.uniba.it/rmc/), mapped at 11q23 (Fig. 33, chapter 4.4.1) and labelled with biotin using a nick translation kit (Roche). Hybridized metaphases were counterstained with 4’,6-diamidino-2-phenylindole dihydrochloride (DAPI) (Vysis, Downers Grove, IL, USA) and analysed with a Leica DMRA fluorescence microscope (Leica, Wetzlar, Germany) using both single and double filters. Digital images were collected with a cooled CCD camera (Joko-cho, Hamamatsu City, Japan) coupled with FISH software (TESIMAGING, Assago, Italy). Dual color FISH analysis was performed using specific chromosome painting probes directly labelled with green and red fluorochromes. Dual color FISH with BAC probes was performed using directly labelled WCP (green) for chromosome 11 and biotin-labelled, Cy3-revealed BAC clones.

miRNA expression analysis: mir-34b and mir-34c

MicroRNA expression was assessed with the mirVana qRT-PCR miRNA Detection Kit (Ambion Europe Ltd., Huntingdon, UK) starting from total RNA samples. Total RNA was extracted with TRIZOL reagent (Invitrogen). For miRNA applications it is important not to combine the TRIZOL protocol with RNA purification procedures on RNA-binding glass-fiber filters (e.g. Qiagen columns), to avoid loss of small RNA species (<200 nt). Briefly, 5-10 x 10^6 cells were harvested at exponential growth phase (~1 x 10^6 cells/ml). The cells were centrifuged to discard the growth medium and washed once with PBS 1x (GIBCO). The cell pellet was loosened manually and 1 ml TRIZOL reagent was added. Cell lysis was carried out at room temperature by careful mixing of the suspension. After 5’, 200 μl chloroform (Sigma-Aldrich) were added and the tube was extensively mixed by inverting it repeatedly for 5’. All the subsequent steps were performed on ice or in the refrigerated Eppendorf centrifuge 5810 R. Aqueous and organic phases were separated by centrifugation at 14,000 rpm, 4°C for 15’. The clear and colourless aqueous phase was then collected and mixed with 500 μl isopropanol (Merck) by inverting the tube. An incubation at -80°C for 30’ was performed to facilitate the formation of the RNA pellet. Another centrifugation step at 14,000 rpm, 4°C for 15’ was run and the supernatant was carefully removed before washing the RNA pellet with 600 μl 70% ethanol (Merck AG, Zug, Switzerland) in DEPC-water. Finally, a 10’ centrifugation at 13,000 rpm, 4°C was performed, the ethanol was completely removed and the RNA was left at room temperature to let the pellet dry. The RNA pellet was dissolved in 30-50 μl DEPC-water and stored at -80°C. RNA concentration and quality were assessed by spectrophotometric measurements with a Nanodrop (NanoDrop Technologies). The RNA concentration was adjusted with DEPC-water to 50 ng/μl, and 50 ng of total RNA were used per reaction. The mirVana kit preferentially detects mature miRNAs, which are small (19-23 nt), endogenous, single-stranded RNA molecules. Due to the particular patented design of the Ambion primers, the expected size of the amplicon is approximately 90 bp.
The mirVana qRT-PCR miRNA Detection Kit is intended to be used with mirVana qRT-PCR Primer Sets specific for the miRNA of interest, each containing a primer for RT and an optimized PCR primer pair. We used both the mirVana qRT-PCR hsa-miR-34b and hsa-miR-34c primer sets, whereas the hsa-miR-24 primer set was used to amplify the widely expressed miR-24 as a positive control for miRNA expression in our RNA samples and, combined with the human Heart total RNA supplied with the kit, as a positive control of assay performance. We tested our RNA samples for small RNA species with the hsa-miR-24 primer set and could confirm the compatibility of the TRIZOL RNA extraction protocol with miRNA investigations (Fig. 35, chapter 4.4.2). We performed a qualitative analysis of miRNA expression with the mirVana qRT-PCR miRNA Detection Kit following the manufacturer’s protocol for end-point PCR analysis. After dilution (1:10) of the mirVana RT primers stock solution (10x) with the Nuclease-free water provided with the kit, an RT master mix (mmix) was prepared. All reagents (except the ArrayScript Enzyme Mix) were initially mixed by vortexing and kept on ice. A separate RT mmix was prepared for each miRNA-specific RT primer. The final composition of the RT reaction included mirVana RT Buffer (1x), 1 μl 1x mirVana RT primers, 0.4 μl ArrayScript Enzyme Mix and 50 ng sample RNA or nuclease-free water for the NTC sample. Nuclease-free water was added to a final volume of 10 μl. The RT reaction mix was incubated for 30’ at 37°C, followed by an enzyme inactivation step at 95°C for 10’ in an MJR Thermal Cycler 200 with heated lid. After RT, samples were transferred to ice and the PCR reaction was immediately set up. Following a brief centrifugation of PCR buffer and primers, we assembled the PCR mmix on ice. A final volume of 15 μl containing mirVana PCR Buffer (1x), 1 U Super Taq (HT Biotechnology Ltd., Cambridge, UK), 0.5 μl mirVana PCR primers and Nuclease-free water was added to each RT product. The PCR reaction mix was placed in a pre-heated MJR Thermal Cycler 200 with heated lid, and the program was started with an initial denaturation at 95°C for 3’, followed by 40 cycles of 95°C for 15’’ and 60°C for 30’’, and a final extension at 72°C for 7’. PCR products were analyzed by gel electrophoresis on a 3.5% agarose gel prepared and run in TBE buffer. After electrophoresis, miRNA amplicons were stained with an aqueous solution of ethidium bromide (0.5 μg/ml) and visualized using the AlphaImager 3400 (AlphaInnotech Corporation). Discrete bands of about 90 bp were expected, which can be easily distinguished from the smaller primer dimer bands seen in the NTC reaction. As positive controls we used the prostate cancer cell line DU145, which expresses both mir-34b and mir-34c, and the human Heart total RNA provided with the kit.

Transcriptome mapping with tiling arrays

We performed transcriptome mapping with the GeneChip Human Tiling 2.0R F arrays (Affymetrix), which cover chromosomes 8, 11, and 12. Although 2.0R tiling arrays are intended for chromatin immunoprecipitation (ChIP) experiments, we chose this array type for transcriptome mapping on chromosome 11 because individual arrays from the set can be purchased separately. The samples were prepared following the GeneChip Whole Transcript (WT) double-stranded (ds) target assay (Affymetrix), which is designed to be used with the 1.0R tiling array set (Affymetrix). The GeneChip WT ds target assay generates ds labelled DNA targets from the entire expressed genome. Random-primed cDNA is generated from total RNA, and since we were using only Human Tiling 2.0R F arrays and not the entire array set, we followed the un-amplified protocol. Because these arrays carry perfect match probes only, and thus provide no information about the noise level usually given by mismatch probes, we performed technical triplicates (i.e. hybridization triplicates) for every sample.

GeneChip WT ds target assay

Total RNA of cell lines JJN3, KARPAS422 and JEKO1 was extracted with TRIZOL reagent (Invitrogen) as previously described (“miRNA expression analysis: mir-34b and mir-34c”). The quantity and quality of RNA were assessed with a Nanodrop spectrophotometer (NanoDrop Technologies Inc.). An RNA pool derived from human total RNA of lymph node, thymus and spleen from healthy donors (Ambion) was used as reference and calibrator sample (CTR RNA pool).

RNA from the CTR pool and cell lines was processed in triplicate following the GeneChip WT ds target assay manual (Affymetrix), starting three times with 7 μg of the same total RNA (technical replicates). Here we describe the protocol we followed to produce the amount of labelled target necessary for hybridization to a single array (Fig. 26) (chapters 3-6 and Appendix B of the GeneChip WT ds target assay manual).

Starting material: 7 μg total RNA (3x replicates each of JJN3, KARPAS422 and JEKO1 vs 3x replicates of the CTR RNA pool: lymph node, spleen, thymus RNA from healthy donors)
→ 1st & 2nd strand cDNA synthesis
→ 7.5 μg dsDNA
→ fragmentation and labeling
→ fragmented, labeled DNA
→ hybridization, wash & stain
→ Human Tiling 2.0R F array (covering chromosomes 8, 11, 12)

Fig. 26: Flow diagram of the GeneChip WT ds Target Assay (Affymetrix). (see text for explanation)

Target preparation without amplification

The protocol started with the first-strand cDNA synthesis, which required the GeneChip WT ds cDNA synthesis kit (Affymetrix). The total RNA/random primer mix was prepared on ice by adding 7 μg total RNA, 3 μg random primers and RNase-free water to a final volume of 8 μl. The mix was incubated in an MJR Thermal Cycler 200 with heated lid using the following program: 70°C for 5’, 25°C for 5’, and finally 4°C for 2-10’. In the meantime, the First-Strand cDNA Synthesis Master Mix (mmix) was prepared on ice. 12 μl of it were added to the RNA/random primer mix for a total reaction volume of 20 μl, containing 7 μg total RNA, 3 μg random primers, 1x 1st Strand Buffer, 0.01 M DTT, 0.5 mM dNTP+dUTP, 1 μl RNase Inhibitor, and 4 μl SuperScript II. The reaction mix was incubated in an MJR Thermal Cycler 200 with heated lid and the following protocol was run: 25°C for 10’, 42°C for 90’, 70°C for 10’, and 4°C for 2-10’. Finally, 20 μl of the Second-Strand cDNA Synthesis mmix prepared on ice were added to the First-Strand cDNA reaction. The Second-Strand cDNA Synthesis mix contained all the previous components plus 3.5 mM MgCl2, 0.25 mM dNTP+dUTP, 1.2 μl DNA Polymerase I, 0.5 μl RNase H and RNase-free water to a final volume of 40 μl. The Second-Strand cDNA synthesis reaction was performed in an MJR Thermal Cycler 200 with heated lid with the program: 16°C for 120’, 75°C for 10’, and final cooling to 4°C, before storing the product at -20°C or proceeding to the ds DNA purification step.

ds DNA cleanup required the GeneChip Sample Cleanup Module. We performed the cleanup procedure with the Eppendorf centrifuge 5415D. First, the ds DNA was transferred to a new 1.5 ml tube. If a precipitate was visible, it was resuspended by pipetting. 60 μl of RNase-free water were added to each sample, together with 370 μl of cDNA binding buffer. The entire sample solution was loaded onto a cDNA cleanup spin column in a 2 ml collection tube. After a spin at 8,000 rcf for 1’ the flow-through was discarded and the column was transferred to a new collection tube. 750 μl of cDNA wash buffer, previously diluted with 100% ethanol for molecular biology (Merck) according to the instructions, were added to the column, which was again spun at 8,000 rcf for 1’, discarding the flow-through. The column was then centrifuged without cap at 16,100 rcf (maximum speed) for 5’ and subsequently transferred to a new 1.5 ml collection tube. 15 μl of the cDNA elution buffer were added directly to the column membrane. After 1’ incubation at room temperature the column was centrifuged at 16,100 rcf for 1’. The yield of purified ds DNA was determined by spectrophotometric measurement with a Nanodrop (NanoDrop Technologies, Wilmington, DE, USA). The expected yield was ≥ 7.5 μg ds DNA. Fragmentation and labelling of fragmented ds DNA required the GeneChip WT ds DNA terminal labelling kit (Affymetrix). The fragmentation mmix was prepared on ice and added to 7.5 μg purified ds DNA. The reaction mix contained 1x Fragmentation Buffer, 15 U UDG, 225 U APE1 and RNase-free water up to 48 μl. The following incubation was performed in an MJR Thermal Cycler 200 with heated lid: 37°C for 60’, 93°C for 2’, and final cooling to 4°C for 2-10’. 45 μl of fragmentation product were transferred to a new tube and stored at -20°C for the next step. The rest was reserved for fragmentation analysis with the Bioanalyzer 2100 (Agilent Technologies Inc., Palo Alto, CA, USA). 1 μl of fragmented DNA was loaded on an RNA 6000 Nano LabChip (Agilent Technologies Inc.) according to the manufacturer’s instructions. The expected result of the Bioanalyzer analysis is presented in figure 27.

Fig. 27: Bioanalyzer profile of fragmented DNA (x-axis: time in seconds; y-axis: fluorescence; the plot marks the Bioanalyzer marker peak and the DNA fragments). The majority of fragmented DNA is expected to range between 25 and 200 bases, without any overlaid peaks, which would be indicative of rRNA contamination.

If the Bioanalyzer profile did not correspond to the expected one, the fragmentation procedure was repeated. The labelling reaction was prepared on ice by adding 1x TdT Buffer, 60 U TdT and 83 μM DNA Labelling Reagent to the 45 μl of fragmented ds DNA sample, up to a final volume of 60 μl. The reaction was incubated in an MJR Thermal Cycler 200 with heated lid with the program: 37°C for 60’, 70°C for 10’, and finally 4°C for 2-10’. Buffers and solutions for array hybridization, washing and staining were prepared according to Appendix B of the GeneChip WT ds Target Assay manual and, if recommended, were filtered with a Stericup vacuum-driven disposable filtration system (Millipore AG). The Hybridization Cocktail was prepared according to table B.4 of the manual, which is the recipe for single array hybridization without any amplification of the starting material. 60 μl of fragmented and labelled DNA target (~ 37.5 ng/μl) were mixed with 50 pM Control Oligo B2 (Affymetrix), 0.1 mg/ml Herring Sperm DNA (Invitrogen), 0.5 mg/ml Acetylated BSA (Invitrogen), 1x Hybridization Buffer, 7% DMSO (Sigma-Aldrich Chemie GmbH) and RNase-free water to a final volume of 200 μl. The Hybridization Cocktail was heated at 99°C for 5’, cooled to 45°C for 5’ and centrifuged at 13,200 rpm for 1’. 200 μl of sample-specific Hybridization Cocktail were injected into the Human Tiling 2.0R F array, which was placed at 45°C in a GeneChip Hybridization Oven 640 (Affymetrix) at 60 rpm and incubated for 16 hours. To wash, stain, and scan a probe array, we first registered the experiment in GCOS.
Briefly, each experiment was assigned with a name, the probe array type used (for Tiling 2.0R F arrays: Hs35b_P06R_v01), the name and other specifications of the sample, and the name of the project. Wash and stain procedure was performed on the GeneChip Fluidics Station 450 according to the assay manual. Fluidics scripts and library files (“Human Tiling 2.0R Array set”) were downloaded from the Affymetrix web site as previously explained. After 16 hours of hybridization, the hybridization cocktail was removed from the probe array, that was completely filled with 250 μl Non-Stringent Wash Buffer (Wash Buffer A). We then followed the fluidics protocol “FS450_0001”. It consisted of initial cycles of washing with Wash Buffer A and B, followed by a first stain with Streptavidin Phycoerythrin (SAPE), Wash A, second stain with biotinylated anti-streptavidin antibody and third stain again with SAPE. After final wash cycles, the array was filled with Array Holding Buffer. Second and third stain allowed an amplification of the fluorescent signal. The staining reagents were prepared according to Appendix B. Stain Cocktail 1 used for the first and third stain corresponded to the SAPE solution mix, containing 1x Stain Buffer, 2 mg/ml BSA, 10 μg/ml SAPE (Molecular Probes Inc.), and milliQ water to a total volume of 600 μl. The stain cocktail 2 used for the second stain corresponded to the Antibody Solution mix, which contained 1x Stain Buffer, 2 mg/ml BSA, 0.1 mg/ml Goat IgG (reagent grade, Sigma-Aldrich), 3 μg/ml biotinylated anti-streptavidin antibody (goat) (Vector Laboratories) and milliQ water to a total volume of 600 μl. At the end of the wash and stain procedure the array was filled with the Array Holding Buffer and the probe array window was checked for bubbles. If large bubbles were present, we proceeded

to manual drain of the Array Holding Buffer from the array and subsequent filling with fresh Holding Buffer. The arrays were finally scanned using the GeneChip Scanner 3000 7G and the acquisition of signal intensity data (CEL) from each array was automatically completed by GCOS as previously explained.

Data analysis

1) From hybridization signal intensities to MAT scores: MAT algorithm Hybridization signal intensities (CEL) were analyzed with the MAT algorithm (Model-based Analysis of Tiling arrays)362, an open-source package for the analysis of tiling array data downloaded from http://chip.dfci.harvard.edu/~wli/MAT/. Before running the program, user-selectable parameters for tiling array analysis could be adapted in the TAG file. These comprised, for example, the definition of the samples to be considered for the MAT analysis and the type of analysis to be performed (single group or two groups, and how many replicates per group), the array type used to collect gene expression data, or the width of the Gaussian Window (GW) to be applied in the signal smoothing procedure explained later. The program is command-line only and has no graphical user interface (GUI) mode, but the results can be viewed on IGB or the UCSC genome browser. MAT first performed a background correction of the measured intensity, going beyond simple GC correction and based on a sophisticated probe behaviour model and probe standardization. A t-statistic was computed in order to correct the signal of each probe i for the background given by the variable design of the probes (Equation 1). In particular, variation in hybridization among probes was determined by content and position of nucleotides (A, T, C, G) and by probe copy number.

$$ t_i = \frac{\log(PM_i) - \hat{m}_i}{s_i} \qquad \text{(Eq. 1)} $$

In equation 1, $t_i$ is the background-corrected perfect match intensity of probe $i$, i.e. the measured probe level corrected for probe design; $PM_i$ is the measured perfect match intensity of probe $i$, i.e. the measured probe level; $\hat{m}_i$ is the baseline intensity predicted by the model based on sequence and copy number of probe $i$, i.e. the background of a group of probes with similar design (affinity bin); $s_i$ is the standard deviation of the affinity bin to which probe $i$ belongs362. Then, a moving average of the standardized signal was applied to reduce the noise and estimate a MAT score (or smoothed t-value) for each interrogated probe, ending up with a MAT score profile for each analyzed sample. The width of the GW usually depends on what is investigated, and should be equal to or bigger than the minimum feature that

has to be resolved. For exon mapping we considered the typical median and mean exon sizes in the human genome of 133 bp and 262 bp respectively363, and the fact that an array resolution of 35 bp implies the use of a GW bigger than 35 bp. We used a 100 bp GW, corresponding, on average, to a smoothing procedure on at least three probe signals. Following the MAT algorithm362, the MAT score was computed as the trimmed mean of the t-values within the GW taken into account. The trimmed mean removes the top 10% and bottom 10% of the t-values and averages the remaining 80% of the t-values. The trimmed mean of t-values was then corrected by the number of probes in the GW. The MAT score was calculated for each GW and was assigned to the probe at the centre of the window according to equation 2.

$$ \text{MAT score} = \sqrt{n_p} \times TM \qquad \text{(Eq. 2)} $$

In equation 2, $\sqrt{n_p}$ is the square root of the number of probes in the window, and $TM$ is the trimmed mean of the t-values within the window. In our case a two-groups MAT analysis with triplicates was run, starting with triplicate CEL files from a cell line vs triplicate CEL files of the control RNA pool. Through the comparison of the transcription activity of a sample to a reference it was possible to detect positive transcription activity corresponding to overexpression and negative transcription activity representing downregulation with respect to the reference. After having obtained the probe-specific MAT scores both for the cell line and for the CTR RNA pool, the algorithm computed for each probe the calculation reported in equation 3.

$$ \text{MAT score}_{\text{cell line}} - \text{MAT score}_{\text{CTR RNA pool}} \qquad \text{(Eq. 3)} $$

The difference of MAT scores in sample vs reference was the final result of the two-groups MAT analysis and was contained in BAR files. The MAT score profiles (BAR files) were visualized on IGB, where each bar of the tiling expression profile represents a probe-specific MAT score, and positive transcription denotes overexpression, whereas negative transcription denotes downregulation with respect to the CTR RNA pool (Fig. 37, chapter 4.4.2). IGB can be used for visualizing genomes and genomic data but also for processing them. For example, we exported putative transcribed units as separate BED files. This was done by applying a threshold (step 2) and defined maxgap and minrun parameters (step 3) to the MAT score profiles (Figg. 36-37, chapter 4.4.2).
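The calculations in equations 1-3 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the MAT implementation: all function names are hypothetical, the natural logarithm is assumed in equation 1, the window is taken as ±GW/2 around each probe position, and the model baseline and affinity-bin standard deviation are treated as given inputs.

```python
import math

def standardize(pm, baseline, sd):
    """Eq. 1: t_i = (log(PM_i) - m_hat_i) / s_i (natural log assumed)."""
    return (math.log(pm) - baseline) / sd

def trimmed_mean(values, trim=0.10):
    """Average after dropping the top and bottom `trim` fraction of the values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

def mat_scores(positions, t_values, window=100):
    """Eq. 2: sqrt(n_p) * TM, assigned to the probe at the centre of each window."""
    half = window / 2
    scores = []
    for centre in positions:
        in_window = [t for p, t in zip(positions, t_values) if abs(p - centre) <= half]
        scores.append(math.sqrt(len(in_window)) * trimmed_mean(in_window))
    return scores

def two_group_difference(sample_scores, control_scores):
    """Eq. 3: per-probe MAT score of the cell line minus that of the CTR RNA pool."""
    return [s - c for s, c in zip(sample_scores, control_scores)]

# Hypothetical probes spaced 35 bp apart, all with t = 2.0
positions = [0, 35, 70, 105, 140]
scores = mat_scores(positions, [2.0] * 5)
```

With a 35 bp probe spacing and a 100 bp window, the central probes are smoothed over three t-values, which matches the "at least three probe signals" mentioned in the text.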

2) Definition of significant transcription activity: MAT score threshold Significant transcription activity was selected by applying a threshold on the MAT scores corresponding to a Bonferroni-corrected p-value of 0.05 according to equation 4.

$$ p_{min} = \frac{0.05}{\#\text{probes}} \times \frac{GW}{35} \qquad \text{(Eq. 4)} $$

In equation 4, #probes is the number of probes on the array (about 6.5 million), and GW/35 represents the number of probes in the GW. This corresponded to $p_{min} = 2.2 \times 10^{-8}$, which in turn corresponded to the following MAT score thresholds: |4.98| in the JJN3 tiling expression profile, |5.09| in KARPAS422, and |4.99| in JEKO1. The MAT score threshold is an absolute value and is symmetrically valid both for overexpression and downregulation. The probes that showed significant transcription were referred to as positive probes.
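As a quick arithmetic check of equation 4, the following sketch recomputes p_min from the values given in the text (0.05, about 6.5 million probes, GW of 100 bp, 35 bp resolution). The function name is hypothetical.

```python
def bonferroni_p_min(alpha, n_probes, window_bp, resolution_bp):
    """Eq. 4: Bonferroni-corrected p-value scaled by the number of probes per window."""
    return (alpha / n_probes) * (window_bp / resolution_bp)

p_min = bonferroni_p_min(0.05, 6.5e6, 100, 35)
print(f"{p_min:.1e}")  # prints 2.2e-08
```

The conversion from p_min to the sample-specific MAT score thresholds is not reproduced here.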

3) Definition of putative transcribed units: maxgap and minrun parameters Positive probes were further organized in putative transcribed units according to maxgap and minrun parameters. Maxgap is defined as the maximal distance that can separate two neighbouring positive probes for them to be considered part of the same transcribed unit; minrun corresponds to the minimal size of a region with positive probes for it to be considered a transcribed unit. The choice of maxgap and minrun sizes can be based upon various aspects. First, it depends on the GW applied to the data. Since we used a GW of 100 bp, we defined a maxgap of 200 bp, twice the value of the GW, considering the Nyquist criterion, which precludes resolving any features (i.e. gaps) smaller than twice the sampling interval. Other important aspects are the previously cited median exon size in humans (133 bp) and the peak of the distribution of DNA fragments hybridized to the array, which was 25-100 bp. For these reasons and since we wanted to consider as transcribed units only regions with at least three probes with significant transcription activity, we adopted a minrun of 100 bp.
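The grouping of positive probes by maxgap and minrun can be illustrated with a short Python sketch. It is not the procedure implemented in IGB: the function name is hypothetical, the gap is measured between the start coordinates of neighbouring positive probes, and each probe is assumed to represent 35 bp of genome (the array resolution), so that three consecutive probes span 105 bp.

```python
def group_positive_probes(starts, maxgap=200, minrun=100, resolution=35):
    """Merge positive probes into (start, end) units and drop units shorter than minrun."""
    runs, run = [], []
    for s in sorted(starts):
        # A distance above maxgap closes the current run
        if run and s - run[-1] > maxgap:
            runs.append(run)
            run = []
        run.append(s)
    if run:
        runs.append(run)
    # Each probe is assumed to cover `resolution` bp from its start coordinate
    return [(r[0], r[-1] + resolution) for r in runs
            if r[-1] + resolution - r[0] >= minrun]

# Hypothetical start coordinates of positive probes
units = group_positive_probes([0, 35, 70, 1000, 2000, 2035, 2200, 2235])
```

In this example the isolated probe at position 1000 is discarded by the minrun filter, while the probes at 2035 and 2200 are bridged because they lie within maxgap of each other.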

Quantitative real-time RT-PCR (qRT-PCR) Total RNA (1μg) was reverse-transcribed with random hexamers using SuperScript First-Strand Synthesis System for RT-PCR (Invitrogen). Briefly, 1 μg total RNA (RNA concentration≥125 ng/μl) was mixed with 50 ng random hexamers, and 1 mM dNTP mix, and the final volume of 10 μl was reached with DEPC-water. A NTC was added to exclude reagents contamination and contained all the same reagents lacking any RNA template. The first mix was incubated at 65°C for 5’ in a MJR Thermal Cycler 200, then cooled on ice for at least 1’. In the meantime, the second mix was prepared, added to the first mix and the whole was further incubated at 25°C for 2’ in the thermal cycler. At that point, the enzyme SuperScript II RT was added to the final mix, which consisted of the first mix with 1x RT buffer, 5 mM MgCl2, 10 mM DTT , 40 U RNase Out and 50 U SuperScript II RT. The following program was run on the thermal cycler: 25°C for 10’, 42°C for 50’, and 70°C for 15’. The reaction was then chilled on ice and 2 U RNaseH were added in each tube, with subsequent incubation at 37°C for 20’. The resulting cDNA was diluted 1:5 with DEPC-water prior to be used as template for qRT-PCR. Real-time PCR was performed with the ABI PRISM 7000 Sequence Detection System (Applied Biosystems) and the ABsolute QPCR ROX Mix (AB-1139) (ABgene, Epsom, UK). We used the TaqMan GeneExpression Assays (20x mix of unlabelled PCR primers and FAM-labelled TaqMan MGB probes) for POU2AF1 (Hs00174811_m1), PPP2R1B (Hs00184737_m1), SNF1LK2 (Hs00404960_m1) and ACTB (Hs99999903_m1) (Applied Biosystems). Polymerase chain

reactions were performed using a 96-well plate and optical adhesive covers (Applied Biosystems) in a final volume of 25 μl with 1x ABsolute QPCR ROX Mix (ABgene), 1x mix of primers and probes, 2 μl diluted cDNA, and DEPC-water to reach the final volume. Cycling parameters were: 95°C for 15’, followed by 40 cycles of 95°C for 15’’ and 60°C for 1’. All samples, except patient samples, were analyzed in triplicate. For patient samples only duplicates could be run due to the scarce material available. An NTC was added to each plate and for each assay, and contained all PCR reagents without any template DNA. The comparative CT method was used for relative quantification. Gene expression levels of POU2AF1, PPP2R1B and SNF1LK2 were normalized to the endogenous control ACTB and divided by the calibrator CTR RNA pool, consisting of human total RNAs of lymph node, thymus and spleen from healthy donors (Ambion).
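The relative quantification described above can be sketched as follows, assuming the usual 2^-ΔΔCT form of the comparative CT method with an amplification efficiency close to 100%. The function name and the CT values are hypothetical and serve only as an illustration.

```python
from statistics import mean

def relative_expression(ct_target, ct_endogenous, ct_target_cal, ct_endogenous_cal):
    """Fold change of a target gene vs the calibrator, normalized to the endogenous control."""
    d_ct_sample = mean(ct_target) - mean(ct_endogenous)        # delta CT of the sample
    d_ct_cal = mean(ct_target_cal) - mean(ct_endogenous_cal)   # delta CT of the calibrator
    return 2 ** -(d_ct_sample - d_ct_cal)                      # 2^-(delta delta CT)

# Hypothetical triplicate CT values: target gene and ACTB in a cell line and in the CTR RNA pool
fold = relative_expression([24.0, 24.1, 23.9], [18.0, 18.0, 18.0],
                           [26.0, 26.0, 26.0], [18.0, 18.0, 18.0])
```

A value above 1 indicates higher expression than in the calibrator; in this made-up example the target is about four times more expressed.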

RNA interference Small interfering RNAs (siRNAs) targeting POU2AF1, PPP2R1B, and the Firefly luciferase gene GL3 were purchased from Ambion. GL3 siRNA (siGL3) was used as negative control, since it is not complementary to any gene in the human genome. Custom siRNA directed against POU2AF1 was designed according to364. We ordered standard purity (column purified), annealed POU2AF1 siRNA. It was 21 nt in length, with the following sense sequence: 5’- GGUUCUGUGUCUGCAGUGAtt-3’, targeting a sequence in exon 4. Silencer Validated siRNA against PPP2R1B was also standard purity and annealed. The sense sequence of the 21 nt long siRNA was: 5’-GGGCAUCAAAUGCUGUUAAtt-3’, targeting a sequence in exon 5. The medium Opti-MEM (GIBCO) was used as solvent for Lipofectamine and siRNA dilutions and as growth medium for cells during transfection. Lipofectamine 2000 (Invitrogen) is a cationic lipid formulation and was used as transfection reagent. siRNA stock (100 μM) and working (20 μM) solutions were prepared adding nuclease-free water to the dried oligonucleotides. Silencer Cy3-labelled GAPDH siRNA (Ambion) was used for monitoring the uptake of siRNA by fluorescence activated cell sorting (FACS). Cy3-labelled GAPDH siRNA (50 nM) was transfected with the same procedure as described for the experimental siRNAs. An untreated control sample was run in parallel. After 24 hours incubation, cells were harvested, and about 200,000 cells were washed twice with ice-cold phosphate-buffered saline 1x (PBS) containing 1% fetal bovine serum and finally examined using a FACSCalibur (Becton Dickinson) as previously described365. The percentage of Cy3-GAPDH siRNA positive cells was calculated using the Cell Quest software. The control (untreated) sample was used to define the background, negative signal. Only the portion of siRNA-treated cells not overlapping with the control was considered positive for fluorescence and therefore transfected. 
To perform RNA interference studies, cells were plated at a concentration of 1 × 10^6 cells/ml in Opti-MEM in six-well plates (2 ml/well) and incubated at 37°C with 5% CO2 while preparing transfection reagents. Nucleic acid-Lipofectamine complexes (silencing complexes) were prepared by mixing the desired quantity of siRNA working solution and Lipofectamine. First, siRNA and Lipofectamine were diluted in Opti-MEM using separate tubes. After 5’ incubation at room temperature, they were mixed to form the silencing complexes and incubated for 20’ at room temperature. Once the complexes had formed, the silencing mix was added to the cells in the respective well by pipetting and rocking the plate to and fro. The cells were transfected either with 100 nM PPP2R1B siRNA or 200 nM POU2AF1 siRNA and GL3 siRNA (negative control). An untreated control sample was run in parallel. Transfection was stopped after 5 hours. The cells were recovered and centrifuged to remove silencing complexes. The cell pellet was resuspended in the growth medium (RPMI + 20% FBS) and the cell concentration was measured with the counter Beckman Coulter ZZ particle count and size analyzer (Beckman Coulter GmbH, Krefeld, Germany). A small fraction of the cell suspension was reserved for the cell proliferation assay; the rest was plated again in a six-well plate and incubated at 37°C with 5% CO2 for 48 or 72 hours. Knockdown was determined 48h and 72h after transfection by monitoring both mRNA by qRT-PCR and protein level by western blotting. Proliferation rate of transfected cells was determined using the colorimetric cell proliferation assay done with the thiazolyl blue tetrazolium bromide salt (MTT) (Sigma-Aldrich).

qRT-PCR After the incubation time (48 or 72 hours), half of the volume (1 ml) of transfected cells was used for RNA extraction with TRIZOL, and qRT-PCR for POU2AF1 and PPP2R1B was performed following the previously described protocols. Gene expression levels of POU2AF1 and PPP2R1B were normalized to the endogenous control ACTB and divided by the calibrator GL3.

SDS-PAGE and Western blotting (see Appendix for SDS-PAGE buffers, solutions and gel) After the incubation time (48 or 72 hours), the other half (1 ml) of transfected cells was used for protein extraction. Cells were centrifuged and we obtained whole-cell lysates for protein extraction adding 30 μl protein lysis buffer to the cell pellet. Protein quantification was done by BCA Protein Assay Kit (Pierce, IL, USA) according to the manufacturer’s instructions. This assay enables a colorimetric detection and quantification of total protein based on bicinchoninic acid (BCA). The method combines the reduction of Cu2+ to Cu+ by proteins in alkaline medium, known as biuret reaction, with the colorimetric detection of Cu+ with a reagent containing BCA. The chelation of two molecules of BCA with one cuprous cation (Cu+) gives a purple coloration and the complex strongly absorbs at 570 nm. Absorbance was detected by a 96-well plate reader (Beckman Coulter AD340, Beckman Coulter GmbH), and it is directly proportional to the protein concentration. Protein dilutions were prepared adjusting the concentration to 2 μg/μl adding 2x loading buffer and water. Prior to gel electrophoresis, protein dilutions were denatured at 95°C for 8’- 10’ and stored on ice after brief spin down. 20 μg proteins were loaded on 10% sodium dodecylsulfate–polyacrylamide gel. The marker Page Ruler Prestained Protein Ladder (Fermentas International Inc., Canada) was used. Electrophoresis was performed in 1x

running buffer (TGE). Western blotting on the nitrocellulose membrane Hybond ECL (Amersham-Pharmacia, GE Healthcare, Otelfingen, Switzerland) was done in 1x transfer buffer. Successful protein transfer was examined by Ponceau (Sigma) staining. Unspecific antibody binding was avoided by blocking the nitrocellulose membrane with 0.2% I-Block (Tropix, Inc., Bedford, MA, USA) solution (in PBS). The following primary antibodies were used: anti-PP2A-Aβ (C-20, sc-6113) (Santa Cruz Biotechnology Inc., CA, USA), anti-BOB-1 (AB against POU2AF1) (C-20, sc-955) (Santa Cruz Biotechnology Inc.), anti-α-Tubulin (Calbiochem, Nottingham, UK). Usually an overnight incubation at 4°C was performed with the primary antibody solution, prepared in I-Block according to optimized dilutions: anti-PP2A-Aβ (goat) was used 1:2000, anti-BOB-1 (rabbit) 1:2000, and anti-Tubulin (mouse) 1:1000. The nitrocellulose membrane was subsequently washed with 0.1% TBST. Finally, incubation with horseradish peroxidase (HRP)-labelled secondary antibody solutions in I-Block, either anti-goat 1:3000 (donkey anti-goat IgG-HRP, Santa Cruz Inc.), or anti-rabbit 1:3000 (ECL anti-rabbit IgG, HRP-linked whole antibody, Amersham-Pharmacia), or anti-mouse 1:3000 (ECL anti-mouse IgG, HRP-linked whole antibody, Amersham-Pharmacia), allowed for the detection of the investigated proteins by the enhanced chemiluminescence (ECL) system (Amersham-Pharmacia). It is a light-emitting, non-radioactive method for the detection of immobilized specific antigens with HRP-labelled secondary antibodies. Adding the enzyme substrate to the blotting membrane results in light emission at the precise position where the HRP-labelled antibody is bound, which can be detected by exposure to a light-sensitive autoradiography film.

Cell proliferation assay (MTT) After the measurement of the concentration of the transfected cell suspension, the needed volume was taken and the concentration was adjusted to 100,000 cells/ml. Triplicates of 100 μl from the diluted cell suspension were plated in a flat-bottomed 96-well plate, one for each time-point (48 or 72 hours), and incubated at 37°C with 5% CO2. After 48 or 72 hours of incubation, 15 μl of an MTT solution (5 mg/ml in sterile PBS) were added to each well. The plate was further incubated for 4 hours, during which the yellow tetrazolium salt MTT is absorbed and metabolized in the mitochondria of proliferating cells into a blue/violet formazan product366. 100 μl MTT lysis buffer (Appendix) were added after 4 hours to lyse the cells and solubilize the formazan metabolite. After overnight incubation, the plate was analyzed by spectrophotometric measurement at 570 nm with the Beckman Coulter AD340. The 570 nm absorbance is directly proportional to the number of proliferating cells. The proliferation rate was calculated relative to the negative control GL3.
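The last step, the proliferation rate relative to the GL3 control, can be sketched as a ratio of mean absorbances. The function name and the absorbance values are hypothetical, and the optional blank subtraction is an assumption that is not described in the protocol above.

```python
from statistics import mean

def relative_proliferation(a570_sample, a570_control, a570_blank=0.0):
    """Mean 570 nm absorbance of the sample wells relative to the GL3 control wells."""
    return (mean(a570_sample) - a570_blank) / (mean(a570_control) - a570_blank)

# Hypothetical triplicate absorbance readings at one time-point
rate = relative_proliferation([0.40, 0.42, 0.38], [0.80, 0.82, 0.78])
```

A rate below 1 would indicate reduced proliferation compared with cells transfected with the GL3 siRNA.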

4.4 Results

4.4.1 Genome profiling

Genomic profiles of JJN3, KARPAS422, U2932 and JEKO1 cell lines were obtained with Affymetrix 250K Human Mapping arrays (top panel of Figg. 28.1, 29.1, 30.1, 31.1). Chromosome 11 genomic profiles (Figg. 28.2, 29.2, 30.2, 31.2) revealed a recurrent amplified region. A detailed comparison of the genomic profiles for chromosome 11 is presented in figure 32. The amplification peak at 11q23.1 was clearly overlapping among the cell lines JJN3, KARPAS422 and U2932, but not in JEKO1, that showed a DNA gain. The minimal common amplified region was delimited by the amplicon in JJN3 (Fig. 33). Therefore, the definition of the minimal common amplification was based on the raw HMM median log2 ratio at 11q23.1 of the cell line JJN3. Amplification was detected from position 110,687,662 to 111,018,800 according to the physical mapping of amplified SNPs based on the March 2006 human genome assembly (hg18), thus covering 330 kb. If the surrounding, non-amplified SNPs were considered as well, the enlarged region spanned from 110,666,351 to 111,044,085 (377 kb). To better characterize the DNA CN profile surrounding the amplified region it was necessary to estimate CN thresholds for DNA loss, gain, and amplification of each cell line. The thresholds were mainly established observing the histograms with the frequency (Figg. 22-24) and the genomic profiles with the distribution of various DNA CN levels (top panel of Figg. 28.1-31.1). Table 16 (chapter 4.3) summarizes the cell line-specific, visually estimated thresholds. Applying these thresholds to the genome-wide profiles or to the chromosome 11 DNA profiles as presented in the middle and bottom panels of figures 28.1-31.1 and 28.2-31.2 respectively, it was possible to clearly identify the patterns of CN changes on the whole genome and on chromosome 11, in particular around the amplified region at 11q23.1.
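The application of the cell line-specific thresholds can be illustrated with the following sketch. The threshold values are the ones quoted in the captions of figures 28.1-31.1; the function and dictionary names are hypothetical, and the actual filtering was done in the CNAT Viewer rather than with code like this.

```python
# log2 ratio thresholds per cell line, as quoted in the captions of Figs. 28.1-31.1
THRESHOLDS = {
    "JJN3":      {"loss": -0.02, "gain": 0.23, "ampli": 0.47},
    "KARPAS422": {"loss": -0.25, "gain": 0.13, "ampli": 0.50},
    "U2932":     {"loss": -0.18, "gain": 0.26, "ampli": 0.65},
    "JEKO1":     {"loss": -0.23, "gain": 0.27, "ampli": 0.58},
}

def classify(log2_ratio, cell_line):
    """Assign a copy number class to a log2 ratio using the cell line-specific thresholds."""
    t = THRESHOLDS[cell_line]
    if log2_ratio > t["ampli"]:
        return "amplification"
    if log2_ratio > t["gain"]:
        return "gain"
    if log2_ratio < t["loss"]:
        return "loss"
    return "normal"

# Size of the minimal amplified region in JJN3 (hg18 coordinates from the text)
region_bp = 111_018_800 - 110_687_662
```

The same log2 ratio can fall into different classes depending on the cell line: a value of 0.6, for instance, counts as amplification in JJN3 but only as gain in U2932.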


Fig. 28.1: Genomic profile of JJN3 visualized by CNAT Viewer (GTYPE). In each of the four panels, on x-axis are ordered all the chromosomes, from chromosome 1 to X; on y-axis, the default log2 ratio obtained by the CNAT4 calculation on 250K array hybridization data. The top panel presents the unfiltered genomic profile (green). In the middle, two panels with red profiles highlight the gained (y>0.23), respectively amplified (y>0.47) regions obtained by application of a filter on the y-axis according to the estimated CN thresholds (Fig. 22). The same was done for deletions (y<-0.02), presented in blue in the bottom panel.

[Figure 28.2; panel labels: >gain, >ampli, loss<; marked locus: 11q23.1]

Fig. 28.2: Chromosome 11 genomic profile of JJN3 visualized by CNAT Viewer (GTYPE). On x-axis, the physical mapping of chromosome 11. On y-axis, the default log2 ratio obtained by the CNAT4 calculation on 250K array hybridization data.


Fig. 29.1: Genomic profile of KARPAS422 visualized by CNAT Viewer (GTYPE). In each of the four panels, on x-axis are ordered all the chromosomes, from chromosome 1 to X; on y-axis, the default log2 ratio obtained by the CNAT4 calculation on 250K array hybridization data. The top panel presents the unfiltered genomic profile (green). In the middle, two panels with red profiles highlight the gained (y>0.13), respectively amplified (y>0.5) regions obtained by application of a filter on the y-axis according to the estimated CN thresholds (Fig. 23). The same was done for deletions (y<-0.25), presented in blue in the bottom panel.

[Figure 29.2; panel labels: >gain, >ampli, loss<; marked locus: 11q23.1]

Fig. 29.2: Chromosome 11 genomic profile of KARPAS422 visualized by CNAT Viewer (GTYPE). On x-axis, the physical mapping of chromosome 11. On y-axis, the default log2 ratio obtained by the CNAT4 calculation on 250K array hybridization data.


Fig. 30.1: Genomic profile of U2932 visualized by CNAT Viewer (GTYPE). In each of the four panels, on x-axis are ordered all the chromosomes, from chromosome 1 to X; on y-axis, the default log2 ratio obtained by the CNAT4 calculation on 250K array hybridization data. The top panel presents the unfiltered genomic profile (green). In the middle, two panels with red profiles highlight the gained (y>0.26), respectively amplified (y>0.65) regions obtained by application of a filter on the y-axis according to the estimated CN thresholds (Fig. 24). The same was done for deletions (y<-0.18), presented in blue in the bottom panel.

[Figure 30.2; panel labels: >gain, >ampli, loss<; marked locus: 11q23.1]

Fig. 30.2: Chromosome 11 genomic profile of U2932 visualized by CNAT Viewer (GTYPE). On x-axis, the physical mapping of chromosome 11. On y-axis, the default log2 ratio obtained by the CNAT4 calculation on 250K array hybridization data.


Fig. 31.1: Genomic profile of JEKO1 visualized by CNAT Viewer (GTYPE). In each of the four panels, on x-axis are ordered all the chromosomes, from chromosome 1 to X; on y-axis, the default log2 ratio obtained by the CNAT4 calculation on 250K array hybridization data. The top panel presents the unfiltered genomic profile (green). In the middle, two panels with red profiles highlight the gained (y>0.27), respectively amplified (y>0.58) regions obtained by application of a filter on the y-axis according to the estimated CN thresholds (Fig. 25). The same was done for deletions (y<-0.23), presented in blue in the bottom panel.

[Figure 31.2; panel labels: >gain, >ampli, loss<; marked locus: 11q23.1]

Fig. 31.2: Chromosome 11 genomic profile of JEKO1 visualized by CNAT Viewer (GTYPE). On x-axis, the physical mapping of chromosome 11. On y-axis, the default log2 ratio obtained by the CNAT4 calculation on 250K array hybridization data.


Fig. 32: 250K genomic profiles of chromosome 11 visualized by CNAT Viewer (GTYPE). On x-axis, the physical mapping of chromosome 11. On y-axis, the default log2 ratio obtained by the CNAT4 calculation on 250K array hybridization data. From the top to the bottom: in green, the genomic profile of the cell line JJN3; in red, the cell line KARPAS422; in fuchsia, the cell line U2932; in blue, the cell line JEKO1. At 11q23.1 is visible a common region of amplification in the cell lines JJN3, KARPAS422 and U2932, whereas JEKO1 shows a DNA gain.


Fig. 33: Minimal common region of amplification at 11q23.1 visualized on the genome browser of the Database of Genomic Variants (http://projects.tcag.ca/variation/cgi-bin/gbrowse/hg18, current release). The RefSeq Genes from the NCBI RefSeq Project are displayed in fuchsia; microRNA tracks from miRBASE in blue; mRNAs from GenBank in green; genomic clones in black. MicroRNAs are labelled according to miRBASE accession (http://microrna.sanger.ac.uk/), and correspond to hsa-miR-34b (MI0000742) and hsa-miR-34c (MI0000743). The annealing region of the SYBR11 primers (SYBR11 F and SYBR11 R) used for qPCR is displayed, and the transcripts initially investigated by RT-PCR are highlighted by an orange asterisk.

We were interested in identifying putative targets of the common amplification. Figure 33 presents 483,650 bp including the minimal amplified region and the corresponding annotation with respect to RefSeq genes, microRNAs, mRNAs and genomic clones as visualized on the genome browser of the Database of Genomic Variants (http://projects.tcag.ca/variation/cgi-bin/gbrowse/hg18, current release). The RefSeq gene track displays curated gene entries from the NCBI RefSeq Project (http://www.ncbi.nlm.nih.gov/RefSeq), derived from primary GenBank submissions with varying levels of validation. The microRNA track shows microRNAs contained within miRBASE (http://microrna.sanger.ac.uk/, release 9.0). The mRNA track is downloaded from the UCSC genome browser and shows mRNAs from the current assembly in GenBank (http://www.ncbi.nlm.nih.gov/Genbank/index.html). The clone track is also downloaded from UCSC and shows the clone coverage of the human genome assembly hg18 (Build 36: Mar. 2006). Moreover, the annealing region of the SYBR11 primers, used to confirm by qPCR the 11q23.1 amplification observed in the 250K genomic profiles, is visible. qPCR revealed a highly increased DNA CN in the cell lines JJN3, KARPAS422 and U2932, thus validating the 11q23.1 amplification (Fig. 34). JJN3 had a DNA CN higher than eight, the KARPAS422 DNA CN was higher than nine, and U2932 had a DNA CN of nine.

[Figure 34: bar chart "qPCR on 11q23.1"; y-axis: DNA copy number (0 to 10); x-axis: normal, JEKO1, JJN3, KARPAS422, U2932]

Fig. 34: Validation of the amplified region at 11q23.1 by qPCR on genomic DNA. On the x-axis are listed the cell lines and the normal DNA sample obtained from a healthy donor. On the y-axis, the DNA CN quantified with the standard curve method, normalized to the repetitive element Line-1 (endogenous control) and relative to normal DNA (calibrator).
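The quantification summarized in the caption of figure 34 can be sketched as follows. This is an illustration under several assumptions: the standard curves are taken to have the form CT = slope × log10(quantity) + intercept, the normal DNA is assumed to carry two copies of the locus, and all names and CT values are hypothetical.

```python
import math

def quantity_from_ct(ct, slope, intercept):
    """Invert a standard curve of the form CT = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)

def copy_number(ct_target, ct_line1, ct_target_normal, ct_line1_normal,
                curve_target, curve_line1, normal_copies=2):
    """DNA CN normalized to Line-1 and expressed relative to normal DNA."""
    ratio_sample = (quantity_from_ct(ct_target, *curve_target)
                    / quantity_from_ct(ct_line1, *curve_line1))
    ratio_normal = (quantity_from_ct(ct_target_normal, *curve_target)
                    / quantity_from_ct(ct_line1_normal, *curve_line1))
    # Assumption: the normal calibrator carries `normal_copies` copies of the locus
    return normal_copies * ratio_sample / ratio_normal

# Hypothetical curves with 100% efficiency: slope = -1 / log10(2), about -3.32
curve = (-1 / math.log10(2), 30.0)
cn = copy_number(24.0, 15.0, 26.0, 15.0, curve, curve)
```

In this made-up example the target amplifies two cycles earlier in the cell line than in normal DNA at equal Line-1 signal, which corresponds to a fourfold excess and an estimated CN of about eight.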

4.4.2 Transcriptome profiling

After having characterized the recurrent 11q23.1 amplification at the DNA level, we were interested in its effects on RNA level. To identify the amplification target(s) we analyzed the transcriptome of the cell lines presenting the DNA amplification. Our hypothesis was to find

the putative targets as overexpressed transcripts among the genes involved in the 11q23.1 amplification. Initially, we performed RT-PCRs to study the level of transcription of some known and predicted genes among those mapping in the amplified region as presented in figure 33 (data not shown). Unfortunately, we could not identify any overexpressed gene among the ones we tested, therefore this approach was not helpful to identify putative targets. Moreover, we noticed that in the HG-U133A gene expression profile of the cell line JJN3 previously obtained during the MM project (chapter 3), none of the genes mapping at the amplified region and interrogated by the array showed increased expression compared to other MM cell lines and clinical samples. These observations led us to the hypothesis that the amplification at the DNA level could have targets investigated neither by HG-U133A nor by the RT-PCRs, for example other genes, unknown transcripts or microRNAs. This hypothesis was strengthened by the presence of two human microRNAs in the middle of the minimal common region of amplification (Fig. 33): miR-34b and miR-34c. MicroRNA expression was analyzed in JJN3, KARPAS422, JEKO1 and seven further lymphoma cell lines: four DLBCL cell lines (DOHH2, OCI-LY3, SUDHL6, VAL) and three MCL cell lines (GRANTA519, REC, NCEB1) (Fig. 35). The third cell line with the 11q23.1 amplification, U2932, was not yet available at the time we investigated miRNA expression. As a surrogate of a normal counterpart we also analyzed RNA from CD19+ and CD19- cells, and human total RNA of lymph node, thymus and spleen from healthy donors. Besides miR-34b and miR-34c expression, we also included the analysis of miR-24, a widely expressed miRNA, as a positive control reaction. All the samples tested showed expression of miR-24. This confirmed the compatibility of the RNA samples with the miRNA detection assay and the presence of small RNA species.
Moreover, miR-24 expression demonstrated that the protocol worked well. We could detect miR-34b expression in control RNA from lymph node, spleen, and thymus, and in the cell lines GRANTA519, REC and, slightly, also in JJN3. miR-34c was expressed in control RNA from lymph node and spleen, and in the cell lines JJN3, GRANTA519, REC and, slightly, in VAL. In KARPAS422 and JEKO1 neither miR-34b nor miR-34c was expressed. In general, miR-34b and miR-34c expression could not be associated with cancer cell lines. On the contrary, they seemed to be expressed in healthy secondary organs of the immune system. Focusing on the cell lines with the common 11q23.1 amplification, KARPAS422 and JJN3, it was not possible to demonstrate any regulation of miR-34b and miR-34c expression by increased DNA level. KARPAS422 expressed neither miR-34b nor miR-34c. On the contrary, JJN3 expressed both, but only at low levels.

[Figure 35: gel images in two panels (A and B), each with rows miR-34b, miR-34c and miR-24. Lanes in panel A: thymus, lymph node, spleen, JJN3, KARPAS422, JEKO1, NTC, positive control (+). Lanes in panel B: REC, CD19+, DOHH2, NCEB1, GRANTA519, CD19-, SUDHL6, VAL, OCI-LY3, NTC, positive control (+).]

Fig. 35: mirVana RT-PCR miRNA Detection. The top and the middle panel show the expression of miR-34b and miR-34c, respectively. The expected size of the amplicon is approximately 90 bp, which corresponds to the upper band. The smaller band is given by primer dimers. The lower panel displays miR-24 expression (positive control reaction). A: the positive control for miR-34b and miR-34c was DU145, whereas for miR-24 we used human Heart total RNA. B: the positive control was human Heart total RNA.

Having discarded the miRNAs as candidate targets, we reconsidered the possibility of finding a gene at 11q23.1, either known or not yet described, that could justify the recurrent amplification. The difficulties encountered in identifying the amplification targets by RT-PCR prompted us to use tiling arrays for high-resolution transcriptome mapping. Tiling gene expression profiles were obtained for JJN3, KARPAS422 and JEKO1 using the Affymetrix GeneChip Human Tiling 2.0R F arrays and applying the MAT algorithm to calculate MAT scores from the measured hybridization intensities. The cell line U2932 was not included because it was not yet available when we investigated gene expression with tiling arrays. Since the MAT scores of the cell

lines are expressed relative to the CTR RNA pool, positive MAT scores represent overexpression in the cell line with respect to the CTR pool. Conversely, negative MAT scores indicate downregulation in the cell line with respect to the CTR pool. Putative transcribed units were identified by first selecting significant transcription activity with a MAT score threshold corresponding to a Bonferroni-corrected p-value of 0.05, and then organizing positive probes into transcribed units according to maxgap ≤ 200 bp and minrun > 100 bp (Fig. 36). We concentrated on the tiling gene expression profiles of the cell lines affected by the 11q23.1 amplification, namely JJN3 and KARPAS422 (Fig. 37). JJN3 displayed increased transcription activity relative to the CTR only at the gene PPP2R1B. KARPAS422 showed overexpression of POU2AF1, PPP2R1B and SNF1LK2. According to the tiling gene expression profiles, we considered three genes as putative amplification targets: POU2AF1, PPP2R1B and SNF1LK2. These genes were validated by qRT-PCR, which was performed on nine MM cell lines (JJN3, KMS18, KMS11, OPM2, LP1, SKMM1, RPMI-8226, U266, KMM1), nine DLBCL cell lines (KARPAS422, U2932, DOHH2, VAL, OCI-LY3, OCI-LY7, OCI-LY10, OCI-LY19, SUDHL6) and eight DLBCL clinical samples, three MCL cell lines (JEKO1, REC, GRANTA519) and five MCL clinical samples, and RNA from CD19+ cells used as a normal control (Fig. 38). It was at this point that we identified the DLBCL cell line U2932 as an additional, interesting model for studying the 11q23.1 amplification. U2932 presented an expression pattern similar to KARPAS422, and published data on U2932 indicated a DNA amplification at 11q21-q23354. For these reasons we obtained its genome-wide DNA profile, as previously illustrated. The profile confirmed the 11q23.1 amplification, leading to a total of three cell lines with the same amplified locus: JJN3, KARPAS422 and U2932.
The qRT-PCR gene expression patterns differed between the DLBCL and MM cell lines (Fig. 38). KARPAS422 and U2932 both overexpressed the three genes POU2AF1, PPP2R1B and SNF1LK2. JJN3 showed overexpression of PPP2R1B but downregulation of SNF1LK2. In particular, unlike what was previously reported364 for MM, JJN3 did not express POU2AF1 at all, indicating that in myeloma this genomic event is not necessarily associated with gene overexpression. Looking at the frequency of gene overexpression with respect to the CTR pool in MM, DLBCL and MCL cell lines, we observed that SNF1LK2 was overexpressed in 56% of MM cell lines (5/9), 33% of DLBCL cell lines (3/9) and none of the MCL cell lines analyzed (Fig. 38C); PPP2R1B was overexpressed in 100% of MM cell lines, 67% of DLBCL (6/9) and MCL (2/3) cell lines, 87% (7/8) of DLBCL patients and 60% (3/5) of MCL patients (Fig. 38B); POU2AF1 was overexpressed in 78% (7/9) of MM cell lines, 88% of DLBCL patients and cell lines, and 100% of MCL patients and cell lines (Fig. 38A). As a whole, the qRT-PCR results indicated that POU2AF1 and PPP2R1B were more frequently overexpressed than SNF1LK2. We therefore considered POU2AF1 and PPP2R1B more probable targets of the 11q23.1 amplification than SNF1LK2, and selected them for functional studies (chapter 4.4.4).

[Fig. 36, workflow diagram: BAR file with MAT scores; threshold: MAT score corresponding to Bonferroni-corrected p = 0.05; maxgap ≤ 200 bp; minrun > 100 bp.]

Fig. 36: Identification of significant transcription and transcribed units in MAT-analyzed tiling array profiles. With IGB, a threshold was applied to the MAT scores (BAR file) to select for significant transcription activity (positive probes). Then, transcribed units were identified using the following parameters: maxgap ≤ 200 bp as the maximal distance that can separate two neighbouring positive probes for them to be considered part of the same transcribed unit; minrun > 100 bp as the minimal size of a region with positive probes for it to be considered a transcribed unit. The identified transcribed units can be saved as a separate BED file.
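The thresholding and grouping steps described in Fig. 36 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IGB implementation: the function name `call_transcribed_units`, the 25 bp probe length, the way gaps and run sizes are measured (between probe start coordinates, and from the first probe start to the last probe end) and the example scores are all introduced here. Only the parameters maxgap ≤ 200 bp and minrun > 100 bp come from the text.

```python
from typing import List, Sequence, Tuple


def call_transcribed_units(
    positions: Sequence[int],
    scores: Sequence[float],
    threshold: float,
    maxgap: int = 200,    # from the text: maxgap <= 200 bp
    minrun: int = 100,    # from the text: minrun > 100 bp
    probe_len: int = 25,  # assumption: probe length is not stated in the text
) -> List[Tuple[int, int]]:
    """Group positive probes into putative transcribed units.

    positions: probe start coordinates on one chromosome, sorted ascending.
    scores:    MAT score of each probe.
    Returns half-open (start, end) intervals, as they would appear in a BED file.
    """
    units: List[Tuple[int, int]] = []
    run_start = None  # start of the first positive probe in the current run
    run_last = None   # start of the last positive probe in the current run

    def close_run() -> None:
        # Keep the run only if its span is strictly larger than minrun.
        if run_start is not None and run_last + probe_len - run_start > minrun:
            units.append((run_start, run_last + probe_len))

    for pos, score in zip(positions, scores):
        if score <= threshold:
            continue  # not a positive probe
        if run_start is not None and pos - run_last <= maxgap:
            run_last = pos  # neighbouring positive probe: extend the run
        else:
            close_run()     # gap too large: finish the previous run
            run_start = run_last = pos
    close_run()
    return units


# Hypothetical probes: the first run spans 165 bp and is kept,
# the two later runs are too short and are dropped.
example_positions = [0, 35, 70, 105, 140, 500, 535, 1000]
example_scores = [5.0, 5.0, 1.0, 5.0, 5.0, 5.0, 5.0, 5.0]
print(call_transcribed_units(example_positions, example_scores, threshold=3.0))
# prints [(0, 165)]
```

In this sketch a single probe below the threshold does not break a run as long as the flanking positive probes lie within maxgap of each other, which is one plausible reading of the maxgap definition given in the caption.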

Fig. 37: Tiling gene expression profiles of JJN3 and KARPAS422 at 11q23.1 viewed in the Integrated Genome Browser (IGB). The x-axis displays the physical map of chromosome 11; the y-axis reports the MAT scores that constitute the tiling gene expression profiles of the cell lines JJN3 (in green) and KARPAS422 (in red) relative to the CTR RNA pool. Under the tiling expression profile of each cell line, bars (BED files) highlight the regions with significant transcription activity, i.e. with MAT scores > threshold (visible as a continuous line over the profile), that were considered putative transcribed units according to maxgap ≤ 200 bp and minrun > 100 bp. Known RefSeq annotations (in green) and sno/miRNA annotations (in yellow: miR-34b, miR-34c) of 11q23.1 are shown.

Fig. 38: qRT-PCR of genes POU2AF1 (A), PPP2R1B (B) and SNF1LK2 (C). The x-axis shows the analyzed samples, with the position of the cell lines with 11q23.1 amplification (JJN3, KARPAS422 and U2932) indicated; sample groups labelled in the panels include MM cell lines, DLBCL cell lines, DLBCL patients, MCL cell lines, MCL patients, CD19+ cells, lymph node, spleen, thymus and NTC. The y-axis shows the gene expression levels normalized to the housekeeping gene ACTB and relative to the CTR RNA pool on a logarithmic scale, log(relative expression / ACTB).
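The quantity plotted in Fig. 38 (expression normalized to ACTB and expressed relative to the CTR RNA pool on a logarithmic scale) can be illustrated with a short Python sketch. It assumes the comparative Ct method with one doubling per PCR cycle and a base-10 logarithm; neither is stated in this section, and the function name and the Ct values are hypothetical.

```python
import math


def log_relative_expression(ct_gene_sample, ct_actb_sample, ct_gene_ctr, ct_actb_ctr):
    """log10 of target expression normalized to ACTB, relative to the CTR pool.

    Assumption: comparative Ct method with perfect doubling per cycle.
    """
    delta_sample = ct_gene_sample - ct_actb_sample  # normalize the sample to ACTB
    delta_ctr = ct_gene_ctr - ct_actb_ctr           # normalize the CTR pool to ACTB
    delta_delta = delta_sample - delta_ctr          # sample relative to CTR pool
    relative = 2.0 ** (-delta_delta)
    return math.log10(relative)


# Hypothetical Ct values: the target amplifies two cycles earlier in the sample
# than in the CTR pool (after ACTB normalization), i.e. a four-fold increase.
value = log_relative_expression(24.0, 18.0, 26.0, 18.0)
print(round(value, 3))  # positive value: overexpression relative to the CTR pool
```

With this convention, positive values correspond to overexpression and negative values to downregulation with respect to the CTR pool, matching the way the bars in Fig. 38 are read.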

4.4.3 FISH experiments

To characterize the genetic mechanism leading to 11q23.1 amplification we performed FISH analyses, using probes derived from three BAC clones covering the amplified region. FISH analyses performed on metaphases of the cell lines JJN3, KARPAS422, U2932 and JEKO1 revealed different FISH patterns (Figs. 39, 40, 42, 43). The BAC clones RP11-794P6, RP11-759M17 and RP11-404A3 co-localized in all cell lines in the same 11q23.1 region. Dual-color FISH with WCP11 and BAC clones on JJN3 demonstrated the presence of two isochromosomes i(11q) and two different derivative chromosomes 11, giving eight copies of the 11q23 region against four centromeres of chromosome 11. This result is compatible with a low-level amplification of 11q in JJN3. A specific translocation between chromosomes 7 and 11 previously described355 was not confirmed (Fig. 39B).

Fig. 39: FISH on JJN3 cells. A: metaphases hybridized with RP11-794P6 BAC probe (red); B: dual-color FISH with WCP7 (green) and WCP11 (red) probes. The experiments demonstrated the absence of translocations between chromosomes 11 and 7 (B), but the presence of other rearrangements involving sequences of chromosome 11. In particular, the region 11q23.1 (RP11-794P6) was present on three chromosome 11 derivatives (A).

The KARPAS422 cell line contained a normal chromosome 11 and a derivative chromosome 11 (der(11)) showing a tandem duplication of the region hybridized by the BAC clones (Fig. 40A). The duplication of the 11q region was confirmed by FISH with WCP11 (Fig. 40B). Like other FISH studies on KARPAS422339,367, we excluded the presence of the translocation t(4;11) by dual-color FISH with the probes WCP4 and WCP11 (Fig. 40C). Our results agreed with a previously published work identifying a derivative chromosome showing a terminal deletion and an inverted tandem repeat of the 11q23 region339 (Fig. 41). We also confirmed the terminal deletion on der(11) with a probe mapping to the MLL gene, which gave no signal on der(11) but was positive on the normal chromosome 11 (data not shown). Interestingly, this cell line was originally described in the literature as having an HSR at 11q23355 (Tab. 15). Our results indicate that culture conditions probably induced in KARPAS422 a secondary rearrangement with a breakpoint inside the HSR, resulting in a duplication. These data suggest that the amplified sequences of the HSR may contain gene(s) conferring a selective growth advantage, or a fragile/recombination-prone site.

Fig. 40: FISH on KARPAS422 metaphases. A: RP11-794P6 probe (red); B: dual-color FISH with WCP4 (green) and WCP11 (red) probes. The experiment demonstrated the absence of translocations between chromosome 4 and chromosome 11 (B), and the presence of a derivative chromosome 11 due to duplication of the 11q23.1 region and translocation with a chromosome other than 4.

Fig. 41: Diagram of the rearranged chromosome 11 in KARPAS422 with inverted tandem repeat of segments from 11q (from Kobayashi et al.339). Arrows indicate the direction of the chromosome from centromere to telomere; 3.16 (empty circle) and 23.20 (full circle) are the cosmid probes used to detect inverted tandem repeats.

Dual-color FISH using BAC clones and WCP11 as probes on metaphases and interphase nuclei of U2932 revealed an inverted duplication with amplification of the region hybridized by all BAC clones (invdup(11q22-23)amp(11q23.1)) (Fig. 42). In addition, the derivative chromosome 11 showing the inverted duplication with amplification was also translocated, with a breakpoint at 11q23.1 (Fig. 42). According to the publicly available karyotype of U2932355 (Tab. 15), the translocation should involve chromosome 1, as t(1;11)(q12q23;q23).

Fig. 42: Dual color FISH using RP11-404A3 clone (red) and WCP11 (green) as probes on U2932 metaphases. The experiment demonstrated an inverted duplication and amplification of 11q23.1 region on chromosome 11 derivatives.

FISH analysis of JEKO1 using WCP11 and BAC clones as probes demonstrated that the 11q23 region localized on different chromosomes, indicating the presence of multiple rearrangements (Fig. 43). Moreover, in this cell line the copy number of the 11q23.1 region identified by the BAC clones varied between four and six copies, but only two normal chromosomes 11 were observed. This suggests duplication and low-level amplification of the 11q23.1 region in the JEKO1 cell line as well.

Fig. 43: FISH on JEKO1 cells. A: FISH with WCP11 probe (red); B: FISH with BAC794P6 as probe (red). These experiments demonstrated that JEKO1 carries several chromosome 11 derivatives. In addition, the 11q23.1 region identified by a panel of BAC probes was present on normal chromosome 11 and on different chromosome 11 derivatives.

4.4.4 Loss-of-function studies

Downregulation experiments aimed at silencing the genes POU2AF1 and PPP2R1B by RNA interference were performed to investigate the possible functional consequences of the 11q23.1 amplification. Cellular uptake of labelled siRNA in the cell lines JJN3 and KARPAS422, assessed by FACS, revealed a good transfection efficiency in KARPAS422, where 71% of the treated cells were positively transfected, whereas in JJN3, with 27% positive cells, the transfection was clearly inefficient (Fig. 44). Based on this, we decided to perform silencing experiments only with the cell line KARPAS422.

JJN3: transfection efficiency 27% KARPAS422: transfection efficiency 71%

Fig. 44: Cellular uptake of siRNA in JJN3 and KARPAS422 determined by FACS. Cells were transfected with 50 nM of Silencer Cy3-labelled GAPDH siRNA. An untreated control sample was run in parallel. siRNA uptake was measured by FACS and is visualized as fluorescence intensity distribution in control (empty curve) and siRNA-treated (full, red curve) cells. The control was used to define the background: the negative signal corresponds to 10^0 to 10^1 FL2-H (x-axis).
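The transfection efficiencies quoted for Fig. 44 are percentages of events whose fluorescence exceeds the background defined by the untreated control. A minimal Python sketch of that gating step follows; it assumes a simple fixed gate at the upper end of the first decade of the FL2-H scale, and the function name and the event values are hypothetical rather than taken from the actual FACS analysis.

```python
def transfection_efficiency(fl2_h_values, gate=10.0):
    """Percentage of events with FL2-H above the background gate.

    Assumption: the gate sits at the upper limit of the negative signal
    defined by the untreated control (first decade of the FL2-H scale).
    """
    values = list(fl2_h_values)
    if not values:
        raise ValueError("no events recorded")
    positive = sum(1 for v in values if v > gate)
    return 100.0 * positive / len(values)


# Hypothetical sample: 29 events inside the background gate, 71 above it.
events = [5.0] * 29 + [50.0] * 71
print(transfection_efficiency(events))  # prints 71.0
```

A fuller analysis would typically also account for the small fraction of control events that fall above the gate; that refinement is left out of this sketch.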

The results of RNA interference with POU2AF1 siRNA on KARPAS422 are summarized in figure 45. Treatment of KARPAS422 with 200 nM POU2AF1 siRNA had no knockdown effect at the mRNA level, either at 48 or at 72 hours post transfection (Fig. 45A). This is the same timeframe successfully used by Zhao et al. to detect downregulation of POU2AF1 expression after treatment with the same siRNA molecule364. Figure 45B shows the results of western blot analysis performed on whole-cell lysates collected 48 and 72 hours post transfection. POU2AF1 protein was detected with a polyclonal antibody raised against a peptide mapping at the C-terminus of human POU2AF1, which binds both the p34 (MW: 34 kDa) and p35 (MW: 35 kDa) isoforms. As expected from the mRNA results, no clear effect of the siRNA treatment was visible at the protein level either. Finally, the proliferation rate of treated cells was assessed relative to the negative control GL3: it was 94% at 48 hours and decreased to 84% at 72 hours post transfection (Fig. 45C). In conclusion, the custom POU2AF1 siRNA was not effective in downregulating mRNA and protein levels in the KARPAS422 cell line, although a weak decrease in cell proliferation rate was observed.

Fig. 45: RNA interference with POU2AF1 siRNA on KARPAS422. Cells were transfected with 200 nM POU2AF1 siRNA and GL3 siRNA (negative control). mRNA level (A: qRT-PCR, relative POU2AF1 expression), protein level (B: western blot, POU2AF1 isoforms p34 and p35) and cell proliferation rate (C) were determined 48h and 72h post transfection. B: TUBULIN level was measured as loading control; CTR: untreated cells.

The effects of RNA interference with PPP2R1B siRNA on KARPAS422 cells are summarized in figure 46. Treatment with 100 nM PPP2R1B siRNA caused an mRNA knockdown that was barely visible 48 hours post transfection (94% residual expression) and exceeded 20% at 72 hours post transfection (Fig. 46A), but no effects could be detected on protein level or cell proliferation. Western blot analysis results are shown in panel B (Fig. 46B). PP2A-Aβ was detected with a polyclonal antibody raised against a peptide mapping at the C-terminus of human PP2A-Aβ (MW: 65 kDa). No differences in protein level were appreciable among untreated, negative control and siRNA-treated cells, at either 48 or 72 hours post transfection. Cell proliferation rate was practically unaffected by RNA interference: relative to the negative control GL3, it was 99% in treated cells 48 hours post transfection and still 93% 72 hours after transfection (Fig. 46C).

Fig. 46: RNA interference with PPP2R1B siRNA on KARPAS422. Cells were transfected with 100 nM PPP2R1B siRNA and GL3 siRNA (negative control). mRNA level (A: qRT-PCR, relative PPP2R1B expression), protein level (B: western blot, PP2A-Aβ) and cell proliferation rate (C) were determined 48h and 72h post transfection. B: TUBULIN level was measured as loading control; CTR: untreated cells.
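Because the silencing results are reported partly as residual expression relative to the GL3 negative control and partly as knockdown, the conversion between the two is worth making explicit. The Python sketch below is illustrative only: the function name is hypothetical, and reading the 94% value at 48 hours as residual PPP2R1B expression is an interpretation of the text, not a stated fact.

```python
def knockdown_percent(expr_sirna, expr_negative_control):
    """Knockdown as the percentage reduction relative to the negative control."""
    if expr_negative_control <= 0:
        raise ValueError("negative-control expression must be positive")
    residual = 100.0 * expr_sirna / expr_negative_control  # residual expression in %
    return 100.0 - residual


# Assumed reading of the 48 h PPP2R1B result: 94% residual expression
# relative to GL3 corresponds to a knockdown of about 6%.
print(round(knockdown_percent(0.94, 1.0), 1))
```

The same conversion applies to the proliferation data, where the values are given as a percentage of the GL3 control.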

4.5 Discussion

In the course of aCGH studies applied to various B-cell tumors with the purpose of identifying relevant CN changes linked to malignancy, we detected a recurrent amplification at 11q23.1 in four lymphoma cell lines: JJN3, KARPAS422, U2932, and JEKO1. Later characterization revealed that the 11q23.1 amplification clearly overlapped among the cell lines JJN3, KARPAS422 and U2932. JEKO1 exhibited a different amplification pattern, with DNA gain at the common amplified region, surrounded by amplifications. We focused

on the minimal common amplified region delimited by the JJN3 amplicon and shared by JJN3, KARPAS422 and U2932. Our interest in the recurrent 11q23.1 amplification was mainly driven by the hypothesis that a highly increased DNA CN may determine the overexpression of one or more genes and the modification of their function, both events possibly playing a role in cancer development and progression. DNA dosage alterations occurring in somatic cells are frequent contributors to tumorigenesis. However, finding the real amplification target(s) remains a difficult task. First, the affected genomic regions usually include more than one gene, so measuring gene expression at the RNA and/or protein level is essential for candidate evaluation. Second, overexpression can occur in many genes mapping in the amplified region, not necessarily driven by the amplification368. Third, overexpression of candidate genes can be caused by various mechanisms, including gene dosage alteration but also translocations, mutations or epigenetic modifications. In our case, the 11q23.1 amplification did not seem to have a massive effect on transcription level. Transcriptome mapping with tiling arrays did not identify upregulated transcription activity outside the boundaries of known transcripts in the cell lines affected by the amplification with respect to the normal control RNA pool. The involvement of two candidate genes, POU2AF1 and PPP2R1B, and two miRNAs, mir-34b and mir-34c, was evaluated. The POU2AF1 (POU domain, class 2, associating factor 1; alias BOB1 or OBF1) gene encodes a protein also known as OCA-B, OBF1 or BOB1. POU2AF1 is a B-cell specific transcriptional co-activator of the transcription factors OCT1 and OCT2, and is therefore also named OBF1, for OCT binding factor 1369,370.
The ubiquitous OCT1 and/or the B-cell enriched OCT2 bind to the octamer motif of the Ig gene promoter and, by recruiting POU2AF1 into the transcriptional complex, activate Ig gene transcription. This B-cell specific protein plays an important role in antigen-driven stages of B-cell activation and maturation. POU2AF1-deficient mice show severe impairment of the AG-dependent response and maturation of B-cells371-373. They have primary follicles, but lack well-developed GCs372,373. POU2AF1 is likely to be involved also in B-cell signalling374, as evidenced by the presence of a second, myristoylated isoform localized predominantly in the cytoplasm and plasma membrane375. POU2AF1 has three isoforms: p34, p35 and p40, all encoded by the same mRNA. p40 has an alternative start codon with respect to p34 and is quickly processed to p35 after translation. p34 and p35 are identical in their internal sequence but differ in their N-terminal region, in vivo activity as transcriptional activators, and sub-cellular localization. p34 is mainly located in the nucleus and functions efficiently as a transcriptional activator. p35 has a myristoylation motif in the N-terminal sequence that determines its localization in the cytoplasm, partially bound to the plasma membrane. The in vivo ability of p35 to activate transcription from octamer element-containing promoters is much lower than that of p34, but p35 seems to play a role in B-cell signalling374,375.

Expression of POU2AF1 is restricted to B-lymphocytes and is regulated between the different stages of B-cell development376, with GC cells showing highly increased protein levels. Moreover, up-regulation of POU2AF1 was also found in GC-derived lymphoma376, whereas pre- and post-follicular B-cell lymphomas generally do not express POU2AF1. In established B-cell lines it is almost always expressed. An integrative genomic study by Largo et al. characterized genes amplified and overexpressed in nine HMCLs265: POU2AF1 was reported to lie in a frequently gained region and to be overexpressed regardless of the underlying DNA CN. More recently, a role for POU2AF1 as the 11q23.1 amplification target in MM cell lines was proposed364. POU2AF1 activation was linked to promotion of MM cell growth: the authors observed inhibition of MM cell growth upon POU2AF1 downregulation, and growth stimulation upon its ectopic expression. They proposed an oncogenic mechanism in which the B-cell maturation factor TNFRSF17, one of the receptors of BAFF, is a possible target of the transcriptional co-activator POU2AF1. They demonstrated the direct interaction between POU2AF1 and TNFRSF17, and the correlation of POU2AF1 and TNFRSF17 expression in MM cell lines and primary samples. We detected overexpression of POU2AF1 relative to a normal control RNA pool in all MM cell lines analyzed by qRT-PCR with the exception of JJN3. The MM cell line JJN3 carried the same 11q23.1 amplification as described by Zhao et al.364, but lacked POU2AF1 expression, demonstrating that 11q23.1 amplification in MM does not always influence POU2AF1 expression. That POU2AF1 expression does not depend only on DNA CN had already been observed265. In our previous study on MM presented in chapter 3, we detected 11q23.1 amplification in JJN3, one out of 17 HMCLs, whereas 11q23.1 gain was found in six MM patients and six HMCLs (data not shown).
Although 11q23.1 amplification can be considered a relatively infrequent event in MM, as also proposed by Zhao et al.364, its detection in MM patients demonstrated that it is not acquired during cell line establishment. The two DLBCL cell lines with the amplification, KARPAS422 and U2932, exhibited overexpression of POU2AF1, like all the other DLBCL and MCL samples. The role of POU2AF1 in B-cell malignancies is still controversial and, given its increased expression also at normal B-cell developmental stages, it is a weak candidate for being the 11q23.1 amplification target. The second candidate was PPP2R1B, a gene that encodes the β isoform of subunit A of the serine/threonine protein phosphatase 2A (PP2A Aβ). Serine/threonine protein phosphatase 2A (PP2A) is a trimeric protein composed of a core catalytic subunit (PP2A C, ~36 kDa, isoform α or β), a scaffold protein (PP2A A or PR65, ~65 kDa, isoform α or β) and a variable regulatory subunit (PP2A B). PP2A A mediates the binding between the catalytic subunit PP2A C and a wide variety of PP2A Bs. The two isoforms α and β of the scaffold subunit A are ubiquitously expressed and share 86% sequence identity. PP2A B subunits are responsible for the substrate specificity, the sub-cellular localization (i.e. nucleus or cytoplasm), and the catalytic activity of the protein complex. The PP2A family comprises several holoenzyme

complexes due to the existence of different isoforms of every subunit encoded by different genes, to two types of post-translational modifications of the catalytic subunit (phosphorylation and methylation) and to four different families of B subunits with respective isoforms and splice variants, giving rise to a large number of potential combinations377. PP2A is a ubiquitously expressed protein involved in many signalling pathways, and accounts for a large fraction of phosphatase activity in eukaryotic cells. Protein phosphorylation is known to play a crucial role in the regulation of many cellular processes. It involves both protein kinases (PKs) and protein phosphatases (PPs), which act in a tightly coordinated manner and can regulate their activities reciprocally or by themselves. The activity of PPs is controlled by different regulatory subunits and by specific inhibitors as well. In particular, PP2A has two regulatory subunits, A and B, and can be inhibited by microbial toxins and viral proteins (e.g. okadaic acid and SV40 small t antigen)378. The first link between PP2A and carcinogenesis was deduced from the observation that the tumor-promoting okadaic acid is a potent inhibitor of PP2A379. Later, a tumor suppressing activity of PP2A was also suggested by the discovery that PP2A is the target of the small tumor (ST) antigen of two transforming DNA viruses (reviewed in380). Additionally, PPP2R1B, encoding the PP2A Aβ subunit, is mutated in solid cancers381,382, pointing to a putative tumor suppressor function. In some human leukemias PP2A regulation is altered by translocations, leading to the formation of fusion proteins383. Because of the various functions of PP2A in cellular processes, understanding its role in tumor development is difficult. An additional complication is that complete inhibition of PP2A or loss of the catalytic or scaffold subunits causes cell death, indicating that PP2A is essential for cell viability384.
Recent findings provide evidence that PP2A inhibition cooperates with other oncogenic changes (e.g. p53 and RB inactivation, hTERT and H-RAS activation) and leads from immortalization to cell transformation384-386. PP2A inhibition and the consequent enhanced phosphorylation and activity of tumorigenic proteins (e.g. c-MYC transcription factor, AKT protein kinase, RalA GTPase) induce transformation, probably due to the loss of PP2A antagonism to the RAS signalling pathway384. Although evidence suggests that PPP2R1B might be a tumor suppressor gene, our data were difficult to interpret on this assumption. In fact, PPP2R1B was overexpressed in all three amplified cell lines JJN3, KARPAS422 and U2932, but also in all MM cell lines and in the majority of DLBCL and MCL samples. Again, the role of PPP2R1B remains unclear and we cannot indicate it as the oncogenic target of the 11q23.1 amplification. Another hypothesis we considered was the involvement of microRNAs as putative amplification targets. As already mentioned, the HG-U133A gene expression profiles collected during the MM project (chapter 3) did not help to identify transcripts with altered expression level due to the 11q23.1 amplification. For this reason we focused our attention on two microRNAs not interrogated by HG-U133A: mir-34b and mir-34c. mir-34b and mir-34c seem to be expressed as a miRNA cluster. Computational promoter analysis revealed the

presence of an evolutionarily conserved p53 consensus sequence about 3 kb upstream of the miRNA coding sequence, and p53 responsiveness was experimentally proven387. mir-34b and mir-34c are activated by p53 and represent novel effectors of its tumor suppression function, cooperating in reducing proliferation and adhesion-independent growth by miRNA-mediated silencing of target genes. Regulation of mir-34b/c by p53 was also reported in human fibroblasts388. Their expression reflected p53 status. Ectopic expression of miR-34b/c induced cell cycle arrest in both primary and tumour-derived cell lines. We observed high miR-34b and miR-34c expression in control RNA from normal lymph node, spleen and thymus, compared with absent or very low expression levels in cancer cell lines, even in cases with locus amplification. A different behaviour of miR-34b/c expression in normal and malignant cells can be supposed, in line with their tumor suppressing function, but a correlation with the 11q23.1 amplification seems unlikely. The described functions of PPP2R1B and miR-34b/c led us to suppose that the genetic rearrangements leading to 11q23.1 amplification could be responsible for their inactivation. In fact, amplifications often occur proximal to a break, for example a translocation breakpoint, or result from inverted repeats, where the breakpoints may disrupt tumor suppressor genes. Moreover, DNA amplification could be accompanied by interstitial deletions at breakpoints, not necessarily detected by aCGH but causing gene inactivation. This explanation would be particularly intriguing for PPP2R1B, located at the telomeric amplification breakpoint of JJN3 and KARPAS422. However, PPP2R1B expression was apparently not altered, as indicated by the qRT-PCR results, which makes this hypothesis unlikely. Regulation by p53 has been described for mir-34b/c expression387.
Our data exclude that mir-34b/c expression is influenced by DNA amplification, but genetic lesions linked to the amplification event could have an impact on their expression in malignant cells. The fact that we found low or absent miR-34b and miR-34c expression in cancer cell lines could be indicative of oncogenic deregulation. Unfortunately, our functional studies to characterize the involvement of POU2AF1 and PPP2R1B in the 11q23.1 amplification did not help to identify a target. This was mainly due to difficulties encountered in setting up the RNA interference experiments. Indeed, we could not reach satisfactory levels of mRNA knockdown for either POU2AF1 or PPP2R1B, which made a clear functional evaluation of their biological effect impossible. We ruled out the existence of splice variants not recognized by the siRNA molecules according to the data obtained from the Ensembl web-site (http://www.ensembl.org/Homo_sapiens/index.html, release 50, July 2008). POU2AF1 exhibited two known protein-coding splice variants, both containing the exon 4 targeted by the POU2AF1 siRNA. PPP2R1B showed three known protein-coding splice variants, all containing the exon 5 targeted by the PPP2R1B siRNA. The major obstacle was probably the low transfection efficiency typical of B-NHL cell lines, which is likely linked to the fact that these cells grow in suspension. A possible

solution to unsuccessful transfection could be the use of electroporation or viral vectors instead of cationic lipid reagents, but these alternatives could not be tested. In conclusion, our attempt to identify the target(s) of the 11q23.1 amplification among POU2AF1, PPP2R1B, mir-34b and mir-34c did not reveal strong candidates. Some possibilities remain open and need to be considered. On the one hand, we cannot exclude the existence of novel transcription activity overlapping annotated genes. On the other hand, the 11q23.1 region may be structurally fragile. Novel transcription activity overlapping annotated genes has already been observed in tiling array studies on global transcription activity. Many publications have revealed the complex nature of the human genome and the existence of novel transcripts with apparently little protein-coding capacity, called transcripts of unknown function (TUFs) (reviewed in141,389). Overlapping transcription has been observed from both the same (sense) and the opposite (antisense) DNA strand, meaning that different functional elements can co-locate in the same genomic region. Intronic natural antisense transcripts or intronic sense-antisense pairs have been described390. Sense-antisense transcription pairs are endogenous transcripts made of a sense RNA and the cis-encoded antisense RNA, i.e. an RNA transcribed from the opposite strand of the same genomic locus as the sense RNA and therefore overlapping with it390. In the human genome it is predicted that at least 20% of the genes might form sense-antisense pairs143, the majority of sense transcripts representing protein-coding genes and most of the antisense transcripts representing ncRNAs. Sense-antisense pairing is evolutionarily conserved391. The biological role of these ncRNAs is probably linked to antisense regulatory functions, but the exact mechanism of antisense regulation remains unclear.
Antisense intronic ncRNAs have been shown to be overexpressed in prostate tumors, and their levels have been found to correlate with tumor differentiation392, thus suggesting differential expression in cancer. Our tiling array data suggest that the transcription activity seen at PPP2R1B (JJN3 and KARPAS422), POU2AF1 (KARPAS422) or SNF1LK2 (KARPAS422) could be the result of novel transcription activity overlapping annotated genes. In fact, the transcription activity measured there was not limited to annotated exons, in contrast with what we observed in many other expressed genes (Fig. 48, chapter 5.4), but was unexpectedly high also in intronic regions. Unfortunately, the samples hybridized to the tiling arrays were prepared following the whole-transcript double-stranded target assay, and the qRT-PCR performed with TaqMan GeneExpression assays (Applied Biosystems) was also not orientation-specific; therefore, we still do not know the direction of the transcription activity. However, this assumption is particularly attractive for the transcription activity detected at the PPP2R1B locus, since its upregulated expression was observed both in JJN3 and KARPAS422, and its putative role as a tumor suppressor gene makes it an interesting target. We propose the sense protein-coding transcript PPP2R1B as a target of endogenous antisense cis-transcription, which might deregulate it. The fact that we observed expression of PPP2R1B by qRT-PCR with TaqMan

GeneExpression Assays is apparently in conflict with the proposed transcriptional deregulation by an antisense cis-transcript, which is expected to have an antagonistic RNAi-mediated influence on the complementary transcript. However, simultaneously expressed sense-antisense transcripts were observed to be either co-ordinately or discordantly regulated; therefore, the transcription interference model is probably too simplistic141. Moreover, we cannot be sure that the qRT-PCR system specifically detects only the sense protein-coding PPP2R1B. Although speculative, this hypothesis should not be discarded without experimental proof of the structure of the putative novel antisense cis-transcript. As already mentioned, the molecular mechanism of these TUFs is still unclear, and the putative novel antisense cis-transcript could be involved in the translational regulation of the protein-coding PPP2R1B and compete with its translation. Further investigations are needed to characterize the molecular and functional nature of this putative novel transcription activity: where it starts and stops, in which direction it is transcribed (forward or reverse), its relationship to the expression of the annotated gene, and its function (coding or non-coding gene). Molecular characterization would imply the use of techniques such as strand-specific RT-PCR, RACE, a combination of strand-specific RACE and hybridization of the obtained products to tiling arrays, cloning and sequencing of the PCR product, and northern blot analysis, all applied to total RNA samples rigorously free of genomic DNA contaminants. Functional characterization could be performed by gain-of-function and improved loss-of-function studies. If both sense and antisense transcripts could be detected by strand-specific protocols, an unbalanced sense-antisense pairing with altered biological function due to the amplification event could be hypothesized. 
Among alternative explanations for 11q23.1 amplification there is the relative fragility of the region, causing chromosome rearrangements. Indeed, FISH analyses performed on the four cell lines using BAC clones overlapping the amplified region showed four different patterns of complex chromosome rearrangements. A similar scenario was proposed, for example, for chromosome rearrangements on 8p in epithelial cancers393. The widespread loss of distal 8p is associated with a complex combination of losses and duplications/amplifications proximal to the loss, suggesting highly complex 8p rearrangements, probably due to instability of that region rather than to events primarily driven by loss of a tumor suppressor gene. Our data, together with previous works303, seem to suggest that 11q23.1 aberrations might be the result of genetic instability of this region, not necessarily associated with a direct effect on genes mapped in it. Moreover, we cannot exclude cis- or trans-regulatory effects on more distant genes. Very recently, a common polymorphism found at 11q23.1 and associated with risk of colorectal cancer was described394. The location of the causal locus was restricted to a 60 kb region, comprising the transcripts FLJ45803, LOC120376, C11orf53 and POU2AF1. That study could exclude both the presence of exonic mutations in the transcripts in linkage disequilibrium with the 11q23.1 polymorphism and cis-regulatory effects on neighbouring genes. Therefore, the authors proposed that the association between 11q23.1 polymorphism

and increased colorectal cancer risk could be a consequence of a still uncharacterized gene or miRNA within the 60 kb region, or of a sequence change with cis- or trans-regulatory effects on genes mapping outside 11q23.1.

5 TILING GENE EXPRESSION PROFILES ANALYSIS

Abstract

Tiling gene expression profiles collected during the characterization of the locus 11q23.1 (chapter 4) provided a detailed map of the transcription activity on chromosomes 8, 11, and 12 of the cell lines JJN3, JEKO1 and KARPAS422. We catalogued the putative transcribed units with respect to the cell line-specific DNA CN changes and we defined genomic regions where increased transcription activity matched increased DNA level (up genomic intervals, UGIs), and regions where decreased transcription activity matched decreased DNA level (down genomic intervals, DGIs). A precise comparison of the UGIs with currently known annotation tracks led to the identification of novel transcription activity. Among the various regions with novel transcription activity, we concentrated on the transcription activity detected at 11p12 in the cell line JJN3. The 11p12 putative novel transcript was located distal (>10kb) to previously annotated genes and consisted of 15 un-annotated, consecutively mapped UGIs, not interrupted by any known annotation. JJN3 UGIs at 11p12 covered about 128 kb and we assumed that they were part of a unique, long putative novel transcript. Two UGIs were validated by RT-PCR; others overlapped with previously described novel RNA transcripts and/or were perfectly aligned with high mammalian conservation scores. Molecular and functional characterization of the putative novel transcript at 11p12 and of other un-annotated UGIs could reveal interesting aspects of the complex architecture of the human genome.

5.1 Introduction

The molecular architecture of the human genome is much more complex than previously understood. As a consequence, investigative genomic approaches based on that limited understanding are inadequate. Tiling arrays are a relatively new array type developed for high-resolution, unbiased (i.e. independent of previous annotation) transcriptome mapping, and have already proved useful for detecting transcripts outside the currently annotated genes (such as well-characterized exons from RefSeq, mRNA from GenBank and public ESTs)136-140. Novel transcripts, also called transcripts of unknown function (TUFs), have been described within intronic regions of known genes, within intergenic genomic regions, and also overlapping annotated transcripts141. They include antisense cis- or trans-transcripts relative to the sense transcript, sense TUFs, either overlapping intronic regions of annotated genes or found in intergenic positions, and novel isoforms of annotated protein-coding transcripts with either a new transcription start site or extended, shortened, or completely new exons. In short, tiling array studies on genome-wide transcription activity have identified novel RNA species transcribed from regions currently not annotated and have revealed that the proportion of transcribed genome is larger than previously expected. There is a highly overlapping, complex transcription activity in the human genome, where one base pair can be part of many transcripts emanating from both strands of the genome. The functional importance of all the un-annotated transcripts is not yet known.

5.2 Aim

We performed gene expression analysis with Human Tiling 2.0R F arrays (Affymetrix) in the cell lines JJN3, KARPAS422 and JEKO1 for transcriptome mapping of the 11q23.1 amplified region (chapter 4). This allowed us to collect a detailed map of the transcription activity on chromosomes 8, 11, and 12 of three cell lines. Our aim was then to characterize the putative transcribed units identified on chromosomes 8, 11, and 12 in the cell lines JJN3, KARPAS422 and JEKO1 with respect to cell line-specific genomic profiles and to already known annotations. We wanted to perform a matching of the tiling expression profiles with the genomic profiles to localize genomic regions where increased transcription activity was matching with increased DNA level, and decreased transcription activity was matching with decreased DNA level. We also aimed to obtain a precise comparison of these regions with known annotations to characterize them and to possibly identify novel transcription activity.

5.3 Materials and Methods

Cell lines
The previously described MM cell line JJN3, the DLBCL cell line KARPAS422 and the MCL cell line JEKO1 were used (chapter 4).

Genome-wide DNA profiles and CN thresholds
Genome-wide DNA profiles and the estimated DNA CN thresholds for delimiting lost, gained and amplified regions of the cell lines were obtained as previously explained (chapter 4.3, Tab. 16).

Expression profiling by tiling arrays and data analysis
Tiling gene expression profiles as well as tiling array data analysis, from the MAT score calculation to the definition of significant transcription activity and putative transcribed units (steps 1-3 of tiling array data analysis), were obtained as previously explained (chapter 4.3). Further steps were added to the tiling array data analysis procedure; they are presented in the following paragraphs, at points 4 and 5.

4) Matching tiling expression profiles with genomic profiles by IGB: identification of up genomic intervals (UGIs) and down genomic intervals (DGIs)
First of all, the DNA CN profiles were exported from GTYPE in an IGB-compatible format as default “log2 ratio”. Default “log2 ratio” was obtained by the CNAT4 Batch analysis calculation on 250K array hybridization data with the default genomic smoothing bandwidth of 100 kb, as previously reported (chapter 4). The export procedure was done through the “CNAT Viewer” module of GTYPE. Each cell line-specific CNT file was opened separately, and the “Export” option was used to convert it into an IGB-compatible default “log2 ratio” format, selecting the metrics “log2 ratio”. The exported default CN profiles were loaded on IGB. As already explained, IGB is a genome browser useful to visualize and process genomic data (chapter 4). For example, it is possible to apply filters to the profiles and to save particular regions of interest as separate BED files. After having selected a profile or graph, the menu “Graph Adjuster” was opened, and the option “Graph Thresholds” was chosen. The “Visibility” menu was turned on, the “Direction” of the regions to be highlighted was specified as “>threshold” or “≤ threshold”, then the value of the MAT score threshold, as well as maxgap and minrun were given, and the regions corresponding to the imposed parameters were visualized as bars (BED file) under the profile. The “Offsets for Thresholded Regions” were set to “Start: 12” and “End: 13” due to the design of the arrays used both for genomic and gene expression profiles, with 25-mer probes, their position being measured at the centre of

the oligonucleotide. In the “Graph Thresholds” window it was also possible to obtain the BED file as a separate track with “Make Track”, and to save and export the BED file by right-clicking on the new track and selecting “Save Annotations”, then “Save tier as BED file”.
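The thresholding step described above can be illustrated with a minimal Python sketch. It is not the IGB implementation: the function name `thresholded_regions` is hypothetical, probe positions are assumed to be sorted probe centres on one chromosome, and we assume that maxgap is measured between consecutive passing probes, that minrun is applied before the offsets, and that the offsets widen each region by 12 bp upstream and 13 bp downstream.

```python
def thresholded_regions(positions, scores, threshold, maxgap, minrun,
                        direction=">", start_offset=12, end_offset=13):
    """Return (start, end) regions built from probes that pass the threshold.

    positions: sorted probe-centre coordinates (assumption: one chromosome).
    direction: ">" keeps scores above the threshold, "<=" keeps the others.
    """
    if direction == ">":
        passing = [p for p, s in zip(positions, scores) if s > threshold]
    else:
        passing = [p for p, s in zip(positions, scores) if s <= threshold]

    runs = []
    run_start = run_end = None
    for pos in passing:
        if run_start is None:
            run_start = run_end = pos          # open the first run
        elif pos - run_end <= maxgap:
            run_end = pos                      # gap small enough: extend run
        else:
            runs.append((run_start, run_end))  # gap too large: close run
            run_start = run_end = pos
    if run_start is not None:
        runs.append((run_start, run_end))

    # Keep runs longer than minrun, then widen them by the probe offsets.
    return [(s - start_offset, e + end_offset)
            for s, e in runs if e - s > minrun]


# Toy profile: probe centres and MAT scores (invented values).
positions = [100, 135, 170, 205, 240, 600, 635, 1000]
scores = [6.0, 6.2, 2.0, 5.5, 7.1, 6.0, 6.3, 8.0]
regions = thresholded_regions(positions, scores,
                              threshold=4.99, maxgap=200, minrun=100)
print(regions)
```

With these invented values only the first run of probes is long enough to be retained; the two shorter runs are dropped by the minrun filter.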

Following this procedure, we applied thresholds on default “log2 ratio” profiles for delimiting DNA losses and DNA gains according to the previously estimated CN thresholds (Tab. 17 and Fig. 47A). The same was done on the tiling expression profiles, loading the MAT score profiles (BAR files) on IGB and applying the thresholds summarized in table 17 (Fig. 47B). We ended up with BED files representing filtered tiling expression profiles (increased and decreased transcription activity) (Fig. 47D), and filtered genome-wide DNA profiles (gained and lost genomic DNA) (Fig. 47C). The matching of CN and tiling expression data to find UGIs and DGIs was performed with Galaxy, an open-source metaserver for integrative analysis of genomic data, available at http://galaxy.psu.edu/. Galaxy was developed by the Center for Comparative Genomics and Bioinformatics, Penn State University, PA, USA. On the Galaxy homepage, “start using galaxy” was selected, then the tool “Get Data”, followed by “Upload File”, through which all BED files were uploaded. Afterwards, the tools “Operate on Genomic Intervals”, then “Intersect: overlapping pieces of intervals” were chosen to match CN profiles and gene expression profiles. The identified UGIs and DGIs could be saved as BED files and could be imported in Excel or visualized by IGB (Fig. 47E).
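The idea behind the intersection of gained regions with overexpressed regions can be sketched as follows. This is only an illustration of computing the overlapping pieces of two interval sets, not the Galaxy tool itself; the function name `intersect` and the toy coordinates are hypothetical, and we assume half-open (BED-style) intervals on a single chromosome, each list being sorted and free of internal overlaps.

```python
def intersect(a, b):
    """Return the overlapping pieces of two sorted interval lists.

    a, b: lists of (start, end) tuples, half-open, sorted by start,
    with no overlaps inside each list (assumption of this sketch).
    """
    pieces = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:                 # the two intervals share >= 1 bp
            pieces.append((start, end))
        # Advance the interval that ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return pieces


# Toy data: gained DNA regions and overexpressed transcribed units.
gains = [(1000, 5000), (8000, 9000)]
overexpressed = [(900, 1200), (4800, 5200), (6000, 7000), (8500, 8600)]
ugis = intersect(gains, overexpressed)
print(ugis)
```

The same function applied to lost regions and downregulated units would give the analogue of DGIs. Overexpressed units lying entirely outside a gain, such as (6000, 7000) here, do not contribute any piece.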

Tab. 17: Parameters used in the elaboration of genomic and tiling expression profiles on IGB. These cell line-specific parameters were adopted in the “Graph Adjuster” option of IGB, and applied to the respective fields in the “Graph Thresholds” window.

Cell line    Profile          Threshold             maxgap          minrun
JEKO1        gain             log2 ratio > 0.27     ≤ 300,000 bp    > 0 bp
JEKO1        loss             log2 ratio < -0.23    ≤ 300,000 bp    > 0 bp
JEKO1        overexpression   MAT score > 4.99      ≤ 200 bp        > 100 bp
JEKO1        downregulation   MAT score < -4.99     ≤ 200 bp        > 100 bp
JJN3         gain             log2 ratio > 0.23     ≤ 300,000 bp    > 0 bp
JJN3         loss             log2 ratio < -0.02    ≤ 300,000 bp    > 0 bp
JJN3         overexpression   MAT score > 4.98      ≤ 200 bp        > 100 bp
JJN3         downregulation   MAT score < -4.98     ≤ 200 bp        > 100 bp
KARPAS422    gain             log2 ratio > 0.13     ≤ 300,000 bp    > 0 bp
KARPAS422    loss             log2 ratio < -0.25    ≤ 300,000 bp    > 0 bp
KARPAS422    overexpression   MAT score > 5.09      ≤ 200 bp        > 100 bp
KARPAS422    downregulation   MAT score < -5.09     ≤ 200 bp        > 100 bp

Fig. 47: Matching of genomic and tiling expression profiles. KARPAS422 on chromosome 8 is presented as an example (IGB visualization). The x-axis displays the physical mapping of chromosome 8. The y-axis reports (from the bottom): (A) in light blue the default DNA CN profile filtered for gains and the selected gained regions as bars; (B) in red the tiling expression profiles constituted by MAT scores filtered for overexpression and the selected putative transcribed units as red bars; (C) in light blue the track (BED file) representing gained regions; (D) in red the track (BED file) representing overexpressed regions; (E) in yellow the track representing UGIs. The green rectangle highlights a region with matching gain and overexpression, perfectly aligned with the corresponding UGI. The same region is also presented enlarged in the green box.

5) Annotating UGIs and DGIs
We selected the following annotation categories to be compared with UGIs and DGIs: UCSC genes, which is a track showing gene predictions based on data from RefSeq, GenBank, and UniProt; sno/miRNA, which displays microRNAs from the miRNA Registry at the Wellcome Trust Sanger Institute and small nucleolar RNAs (snoRNAs and scaRNAs) from the snoRNABase maintained at the Laboratoire de Biologie Moléculaire Eucaryote; Bertone Yale TAR, which shows the locations of transcriptionally active regions (TARs) as presented in Bertone et al.138; Affy long RNA (lRNA), which shows transcribed fragments (transfrags) representing long RNAs from the Affymetrix Transcriptome Phase 3 Long RNA Fragments project140. Bertone et al. used oligonucleotide arrays (36-mer probes) with an average resolution of 46 bp to interrogate the transcription of both strands of non-repetitive human genomic DNA138. A pool of human liver polyadenylated RNA from several individuals was analyzed. Probes with significant fluorescence intensity were compared to predicted (GenScan) or annotated genes (Ensembl, RefSeq). As expected, the greatest correlation was found with fully characterized transcripts and predicted genes, but novel transcribed sequences were also found. They defined novel Transcriptional Active Regions (TARs) as novel transcribed regions identified by at least five consecutive probes with fluorescence intensity in the top 90th intensity percentile (over all probes on the array) within a 250 nt window. TARs were validated by RT-PCR. The Affymetrix Transcriptome Phase 3 project analyzed human nuclear and cytosolic

polyadenylated long RNA (lRNA, > 200 nt) from eight cell lines and whole-cell RNAs shorter than 200 nt (short RNA, sRNA) using oligonucleotide arrays (25-mer probes) with 5 bp resolution140. To create maps of transcribed fragments (transfrags), adjacent positive probes were connected according to defined maxgaps (i.e. the maximal distance by which two neighbouring positive probes can be separated and still be considered part of the same transfrag), for sRNA 4 nt and for lRNA 11 nt, and minrun (i.e. the minimal size of a transfrag), for sRNA 7 nt and for lRNA 49 nt. Transfrags representing either long RNAs (lRNAs) or short RNAs (sRNAs) were detected and validated. We chose the lRNA fragments of the T cell lymphoma cell line Jurkat for annotating our UGIs and DGIs. The annotation data for chromosomes 8, 11, and 12 were downloaded from the UCSC Table Browser395,396 on the UCSC Genome Browser web site358 at http://genome.ucsc.edu/, based on the March 2006 human genome assembly. Tracks “Bertone Yale TAR” and “Affy Tx lRNA Reg: Jurkat lRNA” were selected from the group “Expression and Regulation” on the UCSC Table Browser window, whereas the tracks “UCSC genes: Known gene” and “sno/miRNA” were downloaded from the group “Gene and Gene Prediction Tracks”. For each annotation track (UCSC genes, sno/miRNA, Bertone Yale TAR, Jurkat lRNA) we prepared a single file containing the mapping information for chromosomes 8, 11, and 12. Moreover, the “UCSC Known gene” annotation track was modified with the statistical program R to subdivide each gene into exons and introns. In the end, the modified UCSC Known gene annotation track reported the start and end position of every exon and intron, which were numbered considering that for forward transcription (on + strand) the exon numbering increases with physical mapping, whereas for reverse transcription (on – strand) the exon numbering decreases with physical mapping. 
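The strand-aware subdivision of a gene into numbered exons and introns can be sketched as below. The original processing was done in R; this Python version is only an illustration under stated assumptions. The function name `split_gene` is hypothetical, the input mimics the exon start and end lists of a gene-prediction table, and the intron numbering convention (intron k lies between exon k and exon k+1 in transcript order) is our assumption, since the text specifies only the exon numbering.

```python
def split_gene(exon_starts, exon_ends, strand):
    """Subdivide one gene into numbered exons and introns.

    Exons are numbered in transcript order: increasing with genomic
    position on the + strand, decreasing on the - strand.
    Returns a list of (feature, number, start, end) in genomic order.
    """
    exons = sorted(zip(exon_starts, exon_ends))
    n = len(exons)
    features = []
    for k, (start, end) in enumerate(exons):
        exon_no = k + 1 if strand == "+" else n - k
        features.append(("exon", exon_no, start, end))
        if k < n - 1:
            # The intron spans the gap up to the next exon in genomic order.
            intron_no = k + 1 if strand == "+" else n - k - 1
            features.append(("intron", intron_no, end, exons[k + 1][0]))
    return features


# Toy three-exon gene (invented coordinates), on each strand.
forward = split_gene([100, 300, 500], [200, 400, 600], "+")
reverse = split_gene([100, 300, 500], [200, 400, 600], "-")
for feature in reverse:
    print(feature)
```

On the reverse strand the leftmost exon in genomic coordinates receives the highest number, which matches the numbering rule described above.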
Annotation of UGIs and DGIs with the downloaded tracks was again performed with Galaxy (http://galaxy.psu.edu/). The tool “Operate on Genomic Intervals”, then “Join” were selected. We joined, as first query, the BED file with all the genomic intervals, and all the annotation tracks (UCSC genes, sno/miRNA, Bertone Yale TAR, Jurkat lRNA) as second query. The minimal overlap was set to 1 bp, and the return option was set to “All records of first query (fill null with “.”)”. This means that all records present in the first query are listed in the resulting table, either annotated with one of the terms of the second query or with “.” in case of unknown annotation. The latter situation, when encountered in UGIs, was considered as novel transcription activity, and was further studied by loading the un-annotated genomic intervals (BED file) on the UCSC Genome Browser. There, the position of the regions of interest was evaluated with respect to additional annotations. We visualized the following categories, as presented in some screenshots in the results part (Figs. 50-53, 55): UCSC genes, sno/miRNA, Human mRNA (GenBank), Human ESTs (including unspliced), Bertone Yale TARs, Affymetrix Transcriptome Phase 3 Long and Short RNA Fragments, and mammalian conservation scores (evolutionary conservation scores are derived from comparative analysis of genomic DNA sequences from multiple species). To evaluate the type and quantity of

novel transcription activity from a qualitative point of view, we considered un-annotated UGIs with consecutive physical mapping and not interrupted by any known annotation as part of a unique putative novel transcript, whereas all the other un-annotated UGIs were considered as single entities, although the possibility that they belong to a unique transcript was not excluded.
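The annotation join and the grouping of consecutive un-annotated intervals can be illustrated with the following sketch. It is not the Galaxy "Join" tool: the function names `annotate` and `group_novel`, the annotation names and the coordinates are hypothetical, intervals are half-open and restricted to one chromosome, and "not interrupted by any known annotation" is interpreted here as "no annotation overlaps the gap between two neighbouring un-annotated intervals".

```python
def annotate(intervals, annotations, min_overlap=1):
    """Left outer join: every interval is reported, '.' marks no annotation.

    intervals: list of (start, end); annotations: list of (start, end, name).
    """
    rows = []
    for start, end in intervals:
        hits = [name for a_start, a_end, name in annotations
                if min(end, a_end) - max(start, a_start) >= min_overlap]
        for name in hits or ["."]:
            rows.append((start, end, name))
    return rows


def group_novel(novel, annotations):
    """Group neighbouring un-annotated intervals into putative transcripts.

    Two neighbours stay in the same group unless an annotation lies in
    the gap between them (interpretation assumed in this sketch).
    """
    groups = []
    for interval in sorted(novel):
        if groups:
            gap_start, gap_end = groups[-1][-1][1], interval[0]
            interrupted = any(a_start < gap_end and a_end > gap_start
                              for a_start, a_end, _ in annotations)
            if not interrupted:
                groups[-1].append(interval)
                continue
        groups.append([interval])
    return groups


# Toy UGIs and annotation records (invented values).
ugis = [(100, 200), (300, 400), (500, 600), (620, 680), (900, 1000)]
annotations = [(350, 380, "GENE_A"), (700, 800, "TAR_1")]

table = annotate(ugis, annotations)
novel = [(s, e) for s, e, name in table if name == "."]
transcripts = group_novel(novel, annotations)
print(table)
print(transcripts)
```

In this toy case the intervals (500, 600) and (620, 680) are grouped into one putative novel transcript, while the other un-annotated intervals remain single entities because an annotation lies between them and their neighbours.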

RT-PCR
RT-PCR was adopted to validate novel transcription activity. Total RNA was extracted with the TRIZOL reagent (Invitrogen) combined with purification and DNaseI digestion steps on RNeasy Mini Kit spin columns (Qiagen). The TRIZOL protocol was followed as previously described (chapter 4.3, “miRNA expression analysis: mir-34b and mir-34c”) until the first centrifugation step at 14,000 rpm, 4°C for 15’. Afterwards, 400 μl ethanol (Merck) were added dropwise to the supernatant and half of the resulting volume was loaded on a spin column. The column was centrifuged at 10,000 rpm for 15’’, then the rest of the RNA solution was loaded on the column and centrifuged again. The column was then washed with 350 μl RW1 buffer and centrifuged at 10,000 rpm for 15’’. 40 μl of DNase mix containing 35 μl RDD buffer and 5 μl DNase were added to each column, followed by incubation at room temperature for 15’. The column was then washed and centrifuged (10,000 rpm, 15’), first with 350 μl RW1 buffer, then two times with 500 μl RPE buffer. The second time the centrifugation was done at 10,000 rpm for 2’, and, after having discarded the flow-through, again at 10,000 rpm for 1’. Total RNA was eluted with 30-50 μl DEPC-water. Total RNA (1 μg) was reverse-transcribed with random hexamers using the SuperScript First-Strand Synthesis System for RT-PCR (Invitrogen) as previously described (chapter 4.3, “qRT-PCR”). A “no RT” control was added for each cell line RNA sample to exclude genomic DNA contamination: it contained all the reagents but lacked the SuperScript II RT enzyme. The resulting cDNA was diluted 1:5 with water and used as template for conventional PCRs with the AmpliTaq Gold PCR Kit (Applied Biosystems). PCR primers were designed with the open-source program primer3397 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) on the JJN3 regions UGI 9 (primer pair 11p_9) and UGI 11 (primer pair 11p_11) (Fig. 53A,B), and were synthesized by Sigma-Genosys (Steinheim, D) (Suppl. Tab. 6). Primers for ACTIN were also used to test its expression as endogenous control (Suppl. Tab. 6). PCRs were performed in a final volume of 25 μl with 1x GeneAmp PCR Buffer, 1.5 mM MgCl2, 0.2 mM dNTP mix, 0.2 μM each primer, 1 U Hot Start DNA Polymerase AmpliTaq Gold, and 2.5 μl diluted cDNA. The following PCR program was run for 11p_9 and 11p_11 on a MJR Thermal Cycler 200 (MJ Research): 95°C for 10’, followed by 26 cycles of 95°C for 15’’, 58°C for 30’’, 72°C for 20’’, and final extension at 72°C for 7’. As suggested in previous works on the detection of non-coding RNA (ncRNA) species136,146,149, we also increased the number of cycles of the PCR program to 40 in order to detect weak transcription levels. For ACTIN expression analysis the cycling program was: 95°C for 10’, followed by 26 cycles of 94°C for 15’’, 55°C for 30’’, 72°C for 15’’, and final extension at 72°C for 7’. A negative control containing all PCR reagents without any template DNA (NTC), a

genomic DNA sample as positive control and no RT controls were also added. Human total RNA of lymph node, thymus and spleen from healthy donors (Ambion) and their pool (CTR RNA pool) were used as representative of the normal counterpart. PCR products were separated by gel electrophoresis (2.5% agarose gel in TBE), stained with ethidium bromide and visualized using the AlphaImager 3400 (AlphaInnotech Corporation) as previously explained (chapter 4.3).

5.4 Results

Genome-wide DNA profiles and tiling gene expression profiles of the cell lines JJN3, KARPAS422 and JEKO1 were obtained as described in chapter 4. DNA gains and losses were delimited by cell-specific thresholds on DNA profiles. A filtering procedure was developed and applied to tiling gene expression profiles to isolate significant transcription activity and organize it in putative transcribed regions. MYC is presented as a positive control for the ability of the adopted strategy to analyze tiling array data in a reliable manner (Fig. 48).

Fig. 48: Tiling gene expression profile of JEKO1 viewed on Integrated Genome Browser (IGB). The x-axis displays the physical mapping of chromosome 8; the y-axis reports the MAT scores that constitute the tiling gene expression profile of the cell line JEKO1 (in red) relative to the CTR RNA pool. Under the tiling expression profile, the regions with significant transcription activity, i.e. with MAT scores > threshold (visible as a continuous line over the profile), are highlighted as red bars (BED files); they were considered putative transcribed units according to maxgap ≤ 200bp and minrun > 100bp. RefSeq annotation (in green) for the gene MYC is visualized. MYC is transcribed from the forward (+) strand, and shows three exons, an initial 5’-UTR and a final 3’-UTR.

We matched genomic with tiling expression profiles to identify UGIs, i.e. upregulated putative transcribed units with underlying DNA gain, and DGIs, i.e. downregulated putative transcribed units with underlying DNA loss (Fig. 47). As a result we obtained a table containing all UGIs and DGIs of three chromosomes (chromosome 8, 11, 12) and three cell lines (JJN3, KARPAS422, JEKO1). The genomic regions were defined with the start and end position on a given chromosome, the cell line presenting it, and the nature of the matching, i.e. if up or down. Considering the three cell lines and the three chromosomes all together, we detected more than 900 UGIs and more than 450 DGIs.

The lists of UGIs and DGIs were then annotated with the following categories: UCSC genes, sno/miRNA, Bertone Yale TAR, and Jurkat lRNA. All the genomic intervals were reported in a table, either annotated with one of the mentioned categories or accompanied by “.” if un-annotated. The genomic intervals were presented with start and end position, cell line to which they belong, and nature of the matching. Additionally, the table showed the annotation with chromosome, start and end positions, name of the annotation and strand direction (forward +, or reverse - strand). In particular, we focused our attention on novel transcription activity, corresponding to un-annotated UGIs (Suppl. Tab. 7), further grouped into unique, longer putative novel transcripts if they had consecutive physical mapping and were not interrupted by any known annotation. The majority of un-annotated regions were found on chromosome 11, mainly in the cell line JJN3. JJN3 presented 11 UGIs and four upregulated putative novel transcripts; JEKO1 showed six UGIs and two upregulated putative novel transcripts; and KARPAS422 had three UGIs and three upregulated putative novel transcripts. Un-annotated genomic regions on chromosomes 8 and 12 were less frequent. On chromosome 8, three UGIs were detected in JEKO1 and 11 in JJN3. On chromosome 12 we found one UGI in KARPAS422. Among the various regions with novel transcription activity, we studied the putative novel transcript detected at 11p12 in the cell line JJN3 (Fig. 49). The JJN3 genomic profile revealed a large gain at 11p12-p11.2, as presented in figure 28.2 and also visible in figure 49 as a continuous light blue bar covering the region of interest. The tiling expression profile showed a large region of significant transcription activity at 11p12, which could be further organized in putative transcribed units, corresponding to the UGIs reported in figure 49.

Fig. 49: Genomic and tiling expression profiles, and UGIs of JJN3 at 11p12 as visualized on IGB (March 2006 human genome assembly hg18). On the x-axis is displayed the physical mapping of chromosome 11. On y-axis are presented (from the bottom): in light blue the default DNA CN profile filtered for gains and the selected gained region as continuous blue bar; in red the tiling expression profile filtered for significant transcription activity, and as red bars the selected regions; in yellow the track representing UGIs. The nearest known annotation was found to the right of the transcription activity and corresponded to the gene LRRC4C on reverse strand (- strand).

The 11p12 putative novel transcript was located distal (>10kb) to previously annotated genes. The nearest known annotation was found about 16 kb to the right, and was represented by the gene LRRC4C on reverse strand (- strand). JJN3 significant transcription activity at 11p12 went from 39,836,655 to 40,077,368, covering about 240 kb (Fig. 50), whereas JJN3 UGIs covered about 128 kb (chr11:39,948,308-40,077,065) and were constituted by 15 intervals (Fig. 51).

Fig. 50: JJN3 significant transcription activity at 11p12 visualized on UCSC Genome Browser (http://genome.ucsc.edu, March 2006 human genome assembly (hg18); chr11:39,836,655-40,077,368; 240,714 bp). At the top of the UCSC screen shot is visible the significant transcription activity, loaded on the Genome Browser as custom track (BED file). The following user-selectable annotation tracks were added: UCSC genes, sno/miRNA, Human mRNAs (GenBank), Human ESTs (including unspliced), Bertone Yale TARs, Affymetrix Transcriptome Phase 3 Long and Short RNA Fragments, and Mammalian Conservation Score.

Fig. 51: JJN3 UGIs at 11p12 visualized on UCSC Genome Browser (http://genome.ucsc.edu, March 2006 human genome assembly (hg18); chr11:39,948,308-40,077,065; 128,758 bp). At the top of the UCSC screen shot are visible the 15 JJN3 UGIs, loaded on the Genome Browser as custom track (BED file). The following user-selectable annotation tracks were added: UCSC genes, sno/miRNA, Human mRNAs (GenBank), Human ESTs (including unspliced), Bertone Yale TARs, Affymetrix Transcriptome Phase 3 Long and Short RNA Fragments, and Mammalian Conservation Score.

Interestingly, seven out of 15 UGIs were overlapping with Affymetrix Transcriptome Phase 3 RNA Fragments (other than Jurkat lRNA fragments, already excluded by the annotating procedure), and/or perfectly aligned with high mammalian conservation scores (Fig. 52). Six UGIs were located in correspondence of regions highly conserved in mammals (UGI 1, 3, 4, 5, 9, 11), and four were overlapping with Affymetrix Transcriptome Phase 3 RNA Fragments (UGI 0, 1, 3, 9).

Fig. 52: Some examples of single JJN3 UGIs visualized on UCSC Genome Browser (http://genome.ucsc.edu, March 2006 human genome assembly (hg18)). A: region_0 partially overlapping with HeLa cytosolic lRNA fragment (330bp window). B: region_4 overlapping with high mammalian conservation score (210bp window). C: region_3 overlapping with HepG2 cytosolic lRNA and partially with high mammalian conservation score (777bp window). D: region_14 completely un-annotated (466bp window).

The novel transcription activity at region_9 (UGI 9) and region_11 (UGI 11) in JJN3 (Fig. 53) was validated by RT-PCR (Fig. 54A). JJN3 UGI 9 partially overlapped with HeLa cytosolic lRNA and with high mammalian conservation scores (Fig. 53A). JJN3 UGI 11 overlapped only with high mammalian conservation scores (Fig. 53B). After a PCR program with 26 amplification cycles, an amplicon for both region_9 and region_11 was detected by gel electrophoresis and ethidium bromide staining only in JJN3 and in genomic DNA, which was added as positive control (Fig. 54A). As suggested in previous work on the detection of rare, novel RNA species [136,146,149], we increased the number of PCR cycles to 40 in order to also detect weak transcription. This revealed weak transcription activity at region_9 in JEKO1, spleen and the CTR RNA pool, and at region_11 in JEKO1 and thymus (Fig. 54B). ACTIN expression was tested as endogenous control and confirmed the quality of the starting RNA (Fig. 54C). Contamination of the RNA by genomic DNA was excluded by the no-RT control. Human total RNA from lymph node, thymus and spleen of healthy donors, and the pool of these RNAs (CTR RNA pool), were considered free of genomic contamination as declared by the provider (Ambion). The purchased RNAs were treated with DNase, and DNA contamination was checked in the quality control procedure.
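The benefit of raising the cycle number from 26 to 40 can be illustrated with the usual idealized model of exponential amplification, in which the product grows by a factor of (1 + E) per cycle for an efficiency E between 0 and 1. The Python sketch below rests on that simplifying assumption; real reactions reach a plateau, so the figure is an upper bound and not a measurement from this work.

```python
def fold_gain(cycles_low, cycles_high, efficiency=1.0):
    """Theoretical extra amplification obtained by running more PCR cycles.

    Assumes ideal exponential growth by a factor of (1 + efficiency) per
    cycle, with no plateau; efficiency=1.0 means perfect doubling.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return (1.0 + efficiency) ** (cycles_high - cycles_low)


extra = fold_gain(26, 40)  # 14 additional cycles at perfect doubling
print(extra)               # 16384.0
```

Under these assumptions, a template roughly four orders of magnitude rarer than one giving a visible band at 26 cycles could still become detectable at 40 cycles, which is also why the no-RT control becomes more important at high cycle numbers.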


Fig. 53: JJN3 UGIs at 11p12 validated by RT-PCR (screen shot from: http://genome.ucsc.edu, March 2006 human genome assembly (hg18)). A: JJN3 UGI 9, partially overlapping with HeLa cytosolic lRNA and high mammalian conservation score; it was amplified with the RT-PCR primers 11p_9 (381bp window). B: JJN3 UGI 11, overlapping with high mammalian conservation score and amplified with the RT-PCR primers 11p_11 (307bp window).


Fig. 54: RT-PCR for the validation of novel transcription activity detected at 11p12. A: amplicons detected after 26 PCR cycles; B: amplicons detected after 40 PCR cycles; C: ACTIN expression tested as endogenous control.

Novel transcription activity common to different cell lines was identified only on chromosome 11: JEKO1 and JJN3 shared two UGIs and one upregulated putative novel transcript at 11q14.1, whereas KARPAS422 and JEKO1 had in common one UGI (Fig. 55).


Fig. 55: Recurrent novel transcription activity on chromosome 11 visualized on UCSC Genome Browser (http://genome.ucsc.edu, March 2006 human genome assembly (hg18)). A: recurrent UGI in JJN3 and JEKO1 at 11q14.1 (558bp window). B: recurrent UGI in JJN3 and JEKO1 at 11q22.3 (621bp window). C: recurrent UGI in JEKO1 and KARPAS422 at 11q14.1 (416bp window). D: recurrent upregulated putative novel transcript in JJN3 and JEKO1 at 11q14.1 (12,438bp window).

5.5 Discussion

Recent evidence from systematic studies of genome-wide transcription activity has revealed the complex nature of the human genome (reviewed in [141,389]), in which most of both DNA strands may be transcribed. It has even been suggested that all non-repeat portions of the human genome are transcribed [142]. Overlapping transcription has been observed, from the same or from opposite DNA strands, meaning that different functional elements can co-locate in the same genomic region. A further level of complexity is added by long-range interactions, including distal 5’-ends and transcriptional fusion of neighbouring genes. The majority of human genomic DNA is non-coding and consists of intronic and intergenic sequences, also referred to as “junk DNA”, which give rise to ncRNAs. ncRNAs are defined as eukaryotic transcripts with reduced protein-coding potential, often expressed at low levels. Their widespread occurrence suggests that they are functionally important [142]. ncRNAs are thought to be involved in relevant regulatory functions, such as the control of RNA stability, gene expression, tissue and cellular development, RNA modification, chromatin remodelling, alternative splicing, sub-cellular localization of proteins, and other processes [145].

The integrative analysis we performed with tiling gene expression and genomic profiles allowed the identification of upregulated putative transcribed units matching DNA gain (UGIs) and downregulated putative transcribed units matching DNA loss (DGIs). Selecting altered transcription activity with a paired DNA CN change allowed us to characterize the putative transcribed units with respect to the underlying genomic profile and to concentrate on a reduced number of putative transcribed units. For the same reason, we also adopted a very stringent cutoff to correlate UGIs and DGIs with known annotations: the minimal overlap required to match a UGI or DGI with an annotation track was set to 1 bp. As a consequence, only transcription activity mapping completely outside known genes remained un-annotated and could be strictly defined as novel. The number of novel transcription activity regions presented here is therefore certainly underestimated. We disregarded transcription activity overlapping with known sites, such as sense/antisense exonic/intronic transcripts, which could be indicative of novel exons or splice variants of known genes, or of unknown transcripts overlapping a known gene on the same or the opposite strand.

In particular, we were interested in UGIs located outside any known annotation, i.e. intergenic UGIs, which could represent novel transcription activity. We assumed that un-annotated UGIs that mapped consecutively and were not interrupted by any known annotated transcript were part of a single, longer putative novel transcript. This was the case for the 15 UGIs spanning 128 kb at 11p12 in the cell line JJN3. Their position, distant from known transcription sites, made it more likely that they represent a novel transcript. We confirmed by RT-PCR the expression of UGI 9 and UGI 11 in JJN3 and, to a lesser extent, also in JEKO1 and in some control RNAs from healthy tissues of the lymphatic system. Additional validation is needed to confirm the expression of the other UGIs that we presumed to be transcribed besides UGI 9 and UGI 11.
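The annotation rule described above (a 1 bp minimal overlap) and the grouping of consecutive un-annotated UGIs into one putative transcript can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the procedure actually implemented: intervals are half-open (start, end) pairs on one chromosome, all coordinates are invented for the example, and the function names are hypothetical.

```python
def overlap_bp(a, b):
    """Number of bases shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def hits_annotation(interval, annotations, min_overlap=1):
    """True if the interval shares at least min_overlap bp with any annotation."""
    return any(overlap_bp(interval, ann) >= min_overlap for ann in annotations)


def chain_unannotated(ugis, annotations, min_overlap=1):
    """Group consecutive un-annotated UGIs into putative novel transcripts.

    A new group is started whenever an annotated feature lies in the gap
    between two neighbouring un-annotated UGIs.
    """
    novel = sorted(u for u in ugis if not hits_annotation(u, annotations, min_overlap))
    chains = []
    for ugi in novel:
        if chains:
            gap = (chains[-1][-1][1], ugi[0])
            if not hits_annotation(gap, annotations, 1):
                chains[-1].append(ugi)
                continue
        chains.append([ugi])
    return chains


# Invented coordinates, for illustration only
ugis = [(100, 200), (300, 380), (500, 640), (900, 1000), (1200, 1300)]
annotations = [(630, 700), (1050, 1100)]

chains = chain_unannotated(ugis, annotations)                   # stringent 1 bp rule
relaxed = chain_unannotated(ugis, annotations, min_overlap=20)  # less stringent rule
print(chains)
print(relaxed)
```

With the 1 bp rule, the third UGI is discarded because it shares 10 bp with an annotation; with a 20 bp threshold it is retained as novel and joins the first group. This illustrates why the stringent cutoff underestimates novel transcription activity.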
The fact that UGI 9 and three more UGIs (UGI 0, 1, 3) overlapped with Affymetrix Transcriptome Phase 3 RNA Fragments, and that UGI 9 and UGI 11 together with four more UGIs (UGI 1, 3, 4, 5) aligned with high mammalian conservation scores, strengthened the probability that the 11p12 UGIs are really expressed and may represent a novel transcript. It would be of interest to characterize the structure of the 11p12 putative novel transcript in more detail: its strand of origin, its start and stop positions, the localization of exonic and intronic regions, the number of transcripts derived from the assembly of the UGIs, and its expression in different tumor types. The best way to proceed would be to perform 5’- and 3’-RACE on both strands, followed by cloning and sequencing of the products obtained. Moreover, a large panel of tumor samples should be analyzed in order to understand whether the phenomenon is particularly linked to MM, since we originally detected the upregulated transcription activity in a MM cell line.

A very important condition for avoiding false positive results when validating novel transcription activity is the quality of the starting RNA, which must be absolutely free of genomic DNA. Contamination should be excluded by the no-RT control (a negative control lacking reverse transcriptase). In addition, the detection of novel transcripts sometimes requires nested PCR or as many as 40 amplification cycles, since they seem to be present at low copy number per cell.

The biological function of the 11p12 putative novel transcript remains to be studied. It could be a novel protein-coding transcript or a ncRNA, better defined as a TUF. The 11p12 transcription activity maps far from known transcription sites and consists of consecutively mapped UGIs; the UGIs could therefore represent single exons of a novel protein-coding transcript. Once the structure of the novel transcript has been resolved, the presence of open reading frames should be analyzed to evaluate its protein-coding potential. The 11p12 activity could also represent a novel long ncRNA. Functional ncRNAs range from ~22 nt (miRNAs) to many kb, such as the murine 18 kb Xist (X-inactive-specific transcript) or 108 kb Air (antisense IGF2R RNA) ncRNAs. A study of mouse cDNA clones revealed that Air and Xist are not single, full-length transcripts but clusters of multiple, shorter, unspliced cDNAs [398]. The authors conducted a genome-wide search for novel macro ncRNA candidates and identified many such regions, mapping outside known protein-coding loci and with a mean length of 92 kb; some of them were experimentally validated. A similar approach could be adopted to characterize our putative novel long ncRNA at 11p12. Further analysis of the function of ncRNA genes might involve the search for sequence homologies, developmental expression patterns, subcellular localization, and both knockout and ectopic expression in in vivo studies combined with phenotype screening.

Our results open the way to a wide variety of future investigations aimed at the validation and characterization of other interesting, un-annotated transcribed regions, following the same procedure proposed for the 11p12 putative novel transcript. For example, it could be interesting to study recurrent novel transcription activity such as that identified on chromosome 11 (Fig. 55), or un-annotated transcription activity located at particular regions of the DNA profile, such as amplifications or breakpoints.
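The open reading frame analysis proposed above for evaluating protein-coding potential can be sketched as a six-frame scan for ATG-to-stop reading frames. The Python code below is a minimal illustration under stated assumptions: the function names and the toy sequence are invented, the standard stop codons are assumed, and a real analysis would start from the sequenced RACE products and would more likely rely on an established ORF-prediction tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Find the longest ATG...stop ORF over the six reading frames.

    Returns (length_nt, strand, frame); length includes the stop codon,
    and (0, '+', 0) is returned when no complete ORF is present.
    """
    seq = seq.upper()
    best = (0, "+", 0)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first start codon in this frame
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length > best[0]:
                        best = (length, strand, frame)
                    start = None                   # look for the next ORF downstream
    return best


toy_sequence = "CCATGAAATTTGGGTAACC"  # invented example, not a real 11p12 sequence
print(longest_orf(toy_sequence))      # (15, '+', 2)
```

A long ORF conserved across species would support a protein-coding transcript, whereas the absence of any substantial ORF would be more consistent with a long ncRNA.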

6 GENERAL DISCUSSION

The use of microarrays to perform aCGH and gene expression analyses in cancer samples allowed us to identify altered genomic regions potentially harbouring candidate genes involved in the development of malignancy, which represent interesting leads for further investigation. Genome-wide DNA profiles of MM samples gave a deeper insight into the extent and precise mapping of both typical and new chromosomal CN changes. The SNP arrays used to perform aCGH had the advantage of measuring both DNA CN and LOH probability, giving us the opportunity to observe different LOH patterns and to speculate on the underlying mechanisms of LOH formation. Moreover, the integrative analysis matching genome-wide DNA and gene expression profiles of MM samples identified strong candidate genes, as evidenced by the presence of genes with a proven role in MM pathogenesis, genes shared with relevant publications on MM genomic profiling, and known cancer genes not yet implicated in MM. We found the following particularly interesting: the involvement of genes coding for calcium-binding proteins of the S100A family, recurrent among the candidate oncogenes selected with both the matched filter and the COPA-based integrative strategies; the deregulation of the ribosome biogenesis and protein synthesis pathways, also confirmed by the functional annotation analysis; the quite common finding of developmental processes among gained and overexpressed transcripts, revealed by functional analysis; and the observation among the COPA 75th and COPA 90th outliers of BAGE, which codes for the B melanoma antigen and belongs to the cancer germline antigen gene family. These results suggest a possible pathogenic role for such genes and pathways in MM. Further studies are necessary to validate selected candidate genes with independent approaches, to elucidate their oncogenic potential and to understand their relevance as novel therapeutic targets.
A recurrent amplified region on chromosome 11 was identified in the 10K DNA CN profiles obtained during the MM project and in two other parallel projects in progress in our laboratory. The advent of 250K SNP arrays allowed a better characterization of the 11q23.1 amplification at the DNA level. The cell lines JJN3, KARPAS422 and U2932 presented a minimal common region of amplification, delimited by the amplicon of JJN3. We tried to discover the biological function of this recurrent genomic aberration by analyzing its effects on transcription activity. Neither RT-PCR nor tiling array approaches identified a clear target among the genes located in the amplified region. Transcriptome mapping with tiling arrays excluded the presence of novel transcription activity outside the boundaries of annotated genes. We therefore proposed the existence of novel transcription activity overlapping annotated genes. The design of the tiling gene expression assay did not reveal the direction of transcription. We indicated the putative tumor suppressor PPP2R1B as a possible target of endogenous cis-antisense transcription, which might deregulate the sense protein-coding PPP2R1B transcript. Further experiments are necessary to confirm this hypothesis, including the detection of the cis-antisense transcript by strand-specific RT-PCR approaches, followed by the characterization of its biological function and of the role of DNA amplification in its regulation. Another possible explanation for the 11q23.1 amplification is the relative fragility of the region, causing chromosome rearrangements, as confirmed by the FISH analyses.

Transcriptome mapping with tiling arrays on chromosomes 8, 11, and 12, and subsequent matching with the DNA profiles, allowed the selection of a high number of upregulated putative transcribed units coupled with DNA gain (UGIs) or downregulated putative transcribed units matching DNA loss (DGIs). Interestingly, we also detected currently un-annotated UGIs, probably indicative of novel transcription activity. The reliability of our findings regarding putative novel transcription activity was supported by the validation of un-annotated UGIs at 11p12. Full characterization of the putative novel transcript at 11p12 and of other un-annotated UGIs would require a wide variety of new, extensive investigations, which could give a deeper insight into still unexplained aspects of the complex architecture of the human genome. Additional data on novel transcription activity could be obtained by annotating UGIs and DGIs with a less stringent cutoff, in order to also take into account novel transcription activity overlapping with annotated genes, which was excluded from the current analysis, or by evaluating the tiling gene expression profiles independently of the DNA profiles, thus considering all the putative transcribed units, whether or not they match the underlying DNA CN.
We can conclude that the great advantage of global genomic approaches to the study of a complex disease such as cancer is the possibility of simultaneously identifying multiple aberrations, which probably act together as a complex pattern of mutations in tumor development. Genomic profiling can be used as a powerful systematic tool for the unbiased discovery of novel disease-related genes, opening the way to a variety of further experiments that continue to add knowledge about the still obscure aspects of cancer biology. We could demonstrate that integrative genomic approaches, especially the various combinations of genome-wide DNA CN and gene expression levels, are of great help in the identification of relevant chromosomal CN changes and in the selection of candidate genes. Last but not least, it has to be considered that alterations of DNA CN are a common cause, but not the only cause, of deregulated gene expression in a cancer cell. Genetic changes such as balanced translocations, changes in whole-genome ploidy and point mutations, as well as epigenetic modifications, are not detected by aCGH approaches but are as important as CN changes in generating abnormal gene expression profiles or abnormal protein functions. Moreover, modifications at the post-transcriptional level (i.e. not due to genetic lesions) may equally contribute to the activation or inactivation of relevant cancer genes.

7 BIBLIOGRAPHY

1. Rinaldi, A., et al., Genomic and expression profiling identifies the B-cell associated tyrosine kinase Syk as a possible therapeutic target in mantle cell lymphoma. Br J Haematol, 2006. 132(3): p. 303-16.
2. Rinaldi, A., et al., Comparative genome-wide profiling of post-transplant lymphoproliferative disorders and diffuse large B-cell lymphomas. Br J Haematol, 2006. 134(1): p. 27-36.
3. Forconi, F., et al., High density genome-wide DNA profiling reveals a remarkably stable profile in hairy cell leukaemia. Br J Haematol, 2008. 141(5): p. 622-30.
4. Bertoni, F., et al., Genome Wide-DNA Profiling of HIV-Related Non-Hodgkin Lymphomas: Implications for Disease Pathogenesis and Histogenesis. ASH Annual Meeting Abstracts, 2007. 110(11): p. 561.
5. Lombardi, L., et al., Molecular characterization of human multiple myeloma cell lines by integrative genomics: insights into the biology of the disease. Genes Chromosomes Cancer, 2007. 46(3): p. 226-38.
6. Tomlins, S.A., et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 2005. 310(5748): p. 644-8.
7. WHO, World cancer report, ed. B.W. Stewart and P. Kleihues. 2003, Lyon: IARC Press.
8. Boveri, T., Zur Frage der Entstehung maligner Tumoren. 1914, Jena, Germany: Gustav Fischer.
9. Muller, H.J., Artificial Transmutation of the Gene. Science, 1927. 66(1699): p. 84-87.
10. Futreal, P.A., et al., A census of human cancer genes. Nat Rev Cancer, 2004. 4(3): p. 177-83.
11. Mitelman, F., Recurrent chromosome aberrations in cancer. Mutat Res, 2000. 462(2-3): p. 247-53.
12. Beroukhim, R., et al., Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc Natl Acad Sci U S A, 2007. 104(50): p. 20007-12.
13. Hanahan, D. and R.A. Weinberg, The hallmarks of cancer. Cell, 2000. 100(1): p. 57-70.
14. Loeb, L.A., Mutator phenotype may be required for multistage carcinogenesis. Cancer Res, 1991. 51(12): p. 3075-9.
15. Beckman, R.A. and L.A. Loeb, Efficiency of carcinogenesis with and without a mutator mutation. Proc Natl Acad Sci U S A, 2006. 103(38): p. 14140-5.
16. Loeb, L.A., C.F. Springgate, and N. Battula, Errors in DNA replication as a basis of malignant changes. Cancer Res, 1974. 34(9): p. 2311-21.
17. Loeb, L.A., J.H. Bielas, and R.A. Beckman, Cancers exhibit a mutator phenotype: clinical implications. Cancer Res, 2008. 68(10): p. 3551-7; discussion 3557.

18. Tomlinson, I.P., M.R. Novelli, and W.F. Bodmer, The mutation rate and cancer. Proc Natl Acad Sci U S A, 1996. 93(25): p. 14800-3.
19. Fearon, E.R. and B. Vogelstein, A genetic model for colorectal tumorigenesis. Cell, 1990. 61(5): p. 759-67.
20. Lengauer, C., K.W. Kinzler, and B. Vogelstein, Genetic instabilities in human cancers. Nature, 1998. 396(6712): p. 643-9.
21. Sieber, O.M., K. Heinimann, and I.P. Tomlinson, Genomic instability--the engine of tumorigenesis? Nat Rev Cancer, 2003. 3(9): p. 701-8.
22. Bodmer, W., J.H. Bielas, and R.A. Beckman, Genetic instability is not a requirement for tumor development. Cancer Res, 2008. 68(10): p. 3558-60; discussion 3560-1.
23. Halazonetis, T.D., V.G. Gorgoulis, and J. Bartek, An oncogene-induced DNA damage model for cancer development. Science, 2008. 319(5868): p. 1352-5.
24. Draviam, V.M., S. Xie, and P.K. Sorger, Chromosome segregation and genomic stability. Curr Opin Genet Dev, 2004. 14(2): p. 120-5.
25. Aguilera, A. and B. Gomez-Gonzalez, Genome instability: a mechanistic view of its causes and consequences. Nat Rev Genet, 2008. 9(3): p. 204-17.
26. Jackson, S.P., Detecting, signalling and repairing DNA double-strand breaks. Biochem Soc Trans, 2001. 29(Pt 6): p. 655-61.
27. Maser, R.S. and R.A. DePinho, Connecting chromosomes, crisis, and cancer. Science, 2002. 297(5581): p. 565-9.
28. Durkin, S.G. and T.W. Glover, Chromosome fragile sites. Annu Rev Genet, 2007. 41: p. 169-92.
29. Bayani, J., et al., Genomic mechanisms and measurement of structural and numerical instability in cancer cells. Semin Cancer Biol, 2007. 17(1): p. 5-18.
30. Hyman, E., et al., Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res, 2002. 62(21): p. 6240-5.
31. Pollack, J.R., et al., Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci U S A, 2002. 99(20): p. 12963-8.
32. Heidenblad, M., et al., Microarray analyses reveal strong influence of DNA copy number alterations on the transcriptional patterns in pancreatic cancer: implications for the interpretation of genomic amplifications. Oncogene, 2005. 24(10): p. 1794-801.
33. de Nooij-van Dalen, A.G., et al., Chromosome loss with concomitant duplication and recombination both contribute most to loss of heterozygosity in vitro. Genes Chromosomes Cancer, 1998. 21(1): p. 30-8.
34. Thiagalingam, S., et al., Mechanisms underlying losses of heterozygosity in human colorectal cancers. Proc Natl Acad Sci U S A, 2001. 98(5): p. 2698-702.
35. Murthy, S.K., et al., Loss of heterozygosity associated with uniparental disomy in breast carcinoma. Mod Pathol, 2002. 15(12): p. 1241-50.

36. Raghavan, M., et al., Genome-wide single nucleotide polymorphism analysis reveals frequent partial uniparental disomy due to somatic recombination in acute myeloid leukemias. Cancer Res, 2005. 65(2): p. 375-8.
37. Teh, M.T., et al., Genomewide single nucleotide polymorphism microarray mapping in Basal cell carcinomas unveils uniparental disomy as a key somatic event. Cancer Res, 2005. 65(19): p. 8597-603.
38. Robinson, W.P., Mechanisms leading to uniparental disomy and their clinical consequences. Bioessays, 2000. 22(5): p. 452-9.
39. Lafon-Hughes, L., et al., Chromatin-remodelling mechanisms in cancer. Mutat Res, 2008. 658(3): p. 191-214.
40. Baylin, S.B. and J.E. Ohm, Epigenetic gene silencing in cancer - a mechanism for early oncogenic pathway addiction? Nat Rev Cancer, 2006. 6(2): p. 107-16.
41. Esteller, M., Cancer epigenomics: DNA methylomes and histone-modification maps. Nat Rev Genet, 2007. 8(4): p. 286-98.
42. Nowell, P.C. and D. Hungerford, A minute chromosome in human chronic granulocytic leukemia [abstract]. Science, 1960. 132: p. 1497.
43. Klein, G., Multiple phenotypic consequences of the Ig/Myc translocation in B-cell-derived tumors. Genes Chromosomes Cancer, 1989. 1(1): p. 3-8.
44. Rowley, J.D., The Philadelphia chromosome translocation. A paradigm for understanding leukemia. Cancer, 1990. 65(10): p. 2178-84.
45. Frohling, S. and H. Dohner, Chromosomal abnormalities in cancer. N Engl J Med, 2008. 359(7): p. 722-34.
46. Knudson, A.G., Jr., Hereditary cancer, oncogenes, and antioncogenes. Cancer Res, 1985. 45(4): p. 1437-43.
47. Knudson, A.G., Jr., Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A, 1971. 68(4): p. 820-3.
48. Lockwood, W.W., et al., DNA amplification is a ubiquitous mechanism of oncogene activation in lung and other cancers. Oncogene, 2008. 27(33): p. 4615-24.
49. Greenman, C., et al., Patterns of somatic mutation in human cancer genomes. Nature, 2007. 446(7132): p. 153-8.
50. Mullighan, C.G., et al., Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature, 2007. 446(7137): p. 758-64.
51. Weir, B.A., et al., Characterizing the cancer genome in lung adenocarcinoma. Nature, 2007. 450(7171): p. 893-8.
52. Maser, R.S., et al., Chromosomally unstable mouse tumours have genomic alterations similar to diverse human cancers. Nature, 2007. 447(7147): p. 966-71.
53. Easton, D.F., et al., Genome-wide association study identifies novel breast cancer susceptibility loci. Nature, 2007. 447(7148): p. 1087-93.

54. Gudmundsson, J., et al., Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nat Genet, 2007. 39(5): p. 631-7.
55. Albertson, D.G., Gene amplification in cancer. Trends Genet, 2006. 22(8): p. 447-55.
56. Albertson, D.G., et al., Chromosome aberrations in solid tumors. Nat Genet, 2003. 34(4): p. 369-76.
57. Myllykangas, S., et al., DNA copy number amplification profiling of human neoplasms. Oncogene, 2006. 25(55): p. 7324-32.
58. Pollack, J.R., et al., Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet, 1999. 23(1): p. 41-6.
59. Myllykangas, S. and S. Knuutila, Manifestation, mechanisms and mysteries of gene amplifications. Cancer Lett, 2006. 232(1): p. 79-89.
60. Hellman, A., et al., A role for common fragile site induction in amplification of human oncogenes. Cancer Cell, 2002. 1(1): p. 89-97.
61. Iliopoulos, D., et al., Roles of FHIT and WWOX fragile genes in cancer. Cancer Lett, 2006. 232(1): p. 27-36.
62. Yin, Y., et al., Wild-type p53 restores cell cycle control and inhibits gene amplification in cells with mutant p53 alleles. Cell, 1992. 70(6): p. 937-48.
63. Savelyeva, L. and M. Schwab, Amplification of oncogenes revisited: from expression profiling to clinical application. Cancer Lett, 2001. 167(2): p. 115-23.
64. Al-Kuraya, K., et al., Prognostic relevance of gene amplifications and coamplifications in breast cancer. Cancer Res, 2004. 64(23): p. 8534-40.
65. Weinstein, I.B. and A. Joe, Oncogene addiction. Cancer Res, 2008. 68(9): p. 3077-80; discussion 3080.
66. Xie, W., W. Ted Brown, and R.B. Denman, Translational regulation by non-protein-coding RNAs: different targets, common themes. Biochem Biophys Res Commun, 2008. 373(4): p. 462-6.
67. Calin, G.A., et al., Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas. Cancer Cell, 2007. 12(3): p. 215-29.
68. Calin, G.A., et al., Human microRNA genes are frequently located at fragile sites and genomic regions involved in cancers. Proc Natl Acad Sci U S A, 2004. 101(9): p. 2999-3004.
69. Zhang, L., et al., microRNAs exhibit high frequency genomic alterations in human cancer. Proc Natl Acad Sci U S A, 2006. 103(24): p. 9136-41.
70. Hwang, H.W. and J.T. Mendell, MicroRNAs in cell proliferation, cell death, and tumorigenesis. Br J Cancer, 2006. 94(6): p. 776-80.
71. Rodriguez, A., et al., Identification of mammalian microRNA host genes and transcription units. Genome Res, 2004. 14(10A): p. 1902-10.
72. Lagos-Quintana, M., et al., Identification of novel genes coding for small expressed RNAs. Science, 2001. 294(5543): p. 853-8.

73. Bartel, D.P., MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 2004. 116(2): p. 281-97.
74. Kim, V.N., MicroRNA biogenesis: coordinated cropping and dicing. Nat Rev Mol Cell Biol, 2005. 6(5): p. 376-85.
75. He, L. and G.J. Hannon, MicroRNAs: small RNAs with a big role in gene regulation. Nat Rev Genet, 2004. 5(7): p. 522-31.
76. Calin, G.A., et al., Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A, 2002. 99(24): p. 15524-9.
77. Cimmino, A., et al., miR-15 and miR-16 induce apoptosis by targeting BCL2. Proc Natl Acad Sci U S A, 2005. 102(39): p. 13944-9.
78. Ouillette, P., et al., Integrated genomic profiling of chronic lymphocytic leukemia identifies subtypes of deletion 13q14. Cancer Res, 2008. 68(4): p. 1012-21.
79. Eis, P.S., et al., Accumulation of miR-155 and BIC RNA in human B cell lymphomas. Proc Natl Acad Sci U S A, 2005. 102(10): p. 3627-32.
80. Volinia, S., et al., A microRNA expression signature of human solid tumors defines cancer gene targets. Proc Natl Acad Sci U S A, 2006. 103(7): p. 2257-61.
81. Ota, A., et al., Identification and characterization of a novel gene, C13orf25, as a target for 13q31-q32 amplification in malignant lymphoma. Cancer Res, 2004. 64(9): p. 3087-95.
82. Rinaldi, A., et al., Concomitant MYC and microRNA cluster miR-17-92 (C13orf25) amplification in human mantle cell lymphoma. Leuk Lymphoma, 2007. 48(2): p. 410-2.
83. Garzon, R., et al., MicroRNA expression and function in cancer. Trends Mol Med, 2006. 12(12): p. 580-7.
84. Bejerano, G., et al., Ultraconserved elements in the human genome. Science, 2004. 304(5675): p. 1321-5.
85. Calin, G.A. and C.M. Croce, MicroRNA-cancer connection: the beginning of a new tale. Cancer Res, 2006. 66(15): p. 7390-4.
86. Kim, S.Y. and W.C. Hahn, Cancer genomics: integrating form and function. Carcinogenesis, 2007. 28(7): p. 1387-92.
87. Venter, J.C., et al., The sequence of the human genome. Science, 2001. 291(5507): p. 1304-51.
88. McPherson, J.D., et al., A physical map of the human genome. Nature, 2001. 409(6822): p. 934-41.
89. Cowell, J.K. and L. Hawthorn, The application of microarray technology to the analysis of the cancer genome. Curr Mol Med, 2007. 7(1): p. 103-20.
90. Kallioniemi, A., et al., Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science, 1992. 258(5083): p. 818-21.

91. Barth, T.F., et al., Characteristic pattern of chromosomal gains and losses in primary large B-cell lymphomas of the gastrointestinal tract. Blood, 1998. 91(11): p. 4321-30.
92. Monni, O., et al., Gain of 3q and deletion of 11q22 are frequent aberrations in mantle cell lymphoma. Genes Chromosomes Cancer, 1998. 21(4): p. 298-307.
93. Rao, P.H., et al., Chromosomal and gene amplification in diffuse large B-cell lymphoma. Blood, 1998. 92(1): p. 234-40.
94. Bea, S., et al., Diffuse large B-cell lymphoma subgroups have distinct genetic profiles that influence tumor biology and improve gene-expression-based survival prediction. Blood, 2005. 106(9): p. 3183-90.
95. Solinas-Toldo, S., et al., Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer, 1997. 20(4): p. 399-407.
96. Lenz, G., et al., Molecular subtypes of diffuse large B-cell lymphoma arise by distinct genetic pathways. Proc Natl Acad Sci U S A, 2008. 105(36): p. 13520-5.
97. Pinkel, D., et al., High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet, 1998. 20(2): p. 207-11.
98. Snijders, A.M., et al., Assembly of microarrays for genome-wide measurement of DNA copy number. Nat Genet, 2001. 29(3): p. 263-4.
99. Klein, C.A., et al., Comparative genomic hybridization, loss of heterozygosity, and DNA sequence analysis of single cells. Proc Natl Acad Sci U S A, 1999. 96(8): p. 4494-9.
100. Fiegler, H., et al., DNA microarrays for comparative genomic hybridization based on DOP-PCR amplification of BAC and PAC clones. Genes Chromosomes Cancer, 2003. 36(4): p. 361-74.
101. Greshock, J., et al., 1-Mb resolution array-based comparative genomic hybridization using a BAC clone set optimized for cancer gene analysis. Genome Res, 2004. 14(1): p. 179-87.
102. Inazawa, J., J. Inoue, and I. Imoto, Comparative genomic hybridization (CGH)-arrays pave the way for identification of novel cancer-related genes. Cancer Sci, 2004. 95(7): p. 559-63.
103. Ishkanian, A.S., et al., A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet, 2004. 36(3): p. 299-303.
104. Lander, E.S., et al., Initial sequencing and analysis of the human genome. Nature, 2001. 409(6822): p. 860-921.
105. Fodor, S.P., et al., Light-directed, spatially addressable parallel chemical synthesis. Science, 1991. 251(4995): p. 767-73.
106. Singh-Gasson, S., et al., Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nat Biotechnol, 1999. 17(10): p. 974-8.
107. Gunderson, K.L., et al., A genome-wide scalable SNP genotyping assay using microarray technology. Nat Genet, 2005. 37(5): p. 549-54.

108. Carvalho, B., et al., High resolution microarray comparative genomic hybridisation analysis using spotted oligonucleotides. J Clin Pathol, 2004. 57(6): p. 644-6.
109. van den Ijssel, P., et al., Human and mouse oligonucleotide-based array CGH. Nucleic Acids Res, 2005. 33(22): p. e192.
110. Lucito, R., et al., Representational oligonucleotide microarray analysis: a high-resolution method to detect genome copy number variation. Genome Res, 2003. 13(10): p. 2291-305.
111. Lucito, R., et al., Detecting gene copy number fluctuations in tumor cells by microarray analysis of genomic representations. Genome Res, 2000. 10(11): p. 1726-36.
112. Lucito, R., et al., Genetic analysis using genomic representations. Proc Natl Acad Sci U S A, 1998. 95(8): p. 4487-92.
113. Wang, D.G., et al., Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science, 1998. 280(5366): p. 1077-82.
114. Kennedy, G.C., et al., Large-scale genotyping of complex DNA. Nat Biotechnol, 2003. 21(10): p. 1233-7.
115. Mei, R., et al., Genome-wide detection of allelic imbalance using human SNPs and high-density DNA arrays. Genome Res, 2000. 10(8): p. 1126-37.
116. Lindblad-Toh, K., et al., Loss-of-heterozygosity analysis of small-cell lung carcinomas using single-nucleotide polymorphism arrays. Nat Biotechnol, 2000. 18(9): p. 1001-5.
117. Bignell, G.R., et al., High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res, 2004. 14(2): p. 287-95.
118. Zhao, X., et al., An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res, 2004. 64(9): p. 3060-71.
119. Barrett, M.T., et al., Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. Proc Natl Acad Sci U S A, 2004. 101(51): p. 17765-70.
120. Brennan, C., et al., High-resolution global profiling of genomic alterations with long oligonucleotide microarray. Cancer Res, 2004. 64(14): p. 4744-8.
121. Selzer, R.R., et al., Analysis of chromosome breakpoints in neuroblastoma at sub-kilobase resolution using fine-tiling oligonucleotide array CGH. Genes Chromosomes Cancer, 2005. 44(3): p. 305-19.
122. Matsuzaki, H., et al., Parallel genotyping of over 10,000 SNPs using a one-primer assay on a high-density oligonucleotide array. Genome Res, 2004. 14(3): p. 414-25.
123. Matsuzaki, H., et al., Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nat Methods, 2004. 1(2): p. 109-11.

124. Wong, K.K., et al., Allelic imbalance analysis by high-density single-nucleotide polymorphic allele (SNP) array with whole genome amplified DNA. Nucleic Acids Res, 2004. 32(9): p. e69.
125. Walker, B.A., et al., Integration of global SNP-based mapping and expression arrays reveals key regions, mechanisms and genes important in the pathogenesis of multiple myeloma. Blood, 2006. 108(5): p. 1733-43.
126. Beroukhim, R., et al., Inferring loss-of-heterozygosity from unpaired tumors using high-density oligonucleotide SNP arrays. PLoS Comput Biol, 2006. 2(5): p. e41.
127. Lipshutz, R.J., et al., High density synthetic oligonucleotide arrays. Nat Genet, 1999. 21(1 Suppl): p. 20-4.
128. Li, C. and W. Hung Wong, Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol, 2001. 2(8): p. RESEARCH0032.
129. Li, C. and W.H. Wong, Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci U S A, 2001. 98(1): p. 31-6.
130. Alizadeh, A.A., et al., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 2000. 403(6769): p. 503-11.
131. Rosenwald, A., et al., The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med, 2002. 346(25): p. 1937-47.
132. Rosenwald, A., et al., Molecular diagnosis of primary mediastinal B cell lymphoma identifies a clinically favorable subgroup of diffuse large B cell lymphoma related to Hodgkin lymphoma. J Exp Med, 2003. 198(6): p. 851-62.
133. Savage, K.J., et al., The molecular signature of mediastinal large B-cell lymphoma differs from that of other diffuse large B-cell lymphomas and shares features with classical Hodgkin lymphoma. Blood, 2003. 102(12): p. 3871-9.
134. van de Vijver, M.J., et al., A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med, 2002. 347(25): p. 1999-2009.
135. Bianchini, G. and L. Gianni, Introducing new molecular technologies into routine clinical cancer care: is there an impact on the treatment of breast cancer? Ann Oncol, 2008. 19 Suppl 7: p. vii177-83.
136. Kapranov, P., et al., Large-scale transcriptional activity in chromosomes 21 and 22. Science, 2002. 296(5569): p. 916-9.
137. Schadt, E.E., et al., A comprehensive transcript index of the human genome generated using microarrays and computational approaches. Genome Biol, 2004. 5(10): p. R73.
138. Bertone, P., et al., Global identification of human transcribed sequences with genome tiling arrays. Science, 2004. 306(5705): p. 2242-6.

139. Cheng, J., et al., Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science, 2005. 308(5725): p. 1149-54.
140. Kapranov, P., et al., RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science, 2007. 316(5830): p. 1484-8.
141. Gingeras, T.R., Origin of phenotypes: genes and transcripts. Genome Res, 2007. 17(6): p. 682-90.
142. Willingham, A.T. and T.R. Gingeras, TUF love for "junk" DNA. Cell, 2006. 125(7): p. 1215-20.
143. Chen, J., et al., Over 20% of human transcripts might form sense-antisense pairs. Nucleic Acids Res, 2004. 32(16): p. 4812-20.
144. Mockler, T.C., et al., Applications of DNA tiling arrays for whole-genome analysis. Genomics, 2005. 85(1): p. 1-15.
145. Mattick, J.S., RNA regulation: a new genetics? Nat Rev Genet, 2004. 5(4): p. 316-23.
146. Kampa, D., et al., Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Res, 2004. 14(3): p. 331-42.
147. Kapranov, P., et al., Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays. Genome Res, 2005. 15(7): p. 987-97.
148. Gustincich, S., et al., The complexity of the mammalian transcriptome. J Physiol, 2006. 575(Pt 2): p. 321-32.
149. Cawley, S., et al., Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell, 2004. 116(4): p. 499-509.
150. Brack, C., et al., A complete immunoglobulin gene is created by somatic recombination. Cell, 1978. 15(1): p. 1-14.
151. Jung, D. and F.W. Alt, Unraveling V(D)J recombination; insights into gene regulation. Cell, 2004. 116(2): p. 299-311.
152. Oettinger, M.A., et al., RAG-1 and RAG-2, adjacent genes that synergistically activate V(D)J recombination. Science, 1990. 248(4962): p. 1517-23.
153. Rajewsky, K., Clonal selection and learning in the antibody system. Nature, 1996. 381(6585): p. 751-8.
154. Walsh, S.H. and R. Rosenquist, Immunoglobulin gene analysis of mature B-cell malignancies: reconsideration of cellular origin and potential antigen involvement in pathogenesis. Med Oncol, 2005. 22(4): p. 327-41.
155. Klein, U. and R. Dalla-Favera, Germinal centres: role in B-cell physiology and malignancy. Nat Rev Immunol, 2008. 8(1): p. 22-33.
156. Papavasiliou, F.N. and D.G. Schatz, Cell-cycle-regulated DNA double-stranded breaks in somatic hypermutation of immunoglobulin genes. Nature, 2000. 408(6809): p. 216-21.

157. Bross, L., et al., DNA double-strand breaks in immunoglobulin genes undergoing somatic hypermutation. Immunity, 2000. 13(5): p. 589-97.
158. Wilson, P.C., et al., Somatic hypermutation introduces insertions and deletions into immunoglobulin V genes. J Exp Med, 1998. 187(1): p. 59-70.
159. Chaudhuri, J. and F.W. Alt, Class-switch recombination: interplay of transcription, DNA deamination and DNA repair. Nat Rev Immunol, 2004. 4(7): p. 541-52.
160. Muramatsu, M., et al., Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell, 2000. 102(5): p. 553-63.
161. Revy, P., et al., Activation-induced cytidine deaminase (AID) deficiency causes the autosomal recessive form of the Hyper-IgM syndrome (HIGM2). Cell, 2000. 102(5): p. 565-75.
162. Klein, U., et al., Transcriptional analysis of the B cell germinal center reaction. Proc Natl Acad Sci U S A, 2003. 100(5): p. 2639-44.
163. Shaffer, A.L., et al., Signatures of the immune response. Immunity, 2001. 15(3): p. 375-85.
164. Phan, R.T. and R. Dalla-Favera, The BCL6 proto-oncogene suppresses p53 expression in germinal-centre B cells. Nature, 2004. 432(7017): p. 635-9.
165. Hu, B.T., et al., Telomerase is up-regulated in human germinal center B cells in vivo and can be re-expressed in memory B cells activated in vitro. J Immunol, 1997. 159(3): p. 1068-71.
166. Phan, T.G., et al., High affinity germinal center B cells are actively selected into the plasma cell compartment. J Exp Med, 2006. 203(11): p. 2419-24.
167. Blink, E.J., et al., Early appearance of germinal center-derived memory B cells and plasma cells in blood after primary immunization. J Exp Med, 2005. 201(4): p. 545-54.
168. Cattoretti, G., et al., BCL-6 protein is expressed in germinal-center B cells. Blood, 1995. 86(1): p. 45-53.
169. Allman, D., et al., BCL-6 expression during B-cell activation. Blood, 1996. 87(12): p. 5257-68.
170. Phan, R.T., et al., Genotoxic stress regulates expression of the proto-oncogene Bcl6 in germinal center B cells. Nat Immunol, 2007. 8(10): p. 1132-9.
171. Turner, C.A., Jr., D.H. Mack, and M.M. Davis, Blimp-1, a novel zinc finger-containing protein that can drive the maturation of B lymphocytes into immunoglobulin-secreting cells. Cell, 1994. 77(2): p. 297-306.
172. Shapiro-Shelef, M., et al., Blimp-1 is required for the formation of immunoglobulin secreting plasma cells and pre-plasma memory B cells. Immunity, 2003. 19(4): p. 607-20.
173. Kuo, T.C., et al., Repression of BCL-6 is required for the formation of human memory B cells in vitro. J Exp Med, 2007. 204(4): p. 819-30.

174. Cobaleda, C., et al., Pax5: the guardian of B cell identity and function. Nat Immunol, 2007. 8(5): p. 463-70.
175. Kuppers, R., et al., Cellular origin of human B-cell lymphomas. N Engl J Med, 1999. 341(20): p. 1520-9.
176. Kuppers, R. and R. Dalla-Favera, Mechanisms of chromosomal translocations in B cell lymphomas. Oncogene, 2001. 20(40): p. 5580-94.
177. Shen, H.M., et al., Mutation of BCL-6 gene in normal B cells by the process of somatic hypermutation of Ig genes. Science, 1998. 280(5370): p. 1750-2.
178. Pasqualucci, L., et al., BCL-6 mutations in normal germinal center B cells: evidence of somatic hypermutation acting outside Ig loci. Proc Natl Acad Sci U S A, 1998. 95(20): p. 11816-21.
179. Pasqualucci, L., et al., Hypermutation of multiple proto-oncogenes in B-cell diffuse large-cell lymphomas. Nature, 2001. 412(6844): p. 341-6.
180. Kuppers, R., Mechanisms of B-cell lymphoma pathogenesis. Nat Rev Cancer, 2005. 5(4): p. 251-62.
181. Borche, L., et al., Evidence that chronic lymphocytic leukemia B lymphocytes are frequently committed to production of natural autoantibodies. Blood, 1990. 76(3): p. 562-9.
182. Sthoeger, Z.M., et al., Production of autoantibodies by CD5-expressing B lymphocytes from patients with chronic lymphocytic leukemia. J Exp Med, 1989. 169(1): p. 255-68.
183. Hermine, O., et al., Regression of splenic lymphoma with villous lymphocytes after treatment of hepatitis C virus infection. N Engl J Med, 2002. 347(2): p. 89-94.
184. Wotherspoon, A.C., et al., Regression of primary low-grade B-cell gastric lymphoma of mucosa-associated lymphoid tissue type after eradication of Helicobacter pylori. Lancet, 1993. 342(8871): p. 575-7.
185. Hussell, T., et al., Helicobacter pylori-specific tumour-infiltrating T cells provide contact dependent help for the growth of malignant B cells in low-grade gastric lymphoma of mucosa-associated lymphoid tissue. J Pathol, 1996. 178(2): p. 122-7.
186. Caldwell, R.G., et al., Epstein-Barr virus LMP2A drives B cell development and survival in the absence of normal B cell receptor signals. Immunity, 1998. 9(3): p. 405-11.
187. Casola, S., et al., B cell receptor signal strength determines B cell fate. Nat Immunol, 2004. 5(3): p. 317-27.
188. Sagaert, X., B. Sprangers, and C. De Wolf-Peeters, The dynamics of the B follicle: understanding the normal counterpart of B-cell-derived malignancies. Leukemia, 2007. 21(7): p. 1378-86.
189. Kyle, R.A. and S.V. Rajkumar, Multiple myeloma. Blood, 2008. 111(6): p. 2962-72.
190. Mahtouk, K., et al., Input of DNA microarrays to identify novel mechanisms in multiple myeloma biology and therapeutic applications. Clin Cancer Res, 2007. 13(24): p. 7289-95.

191. Shapiro-Shelef, M. and K. Calame, Plasma cell differentiation and multiple myeloma. Curr Opin Immunol, 2004. 16(2): p. 226-34.
192. Shapiro-Shelef, M. and K. Calame, Regulation of plasma-cell development. Nat Rev Immunol, 2005. 5(3): p. 230-42.
193. Kuehl, W.M. and P.L. Bergsagel, Multiple myeloma: evolving genetic events and host interactions. Nat Rev Cancer, 2002. 2(3): p. 175-87.
194. Huff, C.A. and W. Matsui, Multiple myeloma cancer stem cells. J Clin Oncol, 2008. 26(17): p. 2895-900.
195. Hideshima, T., et al., Advances in biology of multiple myeloma: clinical applications. Blood, 2004. 104(3): p. 607-18.
196. Bergsagel, P.L. and W.M. Kuehl, Molecular pathogenesis and a consequent classification of multiple myeloma. J Clin Oncol, 2005. 23(26): p. 6333-8.
197. Kyle, R.A. and S.V. Rajkumar, Monoclonal gammopathy of undetermined significance. Br J Haematol, 2006. 134(6): p. 573-89.
198. Kyle, R.A. and S.V. Rajkumar, Multiple myeloma. N Engl J Med, 2004. 351(18): p. 1860-73.
199. Lynch, H.T., et al., Phenotypic heterogeneity in multiple myeloma families. J Clin Oncol, 2005. 23(4): p. 685-93.
200. Lynch, H.T., et al., Familial multiple myeloma: a family study and review of the literature. J Natl Cancer Inst, 2001. 93(19): p. 1479-83.
201. Durie, B.G., The epidemiology of multiple myeloma. Semin Hematol, 2001. 38(2 Suppl 3): p. 1-5.
202. Blokhin, N., et al., [Clinical experiences with sarcolysin in neoplastic diseases.]. Ann N Y Acad Sci, 1958. 68(3): p. 1128-32.
203. Alexanian, R., et al., Treatment for multiple myeloma. Combination chemotherapy with different melphalan dose regimens. JAMA, 1969. 208(9): p. 1680-5.
204. McElwain, T.J. and R.L. Powles, High-dose intravenous melphalan for plasma-cell leukaemia and myeloma. Lancet, 1983. 2(8354): p. 822-4.
205. Barlogie, B., et al., High-dose chemoradiotherapy and autologous bone marrow transplantation for resistant multiple myeloma. Blood, 1987. 70(3): p. 869-72.
206. Dimopoulos, M.A. and E. Kastritis, The role of novel drugs in multiple myeloma. Ann Oncol, 2008. 19 Suppl 7: p. vii121-7.
207. D'Amato, R.J., et al., Thalidomide is an inhibitor of angiogenesis. Proc Natl Acad Sci U S A, 1994. 91(9): p. 4082-5.
208. Singhal, S., et al., Antitumor activity of thalidomide in refractory multiple myeloma. N Engl J Med, 1999. 341(21): p. 1565-71.
209. Hideshima, T., et al., Thalidomide and its analogs overcome drug resistance of human multiple myeloma cells to conventional therapy. Blood, 2000. 96(9): p. 2943-50.

210. Gupta, D., et al., Adherence of multiple myeloma cells to bone marrow stromal cells upregulates vascular endothelial growth factor secretion: therapeutic applications. Leukemia, 2001. 15(12): p. 1950-61.
211. Davies, F.E., et al., Thalidomide and immunomodulatory derivatives augment natural killer cell cytotoxicity in multiple myeloma. Blood, 2001. 98(1): p. 210-6.
212. Richardson, P.G., et al., Immunomodulatory drug CC-5013 overcomes drug resistance and is well tolerated in patients with relapsed multiple myeloma. Blood, 2002. 100(9): p. 3063-7.
213. Mitsiades, N., et al., Apoptotic signaling induced by immunomodulatory thalidomide analogs in human multiple myeloma cells: therapeutic implications. Blood, 2002. 99(12): p. 4525-30.
214. Adams, J., et al., Proteasome inhibitors: a novel class of potent and effective antitumor agents. Cancer Res, 1999. 59(11): p. 2615-22.
215. Orlowski, R.Z., et al., Phase I trial of the proteasome inhibitor PS-341 in patients with refractory hematologic malignancies. J Clin Oncol, 2002. 20(22): p. 4420-7.
216. Richardson, P.G., et al., A phase 2 study of bortezomib in relapsed, refractory myeloma. N Engl J Med, 2003. 348(26): p. 2609-17.
217. Greipp, P.R., et al., International staging system for multiple myeloma. J Clin Oncol, 2005. 23(15): p. 3412-20.
218. Blade, J., L. Rosinol, and M.T. Cibeira, Prognostic factors for multiple myeloma in the era of novel agents. Ann Oncol, 2008. 19 Suppl 7: p. vii117-20.
219. Harousseau, J.L., Autologous transplantation for multiple myeloma. Ann Oncol, 2008. 19 Suppl 7: p. vii128-33.
220. San-Miguel, J.F. Current treatment approaches in multiple myeloma. in 10th International Conference on Malignant Lymphoma 4-7 June. 2008. Lugano, Switzerland.
221. Smadja, N.V., et al., Chromosomal analysis in multiple myeloma: cytogenetic evidence of two different diseases. Leukemia, 1998. 12(6): p. 960-9.
222. Debes-Marun, C.S., et al., Chromosome abnormalities clustering and its implications for pathogenesis and prognosis in myeloma. Leukemia, 2003. 17(2): p. 427-36.
223. Wuilleme, S., et al., Ploidy, as detected by fluorescence in situ hybridization, defines different subgroups in multiple myeloma. Leukemia, 2005. 19(2): p. 275-8.
224. Chng, W.J., et al., Ploidy status rarely changes in myeloma patients at disease progression. Leuk Res, 2005.
225. Fonseca, R., et al., The recurrent IgH translocations are highly associated with nonhyperdiploid variant multiple myeloma. Blood, 2003. 102(7): p. 2562-7.
226. Carrasco, D.R., et al., High-resolution genomic profiles define distinct clinico-pathogenetic subgroups of multiple myeloma patients. Cancer Cell, 2006. 9(4): p. 313-25.

227. Bergsagel, P.L., et al., Promiscuous translocations into immunoglobulin heavy chain switch regions in multiple myeloma. Proc Natl Acad Sci U S A, 1996. 93(24): p. 13931-6.
228. Bergsagel, P.L. and W.M. Kuehl, Chromosome translocations in multiple myeloma. Oncogene, 2001. 20(40): p. 5611-22.
229. Chesi, M., et al., Frequent translocation t(4;14)(p16.3;q32.3) in multiple myeloma is associated with increased expression and activating mutations of fibroblast growth factor receptor 3. Nat Genet, 1997. 16(3): p. 260-4.
230. Chesi, M., et al., The t(4;14) translocation in myeloma dysregulates both FGFR3 and a novel gene, MMSET, resulting in IgH/MMSET hybrid transcripts. Blood, 1998. 92(9): p. 3025-34.
231. Chesi, M., et al., Activated fibroblast growth factor receptor 3 is an oncogene that contributes to tumor progression in multiple myeloma. Blood, 2001. 97(3): p. 729-36.
232. Ayton, P.M. and M.L. Cleary, Molecular mechanisms of leukemogenesis mediated by MLL fusion proteins. Oncogene, 2001. 20(40): p. 5695-707.
233. Hurt, E.M., et al., Overexpression of c-maf is a frequent oncogenic event in multiple myeloma that promotes proliferation and pathological interactions with bone marrow stroma. Cancer Cell, 2004. 5(2): p. 191-9.
234. Trudel, S., et al., Inhibition of fibroblast growth factor receptor 3 induces differentiation and apoptosis in t(4;14) myeloma. Blood, 2004. 103(9): p. 3521-8.
235. Santra, M., et al., A subset of multiple myeloma harboring the t(4;14)(p16;q32) translocation lacks FGFR3 expression but maintains an IGH/MMSET fusion transcript. Blood, 2003. 101(6): p. 2374-6.
236. Bergsagel, P.L., et al., Cyclin D dysregulation: an early and unifying pathogenic event in multiple myeloma. Blood, 2005.
237. Moreau, P., et al., Recurrent 14q32 translocations determine the prognosis of multiple myeloma, especially in patients receiving intensive chemotherapy. Blood, 2002. 100(5): p. 1579-83.
238. Fonseca, R., et al., Clinical and biologic implications of recurrent genomic aberrations in myeloma. Blood, 2003. 101(11): p. 4569-75.
239. Seidl, S., H. Kaufmann, and J. Drach, New insights into the pathophysiology of multiple myeloma. Lancet Oncol, 2003. 4(9): p. 557-64.
240. Cremer, F.W., et al., Delineation of distinct subgroups of multiple myeloma and a model for clonal evolution based on interphase cytogenetics. Genes Chromosomes Cancer, 2005. 44(2): p. 194-203.
241. Cigudosa, J.C., et al., Characterization of nonrandom chromosomal gains and losses in multiple myeloma by comparative genomic hybridization. Blood, 1998. 91(8): p. 3007-10.

242. Gutierrez, N.C., et al., Prognostic and biologic significance of chromosomal imbalances assessed by comparative genomic hybridization in multiple myeloma. Blood, 2004. 104(9): p. 2661-6.
243. Elnenaei, M.O., et al., Delineation of the minimal region of loss at 13q14 in multiple myeloma. Genes Chromosomes Cancer, 2003. 36(1): p. 99-106.
244. Juge-Morineau, N., et al., The retinoblastoma susceptibility gene RB-1 in multiple myeloma. Leuk Lymphoma, 1997. 24(3-4): p. 229-37.
245. Shaughnessy, J.D., Jr., et al., A validated gene expression model of high-risk multiple myeloma is defined by deregulated expression of genes mapping to chromosome 1. Blood, 2007. 109(6): p. 2276-84.
246. Sawyer, J.R., et al., Genomic instability in multiple myeloma: evidence for jumping segmental duplications of chromosome arm 1q. Genes Chromosomes Cancer, 2005. 42(1): p. 95-106.
247. Inoue, J., et al., Overexpression of PDZK1 within the 1q12-q22 amplicon is likely to be associated with drug-resistance phenotype in multiple myeloma. Am J Pathol, 2004. 165(1): p. 71-81.
248. Shaughnessy, J., Amplification and overexpression of CKS1B at chromosome band 1q21 is associated with reduced levels of p27Kip1 and an aggressive clinical course in multiple myeloma. Hematology, 2005. 10 Suppl 1: p. 117-26.
249. Jenner, M.W., et al., Gene mapping and expression analysis of 16q loss of heterozygosity identifies WWOX and CYLD as being important in determining clinical outcome in multiple myeloma. Blood, 2007. 110(9): p. 3291-300.
250. Intini, D., et al., Relevance of Ras gene mutations in the context of the molecular heterogeneity of multiple myeloma. Hematol Oncol, 2006.
251. Bezieau, S., et al., High incidence of N and K-Ras activating mutations in multiple myeloma and primary plasma cell leukemia at diagnosis. Hum Mutat, 2001. 18(3): p. 212-24.
252. Liu, P., et al., Activating mutations of N- and K-ras in multiple myeloma show different clinical associations: analysis of the Eastern Cooperative Oncology Group Phase III Trial. Blood, 1996. 88(7): p. 2699-706.
253. Avet-Loiseau, H., et al., Rearrangements of the c-myc oncogene are present in 15% of primary human multiple myeloma tumors. Blood, 2001. 98(10): p. 3082-6.
254. Shou, Y., et al., Diverse karyotypic abnormalities of the c-myc locus associated with c-myc dysregulation and tumor progression in multiple myeloma. Proc Natl Acad Sci U S A, 2000. 97(1): p. 228-33.
255. Fabris, S., et al., Heterogeneous pattern of chromosomal breakpoints involving the MYC locus in multiple myeloma. Genes Chromosomes Cancer, 2003. 37(3): p. 261-9.
256. Claudio, J.O., et al., A molecular compendium of genes expressed in multiple myeloma. Blood, 2002. 100(6): p. 2175-86.

257. De Vos, J., et al., Comparison of gene expression profiling between malignant and normal plasma cells with oligonucleotide arrays. Oncogene, 2002. 21(44): p. 6848-57.
258. Zhan, F., et al., Global gene expression profiling of multiple myeloma, monoclonal gammopathy of undetermined significance, and normal bone marrow plasma cells. Blood, 2002. 99(5): p. 1745-57.
259. Zhan, F., B. Barlogie, and J. Shaughnessy, Jr., Toward the identification of distinct molecular and clinical entities of multiple myeloma using global gene expression profiling. Semin Hematol, 2003. 40(4): p. 308-20.
260. Davies, F.E., et al., Insights into the multistep transformation of MGUS to myeloma using microarray expression analysis. Blood, 2003. 102(13): p. 4504-11.
261. Munshi, N.C., et al., Identification of genes modulated in multiple myeloma using genetically identical twin samples. Blood, 2004. 103(5): p. 1799-806.
262. Zhan, F., et al., The molecular classification of multiple myeloma. Blood, 2006. 108(6): p. 2020-8.
263. Zhan, F., et al., Gene-expression signature of benign monoclonal gammopathy evident in multiple myeloma is linked to good prognosis. Blood, 2007. 109(4): p. 1692-700.
264. Mattioli, M., et al., Gene expression profiling of plasma cell dyscrasias reveals molecular patterns associated with distinct IGH translocations in multiple myeloma. Oncogene, 2005. 24(15): p. 2461-73.
265. Largo, C., et al., Identification of overexpressed genes in frequently gained/amplified chromosome regions in multiple myeloma. Haematologica, 2006. 91(2): p. 184-91.
266. Hideshima, T., et al., Understanding multiple myeloma pathogenesis in the bone marrow to identify new therapeutic targets. Nat Rev Cancer, 2007. 7(8): p. 585-98.
267. van de Donk, N.W., H.M. Lokhorst, and A.C. Bloem, Growth factors and antiapoptotic signaling pathways in multiple myeloma. Leukemia, 2005. 19(12): p. 2177-85.
268. Klein, B., et al., Survival and proliferation factors of normal and malignant plasma cells. Int J Hematol, 2003. 78(2): p. 106-13.
269. Ge, N.L. and S. Rudikoff, Expression of PTEN in PTEN-deficient multiple myeloma cells abolishes tumor growth in vivo. Oncogene, 2000. 19(36): p. 4091-5.
270. Derksen, P.W., et al., The hepatocyte growth factor/Met pathway controls proliferation and apoptosis in multiple myeloma. Leukemia, 2003. 17(4): p. 764-74.
271. Mahtouk, K., et al., Heparan sulphate proteoglycans are essential for the myeloma cell growth activity of EGF-family ligands in multiple myeloma. Oncogene, 2006. 25(54): p. 7180-91.
272. Mackay, F. and C. Ambrose, The TNF family members BAFF and APRIL: the growing complexity. Cytokine Growth Factor Rev, 2003. 14(3-4): p. 311-24.

273. Moreaux, J., et al., The level of TACI gene expression in myeloma cells is associated with a signature of microenvironment dependence versus a plasmablastic signature. Blood, 2005. 106(3): p. 1021-30.
274. Keats, J.J., et al., Promiscuous mutations activate the noncanonical NF-kappaB pathway in multiple myeloma. Cancer Cell, 2007. 12(2): p. 131-44.
275. Annunziata, C.M., et al., Frequent engagement of the classical and alternative NF-kappaB pathways by diverse genetic abnormalities in multiple myeloma. Cancer Cell, 2007. 12(2): p. 115-30.
276. Fabris, S., et al., Characterization of oncogene dysregulation in multiple myeloma by combined FISH and DNA microarray analyses. Genes Chromosomes Cancer, 2005. 42(2): p. 117-27.
277. Agnelli, L., et al., Molecular Classification of Multiple Myeloma: A Distinct Transcriptional Profile Characterizes Patients Expressing CCND1 and Negative for 14q32 Translocations. J Clin Oncol, 2005.
278. Verdelli, D., et al., Molecular and biological characterization of three novel interleukin-6-dependent human myeloma cell lines. Haematologica, 2005. 90(11): p. 1541-8.
279. Lichter, P., et al., High-resolution mapping of human chromosome 11 by in situ hybridization with cosmid clones. Science, 1990. 247(4938): p. 64-9.
280. Richelda, R., et al., A novel chromosomal translocation t(4; 14)(p16.3; q32) in multiple myeloma involves the fibroblast growth-factor receptor 3 gene. Blood, 1997. 90(10): p. 4062-70.
281. Gentleman, R.C., et al., Bioconductor: open software development for computational biology and bioinformatics. Genome Biol, 2004. 5(10): p. R80.
282. Huang, J., et al., Whole genome DNA copy number changes identified by high density oligonucleotide arrays. Hum Genomics, 2004. 1(4): p. 287-99.
283. Olshen, A.B., et al., Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 2004. 5(4): p. 557-72.
284. Zhao, X., et al., Homozygous deletions and chromosome amplifications in human lung carcinomas revealed by single nucleotide polymorphism array analysis. Cancer Res, 2005. 65(13): p. 5561-70.
285. Agnelli, L., et al., Upregulation of translational machinery and distinct genetic subgroups characterise hyperdiploidy in multiple myeloma. Br J Haematol, 2007. 136(4): p. 565-73.
286. Agnelli, L., et al., Integrative genomic analysis reveals distinct transcriptional and genetic features associated with chromosome 13 deletion in multiple myeloma. Haematologica, 2007. 92(1): p. 56-65.
287. Irizarry, R.A., et al., Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 2003. 4(2): p. 249-64.

288. Dennis, G., Jr., et al., DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol, 2003. 4(5): p. P3.
289. Thomas, P.D., et al., PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification. Nucleic Acids Res, 2003. 31(1): p. 334-41.
290. Mi, H., et al., The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res, 2005. 33(Database issue): p. D284-8.
291. de Smith, A.J., et al., Array CGH analysis of copy number variation identifies 1284 new genes variant in healthy white males: implications for association studies of complex diseases. Hum Mol Genet, 2007. 16(23): p. 2783-94.
292. Walker, B.A. and G.J. Morgan, Use of single nucleotide polymorphism-based mapping arrays to detect copy number changes and loss of heterozygosity in multiple myeloma. Clin Lymphoma Myeloma, 2006. 7(3): p. 186-91.
293. Kuehl, W.M. and P.L. Bergsagel, Early genetic events provide the basis for a clinical classification of multiple myeloma. Hematology (Am Soc Hematol Educ Program), 2005: p. 346-52.
294. Kralovics, R., et al., A gain-of-function mutation of JAK2 in myeloproliferative disorders. N Engl J Med, 2005. 352(17): p. 1779-90.
295. Novak, A.J., et al., Expression of BCMA, TACI, and BAFF-R in multiple myeloma: a mechanism for growth and survival. Blood, 2004. 103(2): p. 689-94.
296. Moreaux, J., et al., BAFF and APRIL protect myeloma cells from apoptosis induced by interleukin 6 deprivation and dexamethasone. Blood, 2004. 103(8): p. 3148-57.
297. Tai, Y.T., et al., Role of B-cell-activating factor in adhesion and growth of human multiple myeloma cells in the bone marrow microenvironment. Cancer Res, 2006. 66(13): p. 6675-82.
298. Tagawa, H. and M. Seto, A microRNA cluster as a target of genomic amplification in malignant lymphoma. Leukemia, 2005.
299. He, L., et al., A microRNA polycistron as a potential human oncogene. Nature, 2005. 435(7043): p. 828-33.
300. Bernard, O.A. and R. Berger, Molecular basis of 11q23 rearrangements in hematopoietic malignant proliferations. Genes Chromosomes Cancer, 1995. 13(2): p. 75-85.
301. Ziemin-van der Poel, S., et al., Identification of a gene, MLL, that spans the breakpoint in 11q23 translocations associated with human leukemias. Proc Natl Acad Sci U S A, 1991. 88(23): p. 10735-9.
302. Tanaka, K., et al., Restricted chromosome breakpoint sites on 11q22-q23.1 and 11q25 in various hematological malignancies without MLL/ALL-1 gene rearrangement. Cancer Genet Cytogenet, 2001. 124(1): p. 27-35.

303. Auer, R.L., et al., Identification of a potential role for POU2AF1 and BTG4 in the deletion of 11q23 in chronic lymphocytic leukemia. Genes Chromosomes Cancer, 2005. 43(1): p. 1-10.
304. Zhang, B., I. Gojo, and R.G. Fenton, Myeloid cell factor-1 is a critical survival factor for multiple myeloma. Blood, 2002. 99(6): p. 1885-93.
305. Salomon-Nguyen, F., et al., The t(1;12)(q21;p13) translocation of human acute myeloblastic leukemia results in a TEL-ARNT fusion. Proc Natl Acad Sci U S A, 2000. 97(12): p. 6757-62.
306. Baulch-Brown, C., et al., Inhibitors of the mevalonate pathway as potential therapeutic agents in multiple myeloma. Leuk Res, 2007. 31(3): p. 341-52.
307. Shipman, C.M., et al., Bisphosphonates induce apoptosis in human myeloma cell lines: a novel anti-tumour activity. Br J Haematol, 1997. 98(3): p. 665-72.
308. Green, J.R., Anti-tumor potential of bisphosphonates. Med Klin (Munich), 2000. 95 Suppl 2: p. 23-8.
309. Alfonso, P., et al., Mutation analysis and genotype/phenotype relationships of Gaucher disease patients in Spain. J Hum Genet, 2007. 52(5): p. 391-6.
310. Marenholz, I., C.W. Heizmann, and G. Fritz, S100 proteins in mouse and man: from evolution to function and pathology (including an update of the nomenclature). Biochem Biophys Res Commun, 2004. 322(4): p. 1111-22.
311. Cross, S.S., et al., Expression of S100 proteins in normal human tissues and common cancers using tissue microarrays: S100A6, S100A8, S100A9 and S100A11 are all overexpressed in common cancers. Histopathology, 2005. 46(3): p. 256-69.
312. Rust, R., et al., High expression of calcium-binding proteins, S100A10, S100A11 and CALM2 in anaplastic large cell lymphoma. Br J Haematol, 2005. 131(5): p. 596-608.
313. White, R.J., RNA polymerases I and III, growth control and cancer. Nat Rev Mol Cell Biol, 2005. 6(1): p. 69-78.
314. Stefanovsky, V.Y., et al., An immediate response of ribosomal transcription to growth factor stimulation in mammals is mediated by ERK phosphorylation of UBF. Mol Cell, 2001. 8(5): p. 1063-73.
315. Gomez-Roman, N., et al., Direct activation of RNA polymerase III transcription by c-Myc. Nature, 2003. 421(6920): p. 290-4.
316. Stein, T., et al., RNA polymerase III transcription can be derepressed by oncogenes or mutations that compromise p53 function in tumours and Li-Fraumeni syndrome. Oncogene, 2002. 21(19): p. 2961-70.
317. Adams, B.D., K.P. Claffey, and B.A. White, Argonaute-2 Expression is Regulated by EGFR/MAPK Signaling and Correlates with a Transformed Phenotype in Breast Cancer Cells. Endocrinology, 2008.
318. Davidsson, J., et al., Tiling resolution array comparative genomic hybridization, expression and methylation analyses of dup(1q) in Burkitt lymphomas and pediatric high hyperdiploid acute lymphoblastic leukemias reveal clustered near-centromeric breakpoints and overexpression of genes in 1q22-32.3. Hum Mol Genet, 2007. 16(18): p. 2215-25.
319. Shimamoto, Y., et al., Sensitivity of human cancer cells to the new anticancer ribo-nucleoside TAS-106 is correlated with expression of uridine-cytidine kinase 2. Jpn J Cancer Res, 2002. 93(7): p. 825-33.
320. Aspland, S.E., H.H. Bendall, and C. Murre, The role of E2A-PBX1 in leukemogenesis. Oncogene, 2001. 20(40): p. 5708-17.
321. Ehlers, J.P., et al., DDEF1 is located in an amplified region of chromosome 8q and is overexpressed in uveal melanoma. Clin Cancer Res, 2005. 11(10): p. 3609-13.
322. Yasui, K., et al., TFDP1, CUL4A, and CDC16 identified as targets for amplification at 13q34 in hepatocellular carcinomas. Hepatology, 2002. 35(6): p. 1476-84.
323. Campbell, P.M., V. Bovenzi, and M. Szyf, Methylated DNA-binding protein 2 antisense inhibitors suppress tumourigenesis of human cancer cell lines in vitro and in vivo. Carcinogenesis, 2004. 25(4): p. 499-507.
324. Borset, M., et al., Concomitant expression of hepatocyte growth factor/scatter factor and the receptor c-MET in human myeloma cell lines. J Biol Chem, 1996. 271(40): p. 24655-61.
325. Hirano, T., Interleukin 6 and its receptor: ten years later. Int Rev Immunol, 1998. 16(3-4): p. 249-84.
326. Kwan, T., et al., Genome-wide analysis of transcript isoform variation in humans. Nat Genet, 2008. 40(2): p. 225-31.
327. Kawamoto, M., et al., Nestin expression correlates with nerve and retroperitoneal tissue invasion in pancreatic cancer. Hum Pathol, 2008.
328. Kleeberger, W., et al., Roles for the stem cell associated intermediate filament Nestin in prostate cancer migration and metastasis. Cancer Res, 2007. 67(19): p. 9199-206.
329. Jenner, R.G., et al., Kaposi's sarcoma-associated herpesvirus-infected primary effusion lymphoma has a plasma cell gene expression profile. Proc Natl Acad Sci U S A, 2003. 100(18): p. 10399-404.
330. Klein, U., et al., Gene expression profile analysis of AIDS-related primary effusion lymphoma (PEL) suggests a plasmablastic derivation and identifies PEL-specific transcripts. Blood, 2003. 101(10): p. 4115-21. 331. Bertoni, F., et al., Update on the molecular biology of mantle cell lymphoma. Hematol Oncol, 2006. 24(1): p. 22-7. 332. Simpson, A.J., et al., Cancer/testis antigens, gametogenesis and cancer. Nat Rev Cancer, 2005. 5(8): p. 615-25. 333. Old, L.J., Cancer/testis (CT) antigens - a new link between gametogenesis and cancer. Cancer Immun, 2001. 1: p. 1.

185 334. van Baren, N., et al., Genes encoding tumor-specific antigens are expressed in human myeloma cells. Blood, 1999. 94(4): p. 1156-64. 335. Hollander, M.C. and A.J. Fornace, Jr., Genomic instability, centrosome amplification, cell cycle checkpoints and Gadd45a. Oncogene, 2002. 21(40): p. 6228-33. 336. Myllykangas, S., T. Bohling, and S. Knuutila, Specificity, selection and significance of gene amplifications in cancer. Semin Cancer Biol, 2007. 17(1): p. 42-55. 337. Slamon, D.J., et al., Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. N Engl J Med, 2001. 344(11): p. 783-92. 338. Thirman, M.J., et al., Rearrangement of the MLL gene in acute lymphoblastic and acute myeloid leukemias with 11q23 chromosomal translocations. N Engl J Med, 1993. 329(13): p. 909-14. 339. Kobayashi, H., et al., Variability of 11q23 rearrangements in hematopoietic cell lines identified with fluorescence in situ hybridization. Blood, 1993. 81(11): p. 3027-33. 340. Kobayashi, H., et al., Analysis of deletions of the long arm of chromosome 11 in hematologic malignancies with fluorescence in situ hybridization. Genes Chromosomes Cancer, 1993. 8(4): p. 246-52. 341. Stilgenbauer, S., et al., Molecular cytogenetic delineation of a novel critical genomic region in chromosome bands 11q22.3-923.1 in lymphoproliferative disorders. Proc Natl Acad Sci U S A, 1996. 93(21): p. 11837-41. 342. Stilgenbauer, S., et al., The ATM gene in the pathogenesis of mantle-cell lymphoma. Ann Oncol, 2000. 11 Suppl 1: p. 127-30. 343. Greiner, T.C., et al., Mutation and genomic deletion status of ataxia telangiectasia mutated (ATM) and p53 confer specific gene expression profiles in mantle cell lymphoma. Proc Natl Acad Sci U S A, 2006. 103(7): p. 2352-7. 344. Kalla, C., et al., Analysis of 11q22-q23 deletion target genes in B-cell chronic lymphocytic leukaemia: evidence for a pathogenic role of NPAT, CUL5, and PPP2R1B. Eur J Cancer, 2007. 43(8): p. 
1328-35. 345. Kalla, C., et al., Translocation t(X;11)(q13;q23) in B-cell chronic lymphocytic leukemia disrupts two novel genes. Genes Chromosomes Cancer, 2005. 42(2): p. 128-43. 346. Avet-Loiseau, H., et al., Amplification of the 11q23 region in acute myeloid leukemia. Genes Chromosomes Cancer, 1999. 26(2): p. 166-70. 347. Rovigatti, U., D.K. Watson, and J.J. Yunis, Amplification and rearrangement of Hu-ets-1 in leukemia and lymphoma with involvement of 11q23. Science, 1986. 232(4748): p. 398-400. 348. Crossen, P.E., et al., Identification of amplified genes in a patient with acute myeloid leukemia and double minute chromosomes. Cancer Genet Cytogenet, 1999. 113(2): p. 126-33.

186 349. Poppe, B., et al., Expression analyses identify MLL as a prominent target of 11q23 amplification and support an etiologic role for MLL gain of function in myeloid malignancies. Blood, 2004. 103(1): p. 229-35. 350. Zatkova, A., et al., Distinct sequences on 11q13.5 and 11q23-24 are frequently coamplified with MLL in complexly organized 11q amplicons in AML/MDS patients. Genes Chromosomes Cancer, 2004. 39(4): p. 263-76. 351. Cremer, F.W., et al., High incidence and intraclonal heterogeneity of chromosome 11 aberrations in patients with newly diagnosed multiple myeloma detected by multiprobe interphase FISH. Cancer Genet Cytogenet, 2005. 161(2): p. 116-24. 352. Liebisch, P., et al., Value of comparative genomic hybridization and fluorescence in situ hybridization for molecular diagnostics in multiple myeloma. Br J Haematol, 2003. 122(2): p. 193-201. 353. Fechter, A., et al., Common fragile site FRA11G and rare fragile site FRA11B at 11q23.3 encompass distinct genomic regions. Genes Chromosomes Cancer, 2007. 46(1): p. 98- 106. 354. Amini, R.M., et al., A novel B-cell line (U-2932) established from a patient with diffuse large B-cell lymphoma following Hodgkin lymphoma. Leuk Lymphoma, 2002. 43(11): p. 2179-89. 355. Drexler, H.G., Guide to Leukemia-Lymphoma Cell Lines. 2005, Braunschweig, Germany. 356. Di, X., et al., Dynamic model based algorithms for screening and genotyping over 100 K SNPs on oligonucleotide microarrays. Bioinformatics, 2005. 21(9): p. 1958-63. 357. Affymetrix, CNAT 4.0: copy number and loss of heterozygosity estimation algorithms for the GeneChip Human Mapping 10/50/100/250/500K array set. 2007. p. 1-26. 358. Kent, W.J., et al., The human genome browser at UCSC. Genome Res, 2002. 12(6): p. 996-1006. 359. Tibiletti, M.G., et al., Early involvement of 6q in surface epithelial ovarian tumors. Cancer Res, 1996. 56(19): p. 4493-8. 360. 
Tibiletti, M.G., et al., Chromosome 6 abnormalities in ovarian surface epithelial tumors of borderline malignancy suggest a genetic continuum in the progression model of ovarian neoplasms. Clin Cancer Res, 2001. 7(11): p. 3404-9. 361. Tibiletti, M.G., et al., Genetic and cytogenetic observations among different types of ovarian tumors are compatible with a progression model underlying ovarian tumorigenesis. Cancer Genet Cytogenet, 2003. 146(2): p. 145-53. 362. Johnson, W.E., et al., Model-based analysis of tiling-arrays for ChIP-chip. Proc Natl Acad Sci U S A, 2006. 103(33): p. 12457-62. 363. Dhami, P., et al., Exon array CGH: detection of copy-number changes at the resolution of individual exons in the human genome. Am J Hum Genet, 2005. 76(5): p. 750-62.

187 364. Zhao, C., et al., POU2AF1, an amplification target at 11q23, promotes growth of multiple myeloma cells by directly regulating expression of a B-cell maturation factor, TNFRSF17. Oncogene, 2008, 27(1): p. 63-75. 365. Carbone, G.M., et al., DNA binding and antigene activity of a daunomycin- conjugated triplex-forming oligonucleotide targeting the P2 promoter of the human c- myc gene. Nucleic Acids Res, 2004. 32(8): p. 2396-410. 366. Mosmann, T., Rapid colorimetric assay for cellular growth and survival: application to proliferation and cytotoxicity assays. J Immunol Methods, 1983. 65(1-2): p. 55-63. 367. Sanchez-Izquierdo, D., et al., MALT1 is deregulated by both chromosomal translocation and amplification in B-cell non-Hodgkin lymphoma. Blood, 2003. 101(11): p. 4539-46. 368. Pinkel, D. and D.G. Albertson, Array comparative genomic hybridization and its applications in cancer. Nat Genet, 2005. 37 Suppl: p. S11-7. 369. Gstaiger, M., et al., A B-cell coactivator of octamer-binding transcription factors. Nature, 1995. 373(6512): p. 360-2. 370. Strubin, M., J.W. Newell, and P. Matthias, OBF-1, a novel B cell-specific coactivator that stimulates immunoglobulin promoter activity through association with octamer- binding proteins. Cell, 1995. 80(3): p. 497-506. 371. Kim, U., et al., The B-cell-specific transcription coactivator OCA-B/OBF-1/Bob-1 is essential for normal production of immunoglobulin isotypes. Nature, 1996. 383(6600): p. 542-7. 372. Nielsen, P.J., et al., B lymphocytes are impaired in mice lacking the transcriptional co- activator Bob1/OCA-B/OBF1. Eur J Immunol, 1996. 26(12): p. 3214-8. 373. Schubart, D.B., et al., B-cell-specific coactivator OBF-1/OCA-B/Bob1 required for immune response and germinal centre formation. Nature, 1996. 383(6600): p. 538-42. 374. Siegel, R., et al., Nontranscriptional regulation of SYK by the coactivator OCA-B is required at multiple stages of B cell development. Cell, 2006. 125(4): p. 761-74. 375. 
Yu, X., et al., Identification and characterization of a novel OCA-B isoform. implications for a role in B cell signaling pathways. Immunity, 2001. 14(2): p. 157-67. 376. Greiner, A., et al., Up-regulation of BOB.1/OBF.1 expression in normal germinal center B cells and germinal center-derived lymphomas. Am J Pathol, 2000. 156(2): p. 501-7. 377. Janssens, V. and J. Goris, Protein phosphatase 2A: a highly regulated family of serine/threonine phosphatases implicated in cell growth and signalling. Biochem J, 2001. 353(Pt 3): p. 417-39. 378. Hunter, T., Protein kinases and phosphatases: the yin and yang of protein phosphorylation and signaling. Cell, 1995. 80(2): p. 225-36. 379. Bialojan, C. and A. Takai, Inhibitory effect of a marine-sponge toxin, okadaic acid, on protein phosphatases. Specificity and kinetics. Biochem J, 1988. 256(1): p. 283-90.

188 380. Arroyo, J.D. and W.C. Hahn, Involvement of PP2A in viral and cellular transformation. Oncogene, 2005. 24(52): p. 7746-55. 381. Wang, S.S., et al., Alterations of the PPP2R1B gene in human lung and colon cancer. Science, 1998. 282(5387): p. 284-7. 382. Calin, G.A., et al., Low frequency of alterations of the alpha (PPP2R1A) and beta (PPP2R1B) isoforms of the subunit A of the serine-threonine phosphatase 2A in human neoplasms. Oncogene, 2000. 19(9): p. 1191-5. 383. Adler, H.T., et al., HRX leukemic fusion proteins form a heterocomplex with the leukemia-associated protein SET and protein phosphatase 2A. J Biol Chem, 1997. 272(45): p. 28407-14. 384. Mumby, M., PP2A: unveiling a reluctant tumor suppressor. Cell, 2007. 130(1): p. 21-4. 385. Junttila, M.R., et al., CIP2A inhibits PP2A in human malignancies. Cell, 2007. 130(1): p. 51-62. 386. Sablina, A.A., et al., The tumor suppressor PP2A Abeta regulates the RalA GTPase. Cell, 2007. 129(5): p. 969-82. 387. Corney, D.C., et al., MicroRNA-34b and MicroRNA-34c Are Targets of p53 and Cooperate in Control of Cell Proliferation and Adhesion-Independent Growth. Cancer Res, 2007. 388. He, L., et al., A microRNA component of the p53 tumour suppressor network. Nature, 2007. 447(7148): p. 1130-4. 389. Kapranov, P., A.T. Willingham, and T.R. Gingeras, Genome-wide transcription and the implications for genomic organization. Nat Rev Genet, 2007. 8(6): p. 413-23. 390. Galante, P.A., et al., Sense-antisense pairs in mammals: functional and evolutionary considerations. Genome Biol, 2007. 8(3): p. R40. 391. Zhang, Y., et al., Genome-wide in silico identification and analysis of cis natural antisense transcripts (cis-NATs) in ten species. Nucleic Acids Res, 2006. 34(12): p. 3465- 75. 392. Reis, E.M., et al., Antisense intronic non-coding RNA levels correlate to the degree of tumor differentiation in prostate cancer. Oncogene, 2004. 23(39): p. 6684-92. 393. 
Pole, J.C., et al., High-resolution analysis of chromosome rearrangements on 8p in breast, colon and pancreatic cancer reveals a complex pattern of loss, gain and translocation. Oncogene, 2006. 25(41): p. 5693-706. 394. Pittman, A.M., et al., Refinement of the basis and impact of common 11q23.1 variation to the risk of developing colorectal cancer. Hum Mol Genet, 2008. 395. Karolchik, D., et al., The UCSC Table Browser data retrieval tool. Nucleic Acids Res, 2004. 32(Database issue): p. D493-6. 396. Karolchik, D., et al., The UCSC Genome Browser Database. Nucleic Acids Res, 2003. 31(1): p. 51-4.

189 397. Rozen, S. and H. Skaletsky, Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol, 2000. 132: p. 365-86. 398. Furuno, M., et al., Clusters of internally primed transcripts reveal novel long noncoding RNAs. PLoS Genet, 2006. 2(4): p. e37.


ACKNOWLEDGEMENTS

I would like to thank my boss and the co-director of my PhD thesis, Dr. Francesco Bertoni, group leader of the Functional Genomics & Lymphoma Research group of the Laboratory of Experimental Oncology at the Oncology Institute of Southern Switzerland (IOSI) in Bellinzona (TI), where I carried out my PhD. Of course, I also thank all the group members, in particular our bioinformatician Dr. Ivo Kwee for his help with the analysis of microarray data and his enlightening explanations of data analysis algorithms and statistical concepts.

Then I would like to thank Prof. Franco Cavalli, director of the IOSI Institute, and Dr. Carlo Catapano, director of the IOSI Laboratory of Experimental Oncology, for giving me the opportunity to work in the field of cancer research and broaden my knowledge of cellular and molecular biology.

I further thank all the people with whom I had the chance to collaborate, in particular:

Prof. Antonino Neri and his group at the Fondazione IRCCS, Policlinico and Dipartimento Scienze Mediche, University of Milan, Italy, for the collaborative work on the multiple myeloma project;

Dr. Maria Grazia Tibiletti and Dr. Barbara Bernasconi at the Department of Pathology of the Ospedale di Circolo, University of Insubria, Varese, Italy, who performed the FISH analyses to characterize the 11q23.1 amplified region;

Dr. Francesco Forconi at the Policlinico le Scotte, Siena, Italy, for the collaboration in the project of genomic profiling of hairy cell leukaemia;

Prof. Gianluca Gaidano and Dr. Daniela Capello at the Amedeo Avogadro University, Novara, Italy, for the collaborative work on the genomic profiling and characterization of HIV-related NHL.

Particular thanks go to Prof. Leonardo Scapozza at the University of Geneva, supervisor of my PhD thesis, who allowed me to carry out the doctorate extra muros at the IOSI Laboratory of Experimental Oncology in Bellinzona.

I also thank all the members of the jury for having accepted the invitation to my PhD defense.

Finally, a special thank you to my colleagues Romina, Emilia, Davide, Pascale, and the former members of the IOSI laboratory Veronica and Sara, for the unforgettable moments we shared.

I am deeply grateful to my family for their endless support and for always giving me the opportunity to find my own way. They taught me everything you cannot learn from books. I also immensely thank Teo, who shared the difficult moments of my PhD thesis with me and motivated me whenever necessary.

Finally, I thank the San Salvatore Foundation for the fellowship supporting my work.

SUPPLEMENTARY MATERIAL

Suppl. Tab. 1: qPCR primers for validation of selected amplified regions identified by 10K arrayCGH analysis.

Target     Mapping  Forward primer (5’→3’)    Reverse primer (5’→3’)    Concentration [nM]
CGI-30     1q21.2   GGAGATGCCAAGGACATCACA     TGCAGCGTCTAACAACTTCCA     50F/50R
miR-17-92  13q31.3  TTTCCTGATGGCAGCTTGC       AAGGCCCCTGGGATTTTTAG      50F/50R
BAFF       13q33.3  AAGAATCTGTGGGTGCAGGTG     GCATTCCCATTACGATTATCCC    50F/50R
SYBR11     11q23.1  CCAAGGGCTAGGCTTTCAGG      GTTCTGAGAACGTTGCGCTG      50F/50R
LINE-1     -        AAAGCCGCTCAACTACATGG      TGCTTTGAATGCGTCCCAGAG     300F/900R

Suppl. Tab. 2: Chromosome regions affected by DNA gains or losses in 17 HMCLs.

CMA02
  Losses: 1p12-p36.11, 1q31.2-q32.2, 1q42.3-q44, 2p23.2-p25.3, 4, 8q11.21-q12.1, 12q21.2-q24.33, 13q12.11-q32.1, 17p11.2-p13.2
  Gains: 1q21.1-q24.2, 1q25.1-q25.3, 1q31.1-q31.2, 3, 5p12-p15.33, 5q11.2-q14.3, 7, 8p11.1-p21.2, 8q13.2-q24.12, 8q24.13-q24.3, 10p11.21-p15.3, 12p11.1-p13.33, 12q11-q21.1, 13q32.2-q34, 14q, 15q, 18q11.2-q21.2, 19, 21q, 22q

CMA03
  Losses: 1p12-p35.1, 4p12-p16.3, 4q11-q28.2, 5p15.2-p15.33, 5p13.2-p14.1, 6q24.2-q25.3, 7p15.3-p22.3, 7q35-q36.3, 8p23.2-p23.3, 8p11.1-p23.1, 8q11.1-q11.23, 9p13.3-p22.2, 11p11.2-p15.5, 12q14.3-q23.3, 13q, 14q24.2-q31.1, 16q12.1-q24.3, 17p12-p13.2, 22q11.21-q12.2, 22q13.1-q13.33
  Gains: 5p14.2-p15.2, 5p12-p13.2, 5q13.3-q15, 7p11.1-p15.3, 7q11.21-q22.3, 8q11.23-q24.3, 9p24.1-p24.3, 9p22.2-p24.1, 9q12-q33.1, 11q11-q25, 12q13.13-q14.2, 14q11.2-q23.1, 14q23.3-q24.2, 15q, 17q11.1-q21.33, 17q22-q25.3, 18q21.2-q22.3, 19p12-p13.3

H929
  Losses: 1p13.3-p22.3, 6q25.3-q27, 7p21.1-p22.3, 8q24.21-q24.3, 10q11.1-q22.1, 12p11.22-p13.31, 13q, 19q13.33-q13.43, 20p11.21-p13
  Gains: 1q21.1-q25.2, 8p11.1-p23.3, 8q11.1-q21.3, 8q22.3-q24.13, 11q14.3-q25, 18q12.3-q23, 20q11.21-q13.33

JJN3
  Losses: 2q14.1-q21.2
  Gains: 1p12-p34.2, 1q21.1-q32.2, 2p23.3-p25.3, 3, 5p12-p15.33, 6p11.2-p25.3, 6q11.1-q14.3, 7p11.1-p22.3, 7q11.21-q21.13, 7q36.1-q36.3, 8q11.1-q24.3, 11p11.2-p12, 11q11-q25, 13q12.11-q13.2, 13q13.3-q14.2, 13q21.1-q34, 16p11.2-p13.3, 16q12.1-q24.3, 17, 18, 19, 20, 21q, 22q11.1-q13.2

KMM1
  Losses: 1p12-p34.3, 2p11.2-p25.3, 2q21.2-q32.1, 3p12.1-p14.2, 6p22.2-p24.3, 6q11.1-q27, 8q11.23-q24.3, 9p24.1-p24.3, 9p13.2-p24.1, 10p11.21-p12.2, 10q11.1-q23.31, 13q, 14q11.2-q21.1, 14q24.3-q32.2, 15q, 16, 17p12-p13.2
  Gains: 1q21.1-q41, 3q23-q29, 6p21.2-p22.2, 10p12.31-p15.3, 12q21.31-q23.1, 14q22.1-q24.3, 18q21.2-q21.33, 19q13.11-q13.12, 19q13.13-q13.32, 20q11.21-q13.33

KMS11
  Losses: 1p35.2-p36.32, 1p21.1-p31.3, 3p22.2-p26.3, 3p11.1-p21.1, 3q11.2-q29, 4p12-p16.3, 6p11.2-p12.2, 6q11.1-q14.2, 6q14.3-q21, 6q21-q23.3, 7p14.3-p21.1, 8p22-p23.3, 8p11.21-p22, 9p21.3-p24.3, 10q23.2-q26.3, 11p11.2-p15.5, 12p12.1-p13.33, 14q, 15q, 17p12-p13.2, 18q11.2-q23, 20p11.21-p13, 20q13.2-q13.33, 22q11.23-q13.33
  Gains: 1p33-p35.1, 1p12-p13.3, 1q21.1-q44, 5q11.2-q35.3, 7p21.1-p22.3, 7q22.1-q36.3, 7q21.3-q36.3, 8q21.3-q23.1, 9q22.31-q34.3, 13q31.3, 16p11.2-p13.3, 16q23.1-q24.3, 17q11.1-q25.3, 19p12-p13.3, 21q21.2-q22.3

KMS12
  Losses: 1p31.3-p35.1, 1p12-p31.1, 1q31.2-q41, 5q11.2-q35.3, 9p21.2-p24.3, 10q11.21-q26.3, 12q23.3-q24.33, 13q12.11-q31.1, 14q11.2-q23.3, 16q13-q24.3, 17p12-p13.2, 18q11.2-q12.1, 18q12.2-q12.3, 18q21.1-q21.2, 18q21.2-q21.33, 18q22.1-q23, 19q12-q13.43, 22q
  Gains: 1q21.1-q31.1, 1q41-q44, 3q27.3-q29, 7p11.1-p21.3, 7q11.21-q36.3, 8, 9q13-q33.3, 10p11.21-p15.3, 11p11.12-p15.5, 11q13.3-q25, 13q31.1-q34, 14q24.1-q32.33, 15q21.3-q23, 16q12.1-q12.2, 18p11.21-p11.32, 18q12.1-q12.2

KMS18
  Losses: 6q24.1-q27, 7p15.3-p22.3, 8p12-p23.3, 10q21.1-q26.3, 11q14.3-q22.3, 12p11.21-p13.31, 13q12.11-q13.3, 14q11.2-q32.2, 17p12-p13.2, 20p12.1-p13, 22q
  Gains: 1p33-p36.32, 1q21.1-q41, 3p25.1-p26.3, 6p22.1-p25.3, 8q11.1-q24.3, 12p13.31-p13.33, 15q24.3-q26.3

KMS20
  Losses: 1p12-p34.2, 2q14.2-q23.3, 2q35-q37.3, 10p11.21-p15.3, 11q22.1-q22.3, 12p13.31-p13.33, 12q21.31-q24.21, 13q, 16q12.1-q24.3, 17p12-p13.2, 18q21.1-q23, 22q
  Gains: 1q21.1-q41, 12q24.21-q24.33

KMS26
  Losses: 1p21.2-p31.1, 1p12-p21.1, 1q32.3-q44, 2q11.2-q22.3, 4q11-q22.1, 4q28.2-q28.3, 6q16.3-q21, 8p21.2-p22, 11p15.1-p15.5, 11q24.3-q25, 12p12.1-p13.33, 14q12-q23.2, 16q12.1-q12.2, 16q13-q22.3, 17p12-p13.2
  Gains: 1q21.1-q32.3, 2q22.3-q33.3, 2q33.3-q35, 3p24.2-p26.3, 3q13.31-q29, 4q28.3-q34.3, 6q24.1-q27, 8q11.23-q23.2, 8q24.12-q24.21, 9q21.12-q34.3, 10q21.1-q22.3, 12p11.22-p12.1, 13q21.31-q31.3, 15q25.1-q26.3, 16p11.2-p13.3, 16q23.1-q24.3, 18q12.1-q23, 19, 20q11.21-q13.33, 21q11.2-q21.1, 22q11.23-q13.33

KMS28
  Losses: 1p12-p32.2, 2q12.1-q32.3, 2q36.1-q36.3, 3p12.2-p26.3, 6q22.33-q27, 7p21.1-p22.3, 8p12-p23.3, 9p24.1-p24.3, 9p23-p24.1, 11q14.1-q22.3, 11q23.3-q25, 12p11.1-p13.33, 12q21.31-q23.3, 12q24.31-q24.33, 13q12.11-q21.33, 22q
  Gains: 1q21.1-q23.3, 1q41-q44, 7q11.22-q36.3, 8q12.1-q24.3, 12q24.21-q24.31, 17q11.2-q25.3

KMS34
  Losses: 4p12-p16.3, 4q11-q13.3, 5q14.3-q21.2, 5q23.1-q31.1, 5q31.3-q33.1, 5q33.2-q33.3, 7p21.2-p22.3, 8p11.21-p21.3, 10q25.1-q25.2, 13q, 14q, 17p12-p13.2, 18p11.22-p11.32, 20p11.21-p13, 22q11.1-q12.3
  Gains: 7q21.11-q36.3, 8p21.3-p23.3, 8q11.1-q24.3, 12p11.1-p13.33, 12q11-q21.2, 12q23.1-q24.33, 18q11.2-q23

LP1
  Losses: 1p12-p36.32, 1q32.1-q44, 2q11.2-q32.3, 3p11.1-p26.3, 3q11.2-q13.33, 4p12-p16.3, 4q11-q13.3, 6q13-q22.31, 6q22.31-q27, 8q24.21-q24.3, 9, 12p11.21-p13.33, 13q, 14q11.2-q21.3, 19q12
  Gains: 1q21.1-q32.1, 3q21.1-q29, 5q14.1-q14.3, 6p24.3-p25.2, 7q11.21-q32.3, 8p11.1-p23.1, 8q11.1-q24.21, 10p13-p15.3, 11p11.12-p14.3, 11q11-q14.1, 12q12-q13.11, 12q13.12-q15, 18q12.3-q22.1, 20q11.21-q13.33

OPM2
  Losses: 1p12-p21.2, 5q14.3-q35.3, 7p15.2-p22.3, 7q11.21-q22.1, 8q21.11-q24.13, 8q24.21-q24.3, 13q33.1-q34, 14q11.2-q23.2, 14q24.3-q32.12
  Gains: 1q21.1-q44, 4p12-p16.3, 4q11-q34.3, 4q34.3-q35.2, 5p13.1-p15.33, 7p15.2-p11.1, 7q22.1-q36.3, 8p11.1-p23.3, 8q11.1-q21.11, 8q24.13-q24.21, 10p11.21-p15.3, 11, 12p11.1-p13.33, 12q11-q15, 12q23.2-q24.33, 14q32.13-q32.2, 16, 17p11.2-p13.1, 17q11.1-q25.3, 18, 19p12-p13.3, 19q12-q13.41, 20q11.21-q13.33

RPMI8226
  Losses: 1p12-p31.2, 2q22.1-q37.3, 4, 5p13.2-p15.31, 5q23.1-q35.3, 6q14.1-q25.1, 8p11.1-p23.3, 8q11.1-q21.2, 9p13.2-p24.3, 9q12-q33.2, 10q11.1-q25.1, 11p11.12-p15.5, 11q23.3-q25, 13q, 14q, 15q11.2-q21.1, 17p12-p13.2, 19q12-q13.33
  Gains: 1p31.2-p36.32, 1q21.1-q44, 3p25.1-p26.3, 3q22.1-q29, 5p15.31-p15.33, 5p12-p13.2, 5q11.2-q21.1, 6p22.1-p25.3, 6q12-q14.1, 6q26-q27, 7p11.1-p12.3, 7q11.21-q36.3, 9q33.2-q34.3, 11q12.2-q14.1, 11q22.3-q23.3, 12q11-q15, 15q21.1-q24.3, 15q25.3-q26.3, 16p11.2-p13.3, 16q12.1-q23.1, 17q11.1-q25.3, 18, 21q

SKMM1
  Losses: 1p13.3-p31.2, 6q14.3-q21, 8q24.21-q24.3, 10p12.2-p15.3, 13q
  Gains: 1q21.1-q44, 3p21.33-p26.3, 11q12.2-q25, 14q21.3-q32.33

U266
  Losses: 4p15.31-p15.32, 4q12-q22.3, 4q31.3-q35.2, 6q16.1-q21, 8p12-p21.3, 10p12.31-p13, 10p11.21-p12.1, 11q22.1-q25, 12p11.21-p13.33, 13q12.11-q21.1, 14q22.1-q24.2, 17p12-p13.2, 20p11.21-p13, 21q11.2-q21.1
  Gains: 1q21.1-q25.3, 1q42.2-q44, 2p21-p25.3, 3q25.1-q29, 4q28.3-q31.3, 7p12.1-p22.3, 7q22.1-q35, 9q31.1-q34.3, 10q21.1-q21.3, 15q, 20q11.21-q13.33

Suppl. Tab. 3: 222 genes selected by integrative analysis with matched filter in HMCLs. 222 genes mapped in regions with a DNA CN > 4 and with an expression value of the paired gene ≥ 2-fold above the average expression value in 17 HMCLs.

Gene Symbol     Chromosomal Location     Cell Line
C1orf73         chr1p36.13-q42.3         KMS26
FAM20B          chr1p36.13-q41           KMS20
DPH5            chr1p21.2                KMS26
WDR3            chr1p13-p12              KMS26
MTMR11          chr1q12-q21              KMS20
MEF2D           chr1q12-q23              H929
ARHGEF11        chr1q21                  KMS20
ARNT            chr1q21                  H929/LP1/OPM2
C1orf2          chr1q21                  LP1
CTSS            chr1q21                  KMS20
FCRL2           chr1q21                  KMS20
GBA /// GBAP    chr1q21                  H929
IL6R            chr1q21                  H929
MCL1            chr1q21                  H929/KMS20
PIAS3           chr1q21                  KMS20
RFX5            chr1q21                  KMS20/LP1
S100A10         chr1q21                  H929/LP1
S100A4          chr1q21                  KMS20
S100A6          chr1q21                  H929/KMS20
SETDB1          chr1q21                  KMS20
SH2D2A          chr1q21                  H929
SHC1            chr1q21                  OPM2/H929
TUFT1           chr1q21                  KMS20/OPM2
TXNIP           chr1q21.1                H929/KMS20
PEA15           chr1q21.1                KMS20
C1orf54         chr1q21.2                KMS20
CKIP-1          chr1q21.2                KMS20
HIST2H2AA       chr1q21.2                OPM2
SV2A            chr1q21.2                KMS20
SYT11           chr1q21.2                KMS20
F11R            chr1q21.2-q21.3          KMS20
LMNA            chr1q21.2-q21.3          H929
SEC22L1         chr1q21.2-q21.3          KMS26
KCNN3           chr1q21.3                H929
LY9             chr1q21.3-q22            KMS20
ARHGEF2         chr1q21-q22              H929/OPM2
VPS45A          chr1q21-q22              KMS20
B4GALT3         chr1q21-q23              H929
HIST2H2BE       chr1q21-q23              OPM2
AIM2            chr1q22                  H929
FDPS            chr1q22                  LP1
IFI16           chr1q22                  H929
MAPBPIP         chr1q22                  KMS20
PAQR6           chr1q22                  KMS20
PBXIP1          chr1q22                  KMS20
PEX19           chr1q22                  KMS20
SEMA4A          chr1q22                  H929/KMS20
MGST3           chr1q23                  OPM2
PBX1            chr1q23                  CMA02/KMS20/OPM2
C1orf66         chr1q23.1                KMS20
NES             chr1q23.1                H929
SLAMF7          chr1q23.1-q24.1          H929/KMS20/OPM2
BAT2D1          chr1q23.3                OPM2
SLC19A2         chr1q23.3                OPM2
UAP1            chr1q23.3                H929/OPM2
COPA            chr1q23-q25              H929
FMO3            chr1q23-q25              KMS20
MRPS14          chr1q23-q25              LP1
ATP1B1          chr1q24                  CMA02/KMS20/OPM2
C1orf9          chr1q24                  OPM2
CREG1           chr1q24                  OPM2
QSCN6           chr1q24                  KMS20
RABGAP1L        chr1q24                  OPM2
KLHL20          chr1q24.1-q24.3          OPM2
DNM3            chr1q24.3                OPM2
C1orf22         chr1q24-q25              LP1
KIAA0859        chr1q24-q25.3            LP1
C1orf24         chr1q25                  LP1
NPL             chr1q25                  KMS20
C1orf25         chr1q25.2                KMS20
BGLAP /// PMF1  chr1q25-q31 /// chr1q12  H929
RGS1            chr1q31                  LP1
RGS2            chr1q31                  LP1
RBBP5           chr1q32                  KMS26
DTL             chr1q32.3                KMS26
FLJ12505        chr1q32.3                KMS26
LOC90806        chr1q32.3                KMS26
TRIM58          chr1q44                  OPM2
ZNF124          chr1q44                  OPM2
NRP2            chr2q33.3                CMA03
ADAM23          chr2q33                  CMA03
GMPS            chr3q24                  RPMI8226
SSR3            chr3q25.31               RPMI8226
LXN             chr3q25.32               RPMI8226
MFSD1           chr3q25.32               RPMI8226
BCHE            chr3q26.1-q26.2          RPMI8226
GOLPH4          chr3q26.2                RPMI8226
TNIK            chr3q26.2                RPMI8226
ECT2            chr3q26.1-q26.2          RPMI8226
NLGN1           chr3q26.31               RPMI8226
PIK3CA          chr3q26.3                RPMI8226
MFN1            chr3q26.32               RPMI8226
ACTL6A          chr3q26.33               RPMI8226
IMP-2           chr3q27.2                RPMI8226
MCM3            chr6p12                  JJN3
MARCKS          chr6q22.2                U266
HGF             chr7q21.1                JJN3
PLEC1           chr8q24                  KMS26
ZNF7            chr8q24                  KMS18
SQLE            chr8q24.1                JJN3
ATAD2           chr8q24.13               JJN3
DERL1           chr8q24.13               KMS18
DDEF1           chr8q24.1-q24.2          JJN3/KMS18
KHDRBS3         chr8q24.2                KMS18
FAM49B          chr8q24.21               CMA02/JJN3/KMS18
KIAA0143        chr8q24.22               JJN3/KMS18
ST3GAL1         chr8q24.22               JJN3
CPSF1           chr8q24.23               KMS18
SIAHBP1         chr8q24.2-qtel           KMS18
NDRG1           chr8q24.3                CMA02/JJN3
KIAA0870        chr8q24.3                KMS18
PTP4A3          chr8q24.3                KMS26
ZC3H3           chr8q24.3                JJN3
SCRIB           chr8q24.3                JJN3
GRINA           chr8q24.3                KMS18
EXOSC4          chr8q24.3                JJN3
CYC1            chr8q24.3                KMS18
BOP1            chr8q24.3                JJN3
VPS28           chr8q24.3                KMS18
C8orf33         chr8q24.3                KMS18
MNAB            chr9q34                  RPMI8226
DNM1            chr9q34                  RPMI8226
ST6GALNAC4      chr9q34                  RPMI8226
SLC2A6          chr9q34                  RPMI8226
FLJ35348        chr9q34                  RPMI8226
STOM            chr9q34.1                RPMI8226
AK1             chr9q34.1                RPMI8226
LOC441468       chr9q34.11               RPMI8226
RXRA            chr9q34.3                RPMI8226
KIAA0649        chr9q34.3                RPMI8226
NOTCH1          chr9q34.3                RPMI8226
NPDC1           chr9q34.3                RPMI8226
CSTF3           chr11p13                 KMM1
FBXO3           chr11p13                 KMM1
PPP2R1B         chr11q23.2               JJN3
ITPR2           chr12p11                 KMS26
MRPS35          chr12p11                 KMS26
FLJ11088        chr12p11.22              KMS26
PTX1            chr12p11.22              KMS26
C12orf11        chr12p11.23              KMS26
SURB7           chr12p11.23              KMS26
CMAS            chr12p12.1               KMS26
ETNK1           chr12p12.1               KMS26
KIAA0528        chr12p12.1               KMS26
KRAS            chr12p12.1               KMS26
LRMP            chr12p12.1               KMS26
ARNTL2          chr12p12.2-p11.2         KMS26
ELF1            chr13q13                 JJN3
SLC25A15        chr13q14                 JJN3
FOXO1A          chr13q14.1               JJN3
DNAJC15         chr13q14.1               JJN3
MRPS31          chr13q14.11              JJN3
WBP4            chr13q14.11              JJN3
NARG1L          chr13q14.11              JJN3
AKAP11          chr13q14.11              JJN3
MTRF1           chr13q14.1-q14.3         JJN3
ANKRD10         chr13q34                 KMS12
CDC16           chr13q34                 JJN3
CUL4A           chr13q34                 JJN3/KMS12
FLJ11305        chr13q34                 KMS12
FLJ12118        chr13q34                 KMS12
LAMP1           chr13q34                 KMS12
TUBGCP3         chr13q34                 JJN3/KMS12
UPF3A           chr13q34                 KMS12
MBTPS1          chr16q24                 KMS26
FLJ10826        chr16q12.2               KMS26
NUP93           chr16q13                 KMS26
CIAPIN1         chr16q13-q21             KMS26
GABARAPL2       chr16q22.3-q24.1         KMS26
MAF             chr16q22-q23             KMS26
ADAT1           chr16q23.1               KMS26
TERF2IP         chr16q23.1               KMS26
ASCIZ           chr16q23.2               KMS26
BM039           chr16q23.2               KMS26
DC13            chr16q23.2               KMS26
HSBP1           chr16q23.3               KMS26
MPHOSPH6        chr16q23.3               KMS26
WWOX            chr16q23.3-q24.1         KMS26
KARS            chr16q23-q24             KMS26
CBFA2T3         chr16q24                 KMS26
COTL1           chr16q24.1               KMS26
KIAA0182        chr16q24.1               KMS26
ZCCHC14         chr16q24.2               KMS26
ANKRD11         chr16q24.3               KMS26
GALNS           chr16q24.3               KMS26
KIAA1049        chr16q24.3               KMS26
TTC19           chr17p12                 KMS26
NCOR1           chr17p11.2               KMS26
TRPV2           chr17p11.2               KMS26
TNFRSF13B       chr17p11.2               KMS26
COPS3           chr17p11.2               KMS26
MBD2            chr18q21                 CMA02/KMS26
PIAS2           chr18q21.1               KMS26
POLI            chr18q21.1               CMA02/KMS26
RKHD2           chr18q21.1               KMS26
SETBP1          chr18q21.1               KMS26
SMAD4           chr18q21.1               KMS26
SMAD7           chr18q21.1               KMS26
TCF4            chr18q21.1               KMS26
FVT1            chr18q21.3               CMA02
VPS4B           chr18q21.32-q21.33       CMA02
PHLPP           chr18q21.33              CMA02
BCL2            chr18q21.33              CMA02/KMS12
ZNF227          chr19q13.31              KMM1
BCL3            chr19q13.1-q13.2         KMM1
FBXO17          chr19q13.2               KMM1
ZNF45           chr19q13.2               KMM1
ZNF226          chr19q13.2               KMM1
ZNF228          chr19q13.2               KMM1
STRN4           chr19q13.2               KMM1
AP2S1           chr19q13.2-q13.3         KMM1
CD3EAP          chr19q13.3               KMM1
PRKD2           chr19q13.3               KMM1
RELB            chr19q13.31-q13.32       KMM1
CECR5           chr22q11.1               CMA03
BCL2L13         chr22q11                 CMA03
CECR1           chr22q11.2               CMA03
MICAL3          chr22q11.21              CMA03
MRPL40          chr22q11.21              CMA03
PIK4CA          chr22q11.21              CMA03
UFD1L           chr22q11.21              CMA03
COMT            chr22q11.21              CMA03
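The selection rule stated in the caption of Suppl. Tab. 3 (DNA copy number > 4 together with an expression value ≥ 2-fold above the average across the cell lines) can be illustrated with a short sketch. It is not the analysis code used for the thesis: the function name `matched_filter`, the dictionary layout, and the toy gene names and values are assumptions made for illustration only, and expression is assumed to be on a linear (not log) scale.

```python
from statistics import mean


def matched_filter(copy_number, expression, cn_threshold=4.0, fold=2.0):
    """Return sorted (gene, cell_line) pairs that pass both criteria.

    copy_number and expression are hypothetical nested dicts of the form
    {gene: {cell_line: value}}; expression is assumed to be linear-scale.
    """
    hits = []
    for gene, expr_by_line in expression.items():
        # Average expression of this gene across all profiled cell lines.
        average = mean(expr_by_line.values())
        for cell_line, value in expr_by_line.items():
            cn = copy_number.get(gene, {}).get(cell_line)
            if cn is None:
                continue  # no copy-number estimate for this gene/cell line
            if cn > cn_threshold and value >= fold * average:
                hits.append((gene, cell_line))
    return sorted(hits)


# Toy values invented for illustration; they are not data from the thesis.
toy_cn = {
    "GENE_A": {"L1": 5.0, "L2": 2.0, "L3": 2.0},
    "GENE_B": {"L1": 6.0, "L2": 2.0, "L3": 2.0},
}
toy_expr = {
    "GENE_A": {"L1": 900.0, "L2": 150.0, "L3": 150.0},
    "GENE_B": {"L1": 100.0, "L2": 110.0, "L3": 90.0},
}
selected = matched_filter(toy_cn, toy_expr)
print(selected)
```

With these toy values only GENE_A in cell line L1 passes both criteria; GENE_B is amplified in L1 but is not expressed above twice its average, so it is filtered out.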

Suppl. Tab. 4: COPA analysis at 75th percentile selecting top 100 unique outliers with DNA CN > 2.3.

chr.  cytoband              symbol       COPA.75 score  COPA rank  cases  % cases
2     chr2qter              FRZB         14,06          1          3      10%
1     chr1q21               S100A8       11,59          2          19     66%
13    chr13q22              EDNRB        9,33           3          2      7%
4     chr4p16.3             FGFR3        8,15           4          3      10%
7     chr7q21               PEG10        7,67           5          12     41%
7     chr7q21-q22           SGCE         6,80           6          12     41%
2     chr2p21               TACSTD1      6,34           7          3      10%
15    chr15q11.2-q12        NDN          5,94           8          10     34%
6     chr6q23.1             CTGF         5,74           9          1      3%
8     chr12q15/chr19q13.4   LYZ/LILRB1   5,31           10         10     34%
21    chr21p11.1            BAGE         5,13           11         6      21%
1     chr1q22-q23           SLAMF1       5,12           12         20     69%
1     chr1q21               S100A12      4,97           13         19     66%
3     chr3q21-q24           PCOLCE2      4,42           14         11     38%
1     chr1p31               PDE4B        4,42           15         1      3%
2     chr2q24               SCN3A        4,35           16         3      10%
19    chr19q13.32           FOSB         4,23           17         12     41%
3     chr3p22.3             OSBPL10      3,90           18         8      28%
4     chr4q13.2             BRDG1        3,86           19         2      7%
21    chr21p11              TPTE         3,75           20         6      21%
4     chr4q31.3             TRIM2        3,71           21         4      14%
1     chr1q23               FCGR2B       3,57           22         20     69%
12    chr12q12              PDZRN4       3,55           23         7      24%
2     chr2q22-q23           NR4A2        3,49           24         3      10%
12    chr12p13.2-p13.3      TEAD4        3,46           25         4      14%
2     chr2q32.3-q33         GULP1        3,45           26         3      10%
16    chr16q22-q23          MAF          3,37           27         5      17%
11    chr11p15.5            PHLDA2       3,23           28         7      24%
1     chr1q41               ESRRG        3,08           29         11     38%
2     chr2q31.1             ITGA6        3,07           30         3      10%
20    chr20p12              C20orf103    3,07           31         3      10%
11    chr11q13              CCND1        3,05           32         12     41%
7     chr7q11.22            AUTS2        2,93           33         11     38%
1     chr1p31.2             C1orf139     2,93           34         1      3%
6     chr6q14-q15           CNR1         2,91           35         1      3%
7     chr7q32.3             FLJ43663     2,89           36         14     48%
7     chr7p15-p14           CPVL         2,89           37         11     38%
19    chr19q13.4            LILRB4       2,88           38         10     34%
8     chr8p21.3             ChGn         2,87           39         7      24%
14    chr14q24.3            FOS          2,85           40         5      17%
13    chr13q21.1-q22        POU4F1       2,84           41         2      7%
3     chr3q26.1-q26.2       BCHE         2,84           42         12     41%
5     chr5q32               SH3TC2       2,84           43         7      24%
1     chr1q42-q43           KCNK1        2,80           44         11     38%
1     chr1q21               S100A9       2,80           45         19     66%
4     chr4q13-q21           AREG         2,78           46         2      7%
9     chr9q31               TMEFF1       2,77           47         10     34%
9     chr9q31               KLF4         2,76           48         11     38%
8     chr8q13.2-q13.3       SULF1        2,70           49         14     48%
4     chr4q12-q13           PPBP         2,68           50         2      7%
11    chr11p15              TRIM22       2,67           51         7      24%
9     chr9q34               FCN1         2,66           52         10     34%
9     chr9p12               DDX58        2,66           53         7      24%
7     chr7q21.1             HGF          2,65           54         12     41%
11    chr11q11              SLC43A3      2,64           55         11     38%
6     chr6p21.3             HIST1H2BG    2,64           56         6      21%
3     chr3q13.2             ZBTB20       2,63           57         7      24%
9     chr9q22               SYK          2,63           58         10     34%
11    chr11q12-q13.1        FADS2        2,60           59         11     38%
8     chr8p21-p12           CLU          2,60           60         7      24%
2     chr2p16               EFEMP1       2,57           61         3      10%
3     chr3q12-q13           CD200        2,56           62         7      24%
5     chr5q12-q13.3         ENC1         2,51           63         10     34%
17    chr17q23              TBX2         2,49           64         8      28%
16    chr16p13.3            IL32         2,48           65         5      17%
1     chr1q23               FCGR2A       2,47           66         20     69%
10    chr10q21              ANK3         2,47           67         2      7%
1     chr1q44               TRIM58       2,47           68         11     38%
17    chr17q21.33           CROP         2,44           69         8      28%
11    chr11q12.3            HRASLS2      2,43           70         11     38%
12    chr12q24.13           RPH3A        2,43           71         3      10%
2     chr2p24               DTNB         2,40           72         5      17%
4     chr4q25-q27           ANK2         2,39           73         2      7%
2     chr2p23.1             YPEL5        2,37           74         4      14%
2     chr2p25               SOX11        2,36           75         5      17%
2     chr2p25.2             RSAD2        2,34           76         5      17%
17    chr17q21.2            SUI1         2,33           77         7      24%
6     chr6q22.31            SMPDL3A      2,32           78         1      3%
2     chr2p11.1-q11.1       EIF5B        2,32           79         2      7%
1     chr1p36.3-p36.2       TNFRSF1B     2,31           80         2      7%
6     chr6q23.1-23.3        MOXD1        2,30           81         1      3%
1     chr1p36.13-q42.3      C1orf73      2,30           82         12     41%
6     chr6q15               GABRR1       2,29           83         1      3%
19    chr19p13              TNFSF7       2,28           84         14     48%
4     chr4q12               IGFBP7       2,27           85         2      7%
4     chr4p14               TBC1D1       2,24           86         2      7%
15    chr15q26              STARD5       2,24           87         12     41%
17    chr17q25.1            CD300A       2,23           88         7      24%
1     chr1q23               PBX1         2,23           89         19     66%
1     chr1q25.2-q25.3       PTGS2        2,23           90         15     52%
1     chr1q25.3             RGL1         2,21           91         16     55%
1     chr1p21               AMY1/AMY2    2,21           92         1      3%
1     chr1q21.2             SYT11        2,20           93         19     66%
18    chr18q12              MOCOS        2,20           94         9      31%
17    chr17p11.2            EPN2         2,20           95         6      21%
1     chr1p36.2             CA6          2,18           96         2      7%
1     chr1p31.1             LPHN2        2,17           97         1      3%
9     chr9q33-q34.1         LHX2         2,15           98         11     38%
7     chr7q22               RELN         2,15           99         15     52%
21    chr21q22.3            ABCG1        2,15           100        7      24%

Suppl. Tab. 5: COPA analysis at 90th percentile selecting top 100 unique outliers with DNA CN > 2.3.

chr.  cytoband             symbol       COPA.90 score  COPA rank  cases  % cases
13    chr13q22             EDNRB        35.40          1          2      7%
1     chr1q31.2            RGS13        30.22          2          16     55%
4     chr4q13-q21          AREG         29.24          3          2      7%
2     chr2q32.3-q33        GULP1        22.29          4          3      10%
2     chr2p21              TACSTD1      20.34          5          3      10%
4     chr4q21-q25          SPP1         20.17          6          2      7%
6     chr6q23.1            CTGF         18.74          7          1      3%
1     chr1q21              S100A8       17.49          8          19     66%
6     chr6q22-q23          LAMA2        17.18          9          1      3%
2     chr2qter             FRZB         17.17          10         3      10%
4     chr4p16.3            FGFR3        16.83          11         3      10%
15    chr15q11.2           HDGFRP3      16.30          12         11     38%
4     chr4p15.31           GBA3         15.28          13         2      7%
5     chr5q13              F2RL1        14.53          14         10     34%
4     chr4q34-q35          HPGD         14.22          15         3      10%
5     chr5q14              NR2F1        13.97          16         8      28%
21    chr21p11.1           BAGE         13.68          17         6      21%
11    chr11q13.2           GAL          13.54          18         12     41%
14    chr14q32.33          IGHD         13.51          19         6      21%
12    chr12q15             PHLDA1       12.99          20         2      7%
18    chr18q22             MC4R         12.95          21         11     38%
6     chr6q14-q15          CNR1         12.87          22         1      3%
6     chr6p21.3            HLA-DQA1     12.55          23         4      14%
2     chr2p25.1            GREB1        12.46          24         5      17%
12    chr12p12.3           EMP1         12.45          25         4      14%
1     chr1p31.1            LPHN2        12.21          26         1      3%
2     chr2q24.3            COBLL1       12.05          27         3      10%
1     chr1q25.2-q25.3      PTGS2        11.85          28         15     52%
7     chr7q21              PEG10        11.81          29         12     41%
6     chr6p21.3            HLA-DPA1     11.67          30         4      14%
8     chr12q15/chr19q13.4  LYZ/LILRB1   11.57          31         10     34%
4     chr4q12              IGFBP7       11.44          32         2      7%
17    chr17q25             LGALS3BP     11.25          33         7      24%
1     chr1q21              S100A12      11.13          34         19     66%
3     chr3q26.31           NLGN1        11.08          35         12     41%
16    chr16p13.3           IL32         10.94          36         5      17%
1     chr1p36              CD52         10.90          37         2      7%
9     chr9p24.1            NFIB         10.82          38         8      28%
11    chr11p15.5           PHLDA2       10.33          39         7      24%
4     chr4q12-q13          PPBP         10.16          40         2      7%
14    chr14q32.33          IGHM         9.99           41         6      21%
8     chr8p22              MSR1         9.98           42         6      21%
22    chr22q12.3           PVALB        9.97           43         5      17%
4     chr4q12              NMU          9.88           44         2      7%
11    chr11q13             CCND1        9.85           45         12     41%
2     chr2p25.2            RSAD2        9.82           46         5      17%
8     chr8q13.2-q13.3      SULF1        9.75           47         14     48%
4     chr4q21.21           MGC10646     9.67           48         2      7%
6     chr6q15              GABRR1       9.62           49         1      3%
8     chr8q22.1            FLJ20171     9.61           50         13     45%
6     chr6p21.3            HLA-DPB1     9.60           51         4      14%

Suppl. Tab. 5: continuation.

chr.  cytoband             symbol       COPA.90 score  COPA rank  cases  % cases
6     chr6p21.3            HLA-DQA1/2   9.58           52         4      14%
3     chr3q21-q24          PCOLCE2      9.55           53         11     38%
15    chr15q11.2-q12       NDN          9.48           54         10     34%
4     chr4q32.3            FLJ20035     9.39           55         3      10%
11    chr11p13             C11orf8      9.36           56         9      31%
16    chr16p13.3           PRSS21       9.26           57         5      17%
2     chr2p25              SOX11        9.24           58         5      17%
13    chr13q33             KDELC1       9.21           59         4      14%
22    chr22q11.23|22q11    VPREB3       9.20           60         4      14%
6     chr6q24.1            GPR126       9.06           61         2      7%
11    chr11q23.3           MCAM         8.98           62         11     38%
1     chr1q22-q23          SLAMF1       8.97           63         20     69%
7     chr7q21-q31          SEMA3C       8.90           64         12     41%
12    chr12p13.2-p13.3     TEAD4        8.84           65         4      14%
8     chr8p21.2            PNMA2        8.80           66         7      24%
3     chr3q26.2            TNIK         8.73           67         12     41%
7     chr7q21-q22          SGCE         8.71           68         12     41%
21    chr21p11             TPTE         8.60           69         6      21%
12    chr12q12             PDZRN4       8.59           70         7      24%
5     chr5q33.1-qter       LCP2         8.57           71         7      24%
8     chr8p11.2-p11.1      FGFR1        8.56           72         9      31%
19    chr19q13.42          NALP2        8.55           73         10     34%
15    chr15q26             NR2F2        8.52           74         12     41%
2     chr2p24.1            MYCN         8.48           75         5      17%
7     chr7q31-q32          GNG11        8.45           76         12     41%
4     chr4q31.3            TRIM2        8.45           77         4      14%
15    chr15q24-q25         CTSH         8.38           78         10     34%
16    chr16q22.1           CDH1         8.27           79         3      10%
8     chr8q22              RUNX1T1      8.19           80         13     45%
8     chr8q24.1            ENPP2        8.18           81         13     45%
10    chr10q21             ANK3         8.16           82         2      7%
3     chr3q25.32-q25.33    SCHIP1       8.05           83         12     41%
14    chr14q11.2           RNASE6       8.01           84         4      14%
8     chr8p22              DLC1         8.00           85         8      28%
4     chr4q34-q35          FAT          7.94           86         2      7%
8     chr8p23.1            DEFA1/DEFA3  7.94           87         8      28%
20    chr20q11.2-q13.1     MAFB         7.93           88         10     34%
4     chr4q13              UGT2B17      7.87           89         2      7%
1     chr1q23.1            NES          7.85           90         19     66%
2     chr2cen-q13          MAL          7.74           91         2      7%
7     chr7q31.3            PTPRZ1       7.70           92         14     48%
9     chr9q33-q34.1        LHX2         7.69           93         11     38%
1     chr1q42-q43          KCNK1        7.67           94         11     38%
19    chr19q13.33          MYBPC2       7.66           95         11     38%
5     chr5q13.3            KIAA0888     7.62           96         10     34%
9     chr9q21.13           ANXA1        7.59           97         9      31%
1     chr1p31              PDE4B        7.53           98         1      3%
3     chr3p21|3p21.3       CX3CR1       7.46           99         8      28%
1     chr1q42.11           CDC42BPA     7.43           100        10     34%
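The COPA.75 and COPA.90 scores in Suppl. Tab. 4 and 5 are percentile-based outlier statistics. The following is a minimal sketch of how such a score can be computed, assuming the commonly described COPA transformation: each gene is median-centred, scaled by its median absolute deviation (MAD), and the chosen percentile of the transformed values is taken as the score. The function names, the linear-interpolation percentile, and the way the DNA CN > 2.3 filter is applied (a gene is kept if at least one case exceeds the threshold) are illustrative assumptions and not necessarily the procedure used for the tables.

```python
from statistics import median


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def copa_score(expr, q=75):
    """COPA score of one gene: q-th percentile of the median-centred,
    MAD-scaled expression values across samples."""
    med = median(expr)
    mad = median([abs(x - med) for x in expr])
    if mad == 0:
        raise ValueError("MAD is zero; the gene cannot be scaled")
    return percentile([(x - med) / mad for x in expr], q)


def rank_outliers(expr_by_gene, cn_by_gene, q=75, cn_threshold=2.3, top=100):
    """Rank genes by COPA score, keeping only genes with at least one case
    whose DNA copy number exceeds cn_threshold (assumed filter)."""
    rows = []
    for gene, expr in expr_by_gene.items():
        gained = sum(1 for cn in cn_by_gene[gene] if cn > cn_threshold)
        if gained == 0:
            continue
        rows.append((gene, copa_score(expr, q), gained))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


# Toy data (hypothetical values, not taken from the thesis).
expr = {"A": [1, 2, 3, 4, 100], "B": [1, 2, 3, 4, 5], "C": [1, 2, 3, 4, 200]}
cn = {"A": [2, 2, 2, 2, 3], "B": [2, 2, 2, 2.5, 2], "C": [2, 2, 2, 2, 2]}
ranked = rank_outliers(expr, cn, q=90)
```

In this toy example gene C is dropped because none of its cases has a copy number above 2.3, and gene A ranks first because a single sample with very high expression dominates its upper percentiles. A higher percentile emphasises outliers present in fewer samples, which is consistent with the different orderings seen in the two tables.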

Suppl. Tab. 6: RT-PCR primers for validation of novel transcription activity identified by tiling expression arrays analysis.

        Forward primer (5’→3’)     Reverse primer (5’→3’)   amplicon [bp]
11p_9   TGCTGATAGTGCAACTTGAGG      CACGGAGTCTTTTGGGTTTG     253
11p_11  CACAGAGAAGTAATCATGCCATTT   CACACACACTGAATGAAGCTGA   257
ACTIN   AAGAGAGGCATCCTCACCCT       TACATGGCTGGGGTGTTGAA     218

Suppl. Tab. 7: UGIs mapping to un-annotated regions. These regions are indicative of novel transcription activity, comprising un-annotated genomic intervals and putative novel transcripts.

chromosome  start      end        cell line
chr8        7263853    7264032    JEKO-1
chr8        8277371    8277514    JEKO-1
chr8        128169096  128169200  JEKO-1
chr8        41488441   41488661   JJN3
chr8        43245552   43245805   JJN3
chr8        47610184   47610331   JJN3
chr8        47645097   47645203   JJN3
chr8        47729278   47729490   JJN3
chr8        57177254   57177362   JJN3
chr8        62013325   62013548   JJN3
chr8        62277459   62277711   JJN3
chr8        76451533   76451643   JJN3
chr8        90495778   90496031   JJN3
chr8        115111247  115111468  JJN3
chr11       76702564   76702752   JEKO-1
chr11       82367384   82367660   JEKO-1
chr11       82463583   82463800   JEKO-1
chr11       82466061   82466343   JEKO-1
chr11       82466721   82466921   JEKO-1
chr11       82467194   82467383   JEKO-1
chr11       82467725   82468243   JEKO-1
chr11       82468777   82469277   JEKO-1
chr11       82469558   82469707   JEKO-1
chr11       82469924   82470035   JEKO-1
chr11       82474542   82474723   JEKO-1
chr11       82475704   82476016   JEKO-1
chr11       86413352   86413531   JEKO-1
chr11       89569008   89569118   JEKO-1
chr11       89570060   89570239   JEKO-1
chr11       94131013   94131366   JEKO-1
chr11       94286625   94286770   JEKO-1
chr11       109425634  109426043  JEKO-1
chr11       39948308   39948527   JJN3
chr11       40001433   40001538   JJN3
chr11       40030486   40030594   JJN3
chr11       40039979   40040279   JJN3
chr11       40040882   40041065   JJN3
chr11       40052731   40052916   JJN3
chr11       40061682   40062010   JJN3
chr11       40063750   40064150   JJN3
chr11       40065225   40065487   JJN3
chr11       40065735   40065911   JJN3
chr11       40068464   40068610   JJN3

Suppl. Tab. 7: continuation.

chromosome  start      end        cell line
chr11       40071835   40072015   JJN3
chr11       40074735   40074987   JJN3
chr11       40076393   40076505   JJN3
chr11       40076786   40077065   JJN3
chr11       61855673   61855979   JJN3
chr11       74459276   74459427   JJN3
chr11       74478797   74479121   JJN3
chr11       74482090   74482197   JJN3
chr11       74489103   74490061   JJN3
chr11       74490457   74490734   JJN3
chr11       74491240   74491420   JJN3
chr11       74492161   74492722   JJN3
chr11       74494347   74494679   JJN3
chr11       76702381   76702752   JJN3
chr11       82467725   82467829   JJN3
chr11       82474576   82474723   JJN3
chr11       82475704   82475912   JJN3
chr11       87772673   87772780   JJN3
chr11       109425711  109426007  JJN3
chr11       118210014  118210160  JJN3
chr11       118979578  118979987  JJN3
chr11       122336829  122337043  JJN3
chr11       122393950  122394188  JJN3
chr11       125442764  125442970  JJN3
chr11       125443290  125443400  JJN3
chr11       82367419   82367660   KARPAS422
chr11       87781812   87782001   KARPAS422
chr11       103257005  103257181  KARPAS422
chr11       103261076  103261295  KARPAS422
chr11       103264731  103264839  KARPAS422
chr11       103270702  103270831  KARPAS422
chr11       103274731  103274904  KARPAS422
chr11       107340888  107341078  KARPAS422
chr11       107343550  107343698  KARPAS422
chr11       110721360  110721503  KARPAS422
chr11       110726296  110726435  KARPAS422
chr12       51822123   51822374   KARPAS422
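Because Suppl. Tab. 7 lists intervals per cell line, one simple way to read it is to check which intervals detected in different cell lines overlap. The sketch below illustrates such a check on six rows copied from the table. It assumes closed intervals on a shared coordinate system (the coordinate convention is not stated in the table), and the function names are hypothetical.

```python
from itertools import combinations

# (chromosome, start, end, cell line); rows copied from Suppl. Tab. 7.
REGIONS = [
    ("chr11", 76702564, 76702752, "JEKO-1"),
    ("chr11", 76702381, 76702752, "JJN3"),
    ("chr11", 82367384, 82367660, "JEKO-1"),
    ("chr11", 82367419, 82367660, "KARPAS422"),
    ("chr11", 87772673, 87772780, "JJN3"),
    ("chr11", 87781812, 87782001, "KARPAS422"),
]


def overlaps(a, b):
    """True if two regions lie on the same chromosome and share at least
    one position (closed intervals are an assumption)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def shared_between_cell_lines(regions):
    """Return all pairs of overlapping regions that come from different cell lines."""
    return [
        (a, b)
        for a, b in combinations(regions, 2)
        if a[3] != b[3] and overlaps(a, b)
    ]


shared = shared_between_cell_lines(REGIONS)
for a, b in shared:
    print(a[0], max(a[1], b[1]), min(a[2], b[2]), a[3], b[3])
```

The pairwise comparison is quadratic in the number of intervals, which is adequate for a table of this size; a sorted sweep would be preferable for genome-wide interval sets.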

APPENDIX

TBE 10x (1 l) (pH 8.3)
108 g TrisBase
55 g boric acid
20 ml EDTA 0.5 M pH 8
H2O ad 1000 ml

Reduced EDTA TE buffer (pH 8.0)
                    final concentration
TrisHCl pH 8.0      10 mM
EDTA pH 8.0         0.1 mM

MTT lysis buffer
                    final concentration
SDS                 25%
HCl                 0.025 N

Sodium dodecylsulfate–polyacrylamide gel electrophoresis (SDS-PAGE)

Stacking gel (4 ml, 1.5 mm thick), 5%
40% acrylamide             0.5 ml
0.5 M TrisHCl pH 6.8       1 ml
10% SDS                    40 μl
10% APS                    40 μl
TEMED                      6 μl
H2O                        2.4 ml

Running gel (10 ml, 1.5 mm thick), 10%
40% acrylamide             2.5 ml
1.5 M Tris pH 8.8          2.5 ml
10% SDS                    100 μl
10% APS                    100 μl
TEMED                      8 μl
H2O                        4.8 ml

Protein lysis buffer with phosphatase inhibitors:
stock solution                   final concentration
1 M Tris HCl pH 7.4              50 mM
5 M NaCl                         250 mM
Nonidet P40 (NP-40)              0.1%
0.5 M EDTA                       5 mM
Protease inhibitor cocktail*     10%
200 mM PMSF (in CH3OH)           2 mM
200 mM Na3VO4 (in H2O)           10 mM
100 mM NaF                       50 mM
H2O                              ad volume

*A 100% solution is prepared by dissolving 1 Mini complete tablet in 1 ml water (Roche Diagnostics GmbH, Penzberg, Germany).
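The stock and final concentrations listed for the protein lysis buffer can be converted into pipetting volumes with the dilution relation C1·V1 = C2·V2. The sketch below does this for a hypothetical 10 ml batch (the batch size is an assumption; the recipe gives none). NP-40 is left out because the recipe does not state a stock concentration for it, so the computed water volume is the remainder before NP-40 is added. The function and variable names are illustrative.

```python
def stock_volume(stock_conc, final_conc, total_volume):
    """Volume of stock solution needed to reach final_conc in total_volume,
    from C1 * V1 = C2 * V2. Both concentrations must share one unit."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return total_volume * final_conc / stock_conc


# (stock, final) concentrations from the recipe; mM unless noted otherwise.
LYSIS_BUFFER = {
    "Tris HCl pH 7.4": (1000, 50),
    "NaCl": (5000, 250),
    "EDTA": (500, 5),
    "Protease inhibitor cocktail": (100, 10),  # percent of the 100% solution
    "PMSF": (200, 2),
    "Na3VO4": (200, 10),
    "NaF": (100, 50),
}


def lysis_buffer_volumes(total_volume_ul):
    """Volumes in microlitres of each stock, plus the remaining water.
    NP-40 is not included (no stock concentration given in the recipe)."""
    volumes = {
        name: stock_volume(stock, final, total_volume_ul)
        for name, (stock, final) in LYSIS_BUFFER.items()
    }
    volumes["H2O"] = total_volume_ul - sum(volumes.values())
    return volumes


volumes = lysis_buffer_volumes(10_000)  # hypothetical 10 ml batch
for name, vol in volumes.items():
    print(f"{name}: {vol:.0f} ul")
```

With these stocks, NaF alone accounts for half of the final volume, which leaves comparatively little room for water and is worth keeping in mind when scaling the batch.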

Protein loading buffer 2x
stock solution             final concentration
0.5 M TrisHCl pH 6.8       100 mM
1 M DTT                    200 mM
20% SDS                    4%
1% Bromophenol blue        0.2%
glycerol                   20%

Running buffer (TGE) 5X (1 l)
15.15 g TrisBase
94 g glycine
50 ml SDS 10%
H2O ad 1000 ml

Transfer Buffer 2X (2 l)
24.2 g TrisBase
28.5 g glycine
4 ml SDS 10%
800 ml methanol
H2O ad 2000 ml

TBS 10X
stock solution             final concentration
1 M TrisHCl pH 7.5         100 mM
5 M NaCl                   1.5 M

TBST 0.1%
                           final concentration
10x TBS                    1x
Tween 20                   0.1%

CURRICULUM VITAE

First name and family name Giulia Poretti, daughter of Athos Poretti and Angela Poretti (Saglini)

Date and Place of birth 2 September 1980, Lugano

Place of origin Lugano (TI)

Nationality Swiss

Address Via Soldini 29, CH-6830 Chiasso +41 (0)79 752 96 62 [email protected]

Marital status Unmarried

Education

1995/1999 Literary high school, Mendrisio
June 1999 Literary high school leaving certificate

1999/2004 Swiss Federal Institute of Technology Zurich (ETHZ), Faculty of Pharmacy

Diploma thesis Development of a human thymidine kinase with a broad substrate specificity to use as suicide gene, Supervisor: Prof. Leonardo Scapozza, Institute of Pharmaceutical Sciences, ETHZ

November 2004 Swiss federal pharmacist’s diploma

Work experience

February 2005 – July 2008 PhD student at the Laboratory of Experimental Oncology of the Oncology Institute of Southern Switzerland (IOSI), via Vela 6, 6500 Bellinzona (TI), Switzerland.
PhD thesis supervisor: Prof. Leonardo Scapozza, Section des Sciences Pharmaceutiques, Université de Genève
PhD thesis co-director: Dr. Francesco Bertoni, IOSI

Additional Information

Languages
Italian: mother tongue
English, German and French: good, both oral and written

Computer skills
Microsoft Office applications, Affymetrix software applications, UCSC Genome Browser, Integrated Genome Browser (IGB), genome browser of the Database of Genomic Variants, open-source metaserver Galaxy, web-accessible program DAVID

Technical skills
Genome-wide DNA profiling and gene expression profiling by Affymetrix GeneChip systems, standard molecular and cell biology techniques

POSTERS AND PUBLICATIONS

Publications

Rinaldi A, Kwee I, Taborelli M, Largo C, Uccella S, Martin V, Poretti G, Gaidano G, Calabrese G, Martinelli G, Baldini L, Pruneri G, Capella C, Zucca E, Cotter FE, Cigudosa JC, Catapano CV, Tibiletti MG, Bertoni F. Genomic and expression profiling identifies the B-cell associated tyrosine kinase Syk as a possible therapeutic target in mantle cell lymphoma. Br J Haematol. 2006;132:303-316

Rinaldi A, Kwee I, Poretti G, Mensah A, Pruneri G, Capello D, Rossi D, Zucca E, Ponzoni M, Catapano C, Tibiletti MG, Paulli M, Gaidano G, Bertoni F. Comparative genome-wide profiling of post-transplant lymphoproliferative disorders and diffuse large B-cell lymphomas. Br J Haematol. 2006;134:27-36

Lombardi L, Poretti G, Mattioli M, Fabris S, Agnelli L, Bicciato S, Kwee I, Rinaldi A, Ronchetti D, Verdelli D, Lambertenghi-Deliliers G, Bertoni F, Neri A. Molecular characterization of human multiple myeloma cell lines by integrative genomics: insights into the biology of the disease. Genes Chromosomes Cancer. 2007;46:226-238

Rinaldi A, Poretti G, Kwee I, Zucca E, Catapano CV, Tibiletti MG, Bertoni F. Concomitant MYC and microRNA cluster miR-17-92 (C13orf25) amplification in human mantle cell lymphoma. Leuk Lymphoma. 2007;48:410-412

Forconi F, Poretti G, Kwee I, Sozzi E, Rossi D, Rancoita PM, Capello D, Rinaldi A, Zucca E, Raspadori D, Spina V, Lauria F, Gaidano G, Bertoni F. High density genome-wide DNA profiling reveals a remarkably stable profile in hairy cell leukaemia. Br J Haematol. 2008;141:622-630

Posters

G. PORETTI, I. KWEE, L. AGNELLI, S. FABRIS, M. MATTIOLI, A. RINALDI, S. BICCIATO, L. LOMBARDI, F. BERTONI, A. NERI. Combined whole genome profiling and gene expression in multiple myeloma. USGEB Meeting, February 23-24 2006, Geneva (CH)

F. FORCONI, G. PORETTI, I. KWEE, E. SOZZI, D. ROSSI, D. CAPELLO, A. RINALDI, E. ZUCCA, F. LAURIA, G. GAIDANO, F. BERTONI. Genome-wide DNA profiling identifies a stable profile although with aberrations targeting the fibroblast growth factor pathway in Hairy Cell Leukemia. Blood 2007, 110 (11), 248b. ASH Annual Meeting and Exposition, December 8-11 2007, Georgia World Congress Center, Atlanta, Georgia (USA)

F. BERTONI, G. PORETTI, D. CAPELLO, I. KWEE, C. DEAMBROGI, A. GLOGHINI, L.M. LAROCCA, A. RINALDI, D. ROSSI, E. ZUCCA, A. CARBONE, G. GAIDANO. Genome-wide DNA profiling of HIV-related non-Hodgkin lymphomas: implications for disease pathogenesis and histogenesis. Blood 2007, 110 (11), 172a. ASH Annual Meeting and Exposition, December 8-11 2007, Georgia World Congress Center, Atlanta, Georgia (USA)

G. PORETTI, I. KWEE, M. TIBILETTI, B. BERNASCONI, A. RINALDI, E. ZUCCA, A. NERI, F. BERTONI. Molecular and functional characterization of 11q23.1 amplification in multiple myeloma (MM) and diffuse large B-cell lymphoma (DLBCL). 10th International Conference on Malignant Lymphoma, June 4-7 2008, Lugano (CH)
