Genetics: Early Online, published on December 26, 2017 as 10.1534/genetics.117.300552
The hidden genomic and transcriptomic plasticity of giant marker chromosomes in cancer
Gemma Macchia (1)*, Marco Severgnini (2), Stefania Purgato (3), Doron Tolomeo (1), Hilen Casciaro (1), Ingrid Cifola (2), Alberto L’Abbate (1), Anna Loverro (1), Orazio Palumbo (4), Massimo Carella (4), Laurence Bianchini (5), Giovanni Perini (3), Gianluca De Bellis (2), Fredrik Mertens (6), Mariano Rocchi (1), Clelia Tiziana Storlazzi (1)#.
(1) Department of Biology, University of Bari “Aldo Moro”, Bari, Italy;
(2) Institute for Biomedical Technologies (ITB), CNR, Segrate, Italy;
(3) Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy;
(4) Laboratorio di Genetica Medica, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, Italy;
(5) Laboratory of solid tumor genetics, Université Côte d'Azur, CNRS, IRCAN, Nice, France;
(6) Department of Clinical Genetics, University and Regional Laboratories, Lund University, Lund, Sweden.

EMBL-EBI Array Express database: E-MTAB-5625
NCBI Short Read Archive: PRJNA378952
GenBank repository: KY966261-KY966313 and KY966314-KY966332

Running Title: Neocentromeres and chimeric transcripts in cancer

Copyright 2017.

Keywords: neocentromere, fusion transcript, WDLPS, LSC, gene amplification

* Corresponding author:
Macchia Gemma, Department of Biology, University of Bari, Via Orabona no.4, 70125 Bari (Italy)
Email: [email protected]
Tel No: +39 0805443582
Fax: +39 0805443386

ABSTRACT
Genome amplification in the form of rings or giant rod-shaped marker chromosomes (RGMs) is a common genetic alteration in soft tissue tumours. The mitotic stability of these structures is often rescued by perfectly functioning analphoid neocentromeres, which therefore significantly contribute to cancer progression. Here, we disentangled the genomic architecture of many neocentromeres stabilizing marker chromosomes in well-differentiated liposarcoma and lung sarcomatoid carcinoma samples. In cells carrying heavily rearranged RGMs, these structures were assembled as patchworks of multiple short amplified sequences, disclosing an extremely high level of complexity and definitely ruling out the existence of regions prone to neocentromere seeding. Moreover, by studying two well-differentiated liposarcoma samples derived from the onset and the recurrence of the same tumour, we documented an expansion of the neocentromeric domain that occurred during tumour progression, which reflects a strong selective pressure acting toward the improvement of neocentromeric functionality in cancer. In lung sarcomatoid carcinoma cells, extensive “centromere sliding” phenomena, giving rise to multiple, closely mapping neocentromeric epialleles on separate co-existing markers, occurred, likely owing to the instability of neocentromeres arising in cancer cells. Finally, by investigating the transcriptional activity of neocentromeres, we came across a burst of chimeric transcripts, generated both by extremely complex genomic rearrangements and by cis/trans-splicing events. Post-transcriptional editing events have been reported to expand and variegate the genetic repertoire of higher eukaryotes, so they might have a determining role in cancer. The increased incidence of fusion transcripts might act as a driving force for the genomic amplification process, together with the increased transcription of oncogenes.

INTRODUCTION
Genome amplification is a frequent genetic alteration in cancer, with variable cytogenetic manifestations including double minutes, homogeneously staining regions and/or ring and giant rod-shaped marker chromosomes (RGM) (MATSUI et al. 2013; L'ABBATE et al. 2014; NORD et al. 2014). While double minutes and homogeneously staining regions have been described in a variety of cancer types (MATSUI et al. 2013), RGMs are particularly common in soft tissue tumours, notably in well-differentiated liposarcomas (WDLPS), and have been shown to contain amplified sequences from several chromosomes (NORD et al. 2014). During tumour progression, the ring chromosomes are frequently broken and resealed or transformed into rod-shaped markers capturing the telomeres from other chromosomes (NORD et al. 2014). This instability results in a highly complex internal structure of these markers, as well as in extensive heterogeneity with respect to size and number per cell (GARSED et al. 2014; NORD et al. 2014). RGMs frequently lack functional centromeric alphoid sequences and their mitotic stability is rescued by the emergence of perfectly functioning analphoid neocentromeres, which might indirectly contribute to cancer progression (MACCHIA et al. 2015). Nonetheless, there are few studies addressing neocentromeres in cancer, probably because most of the technologies employed to study the tumour genotypes are unable to unveil them. The occurrence of neocentromeres in cancer, therefore, could be more frequent than reported. Similarly, very little is known about the impact of neocentromeres on transcription, although centromeric satellite regions have been reported to produce non-coding transcripts actively involved in the centromere assembly (CHAN et al. 2012; ROSIC et al. 2014; QUENET AND DALAL 2015; MCNULTY et al. 2017). Also, genes within neocentromeres are still actively transcribed (AMOR AND CHOO 2002; WONG et al. 2006). In line with these notions, the occurrence of neocentromeres in colon cancer cell lines was reported to correlate with large DNase I hypersensitive sites, which are usually sites of active transcription or high nucleosome turnover (ATHWAL et al. 2015). By combining chromatin immunoprecipitation (IP) deep sequencing (ChIP-seq), whole genome sequencing (WGS), immuno-fluorescence in situ hybridisation (immuno-FISH), whole transcriptome sequencing (total RNA-seq) and other molecular analyses, we investigated in detail the genomic architecture of neocentromeres arising on RGMs, as well as their contribution to transcription, in the lung sarcomatoid carcinoma (LSC) cell line 04T036 and in the three liposarcoma cell lines 93T449, 94T778 and 95T1000. Overall, our study uncovered the complex organization of neocentromeres in cancer and shed light on the extraordinarily high genomic and transcriptomic plasticity associated with RGMs in solid tumours.

MATERIALS AND METHODS
Tumour cell lines
Four tumour cell lines (04T036, 93T449, 94T778, and 95T1000), kindly provided by the Centre Hospitalier Universitaire de Nice (France), were included in the study. 04T036 was established from the LSC of a 50-year-old man. Cytogenetic and multicolour FISH analyses showed a near-triploid karyotype with numerous structural aberrations, four to six small RGMs containing chromosome 9 amplified sequences, and two RGMs containing chromosome 3 amplified sequences (ITALIANO et al. 2006). The 93T449 and 94T778 cell lines were obtained from a primary retroperitoneal WDLPS at onset and at relapse, respectively. These commercial cell lines showed complex karyotypes with multiple RGMs at G-banding and multicolour FISH analysis, and a clear difference in the overall chromosome arrangement between them (SIRVENT et al. 2000; GARSED et al. 2014). The 95T1000 cell line was generated from a WDLPS relapse; SKY analysis revealed a hypertriploid karyotype with multiple chromosomal structural abnormalities (PEDEUTOUR et al. 2012). All cells retained a giant marker chromosome, previously identified in the primary cell cultures. This giant chromosome contained high-level amplification of chromosomal regions deriving from 10p and 12q and lacked alpha-satellite DNA (PEDEUTOUR et al. 2012).

SNP array data
All cell lines were analysed by Affymetrix Genome Wide Human SNP Array 6.0 platform (Affymetrix, Santa Clara, CA, USA), as described (STORLAZZI et al. 2010).

Whole Genome Sequencing
WGS was carried out to disentangle the genomic architecture of RGMs holding neocentromeres. Library preparations were performed using the TruSeq DNA Nano 350 bp protocol (Illumina, San Diego, CA, USA). The sequencing data were acquired using the Illumina Xten at the NYGC (New York, US), in a paired-end 150-cycle run (mean coverage 40× per sample). Reads were aligned to the human reference genome (GRCh37/hg19) using BWA-MEM (v.0.7.12) [http://bio-bwa.sourceforge.net/, (LI AND DURBIN 2009)] and PCR duplicates were removed using Picard (v.1.119) (http://picard.sourceforge.net/). Candidate structural variations (SVs) were identified using Delly (v. 0.5.9) and Crest (v. 1.0) with default parameters (WANG et al. 2011; RAUSCH et al. 2012). Copy number analysis was performed using BIC-seq 0.7alpha (XI et al. 2011), and genomic intervals showing a log2 copyRatio > 0.5 and > 2.5 were considered as amplified and highly amplified, respectively.
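
The two log2 copy-ratio thresholds quoted above can be illustrated with a minimal Python sketch. It is not the BIC-seq procedure itself; it only shows how segments might be labelled once their log2 ratios are available. The function name, the label strings and the example segments are assumptions made for illustration.

```python
# Thresholds quoted in the text (log2 copy ratio).
AMPLIFIED_LOG2 = 0.5
HIGHLY_AMPLIFIED_LOG2 = 2.5


def classify_segment(log2_ratio: float) -> str:
    """Label a segment from its log2 copy ratio (strict inequalities, as in the text)."""
    if log2_ratio > HIGHLY_AMPLIFIED_LOG2:
        return "highly_amplified"
    if log2_ratio > AMPLIFIED_LOG2:
        return "amplified"
    return "not_amplified"


# Hypothetical segments: (chromosome, start, end, log2 copy ratio).
segments = [
    ("chr12", 1000, 2000, 3.1),
    ("chr10", 500, 900, 0.8),
    ("chr1", 0, 100, 0.1),
]
labels = [classify_segment(seg[3]) for seg in segments]
```

Because the thresholds are strict, a segment sitting exactly at 2.5 is labelled as amplified rather than highly amplified in this sketch.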

ChIP-sequencing
To determine the internal structure of the neocentromeres, native ChIP-seq was performed as described (WADE et al. 2009). Immunoprecipitation was run using a polyclonal antibody against CENP-A (TRAZZI et al. 2009). Both input and IP DNA fragments were purified and processed using the TruSeq ChIP Library Preparation Kit (Illumina) and sequenced on the Illumina HiSeq 2500 at the IGA Technology Services facility (Udine, Italy) (single-end 100-cycle run, 140M reads/sample). Raw reads were aligned to the human reference genome (GRCh37/hg19), using BWA-MEM (v 0.7.10). CENP-A enriched regions corresponding to putative neocentromeres were identified using the CNV-seq tool, merging all overlapping intervals (XIE AND TAMMI 2009). Selected regions were then filtered to exclude alphoid sequences, weak enrichments and regions with read “spikes” piling up in a single position. Next, putative neocentromeric fragments were ranked according to their Overall Evaluation Criteria (OEC) score (SEVERGNINI et al. 2006). We then screened the first 40 hits by inspecting the ChIP-seq alignment data with the Integrative Genomics Viewer (IGV) software (THORVALDSDOTTIR et al. 2013) to exclude false positives, and validated the selected intervals by immuno-FISH as described below. Using this approach, multiple CENP-A enriched peaks were detected in all samples. To define the neocentromere internal structure, a local re-mapping of the IP reads was performed against the identified putative neocentromeric intervals (custom reference) by Blast (ALTSCHUL et al. 1990) or Blat (KENT 2002), searching for reads spanning the junctions. A list of reads split between two neocentromeric regions was created, and the OEC score was then used to prioritize the putative junctions. Full details are provided in Methods S1.
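
The step of merging all overlapping enriched intervals can be sketched in Python as follows. This is a generic interval-merging routine, not the authors' actual post-processing of CNV-seq output; the function name, the tuple layout and the choice to merge intervals that merely touch are assumptions.

```python
def merge_intervals(intervals):
    """Merge overlapping (chrom, start, end) intervals on the same chromosome.

    Intervals are sorted by chromosome name and start position; an interval
    whose start is not beyond the end of the previous one is folded into it.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlapping (or touching) the previous interval: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


# Hypothetical enriched windows, not taken from the study.
windows = [("chr9", 100, 200), ("chr9", 150, 300), ("chr3", 10, 20), ("chr9", 400, 500)]
peaks = merge_intervals(windows)
```

Sorting first keeps the merge to a single pass over the windows, which matters little here but scales to genome-wide window sets.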

Immuno-FISH assays on elongated chromosomes
Immuno-FISH on elongated chromosomes was performed to validate the putative structure of neocentromeres, as previously described (EARNSHAW et al. 1989). As recommended by Beh et al., we used a rabbit anti-CENP-C polyclonal antibody and a goat anti-rabbit FITC-conjugate antibody to label functional centromeres (BEH et al. 2016). CENP-A and CENP-C were reported to localize exclusively to active centromeres, as part of the constitutive centromere-associated network (KLARE et al. 2015; SHONO et al. 2015; BEH et al. 2016). Alphoid and BAC probes spanning candidate neocentromeric regions were selected and labelled as reported (TROMBETTA et al. 2012; MACCHIA et al. 2015) (Table S1). Subsequent FISH experiments were conducted to verify the co-localization between CENP-C signals and the tested neocentromeric intervals as described (STORLAZZI et al. 2006).

PCR and qPCR assays
PCR and Sanger sequencing were performed to validate genomic SVs as described (STORLAZZI et al. 2010). Primer sequences are available upon request. To test differential ChIP-enrichments between the two related samples 93T449 and 94T778, qPCR experiments were performed starting from 10 ng of IP (target) and Input DNA (negative control) with the ready-to-use hot start reaction mix for SYBR Green I-based real-time PCR assays using the LightCycler® 96 System, according to the manufacturer’s protocols (Roche, Basel, Switzerland) (primers listed in Table S7). As positive control, we included the top ChIP-enriched region shared by the two cell lines, defined by immuno-FISH as the neocentromeric core. The results were analysed with a custom approach based on the ΔΔCt-method (LIVAK AND SCHMITTGEN 2001). In detail, the ΔCt was first calculated for each neocentromeric peak as the difference in the Ct between the IP and the Input. Then, the ΔΔCt was obtained using the positive control as calibrator. Finally, we compared the results for 93T449 and 94T778 looking at the 2^(-ΔΔCt) fold ratios between each region and the corresponding calibrator.
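
The ΔΔCt arithmetic described in this section can be written out as a short Python sketch: ΔCt is the Ct of the IP minus the Ct of the Input, ΔΔCt subtracts the ΔCt of the calibrator (the positive control region), and the fold ratio is 2 raised to minus ΔΔCt. The function names and the Ct values in the example are illustrative assumptions, not data from the study.

```python
def delta_ct(ct_ip: float, ct_input: float) -> float:
    """Delta-Ct of one region: Ct of the IP minus Ct of the Input."""
    return ct_ip - ct_input


def fold_ratio(ct_ip: float, ct_input: float,
               cal_ct_ip: float, cal_ct_input: float) -> float:
    """2^(-ddCt) of a region relative to the calibrator (positive control)."""
    ddct = delta_ct(ct_ip, ct_input) - delta_ct(cal_ct_ip, cal_ct_input)
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the region reaches threshold two cycles later
# than the calibrator in the IP, with identical Input Ct values.
example = fold_ratio(ct_ip=24.0, ct_input=26.0, cal_ct_ip=22.0, cal_ct_input=26.0)
```

With these made-up values the region has a ΔCt of -2 and the calibrator a ΔCt of -4, so ΔΔCt is 2 and the region is enriched at one quarter of the calibrator level. Comparing such ratios for the same region in 93T449 and 94T778 is the kind of comparison the text describes.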

RNA-seq library preparation and analysis
We investigated the transcription activity of neocentromeres by total RNA-seq. Total RNA was extracted using RNeasy Mini Kit (QIAGEN, Hilden, Germany), checked for integrity on the 2100 Bioanalyzer instrument (Agilent Technologies, Santa Clara, CA, USA) and stored at -80 °C until use. Libraries were prepared using the TruSeq Stranded Total RNA Library Prep Kit (Illumina), and sequenced on the HiSeq 2000 at the IGA Technology Services facility (Udine, Italy) (paired-end 100-cycle run, 70M reads/sample). After running FastQC for quality control (www.bioinformatics.babraham.ac.uk/projects/fastqc/), paired-end reads were mapped to the human reference genome (GRCh37/hg19) using STAR aligner (v.2.4) (DOBIN et al. 2013), with Gencode (v. 19, (HARROW et al. 2012)) as gene transcript model and Samtools (v. 0.1.19) (LI et al. 2009) for duplicate removal. Alignment tracks were checked by IGV to identify transcriptional activity of the neocentromeric sites.

Chimeric transcripts, RNA-seq and WGS data integration
Looking at the RNA-seq alignment tracks by IGV, we found multiple truncated transcripts, likely derived from gene fusion events. To identify these chimeras, we analysed RNA-seq data using ChimeraScan with default parameters (IYER et al. 2011). The identified fusion transcripts were then mapped at SV positions with a custom approach combining WGS and RNA-seq data (Methods S1). RT-PCR and Sanger sequencing validations were performed on all transcripts as previously described (STORLAZZI et al. 2010), filtering out overlapping, adjacent and read-through transcripts and all chimeras with a score <15. For chimeras not supported by any SV we considered for validation only those with split reads. All chimeras detected in 93T449 or 94T778 were tested in both cell lines.
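
The selection rules for validation stated above can be expressed as a small Python filter. This is only a sketch of the stated criteria; the record layout, the field names and the category labels are hypothetical and do not reflect the ChimeraScan output format or the custom pipeline of Methods S1.

```python
from dataclasses import dataclass


@dataclass
class Chimera:
    name: str          # e.g. "GENE_A/GENE_B" (hypothetical naming)
    score: float       # chimera score reported by the fusion caller
    kind: str          # hypothetical label: "fusion", "overlapping", "adjacent", "read_through"
    has_sv: bool       # True if a WGS structural variant supports the chimera
    split_reads: int   # number of RNA-seq reads split across the junction


EXCLUDED_KINDS = {"overlapping", "adjacent", "read_through"}
MIN_SCORE = 15  # chimeras with a score < 15 are filtered out


def keep_for_validation(c: Chimera) -> bool:
    """Apply the three filters described in the text."""
    if c.kind in EXCLUDED_KINDS or c.score < MIN_SCORE:
        return False
    # Without a supporting SV, at least one split read is required.
    if not c.has_sv and c.split_reads == 0:
        return False
    return True


# Hypothetical candidates, not taken from the study.
candidates = [
    Chimera("A/B", 40, "fusion", True, 0),
    Chimera("C/D", 12, "fusion", True, 5),
    Chimera("E/F", 30, "read_through", True, 9),
    Chimera("G/H", 20, "fusion", False, 0),
    Chimera("I/J", 15, "fusion", False, 3),
]
selected = [c.name for c in candidates if keep_for_validation(c)]
```

In this sketch a chimera with a score of exactly 15 is retained, since the text excludes only scores below 15.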

Data Access
SNP array data are available at the EMBL-EBI Array Express database (https://www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-5625. WGS, RNA-seq, anti-CENP-A and input ChIP-seq data are available at NCBI Short Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra) with accession number PRJNA378952. All validated SV sequences were submitted to GenBank repository (http://www.ncbi.nlm.nih.gov/genbank/), under accession numbers KY966261-KY966313 and KY966314-KY966332 for chimeric transcripts and genomic fusions, respectively.

RESULTS
Seven distinct neocentromeres coexist in the cell line 04T036
The LSC cell line 04T036 was already described as carrying multiple markers stabilized by neocentromeres (ITALIANO et al. 2006). To study these neocentromeres, we first investigated the genomic context in which they occurred. Both SNP-array and WGS copy number variant analyses identified weak amplification of the 3q26.1-q29 and 9p24.3-p23 regions (Table S1). FISH with BAC probes mapping at these regions confirmed the presence of a variable number of RGMs per cell, resembling isochromosomes: two derived from chromosome 3, and two to five from chromosome 9 (ITALIANO et al. 2006). We mapped the fusion junctions of these isochromosomes by WGS (Figure 1), and validated the results by PCR and Sanger sequencing. Neocentromeres arising on the multiple copies of RGMs described by Italiano et al. were detected by looking at anti-CENP-C signals not co-localizing with alphoid probes at immuno-FISH on metaphase spreads (Figure 1A) (ITALIANO et al. 2006). To define the internal organization of these neocentromeres, we performed ChIP-seq with the anti-CENP-A antibody. This analysis disclosed seven separate peaks of enrichment spanning the amplified regions of the RGMs of chromosome 3 (peaks 1 and 2) and 9 (peaks 3-7), ranging from 52 to 136 Kb in size (Figure 1, Table S2). The bell-shaped coverage profile of each candidate region, visualized in IGV by looking at the ChIP-seq alignment to the reference genome, as well as both ChIP-seq and WGS data (Tables S3a and S3b), suggested that each of them was specific for a different marker chromosome (Figure 1B, C). This hypothesis was verified by performing simultaneous immuno-FISH experiments on elongated chromosomes, with anti-CENP-C antibodies pinpointing centromeres, and different combinations of BAC probes spanning the ChIP-enriched peaks. The results showed the co-localization of the neocentromeric CENP-C signal with a different BAC probe on separate markers, confirming the existence of distinct neocentromeric epialleles arising on the multiple copies of RGMs (Figure 1).

Complex neocentromeres undergoing structural evolution in WDLPS cell lines
The combination of SNP-array and WGS approaches confirmed the already described highly complex internal structure of the marker chromosomes of the WDLPS cell lines (ITALIANO et al. 2009; GARSED et al. 2014) (Tables S1, S3a, S3b). In 95T1000, despite the internal arrangement of the multiple detected markers varying within and among cells, their amplified content from chromosomes 10 and 12 remained highly conserved (Figure S1, Table S1). The presence of neocentromeres arising on RGMs was confirmed by looking at anti-CENP-C signals not co-localizing with alphoid probes by immuno-FISH on metaphase spreads (Figure 2B). The anti-CENP-A ChIP-seq analysis, performed to characterize these neocentromeres, revealed four separate peaks of enrichment at 10p12.1, 10p12.33, 12q13.3, and 12q14.1, spanning 74, 9, 43 and 40 Kb in size, respectively. The inspection of the ChIP-seq alignment data by IGV disclosed a coverage drop at one or both ends of each peak (Figure 2A), suggesting that the multiple non-continuous, non-collinear enriched sequences might be juxtaposed by SVs to form a single neocentromere. Our custom ChIP-seq analysis (Methods S1) confirmed this hypothesis, revealing specific SVs joining the fragments (Figure 2A, Table S4), all validated by PCR and Sanger sequencing. Immuno-FISH experiments with anti-CENP-C antibodies and BAC probes spanning the core ChIP-enriched regions confirmed the occurrence of a single neocentromere stabilizing all the observed marker chromosomes (Figure 2C).
In 93T449 (primary tumour) and 94T778 (recurrence), we found 17 chromosomes involved in the amplification (Figure S2). By comparing the copy number profiles of the two related cell lines, we disclosed a perfect conservation of the overall RGM amplified content, with few differences in the copy number state of the amplicons (Figure S3 and Table S1). In metaphase spreads, a higher number of RGMs per cell was found in the recurrence than in the onset tumour (average number of 2.63 vs 1.14). Moreover, we detected 16 SVs specific for 93T449, and six for 94T778 (Table S5). In both cell lines, each observed RGM was mitotically stabilized by a neocentromere (Figure 4A, B), as confirmed when looking at anti-CENP-C signals not co-localizing with alphoid probes by immuno-FISH on metaphase spreads (Figure 2B).
The CENP-A ChIP-seq analysis disclosed four strongly enriched regions shared by both cell lines, as well as five additional weakly enriched fragments, specific for 94T778, clearly suggesting an enlargement of the CENP-A centromeric domain in the recurrence of the tumour. As in 95T1000, the neocentromeric seeding occurred on a patchwork of amplified sequences (Figure 3). By juxtaposing each enrichment peak according to the identified SVs, we inferred two separate neocentromeric contiguous sequences (NEO1 and NEO2), sharing a 5 Kb region at chr1: 188.377-188.382 Mb. Since no SV connecting these two contigs could be detected, we hypothesized the occurrence of two distinct neocentromeres arising at the tumour onset (Figure 3). To validate this hypothesis, we performed immuno-FISH experiments on elongated chromosomes with the anti-CENP-C antibody and BAC probes spanning the core regions of the two inferred neocentromeres. The results disclosed a mutually exclusive co-localization of the anti-CENP-C signals with either of the probes on each marker chromosome, confirming two separate neocentromeres occurring within the same cell in both samples (Figure 4). Finally, qPCR assays performed on the CENP-A IP and input DNA with primer pairs specific for each fragment of NEO1 and NEO2 confirmed the differential enrichment ratio of shared vs specific (94T778) ChIP-enriched sites, supporting the size increase of both neocentromeres from the primary tumour to the recurrence (Figure S4). More specifically, NEO1 increased from 84 to 147 Kb and NEO2 from 68 to 121 Kb.

From neocentromeric transcription to the burst of chimeric transcripts
We investigated the transcription activity of neocentromeres by integrating RNA-seq, WGS and ChIP-seq data. In line with previous studies (AMOR AND CHOO 2002; WONG et al. 2006), we found that gene transcriptional activity was not hampered by the presence of neocentromeres. Surprisingly, we also found two chimeric transcripts with one of the two partners mapping within the neocentromere domains of 93T449 and 94T778 (i.e., APP/HMCN1 and LOC100507250/HMCN1). Investigating their origin, we found that the SVs supporting these chimeras were not involved in the assembly of the neocentromeres; therefore, they likely derived from additional rearrangements of the same amplified region, which did not acquire a centromeric function. Although this analysis did not reveal any aberrant transcript specific for neocentromeres, it shed light on the multiple chimeric transcripts originating through the amplification process manifested in the RGMs. We detected thousands of putative chimeras, and validated the most abundant ones by RT-PCR and Sanger sequencing (Table 1, Table S6). According to the RefSeq gene function annotations, most of the chimeric partners were reported as cancer-associated genes (Table 1). To investigate the origin of these chimeras, we built a custom pipeline to integrate WGS and RNA-seq data (for details see Methods S1), and demonstrated that some transcripts were supported by perfectly matching SVs (Class I), while others showed a much more complex origin. Class II transcripts, indeed, were assembled by means of several non-contiguous, non-collinear genomic fragments interposed between partner genes, which were actively transcribed but subsequently spliced out from the mature mRNA (Figures S5 and S6). In contrast, Class III chimeras, lacking any supporting SV, could possibly originate from post-transcriptional events (Table 1).

DISCUSSION
In the present study, we unveiled the extremely complex molecular architecture of neocentromeres mitotically stabilizing RGMs in WDLPS and LSC cell lines, describing their structure down to the nucleotide level, and disclosed the occurrence of a burst of chimeric transcripts associated with genomic amplification. Both these findings shed light on the extraordinary genomic and transcriptomic plasticity of RGMs, likely playing a role in cancer evolution.

The enhanced complexity of neocentromeres arising at RGMs
The genomic architecture of neocentromeres in LSC and WDLPS cancer types showed distinct features. Each of the multiple RGMs found in the 04T036 LSC cell line had a single neocentromere, embedded in a continuous, non-rearranged sequence. Four out of the five neocentromeres mitotically stabilizing the multiple copies of RGM9 mapped close to each other (within 1 Mb), while the fifth mapped ~10 Mb apart. Instead, the two neocentromeres stabilizing the RGM3 mapped ~5 Mb apart from each other. Very likely, only a single neocentromeric-seeding event occurred on each of the two ancestral types of RGM, which subsequently multiplied following mitotic errors. The different positions of the neocentromeres arising on the multiple copies of RGM3 and RGM9 were likely due to extensive “sliding” processes along the chromosomes during tumour evolution, which led to functional “epialleles”. This phenomenon was recently discovered by Purgato et al. while studying the satellite-free centromere of horse chromosome 11 (likely an evolutionary neocentromere) (PURGATO et al. 2015). That centromere was sliding within a ~500 Kb genomic segment, a smaller region compared to those described here in 04T036. When a neocentromere arises in natural populations, the meiotic process can be affected by large-distance sliding events giving rise to epialleles. Indeed, in heterozygous individuals, crossing-over events involving regions delimited by neocentromeric epialleles would generate acentric/dicentric chromosomes, resulting in a reduced fitness. However, the mitotic process is not affected by distant neocentromeric epialleles, so, in cancer, it seems very likely that neocentromere fluctuations might involve several Mb, especially on RGMs. Moreover, the alternative hypothesis calling for the simultaneous and independent seeding of seven neocentromeres on separate acentric RGMs seems highly unlikely.
In contrast to what was observed in the LSC cell line, the neocentromeres of all the studied WDLPS cell lines showed a very complex structure, consisting of a patchwork of multiple short sequences, amplified from several chromosomes. The structural complexity of neocentromeres in WDLPS strongly indicates that they arose secondarily to the amplification process, very likely as a consequence of the massive recruitment of CENP-A on highly rearranged RGMs to enable double-strand-break repair (ZEITLIN et al. 2009). Moreover, by comparing the 93T449 and 94T778 cell lines (primary and relapse of the same tumour, respectively), we disclosed additional interesting features. The perfect match between the amplified RGM content of the two related cell lines indicates that these structures arose as early events in the tumour evolution and were maintained under a strong selective pressure. The higher number of SVs specific for the onset cell line (93T449) also suggested that the recurrence was established from a minor sub-clone of the primary tumour. Moreover, the neocentromeres of 94T778 were 50-60 Kb larger than those of 93T449, suggesting an evolution of these structures during tumour progression. As the size of centromeres tends to be uniform within species, and the expansion of the neocentromeric size has been suggested to constitute a key factor in the survival of neocentric chromosomes (WANG et al. 2014), we speculate that the size increase of neocentromeres during tumorigenesis might be correlated with acquired mitotic stability of the RGMs harbouring them. As in 04T036, the presence of two separate neocentromeres, even sharing a 5 Kb domain in the 94T778 cell line, might be the result of centromeric sliding phenomena.
Literature on human neocentromeres arising in acentric chromosomal fragments indicates a variable functional efficiency of these structures, which often leads to mosaicism for the marker (MARSHALL et al. 2008). In cancer, an incomplete functional efficiency of neocentromeres, causing non-disjunction, might lead to the segregation of multiple copies of RGMs to daughter cells, potentially increasing their selective advantage. This consideration could apply to the finding of a higher number of RGMs in a relapse (94T778) than in a primary tumour (93T449). It might also be conjectured that the instability of the neocentromeres in the primary tumour led to the selection of a higher number of RGMs per cell, subsequently stabilized by a neocentromere expansion in the recurrence. Combined, our results shed light on the extraordinary evolutionary plasticity of cancer-associated neocentromeres, and provide strong support for the strictly epigenetic nature of the neocentromeric seeding process.

Transcriptomic plasticity: the hidden side of the complex rearrangements in RGMs
Centromeric transcription has been reported as actively participating in kinetochore assembly and function (MULLER AND ALMOUZNI 2017). Our RNA-seq analyses, however, did not reveal any unusual transcriptional activity specific for neocentromeres, but disclosed the occurrence of several chimeric transcripts derived from the whole RGM, which we considered worthwhile to investigate, as very few previous studies focused on this topic. The combined analysis of genomic and transcriptomic sequencing data allowed us to detect chimeric transcripts originating from extremely complex rearrangements, such as those resulting in the fusion of three partner genes (among Class I transcripts), or those assembled by means of multiple interposed non-contiguous, non-collinear genomic fragments (Class II). To the best of our knowledge, such complex fusions have not been described before in cancer. The molecular mechanism underlying these genomic chimeras might resemble the “exon-shuffling” process creating functional genetic novelties and multi-domain proteins over natural evolution (FRANCA et al. 2012). Indeed, a similar but accelerated mechanism might enrich the genetic repertoire of cancer cells, likely providing a selective advantage. The splicing machinery also seems to be actively involved in shaping chimeras and creating variability in cancer. In assembling Class II chimeras, for instance, it actively removes the interposed fragments from the mature mRNA. Here, we also report multiple examples of transcription-induced chimeras (Class III), although our analysis likely just uncovered the tip of the iceberg (LU et al. 2016).
The burst of chimeric transcripts documented in our study brings the focus on the extreme transcriptome plasticity of highly rearranged RGMs, which might be crucial for cancer initiation and progression. Indeed, the shuffling of coding sequences might strongly contribute to the proliferative success of the cancer cells by enriching and variegating the genetic substrate for selection to act on. We propose that the increased incidence of gene fusion events, caused by the extremely rearranged nature of RGMs, might act as a driving force for genome amplification, together with the well-established increase in oncogene expression (TAYLOR et al. 2011; PANAGOPOULOS et al. 2014).

ACKNOWLEDGEMENTS
This work was supported by the Italian Association on Cancer Research (AIRC) (AIRC Investigator Grant 15413) to CTS, PRIN (Progetti di Interesse Nazionale) and EPIGEN (CNR) to MR, Fondazione Cassa di Risparmio di Puglia and APQ Ricerca Regione Puglia (FutureInResearch) to GM.

14 DISCLOSURE/CONFLICT OF INTEREST
15 The authors have no duality of interest to declare.
16 Supplementary information is available
REFERENCES

Altschul, S. F., W. Gish, W. Miller, E. W. Myers and D. J. Lipman, 1990 Basic local alignment search tool. J Mol Biol 215: 403-410.
Amor, D. J., and K. H. Choo, 2002 Neocentromeres: role in human disease, evolution, and centromere study. Am J Hum Genet 71: 695-714.
Athwal, R. K., M. P. Walkiewicz, S. Baek, S. Fu, M. Bui et al., 2015 CENP-A nucleosomes localize to transcription factor hotspots and subtelomeric sites in human cancer cells. Epigenetics Chromatin 8: 2.
Beh, T. T., R. N. MacKinnon and P. Kalitsis, 2016 Active centromere and chromosome identification in fixed cell lines. Mol Cytogenet 9: 28.
Chan, F. L., O. J. Marshall, R. Saffery, B. W. Kim, E. Earle et al., 2012 Active transcription and essential role of RNA polymerase II at the centromere during mitosis. Proc Natl Acad Sci U S A 109: 1979-1984.
Dobin, A., C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski et al., 2013 STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29: 15-21.
Earnshaw, W. C., H. Ratrie, 3rd and G. Stetten, 1989 Visualization of centromere proteins CENP-B and CENP-C on a stable dicentric chromosome in cytological spreads. Chromosoma 98: 1-12.
Franca, G. S., D. V. Cancherini and S. J. de Souza, 2012 Evolutionary history of exon shuffling. Genetica 140: 249-257.
Garsed, D. W., O. J. Marshall, V. D. Corbin, A. Hsu, L. Di Stefano et al., 2014 The architecture and evolution of cancer neochromosomes. Cancer Cell 26: 653-667.
Harrow, J., A. Frankish, J. M. Gonzalez, E. Tapanari, M. Diekhans et al., 2012 GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res 22: 1760-1774.
Italiano, A., R. Attias, A. Aurias, G. Perot, F. Burel-Vandenbos et al., 2006 Molecular cytogenetic characterization of a metastatic lung sarcomatoid carcinoma: 9p23 neocentromere and 9p23-p24 amplification including JAK2 and JMJD2C. Cancer Genet Cytogenet 167: 122-130.
Italiano, A., G. Maire, N. Sirvent, P. A. Nuin, F. Keslair et al., 2009 Variability of origin for the neocentromeric sequences in analphoid supernumerary marker chromosomes of well-differentiated liposarcomas. Cancer Lett 273: 323-330.
Iyer, M. K., A. M. Chinnaiyan and C. A. Maher, 2011 ChimeraScan: a tool for identifying chimeric transcription in sequencing data. Bioinformatics 27: 2903-2904.
Kent, W. J., 2002 BLAT--the BLAST-like alignment tool. Genome Res 12: 656-664.
Klare, K., J. R. Weir, F. Basilico, T. Zimniak, L. Massimiliano et al., 2015 CENP-C is a blueprint for constitutive centromere-associated network assembly within human kinetochores. J Cell Biol 210: 11-22.
L'Abbate, A., G. Macchia, P. D'Addabbo, A. Lonoce, D. Tolomeo et al., 2014 Genomic organization and evolution of double minutes/homogeneously staining regions with MYC amplification in human cancer. Nucleic Acids Res 42: 9131-9145.
Li, H., and R. Durbin, 2009 Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754-1760.
Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078-2079.
Livak, K. J., and T. D. Schmittgen, 2001 Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25: 402-408.
Lu, G., J. Wu, G. Zhao, Z. Wang, W. Chen et al., 2016 Abundant and broad expression of transcription-induced chimeras and protein products in mammalian genomes. Biochem Biophys Res Commun 470: 759-765.
Macchia, G., K. H. Nord, M. Zoli, S. Purgato, P. D'Addabbo et al., 2015 Ring chromosomes, breakpoint clusters, and neocentromeres in sarcomas. Genes Chromosomes Cancer 54: 156-167.
Marshall, O. J., A. C. Chueh, L. H. Wong and K. H. Choo, 2008 Neocentromeres: new insights into centromere structure, disease development, and karyotype evolution. Am J Hum Genet 82: 261-282.
Matsui, A., T. Ihara, H. Suda, H. Mikami and K. Semba, 2013 Gene amplification: mechanisms and involvement in cancer. Biomol Concepts 4: 567-582.
McNulty, S. M., L. L. Sullivan and B. A. Sullivan, 2017 Human Centromeres Produce Chromosome-Specific and Array-Specific Alpha Satellite Transcripts that Are Complexed with CENP-A and CENP-C. Dev Cell 42: 226-240 e226.
Muller, S., and G. Almouzni, 2017 Chromatin dynamics during the cell cycle at centromeres. Nat Rev Genet.
Nord, K. H., G. Macchia, J. Tayebwa, J. Nilsson, F. Vult von Steyern et al., 2014 Integrative genome and transcriptome analyses reveal two distinct types of ring chromosome in soft tissue sarcomas. Hum Mol Genet 23: 878-888.
Panagopoulos, I., B. Bjerkehagen, L. Gorunova, J. M. Berner, K. Boye et al., 2014 Several fusion genes identified by whole transcriptome sequencing in a spindle cell sarcoma with rearrangements of chromosome arm 12q and MDM2 amplification. Int J Oncol 45: 1829-1836.
Pedeutour, F., G. Maire, A. Pierron, D. M. Thomas, D. W. Garsed et al., 2012 A newly characterized human well-differentiated liposarcoma cell line contains amplifications of the 12q12-21 and 10p11-14 regions. Virchows Arch 461: 67-78.
Purgato, S., E. Belloni, F. M. Piras, M. Zoli, C. Badiale et al., 2015 Erratum to: Centromere sliding on a mammalian chromosome. Chromosoma 124: 289.
Quenet, D., and Y. Dalal, 2015 Correction: a long non-coding RNA is required for targeting centromeric protein A to the human centromere. Elife 4.
Rausch, T., T. Zichner, A. Schlattl, A. M. Stutz, V. Benes et al., 2012 DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28: i333-i339.
Rosic, S., F. Kohler and S. Erhardt, 2014 Repetitive centromeric satellite RNA is essential for kinetochore formation and cell division. J Cell Biol 207: 335-349.
Severgnini, M., L. Pattini, C. Consolandi, E. Rizzi, C. Battaglia et al., 2006 Application of the Taguchi method to the analysis of the deposition step in microarray production. IEEE Trans Nanobioscience 5: 164-172.
Shono, N., J. Ohzeki, K. Otake, N. M. Martins, T. Nagase et al., 2015 CENP-C and CENP-I are key connecting factors for kinetochore and CENP-A assembly. J Cell Sci 128: 4572-4587.
Sirvent, N., A. Forus, W. Lescaut, F. Burel, S. Benzaken et al., 2000 Characterization of centromere alterations in liposarcomas. Genes Chromosomes Cancer 29: 117-129.
Storlazzi, C. T., T. Fioretos, C. Surace, A. Lonoce, A. Mastrorilli et al., 2006 MYC-containing double minutes in hematologic malignancies: evidence in favor of the episome model and exclusion of MYC as the target gene. Hum Mol Genet 15: 933-942.
Storlazzi, C. T., A. Lonoce, M. C. Guastadisegni, D. Trombetta, P. D'Addabbo et al., 2010 Gene amplification as double minutes or homogeneously staining regions in solid tumors: origin and structure. Genome Res 20: 1198-1206.
Taylor, B. S., P. L. DeCarolis, C. V. Angeles, F. Brenet, N. Schultz et al., 2011 Frequent alterations and epigenetic silencing of differentiation pathway genes in structurally rearranged liposarcomas. Cancer Discov 1: 587-597.
Thorvaldsdottir, H., J. T. Robinson and J. P. Mesirov, 2013 Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14: 178-192.
Trazzi, S., G. Perini, R. Bernardoni, M. Zoli, J. C. Reese et al., 2009 The C-terminal domain of CENP-C displays multiple and critical functions for mammalian centromere formation. PLoS One 4: e5832.
Trombetta, D., G. Macchia, N. Mandahl, K. H. Nord and F. Mertens, 2012 Molecular genetic characterization of the 11q13 breakpoint in a desmoplastic fibroma of bone. Cancer Genet 205: 410-413.
Wade, C. M., E. Giulotto, S. Sigurdsson, M. Zoli, S. Gnerre et al., 2009 Genome sequence, comparative analysis, and population genetics of the domestic horse. Science 326: 865-867.
Wang, J., C. G. Mullighan, J. Easton, S. Roberts, S. L. Heatley et al., 2011 CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat Methods 8: 652-654.
Wang, K., Y. Wu, W. Zhang, R. K. Dawe and J. Jiang, 2014 Maize centromeres expand and adopt a uniform size in the genetic background of oat. Genome Res 24: 107-116.
Wong, N. C., L. H. Wong, J. M. Quach, P. Canham, J. M. Craig et al., 2006 Permissive transcriptional activity at the centromere through pockets of DNA hypomethylation. PLoS Genet 2: e17.
Xi, R., A. G. Hadjipanayis, L. J. Luquette, T. M. Kim, E. Lee et al., 2011 Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion. Proc Natl Acad Sci U S A 108: E1128-1136.
Xie, C., and M. T. Tammi, 2009 CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 10: 80.
Zeitlin, S. G., N. M. Baker, B. R. Chapados, E. Soutoglou, J. Y. Wang et al., 2009 Double-strand DNA breaks recruit the centromeric histone CENP-A. Proc Natl Acad Sci U S A 106: 15762-15767.

FIGURES AND TABLE LEGENDS

Figure 1: The genomic architecture of the RGMs and neocentromeres in the 04T036 cell line. (A)
Metaphase spread showing immuno-FISH results on the 04T036 cell line; probe names and signals
are consistently coloured; white arrows pinpoint neocentromeres on RGMs, detectable by CENP-C
signals (green) not co-localizing with the alphoid probe (red); BAC probes spanning the amplified
regions are also hybridized (violet and blue) on the RGMs. (B, C) Schematic representation of the
overall structures of RGM9 (B) and RGM3 (C) in 04T036; thin blue lines and breakpoint positions
indicate the SV connections assembling the isochromosomes. IGV partial auto-scaled plots of the
CENP-A ChIP-enriched regions are shown with the corresponding input panels (negative controls);
the Y- and X-axes represent the read count averaged in windows (value shown at the left of each
box) and the reference human genome (GRCh37/hg19) coordinates (Kb positions shown at the top),
respectively; BAC probes spanning each neocentromeric peak are reported on the right. (D)
Immuno-FISH results are shown on the right; probe names and signals are consistently coloured;
CENP-C signals (green) mapping at RGMs pinpoint neocentromeres (as shown in A). Left and middle
boxes of the immuno-FISH panels show a two-way merge between each probe and CENP-C signals; right
boxes show partial metaphases with three-way merged signals; in each box, pale blue and yellow
arrows indicate neocentromeres carrying blue/green and red/green co-localizing signals,
respectively. The results obtained with this combination of different neocentromeric probes
confirmed the presence of multiple neocentromeric alleles [represented as red lines on the RGM
ideograms (B, C)].

Figure 2: The genomic architecture of the neocentromeres in the 95T1000 cell line. (A) IGV partial
plots of the CENP-A ChIP-enriched regions aligned to the human reference (GRCh37/hg19),
juxtaposed in proper order and orientation to assemble the inferred neocentromeric structure of
95T1000; the top panel shows the normalized enrichment profile of each peak (IP versus input; full
details in Supplementary Methods); the middle and lower panels show the IP and input raw enrichment
profiles; the Y- and X-axes represent the read count averaged in windows (value shown at the left
of each box) and the reference genome coordinates (Kb positions shown at the top), respectively; we
fixed the scale of the Y-axis to the value of the peak with the highest read count average on IGV.
BAC probes chosen for the immuno-FISH experiments are reported below each corresponding
neocentromeric peak. The lower panel is a schematic reconstruction of the assembled neocentromeric
contig; each neocentromeric fragment is represented as an arrow and aligned to the
upper panel; thin blue lines represent SV connections, and flags indicate breakpoint positions.
Immuno-FISH results with anti-CENP-C antibodies (green) co-hybridized with the alphoid probe (B)
and with BAC probes spanning the neocentromeric core (C) are shown; probe names and signals are
consistently coloured; CENP-C signals (green), not co-localizing with the alphoid probe (red),
pinpoint neocentromeres mapping at RGMs in (B), while white arrowheads indicate
neocentromeres carrying merged blue/red/green signals in (C).
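
As an illustration of the two quantities named in the legend above (read count averaged in windows, and an IP-versus-input profile), the following is a minimal sketch only. The normalization actually used is described in the Supplementary Methods and is not reproduced here; the simple ratio with a pseudocount, the function names `window_means` and `ip_over_input`, and the toy depth values are all assumptions made for this example.

```python
from typing import List


def window_means(depth: List[float], window: int) -> List[float]:
    """Average per-base read counts in consecutive, non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]  # last chunk may be shorter
        means.append(sum(chunk) / len(chunk))
    return means


def ip_over_input(ip: List[float], inp: List[float],
                  pseudocount: float = 1.0) -> List[float]:
    """Per-window IP/input ratio (hypothetical normalization, not the paper's).

    The pseudocount avoids division by zero where the input has no reads.
    """
    if len(ip) != len(inp):
        raise ValueError("IP and input tracks must have the same length")
    return [(a + pseudocount) / (b + pseudocount) for a, b in zip(ip, inp)]


# Toy per-base depths (invented values, for illustration only).
ip_depth = [0, 0, 8, 12, 40, 44, 2, 2]
input_depth = [1, 1, 1, 1, 3, 1, 1, 1]

ip_win = window_means(ip_depth, 2)
input_win = window_means(input_depth, 2)
ratios = ip_over_input(ip_win, input_win)
print(ratios)
```

In this toy track the third window has the highest ratio, which is the kind of signal a neocentromeric peak would show relative to the input control.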

Figure 3: The genomic architecture of NEO1 and NEO2 in the 93T449 and 94T778 cell lines. At
the top of each box, IGV partial plots of the CENP-A ChIP-enriched regions of NEO1 (A) and
NEO2 (B) are shown, juxtaposed in proper order and orientation to assemble the
neocentromeric domain of both the 93T449 and 94T778 cell lines. The top panels of both (A) and (B)
boxes show the normalized enrichment profile of the neocentromeric peaks for each cell line (IP
versus input; full details in Supplementary Methods); the middle and lower panels show the IP and
input raw enrichment profiles; the Y- and X-axes represent the read count averaged in windows (the
relative values are shown on the left of each box) and the reference genome coordinates (Kb
positions shown at the top), respectively; we fixed the scale of the Y-axis to the value of the
peak with the highest read count average on IGV. BAC probes chosen for the immuno-FISH experiments
are reported below the corresponding neocentromeric peak. At the bottom of each box, a schematic
reconstruction of the assembled neocentromeric contig is represented; each neocentromeric fragment
is shown as an arrow and aligned to the upper panel; thin blue lines represent SV connections, and
flags indicate breakpoint positions. A clear differential enrichment between the neocentromeric
tails is evident when comparing 93T449 and 94T778.

Figure 4: Metaphase spreads showing immuno-FISH results on the 93T449 (A, C) and 94T778 (B,
D) cell lines; probe names and signals are consistently coloured; CENP-C signals (green), not
co-localizing with alphoid probes (red), pinpoint neocentromeres on RGMs (green arrows), while
probes RP11-203K8 and RP11-30I11 are specific for the NEO1 and NEO2 cores, respectively. (C, D)
Left and middle boxes show two-way merged images of each probe and CENP-C signals; boxes to
the right show the three-way merged signals on partial metaphase spreads; pale blue and yellow
arrows indicate neocentromeres carrying blue/green and red/green merged signals, respectively.
The results obtained with this combination of different neocentromeric probes demonstrate the
presence of two neocentromeric alleles in both cell lines.

Table 1: List of all the validated chimeras detected by ChimeraScan; Class I, II and III refer to
chimeras derived from canonical, complex and transcription-induced fusions, respectively. Gene
name styles are consistent with their main function (bold = cancer genes, underlined =
ubiquitination, italic = RNA-splicing and surveillance, italic = long non-coding, and bold =
trafficking and transport). OF = out of frame, IF = in frame, NA = not analysed. * = chimera whose
supporting SV, containing a SINE at the breakpoint site, was detected by long-PCR and Sanger
sequencing.
[Figure 1: image not reproduced. Panels A-D; CENP-A ChIP peaks 1-7 on RGM3 and RGM9; probes shown: ALPHOID, CENPC, RP11-144F4, RP11-58M14, RP11-668E6, RP11-142K8, RP11-258B1, RP11-351P15, RP11-822C9.]
[Figure 2: image not reproduced. Panels A-C; 95T1000; fragments from chr12 and chr10; tracks: NORMALIZED, IP, INPUT; probes shown: ALPHOID, CENPC, RP11-791A6, RP11-979M22, RP11-344K11.]
[Figure 3: image not reproduced. Panels A (chr1, chr12, chr6; probe RP11-203K8) and B (chr1, chr12; probe RP11-30I11); 94T778 and 93T449; tracks: NORMALIZED, IP, INPUT.]
[Figure 4: image not reproduced. Panels A, C (93T449) and B, D (94T778); probes shown: ALPHOID, CENPC, RP11-30I11, RP11-203K8.]
[Table 1: table body not recoverable from the extracted text. Columns: CHIMERA, SCORE, CLASS, FRAME, IN SILICO TRANSLATION; rows grouped by cell line (94T778, 93T449, 95T1000, 04T036).]