Epigenetic Mechanisms Are Involved in the Oncogenic Properties of ZNF518B in Colorectal Cancer
Francisco Gimeno-Valiente, Ángela L. Riffo-Campos, Luis Torres, Noelia Tarazona, Valentina Gambardella, Andrés Cervantes, Gerardo López-Rodas, Luis Franco and Josefa Castillo

SUPPLEMENTARY METHODS

1. Selection of genomic sequences for ChIP analysis

To select the sequences for ChIP analysis in the five putative target genes, namely PADI3, ZDHHC2, RGS4, EFNA5 and KAT2B, the genomic region corresponding to each gene was downloaded from Ensembl. The view was then zoomed in to examine the promoter, enhancers and regulatory sequences in detail. The data for HCT116 cells were then retrieved and the target sequences for factor binding examined. No data are available for ZNF518B itself, so special attention was paid to the target sequences of other zinc-finger-containing factors. Finally, the regions that may putatively bind ZNF518B were selected, and primers defining amplicons spanning these sequences were designed. Supplementary Figure S3 gives the location of the amplicons used in each gene.

2. Obtaining the raw data and generating the BAM files for in silico analysis of the effects of EHMT2 and EZH2 silencing

The data for siEZH2 (SRR6384524), siG9a (SRR6384526) and siNon-target (SRR6384521) in the HCT116 cell line were downloaded from SRA (BioProject PRJNA422822, https://www.ncbi.nlm.nih.gov/bioproject/) using the SRA Toolkit (https://ncbi.github.io/sra-tools/). All data correspond to single-end RNA-seq.

    doBasics = TRUE doAll = FALSE
    $ fastq-dump -I --split-files SRR6384524

Data quality was checked with FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The first low-quality nucleotides were removed using FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/):
    $ fastqc HCT116_PBS_siEZH2.fastq
    $ fastx_trimmer -i HCT116_PBS_siEZH2.fastq -o HCT116_PBS_siEZH2_1.fastq -f 15 -Q 33 -v

The FASTQ files were mapped with STAR against an index of the human genome, version GRCh38.p12. The resulting SAM files were then converted to BAM format and sorted using SAMtools (http://samtools.sourceforge.net/):

    $ ./STAR --runMode genomeGenerate --runThreadN 10 --genomeDir ./ --genomeFastaFiles GRCh38.p12.genome.fa
    $ ./STAR --genomeDir ./ --runThreadN 10 starIndex --readFilesIn /home/HCT116_PBS_siEZH2_1.fastq --outFileNamePrefix /home/HCT116_siEZH2
    $ samtools view -S -b HCT116_siEZH2Aligned.out.sam > HCT116_siEZH2.bam
    $ samtools sort HCT116_siEZH2.bam -o HCT116_siEZH2_sort.bam

The FASTQ files were also mapped using Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml):

    $ ./bowtie2-build -f /mnt/sdb1/Bowtie2/GRCh38.p12.genome.fa human
    $ ./bowtie2 -x human -U /mnt/sdb1/Fastq/HCT116_PBS_siEZH2_1.fastq -S siEZH2.sam
    $ samtools view -S -b siEZH2.sam > siEZH2.bam
    $ samtools sort siEZH2.bam -o siEZH2_sort.bam

2. Count and matrix generation

The libraries used for the analysis were:

    library(Rsamtools)
    library(GenomicFeatures)
    library(GenomicAlignments)
    library(org.Hs.eg.db)
    library(edgeR)
    library(VennDiagram)

2.1. Generation of the STAR expression matrix

For siG9a in the HCT116 cell line:

    # Build the gene model (exons grouped by gene) from the GENCODE annotation
    gtfFile = "gencode.v29.chr_patch_hapl_scaff.annotation.gtf"
    txdb = makeTxDbFromGFF(gtfFile, format="gtf")
    genes = exonsBy(txdb, by="gene")
    indir = getwd()
    files = list.files(indir, pattern = '*.bam')
    # [1] "HCT116_siG9a_sort.bam" "HCT116_siNontarget_sort.bam"
    bamLst = BamFileList(files, index=character(), obeyQname=TRUE)
    # Counting reads
    siG9a = summarizeOverlaps(features = genes, reads = bamLst, mode="Union",
                              singleEnd=TRUE, ignore.strand=TRUE, fragments=FALSE)
    colnames(siG9a)
    # [1] "HCT116_siG9a_sort.bam" "HCT116_siNontarget_sort.bam"
    SampleName = c("siG9a","siNon")
    Stage = c("siG9a","siNon")
    colData(siG9a) = DataFrame(SampleName, Stage)
    dim(siG9a)
    # [1] 64837 2
    save(siG9a, file="siG9a.rda")

For siEZH2 in the HCT116 cell line:

    indir = getwd()
    files = list.files(indir, pattern = '*.bam')
    # [1] "HCT116_siEZH2_sort.bam" "HCT116_siNontarget_sort.bam"
    bamLst = BamFileList(files, index=character(), obeyQname=TRUE)
    siEZH2 = summarizeOverlaps(features = genes, reads = bamLst, mode="Union",
                               singleEnd=TRUE, ignore.strand=TRUE, fragments=FALSE)
    colnames(siEZH2)
    # [1] "HCT116_siEZH2_sort.bam" "HCT116_siNontarget_sort.bam"
    SampleName = c("siEZH2","siNon")
    Stage = c("siEZH2","siNon")
    colData(siEZH2) = DataFrame(SampleName, Stage)
    dim(siEZH2)
    # [1] 64837 2
    save(siEZH2, file="siEZH2.rda")

2.2. Generation of the Bowtie2 expression matrix

For siEZH2 in the HCT116 cell line:

    gtfFile = "gencode.v29.chr_patch_hapl_scaff.annotation.gtf"
    txdb = makeTxDbFromGFF(gtfFile, format="gtf")
    genes = exonsBy(txdb, by="gene")
    indir = getwd()
    files = list.files(indir, pattern = '*.bam')
    # [1] "siEZH2_sort.bam" "siNonTarget_sort.bam"
    bamLst = BamFileList(files, index=character(), obeyQname=TRUE)
    siEZH2_Bowtie = summarizeOverlaps(features = genes, reads = bamLst, mode="Union",
                                      singleEnd=TRUE, ignore.strand=TRUE, fragments=FALSE)
    colnames(siEZH2_Bowtie)
    # [1] "siEZH2_sort.bam" "siNonTarget_sort.bam"
    SampleName = c("siEZH2","siNon")
    Stage = c("siEZH2","siNon")
    colData(siEZH2_Bowtie) = DataFrame(SampleName, Stage)
    dim(siEZH2_Bowtie)
    # [1] 64837 2
    save(siEZH2_Bowtie, file="siEZH2_Bowtie.rda")

For siG9a in the HCT116 cell line:

    indir = getwd()
    files = list.files(indir, pattern = '*.bam')
    # [1] "siG9a_sort.bam" "siNonTarget_sort.bam"
    bamLst = BamFileList(files, index=character(), obeyQname=TRUE)
    siG9a_Bowtie = summarizeOverlaps(features = genes, reads = bamLst, mode="Union",
                                     singleEnd=TRUE, ignore.strand=TRUE, fragments=FALSE)
    colnames(siG9a_Bowtie)
    # [1] "siG9a_sort.bam" "siNonTarget_sort.bam"
    SampleName = c("siG9a","siNon")
    Stage = c("siG9a","siNon")
    colData(siG9a_Bowtie) = DataFrame(SampleName, Stage)
    dim(siG9a_Bowtie)
    # [1] 64837 2
    save(siG9a_Bowtie, file="siG9a_Bowtie.rda")

3. Normalisation and statistical analysis

3.1. STAR-edgeR

The statistical analysis was performed according to the user guide of the edgeR package (https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide). In this case, there is one library for each treatment condition. A biological coefficient of variation (BCV) of 0.01 was therefore assigned, as recommended for technical replicates.
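For reference, the relation between the assigned biological coefficient of variation and the value that edgeR treats as the dispersion can be written out. This is a sketch based on the negative binomial model described in the edgeR user guide; the symbols $y_g$ (read count for gene $g$), $\mu_g$ (its mean) and $\phi$ (the dispersion) are notation introduced here and do not appear in the original text.

```latex
\begin{align}
\operatorname{Var}(y_g) &= \mu_g + \phi\,\mu_g^{2}, \\
\mathrm{BCV} &= \sqrt{\phi}, \\
\mathrm{BCV} = 0.01 &\;\Longleftrightarrow\; \phi = 10^{-4}, \\
\phi = 0.01 &\;\Longleftrightarrow\; \mathrm{BCV} = 0.1 .
\end{align}
```

The `dispersion` argument of `exactTest` is interpreted as $\phi$, so which of the last two cases applies depends on whether the assigned value is passed as it is or squared first.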
For siEZH2 in the HCT116 cell line, the procedure was:

    load(file="siEZH2.rda")
    # edgeR user guide: 2.11 What to do if you have no replicates
    x = assay(siEZH2)
    group = siEZH2$Stage
    y = DGEList(counts=x, group=group)
    y = calcNormFactors(y)
    y$samples
    #         group  lib.size norm.factors
    # Sample1 siEZH2 41941545    0.9808675
    # Sample2 siNon  39355314    1.0195057
    bcv = 0.01 # recommended by edgeR manual
    et = exactTest(y, dispersion=bcv)
    write.csv(topTags(et, n=5000, adjust.method="BH", sort.by="PValue", p.value=1),
              file = "siEZH2.csv")

For siG9a in the HCT116 cell line, the procedure was:

    load(file="siG9a.rda")
    x = assay(siG9a)
    group = siG9a$Stage
    y = DGEList(counts=x, group=group)
    y = calcNormFactors(y)
    y$samples
    #         group lib.size norm.factors
    # Sample1 siG9a 33123918    1.0065409
    # Sample2 siNon 39355314    0.9935016
    bcv = 0.01
    et = exactTest(y, dispersion=bcv)
    write.csv(topTags(et, n=5000, adjust.method="BH", sort.by="PValue", p.value=1),
              file = "siG9a.csv")

3.2. Bowtie2-edgeR

For siEZH2 in the HCT116 cell line, the procedure was:

    load(file="siEZH2_Bowtie.rda")
    x = assay(siEZH2_Bowtie)
    group = siEZH2_Bowtie$Stage
    y = DGEList(counts=x, group=group)
    y = calcNormFactors(y)
    y$samples
    #         group  lib.size norm.factors
    # Sample1 siEZH2 27587723    0.9850405
    # Sample2 siNon  26048528    1.0151867
    bcv = 0.01
    et = exactTest(y, dispersion=bcv)
    write.csv(topTags(et, n=5000, adjust.method="BH", sort.by="PValue", p.value=1),
              file = "siEZH2_Bowtie.csv")

For siG9a in the HCT116 cell line, the procedure was:

    load(file="siG9a_Bowtie.rda")
    x = assay(siG9a_Bowtie)
    group = siG9a_Bowtie$Stage
    y = DGEList(counts=x, group=group)
    y = calcNormFactors(y)
    y$samples
    #         group lib.size norm.factors
    # Sample1 siG9a 21498497    1.0149663
    # Sample2 siNon 26048528    0.9852544
    bcv = 0.01
    et = exactTest(y, dispersion=bcv)
    write.csv(topTags(et, n=5000, adjust.method="BH", sort.by="PValue", p.value=1),
              file = "siG9a_Bowtie.csv")

4. Contrasting the in silico results with those experimentally obtained in HCT116 cells after knocking down ZNF518B

    Lista_siGenes = read.csv(file = "lista_genes.csv", header = TRUE, sep = ",")
    # Intersecting a column with itself returns its unique entries
    area1 = intersect(Lista_siGenes$siEZH2, Lista_siGenes$siEZH2)
    length(area1) # 433
    area2 = intersect(Lista_siGenes$siG9A, Lista_siGenes$siG9A)
    length(area2) # 190
    area3 = intersect(Lista_siGenes$siZNF518B, Lista_siGenes$siZNF518B)
    length(area3) # 580
    # Pairwise and three-way overlaps between the gene lists
    n12 = intersect(area1, area2)
    length(n12) # 90
    n23 = intersect(area2, area3)
    length(n23) # 17
    n13 = intersect(area1, area3)
    length(n13) # 41
    n123 = intersect(intersect(area1, area2), area3)
    length(n123) # 9
    library(VennDiagram)
    ## Loading required package: grid
    ## Loading required package: futile.logger
    draw.triple.venn(area1 = 433, area2 = 190, area3 = 580,
                     n12 = 90, n23 = 17, n13 = 41, n123 = 9,
                     category = c("HCT116 siEZH2", "HCT116 siEHMT2", "HCT116 siZNF518B"),
                     lty = "blank", sep.dist = 1, fill = c("red", "green", "blue"),
                     euler.d = TRUE, scaled = TRUE, rotation.degree = 10,
                     cex = 1, cat.cex = 1, cat.dist = 0.1)

[Figure S1 plot. Genes: KRAS, TP53, APC, PIK3CA, BRAF, NRAS, FBXW7, SMAD4, CTNNB1, ERBB3. Axis: % mutant samples (n=60). Legend, mutation type: missense variant, stop gained, frameshift variant.]

Figure S1. Next-generation sequencing for somatic mutations in 60 of the 66 patients in the cohort. The percent occurrence and the nature of the different mutations are indicated.

[Figure S2 plot. Panels A and B; groups: Control, siZNF518B.]

Figure S2. Global analysis of the transcriptomic profile of normal DLD1 and HCT116 cell lines and of cells with silenced ZNF518B. The principal component analysis