Definition of Biological Responses Through the Analysis of Gene Expression Profiles
UNIVERSITY OF UDINE
DOCTORATE COURSE IN BIOMEDICAL AND BIOTECHNOLOGICAL SCIENCES
CYCLE XVII
RESEARCH DOCTORATE THESIS
DEFINITION OF BIOLOGICAL RESPONSES THROUGH THE ANALYSIS OF GENE EXPRESSION PROFILES
Candidate: Raffaella Picco
Tutor: Prof. Claudio Brancolini
Prof. Federico Fogolari
ACADEMIC YEAR 2014/2015

To Marian, to Carla and to my daughters

TABLE OF CONTENTS
1. ABSTRACT
2. INTRODUCTION
  2.1 MICROARRAY
    MICROARRAY TECHNOLOGY
    IMAGE ANALYSIS
    GRAPHICAL PRESENTATION OF THE DATA
    QUALITY ASSESSMENT OF AFFYMETRIX GENECHIP
    DIFFERENTIALLY EXPRESSED GENES
    CLUSTERING
    PCA
    PUBLIC DATABASES OF GENE EXPRESSION PROFILES
    BIOLOGICAL INTERPRETATION OF THE RESULTS
  2.2 CASPASES
    CASPASE-2
    CASPASES in the Central Nervous System (CNS)
  2.3 CHRONIC LYMPHOCYTIC LEUKEMIA
    Principal genetic alterations in CLL
3. MATERIAL AND METHODS
  PIPELINE
    Packages installation
    Preprocessing
    Quality Assessment of Affymetrix GeneChip
    RNA degradation
    Probe Level Model (PLM)
    Filtering
    Annotating a platform
    Differentially Expressed Genes (DEG) selection
    Model fitting
    Paired samples
    Annotation insertion
    Correlations and heatmap
    Partial Least Square (PLS) regression
  AWK
  Shell scripts
  Graphics
4. RESULTS and DISCUSSION
  4.1 RESULTS OF THE FIRST PART
  4.2 RESULTS OF THE SECOND PART
5. ADDITIONAL WORK
  5.1 Next-Generation Sequencing Analysis of miRNAs Expression in Control and FSHD Myogenesis
  5.2 Synthesis, Characterization, and Optimization for in Vivo Delivery of a Nonselective Isopeptidase Inhibitor as New Antineoplastic Agent
6. CONCLUSIONS
7. APPENDIX
  7.1 Scripts – part 1
  7.2 Scripts – part 2
8. BIBLIOGRAPHY
9. PUBLISHED PAPERS

1. ABSTRACT
Microarray technology has revolutionized the study of the complexity of biological systems and of biological responses to the environment.
To define new scenarios and new biological pathways, this PhD thesis aimed to acquire mastery of the management and analysis of microarray data and to deepen the basic concepts of statistical analysis, so as to make informed choices about the test best suited to a particular data set. Once the different phases were understood in depth, the aim was to build a generic pipeline that can be used to analyze any type of microarray data. The computational pipeline for processing raw microarray data (images) was implemented in R, mostly using Bioconductor packages. The implementation aimed to define gene expression levels and to provide experiment quality assessment and meaningful statistical tests.
During the first part of the thesis, my purpose was to determine gene function by combining silencing experiments with gene expression analysis. Caspase-2 is a member of the caspase family of cysteine proteases, which play important roles in apoptosis and inflammation. Although CASP2 is highly conserved through evolution and was the first caspase identified, several contradictory results are found in the literature. Because it is expressed at high levels during neurological development and is strongly involved in apoptotic processes in the adult central nervous system, we silenced it in glioblastoma cells to evaluate the effect on gene expression. The comparative analysis of the expression profiles of silenced cells with respect to the control highlighted a relationship between CASP2 and genes involved in cholesterol metabolism. Previous studies have suggested a role for this enzyme in controlling the intracellular level of this metabolite. Therefore, we decided to use data stored in public databases to extend the investigation to all the other caspases and to all genes connected in some way to cholesterol.
Computational analysis of the correlation between the expression levels of these different gene families allowed us to: i) define correlations among cholesterol genes and caspases and ii) perform a hierarchical clustering of the different caspases in terms of their correlation profiles. The analysis was extended to normal brain and liver tissues, and a correlation between the expression levels of certain caspases and aging was found in the human brain.
During the second part of the work, I performed gene expression profile analysis to define the signaling pathways and resistance mechanisms elicited by treatment of B-cell chronic lymphocytic leukemia cells with a new class of ubiquitin-proteasome system (UPS) inhibitors. By comparing transcriptional profiles before and after treatment, we identified many genes, and their related pathways, whose expression was altered by the UPS inhibitor. These genes allowed us to define new mechanisms of drug action. Furthermore, considering the differences in responsiveness among the analyzed patients, we propose some genes as responsible for the differential cellular reactivity to the specific inhibitor.
2. INTRODUCTION
2.1 MICROARRAY
The microarray technique, developed in the 1990s (Schena, 1995), enables the analysis of gene expression by monitoring the RNA products of thousands of genes at once. Thanks to the production of high-quality platforms, the creation of standardized experimental protocols, the construction of sophisticated devices for recording intensity values and the development of robust computational tools for data management, this technology has become an everyday technique in biological laboratories (Hoheisel, 2006; Trevino, 2007).
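The correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes that expression values are arranged with one profile per gene across the same samples, computes the Pearson correlation between every pair of profiles, and converts it into a distance (1 - r) of the kind commonly passed to a hierarchical clustering routine. The gene names, the values and the helper names (`pearson`, `correlation_matrix`) are invented for illustration, and the sketch is written in Python rather than reproducing the R scripts of the thesis.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(matrix):
    """Pairwise Pearson correlations between all gene profiles."""
    genes = sorted(matrix)
    return {g: {h: pearson(matrix[g], matrix[h]) for h in genes} for g in genes}


# Toy data (hypothetical, not taken from the thesis):
# one list of intensities per gene, one position per sample.
expression = {
    "gene_A": [1.0, 2.0, 3.0, 4.0],
    "gene_B": [2.0, 4.0, 6.0, 8.0],  # rises together with gene_A
    "gene_C": [4.0, 3.0, 2.0, 1.0],  # falls as gene_A rises
}

corr = correlation_matrix(expression)

# Correlation distance: 0 for perfectly correlated profiles,
# 2 for perfectly anti-correlated ones.
distance = {g: {h: 1.0 - r for h, r in row.items()} for g, row in corr.items()}

for gene, row in corr.items():
    print(gene, {other: round(r, 3) for other, r in row.items()})
```

Using 1 - r as the distance groups genes whose profiles rise and fall together, regardless of their absolute intensity; other choices, such as 1 - |r|, would instead also group anti-correlated genes.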
Traditional investigation techniques use fluorescently labeled probes directed against specific DNA sequences anchored to a solid support. Microarray technology reverses this relationship: the unlabeled probes are fixed on the support and the labeled sample is in the liquid phase. All the probes hybridize with their respective targets simultaneously, which yields information on the global transcriptional state of the sample. After the excess solution that did not interact with the probes is removed, the emitted fluorescence is recorded and the bond between each probe and its target is translated into a numerical value, which is assumed to be proportional to the amount of the corresponding transcript in the sample under investigation. Microarray analysis provides a matrix of genes (along rows) and samples (along columns). The values in the matrix are intensity measurements that are proportional to the quantity of mRNA of each gene present in a cell.
MICROARRAY TECHNOLOGY
The choice of material for microarray construction is of fundamental importance. Initially, DNA arrays were based on nylon membranes. Nowadays, both porous and non-porous materials can be selected, each with specific advantages. Non-porous materials, such as glass, are the most useful, given their low intrinsic fluorescence and the limited spreading of the spotted solution. These features make it possible to create small spots very precisely, so that an extremely small quantity of sample can be used. Unfortunately, glass has a high affinity for dust and other airborne contaminants, and a smaller binding surface than porous materials.
A more recent technology uses glass beads to which the probes are linked before being deposited in a completely random way. The positions are then determined through a complex pseudo-sequencing process.
The small bead size allows a greater number of spots per microarray slide, and as many as 12 samples can be analyzed simultaneously on a single platform.
Microarrays can also be classified by the dimensionality of the support: there are 2D microarrays and 3D microarrays. In the latter, each probe (specific for a single gene) is linked to the walls of microchannels, which considerably increases the quantity of probe used (and therefore the sensitivity) and makes it possible to use chemiluminescence, which is much more stable than fluorescence.
Three techniques for microarray manufacturing exist. In the first, single-stranded cDNA molecules (between 200 and 2000 bases long) or pre-synthesized oligonucleotides (50-100 bases) in solution are deposited onto a solid support by a robot. These are spotted microarrays.
The second technique exploits the in-situ synthesis of 60-base oligonucleotides through inkjet printing technology (Agilent): this is a very flexible method that allows the production of personalized arrays for two-dye experiments. A computer controls the printing process, and the sequences of all the oligonucleotides are contained in a file. This methodology makes the manufacturing of custom microarrays particularly efficient. Another positive feature is the uniformity of spot shape and size.
In the third technique, short oligonucleotides are synthesized directly on a silica support through photolithographic techniques (Affymetrix), which use masks and light-reactive compounds to selectively activate specific nucleotides at well-defined positions. The sequential use of different masks leads to the synthesis of all the probes needed for whole-genome analysis. Because of the high cost of this technology, producing custom arrays is not justified.
The maximum oligonucleotide length allowed is 25 bases and the density reached on the spot is very high (over