Bioinformatic Analysis of Next‑Generation Sequencing Data to Identify Dysregulated Genes in Fibroblasts of Idiopathic Pulmonary Fibrosis
Total Page:16
File Type:pdf, Size:1020Kb
INTERNATIONAL JOURNAL OF MOleCular meDICine 43: 1643-1656, 2019 Bioinformatic analysis of next‑generation sequencing data to identify dysregulated genes in fibroblasts of idiopathic pulmonary fibrosis CHAU‑CHYUN SHEU1-3, WEI‑AN CHANG1,2, MING‑JU TSAI1-3, SSU‑HUI LIAO1, INN‑WEN CHONG2,3 and PO-LIN KUO1 1Graduate Institute of Clinical Medicine, College of Medicine, Kaohsiung Medical University; 2Division of Pulmonary and Critical Care Medicine, Kaohsiung Medical University Hospital; 3Department of Internal Medicine, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan, R.O.C. Received October 7, 2018; Accepted January 29, 2019 DOI: 10.3892/ijmm.2019.4086 Abstract. Idiopathic pulmonary fibrosis (IPF) is a lethal and miRNA expression data, combined with GEO verifica- fibrotic lung disease with an increasing global burden. It is tion, finally identified Homo sapiens (hsa)-miR-1254-INKA2 hypothesized that fibroblasts have a number of functions that and hsa-miR-766-3p-INKA2 as the potential miRNA-mRNA may affect the development and progression of IPF. However, interactions in IPF fibroblasts. In summary, the results the present understanding of cellular and molecular mecha- of the present study suggest that dysregulation of PAX8, nisms associated with fibroblasts in IPF remains limited. hsa-miR-1254-INKA2 and hsa-miR-766-3p-INKA2 may The present study aimed to identify the dysregulated genes promote the proliferation and survival of IPF fibroblasts. In in IPF fibroblasts, elucidate their functions and explore the functional analysis of the dysregulated genes, a marked potential microRNA (miRNA)‑mRNA interactions. mRNA association between fibroblasts and the ECM was identified. and miRNA expression profiles were obtained from IPF These data improve the current understanding of fibroblasts fibroblasts and normal lung fibroblasts using a next‑gener- as key cells in the pathogenesis of IPF. As a screening study ation sequencing platform, and bioinformatic analyses were using bioinformatics approaches, the results of the present performed in a step‑wise manner. A total of 42 dysregulated study require additional validation. genes (>2 fold‑change of expression) were identified, of which 5 were verified in the Gene Expression Omnibus (GEO) Introduction database analysis, including the upregulation of neurotrimin (NTM), paired box 8 (PAX8) and mesoderm development Idiopathic pulmonary fibrosis (IPF) is a chronic, progressive, LRP chaperone, and the downregulation of ITPR interacting fibrotic lung disease. IPF predominantly affects elderly men. domain containing 2 and Inka box actin regulator 2 (INKA2). Patients with IPF usually present with exertional dyspnea, dry Previous data indicated that PAX8 and INKA2 serve roles in cough and inspiratory bibasilar Velcro‑like crackles on lung cell growth, proliferation and survival. Gene Ontology analysis auscultation. At present, high‑resolution computed tomography indicated that the most significant function of these 42 dysreg- is the most important diagnostic tool for IPF, which typically ulated genes was associated with the composition and function presents with usual interstitial pneumonia pattern (1,2). IPF is of the extracellular matrix (ECM). A total of 60 dysregulated a lethal lung disease. The prognosis of IPF is poorer compared miRNAs were also identified, and 1,908 targets were predicted with a number of types of cancer (3), with an estimated median by the miRmap database. The integrated analysis of mRNA survival of 3‑5 years if untreated (2). The incidence of IPF, hospitalization rates and mortality due to IPF have increased during previous years, suggesting an increasing global burden of this disease (4‑6). The pathogenesis of IPF remains largely unknown. Data Correspondence to: Professor Po‑Lin Kuo, Graduate Institute from observational studies suggest that cigarette smoking, of Clinical Medicine, College of Medicine, Kaohsiung Medical air pollution (7), environmental exposure (8), chronic viral University, 100 Shih‑Chuan 1st Road, Kaohsiung 807, Taiwan, infection (9) and esophageal reflux (10) predispose individuals R.O.C. E‑mail: [email protected] to IPF (4). There is also increasing evidence for genetic predisposition to IPF: Recent genome‑wide association studies Key words: bioinformatics, fibroblasts, idiopathic pulmonary identified that several genetic variants, primarily involved in fibrosis, Inka box actin regulator 2, next‑generation sequencing, epithelial cell‑cell adhesion and integrity, the innate immune paired box 8 response, host defense and DNA repair, are associated with increased risk of IPF (11‑13). These together indicate that gene‑environmental interactions serve important roles in the 1644 SHEU et al: BIOINFORMATIC ANALYSIS OF DYSREGULATED GENES IN IPF FIBROBLASTS pathogenesis of IPF. In genetically predisposed individuals, DHLF‑IPF and NHLF cells used for the NGS analysis were repetitive environmental exposures lead to an aberrant harvested from the 1st generation of cells following cultivation injury‑remodeling process (14). Activated lung epithelial cells from the primary cells. produce pro‑fibrotic growth factors, chemokines and other mediators, resulting in abnormal wound healing character- NGS. The expression profiles of miRNAs and mRNAs were ized by mesenchymal transition of epithelial cells, activation examined using NGS. Protocol was described in our previous and differentiation of fibroblasts and myofibroblasts. These studies (25‑27). In brief, total RNA from the DHLF‑IPF and fibroblasts and myofibroblasts secrete excessive amounts of NHLF cells were extracted with TRIzol® reagent (Thermo extracellular matrix proteins including fibrillary collagens, Fisher Scientific, Inc., Waltham, MA, USA) at Welgene Biotech vimentin and fibronectin (15), contributing to the destruction Co., Ltd. (Taipei, Taiwan). Purified RNA was quantified at an of lung architecture by progressive scarring. optical density of 260 nm using an ND‑1000 spectrophotom- It is hypothesized that fibroblasts have a number of func- eter (NanoDrop Technologies; Thermo Fisher Scientific, Inc.) tions that may affect the development and progression of and the RNA quality was examined using a Bioanalyzer 2100 IPF (16). Although previous studies have described the differ- (Agilent Technologies, Inc., Santa Clara, CA, USA) with ences between IPF and normal lung fibroblasts (17‑21), the RNA 6000 LabChip® kit (Agilent Technologies, Inc.). Next, the knowledge concerning the cellular and molecular mechanisms samples were prepared for library construction and sequencing associated with fibroblasts in IPF remains limited, and their using an Illumina preparation kit (Illumina, Inc., San Diego, specific roles requires additional investigation (16,22). CA, USA). For small RNA sequencing, the harvested cDNA For an improved understanding of the roles fibroblasts constructs containing 18‑40‑nucleotide RNA fragments serve in the pathogenesis of IPF, the present study used a next (140‑155 nucleotides in length with all adapters) were selected. generation sequencing (NGS) platform and a step‑wise bioin- The libraries were then sequenced on an Illumina instrument formatic approach to analyze the expression levels of mRNAs (75 single‑end cycles). Following trimming and removal and microRNAs (miRNAs) and their interactions in IPF fibro- of reads with low quality scores using Trimmomatics (28), blasts compared with normal lung fibroblasts. The aims of the the qualified reads were analyzed with miRDeep2 (29) and present study were to identify the top differentially expressed the UCSC Genome Browser (genome.ucsc.edu/) (30). The genes and subsequently elucidate the function of these miRNAs with low levels (<1 normalized read per million) in dysregulated genes, and to explore potential miRNA‑mRNA either DHLF‑IPF or NHLF cells were excluded. interactions in IPF fibroblasts. For transcriptome sequencing, the library was constructed with a SureSelect Strand Specific RNA Library Preparation Materials and methods kit (Agilent Technologies, Inc.), and sequenced on a Solexa platform (150 paired‑end cycles), using the TruSeq SBS kit Study design. The study design is summarized in Fig. 1. (Illumina, Inc.). Similar with small RNA sequencing data, Firstly, cell cultures of the IPF fibroblasts and healthy lung Trimmomatics (version 0.36) was implemented to trim or fibroblasts were generated. Then, total RNA of the fibroblasts remove the reads with low quality score. Qualified reads were were extracted and sent for RNA and small RNA sequencing analyzed using HISAT2 (version 2.1.0) (31). The genes with low using the NGS platform. The differentially expressed genes expression levels [<0.3 fragment per kilobase of transcript per (>2 fold‑change) were analyzed with Ingenuity® Pathway million mapped reads (FPKM)] in either DHLF‑IPF or NHLF Analysis (IPA) and the Database for Annotation, Visualization cells were excluded. The 2‑tailed P‑values were calculated and Integrated Discovery (DAVID) for pathway analysis and [2(Cufflinks version 2.2.1)] with non‑grouped samples used functional interpretation. These dysregulated genes were addi- ‘blind mode’, in which all samples were treated as replicates tionally verified in representative Gene Expression Omnibus of a single global ‘condition’ and used to build one model for (GEO) datasets. The differentially expressed miRNAs statistical analysis (32,33). Genes with P<0.05 [(‑log10) P‑value (>2 fold‑change) were analyzed with miRmap (mirmap.ezlab. >1.3]