Supporting Information

Muro et al. 10.1073/pnas.0807813105 (www.pnas.org/cgi/content/short/0807813105)

SI Text

1. Algorithm for Automated Recognition of EST End Clusters. To determine the coordinates of genomic EST alignments we used the UCSC genome annotation (1), a regularly updated database which includes mouse and human EST sequences and their positions in the genome. The UCSC Golden Path version was mm6 for mouse (equivalent to NCBI build 34, March 2005) and hg18 for human (NCBI Build 36.1, March 2006). The UCSC mapping database is based on alignments of ESTs to the corresponding genome using the BLAT program (2). Because of the relatively low accuracy of EST sequencing and recent genomic duplications (3), some ESTs can be aligned to more than 1 genomic position. To avoid possible misidentifications, ambiguously aligning ESTs were excluded from our analysis. Multiple matches to the genome were reported when alignments had a base pair identity within 0.5% of the best alignment and at least 96% base pair identity to the genomic sequence. We accepted only ESTs that aligned to a unique genomic position. For example, in the analysis of the murine genome this constraint reduced the number of aligned ESTs from 4.9 to 3.3 million. EST alignment to the genomic sequence circumvents the need to analyze EST sequences for possible sequencing errors, RNA editing, or elimination of polyA tails. Once a genomic position was identified by coalignment of multiple EST ends, further analysis was simply performed on the corresponding genomic sequence.

A count of the number of matching ESTs along the genome (Fig. 1A) results in a histogram (Fig. 1B). The simple presence of ESTs does not indicate the direction of transcription. However, our analysis revealed, as expected, that the numbers of aligned ESTs gradually increase toward transcript 3′ ends (Fig. 1B) and then abruptly fall, suggesting that the shape of the EST frequency histogram could be used to infer the direction of transcription. A histogram was generated describing the number of ESTs spanning each genomic position (in steps of 20 nt), irrespective of the EST direction or RNA splicing. Intron/exon boundaries were not considered for this computation as we were primarily interested in relating EST ends to transcript ends. The resulting EST histogram was convolved with a mathematical function that acts as an "edge detector."

$$f(x) = a \cdot \tanh(x/b) \cdot \left[1 - \tanh^{2}(x/b)\right]$$

The constants a and b modify the width of the function and are, therefore, parameters to optimize. We used a = −1/150 and b = 150. Sharp edges are converted into maxima or minima, depending on the transcript direction, whereas the absolute magnitude of a peak indicates the abruptness of the edge. The sign of the convolution maximum indicates the direction of the transcript: negative values indicate termination of transcription proceeding from right to left and positive values the opposite direction. Peaks are indicated by red bars in Fig. 1B. Peaks of magnitude 0.25 or greater were accepted as "rough ends" indicative of potential termination. Further analysis was then performed to relate these rough ends to the precise genomic locations of the ends of ESTs associated with 1 or more PAS.

Valid transcript termini are expected to contain a PAS 10–30 nt upstream of the polyadenylation site. The most common and strongest signal is AAUAAA, but 12 additional variants have been demonstrated or proposed (4–6) (Table S1). An analysis of the distribution of distances between the closest PAS to putative transcript ends in the RefSeq collection and the transcript ends (Fig. S1) showed for 10 of the PAS variants a marked tendency to lie 10–30 nt upstream of EST ends, whereas no strong trend was seen for 3 additional variants. Based on this simple analysis we chose to accept only 10 of the variants as functional PAS (Table S1), and only EST ends validated by a PAS 10–30 nt upstream. Furthermore, we considered all EST ends validated by the same PAS and ending within 20 nt of one another to represent the same transcript end and to be clustered together, as others have also proposed (6).

Next, we studied the distribution of individual EST ends around our predicted rough ends (Fig. S2). The graph indicates that in a range of −200 to +150 nt around these rough ends the frequency of EST ends is distinguishable from the surrounding background. Therefore, to relate rough ends to precise genomic locations of EST ends, first EST ends in a range of −200 to +150 nt of the rough end were collected, then EST ends within 20 nt of each other were clustered to define a potential termination zone, and finally, each cluster was tested for the presence of a PAS within 10–30 nt upstream of the start of the cluster.

The strength of a cluster is described in terms of total number of EST terminations in the cluster and of maximum number of terminations at a single nucleotide position. The width of the termination zone provides an additional descriptor used in the analysis. The results of this protocol are "candidate ends" (approximate RNA transcript end predictions), each of which is a cluster of EST ends located near a rough end.

2. Estimation of Recall and Precision of the Method and Comparison to Other Approaches. The correspondence between the automated predictions for the murine genome and the nominal ends in the RefSeq collection of protein-encoding transcripts was assessed quantitatively. Of the 69,220 predictions obtained by using a minimum definition of 2 ESTs in a cluster and absence of a nearby polyA tract, 10,693 fell within 10 nt of one of 18,280 database transcript ends. "Recall," the proportion of nominal RefSeq ends that were matched by automated prediction, was 10,693/18,280 = 0.59, a figure reflecting the incompleteness of many of the database sequences at their 3′ ends. When RefSeq sequences lacking a PAS within 50 terminal nt were removed from consideration, recall was 10,693/13,349 = 0.80. This measure is still influenced by database sequences that are incomplete but contain a nonused PAS near the nominal terminus (e.g., rows 11–18 in Table S2).

When we focused on the subset of nominal transcript ends that were matched by automated prediction (10,693) and considered the total number of "local ends" in the −200 to +150 nt range of these ends and in the direction of transcription of the gene, 15,481 were found. "Precision," the proportion of total predicted local ends that correspond to confirmed database sequence ends, is estimated as 10,693/15,481 = 0.69. Lower precision values mainly reflect greater usage of locally redundant polyadenylation sites in the transcriptome. The 4,788 additional terminal zone predictions are likely to represent local alternative ends (as illustrated in Fig. 1) typically associated with EST end clusters containing smaller numbers of ESTs than the clusters coinciding with the nominal ends.

Candidate cluster ends are characterized by variable levels of EST end evidence described as number of EST ends in the cluster, maximum number of EST ends at a single position, association or not to a PAS, and presence or absence of nearby polyA tracts. Our computational method allows tuning of results according to thresholds on these 4 properties. Fig. S3 illustrates the dependence of recall and precision on these variables when comparing the complete prediction on the mouse genome to the RefSeq collection of protein-encoding transcripts. This analysis was used to guide the optimization of the method parameters and to focus on predicted ends supported by 2 EST ends, as they offer the best tradeoff between recall and precision values given the current state of the murine EST database.

Generically similar approaches to computational prediction of RNA transcript ends based on EST evidence have been described before (7–12). Collections of manually curated ends have also been published (VEGA) (13). Fig. S4 compares the levels of recall relative to our experimental benchmark and other accessible compilations of curated and predicted ends, of our algorithm (TS), the method from Lopez et al. (11, 12) and VEGA (13). With correct prediction of 110 of our 113 experimentally verified 3′ ends (108 positions predicted within an accuracy of 10 nt), TS represents a decisive improvement in predictive power over previous approaches. In general, recall was better for more highly curated collections of transcripts.

[...] accepted all uniquely aligning ESTs regardless of whether their corresponding database entries included a flanking A-rich sequence. Our approach also differed from the PolyA_DB and the Yan and Marr (9) methods, which considered only ESTs that overlap a gene. The latter strategy detects only those transcript ends that are close to current gene predictions. This restriction would preclude, for example, detection of the alternative downstream polyadenylation site for the Pde7a gene described in Fig. 1. PACdb is less restrictive, but also relates all 3′-processing sites to known or currently predicted genes. This approach can result in assignment of a termination to the wrong gene due to missing gene annotations (7). Many human and mouse gene predictions are currently unstable as they depend on changing EST data and gene prediction methodologies (14). Our analysis, in contrast, is independent of gene predictions and we demonstrate experimentally that it can be used to detect new transcript ends that are not described in the databases. Further, our approach is not restricted to protein-coding RNAs and can be used to detect noncoding RNAs, a growing number of which are being recog[...]
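The procedure in Section 1 (edge-detector kernel, collection of EST ends in the −200 to +150 nt window, clustering within 20 nt, and the PAS test 10–30 nt upstream) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the kernel support `half_width`, the zero padding, the chained clustering rule, the plus-strand-only coordinates, and the requirement that the whole hexamer lie inside the 10–30 nt window are all assumptions made for illustration. Only the canonical signal (AATAAA in DNA) is listed, because the other accepted variants are given in Table S1 and are not reproduced here. Peak selection with the 0.25 threshold is omitted, since the text does not state how the histogram is normalized.

```python
import math
from collections import Counter


def edge_kernel(x, a=-1.0 / 150.0, b=150.0):
    """Edge-detector function f(x) = a * tanh(x/b) * (1 - tanh(x/b)**2)."""
    t = math.tanh(x / b)
    return a * t * (1.0 - t * t)


def convolve_histogram(hist, step=20, half_width=600):
    """Discrete convolution of an EST coverage histogram with the kernel.

    hist holds one count per `step` nt. half_width (kernel support in nt,
    a multiple of step) and zero padding past the array ends are assumptions.
    """
    offsets = range(-half_width, half_width + 1, step)
    out = []
    for i in range(len(hist)):
        total = 0.0
        for x in offsets:
            idx = i - x // step
            if 0 <= idx < len(hist):
                total += edge_kernel(x) * hist[idx]
        out.append(total)
    return out


def cluster_ends(positions, max_gap=20):
    """Group sorted EST end positions; a gap > max_gap starts a new cluster.

    Chaining neighbours this way is an assumed reading of "within 20 nt".
    """
    clusters = []
    for p in sorted(positions):
        if clusters and p - clusters[-1][-1] <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters


def has_upstream_pas(seq, cluster_start, pas_hexamers=("AATAAA",)):
    """True if a PAS hexamer lies wholly 10-30 nt upstream of cluster_start."""
    window = seq[max(0, cluster_start - 30):max(0, cluster_start - 10)]
    return any(h in window for h in pas_hexamers)


def candidate_ends(rough_end, est_ends, seq):
    """Candidate ends near one rough end (plus strand only in this sketch)."""
    nearby = [p for p in est_ends if rough_end - 200 <= p <= rough_end + 150]
    results = []
    for cluster in cluster_ends(nearby):
        if has_upstream_pas(seq, cluster[0]):
            results.append({
                "start": cluster[0],
                "total": len(cluster),                       # EST ends in cluster
                "max_single": max(Counter(cluster).values()),  # peak at one nt
                "width": cluster[-1] - cluster[0] + 1,       # termination zone
            })
    return results
```

In this sketch each returned dictionary carries the three descriptors named in the text: the total number of EST terminations, the maximum at a single nucleotide, and the width of the termination zone. Thresholds on these values could then be applied by the caller.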