Mouse Phyhipl Knockout Project (CRISPR/Cas9)

https://www.alphaknockout.com

Objective:
To create a Phyhipl knockout mouse model (C57BL/6N) by CRISPR/Cas-mediated genome engineering.

Strategy summary:
The Phyhipl gene (NCBI Reference Sequence: NM_178621; Ensembl: ENSMUSG00000037747) is located on mouse chromosome 10. Five exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 5 (Transcript: ENSMUST00000046513). Exons 2~4 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note:
Exon 2 starts from about 9.24% of the coding region.
Exons 2~4 cover 43.56% of the coding region.
The size of the effective KO region: ~5846 bp.
The KO region does not contain any other known gene.

Overview of the Targeting Strategy
[Figure: wildtype allele drawn 5' to 3' with exons 1 to 5 and two gRNA regions marked. Legend: exon of mouse Phyhipl; knockout region.]

Overview of the Dot Plot (up)
[Figure: dot plot, window size 15 bp. Legend: Forward; Reverse Complement.]
Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)
[Figure: dot plot, window size 15 bp. Legend: Forward; Reverse Complement.]
Note: The 2000 bp section downstream of Exon 4 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
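The self-alignment described in the dot plot notes can be illustrated with a short Python sketch. This is not the tool used to produce the report: it is a minimal exact-match version that only considers the forward strand (the report's plots also show reverse complement matches), and the function names and the `min_run` threshold are assumptions made for the example.

```python
from collections import defaultdict


def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs, i < j, whose window-length words are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


def has_tandem_repeat(seq, window=15, min_run=10):
    """Flag a run of min_run consecutive hits on one off-diagonal (same j - i).

    min_run is a hypothetical threshold chosen for this sketch.
    """
    by_offset = defaultdict(list)
    for i, j in self_dot_plot(seq, window):
        by_offset[j - i].append(i)
    for starts in by_offset.values():
        run, prev = 0, None
        for cur in starts:  # starts are already in ascending order
            run = run + 1 if prev is not None and cur == prev + 1 else 1
            if run >= min_run:
                return True
            prev = cur
    return False


if __name__ == "__main__":
    repeated = "ACGTTGCA" * 10  # an 8 bp unit repeated in tandem
    print(has_tandem_repeat(repeated))
    print(has_tandem_repeat("ACGTTGCA", window=4, min_run=2))
```

A long run of hits parallel to the main diagonal is what a tandem repeat looks like in a dot plot; a region with no such run, as reported for both flanks, is easier to amplify and sequence unambiguously.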
Overview of the GC Content Distribution (up)
[Figure: GC content distribution, window size 300 bp.]
Summary: Full Length (2000 bp)
Base  Percent  Count
A     28.35%   567
C     18.3%    366
T     30.75%   615
G     22.6%    452
Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)
[Figure: GC content distribution, window size 300 bp.]
Summary: Full Length (2000 bp)
Base  Percent  Count
A     29.6%    592
C     19.4%    388
T     29.75%   595
G     21.25%   425
Note: The 2000 bp section downstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

BLAT Search Results (up)
QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr10  -       70571020   70573019   2000
YourSeq  49     523    595   2000   93.2%     chr2   +       153043135  153043226  92
YourSeq  42     167    406   2000   95.7%     chr3   -       39761746   39761991   246
YourSeq  31     1871   1908  2000   94.5%     chr13  -       42386349   42386396   48
YourSeq  31     576    617   2000   97.1%     chr12  +       81295845   81295892   48
YourSeq  31     562    595   2000   97.0%     chr11  +       46026949   46026983   35
YourSeq  30     1877   1909  2000   97.0%     chr2   -       169219328  169219366  39
YourSeq  30     532    580   2000   96.9%     chr11  -       21116089   21116139   51
YourSeq  29     168    204   2000   96.8%     chr19  -       10893942   10893987   46
YourSeq  29     167    197   2000   96.8%     chr7   +       55019301   55019331   31
YourSeq  29     168    206   2000   77.5%     chr6   +       68931433   68931465   33
YourSeq  29     564    595   2000   96.8%     chr14  +       79093311   79093343   33
YourSeq  29     976    1090  2000   54.6%     chr1   +       3995726    3995774    49
YourSeq  28     167    196   2000   96.7%     chr18  -       34934325   34934354   30
YourSeq  27     533    579   2000   86.3%     chr5   +       99628559   99628603   45
YourSeq  26     562    587   2000   100.0%    chrX   -       134283182  134283207  26
YourSeq  26     167    192   2000   100.0%    chr19  -       21553416   21553441   26
YourSeq  26     167    192   2000   100.0%    chr18  -       85560441   85560466   26
YourSeq  26     167    192   2000   100.0%    chr16  -       4210966    4210991    26
YourSeq  26     167    192   2000   100.0%    chr10  -       26589841   26589866   26
Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)
QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr10  -       70563174   70565173   2000
YourSeq  66     82     208   2000   87.4%     chr17  -       87864887   87865015   129
YourSeq  53     65     205   2000   84.3%     chr1   +       75296120   75296266   147
YourSeq  52     70     208   2000   76.4%     chr11  +       30229911   30230039   129
YourSeq  47     96     213   2000   94.4%     chr6   -       110236839  110236957  119
YourSeq  44     97     207   2000   85.2%     chr13  -       118678769  118678877  109
YourSeq  42     96     220   2000   91.4%     chr4   +       119402316  119402439  124
YourSeq  41     96     201   2000   80.6%     chr12  +       69713373   69713479   107
YourSeq  40     76     206   2000   87.1%     chr17  -       29547230   29547361   132
YourSeq  38     88     210   2000   90.5%     chr18  +       31899730   31899851   122
YourSeq  37     97     210   2000   95.2%     chr3   -       121400795  121400909  115
YourSeq  37     96     211   2000   97.5%     chr11  -       75981130   75981247   118
YourSeq  37     82     208   2000   91.2%     chr13  +       59782854   59782980   127
YourSeq  36     96     211   2000   95.0%     chr13  -       92267611   92267727   117
YourSeq  36     322    376   2000   89.4%     chr1   -       153400550  153400605  56
YourSeq  36     98     209   2000   92.9%     chr13  +       75219978   75220091   114
YourSeq  36     98     210   2000   95.0%     chr12  +       77029707   77029821   115
YourSeq  36     75     118   2000   88.1%     chr11  +       44525914   44525956   43
YourSeq  35     96     211   2000   94.9%     chr10  +       117157265  117157381  117
YourSeq  35     96     210   2000   94.9%     chr1   +       136585892  136586007  116
Note: The 2000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.

Gene and protein information:
Phyhipl phytanoyl-CoA hydroxylase interacting protein-like [ Mus musculus (house mouse) ]
Gene ID: 70911, updated on 12-Aug-2019

Gene summary
Official Symbol: Phyhipl (provided by MGI)
Official Full Name: phytanoyl-CoA hydroxylase interacting protein-like (provided by MGI)
Primary source: MGI:MGI:1918161
See related: Ensembl:ENSMUSG00000037747
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI267048; 4921522K17Rik
Expression: Biased expression in testis adult (RPKM 75.5), cerebellum adult (RPKM 31.7) and 6 other tissues
Orthologs: human; all

Genomic context
Location: 10; 10 B5.3
Exon count: 6

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  10   NC_000076.6 (70557686..70599291, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    10   NC_000076.5 (70020434..70062039, complement)

[Figure: genomic context, Chromosome 10 - NC_000076.6.]

Transcript information:
This gene has 10 transcripts

Gene: Phyhipl ENSMUSG00000037747
Description: phytanoyl-CoA hydroxylase interacting protein-like [Source:MGI Symbol;Acc:MGI:1918161]
Gene Synonyms: 4921522K17Rik
Location: Chromosome 10: 70,557,682-70,655,965 reverse strand.
GRCm38:CM001003.2
About this gene: This gene has 10 transcripts (splice variants), 249 orthologues, 1 paralogue and is a member of 1 Ensembl protein family.

Transcripts
Name         Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt  Flags
Phyhipl-201  ENSMUST00000046513.9  2877  375aa       ENSMUSP00000045807.3  Protein coding   CCDS23914  Q8BGT8   TSL:1, GENCODE basic, APPRIS P3
Phyhipl-206  ENSMUST00000162251.7  1540  330aa       ENSMUSP00000125179.1  Protein coding   CCDS48593  F7D3N3   TSL:3, GENCODE basic, APPRIS ALT1
Phyhipl-205  ENSMUST00000162144.1  635   212aa       ENSMUSP00000124828.1  Protein coding   -          F6U6Z2   CDS 5' and 3' incomplete, TSL:3
Phyhipl-209  ENSMUST00000162793.7  598   41aa        ENSMUSP00000124655.1  Protein coding   -          E0CXW9   CDS 3' incomplete, TSL:3
Phyhipl-210  ENSMUST00000163054.1  3485  No protein  -                     Retained intron  -          -        TSL:1
Phyhipl-207  ENSMUST00000162470.1  1877  No protein  -                     Retained intron  -          -        TSL:1
Phyhipl-208  ENSMUST00000162571.7  2011  No protein  -                     lncRNA           -          -        TSL:1
Phyhipl-204  ENSMUST00000161687.7  1665  No protein  -                     lncRNA           -          -        TSL:1
Phyhipl-203  ENSMUST00000160127.1  359   No protein  -                     lncRNA           -          -        TSL:5
Phyhipl-202  ENSMUST00000159025.1  326   No protein  -                     lncRNA           -          -        TSL:3

[Figure: genomic region view, 118.28 kb, 70.56Mb to 70.66Mb. Forward strand genes: Fam13c-201, Fam13c-202 and Fam13c-203 (protein coding), Fam13c-205 (retained intron). Contig: AC122896.4. Reverse strand genes: Phyhipl-201, -205, -206 and -209 (protein coding), Phyhipl-202, -203, -204 and -208 (lncRNA), Phyhipl-207 and -210 (retained intron). Regulatory Build track. Regulation legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank. Gene legend: Protein Coding (merged Ensembl/Havana; Ensembl protein coding); Non-Protein Coding (processed transcript; RNA gene).]

Transcript: ENSMUST00000046513
[Figure: protein view of Phyhipl-201 (protein coding, reverse strand, 41.61 kb), ENSMUSP00000045... Tracks: Low complexity (Seg); Superfamily: Fibronectin type III superfamily; Pfam: Fibronectin type III; PROSITE profiles: Fibronectin type III; PANTHER: PTHR15698, PTHR15698:SF8; Gene3D: Immunoglobulin-like fold; CDD: Fibronectin type III; sequence variants (dbSNP and all other sources). Variant legend: synonymous variant. Scale bar: 0 to 375.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
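Some of the figures in this report can be cross-checked against each other with simple arithmetic. The Python sketch below recomputes the GC percentage of each 2000 bp flank from the base counts in the GC content summaries, and counts the bases lying between the two flanks using the chr10 coordinates of the top BLAT hits. It assumes that the BLAT start and end positions are closed intervals (consistent with SPAN = 2000 for both top hits) and that the effective KO region is exactly the sequence between the two flanks; the report does not state the second point explicitly, so treat it as an assumption. All names in the code are chosen for this example.

```python
# Base counts copied from the two GC content summaries (2000 bp each).
UPSTREAM = {"A": 567, "C": 366, "T": 615, "G": 452}
DOWNSTREAM = {"A": 592, "C": 388, "T": 595, "G": 425}

# Top BLAT hit of each flank on chr10, as printed (assumed closed intervals).
UP_FLANK = (70571020, 70573019)    # 2000 bp upstream of exon 2
DOWN_FLANK = (70563174, 70565173)  # 2000 bp downstream of exon 4


def gc_percent(counts):
    """GC content in percent from a dict of per-base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def bases_between(low, high):
    """Number of bases strictly between two non-overlapping closed intervals."""
    if low[1] >= high[0]:
        raise ValueError("intervals overlap or are out of order")
    return high[0] - low[1] - 1


if __name__ == "__main__":
    print(f"upstream GC:   {gc_percent(UPSTREAM):.2f}%")
    print(f"downstream GC: {gc_percent(DOWNSTREAM):.2f}%")
    print(f"bases between flanks: {bases_between(DOWN_FLANK, UP_FLANK)}")
```

Under these assumptions the gap between the flanks is 5846 bases, which agrees with the stated effective KO region size of ~5846 bp, and the flanks come out at about 40.9% and 40.65% GC. Because the gene lies on the reverse strand, the upstream flank has the higher chromosome coordinates.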