Mouse Tsfm Knockout Project (CRISPR/Cas9)


https://www.alphaknockout.com

Objective:
To create a Tsfm knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary:
The Tsfm gene (NCBI Reference Sequence: NM_025537; Ensembl: ENSMUSG00000040521) is located on mouse chromosome 10. Six exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 6 (Transcript: ENSMUST00000040560). Exons 3~5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note:
Exon 3 starts at about 23.56% of the coding region. Exons 3~5 cover 34.98% of the coding region. The size of the effective KO region is ~4747 bp. The KO region does not contain any other known gene.

Overview of the Targeting Strategy
[Figure: wildtype allele drawn 5' to 3', with exons labeled 1, 3, 4, 5, 6 and two gRNA regions. Legend: exon of mouse Tsfm; knockout region.]

Overview of the Dot Plot (up)
[Figure: dot plot of the sequence against itself, forward and reverse complement. Window size: 15 bp.]
Note: The 780 bp section upstream of Exon 3 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)
[Figure: dot plot of the sequence against itself, forward and reverse complement. Window size: 15 bp.]
Note: The 2000 bp section downstream of Exon 5 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
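
The self-alignment behind a dot plot can be sketched in a few lines. The report does not say which tool or matching rule produced its plots, so the following is only an illustration under stated assumptions: exact matching of 15 bp windows, and the hypothetical helper names `reverse_complement` and `self_dot_plot`. A forward ("+") dot away from the main diagonal means the same window occurs twice, which is the signature of a repeat.

```python
from collections import defaultdict

# Assumption: plain A/C/G/T input; ambiguity codes are not handled.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def self_dot_plot(seq, window=15):
    """Compare a sequence with itself using exact window matches.

    Returns sorted (i, j, strand) tuples: strand "+" when the window at i
    equals the window at j (i != j), strand "-" when the reverse complement
    of the window at i equals the window at j.
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1
    index = defaultdict(list)
    for j in range(n_windows):
        index[seq[j:j + window]].append(j)

    dots = []
    for i in range(n_windows):
        word = seq[i:i + window]
        for j in index.get(word, ()):
            if i != j:  # skip the trivial main diagonal
                dots.append((i, j, "+"))
        for j in index.get(reverse_complement(word), ()):
            dots.append((i, j, "-"))
    return sorted(dots)


if __name__ == "__main__":
    # Toy input: a 12 bp unit repeated three times (not the Tsfm sequence).
    repeated = "ACGGTCATTGCA" * 3
    forward_hits = [d for d in self_dot_plot(repeated, window=8) if d[2] == "+"]
    print(len(forward_hits) > 0)  # off-diagonal forward dots indicate a repeat
```

An empty result for a flanking region would correspond to the report's "no significant tandem repeat" conclusion, although a real tool would normally tolerate mismatches rather than require exact windows.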
Overview of the GC Content Distribution (up)
[Figure: GC content distribution plot. Window size: 300 bp.]
Summary: Full Length (780 bp) | A (26.28%, 205) | C (20.77%, 162) | T (29.23%, 228) | G (23.72%, 185)
Note: The 780 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)
[Figure: GC content distribution plot. Window size: 300 bp.]
Summary: Full Length (2000 bp) | A (23.8%, 476) | C (21.7%, 434) | T (27.9%, 558) | G (26.6%, 532)
Note: The 2000 bp section downstream of Exon 5 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.

BLAT Search Results (up)
QUERY    SCORE START   END QSIZE IDENTITY CHROM STRAND     START       END  SPAN
--------------------------------------------------------------------------------
YourSeq    780     1   780   780   100.0% chr10   -    127029683 127030462   780
YourSeq    167   480   692   780    88.7% chr11   +     97266770  97266967   198
YourSeq    162   493   679   780    93.6% chr8    -     33754934  33755122   189
YourSeq    160   480   683   780    91.9% chr16   -     17696539  17696740   202
YourSeq    160   495   679   780    93.6% chr12   +      3569692   3569878   187
YourSeq    158   494   680   780    93.1% chr12   +     70926461  70926665   205
YourSeq    158   496   678   780    91.2% chr11   +     75529088  75529267   180
YourSeq    157   487   679   780    92.0% chr11   -    115773179 115773372   194
YourSeq    156   496   666   780    96.0% chr16   +     21970222  21970394   173
YourSeq    156   495   686   780    91.1% chr13   +     47053172  47053368   197
YourSeq    155   457   678   780    91.5% chrX    +      7719611   7720025   415
YourSeq    155   498   679   780    92.6% chr11   +    113706884 113707063   180
YourSeq    154   499   683   780    90.1% chr11   -    115349621 115349803   183
YourSeq    154   499   678   780    97.6% chr8    +    114409593 114409773   181
YourSeq    154   496   671   780    94.9% chr11   +    113711359 113711542   184
YourSeq    153   495   679   780    91.9% chr4    -     39569645  39569835   191
YourSeq    153   495   666   780    94.8% chr9    +     99231994  99232167   174
YourSeq    152   504   695   780    91.9% chr5    -    121337798 121337999   202
YourSeq    152   496   684   780    90.5% chr5    -    118301919 118302109   191
YourSeq    152   500   684   780    93.3% chr19   -     32875777  32875966   190
Note: The 780 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)
QUERY    SCORE START   END QSIZE IDENTITY CHROM STRAND     START       END  SPAN
--------------------------------------------------------------------------------
YourSeq   2000     1  2000  2000   100.0% chr10   -    127022936 127024935  2000
YourSeq     94   739   897  2000    81.0% chr18   -     37788435  37788577   143
YourSeq     93   737   898  2000    84.1% chr11   +     31829225  31829373   149
YourSeq     91   752   897  2000    82.1% chr9    +     61106858  61106987   130
YourSeq     88   744   884  2000    81.7% chr17   +     45915386  45915519   134
YourSeq     87   739   884  2000    80.5% chr4    +    155798141 155798279   139
YourSeq     86   739   879  2000    80.4% chr6    -    117822107 117822241   135
YourSeq     86   744   879  2000    86.5% chr3    -     28465062  28465195   134
YourSeq     85   739   872  2000    83.1% chr14   -     88154101  88154228   128
YourSeq     85   751   884  2000    82.9% chr8    +     35711471  35711598   128
YourSeq     84   737   862  2000    82.7% chr9    -     70194486  70194607   122
YourSeq     84   717   862  2000    90.5% chr4    +    117221525 117221671   147
YourSeq     83   737   870  2000    83.7% chr7    +     88547735  88547864   130
YourSeq     80   723   842  2000    90.2% chr8    -      5092012   5092403   392
YourSeq     80   758   884  2000    81.8% chr12   -     79013844  79013961   118
YourSeq     80   739   870  2000    82.3% chrX    +     99881395  99881520   126
YourSeq     80   741   862  2000    91.0% chr16   +     20329234  20329363   130
YourSeq     78   743   866  2000    90.0% chr5    -    129384831 129384954   124
YourSeq     77   739   862  2000    83.9% chr9    +     53642859  53642977   119
YourSeq     76   752   870  2000    84.6% chr9    -     64136382  64136495   114
Note: The 2000 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.

Gene and protein information:
Tsfm Ts translation elongation factor, mitochondrial [Mus musculus (house mouse)]
Gene ID: 66399, updated on 10-Oct-2019

Gene summary
Official Symbol: Tsfm (provided by MGI)
Official Full Name: Ts translation elongation factor, mitochondrial (provided by MGI)
Primary source: MGI:MGI:1913649
See related: Ensembl:ENSMUSG00000040521
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: EF-TS; EF-Tsmt; 2310050B20Rik; 9430024O13Rik
Expression: Ubiquitous expression in adrenal adult (RPKM 24.4), ovary adult (RPKM 16.4) and 28 other tissues
Orthologs: human, all

Genomic context
Location: 10; 10 D3
Exon count: 6

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  10   NC_000076.6 (127022332..127030814, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    10   NC_000076.5 (126459388..126467870, complement)

[Figure: genomic context, Chromosome 10 - NC_000076.6]

Transcript information:
This gene has 7 transcripts
Gene: Tsfm ENSMUSG00000040521
Description: Ts translation elongation factor, mitochondrial [Source:MGI Symbol;Acc:MGI:1913649]
Gene Synonyms: 2310050B20Rik, 9430024O13Rik, EF-TS, EF-Tsmt
Location: Chromosome 10: 127,011,572-127,030,840 reverse strand. GRCm38:CM001003.2
About this gene: This gene has 7 transcripts (splice variants), 181 orthologues, is a member of 1 Ensembl protein family and is associated with 2 phenotypes.

Transcripts
Name      Transcript ID          bp    Protein     Translation ID        Biotype          CCDS       UniProt  Flags
Tsfm-201  ENSMUST00000040560.10  1206  324aa       ENSMUSP00000042134.4  Protein coding   CCDS24222  Q9CZR8   TSL:1, GENCODE basic, APPRIS P1
Tsfm-202  ENSMUST00000120547.1   1271  193aa       ENSMUSP00000113446.1  Protein coding   -          Q9CX33   TSL:1, GENCODE basic
Tsfm-207  ENSMUST00000152054.7   655   206aa       ENSMUSP00000122669.1  Protein coding   -          D3Z4M7   TSL:3, GENCODE basic
Tsfm-203  ENSMUST00000134917.1   695   No protein  -                     Retained intron  -          -        TSL:1
Tsfm-206  ENSMUST00000145476.1   656   No protein  -                     Retained intron  -          -        TSL:1
Tsfm-204  ENSMUST00000138556.1   641   No protein  -                     Retained intron  -          -        TSL:1
Tsfm-205  ENSMUST00000140564.7   833   No protein  -                     lncRNA           -          -        TSL:3

[Figure: Ensembl region view, 39.27 kb, 127.01 Mb to 127.04 Mb. Forward strand: Avil-201, Avil-202, Avil-203, Avil-204 (protein coding). Contig: AC134329.3. Reverse strand: Tsfm-201, Tsfm-202, Tsfm-207 (protein coding); Tsfm-203, Tsfm-204, Tsfm-206 (retained intron); Tsfm-205 (lncRNA); Eef1akmt3-201 (protein coding). Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana), non-protein coding (RNA gene; processed transcript).]

Transcript: ENSMUST00000040560
[Figure: protein domain view of Tsfm-201 (protein coding, reverse strand, 8.51 kb; protein ENSMUSP00000042...; scale bar 0 to 324). Annotation tracks:]
  Low complexity (Seg)
  Superfamily: UBA-like superfamily; Elongation factor Ts, dimerisation domain superfamily
  Pfam: Translation elongation factor EFTs/EF1B, dimerisation
  PROSITE patterns: Translation elongation factor Ts, conserved site (listed twice)
  PANTHER: Translation elongation factor EFTs/EF1B; PTHR11741:SF0
  HAMAP: Translation elongation factor EFTs/EF1B
  Gene3D: 1.10.8.10; Elongation factor Ts, dimerisation domain superfamily
  CDD: cd14275
  Sequence variants (dbSNP and all other sources): stop gained, missense variant, splice region variant, synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
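
The GC content summaries reported for the two flanking regions can be cross-checked from the base counts alone, and the 300 bp windowed profile can be sketched in the same way. This is a minimal illustration, not the method the report used: the function names `gc_percent` and `windowed_gc` and the step size are assumptions, and only the base counts are taken from the report.

```python
def gc_percent(counts):
    """GC percentage from a dict of base counts with keys A, C, G, T."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq, window=300, step=1):
    """GC percentage of each full window along a sequence.

    The 300 bp default mirrors the window size stated in the report;
    the step of 1 is an assumption.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / window)
    return values


# Base counts copied from the report's GC content summaries.
upstream = {"A": 205, "C": 162, "T": 228, "G": 185}     # 780 bp
downstream = {"A": 476, "C": 434, "T": 558, "G": 532}   # 2000 bp

if __name__ == "__main__":
    print(f"upstream GC:   {gc_percent(upstream):.2f}%")    # 44.49%
    print(f"downstream GC: {gc_percent(downstream):.2f}%")  # 48.30%
```

Both regions come out below 50% GC overall, which is consistent with the report's statement that no significant high GC-content region is found; the windowed function would be needed to rule out local peaks.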