https://www.alphaknockout.com

Mouse Pus1 Knockout Project (CRISPR/Cas9)

Objective: To create a Pus1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Pus1 gene (NCBI Reference Sequence: NM_001025561; Ensembl: ENSMUSG00000029507) is located on mouse chromosome 5. Seven exons have been identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 7 (Transcript: ENSMUST00000086643). Exons 1 to 7 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a knock-out allele exhibit slow postnatal weight gain, impaired exercise endurance, and alterations in muscle metabolism related to mitochondrial content and oxidative capacity.

Exon 1 starts at about 0.08% of the coding region. Exons 1 to 7 cover 100.0% of the coding region. The size of the effective KO region is ~6383 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3' with exons 1 to 7 of mouse Pus1 and two gRNA regions marked. Legend: exon of mouse Pus1; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the upstream sequence against itself, forward and reverse complement.]

Note: The 2000 bp section upstream of start codon is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
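The self-alignment described in the note can be sketched as a word-match dot plot. The following is a minimal illustration, not the tool used to produce the figure: the function names are hypothetical, and exact matching of 15 bp words is an assumption based on the stated window size. Off-diagonal matches in the forward comparison suggest direct (tandem) repeats, while matches against the reverse complement suggest inverted repeats.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dot_plot_matches(seq_a, seq_b, window=15):
    """Return (i, j) pairs where seq_a[i:i+window] equals seq_b[j:j+window]."""
    # Index every window-length word of seq_b by its start position.
    index = defaultdict(list)
    for j in range(len(seq_b) - window + 1):
        index[seq_b[j:j + window]].append(j)
    matches = []
    for i in range(len(seq_a) - window + 1):
        for j in index.get(seq_a[i:i + window], []):
            matches.append((i, j))
    return matches


def off_diagonal_matches(seq, window=15):
    """Self-comparison hits away from the main diagonal (candidate repeats)."""
    return [(i, j) for i, j in dot_plot_matches(seq, seq, window) if i != j]


# Toy example (not the Pus1 flank): a 20 bp unit repeated twice in tandem.
unit = "ACGTTGCAAGCTTAGGCATC"
print(off_diagonal_matches(unit))          # no repeats within a single unit
print(off_diagonal_matches(unit * 2)[:3])  # hits offset by the 20 bp unit length
```

For the reverse-complement panel, the same helper could be called as `dot_plot_matches(seq, reverse_complement(seq))`.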

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the downstream sequence against itself, forward and reverse complement.]

Note: The 2000 bp section downstream of stop codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the upstream sequence.]

Summary: Full Length(2000bp) | A(22.35% 447) | C(25.45% 509) | T(23.75% 475) | G(28.45% 569)

Note: The 2000 bp section upstream of start codon is analyzed to determine the GC content. Significant high GC-content regions are found. The gRNA site is selected outside of these high GC-content regions.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the downstream sequence.]

Summary: Full Length(2000bp) | A(21.7% 434) | C(22.7% 454) | T(32.2% 644) | G(23.4% 468)

Note: The 2000 bp section downstream of stop codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
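The base counts in the two summaries are enough to work out the overall GC content of each flank, and a windowed profile can be computed along the same lines. This is a minimal sketch, not the tool used for the plots: the function names are hypothetical, and advancing the 300 bp window by 1 bp at a time is an assumption.

```python
def gc_percent_from_counts(a, c, t, g):
    """Overall GC percentage from per-base counts."""
    return 100.0 * (g + c) / (a + c + t + g)


def sliding_gc(seq, window=300):
    """GC percentage of every full window, advancing 1 bp at a time."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    gc = sum(base in "GC" for base in seq[:window])
    profile = [100.0 * gc / window]
    for i in range(window, len(seq)):
        # Slide by one base: add the entering base, drop the leaving one.
        gc += (seq[i] in "GC") - (seq[i - window] in "GC")
        profile.append(100.0 * gc / window)
    return profile


# Base counts copied from the two Summary lines of this report.
upstream = gc_percent_from_counts(a=447, c=509, t=475, g=569)
downstream = gc_percent_from_counts(a=434, c=454, t=644, g=468)
print(f"upstream GC: {upstream:.1f}%")      # 53.9%
print(f"downstream GC: {downstream:.1f}%")  # 46.1%
```

The higher overall value upstream (53.9% versus 46.1%) is consistent with the notes, which report high GC-content regions only in the upstream section.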


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr5   -       110780320  110782319  2000
YourSeq  65     1105   1265  2000   87.5%     chr2   -       6152289    6152547    259
YourSeq  64     30     163   2000   91.1%     chr16  +       42042849   42043006   158
YourSeq  59     45     160   2000   75.7%     chr4   -       111670295  111670410  116
YourSeq  56     62     170   2000   75.5%     chr2   +       124279757  124279864  108
YourSeq  48     1075   1152  2000   76.6%     chr11  +       96031566   96031633   68
YourSeq  47     49     166   2000   76.7%     chr17  +       13783707   13783822   116
YourSeq  46     45     163   2000   85.8%     chr10  +       95463815   95463931   117
YourSeq  44     938    1147  2000   94.0%     chr16  +       13628621   13629114   494
YourSeq  42     48     157   2000   87.0%     chr12  -       103254822  103254928  107
YourSeq  42     28     94    2000   88.9%     chr15  +       3233743    3233839    97
YourSeq  41     1111   1159  2000   91.9%     chr17  -       84970863   84970911   49
YourSeq  41     77     147   2000   91.2%     chr6   +       137093445  137093514  70
YourSeq  40     1100   1151  2000   88.5%     chr7   -       49517134   49517185   52
YourSeq  40     1192   1276  2000   93.5%     chr19  +       53633703   53633790   88
YourSeq  40     43     147   2000   87.5%     chr16  +       35914486   35914589   104
YourSeq  39     1109   1159  2000   88.3%     chr15  -       79544304   79544354   51
YourSeq  39     999    1227  2000   63.3%     chr10  -       112362576  112362723  148
YourSeq  39     1120   1163  2000   95.5%     chr1   -       59319544   59319589   46
YourSeq  39     45     98    2000   88.9%     chr13  +       58665116   58665168   53

Note: The 2000 bp section upstream of start codon is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr5   -       110771935  110773934  2000
YourSeq  316    614    1928  2000   93.2%     chr8   -       69609895   70144378   534484
YourSeq  263    602    1896  2000   95.2%     chr9   +       53530951   53651973   121023
YourSeq  156    593    758   2000   97.0%     chr5   -       137479019  137479184  166
YourSeq  155    593    759   2000   96.5%     chr5   -       90190011   90190177   167
YourSeq  155    593    758   2000   97.0%     chr16  -       4284326    4284492    167
YourSeq  155    592    758   2000   96.5%     chr10  -       71317150   71317316   167
YourSeq  154    592    758   2000   96.5%     chr5   -       107948647  107948835  189
YourSeq  154    1589   1935  2000   89.6%     chr9   +       73091051   73091637   587
YourSeq  153    593    758   2000   96.4%     chr19  -       24840625   24840792   168
YourSeq  153    594    759   2000   96.4%     chr2   +       11605492   11605658   167
YourSeq  153    1442   1923  2000   85.3%     chr10  +       66926456   66926867   412
YourSeq  152    593    758   2000   96.4%     chr19  -       23074436   23074610   175
YourSeq  152    598    759   2000   97.0%     chr17  +       25929182   25929343   162
YourSeq  152    592    758   2000   94.0%     chr10  +       11393009   11393173   165
YourSeq  151    598    758   2000   96.9%     chr17  -       3192180    3192340    161
YourSeq  151    603    1088  2000   93.2%     chr11  -       3173327    3173955    629
YourSeq  150    593    759   2000   95.3%     chr9   +       110317110  110317661  552
YourSeq  149    612    1088  2000   84.9%     chrX   -       103417603  103417825  223
YourSeq  149    593    757   2000   95.8%     chr2   -       131806140  131806342  203

Note: The 2000 bp section downstream of stop codon is BLAT searched against the genome. No significant similarity is found.
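Rows of the BLAT summary can be parsed and screened programmatically. The sketch below is only an illustration: `BlatHit`, `parse_blat_row` and `secondary_hits` are hypothetical names, and the score cutoff is an arbitrary value chosen for the example, not the criterion this report uses to judge significance. The three sample rows are copied from the downstream table.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(row):
    """Parse one row of the BLAT summary table into a BlatHit."""
    fields = row.split()
    # Rows copied from the results page may start with the "browser details" link text.
    if fields[:2] == ["browser", "details"]:
        fields = fields[2:]
    (_query, score, q_start, q_end, q_size, identity,
     chrom, strand, t_start, t_end, span) = fields
    return BlatHit(int(score), int(q_start), int(q_end), int(q_size),
                   float(identity.rstrip("%")), chrom, strand,
                   int(t_start), int(t_end), int(span))


def secondary_hits(hits, min_score=100):
    """Hits other than the top-scoring one whose score is at least min_score."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    return [h for h in ranked[1:] if h.score >= min_score]


rows = [
    "browser details YourSeq 2000 1 2000 2000 100.0% chr5 - 110771935 110773934 2000",
    "browser details YourSeq 316 614 1928 2000 93.2% chr8 - 69609895 70144378 534484",
    "browser details YourSeq 149 593 757 2000 95.8% chr2 - 131806140 131806342 203",
]
hits = [parse_blat_row(r) for r in rows]
for hit in secondary_hits(hits, min_score=200):
    print(hit.chrom, hit.score, hit.identity)
```

In these rows the SPAN column equals the target END minus START plus 1, which gives a quick consistency check on a parsed table.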


Gene and protein information: Pus1 pseudouridine synthase 1 [Mus musculus (house mouse)]

Gene ID: 56361, updated on 10-Oct-2019

Gene summary

Official Symbol: Pus1 (provided by MGI)
Official Full Name: pseudouridine synthase 1 (provided by MGI)
Primary source: MGI:MGI:1929237
See related: Ensembl:ENSMUSG00000029507
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: MPUS1; mPus1p; A730013B20Rik
Expression: Ubiquitous expression in large intestine adult (RPKM 13.1), liver E14 (RPKM 13.0) and 28 other tissues

Genomic context

Location: 5; 5 F
Exon count: 7

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  5    NC_000071.6 (110773667..110780649, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    5    NC_000071.5 (111202686..111209634, complement)

[Figure: genomic context of Pus1 on chromosome 5 - NC_000071.6.]


Transcript information: This gene has 7 transcripts

Gene: Pus1 ENSMUSG00000029507

Description: pseudouridine synthase 1 [Source:MGI Symbol;Acc:MGI:1929237]
Gene Synonyms: A730013B20Rik, MPUS1, mPus1p
Location: Chromosome 5: 110,773,667-110,780,659 reverse strand. GRCm38:CM000998.2
About this gene: This gene has 7 transcripts (splice variants), 201 orthologues, 2 paralogues, is a member of 1 Ensembl protein family and is associated with 6 phenotypes.

Transcripts

Name      Transcript ID          bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Pus1-203  ENSMUST00000086643.11  1868  441aa    ENSMUSP00000083844.5  Protein coding  CCDS19530  H7BX59   TSL:1, GENCODE basic, APPRIS P4
Pus1-202  ENSMUST00000031483.14  1814  423aa    ENSMUSP00000031483.8  Protein coding  CCDS19531  Q9WU56   TSL:1, GENCODE basic, APPRIS ALT2
Pus1-201  ENSMUST00000031481.12  1563  393aa    ENSMUSP00000031481.6  Protein coding  CCDS39212  Q9WU56   TSL:1, GENCODE basic, APPRIS ALT2
Pus1-207  ENSMUST00000170468.7   1510  393aa    ENSMUSP00000130814.1  Protein coding  CCDS39212  Q9WU56   TSL:5, GENCODE basic, APPRIS ALT2
Pus1-204  ENSMUST00000112426.7   1389  347aa    ENSMUSP00000108045.1  Protein coding  CCDS84929  Q9WU56   TSL:1, GENCODE basic
Pus1-205  ENSMUST00000136483.7   701   147aa    ENSMUSP00000115143.1  Protein coding  -          D3YWU8   CDS 3' incomplete, TSL:3
Pus1-206  ENSMUST00000149208.1   527   162aa    ENSMUSP00000115468.1  Protein coding  -          D3Z092   CDS 3' incomplete, TSL:3


[Figure: Ensembl region view of chromosome 5, 110.77 Mb to 110.79 Mb (26.99 kb), forward and reverse strands. Tracks: contig AC161348.2; genes (Comprehensive set); Regulatory Build. Transcripts shown: Gm15559-201 (lncRNA, forward strand); Ep400-201, -202, -203, -204, -210 (protein coding) and Ep400-211 (lncRNA); Pus1-201 to Pus1-207 (protein coding); Ulk1-201 and Ulk1-211 (protein coding), Ulk1-202 and Ulk1-207 (nonsense mediated decay), Ulk1-205, Ulk1-208 and Ulk1-209 (retained intron). Regulation legend: CTCF, open chromatin, promoter, promoter flank. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000086643

[Figure: protein domain view of Pus1-203 (protein coding, reverse strand, 6.93 kb), protein ENSMUSP00000083..., scale bar 0 to 441. Tracks: MobiDB lite; Superfamily; Pfam; PANTHER; Gene3D; CDD; sequence variants (dbSNP and all other sources). Variant legend: missense variant, synonymous variant.]

Domain annotations shown in the figure:
Superfamily: Pseudouridine synthase, catalytic domain superfamily
Pfam: Pseudouridine synthase I, TruA, alpha/beta domain
PANTHER: Pseudouridine synthase I, TruA; PTHR11142:SF4
Gene3D: 3.30.70.580; Pseudouridine synthase I, TruA, C-terminal
CDD: Pseudouridine synthase PUS1/PUS2-like

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
