https://www.alphaknockout.com

Mouse Osgep Knockout Project (CRISPR/Cas9)

Objective: To create an Osgep knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Osgep gene (NCBI Reference Sequence: NM_133676; Ensembl: ENSMUSG00000006289) is located on mouse chromosome 14. Eleven exons have been identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 11 (transcript: ENSMUST00000159292). Exons 2 to 11 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice, and the pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 11.54% of the coding region, and exons 2 to 11 cover 88.56% of the coding region. The effective KO region is ~4186 bp and contains no other known gene.
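The two percentages are tied together by the length of the coding sequence. A minimal sketch of that arithmetic, assuming the percentages are taken over the 1005 nt coding sequence of the 335 aa product (335 x 3, stop codon excluded) and that exon 2 begins at CDS nucleotide 116. That position is back-calculated from the 11.54% figure rather than stated in the report, so treat it as hypothetical; the function name is also illustrative.

```python
def cds_percentages(cds_length: int, exon2_start: int) -> tuple[float, float]:
    """Return (percent position where exon 2 starts, percent of the CDS in exons 2 to last).

    cds_length  -- coding-sequence length in nucleotides (stop codon excluded here).
    exon2_start -- hypothetical 1-based CDS coordinate of the first coding base of exon 2.
    """
    start_pct = exon2_start / cds_length * 100
    # Exon 1 holds exon2_start - 1 coding bases; the remainder lies in exons 2 to the last exon.
    covered_pct = (cds_length - (exon2_start - 1)) / cds_length * 100
    return round(start_pct, 2), round(covered_pct, 2)


# Hypothetical inputs: 335 codons * 3 = 1005 nt; exon 2 start inferred, not reported.
print(cds_percentages(1005, 116))  # (11.54, 88.56)
```

Under these assumptions both reported figures are reproduced, which is consistent with the stop codon lying in the last targeted exon (exon 11).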


Overview of the Targeting Strategy

[Figure: wildtype Osgep allele (5' to 3') with exons 1 to 11 and two gRNA regions. Legend: exon of mouse Osgep; knockout region.]
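The report does not list the gRNA sequences. As an illustration only, the sketch below enumerates candidate sites the way SpCas9 guides are commonly chosen: a 20 nt protospacer immediately 5' of an NGG PAM, on either strand. The use of Streptococcus pyogenes Cas9, the spacer length and the function names are assumptions; real guide design also scores off-target matches, which this sketch does not do.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """List (strand, 0-based start on the forward sequence, protospacer) for every NGG PAM.

    Assumes SpCas9: the 5'-NGG-3' PAM sits directly 3' of the protospacer.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    # '+' strand: protospacer seq[i:i+L], PAM seq[i+L:i+L+3] must read NGG.
    for i in range(n - spacer_len - 2):
        pam = seq[i + spacer_len:i + spacer_len + 3]
        if pam[1:] == "GG":
            hits.append(("+", i, seq[i:i + spacer_len]))
    # '-' strand: scan the reverse complement, then map the start back to forward coordinates.
    rc = reverse_complement(seq)
    for j in range(n - spacer_len - 2):
        pam = rc[j + spacer_len:j + spacer_len + 3]
        if pam[1:] == "GG":
            # rc[j:j+L] covers forward positions n-j-L .. n-j-1
            hits.append(("-", n - j - spacer_len, rc[j:j + spacer_len]))
    return hits


print(find_spcas9_sites("A" * 20 + "TGG"))  # [('+', 0, 'AAAAAAAAAAAAAAAAAAAA')]
```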


Overview of the Dot Plot (up)

[Figure: dot plot of the upstream section aligned with itself, forward and reverse-complement panels; window size: 15 bp]

Note: The 2000 bp section upstream of exon 2 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
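A dot plot marks every pair of positions whose surrounding windows match. A minimal sketch of the comparison, assuming exact matches over the window (the report gives the 15 bp window size but not its mismatch tolerance). The forward panel compares the section with itself; the reverse-complement panel compares it with its own reverse complement. Plotting the returned points is left to any plotting library; the function names are illustrative.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def dot_plot(seq_a: str, seq_b: str, window: int = 15) -> set[tuple[int, int]]:
    """Return every (i, j) where the window starting at seq_a[i] exactly equals the one at seq_b[j]."""
    # Index each window of seq_b by its sequence so each lookup is a dictionary hit.
    index: dict[str, list[int]] = {}
    for j in range(len(seq_b) - window + 1):
        index.setdefault(seq_b[j:j + window], []).append(j)
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in index.get(seq_a[i:i + window], []):
            dots.add((i, j))
    return dots


def self_dot_plots(seq: str, window: int = 15):
    """Forward panel (seq vs itself) and reverse-complement panel (seq vs its reverse complement)."""
    seq = seq.upper()
    forward = dot_plot(seq, seq, window)
    # In this panel j indexes the reverse-complemented sequence, not the original one.
    reverse = dot_plot(seq, reverse_complement(seq), window)
    return forward, reverse


# Toy sequence with a 4 bp tandem repeat; the window is shortened to 4 for illustration.
fwd, rev = self_dot_plots("ACGTACGTACTTGA", window=4)
print(sorted(d for d in fwd if d[0] != d[1]))  # [(0, 4), (1, 5), (2, 6), (4, 0), (5, 1), (6, 2)]
```

Every sequence produces the main diagonal; only dots off that diagonal carry information about repeats.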

Overview of the Dot Plot (down)

[Figure: dot plot of the downstream section aligned with itself, forward and reverse-complement panels; window size: 15 bp]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
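In a self dot plot, a tandem repeat shows up as diagonals running parallel to the main diagonal, offset by the repeat period. A minimal sketch that tallies dots per offset, assuming exact 15 bp window matches; the `min_dots` cutoff is a hypothetical threshold, not the criterion the report used to judge "no significant tandem repeat".

```python
from collections import Counter


def diagonal_offsets(seq: str, window: int = 15) -> Counter:
    """Count exact window matches at each positive offset d = j - i of a forward self dot plot."""
    seq = seq.upper()
    positions: dict[str, list[int]] = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)
    counts: Counter = Counter()
    for starts in positions.values():
        # Every pair of occurrences of the same window is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                counts[starts[b] - starts[a]] += 1
    return counts


def flag_tandem_repeats(seq: str, window: int = 15, min_dots: int = 20) -> list[int]:
    """Offsets (candidate repeat periods and their multiples) with at least min_dots dots."""
    return sorted(d for d, n in diagonal_offsets(seq, window).items() if n >= min_dots)


# Ten copies of an 8 bp unit: dots pile up at offset 8 and its multiples.
print(flag_tandem_repeats("ACGTTGCA" * 10))  # [8, 16, 24, 32, 40]
```

A section with no repeated 15-mers, which is what the note above reports, returns an empty list.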


Overview of the GC Content Distribution (up)

[Figure: GC content along the upstream section; window size: 300 bp]

Summary: Full Length(2000bp) | A(28.65% 573) | C(21.35% 427) | T(28.6% 572) | G(21.4% 428)

Note: The 2000 bp section upstream of exon 2 is analyzed to determine its GC content. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.
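The Summary line is plain arithmetic on base counts: each percentage is the count divided by the 2000 bp length. A small sketch that reproduces the line for any sequence and also reports overall GC content (G + C), which the summary leaves implicit. The function name and the synthetic test string are illustrative.

```python
from collections import Counter


def composition_summary(seq: str) -> str:
    """Format base counts like the report's Summary line, then append the overall GC content."""
    seq = seq.upper()
    n = len(seq)
    counts = Counter(seq)
    parts = [f"Full Length({n}bp)"]
    for base in "ACTG":  # same base order as the report
        parts.append(f"{base}({counts[base] / n * 100:g}% {counts[base]})")
    gc_percent = (counts["G"] + counts["C"]) / n * 100
    parts.append(f"GC({gc_percent:g}%)")
    return " | ".join(parts)


# A synthetic 2000 bp string with the same base counts as the upstream section.
upstream_like = "A" * 573 + "C" * 427 + "T" * 572 + "G" * 428
print(composition_summary(upstream_like))
# Full Length(2000bp) | A(28.65% 573) | C(21.35% 427) | T(28.6% 572) | G(21.4% 428) | GC(42.75%)
```

Applied to the reported counts, G + C is 855 of 2000 bases (42.75%) upstream and 942 of 2000 bases (47.1%) downstream.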

Overview of the GC Content Distribution (down)

[Figure: GC content along the downstream section; window size: 300 bp]

Summary: Full Length(2000bp) | A(24.3% 486) | C(28.75% 575) | T(28.6% 572) | G(18.35% 367)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine its GC content. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.
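Each point of a GC distribution plot is the GC fraction of one 300 bp window. A minimal sliding-window sketch with a 1 bp step, plus a flag for GC-rich windows. The step size and the 60% cutoff are assumptions, since the report does not state the criterion behind "no significant high-GC region".

```python
def sliding_gc(seq: str, window: int = 300) -> list[tuple[int, float]]:
    """(0-based start, GC fraction) for every full window, stepping one base at a time."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in seq]
    gc = sum(is_gc[:window])
    result = [(0, gc / window)]
    for start in range(1, len(seq) - window + 1):
        # Running sum: drop the base that leaves the window, add the one that enters it.
        gc += is_gc[start + window - 1] - is_gc[start - 1]
        result.append((start, gc / window))
    return result


def high_gc_windows(seq: str, window: int = 300, cutoff: float = 0.6) -> list[int]:
    """Starts of windows whose GC fraction exceeds cutoff (hypothetical 60% threshold)."""
    return [start for start, frac in sliding_gc(seq, window) if frac > cutoff]


# Toy input: 400 bp of AT followed by 400 bp of GC.
flagged = high_gc_windows("AT" * 200 + "GC" * 200)
print(flagged[0], flagged[-1], len(flagged))  # 281 500 220
```

The running sum keeps the scan linear in sequence length instead of recounting every window.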


BLAT Search Results (up)

QSTART/QEND: aligned positions within the 2000 bp query; TSTART/TEND: genomic coordinates of the hit; SPAN: genomic length of the hit.

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr14  -       50920050   50922049   2000
YourSeq  259    1       562   2000   86.4%     chr14  -       124546683  124547406  724
YourSeq  207    1       295   2000   89.4%     chr2   -       64664445   64664753   309
YourSeq  207    1       302   2000   91.0%     chr18  +       68576431   68576763   333
YourSeq  204    4       302   2000   88.8%     chr19  -       50085425   50085758   334
YourSeq  204    1       278   2000   92.2%     chrX   +       112208827  112614184  405358
YourSeq  201    1       302   2000   87.7%     chr5   +       22883923   22884242   320
YourSeq  200    1       290   2000   89.8%     chr3   -       87598768   87599075   308
YourSeq  199    5       294   2000   87.9%     chr16  +       57791739   57792045   307
YourSeq  196    7       290   2000   89.2%     chrX   -       150927432  150927729  298
YourSeq  192    1       290   2000   87.6%     chrX   -       79256090   79256396   307
YourSeq  190    1       278   2000   89.4%     chr2   -       131788349  131788903  555
YourSeq  187    5       278   2000   88.5%     chrX   +       76920226   76920512   287
YourSeq  186    1       273   2000   88.5%     chr2   -       17791546   17791833   288
YourSeq  186    32      278   2000   90.9%     chr19  -       5216606    5216866    261
YourSeq  186    13      276   2000   91.2%     chr17  -       65991517   65991795   279
YourSeq  185    7       302   2000   88.2%     chrX   -       70229726   70230053   328
YourSeq  185    1       275   2000   88.1%     chr6   +       9488438    9488725    288
YourSeq  182    18      374   2000   91.8%     chr18  -       44901495   44902518   1024
YourSeq  181    1       257   2000   88.3%     chr12  -       9417388    9417660    273

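Note: The 2000 bp section upstream of exon 2 is BLAT-searched against the genome. Apart from the query's own locus on chr14 (score 2000, 100.0% identity), every hit lies within query positions 1 to 562 at 86.4% to 92.2% identity, so no significant similarity is found.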
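The screening step behind this note can be written down explicitly: drop the self-hit, then check how much of the query each remaining hit covers and at what identity. A minimal sketch using four rows transcribed from the upstream table; the `BlatHit` class, the function name and both cutoffs are assumptions, not the criteria used in the report.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int      # first aligned query base (1-based)
    q_end: int        # last aligned query base
    q_size: int
    identity: float   # percent identity
    chrom: str
    strand: str
    t_start: int      # genomic start of the hit
    t_end: int


def off_target_hits(hits, self_chrom, self_start, min_coverage=0.5, min_identity=90.0):
    """Non-self hits whose query coverage and identity both reach the (hypothetical) cutoffs."""
    flagged = []
    for hit in hits:
        if hit.chrom == self_chrom and hit.t_start == self_start:
            continue  # the query's own locus
        coverage = (hit.q_end - hit.q_start + 1) / hit.q_size
        if coverage >= min_coverage and hit.identity >= min_identity:
            flagged.append(hit)
    return flagged


# First four rows of the upstream table.
hits = [
    BlatHit(2000, 1, 2000, 2000, 100.0, "chr14", "-", 50920050, 50922049),
    BlatHit(259, 1, 562, 2000, 86.4, "chr14", "-", 124546683, 124547406),
    BlatHit(207, 1, 295, 2000, 89.4, "chr2", "-", 64664445, 64664753),
    BlatHit(207, 1, 302, 2000, 91.0, "chr18", "+", 68576431, 68576763),
]
print(off_target_hits(hits, "chr14", 50920050))  # []
```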

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr14  -       50913862   50915861   2000
YourSeq  350    1646    1997  2000   100.0%    chr5   +       96672143   96672499   357
YourSeq  349    1649    2000  2000   99.8%     chr14  -       54613734   54614087   354
YourSeq  348    1650    1998  2000   100.0%    chr6   -       49971163   49971515   353
YourSeq  348    1650    2000  2000   99.8%     chr14  -       11191556   11191910   355
YourSeq  348    1650    1998  2000   100.0%    chr2   +       117206877  117207229  353
YourSeq  347    1650    1999  2000   99.8%     chrX   -       106719700  106720055  356
YourSeq  347    1650    1997  2000   100.0%    chr5   -       37600086   37600437   352
YourSeq  347    1650    1997  2000   100.0%    chr4   -       94950407   94950758   352
YourSeq  346    1650    2000  2000   99.5%     chr7   -       29930095   29930449   355
YourSeq  346    1649    1997  2000   99.8%     chr4   +       135802141  135802493  353
YourSeq  346    1648    2000  2000   99.2%     chr19  +       32462188   32462544   357
YourSeq  346    1649    1997  2000   99.8%     chr16  +       72814009   72814361   353
YourSeq  346    1650    1998  2000   99.8%     chr15  +       10498310   10498662   353
YourSeq  345    1650    1997  2000   99.8%     chr7   -       73485803   73486154   352
YourSeq  345    1650    1997  2000   99.8%     chr6   -       95779275   95779626   352
YourSeq  345    1650    1998  2000   99.5%     chr11  -       14715631   14715979   349
YourSeq  345    1650    1997  2000   99.8%     chr10  -       114920935  114921286  352
YourSeq  345    1650    1997  2000   99.8%     chr10  -       107657584  107657935  352
YourSeq  345    1650    1997  2000   99.8%     chr2   +       162179525  162179874  350

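Note: The 2000 bp section downstream of the stop codon is BLAT-searched against the genome. Apart from the query's own locus on chr14, every hit aligns only the 3' end of the query (positions 1646 to 2000) at 99.2% to 100.0% identity; no significant similarity is found across the rest of the section.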
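Every non-self hit in the downstream table falls in the same stretch at the 3' end of the query, which is the pattern an interspersed repeat would produce (an inference from the table, not a statement in the report). Merging the query intervals of those hits makes the affected span explicit, so PCR primers can be placed outside it. A minimal sketch; the function name is illustrative.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based, inclusive (start, end) intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])              # start a new block
    return [tuple(block) for block in merged]


# Query QSTART/QEND of the 19 non-self hits in the downstream table, in table order.
down_query_spans = [(1646, 1997), (1649, 2000), (1650, 1998), (1650, 2000), (1650, 1998),
                    (1650, 1999), (1650, 1997), (1650, 1997), (1650, 2000), (1649, 1997),
                    (1648, 2000), (1649, 1997), (1650, 1998), (1650, 1997), (1650, 1997),
                    (1650, 1998), (1650, 1997), (1650, 1997), (1650, 1997)]
blocks = merge_intervals(down_query_spans)
covered = sum(end - start + 1 for start, end in blocks)
print(blocks, covered)  # [(1646, 2000)] 355
```

Positions 1 to 1645 of the downstream section carry no off-target hit in this table.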


Gene and protein information: Osgep, O-sialoglycoprotein endopeptidase [Mus musculus (house mouse)], Gene ID: 66246, updated on 12-Aug-2019

Gene summary

Official Symbol: Osgep (provided by MGI)
Official Full Name: O-sialoglycoprotein endopeptidase (provided by MGI)
Primary source: MGI:MGI:1913496
Related: Ensembl:ENSMUSG00000006289
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: GCPL-1; PRSMG1; 1500019L24Rik
Expression: Ubiquitous expression in ovary adult (RPKM 44.1), adrenal adult (RPKM 34.5) and 28 other tissues

Genomic context

Location: 14; 14 C1
Exon count: 11

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  14   NC_000080.6 (50915374..50924893, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    14   NC_000080.5 (51535049..51544568, complement)
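NCBI gives locations as 1-based, closed intervals, and "complement" marks the reverse strand. A small sketch of the arithmetic: both assemblies give the same 9520 bp span, and the same interval written in 0-based, half-open BED form starts one base earlier. The UCSC-style name "chr14" for NC_000080.6 and the function names are assumptions for illustration.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, closed interval such as NCBI's start..end notation."""
    return end - start + 1


def to_bed(chrom: str, start: int, end: int, complement: bool) -> tuple[str, int, int, str]:
    """Convert 1-based closed coordinates to 0-based half-open BED fields plus a strand."""
    return chrom, start - 1, end, "-" if complement else "+"


grcm38 = span_length(50915374, 50924893)
mgscv37 = span_length(51535049, 51544568)
print(grcm38, mgscv37)                                       # 9520 9520
print(to_bed("chr14", 50915374, 50924893, complement=True))  # ('chr14', 50915373, 50924893, '-')
```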

[Figure: Chromosome 14, NC_000080.6]


Transcript information: This gene has 8 transcripts

Gene: Osgep ENSMUSG00000006289

Description: O-sialoglycoprotein endopeptidase [Source:MGI Symbol;Acc:MGI:1913496]
Gene synonyms: 1500019L24Rik, GCPL-1, PRSMG1
Location: chromosome 14: 50,906,478-50,924,893, reverse strand (GRCm38:CM001007.2)
About this gene: This gene has 8 transcripts (splice variants), 188 orthologues and 1 paralogue, and is a member of 1 Ensembl protein family.

Name       Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Osgep-202  ENSMUST00000159292.7   3388  335aa       ENSMUSP00000124039.1  Protein coding           CCDS27026  A0A0R4J1Y3  TSL:1, GENCODE basic, APPRIS P1
Osgep-207  ENSMUST00000162177.7   1219  254aa       ENSMUSP00000124016.1  Protein coding           -          E0CYN9      TSL:1, GENCODE basic
Osgep-203  ENSMUST00000160375.7   713   156aa       ENSMUSP00000124099.1  Protein coding           -          E0CYK9      CDS 3' incomplete, TSL:5
Osgep-204  ENSMUST00000160393.7   2004  335aa       ENSMUSP00000125155.1  Nonsense mediated decay  -          A0A0R4J1Y3  TSL:1
Osgep-206  ENSMUST00000160890.7   1812  80aa        ENSMUSP00000124659.1  Nonsense mediated decay  -          E0CXW7      TSL:2
Osgep-201  ENSMUST00000006452.12  1308  186aa       ENSMUSP00000006452.6  Nonsense mediated decay  -          E9QMF4      TSL:5
Osgep-205  ENSMUST00000160464.1   839   No protein  -                     Retained intron          -          -           TSL:2
Osgep-208  ENSMUST00000162850.1   586   No protein  -                     Retained intron          -          -           TSL:2
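The report targets ENSMUST00000159292 (Osgep-202), which in this table is the only protein-coding transcript carrying APPRIS P1, TSL:1 and a CCDS ID. The report does not state its selection rule, so the sketch below encodes one plausible rule as an assumption, using three rows transcribed from the table, and converts the 335 aa product to its coding-sequence length including the stop codon. The class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Transcript:
    name: str
    transcript_id: str
    length_bp: int
    protein_aa: Optional[int]  # None when the transcript has no protein
    biotype: str
    flags: set[str] = field(default_factory=set)


def cds_length_nt(protein_aa: int) -> int:
    """Coding-sequence length including the stop codon: 3 nt per residue plus 3 for the stop."""
    return protein_aa * 3 + 3


def pick_reference(transcripts):
    """Assumed rule: protein coding, flagged APPRIS P1 and TSL:1; ties broken by length."""
    candidates = [t for t in transcripts
                  if t.biotype == "Protein coding" and {"APPRIS P1", "TSL:1"} <= t.flags]
    return max(candidates, key=lambda t: t.length_bp, default=None)


transcripts = [
    Transcript("Osgep-202", "ENSMUST00000159292.7", 3388, 335, "Protein coding",
               {"TSL:1", "GENCODE basic", "APPRIS P1"}),
    Transcript("Osgep-207", "ENSMUST00000162177.7", 1219, 254, "Protein coding",
               {"TSL:1", "GENCODE basic"}),
    Transcript("Osgep-204", "ENSMUST00000160393.7", 2004, 335, "Nonsense mediated decay",
               {"TSL:1"}),
]
ref = pick_reference(transcripts)
print(ref.transcript_id, cds_length_nt(ref.protein_aa))  # ENSMUST00000159292.7 1008
```

Osgep-204 encodes the same 335 aa UniProt entry (A0A0R4J1Y3) but is excluded here by its nonsense-mediated decay biotype.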


[Figure: Ensembl view of the 38.42 kb region around Osgep on chromosome 14 (50.90 Mb to 50.93 Mb). Forward strand: Gm24689-201 (snRNA); Apex1-201, Apex1-202, Apex1-203 and Apex1-204 (protein coding); Pnp-203 (protein coding). Reverse strand: Klhl33-202 to Klhl33-204, Osgep-201 to Osgep-208, Pip4p1-201 to Pip4p1-211. Contigs: AC027184.15, AC136376.3. Regulatory Build legend: CTCF, open chromatin, promoter, promoter flank.]


Transcript: ENSMUST00000159292

[Figure: Osgep-202 (protein coding, reverse strand, 11.30 kb) and its product ENSMUSP00000124..., with protein domain and variant tracks on a 0 to 335 aa scale. Domain annotations: TIGRFAM Kae1/TsaD family; Superfamily SSF53067; Prints Kae1/TsaD family; Pfam Gcp-like domain; PROSITE patterns Peptidase M22, conserved site; PANTHER tRNA N6-adenosine threonylcarbamoyltransferase Kae1/OSGEP; HAMAP tRNA N6-adenosine threonylcarbamoyltransferase Kae1/OSGEP; Gene3D 3.30.420.40. Variant track: sequence variants (dbSNP and all other sources), including missense, splice region and synonymous variants.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
