https://www.alphaknockout.com

Mouse Osgep Conditional Knockout Project (CRISPR/Cas9)

Objective: To create an Osgep conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The mouse Osgep gene (NCBI Reference Sequence: NM_133676; Ensembl: ENSMUSG00000006289) is located on mouse chromosome 14. Eleven exons have been identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 11 (Transcript: ENSMUST00000159292). Exons 4-11 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Osgep gene. To engineer the targeting vector, the homology arms and the cKO region will be generated by PCR using BAC clone RP23-393L19 as the template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exons 4-11 cover 59.1% of the coding region. The start codon is in exon 1 and the stop codon is in exon 11. Intron 3, where the 5' loxP site will be inserted, is 1684 bp long. The effective cKO region is ~2343 bp. The cKO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: schematic of the targeting strategy showing the wildtype allele (exons 1-11, with the 5' and 3' gRNA regions), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: exon of mouse Osgep, homology arm, cKO region, loxP site.]
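The constitutive KO allele arises because Cre recombinase excises the DNA between two loxP sites in the same orientation, leaving a single loxP site behind. The following is a minimal Python sketch of that end product on a plain string, assuming the canonical 34 bp loxP sequence and toy placeholder strings in place of real exon sequence; `cre_excise` is a hypothetical helper that models only the outcome, not the recombination mechanism.

```python
# Canonical 34 bp loxP site: 13 bp inverted repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Return the allele after excision between the first two directly repeated sites.

    Hypothetical helper: it only models the product of Cre-mediated excision
    (the floxed segment plus one copy of the site is removed).
    """
    first = allele.find(site)
    if first == -1:
        return allele  # no loxP site: nothing to recombine
    second = allele.find(site, first + len(site))
    if second == -1:
        return allele  # a single site cannot support excision
    # Keep everything up to and including the first site, then resume after the second.
    return allele[: first + len(site)] + allele[second + len(site):]


# Toy floxed allele: placeholder strings stand in for real sequence.
upstream, floxed, downstream = "EXON1-3_", "_EXON4-11_", "_3UTR"
targeted = upstream + LOXP + floxed + LOXP + downstream
ko = cre_excise(targeted)
print(ko == upstream + LOXP + downstream)  # True
print(ko.count(LOXP))  # 1
```

Only directly repeated sites are searched for, because loxP sites in opposite orientation lead to inversion rather than excision, which this sketch does not model.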


Overview of the Dot Plot (window size: 10 bp)

[Figure: dot plot of the sequence aligned with itself, forward and reverse-complement panels.]

Note: The sequence of the homology arms and cKO region was aligned with itself to detect tandem repeats. Tandem repeats were found in the dot plot matrix, which may make this targeting vector difficult to construct.
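A self dot plot marks every pair of positions (i, j) whose windows match; a tandem repeat with unit length k shows up as a run of dots on the diagonal j - i = k. Below is a minimal Python sketch of the forward-strand panel only, assuming exact 10 bp word matches (the tool behind the report may tolerate mismatches); `self_dot_plot`, `tandem_repeat_offsets` and the `min_run` threshold are hypothetical.

```python
from collections import defaultdict


def self_dot_plot(seq: str, window: int = 10) -> list[tuple[int, int]]:
    """Return off-diagonal (i, j) pairs, i < j, whose `window`-bp words match exactly."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    dots = []
    for hits in positions.values():
        # Every pair of positions sharing the same word becomes a dot.
        for a in range(len(hits)):
            for b in range(a + 1, len(hits)):
                dots.append((hits[a], hits[b]))
    return sorted(dots)


def tandem_repeat_offsets(dots: list[tuple[int, int]], min_run: int = 5) -> list[int]:
    """Diagonal offsets (j - i) carrying at least `min_run` dots.

    `min_run` is an illustrative threshold, not a value from the report.
    """
    counts = defaultdict(int)
    for i, j in dots:
        counts[j - i] += 1
    return sorted(off for off, n in counts.items() if n >= min_run)


# Toy sequence: unique flanks around four copies of a 12 bp unit.
seq = "GATTACAGGCTT" + "ACGTTGCAAGCT" * 4 + "CCGGATATCGTA"
print(tandem_repeat_offsets(self_dot_plot(seq)))
```

The 12 bp unit produces dense diagonals at offsets 12 and 24; a reverse-complement panel would repeat the same word lookup against the reverse complement of the sequence.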

Overview of the GC Content Distribution (window size: 300 bp)

[Figure: GC content distribution along the sequence of the homology arms and cKO region.]

Summary: full length 8570 bp | A: 2212 (25.81%) | C: 1938 (22.61%) | T: 2491 (29.07%) | G: 1929 (22.51%)

Note: The sequence of the homology arms and cKO region was analyzed for GC content. No region of significantly high GC content was found, so this region is suitable for PCR screening and sequencing analysis.
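The summary percentages are base counts divided by the full length, and the distribution plot applies the same GC calculation inside a sliding 300 bp window. A minimal Python sketch, assuming a 1 bp step (the report does not state its step size); `base_composition` and `gc_windows` are hypothetical helpers, and the final lines recompute the Summary line from its own counts.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of each base A, C, G, T over the full sequence."""
    counts = Counter(seq.upper())
    return {b: 100.0 * counts[b] / len(seq) for b in "ACGT"}


def gc_windows(seq: str, window: int = 300, step: int = 1) -> list[float]:
    """GC percentage in each `window`-bp window; the default `step` is an assumption."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


# Recompute the report's summary from its own counts.
counts = {"A": 2212, "C": 1938, "T": 2491, "G": 1929}
total = sum(counts.values())
print(total)  # 8570
print({b: round(100 * n / total, 2) for b, n in counts.items()})  # matches the Summary line
print(round(100 * (counts["G"] + counts["C"]) / total, 2))  # 45.12 (overall GC %)
```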


BLAT Search Results (upstream of exon 4)

SCORE  Q.START  Q.END  Q.SIZE  IDENTITY  CHROM  STRAND  CHR.START    CHR.END  SPAN
 3000        1   3000    3000    100.0%  chr14       -   50918182   50921181  3000
  286     1132   1568    3000     92.7%  chr18       -   33384543   33384841   299
  157     2318   2540    3000     85.0%  chr14       +   92455979   92456187   209
  154     2344   2555    3000     91.1%  chr2        +   83753455   83753672   218
  152     2319   2522    3000     87.7%  chr13       +   95203374   95203563   190
  151     2315   2523    3000     87.0%  chr14       +   72875382   72875566   185
  149     2353   2543    3000     91.9%  chr1        -   81846433   81846621   189
  147     2346   2551    3000     90.4%  chr9        -   71370693   71370907   215
  141     2367   2549    3000     91.2%  chr9        +   66998449   66998633   185
  139     2318   2544    3000     82.6%  chr1        -  105526957  105527128   172
  137     2374   2543    3000     88.2%  chr7        +  131006213  131006372   160
  132     2398   2553    3000     93.0%  chr8        +   69723770   69723967   198
  130     2378   2540    3000     89.9%  chr11       -    3607208    3607363   156
  130     2342   2546    3000     92.8%  chr14       +   76164179   76164587   409
  129     2391   2544    3000     92.3%  chr14       +   47308114   47308276   163
  128     2315   2524    3000     85.2%  chr3        -   95102922   95103071   150
  126     2388   2543    3000     91.1%  chr10       +   81108050   81108208   159
  125     2374   2524    3000     89.4%  chrX        -  134978487  134978629   143
  124     2342   2540    3000     85.8%  chrX        +   71363885   71364032   148
  124     2374   2523    3000     89.3%  chr2        +   90908344   90908485   142

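Note: The 3000 bp sequence upstream of exon 4 was BLAT-searched against the genome. Apart from the expected self-match on chr14, every hit spans at most 437 bp (query positions 1132-1568) of the 3000 bp query; no significant similarity was found.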

BLAT Search Results (downstream of exon 11)

SCORE  Q.START  Q.END  Q.SIZE  IDENTITY  CHROM  STRAND  CHR.START    CHR.END  SPAN
 3000        1   3000    3000    100.0%  chr14       -   50912862   50915861  3000
  376     1650   2498    3000     94.6%  chr11       -   14715277   14715979   703
  366     1650   2495    3000     94.8%  chr9        +  110806327  110807078   752
  354     1650   2020    3000     97.5%  chr4        -   94950397   94950758   362
  350     1646   1997    3000    100.0%  chr5        +   96672143   96672499   357
  348     1650   1998    3000    100.0%  chr6        -   49971163   49971515   353
  348     1649   1997    3000    100.0%  chr14       -   54613737   54614087   351
  348     1650   1998    3000    100.0%  chr2        +  117206877  117207229   353
  347     1650   1999    3000     99.8%  chrX        -  106719700  106720055   356
  347     1650   1997    3000    100.0%  chr5        -   37600086   37600437   352
  347     1650   1997    3000    100.0%  chr14       -   11191559   11191910   352
  346     1650   1998    3000     99.8%  chr7        -   29930097   29930449   353
  346     1650   2003    3000     99.2%  chr5        -  148148219  148148621   403
  346     1649   1997    3000     99.8%  chr4        +  135802141  135802493   353
  346     1649   1997    3000     99.8%  chr16       +   72814009   72814361   353
  346     1650   1998    3000     99.8%  chr15       +   10498310   10498662   353
  345     1650   1997    3000     99.8%  chr7        -   73485803   73486154   352
  345     1650   1997    3000     99.8%  chr6        -   95779275   95779626   352
  345     1650   1997    3000     99.8%  chr10       -  114920935  114921286   352
  345     1650   1997    3000     99.8%  chr10       -  107657584  107657935   352

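Note: The 3000 bp sequence downstream of exon 11 was BLAT-searched against the genome. Apart from the expected self-match on chr14, all hits fall within query positions 1646-2498 and span at most 849 bp of the 3000 bp query; no significant similarity was found.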
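The BLAT notes rest on two per-hit quantities: whether a hit is the expected self-match (100% identity over the full 3000 bp query) and how much of the query each remaining hit covers. A minimal Python sketch of that screen using a few rows copied from the BLAT results; the `Hit` type and the 50% query-coverage cut-off are illustrative assumptions, not criteria stated in the report.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    score: int
    q_start: int  # query coordinates as displayed (1-based, inclusive)
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int


def query_coverage(hit: Hit) -> float:
    """Fraction of the query spanned by the hit."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


def is_self_hit(hit: Hit) -> bool:
    return hit.identity == 100.0 and query_coverage(hit) == 1.0


def significant_off_target(hits: list[Hit], min_coverage: float = 0.5) -> list[Hit]:
    """Non-self hits covering at least `min_coverage` of the query (illustrative cut-off)."""
    return [h for h in hits if not is_self_hit(h) and query_coverage(h) >= min_coverage]


upstream = [
    Hit(3000, 1, 3000, 3000, 100.0, "chr14", "-", 50918182, 50921181),
    Hit(286, 1132, 1568, 3000, 92.7, "chr18", "-", 33384543, 33384841),
]
downstream = [
    Hit(3000, 1, 3000, 3000, 100.0, "chr14", "-", 50912862, 50915861),
    Hit(376, 1650, 2498, 3000, 94.6, "chr11", "-", 14715277, 14715979),
    Hit(350, 1646, 1997, 3000, 100.0, "chr5", "+", 96672143, 96672499),
]
for name, hits in (("upstream", upstream), ("downstream", downstream)):
    worst = max(query_coverage(h) for h in hits if not is_self_hit(h))
    print(name, round(worst, 3), significant_off_target(hits))
# upstream 0.146 []
# downstream 0.283 []
```

The two self-hits also illustrate the strand: because Osgep lies on the reverse strand, the window upstream of exon 4 (50918182-50921181) sits at higher chromosome coordinates than the window downstream of exon 11 (50912862-50915861).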


Gene and protein information: Osgep O-sialoglycoprotein endopeptidase [Mus musculus (house mouse)], Gene ID: 66246, updated on 12-Aug-2019

Gene summary

Official Symbol: Osgep (provided by MGI)
Official Full Name: O-sialoglycoprotein endopeptidase (provided by MGI)
Primary source: MGI:MGI:1913496
See related: Ensembl:ENSMUSG00000006289
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: GCPL-1; PRSMG1; 1500019L24Rik
Expression: Ubiquitous expression in ovary adult (RPKM 44.1), adrenal adult (RPKM 34.5) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 14; 14 C1

Exon count: 11

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  14   NC_000080.6 (50915374..50924893, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    14   NC_000080.5 (51535049..51544568, complement)

[Figure: Chromosome 14 - NC_000080.6]
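NCBI locations such as 50915374..50924893 are 1-based and inclusive, so the locus length is end - start + 1, and "complement" means the gene is transcribed from the reverse strand, placing its 5' end at the larger coordinate. A small Python sketch checking that both assemblies describe a span of the same length; the `Location` type is a hypothetical helper.

```python
from typing import NamedTuple


class Location(NamedTuple):
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    complement: bool

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    @property
    def five_prime_end(self) -> int:
        # On the complement (reverse) strand, transcription starts at the larger coordinate.
        return self.end if self.complement else self.start


grcm38 = Location(50915374, 50924893, complement=True)
mgscv37 = Location(51535049, 51544568, complement=True)
print(grcm38.length, mgscv37.length)  # 9520 9520
print(mgscv37.start - grcm38.start)   # 619675 (offset of this locus between the two assemblies)
print(grcm38.five_prime_end)          # 50924893
```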


Transcript information: This gene has 8 transcripts

Gene: Osgep ENSMUSG00000006289

Description: O-sialoglycoprotein endopeptidase [Source: MGI Symbol; Acc: MGI:1913496]
Gene synonyms: 1500019L24Rik, GCPL-1, PRSMG1
Location: 50,906,478-50,924,893, reverse strand (GRCm38:CM001007.2)
About this gene: This gene has 8 transcripts (splice variants), 188 orthologues and 1 paralogue, and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Osgep-202  ENSMUST00000159292.7   3388  335aa       ENSMUSP00000124039.1  Protein coding           CCDS27026  A0A0R4J1Y3  TSL:1, GENCODE basic, APPRIS P1
Osgep-207  ENSMUST00000162177.7   1219  254aa       ENSMUSP00000124016.1  Protein coding           -          E0CYN9      TSL:1, GENCODE basic
Osgep-203  ENSMUST00000160375.7   713   156aa       ENSMUSP00000124099.1  Protein coding           -          E0CYK9      CDS 3' incomplete, TSL:5
Osgep-204  ENSMUST00000160393.7   2004  335aa       ENSMUSP00000125155.1  Nonsense mediated decay  -          A0A0R4J1Y3  TSL:1
Osgep-206  ENSMUST00000160890.7   1812  80aa        ENSMUSP00000124659.1  Nonsense mediated decay  -          E0CXW7      TSL:2
Osgep-201  ENSMUST00000006452.12  1308  186aa       ENSMUSP00000006452.6  Nonsense mediated decay  -          E9QMF4      TSL:5
Osgep-205  ENSMUST00000160464.1   839   No protein  -                     Retained intron          -          -           TSL:2
Osgep-208  ENSMUST00000162850.1   586   No protein  -                     Retained intron          -          -           TSL:2


[Figure: Ensembl genome browser view of the Osgep region (chromosome 14, ~50.90-50.93 Mb; 38.42 kb). Forward-strand genes: Gm24689 (snRNA), Apex1 and Pnp. Reverse-strand genes: Klhl33, Osgep (transcripts Osgep-201 to Osgep-208) and Pip4p1. Contigs: AC027184.15, AC136376.3. A Regulatory Build track shows CTCF, open chromatin, promoter and promoter flank features.]


Transcript: ENSMUST00000159292 (Osgep-202, protein coding; reverse strand, 11.30 kb)

[Figure: protein view of the 335 aa Osgep-202 product. Domain annotations: TIGRFAM Kae1/TsaD family; Superfamily SSF53067; PRINTS Kae1/TsaD family; Pfam Gcp-like domain; PROSITE patterns Peptidase M22, conserved site; PANTHER tRNA N6-adenosine threonylcarbamoyltransferase Kae1/OSGEP; HAMAP tRNA N6-adenosine threonylcarbamoyltransferase Kae1/OSGEP; Gene3D 3.30.420.40. Variant track: sequence variants (dbSNP and all other sources), classed as missense, splice region and synonymous variants.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
