https://www.alphaknockout.com

Mouse Ngly1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create an Ngly1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Ngly1 gene (NCBI Reference Sequence: NM_021504; Ensembl: ENSMUSG00000021785) is located on mouse chromosome 14. Twelve exons have been identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 12 (Transcript: ENSMUST00000022310). Exon 2 is selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Ngly1 gene. To engineer the targeting vector, the homologous arms and the cKO region will be generated by PCR using BAC clone RP23-304N3 as the template. Cas9, gRNA, and the targeting vector will be co-injected into fertilized eggs to produce cKO mice. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a knock-out allele exhibit dysregulation of the endoplasmic reticulum (ER)-associated degradation (ERAD) process.
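The conditional design relies on Cre recombinase excising whatever lies between two loxP sites in the same orientation, leaving a single loxP site behind. The sketch below illustrates that step on a toy string. The flanking and exon sequences are invented placeholders, not Ngly1 sequence, and the function name `cre_excise` is hypothetical.

```python
# Canonical 34 bp loxP site: 13 bp repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str, loxp: str = LOXP) -> str:
    """Model Cre recombination between two same-orientation loxP sites.

    The segment between the two sites is removed together with one loxP
    copy, so exactly one loxP site remains at the junction.
    """
    first = allele.find(loxp)
    if first == -1:
        return allele  # no loxP site: nothing to recombine
    second = allele.find(loxp, first + len(loxp))
    if second == -1:
        return allele  # only one loxP site: nothing to excise
    return allele[: first + len(loxp)] + allele[second + len(loxp):]


# Toy targeted allele (lowercase placeholders, NOT real Ngly1 sequence).
five_prime = "ccggaattcc"      # stands in for intron 1 sequence
floxed_exon = "gggtttaaaccc"   # stands in for the cKO region (exon 2)
three_prime = "ttaaggccaa"     # stands in for intron 2 sequence

targeted = five_prime + LOXP + floxed_exon + LOXP + three_prime
constitutive_ko = cre_excise(targeted)
print(constitutive_ko == five_prime + LOXP + three_prime)
```

In this toy model the wildtype allele (no loxP sites) passes through unchanged, which mirrors the expectation that Cre acts only on the targeted allele.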

Exon 2 starts at about 6.76% of the coding region. Knocking out exon 2 will cause a frameshift in the gene. Intron 1, which receives the 5'-loxP site, is 5144 bp long; intron 2, which receives the 3'-loxP site, is 5807 bp long. The effective cKO region is ~615 bp and does not contain any other known gene.
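As a rough check on these figures, the percentage can be converted into a nucleotide offset, and the frameshift condition can be stated as "deleted coding length is not a multiple of three". The sketch below assumes the 651 aa protein length listed for Ngly1-201 in the transcript table and excludes the stop codon. The coding length of exon 2 is not given in this report, so the values passed to `deletion_shifts_frame` are purely hypothetical.

```python
def coding_offset(fraction: float, cds_length_nt: int) -> int:
    """Convert a fractional position in the CDS to a nucleotide offset."""
    return round(fraction * cds_length_nt)


def deletion_shifts_frame(deleted_coding_nt: int) -> bool:
    """A deletion shifts the reading frame unless its coding length is a multiple of 3."""
    return deleted_coding_nt % 3 != 0


# Assumption: 651 codons (Ngly1-201), stop codon not counted.
CDS_NT = 651 * 3

offset = coding_offset(0.0676, CDS_NT)
print(f"Exon 2 begins roughly {offset} nt into the coding sequence")

# Hypothetical exon lengths, for illustration only.
for length in (99, 100):
    print(length, "nt deleted -> frameshift:", deletion_shifts_frame(length))
```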


Overview of the Targeting Strategy

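[Figure: schematic of the targeting strategy, showing four rows: the wildtype allele (exons 1, 2, and 12 labeled, with the 5' and 3' gRNA regions flanking exon 2), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: exon of mouse Ngly1, homology arm, cKO region, loxP site.]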


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full length: 7115 bp | A: 30.72% (2186) | C: 16.99% (1209) | T: 30.91% (2199) | G: 21.38% (1521)

Note: The sequence of the homologous arms and cKO region is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
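The overall GC fraction follows directly from the base counts in the summary line, and a windowed profile can be computed by sliding a fixed-size window along the sequence. The sketch below uses the reported counts; the `windowed_gc` helper and its `step` parameter are assumptions for illustration, since the report states only the 300 bp window size.

```python
# Base counts taken from the summary line of the report.
COUNTS = {"A": 2186, "C": 1209, "T": 2199, "G": 1521}

total = sum(COUNTS.values())
gc_fraction = (COUNTS["G"] + COUNTS["C"]) / total
print(f"length = {total} bp, GC = {gc_fraction:.2%}")


def windowed_gc(seq: str, window: int = 300, step: int = 1) -> list:
    """GC fraction of each window of the given size, advancing by `step`."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((chunk.count("G") + chunk.count("C")) / window)
    return profile


# Toy usage with a short placeholder sequence and a small window.
print(windowed_gc("GGCCAATT", window=4, step=4))
```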


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1       3000  3000   100.0%    chr14  +       16251393   16254392   3000
YourSeq  292    2136    2705  3000   87.9%     chr18  +       73179538   73179922   385
YourSeq  291    2347    2703  3000   92.5%     chr2   +       127101421  127101777  357
YourSeq  290    2353    2715  3000   90.3%     chr19  -       61099534   61099881   348
YourSeq  290    2346    2970  3000   86.4%     chr13  -       6334915    6335293    379
YourSeq  290    2348    2699  3000   90.7%     chr1   +       23994282   23994614   333
YourSeq  289    2344    2708  3000   91.1%     chr4   -       21954331   21954686   356
YourSeq  289    2348    2699  3000   90.3%     chr13  -       101282065  101282403  339
YourSeq  289    2345    2709  3000   91.6%     chr10  +       95786026   95786382   357
YourSeq  289    2353    2699  3000   91.0%     chr1   +       156176686  156177018  333
YourSeq  288    2349    2750  3000   91.7%     chr13  +       93033228   93033609   382
YourSeq  288    2349    2703  3000   90.1%     chr11  +       57959174   57959507   334
YourSeq  288    2353    2705  3000   90.6%     chr10  +       61559970   61560314   345
YourSeq  287    2338    2699  3000   90.8%     chr6   +       134507952  134508303  352
YourSeq  287    2353    2699  3000   91.7%     chr1   +       23394573   23394905   333
YourSeq  286    2349    2699  3000   91.0%     chr13  -       111277336  111277673  338
YourSeq  286    2348    2699  3000   90.7%     chr13  +       11980275   11980611   337
YourSeq  286    2348    2714  3000   89.8%     chr12  +       69915759   69916115   357
YourSeq  286    2347    2705  3000   90.3%     chr12  +       50729411   50729760   350
YourSeq  286    2347    2699  3000   90.5%     chr1   +       93056574   93056918   345

Note: The 3000 bp section upstream of exon 2 was searched against the genome with BLAT. No significant similarity is found.
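To review results like these programmatically, each row can be split into its eleven fields and the hits outside the target locus examined separately. The sketch below parses three rows copied from the upstream results. The parser, the field names, and the overlap test are assumptions made for illustration; they are not part of BLAT itself.

```python
def parse_blat_row(row: str) -> dict:
    """Parse one whitespace-separated BLAT result row into a dict."""
    fields = row.split()
    # Drop the leading "browser details" link labels if present.
    if fields[:2] == ["browser", "details"]:
        fields = fields[2:]
    (query, score, q_start, q_end, q_size,
     identity, chrom, strand, t_start, t_end, span) = fields
    return {
        "query": query,
        "score": int(score),
        "q_start": int(q_start),
        "q_end": int(q_end),
        "q_size": int(q_size),
        "identity": float(identity.rstrip("%")),
        "chrom": chrom,
        "strand": strand,
        "t_start": int(t_start),
        "t_end": int(t_end),
        "span": int(span),
    }


def overlaps_target(hit: dict, chrom: str, start: int, end: int) -> bool:
    """True if the hit lies on `chrom` and overlaps the interval [start, end]."""
    return hit["chrom"] == chrom and hit["t_start"] <= end and hit["t_end"] >= start


ROWS = [
    "browser details YourSeq 3000 1 3000 3000 100.0% chr14 + 16251393 16254392 3000",
    "browser details YourSeq 292 2136 2705 3000 87.9% chr18 + 73179538 73179922 385",
    "YourSeq 291 2347 2703 3000 92.5% chr2 + 127101421 127101777 357",
]

hits = [parse_blat_row(row) for row in ROWS]
off_target = [h for h in hits if not overlaps_target(h, "chr14", 16251393, 16254392)]
best_off_target = max(h["score"] for h in off_target)
print(f"{len(off_target)} off-target hits, best score {best_off_target} of {hits[0]['q_size']}")
```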

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1       3000  3000   100.0%    chr14  +       16255008   16258007   3000
YourSeq  35     1367    1420  3000   87.3%     chr13  +       108424814  108425029  216
YourSeq  32     1445    1498  3000   86.9%     chr14  +       61774886   61774938   53
YourSeq  31     1       56    3000   97.0%     chr15  -       21281046   21281102   57
YourSeq  31     526     567   3000   91.2%     chr10  +       118090397  118090437  41

Note: The 3000 bp section downstream of exon 2 was searched against the genome with BLAT. No significant similarity is found.
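The two full-length hits on chr14 also allow a quick consistency check on the stated cKO region size: the bases lying between the end of the upstream query and the start of the downstream query should match the ~615 bp figure. The sketch assumes the displayed coordinates are 1-based and inclusive; the helper name is hypothetical.

```python
def bases_between(upstream_end: int, downstream_start: int) -> int:
    """Number of bases strictly between two 1-based inclusive coordinates."""
    return downstream_start - upstream_end - 1


UPSTREAM_END = 16254392      # end of the 3000 bp upstream hit on chr14
DOWNSTREAM_START = 16255008  # start of the 3000 bp downstream hit on chr14

cko_size = bases_between(UPSTREAM_END, DOWNSTREAM_START)
print(f"Region between the two BLAT queries: {cko_size} bp")
```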


Gene and protein information: Ngly1 N-glycanase 1 [ Mus musculus (house mouse) ] Gene ID: 59007, updated on 24-Oct-2019

Gene summary

Official Symbol: Ngly1 (provided by MGI)
Official Full Name: N-glycanase 1 (provided by MGI)
Primary source: MGI:MGI:1913276
See related: Ensembl:ENSMUSG00000021785
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Png1; PNGase; 1110002C09Rik
Expression: Ubiquitous expression in testis adult (RPKM 30.1), adrenal adult (RPKM 10.0) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 14 A2; 14 7.08 cM

Exon count: 12

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  14   NC_000080.6 (16249314..16311926)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    14   NC_000080.5 (17081828..17144440)

Chromosome 14 - NC_000080.6


Transcript information: This gene has 8 transcripts

Gene: Ngly1 ENSMUSG00000021785

Description: N-glycanase 1 [Source:MGI Symbol;Acc:MGI:1913276]
Gene Synonyms: 1110002C09Rik, PNGase, Png1
Location: Chromosome 14: 16,249,280-16,311,926 forward strand. GRCm38:CM001007.2
About this gene: This gene has 8 transcripts (splice variants), 197 orthologues, is a member of 1 Ensembl protein family and is associated with 4 phenotypes.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Ngly1-201  ENSMUST00000022310.6  2935  651aa       ENSMUSP00000022310.6  Protein coding   CCDS26832  Q9JI78      TSL:1, GENCODE basic, APPRIS P2
Ngly1-207  ENSMUST00000224656.1  2329  596aa       ENSMUSP00000152998.1  Protein coding   -          A0A286YCI0  GENCODE basic, APPRIS ALT2
Ngly1-205  ENSMUST00000223973.1  753   251aa       ENSMUSP00000153025.1  Protein coding   -          A0A286YCZ7  CDS 5' and 3' incomplete
Ngly1-206  ENSMUST00000224154.1  373   111aa       ENSMUSP00000153445.1  Protein coding   -          A0A286YDI5  CDS 3' incomplete
Ngly1-203  ENSMUST00000153109.1  3296  No protein  -                     Retained intron  -          -           TSL:1
Ngly1-204  ENSMUST00000223879.1  2417  No protein  -                     Retained intron  -          -           -
Ngly1-202  ENSMUST00000143947.1  969   No protein  -                     Retained intron  -          -           TSL:3
Ngly1-208  ENSMUST00000226089.1  750   No protein  -                     Retained intron  -          -           -


[Figure: Ensembl genome browser view of an 82.65 kb region of chromosome 14 (16.24 Mb to 16.32 Mb). Forward strand: Ngly1-201, Ngly1-205, Ngly1-206, Ngly1-207 (protein coding); Ngly1-202, Ngly1-203, Ngly1-204, Ngly1-208 (retained intron); Gm47794-201 (lncRNA); Gm47797-201 (TEC); Gm47798-201 (lncRNA); 5430414B19Rik-201 (lncRNA). Contigs: AC173482.1, AC154452.2. Reverse strand: Oxsm-201, Oxsm-202, Oxsm-203 (protein coding); Oxsm-204 (lncRNA). Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (processed transcript, RNA gene).]


Transcript: ENSMUST00000022310

[Figure: Ensembl protein summary for Ngly1-201 (protein coding; 62.65 kb, forward strand), drawn on a scale bar from 0 to 651. Annotation tracks: PDB-ENSP mappings; low complexity (Seg); Superfamily (PUB-like domain superfamily; Papain-like cysteine peptidase superfamily; Galactose-binding-like domain superfamily); SMART (PUB domain; a label truncated in the source to "-like"; Peptide N glycanase, PAW domain); Pfam (PUB domain; Transglutaminase-like; Peptide N glycanase, PAW domain); PROSITE profiles (Peptide N glycanase, PAW domain); PANTHER (PTHR12143; PTHR12143:SF19); Gene3D (1.20.58.2190; 3.10.620.30; PAW domain superfamily; 2.20.25.10); CDD (cd10459); sequence variants from dbSNP and all other sources (legend: missense variant, synonymous variant).]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
