https://www.alphaknockout.com

Mouse Hgsnat Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Hgsnat conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Hgsnat gene (NCBI Reference Sequence: NM_029884; Ensembl: ENSMUSG00000037260) is located on mouse chromosome 8. Eighteen exons have been identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 18 (Transcript: ENSMUST00000037609). Exons 3~4 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Hgsnat gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-294F8 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a knock-out allele exhibit progressive storage pathology in the CNS and peripheral organs, accumulation in brain and most somatic organs, lysosomal distension and dysfunction, astrocytosis, microgliosis, hepatosplenomegaly, behavioral deficits and premature death.

Exon 3 starts at about 15.45% of the coding region. Knockout of exons 3~4 will cause a frameshift in the gene. Intron 2, which receives the 5'-loxP site, is 1139 bp; intron 4, which receives the 3'-loxP site, is 2614 bp. The effective cKO region is ~1156 bp and does not contain any other known gene.
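As a rough illustration of the figures above, the sketch below converts the 15.45% offset into an approximate nucleotide and codon position and states the divisible-by-three rule behind the frameshift claim. The CDS length is an assumption derived from the 656 aa protein listed for ENSMUST00000037609 (656 codons plus one stop codon). The helper names are hypothetical, and the coding lengths in the last two lines are hypothetical examples, not measured exon sizes.

```python
import math

# Assumption: CDS length = 656 codons + 1 stop codon, derived from the
# 656 aa protein listed for transcript ENSMUST00000037609.
CDS_LENGTH_NT = (656 + 1) * 3


def approx_cds_position(fraction, cds_length_nt=CDS_LENGTH_NT):
    """Return (nucleotide offset, codon number) for a fractional CDS position."""
    nt = fraction * cds_length_nt
    codon = math.ceil(nt / 3)
    return nt, codon


def causes_frameshift(deleted_coding_nt):
    """A deletion shifts the reading frame when its coding length is not a multiple of 3."""
    return deleted_coding_nt % 3 != 0


nt, codon = approx_cds_position(0.1545)
print(f"Exon 3 begins near CDS nucleotide {nt:.0f} (codon ~{codon})")

# Hypothetical coding lengths, for illustration only.
print(causes_frameshift(200))  # length not divisible by 3
print(causes_frameshift(201))  # length divisible by 3
```

Only the coding bases removed by the deletion count toward the frameshift rule; intronic sequence inside the ~1156 bp cKO region does not.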


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (5' to 3', exons 1, 2, 3, 4 ... 18, with two gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Hgsnat; homology arm; cKO region; loxP site.]
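The relationship between the targeted allele and the constitutive KO allele can be sketched as a simple list operation. This is a deliberately simplified model, not a description of the vector itself: it assumes exactly two loxP sites in the same orientation, with exons 3 and 4 between them, and the function name `cre_recombine` is hypothetical.

```python
LOXP = "loxP"


def cre_recombine(allele):
    """Excise everything between two loxP sites, leaving a single loxP behind.

    Simplified model: assumes exactly two loxP sites in the same orientation.
    """
    sites = [i for i, feature in enumerate(allele) if feature == LOXP]
    if len(sites) != 2:
        return list(allele)  # nothing to excise in this simplified model
    first, second = sites
    return list(allele[:first + 1]) + list(allele[second + 1:])


# Targeted allele: exons 3 and 4 flanked by loxP sites placed in introns 2 and 4.
targeted = (["exon1", "exon2", LOXP, "exon3", "exon4", LOXP]
            + [f"exon{n}" for n in range(5, 19)])

ko = cre_recombine(targeted)
print(ko[:5])
```

After recombination the model allele keeps exons 1, 2 and 5 through 18 plus one residual loxP site, which corresponds to the constitutive KO allele in the overview.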


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
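A self dot plot of this kind can be approximated by indexing every 10 bp word and reporting pairs of positions that share a word. The sketch below is a minimal exact-match version under that assumption; the actual tool and scoring used for the report are not stated, and the function names are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq, window=10):
    """Return (forward, reverse) lists of (i, j) dots for exact window matches.

    Forward dots pair positions i < j whose windows are identical (direct repeats).
    Reverse dots pair a window at i with a window at j that is its reverse complement.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(len(seq) - window + 1):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j > i)
        reverse.extend((i, j) for j in index.get(reverse_complement(word), []))
    return forward, reverse


# Toy sequence with one direct repeat of a 10 bp unit (illustration only).
unit = "ACGTTGCAAG"
forward, reverse = self_dot_plot(unit + "GGGATCCTTAC" + unit, window=10)
print(forward)
```

Long diagonal runs of forward dots away from the main diagonal would indicate tandem or direct repeats; the note above reports none of significance for this region.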

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full length (7656 bp) | A (25.54%, 1955) | C (22.13%, 1694) | T (28.2%, 2159) | G (24.14%, 1848)

Note: The sequence of the homologous arms and cKO region is analyzed to determine its GC content. No significant high-GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
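The base counts in the summary line can be cross-checked, and the windowed analysis sketched, in a few lines. The sliding-window function below assumes a plain fraction of G and C bases per 300 bp window; the report does not state how its plot was computed, so this is an illustration rather than the original method, and the function names are hypothetical.

```python
def gc_fraction_from_counts(a, c, g, t):
    """Overall GC fraction from per-base counts."""
    return (g + c) / (a + c + g + t)


def sliding_gc(seq, window=300, step=1):
    """Yield (start, GC fraction) for each full window along seq."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, (chunk.count("G") + chunk.count("C")) / window


# Base counts taken from the summary line of this report.
counts = {"a": 1955, "c": 1694, "t": 2159, "g": 1848}
assert sum(counts.values()) == 7656  # matches the stated full length
print(f"Overall GC content: {gc_fraction_from_counts(**counts):.2%}")
```

With these counts the overall GC content works out to roughly 46%, consistent with the note that no high-GC region is present.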


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr8   -       25971953   25974952   3000
YourSeq  304    2375   2993  3000   88.2%     chr11  -       101675897  101676558  662
YourSeq  262    2419   2960  3000   90.8%     chr7   +       141397070  141397779  710
YourSeq  248    2356   2692  3000   92.0%     chr1   +       87525375   87525732   358
YourSeq  239    2375   2663  3000   93.2%     chr2   -       21346648   21346975   328
YourSeq  233    2496   2969  3000   92.7%     chr18  +       42258981   42259608   628
YourSeq  231    2355   2692  3000   95.0%     chr12  +       69663538   69664004   467
YourSeq  228    2375   2668  3000   91.0%     chr2   -       155441579  155441872  294
YourSeq  225    2355   2673  3000   92.9%     chr5   +       21670903   21671253   351
YourSeq  222    2385   2669  3000   91.2%     chr15  +       37012734   37013363   630
YourSeq  220    2375   2671  3000   93.4%     chr11  -       51803257   51803666   410
YourSeq  213    2375   2692  3000   89.9%     chr2   -       119055957  119056478  522
YourSeq  212    2375   2679  3000   90.7%     chr11  +       100322896  100323441  546
YourSeq  209    2375   2679  3000   87.5%     chr11  -       102524069  102524352  284
YourSeq  209    2375   2692  3000   93.4%     chr16  +       22106644   22217234   110591
YourSeq  206    2411   2692  3000   93.3%     chr17  -       65622710   65623058   349
YourSeq  203    2403   2688  3000   94.4%     chr9   +       72917412   72917981   570
YourSeq  201    2411   2692  3000   91.8%     chr10  +       20131202   20131686   485
YourSeq  200    2391   2672  3000   91.7%     chr6   -       88475183   88475491   309
YourSeq  196    2421   2674  3000   86.9%     chr11  +       53831787   53832032   246

Note: The 3000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr8   -       25967797   25970796   3000
YourSeq  341    848    1188  3000   100.0%    chr8   -       25969556   25969896   341
YourSeq  336    904    1239  3000   100.0%    chr8   -       25969611   25969946   336
YourSeq  286    850    1135  3000   100.0%    chr8   -       25969556   25969841   286
YourSeq  55     921    1145  3000   69.9%     chr1   -       189032823  189032911  89
YourSeq  52     420    501   3000   85.4%     chr12  -       3392053    3392140    88
YourSeq  46     719    776   3000   94.4%     chr6   -       137511914  137511988  75
YourSeq  42     707    768   3000   83.0%     chr9   -       120924636  120924693  58
YourSeq  41     715    766   3000   95.7%     chr1   -       119385728  119385782  55
YourSeq  41     698    761   3000   88.0%     chr1   +       161567974  161568036  63
YourSeq  40     718    760   3000   97.7%     chr8   -       107235808  107235851  44
YourSeq  40     720    766   3000   93.7%     chr2   +       59714326   59714374   49
YourSeq  39     718    766   3000   97.7%     chr9   -       54905469   54905524   56
YourSeq  36     718    761   3000   92.9%     chr1   -       89411096   89411140   45
YourSeq  36     713    761   3000   95.0%     chr5   +       114440398  114440447  50
YourSeq  35     718    761   3000   94.9%     chr12  -       86462273   86462317   45
YourSeq  35     398    439   3000   97.3%     chr11  -       74980264   74980307   44
YourSeq  35     430    500   3000   94.9%     chr10  -       81645523   81645604   82
YourSeq  35     868    986   3000   61.6%     chr1   -       189032823  189032873  51
YourSeq  35     720    766   3000   95.0%     chr5   +       125193390  125193439  50

Note: The 3000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
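BLAT rows like those above are easier to assess once parsed into fields. The sketch below parses a few rows copied from the upstream table and computes what fraction of the 3000 bp query each hit covers. The report does not state its criterion for "significant similarity", so the `MIN_COVERAGE` cutoff is a hypothetical threshold chosen purely for illustration, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int

    @property
    def query_coverage(self):
        """Fraction of the query covered by this hit (1-based inclusive coordinates)."""
        return (self.q_end - self.q_start + 1) / self.q_size


def parse_blat_row(row):
    """Parse: QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = row.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


rows = [
    "YourSeq 3000 1 3000 3000 100.0% chr8 - 25971953 25974952 3000",
    "YourSeq 304 2375 2993 3000 88.2% chr11 - 101675897 101676558 662",
    "YourSeq 262 2419 2960 3000 90.8% chr7 + 141397070 141397779 710",
]
hits = [parse_blat_row(r) for r in rows]

MIN_COVERAGE = 0.5  # hypothetical cutoff, not taken from the report
# hits[0] is the query matching its own locus, so only later hits are screened.
flagged = [h for h in hits[1:] if h.query_coverage >= MIN_COVERAGE]
for h in hits:
    print(h.chrom, h.score, f"{h.query_coverage:.1%}")
```

Under this illustrative cutoff none of the sampled secondary hits is flagged, since each covers only a short stretch near the end of the query.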


Gene and protein information:
Hgsnat heparan-alpha-glucosaminide N-acetyltransferase [Mus musculus (house mouse)]
Gene ID: 52120, updated on 10-Oct-2019

Gene summary

Official Symbol: Hgsnat (provided by MGI)
Official Full Name: heparan-alpha-glucosaminide N-acetyltransferase (provided by MGI)
Primary source: MGI:MGI:1196297
See related: Ensembl:ENSMUSG00000037260
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Tmem76; AW208455; D8Ertd354e; 9430010M12Rik
Expression: Ubiquitous expression in cerebellum adult (RPKM 12.4), subcutaneous fat pad adult (RPKM 10.2) and 28 other tissues

Genomic context

Location: 8 A2; 8 14.22 cM

Exon count: 18

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  8    NC_000074.6 (25944459..25976744, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    8    NC_000074.5 (27054931..27087216, complement)
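The two assembly rows can be sanity-checked against each other: the gene should span the same number of bases in both builds, with both ends shifted by the same offset. The sketch below assumes the listed coordinates are 1-based and inclusive, and the helper name `span_bp` is hypothetical.

```python
def span_bp(start, end):
    """Length of a closed, 1-based interval."""
    return end - start + 1


grcm38 = (25944459, 25976744)   # NC_000074.6, annotation release 108
mgscv37 = (27054931, 27087216)  # NC_000074.5, Build 37.2

print("GRCm38 span:", span_bp(*grcm38))
print("MGSCv37 span:", span_bp(*mgscv37))
print("Offsets between builds:",
      mgscv37[0] - grcm38[0], mgscv37[1] - grcm38[1])
```

Equal spans and equal start and end offsets suggest the locus was shifted as a block between assemblies rather than restructured.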

[Figure: genomic context of Hgsnat on Chromosome 8 - NC_000074.6.]


Transcript information: This gene has 4 transcripts

Gene: Hgsnat ENSMUSG00000037260

Description: heparan-alpha-glucosaminide N-acetyltransferase [Source:MGI Symbol;Acc:MGI:1196297]
Gene Synonyms: 9430010M12Rik, D8Ertd354e, Tmem76
Location: Chromosome 8: 25,944,453-25,976,753 reverse strand. GRCm38:CM001001.2
About this gene: This gene has 4 transcripts (splice variants), 257 orthologues, is a member of 1 Ensembl and is associated with 54 phenotypes.

Transcripts

Name        Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Hgsnat-201  ENSMUST00000037609.7  2694  656aa       ENSMUSP00000040356.6  Protein coding   CCDS40309  Q3UDW8      TSL:1 GENCODE basic APPRIS P1
Hgsnat-204  ENSMUST00000211550.1  485   119aa       ENSMUSP00000147675.1  Protein coding   -          A0A1B0GRV1  CDS 3' incomplete TSL:5
Hgsnat-203  ENSMUST00000210894.1  870   No protein  -                     Retained intron  -          -           TSL:2
Hgsnat-202  ENSMUST00000209420.1  489   No protein  -                     lncRNA           -          -           TSL:3

[Figure: Ensembl region view, 52.30 kb, 25.94Mb to 25.98Mb. Forward strand: Kcnu1-201 and Kcnu1-202 (protein coding). Contigs: AC122752.10, AC093366.9. Reverse strand: Hgsnat-201 (protein coding), Pomk-201 (protein coding), Hgsnat-203 (retained intron), Hgsnat-204 (protein coding), Hgsnat-202 (lncRNA). Regulatory Build legend: CTCF, open chromatin, promoter, promoter flank. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (RNA gene; processed transcript).]


Transcript: ENSMUST00000037609

[Figure: protein view of Hgsnat-201 (protein coding), reverse strand, 32.30 kb. Tracks for ENSMUSP00000040...: Transmembrane heli...; MobiDB lite; Low complexity (Seg); Pfam: Domain of unknown function DUF1624; PANTHER: Heparan-alpha-glucosaminide N-acetyltransferase (PTHR31061); All sequence SNPs/i...: Sequence variants (dbSNP and all other sources). Variant legend: inframe deletion, missense variant, splice region variant, synonymous variant. Scale bar: 0 to 656.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
