https://www.alphaknockout.com

Mouse Uap1 Knockout Project (CRISPR/Cas9)

Objective: To create a Uap1 knockout Mouse model (C57BL/6N) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Uap1 gene (NCBI Reference Sequence: NM_133806; Ensembl: ENSMUSG00000026670) is located on mouse chromosome 1. Eleven exons are identified, with the ATG start codon in exon 2 and the TAA stop codon in exon 11 (Transcript: ENSMUST00000027981). Exons 3 to 5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts at about 17.98% of the coding region. Exons 3 to 5 cover 35.44% of the coding region. The size of the effective KO region is ~4648 bp. The KO region does not contain any other known gene.

Overview of the Targeting Strategy

[Figure: wildtype allele, 5' to 3', with exons 1, 3, 4, 5 and 11 labeled and two gRNA regions marked. Legend: exon of mouse Uap1; knockout region.]

Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 3 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
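The self-comparison described in the note can be sketched as follows. This is a minimal illustration, assuming an exact-match word comparison with a 15 bp window; the report does not name the tool actually used, and the function names `reverse_complement` and `self_dot_plot` are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def self_dot_plot(seq, window=15):
    """Compare a sequence with itself using exact matches of `window`-bp words.

    Returns two lists of (i, j) start positions (0-based):
      forward: the word at i equals the word at j (i != j, main diagonal excluded)
      reverse: the reverse complement of the word at i equals the word at j
    Runs of forward points parallel to the main diagonal suggest direct repeats.
    """
    seq = seq.upper()
    n_words = len(seq) - window + 1

    # Index every word by its start positions.
    index = {}
    for i in range(n_words):
        index.setdefault(seq[i:i + window], []).append(i)

    forward, reverse = [], []
    for i in range(n_words):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j != i)
        reverse.extend((i, j) for j in index.get(reverse_complement(word), []))
    return forward, reverse


if __name__ == "__main__":
    # Assumed toy input: a 15 bp unit repeated three times in tandem.
    unit = "ACGGTCATTGCAAGC"
    fwd, rev = self_dot_plot(unit * 3, window=15)
    print(len(fwd), "forward off-diagonal points")
```

An empty `forward` list for a 2000 bp flank would correspond to the "no significant tandem repeat" reading above; real dot plot tools usually tolerate mismatches, which this sketch does not.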

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section downstream of Exon 5 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.

Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full length (2000 bp) | A (27.65%, 553) | C (17.45%, 349) | T (33.9%, 678) | G (21.0%, 420)

Note: The 2000 bp section upstream of Exon 3 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full length (2000 bp) | A (29.65%, 593) | C (16.2%, 324) | T (32.3%, 646) | G (21.85%, 437)

Note: The 2000 bp section downstream of Exon 5 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
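As a rough check on the two summaries, the overall GC content follows directly from the base counts, and a windowed profile can be computed in the same way. The sketch below uses the counts reported above; the function names and the sliding-window step are assumptions, and the tool that produced the original plots is not stated in this report.

```python
def gc_percent_from_counts(counts):
    """Overall GC content (%) from a dict of base counts with keys A, C, G, T."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def sliding_gc(seq, window=300, step=1):
    """GC content (%) for each window of `window` bp, as (start, percent) pairs.

    `start` is 0-based. The step size is an assumption; the report only
    gives the window size (300 bp).
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100.0 * gc / window))
    return profile


# Base counts copied from the two summary lines of this report.
upstream = {"A": 553, "C": 349, "T": 678, "G": 420}
downstream = {"A": 593, "C": 324, "T": 646, "G": 437}

if __name__ == "__main__":
    print(round(gc_percent_from_counts(upstream), 2))    # 38.45
    print(round(gc_percent_from_counts(downstream), 2))  # 38.05
```

Both flanks come out at about 38% GC overall, which agrees with the notes that no high GC-content region is present.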

BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr1   -       170161572  170163571  2000
YourSeq  171    2       212   2000   91.8%     chrX   +       64367934   64368526   593
YourSeq  32     1840    1878  2000   92.2%     chr4   -       99665168   99665211   44
YourSeq  28     1244    1285  2000   83.4%     chr13  +       51610905   51610946   42
YourSeq  25     458     488   2000   88.9%     chr3   +       32072476   32072505   30
YourSeq  24     296     319   2000   100.0%    chr1   -       161721323  161721346  24
YourSeq  23     1836    1868  2000   84.9%     chr19  +       21633565   21633597   33
YourSeq  23     1836    1864  2000   89.7%     chr12  +       17738390   17738418   29
YourSeq  23     1836    1866  2000   87.1%     chr11  +       76909721   76909751   31

Note: The 2000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
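To make the reading of the table concrete, the rows can be parsed and the hit to the query's own locus set aside before looking at the remaining scores. This is a sketch only: the `BlatHit` type, the helper names, and the rule for recognizing the self hit (full query length at 100% identity) are assumptions, and the three sample rows are copied from the upstream table.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    query: str
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(line):
    """Parse one whitespace-separated row of the BLAT result table."""
    f = line.split()
    if f[:2] == ["browser", "details"]:  # drop link text left over from the web page
        f = f[2:]
    return BlatHit(f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def non_self_hits(hits):
    """Drop hits that cover the whole query at 100% identity (assumed self hit)."""
    return [h for h in hits
            if not (h.identity == 100.0 and h.q_end - h.q_start + 1 == h.q_size)]


rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr1 - 170161572 170163571 2000",
    "YourSeq 171 2 212 2000 91.8% chrX + 64367934 64368526 593",
    "YourSeq 32 1840 1878 2000 92.2% chr4 - 99665168 99665211 44",
]

if __name__ == "__main__":
    others = non_self_hits([parse_blat_row(r) for r in rows])
    best = max(others, key=lambda h: h.score)
    covered = (best.q_end - best.q_start + 1) / best.q_size
    print(best.chrom, best.score, f"{covered:.1%} of the query")
```

For the upstream flank, the best remaining hit (chrX, score 171) covers roughly a tenth of the 2000 bp query.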

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr1   -       170154924  170156923  2000
YourSeq  230    157     732   2000   93.0%     chr2   -       156101960  156102486  527
YourSeq  200    89      344   2000   93.0%     chr10  +       21714141   21714379   239
YourSeq  199    141     346   2000   99.1%     chr3   -       27679166   27679396   231
YourSeq  195    147     345   2000   99.5%     chr7   -       16253253   16253472   220
YourSeq  195    144     346   2000   98.6%     chr11  -       62451772   62596555   144784
YourSeq  195    141     362   2000   97.1%     chr7   +       127692104  127692324  221
YourSeq  195    145     344   2000   99.0%     chr10  +       111235925  111236170  246
YourSeq  194    145     344   2000   98.5%     chr15  -       4490048    4490247    200
YourSeq  193    145     346   2000   98.1%     chr7   +       46932572   46932775   204
YourSeq  193    146     345   2000   98.5%     chr5   +       114833686  114833889  204
YourSeq  193    11      344   2000   90.6%     chr4   +       107769598  107769803  206
YourSeq  193    145     344   2000   98.5%     chr4   +       59037196   59037399   204
YourSeq  192    145     344   2000   98.5%     chrX   -       98895677   98895895   219
YourSeq  192    145     344   2000   99.0%     chr11  -       70517098   70517309   212
YourSeq  192    145     345   2000   98.1%     chr8   +       70827528   70827729   202
YourSeq  192    148     344   2000   99.0%     chr16  +       32650097   32650307   211
YourSeq  191    150     344   2000   99.0%     chrX   -       73937759   73937953   195
YourSeq  191    145     344   2000   98.0%     chr18  -       67701969   67702172   204
YourSeq  191    145     344   2000   98.0%     chr16  -       38614434   38614637   204

Note: The 2000 bp section downstream of Exon 5 is BLAT searched against the genome. Aside from short matches of about 200 bp, mostly at query positions 141-362, no significant similarity is found.
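The chr1 coordinates of the two flanking sections also allow a quick consistency check on the stated KO region size. The sketch below assumes that the coordinates in the BLAT tables are 1-based and inclusive, and that the upstream and downstream sections directly abut the targeted region; `gap_between` is a hypothetical helper name.

```python
def gap_between(range_a, range_b):
    """Number of bases strictly between two non-overlapping inclusive ranges."""
    lower, upper = sorted([range_a, range_b])
    return upper[0] - lower[1] - 1


# chr1 coordinates of the 100% identity hits in the two BLAT tables.
# Uap1 lies on the reverse strand, so the section upstream of exon 3
# has the higher coordinates.
upstream_flank = (170161572, 170163571)
downstream_flank = (170154924, 170156923)

if __name__ == "__main__":
    print(gap_between(upstream_flank, downstream_flank))  # 4648
```

Under these assumptions the interval between the flanks is 4648 bp, which matches the effective KO region size given in the strategy summary.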

Gene and protein information:

Uap1 UDP-N-acetylglucosamine pyrophosphorylase 1 [Mus musculus (house mouse)]
Gene ID: 107652, updated on 14-Aug-2019

Gene summary

Official Symbol: Uap1 (provided by MGI)
Official Full Name: UDP-N-acetylglucosamine pyrophosphorylase 1 (provided by MGI)
Primary source: MGI:MGI:1334459
See related: Ensembl:ENSMUSG00000026670
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AgX; AGX1; AGX-1; AGX-2; SPAG2; ESTM38; AA420407; AA437972
Expression: Ubiquitous expression in large intestine adult (RPKM 10.9), colon adult (RPKM 10.8) and 28 other tissues
Orthologs: human; all

Genomic context

Location: 1; 1 H3
Exon count: 12

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  1    NC_000067.6 (170141223..170174964, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    1    NC_000067.5 (172072134..172105077, complement)

[Figure: genomic context, Chromosome 1 - NC_000067.6.]

Transcript information: This gene has 9 transcripts

Gene: Uap1 ENSMUSG00000026670

Description: UDP-N-acetylglucosamine pyrophosphorylase 1 [Source:MGI Symbol;Acc:MGI:1334459]
Gene Synonyms: AGX1, AgX, ESTM38, SPAG2
Location: 170,141,938-170,174,957 reverse strand. GRCm38:CM000994.2
About this gene: This gene has 9 transcripts (splice variants), 207 orthologues, 2 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name      Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Uap1-202  ENSMUST00000111350.9  2311  505aa       ENSMUSP00000106982.3  Protein coding   CCDS83631  A0A0R4J1F6  TSL:1 GENCODE basic APPRIS ALT1
Uap1-203  ENSMUST00000111351.9  2287  522aa       ENSMUSP00000106983.3  Protein coding   CCDS83632  Q3UHZ7      TSL:1 GENCODE basic APPRIS ALT1
Uap1-201  ENSMUST00000027981.7  2283  521aa       ENSMUSP00000027981.7  Protein coding   CCDS15467  A0A0R4J085  TSL:1 GENCODE basic APPRIS P3
Uap1-207  ENSMUST00000162253.6  1896  No protein  -                     Retained intron  -          -           TSL:2
Uap1-206  ENSMUST00000161492.1  856   No protein  -                     Retained intron  -          -           TSL:1
Uap1-205  ENSMUST00000161112.1  676   No protein  -                     Retained intron  -          -           TSL:1
Uap1-204  ENSMUST00000160848.1  409   No protein  -                     lncRNA           -          -           TSL:5
Uap1-209  ENSMUST00000191797.1  387   No protein  -                     lncRNA           -          -           TSL:5
Uap1-208  ENSMUST00000191690.1  369   No protein  -                     lncRNA           -          -           TSL:5

[Figure: genome browser view of a 53.02 kb region (170.14 Mb to 170.18 Mb), forward and reverse strands. Tracks: contigs (AC119893.9, AC123650.7); genes from the comprehensive set (Uap1-201, Uap1-202 and Uap1-203, protein coding; Uap1-206 and Uap1-207, retained intron; Uap1-204, Uap1-208 and Uap1-209, lncRNA; Gm37748-201 and Gm37502-201, lncRNA); Regulatory Build. Regulation legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site. Gene legend: Protein Coding (Ensembl protein coding); Non-Protein Coding (processed transcript; RNA gene).]

Transcript: ENSMUST00000027981

[Figure: protein domain view of Uap1-201 (protein coding, reverse strand, 32.94 kb), scale 0 to 521 amino acids. Annotation tracks: Coiled-coils (Ncoils); Superfamily: Nucleotide-diphospho-sugar transferases; Pfam: UDPGP family; PANTHER: PTHR11952:SF4, UDP-sugar pyrophosphorylase; Gene3D: Nucleotide-diphospho-sugar transferases, 3.40.1630.20; CDD: cd04193; sequence variants (dbSNP and all other sources). Variant legend: missense variant; splice region variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
