https://www.alphaknockout.com

Mouse Zg16 Knockout Project (CRISPR/Cas9)

Objective: To create a Zg16 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Zg16 gene (NCBI Reference Sequence: NM_026918; Ensembl: ENSMUSG00000049350) is located on mouse chromosome 7. Three exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 3 (Transcript: ENSMUST00000051122). Exons 1~3 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a knock-out allele exhibit increased bacterial penetrance into the inner mucus layer of the colon, increased bacterial loads in draining lymph nodes and the spleen and increased abdominal fat mass.

Exon 1 starts from about 0.2% of the coding region. Exons 1~3 cover 100.0% of the coding region. The size of the effective KO region is ~1685 bp. The KO region does not contain any other known gene.
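The report does not say which Cas9 variant is used or how the gRNA sites were chosen. As an illustration only, the sketch below assumes SpCas9 (20 nt protospacer followed by an NGG PAM) and lists candidate sites on one strand of a sequence. The function names and the example sequence are hypothetical, and real gRNA design also requires off-target scoring, which is not shown.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spcas9_sites(seq, protospacer_len=20):
    """List (start, protospacer, pam) for every NGG PAM on the given strand.

    Assumption: SpCas9 with a 3' NGG PAM. The source does not state this.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical input, not taken from the Zg16 locus.
    example = "TTGACCTAGGCATCGATCGGATCCAGTTAGGCATGCAAGG"
    for start, protospacer, pam in find_spcas9_sites(example):
        print("+", start, protospacer, pam)
    # Sites on the opposite strand are found by scanning the reverse complement.
    for start, protospacer, pam in find_spcas9_sites(reverse_complement(example)):
        print("-", start, protospacer, pam)
```

Positions reported for the reverse complement are relative to that strand and would need converting back to forward-strand coordinates before use.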


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1, 2 and 3 and two marked gRNA regions. Legend: exon of mouse Zg16; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: self-alignment dot plot of the upstream sequence. Legend: forward; reverse complement.]

Note: The 2000 bp section upstream of the start codon is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
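A self-alignment dot plot of this kind can be approximated by marking every pair of positions whose windows are identical. The following is a minimal sketch, assuming exact window matches on the forward strand only; the tool and scoring actually used for the plots in this report are not stated, and the `min_run` threshold is an arbitrary illustrative choice.

```python
from collections import defaultdict


def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs, i < j, whose window-length substrings match exactly."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    dots = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


def has_repeat_diagonal(dots, min_run=10):
    """True if any off-diagonal run holds at least min_run consecutive dots.

    A long run parallel to the main diagonal means a stretch of sequence
    recurs elsewhere, which is how repeats appear in a dot plot.
    """
    dot_set = set(dots)
    for i, j in dots:
        if (i - 1, j - 1) in dot_set:
            continue  # not the first dot of its run
        run = 1
        while (i + run, j + run) in dot_set:
            run += 1
        if run >= min_run:
            return True
    return False
```

With the report's setting, `self_dot_plot(sequence, window=15)` would be called on each 2000 bp flank; the reverse-complement comparison shown in the figure legend is left out of this sketch.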

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: self-alignment dot plot of the downstream sequence. Legend: forward; reverse complement.]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up) Window size: 300 bp


Summary: Full Length(2000bp) | A(25.05% 501) | C(22.05% 441) | T(27.85% 557) | G(25.05% 501)

Note: The 2000 bp section upstream of the start codon is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp


Summary: Full Length(2000bp) | A(27.1% 542) | C(20.25% 405) | T(29.8% 596) | G(22.85% 457)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
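The base-composition summaries and the 300 bp window profiles described above can be reproduced from a raw sequence along the following lines. This is a sketch with hypothetical function names; the step size between windows is an assumption, since the report states only the window size.

```python
from collections import Counter


def base_composition(seq):
    """Return {base: (count, percent)} for A, C, T and G, as in the Summary lines."""
    seq = seq.upper()
    counts = Counter(seq)
    return {b: (counts[b], round(100.0 * counts[b] / len(seq), 2)) for b in "ACTG"}


def gc_windows(seq, window=300, step=1):
    """Return (start, GC percent) for each window of the given size.

    step=1 is an assumed default; the report does not give the step.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100.0 * gc / window))
    return profile
```

Applied to the counts in the two Summary lines, G plus C comes to (441 + 501) / 2000 = 47.1% for the upstream section and (405 + 457) / 2000 = 43.1% for the downstream section.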


BLAT Search Results (up)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END  SPAN
---------------------------------------------------------------------------------------
YourSeq   2000      1  2000   2000    100.0%  chr7   -       127051974  127053973  2000
YourSeq     37    925   977   2000     76.2%  chr17  -        25136066   25136110    45
YourSeq     37    951  1038   2000     95.2%  chr11  +        80161206   80161701   496
YourSeq     36    938   996   2000     90.7%  chr4   +        65557300   65557368    69
YourSeq     36    940   987   2000     95.0%  chr14  +        20208875   20208923    49
YourSeq     35    940   984   2000     87.2%  chr17  -        71039485   71039527    43
YourSeq     35    942   987   2000     92.7%  chr14  +        48834916   48834962    47
YourSeq     32    938   976   2000     83.8%  chr7   +       112273512  112273548    37
YourSeq     32    940   981   2000     86.2%  chr10  +       121052584  121052623    40
YourSeq     31    963   999   2000     81.3%  chr8   -       109642350  109642381    32
YourSeq     29    935   983   2000     93.8%  chr12  +       112646156  112646204    49
YourSeq     28   1003  1032   2000     96.7%  chr5   -         8445407    8445436    30
YourSeq     27    945   986   2000     96.6%  chr10  -        68101090   68101132    43
YourSeq     27    947   988   2000     96.6%  chr3   +        37720081   37720123    43
YourSeq     25    946   978   2000     87.9%  chr6   +        20771189   20771221    33
YourSeq     24    930   957   2000     92.9%  chr10  -        52365170   52365197    28
YourSeq     23   1016  1042   2000     92.6%  chr1   -       105432096  105432122    27

Note: The 2000 bp section upstream of the start codon is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END  SPAN
---------------------------------------------------------------------------------------
YourSeq   2000      1  2000   2000    100.0%  chr7   -       127048287  127050286  2000
YourSeq    194   1546  1825   2000     92.6%  chr3   -        88600246   88600735   490
YourSeq    183   1545  1804   2000     91.9%  chr11  +       101712490  101712976   487
YourSeq    183   1581  1821   2000     94.7%  chr1   +        78640485   78640745   261
YourSeq    172   1543  1740   2000     94.9%  chr15  +        89526329   89526762   434
YourSeq    170   1584  1795   2000     92.5%  chr8   -        27063116   27063386   271
YourSeq    170   1541  1747   2000     93.0%  chr10  -        48279807   48280015   209
YourSeq    163   1584  1821   2000     94.6%  chr10  +        73179068   73179473   406
YourSeq    162   1563  1780   2000     91.4%  chr18  +        36820673   36821211   539
YourSeq    162   1528  1751   2000     91.8%  chr10  +        75574611   75574837   227
YourSeq    161   1547  1740   2000     91.4%  chr11  -         3423760    3423950   191
YourSeq    160   1585  1827   2000     94.0%  chr1   +        84893696   84894077   382
YourSeq    159   1555  1752   2000     90.9%  chr13  -        62758284   62758698   415
YourSeq    159   1559  1762   2000     90.5%  chr5   +        65999677   65999877   201
YourSeq    158   1560  1740   2000     94.0%  chr2   -       130997108  130997293   186
YourSeq    158   1547  1740   2000     94.0%  chr1   -       135420819  135421144   326
YourSeq    158   1545  1740   2000     93.9%  chr8   +         4153856    4154058   203
YourSeq    158   1541  1737   2000     96.0%  chr13  +        13675266   13675462   197
YourSeq    158   1555  1740   2000     93.6%  chr12  +        71108019   71108815   797
YourSeq    158   1541  1748   2000     90.7%  chr11  +        76270716   76270966   251

Note: The 2000 bp section downstream of the stop codon is BLAT searched against the genome. No significant similarity is found.
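For downstream checks it can help to load the BLAT result rows into structured records. The sketch below assumes the eleven-column layout shown in the two result tables (optionally prefixed by the words "browser details") and is not an official BLAT or UCSC parser; the class and function names are hypothetical. It separates the full-length self-hit from the secondary hits so the latter can be inspected by score, identity and query interval.

```python
from typing import List, NamedTuple


class BlatHit(NamedTuple):
    query: str
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(row: str) -> BlatHit:
    """Parse one whitespace-separated result row into a BlatHit."""
    fields = row.split()
    if fields[:2] == ["browser", "details"]:
        fields = fields[2:]  # drop the link text carried over from the web page
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}: {row!r}")
    return BlatHit(
        query=fields[0],
        score=int(fields[1]),
        q_start=int(fields[2]),
        q_end=int(fields[3]),
        q_size=int(fields[4]),
        identity=float(fields[5].rstrip("%")),
        chrom=fields[6],
        strand=fields[7],
        t_start=int(fields[8]),
        t_end=int(fields[9]),
        span=int(fields[10]),
    )


def secondary_hits(hits: List[BlatHit]) -> List[BlatHit]:
    """Drop the self-hit, taken here to be any hit scoring the full query size."""
    return [h for h in hits if h.score < h.q_size]
```

For example, `parse_blat_row("YourSeq 37 925 977 2000 76.2% chr17 - 25136066 25136110 45")` yields a hit covering query positions 925 to 977 on chr17.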


Gene and information:

Zg16 zymogen granule protein 16 [Mus musculus (house mouse)]
Gene ID: 69036, updated on 14-Aug-2019

Gene summary

Official Symbol: Zg16 (provided by MGI)
Official Full Name: zymogen granule protein 16 (provided by MGI)
Primary source: MGI:MGI:1916286
See related: Ensembl:ENSMUSG00000049350
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: ZG16p; AI593689; 1810010M01Rik
Expression: Biased expression in large intestine adult (RPKM 2660.4), small intestine adult (RPKM 1563.8) and 3 other tissues
Orthologs: human, all

Genomic context

Location: 7; 7 F3

Exon count: 4

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 7 | NC_000073.6 (127050156..127052675, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 7 | NC_000073.5 (134193670..134195491, complement)

Chromosome 7 - NC_000073.6


Transcript information: This gene has 4 transcripts

Gene: Zg16 ENSMUSG00000049350

Description: zymogen granule protein 16 [Source:MGI Symbol;Acc:MGI:1916286]
Gene Synonyms: 1810010M01Rik
Location: Chromosome 7: 127,050,156-127,087,328 reverse strand. GRCm38:CM001000.2
About this gene: This gene has 4 transcripts (splice variants), 121 orthologues, is a member of 1 Ensembl protein family and is associated with 6 phenotypes.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Zg16-202 | ENSMUST00000205424.1 | 682 | 167aa | ENSMUSP00000145876.1 | Protein coding | CCDS21856 | Q8K0C5 | TSL:2 GENCODE basic APPRIS P1
Zg16-201 | ENSMUST00000051122.6 | 636 | 167aa | ENSMUSP00000056916.5 | Protein coding | CCDS21856 | Q8K0C5 | TSL:1 GENCODE basic APPRIS P1
Zg16-203 | ENSMUST00000205559.1 | 723 | 104aa | ENSMUSP00000145957.1 | Protein coding | - | A0A0U1RPF0 | CDS 3' incomplete TSL:3
Zg16-204 | ENSMUST00000205623.1 | 378 | No protein | - | lncRNA | - | - | TSL:5

[Figure: Ensembl region view of chromosome 7, 127.05 Mb to 127.09 Mb (57.17 kb). Forward strand: Gm25333-201 (misc RNA), Gm44939-201 (lncRNA), Gm22747-201 (snRNA), AI467606-201 and AI467606-202 (protein coding). Contigs: AC122863.4, AC122537.4. Reverse strand: Kif22-201 and Kif22-203 (protein coding), Kif22-204 (retained intron), Zg16-201, Zg16-202 and Zg16-203 (protein coding), Zg16-204 (lncRNA), Gm44343-201 (miRNA). Additional track: Regulatory Build (CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site).]


Transcript: ENSMUST00000051122

[Figure: Ensembl protein summary for Zg16-201 (protein coding, reverse strand, 1.82 kb), translation ENSMUSP00000056..., scale 0 to 167 aa. Annotated features: low complexity (Seg); cleavage site (Sign...); Jacalin-like lectin domain superfamily (Superfamily, Gene3D); Jacalin-like lectin domain (SMART, Pfam, PROSITE profiles); PANTHER PTHR33589 (zymogen granule membrane protein 16); CDD cd09611. Variant track: sequence variants (dbSNP and all other sources), with missense and synonymous variants marked.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
