https://www.alphaknockout.com

Mouse Gstm4 Knockout Project (CRISPR/Cas9)

Objective: To create a Gstm4 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Gstm4 gene (NCBI Reference Sequence: NM_026764; Ensembl: ENSMUSG00000027890) is located on mouse chromosome 3. Eight exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 8 (Transcript: ENSMUST00000029489). Exons 1 to 8 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 1 starts at about 0.15% of the coding region. Exons 1 to 8 cover 100.0% of the coding region. The effective KO region is ~3432 bp and contains no other known gene.
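PCR genotyping distinguishes the alleles by product size. Primers placed in the two flanks amplify the whole wildtype region, or a much shorter product once the KO region is deleted. The sketch below is a minimal illustration only. It assumes hypothetical primer positions (300 bp into each flank) and a clean deletion of the ~3432 bp effective KO region. Real band sizes depend on the final primer design and the exact gRNA cut sites, and this report gives neither.

```python
from dataclasses import dataclass


@dataclass
class PrimerPair:
    """Hypothetical genotyping primers on a linear reference (1-based, inclusive)."""
    forward_start: int  # first base of the forward primer
    reverse_end: int    # last reference base covered by the reverse primer


def wildtype_amplicon(p: PrimerPair) -> int:
    """Size of the PCR product from the unedited (wildtype) allele."""
    return p.reverse_end - p.forward_start + 1


def knockout_amplicon(p: PrimerPair, deletion_bp: int) -> int:
    """Size of the product after a clean deletion assumed to lie between the primers."""
    size = wildtype_amplicon(p) - deletion_bp
    if size <= 0:
        raise ValueError("deletion is not shorter than the wildtype amplicon")
    return size


def expected_bands(p: PrimerPair, deletion_bp: int) -> dict[str, list[int]]:
    """Band sizes (bp) expected for each genotype with this single primer pair."""
    wt = wildtype_amplicon(p)
    ko = knockout_amplicon(p, deletion_bp)
    return {"WT/WT": [wt], "WT/KO": [ko, wt], "KO/KO": [ko]}


# Hypothetical layout: 300 bp of flank + ~3432 bp KO region + 300 bp of flank.
primers = PrimerPair(forward_start=1, reverse_end=300 + 3432 + 300)
print(expected_bands(primers, deletion_bp=3432))
# {'WT/WT': [4032], 'WT/KO': [600, 4032], 'KO/KO': [600]}
```

With a single flanking pair, the ~4 kb wildtype product can amplify poorly next to the short KO product. A common design choice is to add a third primer inside the deleted region so that the wildtype allele gives its own short band.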


Overview of the Targeting Strategy

[Figure: wildtype allele of mouse Gstm4 (5' to 3'), exons 1 to 8, with gRNA regions flanking the knockout region.]


Overview of the Dot Plot (up), window size: 15 bp

[Figure: dot plot of the 2000 bp upstream section against itself, forward and reverse-complement panels.]

Note: The 2000 bp section upstream of the start codon is aligned with itself to detect tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
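A dot plot marks every pair of positions whose windows match. Identical windows off the main diagonal reveal repeats, and a tandem repeat of period p appears as a line parallel to the diagonal at offset p. The sketch below simplifies under stated assumptions. It uses exact 15 bp window matches (the report's window size) instead of whatever mismatch tolerance the original tool applied. It also summarizes dots per diagonal offset rather than drawing the matrix. The `min_hits` cut-off is hypothetical.

```python
from collections import defaultdict


def kmer_positions(seq: str, w: int) -> dict[str, list[int]]:
    """Map each length-w window of seq to the 0-based positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def forward_dots(seq: str, w: int = 15) -> list[tuple[int, int]]:
    """Off-diagonal dots (i, j), i < j, where the windows starting at i and j are identical."""
    dots = []
    for positions in kmer_positions(seq.upper(), w).values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                dots.append((positions[a], positions[b]))
    return dots


def repeat_offsets(dots: list[tuple[int, int]], min_hits: int = 10) -> dict[int, int]:
    """Count dots per diagonal offset j - i and keep offsets with at least min_hits dots."""
    counts: dict[int, int] = defaultdict(int)
    for i, j in dots:
        counts[j - i] += 1
    return {d: n for d, n in sorted(counts.items()) if n >= min_hits}


# A 70 bp toy sequence made of a 7 bp unit repeated ten times.
toy = "GATTACA" * 10
print(repeat_offsets(forward_dots(toy)))
# {7: 49, 14: 42, 21: 35, 28: 28, 35: 21, 42: 14}
```

On a sequence without tandem repeats, `repeat_offsets` would return an empty dict, which matches the kind of result the note above reports. A production tool would also group neighbouring dots into diagonal runs instead of counting them globally.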

Overview of the Dot Plot (down), window size: 15 bp

[Figure: dot plot of the 2000 bp downstream section against itself, forward and reverse-complement panels.]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to detect tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
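The reverse-complement panel of the dot plot catches inverted repeats, meaning windows whose reverse complement reappears elsewhere in the sequence. Arms like these can form hairpins that interfere with PCR and sequencing. The following is a minimal sketch that assumes the same exact 15 bp matching as the forward panel. The function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N, case-insensitive)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_dots(seq: str, w: int = 15) -> list[tuple[int, int]]:
    """Dots (i, j), i < j, where the window at j is the reverse complement of the window at i."""
    seq = seq.upper()
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(seq) - w + 1):
        index[seq[j:j + w]].append(j)
    dots = []
    for i in range(len(seq) - w + 1):
        for j in index.get(reverse_complement(seq[i:i + w]), []):
            if i < j:  # each inverted pair is found twice; keep it once
                dots.append((i, j))
    return dots


# Toy sequence: a 15 bp arm, a 10 bp spacer, then the arm's reverse complement.
arm = "ACGTTGCAAGCTTAC"
toy = arm + "T" * 10 + reverse_complement(arm)
print((0, 25) in inverted_dots(toy))  # True
```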


Overview of the GC Content Distribution (up), window size: 300 bp

[Figure: GC content of the 2000 bp upstream section, 300 bp window.]

Summary: full length 2000 bp | A: 621 (31.05%) | C: 441 (22.05%) | T: 484 (24.2%) | G: 454 (22.7%)

Note: The 2000 bp section upstream of the start codon is analyzed for GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
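The GC content of a window is the share of G and C bases in it. Scanning overlapping windows flags GC-rich stretches, which tend to amplify or sequence poorly. The following is a minimal sketch that uses the report's 300 bp window. The step size and the 70% "high GC" threshold are assumptions, because the report does not state its criteria.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for an empty string."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_windows(seq: str, window: int = 300, step: int = 1) -> list[tuple[int, float]]:
    """(0-based start, GC fraction) for every full-length window."""
    return [(i, gc_fraction(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


def high_gc_starts(seq: str, window: int = 300, step: int = 1,
                   threshold: float = 0.70) -> list[int]:
    """Starts of windows whose GC fraction exceeds an assumed threshold."""
    return [i for i, gc in gc_windows(seq, window, step) if gc > threshold]


# Toy sequence: AT-rich, then a 300 bp GC block at 400..699, then AT-rich again.
toy = "AT" * 200 + "GC" * 150 + "AT" * 200
starts = high_gc_starts(toy)
print(starts[0], starts[-1])  # 311 489
```

The report does not reproduce the actual flanking sequences, so the sketch runs on a toy input.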

Overview of the GC Content Distribution (down), window size: 300 bp

[Figure: GC content of the 2000 bp downstream section, 300 bp window.]

Summary: full length 2000 bp | A: 489 (24.45%) | C: 520 (26.0%) | T: 519 (25.95%) | G: 472 (23.6%)

Note: The 2000 bp section downstream of the stop codon is analyzed for GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
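The two GC-content summaries are raw base counts, so they can be checked directly. Each set sums to 2000 bp. The overall GC content is (C + G) / 2000, which gives (441 + 454) / 2000 = 44.75% upstream and (520 + 472) / 2000 = 49.6% downstream. The snippet below is a small sketch of the same arithmetic, and the function name is illustrative.

```python
def summarize(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of each base and overall GC%, rounded to two decimals."""
    total = sum(counts.values())
    summary = {base: round(100 * n / total, 2) for base, n in counts.items()}
    summary["GC"] = round(100 * (counts["G"] + counts["C"]) / total, 2)
    return summary


# Base counts copied from the two GC-content summaries above.
upstream = {"A": 621, "C": 441, "T": 484, "G": 454}
downstream = {"A": 489, "C": 520, "T": 519, "G": 472}

for name, counts in (("upstream", upstream), ("downstream", downstream)):
    assert sum(counts.values()) == 2000
    print(name, summarize(counts))
```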


BLAT Search Results (up)

Columns: Q_START/Q_END are positions within the 2000 bp query; T_START/T_END are the genomic coordinates of the hit; SPAN is the genomic extent of the hit (bp).

QUERY    SCORE  Q_START  Q_END  QSIZE  IDENTITY  CHROM  STRAND  T_START    T_END      SPAN
YourSeq  2000   1        2000   2000   100.0%    chr3   -       108044673  108046672  2000
YourSeq  151    475      927    2000   86.5%     chr11  +       23507509   23671916   164408
YourSeq  98     197      672    2000   92.4%     chr6   -       87596290   87597114   825
YourSeq  89     494      685    2000   91.6%     chr10  +       78346855   78347373   519
YourSeq  80     462      643    2000   86.0%     chr3   +       86095391   86095716   326
YourSeq  80     225      560    2000   73.4%     chr2   +       76642654   76642827   174
YourSeq  78     469      689    2000   80.3%     chr4   -       40878962   40879137   176
YourSeq  77     771      929    2000   88.2%     chr17  -       45443253   45443411   159
YourSeq  75     468      647    2000   90.4%     chr4   +       32946200   32946582   383
YourSeq  73     471      560    2000   92.0%     chr13  -       53432856   53432946   91
YourSeq  73     471      578    2000   86.4%     chr12  +       17536632   17536738   107
YourSeq  70     469      560    2000   87.4%     chr17  -       29177446   29177536   91
YourSeq  70     468      569    2000   84.4%     chr6   +       124483614  124483715  102
YourSeq  70     471      560    2000   88.3%     chr16  +       38678472   38678560   89
YourSeq  69     462      558    2000   88.1%     chr16  -       23102034   23102129   96
YourSeq  68     475      578    2000   86.4%     chr11  -       87096187   87096289   103
YourSeq  66     730      864    2000   91.3%     chr6   -       47582957   47583096   140
YourSeq  66     831      927    2000   87.5%     chr12  -       31776816   31776913   98
YourSeq  66     469      652    2000   87.5%     chr6   +       18437065   18437246   182
YourSeq  66     469      560    2000   85.1%     chr13  +       100658763  100658853  91

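Note: The 2000 bp section upstream of the start codon is BLAT-searched against the genome. Apart from the expected full-length match (100% identity) at the target locus on chr3, no significant similarity is found, and every other hit covers only part of the query.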
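The table rows are whitespace-separated, so screening them is easy to automate. The following is a minimal sketch that assumes the column order shown above. It also uses hypothetical thresholds for what counts as a worrying off-target match: at least 50% of the query covered at 90% identity or more. The report does not state the criteria it used. Query coverage is computed from Q_START/Q_END, so it overstates the matched bases for gapped hits.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int

    @property
    def query_coverage(self) -> float:
        """Fraction of the query between Q_START and Q_END (inclusive)."""
        return (self.q_end - self.q_start + 1) / self.q_size


def parse_row(line: str) -> BlatHit:
    f = line.split()  # f[0] is the query name, e.g. "YourSeq"
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def off_target_concerns(hits: list[BlatHit], min_coverage: float = 0.5,
                        min_identity: float = 90.0) -> list[BlatHit]:
    """Hits, other than full-length perfect self-matches, above both assumed thresholds."""
    return [h for h in hits
            if not (h.identity == 100.0 and h.query_coverage == 1.0)
            and h.query_coverage >= min_coverage
            and h.identity >= min_identity]


rows = """\
YourSeq 2000 1 2000 2000 100.0% chr3 - 108044673 108046672 2000
YourSeq 151 475 927 2000 86.5% chr11 + 23507509 23671916 164408
YourSeq 98 197 672 2000 92.4% chr6 - 87596290 87597114 825"""
hits = [parse_row(r) for r in rows.splitlines()]
print(off_target_concerns(hits))  # []
```

Excluding the self-match by identity and coverage keeps the sketch short. A stricter pipeline would exclude it by its known chr3 coordinates instead.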

BLAT Search Results (down)

Columns: Q_START/Q_END are positions within the 2000 bp query; T_START/T_END are the genomic coordinates of the hit; SPAN is the genomic extent of the hit (bp).

QUERY    SCORE  Q_START  Q_END  QSIZE  IDENTITY  CHROM  STRAND  T_START    T_END      SPAN
YourSeq  2000   1        2000   2000   100.0%    chr3   -       108039239  108041238  2000
YourSeq  361    77       565    2000   89.4%     chr1   +       141255913  141256398  486
YourSeq  138    1492     1986   2000   74.9%     chr13  +       111378139  111378631  493
YourSeq  136    1285     1910   2000   87.1%     chr6   +       119169457  119170146  690
YourSeq  134    1671     2000   2000   82.4%     chrX   -       152118942  152119248  307
YourSeq  131    1543     1998   2000   78.3%     chr9   -       79425540   79426044   505
YourSeq  131    1436     1913   2000   88.8%     chr2   -       126438349  126438863  515
YourSeq  110    1197     1893   2000   82.4%     chr6   +       32390292   32391057   766
YourSeq  102    1676     1918   2000   88.8%     chr11  -       3965916    3966162    247
YourSeq  101    1653     1993   2000   83.1%     chr1   -       73646273   73646807   535
YourSeq  101    1681     1986   2000   89.8%     chr11  +       25956847   25957152   306
YourSeq  97     1287     1982   2000   73.5%     chr15  +       5702781    5703544    764
YourSeq  93     1201     1749   2000   91.1%     chr12  +       56220253   56220855   603
YourSeq  89     1675     2000   2000   88.6%     chr1   +       172036519  172036857  339
YourSeq  87     1197     1986   2000   86.6%     chr10  +       104336592  104337500  909
YourSeq  85     1200     1843   2000   84.5%     chr9   +       107041601  107042330  730
YourSeq  84     1761     2000   2000   88.2%     chr5   -       88472682   88472924   243
YourSeq  83     1681     1986   2000   91.1%     chr10  -       75069642   75069958   317
YourSeq  83     1530     1986   2000   91.9%     chr2   +       129975164  129975647  484
YourSeq  82     1762     1932   2000   90.2%     chr6   -       100957321  100957804  484

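Note: The 2000 bp section downstream of the stop codon is BLAT-searched against the genome. Apart from the expected full-length match (100% identity) at the target locus on chr3, no significant similarity is found, and every other hit covers only part of the query.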
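The full-length self-match also pins down the flank's genomic position. Gstm4 lies on the reverse strand, so the hit is on the "-" strand. Query position 1, presumably the first base after the stop codon, therefore maps to the highest genomic coordinate of the hit, T_END. The following is a minimal sketch of that mapping. It assumes an ungapped alignment, which holds for this self-match because identity is 100% and the span equals the query size.

```python
def query_to_genome(q_pos: int, q_start: int, t_start: int, t_end: int, strand: str) -> int:
    """Map a 1-based query position to a genomic coordinate, assuming an ungapped hit."""
    if strand == "+":
        return t_start + (q_pos - q_start)
    if strand == "-":
        # The query is reverse-complemented onto the genome: Q_START pairs with T_END.
        return t_end - (q_pos - q_start)
    raise ValueError(f"unknown strand: {strand!r}")


# Downstream self-match: Q_START 1, T_START 108039239, T_END 108041238, strand "-".
first = query_to_genome(1, 1, 108039239, 108041238, "-")
last = query_to_genome(2000, 1, 108039239, 108041238, "-")
print(first, last)  # 108041238 108039239
```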


Gene and protein information: Gstm4 glutathione S-transferase, mu 4 [Mus musculus (house mouse)]. Gene ID: 14865, updated on 10-Oct-2019

Gene summary

Official Symbol: Gstm4 (provided by MGI)
Official Full Name: glutathione S-transferase, mu 4 (provided by MGI)
Primary source: MGI:MGI:95862
See related: Ensembl:ENSMUSG00000027890
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Gstb4; Gstb-4; GSTM7-7; 1110004G14Rik
Expression: Broad expression in bladder adult (RPKM 39.1), liver adult (RPKM 27.9) and 27 other tissues

Genomic context

Location: chromosome 3; 3 F2.3

Exon count: 10

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  3    NC_000069.6 (108030571..108045179, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    3    NC_000069.5 (107843326..107847777, complement)
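The NCBI locations use a `start..end` span with an optional `complement` qualifier for the reverse strand. The sketch below parses the form shown in this table and derives the strand and the inclusive length. Its regular expression handles only this simple form, not the full INSDC location grammar, which also covers joins and fuzzy ends.

```python
import re

# Matches "(start..end)" or "(start..end, complement)" as written in the table above.
LOCATION = re.compile(r"\((\d+)\.\.(\d+)(, complement)?\)")


def parse_location(text: str) -> tuple[int, int, str, int]:
    """Return (start, end, strand, inclusive length) for a simple NCBI span."""
    m = LOCATION.search(text)
    if m is None:
        raise ValueError(f"no location found in {text!r}")
    start, end = int(m.group(1)), int(m.group(2))
    strand = "-" if m.group(3) else "+"
    return start, end, strand, end - start + 1


print(parse_location("NC_000069.6 (108030571..108045179, complement)"))
# (108030571, 108045179, '-', 14609)
print(parse_location("NC_000069.5 (107843326..107847777, complement)"))
# (107843326, 107847777, '-', 4452)
```

The annotated span differs between the two releases (14,609 bp in release 108 vs 4,452 bp in Build 37.2). Coordinates should therefore always be quoted together with their assembly.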

[Figure: position of Gstm4 on chromosome 3 (NC_000069.6).]


Transcript information: This gene has 3 transcripts

Gene: Gstm4 ENSMUSG00000027890

Description: glutathione S-transferase, mu 4 [Source:MGI Symbol;Acc:MGI:95862]
Gene Synonyms: 1110004G14Rik, Gstb-4, Gstb4
Location: Chromosome 3: 108,040,408-108,044,894, reverse strand (GRCm38:CM000996.2)
About this gene: This gene has 3 transcripts (splice variants), 215 orthologues and 16 paralogues, is a member of 1 Ensembl protein family, and is associated with 1 phenotype.

Transcripts

Name       Transcript ID          bp    Protein  Translation ID         Biotype         CCDS       UniProt  Flags
Gstm4-201  ENSMUST00000029489.14  1707  218aa    ENSMUSP00000029489.8   Protein coding  CCDS17748  Q8R5I6   TSL:1 GENCODE basic APPRIS P1
Gstm4-203  ENSMUST00000178808.7   1462  184aa    ENSMUSP00000136643.1   Protein coding  CCDS51045  A2AE91   TSL:1 GENCODE basic
Gstm4-202  ENSMUST00000106670.1   777   184aa    ENSMUSP00000102281.1   Protein coding  CCDS51045  A2AE91   TSL:3 GENCODE basic

[Figure: Ensembl genome browser view of a 24.49 kb region of chromosome 3 (108.035 to 108.050 Mb; contig AL671877.15). Forward strand: Gm25592-201 (snoRNA). Reverse strand: Gstm4-201, Gstm4-203 and Gstm4-202 (protein coding) and Gm12499-201 (unprocessed pseudogene). The Regulatory Build track shows CTCF, enhancer, open chromatin, promoter and promoter flank features.]


Transcript: ENSMUST00000029489 (Gstm4-201, protein coding, reverse strand, 4.49 kb)

Protein domain annotations of the encoded protein (218 aa):

Superfamily: Thioredoxin-like superfamily; Glutathione S-transferase, C-terminal domain superfamily
SFLD: SFLDG00363; SFLDG01205
Prints: Glutathione S-transferase, Mu class
Pfam: Glutathione S-transferase, N-terminal; Glutathione S-transferase, C-terminal
PROSITE profiles: Glutathione S-transferase, N-terminal; Glutathione S-transferase, C-terminal-like
PANTHER: PTHR11571; PTHR11571:SF229
Gene3D: 1.20.1050.10; 3.40.30.10
CDD: cd03075; cd03209

[Figure: protein feature tracks (scale 0 to 218 aa) with sequence variants from dbSNP and all other sources: missense, splice region and synonymous variants.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
