https://www.alphaknockout.com

Mouse Gsta4 Knockout Project (CRISPR/Cas9)

Objective: To create a Gsta4 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Gsta4 gene (NCBI Reference Sequence: NM_010357; Ensembl: ENSMUSG00000032348) is located on mouse chromosome 9. Seven exons are identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 7 (Transcript: ENSMUST00000034903). Exons 2-6 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for disruptions in this gene display increased sensitivity to oxidative stress, increased alanine and aspartate transaminase levels, increased susceptibility to bacterial infection, reduced litter size, and impaired glucose tolerance.

The coding region starts in exon 2. Exons 2-6 cover 81.98% of the coding region. The size of the effective KO region is ~9663 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wild-type allele of mouse Gsta4, drawn 5' to 3', showing exons 1-7 with a gRNA region on each side of the knockout region. Legend: exon of mouse Gsta4; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot of the upstream section against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot of the downstream section against itself. Legend: Forward; Reverse Complement.]

Note: The 1124 bp section downstream of Exon 6 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
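The self-alignment behind a dot plot can be sketched as follows. This is a minimal illustration, assuming exact matching of 15 bp windows on both strands; the function names are hypothetical, and the tool used to draw the plots in this report may score windows differently.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 15):
    """Compare a sequence with itself using exact window matches.

    Returns two lists of (i, j) start positions (0-based):
      forward: window at i equals window at j, with j > i
               (the trivial main diagonal i == j is skipped)
      reverse: window at i equals the reverse complement of window at j
    Off-diagonal forward dots lying on a line parallel to the main
    diagonal are the usual signature of a direct (tandem) repeat.
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1

    # Index every window by its sequence for fast lookup.
    index = defaultdict(list)
    for i in range(n_windows):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(n_windows):
        kmer = seq[i:i + window]
        forward.extend((i, j) for j in index[kmer] if j > i)
        reverse.extend((i, j) for j in index.get(reverse_complement(kmer), []))
    return forward, reverse
```

Calling `self_dot_plot(section, window=15)` on a flanking section returns the coordinates that would be drawn as dots; an empty `forward` list means no exact 15 bp direct repeat exists in that section.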


Overview of the GC Content Distribution (up) Window size: 300 bp


Summary: Full Length(2000bp) | A(24.1% 482) | C(20.95% 419) | T(34.55% 691) | G(20.4% 408)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp


Summary: Full Length(1124bp) | A(23.93% 269) | C(20.28% 228) | T(33.19% 373) | G(22.6% 254)

Note: The 1124 bp section downstream of Exon 6 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
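The GC content implied by the two base-count summaries above can be recomputed directly. The sketch below also shows one way a 300 bp sliding-window profile could be computed; the function names are hypothetical, and the window step of 1 bp is an assumption, since the report states only the window size.

```python
def gc_percent_from_counts(a: int, c: int, g: int, t: int) -> float:
    """Overall GC content (%) from base counts."""
    return 100.0 * (g + c) / (a + c + g + t)


def sliding_gc(seq: str, window: int = 300) -> list:
    """GC content (%) of every window of the given size, stepping 1 bp."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in seq]
    count = sum(is_gc[:window])
    profile = [100.0 * count / window]
    for i in range(window, len(seq)):
        # Slide by one base: add the entering base, drop the leaving one.
        count += is_gc[i] - is_gc[i - window]
        profile.append(100.0 * count / window)
    return profile


# Base counts copied from the Summary lines of this report.
upstream_gc = gc_percent_from_counts(a=482, c=419, g=408, t=691)    # 2000 bp section
downstream_gc = gc_percent_from_counts(a=269, c=228, g=254, t=373)  # 1124 bp section
```

With these counts the overall GC content is 41.35% for the upstream section and about 42.88% for the downstream section, in line with the notes that no high GC-content region was found.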


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
YourSeq  2000   1      2000  2000   100.0%    chr9   +       78196291   78198290   2000
YourSeq  176    193    1153  2000   93.1%     chr17  +       71209629   71590785   381157
YourSeq  153    1085   1671  2000   89.6%     chr19  +       5614704    5868958    254255
YourSeq  151    158    1155  2000   90.2%     chr15  -       102053001  102072783  19783
YourSeq  104    1021   1176  2000   93.4%     chr15  +       14985052   14985371   320
YourSeq  99     631    1175  2000   76.1%     chr14  -       91925963   91926119   157
YourSeq  98     1029   1155  2000   91.6%     chr12  +       31680658   31680789   132
YourSeq  97     1039   1155  2000   95.4%     chr3   +       127519424  127519544  121
YourSeq  93     1031   1171  2000   84.3%     chr1   +       69824608   69824726   119
YourSeq  92     1046   1172  2000   93.4%     chrX   -       73530703   73530835   133
YourSeq  91     1071   1645  2000   74.0%     chr14  +       55843159   55843392   234
YourSeq  90     1039   1155  2000   91.7%     chr10  -       80539445   80539568   124
YourSeq  90     1045   1174  2000   88.8%     chr1   +       78291743   78291878   136
YourSeq  89     1038   1155  2000   93.3%     chr11  -       74753514   74754030   517
YourSeq  89     1047   1171  2000   94.1%     chr10  -       114638919  114639047  129
YourSeq  89     1039   1155  2000   94.2%     chr14  +       31122953   31123426   474
YourSeq  87     1038   1155  2000   94.0%     chr12  -       76567949   76568069   121
YourSeq  86     1116   1652  2000   75.6%     chr11  -       9008214    9008449    236
YourSeq  85     1050   1166  2000   93.8%     chr12  -       72093768   72093891   124
YourSeq  83     1039   1155  2000   94.7%     chr18  +       35110931   35111053   123

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
YourSeq  1124   1      1124  1124   100.0%    chr9   +       78207934   78209057   1124
YourSeq  119    610    992   1124   89.5%     chr17  +       26331691   26332118   428
YourSeq  108    610    931   1124   72.6%     chr3   +       67556526   67556829   304
YourSeq  98     664    933   1124   79.4%     chr3   -       108608404  108608603  200
YourSeq  77     727    989   1124   79.8%     chr4   +       100460109  100460310  202
YourSeq  65     675    940   1124   91.2%     chr2   +       51359698   51360057   360
YourSeq  52     874    992   1124   75.9%     chr6   -       112558296  112558391  96
YourSeq  38     840    948   1124   95.3%     chr4   -       3419225    3419751    527
YourSeq  38     902    951   1124   93.1%     chr16  -       44426236   44426289   54
YourSeq  37     696    754   1124   81.4%     chr16  +       65029297   65029355   59
YourSeq  34     730    783   1124   89.5%     chr18  -       77349236   77349288   53
YourSeq  34     730    793   1124   87.5%     chr19  +       36728326   36728388   63
YourSeq  27     612    644   1124   91.0%     chr4   -       83953367   83953399   33
YourSeq  27     961    991   1124   96.6%     chr16  +       90993695   90993727   33
YourSeq  26     609    640   1124   90.7%     chr10  +       99567154   99567185   32
YourSeq  25     268    295   1124   96.5%     chr1   +       179485599  179485627  29
YourSeq  24     615    644   1124   90.0%     chr5   -       48339128   48339157   30

Note: The 1124 bp section downstream of Exon 6 is BLAT searched against the genome. No significant similarity is found.
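BLAT hit lists like the two above can be screened programmatically. The sketch below parses hit rows into records and lists any hit, other than the full-length self match, whose score reaches a cutoff. The names `BlatHit`, `parse_blat_rows` and `secondary_hits` are hypothetical, and the cutoff of 200 is an illustrative assumption: the report does not state what score it treats as significant.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_rows(text: str) -> list:
    """Parse whitespace-separated BLAT hit rows, skipping headers."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        # Tolerate the "browser details" link text found in raw web output.
        if fields[:2] == ["browser", "details"]:
            fields = fields[2:]
        if len(fields) != 11 or fields[0] != "YourSeq":
            continue
        score, q_start, q_end, q_size = map(int, fields[1:5])
        identity = float(fields[5].rstrip("%"))
        t_start, t_end, span = map(int, fields[8:11])
        hits.append(BlatHit(score, q_start, q_end, q_size, identity,
                            fields[6], fields[7], t_start, t_end, span))
    return hits


def secondary_hits(hits, min_score: int = 200) -> list:
    """Hits at or above min_score, excluding the full-length self match."""
    return [h for h in hits
            if h.score >= min_score
            and not (h.score == h.q_size and h.identity == 100.0)]


# First three rows of the upstream table in this report.
SAMPLE_ROWS = """\
YourSeq 2000 1 2000 2000 100.0% chr9 + 78196291 78198290 2000
YourSeq 176 193 1153 2000 93.1% chr17 + 71209629 71590785 381157
YourSeq 153 1085 1671 2000 89.6% chr19 + 5614704 5868958 254255
"""
sample_hits = parse_blat_rows(SAMPLE_ROWS)
```

In these rows SPAN equals END - START + 1 on the target, so a large span paired with a low score, as in the chr17 row, indicates a short match stretched across a long gapped region rather than a second copy of the query.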


Gene and protein information: Gsta4 glutathione S-transferase, alpha 4 [ Mus musculus (house mouse) ] Gene ID: 14860, updated on 1-Oct-2019

Gene summary

Official Symbol: Gsta4 (provided by MGI)
Official Full Name: glutathione S-transferase, alpha 4 (provided by MGI)
Primary source: MGI:MGI:1309515
See related: Ensembl:ENSMUSG00000032348
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: GST5.7; mGsta4
Expression: Biased expression in bladder adult (RPKM 693.7), stomach adult (RPKM 118.2) and 1 other tissue

Genomic context

Location: 9; 9 E1

Exon count: 7

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  9    NC_000075.6 (78191966..78209349)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    9    NC_000075.5 (78039773..78057156)

Chromosome 9 - NC_000075.6


Transcript information: This gene has 2 transcripts

Gene: Gsta4 ENSMUSG00000032348

Description: glutathione S-transferase, alpha 4 [Source:MGI Symbol;Acc:MGI:1309515]
Gene Synonyms: GST 5.7, mGsta4
Location: Chromosome 9: 78,182,774-78,209,349 forward strand. GRCm38:CM001002.2
About this gene: This gene has 2 transcripts (splice variants), 214 orthologues, 16 paralogues, is a member of 1 Ensembl protein family and is associated with 12 phenotypes.

Transcripts

Name       Transcript ID         bp   Protein  Translation ID        Biotype         CCDS       UniProt     Flags
Gsta4-201  ENSMUST00000034903.6  977  222aa    ENSMUSP00000034903.5  Protein coding  CCDS23357  P24472      TSL:1, GENCODE basic, APPRIS P1
Gsta4-202  ENSMUST00000213215.1  743  105aa    ENSMUSP00000149763.1  Protein coding  -          A0A1L1SS61  CDS 3' incomplete, TSL:3

[Figure: Ensembl region view, 46.58 kb, chromosome 9 from 78.18 Mb to 78.21 Mb. Forward strand: Gsta4-201 (protein coding), Gsta4-202 (protein coding), C920006O11Rik-201 (lncRNA), C920006O11Rik-202 (transcribed unprocessed pseudogene). Reverse strand: Rn7sk-201 (misc RNA), Gm3126-201 (processed pseudogene). Contig: AC159313.3. Additional tracks: Regulatory Build, with legend entries CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site.]


Transcript: ENSMUST00000034903

[Figure: Ensembl protein summary for Gsta4-201 (protein coding), 17.40 kb, forward strand; scale bar 0-222 amino acids.]

Protein domain and feature annotations shown in the figure:
Superfamily: Thioredoxin-like superfamily; Glutathione S-transferase, C-terminal domain superfamily
SFLD: SFLDG01205; Glutathione Transferase family
Prints: Glutathione S-transferase, alpha class
Pfam: Glutathione S-transferase, N-terminal; Glutathione S-transferase, C-terminal
PROSITE profiles: Glutathione S-transferase, N-terminal; Glutathione S-transferase, C-terminal-like
PANTHER: PTHR11571; PTHR11571:SF101
Gene3D: 1.20.1050.10; 3.40.30.10
CDD: cd03077; cd03208
Other tracks: PDB-ENSP mappings; Low complexity (Seg); sequence variants (dbSNP and all other sources), with legend entries missense variant, splice region variant, synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
