https://www.alphaknockout.com

Mouse Gsta4 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Gsta4 conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Gsta4 gene (NCBI Reference Sequence: NM_010357; Ensembl: ENSMUSG00000032348) is located on mouse chromosome 9. Seven exons have been identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 7 (Transcript: ENSMUST00000034903). Exon 2 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Gsta4 gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-360F18 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for disruptions in this gene display increased sensitivity to oxidative stress, increased alanine and aspartate transaminase levels, increased susceptibility to bacterial infection, reduced litter size, and impaired glucose tolerance.

Exon 2 contains the ATG start codon and therefore the beginning of the coding region. Knockout of exon 2 will result in a frameshift of the gene. Intron 1, used for 5'-loxP site insertion, is 6205 bp; intron 2, used for 3'-loxP site insertion, is 1113 bp. The effective cKO region is ~587 bp and does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (5' to 3', with the two gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Exons 1, 2, 3 and 7 are labeled. Legend: exon of mouse Gsta4; homology arm; cKO region; loxP site.]
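The relationship between the targeted allele and the constitutive KO allele can be illustrated with a toy model. The sketch below is only an illustration, not part of the project workflow: it represents an allele as a list of feature names (the names and the helper `cre_recombine` are hypothetical) and assumes both loxP sites are in the same orientation, so that Cre excises the intervening segment and leaves a single loxP site.

```python
# Toy model of Cre-mediated excision; feature names are illustrative only.
LOXP = "loxP"


def cre_recombine(allele):
    """Return the allele after excision between the first two loxP sites.

    Assumes both loxP sites share the same orientation, so the segment
    between them is deleted and a single loxP site remains.
    """
    sites = [i for i, feature in enumerate(allele) if feature == LOXP]
    if len(sites) < 2:
        return list(allele)  # fewer than two sites: nothing to excise
    first, second = sites[0], sites[1]
    # Keep everything up to and including the first loxP, then resume
    # after the second loxP.
    return list(allele[:first + 1]) + list(allele[second + 1:])


wildtype = ["exon1", "exon2", "exon3", "exon4", "exon5", "exon6", "exon7"]
# Targeted allele: loxP sites in intron 1 and intron 2 flank exon 2.
targeted = ["exon1", LOXP, "exon2", LOXP,
            "exon3", "exon4", "exon5", "exon6", "exon7"]
knockout = cre_recombine(targeted)

print(knockout)
```

In this model the wildtype allele is returned unchanged, while the targeted allele loses exon 2 and keeps one loxP site.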


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 against itself, showing forward and reverse-complement matches.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to determine whether there are tandem repeats. Tandem repeats are found in the dot plot matrix, so it may be difficult to construct this targeting vector.
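The self-alignment described in the note can be sketched as an exact-match word search. This is a minimal illustration under stated assumptions, not the tool used to produce the report: it treats the window as an exact 10-mer, and the function names and the toy sequence are hypothetical. Off-diagonal forward matches at a constant offset are what a tandem repeat looks like in such a plot.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def dot_plot_matches(seq, window=10):
    """Find exact window matches of a sequence against itself.

    Returns (forward, reverse): lists of (i, j) start positions where the
    word at i equals the word at j (forward, main diagonal excluded) or
    equals the reverse complement of the word at j (reverse).
    """
    seq = seq.upper()
    index = {}
    for i in range(len(seq) - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)

    forward, reverse = [], []
    for i in range(len(seq) - window + 1):
        word = seq[i:i + window]
        for j in index[word]:
            if j != i:  # skip the trivial self-match on the main diagonal
                forward.append((i, j))
        for j in index.get(reverse_complement(word), []):
            reverse.append((i, j))
    return forward, reverse


# Hypothetical toy sequence: a 10 bp unit repeated three times in tandem,
# flanked by unrelated sequence.
unit = "ACGTACCTGA"
toy = "GGCATTCAGT" + unit * 3 + "TTGACCGTAA"
forward, reverse = dot_plot_matches(toy, window=10)
offsets = sorted({abs(j - i) for i, j in forward})
print(offsets)
```

For a real 7 kb sequence the same function applies unchanged; plotting the returned coordinate pairs gives the dot plot matrix.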

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(7087bp) | A(25.22% 1787) | C(22.55% 1598) | T(31.28% 2217) | G(20.95% 1485)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
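The base composition in the summary can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the variable names are illustrative, and the overall GC percentage is derived here rather than stated in the report.

```python
# Base counts taken from the summary line of the report.
counts = {"A": 1787, "C": 1598, "T": 2217, "G": 1485}

total = sum(counts.values())

# Percentage of each base, rounded to two decimals as in the report.
percent = {base: round(100 * n / total, 2) for base, n in counts.items()}

# Overall GC content of the analyzed sequence.
gc_percent = round(100 * (counts["G"] + counts["C"]) / total, 2)

print(total, percent, gc_percent)
```

The counts sum to the stated full length of 7087 bp, and the overall GC content works out to about 43.5%.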


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr9   +       78195041   78198040   3000
YourSeq  217    1097   1527  3000   88.6%     chr9   -       114698422  114699078  657
YourSeq  217    1078   1759  3000   87.8%     chr13  +       91440596   91441247   652
YourSeq  194    1098   1527  3000   88.1%     chr13  +       55196329   55196715   387
YourSeq  188    1097   1483  3000   90.9%     chr9   -       87047755   87048309   555
YourSeq  182    1097   1495  3000   92.9%     chr1   +       171178949  171179462  514
YourSeq  180    1098   1483  3000   85.4%     chr14  +       25981204   25981456   253
YourSeq  180    1098   1483  3000   85.4%     chr14  +       26260538   26260792   255
YourSeq  163    1096   1298  3000   89.9%     chr13  +       97686437   97686637   201
YourSeq  163    1097   1412  3000   93.6%     chr11  +       95439894   95440498   605
YourSeq  162    1097   1289  3000   92.2%     chr7   -       44424485   44424681   197
YourSeq  161    1094   1291  3000   89.0%     chr18  -       35695250   35695440   191
YourSeq  160    1081   1269  3000   90.6%     chr18  +       6630342    6630522    181
YourSeq  159    1096   1279  3000   93.5%     chr11  -       60803014   60803198   185
YourSeq  158    1098   1269  3000   96.0%     chr12  -       99590894   99591065   172
YourSeq  158    1097   1292  3000   92.4%     chr11  -       103523260  103523454  195
YourSeq  158    1098   1470  3000   89.5%     chr10  -       67142515   67143108   594
YourSeq  158    1095   1269  3000   95.5%     chr5   +       144087591  144087766  176
YourSeq  157    1079   1269  3000   91.1%     chr11  -       57538797   57538987   191
YourSeq  157    1097   1269  3000   95.4%     chr11  -       51628385   51628557   173

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr9   +       78198628   78201627   3000
YourSeq  54     601    688   3000   92.2%     chr11  +       62423252   62423339   88
YourSeq  51     2875   2995  3000   71.1%     chr12  +       64163246   64163366   121
YourSeq  40     2510   2679  3000   88.7%     chr2   -       162173507  162173684  178
YourSeq  40     2839   2886  3000   87.3%     chr17  +       63882806   63882852   47
YourSeq  40     2935   2986  3000   88.5%     chr11  +       74997765   74997816   52
YourSeq  39     2440   2500  3000   95.4%     chr6   -       50555731   50555802   72
YourSeq  39     2876   2926  3000   91.7%     chr11  -       43800798   43800856   59
YourSeq  38     2151   2223  3000   93.2%     chrX   +       151884658  151884732  75
YourSeq  36     2880   2941  3000   85.4%     chr16  -       20136479   20136538   60
YourSeq  34     2177   2222  3000   87.0%     chr1   -       171652203  171652248  46
YourSeq  34     604    687   3000   94.8%     chr1   +       162141503  162141587  85
YourSeq  33     2818   2864  3000   85.2%     chr13  -       27839986   27840032   47
YourSeq  33     2152   2211  3000   97.3%     chr7   +       138508131  138508192  62
YourSeq  33     607    683   3000   91.5%     chr16  +       5148769    5148844    76
YourSeq  32     2016   2144  3000   84.3%     chr7   +       76140400   76140526   127
YourSeq  31     606    641   3000   94.5%     chr19  -       27642980   27643019   40
YourSeq  30     2945   2984  3000   87.5%     chr17  -       18868772   18868811   40
YourSeq  29     655    695   3000   85.4%     chr2   -       129022826  129022866  41
YourSeq  29     600    636   3000   89.2%     chr6   +       84244625   84244661   37

Note: The 3000 bp section downstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
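The top hit in each BLAT table gives the genomic position of the queried 3000 bp section, and the two positions can be cross-checked against the stated size of the cKO region. The sketch below is a small consistency check, assuming the displayed coordinates are 1-based and inclusive (which agrees with the SPAN column: 78198040 - 78195041 + 1 = 3000); the variable and function names are illustrative.

```python
# Top (100% identity) hits from the two BLAT tables, chr9, + strand.
upstream = (78195041, 78198040)    # 3000 bp section queried as "up"
downstream = (78198628, 78201627)  # 3000 bp section queried as "down"


def span(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


# The region between the two sections starts one base after the upstream
# section ends and stops one base before the downstream section begins.
gap_start = upstream[1] + 1
gap_end = downstream[0] - 1
gap_length = span(gap_start, gap_end)

print(span(*upstream), span(*downstream), gap_length)
```

Under this assumption the interval between the two sections is 587 bp, which matches the effective cKO region size of ~587 bp given in the strategy summary.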


Gene and protein information: Gsta4 glutathione S-transferase, alpha 4 [Mus musculus (house mouse)] Gene ID: 14860, updated on 1-Oct-2019

Gene summary

Official Symbol: Gsta4 (provided by MGI)
Official Full Name: glutathione S-transferase, alpha 4 (provided by MGI)
Primary source: MGI:MGI:1309515
See related: Ensembl:ENSMUSG00000032348
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: GST5.7; mGsta4
Expression: Biased expression in bladder adult (RPKM 693.7), stomach adult (RPKM 118.2) and 1 other tissue

Genomic context

Location: 9; 9 E1
Exon count: 7

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  9    NC_000075.6 (78191966..78209349)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    9    NC_000075.5 (78039773..78057156)

[Figure: genomic context of Gsta4 on chromosome 9 (NC_000075.6).]


Transcript information: This gene has 2 transcripts

Gene: Gsta4 ENSMUSG00000032348

Description: glutathione S-transferase, alpha 4 [Source:MGI Symbol;Acc:MGI:1309515]
Gene Synonyms: GST 5.7, mGsta4
Location: Chromosome 9: 78,182,774-78,209,349 forward strand. GRCm38:CM001002.2
About this gene: This gene has 2 transcripts (splice variants), 214 orthologues, 16 paralogues, is a member of 1 Ensembl protein family and is associated with 12 phenotypes.

Transcripts

Name       Transcript ID         bp   Protein  Translation ID        Biotype         CCDS       UniProt     Flags
Gsta4-201  ENSMUST00000034903.6  977  222aa    ENSMUSP00000034903.5  Protein coding  CCDS23357  P24472      TSL:1, GENCODE basic, APPRIS P1
Gsta4-202  ENSMUST00000213215.1  743  105aa    ENSMUSP00000149763.1  Protein coding  -          A0A1L1SS61  CDS 3' incomplete, TSL:3

[Figure: Ensembl region view, 46.58 kb, chromosome 9 from 78.18 Mb to 78.21 Mb. Forward strand: C920006O11Rik-201 (lncRNA), C920006O11Rik-202 (transcribed unprocessed pseudogene), Gsta4-201 (protein coding), Gsta4-202 (protein coding). Contig: AC159313.3. Reverse strand: Rn7sk-201 (misc RNA), Gm3126-201 (processed pseudogene). A Regulatory Build track is shown, with legend entries for CTCF, enhancer, open chromatin, promoter, promoter flank and transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana) and non-protein coding (RNA gene; pseudogene).]


Transcript: ENSMUST00000034903

[Figure: protein domain view of Gsta4-201 (protein coding; 17.40 kb, forward strand; protein scale 0 to 222 aa). Tracks shown:
PDB-ENSP mappings
Low complexity (Seg)
Superfamily: Thioredoxin-like superfamily; Glutathione S-transferase, C-terminal domain superfamily
SFLD: SFLDG01205; Glutathione Transferase family
Prints: Glutathione S-transferase, alpha class
Pfam: Glutathione S-transferase, N-terminal; Glutathione S-transferase, C-terminal
PROSITE profiles: Glutathione S-transferase, N-terminal; Glutathione S-transferase, C-terminal-like
PANTHER: PTHR11571; PTHR11571:SF101
Gene3D: 1.20.1050.10; 3.40.30.10
CDD: cd03077; cd03208
Sequence variants (dbSNP and all other sources): missense variant; splice region variant; synonymous variant]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
