https://www.alphaknockout.com

Mouse Gsto2 Knockout Project (CRISPR/Cas9)

Objective: To create a Gsto2 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Gsto2 gene (NCBI Reference Sequence: NM_026619; Ensembl: ENSMUSG00000025069) is located on mouse chromosome 19. Nine exons are identified, with the ATG start codon in exon 4 and the TAA stop codon in exon 9 (Transcript: ENSMUST00000056159). Exons 4~7 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 4 starts from the coding region. Exons 4~7 cover 62.9% of the coding region. The size of the effective KO region is ~4844 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', with exons labeled 1, 4, 5, 6, 7 and 9 and two gRNA regions marked. Legend: exon of mouse Gsto2; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the upstream sequence aligned with itself. Legend: forward; reverse complement.]

Note: The 1330 bp section upstream of Exon 4 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the downstream sequence aligned with itself. Legend: forward; reverse complement.]

Note: The 2000 bp section downstream of Exon 7 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
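The self-alignment described in the two dot plot notes can be illustrated with a minimal Python sketch. It is not the tool used to produce the plots in this report. It assumes exact matching of 15 bp words on the forward strand only (the reverse-complement comparison shown in the plots is omitted), and the function names and the min_hits cut-off are hypothetical. Off-diagonal matches that pile up at one fixed offset are the pattern a tandem repeat produces.

```python
from collections import Counter


def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs with i < j where the window-bp words
    starting at positions i and j of seq are identical (forward strand only)."""
    seq = seq.upper()
    positions = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)
    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


def repeat_offsets(hits, min_hits=5):
    """Group off-diagonal hits by offset (j - i) and keep offsets that
    collect at least min_hits matches. min_hits is an arbitrary,
    hypothetical cut-off, not a value taken from the report."""
    counts = Counter(j - i for i, j in hits)
    return {offset: n for offset, n in counts.items() if n >= min_hits}


if __name__ == "__main__":
    # Toy sequence (not from the report): a 21 bp unit repeated three times.
    toy = "ACGTTGCAAGCTTAGGCTAAC" * 3
    print(repeat_offsets(self_dot_plot(toy, window=15)))
```

A region like the upstream flank would be expected to return few or no offsets, while a region containing tandem repeats would return offsets at multiples of the repeat unit length.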


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the upstream sequence.]

Summary: Full Length(1330bp) | A(25.86% 344) | C(20.38% 271) | T(29.62% 394) | G(24.14% 321)

Note: The 1330 bp section upstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the downstream sequence.]

Summary: Full Length(2000bp) | A(23.65% 473) | C(22.45% 449) | T(27.2% 544) | G(26.7% 534)

Note: The 2000 bp section downstream of Exon 7 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
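The base composition summaries above can be checked with a few lines of Python, and the same sketch shows how a 300 bp sliding-window GC profile could be computed. The base counts are copied from the two summaries. The overall GC percentages are derived here from those counts and are not stated in the report. The function names are hypothetical, and the flanking sequences themselves are not reproduced in this document, so sliding_gc is only demonstrated on a made-up string.

```python
def base_composition(counts):
    """Percentage of each base, rounded to two decimals."""
    total = sum(counts.values())
    return {base: round(100 * n / total, 2) for base, n in counts.items()}


def gc_percent(counts):
    """Overall GC content (%) from a dict of base counts."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


def sliding_gc(seq, window=300, step=1):
    """GC content (%) of each window of length `window`, as (start, gc) pairs."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100 * gc / window))
    return profile


# Base counts copied from the summaries in this report.
UPSTREAM = {"A": 344, "C": 271, "T": 394, "G": 321}      # 1330 bp
DOWNSTREAM = {"A": 473, "C": 449, "T": 544, "G": 534}    # 2000 bp

if __name__ == "__main__":
    for name, counts in (("upstream", UPSTREAM), ("downstream", DOWNSTREAM)):
        print(name, sum(counts.values()), base_composition(counts),
              round(gc_percent(counts), 2))
```

From these counts the overall GC content works out to about 44.51% for the upstream section and 49.15% for the downstream section, which is consistent with the notes that no significant high GC-content region is found.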


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  1330   1      1330  1330   100.0%    chr19  +       47870441   47871770   1330
YourSeq  27     220    247   1330   100.0%    chr15  -       97565040   97565076   37
YourSeq  24     610    635   1330   96.2%     chr1   -       144746866  144746891  26
YourSeq  23     401    426   1330   83.4%     chr1   +       3929130    3929153    24
YourSeq  22     213    234   1330   100.0%    chr10  +       3296076    3296097    22
YourSeq  22     615    638   1330   95.9%     chr1   +       173684840  173684863  24
YourSeq  21     220    240   1330   100.0%    chr6   -       19856286   19856306   21
YourSeq  21     697    717   1330   100.0%    chr1   +       150395270  150395290  21
YourSeq  20     429    448   1330   100.0%    chr1   +       130328216  130328235  20

Note: The 1330 bp section upstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr19  +       47876499   47878498   2000
YourSeq  64     1140   1493  2000   72.8%     chr3   -       145947279  145947504  226
YourSeq  58     1383   1481  2000   91.5%     chr11  -       116734665  116734799  135
YourSeq  55     1399   1495  2000   81.9%     chr3   -       85813826   85813914   89
YourSeq  54     1426   1502  2000   87.9%     chr13  -       93264954   93265029   76
YourSeq  52     1372   1500  2000   90.7%     chr14  +       118312053  118312201  149
YourSeq  50     1398   1493  2000   90.4%     chr7   -       117924486  117924617  132
YourSeq  50     1426   1493  2000   88.0%     chr12  -       80871819   80871884   66
YourSeq  47     1426   1495  2000   85.8%     chr9   -       100527920  100527987  68
YourSeq  47     1425   1491  2000   91.4%     chr16  -       24796535   24796601   67
YourSeq  44     1426   1493  2000   85.0%     chr3   +       37301269   37301334   66
YourSeq  43     1398   1492  2000   92.2%     chr11  -       86781094   86781225   132
YourSeq  43     1426   1493  2000   78.5%     chr17  +       83563587   83563652   66
YourSeq  43     1431   1492  2000   92.4%     chr17  +       56556111   56556554   444
YourSeq  42     1426   1512  2000   77.0%     chr16  -       38314124   38314208   85
YourSeq  42     1431   1480  2000   85.5%     chr7   +       80396758   80396805   48
YourSeq  40     1424   1493  2000   81.5%     chr11  +       15988566   15988633   68
YourSeq  38     1409   1495  2000   75.7%     chr9   -       119590004  119590107  104
YourSeq  35     1427   1495  2000   74.7%     chr7   -       131480041  131480108  68
YourSeq  33     1443   1492  2000   75.6%     chr17  -       29184994   29185039   46

Note: The 2000 bp section downstream of Exon 7 is BLAT searched against the genome. No significant similarity is found.
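As a rough illustration of how the BLAT results could be screened programmatically, the Python sketch below parses result rows and reports secondary hits whose score is a sizeable fraction of the query length. The report does not state what threshold defines "significant similarity", so min_fraction=0.1 is purely an assumption, as are the names BlatHit, parse_hit and secondary_hits. The sample rows are the top two hits copied from each table.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float   # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one BLAT result row; any leading labels such as
    'browser details YourSeq' are ignored by taking the last 10 fields."""
    f = line.split()[-10:]
    return BlatHit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                   float(f[4].rstrip("%")), f[5], f[6],
                   int(f[7]), int(f[8]), int(f[9]))


def secondary_hits(hits, min_fraction=0.1):
    """Hits other than the full-length self match whose score reaches
    min_fraction of the query size (hypothetical threshold)."""
    return [h for h in hits
            if h.score < h.q_size and h.score >= min_fraction * h.q_size]


# Top two rows of the upstream and downstream tables in this report.
UP_ROWS = [
    "YourSeq 1330 1 1330 1330 100.0% chr19 + 47870441 47871770 1330",
    "YourSeq 27 220 247 1330 100.0% chr15 - 97565040 97565076 37",
]
DOWN_ROWS = [
    "YourSeq 2000 1 2000 2000 100.0% chr19 + 47876499 47878498 2000",
    "YourSeq 64 1140 1493 2000 72.8% chr3 - 145947279 145947504 226",
]

if __name__ == "__main__":
    for rows in (UP_ROWS, DOWN_ROWS):
        hits = [parse_hit(r) for r in rows]
        print(secondary_hits(hits))
```

With this assumed threshold neither flank yields a secondary hit: the best non-self scores are 27 of 1330 for the upstream section and 64 of 2000 for the downstream section.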


Gene and protein information: Gsto2 glutathione S-transferase omega 2 [ Mus musculus (house mouse) ] Gene ID: 68214, updated on 12-Aug-2019

Gene summary

Official Symbol: Gsto2 (provided by MGI)
Official Full Name: glutathione S-transferase omega 2 (provided by MGI)
Primary source: MGI:MGI:1915464
See related: Ensembl:ENSMUSG00000025069
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: GSTO 2-2; 1700020F09Rik; 4930425C18Rik
Expression: Restricted expression toward testis adult (RPKM 98.1)

Genomic context

Location: 19; 19 D1

Exon count: 10

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  19   NC_000085.6 (47865438..47886308)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    19   NC_000085.5 (47940284..47960795)

[Figure: genomic context of Gsto2 on chromosome 19 (NC_000085.6).]


Transcript information: This gene has 9 transcripts

Gene: Gsto2 ENSMUSG00000025069

Description: glutathione S-transferase omega 2 [Source:MGI Symbol;Acc:MGI:1915464]
Gene Synonyms: 1700020F09Rik, 4930425C18Rik
Location: Chromosome 19: 47,865,534-47,886,324 forward strand. GRCm38:CM001012.2
About this gene: This gene has 9 transcripts (splice variants), 522 orthologues, 15 paralogues, is a member of 1 Ensembl protein family and is associated with 2 phenotypes.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt         Flags
Gsto2-201  ENSMUST00000056159.10  1351  248aa       ENSMUSP00000052592.4  Protein coding           CCDS29894  Q8K2Q2          TSL:1, GENCODE basic, APPRIS P2
Gsto2-202  ENSMUST00000120645.7   1277  248aa       ENSMUSP00000113409.1  Protein coding           CCDS29894  Q8K2Q2          TSL:1, GENCODE basic, APPRIS P2
Gsto2-203  ENSMUST00000135016.2   1145  248aa       ENSMUSP00000119680.2  Protein coding           CCDS29894  D3Z1Q9, Q8K2Q2  TSL:5, GENCODE basic, APPRIS P2
Gsto2-205  ENSMUST00000235896.1   1174  214aa       ENSMUSP00000157838.1  Protein coding           -          A0A494BB82      GENCODE basic, APPRIS ALT2
Gsto2-209  ENSMUST00000238084.1   1174  214aa       ENSMUSP00000158414.1  Protein coding           -          A0A494BB82      GENCODE basic, APPRIS ALT2
Gsto2-207  ENSMUST00000237514.1   1118  192aa       ENSMUSP00000158279.1  Protein coding           -          A0A494BAY2      GENCODE basic, APPRIS ALT2
Gsto2-204  ENSMUST00000235857.1   1000  64aa        ENSMUSP00000158441.1  Nonsense mediated decay  -          A0A494BBG0      -
Gsto2-206  ENSMUST00000236449.1   685   No protein  -                     lncRNA                   -          -               -
Gsto2-208  ENSMUST00000237607.1   539   No protein  -                     lncRNA                   -          -               -


[Figure: Ensembl region view, 40.79 kb, spanning 47.86Mb to 47.89Mb, with forward-strand and reverse-strand gene tracks (comprehensive set), contigs and the Regulatory Build.]

Forward strand transcripts:
Gsto1-201 protein coding; Gsto1-202 retained intron; Gsto1-203 retained intron; Gsto1-204 lncRNA; Gsto1-205 protein coding; Gsto1-206 nonsense mediated decay
Gsto2-201 protein coding; Gsto2-202 protein coding; Gsto2-203 protein coding; Gsto2-204 nonsense mediated decay; Gsto2-205 protein coding; Gsto2-206 lncRNA; Gsto2-207 protein coding; Gsto2-208 lncRNA; Gsto2-209 protein coding

Reverse strand transcripts:
Cfap43-201 retained intron; Itprip-201 protein coding

Contig: AC126679.2

Regulation Legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site

Gene Legend:
Protein Coding: merged Ensembl/Havana; Ensembl protein coding
Non-Protein Coding: processed transcript; RNA gene


Transcript: ENSMUST00000056159

[Figure: transcript and protein domain view of Gsto2-201 (protein coding), 20.56 kb, forward strand; protein ENSMUSP00000052...; scale bar 0 to 248. Variant legend: missense variant; synonymous variant.]

Protein domains and features:
Superfamily: Thioredoxin-like superfamily; Glutathione S-transferase, C-terminal domain superfamily
SFLD: Glutathione Transferase family; SFLDG00358
Prints: Glutathione S-transferase, omega-class
Pfam: Glutathione S-transferase, N-terminal
PROSITE profiles: Glutathione S-transferase, N-terminal; Glutathione S-transferase, C-terminal-like
PANTHER: PTHR43968; PTHR43968:SF4
Gene3D: 3.40.30.10; 1.20.1050.10
CDD: cd03055; cd03184
Sequence variants: dbSNP and all other sources

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
