https://www.alphaknockout.com

Mouse Tox4 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Tox4 conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Tox4 gene (NCBI Reference Sequence: NM_023434; Ensembl: ENSMUSG00000016831) is located on mouse chromosome 14. Ten exons have been identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 10 (Transcript: ENSMUST00000022766). Exons 8-9 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Tox4 gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-137N7 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 8 starts at about 48.03% of the coding region. Knockout of exons 8-9 will result in a frameshift of the gene. The size of intron 7 for 5'-loxP site insertion is 633 bp, and the size of intron 9 for 3'-loxP site insertion is 517 bp. The size of the effective cKO region is ~1709 bp. The cKO region does not contain any other known gene.
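The frameshift statement in the note rests on a simple rule: removing coding sequence whose total length is not a multiple of 3 shifts the downstream reading frame. The sketch below illustrates that rule and estimates roughly which residue of the 619 aa protein corresponds to 48.03% of the coding region. The function names and the example exon lengths are hypothetical and chosen only for illustration; the report does not give the coding lengths of exons 8 and 9.

```python
def causes_frameshift(deleted_coding_lengths):
    """Return True if deleting these coding segments (in bp) shifts the reading frame."""
    return sum(deleted_coding_lengths) % 3 != 0


def approx_residue(fraction_of_cds, protein_length):
    """Approximate residue index reached at a given fraction of the coding region."""
    return int(fraction_of_cds * protein_length)


# Hypothetical coding lengths (bp) for two deleted exons; NOT taken from the report.
example_lengths = [112, 96]
print(causes_frameshift(example_lengths))

# 48.03% and 619 aa are the values stated in the report.
print(approx_residue(0.4803, 619))
```

With the real exon 8 and exon 9 coding lengths substituted for the placeholder values, the same check would confirm the frameshift claimed above.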


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (with the 5' and 3' gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Tox4; homology arm; cKO region; exon of mouse Mettl3; loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to determine whether there are tandem repeats. Tandem repeats are found in the dot plot matrix, so it may be difficult to construct this targeting vector.
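A self-comparison dot plot of this kind can be approximated by listing every pair of positions that share an identical window. The following is a minimal sketch under stated assumptions: it uses exact matching only, covers the forward strand only (no reverse-complement matches), and the function name `self_dot_plot` is hypothetical. It is not the tool used to produce the plot in this report.

```python
from collections import defaultdict


def self_dot_plot(seq, window=10):
    """Return sorted (i, j) pairs, i < j, where seq[i:i+window] == seq[j:j+window]."""
    positions = defaultdict(list)
    # Index every window of the sequence by its content.
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    dots = []
    # Any window content seen at two or more positions yields off-diagonal dots.
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


# Toy sequence: an 8 bp unit repeated in tandem (illustrative only).
toy = "ACGTTGCA" * 2
print(self_dot_plot(toy, window=8))
```

Runs of dots parallel to the main diagonal indicate repeated segments; a tandem repeat appears as a diagonal offset by the repeat unit length.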

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(8209bp) | A(27.99% 2298) | C(21.15% 1736) | T(29.15% 2393) | G(21.71% 1782)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
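The overall GC content follows directly from the base counts in the summary line, and a windowed profile can be computed the same way over successive stretches of sequence. The sketch below uses the counts from the report; the function names and the `step` parameter are assumptions, since the report states only the 300 bp window size and not how the window is advanced.

```python
# Base counts taken from the summary line of the report.
counts = {"A": 2298, "C": 1736, "T": 2393, "G": 1782}


def gc_fraction_from_counts(base_counts):
    """Overall GC fraction computed from per-base counts."""
    total = sum(base_counts.values())
    return (base_counts["G"] + base_counts["C"]) / total


def windowed_gc(seq, window=300, step=1):
    """GC fraction of each full window of the sequence, advancing by `step`."""
    seq = seq.upper()
    return [
        (seq[i:i + window].count("G") + seq[i:i + window].count("C")) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


print(sum(counts.values()))                            # total length in bp
print(round(gc_fraction_from_counts(counts) * 100, 2))  # overall GC percentage
```

The counts sum to the stated full length of 8209 bp, and the overall GC content comes to about 42.86%.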


BLAT Search Results (up)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-----------------------------------------------------------------------------------------
YourSeq   3000      1  3000   3000    100.0%  chr14       +   52288226   52291225    3000
YourSeq    309      1   525   3000     87.8%  chr1        +   84880484   84945487   65004
YourSeq    270    122   989   3000     88.8%  chr5        -  122710598  122935871  225274
YourSeq    262      1   497   3000     88.5%  chr15       -   58993141   58994236    1096
YourSeq    229      1   300   3000     91.1%  chr4        -  138022148  138022447     300
YourSeq    226      1   301   3000     94.9%  chr11       -   88333211   88333731     521
YourSeq    222      1   299   3000     91.7%  chr12       -   76834324   76834630     307
YourSeq    221      1   299   3000     90.6%  chrX        +  134618602  134619076     475
YourSeq    216      1  1055   3000     92.5%  chr3        +   94617272   95103049  485778
YourSeq    215    112   526   3000     87.9%  chr17       -   46489044   46691819  202776
YourSeq    212      1   302   3000     86.6%  chr4        +   99164030   99164321     292
YourSeq    210      1   303   3000     90.5%  chr4        -  116535723  116969807  434085
YourSeq    208    122   498   3000     91.4%  chr17       -   27594015   27731757  137743
YourSeq    207      1   302   3000     89.5%  chr19       +    5021787    5022078     292
YourSeq    203     31   523   3000     89.8%  chr2        +  120666895  120667427     533
YourSeq    200      1   301   3000     89.7%  chr17       +   50539825   50540180     356
YourSeq    195      5   299   3000     89.5%  chr11       -    3178374    3178684     311
YourSeq    190    141   516   3000     89.4%  chr15       +   76665317   76665791     475
YourSeq    168    145   521   3000     80.1%  chr13       +   65849782   65850088     307
YourSeq    163      2   291   3000     87.7%  chr2        +  164379891  164380198     308

Note: The 3000 bp section upstream of Exon 8 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-----------------------------------------------------------------------------------------
YourSeq   3000      1  3000   3000    100.0%  chr14       +   52292935   52295934    3000
YourSeq   1195    265  1576   3000     96.6%  chr1        -   16767013   16768329    1317
YourSeq    335    385  2523   3000     91.8%  chr10       -  128158489  128380431  221943
YourSeq    288    383  2526   3000     91.0%  chr1        -  130104096  130464735  360640
YourSeq    253    408  2471   3000     92.4%  chr10       +   80598703   80965560  366858
YourSeq    204    386  2344   3000     91.5%  chr1        -  152760384  152911594  151211
YourSeq    171   1075  1559   3000     85.0%  chr15       +   68764862   68765385     524
YourSeq    168   2249  2526   3000     87.4%  chr16       -   30393230   30393504     275
YourSeq    149    384  1126   3000     91.8%  chr8        +  105583094  105921817  338724
YourSeq    148    383   552   3000     94.1%  chr11       -   31080687   31080856     170
YourSeq    147    384   545   3000     94.4%  chr11       -  118360070  118360230     161
YourSeq    146    387   546   3000     96.3%  chr10       -   19885380   19885541     162
YourSeq    146    385   548   3000     93.8%  chr12       +   69612033   69612194     162
YourSeq    144    383   550   3000     94.6%  chr8        -  126983444  126983627     184
YourSeq    144    383   542   3000     95.6%  chr13       -   99404343   99404504     162
YourSeq    144    381   548   3000     94.1%  chr4        +  108194409  108194584     176
YourSeq    144    385   545   3000     95.6%  chr1        +  140242926  140243087     162
YourSeq    142    388   564   3000     91.4%  chr16       +   64892304   64892476     173
YourSeq    141    393   566   3000     92.8%  chr9        -   62781125   62781299     175
YourSeq    140    385   544   3000     94.4%  chr10       -   11111563   11111723     161

Note: The 3000 bp section downstream of Exon 9 is BLAT searched against the genome. No significant similarity is found.
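The top hit in each BLAT table places the 3000 bp flanking query on chr14, so the span between the two flanks can be checked against the stated cKO region size. The sketch below assumes the coordinates are 1-based and inclusive as displayed, and that the two queries directly abut the region between them; the function names are hypothetical.

```python
def span_length(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def gap_length(left_end, right_start):
    """Number of bases strictly between two 1-based, inclusive intervals."""
    return right_start - left_end - 1


# chr14 coordinates of the 100% identity hits in the two BLAT tables.
upstream = (52288226, 52291225)
downstream = (52292935, 52295934)

print(span_length(*upstream))                 # length of the upstream query
print(span_length(*downstream))               # length of the downstream query
print(gap_length(upstream[1], downstream[0]))  # bases between the two flanks
```

Under these assumptions the gap is 1709 bp, which agrees with the effective cKO region size of ~1709 bp given in the strategy summary.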


Gene information: Tox4, TOX high mobility group box family member 4 [Mus musculus (house mouse)]. Gene ID: 268741, updated on 12-Aug-2019

Gene summary

Official Symbol: Tox4 (provided by MGI)
Official Full Name: TOX high mobility group box family member 4 (provided by MGI)
Primary source: MGI:MGI:1915389
See related: Ensembl:ENSMUSG00000016831
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: LCP1; AA410149; A630040M18; 5730589K01Rik
Expression: Ubiquitous expression in thymus adult (RPKM 22.1), ovary adult (RPKM 18.1) and 28 other tissues

Genomic context

Location: 14; 14 C2

Exon count: 10

Annotation release  Status             Assembly                         Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)     14   NC_000080.6 (52279146..52295509)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)       14   NC_000080.5 (52898821..52915184)

Chromosome 14 - NC_000080.6


Transcript information: This gene has 5 transcripts

Gene: Tox4 ENSMUSG00000016831

Description: TOX high mobility group box family member 4 [Source:MGI Symbol;Acc:MGI:1915389]
Gene Synonyms: 5730589K01Rik
Location: Chromosome 14: 52,279,146-52,296,401 forward strand. GRCm38:CM001007.2
About this gene: This gene has 5 transcripts (splice variants), 259 orthologues, 33 paralogues, is a member of 1 Ensembl protein family and is associated with 5 phenotypes.

Transcripts

Name      Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt  Flags
Tox4-201  ENSMUST00000022766.7  5350  619aa       ENSMUSP00000022766.6  Protein coding   CCDS36920  Q8BU11   TSL:5 GENCODE basic APPRIS P1
Tox4-203  ENSMUST00000152493.1  2997  No protein  -                     Retained intron  -          -        TSL:1
Tox4-202  ENSMUST00000137753.1  709   No protein  -                     Retained intron  -          -        TSL:2
Tox4-204  ENSMUST00000172655.1  344   No protein  -                     Retained intron  -          -        TSL:3
Tox4-205  ENSMUST00000173361.1  675   No protein  -                     lncRNA           -          -        TSL:3


[Figure: Ensembl genome browser view of a 37.26 kb region (52.27-52.30 Mb). Forward strand: Gm23758-201 (snoRNA), Tox4-205 (lncRNA), Tox4-201 (protein coding), and Tox4-203, Tox4-204 and Tox4-202 (retained intron). Reverse strand: Rab2b and Mettl3 transcripts on contig AC126037.4. Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site.]


Transcript: ENSMUST00000022766

[Figure: Ensembl protein view of Tox4-201 (protein coding; 17.26 kb, forward strand; scale bar 0-619 aa). Annotated features: MobiDB lite; low complexity (Seg); coiled-coils (Ncoils); high mobility group box domain (Superfamily, SMART, Pfam, PROSITE profiles, Gene3D); Prints PR00886; PANTHER PTHR45781 (TOX high mobility group box family member 4); CDD cd00084; sequence variants from dbSNP and all other sources (missense and synonymous).]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
