https://www.alphaknockout.com

Mouse Setx Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Setx conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Setx gene (NCBI Reference Sequence: NM_198033; Ensembl: ENSMUSG00000043535) is located on mouse chromosome 2. 26 exons are identified, with the ATG start codon in exon 3 and the TAG stop codon in exon 26 (Transcript: ENSMUST00000061578). Exon 4 is selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Setx gene. To engineer the targeting vector, the homologous arms and the cKO region will be generated by PCR using BAC clone RP23-50I2 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a knock-out allele exhibit male infertility due to arrested male meiosis, as well as reduced female fertility.

Exon 4 starts at about 2.27% of the coding region. Knockout of exon 4 will result in a frameshift of the gene. Intron 3, used for 5'-loxP site insertion, is 3116 bp; intron 4, used for 3'-loxP site insertion, is 3531 bp. The effective cKO region is ~711 bp. The cKO region does not contain any other known gene.
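As a rough worked example of the 2.27% figure, the position at which exon 4 begins within the coding sequence can be estimated. This sketch assumes the coding sequence length equals that of the 2646 aa protein listed for ENSMUST00000061578 plus one stop codon; the variable names and the helper `causes_frameshift` are illustrative and not part of the report, and the result is an approximation only. The coding length of exon 4 itself is not stated in the report, so the helper is not applied to it here.

```python
# Assumption: CDS length = 2646 codons (protein length from the transcript
# table) + 1 stop codon. The 2.27% value is taken from the strategy text.
PROTEIN_LENGTH_AA = 2646
EXON4_START_FRACTION = 0.0227

cds_length_nt = (PROTEIN_LENGTH_AA + 1) * 3
exon4_start_nt = round(cds_length_nt * EXON4_START_FRACTION)
# 1-based codon that contains the estimated first coding base of exon 4
exon4_start_codon = (exon4_start_nt - 1) // 3 + 1


def causes_frameshift(deleted_coding_nt: int) -> bool:
    """A deletion shifts the reading frame when the number of removed
    coding nucleotides is not a multiple of 3."""
    return deleted_coding_nt % 3 != 0


print(f"CDS length: {cds_length_nt} nt")
print(f"Exon 4 starts near coding nt {exon4_start_nt} (codon {exon4_start_codon})")
```

Under these assumptions exon 4 begins within roughly the first 60 codons, which is consistent with an early frameshift disrupting nearly the entire protein.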

Overview of the Targeting Strategy

[Figure: schematic of the targeting strategy. Panels: Wildtype allele (exons 1, 4 and 26 labelled, with the 5' and 3' gRNA regions marked); Targeting vector; Targeted allele; Constitutive KO allele (after Cre recombination). Legend: exon of mouse Setx, homology arm, cKO region, loxP site.]

Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
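The self-alignment described in this note can be sketched as a word-match dot plot. The report does not name the tool it used, so the following is only a minimal illustration: the function names `reverse_complement` and `self_dot_plot` are hypothetical, exact 10 bp word matches are assumed, and only the bases A, C, G and T are handled.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 10):
    """Compare a sequence with itself using exact words of length `window`.

    Returns two lists of (i, j) start positions (0-based):
    forward matches off the main diagonal, and reverse-complement matches.
    """
    seq = seq.upper()
    index = defaultdict(list)
    n_words = len(seq) - window + 1
    for i in range(n_words):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(n_words):
        word = seq[i:i + window]
        for j in index[word]:
            if i != j:  # skip the trivial self-match on the main diagonal
                forward.append((i, j))
        for j in index.get(reverse_complement(word), []):
            reverse.append((i, j))
    return forward, reverse


# Toy input: the 8-mer ACGGTCAT occurs at positions 0 and 10.
fwd, rev = self_dot_plot("ACGGTCATTGACGGTCATAA", window=8)
print(fwd, rev)
```

Long runs of off-diagonal forward matches running parallel to the main diagonal would indicate tandem repeats; isolated points are expected by chance with short windows.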

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(7211bp) | A(27.43% 1978) | C(20.14% 1452) | T(31.34% 2260) | G(21.09% 1521)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
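The overall GC content implied by the summary line, and a sliding-window profile of the kind plotted above, can be computed along these lines. The base counts are taken from the summary; the function `windowed_gc` and its `step` parameter are illustrative assumptions, not the method used to produce the report's figure.

```python
# Base counts copied from the summary line of the report.
BASE_COUNTS = {"A": 1978, "C": 1452, "T": 2260, "G": 1521}

total_bases = sum(BASE_COUNTS.values())
gc_percent = 100 * (BASE_COUNTS["G"] + BASE_COUNTS["C"]) / total_bases


def windowed_gc(seq: str, window: int = 300, step: int = 1) -> list:
    """Return the GC fraction of each window of length `window`,
    moving along the sequence in increments of `step`."""
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


print(f"Total length: {total_bases} bp")
print(f"Overall GC content: {gc_percent:.2f}%")
```

For example, `windowed_gc("GGCCAATT", window=4, step=2)` yields one value per window (GGCC, CCAA, AATT), falling from fully GC to fully AT.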

BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
-----------------------------------------------------------------------------------------
YourSeq  3000   1       3000  3000   100.0%    chr2   +       29126931   29129930   3000
YourSeq  489    351     928   3000   94.9%     chr2   -       29101947   29102541   595
YourSeq  471    951     1713  3000   89.1%     chr5   +       100076711  100077313  603
YourSeq  454    952     1713  3000   88.8%     chr18  -       67179648   67180236   589
YourSeq  420    949     1660  3000   91.7%     chr9   -       90052437   90053283   847
YourSeq  408    738     1324  3000   89.6%     chr15  +       102550345  102551048  704
YourSeq  386    884     1629  3000   90.0%     chr19  -       4077201    4078252    1052
YourSeq  384    778     1713  3000   85.6%     chr11  -       119011606  119012205  600
YourSeq  348    951     1598  3000   91.9%     chr19  -       40909206   40910059   854
YourSeq  343    949     1579  3000   92.2%     chr5   -       65315549   65316355   807
YourSeq  334    842     1324  3000   91.4%     chr2   -       3497471    3498353    883
YourSeq  328    454     1320  3000   87.4%     chr11  -       54743414   54743877   464
YourSeq  324    949     1532  3000   92.0%     chr6   +       112583352  112584018  667
YourSeq  322    453     1324  3000   87.7%     chr14  -       78951764   78952462   699
YourSeq  319    449     1715  3000   86.4%     chr17  -       73168287   73168729   443
YourSeq  315    950     1713  3000   87.6%     chr3   +       35700270   35700844   575
YourSeq  312    949     1532  3000   90.0%     chrX   -       152426308  152426911  604
YourSeq  312    949     1324  3000   92.3%     chr15  -       85225267   85225659   393
YourSeq  311    949     1532  3000   91.3%     chrX   +       152497405  152498012  608
YourSeq  311    923     1324  3000   91.5%     chr5   +       21770348   21771153   806

Note: The 3000 bp section upstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
-----------------------------------------------------------------------------------------
YourSeq  3000   1       3000  3000   100.0%    chr2   +       29130642   29133641   3000
YourSeq  235    356     1112  3000   91.3%     chr10  +       71378338   72000916   622579
YourSeq  155    315     522   3000   90.2%     chr11  +       60423625   60423833   209
YourSeq  153    323     519   3000   89.0%     chr9   +       59465147   59465332   186
YourSeq  152    323     520   3000   88.5%     chr7   +       114119572  114119764  193
YourSeq  149    326     527   3000   84.0%     chr2   -       121014167  121014359  193
YourSeq  148    323     518   3000   85.8%     chr8   +       84282330   84282512   183
YourSeq  147    321     520   3000   86.9%     chr2   -       90872783   90872971   189
YourSeq  147    322     501   3000   91.1%     chr2   +       51454896   51455083   188
YourSeq  146    323     506   3000   89.5%     chrX   +       162559776  162559954  179
YourSeq  146    246     487   3000   89.5%     chr3   +       129999964  130000500  537
YourSeq  145    323     519   3000   89.6%     chr7   -       49543602   49543794   193
YourSeq  144    323     959   3000   81.2%     chr2   +       101929625  101930019  395
YourSeq  143    323     497   3000   91.4%     chr9   -       32662791   32662967   177
YourSeq  142    317     492   3000   91.0%     chr18  -       20923687   20924169   483
YourSeq  142    944     1185  3000   81.8%     chr11  +       105910684  105910899  216
YourSeq  141    947     1112  3000   92.8%     chr3   -       32579678   32579846   169
YourSeq  141    348     519   3000   93.3%     chr14  +       60970300   60970664   365
YourSeq  140    323     497   3000   90.3%     chrX   -       56876111   56876286   176
YourSeq  140    322     487   3000   92.2%     chr4   -       148163014  148163179  166

Note: The 3000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
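The top hit in each BLAT table gives the genomic position of the corresponding 3000 bp flank, and the gap between the two flanks reproduces the ~711 bp cKO region quoted in the strategy summary. The sketch below assumes that the two queries immediately flank the cKO region and that the coordinates are 1-based and inclusive (consistent with each SPAN of 3000); the names used are illustrative.

```python
# Top (100% identity) hits copied from the upstream and downstream BLAT tables.
UPSTREAM_FLANK = ("chr2", 29126931, 29129930)
DOWNSTREAM_FLANK = ("chr2", 29130642, 29133641)


def span(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


# Assumption: the cKO region is exactly the interval between the two flanks.
cko_start = UPSTREAM_FLANK[2] + 1
cko_end = DOWNSTREAM_FLANK[1] - 1
cko_size = span(cko_start, cko_end)

print(f"Upstream flank:   {span(*UPSTREAM_FLANK[1:])} bp")
print(f"Downstream flank: {span(*DOWNSTREAM_FLANK[1:])} bp")
print(f"cKO region: chr2:{cko_start}-{cko_end} ({cko_size} bp)")
```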

Gene and protein information:

Setx senataxin [Mus musculus (house mouse)]
Gene ID: 269254, updated on 6-Oct-2019

Gene summary

Official Symbol: Setx (provided by MGI)
Official Full Name: senataxin (provided by MGI)
Primary source: MGI:MGI:2443480
See related: Ensembl:ENSMUSG00000043535
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AOA2; Als4; Sen1; SCAR1; AW060766; mKIAA0625; A130090N03; A930037J23Rik
Expression: Broad expression in testis adult (RPKM 26.8), CNS E11.5 (RPKM 4.6) and 19 other tissues
Orthologs: human; all

Genomic context

Location: 2; 2 B

Exon count: 30

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  2    NC_000068.7 (29123588..29182471)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    2    NC_000068.6 (28980512..29037991)

[Figure: genomic context of Setx on Chromosome 2 - NC_000068.7.]

Transcript information: This gene has 5 transcripts

Gene: Setx ENSMUSG00000043535

Description: senataxin [Source:MGI Symbol;Acc:MGI:2443480]
Gene Synonyms: A930037J23Rik, Als4
Location: Chromosome 2: 29,124,181-29,182,471 forward strand. GRCm38:CM000995.2
About this gene: This gene has 5 transcripts (splice variants), 187 orthologues, 12 paralogues, is a member of 1 Ensembl protein family and is associated with 18 phenotypes.

Transcripts

Name      Transcript ID         bp     Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Setx-201  ENSMUST00000061578.8  10970  2646aa      ENSMUSP00000051492.2  Protein coding   CCDS38090  A2AKX3      TSL:5, GENCODE basic, APPRIS P1
Setx-204  ENSMUST00000145422.1  1141   381aa       ENSMUSP00000119176.1  Protein coding   -          F6R186      CDS 5' and 3' incomplete, TSL:5
Setx-202  ENSMUST00000129544.7  525    81aa        ENSMUSP00000119521.1  Protein coding   -          A0A0A0MQJ0  CDS 3' incomplete, TSL:3
Setx-205  ENSMUST00000154910.1  538    No protein  -                     Retained intron  -          -           TSL:3
Setx-203  ENSMUST00000135992.1  577    No protein  -                     lncRNA           -          -           TSL:5

[Figure: Ensembl genome browser view of the Setx locus, 78.29 kb, forward strand. Tracks: Genes (Comprehensive set) showing Setx-201 (protein coding), Setx-202 (protein coding), Setx-203 (lncRNA), Setx-204 (protein coding) and Setx-205 (retained intron); Contigs AL845267.2 and AL772379.11; Regulatory Build. Regulation legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site. Gene legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana); Non-Protein Coding (RNA gene; processed transcript).]

Transcript: ENSMUST00000061578

[Figure: Ensembl protein domain view of Setx-201 (protein coding), 57.48 kb, forward strand, translation ENSMUSP00000051... Tracks: MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils); Superfamily: P-loop containing nucleoside triphosphate hydrolase; Pfam: DNA2/NAM7, AAA domain and DNA2/NAM7 helicase-like, AAA domain; PANTHER: PTHR10887:SF382, PTHR10887; Gene3D: 3.40.50.300; CDD: cd18042, cd18808; Sequence variants (dbSNP and all other sources). Variant legend: inframe insertion, inframe deletion, missense variant, synonymous variant. Scale bar: 0, 400, 800, 1200, 1600, 2000, 2646.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
