https://www.alphaknockout.com

Mouse Spata2 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Spata2 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Spata2 gene (NCBI Reference Sequence: NM_170756; Ensembl: ENSMUSG00000047030) is located on mouse chromosome 2. Three exons are identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 3 (Transcript: ENSMUST00000057627). Exon 2~3 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in the loss of function of the mouse Spata2 gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP24-144E24 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Homozygous knockout leads to small testes, oligospermia, asthenozoospermia, reduced male fertility and decreased male germ cell numbers. It also affects necroptosis and increases inflammatory responses.

Exon 2~3 covers 100.0% of the coding region. The start codon is in exon 2, and the stop codon is in exon 3. The size of intron 1 for 5'-loxP site insertion: 7165 bp. The size of the effective cKO region: ~2396 bp. The cKO region contains no other known gene.
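The effect of Cre recombination on the targeted allele can be illustrated with a minimal sketch. It assumes two loxP sites in the same orientation flanking the cKO region, so that recombination removes the floxed segment and leaves a single loxP site. The loxP sequence below is the commonly cited 34 bp site written in one orientation; it is not taken from this report. The arm and exon strings are hypothetical placeholders, not Spata2 sequence, and `cre_excise` is an illustrative helper name.

```python
# Commonly cited 34 bp loxP site (assumption: not given in this report).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Return the allele after excision between the first two same-orientation sites."""
    first = allele.find(site)
    second = allele.find(site, first + len(site)) if first != -1 else -1
    if first == -1 or second == -1:
        return allele  # fewer than two sites: nothing to excise
    # Keep everything up to and including the first site, then resume
    # after the second site, so exactly one site remains.
    return allele[: first + len(site)] + allele[second + len(site):]


# Toy allele: placeholder strings only, not real Spata2 sequence.
five_arm, floxed, three_arm = "GGGGCCCC", "ATGAAACCCTAG", "TTTTAAAA"
targeted = five_arm + LOXP + floxed + LOXP + three_arm
ko = cre_excise(targeted)
print(len(targeted) - len(ko))  # length removed: floxed segment plus one loxP site
```

In this toy model the constitutive KO allele is the 5' arm, one loxP site, and the 3' arm; a wildtype allele without loxP sites is returned unchanged.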


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (exons 1, 2 and 3, with ATG and TAG marked and the 5' and 3' gRNA regions indicated), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: exon of mouse Spata2, homology arm, cKO region, loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
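A self-alignment of this kind can be sketched as follows. This is a minimal illustration, not the tool used to produce the report: it records every pair of positions whose 10 bp words are identical on the forward strand only (reverse-complement matches are omitted for brevity). The function name `self_dot_plot` and the toy sequence are hypothetical.

```python
def self_dot_plot(seq: str, window: int = 10):
    """Return sorted (i, j) pairs with i < j whose window-length words are identical."""
    positions = {}
    # Index every word of length `window` by its start position.
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)
    hits = []
    # Any word seen at more than one position yields off-diagonal dots.
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


# Toy sequence with one 10 bp word repeated after a 5 bp spacer.
toy = "ACGTACGTAC" + "TTTTT" + "ACGTACGTAC"
print(self_dot_plot(toy, window=10))  # → [(0, 15)]
```

An empty result means no exact repeated word of the chosen length on the forward strand; runs of hits along a line parallel to the main diagonal would indicate a repeat.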

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(8623bp) | A(20.79% 1793) | C(26.42% 2278) | T(26.22% 2261) | G(26.57% 2291)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
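The summary figures can be cross-checked from the reported base counts, and a windowed GC profile can be computed along the same lines. The sketch below assumes a plain sliding window with a step of 1 bp; the report states only the window size, so the step and the function name `gc_windows` are assumptions.

```python
# Base counts as reported in the summary line.
counts = {"A": 1793, "C": 2278, "T": 2261, "G": 2291}
total = sum(counts.values())
percent = {base: round(100 * n / total, 2) for base, n in counts.items()}
gc_percent = round(100 * (counts["G"] + counts["C"]) / total, 2)
print(total, percent, gc_percent)  # total 8623, overall GC 52.99


def gc_windows(seq: str, window: int = 300):
    """GC fraction of every full window of the given size, sliding by 1 bp."""
    seq = seq.upper()
    fractions = []
    for i in range(len(seq) - window + 1):
        chunk = seq[i:i + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


# Toy usage with a short window; the report uses window=300.
print(gc_windows("GGCCAATT", window=4))  # → [1.0, 0.75, 0.5, 0.25, 0.0]
```

The counts sum to the stated full length of 8623 bp and reproduce the listed percentages, giving an overall GC content of about 53%.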


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr2   -       167485726  167488725  3000
YourSeq  258    124    823   3000   87.9%     chr10  -       93526203   93722716   196514
YourSeq  244    178    831   3000   94.7%     chr1   -       155911610  156309830  398221
YourSeq  244    291    831   3000   89.0%     chr1   -       58384149   58384701   553
YourSeq  243    178    823   3000   88.7%     chr11  -       103696668  103842822  146155
YourSeq  243    320    831   3000   91.1%     chr10  +       81348187   81522369   174183
YourSeq  221    311    831   3000   83.4%     chr11  -       100543810  100544165  356
YourSeq  217    323    831   3000   93.0%     chr4   -       148035830  148036350  521
YourSeq  204    338    830   3000   86.3%     chr10  -       117730436  117731042  607
YourSeq  202    127    831   3000   86.6%     chr1   -       35703984   35748417   44434
YourSeq  200    322    831   3000   94.7%     chr16  -       90077940   90124239   46300
YourSeq  197    327    830   3000   92.4%     chr1   -       16609665   16610266   602
YourSeq  196    377    831   3000   85.8%     chr11  +       4853916    4854249    334
YourSeq  185    336    831   3000   86.4%     chr1   -       84875342   84875654   313
YourSeq  184    327    831   3000   86.2%     chr10  -       128450103  128450445  343
YourSeq  174    178    831   3000   88.3%     chr11  -       74855044   74866001   10958
YourSeq  173    278    831   3000   86.1%     chr10  +       128898300  128898547  248
YourSeq  160    659    840   3000   95.6%     chr4   -       116041858  116042072  215
YourSeq  158    439    831   3000   95.0%     chr1   -       59661544   59793311   131768
YourSeq  157    657    831   3000   96.0%     chr11  -       75646244   75646441   198

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr2   -       167480353  167483352  3000
YourSeq  1481   1      1933  3000   92.0%     chr9   -       15431502   15439994   8493
YourSeq  933    1005   1954  3000   98.8%     chr17  +       30505310   30506256   947
YourSeq  733    1046   1919  3000   93.9%     chr8   -       47626661   47627557   897
YourSeq  187    2259   2587  3000   87.5%     chr4   -       129425837  129694967  269131
YourSeq  186    2258   2650  3000   88.1%     chr2   +       26070915   26071354   440
YourSeq  162    2258   2593  3000   86.1%     chr7   -       97392524   97392928   405
YourSeq  162    1791   1954  3000   99.4%     chr10  +       116980646  116980809  164
YourSeq  159    2247   2450  3000   89.2%     chr18  -       35304836   35305037   202
YourSeq  156    2250   2509  3000   86.0%     chr11  +       120652821  120653362  542
YourSeq  149    2244   2428  3000   91.2%     chr8   -       111428833  111429025  193
YourSeq  145    2293   2644  3000   83.0%     chr5   +       92129952   92130556   605
YourSeq  143    2247   2437  3000   89.9%     chr1   +       165504329  165504533  205
YourSeq  142    2258   2644  3000   86.3%     chr5   +       128954105  128954597  493
YourSeq  142    2257   2446  3000   89.1%     chr10  +       118223732  118223920  189
YourSeq  140    1925   2658  3000   80.2%     chr12  +       21551783   21552164   382
YourSeq  140    2254   2437  3000   93.7%     chr11  +       95886154   95886341   188
YourSeq  138    2280   2542  3000   90.1%     chr9   +       56540836   56541270   435
YourSeq  138    2258   2433  3000   91.2%     chr6   +       12658648   12658824   177
YourSeq  138    1920   2443  3000   80.5%     chr10  +       115258307  115258519  213

Note: The 3000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
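BLAT result rows of this shape can be screened programmatically. The sketch below parses whitespace-separated rows and lists hits other than the best (self) hit whose score reaches a chosen fraction of the query size. The report does not state the criterion it uses for "significant", so `min_fraction` is a hypothetical parameter, and `Hit`, `parse_hit` and `secondary_hits` are illustrative names. The sample rows are the first five downstream results.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> Hit:
    """Parse one row: QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = line.split()  # f[0] is the query name, e.g. "YourSeq"
    return Hit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
               float(f[5].rstrip("%")), f[6], f[7],
               int(f[8]), int(f[9]), int(f[10]))


def secondary_hits(hits, min_fraction: float):
    """Hits other than the top-scoring one with score / query size >= min_fraction."""
    best = max(hits, key=lambda h: h.score)
    return [h for h in hits if h is not best and h.score / h.q_size >= min_fraction]


rows = [
    "YourSeq 3000 1 3000 3000 100.0% chr2 - 167480353 167483352 3000",
    "YourSeq 1481 1 1933 3000 92.0% chr9 - 15431502 15439994 8493",
    "YourSeq 933 1005 1954 3000 98.8% chr17 + 30505310 30506256 947",
    "YourSeq 733 1046 1919 3000 93.9% chr8 - 47626661 47627557 897",
    "YourSeq 187 2259 2587 3000 87.5% chr4 - 129425837 129694967 269131",
]
hits = [parse_hit(r) for r in rows]
# 0.2 is an arbitrary illustrative cutoff, not the report's criterion.
for h in secondary_hits(hits, min_fraction=0.2):
    print(h.chrom, h.score, f"{h.identity}%")
```

Which hits are listed depends entirely on the cutoff chosen, so the value should be set to match whatever specificity the downstream PCR or sequencing assay requires.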


Gene and information: Spata2 spermatogenesis associated 2 [ Mus musculus (house mouse) ] Gene ID: 263876, updated on 24-Oct-2019

Gene summary

Official Symbol: Spata2 (provided by MGI)
Official Full Name: spermatogenesis associated 2 (provided by MGI)
Primary source: MGI:MGI:2146885
See related: Ensembl:ENSMUSG00000047030
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI504642; mKIAA0757
Expression: Ubiquitous expression in large intestine adult (RPKM 17.1), colon adult (RPKM 16.8) and 28 other tissues
Orthologs: human

Genomic context

Location: 2; 2 H3

Exon count: 5

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  2    NC_000068.7 (167481136..167493238, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    2    NC_000068.6 (167306636..167318374, complement)

Chromosome 2 - NC_000068.7


Transcript information: This gene has 6 transcripts

Gene: Spata2 ENSMUSG00000047030

Description: spermatogenesis associated 2 [Source:MGI Symbol;Acc:MGI:2146885]
Location: Chromosome 2: 167,481,133-167,492,887 reverse strand. GRCm38:CM000995.2
About this gene: This gene has 6 transcripts (splice variants), 193 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 11 phenotypes.

Transcripts

Name        Transcript ID          bp    Protein     Translation ID        Biotype          CCDS       UniProt  Flags
Spata2-201  ENSMUST00000057627.15  4012  515aa       ENSMUSP00000057095.9  Protein coding   CCDS17100  Q8K004   TSL:1 GENCODE basic APPRIS P1
Spata2-202  ENSMUST00000109211.8   1531  414aa       ENSMUSP00000104834.2  Protein coding   -          Q8K004   TSL:1 GENCODE basic
Spata2-204  ENSMUST00000147051.1   733   No protein  -                     Retained intron  -          -        TSL:1
Spata2-206  ENSMUST00000155875.7   475   No protein  -                     lncRNA           -          -        TSL:2
Spata2-205  ENSMUST00000154770.1   462   No protein  -                     lncRNA           -          -        TSL:2
Spata2-203  ENSMUST00000126389.7   414   No protein  -                     lncRNA           -          -        TSL:2


[Figure: Ensembl genome browser view of a 31.75 kb region of chromosome 2 (167.48 Mb to 167.50 Mb). Forward strand: Slc9a8-201, Slc9a8-202, Slc9a8-203 and Rnf114-202 (protein coding); Slc9a8-206, Slc9a8-208 and Slc9a8-210 (lncRNA). Contig: AL589870.30. Reverse strand: Spata2-201 and Spata2-202 (protein coding), Spata2-204 (retained intron), Spata2-203, Spata2-205 and Spata2-206 (lncRNA). Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana) and non-protein coding (processed transcript; RNA gene).]


Transcript: ENSMUST00000057627

[Figure: Ensembl protein view of Spata2-201 (protein coding, reverse strand, 11.76 kb). Tracks shown: low complexity (Seg); PANTHER PTHR15326; Spermatogenesis-associated protein 2; Gene3D 1.20.58.2190; sequence variants (dbSNP and all other sources), with missense and synonymous variants marked. Scale bar: 0 to 515.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
