https://www.alphaknockout.com

Mouse Spata5 Knockout Project (CRISPR/Cas9)

Objective: To create a Spata5 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Spata5 gene (NCBI Reference Sequence: NM_001163511; Ensembl: ENSMUSG00000027722) is located on mouse chromosome 3. Sixteen exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 16 (Transcript: ENSMUST00000108112). Exons 2~7 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts from about 6.01% of the coding region. Exons 2~7 cover 43.82% of the coding region. The size of the effective KO region is ~9185 bp. The KO region does not contain any other known gene.
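As an illustration of the PCR genotyping step, the sketch below estimates the expected amplicon sizes for a primer pair flanking the deleted region. Only the ~9185 bp KO region size comes from this report; the primer offsets (300 bp and 400 bp) and the function name are hypothetical, and the calculation assumes a clean deletion of the whole region with no inserted or additionally lost bases.

```python
# Only KO_REGION_BP comes from the report; everything else is a
# hypothetical illustration.
KO_REGION_BP = 9185  # approximate size of the effective KO region


def expected_amplicons(fwd_to_edge_bp, edge_to_rev_bp, ko_region_bp=KO_REGION_BP):
    """Return (wildtype_bp, knockout_bp) for primers flanking the deletion.

    fwd_to_edge_bp: bases from the forward primer's 5' end to the upstream
                    edge of the deleted region (hypothetical value).
    edge_to_rev_bp: bases from the downstream edge of the deleted region to
                    the reverse primer's 5' end (hypothetical value).
    Assumes a clean deletion of exactly ko_region_bp bases.
    """
    knockout_bp = fwd_to_edge_bp + edge_to_rev_bp
    wildtype_bp = knockout_bp + ko_region_bp
    return wildtype_bp, knockout_bp


# Hypothetical primer placement: 300 bp upstream and 400 bp downstream.
wildtype_bp, knockout_bp = expected_amplicons(300, 400)
print(f"wildtype: {wildtype_bp} bp, knockout: {knockout_bp} bp")
```

With a deletion this large, the wildtype product from a flanking pair may be too long to amplify under standard conditions, so the short product would indicate the KO allele and sequencing would confirm the junction.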


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1 to 7 and exon 16 of mouse Spata5, with the two gRNA regions flanking the knockout region. Legend: exon of mouse Spata5; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence aligned against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence aligned against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section downstream of Exon 7 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
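The self-alignment behind a dot plot can be sketched as an exact-match search over fixed-size windows. The code below is a minimal illustration, not the tool used for this report: it indexes every 15 bp window of a sequence and reports pairs of positions that match in the forward or reverse-complement orientation. The function names and the toy sequences are assumptions, and real dot plot tools may allow mismatches.

```python
from collections import defaultdict

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def self_dot_matches(seq, window=15):
    """Find exact window matches of a sequence against itself.

    Returns (forward, reverse):
      forward: (i, j) pairs with i < j whose windows are identical
               (off-diagonal dots that would indicate repeats).
      reverse: (i, j) pairs where the window at i equals the reverse
               complement of the window at j.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    forward = sorted(
        (i, j) for positions in index.values()
        for i in positions for j in positions if i < j
    )
    reverse = sorted(
        (i, j)
        for kmer, positions in index.items()
        for i in positions
        for j in index.get(reverse_complement(kmer), [])
    )
    return forward, reverse


# Toy input: a 7 bp unit repeated twice, scanned with a 7 bp window.
forward, reverse = self_dot_matches("ACGTGCA" * 2, window=7)
print("forward repeats:", forward)
```

A sequence without tandem repeats yields few or no off-diagonal forward matches, which is the condition the notes above describe.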


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(27.75% 555) | C(16.45% 329) | T(34.6% 692) | G(21.2% 424)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
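A GC content distribution of this kind can be approximated with a sliding window. The sketch below is a hedged illustration rather than the report's actual method: it computes the GC percentage of each 300 bp window, with the function name, the `step` parameter, and the toy sequence all being assumptions.

```python
def gc_windows(seq, window=300, step=1):
    """Return (start, gc_percent) for each full window along seq.

    window: window size in bp (the report uses 300 bp).
    step:   hypothetical stride between window starts.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc_count = chunk.count("G") + chunk.count("C")
        results.append((start, 100.0 * gc_count / window))
    return results


# Toy input with a small window so the result is easy to inspect.
for start, gc_percent in gc_windows("GGCCAATT", window=4, step=2):
    print(start, gc_percent)
```

Windows with a very high GC percentage would flag stretches that tend to be hard to amplify or sequence.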

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(22.8% 456) | C(17.05% 341) | T(39.25% 785) | G(20.9% 418)

Note: The 2000 bp section downstream of Exon 7 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
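The overall GC content of each flanking section follows directly from the base counts in the two summary lines. The short calculation below uses only those counts; the variable and function names are illustrative.

```python
# Base counts copied from the two "Summary" lines of this report.
UPSTREAM = {"A": 555, "C": 329, "T": 692, "G": 424}
DOWNSTREAM = {"A": 456, "C": 341, "T": 785, "G": 418}


def gc_percent(counts):
    """Return overall GC content (%) from a dict of base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


for label, counts in (("upstream", UPSTREAM), ("downstream", DOWNSTREAM)):
    print(f"{label}: {sum(counts.values())} bp, GC = {gc_percent(counts):.2f}%")
```

Both sections come out slightly below 38% GC overall, which agrees with the notes that no high GC-content region is present.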


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr3   +       37422512   37424511   2000
YourSeq  129    1556    1749  2000   85.9%     chr2   -       123728790  123728987  198
YourSeq  128    1582    1786  2000   83.2%     chr3   -       120427426  120427617  192
YourSeq  123    1551    1731  2000   87.2%     chr17  -       33725791   33725993   203
YourSeq  123    1581    1743  2000   90.2%     chr11  +       30505752   30735514   229763
YourSeq  122    1578    1740  2000   86.3%     chr12  +       11759690   11759847   158
YourSeq  122    1572    1741  2000   88.7%     chr11  +       60549384   60549557   174
YourSeq  121    1552    1743  2000   81.8%     chr16  -       32427347   32427520   174
YourSeq  121    1578    1749  2000   87.2%     chr1   +       7750152    7750330    179
YourSeq  120    1579    1742  2000   86.2%     chr15  +       98687166   98687317   152
YourSeq  120    1577    1730  2000   89.1%     chr1   +       172128088  172128238  151
YourSeq  119    1579    1732  2000   87.5%     chr10  +       3584901    3585052    152
YourSeq  118    1595    1750  2000   88.5%     chr5   -       143970797  143970964  168
YourSeq  116    1578    1742  2000   85.2%     chr2   -       13172107   13172257   151
YourSeq  116    1579    1729  2000   86.9%     chr8   +       18585681   18585827   147
YourSeq  116    1587    1743  2000   84.6%     chr4   +       8250714    8250855    142
YourSeq  115    1578    1748  2000   88.7%     chr2   -       130094229  130094411  183
YourSeq  115    1578    1727  2000   90.3%     chr16  +       20470251   20470400   150
YourSeq  114    1579    1742  2000   83.3%     chr17  -       47160477   47160622   146
YourSeq  114    1580    1732  2000   88.6%     chr1   -       144818710  144818863  154

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
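One way to read the upstream BLAT results is to set the full-length self-hit aside and see how much of the query the best remaining hit covers. The sketch below parses three rows copied from the upstream results; the `BlatHit` type and the helper names are assumptions made for illustration, and the parser takes the last 11 whitespace-separated fields so that a leading "browser details" prefix is tolerated.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one BLAT result row; extra leading tokens are ignored."""
    f = line.split()[-11:]  # QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def query_coverage(hit):
    """Fraction of the query covered by the hit (inclusive coordinates)."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


# Three rows copied from the upstream BLAT results in this report.
rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr3 + 37422512 37424511 2000",
    "YourSeq 129 1556 1749 2000 85.9% chr2 - 123728790 123728987 198",
    "YourSeq 128 1582 1786 2000 83.2% chr3 - 120427426 120427617 192",
]
hits = [parse_hit(row) for row in rows]
self_hit = max(hits, key=lambda h: h.score)
best_other = max((h for h in hits if h is not self_hit), key=lambda h: h.score)
print(best_other.chrom, best_other.score, f"{query_coverage(best_other):.1%}")
```

In these rows the best secondary hit covers under a tenth of the 2000 bp query, which is in line with the note that no significant similarity is found.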

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr3   +       37433697   37435696   2000
YourSeq  92     505     655   2000   89.8%     chr2   -       130727586  130727737  152
YourSeq  74     506     649   2000   78.1%     chrX   +       71839369   71839513   145
YourSeq  73     505     884   2000   96.3%     chr14  +       57796783   57797344   562
YourSeq  72     506     587   2000   95.2%     chr12  -       102671239  102883722  212484
YourSeq  69     504     656   2000   88.4%     chr2   -       164645664  164645815  152
YourSeq  68     505     655   2000   82.0%     chr8   +       87390611   87390761   151
YourSeq  67     523     611   2000   87.5%     chr10  -       118009328  118009414  87
YourSeq  66     504     597   2000   87.3%     chr9   -       89447031   89447123   93
YourSeq  66     507     601   2000   85.4%     chr2   +       25663644   25663735   92
YourSeq  65     513     620   2000   91.2%     chr1   -       39394601   39394713   113
YourSeq  65     552     667   2000   78.5%     chr1   +       152618551  152618667  117
YourSeq  64     509     611   2000   82.0%     chr2   -       24746198   24746299   102
YourSeq  63     504     658   2000   80.5%     chr3   -       108606856  108607009  154
YourSeq  63     510     605   2000   90.8%     chr14  -       54785894   54786190   297
YourSeq  63     502     581   2000   94.6%     chr12  -       80081393   80198490   117098
YourSeq  62     505     599   2000   79.6%     chr1   -       172614330  172614417  88
YourSeq  62     516     658   2000   87.1%     chr16  +       29827893   29828036   144
YourSeq  61     510     598   2000   84.4%     chrX   +       36210681   36210768   88
YourSeq  60     509     656   2000   81.1%     chr10  +       42016642   42016780   139

Note: The 2000 bp section downstream of Exon 7 is BLAT searched against the genome. No significant similarity is found.
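The 100% self-hits in the two BLAT searches give the chr3 coordinates of the upstream and downstream 2000 bp sections. Assuming these sections directly abut the deleted interval (an inference from the coordinates, not something the report states), the size of the region between them can be checked against the ~9185 bp quoted in the strategy summary. The names below are illustrative.

```python
# chr3 coordinates of the 100% self-hits in the two BLAT searches.
UPSTREAM_FLANK = (37_422_512, 37_424_511)
DOWNSTREAM_FLANK = (37_433_697, 37_435_696)


def interval_between(upstream, downstream):
    """Return (start, end, length) of the region strictly between two
    flanks, using 1-based inclusive coordinates."""
    start = upstream[1] + 1
    end = downstream[0] - 1
    return start, end, end - start + 1


start, end, length = interval_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK)
print(f"chr3:{start}-{end} ({length} bp)")
```

Under that assumption the interval length matches the effective KO region size given in the strategy summary.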


Gene and information: Spata5 spermatogenesis associated 5 [ Mus musculus (house mouse) ] Gene ID: 57815, updated on 12-Aug-2019

Gene summary

Official Symbol: Spata5 (provided by MGI)
Official Full Name: spermatogenesis associated 5 (provided by MGI)
Primary source: MGI:MGI:1927170
See related: Ensembl:ENSMUSG00000027722
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Spaf; C78064; 2510048F20Rik
Expression: Ubiquitous expression in CNS E11.5 (RPKM 3.8), placenta adult (RPKM 3.4) and 25 other tissues
Orthologs: human

Genomic context

Location: 3; 3 B
Exon count: 18

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  3    NC_000069.6 (37419903..37579096)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    3    NC_000069.5 (37319202..37478017)

Chromosome 3 - NC_000069.6


Transcript information: This gene has 10 transcripts

Gene: Spata5 ENSMUSG00000027722

Description: spermatogenesis associated 5 [Source:MGI Symbol;Acc:MGI:1927170]
Gene Synonyms: 2510048F20Rik, C78064, Spaf
Location: Chromosome 3: 37,419,896-37,579,096 forward strand. GRCm38:CM000996.2
About this gene: This gene has 10 transcripts (splice variants), 196 orthologues, 5 paralogues, is a member of 1 Ensembl protein family and is associated with 7 phenotypes.

Transcripts

Name        Transcript ID          bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Spata5-202  ENSMUST00000108112.9   3295  893aa       ENSMUSP00000103747.3  Protein coding   CCDS50895  Q3UMC0      TSL:1 GENCODE basic APPRIS ALT2
Spata5-209  ENSMUST00000198968.1   2916  842aa       ENSMUSP00000143349.1  Protein coding   CCDS79900  A0A0G2JFY0  TSL:1 GENCODE basic APPRIS ALT2
Spata5-201  ENSMUST00000029277.12  2890  892aa       ENSMUSP00000029277.8  Protein coding   CCDS17321  A0A0A0MQ80  TSL:1 GENCODE basic APPRIS P3
Spata5-204  ENSMUST00000130674.1   592   No protein  -                     Retained intron  -          -           TSL:3
Spata5-207  ENSMUST00000142199.1   371   No protein  -                     Retained intron  -          -           TSL:2
Spata5-203  ENSMUST00000124347.1   599   No protein  -                     lncRNA           -          -           TSL:3
Spata5-210  ENSMUST00000200093.1   482   No protein  -                     lncRNA           -          -           TSL:3
Spata5-208  ENSMUST00000147687.7   404   No protein  -                     lncRNA           -          -           TSL:3
Spata5-205  ENSMUST00000132958.2   379   No protein  -                     lncRNA           -          -           TSL:3
Spata5-206  ENSMUST00000138225.1   370   No protein  -                     lncRNA           -          -           TSL:5
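As a quick consistency check on the coordinates quoted in this report, the genomic span of the gene can be computed from the Ensembl and NCBI locations given above. The coordinates are copied from the report; the helper name is illustrative, and 1-based inclusive coordinates are assumed.

```python
def span_kb(start, end):
    """Length in kb of a 1-based inclusive genomic interval."""
    return (end - start + 1) / 1000.0


# Coordinates on chromosome 3 (GRCm38) as quoted in this report.
ensembl_kb = span_kb(37_419_896, 37_579_096)  # Ensembl gene location
ncbi_kb = span_kb(37_419_903, 37_579_096)     # NCBI annotation release 108

print(f"Ensembl span: {ensembl_kb:.2f} kb")
print(f"NCBI span:    {ncbi_kb:.2f} kb")
```

The Ensembl span rounds to the 159.20 kb shown for the ENSMUST00000108112 transcript view in this report, while the NCBI start lies 7 bp further downstream.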


[Figure: Ensembl genome browser view of a 179.20 kb region of chromosome 3 (37.42 Mb to 37.58 Mb), forward and reverse strands.
Forward-strand transcripts (comprehensive set): Spata5-201, Spata5-202, Spata5-209 (protein coding); Spata5-204, Spata5-207 (retained intron); Spata5-203, Spata5-205, Spata5-206, Spata5-208, Spata5-210 (lncRNA); Fgf2-208 (protein coding); Gm43484-201 (TEC).
Contig: AL627074.11.
Reverse-strand transcripts (comprehensive set): Nudt6-201, Nudt6-202, Nudt6-203, Nudt6-204, Nudt6-206, Nudt6-207, Nudt6-208, Nudt6-209 (protein coding); Nudt6-205 (lncRNA); Nudt6-210 (nonsense mediated decay); Gm12564-201 (processed pseudogene).
Regulation legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site.
Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (processed transcript; RNA gene; pseudogene).]


Transcript: ENSMUST00000108112

[Figure: protein domain view of Spata5-202 (protein coding; 159.20 kb, forward strand), protein ENSMUSP00000103..., scale bar 0 to 893.
MobiDB lite; Low complexity (Seg)
Superfamily: Aspartate decarboxylase-like domain superfamily; P-loop containing nucleoside triphosphate hydrolase
SMART: AAA+ ATPase domain
Pfam: ATPase, AAA-type, core; AAA ATPase, AAA+ lid domain
PROSITE patterns: ATPase, AAA-type, conserved site
PANTHER: PTHR23077:SF27; PTHR23077
Gene3D: 2.40.40.20; 1.10.8.60; 3.40.50.300
CDD: cd00009
All sequence SNPs/i...: Sequence variants (dbSNP and all other sources)
Variant legend: missense variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
