https://www.alphaknockout.com

Mouse Sipa1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Sipa1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Sipa1 gene (NCBI Reference Sequence: NM_001164481; Ensembl: ENSMUSG00000056917) is located on mouse chromosome 19. Eighteen exons are identified, with the ATG start codon in exon 4 and the TGA stop codon in exon 18 (Transcript: ENSMUST00000071857). Exons 8~11 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Sipa1 gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-25C21 as the template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Homozygous null mice display chronic myelocytic leukemia in either the chronic phase or blast crisis.

Exon 8 starts at about 36.87% of the coding region. Knockout of exons 8~11 will result in a frameshift of the gene. The size of intron 7 (for 5'-loxP site insertion) is 362 bp, and the size of intron 11 (for 3'-loxP site insertion) is 1125 bp. The size of the effective cKO region is ~2224 bp. The cKO region does not contain any other known gene.
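The ~2224 bp figure can be cross-checked against the chr19 coordinates listed in the BLAT search results of this report. The sketch below assumes that the two 3000 bp BLAT query windows directly flank the cKO region; the report does not state this, so the check only shows that the numbers are consistent with each other. The names `UPSTREAM_WINDOW`, `DOWNSTREAM_WINDOW` and `interval_length` are illustrative.

```python
# Top BLAT hits (chr19, minus strand) for the two 3000 bp query windows,
# copied from the BLAT tables in this report. Coordinates are 1-based, inclusive.
UPSTREAM_WINDOW = (5655973, 5658972)    # 3000 bp section upstream of exon 8
DOWNSTREAM_WINDOW = (5650749, 5653748)  # 3000 bp section downstream of exon 11


def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


# Assumption: the cKO region is exactly the gap between the two windows.
# Sipa1 is on the reverse strand, so the "upstream" window has the higher coordinates.
cko_start = DOWNSTREAM_WINDOW[1] + 1
cko_end = UPSTREAM_WINDOW[0] - 1
cko_length = interval_length(cko_start, cko_end)

print(f"chr19:{cko_start}-{cko_end} ({cko_length} bp)")
```

Under that assumption, the gap between the two windows has the same length as the effective cKO region quoted above.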


Overview of the Targeting Strategy

[Figure: schematic of the targeting strategy showing the wildtype allele (exons 1 to 18, 5' to 3', with two gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Sipa1, homology arm, cKO region, loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
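A self-alignment of this kind can be sketched as an exact k-mer comparison. The example below is a minimal illustration, not the tool used to produce the plot: it reports every pair of positions whose 10 bp windows match exactly, either on the forward strand or as a reverse complement. Exact matching and the function name `self_dot_plot` are assumptions; the report states only the 10 bp window size.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 10):
    """Return (forward, reverse) lists of (i, j) dots for a self-comparison.

    A forward dot (i, j), i < j, means seq[i:i+window] == seq[j:j+window].
    A reverse dot (i, j) means seq[i:i+window] equals the reverse
    complement of seq[j:j+window].
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    # Off-diagonal forward matches indicate direct (e.g. tandem) repeats.
    forward = sorted(
        (i, j)
        for positions in index.values()
        for i in positions
        for j in positions
        if i < j
    )

    reverse = []
    for j in range(len(seq) - window + 1):
        rc = reverse_complement(seq[j:j + window])
        for i in index.get(rc, []):
            reverse.append((i, j))
    return forward, sorted(reverse)


# Toy input with a 4 bp unit repeated in tandem (window shortened to 4).
toy_forward, toy_reverse = self_dot_plot("ACGTACGTAC", window=4)
print(toy_forward)
```

Runs of forward dots parallel to the main diagonal correspond to the tandem repeats mentioned in the note.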

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(8655bp) | A(23.52% 2036) | C(26.54% 2297) | T(23.03% 1993) | G(26.91% 2329)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found. It may be difficult to construct this targeting vector.
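The overall GC content follows directly from the base counts in the summary line, and a windowed profile can be computed with a simple sliding window. The sketch below is illustrative: the 300 bp window comes from the report, while the step size and the function names `gc_fraction` and `gc_windows` are assumptions.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int = 300, step: int = 1):
    """Return (start, gc_fraction) for each full window along the sequence."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Base counts taken from the summary line of this report.
counts = {"A": 2036, "C": 2297, "T": 1993, "G": 2329}
total = sum(counts.values())
overall_gc_percent = 100 * (counts["G"] + counts["C"]) / total

print(total, round(overall_gc_percent, 2))
```

The counts sum to the stated full length of 8655 bp. The report gives no threshold for what counts as a high-GC window, so none is applied here.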


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr19  -       5655973    5658972    3000
YourSeq  170    1354   1908  3000   85.5%     chr1   +       37878350   37878656   307
YourSeq  160    1634   1907  3000   86.2%     chr8   -       111176864  111177107  244
YourSeq  153    1736   1908  3000   93.1%     chr7   -       113439413  113439584  172
YourSeq  152    1733   1908  3000   91.4%     chr11  -       86746548   86746721   174
YourSeq  150    1736   1907  3000   91.8%     chr15  -       78932465   78932634   170
YourSeq  150    1579   1907  3000   88.2%     chr11  +       109619135  109619572  438
YourSeq  149    1733   1907  3000   92.3%     chr10  -       121466065  121466237  173
YourSeq  148    1736   1908  3000   91.4%     chr14  -       79571479   79571645   167
YourSeq  147    1736   1908  3000   90.7%     chr11  -       93961254   93961424   171
YourSeq  146    1733   1908  3000   91.2%     chr10  -       60761309   60761482   174
YourSeq  145    1732   1908  3000   89.8%     chr17  -       7400126    7400301    176
YourSeq  144    1736   1907  3000   90.0%     chr19  -       5409635    5409804    170
YourSeq  141    1738   1908  3000   91.8%     chr10  -       70395380   70395553   174
YourSeq  141    1737   1908  3000   91.3%     chr13  +       38321975   38322146   172
YourSeq  140    1737   1908  3000   92.7%     chrX   -       135757524  135757697  174
YourSeq  140    1742   1921  3000   89.3%     chr11  -       98124226   98124408   183
YourSeq  139    1742   1907  3000   92.5%     chrX   -       36775949   36776113   165
YourSeq  138    1737   1908  3000   91.8%     chr4   +       133525229  133525398  170
YourSeq  138    1742   1915  3000   87.7%     chr4   +       116262599  116262769  171

Note: The 3000 bp section upstream of Exon 8 is BLAT searched against the genome. No significant similarity is found.
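The columns of a BLAT result row relate to each other by simple arithmetic: SPAN is the inclusive length of the genomic interval, and the query START and END give the fraction of the 3000 bp query that a hit covers. The sketch below parses two rows copied from the upstream table. The `BlatHit` class and `parse_hit` function are illustrative names, not part of any BLAT tooling, and the row format assumed is the whitespace-separated one shown in this report (without the leading query name).

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int

    @property
    def target_span(self) -> int:
        """Inclusive length of the genomic (target) interval."""
        return self.t_end - self.t_start + 1

    @property
    def query_coverage(self) -> float:
        """Fraction of the query covered by the aligned block."""
        return (self.q_end - self.q_start + 1) / self.q_size


def parse_hit(row: str) -> BlatHit:
    """Parse 'SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN'."""
    f = row.split()
    return BlatHit(
        int(f[0]), int(f[1]), int(f[2]), int(f[3]),
        float(f[4].rstrip("%")), f[5], f[6], int(f[7]), int(f[8]),
    )


rows = [
    "3000 1 3000 3000 100.0% chr19 - 5655973 5658972 3000",
    "170 1354 1908 3000 85.5% chr1 + 37878350 37878656 307",
]
hits = [parse_hit(r) for r in rows]
for hit in hits:
    print(hit.chrom, hit.target_span, f"{hit.query_coverage:.1%}")
```

Only the first row, the query's own locus on chr19, covers the full query; the second hit covers a short internal stretch of it.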

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr19  -       5650749    5653748    3000
YourSeq  91     319    483   3000   90.3%     chr15  +       4102996    4103160    165
YourSeq  81     321    505   3000   89.5%     chr17  +       48897878   48898075   198
YourSeq  74     277    433   3000   81.7%     chr16  -       30564249   30564399   151
YourSeq  71     321    442   3000   91.8%     chr11  -       79073894   79074018   125
YourSeq  65     309    445   3000   89.4%     chr5   -       129713123  129713257  135
YourSeq  64     319    438   3000   87.9%     chr12  -       17490998   17491114   117
YourSeq  62     301    438   3000   77.7%     chr12  -       76212724   76212851   128
YourSeq  60     301    387   3000   87.4%     chr1   +       133095755  133095849  95
YourSeq  59     301    433   3000   79.3%     chr11  +       80350998   80351125   128
YourSeq  57     301    373   3000   89.1%     chr4   -       125734069  125734141  73
YourSeq  56     332    443   3000   88.8%     chr14  +       24250199   24250315   117
YourSeq  55     319    496   3000   78.3%     chr5   +       64854161   64854323   163
YourSeq  55     309    378   3000   89.9%     chr4   +       100618069  100618138  70
YourSeq  54     301    503   3000   82.7%     chr7   -       126001396  126001595  200
YourSeq  54     329    442   3000   89.8%     chr11  +       75601804   75743158   141355
YourSeq  53     311    437   3000   86.6%     chr14  +       25771857   25771982   126
YourSeq  53     320    439   3000   85.6%     chr12  +       84664321   84664439   119
YourSeq  53     333    443   3000   86.2%     chr11  +       116203554  116203662  109
YourSeq  52     310    373   3000   88.8%     chr9   -       96208260   96208322   63

Note: The 3000 bp section downstream of Exon 11 is BLAT searched against the genome. No significant similarity is found.


Gene information: Sipa1

signal-induced proliferation associated gene 1 [ Mus musculus (house mouse) ]

Gene ID: 20469, updated on 12-Aug-2019

Gene summary

Official Symbol: Sipa1 (provided by MGI)
Official Full Name: signal-induced proliferation associated gene 1 (provided by MGI)
Primary source: MGI:MGI:107576
See related: Ensembl:ENSMUSG00000056917
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Spa1
Expression: Broad expression in spleen adult (RPKM 156.5), thymus adult (RPKM 70.0) and 18 other tissues
Orthologs: human

Genomic context

Location: 19 A; 19 4.34 cM

Exon count: 18

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  19   NC_000085.6 (5651185..5663722, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    19   NC_000085.5 (5651185..5663707, complement)

[Figure: genomic context of Sipa1 on Chromosome 19 - NC_000085.6.]


Transcript information: This gene has 13 transcripts

Gene: Sipa1 ENSMUSG00000056917

Description: signal-induced proliferation associated gene 1 [Source:MGI Symbol;Acc:MGI:107576]
Gene Synonyms: SPA-1
Location: Chromosome 19: 5,651,185-5,663,707 reverse strand. GRCm38:CM001012.2
About this gene: This gene has 13 transcripts (splice variants), 163 orthologues, 6 paralogues, is a member of 1 Ensembl protein family and is associated with 12 phenotypes.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Sipa1-201  ENSMUST00000071857.12  3812  1038aa      ENSMUSP00000073618.5  Protein coding   CCDS29474  E9Q0Y4      TSL:1 GENCODE basic APPRIS P1
Sipa1-202  ENSMUST00000080824.12  3741  1038aa      ENSMUSP00000079637.5  Protein coding   CCDS29474  E9Q0Y4      TSL:1 GENCODE basic APPRIS P1
Sipa1-203  ENSMUST00000164304.8   3584  1038aa      ENSMUSP00000128208.1  Protein coding   CCDS29474  E9Q0Y4      TSL:1 GENCODE basic APPRIS P1
Sipa1-204  ENSMUST00000169854.1   3583  1038aa      ENSMUSP00000132345.1  Protein coding   CCDS29474  E9Q0Y4      TSL:5 GENCODE basic APPRIS P1
Sipa1-205  ENSMUST00000236006.1   3513  1038aa      ENSMUSP00000158378.1  Protein coding   CCDS29474  E9Q0Y4      GENCODE basic APPRIS P1
Sipa1-211  ENSMUST00000237874.1   3456  1038aa      ENSMUSP00000157692.1  Protein coding   CCDS29474  E9Q0Y4      GENCODE basic APPRIS P1
Sipa1-207  ENSMUST00000236464.1   2801  693aa       ENSMUSP00000158193.1  Protein coding   -          Q3V403      GENCODE basic
Sipa1-210  ENSMUST00000237544.1   388   34aa        ENSMUSP00000158151.1  Protein coding   -          A0A494BAM3  CDS 3' incomplete
Sipa1-208  ENSMUST00000236486.1   2893  No protein  -                     Retained intron  -          -           -
Sipa1-206  ENSMUST00000236332.1   245   No protein  -                     Retained intron  -          -           -
Sipa1-213  ENSMUST00000238092.1   679   No protein  -                     lncRNA           -          -           -
Sipa1-212  ENSMUST00000238020.1   470   No protein  -                     lncRNA           -          -           -
Sipa1-209  ENSMUST00000236827.1   143   No protein  -                     lncRNA           -          -           -


[Figure: Ensembl genome browser view of a 32.52 kb region of chromosome 19 (5.65 Mb to 5.67 Mb). Forward strand: Rela-201 (protein coding), Rela-206 and Rela-204 (nonsense mediated decay), Rela-203 (retained intron). Contig: AC134563.5. Reverse strand: Sipa1-201, Sipa1-202, Sipa1-203, Sipa1-204, Sipa1-205, Sipa1-207, Sipa1-210 and Sipa1-211 (protein coding); Sipa1-206 and Sipa1-208 (retained intron); Sipa1-209, Sipa1-212 and Sipa1-213 (lncRNA); Pcnx3-201, Pcnx3-202 and Pcnx3-210 (protein coding); Pcnx3-203, Pcnx3-205 and Pcnx3-211 (retained intron); Pcnx3-206 (nonsense mediated decay). Regulatory Build legend: enhancer, open chromatin, promoter, promoter flank. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000071857

[Figure: protein domain and variant view of Sipa1-201 (protein coding; reverse strand, 12.47 kb), protein ENSMUSP00000073... Annotation tracks: MobiDB lite; low complexity (Seg); coiled-coils (Ncoils); Superfamily (Rap/Ran-GAP superfamily, PDZ superfamily); SMART (PDZ domain); Pfam (Rap GTPase activating protein domain, PDZ domain); PROSITE profiles (Rap GTPase activating protein domain, PDZ domain); PANTHER (PTHR15711:SF14, PTHR15711); Gene3D (Rap/Ran-GAP superfamily, 2.30.42.10); CDD (cd00992); sequence variants from dbSNP and all other sources (missense variant, synonymous variant). Scale bar: 0 to 1038.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
