https://www.alphaknockout.com

Mouse Sipa1 Knockout Project (CRISPR/Cas9)

Objective: To create a Sipa1 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Sipa1 gene (NCBI Reference Sequence: NM_001164481; Ensembl: ENSMUSG00000056917) is located on mouse chromosome 19. Eighteen exons have been identified, with the ATG start codon in exon 4 and the TGA stop codon in exon 18 (Transcript: ENSMUST00000071857). Exons 4 to 18 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis. Note: Homozygous null mice display chronic myelocytic leukemia in either the chronic phase or blast crisis.

Exon 4 begins at about 0.03% of the coding region, and exons 4 to 18 cover 100.0% of the coding region. The size of the effective KO region is ~9560 bp. The KO region does not overlap any other known gene.
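As a rough cross-check, the ~9560 bp figure can be reproduced from the chr19 self-hits reported in the BLAT searches, assuming each 2000 bp window ends immediately next to its codon. Because Sipa1 lies on the reverse strand, the window upstream of the ATG sits at higher coordinates (5660981..5662980) and the window downstream of the TGA at lower ones (5649419..5651418). The function name is illustrative only, not part of any tool used for this report.

```python
def coding_span_between_flanks(upstream_window, downstream_window):
    """Length (bp) of the region lying between two flanking windows of a
    minus-strand gene. Hypothetical helper; assumes 1-based inclusive
    (start, end) coordinates and windows that abut the codons directly."""
    up_start, _ = upstream_window      # upstream of ATG = higher coordinates on the minus strand
    _, down_end = downstream_window    # downstream of TGA = lower coordinates
    first_base = down_end + 1          # lowest coordinate: last base of the TGA stop codon
    last_base = up_start - 1           # highest coordinate: the A of the ATG start codon
    return last_base - first_base + 1


# chr19 self-hits from the upstream and downstream BLAT searches
span = coding_span_between_flanks((5660981, 5662980), (5649419, 5651418))
print(span)  # 9562
```

The result, 9562 bp, agrees with the ~9560 bp effective KO region quoted above.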

Page 1 of 9 https://www.alphaknockout.com

Overview of the Targeting Strategy

[Figure: wildtype Sipa1 allele showing exons 1 to 18, with a 5' gRNA region and a 3' gRNA region flanking the knockout region. Legend: exon of mouse Sipa1; knockout region.]


Overview of the Dot Plot (up), window size: 15 bp

[Figure: self-alignment dot plot of the upstream sequence; panels: Forward and Reverse Complement.]

Note: The 2000 bp section upstream of the start codon is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot-plot matrix, so this region is suitable for PCR screening or sequencing analysis.
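The self-alignment behind these dot plots can be sketched as an exact-word comparison: every 15 bp word of the sequence is matched against every 15 bp word of the same sequence (Forward) or of its reverse complement (Reverse Complement). This is a minimal sketch, not the tool used to draw the figure, and the function names are hypothetical. In the forward plot the main diagonal is the trivial self-match, so tandem repeats appear as extra points on diagonals offset by the repeat period.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_plot(seq, window=15, rc=False):
    """Exact-match dot plot of seq against itself (or its reverse complement).

    Returns the set of (i, j) window start positions whose `window`-bp words are identical.
    """
    target = reverse_complement(seq) if rc else seq
    index = defaultdict(list)                     # word -> start positions in target
    for j in range(len(target) - window + 1):
        index[target[j:j + window]].append(j)
    points = set()
    for i in range(len(seq) - window + 1):
        for j in index.get(seq[i:i + window], ()):
            points.add((i, j))
    return points


def off_diagonal_hits(points):
    """Forward-plot matches away from the main diagonal, which hint at internal repeats."""
    return {(i, j) for i, j in points if i != j}


points = dot_plot("ACGT" * 10)                  # a 40 bp tandem repeat with period 4
print(len(off_diagonal_hits(points)) > 0)       # True
```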

Overview of the Dot Plot (down), window size: 15 bp

[Figure: self-alignment dot plot of the downstream sequence; panels: Forward and Reverse Complement.]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to check for tandem repeats. Tandem repeats are found in the dot-plot matrix, so the gRNA site is selected outside these repeats.
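Checking that a candidate gRNA site avoids the repeats reduces to an interval-overlap test. The sketch below assumes the repeat regions have already been extracted (for instance from off-diagonal dot-plot hits) as half-open [start, end) intervals in window coordinates, and that a site spans 23 bp (a 20 nt protospacer plus an NGG PAM, as for SpCas9). The coordinates in the example are made up for illustration and are not the repeats found in this window.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)   # extend the previous interval
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def site_outside_repeats(site_start, repeats, site_len=23):
    """True if the site [site_start, site_start + site_len) overlaps no repeat."""
    site_end = site_start + site_len
    return all(site_end <= r_start or site_start >= r_end
               for r_start, r_end in merge_intervals(repeats))


# Hypothetical repeat intervals within the 2000 bp downstream window
repeats = [(1100, 1180), (1170, 1250)]
print(merge_intervals(repeats))             # [(1100, 1250)]
print(site_outside_repeats(1300, repeats))  # True
print(site_outside_repeats(1240, repeats))  # False
```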


Overview of the GC Content Distribution (up), window size: 300 bp

[Figure: GC content along the 2000 bp upstream sequence.]

Summary: Full Length(2000bp) | A(21.2% 424) | C(28.1% 562) | T(23.4% 468) | G(27.3% 546)

Note: The 2000 bp section upstream of the start codon is analyzed to determine its GC content. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.
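The GC-content profile can be approximated with a 300 bp sliding window. The sketch below uses prefix sums so each window costs constant time; the 1 bp step and the 70% cut-off used to call a window "high GC" are assumptions for illustration, since the report does not state its criteria.

```python
def gc_windows(seq, window=300, step=1):
    """Return (start, GC fraction) for each window of `window` bp, moving by `step` bp."""
    seq = seq.upper()
    prefix = [0]                      # prefix[k] = number of G/C bases in seq[:k]
    for base in seq:
        prefix.append(prefix[-1] + (base in "GC"))
    return [(start, (prefix[start + window] - prefix[start]) / window)
            for start in range(0, len(seq) - window + 1, step)]


def high_gc_windows(seq, window=300, threshold=0.70):
    """Start positions of windows whose GC fraction exceeds the (assumed) threshold."""
    return [start for start, frac in gc_windows(seq, window) if frac > threshold]


demo = "AT" * 150 + "GC" * 150        # 300 bp AT-rich, then 300 bp GC-rich
profile = gc_windows(demo)
print(profile[0], profile[-1])        # (0, 0.0) (300, 1.0)
```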

Overview of the GC Content Distribution (down), window size: 300 bp

[Figure: GC content along the 2000 bp downstream sequence.]

Summary: Full Length(2000bp) | A(24.2% 484) | C(24.0% 480) | T(24.6% 492) | G(27.2% 544)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine its GC content. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.
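The Summary lines are plain base counts over each 2000 bp window. Combining the reported C and G counts gives a GC content of 55.4% upstream ((562 + 546) / 2000) and 51.2% downstream ((480 + 544) / 2000). A minimal sketch that prints the same layout follows; the function names are hypothetical.

```python
from collections import Counter


def composition_summary(seq):
    """Format base counts like the report's Summary line (order A, C, T, G)."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    fields = [f"Full Length({n}bp)"]
    for base in "ACTG":
        fields.append(f"{base}({100 * counts[base] / n:.1f}% {counts[base]})")
    return " | ".join(fields)


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# A toy sequence with the downstream window's base counts (base order does not matter)
toy = "A" * 484 + "C" * 480 + "T" * 492 + "G" * 544
print(composition_summary(toy))
# Full Length(2000bp) | A(24.2% 484) | C(24.0% 480) | T(24.6% 492) | G(27.2% 544)
print(gc_fraction(toy))  # 0.512
```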


BLAT Search Results (up)

Query: YourSeq (2000 bp). QSTART/QEND are positions in the query; TSTART/TEND are positions in the genome.

SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
2000   1       2000  2000   100.0%    chr19  -       5660981    5662980    2000
172    692     968   2000   91.9%     chr17  +       53624764   53657076   32313
133    692     839   2000   95.3%     chr19  -       56410136   56410283   148
129    692     839   2000   94.6%     chr5   -       98123576   98123723   148
129    693     838   2000   93.8%     chr3   -       127979110  127979254  145
125    692     839   2000   92.2%     chr1   +       74810906   74811051   146
120    692     826   2000   95.5%     chr3   -       88275751   88275885   135
120    692     826   2000   95.5%     chr15  -       34492237   34492371   135
119    693     829   2000   94.8%     chr10  -       67077485   67077634   150
119    697     836   2000   93.5%     chrX   +       12383741   12383880   140
119    693     826   2000   94.6%     chr11  +       80309342   80309473   132
117    693     826   2000   92.5%     chr7   -       24757209   24757341   133
116    693     826   2000   94.0%     chr5   -       95177970   95178563   594
115    573     798   2000   93.3%     chr8   -       72531399   72532063   665
115    692     839   2000   94.7%     chr5   -       100831913  100832060  148
107    600     974   2000   73.2%     chr19  +       24911606   24911829   224
107    689     842   2000   92.8%     chr19  +       12188254   12188411   158
106    702     973   2000   79.9%     chr9   -       61389619   61389867   249
100    600     968   2000   76.0%     chr2   +       130049122  130049336  215
98     601     968   2000   77.6%     chr8   +       121219540  121219772  233

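Note: The 2000 bp section upstream of the start codon is BLAT-searched against the genome. Apart from the full-length self-hit on chr19, every hit scores 172 or lower and aligns no more than 375 bp of the query, so no significant similarity is found elsewhere in the genome.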

BLAT Search Results (down)

Query: YourSeq (2000 bp). QSTART/QEND are positions in the query; TSTART/TEND are positions in the genome.

SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
2000   1       2000  2000   100.0%    chr19  -       5649419    5651418    2000
165    1134    1336  2000   96.2%     chr12  -       108250834  108251062  229
157    1178    1376  2000   94.4%     chr13  -       90130399   90130642   244
149    1154    1332  2000   94.1%     chr4   -       126167692  126167907  216
139    1123    1316  2000   96.0%     chr3   -       63969408   63969742   335
125    1188    1337  2000   94.4%     chr2   -       76573533   76573706   174
124    1188    1325  2000   95.6%     chr2   -       35197401   35197545   145
123    1176    1325  2000   97.0%     chr5   -       64827472   64827623   152
123    1194    1380  2000   93.8%     chr13  -       28784295   28784509   215
123    1197    1347  2000   94.3%     chr11  -       69388510   69388660   151
123    1194    1336  2000   93.7%     chr8   +       110781512  110781661  150
121    1194    1336  2000   94.2%     chr5   -       134632626  134632778  153
121    1194    1325  2000   96.2%     chr18  -       30248114   30248251   138
120    1197    1331  2000   94.8%     chr5   -       102531555  102531695  141
120    1187    1325  2000   94.3%     chr3   -       108954366  108954515  150
120    1194    1326  2000   95.5%     chrX   +       52623548   52623686   139
119    1194    1332  2000   95.5%     chr1   -       134551149  134551300  152
118    1197    1325  2000   96.1%     chr13  +       112951252  112951386  135
117    1194    1325  2000   94.7%     chr15  -       97297176   97297313   138
117    1188    1316  2000   96.1%     chr2   +       156410416  156410568  153

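Note: The 2000 bp section downstream of the stop codon is BLAT-searched against the genome. Apart from the full-length self-hit on chr19, every hit scores 165 or lower and aligns no more than 203 bp of the query, so no significant similarity is found elsewhere in the genome.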
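Reading the BLAT tables comes down to setting aside the full-length self-hit and asking whether any remaining hit covers a large part of the query. The sketch below encodes a few rows from the upstream table; the 50% query-coverage cut-off is an assumption for illustration, not a threshold stated in the report. By that measure the largest off-target alignment (query 600..974, 375 bp) covers under 19% of the query.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float   # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def off_target_hits(hits, max_query_fraction=0.5):
    """Drop the full-length self-hit and return hits covering more than
    max_query_fraction of the query (hypothetical cut-off)."""
    flagged = []
    for hit in hits:
        covered = (hit.q_end - hit.q_start + 1) / hit.q_size
        if covered == 1.0 and hit.identity == 100.0:
            continue  # the query aligned back to its own locus
        if covered > max_query_fraction:
            flagged.append(hit)
    return flagged


upstream = [
    BlatHit(2000, 1, 2000, 2000, 100.0, "chr19", "-", 5660981, 5662980, 2000),
    BlatHit(172, 692, 968, 2000, 91.9, "chr17", "+", 53624764, 53657076, 32313),
    BlatHit(107, 600, 974, 2000, 73.2, "chr19", "+", 24911606, 24911829, 224),
]
print(off_target_hits(upstream))  # []
```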


Gene information: Sipa1, signal-induced proliferation associated gene 1 [Mus musculus (house mouse)], Gene ID: 20469, updated on 12-Aug-2019

Gene summary

Official Symbol: Sipa1 (provided by MGI)
Official Full Name: signal-induced proliferation associated gene 1 (provided by MGI)
Primary source: MGI:MGI:107576
See related: Ensembl:ENSMUSG00000056917
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Spa1
Expression: Broad expression in spleen adult (RPKM 156.5), thymus adult (RPKM 70.0) and 18 other tissues

Genomic context

Location: 19 A; 19 4.34 cM
Exon count: 18

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  19   NC_000085.6 (5651185..5663722, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    19   NC_000085.5 (5651185..5663707, complement)
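NCBI reports these locations as 1-based, inclusive start..end ranges, with "complement" marking the minus strand, so the gene length is end - start + 1. A small parser for strings in exactly the format shown in the table follows; the regular expression is an assumption about that format, not an NCBI API.

```python
import re

LOCATION = re.compile(r"(\S+) \((\d+)\.\.(\d+)(, complement)?\)")


def parse_location(text):
    """Parse 'ACCESSION (start..end[, complement])' into its parts plus length in bp."""
    match = LOCATION.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised location: {text!r}")
    accession, start, end, complement = match.groups()
    start, end = int(start), int(end)
    strand = "-" if complement else "+"
    return accession, start, end, strand, end - start + 1


print(parse_location("NC_000085.6 (5651185..5663722, complement)"))
# ('NC_000085.6', 5651185, 5663722, '-', 12538)
```

By this arithmetic the GRCm38.p6 annotation spans 12,538 bp and the MGSCv37 annotation 12,523 bp.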

[Figure: genomic context of Sipa1 on chromosome 19, NC_000085.6.]


Transcript information: This gene has 13 transcripts

Gene: Sipa1 ENSMUSG00000056917

Description: signal-induced proliferation associated gene 1 [Source:MGI Symbol;Acc:MGI:107576]
Gene Synonyms: SPA-1
Location: Chromosome 19: 5,651,185-5,663,707 reverse strand. GRCm38:CM001012.2
About this gene: This gene has 13 transcripts (splice variants), 163 orthologues, 6 paralogues, is a member of 1 Ensembl protein family and is associated with 12 phenotypes.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Sipa1-201  ENSMUST00000071857.12  3812  1038aa      ENSMUSP00000073618.5  Protein coding   CCDS29474  E9Q0Y4      TSL:1 GENCODE basic APPRIS P1
Sipa1-202  ENSMUST00000080824.12  3741  1038aa      ENSMUSP00000079637.5  Protein coding   CCDS29474  E9Q0Y4      TSL:1 GENCODE basic APPRIS P1
Sipa1-203  ENSMUST00000164304.8   3584  1038aa      ENSMUSP00000128208.1  Protein coding   CCDS29474  E9Q0Y4      TSL:1 GENCODE basic APPRIS P1
Sipa1-204  ENSMUST00000169854.1   3583  1038aa      ENSMUSP00000132345.1  Protein coding   CCDS29474  E9Q0Y4      TSL:5 GENCODE basic APPRIS P1
Sipa1-205  ENSMUST00000236006.1   3513  1038aa      ENSMUSP00000158378.1  Protein coding   CCDS29474  E9Q0Y4      GENCODE basic APPRIS P1
Sipa1-211  ENSMUST00000237874.1   3456  1038aa      ENSMUSP00000157692.1  Protein coding   CCDS29474  E9Q0Y4      GENCODE basic APPRIS P1
Sipa1-207  ENSMUST00000236464.1   2801  693aa       ENSMUSP00000158193.1  Protein coding   -          Q3V403      GENCODE basic
Sipa1-210  ENSMUST00000237544.1   388   34aa        ENSMUSP00000158151.1  Protein coding   -          A0A494BAM3  CDS 3' incomplete
Sipa1-208  ENSMUST00000236486.1   2893  No protein  -                     Retained intron  -          -           -
Sipa1-206  ENSMUST00000236332.1   245   No protein  -                     Retained intron  -          -           -
Sipa1-213  ENSMUST00000238092.1   679   No protein  -                     lncRNA           -          -           -
Sipa1-212  ENSMUST00000238020.1   470   No protein  -                     lncRNA           -          -           -
Sipa1-209  ENSMUST00000236827.1   143   No protein  -                     lncRNA           -          -           -
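Six protein-coding transcripts (Sipa1-201, -202, -203, -204, -205 and -211) encode the same 1038 aa protein and share CCDS29474, so the exon 4 to 18 deletion defined on ENSMUST00000071857 is expected to remove the entire coding sequence of each of them. A short filter over the table rows makes this explicit; the tuple layout is an arbitrary choice for this sketch.

```python
# (name, transcript ID, length in bp, protein length in aa or None, biotype, CCDS or None)
TRANSCRIPTS = [
    ("Sipa1-201", "ENSMUST00000071857.12", 3812, 1038, "Protein coding", "CCDS29474"),
    ("Sipa1-202", "ENSMUST00000080824.12", 3741, 1038, "Protein coding", "CCDS29474"),
    ("Sipa1-203", "ENSMUST00000164304.8", 3584, 1038, "Protein coding", "CCDS29474"),
    ("Sipa1-204", "ENSMUST00000169854.1", 3583, 1038, "Protein coding", "CCDS29474"),
    ("Sipa1-205", "ENSMUST00000236006.1", 3513, 1038, "Protein coding", "CCDS29474"),
    ("Sipa1-211", "ENSMUST00000237874.1", 3456, 1038, "Protein coding", "CCDS29474"),
    ("Sipa1-207", "ENSMUST00000236464.1", 2801, 693, "Protein coding", None),
    ("Sipa1-210", "ENSMUST00000237544.1", 388, 34, "Protein coding", None),
    ("Sipa1-208", "ENSMUST00000236486.1", 2893, None, "Retained intron", None),
    ("Sipa1-206", "ENSMUST00000236332.1", 245, None, "Retained intron", None),
    ("Sipa1-213", "ENSMUST00000238092.1", 679, None, "lncRNA", None),
    ("Sipa1-212", "ENSMUST00000238020.1", 470, None, "lncRNA", None),
    ("Sipa1-209", "ENSMUST00000236827.1", 143, None, "lncRNA", None),
]


def transcripts_with_ccds(rows, ccds):
    """Names of transcripts annotated with the given CCDS identifier."""
    return [name for name, _tid, _bp, _aa, _biotype, row_ccds in rows if row_ccds == ccds]


print(transcripts_with_ccds(TRANSCRIPTS, "CCDS29474"))
# ['Sipa1-201', 'Sipa1-202', 'Sipa1-203', 'Sipa1-204', 'Sipa1-205', 'Sipa1-211']
```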


[Figure: Ensembl genome browser view of a 32.52 kb window on chromosome 19 (5.65 to 5.67 Mb, contig AC134563.5). Forward strand: Rela transcripts (Rela-201, Rela-203, Rela-204, Rela-206). Reverse strand: all 13 Sipa1 transcripts (Sipa1-201 to Sipa1-213) and Pcnx3 transcripts. Tracks: Regulatory Build (enhancer, open chromatin, promoter, promoter flank). Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000071857

[Figure: Ensembl protein view of Sipa1-201 (reverse strand, 12.47 kb; protein length 1038 aa). Annotated features: Rap GTPase activating protein domain (Pfam, PROSITE profiles); Rap/Ran-GAP superfamily (Superfamily, Gene3D); PDZ domain (SMART, Pfam, PROSITE profiles); PDZ superfamily (Superfamily); Gene3D 2.30.42.10; PANTHER PTHR15711 and PTHR15711:SF14; CDD cd00992; MobiDB lite; low-complexity regions (Seg); coiled-coils (Ncoils); sequence variants (missense and synonymous) from dbSNP and other sources.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
