https://www.alphaknockout.com

Mouse Sympk Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Sympk conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Sympk gene (NCBI Reference Sequence: NM_026605; Ensembl: ENSMUSG00000023118) is located on mouse chromosome 7. 27 exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 27 (Transcript: ENSMUST00000023882). Exons 4~5 are selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Sympk gene. To engineer the targeting vector, the homology arms and the cKO region will be generated by PCR using BAC clone RP23-127O11 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a transgenic gene disruption exhibit anemia at E15 and hydrops fetalis.

Exon 4 starts at about 4.45% of the coding region. Knockout of exons 4~5 will result in a frameshift of the gene. The size of intron 3 for 5'-loxP site insertion is 1362 bp, and the size of intron 5 for 3'-loxP site insertion is 3654 bp. The size of the effective cKO region is ~710 bp. The cKO region does not contain any other known gene.
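The arithmetic behind these statements can be sketched as follows. This is an illustrative calculation, not part of the vendor's design pipeline: the helper names are hypothetical, the 1288 aa protein length is taken from the transcript table later in this report, and the report does not list individual exon lengths, so the frameshift check is shown only as a general rule (a deletion whose coding length is not a multiple of 3 shifts the reading frame).

```python
# Hypothetical helper names; only the 1288 aa length and the 4.45% figure
# come from this report.
PROTEIN_LENGTH_AA = 1288        # Sympk-201 translation length
EXON4_START_FRACTION = 0.0445   # "about 4.45% of the coding region"


def cds_length_nt(protein_aa, include_stop=True):
    """Coding sequence length in nucleotides (3 nt per codon)."""
    return 3 * (protein_aa + (1 if include_stop else 0))


def causes_frameshift(deleted_coding_nt):
    """A deletion shifts the frame when its coding length is not a multiple of 3."""
    return deleted_coding_nt % 3 != 0


cds_nt = cds_length_nt(PROTEIN_LENGTH_AA)
# Approximate position in the CDS where exon 4 begins.
exon4_offset_nt = round(EXON4_START_FRACTION * cds_nt)

print(f"CDS length: {cds_nt} nt; exon 4 starts near CDS position {exon4_offset_nt}")
```

Under these assumptions, exon 4 begins within the first couple of hundred coding nucleotides, so a frameshift introduced there affects almost the entire protein.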


Overview of the Targeting Strategy

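[Figure: schematic of the targeting strategy. Four panels: Wildtype allele (exons 1 to 5 and exon 27 shown, with the 5' and 3' gRNA regions marked), Targeting vector, Targeted allele, and Constitutive KO allele (after Cre recombination). Legend: exon of mouse Sympk, homology arm, cKO region, loxP site.]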


Overview of the Dot Plot

Window size: 10 bp

[Figure: self-alignment dot plot. Legend: Forward, Reverse Complement. Sequence label: Sequence 12.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
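The idea behind a self dot plot can be sketched in a few lines. This is a minimal illustration assuming exact word matches with a 10 bp window, forward strand only; it is not the tool used to produce the figure, and the function name `self_dot_plot` is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def self_dot_plot(seq: str, window: int = 10) -> list[tuple[int, int]]:
    """Return (i, j) pairs with i < j where the window-length words
    starting at positions i and j of seq are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    dots = []
    for starts in positions.values():
        # Every pair of occurrences of the same word is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


# Toy sequence: a 10 bp word repeated after a 3 bp spacer.
toy = "ACGTTGCAAG" + "TTT" + "ACGTTGCAAG"
print(self_dot_plot(toy))
```

Off-diagonal dots that line up at a short, constant offset would indicate a tandem repeat; an empty result means no 10 bp word occurs twice on the forward strand.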

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution plot. Sequence label: Sequence 12.]

Summary: Full Length(7210bp) | A(22.41% 1616) | C(25.58% 1844) | T(29.21% 2106) | G(22.8% 1644)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
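The overall GC content follows directly from the base counts in the summary line. The sketch below recomputes it and also shows one plausible way to compute a windowed GC profile; the `windowed_gc` helper and its `step` parameter are assumptions for illustration, not the vendor's implementation.

```python
# Base counts copied from the Summary line above.
counts = {"A": 1616, "C": 1844, "T": 2106, "G": 1644}

total = sum(counts.values())
gc_percent = 100 * (counts["G"] + counts["C"]) / total
print(f"Total: {total} bp, GC content: {gc_percent:.2f}%")


def windowed_gc(seq, window=300, step=1):
    """GC percentage of each window of the given size, moving by step bases."""
    seq = seq.upper()
    return [
        100 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


# Small demonstration with a 4 bp window moved 2 bp at a time.
print(windowed_gc("GGCCAATT", window=4, step=2))
```

The counts sum to the reported full length of 7210 bp, which is a quick consistency check on the summary line.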


BLAT Search Results (up)

QUERY    SCORE  Q-START  Q-END  QSIZE  IDENTITY  CHROM  STRAND  T-START    T-END      SPAN
------------------------------------------------------------------------------------------
YourSeq  3000   1        3000   3000   100.0%    chr7   +       19027294   19030293   3000
YourSeq  144    1584     1889   3000   81.6%     chr18  -       46356795   46356973   179
YourSeq  85     2464     2638   3000   92.2%     chr7   +       98188041   98188220   180
YourSeq  83     2449     2599   3000   87.7%     chr10  +       115345595  115345777  183
YourSeq  73     2464     2606   3000   80.2%     chr6   +       139634694  139634844  151
YourSeq  70     2466     2586   3000   82.8%     chr10  -       39081149   39081279   131
YourSeq  69     2514     2629   3000   90.9%     chr18  -       60555691   60555815   125
YourSeq  67     2466     2617   3000   90.3%     chr11  -       77292552   77292712   161
YourSeq  67     2486     2634   3000   92.5%     chr4   +       121134920  121135088  169
YourSeq  67     2518     2605   3000   94.6%     chr14  +       105998992  105999087  96
YourSeq  64     2464     2581   3000   79.1%     chr3   +       40956107   40956228   122
YourSeq  63     2514     2603   3000   89.8%     chr2   +       74910446   74910543   98
YourSeq  63     2537     2636   3000   92.0%     chr1   +       133079271  133079377  107
YourSeq  62     2524     2620   3000   91.8%     chr12  +       3601543    3601648    106
YourSeq  61     2486     2619   3000   93.1%     chr11  -       94584081   94584392   312
YourSeq  61     2463     2565   3000   88.0%     chr11  -       50824845   50824946   102
YourSeq  61     2439     2582   3000   93.0%     chr19  +       59028647   59028806   160
YourSeq  61     2516     2606   3000   92.9%     chr14  +       40865707   40865805   99
YourSeq  59     2460     2622   3000   92.9%     chr6   -       113223214  113223387  174
YourSeq  59     2466     2586   3000   94.1%     chr6   -       12199205   12199326   122

Note: The 3000 bp section upstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  Q-START  Q-END  QSIZE  IDENTITY  CHROM  STRAND  T-START    T-END      SPAN
------------------------------------------------------------------------------------------
YourSeq  3000   1        3000   3000   100.0%    chr7   +       19031004   19034003   3000
YourSeq  384    1216     1722   3000   91.8%     chr7   -       15836594   15837109   516
YourSeq  371    1228     1722   3000   90.0%     chr9   +       40768429   40769122   694
YourSeq  355    1232     1716   3000   90.5%     chr2   -       84029362   84030141   780
YourSeq  345    1249     1722   3000   92.9%     chr19  +       16711193   16711666   474
YourSeq  331    1277     1722   3000   91.3%     chr6   -       113205582  113206366  785
YourSeq  329    1277     1721   3000   90.5%     chr2   -       146277297  146278086  790
YourSeq  315    1251     1722   3000   90.1%     chr19  -       52875102   52875790   689
YourSeq  280    1350     1721   3000   91.2%     chr14  +       94516611   94517398   788
YourSeq  274    1342     1722   3000   93.2%     chr18  -       46545306   46546091   786
YourSeq  273    1399     1784   3000   90.5%     chr13  -       42551001   42551760   760
YourSeq  271    1216     1722   3000   87.3%     chr19  +       11704181   11704553   373
YourSeq  269    1388     1767   3000   90.0%     chr9   -       84239187   84239780   594
YourSeq  269    1389     1722   3000   91.7%     chr12  -       101653579  101653915  337
YourSeq  269    1399     1722   3000   93.8%     chr11  -       26165623   26165948   326
YourSeq  269    1243     1722   3000   92.0%     chr9   +       16519094   16519662   569
YourSeq  267    1402     2181   3000   85.3%     chr2   -       105876220  105876561  342
YourSeq  267    1399     1713   3000   93.5%     chr1   +       98010068   98010388   321
YourSeq  264    1388     1716   3000   91.4%     chr19  +       41252910   41253237   328
YourSeq  263    1399     1722   3000   92.9%     chr8   +       10469245   10469570   326

Note: The 3000 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.
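The top hit in each BLAT search is the query's own locus on chr7, and those two hits can be used to cross-check the stated size of the cKO region. The sketch below assumes the reported coordinates are 1-based and inclusive, which is consistent with each 3000 bp query spanning exactly 3000 bp on the genome; the variable names are illustrative.

```python
# Self-hits (100% identity) copied from the two BLAT tables above.
UPSTREAM = ("chr7", 19027294, 19030293)    # 3000 bp upstream of exon 4
DOWNSTREAM = ("chr7", 19031004, 19034003)  # 3000 bp downstream of exon 5


def span(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


# The region between the two flanks.
cko_start = UPSTREAM[2] + 1
cko_end = DOWNSTREAM[1] - 1
cko_size = span(cko_start, cko_end)

print(f"Upstream flank:   {span(UPSTREAM[1], UPSTREAM[2])} bp")
print(f"Downstream flank: {span(DOWNSTREAM[1], DOWNSTREAM[2])} bp")
print(f"Region between:   chr7:{cko_start}-{cko_end} ({cko_size} bp)")
```

The interval between the two flanks works out to 710 bp, matching the effective cKO region size of ~710 bp given in the strategy section.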


Gene and information: Sympk symplekin [ Mus musculus (house mouse) ] Gene ID: 68188, updated on 12-Aug-2019

Gene summary

Official Symbol: Sympk (provided by MGI)
Official Full Name: symplekin (provided by MGI)
Primary source: MGI:MGI:1915438
See related: Ensembl:ENSMUSG00000023118
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: SPK; SYM; AA125406; AI449890; 1500016F02Rik; 4632415H16Rik
Expression: Ubiquitous expression in testis adult (RPKM 36.3), ovary adult (RPKM 34.7) and 28 other tissues
Orthologs: human

Genomic context

Location: 7; 7 A3

Exon count: 29

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  7    NC_000073.6 (19024377..19054623)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    7    NC_000073.5 (19609726..19639971)

[Figure: genomic context of Sympk on Chromosome 7 - NC_000073.6.]


Transcript information: This gene has 8 transcripts

Gene: Sympk ENSMUSG00000023118

Description: symplekin [Source:MGI Symbol;Acc:MGI:1915438]
Gene Synonyms: 1500016F02Rik, 4632415H16Rik
Location: Chromosome 7: 19,024,377-19,054,618 forward strand. GRCm38:CM001000.2
About this gene: This gene has 8 transcripts (splice variants), 188 orthologues, is a member of 1 Ensembl protein family and is associated with 5 phenotypes.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Sympk-201  ENSMUST00000023882.13  4116  1288aa      ENSMUSP00000023882.7  Protein coding           CCDS52051  F8WJD4   TSL:1, GENCODE basic, APPRIS P1
Sympk-208  ENSMUST00000153976.7   827   198aa       ENSMUSP00000121540.1  Protein coding           -          D3Z2M2   CDS 3' incomplete, TSL:5
Sympk-202  ENSMUST00000130328.1   469   143aa       ENSMUSP00000115900.1  Protein coding           -          F6UJ99   CDS 5' incomplete, TSL:3
Sympk-206  ENSMUST00000146903.7   4781  234aa       ENSMUSP00000138740.1  Nonsense mediated decay  -          S4R2Q3   TSL:1
Sympk-204  ENSMUST00000137287.2   4760  No protein  -                     Retained intron          -          -        TSL:1
Sympk-203  ENSMUST00000131230.1   636   No protein  -                     Retained intron          -          -        TSL:2
Sympk-207  ENSMUST00000148861.7   1099  No protein  -                     lncRNA                   -          -        TSL:1
Sympk-205  ENSMUST00000138440.1   799   No protein  -                     lncRNA                   -          -        TSL:3


[Figure: Ensembl region view, 50.24 kb, chromosome 7 from 19.02 Mb to 19.06 Mb. Forward strand: Sympk-201, Sympk-202 and Sympk-208 (protein coding), Sympk-203 and Sympk-204 (retained intron), Sympk-205 and Sympk-207 (lncRNA), Sympk-206 (nonsense mediated decay), Rsph6a-201 and Rsph6a-202 (protein coding), Rsph6a-203 (lncRNA). Contig: AC170864.2. Reverse strand: Foxa3-201 (protein coding). Regulatory Build track legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000023882

[Figure: Ensembl transcript and protein view of Sympk-201 (protein coding), 30.24 kb, forward strand. Protein ENSMUSP00000023... annotation tracks: MobiDB lite; Low complexity (Seg); Superfamily: Armadillo-type fold; Pfam: Symplekin/Pta1, N-terminal and Symplekin C-terminal; PANTHER: PTHR15245, Symplekin/Pta1; Gene3D: Armadillo-like helical; sequence variants (dbSNP and all other sources). Variant legend: frameshift variant, inframe deletion, missense variant, splice region variant, synonymous variant. Scale bar: 0 to 1288 amino acids.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
