https://www.alphaknockout.com

Mouse Cdc42bpa Knockout Project (CRISPR/Cas9)

Objective: To create a Cdc42bpa knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Cdc42bpa gene (NCBI Reference Sequence: NM_001033285; Ensembl: ENSMUSG00000026490) is located on mouse chromosome 1. 37 exons have been identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 37 (Transcript: ENSMUST00000111117). Exons 3~5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts at about 5.22% of the coding region. Exons 3~5 cover 6.33% of the coding region. The size of the effective KO region is ~8696 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele, 5' to 3', showing exons 1, 3, 4, 5 and 37 of mouse Cdc42bpa, with the two gRNA regions marked. Legend: exon of mouse Cdc42bpa; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 3 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
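The self-alignment described in the note can be illustrated with a minimal sketch. It assumes exact matching of 15 bp windows, which is a simplification; the tool used for this report is not named, and the function names here are hypothetical. Each matching pair of window positions is one dot in the plot, reported separately for the forward strand and the reverse complement.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def self_dot_plot(seq, window=15):
    """Find exact window matches of a sequence against itself.

    Returns (forward, reverse): lists of (i, j) start positions where the
    window at i equals the window at j (forward, only j > i so the main
    diagonal is skipped) or equals the reverse complement of the window at j.
    """
    seq = seq.upper()
    last_start = len(seq) - window
    # Index every window by its sequence so lookups are O(1).
    index = {}
    for i in range(last_start + 1):
        index.setdefault(seq[i:i + window], []).append(i)

    forward, reverse = [], []
    for i in range(last_start + 1):
        word = seq[i:i + window]
        for j in index[word]:
            if j > i:
                forward.append((i, j))
        for j in index.get(reverse_complement(word), []):
            reverse.append((i, j))
    return forward, reverse


# Toy example: a 15 bp unit repeated after a 4 bp spacer.
unit = "ACGGTCATTGCAGGA"
forward_hits, reverse_hits = self_dot_plot(unit + "TTTT" + unit, window=15)
```

In a plot of these points, tandem repeats show up as off-diagonal lines parallel to the main diagonal; an empty or sparse `forward_hits` list corresponds to a region with no significant repeats.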

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section downstream of Exon 5 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(29.8% 596) | C(17.8% 356) | T(30.85% 617) | G(21.55% 431)

Note: The 2000 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
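As a worked check of the summary line, the overall GC content follows directly from the reported base counts, and the 300 bp windowed profile can be sketched as below. The function names and the `step` parameter are illustrative assumptions; the report does not state how its plot was produced.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq, window=300, step=1):
    """GC percentage for each full window along the sequence."""
    return [
        gc_percent(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts taken from the upstream summary line in this report.
upstream_counts = {"A": 596, "C": 356, "T": 617, "G": 431}
total = sum(upstream_counts.values())
overall_gc = 100.0 * (upstream_counts["G"] + upstream_counts["C"]) / total
```

With these counts, `total` is 2000 and `overall_gc` is 39.35%, i.e. (356 + 431) / 2000.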

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(30.1% 602) | C(15.85% 317) | T(35.6% 712) | G(18.45% 369)

Note: The 2000 bp section downstream of Exon 5 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY | SCORE | START | END | QSIZE | IDENTITY | CHROM | STRAND | START | END | SPAN
YourSeq | 2000 | 1 | 2000 | 2000 | 100.0% | chr1 | + | 180029506 | 180031505 | 2000
YourSeq | 175 | 257 | 857 | 2000 | 82.5% | chr15 | + | 97222510 | 97223086 | 577
YourSeq | 168 | 236 | 846 | 2000 | 82.6% | chr12 | - | 86674154 | 86674725 | 572
YourSeq | 157 | 551 | 838 | 2000 | 88.1% | chr6 | + | 135693312 | 135693873 | 562
YourSeq | 155 | 241 | 842 | 2000 | 82.4% | chr6 | - | 23053008 | 23053578 | 571
YourSeq | 152 | 379 | 846 | 2000 | 85.9% | chr1 | - | 67931500 | 67932056 | 557
YourSeq | 142 | 236 | 846 | 2000 | 83.4% | chr8 | + | 64211963 | 64212536 | 574
YourSeq | 142 | 256 | 842 | 2000 | 77.9% | chr8 | + | 53501635 | 53502157 | 523
YourSeq | 142 | 247 | 817 | 2000 | 82.4% | chr12 | + | 85773715 | 85774216 | 502
YourSeq | 141 | 406 | 849 | 2000 | 82.4% | chr5 | + | 132563516 | 132563919 | 404
YourSeq | 140 | 609 | 843 | 2000 | 85.8% | chr4 | - | 137220961 | 137221492 | 532
YourSeq | 139 | 244 | 860 | 2000 | 81.7% | chrX | + | 112951576 | 112952173 | 598
YourSeq | 139 | 456 | 890 | 2000 | 86.4% | chr12 | + | 104611882 | 104612490 | 609
YourSeq | 137 | 235 | 827 | 2000 | 82.0% | chr10 | - | 40026441 | 40027013 | 573
YourSeq | 133 | 592 | 839 | 2000 | 87.1% | chr7 | - | 89872923 | 89873445 | 523
YourSeq | 131 | 236 | 860 | 2000 | 83.7% | chr7 | - | 66315771 | 66316392 | 622
YourSeq | 128 | 655 | 846 | 2000 | 89.5% | chr4 | - | 22553022 | 22553430 | 409
YourSeq | 128 | 234 | 846 | 2000 | 81.7% | chr16 | - | 39587939 | 39588541 | 603
YourSeq | 128 | 238 | 847 | 2000 | 83.8% | chr1 | - | 143112965 | 143113561 | 597
YourSeq | 128 | 240 | 852 | 2000 | 83.2% | chr13 | + | 43320935 | 43321508 | 574

Note: The 2000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
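A BLAT result listing like the one in this section can be screened programmatically. The sketch below parses whitespace-separated result rows and lists hits, other than the full-length self match, whose score reaches a chosen fraction of the query size. The `BlatHit` type, the function names and the 0.5 threshold are assumptions for illustration; the report does not define what it counts as significant similarity.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity, e.g. 82.5
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(line):
    """Parse one row: QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = line.split()
    return BlatHit(
        int(f[1]), int(f[2]), int(f[3]), int(f[4]),
        float(f[5].rstrip("%")), f[6], f[7],
        int(f[8]), int(f[9]), int(f[10]),
    )


def off_target_hits(hits, min_fraction=0.5):
    """Hits, excluding the full-length 100% self match, scoring at least
    min_fraction of the query size. The threshold is a hypothetical choice."""
    return [
        h for h in hits
        if not (h.score == h.q_size and h.identity == 100.0)
        and h.score >= min_fraction * h.q_size
    ]


# First three rows of the upstream BLAT results in this report.
ROWS = """\
YourSeq 2000 1 2000 2000 100.0% chr1 + 180029506 180031505 2000
YourSeq 175 257 857 2000 82.5% chr15 + 97222510 97223086 577
YourSeq 168 236 846 2000 82.6% chr12 - 86674154 86674725 572
"""
hits = [parse_blat_row(row) for row in ROWS.splitlines() if row.strip()]
flagged = off_target_hits(hits)
```

For these rows the best secondary score is 175 against a 2000 bp query, so `flagged` is empty under the assumed threshold.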

BLAT Search Results (down)

QUERY | SCORE | START | END | QSIZE | IDENTITY | CHROM | STRAND | START | END | SPAN
YourSeq | 2000 | 1 | 2000 | 2000 | 100.0% | chr1 | + | 180040202 | 180042201 | 2000
YourSeq | 279 | 1706 | 2000 | 2000 | 97.7% | chr1 | + | 180042193 | 180042605 | 413
YourSeq | 261 | 1708 | 2000 | 2000 | 94.9% | chr1 | + | 180042336 | 180042699 | 364
YourSeq | 220 | 1708 | 2000 | 2000 | 90.8% | chr1 | + | 180042414 | 180042652 | 239
YourSeq | 180 | 1708 | 1906 | 2000 | 93.9% | chr1 | + | 180042505 | 180042699 | 195
YourSeq | 163 | 1708 | 1885 | 2000 | 93.7% | chr1 | + | 180042552 | 180042725 | 174
YourSeq | 118 | 1871 | 2000 | 2000 | 93.0% | chr1 | + | 180042385 | 180042511 | 127
YourSeq | 111 | 1886 | 2000 | 2000 | 98.3% | chr1 | + | 180042444 | 180042558 | 115
YourSeq | 86 | 1909 | 2000 | 2000 | 93.3% | chr1 | + | 180042376 | 180042464 | 89
YourSeq | 75 | 1708 | 1784 | 2000 | 98.8% | chr1 | + | 180042646 | 180042722 | 77
YourSeq | 51 | 293 | 364 | 2000 | 86.2% | chr11 | + | 74198189 | 74198259 | 71
YourSeq | 45 | 1450 | 1582 | 2000 | 92.6% | chr15 | - | 66017781 | 66017916 | 136
YourSeq | 44 | 1734 | 1979 | 2000 | 62.8% | chr1 | - | 154927680 | 154927783 | 104
YourSeq | 36 | 1740 | 1779 | 2000 | 95.0% | chr1 | + | 180042443 | 180042482 | 40
YourSeq | 33 | 232 | 353 | 2000 | 81.4% | chr13 | + | 92356883 | 92357002 | 120
YourSeq | 33 | 1708 | 1740 | 2000 | 100.0% | chr1 | + | 180042693 | 180042725 | 33
YourSeq | 25 | 323 | 353 | 2000 | 90.4% | chr1 | + | 22111737 | 22111767 | 31
YourSeq | 24 | 323 | 366 | 2000 | 77.3% | chr13 | - | 56593636 | 56593679 | 44
YourSeq | 24 | 227 | 258 | 2000 | 87.5% | chr1 | - | 151367285 | 151367316 | 32
YourSeq | 24 | 370 | 399 | 2000 | 90.0% | chr10 | + | 118014036 | 118014065 | 30

Note: The 2000 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.


Gene and protein information:

Cdc42bpa, CDC42 binding protein alpha [Mus musculus (house mouse)]
Gene ID: 226751, updated on 14-Aug-2019

Gene summary

Official Symbol: Cdc42bpa (provided by MGI)
Official Full Name: CDC42 binding protein kinase alpha (provided by MGI)
Primary source: MGI:MGI:2441841
See related: Ensembl:ENSMUSG00000026490
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: DMPK-like; A930014J19Rik
Expression: Broad expression in cerebellum adult (RPKM 17.3), cortex adult (RPKM 15.0) and 26 other tissues
Orthologs: human, all

Genomic context

Location: 1; 1 H4
Exon count: 43

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 1 | NC_000067.6 (179960079..180165606)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 1 | NC_000067.5 (181891220..182095733)

[Figure: genomic context on Chromosome 1 - NC_000067.6.]


Transcript information: This gene has 18 transcripts

Gene: Cdc42bpa ENSMUSG00000026490

Description: CDC42 binding protein kinase alpha [Source:MGI Symbol;Acc:MGI:2441841]
Gene Synonyms: A930014J19Rik, DMPK-like
Location: 179,960,472-180,165,603 forward strand. GRCm38:CM000994.2
About this gene: This gene has 18 transcripts (splice variants), 213 orthologues, 5 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Cdc42bpa-204 | ENSMUST00000111117.7 | 9161 | 1732aa | ENSMUSP00000106746.1 | Protein coding | CCDS48466 | E9PVY0 | TSL:1, GENCODE basic, APPRIS P3
Cdc42bpa-202 | ENSMUST00000097450.9 | 5160 | 1719aa | ENSMUSP00000095059.3 | Protein coding | CCDS83649 | Q3UU96 | TSL:1, GENCODE basic, APPRIS ALT1
Cdc42bpa-201 | ENSMUST00000076687.11 | 4917 | 1638aa | ENSMUSP00000075980.5 | Protein coding | CCDS83650 | H7BX44 | TSL:5, GENCODE basic, APPRIS ALT1
Cdc42bpa-203 | ENSMUST00000097453.8 | 6704 | 1691aa | ENSMUSP00000095062.2 | Protein coding | - | D3YYN8 | TSL:5, GENCODE basic
Cdc42bpa-212 | ENSMUST00000143176.7 | 6389 | 984aa | ENSMUSP00000115261.1 | Protein coding | - | F6S5Z6 | CDS 5' incomplete, TSL:5
Cdc42bpa-218 | ENSMUST00000212756.1 | 5247 | 1748aa | ENSMUSP00000148469.1 | Protein coding | - | A0A1D5RLQ9 | TSL:5, GENCODE basic
Cdc42bpa-207 | ENSMUST00000133890.7 | 3567 | 1048aa | ENSMUSP00000116337.1 | Protein coding | - | F6Q5A5 | CDS 5' incomplete, TSL:5
Cdc42bpa-209 | ENSMUST00000135056.7 | 2197 | 732aa | ENSMUSP00000114333.1 | Protein coding | - | F6YZJ7 | CDS 5' and 3' incomplete, TSL:5
Cdc42bpa-214 | ENSMUST00000145181.1 | 554 | 185aa | ENSMUSP00000118039.1 | Protein coding | - | F6W0C2 | CDS 5' and 3' incomplete, TSL:2
Cdc42bpa-208 | ENSMUST00000134959.7 | 2892 | 93aa | ENSMUSP00000142018.1 | Nonsense mediated decay | - | A0A0A6YXJ5 | TSL:5
Cdc42bpa-216 | ENSMUST00000152582.7 | 3900 | No protein | - | Retained intron | - | - | TSL:1
Cdc42bpa-217 | ENSMUST00000194974.1 | 3175 | No protein | - | Retained intron | - | - | TSL:NA
Cdc42bpa-211 | ENSMUST00000143161.1 | 2447 | No protein | - | Retained intron | - | - | TSL:1
Cdc42bpa-206 | ENSMUST00000132894.2 | 2129 | No protein | - | Retained intron | - | - | TSL:1
Cdc42bpa-205 | ENSMUST00000129754.1 | 951 | No protein | - | Retained intron | - | - | TSL:2
Cdc42bpa-210 | ENSMUST00000139002.1 | 732 | No protein | - | Retained intron | - | - | TSL:5
Cdc42bpa-213 | ENSMUST00000143350.7 | 1298 | No protein | - | lncRNA | - | - | TSL:1
Cdc42bpa-215 | ENSMUST00000145274.1 | 645 | No protein | - | lncRNA | - | - | TSL:3

[Figure: Ensembl genome browser view of a 225.13 kb region (180.00Mb to 180.15Mb). Forward strand: Cdc42bpa transcripts 201 to 218 and Gm38169-201 (TEC). Contigs: AC125380.4, AC132436.3. Reverse strand: Coq8a transcripts 201, 202, 206, 208, 209, 211, 216 and 217, and Gm38331-201 (TEC). Additional track: Regulatory Build. Regulation legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000111117

[Figure: protein domain view of Cdc42bpa-204 (protein coding; 205.02 kb, forward strand), translation ENSMUSP00000106..., scale 0 to 1732 aa. Annotation tracks recovered from the figure are listed below.]

MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils)
Superfamily: Protein kinase-like domain superfamily; SSF57889; SSF50729
SMART: Pleckstrin homology domain; CRIB domain; AGC-kinase, C-terminal; Protein kinase C-like, phorbol ester/diacylglycerol-binding domain; Citron homology (CNH) domain
Pfam: Protein kinase domain; KELK-motif containing domain; Citron homology (CNH) domain; Protein kinase, C-terminal; Myotonic dystrophy protein kinase, coiled coil; Protein kinase C-like, phorbol ester/diacylglycerol-binding domain
PROSITE profiles: AGC-kinase, C-terminal; Pleckstrin homology domain; CRIB domain; Protein kinase domain; Citron homology (CNH) domain; Protein kinase C-like, phorbol ester/diacylglycerol-binding domain
PROSITE patterns: Serine/threonine-protein kinase, active site; Protein kinase C-like, phorbol ester/diacylglycerol-binding domain; Protein kinase, ATP binding site
PANTHER: Serine/threonine-protein kinase MRCK alpha; PTHR22988
Gene3D: 3.30.200.20; 3.30.60.20; 1.10.510.10; PH-like domain superfamily
CDD: cd05623; cd01243; cd00132; Protein kinase C-like, phorbol ester/diacylglycerol-binding domain
Sequence variants (dbSNP and all other sources). Variant legend: frameshift variant; missense variant; splice region variant; synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
