https://www.alphaknockout.com

Mouse Cep164 Knockout Project (CRISPR/Cas9)

Objective: To create a Cep164 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Cep164 gene (NCBI Reference Sequence: NM_001081373; Ensembl: ENSMUSG00000043987) is located on mouse chromosome 9. Thirty exons are identified in transcript ENSMUST00000117194, with the ATG start codon in exon 2 and the TGA stop codon in exon 30. Exons 4 to 7 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice, and the pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 4 starts at about 4.88% of the coding region, and exons 4 to 7 cover 14.35% of the coding region. The effective KO region is ~8444 bp and contains no other known gene.
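The coding-region percentages can be turned into approximate base counts. The sketch below assumes that "coding region" means the CDS of ENSMUST00000117194 (1333 codons plus the TGA stop, 4002 bp); the report does not state the denominator, and `percent_to_bp` is a hypothetical helper.

```python
# ASSUMPTION: "coding region" = CDS of ENSMUST00000117194 (Cep164-201),
# i.e. 1333 codons plus the TGA stop codon. The report does not state this.
PROTEIN_LENGTH_AA = 1333
CDS_LENGTH_BP = PROTEIN_LENGTH_AA * 3 + 3  # 4002 bp


def percent_to_bp(percent: float, total_bp: int = CDS_LENGTH_BP) -> int:
    """Convert a percentage of the coding region into an approximate base count."""
    return round(total_bp * percent / 100)


# Percentages quoted in the strategy note.
exon4_offset_bp = percent_to_bp(4.88)       # approx. CDS position where exon 4 begins
exons4_7_coding_bp = percent_to_bp(14.35)   # approx. coding bases in exons 4 to 7

print(f"CDS length: {CDS_LENGTH_BP} bp")
print(f"Exon 4 begins ~{exon4_offset_bp} bp into the CDS")
print(f"Exons 4 to 7 contain ~{exons4_7_coding_bp} coding bp")
```

Under that assumption, exon 4 begins roughly 195 bp into the CDS and exons 4 to 7 hold roughly 574 coding bases. This is far smaller than the ~8444 bp effective KO region because the genomic deletion also removes the intronic sequence between and around those exons.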

Page 1 of 9 https://www.alphaknockout.com

Overview of the Targeting Strategy

[Figure: wildtype allele of mouse Cep164 (exons 1 to 30), with gRNA regions flanking the knockout region (exons 4 to 7). Legend: exon of mouse Cep164; knockout region.]


Overview of the Dot Plot (up)

[Figure: dot plot of the upstream sequence (Sequence 1) against itself (Sequence 2), forward and reverse complement; window size: 15 bp.]

Note: The 2000 bp region upstream of exon 4 was aligned with itself to detect tandem repeats. No significant tandem repeat was found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
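A self dot plot places a dot wherever a window of one sequence matches a window of the other; tandem repeats show up as lines parallel to the main diagonal. The sketch below is a minimal exact-match version using the 15 bp window from this report. The software behind the report may score windows differently, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A, C, G, T)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dot_plot(seq: str, window: int = 15) -> tuple[set, set]:
    """Exact-word self dot plot of `seq`.

    Returns (forward, reverse): sets of (i, j) window starts where window i
    equals window j (forward) or equals the reverse complement of window j
    (reverse).
    """
    seq = seq.upper()
    # Index every window by its sequence so matches are found by lookup.
    starts: dict[str, list[int]] = {}
    for j in range(len(seq) - window + 1):
        starts.setdefault(seq[j:j + window], []).append(j)

    forward, reverse = set(), set()
    for i in range(len(seq) - window + 1):
        word = seq[i:i + window]
        forward.update((i, j) for j in starts.get(word, []))
        reverse.update((i, j) for j in starts.get(reverse_complement(word), []))
    return forward, reverse


def off_diagonal(dots: set) -> set:
    """Dots off the main diagonal; in the forward plot these mark direct repeats."""
    return {(i, j) for i, j in dots if i != j}


# Hypothetical example: a 15 bp unit repeated three times in tandem.
fwd, rev = dot_plot("GATTACAGCCTTAGA" * 3)
print(len(off_diagonal(fwd)), "off-diagonal forward dots")
```

A repeat-free region, as reported for both flanks, yields forward dots only on the main diagonal, so `off_diagonal(forward)` stays empty or sparse.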

Overview of the Dot Plot (down)

[Figure: dot plot of the downstream sequence (Sequence 1) against itself (Sequence 2), forward and reverse complement; window size: 15 bp.]

Note: The 2000 bp region downstream of exon 7 was aligned with itself to detect tandem repeats. No significant tandem repeat was found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

[Figure: GC content along the upstream sequence; window size: 300 bp.]

Summary: Full length: 2000 bp | A: 24.1% (482) | C: 21.9% (438) | T: 31.8% (636) | G: 22.2% (444)

Note: The GC content of the 2000 bp region upstream of exon 4 was analyzed. No significant high-GC region was found, so this region is suitable for PCR screening or sequencing analysis.
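A sliding-window GC profile like the one plotted for each flank can be sketched as follows. The 300 bp window matches the report; the 0.7 threshold for flagging a "high GC" window is a hypothetical choice, since the report does not say what it treats as significant.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_profile(seq: str, window: int = 300, step: int = 1) -> list[float]:
    """GC fraction of each `window`-base window, advancing by `step`."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def high_gc_windows(seq: str, window: int = 300, threshold: float = 0.7) -> list[int]:
    """Start positions of windows whose GC fraction exceeds `threshold`.

    HYPOTHETICAL cut-off: the report does not define a significant high-GC region.
    """
    # With the default step of 1, the list index equals the window start.
    return [i for i, gc in enumerate(gc_profile(seq, window)) if gc > threshold]


# Hypothetical 800 bp test sequence: 400 bp AT-rich, then 400 bp GC-rich.
demo = "AT" * 200 + "GC" * 200
print(len(high_gc_windows(demo)), "high-GC windows")
```

An empty result from `high_gc_windows` on a flank corresponds to the "no significant high GC-content region" conclusion, under whatever threshold is chosen.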

Overview of the GC Content Distribution (down)

[Figure: GC content along the downstream sequence; window size: 300 bp.]

Summary: Full length: 2000 bp | A: 23.1% (462) | C: 25.85% (517) | T: 28.5% (570) | G: 22.55% (451)

Note: The GC content of the 2000 bp region downstream of exon 7 was analyzed. No significant high-GC region was found, so this region is suitable for PCR screening or sequencing analysis.
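As a check on the two Summary lines, the base counts add up to the 2000 bp full length and give the overall GC content of each flank. This is plain arithmetic on the reported counts; `gc_percent` is a hypothetical helper.

```python
# Base counts copied from the upstream and downstream Summary lines.
UPSTREAM_COUNTS = {"A": 482, "C": 438, "T": 636, "G": 444}
DOWNSTREAM_COUNTS = {"A": 462, "C": 517, "T": 570, "G": 451}


def gc_percent(counts: dict[str, int]) -> float:
    """Overall GC content, in percent, from per-base counts."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


for name, counts in [("upstream", UPSTREAM_COUNTS), ("downstream", DOWNSTREAM_COUNTS)]:
    assert sum(counts.values()) == 2000  # matches the reported full length
    print(f"{name}: {gc_percent(counts):.1f}% GC")
```

Upstream GC is (438 + 444) / 2000 = 44.1% and downstream GC is (517 + 451) / 2000 = 48.4%, both moderate values.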


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr9   -       45809909   45811908   2000
YourSeq  214    1311    1861  2000   92.9%     chr8   -       22609545   22610183   639
YourSeq  207    1311    1861  2000   91.6%     chr9   +       59620943   59621595   653
YourSeq  191    1311    1861  2000   88.7%     chr10  -       128018574  128019416  843
YourSeq  190    1360    1861  2000   96.6%     chr4   -       129737736  129738276  541
YourSeq  187    1360    1861  2000   96.1%     chr1   -       86802188   86802807   620
YourSeq  181    1675    1884  2000   96.0%     chr17  -       30533860   30534113   254
YourSeq  179    1365    1861  2000   93.2%     chr11  +       5744448    5745008    561
YourSeq  176    1679    1882  2000   95.5%     chr16  -       14322917   14323144   228
YourSeq  176    1458    1861  2000   95.9%     chr11  -       116169778  116170407  630
YourSeq  175    1359    1861  2000   94.5%     chrX   +       159396296  159396816  521
YourSeq  175    1460    1861  2000   95.9%     chr14  +       47146153   47146780   628
YourSeq  174    1371    1861  2000   95.4%     chr4   -       127098862  127099529  668
YourSeq  173    1468    1861  2000   90.4%     chr11  -       115735342  115735676  335
YourSeq  173    1693    1885  2000   95.4%     chr1   -       164024561  164025081  521
YourSeq  172    1693    1888  2000   96.3%     chr9   -       59507580   59507813   234
YourSeq  171    1384    1865  2000   87.9%     chr7   -       30802988   30803364   377
YourSeq  171    1678    1873  2000   95.8%     chr11  +       97236943   97237147   205
YourSeq  170    1677    1869  2000   94.4%     chr11  -       93978846   93979053   208
YourSeq  169    1676    1861  2000   95.7%     chr7   +       82915784   82915969   186

Note: The 2000 bp region upstream of exon 4 was BLAT-searched against the genome. Apart from the full-length self-match on chr9, no significant similarity was found.
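The BLAT results can be screened programmatically: drop the full-length self-match and inspect the strongest remaining hit. In the upstream table that hit (chr8, score 214, 92.9% identity) covers only query bases 1311 to 1861, about 28% of the 2000 bp query, with roughly a tenth of the self-match score. The sketch below encodes that check for the first three upstream rows; the `BlatHit` type and helper names are hypothetical, and the report gives no numeric cut-off for "significant similarity".

```python
from typing import NamedTuple, Optional


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int

    @property
    def query_coverage(self) -> float:
        """Fraction of the query covered by this alignment."""
        return (self.q_end - self.q_start + 1) / self.q_size


# First three rows of the upstream BLAT table.
HITS_UP = [
    BlatHit(2000, 1, 2000, 2000, 100.0, "chr9", "-", 45809909, 45811908),
    BlatHit(214, 1311, 1861, 2000, 92.9, "chr8", "-", 22609545, 22610183),
    BlatHit(207, 1311, 1861, 2000, 91.6, "chr9", "+", 59620943, 59621595),
]


def is_self_hit(hit: BlatHit) -> bool:
    """Full-length, 100%-identity alignment: the query's own genomic locus."""
    return hit.q_start == 1 and hit.q_end == hit.q_size and hit.identity == 100.0


def best_off_target(hits: list[BlatHit]) -> Optional[BlatHit]:
    """Highest-scoring hit other than the self-match, or None if there is none."""
    others = [h for h in hits if not is_self_hit(h)]
    return max(others, key=lambda h: h.score) if others else None


best = best_off_target(HITS_UP)
print(best.chrom, best.score, f"{best.query_coverage:.1%}")
```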

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr9   -       45799465   45801464   2000
YourSeq  128    1269    1570  2000   84.8%     chr4   +       139219059  139219491  433
YourSeq  115    1263    1418  2000   88.9%     chr7   -       3295176    3295345    170
YourSeq  114    1267    1567  2000   88.1%     chr11  +       70549569   70550334   766
YourSeq  112    1256    1407  2000   87.5%     chr4   +       42775720   43157548   381829
YourSeq  111    1255    1388  2000   92.5%     chr5   -       28008181   28008318   138
YourSeq  110    1255    1402  2000   84.1%     chr11  +       82716218   82716361   144
YourSeq  109    1254    1389  2000   90.4%     chr15  +       11750419   11750572   154
YourSeq  109    1259    1567  2000   86.9%     chr13  +       62399513   62399952   440
YourSeq  108    1259    1573  2000   87.5%     chr4   +       32841617   32841985   369
YourSeq  108    1255    1398  2000   88.8%     chr10  +       84750949   84751102   154
YourSeq  107    1255    1407  2000   86.0%     chr1   -       181186617  181186771  155
YourSeq  107    1259    1402  2000   88.6%     chr1   -       167338716  167338881  166
YourSeq  106    1255    1389  2000   90.9%     chr9   -       96615444   96615626   183
YourSeq  106    1258    1397  2000   88.6%     chr10  -       61334401   61338428   4028
YourSeq  105    1259    1399  2000   89.0%     chr1   -       134559078  134559221  144
YourSeq  105    1255    1401  2000   86.8%     chr1   -       103329509  103329658  150
YourSeq  104    1263    1401  2000   87.8%     chr16  -       87399142   87399681   540
YourSeq  103    1255    1397  2000   86.6%     chr10  -       111025304  111025451  148
YourSeq  102    1271    1407  2000   87.6%     chr11  -       48628246   48628383   138

Note: The 2000 bp region downstream of exon 7 was BLAT-searched against the genome. Apart from the full-length self-match on chr9, no significant similarity was found.
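The ~8444 bp effective KO region can be recovered from the coordinates of the two BLAT self-hits, assuming that each 2000 bp flank ends exactly at the edge of the KO region (the report does not state this explicitly). Because Cep164 lies on the reverse strand, the flank upstream of exon 4 has the higher chr9 coordinates. A minimal sketch; `bases_between` is a hypothetical helper.

```python
# 1-based, inclusive chr9 coordinates of the two BLAT self-hits.
UPSTREAM_FLANK = (45809909, 45811908)    # 2000 bp upstream of exon 4
DOWNSTREAM_FLANK = (45799465, 45801464)  # 2000 bp downstream of exon 7


def bases_between(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of bases strictly between two non-overlapping inclusive intervals."""
    left, right = sorted([a, b])  # order by start, so strand orientation does not matter
    return right[0] - left[1] - 1


ko_size = bases_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK)
print(f"Region between flanks: {ko_size} bp")  # Region between flanks: 8444 bp
```

The gap between the flanks, chr9:45801465-45809908, is 8444 bp, matching the figure in the strategy summary.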


Gene information: Cep164 centrosomal protein 164 [Mus musculus (house mouse)], Gene ID: 214552, updated on 10-Oct-2019

Gene summary

Official Symbol: Cep164 (provided by MGI)
Official Full Name: centrosomal protein 164 (provided by MGI)
Primary source: MGI:MGI:2384878
See related: Ensembl:ENSMUSG00000043987
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI450905; BC027092; D030051D21
Expression: Ubiquitous expression in testis adult (RPKM 15.9), placenta adult (RPKM 6.7) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 9; 9 A5.2
Exon count: 35

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 9 | NC_000075.6 (45766946..45828686, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 9 | NC_000075.5 (45575029..45636721, complement)

[Figure: Chromosome 9 - NC_000075.6]
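Locus lengths follow directly from the 1-based, inclusive coordinates in the annotation table. The sketch computes the Cep164 span in both assemblies and checks that the region between the two BLAT flanks (chr9:45801465-45809908, inferred from the BLAT self-hits rather than stated in the report) falls inside the GRCm38 locus. The helper names are hypothetical.

```python
def span(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def contains(outer: tuple[int, int], inner: tuple[int, int]) -> bool:
    """True if interval `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


CEP164_GRCM38 = (45766946, 45828686)    # NCBI annotation release 108
CEP164_MGSCV37 = (45575029, 45636721)   # NCBI Build 37.2
# ASSUMPTION: KO region inferred as the gap between the two BLAT flanks.
KO_REGION_GRCM38 = (45801465, 45809908)

print(span(*CEP164_GRCM38), span(*CEP164_MGSCV37))  # 61741 61693
print(contains(CEP164_GRCM38, KO_REGION_GRCM38))    # True
```

The GRCm38 span is 61,741 bp and the MGSCv37 span is 61,693 bp. The Ensembl coordinates (45,766,946-45,828,691) give 61,746 bp, the 61.75 kb reported in the Ensembl transcript view.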


Transcript information: This gene has 9 transcripts

Gene: Cep164 ENSMUSG00000043987

Description: centrosomal protein 164 [Source:MGI Symbol;Acc:MGI:2384878]
Location: Chromosome 9: 45,766,946-45,828,691, reverse strand (GRCm38:CM001002.2)
About this gene: This gene has 9 transcripts (splice variants), 148 orthologues and 3 paralogues, is a member of 1 Ensembl protein family, and is associated with 2 phenotypes.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Cep164-201 | ENSMUST00000117194.7 | 5541 | 1333aa | ENSMUSP00000114053.1 | Protein coding | CCDS52783 | D3YVU3 | TSL:5, GENCODE basic, APPRIS P2
Cep164-204 | ENSMUST00000213154.1 | 6105 | 2034aa | ENSMUSP00000149815.1 | Protein coding | - | A0A1L1SSA4 | TSL:5, GENCODE basic, APPRIS ALT2
Cep164-207 | ENSMUST00000216284.1 | 4819 | 1172aa | ENSMUSP00000150742.1 | Protein coding | - | A0A1L1SUF9 | CDS 5' incomplete, TSL:1
Cep164-202 | ENSMUST00000132430.1 | 1880 | 627aa | ENSMUSP00000117344.1 | Protein coding | - | F7AUV3 | CDS 5' and 3' incomplete, TSL:5, APPRIS ALT2
Cep164-205 | ENSMUST00000214868.1 | 360 | 34aa | ENSMUSP00000149980.1 | Protein coding | - | A0A1L1SSP5 | CDS 3' incomplete, TSL:5
Cep164-209 | ENSMUST00000217554.1 | 770 | 79aa | ENSMUSP00000149347.1 | Nonsense mediated decay | - | A0A1L1SR80 | CDS 5' incomplete, TSL:3
Cep164-208 | ENSMUST00000217022.1 | 2452 | No protein | - | Retained intron | - | - | TSL:NA
Cep164-203 | ENSMUST00000152629.1 | 680 | No protein | - | Retained intron | - | - | TSL:2
Cep164-206 | ENSMUST00000214971.1 | 553 | No protein | - | Retained intron | - | - | TSL:2


[Figure: Ensembl view of the Cep164 region on chromosome 9 (45.76 to 45.82 Mb, 81.75 kb). Forward strand: Bace1-201 and Bace1-202 (protein coding). Reverse strand: Cep164-201, -202, -204, -205 and -207 (protein coding), Cep164-203, -206 and -208 (retained intron), Cep164-209 (nonsense mediated decay), Gm7286-201 (processed pseudogene) and 2610028D06Rik-201 (TEC). Tracks: contig AC126804.3, genes (comprehensive set), Regulatory Build. Regulation legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, pseudogene).]


Transcript: ENSMUST00000117194

[Figure: Cep164-201 (protein coding, reverse strand, 61.75 kb) and its protein features: MobiDB lite, low complexity (Seg), coiled-coils (Ncoils), WW domain (Superfamily WW domain superfamily, SMART, PROSITE profiles, CDD), PANTHER PTHR18902 and PTHR18902:SF27, Gene3D 3.30.1470.10, and sequence variants from dbSNP and all other sources (stop gained, inframe deletion, missense variant, splice region variant, synonymous variant). Scale bar: 0 to 1333 aa.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
