https://www.alphaknockout.com

Mouse Cep164 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Cep164 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Cep164 gene (NCBI Reference Sequence: NM_001081373; Ensembl: ENSMUSG00000043987) is located on mouse chromosome 9. Thirty exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 30 (Transcript: ENSMUST00000117194). Exon 4 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Cep164 gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP23-19F19 as template. Cas9, gRNA and targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 4 starts at about 4.88% of the coding region. Knockout of exon 4 will result in a frameshift of the gene. The size of intron 3 for 5'-loxP site insertion is 8069 bp, and the size of intron 4 for 3'-loxP site insertion is 6375 bp. The size of the effective cKO region is ~705 bp. The cKO region does not contain any other known gene.
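The frameshift reasoning in the note can be illustrated with a short sketch. It assumes only the figures given in this report (4.88% of the coding region, and the 1333 aa protein of ENSMUST00000117194). The function names and the example deletion lengths are hypothetical, since the exact coding length of exon 4 is not stated here.

```python
def causes_frameshift(deleted_coding_bp: int) -> bool:
    """A deletion shifts the reading frame when the number of
    coding bases removed is not a multiple of 3."""
    return deleted_coding_bp % 3 != 0


def approx_codon(fraction_of_cds: float, protein_length_aa: int) -> int:
    """Rough codon position corresponding to a fraction of the coding region."""
    return round(fraction_of_cds * protein_length_aa)


# Values taken from this report.
EXON4_START_FRACTION = 0.0488   # "about 4.88% of the coding region"
PROTEIN_LENGTH_AA = 1333        # protein length of ENSMUST00000117194

# Estimate only: exon 4 begins near this codon.
print(approx_codon(EXON4_START_FRACTION, PROTEIN_LENGTH_AA))  # prints 65

# Hypothetical exon lengths, for illustration only.
print(causes_frameshift(100))  # True: 100 is not divisible by 3
print(causes_frameshift(99))   # False: an in-frame deletion
```

A deletion this early in the coding sequence that also shifts the frame leaves only a short N-terminal stretch in frame, which is consistent with the expected loss of function.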


Overview of the Targeting Strategy

[Figure: schematic of the targeting strategy. Rows: wildtype allele (exons 1, 4 and 30 labeled; 5' and 3' gRNA regions marked), targeting vector, targeted allele, and constitutive KO allele (after Cre recombination). Legend: exon of mouse Cep164; homology arm; cKO region; loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence aligned against itself; forward and reverse-complement matches are shown.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
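A self-comparison of this kind can be sketched as follows. This is a minimal illustration using exact matches of a fixed window, not the tool used to produce the plot above. The function names are hypothetical, and the default window of 10 bp is taken from the figure heading.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 10):
    """Return (forward, reverse) lists of (i, j) positions where the
    window starting at i matches the window at j exactly, either
    directly or as a reverse complement. The main diagonal (i == j)
    is excluded from the forward list."""
    index = defaultdict(list)
    n_windows = len(seq) - window + 1
    for i in range(n_windows):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(n_windows):
        kmer = seq[i:i + window]
        forward.extend((i, j) for j in index[kmer] if j != i)
        reverse.extend((i, j) for j in index.get(revcomp(kmer), []))
    return forward, reverse


# Toy sequence with a direct repeat of "GATTACA" at positions 0 and 7.
fwd, rev = self_dot_plot("GATTACAGATTACACCCGGG", window=7)
print(fwd)  # [(0, 7), (7, 0)]
```

Tandem repeats appear as runs of forward matches lying close to the main diagonal, which is the pattern the note above refers to.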

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full length (7205 bp) | A (23.25%, 1675) | C (23.19%, 1671) | T (28.24%, 2035) | G (25.32%, 1824)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
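The summary figures can be cross-checked, and a windowed GC profile computed, with a short sketch like the one below. The base counts and the 300 bp window come from this report. The function name `gc_windows` and the `step` parameter are hypothetical, and the real sequence is not reproduced here.

```python
def gc_windows(seq: str, window: int = 300, step: int = 1):
    """Return (start, GC percent) for each full window along seq."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100.0 * gc / window))
    return profile


# Base counts from the summary line of this report.
counts = {"A": 1675, "C": 1671, "T": 2035, "G": 1824}
total = sum(counts.values())
overall_gc = 100.0 * (counts["G"] + counts["C"]) / total

print(total)                 # 7205, matching the stated full length
print(round(overall_gc, 2))  # 48.51

# Toy usage with a small window, for illustration only.
print(gc_windows("GGCCAATT", window=4, step=4))  # [(0, 100.0), (4, 0.0)]
```

An overall GC content of about 48.5% is consistent with the note that no significant high GC-content region is found, although the windowed profile is what actually determines that conclusion.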


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr9   -       45810159   45813158   3000
YourSeq  99     2561   2998  3000   76.3%     chr11  -       52405886   52406202   317
YourSeq  98     2561   2998  3000   92.3%     chr14  -       26985561   26986031   471
YourSeq  96     2604   3000  3000   88.8%     chr13  -       9309524    9309931    408
YourSeq  96     2561   3000  3000   80.6%     chr12  +       104832543  104832843  301
YourSeq  95     2603   3000  3000   80.4%     chr11  -       120208713  120208974  262
YourSeq  88     2125   2229  3000   89.8%     chr9   -       118600781  118600881  101
YourSeq  86     2615   2997  3000   95.8%     chr11  +       5744448    5744897    450
YourSeq  81     2616   3000  3000   77.1%     chr1   -       181136009  181136133  125
YourSeq  75     2139   2631  3000   77.3%     chr11  +       70449895   70450199   305
YourSeq  73     2136   2233  3000   84.3%     chr12  +       65795685   65795776   92
YourSeq  68     2121   2209  3000   83.8%     chr2   -       155095404  155095484  81
YourSeq  68     2928   3000  3000   98.6%     chr14  -       47681024   47681096   73
YourSeq  68     2924   3000  3000   90.7%     chr12  +       106058754  106058828  75
YourSeq  67     2924   2998  3000   95.7%     chr17  -       26114516   26114589   74
YourSeq  67     2926   3000  3000   90.0%     chr1   -       174546543  174546612  70
YourSeq  67     2925   3000  3000   91.5%     chr16  +       91716053   91716124   72
YourSeq  67     2925   3000  3000   91.4%     chr11  +       4845359    4845430    72
YourSeq  66     2923   3000  3000   92.4%     chr18  -       85985709   85985786   78
YourSeq  66     2925   3000  3000   92.7%     chr12  -       101889094  101889166  73

Note: The 3000 bp section upstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr9   -       45806454   45809453   3000
YourSeq  46     292    714   3000   94.5%     chr13  -       10369798   10370226   429
YourSeq  41     249    333   3000   93.5%     chr1   -       151722638  151723124  487
YourSeq  39     283    333   3000   93.4%     chr4   -       46046449   46046499   51
YourSeq  39     283    336   3000   85.4%     chr9   +       11777794   11777842   49
YourSeq  38     283    336   3000   95.3%     chr13  -       42576589   42576701   113
YourSeq  37     283    333   3000   95.2%     chr16  +       83541126   83541193   68
YourSeq  34     576    630   3000   97.3%     chr6   -       147076220  147076276  57
YourSeq  34     283    336   3000   75.0%     chr2   +       157687808  157687846  39
YourSeq  34     283    332   3000   92.5%     chr16  +       38552906   38552957   52
YourSeq  34     283    326   3000   92.5%     chr11  +       62985917   62985962   46
YourSeq  33     297    336   3000   79.5%     chr12  -       111420720  111420753  34
YourSeq  32     576    628   3000   97.1%     chr6   -       147076336  147076391  56
YourSeq  32     283    336   3000   69.7%     chr5   +       82006912   82006944   33
YourSeq  32     310    346   3000   97.1%     chr11  +       65134921   65134962   42
YourSeq  28     284    329   3000   96.7%     chr19  -       47797618   47797664   47
YourSeq  27     283    323   3000   75.9%     chr13  +       6194928    6194962    35
YourSeq  22     310    333   3000   95.9%     chr18  -       24407116   24407139   24
YourSeq  22     311    333   3000   100.0%    chr13  +       108305171  108305194  24
YourSeq  20     312    333   3000   95.5%     chr19  +       35483664   35483685   22

Note: The 3000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
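The 100% chr9 hits in the two BLAT searches give the genomic coordinates of the upstream and downstream flanks, and the gap between them can be checked against the stated cKO region size of ~705 bp. The sketch below does that arithmetic. The variable and function names are hypothetical; the coordinates are the top hits reported above.

```python
# Top (100% identity) chr9 hits from the two BLAT searches, as (start, end).
# Cep164 lies on the reverse strand, so the upstream flank has the
# higher genomic coordinates.
UPSTREAM_FLANK = (45810159, 45813158)
DOWNSTREAM_FLANK = (45806454, 45809453)


def span(interval):
    """Inclusive length of a (start, end) interval in bp."""
    start, end = interval
    return end - start + 1


def gap_between(lower, upper):
    """Number of bases strictly between two non-overlapping intervals,
    where `lower` ends before `upper` starts."""
    return upper[0] - lower[1] - 1


print(span(UPSTREAM_FLANK))                           # 3000
print(span(DOWNSTREAM_FLANK))                         # 3000
print(gap_between(DOWNSTREAM_FLANK, UPSTREAM_FLANK))  # 705
```

The 705 bp gap between the two flanks matches the size of the effective cKO region given in the strategy note.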


Gene and information: Cep164 centrosomal protein 164 [ Mus musculus (house mouse) ] Gene ID: 214552, updated on 10-Oct-2019

Gene summary

Official Symbol: Cep164 (provided by MGI)
Official Full Name: centrosomal protein 164 (provided by MGI)
Primary source: MGI:MGI:2384878
See related: Ensembl:ENSMUSG00000043987
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI450905; BC027092; D030051D21
Expression: Ubiquitous expression in testis adult (RPKM 15.9), placenta adult (RPKM 6.7) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 9; 9 A5.2

Exon count: 35

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  9    NC_000075.6 (45766946..45828686, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    9    NC_000075.5 (45575029..45636721, complement)

[Figure: genomic context of Cep164 on Chromosome 9 - NC_000075.6.]


Transcript information: This gene has 9 transcripts

Gene: Cep164 ENSMUSG00000043987

Description: centrosomal protein 164 [Source:MGI Symbol;Acc:MGI:2384878]
Location: Chromosome 9: 45,766,946-45,828,691 reverse strand. GRCm38:CM001002.2
About this gene: This gene has 9 transcripts (splice variants), 148 orthologues, 3 paralogues, is a member of 1 Ensembl protein family and is associated with 2 phenotypes.

Transcripts

Name        Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Cep164-201  ENSMUST00000117194.7  5541  1333aa      ENSMUSP00000114053.1  Protein coding           CCDS52783  D3YVU3      TSL:5, GENCODE basic, APPRIS P2
Cep164-204  ENSMUST00000213154.1  6105  2034aa      ENSMUSP00000149815.1  Protein coding           -          A0A1L1SSA4  TSL:5, GENCODE basic, APPRIS ALT2
Cep164-207  ENSMUST00000216284.1  4819  1172aa      ENSMUSP00000150742.1  Protein coding           -          A0A1L1SUF9  CDS 5' incomplete, TSL:1
Cep164-202  ENSMUST00000132430.1  1880  627aa       ENSMUSP00000117344.1  Protein coding           -          F7AUV3      CDS 5' and 3' incomplete, TSL:5
Cep164-205  ENSMUST00000214868.1  360   34aa        ENSMUSP00000149980.1  Protein coding           -          A0A1L1SSP5  CDS 3' incomplete, TSL:5
Cep164-209  ENSMUST00000217554.1  770   79aa        ENSMUSP00000149347.1  Nonsense mediated decay  -          A0A1L1SR80  CDS 5' incomplete, TSL:3
Cep164-208  ENSMUST00000217022.1  2452  No protein  -                     Retained intron          -          -           TSL:NA
Cep164-203  ENSMUST00000152629.1  680   No protein  -                     Retained intron          -          -           TSL:2
Cep164-206  ENSMUST00000214971.1  553   No protein  -                     Retained intron          -          -           TSL:2


[Figure: Ensembl genome browser view of an 81.75 kb region of chromosome 9 (45.76 Mb to 45.82 Mb). Forward strand: Bace1-201 and Bace1-202 (protein coding). Contig: AC126804.3. Reverse strand: Cep164-201, -202, -204, -205 and -207 (protein coding); Cep164-203, -206 and -208 (retained intron); Cep164-209 (nonsense mediated decay); Gm7286-201 (processed pseudogene); 2610028D06Rik-201 (TEC). A Regulatory Build track is shown. Regulation legend: CTCF; enhancer; open chromatin; promoter; promoter flank; transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana; Ensembl protein coding); non-protein coding (processed transcript; pseudogene).]


Transcript: ENSMUST00000117194

[Figure: protein view of Cep164-201 (protein coding; reverse strand, 61.75 kb) for ENSMUSP00000114..., with a scale bar from 0 to 1333. Tracks: MobiDB lite; low complexity (Seg); coiled-coils (Ncoils); Superfamily: WW domain superfamily; SMART: WW domain; PROSITE profiles: WW domain; PANTHER: PTHR18902:SF27, PTHR18902; Gene3D: 3.30.1470.10; CDD: WW domain; sequence variants (dbSNP and all other sources). Variant legend: stop gained; inframe deletion; missense variant; splice region variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
