https://www.alphaknockout.com

Mouse Cenpn Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Cenpn conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Cenpn gene (NCBI Reference Sequence: NM_028131; Ensembl: ENSMUSG00000031756) is located on mouse chromosome 8. Twelve exons have been identified, with the ATG start codon in exon 3 and the TGA stop codon in exon 12 (Transcript: ENSMUST00000034205). Exons 6-7 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Cenpn gene. To engineer the targeting vector, the homologous arms and the cKO region will be generated by PCR using BAC clone RP23-28A19 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 6 starts at about 27.5% of the coding region. Knockout of exons 6-7 will result in a frameshift of the gene. The size of intron 5 for 5'-loxP site insertion is 456 bp, and the size of intron 7 for 3'-loxP site insertion is 1231 bp. The size of the effective cKO region is ~2348 bp. The cKO region does not contain any other known gene.
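As a consistency check, the effective cKO region size can be recomputed from the chr8 coordinates of the two 3000 bp flanking queries listed in the BLAT Search Results sections of this report. The sketch below assumes that those queries abut the cKO region exactly and that the coordinates are 1-based and inclusive; the function and variable names are illustrative only.

```python
# Coordinates copied from the top (self) hits of the two BLAT searches in this report.
# Assumption: 1-based inclusive coordinates, and the queries directly flank the cKO region.
UPSTREAM_QUERY_END = 116931366      # last base of the 3000 bp upstream query
DOWNSTREAM_QUERY_START = 116933715  # first base of the 3000 bp downstream query


def bases_between(left_end: int, right_start: int) -> int:
    """Count the bases lying strictly between two 1-based inclusive positions."""
    return right_start - left_end - 1


cko_size = bases_between(UPSTREAM_QUERY_END, DOWNSTREAM_QUERY_START)
print(cko_size)  # 2348
```

Under these assumptions the result matches the ~2348 bp quoted in the note.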


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (5' to 3', exons 1, 5, 6, 7, 8 and 12 marked, with two gRNA regions flanking exons 6-7), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Cenpn; homology arm; cKO region; loxP site.]
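The step from the targeted allele to the constitutive KO allele can be illustrated with a toy string model. This is only a sketch: it assumes two loxP sites in the same orientation, uses the canonical 34 bp loxP sequence, and the flanking bases and the function name `cre_excise` are hypothetical placeholders, not sequences from this project.

```python
# Canonical 34 bp loxP site: 13 bp repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATA" "GCATACAT" "TATACGAAGTTAT"


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Model Cre excision between two directly repeated loxP sites.

    The segment between the first two sites is removed and a single
    site is left behind. Alleles with fewer than two sites are returned
    unchanged. Inverted sites (which invert rather than excise) are not modeled.
    """
    first = allele.find(site)
    if first == -1:
        return allele
    second = allele.find(site, first + len(site))
    if second == -1:
        return allele
    return allele[: first + len(site)] + allele[second + len(site):]


# Toy targeted allele: hypothetical 5' flank, floxed segment, 3' flank.
targeted = "AAAA" + LOXP + "CCCCCC" + LOXP + "GGGG"
ko = cre_excise(targeted)
print(ko == "AAAA" + LOXP + "GGGG")  # True
```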


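Overview of the Dot Plot

[Figure: dot plot of the sequence against itself. Window size: 10 bp. Forward and reverse-complement matches are shown; both axes are labeled "Sequence 12".]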

Note: The sequence of the homologous arms and cKO region is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
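A self-comparison dot plot of this kind can be sketched as an exact k-mer match search. The code below is a minimal illustration, not the tool used for this report: it assumes exact matching of 10 bp windows, and the toy sequence and the names `revcomp` and `self_dot_plot` are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 10):
    """Return (forward, reverse) sets of dot coordinates.

    forward: pairs (i, j), i < j, where the windows at i and j are identical.
    reverse: pairs (i, j) where the window at i equals the reverse
             complement of the window at j.
    """
    index = defaultdict(list)
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    forward, reverse = set(), set()
    for kmer, positions in index.items():
        # Repeated windows produce off-diagonal forward dots.
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                forward.add((positions[a], positions[b]))
        # Windows whose reverse complement also occurs produce reverse dots.
        for i in index.get(revcomp(kmer), []):
            for j in positions:
                reverse.add((i, j))
    return forward, reverse


# Toy sequence (hypothetical): a 10 bp unit repeated after a 5 bp spacer.
unit = "ACGTTGCATC"
forward, reverse = self_dot_plot(unit + "GGGGG" + unit, window=10)
print(sorted(forward))  # [(0, 15)]
```

Runs of forward dots parallel to the main diagonal would indicate tandem or dispersed repeats; an empty forward set corresponds to the "no significant tandem repeat" outcome.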

Overview of the GC Content Distribution

[Figure: GC content distribution along the sequence. Window size: 300 bp. The axis is labeled "Sequence 12".]

Summary: Full length (8826 bp) | A (24.68%, 2178) | C (23.16%, 2044) | T (27.16%, 2397) | G (25.01%, 2207)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
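The summary figures can be cross-checked, and a windowed GC profile computed, with a few lines of Python. This is a sketch under stated assumptions: the base counts are copied from the summary line above, while the function `gc_windows`, its `step` parameter and the toy input are illustrative and not part of the original analysis.

```python
# Base counts copied from the summary line of this report.
counts = {"A": 2178, "C": 2044, "T": 2397, "G": 2207}

total = sum(counts.values())
gc_percent = 100 * (counts["G"] + counts["C"]) / total
print(total, round(gc_percent, 2))  # 8826 48.16


def gc_windows(seq: str, window: int = 300, step: int = 1) -> list[float]:
    """GC percentage of each full-length window along seq (hypothetical helper)."""
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        values.append(100 * gc / window)
    return values


# Toy input with a small window, for illustration only.
print(gc_windows("GGCCAATT", window=4, step=2))  # [100.0, 50.0, 0.0]
```

The counts sum to the stated full length of 8826 bp, and the overall GC content works out to about 48.16%.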


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1       3000  3000   100.0%    chr8   +       116928367  116931366  3000
YourSeq  208    298     2183  3000   94.1%     chr1   -       58419060   58455335   36276
YourSeq  164    1983    2200  3000   88.5%     chr7   +       46801063   46801262   200
YourSeq  155    1997    2202  3000   86.6%     chr1   -       64786031   64786210   180
YourSeq  145    2023    2197  3000   93.0%     chr15  +       81484425   81484601   177
YourSeq  140    2034    2200  3000   90.4%     chr9   -       71249055   71249219   165
YourSeq  132    2029    2201  3000   89.5%     chr4   -       133675024  133675199  176
YourSeq  129    2051    2200  3000   91.1%     chr5   +       137789978  137790124  147
YourSeq  124    2054    2200  3000   90.4%     chr8   +       121958935  121959079  145
YourSeq  114    299     488   3000   89.8%     chr14  +       34213308   34237316   24009
YourSeq  98     315     488   3000   88.4%     chr6   -       146838493  146838675  183
YourSeq  89     299     419   3000   85.5%     chr7   -       133183223  133183336  114
YourSeq  88     314     679   3000   82.4%     chr17  -       78685643   78686236   594
YourSeq  86     312     474   3000   83.1%     chr12  -       108595352  108595517  166
YourSeq  86     308     441   3000   85.4%     chr1   +       132896191  133021488  125298
YourSeq  85     274     414   3000   87.7%     chrX   +       56629678   56630228   551
YourSeq  84     305     421   3000   84.5%     chr13  +       96834614   96834721   108
YourSeq  82     298     427   3000   88.0%     chr2   -       76225865   76225996   132
YourSeq  81     329     438   3000   88.7%     chr11  +       49401093   49401204   112
YourSeq  80     298     497   3000   87.2%     chr1   +       24297475   24297965   491

Note: The 3000 bp section upstream of exon 6 was BLAT searched against the genome. Apart from the expected self-match on chr8, no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1       3000  3000   100.0%    chr8   +       116933715  116936714  3000
YourSeq  87     198     421   3000   82.9%     chr11  -       33128277   33128493   217
YourSeq  83     194     488   3000   92.8%     chr1   +       112005660  112005956  297
YourSeq  76     189     488   3000   83.7%     chr1   -       38090228   38090519   292
YourSeq  73     328     527   3000   76.7%     chr5   +       28505216   28505394   179
YourSeq  72     182     421   3000   82.1%     chr1   -       33624606   33624837   232
YourSeq  72     179     527   3000   90.0%     chr1   -       10361179   10361544   366
YourSeq  70     328     530   3000   90.7%     chr14  +       120879963  120880211  249
YourSeq  66     156     415   3000   78.5%     chr4   -       149006339  149006585  247
YourSeq  65     325     445   3000   86.6%     chr10  +       23478176   23478392   217
YourSeq  65     157     527   3000   89.2%     chr1   +       32035315   32035718   404
YourSeq  64     196     417   3000   89.1%     chr16  -       15596408   15596627   220
YourSeq  62     182     495   3000   82.9%     chr1   +       69796915   69797222   308
YourSeq  59     198     537   3000   87.5%     chr14  +       25146025   25146378   354
YourSeq  58     189     495   3000   75.8%     chr1   -       152303002  152303280  279
YourSeq  58     314     433   3000   87.2%     chr3   +       85533229   85533359   131
YourSeq  57     188     445   3000   80.6%     chr1   +       165264915  165265160  246
YourSeq  56     189     345   3000   95.3%     chr4   -       90280536   90281239   704
YourSeq  56     195     421   3000   95.2%     chr1   -       34508908   34570347   61440
YourSeq  56     2943    3000  3000   98.3%     chr3   +       111369287  111369344  58

Note: The 3000 bp section downstream of exon 7 was BLAT searched against the genome. Apart from the expected self-match on chr8, no significant similarity is found.
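One way to read the BLAT tables is to separate the full-length self-match from all other hits and look at the best remaining score. The sketch below parses a few rows copied from the upstream table; the `BlatHit` type, the `parse_hit` helper and the rule used to identify the self-match (score equal to query size) are assumptions made for illustration, not criteria stated in this report.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated BLAT hit row (hypothetical helper)."""
    fields = line.split()
    if fields[:2] == ["browser", "details"]:  # tolerate web-page link residue
        fields = fields[2:]
    _, score, qs, qe, qsize, ident, chrom, strand, ts, te, span = fields
    return BlatHit(int(score), int(qs), int(qe), int(qsize),
                   float(ident.rstrip("%")), chrom, strand,
                   int(ts), int(te), int(span))


# Rows copied from the upstream BLAT table in this report.
rows = [
    "YourSeq 3000 1 3000 3000 100.0% chr8 + 116928367 116931366 3000",
    "YourSeq 208 298 2183 3000 94.1% chr1 - 58419060 58455335 36276",
    "YourSeq 164 1983 2200 3000 88.5% chr7 + 46801063 46801262 200",
]
hits = [parse_hit(r) for r in rows]

# Assumed rule: the self-match is the hit whose score equals the query size.
off_target = [h for h in hits if h.score != h.q_size]
best = max(off_target, key=lambda h: h.score)
print(best.chrom, best.score, round(100 * best.score / best.q_size, 1))  # chr1 208 6.9
```

Here the strongest non-self hit scores 208 against a 3000 bp query, roughly 7% of the maximum possible score.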


Gene information: Cenpn centromere protein N [ Mus musculus (house mouse) ] Gene ID: 72155, updated on 10-Oct-2019

Gene summary

Official Symbol: Cenpn (provided by MGI)
Official Full Name: centromere protein N (provided by MGI)
Primary source: MGI:MGI:1919405
See related: Ensembl:ENSMUSG00000031756
Gene type: protein coding
RefSeq status: PROVISIONAL
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI426416; AW545024; 2610510J17Rik
Expression: Biased expression in testis adult (RPKM 55.4), CNS E11.5 (RPKM 13.8) and 8 other tissues
Orthologs: human

Genomic context

Location: 8; 8 E1

Exon count: 12

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  8    NC_000074.6 (116921716..116941504)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    8    NC_000074.5 (119445640..119465403)

Chromosome 8 - NC_000074.6


Transcript information: This gene has 5 transcripts

Gene: Cenpn ENSMUSG00000031756

Description: centromere protein N [Source:MGI Symbol;Acc:MGI:1919405]
Gene Synonyms: 2610510J17Rik
Location: 116,921,730-116,941,507 forward strand. GRCm38:CM001001.2
About this gene: This gene has 5 transcripts (splice variants), 200 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Cenpn-201  ENSMUST00000034205.4  1768  337aa       ENSMUSP00000034205.4  Protein coding   CCDS22693  Q9CZW2      TSL:1 GENCODE basic APPRIS P1
Cenpn-203  ENSMUST00000212263.1  1616  250aa       ENSMUSP00000148413.1  Protein coding   -          Q9CZW2      TSL:1 GENCODE basic
Cenpn-205  ENSMUST00000212775.1  837   136aa       ENSMUSP00000148610.1  Protein coding   -          A0A1D5RM33  TSL:3 GENCODE basic
Cenpn-202  ENSMUST00000212184.1  2497  No protein  -                     Retained intron  -          -           TSL:1
Cenpn-204  ENSMUST00000212578.1  447   No protein  -                     Retained intron  -          -           TSL:3


[Figure: Ensembl region view, 39.78 kb, 116.92 Mb to 116.95 Mb. Forward strand: Cenpn-201, Cenpn-203 and Cenpn-205 (protein coding); Cenpn-202 and Cenpn-204 (retained intron); Atmin-201 (protein coding). Contigs: AC121101.8, AC162858.6. Reverse strand: Cmc2-201, Cmc2-202 and Cmc2-205 (protein coding); Cmc2-203 and Cmc2-206 (nonsense mediated decay); Cmc2-204 (lncRNA). Regulatory Build track legend: CTCF, Enhancer, Open, Promoter, Promoter Flank. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000034205

[Figure: Cenpn-201 (protein coding), 19.76 kb, forward strand. Protein ENSMUSP00000034... with domain annotations: Pfam "Centromere protein Chl4/mis15/CENP-N"; PANTHER PTHR46790. Variant track: sequence variants (dbSNP and all other sources), with legend entries for missense variant and synonymous variant. Scale bar: 0 to 337 amino acids.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
