https://www.alphaknockout.com

Mouse Cenpb Knockout Project (CRISPR/Cas9)

Objective: To create a Cenpb knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Cenpb gene (NCBI Reference Sequence: NM_007682; Ensembl: ENSMUSG00000068267) is located on mouse chromosome 2. One exon is identified, with both the ATG start codon and the TGA stop codon in exon 1 (Transcript: ENSMUST00000089510). Exon 1 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a knock-out allele display decreased body weight, small testis, oligospermia, and an age- and background-dependent reduction in female reproductive competence associated with abnormalities in uterus morphology, metral environment, gestational length, and parturition.

Exon 1 starts from about 0.06% of the coding region. Exon 1 covers 100.0% of the coding region. The size of the effective KO region: ~1795 bp. The KO region does not contain any other known gene.
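The percentages above can be checked with simple arithmetic. The sketch below assumes the 599 aa protein length listed for Cenpb-201 in the transcript table of this report, and assumes that "0.06%" refers to the first base of the coding sequence; the function names are illustrative only.

```python
# Assumed inputs, taken from figures quoted in this report.
PROTEIN_LENGTH_AA = 599   # Cenpb-201 protein length (transcript table)
KO_REGION_BP = 1795       # approximate size of the effective KO region


def cds_length_bp(protein_length_aa: int) -> int:
    """Coding sequence length in bp, counting the stop codon."""
    return (protein_length_aa + 1) * 3


def percent_of_cds(length_bp: int, cds_bp: int) -> float:
    """Express a length or 1-based position as a percentage of the CDS."""
    return 100.0 * length_bp / cds_bp


cds_bp = cds_length_bp(PROTEIN_LENGTH_AA)
print(cds_bp)                                          # 1800
print(round(percent_of_cds(1, cds_bp), 2))             # 0.06
print(round(percent_of_cds(KO_REGION_BP, cds_bp), 1))  # 99.7
```

Under these assumptions the KO region is roughly the same size as the full coding sequence, which is consistent with a single-exon gene whose exon 1 covers 100% of the coding region.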

Overview of the Targeting Strategy

[Figure: wildtype allele, 5' to 3', showing exon 1 and two gRNA regions. Legend: exon of mouse Cenpb; knockout region.]

Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the upstream sequence aligned with itself; forward and reverse-complement matches are shown.]

Note: The 2000 bp section upstream of start codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
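A self-alignment dot plot of this kind can be sketched as follows. This is a minimal illustration, not the tool used to produce the report: it marks exact matches between windows on the forward strand only (the report's plot also shows reverse-complement matches), and the toy sequence is invented for the example. Tandem repeats appear as many matches sharing the same offset from the main diagonal.

```python
from collections import defaultdict


def self_dot_plot(seq: str, window: int = 15):
    """Return sorted (i, j) pairs, i < j, whose windows of length `window` are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    dots = []
    for starts in positions.values():
        # Every pair of start positions sharing the same window is one dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


def offset_counts(dots):
    """Count dots per diagonal offset (j - i); a dominant offset suggests a repeat period."""
    counts = defaultdict(int)
    for i, j in dots:
        counts[j - i] += 1
    return dict(counts)


# Toy sequence (not Cenpb): three tandem copies of a 7 bp unit between unrelated flanks.
toy = "ACGTTGCA" + "GATTACA" * 3 + "TTGGCCAA"
dots = self_dot_plot(toy, window=7)
offsets = offset_counts(dots)
print(max(offsets, key=offsets.get))  # most common offset, i.e. the repeat period
```

With a 2000 bp input and a 15 bp window, the same functions would flag candidate repeat regions to avoid when placing a gRNA site or PCR primers.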

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the downstream sequence aligned with itself; forward and reverse-complement matches are shown.]

Note: The 2000 bp section downstream of stop codon is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the upstream sequence.]

Summary: Full Length(2000bp) | A(24.6% 492) | C(25.9% 518) | T(24.7% 494) | G(24.8% 496)

Note: The 2000 bp section upstream of start codon is analyzed to determine the GC content. Significant high GC-content regions are found. The gRNA site is selected outside of these high GC-content regions.
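A sliding-window GC scan like the one described can be sketched as below. The 300 bp window comes from the report; the 0.65 threshold for calling a window "high GC" is a hypothetical value chosen for illustration, since the report does not state its cut-off, and the toy sequence is invented.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int = 300, step: int = 1):
    """Yield (start, gc_fraction) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def high_gc_window_starts(seq: str, window: int = 300, threshold: float = 0.65):
    """Start positions of windows whose GC fraction meets the threshold.

    The 0.65 default is a hypothetical cut-off, not one stated in the report.
    """
    return [start for start, gc in sliding_gc(seq, window) if gc >= threshold]


# Toy sequence (not Cenpb): AT-rich, then GC-rich, then AT-rich again.
toy = "AT" * 10 + "GC" * 10 + "AT" * 10
starts = high_gc_window_starts(toy, window=10, threshold=0.75)
print(starts[0], starts[-1])  # 18 32
```

Windows that pass the threshold mark stretches where gRNA or primer placement would be avoided under this kind of screen.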

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the downstream sequence.]

Summary: Full Length(2000bp) | A(21.65% 433) | C(27.85% 557) | T(24.25% 485) | G(26.25% 525)

Note: The 2000 bp section downstream of stop codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
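The two summary lines can be cross-checked from the base counts they report. The sketch below recomputes the percentages and the overall GC content for each 2000 bp flank; only the counts come from the report, and the helper name is illustrative.

```python
def composition(counts: dict) -> dict:
    """Total length, per-base percentages and overall GC percentage from base counts."""
    total = sum(counts.values())
    percent = {base: round(100.0 * n / total, 2) for base, n in counts.items()}
    gc_percent = round(100.0 * (counts["G"] + counts["C"]) / total, 2)
    return {"total": total, "percent": percent, "gc_percent": gc_percent}


# Base counts as given in the two summary lines of this report.
upstream = {"A": 492, "C": 518, "T": 494, "G": 496}
downstream = {"A": 433, "C": 557, "T": 485, "G": 525}

print(composition(upstream)["total"], composition(upstream)["gc_percent"])      # 2000 50.7
print(composition(downstream)["total"], composition(downstream)["gc_percent"])  # 2000 54.1
```

Both flanks sum to 2000 bp. The whole-sequence averages are moderate, so the high GC-content regions noted for the upstream flank are a local, windowed feature rather than a property of the full 2000 bp.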

BLAT Search Results (up)

QUERY     SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END  SPAN
-----------------------------------------------------------------------------------------
YourSeq    2000      1   2000   2000    100.0%  chr2   -       131179877  131181876  2000
YourSeq     170     27    586   2000     82.3%  chr5   -       136097389  136097661   273
YourSeq     163     24    642   2000     91.4%  chr12  +        66951510   66952137   628
YourSeq     163     27    215   2000     93.6%  chr11  +         6537536    6537895   360
YourSeq     159     24    214   2000     93.5%  chr1   -        93639194   93639396   203
YourSeq     158     25    211   2000     93.0%  chr11  -        69869917   69870106   190
YourSeq     156     26    214   2000     91.6%  chr11  +       103949831  103950023   193
YourSeq     155     27    214   2000     89.1%  chr6   -        49183118   49183299   182
YourSeq     155     27    214   2000     89.2%  chr1   -       153110919  153111102   184
YourSeq     154     10    199   2000     90.6%  chr8   -        87923909   87924098   190
YourSeq     154     21    214   2000     91.5%  chr11  -        62172442   62172638   197
YourSeq     153     25    195   2000     94.8%  chr4   +       149763966  149764136   171
YourSeq     153     26    209   2000     92.4%  chr3   +        19082891   19083077   187
YourSeq     153     27    621   2000     82.7%  chr10  +        71472755   71473146   392
YourSeq     152     26    208   2000     90.1%  chr7   -        29834341   29834521   181
YourSeq     152     26    212   2000     91.4%  chr3   -       129399712  129399901   190
YourSeq     152     26    208   2000     97.0%  chr11  +       120162332  120162522   191
YourSeq     152     26    208   2000     92.7%  chr11  +        46791902   46792090   189
YourSeq     152     26    210   2000     91.4%  chr10  +        33961731   33961918   188
YourSeq     151     23    195   2000     93.7%  chr9   +       100436514  100436686   173

Note: The 2000 bp section upstream of start codon is BLAT searched against the genome. No significant similarity is found.
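One way to read a BLAT result table like this is to separate the self-hit (full length, 100% identity) from the partial secondary hits. The sketch below parses whitespace-separated rows copied from the upstream table; the class and function names are illustrative, and using query coverage as the criterion is an assumption for the example, not the report's stated rule for "significant similarity".

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one row: QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = line.split()
    # f[0] is the query name (YourSeq) and is not stored.
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def query_coverage(hit: BlatHit) -> float:
    """Fraction of the query sequence covered by the aligned interval."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


# First three rows of the upstream BLAT table in this report.
ROWS = """\
YourSeq 2000 1 2000 2000 100.0% chr2 - 131179877 131181876 2000
YourSeq 170 27 586 2000 82.3% chr5 - 136097389 136097661 273
YourSeq 163 24 642 2000 91.4% chr12 + 66951510 66952137 628
"""

hits = [parse_hit(line) for line in ROWS.splitlines() if line.strip()]
secondary = [h for h in hits if query_coverage(h) < 1.0]
for h in secondary:
    print(h.chrom, h.score, round(query_coverage(h), 2))
```

In the rows shown, only the chr2 self-hit covers the whole query; the other hits cover a fraction of it with far lower scores.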

BLAT Search Results (down)

QUERY     SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END  SPAN
-----------------------------------------------------------------------------------------
YourSeq    2000      1   2000   2000    100.0%  chr2   -       131176080  131178079  2000
YourSeq      22   1689   1710   2000    100.0%  chr6   -        24394246   24394267    22
YourSeq      22     31     55   2000     96.0%  chr1   -        51616326   51616353    28
YourSeq      20   1390   1409   2000    100.0%  chr1   -        13133681   13133700    20
YourSeq      20    943    962   2000    100.0%  chr1   +        97857068   97857087    20
YourSeq      20    670    689   2000    100.0%  chr1   +        43051768   43051787    20
YourSeq      20   1664   1683   2000    100.0%  chr1   +        38768372   38768391    20

Note: The 2000 bp section downstream of stop codon is BLAT searched against the genome. No significant similarity is found.

Gene and information: Cenpb centromere protein B [ Mus musculus (house mouse) ] Gene ID: 12616, updated on 10-Oct-2019

Gene summary

Official Symbol: Cenpb (provided by MGI)
Official Full Name: centromere protein B (provided by MGI)
Primary source: MGI:MGI:88376
See related: Ensembl:ENSMUSG00000068267
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: CENP-B
Orthologs: human; all

Genomic context

Location: 2 F1; 2 63.29 cM

Exon count: 1

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  2    NC_000068.7 (131177289..131180067, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    2    NC_000068.6 (131003025..131005748, complement)

[Figure: genomic context of Cenpb on chromosome 2 (NC_000068.7).]

Transcript information: This gene has 1 transcript

Gene: Cenpb ENSMUSG00000068267

Description: centromere protein B [Source:MGI Symbol;Acc:MGI:88376]
Location: Chromosome 2: 131,175,182-131,180,067 reverse strand. GRCm38:CM000995.2
About this gene: This gene has 1 transcript (splice variant), 222 orthologues, 8 paralogues, is a member of 1 Ensembl protein family and is associated with 16 phenotypes.

Transcripts

Name: Cenpb-201
Transcript ID: ENSMUST00000089510.4
bp: 4886
Protein: 599aa
Translation ID: ENSMUSP00000086938.3
Biotype: Protein coding
CCDS: CCDS16757
UniProt: P27790
Flags: TSL:NA, GENCODE basic, APPRIS P1

[Figure: Ensembl region view, 24.89 kb, chromosome 2 from 131.17 Mb to 131.19 Mb, contig AL831736.16. Forward strand: Gm14232-201 (lncRNA), Cdc25b-201 and Cdc25b-202 (protein coding), Cdc25b-203 (retained intron). Reverse strand: Spef1-201, Spef1-202, Spef1-203 and Cenpb-201 (protein coding). Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (RNA gene, processed transcript).]

Transcript: ENSMUST00000089510

[Figure: protein domain and variant view of Cenpb-201 (protein coding, reverse strand, 4.89 kb; protein ENSMUSP00000086...). Annotation tracks: MobiDB lite, Low complexity (Seg), Coiled-coils (Ncoils), Superfamily, SMART, Pfam, PROSITE profiles, PANTHER, Gene3D. Annotated features: Homeobox-like domain superfamily; CENP-B, dimerisation domain superfamily; HTH CenpB-type DNA-binding domain; DNA binding HTH domain, Psq-type; DDE superfamily endonuclease domain; Centromere protein CENP-B, C-terminal domain; PTHR19303; Major centromere autoantigen B; 1.10.10.60. Sequence variants (dbSNP and all other sources), variant legend: inframe insertion, missense variant, synonymous variant. Scale bar: 0 to 599 amino acids.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
