https://www.alphaknockout.com

Mouse Cbx1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Cbx1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Cbx1 gene (NCBI Reference Sequence: NM_007622; Ensembl: ENSMUSG00000018666) is located on mouse chromosome 11. Six exons are identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 5 (Transcript: ENSMUST00000093943). Exon 4 is selected as the conditional knockout (cKO) region. Deletion of this region should result in loss of function of the mouse Cbx1 gene. To engineer the targeting vector, homology arms and the cKO region will be generated by PCR using BAC clone RP23-196B19 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: An uncharacterized gene trap insertion does not result in an obvious phenotype during the observation period early in life, although abnormalities may still develop at an older age.

Exon 4 starts at about 57.48% of the coding region. Knocking out exon 4 will cause a frameshift in the gene. Size of intron 3 (for 5'-loxP site insertion): 1121 bp. Size of intron 4 (for 3'-loxP site insertion): 3793 bp. Size of the effective cKO region: ~596 bp. The cKO region does not contain any other known gene.
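The frameshift statement rests on reading-frame arithmetic: removing a number of coding bases that is not a multiple of 3 shifts the frame of everything downstream. The sketch below illustrates that check and converts the "57.48% of the coding region" figure into an approximate base count. The report does not give the coding length of exon 4, so the exon lengths used are hypothetical, and the assumption that the coding region is 3 x 185 bases (stop codon excluded) is ours, not the report's.

```python
def causes_frameshift(deleted_coding_bp: int) -> bool:
    """True if deleting this many coding bases shifts the reading frame."""
    if deleted_coding_bp < 0:
        raise ValueError("length must be non-negative")
    return deleted_coding_bp % 3 != 0


def coding_bases_upstream(percent_of_cds: float, protein_aa: int) -> int:
    """Approximate number of coding bases before a point given as % of the CDS.

    Assumption: CDS length = 3 * protein_aa (stop codon not counted).
    """
    cds_bp = 3 * protein_aa
    return round(cds_bp * percent_of_cds / 100)


# 185 aa is the protein length listed for ENSMUST00000093943 in this report.
print("coding bases before exon 4 (approx.):", coding_bases_upstream(57.48, 185))

# Hypothetical exon coding lengths, for illustration only.
for hypothetical_len in (120, 121, 122):
    print(hypothetical_len, "bp deleted -> frameshift:", causes_frameshift(hypothetical_len))
```

With the actual exon 4 coding length from the Ensembl exon table, `causes_frameshift` gives a quick sanity check of the claim above.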


Overview of the Targeting Strategy

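[Figure: schematic of the targeting strategy, showing the wildtype allele (numbered exons, with the 5' and 3' gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination).]

Legend: exon of mouse Cbx1; homology arm; cKO region; loxP site.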
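The relationship between the targeted allele and the constitutive KO allele can be illustrated with a toy string model: two loxP sites in the same orientation flank the cKO region, and Cre recombination removes the flanked segment together with one loxP site. This is a minimal sketch on made-up sequences, not the project's actual vector design. The 34 bp loxP sequence is the commonly cited wild-type site, and the function names are hypothetical.

```python
# Commonly cited 34 bp wild-type loxP site (13 bp arm, 8 bp spacer, 13 bp arm).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def build_targeted_allele(upstream: str, cko_region: str, downstream: str) -> str:
    """Place same-orientation loxP sites on both sides of the cKO region."""
    return upstream + LOXP + cko_region + LOXP + downstream


def cre_recombine(allele: str) -> str:
    """Excise the segment between the first two same-orientation loxP sites.

    One loxP site remains in the product. Alleles with fewer than two
    sites are returned unchanged.
    """
    first = allele.find(LOXP)
    if first == -1:
        return allele
    second = allele.find(LOXP, first + len(LOXP))
    if second == -1:
        return allele
    return allele[:first] + allele[second:]


# Toy sequences standing in for intron 3, exon 4 and intron 4.
targeted = build_targeted_allele("AAAAAAAA", "CCCCCCCC", "GGGGGGGG")
knockout = cre_recombine(targeted)
print(len(targeted), len(knockout))
```

The model only covers directly repeated sites on the forward strand. Inverted sites (which cause inversion rather than excision) are deliberately left out.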


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence aligned against itself, with forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
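A self-alignment dot plot of this kind can be approximated by recording every pair of positions whose 10 bp words are identical. Off-diagonal runs of such pairs indicate repeats. The sketch below is an illustrative simplification under our own assumptions: exact word matching, forward strand only, and a made-up test sequence. It is not the tool used to produce the plot in this report.

```python
from collections import defaultdict


def self_dot_plot(seq: str, window: int = 10) -> list[tuple[int, int]]:
    """Return sorted (i, j) pairs, i < j, whose window-length words are identical.

    Only forward-strand exact matches are reported. The main diagonal
    (i == j) is omitted because every word trivially matches itself.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


# Made-up sequence: a 10 bp unit repeated three times in tandem.
toy = "ACGTTGCAAG" * 3
for i, j in self_dot_plot(toy, window=10):
    print(i, j, "offset", j - i)
```

For a tandem repeat, the hits line up at constant offsets equal to multiples of the repeat unit length, which is the pattern of parallel diagonals seen in a dot plot.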

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full length (7095 bp) | A (28.26%, 2005) | C (19.66%, 1395) | T (30.09%, 2135) | G (21.99%, 1560)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
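The composition summary can be cross-checked from the reported base counts, and the windowed GC profile can be sketched with a simple sliding window. The counts below are taken from the summary line. The function names and the short demonstration sequence are illustrative assumptions, since the 7095 bp sequence itself is not included in this report.

```python
# Base counts as reported in the summary line.
COUNTS = {"A": 2005, "C": 1395, "T": 2135, "G": 1560}


def composition_percent(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of each base relative to the total count."""
    total = sum(counts.values())
    return {base: 100 * n / total for base, n in counts.items()}


def gc_percent(counts: dict[str, int]) -> float:
    """Overall GC content in percent."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list[float]:
    """GC percentage of each full window along the sequence."""
    seq = seq.upper()
    return [
        100 * (seq[i:i + window].count("G") + seq[i:i + window].count("C")) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


print("total length:", sum(COUNTS.values()))
print({b: round(p, 2) for b, p in composition_percent(COUNTS).items()})
print("overall GC %:", round(gc_percent(COUNTS), 2))
```

The counts sum to the stated 7095 bp and reproduce the listed percentages. The overall GC content works out to roughly 41.65%, consistent with the note that no high-GC region was found.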


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr11  +       96799479   96802478   3000
YourSeq  310    1291   2128  3000   89.3%     chr8   +       15877621   15877971   351
YourSeq  185    37     634   3000   85.8%     chr4   +       155688715  155689155  441
YourSeq  177    27     632   3000   83.3%     chr4   -       154162080  154162307  228
YourSeq  176    36     233   3000   95.0%     chr9   -       57575149   57575544   396
YourSeq  175    36     560   3000   87.1%     chr15  +       65945834   65946331   498
YourSeq  173    27     239   3000   89.5%     chr2   -       166998450  166998656  207
YourSeq  172    36     634   3000   84.0%     chr7   -       19786860   19787062   203
YourSeq  170    36     233   3000   93.5%     chr4   -       123359107  123359491  385
YourSeq  170    59     636   3000   85.7%     chr3   -       137914232  137914646  415
YourSeq  170    27     232   3000   93.0%     chr11  -       50318456   50318664   209
YourSeq  169    36     233   3000   91.2%     chr16  -       17584171   17584364   194
YourSeq  167    36     239   3000   89.1%     chr3   +       88961446   88961636   191
YourSeq  167    36     238   3000   91.7%     chr10  +       120818532  120818756  225
YourSeq  166    36     233   3000   93.2%     chr17  -       6705622    6705829    208
YourSeq  166    38     233   3000   92.8%     chr1   -       156922386  156922582  197
YourSeq  166    34     232   3000   92.8%     chr8   +       64275646   64275852   207
YourSeq  166    36     232   3000   93.3%     chr7   +       127459220  127459428  209
YourSeq  166    36     232   3000   94.2%     chr5   +       129621726  129621925  200
YourSeq  165    36     233   3000   89.8%     chr9   -       6819763    6819948    186

Note: The 3000 bp section upstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
------------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr11  +       96803074   96806073   3000
YourSeq  462    745    2688  3000   92.4%     chr7   +       12775926   13202767   426842
YourSeq  268    746    1553  3000   90.7%     chr1   -       74230820   74547126   316307
YourSeq  234    1239   1646  3000   88.3%     chr9   +       65881718   65882083   366
YourSeq  209    1239   1642  3000   94.9%     chr5   +       123112645  123113221  577
YourSeq  209    1454   2614  3000   92.0%     chr10  +       40861586   41021805   160220
YourSeq  192    746    1339  3000   88.5%     chr4   +       154251550  154252118  569
YourSeq  186    749    1344  3000   86.8%     chr15  -       102137633  102138036  404
YourSeq  185    749    1343  3000   85.0%     chr12  -       85144271   85144594   324
YourSeq  178    749    1345  3000   84.0%     chr2   +       157568025  157568370  346
YourSeq  173    2201   2687  3000   87.2%     chr4   -       142515250  142757163  241914
YourSeq  169    777    1356  3000   91.7%     chr7   -       134652572  134653145  574
YourSeq  161    748    1354  3000   87.4%     chr7   +       25428771   25429351   581
YourSeq  159    1249   1532  3000   87.2%     chr6   +       94519434   94519694   261
YourSeq  148    2513   2682  3000   94.1%     chr5   -       23828484   23828656   173
YourSeq  144    749    1356  3000   86.0%     chr9   -       21173556   21174123   568
YourSeq  141    735    1338  3000   81.9%     chrX   +       38441313   38441613   301
YourSeq  139    2503   2687  3000   86.1%     chr13  +       74843076   74843252   177
YourSeq  135    2516   2686  3000   87.9%     chr1   +       86603523   86603688   166
YourSeq  134    741    909   3000   93.1%     chr7   -       127831140  127831328  189

Note: The 3000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
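BLAT result rows like those above are easy to screen programmatically. The sketch below parses rows in the column order shown (score, query start/end, query size, identity, chromosome, strand, target start/end, span), sets aside the full-length 100% self-match, and lists any remaining hits at or above a score cutoff. The cutoff of 300 is an arbitrary illustrative value, not a criterion stated in this report, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_row(row: str) -> BlatHit:
    """Parse one whitespace-separated result row.

    Leading non-numeric tokens (link text such as "browser details" and
    the query name) are skipped.
    """
    fields = row.split()
    while fields and not fields[0].isdigit():
        fields.pop(0)
    score, qs, qe, qsize, ident, chrom, strand, ts, te, span = fields
    return BlatHit(int(score), int(qs), int(qe), int(qsize),
                   float(ident.rstrip("%")), chrom, strand,
                   int(ts), int(te), int(span))


def is_self_match(hit: BlatHit) -> bool:
    """Full-length, 100% identical hit: the query aligned to its own locus."""
    return hit.identity == 100.0 and hit.q_end - hit.q_start + 1 == hit.q_size


def secondary_hits(hits: list[BlatHit], min_score: int) -> list[BlatHit]:
    """Non-self hits with score >= min_score (cutoff chosen by the user)."""
    return [h for h in hits if not is_self_match(h) and h.score >= min_score]


# First three rows of the downstream search, copied from the table above.
rows = [
    "browser details YourSeq 3000 1 3000 3000 100.0% chr11 + 96803074 96806073 3000",
    "browser details YourSeq 462 745 2688 3000 92.4% chr7 + 12775926 13202767 426842",
    "browser details YourSeq 268 746 1553 3000 90.7% chr1 - 74230820 74547126 316307",
]
hits = [parse_row(r) for r in rows]
for h in secondary_hits(hits, min_score=300):  # 300 is an illustrative cutoff
    print(h.chrom, h.strand, h.score, f"{h.identity}%", "span", h.span)
```

Whether a given secondary hit matters depends on the screening criteria in use. A large span relative to the aligned query length, for example, indicates a gapped alignment spread over a wide genomic interval.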


Gene and protein information

Cbx1 chromobox 1 [Mus musculus (house mouse)]
Gene ID: 12412, updated on 24-Oct-2019

Gene summary

Official Symbol: Cbx1 (provided by MGI)
Official Full Name: chromobox 1 (provided by MGI)
Primary source: MGI:MGI:105369
See related: Ensembl:ENSMUSG00000018666
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Cbx; M31; HP1B; MOD1; Cbx-rs2; Hp1beta; E430007M08Rik
Expression: Ubiquitous expression in CNS E11.5 (RPKM 39.9), CNS E14 (RPKM 17.9) and 27 other tissues
Orthologs: human

Genomic context

Location: 11 D; 11 60.11 cM

Exon count: 7

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  11   NC_000077.6 (96788680..96808982)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    11   NC_000077.5 (96650450..96669954)

[Figure: genomic context of Cbx1 on Chromosome 11 - NC_000077.6]


Transcript information: This gene has 5 transcripts

Gene: Cbx1 ENSMUSG00000018666

Description: chromobox 1 [Source:MGI Symbol;Acc:MGI:105369]
Gene Synonyms: Cbx-rs2, E430007M08Rik, HP1B, Hp1beta, M31, MOD1
Location: Chromosome 11: 96,789,127-96,808,640 forward strand. GRCm38:CM001004.2
About this gene: This gene has 5 transcripts (splice variants), 279 orthologues, 9 paralogues, is a member of 1 Ensembl protein family and is associated with 3 phenotypes.

Transcripts

Name      Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Cbx1-203  ENSMUST00000093943.9  1251  185aa       ENSMUSP00000091475.3  Protein coding           CCDS25303  P83917   TSL:1, GENCODE basic, APPRIS P1
Cbx1-201  ENSMUST00000018810.9  1824  138aa       ENSMUSP00000018810.3  Protein coding           -          Q7TPM0   TSL:1, GENCODE basic
Cbx1-202  ENSMUST00000079702.3  912   150aa       ENSMUSP00000078640.3  Protein coding           -          Q9CYJ8   TSL:1, GENCODE basic
Cbx1-204  ENSMUST00000134585.7  2377  138aa       ENSMUSP00000137834.1  Nonsense mediated decay  -          Q7TPM0   TSL:1
Cbx1-205  ENSMUST00000141257.1  814   No protein  -                     Retained intron          -          -        TSL:2


[Figure: Ensembl region view, 39.51 kb, chromosome 11 from about 96.78 Mb to 96.81 Mb.
Forward strand transcripts: Cbx1-201 (protein coding), Cbx1-204 (nonsense mediated decay), Cbx1-203 (protein coding), Cbx1-205 (retained intron), Cbx1-202 (protein coding).
Contig: AL596384.17.
Reverse strand genes: Gm11517-201 (processed pseudogene); Nfe2l1-211, Nfe2l1-201, Nfe2l1-210, Nfe2l1-212, Nfe2l1-203, Nfe2l1-204 and Nfe2l1-202 (all protein coding).
Additional track: Regulatory Build. Regulation legend (as extracted): CTCF Open Promoter Flank Factor Binding Site.
Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (processed transcript).]


Transcript: ENSMUST00000093943

[Figure: Ensembl protein summary for Cbx1-203 (protein coding; 19.48 kb, forward strand), protein ENSMUSP00000091..., plotted on a scale of 0 to 185 amino acids.
Annotation tracks:
PDB-ENSP mappings
MobiDB lite
Low complexity (Seg)
Superfamily: Chromo-like domain superfamily
SMART: Chromo shadow domain; Chromo/chromo shadow domain
Prints: Chromo domain subgroup
Pfam: Chromo domain; Chromo shadow domain
PROSITE profiles: Chromo/chromo shadow domain
PROSITE patterns: Chromo domain, conserved site
PANTHER: PTHR22812:SF145; PTHR22812
Gene3D: 2.40.50.40
CDD: cd18650; cd18654
Sequence variants (dbSNP and all other sources). Variant legend: synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
