https://www.alphaknockout.com

Mouse Cbx2 Knockout Project (CRISPR/Cas9)

Objective: To create a Cbx2 knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Cbx2 gene (NCBI Reference Sequence: NM_007623; Ensembl: ENSMUSG00000025577) is located on mouse chromosome 11. Five exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 5 (Transcript: ENSMUST00000026662). Exon 4 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mutations cause malformations of the axial skeleton, reduced viability, poor growth and male-to-female sex reversal.

Exon 4 starts at about 11.75% of the coding region and covers 6.81% of it. The size of the effective KO region is ~106 bp. The KO region does not contain any other known gene.
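The exon 4 figures can be cross-checked with simple arithmetic. The sketch below assumes that the coding region is counted as 519 codons (the 519 aa protein length listed for Cbx2-201) without the stop codon; under that assumption the 106 bp region reproduces the quoted 6.81%. The variable names are illustrative only.

```python
# Cross-check of the exon 4 figures.
# Assumption: coding region = 519 codons (519 aa for Cbx2-201,
# ENSMUST00000026662), stop codon not counted.
PROTEIN_LENGTH_AA = 519
KO_REGION_BP = 106            # effective KO region
EXON4_START_PERCENT = 11.75   # exon 4 start, as a percentage of the coding region

cds_bp = PROTEIN_LENGTH_AA * 3
coverage_percent = round(100 * KO_REGION_BP / cds_bp, 2)
bases_before_exon4 = round(cds_bp * EXON4_START_PERCENT / 100)

# A deletion whose length is not a multiple of 3 shifts the reading frame.
causes_frameshift = KO_REGION_BP % 3 != 0

print(cds_bp, coverage_percent, bases_before_exon4, causes_frameshift)
```

If exactly these 106 bp of coding sequence are lost, the reading frame downstream of exon 4 is shifted, because 106 is not a multiple of 3.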

Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1, 4 and 5 with two gRNA regions marked. Legend: exon of mouse Cbx2; knockout region.]

Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of Exon 4 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
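A self-comparison of this kind can be sketched as an exact word match with a 15 bp window. This is a minimal illustration, not the tool used to draw the plot: the function names are hypothetical and real dot plot software may tolerate mismatches within a window.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def self_matches(seq, window=15):
    """Return (i, j, strand) for every pair of identical window-sized words.

    Forward matches on the main diagonal (i == j) are trivial and skipped.
    Strand "-" means the word at i equals the reverse complement of the word at j.
    """
    index = {}
    last_start = len(seq) - window + 1
    for i in range(last_start):
        index.setdefault(seq[i:i + window], []).append(i)

    hits = []
    for i in range(last_start):
        word = seq[i:i + window]
        for j in index.get(word, []):
            if j != i:
                hits.append((i, j, "+"))
        for j in index.get(revcomp(word), []):
            hits.append((i, j, "-"))
    return hits


# Toy input: a 10 bp unit repeated in tandem gives off-diagonal forward hits.
print(self_matches("ACGGTCATTG" * 2, window=10))
```

A tandem repeat shows up as runs of "+" hits at a constant offset `j - i`. An empty or very sparse hit list corresponds to a dot plot with no off-diagonal lines.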

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 1343 bp section downstream of Exon 4 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (up) Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(21.05% 421) | C(21.45% 429) | T(25.5% 510) | G(32.0% 640)

Note: The 2000 bp section upstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
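A windowed GC scan with the 300 bp window size can be sketched as follows. The report does not state which cutoff counts as a "significant high GC-content region", so the 70% threshold and the function names here are assumptions for illustration.

```python
def gc_windows(seq, window=300, step=1):
    """Yield (start, gc_percent) for each window of the given size."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, 100.0 * gc / window


def high_gc_windows(seq, window=300, threshold=70.0):
    """Return the windows whose GC content reaches the threshold.

    The default threshold is a hypothetical cutoff, not taken from the report.
    """
    return [(s, p) for s, p in gc_windows(seq, window) if p >= threshold]


# Toy input with a short window so the result is easy to follow.
print(high_gc_windows("ATATGCGC", window=4, threshold=70.0))
```

An empty result from `high_gc_windows` on a real flank would correspond to the statement that no high GC-content region is present.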

Overview of the GC Content Distribution (down) Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(1343bp) | A(19.88% 267) | C(28.22% 379) | T(26.06% 350) | G(25.84% 347)

Note: The 1343 bp section downstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
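The base counts in the two summaries can be checked for consistency and reduced to an overall GC percentage. The counts are taken from the summary lines; the dictionary layout and function name are illustrative.

```python
# Base counts copied from the two GC content summaries.
summaries = {
    "upstream": {"A": 421, "C": 429, "T": 510, "G": 640},
    "downstream": {"A": 267, "C": 379, "T": 350, "G": 347},
}


def gc_percent(counts):
    """Overall GC content in percent, rounded to two decimals."""
    total = sum(counts.values())
    return round(100 * (counts["G"] + counts["C"]) / total, 2)


totals = {name: sum(c.values()) for name, c in summaries.items()}
gc = {name: gc_percent(c) for name, c in summaries.items()}

print(totals)
print(gc)
```

The counts add up to the stated lengths (2000 bp and 1343 bp), and both flanks come out at roughly 53 to 54% GC overall.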

BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr11  +       119024451  119026450  2000
YourSeq  106    1063   1228  2000   83.3%     chr4   +       131875643  131875806  164
YourSeq  105    1096   1229  2000   90.3%     chr5   -       121242073  121477436  235364
YourSeq  102    1061   1230  2000   81.7%     chr10  -       82904711   82904877   167
YourSeq  101    1063   1229  2000   81.7%     chr1   +       86050531   86050688   158
YourSeq  100    975    1209  2000   81.6%     chr1   -       78317810   78318189   380
YourSeq  96     1049   1227  2000   82.9%     chr4   -       119003092  119003263  172
YourSeq  96     1064   1204  2000   89.5%     chrX   +       94545782   94546383   602
YourSeq  94     1088   1229  2000   86.2%     chr4   +       62644947   62645089   143
YourSeq  92     1078   1236  2000   84.2%     chr13  -       103979123  103979276  154
YourSeq  91     1088   1224  2000   87.9%     chr1   -       74297723   74297858   136
YourSeq  90     1063   1212  2000   84.9%     chr16  +       38054557   38054709   153
YourSeq  90     976    1206  2000   84.0%     chr10  +       82300876   82301122   247
YourSeq  84     1063   1236  2000   80.0%     chr4   -       149978529  149978695  167
YourSeq  84     1060   1204  2000   85.6%     chr11  +       77486395   77486538   144
YourSeq  82     591    1228  2000   75.8%     chr1   -       172021550  172022104  555
YourSeq  81     1063   1212  2000   81.9%     chr2   +       33386962   33387109   148
YourSeq  80     1041   1228  2000   89.4%     chrX   -       36148128   36148317   190
YourSeq  79     1088   1225  2000   88.4%     chr9   -       120063569  120063709  141
YourSeq  78     1077   1206  2000   88.7%     chr10  -       99694796   99694925   130

Note: The 2000 bp section upstream of Exon 4 is BLAT searched against the genome. Apart from the expected 100% self-match on chr11, no significant similarity is found.
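One way to read the BLAT table is to ask how much of the query each hit covers. The report gives no numeric criterion for "significant", so the sketch below only computes query coverage for the first three rows of the upstream table; the tuple layout and function name are illustrative.

```python
# (score, q_start, q_end, q_size, identity_percent, chrom)
# First three rows of the upstream BLAT table.
hits = [
    (2000, 1, 2000, 2000, 100.0, "chr11"),
    (106, 1063, 1228, 2000, 83.3, "chr4"),
    (105, 1096, 1229, 2000, 90.3, "chr5"),
]


def query_coverage(hit):
    """Percentage of the query spanned by the hit (1-based, inclusive)."""
    _, q_start, q_end, q_size, _, _ = hit
    return round(100 * (q_end - q_start + 1) / q_size, 1)


# Everything except the full-length self-match is a candidate off-target hit.
off_target = [h for h in hits if query_coverage(h) < 100.0]
best = max(off_target, key=lambda h: h[0])

print(best[5], query_coverage(best))
```

The highest-scoring hit other than the self-match spans only a small part of the 2000 bp query, at well below 100% identity.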

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  1343   1      1343  1343   100.0%    chr11  +       119026557  119027899  1343
YourSeq  157    858    1152  1343   86.7%     chr1   +       35480645   35480970   326
YourSeq  144    869    1160  1343   91.1%     chr14  +       121981784  121982457  674
YourSeq  142    852    1152  1343   84.9%     chr16  +       24074435   24082612   8178
YourSeq  132    861    1121  1343   85.2%     chr6   +       82058850   82059122   273
YourSeq  117    861    1160  1343   87.4%     chr10  -       20302164   20302656   493
YourSeq  116    862    1161  1343   92.2%     chr3   -       58174798   58175139   342
YourSeq  116    862    1161  1343   88.8%     chr18  -       47979716   47980028   313
YourSeq  114    863    1161  1343   87.2%     chr15  +       4592277    4592617    341
YourSeq  113    848    1159  1343   86.7%     chr16  -       4734222    4734577    356
YourSeq  107    932    1161  1343   90.3%     chr5   +       81681484   81681776   293
YourSeq  106    928    1161  1343   91.6%     chr18  -       65779827   65780119   293
YourSeq  105    920    1160  1343   90.3%     chr11  -       63718964   63719223   260
YourSeq  105    935    1161  1343   88.4%     chr17  +       64371444   64371705   262
YourSeq  104    863    1161  1343   85.3%     chr15  -       90705636   90705976   341
YourSeq  102    917    1161  1343   92.7%     chr7   -       139289390  139289656  267
YourSeq  100    917    1142  1343   87.4%     chr10  -       3327800    3328063    264
YourSeq  100    935    1152  1343   87.3%     chr13  +       75688205   75688451   247
YourSeq  100    873    1152  1343   88.8%     chr11  +       6210963    6211246    284
YourSeq  97     886    1152  1343   86.1%     chr17  -       84485319   84485582   264

Note: The 1343 bp section downstream of Exon 4 is BLAT searched against the genome. Apart from the expected 100% self-match on chr11, no significant similarity is found.
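The chr11 self-matches of the two flanks also locate the region between them. The sketch below takes the coordinates from the first row of each BLAT table and assumes the flanks sit directly on either side of the KO region; the assembly used for the BLAT search is not stated in the tables.

```python
# chr11 coordinates of the 100% self-matches (first row of each BLAT table).
UP_START, UP_END = 119024451, 119026450       # 2000 bp upstream flank
DOWN_START, DOWN_END = 119026557, 119027899   # 1343 bp downstream flank

# Assumption: the region between the two flanks is the KO region.
ko_start = UP_END + 1
ko_end = DOWN_START - 1
ko_size = ko_end - ko_start + 1

print(f"chr11:{ko_start}-{ko_end} ({ko_size} bp)")
```

The gap between the flanks is 106 bp, which agrees with the stated size of the effective KO region.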

Gene and information: Cbx2 chromobox 2 [ Mus musculus (house mouse) ] Gene ID: 12416, updated on 16-Sep-2019

Gene summary

Official Symbol: Cbx2 (provided by MGI)
Official Full Name: chromobox 2 (provided by MGI)
Primary source: MGI:MGI:88289
See related: Ensembl:ENSMUSG00000025577
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: pc; M33; MOD2
Summary: This gene encodes a component of the polycomb multiprotein complex, which is required to maintain the transcriptionally repressive state of many genes throughout development via chromatin remodeling and modification of histones. Disruption of this gene in mice results in male-to-female gonadal sex reversal. [provided by RefSeq, Sep 2015]
Expression: Broad expression in CNS E11.5 (RPKM 13.9), limb E14.5 (RPKM 8.1) and 21 other tissues
Orthologs: human, all

Genomic context

Location: 11 E2; 11 83.33 cM
Exon count: 6

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  11   NC_000077.6 (119023019..119031275)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    11   NC_000077.5 (118884343..118892584)

Chromosome 11 - NC_000077.6

Transcript information: This gene has 2 transcripts

Gene: Cbx2 ENSMUSG00000025577

Description: chromobox 2 [Source:MGI Symbol;Acc:MGI:88289]
Gene Synonyms: M33
Location: Chromosome 11: 119,022,962-119,031,270 forward strand. GRCm38:CM001004.2
About this gene: This gene has 2 transcripts (splice variants), 233 orthologues, 9 paralogues, is a member of 1 Ensembl protein family and is associated with 26 phenotypes.

Transcripts

Name      Transcript ID         bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Cbx2-201  ENSMUST00000026662.7  3807  519aa       ENSMUSP00000026662.7  Protein coding  CCDS25708  P30658   TSL:1, GENCODE basic, APPRIS P1
Cbx2-202  ENSMUST00000139746.1  762   No protein  -                     lncRNA          -          -        TSL:2

[Figure: Ensembl region view, 28.31 kb, 119.02 Mb to 119.04 Mb on chromosome 11. Forward strand: Cbx2-201 (protein coding) and Cbx2-202 (lncRNA); contig AL662835.11. Reverse strand: Cbx8-201 (protein coding), Cbx8-202 and Cbx8-203 (lncRNA). Regulatory Build legend: CTCF, open chromatin, promoter, promoter flank. Gene legend: protein coding (merged Ensembl/Havana); non-protein coding (RNA gene).]

Transcript: ENSMUST00000026662

[Figure: Ensembl protein view of Cbx2-201 (protein coding), 8.31 kb, forward strand, with a scale bar from 0 to 519 amino acids. Annotation tracks:]

MobiDB lite; Low complexity (Seg)
Superfamily: Chromo-like domain superfamily
SMART: Chromo/chromo shadow domain
Pfam: Chromo domain; CBX family C-terminal motif
PROSITE profiles: Chromo/chromo shadow domain
PROSITE patterns: Chromo domain, conserved site
PANTHER: PTHR46860
Gene3D: 2.40.50.40
CDD: cd18647
Sequence variants (dbSNP and all other sources): missense variant, splice region variant, synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
