https://www.alphaknockout.com

Mouse Cbx8 Knockout Project (CRISPR/Cas9)

Objective: To create a Cbx8 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Cbx8 gene (NCBI Reference Sequence: NM_013926; Ensembl: ENSMUSG00000025578) is located on mouse chromosome 11. Five exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 5 (Transcript: ENSMUST00000026663). Exons 1-5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a knock-out allele exhibit impaired MLL-AF9 transformation but are otherwise viable with normal hematopoiesis.

Exon 1 starts at about 0.09% of the coding region. Exons 1-5 cover 100.0% of the coding region. The effective KO region is ~2169 bp and contains no other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele, 5' to 3', showing exons 1-5 of mouse Cbx8 with a gRNA region on each side. Legend: exon of mouse Cbx8; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of the start codon is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
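The self-alignment described in the note can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool used to produce the report: the function names `self_dot_plot` and `has_tandem_repeat` and the `max_offset` cutoff are hypothetical, only the 15 bp window comes from the report, and only forward-strand matches are considered (the report's plot also shows reverse-complement matches).

```python
def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs with i < j where the window-length
    words starting at positions i and j are identical (forward strand only)."""
    seq = seq.upper()
    positions = {}
    # Index every window-length word by its start positions.
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)
    hits = []
    # Every pair of start positions sharing a word is an off-diagonal dot.
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


def has_tandem_repeat(seq, window=15, max_offset=100):
    """Crude proxy for a tandem repeat: any off-diagonal match whose
    offset from the main diagonal is at most max_offset (an assumed cutoff)."""
    return any(j - i <= max_offset for i, j in self_dot_plot(seq, window))
```

For example, `has_tandem_repeat("ACGTTGCA" * 5)` flags the 8 bp unit repeated five times, because the 15-mers starting at positions 0 and 8 are identical.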

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to check for tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(18.95% 379) | C(26.85% 537) | T(18.65% 373) | G(35.55% 711)

Note: The 2000 bp section upstream of the start codon is analyzed to determine the GC content. Regions of significantly high GC content are found. The gRNA site is selected outside of these high GC-content regions.
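The windowed GC analysis described in the note can be sketched in Python as follows. This is an illustrative sketch only: the function names and the 0.70 threshold for a "high GC" window are assumptions, since the report does not state how high-GC regions are called. Only the 300 bp window size comes from the report.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq, window=300):
    """GC fraction of every window-length slice, advancing one base at a time."""
    return [gc_fraction(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


def high_gc_windows(seq, window=300, threshold=0.70):
    """(start, gc) for each window at or above threshold.
    The threshold value is an assumed cutoff, not taken from the report."""
    return [(i, gc) for i, gc in enumerate(sliding_gc(seq, window))
            if gc >= threshold]
```

As a cross-check against the summary counts above, the overall GC fraction of the upstream section is (537 + 711) / 2000 = 62.4%.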

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(21.4% 428) | C(24.3% 486) | T(26.05% 521) | G(28.25% 565)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine the GC content. Regions of significantly high GC content are found. The gRNA site is selected outside of these high GC-content regions.


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr11  -       119040851  119042850  2000
YourSeq  21     529    549   2000   100.0%    chr3   -       121725281  121725301  21
YourSeq  21     1081   1101  2000   100.0%    chr12  +       77631156   77631176   21
YourSeq  20     1864   1883  2000   100.0%    chr1   +       20869621   20869640   20

Note: The 2000 bp section upstream of the start codon is searched against the genome with BLAT. No significant similarity to other genomic regions is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr11  -       119036680  119038679  2000
YourSeq  28     1684   1715  2000   96.7%     chr14  -       70972696   70972744   49
YourSeq  28     264    291   2000   100.0%    chr12  +       7360850    7360877    28
YourSeq  25     566    595   2000   93.4%     chr1   +       66892481   66892516   36
YourSeq  24     552    579   2000   88.5%     chr1   +       123374561  123374587  27
YourSeq  23     102    124   2000   100.0%    chr4   -       3640369    3640391    23
YourSeq  22     1723   1744  2000   100.0%    chr1   +       28660023   28660044   22
YourSeq  22     542    564   2000   100.0%    chr1   +       13753284   13753307   24
YourSeq  21     557    577   2000   100.0%    chr10  -       3158331    3158351    21
YourSeq  21     1217   1237  2000   100.0%    chr13  +       50059611   50059631   21

Note: The 2000 bp section downstream of the stop codon is searched against the genome with BLAT. No significant similarity to other genomic regions is found.
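One way to make the "no significant similarity" check explicit is to parse the BLAT rows and flag any secondary hit whose score is a sizeable fraction of the query length. The sketch below is hypothetical: the names `BlatHit`, `parse_hit` and `off_target_hits` and the 10% score cutoff are assumptions, because the report does not define what it counts as significant. The rows are the upstream BLAT results listed in this report.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one whitespace-separated BLAT row; field 0 is the query name."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def off_target_hits(hits, min_fraction=0.1):
    """Secondary hits (all but the best-scoring one) whose score is at
    least min_fraction of the query size. The cutoff is an assumption."""
    best = max(hits, key=lambda h: h.score)
    return [h for h in hits
            if h is not best and h.score >= min_fraction * h.q_size]


# Upstream BLAT rows as listed in this report.
ROWS = """
YourSeq 2000 1 2000 2000 100.0% chr11 - 119040851 119042850 2000
YourSeq 21 529 549 2000 100.0% chr3 - 121725281 121725301 21
YourSeq 21 1081 1101 2000 100.0% chr12 + 77631156 77631176 21
YourSeq 20 1864 1883 2000 100.0% chr1 + 20869621 20869640 20
"""

hits = [parse_hit(row) for row in ROWS.strip().splitlines()]
```

With the assumed 10% cutoff (a score of 200 for a 2000 bp query), `off_target_hits(hits)` returns an empty list for these rows, since every secondary hit scores 21 or less.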


Gene and information:
Cbx8 chromobox 8 [Mus musculus (house mouse)]
Gene ID: 30951, updated on 12-Aug-2019

Gene summary

Official Symbol: Cbx8 (provided by MGI)
Official Full Name: chromobox 8 (provided by MGI)
Primary source: MGI:MGI:1353589
See related: Ensembl:ENSMUSG00000025578
Gene type: protein coding
RefSeq status: PROVISIONAL
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Pc3
Expression: Ubiquitous expression in CNS E14 (RPKM 2.1), CNS E11.5 (RPKM 1.9) and 28 other tissues
Orthologs: human

Genomic context

Location: 11; 11 E2
Exon count: 5

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  11   NC_000077.6 (119036303..119040913, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    11   NC_000077.5 (118899750..118902227, complement)

[Figure: genomic context of Cbx8 on chromosome 11 (NC_000077.6).]


Transcript information: This gene has 3 transcripts

Gene: Cbx8 ENSMUSG00000025578

Description: chromobox 8 [Source:MGI Symbol;Acc:MGI:1353589]
Gene Synonyms: Pc3, polycomb 3
Location: Chromosome 11: 119,036,305-119,040,969 reverse strand. GRCm38:CM001004.2
About this gene: This gene has 3 transcripts (splice variants), 248 orthologues, 9 paralogues, is a member of 1 Ensembl protein family and is associated with 1 phenotype.

Transcripts

Name      Transcript ID         bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Cbx8-201  ENSMUST00000026663.7  3580  362aa       ENSMUSP00000026663.7  Protein coding  CCDS25709  Q9QXV1   TSL:1, GENCODE basic, APPRIS P1
Cbx8-202  ENSMUST00000128019.1  789   No protein  -                     lncRNA          -          -        TSL:3
Cbx8-203  ENSMUST00000143831.1  533   No protein  -                     lncRNA          -          -        TSL:2

[Figure: Ensembl region view, 24.66 kb window spanning 119.03-119.05 Mb. Forward strand: Cbx2-201 (protein coding), Cbx2-202 (lncRNA); contig AL662835.11. Reverse strand: Cbx8-201 (protein coding), Cbx8-202 (lncRNA), Cbx8-203 (lncRNA). Regulatory Build track; regulation legend: CTCF, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana); non-protein coding (RNA gene).]


Transcript: ENSMUST00000026663

[Figure: protein feature view of Cbx8-201 (protein coding; reverse strand, 4.67 kb; ENSMUSP00000026...), scale bar 0-362 aa. Tracks shown: PDB-ENSP mappings; MobiDB lite; low complexity (Seg); Superfamily: Chromo-like domain superfamily; SMART: Chromo/chromo shadow domain; Pfam: Chromo domain, CBX family C-terminal motif; PROSITE profiles: Chromo/chromo shadow domain; PROSITE patterns: Chromo domain, conserved site; PANTHER: PTHR46389; Gene3D: 2.40.50.40; CDD: cd18627; sequence variants (dbSNP and all other sources), with legend: missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
