https://www.alphaknockout.com

Mouse Chd8 Knockout Project (CRISPR/Cas9)

Objective: To create a Chd8 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Chd8 gene (NCBI Reference Sequence: NM_201637; Ensembl: ENSMUSG00000053754) is located on mouse chromosome 14. A total of 37 exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 37 (Transcript: ENSMUST00000089752). Exons 2 to 7 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Homozygous null embryos are growth retarded starting at E5.5 and exhibit developmental arrest at E6.5. Mutants develop into an egg cylinder but do not form a primitive streak or mesoderm, and they exhibit increased apoptosis at E7.5.

Exon 2 starts at about 10.9% of the coding region. Exons 2 to 7 cover 15.32% of the coding region. The size of the effective KO region is ~8889 bp. The KO region does not contain any other known gene.
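The percentages above can be turned into approximate base counts. The sketch below is illustrative only: it assumes the coding region length is three times the 2582 aa protein length reported for ENSMUST00000089752 in the transcript table, plus one stop codon. The report does not state the coding length directly, and the helper name is hypothetical.

```python
# Assumption: CDS length = 3 * (protein length + 1 stop codon).
# The 2582 aa figure is taken from the transcript table in this report.
PROTEIN_LENGTH_AA = 2582
CDS_LENGTH_BP = 3 * (PROTEIN_LENGTH_AA + 1)


def percent_to_bp(percent, cds_length=CDS_LENGTH_BP):
    """Convert a percentage of the coding region into a rounded base count."""
    return round(cds_length * percent / 100)


# Approximate position where exon 2 begins within the coding region.
exon2_start_offset_bp = percent_to_bp(10.9)

# Approximate number of coding bases contained in exons 2 to 7.
deleted_coding_bp = percent_to_bp(15.32)
```

Because the percentages in the report are rounded, these values are estimates and should not be used to infer the reading frame of the deletion.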

Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1 to 7 and exon 37 of mouse Chd8, with one gRNA region on each side of the knockout region. Legend: exon of mouse Chd8; knockout region.]

Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 1228 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
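A self-alignment dot plot of this kind can be sketched as follows. This is a minimal illustration that assumes exact matching of 15 bp words; the tool used for the report is not named and may tolerate mismatches. The function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def dot_plot_matches(seq, window=15):
    """Find window-sized words that occur more than once in seq.

    Returns (forward, reverse):
      forward: sorted (i, j) pairs, i != j, where seq[i:i+window] == seq[j:j+window]
      reverse: (i, j) pairs where seq[i:i+window] equals the word starting at
               position j of the reverse-complemented sequence
    Off-diagonal runs in `forward` are the signature of tandem or direct repeats.
    """
    seq = seq.upper()
    index = {}
    for i in range(len(seq) - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)

    forward = sorted(
        (i, j)
        for positions in index.values()
        for i in positions
        for j in positions
        if i != j
    )

    rc = reverse_complement(seq)
    reverse = [
        (i, j)
        for j in range(len(rc) - window + 1)
        for i in index.get(rc[j:j + window], [])
    ]
    return forward, reverse
```

A candidate gRNA site could then be checked against the positions listed in `forward` to confirm that it lies outside any repeated word.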

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 457 bp section downstream of Exon 7 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (up) Window size: 300 bp

[Figure: GC content plot along the sequence.]

Summary: Full Length(1228bp) | A(27.12% 333) | C(16.86% 207) | T(31.68% 389) | G(24.35% 299)

Note: The 1228 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp

[Figure: GC content plot along the sequence.]

Summary: Full Length(457bp) | A(24.29% 111) | C(20.79% 95) | T(32.6% 149) | G(22.32% 102)

Note: The 457 bp section downstream of Exon 7 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
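The base counts in the two summaries give an overall GC content of about 41.2% for the upstream section and about 43.1% for the downstream section. The sketch below reproduces that arithmetic and shows one way a 300 bp sliding-window profile could be computed. The function names and the step size are assumptions, not details taken from the report.

```python
def gc_fraction_from_counts(a, c, t, g):
    """Overall GC fraction from per-base counts."""
    return (c + g) / (a + c + t + g)


def sliding_gc(seq, window=300, step=1):
    """GC fraction of each window; returns a list of (start, fraction) pairs."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((start, (chunk.count("G") + chunk.count("C")) / window))
    return profile


# Counts copied from the two Summary lines in this report.
upstream_gc = gc_fraction_from_counts(a=333, c=207, t=389, g=299)
downstream_gc = gc_fraction_from_counts(a=111, c=95, t=149, g=102)
```

What counts as a "significant high GC-content region" is not defined in the report, so any threshold applied to the sliding-window profile would be a separate assumption.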

BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  1228   1       1228  1228   100.0%    chr14  -       52235502   52236729   1228
YourSeq  67     974     1040  1228   100.0%    chr14  -       23979763   23979829   67
YourSeq  54     993     1049  1228   98.3%     chr10  +       64413648   64413718   71
YourSeq  52     526     647   1228   86.2%     chr16  +       10668269   10668391   123
YourSeq  51     519     648   1228   90.5%     chr16  +       53112681   53112811   131
YourSeq  50     545     750   1228   93.2%     chr10  -       12912581   12912970   390
YourSeq  48     522     640   1228   92.9%     chr4   +       117045831  117046203  373
YourSeq  48     522     656   1228   88.8%     chr18  +       74370485   74370620   136
YourSeq  47     529     643   1228   91.3%     chr16  -       87486752   87487068   317
YourSeq  43     509     566   1228   87.8%     chr5   +       123337422  123337484  63
YourSeq  41     544     647   1228   95.6%     chr10  -       84649946   84650050   105
YourSeq  41     522     648   1228   93.7%     chr3   +       88657625   88657752   128
YourSeq  40     501     647   1228   97.7%     chr3   +       87244893   87245042   150
YourSeq  40     511     647   1228   82.3%     chr13  +       87942463   87942606   144
YourSeq  40     501     590   1228   71.1%     chr11  +       114858912  114858999  88
YourSeq  38     519     569   1228   88.0%     chrX   -       11952716   11952768   53
YourSeq  37     507     560   1228   84.7%     chr2   -       153477716  153477772  57
YourSeq  37     519     569   1228   86.3%     chr1   -       156287976  156288026  51
YourSeq  36     487     558   1228   77.5%     chr2   -       91949206   91949266   61
YourSeq  36     515     562   1228   87.5%     chr11  +       107635373  107635420  48

Note: The 1228 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
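One way to read the BLAT results is to separate the full-length self hit from the secondary hits and check how much of the query each secondary hit covers. The sketch below parses a few rows copied from the upstream results. The class and function names are hypothetical, and no significance threshold is implied because the report does not state one.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(row):
    """Parse one whitespace-separated BLAT result row into a BlatHit."""
    fields = row.split()
    # Drop the "browser details" link text if it was captured with the row.
    if fields[:2] == ["browser", "details"]:
        fields = fields[2:]
    (_query, score, q_start, q_end, q_size,
     identity, chrom, strand, t_start, t_end, span) = fields
    return BlatHit(int(score), int(q_start), int(q_end), int(q_size),
                   float(identity.rstrip("%")), chrom, strand,
                   int(t_start), int(t_end), int(span))


def query_coverage(hit):
    """Fraction of the query sequence covered by the hit."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


# First four rows of the upstream BLAT results in this report.
ROWS = """
YourSeq 1228 1 1228 1228 100.0% chr14 - 52235502 52236729 1228
YourSeq 67 974 1040 1228 100.0% chr14 - 23979763 23979829 67
YourSeq 54 993 1049 1228 98.3% chr10 + 64413648 64413718 71
YourSeq 52 526 647 1228 86.2% chr16 + 10668269 10668391 123
"""

hits = [parse_blat_row(row) for row in ROWS.strip().splitlines()]
primary = max(hits, key=lambda h: h.score)
secondary = sorted((h for h in hits if h != primary),
                   key=lambda h: h.score, reverse=True)
```

In these rows the best secondary hit covers 67 of the 1228 query bases, roughly 5.5% of the section.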

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  457    1       457   457    100.0%    chr14  -       52226156   52226612   457
YourSeq  40     65      380   457    95.6%     chr18  -       61416920   61417400   481
YourSeq  25     199     227   457    81.5%     chr1   +       178855141  178855167  27
YourSeq  24     156     189   457    70.4%     chr14  -       103437479  103437505  27
YourSeq  23     219     243   457    96.0%     chr19  -       18190725   18190749   25
YourSeq  23     115     140   457    96.2%     chr10  +       11975590   11975617   28
YourSeq  22     208     229   457    100.0%    chr2   -       143832946  143832967  22
YourSeq  21     191     211   457    100.0%    chr5   +       113483420  113483440  21
YourSeq  20     146     165   457    100.0%    chr5   -       4059834    4059853    20
YourSeq  20     118     137   457    100.0%    chr4   -       69664653   69664672   20
YourSeq  20     204     225   457    95.5%     chr13  -       96568296   96568317   22
YourSeq  20     396     415   457    100.0%    chr5   +       57776266   57776285   20
YourSeq  20     355     374   457    100.0%    chr11  +       22493212   22493231   20

Note: The 457 bp section downstream of Exon 7 is BLAT searched against the genome. No significant similarity is found.
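The full-length BLAT hits for the two flanking sections also give a consistency check on the stated KO region size. The sketch below assumes that the upstream and downstream sections directly abut the deleted interval, which the report does not state explicitly; under that assumption the gap between them on chr14 is 8889 bp, matching the ~8889 bp effective KO region in the strategy summary. The function name is hypothetical.

```python
def gap_between(flank_a, flank_b):
    """Number of bases strictly between two non-overlapping closed intervals."""
    (first_start, first_end), (second_start, second_end) = sorted([flank_a, flank_b])
    if second_start <= first_end:
        raise ValueError("intervals overlap")
    return second_start - first_end - 1


# chr14 coordinates of the 100% identity hits in the two BLAT results.
# Chd8 lies on the reverse strand, so the upstream section has the
# higher genomic coordinates.
upstream_flank = (52235502, 52236729)
downstream_flank = (52226156, 52226612)

ko_region_bp = gap_between(upstream_flank, downstream_flank)  # 8889
```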

Gene and information: Chd8 chromodomain helicase DNA binding protein 8 [Mus musculus (house mouse)] Gene ID: 67772, updated on 10-Oct-2019

Gene summary

Official Symbol: Chd8 (provided by MGI)
Official Full Name: chromodomain helicase DNA binding protein 8 (provided by MGI)
Primary source: MGI:MGI:1915022
See related: Ensembl:ENSMUSG00000053754
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Chd-8; Duplin; HELSNF1; AU015341; mKIAA1564; 5830451P18Rik
Summary: This gene encodes a member of the chromodomain-helicase-DNA binding protein family, which is characterized by a SNF2-like domain and two chromatin organization modifier domains. The encoded protein also contains brahma and kismet domains, which are common to the subfamily of chromodomain-helicase-DNA binding proteins to which this protein belongs. In mammals, this gene has been shown to function in several processes including transcriptional regulation, epigenetic remodeling, promotion of cell proliferation, and regulation of RNA synthesis. Knockout of this gene causes early embryonic lethality due to widespread apoptosis. Heterozygous loss of function mutations result in autism spectrum disorder-like behaviors that include increased anxiety, repetitive behavior, and altered social behavior. [provided by RefSeq, Dec 2016]
Expression: Ubiquitous expression in CNS E11.5 (RPKM 11.5), thymus adult (RPKM 10.3) and 28 other tissues
Orthologs: human; all

Genomic context

Location: 14; 14 C2

Exon count: 40

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  14   NC_000080.6 (52198151..52258042, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    14   NC_000080.5 (52817826..52857247, complement)

Chromosome 14 - NC_000080.6

Transcript information: This gene has 16 transcripts

Gene: Chd8 ENSMUSG00000053754

Description: chromodomain helicase DNA binding protein 8 [Source:MGI Symbol;Acc:MGI:1915022]
Gene Synonyms: 5830451P18Rik, Duplin
Location: Chromosome 14: 52,198,151-52,257,780 reverse strand. GRCm38:CM001007.2
About this gene: This gene has 16 transcripts (splice variants), 187 orthologues, 32 paralogues, is a member of 1 Ensembl protein family and is associated with 8 phenotypes.

Transcripts

Name      Transcript ID          bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Chd8-212  ENSMUST00000200169.5   8509  2582aa      ENSMUSP00000142890.1  Protein coding   CCDS36919  Q09XV5      TSL:5, GENCODE basic, APPRIS P1
Chd8-201  ENSMUST00000089752.10  8190  2582aa      ENSMUSP00000087184.4  Protein coding   CCDS36919  Q09XV5      TSL:1, GENCODE basic, APPRIS P1
Chd8-209  ENSMUST00000149975.8   3094  1031aa      ENSMUSP00000122995.2  Protein coding   -          F7AL76      CDS 5' and 3' incomplete, TSL:5
Chd8-216  ENSMUST00000227897.1   415   138aa       ENSMUSP00000154595.1  Protein coding   -          A0A2I3BRI1  CDS 5' and 3' incomplete
Chd8-213  ENSMUST00000226307.1   360   22aa        ENSMUSP00000154509.1  Protein coding   -          A0A2I3BR97  CDS 3' incomplete
Chd8-215  ENSMUST00000226681.1   3695  No protein  -                     Retained intron  -          -           -
Chd8-207  ENSMUST00000147827.7   3543  No protein  -                     Retained intron  -          -           TSL:1
Chd8-206  ENSMUST00000147309.1   3350  No protein  -                     Retained intron  -          -           TSL:1
Chd8-211  ENSMUST00000199135.1   3329  No protein  -                     Retained intron  -          -           TSL:NA
Chd8-203  ENSMUST00000134329.1   2444  No protein  -                     Retained intron  -          -           TSL:1
Chd8-205  ENSMUST00000145404.1   911   No protein  -                     Retained intron  -          -           TSL:3
Chd8-204  ENSMUST00000136528.1   741   No protein  -                     Retained intron  -          -           TSL:3
Chd8-202  ENSMUST00000122823.1   674   No protein  -                     Retained intron  -          -           TSL:5
Chd8-214  ENSMUST00000226625.1   488   No protein  -                     Retained intron  -          -           -
Chd8-210  ENSMUST00000155614.1   428   No protein  -                     Retained intron  -          -           TSL:3
Chd8-208  ENSMUST00000149694.1   479   No protein  -                     lncRNA           -          -           TSL:5

[Figure: Ensembl genome browser view of a 79.63 kb region, 52.20 Mb to 52.26 Mb, with forward and reverse strands.]
Forward strand genes: Gm26590-201 (lncRNA), Gm26590-202 (lncRNA), Gm43766-201 (TEC).
Contigs: AC159323.3 (forward), AC126037.4 (reverse).
Reverse strand genes (Comprehensive set): Supt16-201 (protein coding); Chd8-201, Chd8-209, Chd8-212 (protein coding); Chd8-202, Chd8-203, Chd8-204, Chd8-205, Chd8-206, Chd8-207, Chd8-210, Chd8-211, Chd8-214, Chd8-215 (retained intron); Rab2b-201, Rab2b-202, Rab2b-203 (protein coding); Rab2b-204, Rab2b-205 (nonsense mediated decay); Rab2b-206 (lncRNA); Rab2b-208 (retained intron).
Regulatory Build legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank.
Gene legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana); Non-Protein Coding (RNA gene; processed transcript).

Transcript: ENSMUST00000089752

[Figure: protein domain and variant map for Chd8-201 (protein coding; reverse strand, 39.42 kb), translation ENSMUSP00000087..., with a scale bar from 0 to 2582.]
MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils)
Superfamily: Chromo-like domain superfamily; BRK domain superfamily; P-loop containing nucleoside triphosphate hydrolase
SMART: Chromo/chromo shadow domain; BRK domain; Helicase superfamily 1/2, ATP-binding domain; Helicase, C-terminal
Pfam: Chromo domain; Helicase, C-terminal; BRK domain; SNF2-related, N-terminal domain
PROSITE profiles: Chromo/chromo shadow domain; Helicase superfamily 1/2, ATP-binding domain; Helicase, C-terminal
PANTHER: PTHR45623; PTHR45623:SF3
HAMAP: Chromodomain-helicase-DNA-binding protein 8
Gene3D: 2.40.50.40; 3.40.50.300; 1.10.10.60; BRK domain superfamily; SNF2-like, N-terminal domain superfamily
CDD: cd18060; cd18793; cd18663; cd18668
Sequence variants (dbSNP and all other sources). Variant legend: missense variant; splice region variant; synonymous variant.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
