https://www.alphaknockout.com

Mouse Nsd1 Knockout Project (CRISPR/Cas9)

Objective: To create an Nsd1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Nsd1 gene (NCBI Reference Sequence: NM_008739; Ensembl: ENSMUSG00000021488) is located on mouse chromosome 13. 23 exons are identified, with the ATG start codon in exon 2 and the TAA stop codon in exon 23 (Transcript: ENSMUST00000099490). Exon 3 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Homozygotes for targeted null mutations exhibit excess apoptosis and retarded growth, fail to complete gastrulation, and are resorbed by embryonic day 10.

Exon 3 starts at about 11.53% of the coding region and covers 1.68% of it. The size of the effective KO region is ~136 bp. The KO region does not contain any other known gene.
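The percentages above can be cross-checked against the protein length listed for transcript Nsd1-201 (2691 aa). The following is a minimal sketch, assuming the coding region length equals (amino acids + 1 stop codon) x 3 and that the whole ~136 bp KO region is coding sequence; the variable names are illustrative and not part of the report. A region whose length is not a multiple of 3 would be expected to shift the reading frame if removed cleanly, although the actual outcome depends on the alleles recovered.

```python
# Values quoted in the report.
PROTEIN_LENGTH_AA = 2691       # Nsd1-201 (ENSMUST00000099490)
KO_REGION_BP = 136             # approximate effective KO region
EXON3_START_FRACTION = 0.1153  # exon 3 starts at about 11.53% of the CDS

# Assumption: CDS length = (amino acids + stop codon) * 3.
cds_length_bp = (PROTEIN_LENGTH_AA + 1) * 3

# Share of the coding region covered by the KO region, in percent.
ko_fraction_pct = 100 * KO_REGION_BP / cds_length_bp

# Approximate CDS position at which exon 3 begins.
exon3_start_bp = round(EXON3_START_FRACTION * cds_length_bp)

# A length that is not a multiple of 3 cannot be removed in frame.
causes_frameshift = KO_REGION_BP % 3 != 0

print(cds_length_bp)              # 8076
print(round(ko_fraction_pct, 2))  # 1.68
print(exon3_start_bp)             # 931
print(causes_frameshift)          # True
```

Under these assumptions the computed 1.68% agrees with the figure stated in the report.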


Overview of the Targeting Strategy

[Figure: wildtype allele with exons 1, 3 and 23 numbered, and the 5' and 3' gRNA regions marked. Legend: exon of mouse Nsd1; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot of the upstream sequence against itself. Legend: forward; reverse complement.]

Note: The 2000 bp section upstream of Exon 3 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot of the downstream sequence against itself. Legend: forward; reverse complement.]

Note: The 2000 bp section downstream of Exon 3 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
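The self-alignment described in the two dot plot notes can be sketched as follows. This is a simplified illustration, not the tool used for the report: it compares exact 15 bp words on the forward strand only (the plots also show reverse-complement matches), and the function names and the `max_offset` cutoff for calling a repeat "tandem" are assumptions made for this example.

```python
def dot_plot_matches(seq: str, window: int = 15) -> list[tuple[int, int]]:
    """Return sorted (i, j) pairs with i < j whose window-length words are identical."""
    positions: dict[str, list[int]] = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)

    matches = []
    for starts in positions.values():
        # Every pair of occurrences of the same word is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                matches.append((starts[a], starts[b]))
    return sorted(matches)


def has_tandem_repeat(seq: str, window: int = 15, max_offset: int = 100) -> bool:
    """Flag a sequence if any identical words lie within max_offset bp of each other."""
    return any(j - i <= max_offset for i, j in dot_plot_matches(seq, window))


# Toy sequences (hypothetical, not taken from the Nsd1 locus).
repetitive = "ACGTTGCA" * 5
unique = "ACGTACGGTCAATGCCTAGGATCCGATTACAGGCTTAACG"

print(has_tandem_repeat(repetitive))  # True
print(has_tandem_repeat(unique))      # False
```

In a real screen the 2000 bp flanking sequences would be passed in place of the toy strings, and candidate gRNA or primer sites would be kept away from the flagged positions.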


Overview of the GC Content Distribution (up) Window size: 300 bp

[Figure: GC content distribution along the 2000 bp upstream sequence.]

Summary: Full Length(2000bp) | A(25.05% 501) | C(15.8% 316) | T(37.3% 746) | G(21.85% 437)

Note: The 2000 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp

[Figure: GC content distribution along the 2000 bp downstream sequence.]

Summary: Full Length(2000bp) | A(28.95% 579) | C(18.95% 379) | T(31.6% 632) | G(20.5% 410)

Note: The 2000 bp section downstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
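The overall GC content implied by the two base-count summaries can be worked out directly, and the 300 bp window analysis can be sketched in a few lines. The snippet below is an illustration under stated assumptions: the function names are hypothetical, and the report does not say which step size or threshold its own tool uses.

```python
def gc_percent(counts: dict[str, int]) -> float:
    """Overall GC percentage from per-base counts."""
    return 100 * (counts["G"] + counts["C"]) / sum(counts.values())


def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list[float]:
    """GC percentage for each window of the given size along seq."""
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window].upper()
        values.append(100 * (chunk.count("G") + chunk.count("C")) / window)
    return values


# Base counts copied from the two summary lines in the report.
upstream = {"A": 501, "C": 316, "T": 746, "G": 437}
downstream = {"A": 579, "C": 379, "T": 632, "G": 410}

print(round(gc_percent(upstream), 2))    # 37.65
print(round(gc_percent(downstream), 2))  # 39.45

# Toy sequence to show the window calculation (not from the Nsd1 locus).
print(sliding_gc("GGCCAATT", window=4, step=2))  # [100.0, 50.0, 0.0]
```

Both flanks come out below 40% GC overall, which is consistent with the notes that no high-GC region was found.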


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr13  +       55231896   55233895   2000
YourSeq  213    352     736   2000   94.2%     chr9   -       92949221   92949639   419
YourSeq  208    333     749   2000   94.5%     chr15  +       93110485   93111002   518
YourSeq  205    283     736   2000   88.7%     chr11  +       62545171   62545507   337
YourSeq  194    290     736   2000   93.0%     chr14  +       79605949   79606482   534
YourSeq  187    316     736   2000   94.4%     chr8   +       70226710   70227134   425
YourSeq  182    551     747   2000   96.5%     chr2   +       90173540   90173924   385
YourSeq  181    415     736   2000   95.6%     chr4   -       155830945  155831414  470
YourSeq  181    551     759   2000   93.1%     chr19  +       3953431    3953637    207
YourSeq  179    551     753   2000   95.0%     chr7   -       45442169   45442371   203
YourSeq  179    539     751   2000   93.6%     chr19  -       15945267   15945478   212
YourSeq  179    540     736   2000   93.9%     chr12  -       24094276   24094469   194
YourSeq  179    540     736   2000   93.9%     chr12  -       22742739   22742932   194
YourSeq  178    87      738   2000   86.4%     chr5   -       139113096  139113577  482
YourSeq  178    554     837   2000   91.0%     chr16  -       17049218   17049489   272
YourSeq  178    545     759   2000   91.0%     chr8   +       37967483   37967682   200
YourSeq  178    551     740   2000   95.8%     chr11  +       78137244   78137432   189
YourSeq  177    540     736   2000   93.3%     chr12  -       22464046   22464237   192
YourSeq  177    545     756   2000   91.0%     chr6   +       63069624   63069829   206
YourSeq  177    540     736   2000   93.3%     chr12  +       18856367   18856560   194

Note: The 2000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr13  +       55234032   55236031   2000
YourSeq  261    735     2000  2000   93.4%     chr2   -       11400292   11608946   208655
YourSeq  255    768     2000  2000   93.9%     chr4   -       132540718  132624762  84045
YourSeq  230    758     2000  2000   94.0%     chr17  -       29031241   29034896   3656
YourSeq  192    288     587   2000   88.0%     chr14  -       37267115   37267433   319
YourSeq  192    762     1153  2000   92.6%     chr11  -       75618003   75618723   721
YourSeq  189    635     888   2000   91.0%     chrX   +       162816387  162816724  338
YourSeq  181    296     551   2000   88.3%     chr1   +       164300136  164300410  275
YourSeq  164    742     942   2000   93.3%     chr11  -       63983978   63984181   204
YourSeq  164    758     942   2000   94.6%     chr17  +       45696008   45696193   186
YourSeq  163    758     942   2000   94.6%     chr17  -       88152813   88152998   186
YourSeq  162    758     942   2000   95.2%     chr4   -       4969685    4969880    196
YourSeq  160    758     942   2000   94.1%     chr4   -       99687230   99687419   190
YourSeq  160    758     942   2000   94.1%     chr3   -       73561725   73561914   190
YourSeq  160    758     942   2000   94.1%     chr11  +       97224933   97225122   190
YourSeq  159    758     941   2000   94.1%     chr9   -       21163451   21163639   189
YourSeq  159    758     942   2000   92.3%     chr9   +       4815057    4815239    183
YourSeq  159    758     942   2000   93.5%     chr5   +       59472719   59472904   186
YourSeq  159    758     942   2000   93.5%     chr2   +       175022853  175023038  186
YourSeq  158    758     942   2000   94.0%     chr7   -       81431864   81432060   197

Note: The 2000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
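To make the "no significant similarity" reading easier to check, the hit tables can be parsed and the strongest off-target hit summarized. The sketch below uses the first three rows of the upstream table; the `BlatHit` class, its field names, and the query-coverage measure are assumptions made for this example, and the report does not state what cutoff it treats as significant.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity_pct: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int

    @property
    def query_coverage_pct(self) -> float:
        """Share of the query covered by this hit, in percent."""
        return 100 * (self.q_end - self.q_start + 1) / self.q_size


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated row of the BLAT summary table."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


# First three rows of the upstream BLAT table in the report.
ROWS = [
    "YourSeq 2000 1 2000 2000 100.0% chr13 + 55231896 55233895 2000",
    "YourSeq 213 352 736 2000 94.2% chr9 - 92949221 92949639 419",
    "YourSeq 208 333 749 2000 94.5% chr15 + 93110485 93111002 518",
]

hits = [parse_hit(row) for row in ROWS]

# Drop the full-length, 100% identity self hit; the rest are off-target.
off_target = [h for h in hits
              if not (h.identity_pct == 100.0 and h.score == h.q_size)]
best = max(off_target, key=lambda h: h.score)

print(best.chrom, best.score, round(best.query_coverage_pct, 2))  # chr9 213 19.25
```

In this subset the best off-target hit covers under a fifth of the 2000 bp query, compared with full coverage for the self hit on chr13.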


Gene and information: Nsd1 nuclear receptor-binding SET-domain protein 1 [Mus musculus (house mouse)]
Gene ID: 18193, updated on 12-Aug-2019

Gene summary

Official Symbol: Nsd1 (provided by MGI)
Official Full Name: nuclear receptor-binding SET-domain protein 1 (provided by MGI)
Primary source: MGI:MGI:1276545
See related: Ensembl:ENSMUSG00000021488
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: KMT3B; AI528500
Expression: Ubiquitous expression in CNS E11.5 (RPKM 12.4), placenta adult (RPKM 9.2) and 28 other tissues

Genomic context

Location: 13; 13 B1
Exon count: 34

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  13   NC_000079.6 (55209782..55318325)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    13   NC_000079.5 (55311143..55419686)

[Figure: genomic context of Nsd1 on Chromosome 13 - NC_000079.6.]


Transcript information: This gene has 11 transcripts

Gene: Nsd1 ENSMUSG00000021488

Description: nuclear receptor-binding SET-domain protein 1 [Source:MGI Symbol;Acc:MGI:1276545]
Gene Synonyms: KMT3B
Location: Chromosome 13: 55,209,782-55,318,325 forward strand. GRCm38:CM001006.2
About this gene: This gene has 11 transcripts (splice variants), 299 orthologues, 21 paralogues, is a member of 1 Ensembl protein family and is associated with 10 phenotypes.

Transcripts

Name Transcript ID bp Protein Translation ID Biotype CCDS UniProt Flags

Nsd1-201 ENSMUST00000099490.2 12784 2691aa ENSMUSP00000097089.2 Protein coding CCDS36673 E9QAE4 TSL:5 GENCODE basic APPRIS P1

Nsd1-207 ENSMUST00000224973.1 9905 2588aa ENSMUSP00000153677.1 Protein coding - A0A286YE36 GENCODE basic

Nsd1-203 ENSMUST00000224156.1 1149 158aa ENSMUSP00000153366.1 Protein coding - A0A286YDS9 GENCODE basic

Nsd1-206 ENSMUST00000224918.1 1001 158aa ENSMUSP00000153511.1 Protein coding - A0A286YDS9 GENCODE basic

Nsd1-205 ENSMUST00000224693.1 852 158aa ENSMUSP00000152939.1 Protein coding - A0A286YDS9 GENCODE basic

Nsd1-208 ENSMUST00000225169.1 339 30aa ENSMUSP00000153503.1 Protein coding - A0A286YDN7 CDS 3' incomplete

Nsd1-204 ENSMUST00000224338.1 2013 No protein - Retained intron - - -

Nsd1-211 ENSMUST00000225982.1 1822 No protein - Retained intron - - -

Nsd1-209 ENSMUST00000225194.1 1770 No protein - Retained intron - - -

Nsd1-210 ENSMUST00000225405.1 772 No protein - Retained intron - - -

Nsd1-202 ENSMUST00000223894.1 426 No protein - lncRNA - - -


[Figure: genome browser view of the Nsd1 locus, 128.54 kb, axis ticks at 55.20 Mb, 55.25 Mb and 55.30 Mb. Forward strand (comprehensive gene set): Nsd1-201, Nsd1-203, Nsd1-205, Nsd1-206, Nsd1-207 and Nsd1-208 (protein coding); Nsd1-204, Nsd1-209, Nsd1-210 and Nsd1-211 (retained intron); Nsd1-202 (lncRNA); Prelid1-201 (protein coding); Prelid1-202 (retained intron). Contig: AC160958.2. Reverse strand: Gm47816-201 (processed pseudogene); Rab24-201 and Rab24-203 (protein coding); Rab24-202, Rab24-204 and Rab24-205 (retained intron); Mxd3-201 (protein coding); Mxd3-202 (lncRNA); Mxd3-203 and Mxd3-204 (retained intron). Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (pseudogene, processed transcript, RNA gene).]


Transcript: ENSMUST00000099490

[Figure: protein domain and variant view of Nsd1-201 (protein coding; 108.54 kb, forward strand; protein scale bar 0 to 2691). Feature tracks: MobiDB lite; low complexity (Seg); coiled-coils (Ncoils); Superfamily (SSF63748, SSF82199; zinc finger, FYVE/PHD-type); SMART (zinc finger, PHD-type; PWWP domain; AWS domain; SET domain; post-SET domain); Pfam (PWWP domain; NSD, Cys-His rich domain; AWS domain; SET domain); PROSITE profiles (PWWP domain; SET domain; AWS domain; zinc finger, PHD-finger; post-SET domain; zinc finger, RING-type); PROSITE patterns (zinc finger, PHD-type, conserved site); PANTHER (PTHR22884, PTHR22884:SF312); Gene3D (2.30.30.140, 2.170.270.10; zinc finger, RING/FYVE/PHD-type); CDD (cd05837, cd15648, cd05838, cd15659, cd15656, cd15653, cd15650). Sequence variants from dbSNP and all other sources. Variant legend: stop gained, inframe deletion, missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
