https://www.alphaknockout.com

Mouse Nsd1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create an Nsd1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Nsd1 gene (NCBI Reference Sequence: NM_008739; Ensembl: ENSMUSG00000021488) is located on mouse chromosome 13. Twenty-three exons have been identified, with the ATG start codon in exon 2 and the TAA stop codon in exon 23 (transcript: ENSMUST00000099490). Exon 3 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Nsd1 gene. To engineer the targeting vector, the homology arms and the cKO region will be generated by PCR using BAC clone RP24-334A17 as the template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Homozygotes for targeted null mutations exhibit excess apoptosis and retarded growth, fail to complete gastrulation, and are resorbed by embryonic day 10.

Exon 3 begins at approximately 11.53% of the coding region. Deletion of exon 3 will result in a frameshift of the gene. The 5' loxP site will be inserted into intron 2 (19744 bp) and the 3' loxP site into intron 3 (4659 bp). The effective cKO region is ~636 bp and contains no other known gene.
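The arithmetic behind the exon 3 position and the frameshift claim can be sketched as follows. This is a minimal Python illustration, not the vendor's pipeline: it takes the 2691 aa length of Nsd1-201 from the transcript table, assumes the 11.53% figure is measured against the full CDS including the TAA stop codon, and uses hypothetical exon lengths for the frameshift rule, because the length of exon 3 itself is not given in this report. All helper names are hypothetical.

```python
def cds_length_nt(protein_aa: int, include_stop: bool = True) -> int:
    """Nucleotide length of a CDS encoding `protein_aa` residues (optionally + stop codon)."""
    return protein_aa * 3 + (3 if include_stop else 0)


def approx_cds_position(fraction: float, cds_len: int) -> int:
    """Convert a fractional position within the CDS into an approximate nucleotide index."""
    return round(fraction * cds_len)


def causes_frameshift(deleted_coding_nt: int) -> bool:
    """A coding deletion keeps the reading frame only if its length is a multiple of 3."""
    return deleted_coding_nt % 3 != 0


cds = cds_length_nt(2691)                        # Nsd1-201 encodes 2691 aa
exon3_start = approx_cds_position(0.1153, cds)   # assumes 11.53% is relative to the full CDS
print(cds, exon3_start)                          # 8076 931
print(causes_frameshift(100), causes_frameshift(99))  # True False (hypothetical lengths)
```

A deletion removes whole codons only when its coding length is divisible by 3; any other length shifts every downstream codon, which is why removing exon 3 is expected to disrupt the protein.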


Overview of the Targeting Strategy

[Figure: Schematic of the wildtype allele (exons 1, 3 and 23 shown, with the 5' and 3' gRNA regions), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: exon of mouse Nsd1, homology arm, cKO region, loxP site.]


Overview of the Dot Plot (window size: 10 bp)

[Figure: Dot plot of the homology arm and cKO region sequence against itself, forward and reverse-complement panels.]

Note: The sequence of the homology arms and cKO region is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
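A self dot plot compares every 10 bp window of the sequence with every other window: off-diagonal matches on the forward strand reveal direct (tandem) repeats, and matches against the reverse complement reveal inverted repeats. The following is a minimal Python sketch of that idea, not the tool used to produce the plot in this report. It assumes exact-match windows (real dot-plot tools often tolerate mismatches), and the function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def dot_plot_hits(seq: str, window: int = 10) -> dict:
    """Exact-match hits of a self dot plot with a fixed window size.

    forward: pairs (i, j), i < j, where seq[i:i+window] == seq[j:j+window]
             (off-diagonal points, i.e. direct repeats).
    reverse: pairs (i, j) where seq[i:i+window] equals the reverse complement
             of seq[j:j+window] (inverted repeats / palindromes).
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1
    positions = defaultdict(list)  # window sequence -> list of start positions
    for start in range(n_windows):
        positions[seq[start:start + window]].append(start)

    forward = []
    for starts in positions.values():
        for a, i in enumerate(starts):
            for j in starts[a + 1:]:
                forward.append((i, j))

    reverse = []
    for j in range(n_windows):
        rc = reverse_complement(seq[j:j + window])
        for i in positions.get(rc, []):
            reverse.append((i, j))

    return {"forward": sorted(forward), "reverse": sorted(reverse)}


# A 10 bp unit repeated twice gives exactly one off-diagonal forward hit.
print(dot_plot_hits("GATTACAGGC" * 2)["forward"])  # [(0, 10)]
```

An empty `forward` list for a long sequence corresponds to the "no significant tandem repeat" outcome described in the note.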

Overview of the GC Content Distribution (window size: 300 bp)

[Figure: GC content of the homology arm and cKO region sequence in 300 bp windows.]

Summary: Full length 7136 bp | A: 1987 (27.84%) | C: 1263 (17.70%) | T: 2433 (34.09%) | G: 1453 (20.36%)

Note: The sequence of the homology arms and cKO region is analyzed to determine its GC content. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.
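The GC profile is a sliding-window calculation. The Python sketch below computes the GC fraction over 300 bp windows and recomputes the overall composition from the base counts in the summary line (38.06% GC overall). The 0.70 cutoff for flagging a high-GC window is an assumption for illustration only, since the report does not state its criterion, and the function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in `seq` (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list:
    """GC fraction of each `window`-bp window, advancing `step` bp at a time."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def high_gc_windows(seq: str, window: int = 300, threshold: float = 0.70) -> list:
    """Start positions of windows whose GC fraction exceeds `threshold`.

    The 0.70 default is an assumed cutoff, not the report's criterion.
    """
    return [i for i, gc in enumerate(sliding_gc(seq, window)) if gc > threshold]


# Overall composition from the base counts in the summary line.
counts = {"A": 1987, "C": 1263, "T": 2433, "G": 1453}
total = sum(counts.values())
overall_gc = (counts["G"] + counts["C"]) / total
print(total, f"{overall_gc:.2%}")  # 7136 38.06%
```

A step of 1 bp gives the smoothest profile; a larger step trades resolution for speed on long sequences.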


BLAT Search Results (upstream)

Query: YourSeq (3000 bp). QSTART/QEND are positions within the query; START/END are genomic coordinates; SPAN is the genomic length of the hit.

SCORE QSTART QEND QSIZE IDENTITY CHROM STRAND START     END       SPAN
3000  1      3000 3000  100.0%   chr13 +      55230646  55233645  3000
213   1602   1986 3000  94.2%    chr9  -      92949221  92949639  419
208   1583   1999 3000  94.5%    chr15 +      93110485  93111002  518
205   1533   1986 3000  88.7%    chr11 +      62545171  62545507  337
194   1540   1986 3000  93.0%    chr14 +      79605949  79606482  534
187   1566   1986 3000  94.4%    chr8  +      70226710  70227134  425
182   1801   1997 3000  96.5%    chr2  +      90173540  90173924  385
181   1665   1986 3000  95.6%    chr4  -      155830945 155831414 470
181   1801   2009 3000  93.1%    chr19 +      3953431   3953637   207
179   1801   2003 3000  95.0%    chr7  -      45442169  45442371  203
179   1789   2001 3000  93.6%    chr19 -      15945267  15945478  212
179   1790   1986 3000  93.9%    chr12 -      24094276  24094469  194
179   1790   1986 3000  93.9%    chr12 -      22742739  22742932  194
178   1337   1988 3000  86.4%    chr5  -      139113096 139113577 482
178   1804   2087 3000  91.0%    chr16 -      17049218  17049489  272
178   1795   2009 3000  91.0%    chr8  +      37967483  37967682  200
178   1801   1990 3000  95.8%    chr11 +      78137244  78137432  189
177   1790   1986 3000  93.3%    chr12 -      22464046  22464237  192
177   1795   2006 3000  91.0%    chr6  +      63069624  63069829  206
177   1790   1986 3000  93.3%    chr12 +      18856367  18856560  194

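Note: The 3000 bp section upstream of exon 3 is BLAT searched against the genome. Apart from the expected full-length self-match on chr13, all listed hits are partial alignments covering at most 652 bp of the 3000 bp query (scores of 213 or less). No significant similarity is found.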
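The conclusion of this note can be made quantitative by asking how much of the 3000 bp query each non-self hit covers. In the upstream table the only full-length hit is the expected chr13 self-match, and the largest other alignment (chr5) spans query bases 1337 to 1988, i.e. 652 of 3000. The Python sketch below encodes that check on a few rows copied from the table; the 50% coverage cutoff, the `BlatHit` class and the function names are assumptions for illustration, not BLAT output fields or the report's criterion.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int      # first aligned base in the query
    q_end: int        # last aligned base in the query
    q_size: int
    identity: float   # percent identity
    chrom: str
    strand: str
    t_start: int      # genomic start of the hit
    t_end: int        # genomic end of the hit


def query_coverage(hit: BlatHit) -> float:
    """Fraction of the query covered by the aligned block."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


def off_target_hits(hits, self_chrom="chr13", self_start=55230646, min_coverage=0.5):
    """Hits other than the expected self-match covering >= min_coverage of the query.

    min_coverage=0.5 is an assumed cutoff for illustration only.
    """
    return [h for h in hits
            if not (h.chrom == self_chrom and h.t_start == self_start)
            and query_coverage(h) >= min_coverage]


# A few rows copied from the upstream BLAT table.
upstream = [
    BlatHit(3000, 1, 3000, 3000, 100.0, "chr13", "+", 55230646, 55233645),
    BlatHit(213, 1602, 1986, 3000, 94.2, "chr9", "-", 92949221, 92949639),
    BlatHit(208, 1583, 1999, 3000, 94.5, "chr15", "+", 93110485, 93111002),
    BlatHit(178, 1337, 1988, 3000, 86.4, "chr5", "-", 139113096, 139113577),
]

print(off_target_hits(upstream))                               # []
print(max(round(query_coverage(h), 3) for h in upstream[1:]))  # 0.217
```

Coverage of the query is used here rather than the genomic SPAN column because gapped alignments can make the genomic span larger than the aligned query segment.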

BLAT Search Results (downstream)

Query: YourSeq (3000 bp).

SCORE QSTART QEND QSIZE IDENTITY CHROM STRAND START     END       SPAN
3000  1      3000 3000  100.0%   chr13 +      55234282  55237281  3000
253   1662   2333 3000  88.1%    chrX  +      162890640 162891075 436
247   1663   2338 3000  87.6%    chr6  +      65000861  65001180  320
240   1663   2334 3000  89.8%    chr9  +      28260221  28260765  545
230   1663   2306 3000  93.6%    chr3  -      105920756 105921415 660
225   1547   1857 3000  90.7%    chr1  -      136755740 136756056 317
202   1661   2306 3000  88.0%    chr3  +      88279066  88279505  440
199   1663   2313 3000  88.3%    chrX  -      94245477  94246031  555
199   1663   2316 3000  91.7%    chr2  -      92281218  92281853  636
192   1547   1843 3000  94.9%    chr4  -      127276024 127276326 303
191   1664   2285 3000  87.3%    chr3  -      95149857  95150287  431
189   1663   2287 3000  89.2%    chr19 -      37182825  37183278  454
186   1564   1862 3000  96.1%    chr4  -      125062661 125063279 619
184   1663   2030 3000  89.2%    chr2  +      156102299 156102582 284
184   1663   1855 3000  98.0%    chr15 +      4490048   4490241   194
184   1663   1862 3000  96.5%    chr11 +      79272127  79272333  207
183   1663   1857 3000  97.0%    chrX  +      73937759  73937953  195
183   1663   1864 3000  96.0%    chr1  +      170156580 170156781 202
182   1663   1857 3000  97.0%    chr9  +      50629383  50629581  199
182   1663   1852 3000  98.5%    chr18 +      34662414  34662870  457

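Note: The 3000 bp section downstream of exon 3 is BLAT searched against the genome. Apart from the expected full-length self-match on chr13, all listed hits are partial alignments covering at most 676 bp of the 3000 bp query (scores of 253 or less). No significant similarity is found.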
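The ~636 bp effective cKO region quoted in the strategy summary can be cross-checked against the chr13 self-match coordinates in the two BLAT tables: the upstream window ends at 55233645 and the downstream window starts at 55234282. A minimal Python sketch, assuming that both 3000 bp windows directly abut the cKO region (the report does not state this explicitly) and that coordinates are 1-based and inclusive; the constant and function names are hypothetical.

```python
# Self-match coordinates on chr13 taken from the two BLAT tables.
UPSTREAM_END = 55233645      # last base of the 3000 bp upstream window
DOWNSTREAM_START = 55234282  # first base of the 3000 bp downstream window


def interval_between(left_end: int, right_start: int) -> tuple:
    """Start, end and length of the gap between two 1-based, inclusive windows."""
    start = left_end + 1
    end = right_start - 1
    return start, end, end - start + 1


print(interval_between(UPSTREAM_END, DOWNSTREAM_START))  # (55233646, 55234281, 636)
```

The 636 bp gap matches the reported effective cKO region, which is consistent with the BLAT windows having been taken immediately outside it.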


Gene information: Nsd1 nuclear receptor-binding SET-domain protein 1 [Mus musculus (house mouse)], Gene ID: 18193, updated on 12-Aug-2019

Gene summary

Official Symbol: Nsd1 (provided by MGI)
Official Full Name: nuclear receptor-binding SET-domain protein 1 (provided by MGI)
Primary source: MGI:MGI:1276545
See related: Ensembl:ENSMUSG00000021488
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: KMT3B; AI528500
Expression: Ubiquitous expression in CNS E11.5 (RPKM 12.4), placenta adult (RPKM 9.2) and 28 other tissues

Genomic context

Location: 13; 13 B1

Exon count: 34

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  13   NC_000079.6 (55209782..55318325)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    13   NC_000079.5 (55311143..55419686)
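Both assemblies place Nsd1 in an interval of the same length. The Python sketch below, which assumes 1-based inclusive coordinates as reported by NCBI, computes that span (108544 bp, the 108.54 kb shown in the Ensembl transcript view) and the shift between the two assemblies at this locus. The shift is specific to this locus and is not a general liftover; the names are hypothetical.

```python
def span_bp(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval."""
    return end - start + 1


GRCM38 = (55209782, 55318325)   # NC_000079.6, annotation release 108
MGSCV37 = (55311143, 55419686)  # NC_000079.5, Build 37.2

print(span_bp(*GRCM38), span_bp(*MGSCV37))  # 108544 108544
print(round(span_bp(*GRCM38) / 1000, 2))     # 108.54 (kb)
print(MGSCV37[0] - GRCM38[0])                # 101361, shift at this locus only
```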

[Figure: Genomic context of Nsd1 on chromosome 13, NC_000079.6.]


Transcript information: This gene has 11 transcripts

Gene: Nsd1 ENSMUSG00000021488

Description: nuclear receptor-binding SET-domain protein 1 [Source:MGI Symbol;Acc:MGI:1276545]
Gene Synonyms: KMT3B
Location: Chromosome 13: 55,209,782-55,318,325 forward strand. GRCm38:CM001006.2
About this gene: This gene has 11 transcripts (splice variants), 299 orthologues, 21 paralogues, is a member of 1 Ensembl protein family and is associated with 10 phenotypes.

Transcripts

Name      Transcript ID          bp     Protein     Translation ID         Biotype          CCDS       UniProt     Flags
Nsd1-201  ENSMUST00000099490.2   12784  2691aa      ENSMUSP00000097089.2   Protein coding   CCDS36673  E9QAE4      TSL:5, GENCODE basic, APPRIS P1
Nsd1-207  ENSMUST00000224973.1   9905   2588aa      ENSMUSP00000153677.1   Protein coding   -          A0A286YE36  GENCODE basic
Nsd1-203  ENSMUST00000224156.1   1149   158aa       ENSMUSP00000153366.1   Protein coding   -          A0A286YDS9  GENCODE basic
Nsd1-206  ENSMUST00000224918.1   1001   158aa       ENSMUSP00000153511.1   Protein coding   -          A0A286YDS9  GENCODE basic
Nsd1-205  ENSMUST00000224693.1   852    158aa       ENSMUSP00000152939.1   Protein coding   -          A0A286YDS9  GENCODE basic
Nsd1-208  ENSMUST00000225169.1   339    30aa        ENSMUSP00000153503.1   Protein coding   -          A0A286YDN7  CDS 3' incomplete
Nsd1-204  ENSMUST00000224338.1   2013   No protein  -                      Retained intron  -          -           -
Nsd1-211  ENSMUST00000225982.1   1822   No protein  -                      Retained intron  -          -           -
Nsd1-209  ENSMUST00000225194.1   1770   No protein  -                      Retained intron  -          -           -
Nsd1-210  ENSMUST00000225405.1   772    No protein  -                      Retained intron  -          -           -
Nsd1-202  ENSMUST00000223894.1   426    No protein  -                      lncRNA           -          -           -


[Figure: Ensembl view of a 128.54 kb region of chromosome 13 (55.20 to 55.30 Mb). Forward strand: Nsd1-201, Nsd1-207, Nsd1-203, Nsd1-206, Nsd1-205 and Nsd1-208 (protein coding); Nsd1-210, Nsd1-204, Nsd1-211 and Nsd1-209 (retained intron); Nsd1-202 (lncRNA); Prelid1-201 (protein coding) and Prelid1-202 (retained intron). Contig: AC160958.2. Reverse strand: Gm47816-201 (processed pseudogene); Rab24-201 and Rab24-203 (protein coding); Rab24-202, Rab24-204 and Rab24-205 (retained intron); Mxd3-201 (protein coding); Mxd3-202 (lncRNA); Mxd3-203 and Mxd3-204 (retained intron). Regulatory Build track. Regulation legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (pseudogene, processed transcript, RNA gene).]


Transcript: ENSMUST00000099490

[Figure: Nsd1-201 (protein coding, 108.54 kb, forward strand) and its protein product (scale bar 0 to 2691 aa) with the following feature tracks.]
Tracks: MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils)
Superfamily: SSF63748; SSF82199; Zinc finger, FYVE/PHD-type
SMART: Zinc finger, PHD-type; PWWP domain; AWS domain; SET domain; Post-SET domain
Pfam: PWWP domain; NSD, Cys-His rich domain; AWS domain; SET domain
PROSITE profiles: PWWP domain; SET domain; AWS domain; Zinc finger, PHD-finger; Post-SET domain; Zinc finger, RING-type
PROSITE patterns: Zinc finger, PHD-type, conserved site
PANTHER: PTHR22884:SF312; PTHR22884
Gene3D: 2.30.30.140; 2.170.270.10; Zinc finger, RING/FYVE/PHD-type
CDD: cd05837; cd15648; cd05838; cd15659; cd15656; cd15653; cd15650
Sequence variants (dbSNP and all other sources): stop gained; inframe deletion; missense variant; synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
