https://www.alphaknockout.com

Mouse Sbf2 Knockout Project (CRISPR/Cas9)

Objective: To create an Sbf2 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Sbf2 gene (NCBI Reference Sequence: NM_177324; Ensembl: ENSMUSG00000038371) is located on mouse chromosome 7. A total of 41 exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 41 (Transcript: ENSMUST00000033058). Exons 4~6 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for null alleles display progressive misfolding of myelin sheaths and abnormal nerve electrophysiology.

Exon 4 starts at about 5.04% of the way into the coding region. Exons 4~6 cover 6.0% of the coding region. The effective KO region is ~3571 bp in size and does not contain any other known gene.
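As a rough check of the figures above, the percentages can be converted into approximate base counts. The sketch below assumes that the "coding region" is the CDS of transcript ENSMUST00000033058 including its stop codon, so that the 1872 aa protein listed for this transcript implies (1872 + 1) x 3 = 5619 bp. The function name is illustrative, and the results are only as precise as the rounded percentages.

```python
PROTEIN_LENGTH_AA = 1872  # protein length reported for ENSMUST00000033058

# Assumption: the coding region is the CDS including the stop codon.
CDS_LENGTH_BP = (PROTEIN_LENGTH_AA + 1) * 3


def percent_to_bp(percent: float, total_bp: int = CDS_LENGTH_BP) -> int:
    """Convert a percentage of the coding region into an approximate base count."""
    return round(total_bp * percent / 100)


# Approximate CDS offset at which exon 4 begins (5.04% of the coding region).
exon4_offset_bp = percent_to_bp(5.04)
# Approximate number of coding bases removed with exons 4~6 (6.0%).
deleted_cds_bp = percent_to_bp(6.0)

print(CDS_LENGTH_BP, exon4_offset_bp, deleted_cds_bp)
```

Under these assumptions, exon 4 begins roughly 283 bp into the CDS and the deletion removes roughly 337 coding bases. The remainder of the ~3571 bp KO region is intronic sequence.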

Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1, 4, 5, 6 and 41, with one gRNA region on each side of exons 4~6. Legend: exon of mouse Sbf2; knockout region.]

Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the upstream sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of exon 4 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
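A self-alignment of this kind can be sketched as an exact word match with a 15 bp window. This is a minimal illustration, not the tool used to produce the report: it marks a dot wherever a 15-mer reappears at another position, on either the same strand (forward) or the opposite strand (reverse complement). The function names are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 15):
    """Return (forward, reverse) lists of (i, j) window matches of seq against itself.

    forward: the word at i also occurs at j (j != i, so the main diagonal is skipped).
    reverse: the reverse complement of the word at i occurs at j.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    n_windows = len(seq) - window + 1
    # Index every window-sized word by its start position.
    for i in range(n_windows):
        positions[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(n_windows):
        word = seq[i:i + window]
        forward.extend((i, j) for j in positions[word] if j != i)
        reverse.extend((i, j) for j in positions.get(reverse_complement(word), []))
    return forward, reverse


# Toy input: one 15 bp unit repeated after a 5 bp spacer.
unit = "ACGGTCATTGCAGCA"
forward, reverse = self_dot_plot(unit + "TTTTT" + unit, window=15)
print(len(forward), len(reverse))
```

Tandem repeats appear as runs of forward dots parallel to the main diagonal, so an empty or sparse `forward` list corresponds to the "no significant tandem repeat" reading above.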

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the downstream sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of exon 6 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the upstream sequence.]

Summary: Full length (2000 bp) | A: 27.6% (552) | C: 18.15% (363) | T: 33.8% (676) | G: 20.45% (409)

Note: The 2000 bp section upstream of exon 4 is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
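The windowed GC analysis can be sketched as follows, using the 300 bp window stated above. The 65% cut-off for "high GC" is an assumption chosen for illustration, since the report does not state its threshold, and the function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int = 300, step: int = 1):
    """Yield (start, gc_fraction) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def high_gc_windows(seq: str, window: int = 300, threshold: float = 0.65, step: int = 1):
    """Return the windows whose GC fraction reaches the (assumed) threshold."""
    return [(s, gc) for s, gc in sliding_gc(seq, window, step) if gc >= threshold]


# Toy input: an AT-only stretch followed by a GC-only stretch.
toy = "AT" * 300 + "GC" * 300
hits = high_gc_windows(toy)
print(len(hits), hits[-1])
```

An empty result from `high_gc_windows` on a flanking sequence would correspond to the conclusion drawn in the note above.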

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the downstream sequence.]

Summary: Full length (2000 bp) | A: 31.0% (620) | C: 16.45% (329) | T: 33.55% (671) | G: 19.0% (380)

Note: The 2000 bp section downstream of exon 6 is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
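The overall GC content of each flank follows directly from the base counts in the two summaries. The short check below uses only those reported counts; the dictionary keys and the function name are illustrative.

```python
# Base counts copied from the two "Summary" lines of this report.
REPORTED_COUNTS = {
    "upstream of exon 4": {"A": 552, "C": 363, "T": 676, "G": 409},
    "downstream of exon 6": {"A": 620, "C": 329, "T": 671, "G": 380},
}


def gc_percent(counts: dict) -> float:
    """Overall GC content, in percent, from per-base counts."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


for region, counts in REPORTED_COUNTS.items():
    print(f"{region}: {sum(counts.values())} bp, GC = {gc_percent(counts):.2f}%")
```

Both sets of counts sum to 2000 bp, giving an overall GC content of about 38.6% upstream and 35.45% downstream.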

BLAT Search Results (up)

QUERY     SCORE  Q_START  Q_END  QSIZE  IDENTITY  CHROM  STRAND    T_START      T_END    SPAN
---------------------------------------------------------------------------------------------
YourSeq    2000        1   2000   2000    100.0%  chr7   -       110464708  110466707    2000
YourSeq     572     1390   2000   2000     97.3%  chr2   +        11074556   11075159     604
YourSeq      83     1078   1199   2000     88.3%  chrX   +         8113429    8113553     125
YourSeq      74     1085   1196   2000     87.0%  chr7   -        81518446   81518730     285
YourSeq      73     1083   1198   2000     81.8%  chr6   +        98718751   98718867     117
YourSeq      71     1086   1199   2000     85.3%  chr1   +        86292953   86329682   36730
YourSeq      70     1105   1198   2000     85.9%  chr15  -        99119956   99120048      93
YourSeq      70     1086   1199   2000     84.4%  chr1   +        91807985   91808244     260
YourSeq      69     1024   1198   2000     88.0%  chr3   -       152148056  152148382     327
YourSeq      69     1103   1199   2000     85.6%  chr17  +        42827257   42827353      97
YourSeq      68     1103   1198   2000     85.5%  chr2   -        64795619   64795714      96
YourSeq      67     1085   1197   2000     91.5%  chr12  -        84659758   84660139     382
YourSeq      67     1106   1198   2000     86.1%  chr1   -        92019798   92019890      93
YourSeq      67     1085   1197   2000     79.7%  chr1   +        13839813   13839925     113
YourSeq      65     1108   1205   2000     83.6%  chr1   -       191048115  191048213      99
YourSeq      65     1105   1199   2000     84.3%  chr5   +        77121426   77121520      95
YourSeq      65     1086   1198   2000     78.8%  chr1   +       133531134  133531246     113
YourSeq      64     1106   1199   2000     84.1%  chr9   -        63765845   63765938      94
YourSeq      64     1106   1198   2000     85.0%  chr11  +        62951634   63306418  354785
YourSeq      63     1086   1199   2000     78.1%  chr5   -       148312544  148312656     113

Note: The 2000 bp section upstream of exon 4 is searched against the genome using BLAT. No significant similarity is found.
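To review secondary hits systematically, the result rows can be parsed into records and summarized by query coverage. The sketch below is an illustration under assumptions: it treats each row as whitespace-separated fields in the column order of the table header, optionally preceded by the "browser details" link text, and the class and function names are hypothetical. Only the first three rows of the upstream table are included.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int

    @property
    def query_coverage(self) -> float:
        """Fraction of the query covered by this hit (1-based inclusive coordinates)."""
        return (self.q_end - self.q_start + 1) / self.q_size


def parse_blat_row(row: str) -> BlatHit:
    """Parse one whitespace-separated result row into a BlatHit."""
    fields = row.split()
    # Drop the leading "browser details" link text if it is present.
    if fields[:2] == ["browser", "details"]:
        fields = fields[2:]
    _query, score, q_start, q_end, q_size, identity, chrom, strand, t_start, t_end, span = fields
    return BlatHit(int(score), int(q_start), int(q_end), int(q_size),
                   float(identity.rstrip("%")), chrom, strand,
                   int(t_start), int(t_end), int(span))


ROWS = [
    "YourSeq 2000 1 2000 2000 100.0% chr7 - 110464708 110466707 2000",
    "YourSeq 572 1390 2000 2000 97.3% chr2 + 11074556 11075159 604",
    "YourSeq 83 1078 1199 2000 88.3% chrX + 8113429 8113553 125",
]

hits = [parse_blat_row(r) for r in ROWS]
self_hit = max(hits, key=lambda h: h.score)  # the query matching its own locus
secondary = [h for h in hits if h is not self_hit]
for h in secondary:
    print(h.chrom, h.score, f"{h.identity}%", f"coverage={h.query_coverage:.1%}")
```

Listing score, identity and coverage side by side makes it easier to judge whether a secondary hit overlaps a planned primer site.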

BLAT Search Results (down)

QUERY     SCORE  Q_START  Q_END  QSIZE  IDENTITY  CHROM  STRAND    T_START      T_END    SPAN
---------------------------------------------------------------------------------------------
YourSeq    2000        1   2000   2000    100.0%  chr7   -       110459137  110461136    2000
YourSeq     575     1300   1979   2000     92.9%  chr3   -        86213049   86213729     681
YourSeq     574     1300   1970   2000     92.7%  chrX   -        17731855   17732518     664
YourSeq     569     1299   1980   2000     92.2%  chr9   +        72567987   72568668     682
YourSeq     568     1299   1979   2000     92.2%  chr3   -         7380504    7381187     684
YourSeq     565     1302   1980   2000     92.3%  chr2   -        66137483   66138161     679
YourSeq     565     1299   1970   2000     92.6%  chr11  -        29990266   29990944     679
YourSeq     564     1300   1979   2000     91.8%  chr18  -        68888496   68889175     680
YourSeq     564     1300   1970   2000     92.5%  chr18  +        57070865   57071536     672
YourSeq     564     1301   1980   2000     91.6%  chr14  +        56517577   56518251     675
YourSeq     563     1300   1970   2000     92.4%  chr10  +        79561791   79562458     668
YourSeq     559     1301   1979   2000     93.5%  chr17  -        85424537   85425214     678
YourSeq     559     1335   1970   2000     94.1%  chr1   +        63341182   63341819     638
YourSeq     558     1335   1979   2000     93.7%  chr2   -       116120483  116121127     645
YourSeq     558     1339   1979   2000     93.0%  chr12  -        69019307   69019944     638
YourSeq     558     1300   1979   2000     91.4%  chr11  -        22120354   22121033     680
YourSeq     558     1300   1979   2000     91.2%  chr4   +         4602371    4603047     677
YourSeq     556     1300   1956   2000     93.5%  chr9   -       111531370  111532026     657
YourSeq     556     1303   1970   2000     91.7%  chr18  +         5549529    5550193     665
YourSeq     554     1304   1961   2000     92.5%  chr7   +       107894540  107895196     657

Note: The 2000 bp section downstream of exon 6 is searched against the genome using BLAT. No significant similarity is found.
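The two 100% self-hits on chr7 also locate the KO region, because the gene lies on the reverse strand and the two flanks bracket the deleted interval. The sketch below assumes that the BLAT coordinates are 1-based and inclusive (consistent with each SPAN of 2000) and that the flanks directly abut the KO region. The gene span is the NCBI GRCm38.p6 location quoted in the genomic context section of this report, and the helper names are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# Self-hits (100.0% identity, span 2000) from the two BLAT tables.
UPSTREAM_FLANK = Interval("chr7", 110464708, 110466707)
DOWNSTREAM_FLANK = Interval("chr7", 110459137, 110461136)
# NCBI location of Sbf2 on GRCm38.p6 (complement strand).
GENE_SPAN = Interval("chr7", 110308011, 110614934)


def length(iv: Interval) -> int:
    return iv.end - iv.start + 1


def gap_between(a: Interval, b: Interval) -> Interval:
    """Return the interval lying strictly between two non-overlapping intervals."""
    if a.chrom != b.chrom:
        raise ValueError("intervals are on different chromosomes")
    left, right = sorted((a, b), key=lambda iv: iv.start)
    if left.end >= right.start:
        raise ValueError("intervals overlap")
    return Interval(a.chrom, left.end + 1, right.start - 1)


def contains(outer: Interval, inner: Interval) -> bool:
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start
            and inner.end <= outer.end)


ko_region = gap_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK)
print(ko_region, length(ko_region), contains(GENE_SPAN, ko_region))
```

Under these assumptions the gap is chr7:110461137-110464707, which is 3571 bp long. This matches the effective KO size stated in the strategy summary and lies within the annotated gene span.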

Gene and information:

Sbf2 SET binding factor 2 [Mus musculus (house mouse)]
Gene ID: 319934, updated on 10-Oct-2019

Gene summary

Official Symbol: Sbf2 (provided by MGI)
Official Full Name: SET binding factor 2 (provided by MGI)
Primary source: MGI:MGI:1921831
See related: Ensembl:ENSMUSG00000038371
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Mtmr13; mMTMH1; AA414977; AI317167; 4833411B01Rik; B430219L04Rik
Expression: Ubiquitous expression in CNS E18 (RPKM 16.7), whole brain E14.5 (RPKM 16.1) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 7; 7 E3
Exon count: 44

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 7 | NC_000073.6 (110308011..110614934, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 7 | NC_000073.5 (117451527..117758434, complement)

Chromosome 7 - NC_000073.6

Transcript information: This gene has 15 transcripts

Gene: Sbf2 ENSMUSG00000038371

Description: SET binding factor 2 [Source:MGI Symbol;Acc:MGI:1921831]
Gene Synonyms: 4833411B01Rik, B430219L04Rik, Mtmr13, SBF2, mMTMH1
Location: Chromosome 7: 110,308,013-110,614,922 reverse strand. GRCm38:CM001000.2
About this gene: This gene has 15 transcripts (splice variants), 242 orthologues, 12 paralogues, is a member of 1 Ensembl protein family and is associated with 8 phenotypes.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Sbf2-201 | ENSMUST00000033058.13 | 7154 | 1872aa | ENSMUSP00000033058.7 | Protein coding | CCDS52361 | E9PXF8 | TSL:5, GENCODE basic, APPRIS P2
Sbf2-205 | ENSMUST00000164759.7 | 7090 | 1847aa | ENSMUSP00000132072.1 | Protein coding | - | E9PXF8 | TSL:5, GENCODE basic, APPRIS ALT2
Sbf2-208 | ENSMUST00000166020.7 | 5481 | 1826aa | ENSMUSP00000126217.1 | Protein coding | - | E9Q0D4 | TSL:5, GENCODE basic, APPRIS ALT2
Sbf2-213 | ENSMUST00000171218.7 | 1476 | 408aa | ENSMUSP00000129805.1 | Protein coding | - | E9Q372 | TSL:1, GENCODE basic
Sbf2-204 | ENSMUST00000164599.7 | 1018 | 339aa | ENSMUSP00000131927.1 | Protein coding | - | F6ZDC5 | CDS 5' and 3' incomplete, TSL:5
Sbf2-202 | ENSMUST00000164525.1 | 733 | 245aa | ENSMUSP00000128340.1 | Protein coding | - | F7BBM8 | CDS 5' and 3' incomplete, TSL:5
Sbf2-209 | ENSMUST00000166885.1 | 595 | 198aa | ENSMUSP00000130476.1 | Protein coding | - | F6QZ02 | CDS 5' and 3' incomplete, TSL:5
Sbf2-203 | ENSMUST00000164559.1 | 530 | 177aa | ENSMUSP00000128265.1 | Protein coding | - | F6QG63 | CDS 5' and 3' incomplete, TSL:5
Sbf2-210 | ENSMUST00000167652.1 | 404 | 44aa | ENSMUSP00000127105.1 | Nonsense mediated decay | - | E9Q4I8 | TSL:5
Sbf2-211 | ENSMUST00000167880.7 | 4280 | No protein | - | Retained intron | - | - | TSL:1
Sbf2-206 | ENSMUST00000165449.7 | 2961 | No protein | - | Retained intron | - | - | TSL:1
Sbf2-215 | ENSMUST00000211732.1 | 2770 | No protein | - | Retained intron | - | - | TSL:NA
Sbf2-214 | ENSMUST00000171378.7 | 2694 | No protein | - | Retained intron | - | - | TSL:1
Sbf2-212 | ENSMUST00000169740.1 | 1681 | No protein | - | Retained intron | - | - | TSL:1
Sbf2-207 | ENSMUST00000165992.1 | 920 | No protein | - | Retained intron | - | - | TSL:5
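The strategy is described against transcript ENSMUST00000033058 (Sbf2-201). One plausible way to arrive at such a reference transcript from the table is sketched below: keep protein-coding transcripts with a complete CDS, prefer those with a CCDS entry, then take the longest protein. This selection rule is an assumption made for illustration, not necessarily the rule used for this report, and only a subset of the rows is included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transcript:
    name: str
    transcript_id: str
    length_bp: int
    protein_aa: Optional[int]  # None when no protein is annotated
    biotype: str
    ccds: Optional[str]
    cds_complete: bool


# Subset of the transcript table above.
TRANSCRIPTS = [
    Transcript("Sbf2-201", "ENSMUST00000033058.13", 7154, 1872, "Protein coding", "CCDS52361", True),
    Transcript("Sbf2-205", "ENSMUST00000164759.7", 7090, 1847, "Protein coding", None, True),
    Transcript("Sbf2-208", "ENSMUST00000166020.7", 5481, 1826, "Protein coding", None, True),
    Transcript("Sbf2-213", "ENSMUST00000171218.7", 1476, 408, "Protein coding", None, True),
    Transcript("Sbf2-204", "ENSMUST00000164599.7", 1018, 339, "Protein coding", None, False),
    Transcript("Sbf2-210", "ENSMUST00000167652.1", 404, 44, "Nonsense mediated decay", None, True),
    Transcript("Sbf2-211", "ENSMUST00000167880.7", 4280, None, "Retained intron", None, False),
]


def pick_reference(transcripts):
    """Pick a reference transcript: complete protein-coding, CCDS first, then longest protein."""
    candidates = [
        t for t in transcripts
        if t.biotype == "Protein coding" and t.cds_complete and t.protein_aa
    ]
    if not candidates:
        raise ValueError("no complete protein-coding transcript available")
    return max(candidates, key=lambda t: (t.ccds is not None, t.protein_aa, t.length_bp))


reference = pick_reference(TRANSCRIPTS)
print(reference.name, reference.transcript_id)
```

With the rows shown, this rule selects Sbf2-201, the only transcript in the table with a CCDS entry and the one encoding the longest protein.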

[Figure: Ensembl genome browser view of the Sbf2 locus, 326.91 kb, spanning 110.3 Mb to 110.6 Mb.
Forward strand: processed pseudogenes Gm9132-201, Gm9064-201, Gm17219-201 and Gm10087-201.
Contigs: AC124472.4 (reverse), AC122921.2 (forward), AC154911.4 (forward).
Reverse strand, genes (Comprehensive set): protein coding Sbf2-201, Sbf2-202, Sbf2-203, Sbf2-204, Sbf2-205, Sbf2-208, Sbf2-209, Sbf2-213; retained intron Sbf2-206, Sbf2-207, Sbf2-211, Sbf2-212, Sbf2-214, Sbf2-215; nonsense mediated decay Sbf2-210.
Regulatory Build legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site.
Gene legend: Protein Coding (Ensembl protein coding); Non-Protein Coding (processed transcript, pseudogene).]

Transcript: ENSMUST00000033058

[Figure: protein domain and variant annotation for Sbf2-201 (protein coding, reverse strand, 306.90 kb), protein ENSMUSP00000033..., scale bar 0 to 1872.
Tracks without labelled features: MobiDB lite, Low complexity (Seg), Coiled-coils (Ncoils).
Superfamily: Protein-tyrosine phosphatase-like; SSF50729.
SMART: uDENN domain, cDENN domain, dDENN domain, GRAM domain, Pleckstrin homology domain.
Pfam: uDENN domain, cDENN domain, SBF1/SBF2 domain, GRAM domain, Myotubularin-like phosphatase domain, Pleckstrin homology domain.
PROSITE profiles: Tripartite DENN domain, Myotubularin-like phosphatase domain, Pleckstrin homology domain.
PANTHER: Myotubularin family; Myotubularin-related protein 13.
Gene3D: 3.40.50.11500; PH-like domain superfamily; 3.30.450.200.
CDD: Myotubularin-related protein 13, PH-GRAM domain; cd01235; cd14589.
Sequence variants (dbSNP and all other sources). Variant legend: stop gained, frameshift variant, missense variant, splice region variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
