https://www.alphaknockout.com

Mouse Safb2 Knockout Project (CRISPR/Cas9)

Objective: To create a Safb2 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Safb2 gene (NCBI Reference Sequence: NM_001029979; Ensembl: ENSMUSG00000042625) is located on mouse chromosome 17. Twenty-one exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 21 (transcript: ENSMUST00000075510). Exons 3-11 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Male homozygous mutant mice exhibit increased testis weight and an increased number of Sertoli cells.

Exon 3 starts at about 8.75% of the way into the coding region, and exons 3-11 cover 48.07% of the coding region. The effective KO region is ~7776 bp and contains no other known gene.
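The ~7776 bp figure can be cross-checked against the chr17 self-hits in this report's BLAT results: the 2000 bp flanks map to chr17:56578986-56580985 (upstream of exon 3) and chr17:56569210-56571209 (downstream of exon 11). Because Safb2 is on the reverse strand, the upstream flank sits at the higher coordinates. The minimal Python sketch below assumes each flank directly abuts the KO region, which the reported size implies, and recovers that interval. It also computes expected PCR product sizes for a hypothetical primer pair placed 300 bp into each flank. Those primer positions and all function names are illustrative assumptions, not the report's genotyping primers.

```python
# GRCm38 chr17, 1-based inclusive coordinates of the two 2000 bp flanks,
# taken from the full-length chr17 self-hits in the BLAT tables.
UPSTREAM_FLANK = (56578986, 56580985)    # 5' of exon 3 (transcript orientation)
DOWNSTREAM_FLANK = (56569210, 56571209)  # 3' of exon 11


def region_between(flank_a, flank_b):
    """Return the 1-based inclusive interval strictly between two flanks.

    Works for either strand because the flanks are sorted by coordinate.
    """
    low, high = sorted([flank_a, flank_b])
    start, end = low[1] + 1, high[0] - 1
    if start > end:
        raise ValueError("flanks overlap or touch; nothing lies between them")
    return start, end


def amplicon_sizes(primer_a, primer_b, deletion):
    """Expected wild-type and knockout PCR product sizes.

    primer_a / primer_b are the outermost genomic coordinates of a primer
    pair lying outside the deletion; deletion is (start, end), inclusive.
    """
    lo, hi = sorted([primer_a, primer_b])
    if not (lo < deletion[0] and deletion[1] < hi):
        raise ValueError("primers must lie outside the deleted interval")
    wt = hi - lo + 1
    ko = wt - (deletion[1] - deletion[0] + 1)
    return wt, ko


ko_start, ko_end = region_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK)
ko_size = ko_end - ko_start + 1
print(f"KO region chr17:{ko_start}-{ko_end}, {ko_size} bp")

# Hypothetical primer ends, 300 bp into each flank (not from the report).
wt_band, ko_band = amplicon_sizes(56579285, 56570910, (ko_start, ko_end))
print(f"WT product {wt_band} bp, KO product {ko_band} bp")
```

With these hypothetical primers the sketch gives an 8376 bp wild-type product and a 600 bp knockout product. A wild-type product that long may amplify poorly in a standard PCR, which is one reason screening designs often add a primer inside the deleted region. The sketch does not model that.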


Overview of the Targeting Strategy

[Figure: Wildtype Safb2 allele drawn 5' to 3' with exons 1, 3-11 and 21 labeled; gRNA regions flank the knockout region covering exons 3-11. Legend: exon of mouse Safb2; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Dot plot panels: Forward and Reverse Complement; axes: Sequence 1 vs. Sequence 2 (the same 2000 bp upstream sequence).]

Note: The 2000 bp section upstream of exon 3 is aligned with itself to detect tandem repeats. Tandem repeats are found in the dot plot matrix, so the gRNA site is selected outside of them.
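The report does not describe how candidate gRNA sites were enumerated. A minimal Python sketch, assuming SpCas9 with a 20-nt spacer and an NGG PAM, scans both strands and drops any site whose 23 bp footprint overlaps an excluded interval, such as a tandem repeat found in the dot plot. The function name, the coordinate convention and the interval format are illustrative assumptions. Real gRNA design also scores on-target efficiency and genome-wide off-target matches, which this sketch does not attempt.

```python
import re

_COMP = str.maketrans("ACGT", "TGCA")
# Zero-width lookahead so overlapping candidate sites are all reported.
_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMP)[::-1]


def spcas9_sites(seq: str, excluded=()):
    """Candidate SpCas9 sites (20-nt protospacer + NGG PAM) on both strands.

    Returns sorted tuples (start, strand, protospacer, pam). `start` is the
    0-based + strand position of the protospacer's leftmost base; protospacer
    and PAM are written 5'->3' on their own strand. Sites whose 23 bp
    footprint overlaps any half-open (start, end) interval in `excluded`
    are dropped.
    """
    seq = seq.upper()
    n = len(seq)
    candidates = []  # (footprint in + coordinates, site record)
    for m in _SITE.finditer(seq):
        s = m.start()
        candidates.append(((s, s + 23), (s, "+", m.group(1), m.group(2))))
    for m in _SITE.finditer(revcomp(seq)):
        r = m.start()
        s = n - r - 20  # protospacer start mapped back to + coordinates
        # On the - strand the PAM lies 3 bp to the left in + coordinates.
        candidates.append(((s - 3, s + 20), (s, "-", m.group(1), m.group(2))))

    def overlaps(fp):
        return any(fp[0] < end and begin < fp[1] for begin, end in excluded)

    return sorted(site for fp, site in candidates if not overlaps(fp))


demo = "AAAAA" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAAA"
print(spcas9_sites(demo))
print(spcas9_sites(demo, excluded=[(0, 10)]))  # masked, e.g. a repeat
```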

Overview of the Dot Plot (down) Window size: 15 bp

[Dot plot panels: Forward and Reverse Complement; axes: Sequence 1 vs. Sequence 2 (the same 2000 bp downstream sequence).]

Note: The 2000 bp section downstream of exon 11 is aligned with itself to detect tandem repeats. No significant tandem repeats are found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
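The report does not name its dot-plot tool. The Python below is a minimal exact-match self dot plot with the same 15 bp window: a point (i, j) is recorded when the 15 bp word at i equals the word at j (Forward panel) or the reverse complement of the word at j (Reverse Complement panel). A tandem repeat with period p then shows up as many forward points sharing the offset j - i = p, i.e. a line parallel to the main diagonal. Exact matching, the offset tally and all function names are simplifying assumptions, and the demo sequence is synthetic.

```python
import random
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 15):
    """Exact-match self dot plot.

    Returns (forward, reverse) sets of (i, j) points: forward where
    seq[i:i+window] == seq[j:j+window], reverse where it equals the reverse
    complement of seq[j:j+window]. Many dot-plot tools tolerate mismatches
    inside the window; this sketch does not.
    """
    seq = seq.upper()
    n = len(seq) - window + 1
    index = defaultdict(list)  # word -> list of start positions
    for i in range(n):
        index[seq[i:i + window]].append(i)
    forward, reverse = set(), set()
    for j in range(n):
        word = seq[j:j + window]
        forward.update((i, j) for i in index.get(word, []))
        reverse.update((i, j) for i in index.get(revcomp(word), []))
    return forward, reverse


def repeat_offsets(forward_points, max_offset=500):
    """Tally off-diagonal forward points by offset d = j - i (0 < d <= max)."""
    counts = defaultdict(int)
    for i, j in forward_points:
        d = j - i
        if 0 < d <= max_offset:
            counts[d] += 1
    return dict(counts)


# Synthetic demo: random background with a 20 bp unit repeated 4 times.
rng = random.Random(1)
background = "".join(rng.choice("ACGT") for _ in range(300))
unit = "".join(rng.choice("ACGT") for _ in range(20))
demo = background[:150] + unit * 4 + background[150:]

fwd, rc = self_dot_plot(demo)
offsets = repeat_offsets(fwd)
print(sorted(offsets.items(), key=lambda kv: -kv[1])[:3])
```

In the demo, the 80 bp repeat yields at least 46 forward points at offset 20 and fewer at offsets 40 and 60, which is the diagonal-stripe pattern a tandem repeat produces in the Forward panel.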


Overview of the GC Content Distribution (up) Window size: 300 bp


Summary: full length 2000 bp | A 514 (25.7%) | C 453 (22.65%) | T 553 (27.65%) | G 480 (24.0%) | GC 933 (46.65%)

Note: The 2000 bp section upstream of exon 3 is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp


Summary: full length 2000 bp | A 551 (27.55%) | C 422 (21.1%) | T 508 (25.4%) | G 519 (25.95%) | GC 941 (47.05%)

Note: The 2000 bp section downstream of exon 11 is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
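The report does not state how its GC plots were produced. As a minimal sketch, the Python below recomputes overall GC% from the base counts in the two summaries (46.65% upstream, 47.05% downstream) and computes a 300 bp sliding-window GC profile for any sequence using a running count. The 70% cutoff in `high_gc_windows` is an illustrative assumption, not the report's definition of "significant high GC content", and the demo sequence is synthetic.

```python
def composition(counts):
    """Total length, per-base percentages and GC% from a dict of base counts."""
    total = sum(counts.values())
    pct = {base: 100 * n / total for base, n in counts.items()}
    gc = 100 * (counts["G"] + counts["C"]) / total
    return total, pct, gc


def gc_profile(seq, window=300):
    """(start, GC%) for every window of `window` bp, via a running count."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_gc = [1 if b in "GC" else 0 for b in seq]
    current = sum(is_gc[:window])
    profile = [(0, 100 * current / window)]
    for start in range(1, len(seq) - window + 1):
        # Slide right by one: add the entering base, drop the leaving base.
        current += is_gc[start + window - 1] - is_gc[start - 1]
        profile.append((start, 100 * current / window))
    return profile


def high_gc_windows(profile, threshold=70.0):
    """Windows at or above an (assumed) GC% threshold."""
    return [(s, gc) for s, gc in profile if gc >= threshold]


# Base counts from the two summaries in this report.
REPORTED = {
    "upstream of exon 3": {"A": 514, "C": 453, "T": 553, "G": 480},
    "downstream of exon 11": {"A": 551, "C": 422, "T": 508, "G": 519},
}
results = {}
for name, counts in REPORTED.items():
    total, pct, gc = composition(counts)
    results[name] = (total, gc)
    print(f"{name}: {total} bp, GC {gc:.2f}%")

# Synthetic demo: 300 bp of pure GC followed by 300 bp of pure AT.
demo = "GC" * 150 + "AT" * 150
profile = gc_profile(demo)
print(len(profile), profile[0], profile[-1], len(high_gc_windows(profile)))
```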


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr17  -       56578986   56580985   2000
YourSeq  117    320     587   2000   89.8%     chrX   -       14041667   14042196   530
YourSeq  117    378     568   2000   87.8%     chr12  -       71328311   71328501   191
YourSeq  117    370     589   2000   84.6%     chr1   -       168115933  168116152  220
YourSeq  112    366     597   2000   84.7%     chr1   +       90387687   90387922   236
YourSeq  109    411     568   2000   91.2%     chr17  -       32137250   32137408   159
YourSeq  108    434     588   2000   86.7%     chr1   -       107972715  107972876  162
YourSeq  108    370     559   2000   86.0%     chr12  +       5437763    5437953    191
YourSeq  107    419     589   2000   85.1%     chr7   -       137539731  137539900  170
YourSeq  106    365     568   2000   88.6%     chr2   +       128897051  128897263  213
YourSeq  105    370     588   2000   83.9%     chr1   +       152983109  152983328  220
YourSeq  104    365     560   2000   86.7%     chr6   -       114205522  114205720  199
YourSeq  104    434     601   2000   81.6%     chr2   -       76298437   76298604   168
YourSeq  104    434     588   2000   86.7%     chr10  +       75972264   75972583   320
YourSeq  101    409     565   2000   88.2%     chr2   -       75007098   75007253   156
YourSeq  101    366     576   2000   82.9%     chr6   +       70786891   70787097   207
YourSeq  101    405     588   2000   81.8%     chr4   +       149459569  149459750  182
YourSeq  101    379     593   2000   87.5%     chr3   +       53580056   53580270   215
YourSeq  100    362     588   2000   83.0%     chr10  -       23220035   23220248   214
YourSeq  99     379     580   2000   88.4%     chr5   -       120070806  120071009  204

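Note: The 2000 bp section upstream of exon 3 is BLAT-searched against the genome. No significant similarity is found: apart from the full-length self-hit on chr17, the remaining hits are short partial alignments confined to query positions 320-601.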

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr17  -       56569210   56571209   2000
YourSeq  149    981     1171  2000   91.2%     chr14  +       74142640   74142852   213
YourSeq  143    986     1166  2000   90.5%     chr16  -       22205434   22205611   178
YourSeq  136    986     1165  2000   90.0%     chrX   +       160643536  160712060  68525
YourSeq  131    990     1152  2000   88.8%     chr1   -       105736795  105736948  154
YourSeq  130    1013    1175  2000   94.1%     chrX   -       148034310  148355832  321523
YourSeq  129    981     1145  2000   91.1%     chr3   -       59050264   59050431   168
YourSeq  129    1003    1152  2000   93.9%     chr2   -       180675805  180676165  361
YourSeq  129    1003    1168  2000   92.3%     chrX   +       7905111    7905657    547
YourSeq  128    986     1151  2000   91.0%     chr5   +       115303739  115303911  173
YourSeq  127    1022    1171  2000   95.2%     chr2   +       106395426  106395582  157
YourSeq  127    1003    1168  2000   86.9%     chr2   +       75890887   75891043   157
YourSeq  125    1013    1175  2000   93.2%     chrX   -       149139290  149306106  166817
YourSeq  125    1000    1147  2000   91.4%     chr2   -       26150866   26151008   143
YourSeq  124    986     1143  2000   86.0%     chr7   -       46712939   46713088   150
YourSeq  124    981     1136  2000   91.9%     chr1   -       37415183   37415354   172
YourSeq  124    981     1145  2000   84.9%     chr4   +       140917143  140917300  158
YourSeq  124    987     1141  2000   87.8%     chr19  +       36844755   36844902   148
YourSeq  123    990     1134  2000   93.7%     chr11  +       23507499   23507909   411
YourSeq  122    982     1146  2000   84.8%     chr1   -       97830230   97830387   158

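Note: The 2000 bp section downstream of exon 11 is BLAT-searched against the genome. No significant similarity is found: apart from the full-length self-hit on chr17, the remaining hits are short partial alignments confined to query positions 981-1175.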
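The "no significant similarity" calls can be summarized programmatically. The Python sketch below parses rows in the BLAT web-output layout used in this report's BLAT results (whitespace-separated, with or without the "browser details" prefix), treats a hit that aligns the entire query as the self-hit, and reports the best remaining hit, the query interval all remaining hits fall in, and the fraction of the query the best hit covers. The parser, the self-hit rule and all names are assumptions for illustration; the report does not state a numeric significance threshold.

```python
import re
from typing import List, NamedTuple, Tuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


# QUERY SCORE QSTART QEND QSIZE IDENTITY CHROM STRAND TSTART TEND SPAN
_ROW = re.compile(
    r"YourSeq\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)%\s+"
    r"(\S+)\s+([+-])\s+(\d+)\s+(\d+)\s+(\d+)"
)


def parse_blat(text: str) -> List[BlatHit]:
    """Extract every BLAT row found anywhere in `text`."""
    hits = []
    for m in _ROW.finditer(text):
        g = m.groups()
        hits.append(BlatHit(int(g[0]), int(g[1]), int(g[2]), int(g[3]),
                            float(g[4]), g[5], g[6],
                            int(g[7]), int(g[8]), int(g[9])))
    return hits


def off_target_summary(hits: List[BlatHit]) -> Tuple[BlatHit, Tuple[int, int], float]:
    """Best partial hit, query interval spanned by partial hits, and the
    fraction of the query covered by the best partial hit.

    Any hit aligning the whole query is treated as the self-hit and skipped.
    """
    partial = [h for h in hits if h.q_end - h.q_start + 1 < h.q_size]
    best = max(partial, key=lambda h: h.score)  # first row wins on ties
    span = (min(h.q_start for h in partial), max(h.q_end for h in partial))
    frac = (best.q_end - best.q_start + 1) / best.q_size
    return best, span, frac


# First four rows of the upstream BLAT table in this report.
UPSTREAM_EXCERPT = """
YourSeq  2000   1       2000  2000   100.0%    chr17  -       56578986   56580985   2000
YourSeq  117    320     587   2000   89.8%     chrX   -       14041667   14042196   530
YourSeq  117    378     568   2000   87.8%     chr12  -       71328311   71328501   191
YourSeq  117    370     589   2000   84.6%     chr1   -       168115933  168116152  220
"""

hits = parse_blat(UPSTREAM_EXCERPT)
best, span, frac = off_target_summary(hits)
print(best.chrom, best.score, span, f"{frac:.1%}")
```

On these four rows it selects the chrX hit (score 117, query 320-587), which covers 268 of the 2000 query bases (13.4%).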


Gene information: Safb2 scaffold attachment factor B2 [Mus musculus (house mouse)], Gene ID: 224902, updated on 12-Aug-2019

Gene summary

Official Symbol: Safb2 (provided by MGI)
Official Full Name: scaffold attachment factor B2 (provided by MGI)
Primary source: MGI:MGI:2146808
See related: Ensembl:ENSMUSG00000042625
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AA389433; AI255170; mKIAA0138
Expression: Ubiquitous expression in CNS E11.5 (RPKM 32.5), spleen adult (RPKM 17.6) and 28 other tissues
Orthologs: human; all

Genomic context

Location: 17; 17 D
Exon count: 22

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 17 | NC_000083.6 (56560971..56584675, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 17 | NC_000083.5 (56702365..56724006, complement)

Chromosome 17 - NC_000083.6


Transcript information: This gene has 17 transcripts

Gene: Safb2 ENSMUSG00000042625

Description: scaffold attachment factor B2 [Source:MGI Symbol;Acc:MGI:2146808]
Location: Chromosome 17: 56,560,965-56,584,585 reverse strand. GRCm38:CM001010.2
About this gene: This gene has 17 transcripts (splice variants), 97 orthologues, 2 paralogues, is a member of 1 Ensembl protein family and is associated with 2 phenotypes.

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Safb2-201 | ENSMUST00000075510.11 | 3277 | 991aa | ENSMUSP00000074953.5 | Protein coding | CCDS28907 | Q80YR5 | TSL:1, GENCODE basic, APPRIS P1
Safb2-202 | ENSMUST00000124111.7 | 2636 | 83aa | ENSMUSP00000120845.1 | Protein coding | - | F6VRR0 | CDS 5' incomplete, TSL:1
Safb2-211 | ENSMUST00000142940.7 | 1233 | 248aa | ENSMUSP00000123229.1 | Protein coding | - | F6XJ79 | CDS 5' incomplete, TSL:1
Safb2-205 | ENSMUST00000131056.1 | 886 | 295aa | ENSMUSP00000120750.1 | Protein coding | - | F6WIZ2 | CDS 5' and 3' incomplete, TSL:5
Safb2-215 | ENSMUST00000154991.7 | 870 | 249aa | ENSMUSP00000117696.1 | Protein coding | - | F6Z0Y5 | CDS 5' incomplete, TSL:1
Safb2-210 | ENSMUST00000142752.7 | 438 | 66aa | ENSMUSP00000119141.1 | Protein coding | - | F6R703 | CDS 5' incomplete, TSL:5
Safb2-212 | ENSMUST00000144255.7 | 2523 | 179aa | ENSMUSP00000123673.1 | Nonsense mediated decay | - | D6RJ78 | TSL:1
Safb2-206 | ENSMUST00000133604.7 | 1804 | 179aa | ENSMUSP00000119324.1 | Nonsense mediated decay | - | D6RJ78 | TSL:1
Safb2-216 | ENSMUST00000155983.1 | 1090 | 214aa | ENSMUSP00000116363.1 | Nonsense mediated decay | - | D6RHP9 | TSL:5
Safb2-208 | ENSMUST00000134741.7 | 2107 | No protein | - | Retained intron | - | - | TSL:1
Safb2-209 | ENSMUST00000140037.7 | 970 | No protein | - | Retained intron | - | - | TSL:2
Safb2-214 | ENSMUST00000146958.7 | 842 | No protein | - | Retained intron | - | - | TSL:3
Safb2-203 | ENSMUST00000124457.7 | 684 | No protein | - | Retained intron | - | - | TSL:3
Safb2-204 | ENSMUST00000127947.1 | 632 | No protein | - | Retained intron | - | - | TSL:5
Safb2-207 | ENSMUST00000134497.1 | 572 | No protein | - | Retained intron | - | - | TSL:3
Safb2-213 | ENSMUST00000146070.1 | 439 | No protein | - | Retained intron | - | - | TSL:3
Safb2-217 | ENSMUST00000156640.1 | 552 | No protein | - | lncRNA | - | - | TSL:3


[Figure: Ensembl region view, 43.62 kb (chr17: 56.56-56.59 Mb). Forward strand: Safb transcripts (Safb-201, -202, -203, -207, -208, -210). Reverse strand: Gm20219-201 and Safb2 transcripts Safb2-201 to Safb2-217 (contig CT485788.18). Regulatory Build track: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site.]


Transcript: ENSMUST00000075510

[Figure: Safb2-201 (protein coding, reverse strand, 21.65 kb) and its encoded 991 aa protein. Protein features: SAP domain (SMART, Pfam, PROSITE profiles, Superfamily, Gene3D); RNA recognition motif domain (SMART, Pfam, PROSITE profiles; RNA-binding domain superfamily; Gene3D nucleotide-binding alpha-beta plait domain superfamily); coiled-coils (Ncoils); low complexity (Seg); MobiDB lite; PANTHER PTHR15683 (scaffold attachment factor B2). Variant track: missense, splice region and synonymous variants (dbSNP and other sources). Scale: 0-991 aa.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
