https://www.alphaknockout.com

Mouse Scfd1 Knockout Project (CRISPR/Cas9)

Objective: To create a Scfd1 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Scfd1 gene (NCBI Reference Sequence: NM_029825; Ensembl: ENSMUSG00000020952) is located on mouse chromosome 12. 25 exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 25 (Transcript: ENSMUST00000021335). Exons 2~5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 2.76% of the coding region. Exons 2~5 cover 19.51% of the coding region. The size of the effective KO region is ~6324 bp. The KO region does not contain any other known gene.
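As a rough cross-check, the percentages in the note can be converted to base pairs. The sketch below assumes the coding region is 639 codons long (the protein length listed for ENSMUST00000021335 in the transcript table) with the stop codon excluded. That length convention and all names in the code are assumptions for illustration, not part of the design itself. Because the reported percentages are rounded, the result is only an estimate and should be confirmed against the exon annotation.

```python
# Assumption: coding length = 639 codons x 3 bp, stop codon excluded.
PROTEIN_LENGTH_AA = 639
CDS_BP = PROTEIN_LENGTH_AA * 3


def percent_to_bp(percent, total_bp=CDS_BP):
    """Convert a percentage of the coding region to a rounded bp count."""
    return round(percent / 100 * total_bp)


# Percentages as reported in the note.
exon2_offset_bp = percent_to_bp(2.76)     # coding bases before exon 2
deleted_coding_bp = percent_to_bp(19.51)  # coding bases in exons 2~5

# A deletion whose coding length is not a multiple of 3 shifts the reading frame.
causes_frameshift = deleted_coding_bp % 3 != 0

print(CDS_BP, exon2_offset_bp, deleted_coding_bp, causes_frameshift)
```

Under these assumptions the estimated deleted coding length is not a multiple of 3, which would be consistent with a frameshift. A different length convention can change the rounding, so treat this as a sanity check only.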


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3' with exons 1 to 25; a gRNA region lies on each side of the knockout region, which spans exons 2~5. Legend: exon of mouse Scfd1; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot of Sequence 12 against itself, with forward and reverse-complement matches.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot of Sequence 12 against itself, with forward and reverse-complement matches.]

Note: The 1754 bp section downstream of Exon 5 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
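The self-alignment described in the dot plot notes can be illustrated with a small windowed dot plot. This is a minimal sketch of the general technique, not the tool used to produce the figures: the function names and the toy sequence are hypothetical, and only the idea of exact matches over a fixed window (15 bp in the report, 5 bp in the toy example) is taken from the text. Matches away from the main diagonal indicate repeated segments; tandem repeats show up as diagonals parallel to the main one.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dot_plot(seq_x, seq_y, window=15):
    """Return the set of (i, j) where seq_x[i:i+window] == seq_y[j:j+window]."""
    # Index every window of seq_y by its content for fast lookup.
    positions = {}
    for j in range(len(seq_y) - window + 1):
        positions.setdefault(seq_y[j:j + window], []).append(j)
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in positions.get(seq_x[i:i + window], []):
            dots.add((i, j))
    return dots


def off_diagonal(dots):
    """Keep only matches that are not the trivial self-match i == j."""
    return sorted((i, j) for i, j in dots if i != j)


# Toy sequence with a 7 bp unit repeated three times in tandem.
toy = "TTGCA" + "ACGGTCA" * 3 + "GATTC"
forward = dot_plot(toy, toy, window=5)
# Reverse-complement matches; j indexes the reverse-complemented sequence.
reverse = dot_plot(toy, reverse_complement(toy), window=5)
repeats = off_diagonal(forward)
print(len(repeats) > 0)
```

For a real flank, the same functions would be called with the 2000 bp or 1754 bp sequence and `window=15`; an empty `off_diagonal` result would correspond to the "no significant tandem repeat" case.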


Overview of the GC Content Distribution (up) Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(2000bp) | A(26.4% 528) | C(18.75% 375) | T(33.7% 674) | G(21.15% 423)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(1754bp) | A(27.08% 475) | C(18.87% 331) | T(34.55% 606) | G(19.5% 342)

Note: The 1754 bp section downstream of Exon 5 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
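The overall GC content of each flank follows directly from the base counts in the two summaries, and the windowed distribution can be computed with a sliding window. The sketch below uses the counts reported above; the function names and the toy sequence are illustrative assumptions, and the step size of the original plots is not stated in the report.

```python
def gc_percent_from_counts(a, c, t, g):
    """Overall GC percentage from per-base counts."""
    return 100.0 * (c + g) / (a + c + t + g)


def sliding_gc(seq, window=300, step=1):
    """GC percentage of each window of the given size along seq."""
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        values.append(100.0 * (chunk.count("G") + chunk.count("C")) / window)
    return values


# Base counts taken from the two summaries (upstream 2000 bp, downstream 1754 bp).
upstream_gc = gc_percent_from_counts(a=528, c=375, t=674, g=423)
downstream_gc = gc_percent_from_counts(a=475, c=331, t=606, g=342)
print(round(upstream_gc, 2), round(downstream_gc, 2))

# Toy demonstration of the windowed calculation (window and step chosen for brevity).
print(sliding_gc("GGCCAATT", window=4, step=2))
```

Both flanks come out at roughly 38 to 40% GC overall, in line with the notes that no significant high GC-content region is present.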


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
------------------------------------------------------------------------------------------
YourSeq  2000   1       2000  2000   100.0%    chr12  +       51382106   51384105   2000
YourSeq  193    570     916   2000   89.8%     chr11  -       69403174   69630708   227535
YourSeq  171    737     1264  2000   95.3%     chr13  -       58443172   58615657   172486
YourSeq  160    737     916   2000   96.1%     chr12  -       17437350   17522966   85617
YourSeq  159    753     1125  2000   85.0%     chr1   -       192768357  192768608  252
YourSeq  153    738     916   2000   93.8%     chr11  -       21074630   21074817   188
YourSeq  152    739     915   2000   94.2%     chr19  -       29281529   29281710   182
YourSeq  152    745     921   2000   94.7%     chr13  -       107554829  107555013  185
YourSeq  151    745     917   2000   96.4%     chr1   -       60134662   60134835   174
YourSeq  151    737     913   2000   93.7%     chr6   +       38422315   38422496   182
YourSeq  149    737     912   2000   94.7%     chr16  -       4658343    4658538    196
YourSeq  149    737     916   2000   93.0%     chr14  -       107194043  107194223  181
YourSeq  148    741     917   2000   92.6%     chr2   -       157414091  157414277  187
YourSeq  145    733     908   2000   93.4%     chr3   +       88093039   88093604   566
YourSeq  144    744     915   2000   94.5%     chr17  +       23769781   23769967   187
YourSeq  142    742     915   2000   92.3%     chr2   -       82964213   82964397   185
YourSeq  141    760     916   2000   95.6%     chr17  -       29405891   29546912   141022
YourSeq  140    739     912   2000   91.3%     chr4   +       105101688  105101867  180
YourSeq  140    737     912   2000   90.7%     chr3   +       32528264   32528447   184
YourSeq  139    737     906   2000   94.4%     chr1   -       184886710  184886914  205

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. Apart from the query's own locus on chr12, no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
------------------------------------------------------------------------------------------
YourSeq  1754   1       1754  1754   100.0%    chr12  +       51390430   51392183   1754
YourSeq  134    1257    1624  1754   85.3%     chr10  +       63824051   63824343   293
YourSeq  132    1254    1618  1754   92.9%     chr12  -       4183329    4183694    366
YourSeq  128    1198    1625  1754   84.2%     chr10  +       4411973    4412311    339
YourSeq  126    1182    1624  1754   81.7%     chr16  -       28934786   28934964   179
YourSeq  125    1192    1640  1754   81.6%     chr10  +       5782473    5782636    164
YourSeq  119    1318    1624  1754   93.5%     chr10  -       128012792  128013224  433
YourSeq  94     1517    1620  1754   95.2%     chr16  -       55929944   55930047   104
YourSeq  92     1517    1624  1754   92.6%     chr10  -       84407526   84407633   108
YourSeq  92     1517    1624  1754   92.6%     chr11  +       20790330   20790437   108
YourSeq  91     1517    1624  1754   92.6%     chr11  +       97761381   97761494   114
YourSeq  90     1517    1624  1754   91.7%     chr12  +       110415979  110416086  108
YourSeq  86     1271    1586  1754   92.3%     chr8   -       83586414   83586876   463
YourSeq  85     1517    1664  1754   83.9%     chr11  +       20704065   20704192   128
YourSeq  74     1232    1374  1754   93.2%     chr11  -       108851113  108851606  494
YourSeq  72     1398    1617  1754   84.0%     chr13  +       32801095   32801328   234
YourSeq  68     1263    1618  1754   91.5%     chr2   +       29767247   29767682   436
YourSeq  64     1286    1578  1754   93.4%     chr11  +       71037509   71038024   516
YourSeq  57     1224    1299  1754   94.0%     chr1   -       86147193   86147268   76
YourSeq  53     1257    1357  1754   93.5%     chr13  -       92783910   92784018   109

Note: The 1754 bp section downstream of Exon 5 is BLAT searched against the genome. Apart from the query's own locus on chr12, no significant similarity is found.
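The first row of each BLAT table is the query's own locus on chr12, so the two sets of coordinates can be used to cross-check the flank lengths and the stated size of the KO region. The sketch below uses illustrative names and assumes the coordinates are 1-based and inclusive, which is consistent with the SPAN values in the tables.

```python
# chr12 coordinates of the two self-hits (first row of each BLAT table).
UPSTREAM_FLANK = (51382106, 51384105)    # 2000 bp section upstream of exon 2
DOWNSTREAM_FLANK = (51390430, 51392183)  # 1754 bp section downstream of exon 5


def span(interval):
    """Length of a 1-based, inclusive interval."""
    start, end = interval
    return end - start + 1


def gap_between(left, right):
    """Number of bases strictly between two non-overlapping intervals."""
    return right[0] - left[1] - 1


print(span(UPSTREAM_FLANK), span(DOWNSTREAM_FLANK),
      gap_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK))
```

The gap between the two flanks evaluates to 6324 bp, which agrees with the effective KO region size given in the strategy summary.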


Gene and information: Scfd1 Sec1 family domain containing 1 [ Mus musculus (house mouse) ] Gene ID: 76983, updated on 24-Oct-2019

Gene summary

Official Symbol: Scfd1 (provided by MGI)
Official Full Name: Sec1 family domain containing 1 (provided by MGI)
Primary source: MGI:MGI:1924233
See related: Ensembl:ENSMUSG00000020952
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: RA410; STXBP1L2; 3110021P21Rik
Expression: Broad expression in placenta adult (RPKM 10.2), limb E14.5 (RPKM 6.3) and 24 other tissues
Orthologs: human; all

Genomic context

Location: 12; 12 B3
Exon count: 26

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  12   NC_000078.6 (51377513..51450104)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    12   NC_000078.5 (52478567..52551083)

[Figure: genomic context of Scfd1 on Chromosome 12 - NC_000078.6.]


Transcript information: This gene has 7 transcripts

Gene: Scfd1 ENSMUSG00000020952

Description: Sec1 family domain containing 1 [Source:MGI Symbol;Acc:MGI:1924233]
Gene Synonyms: 3110021P21Rik, RA410, STXBP1L2
Location: Chromosome 12: 51,377,510-51,450,101 forward strand. GRCm38:CM001005.2
About this gene: This gene has 7 transcripts (splice variants), 215 orthologues, 7 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Scfd1-201  ENSMUST00000021335.6  2096  639aa       ENSMUSP00000021335.5  Protein coding           CCDS36439  Q8BRF7      TSL:1, GENCODE basic, APPRIS P1
Scfd1-205  ENSMUST00000219434.1  2749  390aa       ENSMUSP00000151347.1  Protein coding           -          Q8BRF7      TSL:1, GENCODE basic
Scfd1-203  ENSMUST00000218138.1  1017  159aa       ENSMUSP00000151379.1  Nonsense mediated decay  -          A0A1W2P6R7  CDS 5' incomplete, TSL:3
Scfd1-202  ENSMUST00000218131.1  6842  No protein  -                     Retained intron          -          -           TSL:2
Scfd1-204  ENSMUST00000219264.1  3108  No protein  -                     Retained intron          -          -           TSL:NA
Scfd1-207  ENSMUST00000219799.1  2534  No protein  -                     Retained intron          -          -           TSL:1
Scfd1-206  ENSMUST00000219686.1  707   No protein  -                     Retained intron          -          -           TSL:2


[Figure: Ensembl genome browser view of a 92.59 kb region of chromosome 12 (about 51.38 Mb to 51.46 Mb), comprehensive gene set. Tracks shown: Scfd1-201 and Scfd1-205 (protein coding), Scfd1-203 (nonsense mediated decay), Scfd1-202, Scfd1-204, Scfd1-206 and Scfd1-207 (retained intron); G2e3-201, G2e3-202 and G2e3-203 (protein coding) and G2e3-205 (retained intron); contigs AC161116.4 and AC163670.3; Regulatory Build. Regulation legend: CTCF, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (processed transcript).]


Transcript: ENSMUST00000021335

[Figure: Scfd1-201 (protein coding), 72.52 kb on the forward strand, with protein annotation for ENSMUSP00000021... (identifier truncated in the source). Tracks shown: Low complexity (Seg); Coiled-coils (Ncoils); Superfamily: Sec1-like superfamily; Pfam: Sec1-like protein; PIRSF: Sec1-like protein; PANTHER: PTHR11679:SF2, Sec1-like protein; Gene3D: 3.40.50.2060, 1.25.40.60, 3.90.830.10; Sec1-like, domain 2; sequence variants (dbSNP and all other sources). Variant legend: missense variant, splice region variant, synonymous variant. Scale bar: 0 to 639.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
