https://www.alphaknockout.com

Mouse Sdad1 Knockout Project (CRISPR/Cas9)

Objective: To create a Sdad1 knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Sdad1 gene (NCBI Reference Sequence: NM_172713; Ensembl: ENSMUSG00000029415) is located on mouse chromosome 5. 22 exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 22 (Transcript: ENSMUST00000031364). Exons 2~12 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts from about 4.42% of the coding region. Exons 2~12 cover 46.34% of the coding region. The size of the effective KO region: ~9827 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele, 5' to 3', showing exons 1 to 12 and exon 22, with two gRNA regions marked. Legend: exon of mouse Sdad1; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Axes: Sequence 12. Series: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Axes: Sequence 12. Series: Forward; Reverse Complement.]

Note: The 1989 bp section downstream of Exon 12 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
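The self-alignment behind these dot plots can be sketched in a few lines. This is a minimal illustration, not the tool used for this report: it assumes exact word matching with the 15 bp window, looks at the forward strand only, and runs on a made-up toy sequence (`unit`, `flank_left`, `flank_right` and `toy` are hypothetical). Off-diagonal hits that share a constant offset are how a tandem repeat appears in the matrix.

```python
from collections import defaultdict


def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs, i < j, where the words of length
    `window` starting at i and at j are identical (forward strand only)."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    hits = []
    for starts in positions.values():
        # Every pair of occurrences of the same word is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


# Hypothetical toy input: a 20 bp unit repeated three times between two flanks.
unit = "ACGTTGCAAGCTTAGGCTAC"
flank_left = "TTTTGACCATGGAACCTGAT"
flank_right = "CCGATAGTCATTGACGGATC"
toy = flank_left + unit * 3 + flank_right

hits = self_dot_plot(toy, window=15)
# Distinct distances between matching words; a tandem repeat shows up as
# multiples of the repeat unit length.
offsets = sorted({j - i for i, j in hits})
```

A region with no dots away from the main diagonal (an empty `hits` list here) corresponds to the "no significant tandem repeat" case; a gRNA site would then be chosen outside any positions that do appear in `hits`.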


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot. Axis: Sequence 12.]

Summary: Full Length(2000bp) | A(24.5% 490) | C(20.0% 400) | T(27.1% 542) | G(28.4% 568)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot. Axis: Sequence 12.]

Summary: Full Length(1989bp) | A(25.04% 498) | C(21.82% 434) | T(28.96% 576) | G(24.18% 481)

Note: The 1989 bp section downstream of Exon 12 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
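The overall GC content implied by the two summaries can be checked directly from the base counts. The sketch below does that and also shows one plausible way to compute a sliding-window profile with the 300 bp window; the function names are illustrative, and the exact windowing used to draw the plots in this report is an assumption.

```python
def gc_fraction(counts):
    """Overall GC fraction from a dict of base counts."""
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total


def sliding_gc(seq, window=300, step=1):
    """GC fraction of each full-length window along seq."""
    seq = seq.upper()
    return [
        (seq[i:i + window].count("G") + seq[i:i + window].count("C")) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts copied from the two Summary lines of this report.
upstream = {"A": 490, "C": 400, "T": 542, "G": 568}    # 2000 bp upstream of Exon 2
downstream = {"A": 498, "C": 434, "T": 576, "G": 481}  # 1989 bp downstream of Exon 12

up_gc = gc_fraction(upstream)      # (400 + 568) / 2000
down_gc = gc_fraction(downstream)  # (434 + 481) / 1989
```

Both flanks come out just under 50% GC overall (about 48.4% upstream and 46.0% downstream), which is consistent with the notes that no significant high-GC region is present.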


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr5   -       92305838   92307837   2000
YourSeq  86     1470   1630  2000   87.8%     chr11  +       79024441   79024625   185
YourSeq  76     1497   1626  2000   85.4%     chr7   -       136625413  136625556  144
YourSeq  74     1346   1627  2000   73.6%     chr1   +       87886342   87886549   208
YourSeq  66     1516   1617  2000   81.2%     chr9   -       114899155  114899248  94
YourSeq  66     1397   1622  2000   90.4%     chr11  -       5759973    5760256    284
YourSeq  60     1510   1605  2000   86.8%     chr13  -       12084574   12084682   109
YourSeq  58     1524   1608  2000   94.2%     chr13  -       35333967   35334066   100
YourSeq  58     1457   1561  2000   91.5%     chr11  -       17242038   17242222   185
YourSeq  57     1514   1626  2000   86.3%     chr11  +       69869024   69869142   119
YourSeq  56     1522   1612  2000   92.6%     chr7   -       25290030   25290137   108
YourSeq  54     1458   1602  2000   95.1%     chr1   -       177674164  177674495  332
YourSeq  53     1514   1618  2000   87.5%     chr12  -       100148309  100148415  107
YourSeq  53     1526   1626  2000   89.8%     chr11  -       5377687    5377787    101
YourSeq  53     1514   1607  2000   89.8%     chr3   +       90481823   90481931   109
YourSeq  51     1516   1612  2000   86.2%     chr10  +       43144328   43144423   96
YourSeq  50     1458   1566  2000   96.4%     chr1   -       127908390  127908500  111
YourSeq  49     1517   1602  2000   93.0%     chr2   +       158736731  158736828  98
YourSeq  48     1515   1611  2000   87.5%     chr17  -       33877387   33877492   106
YourSeq  45     1492   1559  2000   76.6%     chr7   -       128432009  128432072  64

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  1989   1      1989  1989   100.0%    chr5   -       92294022   92296010   1989
YourSeq  86     670    1207  1989   79.0%     chr12  +       83239551   83240145   595
YourSeq  79     645    792   1989   85.3%     chr10  +       18178134   18178276   143
YourSeq  75     646    783   1989   81.2%     chr11  +       8489122    8489263    142
YourSeq  71     701    1214  1989   71.3%     chr14  +       121804891  121805119  229
YourSeq  67     371    780   1989   74.7%     chr6   -       71829182   71829520   339
YourSeq  65     621    784   1989   92.4%     chr11  +       79221328   79283245   61918
YourSeq  64     680    784   1989   88.1%     chr16  -       18513159   18513264   106
YourSeq  64     621    721   1989   89.2%     chr10  -       41033340   41033440   101
YourSeq  63     671    797   1989   95.8%     chr1   +       59323181   59411918   88738
YourSeq  62     747    1196  1989   73.5%     chr6   -       120078400  120078766  367
YourSeq  62     659    782   1989   93.1%     chr1   -       151969299  151969422  124
YourSeq  61     486    776   1989   77.5%     chr10  -       94051305   94051668   364
YourSeq  60     672    774   1989   91.8%     chr14  -       60184656   60184765   110
YourSeq  58     747    936   1989   91.7%     chr11  -       23115481   23115738   258
YourSeq  56     636    794   1989   84.9%     chr2   +       72963032   72963186   155
YourSeq  55     621    760   1989   85.9%     chr10  -       117706836  117706975  140
YourSeq  53     698    783   1989   95.0%     chr11  +       70401097   70401203   107
YourSeq  52     1149   1214  1989   92.0%     chr13  -       46801719   46801785   67
YourSeq  51     349    783   1989   60.3%     chr10  +       12191343   12191506   164

Note: The 1989 bp section downstream of Exon 12 is BLAT searched against the genome. No significant similarity is found.
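The two 100% self-hits in the BLAT tables also give a quick consistency check on the stated KO region size. The sketch below assumes that each flank directly abuts the targeted exons (the 2000 bp flank ends immediately before Exon 2 and the 1989 bp flank begins immediately after Exon 12); the function and variable names are illustrative. Because Sdad1 lies on the reverse strand, the upstream flank has the higher chr5 coordinates.

```python
# chr5 coordinates (start, end) of the 100% identity hits in the BLAT tables.
UP_FLANK = (92305838, 92307837)    # 2000 bp section upstream of Exon 2
DOWN_FLANK = (92294022, 92296010)  # 1989 bp section downstream of Exon 12


def region_between(upstream_flank, downstream_flank):
    """Return (start, end, size) of the interval enclosed by the two flanks.

    Assumes a reverse-strand gene, so the downstream flank sits at lower
    genomic coordinates than the upstream flank.
    """
    start = downstream_flank[1] + 1  # first base after the downstream flank
    end = upstream_flank[0] - 1      # last base before the upstream flank
    return start, end, end - start + 1


ko_start, ko_end, ko_size = region_between(UP_FLANK, DOWN_FLANK)
```

Under that assumption the enclosed interval is chr5:92296011-92305837, which is 9827 bp and matches the effective KO region size given in the strategy summary.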


Gene and information:

Sdad1 SDA1 domain containing 1 [Mus musculus (house mouse)]
Gene ID: 231452, updated on 10-Oct-2019

Gene summary

Official Symbol: Sdad1 (provided by MGI)
Official Full Name: SDA1 domain containing 1 (provided by MGI)
Primary source: MGI:MGI:2140779
See related: Ensembl:ENSMUSG00000029415
Gene type: protein coding
RefSeq status: PROVISIONAL
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AA591032; AW538460; 4931421J16
Expression: Ubiquitous expression in CNS E11.5 (RPKM 10.4), testis adult (RPKM 10.0) and 28 other tissues
Orthologs: human

Genomic context

Location: 5; 5 E2
Exon count: 22

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  5    NC_000071.6 (92284010..92310024, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    5    NC_000071.5 (92713036..92739050, complement)

[Figure: genomic context, Chromosome 5 - NC_000071.6]


Transcript information: This gene has 7 transcripts

Gene: Sdad1 ENSMUSG00000029415

Description: SDA1 domain containing 1 [Source:MGI Symbol;Acc:MGI:2140779]
Location: Chromosome 5: 92,284,010-92,310,479 reverse strand. GRCm38:CM000998.2
About this gene: This gene has 7 transcripts (splice variants), 232 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Sdad1-201  ENSMUST00000031364.4  5020  687aa       ENSMUSP00000031364.1  Protein coding   CCDS19429  A0A0R4J0B7  TSL:1 GENCODE basic APPRIS P2
Sdad1-203  ENSMUST00000201143.1  2156  686aa       ENSMUSP00000144446.1  Protein coding   -          A0A0J9YV20  TSL:5 GENCODE basic APPRIS ALT1
Sdad1-206  ENSMUST00000202870.1  578   64aa        ENSMUSP00000144014.1  Protein coding   -          A0A0J9YU58  CDS 3' incomplete TSL:3
Sdad1-205  ENSMUST00000202604.1  996   No protein  -                     Retained intron  -          -           TSL:1
Sdad1-204  ENSMUST00000201532.1  889   No protein  -                     Retained intron  -          -           TSL:5
Sdad1-202  ENSMUST00000201084.1  786   No protein  -                     Retained intron  -          -           TSL:2
Sdad1-207  ENSMUST00000202903.3  708   No protein  -                     Retained intron  -          -           TSL:3

[Figure: genome browser view of a 46.47 kb region (92.28 Mb to 92.32 Mb). Forward strand: Gm43599-201 (lncRNA), Gm23031-201 (snRNA). Contig: AC122365.4. Reverse strand: Naaa-201 and Naaa-202 (protein coding), Naaa-204 (retained intron); Sdad1-201, Sdad1-203 and Sdad1-206 (protein coding); Sdad1-202, Sdad1-204, Sdad1-205 and Sdad1-207 (retained intron). Regulatory Build legend: CTCF, Open Chromatin, Promoter, Promoter Flank. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana) and non-protein coding (processed transcript; RNA gene).]


Transcript: ENSMUST00000031364

[Figure: protein domain view of Sdad1-201 (protein coding; reverse strand, 26.02 kb) for ENSMUSP00000031... Tracks: MobiDB lite; Low complexity (Seg); Superfamily: Armadillo-type fold; Pfam: Uncharacterised domain NUC130/133, N-terminal, and SDA1 domain; PANTHER: PTHR12730:SF1, Sda1; sequence variants (dbSNP and all other sources). Variant legend: splice acceptor variant, missense variant, splice region variant, synonymous variant. Scale bar: 0 to 687.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
