https://www.alphaknockout.com

Mouse Ssmem1 Knockout Project (CRISPR/Cas9)

Objective: To create a Ssmem1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Ssmem1 gene (NCBI Reference Sequence: NM_027073; Ensembl: ENSMUSG00000029784) is located on mouse chromosome 6. Three exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 3 (Transcript: ENSMUST00000031797). Exons 2~3 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts from about 6.8% of the coding region. Exons 2~3 cover 93.37% of the coding region. The size of the effective KO region is ~2401 bp. The KO region does not contain any other known gene.
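The ~2401 bp figure can be cross-checked against the BLAT results later in this report, which place the 2000 bp upstream flank at chr6:30515506-30517505 and the 2000 bp downstream flank at chr6:30519907-30521906. The sketch below assumes the effective KO region is simply the interval between those two flanks; that reading is an interpretation, not something the report states, and the function name is hypothetical.

```python
# Flank coordinates copied from the BLAT results in this report (chr6, + strand).
UPSTREAM_FLANK = (30515506, 30517505)    # 2000 bp section upstream of exon 2
DOWNSTREAM_FLANK = (30519907, 30521906)  # 2000 bp section downstream of the stop codon


def interval_between(upstream, downstream):
    """Return (start, end, length) of the 1-based inclusive interval between two flanks.

    Assumption: the KO region starts one base after the upstream flank ends
    and stops one base before the downstream flank begins.
    """
    start = upstream[1] + 1
    end = downstream[0] - 1
    return start, end, end - start + 1


ko_start, ko_end, ko_length = interval_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK)
print(f"chr6:{ko_start}-{ko_end} ({ko_length} bp)")
```

Under that assumption the interval length agrees with the stated size of the effective KO region.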

Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3' with exons 1, 2 and 3 and two gRNA regions marked. Legend: exon of mouse Ssmem1; knockout region.]

Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
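The self-alignment behind a dot plot can be sketched as follows. This is a minimal illustration, not the tool used for this report: it marks a dot wherever two 15 bp words of the same sequence are identical on the forward strand only (the reverse-complement track is omitted), and it uses a made-up toy sequence because the actual 2000 bp flank is not included here. The function names are hypothetical.

```python
from collections import defaultdict


def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs with i < j whose window-length words are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    dots = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


def repeat_offsets(dots):
    """Count off-diagonal dots per offset (j - i).

    Many dots sharing one small offset form a line parallel to the main
    diagonal, which is how a tandem repeat appears in the plot.
    """
    counts = defaultdict(int)
    for i, j in dots:
        counts[j - i] += 1
    return dict(counts)


# Toy input (not from the report): an 11 bp unit repeated four times in tandem.
toy_seq = "ACGTACGGTCA" * 4
offsets = repeat_offsets(self_dot_plot(toy_seq, window=15))
print(offsets)
```

For the toy sequence, every off-diagonal dot sits at an offset that is a multiple of the 11 bp repeat unit. A flank without tandem repeats would give few or no off-diagonal dots at this window size.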

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section downstream of stop codon is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(26.35% 527) | C(19.95% 399) | T(32.05% 641) | G(21.65% 433)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(27.9% 558) | C(20.5% 410) | T(25.65% 513) | G(25.95% 519)

Note: The 2000 bp section downstream of stop codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
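The two summaries above can be checked, and the windowed GC profile reproduced in principle, with a short sketch like the one below. The base counts are copied from the Summary lines of this report; the function names are hypothetical, and because the flank sequences themselves are not included in the report, `windowed_gc` is only demonstrated on a toy string.

```python
# Base counts copied from the two Summary lines (each flank is 2000 bp).
UPSTREAM_COUNTS = {"A": 527, "C": 399, "T": 641, "G": 433}
DOWNSTREAM_COUNTS = {"A": 558, "C": 410, "T": 513, "G": 519}


def base_percentages(counts):
    """Percentage of each base, rounded to two decimals."""
    total = sum(counts.values())
    return {base: round(100 * n / total, 2) for base, n in counts.items()}


def gc_fraction(counts):
    """Overall GC fraction computed from base counts."""
    return (counts["G"] + counts["C"]) / sum(counts.values())


def windowed_gc(seq, window=300, step=1):
    """GC fraction of each full window along seq (the report uses 300 bp windows)."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((chunk.count("G") + chunk.count("C")) / window)
    return profile


print(base_percentages(UPSTREAM_COUNTS), gc_fraction(UPSTREAM_COUNTS))
print(base_percentages(DOWNSTREAM_COUNTS), gc_fraction(DOWNSTREAM_COUNTS))
print(windowed_gc("GGCCAATT", window=4, step=4))  # toy sequence, not from the report
```

The counts in each summary add up to 2000 and reproduce the listed percentages. The overall GC fraction is about 41.6% for the upstream flank and about 46.5% for the downstream flank.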

BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
---------------------------------------------------------------------------------------------
YourSeq  2000   1       2000  2000   100.0%    chr6   +       30515506   30517505   2000
YourSeq  227    458     1497  2000   91.4%     chr5   +       123978889  124538988  560100
YourSeq  215    462     1474  2000   90.0%     chr11  -       57822172   57844329   22158
YourSeq  189    458     1388  2000   95.3%     chr3   -       53860772   54096337   235566
YourSeq  137    455     606   2000   95.4%     chr11  -       113547893  113548046  154
YourSeq  136    452     608   2000   93.6%     chrX   +       151752744  151752908  165
YourSeq  134    457     607   2000   94.7%     chrX   +       13477087   13477237   151
YourSeq  134    467     622   2000   94.8%     chr9   +       19098543   19098701   159
YourSeq  133    462     604   2000   96.6%     chr3   +       87897161   87897303   143
YourSeq  132    458     608   2000   94.6%     chr1   -       163992874  163993043  170
YourSeq  129    460     603   2000   95.2%     chr2   -       145840940  145841084  145
YourSeq  129    462     598   2000   95.6%     chr11  -       19239419   19239554   136
YourSeq  129    467     607   2000   95.8%     chrX   +       101040547  101040687  141
YourSeq  129    460     607   2000   93.9%     chr5   +       122713792  122713940  149
YourSeq  129    461     609   2000   94.0%     chr14  +       63176003   63176154   152
YourSeq  129    472     606   2000   97.8%     chr1   +       133686531  133686665  135
YourSeq  128    470     614   2000   92.4%     chr10  -       85083515   85083657   143
YourSeq  128    460     606   2000   93.9%     chr6   +       38425568   38425715   148
YourSeq  128    467     606   2000   95.8%     chr1   +       89530597   89530736   140
YourSeq  127    462     598   2000   96.4%     chr15  -       98605294   98605430   137

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
---------------------------------------------------------------------------------------------
YourSeq  2000   1       2000  2000   100.0%    chr6   +       30519907   30521906   2000
YourSeq  61     810     878   2000   94.3%     chrX   -       100685867  100685935  69
YourSeq  61     810     885   2000   90.8%     chrX   -       12866076   12866153   78
YourSeq  61     435     892   2000   66.7%     chr10  -       88229826   88229982   157
YourSeq  61     808     885   2000   89.1%     chr1   -       118267965  118268041  77
YourSeq  59     809     881   2000   90.5%     chr5   -       119861131  119861203  73
YourSeq  59     807     888   2000   86.5%     chr11  -       59126875   59126961   87
YourSeq  59     420     883   2000   68.5%     chr1   -       173471030  173471220  191
YourSeq  58     809     901   2000   79.8%     chr11  -       67060441   67060527   87
YourSeq  58     808     881   2000   89.2%     chr9   +       109870253  109870326  74
YourSeq  58     812     890   2000   94.0%     chr14  +       67085998   67269291   183294
YourSeq  57     807     881   2000   88.0%     chr11  -       86462975   86463049   75
YourSeq  56     807     885   2000   86.9%     chrX   -       164037503  164037583  81
YourSeq  55     803     875   2000   87.0%     chr9   -       70037647   70037718   72
YourSeq  55     807     879   2000   87.7%     chr10  -       18009651   18009723   73
YourSeq  54     807     880   2000   86.5%     chr9   -       107876824  107876897  74
YourSeq  53     807     881   2000   85.4%     chr11  -       103682307  103682381  75
YourSeq  53     807     875   2000   90.8%     chr10  -       70057811   70057880   70
YourSeq  53     808     878   2000   87.4%     chr9   +       114892025  114892095  71
YourSeq  53     807     871   2000   90.8%     chr2   +       161019001  161019065  65

Note: The 2000 bp section downstream of stop codon is BLAT searched against the genome. No significant similarity is found.
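One way to read the two BLAT tables is to compare the best hit other than the full-length self match against the query size. The sketch below parses a few rows copied from the tables above and reports that ratio. The report does not state what score counts as "significant", so no threshold is applied; the class and function names are hypothetical.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    query: str
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one whitespace-separated BLAT result row into a BlatHit."""
    f = line.split()
    return BlatHit(f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def best_off_target(hits):
    """Highest-scoring hit whose score is below the query size (i.e. not the self match)."""
    others = [h for h in hits if h.score < h.q_size]
    return max(others, key=lambda h: h.score)


# First three rows of each table in this report.
UP_ROWS = """\
YourSeq 2000 1 2000 2000 100.0% chr6 + 30515506 30517505 2000
YourSeq 227 458 1497 2000 91.4% chr5 + 123978889 124538988 560100
YourSeq 215 462 1474 2000 90.0% chr11 - 57822172 57844329 22158"""
DOWN_ROWS = """\
YourSeq 2000 1 2000 2000 100.0% chr6 + 30519907 30521906 2000
YourSeq 61 810 878 2000 94.3% chrX - 100685867 100685935 69
YourSeq 61 810 885 2000 90.8% chrX - 12866076 12866153 78"""

for label, rows in (("up", UP_ROWS), ("down", DOWN_ROWS)):
    hits = [parse_hit(line) for line in rows.splitlines()]
    best = best_off_target(hits)
    print(label, best.chrom, best.score, round(best.score / best.q_size, 4))
```

In both tables the best non-self hit scores far below the 2000 bp query size (227 upstream, 61 downstream).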

Gene and information:

Ssmem1 serine-rich single-pass membrane protein 1 [Mus musculus (house mouse)]
Gene ID: 75647, updated on 12-Aug-2019

Gene summary

Official Symbol: Ssmem1 (provided by MGI)
Official Full Name: serine-rich single-pass membrane protein 1 (provided by MGI)
Primary source: MGI:MGI:1922897
See related: Ensembl:ENSMUSG00000029784
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 1700016K02Rik; 1700025E21Rik
Expression: Restricted expression toward testis adult (RPKM 30.5)
Orthologs: human; all

Genomic context

Location: 6; 6 A3.3
Exon count: 6

Annotation release  Status             Assembly                    Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  6    NC_000072.6 (30509791..30520254)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    6    NC_000072.5 (30462304..30469963)

[Figure: genomic context, Chromosome 6 - NC_000072.6]

Transcript information: This gene has 3 transcripts

Gene: Ssmem1 ENSMUSG00000029784

Description: serine-rich single-pass membrane protein 1 [Source:MGI Symbol;Acc:MGI:1922897]
Gene Synonyms: 1700016K02Rik, 1700025E21Rik
Location: Chromosome 6: 30,509,849-30,520,254 forward strand. GRCm38:CM000999.2
About this gene: This gene has 3 transcripts (splice variants), 103 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name        Transcript ID          bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Ssmem1-201  ENSMUST00000031797.10  1340  196aa    ENSMUSP00000031797.4  Protein coding  CCDS57417  Q9DAA0   TSL:1 GENCODE basic
Ssmem1-202  ENSMUST00000031798.13  1155  244aa    ENSMUSP00000031798.7  Protein coding  CCDS19973  Q9D9Y8   TSL:1 GENCODE basic APPRIS P1
Ssmem1-203  ENSMUST00000131485.1   397   66aa     ENSMUSP00000122018.1  Protein coding  -          D3Z3E1   CDS 3' incomplete TSL:2

[Figure: genome browser view of a 30.41 kb region, 30.50Mb to 30.53Mb. Forward strand: Ssmem1-201, Ssmem1-202 and Ssmem1-203 (protein coding). Contig: AC155848.6. Reverse strand: Tmem209-201, -202, -203, -204, -206, -209 and -211 (protein coding), Tmem209-205 and -208 (nonsense mediated decay), Tmem209-207 (retained intron). Regulatory Build track with legend: CTCF, Enhancer, Promoter, Promoter Flank. Gene legend: protein coding (merged Ensembl/Havana; Ensembl protein coding) and non-protein coding (processed transcript).]

Transcript: ENSMUST00000031797

[Figure: Ssmem1-201 (protein coding), 10.40 kb, forward strand. Protein ENSMUSP00000031... with tracks: MobiDB lite; Pfam: Protein of unknown function DUF4636; PANTHER: Protein of unknown function DUF4636. Sequence variants (dbSNP and all other sources), with legend: missense variant, synonymous variant. Scale bar: 0 to 196.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
