https://www.alphaknockout.com

Mouse Ssc4d Knockout Project (CRISPR/Cas9)

Objective: To create an Ssc4d knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Ssc4d gene (NCBI Reference Sequence: NM_001160366; Ensembl: ENSMUSG00000029699) is located on mouse chromosome 5. Eleven exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 11 (transcript: ENSMUST00000111152). Exons 2~11 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 0.06% of the coding region. Exons 2~11 cover 100.0% of the coding region. The size of the effective KO region is ~9451 bp. The KO region does not contain any other known gene.
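As a rough cross-check of the ~9451 bp figure, the distance between the two 2000 bp flanks used in the BLAT search results can be computed from their chr5 coordinates (GRCm38). This minimal sketch assumes that each flank ends immediately next to its codon, which the report does not state; under that assumption the gap is 9453 bp, within a few bases of the stated size. The helper names are illustrative.

```python
# Flank coordinates copied from the BLAT self-hits in this report (GRCm38, chr5).
# Assumption: each flank abuts the start or stop codon.
UPSTREAM_FLANK = (135970348, 135972347)    # 2000 bp upstream of the ATG
DOWNSTREAM_FLANK = (135958895, 135960894)  # 2000 bp downstream of the TGA


def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, closed interval [start, end]."""
    return end - start + 1


def gap_between(left: tuple[int, int], right: tuple[int, int]) -> int:
    """Bases strictly between two non-overlapping closed intervals, left < right."""
    return right[0] - left[1] - 1


# Ssc4d is on the reverse strand, so the downstream flank has the lower coordinates.
print(interval_length(*UPSTREAM_FLANK))               # 2000
print(gap_between(DOWNSTREAM_FLANK, UPSTREAM_FLANK))  # 9453
```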

Page 1 of 9 https://www.alphaknockout.com

Overview of the Targeting Strategy

[Figure: wildtype Ssc4d allele (5' to 3') showing the exons of mouse Ssc4d, the two gRNA regions, and the knockout region.]

Overview of the Dot Plot (up) Window size: 15 bp

[Dot plot: forward and reverse-complement self-alignment of the upstream sequence.]

Note: The 2000 bp section upstream of the start codon is aligned with itself to detect tandem repeats. Tandem repeats are found in the dot plot matrix, so the gRNA sites are selected outside these repeats.
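The self-alignment behind a dot plot can be sketched as exact word matching: with a 15 bp window, a point (i, j) is drawn when the 15-mer at i equals the 15-mer at j (forward) or the reverse complement of the 15-mer at j. Tandem repeats then appear as short diagonals parallel to the main diagonal, offset by the repeat unit length. This is a minimal sketch assuming exact matches; the tool that produced the plots in this report may score windows differently, and all function names are hypothetical.

```python
from collections import defaultdict

WINDOW = 15  # word size, matching the report's 15 bp dot-plot window


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (non-ACGT characters kept as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in seq to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def dot_plot(seq: str, k: int = WINDOW) -> tuple[set, set]:
    """Return (forward, reverse) sets of (i, j) points for a self dot plot.

    forward: seq[i:i+k] == seq[j:j+k]
    reverse: seq[i:i+k] == revcomp(seq[j:j+k])
    """
    seq = seq.upper()
    index = kmer_index(seq, k)
    forward: set[tuple[int, int]] = set()
    reverse: set[tuple[int, int]] = set()
    for j in range(len(seq) - k + 1):
        word = seq[j:j + k]
        for i in index.get(word, []):
            forward.add((i, j))
        for i in index.get(revcomp(word), []):
            reverse.add((i, j))
    return forward, reverse


def off_diagonal(forward: set) -> set:
    """Forward points off the main diagonal (i != j); tandem repeats lie close to it."""
    return {(i, j) for i, j in forward if i != j}
```

Indexing the words first avoids comparing every window against every other window; a 2000 bp flank yields only 1986 words of 15 bp.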

Overview of the Dot Plot (down) Window size: 15 bp

[Dot plot: forward and reverse-complement self-alignment of the downstream sequence.]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to detect tandem repeats. No significant tandem repeats are found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
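A complementary check that needs no plot is to scan for exact tandem repeats directly: a run of positions where seq[x] == seq[x + p] marks a stretch built from repeated p bp units. The sketch below is an illustrative alternative, not the method used in this report; the period limit and minimum length are assumed values chosen for the example.

```python
def find_tandem_repeats(seq: str, max_period: int = 6,
                        min_length: int = 12) -> list[tuple[int, int, int]]:
    """Return (start, period, length) for exact tandem repeats (0-based start).

    A run of r consecutive positions with seq[x] == seq[x + p] means that
    seq[start:start + r + p] is made of copies of one p bp unit. Only runs
    covering at least two full units and min_length bp are reported.
    """
    seq = seq.upper()
    hits = []
    for p in range(1, max_period + 1):
        run = 0
        last = len(seq) - p  # positions 0..last-1 have a partner at x + p
        for x in range(last + 1):
            if x < last and seq[x] == seq[x + p]:
                run += 1
                continue
            # The run just ended (mismatch or end of sequence): it covers x - run .. x - 1.
            if run >= p and run + p >= min_length:
                hits.append((x - run, p, run + p))
            run = 0
    return hits
```

A (CA)n repeat is also reported at multiples of its period (p = 4, 6), which is harmless for a yes/no screen but worth collapsing if exact unit lengths are needed.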

Overview of the GC Content Distribution (up) Window size: 300 bp

[Plot: GC content of the upstream sequence.]

Summary: Full Length(2000bp) | A(28.45% 569) | C(23.6% 472) | T(24.75% 495) | G(23.2% 464)

Note: The 2000 bp section upstream of the start codon is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
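The 300 bp window in the GC plots can be read as a sliding window: each point is the G + C fraction of the 300 bp starting at that position. A minimal sketch using prefix sums is below. The 0.70 cutoff for a "high GC" window is a hypothetical value for illustration, since the report does not state the threshold it used.

```python
def gc_profile(seq: str, window: int = 300) -> list[tuple[int, float]]:
    """(start, GC fraction) for every full window of `window` bp, step 1."""
    seq = seq.upper()
    prefix = [0]  # prefix[i] = number of G/C bases in seq[:i]
    for base in seq:
        prefix.append(prefix[-1] + (base in "GC"))
    return [(i, (prefix[i + window] - prefix[i]) / window)
            for i in range(len(seq) - window + 1)]


HIGH_GC = 0.70  # hypothetical cutoff; the report does not state its threshold


def high_gc_starts(seq: str, window: int = 300, cutoff: float = HIGH_GC) -> list[int]:
    """Start positions of windows whose GC fraction reaches the cutoff."""
    return [start for start, gc in gc_profile(seq, window) if gc >= cutoff]
```

Prefix sums make each window a constant-time subtraction, so a 2000 bp flank needs one pass to build the counts and one pass over its 1701 windows.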

Overview of the GC Content Distribution (down) Window size: 300 bp

[Plot: GC content of the downstream sequence.]

Summary: Full Length(2000bp) | A(25.45% 509) | C(25.65% 513) | T(25.8% 516) | G(23.1% 462)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
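The two summary lines are plain base counts over each 2000 bp flank. The sketch below reproduces the summary format from counts and derives the overall GC content, which the report does not print: (472 + 464) / 2000 = 46.8% upstream and (513 + 462) / 2000 = 48.75% downstream. The function names are illustrative.

```python
from collections import Counter


def base_counts(seq: str) -> dict[str, int]:
    """Count A, C, G and T in a DNA string (case-insensitive; other characters ignored)."""
    counts = Counter(seq.upper())
    return {base: counts[base] for base in "ACGT"}


def composition_summary(counts: dict[str, int]) -> str:
    """Format counts like the report's summary line (bases in A, C, T, G order)."""
    total = sum(counts.values())
    parts = [f"Full Length({total}bp)"]
    for base in "ACTG":
        parts.append(f"{base}({100 * counts[base] / total:.4g}% {counts[base]})")
    return " | ".join(parts)


def gc_percent(counts: dict[str, int]) -> float:
    """Overall G + C content as a percentage of all counted bases."""
    return 100 * (counts["G"] + counts["C"]) / sum(counts.values())


upstream = {"A": 569, "C": 472, "T": 495, "G": 464}
downstream = {"A": 509, "C": 513, "T": 516, "G": 462}
print(composition_summary(upstream))
print(gc_percent(upstream), gc_percent(downstream))  # 46.8 48.75
```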

BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  CHR_START  CHR_END    SPAN
YourSeq  2000   1       2000  2000   100.0%    chr5   -       135970348  135972347  2000
YourSeq  226    658     1400  2000   91.3%     chr1   +       27412071   27538199   126129
YourSeq  165    510     819   2000   88.9%     chr19  -       4054480    4054782    303
YourSeq  164    653     1149  2000   93.6%     chr16  +       33166535   33167153   619
YourSeq  155    638     827   2000   89.9%     chr1   -       93632205   93632386   182
YourSeq  155    653     906   2000   88.2%     chr3   +       129790725  129790956  232
YourSeq  152    655     837   2000   89.4%     chr11  -       79732749   79732917   169
YourSeq  152    638     826   2000   96.4%     chr3   +       63975362   63975849   488
YourSeq  151    520     821   2000   89.0%     chr6   -       87940008   87940305   298
YourSeq  151    652     835   2000   90.0%     chr14  -       66283790   66283964   175
YourSeq  151    654     825   2000   94.2%     chr11  -       105185393  105185568  176
YourSeq  150    649     825   2000   90.6%     chr15  +       83139993   83140163   171
YourSeq  150    654     835   2000   92.7%     chr12  +       36097560   36097746   187
YourSeq  150    654     826   2000   91.9%     chr11  +       6577136    6577306    171
YourSeq  149    436     771   2000   86.3%     chr16  -       13845629   13845869   241
YourSeq  148    638     823   2000   88.5%     chr16  -       62358719   62358896   178
YourSeq  148    634     825   2000   88.4%     chr15  +       100538347  100538524  178
YourSeq  147    653     825   2000   90.4%     chr2   -       20817049   20817214   166
YourSeq  147    654     823   2000   91.4%     chr16  -       90723459   90723622   164
YourSeq  146    654     829   2000   92.5%     chr3   -       131616025  131616501  477

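Note: The 2000 bp section upstream of the start codon is BLAT searched against the genome. Apart from the self-hit at the Ssc4d locus on chr5 (score 2000, 100.0% identity), the best hit scores only 226, so no significant similarity is found.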
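The note's conclusion can be checked from the table: apart from the self-hit, the best score is far below the 2000 bp query size. The sketch below parses whitespace-separated rows in the table's column order (three rows copied from the upstream results) and reports the highest-scoring non-self hit. Treating a row as the self-hit when its score equals the query size at 100% identity is an assumption of this sketch, as is any cutoff applied to the resulting ratio.

```python
from dataclasses import dataclass

# Three rows copied from the upstream BLAT table in this report.
ROWS = """\
YourSeq 2000 1 2000 2000 100.0% chr5 - 135970348 135972347 2000
YourSeq 226 658 1400 2000 91.3% chr1 + 27412071 27538199 126129
YourSeq 165 510 819 2000 88.9% chr19 - 4054480 4054782 303
"""


@dataclass
class BlatHit:
    score: int
    qstart: int
    qend: int
    qsize: int
    identity: float  # percent
    chrom: str
    strand: str
    start: int
    end: int
    span: int


def parse_hits(text: str) -> list[BlatHit]:
    """Parse rows: QUERY SCORE QSTART QEND QSIZE IDENTITY CHROM STRAND START END SPAN."""
    hits = []
    for line in text.splitlines():
        f = line.split()
        if len(f) != 11:
            continue  # skip blank or malformed lines
        hits.append(BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                            float(f[5].rstrip("%")), f[6], f[7],
                            int(f[8]), int(f[9]), int(f[10])))
    return hits


def best_off_target(hits: list[BlatHit]):
    """Highest-scoring hit other than the self-hit (assumed: score == qsize at 100%)."""
    others = [h for h in hits if not (h.score == h.qsize and h.identity == 100.0)]
    return max(others, key=lambda h: h.score, default=None)


best = best_off_target(parse_hits(ROWS))
print(best.chrom, best.score, best.score / best.qsize)  # chr1 226 0.113
```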

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  CHR_START  CHR_END    SPAN
YourSeq  2000   1       2000  2000   100.0%    chr5   -       135958895  135960894  2000
YourSeq  184    1500    1881  2000   89.0%     chr4   -       141138009  141138358  350
YourSeq  181    1620    1889  2000   89.2%     chrX   -       139411206  139411415  210
YourSeq  181    1510    1876  2000   89.3%     chr12  -       83616029   83616306   278
YourSeq  180    1510    1867  2000   87.1%     chr5   -       129079497  129079713  217
YourSeq  176    1695    1889  2000   93.2%     chr11  -       20198740   20198930   191
YourSeq  176    1686    1889  2000   94.1%     chr10  -       62924885   62925092   208
YourSeq  176    1695    1886  2000   94.8%     chr1   -       156676932  156677122  191
YourSeq  175    1688    1891  2000   94.0%     chr11  +       94589039   94589251   213
YourSeq  174    1506    1876  2000   86.5%     chr1   -       4877234    4877508    275
YourSeq  173    1695    1889  2000   95.4%     chr8   -       25583500   25583698   199
YourSeq  173    1695    1889  2000   95.3%     chr5   -       7084557    7084754    198
YourSeq  173    1695    1889  2000   95.4%     chr2   -       119129123  119129327  205
YourSeq  173    1695    1889  2000   95.8%     chr8   +       60903518   60903724   207
YourSeq  173    1695    1889  2000   95.4%     chr2   +       34715714   34715912   199
YourSeq  173    1695    1889  2000   95.4%     chr11  +       76281872   76282070   199
YourSeq  173    1695    1889  2000   95.4%     chr1   +       86467852   86468050   199
YourSeq  172    1695    1876  2000   97.3%     chr9   -       122346692  122346873  182
YourSeq  171    1427    1876  2000   88.1%     chrX   -       73834048   73834462   415
YourSeq  171    1695    1889  2000   94.8%     chr2   +       165030009  165030207  199

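Note: The 2000 bp section downstream of the stop codon is BLAT searched against the genome. Apart from the self-hit at the Ssc4d locus on chr5 (score 2000, 100.0% identity), the best hit scores only 184, so no significant similarity is found.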

Gene information: Ssc4d, scavenger receptor cysteine rich family, 4 domains [Mus musculus (house mouse)], Gene ID: 109267, updated on 12-Aug-2019

Gene summary

Official Symbol: Ssc4d (provided by MGI)
Official Full Name: scavenger receptor cysteine rich family, 4 domains (provided by MGI)
Primary source: MGI:MGI:1924709
See related: Ensembl:ENSMUSG00000029699
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Srcrb4d; S4D-SRCRB; SRCRB-S4D; C330016E03Rik
Expression: Ubiquitous expression in ovary adult (RPKM 2.1), genital fat pad adult (RPKM 2.0) and 28 other tissues
Orthologs: human

Genomic context

Location: chromosome 5; 5 G2
Exon count: 11

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  5    NC_000071.6 (135960220..135974535, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    5    NC_000071.5 (136436093..136450346, complement)

[Figure: genomic context of Ssc4d on chromosome 5 (NC_000071.6).]
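The NCBI locations in the genomic context table are 1-based and inclusive, so the gene's span is end - start + 1: 14316 bp on GRCm38 and 14254 bp on MGSCv37. A small parser is sketched below; the regular expression is written for exactly this "accession (start..end, complement)" form, which is an assumption, and other NCBI location formats are not handled.

```python
import re

# Matches strings such as "NC_000071.6 (135960220..135974535, complement)".
LOCATION_RE = re.compile(
    r"(?P<acc>\S+) \((?P<start>\d+)\.\.(?P<end>\d+)(?P<comp>, complement)?\)"
)


def parse_location(text: str) -> tuple[str, int, int, str, int]:
    """Return (accession, start, end, strand, length) for a 1-based inclusive location."""
    m = LOCATION_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised location: {text!r}")
    start, end = int(m["start"]), int(m["end"])
    strand = "-" if m["comp"] else "+"
    return m["acc"], start, end, strand, end - start + 1


print(parse_location("NC_000071.6 (135960220..135974535, complement)"))
# ('NC_000071.6', 135960220, 135974535, '-', 14316)
```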

Transcript information: This gene has 7 transcripts

Gene: Ssc4d ENSMUSG00000029699

Description: scavenger receptor cysteine rich family, 4 domains [Source:MGI Symbol;Acc:MGI:1924709]
Gene Synonyms: C330016E03Rik, Srcrb4d
Location: Chromosome 5: 135,960,211-135,974,531 reverse strand (GRCm38:CM000998.2)
About this gene: This gene has 7 transcripts (splice variants), 140 orthologues, 16 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Ssc4d-203  ENSMUST00000111152.7  2745  586aa    ENSMUSP00000106782.1  Protein coding  CCDS51664  A1L0T3   TSL:5 GENCODE basic APPRIS P1
Ssc4d-204  ENSMUST00000111153.7  2745  586aa    ENSMUSP00000106783.1  Protein coding  CCDS51664  A1L0T3   TSL:5 GENCODE basic APPRIS P1
Ssc4d-201  ENSMUST00000054895.3  1191  134aa    ENSMUSP00000050439.3  Protein coding  -          G3X988   TSL:1 GENCODE basic
Ssc4d-206  ENSMUST00000154181.1  819   174aa    ENSMUSP00000123008.1  Protein coding  -          D3YUQ6   CDS 3' incomplete TSL:5
Ssc4d-205  ENSMUST00000153823.7  498   166aa    ENSMUSP00000122958.1  Protein coding  -          F7AS28   CDS 5' and 3' incomplete TSL:3
Ssc4d-202  ENSMUST00000111150.1  459   108aa    ENSMUSP00000106780.1  Protein coding  -          D3YZ07   TSL:2 GENCODE basic
Ssc4d-207  ENSMUST00000154696.1  348   116aa    ENSMUSP00000117071.1  Protein coding  -          F6ZA92   CDS 5' and 3' incomplete TSL:5
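The transcript table helps explain why ENSMUST00000111152 is used: only Ssc4d-203 and Ssc4d-204 encode the full 586 aa protein with a complete CDS, and 586 codons plus a stop codon give a 1761 bp coding sequence. The sketch below filters the table rows; the selection rule (complete CDS, GENCODE basic, longest protein) is an assumption, and the report does not say why Ssc4d-203 was preferred over the equally long Ssc4d-204.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    transcript_id: str
    length_bp: int
    protein_aa: int
    flags: str


# Rows copied from the Ensembl transcript table in this report.
TRANSCRIPTS = [
    Transcript("Ssc4d-203", "ENSMUST00000111152.7", 2745, 586, "TSL:5 GENCODE basic APPRIS P1"),
    Transcript("Ssc4d-204", "ENSMUST00000111153.7", 2745, 586, "TSL:5 GENCODE basic APPRIS P1"),
    Transcript("Ssc4d-201", "ENSMUST00000054895.3", 1191, 134, "TSL:1 GENCODE basic"),
    Transcript("Ssc4d-206", "ENSMUST00000154181.1", 819, 174, "CDS 3' incomplete TSL:5"),
    Transcript("Ssc4d-205", "ENSMUST00000153823.7", 498, 166, "CDS 5' and 3' incomplete TSL:3"),
    Transcript("Ssc4d-202", "ENSMUST00000111150.1", 459, 108, "TSL:2 GENCODE basic"),
    Transcript("Ssc4d-207", "ENSMUST00000154696.1", 348, 116, "CDS 5' and 3' incomplete TSL:5"),
]


def longest_complete(transcripts: list[Transcript]) -> list[str]:
    """Names of GENCODE basic transcripts with a complete CDS and the longest protein."""
    complete = [t for t in transcripts
                if "incomplete" not in t.flags and "GENCODE basic" in t.flags]
    longest = max(t.protein_aa for t in complete)
    return [t.name for t in complete if t.protein_aa == longest]


def cds_length_with_stop(protein_aa: int) -> int:
    """Coding sequence length in bp: one codon per residue plus the stop codon."""
    return 3 * (protein_aa + 1)


print(longest_complete(TRANSCRIPTS))  # ['Ssc4d-203', 'Ssc4d-204']
print(cds_length_with_stop(586))      # 1761
```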

[Figure: Ensembl region view (34.32 kb of chromosome 5, about 135.96 to 135.98 Mb). Forward strand: Zp3-201 (protein coding) and Zp3-202 (nonsense mediated decay). Reverse strand: Ssc4d-201 to Ssc4d-207 (all protein coding). Contigs AC084162.3 and AC087420.4; Ensembl Regulatory Build track (CTCF, enhancer, promoter, promoter flank, transcription factor binding site).]

Transcript: ENSMUST00000111152

[Figure: Ssc4d-203 (protein coding, reverse strand, 14.28 kb) and its protein product ENSMUSP00000106... (586 aa), with feature tracks for MobiDB lite, low complexity (Seg), SRCR-like domain superfamily (Superfamily, Gene3D), SRCR-like domain (SMART), SRCR domain (Prints, Pfam, PROSITE profiles, PROSITE patterns) and PANTHER PTHR19331 / PTHR19331:SF439, plus sequence variants from dbSNP and other sources (missense, splice region and synonymous).]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
