https://www.alphaknockout.com

Mouse Smoc2 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Smoc2 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Smoc2 gene (NCBI Reference Sequence: NM_022315; Ensembl: ENSMUSG00000023886) is located on mouse chromosome 17. Thirteen exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 13 (Transcript: ENSMUST00000024660). Exon 3 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Smoc2 gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-94H11 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for one KO allele exhibit protection from induced kidney fibrosis and reduced interstitial myofibroblast accumulation. Another KO allele leads to shortening and widening of the skull.

Exon 3 starts at about 19.16% of the coding region. Knockout of exon 3 will result in a frameshift of the gene. Intron 2, for 5'-loxP site insertion, is 10840 bp; intron 3, for 3'-loxP site insertion, is 1008 bp. The effective cKO region is ~607 bp. The cKO region does not contain any other known gene.
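As a cross-check on the ~607 bp figure, the size of the cKO region can be recomputed from the chr17 coordinates that the BLAT tables in this report give for the two 3000 bp flanking sections. The sketch below assumes those coordinates are 1-based and inclusive and that the flanking sections directly abut the cKO region; the function and variable names are illustrative and not part of the original design.

```python
# Coordinates copied from the BLAT tables in this report (chr17, + strand).
UPSTREAM_SECTION = (14333298, 14336297)    # 3000 bp section upstream
DOWNSTREAM_SECTION = (14336905, 14339904)  # 3000 bp section downstream


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval."""
    return end - start + 1


def region_between(left, right):
    """Interval lying strictly between two non-overlapping flanking intervals."""
    return (left[1] + 1, right[0] - 1)


# Assumption: the cKO region is exactly the gap between the two sections.
cko_region = region_between(UPSTREAM_SECTION, DOWNSTREAM_SECTION)
cko_length = span_length(*cko_region)

print(cko_region, cko_length)
```

Under these assumptions the gap between the two sections is 607 bp, which agrees with the stated size of the effective cKO region.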


Overview of the Targeting Strategy

[Figure: schematic of four alleles. Wildtype allele, with exons labeled 1, 3, 4 and 13 and the 5' and 3' gRNA regions marked; targeting vector; targeted allele; constitutive KO allele (after Cre recombination). Legend: exon of mouse Smoc2; homology arm; cKO region; loxP site.]
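The step from the targeted allele to the constitutive KO allele can be illustrated with a toy string model of Cre recombination. This is a minimal sketch, not the vector design itself: it assumes the commonly cited 34 bp wild-type loxP sequence, two loxP sites in the same orientation, and short made-up placeholder sequences for the homology arms and the cKO region.

```python
# Commonly cited wild-type loxP: 13 bp repeat + 8 bp spacer + 13 bp repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(allele: str, loxp: str = LOXP) -> str:
    """Model Cre recombination between two directly repeated loxP sites.

    The segment between the sites is removed together with one loxP copy,
    leaving a single loxP site. The allele is returned unchanged when it
    carries fewer than two sites.
    """
    first = allele.find(loxp)
    if first == -1:
        return allele
    second = allele.find(loxp, first + len(loxp))
    if second == -1:
        return allele
    return allele[:first] + allele[second:]


# Hypothetical placeholder sequences, NOT the real Smoc2 sequence.
five_arm, cko, three_arm = "AAAA", "CCCGGG", "TTTT"
targeted = five_arm + LOXP + cko + LOXP + three_arm
constitutive_ko = cre_excise(targeted)

print(constitutive_ko == five_arm + LOXP + three_arm)
```

In the model the floxed segment is lost and one loxP site remains, which matches the layout of the constitutive KO allele in the schematic.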


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 aligned against itself, with forward and reverse-complement matches plotted.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to detect tandem repeats. Tandem repeats are found in the dot plot matrix, so this targeting vector may be difficult to construct.
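A self-alignment of this kind can be sketched as an exact-match dot plot with a 10 bp window. The code below is a simplified illustration: it compares the forward strand only (the report also plots the reverse complement), uses exact matching, and runs on a made-up toy sequence rather than the real arms. The helper names and the `max_offset` parameter are assumptions for the example.

```python
from collections import defaultdict


def self_dot_plot(seq: str, window: int = 10):
    """Return sorted (i, j) pairs, i < j, whose window-mers are identical.

    Forward strand only; the main diagonal (i == j) is omitted.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    dots = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


def offset_counts(dots, max_offset: int = 50):
    """Count dots per diagonal offset; many dots at small offsets suggest a tandem repeat."""
    counts = defaultdict(int)
    for i, j in dots:
        if j - i <= max_offset:
            counts[j - i] += 1
    return dict(counts)


# Toy sequence: a (CA)n tandem repeat between two unique flanks.
toy = "GATTACAGGT" + "CA" * 10 + "TTGCCATGAC"
offsets = offset_counts(self_dot_plot(toy, window=10))
print(sorted(offsets.items()))
```

For the toy input every off-diagonal dot lies at an even offset, reflecting the 2 bp repeat unit; in a real dot plot such repeats show up as short lines parallel to the main diagonal.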

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full length (7107 bp) | A (26.21%, 1863) | C (20.88%, 1484) | T (28.10%, 1997) | G (24.81%, 1763)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
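The overall GC content follows directly from the base counts in the Summary line, and a windowed profile can be computed in the same way. The sketch below uses the reported counts; the `windowed_gc` helper and its `step` parameter are illustrative assumptions, since the report states only the 300 bp window size.

```python
# Base counts copied from the Summary line of this report.
BASE_COUNTS = {"A": 1863, "C": 1484, "T": 1997, "G": 1763}

total = sum(BASE_COUNTS.values())
gc_percent = 100 * (BASE_COUNTS["G"] + BASE_COUNTS["C"]) / total


def windowed_gc(seq: str, window: int = 300, step: int = 1):
    """GC percentage of each full-length window along the sequence."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append(100 * (chunk.count("G") + chunk.count("C")) / window)
    return profile


print(total, round(gc_percent, 2))
```

The counts sum to the stated full length of 7107 bp and give an overall GC content of about 45.69%.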


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
-----------------------------------------------------------------------------------------
YourSeq  3000   1       3000  3000   100.0%    chr17  +       14333298   14336297   3000
YourSeq  144    2177    2384  3000   94.0%     chr3   +       149745486  149745985  500
YourSeq  132    2178    2382  3000   85.1%     chrX   +       152390492  152390682  191
YourSeq  127    2178    2384  3000   81.5%     chr5   -       59803566   59803742   177
YourSeq  120    2190    2500  3000   85.7%     chr17  +       7574138    7574432    295
YourSeq  120    2178    2384  3000   81.1%     chr13  +       115247573  115247737  165
YourSeq  119    2167    2388  3000   89.8%     chr14  -       35902263   35902645   383
YourSeq  119    2198    2384  3000   82.3%     chr13  -       82858849   82859014   166
YourSeq  119    2177    2384  3000   89.4%     chr19  +       34498078   34498286   209
YourSeq  113    2177    2376  3000   81.9%     chr9   +       31594499   31594660   162
YourSeq  110    2189    2391  3000   91.0%     chr14  +       71583722   71583946   225
YourSeq  110    2184    2384  3000   85.2%     chr11  +       91201266   91201424   159
YourSeq  109    2214    2384  3000   90.3%     chr19  +       46129035   46129305   271
YourSeq  107    2178    2334  3000   85.3%     chr10  -       45007530   45007676   147
YourSeq  105    2179    2383  3000   89.2%     chr11  +       32779604   32779804   201
YourSeq  102    2167    2334  3000   83.5%     chr1   -       162384446  162384600  155
YourSeq  102    2178    2384  3000   78.6%     chr13  +       115247597  115247725  129
YourSeq  101    2178    2334  3000   82.7%     chr18  -       29947440   29947570   131
YourSeq  98     2196    2376  3000   80.5%     chr16  -       78074773   78074923   151
YourSeq  97     2200    2382  3000   81.9%     chr6   -       97711215   97711359   145

Note: The 3000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
-----------------------------------------------------------------------------------------
YourSeq  3000   1       3000  3000   100.0%    chr17  +       14336905   14339904   3000
YourSeq  112    2360    2672  3000   91.8%     chr9   +       113482567  113483303  737
YourSeq  111    2370    2669  3000   83.4%     chr1   -       34672870   34673163   294
YourSeq  108    2359    2681  3000   88.3%     chr10  +       67387121   67387633   513
YourSeq  105    2357    2672  3000   79.2%     chr11  -       8955268    8955554    287
YourSeq  97     2362    2649  3000   81.5%     chr10  +       6981484    6981755    272
YourSeq  94     2435    2673  3000   80.0%     chr17  +       69258310   69258489   180
YourSeq  90     2400    2668  3000   80.0%     chr15  -       101982413  101982632  220
YourSeq  90     2438    2685  3000   91.6%     chr10  -       59257132   59257389   258
YourSeq  90     2400    2681  3000   79.9%     chr8   +       121238655  121238879  225
YourSeq  88     2203    2656  3000   75.8%     chr12  -       19203156   19203387   232
YourSeq  85     2450    2681  3000   94.0%     chr1   -       130563666  130564091  426
YourSeq  85     2360    2656  3000   89.0%     chr11  +       98655715   98656014   300
YourSeq  80     2406    2690  3000   79.4%     chr11  -       111531567  111531791  225
YourSeq  80     2466    2690  3000   88.4%     chr4   +       153190046  153190301  256
YourSeq  79     2436    2690  3000   91.7%     chr8   -       120032191  120032562  372
YourSeq  78     2438    2669  3000   90.6%     chr8   +       121238657  121238891  235
YourSeq  77     2394    2690  3000   76.4%     chr11  -       38909971   38910155   185
YourSeq  77     2440    2672  3000   80.0%     chr1   -       192514567  192514741  175
YourSeq  77     2374    2670  3000   89.5%     chr11  +       6783461    6783806    346

Note: The 3000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
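One way to screen BLAT rows like those above is to parse each hit and separate the full-length self match from secondary hits. The sketch below is illustrative only: the report does not state what threshold defines "significant similarity", so the `min_score=200` cutoff is an assumed value, and the `BlatHit` type and function names are hypothetical. The three sample rows are copied from the upstream table.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(row: str) -> BlatHit:
    """Parse one whitespace-separated BLAT row; the last 10 fields carry the data."""
    f = row.split()[-10:]  # drops the query name and any leading link text
    return BlatHit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                   float(f[4].rstrip("%")), f[5], f[6],
                   int(f[7]), int(f[8]), int(f[9]))


def is_self_match(hit: BlatHit) -> bool:
    """Full-length, 100% identical hit, i.e. the query's own locus."""
    return hit.identity == 100.0 and hit.score == hit.q_size


def secondary_hits(hits, min_score: int = 200):
    """Non-self hits at or above an ASSUMED score cutoff (not from the report)."""
    return [h for h in hits if h.score >= min_score and not is_self_match(h)]


rows = [
    "YourSeq 3000 1 3000 3000 100.0% chr17 + 14333298 14336297 3000",
    "YourSeq 144 2177 2384 3000 94.0% chr3 + 149745486 149745985 500",
    "YourSeq 132 2178 2382 3000 85.1% chrX + 152390492 152390682 191",
]
hits = [parse_hit(r) for r in rows]
print(secondary_hits(hits), max(h.score for h in hits if not is_self_match(h)))
```

With this assumed cutoff no secondary hit is retained; the strongest non-self hit among the sample rows scores 144 against a query of 3000 bp.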


Gene and information:

Smoc2 SPARC related modular calcium binding 2 [Mus musculus (house mouse)]
Gene ID: 64074, updated on 14-Oct-2019

Gene summary

Official Symbol: Smoc2 (provided by MGI)
Official Full Name: SPARC related modular calcium binding 2 (provided by MGI)
Primary source: MGI:MGI:1929881
See related: Ensembl:ENSMUSG00000023886
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Smoc2l; 1700056C05Rik; 5430426J21Rik
Expression: Broad expression in ovary adult (RPKM 125.2), subcutaneous fat pad adult (RPKM 55.4) and 16 other tissues
Orthologs: human

Genomic context

Location: 17 A2; 17 8.95 cM

Exon count: 13

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  17   NC_000083.6 (14279490..14404790)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    17   NC_000083.5 (14416513..14541797)

[Figure: genomic context of Smoc2 on chromosome 17 (NC_000083.6).]


Transcript information: This gene has 7 transcripts

Gene: Smoc2 ENSMUSG00000023886

Description: SPARC related modular calcium binding 2 [Source:MGI Symbol;Acc:MGI:1929881]
Gene Synonyms: 1700056C05Rik, 5430426J21Rik, Smoc2l
Location: Chromosome 17: 14,279,506-14,404,790 forward strand. GRCm38:CM001010.2
About this gene: This gene has 7 transcripts (splice variants), 200 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 6 phenotypes.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Smoc2-201  ENSMUST00000024660.8  2832  447aa       ENSMUSP00000024660.8  Protein coding   CCDS37442  Q8CD91      TSL:1, GENCODE basic, APPRIS P1
Smoc2-204  ENSMUST00000233344.1  2817  458aa       ENSMUSP00000156572.1  Protein coding   -          A0A3B2WCK7  GENCODE basic
Smoc2-202  ENSMUST00000232923.1  588   89aa        ENSMUSP00000156632.1  Protein coding   -          A0A3B2WCQ8  GENCODE basic
Smoc2-206  ENSMUST00000233540.1  3972  No protein  -                     Retained intron  -          -           -
Smoc2-205  ENSMUST00000233526.1  572   No protein  -                     lncRNA           -          -           -
Smoc2-203  ENSMUST00000232936.1  521   No protein  -                     lncRNA           -          -           -
Smoc2-207  ENSMUST00000233686.1  337   No protein  -                     lncRNA           -          -           -


[Figure: Ensembl region view, 145.28 kb, chromosome 17 from about 14.30 Mb to 14.40 Mb. Forward-strand tracks: Gm34567-201 (lncRNA), Gm49720-201 (processed pseudogene), Smoc2-201 (protein coding), Smoc2-204 (protein coding), Smoc2-206 (retained intron), Smoc2-205 (lncRNA), Smoc2-202 (protein coding), Smoc2-203 (lncRNA), Smoc2-207 (lncRNA). Contigs: AC171111.2, AC169675.2, CT010437.15. Reverse-strand gene: 4930474M22Rik-201 (lncRNA). Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (processed transcript; RNA gene; pseudogene).]


Transcript: ENSMUST00000024660

[Figure: Ensembl protein summary for Smoc2-201 (protein coding; 125.28 kb, forward strand), translation ENSMUSP00000024..., scale 0 to 447 aa. Annotation tracks: MobiDB lite; low complexity (Seg); cleavage site (Sign...); Superfamily (Thyroglobulin type-1 superfamily, EF-hand domain pair); SMART (Kazal domain, Thyroglobulin type-1); Pfam (Kazal domain, Thyroglobulin type-1, SPARC/Testican calcium-binding domain, PF16597); PROSITE profiles (Kazal domain, Thyroglobulin type-1, EF-hand domain); PROSITE patterns (Thyroglobulin type-1, EF-Hand 1 calcium-binding site); PANTHER (PTHR12352:SF21, PTHR12352); Gene3D (3.30.60.30, Thyroglobulin type-1 superfamily, 1.10.238.10); CDD (cd00104, Thyroglobulin type-1, SMOC-2 extracellular calcium-binding domain); All sequence SNPs/i... (sequence variants, dbSNP and all other sources). Variant legend: start lost, inframe deletion, missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
