https://www.alphaknockout.com

Mouse Sorbs3 Knockout Project (CRISPR/Cas9)

Objective: To create a Sorbs3 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Sorbs3 gene (NCBI Reference Sequence: NM_001271407; Ensembl: ENSMUSG00000022091) is located on mouse chromosome 14. 21 exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 21 (Transcript: ENSMUST00000227653). Exons 2~6 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Homozygous mutants are generally normal, viable, and fertile, except that they show delayed wound healing in response to full-thickness skin injury in vivo.

Exon 2 contains the start of the coding region. Exons 2~6 cover 24.75% of the coding region. The size of the effective KO region is ~7494 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

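[Figure: wildtype allele drawn 5' to 3', showing exons 1 to 6 and exon 21 of mouse Sorbs3, with the two gRNA regions flanking the knockout region (exons 2~6). Legend: exon of mouse Sorbs3; knockout region.]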


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: self-alignment dot plot of the upstream section. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of exon 2 is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
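The self-alignment described in this note can be sketched as a word-match dot plot. The following is only an illustrative sketch, not the tool used to produce the report: the function names are hypothetical, the 15 bp window is taken from the figure heading, and the toy sequence in the usage lines is invented because the report does not list the actual 2000 bp sequence.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def self_dot_plot(seq, window=15):
    """Compare a sequence with itself using exact word matches.

    Returns two sets of (i, j) start positions:
    forward  - the word at i also occurs at j (main diagonal i == j excluded)
    reverse  - the reverse complement of the word at i occurs at j
    """
    seq = seq.upper()
    n_words = len(seq) - window + 1
    # Index every word of length `window` by its start positions.
    index = {}
    for i in range(n_words):
        index.setdefault(seq[i:i + window], []).append(i)

    forward, reverse = set(), set()
    for i in range(n_words):
        word = seq[i:i + window]
        for j in index.get(word, []):
            if i != j:
                forward.add((i, j))
        for j in index.get(revcomp(word), []):
            reverse.add((i, j))
    return forward, reverse


# Toy example (hypothetical sequence): a 7 bp unit repeated three times
# produces off-diagonal forward matches, the signature of a tandem repeat.
unit = "ACGGTCA"
fwd, rev = self_dot_plot(unit * 3, window=7)
print(len(fwd) > 0, len(rev) == 0)
```

Off-diagonal forward matches that line up parallel to the main diagonal indicate tandem repeats; an empty or sparse result is consistent with the note's conclusion that the region is suitable for primer design.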

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: self-alignment dot plot of the downstream section. Legend: Forward; Reverse Complement.]

Note: The 416 bp section downstream of exon 6 is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot of the upstream section.]

Summary: Full Length(2000bp) | A(19.95% 399) | C(31.85% 637) | T(21.7% 434) | G(26.5% 530)
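As a quick consistency check, the percentages in the summary line can be recomputed from the base counts it reports. The sketch below uses only those counts; the helper name `composition` is hypothetical, and the overall GC fraction it prints is derived here rather than stated in the report.

```python
def composition(counts):
    """Return (total length, per-base percentages, overall GC percentage)."""
    total = sum(counts.values())
    pct = {base: round(100 * n / total, 2) for base, n in counts.items()}
    gc = round(100 * (counts["G"] + counts["C"]) / total, 2)
    return total, pct, gc


# Base counts copied from the upstream summary line.
upstream = {"A": 399, "C": 637, "T": 434, "G": 530}
total, pct, gc = composition(upstream)
print(total, pct, gc)
```

The counts sum to the stated 2000 bp and reproduce the listed percentages; the combined G+C share of roughly 58% fits the report's observation that this section contains high GC-content regions.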

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. Significant high GC-content regions are found. The gRNA site is selected outside of these high GC-content regions.
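A sliding-window GC scan of the kind described in this note might look like the sketch below. It is an assumption-laden illustration: the 300 bp window comes from the figure heading, but the 0.65 cutoff for "high GC", the function names, and the toy sequence are hypothetical, since the report does not state its threshold or list the sequence.

```python
def gc_windows(seq, window=300, step=1):
    """Return (start, GC fraction) for each window along the sequence."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        result.append((start, gc))
    return result


def high_gc_starts(seq, window=300, step=1, threshold=0.65):
    """Start positions of windows whose GC fraction reaches the threshold.

    The threshold is a hypothetical cutoff, not a value from the report.
    """
    return [s for s, gc in gc_windows(seq, window, step) if gc >= threshold]


# Toy example (hypothetical sequence): an AT-rich half followed by a GC-rich half.
toy = "AT" * 5 + "GC" * 5
print(gc_windows(toy, window=10, step=10))
print(high_gc_starts(toy, window=10, step=10))
```

Candidate gRNA or primer positions could then be checked against the flagged windows so that they fall outside high GC-content stretches.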

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot of the downstream section.]

Summary: Full Length(416bp) | A(20.43% 85) | C(23.56% 98) | T(30.29% 126) | G(25.72% 107)

Note: The 416 bp section downstream of exon 6 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr14  -       70203362   70205361   2000
YourSeq  23     1700   1723  2000   100.0%    chr3   +       62679366   62679391   26
YourSeq  21     1917   1937  2000   100.0%    chr6   -       124853176  124853196  21
YourSeq  21     738    758   2000   100.0%    chr2   -       13734481   13734501   21
YourSeq  21     1398   1418  2000   100.0%    chr14  -       15011786   15011806   21
YourSeq  21     1917   1937  2000   100.0%    chr8   +       11687986   11688006   21

Note: The 2000 bp section upstream of exon 2 is searched against the genome with BLAT. Apart from the query locus itself, no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
--------------------------------------------------------------------------------------
YourSeq  416    1      416  416    100.0%    chr14  -       70195507   70195922   416
YourSeq  33     207    241  416    100.0%    chr5   -       133741843  133741888  46
YourSeq  28     190    235  416    71.9%     chr12  -       23494780   23494814   35
YourSeq  28     297    328  416    96.7%     chr11  +       115550050  115550268  219
YourSeq  27     195    231  416    86.5%     chr11  -       5338385    5338421    37
YourSeq  26     203    232  416    93.4%     chr2   +       29583014   29583043   30
YourSeq  25     202    230  416    93.2%     chr18  -       20578477   20578505   29
YourSeq  24     329    359  416    88.5%     chr11  -       80673378   80673407   30
YourSeq  24     212    235  416    100.0%    chr10  +       126185109  126185132  24
YourSeq  23     211    235  416    96.0%     chr16  -       7727913    7727937    25
YourSeq  23     191    218  416    84.0%     chrX   +       81490181   81490206   26
YourSeq  23     197    233  416    81.1%     chr11  +       119129801  119129837  37
YourSeq  22     212    234  416    100.0%    chr16  -       97829855   97829878   24
YourSeq  22     207    229  416    100.0%    chr15  -       48370614   48370637   24
YourSeq  22     212    234  416    100.0%    chr4   +       100796048  100796071  24
YourSeq  22     212    235  416    95.9%     chr12  +       19332719   19332742   24
YourSeq  22     212    235  416    95.9%     chr12  +       18563209   18563232   24
YourSeq  21     212    232  416    100.0%    chr19  -       5565894    5565914    21
YourSeq  21     212    232  416    100.0%    chr18  -       60762211   60762231   21
YourSeq  21     215    235  416    100.0%    chr18  -       34650795   34650815   21

Note: The 416 bp section downstream of exon 6 is searched against the genome with BLAT. Apart from the query locus itself, no significant similarity is found.
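One way to read the BLAT results is to set aside the full-length hit at the query locus and ask whether any remaining hit covers a meaningful share of the query. The sketch below parses rows in the column order shown in the BLAT tables; the class and function names are hypothetical, and the 50% score cutoff is an assumed illustration, not a criterion stated in the report. The three sample rows are copied from the downstream table.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one BLAT result row; leading 'browser details' link text is ignored."""
    fields = [f for f in line.split() if f not in ("browser", "details")]
    _, score, qs, qe, qsize, ident, chrom, strand, ts, te, span = fields
    return BlatHit(int(score), int(qs), int(qe), int(qsize),
                   float(ident.rstrip("%")), chrom, strand,
                   int(ts), int(te), int(span))


def secondary_hits(hits, min_fraction=0.5):
    """Hits other than the full-length self match whose score is at least
    min_fraction of the query size (min_fraction is a hypothetical cutoff)."""
    return [h for h in hits
            if h.score < h.q_size and h.score / h.q_size >= min_fraction]


rows = [
    "YourSeq 416 1 416 416 100.0% chr14 - 70195507 70195922 416",
    "YourSeq 33 207 241 416 100.0% chr5 - 133741843 133741888 46",
    "YourSeq 28 190 235 416 71.9% chr12 - 23494780 23494814 35",
]
hits = [parse_hit(r) for r in rows]
print(secondary_hits(hits))
```

For these rows the best secondary score is 33 out of a 416 bp query, well under the assumed cutoff, which is in line with the note's conclusion.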


Gene information: Sorbs3, sorbin and SH3 domain containing 3 [Mus musculus (house mouse)]

Gene ID: 20410, updated on 24-Oct-2019

Gene summary

Official Symbol: Sorbs3 (provided by MGI)
Official Full Name: sorbin and SH3 domain containing 3 (provided by MGI)
Primary source: MGI:MGI:700013
See related: Ensembl:ENSMUSG00000022091
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: SH3P3; Sh3d4; SCAM-1; vinexin-g
Expression: Broad expression in lung adult (RPKM 40.0), adrenal adult (RPKM 39.0) and 23 other tissues
Orthologs: human

Genomic context

Location: 14 D2; 14 36.27 cM
Exon count: 28

Annotation release  Status             Assembly                       Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)   14   NC_000080.6 (70180468..70212023, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)     14   NC_000080.5 (70580275..70607444, complement)

[Figure: genomic context of Sorbs3, Chromosome 14 - NC_000080.6]


Transcript information: This gene has 4 transcripts

Gene: Sorbs3 ENSMUSG00000022091

Description: sorbin and SH3 domain containing 3 [Source:MGI Symbol;Acc:MGI:700013]
Gene Synonyms: SH3P3, Sh3d4, vinexin alpha, vinexin beta
Location: Chromosome 14: 70,180,468-70,211,989 reverse strand. GRCm38:CM001007.2
About this gene: This gene has 4 transcripts (splice variants), 192 orthologues, 12 paralogues, is a member of 1 Ensembl protein family and is associated with 2 phenotypes.

Transcripts

Name        Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt     Flags
Sorbs3-201  ENSMUST00000022682.5  2914  733aa    ENSMUSP00000022682.5  Protein coding  CCDS27249  Q9R1Z8      TSL:1 GENCODE basic APPRIS P2
Sorbs3-203  ENSMUST00000227653.1  3061  680aa    ENSMUSP00000154195.1  Protein coding  -          Q8K0M3      GENCODE basic APPRIS ALT2
Sorbs3-202  ENSMUST00000227259.1  2481  680aa    ENSMUSP00000153715.1  Protein coding  -          Q8K0M3      GENCODE basic APPRIS ALT2
Sorbs3-204  ENSMUST00000227929.1  2460  715aa    ENSMUSP00000154773.1  Protein coding  -          A0A2I3BS41  GENCODE basic


[Figure: Ensembl genomic region view, 51.52 kb, chromosome 14 from 70.18 Mb to 70.22 Mb.
Forward strand: Gm49417-201 and Gm49417-202 (lncRNA). Contig: AC151836.3.
Reverse strand: Pdlim2-201, -209, -210 (protein coding) and Pdlim2-202, -206, -207, -208 (retained intron); Sorbs3-201, -202, -203, -204 (protein coding); Ppp3cc-201, -203 (protein coding).
Regulation legend: CTCF; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site.
Gene legend: protein coding (merged Ensembl/Havana; Ensembl protein coding); non-protein coding (processed transcript; RNA gene).]


Transcript: ENSMUST00000227653

[Figure: Ensembl protein view of Sorbs3-203 (protein coding), reverse strand, 25.55 kb.
Annotation tracks: MobiDB lite; Low complexity (Seg); Superfamily (SH3-like domain superfamily); SMART (SoHo domain, SH3 domain); Prints (PR00499 SH3 domain); Pfam (SoHo domain, SH3 domain); PROSITE profiles (SoHo domain, SH3 domain); PANTHER (PTHR14167, Vinexin); Gene3D (2.30.30.40); CDD (Vinexin, SH3 domains 1, 2 and 3).
Sequence variants (dbSNP and all other sources); variant legend: inframe deletion; missense variant; splice region variant; synonymous variant.
Scale bar: 0 to 680.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
