https://www.alphaknockout.com

Mouse Sobp Knockout Project (CRISPR/Cas9)

Objective: To create a Sobp knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Sobp gene (NCBI Reference Sequence: NM_175407; Ensembl: ENSMUSG00000038248) is located on mouse chromosome 10. Seven exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 6 (Transcript: ENSMUST00000040275). Exons 2~3 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Homozygous mutant mice exhibit open-field hyperactivity and circling behavior from weaning. Their hearing thresholds are elevated at all frequencies; the hearing impairment does not progress over time.

Exon 2 starts at about 3.74% of the coding region. Exons 2~3 cover 12.54% of the coding region. The size of the effective KO region is ~2937 bp. The KO region does not contain any other known gene.
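The percentages above can be turned into approximate base counts. The sketch below is a back-of-envelope check, not part of the original design: it assumes the coding region is 864 codons (the 864 aa protein listed for ENSMUST00000040275) with the stop codon excluded, and that the percentages are rounded from whole base counts. The helper name `pct_to_bp` is hypothetical.

```python
# Assumption: coding region = 864 codons, stop codon excluded.
CDS_BP = 864 * 3  # 2592 bp


def pct_to_bp(pct, total=CDS_BP):
    """Convert a percentage of the coding region to the nearest whole bp."""
    return round(pct / 100 * total)


exon2_start_bp = pct_to_bp(3.74)   # coding bp that precede exon 2
deleted_cds_bp = pct_to_bp(12.54)  # coding bp contained in exons 2~3

# A deletion whose coding length is not a multiple of 3 shifts the reading frame.
causes_frameshift = deleted_cds_bp % 3 != 0

print(exon2_start_bp, deleted_cds_bp, causes_frameshift)  # 97 325 True
```

Under these assumptions, removing exons 2~3 would delete about 325 coding bases, which is not a multiple of three and would therefore be expected to shift the reading frame downstream of exon 1.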


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1, 2, 3 and 7 of mouse Sobp, two gRNA regions, and the knockout region. Legend: exon of mouse Sobp; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of Sequence 12 against itself, with forward and reverse-complement matches.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
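A self-alignment dot plot of this kind can be sketched as follows. This is a minimal illustration that assumes exact matching of 15 bp words; the tool actually used for the report is not stated, and the function names are hypothetical. Tandem repeats show up as forward matches lying on a line parallel to the main diagonal.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def self_dot_matches(seq, window=15):
    """Return (forward, reverse) lists of (i, j) start positions.

    forward: seq[i:i+window] == seq[j:j+window] with i < j
    reverse: seq[i:i+window] == reverse complement of seq[j:j+window]
    """
    seq = seq.upper()
    n = len(seq)

    # Index every word of length `window` by its start positions.
    index = {}
    for i in range(n - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)

    forward = [(i, j) for starts in index.values()
               for i in starts for j in starts if i < j]

    # A word at offset j in the reverse complement corresponds to the
    # forward-strand segment starting at n - window - j.
    rc = revcomp(seq)
    reverse = []
    for j in range(n - window + 1):
        for i in index.get(rc[j:j + window], []):
            reverse.append((i, n - window - j))
    return forward, reverse


# Toy input: a 20 bp unit repeated twice produces an off-diagonal line.
unit = "ACGTTGCAAGTCCGATAGGC"
fwd, rev = self_dot_matches(unit * 2, window=15)
print(len(fwd), fwd[0])
```

For a 2000 bp flank with no tandem repeats, the `forward` list would be expected to be empty or to contain only isolated points.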

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of Sequence 12 against itself, with forward and reverse-complement matches.]

Note: The 2000 bp section downstream of Exon 3 is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot for Sequence 12.]

Summary: Full length (2000 bp) | A: 28.4% (568) | C: 18.2% (364) | T: 32.1% (642) | G: 21.3% (426)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot for Sequence 12.]

Summary: Full length (2000 bp) | A: 27.5% (550) | C: 20.15% (403) | T: 30.15% (603) | G: 22.2% (444)

Note: The 2000 bp section downstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
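The overall GC content of each flank follows directly from the base counts in the two summaries. The sketch below computes it and also shows one way a 300 bp sliding-window profile could be calculated; the function names are hypothetical, a step of 1 bp is assumed, and the actual plotting tool is not stated in the report.

```python
def gc_percent(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


# Base counts copied from the two Summary lines.
upstream = {"A": 568, "C": 364, "T": 642, "G": 426}
downstream = {"A": 550, "C": 403, "T": 603, "G": 444}

print(gc_percent(upstream))    # 39.5
print(gc_percent(downstream))  # 42.35


def windowed_gc(seq, window=300):
    """GC percentage of every window of length `window`, sliding by 1 bp."""
    seq = seq.upper()
    return [100 * sum(base in "GC" for base in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]
```

Both flanks come out at roughly 40% GC overall, which is consistent with the notes that no high GC-content region was found.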


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1       2000  2000   100.0%    chr10  -       43160845   43162844   2000
YourSeq  29     1494    1522  2000   100.0%    chr1   +       29077267   29077295   29
YourSeq  26     788     816   2000   85.2%     chr11  -       105623821  105623847  27
YourSeq  25     790     814   2000   100.0%    chr4   -       88224567   88224591   25
YourSeq  24     788     811   2000   100.0%    chr5   -       3594441    3594464    24
YourSeq  24     788     811   2000   100.0%    chr19  -       22065084   22065107   24
YourSeq  23     788     810   2000   100.0%    chr6   -       128468976  128468998  23
YourSeq  23     788     810   2000   100.0%    chrX   +       81159805   81159827   23
YourSeq  23     731     753   2000   100.0%    chr3   +       124059735  124059757  23
YourSeq  23     788     810   2000   100.0%    chr13  +       37498769   37498791   23
YourSeq  23     788     810   2000   100.0%    chr12  +       96958640   96958662   23
YourSeq  22     586     607   2000   100.0%    chr2   -       29101105   29101126   22
YourSeq  22     788     809   2000   100.0%    chr18  -       78677201   78677222   22
YourSeq  22     1653    1674  2000   100.0%    chr14  -       107702661  107702682  22
YourSeq  22     788     809   2000   100.0%    chr11  -       121855731  121855752  22
YourSeq  22     788     809   2000   100.0%    chr11  -       3692376    3692397    22
YourSeq  22     788     809   2000   100.0%    chr6   +       72674520   72674541   22
YourSeq  22     788     809   2000   100.0%    chr17  +       90728250   90728271   22
YourSeq  22     788     809   2000   100.0%    chr1   +       3933706    3933727    22
YourSeq  21     1654    1674  2000   100.0%    chr8   -       41019261   41019281   21

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
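One way to read a BLAT result like the upstream table is to set aside the full-length self hit and look at the best remaining hit. The sketch below does this for the first four upstream rows, copied from the table; the `Hit` tuple and its field names are hypothetical, and no significance threshold is implied.

```python
from collections import namedtuple

Hit = namedtuple(
    "Hit", "score qstart qend qsize identity chrom strand tstart tend span")

# First four rows of the upstream BLAT table, without the QUERY column.
ROWS = """\
2000 1 2000 2000 100.0% chr10 - 43160845 43162844 2000
29 1494 1522 2000 100.0% chr1 + 29077267 29077295 29
26 788 816 2000 85.2% chr11 - 105623821 105623847 27
25 790 814 2000 100.0% chr4 - 88224567 88224591 25
"""


def parse_hit(line):
    """Parse one whitespace-separated BLAT row into a Hit."""
    f = line.split()
    return Hit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
               float(f[4].rstrip("%")), f[5], f[6],
               int(f[7]), int(f[8]), int(f[9]))


hits = [parse_hit(line) for line in ROWS.splitlines() if line.strip()]

# The highest-scoring row is the query matching its own locus.
self_hit = max(hits, key=lambda h: h.score)
secondary = [h for h in hits if h is not self_hit]
best = max(secondary, key=lambda h: h.score)

print(best.chrom, best.score, f"{100 * best.score / best.qsize:.2f}% of query")
# chr1 29 1.45% of query
```

The best secondary hit for the upstream flank covers only 29 of the 2000 query bases.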

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM                 STRAND  TSTART     TEND       SPAN
------------------------------------------------------------------------------------------------------
YourSeq  2000   1       2000  2000   100.0%    chr10                 -       43155908   43157907   2000
YourSeq  201    1149    1800  2000   80.9%     chr3                  -       117101502  117101901  400
YourSeq  192    1401    1812  2000   80.1%     chr17                 -       12646589   12646966   378
YourSeq  188    1471    1791  2000   81.5%     chr15                 -       41771579   41771870   292
YourSeq  183    1400    1812  2000   82.2%     chr5                  +       90508896   90509289   394
YourSeq  180    1400    1813  2000   79.2%     chr1                  -       156542286  156542681  396
YourSeq  179    1403    1812  2000   79.3%     chr15                 +       60937512   60937803   292
YourSeq  177    1400    1812  2000   82.3%     chr5                  -       148029837  148030202  366
YourSeq  177    1401    1809  2000   78.2%     chr4                  +       131669406  131669780  375
YourSeq  176    1479    1804  2000   85.4%     chr8                  +       104413669  104634325  220657
YourSeq  169    1398    1809  2000   82.4%     chr3                  -       21663903   21664280   378
YourSeq  167    1405    1812  2000   81.6%     chr8                  -       122135251  122135617  367
YourSeq  166    1398    1811  2000   80.9%     chr19                 -       38788706   38789088   383
YourSeq  165    1492    1804  2000   80.6%     chr1                  +       83170040   83170326   287
YourSeq  164    1416    1808  2000   76.6%     chr10                 -       42354786   42355146   361
YourSeq  163    1491    1812  2000   82.2%     chr3                  -       154395399  154395692  294
YourSeq  163    1492    1795  2000   84.3%     chr10                 -       93005225   93005505   281
YourSeq  163    1412    1812  2000   80.6%     chr4_JH584294_random  +       54228      54598      371
YourSeq  163    1423    1783  2000   82.2%     chr3                  +       157596621  157596958  338
YourSeq  160    1399    1713  2000   80.6%     chr4                  -       52942986   52943294   309

Note: The 2000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
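The self hits in the two BLAT tables give the genomic positions of both 2000 bp flanks on chr10, so the interval between them can be computed. The sketch below assumes that the flanks directly abut the region to be deleted and that coordinates are 1-based and inclusive; the function name is hypothetical.

```python
# chr10 self hits (minus strand) from the two BLAT tables: (start, end).
upstream_flank = (43160845, 43162844)    # 2000 bp upstream of Exon 2
downstream_flank = (43155908, 43157907)  # 2000 bp downstream of Exon 3


def span_between(flank_a, flank_b):
    """Number of bases strictly between two non-overlapping intervals."""
    low, high = sorted([flank_a, flank_b])
    return high[0] - low[1] - 1


ko_bp = span_between(upstream_flank, downstream_flank)
print(ko_bp)  # 2937
```

Under these assumptions the interval is 2937 bp, which agrees with the effective KO region size of ~2937 bp given in the strategy summary.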


Gene and information:

Sobp sine oculis binding protein [Mus musculus (house mouse)]
Gene ID: 109205, updated on 12-Aug-2019

Gene summary

Official Symbol: Sobp (provided by MGI)
Official Full Name: sine oculis binding protein (provided by MGI)
Primary source: MGI:MGI:1924427
See related: Ensembl:ENSMUSG00000038248
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: jc; JXC1; 2900009C16Rik; 5330439J01Rik
Expression: Broad expression in frontal lobe adult (RPKM 6.6), CNS E18 (RPKM 6.5) and 18 other tissues
Orthologs: human; all

Genomic context

Location: 10 B2; 10 22.89 cM
Exon count: 7

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  10   NC_000076.6 (43002488..43183139, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    10   NC_000076.5 (42722306..42894336, complement)

Chromosome 10 - NC_000076.6


Transcript information: This gene has 2 transcripts

Gene: Sobp ENSMUSG00000038248

Description: sine oculis binding protein [Source:MGI Symbol;Acc:MGI:1924427]
Gene Synonyms: 2900009C16Rik, 5330439J01Rik, Jxc1, jc
Location: Chromosome 10: 43,002,500-43,174,530 reverse strand. GRCm38:CM001003.2
About this gene: This gene has 2 transcripts (splice variants), 196 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 10 phenotypes.

Transcripts

Name      Transcript ID         bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Sobp-201  ENSMUST00000040275.8  4980  864aa       ENSMUSP00000040072.7  Protein coding  CCDS35890  Q0P5V2   TSL:1, GENCODE basic, APPRIS P1
Sobp-202  ENSMUST00000189987.1  1095  No protein  -                     lncRNA          -          -        TSL:1

[Figure: Ensembl genome browser view of a 192.03 kb region (43.00 Mb to 43.15 Mb). Forward strand: Gm47815-201 and Gm47815-202 (lncRNA), 9030612E09Rik-201 and 9030612E09Rik-202 (lncRNA), Gm29245-201 (TEC), Gm29246-201 (lncRNA). Contig: AC112265.5. Reverse strand: Sobp-201 (protein coding), Sobp-202 (lncRNA). Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000040275

[Figure: protein view of Sobp-201 (protein coding; reverse strand, 172.03 kb), protein ENSMUSP00000040... Tracks: MobiDB lite; low complexity (Seg); Pfam; PANTHER (PTHR23186:SF2); domain label: Retinoic acid-induced protein 2/sine oculis-binding protein homologue; sequence variants (dbSNP and all other sources). Variant legend: missense variant, synonymous variant. Scale bar: 0 to 864.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
