https://www.alphaknockout.com

Mouse Sema6c Knockout Project (CRISPR/Cas9)

Objective: To create a Sema6c knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Sema6c gene (NCBI Reference Sequence: NM_001272024; Ensembl: ENSMUSG00000038777) is located on mouse chromosome 3. 21 exons have been identified, with the ATG start codon in exon 4 and the TGA stop codon in exon 21 (Transcript: ENSMUST00000090823). Exons 4~21 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a targeted mutation display a decrease in pain threshold.

Exon 4 starts from about 0.03% of the coding region. Exons 4~21 cover 100.0% of the coding region. The size of the effective KO region is ~8893 bp. The KO region does not contain any other known gene.

Overview of the Targeting Strategy

[Figure: wildtype allele, 5' to 3', showing the exons of mouse Sema6c (exon labels shown: 1, 3-9, 12-21) with a gRNA region on each side of the knockout region. Legend: Exon of mouse Sema6c; Knockout region]

Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: self dot plot of the upstream sequence; axis label: Sequence 12; legend: Forward, Reverse Complement]

Note: The 2000 bp section upstream of the start codon is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
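The self-alignment described in the note can be sketched as follows. This is a minimal illustration, not the tool used to produce the report: it assumes exact matching of 15 bp windows, and the function names (`window_matches`, `self_dot_plot`) are hypothetical. Off-diagonal forward matches indicate direct or tandem repeats; matches against the reverse complement indicate inverted repeats.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def window_matches(seq, other, window=15):
    """Return (i, j) pairs where seq[i:i+window] == other[j:j+window]."""
    index = defaultdict(list)
    for j in range(len(other) - window + 1):
        index[other[j:j + window]].append(j)
    hits = []
    for i in range(len(seq) - window + 1):
        for j in index.get(seq[i:i + window], ()):
            hits.append((i, j))
    return hits


def self_dot_plot(seq, window=15):
    """Compare a sequence with itself (forward) and with its reverse complement.

    Forward hits exclude the trivial main diagonal (i == j).
    Reverse hits use j as a position in the reverse-complemented sequence.
    """
    seq = seq.upper()
    forward = [(i, j) for i, j in window_matches(seq, seq, window) if i != j]
    reverse = window_matches(seq, reverse_complement(seq), window)
    return forward, reverse


if __name__ == "__main__":
    # Toy input (not the Sema6c flank): a 20 bp unit repeated in tandem.
    unit = "ACGGTCATTGCAGGCTAACG"
    forward, reverse = self_dot_plot(unit * 2)
    print(len(forward), "off-diagonal forward matches")
```

An empty `forward` list for a real 2000 bp flank would correspond to the "no significant tandem repeat" conclusion, under the exact-match assumption above.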

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: self dot plot of the downstream sequence; axis label: Sequence 12; legend: Forward, Reverse Complement]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the upstream sequence; axis label: Sequence 12]

Summary: Full Length(2000bp) | A(21.25% 425) | C(26.25% 525) | T(23.7% 474) | G(28.8% 576)

Note: The 2000 bp section upstream of the start codon is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
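The composition summary and the 300 bp sliding-window profile can be reproduced along these lines. This is a sketch only: the report does not state how its profile was computed or what cutoff counts as a "significant high GC-content region", so the 0.65 threshold and the function names are assumptions for illustration.

```python
from collections import Counter


def base_composition(seq):
    """Return {base: (count, percent)} for A, C, T and G."""
    seq = seq.upper()
    counts = Counter(seq)
    total = len(seq)
    return {b: (counts[b], round(100 * counts[b] / total, 2)) for b in "ACTG"}


def gc_windows(seq, window=300):
    """Return [(start, gc_fraction)] for every window of the given size."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_gc = [1 if b in "GC" else 0 for b in seq]
    running = sum(is_gc[:window])
    profile = [(0, running / window)]
    for start in range(1, len(seq) - window + 1):
        # Slide by one base: add the entering base, drop the leaving one.
        running += is_gc[start + window - 1] - is_gc[start - 1]
        profile.append((start, running / window))
    return profile


def high_gc_starts(profile, threshold=0.65):
    """Window starts whose GC fraction meets an assumed (hypothetical) cutoff."""
    return [start for start, gc in profile if gc >= threshold]


if __name__ == "__main__":
    # Placeholder sequence with the reported upstream base counts; the real
    # base order is not given in the report, so the profile is illustrative.
    seq = "A" * 425 + "C" * 525 + "T" * 474 + "G" * 576
    print(base_composition(seq))
```

With the reported upstream counts, C plus G comes to 1101 of 2000 bases, an overall GC content of 55.05%.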

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the downstream sequence; axis label: Sequence 12]

Summary: Full Length(2000bp) | A(20.4% 408) | C(27.05% 541) | T(25.55% 511) | G(27.0% 540)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine the GC content. Significant high GC-content regions are found. The gRNA site is selected outside of these high GC-content regions.

BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr3   +       95162394   95164393   2000
YourSeq  38     1322   1410  2000   67.5%     chr1   -       18045787   18045834   48
YourSeq  26     1401   1426  2000   100.0%    chr13  +       35827375   35827400   26
YourSeq  25     1661   1688  2000   96.5%     chr1   -       55855340   55855378   39
YourSeq  22     1023   1044  2000   100.0%    chr2   -       141500190  141500211  22
YourSeq  22     1575   1596  2000   100.0%    chr16  -       28821061   28821082   22
YourSeq  22     1434   1455  2000   100.0%    chr5   +       14751100   14751121   22
YourSeq  21     423    443   2000   100.0%    chr4   -       49109031   49109051   21

Note: The 2000 bp section upstream of the start codon is BLAT searched against the genome. No significant similarity is found.
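The hit table can be screened programmatically, as in the sketch below. The report does not define "significant similarity"; the 10% score-to-query-size cutoff, the `BlatHit` record and the helper names are assumptions made for illustration. The parser accepts rows with or without the "browser details" link text that the BLAT results page places in front of each row.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    query: str
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one whitespace-separated BLAT result row into a BlatHit."""
    fields = line.split()
    if fields[:2] == ["browser", "details"]:
        fields = fields[2:]
    query, score, qs, qe, qsize, ident, chrom, strand, ts, te, span = fields
    return BlatHit(query, int(score), int(qs), int(qe), int(qsize),
                   float(ident.rstrip("%")), chrom, strand,
                   int(ts), int(te), int(span))


def secondary_hits(hits, min_fraction=0.10):
    """Hits other than the full-length self hit that reach an assumed cutoff."""
    return [h for h in hits
            if h.score < h.q_size and h.score / h.q_size >= min_fraction]


# The first three rows of the upstream BLAT table, copied from the report.
ROWS = [
    "YourSeq 2000 1 2000 2000 100.0% chr3 + 95162394 95164393 2000",
    "YourSeq 38 1322 1410 2000 67.5% chr1 - 18045787 18045834 48",
    "YourSeq 26 1401 1426 2000 100.0% chr13 + 35827375 35827400 26",
]

if __name__ == "__main__":
    hits = [parse_hit(row) for row in ROWS]
    print(secondary_hits(hits))
```

The best secondary hit in the upstream table scores 38 against a query size of 2000, far below any plausible cutoff, which is consistent with the note's conclusion.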

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr3   +       95173287   95175286   2000
YourSeq  29     1839   1868  2000   100.0%    chr6   +       47806151   47806181   31
YourSeq  24     1588   1615  2000   80.0%     chr1   -       134946039  134946063  25

Note: The 2000 bp section downstream of the stop codon is BLAT searched against the genome. No significant similarity is found.
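The two full-length chr3 hits also allow a quick consistency check on the stated KO region size. The sketch below assumes that the upstream flank ends immediately before the start codon, that the downstream flank begins immediately after the stop codon (as the notes describe), and that the coordinates are 1-based and inclusive (each flank spans end - start + 1 = 2000 bp). The helper name `region_between` is hypothetical.

```python
# Full-length self hits from the two BLAT tables: (chrom, start, end).
UPSTREAM_FLANK = ("chr3", 95162394, 95164393)
DOWNSTREAM_FLANK = ("chr3", 95173287, 95175286)


def region_between(upstream, downstream):
    """Return (chrom, start, end, length) of the interval between two flanks."""
    chrom_u, _, end_u = upstream
    chrom_d, start_d, _ = downstream
    if chrom_u != chrom_d:
        raise ValueError("flanks are on different chromosomes")
    start, end = end_u + 1, start_d - 1
    if end < start:
        raise ValueError("flanks overlap or are in the wrong order")
    return chrom_u, start, end, end - start + 1


if __name__ == "__main__":
    chrom, start, end, length = region_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK)
    print(f"{chrom}:{start}-{end} ({length} bp)")
    # prints "chr3:95164394-95173286 (8893 bp)"
```

Under these assumptions, the interval between the flanks is 8893 bp, which matches the effective KO region size given in the strategy summary.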

Gene and information:

Sema6c: sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6C [Mus musculus (house mouse)]
Gene ID: 20360, updated on 10-Oct-2019

Gene summary

Official Symbol: Sema6c (provided by MGI)
Official Full Name: sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6C (provided by MGI)
Primary source: MGI:MGI:1338032
See related: Ensembl:ENSMUSG00000038777
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Semay; mKIAA1869
Expression: Broad expression in whole brain E14.5 (RPKM 28.2), CNS E14 (RPKM 23.1) and 18 other tissues
Orthologs: human; all

Genomic context

Location: 3; 3 F2.1
Exon count: 22

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 3 | NC_000069.6 (95160420..95174050)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 3 | NC_000069.5 (94968296..94977238)

Chromosome 3 - NC_000069.6

Transcript information: This gene has 15 transcripts

Gene: Sema6c ENSMUSG00000038777

Description: sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6C [Source:MGI Symbol;Acc:MGI:1338032]
Gene Synonyms: Sema Y, Semay
Location: Chromosome 3: 95,160,457-95,174,024 forward strand. GRCm38:CM000996.2
About this gene: This gene has 15 transcripts (splice variants), 242 orthologues, 19 paralogues, is a member of 1 Ensembl protein family and is associated with 1 phenotype.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Sema6c-202 | ENSMUST00000090823.7 | 4029 | 963aa | ENSMUSP00000088333.1 | Protein coding | CCDS71289 | E9Q613 | TSL:1, GENCODE basic, APPRIS ALT2
Sema6c-201 | ENSMUST00000090821.9 | 3933 | 931aa | ENSMUSP00000088331.3 | Protein coding | CCDS17604 | Q9WTM3 | TSL:5, GENCODE basic, APPRIS P3
Sema6c-213 | ENSMUST00000168321.7 | 3553 | 963aa | ENSMUSP00000129081.2 | Protein coding | CCDS71289 | E9Q613 | TSL:1, GENCODE basic, APPRIS ALT2
Sema6c-214 | ENSMUST00000202315.2 | 2843 | 931aa | ENSMUSP00000144039.1 | Protein coding | CCDS17604 | Q9WTM3 | TSL:1, GENCODE basic, APPRIS P3
Sema6c-203 | ENSMUST00000107217.5 | 3595 | 923aa | ENSMUSP00000102835.1 | Protein coding | - | G5E8N4 | TSL:1, GENCODE basic
Sema6c-208 | ENSMUST00000131742.7 | 549 | 29aa | ENSMUSP00000142525.1 | Protein coding | - | A0A0G2JDV7 | CDS 3' incomplete, TSL:2
Sema6c-212 | ENSMUST00000142449.8 | 3936 | 124aa | ENSMUSP00000123457.2 | Nonsense mediated decay | - | E9Q4N1 | TSL:1
Sema6c-215 | ENSMUST00000204709.2 | 3562 | 124aa | ENSMUSP00000144702.1 | Nonsense mediated decay | - | E9Q4N1 | TSL:1
Sema6c-207 | ENSMUST00000131620.7 | 716 | 80aa | ENSMUSP00000138154.1 | Nonsense mediated decay | - | S4R1B4 | TSL:5
Sema6c-209 | ENSMUST00000134125.7 | 5095 | No protein | - | Retained intron | - | - | TSL:2
Sema6c-204 | ENSMUST00000126597.1 | 814 | No protein | - | Retained intron | - | - | TSL:3
Sema6c-210 | ENSMUST00000141607.7 | 788 | No protein | - | Retained intron | - | - | TSL:3
Sema6c-205 | ENSMUST00000130662.1 | 890 | No protein | - | lncRNA | - | - | TSL:2
Sema6c-211 | ENSMUST00000142306.1 | 450 | No protein | - | lncRNA | - | - | TSL:2
Sema6c-206 | ENSMUST00000130814.7 | 347 | No protein | - | lncRNA | - | - | TSL:2

[Figure: genome browser view of a 33.57 kb region of chromosome 3 (95.16 Mb to 95.18 Mb), Comprehensive set.
Forward strand: Sema6c-211 (lncRNA), Sema6c-205 (lncRNA), Sema6c-215 (nonsense mediated decay), Sema6c-207 (nonsense mediated decay), Sema6c-210 (retained intron), Sema6c-206 (lncRNA), Sema6c-204 (retained intron), Sema6c-208 (protein coding), Sema6c-201 (protein coding), Sema6c-209 (retained intron), Sema6c-212 (nonsense mediated decay), Sema6c-202 (protein coding), Sema6c-203 (protein coding), Sema6c-213 (protein coding), Sema6c-214 (protein coding).
Contigs: AC131769.3, AC140190.3.
Reverse strand: Gabpb2-205, Gabpb2-204, Gabpb2-202 (protein coding).
Tracks: Regulatory Build. Regulation legend: CTCF; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site. Gene legend: Protein coding (merged Ensembl/Havana; Ensembl protein coding); Non-protein coding (processed transcript; RNA gene)]

Transcript: ENSMUST00000090823

[Figure: protein domain view of Sema6c-202 (protein coding; 13.52 kb, forward strand), translation ENSMUSP00000088..., scale 0 to 963 aa.
Tracks: Transmembrane heli...; MobiDB lite; Low complexity (Seg); Cleavage site (Sign...); Superfamily (Sema domain superfamily, SSF103575); SMART (Sema domain); Pfam (Sema domain); PROSITE profiles (Sema domain); PANTHER (Semaphorin 6C; Semaphorin); Gene3D (WD40/YVTN repeat-like-containing domain superfamily; 3.30.1680.10); sequence variants (dbSNP and all other sources).
Variant legend: missense variant; synonymous variant]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
