https://www.alphaknockout.com

Mouse Sema4c Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Sema4c conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Sema4c gene (NCBI Reference Sequence: NM_001126047; Ensembl: ENSMUSG00000026121) is located on Mouse chromosome 1. Fifteen exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 15 (Transcript: ENSMUST00000114991). Exon 2 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the Mouse Sema4c gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-292F21 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO Mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a targeted mutation exhibit exencephaly, neonatal lethality, and abnormal cerebellum morphology.

Exon 2 starts from about 100% of the coding region. The knockout of Exon 2 will result in a frameshift of the gene. The size of intron 1 for 5'-loxP site insertion is 1938 bp, and the size of intron 2 for 3'-loxP site insertion is 1531 bp. The size of the effective cKO region is ~609 bp. The cKO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: schematic of the targeting strategy, showing the wildtype allele (exons 1 to 15, with the 5' and 3' gRNA regions), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: exon of mouse Sema4c, homology arm, cKO region, loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
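The self-alignment described in the note can be sketched in a few lines. This is a minimal illustration, not the tool used to produce the plot: it assumes exact matching of 10 bp words on the forward strand only, and the function name and toy sequences are hypothetical. Any hit off the main diagonal marks a repeated word; a run of such hits close to the diagonal would indicate a tandem repeat.

```python
def off_diagonal_hits(seq, window=10):
    """Return sorted (i, j) pairs with i < j whose window-length words are identical.

    Each pair is one off-diagonal dot in a self dot plot (forward strand only).
    """
    positions = {}
    for i in range(len(seq) - window + 1):
        # Group start positions by the word found there.
        positions.setdefault(seq[i:i + window], []).append(i)
    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


# Hypothetical toy sequences, not taken from the Sema4c locus.
toy_unique = "ATGCCGTAAGCTTGACCTAGGATCCA"
toy_repeat = "ACGTTGCAAG" * 2 + "TTT"

unique_hits = off_diagonal_hits(toy_unique)
repeat_hits = off_diagonal_hits(toy_repeat)
print(len(unique_hits), len(repeat_hits))
```

An empty hit list for the real 7109 bp sequence would correspond to the "no significant tandem repeat" conclusion above, under the exact-match assumption.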

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(7109bp) | A(18.23% 1296) | C(28.58% 2032) | T(22.96% 1632) | G(30.23% 2149)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. Significant high GC-content regions are found. It may be difficult to construct this targeting vector.
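The overall GC content follows directly from the base counts in the summary line, and the per-window values can be computed with a simple sliding window. The sketch below is illustrative only: the function name, the step size and the toy sequence are assumptions, and the actual plotting tool may define its windows differently.

```python
# Base counts copied from the summary line above.
counts = {"A": 1296, "C": 2032, "T": 1632, "G": 2149}

total = sum(counts.values())
gc_percent = 100 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq, window=300, step=1):
    """Return the GC fraction of each window of the given size along seq."""
    fractions = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


# Hypothetical toy input with a small window, just to show the output shape.
toy_profile = windowed_gc("GGCCAATT", window=4, step=2)

print(total, round(gc_percent, 2), toy_profile)
```

With the counts above the total is 7109 bp and the overall GC content is about 58.8%, which is consistent with the note that high GC-content regions are present.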


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr1   -       36556291   36559290   3000
YourSeq  31     831    864   3000   97.0%     chr12  -       71830871   71830905   35
YourSeq  27     71     120   3000   64.3%     chr1   +       49053731   49053758   28
YourSeq  22     2553   2574  3000   100.0%    chr2   +       95210710   95210731   22
YourSeq  21     71     91    3000   100.0%    chr1   -       123112605  123112625  21
YourSeq  21     617    637   3000   100.0%    chr3   +       153563016  153563036  21
YourSeq  21     835    856   3000   100.0%    chr3   +       31310451   31310473   23
YourSeq  21     2176   2197  3000   100.0%    chr1   +       31600295   31600317   23
YourSeq  20     1045   1064  3000   100.0%    chr13  +       43480901   43480920   20
YourSeq  20     837    856   3000   100.0%    chr1   +       39903384   39903403   20

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START     END       SPAN
-------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr1   -       36552682  36555681  3000
YourSeq  35     2617   2705  3000   69.7%     chr7   +       80217013  80217101  89
YourSeq  33     1723   1795  3000   89.2%     chr4   -       40885238  40885309  72
YourSeq  33     849    896   3000   82.1%     chr17  -       24584161  24584205  45

Note: The 3000 bp section downstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
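As a cross-check on the two BLAT tables, the chr1 coordinates of the self-hits can be used to work out what lies between the two 3000 bp flanks, and the secondary scores show how weak the off-target hits are. The sketch below uses only numbers copied from the tables; the variable and function names are hypothetical, and reading the gap as the cKO region is an inference from the matching size rather than something the report states.

```python
# Self-hits on chr1 (minus strand), copied from the BLAT tables above.
UP_FLANK = (36556291, 36559290)    # 3000 bp section upstream of Exon 2
DOWN_FLANK = (36552682, 36555681)  # 3000 bp section downstream of Exon 2

# Scores of all non-self hits, copied from the same tables.
UP_SECONDARY = [31, 27, 22, 21, 21, 21, 21, 20, 20]
DOWN_SECONDARY = [35, 33, 33]


def span(start, end):
    """Length in bp of an inclusive genomic interval."""
    return end - start + 1


def gap_between(lower, upper):
    """Inclusive interval lying strictly between two non-overlapping intervals."""
    return lower[1] + 1, upper[0] - 1


gap_start, gap_end = gap_between(DOWN_FLANK, UP_FLANK)
gap_length = span(gap_start, gap_end)
best_secondary = max(UP_SECONDARY + DOWN_SECONDARY)

print(gap_start, gap_end, gap_length, best_secondary)
```

The gap between the flanks comes out at 609 bp, which agrees with the stated size of the effective cKO region, and the best secondary score (35 against a 3000 bp query) supports the note that no significant similarity is found.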


Gene and information:

Sema4c sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4C [ Mus musculus (house mouse) ]

Gene ID: 20353, updated on 10-Oct-2019

Gene summary

Official Symbol: Sema4c (provided by MGI)

Official Full Name: sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4C (provided by MGI)

Primary source: MGI:MGI:109252

See related: Ensembl:ENSMUSG00000026121

Gene type: protein coding

RefSeq status: REVIEWED

Organism: Mus musculus

Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus

Also known as: Semaf; Semai; sema I; Semacl1; AI426163; M-Sema F

Summary: This gene encodes a member of the semaphorin family of proteins that have diverse functions in neuronal development, heart morphogenesis, vascular growth, tumor progression and immune cell regulation. Lack of the encoded protein in some mice causes exencephaly resulting in neonatal lethality. Mice that bypass exencephaly show no obvious behavioral defects but display distinct pigmentation defects. Alternative splicing of this gene results in multiple transcript variants. [provided by RefSeq, Jan 2015]

Expression: Ubiquitous expression in CNS E11.5 (RPKM 14.0), CNS E14 (RPKM 13.4) and 27 other tissues

Genomic context

Location: 1; 1 B

Exon count: 18

Annotation release  Status             Assembly                       Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)   1    NC_000067.6 (36548639..36560699, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)     1    NC_000067.5 (36605487..36615226, complement)

[Figure: genomic context of Sema4c on Chromosome 1 - NC_000067.6]


Transcript information: This gene has 8 transcripts

Gene: Sema4c ENSMUSG00000026121

Description: sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4C [Source:MGI Symbol;Acc:MGI:109252]

Gene Synonyms: M-Sema F, Semacl1, Semaf, Semai

Location: Chromosome 1: 36,548,639-36,558,349 reverse strand. GRCm38:CM000994.2

About this gene: This gene has 8 transcripts (splice variants), 197 orthologues, 19 paralogues, is a member of 1 Ensembl protein family and is associated with 7 phenotypes.

Transcripts

Name        Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Sema4c-202  ENSMUST00000191642.5  3892  834aa       ENSMUSP00000142284.1  Protein coding   CCDS48240  Q64151      TSL:5, GENCODE basic, APPRIS P1
Sema4c-203  ENSMUST00000191677.5  3779  834aa       ENSMUSP00000141263.1  Protein coding   CCDS48240  Q64151      TSL:5, GENCODE basic, APPRIS P1
Sema4c-201  ENSMUST00000114991.7  3773  834aa       ENSMUSP00000110643.1  Protein coding   CCDS48240  Q64151      TSL:1, GENCODE basic, APPRIS P1
Sema4c-208  ENSMUST00000195620.5  3640  834aa       ENSMUSP00000141527.1  Protein coding   CCDS48240  Q64151      TSL:1, GENCODE basic, APPRIS P1
Sema4c-207  ENSMUST00000195339.2  508   36aa        ENSMUSP00000141833.1  Protein coding   -          A0A0A6YX48  CDS 3' incomplete, TSL:1
Sema4c-205  ENSMUST00000193382.5  452   37aa        ENSMUSP00000142034.1  Protein coding   -          A0A0A6YXK9  CDS 3' incomplete, TSL:5
Sema4c-204  ENSMUST00000191785.1  745   No protein  -                     Retained intron  -          -           TSL:3
Sema4c-206  ENSMUST00000195160.1  618   No protein  -                     Retained intron  -          -           TSL:3


[Figure: Ensembl region view, 29.71 kb on chromosome 1 (36.54 Mb to 36.56 Mb), contig AC084391.1. Forward strand: D430040D24Rik-201 (lncRNA) and D430040D24Rik-202 (retained intron). Reverse strand: Gm42417-201 and -202; Ankrd39-201 to -206; Sema4c-201 to -206 and -208; Fam178b-202 to -205 and -207. Tracks: Genes (Comprehensive set), Regulatory Build. Regulation legend: CTCF, Open Chromatin, Promoter, Promoter Flank. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (processed transcript, RNA gene).]


Transcript: ENSMUST00000114991

[Figure: protein domain and variant view of Sema4c-201 (protein coding, reverse strand, 9.71 kb), with a scale bar from 0 to 834 aa. Feature tracks: transmembrane helices, MobiDB lite, low complexity (Seg), coiled-coils (Ncoils), cleavage site. Domain annotations: Superfamily (Sema domain superfamily SSF103575, Immunoglobulin-like domain superfamily); SMART (Sema domain, PSI domain, Immunoglobulin subtype); Pfam (Sema domain, Plexin repeat); PROSITE profiles (Sema domain, Immunoglobulin-like domain); PANTHER (Semaphorin, PTHR11036:SF16); Gene3D (WD40/YVTN repeat-like-containing domain superfamily, Immunoglobulin-like fold, 3.30.1680.10); CDD (cd11258). Sequence variants (dbSNP and all other sources): missense variant, splice region variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
