https://www.alphaknockout.com

Mouse Sema4a Knockout Project (CRISPR/Cas9)

Objective: To create a Sema4a knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Sema4a gene (NCBI Reference Sequence: NM_013658; Ensembl: ENSMUSG00000028064) is located on mouse chromosome 3. 15 exons are identified, with the ATG start codon in exon 2 and the TAA stop codon in exon 15 (Transcript: ENSMUST00000029700). Exons 2~10 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Homozygotes for a knock-out allele show no obvious brain defects but exhibit impaired T cell priming and defective Th1 responses. Homozygotes for a gene trap allele show severe retinal degeneration with reduced retinal vessels, depigmentation and dysfunction of both rod and cone photoreceptors.

The target region begins in exon 2, which contains the start of the coding region. Exons 2~10 cover 49.74% of the coding region. The size of the effective KO region is ~6856 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele, 5' to 3', showing the exons of mouse Sema4a (exons 1 to 10 and 15 are labeled), the knockout region, and the two gRNA regions.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the upstream sequence against itself. Legend: Forward, Reverse Complement.]

Note: The 701 bp section upstream of Exon 2 is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
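The self-alignment described in this note can be sketched in a few lines of Python. This is only an illustration and not the tool used to produce the plot: it assumes exact matches between window-sized words (dot plot programs often tolerate mismatches), and the names `revcomp` and `dot_plot_matches` are hypothetical. Off-diagonal forward matches point to direct repeats, including tandem repeats; reverse-complement matches point to inverted repeats.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dot_plot_matches(seq, window=15):
    """Compare a sequence with itself using exact word matches.

    Returns two lists of (i, j) start positions (0-based):
    forward  - the word at i equals the word at j (i != j)
    reverse  - the word at i equals the reverse complement of the word at j
    """
    seq = seq.upper()
    n_words = len(seq) - window + 1

    # Index every word of length `window` by its start positions.
    index = {}
    for j in range(n_words):
        index.setdefault(seq[j:j + window], []).append(j)

    forward, reverse = [], []
    for i in range(n_words):
        word = seq[i:i + window]
        # Skip the main diagonal (a word always matches itself).
        forward.extend((i, j) for j in index.get(word, []) if j != i)
        reverse.extend((i, j) for j in index.get(revcomp(word), []))
    return forward, reverse


# Toy example (not project data): a 10 bp unit repeated three times.
unit = "ACGGTCATTG"
forward, reverse = dot_plot_matches(unit * 3, window=5)
print((0, 10) in forward)  # True: the repeat shows up off the diagonal
```

With the real 701 bp sequence and `window=15`, an empty or near-empty `forward` list would correspond to the "no significant tandem repeat" outcome reported above.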

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the downstream sequence against itself. Legend: Forward, Reverse Complement.]

Note: The 2000 bp section downstream of Exon 10 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot of the upstream sequence.]

Summary: Full Length(701bp) | A(23.97% 168) | C(23.25% 163) | T(18.69% 131) | G(34.09% 239)

Note: The 701 bp section upstream of Exon 2 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
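As a quick check of the summary figures, the overall GC content follows directly from the base counts reported for the two flanking sequences (701 bp upstream, 2000 bp downstream). The sketch below also shows one way a sliding-window GC profile could be computed; the function names and the `step` parameter are assumptions for illustration, and this is not the tool that generated the plots.

```python
def gc_percent(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def sliding_gc(seq, window=300, step=1):
    """GC percentage of each window of length `window`, moving by `step`."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts copied from the two Summary lines of this report.
upstream = {"A": 168, "C": 163, "T": 131, "G": 239}    # 701 bp
downstream = {"A": 446, "C": 531, "T": 608, "G": 415}  # 2000 bp

print(round(gc_percent(upstream), 2))    # 57.35
print(round(gc_percent(downstream), 2))  # 47.3
```

The per-window profile from `sliding_gc` is what a plot with a 300 bp window would display; the overall values above only summarize each full-length sequence.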

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot of the downstream sequence.]

Summary: Full Length(2000bp) | A(22.3% 446) | C(26.55% 531) | T(30.4% 608) | G(20.75% 415)

Note: The 2000 bp section downstream of Exon 10 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  701    1      701   701    100.0%    chr3   -       88454818   88455518   701
YourSeq  40     56     155   701    95.5%     chr11  -       53916566   53940005   23440
YourSeq  31     292    351   701    73.6%     chr10  +       9119365    9119416    52
YourSeq  25     60     87    701    96.5%     chr11  -       65047254   65047284   31
YourSeq  22     564    587   701    87.0%     chr12  -       104793264  104793286  23
YourSeq  22     392    414   701    100.0%    chr1   -       128874851  128874874  24
YourSeq  21     541    561   701    100.0%    chr10  -       20544605   20544625   21
YourSeq  20     46     65    701    100.0%    chr13  -       29840438   29840457   20
YourSeq  20     653    672   701    100.0%    chr14  +       16488031   16488050   20

Note: The 701 bp section upstream of Exon 2 is BLAT searched against the genome. Apart from the expected full-length match to the source locus on chr3, no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr3   -       88445991   88447990   2000
YourSeq  73     255    582   2000   92.0%     chr4   -       155038870  155039336  467
YourSeq  64     263    811   2000   93.3%     chr7   +       143227188  143356115  128928
YourSeq  53     291    597   2000   76.2%     chr1   +       84924932   84925200   269
YourSeq  52     291    806   2000   63.3%     chr1   +       36055899   36056163   265
YourSeq  51     262    343   2000   90.5%     chr10  -       69364491   69364770   280
YourSeq  50     242    335   2000   91.7%     chr7   -       24496126   24496323   198
YourSeq  49     262    333   2000   91.6%     chr11  -       77596717   77596795   79
YourSeq  49     262    335   2000   90.0%     chr5   +       79715938   79716016   79
YourSeq  48     262    345   2000   81.7%     chr2   -       144474210  144474289  80
YourSeq  48     296    570   2000   92.9%     chr16  +       87468723   87469105   383
YourSeq  47     789    886   2000   92.8%     chr1   -       153367843  153368251  409
YourSeq  46     262    337   2000   92.6%     chr7   -       19927444   19927520   77
YourSeq  46     262    334   2000   92.6%     chr14  +       32614938   32615011   74
YourSeq  46     262    334   2000   92.6%     chr10  +       63044707   63044780   74
YourSeq  45     268    333   2000   84.9%     chr4   -       103165358  103165424  67
YourSeq  45     262    333   2000   92.5%     chr1   -       171336552  171336624  73
YourSeq  45     262    334   2000   91.0%     chr16  +       20081264   20081337   74
YourSeq  45     262    333   2000   92.5%     chr11  +       84717834   84717906   73
YourSeq  44     262    333   2000   89.1%     chr11  +       101384482  101384556  75

Note: The 2000 bp section downstream of Exon 10 is BLAT searched against the genome. Apart from the expected full-length match to the source locus on chr3, no significant similarity is found.
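One way to read the BLAT results programmatically is sketched below, using the first three rows of the downstream table. The parser, the `BlatHit` type and the idea of comparing the best secondary score with the query size are illustrative assumptions, not part of the original analysis; any leading "browser details" link text in a row is ignored, and the query name is assumed to be `YourSeq`.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated result row."""
    fields = line.split()
    # Drop everything up to and including the query name column.
    fields = fields[fields.index("YourSeq") + 1:]
    return BlatHit(
        int(fields[0]), int(fields[1]), int(fields[2]), int(fields[3]),
        float(fields[4].rstrip("%")), fields[5], fields[6],
        int(fields[7]), int(fields[8]), int(fields[9]),
    )


rows = [
    "browser details YourSeq 2000 1 2000 2000 100.0% chr3 - 88445991 88447990 2000",
    "YourSeq 73 255 582 2000 92.0% chr4 - 155038870 155039336 467",
    "YourSeq 64 263 811 2000 93.3% chr7 + 143227188 143356115 128928",
]

hits = sorted((parse_hit(r) for r in rows), key=lambda h: h.score, reverse=True)
self_match, secondary = hits[0], hits[1:]

# SPAN is the inclusive length of the genomic interval.
assert all(h.span == h.t_end - h.t_start + 1 for h in hits)

best = max(h.score for h in secondary)
print(f"{100 * best / self_match.q_size:.2f}%")  # 3.65%
```

In this sketch the strongest secondary hit scores under 4% of the query length, which is in line with the note that only the self-match on chr3 is significant.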


Gene and information:

Sema4a sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4A [ Mus musculus (house mouse) ]
Gene ID: 20351, updated on 10-Oct-2019

Gene summary

Official Symbol: Sema4a (provided by MGI)
Official Full Name: sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4A (provided by MGI)
Primary source: MGI:MGI:107560
See related: Ensembl:ENSMUSG00000028064
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: SemB; Semab; AI132332
Expression: Broad expression in colon adult (RPKM 43.5), duodenum adult (RPKM 41.8) and 25 other tissues
Orthologs: human, all

Genomic context

Location: 3; 3 F1
Exon count: 21

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  3    NC_000069.6 (88435959..88461240, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    3    NC_000069.5 (88239884..88265104, complement)

Chromosome 3 - NC_000069.6


Transcript information: This gene has 19 transcripts

Gene: Sema4a ENSMUSG00000028064

Description: sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 4A [Source:MGI Symbol;Acc:MGI:107560]
Gene Synonyms: SemB, Semab
Location: Chromosome 3: 88,435,959-88,461,182 reverse strand. GRCm38:CM000996.2
About this gene: This gene has 19 transcripts (splice variants), 410 orthologues, 19 paralogues, is a member of 1 Ensembl protein family and is associated with 20 phenotypes.

Transcripts

Name        Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Sema4a-201  ENSMUST00000029700.11  3205  760aa       ENSMUSP00000029700.5  Protein coding           CCDS17475  Q62178   TSL:1 GENCODE basic APPRIS P1
Sema4a-213  ENSMUST00000165898.7   3125  760aa       ENSMUSP00000128510.1  Protein coding           CCDS17475  Q62178   TSL:5 GENCODE basic APPRIS P1
Sema4a-215  ENSMUST00000169222.7   3084  760aa       ENSMUSP00000128887.1  Protein coding           CCDS17475  Q62178   TSL:5 GENCODE basic APPRIS P1
Sema4a-214  ENSMUST00000166237.7   3046  760aa       ENSMUSP00000125909.1  Protein coding           CCDS17475  Q62178   TSL:1 GENCODE basic APPRIS P1
Sema4a-202  ENSMUST00000107531.7   2897  628aa       ENSMUSP00000103155.1  Protein coding           -          D3YWV5   TSL:5 GENCODE basic
Sema4a-205  ENSMUST00000127436.7   845   233aa       ENSMUSP00000118706.1  Protein coding           -          D3YZ30   CDS 3' incomplete TSL:3
Sema4a-210  ENSMUST00000147200.7   706   203aa       ENSMUSP00000123061.1  Protein coding           -          D3YUM4   CDS 3' incomplete TSL:5
Sema4a-208  ENSMUST00000141471.1   630   60aa        ENSMUSP00000114330.1  Protein coding           -          D3YVM6   CDS 3' incomplete TSL:5
Sema4a-204  ENSMUST00000125526.7   506   113aa       ENSMUSP00000119028.1  Protein coding           -          D3Z336   CDS 3' incomplete TSL:3
Sema4a-203  ENSMUST00000123753.7   385   17aa        ENSMUSP00000120084.1  Protein coding           -          D3YWK5   CDS 3' incomplete TSL:2
Sema4a-216  ENSMUST00000184487.7   866   170aa       ENSMUSP00000139126.1  Nonsense mediated decay  -          V9GXF5   TSL:5
Sema4a-217  ENSMUST00000184876.7   748   180aa       ENSMUSP00000139159.1  Nonsense mediated decay  -          V9GXH9   TSL:5
Sema4a-219  ENSMUST00000185137.7   699   47aa        ENSMUSP00000138858.1  Nonsense mediated decay  -          V9GWW2   CDS 5' incomplete TSL:3
Sema4a-206  ENSMUST00000135539.7   2487  No protein  -                     Retained intron          -          -        TSL:2
Sema4a-211  ENSMUST00000149145.1   2357  No protein  -                     Retained intron          -          -        TSL:5
Sema4a-212  ENSMUST00000156108.7   2108  No protein  -                     Retained intron          -          -        TSL:2
Sema4a-207  ENSMUST00000135732.7   713   No protein  -                     Retained intron          -          -        TSL:3
Sema4a-218  ENSMUST00000184972.1   465   No protein  -                     Retained intron          -          -        TSL:2
Sema4a-209  ENSMUST00000146921.1   371   No protein  -                     lncRNA                   -          -        TSL:3


[Figure: Ensembl genome browser view of a 45.22 kb region of chromosome 3 (88.43Mb to 88.47Mb). Tracks: contig AC102388.7; Gm42814-201 (processed pseudogene, forward strand); Sema4a transcripts 201 to 219 and Mir7011-201 (miRNA) on the reverse strand; Regulatory Build. Regulation legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, RNA gene, pseudogene).]


Transcript: ENSMUST00000029700

[Figure: protein domain and variant view of Sema4a-201 (protein coding; reverse strand, 19.78 kb). Annotation tracks: ENSMUSP00000029...; Transmembrane heli...; MobiDB lite; Low complexity (Seg); Cleavage site (Sign...; Superfamily (Sema domain superfamily, SSF103575); SMART (Sema domain, PSI domain); Pfam (Sema domain, Plexin repeat); PROSITE profiles (Sema domain); PANTHER (PTHR11036:SF15, Semaphorin); Gene3D (WD40/YVTN repeat-like-containing domain superfamily, 3.30.1680.10, Immunoglobulin-like fold); sequence variants (dbSNP and all other sources). Variant legend: missense variant, synonymous variant. Scale bar: 0 to 760.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
