https://www.alphaknockout.com

Mouse Sema3f Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Sema3f conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Sema3f gene (NCBI Reference Sequence: NM_011349; Ensembl: ENSMUSG00000034684) is located on mouse chromosome 9. Eighteen exons have been identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 18 (Transcript: ENSMUST00000080560). Exons 7 to 11 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Sema3f gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-93E19 as the template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Inactivation of this gene results in neuronal defects, including impaired CNS axon pathfinding and impaired PNS and limbic system circuitry. Mice homozygous for a knock-out allele exhibit increased lymphatic branching complexity and LEC numbers.

Exon 7 starts at about 24.36% of the way through the coding region. Knockout of exons 7 to 11 will result in a frameshift of the gene. Intron 6, the site of the 5'-loxP insertion, is 1336 bp; intron 11, the site of the 3'-loxP insertion, is 1642 bp. The effective cKO region is ~1660 bp. The cKO region does not contain any other known gene.
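The two quantitative statements above can be illustrated with a minimal sketch. It converts the quoted 24.36% into an approximate position in the coding sequence, using the 754 aa protein length listed for ENSMUST00000080560, and states the usual frameshift rule as a helper. The function names are hypothetical, and the convention of 3 nt per amino acid with the stop codon excluded is an assumption; the report does not say how the percentage was derived. The report does not list individual exon lengths, so the frameshift helper is shown without real input.

```python
from typing import Tuple

# Values quoted in the report for transcript ENSMUST00000080560.
PROTEIN_LENGTH_AA = 754        # protein length from the transcript table
EXON7_START_FRACTION = 0.2436  # "about 24.36% of the coding region"


def approx_cds_position(protein_length_aa: int, fraction: float) -> Tuple[int, int]:
    """Map a fractional CDS position to an approximate (nucleotide, codon) pair.

    Assumption: coding length = 3 nt per amino acid, stop codon excluded.
    """
    cds_length_nt = protein_length_aa * 3
    nucleotide = round(cds_length_nt * fraction)
    codon = (nucleotide - 1) // 3 + 1
    return nucleotide, codon


def causes_frameshift(deleted_coding_nt: int) -> bool:
    """A deletion shifts the reading frame when its coding length is not a multiple of 3."""
    return deleted_coding_nt % 3 != 0


nucleotide, codon = approx_cds_position(PROTEIN_LENGTH_AA, EXON7_START_FRACTION)
print(f"Exon 7 begins near CDS nucleotide {nucleotide} (codon {codon})")
```

Under these assumptions the deletion begins roughly a quarter of the way into the protein, so a frameshift from that point would disrupt most of the coding sequence.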


Overview of the Targeting Strategy

[Figure: schematic of the targeting strategy, drawn 5' to 3'. Rows: wildtype allele (exons 1, 6 to 12 and 18 labeled, with the two gRNA regions marked), targeting vector, targeted allele, and constitutive KO allele (after Cre recombination). Legend: exon of mouse Sema3f; homology arm; cKO region; loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 against itself, showing forward and reverse-complement matches.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
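A self-alignment of this kind can be sketched as follows. This is a minimal illustration and not the tool used to produce the report's plot: it records every pair of positions whose windows match exactly, on the forward strand or as a reverse complement. The function names are hypothetical, exact matching is an assumption (real dot plot tools often tolerate mismatches), and the short test sequence is invented for demonstration.

```python
from typing import Dict, List, Set, Tuple

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string; unknown bases become 'N'."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def self_dot_plot(seq: str, window: int = 10) -> Tuple[Set[Tuple[int, int]], Set[Tuple[int, int]]]:
    """Return (forward, reverse) sets of 0-based (i, j) window start positions.

    forward: window at i equals window at j, with i != j (off-diagonal dots).
    reverse: window at i equals the reverse complement of the window at j.
    """
    seq = seq.upper()
    index: Dict[str, List[int]] = {}
    for i in range(len(seq) - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)

    forward: Set[Tuple[int, int]] = set()
    reverse: Set[Tuple[int, int]] = set()
    for kmer, positions in index.items():
        rc_positions = index.get(reverse_complement(kmer), [])
        for i in positions:
            forward.update((i, j) for j in positions if j != i)
            reverse.update((i, j) for j in rc_positions)
    return forward, reverse


# Invented toy sequence: "AACC" occurs twice, and "CCGG" is its own reverse complement.
forward_hits, reverse_hits = self_dot_plot("AACCGGAACC", window=4)
print(sorted(forward_hits), sorted(reverse_hits))
```

In such a plot, runs of forward dots parallel and close to the main diagonal would suggest tandem repeats; an empty or sparse forward set is consistent with the note above.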

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(8160bp) | A(20.83% 1700) | C(27.38% 2234) | T(23.68% 1932) | G(28.11% 2294)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found, so this targeting vector may be difficult to construct.
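The overall GC content implied by the base counts in the Summary line can be checked with a short sketch, which also shows one way a windowed GC profile could be computed. The `windowed_gc` helper and its `step` parameter are hypothetical and not taken from the report; only the base counts and the 300 bp window size come from it.

```python
from typing import Dict, List

# Base counts from the Summary line (full length 8160 bp).
BASE_COUNTS: Dict[str, int] = {"A": 1700, "C": 2234, "T": 1932, "G": 2294}

total_bp = sum(BASE_COUNTS.values())
gc_percent = 100 * (BASE_COUNTS["G"] + BASE_COUNTS["C"]) / total_bp
print(f"Total: {total_bp} bp, overall GC content: {gc_percent:.2f}%")


def windowed_gc(seq: str, window: int = 300, step: int = 1) -> List[float]:
    """GC percentage of each full window along seq (hypothetical helper)."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append(100 * sum(base in "GC" for base in chunk) / window)
    return profile
```

The counts sum to the stated 8160 bp and give an overall GC content of about 55.49%. Local windows can sit well above that average, which is what the note flags as a potential construction difficulty.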


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
YourSeq  3000   1      3000  3000   100.0%    chr9   -       107688600  107691599  3000
YourSeq  66     781    868   3000   98.6%     chr15  +       89219185   89219471   287
YourSeq  65     785    934   3000   93.5%     chr13  -       72279686   72279836   151
YourSeq  65     781    871   3000   95.9%     chr15  +       89219609   89219733   125
YourSeq  57     785    1001  3000   78.8%     chr6   +       147772283  147772482  200
YourSeq  57     785    875   3000   98.4%     chr14  +       45984036   45984127   92
YourSeq  53     817    875   3000   95.0%     chr17  -       71078065   71078123   59
YourSeq  51     823    875   3000   100.0%    chr6   -       14362330   14362645   316
YourSeq  50     706    875   3000   96.4%     chr15  -       74438660   74438831   172
YourSeq  47     827    875   3000   100.0%    chr15  +       101063388  101063606  219
YourSeq  45     823    871   3000   96.0%     chr5   +       149519151  149519199  49
YourSeq  41     831    871   3000   100.0%    chr4   -       63068457   63068497   41
YourSeq  38     834    873   3000   97.5%     chr10  +       62815217   62815256   40
YourSeq  37     1079   1123  3000   84.7%     chr1   -       83184195   83184235   41
YourSeq  34     337    381   3000   94.9%     chr1   -       175142608  175142677  70
YourSeq  34     805    843   3000   97.3%     chr8   +       10807395   10807597   203
YourSeq  23     1925   1949  3000   87.5%     chr10  -       74630920   74630943   24
YourSeq  22     781    802   3000   100.0%    chr4   +       12456036   12456057   22
YourSeq  21     167    187   3000   100.0%    chr4   +       7428491    7428511    21

Note: The 3000 bp section upstream of Exon 7 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
YourSeq  3000   1      3000  3000   100.0%    chr9   -       107683940  107686939  3000
YourSeq  121    36     533   3000   88.1%     chr2   +       37805566   37806121   556
YourSeq  109    210    519   3000   89.9%     chr10  -       43412433   43412849   417
YourSeq  109    211    508   3000   89.8%     chr1   +       136388404  136820062  431659
YourSeq  105    439    590   3000   88.9%     chr11  -       97963324   98335731   372408
YourSeq  104    1      295   3000   91.3%     chr4   +       41472848   41473146   299
YourSeq  103    243    558   3000   81.5%     chr2   +       92002528   92002770   243
YourSeq  85     438    561   3000   85.9%     chr10  -       93710141   93710267   127
YourSeq  81     438    562   3000   81.4%     chr11  -       97996403   97996526   124
YourSeq  81     438    561   3000   88.1%     chr1   -       119250150  119250273  124
YourSeq  81     438    562   3000   84.0%     chr11  +       102081354  102268153  186800
YourSeq  78     438    562   3000   86.5%     chr3   -       94045903   94046026   124
YourSeq  78     438    562   3000   89.8%     chr12  -       81575498   81575623   126
YourSeq  78     438    574   3000   84.4%     chr3   +       155232919  155233070  152
YourSeq  77     428    543   3000   89.0%     chr2   +       6172825    6172963    139
YourSeq  76     438    562   3000   87.2%     chr3   -       58538455   58538589   135
YourSeq  75     70     533   3000   73.7%     chrX   -       162891503  162891839  337
YourSeq  75     437    534   3000   89.9%     chr6   -       132156420  132156516  97
YourSeq  75     437    534   3000   89.9%     chr6   +       132157475  132157571  97
YourSeq  74     438    562   3000   89.6%     chr6   -       37923218   37923342   125

Note: The 3000 bp section downstream of Exon 11 is BLAT searched against the genome. No significant similarity is found.
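The top hit in each BLAT table gives the genomic position of the corresponding 3000 bp flank on chr9, so the size of the region between them can be recomputed as a consistency check. The sketch below assumes that the two flanks lie directly adjacent to the cKO region; the helper names are hypothetical. Because Sema3f is on the reverse strand, the upstream flank has the higher coordinates.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates

# Coordinates of the top-scoring (self) hit in each BLAT table, chr9 minus strand.
UPSTREAM_FLANK: Interval = (107688600, 107691599)    # 3000 bp upstream of exon 7
DOWNSTREAM_FLANK: Interval = (107683940, 107686939)  # 3000 bp downstream of exon 11


def span(interval: Interval) -> int:
    """Length of an inclusive interval."""
    return interval[1] - interval[0] + 1


def gap_between(lower: Interval, upper: Interval) -> int:
    """Number of bases strictly between two non-overlapping intervals."""
    return upper[0] - lower[1] - 1


cko_span = gap_between(DOWNSTREAM_FLANK, UPSTREAM_FLANK)
print(f"Flanks: {span(UPSTREAM_FLANK)} bp and {span(DOWNSTREAM_FLANK)} bp; region between them: {cko_span} bp")
```

The region between the flanks comes to 1660 bp, which agrees with the effective cKO region size of ~1660 bp stated in the strategy section.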


Gene and information:

Sema3f: sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F [Mus musculus (house mouse)]
Gene ID: 20350, updated on 10-Oct-2019

Gene summary

Official Symbol: Sema3f (provided by MGI)
Official Full Name: sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F (provided by MGI)
Primary source: MGI:MGI:1096347
See related: Ensembl:ENSMUSG00000034684
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Sema4; Semak
Expression: Broad expression in lung adult (RPKM 77.6), limb E14.5 (RPKM 39.8) and 19 other tissues
Orthologs: human

Genomic context

Location: 9; 9 F1

Exon count: 21

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 9 | NC_000075.6 (107681499..107710475, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 9 | NC_000075.5 (107583833..107612806, complement)

[Figure: genomic context of Sema3f on chromosome 9 (NC_000075.6).]


Transcript information: This gene has 12 transcripts

Gene: Sema3f ENSMUSG00000034684

Description: sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F [Source:MGI Symbol;Acc:MGI:1096347]
Gene Synonyms: Sema IV, Semak
Location: Chromosome 9: 107,681,500-107,710,475 reverse strand. GRCm38:CM001002.2
About this gene: This gene has 12 transcripts (splice variants), 258 orthologues, 19 paralogues, is a member of 1 Ensembl protein family and is associated with 42 phenotypes.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Sema3f-201 | ENSMUST00000080560.8 | 3395 | 754aa | ENSMUSP00000079400.3 | Protein coding | CCDS23505 | O88632 | TSL:1, GENCODE basic, APPRIS P3
Sema3f-204 | ENSMUST00000192727.5 | 2383 | 785aa | ENSMUSP00000141865.1 | Protein coding | CCDS81069 | O88632 | TSL:1, GENCODE basic, APPRIS ALT2
Sema3f-205 | ENSMUST00000192783.5 | 906 | 284aa | ENSMUSP00000141668.1 | Protein coding | - | A0A0A6YWS0 | CDS 5' incomplete, TSL:3
Sema3f-206 | ENSMUST00000193108.5 | 685 | 191aa | ENSMUSP00000141878.1 | Protein coding | - | A0A0A6YX80 | CDS 3' incomplete, TSL:5
Sema3f-211 | ENSMUST00000195023.2 | 443 | 148aa | ENSMUSP00000141350.1 | Protein coding | - | A0A0A6YW11 | CDS 5' and 3' incomplete, TSL:3
Sema3f-208 | ENSMUST00000194039.5 | 2972 | 185aa | ENSMUSP00000142221.1 | Nonsense mediated decay | - | A0A0A6YY06 | TSL:1
Sema3f-209 | ENSMUST00000194424.5 | 711 | 116aa | ENSMUSP00000142178.1 | Nonsense mediated decay | - | A0A0A6YXX2 | CDS 5' incomplete, TSL:3
Sema3f-212 | ENSMUST00000195267.5 | 1084 | No protein | - | Retained intron | - | - | TSL:3
Sema3f-210 | ENSMUST00000194846.1 | 695 | No protein | - | Retained intron | - | - | TSL:3
Sema3f-203 | ENSMUST00000192712.1 | 581 | No protein | - | Retained intron | - | - | TSL:3
Sema3f-202 | ENSMUST00000192157.1 | 376 | No protein | - | Retained intron | - | - | TSL:3
Sema3f-207 | ENSMUST00000193665.5 | 1034 | No protein | - | lncRNA | - | - | TSL:5


[Figure: Ensembl genome browser view of a 48.98 kb region of chromosome 9 (107.68 Mb to 107.72 Mb). Forward strand: A930036K24Rik-201 (TEC). Contigs: AC162905.4, AC152718.6. Reverse strand, Gnat1 transcripts: 201 (protein coding); 202 (nonsense mediated decay); 203, 205, 207 (retained intron); 204, 206 (lncRNA). Reverse strand, Sema3f transcripts: 201, 204, 205, 206, 211 (protein coding); 208, 209 (nonsense mediated decay); 202, 203, 210, 212 (retained intron); 207 (lncRNA). Regulatory Build legend: CTCF, open chromatin, promoter, promoter flank. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, RNA gene).]


Transcript: ENSMUST00000080560

[Figure: protein domain and variant view of Sema3f-201 (protein coding; reverse strand, 28.98 kb) for protein ENSMUSP00000079..., with a scale bar from 0 to 754 aa. Tracks: MobiDB lite, Low complexity (Seg), Cleavage site (Sign..., Superfamily, SMART, Pfam, PROSITE profiles, PANTHER, Gene3D, CDD, and sequence variants (dbSNP and all other sources). Labeled features: SSF103575, Sema domain superfamily, Immunoglobulin-like domain superfamily, Sema domain, PSI domain, Immunoglobulin subtype 2, Immunoglobulin subtype, Immunoglobulin-like domain, PTHR11036:SF27, Semaphorin, 3.30.1680.10, WD40/YVTN repeat-like-containing domain superfamily, Immunoglobulin-like fold, cd11254, cd05871. Variant legend: missense variant, splice region variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
