https://www.alphaknockout.com

Mouse Egflam Knockout Project (CRISPR/Cas9)

Objective: To create an Egflam knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Egflam gene (NCBI Reference Sequence: NM_001289496; Ensembl: ENSMUSG00000042961) is located on mouse chromosome 15. 23 exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 23 (Transcript: ENSMUST00000096494). Exons 2~3 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Homozygous null mutants are viable and fertile under normal conditions. They exhibit abnormal photoreceptor ribbon synapses, resulting in altered synaptic signal transmission and visual function.

Exon 2 starts at about 3.21% of the coding region. Exons 2~3 cover 6.36% of the coding region. The effective KO region is ~1952 bp and contains no other known gene.
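As a rough check on these percentages, the sketch below converts them to approximate base-pair counts. It assumes the coding region is 1017 codons long (the 1017 aa protein length listed for ENSMUST00000096494 in the transcript table of this report) and that the stop codon is not counted. The constant and function names are illustrative and not part of the original design.

```python
# Assumption: coding length = protein length (1017 aa, from the transcript
# table for ENSMUST00000096494) x 3 bp, with the stop codon excluded.
PROTEIN_AA = 1017
CDS_BP = PROTEIN_AA * 3


def percent_to_bp(percent, total_bp=CDS_BP):
    """Convert a percentage of the coding region to an approximate bp count."""
    return round(total_bp * percent / 100)


# Approximate number of coding bases that precede exon 2.
exon2_start_bp = percent_to_bp(3.21)
# Approximate number of coding bases carried by exons 2~3.
exon2_3_cds_bp = percent_to_bp(6.36)

print(CDS_BP, exon2_start_bp, exon2_3_cds_bp)
```

Because the report rounds its percentages to two decimals, these counts are estimates only.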


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1, 2, 3 and 23 with two gRNA regions marked. Legend: exon of mouse Egflam; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: self-alignment dot plot, Sequence 12. Legend: forward; reverse complement.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
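The self-alignment idea can be sketched as follows. This is a minimal illustration that marks a dot wherever two 15 bp windows match exactly, on the forward strand or as a reverse complement. The tool that produced the plot may tolerate mismatches, so exact matching is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def dot_plot_hits(seq, window=15):
    """Return (forward, reverse) lists of (i, j) window start pairs.

    forward: seq[i:i+window] == seq[j:j+window], with i != j
    reverse: seq[i:i+window] == revcomp(seq[j:j+window])
    """
    index = defaultdict(list)
    n_windows = len(seq) - window + 1
    for i in range(n_windows):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for j in range(n_windows):
        word = seq[j:j + window]
        forward.extend((i, j) for i in index.get(word, []) if i != j)
        reverse.extend((i, j) for i in index.get(revcomp(word), []))
    return forward, reverse


# Toy sequences (invented for illustration, not taken from the report).
unit = "ACGGTCATTGCAGTCCATGA"
tandem_forward, _ = dot_plot_hits(unit * 3)
unique_forward, _ = dot_plot_hits(unit)
_, inverted_reverse = dot_plot_hits(unit + revcomp(unit))
```

In a plot of such hits, a tandem repeat shows up as lines parallel to the main diagonal. A region with no off-diagonal forward hits, like the single toy unit here, has no exact repeats at this window size.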

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: self-alignment dot plot, Sequence 12. Legend: forward; reverse complement.]

Note: The 2000 bp section downstream of Exon 3 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot, Sequence 12.]

Summary: Full Length(2000bp) | A(32.45% 649) | C(17.35% 347) | T(31.35% 627) | G(18.85% 377)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
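For reference, a minimal sketch of how such a GC profile can be computed. The base counts are copied from the upstream summary line; the sliding-window function and its step parameter are illustrative assumptions, not a description of the tool that produced the plot.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window=300, step=1):
    """GC fraction of each window of the given size, moving by `step` bases."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Overall GC content from the base counts reported for the upstream section.
upstream_counts = {"A": 649, "C": 347, "T": 627, "G": 377}
total_bases = sum(upstream_counts.values())
upstream_gc = (upstream_counts["G"] + upstream_counts["C"]) / total_bases

print(f"upstream GC content: {upstream_gc:.2%}")
```

Applying `windowed_gc` to the actual 2000 bp sequence with `window=300` would give one GC value per window position, which is the kind of profile the plot displays.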

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot, Sequence 12.]

Summary: Full Length(2000bp) | A(27.05% 541) | C(23.9% 478) | T(28.7% 574) | G(20.35% 407)

Note: The 2000 bp section downstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr15  -       7318351    7320350    2000
YourSeq  67     365    1316  2000   93.6%     chr1   +       20642255   20940883   298629
YourSeq  63     1230   1339  2000   89.1%     chr11  -       75978470   75978579   110
YourSeq  60     324    414   2000   83.6%     chr16  +       30257408   30257504   97
YourSeq  59     324    414   2000   82.5%     chr1   +       170750227  170750317  91
YourSeq  57     356    1263  2000   89.1%     chr10  +       127371066  127649279  278214
YourSeq  56     341    412   2000   88.9%     chr9   -       103934196  103934267  72
YourSeq  54     341    412   2000   84.6%     chr1   -       85691331   85691401   71
YourSeq  54     321    412   2000   81.3%     chr15  +       99454260   99454340   81
YourSeq  52     337    412   2000   84.3%     chr10  +       89850645   89850720   76
YourSeq  51     356    413   2000   94.9%     chr15  -       88906910   88906973   64
YourSeq  49     1230   1340  2000   83.6%     chr6   -       39554330   39554439   110
YourSeq  49     366    728   2000   96.3%     chr1   -       118473667  118474051  385
YourSeq  48     324    411   2000   79.3%     chr10  +       95301604   95301692   89
YourSeq  47     366    417   2000   96.2%     chr10  -       3277456    3277513    58
YourSeq  47     354    412   2000   89.9%     chr10  +       80651768   80651826   59
YourSeq  46     358    410   2000   94.4%     chr10  -       93678326   93678384   59
YourSeq  46     1156   1282  2000   87.4%     chr1   +       183405314  183405450  137
YourSeq  45     356    412   2000   89.5%     chr10  -       51659274   51659330   57
YourSeq  44     367    412   2000   97.9%     chr10  -       77320574   77320619   46

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
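One way to read a result table like this programmatically is sketched below. It parses whitespace-separated BLAT rows and sets aside the full-length hit (score equal to query size) as the query's own locus, then reports the best remaining hit. The report does not state what counts as significant, so no threshold is applied; the class and function names are hypothetical.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(line):
    """Parse one row: QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = line.split()
    return BlatHit(
        score=int(f[1]), q_start=int(f[2]), q_end=int(f[3]), q_size=int(f[4]),
        identity=float(f[5].rstrip("%")), chrom=f[6], strand=f[7],
        t_start=int(f[8]), t_end=int(f[9]), span=int(f[10]),
    )


# First three rows of the upstream table, copied from the report.
rows = """\
YourSeq 2000 1 2000 2000 100.0% chr15 - 7318351 7320350 2000
YourSeq 67 365 1316 2000 93.6% chr1 + 20642255 20940883 298629
YourSeq 63 1230 1339 2000 89.1% chr11 - 75978470 75978579 110"""

hits = [parse_blat_row(line) for line in rows.splitlines()]
# Assumption: a hit whose score equals the query size is the query itself.
off_target = [h for h in hits if h.score < h.q_size]
best_off_target = max(off_target, key=lambda h: h.score)

print(best_off_target.chrom, best_off_target.score, best_off_target.score / best_off_target.q_size)
```

In these rows the strongest non-self hit scores 67 against a query size of 2000, a small fraction of the self-hit score.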

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr15  -       7314399    7316398    2000
YourSeq  45     1844   1894  2000   98.0%     chr1   -       57434450   57434502   53
YourSeq  40     1828   1894  2000   95.6%     chr11  +       70203719   70203789   71
YourSeq  39     1837   1891  2000   93.2%     chr15  -       80953836   80953891   56
YourSeq  38     1839   1893  2000   91.5%     chr1   -       155982084  155982141  58
YourSeq  36     1840   1891  2000   95.0%     chr19  -       15895541   15895594   54
YourSeq  36     1838   1899  2000   97.5%     chr13  -       111525470  111525539  70
YourSeq  35     1839   1893  2000   89.8%     chr9   -       47919446   47919499   54
YourSeq  35     1843   1893  2000   92.9%     chr13  -       13480908   13480961   54
YourSeq  35     1837   1886  2000   92.7%     chr2   +       79725505   79725556   52
YourSeq  34     1840   1884  2000   94.9%     chr18  +       74920871   74920917   47
YourSeq  33     1837   1870  2000   100.0%    chr7   -       83871675   83871709   35
YourSeq  33     1837   1871  2000   97.2%     chr7   +       37708201   37708235   35
YourSeq  33     1839   1891  2000   94.6%     chr5   +       141167404  141167457  54
YourSeq  33     1837   1890  2000   92.4%     chr13  +       29785481   29785536   56
YourSeq  32     1836   1893  2000   97.1%     chr14  -       105350948  105351007  60
YourSeq  32     1837   1891  2000   94.5%     chrX   +       11848855   11848911   57
YourSeq  32     1837   1891  2000   94.5%     chr9   +       37243200   37243256   57
YourSeq  32     1841   1891  2000   91.9%     chr15  +       73826040   73826092   53
YourSeq  31     1840   1891  2000   94.3%     chr5   -       134637575  134637627  53

Note: The 2000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
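The full-length hits in the two BLAT tables place the 2000 bp flanks on chr15, which allows a quick consistency check against the stated KO region size. The sketch below assumes the displayed coordinates are 1-based and inclusive, and that the KO region is exactly the interval between the two flanks; the variable and function names are illustrative.

```python
# Full-length (score 2000) hits copied from the two BLAT tables.
UP_FLANK = (7318351, 7320350)    # 2000 bp upstream of exon 2 (reverse strand)
DOWN_FLANK = (7314399, 7316398)  # 2000 bp downstream of exon 3 (reverse strand)


def gap_between(a, b):
    """Number of bases strictly between two non-overlapping closed intervals."""
    lower, upper = sorted([a, b])
    return upper[0] - lower[1] - 1


ko_gap_bp = gap_between(UP_FLANK, DOWN_FLANK)
print(ko_gap_bp)
```

Under these assumptions, the interval between the flanks is 1952 bp, which agrees with the effective KO region size given in the strategy summary.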


Gene and information: Egflam EGF-like, fibronectin type III and laminin G domains [Mus musculus (house mouse)] Gene ID: 268780, updated on 10-Oct-2019

Gene summary

Official Symbol: Egflam (provided by MGI)
Official Full Name: EGF-like, fibronectin type III and laminin G domains (provided by MGI)
Primary source: MGI:MGI:2146149
See related: Ensembl:ENSMUSG00000042961
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AU040377; 5930412K08
Expression: Broad expression in lung adult (RPKM 9.0), ovary adult (RPKM 6.9) and 16 other tissues
Orthologs: human, all

Genomic context

Location: 15; 15 A1
Exon count: 25

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  15   NC_000081.6 (7206120..7398744, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    15   NC_000081.5 (7156120..7348304, complement)

Chromosome 15 - NC_000081.6


Transcript information: This gene has 7 transcripts

Gene: Egflam ENSMUSG00000042961

Description: EGF-like, fibronectin type III and laminin G domains [Source:MGI Symbol;Acc:MGI:2146149]
Gene Synonyms: nectican, pikachurin
Location: Chromosome 15: 7,206,120-7,398,395 reverse strand. GRCm38:CM001008.2
About this gene: This gene has 7 transcripts (splice variants), 195 orthologues, 26 paralogues, is a member of 1 Ensembl protein family and is associated with 5 phenotypes.

Transcripts

Name        Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Egflam-201  ENSMUST00000058593.9  4778  1009aa      ENSMUSP00000055599.3  Protein coding           CCDS27370  Q4VBE4   TSL:1, GENCODE basic, APPRIS P3
Egflam-202  ENSMUST00000096494.4  4391  1017aa      ENSMUSP00000094238.4  Protein coding           CCDS79359  Q4VBE4   TSL:1, GENCODE basic, APPRIS ALT2
Egflam-204  ENSMUST00000160207.7  4420  37aa        ENSMUSP00000125188.1  Nonsense mediated decay  -          E0CXM4   TSL:1
Egflam-205  ENSMUST00000160273.7  3327  No protein  -                     Retained intron          -          -        TSL:1
Egflam-203  ENSMUST00000159726.1  618   No protein  -                     Retained intron          -          -        TSL:2
Egflam-206  ENSMUST00000160314.1  490   No protein  -                     Retained intron          -          -        TSL:5
Egflam-207  ENSMUST00000162105.1  446   No protein  -                     lncRNA                   -          -        TSL:3


[Figure: Ensembl region view of chromosome 15, 7.2 Mb to 7.4 Mb (212.28 kb), showing forward and reverse strands. Forward-strand genes (comprehensive set): Lifr-203 (protein coding), Gm16030-201 (processed pseudogene), Gm16029-201 (lncRNA). Contigs: AC158747.3, AC105969.17. Reverse-strand transcripts (comprehensive set): Egflam-201 and Egflam-202 (protein coding), Egflam-204 (nonsense mediated decay), Egflam-203, Egflam-205 and Egflam-206 (retained intron), Egflam-207 (lncRNA). A Regulatory Build track is included.]

Regulation legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site.

Gene legend: Protein coding (merged Ensembl/Havana; Ensembl protein coding). Non-protein coding (RNA gene; processed transcript; pseudogene).


Transcript: ENSMUST00000096494

[Figure: protein domain and variant view of Egflam-202 (protein coding; reverse strand, 191.77 kb), protein ENSMUSP00000094..., drawn against a scale bar from 0 to 1017 aa.]

Annotation tracks shown in the figure:
Low complexity (Seg)
Cleavage site (Sign...
Superfamily: Fibronectin type III superfamily; Concanavalin A-like lectin/glucanase domain superfamily
SMART: Fibronectin type III; EGF-like domain; EGF-like calcium-binding domain; Laminin G domain
Pfam: Fibronectin type III; Laminin G domain; EGF-like domain
PROSITE profiles: Fibronectin type III; Laminin G domain; EGF-like domain
PROSITE patterns: EGF-like, conserved site
PANTHER: PTHR10574:SF202; PTHR10574
Gene3D: Immunoglobulin-like fold; 2.60.120.200; 2.10.25.10
CDD: Fibronectin type III; cd00054; cd00110
All sequence SNPs/i...: Sequence variants (dbSNP and all other sources)

Variant legend: missense variant; splice region variant; synonymous variant.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
