https://www.alphaknockout.com

Mouse Epb41 Knockout Project (CRISPR/Cas9)

Objective: To create an Epb41 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Epb41 gene (NCBI Reference Sequence: NM_183428; Ensembl: ENSMUSG00000028906) is located on mouse chromosome 4. Twenty-one exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 20 (Transcript: ENSMUST00000030739). Exons 2~5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Homozygotes for a targeted null mutation exhibit moderate hemolytic anemia, erythrocytic abnormalities including aberrant morphology, reduced membrane stability, and lowered expression of and ankyrin.

Exon 2 starts from the coding region. Exons 2~5 cover 32.32% of the coding region. The size of the effective KO region is ~9940 bp. The KO region does not contain any other known gene.
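As a rough cross-check of the 32.32% figure, the sketch below converts it into an approximate number of coding bases. It assumes the 858 aa protein length listed for ENSMUST00000030739 in the transcript table of this report and a coding sequence of three bases per residue plus one stop codon; the function name is hypothetical and the result is only as precise as the rounded percentage.

```python
def coding_bases_covered(protein_aa: int, fraction_percent: float) -> int:
    """Approximate number of coding bases that lie in the targeted exons.

    Assumption: CDS length = 3 bases per residue + 3 bases for the stop codon.
    """
    cds_bp = protein_aa * 3 + 3
    return round(cds_bp * fraction_percent / 100)


# Values taken from this report: 858 aa protein, 32.32% of the coding region.
covered = coding_bases_covered(858, 32.32)
print(covered)      # approximate coding bases in exons 2~5
print(covered % 3)  # a non-zero remainder would be consistent with a frameshift
```

Because the percentage is rounded, the remainder is indicative only; the exact exon boundaries from the Ensembl annotation would be needed to confirm the reading frame.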


Overview of the Targeting Strategy

Figure: wildtype allele drawn 5' to 3', showing exons 1 to 21 with a gRNA region on each side of exons 2~5. Legend: exon of mouse Epb41; knockout region.


Overview of the Dot Plot (up)

Window size: 15 bp

Figure: dot plot of the upstream sequence against itself. Legend: forward; reverse complement.

Note: The 2000 bp section upstream of exon 2 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
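The self-alignment described in the note can be sketched as an exact-match dot plot with a 15 bp window. This is a minimal illustration, not the tool used to produce the report: the function names are hypothetical, only exact window matches are scored, and the input is assumed to contain only A, C, G and T.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 15):
    """Return (forward, reverse) lists of (i, j) exact window matches.

    forward: windows starting at i and j (j > i) are identical.
    reverse: the window at j equals the reverse complement of the window at i.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(len(seq) - window + 1):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j > i)
        reverse.extend((i, j) for j in index.get(revcomp(word), []))
    return forward, reverse


# A 10 bp unit repeated three times produces an off-diagonal run of matches.
fwd, rev = self_dot_plot("ACGGTCATTG" * 3, window=15)
print(len(fwd), fwd[0])
```

In such a plot, runs of forward matches parallel to the main diagonal indicate direct or tandem repeats; an empty or sparse result corresponds to the "no significant tandem repeat" conclusion in the note.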

Overview of the Dot Plot (down)

Window size: 15 bp

Figure: dot plot of the downstream sequence against itself. Legend: forward; reverse complement.

Note: The 2000 bp section downstream of exon 5 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

Figure: GC content distribution along the upstream sequence.

Summary: Full length (2000 bp) | A: 29.0% (580) | C: 19.95% (399) | T: 28.95% (579) | G: 22.1% (442)

Note: The 2000 bp section upstream of exon 2 is analyzed to determine the GC content. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.
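The overall GC content follows directly from the base counts in the summary line, and the windowed distribution can be sketched as below. The function names are hypothetical, and the step between windows is an assumption, since the report states only the 300 bp window size.

```python
def gc_percent_from_counts(a: int, c: int, g: int, t: int) -> float:
    """Overall GC percentage from per-base counts."""
    return 100 * (g + c) / (a + c + g + t)


def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list:
    """GC percentage of each window of the sequence (step size is an assumption)."""
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        values.append(100 * (chunk.count("G") + chunk.count("C")) / window)
    return values


# Base counts copied from the upstream and downstream summary lines of this report.
gc_up = gc_percent_from_counts(a=580, c=399, g=442, t=579)
gc_down = gc_percent_from_counts(a=507, c=376, g=458, t=659)
print(round(gc_up, 2), round(gc_down, 2))

# Toy sequence to show the windowed profile.
print(sliding_gc("GGCCAATT", window=4, step=2))
```

Both flanks come out at roughly 42% GC overall, which agrees with the note that neither contains a pronounced high-GC stretch; the windowed profile is what would reveal a local GC-rich island that the overall figure could hide.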

Overview of the GC Content Distribution (down)

Window size: 300 bp

Figure: GC content distribution along the downstream sequence.

Summary: Full length (2000 bp) | A: 25.35% (507) | C: 18.8% (376) | T: 32.95% (659) | G: 22.9% (458)

Note: The 2000 bp section downstream of exon 5 is analyzed to determine the GC content. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY | SCORE | QSTART | QEND | QSIZE | IDENTITY | CHROM | STRAND | TSTART | TEND | SPAN
YourSeq | 2000 | 1 | 2000 | 2000 | 100.0% | chr4 | - | 132007432 | 132009431 | 2000
YourSeq | 151 | 1594 | 1774 | 2000 | 93.3% | chr5 | + | 150188969 | 150189489 | 521
YourSeq | 147 | 1595 | 1762 | 2000 | 94.7% | chr14 | + | 55209295 | 55209487 | 193
YourSeq | 146 | 1594 | 1758 | 2000 | 95.7% | chr5 | - | 103453069 | 103453252 | 184
YourSeq | 146 | 1593 | 1758 | 2000 | 94.6% | chr13 | + | 109504589 | 109505018 | 430
YourSeq | 144 | 1591 | 1758 | 2000 | 93.5% | chr1 | - | 132929655 | 132929842 | 188
YourSeq | 144 | 1593 | 1759 | 2000 | 94.1% | chr10 | + | 80860263 | 80860448 | 186
YourSeq | 144 | 1594 | 1771 | 2000 | 93.5% | chr10 | + | 4277207 | 4277406 | 200
YourSeq | 142 | 1593 | 1758 | 2000 | 93.4% | chr8 | + | 33754934 | 33755119 | 186
YourSeq | 142 | 1594 | 1756 | 2000 | 94.5% | chr11 | + | 68478232 | 68478417 | 186
YourSeq | 141 | 1593 | 1758 | 2000 | 95.0% | chr5 | + | 147750839 | 147751252 | 414
YourSeq | 141 | 1601 | 1758 | 2000 | 95.0% | chr4 | + | 138089405 | 138089578 | 174
YourSeq | 141 | 1595 | 1760 | 2000 | 93.4% | chr13 | + | 55430469 | 55430653 | 185
YourSeq | 141 | 1594 | 1756 | 2000 | 93.9% | chr10 | + | 94976983 | 94977166 | 184
YourSeq | 140 | 1596 | 1758 | 2000 | 93.8% | chr11 | - | 58308489 | 58308672 | 184
YourSeq | 140 | 1597 | 1759 | 2000 | 93.8% | chr14 | + | 60553372 | 60553555 | 184
YourSeq | 140 | 1593 | 1755 | 2000 | 93.9% | chr1 | + | 39512542 | 39512717 | 176
YourSeq | 139 | 1593 | 1758 | 2000 | 92.7% | chr18 | - | 35911940 | 35912123 | 184
YourSeq | 139 | 1594 | 1758 | 2000 | 92.7% | chr16 | - | 44228605 | 44228785 | 181
YourSeq | 139 | 1593 | 1760 | 2000 | 91.7% | chr5 | + | 144831339 | 144831522 | 184

Note: The 2000 bp section upstream of exon 2 is searched against the genome using BLAT. No significant similarity is found.

BLAT Search Results (down)

QUERY | SCORE | QSTART | QEND | QSIZE | IDENTITY | CHROM | STRAND | TSTART | TEND | SPAN
YourSeq | 2000 | 1 | 2000 | 2000 | 100.0% | chr4 | - | 131995499 | 131997498 | 2000
YourSeq | 49 | 1301 | 1699 | 2000 | 65.6% | chr4 | - | 141627243 | 141627425 | 183
YourSeq | 41 | 1688 | 1758 | 2000 | 95.7% | chr11 | + | 58271111 | 58271392 | 282
YourSeq | 35 | 1719 | 1768 | 2000 | 86.0% | chr4 | + | 55424662 | 55424712 | 51
YourSeq | 34 | 1320 | 1357 | 2000 | 89.2% | chr16 | + | 49753680 | 49753716 | 37
YourSeq | 31 | 1691 | 1729 | 2000 | 94.5% | chr2 | - | 5468029 | 5468069 | 41
YourSeq | 31 | 1621 | 1653 | 2000 | 97.0% | chr12 | + | 80364339 | 80364371 | 33
YourSeq | 30 | 1732 | 1763 | 2000 | 96.9% | chr8 | - | 69898955 | 69898986 | 32
YourSeq | 29 | 1321 | 1353 | 2000 | 90.4% | chr4 | - | 103509011 | 103509042 | 32
YourSeq | 29 | 1689 | 1729 | 2000 | 94.0% | chr4 | - | 67715880 | 67715926 | 47
YourSeq | 29 | 1702 | 1741 | 2000 | 89.2% | chr4 | - | 9515531 | 9515571 | 41
YourSeq | 29 | 1697 | 1729 | 2000 | 94.0% | chr10 | + | 38723570 | 38723602 | 33
YourSeq | 27 | 1709 | 1751 | 2000 | 71.9% | chr2 | - | 16935568 | 16935601 | 34
YourSeq | 26 | 1704 | 1729 | 2000 | 100.0% | chr7 | - | 93423806 | 93423831 | 26
YourSeq | 25 | 1694 | 1721 | 2000 | 84.7% | chr17 | - | 9973176 | 9973201 | 26
YourSeq | 25 | 1323 | 1349 | 2000 | 96.3% | chr5 | + | 46571651 | 46571677 | 27
YourSeq | 24 | 1700 | 1726 | 2000 | 84.0% | chr14 | - | 71903226 | 71903250 | 25
YourSeq | 24 | 505 | 536 | 2000 | 96.2% | chr10 | + | 9273614 | 9273646 | 33
YourSeq | 23 | 1704 | 1729 | 2000 | 96.0% | chr1 | - | 37658256 | 37658283 | 28
YourSeq | 23 | 1703 | 1728 | 2000 | 83.4% | chr18 | + | 35894125 | 35894148 | 24

Note: The 2000 bp section downstream of exon 5 is searched against the genome using BLAT. No significant similarity is found.
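The two full-length BLAT hits also give the genomic positions of the flanking sections, so the distance between them can be compared with the stated KO region size. The sketch below is illustrative: the `BlatHit` type and both function names are hypothetical, the rows are the top hits copied from the two tables, and the target coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(row: str) -> BlatHit:
    """Parse one whitespace-separated BLAT result row (query name in field 0)."""
    f = row.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def gap_between(a: BlatHit, b: BlatHit) -> int:
    """Bases strictly between two non-overlapping hits on the same chromosome."""
    if a.chrom != b.chrom:
        raise ValueError("hits are on different chromosomes")
    first, second = sorted((a, b), key=lambda h: h.t_start)
    return second.t_start - first.t_end - 1


# Top (self) hits of the upstream and downstream 2000 bp sections.
up = parse_hit("YourSeq 2000 1 2000 2000 100.0% chr4 - 132007432 132009431 2000")
down = parse_hit("YourSeq 2000 1 2000 2000 100.0% chr4 - 131995499 131997498 2000")
print(gap_between(up, down))
```

Under these assumptions the flanks are separated by 9933 bp, close to the ~9940 bp effective KO region quoted in the strategy summary; the exact deleted span depends on the gRNA cut sites, which this report does not list.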


Gene information:

Epb41 erythrocyte membrane protein band 4.1 [ Mus musculus (house mouse) ]
Gene ID: 269587, updated on 24-Oct-2019

Gene summary

Official Symbol: Epb41 (provided by MGI)
Official Full Name: erythrocyte membrane protein band 4.1 (provided by MGI)
Primary source: MGI:MGI:95401
See related: Ensembl:ENSMUSG00000028906
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 4.1R; Elp1; Elp-1; Epb4.1; AI415518; mKIAA4056; D4Ertd442e
Expression: Ubiquitous expression in liver E14.5 (RPKM 23.5), liver E14 (RPKM 23.4) and 27 other tissues
Orthologs: human; all

Genomic context

Location: 4 D2.3; 4 64.54 cM
Exon count: 26

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 4 | NC_000070.6 (131923413..132075579, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 4 | NC_000070.5 (131479328..131631228, complement)

Chromosome 4 - NC_000070.6


Transcript information: This gene has 20 transcripts

Gene: Epb41 ENSMUSG00000028906

Description: erythrocyte membrane protein band 4.1 [Source:MGI Symbol;Acc:MGI:95401]
Gene Synonyms: 4.1R, D4Ertd442e, Elp-1, Elp1, Epb4.1
Location: Chromosome 4: 131,923,413-132,075,321 reverse strand. GRCm38:CM000997.2
About this gene: This gene has 20 transcripts (splice variants), 252 orthologues, 11 paralogues, is a member of 1 Ensembl protein family and is associated with 24 phenotypes.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Epb41-201 | ENSMUST00000030739.10 | 6565 | 858aa | ENSMUSP00000030739.4 | Protein coding | CCDS18717 | P48193 | TSL:5 GENCODE basic APPRIS P3
Epb41-208 | ENSMUST00000105981.8 | 5303 | 858aa | ENSMUSP00000101601.2 | Protein coding | CCDS18717 | P48193 | TSL:5 GENCODE basic APPRIS P3
Epb41-203 | ENSMUST00000084253.9 | 5141 | 804aa | ENSMUSP00000081274.3 | Protein coding | CCDS51317 | P48193 | TSL:1 GENCODE basic APPRIS ALT2
Epb41-202 | ENSMUST00000054917.11 | 4948 | 804aa | ENSMUSP00000060375.5 | Protein coding | CCDS51317 | P48193 | TSL:5 GENCODE basic APPRIS ALT2
Epb41-205 | ENSMUST00000105972.7 | 2577 | 858aa | ENSMUSP00000101592.1 | Protein coding | CCDS18717 | P48193 | TSL:5 GENCODE basic APPRIS P3
Epb41-219 | ENSMUST00000212761.1 | 8302 | 476aa | ENSMUSP00000148515.1 | Protein coding | - | A0A1D5RLV1 | CDS 5' incomplete TSL:1
Epb41-207 | ENSMUST00000105975.7 | 5235 | 869aa | ENSMUSP00000101595.1 | Protein coding | - | A2A841 | TSL:5 GENCODE basic APPRIS ALT2
Epb41-215 | ENSMUST00000146021.7 | 5074 | 594aa | ENSMUSP00000158714.1 | Protein coding | - | - | TSL:1 GENCODE basic APPRIS ALT2
Epb41-204 | ENSMUST00000105970.7 | 5059 | 639aa | ENSMUSP00000101590.1 | Protein coding | - | A2A839 | TSL:5 GENCODE basic APPRIS ALT2
Epb41-206 | ENSMUST00000105974.7 | 4844 | 769aa | ENSMUSP00000101594.1 | Protein coding | - | A2A842 | TSL:5 GENCODE basic
Epb41-212 | ENSMUST00000137846.7 | 4531 | 667aa | ENSMUSP00000123623.1 | Protein coding | - | A2A838 | CDS 5' incomplete TSL:5
Epb41-210 | ENSMUST00000135579.7 | 3654 | 375aa | ENSMUSP00000121764.1 | Protein coding | - | F6S4K9 | CDS 5' incomplete TSL:1
Epb41-216 | ENSMUST00000146443.7 | 3278 | 250aa | ENSMUSP00000122234.1 | Protein coding | - | F7BUB8 | CDS 5' incomplete TSL:5
Epb41-218 | ENSMUST00000155990.7 | 3120 | 197aa | ENSMUSP00000116599.1 | Protein coding | - | F7CR30 | CDS 5' incomplete TSL:3
Epb41-213 | ENSMUST00000141291.2 | 2832 | 767aa | ENSMUSP00000120236.2 | Protein coding | - | A2AD32 | TSL:5 GENCODE basic
Epb41-220 | ENSMUST00000238852.1 | 2375 | No protein | - | Retained intron | - | - | -
Epb41-214 | ENSMUST00000144754.7 | 5946 | No protein | - | lncRNA | - | - | TSL:1
Epb41-217 | ENSMUST00000151746.7 | 4458 | No protein | - | lncRNA | - | - | TSL:1
Epb41-209 | ENSMUST00000131953.1 | 644 | No protein | - | lncRNA | - | - | TSL:5
Epb41-211 | ENSMUST00000136761.1 | 607 | No protein | - | lncRNA | - | - | TSL:2

Figure: Ensembl region view covering 171.91 kb of chromosome 4 (131.92 Mb to 132.08 Mb). Forward strand: Tmem200b-201 and Tmem200b-202 (protein coding), Gm13063-201 and Gm10300-201 (lncRNA). Contigs: AL607088.26, AL669981.11. Reverse strand: the 20 Epb41 transcripts (Epb41-201 to Epb41-220), Gm12992-201 and Gm12992-202 (lncRNA), AL607088.1-201 (lncRNA), Gm27445-201 (miRNA). Tracks: Genes (Comprehensive set), Regulatory Build.

Regulation legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site.

Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (processed transcript; RNA gene).


Transcript: ENSMUST00000030739

Figure: protein domain view of Epb41-201 (protein coding; reverse strand, 148.67 kb) for ENSMUSP00000030..., on a scale of 0 to 858 residues. Annotation tracks:

MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils)
Superfamily: SSF50729; Ubiquitin-like domain superfamily; FERM superfamily, second domain
SMART: Band 4.1 domain; FERM adjacent (FA); FERM, C-terminal PH-like domain
Prints: Ezrin/radixin/moesin-like; Band 4.1 domain
Pfam: FERM, N-terminal; FERM, C-terminal PH-like domain; SAB domain; Band 4.1, C-terminal; FERM central domain; FERM adjacent (FA)
PROSITE profiles: FERM domain
PROSITE patterns: FERM conserved site
PIRSF: PIRSF002304
PANTHER: Band 4.1 protein, chordates; PTHR23280
Gene3D: 3.10.20.90; FERM/acyl-CoA-binding protein superfamily; PH-like domain superfamily
CDD: cd17105; FERM central domain; cd13184
Sequence variants (dbSNP and all other sources). Variant legend: missense variant; synonymous variant.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
