https://www.alphaknockout.com

Mouse Hpn Knockout Project (CRISPR/Cas9)

Objective: To create an Hpn knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Hpn gene (NCBI Reference Sequence: NM_001110252; Ensembl: ENSMUSG00000001249) is located on mouse chromosome 7. Fourteen exons have been identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 14 (Transcript: ENSMUST00000108102). Exons 2~5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a null mutation are hypothyroidic and develop profound hearing loss associated with structural changes in the tectorial membrane and a myelination defect affecting the compaction of spiral ganglion neurons.

The coding region starts in exon 2. Exons 2~5 cover 16.59% of the coding region. The size of the effective KO region is ~5780 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3' with exons 1, 2, 3, 4, 5 and 14 labeled and two gRNA regions marked. Legend: exon of mouse Hpn; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 391 bp section upstream of Exon 2 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
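The self-alignment described in the note can be illustrated with a minimal sketch. It assumes exact word matching with the 15 bp window stated above; the tool actually used for the report is not named in this document, and the function names and the toy sequence below are hypothetical.

```python
def self_dot_plot(seq: str, window: int = 15):
    """Return sorted (i, j) pairs with i < j where the window-length
    words starting at positions i and j are identical (off-diagonal dots)."""
    positions = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)
    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


def offset_counts(hits):
    """Count dots per diagonal offset (j - i). A heavily populated
    offset suggests a tandem repeat with that period."""
    counts = {}
    for i, j in hits:
        counts[j - i] = counts.get(j - i, 0) + 1
    return counts


# Hypothetical 16 bp unit, used only to exercise the functions.
unit = "ACGTTGCAAGGCTTAC"
repeat_hits = self_dot_plot(unit * 3, window=15)
repeat_offsets = offset_counts(repeat_hits)   # dots line up at offsets 16 and 32
unique_hits = self_dot_plot(unit, window=15)  # no off-diagonal dots
```

A sequence without tandem repeats, like the regions described in this report, would produce few or no off-diagonal dots at this window size.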

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section downstream of Exon 5 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up) Window size: 300 bp


Summary: Full Length(391bp) | A(18.93% 74) | C(37.6% 147) | T(20.72% 81) | G(22.76% 89)

Note: The 391 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
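As a quick arithmetic check, the percentages in the summary line can be recomputed from the reported base counts. This is a small sketch; the function name is hypothetical and only the counts come from the summary above.

```python
def composition_percent(counts):
    """Convert per-base counts into percentages rounded to two decimals."""
    total = sum(counts.values())
    return {base: round(100 * n / total, 2) for base, n in counts.items()}


# Base counts reported for the 391 bp upstream section.
upstream_counts = {"A": 74, "C": 147, "T": 81, "G": 89}

upstream_total = sum(upstream_counts.values())
upstream_pct = composition_percent(upstream_counts)

# Overall GC percentage of the section, derived from the same counts.
upstream_gc_pct = round(
    100 * (upstream_counts["G"] + upstream_counts["C"]) / upstream_total, 2
)
```

The counts sum to the stated full length of 391 bp, and the recomputed percentages agree with the summary line.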

Overview of the GC Content Distribution (down) Window size: 300 bp


Summary: Full Length(2000bp) | A(25.9% 518) | C(23.75% 475) | T(26.4% 528) | G(23.95% 479)

Note: The 2000 bp section downstream of Exon 5 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
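A windowed GC profile like the one summarized above can be sketched as follows. The 300 bp window comes from the figure heading; the step of 1 bp, the function name, and the toy sequence are assumptions made for illustration, since the report does not state how the plot was computed.

```python
def gc_windows(seq: str, window: int = 300):
    """GC fraction of every window-length slice of seq, sliding by 1 bp."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    gc = sum(1 for base in seq[:window] if base in "GC")
    fractions = [gc / window]
    for i in range(1, len(seq) - window + 1):
        gc -= seq[i - 1] in "GC"            # base leaving the window
        gc += seq[i + window - 1] in "GC"   # base entering the window
        fractions.append(gc / window)
    return fractions


# Toy 800 bp sequence: an AT-only half followed by a GC-only half.
toy = "AT" * 200 + "GC" * 200
profile = gc_windows(toy, window=300)

# Overall GC fraction of the downstream section from its reported counts.
downstream_gc = (475 + 479) / 2000
```

A region with a pronounced GC-rich stretch would show windows approaching 1.0, as the second half of the toy sequence does.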


BLAT Search Results (up)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  391    1      391  391    100.0%    chr7   -       31114691   31115081   391
YourSeq  39     54     335  391    62.8%     chr16  +       67030256   67030439   184
YourSeq  29     333    377  391    96.8%     chr13  +       33833013   33833502   490
YourSeq  28     54     85   391    96.7%     chrX   +       129715992  129716441  450
YourSeq  25     78     116  391    70.4%     chr16  -       41535977   41536006   30
YourSeq  23     304    330  391    96.0%     chr8   +       104241905  104241936  32
YourSeq  22     324    346  391    100.0%    chr15  -       12757609   12757632   24
YourSeq  22     62     85   391    95.9%     chrX   +       73052570   73052593   24
YourSeq  21     369    389  391    100.0%    chr1   -       40475302   40475322   21
YourSeq  21     64     86   391    95.7%     chrX   +       46437325   46437347   23
YourSeq  21     65     85   391    100.0%    chrX   +       44809215   44809235   21
YourSeq  20     66     85   391    100.0%    chr6   -       56225676   56225695   20
YourSeq  20     66     85   391    100.0%    chr5   -       109849781  109849800  20
YourSeq  20     66     85   391    100.0%    chr3   -       55107160   55107179   20
YourSeq  20     66     85   391    100.0%    chr16  -       58803799   58803818   20
YourSeq  20     66     85   391    100.0%    chr14  -       84308991   84309010   20
YourSeq  20     66     85   391    100.0%    chr9   +       76106469   76106488   20
YourSeq  20     66     85   391    100.0%    chr9   +       33659534   33659553   20
YourSeq  20     66     85   391    100.0%    chr4   +       145552369  145552388  20
YourSeq  20     66     85   391    100.0%    chr3   +       93599096   93599115   20

Note: The 391 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr7   -       31106965   31108964   2000
YourSeq  177    1622   1872  2000   94.9%     chr5   -       142719353  142719803  451
YourSeq  166    1687   1873  2000   94.7%     chr5   +       140550327  140681807  131481
YourSeq  165    1687   1873  2000   95.1%     chr7   -       30304581   30304815   235
YourSeq  165    1687   1873  2000   95.1%     chr7   -       30304317   30304551   235
YourSeq  165    1687   1873  2000   95.1%     chr7   -       28705782   29066004   360223
YourSeq  164    1687   1873  2000   94.2%     chr7   -       19757391   19757733   343
YourSeq  164    1687   1873  2000   91.9%     chr7   -       19351310   19351492   183
YourSeq  164    1687   1873  2000   91.8%     chr7   +       29326897   29327078   182
YourSeq  164    1687   1873  2000   91.8%     chr5   +       140011976  140012157  182
YourSeq  164    1687   1873  2000   91.9%     chr5   +       139052733  139052915  183
YourSeq  163    1687   1873  2000   91.8%     chr7   -       18945901   18946083   183
YourSeq  163    1687   1873  2000   94.5%     chr5   -       144638009  144638451  443
YourSeq  163    1687   1873  2000   94.6%     chr5   -       136703002  136703237  236
YourSeq  163    1687   1873  2000   92.8%     chr5   +       136768556  136768738  183
YourSeq  162    1687   1873  2000   94.6%     chr7   -       28202576   28202864   289
YourSeq  162    1687   1873  2000   94.6%     chr7   -       16180986   16181328   343
YourSeq  162    1687   1873  2000   91.3%     chr7   -       16264340   16264522   183
YourSeq  162    1690   1873  2000   95.0%     chr5   -       137707666  137707898   233
YourSeq  162    1687   1873  2000   91.3%     chr5   -       136850350  136850532   183

Note: The 2000 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.
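The BLAT rows above can be read programmatically. The sketch below parses three of the downstream rows exactly as listed, checks that SPAN equals target END minus START plus 1, and computes how much of the query each hit covers. The `BlatHit` type and the helper names are hypothetical, and the sketch applies no significance threshold of its own.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float   # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated result row; the first field is the query name."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def query_fraction(hit: BlatHit) -> float:
    """Fraction of the query sequence covered by the hit."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr7 - 31106965 31108964 2000",
    "YourSeq 177 1622 1872 2000 94.9% chr5 - 142719353 142719803 451",
    "YourSeq 166 1687 1873 2000 94.7% chr5 + 140550327 140681807 131481",
]
hits = [parse_hit(r) for r in rows]

# SPAN should equal the inclusive length of the target interval.
spans_consistent = all(h.t_end - h.t_start + 1 == h.span for h in hits)

# Every hit other than the full-length self match.
secondary = [h for h in hits if query_fraction(h) < 1.0]
```

In these rows the first hit is the query matching its own locus on chr7, and the remaining hits cover only part of the query.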


Gene and information: Hpn hepsin [ Mus musculus (house mouse) ] Gene ID: 15451, updated on 12-Aug-2019

Gene summary

Official Symbol: Hpn (provided by MGI)
Official Full Name: hepsin (provided by MGI)
Primary source: MGI:MGI:1196620
See related: Ensembl:ENSMUSG00000001249
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Hlb320
Summary: This gene encodes a type II transmembrane serine protease that may function in diverse processes, including regulation of cell growth. Deficiency in this gene results in hearing loss. The protein is cleaved into a catalytic serine protease chain and a non-catalytic scavenger receptor cysteine-rich chain, which associate via a single disulfide bond. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. [provided by RefSeq, Jan 2013]
Expression: Biased expression in liver adult (RPKM 275.8), kidney adult (RPKM 197.4) and 4 other tissues
Orthologs: human, all

Genomic context

Location: 7; 7 B1
Exon count: 15

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  7    NC_000073.6 (31098725..31115326, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    7    NC_000073.5 (31883744..31900309, complement)
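The coordinates in this report can be cross-checked with simple interval arithmetic. The sketch below computes the inclusive length of the Hpn locus on each assembly and tests whether the two screening regions fall inside the GRCm38 locus. It assumes that the chr7 coordinates of the top BLAT hits are on the GRCm38 assembly, which the report does not state explicitly; the helper names are hypothetical.

```python
def span_bp(interval):
    """Inclusive length of a (start, end) interval in bp."""
    start, end = interval
    return end - start + 1


def contains(outer, inner):
    """True if inner lies entirely within outer (both inclusive)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Hpn locus as listed in the genomic context table.
hpn_grcm38 = (31098725, 31115326)    # NC_000073.6, complement
hpn_mgscv37 = (31883744, 31900309)   # NC_000073.5, complement

# Top BLAT hits for the screening regions (assumed to be GRCm38 coordinates).
upstream_region = (31114691, 31115081)    # 391 bp upstream of exon 2
downstream_region = (31106965, 31108964)  # 2000 bp downstream of exon 5

locus_lengths = (span_bp(hpn_grcm38), span_bp(hpn_mgscv37))
regions_inside = (contains(hpn_grcm38, upstream_region),
                  contains(hpn_grcm38, downstream_region))
```

Because the gene lies on the reverse strand, the region upstream of exon 2 has higher chromosome coordinates than the region downstream of exon 5.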

Chromosome 7 - NC_000073.6


Transcript information: This gene has 11 transcripts

Gene: Hpn ENSMUSG00000001249

Description: hepsin [Source:MGI Symbol;Acc:MGI:1196620]
Gene Synonyms: Hlb320
Location: Chromosome 7: 31,098,725-31,115,290 reverse strand. GRCm38:CM001000.2
About this gene: This gene has 11 transcripts (splice variants), 179 orthologues, 20 paralogues, is a member of 1 Ensembl protein family and is associated with 18 phenotypes.

Transcripts

Name     Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Hpn-202  ENSMUST00000108102.8   1830  436aa       ENSMUSP00000103737.2  Protein coding           CCDS52188  O35453      TSL:1, GENCODE basic
Hpn-209  ENSMUST00000168884.7   1762  416aa       ENSMUSP00000131658.1  Protein coding           CCDS52187  G3UWE8      TSL:1, GENCODE basic, APPRIS P1
Hpn-201  ENSMUST00000039435.14  1743  445aa       ENSMUSP00000038149.8  Protein coding           CCDS71936  E9Q5P0      TSL:1, GENCODE basic
Hpn-204  ENSMUST00000164929.2   1110  82aa        ENSMUSP00000127229.1  Protein coding           -          E9Q3X9      CDS 3' incomplete, TSL:3
Hpn-211  ENSMUST00000171259.1   621   179aa       ENSMUSP00000132307.1  Protein coding           -          F6W6S4      CDS 5' incomplete, TSL:5
Hpn-210  ENSMUST00000171225.1   469   32aa        ENSMUSP00000130966.1  Protein coding           -          F7C9T6      CDS 5' incomplete, TSL:3
Hpn-205  ENSMUST00000165124.7   1861  137aa       ENSMUSP00000145624.1  Nonsense mediated decay  -          A0A0U1RNM3  TSL:2
Hpn-203  ENSMUST00000164340.1   920   No protein  -                     Retained intron          -          -           TSL:5
Hpn-207  ENSMUST00000167719.7   679   No protein  -                     Retained intron          -          -           TSL:3
Hpn-206  ENSMUST00000165480.1   541   No protein  -                     Retained intron          -          -           TSL:3
Hpn-208  ENSMUST00000168623.1   351   No protein  -                     Retained intron          -          -           TSL:3


[Figure: Ensembl genome browser view of a 36.57 kb region (31.09 Mb to 31.12 Mb), contig AC158993.2. Genes track (Comprehensive set), all on the reverse strand: Hpn-201, Hpn-202, Hpn-204, Hpn-209, Hpn-210, Hpn-211 (protein coding); Hpn-205 (nonsense mediated decay); Hpn-203, Hpn-206, Hpn-207, Hpn-208 (retained intron); Scn1b-201, Scn1b-203 (protein coding); Scn1b-202 (retained intron). Regulatory Build track. Regulation legend: CTCF; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (processed transcript).]


Transcript: ENSMUST00000108102

[Figure: Ensembl protein summary for Hpn-202 (protein coding; reverse strand, 16.57 kb), protein ENSMUSP00000103..., scale 0 to 436 aa. Annotation sources shown: Superfamily, SMART, Prints, PROSITE profiles, PROSITE patterns, PANTHER, Gene3D, CDD. Annotated features: Transmembrane heli...; MobiDB lite; SRCR-like domain superfamily; Peptidase S1, PA clan; SRCR-like domain; Serine proteases, trypsin domain; Peptidase S1A, chymotrypsin family; Hepsin, SRCR domain; Serine proteases, trypsin family, histidine active site; Serine proteases, trypsin family, serine active site; Hepsin (PTHR24253); 2.40.10.10. Variant tracks: sequence variants (dbSNP and all other sources). Variant legend: missense variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
