https://www.alphaknockout.com

Mouse Hpn Conditional Knockout Project (CRISPR/Cas9)

Objective: To create an Hpn conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Hpn gene (NCBI Reference Sequence: NM_001110252; Ensembl: ENSMUSG00000001249) is located on mouse chromosome 7. 14 exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 14 (Transcript: ENSMUST00000108102). Exon 2 is selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Hpn gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-29F19 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a null mutation are hypothyroid and develop profound hearing loss associated with structural changes in the tectorial membrane and a myelination defect affecting the compaction of spiral ganglion neurons.

Exon 2 starts from about 100% of the coding region. Knockout of exon 2 will result in a frameshift of the gene. The size of intron 1 for 5'-loxP site insertion is 391 bp, and the size of intron 2 for 3'-loxP site insertion is 3904 bp. The size of the effective cKO region is ~458 bp. The cKO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: targeting strategy schematic with four panels: wildtype allele (5' and 3' gRNA regions marked; exon labels 5, 1, 2 and 14), targeting vector, targeted allele, and constitutive KO allele (after Cre recombination). Legend: exon of mouse Scn1b, exon of mouse Hpn, homology arm, cKO region, loxP site.]
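The step from the targeted allele to the constitutive KO allele can be illustrated with a minimal Python sketch. It assumes two loxP sites in the same orientation flanking the cKO region, so that Cre recombination removes the intervening sequence and leaves a single loxP site. The arm and exon strings are toy placeholders, not Hpn sequence, and `cre_excise` is a hypothetical helper name.

```python
# Canonical 34 bp loxP site: 13 bp inverted repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str) -> str:
    """Return the allele after Cre recombination between two direct-repeat loxP sites.

    Simplified model: only the first two forward-orientation sites are considered.
    """
    first = allele.find(LOXP)
    if first == -1:
        return allele  # no loxP site at all
    second = allele.find(LOXP, first + len(LOXP))
    if second == -1:
        return allele  # only one loxP site: nothing to excise
    # Keep everything up to and including the first loxP, then resume after the second.
    return allele[: first + len(LOXP)] + allele[second + len(LOXP):]


# Toy placeholder sequences (assumptions for illustration only).
five_arm = "ACGTACGTAC"
ckO_region = "ATGGCCAAGG"
three_arm = "TTGACCTGAA"

targeted_allele = five_arm + LOXP + ckO_region + LOXP + three_arm
ko_allele = cre_excise(targeted_allele)

print(ko_allele == five_arm + LOXP + three_arm)  # True
```

The model ignores inverted loxP orientation and more than two sites, which would need separate handling.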


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence aligned with itself (axis label: Sequence 12); forward and reverse-complement matches are plotted.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
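As a rough illustration of how a self-alignment with a 10 bp window can expose repeats, the sketch below indexes every 10-mer of a sequence and reports pairs of positions that share the same word. It is not the tool used for the report: it covers the forward strand only, requires exact matches, and the input is a toy sequence. The function name `self_dot_plot` is hypothetical.

```python
from collections import defaultdict


def self_dot_plot(seq: str, window: int = 10):
    """Return sorted (i, j) pairs, i < j, whose window-length words are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    hits = []
    for starts in positions.values():
        # Every pair of occurrences of the same word is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


# Toy sequence: a 12 bp unit repeated twice, followed by unrelated bases.
toy = "ACGTTGCAAGCT" * 2 + "GGATCCATTA"
hits = self_dot_plot(toy, window=10)
print(hits)  # [(0, 12), (1, 13), (2, 14)]
```

Dots that line up at a constant offset (here 12) indicate a tandem repeat; an empty list corresponds to a plot with only the main diagonal.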

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence (axis label: Sequence 12).]

Summary: Full length (6903 bp) | A (21.4%, 1477) | C (26.81%, 1851) | T (22.45%, 1550) | G (29.34%, 2025)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found, so this targeting vector may be difficult to construct.
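The summary figures can be cross-checked with a few lines of Python. The base counts are taken from the summary line above; the overall GC percentage is derived here and is not stated in the report. The `sliding_gc` helper is a hypothetical sketch of a windowed GC profile (the report uses a 300 bp window); it is shown on a toy sequence because the full 6903 bp sequence is not reproduced in this document.

```python
# Base counts as given in the summary line.
counts = {"A": 1477, "C": 1851, "T": 1550, "G": 2025}

total = sum(counts.values())
percent = {base: round(100 * n / total, 2) for base, n in counts.items()}
gc_percent = round(100 * (counts["G"] + counts["C"]) / total, 2)

print(total)       # 6903
print(percent)     # {'A': 21.4, 'C': 26.81, 'T': 22.45, 'G': 29.34}
print(gc_percent)  # overall GC content in percent (derived, about 56%)


def sliding_gc(seq: str, window: int = 300, step: int = 1):
    """Return the GC fraction of each window of the given size along seq."""
    fractions = []
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


# Toy example with a short window, for illustration only.
print(sliding_gc("GGCCAATT", window=4, step=2))  # [1.0, 0.5, 0.0]
```

The four counts sum to the stated full length, and the rounded percentages match the summary line.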


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  3000   1       3000  3000   100.0%    chr7   -       31114886   31117885   3000
YourSeq  31     1715    1757  3000   86.1%     chr1   -       6531240    6531282    43
YourSeq  24     1859    1882  3000   100.0%    chr2   +       33066576   33066599   24
YourSeq  22     1667    1691  3000   96.0%     chr12  -       71577866   71577892   27
YourSeq  22     521     542   3000   100.0%    chr10  -       123092421  123092442  22
YourSeq  22     991     1016  3000   92.4%     chr13  +       42711886   42711911   26
YourSeq  21     2555    2575  3000   100.0%    chr6   -       94700416   94700436   21
YourSeq  21     2562    2582  3000   100.0%    chr5   +       141079461  141079481  21

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  3000   1       3000  3000   100.0%    chr7   -       31111428   31114427   3000
YourSeq  207    1437    1716  3000   94.5%     chr7   -       16181058   16264640   83583
YourSeq  200    1490    1716  3000   94.8%     chr7   -       30304283   30304993   711
YourSeq  200    1490    1848  3000   89.8%     chr5   +       140011957  140012417  461
YourSeq  200    1490    1743  3000   92.4%     chr5   +       130480767  130481210  444
YourSeq  197    1490    1717  3000   94.3%     chr7   +       24375294   24375583   290
YourSeq  197    1491    1763  3000   93.5%     chr5   +       136649813  136650393  581
YourSeq  196    1490    1716  3000   93.9%     chr7   +       28553753   28638775   85023
YourSeq  195    1487    1716  3000   93.4%     chr5   +       144034262  144034497  236
YourSeq  193    1490    1716  3000   95.0%     chr7   -       19763075   19867624   104550
YourSeq  193    1490    1716  3000   94.6%     chr7   -       6201815    6202050    236
YourSeq  192    1490    1716  3000   95.0%     chr5   +       134001482  134375302  373821
YourSeq  191    1490    1716  3000   92.9%     chr7   -       35610942   35611330   389
YourSeq  191    1490    1716  3000   94.5%     chr7   -       19351329   19351564   236
YourSeq  191    1490    1716  3000   94.5%     chr7   +       29003994   29004229   236
YourSeq  191    1491    1716  3000   94.5%     chr7   +       27297003   27390425   93423
YourSeq  191    1490    1716  3000   92.5%     chr7   +       25111469   25111701   233
YourSeq  191    1490    1718  3000   94.1%     chr5   +       142037560  142330923  293364
YourSeq  191    1490    1716  3000   94.5%     chr5   +       140484087  140484322  236
YourSeq  190    1490    1716  3000   92.6%     chr7   -       35631982   35632217   236

Note: The 3000 bp section downstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
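The top hits of the two BLAT searches also allow a quick consistency check on the stated cKO region size. The sketch below assumes the genomic coordinates are 1-based and inclusive, which agrees with the reported spans (end - start + 1 = 3000). Because Hpn lies on the reverse strand, the upstream flank has the higher chr7 coordinates. The variable names are illustrative, and the "best secondary score" lines only restate the second row of each result list.

```python
# Top (self) hits of the two 3000 bp queries, as listed in the BLAT results.
upstream = {"chrom": "chr7", "strand": "-", "start": 31114886, "end": 31117885}
downstream = {"chrom": "chr7", "strand": "-", "start": 31111428, "end": 31114427}


def span(hit) -> int:
    """Length of a hit, assuming 1-based inclusive coordinates."""
    return hit["end"] - hit["start"] + 1


# Bases lying strictly between the downstream and upstream flanks.
gap_start = downstream["end"] + 1
gap_end = upstream["start"] - 1
gap_length = gap_end - gap_start + 1

print(span(upstream), span(downstream))  # 3000 3000
print(gap_start, gap_end, gap_length)    # 31114428 31114885 458

# Best non-self score in each result list, expressed relative to the query size.
best_secondary = {"up": 31, "down": 207}
for name, score in best_secondary.items():
    print(name, f"{100 * score / 3000:.1f}% of query size")
```

The 458 bp interval between the two flanks matches the effective cKO region size of ~458 bp given in the strategy summary.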


Gene and information: Hpn hepsin [ Mus musculus (house mouse) ] Gene ID: 15451, updated on 12-Aug-2019

Gene summary

Official Symbol: Hpn (provided by MGI)
Official Full Name: hepsin (provided by MGI)
Primary source: MGI:MGI:1196620
See related: Ensembl:ENSMUSG00000001249
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Hlb320
Summary: This gene encodes a type II transmembrane serine protease that may function in diverse processes, including regulation of cell growth. Deficiency in this gene results in hearing loss. The protein is cleaved into a catalytic serine protease chain and a non-catalytic scavenger receptor cysteine-rich chain, which associate via a single disulfide bond. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. [provided by RefSeq, Jan 2013]
Expression: Biased expression in liver adult (RPKM 275.8), kidney adult (RPKM 197.4) and 4 other tissues
Orthologs: human, all

Genomic context

Location: 7; 7 B1

Exon count: 15

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  7    NC_000073.6 (31098725..31115326, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    7    NC_000073.5 (31883744..31900309, complement)

Chromosome 7 - NC_000073.6


Transcript information: This gene has 11 transcripts

Gene: Hpn ENSMUSG00000001249

Description: hepsin [Source:MGI Symbol;Acc:MGI:1196620]
Gene Synonyms: Hlb320
Location: Chromosome 7: 31,098,725-31,115,290 reverse strand. GRCm38:CM001000.2
About this gene: This gene has 11 transcripts (splice variants), 179 orthologues, 20 paralogues, is a member of 1 Ensembl protein family and is associated with 18 phenotypes.

Transcripts

Name     Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Hpn-202  ENSMUST00000108102.8   1830  436aa       ENSMUSP00000103737.2  Protein coding           CCDS52188  O35453      TSL:1, GENCODE basic
Hpn-209  ENSMUST00000168884.7   1762  416aa       ENSMUSP00000131658.1  Protein coding           CCDS52187  G3UWE8      TSL:1, GENCODE basic, APPRIS P1
Hpn-201  ENSMUST00000039435.14  1743  445aa       ENSMUSP00000038149.8  Protein coding           CCDS71936  E9Q5P0      TSL:1, GENCODE basic
Hpn-204  ENSMUST00000164929.2   1110  82aa        ENSMUSP00000127229.1  Protein coding           -          E9Q3X9      CDS 3' incomplete, TSL:3
Hpn-211  ENSMUST00000171259.1   621   179aa       ENSMUSP00000132307.1  Protein coding           -          F6W6S4      CDS 5' incomplete, TSL:5
Hpn-210  ENSMUST00000171225.1   469   32aa        ENSMUSP00000130966.1  Protein coding           -          F7C9T6      CDS 5' incomplete, TSL:3
Hpn-205  ENSMUST00000165124.7   1861  137aa       ENSMUSP00000145624.1  Nonsense mediated decay  -          A0A0U1RNM3  TSL:2
Hpn-203  ENSMUST00000164340.1   920   No protein  -                     Retained intron          -          -           TSL:5
Hpn-207  ENSMUST00000167719.7   679   No protein  -                     Retained intron          -          -           TSL:3
Hpn-206  ENSMUST00000165480.1   541   No protein  -                     Retained intron          -          -           TSL:3
Hpn-208  ENSMUST00000168623.1   351   No protein  -                     Retained intron          -          -           TSL:3


[Figure: Ensembl genomic view, 36.57 kb, chromosome 7 from about 31.09 Mb to 31.12 Mb; contig AC158993.2. Reverse-strand transcripts shown: Hpn-202, Hpn-209, Hpn-201, Hpn-211, Hpn-204 and Hpn-210 (protein coding); Hpn-205 (nonsense mediated decay); Hpn-203, Hpn-206, Hpn-207 and Hpn-208 (retained intron); Scn1b-201 and Scn1b-203 (protein coding); Scn1b-202 (retained intron). A Regulatory Build track is included. Regulation legend: CTCF, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana), non-protein coding (processed transcript).]


Transcript: ENSMUST00000108102

[Figure: Ensembl protein domain view of Hpn-202 (protein coding; reverse strand, 16.57 kb; protein ENSMUSP00000103...; scale bar 0 to 436 aa). Annotation tracks and features recoverable from the figure:
Transmembrane helices; MobiDB lite
Superfamily: SRCR-like domain superfamily; Peptidase S1, PA clan
SMART: SRCR-like domain; Serine proteases, trypsin domain
Prints: Peptidase S1A, chymotrypsin family
(track label not recoverable): Hepsin, SRCR domain; Serine proteases, trypsin domain
PROSITE profiles: Serine proteases, trypsin domain
PROSITE patterns: Serine proteases, trypsin family, histidine active site; Serine proteases, trypsin family, serine active site
PANTHER: Hepsin; PTHR24253
Gene3D: SRCR-like domain superfamily; 2.40.10.10
CDD: Serine proteases, trypsin domain
Sequence variants (dbSNP and all other sources); variant legend: missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
