https://www.alphaknockout.com

Mouse Herpud1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Herpud1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Herpud1 gene (NCBI Reference Sequence: NM_022331; Ensembl: ENSMUSG00000031770) is located on mouse chromosome 8. Eight exons have been identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 8 (Transcript: ENSMUST00000161576). Exons 2-4 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Herpud1 gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-36N8 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a knock-out allele exhibit impaired glucose tolerance and decreased cerebral infarction size.

Exon 2 starts at about 12.62% of the coding region. Knocking out exons 2-4 will result in a frameshift of the gene. The size of intron 1 for 5'-loxP site insertion is 2617 bp, and the size of intron 4 for 3'-loxP site insertion is 883 bp. The size of the effective cKO region is ~1993 bp. The cKO region does not contain any other known gene.
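The frameshift statement rests on simple arithmetic: removing a block of coding sequence shifts the reading frame whenever the removed length is not a multiple of 3. A minimal sketch of that check follows. The function name and the exon lengths in the example are hypothetical placeholders, not the actual Herpud1 exon sizes, which this report does not list.

```python
def causes_frameshift(deleted_coding_lengths):
    """Return True if deleting these coding segments shifts the reading frame.

    A deletion keeps the frame only when the total number of removed
    coding bases is divisible by 3.
    """
    total = sum(deleted_coding_lengths)
    return total % 3 != 0


# Hypothetical coding lengths (bp) of three deleted exons.
# These are illustrative values only, NOT real Herpud1 exon sizes.
example_lengths = [100, 120, 90]
print(causes_frameshift(example_lengths))
```

To apply the check to this project, the placeholder list would be replaced with the coding lengths of exons 2, 3 and 4 taken from the transcript annotation.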


Overview of the Targeting Strategy

[Figure: schematic of the targeting strategy. Panels: wildtype allele (with the 5' and 3' gRNA regions marked), targeting vector, targeted allele, and constitutive KO allele (after Cre recombination). Legend: exon of mouse Herpud1, homology arm, cKO region, loxP site.]
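The step from the targeted allele to the constitutive KO allele can be modelled on plain strings: Cre recombination between two loxP sites in the same orientation removes the intervening sequence and leaves a single loxP site behind. The sketch below is a toy model under that assumption. The function name and the flanking and cKO sequences are hypothetical; only the 34 bp loxP sequence is the standard one.

```python
# Canonical 34 bp loxP site: 13 bp inverted repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str) -> str:
    """Model Cre excision between two same-orientation loxP sites.

    Keeps the sequence before the first site, one loxP site, and the
    sequence after the second site. Alleles with fewer than two loxP
    sites are returned unchanged.
    """
    first = allele.find(LOXP)
    if first == -1:
        return allele
    second = allele.find(LOXP, first + len(LOXP))
    if second == -1:
        return allele
    return allele[:first] + LOXP + allele[second + len(LOXP):]


# Toy targeted allele: hypothetical flanks and a hypothetical floxed region.
targeted = "AAAA" + LOXP + "CCCCGGGG" + LOXP + "TTTT"
ko_allele = cre_excise(targeted)
print(len(targeted) - len(ko_allele))  # bases removed: floxed region plus one loxP
```

In this toy case the allele shortens by the length of the floxed region plus one loxP site, which is also the size difference a genotyping PCR across the region would be expected to show.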


Overview of the Dot Plot

[Figure: dot plot of the sequence aligned against itself, forward and reverse complement. Window size: 10 bp.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
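A self dot plot of this kind can be sketched as an exact-match search over fixed-size windows: every pair of positions whose 10 bp windows are identical gives a forward dot, and every pair where one window is the reverse complement of the other gives a reverse-complement dot. The code below is an illustrative sketch, not the tool used to produce the plot; the function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def self_dot_plot(seq, window=10):
    """Return (forward, reverse) lists of (i, j) window start positions.

    forward: i < j and the windows starting at i and j are identical.
    reverse: i <= j and the window at j is the reverse complement of
             the window at i.
    """
    index = defaultdict(list)
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for kmer, positions in index.items():
        # Identical windows at two different positions (repeat candidates).
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                forward.append((positions[a], positions[b]))
        # Windows matching the reverse complement of this window.
        for j in index.get(reverse_complement(kmer), []):
            for i in positions:
                if i <= j:
                    reverse.append((i, j))
    return sorted(forward), sorted(reverse)


# Toy sequence with one direct 10 bp repeat separated by a spacer.
forward, reverse = self_dot_plot("ACGTTGCAAT" + "GGGGG" + "ACGTTGCAAT")
print(forward)
```

In such output a tandem repeat shows up as a run of forward pairs with a constant offset j - i, which corresponds to a diagonal parallel to the main diagonal in the plot.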

Overview of the GC Content Distribution

[Figure: GC content distribution along the sequence. Window size: 300 bp.]

Summary: Full Length(8493bp) | A(23.91% 2031) | C(22.81% 1937) | T(28.31% 2404) | G(24.97% 2121)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found, so it may be difficult to construct this targeting vector.
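The overall GC content can be recomputed from the base counts in the summary line, and the windowed distribution can be sketched as a sliding 300 bp count. The snippet below is a minimal illustration; the function name is hypothetical, and because the underlying sequence is not included in this report, the windowed function is shown on a toy string only.

```python
# Base counts taken from the summary line above.
counts = {"A": 2031, "C": 1937, "T": 2404, "G": 2121}
total = sum(counts.values())
overall_gc = 100 * (counts["G"] + counts["C"]) / total
print(total, round(overall_gc, 2))


def windowed_gc(seq, window=300):
    """Return the GC percentage of every full window, sliding by 1 bp."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    gc = sum(base in "GC" for base in seq[:window])
    result = [100 * gc / window]
    for i in range(window, len(seq)):
        # Update the running count: add the entering base, drop the leaving one.
        gc += (seq[i] in "GC") - (seq[i - window] in "GC")
        result.append(100 * gc / window)
    return result


# Toy example with a 4 bp window (the report uses 300 bp).
print(windowed_gc("GGCCAATT", window=4))
```

The counts sum to the stated full length of 8493 bp and give an overall GC content of about 47.8%, so the high-GC warning in the note refers to local windows in the plot rather than to the sequence average.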


BLAT Search Results (up)

QUERY     SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END  SPAN
YourSeq    3000      1   3000   3000    100.0%  chr8   +        94386107   94389106  3000
YourSeq      44   2075   2455   3000     62.5%  chr2   +        33381106   33381268   163
YourSeq      40   2379   2454   3000     91.7%  chr12  +        85303971   85304046    76
YourSeq      38   2377   2454   3000     95.3%  chr5   +        65778170   65778247    78
YourSeq      37   2379   2455   3000     93.1%  chr3   -       151934345  151934421    77
YourSeq      36   2073   2428   3000     56.1%  chr1   -        81490957   81491096   140
YourSeq      34   2379   2454   3000     92.5%  chr15  -        58185610   58185687    78
YourSeq      33   2404   2461   3000     94.6%  chr14  +       122316889  122316950    62
YourSeq      31   2406   2457   3000     91.0%  chr10  -        61785587   61785637    51
YourSeq      31   2405   2457   3000     97.0%  chr11  +        62541626   62541679    54
YourSeq      30   2405   2455   3000     90.7%  chr7   -        79205254   79205303    50
YourSeq      30   2405   2460   3000     97.0%  chr13  -        58614396   58614456    61
YourSeq      30   2404   2454   3000     90.7%  chr15  +        17199319   17199368    50
YourSeq      29   2405   2454   3000     90.4%  chr19  -        32998575   32998623    49
YourSeq      29   2404   2454   3000     96.9%  chr12  -        23671926   23671976    51
YourSeq      29   2404   2454   3000     96.9%  chr1   -        28388586   28388636    51
YourSeq      29   2405   2454   3000     90.4%  chr2   +       106144145  106144193    49
YourSeq      28   2403   2451   3000     86.7%  chr10  -        82917560   82917606    47
YourSeq      28   2405   2454   3000     96.8%  chr14  +       124502014  124502063    50
YourSeq      28   2405   2454   3000     96.8%  chr13  +        92313860   92313909    50

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
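One way to make the "no significant similarity" judgement explicit is to flag any hit, other than the full-length self match, whose score reaches a chosen fraction of the query size. The sketch below does this for the first four rows of the upstream table. The `Hit` type, the function name and the 10% cutoff are assumptions made for illustration, not a criterion stated in this report.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    score: int
    identity: float  # percent identity
    chrom: str
    span: int        # length of the hit on the genome, in bp


def off_target_hits(hits, query_size, min_fraction=0.1):
    """Return hits, excluding the full-length self match, whose score is
    at least min_fraction of the query size.

    The 0.1 default is an assumed cutoff chosen for this example.
    """
    flagged = []
    for hit in hits:
        is_self = hit.score == query_size and hit.identity == 100.0
        if not is_self and hit.score >= min_fraction * query_size:
            flagged.append(hit)
    return flagged


# First four rows of the upstream BLAT table.
upstream_hits = [
    Hit(3000, 100.0, "chr8", 3000),
    Hit(44, 62.5, "chr2", 163),
    Hit(40, 91.7, "chr12", 76),
    Hit(38, 95.3, "chr5", 78),
]
print(off_target_hits(upstream_hits, query_size=3000))
```

With these rows the best non-self score is 44 out of 3000, well below the assumed cutoff, so nothing is flagged.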

BLAT Search Results (down)

QUERY     SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END  SPAN
YourSeq    3000      1   3000   3000    100.0%  chr8   +        94391100   94394099  3000
YourSeq      49   2403   2456   3000     96.3%  chr10  +        57811991   57812045    55
YourSeq      46   2213   2290   3000     83.4%  chr19  -        23882175   23882246    72
YourSeq      42   2417   2458   3000    100.0%  chr13  +        51193761   51193802    42
YourSeq      40   2213   2261   3000     84.5%  chr5   +        85602289   85602334    46
YourSeq      38   2206   2245   3000     92.4%  chr16  +         6719953    6719991    39
YourSeq      37   2206   2247   3000     84.7%  chr3   -         5379524    5379562    39
YourSeq      37   2206   2243   3000    100.0%  chr2   -       139110443  139110481    39
YourSeq      36   2206   2243   3000     91.9%  chr2   -         5837331    5837367    37
YourSeq      36   2211   2247   3000    100.0%  chr2   +        42578859   42578897    39
YourSeq      35   2211   2264   3000     75.7%  chr5   -        16930542   16930581    40
YourSeq      35   2444   2494   3000     92.7%  chr15  -        84339785   84339837    53
YourSeq      34   2210   2243   3000    100.0%  chr11  -        18320838   18320871    34
YourSeq      34   2210   2245   3000     97.3%  chrX   +       118318221  118318256    36
YourSeq      33   2211   2243   3000    100.0%  chr4   +       147079739  147079771    33
YourSeq      33   2211   2243   3000    100.0%  chr4   +       145723239  145723271    33
YourSeq      28   2213   2240   3000    100.0%  chr2   -       102954808  102954835    28
YourSeq      27   2442   2495   3000     96.6%  chr11  +        79565623   79565678    56
YourSeq      23   2213   2235   3000    100.0%  chr13  -        62368553   62368575    23
YourSeq      22   2211   2234   3000     95.9%  chr14  +        53636630   53636653    24

Note: The 3000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
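The two full-length chr8 hits also allow a quick consistency check on the stated size of the effective cKO region: the bases lying between the end of the upstream query and the start of the downstream query should match the ~1993 bp figure. The sketch below assumes the displayed coordinates are 1-based and inclusive, which agrees with both hits having a span of 3000; the function name is hypothetical.

```python
def bases_between(left_end, right_start):
    """Number of bases strictly between two 1-based inclusive coordinates."""
    return right_start - left_end - 1


# chr8 coordinates of the 100% self hits from the two BLAT tables.
upstream_hit = (94386107, 94389106)
downstream_hit = (94391100, 94394099)

# Sanity check: each query covers 3000 bp under inclusive coordinates.
print(upstream_hit[1] - upstream_hit[0] + 1, downstream_hit[1] - downstream_hit[0] + 1)

gap = bases_between(upstream_hit[1], downstream_hit[0])
print(gap)  # 1993
```

The result of 1993 bp agrees with the effective cKO region size given in the strategy summary.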


Gene and information: Herpud1 homocysteine-inducible, endoplasmic reticulum stress-inducible, ubiquitin-like domain member 1 [ Mus musculus (house mouse) ] Gene ID: 64209, updated on 24-Oct-2019

Gene summary

Official Symbol: Herpud1 (provided by MGI)
Official Full Name: homocysteine-inducible, endoplasmic reticulum stress-inducible, ubiquitin-like domain member 1 (provided by MGI)
Primary source: MGI:MGI:1927406
See related: Ensembl:ENSMUSG00000031770
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: SUP; Herp; Mifl
Expression: Ubiquitous expression in lung adult (RPKM 149.4), kidney adult (RPKM 146.9) and 28 other tissues
Orthologs: human

Genomic context

Location: 8; 8 C5

Exon count: 8

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  8    NC_000074.6 (94386429..94395371)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    8    NC_000074.5 (96910400..96919258)

Chromosome 8 - NC_000074.6


Transcript information: This gene has 6 transcripts

Gene: Herpud1 ENSMUSG00000031770

Description: homocysteine-inducible, endoplasmic reticulum stress-inducible, ubiquitin-like domain member 1 [Source:MGI Symbol;Acc:MGI:1927406]
Gene Synonyms: Herp, Mifl
Location: Chromosome 8: 94,386,438-94,395,377 forward strand. GRCm38:CM001001.2
About this gene: This gene has 6 transcripts (splice variants), 198 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 3 phenotypes.

Transcripts

Name         Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt         Flags
Herpud1-205  ENSMUST00000161576.7  1952  391aa       ENSMUSP00000124201.1  Protein coding           CCDS22541  Q3TMN9, Q9JJK5  TSL:1, GENCODE basic, APPRIS P2
Herpud1-201  ENSMUST00000034220.7  1873  390aa       ENSMUSP00000034220.7  Protein coding           -          Q8C4N0          TSL:1, GENCODE basic, APPRIS ALT1
Herpud1-206  ENSMUST00000211982.1  823   179aa       ENSMUSP00000148581.1  Protein coding           -          A0A1D5RM09      CDS 5' incomplete, TSL:5
Herpud1-204  ENSMUST00000161085.7  731   62aa        ENSMUSP00000148426.1  Nonsense mediated decay  -          A0A1D5RLM1      CDS 5' incomplete, TSL:3
Herpud1-202  ENSMUST00000159450.1  631   36aa        ENSMUSP00000148775.1  Nonsense mediated decay  -          A0A1D5RMG9      CDS 5' incomplete, TSL:3
Herpud1-203  ENSMUST00000160866.1  690   No protein  -                     Retained intron          -          -               TSL:1


[Figure: Ensembl genome browser view of a 28.94 kb region (94.38Mb to 94.40Mb), comprehensive gene set. Forward strand tracks: Gm15889-201 and Gm15889-202 (lncRNA); Herpud1-201, Herpud1-205 and Herpud1-206 (protein coding); Herpud1-202 and Herpud1-204 (nonsense mediated decay); Herpud1-203 (retained intron); Ap3s1-ps2-201 (processed pseudogene). Contigs: AC128663.4. Reverse strand track: Gm15890-201 (lncRNA). Regulatory Build legend: CTCF, Enhancer, Promoter, Promoter Flank. Gene legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana); Non-Protein Coding (RNA gene; processed transcript; pseudogene).]


Transcript: ENSMUST00000161576

[Figure: protein domain view of Herpud1-205 (protein coding; 8.94 kb, forward strand), translation ENSMUSP00000124..., scale bar 0 to 391. Annotation tracks: MobiDB lite; Low complexity (Seg); Superfamily: Ubiquitin-like domain superfamily; SMART: Ubiquitin domain; Pfam: Ubiquitin domain; PROSITE profiles: Ubiquitin domain; PANTHER: PTHR12943:SF7, Homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain member 1/2; Gene3D: 3.10.20.90; CDD: cd17118. Variant track: sequence variants (dbSNP and all other sources). Variant legend: missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
