https://www.alphaknockout.com

Mouse Hsph1 Knockout Project (CRISPR/Cas9)

Objective: To create an Hsph1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Hsph1 gene (NCBI Reference Sequence: NM_013559; Ensembl: ENSMUSG00000029657) is located on mouse chromosome 5. 18 exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 18 (Transcript: ENSMUST00000202361). Exons 2~11 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Homozygous inactivation of this gene leads to decreased susceptibility to ischemic brain injury.

Exon 2 starts from about 4.2% of the coding region. Exons 2~11 cover 57.5% of the coding region. The size of the effective KO region is ~8919 bp. The KO region does not contain any other known gene.
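The percentages above can be turned into approximate nucleotide positions. The sketch below is rough arithmetic only: it assumes the "coding region" is the CDS of Hsph1-211 (858 aa, as listed in the transcript table of this report) plus one stop codon, and the helper name approx_nt is hypothetical.

```python
# Rough arithmetic only. Assumption: the coding region is the 858-codon CDS
# of Hsph1-211 plus one stop codon (the report does not state its definition).
PROTEIN_AA = 858                 # protein length reported for ENSMUST00000202361
CDS_NT = PROTEIN_AA * 3 + 3      # assumed coding-region length in nucleotides


def approx_nt(fraction, cds_nt=CDS_NT):
    """Convert a fraction of the coding region into an approximate nt count."""
    return round(fraction * cds_nt)


exon2_start_nt = approx_nt(0.042)   # where exon 2 begins within the CDS
targeted_nt = approx_nt(0.575)      # coding sequence covered by exons 2~11

print(f"Assumed CDS length: {CDS_NT} nt")
print(f"Exon 2 starts near CDS position {exon2_start_nt}")
print(f"Exons 2~11 cover about {targeted_nt} coding nt")
```

Because the reported percentages are rounded, these positions are estimates and should not be used to infer the reading frame of the deletion.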


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3' with exons 1 to 18 of mouse Hsph1 and two gRNA regions marked. Legend: exon of mouse Hsph1; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
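A self-alignment of this kind can be sketched as an exact-match word search. The code below is a minimal illustration, not the tool used to produce the report: it assumes exact 15 bp word matches, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement; unknown symbols map to N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def dot_plot_matches(seq, window=15):
    """Find exact window-sized self-matches in seq.

    Returns (forward, reverse): lists of (i, j) start positions where the
    word at i equals the word at j (forward, off the main diagonal) or the
    reverse complement of the word at j (reverse).
    """
    seq = seq.upper()
    n_words = len(seq) - window + 1
    index = {}
    for i in range(n_words):
        index.setdefault(seq[i:i + window], []).append(i)

    forward, reverse = [], []
    for i in range(n_words):
        word = seq[i:i + window]
        # Skip i == j: the main diagonal is always present in a self-plot.
        forward.extend((i, j) for j in index[word] if j != i)
        reverse.extend((i, j) for j in index.get(reverse_complement(word), []))
    return forward, reverse


# A 15 bp unit repeated twice yields one off-diagonal forward match pair.
unit = "ACGTTGCAAGGCTTA"
fwd, rev = dot_plot_matches(unit * 2 + "CCCCC")
print(fwd)  # [(0, 15), (15, 0)]
```

Long runs of consecutive off-diagonal forward matches would indicate a tandem repeat; isolated or absent matches are consistent with the note above.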

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement.]

Note: The 1372 bp section downstream of Exon 11 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(24.65% 493) | C(22.95% 459) | T(29.9% 598) | G(22.5% 450)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(1372bp) | A(25.58% 351) | C(16.55% 227) | T(33.09% 454) | G(24.78% 340)

Note: The 1372 bp section downstream of Exon 11 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
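The base counts in the two summaries give the overall GC content directly, and a sliding window gives the distribution. The sketch below uses the counts reported above; the function names and the step parameter are assumptions for illustration, not the tool used for the report.

```python
def gc_percent(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq, window=300, step=1):
    """GC percentage of each full window along seq."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts copied from the two summary lines in this report.
upstream = {"A": 493, "C": 459, "T": 598, "G": 450}      # 2000 bp
downstream = {"A": 351, "C": 227, "T": 454, "G": 340}    # 1372 bp

print(f"Upstream GC:   {gc_percent(upstream):.2f}%")     # 45.45%
print(f"Downstream GC: {gc_percent(downstream):.2f}%")   # 41.33%
```

Both flanks come out below 50% GC overall, which agrees with the notes that no significant high GC-content region is present.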


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr5   -       149633888  149635887  2000
YourSeq  23     1590   1614  2000   87.5%     chr17  -       83425688   83425711   24
YourSeq  22     643    667   2000   96.0%     chr12  -       74088105   74088131   27

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. Apart from the expected 100% match to its own locus on chr5, no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  1372   1      1372  1372   100.0%    chr5   -       149623597  149624968  1372
YourSeq  28     1311   1341  1372   96.8%     chr1   -       166637702  166637746  45
YourSeq  21     1246   1266  1372   100.0%    chr5   -       12232056   12232076   21

Note: The 1372 bp section downstream of Exon 11 is BLAT searched against the genome. Apart from the expected 100% match to its own locus on chr5, no significant similarity is found.
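The two 100% hits on chr5 can be used as a consistency check on the reported KO region size. The sketch below assumes that these hits are the sequences immediately flanking the deleted interval and that the coordinates are 1-based and inclusive (which matches the SPAN column); the Hit type and helper names are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    chrom: str
    strand: str
    start: int   # 1-based, inclusive (assumed; agrees with the SPAN column)
    end: int


# Top BLAT hits copied from the two result tables in this report.
upstream_flank = Hit("chr5", "-", 149633888, 149635887)
downstream_flank = Hit("chr5", "-", 149623597, 149624968)


def span(hit):
    """Number of bases covered by a hit."""
    return hit.end - hit.start + 1


def gap_between(a, b):
    """Number of bases strictly between two hits on the same chromosome."""
    if a.chrom != b.chrom:
        raise ValueError("hits are on different chromosomes")
    low, high = sorted((a, b), key=lambda h: h.start)
    return high.start - low.end - 1


print(span(upstream_flank))     # 2000
print(span(downstream_flank))   # 1372
print(gap_between(upstream_flank, downstream_flank))  # 8919
```

The gap of 8919 bp agrees with the effective KO region size given in the strategy summary. Because Hsph1 lies on the reverse strand, the upstream flank has the higher genomic coordinates.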


Gene and information: Hsph1 heat shock 105kDa/110kDa protein 1 [ Mus musculus (house mouse) ] Gene ID: 15505, updated on 12-Aug-2019

Gene summary

Official Symbol: Hsph1 (provided by MGI)
Official Full Name: heat shock 105kDa/110kDa protein 1 (provided by MGI)
Primary source: MGI:MGI:105053
See related: Ensembl:ENSMUSG00000029657
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 105kDa; Hsp105; Hsp110; hsp-E7I; AI790491; hsp110/105
Expression: Broad expression in cortex adult (RPKM 30.2), CNS E11.5 (RPKM 29.9) and 25 other tissues
Orthologs: human; all

Genomic context

Location: 5 G3; 5 89.18 cM
Exon count: 19

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  5    NC_000071.6 (149616843..149636498, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    5    NC_000071.5 (150419420..150438890, complement)

[Figure: genomic context of Hsph1 on Chromosome 5 - NC_000071.6.]


Transcript information: This gene has 11 transcripts

Gene: Hsph1 ENSMUSG00000029657

Description: heat shock 105kDa/110kDa protein 1 [Source:MGI Symbol;Acc:MGI:105053]
Gene Synonyms: HSP110, Hsp105, hsp-E7I, hsp110/105
Location: Chromosome 5: 149,614,287-149,636,376 reverse strand. GRCm38:CM000998.2
About this gene: This gene has 11 transcripts (splice variants), 163 orthologues, 12 paralogues, is a member of 1 Ensembl protein family and is associated with 3 phenotypes.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Hsph1-211  ENSMUST00000202361.3   3802  858aa       ENSMUSP00000144413.1  Protein coding   CCDS19885  Q61699      TSL:1 GENCODE basic APPRIS P3
Hsph1-201  ENSMUST00000074846.13  3240  814aa       ENSMUSP00000074392.8  Protein coding   CCDS85010  Q61699      TSL:1 GENCODE basic APPRIS ALT1
Hsph1-205  ENSMUST00000201452.3   3140  858aa       ENSMUSP00000144654.1  Protein coding   CCDS19885  Q61699      TSL:1 GENCODE basic APPRIS P3
Hsph1-209  ENSMUST00000202089.3   3054  817aa       ENSMUSP00000144297.1  Protein coding   -          E9Q0U7      TSL:5 GENCODE basic
Hsph1-206  ENSMUST00000201559.3   661   144aa       ENSMUSP00000144043.1  Protein coding   -          D3Z3I9      CDS 3' incomplete TSL:5
Hsph1-202  ENSMUST00000200805.3   587   94aa        ENSMUSP00000143925.1  Protein coding   -          A0A0J9YTZ7  CDS 3' incomplete TSL:3
Hsph1-203  ENSMUST00000200825.1   416   100aa       ENSMUSP00000143913.1  Protein coding   -          D3Z027      CDS 3' incomplete TSL:2
Hsph1-204  ENSMUST00000201431.3   4764  No protein  -                     Retained intron  -          -           TSL:1
Hsph1-210  ENSMUST00000202137.1   752   No protein  -                     Retained intron  -          -           TSL:2
Hsph1-208  ENSMUST00000201877.1   751   No protein  -                     Retained intron  -          -           TSL:2
Hsph1-207  ENSMUST00000201666.1   254   No protein  -                     lncRNA           -          -           TSL:5


[Figure: Ensembl genome browser view of the Hsph1 locus, 42.09 kb, 149.61Mb to 149.64Mb. Forward strand: Wdr95-201 and Wdr95-207 (protein coding), Gm20005-201 (lncRNA). Contig: AC119856.13. Reverse strand: Hsph1-205, Hsph1-211, Hsph1-201, Hsph1-209, Hsph1-206, Hsph1-202 and Hsph1-203 (protein coding); Hsph1-208, Hsph1-210 and Hsph1-204 (retained intron); Hsph1-207 (lncRNA). A Regulatory Build track is shown below the genes.]

Regulation Legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site

Gene Legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana); Non-Protein Coding (processed transcript; RNA gene)


Transcript: ENSMUST00000202361

[Figure: protein domain view of Hsph1-211 (protein coding; reverse strand, 19.80 kb), with a scale bar from 0 to 858 amino acids. Tracks shown:]

MobiDB lite
Low complexity (Seg)
Coiled-coils (Ncoils)
Superfamily: SSF53067; Heat shock protein 70kD, peptide-binding domain superfamily; Heat shock protein 70kD, C-terminal domain superfamily
Prints: Heat shock protein 70 family
Pfam: Heat shock protein 70 family
PROSITE patterns: Heat shock protein 70, conserved site
PANTHER: PTHR45639:SF2; PTHR45639
Gene3D: 3.30.30.30; 3.90.640.10; 3.30.420.40; Heat shock protein 70kD, peptide-binding domain superfamily; Heat shock protein 70kD, C-terminal domain superfamily
CDD: HSPH1, nucleotide-binding domain
Sequence variants (dbSNP and all other sources)

Variant Legend: missense variant; synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
