https://www.alphaknockout.com

Mouse Heatr1 Knockout Project (CRISPR/Cas9)

Objective: To create a Heatr1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Heatr1 gene (NCBI Reference Sequence: NM_144835; Ensembl: ENSMUSG00000050244) is located on mouse chromosome 13. A total of 45 exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 45 (Transcript: ENSMUST00000059270). Exons 3~10 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts from about 2.22% of the coding region. Exons 3~10 cover 18.07% of the coding region. The size of the effective KO region is ~9712 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3' with exon labels 1, 3, 4, 5, 6, 7, 8, 9, 10 and 45, and two gRNA regions marked. Legend: exon of mouse Heatr1; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: forward; reverse complement.]

Note: The 508 bp section upstream of Exon 3 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
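The self-alignment described in this note can be sketched in Python. This is a minimal illustration and not the tool used to produce the report: the function names, the exact-match criterion and the toy sequence are assumptions, and only the 15 bp window size comes from the text above.

```python
# Hypothetical helper names; only the 15 bp window is taken from the report.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq, window=15):
    """Find exact window matches of a sequence against itself.

    Returns (forward, reverse), two lists of 0-based (i, j) window starts.
    forward: seq[i:i+window] == seq[j:j+window] with i < j (off-diagonal).
    reverse: seq[i:i+window] is the reverse complement of seq[j:j+window].
    """
    seq = seq.upper()
    starts = range(len(seq) - window + 1)

    # Index every window by its sequence so repeats are found in one pass.
    index = {}
    for i in starts:
        index.setdefault(seq[i:i + window], []).append(i)

    forward, reverse = [], []
    for j in starts:
        word = seq[j:j + window]
        forward.extend((i, j) for i in index[word] if i < j)
        reverse.extend((i, j) for i in index.get(reverse_complement(word), []))
    return forward, reverse


# Toy input: a 15 bp unit repeated in tandem, followed by unrelated bases.
unit = "ACGTTGCAAGGCTTA"
forward, _ = self_dot_plot(unit * 2 + "GATTACA")
print(forward)  # [(0, 15)]
```

An empty `forward` list for a flanking sequence would correspond to a dot plot with no off-diagonal signal, which is the situation the note describes.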

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: forward; reverse complement.]

Note: The 391 bp section downstream of Exon 10 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

Summary: Full Length(508bp) | A(28.15% 143) | C(17.13% 87) | T(27.95% 142) | G(26.77% 136)

Note: The 508 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
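A sliding-window GC profile of the kind summarized here could be computed as sketched below. The function name `gc_windows` and the one-base step are assumptions; the report states only the 300 bp window size and does not give the threshold it uses for "high GC content".

```python
def gc_windows(seq, window=300):
    """Return the GC fraction of every `window`-base window, sliding by 1 bp.

    The result has len(seq) - window + 1 entries, or is empty when the
    sequence is shorter than one window.
    """
    seq = seq.upper()
    if len(seq) < window:
        return []

    is_gc = [base in "GC" for base in seq]
    count = sum(is_gc[:window])          # G+C count in the first window
    fractions = [count / window]
    for start in range(1, len(seq) - window + 1):
        # Slide by one base: add the entering base, drop the leaving base.
        count += is_gc[start + window - 1] - is_gc[start - 1]
        fractions.append(count / window)
    return fractions


# Toy input (not the Heatr1 flank): 300 bp of GC followed by 300 bp of AT.
profile = gc_windows("GC" * 150 + "AT" * 150)
print(len(profile), profile[0], profile[150], profile[-1])  # 301 1.0 0.5 0.0
```

For a 508 bp input and a 300 bp window this yields 209 window values, and the maximum of the profile is the figure to compare against whatever GC cutoff a PCR protocol tolerates.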

Overview of the GC Content Distribution (down)

Window size: 300 bp

Summary: Full Length(391bp) | A(29.16% 114) | C(11.76% 46) | T(38.11% 149) | G(20.97% 82)

Note: The 391 bp section downstream of Exon 10 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
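The overall GC percentage of each flank is not stated directly, but it follows from the base counts in the two Summary lines. The short check below uses only those reported counts; the dictionary layout and the function name `gc_percent` are illustrative assumptions.

```python
# Base counts copied from the two Summary lines of this report.
flanks = {
    "upstream of Exon 3": {"A": 143, "C": 87, "T": 142, "G": 136},
    "downstream of Exon 10": {"A": 114, "C": 46, "T": 149, "G": 82},
}


def gc_percent(counts):
    """Overall GC percentage from a dict of per-base counts."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


for name, counts in flanks.items():
    print(f"{name}: {sum(counts.values())} bp, GC {gc_percent(counts):.2f}%")
# upstream of Exon 3: 508 bp, GC 43.90%
# downstream of Exon 10: 391 bp, GC 32.74%
```

The totals reproduce the stated lengths of 508 bp and 391 bp, and both flanks come out below 50% GC overall.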


BLAT Search Results (up)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
YourSeq  508    1      508  508    100.0%    chr13  +       12395915   12396422   508
YourSeq  33     43     76   508    100.0%    chr10  +       21473856   21497396   23541
YourSeq  24     205    234  508    92.9%     chr10  -       22320125   22320156   32
YourSeq  22     43     64   508    100.0%    chr14  -       70799391   70799412   22
YourSeq  22     43     64   508    100.0%    chr13  -       78343516   78343537   22
YourSeq  22     43     64   508    100.0%    chr14  +       96457890   96457911   22
YourSeq  22     51     72   508    100.0%    chr1   +       170909542  170909563  22
YourSeq  22     47     69   508    100.0%    chr1   +       123829978  123830001  24
YourSeq  20     41     62   508    95.5%     chr1   -       161231721  161231742  22

Note: The 508 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
YourSeq  391    1      391  391    100.0%    chr13  +       12406135   12406525   391
YourSeq  27     200    230  391    93.6%     chr1   -       142330730  142330760  31
YourSeq  26     288    320  391    96.5%     chr12  -       119390481  119390521  41
YourSeq  23     205    229  391    96.0%     chr16  -       75895755   75895779   25
YourSeq  23     23     47   391    87.5%     chr13  -       64350601   64350624   24
YourSeq  22     10     35   391    92.4%     chr12  +       67544370   67544395   26
YourSeq  21     199    220  391    100.0%    chr10  -       89549524   89549546   23
YourSeq  21     260    282  391    95.7%     chr13  +       8495951    8495973    23
YourSeq  21     101    121  391    100.0%    chr10  +       122560292  122560312  21
YourSeq  20     267    286  391    100.0%    chr10  -       50627231   50627250   20
YourSeq  20     50     69   391    100.0%    chr12  +       90301703   90301722   20
YourSeq  20     148    167  391    100.0%    chr1   +       64283145   64283164   20

Note: The 391 bp section downstream of Exon 10 is BLAT searched against the genome. No significant similarity is found.
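The BLAT rows can also be read programmatically. The sketch below parses the full-length hit and the best secondary hit of each flank, using values copied from the two result tables, and measures the interval between the two full-length hits. The `BlatHit` type and the function names are hypothetical, and the assumption that the two flanks directly abut the deleted interval is not stated in the report.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    """One BLAT row, without the leading QUERY column (hypothetical type)."""
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(row):
    """Parse one whitespace-separated BLAT result row."""
    f = row.split()
    return BlatHit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                   float(f[4].rstrip("%")), f[5], f[6],
                   int(f[7]), int(f[8]), int(f[9]))


def region_between(first, second):
    """Number of bases strictly between two hits on the same chromosome and strand."""
    if (first.chrom, first.strand) != (second.chrom, second.strand):
        raise ValueError("hits are not on the same chromosome and strand")
    return second.t_start - first.t_end - 1


# Rows copied from the tables above: full-length hit and best secondary hit.
up = parse_hit("508 1 508 508 100.0% chr13 + 12395915 12396422 508")
up_next = parse_hit("33 43 76 508 100.0% chr10 + 21473856 21497396 23541")
down = parse_hit("391 1 391 391 100.0% chr13 + 12406135 12406525 391")
down_next = parse_hit("27 200 230 391 93.6% chr1 - 142330730 142330760 31")

print(region_between(up, down))          # 9712
print(up_next.score, down_next.score)    # 33 27
```

Under that assumption the interval between the flanks is 9712 bp, which agrees with the effective KO region size given in the strategy summary. The best secondary scores (33 and 27) are far below the full-length scores (508 and 391), consistent with the note that no significant similarity is found.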


Gene and information:

Heatr1 HEAT repeat containing 1 [Mus musculus (house mouse)]
Gene ID: 217995, updated on 12-Aug-2019

Gene summary

Official Symbol: Heatr1 (provided by MGI)
Official Full Name: HEAT repeat containing 1 (provided by MGI)
Primary source: MGI:MGI:2442524
See related: Ensembl:ENSMUSG00000050244
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AA517551; BC019693; B130016L12Rik
Expression: Ubiquitous expression in CNS E11.5 (RPKM 6.1), liver E14 (RPKM 3.9) and 28 other tissues
Orthologs: human; all

Genomic context

Location: 13 A1; 13 4.56 cM
Exon count: 45

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  13   NC_000079.6 (12395375..12438893)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    13   NC_000079.5 (12487642..12531160)

Chromosome 13 - NC_000079.6
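As a quick consistency check on the two assembly locations listed under Genomic context, the gene span can be computed from each coordinate pair. The sketch assumes the coordinates are 1-based and inclusive; the function name `span_bp` is illustrative.

```python
def span_bp(start, end):
    """Length in bp of a 1-based, inclusive interval start..end."""
    return end - start + 1


grcm38 = (12395375, 12438893)   # NC_000079.6, annotation release 108
mgscv37 = (12487642, 12531160)  # NC_000079.5, Build 37.2

print(span_bp(*grcm38), span_bp(*mgscv37))  # 43519 43519
print(mgscv37[0] - grcm38[0])               # 92267
```

Both assemblies give the same 43,519 bp span, so the two locations differ only by a constant coordinate shift between assemblies.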


Transcript information: This gene has 8 transcripts

Gene: Heatr1 ENSMUSG00000050244

Description: HEAT repeat containing 1 [Source:MGI Symbol;Acc:MGI:2442524]
Gene Synonyms: B130016L12Rik
Location: Chromosome 13: 12,395,027-12,440,289 forward strand. GRCm38:CM001006.2
About this gene: This gene has 8 transcripts (splice variants), 237 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name        Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Heatr1-201  ENSMUST00000059270.9  6780  2143aa      ENSMUSP00000054084.8  Protein coding           CCDS26240  G3X9B1      TSL:5, GENCODE basic, APPRIS P1
Heatr1-208  ENSMUST00000223324.1  193   4aa         ENSMUSP00000152797.1  Protein coding           -          -           CDS 3' incomplete, TSL:3
Heatr1-206  ENSMUST00000222091.1  2482  744aa       ENSMUSP00000152435.1  Nonsense mediated decay  -          A0A1Y7VNI1  CDS 5' incomplete, TSL:2
Heatr1-202  ENSMUST00000221046.1  1325  344aa       ENSMUSP00000152410.1  Nonsense mediated decay  -          A0A1Y7VNG8  TSL:1
Heatr1-204  ENSMUST00000221616.1  681   No protein  -                     Retained intron          -          -           TSL:2
Heatr1-203  ENSMUST00000221051.1  556   No protein  -                     Retained intron          -          -           TSL:3
Heatr1-205  ENSMUST00000221746.1  446   No protein  -                     Retained intron          -          -           TSL:3
Heatr1-207  ENSMUST00000222817.1  373   No protein  -                     Retained intron          -          -           TSL:2


[Figure: Ensembl genome browser view of a 65.26 kb region, 12.40 Mb to 12.44 Mb (comprehensive gene set).
Forward strand: Heatr1-201 and Heatr1-208 (protein coding); Heatr1-202 and Heatr1-206 (nonsense mediated decay); Heatr1-203, Heatr1-204, Heatr1-205 and Heatr1-207 (retained intron).
Reverse strand: contig AC154221.3; Gm5928-201 (processed pseudogene); Lgals8-201, Lgals8-202, Lgals8-203, Lgals8-206, Lgals8-207 and Lgals8-208 (protein coding); Lgals8-204 (retained intron).
Regulatory Build legend: CTCF; enhancer; open chromatin; promoter; promoter flank; transcription factor binding site.
Gene legend: protein coding (merged Ensembl/Havana; Ensembl protein coding); non-protein coding (pseudogene; processed transcript).]


Transcript: ENSMUST00000059270

[Figure: Heatr1-201 (protein coding), 43.52 kb, forward strand, with protein ENSMUSP00000054... annotated on a scale bar from 0 to 2143 residues.
Low complexity (Seg)
Coiled-coils (Ncoils)
Superfamily: Armadillo-type fold
SMART: BP28, C-terminal domain
Pfam: U3 small nucleolar RNA-associated protein 10, N-terminal; BP28, C-terminal domain
PANTHER: U3 small nucleolar RNA-associated protein 10
Gene3D: Armadillo-like helical
Sequence variants (dbSNP and all other sources)
Variant legend: stop gained; missense variant; splice region variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
