https://www.alphaknockout.com

Mouse Nhlrc1 Knockout Project (CRISPR/Cas9)

Objective: To create an Nhlrc1 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Nhlrc1 gene (NCBI Reference Sequence: NM_175340; Ensembl: ENSMUSG00000044231) is located on mouse chromosome 13. One exon is identified, containing both the ATG start codon and the TGA stop codon (Transcript: ENSMUST00000052747). Exon 1 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a knockout allele exhibit accumulation of Lafora bodies and elevated total glycogen levels in heart muscle, skeletal muscle, and brain.
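PCR genotyping of a deletion is often done with a primer pair flanking the targeted region, so that the deletion allele gives a shorter band than the wild-type allele. The report does not give primers, so this minimal sketch assumes a hypothetical 1800 bp wild-type amplicon and uses the ~1201 bp effective KO region stated for this project. The exact deletion in each founder still has to be confirmed by sequencing, as the strategy specifies.

```python
# Minimal sketch: call a genotype from gel band sizes of a flanking-primer PCR.
# WT_AMPLICON_BP is a hypothetical value; the report gives no primer design.
DELETION_BP = 1201      # approximate effective KO region from the strategy
WT_AMPLICON_BP = 1800   # hypothetical wild-type amplicon spanning exon 1


def near(band_bp: int, expected_bp: int, tolerance_bp: int) -> bool:
    """True if an observed band is within tolerance of an expected size."""
    return abs(band_bp - expected_bp) <= tolerance_bp


def call_genotype(bands_bp, wt_bp=WT_AMPLICON_BP,
                  deletion_bp=DELETION_BP, tolerance_bp=50):
    """Classify one pup from the list of band sizes (bp) seen on the gel."""
    ko_bp = wt_bp - deletion_bp  # expected size of the deletion-allele band
    has_wt = any(near(b, wt_bp, tolerance_bp) for b in bands_bp)
    has_ko = any(near(b, ko_bp, tolerance_bp) for b in bands_bp)
    if has_wt and has_ko:
        return "heterozygous"
    if has_ko:
        return "homozygous KO"
    if has_wt:
        return "wild type"
    return "no call"


print(call_genotype([600]))        # homozygous KO
print(call_genotype([1800, 600]))  # heterozygous
```

The tolerance absorbs gel-sizing error and small differences in the repaired junction; it does not replace sequencing.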

Exon 1 starts at about 0.08% of the coding region and covers 100.0% of it. The effective KO region is ~1201 bp and does not overlap any other known gene.
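The coding-region figures can be cross-checked against the transcript table for Nhlrc1-201 (2294 bp, 401 aa). A minimal sketch, assuming the 401 aa length excludes the stop codon (the usual convention) and, for the last ratio only, that the KO region lies entirely within the CDS:

```python
PROTEIN_AA = 401        # Nhlrc1-201 protein length (Ensembl transcript table)
TRANSCRIPT_BP = 2294    # Nhlrc1-201 transcript length
KO_REGION_BP = 1201     # approximate effective KO region


def cds_length_bp(protein_aa: int, include_stop: bool = True) -> int:
    """Each residue is one 3 bp codon; the stop codon adds one more codon."""
    codons = protein_aa + (1 if include_stop else 0)
    return codons * 3


cds_bp = cds_length_bp(PROTEIN_AA)      # coding sequence including TGA
utr_bp = TRANSCRIPT_BP - cds_bp         # 5' and 3' UTR combined (single exon)
ko_fraction = KO_REGION_BP / cds_bp     # assumes the KO region lies within the CDS
print(cds_bp, utr_bp, round(ko_fraction, 3))  # 1206 1088 0.996
```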


Overview of the Targeting Strategy

[Figure: wild-type allele (5' to 3') with exon 1 of mouse Nhlrc1 flanked by the two gRNA regions. Legend: exon of mouse Nhlrc1; knockout region.]


Overview of the Dot Plot (up), window size: 15 bp

[Figure: dot plot of the upstream sequence against itself (Sequence 1 vs. Sequence 2), forward and reverse-complement strands.]

Note: The 2000 bp region upstream of the start codon is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot-plot matrix, so this region is suitable for PCR screening or sequencing analysis.
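A self dot plot places a dot wherever a window of one sequence matches a window of the other; repeats then appear as lines parallel to the main diagonal. This sketch is a simplification that requires exact 15 bp matches (dot-plot tools often tolerate mismatches within the window) and returns the dots for both the forward and the reverse-complement comparison.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_plot(seq_x: str, seq_y: str, window: int = 15):
    """Return (i, j) pairs where seq_x[i:i+window] == seq_y[j:j+window]
    (exact-match simplification of a windowed dot plot)."""
    index = defaultdict(list)  # every window of seq_y -> its start positions
    for j in range(len(seq_y) - window + 1):
        index[seq_y[j:j + window]].append(j)
    dots = []
    for i in range(len(seq_x) - window + 1):
        for j in index.get(seq_x[i:i + window], ()):
            dots.append((i, j))
    return dots


def self_dot_plots(seq: str, window: int = 15):
    """Forward (seq vs. seq) and reverse-complement (seq vs. revcomp) dots."""
    seq = seq.upper()
    forward = dot_plot(seq, seq, window)
    reverse = dot_plot(seq, reverse_complement(seq), window)
    return forward, reverse


forward, reverse = self_dot_plots("ACGTACGTAC", window=4)
print((0, 4) in forward)  # True: the word ACGT recurs 4 bp later
```

The forward plot always contains the main diagonal (i, i); only off-diagonal dots carry information about repeats.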

Overview of the Dot Plot (down), window size: 15 bp

[Figure: dot plot of the downstream sequence against itself (Sequence 1 vs. Sequence 2), forward and reverse-complement strands.]

Note: The 2000 bp region downstream of the stop codon is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot-plot matrix, so this region is suitable for PCR screening or sequencing analysis.
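In a self dot plot, a tandem repeat with period d shows up as a dense line at offset d from the main diagonal. One way to quantify this is to count, for each offset, how many windows match the window d bases further on. In this sketch the 15 bp window comes from the report, while the `min_dots` threshold and the `max_offset` limit are hypothetical values chosen for illustration.

```python
from collections import Counter


def diagonal_profile(seq: str, window: int = 15, max_offset: int = 500):
    """For each offset d, count positions i with
    seq[i:i+window] == seq[i+d:i+d+window] (off-diagonal dots at offset d)."""
    seq = seq.upper()
    n = len(seq)
    counts = Counter()
    for d in range(1, min(max_offset, n - window) + 1):
        for i in range(n - d - window + 1):
            if seq[i:i + window] == seq[i + d:i + d + window]:
                counts[d] += 1
    return counts


def tandem_repeat_offsets(seq: str, window: int = 15,
                          min_dots: int = 10, max_offset: int = 500):
    """Offsets whose diagonal has at least min_dots matches (hypothetical cut-off)."""
    profile = diagonal_profile(seq, window, max_offset)
    return sorted(d for d, c in profile.items() if c >= min_dots)


repeat = "ACGTTGCA" * 5 + "GATTACAGGC"  # synthetic 8 bp tandem repeat plus a flank
print(tandem_repeat_offsets(repeat))     # [8, 16]
```

A sequence without tandem repeats, as reported here, would give an empty list under the same settings.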


Overview of the GC Content Distribution (up), window size: 300 bp

[Figure: GC content of the upstream sequence in 300 bp windows.]

Summary: full length 2000 bp; A 531 (26.55%), C 496 (24.8%), T 555 (27.75%), G 418 (20.9%); GC content 45.7%.

Note: The 2000 bp region upstream of the start codon is analyzed for GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
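GC content in fixed windows can be computed with a running count, so the scan stays linear in sequence length. In this sketch the 300 bp window comes from the report, while the 1 bp step and the 70% "high GC" cut-off are assumptions chosen for illustration; the report does not state its own threshold.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sliding_gc(seq: str, window: int = 300):
    """Return (start, GC fraction) for every window, sliding 1 bp at a time."""
    is_gc = [1 if base in "GC" else 0 for base in seq.upper()]
    if len(is_gc) < window:
        return []
    running = sum(is_gc[:window])
    results = [(0, running / window)]
    for start in range(1, len(is_gc) - window + 1):
        # Add the base entering the window, drop the one leaving it.
        running += is_gc[start + window - 1] - is_gc[start - 1]
        results.append((start, running / window))
    return results


def high_gc_windows(seq: str, window: int = 300, threshold: float = 0.7):
    """Windows whose GC fraction exceeds a hypothetical PCR-unfriendly cut-off."""
    return [(s, gc) for s, gc in sliding_gc(seq, window) if gc > threshold]


test_seq = "AT" * 200 + "GC" * 200   # synthetic 800 bp sequence
flagged = high_gc_windows(test_seq)
print(flagged[0][0], len(flagged))  # 311 190
```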

Overview of the GC Content Distribution (down), window size: 300 bp

[Figure: GC content of the downstream sequence in 300 bp windows.]

Summary: full length 2000 bp; A 527 (26.35%), C 420 (21.0%), T 605 (30.25%), G 448 (22.4%); GC content 43.4%.

Note: The 2000 bp region downstream of the stop codon is analyzed for GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
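The composition summaries for the two 2000 bp regions can be checked, and their overall GC content derived, directly from the base counts. A minimal sketch using the counts reported for the upstream and downstream regions:

```python
def composition_summary(counts: dict) -> dict:
    """Length, per-base percentages and GC percentage from raw base counts."""
    total = sum(counts.values())
    percent = {base: round(100 * n / total, 2) for base, n in counts.items()}
    gc_percent = round(100 * (counts["G"] + counts["C"]) / total, 2)
    return {"length": total, "percent": percent, "gc_percent": gc_percent}


upstream = composition_summary({"A": 531, "C": 496, "T": 555, "G": 418})
downstream = composition_summary({"A": 527, "C": 420, "T": 605, "G": 448})
print(upstream["length"], upstream["gc_percent"])      # 2000 45.7
print(downstream["length"], downstream["gc_percent"])  # 2000 43.4
```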


BLAT Search Results (up)

(First START/END: position within the 2000 bp query; second START/END: genomic coordinates.)

QUERY    SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
YourSeq   2000      1   2000   2000    100.0%  chr13  -        47014780   47016779    2000
YourSeq     74    857   1323   2000     69.3%  chr11  +       119973023  119973241     219
YourSeq     66    839   1323   2000     93.6%  chr1   -        13098621   13393363  294743
YourSeq     59    843    928   2000     91.6%  chr13  -        49333577   49333663      87
YourSeq     54   1237   1323   2000     76.4%  chr4   +       137849384  137849461      78
YourSeq     52   1230   1318   2000     82.5%  chr19  -         7319804    7319913     110
YourSeq     52   1247   1323   2000     84.3%  chr13  -        38049506   38049587      82
YourSeq     52   1240   1323   2000     78.4%  chr4   +       119167669  119167751      83
YourSeq     50   1240   1323   2000     96.4%  chr4   -       136209922  136210006      85
YourSeq     50   1250   1323   2000     78.9%  chr18  -        35478239   35478309      71
YourSeq     49   1241   1323   2000     76.9%  chr9   +        70702920   70703001      82
YourSeq     49   1241   1323   2000     74.4%  chr5   +        38057823   38057900      78
YourSeq     48   1255   1323   2000     79.7%  chr7   +       109759491  109759547      57
YourSeq     48   1252   1323   2000     83.4%  chr2   +       144668090  144668161      72
YourSeq     47    144    214   2000     83.1%  chr4   -       150957443  150957513      71
YourSeq     47   1252   1323   2000     92.8%  chr16  -         4713482    4713554      73
YourSeq     47   1255   1323   2000     84.1%  chr13  +        44721888   44721956      69
YourSeq     46   1255   1323   2000     83.9%  chr3   +       131081848  131081921      74
YourSeq     46   1256   1446   2000     66.7%  chr19  +         8565736    8565815      80
YourSeq     45   1253   1321   2000     77.8%  chr8   +       106975563  106975626      64

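Note: The 2000 bp region upstream of the start codon is BLAT-searched against the genome. Apart from the expected self-hit on chr13 (100.0% identity over all 2000 bp), no significant similarity is found; the best remaining hit has a score of only 74.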
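Judging "no significant similarity" amounts to setting the self-hit aside and looking at the best remaining hit. This sketch parses rows in the column order of the BLAT table (using the first four upstream rows as sample data), treats the top-scoring hit as the query's own locus, and compares the next best score with 10% of the query size. That 10% threshold is an illustrative assumption, not the criterion used in this report.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int      # start within the query
    q_end: int        # end within the query
    q_size: int
    identity: float   # percent
    chrom: str
    strand: str
    t_start: int      # genomic start
    t_end: int        # genomic end
    span: int


def parse_blat_row(line: str) -> BlatHit:
    """Parse: QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def best_off_target(hits, min_fraction=0.1):
    """Skip the top-scoring (self) hit; return the best remaining hit and
    whether its score reaches min_fraction of the query size (assumed cut-off)."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    best = ranked[1]
    return best, best.score >= min_fraction * best.q_size


rows = """\
YourSeq 2000 1 2000 2000 100.0% chr13 - 47014780 47016779 2000
YourSeq 74 857 1323 2000 69.3% chr11 + 119973023 119973241 219
YourSeq 66 839 1323 2000 93.6% chr1 - 13098621 13393363 294743
YourSeq 59 843 928 2000 91.6% chr13 - 49333577 49333663 87"""
hits = [parse_blat_row(r) for r in rows.splitlines()]
best, significant = best_off_target(hits)
print(best.chrom, best.score, significant)  # chr11 74 False
```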

BLAT Search Results (down)

(First START/END: position within the 2000 bp query; second START/END: genomic coordinates.)

QUERY    SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
YourSeq   2000      1   2000   2000    100.0%  chr13  -        47011577   47013576    2000
YourSeq     79   1596   1713   2000     93.5%  chr1   -       151058199  151058536     338
YourSeq     78   1575   1710   2000     81.5%  chr11  +        95147152   95147275     124
YourSeq     76   1575   1691   2000     94.4%  chr10  -       119138104  119138338     235
YourSeq     75   1487   1631   2000     91.3%  chr12  -        76150027   76150197     171
YourSeq     75   1570   1710   2000     93.2%  chr10  +        90393863   90394035     173
YourSeq     72   1575   1746   2000     79.6%  chr11  -       104069420  104069555     136
YourSeq     72   1575   1688   2000     86.9%  chr7   +        81714917   81715010      94
YourSeq     68   1575   1669   2000     90.0%  chr10  -        61159071   61159164      94
YourSeq     62   1538   1623   2000     91.2%  chr12  -        54917308   54917813     506
YourSeq     61   1620   1710   2000     93.3%  chr11  -        68573322   68573437     116
YourSeq     59   1577   1639   2000     98.5%  chr7   -         7100026    7100090      65
YourSeq     59   1595   1686   2000     91.4%  chr1   +       156111051  156111145      95
YourSeq     55   1593   1712   2000     73.9%  chr12  +        84299153   84299224      72
YourSeq     55   1575   1631   2000    100.0%  chr1   +       120238916  120238974      59
YourSeq     54   1600   1691   2000     93.9%  chr12  -        56707428   56707673     246
YourSeq     51   1575   1631   2000     96.5%  chr12  +        48805918   48805976      59
YourSeq     47   1655   1710   2000     94.7%  chr10  -         4550963    4551025      63
YourSeq     47   1655   1710   2000     92.2%  chr1   -       133740590  133740644      55
YourSeq     47   1575   1631   2000     96.1%  chr14  +        73550138   73550196      59

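Note: The 2000 bp region downstream of the stop codon is BLAT-searched against the genome. Apart from the expected self-hit on chr13 (100.0% identity over all 2000 bp), no significant similarity is found; the best remaining hit has a score of only 79.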
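In the downstream BLAT results, every secondary hit falls within query positions 1487 to 1746, which suggests a short element shared with other loci (an interpretation, not a finding of this report). This sketch computes hit depth along the query with a difference array and reports the most heavily covered stretch, which one might prefer to avoid when placing primers (an assumption). The intervals are the query START/END values of the 19 non-self downstream hits.

```python
def coverage_peak(intervals, q_size):
    """Return (depth, start, end) of the first contiguous run of query
    positions covered by the most hits. Intervals are 1-based inclusive."""
    diff = [0] * (q_size + 2)
    for start, end in intervals:
        diff[start] += 1      # hit begins covering here
        diff[end + 1] -= 1    # hit stops covering after 'end'
    depth, best, run = 0, 0, None
    for pos in range(1, q_size + 1):
        depth += diff[pos]
        if depth > best:
            best, run = depth, [pos, pos]
        elif depth == best and run and run[1] == pos - 1:
            run[1] = pos      # extend the current maximal run
    if run is None:
        return 0, None, None
    return best, run[0], run[1]


downstream_hits = [
    (1596, 1713), (1575, 1710), (1575, 1691), (1487, 1631), (1570, 1710),
    (1575, 1746), (1575, 1688), (1575, 1669), (1538, 1623), (1620, 1710),
    (1577, 1639), (1595, 1686), (1593, 1712), (1575, 1631), (1600, 1691),
    (1575, 1631), (1655, 1710), (1655, 1710), (1575, 1631),
]
print(coverage_peak(downstream_hits, 2000))  # (17, 1620, 1623)
```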


Gene information: Nhlrc1 NHL repeat containing 1 [Mus musculus (house mouse)], Gene ID: 105193, updated on 12-Aug-2019

Gene summary

Official Symbol: Nhlrc1 (provided by MGI)
Official Full Name: NHL repeat containing 1 (provided by MGI)
Primary source: MGI:MGI:2145264
See related: Ensembl:ENSMUSG00000044231
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: EPM2B; AI505271; B230309E09Rik

Genomic context

Location: chromosome 13; 13 A5

Exon count: 1

Annotation release  Status    Assembly                      Chr  Location
108                 current   GRCm38.p6 (GCF_000001635.26)  13   NC_000079.6 (47012557..47014850, complement)
Build 37.2          previous  MGSCv37 (GCF_000001635.18)    13   NC_000079.5 (47107926..47110219, complement)
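NCBI locations such as 47012557..47014850 are 1-based and inclusive at both ends, so the locus length is end minus start plus one. This sketch shows that the locus length matches the 2294 bp Nhlrc1-201 transcript, as expected for a single-exon gene, and that the two assemblies differ here by a constant shift. That shift holds only for this locus; converting coordinates between assemblies in general calls for a liftover tool rather than a fixed offset.

```python
def span_bp(start: int, end: int) -> int:
    """Length of a 1-based, fully inclusive interval such as 47012557..47014850."""
    return end - start + 1


grcm38 = (47012557, 47014850)    # NC_000079.6, annotation release 108
mgscv37 = (47107926, 47110219)   # NC_000079.5, Build 37.2

print(span_bp(*grcm38), span_bp(*mgscv37))               # 2294 2294
print(mgscv37[0] - grcm38[0], mgscv37[1] - grcm38[1])    # 95369 95369
```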

[Figure: genomic context of Nhlrc1 on chromosome 13 (NC_000079.6).]


Transcript information: This gene has 1 transcript

Gene: Nhlrc1 ENSMUSG00000044231

Description: NHL repeat containing 1 [Source:MGI Symbol;Acc:MGI:2145264]
Gene Synonyms: B230309E09Rik, EPM2B, Malin
Location: Chromosome 13: 47,012,557-47,014,850 reverse strand. GRCm38:CM001006.2
About this gene: This gene has 1 transcript (splice variant), 157 orthologues, is a member of 1 Ensembl protein family and is associated with 29 phenotypes.
Transcripts:

Name        Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt         Flags
Nhlrc1-201  ENSMUST00000052747.3  2294  401aa    ENSMUSP00000054990.2  Protein coding  CCDS26487  Q0VF71, Q8BR37  TSL:NA; GENCODE basic; APPRIS P1

[Figure: Ensembl view of a 22.29 kb region of chromosome 13 (about 47.005 to 47.020 Mb). Forward strand: 2010001K21Rik-201 (pseudogene), 2010001K21Rik-202 and 2010001K21Rik-204 (retained intron), 2010001K21Rik-203 (lncRNA), Gm18807-201 (processed pseudogene); contig AC096628.14. Reverse strand: Nhlrc1-201 (protein coding), Gm5790-201 (processed pseudogene), Tpmt-201 and Tpmt-208 (protein coding). Also shown: the Ensembl Regulatory Build (CTCF, open chromatin, promoter, promoter flank, transcription factor binding site) and the gene legend (protein coding: merged Ensembl/Havana, Ensembl protein coding; non-protein coding: RNA gene, pseudogene, processed transcript).]


Transcript: ENSMUST00000052747

Nhlrc1-201 (protein coding), reverse strand, 2.29 kb

[Figure: domain and variant view of the Nhlrc1-201 protein (scale 0 to 401 aa), with sequence variants from dbSNP and other sources (missense and synonymous).]

Domain annotations:
Low complexity (Seg)
Superfamily: SSF57850, SSF101898
SMART: Zinc finger, RING-type
Pfam: Zinc finger, RING-type
PROSITE profiles: Zinc finger, RING-type; NHL repeat, subgroup
PROSITE patterns: Zinc finger, RING-type, conserved site
PANTHER: PTHR24104:SF28, PTHR24104
Gene3D: Zinc finger, RING/FYVE/PHD-type; 2.40.10.500; Six-bladed beta-propeller, TolB-like
CDD: cd16516, cd14961

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
