https://www.alphaknockout.com

Mouse Hmces Knockout Project (CRISPR/Cas9)

Objective: To create a Hmces knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Hmces gene (NCBI Reference Sequence: NM_173737; Ensembl: ENSMUSG00000030060) is located on mouse chromosome 6. Seven exons are identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 7 (Transcript: ENSMUST00000032141). Exons 3 to 5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Homozygous knockout leads to changes in DNA methylation, resulting in an altered embryonic gene expression profile and embryonic sub-lethality (lower embryonic survival).

Exon 3 starts at about 17.37% of the way into the coding region. Exons 3 to 5 cover 42.4% of the coding region. The size of the effective KO region is ~7972 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', with exon labels 1, 3, 4, 5 and 7 and the two gRNA regions marked. Legend: exon of mouse Hmces; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the upstream sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of Exon 3 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
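The self-alignment described in this note can be sketched as a word-match dot plot. This is a minimal illustration, not the tool used to produce the report: the function names `self_dot_plot` and `offset_counts` are hypothetical, and exact matching of 15 bp words on the forward strand is an assumption about how the plot is built. A tandem repeat would show up as many off-diagonal hits sharing one small offset.

```python
from collections import Counter, defaultdict


def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs with i < j whose window-length words are identical.

    Each pair is one off-diagonal dot in a self-vs-self dot plot
    (forward strand only; the main diagonal i == j is omitted).
    """
    starts_by_word = defaultdict(list)
    for i in range(len(seq) - window + 1):
        starts_by_word[seq[i:i + window]].append(i)

    hits = []
    for starts in starts_by_word.values():
        for a, i in enumerate(starts):
            for j in starts[a + 1:]:
                hits.append((i, j))
    return sorted(hits)


def offset_counts(hits):
    """Count hits per offset (j - i); a tandem repeat piles hits onto one small offset."""
    return Counter(j - i for i, j in hits)


if __name__ == "__main__":
    # Toy input (not the Hmces flank): a 4 bp unit repeated 10 times.
    toy = "ACGT" * 10
    counts = offset_counts(self_dot_plot(toy, window=15))
    print(counts.most_common(3))
```

On a real 2000 bp flank, an empty or sparse hit list with no dominant offset would be consistent with the "no significant tandem repeat" conclusion.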

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the downstream sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of Exon 5 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the upstream sequence.]

Summary: Full Length(2000bp) | A(24.25% 485) | C(23.4% 468) | T(33.55% 671) | G(18.8% 376)

Note: The 2000 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
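A sliding-window GC profile of the kind described in this note can be sketched as follows. The function names, the 50 bp step and the 0.65 cutoff for a "high GC" window are assumptions made for illustration; the report states only the 300 bp window size.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq, window=300, step=50):
    """Return (start, gc_fraction) for each full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def high_gc_windows(profile, threshold=0.65):
    """Keep windows whose GC fraction is at or above threshold (an assumed cutoff)."""
    return [(start, gc) for start, gc in profile if gc >= threshold]


if __name__ == "__main__":
    # Toy input (not the Hmces flank): an AT-only half followed by a GC-only half.
    toy = "AT" * 150 + "GC" * 150
    profile = gc_profile(toy, window=300, step=150)
    print(profile)
    print(high_gc_windows(profile))
```

An empty result from `high_gc_windows` on the real flank would correspond to the statement that no significant high GC-content region is found.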

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the downstream sequence.]

Summary: Full Length(2000bp) | A(28.6% 572) | C(17.2% 344) | T(34.4% 688) | G(19.8% 396)

Note: The 2000 bp section downstream of Exon 5 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
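The overall GC content of each flank follows directly from the base counts in the two summary lines. The short sketch below does that arithmetic; the variable and function names are illustrative only.

```python
# Base counts copied from the two "Summary" lines of this report.
UPSTREAM = {"A": 485, "C": 468, "T": 671, "G": 376}
DOWNSTREAM = {"A": 572, "C": 344, "T": 688, "G": 396}


def gc_percent(counts):
    """Percentage of G plus C among all counted bases."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


for name, counts in (("upstream", UPSTREAM), ("downstream", DOWNSTREAM)):
    print(f"{name}: {sum(counts.values())} bp, GC = {gc_percent(counts):.1f}%")
# upstream: 2000 bp, GC = 42.2%
# downstream: 2000 bp, GC = 37.0%
```

Both values are whole-region averages; they do not by themselves rule out a short GC-rich stretch, which is what the 300 bp windowed plot is for.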


BLAT Search Results (up)

QUERY   SCORE START   END QSIZE IDENTITY  CHROM STRAND     START       END   SPAN
---------------------------------------------------------------------------------
YourSeq  2000     1  2000  2000   100.0%  chr6  +       87915840  87917839   2000
YourSeq   106   817  1552  2000    93.5%  chr19 +       46232626  46539953 307328
YourSeq   100   480   952  2000    87.9%  chr7  -      118255340 118255924    585
YourSeq   100   795  1603  2000    91.6%  chr1  +       59485949  59598682 112734
YourSeq    88   501   876  2000    90.8%  chr9  -       61790038  61930836 140799
YourSeq    83   795  1556  2000    91.1%  chr1  +       13083833  13147975  64143
YourSeq    81   794   929  2000    83.0%  chr15 -       12783729  12783872    144
YourSeq    80   794   950  2000    90.9%  chr4  -      117168069 117168227    159
YourSeq    79   792   951  2000    87.7%  chr18 -       34415701  34415874    174
YourSeq    76   793   929  2000    91.5%  chr7  -      131260698 131260843    146
YourSeq    74   794   932  2000    94.3%  chr8  -      109815785 109815928    144
YourSeq    74   793   929  2000    88.6%  chr17 -       63523350  63523484    135
YourSeq    73   793   927  2000    92.0%  chr7  -       28750471  28750610    140
YourSeq    71   822   952  2000    89.9%  chr2  -       38493616  38493751    136
YourSeq    70   793   929  2000    92.7%  chr5  -      145258472 145258616    145
YourSeq    70   808   934  2000    91.7%  chr15 -        8177543   8177677    135
YourSeq    70   792   934  2000    91.7%  chr11 -       76408269  76408414    146
YourSeq    67   803   929  2000    91.3%  chr1  -      170715717 170715845    129
YourSeq    67   793   929  2000    91.4%  chr3  +       67555551  67555698    148
YourSeq    67   794   876  2000    91.4%  chr11 +       80028524  80028606     83

Note: The 2000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
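One way to read the upstream table is to compare the self-hit with the best secondary hit. The sketch below parses rows in the column order shown in the results (query name first) and reports the ratio of the two scores. The `BlatHit` type and the helper names are hypothetical, and the report does not state a numeric cutoff for "significant" similarity, so none is applied here.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one whitespace-separated result row (query name in column 1)."""
    f = line.split()
    return BlatHit(
        int(f[1]), int(f[2]), int(f[3]), int(f[4]),
        float(f[5].rstrip("%")), f[6], f[7],
        int(f[8]), int(f[9]), int(f[10]),
    )


def summarize(hits):
    """Return (self_hit, best_secondary, score_ratio) for a list of hits."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    self_hit, best_secondary = ranked[0], ranked[1]
    return self_hit, best_secondary, best_secondary.score / self_hit.score


# First three rows of the upstream results in this report.
ROWS = """\
YourSeq 2000 1 2000 2000 100.0% chr6 + 87915840 87917839 2000
YourSeq 106 817 1552 2000 93.5% chr19 + 46232626 46539953 307328
YourSeq 100 480 952 2000 87.9% chr7 - 118255340 118255924 585
"""

hits = [parse_hit(line) for line in ROWS.splitlines() if line.strip()]
self_hit, best_secondary, ratio = summarize(hits)
print(self_hit.chrom, best_secondary.chrom, f"{ratio:.3f}")  # chr6 chr19 0.053
```

The best secondary hit scores 106 against 2000 for the self-hit, which is in line with the note that no significant similarity is found.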

BLAT Search Results (down)

QUERY   SCORE START   END QSIZE IDENTITY  CHROM STRAND     START       END   SPAN
---------------------------------------------------------------------------------
YourSeq  2000     1  2000  2000   100.0%  chr6  +       87925812  87927811   2000
YourSeq   115  1593  1965  2000    75.8%  chr3  +       41008513  41008779    267
YourSeq   113  1583  1759  2000    79.9%  chr7  -       34633324  34633494    171
YourSeq   111  1566  1756  2000    77.8%  chr16 +       90392926  90393110    185
YourSeq   110  1599  1759  2000    84.0%  chr9  -      106606541 106606694    154
YourSeq   110  1584  1759  2000    84.1%  chr19 +       45344865  45345046    182
YourSeq   106  1414  1759  2000    72.1%  chr10 +       56240617  56240810    194
YourSeq   105  1567  1725  2000    82.0%  chr11 -       50188248  50188399    152
YourSeq   104  1583  1759  2000    77.7%  chr9  -      107643043 107643215    173
YourSeq   104  1612  1759  2000    83.7%  chr12 -      108540733 108540879    147
YourSeq   102  1602  1995  2000    90.0%  chr2  -      156964990 156965492    503
YourSeq    98  1593  1757  2000    86.0%  chr4  -       32849456  32849855    400
YourSeq    98  1595  1759  2000    84.3%  chr7  +       16937175  16937339    165
YourSeq    98  1610  1759  2000    86.1%  chr12 +      105503194 105503545    352
YourSeq    94  1609  1771  2000    91.4%  chrX  -       10733597  10733781    185
YourSeq    94  1579  1715  2000    85.5%  chr3  +       65118213  65118354    142
YourSeq    94  1598  1759  2000    80.6%  chr18 +       10075832  10075995    164
YourSeq    94  1566  1759  2000    79.1%  chr15 +      102072853 102073021    169
YourSeq    92  1592  1759  2000    75.8%  chr9  -      111603637 111603802    166
YourSeq    92  1534  1759  2000    77.0%  chr17 -       79087009  79087185    177

Note: The 2000 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.
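The chr6 coordinates of the two 100% self-hits also allow a quick consistency check of the stated KO region size. The sketch below assumes that the upstream and downstream 2000 bp sections directly abut the knockout region and that the coordinates are 1-based and inclusive (each self-hit satisfies END - START + 1 = SPAN = 2000). The function name is hypothetical.

```python
def region_between_flanks(upstream_end, downstream_start):
    """Return (start, end, size) of the region lying strictly between two flanks.

    Coordinates are treated as 1-based and inclusive.
    """
    start = upstream_end + 1
    end = downstream_start - 1
    return start, end, end - start + 1


# chr6 coordinates of the 100% self-hits in the two BLAT result sets of this report.
UPSTREAM_FLANK = (87915840, 87917839)
DOWNSTREAM_FLANK = (87925812, 87927811)

start, end, size = region_between_flanks(UPSTREAM_FLANK[1], DOWNSTREAM_FLANK[0])
print(f"chr6:{start}-{end} ({size} bp)")  # chr6:87917840-87925811 (7972 bp)
```

Under these assumptions the interval between the flanks is 7972 bp, matching the effective KO region size given in the strategy summary.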


Gene and information:

Hmces 5-hydroxymethylcytosine (hmC) binding, ES cell specific [Mus musculus (house mouse)]
Gene ID: 232210, updated on 12-Aug-2019

Gene summary

Official Symbol: Hmces (provided by MGI)
Official Full Name: 5-hydroxymethylcytosine (hmC) binding, ES cell specific (provided by MGI)
Primary source: MGI:MGI:1914053
See related: Ensembl:ENSMUSG00000030060
Gene type: protein coding
RefSeq status: PROVISIONAL
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Srap1; C85376; 8430410A17Rik
Expression: Ubiquitous expression in limb E14.5 (RPKM 8.7), CNS E18 (RPKM 8.2) and 28 other tissues
Orthologs: human

Genomic context

Location: 6; 6 D1
Exon count: 7

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  6    NC_000072.6 (87913907..87936619)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    6    NC_000072.5 (87863970..87886608)

[Figure: genomic context, Chromosome 6 - NC_000072.6]


Transcript information: This gene has 7 transcripts

Gene: Hmces ENSMUSG00000030060

Description: 5-hydroxymethylcytosine (hmC) binding, ES cell specific [Source:MGI Symbol;Acc:MGI:1914053]
Gene Synonyms: 8430410A17Rik, Srap1
Location: Chromosome 6: 87,913,935-87,936,629 forward strand. GRCm38:CM000999.2
About this gene: This gene has 7 transcripts (splice variants), 189 orthologues, is a member of 1 Ensembl protein family and is associated with 6 phenotypes.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Hmces-202  ENSMUST00000113606.1   1572  353aa       ENSMUSP00000109236.1  Protein coding           CCDS39551  Q8R1M0      TSL:1 GENCODE basic APPRIS P1
Hmces-201  ENSMUST00000032141.13  1464  353aa       ENSMUSP00000032141.7  Protein coding           CCDS39551  Q8R1M0      TSL:1 GENCODE basic APPRIS P1
Hmces-206  ENSMUST00000204232.1   513   76aa        ENSMUSP00000145504.1  Nonsense mediated decay  -          A0A0N4SWG0  TSL:5
Hmces-205  ENSMUST00000148914.1   882   No protein  -                     Retained intron          -          -           TSL:1
Hmces-207  ENSMUST00000204614.2   902   No protein  -                     lncRNA                   -          -           TSL:3
Hmces-203  ENSMUST00000124551.1   672   No protein  -                     lncRNA                   -          -           TSL:3
Hmces-204  ENSMUST00000132651.2   567   No protein  -                     lncRNA                   -          -           TSL:2


[Figure: Ensembl genome browser view of a 42.70 kb region (87.91 Mb to 87.94 Mb). Forward strand, comprehensive gene set: Copg1-202 (protein coding), Copg1-206 and Copg1-207 (retained intron), Copg1-203 (lncRNA), Hmces-201 and Hmces-202 (protein coding), Hmces-203, Hmces-204 and Hmces-207 (lncRNA), Hmces-205 (retained intron), Hmces-206 (nonsense mediated decay). Contig: AC153872.2. Reverse strand: Gm26636-201 (lncRNA). Regulatory Build legend: CTCF, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana) and non-protein coding (RNA gene; processed transcript).]


Transcript: ENSMUST00000032141

[Figure: Hmces-201 (protein coding), 22.70 kb, forward strand, with protein annotation tracks. MobiDB lite and low complexity (Seg); Superfamily and Gene3D: SOS response associated peptidase-like; Pfam and PANTHER: SOS response associated peptidase (SRAP); sequence variants (dbSNP and all other sources), with missense and synonymous variants marked. Scale bar: 0 to 353.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
