https://www.alphaknockout.com

Mouse Hmbox1 Knockout Project (CRISPR/Cas9)

Objective: To create an Hmbox1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Hmbox1 gene (NCBI Reference Sequence: NM_177338; Ensembl: ENSMUSG00000021972) is located on mouse chromosome 14. Ten exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 10 (Transcript: ENSMUST00000067843). Exons 3~4 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a gene-trapped allele exhibit absence of TERT binding to chromatin, as shown by subcellular fractionation analysis of mouse embryonic fibroblasts.

Exon 3 starts at about 1.91% of the coding region. Exons 3~4 cover 44.79% of the coding region. The size of the effective KO region: ~9552 bp. The KO region does not contain any other known gene.
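The percentages above can be turned back into approximate base counts. The sketch below is illustrative only. It assumes that the percentages are relative to the coding sequence of ENSMUST00000067843 (419 aa in the transcript table, so 419 x 3 = 1257 bp with the stop codon excluded); this denominator is an assumption, not something stated in the report. The helper name pct_to_bp is hypothetical.

```python
# Assumption: the reported percentages use the 419-codon CDS of
# ENSMUST00000067843 as denominator, stop codon excluded.
PROTEIN_AA = 419
CDS_BP = PROTEIN_AA * 3  # 1257 bp


def pct_to_bp(pct: float, cds_bp: int = CDS_BP) -> int:
    """Convert a percentage of the coding region to the nearest whole base count."""
    return round(pct / 100 * cds_bp)


# Coding bases that lie before exon 3 (reported as about 1.91%).
upstream_coding_bp = pct_to_bp(1.91)

# Coding bases carried by exons 3~4 (reported as 44.79%).
deleted_coding_bp = pct_to_bp(44.79)

# A deleted coding length that is not a multiple of 3 would shift the reading frame.
shifts_reading_frame = deleted_coding_bp % 3 != 0

print(upstream_coding_bp, deleted_coding_bp, shifts_reading_frame)
```

Under that assumption, the deleted coding length is not a multiple of three, which would be consistent with a frameshift in any transcript that splices exon 2 to exon 5. The report itself does not state this.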


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', with exons 1 to 10 of mouse Hmbox1 and two gRNA regions marked around exons 3~4. Legend: exon of mouse Hmbox1; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: self-alignment dot plot. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 3 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
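As an illustration of the self-alignment idea, the sketch below lists every pair of positions whose 15 bp words are identical. It is a naive exact-match version on the forward strand only; the report does not say which dot plot tool or scoring rule was used, so the function name self_dot_plot and the exact-match rule are assumptions. The 20 bp unit is a made-up toy sequence, not Hmbox1 sequence.

```python
from collections import defaultdict


def self_dot_plot(seq: str, window: int = 15):
    """Return sorted (i, j) pairs, i < j, where the window-length words
    starting at positions i and j of seq are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    hits = []
    for locs in positions.values():
        # Every pair of occurrences of the same word is one off-diagonal dot.
        for a in range(len(locs)):
            for b in range(a + 1, len(locs)):
                hits.append((locs[a], locs[b]))
    return sorted(hits)


# Toy data (hypothetical): one 20 bp unit, alone and as a tandem duplicate.
unit = "ACGTTGCAAGGCTTACCGAT"
no_repeat_hits = self_dot_plot(unit)
tandem_hits = self_dot_plot(unit * 2)
```

A tandem repeat appears as a run of hits at a constant offset (here 20, the unit length), which corresponds to a line parallel to the main diagonal in the plotted matrix. A region without such runs gives an empty hit list.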

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: self-alignment dot plot. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section downstream of Exon 4 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up) Window size: 300 bp


Summary: Full Length(2000bp) | A(27.8% 556) | C(19.75% 395) | T(32.05% 641) | G(20.4% 408)

Note: The 2000 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp


Summary: Full Length(2000bp) | A(25.6% 512) | C(20.1% 402) | T(34.55% 691) | G(19.75% 395)

Note: The 2000 bp section downstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
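The overall GC content of each 2000 bp flank follows directly from the base counts in the two summaries. The sketch below recomputes it and also shows a simple sliding-window GC calculation. The function names gc_percent and windowed_gc are hypothetical, and the report does not specify how its 300 bp windows are stepped, so the step parameter is an assumption.

```python
def gc_percent(counts: dict) -> float:
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq: str, window: int = 300, step: int = 1):
    """GC percentage of each full window along seq (step size is an assumption)."""
    seq = seq.upper()
    return [
        100 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts copied from the two summary lines above.
upstream = {"A": 556, "C": 395, "T": 641, "G": 408}
downstream = {"A": 512, "C": 402, "T": 691, "G": 395}

upstream_gc = gc_percent(upstream)      # (395 + 408) / 2000
downstream_gc = gc_percent(downstream)  # (402 + 395) / 2000
```

Both flanks come out at roughly 40% GC overall, in line with the notes that no significant high GC-content region is found.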


BLAT Search Results (up)

QUERY    SCORE START   END QSIZE IDENTITY  CHROM  STRAND      START        END    SPAN
---------------------------------------------------------------------------------------
YourSeq   2000     1  2000  2000   100.0%  chr14  -        64897127   64899126    2000
YourSeq    125   669   876  2000    88.4%  chr9   +        75571854   75572062     209
YourSeq    123   670   836  2000    87.4%  chr4   +       106071736  106072229     494
YourSeq    120   677   879  2000    84.5%  chr6   -       107509285  107509495     211
YourSeq    120   670   880  2000    89.0%  chr2   -       128646737  128646959     223
YourSeq    117   710   890  2000    87.7%  chr11  -        62733645   62733834     190
YourSeq    117   669   885  2000    78.3%  chr2   +       116942820  116943044     225
YourSeq    117   668   831  2000    92.1%  chr11  +        32105104   32310599  205496
YourSeq    115   670   822  2000    89.2%  chr10  -        75972264   75972583     320
YourSeq    113   678   876  2000    86.0%  chr10  -        71368919   71369136     218
YourSeq    112   697   876  2000    81.3%  chr10  -        94671494   94671672     179
YourSeq    111   670   825  2000    86.8%  chrX   -       136768234  136768398     165
YourSeq    111   669   841  2000    88.3%  chr6   -        31507301   31507482     182
YourSeq    111   698   885  2000    86.3%  chr1   -       177331476  177331663     188
YourSeq    109   647   839  2000    79.1%  chr19  -        59341464   59341612     149
YourSeq    107   680   857  2000    88.1%  chr18  -        38901045   38901231     187
YourSeq    107   669   825  2000    89.7%  chr15  -        35460727   35460887     161
YourSeq    106   713   885  2000    84.8%  chr6   -       108671510  108671682     173
YourSeq    105   697   861  2000    89.4%  chr8   -       104545689  104545864     176
YourSeq    105   667   881  2000    84.4%  chr15  -         8247438    8247660     223

Note: The 2000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
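One way to make "no significant similarity" concrete is to parse the BLAT rows and compare each non-self score with the query size. The sketch below does this for the first three rows of the upstream table. The 50% score cut-off is a hypothetical threshold chosen for illustration; the report does not state the criterion it applied. The names BlatHit, parse_hit and significant_off_target are likewise hypothetical.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated BLAT row; the first field is the query name."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def significant_off_target(hits, min_fraction: float = 0.5):
    """Non-self hits whose score reaches min_fraction of the query size.
    The threshold is an illustrative assumption, not the report's criterion."""
    return [h for h in hits
            if not (h.score == h.q_size and h.identity == 100.0)
            and h.score / h.q_size >= min_fraction]


# First three rows of the upstream table.
ROWS = [
    "YourSeq 2000 1 2000 2000 100.0% chr14 - 64897127 64899126 2000",
    "YourSeq 125 669 876 2000 88.4% chr9 + 75571854 75572062 209",
    "YourSeq 123 670 836 2000 87.4% chr4 + 106071736 106072229 494",
]
hits = [parse_hit(r) for r in ROWS]
flagged = significant_off_target(hits)
```

The best non-self hit scores 125 against a 2000 bp query, far below any reasonable cut-off, so nothing is flagged with this threshold.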

BLAT Search Results (down)

QUERY    SCORE START   END QSIZE IDENTITY  CHROM  STRAND      START        END    SPAN
---------------------------------------------------------------------------------------
YourSeq   2000     1  2000  2000   100.0%  chr14  -        64885575   64887574    2000
YourSeq     68   464   619  2000    85.8%  chr19  -        45277643   45277791     149
YourSeq     56  1802  1882  2000    82.7%  chr14  +        21768496   21768574      79
YourSeq     52  1802  1881  2000    80.8%  chr9   +        14375123   14375201      79
YourSeq     47  1813  1879  2000    85.1%  chr1   -       180821849  180821915      67
YourSeq     43  1820  1880  2000    85.3%  chr18  +        34560746   34560806      61
YourSeq     43  1813  1871  2000    82.8%  chr11  +        97623030   97623087      58
YourSeq     42  1837  1884  2000    93.8%  chr8   -        72716680   72716727      48
YourSeq     42  1810  1871  2000    81.7%  chr11  +        79925475   79925535      61
YourSeq     41  1064  1123  2000    80.0%  chr14  +       121397237  121397288      52
YourSeq     40   457   502  2000    93.5%  chr6   -         9414825    9414870      46
YourSeq     40  1821  1876  2000    85.8%  chr5   -       147897567  147897622      56
YourSeq     40  1837  1880  2000    95.5%  chr4   +        40480448   40480491      44
YourSeq     40  1080  1122  2000    97.7%  chr1   +        78667808   78667992     185
YourSeq     38  1815  1880  2000    75.4%  chr3   -        95546081   95546145      65
YourSeq     38  1077  1123  2000    80.5%  chr12  -        54221765   54221805      41
YourSeq     38   458   520  2000    88.4%  chr11  -        78766835   78766895      61
YourSeq     38  1082  1122  2000    97.6%  chr10  +       102360440  102360481      42
YourSeq     37  1085  1123  2000    97.5%  chr9   +        57767410   57767448      39
YourSeq     37  1820  1866  2000    89.4%  chr7   +       135382481  135382527      47

Note: The 2000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
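The self-hits in the two BLAT tables also give a quick cross-check of the stated KO region size. Hmbox1 lies on the reverse strand, so the flank upstream of exon 3 has the higher chr14 coordinates. The sketch below assumes that both 2000 bp flanks directly abut the region to be deleted, which the report implies but does not state explicitly; the function name gap_between is hypothetical.

```python
# chr14 coordinates of the 100% self-hits in the two BLAT tables (minus strand).
UPSTREAM_FLANK = (64897127, 64899126)    # 2000 bp upstream of exon 3
DOWNSTREAM_FLANK = (64885575, 64887574)  # 2000 bp downstream of exon 4


def gap_between(lower: tuple, upper: tuple) -> int:
    """Number of bases strictly between two non-overlapping closed intervals,
    where lower has the smaller coordinates."""
    return upper[0] - lower[1] - 1


# Assumption: the flanks touch the KO region on both sides.
ko_region_bp = gap_between(DOWNSTREAM_FLANK, UPSTREAM_FLANK)
print(ko_region_bp)
```

Under that assumption the gap between the two flanks is 9552 bp, which agrees with the effective KO region size given in the strategy summary.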


Gene information: Hmbox1 homeobox containing 1 [Mus musculus (house mouse)] Gene ID: 219150, updated on 10-Oct-2019

Gene summary

Official Symbol: Hmbox1 (provided by MGI)
Official Full Name: homeobox containing 1 (provided by MGI)
Primary source: MGI:MGI:2445066
See related: Ensembl:ENSMUSG00000021972
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI451877; AI604847; F830020C16Rik
Expression: Ubiquitous expression in lung adult (RPKM 16.3), thymus adult (RPKM 10.2) and 28 other tissues
Orthologs: human

Genomic context

Location: 14; 14 D1
Exon count: 11

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 14 | NC_000080.6 (64811600..64949899, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 14 | NC_000080.5 (65441055..65568684, complement)

Chromosome 14 - NC_000080.6


Transcript information: This gene has 10 transcripts

Gene: Hmbox1 ENSMUSG00000021972

Description: homeobox containing 1 [Source:MGI Symbol;Acc:MGI:2445066]
Gene Synonyms: F830020C16Rik
Location: Chromosome 14: 64,811,600-64,949,871 reverse strand. GRCm38:CM001007.2
About this gene: This gene has 10 transcripts (splice variants), 262 orthologues, is a member of 1 Ensembl protein family and is associated with 1 phenotype.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Hmbox1-201 | ENSMUST00000022544.13 | 3396 | 404aa | ENSMUSP00000022544.7 | Protein coding | CCDS84149 | Q8BJA3 | TSL:1 GENCODE basic
Hmbox1-205 | ENSMUST00000176128.7 | 2958 | 420aa | ENSMUSP00000135448.1 | Protein coding | CCDS84150 | H3BKM3 | TSL:5 GENCODE basic APPRIS ALT1
Hmbox1-202 | ENSMUST00000067843.9 | 2931 | 419aa | ENSMUSP00000066905.3 | Protein coding | CCDS36955 | Q8BJA3 | TSL:1 GENCODE basic APPRIS P3
Hmbox1-203 | ENSMUST00000175744.7 | 1771 | 405aa | ENSMUSP00000135272.1 | Protein coding | - | H3BK67 | TSL:1 GENCODE basic APPRIS ALT1
Hmbox1-210 | ENSMUST00000177326.7 | 1729 | 445aa | ENSMUSP00000135372.2 | Protein coding | - | H3BKF8 | CDS 5' incomplete TSL:5
Hmbox1-204 | ENSMUST00000175905.7 | 1715 | 416aa | ENSMUSP00000135657.2 | Protein coding | - | H3BL55 | TSL:5 GENCODE basic
Hmbox1-209 | ENSMUST00000176832.7 | 1586 | 408aa | ENSMUSP00000135211.1 | Protein coding | - | H3BK13 | TSL:5 GENCODE basic
Hmbox1-207 | ENSMUST00000176489.7 | 1422 | 364aa | ENSMUSP00000134824.1 | Protein coding | - | H3BJ31 | CDS 3' incomplete TSL:5
Hmbox1-208 | ENSMUST00000176657.1 | 4740 | No protein | - | Retained intron | - | - | TSL:1
Hmbox1-206 | ENSMUST00000176386.1 | 614 | No protein | - | lncRNA | - | - | TSL:5


[Figure: Ensembl region view of chromosome 14, 64.82 Mb to 64.94 Mb (158.27 kb). Forward strand: Kif13b-201 and Kif13b-203 (protein coding), Kif13b-205 (retained intron), Gm20111-201 and Gm4573-201 (processed pseudogenes), Ints9-201 (protein coding), Ints9-202 and Ints9-203 (retained intron). Contigs: AC132602.2, AC141425.3. Reverse strand: Hmbox1-201, -202, -203, -204, -205, -207, -209 and -210 (protein coding), Hmbox1-206 (lncRNA), Hmbox1-208 (retained intron), Gm37183-201 (TEC). Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana), non-protein coding (RNA gene; processed transcript; pseudogene).]


Transcript: ENSMUST00000067843

[Figure: protein feature view of Hmbox1-202 (protein coding, reverse strand, 127.63 kb), protein ENSMUSP00000066..., scale 0 to 419 aa. Annotation tracks: MobiDB lite; low complexity (Seg); Superfamily (Homeobox-like domain superfamily; Lambda repressor-like, DNA-binding domain superfamily); SMART (Homeobox domain); Pfam (Hepatocyte nuclear factor 1, N-terminal; Homeobox domain); PROSITE profiles (Homeobox domain); PANTHER (PTHR14618:SF4; Homeobox-containing protein 1); Gene3D (1.10.260.40; 1.10.10.60); CDD (Cro/C1-type helix-turn-helix domain; Homeobox domain). Variant track: sequence variants (dbSNP and all other sources); legend: frameshift variant, missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
