https://www.alphaknockout.com

Mouse Hmg20b Knockout Project (CRISPR/Cas9)

Objective: To create an Hmg20b knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Hmg20b gene (NCBI Reference Sequence: NM_010440; Ensembl: ENSMUSG00000020232) is located on mouse chromosome 10. Nine exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 9 (Transcript: ENSMUST00000020454). Exons 1~9 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 1 starts from about 0.11% of the coding region. Exons 1~9 cover 100.0% of the coding region. The size of the effective KO region is ~3702 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3' showing exons 1-9 of mouse Hmg20b, with two gRNA regions marked. Legend: exon of mouse Hmg20b; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the upstream sequence aligned against itself; forward and reverse-complement matches are plotted.]

Note: The 2000 bp section upstream of the start codon is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
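The self-alignment described in this note can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool used to produce the report's dot plot: it assumes exact matching of 15 bp words, and the function names `reverse_complement` and `self_dot_plot` are hypothetical.

```python
from collections import defaultdict

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))

def self_dot_plot(seq, window=15):
    """Find exact word matches of a sequence against itself.

    Returns two lists of (i, j) start positions (0-based):
      forward: seq[i:i+window] == seq[j:j+window] with i < j
      reverse: seq[i:i+window] == reverse complement of seq[j:j+window]
    The trivial main diagonal (i == j, forward) is not reported.
    """
    seq = seq.upper()
    n = len(seq)

    # Index every word of length `window` by its start positions.
    index = defaultdict(list)
    for i in range(n - window + 1):
        index[seq[i:i + window]].append(i)

    forward = sorted(
        (i, j) for positions in index.values()
        for i in positions for j in positions if i < j
    )

    # A word starting at k in the reverse complement covers the original
    # positions n-k-window .. n-k-1.
    rc = reverse_complement(seq)
    reverse = []
    for k in range(n - window + 1):
        for i in index.get(rc[k:k + window], []):
            reverse.append((i, n - k - window))
    return forward, sorted(reverse)
```

Off-diagonal forward matches that line up in short parallel runs would indicate tandem repeats; an empty or sparse result is consistent with the note's conclusion. Real dot plot tools usually tolerate mismatches within the window, which this exact-match sketch does not.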

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the downstream sequence aligned against itself; forward and reverse-complement matches are plotted.]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content along the upstream sequence.]

Summary: Full Length(2000bp) | A(22.8% 456) | C(28.65% 573) | T(23.9% 478) | G(24.65% 493)

Note: The 2000 bp section upstream of the start codon is analyzed to determine its GC content. Regions of significantly high GC content are found. The gRNA site is selected outside these high GC-content regions.
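From the summary counts, the overall GC content of the upstream section is (573 + 493) / 2000 = 53.3%. The windowed analysis can be sketched as below. This is an illustrative sketch only: the report does not state how "high GC content" is defined, so the 60% threshold, the step size, and the function names are assumptions.

```python
from collections import Counter

def base_composition(seq):
    """Return {base: (count, percent)} for A, C, T and G."""
    seq = seq.upper()
    counts = Counter(seq)
    return {b: (counts[b], 100.0 * counts[b] / len(seq)) for b in "ACTG"}

def gc_windows(seq, window=300, step=1):
    """Return (start, gc_percent) for each window of the given size."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        result.append((start, 100.0 * gc / window))
    return result

def high_gc_window_starts(seq, window=300, step=1, threshold=60.0):
    """Return start positions of windows whose GC percent meets the threshold.

    The 60% default is an assumed cutoff, not a value taken from the report.
    """
    return [s for s, pct in gc_windows(seq, window, step) if pct >= threshold]
```

A candidate gRNA site at position `p` could then be checked by confirming that no window returned by `high_gc_window_starts` contains `p`.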

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content along the downstream sequence.]

Summary: Full Length(2000bp) | A(23.6% 472) | C(28.1% 562) | T(23.5% 470) | G(24.8% 496)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine its GC content. Regions of significantly high GC content are found. The gRNA site is selected outside these high GC-content regions.


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
------------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr10  -       81350172   81352171   2000
YourSeq  126    432    946   2000   82.3%     chr12  -       36069594   36069820   227
YourSeq  125    443    580   2000   96.4%     chr10  -       23991357   23991759   403
YourSeq  123    435    569   2000   96.3%     chr2   -       144391389  144391927  539
YourSeq  121    435    566   2000   96.2%     chr7   -       97923782   97924304   523
YourSeq  118    434    569   2000   96.9%     chr12  +       80832911   80833050   140
YourSeq  118    440    566   2000   96.9%     chr1   +       74377492   74377619   128
YourSeq  117    439    569   2000   92.0%     chrX   -       151538813  151538936  124
YourSeq  117    438    566   2000   96.2%     chrX   +       130273489  130273619  131
YourSeq  116    436    569   2000   95.2%     chr3   -       51537492   51537624   133
YourSeq  116    440    564   2000   96.8%     chr19  -       41423211   41423671   461
YourSeq  115    441    566   2000   96.1%     chr8   -       19950481   19950608   128
YourSeq  115    438    569   2000   95.3%     chr3   +       9045157    9045289    133
YourSeq  114    440    569   2000   94.3%     chr19  -       46121926   46122053   128
YourSeq  114    432    566   2000   95.3%     chr17  -       23800097   23800520   424
YourSeq  113    436    565   2000   96.0%     chr8   -       13143963   13144093   131
YourSeq  113    435    566   2000   95.2%     chr4   +       34944833   34944966   134
YourSeq  112    446    566   2000   96.7%     chr4   -       108393665  108393801  137
YourSeq  112    439    566   2000   94.5%     chr1   -       135353561  135353691  131
YourSeq  111    437    566   2000   95.9%     chr17  -       27055762   27055892   131

Note: The 2000 bp section upstream of the start codon is BLAT-searched against the genome. Apart from the expected self-match on chr10, no significant similarity is found.
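One way to read the table is to compare how much of the 2000 bp query each hit covers. The sketch below uses the first three rows of the upstream table. The report does not define "significant", so the coverage and identity cutoffs are illustrative assumptions, as are the `BlatHit` type and the function names.

```python
from typing import NamedTuple

class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str

def query_coverage(hit):
    """Percent of the query spanned by the hit (1-based inclusive coordinates)."""
    return 100.0 * (hit.q_end - hit.q_start + 1) / hit.q_size

def significant_secondary_hits(hits, min_coverage=50.0, min_identity=90.0):
    """Return non-self hits that meet both cutoffs.

    Assumes the first hit is the self-match; the cutoffs are assumed values,
    not criteria stated in the report.
    """
    return [
        h for h in hits[1:]
        if query_coverage(h) >= min_coverage and h.identity >= min_identity
    ]

# First three rows of the upstream BLAT table.
hits = [
    BlatHit(2000, 1, 2000, 2000, 100.0, "chr10"),
    BlatHit(126, 432, 946, 2000, 82.3, "chr12"),
    BlatHit(125, 443, 580, 2000, 96.4, "chr10"),
]
```

Under these assumed cutoffs the secondary hits shown are short, local matches: the chr12 hit spans 515 of 2000 query bases and the second chr10 hit spans 138.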

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
------------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr10  -       81344468   81346467   2000
YourSeq  118    752    1855  2000   94.8%     chr9   -       65031448   65429191   397744
YourSeq  107    706    848   2000   93.0%     chr14  +       78561946   78562089   144
YourSeq  104    763    1853  2000   95.7%     chr15  -       97198310   97610344   412035
YourSeq  100    713    848   2000   93.2%     chr19  +       5819916    5820052    137
YourSeq  99     713    849   2000   93.9%     chr9   -       41653304   41653459   156
YourSeq  99     713    848   2000   90.7%     chr7   -       86538080   86538214   135
YourSeq  98     713    844   2000   93.9%     chrX   +       113980799  113980967  169
YourSeq  97     713    844   2000   94.6%     chr15  +       80618502   80618634   133
YourSeq  96     715    848   2000   95.5%     chr11  -       64139916   64140050   135
YourSeq  96     556    840   2000   82.1%     chr11  +       115605463  115605733  271
YourSeq  95     713    840   2000   93.1%     chr1   -       36685760   36685889   130
YourSeq  95     728    845   2000   95.4%     chr10  +       115339433  115339606  174
YourSeq  95     714    845   2000   90.9%     chr10  +       44988327   44988456   130
YourSeq  94     714    848   2000   91.4%     chr11  -       79984580   79984715   136
YourSeq  94     714    845   2000   89.3%     chr8   +       71698983   71699115   133
YourSeq  93     713    844   2000   95.2%     chr18  +       67550481   67550613   133
YourSeq  93     713    848   2000   93.7%     chr10  +       69374596   69374732   137
YourSeq  91     715    844   2000   90.7%     chr4   -       123575504  123575632  129
YourSeq  90     713    848   2000   88.4%     chr17  -       71449111   71449241   131

Note: The 2000 bp section downstream of the stop codon is BLAT-searched against the genome. Apart from the expected self-match on chr10, no significant similarity is found.


Gene and information: Hmg20b high mobility group 20B [ Mus musculus (house mouse) ] Gene ID: 15353, updated on 12-Aug-2019

Gene summary

Official Symbol: Hmg20b (provided by MGI)
Official Full Name: high mobility group 20B (provided by MGI)
Primary source: MGI:MGI:1341190
See related: Ensembl:ENSMUSG00000020232
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Hmgx2; BRAF35; Hmgxb2; AW610687; Smarce1r
Expression: Ubiquitous expression in duodenum adult (RPKM 74.1), adrenal adult (RPKM 70.0) and 28 other tissues

Genomic context

Location: 10 C1; 10 39.72 cM
Exon count: 10

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 10 | NC_000076.6 (81346046..81350943, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 10 | NC_000076.5 (80808791..80813202, complement)

Chromosome 10 - NC_000076.6


Transcript information: This gene has 13 transcripts

Gene: Hmg20b ENSMUSG00000020232

Description: high mobility group 20B [Source:MGI Symbol;Acc:MGI:1341190]
Gene Synonyms: BRAF35, BRCA2-associated factor 35, Hmgx2, Hmgxb2, Smarce1r
Location: Chromosome 10: 81,346,048-81,350,480 reverse strand. GRCm38:CM001003.2
About this gene: This gene has 13 transcripts (splice variants), 182 orthologues, 33 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Hmg20b-201 | ENSMUST00000020454.10 | 1621 | 317aa | ENSMUSP00000020454.4 | Protein coding | CCDS24055 | Q3U1L0, Q9Z104 | TSL:1, GENCODE basic, APPRIS P1
Hmg20b-203 | ENSMUST00000105324.8 | 1540 | 317aa | ENSMUSP00000100961.2 | Protein coding | CCDS24055 | Q3U1L0, Q9Z104 | TSL:1, GENCODE basic, APPRIS P1
Hmg20b-202 | ENSMUST00000105323.7 | 1463 | 317aa | ENSMUSP00000100960.1 | Protein coding | CCDS24055 | Q3U1L0, Q9Z104 | TSL:1, GENCODE basic, APPRIS P1
Hmg20b-213 | ENSMUST00000167481.7 | 1321 | 317aa | ENSMUSP00000128807.1 | Protein coding | CCDS24055 | Q3U1L0, Q9Z104 | TSL:5, GENCODE basic, APPRIS P1
Hmg20b-204 | ENSMUST00000122993.7 | 1502 | 215aa | ENSMUSP00000137861.1 | Protein coding | - | Q05DT2 | TSL:1, GENCODE basic
Hmg20b-212 | ENSMUST00000154609.2 | 761 | 235aa | ENSMUSP00000115459.1 | Protein coding | - | E9Q2W1 | CDS 3' incomplete, TSL:3
Hmg20b-210 | ENSMUST00000141171.7 | 739 | 246aa | ENSMUSP00000117322.1 | Protein coding | - | F7AVN4 | CDS 5' and 3' incomplete, TSL:2
Hmg20b-206 | ENSMUST00000132313.1 | 2501 | No protein | - | Retained intron | - | - | TSL:1
Hmg20b-205 | ENSMUST00000129482.1 | 627 | No protein | - | Retained intron | - | - | TSL:2
Hmg20b-207 | ENSMUST00000134857.1 | 597 | No protein | - | Retained intron | - | - | TSL:2
Hmg20b-209 | ENSMUST00000140268.1 | 546 | No protein | - | Retained intron | - | - | TSL:3
Hmg20b-211 | ENSMUST00000148839.7 | 420 | No protein | - | lncRNA | - | - | TSL:3
Hmg20b-208 | ENSMUST00000140160.1 | 359 | No protein | - | lncRNA | - | - | TSL:3


[Figure: Ensembl genome browser view of a 24.43 kb region of chromosome 10 (81.34 Mb to 81.36 Mb). Forward strand: Mfsd12-202 (nonsense mediated decay) and Mfsd12-201 (protein coding). Contig: AC155937.4. Reverse strand: Gipc3-201 (protein coding) and Hmg20b transcripts 201 to 213 (protein coding: 201, 202, 203, 204, 210, 212, 213; retained intron: 205, 206, 207, 209; lncRNA: 208, 211). Regulatory Build legend: CTCF, enhancer, open chromatin, promoter, promoter flank. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, RNA gene).]


Transcript: ENSMUST00000020454

[Figure: Ensembl protein view of Hmg20b-201 (protein coding, reverse strand, 4.37 kb; scale bar 0 to 317). Annotation tracks: PDB-ENSP mappings; MobiDB lite; low complexity (Seg); coiled-coils (Ncoils); Superfamily: High mobility group box domain superfamily; SMART: High mobility group box domain; Prints: PR00886; Pfam: High mobility group box domain; PROSITE profiles: High mobility group box domain; PANTHER: PTHR46040, PTHR46040:SF2; Gene3D: High mobility group box domain superfamily; CDD: cd01390; sequence variants (dbSNP and all other sources). Variant legend: synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
