https://www.alphaknockout.com

Mouse Aldh3b2 Knockout Project (CRISPR/Cas9)

Objective: To create an Aldh3b2 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Aldh3b2 gene (NCBI Reference Sequence: NM_001177438; Ensembl: ENSMUSG00000075296) is located on mouse chromosome 19. Ten exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 10 (Transcript: ENSMUST00000143380). Exons 2 to 10 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 2.71% of the coding region. Exons 2 to 10 cover 97.36% of the coding region. The effective KO region is ~3691 bp. The KO region does not contain any other known gene.
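The ~3691 bp figure can be cross-checked against the BLAT self-hits reported later in this document for the two 2000 bp flanking sections (upstream of Exon 2: chr19:3975500-3977499; downstream of the stop codon: chr19:3981191-3983190). The sketch below assumes, as those section descriptions suggest, that the upstream flank ends immediately before Exon 2 and the downstream flank starts immediately after the stop codon; `ko_span` is a hypothetical helper, not part of any pipeline described in this report.

```python
def ko_span(upstream_flank_end: int, downstream_flank_start: int) -> tuple[int, int, int]:
    """Return (start, end, length) of the region between two flanks.

    Coordinates are 1-based and inclusive, as in the BLAT tables.
    Assumption: the upstream flank ends just before Exon 2 and the
    downstream flank starts just after the stop codon.
    """
    start = upstream_flank_end + 1      # first base of Exon 2 (assumed)
    end = downstream_flank_start - 1    # last base of the stop codon (assumed)
    return start, end, end - start + 1  # inclusive length in bp


start, end, length = ko_span(3977499, 3981191)
print(f"chr19:{start}-{end} ({length} bp)")  # chr19:3977500-3981190 (3691 bp)
```

Under that assumption the gap between the two flanks reproduces the stated effective KO size exactly.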

Overview of the Targeting Strategy

[Figure: wildtype allele of mouse Aldh3b2 showing exons 1 to 10, with the 5' and 3' gRNA regions flanking the knockout region]

Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot of the upstream section aligned with itself, showing forward and reverse-complement matches]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine whether it contains tandem repeats. Tandem repeats are found in the dot plot matrix, so the gRNA site is selected outside of these repeats.
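The self-alignment described above can be sketched as an exact-word comparison using the same 15 bp window. This is a minimal illustration under our own assumptions: only exact 15-mer matches are counted, the report does not say which dot-plot tool or scoring it used, and `dot_plot` and `off_diagonal` are hypothetical names. A tandem repeat with period p shows up as off-diagonal dots at (i, i + p), so the distinct offsets reveal the repeat period.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def dot_plot(seq: str, window: int = 15):
    """Exact-word self dot plot.

    Returns (forward, reverse): lists of (i, j) where the `window`-bp word at i
    equals the word at j (forward) or its reverse complement (reverse).
    """
    seq = seq.upper()
    index = defaultdict(list)  # word -> list of start positions
    for j in range(len(seq) - window + 1):
        index[seq[j:j + window]].append(j)
    forward, reverse = [], []
    for i in range(len(seq) - window + 1):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index.get(word, []))
        reverse.extend((i, j) for j in index.get(revcomp(word), []))
    return forward, reverse


def off_diagonal(dots):
    """Drop the trivial main diagonal (i == j); remaining dots hint at repeats."""
    return [(i, j) for i, j in dots if i != j]


# Toy tandem repeat: a 10 bp unit repeated four times (40 bp)
toy = "ACGGTCATTG" * 4
fwd, rev = dot_plot(toy)
print(sorted({j - i for i, j in off_diagonal(fwd) if j > i}))  # [10, 20]
```

A real 2000 bp section would be plotted rather than printed, but the off-diagonal offsets are what the "tandem repeats found" judgement rests on.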

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot of the downstream section aligned with itself, showing forward and reverse-complement matches]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeats are found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
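For PCR screening, a primer pair placed in the two flanks yields a long wild-type product and a shorter product from the deleted allele. The sketch below computes both sizes. The primer coordinates are hypothetical, and it assumes the deletion coincides with the effective KO region (chr19:3977500-3981190, the gap between the BLAT flank hits in this report), which the real gRNA cut sites will only approximate.

```python
def amplicon_sizes(fwd_start: int, rev_end: int, del_start: int, del_end: int) -> tuple[int, int]:
    """Expected (wild-type, deletion) product sizes in bp for a primer pair
    that flanks a deletion. All coordinates are 1-based, inclusive, same strand."""
    if not (fwd_start < del_start <= del_end < rev_end):
        raise ValueError("primers must lie outside the deleted interval")
    wild_type = rev_end - fwd_start + 1
    deleted = del_end - del_start + 1
    return wild_type, wild_type - deleted


# Hypothetical primer positions; deletion assumed equal to chr19:3977500-3981190
wt, ko = amplicon_sizes(3977200, 3981500, 3977500, 3981190)
print(wt, ko)  # 4301 610
```

A large size gap like this makes the two alleles easy to tell apart on a gel before sequencing confirms the junction.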

Overview of the GC Content Distribution (up) Window size: 300 bp

Summary: full length 2000 bp; A 639 (31.95%), C 398 (19.9%), T 409 (20.45%), G 554 (27.7%)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
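The GC content profile can be sketched as a 300 bp sliding window over the section. This is a hypothetical illustration: the report does not state its step size or what counts as "significantly high", so the 1 bp step and the 70% cut-off below are assumptions, as are the function names.

```python
def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for every full window of `window` bp."""
    seq = seq.upper()
    # prefix[k] holds the number of G or C bases in seq[:k]
    prefix = [0]
    for base in seq:
        prefix.append(prefix[-1] + (base in "GC"))
    return [
        (start, (prefix[start + window] - prefix[start]) / window)
        for start in range(0, len(seq) - window + 1, step)
    ]


def high_gc_windows(seq: str, window: int = 300, threshold: float = 0.70) -> list[tuple[int, float]]:
    """Windows whose GC fraction exceeds a hypothetical cut-off (default 70%)."""
    return [(start, gc) for start, gc in sliding_gc(seq, window) if gc > threshold]


# Toy input: 300 bp of A followed by 300 bp of G
toy = "A" * 300 + "G" * 300
profile = sliding_gc(toy)
print(profile[0], profile[-1], len(high_gc_windows(toy)))  # (0, 0.0) (300, 1.0) 90
```

The prefix sum keeps each window O(1), so scanning a 2000 bp section at 1 bp steps is trivial.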

Overview of the GC Content Distribution (down) Window size: 300 bp

Summary: full length 2000 bp; A 540 (27.0%), C 427 (21.35%), T 487 (24.35%), G 546 (27.3%)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
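The two Summary lines can be checked, and the overall GC content derived, directly from the reported base counts. This is a small arithmetic sketch; `composition` is a hypothetical helper and the counts are copied from the summaries above.

```python
def composition(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of each base plus combined GC, rounded to 2 decimals."""
    total = sum(counts.values())
    pct = {base: round(100 * n / total, 2) for base, n in counts.items()}
    pct["GC"] = round(100 * (counts["G"] + counts["C"]) / total, 2)
    return pct


# Base counts copied from the two Summary lines (2000 bp each)
upstream = {"A": 639, "C": 398, "T": 409, "G": 554}
downstream = {"A": 540, "C": 427, "T": 487, "G": 546}

for name, counts in (("upstream", upstream), ("downstream", downstream)):
    pct = composition(counts)
    print(name, sum(counts.values()), pct["GC"])
# upstream 2000 47.6
# downstream 2000 48.65
```

Both sections sum to 2000 bp, the per-base percentages match the summaries, and overall GC sits just under 50%.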

BLAT Search Results (up)

(QSTART/QEND: matched positions within the 2000 bp query; TSTART/TEND: genomic coordinates; SPAN: genomic span in bp)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr19  +       3975500    3977499    2000
YourSeq  119    838     1248  2000   78.8%     chr4   -       138366477  138366759  283
YourSeq  111    829     1134  2000   80.5%     chr11  +       119065646  119065886  241
YourSeq  104    832     1041  2000   85.2%     chr9   +       96918054   96918265   212
YourSeq  102    863     1051  2000   87.0%     chr9   +       112973760  112973950  191
YourSeq  98     888     1056  2000   92.3%     chr15  -       38097804   38097974   171
YourSeq  96     818     1009  2000   85.8%     chr2   +       10194526   10194718   193
YourSeq  92     835     1045  2000   83.8%     chr2   -       156804006  156804209  204
YourSeq  86     947     1056  2000   91.4%     chr18  +       34038032   34038166   135
YourSeq  85     1608    1718  2000   95.7%     chr7   +       16524792   16524944   153
YourSeq  84     861     1042  2000   90.3%     chr13  +       106268852  106269076  225
YourSeq  82     1617    1708  2000   90.9%     chr1   +       24812193   24812279   87
YourSeq  81     1617    1710  2000   95.5%     chrX   -       154530092  154530193  102
YourSeq  81     1617    1708  2000   93.2%     chr1   -       53488168   53488257   90
YourSeq  79     1616    1698  2000   97.6%     chr12  -       119122434  119122516  83
YourSeq  79     1618    1700  2000   97.6%     chr9   +       7097569    7097651    83
YourSeq  78     1619    1707  2000   89.2%     chr7   -       20766959   20767041   83
YourSeq  78     1618    1711  2000   89.5%     chr3   -       77662503   77662592   90
YourSeq  78     1616    1698  2000   97.6%     chr10  +       17813375   17813458   84
YourSeq  78     1617    1698  2000   97.6%     chr1   +       48530191   48530272   82

Note: The 2000 bp section upstream of Exon 2 is searched against the genome with BLAT. Apart from its own locus on chr19, no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr19  +       3981191    3983190    2000
YourSeq  311    13      444   2000   87.7%     chr19  +       3969958    3970412    455
YourSeq  101    745     1016  2000   88.2%     chr11  +       77891432   77891724   293
YourSeq  91     739     959   2000   84.6%     chr12  +       17214169   17214507   339
YourSeq  90     703     888   2000   76.8%     chr7   +       92168250   92168402   153
YourSeq  88     717     1016  2000   93.3%     chr11  +       100792026  100792606  581
YourSeq  83     780     1020  2000   91.2%     chr8   +       123365465  123365714  250
YourSeq  82     745     1016  2000   89.5%     chr10  -       75064725   75065106   382
YourSeq  81     703     1016  2000   88.7%     chr11  +       102539270  102539684  415
YourSeq  79     704     872   2000   87.7%     chr11  -       32443110   32443281   172
YourSeq  78     703     870   2000   93.4%     chr11  +       76608898   76855507   246610
YourSeq  76     745     902   2000   88.2%     chr2   +       130727593  130727752  160
YourSeq  75     731     871   2000   91.4%     chr12  -       75641860   75944653   302794
YourSeq  74     729     962   2000   76.8%     chr10  -       58689897   58690085   189
YourSeq  70     706     812   2000   85.2%     chr1   +       152762654  152762767  114
YourSeq  68     750     867   2000   78.9%     chr12  -       109842642  109842759  118
YourSeq  67     744     872   2000   76.0%     chr11  +       75849481   75849609   129
YourSeq  66     783     897   2000   88.4%     chr2   -       118514112  118514229  118
YourSeq  65     750     1050  2000   85.9%     chr8   -       72745252   72745667   416
YourSeq  64     716     819   2000   84.8%     chr11  -       86051943   86052053   111

Note: The 2000 bp section downstream of the stop codon is searched against the genome with BLAT. Apart from its own locus on chr19, no significant similarity is found.
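The report does not state the criterion behind "no significant similarity". One plausible screen, sketched below, discards the full-length 100% self-hit and flags any other hit that covers a large fraction of the query at high identity. Both thresholds (50% query coverage, 90% identity) and all names are hypothetical; the rows are the first four of the downstream table, in its column order.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    qstart: int
    qend: int
    qsize: int
    identity: float  # percent identity
    chrom: str
    strand: str
    tstart: int
    tend: int


def parse_row(line: str) -> BlatHit:
    """Parse one whitespace-separated row in the column order of the BLAT tables."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7], int(f[8]), int(f[9]))


def query_coverage(hit: BlatHit) -> float:
    """Fraction of the query covered by the aligned block."""
    return (hit.qend - hit.qstart + 1) / hit.qsize


def off_target_hits(hits, min_cov=0.5, min_identity=90.0):
    """Hits other than the full-length 100% self-hit that pass both
    hypothetical thresholds."""
    return [
        h for h in hits
        if not (query_coverage(h) == 1.0 and h.identity == 100.0)
        and query_coverage(h) >= min_cov
        and h.identity >= min_identity
    ]


rows = [  # first four rows of the downstream table
    "YourSeq 2000 1 2000 2000 100.0% chr19 + 3981191 3983190 2000",
    "YourSeq 311 13 444 2000 87.7% chr19 + 3969958 3970412 455",
    "YourSeq 101 745 1016 2000 88.2% chr11 + 77891432 77891724 293",
    "YourSeq 91 739 959 2000 84.6% chr12 + 17214169 17214507 339",
]
hits = [parse_row(r) for r in rows]
print(off_target_hits(hits))  # []
print([(h.chrom, h.tstart) for h in off_target_hits(hits, 0.2, 85.0)])  # [('chr19', 3969958)]
```

With the default thresholds nothing is flagged; relaxing them surfaces the second chr19 hit (87.7% identity over query positions 13 to 444).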

Gene and protein information: Aldh3b2 aldehyde dehydrogenase 3 family, member B2 [Mus musculus (house mouse)]
Gene ID: 621603, updated on 12-Aug-2019

Gene summary

Official Symbol: Aldh3b2 (provided by MGI)
Official Full Name: aldehyde dehydrogenase 3 family, member B2 (provided by MGI)
Primary source: MGI:MGI:2147613
See related: Ensembl:ENSMUSG00000075296
Gene type: protein coding
RefSeq status: PROVISIONAL
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI848594; C130048D07Rik
Expression: Biased expression in bladder adult (RPKM 45.8), stomach adult (RPKM 24.0) and 10 other tissues

Genomic context

Location: 19; 19 A

Exon count: 10

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  19   NC_000085.6 (3972328..3981665)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    19   NC_000085.5 (3972328..3981665)

[Figure: Chromosome 19 - NC_000085.6]

Transcript information: This gene has 1 transcript.

Gene: Aldh3b2 ENSMUSG00000075296

Description: aldehyde dehydrogenase 3 family, member B2 [Source:MGI Symbol;Acc:MGI:2147613]
Gene Synonyms: C130048D07Rik
Location: Chromosome 19: 3,972,328-3,981,646 forward strand (GRCm38:CM001012.2)
About this gene: This gene has 1 transcript (splice variant), 226 orthologues, 19 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name: Aldh3b2-201
Transcript ID: ENSMUST00000143380.2
Length (bp): 2142
Protein: 479 aa
Translation ID: ENSMUSP00000115356.1
Biotype: Protein coding
CCDS: CCDS50347
UniProt: E9Q3E1
Flags: TSL:1, GENCODE basic, APPRIS P1

[Figure: Ensembl region view of 29.32 kb of chromosome 19 (3.97 to 3.99 Mb, forward and reverse strands, comprehensive transcript set) showing Aldh3b2-201 (protein coding) alongside Aldh3b3 transcripts (201 and 202 protein coding; 203 nonsense mediated decay) and Acy3 transcripts (201, 203, 204, 205 and 207 protein coding; 206 lncRNA; 202 retained intron), contig AC133523.3 and the Regulatory Build (CTCF, enhancer, open chromatin, promoter, promoter flank)]

Transcript: ENSMUST00000143380

[Figure: Aldh3b2-201 (protein coding), 9.32 kb, forward strand, with protein domain and sequence variant tracks (dbSNP and all other sources; missense and synonymous variants) on a 0 to 479 aa scale]

Protein domains annotated on ENSMUSP00000115...:
Superfamily: Aldehyde/histidinol dehydrogenase
Pfam: Aldehyde dehydrogenase domain
PROSITE patterns: Aldehyde dehydrogenase, glutamic acid active site; Aldehyde dehydrogenase, cysteine active site
PIRSF: Aldehyde dehydrogenase NAD(P)-dependent
PANTHER: PTHR43570:SF12; PTHR43570
Gene3D: Aldehyde dehydrogenase, C-terminal; Aldehyde dehydrogenase, N-terminal
CDD: cd07132

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
