https://www.alphaknockout.com

Mouse Aldh1a1 Knockout Project (CRISPR/Cas9)

Objective: To create an Aldh1a1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Aldh1a1 gene (NCBI Reference Sequence: NM_013467; Ensembl: ENSMUSG00000053279) is located on mouse chromosome 19. 13 exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 13 (Transcript: ENSMUST00000087638). Exons 2~4 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a disruption in this gene show a significantly reduced ability to convert retinol to retinoic acid in the liver. Retinal morphology is normal even though the gene is normally highly expressed in the dorsal retina.

Exon 2 starts at about 4.46% of the coding region. Exons 2~4 cover 25.02% of the coding region. The size of the effective KO region is ~9186 bp. The KO region does not contain any other known gene.
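The percentages above can be turned into approximate base-pair figures. The sketch below is a rough check only: it assumes the coding region is 501 codons x 3 = 1503 bp (the 501 aa protein length listed for ENSMUST00000087638, stop codon excluded), which this report does not state explicitly. The helper name pct_to_bp is hypothetical, and exact exon boundaries should be taken from the Ensembl annotation rather than from this estimate.

```python
# Assumption (not stated in the report): coding region = 501 codons x 3 bp,
# with the stop codon excluded.
CDS_BP = 501 * 3


def pct_to_bp(pct: float, total_bp: int = CDS_BP) -> int:
    """Convert a percentage of the coding region to the nearest whole bp."""
    return round(pct / 100 * total_bp)


# Coding bases upstream of exon 2 (report: "about 4.46%").
exon2_offset_bp = pct_to_bp(4.46)

# Coding bases covered by exons 2~4 (report: "25.02%").
deleted_cds_bp = pct_to_bp(25.02)

# A deleted coding length that is not a multiple of 3 would shift the
# reading frame of the remaining exons.
causes_frameshift = deleted_cds_bp % 3 != 0

print(exon2_offset_bp, deleted_cds_bp, causes_frameshift)
```

Under these assumptions the deleted coding length is not a multiple of three, which would be consistent with a frameshifting deletion, but this should be confirmed against the annotated exon coordinates.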


Overview of the Targeting Strategy

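[Figure: wildtype allele drawn 5' to 3', with exons 1, 2, 3, 4 and 13 of mouse Aldh1a1 labeled, two gRNA regions marked, and the knockout region indicated. Legend: exon of mouse Aldh1a1; knockout region.]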


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement. Axis label: Sequence 12.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
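The self-alignment described in this note can be sketched in a few lines. The example below is a minimal illustration, not the tool used for this report: it finds exact forward-strand matches between 15 bp windows of a sequence and itself (reverse-complement matches are not handled), and the function name dot_plot_matches and the toy sequence are made up for the demonstration.

```python
def dot_plot_matches(seq: str, window: int = 15) -> list[tuple[int, int]]:
    """Return (i, j) pairs with i < j whose window-length words are identical.

    Each pair corresponds to an off-diagonal dot in a self dot plot.
    """
    # Index every word of length `window` by its start positions.
    positions: dict[str, list[int]] = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)

    # Any word seen at two or more positions yields off-diagonal dots.
    matches = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                matches.append((starts[a], starts[b]))
    return sorted(matches)


# Toy input: a 16 bp unit repeated three times between short flanks.
unit = "ACGTTGCAAGGCTTAC"
toy = "TTTT" + unit * 3 + "GGGG"

hits = dot_plot_matches(toy, window=15)
offsets = {j - i for i, j in hits}
print(sorted(offsets))
```

In a tandem repeat the matching pairs share a constant offset equal to a multiple of the repeat unit length, which shows up as diagonal lines parallel to the main diagonal. A region without such lines, like the one described in the note, returns no pairs.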

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself. Legend: Forward; Reverse Complement. Axis label: Sequence 12.]

Note: The 1570 bp section downstream of Exon 4 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot. Axis label: Sequence 12.]

Summary: Full Length(2000bp) | A(31.55% 631) | C(16.4% 328) | T(31.1% 622) | G(20.95% 419)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot. Axis label: Sequence 12.]

Summary: Full Length(1570bp) | A(32.68% 513) | C(20.45% 321) | T(28.92% 454) | G(17.96% 282)

Note: The 1570 bp section downstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
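The overall GC content follows directly from the base counts in the two summaries, and the windowed profile can be computed the same way over 300 bp windows. The sketch below is illustrative only: the function names gc_percent and sliding_gc are hypothetical, and the actual flanking sequences are not included in this report, so the sliding-window function is shown on a toy string.

```python
def gc_percent(counts: dict[str, int]) -> float:
    """Overall GC percentage from per-base counts."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


def sliding_gc(seq: str, window: int = 300) -> list[float]:
    """GC percentage of every full window of the given size, step 1 bp."""
    seq = seq.upper()
    return [
        100 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Base counts copied from the two summary lines in this report.
upstream = {"A": 631, "C": 328, "T": 622, "G": 419}    # 2000 bp
downstream = {"A": 513, "C": 321, "T": 454, "G": 282}  # 1570 bp

print(round(gc_percent(upstream), 2))    # 37.35
print(round(gc_percent(downstream), 2))  # 38.41

# Toy demonstration of the windowed profile with a 4 bp window.
print(sliding_gc("GGCCAATT", window=4))
```

Both flanks come out below 40% GC overall, in line with the notes that no significant high GC-content region is found.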


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START     END       SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr19  +       20608914  20610913  2000
YourSeq  35     611    672   2000   84.4%     chr2   -       85359506  85359569  64
YourSeq  26     566    592   2000   100.0%    chr12  -       87382174  87382208  35
YourSeq  25     562    588   2000   96.3%     chr12  -       13525990  13526016  27
YourSeq  22     514    585   2000   65.3%     chr7   -       91628520  91628591  72
YourSeq  20     704    723   2000   100.0%    chr1   -       33810795  33810814  20
YourSeq  20     1724   1743  2000   100.0%    chr1   -       3722190   3722209   20

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  1570   1      1570  1570   100.0%    chr19  +       20620100   20621669   1570
YourSeq  320    467    1543  1570   88.8%     chr19  -       20717787   20718792   1006
YourSeq  59     1073   1150  1570   93.0%     chr2   -       25762002   25762091   90
YourSeq  44     1107   1150  1570   100.0%    chr15  -       17023458   17023501   44
YourSeq  41     1067   1109  1570   97.7%     chr6   +       53301093   53301135   43
YourSeq  39     1031   1111  1570   97.6%     chr6   -       104776129  104776384  256
YourSeq  29     1047   1078  1570   96.8%     chr13  -       119269442  119269478  37
YourSeq  27     1043   1071  1570   96.6%     chrX   +       68848139   68848167   29
YourSeq  26     1039   1065  1570   100.0%    chrX   -       84289546   84289574   29
YourSeq  26     848    876   1570   96.5%     chr1   +       70135686   70135714   29
YourSeq  25     1047   1071  1570   100.0%    chrX   -       16265299   16265323   25
YourSeq  25     1047   1071  1570   100.0%    chr7   -       119622902  119622926  25
YourSeq  25     1045   1071  1570   96.3%     chr17  -       13999047   13999073   27
YourSeq  25     973    999   1570   88.5%     chr1   -       5712654    5712679    26
YourSeq  25     1047   1071  1570   100.0%    chr7   +       133164202  133164226  25
YourSeq  23     1183   1205  1570   100.0%    chr6   -       25873896   25873918   23
YourSeq  23     1049   1071  1570   100.0%    chr18  -       64385770   64385792   23
YourSeq  23     1049   1071  1570   100.0%    chr16  +       33185944   33185966   23
YourSeq  22     1047   1068  1570   100.0%    chr18  -       51275690   51275711   22
YourSeq  22     1374   1395  1570   100.0%    chr15  -       78212566   78212587   22

Note: The 1570 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
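BLAT result rows like those above can be screened programmatically. The sketch below is a minimal illustration: the BlatHit class, the function names and the min_score cut-off of 100 are all hypothetical, and the cut-off is an arbitrary example value, not the criterion used in this report to judge significance. The three sample rows are copied from the downstream results.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(row: str) -> BlatHit:
    """Parse one whitespace-separated result row (field 0 is the query name)."""
    f = row.split()
    return BlatHit(
        int(f[1]), int(f[2]), int(f[3]), int(f[4]),
        float(f[5].rstrip("%")), f[6], f[7],
        int(f[8]), int(f[9]), int(f[10]),
    )


def secondary_hits(rows: list[str], min_score: int = 100) -> list[BlatHit]:
    """Return hits at or above min_score, skipping the query's own locus.

    The self-hit is taken to be a full-length, 100% identity match.
    """
    hits = [parse_blat_row(r) for r in rows]
    return [
        h for h in hits
        if h.score >= min_score
        and not (h.score == h.q_size and h.identity == 100.0)
    ]


rows = [
    "YourSeq 1570 1 1570 1570 100.0% chr19 + 20620100 20621669 1570",
    "YourSeq 320 467 1543 1570 88.8% chr19 - 20717787 20718792 1006",
    "YourSeq 59 1073 1150 1570 93.0% chr2 - 25762002 25762091 90",
]

flagged = secondary_hits(rows)
for hit in flagged:
    print(hit.chrom, hit.strand, hit.t_start, hit.t_end, hit.score)
```

A screen of this kind only lists candidates for manual review, for example when choosing PCR primer positions; whether a given hit matters depends on the assay design.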


Gene and protein information: Aldh1a1 aldehyde dehydrogenase family 1, subfamily A1 [Mus musculus (house mouse)] Gene ID: 11668, updated on 24-Oct-2019

Gene summary

Official Symbol: Aldh1a1 (provided by MGI)
Official Full Name: aldehyde dehydrogenase family 1, subfamily A1 (provided by MGI)
Primary source: MGI:MGI:1353450
See related: Ensembl:ENSMUSG00000053279
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: E1; Ahd2; Ahd-2; Aldh1; ALHDII; Raldh1; ALDH-E1; Aldh1a2
Expression: Biased expression in liver adult (RPKM 141.6), genital fat pad adult (RPKM 125.0) and 12 other tissues

Genomic context

Location: 19 B; 19 13.91 cM

Exon count: 17

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  19   NC_000085.6 (20492583..20643463)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    19   NC_000085.5 (20676472..20717952)

Chromosome 19 - NC_000085.6


Transcript information: This gene has 6 transcripts

Gene: Aldh1a1 ENSMUSG00000053279

Description: aldehyde dehydrogenase family 1, subfamily A1 [Source:MGI Symbol;Acc:MGI:1353450]
Gene Synonyms: ALDH1, Ahd-2, Ahd2, E1, Raldh1
Location: Chromosome 19: 20,492,715-20,643,465 forward strand. GRCm38:CM001012.2
About this gene: This gene has 6 transcripts (splice variants), 211 orthologues, 19 paralogues, is a member of 1 Ensembl protein family and is associated with 2 phenotypes.

Transcripts

Name         Transcript ID         bp    Protein     Translation ID        Biotype         CCDS       UniProt             Flags
Aldh1a1-206  ENSMUST00000225337.2  2358  501aa       ENSMUSP00000153410.2  Protein coding  CCDS29695  A0A286YDG6, P24549  GENCODE basic, APPRIS P1
Aldh1a1-201  ENSMUST00000087638.3  2053  501aa       ENSMUSP00000084918.3  Protein coding  CCDS29695  P24549              TSL:1, GENCODE basic, APPRIS P1
Aldh1a1-205  ENSMUST00000225313.1  825   138aa       ENSMUSP00000153011.1  Protein coding  -          A0A286YCZ0          CDS 3' incomplete
Aldh1a1-202  ENSMUST00000224358.1  1800  No protein  -                     lncRNA          -          -                   -
Aldh1a1-203  ENSMUST00000224807.1  1085  No protein  -                     lncRNA          -          -                   -
Aldh1a1-204  ENSMUST00000225249.1  510   No protein  -                     lncRNA          -          -                   -


[Figure: Ensembl genome browser view of a 170.75 kb region (20.50 Mb to 20.65 Mb), forward and reverse strands. Forward-strand transcripts: Aldh1a1-201, Aldh1a1-205 and Aldh1a1-206 (protein coding); Aldh1a1-202, Aldh1a1-203 and Aldh1a1-204 (lncRNA). Contigs: AC152162.9, AC167167.3. Reverse-strand genes: C730002L08Rik-201, C730002L08Rik-202 and C730002L08Rik-203 (lncRNA); Gm6684-201 (processed pseudogene). Tracks also include the Regulatory Build. Regulation legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (pseudogene; RNA gene).]


Transcript: ENSMUST00000087638

[Figure: Ensembl protein view of Aldh1a1-201 (protein coding; 41.50 kb, forward strand; ENSMUSP00000084...), scale 0 to 501 aa, with the following annotation tracks.]

Low complexity (Seg)
Superfamily: Aldehyde/histidinol dehydrogenase
Pfam: Aldehyde dehydrogenase domain
PROSITE patterns: Aldehyde dehydrogenase, cysteine active site; Aldehyde dehydrogenase, glutamic acid active site
PANTHER: PTHR11699; PTHR11699:SF221
Gene3D: Aldehyde dehydrogenase, N-terminal; Aldehyde dehydrogenase, C-terminal
CDD: cd07141
Sequence variants (dbSNP and all other sources): missense variant; splice region variant; synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
