https://www.alphaknockout.com

Mouse Baat Knockout Project (CRISPR/Cas9)

Objective: To create a Baat knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Baat gene (NCBI Reference Sequence: NM_007519; Ensembl: ENSMUSG00000039653) is located on mouse chromosome 4. Four exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 4 (Transcript: ENSMUST00000043056). Exon 2 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: The coding region starts in exon 2. Exon 2 covers 36.98% of the coding region. The size of the effective KO region is ~517 bp. The KO region does not contain any other known gene.
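As a rough consistency check on the 36.98% figure, the sketch below back-calculates how many coding bases exon 2 would hold. It assumes the coding region is the 420 codons of the annotated protein (the length listed for Baat-201 in the transcript table of this report) plus one stop codon. That assumption and the helper name `percent_of_cds` are illustrative and not part of the design itself.

```python
# Assumption: coding region = 420 codons (protein length listed for Baat-201)
# plus one stop codon. This is an inference, not a value stated in the report.
PROTEIN_AA = 420
EXON2_FRACTION = 0.3698  # "Exon 2 covers 36.98% of the coding region"

cds_bp = (PROTEIN_AA + 1) * 3                      # implied coding length in bp
exon2_coding_bp = round(cds_bp * EXON2_FRACTION)   # implied coding bases in exon 2


def percent_of_cds(bases: int, cds_length: int) -> float:
    """Share of the coding region covered by `bases`, as a percentage."""
    return round(100 * bases / cds_length, 2)


print(cds_bp, exon2_coding_bp, percent_of_cds(exon2_coding_bp, cds_bp))
```

Under this assumption the reported percentage is reproduced by an integer number of bases, which suggests the figure was computed against the full coding sequence including the stop codon.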


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3' with two gRNA regions marked. Exon labels recovered from the figure: 1, 2, 4. Legend: exon of mouse Baat; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: self-alignment dot plot of the upstream sequence. Legend: Forward, Reverse Complement. Axis label: Sequence 12.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
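The self-alignment described in the note can be sketched as follows. This is a minimal illustration and not the tool used to produce the plots: it assumes exact matching of fixed-length windows (15 bp by default, as in the plot), and the function names are hypothetical. Forward dots off the main diagonal indicate direct repeats, and reverse dots indicate inverted repeats.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def self_dot_plot(
    seq: str, window: int = 15
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Compare a sequence with itself using exact window matches.

    Returns (forward, reverse) lists of (i, j) window start positions.
    The trivial main diagonal (i == j) is left out of the forward list.
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1
    index: Dict[str, List[int]] = {}
    for i in range(n_windows):
        index.setdefault(seq[i:i + window], []).append(i)

    forward: List[Tuple[int, int]] = []
    reverse: List[Tuple[int, int]] = []
    for i in range(n_windows):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j != i)
        reverse.extend((i, j) for j in index.get(reverse_complement(word), []))
    return forward, reverse


# Toy example with a short window: "GATTACA" occurs twice in tandem.
fwd, rev = self_dot_plot("GATTACAGATTACACCCC", window=7)
print(fwd)
```

A gRNA candidate whose position falls inside any reported window pair would be avoided under the selection rule stated in the note.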

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: self-alignment dot plot of the downstream sequence. Legend: Forward, Reverse Complement. Axis label: Sequence 12.]

Note: The 2000 bp section downstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the upstream sequence. Axis label: Sequence 12.]

Summary: Full Length(2000bp) | A(26.5% 530) | C(21.8% 436) | T(33.6% 672) | G(18.1% 362)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No region of markedly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
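A sliding-window GC profile of the kind plotted above could be computed as in the sketch below. The 300 bp window matches the figure. The step size and the function name `gc_windows` are assumptions, and the report does not state what threshold counts as "high" GC content, so none is encoded here.

```python
from typing import List


def gc_windows(seq: str, window: int = 300, step: int = 1) -> List[float]:
    """GC fraction of each window of length `window`, moving by `step` bases."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        values.append((chunk.count("G") + chunk.count("C")) / window)
    return values


# Toy example with a 4 bp window so the values are easy to follow.
profile = gc_windows("GGCCAATT", window=4, step=2)
print(profile, max(profile))
```

Taking `max(profile)` over the 2000 bp flank gives the single richest window, which is the value a reader would compare against whatever cutoff their PCR protocol tolerates.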

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the downstream sequence. Axis label: Sequence 12.]

Summary: Full Length(2000bp) | A(26.5% 530) | C(18.75% 375) | T(33.15% 663) | G(21.6% 432)

Note: The 2000 bp section downstream of Exon 2 is analyzed to determine the GC content. No region of markedly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
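The overall GC content of each flank follows directly from the base counts in the two summary lines. The short sketch below does that arithmetic; the function name `gc_percent` is illustrative.

```python
def gc_percent(a: int, c: int, t: int, g: int) -> float:
    """Overall GC content in percent from per-base counts."""
    total = a + c + t + g
    return round(100 * (g + c) / total, 2)


# Counts copied from the upstream and downstream summary lines.
upstream_gc = gc_percent(a=530, c=436, t=672, g=362)
downstream_gc = gc_percent(a=530, c=375, t=663, g=432)
print(upstream_gc, downstream_gc)
```

Both flanks come out at roughly 40% GC overall, in line with the notes.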


BLAT Search Results (up)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END   SPAN
---------------------------------------------------------------------------------------
YourSeq   2000      1  2000   2000    100.0%  chr4   -        49503121   49505120   2000
YourSeq     62    332   569   2000     89.8%  chr7   -       122348103  122348621    519
YourSeq     61    500   624   2000     93.1%  chr6   -       144810232  144810389    158
YourSeq     61    503   604   2000     93.2%  chr14  -       122594669  122594849    181
YourSeq     59    500   625   2000     76.9%  chr12  +        84280267   84280348     82
YourSeq     52    563   629   2000     96.5%  chr10  +       115780579  115780659     81
YourSeq     50    515   610   2000     91.4%  chr11  -       104077538  104077653    116
YourSeq     49    541   607   2000     83.7%  chr16  -        95041747   95041811     65
YourSeq     48    508   571   2000     91.7%  chr17  -        58664717   58664796     80
YourSeq     47    517   590   2000     89.7%  chr14  -        86635750   86635986    237
YourSeq     45    520   610   2000     83.4%  chr1   +        71636848   71636934     87
YourSeq     42    502   593   2000     80.8%  chr8   -        32041311   32041398     88
YourSeq     39    502   548   2000     91.5%  chr9   -        73173185   73173231     47
YourSeq     37    516   564   2000     86.1%  chr2   -       167216814  167216860     47
YourSeq     37    537   587   2000     93.2%  chr14  +        12155116   12155176     61
YourSeq     37    574   654   2000     93.2%  chr1   +       117338750  117338833     84
YourSeq     35    558   608   2000     85.4%  chr6   +        82468318   82468366     49
YourSeq     35    522   569   2000     92.5%  chr4   +        93604430   93604479     50
YourSeq     33    495   528   2000    100.0%  chr12  +        17827732   17827806     75
YourSeq     32    559   599   2000     92.2%  chr5   -        21980936   21980977     42

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. Apart from the query's own locus on chr4, no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END   SPAN
---------------------------------------------------------------------------------------
YourSeq   2000      1  2000   2000    100.0%  chr4   -        49500655   49502654   2000
YourSeq    183    849  1159   2000     86.2%  chr2   -        12377781   12378123    343
YourSeq    174    809  1171   2000     81.2%  chr13  +        36569867   36570376    510
YourSeq    173    849  1159   2000     81.2%  chr5   -        20754940   20755281    342
YourSeq    171    812  1159   2000     84.2%  chr9   -       123355749  123356230    482
YourSeq    169    849  1133   2000     82.2%  chr3   -        67381162   67381483    322
YourSeq    167    849  1157   2000     88.6%  chr2   -       171944317  171944660    344
YourSeq    166    852  1133   2000     86.8%  chr12  -       110064908  110065223    316
YourSeq    164    813  1108   2000     84.6%  chr10  -        59545338   59545775    438
YourSeq    162    849  1133   2000     84.2%  chr8   +       109971339  109971634    296
YourSeq    159    812  1114   2000     82.2%  chr4   -        63639164   63639611    448
YourSeq    159    849  1115   2000     84.8%  chr2   -       166503039  166547698  44660
YourSeq    158    815  1113   2000     84.0%  chr1   -       156379976  156380422    447
YourSeq    157    849  1114   2000     83.5%  chr11  -        75990670   75990975    306
YourSeq    154    849  1118   2000     88.9%  chrX   +        48809547   48809860    314
YourSeq    153    849  1129   2000     87.0%  chr17  +        88492303   88492626    324
YourSeq    152    849  1106   2000     82.1%  chr3   -        11303036   11303329    294
YourSeq    151    393  1108   2000     86.6%  chr14  +       118818441  118819151    711
YourSeq    148    813  1159   2000     87.8%  chr8   +       126463771  126464199    429
YourSeq    146    853  1114   2000     85.3%  chr15  +        36933294   36933590    297

Note: The 2000 bp section downstream of Exon 2 is BLAT searched against the genome. Apart from the query's own locus on chr4, no significant similarity is found.
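One way to read the BLAT tables is to separate the self-hit (the query's own locus) from the remaining hits and see how much of the query the best remaining hit covers. The sketch below does this for the first three downstream rows, with the "browser details" link text removed. The field names and the parsing helper are hypothetical, and the report does not state what cutoff it treats as significant, so the code only ranks and measures.

```python
from typing import List, NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int   # start on the 2000 bp query
    q_end: int     # end on the 2000 bp query
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int   # start on the genome
    t_end: int     # end on the genome
    span: int


def parse_hits(text: str) -> List[BlatHit]:
    """Parse whitespace-separated BLAT rows that begin with 'YourSeq'."""
    hits = []
    for line in text.strip().splitlines():
        f = line.split()
        if len(f) != 11 or f[0] != "YourSeq":
            continue  # skip headers, dividers, and malformed rows
        hits.append(BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                            float(f[5].rstrip("%")), f[6], f[7],
                            int(f[8]), int(f[9]), int(f[10])))
    return hits


# First three rows of the downstream table.
ROWS = """
YourSeq 2000 1 2000 2000 100.0% chr4 - 49500655 49502654 2000
YourSeq 183 849 1159 2000 86.2% chr2 - 12377781 12378123 343
YourSeq 174 809 1171 2000 81.2% chr13 + 36569867 36570376 510
"""

hits = parse_hits(ROWS)
self_hit = max(hits, key=lambda h: h.score)
others = [h for h in hits if h is not self_hit]
best_other = max(others, key=lambda h: h.score)
coverage = (best_other.q_end - best_other.q_start + 1) / best_other.q_size
print(best_other.chrom, best_other.score, round(coverage, 4))
```

For these rows the best non-self hit covers only a small fraction of the 2000 bp query, at a score far below the self-hit.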


Gene and protein information:

Baat bile acid-Coenzyme A: amino acid N-acyltransferase [Mus musculus (house mouse)]
Gene ID: 12012, updated on 28-Sep-2019

Gene summary

Official Symbol: Baat (provided by MGI)
Official Full Name: bile acid-Coenzyme A: amino acid N-acyltransferase (provided by MGI)
Primary source: MGI:MGI:106642
See related: Ensembl:ENSMUSG00000039653
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: BAT; AI118337; AI158864
Expression: Biased expression in liver adult (RPKM 17.8), liver E18 (RPKM 3.3) and 1 other tissue
Orthologs: human

Genomic context

Location: 4 B1; 4 26.51 cM
Exon count: 4

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  4    NC_000070.6 (49489416..49507915, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    4    NC_000070.5 (49502290..49519430, complement)

[Figure: genomic context, Chromosome 4 - NC_000070.6]


Transcript information: This gene has 2 transcripts

Gene: Baat ENSMUSG00000039653

Description: bile acid-Coenzyme A: amino acid N-acyltransferase [Source:MGI Symbol;Acc:MGI:106642]
Gene Synonyms: BAT, taurine N-acyltransferase
Location: Chromosome 4: 49,489,422-49,506,557 reverse strand. GRCm38:CM000997.2
About this gene: This gene has 2 transcripts (splice variants), 146 orthologues, 9 paralogues, is a member of 1 Ensembl protein family and is associated with 2 phenotypes.

Transcripts

Name      Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Baat-201  ENSMUST00000043056.8  1941  420aa    ENSMUSP00000041983.2  Protein coding  CCDS18173  Q91X34   TSL:1, GENCODE basic, APPRIS P1
Baat-202  ENSMUST00000166036.1  1263  420aa    ENSMUSP00000129603.1  Protein coding  CCDS18173  Q91X34   TSL:3, GENCODE basic, APPRIS P1

[Figure: Ensembl genomic view, 37.14 kb, 49.48 Mb to 49.51 Mb. Tracks: Contigs (AL772310.27, forward strand); Baat-201, Baat-202 and Mrpl50-201 (all protein coding, reverse strand); Regulatory Build. Regulation legend: CTCF, Open Chromatin, Promoter Flank. Gene legend: protein coding (merged Ensembl/Havana; Ensembl protein coding).]


Transcript: ENSMUST00000043056

[Figure: Baat-201 (protein coding), reverse strand, 17.14 kb, with protein annotation tracks for ENSMUSP00000041... on a scale bar from 0 to 420.]

Annotation tracks shown in the figure:
Low complexity (Seg)
Coiled-coils (Ncoils)
Superfamily: Alpha/Beta hydrolase fold
Pfam: Acyl-CoA thioester hydrolase/bile acid-CoA amino acid N-acetyltransferase; BAAT/Acyl-CoA thioester hydrolase C-terminal
PIRSF: Acyl-CoA thioesterase, long chain
PANTHER: PTHR10824; PTHR10824:SF18
Gene3D: Alpha/Beta hydrolase fold; Acyl-CoA thioester hydrolase/BAAT, N-terminal
All sequence SNPs/i...: Sequence variants (dbSNP and all other sources)
Variant Legend: frameshift variant, missense variant, synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
