https://www.alphaknockout.com

Mouse Ugt1a5 Knockout Project (CRISPR/Cas9)

Objective: To create a Ugt1a5 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Ugt1a5 gene (NCBI Reference Sequence: NM_201643; Ensembl: ENSMUSG00000089943) is located on mouse chromosome 1. Five exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 5 (Transcript: ENSMUST00000097659). Exon 1 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 1 starts from the coding region. Exon 1 covers 53.69% of the coding region. The size of the effective KO region: ~852 bp. The KO region does not contain any other known gene.
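The 53.69% figure can be reproduced from numbers given in this report. The sketch below assumes that the coding region is counted as the 529 codons of the translated protein (stop codon excluded) and that the ~852 bp KO region lies entirely within it; the function name is hypothetical and used only for illustration.

```python
# Assumption: coding region = 529 codons (protein length from the transcript
# table in this report), stop codon not counted.
PROTEIN_LENGTH_AA = 529
KO_REGION_BP = 852  # effective KO region stated in the note (approximate)


def coding_fraction(region_bp: int, protein_aa: int) -> float:
    """Percentage of the coding region covered by a region of region_bp bases."""
    cds_bp = protein_aa * 3  # three bases per codon
    return 100.0 * region_bp / cds_bp


print(f"{coding_fraction(KO_REGION_BP, PROTEIN_LENGTH_AA):.2f}%")
```

Under these assumptions the coding region is 1587 bp, and 852 / 1587 matches the percentage stated in the note.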


Overview of the Targeting Strategy

Figure: wildtype allele drawn 5' to 3', showing exons 1 to 5 and two gRNA regions. Legend: exon of mouse Ugt1a5; knockout region.


Overview of the Dot Plot (up)

Window size: 15 bp

Figure: dot plot for Sequence 12. Legend: Forward; Reverse Complement.

Note: The 852 bp section of Exon 1 is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
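The self-alignment idea behind a dot plot can be sketched as follows. This is a simplified illustration, not the tool used for the report: it assumes exact matching of 15 bp words on the forward strand only (no mismatches, no reverse complement), and the function names and toy sequence are hypothetical.

```python
def dot_plot_matches(seq_a, seq_b, window=15):
    """Return (i, j) pairs where the window-length words at seq_a[i:] and seq_b[j:] are identical."""
    # Index every word of seq_b by its start position.
    index = {}
    for j in range(len(seq_b) - window + 1):
        index.setdefault(seq_b[j:j + window], []).append(j)
    # Look up every word of seq_a in that index.
    matches = []
    for i in range(len(seq_a) - window + 1):
        for j in index.get(seq_a[i:i + window], []):
            matches.append((i, j))
    return matches


def off_diagonal(matches):
    """Keep only matches away from the main diagonal; in a self-comparison these indicate repeats."""
    return [(i, j) for i, j in matches if i != j]


# Toy example: a 16 bp unit repeated in tandem produces off-diagonal dots.
unit = "ACGTAGCTAGGATCCA"
repeat_hits = off_diagonal(dot_plot_matches(unit * 2, unit * 2))
unique_hits = off_diagonal(dot_plot_matches(unit, unit))
```

In a self-comparison the main diagonal is always filled; a sequence without tandem repeats leaves the rest of the matrix essentially empty, which is the situation the note describes.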

Overview of the Dot Plot (down)

Window size: 15 bp

Figure: dot plot for Sequence 12. Legend: Forward; Reverse Complement.

Note: The 852 bp section of Exon 1 is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

Figure: GC content distribution plot for Sequence 12.

Summary: Full Length(852bp) | A(24.41% 208) | C(24.65% 210) | T(27.58% 235) | G(23.36% 199)

Note: The 852 bp section of Exon 1 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
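The composition figures in the Summary line can be checked from the base counts, and a windowed GC profile can be computed in the same spirit. The sketch below is illustrative only: the base counts are copied from the Summary line for the "up" sequence, the function names are hypothetical, and the sliding-window routine assumes a simple fixed window moved one base at a time, which may differ from the plotting tool used for the report.

```python
# Base counts for the "up" sequence, copied from the Summary line.
UP_COUNTS = {"A": 208, "C": 210, "T": 235, "G": 199}


def base_percentages(counts):
    """Percentage of each base relative to the total length."""
    total = sum(counts.values())
    return {base: 100.0 * n / total for base, n in counts.items()}


def gc_percent(counts):
    """Overall GC content in percent."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq, window=300, step=1):
    """GC percentage of each window of the given size along seq."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


print({b: round(p, 2) for b, p in base_percentages(UP_COUNTS).items()})
print(round(gc_percent(UP_COUNTS), 1))
```

With these counts the four bases sum to 852 and the overall GC content is about 48.0%, consistent with the note that no high-GC region is present.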

Overview of the GC Content Distribution (down)

Window size: 300 bp

Figure: GC content distribution plot for Sequence 12.

Summary: Full Length(852bp) | A(24.53% 209) | C(24.65% 210) | T(27.58% 235) | G(23.24% 198)

Note: The 852 bp section of Exon 1 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START     END       SPAN
-------------------------------------------------------------------------------------
YourSeq  852    1      852  852    100.0%    chr1   +       88166054  88166905  852
YourSeq  488    42     852  852    89.5%     chr1   +       88186708  88201502  14795
YourSeq  174    465    852  852    78.7%     chr1   +       88192311  88192685  375
YourSeq  154    34     804  852    87.1%     chr1   +       88200681  88212825  12145
YourSeq  71     735    827  852    88.2%     chr1   +       88187404  88187496  93
YourSeq  32     166    207  852    88.1%     chr1   +       88192046  88192087  42
YourSeq  25     663    702  852    70.4%     chr11  +       46988627  46988657  31
YourSeq  22     440    461  852    100.0%    chr10  -       39275656  39275677  22
YourSeq  20     705    724  852    100.0%    chr12  -       40540351  40540370  20
YourSeq  20     748    769  852    95.5%     chr1   +       97828035  97828056  22

Note: The 852 bp section of Exon 1 is BLAT searched against the genome. No significant similarity is found.
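For readers who want to work with the BLAT rows programmatically, the sketch below shows one way to hold a few of them in a small record type, check that each SPAN value equals END minus START plus one on the genome, and separate the query's own locus from the remaining hits. The record type and helper names are hypothetical; the values are copied from the "up" results, and no significance threshold is implied.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int   # start on the query
    q_end: int     # end on the query
    identity: float
    chrom: str
    strand: str
    t_start: int   # start on the genome
    t_end: int     # end on the genome
    span: int


# A few rows of the "up" results, copied from the report.
HITS = [
    BlatHit(852, 1, 852, 100.0, "chr1", "+", 88166054, 88166905, 852),
    BlatHit(488, 42, 852, 89.5, "chr1", "+", 88186708, 88201502, 14795),
    BlatHit(174, 465, 852, 78.7, "chr1", "+", 88192311, 88192685, 375),
    BlatHit(22, 440, 461, 100.0, "chr10", "-", 39275656, 39275677, 22),
]


def span_is_consistent(hit):
    """SPAN should equal the inclusive length of the genomic interval."""
    return hit.t_end - hit.t_start + 1 == hit.span


def is_self_hit(hit, query_size=852):
    """Full-length, 100% identity match: the query's own genomic locus."""
    return (hit.identity == 100.0 and hit.q_start == 1
            and hit.q_end == query_size and hit.span == query_size)


def other_hits(hits):
    """All hits except the self hit, highest score first."""
    return sorted((h for h in hits if not is_self_hit(h)),
                  key=lambda h: h.score, reverse=True)
```

A span much larger than the matched query length, as in the second row, indicates that the alignment is split across distant genomic blocks rather than one contiguous match.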

BLAT Search Results (down)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START     END       SPAN
-------------------------------------------------------------------------------------
YourSeq  852    1      852  852    100.0%    chr1   +       88166052  88166903  852
YourSeq  486    44     852  852    89.5%     chr1   +       88186708  88201500  14793
YourSeq  248    36     830  852    78.3%     chr1   +       88191924  88192661  738
YourSeq  154    36     806  852    87.1%     chr1   +       88200681  88212825  12145
YourSeq  71     737    829  852    88.2%     chr1   +       88187404  88187496  93
YourSeq  25     665    704  852    70.4%     chr11  +       46988627  46988657  31
YourSeq  22     442    463  852    100.0%    chr10  -       39275656  39275677  22
YourSeq  20     707    726  852    100.0%    chr12  -       40540351  40540370  20
YourSeq  20     750    771  852    95.5%     chr1   +       97828035  97828056  22

Note: The 852 bp section of Exon 1 is BLAT searched against the genome. No significant similarity is found.


Gene and protein information: Ugt1a5 UDP glucuronosyltransferase 1 family, polypeptide A5 [Mus musculus (house mouse)] Gene ID: 394433, updated on 10-Oct-2019

Gene summary

Official Symbol: Ugt1a5 (provided by MGI)
Official Full Name: UDP glucuronosyltransferase 1 family, polypeptide A5 (provided by MGI)
Primary source: MGI:MGI:3032634
See related: Ensembl:ENSMUSG00000089943
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Expression: Biased expression in kidney adult (RPKM 235.0), liver adult (RPKM 156.2) and 9 other tissues

Genomic context

Location: 1; 1 D
Exon count: 5

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  1    NC_000067.6 (88166012..88220002)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    1    NC_000067.5 (90062587..90116577)

Chromosome 1 - NC_000067.6


Transcript information: This gene has 1 transcript

Gene: Ugt1a5 ENSMUSG00000089943

Description: UDP glucuronosyltransferase 1 family, polypeptide A5 [Source:MGI Symbol;Acc:MGI:3032634]
Location: Chromosome 1: 88,166,012-88,218,997 forward strand. GRCm38:CM000994.2
About this gene: This gene has 1 transcript (splice variant), 1087 orthologues, 22 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name        Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Ugt1a5-201  ENSMUST00000097659.4  2203  529aa    ENSMUSP00000095263.4  Protein coding  CCDS35659  B2RT14   TSL:1, GENCODE basic, APPRIS P1

Figure: Ensembl genome browser view of a 72.99 kb region of chromosome 1 (88.16 Mb to 88.22 Mb). Forward-strand transcripts shown: Ugt1a7c-201 to -204, Ugt1a6b-201 and -202, Ugt1a8-201, Ugt1a10-201 to -203, Ugt1a6a-201 to -204, Ugt1a9-201, Ugt1a5-201, Ugt1a1-201, Ugt1a2-201, Mroh2a-201, -202 and -205, and the unprocessed pseudogenes Gm15368-201 and Gm15376-201. Reverse-strand transcripts shown: Gm20528-201, Gm20528-202 and Dnajb3-201. Contigs: AC087801.69, AC087780.19. The Regulatory Build track is also shown (legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site).


Transcript: ENSMUST00000097659

Figure: Ensembl protein view of Ugt1a5-201 (protein coding; 52.99 kb, forward strand), with a scale bar from 0 to 529. Tracks: ENSMUSP00000095..., Transmembrane heli..., Low complexity (Seg), Coiled-coils (Ncoils), Cleavage site (Sign..., Superfamily SSF53756, Pfam UDP-glucuronosyl/UDP-, PROSITE patterns UDP- family, conserved site, PANTHER PTHR11926, Gene3D 3.40.50.2000, CDD cd03784, and sequence variants (dbSNP and all other sources). Variant legend: splice donor variant, stop gained, frameshift variant, missense variant, synonymous variant.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
