https://www.alphaknockout.com

Mouse Gba Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Gba conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Gba gene (NCBI Reference Sequence: NM_008094; Ensembl: ENSMUSG00000028048) is located on mouse chromosome 3. Twelve exons are identified, with the ATG start codon in exon 3 and the TGA stop codon in exon 12 (Transcript: ENSMUST00000167998). Exons 6~8 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Gba gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP23-237D17 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mutations in this locus variably lower enzyme activity and result in accumulation of glucocerebroside in reticuloendothelial cell lysosomes and glucosylceramide in brain, liver and skin. Severe mutants die perinatally with compromised epidermal permeability.

Exon 6 starts at about 25.57% of the coding region. Knockout of exons 6~8 will result in a frameshift of the gene. The size of intron 5 for 5'-loxP site insertion is 1025 bp, and the size of intron 8 for 3'-loxP site insertion is 858 bp. The size of the effective cKO region is ~1422 bp. The cKO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (exons 1 to 12, with the 5' and 3' gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Gba; homology arm; cKO region; loxP site.]
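The step from the targeted allele to the constitutive KO allele can be illustrated with a toy string model. This is a minimal sketch, not the project's workflow: it assumes two loxP sites in the same orientation on the same strand, uses the canonical 34 bp loxP sequence, and the function name `cre_excise` and the placeholder flanking sequences are hypothetical.

```python
# Canonical 34 bp loxP site (13 bp repeat, 8 bp spacer, 13 bp repeat).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Toy model of Cre recombination between two same-orientation sites.

    Removes everything between the first two occurrences of `site` and
    leaves a single copy of the site behind. If fewer than two sites are
    present, the allele is returned unchanged.
    """
    first = allele.find(site)
    if first == -1:
        return allele
    second = allele.find(site, first + len(site))
    if second == -1:
        return allele
    # Keep the upstream flank plus one site, then the downstream flank.
    return allele[: first + len(site)] + allele[second + len(site):]


# Hypothetical placeholder sequences, not real Gba sequence.
five_prime_flank = "AAAA"
floxed_region = "CCCCGGGG"   # stands in for exons 6~8
three_prime_flank = "TTTT"

targeted = five_prime_flank + LOXP + floxed_region + LOXP + three_prime_flank
knockout = cre_excise(targeted)
print(len(targeted), len(knockout))
```

In this model the KO allele keeps one loxP site and loses the floxed segment plus the second site, which mirrors the schematic of the constitutive KO allele.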


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 aligned with itself, showing forward and reverse complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
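A self-alignment of this kind can be sketched as follows. This is a simplified illustration, assuming exact word matches on the forward strand only; the tool used for the report is not stated, and the function name `self_dot_matches` and the toy sequence are hypothetical.

```python
def self_dot_matches(seq: str, window: int = 10) -> list[tuple[int, int]]:
    """Return off-diagonal dots of a self dot plot.

    A dot (i, j) with i < j means the `window`-length words starting at
    positions i and j (0-based) are identical. Runs of dots parallel to
    the main diagonal indicate repeats.
    """
    positions: dict[str, list[int]] = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)

    hits = []
    for starts in positions.values():
        # Every pair of start positions sharing a word is one dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


# Toy sequence with one 8 bp tandem duplication (not real Gba sequence).
toy = "ACGTTGCA" * 2 + "TTT"
print(self_dot_matches(toy, window=8))
```

An empty result for a real sequence at the chosen window size would correspond to a dot plot with no off-diagonal signal on the forward strand.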

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(7922bp) | A(21.53% 1706) | C(25.57% 2026) | T(27.77% 2200) | G(25.12% 1990)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found, so it may be difficult to construct this targeting vector.
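The overall GC content follows directly from the base counts in the summary line, and the per-window values behind the distribution plot can be computed in the same way. The sketch below assumes a plain sliding window; the function names and the `step` parameter are illustrative and not taken from the report.

```python
# Base counts copied from the summary line of the report.
BASE_COUNTS = {"A": 1706, "C": 2026, "T": 2200, "G": 1990}


def gc_percent(counts: dict[str, int]) -> float:
    """Overall GC percentage from a table of base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq: str, window: int = 300, step: int = 1) -> list[float]:
    """GC percentage of each `window`-length slice of `seq`."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


total_length = sum(BASE_COUNTS.values())
overall_gc = gc_percent(BASE_COUNTS)
print(total_length, round(overall_gc, 2))  # 7922 50.69
```

The overall value is close to 50%, so the warning in the note refers to local 300 bp windows rather than to the sequence as a whole.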


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr3   +       89202198   89205197   3000
YourSeq  329    238    1685  3000   95.9%     chr17  +       29644428   29757951   113524
YourSeq  208    1497   1771  3000   98.2%     chr4   +       149423503  149423851  349
YourSeq  191    1497   1709  3000   96.7%     chr1   +       47168309   47168768   460
YourSeq  190    1467   1692  3000   95.2%     chr13  -       93596167   93596391   225
YourSeq  188    1497   1692  3000   99.0%     chr5   -       53506509   53506708   200
YourSeq  187    1496   1769  3000   96.1%     chrX   +       9515891    9516492    602
YourSeq  186    1497   1703  3000   96.1%     chr14  +       57692431   57920487   228057
YourSeq  184    1497   1699  3000   94.0%     chrX   +       48631246   48631443   198
YourSeq  182    1460   1685  3000   96.5%     chr9   -       7728103    7728447    345
YourSeq  182    1497   1703  3000   93.0%     chr19  -       4428375    4428575    201
YourSeq  181    1500   1701  3000   94.4%     chr12  -       31478430   31478627   198
YourSeq  181    1497   1695  3000   97.0%     chr7   +       19031021   19031220   200
YourSeq  181    1498   1703  3000   92.4%     chr6   +       120434577  120434773  197
YourSeq  181    1496   1703  3000   96.5%     chr18  +       34894383   34894756   374
YourSeq  181    1497   1769  3000   95.5%     chr15  +       31588327   31588630   304
YourSeq  180    1497   1695  3000   97.4%     chr7   -       29177564   29177762   199
YourSeq  180    1497   1706  3000   95.5%     chr19  -       7247366    7247961    596
YourSeq  180    1497   1685  3000   95.7%     chr2   +       166981086  166981270  185
YourSeq  179    1498   1714  3000   90.7%     chr6   +       52280753   52280945   193

Note: The 3000 bp section upstream of Exon 6 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr3   +       89206620   89209619   3000
YourSeq  302    2472   2983  3000   84.8%     chr6   -       126932286  126932697  412
YourSeq  36     2      66    3000   69.8%     chr6   -       114646455  114646500  46
YourSeq  31     74     111   3000   91.0%     chr15  -       55793754   55793790   37
YourSeq  29     23     112   3000   80.5%     chr4   -       35196470   35196558   89
YourSeq  29     1079   1133  3000   68.6%     chr4   +       120800612  120800652  41
YourSeq  28     231    275   3000   76.2%     chr5   -       92821921   92821963   43
YourSeq  28     54     90    3000   90.0%     chr4   +       129638488  129638523  36
YourSeq  21     200    220   3000   100.0%    chr10  -       24799391   24799411   21
YourSeq  21     23     49    3000   88.9%     chr12  +       59329791   59329817   27

Note: The 3000 bp section downstream of Exon 8 is BLAT searched against the genome. No significant similarity is found.
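The self-hits in the two BLAT tables give the chr3 coordinates of the upstream and downstream query windows. Assuming these windows directly flank the cKO region (the report does not state this explicitly), the length of the interval between them can be checked with a short calculation. The constant and function names below are illustrative.

```python
# chr3 coordinates of the 100% self-hits in the two BLAT tables.
UPSTREAM_HIT = (89202198, 89205197)    # 3000 bp window upstream of exon 6
DOWNSTREAM_HIT = (89206620, 89209619)  # 3000 bp window downstream of exon 8


def span(start: int, end: int) -> int:
    """Length of a closed interval [start, end] in bp."""
    return end - start + 1


def gap_between(upstream: tuple[int, int], downstream: tuple[int, int]) -> int:
    """Number of bases strictly between two non-overlapping intervals."""
    return downstream[0] - upstream[1] - 1


print(span(*UPSTREAM_HIT), span(*DOWNSTREAM_HIT))
print(gap_between(UPSTREAM_HIT, DOWNSTREAM_HIT))  # 1422
```

Under that assumption the interval is 1422 bp, which agrees with the effective cKO region size of ~1422 bp given in the strategy summary.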


Gene and information:

Gba glucosidase, beta, acid [Mus musculus (house mouse)]
Gene ID: 14466, updated on 22-Oct-2019

Gene summary

Official Symbol: Gba (provided by MGI)
Official Full Name: glucosidase, beta, acid (provided by MGI)
Primary source: MGI:MGI:95665
See related: Ensembl:ENSMUSG00000028048
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: GC; GBA1; GLUC; GCase; betaGC
Expression: Ubiquitous expression in genital fat pad adult (RPKM 35.3), subcutaneous fat pad adult (RPKM 29.7) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 3 F1; 3 39.01 cM

Exon count: 13

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  3    NC_000069.6 (89202905..89208873)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    3    NC_000069.5 (89006850..89012603)

[Figure: genomic context, Chromosome 3 - NC_000069.6.]


Transcript information: This gene has 5 transcripts

Gene: Gba ENSMUSG00000028048

Description: glucosidase, beta, acid [Source:MGI Symbol;Acc:MGI:95665]
Gene Synonyms: GBA1, GC, GCase, betaGC, glucocerebrosidase
Location: Chromosome 3: 89,202,928-89,208,966 forward strand. GRCm38:CM000996.2
About this gene: This gene has 5 transcripts (splice variants), 262 orthologues, is a member of 1 Ensembl protein family and is associated with 55 phenotypes.

Transcripts

Name     Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Gba-202  ENSMUST00000167998.1   1938  515aa       ENSMUSP00000130660.1  Protein coding           CCDS17493  P17439      TSL:1 GENCODE basic APPRIS P1
Gba-201  ENSMUST00000077367.10  1779  515aa       ENSMUSP00000076589.4  Protein coding           CCDS17493  P17439      TSL:1 GENCODE basic APPRIS P1
Gba-204  ENSMUST00000197738.4   1978  388aa       ENSMUSP00000142401.1  Nonsense mediated decay  -          A0A0G2JDK2  TSL:1
Gba-205  ENSMUST00000200124.1   693   No protein  -                     Retained intron          -          -           TSL:3
Gba-203  ENSMUST00000196887.1   662   No protein  -                     Retained intron          -          -           TSL:2


[Figure: Ensembl genome browser view of a 26.04 kb region (89.195 Mb to 89.215 Mb). Forward strand genes (comprehensive set): Gba-201 and Gba-202 (protein coding), Gba-204 (nonsense mediated decay), Gba-203 and Gba-205 (retained intron), Gm43737-201 (TEC), and Thbs3 transcripts 201, 202, 203, 207 and 209. Contig: AC161600.6. Reverse strand genes (comprehensive set): Gm23269-201 (snRNA) and Mtx1 transcripts 201 to 217. Additional track: Regulatory Build. Regulation legend: CTCF; enhancer; promoter; promoter flank; transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (RNA gene; processed transcript).]


Transcript: ENSMUST00000167998

[Figure: transcript Gba-202 (protein coding), 5.74 kb, forward strand, with protein annotation tracks for ENSMUSP00000130...: Low complexity (Seg); Cleavage site (Sign...); Superfamily (SSF51011; Glycoside hydrolase superfamily); Prints (Glycoside hydrolase family 30); Pfam (Glycosyl hydrolase family 30, TIM-barrel domain; Glycosyl hydrolase family 30, beta sandwich domain); PANTHER (Glycoside hydrolase family 30; PTHR11069:SF33); Gene3D (Glycosyl hydrolase, all-beta; 3.20.20.80); sequence variants (dbSNP and all other sources), with missense and synonymous variants marked. Scale bar: 0 to 515 amino acids.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
