https://www.alphaknockout.com

Mouse Ubash3a Knockout Project (CRISPR/Cas9)

Objective: To create a Ubash3a knockout mouse model (C57BL/6N) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Ubash3a gene (NCBI Reference Sequence: NM_177823; Ensembl: ENSMUSG00000042345) is located on mouse chromosome 17. Fourteen exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 14 (Transcript: ENSMUST00000236745). Exons 2~5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Homozygous null mice are viable and healthy, with no abnormalities detected in any of the hematopoietic lineages.

Exon 2 starts at about 6.09% of the coding region. Exons 2~5 cover 32.59% of the coding region. The size of the effective KO region is ~8382 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

Figure: Wildtype allele (5' to 3') showing exons 1 to 14 of mouse Ubash3a, with the gRNA regions flanking exons 2~5. Legend: exon of mouse Ubash3a; knockout region.


Overview of the Dot Plot (up) Window size: 15 bp

Figure: Self-alignment dot plot of the upstream sequence. Legend: Forward; Reverse Complement.

Note: The 1520 bp section upstream of Exon 2 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
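The self-alignment behind a dot plot can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool used to produce the report: the function names are hypothetical, matches are exact word matches, and only the 15 bp window size is taken from the report.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def self_dot_plot(seq: str, window: int = 15):
    """Compare a sequence with itself using exact matches of `window` bp.

    Returns two sorted lists of (i, j) start positions (0-based):
      forward: seq[i:i+window] == seq[j:j+window], with i < j
               (the trivial main diagonal i == j is left out)
      reverse: seq[i:i+window] == reverse complement of seq[j:j+window]
    """
    seq = seq.upper()
    n = len(seq)

    # Index every window-length word by its start positions.
    positions = {}
    for i in range(n - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)

    forward = [
        (i, j)
        for starts in positions.values()
        for i in starts
        for j in starts
        if i < j
    ]

    rc = reverse_complement(seq)
    reverse = []
    for k in range(n - window + 1):
        # rc[k:k+window] is the reverse complement of seq[j:j+window]
        j = n - k - window
        for i in positions.get(rc[k:k + window], []):
            reverse.append((i, j))

    return sorted(forward), sorted(reverse)
```

Runs of off-diagonal forward matches that lie parallel to the main diagonal are what a tandem or dispersed repeat would look like in the plot; an empty or sparse result corresponds to the "no significant tandem repeat" reading.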

Overview of the Dot Plot (down) Window size: 15 bp

Figure: Self-alignment dot plot of the downstream sequence. Legend: Forward; Reverse Complement.

Note: The 1448 bp section downstream of Exon 5 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up) Window size: 300 bp


Summary: Full Length(1520bp) | A(24.87% 378) | C(24.08% 366) | T(22.83% 347) | G(28.22% 429)

Note: The 1520 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
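A sliding-window GC scan of the kind summarized above might look like the sketch below. The 300 bp window comes from the report; the function names, the step size of 1 bp, and the 70% cutoff for a "high GC" window are assumptions made for illustration, since the report does not state its threshold.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int = 300, step: int = 1):
    """Return (start, gc_fraction) for every full window along the sequence."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def high_gc_windows(seq: str, window: int = 300, threshold: float = 0.70):
    """Windows whose GC fraction is at or above an assumed threshold."""
    return [(s, gc) for s, gc in sliding_gc(seq, window) if gc >= threshold]
```

For a 1520 bp input and a 300 bp window this yields 1221 windows; an empty result from `high_gc_windows` would correspond to the report's conclusion that no significant high GC-content region is present.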

Overview of the GC Content Distribution (down) Window size: 300 bp


Summary: Full Length(1448bp) | A(32.94% 477) | C(13.67% 198) | T(25.83% 374) | G(27.56% 399)

Note: The 1448 bp section downstream of Exon 5 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
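The two summary lines can be cross-checked with a few lines of arithmetic. The base counts below are copied from the summaries above; the dictionary layout and the `summarize` helper are illustrative names, not part of the report.

```python
# Base counts as reported in the two GC-content summaries.
COUNTS = {
    "up": {"A": 378, "C": 366, "T": 347, "G": 429},
    "down": {"A": 477, "C": 198, "T": 374, "G": 399},
}


def summarize(counts):
    """Return (total length, per-base percentages, overall GC percentage)."""
    total = sum(counts.values())
    pct = {base: round(100 * n / total, 2) for base, n in counts.items()}
    gc = round(100 * (counts["G"] + counts["C"]) / total, 2)
    return total, pct, gc


for name, counts in COUNTS.items():
    total, pct, gc = summarize(counts)
    print(name, total, pct, f"GC={gc}%")
```

The totals come out to 1520 bp and 1448 bp, matching the stated lengths, and the overall GC content is about 52.3% upstream and 41.2% downstream.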


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  1520   1      1520  1520   100.0%    chr17  +       31208233   31209752   1520
YourSeq  24     279    306   1520   85.2%     chr13  -       102836505  102836531  27
YourSeq  24     90     121   1520   77.0%     chr1   +       88773566   88773592   27
YourSeq  23     85     107   1520   100.0%    chr6   -       122766019  122766041  23
YourSeq  22     1358   1379  1520   100.0%    chr10  -       74067838   74067859   22
YourSeq  21     90     110   1520   100.0%    chr5   -       39390551   39390571   21
YourSeq  21     354    374   1520   100.0%    chr4   -       147453201  147453221  21
YourSeq  21     354    374   1520   100.0%    chr19  -       60216271   60216291   21
YourSeq  21     354    374   1520   100.0%    chr10  -       42607999   42608019   21
YourSeq  21     354    374   1520   100.0%    chr10  -       24196033   24196053   21
YourSeq  21     354    374   1520   100.0%    chr7   +       36859856   36859876   21
YourSeq  21     354    374   1520   100.0%    chr6   +       148812826  148812846  21
YourSeq  21     354    374   1520   100.0%    chr10  +       91682669   91682689   21
YourSeq  21     354    374   1520   100.0%    chr10  +       77532562   77532582   21
YourSeq  21     131    152   1520   100.0%    chr1   +       108252061  108252083  23
YourSeq  20     985    1004  1520   100.0%    chr1   -       10676634   10676653   20
YourSeq  20     355    374   1520   100.0%    chr10  +       63059802   63059821   20
YourSeq  20     21     40    1520   100.0%    chr1   +       154096936  154096955  20

Note: The 1520 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  1448   1      1448  1448   100.0%    chr17  +       31218135   31219582   1448
YourSeq  178    655    1075  1448   91.6%     chr19  -       34497984   34498487   504
YourSeq  148    849    1075  1448   90.8%     chr10  +       45007529   45008018   490
YourSeq  147    850    1075  1448   91.7%     chr3   -       52582650   52583087   438
YourSeq  145    501    1076  1448   87.2%     chr17  +       87762007   87762564   558
YourSeq  142    497    1067  1448   82.6%     chr13  -       70767058   70767318   261
YourSeq  141    790    1075  1448   85.0%     chr18  +       29947405   29947641   237
YourSeq  137    849    1075  1448   92.1%     chr1   -       183628525  183628847  323
YourSeq  136    849    1070  1448   91.0%     chr10  +       107713575  107713981  407
YourSeq  136    822    1075  1448   84.5%     chr10  +       3947938    3948175    238
YourSeq  136    745    1073  1448   87.5%     chr1   +       23710057   23710369   313
YourSeq  135    849    1068  1448   92.5%     chr3   +       137745554  137745864  311
YourSeq  135    597    1075  1448   86.6%     chr19  +       3302362    3302834    473
YourSeq  134    868    1075  1448   92.5%     chr10  +       45007536   45008227   692
YourSeq  133    858    1075  1448   89.6%     chr3   -       149745798  149746063  266
YourSeq  133    775    1075  1448   90.4%     chr4   +       140489951  140490307  357
YourSeq  133    857    1079  1448   90.1%     chr13  +       69443987   69444255   269
YourSeq  132    849    1074  1448   90.2%     chr19  -       46129074   46129368   295
YourSeq  131    868    1075  1448   88.1%     chr2   +       9773229    9773432    204
YourSeq  130    849    1075  1448   89.0%     chr19  -       46129085   46129340   256

Note: The 1448 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.
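The top BLAT hit for each flanking section gives its chr17 coordinates, and these can be used to sanity-check the stated size of the KO region. The sketch below assumes that the KO region is exactly the interval lying between the upstream and downstream sections; the report does not state this explicitly, and the variable names are illustrative.

```python
# chr17 coordinates of the 100% identity BLAT hits (1-based, inclusive).
UPSTREAM_START, UPSTREAM_END = 31_208_233, 31_209_752      # 1520 bp section
DOWNSTREAM_START, DOWNSTREAM_END = 31_218_135, 31_219_582  # 1448 bp section

# Assumption: the KO region is everything between the two flanking sections.
ko_start = UPSTREAM_END + 1
ko_end = DOWNSTREAM_START - 1
ko_size = ko_end - ko_start + 1

print(f"chr17:{ko_start}-{ko_end} ({ko_size} bp)")
```

Under that assumption the interval is chr17:31209753-31218134, which is 8382 bp and agrees with the effective KO region size given in the strategy summary.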


Gene and information: Ubash3a, ubiquitin associated and SH3 domain containing, A [ Mus musculus (house mouse) ]
Gene ID: 328795, updated on 10-Oct-2019

Gene summary

Official Symbol: Ubash3a (provided by MGI)
Official Full Name: ubiquitin associated and SH3 domain containing, A (provided by MGI)
Primary source: MGI:MGI:1926074
See related: Ensembl:ENSMUSG00000042345
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: TULA; Sts-2; TULA-1; C330001M22; 5830413C03Rik
Expression: Biased expression in thymus adult (RPKM 15.3), liver E14.5 (RPKM 3.3) and 4 other tissues
Orthologs: human; all

Genomic context

Location: 17; 17 A3.3
Exon count: 20

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  17   NC_000083.6 (31207868..31259681)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    17   NC_000083.5 (31345011..31379348)

Chromosome 17 - NC_000083.6


Transcript information: This gene has 9 transcripts

Gene: Ubash3a ENSMUSG00000042345

Description: ubiquitin associated and SH3 domain containing, A [Source:MGI Symbol;Acc:MGI:1926074]
Gene Synonyms: 5830413C03Rik, Sts-2, TULA
Location: Chromosome 17: 31,207,873-31,246,892 forward strand. GRCm38:CM001010.2
About this gene: This gene has 9 transcripts (splice variants), 143 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 5 phenotypes.

Transcripts

Name         Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Ubash3a-208  ENSMUST00000236745.1   7635  624aa       ENSMUSP00000158544.1  Protein coding           CCDS50054  Q3V3E1      GENCODE basic, APPRIS P1
Ubash3a-201  ENSMUST00000048656.14  1695  527aa       ENSMUSP00000045890.8  Protein coding           -          -           TSL:1, GENCODE basic
Ubash3a-209  ENSMUST00000237216.1   740   194aa       ENSMUSP00000158184.1  Protein coding           -          A0A494BAT7  CDS 5' incomplete
Ubash3a-205  ENSMUST00000173776.1   378   116aa       ENSMUSP00000134557.1  Protein coding           -          G3UZM7      CDS 3' incomplete, TSL:3
Ubash3a-202  ENSMUST00000144772.7   668   129aa       ENSMUSP00000119279.1  Nonsense mediated decay  -          D6RCF9      TSL:1
Ubash3a-206  ENSMUST00000235375.1   2552  No protein  -                     Retained intron          -          -           -
Ubash3a-207  ENSMUST00000236599.1   1061  No protein  -                     Retained intron          -          -           -
Ubash3a-204  ENSMUST00000151620.8   2539  No protein  -                     lncRNA                   -          -           TSL:1
Ubash3a-203  ENSMUST00000147686.1   468   No protein  -                     lncRNA                   -          -           TSL:3


Figure: Ensembl genome browser view of the Ubash3a locus (59.02 kb, chromosome 17, 31.20Mb to 31.25Mb). Forward strand: Ubash3a-201 to Ubash3a-209 (protein coding, nonsense mediated decay, retained intron and lncRNA transcripts) and Gm50218-201 to Gm50218-203 (lncRNA). Reverse strand: Tmprss3-202 to Tmprss3-204 and Rsph1-201 to Rsph1-203 (protein coding). Contigs: AC167247.2, CU024900.15, AC154652.2. Tracks: Genes (Comprehensive set), Regulatory Build.

Regulation Legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site

Gene Legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana); Non-Protein Coding (RNA gene; processed transcript)


Transcript: ENSMUST00000236745

Figure: Ensembl protein domain view of Ubash3a-208 (protein coding; 39.02 kb, forward strand; scale bar 0 to 624 amino acids). Annotation tracks:

Low complexity (Seg)
Superfamily: UBA-like superfamily; SH3-like domain superfamily; Histidine phosphatase superfamily
SMART: SH3 domain
Pfam: Ubiquitin-associated domain; SH3 domain; Histidine phosphatase superfamily, clade-1
PROSITE profiles: Ubiquitin-associated domain; SH3 domain
PANTHER: PTHR16469:SF7; PTHR16469
Gene3D: 1.10.8.10; 2.30.30.40; Histidine phosphatase superfamily
CDD: cd14300 UBASH3A, SH3 domain; Histidine phosphatase superfamily, clade-1
Sequence variants (dbSNP and all other sources)

Variant Legend: stop gained; frameshift variant; missense variant; splice region variant; synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
