https://www.alphaknockout.com

Mouse Ubap1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Ubap1 conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Ubap1 gene (NCBI Reference Sequence: NM_023305; Ensembl: ENSMUSG00000028437) is located on mouse chromosome 4. Seven exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 7 (Transcript: ENSMUST00000072866). Exon 3 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Ubap1 gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP23-324H18 as template. Cas9, gRNA and targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts at about 2.32% of the coding region. Knockout of exon 3 will result in a frameshift of the gene. The size of intron 2 for 5'-loxP site insertion is 4794 bp, and the size of intron 3 for 3'-loxP site insertion is 7097 bp. The size of the effective cKO region is ~625 bp. The cKO region does not contain any other known gene.
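As a rough cross-check of the 2.32% figure, the sketch below converts it into an approximate nucleotide and codon position. It assumes the 502 aa protein length listed for Ubap1-201 in the transcript table and assumes the coding length includes the stop codon; the function names are illustrative only and are not part of the project pipeline.

```python
# Back-of-envelope estimate of where exon 3 begins within the coding sequence.
# Assumptions: protein length taken from the transcript table (Ubap1-201),
# and the coding length is counted with the stop codon included.
PROTEIN_LENGTH_AA = 502
EXON3_START_FRACTION = 0.0232  # "about 2.32% of the coding region"


def cds_length_nt(protein_length_aa: int, include_stop: bool = True) -> int:
    """Coding sequence length in nucleotides for a protein of the given length."""
    codons = protein_length_aa + (1 if include_stop else 0)
    return codons * 3


def approx_exon_start(fraction: float, cds_nt: int) -> tuple[int, int]:
    """Return (approximate nucleotide offset, 1-based codon containing it)."""
    nt_offset = round(fraction * cds_nt)
    codon = nt_offset // 3 + 1
    return nt_offset, codon


cds_nt = cds_length_nt(PROTEIN_LENGTH_AA)
nt_offset, codon = approx_exon_start(EXON3_START_FRACTION, cds_nt)
print(f"CDS length: {cds_nt} nt; exon 3 starts near nt {nt_offset} (codon {codon})")
```

Under these assumptions exon 3 begins within the first dozen codons, so nearly all of the protein lies downstream of the deleted region.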


Overview of the Targeting Strategy

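[Figure: schematic of the targeting strategy showing, from top to bottom, the wildtype allele (exons 1 to 7, with the 5' and 3' gRNA regions flanking exon 3), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Ubap1; homology arm; cKO region; loxP site.]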
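The step from the targeted allele to the constitutive KO allele can be illustrated with a small string-level sketch of Cre-mediated excision. It assumes two loxP sites in the same orientation and uses the canonical 34 bp loxP sequence; the toy allele and the function name are hypothetical and stand in for the real Ubap1 sequence, which is not reproduced here.

```python
# Canonical 34 bp loxP site: 13 bp inverted repeat + 8 bp spacer + 13 bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(allele: str, loxp: str = LOXP) -> str:
    """Simulate excision between the first two same-orientation loxP sites.

    The sequence between the sites and one loxP copy are removed,
    leaving a single loxP site behind. Alleles with fewer than two
    sites are returned unchanged.
    """
    first = allele.find(loxp)
    if first == -1:
        return allele
    second = allele.find(loxp, first + len(loxp))
    if second == -1:
        return allele
    return allele[: first + len(loxp)] + allele[second + len(loxp):]


# Toy floxed allele: 5' flank, loxP, "cKO region", loxP, 3' flank (all hypothetical).
toy_targeted = "AAAA" + LOXP + "CCCCGGGG" + LOXP + "TTTT"
toy_ko = cre_excise(toy_targeted)
print(len(toy_targeted), len(toy_ko))
```

In the toy example the recombined allele is shorter by the length of the floxed segment plus one loxP site, which is the size difference a genotyping PCR across the region would be expected to detect.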


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
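A self-alignment of this kind can be approximated by indexing every 10 bp word in the sequence and reporting pairs of positions that share a word, either directly or as a reverse complement. The following is a minimal sketch of that idea, not the tool used to produce the plot; the function names and toy sequences are illustrative.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dotplot(seq: str, window: int = 10):
    """Return (forward, reverse) lists of matching window start positions.

    forward: pairs (i, j), i < j, where the windows at i and j are identical.
    reverse: pairs (i, j) where the window at i equals the reverse
             complement of the window at j.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(len(seq) - window + 1):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j > i)
        reverse.extend((i, j) for j in index.get(revcomp(word), []))
    return forward, reverse


# Toy sequence with a 12 bp unit repeated in tandem (hypothetical data).
tandem = "GGGG" + "ACGGTCATGCAA" * 2 + "CCCC"
forward, _ = self_dotplot(tandem)

# Toy palindromic sequence to exercise the reverse-complement track.
_, reverse = self_dotplot("AAACCCGGGTTT")
print(forward[:3], reverse[:3])
```

Tandem repeats show up as runs of forward pairs with a constant offset between the two positions, which corresponds to the off-diagonal lines in a dot plot.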

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(7125bp) | A(24.36% 1736) | C(21.94% 1563) | T(31.92% 2274) | G(21.78% 1552)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
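The base counts in the summary give the overall GC content directly, and a sliding window gives the kind of profile shown in the plot. The sketch below reproduces both calculations; the counts come from the summary line, while the window function and the toy sequence are illustrative and the step size is an assumption.

```python
# Base counts as reported in the summary line.
REPORTED_COUNTS = {"A": 1736, "C": 1563, "T": 2274, "G": 1552}


def gc_percent_from_counts(counts: dict) -> float:
    """Overall GC content (percent) from a base-count dictionary."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def gc_windows(seq: str, window: int = 300, step: int = 1) -> list:
    """GC fraction of each window of the given size along the sequence."""
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


total_bases = sum(REPORTED_COUNTS.values())
overall_gc = gc_percent_from_counts(REPORTED_COUNTS)
print(f"{total_bases} bp, overall GC = {overall_gc:.2f}%")

# Toy example with a small window so the result is easy to follow.
toy_profile = gc_windows("GGCCAATT", window=4, step=2)
print(toy_profile)
```

The four counts sum to the stated full length of 7125 bp, and the overall GC content comes out at roughly 43.7%.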


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1       3000  3000   100.0%    chr4   +       41368476   41371475   3000
YourSeq  655    322     2099  3000   91.8%     chr9   -       74983390   75433941   450552
YourSeq  301    1835    2271  3000   89.8%     chr4   +       44087008   44087743   736
YourSeq  267    1793    2120  3000   91.7%     chr9   +       70658925   70659594   670
YourSeq  241    322     671   3000   91.2%     chr19  +       37760692   37761220   529
YourSeq  239    322     636   3000   88.8%     chr5   +       136513265  136513596  332
YourSeq  237    211     637   3000   88.2%     chr6   -       86562304   86562750   447
YourSeq  234    1801    2118  3000   89.8%     chr13  -       33858191   33858515   325
YourSeq  233    1813    2118  3000   89.9%     chr5   -       86963247   86963787   541
YourSeq  232    322     634   3000   90.3%     chr4   +       152278381  152278708  328
YourSeq  231    330     610   3000   92.0%     chr11  -       97595148   97595445   298
YourSeq  229    340     635   3000   88.9%     chr4   -       137066163  137066475  313
YourSeq  229    1844    2324  3000   85.7%     chr2   -       136902517  136902822  306
YourSeq  227    1835    2116  3000   90.8%     chr11  +       104786679  104786978  300
YourSeq  226    1835    2115  3000   90.7%     chr5   +       41926475   41926778   304
YourSeq  226    1835    2115  3000   90.7%     chr13  +       25446988   25447284   297
YourSeq  225    1835    2118  3000   90.4%     chr6   -       74077266   74077563   298
YourSeq  225    1835    2117  3000   90.1%     chr6   -       56346334   56346632   299
YourSeq  225    1835    2321  3000   83.1%     chr6   +       75695071   75695385   315
YourSeq  224    1844    2118  3000   90.9%     chr14  +       27915459   27915746   288

Note: The 3000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1       3000  3000   100.0%    chr4   +       41372101   41375100   3000
YourSeq  160    962     1194  3000   90.5%     chr19  -       44948569   44948808   240
YourSeq  150    965     1173  3000   87.9%     chr4   -       108484928  108485180  253
YourSeq  124    898     1130  3000   86.4%     chr10  -       59985424   59985649   226
YourSeq  121    999     1174  3000   93.0%     chr10  -       83510644   83510941   298
YourSeq  116    964     1127  3000   88.1%     chr1   -       184990450  184990908  459
YourSeq  113    965     1095  3000   93.2%     chr12  -       78830255   78830385   131
YourSeq  113    953     1096  3000   89.6%     chr11  -       119131241  119131386  146
YourSeq  113    964     1104  3000   90.1%     chr16  +       92459766   92459906   141
YourSeq  108    964     1109  3000   87.0%     chr15  -       64499641   64499786   146
YourSeq  106    962     1119  3000   82.8%     chr12  -       79913329   79913481   153
YourSeq  105    968     1102  3000   88.9%     chr14  -       45312563   45312697   135
YourSeq  105    964     1102  3000   87.8%     chr12  -       111563458  111563596  139
YourSeq  101    968     1102  3000   86.5%     chr2   -       120976099  120976232  134
YourSeq  101    964     1096  3000   91.1%     chr17  +       35826051   35826185   135
YourSeq  100    964     1096  3000   91.7%     chr2   -       112294420  112294552  133
YourSeq  90     953     1081  3000   85.3%     chr15  +       100351039  100351169  131
YourSeq  88     968     1076  3000   90.8%     chr9   -       36733403   36733512   110
YourSeq  88     999     1102  3000   92.4%     chr16  -       20385248   20385351   104
YourSeq  85     964     1080  3000   86.4%     chr10  +       49109432   49109548   117

Note: The 3000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
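The two 100% identity hits on chr4 give the genomic positions of the upstream and downstream query sections, so the interval between them can be computed and compared with the stated cKO region size. The sketch below does this arithmetic with the coordinates from the two BLAT tables; treating the coordinates as 1-based inclusive is an assumption, and the variable names are illustrative.

```python
# Top (100% identity) chr4 hits from the upstream and downstream BLAT searches.
# Assumption: coordinates are 1-based and inclusive, as the 3000 bp spans suggest.
UPSTREAM_HIT = (41368476, 41371475)
DOWNSTREAM_HIT = (41372101, 41375100)


def interval_length(start: int, end: int) -> int:
    """Length of an inclusive genomic interval."""
    return end - start + 1


def gap_between(left: tuple, right: tuple) -> int:
    """Number of bases strictly between two non-overlapping intervals."""
    return right[0] - left[1] - 1


upstream_len = interval_length(*UPSTREAM_HIT)
downstream_len = interval_length(*DOWNSTREAM_HIT)
gap_len = gap_between(UPSTREAM_HIT, DOWNSTREAM_HIT)
print(f"upstream {upstream_len} bp, gap {gap_len} bp, downstream {downstream_len} bp")
```

The interval between the two searched sections comes out at 625 bp, which matches the ~625 bp effective cKO region given in the strategy note.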


Gene and information: Ubap1 ubiquitin-associated protein 1 [Mus musculus (house mouse)] Gene ID: 67123, updated on 24-Oct-2019

Gene summary

Official Symbol: Ubap1 (provided by MGI)
Official Full Name: ubiquitin-associated protein 1 (provided by MGI)
Primary source: MGI:MGI:2149543
See related: Ensembl:ENSMUSG00000028437
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Ubap; NAG20; UBAP-1; 2700092A01Rik
Expression: Ubiquitous expression in testis adult (RPKM 11.0), kidney adult (RPKM 8.7) and 28 other tissues
Orthologs: human; all

Genomic context

Location: 4; 4 A5

Exon count: 9

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  4    NC_000070.6 (41348996..41389766)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    4    NC_000070.5 (41296029..41336799)

Chromosome 4 - NC_000070.6


Transcript information: This gene has 5 transcripts

Gene: Ubap1 ENSMUSG00000028437

Description: ubiquitin-associated protein 1 [Source:MGI Symbol;Acc:MGI:2149543]
Gene Synonyms: 2700092A01Rik, NAG20
Location: Chromosome 4: 41,348,996-41,390,525 forward strand. GRCm38:CM000997.2
About this gene: This gene has 5 transcripts (splice variants), 203 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 2 phenotypes.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Ubap1-201  ENSMUST00000072866.11  3456  502aa       ENSMUSP00000072643.5  Protein coding  CCDS51140  Q8BH48   TSL:1, GENCODE basic, APPRIS P1
Ubap1-202  ENSMUST00000108060.9   2514  441aa       ENSMUSP00000103695.3  Protein coding  CCDS71361  Q8BH48   TSL:1, GENCODE basic
Ubap1-203  ENSMUST00000132235.1   422   141aa       ENSMUSP00000123491.1  Protein coding  -          F6WHE1   CDS 5' and 3' incomplete, TSL:5
Ubap1-205  ENSMUST00000154529.1   671   No protein  -                     lncRNA          -          -        TSL:3
Ubap1-204  ENSMUST00000136705.1   314   No protein  -                     lncRNA          -          -        TSL:3

[Figure: Ensembl genome browser view of a 61.53 kb region of chromosome 4 (41.34 Mb to 41.40 Mb). Forward strand: Ubap1-201 (protein coding), Ubap1-202 (protein coding), Ubap1-203 (protein coding), Ubap1-204 (lncRNA), Ubap1-205 (lncRNA). Contig: AL807823.6. Reverse strand: Gm26084-201 (snRNA), Kif24-201 (protein coding), Kif24-202 (protein coding), Kif24-206 (lncRNA). Regulatory Build track legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana; Ensembl protein coding), non-protein coding (RNA gene).]


Transcript: ENSMUST00000072866

[Figure: Ensembl transcript view of Ubap1-201 (protein coding), 41.53 kb, forward strand, with the protein ENSMUSP00000072... annotated along a 0 to 502 aa scale. Protein feature tracks: MobiDB lite; low complexity (Seg); coiled-coils (Ncoils); Superfamily (UBA-like superfamily); PROSITE profiles (UMA domain, ubiquitin-associated domain); PANTHER (Ubiquitin-associated protein 1, PTHR15960:SF2); Gene3D (Ubiquitin-associated protein 1, C-terminal); CDD (cd14315, cd14316). Variant track: sequence variants (dbSNP and all other sources), with missense and synonymous variants marked.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
