http://beta.alphaknockout.cyagen.net

Mouse Upf2 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Upf2 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Upf2 gene (NCBI Reference Sequence: NM_001081132; Ensembl: ENSMUSG00000043241) is located on mouse chromosome 2. 22 exons have been identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 22 (Transcript: ENSMUST00000060092). Exons 4-5 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Upf2 gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP24-186J12 as the template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a knock-out allele exhibit early embryonic lethality.

The knockout of exons 4-5 will result in a frameshift of the gene and covers 9.35% of the coding region. The size of intron 3 for 5'-loxP site insertion is 11756 bp, and the size of intron 5 for 3'-loxP site insertion is 3671 bp. The size of the effective cKO region is ~3231 bp. This strategy is designed based on genetic information in existing databases. Due to the complexity of biological processes, not every risk that loxP insertion poses to gene transcription, RNA splicing and translation can be predicted at the existing technological level.
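The frameshift claim above rests on simple arithmetic: removing coding sequence whose length is not a multiple of 3 shifts the downstream reading frame. The sketch below illustrates that check. The report does not list exon lengths, so the deleted coding length of 356 bp is an assumption, back-calculated so that it matches the stated 9.35% of a 3807 bp coding region (1269 aa x 3, stop codon excluded). The function names are illustrative only.

```python
def cds_length_bp(protein_aa: int) -> int:
    """Coding length implied by a protein length, excluding the stop codon."""
    return protein_aa * 3


def causes_frameshift(deleted_coding_bp: int) -> bool:
    """A deletion shifts the reading frame unless it removes whole codons."""
    return deleted_coding_bp % 3 != 0


def coding_fraction(deleted_coding_bp: int, cds_bp: int) -> float:
    """Share of the coding region removed by the deletion."""
    return deleted_coding_bp / cds_bp


# 1269 aa is the protein length reported for transcript Upf2-201.
CDS_BP = cds_length_bp(1269)

# ASSUMPTION: 356 bp is not stated in the report; it is inferred from 9.35%.
ASSUMED_DELETED_CODING_BP = 356

if __name__ == "__main__":
    share = coding_fraction(ASSUMED_DELETED_CODING_BP, CDS_BP)
    print(f"CDS length: {CDS_BP} bp")
    print(f"Deleted share: {share:.2%}")
    print(f"Frameshift: {causes_frameshift(ASSUMED_DELETED_CODING_BP)}")
```

Under these assumptions the deletion leaves a remainder of 2 when divided by 3, which is consistent with the frameshift the report predicts.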


Overview of the Targeting Strategy

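[Figure: targeting strategy schematic showing the wildtype allele (5' to 3', exons numbered 1 to 22, with two gRNA regions flanking exons 4 and 5), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Upf2, homology arm, cKO region, loxP site.]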
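The step from the targeted allele to the constitutive KO allele can be pictured as a string operation: Cre recombines two loxP sites in the same orientation, removes the sequence between them, and leaves a single loxP site behind. The following is a minimal sketch of that idea on a toy sequence. It assumes the canonical 34 bp loxP sequence and exactly two directly repeated sites; the flanking and floxed sequences are made up for illustration and are not Upf2 sequence.

```python
# Canonical loxP: 13 bp inverted repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(allele: str) -> str:
    """Delete the segment between the first two same-orientation loxP sites.

    Returns the allele unchanged if fewer than two sites are present.
    One loxP site remains at the junction after excision.
    """
    first = allele.find(LOXP)
    if first == -1:
        return allele
    second = allele.find(LOXP, first + len(LOXP))
    if second == -1:
        return allele
    return allele[: first + len(LOXP)] + allele[second + len(LOXP):]


if __name__ == "__main__":
    # Toy targeted allele: upstream + loxP + floxed region + loxP + downstream.
    targeted = "AAAA" + LOXP + "CCCCGGGG" + LOXP + "TTTT"
    knockout = cre_excise(targeted)
    print(len(targeted), "->", len(knockout))
```

The sketch ignores site orientation (inverted sites would cause inversion rather than excision), which is acceptable here because both loxP sites in a cKO design are inserted in the same orientation.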


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 against itself, showing forward and reverse complement matches.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to check for tandem repeats. Tandem repeats are found in the dot plot matrix, so this targeting vector may be difficult to construct.
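A self-alignment dot plot of this kind can be approximated by indexing every word of the window length and reporting positions that share a word, on the forward strand and against the reverse complement. The sketch below shows the idea with exact word matching and the 10 bp window named above. The tool that produced the report's plot is not stated, so this is an illustrative stand-in rather than a reproduction of it.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def dot_plot_matches(seq: str, window: int = 10):
    """Return (forward, reverse) lists of (i, j) word-match coordinates.

    forward: i < j where seq[i:i+window] == seq[j:j+window].
    reverse: seq[i:i+window] equals the reverse complement of seq[j:j+window].
    """
    index = defaultdict(list)
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    forward = [
        (i, j)
        for positions in index.values()
        for i in positions
        for j in positions
        if i < j
    ]

    reverse = []
    rc = reverse_complement(seq)
    n = len(seq)
    for k in range(len(rc) - window + 1):
        # rc[k:k+window] is the reverse complement of seq[n-k-window:n-k].
        for i in index.get(rc[k:k + window], []):
            reverse.append((i, n - k - window))
    return forward, reverse


if __name__ == "__main__":
    toy = "GATTACAGGA" + "CCCCC" + "GATTACAGGA"  # made-up direct repeat
    fwd, rev = dot_plot_matches(toy, window=10)
    print("forward matches:", fwd)
    print("reverse matches:", rev)
```

Runs of forward matches lying on a diagonal away from the main diagonal correspond to the direct or tandem repeats that the note warns about.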

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full length (9634 bp) | A (25.78%, 2484) | C (18.41%, 1774) | G (20.33%, 1959) | T (35.47%, 3417)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
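The base counts in the summary give the overall GC content directly, and the distribution plot corresponds to the same calculation repeated over a sliding window. The sketch below shows both, using the counts from the summary line and the 300 bp window size. The step size and function names are assumptions made for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list:
    """GC fraction of each full window along the sequence."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts copied from the summary line of the report.
BASE_COUNTS = {"A": 2484, "C": 1774, "G": 1959, "T": 3417}
TOTAL_BP = sum(BASE_COUNTS.values())
OVERALL_GC = (BASE_COUNTS["G"] + BASE_COUNTS["C"]) / TOTAL_BP

if __name__ == "__main__":
    print(f"Total length: {TOTAL_BP} bp")
    print(f"Overall GC content: {OVERALL_GC:.2%}")
```

The counts sum to the reported full length of 9634 bp, and the overall GC content works out to roughly 38.75%, in line with the note that no high-GC region is present.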


BLAT Search Results (up)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-----------------------------------------------------------------------------------------
YourSeq   3000      1  3000   3000    100.0%  chr2   +         5970220    5973219    3000
YourSeq    357    549  1958   3000     90.6%  chr11  +        97347903   97538486  190584
YourSeq    293    553  1958   3000     87.7%  chr11  +        72707273   72786050   78778
YourSeq    290   1786  2661   3000     92.0%  chr11  +       118976686  119661940  685255
YourSeq    264    365   743   3000     92.7%  chr8   -        26823824   27063420  239597
YourSeq    264    383   743   3000     89.6%  chr1   +       135353319  135704622  351304
YourSeq    241    383   743   3000     87.1%  chr3   -        95261097   95261461     365
YourSeq    235    383   732   3000     92.5%  chr11  -       113660284  113660660     377
YourSeq    228    412   742   3000     88.3%  chr10  -        78011082   78011476     395
YourSeq    219    395   840   3000     92.0%  chr10  -       128080272  128364080  283809
YourSeq    187    572   832   3000     90.1%  chr10  -       117692011  117692425     415
YourSeq    185    555   832   3000     88.6%  chr9   -       110503964  110504501     538
YourSeq    178    581   832   3000     89.8%  chr6   +       134997104  134997629     526
YourSeq    168    446   741   3000     90.5%  chr13  -        99954963   99998283   43321
YourSeq    167   1795  1974   3000     96.7%  chr3   -       120454019  120454205     187
YourSeq    159   1784  1963   3000     92.7%  chr7   -       100477934  100478111     178
YourSeq    159    542   830   3000     88.4%  chr9   +       106215834  106216214     381
YourSeq    159   1785  1957   3000     96.6%  chr12  +       111206208  111206388     181
YourSeq    157    363   556   3000     94.5%  chr2   +        37363660   37363864     205
YourSeq    156   1778  1957   3000     94.5%  chr4   +       131215284  131215602     319

Note: The 3000 bp section upstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-----------------------------------------------------------------------------------------
YourSeq   3000      1  3000   3000    100.0%  chr2   +         5976354    5979353    3000
YourSeq    127    828  2778   3000     85.8%  chr3   -        33726934   33988807  261874
YourSeq     89    573  2777   3000     82.9%  chr14  +        43945005   44045682  100678
YourSeq     80    474   624   3000     85.8%  chr9   -        98547130   98547280     151
YourSeq     79    478   631   3000     81.6%  chr1   -        23940478   23940623     146
YourSeq     79    795  1122   3000     90.0%  chr4   +       116535034  116535620     587
YourSeq     78    472   624   3000     86.2%  chr5   -       143283009  143283165     157
YourSeq     77   2376  2771   3000     88.0%  chr7   +        65556520   65556982     463
YourSeq     77   2670  2777   3000     86.2%  chr17  +        57759264   57759372     109
YourSeq     76    482   979   3000     89.6%  chr4   -       133945860  133946449     590
YourSeq     76    911  1169   3000     93.2%  chr11  -       115562934  115563322     389
YourSeq     74    867  1052   3000     80.0%  chr8   +       105630558  105630729     172
YourSeq     67    475   593   3000     78.2%  chr13  +        19502922   19503040     119
YourSeq     66    863  1185   3000     92.4%  chr2   +        39021151   39021571     421
YourSeq     65    558   654   3000     81.0%  chr2   +         3676010    3676100      91
YourSeq     64   2691  2786   3000     84.1%  chr12  -        91868085   91868179      95
YourSeq     64    558   647   3000     85.6%  chr10  -         7062343    7062432      90
YourSeq     62    821   947   3000     79.5%  chr2   +       135422052  135422181     130
YourSeq     61    832   926   3000     80.8%  chr18  -        31684041   31684131      91
YourSeq     58    751   930   3000     78.9%  chr6   -        29752979   29753153     175

Note: The 3000 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.
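One way to read the BLAT tables is to set aside the self hit (the full-length, 100% identity match on chr2) and compare the best remaining score with the query size. The sketch below parses a few rows copied from the downstream table and reports that ratio. The report does not state which criterion it uses to call a hit "significant", so no threshold is applied here. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated BLAT row; field 0 is the query name."""
    f = line.split()
    return BlatHit(
        int(f[1]), int(f[2]), int(f[3]), int(f[4]),
        float(f[5].rstrip("%")), f[6], f[7],
        int(f[8]), int(f[9]), int(f[10]),
    )


def secondary_hits(hits):
    """Drop the self hit: full-length score at 100% identity."""
    return [
        h for h in hits
        if not (h.score == h.q_size and h.identity == 100.0)
    ]


# First three rows of the downstream BLAT table in this report.
ROWS = """YourSeq 3000 1 3000 3000 100.0% chr2 + 5976354 5979353 3000
YourSeq 127 828 2778 3000 85.8% chr3 - 33726934 33988807 261874
YourSeq 89 573 2777 3000 82.9% chr14 + 43945005 44045682 100678"""

HITS = [parse_hit(line) for line in ROWS.splitlines()]
BEST_SECONDARY_RATIO = max(h.score / h.q_size for h in secondary_hits(HITS))

if __name__ == "__main__":
    print(f"Best secondary score / query size: {BEST_SECONDARY_RATIO:.3f}")
```

For these rows the best secondary hit scores 127 against a 3000 bp query, a small fraction of the self hit, which agrees with the note above.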

Gene and protein information:

Upf2 UPF2 regulator of nonsense transcripts homolog (yeast) [ Mus musculus (house mouse) ]

Gene ID: 326622, updated on 12-Aug-2019

Gene summary

Official Symbol: Upf2 (provided by MGI)
Official Full Name: UPF2 regulator of nonsense transcripts homolog (yeast) (provided by MGI)
Primary source: MGI:MGI:2449307
See related: Ensembl:ENSMUSG00000043241
Gene type: protein coding
RefSeq status: PROVISIONAL
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Expression: Broad expression in CNS E11.5 (RPKM 6.1), bladder adult (RPKM 4.3) and 23 other tissues

Genomic context

Location: 2; 2 A1

Exon count: 24

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  2    NC_000068.7 (5951447..6056703)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    2    NC_000068.6 (5872515..5977749)

Chromosome 2 - NC_000068.7


Transcript information: This gene has 3 transcripts

Gene: Upf2 ENSMUSG00000043241

Description: UPF2 regulator of nonsense transcripts homolog (yeast) [Source:MGI Symbol;Acc:MGI:2449307]
Location: Chromosome 2: 5,951,469-6,056,703 forward strand. GRCm38:CM000995.2
About this gene: This gene has 3 transcripts (splice variants), 209 orthologues, is a member of 1 Ensembl protein family and is associated with 1 phenotype.

Transcripts

Name      Transcript ID          bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Upf2-201  ENSMUST00000060092.12  5174  1269aa      ENSMUSP00000058375.6  Protein coding  CCDS38041  A2AT37   TSL:1 GENCODE basic APPRIS P1
Upf2-202  ENSMUST00000128200.1   2826  523aa       ENSMUSP00000119348.1  Protein coding  -          F6Q3G7   CDS 5' incomplete TSL:1
Upf2-203  ENSMUST00000144901.1   544   No protein  -                     lncRNA          -          -        TSL:2

[Figure: Ensembl genome browser view of a 125.23 kb region (5.96 Mb to 6.06 Mb), forward and reverse strands. Tracks: Upf2-201 and Upf2-202 (protein coding) and Upf2-203 (processed transcript); contigs AL928924.12, CR388026.3 and AL928735.9; genes Dhtkd1-202 and Dhtkd1-201 (protein coding, reverse direction); Regulatory Build. Regulation legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding), non-protein coding (processed transcript).]


Transcript: ENSMUST00000060092

[Figure: protein domain view of Upf2-201 (protein coding; 105.23 kb, forward strand; ENSMUSP00000058...; scale bar 0 to 1269). Tracks: MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils); Superfamily: Armadillo-type fold; SMART: MIF4G-like, type 3; Pfam: MIF4G-like, type 3 and Up-frameshift suppressor 2, C-terminal; PANTHER: PTHR12839:SF7, Nonsense-mediated mRNA decay protein Nmd2/UPF2; Gene3D: MIF4G-like domain superfamily; All sequence SNPs/i...: sequence variants (dbSNP and all other sources). Variant legend: frameshift variant, missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC, VectorBuilder.
