https://www.alphaknockout.com

Mouse Polr2k Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Polr2k conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Polr2k gene (NCBI Reference Sequence: NM_001039368; Ensembl: ENSMUSG00000045996) is located on mouse chromosome 15. Four exons have been identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 4 (Transcript: ENSMUST00000057177). Exons 2~4 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Polr2k gene. To engineer the targeting vector, the homology arms and the cKO region will be generated by PCR using BAC clone RP24-88B18 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note:

Exons 2~4 cover 61.62% of the coding region.
The start codon is in exon 1, and the stop codon is in exon 4.
The size of intron 1 for 5'-loxP site insertion: 773 bp.
The size of the effective cKO region: ~2774 bp.
The cKO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: schematic, drawn 5' to 3', of the wildtype allele (exons labeled 1 to 4, followed by a neighboring exon labeled 1, with two gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination).]

Legend: homology arm; exon of mouse Polr2k; cKO region; exon of mouse Spag1; loxP site.
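
The relationship between the targeted allele and the constitutive KO allele can be sketched in a few lines of Python. This is a minimal illustration, not the vendor's design tool: the allele is modeled as an ordered list of labeled segments, the segment names are hypothetical, and the 3' loxP site is assumed to sit downstream of exon 4 (the source only states that the 5' loxP site is inserted in intron 1). Both loxP sites are assumed to be in the same orientation, so Cre excises the intervening DNA and leaves one loxP site behind.

```python
from typing import List

LOXP = "loxP"

def cre_excise(allele: List[str]) -> List[str]:
    """Delete everything between the first two loxP sites, keeping one site.

    Assumes both sites are in the same orientation (excision, not inversion).
    """
    sites = [i for i, segment in enumerate(allele) if segment == LOXP]
    if len(sites) < 2:
        return list(allele)  # fewer than two sites: nothing to recombine
    first, second = sites[0], sites[1]
    # Keep everything up to and including the first loxP, then resume
    # after the second loxP.
    return allele[: first + 1] + allele[second + 1 :]

# Hypothetical segment labels for the targeted (floxed) allele.
targeted_allele = [
    "exon1", "intron1_5p", LOXP, "intron1_3p",
    "exon2", "exon3", "exon4", LOXP, "downstream",
]

ko_allele = cre_excise(targeted_allele)
print(ko_allele)
```

In this model, exons 2 to 4 disappear from the recombined allele while exon 1 and a single loxP site remain, which matches the constitutive KO allele in the schematic.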


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence (Sequence 12) aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
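
A self dot plot of this kind can be approximated with a short script. The sketch below is an assumption about how such a plot is typically built, not the tool used for this report: it marks a dot wherever two 10 bp words in the same sequence are identical, considers forward matches only (reverse-complement matches are omitted for brevity), and uses a made-up demo sequence. Tandem repeats show up as runs of dots that share a constant offset.

```python
from collections import defaultdict
from typing import List, Tuple

def self_dot_plot(seq: str, window: int = 10) -> List[Tuple[int, int]]:
    """Return (i, j) with i < j where the words seq[i:i+window] and
    seq[j:j+window] are identical (forward strand only)."""
    seq = seq.upper()
    positions = defaultdict(list)
    # Index every window-length word by its start position.
    for i in range(len(seq) - window + 1):
        positions[seq[i : i + window]].append(i)
    dots = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)

# Made-up demo sequence: a 10 bp unit repeated three times between flanks.
demo = "ACGTTGCA" + "GATTACAGGC" * 3 + "TTAGC"
dots = self_dot_plot(demo, window=10)

# Off-diagonal offsets shared by the dots; a tandem repeat of period p
# produces dots at offset p (and multiples of p).
offsets = sorted({j - i for i, j in dots})
print(offsets)
```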

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence (Sequence 12).]

Summary: Full Length(8287bp) | A(29.13% 2414) | C(20.65% 1711) | T(27.71% 2296) | G(22.52% 1866)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. Significant high GC-content regions are found. It may be difficult to construct this targeting vector.
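
The base counts in the summary can be cross-checked, and a windowed GC profile computed, with a short script. This is a minimal sketch: the function names are hypothetical, the overall GC percentage is derived here from the reported counts rather than quoted from the report, and the sliding-window definition (fraction of G and C in each 300 bp window) is an assumption about how the distribution was calculated.

```python
from typing import Dict, List

def gc_percent(counts: Dict[str, int]) -> float:
    """Overall GC content in percent from per-base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total

def windowed_gc(seq: str, window: int = 300, step: int = 1) -> List[float]:
    """GC percentage of each window of the given size along seq."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start : start + window]
        profile.append(100.0 * (chunk.count("G") + chunk.count("C")) / window)
    return profile

# Base counts as reported in the summary line.
summary = {"A": 2414, "C": 1711, "T": 2296, "G": 1866}

total_bases = sum(summary.values())
overall_gc = round(gc_percent(summary), 2)
print(total_bases, overall_gc)
```

The counts add up to the stated full length of 8287 bp, and the overall GC content works out to about 43%, so the difficulty flagged in the note comes from local high-GC windows rather than from the average composition.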


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr15  +       36171738   36174737   3000
YourSeq  785    1      845   3000   97.8%     chr7   +       114125418  114126230  813
YourSeq  782    1      795   3000   99.3%     chrX   +       51918071   51918866   796
YourSeq  780    1      788   3000   99.7%     chr8   -       9351734    9352545    812
YourSeq  779    1      787   3000   99.5%     chr7   -       95065174   95065960   787
YourSeq  778    1      788   3000   99.4%     chr13  -       5092199    5092986    788
YourSeq  776    1      781   3000   99.9%     chr12  -       42623380   42624160   781
YourSeq  776    1      790   3000   99.3%     chr1   -       30610741   30611535   795
YourSeq  776    1      782   3000   99.7%     chr6   +       95663981   95664762   782
YourSeq  776    1      788   3000   99.4%     chr3   +       67792778   67793570   793
YourSeq  776    1      782   3000   99.7%     chr1   +       150278667  150279448  782
YourSeq  775    1      788   3000   99.3%     chrX   -       124110475  124111280  806
YourSeq  775    1      788   3000   99.3%     chr18  +       59121158   59121947   790
YourSeq  774    1      787   3000   98.9%     chr8   -       102887446  102888229  784
YourSeq  774    1      788   3000   99.4%     chr4   -       62925172   62925980   809
YourSeq  774    1      788   3000   99.3%     chr17  -       57368391   57369180   790
YourSeq  774    1      844   3000   99.3%     chr13  -       26359881   26360800   920
YourSeq  774    1      788   3000   99.2%     chr10  -       121938493  121939280  788
YourSeq  774    1      783   3000   99.5%     chr18  +       13731843   13732627   785
YourSeq  774    1      788   3000   99.2%     chr13  +       6784498    6785285    788

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr15  +       36176775   36179774   3000
YourSeq  276    11     662   3000   84.5%     chr1   +       59657350   59657868   519
YourSeq  236    1      236   3000   100.0%    chrX   +       38969992   38970227   236
YourSeq  210    1      231   3000   95.7%     chr5   +       123725358  123725589  232
YourSeq  209    1      225   3000   96.5%     chr1   -       180854059  180854283  225
YourSeq  192    1      231   3000   91.8%     chr8   -       98878491   98878722   232
YourSeq  175    528    885   3000   96.4%     chr4   -       33393596   33394162   567
YourSeq  174    7      228   3000   90.8%     chr12  +       48632463   48632685   223
YourSeq  172    16     230   3000   90.3%     chr6   +       128293591  128294007  417
YourSeq  163    513    706   3000   89.8%     chr10  +       91983377   91983562   186
YourSeq  161    512    700   3000   93.6%     chr7   -       65408353   65408761   409
YourSeq  158    513    701   3000   92.5%     chr2   -       142018420  142018608  189
YourSeq  155    1      171   3000   95.4%     chr9   +       122942704  122942874  171
YourSeq  155    511    707   3000   93.8%     chr19  +       18572339   18572747   409
YourSeq  155    503    680   3000   94.4%     chr17  +       55953576   55953757   182
YourSeq  155    511    691   3000   92.1%     chr15  +       100736395  100736573  179
YourSeq  155    517    702   3000   93.9%     chr10  +       87165674   87165869   196
YourSeq  154    503    704   3000   88.1%     chr1   +       183167240  183167421  182
YourSeq  152    521    691   3000   95.8%     chr12  -       99045553   99045727   175
YourSeq  151    512    713   3000   88.2%     chr4   +       59253910   59254086   177

Note: The 3000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
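
For readers who want to re-examine the BLAT hits, the rows in the two tables above can be parsed into records with a small script. This is a sketch under stated assumptions: the field names and the `parse_blat` helper are hypothetical, the regular expression only targets the eleven-column layout shown here (with or without the leading "browser details" link text), and the filter at the end is an illustrative way to separate the full-length self hit from partial hits, not a significance criterion used by the report.

```python
import re
from typing import Dict, List

# One BLAT result row; the "browser details" prefix is optional.
ROW = re.compile(
    r"(?:browser\s+details\s+)?(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"([\d.]+)%\s+(\S+)\s+([+-])\s+(\d+)\s+(\d+)\s+(\d+)"
)
FIELDS = ["query", "score", "q_start", "q_end", "q_size", "identity",
          "chrom", "strand", "t_start", "t_end", "span"]
INT_FIELDS = ["score", "q_start", "q_end", "q_size", "t_start", "t_end", "span"]

def parse_blat(text: str) -> List[Dict[str, object]]:
    """Extract every BLAT row found in text as a dict of typed fields."""
    hits = []
    for match in ROW.finditer(text):
        row: Dict[str, object] = dict(zip(FIELDS, match.groups()))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        row["identity"] = float(row["identity"])
        hits.append(row)
    return hits

# First two rows of the downstream table, copied from the report.
sample = """
YourSeq  3000   1      3000  3000   100.0%    chr15  +       36176775   36179774   3000
YourSeq  276    11     662   3000   84.5%     chr1   +       59657350   59657868   519
"""

hits = parse_blat(sample)
# Partial hits: anything scoring below the query length.
partial = [h for h in hits if h["score"] < h["q_size"]]
print(len(hits), [h["chrom"] for h in partial])
```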


Gene and information: Polr2k polymerase (RNA) II (DNA directed) polypeptide K [ Mus musculus (house mouse) ] Gene ID: 17749, updated on 12-Aug-2019

Gene summary

Official Symbol: Polr2k (provided by MGI)
Official Full Name: polymerase (RNA) II (DNA directed) polypeptide K (provided by MGI)
Primary source: MGI:MGI:102725
See related: Ensembl:ENSMUSG00000045996
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: MafY; Mt1a; RPB12; RPABC4; RPB7.0; RPB10alpha; ABC10-alpha
Expression: Broad expression in CNS E11.5 (RPKM 21.0), liver E14 (RPKM 19.6) and 17 other tissues
Orthologs: human; all

Genomic context

Location: 15; 15 B3.1

Exon count: 4

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  15   NC_000081.6 (36174010..36177012)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    15   NC_000081.5 (36103772..36106767)

[Figure: genomic context of Polr2k on Chromosome 15 - NC_000081.6.]


Transcript information: This gene has 2 transcripts

Gene: Polr2k ENSMUSG00000045996

Description: polymerase (RNA) II (DNA directed) polypeptide K [Source:MGI Symbol;Acc:MGI:102725]
Gene Synonyms: ABC10-alpha, MafY, Mt1a, RPABC4, RPB10alpha, RPB12, RPB7.0
Location: Chromosome 15: 36,174,010-36,177,010 forward strand. GRCm38:CM001008.2
About this gene: This gene has 2 transcripts (splice variants), 192 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name        Transcript ID         bp   Protein  Translation ID        Biotype         CCDS       UniProt        Flags
Polr2k-201  ENSMUST00000057177.6  601  99aa     ENSMUSP00000051968.5  Protein coding  CCDS27426  Q8BFX0         TSL:1, GENCODE basic
Polr2k-202  ENSMUST00000180159.7  489  58aa     ENSMUSP00000136975.1  Protein coding  CCDS56980  Q545V5 Q63871  TSL:1, GENCODE basic, APPRIS P1

[Figure: genome browser view of a 23.00 kb region (36.165 Mb to 36.185 Mb). Forward strand: Gm22979-201 (snRNA); Polr2k-201 and Polr2k-202 (protein coding); Spag1-201, Spag1-202 and Spag1-206 (protein coding); Spag1-203 (retained intron). Contig: AC121577.3. Reverse strand: Fbxo43-201 and Fbxo43-203 (protein coding); Fbxo43-202 (lncRNA). Regulatory Build track legend: CTCF, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000057177

[Figure: transcript and protein view of Polr2k-201 (protein coding; 2.98 kb, forward strand), with a sequence variant track (dbSNP and all other sources; missense and synonymous variants) and a scale bar from 0 to 99.]

Protein domain annotations for ENSMUSP00000051...:

Superfamily: RNA polymerase subunit RPABC4/transcription elongation factor Spt4
SMART: RNA polymerase archaeal subunit P/eukaryotic subunit RPABC4
Pfam: RNA polymerase archaeal subunit P/eukaryotic subunit RPABC4
PANTHER: DNA-directed RNA polymerases I, II, and III subunit RPABC4
Gene3D: 2.20.28.30

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
