http://www.alphaknockout.com/ Mouse Polr2j Knockout Project (CRISPR/Cas9)

Objective: To create a Polr2j knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Polr2j gene (NCBI Reference Sequence: NM_011293; Ensembl: ENSMUSG00000039771) is located on mouse chromosome 5. Four exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 4 (Transcript: ENSMUST00000041366). Exon 3 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts from the coding region. Exon 3 covers 49.86% of the coding region. The size of the effective KO region is ~175 bp. The Lrwd1 and Gm43604 genes may be affected.
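The figures in the note can be cross-checked with simple arithmetic. The sketch below assumes the coding sequence is 117 codons long (the 117 aa protein listed for Polr2j-201) with the stop codon excluded, and that the whole ~175 bp region is coding; neither assumption is stated explicitly in the report.

```python
# Cross-check of the exon 3 figures quoted in the note.
# Assumptions (not stated in the report): CDS length = 117 aa * 3 nt,
# stop codon excluded, and the whole 175 bp region is coding sequence.
PROTEIN_LENGTH_AA = 117   # Polr2j-201 (ENSMUST00000041366)
KO_REGION_BP = 175        # effective KO region size from the report

cds_length_bp = PROTEIN_LENGTH_AA * 3
coding_fraction = round(100 * KO_REGION_BP / cds_length_bp, 2)
# A deletion whose length is not a multiple of 3 shifts the reading frame.
causes_frameshift = KO_REGION_BP % 3 != 0

print(cds_length_bp, coding_fraction, causes_frameshift)  # 351 49.86 True
```

Under these assumptions the 49.86% figure is reproduced, and the deletion length is not a multiple of three, so the downstream reading frame would be disrupted.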


Overview of the Targeting Strategy

Figure: Wildtype allele (5' to 3') showing exons of mouse Polr2j (labeled 1, 3, 4), with a gRNA region on each side of the knockout region.

Legends: Exon of mouse Polr2j; Knockout region


Overview of the Dot Plot (up)

Window size: 15 bp

Figure: Self-alignment dot plot. Legend: Forward; Reverse Complement.

Note: The 2000 bp section upstream of Exon 3 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
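A self-alignment dot plot of this kind can be sketched as follows. This is a minimal illustration that assumes exact matching of 15 bp windows; the report does not say which tool or scoring scheme produced its plots, and the function names and toy sequence here are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def self_dot_plot(seq, window=15):
    """Return (forward, reverse) lists of (i, j) dots for exact window matches.

    Assumption: a dot is drawn only when two windows are identical; real
    dot-plot tools may tolerate mismatches.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(len(seq) - window + 1):
        word = seq[i:i + window]
        # Off-diagonal forward matches (j > i) point to direct or tandem repeats.
        forward.extend((i, j) for j in positions[word] if j > i)
        # Matches to the reverse complement point to inverted repeats.
        reverse.extend((i, j) for j in positions.get(reverse_complement(word), []))
    return forward, reverse

# Toy input: a hypothetical 20 bp unit repeated twice in tandem.
fwd, rev = self_dot_plot("ACGGTCATTGCAGGCTAAGC" * 2, window=15)
print((0, 20) in fwd)  # True: the window at 0 recurs one repeat unit later
```

Tandem repeats appear as off-diagonal lines parallel to the main diagonal, which is why gRNA sites are chosen outside such stretches.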

Overview of the Dot Plot (down)

Window size: 15 bp

Figure: Self-alignment dot plot. Legend: Forward; Reverse Complement.

Note: The 478 bp section downstream of Exon 3 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

Summary: Full Length (2000 bp) | A (22.45%, 449) | C (25.25%, 505) | G (24.2%, 484) | T (28.1%, 562)

Note: The 2000 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

Summary: Full Length (478 bp) | A (17.78%, 85) | C (26.57%, 127) | G (29.5%, 141) | T (26.15%, 125)

Note: The 478 bp section downstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
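The overall GC content of both flanking sections follows directly from the base counts in the summaries. The sketch below computes it and also shows one plausible way to produce a 300 bp sliding-window profile; the function names are hypothetical and the windowing details (step size, handling of the ends) are assumptions, since the report does not describe them.

```python
def gc_percent(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return round(100 * (counts["G"] + counts["C"]) / total, 2)

def sliding_gc(seq, window=300, step=1):
    """GC percentage of each full window along seq (assumed windowing scheme)."""
    seq = seq.upper()
    return [
        100 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]

# Base counts copied from the two summaries in the report.
upstream = {"A": 449, "C": 505, "G": 484, "T": 562}
downstream = {"A": 85, "C": 127, "G": 141, "T": 125}

print(sum(upstream.values()), gc_percent(upstream))      # 2000 49.45
print(sum(downstream.values()), gc_percent(downstream))  # 478 56.07
```

Both totals match the stated section lengths, and neither overall GC percentage is extreme, which is consistent with the notes.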


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr5   +       136120090  136122089  2000
YourSeq  225    612    1772  2000   90.1%     chr11  +       69954861   70474792   519932
YourSeq  154    1637   1870  2000   86.8%     chr14  +       21018686   21018911   226
YourSeq  143    1653   1844  2000   86.1%     chr7   -       99121614   99121788   175
YourSeq  141    1337   1847  2000   81.7%     chr1   -       152248297  152248615  319
YourSeq  137    1640   1802  2000   92.5%     chr13  -       13430793   13430959   167
YourSeq  137    1626   1806  2000   90.2%     chr10  +       24851823   24852006   184
YourSeq  135    1640   1806  2000   90.1%     chr10  -       69297561   69297725   165
YourSeq  134    1485   1806  2000   83.9%     chr19  -       3234816    3235107    292
YourSeq  133    1635   1810  2000   89.0%     chr18  -       10083552   10083725   174
YourSeq  133    1635   1810  2000   93.0%     chr5   +       149724674  149724853  180
YourSeq  132    1637   1806  2000   90.3%     chr8   +       85449963   85450132   170
YourSeq  132    1648   1810  2000   92.4%     chr5   +       108068495  108068657  163
YourSeq  132    1635   1806  2000   90.4%     chr2   +       71198114   71198287   174
YourSeq  131    1640   1806  2000   88.6%     chr7   -       117678865  117679026  162
YourSeq  131    1640   1806  2000   92.3%     chr12  -       116320556  116320724  169
YourSeq  131    1660   1822  2000   89.3%     chr4   +       57002250   57002404   155
YourSeq  131    1626   1810  2000   92.4%     chr12  +       111723956  111724144  189
YourSeq  130    1640   1791  2000   93.4%     chr17  -       57017026   57017344   319
YourSeq  130    1660   1846  2000   87.4%     chr10  +       81619461   81619630   170

Note: The 2000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
--------------------------------------------------------------------------------------
YourSeq  478    1      478  478    100.0%    chr5   +       136122265  136122742  478
YourSeq  33     304    340  478    97.3%     chr7   -       5349655    5349910    256
YourSeq  33     363    410  478    97.3%     chr4   -       150025818  150025868  51
YourSeq  30     48     83   478    94.3%     chr12  -       45256356   45256473   118
YourSeq  26     45     83   478    71.5%     chr17  -       56029260   56029290   31
YourSeq  26     67     95   478    96.6%     chr10  -       79570600   79570666   67
YourSeq  25     253    282  478    96.3%     chr12  +       10467766   10467811   46
YourSeq  23     319    349  478    76.0%     chr2   -       146023668  146023693  26
YourSeq  23     69     94   478    96.0%     chr13  -       48677793   48677819   27
YourSeq  23     60     83   478    100.0%    chr10  -       81636556   81636590   35
YourSeq  23     238    261  478    100.0%    chr16  +       62243206   62243234   29
YourSeq  22     65     86   478    100.0%    chr5   -       142059215  142059236  22
YourSeq  22     307    328  478    100.0%    chr10  -       61497430   61497451   22
YourSeq  20     102    121  478    100.0%    chr18  -       7126628    7126647    20

Note: The 478 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
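The top hit in each BLAT table gives the chr5 coordinates of the two flanking sections, so the interval between them can be derived and compared with the ~175 bp KO region size quoted in the strategy summary. This is a small consistency check that assumes 1-based inclusive coordinates, as the SPAN column suggests; the variable names are illustrative.

```python
# chr5 coordinates of the 100% identity hits in the two BLAT tables.
UP_START, UP_END = 136120090, 136122089      # 2000 bp upstream section
DOWN_START, DOWN_END = 136122265, 136122742  # 478 bp downstream section

def span(start, end):
    """Length of a 1-based, inclusive interval (assumed convention)."""
    return end - start + 1

# The targeted interval is whatever lies strictly between the two flanks.
target_start = UP_END + 1
target_end = DOWN_START - 1

print(span(UP_START, UP_END))                             # 2000
print(span(DOWN_START, DOWN_END))                         # 478
print(target_start, target_end, span(target_start, target_end))
# 136122090 136122264 175
```

The 175 bp gap between the flanks agrees with the stated size of the effective KO region.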

Gene and protein information:

Polr2j polymerase (RNA) II (DNA directed) polypeptide J [ Mus musculus (house mouse) ]
Gene ID: 20022, updated on 12-Aug-2019

Gene summary

Official Symbol: Polr2j (provided by MGI)
Official Full Name: polymerase (RNA) II (DNA directed) polypeptide J (provided by MGI)
Primary source: MGI:MGI:109582
See related: Ensembl:ENSMUSG00000039771
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Polr2i; Rpb11a; Rpo2-4; 14.5kDa
Expression: Ubiquitous expression in liver E14.5 (RPKM 53.1), liver E14 (RPKM 50.1) and 28 other tissues
Orthologs: human; all

Genomic context

Location: 5; 5 G2
Exon count: 4

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  5    NC_000071.6 (136116691..136122947)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    5    NC_000071.5 (136592561..136598817)

Chromosome 5 - NC_000071.6


Transcript information: This gene has 3 transcripts

Gene: Polr2j ENSMUSG00000039771

Description: polymerase (RNA) II (DNA directed) polypeptide J [Source:MGI Symbol;Acc:MGI:109582]
Gene Synonyms: 14.5kDa, RNA polymerase II subunit RPB14, Rpb11a, Rpo2-4
Location: Chromosome 5: 136,116,631-136,122,947 forward strand. GRCm38:CM000998.2
About this gene: This gene has 3 transcripts (splice variants), 165 orthologues, 1 paralogue and is a member of 1 Ensembl protein family.

Transcripts

Name        Transcript ID          bp   Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Polr2j-201  ENSMUST00000041366.13  646  117aa    ENSMUSP00000038505.7  Protein coding  CCDS19751  Q6PI63   TSL:1 GENCODE basic APPRIS P1
Polr2j-202  ENSMUST00000111127.7   900  111aa    ENSMUSP00000106757.1  Protein coding  -          D3YZ20   TSL:2 GENCODE basic
Polr2j-203  ENSMUST00000111129.1   474  87aa     ENSMUSP00000106759.1  Protein coding  -          D3YZ19   TSL:3 GENCODE basic


Figure: Genomic region view, 26.32 kb, 136.11Mb to 136.13Mb.

Forward strand genes (Comprehensive set): Rasa4 transcripts (Rasa4-201 and Rasa4-202 protein coding; Rasa4-204, Rasa4-206, Rasa4-208, Rasa4-209, Rasa4-210 and Rasa4-212 retained intron) and Polr2j transcripts (Polr2j-201, Polr2j-202 and Polr2j-203, all protein coding).

Contigs: AC087420.4

Reverse strand genes (Comprehensive set): Gm43604-201 (lncRNA) and Lrwd1 transcripts (Lrwd1-201 and Lrwd1-202 protein coding; Lrwd1-203, Lrwd1-204, Lrwd1-205, Lrwd1-206, Lrwd1-207 and Lrwd1-209 retained intron).

Regulation Legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank

Gene Legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana); Non-Protein Coding (processed transcript; RNA gene)


Transcript: ENSMUST00000041366

Figure: Protein domain view of Polr2j-201 (protein coding), 6.32 kb, forward strand; protein ENSMUSP00000038...

Superfamily: RNA polymerase, RBP11-like subunit
Pfam: DNA-directed RNA polymerase, RBP11-like dimerisation domain
PROSITE patterns: DNA-directed RNA polymerase Rpb11, 13-16kDa subunit, conserved site
PANTHER: PTHR13946; PTHR13946:SF31
Gene3D: RNA polymerase, RBP11-like subunit
CDD: RNA polymerase RBP11
All sequence SNPs/i...: Sequence variants (dbSNP and all other sources)
Variant Legend: synonymous variant
Scale bar: 0 to 117

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC, VectorBuilder.
