https://www.alphaknockout.com

Mouse Taf1a Knockout Project (CRISPR/Cas9)

Objective: To create a Taf1a knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Taf1a gene (NCBI Reference Sequence: NM_021466; Ensembl: ENSMUSG00000072258) is located on mouse chromosome 1. Thirteen exons are identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 13 (Transcript: ENSMUST00000097043). Exons 2~5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: The coding region starts in exon 2. Exons 2~5 cover 44.67% of the coding region. The size of the effective KO region is ~9881 bp. The KO region does not contain any other known gene.
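The reported KO region size can be cross-checked against the BLAT coordinates given later in this report. The sketch below assumes that the 1830 bp upstream section and the 1818 bp downstream section directly abut the KO region; the report does not state this, so treat the result as a consistency check rather than the method used to derive the figure. The function names are illustrative.

```python
# Coordinates (chr1, + strand) copied from the top BLAT hits in this report.
UP_FLANK = (183389092, 183390921)    # 1830 bp section upstream of exon 2
DOWN_FLANK = (183400803, 183402620)  # 1818 bp section downstream of exon 5


def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def gap_between(left_end: int, right_start: int) -> int:
    """Number of bases strictly between two non-overlapping intervals."""
    return right_start - left_end - 1


# Assumption: the KO region is exactly the gap between the two flanks.
ko_size = gap_between(UP_FLANK[1], DOWN_FLANK[0])
print(ko_size)  # 9881
```

Under that assumption the gap matches the ~9881 bp stated in the note, and the flank lengths match the 1830 bp and 1818 bp sections analyzed in the following pages.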


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1, 2, 3, 4, 5 and 13 of mouse Taf1a, two gRNA regions, and the knockout region. Legend: exon of mouse Taf1a; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 1830 bp section upstream of exon 2 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
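A self-comparison dot plot of this kind can be sketched as follows. This is a minimal illustration that marks exact matches between 15 bp windows; the tool that produced the plot in this report may use a different scoring scheme, and the function names here are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def self_dot_plot(seq: str, window: int = 15):
    """Find exact window matches of a sequence against itself.

    Returns (forward, reverse):
      forward: (i, j) pairs with i < j where seq[i:i+window] == seq[j:j+window];
               the main diagonal (i == j) is skipped.
      reverse: (i, j) pairs where seq[i:i+window] equals the window starting
               at position j of the reverse-complemented sequence.
    Positions are 0-based.
    """
    seq = seq.upper()
    index = {}
    for i in range(len(seq) - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)

    forward = [(i, j) for pos in index.values() for i in pos for j in pos if i < j]

    rc = reverse_complement(seq)
    reverse = [
        (i, j)
        for j in range(len(rc) - window + 1)
        for i in index.get(rc[j:j + window], [])
    ]
    return forward, reverse


# Toy input: a 15 bp unit repeated in tandem, followed by unrelated bases.
unit = "ACGTTGCAAGGCTTA"
forward, reverse = self_dot_plot(unit * 2 + "GGGATCCAT")
```

A tandem repeat shows up as off-diagonal forward matches, here the pair (0, 15). A region with an empty or near-empty forward list outside the main diagonal corresponds to the "no significant tandem repeat" outcome described in the note.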

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 1818 bp section downstream of exon 5 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(1830bp) | A(24.04% 440) | C(21.26% 389) | T(30.71% 562) | G(23.99% 439)

Note: The 1830 bp section upstream of exon 2 is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
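The windowed GC analysis can be sketched as below. The 300 bp window comes from this report; the 0.70 threshold for calling a window "high GC" is a hypothetical value chosen for illustration, since the report does not state its cutoff.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list:
    """GC fraction of each full window along the sequence."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def high_gc_windows(seq: str, window: int = 300, threshold: float = 0.70) -> list:
    """Start positions (0-based) of windows at or above the threshold.

    The threshold is an assumption, not a value taken from the report.
    """
    return [i for i, gc in enumerate(sliding_gc(seq, window)) if gc >= threshold]


# Toy input with a short window so the result is easy to follow.
print(sliding_gc("GGCCAATT", window=4))       # [1.0, 0.75, 0.5, 0.25, 0.0]
print(high_gc_windows("GGCCAATT", window=4))  # [0, 1]
```

An empty list from `high_gc_windows` on the real 1830 bp sequence would correspond to the conclusion in the note, under whatever cutoff is chosen.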

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(1818bp) | A(24.31% 442) | C(22.88% 416) | T(28.71% 522) | G(24.09% 438)
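The percentages in this summary follow directly from the base counts, and the overall GC content can be derived from them. A short check, using only the counts reported above (the function name is illustrative):

```python
# Base counts copied from the summary line for the 1818 bp downstream section.
DOWNSTREAM_COUNTS = {"A": 442, "C": 416, "T": 522, "G": 438}


def composition(counts: dict):
    """Return (total length, per-base percentages, overall GC percentage)."""
    total = sum(counts.values())
    percents = {base: round(100 * n / total, 2) for base, n in counts.items()}
    gc_percent = round(100 * (counts["G"] + counts["C"]) / total, 2)
    return total, percents, gc_percent


total, percents, gc_percent = composition(DOWNSTREAM_COUNTS)
print(total)       # 1818
print(percents)    # {'A': 24.31, 'C': 22.88, 'T': 28.71, 'G': 24.09}
print(gc_percent)  # 46.97
```

The counts sum to the stated full length, and the derived overall GC content of about 47% is in line with the note that no high-GC region is present.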

Note: The 1818 bp section downstream of exon 5 is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-----------------------------------------------------------------------------------------
YourSeq   1830      1  1830   1830    100.0%  chr1   +       183389092  183390921    1830
YourSeq    225   1122  1602   1830     91.2%  chr2   +       164918620  164918971     352
YourSeq    224   1125  1602   1830     90.1%  chr5   +       127634254  127634572     319
YourSeq    203   1122  1478   1830     91.6%  chr2   -        70859980   70860233     254
YourSeq    203   1122  1570   1830     99.1%  chr3   +        52475514   52476166     653
YourSeq    201   1122  1603   1830     89.0%  chr7   -       123045985  123046246     262
YourSeq    201   1125  1432   1830     91.8%  chr16  -        37875692   37875900     209
YourSeq    201   1122  1654   1830     90.0%  chr11  -       107516284  107516491     208
YourSeq    200   1119  1572   1830     90.5%  chr8   +        69678486   69678798     313
YourSeq    200   1122  1649   1830     98.1%  chr3   +       152545811  152546426     616
YourSeq    200   1117  1615   1830     95.5%  chr13  +        14602621   14603234     614
YourSeq    198   1125  1335   1830     99.5%  chr1   -        63184135   63184573     439
YourSeq    196   1109  1313   1830     99.1%  chr11  +        63983956   63984166     211
YourSeq    194   1122  1330   1830     96.1%  chr8   -        98904473   98904678     206
YourSeq    194   1124  1331   1830     95.6%  chr3   -        78511591   78511795     205
YourSeq    194   1122  1323   1830     99.5%  chr18  +        36835967   36836186     220
YourSeq    193   1113  1312   1830     98.5%  chr6   -       122287368  122287575     208
YourSeq    193   1124  1320   1830     98.0%  chr3   -        15978450   15978645     196
YourSeq    193   1122  1339   1830     94.5%  chr2   +        55305514   55305716     203
YourSeq    192   1122  1315   1830     99.5%  chr8   -        57734338   57734531     194

Note: The 1830 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START   END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-----------------------------------------------------------------------------------------
YourSeq   1818      1  1818   1818    100.0%  chr1   +       183400803  183402620    1818
YourSeq    185    931  1433   1818     94.4%  chr11  -       115372010  115625172  253163
YourSeq    141    913  1410   1818     88.6%  chr1   +       134412387  134412911     525
YourSeq    113   1011  1411   1818     77.0%  chr5   -       137687399  137687618     220
YourSeq     96    893  1086   1818     90.9%  chr16  +        91958588   91958788     201
YourSeq     92    915  1411   1818     88.0%  chr13  -       100629946  100891113  261168
YourSeq     92    916  1086   1818     90.5%  chr1   -       179150689  179150863     175
YourSeq     90    909  1079   1818     77.6%  chr15  +        80786192   80786365     174
YourSeq     89    558  1079   1818     76.8%  chr9   -        40813295   40813695     401
YourSeq     87    565  1079   1818     77.2%  chr15  -        27218720   27219120     401
YourSeq     86    913  1086   1818     86.4%  chr2   -       122210926  122557423  346498
YourSeq     83    893  1071   1818     87.7%  chr11  +       101072850  101073041     192
YourSeq     81    951  1085   1818     86.0%  chr2   +        35418931   35419067     137
YourSeq     80    610  1004   1818     77.0%  chr1   -        89355056   89355440     385
YourSeq     80    906  1077   1818     83.9%  chr2   +       104001618  104001786     169
YourSeq     77    953  1079   1818     84.3%  chr1   -       175783850  175783977     128
YourSeq     76    938  1085   1818     84.1%  chr18  +        53343854   53344004     151
YourSeq     75    951  1087   1818     82.1%  chr12  -       103310221  103310360     140
YourSeq     75    951  1471   1818     86.6%  chr4   +       107911596  107912176     581
YourSeq     74    938  1081   1818     85.6%  chr4   -       152074625  152074769     145

Note: The 1818 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.
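One way to make the "no significant similarity" judgment explicit is to parse the BLAT rows and compare the best secondary hit against the on-target hit. The sketch below uses the first three rows of the downstream table as sample data. The report does not state the criterion it applies, so the score ratio here is only an illustrative metric, and the `BlatHit` class and `parse_hit` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated row of the BLAT results table."""
    fields = line.split()
    # Skip any leading link text and the query name ("YourSeq").
    v = fields[fields.index("YourSeq") + 1:]
    return BlatHit(
        int(v[0]), int(v[1]), int(v[2]), int(v[3]),
        float(v[4].rstrip("%")), v[5], v[6],
        int(v[7]), int(v[8]), int(v[9]),
    )


# Sample rows copied from the downstream BLAT table in this report.
ROWS = [
    "browser details YourSeq 1818 1 1818 1818 100.0% chr1 + 183400803 183402620 1818",
    "browser details YourSeq 185 931 1433 1818 94.4% chr11 - 115372010 115625172 253163",
    "browser details YourSeq 141 913 1410 1818 88.6% chr1 + 134412387 134412911 525",
]

hits = sorted((parse_hit(r) for r in ROWS), key=lambda h: h.score, reverse=True)
best, runner_up = hits[0], hits[1]
score_ratio = runner_up.score / best.score
print(best.chrom, best.t_start, best.t_end)  # chr1 183400803 183402620
print(round(score_ratio, 2))                 # 0.1
```

Here the best secondary hit scores roughly a tenth of the full-length on-target match, which is consistent with the conclusion stated in the note.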


Gene and protein information:

Taf1a TATA-box binding protein associated factor, RNA polymerase I, A [ Mus musculus (house mouse) ]

Gene ID: 21339, updated on 12-Aug-2019

Gene summary

Official Symbol: Taf1a (provided by MGI)
Official Full Name: TATA-box binding protein associated factor, RNA polymerase I, A (provided by MGI)
Primary source: MGI:MGI:109578
See related: Ensembl:ENSMUSG00000072258
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: TAFI48; mTAFI48
Annotation: Annotation category: suggests misassembly information; Annotation category: partial on reference assembly
Expression: Ubiquitous expression in CNS E11.5 (RPKM 7.3), bladder adult (RPKM 6.9) and 27 other tissues
Orthologs: human, all

Genomic context

Location: 1; 1 H5

Exon count: 11

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  1    NC_000067.6 (183388885..183410201)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    1    NT_165754.2 (362233..383549)

[Figure: genomic context, Chromosome 1 - NC_000067.6.]


Transcript information: This gene has 3 transcripts

Gene: Taf1a ENSMUSG00000072258

Description: TATA-box binding protein associated factor, RNA polymerase I, A [Source:MGI Symbol;Acc:MGI:109578]
Gene Synonyms: mTAFI48
Location: 183,388,981-183,410,198 forward strand. GRCm38:CM000994.2
About this gene: This gene has 3 transcripts (splice variants), 192 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID          bp    Protein  Translation ID        Biotype         CCDS  UniProt     Flags
Taf1a-201  ENSMUST00000097043.10  1472  453aa    ENSMUSP00000094808.5  Protein coding  -     J3KMG9      TSL:5 GENCODE basic APPRIS P1
Taf1a-202  ENSMUST00000192076.1   1254  360aa    ENSMUSP00000142213.1  Protein coding  -     A0A0A6YXZ8  CDS 3' incomplete TSL:5
Taf1a-203  ENSMUST00000195798.1   1119  91aa     ENSMUSP00000141334.1  Protein coding  -     A0A0A6YW00  CDS 5' incomplete TSL:2

[Figure: genome browser view, 41.22 kb, 183.38Mb to 183.42Mb, forward and reverse strands. Gene tracks (comprehensive set): Taf1a-201, Taf1a-202, Taf1a-203 (protein coding); Gm37986-201 (lncRNA); Hhipl2-201, Hhipl2-202 (protein coding); Hhipl2-204, Hhipl2-203 (lncRNA). Contigs: CAAA01083517.1, CAAA01117637.1. Regulatory Build legend: CTCF, open chromatin, promoter, promoter flank. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (RNA gene).]


Transcript: ENSMUST00000097043

[Figure: Taf1a-201 (protein coding), 20.37 kb, forward strand. Protein feature tracks for ENSMUSP00000094...: low complexity (Seg); Pfam: TATA box-binding protein-associated factor RNA polymerase I subunit A-like; PIRSF: RNA polymerase I, subunit A (TATA-binding protein-associated factor); PANTHER: PTHR32122. Variant track: sequence variants (dbSNP and all other sources); variant legend: missense variant, splice region variant, synonymous variant. Scale bar: 0 to 453.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
