https://www.alphaknockout.com

Mouse Taf1a Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Taf1a conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Taf1a gene (NCBI Reference Sequence: NM_021466; Ensembl: ENSMUSG00000072258) is located on mouse chromosome 1. Thirteen exons are identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 13 (Transcript: ENSMUST00000097043). Exon 2 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Taf1a gene. To engineer the targeting vector, the homologous arms and the cKO region will be generated by PCR using BAC clone RP24-281F5 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts from about 100% of the coding region. The knockout of exon 2 will result in a frameshift of the gene. The size of intron 1 for 5'-loxP site insertion is 1830 bp, and the size of intron 2 for 3'-loxP site insertion is 4865 bp. The size of the effective cKO region is ~624 bp. The cKO region does not contain any other known gene.
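The ~624 bp figure can be cross-checked against the chr1 coordinates of the two 3000 bp flanking queries listed in the BLAT sections of this report. The Python sketch below assumes that those two queries directly abut the cKO region, which the report does not state explicitly; the function name `region_between` is illustrative only.

```python
from typing import Tuple

# Coordinates copied from the top (self) hits of the two BLAT searches in this report (chr1).
UPSTREAM_QUERY_END = 183_390_671      # last base of the 3000 bp upstream query
DOWNSTREAM_QUERY_START = 183_391_296  # first base of the 3000 bp downstream query


def region_between(upstream_end: int, downstream_start: int) -> Tuple[int, int, int]:
    """Return (start, end, length) of the 1-based inclusive interval between two flanks."""
    start = upstream_end + 1
    end = downstream_start - 1
    return start, end, end - start + 1


start, end, length = region_between(UPSTREAM_QUERY_END, DOWNSTREAM_QUERY_START)
print(f"chr1:{start}-{end} ({length} bp)")  # chr1:183390672-183391295 (624 bp)
```

Under that assumption the interval between the flanks is 624 bp, which agrees with the stated size of the effective cKO region.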


Overview of the Targeting Strategy

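[Figure: schematic of the targeting strategy, showing four alleles from top to bottom:
- Wildtype allele, with the 5' and 3' gRNA regions marked (exons 1, 2 and 13 labelled)
- Targeting vector
- Targeted allele
- Constitutive KO allele (after Cre recombination)
Legend: homology arm; exon of mouse Taf1a; cKO region; loxP site.]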
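The step from targeted allele to constitutive KO allele can be illustrated in silico. The following Python sketch assumes two loxP sites in the same orientation, so that Cre removes the intervening segment and leaves a single loxP site behind. The toy flanks and the lowercase "exon" are hypothetical placeholders, not Taf1a sequence, and `cre_excise` is an illustrative name.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # canonical 34 bp loxP site


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Delete the segment between the first two directly repeated sites, keeping one site."""
    first = allele.find(site)
    second = allele.find(site, first + len(site)) if first != -1 else -1
    if first == -1 or second == -1:
        return allele  # fewer than two sites: nothing to recombine
    # Keep everything up to and including the first site, then resume after the second site.
    return allele[: first + len(site)] + allele[second + len(site):]


# Toy targeted allele: flank + loxP + "exon" + loxP + flank (hypothetical sequence).
targeted = "GGGAAA" + LOXP + "atgccctga" + LOXP + "TTTCCC"
ko = cre_excise(targeted)
print(len(targeted) - len(ko))  # bases removed: the toy exon plus one loxP site
```

The sketch only models the sequence outcome of excision; it ignores site orientation checks and any biology beyond the deletion itself.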


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence aligned against itself. Axis label: Sequence 12. Legend: Forward; Reverse Complement.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
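For readers who want to reproduce this kind of self-alignment, here is a minimal Python sketch of an exact-match dot plot with a 10 bp window. It is not the tool used to produce the figure; the function names are illustrative, and real dot-plot programs usually tolerate mismatches, which this sketch does not.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def self_dot_plot(seq, window=10):
    """Return (forward, reverse) lists of (i, j), i < j, where the windows match exactly.

    forward: window at i equals window at j.
    reverse: window at i equals the reverse complement of the window at j.
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1
    positions = {}
    for i in range(n_windows):
        positions.setdefault(seq[i:i + window], []).append(i)
    forward, reverse = [], []
    for i in range(n_windows):
        word = seq[i:i + window]
        forward += [(i, j) for j in positions[word] if j > i]
        reverse += [(i, j) for j in positions.get(reverse_complement(word), []) if j > i]
    return forward, reverse


# A 12-mer repeated in tandem produces an off-diagonal run of forward dots.
forward, reverse = self_dot_plot("ACGTTGCAAGTC" * 2)
print(forward)
```

A tandem repeat shows up as a run of consecutive `(i, i + period)` pairs in `forward`; an empty result at this window size corresponds to a dot plot with no off-diagonal signal.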

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence. Axis label: Sequence 12.]

Summary: Full Length (7124 bp) | A (25.42%, 1811) | C (21.07%, 1501) | T (29.18%, 2079) | G (24.33%, 1733)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
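The base counts in the summary line can be checked arithmetically, and the sliding-window calculation is simple to reproduce. The Python sketch below uses the counts from the report; the helper names and the short test sequence are illustrative assumptions, and the plotting tool behind the figure is not specified in this report.

```python
COUNTS = {"A": 1811, "C": 1501, "T": 2079, "G": 1733}  # base counts from the Summary line


def base_percentages(counts):
    """Return (total length, {base: percentage rounded to 2 decimals})."""
    total = sum(counts.values())
    return total, {base: round(100 * n / total, 2) for base, n in counts.items()}


def gc_windows(seq, window=300, step=1):
    """GC fraction of each window of the given size, sliding by `step` bases."""
    seq = seq.upper()
    return [
        (seq[i:i + window].count("G") + seq[i:i + window].count("C")) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


total, pct = base_percentages(COUNTS)
print(total, pct)
print(gc_windows("GGCCAATT", window=4, step=2))  # toy sequence, not Taf1a
```

The four counts sum to the reported full length of 7124 bp, and the recomputed percentages match the summary line.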


BLAT Search Results (up)

QUERY    SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-------------------------------------------------------------------------------------------
YourSeq   3000      1   3000   3000    100.0%  chr1   +       183387672  183390671    3000
YourSeq    212   2545   3000   3000     91.4%  chr5   +       127634254  127634550     297
YourSeq    211   2542   3000   3000     92.2%  chr2   +       164918620  164918949     330
YourSeq    207   2539   2990   3000     90.4%  chr8   -        98904456   98904683     228
YourSeq    203   2542   2990   3000     99.1%  chr3   +        52475514   52476166     653
YourSeq    200   2539   2992   3000     90.5%  chr8   +        69678486   69678798     313
YourSeq    198   2545   2755   3000     99.5%  chr1   -        63184135   63184573     439
YourSeq    197   2541   2740   3000     99.5%  chr3   -        15978450   15978664     215
YourSeq    196   2529   2733   3000     99.1%  chr11  +        63983956   63984166     211
YourSeq    194   2539   2735   3000     99.5%  chr8   -        57734338   57734539     202
YourSeq    194   2544   2751   3000     95.6%  chr3   -        78511591   78511795     205
YourSeq    194   2542   2743   3000     99.5%  chr18  +        36835967   36836186     220
YourSeq    193   2539   2732   3000    100.0%  chr16  -        64813876   64814072     197
YourSeq    193   2539   2732   3000    100.0%  chr1   -       143529443  143529639     197
YourSeq    193   2542   2759   3000     94.5%  chr2   +        55305514   55305716     203
YourSeq    192   2542   2983   3000     88.9%  chr7   -       123046026  123046246     221
YourSeq    192   2539   2732   3000     99.5%  chr5   -       128532352  128532545     194
YourSeq    192   2538   2732   3000     99.5%  chr14  -        91768973   91769169     197
YourSeq    192   2545   2751   3000     96.5%  chr8   +        13153184   13153386     203
YourSeq    191   2538   2731   3000     99.5%  chr8   -        90807193   90807387     195

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-------------------------------------------------------------------------------------------
YourSeq   3000      1   3000   3000    100.0%  chr1   +       183391296  183394295    3000
YourSeq    305    794   3000   3000     91.6%  chr11  +         6345928    6455453  109526
YourSeq    216      1   2878   3000     91.6%  chr1   +       183391188  183425552   34365
YourSeq    203    794   1786   3000     92.6%  chr11  -       104340721  104529255  188535
YourSeq    175   2616   3000   3000     90.7%  chr2   -       104713077  104713605     529
YourSeq    166   1668   2031   3000     92.4%  chr1   +        84828571   84829156     586
YourSeq    163   2616   3000   3000     93.7%  chr11  -        68813730   68878686   64957
YourSeq    136   1671   2029   3000     91.1%  chr1   +        87112368   87112921     554
YourSeq    132   1656   1808   3000     93.5%  chr11  -       101420939  101421093     155
YourSeq    123   1657   1800   3000     93.1%  chr11  +         5052193    5052337     145
YourSeq    121   1663   1800   3000     94.3%  chr9   -        96670712   96670851     140
YourSeq    120    376    902   3000     84.9%  chr1   -        38674249   38674748     500
YourSeq    120   1657   1800   3000     94.2%  chr7   +       131314127  131314276     150
YourSeq    114   1668   1808   3000     93.9%  chr10  +        76367554   76367695     142
YourSeq    113   1678   1802   3000     95.2%  chr2   +       152995609  152995733     125
YourSeq    111   2864   3000   3000     94.5%  chr13  +        53192853   53192992     140
YourSeq    110   1662   1788   3000     93.8%  chr1   +       150665633  150665989     357
YourSeq    109   1665   1788   3000     94.4%  chr3   +        85826557   85826686     130
YourSeq    106   2619   2878   3000     89.1%  chr10  -        78301351   78301649     299
YourSeq    106   1672   1809   3000     86.3%  chr5   +        77269822   77269946     125

Note: The 3000 bp section downstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
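To make the "no significant similarity" reading easier to audit, BLAT rows like those above can be parsed and each hit's score compared with the query size. The Python sketch below uses two rows copied from the upstream search. The `BlatHit` type, the field names and the score-to-query-size ratio are illustrative choices, not criteria stated in this report.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(row: str) -> BlatHit:
    """Parse a row: QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = row.split()[-11:]  # keep the last 11 fields; ignores any leading link text
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]), float(f[5].rstrip("%")),
                   f[6], f[7], int(f[8]), int(f[9]), int(f[10]))


def score_fraction(hit: BlatHit) -> float:
    """Score relative to query size; 1.0 for a perfect full-length match."""
    return hit.score / hit.q_size


rows = [
    "YourSeq 3000 1 3000 3000 100.0% chr1 + 183387672 183390671 3000",
    "YourSeq 212 2545 3000 3000 91.4% chr5 + 127634254 127634550 297",
]
hits = [parse_hit(r) for r in rows]
for h in hits:
    print(h.chrom, h.t_start, h.t_end, round(score_fraction(h), 4))
```

In both searches the first row is the query matching its own locus on chr1; the remaining hits are the ones to inspect for off-target similarity.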


Gene and protein information:

Taf1a TATA-box binding protein associated factor, RNA polymerase I, A [Mus musculus (house mouse)]
Gene ID: 21339, updated on 12-Aug-2019

Gene summary

Official Symbol: Taf1a (provided by MGI)
Official Full Name: TATA-box binding protein associated factor, RNA polymerase I, A (provided by MGI)
Primary source: MGI:MGI:109578
See related: Ensembl:ENSMUSG00000072258
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: TAFI48; mTAFI48
Annotation: Annotation category: suggests misassembly information; Annotation category: partial on reference assembly
Expression: Ubiquitous expression in CNS E11.5 (RPKM 7.3), bladder adult (RPKM 6.9) and 27 other tissues

Genomic context

Location: 1; 1 H5

Exon count: 11

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  1    NC_000067.6 (183388885..183410201)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    1    NT_165754.2 (362233..383549)

[Figure: genomic context of Taf1a on Chromosome 1 - NC_000067.6]


Transcript information: This gene has 3 transcripts

Gene: Taf1a ENSMUSG00000072258

Description: TATA-box binding protein associated factor, RNA polymerase I, A [Source:MGI Symbol;Acc:MGI:109578]
Gene Synonyms: mTAFI48
Location: 183,388,981-183,410,198 forward strand. GRCm38:CM000994.2
About this gene: This gene has 3 transcripts (splice variants), 192 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID          bp    Protein  Translation ID        Biotype         CCDS  UniProt     Flags
Taf1a-201  ENSMUST00000097043.10  1472  453aa    ENSMUSP00000094808.5  Protein coding  -     J3KMG9      TSL:5, GENCODE basic, APPRIS P1
Taf1a-202  ENSMUST00000192076.1   1254  360aa    ENSMUSP00000142213.1  Protein coding  -     A0A0A6YXZ8  CDS 3' incomplete, TSL:5
Taf1a-203  ENSMUST00000195798.1   1119  91aa     ENSMUSP00000141334.1  Protein coding  -     A0A0A6YW00  CDS 5' incomplete, TSL:2

[Figure: Ensembl region view, 41.22 kb, 183.38 Mb to 183.42 Mb, forward and reverse strands.
Tracks shown (comprehensive gene set):
- Taf1a-201, Taf1a-202, Taf1a-203 (protein coding)
- Gm37986-201 (lncRNA)
- Hhipl2-201, Hhipl2-202 (protein coding)
- Hhipl2-203, Hhipl2-204 (lncRNA)
- Contigs: CAAA01083517.1, CAAA01117637.1
- Regulatory Build
Regulation legend: CTCF; Open Chromatin; Promoter; Promoter Flank.
Gene legend: Protein coding (merged Ensembl/Havana; Ensembl protein coding); Non-protein coding (RNA gene).]


Transcript: ENSMUST00000097043

[Figure: Ensembl protein view of Taf1a-201 (protein coding), 20.37 kb, forward strand; protein ENSMUSP00000094... on a scale of 0 to 453 amino acids.
Tracks shown:
- Low complexity (Seg)
- Pfam: TATA box-binding protein-associated factor RNA polymerase I subunit A-like
- PIRSF: RNA polymerase I, subunit A (TATA-binding protein-associated factor)
- PANTHER: PTHR32122
- Sequence variants (dbSNP and all other sources)
Variant legend: missense variant; splice region variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
