https://www.alphaknockout.com

Mouse Taf3 Knockout Project (CRISPR/Cas9)

Objective: To create a Taf3 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Taf3 gene (NCBI Reference Sequence: NM_027748; Ensembl: ENSMUSG00000025782) is located on mouse chromosome 2. Seven exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 7 (Transcript: ENSMUST00000026888). Exon 3 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts at about 14.66% of the coding region and covers 65.52% of it. The size of the effective KO region is ~1832 bp. The KO region does not contain any other known gene.
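The percentages in the note can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and rests on two assumptions that the report does not state: that the coding region is 932 codons (2796 bp, stop codon excluded, taken from the 932 aa protein length listed for ENSMUST00000026888), and that the ~1832 bp KO region corresponds to the coding length of exon 3. The function names are hypothetical.

```python
# Assumption: CDS length = 932 codons * 3 bp (stop codon excluded).
CDS_LENGTH_BP = 932 * 3
# Assumption: the ~1832 bp effective KO region equals the coding length of exon 3.
KO_REGION_BP = 1832


def percent_of_cds(length_bp, cds_length_bp=CDS_LENGTH_BP):
    """Return length_bp as a percentage of the coding region, rounded to 2 decimals."""
    return round(100.0 * length_bp / cds_length_bp, 2)


def causes_frameshift(deleted_bp):
    """A deletion shifts the reading frame when its length is not a multiple of 3."""
    return deleted_bp % 3 != 0


coverage = percent_of_cds(KO_REGION_BP)
# Approximate number of coding bases that precede exon 3 (14.66% of the CDS).
start_offset_bp = round(0.1466 * CDS_LENGTH_BP)

print(f"Exon 3 coverage of CDS: {coverage}%")
print(f"Coding bases before exon 3: ~{start_offset_bp} bp")
print(f"Deletion shifts the reading frame: {causes_frameshift(KO_REGION_BP)}")
```

Under these assumptions the computed coverage agrees with the 65.52% stated in the note, and a deletion of 1832 bp is not a multiple of three, which is consistent with a frameshifting knockout.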


Overview of the Targeting Strategy

Figure: wildtype allele, drawn 5' to 3', with exons 1, 3, and 7 labeled and two gRNA regions marked. Legend: exon of mouse Taf3; knockout region.


Overview of the Dot Plot (up)

Window size: 15 bp

Figure: self-alignment dot plot of the upstream sequence. Legend: Forward; Reverse Complement.

Note: The 2000 bp section upstream of Exon 3 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
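A self-alignment dot plot of this kind can be approximated by recording every pair of positions whose fixed-length windows are identical. The sketch below is a simplified illustration, not the tool used for this report: it checks exact matches on the forward strand only (the report's plot also shows reverse-complement matches), and the function names are hypothetical. Tandem repeats appear as many matches sharing the same offset, that is, as diagonals parallel to the main diagonal.

```python
from collections import Counter


def dot_plot_matches(seq, window=15):
    """Return sorted (i, j) pairs, i < j, where seq[i:i+window] == seq[j:j+window]."""
    seq = seq.upper()
    starts_by_word = {}
    for i in range(len(seq) - window + 1):
        starts_by_word.setdefault(seq[i:i + window], []).append(i)
    matches = []
    for starts in starts_by_word.values():
        # Every pair of start positions sharing the same word is one off-diagonal dot.
        for a, i in enumerate(starts):
            for j in starts[a + 1:]:
                matches.append((i, j))
    return sorted(matches)


def offset_histogram(matches):
    """Count dots per diagonal; a heavily populated offset suggests a tandem repeat."""
    return Counter(j - i for i, j in matches)


# Toy sequence (not from the report): three tandem copies of ACGT, window of 4.
toy_matches = dot_plot_matches("ACGTACGTACGT", window=4)
print(toy_matches)
print(offset_histogram(toy_matches))
```

For a real 2000 bp flank, the default window of 15 matches the window size given in the report; a gRNA site would then be chosen outside any region where a single offset accumulates many dots.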

Overview of the Dot Plot (down)

Window size: 15 bp

Figure: self-alignment dot plot of the downstream sequence. Legend: Forward; Reverse Complement.

Note: The 2000 bp section downstream of Exon 3 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

Figure: GC content distribution along the upstream sequence.

Summary: Full Length(2000bp) | A(21.75% 435) | C(24.85% 497) | T(32.3% 646) | G(21.1% 422)

Note: The 2000 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

Figure: GC content distribution along the downstream sequence.

Summary: Full Length(2000bp) | A(29.2% 584) | C(21.3% 426) | T(29.45% 589) | G(20.05% 401)

Note: The 2000 bp section downstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
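The overall GC content of each flank follows directly from the base counts in the two summaries, and the windowed profile can be sketched in the same way. The code below is a minimal illustration with hypothetical function names; the report does not say how its plots were generated or what step size was used, so the `step` parameter is an assumption.

```python
def gc_percent_from_counts(a, c, t, g):
    """Overall GC percentage from per-base counts."""
    total = a + c + t + g
    return round(100.0 * (g + c) / total, 2)


def sliding_gc(seq, window=300, step=1):
    """GC percentage of each window of the given size along seq."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts copied from the upstream and downstream summaries.
upstream_gc = gc_percent_from_counts(a=435, c=497, t=646, g=422)
downstream_gc = gc_percent_from_counts(a=584, c=426, t=589, g=401)
print(f"Upstream GC: {upstream_gc}%  Downstream GC: {downstream_gc}%")

# Toy sequence (not from the report) to show the windowed profile.
print(sliding_gc("GGCCAATT", window=4, step=2))
```

Both flanks come out below 50% GC overall, which is in line with the notes that no significant high GC-content region is found.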


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
------------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr2   -       9952946    9954945    2000
YourSeq  155    1440   1606  2000   96.5%     chr9   -       31291329   31291495   167
YourSeq  155    1443   1607  2000   97.0%     chr4   +       55461146   55461310   165
YourSeq  152    1      156   2000   98.8%     chr4   +       4317219    4317374    156
YourSeq  150    1      152   2000   99.4%     chr8   +       115759403  115759554  152
YourSeq  150    1      156   2000   98.1%     chr6   +       102064444  102064599  156
YourSeq  149    1      179   2000   94.7%     chr6   +       122041164  122041464  301
YourSeq  149    8      156   2000   100.0%    chr2   +       42455735   42455883   149
YourSeq  149    4      156   2000   98.7%     chr18  +       21787637   21787789   153
YourSeq  148    1      156   2000   97.5%     chr9   -       84929809   84929964   156
YourSeq  148    1440   1595  2000   97.5%     chr7   -       6167054    6167209    156
YourSeq  148    1      156   2000   97.5%     chr18  -       86904418   86904573   156
YourSeq  148    1      156   2000   97.5%     chr19  +       50838799   50838954   156
YourSeq  148    1      156   2000   97.5%     chr13  +       101377303  101377458  156
YourSeq  148    1      156   2000   97.5%     chr12  +       94997904   94998059   156
YourSeq  148    1      156   2000   97.5%     chr11  +       39823049   39823204   156
YourSeq  147    1      156   2000   97.5%     chrX   -       39773118   39773274   157
YourSeq  147    1441   1597  2000   96.9%     chr6   -       90648701   90648857   157
YourSeq  147    8      156   2000   99.4%     chr19  -       20119314   20119462   149
YourSeq  147    4      156   2000   98.1%     chr10  -       15745949   15746101   153

Note: The 2000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
------------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr2   -       9949114    9951113    2000
YourSeq  72     1131   1263  2000   92.0%     chr9   -       106105196  106105607  412
YourSeq  72     1127   1279  2000   94.0%     chr12  -       84237835   84238003   169
YourSeq  59     1130   1299  2000   88.4%     chr10  +       128052131  128052466  336
YourSeq  58     1130   1263  2000   94.0%     chr1   -       72139571   72139741   171
YourSeq  58     1229   1303  2000   82.7%     chr12  +       72197328   72197396   69
YourSeq  55     1132   1296  2000   92.2%     chr4   +       107444913  107445102  190
YourSeq  54     1180   1301  2000   87.5%     chr11  +       53754654   53754783   130
YourSeq  52     1179   1258  2000   94.9%     chr5   -       148438960  148439053  94
YourSeq  50     1136   1292  2000   96.3%     chr5   -       106435461  106435669  209
YourSeq  50     1199   1257  2000   93.3%     chr18  -       67191935   67192000   66
YourSeq  50     1127   1253  2000   94.6%     chr16  -       23220788   23220944   157
YourSeq  50     1179   1268  2000   96.3%     chr15  +       78650353   78650631   279
YourSeq  49     1198   1263  2000   96.3%     chr2   -       30715242   30715324   83
YourSeq  48     1133   1268  2000   92.9%     chr16  -       4872411    4872547    137
YourSeq  46     1130   1268  2000   92.6%     chr18  +       34081930   34082074   145
YourSeq  46     1229   1303  2000   80.4%     chr17  +       43781506   43781574   69
YourSeq  45     1240   1303  2000   77.6%     chr1   +       185936109  185936166  58
YourSeq  44     1224   1279  2000   85.8%     chr2   -       101853025  101853076  52
YourSeq  43     1132   1264  2000   95.8%     chr10  +       39889026   39889163   138

Note: The 2000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
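One way to read the BLAT results is to ask how much of the 2000 bp query each secondary hit actually covers. The sketch below parses result rows in the column order used in this report (score, query start and end, query size, identity, chromosome, strand, target start and end, span) and computes that coverage. The class and function names are hypothetical, the three sample rows are copied from the results above, and the report does not state what threshold it uses to call a hit significant.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(row):
    """Parse one result row; only the last 10 whitespace-separated fields are used."""
    f = row.split()[-10:]
    return BlatHit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                   float(f[4].rstrip("%")), f[5], f[6],
                   int(f[7]), int(f[8]), int(f[9]))


def query_coverage_percent(hit):
    """Percentage of the query sequence spanned by the hit."""
    return 100.0 * (hit.q_end - hit.q_start + 1) / hit.q_size


rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr2 - 9952946 9954945 2000",
    "YourSeq 155 1440 1606 2000 96.5% chr9 - 31291329 31291495 167",
    "YourSeq 72 1131 1263 2000 92.0% chr9 - 106105196 106105607 412",
]
hits = [parse_hit(r) for r in rows]
# Secondary hits are those that do not span the full query (i.e. not the self-match).
secondary = [h for h in hits if query_coverage_percent(h) < 100.0]
for h in secondary:
    print(h.chrom, h.score, f"{query_coverage_percent(h):.2f}% of query")
```

In these sample rows the secondary hits each span well under a tenth of the query, which fits the report's conclusion that no significant similarity is found.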


Gene and information: Taf3 TATA-box binding protein associated factor 3 [ Mus musculus (house mouse) ] Gene ID: 209361, updated on 12-Aug-2019

Gene summary

Official Symbol: Taf3 (provided by MGI)
Official Full Name: TATA-box binding protein associated factor 3 (provided by MGI)
Primary source: MGI:MGI:2388097
See related: Ensembl:ENSMUSG00000025782
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 140kDa; TAF140; AW539625; TAFII140; TAFII-140; mTAFII140; 4933439M23Rik
Expression: Ubiquitous expression in bladder adult (RPKM 3.3), testis adult (RPKM 3.1) and 28 other tissues

Genomic context

Location: 2; 2 A1
Exon count: 7

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  2    NC_000068.7 (9914552..10048609, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    2    NC_000068.6 (9836179..9970236, complement)

Chromosome 2 - NC_000068.7


Transcript information: This gene has 5 transcripts

Gene: Taf3 ENSMUSG00000025782

Description: TATA-box binding protein associated factor 3 [Source:MGI Symbol;Acc:MGI:2388097]
Gene Synonyms: 4933439M23Rik, mTAFII140
Location: Chromosome 2: 9,914,552-10,048,596 reverse strand. GRCm38:CM000995.2
About this gene: This gene has 5 transcripts (splice variants), 202 orthologues, 1 paralogue and is a member of 1 Ensembl protein family.

Transcripts

Name      Transcript ID          bp    Protein     Translation ID        Biotype         CCDS       UniProt        Flags
Taf3-201  ENSMUST00000026888.10  4792  932aa       ENSMUSP00000026888.4  Protein coding  CCDS15675  A2ASY1 Q5HZG4  TSL:1 GENCODE basic APPRIS P1
Taf3-204  ENSMUST00000114909.1   4897  779aa       ENSMUSP00000110559.1  Protein coding  -          A2ASY0         TSL:1 GENCODE basic
Taf3-203  ENSMUST00000114907.1   695   108aa       ENSMUSP00000110557.1  Protein coding  -          A2ASX9         TSL:1 GENCODE basic
Taf3-202  ENSMUST00000114906.1   583   51aa        ENSMUSP00000110556.1  Protein coding  -          A2ASX8         TSL:2 GENCODE basic
Taf3-205  ENSMUST00000129720.1   2217  No protein  -                     lncRNA          -          -              TSL:5

Figure: genome browser view of the Taf3 locus (154.04 kb window, approximately 9.92 Mb to 10.04 Mb).
Forward strand: Gm13262-201 (lncRNA); C630004M23Rik-201 (TEC).
Contigs: AL928704.10.
Reverse strand: Taf3-201, Taf3-202, Taf3-203, Taf3-204 (protein coding); Taf3-205 (lncRNA); Atp5c1-201, Atp5c1-202, Atp5c1-203 (protein coding).
Regulatory Build legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site.
Gene legend: Protein coding (Ensembl protein coding; merged Ensembl/Havana); Non-protein coding (RNA gene).


Transcript: ENSMUST00000026888

Figure: protein domain and variant overview for Taf3-201 (protein coding; reverse strand, 134.04 kb), drawn on a scale of 0 to 932 amino acids.
Feature tracks: PDB-ENSP mappings; MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils).
Superfamily: Zinc finger, FYVE/PHD-type
SMART: Bromodomain associated domain; Zinc finger, PHD-type
Pfam: Bromodomain associated domain; Zinc finger, PHD-finger
PROSITE profiles: Zinc finger, PHD-finger
PROSITE patterns: Zinc finger, PHD-type, conserved site
PANTHER: PTHR46452
Gene3D: Histone-fold; Zinc finger, RING/FYVE/PHD-type
CDD: cd15522
Sequence variants (dbSNP and all other sources), variant legend: frameshift variant; inframe deletion; missense variant; splice region variant; synonymous variant.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
