https://www.alphaknockout.com

Mouse Taf6 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Taf6 conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Taf6 gene (NCBI Reference Sequence: NM_009315; Ensembl: ENSMUSG00000036980) is located on Mouse chromosome 5. Fifteen exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 15 (Transcript: ENSMUST00000048698). Exons 3~6 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the Mouse Taf6 gene. To engineer the targeting vector, the homologous arms and the cKO region will be generated by PCR using BAC clone RP23-378N5 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO Mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a transgenic gene disruption may exhibit preimplantation lethality.

Exon 3 starts at about 7.72% of the coding region. Knockout of Exons 3~6 will result in a frameshift of the gene. Intron 2, used for 5'-loxP site insertion, is 406 bp; intron 6, used for 3'-loxP site insertion, is 351 bp. The effective cKO region is ~1209 bp and does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (exons numbered 1 to 15, with the 5' and 3' gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: exon of mouse Taf6; homology arm; cKO region; loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
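
The self-alignment idea behind the dot plot can be illustrated with a minimal sketch. It is not the tool used for this report: the function names, the toy sequence and the `max_offset` threshold are assumptions made for illustration, and only forward-strand matches are considered (the reverse-complement comparison is omitted).

```python
def dot_plot_matches(seq, window=10):
    """Return sorted (i, j) pairs with i < j where the same window-length
    word starts at both positions of seq (the dots of a self dot plot)."""
    positions = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)
    matches = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                matches.append((starts[a], starts[b]))
    return sorted(matches)


def has_tandem_repeat(seq, window=10, max_offset=50):
    """Heuristic (assumed threshold): flag a repeat when two identical
    windows start within max_offset bases of each other."""
    return any(j - i <= max_offset for i, j in dot_plot_matches(seq, window))


# Toy sequence (hypothetical, not the Taf6 sequence) with a CAG repeat.
toy = "ACGTTGCA" + "CAGCAGCAGCAGCAGCAG" + "TTGACCGTA"
print(has_tandem_repeat(toy))
```

Matches that lie close to the main diagonal, as in the toy CAG run, are the pattern that signals tandem repeats in a dot plot.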

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content along the sequence.]

Summary: Full Length(7587bp) | A(26.14% 1983) | C(23.37% 1773) | T(25.25% 1916) | G(25.24% 1915)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
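
The overall GC content follows directly from the base counts in the summary line. The sketch below recomputes it and shows one way a windowed GC profile could be calculated; the `gc_windows` helper and its `step` parameter are assumptions for illustration, not the tool used to draw the plot.

```python
# Base counts copied from the summary line above.
counts = {"A": 1983, "C": 1773, "T": 1916, "G": 1915}
total = sum(counts.values())
gc_percent = 100.0 * (counts["G"] + counts["C"]) / total
print(total, round(gc_percent, 2))


def gc_windows(seq, window=300, step=100):
    """Return (start, GC%) for each full window along seq.
    Hypothetical helper; the step size is an assumed parameter."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100.0 * gc / window))
    return profile
```

The counts sum to the stated full length of 7587 bp and give an overall GC content of about 48.6%.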


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM           STRAND  START      END        SPAN
------------------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr5            -       138184129  138187128  3000
YourSeq  191    597    2101  3000   87.2%     chr5            -       25249552   25478172   228621
YourSeq  151    597    769   3000   91.1%     chr4            +       135980194  135980360  167
YourSeq  149    592    770   3000   88.9%     chr11           -       48703072   48703241   170
YourSeq  147    600    768   3000   94.1%     chr15           -       75870227   75870397   171
YourSeq  145    595    764   3000   89.6%     chr9            -       15436283   15436444   162
YourSeq  145    600    769   3000   90.2%     chr11           -       53392148   53392310   163
YourSeq  145    597    766   3000   90.2%     chr1            +       72691528   72691690   163
YourSeq  144    599    765   3000   90.7%     chr2            -       168674156  168674316  161
YourSeq  144    605    769   3000   91.2%     chr5            +       20977219   20977377   159
YourSeq  144    601    769   3000   90.2%     chr1            +       93743413   93743575   163
YourSeq  143    597    771   3000   91.2%     chr16           -       79356775   79356948   174
YourSeq  142    597    769   3000   92.4%     chr2            +       120561431  120561620  190
YourSeq  142    600    761   3000   95.5%     chr13           +       37621112   37621277   166
YourSeq  142    597    771   3000   91.0%     chr1            +       125641994  125642166  173
YourSeq  142    605    769   3000   90.6%     chr1            +       88435767   88435925   159
YourSeq  141    597    766   3000   93.8%     chrUn_JH584304  +       7509       7679       171
YourSeq  140    606    769   3000   94.3%     chrX            +       153199348  153199820  473
YourSeq  140    597    759   3000   90.4%     chr11           +       119365480  119365635  156
YourSeq  139    600    763   3000   90.8%     chr17           -       24704943   24705104   162

Note: The 3000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr5   -       138179920  138182919  3000
YourSeq  199    1234   1593  3000   91.6%     chrX   -       12120077   12222063   101987
YourSeq  165    1236   1589  3000   87.9%     chr5   +       143601734  143619998  18265
YourSeq  126    1304   1593  3000   94.4%     chr11  +       96101052   96101484   433
YourSeq  124    1206   1578  3000   84.5%     chr10  -       79680020   79680165   146
YourSeq  123    1210   1576  3000   82.3%     chr11  +       55138896   55139038   143
YourSeq  122    1204   1589  3000   79.3%     chr10  +       77451208   77451392   185
YourSeq  119    1204   1589  3000   80.5%     chr15  -       27156193   27156346   154
YourSeq  118    1210   1576  3000   81.9%     chr8   -       94068709   94068862   154
YourSeq  117    1449   1604  3000   92.8%     chr10  +       81166305   81166619   315
YourSeq  116    1208   1367  3000   89.6%     chr1   -       35422356   35422517   162
YourSeq  116    1235   1589  3000   81.5%     chr18  +       72930270   72930407   138
YourSeq  116    1210   1586  3000   79.9%     chr12  +       31229766   31229920   155
YourSeq  115    1246   1575  3000   86.6%     chr1   -       165499382  165784636  285255
YourSeq  112    1449   1593  3000   91.8%     chr11  +       30982902   30983081   180
YourSeq  110    1214   1576  3000   81.0%     chr1   +       180655330  180655463  134
YourSeq  109    1278   1594  3000   89.9%     chr1   -       134543095  134663946  120852
YourSeq  109    1233   1579  3000   81.2%     chr4   +       129386174  129386306  133
YourSeq  108    1209   1589  3000   84.9%     chrX   -       125100123  125100574  452
YourSeq  108    1234   1596  3000   75.7%     chr12  -       28096469   28096612   144

Note: The 3000 bp section downstream of Exon 6 is BLAT searched against the genome. No significant similarity is found.
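
As a consistency check, the top (100% identity) hit in each BLAT table gives the genomic position of the corresponding 3000 bp query. The sketch below computes the gap between the two hits, assuming that the queries directly flank the cKO region; that interpretation, and the helper names, are assumptions rather than statements from the report. Because Taf6 lies on the reverse strand, the upstream query has the higher coordinates.

```python
# Genomic coordinates (1-based, inclusive) of the top BLAT hit for each query,
# copied from the two BLAT tables. Both hits are on chr5, minus strand.
upstream_hit = (138184129, 138187128)    # section upstream of Exon 3
downstream_hit = (138179920, 138182919)  # section downstream of Exon 6


def span(interval):
    """Length of an inclusive interval."""
    start, end = interval
    return end - start + 1


def gap_between(lower, upper):
    """Number of bases strictly between two non-overlapping intervals."""
    return upper[0] - lower[1] - 1


cko_size = gap_between(downstream_hit, upstream_hit)
print(span(upstream_hit), span(downstream_hit), cko_size)  # prints "3000 3000 1209"
```

Under that assumption the gap is 1209 bp, which agrees with the stated effective cKO region size of ~1209 bp.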


Gene and information:

Taf6 TATA-box binding protein associated factor 6 [ Mus musculus (house mouse) ]
Gene ID: 21343, updated on 24-Oct-2019

Gene summary

Official Symbol: Taf6 (provided by MGI)
Official Full Name: TATA-box binding protein associated factor 6 (provided by MGI)
Primary source: MGI:MGI:109129
See related: Ensembl:ENSMUSG00000036980
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: p80; 80kDa; Taf2e; TAFII70; AW549759; TAF(II)80
Expression: Ubiquitous expression in testis adult (RPKM 50.6), ovary adult (RPKM 34.0) and 28 other tissues

Genomic context

Location: 5; 5 G2

Exon count: 16

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  5    NC_000071.6 (138178617..138187451, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    5    NC_000071.5 (138619845..138628414, complement)

[Figure: genomic context of Taf6 on Chromosome 5 - NC_000071.6.]


Transcript information: This gene has 8 transcripts

Gene: Taf6 ENSMUSG00000036980

Description: TATA-box binding protein associated factor 6 [Source:MGI Symbol;Acc:MGI:109129]
Gene Synonyms: 80kDa, Taf2e, p80
Location: Chromosome 5: 138,178,617-138,187,451 reverse strand. GRCm38:CM000998.2
About this gene: This gene has 8 transcripts (splice variants), 189 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 5 phenotypes.

Transcripts

Name      Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Taf6-202  ENSMUST00000110936.7   2308  678aa       ENSMUSP00000106561.1  Protein coding           CCDS19795  Q62311   TSL:1, GENCODE basic, APPRIS P1
Taf6-201  ENSMUST00000048698.13  2304  678aa       ENSMUSP00000048016.7  Protein coding           CCDS19795  Q62311   TSL:1, GENCODE basic, APPRIS P1
Taf6-203  ENSMUST00000110937.7   2093  636aa       ENSMUSP00000106562.1  Protein coding           -          D3Z0T0   TSL:5, GENCODE basic
Taf6-204  ENSMUST00000123415.7   968   214aa       ENSMUSP00000122534.1  Protein coding           -          D3Z5H6   CDS 3' incomplete, TSL:5
Taf6-206  ENSMUST00000139276.1   357   55aa        ENSMUSP00000116512.1  Protein coding           -          D3Z4X5   CDS 3' incomplete, TSL:5
Taf6-207  ENSMUST00000153117.7   2619  114aa       ENSMUSP00000138335.1  Nonsense mediated decay  -          S4R1R2   TSL:5
Taf6-205  ENSMUST00000130473.1   1220  No protein  -                     Retained intron          -          -        TSL:1
Taf6-208  ENSMUST00000200483.1   1139  No protein  -                     lncRNA                   -          -        TSL:NA


[Figure: Ensembl genome browser view of a 28.84 kb region of chromosome 5 (138.17 Mb to 138.19 Mb). Forward strand: Ap4m1, Cnpy4 and Mblac1 transcripts. Reverse strand: Mcm7 and Taf6 transcripts (Taf6-201 to Taf6-208). Contigs: AC151719.3, AC159257.2. Tracks: Genes (Comprehensive set) and Regulatory Build (CTCF, promoter, promoter flank).]


Transcript: ENSMUST00000048698

[Figure: protein domain view of Taf6-201 (protein coding, reverse strand, 8.61 kb), drawn on a scale of 0 to 678 amino acids. Annotated features: MobiDB lite; low complexity (Seg); Superfamily: Histone-fold, Armadillo-type fold; SMART: TATA box binding protein associated factor (TAF); Pfam: TATA box binding protein associated factor (TAF), TAF6 C-terminal HEAT repeat domain; PANTHER: Transcription initiation factor TFIID subunit 6 (PTHR10221:SF12); Gene3D: Histone-fold, 1.25.40.770; CDD: cd08050. Variant tracks (dbSNP and all other sources): missense and synonymous variants.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
