https://www.alphaknockout.com

Mouse Btaf1 Knockout Project (CRISPR/Cas9)

Objective: To create a Btaf1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Btaf1 gene (NCBI Reference Sequence: NM_001080706; Ensembl: ENSMUSG00000040565) is located on mouse chromosome 19. A total of 38 exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 38 (Transcript: ENSMUST00000099494). Exons 3~6 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Embryos homozygous for a gene-trapped allele display growth retardation. Embryos homozygous for an ENU-induced allele show growth retardation, edema, abnormal blood circulation, myocardial trabeculae hypoplasia, and delayed head and brain development.

Exon 3 starts at about 2.51% of the coding region. Exons 3~6 cover 10.1% of the coding region. The size of the effective KO region is ~9442 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

Figure: Wildtype allele (5' to 3') showing exons 1, 3, 4, 5, 6 and 38, with a gRNA region on each side of exons 3~6.

Legends: Exon of mouse Btaf1; Knockout region.


Overview of the Dot Plot (up) Window size: 15 bp

Figure: Dot plot of Sequence 12 against itself. Legend: Forward; Reverse Complement.

Note: The 2000 bp section upstream of Exon 3 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
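The self-alignment described in the note can be illustrated with a minimal Python sketch. It assumes that a dot is an exact match of a 15 bp word, either on the same strand (forward) or against the reverse complement; the tool that produced the report's plots is not named, and it may tolerate mismatches. The function names `reverse_complement` and `self_dot_plot` are hypothetical.

```python
from collections import defaultdict

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def self_dot_plot(seq, window=15):
    """Compare a sequence with itself using exact word matches.

    Returns two lists of (i, j) start positions (0-based):
    forward: the word at i occurs again at j, with j > i
             (the trivial main diagonal i == j is skipped);
    reverse: the reverse complement of the word at i occurs at j.
    """
    seq = seq.upper()
    n_words = len(seq) - window + 1

    # Index every word of length `window` by its start positions.
    positions = defaultdict(list)
    for i in range(n_words):
        positions[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(n_words):
        word = seq[i:i + window]
        for j in positions[word]:
            if j > i:
                forward.append((i, j))
        # .get avoids adding empty entries to the index.
        for j in positions.get(reverse_complement(word), []):
            reverse.append((i, j))
    return forward, reverse
```

For a flank without tandem repeats, `forward` is expected to be empty or to contain only isolated points; a tandem repeat would show up as runs of points sharing a constant offset `j - i`.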

Overview of the Dot Plot (down) Window size: 15 bp

Figure: Dot plot of Sequence 12 against itself. Legend: Forward; Reverse Complement.

Note: The 2000 bp section downstream of Exon 6 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up) Window size: 300 bp

Sequence 12

Summary: Full Length(2000bp) | A(28.75% 575) | C(18.5% 370) | T(34.4% 688) | G(18.35% 367)

Note: The 2000 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp

Sequence 12

Summary: Full Length(2000bp) | A(29.55% 591) | C(16.75% 335) | T(32.2% 644) | G(21.5% 430)

Note: The 2000 bp section downstream of Exon 6 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
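As a rough cross-check of the two GC content summaries, the sketch below derives the overall GC fraction from the reported base counts and shows how a 300 bp sliding-window profile could be computed. The step size of 1 bp and the function names are assumptions; the report states only the window size.

```python
def gc_from_counts(a, c, t, g):
    """Overall GC fraction from per-base counts."""
    return (g + c) / (a + c + t + g)


def gc_fraction(seq):
    """GC fraction of a DNA string, counting only A, C, G and T."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0


def sliding_gc(seq, window=300, step=1):
    """GC fraction of each window of length `window` (step is an assumption)."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts copied from the two Summary lines of the report.
upstream_gc = gc_from_counts(a=575, c=370, t=688, g=367)
downstream_gc = gc_from_counts(a=591, c=335, t=644, g=430)
```

With the reported counts, the upstream flank comes out at 737/2000 (36.85% GC) and the downstream flank at 765/2000 (38.25% GC), consistent with the statement that neither flank is GC-rich overall.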


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr19  +       36947069   36949068   2000
YourSeq  170    847    1737  2000   92.1%     chr11  +       93954752   94376661   421910
YourSeq  146    847    1019  2000   94.0%     chr14  -       70590776   70590948   173
YourSeq  144    794    1005  2000   93.9%     chr1   -       132261556  132261842  287
YourSeq  141    846    1010  2000   89.9%     chr3   +       116823390  116823546  157
YourSeq  138    851    1033  2000   93.8%     chr11  -       54875398   54875585   188
YourSeq  138    834    1007  2000   88.0%     chr12  +       111847846  111848003  158
YourSeq  138    849    1010  2000   93.2%     chr10  +       43491282   43491445   164
YourSeq  136    849    1010  2000   94.2%     chr12  -       51023696   51023863   168
YourSeq  136    847    1001  2000   91.8%     chr19  +       37447040   37447184   145
YourSeq  135    847    1004  2000   91.8%     chr11  -       59765992   59766139   148
YourSeq  134    850    1004  2000   91.1%     chr7   -       80354329   80354473   145
YourSeq  134    850    1004  2000   94.1%     chr2   -       156408661  156408824  164
YourSeq  134    850    1004  2000   91.2%     chr3   +       28291891   28292039   149
YourSeq  133    854    1010  2000   93.5%     chr13  -       62787946   62788117   172
YourSeq  133    619    1010  2000   90.3%     chr12  -       51354416   51354897   482
YourSeq  132    847    1004  2000   89.5%     chr3   -       19998137   19998287   151
YourSeq  131    864    1010  2000   92.5%     chr3   +       54785874   54786018   145
YourSeq  131    854    1008  2000   93.5%     chr14  +       25990545   25990703   159
YourSeq  131    850    1008  2000   91.8%     chr1   +       14261079   14261243   165

Note: The 2000 bp section upstream of Exon 3 is BLAT searched against the genome. Apart from the query's own locus on chr19 (100.0% identity over the full 2000 bp), no significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr19  +       36958511   36960510   2000
YourSeq  147    898    1228  2000   90.7%     chr11  +       101429923  101483444  53522
YourSeq  128    884    1202  2000   88.6%     chr17  -       37065964   37097401   31438
YourSeq  108    930    1222  2000   78.2%     chr11  -       116734666  116734856  191
YourSeq  107    890    1974  2000   88.5%     chr1   +       155076635  155155761  79127
YourSeq  101    794    975   2000   90.4%     chr4   +       120931986  120932268  283
YourSeq  101    1815   1982  2000   85.8%     chr10  +       39282492   39282672   181
YourSeq  100    791    1235  2000   87.7%     chr17  +       23765875   23766577   703
YourSeq  99     901    1227  2000   82.9%     chr10  +       95537370   95537688   319
YourSeq  93     1672   1979  2000   92.6%     chr13  +       31062808   31063337   530
YourSeq  92     1712   1979  2000   89.2%     chr18  -       49860299   49860564   266
YourSeq  92     1180   1891  2000   88.9%     chr11  +       97255946   97494937   238992
YourSeq  88     930    1259  2000   80.6%     chr2   -       154653130  154653429  300
YourSeq  88     1714   1984  2000   88.8%     chr11  -       104569617  104569889  273
YourSeq  87     1109   1224  2000   85.1%     chr10  -       80091107   80091220   114
YourSeq  85     1708   1982  2000   92.0%     chr11  -       94617782   94618118   337
YourSeq  84     834    1397  2000   73.6%     chr1   -       74866677   74866844   168
YourSeq  81     1109   1224  2000   82.5%     chr1   -       23338262   23338375   114
YourSeq  80     1679   1923  2000   92.6%     chr1   -       193378516  193378764  249
YourSeq  78     1111   1224  2000   87.5%     chr8   -       105960813  105960924  112

Note: The 2000 bp section downstream of Exon 6 is BLAT searched against the genome. Apart from the query's own locus on chr19 (100.0% identity over the full 2000 bp), no significant similarity is found.
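The chr19 self-hits of the two flanks also allow a quick consistency check of the stated KO region size. The sketch below assumes that the BLAT coordinates are 1-based and inclusive (each 2000 bp flank then spans end - start + 1 = 2000, matching the SPAN column) and that the two flanks directly abut the deleted region; `Interval` and `region_between` are hypothetical helper names.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (assumed)
    end: int    # 1-based, inclusive (assumed)


def length(iv):
    """Number of bases covered by an inclusive interval."""
    return iv.end - iv.start + 1


def region_between(left, right):
    """Interval lying strictly between two non-overlapping flanks."""
    if left.chrom != right.chrom:
        raise ValueError("flanks are on different chromosomes")
    if left.end >= right.start:
        raise ValueError("flanks overlap or are in the wrong order")
    return Interval(left.chrom, left.end + 1, right.start - 1)


# Top (self) hits from the two BLAT result tables.
upstream_flank = Interval("chr19", 36947069, 36949068)
downstream_flank = Interval("chr19", 36958511, 36960510)

ko_region = region_between(upstream_flank, downstream_flank)
ko_size = length(ko_region)
```

Under these assumptions the region between the flanks is chr19:36949069-36958510, which is 9442 bp and agrees with the effective KO region size given in the strategy summary.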


Gene and information: Btaf1

B-TFIID TATA-box binding protein associated factor 1 [ Mus musculus (house mouse) ]
Gene ID: 107182, updated on 12-Aug-2019

Gene summary

Official Symbol: Btaf1 (provided by MGI)
Official Full Name: B-TFIID TATA-box binding protein associated factor 1 (provided by MGI)
Primary source: MGI:MGI:2147538
See related: Ensembl:ENSMUSG00000040565
Gene type: protein coding
RefSeq status: PROVISIONAL
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: TAF170; AI414500; AI447930; E430027O22Rik
Expression: Ubiquitous expression in testis adult (RPKM 6.3), liver E14 (RPKM 5.8) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 19; 19 C2
Exon count: 39

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  19   NC_000085.6 (36926079..37014057)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    19   NC_000085.5 (37000569..37088547)

Chromosome 19 - NC_000085.6


Transcript information: This gene has 3 transcripts

Gene: Btaf1 ENSMUSG00000040565

Description: B-TFIID TATA-box binding protein associated factor 1 [Source:MGI Symbol;Acc:MGI:2147538]
Gene Synonyms: E430027O22Rik
Location: Chromosome 19: 36,926,079-37,012,752 forward strand. GRCm38:CM001012.2
About this gene: This gene has 3 transcripts (splice variants), 193 orthologues, 32 paralogues, is a member of 1 Ensembl protein family and is associated with 6 phenotypes.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt  Flags
Btaf1-201  ENSMUST00000099494.3  7204  1848aa      ENSMUSP00000097093.3  Protein coding   CCDS37969  E9QAE3   TSL:1 GENCODE basic APPRIS P1
Btaf1-203  ENSMUST00000238041.1  4876  No protein  -                     Retained intron  -          -        -
Btaf1-202  ENSMUST00000236343.1  1400  No protein  -                     Retained intron  -          -        -

Figure: Genome browser view of the Btaf1 locus (106.67 kb, 36.92Mb to 37.02Mb).
Forward strand: Btaf1-201 (protein coding), Btaf1-203 (retained intron), Btaf1-202 (retained intron).
Contigs: AC118931.8.
Reverse strand: Fgfbp3-201, Fgfbp3-202, Cpeb3-201, Cpeb3-203, Cpeb3-210, Cpeb3-212 (all protein coding).
Regulation legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site.
Gene legend: protein coding (merged Ensembl/Havana; Ensembl protein coding); non-protein coding (processed transcript).


Transcript: ENSMUST00000099494

Figure: Protein domain view of Btaf1-201 (protein coding; 86.67 kb, forward strand), ENSMUSP00000097..., scale 0 to 1848.
MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils)
Superfamily: Armadillo-type fold; P-loop containing nucleoside triphosphate hydrolase
SMART: Helicase superfamily 1/2, ATP-binding domain; Helicase, C-terminal
Pfam: Domain of unknown function DUF3535; Helicase, C-terminal; SNF2-related, N-terminal domain
PROSITE profiles: Helicase superfamily 1/2, ATP-binding domain; Helicase, C-terminal
PANTHER: PTHR36498
Gene3D: Armadillo-like helical; 3.40.50.300; SNF2-like, N-terminal domain superfamily
CDD: cd17999; cd18793
Sequence variants (dbSNP and all other sources). Variant legend: missense variant; splice region variant; synonymous variant.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
