https://www.alphaknockout.com

Mouse Tcf19 Knockout Project (CRISPR/Cas9)

Objective: To create a Tcf19 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Tcf19 gene (NCBI Reference Sequence: NM_001163763; Ensembl: ENSMUSG00000050410) is located on mouse chromosome 17. Four exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 4 (transcript: ENSMUST00000160885). Exon 3 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts at about 30.29% of the coding region and covers 69.33% of the coding region. The size of the effective KO region is ~547 bp. The KO region does not overlap any other known gene.
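The percentages above can be turned into approximate nucleotide counts. The sketch below assumes that the coding region is the 263 aa product of ENSMUST00000160885 (taken from the transcript table) plus a 3 nt stop codon, and it rounds to whole bases. It is an estimate, not the exact exon boundaries used in the design.

```python
# Estimate CDS positions from the report's percent-of-coding-region figures.
# Assumption: CDS = 263 codons + 1 stop codon (from the 263 aa transcript).

def cds_length(protein_aa: int, include_stop: bool = True) -> int:
    """Length in nt of a CDS that encodes `protein_aa` residues."""
    return protein_aa * 3 + (3 if include_stop else 0)

def percent_to_nt(percent: float, cds_nt: int) -> int:
    """Approximate number of CDS bases corresponding to `percent` of the CDS."""
    return round(cds_nt * percent / 100)

cds_nt = cds_length(263)
exon3_start = percent_to_nt(30.29, cds_nt)   # CDS bases before exon 3
exon3_coding = percent_to_nt(69.33, cds_nt)  # CDS bases inside exon 3
print(cds_nt, exon3_start, exon3_coding)     # 792 240 549
```

Under these assumptions, roughly 3 nt of CDS remain after exon 3. That is at least consistent with exon 4 holding the TGA stop codon. The ~549 nt estimate is also close to the ~547 bp effective KO region, though the two need not coincide exactly.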


Overview of the Targeting Strategy

[Figure: Wildtype allele of mouse Tcf19 (5' to 3'), showing the exons and the knockout region flanked by two gRNA regions. Legend: exon of mouse Tcf19; knockout region.]


Overview of the Dot Plot (up)

[Figure: Dot plot of the upstream sequence against itself, forward and reverse complement; window size: 15 bp.]

Note: The 857 bp section upstream of Exon 3 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
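A self dot plot marks every pair of positions whose windows match. Tandem repeats then show up as lines parallel to the main diagonal. The report does not name the tool it used, so the following is a minimal exact-match sketch with the same 15 bp window. Real dot plot programs often tolerate mismatches.

```python
# Minimal windowed self dot plot (exact matches only).
# Forward panel: (i, j) when the window at i equals the window at j.
# Reverse-complement panel: (i, j) when the reverse complement of the window
# at i equals the window at j (both positions in forward coordinates).

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are kept)."""
    return seq.translate(COMPLEMENT)[::-1]

def dot_plot(seq: str, window: int = 15, reverse: bool = False) -> set[tuple[int, int]]:
    seq = seq.upper()
    n = len(seq) - window + 1
    # Index every window by its content so each lookup is a dict access.
    index: dict[str, list[int]] = {}
    for j in range(n):
        index.setdefault(seq[j:j + window], []).append(j)
    hits: set[tuple[int, int]] = set()
    for i in range(n):
        kmer = seq[i:i + window]
        key = revcomp(kmer) if reverse else kmer
        for j in index.get(key, []):
            hits.add((i, j))
    return hits

def off_diagonal(hits: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Drop the trivial self-matches on the main diagonal."""
    return {(i, j) for i, j in hits if i != j}
```

For example, `off_diagonal(dot_plot("AAAC" * 10))` contains `(0, 4)`, which is a diagonal offset by the 4 bp repeat period. A non-repetitive sequence yields an empty off-diagonal set.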

Overview of the Dot Plot (down)

[Figure: Dot plot of the downstream sequence against itself, forward and reverse complement; window size: 15 bp.]

Note: The 1289 bp section downstream of Exon 3 is aligned with itself to check for tandem repeats. Tandem repeats are found in the dot plot matrix, so the gRNA site is selected outside these repeats.
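Keeping the gRNA outside the repeats comes down to an interval-overlap test. The sketch below makes several assumptions. The protospacer plus PAM spans 23 nt, as for SpCas9 (20 nt + NGG), although the report does not name the Cas9 variant. Coordinates are 0-based and half-open. The repeat interval and gRNA positions are purely illustrative and are not taken from the report.

```python
# Hypothetical check that a gRNA target does not overlap annotated repeats.
# Intervals are 0-based, half-open [start, end).

def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if [a_start, a_end) and [b_start, b_end) share at least one base."""
    return a_start < b_end and b_start < a_end

def grna_outside_repeats(grna_start: int, repeats: list[tuple[int, int]],
                         grna_len: int = 23) -> bool:
    """Assumes a 20 nt protospacer + 3 nt PAM (SpCas9-style) by default."""
    grna_end = grna_start + grna_len
    return not any(overlaps(grna_start, grna_end, s, e) for s, e in repeats)

repeats = [(900, 1240)]                     # illustrative repeat interval
print(grna_outside_repeats(100, repeats))   # True
print(grna_outside_repeats(890, repeats))   # False: 890..912 runs into 900
```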


Overview of the GC Content Distribution (up)

[Figure: GC content distribution of the upstream sequence; window size: 300 bp.]

Summary: Full length: 857 bp | A: 222 (25.90%) | C: 203 (23.69%) | T: 230 (26.84%) | G: 202 (23.57%)

Note: The 857 bp section upstream of Exon 3 is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
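A GC content distribution like the one above is typically a sliding-window GC fraction. The sketch below uses the report's 300 bp window with a 1 bp step. The 70% cutoff for "high GC" is a hypothetical threshold chosen for illustration, because the report does not state one.

```python
# Sliding-window GC content with a 300 bp window (step of 1 bp by default).

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq` (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def gc_windows(seq: str, window: int = 300, step: int = 1) -> list[tuple[int, float]]:
    """(start, GC fraction) for every full-length window."""
    return [(i, gc_fraction(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]

def high_gc_windows(seq: str, window: int = 300,
                    threshold: float = 0.70) -> list[tuple[int, float]]:
    """Windows at or above a hypothetical 'high GC' threshold."""
    return [(i, gc) for i, gc in gc_windows(seq, window) if gc >= threshold]
```

An empty result from `high_gc_windows` corresponds to the note's conclusion that no high-GC region is present. The real cutoff may differ.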

Overview of the GC Content Distribution (down)

[Figure: GC content distribution of the downstream sequence; window size: 300 bp.]

Summary: Full length: 1289 bp | A: 337 (26.14%) | C: 294 (22.81%) | T: 328 (25.45%) | G: 330 (25.60%)

Note: The 1289 bp section downstream of Exon 3 is analyzed to determine its GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
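The two summary lines can be reproduced from the base counts alone, and the same counts give the overall GC content of each flank. The sketch below does exactly that, using only the counts reported above.

```python
# Recompute the base-composition summaries from the reported counts.

def composition_summary(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of each base plus overall GC%, rounded to 2 decimals."""
    total = sum(counts.values())
    pct = {base: round(100 * n / total, 2) for base, n in counts.items()}
    pct["GC"] = round(100 * (counts["G"] + counts["C"]) / total, 2)
    return pct

upstream = {"A": 222, "C": 203, "T": 230, "G": 202}    # 857 bp section
downstream = {"A": 337, "C": 294, "T": 328, "G": 330}  # 1289 bp section
print(composition_summary(upstream)["GC"],
      composition_summary(downstream)["GC"])            # 47.26 48.41
```

Both flanks sit near 47 to 48% GC overall, which fits the notes' conclusion that neither region is GC-rich.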


BLAT Search Results (up)

QSTART/QEND are positions in the query; TSTART/TEND are genomic positions.

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  857    1       857   857    100.0%    chr17  -       35515021   35515877   857
YourSeq  32     548     580   857    100.0%    chr9   +       58370166   58370200   35
YourSeq  31     549     587   857    97.1%     chr17  -       74730926   74730967   42
YourSeq  31     549     589   857    89.8%     chr7   +       45997826   45997868   43
YourSeq  31     548     589   857    91.9%     chr10  +       69811531   69811574   44
YourSeq  30     548     580   857    96.9%     chr12  +       27305395   27305429   35
YourSeq  28     547     579   857    94.0%     chr11  -       61750980   61751013   34
YourSeq  26     500     526   857    100.0%    chr12  +       72301288   72301318   31
YourSeq  24     550     577   857    80.0%     chr10  -       88517705   88517729   25
YourSeq  22     548     572   857    95.9%     chr1   -       61977210   61977236   27
YourSeq  22     548     572   857    96.0%     chr12  +       54887660   54887685   26
YourSeq  21     567     587   857    100.0%    chr11  +       54350943   54350963   21
YourSeq  20     784     803   857    100.0%    chr1   -       28445353   28445372   20
YourSeq  20     109     140   857    81.3%     chr5   +       128371186  128371217  32
YourSeq  20     511     530   857    100.0%    chr12  +       54042479   54042498   20
YourSeq  20     564     587   857    91.7%     chr12  +       32980306   32980329   24
YourSeq  20     86      105   857    100.0%    chr1   +       15287895   15287914   20

Note: The 857 bp section upstream of Exon 3 is BLAT-searched against the genome. Apart from the full-length self-match on chr17, no significant similarity is found.
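One way to quantify "no significant similarity" is to drop the self-match and ask how much of the query the best remaining hit covers. The sketch below uses three rows copied from the table above. Treating a full-length, 100% identity hit as the self-match is an assumption, but the chr17 coordinates do fall inside the Tcf19 locus. The coverage measure is an illustration, not the criterion the report used.

```python
from dataclasses import dataclass

@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int

def is_self_hit(h: BlatHit) -> bool:
    # Full-length alignment at 100% identity: the query mapping to its own locus.
    return h.q_start == 1 and h.q_end == h.q_size and h.identity == 100.0

def max_offtarget_coverage(hits: list[BlatHit]) -> float:
    """Largest fraction of the query spanned by any non-self hit."""
    others = [h for h in hits if not is_self_hit(h)]
    return max(((h.q_end - h.q_start + 1) / h.q_size for h in others), default=0.0)

up_hits = [
    BlatHit(857, 1, 857, 857, 100.0, "chr17", "-", 35515021, 35515877),
    BlatHit(32, 548, 580, 857, 100.0, "chr9", "+", 58370166, 58370200),
    BlatHit(31, 548, 589, 857, 91.9, "chr10", "+", 69811531, 69811574),
]
print(f"{max_offtarget_coverage(up_hits):.1%}")  # 4.9%
```

The chr10 row included here has the longest query span (42 bp) of any off-target hit in the upstream table. Even that hit covers under 5% of the 857 bp section.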

BLAT Search Results (down)

QSTART/QEND are positions in the query; TSTART/TEND are genomic positions.

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  1289   1       1289  1289   100.0%    chr17  -       35513185   35514473   1289
YourSeq  93     950     1231  1289   82.4%     chr10  -       75979106   75979367   262
YourSeq  85     950     1085  1289   87.1%     chr4   +       118619494  118619632  139
YourSeq  84     910     1061  1289   93.0%     chr10  -       82765786   82766119   334
YourSeq  80     973     1092  1289   91.0%     chr2   +       179884456  179884577  122
YourSeq  79     950     1086  1289   82.3%     chr10  -       23730881   23731002   122
YourSeq  78     950     1061  1289   88.3%     chr10  -       117713955  117714069  115
YourSeq  78     950     1101  1289   88.4%     chr5   +       107860109  107860266  158
YourSeq  76     950     1102  1289   91.4%     chr7   -       12913704   12913858   155
YourSeq  76     976     1085  1289   86.5%     chr15  +       37922534   37922647   114
YourSeq  74     950     1188  1289   93.2%     chr2   +       164172735  164173220  486
YourSeq  73     950     1070  1289   94.1%     chr4   -       100725531  100725654  124
YourSeq  73     950     1061  1289   89.4%     chr10  +       80289692   80289806   115
YourSeq  72     973     1076  1289   89.2%     chr4   +       116642273  116642381  109
YourSeq  71     973     1085  1289   85.2%     chr7   -       80885864   80885974   111
YourSeq  71     976     1204  1289   91.0%     chr1   -       87962915   88167213   204299
YourSeq  71     950     1085  1289   80.8%     chr1   -       7200654    7200792    139
YourSeq  70     950     1102  1289   88.1%     chr14  -       123514992  123515147  156
YourSeq  70     950     1102  1289   89.8%     chr2   +       168937039  168937192  154
YourSeq  69     950     1059  1289   88.8%     chr1   -       181327060  181327172  113

Note: The 1289 bp section downstream of Exon 3 is BLAT-searched against the genome. Apart from the full-length self-match on chr17, no significant similarity is found.
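Merging the query intervals of the 19 off-target rows shows where in the downstream section the partial hits come from. The interval merge below is a standard sweep over sorted, closed intervals. The input values are the QSTART/QEND columns copied from the table above.

```python
# Merge overlapping or abutting closed intervals [start, end].

def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

down_offtarget = [(950, 1231), (950, 1085), (910, 1061), (973, 1092), (950, 1086),
                  (950, 1061), (950, 1101), (950, 1102), (976, 1085), (950, 1188),
                  (950, 1070), (950, 1061), (973, 1076), (973, 1085), (976, 1204),
                  (950, 1085), (950, 1102), (950, 1102), (950, 1059)]
print(merge_intervals(down_offtarget))  # [(910, 1231)]
```

All off-target hits fall within query positions 910 to 1231. This suggests, but does not prove, that they come from one repetitive stretch, possibly the repeats seen in the downstream dot plot.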


Gene information: Tcf19 transcription factor 19 [Mus musculus (house mouse)]. Gene ID: 106795, updated on 10-Oct-2019.

Gene summary

Official Symbol: Tcf19 (provided by MGI)
Official Full Name: transcription factor 19 (provided by MGI)
Primary source: MGI:MGI:103180
See related: Ensembl:ENSMUSG00000050410
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AW495861; 5730403J10Rik
Expression: Broad expression in limb E14.5 (RPKM 21.9), thymus adult (RPKM 20.3) and 25 other tissues
Orthologs: human, all

Genomic context

Location: 17; 17 B1
Exon count: 4

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  17   NC_000083.6 (35512730..35516837, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    17   NC_000083.5 (35649680..35653769, complement)
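The coordinates in this report can be cross-checked with simple interval arithmetic. The BLAT SPAN column equals end - start + 1 for the self-hits, so the sketch treats all coordinates as 1-based and inclusive. It checks that both flanking sections lie inside the GRCm38 gene span. Because Tcf19 is on the complement strand, "upstream" of exon 3 lies at higher genomic coordinates. Reading the gap between the flanks as the KO region is an assumption, namely that the flanks abut it directly.

```python
# 1-based, inclusive genomic intervals on chromosome 17 (GRCm38).

def span(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1

def contains(outer: tuple[int, int], inner: tuple[int, int]) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]

gene = (35512730, 35516837)        # NC_000083.6, complement strand
up_flank = (35515021, 35515877)    # BLAT self-hit, 857 bp upstream section
down_flank = (35513185, 35514473)  # BLAT self-hit, 1289 bp downstream section

# Reverse strand: the upstream flank sits at higher coordinates than the downstream one.
gap = span(down_flank[1] + 1, up_flank[0] - 1)
print(span(*gene), contains(gene, up_flank), contains(gene, down_flank), gap)
# 4108 True True 547
```

The 547 bp gap matches the stated ~547 bp effective KO region. This supports the reading that the two analyzed sections directly flank the deletion.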

[Figure: Genomic context of Tcf19 on chromosome 17 (NC_000083.6).]


Transcript information: This gene has 4 transcripts

Gene: Tcf19 ENSMUSG00000050410

Description: transcription factor 19 [Source:MGI Symbol;Acc:MGI:103180]
Gene synonyms: 5730403J10Rik
Location: Chromosome 17: 35,512,734-35,516,824 reverse strand (GRCm38:CM001010.2)
About this gene: This gene has 4 transcripts (splice variants), 166 orthologues, is a member of 1 Ensembl protein family and is associated with 6 phenotypes.

Transcripts

Name       Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Tcf19-202  ENSMUST00000160885.1  1732  263aa    ENSMUSP00000125167.1  Protein coding  CCDS50089  G3XA31   TSL:1, GENCODE basic, APPRIS P1
Tcf19-203  ENSMUST00000161012.7  1665  263aa    ENSMUSP00000125310.1  Protein coding  CCDS50089  G3XA31   TSL:1, GENCODE basic, APPRIS P1
Tcf19-201  ENSMUST00000159009.1  982   201aa    ENSMUSP00000124449.1  Protein coding  -          E0CY69   CDS 3' incomplete, TSL:2
Tcf19-204  ENSMUST00000162683.1  873   147aa    ENSMUSP00000125659.1  Protein coding  -          E0CZ64   CDS 3' incomplete, TSL:1
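The report designs against ENSMUST00000160885 (Tcf19-202) but does not say how that transcript was chosen. The sketch below shows one plausible rule and is a hypothetical criterion, not the documented one. It keeps transcripts flagged APPRIS P1 whose CDS is not marked incomplete, then orders them by length. The table values are copied from above.

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    name: str
    transcript_id: str
    length_bp: int
    protein_aa: int
    flags: tuple[str, ...]

TRANSCRIPTS = [
    Transcript("Tcf19-202", "ENSMUST00000160885.1", 1732, 263, ("TSL:1", "GENCODE basic", "APPRIS P1")),
    Transcript("Tcf19-203", "ENSMUST00000161012.7", 1665, 263, ("TSL:1", "GENCODE basic", "APPRIS P1")),
    Transcript("Tcf19-201", "ENSMUST00000159009.1", 982, 201, ("CDS 3' incomplete", "TSL:2")),
    Transcript("Tcf19-204", "ENSMUST00000162683.1", 873, 147, ("CDS 3' incomplete", "TSL:1")),
]

def complete_principal(transcripts: list[Transcript]) -> list[Transcript]:
    """Hypothetical rule: APPRIS P1, complete CDS, longest transcript first."""
    keep = [t for t in transcripts
            if "APPRIS P1" in t.flags and not any("incomplete" in f for f in t.flags)]
    return sorted(keep, key=lambda t: t.length_bp, reverse=True)

print([t.name for t in complete_principal(TRANSCRIPTS)])  # ['Tcf19-202', 'Tcf19-203']
```

Both surviving transcripts encode the same 263 aa protein (CCDS50089), so a knockout of the shared exon 3 would be expected to affect both.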


[Figure: Ensembl region view of chromosome 17, about 35.505 to 35.525 Mb (24.09 kb). Forward strand: Pou5f1-201 to Pou5f1-206 (protein coding); Cchcr1-201, -202 and -205 (protein coding), Cchcr1-204 and -207 (retained intron), Cchcr1-206 (lncRNA). Reverse strand: Gm19553-201 (lncRNA); Tcf19-201, -202, -203 and -204 (protein coding). Additional tracks: contig CR974473.23; Regulatory Build (CTCF, promoter, promoter flank). Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, RNA gene).]


Transcript: ENSMUST00000160885

[Figure: Tcf19-202 (protein coding, reverse strand, 4.05 kb) and its protein product ENSMUSP00000125... (scale 0 to 263 aa) with domain and sequence-variant tracks.]

Protein annotations:
Tracks: MobiDB lite; Low complexity (Seg)
Superfamily: SMAD/FHA domain superfamily
SMART: Forkhead-associated (FHA) domain
Pfam: Forkhead-associated (FHA) domain
PROSITE profiles: Forkhead-associated (FHA) domain
PANTHER: PTHR15464
Gene3D: 2.60.200.20
CDD: Forkhead-associated (FHA) domain

Sequence variants (dbSNP and all other sources): frameshift, missense and synonymous variants.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
