https://www.alphaknockout.com

Mouse Mthfd1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create an Mthfd1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Mthfd1 gene (NCBI Reference Sequence: NM_138745; Ensembl: ENSMUSG00000021048) is located on mouse chromosome 12. Twenty-eight exons have been identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 27 (transcript: ENSMUST00000021443). Exons 9 to 11 were selected as the conditional knockout region (cKO region). Deletion of this region is expected to result in loss of function of the mouse Mthfd1 gene. To construct the targeting vector, the homology arms and the cKO region will be amplified by PCR using BAC clone RP23-172N19 as the template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs to produce cKO mice. Pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a gene-trapped allele exhibit embryonic lethality. Mice heterozygous for a gene-trap allele exhibit altered amino acid levels and nucleotide metabolism related to dietary folate and choline concentrations.

Exon 9 starts at about 25.95% of the coding region. Knockout of exons 9 to 11 will result in a frameshift of the gene. Intron 8, used for the 5'-loxP site insertion, is 4837 bp; intron 11, used for the 3'-loxP site insertion, is 1122 bp. The effective cKO region is ~1478 bp. The cKO region does not contain any other known gene.
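The frameshift statement rests on codon arithmetic: deleting a coding segment whose length is not a multiple of 3 shifts the reading frame of everything downstream. The minimal Python sketch below checks that condition and converts the 25.95% figure into an approximate CDS offset. It assumes the percentage is measured over the full CDS of ENSMUST00000021443 (935 aa plus the stop codon, i.e. 2808 nt); the exon lengths in the example are hypothetical placeholders, not the real sizes of Mthfd1 exons 9 to 11.

```python
# Codon arithmetic behind a "frameshift" call for an exon deletion.
# ASSUMPTIONS: percentages are relative to the full CDS including the stop
# codon; the exon lengths passed to causes_frameshift are hypothetical.


def cds_length_from_protein(aa_length: int, include_stop: bool = True) -> int:
    """Coding nucleotides for a protein of aa_length residues."""
    return 3 * (aa_length + (1 if include_stop else 0))


def approx_cds_offset(fraction: float, cds_bp: int) -> int:
    """Approximate number of CDS nucleotides upstream of a feature that
    starts at the given fraction of the coding region."""
    return round(fraction * cds_bp)


def causes_frameshift(deleted_exon_coding_bp: list[int]) -> bool:
    """True if removing these coding exons leaves the downstream sequence
    out of frame (total deleted length not divisible by 3)."""
    return sum(deleted_exon_coding_bp) % 3 != 0


cds_bp = cds_length_from_protein(935)               # Mthfd1-201, 935 aa
print(cds_bp, approx_cds_offset(0.2595, cds_bp))    # 2808 729

# Hypothetical exon lengths, for illustration only:
print(causes_frameshift([100, 150, 130]))  # True  (380 bp; 380 % 3 == 2)
print(causes_frameshift([99, 150, 129]))   # False (378 bp; stays in frame)
```

The frame check only says whether downstream codons are preserved; an in-frame deletion can still remove essential residues.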


Overview of the Targeting Strategy

[Figure: Targeting strategy. Wildtype allele (exons 1, 9, 10, 11, 12 and 28 shown, with the 5' and 3' gRNA regions), targeting vector, targeted allele, and constitutive KO allele after Cre recombination. Legend: exon of mouse Mthfd1, homology arm, cKO region, loxP site.]
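The step from the targeted allele to the constitutive KO allele is Cre-mediated recombination between the two loxP sites flanking exons 9 to 11. As a minimal sketch, assuming both sites are in the same (direct) orientation as in a standard floxed allele, excision can be modelled on a plain DNA string: the floxed segment is removed and a single loxP site remains. The toy allele in the example is hypothetical.

```python
# Model of Cre/loxP excision on a DNA string.
# ASSUMPTION: two loxP sites in direct orientation; uppercase sequence.

# 34-bp loxP: 13-bp inverted repeat, 8-bp spacer, 13-bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Return the allele after recombination between the first two direct
    copies of `site`: the intervening segment and one site copy are lost."""
    first = allele.find(site)
    if first == -1:
        raise ValueError("no loxP site found")
    second = allele.find(site, first + len(site))
    if second == -1:
        raise ValueError("only one loxP site found")
    # Keep sequence through the first site, then resume after the second site.
    return allele[: first + len(site)] + allele[second + len(site):]


# Hypothetical toy floxed allele: 5' flank, loxP, "cKO region", loxP, 3' flank.
floxed = "GATTACA" + LOXP + "CCCGGGCCC" + LOXP + "TTTAAA"
print(cre_excise(floxed) == "GATTACA" + LOXP + "TTTAAA")  # True
```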


Overview of the Dot Plot (window size: 10 bp)

[Figure: Dot plot of the homology arm and cKO region sequence against itself, forward and reverse complement.]

Note: The sequence of the homology arms and cKO region was aligned against itself to detect tandem repeats. No significant tandem repeats were found in the dot-plot matrix, so this region is suitable for PCR screening or sequencing analysis.
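A dot plot of this kind can be sketched by indexing every word of the window length (10 bp here) and recording each pair of positions where the same word occurs; tandem repeats appear as off-diagonal runs parallel to the main diagonal. The Python below is a minimal exact-match version, an assumption rather than the tool used for this report (which may tolerate mismatches); the sequences in the example are toy inputs.

```python
# Exact-word dot plot of a sequence against itself or its reverse complement.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def dot_plot(seq: str, window: int = 10, revcomp: bool = False) -> set[tuple[int, int]]:
    """Return (i, j) pairs where the `window`-long word at i in `seq` equals
    the word at j in `seq` (or in its reverse complement if `revcomp`)."""
    other = reverse_complement(seq) if revcomp else seq
    positions: dict[str, list[int]] = {}
    for j in range(len(other) - window + 1):
        positions.setdefault(other[j : j + window], []).append(j)
    hits = set()
    for i in range(len(seq) - window + 1):
        for j in positions.get(seq[i : i + window], []):
            hits.add((i, j))
    return hits


def off_diagonal(hits: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Drop trivial self-matches on the main diagonal (forward plot only)."""
    return {(i, j) for i, j in hits if i != j}


print((0, 7) in off_diagonal(dot_plot("GATTACA" * 4)))     # True: 7-bp tandem repeat
print(len(off_diagonal(dot_plot("ACGTTGCAAGCTTAGGCATC"))))  # 0: no repeated 10-mer
```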

Overview of the GC Content Distribution (window size: 300 bp)

[Figure: GC content along the homology arm and cKO region sequence, 300-bp windows.]

Summary: full length 7978 bp | A: 1929 (24.18%) | C: 1762 (22.09%) | T: 2366 (29.66%) | G: 1921 (24.08%)

Note: The sequence of the homology arms and cKO region was analyzed for GC content. No region of significantly high GC content was found, so this region is suitable for PCR screening or sequencing analysis.
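The summary counts give the overall GC content directly: (1762 + 1921) / 7978 is about 46.2%. The sketch below computes the same base composition and a per-window GC profile with a 300-bp window. Non-overlapping windows are an assumption, since the step size used for the plot is not stated; the toy sequence in the example is hypothetical.

```python
from collections import Counter
from typing import Optional


def base_composition(seq: str) -> dict[str, tuple[int, float]]:
    """Count and percentage of each base A, C, G, T."""
    seq = seq.upper()
    counts = Counter(seq)
    return {b: (counts[b], 100 * counts[b] / len(seq)) for b in "ACGT"}


def gc_profile(seq: str, window: int = 300, step: Optional[int] = None) -> list[tuple[int, float]]:
    """GC percentage per window as (0-based window start, GC%) pairs.
    ASSUMPTION: the default step equals the window (non-overlapping)."""
    step = step or window
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start : start + window]
        profile.append((start, 100 * (chunk.count("G") + chunk.count("C")) / window))
    return profile


# Overall GC content from the reported counts (C = 1762, G = 1921, length 7978).
print(round(100 * (1762 + 1921) / 7978, 2))  # 46.16

# Toy sequence: a GC-only block followed by an AT-only block.
print(gc_profile("GC" * 150 + "AT" * 150))  # [(0, 100.0), (300, 0.0)]
```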


BLAT Search Results (upstream)

SCORE | QUERY START | QUERY END | QUERY SIZE | IDENTITY | CHROM | STRAND | GENOME START | GENOME END | SPAN
3000 | 1 | 3000 | 3000 | 100.0% | chr12 | + | 76285632 | 76288631 | 3000
447 | 733 | 1252 | 3000 | 95.2% | chr1 | - | 75707615 | 75708240 | 626
437 | 733 | 1203 | 3000 | 96.8% | chr10 | - | 91251779 | 91252670 | 892
424 | 733 | 1255 | 3000 | 94.0% | chr4 | - | 22626454 | 22627060 | 607
422 | 733 | 1183 | 3000 | 97.8% | chrX | - | 143318778 | 143319553 | 776
416 | 733 | 1255 | 3000 | 94.4% | chr6 | - | 122013476 | 122013933 | 458
414 | 733 | 1255 | 3000 | 94.1% | chr6 | - | 122182729 | 122183186 | 458
414 | 733 | 1255 | 3000 | 94.8% | chrY | + | 1194118 | 1194567 | 450
413 | 733 | 1174 | 3000 | 97.7% | chr1 | + | 144156573 | 144157284 | 712
411 | 733 | 1153 | 3000 | 98.9% | chr3 | + | 132626036 | 132626456 | 421
410 | 733 | 1154 | 3000 | 98.6% | chr7 | - | 25917750 | 25918171 | 422
409 | 733 | 1153 | 3000 | 98.6% | chrX | - | 50651737 | 50652157 | 421
409 | 733 | 1155 | 3000 | 98.4% | chr3 | - | 91020042 | 91020464 | 423
409 | 733 | 1153 | 3000 | 98.6% | chr17 | - | 77026765 | 77027185 | 421
409 | 772 | 1255 | 3000 | 97.7% | chr16 | - | 3731267 | 3731834 | 568
409 | 733 | 1153 | 3000 | 98.6% | chr14 | - | 116098849 | 116099269 | 421
409 | 733 | 1153 | 3000 | 98.6% | chr14 | - | 63333560 | 63333980 | 421
409 | 733 | 1153 | 3000 | 98.6% | chr1 | - | 96507318 | 96507738 | 421
409 | 733 | 1153 | 3000 | 98.6% | chrX | + | 142150352 | 142150772 | 421
409 | 733 | 1153 | 3000 | 98.6% | chr4 | + | 58412423 | 58412843 | 421

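Note: The 3000-bp section upstream of exon 9 was BLAT-searched against the genome. Apart from the expected full-length match on chr12, all other hits lie within query positions 733 to 1255 (at most 523 bp of the 3000-bp query), so no significant similarity elsewhere in the genome was found.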
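One way to make "no significant similarity" concrete is to compare how much of the 3000-bp query each hit covers. In the sketch below the 50% coverage cutoff is an arbitrary assumption, not the criterion used for this report; the three example rows are copied from the upstream table, and query start/end are treated as 1-based and inclusive, as the full-length 1 to 3000 hit implies.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int     # 1-based, inclusive
    q_end: int       # 1-based, inclusive
    q_size: int
    identity: float  # percent
    chrom: str


def query_coverage(hit: BlatHit) -> float:
    """Fraction of the query covered by the aligned block."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


def substantial_hits(hits: list[BlatHit], min_coverage: float = 0.5) -> list[BlatHit]:
    """Hits covering at least `min_coverage` of the query.
    ASSUMPTION: 0.5 is an illustrative cutoff only."""
    return [h for h in hits if query_coverage(h) >= min_coverage]


upstream = [
    BlatHit(3000, 1, 3000, 3000, 100.0, "chr12"),
    BlatHit(447, 733, 1252, 3000, 95.2, "chr1"),
    BlatHit(437, 733, 1203, 3000, 96.8, "chr10"),
]
for h in upstream:
    print(h.chrom, f"{query_coverage(h):.1%}")
# Printed one per line: chr12 100.0%, chr1 17.3%, chr10 15.7%
print([h.chrom for h in substantial_hits(upstream)])  # ['chr12']
```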

BLAT Search Results (downstream)

SCORE | QUERY START | QUERY END | QUERY SIZE | IDENTITY | CHROM | STRAND | GENOME START | GENOME END | SPAN
3000 | 1 | 3000 | 3000 | 100.0% | chr12 | + | 76290110 | 76293109 | 3000
100 | 1713 | 1847 | 3000 | 91.1% | chr5 | - | 141551872 | 141552008 | 137
95 | 1727 | 1857 | 3000 | 88.2% | chrX | - | 157158915 | 157159046 | 132
93 | 1726 | 1854 | 3000 | 92.8% | chr8 | - | 13681886 | 13682016 | 131
92 | 1720 | 1847 | 3000 | 86.8% | chr5 | - | 7610969 | 7611113 | 145
90 | 1713 | 1840 | 3000 | 91.1% | chr14 | + | 64474040 | 64474185 | 146
86 | 1726 | 1847 | 3000 | 92.4% | chr1 | - | 193266741 | 193266862 | 122
86 | 1731 | 1847 | 3000 | 90.0% | chrX | + | 112295435 | 112295556 | 122
86 | 1727 | 1847 | 3000 | 88.6% | chr6 | + | 92825880 | 92826001 | 122
85 | 1728 | 1847 | 3000 | 85.9% | chr1 | + | 36265246 | 36265367 | 122
84 | 1718 | 1826 | 3000 | 90.7% | chr1 | - | 131616815 | 131616941 | 127
83 | 1727 | 1846 | 3000 | 90.4% | chr16 | - | 35616321 | 35616441 | 121
82 | 1727 | 1840 | 3000 | 92.0% | chr14 | - | 19990014 | 19990129 | 116
81 | 1726 | 1841 | 3000 | 86.3% | chr1 | + | 118654647 | 118654759 | 113
80 | 1726 | 1840 | 3000 | 83.1% | chrX | - | 166899593 | 166899702 | 110
78 | 1713 | 1813 | 3000 | 91.6% | chr13 | - | 55184063 | 55184168 | 106
78 | 1716 | 1813 | 3000 | 90.9% | chr14 | + | 62676497 | 62676597 | 101
77 | 1726 | 1840 | 3000 | 84.3% | chr5 | - | 127120159 | 127120276 | 118
77 | 1728 | 1840 | 3000 | 85.6% | chr1 | + | 56890466 | 56890580 | 115
76 | 1732 | 1854 | 3000 | 86.8% | chr12 | + | 86627300 | 86627421 | 122

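Note: The 3000-bp section downstream of exon 11 was BLAT-searched against the genome. Apart from the expected full-length match on chr12, all other hits are short (genomic spans of 101 to 146 bp) and lie within query positions 1713 to 1857, so no significant similarity elsewhere in the genome was found.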
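The two full-length chr12 hits also allow a quick cross-check of the ~1478-bp effective cKO region. Assuming, as the notes state, that the upstream query ends immediately before exon 9 and the downstream query begins immediately after exon 11, the bases strictly between the two hits are exons 9 to 11 plus the introns between them. A minimal sketch with 1-based, inclusive coordinates:

```python
def inclusive_length(start: int, end: int) -> int:
    """Length of a 1-based, closed genomic interval [start, end]."""
    return end - start + 1


def gap_between(left_end: int, right_start: int) -> int:
    """Number of bases strictly between two non-overlapping intervals."""
    return right_start - left_end - 1


# chr12 coordinates of the full-length upstream and downstream BLAT hits.
UPSTREAM = (76285632, 76288631)
DOWNSTREAM = (76290110, 76293109)

print(inclusive_length(*UPSTREAM), inclusive_length(*DOWNSTREAM))  # 3000 3000
print(gap_between(UPSTREAM[1], DOWNSTREAM[0]))                     # 1478
```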


Gene and protein information: Mthfd1 methylenetetrahydrofolate dehydrogenase (NADP+ dependent), methenyltetrahydrofolate cyclohydrolase, formyltetrahydrofolate synthase [Mus musculus (house mouse)] Gene ID: 108156, updated on 12-Aug-2019

Gene summary

Official Symbol: Mthfd1 (provided by MGI)
Official Full Name: methylenetetrahydrofolate dehydrogenase (NADP+ dependent), methenyltetrahydrofolate cyclohydrolase, formyltetrahydrofolate synthase (provided by MGI)
Primary source: MGI:MGI:1342005
See related: Ensembl:ENSMUSG00000021048
Gene type: protein coding
RefSeq status: REVIEWED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Dcs; Mthfd; E430024A07Rik
Summary: This gene encodes a trifunctional cytoplasmic enzyme. The encoded protein functions as a methylenetetrahydrofolate dehydrogenase, a methenyltetrahydrofolate cyclohydrolase, and a formyltetrahydrofolate synthase. The encoded enzyme functions in de novo synthesis of purines and thymidylate and in regeneration of methionine from homocysteine. [provided by RefSeq, Oct 2009]
Expression: Ubiquitous expression in kidney adult (RPKM 44.6), liver adult (RPKM 41.3) and 27 other tissues
Orthologs: human, all

Genomic context

Location: chromosome 12; 12 C3

Exon count: 28

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 12 | NC_000078.6 (76254406..76319820)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 12 | NC_000078.5 (77356219..77420807)
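With 1-based, inclusive coordinates, the gene span implied by each NCBI location is end - start + 1. The short sketch below only applies that arithmetic to the two rows of the table; it does not account for why the annotated boundaries differ between assemblies.

```python
def gene_span(start: int, end: int) -> int:
    """Span in bp of a 1-based, inclusive genomic location start..end."""
    return end - start + 1


# NCBI locations for Mthfd1 from the table above.
ANNOTATIONS = {
    "GRCm38.p6 (release 108)": (76254406, 76319820),
    "MGSCv37 (Build 37.2)": (77356219, 77420807),
}

for assembly, (start, end) in ANNOTATIONS.items():
    print(f"{assembly}: {gene_span(start, end)} bp")
# GRCm38.p6 (release 108): 65415 bp
# MGSCv37 (Build 37.2): 64589 bp
```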

[Figure: Position of Mthfd1 on chromosome 12 (NC_000078.6).]


Transcript information: This gene has 7 transcripts

Gene: Mthfd1 ENSMUSG00000021048

Description: methylenetetrahydrofolate dehydrogenase (NADP+ dependent), methenyltetrahydrofolate cyclohydrolase, formyltetrahydrofolate synthase [Source:MGI Symbol;Acc:MGI:1342005]
Gene Synonyms: DCS, E430024A07Rik, Mthfd
Location: Chromosome 12: 76,255,298-76,319,803 forward strand (GRCm38:CM001005.2)
About this gene: This gene has 7 transcripts (splice variants), 262 orthologues and 3 paralogues, is a member of 1 Ensembl protein family, and is associated with 18 phenotypes.

Transcripts:

Name | Transcript ID | Length (bp) | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Mthfd1-201 | ENSMUST00000021443.6 | 3241 | 935 aa | ENSMUSP00000021443.5 | Protein coding | CCDS25990 | Q922D8 | TSL:1, GENCODE basic, APPRIS P1
Mthfd1-206 | ENSMUST00000220046.1 | 3040 | 755 aa | ENSMUSP00000151500.1 | Protein coding | - | A0A1W2P733 | TSL:1, GENCODE basic
Mthfd1-207 | ENSMUST00000220321.1 | 738 | 160 aa | ENSMUSP00000151668.1 | Protein coding | - | A0A1W2P7L5 | CDS 3' incomplete, TSL:2
Mthfd1-202 | ENSMUST00000218010.1 | 674 | No protein | - | Retained intron | - | - | TSL:3
Mthfd1-203 | ENSMUST00000218110.1 | 642 | No protein | - | Retained intron | - | - | TSL:2
Mthfd1-205 | ENSMUST00000218513.1 | 406 | No protein | - | Retained intron | - | - | TSL:3
Mthfd1-204 | ENSMUST00000218331.1 | 359 | No protein | - | lncRNA | - | - | TSL:3


[Figure: Ensembl genome browser view of the Mthfd1 region (84.51 kb, chr12 ~76.26 to 76.32 Mb, comprehensive gene set). Forward strand: Mthfd1-201, -206 and -207 (protein coding), Mthfd1-202, -203 and -205 (retained intron), Mthfd1-204 (lncRNA), and Akap5-201, -202 and -203 (protein coding). Reverse strand: Tex21-201, -202 and -203 (protein coding), Gm47526-201 (processed pseudogene), Gm34868-201 and Gm47525-201 (lncRNA). Also shown: contig AC120002.24 and the Regulatory Build (CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor).]


Transcript: ENSMUST00000021443

[Figure: Ensembl protein view of Mthfd1-201 (ENSMUSP00000021443, 935 aa; 64.51 kb, forward strand). Tracks: low complexity (Seg), Superfamily, Prints, Pfam, PROSITE patterns, PANTHER, HAMAP, Gene3D and CDD, covering the tetrahydrofolate dehydrogenase/cyclohydrolase catalytic and NAD(P)-binding domains, the formate-tetrahydrofolate ligase (FTHFS) domain and the conserved sites of both, the P-loop containing nucleoside triphosphate hydrolase and NAD(P)-binding domain superfamilies (IDs shown: SSF53223, PTHR43274, PTHR43274:SF2, 3.40.50.720, 1.10.8.770, 3.10.410.10, 3.40.50.10860, cd00477), plus sequence variants (missense, splice region, synonymous) from dbSNP and other sources. Scale: 0 to 935 aa.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
