https://www.alphaknockout.com

Mouse Mthfd2l Knockout Project (CRISPR/Cas9)

Objective: To create an Mthfd2l knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Mthfd2l gene (NCBI Reference Sequence: NM_026788; Ensembl: ENSMUSG00000029376) is located on mouse chromosome 5. Eight exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 8 (Transcript: ENSMUST00000071652). Exons 2~3 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 11.54% of the coding region. Exons 2~3 cover 30.37% of the coding region. The effective KO region is ~2156 bp. The KO region does not contain any other known gene.
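One way to read the two percentages in the note is as ratios of coding bases. "Starts at" would be the coding bases before exon 2 divided by the total CDS, and "covers" would be the coding bases in exons 2~3 divided by the total CDS. The minimal sketch below shows that arithmetic. The per-exon coding lengths are HYPOTHETICAL, because the report does not list them. They were chosen to sum to 1014 bp (338 aa × 3, stop codon excluded) and to reproduce the reported figures.

```python
def cds_position(coding_lengths, first, last):
    """Return (start %, coverage %) for exons first..last (1-based, inclusive).

    coding_lengths[i] is the number of coding bases in exon i + 1.
    """
    total = sum(coding_lengths)
    before = sum(coding_lengths[: first - 1])        # coding bases upstream of the target
    covered = sum(coding_lengths[first - 1 : last])  # coding bases inside the target
    return 100 * before / total, 100 * covered / total


# HYPOTHETICAL coding lengths for exons 1..8 (sum = 1014); not taken from the report.
lengths = [117, 150, 158, 120, 130, 110, 119, 110]

start_pct, cover_pct = cds_position(lengths, 2, 3)
print(f"Exon 2 starts at {start_pct:.2f}% of the CDS; exons 2~3 cover {cover_pct:.2f}%")
```

With these illustrative inputs the script reports 11.54% and 30.37%, which matches the note.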


Overview of the Targeting Strategy

[Figure: wildtype allele of mouse Mthfd2l with exons 1, 2, 3 and 8 labelled, and 5' and 3' gRNA regions flanking the knockout region (exons 2~3). Legend: exon of mouse Mthfd2l; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: self-alignment dot plot of the upstream sequence, forward and reverse complement panels]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine whether there are tandem repeats. No significant tandem repeats are found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
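A self dot plot with a 15 bp window places a dot wherever a 15-mer at one position matches a 15-mer at another. A repeat then shows up as dots off the main diagonal. The sketch below assumes exact word matching, because the report does not state the scoring scheme its plotting tool used. It is an illustration of the technique, not the tool behind the figure.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def kmer_index(seq, k):
    """Map every k-mer of seq to the list of positions where it starts."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i : i + k], []).append(i)
    return index


def dot_plot(seq_x, seq_y, k=15):
    """Set of (i, j) such that seq_x[i:i+k] == seq_y[j:j+k] (exact word matches)."""
    index = kmer_index(seq_y, k)
    dots = set()
    for i in range(len(seq_x) - k + 1):
        for j in index.get(seq_x[i : i + k], []):
            dots.add((i, j))
    return dots


def off_diagonal_dots(seq, k=15):
    """Forward self-comparison without the trivial main diagonal.

    Any dot left here means some 15-mer occurs more than once, which is how
    tandem (or dispersed) repeats appear in a self dot plot.
    """
    return {(i, j) for (i, j) in dot_plot(seq, seq, k) if i != j}


# The reverse-complement panel corresponds to dot_plot(seq, revcomp(seq), k).
```

Applied to a 2000 bp flank, an empty `off_diagonal_dots` result corresponds to the "no significant tandem repeats" reading. A short periodic sequence such as `"GATTACA" * 5` produces dots offset by the 7 bp repeat unit.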

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: self-alignment dot plot of the downstream sequence, forward and reverse complement panels]

Note: The 2000 bp section downstream of Exon 3 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
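The note says the gRNA site was chosen outside the tandem repeats. The sketch below shows one way to express that constraint. It assumes SpCas9 with an NGG PAM and a 20-nt spacer, scans the forward strand only, and takes repeat intervals as 0-based half-open (start, end) pairs, for example read off a dot plot. None of these parameters are stated in the report. The reverse strand can be scanned by passing the reverse complement of the sequence.

```python
import re


def candidate_protospacers(seq, repeats, spacer_len=20):
    """Forward-strand SpCas9 sites (spacer + NGG PAM) that avoid repeat intervals.

    ASSUMPTIONS: SpCas9 NGG PAM, 20-nt spacer, repeats given as 0-based
    half-open (start, end) intervals. Returns (spacer_start, spacer, pam) tuples.
    """
    seq = seq.upper()
    sites = []
    # A zero-width lookahead also reports overlapping PAMs (e.g. inside "NGGG").
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        start = pam_start - spacer_len
        if start < 0:
            continue  # not enough upstream sequence for a full spacer
        end = pam_start + 3  # end of PAM, half-open
        if any(start < r_end and r_start < end for r_start, r_end in repeats):
            continue  # spacer or PAM overlaps a repeat interval
        sites.append((start, seq[start:pam_start], seq[pam_start:end]))
    return sites
```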


Overview of the GC Content Distribution (up) Window size: 300 bp


Summary: Full Length(2000bp) | A(26.3% 526) | C(24.5% 490) | T(29.35% 587) | G(19.85% 397)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine its GC content. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.
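The Summary lines are plain base counts over each 2000 bp flank. From the reported counts, GC content is (490 + 397)/2000 = 44.35% upstream and (394 + 422)/2000 = 40.8% downstream. The minimal sketch below recomputes a Summary line in the report's format from a sequence. The function name is illustrative only.

```python
from collections import Counter


def composition_summary(seq):
    """Format base counts like the report's Summary line and return (line, GC %)."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    # Same base order as the report: A, C, T, G.
    parts = [f"{b}({100 * counts[b] / n:g}% {counts[b]})" for b in "ACTG"]
    gc_percent = 100 * (counts["G"] + counts["C"]) / n
    return f"Full Length({n}bp) | " + " | ".join(parts), gc_percent
```

Feeding it any 2000 bp sequence with the upstream counts (526 A, 490 C, 587 T, 397 G) reproduces the upstream Summary line and a GC content of 44.35%.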

Overview of the GC Content Distribution (down) Window size: 300 bp


Summary: Full Length(2000bp) | A(27.95% 559) | C(19.7% 394) | T(31.25% 625) | G(21.1% 422)

Note: The 2000 bp section downstream of Exon 3 is analyzed to determine its GC content. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.
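The distribution plots use a 300 bp window. A sliding-window GC scan that flags high-GC stretches can be sketched as follows. The 0.70 threshold and the 1 bp step are ASSUMPTIONS, because the report does not say what it treats as a "significant high GC-content region". Prefix sums keep each window O(1).

```python
def gc_windows(seq, window=300, step=1):
    """(start, GC fraction) for every window of the given size, via a prefix sum."""
    seq = seq.upper()
    prefix = [0]
    for base in seq:
        prefix.append(prefix[-1] + (base in ("G", "C")))
    return [
        (start, (prefix[start + window] - prefix[start]) / window)
        for start in range(0, len(seq) - window + 1, step)
    ]


def high_gc_windows(seq, window=300, threshold=0.70, step=1):
    """Start positions of windows whose GC fraction reaches the threshold.

    ASSUMPTION: 0.70 is an illustrative cut-off, not the report's criterion.
    """
    return [start for start, gc in gc_windows(seq, window, step) if gc >= threshold]
```

An empty result from `high_gc_windows` on a flank is the programmatic counterpart of "no significant high-GC region".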


BLAT Search Results (up)

QUERY    SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END  SPAN
YourSeq   2000      1   2000   2000    100.0%  chr5   +        90944791   90946790  2000
YourSeq    520    110    770   2000     92.9%  chr12  -        14760391   14761238   848
YourSeq    481    136    796   2000     90.6%  chr6   +        82229937   82230541   605
YourSeq    455    144    796   2000     90.5%  chr15  -        22937472   22938346   875
YourSeq    445    243    793   2000     93.1%  chr1   -       160163789  160164425   637
YourSeq    376    283    735   2000     92.6%  chr6   -        84679139   84679981   843
YourSeq    361    396    796   2000     95.3%  chr7   +       128828269  128828671   403
YourSeq    359    398    795   2000     95.5%  chrX   -       159843334  159843731   398
YourSeq    359    396    796   2000     95.0%  chr18  -        31468289   31468691   403
YourSeq    358    396    830   2000     92.9%  chr4   +        27630520   27631331   812
YourSeq    357    396    796   2000     94.8%  chr7   +        67901854   67902256   403
YourSeq    356    396    796   2000     94.8%  chr8   +        77304719   77305121   403
YourSeq    356    396    795   2000     94.8%  chr8   +        15862184   15862585   402
YourSeq    356    401    796   2000     95.2%  chr4   +        87099364   87099761   398
YourSeq    356    396    796   2000     95.0%  chr1   +       163892417  163892819   403
YourSeq    355    396    796   2000     94.8%  chr9   -        49375892   49376294   403
YourSeq    355    396    796   2000     94.5%  chr18  +        10162768   10163170   403
YourSeq    354    396    796   2000     94.5%  chr14  +        34113547   34113947   401
YourSeq    354    396    796   2000     94.5%  chr12  +       109248675  109249075   401
YourSeq    353    396    796   2000     94.0%  chr7   -        73094670   73095067   398

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
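BLAT rows in this layout can be screened in code. The sketch below parses each row, discards the full-length self-match (score equal to the query size at 100% identity), and keeps other hits whose score reaches a fraction of the query length. The report does not state its significance criterion. `min_fraction=0.5` is a HYPOTHETICAL cut-off, and the helper names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(line):
    """Parse 'YourSeq SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN'."""
    f = line.split()  # f[0] is the query name
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def secondary_hits(hits, min_fraction=0.5):
    """Non-self hits scoring at least min_fraction of the query size (ASSUMED cut-off)."""
    out = []
    for h in hits:
        is_self = h.score == h.q_size and h.identity == 100.0
        if not is_self and h.score >= min_fraction * h.q_size:
            out.append(h)
    return out


rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr5 + 90944791 90946790 2000",
    "YourSeq 520 110 770 2000 92.9% chr12 - 14760391 14761238 848",
]
hits = [parse_blat_row(r) for r in rows]
```

Under this cut-off, the best non-self upstream hit (score 520 of 2000) is not reported. Lowering `min_fraction` to 0.25 would include it.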

BLAT Search Results (down)

QUERY    SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END  SPAN
YourSeq   2000      1   2000   2000    100.0%  chr5   +        90948947   90950946  2000
YourSeq    102   1518   1713   2000     84.4%  chr8   -        73144586   73144781   196
YourSeq     95   1557   1713   2000     85.4%  chr1   -       183243590  183243743   154
YourSeq     95   1518   1676   2000     88.6%  chr5   +       114044968  114045151   184
YourSeq     93   1577   1706   2000     83.4%  chr5   +       134574511  134574637   127
YourSeq     92   1577   1713   2000     83.8%  chr8   -        75059992   75060122   131
YourSeq     91   1578   1713   2000     86.7%  chr3   +       143044003  143044140   138
YourSeq     84   1579   1706   2000     82.2%  chr16  +        94005315   94005441   127
YourSeq     84   1581   1715   2000     81.3%  chr14  +        31356915   31357049   135
YourSeq     83   1565   1706   2000     80.5%  chr15  -        27616689   27616827   139
YourSeq     82   1559   1693   2000     87.5%  chr8   -       114412426  114412559   134
YourSeq     82   1577   1710   2000     85.6%  chr11  +        20955062   20955196   135
YourSeq     81   1517   1667   2000     88.6%  chr18  -        47812147   47812297   151
YourSeq     81   1520   1690   2000     83.1%  chr1   -        39203186   39203356   171
YourSeq     81   1578   1690   2000     84.0%  chr15  +        94841362   94841473   112
YourSeq     80   1516   1633   2000     91.7%  chr6   -        32661559   32661698   140
YourSeq     80   1577   1690   2000     83.5%  chr11  +        87320946   87321055   110
YourSeq     79   1525   1713   2000     91.6%  chr5   +       112882881  112883075   195
YourSeq     79   1557   1665   2000     85.5%  chr3   +        34713100   34713206   107
YourSeq     78   1577   1690   2000     92.4%  chr2   +       132839172  132839302   131

Note: The 2000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
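The two full-length self-hits place the upstream flank at chr5:90944791-90946790 and the downstream flank at chr5:90948947-90950946. If each flank ends exactly at the edge of the deleted segment, the bases between them form the KO region. That boundary assumption is not stated in the report. The arithmetic is sketched below with 1-based inclusive coordinates, as BLAT reports them.

```python
# Full-length self-hit coordinates from the two BLAT tables (chr5, 1-based inclusive).
upstream_flank = (90944791, 90946790)    # 2000 bp upstream of exon 2
downstream_flank = (90948947, 90950946)  # 2000 bp downstream of exon 3


def gap_between(up, down):
    """First base, last base and length strictly between two inclusive intervals."""
    first = up[1] + 1
    last = down[0] - 1
    return first, last, last - first + 1


first, last, size = gap_between(upstream_flank, downstream_flank)
print(f"chr5:{first}-{last} ({size} bp)")
```

The gap comes out at 2156 bp, consistent with the ~2156 bp effective KO region given in the strategy note.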


Gene and protein information: Mthfd2l methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2-like [Mus musculus (house mouse)]
Gene ID: 665563, updated on 10-Oct-2019

Gene summary

Official Symbol: Mthfd2l (provided by MGI)
Official Full Name: methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2-like (provided by MGI)
Primary source: MGI:MGI:1915871
See related: Ensembl:ENSMUSG00000029376
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 1110019K23Rik; C630010D07Rik
Expression: Ubiquitous expression in CNS E14 (RPKM 10.4), whole brain E14.5 (RPKM 10.0) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 5; 5 E1
Exon count: 11

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 5 | NC_000071.6 (90930873..91021368)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 5 | NC_000071.5 (91360222..91450394)

[Figure: Chromosome 5 - NC_000071.6]


Transcript information: This gene has 4 transcripts

Gene: Mthfd2l ENSMUSG00000029376

Description: methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2-like [Source:MGI Symbol;Acc:MGI:1915871]
Gene Synonyms: 1110019K23Rik, C630010D07Rik
Location: Chromosome 5: 90,931,117-91,021,368 forward strand. GRCm38:CM000998.2
About this gene: This gene has 4 transcripts (splice variants), 193 orthologues, 3 paralogues, is a member of 1 Ensembl protein family and is associated with 4 phenotypes.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Mthfd2l-201 | ENSMUST00000071652.5 | 2159 | 338aa | ENSMUSP00000071578.4 | Protein coding | CCDS39145 | D3YZG8 | TSL:5, GENCODE basic, APPRIS P1
Mthfd2l-203 | ENSMUST00000202781.1 | 688 | 62aa | ENSMUSP00000144302.1 | Protein coding | - | A0A0J9YUR4 | TSL:1, GENCODE basic
Mthfd2l-202 | ENSMUST00000201851.1 | 611 | No protein | - | Retained intron | - | - | TSL:5
Mthfd2l-204 | ENSMUST00000202852.1 | 4148 | No protein | - | lncRNA | - | - | TSL:3
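The strategy targets ENSMUST00000071652 (Mthfd2l-201), but the report does not say how that transcript was chosen. The sketch below shows one plausible heuristic, stated purely as an ASSUMPTION: among protein-coding transcripts, prefer an APPRIS principal isoform, then the longest protein. The row layout is a hypothetical encoding of the table above.

```python
# (name, transcript ID, protein length in aa or None, biotype, flags) from the table above.
transcripts = [
    ("Mthfd2l-201", "ENSMUST00000071652.5", 338, "Protein coding",
     {"TSL:5", "GENCODE basic", "APPRIS P1"}),
    ("Mthfd2l-203", "ENSMUST00000202781.1", 62, "Protein coding",
     {"TSL:1", "GENCODE basic"}),
    ("Mthfd2l-202", "ENSMUST00000201851.1", None, "Retained intron", {"TSL:5"}),
    ("Mthfd2l-204", "ENSMUST00000202852.1", None, "lncRNA", {"TSL:3"}),
]


def pick_reference(rows):
    """HYPOTHETICAL heuristic: APPRIS principal first, then longest protein."""
    coding = [r for r in rows if r[3] == "Protein coding"]
    return max(coding, key=lambda r: (any(f.startswith("APPRIS P") for f in r[4]), r[2]))


print(pick_reference(transcripts)[1])
```

Here the heuristic lands on Mthfd2l-201, the same transcript the strategy uses. Other rules, such as preferring a lower TSL, could rank transcripts differently.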

[Figure: Ensembl genome view of a 110.25 kb region (about 90.94 to 91.02 Mb on chromosome 5) showing the Mthfd2l transcripts Mthfd2l-201 and Mthfd2l-203 (protein coding), Mthfd2l-202 (retained intron) and Mthfd2l-204 (lncRNA); nearby genes Gm42530-201 (lncRNA), Epgn-201 and Epgn-202 (protein coding); contigs AC157938.9 and AC109311.12; and the Ensembl Regulatory Build (CTCF, enhancer, open chromatin, promoter, promoter flank and transcription factor binding site features).]


Transcript: ENSMUST00000071652

[Figure: Mthfd2l-201 (protein coding; 90.09 kb, forward strand) and its 338 aa protein with annotated features: low complexity (Seg); Superfamily SSF53223 (NAD(P)-binding domain superfamily); Prints, Pfam, PROSITE, PANTHER (PTHR10025:SF42), HAMAP and CDD tetrahydrofolate dehydrogenase/cyclohydrolase signatures (catalytic domain, NAD(P)-binding domain and conserved site); Gene3D 3.40.50.10860 and 3.40.50.720; and sequence variants from dbSNP and other sources (missense, splice region and synonymous). Scale bar: 0 to 338 aa.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
