https://www.alphaknockout.com

Mouse Mif4gd Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Mif4gd conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Mif4gd gene (NCBI Reference Sequence: NM_027162; Ensembl: ENSMUSG00000020743) is located on Mouse chromosome 11. 5 exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 5 (Transcript: ENSMUST00000106507). Exons 2 to 4 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in the loss of function of the Mouse Mif4gd gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP23-368N3 as template. Cas9, gRNA and targeting vector will be co-injected into fertilized eggs for cKO Mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note:

Exon 2 starts from about 12.46% of the coding region. The knockout of exons 2 to 4 will result in a frameshift of the gene. The size of intron 1 for 5'-loxP site insertion: 2254 bp, and the size of intron 4 for 3'-loxP site insertion: 447 bp. The size of the effective cKO region: ~1061 bp. The cKO region does not contain any other known gene.
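The two numeric claims in the note can be checked with a short sketch. It assumes the 222 aa protein length listed for ENSMUST00000106507 in the transcript table, and it assumes the coding region is counted without the stop codon. The function names and the example deletion lengths are hypothetical, since the report does not list individual exon sizes.

```python
# Sketch only: function names and the example deletion lengths are hypothetical.

def coding_offset(fraction: float, protein_length_aa: int) -> int:
    """Approximate coding-sequence position (bp) for a fractional offset.

    Assumes the coding region is counted without the stop codon.
    """
    cds_length_bp = protein_length_aa * 3
    return round(fraction * cds_length_bp)


def causes_frameshift(deleted_coding_bp: int) -> bool:
    """A deletion shifts the reading frame when its coding length
    is not a multiple of 3."""
    return deleted_coding_bp % 3 != 0


# 12.46% of a 222 aa coding region, expressed in base pairs.
print(coding_offset(0.1246, 222))

# Hypothetical deleted coding lengths, for illustration only.
print(causes_frameshift(400))
print(causes_frameshift(399))
```

Under these assumptions, exon 2 would begin roughly 83 bp into the coding sequence, so a frameshift from deleting exons 2 to 4 would disrupt most of the protein.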


Overview of the Targeting Strategy

[Figure: schematic of four alleles drawn 5' to 3'. Wildtype allele with exons labeled 1 to 5 and two gRNA regions marked; Targeting vector; Targeted allele; Constitutive KO allele (after Cre recombination).]

Legends: Exon of mouse Mif4gd; Homology arm; cKO region; Exon of mouse Mrps7; loxP site
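The step from the targeted allele to the constitutive KO allele can be illustrated with a toy string model of Cre recombination. This is a sketch under stated assumptions: the report does not give the loxP sequence, so the commonly cited 34 bp site is used; both sites are assumed to be in the same orientation; and the flanking and floxed sequences are placeholders, not Mif4gd sequence. The function name `cre_excise` is hypothetical.

```python
# Commonly cited 34 bp loxP site (assumption: not stated in the report).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str, loxp: str = LOXP) -> str:
    """Toy model of Cre excision between two same-orientation loxP sites.

    Removes the sequence between the first two loxP sites together with
    one of the sites, leaving a single loxP behind. Alleles with fewer
    than two sites are returned unchanged.
    """
    first = allele.find(loxp)
    if first == -1:
        return allele
    second = allele.find(loxp, first + len(loxp))
    if second == -1:
        return allele
    return allele[: first + len(loxp)] + allele[second + len(loxp):]


# Placeholder sequences standing in for the arms and the cKO region.
targeted = "AAAA" + LOXP + "CCCCGGGG" + LOXP + "TTTT"
print(cre_excise(targeted))
```

The model ignores site orientation and recombination efficiency. It only shows why a single loxP site remains in the KO allele.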


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
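A self-alignment of this kind can be sketched as an exact-match dot plot. This is not necessarily the method used to produce the figure. It assumes exact matching of fixed-size windows (10 bp by default, as in the figure) and a DNA alphabet of A, C, G and T. The function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def self_dot_plot(seq: str, window: int = 10):
    """Return (forward, reverse) lists of (i, j) window start positions.

    forward: pairs i < j whose windows are identical (repeats).
    reverse: pairs where the window at j equals the reverse complement
             of the window at i (inverted repeats).
    """
    index = {}
    n_windows = len(seq) - window + 1
    for i in range(n_windows):
        index.setdefault(seq[i:i + window], []).append(i)

    forward = sorted(
        (i, j)
        for positions in index.values()
        for i in positions
        for j in positions
        if i < j
    )
    reverse = sorted(
        (i, j)
        for i in range(n_windows)
        for j in index.get(reverse_complement(seq[i:i + window]), [])
    )
    return forward, reverse


# Toy sequence with one 7 bp direct repeat (not Mif4gd sequence).
forward, reverse = self_dot_plot("ACGGTCATTTTACGGTCA", window=7)
print(forward, reverse)
```

Off-diagonal forward hits that line up at a constant offset would correspond to the tandem repeats mentioned in the note.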

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length (7534 bp) | A (23.35%, 1759) | C (26.81%, 2020) | T (23.92%, 1802) | G (25.92%, 1953)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found. It may be difficult to construct this targeting vector.
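The summary figures can be reproduced from the base counts, and the windowed profile can be sketched in a few lines. The base counts are taken from the summary line. The sliding-window function is a minimal illustration that assumes a step of one base, which the report does not specify. The function names are hypothetical.

```python
# Base counts from the summary line of the report.
COUNTS = {"A": 1759, "C": 2020, "T": 1802, "G": 1953}


def base_percentages(counts: dict) -> dict:
    """Percentage of each base, rounded to two decimals."""
    total = sum(counts.values())
    return {base: round(100 * n / total, 2) for base, n in counts.items()}


def gc_percent(counts: dict) -> float:
    """Overall GC content in percent, rounded to two decimals."""
    total = sum(counts.values())
    return round(100 * (counts["G"] + counts["C"]) / total, 2)


def windowed_gc(seq: str, window: int = 300) -> list:
    """GC fraction of every full window, stepping one base at a time
    (the step size is an assumption)."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


print(sum(COUNTS.values()))
print(base_percentages(COUNTS))
print(gc_percent(COUNTS))
```

The overall GC content from these counts is close to 53%, so the difficulty flagged in the note comes from local high-GC windows rather than from the average.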


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr11  -       115609899  115612898  3000
YourSeq  186    2147   2799  3000   92.8%     chr11  -       107097530  107134688  37159
YourSeq  181    2147   2802  3000   83.3%     chr12  +       8666909    8667257    349
YourSeq  172    2160   2340  3000   98.4%     chr8   +       104582532  104582716  185
YourSeq  172    1889   2340  3000   93.0%     chr1   +       156516155  156516740  586
YourSeq  171    1892   2352  3000   91.7%     chr1   +       131912729  131913261  533
YourSeq  168    2142   2357  3000   91.6%     chr7   -       110199560  110199781  222
YourSeq  166    2160   2803  3000   82.1%     chr11  +       6607946    6608289    344
YourSeq  162    2143   2353  3000   89.8%     chr6   +       30473856   30474064   209
YourSeq  159    2162   2533  3000   91.7%     chr11  +       97880814   97881198   385
YourSeq  158    2147   2340  3000   91.0%     chr13  -       100626725  100626917  193
YourSeq  157    2147   2341  3000   89.7%     chr4   -       120634525  120634699  175
YourSeq  155    2144   2361  3000   92.9%     chr10  -       127686497  127686725  229
YourSeq  154    2147   2340  3000   92.4%     chr1   -       127930114  127930317  204
YourSeq  154    2160   2533  3000   91.0%     chr12  +       73135325   73135710   386
YourSeq  153    2147   2354  3000   90.0%     chr11  +       115742276  115742483  208
YourSeq  153    2161   2796  3000   81.2%     chr11  +       58094228   58094609   382
YourSeq  152    2160   2354  3000   91.4%     chr11  -       4864561    4864773    213
YourSeq  152    2158   2354  3000   91.9%     chr1   -       86164927   86165139   213
YourSeq  152    2148   2340  3000   91.8%     chr13  +       54660351   54660543   193

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr11  -       115605838  115608837  3000
YourSeq  138    1008   1203  3000   93.8%     chr7   +       30042896   30209341   166446
YourSeq  131    1015   1203  3000   94.6%     chr6   +       91832063   91832577   515
YourSeq  130    997    1204  3000   89.6%     chr18  -       65511990   65512235   246
YourSeq  127    997    1193  3000   91.6%     chr15  -       100327353  100327598  246
YourSeq  126    1042   1245  3000   88.5%     chr11  +       114569274  114569469  196
YourSeq  121    975    1236  3000   83.3%     chr6   -       140675955  140676117  163
YourSeq  121    992    1201  3000   89.7%     chr4   -       135980089  135980575  487
YourSeq  119    989    1243  3000   82.3%     chr3   +       121432671  121432817  147
YourSeq  118    1034   1202  3000   93.5%     chr2   +       32504840   32505397   558
YourSeq  113    989    1230  3000   91.2%     chr5   -       92111361   92111742   382
YourSeq  112    989    1201  3000   91.8%     chr12  +       84646037   84646405   369
YourSeq  110    997    1244  3000   81.1%     chr11  +       57525032   57525171   140
YourSeq  109    998    1212  3000   93.0%     chr17  -       56642539   56643041   503
YourSeq  108    998    1235  3000   81.8%     chr5   +       147413556  147413687  132
YourSeq  107    976    1105  3000   92.2%     chr16  +       56191832   56191971   140
YourSeq  101    989    1243  3000   92.5%     chr4   +       55324136   55324474   339
YourSeq  100    1058   1204  3000   92.5%     chr13  +       96743796   97103842   360047
YourSeq  98     989    1245  3000   79.9%     chr10  +       121433971  121434118  148
YourSeq  97     989    1245  3000   91.6%     chr2   -       60834985   60835418   434

Note: The 3000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
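The top hit in each BLAT table gives the genomic coordinates of the corresponding 3000 bp query window on chr11. The short sketch below checks the reported spans and measures the interval between the two windows. It assumes 1-based inclusive coordinates and assumes that the two windows directly flank the cKO region. The function names are hypothetical.

```python
# Top (100% identity) hits from the two BLAT tables, chr11, minus strand.
UPSTREAM = (115609899, 115612898)    # 3000 bp window upstream of exon 2
DOWNSTREAM = (115605838, 115608837)  # 3000 bp window downstream of exon 4


def span(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


def gap_between(lower: tuple, upper: tuple) -> int:
    """Number of bases strictly between two non-overlapping intervals,
    where `lower` has the smaller genomic coordinates."""
    return upper[0] - lower[1] - 1


print(span(*UPSTREAM), span(*DOWNSTREAM))
print(gap_between(DOWNSTREAM, UPSTREAM))
```

Under these assumptions the interval between the two windows is 1061 bp, which agrees with the effective cKO region size stated in the strategy note.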


Gene information: Mif4gd MIF4G domain containing [Mus musculus]

Gene ID: 69674, updated on 10-Oct-2019

Gene summary

Official Symbol: Mif4gd (provided by MGI)
Official Full Name: MIF4G domain containing (provided by MGI)
Primary source: MGI:MGI:1916924
See related: Ensembl:ENSMUSG00000020743
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 1110014L05Rik; 2310075G12Rik
Expression: Broad expression in testis adult (RPKM 121.7), adrenal adult (RPKM 65.9) and 22 other tissues

Genomic context

Location: 11; 11 E2

Exon count: 7

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  11   NC_000077.6 (115607918..115612969, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    11   NC_000077.5 (115469232..115474267, complement)

[Figure: genomic context on Chromosome 11 - NC_000077.6]


Transcript information: This gene has 10 transcripts

Gene: Mif4gd ENSMUSG00000020743

Description: MIF4G domain containing [Source:MGI Symbol;Acc:MGI:1916924]
Gene Synonyms: 1110014L05Rik, 2310075G12Rik
Location: Chromosome 11: 115,607,918-115,612,969 reverse strand. GRCm38:CM001004.2
About this gene: This gene has 10 transcripts (splice variants), 199 orthologues, 2 paralogues, is a member of 1 Ensembl protein family and is associated with 1 phenotype.

Transcripts

Name        Transcript ID          bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Mif4gd-203  ENSMUST00000106507.8   1656  222aa       ENSMUSP00000102116.2  Protein coding  CCDS25643  Q3UBZ5   TSL:1, GENCODE basic, APPRIS P1
Mif4gd-201  ENSMUST00000021087.13  1372  222aa       ENSMUSP00000021087.7  Protein coding  CCDS25643  Q3UBZ5   TSL:1, GENCODE basic, APPRIS P1
Mif4gd-202  ENSMUST00000106506.7   993   203aa       ENSMUSP00000102115.1  Protein coding  CCDS56818  A2A9W3   TSL:3, GENCODE basic
Mif4gd-210  ENSMUST00000148574.1   628   162aa       ENSMUSP00000119643.1  Protein coding  -          A2A9W1   CDS 3' incomplete, TSL:3
Mif4gd-206  ENSMUST00000137304.7   1271  No protein  -                     lncRNA          -          -        TSL:1
Mif4gd-204  ENSMUST00000124407.7   1142  No protein  -                     lncRNA          -          -        TSL:1
Mif4gd-209  ENSMUST00000146244.7   869   No protein  -                     lncRNA          -          -        TSL:3
Mif4gd-205  ENSMUST00000127132.7   828   No protein  -                     lncRNA          -          -        TSL:5
Mif4gd-208  ENSMUST00000142637.1   621   No protein  -                     lncRNA          -          -        TSL:2
Mif4gd-207  ENSMUST00000139556.1   299   No protein  -                     lncRNA          -          -        TSL:5


[Figure: genome browser view of a 25.05 kb region on chromosome 11 (115.60 Mb to 115.62 Mb), comprehensive gene set.
Forward strand: Gm25364-201 (snRNA), Mrps7-202 (lncRNA), Mrps7-201 (protein coding).
Contig: AL645470.20.
Reverse strand: Gga3 transcripts 212 (nonsense mediated decay), 202, 201, 205 (protein coding), 206, 204, 209, 210, 207 (lncRNA); Mif4gd transcripts 201, 203, 202, 210 (protein coding), 206, 204, 205, 209, 208, 207 (lncRNA); Slc25a19 transcripts 201, 212, 202, 205 (protein coding), 203, 208 (lncRNA).
Regulatory Build legend: CTCF; Enhancer; Open Chromatin; Promoter Flank.
Gene legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana); Non-Protein Coding (RNA gene; processed transcript).]


Transcript: ENSMUST00000106507

[Figure: protein domain view of Mif4gd-203 (protein coding; reverse strand, 4.59 kb), with a scale bar from 0 to 222 amino acids.
Annotation tracks for ENSMUSP00000102...: Low complexity (Seg); Superfamily: Armadillo-type fold; SMART: MIF4G-like, type 3; Pfam: MIF4G-like, type 3; PANTHER: PTHR23254, PTHR23254:SF17; Gene3D: MIF4G-like domain superfamily.
Sequence variants (dbSNP and all other sources). Variant legend: missense variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
