https://www.alphaknockout.com

Mouse Gcm1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Gcm1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Gcm1 gene (NCBI Reference Sequence: NM_008103; Ensembl: ENSMUSG00000023333) is located on mouse chromosome 9. Six exons are identified, with the ATG start codon in exon 2 and the TAA stop codon in exon 6 (Transcript: ENSMUST00000024104). Exon 4 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Gcm1 gene. To engineer the targeting vector, the homology arms and cKO region will be generated by PCR using BAC clone RP23-336E9 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Homozygotes for targeted null mutations exhibit impaired branching of the chorioallantoic interface, absence of the placental labyrinth, lack of fusion of chorionic trophoblast cells, and lethality between embryonic days 5.5-10.

Exon 4 starts at about 25.15% of the coding region. Knockout of exon 4 will result in a frameshift of the gene. The size of intron 3 for 5'-loxP site insertion is 1519 bp, and the size of intron 4 for 3'-loxP site insertion is 950 bp. The size of the effective cKO region is ~613 bp. The cKO region does not contain any other known gene.
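The frameshift statement above comes down to simple arithmetic: removing a number of coding bases that is not a multiple of 3 shifts the downstream reading frame. The following is a minimal sketch in Python. The function name and the example lengths are hypothetical, since this report does not state the coding length of exon 4.

```python
def causes_frameshift(deleted_coding_bp: int) -> bool:
    """Return True if deleting this many coding bases shifts the reading frame."""
    if deleted_coding_bp <= 0:
        raise ValueError("deleted_coding_bp must be a positive number of bases")
    # A deletion keeps the frame only when it removes whole codons.
    return deleted_coding_bp % 3 != 0


# Hypothetical lengths for illustration only (not taken from this report).
for length in (150, 151):
    print(length, "frameshift" if causes_frameshift(length) else "in-frame")
```

To apply the check to this project, substitute the coding length of exon 4 taken from the transcript annotation.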


Overview of the Targeting Strategy

Figure: Schematic of the targeting strategy. Panels, top to bottom: wildtype allele (with the 5' and 3' gRNA regions marked), targeting vector, targeted allele, and constitutive KO allele (after Cre recombination).

Legend: exon of mouse Gcm1; homology arm; cKO region; loxP site.


Overview of the Dot Plot

Window size: 10 bp

Figure: Dot plot of Sequence 12 aligned against itself, showing forward and reverse-complement matches.

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
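The self-alignment described in the note can be sketched in a few lines of Python. This is an illustrative k-mer approach, not the tool that produced the plot in this report: it records every pair of positions whose 10 bp windows are identical (forward) or reverse-complementary. The function names and the toy sequence are assumptions made for the example.

```python
from collections import defaultdict
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 10) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (forward, reverse) lists of matching window start positions."""
    seq = seq.upper()
    index = defaultdict(list)  # window sequence -> list of start positions
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for kmer, positions in index.items():
        # Off-diagonal forward dots: the same window at two different positions.
        forward.extend((a, b) for a in positions for b in positions if a < b)
        # Reverse-complement dots: the window matches the opposite strand elsewhere.
        reverse.extend((a, b) for a in positions for b in index.get(reverse_complement(kmer), []))
    return sorted(forward), sorted(reverse)


# Toy sequence (hypothetical): a 12 bp unit repeated in tandem, then a short tail.
toy = "ACGGTCATGCTA" * 2 + "TTTTT"
forward_dots, reverse_dots = self_dot_plot(toy, window=10)
print(forward_dots)
```

Runs of forward dots at a constant offset from the main diagonal are what a tandem repeat looks like in such a plot.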

Overview of the GC Content Distribution

Window size: 300 bp

Figure: GC content distribution along Sequence 12.

Summary: Full Length(7113bp) | A(27.23% 1937) | C(26.53% 1887) | T(24.59% 1749) | G(21.65% 1540)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
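The summary figures above can be cross-checked from the base counts alone. The sketch below recomputes the total length and the overall GC percentage from the four counts in the summary line, and adds a sliding-window helper of the kind a 300 bp window analysis would use. The helper names and the `step` parameter are assumptions for illustration; the report does not state how its windows advance.

```python
from typing import Dict, Iterator, Tuple

# Base counts copied from the summary line of this report.
BASE_COUNTS: Dict[str, int] = {"A": 1937, "C": 1887, "T": 1749, "G": 1540}


def gc_percent(counts: Dict[str, int]) -> float:
    """Overall GC content as a percentage of all counted bases."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq: str, window: int = 300, step: int = 1) -> Iterator[Tuple[int, float]]:
    """Yield (start, GC fraction) for each full window along the sequence."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, (chunk.count("G") + chunk.count("C")) / window


print(sum(BASE_COUNTS.values()))           # total length in bp
print(round(gc_percent(BASE_COUNTS), 2))   # overall GC percentage
```

The counts sum to the stated full length of 7113 bp and give an overall GC content of about 48.2%, consistent with the note that no high-GC region stands out.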


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr9   +       78058099   78061098   3000
YourSeq  305    1902   2296  3000   96.0%     chr4   +       45593957   45594496   540
YourSeq  287    1917   2250  3000   96.9%     chr4   -       142245405  142245951  547
YourSeq  277    1901   2232  3000   97.0%     chr3   -       152482890  152483422  533
YourSeq  269    1901   2288  3000   95.5%     chr16  -       91345643   91346094   452
YourSeq  268    1904   2217  3000   96.0%     chr4   +       45594036   45594527   492
YourSeq  265    1902   2216  3000   97.6%     chr4   +       103271062  103271646  585
YourSeq  264    1923   2296  3000   94.5%     chr11  +       4979648    4980042    395
YourSeq  253    1902   2235  3000   92.9%     chr14  +       117386210  117386735  526
YourSeq  239    1931   2250  3000   94.1%     chr19  -       46442076   46442764   689
YourSeq  231    1901   2261  3000   91.7%     chr4   -       115650072  115650400  329
YourSeq  226    1904   2189  3000   93.2%     chr14  +       117386205  117386745  541
YourSeq  221    1899   2197  3000   98.0%     chr4   -       133560303  133560883  581
YourSeq  216    1946   2296  3000   95.5%     chr12  -       52896048   52896637   590
YourSeq  210    1902   2169  3000   93.1%     chr14  +       117386190  117386775  586
YourSeq  206    1913   2167  3000   97.0%     chr5   -       105968637  105969063  427
YourSeq  196    2037   2296  3000   93.9%     chr14  +       8006799    8007117    319
YourSeq  185    1904   2189  3000   86.2%     chr13  -       73600857   73601100   244
YourSeq  185    1901   2293  3000   90.0%     chr19  +       16447224   16447467   244
YourSeq  185    1474   1731  3000   87.0%     chr10  +       130140113  130140365  253

Note: The 3000 bp section upstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr9   +       78061712   78064711   3000
YourSeq  335    698    2996  3000   91.2%     chr10  +       130140484  130149592  9109
YourSeq  101    1354   1549  3000   85.6%     chrX   -       93223771   93224381   611
YourSeq  97     1326   1468  3000   80.8%     chr19  -       42713492   42713616   125
YourSeq  97     1368   1530  3000   82.6%     chr19  -       40491875   40492022   148
YourSeq  94     1368   1530  3000   86.9%     chr2   +       157491130  157491294  165
YourSeq  92     1369   1536  3000   88.9%     chr10  +       41954441   41954618   178
YourSeq  90     1368   1535  3000   76.8%     chr1   -       135116730  135116897  168
YourSeq  87     1354   1530  3000   86.7%     chr18  +       5654007    5654205    199
YourSeq  87     1368   1478  3000   89.2%     chr15  +       63496289   63496399   111
YourSeq  83     1368   1477  3000   88.2%     chr4   +       131695701  131695811  111
YourSeq  82     1384   1535  3000   88.0%     chr15  +       27616727   27616878   152
YourSeq  81     1348   1478  3000   84.7%     chr11  -       12226956   12227085   130
YourSeq  81     1368   1478  3000   86.7%     chr10  -       126140735  126140844  110
YourSeq  81     1354   1463  3000   87.3%     chr4   +       106071761  106071871  111
YourSeq  80     1368   1471  3000   88.5%     chr19  -       22975579   22975682   104
YourSeq  80     1368   1471  3000   87.7%     chr14  -       65914303   65914404   102
YourSeq  80     1368   1478  3000   89.3%     chr12  +       17141675   17141785   111
YourSeq  80     1354   1472  3000   83.9%     chr10  +       100027815  100027951  137
YourSeq  79     1389   1515  3000   91.6%     chr9   -       30171936   30172082   147

Note: The 3000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
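The on-target rows of the two BLAT searches also allow a quick consistency check on the stated cKO size. The sketch below takes the chr9 coordinates of the upstream and downstream 3000 bp query sections from the first row of each table and computes the gap between them. The variable and function names are illustrative assumptions, and the sketch assumes inclusive, 1-based coordinates as displayed in the tables.

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive (start, end) on chr9, forward strand

# On-target hits copied from the first row of each BLAT table in this report.
UPSTREAM_SECTION: Interval = (78058099, 78061098)
DOWNSTREAM_SECTION: Interval = (78061712, 78064711)


def span(interval: Interval) -> int:
    """Length in bp of an inclusive interval."""
    start, end = interval
    return end - start + 1


def gap_between(left: Interval, right: Interval) -> Interval:
    """Inclusive interval lying strictly between two non-overlapping intervals."""
    if left[1] >= right[0]:
        raise ValueError("intervals overlap or are out of order")
    return (left[1] + 1, right[0] - 1)


cko_region = gap_between(UPSTREAM_SECTION, DOWNSTREAM_SECTION)
print(cko_region, span(cko_region))
```

The gap spans chr9:78061099-78061711, which is 613 bp and agrees with the effective cKO region size of ~613 bp given in the strategy summary.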


Gene and information:

Gcm1 glial cells missing homolog 1 [Mus musculus (house mouse)]
Gene ID: 14531, updated on 10-Oct-2019

Gene summary

Official Symbol: Gcm1 (provided by MGI)
Official Full Name: glial cells missing homolog 1 (provided by MGI)
Primary source: MGI:MGI:108045
See related: Ensembl:ENSMUSG00000023333
Gene type: protein coding
RefSeq status: PROVISIONAL
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: GCMa; glide; Gcm-rs2; Gcm1-rs1
Expression: Biased expression in placenta adult (RPKM 1.9) and kidney adult (RPKM 1.2)

Genomic context

Location: 9 E1; 9 43.49 cM

Exon count: 6

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  9    NC_000075.6 (78051958..78065624)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    9    NC_000075.5 (77899765..77913431)

Chromosome 9 - NC_000075.6


Transcript information: This gene has 1 transcript

Gene: Gcm1 ENSMUSG00000023333

Description: glial cells missing homolog 1 [Source:MGI Symbol;Acc:MGI:108045]
Gene Synonyms: GCMa, Gcm 1, Gcm a, Gcm1-rs1, glial cell deficient, glide
Location: Chromosome 9: 78,051,924-78,065,624 forward strand. GRCm38:CM001002.2
About this gene: This gene has 1 transcript (splice variant), 142 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 9 phenotypes.

Transcripts

Name      Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt        Flags
Gcm1-201  ENSMUST00000024104.8  2068  436aa    ENSMUSP00000024104.7  Protein coding  CCDS23356  P70348 Q3UQD1  TSL:1, GENCODE basic, APPRIS P1

Figure: Ensembl genomic region view, 33.70 kb, spanning 78.05-78.07 Mb (forward and reverse strands). Tracks: genes (Comprehensive set...) showing Gcm1-201 (protein coding) and Gm8058-201 (processed pseudogene); contig AC160334.2; Regulatory Build.

Regulation legend: CTCF; Enhancer; Promoter Flank; Transcription Factor Binding Site.

Gene legend: Protein coding (merged Ensembl/Havana); Non-protein coding (pseudogene).


Transcript: ENSMUST00000024104

Figure: Ensembl transcript and protein view of Gcm1-201 (protein coding), 13.70 kb, forward strand. Protein scale bar: 0 to 436 aa.

Annotation tracks for ENSMUSP00000024...:
PDB-ENSP mappings
Superfamily: GCM domain superfamily
Pfam: Transcription regulator GCM domain
PROSITE profiles: Transcription regulator GCM domain
PANTHER: Chorion-specific transcription factor GCM; PTHR12414:SF6
Gene3D: 3.30.1370.90; 2.20.28.80
All sequence SNPs/i...: Sequence variants (dbSNP and all other sources)

Variant legend: missense variant; synonymous variant.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
