http://www.alphaknockout.com/ Mouse Lrg1 Conditional Knockout Project (CRISPR/Cas9)-vA

Objective: To create an Lrg1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Lrg1 gene (NCBI Reference Sequence: NM_029796; Ensembl: ENSMUSG00000037095) is located on mouse chromosome 17. Two exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 2 (Transcript: ENSMUST00000041357). Exon 2 will be selected as the conditional knockout region (cKO region). The 5' loxP site will be inserted in intron 1, and the second loxP site will be inserted downstream of the TGA stop codon. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP23-180H19 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a knock-out allele exhibit Tgfbr1-dependent increased growth of LCC tumor allografts.

Knocking out exon 2 will result in a frameshift of the gene; exon 2 covers 97.47% of the coding region. The size of intron 1 for 5'-loxP site insertion is 916 bp. The size of the effective cKO region is ~1341 bp. This strategy is designed based on genetic information in existing databases. Due to the complexity of biological processes, not all risks of loxP insertion to gene transcription, RNA splicing and translation can be predicted at the existing technological level.

The cKO region contains a functional region of the Plin5 gene. Knocking out this region may affect the function of the Plin5 gene.

Overview of the Targeting Strategy

[Figure: schematic of the Lrg1 locus on chromosome 17 (exons 1 and 2, drawn 5' to 3') with the gRNA regions and the TGA stop codon marked. Panels, top to bottom: Wildtype allele; Targeting vector; Targeted allele; Constitutive KO allele (after Cre recombination).]

Legends: Exon of mouse Sema6b; Homology arm; Exon of mouse Lrg1; cKO region; loxP site.

Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
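The self-alignment described in this note can be illustrated with a minimal sketch. It assumes an exact-match criterion on 10 bp words and checks the forward strand only; the tool actually used for the report is not named in the source, and `self_dot_plot` and the toy sequence are hypothetical names introduced for illustration.

```python
from collections import defaultdict

def self_dot_plot(seq, window=10):
    """Return sorted (i, j) pairs, i < j, where the window-length words
    starting at positions i and j are identical (forward strand only)."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    hits = []
    for starts in positions.values():
        # Every pair of occurrences of the same word is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)

# Toy sequence (not project data): a 15 bp unit repeated after a 4 bp spacer.
unit = "ACGTACGTAGCTAGG"
toy = unit + "TTTT" + unit
hits = self_dot_plot(toy, window=10)
print(hits)
```

An empty result for a real sequence would correspond to a dot plot with no off-diagonal lines, which is the situation the note reports for this region.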

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(7767bp) | A(22.96% 1783) | C(27.63% 2146) | G(28.74% 2232) | T(20.68% 1606)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found, so it may be difficult to construct this targeting vector.
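As a rough illustration of the numbers above, the overall GC content follows directly from the base counts in the summary line, and a sliding window gives the per-position profile. This is a minimal sketch: the function names are hypothetical, and the step size of 1 bp is an assumption, since the report states only the 300 bp window size.

```python
def gc_percent(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total

def windowed_gc(seq, window=300):
    """GC percentage of every full window, sliding one base at a time
    (the 1 bp step is an assumption, not stated in the report)."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

# Base counts taken from the summary line of the report.
counts = {"A": 1783, "C": 2146, "G": 2232, "T": 1606}
print(sum(counts.values()))            # full length in bp
print(round(gc_percent(counts), 2))    # overall GC percentage

# Toy sequence (not project data) to show the windowed profile.
print(windowed_gc("GGCCAATT", window=4))
```

With the reported counts, G plus C is 4378 of 7767 bases, or about 56.37% overall.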

BLAT Search Results (up)

QUERY     SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-------------------------------------------------------------------------------------------
YourSeq    3000      1   3000   3000    100.0%  chr17  -        56121195   56124194    3000
YourSeq      46   1320   1374   3000     96.1%  chr5   -        75395028   75395242     215
YourSeq      37   1382   1426   3000     93.4%  chr7   +       132834730  132834777      48
YourSeq      36   1384   1428   3000     80.5%  chr3   +       114469312  114469352      41
YourSeq      33   1381   1418   3000     94.8%  chr4   -        45240329   45240369      41
YourSeq      33   1369   1415   3000     85.2%  chr14  +       115718776  115718822      47
YourSeq      32   1386   1428   3000     79.0%  chr11  +        29882496   29882534      39
YourSeq      30    632    680   3000     97.0%  chr5   -        30719423   30719476      54
YourSeq      30   1490   1557   3000     81.6%  chr2   -       105435004  105435069      66
YourSeq      30   1386   1418   3000     97.0%  chr2   -       103475442  103475476      35
YourSeq      30   1324   1353   3000    100.0%  chr12  -       110127569  110127598      30
YourSeq      29   1382   1418   3000     89.2%  chr1   +       101686476  101686512      37
YourSeq      29   1393   1459   3000     94.0%  chr1   +         7697829    7697897      69
YourSeq      28   1375   1414   3000     85.0%  chr2   -       155596941  155596980      40
YourSeq      28   1387   1418   3000     93.8%  chr14  -        66913356   66913387      32
YourSeq      28    672    724   3000     77.4%  chr13  -       110685255  110685309      55
YourSeq      28   1323   1352   3000     96.7%  chr1   -       120186252  120186281      30
YourSeq      28   1383   1414   3000     93.8%  chr13  +       109909425  109909456      32
YourSeq      27   1382   1418   3000     86.5%  chr4   -       108648456  108648492      37
YourSeq      27   1386   1418   3000     91.0%  chr1   -        78464380   78464412      33

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY     SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-------------------------------------------------------------------------------------------
YourSeq    3000      1   3000   3000    100.0%  chr17  -        56116678   56119677    3000
YourSeq     284    229   1931   3000     93.4%  chr11  +        86538339   87134063  595725
YourSeq     272    226   1861   3000     92.0%  chr11  +        19683486   20109590  426105
YourSeq     203   1485   1862   3000     91.1%  chr19  -        21263798   21264328     531
YourSeq     192   1488   1863   3000     86.9%  chr1   -        74226084   74226403     320
YourSeq     189   1485   1863   3000     92.7%  chr11  +       117908208  117908790     583
YourSeq     172    235    666   3000     89.5%  chr15  +        85633384   85633975     592
YourSeq     165   1492   1866   3000     88.3%  chr10  +        31399868   31400218     351
YourSeq     155   1492   1866   3000     91.4%  chr11  +        46794896   46795353     458
YourSeq     146   1560   1863   3000     90.1%  chr10  -        78496879   78497316     438
YourSeq     141    234    383   3000     97.4%  chr4   +       148520568  148520717     150
YourSeq     138   1486   1649   3000     92.7%  chr10  -        60699769   60699942     174
YourSeq     138    233    794   3000     86.9%  chr1   -       119573281  119573810     530
YourSeq     138   1488   1862   3000     85.2%  chr11  +        22889916   22890234     319
YourSeq     137   1485   1883   3000     82.1%  chr2   +        37974945   37975117     173
YourSeq     136    226    382   3000     93.6%  chr17  -        45628664   45628822     159
YourSeq     135    230    383   3000     95.4%  chr8   -       109112220  109112383     164
YourSeq     134   1492   1937   3000     82.5%  chr2   +        90437305   90437465     161
YourSeq     132    229    384   3000     89.6%  chr2   -       122748591  122748733     143
YourSeq     132    226    367   3000     95.1%  chr4   +       107322075  107322215     141

Note: The 3000 bp section downstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
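For readers who want to inspect the BLAT tables programmatically, the following is a minimal sketch that parses whitespace-separated result rows and reports the highest-scoring hit other than the query's own locus. The `BlatHit` type, the helper names, and the rule used to recognise the self-hit (100% identity over the full query length) are assumptions made for illustration; the report does not state what score it treats as significant.

```python
from typing import NamedTuple, Optional

class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int

def parse_hit(row: str) -> BlatHit:
    """Parse one result row; field 0 is the query name (e.g. YourSeq)."""
    f = row.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))

def best_off_target(rows) -> Optional[BlatHit]:
    """Highest-scoring hit that is not a full-length, 100%-identity match."""
    hits = [parse_hit(r) for r in rows]
    off = [h for h in hits
           if not (h.identity == 100.0 and h.q_end - h.q_start + 1 == h.q_size)]
    return max(off, key=lambda h: h.score) if off else None

# First rows of the upstream and downstream tables in this report.
upstream = [
    "YourSeq 3000 1 3000 3000 100.0% chr17 - 56121195 56124194 3000",
    "YourSeq 46 1320 1374 3000 96.1% chr5 - 75395028 75395242 215",
    "YourSeq 37 1382 1426 3000 93.4% chr7 + 132834730 132834777 48",
]
downstream = [
    "YourSeq 3000 1 3000 3000 100.0% chr17 - 56116678 56119677 3000",
    "YourSeq 284 229 1931 3000 93.4% chr11 + 86538339 87134063 595725",
    "YourSeq 272 226 1861 3000 92.0% chr11 + 19683486 20109590 426105",
]
up_best = best_off_target(upstream)
down_best = best_off_target(downstream)
print(up_best.score, up_best.chrom)
print(down_best.score, down_best.chrom)
```

In both tables the self-hit spans exactly 3000 bp on chr17 (end minus start plus one), which matches the stated length of each searched section.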

Gene and protein information:

Lrg1 leucine-rich alpha-2-glycoprotein 1 [Mus musculus (house mouse)]
Gene ID: 76905, updated on 25-Sep-2020

Gene summary

Official Symbol: Lrg1 (provided by MGI)
Official Full Name: leucine-rich alpha-2-glycoprotein 1 (provided by MGI)
Primary source: MGI:MGI:1924155
See related: Ensembl:ENSMUSG00000037095
Gene type: protein coding
RefSeq status: PROVISIONAL
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Lr; Lrg; Lrhg; 1300008B03Rik; 2310031E04Rik
Expression: Biased expression in liver adult (RPKM 229.9), mammary gland adult (RPKM 136.7) and 10 other tissues
Orthologs: human, all

Genomic context

Location: 17 29.17 cM; 17 D

Exon count: 2

Annotation release  Status             Assembly                      Chr  Location
109                 current            GRCm39 (GCF_000001635.27)     17   NC_000083.7 (56426678..56428946, complement)
108.20200622        previous assembly  GRCm38.p6 (GCF_000001635.26)  17   NC_000083.6 (56119678..56121946, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    17   NC_000083.5 (56259101..56261369, complement)

Chromosome 17 - NC_000083.7

Transcript information: This gene has 1 transcript

Gene: Lrg1 ENSMUSG00000037095

Description: leucine-rich alpha-2-glycoprotein 1 [Source:MGI Symbol;Acc:MGI:1924155]
Gene Synonyms: 1300008B03Rik, 2310031E04Rik, Lrhg
Location: Chromosome 17: 56,119,678-56,122,001 reverse strand. GRCm38:CM001010.2
About this gene: This gene has 1 transcript (splice variant), 102 orthologues, 23 paralogues and is associated with 1 phenotype.

Transcripts

Name: Lrg1-201
Transcript ID: ENSMUST00000041357.8
bp: 1408
Protein: 342aa
Translation ID: ENSMUSP00000038048.7
Biotype: Protein coding
CCDS: CCDS28895
UniProt Match: Q91XL1
Flags: TSL:1, GENCODE basic, APPRIS P1
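The 916 bp intron 1 size quoted in the strategy summary can be cross-checked against the figures above. The sketch below assumes that the Ensembl gene location (56,119,678-56,122,001) is also the span of the single two-exon transcript, so that the one intron equals the genomic span minus the 1408 bp transcript length; `span_length` is a hypothetical helper name.

```python
def span_length(start: int, end: int) -> int:
    """Length in bp of a closed, 1-based genomic interval."""
    return end - start + 1

# Values taken from the Ensembl gene and transcript information in this report.
gene_start, gene_end = 56_119_678, 56_122_001   # GRCm38, chromosome 17
transcript_bp = 1408                            # Lrg1-201, exons 1 and 2 combined

# Assumption: the only transcript spans the whole gene, so with two exons
# there is exactly one intron and it accounts for the remaining bases.
genomic_bp = span_length(gene_start, gene_end)
intron1_bp = genomic_bp - transcript_bp
print(genomic_bp, intron1_bp)
```

Under that assumption the genomic span is 2324 bp and the intron is 916 bp, in agreement with the stated size of intron 1.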

[Figure: Ensembl genome browser view of a 22.32 kb region of chromosome 17 (56.11 Mb to 56.13 Mb), with contig CT009719.10 and the reverse-strand protein-coding transcripts Plin4-201, Plin4-202, Plin4-203, Plin5-201, Plin5-202, Lrg1-201, Sema6b-201 and Sema6b-202, plus the Regulatory Build track.]

Gene Legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana)

Regulation Legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site

Transcript: ENSMUST00000041357

[Figure: Lrg1-201 (protein coding, reverse strand, 2.32 kb) with protein feature tracks for ENSMUSP00000038... on a scale of 0 to 342 amino acids. Tracks: Low complexity (Seg); Cleavage site (Sign...); Superfamily SSF52058; SMART (Leucine-rich repeat, typical subtype; Cysteine-rich flanking region, C-terminal; SM00364); Prints PR00019; Pfam (Leucine-rich repeat; Cysteine-rich flanking region, C-terminal); PROSITE profiles (Leucine-rich repeat); PANTHER (PTHR45617:SF68; PTHR45617); Gene3D (Leucine-rich repeat domain superfamily); Sequence variants (dbSNP and all other sources).]

Variant Legend: inframe deletion; missense variant; synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC, VectorBuilder.
