Mouse Lcorl Knockout Project (CRISPR/Cas9)
http://www.alphaknockout.com/

Objective: To create an Lcorl knockout mouse model (C57BL/6N) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Lcorl gene (NCBI Reference Sequence: NM_178142; Ensembl: ENSMUSG00000015882) is located on mouse chromosome 5. For transcript ENSMUST00000045586, seven exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 7. Exons 2-4 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 16.08% of the coding region, and exons 2-4 cover 29.21% of the coding region. The effective KO region is ~19967 bp and contains no other known gene.
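The coding-region percentages can be turned into approximate base counts. This is a minimal sketch, assuming the coding region of ENSMUST00000045586 is 315 codons (the 315 aa protein listed for this transcript) plus the TAG stop codon, i.e. 948 bp; the helper `percent_to_bp` is hypothetical.

```python
# Assumption: the coding region of ENSMUST00000045586 is 315 codons plus the
# TAG stop codon; the percentages quoted in the strategy are rounded.
PROTEIN_LENGTH_AA = 315
CDS_LENGTH_BP = (PROTEIN_LENGTH_AA + 1) * 3  # 948 bp including the stop codon


def percent_to_bp(percent: float, cds_length: int = CDS_LENGTH_BP) -> int:
    """Convert a percentage of the coding region into an approximate bp count."""
    return round(cds_length * percent / 100)


exon2_offset = percent_to_bp(16.08)  # coding bases before exon 2 starts
deleted_cds = percent_to_bp(29.21)   # coding bases contained in exons 2-4

print(f"CDS length: {CDS_LENGTH_BP} bp")                 # CDS length: 948 bp
print(f"Exon 2 starts ~{exon2_offset} bp into the CDS")  # ~152 bp
print(f"Exons 2-4 hold ~{deleted_cds} coding bp")        # ~277 bp
```

Because the source percentages are rounded and it is not stated whether the stop codon is counted, these figures are too coarse to decide whether removing exons 2-4 shifts the reading frame.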


Overview of the Targeting Strategy

[Figure: wildtype allele of mouse Lcorl (exons 1-7) with the 5' and 3' gRNA regions flanking the knockout region (exons 2-4). Legend: exon of mouse Lcorl; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot of the 2000 bp section upstream of exon 2 aligned with itself (forward and reverse complement).]

Note: The 2000 bp section upstream of exon 2 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot-plot matrix, so this region is suitable for PCR screening or sequencing analysis.
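The self-alignment described in the note can be sketched as a word-match dot plot: every 15-bp window is compared with every other window, directly and as its reverse complement, and tandem repeats show up as off-diagonal runs parallel to the main diagonal. This is a minimal sketch assuming exact window matches; the source does not say which tool or mismatch tolerance produced its plot, and all function names are hypothetical.

```python
# Word-match dot plot: every 15-bp window is compared with every other window,
# both as-is (forward) and against the reverse complement.
WINDOW = 15
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dot_plot(seq: str, window: int = WINDOW) -> tuple[set, set]:
    """Return (forward, reverse) sets of (i, j) window pairs that match exactly.

    (i, j) is in `forward` when seq[i:i+window] == seq[j:j+window], and in
    `reverse` when seq[i:i+window] is the reverse complement of seq[j:j+window].
    """
    seq = seq.upper()
    positions: dict[str, list[int]] = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)

    forward, reverse = set(), set()
    for j in range(len(seq) - window + 1):
        word = seq[j:j + window]
        forward.update((i, j) for i in positions.get(word, []))
        reverse.update((i, j) for i in positions.get(reverse_complement(word), []))
    return forward, reverse


def repeat_dots(forward: set) -> set:
    """Drop the trivial main diagonal (each window matching itself)."""
    return {(i, j) for i, j in forward if i != j}


fwd, _ = dot_plot("GATTACA" * 5)  # hypothetical 7-bp tandem repeat
print(sorted(repeat_dots(fwd))[:3])
```

On a real 2000 bp section, an empty `repeat_dots(forward)` set means no 15-bp window occurs twice in the same orientation.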

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot of the 2000 bp section downstream of exon 4 aligned with itself (forward and reverse complement).]

Note: The 2000 bp section downstream of exon 4 is aligned with itself to check for tandem repeats. No significant tandem repeat is found in the dot-plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up) Window size: 300 bp

[Figure: GC content distribution of the 2000 bp section upstream of exon 2.]

Summary: full length 2000 bp | A 571 (28.55%) | C 412 (20.6%) | G 334 (16.7%) | T 683 (34.15%)

Note: The 2000 bp section upstream of exon 2 is analyzed for GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
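The GC plot slides a 300 bp window along the section and reports the GC fraction of each window. A minimal sketch of that scan follows; the 0.70 cut-off for "high GC" is an assumption (the source does not define one), and the function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sliding_gc(seq: str, window: int = 300, step: int = 1):
    """Yield (0-based start, GC fraction) for every full window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def high_gc_windows(seq: str, window: int = 300, threshold: float = 0.70) -> list[int]:
    """Start positions of windows whose GC fraction reaches the threshold.

    The 0.70 cut-off is an assumption; the source does not define "high GC".
    """
    return [start for start, gc in sliding_gc(seq, window) if gc >= threshold]


demo = "AT" * 200 + "GC" * 200  # hypothetical 800-bp sequence, GC-rich second half
hits = high_gc_windows(demo)
print(len(hits), hits[0], hits[-1])
```

An empty result from `high_gc_windows` on a 2000 bp section corresponds to the "no significant high GC-content region" finding.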

Overview of the GC Content Distribution (down) Window size: 300 bp

[Figure: GC content distribution of the 2000 bp section downstream of exon 4.]

Summary: full length 2000 bp | A 577 (28.85%) | C 302 (15.1%) | G 313 (15.65%) | T 808 (40.4%)

Note: The 2000 bp section downstream of exon 4 is analyzed for GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
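As a quick consistency check, the overall GC content of each 2000 bp section follows directly from the base counts in the two summary lines: (G + C) divided by the section length. The dictionary names below are illustrative.

```python
# Base counts copied from the two "Summary" lines of the GC analysis.
UPSTREAM = {"A": 571, "C": 412, "G": 334, "T": 683}
DOWNSTREAM = {"A": 577, "C": 302, "G": 313, "T": 808}


def gc_percent(counts: dict[str, int]) -> float:
    """Overall GC content, in percent, of a sequence given its base counts."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


for name, counts in (("upstream", UPSTREAM), ("downstream", DOWNSTREAM)):
    print(f"{name}: {sum(counts.values())} bp, GC {gc_percent(counts):.2f}%")
```

Both sets of counts sum to 2000 bp, giving 37.30% GC upstream of exon 2 and 30.75% GC downstream of exon 4.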


BLAT Search Results (up)

(QSTART/QEND: position within the 2000 bp query; TSTART/TEND: position on the chromosome)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr5   -       45795359   45797358   2000
YourSeq  272    1401    1927  2000   89.2%     chrX   -       52871911   52872319   409
YourSeq  263    1410    1931  2000   90.2%     chr2   +       31070115   31070543   429
YourSeq  231    1415    1923  2000   88.7%     chr14  +       53287658   53287934   277
YourSeq  162    1757    1944  2000   94.6%     chr11  +       5768146    5768336    191
YourSeq  160    1398    1931  2000   85.9%     chr5   +       114735199  114735368  170
YourSeq  160    1397    1932  2000   86.4%     chr10  +       41301372   41301539   168
YourSeq  160    1765    1934  2000   97.7%     chr1   +       183258498  183258670  173
YourSeq  155    1397    1931  2000   85.6%     chr8   -       105946246  105946412  167
YourSeq  155    1762    1941  2000   95.9%     chr5   -       151010893  151011089  197
YourSeq  155    1773    1934  2000   98.2%     chr11  -       93398405   93398579   175
YourSeq  153    1762    1931  2000   97.6%     chr11  -       116721098  116721272  175
YourSeq  152    1400    1566  2000   95.9%     chrX   +       130722082  130722250  169
YourSeq  150    1391    1566  2000   95.3%     chr4   -       7498767    7498965    199
YourSeq  150    1784    1954  2000   94.3%     chr8   +       107111631  107111794  164
YourSeq  150    1396    1564  2000   96.4%     chr3   +       59272429   59272601   173
YourSeq  149    1397    1565  2000   94.7%     chrX   -       77672811   77672980   170
YourSeq  149    1779    1931  2000   98.7%     chr5   -       90266513   90266665   153
YourSeq  149    1403    1566  2000   96.4%     chr6   +       146573171  146573342  172
YourSeq  148    1403    1566  2000   95.7%     chr2   -       161226627  161226790  164

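Note: The 2000 bp section upstream of exon 2 is BLAT-searched against the genome. Apart from the expected full-length match at the Lcorl locus on chr5, all other listed hits are short partial alignments (score at most 272, genomic span at most 429 bp) confined to query positions 1391-1954, so no significant similarity to other loci is found.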
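The reasoning behind this note can be sketched as a small filter: parse the rows, set aside the full-length hit at the Lcorl locus, and summarize what remains. This is a minimal sketch using four rows copied from the upstream table; the class, the function names, and the self-hit rule (full-length, 100% identity) are assumptions, not part of any BLAT tool.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_row(line: str) -> BlatHit:
    """Parse one whitespace-separated row: QUERY SCORE QSTART ... SPAN."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def is_self_hit(h: BlatHit) -> bool:
    """Assumed rule: a full-length, 100% identity match is the target locus."""
    return h.q_start == 1 and h.q_end == h.q_size and h.identity == 100.0


ROWS = """\
YourSeq 2000 1 2000 2000 100.0% chr5 - 45795359 45797358 2000
YourSeq 272 1401 1927 2000 89.2% chrX - 52871911 52872319 409
YourSeq 263 1410 1931 2000 90.2% chr2 + 31070115 31070543 429
YourSeq 162 1757 1944 2000 94.6% chr11 + 5768146 5768336 191"""

hits = [parse_row(line) for line in ROWS.splitlines()]
others = [h for h in hits if not is_self_hit(h)]
print("best off-target score:", max(h.score for h in others))
print("query positions involved:", min(h.q_start for h in others),
      "-", max(h.q_end for h in others))
```

Running the same filter over all 20 upstream rows yields the summary in the note: the best remaining score is 272 of 2000.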

BLAT Search Results (down)

(QSTART/QEND: position within the 2000 bp query; TSTART/TEND: position on the chromosome)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  2000   1       2000  2000   100.0%    chr5   -       45773392   45775391   2000
YourSeq  33     1421    1465  2000   79.0%     chr6   -       21314090   21314130   41
YourSeq  24     1112    1139  2000   92.9%     chr1   +       12325785   12325812   28

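Note: The 2000 bp section downstream of exon 4 is BLAT-searched against the genome. Apart from the full-length match at the Lcorl locus on chr5, only two very short hits (scores 33 and 24) are reported, so no significant similarity to other loci is found.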
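The two full-length BLAT hits also pin down the knockout region. Because Lcorl lies on the reverse strand, the section upstream of exon 2 maps to higher chromosome coordinates than the section downstream of exon 4. Assuming each 2000 bp section ends exactly at its exon boundary, the stretch between the two hits reproduces the ~19967 bp effective KO region given in the strategy summary. A minimal sketch; the helper name is hypothetical.

```python
# Chromosome 5 coordinates (1-based, inclusive) of the two 2000 bp flanking
# sections, copied from the full-length BLAT hits.
UPSTREAM_FLANK = (45795359, 45797358)    # upstream of exon 2
DOWNSTREAM_FLANK = (45773392, 45775391)  # downstream of exon 4


def region_between(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Interval strictly between two non-overlapping 1-based inclusive intervals."""
    left, right = sorted([a, b])
    return left[1] + 1, right[0] - 1


ko_start, ko_end = region_between(UPSTREAM_FLANK, DOWNSTREAM_FLANK)
print(ko_start, ko_end, ko_end - ko_start + 1)  # 45775392 45795358 19967
```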

Gene information: Lcorl ligand dependent nuclear receptor corepressor-like [Mus musculus (house mouse)]
Gene ID: 209707, updated on 24-Oct-2019

Gene summary

Official Symbol: Lcorl (provided by MGI)
Official Full Name: ligand dependent nuclear receptor corepressor-like (provided by MGI)
Primary source: MGI:MGI:2651932
See related: Ensembl:ENSMUSG00000015882
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Mlr1
Expression: Broad expression in whole brain E14.5 (RPKM 2.2), CNS E18 (RPKM 2.0) and 18 other tissues
Orthologs: human; all

Genomic context

Location: 5; 5 B3
Exon count: 12

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  5    NC_000071.6 (45697181..45857835, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    5    NC_000071.5 (46091680..46248779, complement)
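The location rows give the gene span on each assembly and allow a simple check that the knockout region sits inside Lcorl. This is a minimal sketch that assumes the BLAT coordinates of the two flanking sections refer to GRCm38 (the source does not name the assembly); the knockout interval is taken as the stretch between those flanks, 45775392..45795358. Constant and function names are hypothetical.

```python
# NCBI gene coordinates on chromosome 5 (1-based, inclusive).
GENE_GRCM38 = (45697181, 45857835)
GENE_MGSCV37 = (46091680, 46248779)
# Assumption: stretch between the two BLAT-mapped flanking sections, on GRCm38.
KO_REGION = (45775392, 45795358)


def length(interval: tuple[int, int]) -> int:
    """Length of a 1-based inclusive interval."""
    return interval[1] - interval[0] + 1


def contains(outer: tuple[int, int], inner: tuple[int, int]) -> bool:
    """True when `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


print(length(GENE_GRCM38), length(GENE_MGSCV37))  # 160655 157100
print(contains(GENE_GRCM38, KO_REGION))            # True
```

Comparisons like this are only meaningful within one assembly: the same knockout interval falls outside the MGSCv37 gene coordinates, which are shifted by roughly 0.4 Mb.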

[Figure: Chromosome 5 - NC_000071.6]


Transcript information: This gene has 12 transcripts

Gene: Lcorl ENSMUSG00000015882

Description: ligand dependent nuclear receptor corepressor-like [Source:MGI Symbol;Acc:MGI:2651932]
Gene Synonyms: A830039H10Rik, Mlr1
Location: Chromosome 5: 45,697,181-45,857,615 reverse strand. GRCm38:CM000998.2
About this gene: This gene has 12 transcripts (splice variants), 127 orthologues, 1 paralogue and is a member of 2 Ensembl protein families.

Transcripts

Name       Transcript ID          bp     Protein     Translation ID        Biotype                  CCDS       UniProt     Flags
Lcorl-202  ENSMUST00000045586.12  5708   315aa       ENSMUSP00000042677.6  Protein coding           CCDS19277  Q3U285      TSL:1, GENCODE basic
Lcorl-204  ENSMUST00000087164.9   4979   517aa       ENSMUSP00000084408.3  Protein coding           CCDS19278  Q3U285      TSL:1, GENCODE basic
Lcorl-201  ENSMUST00000016026.13  4884   600aa       ENSMUSP00000016026.7  Protein coding           CCDS51499  Q3U285      TSL:1, GENCODE basic
Lcorl-212  ENSMUST00000238522.1   10477  1851aa      ENSMUSP00000158672.1  Protein coding           -          -           GENCODE basic, APPRIS P1
Lcorl-210  ENSMUST00000190036.6   5463   220aa       ENSMUSP00000140503.2  Protein coding           -          A0A0H2UH31  CDS 5' incomplete, TSL:1
Lcorl-205  ENSMUST00000121573.7   2378   232aa       ENSMUSP00000112416.1  Protein coding           -          E9PXY2      TSL:1, GENCODE basic
Lcorl-209  ENSMUST00000189859.6   384    128aa       ENSMUSP00000139996.2  Protein coding           -          A0A087WQ10  CDS 5' and 3' incomplete, TSL:3
Lcorl-207  ENSMUST00000186633.2   351    117aa       ENSMUSP00000141174.2  Protein coding           -          A0A087WST0  CDS 5' and 3' incomplete, TSL:3
Lcorl-208  ENSMUST00000187615.6   439    49aa        ENSMUSP00000139466.1  Nonsense mediated decay  -          A0A087WNR9  CDS 5' incomplete, TSL:2
Lcorl-211  ENSMUST00000200142.1   7912   No protein  -                     Retained intron          -          -           TSL:NA
Lcorl-206  ENSMUST00000156295.1   678    No protein  -                     Retained intron          -          -           TSL:2
Lcorl-203  ENSMUST00000067997.10  1174   No protein  -                     lncRNA                   -          -           TSL:1


[Figure: Ensembl genome browser view of the Lcorl region on chromosome 5 (180.44 kb window, 45.70-45.85 Mb). Forward strand: Ncapg transcripts (Ncapg-201, -202, -204, -205, -206), 4930449I04Rik-201 and Gm36401-201. Reverse strand: Lcorl transcripts Lcorl-201 to Lcorl-212, Gm43201-201, Gm29036-201 (transcribed processed pseudogene) and Gm43200-201. Also shown: contigs AC093341.5 and AC158786.8, the Regulatory Build track (CTCF, enhancer, open chromatin, promoter, promoter flank, binding site), and the gene legend.]


Transcript: ENSMUST00000045586

[Figure: Lcorl-202 (protein coding, reverse strand, 160.44 kb) with features of the encoded protein: MobiDB lite, low complexity (Seg), PANTHER PTHR21545 and PTHR21545:SF10, plus sequence variants from dbSNP and other sources (splice region and synonymous variants); scale bar 0-315 aa.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC, VectorBuilder.
