https://www.alphaknockout.com

Mouse Lig4 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Lig4 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Lig4 gene (NCBI Reference Sequence: NM_176953; Ensembl: ENSMUSG00000049717) is located on mouse chromosome 8. Two exons are identified, with both the ATG start codon and the TAG stop codon in exon 2 (Transcript: ENSMUST00000095476). Exon 2 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Lig4 gene.

To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP23-191P3 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Null homozygotes die late in gestation with extensive CNS apoptosis, blocked lymphopoiesis and failure of V(D)J joining. Carrier fibroblasts show elevated chromosome breaks. ~40% of homozygous hypomorphs survive, with retarded growth, reduced PBL and progressive loss of hematopoietic stem cells.

Exon 2 covers 100.0% of the coding region; both the start codon and the stop codon are in exon 2. The size of intron 1 for 5'-loxP site insertion: 2223 bp. The size of the effective cKO region: ~3006 bp. The cKO region does not contain any other known gene.
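The conversion of the targeted (floxed) allele into the constitutive KO allele can be illustrated with a minimal sketch. It assumes two loxP sites in the same orientation flanking the cKO region and models Cre recombination as removal of the intervening sequence plus one loxP site. The function names and the toy arm and cKO sequences are hypothetical. The report does not give the exact insertion coordinates, so nothing here reproduces the real allele.

```python
# Canonical 34 bp loxP site: 13 bp inverted repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def floxed_allele(five_arm: str, cko_region: str, three_arm: str) -> str:
    """Build a toy targeted allele: the cKO region flanked by two loxP sites."""
    return five_arm + LOXP + cko_region + LOXP + three_arm


def cre_excise(allele: str) -> str:
    """Model Cre recombination between the first two same-orientation loxP sites.

    The sequence between the sites is removed and a single loxP site remains.
    An allele with fewer than two loxP sites is returned unchanged.
    """
    first = allele.find(LOXP)
    if first == -1:
        return allele
    second = allele.find(LOXP, first + len(LOXP))
    if second == -1:
        return allele
    return allele[: first + len(LOXP)] + allele[second + len(LOXP):]


# Toy sequences only (hypothetical); the real cKO region is ~3006 bp.
targeted = floxed_allele("AAAA", "CCCCCC", "GGGG")
knockout = cre_excise(targeted)
print(len(targeted) - len(knockout))  # bases lost: cKO region plus one loxP site
```

With the real construct, the same operation would remove the ~3006 bp cKO region, which contains the whole coding sequence in exon 2.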


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (exons 1 and 2, with 5' and 3' gRNA regions flanking exon 2), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Lig4, homology arm, cKO region, loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 against itself, showing forward and reverse-complement matches.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
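A self dot plot of this kind can be sketched as follows. This is only an illustration of the general technique, not the tool used to produce the report: it records every pair of positions whose 10 bp words match exactly, on the forward strand or as reverse complements. The function names are hypothetical, and the demonstration uses toy sequences because the 9233 bp sequence itself is not included here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def index_words(seq: str, window: int) -> dict:
    """Map every word of length `window` to the list of its start positions."""
    index = {}
    for i in range(len(seq) - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)
    return index


def self_dot_plot(seq: str, window: int = 10):
    """Return (forward, reverse) lists of (i, j) dots for a self comparison.

    forward: i < j and the words starting at i and j are identical.
    reverse: the word starting at i equals the reverse complement of the
             word starting at j (both coordinates on the forward strand).
    """
    seq = seq.upper()
    n = len(seq)
    index = index_words(seq, window)
    forward = sorted(
        (i, j) for starts in index.values() for i in starts for j in starts if i < j
    )
    rc = reverse_complement(seq)
    reverse = []
    for k in range(n - window + 1):
        # rc[k:k+window] is the reverse complement of seq[n-window-k : n-k].
        for i in index.get(rc[k:k + window], []):
            reverse.append((i, n - window - k))
    return forward, sorted(reverse)


# Toy example: a 7 bp word repeated after a 3 bp spacer.
forward_dots, reverse_dots = self_dot_plot("GATTACACCCGATTACA", window=7)
print(forward_dots, reverse_dots)
```

Long runs of forward dots parallel to the main diagonal would indicate tandem or dispersed repeats; the note above reports none of significance for this sequence.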

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(9233bp) | A(28.82% 2661) | C(19.93% 1840) | T(29.16% 2692) | G(22.09% 2040)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found, so it may be difficult to construct this targeting vector.
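The base counts in the summary can be checked with a few lines of arithmetic, and a windowed GC profile can be sketched in the same way. The counts below are copied from the summary line. The function names and the 50 bp step are assumptions for illustration, since the report states only the 300 bp window size. The windowed function needs the actual sequence, which is not reproduced here, so it is shown with a toy input.

```python
# Base counts as reported in the summary line (full length 9233 bp).
REPORTED_COUNTS = {"A": 2661, "C": 1840, "T": 2692, "G": 2040}


def percentages(counts: dict) -> dict:
    """Percentage of each base, rounded to two decimals."""
    total = sum(counts.values())
    return {base: round(100 * n / total, 2) for base, n in counts.items()}


def gc_percent(counts: dict) -> float:
    """Overall GC content in percent, rounded to two decimals."""
    total = sum(counts.values())
    return round(100 * (counts["G"] + counts["C"]) / total, 2)


def sliding_gc(seq: str, window: int = 300, step: int = 50) -> list:
    """GC percentage for each window of length `window`, advancing by `step`."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100 * gc / window))
    return profile


print(sum(REPORTED_COUNTS.values()))   # total length implied by the counts
print(percentages(REPORTED_COUNTS))    # per-base percentages
print(gc_percent(REPORTED_COUNTS))     # overall GC content

# Toy sequence: one fully GC window followed by one fully AT window.
print(sliding_gc("GC" * 150 + "AT" * 150, window=300, step=300))
```

The overall GC content derived from the counts is an average over the whole sequence; the local windows are what reveal the high-GC stretches mentioned in the note.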


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr8   -       9974029    9977028    3000
YourSeq  146    2365   2549  3000   91.6%     chr17  +       88309776   88309957   182
YourSeq  145    2406   2571  3000   96.8%     chr12  +       3265963    3266296    334
YourSeq  144    2391   2552  3000   95.0%     chr10  +       90980089   90980263   175
YourSeq  140    2386   2546  3000   96.1%     chr16  -       32320628   32320793   166
YourSeq  139    2390   2548  3000   94.3%     chr5   -       52964897   52965055   159
YourSeq  139    2406   2554  3000   95.3%     chr12  +       76505883   76506030   148
YourSeq  138    2405   2548  3000   98.0%     chr12  +       74418982   74419125   144
YourSeq  138    2404   2554  3000   96.1%     chr1   +       15748617   15748768   152
YourSeq  137    2405   2556  3000   95.4%     chr8   -       3797449    3797604    156
YourSeq  137    2405   2549  3000   97.3%     chr6   -       131393335  131393479  145
YourSeq  137    2406   2549  3000   98.0%     chr12  -       71935876   71936023   148
YourSeq  136    2405   2549  3000   97.3%     chr13  -       69812834   69812979   146
YourSeq  136    2405   2554  3000   95.4%     chr10  +       67841862   67842011   150
YourSeq  135    2405   2549  3000   96.6%     chr17  +       29323538   29323682   145
YourSeq  134    2405   2554  3000   93.3%     chr7   -       126513025  126513173  149
YourSeq  134    2398   2548  3000   94.7%     chr16  +       69427383   69427545   163
YourSeq  134    2405   2548  3000   96.6%     chr11  +       5208208    5208351    144
YourSeq  134    2405   2554  3000   94.7%     chr10  +       82837837   82837986   150
YourSeq  133    2406   2548  3000   96.6%     chr2   -       121970341  121970483  143

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr8   -       9968046    9971045    3000
YourSeq  52     1901   1956  3000   98.3%     chr2   -       5644040    5644164    125
YourSeq  48     1776   1943  3000   96.3%     chr2   -       5644044    5644337    294
YourSeq  43     1899   1955  3000   87.8%     chr3   -       154416120  154416176  57
YourSeq  41     1968   2311  3000   95.6%     chr14  +       122240244  122240672  429
YourSeq  36     2345   2464  3000   65.0%     chr10  +       114016499  114016618  120
YourSeq  35     1774   2060  3000   83.1%     chr1   -       87338947   87339257   311
YourSeq  34     2345   2382  3000   94.8%     chr12  +       41997884   41997921   38
YourSeq  32     1965   2047  3000   94.6%     chr1   -       4803240    4803322    83
YourSeq  32     200    453   3000   94.5%     chr14  +       92149236   92149772   537
YourSeq  31     1910   1949  3000   91.9%     chr2   +       25301432   25301525   94
YourSeq  30     2345   2381  3000   91.9%     chr13  +       15778574   15778614   41
YourSeq  29     2347   2377  3000   96.8%     chr8   +       115494008  115494038  31
YourSeq  29     2345   2376  3000   96.9%     chr14  +       26563764   26563796   33
YourSeq  28     1910   1943  3000   96.7%     chr11  -       5130671    5130713    43
YourSeq  21     2349   2369  3000   100.0%    chr12  -       70065146   70065166   21

Note: The 3000 bp section downstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
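One way to screen hit lists like the two above is to parse each row and flag any secondary hit whose score is a large fraction of the query size. The sketch below does that for three rows copied from the upstream table. The 50% cutoff, the field names and the function names are assumptions chosen for illustration; the report does not state which criterion was used to judge significance.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated row; the first field is the query name."""
    f = line.split()
    return BlatHit(
        int(f[1]), int(f[2]), int(f[3]), int(f[4]), float(f[5].rstrip("%")),
        f[6], f[7], int(f[8]), int(f[9]), int(f[10]),
    )


def off_target_hits(hits: list, min_fraction: float = 0.5) -> list:
    """Return secondary hits scoring at least `min_fraction` of the query size.

    The top-scoring hit is treated as the on-target locus and excluded.
    The 0.5 default is an arbitrary illustrative cutoff.
    """
    best = max(hits, key=lambda h: h.score)
    return [h for h in hits if h is not best and h.score >= min_fraction * h.q_size]


rows = [
    "YourSeq 3000 1 3000 3000 100.0% chr8 - 9974029 9977028 3000",
    "YourSeq 146 2365 2549 3000 91.6% chr17 + 88309776 88309957 182",
    "YourSeq 145 2406 2571 3000 96.8% chr12 + 3265963 3266296 334",
]
hits = [parse_hit(r) for r in rows]
print(off_target_hits(hits))
```

In both tables the best secondary hit covers only a short stretch of the 3000 bp query (scores of 146 and 52), which is consistent with the notes that no significant similarity is found.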


Gene and protein information:

Lig4 ligase IV, DNA, ATP-dependent [Mus musculus (house mouse)]
Gene ID: 319583, updated on 12-Aug-2019

Gene summary

Official Symbol: Lig4 (provided by MGI)
Official Full Name: ligase IV, DNA, ATP-dependent (provided by MGI)
Primary source: MGI:MGI:1335098
See related: Ensembl:ENSMUSG00000049717
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: tiny; 5830471N16Rik
Expression: Ubiquitous expression in thymus adult (RPKM 7.2), testis adult (RPKM 3.8) and 28 other tissues

Genomic context

Location: 8; 8 A1.1

Exon count: 3

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  8    NC_000074.6 (9970020..9977696, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    8    NC_000074.5 (9970020..9976323, complement)

[Figure: genomic context of Lig4 on Chromosome 8 - NC_000074.6]


Transcript information: This gene has 2 transcripts

Gene: Lig4 ENSMUSG00000049717

Description: ligase IV, DNA, ATP-dependent [Source:MGI Symbol;Acc:MGI:1335098]
Gene Synonyms: 5830471N16Rik, DNA ligase IV, tiny
Location: Chromosome 8: 9,969,049-9,977,686 reverse strand. GRCm38:CM001001.2
About this gene: This gene has 2 transcripts (splice variants), 190 orthologues, 2 paralogues, is a member of 1 Ensembl protein family and is associated with 83 phenotypes.

Transcripts

Name      Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt  Flags
Lig4-201  ENSMUST00000095476.5  5028  911aa    ENSMUSP00000093130.4  Protein coding  CCDS22092  Q8BTF7   TSL:1 GENCODE basic APPRIS P1
Lig4-202  ENSMUST00000170033.1  3105  911aa    ENSMUSP00000130807.1  Protein coding  CCDS22092  Q8BTF7   TSL:1 GENCODE basic APPRIS P1

[Figure: Ensembl genome browser view, 28.64 kb, spanning 9.96 Mb to 9.98 Mb. Forward strand: Abhd13-201 and Abhd13-202 (protein coding). Reverse strand: Lig4-201 and Lig4-202 (protein coding). Contigs: AC127240.3, AC138397.4. Regulatory Build legend: CTCF, promoter, promoter flank. Gene legend: protein coding, merged Ensembl/Havana.]


Transcript: ENSMUST00000095476

[Figure: protein domain and variant view of Lig4-201 (protein coding, reverse strand, 7.25 kb), with a scale bar from 0 to 911 aa. Annotation tracks:
TIGRFAM: DNA ligase, ATP-dependent
Superfamily: DNA ligase, ATP-dependent, N-terminal domain superfamily; BRCT domain superfamily; SSF56091; Nucleic acid-binding, OB-fold
SMART: BRCT domain
Pfam: DNA ligase, ATP-dependent, central; BRCT domain; DNA ligase, ATP-dependent, N-terminal; DNA ligase, ATP-dependent, C-terminal; DNA ligase IV domain
PROSITE profiles: DNA ligase, ATP-dependent, central; BRCT domain
PROSITE patterns: DNA ligase, ATP-dependent, conserved site
PANTHER: PTHR45997; DNA ligase 4
Gene3D: DNA ligase, ATP-dependent, N-terminal domain superfamily; 2.40.50.140; BRCT domain superfamily; 3.30.470.30
CDD: cd07903, cd07968, cd17722, cd17717
Sequence variants (dbSNP and all other sources): missense variant, synonymous variant]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
