https://www.alphaknockout.com

Mouse Atg4b Conditional Knockout Project (CRISPR/Cas9)

Objective: To create an Atg4b conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Atg4b gene (NCBI Reference Sequence: NM_174874; Ensembl: ENSMUSG00000026280) is located on mouse chromosome 1. 13 exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 13 (Transcript: ENSMUST00000027502). Exons 4~6 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Atg4b gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-27E7 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a gene trap allele exhibit decreased autophagy, impaired swimming, circling, head tilting, and abnormal utricle, saccular, and otolith morphology. Mice homozygous for another gene trap allele exhibit partial preweaning lethality and impaired motor coordination and learning.

Exon 4 starts at about 15.69% of the coding region. Knockout of exons 4~6 will result in a frameshift of the gene. Intron 3, used for 5'-loxP site insertion, is 5445 bp; intron 6, used for 3'-loxP site insertion, is 2584 bp. The effective cKO region is ~2431 bp and does not contain any other known gene.
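The ~2431 bp figure can be cross-checked against the BLAT coordinates reported later in this document, under the assumption that the two 3000 bp BLAT queries directly flank the effective cKO region (the report does not state this explicitly). The sketch below also includes a generic frameshift check; the helper names are illustrative, and the coding lengths of exons 4~6 are not given in this report, so no real exon lengths are used.

```python
def bases_between(left_end: int, right_start: int) -> int:
    """Count bases strictly between two 1-based inclusive coordinates."""
    if right_start <= left_end:
        raise ValueError("right_start must be greater than left_end")
    return right_start - left_end - 1


def causes_frameshift(deleted_coding_bp: int) -> bool:
    """A deletion shifts the reading frame when its coding length is not a multiple of 3."""
    return deleted_coding_bp % 3 != 0


# Coordinates taken from the BLAT tables in this report (chr1, + strand).
UPSTREAM_QUERY_END = 93773520      # last base of the 3000 bp upstream query
DOWNSTREAM_QUERY_START = 93775952  # first base of the 3000 bp downstream query

# Assumption: the two queries abut the effective cKO region on either side.
cko_region_bp = bases_between(UPSTREAM_QUERY_END, DOWNSTREAM_QUERY_START)
print(cko_region_bp)
```

Under that assumption the interval between the two queries matches the stated size of the effective cKO region.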


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (exons 1, 4, 5, 6 and 13 labeled, with the 5' and 3' gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Atg4b; homology arm; cKO region; loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot showing forward and reverse complement matches; axis label: Sequence 12]

Note: The sequence of the homologous arms and cKO region is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
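A self-comparison of this kind can be sketched as follows. This is a minimal illustration of a windowed dot plot using exact 10 bp word matches; it is not the tool used to produce the figure, and the function names are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 10):
    """Return (forward, reverse) lists of (i, j) window start positions.

    forward: windows at i < j with identical sequence (direct repeats).
    reverse: window at j equals the reverse complement of the window at i.
    """
    seq = seq.upper()
    last_start = len(seq) - window
    # Index every window-length word by its start positions.
    index = defaultdict(list)
    for i in range(last_start + 1):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(last_start + 1):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j > i)
        reverse.extend((i, j) for j in index.get(reverse_complement(word), ()))
    return forward, reverse


# Toy input: a 10 bp word repeated after a 5 bp spacer.
fwd, rev = self_dot_plot("ACGTTGCAAC" + "GGGGG" + "ACGTTGCAAC", window=10)
print(fwd)
```

Many forward points lying along a diagonal parallel to the main diagonal would indicate a direct repeat of that spacing; the report states that no such significant pattern is found for this sequence.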

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution plot; axis label: Sequence 12]

Summary: Full Length(8931bp) | A(23.97% 2141) | C(23.16% 2068) | T(30.55% 2728) | G(22.33% 1994)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
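The overall GC content follows directly from the base counts in the summary line. The sketch below recomputes it and shows one way a windowed GC profile could be calculated; the `windowed_gc` helper and its non-overlapping default step are assumptions for illustration, since the report gives only the 300 bp window size.

```python
# Base counts copied from the summary line of this report.
counts = {"A": 2141, "C": 2068, "T": 2728, "G": 1994}

total = sum(counts.values())
gc_percent = 100.0 * (counts["G"] + counts["C"]) / total
print(total, round(gc_percent, 2))


def windowed_gc(seq: str, window: int = 300, step: int = 300):
    """GC percentage of each full window, starting every `step` bases."""
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        values.append(100.0 * (chunk.count("G") + chunk.count("C")) / window)
    return values


# Toy input: one all-GC window followed by one all-AT window.
print(windowed_gc("GGCCAATT", window=4, step=4))
```

The counts sum to the stated full length of 8931 bp, and the overall GC content comes to roughly 45.5%.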


BLAT Search Results (up)

QUERY | SCORE | Q START | Q END | QSIZE | IDENTITY | CHROM | STRAND | T START | T END | SPAN
YourSeq | 3000 | 1 | 3000 | 3000 | 100.0% | chr1 | + | 93770521 | 93773520 | 3000
YourSeq | 197 | 1 | 650 | 3000 | 85.8% | chr18 | - | 56654896 | 56655153 | 258
YourSeq | 196 | 1 | 650 | 3000 | 87.4% | chr19 | - | 32738436 | 32739032 | 597
YourSeq | 182 | 460 | 769 | 3000 | 93.0% | chr1 | - | 9142762 | 9143298 | 537
YourSeq | 173 | 1 | 643 | 3000 | 86.7% | chr17 | + | 17429499 | 17430095 | 597
YourSeq | 172 | 415 | 650 | 3000 | 93.5% | chr2 | + | 38453019 | 38453481 | 463
YourSeq | 170 | 451 | 652 | 3000 | 92.5% | chr7 | - | 58460450 | 58460652 | 203
YourSeq | 170 | 449 | 652 | 3000 | 92.2% | chr12 | + | 94096248 | 94096456 | 209
YourSeq | 169 | 450 | 650 | 3000 | 92.5% | chr5 | - | 122176936 | 122177136 | 201
YourSeq | 169 | 450 | 651 | 3000 | 93.9% | chr16 | - | 94711876 | 94712081 | 206
YourSeq | 169 | 457 | 651 | 3000 | 92.3% | chr2 | + | 166882378 | 166882571 | 194
YourSeq | 168 | 468 | 655 | 3000 | 93.6% | chr6 | - | 113571748 | 113571934 | 187
YourSeq | 168 | 1673 | 2031 | 3000 | 85.4% | chr12 | - | 105461157 | 105461514 | 358
YourSeq | 168 | 454 | 650 | 3000 | 93.5% | chr11 | - | 62429779 | 62429985 | 207
YourSeq | 168 | 463 | 867 | 3000 | 85.1% | chrX | + | 103671134 | 103671485 | 352
YourSeq | 168 | 463 | 668 | 3000 | 89.6% | chr2 | + | 3395789 | 3395983 | 195
YourSeq | 168 | 460 | 651 | 3000 | 94.3% | chr10 | + | 11234914 | 11235106 | 193
YourSeq | 167 | 443 | 653 | 3000 | 91.0% | chr2 | - | 70860040 | 70860248 | 209
YourSeq | 167 | 462 | 650 | 3000 | 94.7% | chr1 | - | 33872537 | 33872727 | 191
YourSeq | 167 | 451 | 649 | 3000 | 90.4% | chr8 | + | 78496228 | 78496424 | 197

Note: The 3000 bp section upstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY | SCORE | Q START | Q END | QSIZE | IDENTITY | CHROM | STRAND | T START | T END | SPAN
YourSeq | 3000 | 1 | 3000 | 3000 | 100.0% | chr1 | + | 93775952 | 93778951 | 3000
YourSeq | 113 | 1639 | 1789 | 3000 | 89.5% | chr15 | - | 63496242 | 63496393 | 152
YourSeq | 110 | 1611 | 1787 | 3000 | 85.7% | chr3 | + | 106552570 | 106552795 | 226
YourSeq | 109 | 1635 | 1776 | 3000 | 90.3% | chrX | + | 74354030 | 74354171 | 142
YourSeq | 106 | 1624 | 1789 | 3000 | 81.8% | chr18 | - | 80453926 | 80454090 | 165
YourSeq | 106 | 1605 | 1775 | 3000 | 88.9% | chr13 | + | 95071788 | 95071987 | 200
YourSeq | 104 | 1635 | 1789 | 3000 | 86.2% | chr9 | - | 98714092 | 98714248 | 157
YourSeq | 103 | 1627 | 1786 | 3000 | 87.6% | chr4 | - | 89402926 | 89403086 | 161
YourSeq | 102 | 1610 | 1776 | 3000 | 87.5% | chr8 | - | 77718649 | 77718836 | 188
YourSeq | 102 | 1639 | 1776 | 3000 | 88.1% | chr15 | + | 79009280 | 79009418 | 139
YourSeq | 101 | 1610 | 1789 | 3000 | 85.9% | chr7 | + | 83285125 | 83285333 | 209
YourSeq | 101 | 1612 | 1776 | 3000 | 88.4% | chr5 | + | 38701440 | 38701664 | 225
YourSeq | 101 | 1631 | 1787 | 3000 | 89.8% | chr1 | + | 94430635 | 94430792 | 158
YourSeq | 100 | 1635 | 1776 | 3000 | 85.7% | chr18 | - | 78049680 | 78049822 | 143
YourSeq | 100 | 1609 | 1776 | 3000 | 84.9% | chr11 | + | 62415712 | 62415926 | 215
YourSeq | 99 | 1635 | 1776 | 3000 | 86.3% | chr3 | - | 98079141 | 98079281 | 141
YourSeq | 99 | 1635 | 1789 | 3000 | 90.3% | chr11 | + | 77873102 | 77873256 | 155
YourSeq | 99 | 1631 | 1780 | 3000 | 87.9% | chr11 | + | 20955054 | 20955207 | 154
YourSeq | 98 | 1635 | 1774 | 3000 | 90.2% | chr4 | - | 106777365 | 106777506 | 142
YourSeq | 97 | 1635 | 1768 | 3000 | 86.4% | chr5 | - | 7610960 | 7611095 | 136

Note: The 3000 bp section downstream of Exon 6 is BLAT searched against the genome. No significant similarity is found.
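One way to read these BLAT tables programmatically is to separate the full-length self-match from the secondary hits and look at the best secondary score relative to the query size. The sketch below assumes whitespace-separated rows in the column order shown in the tables; the `BlatHit` type and helper names are illustrative, and the three sample rows are copied from the upstream results.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one row: QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def secondary_hits(hits):
    """Drop the full-length, 100%-identity self-match of the query."""
    return [h for h in hits
            if not (h.score == h.q_size and h.identity == 100.0)]


rows = """\
YourSeq 3000 1 3000 3000 100.0% chr1 + 93770521 93773520 3000
YourSeq 197 1 650 3000 85.8% chr18 - 56654896 56655153 258
YourSeq 196 1 650 3000 87.4% chr19 - 32738436 32739032 597
"""

hits = [parse_hit(line) for line in rows.splitlines()]
others = secondary_hits(hits)
best = max(others, key=lambda h: h.score)
print(best.chrom, best.score, round(best.score / best.q_size, 3))
```

In both tables the best secondary hit scores far below the 3000-point self-match, which is consistent with the notes that no significant similarity is found.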


Gene and information: Atg4b autophagy related 4B, cysteine peptidase [ Mus musculus (house mouse) ]
Gene ID: 66615, updated on 10-Oct-2019

Gene summary

Official Symbol: Atg4b (provided by MGI)
Official Full Name: autophagy related 4B, cysteine peptidase (provided by MGI)
Primary source: MGI:MGI:1913865
See related: Ensembl:ENSMUSG00000026280
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Apg4b; Autl1; Atg4bl; AW048066; 2510009N07Rik
Expression: Ubiquitous expression in CNS E14 (RPKM 14.9), whole brain E14.5 (RPKM 14.0) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 1; 1 D

Exon count: 14

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 1 | NC_000067.6 (93754905..93789606)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 1 | NC_000067.5 (95651610..95686106)

[Figure: genomic context of Atg4b, Chromosome 1 - NC_000067.6]


Transcript information: This gene has 11 transcripts

Gene: Atg4b ENSMUSG00000026280

Description: autophagy related 4B, cysteine peptidase [Source:MGI Symbol;Acc:MGI:1913865]
Gene Synonyms: 2510009N07Rik, Apg4b, autophagin 1
Location: Chromosome 1: 93,751,500-93,790,610 forward strand. GRCm38:CM000994.2
About this gene: This gene has 11 transcripts (splice variants), 198 orthologues, 4 paralogues, is a member of 1 Ensembl protein family and is associated with 20 phenotypes.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Atg4b-201 | ENSMUST00000027502.15 | 4173 | 393aa | ENSMUSP00000027502.8 | Protein coding | CCDS15195 | A0A0R4J065 | TSL:1, GENCODE basic, APPRIS P1
Atg4b-203 | ENSMUST00000149436.7 | 513 | 150aa | ENSMUSP00000123383.1 | Protein coding | - | D3YZP6 | CDS 3' incomplete, TSL:3
Atg4b-209 | ENSMUST00000187824.6 | 420 | 121aa | ENSMUSP00000139541.1 | Protein coding | - | A0A087WNY2 | CDS 3' incomplete, TSL:5
Atg4b-204 | ENSMUST00000185482.6 | 919 | 143aa | ENSMUSP00000140758.1 | Nonsense mediated decay | - | A0A087WRT0 | TSL:5
Atg4b-208 | ENSMUST00000186811.1 | 503 | 63aa | ENSMUSP00000139463.1 | Nonsense mediated decay | - | A0A087WNR6 | CDS 5' incomplete, TSL:5
Atg4b-202 | ENSMUST00000135762.7 | 4186 | No protein | - | Retained intron | - | - | TSL:5
Atg4b-205 | ENSMUST00000185754.1 | 601 | No protein | - | Retained intron | - | - | TSL:2
Atg4b-206 | ENSMUST00000186001.1 | 565 | No protein | - | Retained intron | - | - | TSL:2
Atg4b-211 | ENSMUST00000189872.1 | 477 | No protein | - | Retained intron | - | - | TSL:3
Atg4b-207 | ENSMUST00000186124.1 | 454 | No protein | - | Retained intron | - | - | TSL:3
Atg4b-210 | ENSMUST00000189152.1 | 343 | No protein | - | lncRNA | - | - | TSL:2


[Figure: Ensembl genome browser view of the Atg4b locus, 59.11 kb, 93.75Mb to 93.80Mb. Forward strand: Gm10550-201 (lncRNA) and Atg4b transcripts 201 (protein coding), 202 (retained intron), 203 (protein coding), 204 (nonsense mediated decay), 205 (retained intron), 206 (retained intron), 207 (retained intron), 208 (nonsense mediated decay), 209 (protein coding), 210 (lncRNA), 211 (retained intron). Contig: AC162891.6. Reverse strand: Thap4-201 (nonsense mediated decay), Thap4-203 (retained intron), Thap4-204 (protein coding), Thap4-206 (protein coding), Dtymk-201 (protein coding), Dtymk-202 (protein coding), Dtymk-203 (protein coding), Dtymk-204 (lncRNA). Tracks: Regulatory Build. Regulation legend: CTCF; enhancer; open chromatin; promoter; promoter flank; transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (RNA gene; processed transcript).]


Transcript: ENSMUST00000027502

[Figure: Atg4b-201 (protein coding), 35.71 kb, forward strand, with protein annotation for ENSMUSP00000027... on a scale of 0 to 393 aa. Tracks: low complexity (Seg); Superfamily: Papain-like cysteine peptidase superfamily; Pfam: Peptidase C54; PANTHER: Peptidase C54; Cysteine protease ATG4B, metazoa; sequence variants (dbSNP and all other sources). Variant legend: missense variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
