https://www.alphaknockout.com

Mouse Setmar Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Setmar conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Setmar gene (NCBI Reference Sequence: NM_178391; Ensembl: ENSMUSG00000034639) is located on mouse chromosome 6. Two exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 2 (transcript: ENSMUST00000049246). Exon 2 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Setmar gene. To engineer the targeting vector, the homology arms and the cKO region will be generated by PCR using BAC clone RP23-93D20 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs to produce cKO mice. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 covers 82.85% of the coding region. The start codon is in exon 1 and the stop codon is in exon 2. Intron 1, where the 5' loxP site will be inserted, is 10466 bp long. The effective cKO region is ~2167 bp. The cKO region does not contain any other known gene.
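The coverage figure in the note is a ratio: coding bases that fall inside the cKO exon divided by the total coding length. A minimal Python sketch of that calculation follows. The CDS segments and exon boundaries in the usage lines are hypothetical placeholders, not the real Setmar coordinates, and whether the stop codon counts as coding is an assumption (conventions differ), so it is exposed as a parameter.

```python
# Minimal sketch: what fraction of a CDS lies inside one exon.
# All intervals are 1-based, inclusive genomic coordinates.

def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two 1-based inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def cds_fraction_in_exon(cds_segments, exon) -> float:
    """Share of coding bases (summed over all CDS segments) inside `exon`."""
    total = sum(end - start + 1 for start, end in cds_segments)
    inside = sum(overlap_bp(s, e, exon[0], exon[1]) for s, e in cds_segments)
    return inside / total


def cds_length_from_protein(n_aa: int, include_stop: bool = True) -> int:
    """Expected CDS length in bp for a protein of n_aa residues."""
    return 3 * (n_aa + 1) if include_stop else 3 * n_aa


# Hypothetical CDS split (230 bp in exon 1, 700 bp in exon 2); illustrative only.
cds = [(1_000, 1_229), (5_000, 5_699)]
exon2 = (4_900, 6_500)  # hypothetical exon 2 boundaries
print(cds_length_from_protein(309))  # → 930 (309 codons plus the stop codon)
print(f"{cds_fraction_in_exon(cds, exon2):.2%}")  # → 75.27%
```

With the real exon and CDS coordinates of ENSMUST00000049246 substituted in, the same function yields the coverage value quoted in the note.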


Overview of the Targeting Strategy

[Figure: Schematic of the wildtype allele (exons 1 and 2, with the 5' and 3' gRNA regions), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: exon of mouse Setmar, homology arm, cKO region, loxP site.]
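PCR genotyping tells the alleles in the schematic apart by product size. The Python sketch below computes expected amplicon sizes for a single primer pair spanning both loxP sites. The primer and loxP positions are hypothetical, and it assumes each loxP insertion adds only the 34 bp loxP sequence; real targeting vectors often carry extra linker or restriction-site sequence, which would shift these sizes.

```python
LOXP_BP = 34  # length of a loxP site


def amplicon_sizes(fwd: int, rev: int, lox5: int, lox3: int) -> dict:
    """Expected PCR product sizes (bp) for wildtype, floxed and Cre-deleted alleles.

    fwd: wildtype position of the first base of the forward primer site.
    rev: wildtype position of the last base of the reverse primer site.
    lox5, lox3: each loxP site is inserted right after this wildtype position.
    Assumption (hypothetical): only the bare 34 bp loxP sequences are added.
    """
    if not fwd <= lox5 < lox3 < rev:
        raise ValueError("both loxP sites must lie inside the amplicon")
    wildtype = rev - fwd + 1
    floxed = wildtype + 2 * LOXP_BP
    # Cre excises the segment between the loxP sites, leaving a single loxP.
    deleted = wildtype - (lox3 - lox5) + LOXP_BP
    return {"wildtype": wildtype, "floxed": floxed, "deleted": deleted}


# Hypothetical primers 3000 bp apart; loxP spacing reuses the ~2167 bp cKO size.
print(amplicon_sizes(fwd=1, rev=3000, lox5=400, lox3=400 + 2167))
# → {'wildtype': 3000, 'floxed': 3068, 'deleted': 867}
```

In practice, separate primer pairs around each loxP site are often used so that the short wildtype/floxed difference is easy to resolve. The single-pair version is used here only to show all three alleles in one calculation.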


Overview of the Dot Plot (window size: 10 bp)

[Figure: Dot plot of the sequence aligned against itself, forward and reverse complement.]

Note: The sequence of the homology arms and cKO region was aligned with itself to check for tandem repeats. No significant tandem repeat was found in the dot plot matrix, so this region is suitable for PCR screening and sequencing analysis.
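A dot plot marks every pair of positions (i, j) where a window of the sequence matches a window of the comparison sequence, here the sequence itself or its reverse complement. The Python sketch below assumes exact matches of 10 bp windows (the stated window size). The original tool may score windows differently, and reading off-diagonal matches as a sign of tandem repeats is a heuristic, not a formal test.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_plot(seq: str, window: int = 10, reverse: bool = False) -> list:
    """(i, j) pairs where seq[i:i+window] exactly equals a window at j of the
    comparison sequence (seq itself, or its reverse complement)."""
    seq = seq.upper()
    other = revcomp(seq) if reverse else seq
    index = defaultdict(list)  # window string -> start positions in `other`
    for j in range(len(other) - window + 1):
        index[other[j:j + window]].append(j)
    hits = []
    for i in range(len(seq) - window + 1):
        for j in index.get(seq[i:i + window], ()):
            hits.append((i, j))
    return hits


def off_diagonal_hits(seq: str, window: int = 10) -> list:
    """Forward self-matches off the main diagonal; runs of these parallel to
    the diagonal are the typical signature of tandem repeats."""
    return [(i, j) for i, j in dot_plot(seq, window) if i != j]


# A 7 bp unit repeated 4 times produces many off-diagonal matches.
print(len(off_diagonal_hits("GATTACA" * 4)))  # → 34
```

A sequence without repeats yields only the main diagonal in the forward comparison, which is the pattern the note describes.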

Overview of the GC Content Distribution (window size: 300 bp)

[Figure: GC content distribution along the sequence.]

Summary: full length 7268 bp

Base | Count | Percentage
A | 2182 | 30.02%
C | 1587 | 21.84%
T | 2020 | 27.79%
G | 1479 | 20.35%

Note: The sequence of the homology arms and cKO region was analyzed for GC content. No region of significantly high GC content was found, so this region is suitable for PCR screening and sequencing analysis.
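The composition summary is plain base counting, and the distribution plot is a sliding-window GC fraction. A minimal Python sketch follows. It assumes a 1 bp step by default, and the 0.7 cutoff for "high GC" is hypothetical because the report does not state its criterion.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Count and percentage (2 decimals) of each base, as in the summary."""
    seq = seq.upper()
    counts = Counter(seq)
    return {b: (counts[b], round(100 * counts[b] / len(seq), 2)) for b in "ACTG"}


def gc_profile(seq: str, window: int = 300, step: int = 1) -> list:
    """(0-based start, GC fraction) for each window of `window` bp."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((start, (chunk.count("G") + chunk.count("C")) / window))
    return profile


def high_gc_windows(seq: str, window: int = 300, threshold: float = 0.7,
                    step: int = 1) -> list:
    """Start positions of windows at or above a (hypothetical) GC cutoff."""
    return [s for s, gc in gc_profile(seq, window, step) if gc >= threshold]


# Overall GC of the 7268 bp region from the reported counts (C 1587, G 1479).
print(f"{(1587 + 1479) / 7268:.1%}")  # → 42.2%
```

An empty result from `high_gc_windows` over the full 7268 bp sequence would correspond to the note's "no significant high GC-content region".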


BLAT Search Results (upstream)

QUERY | SCORE | QSTART | QEND | QSIZE | IDENTITY | CHROM | STRAND | TSTART | TEND | SPAN
YourSeq | 3000 | 1 | 3000 | 3000 | 100.0% | chr6 | + | 108072457 | 108075456 | 3000
YourSeq | 77 | 2159 | 2401 | 3000 | 89.8% | chr5 | + | 131175804 | 131176046 | 243
YourSeq | 66 | 2236 | 2369 | 3000 | 72.0% | chr7 | + | 144172688 | 144172819 | 132
YourSeq | 62 | 2238 | 2427 | 3000 | 91.9% | chr17 | - | 84503163 | 84503353 | 191
YourSeq | 59 | 2342 | 2423 | 3000 | 94.1% | chr9 | - | 111144419 | 111144505 | 87
YourSeq | 59 | 2342 | 2423 | 3000 | 94.1% | chr5 | - | 39191300 | 39191386 | 87
YourSeq | 59 | 2342 | 2423 | 3000 | 94.1% | chr17 | - | 68503145 | 68503231 | 87
YourSeq | 56 | 2237 | 2421 | 3000 | 87.4% | chr6 | + | 134562819 | 134562998 | 180
YourSeq | 55 | 2236 | 2414 | 3000 | 93.8% | chr1 | - | 121354013 | 121354206 | 194
YourSeq | 55 | 2234 | 2414 | 3000 | 84.7% | chr17 | + | 43514260 | 43514436 | 177
YourSeq | 51 | 2330 | 2404 | 3000 | 88.1% | chr6 | - | 50060300 | 50060375 | 76
YourSeq | 50 | 2236 | 2397 | 3000 | 86.3% | chr15 | + | 3254575 | 3254733 | 159
YourSeq | 49 | 1486 | 1727 | 3000 | 94.6% | chr7 | + | 122183515 | 122183758 | 244
YourSeq | 48 | 2236 | 2399 | 3000 | 79.7% | chr6 | + | 119827540 | 119827691 | 152
YourSeq | 45 | 2350 | 2422 | 3000 | 96.0% | chr19 | + | 27747451 | 27747529 | 79
YourSeq | 45 | 2238 | 2399 | 3000 | 88.7% | chr16 | + | 58248601 | 58248761 | 161
YourSeq | 45 | 2335 | 2399 | 3000 | 86.6% | chr11 | + | 65610154 | 65610216 | 63
YourSeq | 44 | 2842 | 2951 | 3000 | 78.5% | chr11 | - | 40511846 | 40511957 | 112
YourSeq | 44 | 2239 | 2374 | 3000 | 66.2% | chr2 | + | 122699496 | 122699631 | 136
YourSeq | 42 | 1672 | 1726 | 3000 | 92.0% | chr19 | + | 23716556 | 23716611 | 56

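Note: The 3000 bp section upstream of exon 2 was BLAT-searched against the genome. Apart from the expected 100% match at its own locus on chr6, all hits are short partial alignments (scores of 77 or lower), so no significant similarity was found.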

BLAT Search Results (downstream)

QUERY | SCORE | QSTART | QEND | QSIZE | IDENTITY | CHROM | STRAND | TSTART | TEND | SPAN
YourSeq | 3000 | 1 | 3000 | 3000 | 100.0% | chr6 | + | 108076475 | 108079474 | 3000
YourSeq | 111 | 1940 | 2138 | 3000 | 83.3% | chr11 | - | 101944685 | 101944900 | 216
YourSeq | 102 | 1914 | 2193 | 3000 | 87.7% | chr12 | + | 81789891 | 81790247 | 357
YourSeq | 80 | 2021 | 2166 | 3000 | 90.9% | chr2 | - | 131962524 | 131962784 | 261
YourSeq | 80 | 2016 | 2166 | 3000 | 85.8% | chr18 | + | 46400546 | 46400696 | 151
YourSeq | 77 | 1923 | 2130 | 3000 | 89.8% | chr9 | - | 54966341 | 54966567 | 227
YourSeq | 76 | 1925 | 2140 | 3000 | 88.0% | chr12 | + | 28720290 | 28720510 | 221
YourSeq | 67 | 1919 | 2136 | 3000 | 92.5% | chr4 | - | 53064814 | 53065080 | 267
YourSeq | 67 | 1914 | 2136 | 3000 | 92.6% | chr10 | + | 72579934 | 72580175 | 242
YourSeq | 62 | 2018 | 2173 | 3000 | 87.4% | chr18 | - | 34894933 | 34895087 | 155
YourSeq | 62 | 1887 | 2160 | 3000 | 92.0% | chr2 | + | 84863827 | 84864225 | 399
YourSeq | 60 | 2574 | 2662 | 3000 | 86.8% | chr15 | - | 101281300 | 101281397 | 98
YourSeq | 58 | 2024 | 2188 | 3000 | 92.7% | chr9 | - | 115146171 | 115146342 | 172
YourSeq | 57 | 1948 | 2051 | 3000 | 90.3% | chr9 | - | 87244990 | 87245109 | 120
YourSeq | 57 | 2580 | 2740 | 3000 | 79.0% | chr11 | + | 88854670 | 88854819 | 150
YourSeq | 56 | 1919 | 2143 | 3000 | 89.1% | chr7 | + | 7985459 | 7985726 | 268
YourSeq | 56 | 2035 | 2166 | 3000 | 86.8% | chr13 | + | 56497573 | 56497702 | 130
YourSeq | 55 | 1939 | 2142 | 3000 | 96.7% | chr6 | - | 57549550 | 57549843 | 294
YourSeq | 55 | 2022 | 2140 | 3000 | 87.7% | chr16 | - | 30143669 | 30143789 | 121
YourSeq | 55 | 2047 | 2160 | 3000 | 84.0% | chr8 | + | 122528416 | 122528528 | 113

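Note: The 3000 bp section downstream of exon 2 was BLAT-searched against the genome. Apart from the expected 100% match at its own locus on chr6, all hits are short partial alignments (scores of 111 or lower), so no significant similarity was found.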
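Reading the BLAT results comes down to setting aside the full-length self match and checking whether any remaining hit covers a large part of the query at high identity. The Python sketch below parses rows in the UCSC web output layout, accepting either whitespace- or pipe-separated columns. The coverage (50%) and identity (90%) cutoffs are hypothetical, and treating a full-length 100% hit as the query's own locus is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one result row; field 0 is the query name (e.g. "YourSeq")."""
    f = line.replace("|", " ").split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def off_target_hits(hits, min_query_cov: float = 0.5,
                    min_identity: float = 90.0) -> list:
    """Hits other than the full-length self match that cover much of the
    query at high identity (both cutoffs are hypothetical)."""
    flagged = []
    for h in hits:
        cov = (h.q_end - h.q_start + 1) / h.q_size
        is_self = cov == 1.0 and h.identity == 100.0
        if not is_self and cov >= min_query_cov and h.identity >= min_identity:
            flagged.append(h)
    return flagged


rows = [
    "YourSeq 3000 1 3000 3000 100.0% chr6 + 108072457 108075456 3000",
    "YourSeq 77 2159 2401 3000 89.8% chr5 + 131175804 131176046 243",
]
print(off_target_hits([parse_hit(r) for r in rows]))  # → []
```

The second row aligns only 243 of 3000 query bases (about 8%), which is why it is not flagged.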


Gene and protein information: Setmar, SET domain without mariner transposase fusion [Mus musculus (house mouse)]. Gene ID: 74729, updated on 12-Aug-2019

Gene summary

Official Symbol: Setmar (provided by MGI)
Official Full Name: SET domain without mariner transposase fusion (provided by MGI)
Primary source: MGI:MGI:1921979
See related: Ensembl:ENSMUSG00000034639
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Etet2; 5830404F24Rik
Summary: This gene encodes a histone-lysine N-methyltransferase that may be involved in the methylation of […]. In anthropoid primates this gene is a fusion gene of a SET histone-lysine N-methyltransferase and a mariner (MAR) family transposase. In all other species this gene contains only the SET domain. [provided by RefSeq, Jan 2013]
Expression: Ubiquitous expression in bladder adult (RPKM 2.7), frontal lobe adult (RPKM 1.9) and 28 other tissues
Orthologs: human, all

Genomic context

Location: chromosome 6; 6 E1

Exon count: 3

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 6 | NC_000072.6 (108065045..108077127)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 6 | NC_000072.5 (108015040..108027114)
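NCBI reports locations as 1-based, fully closed intervals, so the gene length is end - start + 1, about 12.08 kb on GRCm38. Tools that read BED files instead expect 0-based, half-open coordinates. A small Python sketch of both conversions follows. The "chr6" sequence name follows UCSC naming and is an assumption about whichever downstream tool consumes the interval.

```python
def span_bp(start: int, end: int) -> int:
    """Length of a 1-based closed interval such as NCBI's start..end."""
    return end - start + 1


def to_bed(chrom: str, start: int, end: int) -> tuple:
    """Convert a 1-based closed interval to 0-based half-open (BED-style)."""
    return chrom, start - 1, end


print(span_bp(108065045, 108077127))  # GRCm38 → 12083
print(span_bp(108015040, 108027114))  # MGSCv37 → 12075
print(to_bed("chr6", 108065045, 108077127))  # → ('chr6', 108065044, 108077127)
```

Only the start moves in the BED conversion. The end value stays the same because a half-open interval excludes its end position.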

Chromosome 6 - NC_000072.6


Transcript information: This gene has 2 transcripts

Gene: Setmar ENSMUSG00000034639

Description: SET domain without mariner transposase fusion [Source: MGI Symbol; Acc: MGI:1921979]
Gene synonyms: 5830404F24Rik, Etet2
Location: Chromosome 6: 108,065,045-108,077,122, forward strand (GRCm38:CM000999.2)
About this gene: This gene has 2 transcripts (splice variants), 177 orthologues and 21 paralogues, is a member of 1 Ensembl protein family and is associated with 9 phenotypes.

Transcripts

Name | Transcript ID | Length (bp) | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Setmar-201 | ENSMUST00000049246.6 | 1612 | 309 aa | ENSMUSP00000048225.5 | Protein coding | CCDS51867 | Q80UJ9 | TSL:1, GENCODE basic, APPRIS P1
Setmar-202 | ENSMUST00000138140.2 | 1748 | 60 aa | ENSMUSP00000145263.1 | Nonsense mediated decay | - | A0A0N4SVW0 | TSL:3

[Figure: Ensembl view of the Setmar locus (32.08 kb region, chromosome 6, 108.06-108.08 Mb) showing transcripts Setmar-201 (protein coding) and Setmar-202 (nonsense-mediated decay), contig AC153916.2, and the Regulatory Build (CTCF, open chromatin, promoter, promoter flank, transcription factor binding site). Gene legend: protein coding (merged Ensembl/Havana); non-protein coding (processed transcript).]


Transcript: ENSMUST00000049246

[Figure: Ensembl transcript and protein view of Setmar-201 (12.08 kb, forward strand; protein ENSMUSP00000048225, 309 aa). Domain tracks: low complexity (Seg); Superfamily SSF82199; SMART pre-SET and SET domains; Pfam pre-SET and SET domains; PROSITE profiles pre-SET, SET and post-SET domains; PANTHER PTHR45660 and PTHR45660:SF13; Gene3D 2.170.270.10. A variant track shows sequence variants (dbSNP and all other sources), with missense and synonymous variants marked.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
