https://www.alphaknockout.com

Mouse S100a16 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create an S100a16 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The S100a16 gene (NCBI Reference Sequence: NM_026416; Ensembl: ENSMUSG00000074457) is located on mouse chromosome 3. Three exons are identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 3 (transcript: ENSMUST00000098910). Exons 2-3 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse S100a16 gene. To engineer the targeting vector, the homology arms and the cKO region will be generated by PCR using BAC clone RP24-204K10 as the template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs to produce cKO mice. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exons 2-3 cover 100.0% of the coding region, with the start codon in exon 2 and the stop codon in exon 3. Intron 1, into which the 5' loxP site will be inserted, is 676 bp. The effective cKO region is ~1882 bp. No other known gene lies within the cKO region.
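The "100.0% of the coding region" figure is the fraction of coding (CDS) bases that fall inside the cKO interval. The Python sketch below shows one way to compute it. The exon coordinates are hypothetical placeholders (this report does not list them), chosen only so that the CDS totals 375 nt, i.e. 124 codons for the 124-aa protein plus one stop codon.

```python
def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of bases shared by two 1-based, inclusive intervals."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return max(0, end - start + 1)


def cds_coverage(cds_segments: list[tuple[int, int]], cko: tuple[int, int]) -> float:
    """Fraction of coding bases that lie inside the cKO interval."""
    total = sum(end - start + 1 for start, end in cds_segments)
    covered = sum(overlap(seg, cko) for seg in cds_segments)
    return covered / total


# Hypothetical coding parts of exon 2 and exon 3 (NOT real S100a16 coordinates):
# 51 + 324 = 375 nt = 124 codons + 1 stop codon.
CDS = [(100, 150), (300, 623)]

print(cds_coverage(CDS, (90, 700)))   # cKO spanning both coding exons
print(cds_coverage(CDS, (300, 700)))  # cKO covering only the exon-3 part
```

With both coding exons inside the interval the coverage is 1.0, which matches the report's 100.0%. A cKO that left the exon-2 coding part outside would cover only 324/375 of the CDS.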


Overview of the Targeting Strategy

[Figure: targeting strategy schematic showing the wildtype allele (exons 1, 2 and 3 with the 5' and 3' gRNA regions), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: homology arm; exon of mouse S100a16; cKO region; loxP site.]
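The step from the targeted allele to the constitutive KO allele relies on Cre recombinase excising the DNA between two directly repeated loxP sites. The toy Python sketch below models that string-level effect. The 34-bp loxP sequence is the canonical one; the floxed allele is built from hypothetical placeholder labels, not real S100a16 sequence, and the function name `cre_excise` is illustrative only.

```python
# Canonical 34-bp loxP: 13-bp inverted repeat + 8-bp spacer + 13-bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Simulate Cre excision between the first two directly repeated loxP sites.

    The DNA between the sites is removed together with one copy of the site,
    so a single loxP "scar" remains. Inverted sites (which Cre would flip
    rather than excise) are not modelled here.
    """
    first = allele.find(site)
    if first == -1:
        return allele  # no loxP: nothing for Cre to act on
    second = allele.find(site, first + len(site))
    if second == -1:
        return allele  # one site alone cannot be excised
    return allele[:first] + allele[second:]


# Hypothetical floxed allele built from placeholder labels, not real sequence.
floxed = "[5' arm + intron 1]" + LOXP + "[exons 2-3]" + LOXP + "[3' arm]"
ko = cre_excise(floxed)
print(ko)
```

The result keeps the flanking sequence and exactly one loxP site, while the floxed exons 2-3 are gone. This is why the KO allele in the schematic is shorter than the targeted allele.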


Overview of the Dot Plot (window size: 10 bp)

[Figure: dot plot of the sequence aligned against itself, forward and reverse-complement strands.]

Note: The sequence of the homology arms and cKO region was aligned against itself to detect tandem repeats. Tandem repeats were found in the dot-plot matrix, so this targeting vector may be difficult to construct.
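A self-vs-self dot plot marks every pair of positions whose windows match. A tandem repeat then appears as a line parallel to the main diagonal, offset by the repeat period. The Python sketch below assumes exact 10-bp window matches. The vendor's plotting tool may use a different scoring rule, and the toy sequence is hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def window_index(seq: str, window: int) -> dict[str, list[int]]:
    """Map every length-`window` substring to the positions where it starts."""
    index: dict[str, list[int]] = {}
    for i in range(len(seq) - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)
    return index


def forward_hits(seq: str, window: int = 10) -> list[tuple[int, int]]:
    """Off-diagonal (i, j), i < j, where the two windows are identical."""
    hits = []
    for starts in window_index(seq, window).values():
        for a, i in enumerate(starts):
            for j in starts[a + 1:]:
                hits.append((i, j))
    return sorted(hits)


def revcomp_hits(seq: str, window: int = 10) -> list[tuple[int, int]]:
    """(i, j) where window i equals the reverse complement of window j."""
    index = window_index(seq, window)
    hits = []
    for j in range(len(seq) - window + 1):
        for i in index.get(revcomp(seq[j:j + window]), []):
            hits.append((i, j))
    return sorted(hits)


def repeat_periods(hits: list[tuple[int, int]], min_support: int = 5) -> dict[int, int]:
    """Offsets j - i backed by at least `min_support` matching windows.

    Many hits at one offset form a line parallel to the main diagonal,
    the dot-plot signature of a tandem repeat with that period.
    """
    counts = Counter(j - i for i, j in hits)
    return {d: n for d, n in counts.items() if n >= min_support}


# Hypothetical toy sequence: a 10-bp unit repeated three times in tandem.
toy = "ACGGTCATTG" * 3
print(repeat_periods(forward_hits(toy)))
```

In the toy example, 11 window pairs share an offset of 10 bp, which exposes the 10-bp tandem unit. The `min_support` cut-off is an assumed parameter for separating real diagonals from isolated chance matches.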

Overview of the GC Content Distribution (window size: 300 bp)

[Figure: GC content distribution along the sequence.]

Summary: full length 7064 bp | A 1656 (23.44%) | C 1662 (23.53%) | T 1888 (26.73%) | G 1858 (26.30%)

Note: The GC content of the homology arms and cKO region was analyzed. No region of significantly high GC content was found, so this region is suitable for PCR screening and sequencing analysis.
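A GC-content distribution like the one summarized above can be computed by sliding a fixed window along the sequence. The Python sketch below uses the report's 300-bp window. The 50-bp step and the 70% "high GC" threshold are assumptions, since the report does not state its cut-off. The overall GC fraction is computed directly from the base counts in the summary line.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_profile(seq: str, window: int = 300, step: int = 50) -> list[tuple[int, float]]:
    """(window start, GC fraction) for each full window along the sequence."""
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def high_gc_windows(seq: str, window: int = 300, step: int = 50,
                    threshold: float = 0.70) -> list[int]:
    """Start positions of windows whose GC fraction exceeds `threshold` (assumed cut-off)."""
    return [start for start, gc in gc_profile(seq, window, step) if gc > threshold]


# Overall GC from the summary base counts: (C + G) / full length.
overall_gc = (1662 + 1858) / 7064
print(f"overall GC: {overall_gc:.2%}")
```

From the summary counts, the overall GC content is 3520/7064, about 49.8%. This is consistent with the note that no high-GC region was found.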


BLAT Search Results (up)

| Query | Score | Query start | Query end | Query size | Identity | Chrom | Strand | Genome start | Genome end | Span |
|---|---|---|---|---|---|---|---|---|---|---|
| YourSeq | 3000 | 1 | 3000 | 3000 | 100.0% | chr3 | + | 90538771 | 90541770 | 3000 |
| YourSeq | 81 | 14 | 104 | 3000 | 95.7% | chr1 | + | 153550528 | 153550623 | 96 |
| YourSeq | 68 | 2602 | 2927 | 3000 | 95.9% | chr5 | + | 4195226 | 4195798 | 573 |
| YourSeq | 67 | 27 | 104 | 3000 | 93.6% | chr2 | + | 174461078 | 174461156 | 79 |
| YourSeq | 44 | 2881 | 2924 | 3000 | 100.0% | chr9 | - | 29838614 | 29838657 | 44 |
| YourSeq | 37 | 1254 | 1290 | 3000 | 100.0% | chr11 | + | 99881455 | 99881491 | 37 |
| YourSeq | 24 | 780 | 807 | 3000 | 88.5% | chr15 | - | 50567555 | 50567581 | 27 |
| YourSeq | 23 | 356 | 378 | 3000 | 100.0% | chr5 | - | 63928196 | 63928218 | 23 |
| YourSeq | 22 | 273 | 298 | 3000 | 92.4% | chr7 | - | 112187741 | 112187766 | 26 |
| YourSeq | 22 | 357 | 378 | 3000 | 100.0% | chr11 | - | 118626275 | 118626296 | 22 |
| YourSeq | 21 | 1911 | 1931 | 3000 | 100.0% | chr13 | - | 19481366 | 19481386 | 21 |

Note: The 3000 bp region upstream of exon 2 was searched against the genome with BLAT. Apart from the full-length match at the query's own locus on chr3, no significant similarity was found.

BLAT Search Results (down)

| Query | Score | Query start | Query end | Query size | Identity | Chrom | Strand | Genome start | Genome end | Span |
|---|---|---|---|---|---|---|---|---|---|---|
| YourSeq | 3000 | 1 | 3000 | 3000 | 100.0% | chr3 | + | 90542585 | 90545584 | 3000 |
| YourSeq | 195 | 2423 | 2986 | 3000 | 78.6% | chr16 | - | 12919742 | 12920293 | 552 |
| YourSeq | 174 | 2423 | 2967 | 3000 | 80.5% | chr8 | + | 113220132 | 113220660 | 529 |
| YourSeq | 171 | 2405 | 2927 | 3000 | 84.1% | chr16 | - | 50658682 | 50659224 | 543 |
| YourSeq | 165 | 2423 | 2981 | 3000 | 83.7% | chr13 | + | 91620641 | 91621208 | 568 |
| YourSeq | 165 | 1946 | 2652 | 3000 | 75.5% | chr10 | + | 90310308 | 90310816 | 509 |
| YourSeq | 163 | 1566 | 2457 | 3000 | 86.0% | chr16 | + | 46125219 | 46126103 | 885 |
| YourSeq | 162 | 1711 | 2489 | 3000 | 79.2% | chr10 | - | 44977105 | 44977719 | 615 |
| YourSeq | 161 | 2160 | 2389 | 3000 | 91.7% | chr17 | + | 43319150 | 43319339 | 190 |
| YourSeq | 158 | 1611 | 2156 | 3000 | 85.1% | chr2 | + | 16888794 | 16889351 | 558 |
| YourSeq | 158 | 1956 | 2644 | 3000 | 75.7% | chr1 | + | 104854186 | 104854628 | 443 |
| YourSeq | 147 | 2406 | 2881 | 3000 | 82.4% | chr5 | - | 66696090 | 66696549 | 460 |
| YourSeq | 146 | 2406 | 2874 | 3000 | 86.0% | chr3 | - | 108006652 | 108007107 | 456 |
| YourSeq | 146 | 2154 | 2389 | 3000 | 95.6% | chr2 | - | 124857170 | 124857685 | 516 |
| YourSeq | 141 | 2154 | 2310 | 3000 | 98.0% | chr3 | + | 42611199 | 42611361 | 163 |
| YourSeq | 139 | 2423 | 2843 | 3000 | 78.4% | chr6 | + | 115235065 | 115235462 | 398 |
| YourSeq | 136 | 2154 | 2339 | 3000 | 97.3% | chr11 | + | 62724247 | 62724436 | 190 |
| YourSeq | 131 | 2257 | 2389 | 3000 | 99.3% | chr7 | - | 51280264 | 51280396 | 133 |
| YourSeq | 131 | 2257 | 2389 | 3000 | 99.3% | chr15 | - | 4433129 | 4433261 | 133 |
| YourSeq | 131 | 2257 | 2389 | 3000 | 99.3% | chr10 | - | 66495113 | 66495245 | 133 |

Note: The 3000 bp region downstream of exon 3 was searched against the genome with BLAT. Apart from the full-length match at the query's own locus on chr3, no significant similarity was found.
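The "no significant similarity" calls can be made reproducible by separating the self-hit from off-target hits and applying an explicit score cut-off. In the Python sketch below, the self-hit rule (full-length alignment at 100% identity) and the cut-off (score of at least 10% of the query size) are assumptions, not the vendor's criteria. The rows are copied from the top of the two BLAT tables above.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_row(row: str) -> BlatHit:
    """Parse one whitespace-separated BLAT row (query name first)."""
    f = row.split()  # f[0] is the query name ("YourSeq")
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def is_self_hit(h: BlatHit) -> bool:
    """Full-length alignment at 100% identity: assumed to be the query's own locus."""
    return h.q_start == 1 and h.q_end == h.q_size and h.identity == 100.0


def significant_off_targets(hits: list[BlatHit],
                            min_score_fraction: float = 0.1) -> list[BlatHit]:
    """Non-self hits whose score reaches `min_score_fraction` of the query size (assumed cut-off)."""
    return [h for h in hits
            if not is_self_hit(h) and h.score >= min_score_fraction * h.q_size]


# Top rows of the upstream and downstream tables above.
UPSTREAM = [
    "YourSeq 3000 1 3000 3000 100.0% chr3 + 90538771 90541770 3000",
    "YourSeq 81 14 104 3000 95.7% chr1 + 153550528 153550623 96",
    "YourSeq 68 2602 2927 3000 95.9% chr5 + 4195226 4195798 573",
]
DOWNSTREAM = [
    "YourSeq 3000 1 3000 3000 100.0% chr3 + 90542585 90545584 3000",
    "YourSeq 195 2423 2986 3000 78.6% chr16 - 12919742 12920293 552",
    "YourSeq 174 2423 2967 3000 80.5% chr8 + 113220132 113220660 529",
]

for name, rows in (("upstream", UPSTREAM), ("downstream", DOWNSTREAM)):
    hits = [parse_row(r) for r in rows]
    print(name, "off-target hits:", significant_off_targets(hits))
```

Under this assumed cut-off of 300 for a 3000-bp query, the strongest off-target scores (81 upstream, 195 downstream) fall well short, which agrees with the notes. A stricter or looser threshold is a single parameter change.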


Gene information: S100a16, S100 calcium binding protein A16 [Mus musculus (house mouse)], Gene ID: 67860, updated on 24-Oct-2019

Gene summary

Official Symbol: S100a16 (provided by MGI)
Official Full Name: S100 calcium binding protein A16 (provided by MGI)
Primary source: MGI:MGI:1915110
See related: Ensembl:ENSMUSG00000074457
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: S100F; DT1P1A7; AI325039; AI663996; 2300002L21Rik
Expression: Ubiquitous expression in colon adult (RPKM 66.5), stomach adult (RPKM 63.5) and 27 other tissues
Orthologs: human, all

Genomic context

Location: 3; 3 F1

Exon count: 6

| Annotation release | Status | Assembly | Chr | Location |
|---|---|---|---|---|
| 108 | current | GRCm38.p6 (GCF_000001635.26) | 3 | NC_000069.6 (90537254..90543151) |
| Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 3 | NC_000069.5 (90345145..90347073) |

[Figure: genomic context on chromosome 3, NC_000069.6]


Transcript information: This gene has 8 transcripts

Gene: S100a16 ENSMUSG00000074457

Description: S100 calcium binding protein A16 [Source:MGI Symbol;Acc:MGI:1915110]
Gene synonyms: 2300002L21Rik
Location: Chromosome 3: 90,537,254-90,543,151, forward strand (GRCm38:CM000996.2)
About this gene: This gene has 8 transcripts (splice variants), 176 orthologues, 20 paralogues and is a member of 1 Ensembl protein family.

Transcripts

| Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags |
|---|---|---|---|---|---|---|---|---|
| S100a16-202 | ENSMUST00000098911.9 | 1127 | 124aa | ENSMUSP00000096510.3 | Protein coding | CCDS17536 | Q9D708 | TSL:1, GENCODE basic, APPRIS P1 |
| S100a16-201 | ENSMUST00000098910.2 | 1123 | 124aa | ENSMUSP00000096509.2 | Protein coding | CCDS17536 | Q9D708 | TSL:1, GENCODE basic, APPRIS P1 |
| S100a16-206 | ENSMUST00000107335.1 | 926 | 124aa | ENSMUSP00000102958.1 | Protein coding | CCDS17536 | Q9D708 | TSL:5, GENCODE basic, APPRIS P1 |
| S100a16-205 | ENSMUST00000107334.7 | 879 | 124aa | ENSMUSP00000102957.1 | Protein coding | CCDS17536 | Q9D708 | TSL:3, GENCODE basic, APPRIS P1 |
| S100a16-203 | ENSMUST00000107331.7 | 723 | 124aa | ENSMUSP00000102954.1 | Protein coding | CCDS17536 | Q9D708 | TSL:2, GENCODE basic, APPRIS P1 |
| S100a16-204 | ENSMUST00000107333.7 | 608 | 124aa | ENSMUSP00000102956.1 | Protein coding | CCDS17536 | Q9D708 | TSL:3, GENCODE basic, APPRIS P1 |
| S100a16-208 | ENSMUST00000150833.7 | 422 | 72aa | ENSMUSP00000119168.1 | Protein coding | - | D3Z2Y6 | CDS 3' incomplete, TSL:3 |
| S100a16-207 | ENSMUST00000127008.1 | 464 | No protein | - | lncRNA | - | - | TSL:3 |


[Figure: Ensembl region view, 25.90 kb of chromosome 3 (90.53 to 90.55 Mb), comprehensive transcript set. Tracks show S100a14 transcripts (S100a14-201, -202 and -203: protein coding; S100a14-204: retained intron), the eight S100a16 transcripts (S100a16-201 to -206 and -208: protein coding; S100a16-207: lncRNA), Gm24593-201 (snRNA), contig AC160552.13 and the Regulatory Build. Regulation legend: CTCF, open chromatin, promoter, promoter flank. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, RNA gene).]


Transcript: ENSMUST00000098910

[Figure: Ensembl protein view of S100a16-201 (protein coding; 1.99 kb, forward strand), scale 0 to 124 aa.]

Protein features (ENSMUSP00000096...):
MobiDB lite
Low complexity (Seg)
Superfamily: EF-hand domain pair
SMART: S100/CaBP-9k-type, calcium binding, subdomain
Pfam: S100/CaBP-9k-type, calcium binding, subdomain
PROSITE profiles: EF-hand domain
PROSITE patterns: EF-Hand 1, calcium-binding site
PANTHER: PTHR11639:SF76; PTHR11639
Gene3D: 1.10.238.10
CDD: cd05022

Variant tracks: All sequence SNPs/i...; Sequence variants (dbSNP and all other sources). Variant legend: synonymous variant.

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
