https://www.alphaknockout.com

Mouse Knockout Project (CRISPR/Cas9)

Objective: To create a Fam221b knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Fam221b gene (NCBI Reference Sequence: NM_175517; Ensembl: ENSMUSG00000043633) is located on mouse chromosome 4. Seven exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 7 (transcript: ENSMUST00000056474). Exons 2 to 7 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Male mice homozygous for a null allele exhibit normal fecundity.

Exon 2 starts at about 0.07% of the coding region, and exons 2 to 7 cover 100.0% of the coding region. The effective KO region is ~6950 bp and contains no other known gene.
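To make the PCR genotyping step concrete, here is a minimal sketch of the band sizes expected from a single primer pair that flanks a deletion of about 6950 bp. The primer and deletion coordinates are hypothetical placeholders, not the actual Fam221b gRNA cut sites or genotyping primers. The function names are also illustrative only.

```python
# Hypothetical sketch: expected PCR band sizes for primers flanking a deletion.
# All coordinates are illustrative placeholders, not the real Fam221b
# gRNA cut sites or genotyping primer positions.


def amplicon_length(fwd_start: int, rev_end: int) -> int:
    """Length of a product spanning fwd_start..rev_end (1-based, inclusive)."""
    return rev_end - fwd_start + 1


def expected_bands(fwd_start: int, rev_end: int,
                   del_start: int, del_end: int) -> tuple[int, int]:
    """Return (wildtype, knockout) product sizes.

    The deleted interval del_start..del_end (1-based, inclusive) must lie
    strictly between the two primer binding sites.
    """
    if not (fwd_start < del_start <= del_end < rev_end):
        raise ValueError("primers must flank the deleted region")
    wildtype = amplicon_length(fwd_start, rev_end)
    deleted = del_end - del_start + 1
    return wildtype, wildtype - deleted


# Placeholder layout: a 6950 bp deletion with 300 bp of flank on each side.
del_start, del_end = 1_001, 7_950
fwd, rev = del_start - 300, del_end + 300
wt, ko = expected_bands(fwd, rev, del_start, del_end)
print(f"wildtype: {wt} bp, knockout: {ko} bp")  # wildtype: 7550 bp, knockout: 600 bp
```

Under these assumptions the KO allele gives a short product, while the wildtype product is over 7 kb. Such a long product can amplify poorly in routine PCR, so designs often add a second primer pair inside the deleted region to detect the wildtype allele. The primer layout actually used for this project is not specified.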


Overview of the Targeting Strategy

[Figure: wildtype Fam221b allele (5' to 3') showing exons 1 to 7, with two gRNA regions flanking the knockout region. Legend: exon of mouse Fam221b; knockout region.]


Overview of the Dot Plot (up), window size: 15 bp

[Figure: dot plot of the upstream sequence against itself, forward and reverse-complement strands.]

Note: The 2000 bp section upstream of the start codon is aligned with itself to detect tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down), window size: 15 bp

[Figure: dot plot of the downstream sequence against itself, forward and reverse-complement strands.]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to detect tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
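The self-alignment behind both dot plots can be approximated in a few lines. This is a simplified sketch and not the tool the report used. It records exact matches between 15 bp windows, whereas typical dot-plot programs score each window pair against a match threshold. The function names are hypothetical. In forward mode, hits off the main diagonal at a constant offset are the signature of a tandem repeat with that period. An empty result corresponds to "no significant tandem repeat".

```python
# Simplified self dot plot using exact word matches (window size 15 bp).
# Exact matching is an assumption; real dot-plot tools usually score
# windows and keep those above a similarity threshold.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def dot_plot_hits(seq: str, window: int = 15,
                  reverse: bool = False) -> list[tuple[int, int]]:
    """Return (i, j) window start pairs whose sequences are identical.

    Forward mode compares seq with itself and skips the trivial main
    diagonal (i == j). Reverse mode compares seq with its reverse
    complement, where j indexes the reverse-complement string.
    """
    seq = seq.upper()
    other = reverse_complement(seq) if reverse else seq
    # Index every window of the second sequence by its content.
    positions: dict[str, list[int]] = {}
    for j in range(len(other) - window + 1):
        positions.setdefault(other[j:j + window], []).append(j)
    hits = []
    for i in range(len(seq) - window + 1):
        for j in positions.get(seq[i:i + window], []):
            if reverse or i != j:
                hits.append((i, j))
    return hits


def tandem_repeat_periods(seq: str, window: int = 15) -> set[int]:
    """Diagonal offsets (j - i > 0) of forward off-diagonal hits."""
    return {j - i for i, j in dot_plot_hits(seq, window) if j > i}
```

For example, `tandem_repeat_periods("ACGTTGCA" * 5)` returns `{8, 16, 24}`, which is the 8 bp repeat unit and its multiples. A sequence with no repeated 15-mer gives an empty set.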


Overview of the GC Content Distribution (up), window size: 300 bp

[Figure: GC content of the upstream sequence in 300 bp windows.]

Summary: full length 2000 bp; A 25.6% (512), C 23.05% (461), T 26.2% (524), G 25.15% (503)

Note: The 2000 bp section upstream of the start codon is analyzed to determine its GC content. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.
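The GC content distribution is a sliding-window calculation. This minimal sketch makes two assumptions that the report does not state: a step of 1 bp, and a 70% cut-off for calling a window high-GC. It uses prefix sums so that each 300 bp window is evaluated in constant time. The function names are hypothetical.

```python
# Sliding-window GC content (window size 300 bp, step 1 bp).
# The 70% "high GC" threshold is an assumed, illustrative value.


def sliding_gc(seq: str, window: int = 300) -> list[float]:
    """GC fraction for every window start position, via prefix sums."""
    seq = seq.upper()
    prefix = [0]
    for base in seq:
        # prefix[k] = number of G/C bases in seq[:k]
        prefix.append(prefix[-1] + (base in "GC"))
    return [
        (prefix[i + window] - prefix[i]) / window
        for i in range(len(seq) - window + 1)
    ]


def high_gc_windows(seq: str, window: int = 300,
                    threshold: float = 0.70) -> list[int]:
    """0-based start positions of windows whose GC fraction >= threshold."""
    return [i for i, gc in enumerate(sliding_gc(seq, window)) if gc >= threshold]
```

For a real 2000 bp flank, an empty list from `high_gc_windows` corresponds to the report's finding of no significant high-GC region.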

Overview of the GC Content Distribution (down), window size: 300 bp

[Figure: GC content of the downstream sequence in 300 bp windows.]

Summary: full length 2000 bp; A 30.05% (601), C 18.95% (379), T 22.65% (453), G 28.35% (567)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine its GC content. No significant high-GC region is found, so this region is suitable for PCR screening or sequencing analysis.
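As a quick arithmetic check, the overall GC content of each flank follows directly from the base counts in the two composition summaries: (C + G) / total. The dictionary and function names here are illustrative.

```python
# Recompute overall GC content from the reported base counts.

SUMMARIES = {
    "upstream": {"A": 512, "C": 461, "T": 524, "G": 503},
    "downstream": {"A": 601, "C": 379, "T": 453, "G": 567},
}


def gc_from_counts(counts: dict[str, int]) -> float:
    """Fraction of G + C among all counted bases."""
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total


for name, counts in SUMMARIES.items():
    print(f"{name}: {sum(counts.values())} bp, GC = {gc_from_counts(counts):.1%}")
```

Both flanks total 2000 bp. Their overall GC content is 48.2% upstream and 47.3% downstream, which is consistent with the absence of high-GC regions.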


BLAT Search Results (up)

QUERY    SCORE  Q.START  Q.END  QSIZE  IDENTITY  CHROM  STRAND  T.START    T.END      SPAN
YourSeq  2000   1        2000   2000   100.0%    chr4   -       43666610   43668609   2000
YourSeq  61     206      373    2000   91.9%     chr9   -       67528536   67528710   175
YourSeq  51     577      668    2000   78.6%     chr1   -       64637249   64637316   68
YourSeq  51     196      255    2000   94.7%     chrX   +       160171731  160172267  537
YourSeq  51     583      668    2000   96.4%     chr7   +       126805804  126805898  95
YourSeq  51     198      378    2000   77.8%     chr17  +       27418922   27419089   168
YourSeq  50     207      378    2000   86.2%     chr10  -       85900377   85900547   171
YourSeq  49     222      378    2000   91.4%     chr2   -       172285266  172285424  159
YourSeq  46     218      379    2000   92.6%     chr3   -       94990872   94991044   173
YourSeq  46     577      668    2000   94.3%     chr1   +       157863738  157863831  94
YourSeq  45     221      373    2000   91.7%     chr9   -       86830595   86830746   152
YourSeq  45     575      668    2000   85.8%     chr11  +       96159866   96160082   217
YourSeq  45     199      303    2000   78.9%     chr10  +       41938261   41938356   96
YourSeq  44     221      361    2000   94.0%     chr6   -       86097001   86097147   147
YourSeq  44     221      332    2000   94.0%     chr12  -       116459255  116459366  112
YourSeq  44     199      295    2000   81.3%     chr7   +       16441517   16441602   86
YourSeq  44     221      312    2000   94.0%     chr6   +       113005276  113005368  93
YourSeq  44     220      295    2000   92.2%     chr2   +       163506154  163506231  78
YourSeq  43     216      261    2000   97.8%     chrX   -       101121509  101121558  50
YourSeq  43     628      983    2000   62.5%     chr11  +       59116834   59116995   162

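Note: The 2000 bp section upstream of the start codon is BLAT searched against the genome. Apart from the self-hit on chr4 (score 2000, 100.0% identity over the full query), the best hit scores only 61, so no significant similarity is found elsewhere in the genome.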

BLAT Search Results (down)

QUERY    SCORE  Q.START  Q.END  QSIZE  IDENTITY  CHROM  STRAND  T.START    T.END      SPAN
YourSeq  2000   1        2000   2000   100.0%    chr4   -       43657658   43659657   2000
YourSeq  37     1953     2000   2000   95.2%     chr5   -       151201272  151201348  77
YourSeq  34     1950     1992   2000   90.5%     chr9   -       97080811   97080883   73
YourSeq  34     1953     1992   2000   92.5%     chr2   -       26084346   26084385   40
YourSeq  34     1951     1996   2000   87.0%     chr17  +       31661791   31661836   46
YourSeq  32     1949     2000   2000   80.8%     chr6   -       25569991   25570042   52
YourSeq  32     1949     2000   2000   80.8%     chr18  -       52438461   52438512   52
YourSeq  32     1951     2000   2000   82.0%     chr12  -       108343616  108343665  50
YourSeq  32     1953     1996   2000   86.4%     chr12  -       70706351   70706394   44
YourSeq  32     1951     2000   2000   82.0%     chr1   -       120711702  120711751  50
YourSeq  32     1954     1992   2000   97.1%     chr11  +       87332460   87332527   68
YourSeq  31     1958     2000   2000   86.1%     chr11  +       107345204  107345246  43
YourSeq  30     1953     2000   2000   81.3%     chr6   -       85316012   85316059   48
YourSeq  30     1951     1983   2000   96.9%     chr4   -       141108965  141109025  61
YourSeq  30     1953     2000   2000   81.3%     chr6   +       42901334   42901381   48
YourSeq  30     1949     2000   2000   78.9%     chr5   +       17118020   17118071   52
YourSeq  30     1953     2000   2000   81.3%     chr15  +       79608979   79609026   48
YourSeq  30     1949     1992   2000   91.7%     chr15  +       65841495   65841540   46
YourSeq  29     1949     1979   2000   96.8%     chr2   -       20952239   20952269   31
YourSeq  29     1950     1979   2000   100.0%    chr15  -       92807256   92807287   32

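Note: The 2000 bp section downstream of the stop codon is BLAT searched against the genome. Apart from the self-hit on chr4 (score 2000, 100.0% identity over the full query), the best hit scores only 37, so no significant similarity is found elsewhere in the genome.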
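The "no significant similarity" call can be made explicit with a simple screening rule over the BLAT rows. This sketch assumes a hypothetical rule: a hit outside the query's own locus is flagged if its score reaches 10% of the query size, which is 200 for a 2000 bp query. The report does not state its criterion. The class and function names are illustrative.

```python
# Hypothetical screening rule for BLAT hits: a hit outside the query's own
# locus is flagged if its score reaches 10% of the query size. The 10%
# cut-off is an assumption for illustration, not the report's criterion.
from typing import NamedTuple


class BlatHit(NamedTuple):
    query: str
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated row of the BLAT result table."""
    f = line.split()
    return BlatHit(f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def overlaps_locus(hit: BlatHit, chrom: str, start: int, end: int) -> bool:
    """True if the hit lies on chrom and overlaps start..end."""
    return hit.chrom == chrom and hit.t_start <= end and hit.t_end >= start


def off_target_hits(hits, chrom, start, end, min_score_frac=0.10):
    """Hits outside chrom:start-end whose score is >= min_score_frac * query size."""
    return [h for h in hits
            if not overlaps_locus(h, chrom, start, end)
            and h.score >= min_score_frac * h.q_size]


# First three rows of the upstream table; the self-hit is chr4:43666610-43668609.
UPSTREAM_ROWS = """\
YourSeq 2000 1 2000 2000 100.0% chr4 - 43666610 43668609 2000
YourSeq 61 206 373 2000 91.9% chr9 - 67528536 67528710 175
YourSeq 51 577 668 2000 78.6% chr1 - 64637249 64637316 68
"""

hits = [parse_hit(line) for line in UPSTREAM_ROWS.splitlines()]
self_locus = ("chr4", 43666610, 43668609)
best_other = max(h.score for h in hits if not overlaps_locus(h, *self_locus))
print(best_other, off_target_hits(hits, *self_locus))  # 61 []
```

Under this rule neither table produces a flagged row. The best scores away from the self-hit are 61 upstream and 37 downstream, both far below 200.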


Gene information: Fam221b, family with sequence similarity 221, member B [Mus musculus (house mouse)]. Gene ID: 242408, updated on 8-Oct-2019.

Gene summary

Official Symbol: Fam221b (provided by MGI)
Official Full Name: family with sequence similarity 221, member B (provided by MGI)
Primary source: MGI:MGI:2441678
See related: Ensembl:ENSMUSG00000043633
Gene type: protein coding
RefSeq status: PREDICTED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 4930412F15Rik
Expression: Restricted expression toward testis adult (RPKM 93.3)
Orthologs: human, all

Genomic context

Location: 4; 4 A5
Exon count: 7

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 4 | NC_000070.6 (43659622..43668859, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 4 | NC_000070.5 (43672494..43681731, complement)
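The two rows describe the same locus on different assemblies. This short sketch assumes the 1-based, inclusive coordinates that NCBI and Ensembl use for displayed locations. It shows that the NCBI interval spans 9238 bp on both assemblies and that the interval shifts by a constant 12,872 bp between MGSCv37 and GRCm38. It also measures the Ensembl gene location given for this gene (43,659,622-43,669,145). The variable names are illustrative.

```python
# Span arithmetic for the locus coordinates, assuming 1-based, inclusive
# intervals (the convention NCBI and Ensembl use for displayed locations).


def span_bp(start: int, end: int) -> int:
    """Number of bases in the inclusive interval start..end."""
    return end - start + 1


grcm38 = (43659622, 43668859)    # NC_000070.6, annotation release 108
mgscv37 = (43672494, 43681731)   # NC_000070.5, Build 37.2
ensembl = (43659622, 43669145)   # Ensembl gene location (GRCm38)

len38, len37, len_ensembl = span_bp(*grcm38), span_bp(*mgscv37), span_bp(*ensembl)
shift = mgscv37[0] - grcm38[0]   # offset between assemblies
assert mgscv37[1] - grcm38[1] == shift  # same offset at both ends
print(len38, len37, len_ensembl, shift)  # 9238 9238 9524 12872
```

The Ensembl span of 9524 bp matches the 9.52 kb shown in the Ensembl transcript view. Because the gene is on the reverse strand, Ensembl's extra 286 bp at the high-coordinate end lies at the gene's 5' end.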

[Figure: Chromosome 4 - NC_000070.6]


Transcript information: This gene has 2 transcripts

Gene: Fam221b ENSMUSG00000043633

Description: family with sequence similarity 221, member B [Source:MGI Symbol;Acc:MGI:2441678]
Gene synonyms: 4930412F15Rik
Location: 43,659,622-43,669,145, reverse strand (GRCm38:CM000997.2)
About this gene: This gene has 2 transcripts (splice variants), 86 orthologues and 1 paralogue, and is a member of 1 Ensembl protein family.
Transcripts:

Name | Transcript ID | Length (bp) | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Fam221b-201 | ENSMUST00000056474.6 | 2045 | 487 aa | ENSMUSP00000057398.6 | Protein coding | CCDS18107 | Q8C627 | TSL:1, GENCODE basic, APPRIS P1
Fam221b-202 | ENSMUST00000134487.1 | 3913 | No protein | - | Retained intron | - | - | TSL:1
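The knockout targets exons 2 to 7, which together cover the whole coding region, so it helps to know how long that region is. This sketch assumes only the standard three nucleotides per codon. A 487 aa protein then implies a 1464 bp CDS including the TGA stop codon, which leaves 581 bp of combined 5' and 3' UTR in the 2045 bp Fam221b-201 transcript. The variable names are illustrative.

```python
# Coding-sequence length implied by the Fam221b-201 protein length.

PROTEIN_AA = 487        # ENSMUSP00000057398.6
TRANSCRIPT_BP = 2045    # ENSMUST00000056474.6

cds_bp = PROTEIN_AA * 3 + 3          # 487 codons plus the stop codon
utr_bp = TRANSCRIPT_BP - cds_bp      # 5' UTR + 3' UTR combined
print(cds_bp, utr_bp, f"{cds_bp / TRANSCRIPT_BP:.1%}")  # 1464 581 71.6%
```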


[Figure: Ensembl region view, 29.52 kb of chromosome 4 (about 43.65 to 43.67 Mb). Forward-strand genes: Npr2 (transcripts 201, 202, 205 protein coding; 204, 206, 210 lncRNA), Gm12481-201 (processed pseudogene) and Tmem8b (201, 202, 203, 206, 209 protein coding; 204, 205, 207 lncRNA). Contig: AL732626.8. Reverse-strand genes: Spag8 (201, 202, 203 protein coding), Fam221b (201 protein coding; 202 retained intron) and Hint2 (201 protein coding; 202 to 207 lncRNA). Tracks include the Regulatory Build (legend: CTCF, enhancer, open chromatin, promoter flank, binding site). Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (pseudogene, RNA gene, processed transcript).]


Transcript: ENSMUST00000056474

[Figure: Fam221b-201 (protein coding, reverse strand, 9.52 kb) and its protein product ENSMUSP00000057... (scale 0 to 487 aa). Protein feature tracks: MobiDB lite, low complexity (Seg), Pfam Protein FAM221A/B, PANTHER Protein FAM221A/B (PTHR31214:SF3). Variant track: sequence variants (dbSNP and all other sources); legend: inframe deletion, missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
