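Title: Analysis of stop codon readthrough utilizing comparative genomics approach
Subtitle: A study of stop codon readthrough mechanisms by comparative genomic analysis
Authors: Hattori, Mikiko; Tomita, Masaru; Kanai, Akio; Saito, Rintaro
Publisher: Keio University Shonan Fujisawa Academic Society
Publication year: 2007-10
Journal title: Outstanding Master's Theses
Abstract: In this study, we developed a method for predicting genes that undergo readthrough in eukaryotes, including mammals, and extracted readthrough gene candidates. The results suggest that many unknown readthrough genes may exist. The method is also effective for predicting any type of readthrough in any species, and is expected to help build an overall picture of the readthrough phenomenon in eukaryotes.
Notes: Advanced Life Science Project, 2007
Genre: Thesis or Dissertation
URL: https://koara.lib.keio.ac.jp/xoonips/modules/xoonips/detail.php?koara_id=0302-0000-0602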


The copyrights of content available on the KeiO Associated Repository of Academic resources (KOARA) belong to the respective authors, academic societies, or publishers/issuers, and these rights are protected by the Japanese Copyright Act. When quoting the content, please comply with the Japanese Copyright Act.

ISBN 978-4-87762-191-9 SFC-MT 2007-006

Analysis of stop codon readthrough utilizing comparative genomics approach

2007

Mikiko Hattori, Master's Program, Graduate School of Media and Governance

Advanced Life Science Project

Keio University Shonan Fujisawa Academic Society

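Recommendation for the Outstanding Master's Thesis

In the process of "translation", in which proteins are synthesized from genetic information, it has long been accepted as common knowledge in biology that translation terminates at a "stop codon". Recently, however, a phenomenon called "readthrough", in which translation continues as if the stop codon were not there, has come to light, and its mechanism and biological significance are not well understood. In this thesis, a new method for predicting readthrough in eukaryotes was developed, and the results suggest that many unknown readthrough genes exist. Because this method makes a very large contribution toward understanding readthrough, I strongly recommend this work as an outstanding master's thesis.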

Masaru Tomita, Professor, Faculty of Environment and Information Studies, Keio University

Master's Thesis, Academic Year 2007


Analysis of stop codon readthrough utilizing comparative genomics approach


Mikiko Hattori


Keio University Graduate School of Media and Governance

July, 2007

Master's Thesis, Academic Year 2007 (Heisei 19)


Thesis Abstract

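Translation termination is one of the most important steps in gene expression. UAA, UAG and UGA are known as the stop codons responsible for terminating translation, but in recent years cases have been discovered in which these stop codons encode amino acids instead of terminating translation. This phenomenon, called stop codon readthrough, has attracted attention as a translational control mechanism that does not follow the universal genetic code, but its mechanisms are diverse and many aspects remain unresolved. Moreover, it is highly likely that many as-yet-unknown readthrough genes are present in living organisms. Elucidating and understanding the mechanisms of stop codon readthrough is one of the major challenges in deciphering the code of life written in the genome.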

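In this study, we propose a method that uses comparative genomic analysis to comprehensively predict novel readthrough genes in eukaryotes. When a gene undergoes readthrough, the region previously considered to be the 3' UTR is translated. We therefore compared sequences obtained by mechanically translating the 3' UTRs of cDNA sequences with known protein sequences, and extracted genes showing high homology as readthrough candidate genes, obtaining 148 candidate genes in mouse and 86 in human. We also extracted a few novel readthrough candidate genes each from Drosophila and from higher plants, in which readthrough had not previously been reported.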

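This method does not depend on the species, the type of stop codon or known readthrough mechanisms, and is effective as a comprehensive, general-purpose method for predicting novel readthrough genes; in practice, it produced results suggesting the existence of novel readthrough mechanisms in mammals and higher plants. Such comprehensive extraction of readthrough genes should help reveal the overall picture of the readthrough mechanisms that organisms use, and lead to a deeper understanding of their roles.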

Keywords: readthrough, translation termination, eukaryote, comparative genomics

Keio University Graduate School of Media and Governance

Mikiko Hattori

Abstract of Master's Thesis, Academic Year 2007

Analysis of stop codon readthrough utilizing comparative genomics approach

Summary

Translation termination is a crucial step in gene expression and is signaled by three stop codons, UAA, UAG and UGA. Stop codon readthrough is a mechanism by which stop codons fail to terminate translation. Although the 3' untranslated regions (UTRs) of several genes are known to be translated through readthrough, its biological mechanisms and significance remain unclear. Here, we present a bioinformatics workflow, based on an exhaustive comparative genomics approach, to predict eukaryotic readthrough genes. It detects potential translation frames in the 3' UTRs of cDNA sequences according to their homology to proteome sequences. In total, 148 M. musculus and 86 H. sapiens readthrough candidates were identified, including 11 out of 19 and six out of 14 documented readthrough genes, respectively. Potential readthrough stop codons were biased towards UAG and UGA, and the distance between internal and authentic stop codons was longer than expected. Some novel readthrough candidates were also predicted in D. melanogaster and higher plants. Using a comparative genomics approach, we extracted readthrough candidates without being limited to one type of organism, internal stop codon or mechanism. Our list of novel candidates suggests the existence of readthrough mechanisms other than selenocysteine insertion in mammals and higher plants. We propose that this comprehensive approach to screening readthrough candidates will be effective in furthering the understanding of the biological roles of readthrough.

Keywords: readthrough, translation termination, eukaryote, comparative genomics

Keio University Graduate School of Media and Governance

Mikiko Hattori

Contents

Chapter 1 Background: What is stop codon readthrough?
1.1 Protein synthesis
1.2 Translation termination
1.3 Stop codon readthrough
1.3.1 Selenocysteine insertion
1.3.2 Pyrrolysine insertion
1.3.3 Other stop codon readthrough in eukaryotes
Chapter 2 In silico prediction of eukaryotic stop codon readthrough genes using comparative genomics
2.1 Introduction
2.2 Materials and methods
2.2.1 Data preparation
2.2.2 Extraction of novel readthrough candidates
2.2.3 Protein motif search at 3' UTR of extracted readthrough candidates
2.2.4 Characteristic analysis of extracted readthrough candidates
2.3 Results
2.3.1 Readthrough genes extracted at each filtering step
2.3.2 Novel readthrough candidates in eukaryotes
2.3.3 Protein motifs found at 3' UTR of candidates
2.3.4 Comparison of stop codon usage, and distance between the internal and termination stop codons
2.4 Discussion
Chapter 3 Conclusions
Acknowledgements
References
Appendix

Chapter 1

Background: What is stop codon readthrough?

Dual role of stop codons in protein synthesis: "termination" and "elongation"

1.1 Protein synthesis

The genome of an organism stores a huge amount of biological information; however, the DNA in a genome cannot by itself make that information available to the cell. In 1958, Francis Crick set out the general flow by which the biological information written in the genome is used, known as 'the central dogma'.

The first step in the central dogma is to copy the DNA nucleotide sequence of an information-carrying section of the genome, a gene, into an RNA nucleotide sequence. This step is called 'transcription'. Transcription products are classified into two groups: non-coding RNA (ncRNA) and messenger RNA (mRNA). In the next step, 'translation', the information carried in an mRNA molecule is converted into protein, whereas ncRNA molecules do not encode protein. The ribosome, one of the organelles in the cell, translates mRNA into protein according to the universal genetic code. In the universal genetic code, each set of three consecutive nucleotides in RNA, called a 'codon', corresponds to one amino acid or to a termination signal. Initiation of translation is signaled by the start codon AUG, and translation ends at a stop codon, UAA, UAG or UGA.
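As a minimal illustration of how codons map to amino acids, the Python sketch below builds the standard genetic code (NCBI translation table 1) from its compact 64-letter encoding and translates an mRNA from the first AUG up to the first in-frame stop codon. The function name `translate_orf` and the toy sequence are illustrative assumptions, not part of the thesis workflow.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with the
# bases in U, C, A, G order at each position, the third position varying fastest.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def translate_orf(mrna: str) -> str:
    """Translate from the first AUG up to, but not including, the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        protein.append(CODON_TABLE[codon])
    return "".join(protein)


print(sorted(STOP_CODONS))                 # ['UAA', 'UAG', 'UGA']
print(translate_orf("GGAUGUUUGGCUAAGCC"))  # MFG
```

The stop-codon set is derived from the table rather than hard-coded, so it always agrees with the code table in use.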

1.2 Translation termination

Termination of protein synthesis is signaled by the three stop codons UAA, UAG and UGA.

When one of the three stop codons reaches the A site of the ribosome, release factors bind to the stop codon and promote release of the nascent polypeptide chain from the ribosome.

Three release factors have been identified in prokaryotes: RF1, which recognizes the stop codons 'UAA' and 'UAG'; RF2, which recognizes 'UAA' and 'UGA'; and RF3, which promotes the release of RF1 and RF2 from the ribosome after termination. Eukaryotes, on the other hand, have only two release factors: eRF1, which recognizes all three stop codons, and eRF3, which may play a role similar to that of bacterial RF3 [1]. In archaea, only one release factor, aRF1, which is similar to eRF1, has been found, and the details of the termination system are unclear [2].

1.3 Stop codon readthrough

The stop codons UAA, UGA and UAG are known as signals for translation termination.

In stop codon readthrough, however, an amino acid is incorporated at a stop codon through a non-standard translation process. Notably, selenocysteine and pyrrolysine, established as the 21st and 22nd amino acids, are encoded by UGA and UAG, respectively. Previous studies have reported several genes that use readthrough to regulate protein function, as well as several models of the readthrough mechanism.


1.3.1 Selenocysteine insertion

Selenium is an essential trace element for many organisms, including humans and other mammals. Unlike metal trace elements that act as cofactors, selenium is inserted into proteins in the form of the 21st amino acid, selenocysteine, in all three domains of life [3]. Remarkably, selenocysteine is encoded by the UGA codon, which the universal genetic code defines as a translation termination signal [4]. The mechanism of selenocysteine incorporation differs between prokaryotes, eukaryotes and archaea. Features common to all organisms are the UGA codon, the selenocysteine-specific transfer RNA (tRNA^Sec), the selenocysteine insertion sequence (SECIS) element, and specific translation elongation factors. The bacterial, archaeal and eukaryotic SECIS elements are not similar to each other in sequence or structure. Nevertheless, in all three domains, tRNA^Sec is initially charged with serine by seryl-tRNA synthetase, and the resulting Ser-tRNA^Sec is then converted to selenocysteine-charged tRNA^Sec (Sec-tRNA^Sec) by modification of the serine residue [5].
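The toy Python sketch below illustrates what it means for UGA to be decoded as selenocysteine: translation continues through a stop codon at positions flagged as recoded and inserts the given amino acid (here 'U', the IUPAC one-letter code for selenocysteine). In cells, the choice of which UGA is recoded depends on the SECIS element, tRNA^Sec and the specific elongation factors; that decision is not modeled here and is supplied as a hypothetical `recoded` mapping.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI table 1) in UCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_with_recoding(mrna: str, recoded: dict[int, str]) -> str:
    """Translate reading frame 0 of `mrna`.

    `recoded` maps the 0-based start position of a stop codon to the amino acid
    inserted there (e.g. 'U' for selenocysteine). Any other stop codon ends translation.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            if i not in recoded:
                break  # ordinary termination
            aa = recoded[i]  # readthrough: insert the recoded amino acid
        protein.append(aa)
    return "".join(protein)


mrna = "AUGUGAGCCUAA"
print(translate_with_recoding(mrna, {}))        # M   (UGA read as a stop)
print(translate_with_recoding(mrna, {3: "U"}))  # MUA (UGA read as selenocysteine)
```

The same function models UAG decoded as pyrrolysine by passing 'O' (the one-letter code for pyrrolysine) at a UAG position.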

In prokaryotes, the bacterial selenocysteine insertion sequence (bSECIS) elements are located immediately downstream of the UGA codon and form RNA stem-loop secondary structures. The selenocysteine-specific translation elongation factor SelB binds both the bSECIS element and Sec-tRNA^Sec, and delivers Sec-tRNA^Sec to the ribosomal A site (Figure 1.1, A) [6-8].

In eukaryotes, the SECIS (eSECIS) elements are located in the 3' untranslated region (3' UTR), distant from the UGA codons [9]. In some cases, these elements can direct the insertion of selenocysteine at multiple UGA codons within the same mRNA [10]. The eukaryotic SelB does not recognize eSECIS elements [11] and requires an additional eSECIS-binding protein, SBP2 (Figure 1.1, B) [6, 10]. Moreover, a recent study revealed a novel cis-element, the selenocysteine redefinition element (SRE), which is located in the coding sequence and forms a conserved stem-loop RNA structure distinct from the eSECIS element [12]. Further studies are required to understand its role in selenocysteine insertion. Yeast and higher plants lack all of these components of selenocysteine insertion [13], whereas selenocysteine-containing proteins (selenoproteins) are essential for mammalian development [14].

The mechanism of selenocysteine incorporation in archaea appears to be a hybrid of those in prokaryotes and eukaryotes. As in eukaryotes, the archaeal SECIS (aSECIS) elements are located in the 3' UTR. However, no homologue of eukaryotic SBP2 has been detected in archaea. A model inferred from these recent findings is shown in Figure 1.1, C [6].

Figure 1.1 - Mechanisms of selenocysteine insertion. Insertion of selenocysteine in prokaryotes (A), eukaryotes (B), and archaea (C) [6].


Eukaryotic selenoproteins participate in antioxidant and anabolic processes, including mammalian development [14], whereas prokaryotic selenoproteins are primarily involved in catabolic processes and use selenium to catalyze redox reactions [3]. In archaea, selenoproteins have been identified only in methanogens, and all of them are involved in redox reactions of the methanogenic pathway [6].

1.3.2 Pyrrolysine insertion

Pyrrolysine, the 22nd amino acid, was recently discovered in the active site of

monomethylamine methyltransferase (MtmB) in Methanosarcina barkeri [15, 16]. Among the currently known genomes, pyrrolysine has been found only in one bacterium, Desulfitobacterium hafniense, and in five methanogenic archaea belonging to the Methanosarcinaceae. These methanogens are capable of methanogenesis from trimethylamine, dimethylamine and monomethylamine using methylamine methyltransferase enzymes [17].

Although a specific tRNA^Pyl with the anticodon CUA and a class II pyrrolysyl-tRNA synthetase have been shown to be encoded by the M. barkeri genome [15, 18], the precise mechanisms by which pyrrolysine is incorporated without terminating translation remain unclear. Recent research reported that the class II pyrrolysyl-tRNA synthetase can directly charge tRNA^Pyl with pyrrolysine [19, 20], after which the standard elongation factor EF-Tu recognizes pyrrolysyl-tRNA^Pyl. Moreover, a previous study showed that the canonical class I lysyl-tRNA synthetase and the class II lysyl-tRNA synthetase ligate lysine onto tRNA^Pyl through the formation of a ternary complex [21]. These findings suggest that pyrrolysine uses decoding strategies dissimilar to those of selenocysteine (Figure 1.2) [6].


Pyrrolysine insertion sequence (PYLIS) elements, putative conserved secondary structures, were predicted 5 or 6 nt downstream of the UAG codons of methylamine methyltransferase genes [8, 22]. However, whether these elements are required for pyrrolysine insertion has not been demonstrated experimentally.

Figure 1.2 - Hypothetical mechanisms of insertion of pyrrolysine [6]. The dashed arrow indicates the possible pre-translational conversion of the lysyl-tRNA^Pyl into pyrrolysyl-tRNA^Pyl.

1.3.3 Other stop codon readthrough in eukaryotes

In Drosophila melanogaster, three genes, out at first (oaf) [23], ketch [24, 25] and headcase (hdc) [26, 27], have been reported to regulate protein function through stop codon readthrough that does not involve selenocysteine or pyrrolysine insertion. However, neither the factors that promote readthrough nor the amino acids inserted at the readthrough sites of these three genes have been identified.


Oaf and ketch participate in diverse developmental processes. Expression of oaf is necessary for larval development, and ketch is required in the ovary for the production of viable eggs. Although the UGA codons of oaf and ketch are read through, their mRNAs contain no SECIS elements. The efficiency of readthrough in these genes appears to vary between tissues and developmental stages, which suggests that the two genes use readthrough to regulate tissue- or stage-specific expression [23, 25].

Headcase protein, on the other hand, inhibits terminal branching of neighboring tracheal cells. Remarkably, the longer hdc protein synthesized by readthrough is a much stronger inhibitor [27]. In hdc, the stop codon that is read through is UAA, which is commonly associated with the highest termination efficiency. An RNA stem-loop secondary structure was found immediately downstream of this UAA codon, and experimental data showed that it plays a crucial role in the readthrough event [26].

Readthrough also occurs in the yeast PDE2 gene, which encodes the high-affinity cyclic adenosine monophosphate (cAMP) phosphodiesterase [28]. In yeast, the efficiency of readthrough depends on the environment: readthrough of PDE2 is enhanced in the absence of glucose and in the presence of the [PSI+] factor. PDE2 readthrough appears to modulate the intracellular cAMP concentration. Because cAMP is an important second messenger that controls many biological processes, readthrough of PDE2 may have a number of effects on yeast physiology [28].


Chapter 2

In silico prediction of eukaryotic stop codon readthrough genes using comparative genomics

2.1 Introduction

In stop codon readthrough, an amino acid is incorporated at a stop codon according to a non-standard genetic code. Although the three stop codons UAA, UAG and UGA normally direct the ribosome to terminate protein synthesis, some UGA and UAG triplets have been found to encode selenocysteine and pyrrolysine, respectively [6, 8], so that translation termination is bypassed. Our knowledge of readthrough genes is limited and their biological significance is not fully understood; it is therefore important to investigate their mechanisms.

Selenocysteine and pyrrolysine incorporation at internal UGA and UAG stop codons, respectively, has been investigated previously. In eukaryotes, selenocysteine incorporation depends on selenocysteine insertion sequence (SECIS) elements located in the 3' untranslated region (UTR) of readthrough messenger RNAs (mRNAs), and requires a selenocysteine-specific transfer RNA (tRNA^Sec, anticodon UCA), the translation elongation factor mSelB and the SECIS binding protein SBP2 [29, 30].


Eukaryotic selenoproteins participate in antioxidant and anabolic processes, including mammalian development [14], whereas prokaryotic selenoproteins are primarily involved in catabolic processes and use selenium to catalyze redox reactions [3]. Recent research identified pyrrolysine in the active site of monomethylamine methyltransferase (MtmB) of the archaeon Methanosarcina barkeri [16]; in total, three classes of pyrrolysine-containing methyltransferases have been identified in this species. Although tRNA^Pyl with the anticodon CUA and a class II aminoacyl-tRNA synthetase have been shown to be encoded by the M. barkeri genome [15, 18], the precise mechanisms by which pyrrolysine is incorporated without terminating translation remain unclear.

In total, 72 selenoproteins involving UGA readthrough have been documented in nine mammalian species [31, 32]; however, no readthrough mechanisms other than selenocysteine incorporation have been reported in mammals. In Drosophila melanogaster, three readthrough genes encoding selenoproteins are known, and readthrough of out at first (oaf) [23], ketch [24] and headcase (hdc) [26] has also been detected experimentally. However, the amino acids inserted at these stop codon sites and the signals that promote their readthrough are unknown. In higher plants, no stop codon readthrough genes have been found [8].

Most computational methods for predicting selenoproteins search for SECIS elements [13, 33-36]. However, because characteristic SECIS element motifs differ between organisms [6, 35], these approaches are ineffective for predicting novel selenoproteins in organisms whose SECIS motifs are unknown. A recent study reported a homology-based approach to predict selenocysteine or pyrrolysine insertion genes in bacterial and archaeal species [37], but similar computational approaches have not been developed for eukaryotes. Here we introduce a new comparative genomics approach that predicts readthrough genes in eukaryotes independently of the characteristics of known readthrough genes.

Based on the hypothesis that 3' UTRs translated by readthrough encode amino-acid sequences similar to known proteins, we predicted novel readthrough candidates in five eukaryotes, Homo sapiens, Mus musculus, D. melanogaster, Arabidopsis thaliana and Oryza sativa, by comparing the deduced amino-acid sequences of full-length complementary DNAs (cDNAs) with the protein sequences in the National Center for Biotechnology Information (NCBI) nr protein database. The candidates included some previously reported readthrough genes. Computational analyses of the other readthrough candidates showed that they had characteristics similar to those of reported readthrough genes, including a bias of stop codons towards UGA and UAG, and significantly longer distances between skipped and authentic stop codons than expected. Our results also suggest that stop codon readthrough occurs in higher plants, and we propose novel readthrough mechanisms for eukaryotes.

2.2 Materials and Methods

2.2.1 Data preparation

Readthrough candidates were extracted by screening full-length cDNAs from the five eukaryotes: 41,118 H. sapiens cDNA sequences were obtained from the H-InvDB web site [38], 60,770 M. musculus cDNA sequences from FANTOM 2.01 [39], 10,994 D. melanogaster cDNA sequences from BDGP [40], 32,127 O. sativa cDNA sequences from KOME [41] and 13,181 A. thaliana cDNA sequences from RIKEN [42]. The annotated amino-acid sequences of 2,354,365 proteins from all prokaryotes, archaea and eukaryotes were obtained from the comprehensive nr protein database of the NCBI (ftp://ftp.ncbi.nih.gov/blast/db/).

Expressed sequence tags (ESTs) were also obtained from the NCBI ftp server: 6,085,737 from H. sapiens, 4,334,170 from M. musculus, 386,747 from D. melanogaster, 285,801 from O. sativa and 364,807 from A. thaliana. Complete genome sequences of H. sapiens and M. musculus were downloaded from the University of California Santa Cruz (UCSC) server [43], O. sativa and A. thaliana sequences were downloaded from the NCBI ftp server as GenBank-format files, and the D. melanogaster sequence was obtained from the Berkeley Drosophila Genome Project (BDGP) Release 3. Previously documented readthrough genes were obtained from RECODE [31, 32]: 25 from H. sapiens, 21 from M. musculus and six from D. melanogaster. The cDNA datasets contained 14 of the 25 documented readthrough genes in H. sapiens, 19 of 21 in M. musculus and all six in D. melanogaster. In total, 197,228 annotated proteins were downloaded from the UniProt Knowledgebase [44] to confirm the annotation of the candidate proteins.

2.2.2 Extraction of novel readthrough candidates

If the amino-acid sequence deduced from a cDNA, including a stop codon and the 3' UTR, was highly similar to a documented protein, the cDNA was assumed to have the potential to encode a protein by readthrough. Sato and co-workers previously reported that searching for protein motifs in the deduced amino-acid sequences of 3' UTRs is effective for predicting readthrough genes in D. melanogaster [45]. On the basis of this work, we predicted new readthrough candidates. Nucleotide sequences of full-length cDNAs of H. sapiens, M. musculus, D. melanogaster, O. sativa and A. thaliana were translated in all reading frames and compared with annotated protein sequences in the nr database using BLASTX [46, 47] (Figure 2.1, step A). Translated cDNA regions that aligned with annotated proteins (including their carboxy-termini) with identity scores >80% were then reannotated as coding regions. If a reannotated coding region contained an internal in-frame stop codon, the cDNA was defined as an initial readthrough candidate.
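A minimal Python sketch of the two ideas in step A, under stated assumptions: translating a cDNA in all six reading frames (which BLASTX does internally before aligning against the nr database), and listing in-frame stop codons that fall inside a region reannotated as coding. The coordinates `start` and `end` are assumed to come from a BLASTX alignment that already passed the >80% identity threshold; the alignment itself is not reproduced, and all function names here are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1) for DNA codons in TCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {c for c, aa in CODON_TABLE.items() if aa == "*"}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Codon-by-codon translation; stops become '*', ambiguous codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(cdna: str) -> dict[str, str]:
    """Peptides for frames +1..+3 (forward strand) and -1..-3 (reverse complement)."""
    rc = reverse_complement(cdna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(cdna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def internal_stops(cdna: str, start: int, end: int) -> list[int]:
    """0-based positions of in-frame stop codons inside the reannotated region cdna[start:end]."""
    return [i for i in range(start, end - 2, 3) if cdna[i:i + 3] in STOP_CODONS]


cdna = "ATGAAATGAGGGTTTTAA"
print(six_frame_translations(cdna)["+1"])  # MK*GF*
# Suppose the protein alignment covers cdna[0:15] (ATG ... TTT): the TGA at 6 is internal.
print(internal_stops(cdna, 0, 15))         # [6]
```

A cDNA with a non-empty `internal_stops` list would correspond to an "initial readthrough candidate" in the sense of step A.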

Each candidate was compared with the others using BLASTN [46, 47] to screen out redundant sequences. Candidates were eliminated if 70% of the predicted partial coding region was aligned with another candidate with an identity score >90%. To identify candidates with authentic internal stop codons and reliable sequence quality, we compared each candidate with EST sequences using BLASTN [46, 47] with an E-value threshold of < 1e-100, and then checked whether the internal stop codons were supported by the ESTs (Figure 2.1, step B). Because genome sequences are assembled from overlapping sequence fragments, they have higher sequence quality than ESTs or cDNAs. Therefore, we again compared the candidates that passed the EST comparison with the genome sequences of the respective species using BLAT [48] (Figure 2.1, step C). These last two steps reduced the risk that candidate internal stop codons were artifacts of sequencing errors or SNPs.
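The redundancy and EST filters can be sketched in Python as simple predicates over parsed alignment summaries. This is a minimal sketch under assumptions: the `Hit` record, its `coverage` field (fraction of the query's predicted partial coding region that is aligned), the greedy keep-first rule that decides which of two redundant candidates survives, and reading "70%" as "at least 70%" are illustrative choices, not details given in the thesis. The BLASTN runs themselves are not shown.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Summary of one alignment (hypothetical record, e.g. parsed from BLAST tabular output)."""
    query: str
    subject: str
    coverage: float  # fraction of the query's predicted partial coding region that is aligned
    identity: float  # percent identity
    evalue: float


def remove_redundant(candidates, hits, min_cov=0.70, min_id=90.0):
    """Greedy filter: drop a candidate if it aligns with an already kept candidate
    over >= min_cov of its coding region at > min_id identity."""
    similar = {}
    for h in hits:
        if h.query != h.subject and h.coverage >= min_cov and h.identity > min_id:
            similar.setdefault(h.query, set()).add(h.subject)
    kept = []
    for c in candidates:
        if not (similar.get(c, set()) & set(kept)):
            kept.append(c)
    return kept


def est_supported(candidates, est_hits, max_evalue=1e-100):
    """Keep candidates with at least one EST alignment whose E-value is below max_evalue."""
    supported = {h.query for h in est_hits if h.evalue < max_evalue}
    return [c for c in candidates if c in supported]


hits = [Hit("c2", "c1", 0.80, 95.0, 0.0), Hit("c3", "c1", 0.50, 99.0, 0.0)]
kept = remove_redundant(["c1", "c2", "c3"], hits)
print(kept)  # ['c1', 'c3']
ests = [Hit("c1", "est_a", 1.0, 100.0, 1e-120), Hit("c3", "est_b", 1.0, 100.0, 1e-50)]
print(est_supported(kept, ests))  # ['c1']
```

Checking that the stop triplet itself is present at the aligned position in the ESTs and genome would require the alignment strings and is left out of this sketch.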

We next excluded candidate cDNAs that were predicted to undergo alternative splicing around their internal stop codons, yielding the readthrough gene 'candidates' (Figure 2.1, step D). The SSEARCH program [49], with a gap-opening penalty of -5 and a gap-extension penalty of 0, was used to realign each translated cDNA with its corresponding protein, and candidates harbouring gaps of >3 amino acids (9 nucleotides) around the internal stop codons were excluded from the analysis. Candidates containing more than one type of internal stop codon were also eliminated because of their lower reliability as readthrough genes.
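Step D can be illustrated with a small Python check on a pairwise alignment written as two gapped strings (translated cDNA and protein, '-' for gaps, '*' for the internal stop). This sketch assumes the termination codon has been trimmed from the aligned cDNA row so that every '*' is an internal stop, and it uses a hypothetical window of 10 alignment columns on each side of each internal stop; the thesis states only "around the internal stop codons". The SSEARCH realignment itself is not reproduced.

```python
import re


def max_gap_near(aligned_cdna: str, aligned_protein: str, column: int, window: int = 10) -> int:
    """Length of the longest gap run ('-') in either aligned row within +/- window columns."""
    lo, hi = max(0, column - window), column + window + 1
    longest = 0
    for row in (aligned_cdna, aligned_protein):
        for run in re.findall(r"-+", row[lo:hi]):
            longest = max(longest, len(run))
    return longest


def passes_step_d(aligned_cdna, aligned_protein, internal_stop_codons, max_gap=3, window=10):
    """True if the candidate has one stop-codon type and no gap > max_gap residues near any internal stop.

    Assumes the termination codon is not part of aligned_cdna, so every '*' is internal."""
    if len(set(internal_stop_codons)) > 1:
        return False
    for column, residue in enumerate(aligned_cdna):
        if residue == "*" and max_gap_near(aligned_cdna, aligned_protein, column, window) > max_gap:
            return False
    return True


print(passes_step_d("MKV*GFLA", "MKVUGFLA", ["TGA"]))          # True
print(passes_step_d("MKV*----GFLA", "MKVUWWWWGFLA", ["TGA"]))  # False: 4-residue gap
print(passes_step_d("MKV*GF*A", "MKVUGFUA", ["TGA", "TAG"]))   # False: two stop types
```

A gap of more than 3 residues next to the stop is taken as a sign of an exon boundary, which is why such candidates are treated as possible alternative-splicing artifacts.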

Finally, the candidates were again compared with the nr database using BLASTX to extract 'strong candidates', that is, readthrough genes conserved in multiple species (Figure 2.1, step E). We retained candidate cDNAs for which 80% of the predicted partial coding region aligned with protein sequences of other species at an identity score >60%.
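A minimal Python sketch of the step E criterion, assuming BLASTX hits have already been summarized per candidate with the subject species, the aligned fraction of the predicted partial coding region, and percent identity. The `ProteinHit` record and reading "80%" as "at least 80%" are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    """Hypothetical summary of one BLASTX hit of a candidate cDNA against the nr database."""
    cdna: str
    subject_species: str
    coverage: float  # fraction of the predicted partial coding region that is aligned
    identity: float  # percent identity


def strong_candidates(candidates, hits, query_species, min_cov=0.80, min_id=60.0):
    """Keep candidates with a hit to another species covering >= min_cov at > min_id identity."""
    conserved = {
        h.cdna
        for h in hits
        if h.subject_species != query_species and h.coverage >= min_cov and h.identity > min_id
    }
    return [c for c in candidates if c in conserved]


hits = [
    ProteinHit("m1", "Rattus norvegicus", 0.90, 99.0),
    ProteinHit("m2", "Mus musculus", 1.00, 100.0),  # same species: does not count
    ProteinHit("m3", "Homo sapiens", 0.50, 95.0),   # coverage too low
]
print(strong_candidates(["m1", "m2", "m3"], hits, "Mus musculus"))  # ['m1']
```

Excluding same-species hits is what makes the criterion a test of conservation rather than of self-similarity.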

2.2.3 Protein motif search at 3' UTR of extracted readthrough candidates

Searching for protein motifs in 3' UTRs has been shown to be effective for predicting readthrough genes [45]. We therefore searched for documented protein motifs in the deduced amino-acid sequences of 3' UTRs using InterProScan [50] to validate the reliability of our candidates. For each candidate, the region between the internal stop codon and the termination stop codon was translated in silico and used as the InterProScan query. Motifs whose InterProScan descriptions were not informative (e.g., 'no description', 'seg', 'FAMILY NOT NAMED', 'SUBFAMILY NOT NAMED' and 'UNCHARACTERIZED') were excluded.
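The query construction and description filter can be sketched in Python as follows. The sketch assumes 0-based positions of the first base of the internal and termination stop codons in the same frame; it translates the codons strictly between the two stops (whether the readthrough codon itself was included is not stated) and drops motif hits whose description exactly matches one of the uninformative labels listed above, ignoring case. Running InterProScan is not shown, and `motif_descriptions` is a hypothetical input.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1) for DNA codons in TCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Uninformative InterProScan descriptions quoted in the text, compared case-insensitively.
UNINFORMATIVE = {"no description", "seg", "family not named", "subfamily not named", "uncharacterized"}


def readthrough_peptide(cdna: str, internal_stop: int, termination_stop: int) -> str:
    """Translate the in-frame codons strictly between the internal and termination stop codons."""
    region = cdna[internal_stop + 3:termination_stop]
    return "".join(CODON_TABLE.get(region[i:i + 3], "X") for i in range(0, len(region) - 2, 3))


def informative(motif_descriptions: list[str]) -> list[str]:
    """Keep motif descriptions that are not in the uninformative set."""
    return [d for d in motif_descriptions if d.strip().lower() not in UNINFORMATIVE]


cdna = "ATGAAATGAGGGTTTTAA"
print(readthrough_peptide(cdna, 6, 15))  # GF
print(informative(["Glutathione peroxidase", "seg", "FAMILY NOT NAMED"]))  # ['Glutathione peroxidase']
```

The peptide returned by `readthrough_peptide` is the kind of sequence that would be submitted as the InterProScan query.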

2.2.4 Characteristic analysis of extracted readthrough candidates

We calculated the proportions of UAA, UAG and UGA among the internal stop codons of 'candidates' and 'strong candidates', and compared them with the proportions among the translation termination codons of 'non-candidates' (negative controls), i.e. the cDNAs screened out at step A. Readthrough genes might encode long peptides downstream of their internal stop codons that change protein function. Therefore, to evaluate the validity of the extracted readthrough candidates, the distances between the readthrough stop codons and the termination stop codons of 'candidates' and 'strong candidates' were compared with the distances between the termination stop codons of 'non-candidates' and the nearest downstream in-frame UAA, UAG or UGA triplet (next stop codon) in their 3' UTR sequences.
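The two characteristics compared here, stop-codon usage and stop-to-stop distance, can be computed as in the minimal Python sketch below. It assumes 0-based positions of the first base of each stop codon and measures distances in nucleotides from first base to first base; the thesis does not state the unit, and the function names are hypothetical. The statistical test used to judge significance is not specified in this section and is not shown.

```python
from collections import Counter
from typing import Optional

STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_codon_ratios(codons: list[str]) -> dict[str, float]:
    """Proportion of each stop-codon type among a list of stop-codon triplets."""
    counts = Counter(codons)
    total = sum(counts[c] for c in STOP_CODONS)
    return {c: counts[c] / total for c in STOP_CODONS} if total else {}


def readthrough_distance(internal_stop: int, termination_stop: int) -> int:
    """Distance in nt between a candidate's internal stop codon and its termination codon."""
    return termination_stop - internal_stop


def distance_to_next_stop(cdna: str, termination_stop: int) -> Optional[int]:
    """Distance in nt from a non-candidate's termination codon to the next in-frame stop in its 3' UTR.

    Returns None when the 3' UTR contains no further in-frame stop codon."""
    for i in range(termination_stop + 3, len(cdna) - 2, 3):
        if cdna[i:i + 3] in STOP_CODONS:
            return i - termination_stop
    return None


print(stop_codon_ratios(["TGA", "TGA", "TAG", "TAA"]))  # {'TAA': 0.25, 'TAG': 0.25, 'TGA': 0.5}
print(readthrough_distance(6, 15))                      # 9
print(distance_to_next_stop("ATGTAACCCTGAAA", 3))       # 6
```

Measuring both distances the same way (first base to first base, within the same frame) keeps the candidate and non-candidate distributions directly comparable.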


Figure 2.1 - Schematic representation of the extraction of readthrough candidates. Overview of the in silico screening of readthrough candidates (steps A to E). Black boxes summarize the procedures. The number of candidates at each filtering step is shown in white boxes; bold numbers in parentheses indicate known readthrough genes among the candidates.


2.3 Results

2.3.1 Readthrough genes extracted at each filtering step

The numbers of readthrough candidates for each species extracted at each filtering step are summarized in Figure 2.1. In step A, 12 of the 19 documented readthrough genes in M. musculus, six of 14 in H. sapiens and five of six in D. melanogaster were extracted; step B eliminated one further documented readthrough gene from each of M. musculus and D. melanogaster. No documented readthrough genes were eliminated in steps C and D from M. musculus, H. sapiens or D. melanogaster. Hence, after step D, 165 'candidates' including 11 documented readthrough genes were listed from M. musculus, 94 including six documented genes from H. sapiens, seven including four documented genes from D. melanogaster, and two each from O. sativa and A. thaliana (for details, see Additional file 1). In step E, we selected 'strong candidates' by searching for candidates conserved across several species. However, no documented readthrough genes from D. melanogaster were retained at this step.

2.3.2 Novel readthrough candidates in eukaryotes

After the extraction of 'strong candidates', 148 candidates remained in M. musculus (Table 1), 86 in H. sapiens (Table 2), two in D. melanogaster, two in A. thaliana and one in O. sativa (Table 3). The alignments of two strong candidates with their corresponding proteins, ESTs and genomes are shown in Figure 2.2. The selectivity values, i.e. the proportions of documented readthrough genes captured among the candidates, were 58% in M. musculus and 43% in H. sapiens. Notably, the three readthrough candidates in A. thaliana and O. sativa were extracted with high BLAST bit scores.
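The 58% and 43% figures follow from the counts of documented readthrough genes given in the Summary: 11 of the 19 documented M. musculus genes present in the cDNA dataset, and six of the 14 H. sapiens genes, were retained. A short Python check of the arithmetic (the function name is illustrative):

```python
def recovery_rate(found: int, documented: int) -> float:
    """Percentage of documented readthrough genes recovered by the screen."""
    return 100 * found / documented


# Counts from the text: documented genes present in each cDNA dataset and recovered.
for species, found, documented in [("M. musculus", 11, 19), ("H. sapiens", 6, 14)]:
    print(f"{species}: {found}/{documented} = {recovery_rate(found, documented):.0f}%")
# M. musculus: 11/19 = 58%
# H. sapiens: 6/14 = 43%
```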


2.3.3 Protein motifs found at 3' UTR of candidates

Using InterProScan, we searched for documented protein motifs between the internal stop codons and the termination stop codons, because such motifs would further support that these regions indeed encode proteins [45]. As a result, 106 of 148 strong candidates in M. musculus, 55 of 86 in H. sapiens and one of two in A. thaliana had protein motifs between their internal and termination stop codons (for details, see Additional files 1 and 2). Among the candidates retained after step D, nine of 11 documented readthrough genes in M. musculus, five of six in H. sapiens and two of four in D. melanogaster contained documented protein motifs in these regions (for details, see Additional file 2).

2.3.4 Comparison of stop codon usage, and distance between the internal and termination stop codons

The proportions of UGA and UAG usage were similar between the 'candidates' and 'strong candidates' of M. musculus and H. sapiens. However, the proportion of UAA among the internal stop codons of both 'candidates' and 'strong candidates' was lower than its proportion among the termination stop codons of 'non-candidates' (Figure 2.3). Of the three stop codons, only UAG was found at the internal stop codons of candidates in A. thaliana and O. sativa. In M. musculus and H. sapiens, the distances between the internal and termination stop codons of documented readthrough genes, 'candidates' and 'strong candidates' were significantly longer than the corresponding distances (termination stop codon to next stop codon) in 'non-candidates' (Figure 2.4).


Table 1. Strong candidates for stop codon readthrough in M. musculus

cDNA | Codon | Protein | Annotation | Identity (%) | Conserved species | Conserved protein
25100121120 | UGA | AAH8664g.1 | Glutathione peroxidase 1 | 100 | Rattus norvegicus | CAAsog28.1
Aisool6Co8 | UGA | AAHs6s6o.1 | RUN and SIB domain containing protein 1 | 99.89 | Homo sapiens | CAll2il8.1
Dosoo46Pll | UGA | AAHoil4o.1 | Fibulin-1 precursor | 99.85 | Homo sapiens | CAAsiii2.1
sgso4osD2l | UGA | AAH8664o.1 | Glucose-6-phosphate isomerase | 99.82 | Rattus norvegicus | AAH62oos.1
4gs242gFl8 | UGA | BAB2gs88.1 | Male sterility protein 2 | 99.81 | Rattus norvegicus | XP_574540.1
lsoooo2Fls | UGA | AAHsi646.1 | Mitogen-inducible gene 6 protein homolog | 99.78 | Rattus norvegicus | AAH8s84s.1
64sosgoNl2 | UGA | NP_619609.1 | RIKEN cDNA l8ioo4iC2s (n) | 99.78 | Rattus norvegicus | AAH8siil.1
1810062014 | UGA | BACso6os.1 | TBC1 domain family member 10A | 99.78 | Rattus norvegicus | XP_341981.2
l2oool4H24 | UGA | BAC262g4.1 | Splicing factor 3 subunit 1 | 99.75 | Rattus norvegicus | XP_223566.2
B2soso6Iio | UGA | BACs8666.1 | Junctophilin 3 | 99.74 | Rattus norvegicus | XP_226549.3
2soooolNOs | UGA | BAB2lgso.1 | Biliverdin reductase A precursor | 99.67 | Rattus norvegicus | AAHi8i66.1
20102061112 | UGA | NP_062732.1 | Mitochondrial carrier homolog 2 | 99.67 | Homo sapiens | AAHoo8is.1
6720430015 | UGA | NP_899003.1 | Tetraspanin-18 | 99.60 | Rattus norvegicus | XP_230297.2
A2sooo2Gl4 | UGA | AAHs4i19.1 | Ras-related protein Rab-2B | 99.59 | Rattus norvegicus | AAioi444.1
s2so4o2Kli | UGA | BAC26s2i.1 | Nucleolar phosphoprotein p130 | 99.57 | Rattus norvegicus | AAA4lilg.1
3110001117 | UGA | AAH2886i.1 | Sodium channel modifier 1 (n) | 99.56 | Rattus norvegicus | XP_227429.3
1500041015 | UGA | NP_067504.2 | Serine/threonine protein phosphatase 2A | 99.56 | Rattus norvegicus | AAHig2si.1
lsoooo8Gol | UGA | AAHsio2i.1 | Glutathione peroxidase 3 precursor | 99.55 | Rattus norvegicus | AAH6222i.1
1700027009 | UGA | BAA22i8o.1 | Phospholipid hydroperoxide glutathione peroxidase | 99.49 | Rattus norvegicus | CAD6l2i6.1
1300003602 | UGA | AAH64iss.1 | Bifunctional coenzyme A synthase | 99.49 | Rattus norvegicus | AAH8si8l.1
sosl4s4C2l | UGA | CAA68l4o.1 | Selenoprotein P precursor | 99.47 | Rattus norvegicus | NP_062065.2
g4soolsPog | UGA | NP_444332.1 | 15 kDa selenoprotein precursor | 99.38 | Rattus norvegicus | AAH6os4i.1
64sos22B2l | UGA | NP_114392.1 | Ectonucleotide pyrophosphatase/phosphodiesterase sprecursor | 99.37 | Rattus norvegicus | NPooiol2i62.1
2010002005 | UGA | NP_612177.1 | Calmodulin | 99.35 | Rattus norvegicus | XP_236325.3
glso2o6Hos | UGA | NP_080519.2 | RIKEN cDNA 9130411117 (n) | 99.33 | Rattus norvegicus | XP_235099.3
0230011L23 | UGA | NP_031685.2 | Cell division control protein 2 homolog | 99.33 | Rattus norvegicus | NP_062169.1
2oioolsH22 | UGA | AAKi248o.1 | Beta-1 | 99.31 | Rattus norvegicus | NPiis4s4.1
lliool2Eog | UGA | NP_038787.1 | Methionine-R-sulfoxide reductase | 99.13 | Homo sapiens | NP_057416.1
Cssooiol2l | UGA | NP_291095.1 | Vacuolar ATP synthase 21 kDa proteolipid subunit | 99.02 | Rattus norvegicus | XP_216510.3
Clsoo2sKos | UGA | AAHil6gg.1 | Selenoprotein T precursor | 98.97 | Homo sapiens | AAH26sso.2
22iool4Pll | UGA | NP_033182.1 | Selenoprotein W, muscle 1 | 98.86 | Homo sapiens | NP_003000.1
28io4o6K24 | UGA | NP_780335.1 | Zinc finger CCHC domain containing protein 3 | 98.75 | Homo sapiens | AAH6g2s8.1
2oioolsBo4 | UGA | NP_075872.1 | Arylacetamide deacetylase | 98.74 | Rattus norvegicus | AAH88l4s.1
24ioolsK2s | UGA | BAB2ssso.1 | p75NTR-associated cell death executor | 98.64 | Rattus norvegicus | AAHs8sos.1
B2sossiAio | UGA | NP4444gi.1 | Selenoprotein M precursor | 98.62 | Homo sapiens | AAHls42l.1
B2so2lsDls | UGA | NP_113908.2 | Type II iodothyronine deiodinase | 98.30 | Rattus norvegicus | AACs2i6i.1
l8ioo2sAoi | UGA | XP_135065.2 | RING-box protein 2 | 98.21 | Bos taurus | XP_611580.2
ggsoiosHli | UGA | XP_355466.1 | Env polyprotein precursor | 97.99 | Mouse intracisternal A particle | Psli8g
Assooiioio | UGA | AAM4is42.1 | 2'-5' oligoadenylate synthetase 1A | 97.61 | Rattus norvegicus | AAHgll2l.1
4gsos88C2l | UGA | NP_080065.3 | Flagellar radial spoke protein 3 | 97.17 | Rattus norvegicus | XP_341755.2
9130020015 | UGA | BAAosi68.1 | Env polyprotein | 96.95 | Mouse mammary tumor virus | BAAosi68.1
F8soolsLos | UGA | EAA2osgl.1 | hypothetical protein (n) | 94.92 | Plasmodium yoelii yoelii | XPi28826.1
22io4osBio | UGA | CAl2sgs8.1 | Ras-like protein RRP22 | 94.58 | Rattus norvegicus | XP_344260.2
2410075602 | UGA | AANl228s.1 | compaction-associated protein (n) | 94.04
Rattus norvegicus XP 573771.1 5830471022 UGA XP _513202. I FUSinteracting serine-arginine rich protein 94.00 Pan troglodytes XP__513202.1 Cssool4Plg UGA NP 080694.1 Regulator of G-protein signaling 10 93.96 Rattus norvegicus XPs4lgsi.2 Cisoo4gF2l UGA XP _218197.1 protein 135 93.83 Rattus norvegicus XP_218197.2 E4sooslPio UGA XP_193524.3 Hypothetical protein ORF-llsi 93.06 Plasmodium berghei XP6i6io2.I g2solioPli UGA AAHsigo2.1 ProteinBAT4 (HLA-B-associated transcript 4) 91.80 Rattus norvegicus NP__001029329.1 l2oool6Clg UGA BAA2s88s.1 Putative splicing factor YTs2l (RAsol-binding protein) 90.20 Gallus gallus XP42o6ll.1 B4solioll6 UGA CAIO2oii.1 hypothetical protein PB4o2 (n) 82.09 Plasmodium berghei )0 _672777.1 Bgsooogoo2 UGA BACslisi. 1 unnamed protein product (n) 81.60 Plasmodium berghei XP_677343.1 02300171423 UGA BACs6sgo.1 unnamed protein product (n) 81.54 Plasmodium berghei XP_677343.1 2iooog4Kls UGA AAH2ll22.2 Selenoprotein H 81.15 Rattus norvegicus XP_578137.1 A6soo28oog UGA BAB26844.1 unnamed protein product (n) 80.95 Homo sapiens AAK6l2sg.1 4932415019 UAG XPl2gi2l.2 Sodium/hydrogen exchanger 2 99.88 Rattus sp. AAAi2sso.1 E2soooiK22 UAG NP__ 109646.2 Acyl-coenzyme A oxidase 3 99.86 Rattus norvegicus CAA6448i.1 9530092619 UAG NPiosi8s.1 WD and tetratricopeptide repeats protein 1 99.83 Ratios norvegicus AAH8sgsi.1 Eosoo48Fl8 UAG XP__131118.1 Transcriptiontermination factor 2 99.82 Rattus norvegicus XP_2 1 5670.2 57304061421 UAG NP8o844s.2 hypothetical protein LOC2 (n) 99.82 Rattus norvegicus AAH82osi.1 g4soo4gB22 UAG NP_032595.1 Anaphase promoting complex subunit 1 99.82 Rattus norvegicus XP_230589.3 E4soo24C22 UAG NP_036090.1 Poly(ADP-ribose) glycohydrolase 99.81 Rattus norvegicus QgQYM2 5730469005 UAG BAC2igs6.1 Microfibrillar-associatedprotein 1 99.80 Homo sapiens AAH2sssi.1 gssol62Lls UAG BACss2ss. 
I Probable G-protein coupled receptor 22 99.80 Raffia norvegicus XP_234041.3 6ssos6sC24 UAG NPos8ssl.1 D(1B) dopamine receptor 99.79 Rattus norvegicus NP_036900. I Dlsoo2oJ2o UAG NP__080773.1 Peroxisomal NADH pyrophosphatase NUDT 12 99.78 Macaca fascicularis BAEoogo4.1 D2sooo4Mio UAG AAHs6s4o.2 Protein phosphatase 4 (n) 99.76 Rattus norvegicus XP_216225.2 9830131607 UAG NP_932145.2 Serpin Bio 99.74 Rattus norvegicus AAH6liss.1 3110013005 UAG AAH8sso8.1 Tumor susceptibility gene 101 protein 99.74 Rattus norvegicus NP_853659.2 26oooosCOs UAG AAHsisio.1 Basic FGF-repressedZia bindingprotein (mbFZb) 99.66 Rattus norvegicus AAioi8o8.1 IliooolNls UAG AAHis6gs.1 USEI-like protein 99.63 Rattus norvegicus XP 214304.3 llioossJlg UAG AAH4igg4.1 40S ribosomal protein S4 99.62 Bes taurus XP 591678.1 4is246gFo2 UAG NP_598702.1 Putative pie-mRNA splicing factor RNA helicase 99.60 Rattus norvegicus XP_341949.2 4is248gH2l UAG NPoiggos.1 ADAMTS- 10 precursor 99.59 Rattus norvegicus XP_234919.3 4o2l4o2A 13 UAG NP_076304.2 TPss-regulating kinase 99.59 Rattus norvegicus XP_342581. I s4so4osH2o UAG BACsgsg4.1 Lipoprotein lipase precursor 99.58 Rattus norvegicus AAH8l8s6. I


Identity Conserved species Conserved protein cDNA Codon Protein Annotation UAG NP 3-beta-hydroxysteroid-delta(8) 99.57 Rattus norvegicus AAQ14592.1 20100051301 _031924.1 XP UAG AAH624o6.1 Integrin-linked protein kinase 99.56 Canis familiaris _863096.1 26ioo44H13 AAHg26oo.1 UAG NP ER lumen protein retaining receptor 1 99.53 Rattus norvegicus 8oso486Fo4 _598711.1 CAAsiigl.1 Bgsoo88Dos UAG NP_031485.2 Adapter-related protein complex 2 alpha 2 subunit 99.53 Rattus norvegicus 99.52 Homo sapiens NP_060114.2 BlsooioIl6 UAG BAB2gisg.2 Kelch-like protein 6 99.51 Rattus norvegicus AAH6li2l.1 C4soo4oAo2 UAG CAI2sss8.1 novel protein (n) 99.31 Murine leukemia virus ABDl44sg.1 G4sioo2ll2 UAG AABosogl.1 Pol polyprotein NP Basigin precursor 99.29 Rattus norvegicus CAA6iill.1 2iooossPl2 UAG _033171.1 NPo6lggl.3 UAG NP RIKEN cDNA 2510006016 (n) 99.29 Homo sapiens 4gss4oiDos _084024.1 XP_215126.1 UAG NP_444318.1 14 kDa transmembrane protein 99.25 Rattus norvegicus 1110003106 AAH8i6s4.1 UAG NP_031566.3 Brain-derived neurotrophic factor precursor 99.20 Rattus norvegicus siso4l4Doi XP UAG NP_079730.1 Inhibitor of growth protein 5 99.17 Rattus norvegicus _343635.2 s8so4s2Fli AAHi8is2.1 o6ioosoE42 UAG BAB2sio4.1 Ketohexokinase 99.15 Rattus norvegicus 99.11 Homo sapiens AAQgllg4.1 2oiool2Fos UAG NP_083638.1 Charged multivesicular body protein 4b 98.95 Sus scrofa NP 2sioo6sMos UAG NP_077150.1 Succinyl-CoA:3-ketoacid-coenzyme A transferase 1 _999103.1 98.93 Rattus norvegicus XP_342995.1 lliooo2Aol UAG XP_131827.3 RIKEN cDNA 2610002202 (n) 98.91 Rattus norvegicus NP_001034775.1 22iooioBos UAG NP_079743.1 Gastrokine-1 precursor 98.85 XPsis8oi.1 2410004117 UAG AAHil2is.1
N-acetyltransferase ESCO2 Rattus norvegicus UAG AAH8oig6.1 Vacuolar ATP synthase subunit B, kidney isoform 98.72 Rattus norvegicus XP_232119.2 D6soosoLl6 XP 6oso44oE2l UAG NP_780310.1 UPFos4i protein Looss8sl homolog 98.47 Pan troglodytes _516272.1 XP_213716.1 l8ioo42Ko4 UAG AAH2i4io.1 l8ioo42Ko4Rik protein (n) 98.36 Rattus norvegicus UAG AAHs2ol2.1 Protein C11orf1 homolog 98.25 Bos taurus AAIllsl8.1 llioos2Aos NP UAG XP_143332.4 Sucrase-isomaltase 98.20 Rattus norvegicus _037193.1 2oio2o4No8 AAH86s8l.1 24oooosCl2 UAG AAH8sssl.1 Apolipoprotein E precursor 98.07 Rattus norvegicus Hypothetical 35.9 kDa protein in MCX1-PBP2 intergenic region 98.04 Rattus norvegicus AAH8iogo.1 24ioo8gKll UAG AAHssgog.1 in MCX1-PBP2 intergenic region UAG AAH86ssl.1 RuvB-like 1 97.59 Homo sapiens AAH12886.1 2510009606 XP UAG CAio2ssl.1 Oncogene tlm 96.13 Plasmodium berghei _675397.1 A2soolsCls XP_727793.1 UAG EAA19358.1 Hypothetical protein RCol88 (n) 95.05 Plasmodium yoelii yoelii 9530075011 XPss8l8s.2 2ioooiiB2o UAG BAB24ioo.1 unnamed protein product (n) 94.90 Canis familiaris 93.75 Plasmodium yoelii yoelii XP_728532.1 4gss4llEo6 UAG CAIolg4l.1 Retrovirus-related Pol polyprotein LINE-1 92.91 Rattus norvegicus AAHigsoi.1 C6soosiMli UAG NP 9.1 Bifunctional polynucleotide phosphatase/kinase 90.41 Plasmodium berghei XP A4soo4gCio UAG AAHo6o4g.1 o6ioollIo4Rik protein (n) _668773.1 Low molecular weight phosphotyrosine 87.44 Rattus norvegicus XP_237746.2 Cssoo22Bol UAG XP_237746.2 protein phosphatase gssoossHos UAG AAHslsol.1 Env polyprotein precursor 84.62 84.50 Homo sapiens CAIsg8og.1 4gs2io2Fo8 UAG CAl4oios.1 Tubulin tyrosine ligase-like protein 11 82.54 Rattus norvegicus XP_347011.2 B2sosggCOs UAG XP_347011.1 hypothetical protein XP_347010 (n) 99.82 Rattus norvegicus XP_342917.2 4gsosi8M17 UAA NP_080132.1 SH3-domain kinase binding protein 1 99.81 Rattus norvegicus XP A8soo22oos UAA AANs8sl8.1 Neuropilin and tolloid-like protein 1 precursor _225661.3 99.80 Rattus norvegicus
AAH8ils2.1 siso442C2l UAA AAB6s26l.1 NNP-1 protein XP s8so4o4Fol UAA CAl2sgs2.1 Zinc finger protein 62 homolog 99.78 Rattus norvegicus _573079.1 XP Bgsool8ll2 UAA CAI24sio.1 Rap guanine nucleotide exchange factor 2 99.77 Rattus norvegicus _340793.2 NP_001032433.1 6oso4ssKl6 UAA NP_080826.1 p21-activated protein kinase-interacting protein 1 99.76 Rattus norvegicus 99.69 Rattus norvegicus AAH86s8s.1 Clsoo2oJo4 UAA AAKs8l6o.1 MKI67 FHA domain-interacting nucleolar phosphoprotein NP 1700041K19 UAA AAA6ggs8.1 Aldose reductase 99.68 Rattus norvegicus _036630.1 AAHgl2l2.1 2sioo46K10 UAA AAH2l82l.1 Epithelial stromal interaction 1 (n) 99.68 Rattus norvegicus XP llioo2oLlg UAA NP_082909.1 26S proteasome-associated UCH37-interacting protein 1 99.63 Rattus norvegicus _215225.2 99.57 Rattus norvegicus AAH8sss8.1 4gsososB2o UAA AAH486s8.1 DnaJ homolog subfamily B member 6 99.53 Rattus norvegicus NP_113906.1 Aisoos2Bls UAA NP_067493.1 Ras-related protein Rab-2A 99.52 Rattus norvegicus AAHggls8.1 CssooolEo4 UAA AAHl4isi.1 Midline-1 XP 2siooo8E2o UAA NP_033960.1 G1/S-specific cyclin-E2 99.51 Rattus norvegicus _342805.1 99.49 Rattus norvegicus NP 1200003021 UAA AAH48l8l.2 Ezrin _062230.1 99.46 Pan troglodytes XP_516184.1 1810019020 UAA NP_080730.1 Ubiquitin-conjugating enzyme E2 M 99.42 Rattus norvegicus AAHgi48s.1 1200011118 UAA NP_080453.2 Hypothetical protein F26As.7 in I 99.39 Rattus norvegicus XP_573067.1 8030485122 UAA AAH86gIg.1 40S ribosomal protein S10 AAHiglss.1 2siooosEio UAA BACs86ls.1 Aldose reductase-related protein 1 99.37 Rattus norvegicus 99.10 Rattus norvegicus NP 2oiool6Do2 UAA CAI244ll.1 Malate dehydrogenase _150238.1 99.08 Rattus norvegicus NP_062098.1 6sso4l4Al2 UAA NP_683740.1 Excitatory amino acid transporter 1 98.97 Rattus norvegicus XPs46Is2.2 2oiooo2Eo4 UAA AAQg2glg.1 Group XIIB secretory phospholipase A2-like protein precursor NP_ 64sos2gBo6 UAA AAH12268.1 DnaJ homolog subfamily C member 5 98.52 Homo sapiens _079495.1
98.39 Homo sapiens NP_476528.1 20100151308 UAA XP_534675.1 Vacuolar protein sorting 29 97.73 Rattus norvegicus AAHigoso.1 2900002020 UAA AABg44gl.1 Bystin 97.48 Rattus norvegicus NP_071626.1 4833424007 UAA NP_071717.1 Tenomodulin 97.18 Rattus norvegicus AAH884s2.1 0610011018 UAA NP_056606.2 F-box only protein 8 95.65 Rattus norvegicus NP_598228.1 A2sool4Mo8 UAA NP_075700.1 upregulated during skeletal muscle growth 5 (n) XP 0610007.101 UAA NP_083905.1 Hypothetical protein Mll6s6 95.53 Rattus norvegicus _215851.2 AAT10590.1 B4sool2o2l UAA NP_115776.1 Probable G-protein coupled receptor 91 94.34 Rattus norvegicus AAH88284.1 o6ioosgMoi UAA NP_079872.4 Px19-like protein 93.09 Rattus norvegicus XP 4631422005 UAA AAH2i86o.1 Myosin-11 90.82 Rattus norvegicus _221535.3 XP2lil8s.2 2sioosiKl4 UAA AAHig6sg.1 Gamma-secretase subunit APH-1B 87.94 Rattus norvegicus D4sooosEl8 UAA CAIO2s2o.1 hypothetical protein PB4o2 (n) 87.50 Plasmodium berghei strain AN XP_673599.1 87.50 Plasmodium berghei XP 4gss4siF24 UAA AAHo6o4g.1 061001 l Io4Rik protein (n) _668773.1 86.82 Homo sapiens AAY15045.1 DlsoogiN11 UAA AANsg68s.1 Ras-related protein Rab-6C


Table 2. Strong candidates for stop codon readthrough in H. sapiens

cDNA | Codon | Protein | Annotation | Identity | Conserved species | Conserved protein
BCoos2g4 | UGA | CACo4l86.1 | 15 kDa selenoprotein | 100 | Rattus norvegicus | AAH6os4i.1
Dooo2s8l | UGA | XP_523342.1 | selenophosphatase | 99.78 | Pan troglodytes | XP_523342.1
BCooss88 | UGA | AAD22lio.1 | Origin recognition complex subunit 4 | 99.77 | Pongo pygmaeus | CAHg246s.1
80001862 | UGA | AAol4g46.1 | Tripartite motif protein 48 | 99.73 | Pan troglodytes | XP_508897.1
AKoolil4 | UGA | AAHs6o2o.1 | Zinc finger DHHC domain containing protein 13 | 99.68 | Pongo pygmaeus | CAl2gi44.1
AKosss44 | UGA | NP_060622.2 | Protein C20orf12 | 99.66 | Macaca fascicularis | QgGKSg
BCQliiJ | UGA | AAHliili.2 | Type III iodothyronine deiodinase | 99.64 | Ovis aries | AATi4g2s.1
AKoi4l86 | UGA | CAI2o268.1 | Hypothetical symporter yagG | 99.62 | Canis familiaris | XP_532546.2
BCOsoi8s | UGA | CAHglsg6.1 | Exportin-7 (Ran-binding protein 16) | 99.56 | Pongo pygmaeus | CAHglsg6.1
AKo2lgs6 | UGA | AADs2sg6.1 | cAMP and cAMP-inhibited cGMP 3' | 99.56 | Mus musculus | AAPg4oso.1
AKoolg82 | UGA | AAHl28os.2 | Ran-binding protein 6 (RanBP6) | 99.56 | Pan troglodytes | XP_528532.1
BCooiogi | UGA | CAI4246s.1 | Metalloproteinase inhibitor 1 precursor | 99.52 | synthetic construct | AAVs84s2.1
BCOs26go | UGA | NP_006853.1 | Tudor and KH domain containing protein | 99.51 | Pan troglodytes | XP_524870.1
BCoo88sg | UGA | AAQ15228.1 | Malcavernin | 99.35 | Pan troglodytes | XP_519079.1
AKo22io4 | UGA | AAQ88gsg.1 | EGE4g6 (n) | 99.27 | Canis familiaris | XP_535636.2
AL8ssl4s | UGA | CAAii8s6.1 | Selenoprotein P precursor | 99.21 | Pongo pygmaeus | QsR8Wg
AKo2slss | UGA | AAHoo8os.1 | Nuclear ubiquitous casein and cyclin-dependent kinases substrate | 99.18 | Bos taurus | XP_581892.2
AKo2s6s8 | UGA | CAl2s62s.1 | Zinc finger protein 11B | 99.18 | Pan troglodytes | XP_521451.1
AKo2sii6 | UGA | AAHsoss6.1 | CD99L2 protein (n) | 99.14 | Pongo pygmaeus | CAH8ggi2.1
BCoosl2i | UGA | AAK6lsoo.1 | Methionine-R-sulfoxide reductase | 99.14 | Pongo pygmaeus | CAHg2o4l.1
AKogissi | UGA | AAG2l828.1 | Sentrin-specific protease 8 | 99.06 | Pan troglodytes | XP_523114.1
Booooi42 | UGA | CAAslggs.1 | Glutathione peroxidase | 99.02 | Hylobates lar | BAE17008.1
AKog6isi | UGA | XP_512772.1 | Prostacyclin receptor | 99.01 | Pan troglodytes | XP_512772.1
AKo24gsi | UGA | XP_522545.1 | RING finger protein 34 | 98.95 | Pan troglodytes | XP_522545.1
AKogss44 | UGA | NP_057376.1 | Heat shock protein 75 kDa | 98.72 | Pongo pygmaeus | CAHg2662.1
BCO26oi8 | UGA | CAI14767.1 | Mps one binder kinase activator-like 2C | 98.51 | Pan troglodytes | XP_524574.1
BCoo8424 | UGA | XP_520976.1 | Diamine acetyltransferase 1 | 98.10 | Pan troglodytes | XP_520976.1
AL8slgsl | UGA | XP_528251.1 | Zinc finger protein 41 | 97.79 | Pan troglodytes | XP_528251.1
AKos4ggo | UGA | XP_086937.6 | PREDICTED: similar to hypothetical protein (n) | 97.52 | Pan troglodytes | XP_520662.1
AKosigls | UGA | AADl8Iss.1 | neuroblastoma-amplified protein (n) | 97.35 | Pan troglodytes | XP_515708.1
AKo2462o | UGA | AAo4lils.1 | SH domain protein 2A | 96.98 | Canis familiaris | XP_848220.1
BColgo4l | UGA | CAll48go.1 | Hypothetical 28.3 kDa protein in gbd 5' region | 96.57 | Canis familiaris | XPssg648.2
AKog6244 | UGA | CAIl48go.1 | Hypothetical 28.3 kDa protein in gbd 5' region | 96.10 | Canis familiaris | XP_539648.2
AKog6s6g | UGA | XP_517528.1 | PREDICTED: similar to hypothetical protein (n) | 96.08 | Pan troglodytes | XP_517528.1
AKosioil | UGA | AAKli2sg.2 | Cytochrome c oxidase subunit 3 | 93.49 | Pan paniscus | NP_008205.1
BCOssiog | UGA | CAI22o68.1 | ATPase family AAA domain containing protein 3B | 92.60 | Canis familiaris | XP_536708.2
AKo2s486 | UGA | AAFillls.1 | PRO2221 (n) | 85.57 | Pongo pygmaeus | CAHgs4os.1
AKogs6s2 | UGA | AAA88os6.1 | LINE-1 reverse transcriptase homolog | 82.80 | Pongo pygmaeus | CAHg2662.1
AKo24il4 | UGA | AAHl6gos.1 | FLJ13614 protein (n) | 82.35 | Pan troglodytes | XP_518535.1
ABosso82 | UAG | AAF6s6oo.1 | Intersectin-2 | 99.94 | Canis familiaris | XP_532890.2
AKoo2o24 | UAG | EAL24o8l.1 | Muskelin 1, intracellular mediator containing kelch motifs | 99.86 | Pongo pygmaeus | CAHgio2s.1
AKossigs | UAG | CAI22oss.1 | Protein C20orf129 | 99.84 | Bos taurus | XP_585353.2
BCOsio82 | UAG | NP_002766.1 | Serine protease HTRA1 precursor | 99.79 | Bos taurus | XP_612097.1
AKos6lg2 | UAG | NP_858061.1 | Serine/threonine protein phosphatase 2A | 99.77 | Rattus norvegicus | AAHi88s4.1
BCoo2isl | UAG | CAI4lgss.1 | Antigen peptide transporter 2 | 99.72 | Pan troglodytes | XP_527356.1
AKo2s426 | UAG | AAHog6i6.1 | Histone deacetylase 11 | 99.71 | Macaca fascicularis | QgGKUs
AL8s2s4g | UAG | NP_775083.1 | Calpastatin | 99.71 | Pongo pygmaeus | CAHg26l4.1
80001899 | UAG | AAX14047.1 | Galactoside 2-alpha-L-fucosyltransferase 2 | 99.71 | Gorilla gorilla | AAF14068.1
AKo26sls | UAG | AASoo4go.1 | L-lactate dehydrogenase A chain | 99.70 | synthetic construct | AAPs64g6.1
AKo244oo | UAG | XP_029101.8 | PREDICTED: KIAAog4i protein (n) | 99.70 | Canis familiaris | XP_545182.2
AKo22sis | UAG | EAL2si44.1 | transforming growth factor (n) | 99.68 | Sus scrofa | AAPsig46.1
AKo24s62 | UAG | NP_078969.2 | Enoyl-CoA hydratase | 99.67 | Mus musculus | AAHs4s6s.1
Boooo4lg | UAG | CAGsoso8.1 | Catechol O-methyltransferase | 99.63 | Pan troglodytes | XPsl4g84.1
AKool177 | UAG | CAB666so.1 | hypothetical protein (n) | 99.59 | Canis familiaris | XP_544441.2
AKo24o4i | UAG | AAFiligo.1 | Zinc finger protein 180 | 99.56 | Pongo pygmaeus | CAHgl4s2.1
BCO2o8sl | UAG | CAllso44.1 | Protein Cgorfi2 | 99.55 | Bos taurus | XP_879352.1
AKos6ggo | UAG | CAIl4lg2.1 | Ankyrin repeat domain protein 2 | 99.44 | Canis familiaris | XP_850948.1
BCoos8g4 | UAG | NP_001009008.1 | Dimethylaniline monooxygenase [N-oxide-forming] 2 | 99.44 | Pan troglodytes | AANo6slg.1
80000262 | UAG | CACo84os.1 | OTTHUMPoooooosg42s (n) | 99.30 | Bos taurus | AAios2o2.1
AL8s4l28 | UAG | NP_057654.2 | Serologically defined breast cancer antigen NY-BR-84 homolog | 99.20 | Macaca fascicularis | BAEolio2.1
Boooss64 | UAG | XP_528400.1 | Vacuolar ATP synthase subunit G 1 | 99.15 | Bos taurus | AABsi48i.1
Boooioo8 | UAG | AAClgl6l.1 | Alpha crystallin B chain | 98.97 | Macaca fascicularis | BADS1947.1
Booogo6s | UAG | NP_004474.2 | Glycine cleavage system H protein | 98.84 | Pan troglodytes | XP_523434.1
AKo2s82s | UAG | NP_059143.2 | FURL protein homolog | 98.32 | Canis familiaris | XP_544821.1
AKo2i2l4 | UAG | XP_496071.1 | Carbon catabolite repressor protein 4 homolog | 98.30 | Pan troglodytes | XP_526578.1
AKo248gi | UAG | XP_511835.1 | PREDICTED: hypothetical protein (n) | 97.32 | Pan troglodytes | XP_511835.1
AKogl64s | UAG | XP_529621.1 | PREDICTED: hypothetical protein (n) | 96.97 | Pan troglodytes | XP_529621.1
AKos6ios | UAG | XP_529293.1 | PREDICTED: hypothetical protein (n) | 95.27 | Pan troglodytes | XP_529293.1
AKo2iss6 | UAG | CAHg2lgs.1 | hypothetical protein (n) | 95.08 | Pongo pygmaeus | CAHg2lgs.1
AKos66s6 | UAG | CADg8iol.1 | Hypothetical 124.5 kDa protein in SKO1-RPL44A intergenic region | 94.91 | Canis familiaris | XP_850111.1
AKog4sis | UAG | CAl228is.1 | expressed in hematopoietic cells, heart, liver (HLL) (n) | 83.87 | Pan troglodytes | XPso8l48.1
AKool682 | UAA | BABls8li.1 | Leucyl-tRNA synthetase | 99.77 | Pan troglodytes | XP_518016.1
AKo2s6is | UAA | XP_517735.1 | PREDICTED: similar to hypothetical protein (n) | 99.76 | Pan troglodytes | XP_517735.1
Boool48i | UAA | CAB4ss6i.1 | TAR DNA-binding protein 43 | 99.76 | Pongo pygmaeus | CAHg28s4.1
AKooo222 | UAA | NP_062826.2 | N6-adenosine-methyltransferase 70 kDa subunit | 99.66 | Canis familiaris | XP_532627.2
Booo4sol | UAA | NP_ooioo84gtl 1 | Core promoter element-binding protein | 99.65 | synthetic construct | AAPs6s8g.1
AKo2slss | UAA | CAHils24.1 | G1 to S phase transition protein 1 homolog | 99.52 | Pongo pygmaeus | CAHgs4os.1
AKooo66g | UAA | AAHo446s.1 | Telomeric repeat binding factor 2 interacting protein 1 | 99.50 | Macaca fascicularis | BAEolggs.1
AKo24826 | UAA | NP_054749.2 | ARF GTPase-activating protein GIT1 | 99.46 | Canis familiaris | XP_548300.2
AKo226so | UAA | CAl22o26.1 | Basic FGF-repressed Zic-binding protein | 99.27 | Pan troglodytes | XPsl46o4.1
Boooogos | UAA | XP_517538.1 | High mobility group protein 2 | 99.04 | Sus scrofa | NPggg228.1
AKos6l2o | UAA | AAK6i6sl.1 | Cyclin-L2 | 97.19 | Canis familiaris | XP_848553.1
AKos6626 | UAA | CAHg2o48.1 | hypothetical protein (n) | 96.83 | Pongo pygmaeus | CAHg2o48.1
Booos6li | UAA | AANsg68s.1 | Ras-related protein Rab-6C | 88.37 | Gallus gallus | ABFool26.1
AKog6s4l | UAA | XP2s84ls.2 | similar to cDNA sequence BColggii (n) | 85.70 | Canis familiaris | XP_850632.1
BCO2sg84 | UAA | XP_510166.1 | Nonhistone chromosomal protein HMG-14 | 85.44 | Pan troglodytes | XP_510166.1


Table 3. Strong candidates for stop codon readthrough in D. melanogaster, A. thaliana and O. sativa

Organism | cDNA | Codon | Protein | Annotation | Identity | Conserved species | Conserved protein
D. melanogaster | REiog6s | UGA | CAD12856.1 | Hypothetical protein (n) | 98.06 | Drosophila pseudoobscura | EAL26g8s.1
D. melanogaster | LPiogl8 | UAG | AAL4g28o.1 | serine-type endopeptidase (n) | 99.64 | Drosophila pseudoobscura | EAL2gs8i.1
A. thaliana | RAFLog-g8-E22 | UAG | CABigs2i.1 | Serine/threonine protein phosphatase PP-X isozyme 1 | 99.67 | Oryza sativa | BAD2gss4.1
A. thaliana | RAFLoi-io-H24 | UAG | NP_177990.1 | Phospholipid/glycerol acyltransferase family protein (n) | 99.65 | Oryza sativa | CAEossg8.1
O. sativa | AKo6sgis | UAG | AANl6s44.1 | TNP2-like protein (n) | 94.66 | Sorghum bicolor | AAMg4sos.1

cDNAs corresponding to experimentally verified readthrough genes are underlined.

Codon: triplet of the internal stop codon.

Protein: accession ID of the protein homologous to the translated cDNA, listed together with its `Annotation' and `Identity'. Annotations were obtained from UniProt, or from the nr database where marked (n).

Identity: identity score of the SSEARCH alignment between the translated cDNA and the protein.

Conserved species: species with the highest protein homology to the candidate sequence.

Conserved protein: protein from another species with the highest homology to the candidate sequence.
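The `Identity' column is a percent-identity score reported by SSEARCH. As a rough illustration only, and not the SSEARCH implementation, percent identity can be computed from a gapped pairwise alignment as identical columns divided by aligned columns. Excluding gap columns from the denominator is an assumption of this sketch; SSEARCH's own reporting convention may differ. Because the internal stop codon appears as `*' in the translated cDNA, it simply counts as a mismatch against the homologue's residue.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity between two gapped, equal-length aligned sequences.

    Assumption: columns in which either sequence has a gap ('-') are left
    out of the denominator; SSEARCH's own convention may differ.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    matches = columns = 0
    for q, s in zip(aligned_query.upper(), aligned_subject.upper()):
        if q == "-" or s == "-":
            continue  # gap column: not counted
        columns += 1
        if q == s:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


# Hypothetical alignment: '*' marks the internal stop codon in the translated cDNA.
print(percent_identity("MKV*LLGRA", "MKVCLLG-A"))  # → 87.5
```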


[Figure 2.2 alignment panels. (A) cDNA Blsooioll6 (nt 155 to 250), EST CBs24262 (nt 92 to 187) and chromosome 16 (20020207 to 20020301), aligned with Kelch-like protein 6 (residues 5 to 36). (B) cDNA AKoo2o24 (nt 466 to 561), EST AU137730 (nt 466 to 561) and chromosome 7 (130421751 to 130421798 and 130423659 to 130423706), aligned with Muskelin 1 (residues 155 to 186).]

Figure 2.2 - Alignment of candidates and corresponding proteins, ESTs and genomes. Candidates containing kelch motifs in M. musculus (A, Blsooioll6) and H. sapiens (B, AKoo2o24) aligned with proteins (`Kelch-like protein 6' and `Muskelin 1'), ESTs and genomes. Kelch repeat motifs are found 100-500 amino acids downstream of the internal stop codons (bold, alignments boxed). Gray fonts indicate a mismatch between the protein and cDNA sequences; hyphens indicate a gap in the sequence.


[Figure 2.3: panels A and B; y-axis: percentage (%), 0 to 100; bars: candidates, strong candidates, non-candidates; legend: UAA, UAG, UGA.]

Figure 2.3 - Ratios of internal stop codons in M. musculus (A) and H. sapiens (B). `Candidates' represent readthrough candidates after step D; `strong candidates' are those remaining after step E. For `candidates' and `strong candidates', bars show the ratios of UAA, UAG and UGA among their predicted internal stop codons. Ratios of termination codons of the genes filtered out in step A are shown as `non-candidates' (negative control).


[Figure 2.4: panels A and B; y-axis: average distance between stop codons, 0 to 600; bars: documented readthrough, strong candidates, candidates, non-candidates.]

Figure 2.4 - Average distances between stop codons in M. musculus (A) and H. sapiens (B). `Candidates' represent readthrough candidates after step D; `strong candidates' are those remaining after step E. `Documented readthrough' indicates previously known readthrough genes. Other genes are categorized as `non-candidates'. Bars indicate the average distance between internal and termination stop codons. For `non-candidates', the bar represents the average distance between the actual termination stop codon and the nearest in-frame stop codon in the 3' UTR. Error bars show the 95% confidence interval, based on the standard error, for each category.
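The distances plotted in Figure 2.4 can be sketched as follows. This is an illustration under stated assumptions, not the code used in this study: the sequence is assumed to start at the first nucleotide of the reading frame, distance is measured in nucleotides between the first bases of the two codons (the unit is not stated in the figure), and the 95% error bars are approximated as 1.96 standard errors of the mean. The function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def next_inframe_stop(seq: str, start: int) -> Optional[int]:
    """Index of the first stop codon at or after `start`, stepping codon by codon.

    `start` must be in frame (a multiple of 3 when the frame begins at index 0).
    """
    seq = seq.upper().replace("U", "T")
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def stop_distance(seq: str, stop_pos: int) -> Optional[int]:
    """Nucleotides from the stop codon at `stop_pos` to the next in-frame stop.

    For a readthrough candidate, `stop_pos` is the internal stop codon; for a
    non-candidate it is the annotated termination codon, so the next stop
    lies in the 3' UTR.
    """
    nxt = next_inframe_stop(seq, stop_pos + 3)
    return None if nxt is None else nxt - stop_pos


def mean_with_ci95(values: List[float]) -> Tuple[float, float]:
    """Mean and half-width of an approximate 95% interval (1.96 x SEM)."""
    m = mean(values)
    sem = stdev(values) / math.sqrt(len(values))
    return m, 1.96 * sem
```

For example, in the toy sequence `ATGAAATGAGGGCCCTAGTT` the first in-frame stop (TGA) starts at index 6 and the next one (TAG) at index 15, so `stop_distance` returns 9.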


2.4 Discussion

We demonstrated in silico screening of readthrough genes using comparative genomics and generated a list of candidates, including documented readthrough genes, in five eukaryotic species. Because the selectivity of our approach depends on the quality of the source data in public databases, protein and cDNA sequences must be accurate and up to date to avoid false negatives. For example, the previously documented readthrough genes Sell of H. sapiens and Dial and Sephs2 of M. musculus were missed from our candidate list because of insertions and/or deletions in their cDNA sequences. Similarly, Serv of M. musculus was missed in step B because of a sequencing error in the EST data. An abundance of protein and cDNA sequence data for some species might increase selectivity compared with other species; for example, M. musculus candidates showed higher selectivity than those of H. sapiens.

In the final filtering step, all documented readthrough genes in the `candidates' lists of M. musculus and H. sapiens passed. Previous research showed that many selenoproteins have homologues containing cysteine instead of selenocysteine [13]. Therefore, when protein motifs around the predicted internal stop codon are widely observed in other species, the corresponding cDNA candidate can be considered reliable, and the internal stop codon and readthrough mechanism are likely to have been gained or lost in the course of evolution. We propose that the final filtering step is appropriate for screening high-quality candidates, although it did not work for our screening of D. melanogaster genes, because no documented readthrough genes passed this step. This suggests the existence of species-specific readthrough genes.


The ratios of each type of internal stop codon in `candidates' and `strong candidates' of M. musculus and H. sapiens differed from those of the termination codons in `non-candidates': the ratio of UGA was similar to that of UAG, while UAA was less common (Figure 2.3). Notably, the results for H. sapiens candidates were similar to those for M. musculus. Although all previously reported mammalian readthrough events involve selenocysteine insertion at UGA codons, the ability of glutamine tRNA to recognize mammalian UAG codons [51] suggests that readthrough may also occur at stop codons other than UGA in mammals. Moreover, although no stop codon readthrough mechanism has been found in higher plants, we identified two candidates in A. thaliana and one in O. sativa with UAG internal stop codons.
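The ratios compared in Figure 2.3 are simple proportions of the three stop codon types within each category. A minimal sketch follows; the input list is hypothetical example data, not results from this study, and DNA-style codons (with T) are converted to RNA-style (with U) before counting.

```python
from collections import Counter
from typing import Dict, Iterable

STOP_TYPES = ("UAA", "UAG", "UGA")


def stop_codon_ratios(codons: Iterable[str]) -> Dict[str, float]:
    """Percentage of each stop codon type; DNA codons (T) are read as RNA (U)."""
    counts = Counter(c.upper().replace("T", "U") for c in codons)
    total = sum(counts[s] for s in STOP_TYPES)
    if total == 0:
        return {s: 0.0 for s in STOP_TYPES}
    return {s: 100.0 * counts[s] / total for s in STOP_TYPES}


# Hypothetical example input, not data from this study.
print(stop_codon_ratios(["UGA", "UGA", "UAG", "TAA"]))
# → {'UAA': 25.0, 'UAG': 25.0, 'UGA': 50.0}
```

Running the same function on the internal stop codons of `candidates' and on the termination codons of `non-candidates' gives the two distributions being compared.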

If readthrough plays a critical role in regulating protein function, long peptides might be encoded downstream of the internal stop codons and change the structural features of the protein. Therefore, our finding that the distances between stop codons in `candidates', `strong candidates' and documented readthrough genes are longer than those in `non-candidates' (Figure 2.4) is consistent with the hypothesis that the candidates contain a large number of actual readthrough genes.

Approximately 72% of strong candidates in M. musculus and 64% in H. sapiens contained known protein motifs between their internal stop codons and termination stop codons, suggesting that these regions have protein-coding potential. Notably, two of the candidates, Blsooioll6 in M. musculus and AKoo2o24 in H. sapiens, were found to have kelch motifs downstream of their internal stop codons (see Additional file 2). The kelch motif is a 50-residue repetitive motif, and readthrough of a UGA stop codon that allows translation of a downstream kelch motif has previously been documented in D. melanogaster [24]. Although the reported functions of kelch-containing proteins are diverse and the role of the kelch motif is unclear [52], we suggest that some proteins containing the kelch motif are translated by a readthrough mechanism.
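The motif criterion behind the 72% and 64% figures (a known motif lying between the internal and termination stop codons) can be expressed as a small check. This is a sketch under stated assumptions: motif hits are given as inclusive residue ranges from some motif search whose tool is not specified here, a hit must lie entirely inside the readthrough region to count, and `Candidate' is a hypothetical container rather than a structure from this study.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Candidate:
    """Hypothetical record for one readthrough candidate (residue coordinates)."""
    name: str
    internal_stop: int   # residue index of the internal stop codon
    termination: int     # residue index of the termination stop codon
    motif_hits: List[Tuple[int, int]] = field(default_factory=list)  # (start, end), inclusive

    def has_readthrough_motif(self) -> bool:
        """True if any motif hit lies entirely between the two stop codons."""
        return any(self.internal_stop < start and end < self.termination
                   for start, end in self.motif_hits)


def fraction_with_motif(candidates: List[Candidate]) -> float:
    """Fraction of candidates with at least one motif in the readthrough region."""
    if not candidates:
        return 0.0
    return sum(c.has_readthrough_motif() for c in candidates) / len(candidates)
```

A hit that overlaps the internal stop codon is deliberately not counted, since it would not show that the downstream region alone encodes a domain.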

The H. sapiens cDNA AKosioil was predicted to harbour an internal UGA stop codon and was annotated as a cytochrome c oxidase subunit. Previously, selenoproteins participating in oxidation reactions had been reported only in prokaryotes [3]; however, AKosioil suggests that such selenoproteins might also exist in eukaryotes, because mammalian UGA readthrough candidates are most likely to encode selenoproteins.


Chapter 3

Conclusions

Stop codon readthrough is an unusual process in which an amino acid is incorporated at a stop codon according to a non-standard genetic code. Readthrough mechanisms are found in all three domains of life. Nevertheless, previous reports suggest that readthrough is a minor regulatory mechanism, used by only about 20 genes in H. sapiens and unnecessary in higher plants. Is this really the case, and if so, why? At present, our knowledge of readthrough genes is limited and their biological significance is not fully understood, so it is important to investigate the `readthroughome'.
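Readthrough can be modelled as translation in which a particular stop codon is decoded as an amino acid instead of terminating translation. The sketch below uses the standard genetic code and lets the caller mark specific stop codon positions for readthrough, for example decoding UGA as selenocysteine (`U') or UAG as glutamine (`Q'). The position-based interface is an illustrative assumption; in cells the outcome depends on context such as SECIS elements, which this code does not model.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
# Standard genetic code; codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_with_readthrough(cds: str, readthrough: Dict[int, str]) -> str:
    """Translate `cds` from its first base.

    A stop codon whose 0-based nucleotide offset is a key of `readthrough` is
    decoded as the mapped residue; any other stop codon ends translation.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            if i in readthrough:
                protein.append(readthrough[i])  # stop codon read through
                continue
            break  # ordinary termination
        protein.append(aa)
    return "".join(protein)


print(translate_with_readthrough("ATGTGAGGCTAA", {3: "U"}))  # → MUG
print(translate_with_readthrough("ATGTGAGGCTAA", {}))        # → M
```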

In this research, we identified new readthrough candidate genes in five eukaryotes using a comparative genomics approach. In total, 148 M. musculus and 86 H. sapiens readthrough candidates were identified, including 11 of 19 and 6 of 14 documented readthrough genes, respectively. Some novel readthrough candidates were also predicted in D. melanogaster and higher plants. Our list of novel candidates suggests the existence of novel readthrough mechanisms, for example mechanisms other than selenocysteine insertion, in mammals and higher plants.
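The counts of documented readthrough genes recovered above can be read as a simple recovery rate. Treating them this way is our framing for illustration, not a metric defined in this study:

```python
def recovery_rate(recovered: int, documented: int) -> float:
    """Fraction of documented readthrough genes found among the candidates."""
    return recovered / documented


for species, (found, total) in {"M. musculus": (11, 19), "H. sapiens": (6, 14)}.items():
    print(f"{species}: {found}/{total} = {recovery_rate(found, total):.1%}")
# → M. musculus: 11/19 = 57.9%
# → H. sapiens: 6/14 = 42.9%
```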

Most previous computational methods predict readthrough by searching for SECIS elements and are therefore specialized for locating readthrough genes that incorporate selenocysteine.


However, our homology-based approach was able to extract readthrough candidates without being limited to one type of organism, internal stop codon or mechanism.
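The homology-based idea can be sketched as a filter that accepts a candidate only when an alignment to a known protein extends across the internal stop codon. The `Hit' record, the 80% identity cut-off and the 10-residue flank requirement are hypothetical parameters chosen for illustration; they are not the thresholds used in this study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical alignment summary for one homologous protein."""
    protein_id: str
    query_start: int  # 0-based start in the translated cDNA (inclusive)
    query_end: int    # 0-based end in the translated cDNA (exclusive)
    identity: float   # percent identity of the alignment


def spans_internal_stop(translated: str, hit: Hit,
                        min_identity: float = 80.0, min_flank: int = 10) -> bool:
    """True if the alignment covers the first internal stop ('*') with enough
    aligned residues on both sides and sufficient identity."""
    core = translated.rstrip("*")  # drop the terminal stop codon, if present
    stop = core.find("*")
    if stop == -1:
        return False  # no internal stop codon to read through
    upstream = stop - hit.query_start        # aligned residues before the stop
    downstream = hit.query_end - stop - 1    # aligned residues after the stop
    return (hit.identity >= min_identity
            and upstream >= min_flank
            and downstream >= min_flank)
```

Requiring aligned residues on both sides of the stop is what distinguishes a readthrough candidate from an ordinary protein that simply ends at its first stop codon.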

We expect that our comprehensive prediction method will contribute to a better understanding of the biological roles of readthrough.


Acknowledgements

First of all, I sincerely thank Prof. Akio Kanai for his direct support of my work; from him I learned what it means to be a scientist. I am grateful to Assistant Prof. Rintaro Saito for coaching me throughout my studies. He taught me basic bioinformatics techniques and a scientific way of thinking. I also thank Nozomu Yachie for his support, both professional and personal. I cannot express enough gratitude for his kindness and cooperation. Without these three great teachers, I would not have been able to complete my master's degree in bioinformatics.

I would like to give special thanks to Hitomi Umeki for her unfailing friendship. We had our share of ups and downs. I thank Yoshiteru Negishi for his kind support; he has been a great partner in overcoming many problems. I am grateful to Yuki Okada and Motomu Matsui for computational assistance. I also thank Noriyuki Kitagawa, Kosuke Fujishima, Kyota Ishii, Hayataro Kochi and Hiroyuki Nakamura for helpful discussions and advice. The memories of the significant times I spent with them will last a lifetime. I would like to thank all the faculty, students and staff at IAB.

Everybody gave me treasured experiences.

I thank Prof. Masaru Tomita for supervising the whole research project and offering a wonderful research environment. He made my student life far more meaningful.

Finally, I thank my parents and family for their selfless love. They are the best family one could wish for.


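The research activities described here were made possible by the support of a great many people. First, I would like to express my special gratitude to Prof. Akio Kanai, who gave me a great deal of guidance and advice as I carried out this research. Through research with Prof. Kanai I learned how fascinating science is, and I was able to grow a great deal. I also received wide-ranging guidance from Rintaro Saito from my second undergraduate year through the completion of my master's course. That I have been able to continue my research this far is thanks to Mr. Saito, who trained me from the basics. I am also deeply grateful to Nozomu Yachie, who supported me on many occasions, both professionally and personally. My debt to Mr. Yachie, who devoted an enormous amount of time to supporting my research, cannot be expressed in a single word. Without these three people, I would not be where I am today. I offer them my deepest thanks.

I shared a great deal of time and many emotions with Hitomi Umeki throughout my student life. Her presence was indispensable to my student life, and I am sincerely grateful for her irreplaceable friendship. Yoshiteru Negishi always supported me from close by, and thanks to him I was able to overcome many difficulties. I take this opportunity to thank him deeply. Yuki Okada and Motomu Matsui gave me much advice on sequence analysis and supported my research. My discussions with Noriyuki Kitagawa, Kosuke Fujishima, Kyota Ishii, Hayataro Kochi and Hiroyuki Nakamura were also very fruitful. They were reliable comrades who shared the joys and hardships of research with me, and the days I spent with them are a lifelong treasure. I pay my respects to all the faculty, students and staff of the Institute for Advanced Biosciences. Without any one of them, I could not have had such a meaningful student life.

That I was able to do such wonderful research with such wonderful people is entirely thanks to Prof. Masaru Tomita. Meeting Prof. Tomita made my student life far richer. I thank him from the bottom of my heart.

Finally, I want to express my deepest gratitude to my parents, my grandmother and my older brother and sister, who always supported me with warm eyes and unstinting help, and to my niece Kotomi, who gave me the greatest comfort. Thank you so much.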


References

1. Kisselev LL, Buckingham RH: Translational termination comes of age. Trends Biochem Sci 2000, 25(11):561-566.

2. Dontsova M, Frolova L, Vassilieva J, Piendl W, Kisselev L, Garber M: Translation termination factor aRF1 from the archaeon Methanococcus jannaschii is active with eukaryotic ribosomes. FEBS Lett 2000, 472(2-3):213-216.

3. Hatfield DL, Gladyshev VN: How selenium has altered our understanding of the genetic code. Mol Cell Biol 2002, 22(11):3565-3576.

4. Chambers I, Frampton J, Goldfarb P, Affara N, McBain W, Harrison PR: The structure of the mouse glutathione peroxidase gene: the selenocysteine in the active site is encoded by the 'termination' codon, TGA. EMBO J 1986, 5(6):1221-1227.

5. Ambrogelly A, Palioura S, Söll D: Natural expansion of the genetic code. Nat Chem Biol 2007, 3(1):29-35.

6. Cobucci-Ponzano B, Rossi M, Moracci M: Recoding in archaea. Mol Microbiol 2005, 55(2):339-348.

7. Krol A: Evolutionarily different RNA motifs and RNA-protein complexes to achieve selenoprotein synthesis. Biochimie 2002, 84(8):765-774.

8. Namy O, Rousset JP, Napthine S, Brierley I: Reprogrammed genetic decoding in cellular gene expression. Mol Cell 2004, 13(2):157-168.

9. Berry MJ, Banu L, Harney JW, Larsen PR: Functional characterization of the eukaryotic SECIS elements which direct selenocysteine insertion at UGA codons. EMBO J 1993, 12(8):3315-3322.

10. Tujebajeva RM, Ransom DG, Harney JW, Berry MJ: Expression and characterization of nonmammalian selenoprotein P in the zebrafish, Danio rerio. Genes Cells 2000, 5(11):897-903.


11. Fagegaltier D, Hubert N, Yamada K, Mizutani T, Carbon P, Krol A: Characterization of mSelB, a novel mammalian elongation factor for selenoprotein translation. EMBO J 2000, 19(17):4796-4805.

12. Howard MT, Aggarwal G, Anderson CB, Khatri S, Flanigan KM, Atkins JF: Recoding elements located adjacent to a subset of eukaryal selenocysteine-specifying UGA codons. EMBO J 2005, 24(8):1596-1607.

13. Kryukov GV, Castellano S, Novoselov SV, Lobanov AV, Zehtab O, Guigo R, Gladyshev VN: Characterization of mammalian selenoproteomes. Science 2003, 300(5624):1439-1443.

14. Bösl MR, Takaku K, Oshima M, Nishimura S, Taketo MM: Early embryonic lethality caused by targeted disruption of the mouse selenocysteine tRNA gene (Trsp). Proc Natl Acad Sci U S A 1997, 94(11):5531-5534.

15. Srinivasan G, James CM, Krzycki JA: Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA. Science 2002, 296(5572):1459-1462.

16. Hao B, Gong W, Ferguson TK, James CM, Krzycki JA, Chan MK: A new UAG-encoded residue in the structure of a methanogen methyltransferase. Science 2002, 296(5572):1462-1466.

17. James CM, Ferguson TK, Leykam JF, Krzycki JA: The amber codon in the gene encoding the monomethylamine methyltransferase isolated from Methanosarcina barkeri is translated as a sense codon. J Biol Chem 2001, 276(36):34252-34258.

18. Zhang Y, Baranov PV, Atkins JF, Gladyshev VN: Pyrrolysine and selenocysteine use dissimilar decoding strategies. J Biol Chem 2005, 280(21):20740-20751.

19. Polycarpo C, Ambrogelly A, Bérubé A, Winbush SM, McCloskey JA, Crain PF, Wood JL, Söll D: An aminoacyl-tRNA synthetase that specifically activates pyrrolysine. Proc Natl Acad Sci U S A 2004, 101(34):12450-12454.

20. Blight SK, Larue RC, Mahapatra A, Longstaff DG, Chang E, Zhao G, Kang PT, Green-Church KB, Chan MK, Krzycki JA: Direct charging of tRNA(CUA) with pyrrolysine in vitro and in vivo. Nature 2004, 431(7006):333-335.

21. Polycarpo C, Ambrogelly A, Ruan B, Tumbula-Hansen D, Ataide SF, Ishitani R, Yokoyama S, Nureki O, Ibba M, Söll D: Activation of the pyrrolysine suppressor tRNA requires formation of a ternary complex with class I and class II lysyl-tRNA synthetases. Mol Cell 2003, 12(2):287-294.


22. Ibba M, Söll D: Aminoacyl-tRNAs: setting the limits of the genetic code. Genes Dev 2004, 18(7):731-738.

23. Bergstrom DE, Melli CA, Cygan JA, Shelby R, Blackman RK: Regulatory autonomy and molecular characterization of the Drosophila out at first gene. Genetics 1995, 139(3):1331-1346.

24. Xue F, Cooley L: kelch encodes a component of intercellular bridges in Drosophila egg chambers. Cell 1993, 72(5):681-693.

25. Robinson DN, Cooley L: Examination of the function of two kelch proteins generated by stop codon suppression. Development 1997, 124(7):1405-1417.

26. Steneberg P, Samakovlis C: A novel stop codon readthrough mechanism produces functional Headcase protein in Drosophila trachea. EMBO Rep 2001, 2(7):593-597.

27. Steneberg P, Englund C, Kronhamn J, Weaver TA, Samakovlis C: Translational readthrough in the hdc mRNA generates a novel branching inhibitor in the drosophila trachea. Genes Dev 1998, 12(7):956-967.

28. Namy O, Duchateau-Nguyen G, Rousset JP: Translational readthrough of the PDE2 stop codon modulates cAMP levels in Saccharomyces cerevisiae. Mol Microbiol 2002, 43(3):641-652.

29. Driscoll DM, Copeland PR: Mechanism and regulation of selenoprotein synthesis. Annu Rev Nutr 2003, 23:17-40.

30. Mehta A, Rebsch CM, Kinzy SA, Fletcher JE, Copeland PR: Efficiency of mammalian selenocysteine incorporation. J Biol Chem 2004, 279(36):37852-37859.

31. Baranov PV, Gurvich OL, Fayet O, Prère MF, Miller WA, Gesteland RF, Atkins JF, Giddings MC: RECODE: a database of frameshifting, bypassing and codon redefinition utilized for gene expression. Nucleic Acids Res 2001, 29(1):264-267.

32. Baranov PV, Gurvich OL, Hammer AW, Gesteland RF, Atkins JF: Recode 2003. Nucleic Acids Res 2003, 31(1):87-89.

33. Castellano S, Morozova N, Morey M, Berry MJ, Serras F, Corominas M, Guigo R: In silico identification of novel selenoproteins in the Drosophila melanogaster genome. EMBO Rep 2001, 2(8):697-702.

34. Pesole G, Liuni S, D'Souza M: PatSearch: a pattern matcher software that finds functional elements in nucleotide and protein sequences and assesses their statistical significance. Bioinformatics 2000, 16(5):439-450.


35. Taskov K, Chapple C, Kryukov GV, Castellano S, Lobanov AV, Korotkov KV, Guigo R, Gladyshev VN: Nematode selenoproteome: the use of the selenocysteine insertion system to decode one codon in an animal genome? Nucleic Acids Res 2005, 33(7):2227-2238.

36. Zhang Y, Gladyshev VN: An algorithm for identification of bacterial selenocysteine insertion sequence elements and selenoprotein genes. Bioinformatics 2005, 21(11):2580-2589.

37. Chaudhuri BN, Yeates TO: A computational method to predict genetically encoded rare amino acids in proteins. Genome Biol 2005, 6(9):R79.

38. Fujii Y, Imanishi T, Gojobori T: [H-Invitational Database: integrated database of human genes]. Tanpakushitsu Kakusan Koso 2004, 49(11 Suppl):1937-1943.

39. Bono H, Kasukawa T, Furuno M, Hayashizaki Y, Okazaki Y: FANTOM DB: database of Functional Annotation of RIKEN Mouse cDNA Clones. Nucleic Acids Res 2002, 30(1):116-118.

40. Bellen HJ, Levis RW, Liao G, He Y, Carlson JW, Tsang G, Evans-Holm M, Hiesinger PR, Schulze KL, Rubin GM et al: The BDGP gene disruption project: single transposon insertions associated with 40% of Drosophila genes. Genetics 2004, 167(2):761-781.

41. Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H et al: Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 2003, 301(5631):376-379.

42. Seki M, Satou M, Sakurai T, Akiyama K, Iida K, Ishida J, Nakajima M, Enju A, Narusaka M, Fujita M et al: RIKEN Arabidopsis full-length (RAFL) cDNA and its applications for expression profiling under abiotic stress conditions. J Exp Bot 2004, 55(395):213-223.

43. Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ et al: The UCSC Genome Browser Database. Nucleic Acids Res 2003, 31(1):51-54.

44. Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Hwang H, Lopez R, Magrane M et al: The Universal Protein Resource (UniProt). Nucleic Acids Res 2005, 33(Database issue):D154-159.


45. Sato M, Umeki H, Saito R, Kanai A, Tomita M: Computational analysis of stop codon readthrough in D. melanogaster. Bioinformatics 2003, 19(11):1371-1380.

46. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403-410.

47. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.

48. Kent WJ: BLAT--the BLAST-like alignment tool. Genome Res 2002, 12(4):656-664.

49. Smith TF, Waterman MS: Identification of common molecular subsequences. J Mol Biol 1981, 147(1):195-197.

50. Zdobnov EM, Apweiler R: InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics 2001, 17(9):847-848.

51. Kuchino Y, Muramatsu T: Nonsense suppression in mammalian cells. Biochimie 1996, 78(11-12):1007-1015.

52. Adams J, Kelso R, Cooley L: The kelch repeat superfamily of proteins: propellers of cell function. Trends Cell Biol 2000, 10(1):17-24.


Appendix

Contents

Additional file 1: The candidate lists of cDNAs encoding read through genes filtered in `step D'

Read through candidates filtered in `step D' (see Methods, Figure 2.1) in five eukaryotes are listed along with their cDNAs.

Additional file 2: The lists of protein motifs at 3' UTR of read through candidates

Protein motifs found at 3' UTR of read through candidates by InterProScan are listed along with the cDNAs of candidates following `step D' (see Methods, Figure 2.1).
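The region searched for motifs lies between the internal stop codon and the termination stop codon. It can be obtained by translating onward from the internal stop. The following is a minimal sketch under two assumptions: the standard genetic code applies, and the termination codon is simply the next in-frame stop. `extension_peptide` is a hypothetical helper; its output peptide is the kind of sequence that would be handed to a motif scanner such as InterProScan.

```python
from itertools import product
from typing import Optional, Tuple

# Standard genetic code with codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def extension_peptide(cdna: str, internal_stop_pos: int) -> Tuple[str, Optional[int]]:
    """Translate from the codon after an internal stop up to the next in-frame stop.

    Returns (peptide, termination_pos). termination_pos is the 0-based start of the
    next in-frame stop codon, or None if the cDNA ends first.
    Codons containing letters other than A/C/G/T are translated as 'X'.
    """
    seq = cdna.upper()
    peptide = []
    pos = internal_stop_pos + 3  # skip the internal stop codon itself
    while pos + 3 <= len(seq):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*":
            return "".join(peptide), pos
        peptide.append(aa)
        pos += 3
    return "".join(peptide), None


if __name__ == "__main__":
    # Toy cDNA ATG GCT TGA AAA CCC TAA: internal UGA at 6, termination UAA at 15.
    print(extension_peptide("ATGGCTTGAAAACCCTAA", 6))  # ('KP', 15)
```

Returning `None` when no downstream stop is found keeps incomplete cDNAs distinguishable from genuine extensions, rather than silently treating the end of the sequence as a termination codon.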

Additional file 1, Table S1: The candidate lists of cDNAs encoding read through genes filtered in `step D'

Read through candidates filtered in `step D' (see Methods, Figure 2.1) in five eukaryotes are listed along with their cDNAs.

Column `Name': Name of cDNAs

Color-coding in rows: experimentally validated cDNAs encoding read through genes

Column `Codon': Internal stop codons of respective cDNAs

Column `Protein': Homologous proteins of respective cDNAs

Column `Identity': Identity scores of BLASTX between respective pairs of cDNAs and proteins in `step A'

Column `Comments': Annotations of proteins from UniProt or hr database

Column `Motif at 3' UTR': Existence or nonexistence of motifs between internal stop codons and termination stop codons (for details of motif names, see Additional file 2)

Color-codes of `Comments':

Name | Codon | Protein | Identity (%) | Comments | Motif in 3'UTR

M. musculus

2??0012H20 | UGA | AAH86649.1 | 100 | | yes
A??0016C08 | UGA | AAH?6?60.1 | 99.89 | | yes
D0?0046P11 | UGA | AAH0?140.1 | 99.85 | | yes
?9?040?D21 | UGA | ? | ? | | ?
A230002G14 | UGA | AAH54719.1 | 99.59 | | no
3230402K17 | UGA | BAC26327.1 | 99.57 | | yes
3110001117 | UGA | AAH28867.1 | 99.56 | Sodium channel modifier 1 | no
1500041015 | UGA | NP_067504.2 | 99.56 | | yes
? | UGA | AAH37027.1 | 99.55 | | yes
1700027009 | UGA | BAA22780.1 | ? | | yes
1300003G02 | UGA | AAH64753.1 | 99.49 | | no
5031434C21 | UGA | CAA68140.1 | 99.47 | | yes
9430015P09 | UGA | NP_444332.1 | 99.38 | | yes
6430522B21 | UGA | NP_114392.1 | 99.37 | | yes
2010002G05 | UGA | NP_612177.1 | 99.35 | | yes
9130206H05 | UGA | NP_080519.2 | 99.33 | RIKEN cDNA 9130411117 | yes
D230011L23 | UGA | NP_031685.2 | 99.33 | | yes
2010013H22 | UGA | AAK72480.1 | 99.31 | | yes
? | UGA | NP_038787.1 | 99.13 | | no
C530010121 | UGA | NP_291095.1 | 99.02 | | yes
C130023K05 | UGA | AAH71699.1 | 98.97 | | yes
2210014P11 | UGA | NP_033182.1 | 98.86 | | yes
2810406K24 | UGA | NP_780335.1 | 98.75 | | yes
2010015B04 | UGA | NP_075872.1 | 98.74 | | yes
? | UGA | ? | 98.64 | | ?
2410015K23 | UGA | BAB23350.1 | ? | | ?
8230337A?? | UGA | NP_444497.1 | 98.62 | | no
B230215D15 | UGA | NP_113908.2 | 98.30 | Sperm mitochondria-associated cysteine-rich protein | yes
1810023A07 | UGA | XP_135065.2 | 98.21 | | yes
9930105H17 | UGA | XP_355466.1 | 97.99 | | yes
A530071G10 | UGA | AAM47542.1 | 97.61 | | yes
4930588C21 | UGA | NP_080065.3 | 97.17 | | yes
9130020015 | UGA | BAA03768.1 | 96.95 | | yes
B230104C08 | UGA | ? | ? | hypothetical protein | no
2210403B10 | UGA | CAI25938.1 | 94.58 | | yes
2410075G02 | UGA | AAN12283.1 | 94.04 | | no
5830471022 | UGA | XP_513202.1 | 94.00 | | no
C330014P19 | UGA | NP_080694.1 | 93.96 | | yes
C730049F21 | UGA | XP_218197.1 | 93.83 | | no
E430031P10 | UGA | XP_193524.3 | 93.06 | | no
9230110P17 | UGA | AAH57902.1 | 91.80 | | no
1200016C19 | UGA | BAA23885.1 | 90.20 | | no
6430110?16 | UGA | CAI02077.1 | 82.09 | | no
6930009D02 | UGA | BAC31757.1 | 81.60 | | no
C230017N23 | UGA | BAC36590.1 | 81.54 | | no
2700094K13 | UGA | AAH21122.2 | 81.15 | | yes
A630028G09 | UGA | BAB26844.1 | 80.95 | | yes
4932415019 | UAG | XP_129721.2 | 99.88 | | yes
E230007K22 | UAG | NP_109646.2 | 99.86 | | yes
9530092G19 | UAG | NP_705783.1 | 99.83 | | yes
E030048F18 | UAG | XP_131118.1 | 99.82 | | yes
5730406N21 | UAG | NP_808445.2 | 99.82 | | yes
9430049622 | UAG | NP_032595.1 | 99.82 | | yes
E430024C22 | UAG | NP_036090.1 | 99.81 | | no
5730469G05 | UAG | BAC27936.1 | 99.80 | | no
9330162L13 | UAG | BAC33235.1 | 99.80 | | no
6330565C24 | UAG | NP_038531.1 | 99.79 | | yes
D130020J20 | UAG | NP_080773.1 | 99.78 | | yes
D230004M10 | UAG | AAH56340.2 | 99.76 | | yes
9830131G07 | UAG | NP_932145.2 | 99.74 | | yes
3110013005 | UAG | AAH85308.1 | 99.74 | | yes
2600003C05 | UAG | AAH57570.1 | 99.66 | | yes
1110001N15 | UAG | AAH75695.1 | 99.63 | | yes
1110033J19 | UAG | AAH47994.1 | 99.62 | | yes
4732469F02 | UAG | NP_598702.1 | ? | | yes
4732489H21 | UAG | NP_079905.1 | 99.59 | | yes
4021402A13 | UAG | NP_076304.2 | 99.59 | | yes
5430405H20 | UAG | BAC39594.1 | 99.58 | | yes
2010005601 | UAG | NP_031924.1 | 99.57 | | yes
2610044H13 | UAG | AAH62406.1 | 99.56 | | yes
8030486F04 | UAG | NP_598711.1 | 99.53 | | yes
6930088D05 | UAG | NP_031485.2 | 99.53 | | yes
6130010116 | UAG | BAB29759.2 | 99.52 | | yes
C430040A02 | UAG | CAI25358.1 | 99.51 | | no
G431002112 | UAG | AAB03091.1 | 99.31 | | yes
2700055P12 | UAG | NP_033171.1 | 99.29 | | yes
4933407005 | UAG | NP_084024.1 | 99.29 | | no
1110003J06 | UAG | NP_444318.1 | 99.25 | | yes
5730414D07 | UAG | NP_031566.3 | 99.20 | | yes
3830432F17 | UAG | NP_079730.1 | 99.17 | | yes
0610030P12 | UAG | BAB25704.1 | 99.15 | | yes
2010012F05 | UAG | NP_083638.1 | 99.11 | | yes
2310063M03 | UAG | NP_077150.1 | 98.95 | | yes
1110002A01 | UAG | XP_131827.3 | 98.93 | RIKEN cDNA 2610002J02 | yes
2210010B05 | UAG | NP_079743.1 | 98.91 | | yes
2410004117 | UAG | AAH71275.1 | 98.85 | | no
D630030L16 | UAG | AAH80796.1 | 98.72 | | no
6030440E21 | UAG | NP_780310.1 | 98.47 | | yes
1810042K04 | UAG | AAH27410.1 | 98.36 | 1810042K04Rik protein | no
1110032A03 | UAG | AAH32012.1 | 98.25 | | no
2010204N08 | UAG | XP_143332.4 | 98.20 | | yes
2400003C12 | UAG | AAH83351.1 | 98.07 | | yes
2410089K11 | UAG | AAH55909.1 | 98.04 | | no
2510009G06 | UAG | AAH86531.1 | 97.59 | | yes
A230013C15 | UAG | CAI02531.1 | 96.13 | | yes
9530075011 | UAG | EAA19358.1 | 95.05 | | no
2700077820 | UAG | BAB24100.1 | 94.90 | unnamed protein product | yes
4933411E06 | UAG | CAI01941.1 | 93.75 | | no
C630037M17 | UAG | NP_001004259.1 | 92.91 | | yes
A430049C10 | UAG | AAH06049.1 | 90.00 | | no
C530022601 | UAG | XP_237746.2 | 87.44 | | no
9530053H05 | UAG | AAH31501.1 | 84.62 | | yes
4932702F08 | UAG | CAI40705.1 | 84.50 | | yes
6230399C03 | UAG | XP_347011.1 | 82.54 | | no
4930578M17 | UAA | NP_080132.1 | 99.82 | | yes
A830022003 | UAA | AAN38318.1 | 99.81 | | yes
? | UAA | AAB63261.1 | 99.80 | | yes
3830404F01 | UAA | CAI23952.1 | 99.78 | | yes
6930018112 | UAA | CAI24370.1 | 99.77 | | yes
6030455K16 | UAA | NP_080826.1 | 99.76 | | no
C130020J04 | UAA | AAK38160.1 | 99.69 | | yes
1700041K19 | UAA | AAA69958.1 | ? | | yes
2310046K10 | UAA | AAH21821.1 | 99.68 | | yes
1110020L19 | UAA | NP_082909.1 | 99.63 | | yes
4930503620 | UAA | AAH48658.1 | 99.57 | | no
A730032613 | UAA | NP_067493.1 | 99.53 | | yes
C330001E04 | UAA | AAH14757.1 | 99.52 | | yes
2510008E20 | UAA | NP_033960.1 | 99.51 | | yes
1200003021 | UAA | AAH48181.2 | 99.49 | | yes
1810019G20 | UAA | NP_080730.1 | 99.46 | | yes
1200011118 | UAA | NP_080453.2 | 99.42 | | yes
8030485J22 | UAA | AAH86919.1 | 99.39 | | yes
2310005E10 | UAA | BAC38615.1 | 99.37 | | yes
2010016D02 | UAA | CAI24411.1 | 99.10 | RIKEN cDNA 2210404007 | yes
6330414A12 | UAA | NP_683740.1 | 99.08 | | yes
2010002E04 | UAA | AAQ92919.1 | 98.97 | | yes
6430529806 | UAA | AAH12268.1 | 98.52 | | no
2010015D08 | UAA | XP_534675.1 | 98.39 | | yes
2900002G20 | UAA | AAB94491.1 | 97.73 | | yes
4833424007 | UAA | NP_071717.1 | 97.48 | | yes
0610011G18 | UAA | NP_056606.2 | 97.18 | | yes
A230014M08 | UAA | NP_075700.1 | 95.65 | | no
0610007J01 | UAA | NP_083905.1 | 95.53 | | yes
6430012021 | UAA | NP_115776.1 | 94.34 | | yes
0610039M07 | UAA | NP_079872.4 | 93.09 | | yes
4631422005 | UAA | AAH27860.1 | 90.82 | | yes
2310057K14 | UAA | AAH19659.1 | 87.94 | | yes
D430005E18 | UAA | CAI02520.1 | 87.50 | | no
4933437F24 | UAA | AAH06049.1 | 87.50 | | no
D130097N11 | UAA | AAN39685.1 | 86.82 | | no

H. sapiens

BC005294 | UGA | CAC04186.1 | 100 | | yes
BC002381 | UGA | XP_523342.1 | 99.78 | | yes
BC005388 | UGA | AAD22110.1 | 99.77 | | yes
BC001862 | UGA | AAO14946.1 | 99.73 | | yes
AK001714 | UGA | AAH36020.1 | 99.68 | | yes
AK055544 | UGA | NP_060622.2 | 99.66 | | yes
? | UGA | AAH17717.2 | 99.64 | | yes
AK074186 | UGA | CAI20268.1 | 99.62 | | yes
BC030785 | UGA | CAH91596.1 | 99.56 | | no
AK021956 | UGA | AAD32596.1 | 99.56 | | no
AK001982 | UGA | AAH12805.2 | 99.56 | | yes
BC007097 | UGA | CAI42465.1 | 99.52 | | yes
BC032690 | UGA | NP_006853.1 | 99.51 | | no

BC008859 | UGA | AAQ15228.1 | 99.35 | | no
AK022704 | UGA | AAQ88939.1 | 99.27 | | yes
AL833145 | UGA | CAA77836.1 | 99.21 | | yes
AK025133 | UGA | AAH00805.1 | 99.18 | | no
AK025638 | UGA | CAI23623.1 | 99.18 | | yes
AK023776 | UGA | AAH30536.1 | 99.14 | CD99L2 protein | yes
BC003127 | UGA | AAK61300.1 | 99.14 | | no
AK097557 | UGA | AAG21828.1 | 99.06 | | yes
? | UGA | CAA31993.1 | 99.02 | | yes
AK096737 | UGA | XP_512772.1 | 99.01 | | no
AK024957 | UGA | XP_522545.1 | 98.95 | | yes
AK093344 | UGA | NP_057376.1 | 98.72 | | yes
BC026078 | UGA | CAI14767.1 | 98.51 | | yes
BC008424 | UGA | XP_520976.1 | 98.10 | | yes
AL831951 | UGA | XP_528251.1 | 97.79 | | no
AK054990 | UGA | XP_086937.6 | 97.52 | PREDICTED: similar to hypothetical protein | no
AK057915 | UGA | AAD18133.1 | 97.35 | Neuroblastoma-amplified protein | no
AK024620 | UGA | AAO41715.1 | 96.98 | | no
BC019041 | UGA | CAI14890.1 | 96.57 | | yes
AK096244 | UGA | CAI14890.1 | 96.10 | | yes
AK096369 | UGA | XP_517528.1 | ? | PREDICTED: similar to hypothetical protein | no
AK057071 | UGA | AAK17239.2 | 93.49 | | yes
BC033109 | UGA | CAI22068.1 | 92.60 | | no
AK023486 | UGA | AAF71115.1 | 85.57 | PRO2221 | yes
AK093652 | UGA | AAA88036.1 | 82.80 | | no
AK024714 | UGA | AAH16905.1 | 82.35 | FLJ13614 protein | no
AB033082 | UAG | AAF63600.1 | 99.94 | | yes
AK002024 | UAG | EAL24081.1 | 99.86 | | yes
AK055793 | UAG | CAI22033.1 | ? | | yes
BC031082 | UAG | NP_002766.1 | 99.79 | | yes
AK056192 | UAG | NP_858061.1 | 99.77 | | yes
BC002751 | UAG | CAI41935.1 | 99.72 | | no
AK025426 | UAG | AAH09676.1 | 99.71 | | yes
AL832349 | UAG | NP_775083.1 | 99.71 | | yes
BC001899 | UAG | AAX14047.1 | 99.71 | | yes
AK026515 | UAG | AAS00490.1 | 99.70 | | no
AK024400 | UAG | XP_029101.8 | 99.70 | PREDICTED: KIAA0947 protein | yes
AK022575 | UAG | EAL23744.1 | 99.68 | transforming growth factor | yes
AK024562 | UAG | NP_078969.2 | 99.67 | | yes
BC000419 | UAG | CAG30308.1 | 99.63 | | yes
AK001177 | UAG | CAB66630.1 | 99.59 | hypothetical protein | no
AK024047 | UAG | AAF71790.1 | 99.56 | | yes
BC020851 | UAG | CAI13044.1 | 99.55 | | no
AK056990 | UAG | CAI14192.1 | ? | | ?
BC005894 | UAG | NP_001009008.1 | ? | | yes
BC000262 | UAG | CAC08403.1 | 99.30 | OTTHUMP00000059423 | no
AL834128 | UAG | NP_057654.2 | 99.20 | | yes
BC003564 | UAG | XP_528400.1 | 99.15 | | yes
BC007008 | UAG | AAC19161.1 | 98.97 | | no
BC009065 | UAG | NP_004474.2 | 98.84 | | yes
AK023825 | UAG | NP_059143.2 | 98.32 | | yes
AK027214 | UAG | XP_496071.1 | 98.30 | | no
AK024897 | UAG | XP_511835.1 | 97.32 | PREDICTED: hypothetical protein | no
AK091645 | UAG | XP_529621.1 | 96.97 | | no
AK056703 | UAG | XP_529293.1 | 95.27 | PREDICTED: hypothetical protein | no
AK027536 | UAG | CAH92193.1 | 95.08 | hypothetical protein | yes
AK056656 | UAG | CAD98101.1 | 94.91 | | yes
AK094375 | UAG | CAI22875.1 | 83.87 | | no
AK001682 | UAA | BAB13817.1 | 99.77 | | yes
AK023673 | UAA | XP_517735.1 | 99.76 | PREDICTED: similar to hypothetical protein | yes
BC001487 | UAA | CAB43367.1 | 99.76 | | yes
AK000222 | UAA | NP_062826.2 | ? | | yes
BC004301 | UAA | NP_001008490.1 | 99.65 | | yes
AK023155 | UAA | CAH71524.1 | 99.52 | | yes
AK000669 | UAA | AAH04465.1 | 99.50 | | yes
AK024826 | UAA | NP_054749.2 | 99.46 | | yes
AK022650 | UAA | CAI22026.1 | 99.27 | | yes
BC000903 | UAA | XP_517538.1 | 99.04 | | no
AK056120 | UAA | AAK67631.1 | 97.19 | | yes
AK056626 | UAA | CAH92048.1 | 96.83 | hypothetical protein | no
BC003617 | UAA | AAN39685.1 | 88.37 | | no
AK096341 | UAA | XP_238415.2 | 85.70 | | yes
BC023984 | UAA | XP_510166.1 | 85.44 | | no

D. melanogaster

RE70963 | UGA | CAD12856.1 | 98.06 | hypothetical protein | no
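The `Identity' column reports BLASTX identity scores. The following sketch illustrates how such scores can be reduced to a single best homolog per cDNA. It parses BLAST tabular output (the `-outfmt 6` layout of NCBI BLAST+, whose third column is percent identity) and keeps the highest-identity hit per query at or above a cutoff. Whether this work used tabular output, and what cutoff applied, are assumptions; `min_identity`, the helper names and the example records are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hit:
    query: str       # cDNA identifier (qseqid)
    subject: str     # protein accession (sseqid)
    identity: float  # percent identity (pident)


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST output in the default -outfmt 6 column order."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2])))
    return hits


def best_hit_per_query(hits: Iterable[Hit], min_identity: float) -> Dict[str, Hit]:
    """Keep the highest-identity hit for each query, ignoring hits below min_identity."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.identity < min_identity:
            continue
        current = best.get(hit.query)
        if current is None or hit.identity > current.identity:
            best[hit.query] = hit
    return best


if __name__ == "__main__":
    # Hypothetical BLASTX records; only the first three columns are used here.
    records = [
        "cdnaA\tprotX\t99.59\t300\t1\t0\t1\t900\t1\t300\t1e-150\t600",
        "cdnaA\tprotY\t85.00\t300\t45\t0\t1\t900\t1\t300\t1e-90\t400",
        "cdnaB\tprotZ\t72.10\t120\t33\t2\t10\t370\t5\t124\t1e-30\t150",
    ]
    best = best_hit_per_query(parse_tabular(records), min_identity=80.0)
    for query, hit in sorted(best.items()):
        print(query, hit.subject, hit.identity)  # cdnaA protX 99.59
```

Filtering before choosing the maximum means a cDNA whose only homologs fall below the cutoff simply drops out, rather than being kept with a weak hit.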

Additional file 2, Table S2: The list of protein motifs at 3' UTR of read through candidates

Protein motifs found at 3' UTR of read through candidates by InterProScan are listed along with the cDNAs of candidates following `step D' (see Methods, Figure 2.1).

Column `Name': Name of cDNAs

Color-coding in rows: experimentally validated cDNAs encoding read through genes

Column `Motif at 3' UTR': Annotation of the motif found at 3' UTR

Column `Database': Database name

Name | Motif at 3' UTR | Database

M. musculus

2??0012H20 | GLUTATHIONE PEROXIDASE-RELATED | HMMPanther
2??0012H20 | GLUTATHIONE_PEROXID_2 | ScanRegExp
2??0012H20 | GLUTPROXDASE | FPrintScan
2??0012H20 | GSHPx | HMMPfam
2??0012H20 | Glutathione peroxidase | HMMPIR
2??0012H20 | Thioredoxin-like | superfamily
A??0016C08 | RUN | HMMPfam
A??0016C08 | SH3 | superfamily
A??0016C08 | SH3DOMAIN | FPrintScan
A??0016C08 | SH3_1 | HMMPfam
D0?0046P11 | ANAPHYLATOXIN_1 | ScanRegExp
D0?0046P11 | ANAPHYLATOXIN_2 | ProfileScan
D0?0046P11 | ANATO | HMMPfam
D0?0046P11 | ASX_HYDROXYL | ScanRegExp
D0?0046P11 | Anaphylotoxins (complement system) | superfamily
D0?0046P11 | EGF/Laminin | superfamily
D0?0046P11 | EGF_2 | ScanRegExp
D0?0046P11 | EGF_3 | ProfileScan
D0?0046P11 | EGF_CA | HMMPfam
D0?0046P11 | FIBRILLIN-RELATED | HMMPanther
D0?0046P11 | FIBRILLIN/NOTCH-RELATED | HMMPanther
?9?040?D21 | G6PISOMERASE | FPrintScan
?9?040?D21 | GLUCOSE-6-PHOSPHATE ISOMERASE | HMMPanther
?9?040?D21 | PGI | HMMPfam
?9?040?D21 | P_GLUCOSE_ISOMERASE_2 | ScanRegExp
?9?040?D21 | SIS domain | superfamily
1?00002F1? | GENE 33 POLYPEPTIDE | HMMPanther
1810062014 | EPI64-RELATED | HMMPanther
1810062014 | TBC | HMMPfam
1810062014 | TBC_RABGAP | ProfileScan
1810062014 | UBIQUITIN SPECIFIC PROTEASE-RELATED | HMMPanther
1810062014 | Ypt/Rab-GAP domain of gyp1p | superfamily
1200014H24 | SPLICEOSOME ASSOCIATED PROTEIN 114 | HMMPanther
1200014H24 | SURP | ProfileScan
1200014H24 | Surp | HMMPfam
1200014H24 | UBIQUITIN_2 | ProfileScan
1200014H24 | Ubiquitin-like | superfamily
1200014H24 | coiled-coil | Coil
1200014H24 | ubiquitin | HMMPfam
2010206H12 | MITOCHONDRIAL CARRIER-RELATED | HMMPanther
2010206H12 | Mito_carr | HMMPfam
2010206H12 | SOLCAR | ProfileScan
6720430015 | CD151-RELATED | HMMPanther
6720430015 | TETRASPANIN-RELATED | HMMPanther
6720430015 | Tetraspannin | HMMPfam
3230402K17 | A KINASE (PRKA) ANCHOR PROTEIN-RELATED | HMMPanther
3230402K17 | SRP40_C | HMMPfam
3230402K17 | TREACLE | FPrintScan
3230402K17 | coiled-coil | Coil
3230402K17 | gb def: agcp??60 [anopheles gambiae str. pest] | HMMPanther
1500041015 | EF-hand | superfamily
1500041015 | EF_HAND_1 | ScanRegExp
1300008001 | GLUTATHIONE PEROXIDASE-RELATED | HMMPanther
1300008001 | GLUTATHIONE_PEROXID_2 | ScanRegExp
1300008001 | GLUTPROXDASE | FPrintScan
1300008001 | GSHPx | HMMPfam
1300008001 | Glutathione peroxidase | HMMPIR
1300008001 | Thioredoxin-like | superfamily
1700027009 | GLUTATHIONE PEROXIDASE-RELATED | HMMPanther
1700027009 | GLUTATHIONE_PEROXID_2 | ScanRegExp
1700027009 | GLUTPROXDASE | FPrintScan
1700027009 | GSHPx | HMMPfam
1700027009 | Glutathione peroxidase | HMMPIR
1700027009 | Thioredoxin-like | superfamily
5031434C21 | SELENOPROTEIN P-RELATED | HMMPanther
5031434C21 | SelP_C | HMMPfam
5031434C21 | SelP_N | HMMPfam
9430015P09 | 15 KDA SELENOPROTEIN | HMMPanther
9430015P09 | SELENOPROTEIN | HMMPanther
6430522B21 | Alkaline phosphatase-like | superfamily
6430522B21 | NUCLEOTIDE PYROPHOSPHATASE/PHOSPHODIESTERASE | HMMPanther
6430522B21 | PHOSPHODIESTERASE-RELATED | HMMPanther
6430522B21 | Phosphodiest | HMMPfam
2010002G05 | CALMODULIN-RELATED | HMMPanther
2010002G05 | EF-hand | superfamily
2010002G05 | EF_HAND_2 | ProfileScan
2010002G05 | Q8VD?9_MOUSE_Q8VD?9; | BlastProDom
2010002G05 | efhand | HMMPfam
9130206H05 | N-ACETYLGLUCOSAMINYLTRANSFERASE VI-RELATED | HMMPanther
D230011L23 | CDK2_MESAU_P4896?; | BlastProDom
D230011L23 | CYCLIN-DEPENDENT KINASE 2 | HMMPanther
D230011L23 | PROTEIN_KINASE_DOM | ProfileScan
D230011L23 | Pkinase | HMMPfam
D230011L23 | Protein kinase-like (PK-like) | superfamily
D230011L23 | SERINE/THREONINE KINASE | HMMPanther
2010013H22 | Branch | HMMPfam
2010013H22 | GALACTOSYLTRANSFERASE-RELATED | HMMPanther
2010013H22 | N-ACETYLLACTOSAMINIDE BETA-1,6-N-ACETYLGLUCOSAMINYLTRANSFERASE-RELATED | HMMPanther
C530010121 | ATP-synt_C | HMMPfam
C530010121 | F1F0 ATP synthase subunit C | superfamily
C530010121 | VACATPASE | FPrintScan
C530010121 | VACUOLAR ATP SYNTHASE PROTEOLIPID SUBUNIT | HMMPanther
C130023K05 | CXXU selWTH: selT/selW/selH selenoprotein | HMMTigr
2210014P11 | CXXU selWTH: selT/selW/selH selenoprotein | HMMTigr
2810406K24 | C2HCZNFINGER | FPrintScan
2810406K24 | GAG POLYPROTEIN | HMMPanther
2810406K24 | Retrovirus zinc finger-like domains | superfamily
2810406K24 | ZF_CCHC | ProfileScan
2810406K24 | gb def: dna-binding protein hexbp (hexamer-binding protein) | HMMPanther
2810406K24 | zf-CCHC | HMMPfam
2010015B04 | ARYLACETAMIDE DEACETYLASE (AADAC) | HMMPanther
2010015B04 | Abhydrolase_3 | HMMPfam
2010015B04 | ESTERASE | ProfileScan
2010015B04 | LIPASE_GDXG_SER | ScanRegExp
2010015B04 | MONOOXYGENASE-RELATED | HMMPanther


8230215D15: T4_deiodinase (HMMPfam); TYPE II IODOTHYRONINE DEIODINASE (HMMPanther); Thioredoxin-like (superfamily)
l8ioo2sAoi: RING FINGER-RELATED (HMMPanther)
ggsoiosHli: CCR4-NOT-RELATED (HMMPanther); Helical domain of Sec23/24 (superfamily)


AssooilG10: 25A_SYNTH_2 (ScanRegExp)
4gsos88C2l: ATP synthase (F1-ATPase), gamma subunit (superfamily); Radial_spoke_3 (HMMPfam); coiled-coil (Coil)
9130020015: ENV POLYPROTEIN (HMMPanther); GPs6 (HMMPfam)

22io4osBio: P-loop containing nucleoside triphosphate hydrolases (superfamily); RAB-RELATED SMALL G PROTEIN (HMMPanther); RASTRNSFRMNG (FPrintScan)

Cssool4Plg: REGULATOR OF G-PROTEIN SIGNALING (HMMPanther); REGULATOR OF G-PROTEIN SIGNALING 10 (HMMPanther); RGS (ProfileScan); RGSA_MOUSE_QgCQEs (BlastProDom); Regulator of G-protein signalling, RGS (superfamily)
2iooog4Kls: CXXU_selWTH: selT/selW/selH selenoprotein (HMMTigr)
A6soo28Gog: DUF1222 (HMMPfam)
4932415019: NAHEXCHNGR2 (FPrintScan); Na_H_Exchanger (HMMPfam); SODIUM/HYDROGEN EXCHANGER-RELATED (HMMPanther); b_opal: sodium/hydrogen exchanger (HMMTigr)
E2soooiK22: ACOX (HMMPfam); ACYL-COA OXIDASE (HMMPanther); Acyl-CoA dehydrogenase C-terminal domain-like (superfamily); Acyl-CoA dehydrogenase NM domain-like (superfamily); Acyl-CoA oxidase (HMMPIR); Acyl-CoA_dh_1 (HMMPfam); Acyl-CoA_dh_M (HMMPfam); ELECTRON TRANSFER FLAVOPROTEIN BETA-SUBUNIT-RELATED (HMMPanther)
gssoog2Glg: GPROTEINBRPT (FPrintScan); Qg6Eoo_HUMAN_Qg6Eoo (BlastProDom); WD40 (HMMPfam); WD40-repeat (superfamily); WD_REPEATS_2 (ProfileScan); WD_REPEATS_REGION (ProfileScan)
Eosoo48Fl8: DEAH_ATP_HELICASE (ScanRegExp); DNA REPAIR PROTEIN RAD16-RELATED (HMMPanther); Helicase_C (HMMPfam); LODESTAR GENE PRODUCT (HMMPanther); P-loop containing nucleoside triphosphate hydrolases (superfamily); SNF2_N (HMMPfam); coiled-coil (Coil)
siso4o6N2l: gb def: dna segment, chr 11, wayne state university 47, expressed, data source: mgd, sourc (HMMPanther)
g4soo4gB22: APC_SENs_REPEAT (ProfileScan); MEIOTIC CHECKPOINT REGULATOR TSG24 (HMMPanther); MEIOTIC CHECKPOINT REGULATOR TSG24 FAMILY MEMBER (HMMPanther); PC_rep (HMMPfam); RNase A-like (superfamily)
6ssos6sC24: 7tm_1 (HMMPfam); D(1A,B) DOPAMINE RECEPTOR (HMMPanther); DOPAMINED1BR (FPrintScan); DOPAMINER (FPrintScan); Family A G protein-coupled receptor-like (superfamily); G-PROTEIN COUPLED RECEPTOR (HMMPanther); GPCRRHODOPSN (FPrintScan); G_PROTEIN_RECEP_F1_1 (ScanRegExp); G_PROTEIN_RECEP_F1_2 (ProfileScan); HTHFIS (FPrintScan)
Dlsoo2oJ2o: NADH PYROPHOSPHATASE (HMMPanther); NUDIX (HMMPfam); NUDIXFAMILY (FPrintScan); Nudix (superfamily)
D2sooo4Mio: CYSTATIN (ScanRegExp); PROTEIN PHOSPHATASE 4 REGULATORY SUBUNIT 2 (HMMPanther); coiled-coil (Coil)
g8solslGoi: OVALBUMIN (HMMPanther); SERINE PROTEINASE INHIBITOR (HMMPanther); SERPIN (ScanRegExp); Serpin (HMMPfam); Serpins (superfamily)
3110013005: TSG101-RELATED (HMMPanther); Tsg101 (HMMPfam); UBC-like (superfamily); coiled-coil (Coil)
26oooosCOs: Ubiq_cyt_C_chap (HMMPfam); ZIC3 BINDING PROTEIN-RELATED (HMMPanther)
lliooolNls: coiled-coil (Coil)
llioossJlg: 30S AND 40S RIBOSOMAL PROTEIN S4 (HMMPanther); 40S RIBOSOMAL PROTEIN S4 (HMMPanther); Alpha-L RNA-binding motif (superfamily); KOW (HMMPfam); QgDll8_MOUSE_QgDll8 (BlastProDom); Ribosomal_S4e (HMMPfam)
4is246gFo2: ATP-DEPENDENT RNA HELICASE (HMMPanther); ATP-DEPENDENT RNA HELICASE-RELATED (HMMPanther); DUF1605 (HMMPfam); HA2 (HMMPfam); HELICASE (ProfileScan); P-loop containing nucleoside triphosphate hydrolases (superfamily)
4is248gH2l: ADAM-A DISINTEGRIN AND METALLOPROTEINASE (HMMPanther); ADAMTS18 (HMMPanther); ADAMTSFAMILY (FPrintScan); TSP-1 type 1 repeat (superfamily); TSP1 (ProfileScan); TSP1REPEAT (FPrintScan); TSP_1 (HMMPfam)
4o2l4o2Als: O-SIALOGLYCOPROTEIN ENDOPEPTIDASE (HMMPanther); PROTEIN_KINASE_DOM (ProfileScan); PROTEIN_KINASE_TYR (ScanRegExp); PRPK_MOUSE_QggPW4 (BlastProDom); Protein kinase-like (PK-like) (superfamily); RIO1 (HMMPfam)
s4so4osH2o: ESTERASE (ProfileScan); LIPASE (HMMPanther); LIPASE_SER (ScanRegExp); LIPOLIPASE (FPrintScan); LIPOPROTEIN LIPASE (HMMPanther); Lipase (HMMPfam); Lipase/lipooxygenase domain (PLAT/LH2 domain) (superfamily); PLAT (HMMPfam); TAGLIPASE (FPrintScan); alpha/beta-Hydrolases (superfamily)
2oiooosBol: EBP (HMMPfam); STEROL ISOMERASE RELATED (HMMPanther)
26ioo44Hls: ANKYRIN (FPrintScan); ANK_REPEAT (ProfileScan); ANK_REP_REGION (ProfileScan); Ank (HMMPfam); Ankyrin repeat (superfamily); PROTEIN KINASE (HMMPanther); PROTEIN_KINASE_DOM (ProfileScan); Pkinase_Tyr (HMMPfam); Protein kinase-like (PK-like) (superfamily); QggJ82_RAT_QggJ82 (BlastProDom)
8oso486Fo4: ER LUMEN PROTEIN RETAINING RECEPTOR-RELATED (HMMPanther); ERLUMENR (FPrintScan); ER_lumen_recept (HMMPfam); QggJH8_MOUSE_QggJH8 (BlastProDom)
Bgsoo88Dos: ARM repeat (superfamily); Adaptin_N (HMMPfam); Alpha_adaptinC2 (HMMPfam); Alpha_adaptin_C (HMMPfam); Clathrin adaptor appendage domain (superfamily); Clathrin adaptor appendage, alpha and beta chain-specific domain (superfamily)
Blsooioll6: BACK (HMMPfam); BTB (HMMPfam); Galactose oxidase, central domain (superfamily); KELCH-RELATED (HMMPanther); Kelch_1 (HMMPfam); POZ domain (superfamily)
G4sioo2ll2: ASP_PROTEASE (ScanRegExp); ASP_PROT_RETROV (ProfileScan); Acid proteases (superfamily); DNA/RNA polymerases (superfamily); GAG POLYPROTEIN (HMMPanther); INTEGRASE (ProfileScan); POL POLYPROTEIN (HMMPanther); RNASE_H (ProfileScan); RT_POL (ProfileScan); RVP (HMMPfam); RVT_1 (HMMPfam); Ribonuclease H-like (superfamily); RnaseH (HMMPfam); rve (HMMPfam)
2iooossPl2: BASIGIN (FPrintScan); I-set (HMMPfam); IG_LIKE (ProfileScan); IMMUNOGLOBULIN SUPERFAMILY MEMBER-RELATED (HMMPanther); Immunoglobulin (superfamily); STROMAL CELL DERIVED FACTOR RECEPTOR-RELATED (HMMPanther); … (HMMPfam)
lliooosJo6: CD225 (HMMPfam)
siso4l4Doi: BRAIN-DERIVED NEUROTROPHIC FACTOR (HMMPanther); Cystine-knot cytokines (superfamily); NERVE GROWTH FACTOR (NGF)-RELATED (HMMPanther); NGF (HMMPfam); NGF_1 (ScanRegExp); NGF_2 (ProfileScan); Q8IUsg_HUMAN_Q8IUsg (BlastProDom)
s8so4s2Fli: Group II dsDNA viruses VP (superfamily); PHD (HMMPfam); TUMOR SUPPRESSOR ING1-RELATED (HMMPanther); ZF_PHD_1 (ScanRegExp); ZF_PHD_2 (ProfileScan)
o6ioosoPl2: KETOHEXOKINASE (HMMPanther); PfkB (HMMPfam); Ribokinase-like (superfamily)
2oiool2Fos: ESCRT-III (HMMPfam); SNF7-RELATED (HMMPanther); coiled-coil (Coil)
2sioo6sMos: ACETYL-COA:ACETOACETYL-COA TRANSFERASE-RELATED (HMMPanther); CoA transferase (superfamily); SUCCINYL-COA:3-KETOACID-COENZYME A TRANSFERASE (HMMPanther)
22iooioBos: BRICHOS (HMMPfam); QgBDXs_BOVIN_QgBDXs (BlastProDom)
6oso44oE2l: DUF8so (HMMPfam)
2oio2o4No8: ALPHA AMYLASE-RELATED (HMMPanther); MALTASE-GLUCOAMYLASE (HMMPanther)
2400003C12: APOLIPOPROTEIN (HMMPanther); Apolipoprotein (HMMPfam); coiled-coil (Coil)
2siooogGo6: P-loop containing nucleoside triphosphate hydrolases (superfamily); RUVB-RELATED (HMMPanther); TIP49 (HMMPfam)
A2soolsCls: tRNA-guanine transglycosylase (superfamily)
2ioooiiB2o: HIV TAT SPECIFIC FACTOR 1-RELATED (HMMPanther); RNA-binding domain, RBD (superfamily); RRM_1 (HMMPfam)
C6soosiMli: DNA-3'-Pase: DNA 3'-phosphatase (HMMTigr); HAD-SF-IIIA: hydrolase, HAD-superfamily, … (HMMTigr); HAD-like (superfamily); P-loop containing nucleoside triphosphate hydrolases (superfamily); PNK-3'Pase: polynucleotide kinase 3'-phos… (HMMTigr); POLYNUCLEOTIDE KINASE 3 PHOSPHATASE (HMMPanther); POLYNUCLEOTIDE KINASE 3' PHOSPHATASE-RELATED (HMMPanther)
gssoossHos: PROTEIN_KINASE_ATP (ScanRegExp); RNA BINDING PROTEIN 1-RELATED (HMMPanther); RNP PARTICLE COMPONENT-RELATED (HMMPanther)
4gs2io2Fo8: Glutathione synthetase ATP-binding domain-like (superfamily); TTL (HMMPfam); TUBULIN TYROSINE LIGASE RELATED (HMMPanther)
4gsosi8Mli: coiled-coil (Coil)
A8soo22Oos: CUB (HMMPfam); LDL receptor-like module (superfamily); LDLRA_2 (ProfileScan); Ldl_recept_a (HMMPfam); NEUROPILIN (NRP) AND TOLLOID (TLL)-LIKE 1-RELATED (HMMPanther); Spermadhesin, CUB domain (superfamily); ZINC METALLOPROTEINASE-RELATED (HMMPanther)
siso442C2l: Nop52 (HMMPfam)
s8so4o4Fol: C2H2 and C2HC zinc fingers (superfamily); KRAB-RELATED C2H2-TYPE ZINC-FINGER PROTEIN (HMMPanther); Q8C82i_MOUSE_Q8C82i (BlastProDom); Q8RsD1_MOUSE_Q8RsD1 (BlastProDom); Zsos_HUMAN_043309 (BlastProDom); ZINC FINGER PROTEIN (HMMPanther); ZINCFINGER (FPrintScan); ZINC_FINGER_C2H2_1 (ScanRegExp); ZINC_FINGER_C2H2_2 (ProfileScan); zf-C2H2 (HMMPfam)
Bgsool8ll2: CAMP-DEPENDENT RAP1 GUANINE-NUCLEOTIDE EXCHANGE FACTOR (HMMPanther); RAP1 GUANINE-NUCLEOTIDE-EXCHANGE FACTOR (HMMPanther); RASGEF (ScanRegExp); RASGEF_CAT (ProfileScan); Ras GEF (superfamily); RasGEF (HMMPfam)
Clsoo2oJo4: RIBONUCLEOPROTEIN-RELATED (HMMPanther); RNA-binding domain, RBD (superfamily); RRM (ProfileScan); RRM_1 (HMMPfam)
liooo4lKlg: ALDKETRDTASE (FPrintScan); ALDO/KETO REDUCTASE-RELATED (HMMPanther); ALDOKETO_REDUCTASE_2 (ScanRegExp); ALDOKETO_REDUCTASE_3 (ScanRegExp); ALDR_MOUSE_P4ssi6 (BlastProDom); Aldo_ket_red (HMMPfam); NAD(P)-linked oxidoreductase (superfamily); OXIDOREDUCTASE (HMMPanther)
2sioo46Kio: coiled-coil (Coil)
llioo2oLlg: coiled-coil (Coil)
Aisoos2Bls: RAB-RELATED SMALL G PROTEIN (HMMPanther); RAB2 (HMMPanther)
CssooolEo4: B302 (ProfileScan); TRIPARTITE MOTIF PROTEIN TRIM14 BETA (HMMPanther); TRIPARTITE MOTIF PROTEIN-RELATED (HMMPanther)
2siooo8E2o: CYCLIN (HMMPanther); Cyclin-like (superfamily); Cyclin_C (HMMPfam); Cyclin_N (HMMPfam); G1/S-SPECIFIC CYCLIN E (HMMPanther)
1200003021: ERM (HMMPfam); ERMFAMILY (FPrintScan); MOESIN/EZRIN/RADIXIN-RELATED (HMMPanther); Moesin tail domain (superfamily); coiled-coil (Coil)
1810019G20: QgD8Ki_MOUSE_QgD8Ki (BlastProDom); RUB1 CONJUGATING ENZYME UBC12 (HMMPanther); UBC-like (superfamily); UBIQUITIN-CONJUGATING ENZYME E2 (HMMPanther); UBIQUITIN_CONJUGAT_1 (ScanRegExp); UBIQUITIN_CONJUGAT_2 (ProfileScan); UQ_con (HMMPfam)
8oso48sJ22: 40S RIBOSOMAL PROTEIN S10 (HMMPanther); ATHOOK (FPrintScan); Q8ISPs_BRABE_Q8ISPs (BlastProDom); S10_plectin (HMMPfam)
2siooosEio: ALDKETRDTASE (FPrintScan); ALDO/KETO REDUCTASE-RELATED (HMMPanther); ALDOKETO_REDUCTASE_3 (ScanRegExp); Aldo_ket_red (HMMPfam); NAD(P)-linked oxidoreductase (superfamily); OXIDOREDUCTASE (HMMPanther); Q8BIV6_MOUSE_Q8BIV6 (BlastProDom)
2010016D02: LACTATE DEHYDROGENASE-RELATED (HMMPanther); LDH C-terminal domain-like (superfamily); Ldh_1_C (HMMPfam); MALATE DEHYDROGENASE (HMMPanther); QgDB4s_MOUSE_QgDB4s (BlastProDom)
6sso4l4Al2: EDTRNSPORT (FPrintScan); GLUTAMATE TRANSPORTER (HMMPanther); NA_DICARBOXYL_SYMP_2 (ScanRegExp); SDF (HMMPfam); SODIUM/DICARBOXYLATE SYMPORTER-RELATED (HMMPanther)
2oiooo2Eo4: ER_TARGET (ScanRegExp); GROUP XII SECRETED PHOSPHOLIPASE A2 (HMMPanther); PLA2G12 (HMMPfam); Phospholipase A2, PLA2 (superfamily)
2010015008: Metallo-dependent phosphatases (superfamily); Metallophos (HMMPfam); VACUOLAR SORTING PROTEIN VPS29 (HMMPanther)
2goooo2G2o: BYSTIN (HMMPanther); Bystin (HMMPfam)
4833424007: BRICHOS (HMMPfam); CHONDROMODULIN-RELATED (HMMPanther); Scorpion toxin-like (superfamily); TENOMODULIN (HMMPanther)
o6ioollGl8: ARF6 GUANINE NUCLEOTIDE EXCHANGE FACTOR-RELATED (HMMPanther); SEC7 (ProfileScan); Sec7 (HMMPfam); Sec7 domain (superfamily)
o6ioooiJol: FAA_hydrolase (HMMPfam); FAH (superfamily); ISOMERASE-RELATED (HMMPanther); ISOMERASE/DECARBOXYLASE-RELATED (HMMPanther)
8430012O21: 7tm_1 (HMMPfam); Family A G protein-coupled receptor-like (superfamily); G PROTEIN-COUPLED RECEPTOR (HMMPanther); G PROTEIN-COUPLED RECEPTOR 91 (HMMPanther); GPCRRHODOPSN (FPrintScan); G_PROTEIN_RECEP_F1_2 (ProfileScan)
o6ioosgMoi: CGI-101 PROTEIN-RELATED (HMMPanther); DNase I-like (superfamily); MSF1 (HMMPfam); PRELI_MSF1 (ProfileScan); PX19 HOMOLOG (HMMPanther); coiled-coil (Coil)
4631422005: Immunoglobulin (superfamily); Prefoldin (superfamily); coiled-coil (Coil)
2sioosiKl4: Aph-1 (HMMPfam)

H. sapiens
BCoos2g4: 15 KDA SELENOPROTEIN (HMMPanther); SELENOPROTEIN (HMMPanther)
BCoo2s8l: AIRS (HMMPfam); AIRS_C (HMMPfam); Aminoimidazole ribonucleotide synthetase (PurM) C-terminal domain (superfamily); Aminoimidazole ribonucleotide synthetase (PurM) N-terminal domain (superfamily); SELENOPHOSPHATE SYNTHASE (HMMPanther); Selenophosphate synthetase (HMMPIR); selD: selenide, water dikinase (HMMTigr)
BOooss88: ORIGIN RECOGNITION COMPLEX SUBUNIT 4 (HMMPanther)
BC001862: B302 (ProfileScan); BUTYPHLNCDUF (FPrintScan); RING FINGER PROTEIN 18 (HMMPanther); SPRY (HMMPfam); TRIPARTITE MOTIF PROTEIN-RELATED (HMMPanther)
AKoolil4: ANKYRIN REPEAT-CONTAINING PROTEIN (HMMPanther)
AKosss44: ANK_REP_REGION (ProfileScan); Ank (HMMPfam); Ankyrin repeat (superfamily)
BColiili: Group II dsDNA viruses VP (superfamily); IODOTHYRONINE DEIODINASE (HMMPanther); T4_deiodinase (HMMPfam); TYPE III IODOTHYRONINE DEIODINASE (HMMPanther)
AKoi4l86: SODIUM:GALACTOSIDE SYMPORTER FAMILY MEMBER (HMMPanther)
AKoolg82: ARM repeat (superfamily); IMPORTIN BETA-s SUBUNIT (HMMPanther)
BC007097: TIMP (HMMPfam); TIMP-like (superfamily); TISSUE INHIBITOR OF METALLOPROTEASE (HMMPanther); TISSUE INHIBITOR OF METALLOPROTEASE 1 (HMMPanther)
AKo22io4: ENTH/VHS domain (superfamily); coiled-coil (Coil)
AL8ssl4s: SELENOPROTEIN P-RELATED (HMMPanther); SelP_C (HMMPfam); SelP_N (HMMPfam)
AKo2s6s8: C2H2 and C2HC zinc fingers (superfamily); KRAB-RELATED C2H2-TYPE ZINC-FINGER PROTEIN (HMMPanther); Q8NDWs_HUMAN_Q8NDWs (BlastProDom); ZssA_HUMAN_Qo6iso (BlastProDom); ZINCFINGER (FPrintScan); ZINC_FINGER_C2H2_1 (ScanRegExp); ZINC_FINGER_C2H2_2 (ProfileScan); zf-C2H2 (HMMPfam)
AKo2sii6: CD99/MIC2 PROTEIN (HMMPanther)
AKogissi: Cysteine proteinases (superfamily); Peptidase_C48 (HMMPfam)
BCoooi42: GLUTATHIONE PEROXIDASE-RELATED (HMMPanther); GLUTATHIONE_PEROXID_2 (ScanRegExp); GLUTPROXDASE (FPrintScan); GSHPx (HMMPfam); Glutathione peroxidase (HMMPIR); Thioredoxin-like (superfamily)
AKo24gsi: RING/U-box (superfamily); ZF_RING_2 (ProfileScan)


AKogss44: TUMOR NECROSIS FACTOR TYPE 1 RECEPTOR-RELATED (HMMPanther)
BCO26oi8: F-BOX PROTEIN-RELATED (HMMPanther); Ferritin-like (superfamily); MOB1 RELATED (HMMPanther); Mob1_phocein (HMMPfam)
BCoo8424: ACETYLTRANSFERASE-RELATED (HMMPanther); Acetyltransf_1 (HMMPfam); Acyl-CoA N-acyltransferases (Nat) (superfamily)
BColgo4l: HYDROXYPYRUVATE ISOMERASE (HMMPanther)
AKog6244: HYDROXYPYRUVATE ISOMERASE (HMMPanther)
AKosioil: COX3 (HMMPfam); CYTOCHROME C OXIDASE POLYPEPTIDE III (HMMPanther); Cytochrome c oxidase subunit III-like (superfamily); Q8sLo4_HUMAN_Q8sLo4 (BlastProDom)
AKo2s486: RETINOBLASTOMA 1-RELATED (HMMPanther); RNP PARTICLE COMPONENT-RELATED (HMMPanther); coiled-coil (Coil)
ABosso82: C2 (HMMPfam); C2 domain (Calcium/lipid-binding domain, CaLB) (superfamily); C2DOMAIN (FPrintScan); C2 DOMAIN (ProfileScan); DBL homology domain (DH-domain) (superfamily); DH_2 (ProfileScan); EF-hand (superfamily); EF_HAND_1 (ScanRegExp); EF_HAND_2 (ProfileScan); EH (ProfileScan); GROWTH FACTOR RECEPTOR-BOUND PROTEIN 2-RELATED (HMMPanther); INTERSECTIN 2 (HMMPanther); ITN2_HUMAN_QgNZMs (BlastProDom); P67PHOX (FPrintScan); PH (HMMPfam); PH domain-like (superfamily); PH_DOMAIN (ProfileScan); Q84ZV1_SOYBN_Q84ZV1 (BlastProDom); RhoGEF (HMMPfam); SH3 (ProfileScan); SH3-domain (superfamily); SH3DOMAIN (FPrintScan); SH3_1 (HMMPfam); SH3_2 (HMMPfam); SPECTRNALPHA (FPrintScan); coiled-coil (Coil); efhand (HMMPfam); tRNA-binding arm (superfamily)
AKoo2o24: CTLH (ProfileScan); Galactose oxidase, central domain (superfamily); Kelch_1 (HMMPfam); LISH (ProfileScan); MUSKELIN 1, INTRACELLULAR MEDIATOR CONTAINING KELCH MOTIFS (HMMPanther); MUSKELIN-RELATED (HMMPanther); Muskelin_N (HMMPfam)
AKossigs: AA_TRNA_LIGASE_I (ScanRegExp); DUF1669 (HMMPfam)
BCOsio82: PDZ (HMMPfam); PDZ domain-like (superfamily); PROTEASES2C (FPrintScan); SERINE PROTEASE (HMMPanther); Trypsin (HMMPfam); Trypsin-like serine proteases (superfamily)
AKos6lg2: PP2APR55 (FPrintScan); PR55_2 (ScanRegExp); PROTEIN PHOSPHATASE PP2A REGULATORY SUBUNIT B (HMMPanther); WD40 (HMMPfam); WD40-repeat (superfamily); WD_REPEATS_1 (ScanRegExp)
AKo2s426: Arginase/deacetylase (superfamily); HDASUPER (FPrintScan); HISTONE DEACETYLASE-RELATED (HMMPanther); Hist_deacetyl (HMMPfam)
AL8s2s4g: CALPAIN INHIBITOR (HMMPanther); Calpain_inhib (HMMPfam)
BCool8gg: Diaminopimelate epimerase-like (superfamily); GALACTOSIDE 2-L-FUCOSYLTRANSFERASE 1 (HMMPanther); Glyco_transf_11 (HMMPfam)
AKo244oo: ARM repeat (superfamily); Protein kinase-like (PK-like) (superfamily)
AKo22sis: CELL CYCLE PROGRESSION 2 FAMILY (HMMPanther); CELL CYCLE PROGRESSION 2 PROTEIN (HMMPanther); FAST Leu-rich (HMMPfam)
AKo24s62: ClpP/crotonase (superfamily); ENOYL-COA HYDRATASE-RELATED (HMMPanther)
BCooo4lg: CATECHOL-O-METHYLTRANSFERASE (HMMPanther); Caffeoyl-CoA 3-O-methyltransferase (HMMPIR); Methyltransf_3 (HMMPfam); O-METHYLTRANSFERASE (HMMPanther); S-adenosyl-L-methionine-dependent methyltransferases (superfamily); SAM BIND (ProfileScan)
AKo24o4i: C2H2 and C2HC zinc fingers (superfamily); KRAB-RELATED C2H2-TYPE ZINC-FINGER PROTEIN (HMMPanther); Q8TCgl_HUMAN_Q8TCgl (BlastProDom); Zl8o_HUMAN_QgUJW8 (BlastProDom); ZINCFINGER (FPrintScan); ZINC_FINGER_C2H2_1 (ScanRegExp); ZINC_FINGER_C2H2_2 (ProfileScan); zf-C2H2 (HMMPfam)
AKos6ggo: ANKYRIN (FPrintScan); ANKYRIN REPEAT DOMAIN CONTAINING PROTEIN (HMMPanther); ANKYRIN-RELATED (HMMPanther); ANK_REPEAT (ProfileScan); ANK_REP_REGION (ProfileScan); Ank (HMMPfam); Ankyrin repeat (superfamily)
BCoos8g4: DIMETHYLANILINE MONOOXYGENASE (HMMPanther); FMO-like (HMMPfam); FMOXYGENASE2 (FPrintScan); MONOOXYGENASE-RELATED (HMMPanther)
AL8s4l28: DUF1692 (HMMPfam)

BCoogo6s: GLYCINE CLEAVAGE SYSTEM H PROTEIN (HMMPanther)


AKo2s82s: coiled-coil (Coil)
AKo2iss6: TRANSMEMBRANE PROTEIN WITH THROMBOSPONDIN MODULE (HMMPanther)
AKos66s6: EF-G/eEF-2 domains III and V (superfamily); EFG_C (HMMPfam); EFG_IV (HMMPfam); Ribosomal protein S5 domain 2-like (superfamily); TRANSLATION ELONGATION FACTOR (HMMPanther); TRANSLATION ELONGATION FACTOR 2-RELATED (HMMPanther); Translation proteins (superfamily)
AKool682: AMINO ACID TRNA SYNTHETASE (HMMPanther); LEUCYL-TRNA SYNTHETASE-RELATED (HMMPanther)
AKo2s6is: DUF914 (HMMPfam); Transalut C (superfamily)
BCool48i: RIBONUCLEOPROTEIN-RELATED (HMMPanther); TAR DNA-BINDING PROTEIN (HMMPanther)
BColissl: APOLIPOPROTEIN L (HMMPanther); ApoL (HMMPfam); coiled-coil (Coil)
AKooo222: MT-A70 (HMMPfam); MT_A70 (ProfileScan); S-adenosyl-L-methionine-dependent methyltransferases (superfamily)
BCoo4sol: C2H2 and C2HC zinc fingers (superfamily); KRUPPEL-LIKE FACTOR (HMMPanther); Q8VDZi_MOUSE_Q8VDZi (BlastProDom); ZINC_FINGER_C2H2_1 (ScanRegExp); ZINC_FINGER_C2H2_2 (ProfileScan)
AKo2slss: EF-Tu/eEF-1alpha/eIF2-gamma C-terminal domain (superfamily); EFACTOR_GTP (ScanRegExp); ELONGATION FACTOR 1-RELATED (HMMPanther); ELONGATNFCT (FPrintScan); EUCARYOTIC CHAIN RELEASE FACTOR-3-RELATED (HMMPanther); GTP_EFTU (HMMPfam); GTP_EFTU_D2 (HMMPfam); GTP_EFTU_D3 (HMMPfam); P-loop containing nucleoside triphosphate hydrolases (superfamily)
AKooo66g: TRF2-INTERACTING TELOMERIC RAP1 PROTEIN (HMMPanther)
AKo24826: G PROTEIN-COUPLED RECEPTOR KINASE-INTERACTOR-RELATED (HMMPanther)

AKO22650: Ubiq_cyt_C_chap (HMMPfam); ZIC3 BINDING PROTEIN-RELATED (HMMPanther)
AKO56120: CYCLIN (HMMPanther); CYCLIN L (HMMPanther); Cyclin-like (superfamily); Cyclin, L type (HMMPIR); coiled-coil (Coil)
AKO96341: E-MAP-115 (HMMPfam); MICROTUBULE-ASSOCIATED PROTEIN (HMMPanther); MICROTUBULE-ASSOCIATED PROTEIN-RELATED (HMMPanther)


D. melanogaster

A. thaliana
RAFL09-98-E22: Metallophos (HMMPfam); Metallo-dependent phosphatases (superfamily); PROTEIN PHOSPHATASE 4 (HMMPanther); SERINE/THREONINE PROTEIN PHOSPHATASE (HMMPanther); STPHPHTASE (FPrintScan)

Analysis of stop codon readthrough utilizing comparative genomics approach

First edition published October 15, 2007

Author: Mikiko Hattori (服部美樹子)

Supervised by Masaru Tomita, Akio Kanai, and Rintaro Saito

Published by Keio University Shonan Fujisawa Academic Society (慶應義塾大学湘南藤沢学会), 5322 Endo, Fujisawa, Kanagawa 252-0816, Japan. TEL: 0466-49-3437

Printed in Japan. Printing and binding: Waki Printpia (ワキプリントピア)

ISBN 978-4-87762-191-9  SFC-MT 2007-006
This thesis was recognized as an outstanding master's thesis and published.