The Proteome As a Molecular Language ('Proteinese')

THE PROTEOME AS A MOLECULAR LANGUAGE ('PROTEINESE'). Sungchul Ji, Department of Pharmacology and Toxicology, Rutgers University, Piscataway, N.J. 08855. [email protected]

The purpose of this poster is to characterize and analyze the striking similarities that exist between the proteome viewed as a molecular language and the human language. Perhaps this is not surprising if one accepts the thesis advocated in 1997 [2-4] that living cells use a molecular language whose characteristics are similar to those of the human language. According to the cell language theory, cell functions are supported by a hierarchy of molecular languages implemented by DNA, RNA, proteins, and metabolites (e.g., ions, pyruvate, lactate, glucose, inorganic phosphate, ATP, cAMP, etc.). In addition, we can recognize a fifth language used primarily for intercellular communication through exchange of diffusible ions and molecules such as hormones, cytokines, prostaglandins, and morphogens. For convenience, we may adopt the following names for these languages:

1. DNA language = 'DNese' 2. RNA language = 'RNese' 3. Protein language = 'proteinese' 4. Metabolite language = 'metabolese' 5. Intercellular language = 'intercellese'

Thus, it may be stated that cellese consists of 5 different sublanguages, just as the computer language can be said to comprise a set of sublanguages – i) digital logic language, ii) microarchitecture language, iii) instruction set architecture language, iv) operating system machine language, v) assembly language, and vi) problem-oriented language [5].

The cell language theory [2 -4] postulates that there exists a set of principles and rules obeyed by both cellese and humansese on the one hand and by all the sub-languages of cellese on the other. These principles include:

i) the principle of syntagmatic and paradigmatic relations [2], and ii) the principle of triple articulation (see Table 3 on p. 17 in [4]).

Detailed discussions on these principles will not be given here since they are avaialbe in the cited references and not directly related to the comparison between proteinese and humanese, the focal point of this poster.

The structure of 'proteinese' can be divided into 5 levels as shown in Figure 1. The fiveness of Figure 1 and the fiveness of the number of the sublanguages constituting cellese are not causally related (as far as I can tell) and hence must be viewed as accidental. The first four levels of organization can be identified with the well-known primary, secondary,tertiary and quaternary structures of proteins. The fifth structure is rather new (see for example [1]), having emerged only recently due to the development of high-throughput experimental techniques such as DNA arrays, enabling the study of genome-wide behaviors of biopolymers.

Protein Networks (5) /\ || Protein Complexes (4) /\ || Folded Polypeptides (3) /\ || Functional Domains (2) /\ || Amino Acids (1)

Figure 1. The hierarchical organization of protein language ('proteinese').

The 5-level structural organization of proteinese shown in Figure 1 is compared with the 5-level structural organization of humanese in Table 1 (see the first three columns).

Table 1. A structural and functional comparison of protein language (proteinese) and human language (humanese). ______

Level of Organiz- ation Proteinese Humanese Function ______

Primary Amino acids Phonemes Units of Communication ______

Secondary Functional Morphemes Units of Meaning Domains ______

Tertiary Folded Words Units of Denotation Polypeptides ______

Quaternary Protein Sentences Units of Judgments Complexes ______

Quintic Protein Texts Units of Reasoning Networks ______If the comparison in Table 1 is valid, we may infer the possible functions of the different structures of proteins from the corresponding functions of human language (see the last column). Thus, the functional role of protein complexes is to make a judgment (e.g., to turn on or off gene expression), and that of a protein-protein interaction network is to reason or compute, namely, to transform inputs into outputs following a set of rules selected by biological evolution.

The content of Table 1 can be viewed as a set of predictions made by the cell language theory as applied to proteinese. It is hoped that this poster will attract the interest of bioinformatics experts who can test some of these predictions based on protein structural and functional data available on the Internet.

References: [1] Ganeur, J., Krause, R., Bouwmeester, T., and Casari, G. (2004). Modular decomposition of protein-protein interaction networks. Genome Biology 5:R57.

[2] Ji, S. (1997). Isomorphism between Cell and Human Languages: Molecular Biological, Bioinformatic and Linguistic Implications. BioSystems 44:17-39.

[3] Ji, S. (2002). Microsemiotics of DNA. Semiotica 138 (1/4): 15-42.

[4] Ji, S. (2005). Semiotics of Life: A Unified Theory of Molecular Machines, Living Cells, the Mind, Peircean Signs, and the Universe based on the Principle of Information-Energy Complementarity. Reports, the Research Group on Mathematical Linguistics (Martin-Vide, C., ed.), Rovira i Virgili University, Tarragona, Spain, pp. 1-183. The text available at http://www.grlmc.com, under Publictions.

[5] Tanenbaum, A. S. (2003). Structured Computer Organization. Fourth Edition. Prentice-Hall of India, New Delhi. P. 5.