
A Language for Knowledge Representation

Yukio Nakamura, INFOSTA-NIPDOK, Japan

Abstract: A language for the representation of pieces of knowledge is proposed as an extension of the representation hitherto used in document retrieval schemes. Its main features are the use of case-representation and modifying symbols for noun-descriptors, and the introduction of verb-descriptors with tense and modifying symbols. Besides the usual AND and OR operators, some new operators are introduced to show ontological relations. Some examples of knowledge representation are appended.

1. Subject expression

In documentation work we have dealt with the subject of a document, notwithstanding the size of the document. A subject is a substitute for, or a representative of, the document. In dealing with pieces of knowledge, however, a representative of the piece of knowledge must be treated differently from the "subject" of a document. Pieces of knowledge are simple and short compared with a document. A piece of knowledge does not need a subject representation; a direct description of the piece itself is enough.

Then, what is required for the description of knowledge? The subject was and is traditionally described by a noun phrase, without a verb. This was justified by the fact that verbs can be transformed into noun form, mostly by using participles in Western languages. Is this applicable in the case of knowledge?

In knowledge description, we need to describe some kind of change in matter or in state, that is to say an action, or a concept of change. To represent such changes, the use of verbs is very natural. This means that a description can be made by using a sentence, if necessary.

For example, the descriptions

(A) A dog bites a man.
(B) A man bites a dog.

are both permissible as valid knowledge. (A) and (B) can be transformed into the noun phrase forms

(B1) Biting of a man by a dog.
(B2) Biting of a dog by a man.

These expressions are all in natural language, and we need to transform them into simpler logical formulations if they are to be processed by a computer.

Advances in Knowledge Organization, Vol. 4 (1994), p. 127-133

The formulated forms will be

(S1) Biting /\ Men /\ Dogs,
(S2) Biting /\ Dogs /\ Men.

(S1) and (S2) are identical in formal logic, because the AND operation is commutative. It is easy to see that such a means of subject formulation is inadequate for knowledge representation. It is also seen that the use of "biting" is not necessary and that "bite" is quite enough.
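The loss of information under plain coordination can be checked mechanically. The following minimal Python sketch is an illustration only and not part of the original proposal. It assumes that an AND-coordinated subject formulation can be modelled as an unordered set of descriptors, and the function name `coordinate` is hypothetical.

```python
def coordinate(*descriptors: str) -> frozenset[str]:
    """Model an AND-coordination of descriptors as an unordered set (an assumption)."""
    return frozenset(descriptors)


# (S1) Biting /\ Men /\ Dogs  and  (S2) Biting /\ Dogs /\ Men
s1 = coordinate("Biting", "Men", "Dogs")
s2 = coordinate("Biting", "Dogs", "Men")

# AND is commutative, so the two formulations cannot be told apart,
# although sentences (A) and (B) state different facts.
print(s1 == s2)  # True
```

Under this model, nothing in the formulation records which descriptor is the grammatical subject and which is the object.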

We see, therefore, that

a. a simple use of descriptors is insufficient and, at least, a differentiation of subject and object is necessary when descriptors are used;
b. verbs can be used as descriptors without any difficulty, if we just abandon the traditional way of thinking.

These two points are discussed in the later part of this paper.

2. Meaning of AND and OR operators

If "Biting", "Men" and "Dogs" are thought of as concepts, then the logical product (/\) of "Biting" (an action) with "Men" (or "Dogs") is nonsense in the light of E. Wüster's theory¹. Truly, there is nothing which is "Biting" and "Men" at the same time. In our cases, however, both for the subject of a document and for knowledge representation, this logical product (conjunction) of "Biting", "Men" and "Dogs" makes sense. Wüster insisted on using not the symbol /\ but 0., a quite different operator that he termed "conjunction in theme relation (Thema-Relation)". With this symbol, one can represent (S1) and (S2) as [Biting] 0. [Men] 0. [Dogs], where [ ] represents a concept. This representation would be agreed upon by him and his followers.

The present author takes a slightly different view.

Whenever we use some machine (or even a peek-a-boo card) for handling the subject of a document, all operations are based on the coordination (or collocation) of concepts, descriptors or simple keywords. The machines and tools for processing all perform the same AND operation (as the truth table shows). The difference comes from the operands. If the operands are concepts, then the concept "Biting" cannot enter an AND operation with "Men" or "Dogs"; but if the operands are collocated words or classification notations, then the AND operation is possible for "Biting" and "Men" (or "Dogs").

All of this can be represented in the author's notation as

[Biting] /\ [Men]    invalid
Biting /\ Men        valid

where [ ] represents a concept and an unbracketed term (underlined in the original notation) represents a descriptor. In handling knowledge, we must use descriptors or classification notations; these two correspond to concepts but are not concepts themselves.

We must be very careful when we use descriptors or classification notations that relate to ontological relations, especially where the logical OR is concerned. For instance, Belgium v Netherlands and [Belgium] v [Netherlands] both make sense. With the AND operator, however, [Belgium] /\ [Netherlands] does not make sense, because there is no place that is Belgium and the Netherlands simultaneously.

However, Belgium /\ Netherlands makes sense, and Belgium v Netherlands v Luxembourg makes sense also; in many cases this combination is the same as Benelux.

3. Verbs as Descriptors

Until now many of us have thought that a concept should be shown by a noun. However, ISO R1087 (1969) says in its Section 2.1.1:

"Concepts may be the mental representation not only of things (expressed by nouns) but in a wider sense, also of qualities (expr. by adjectives or nouns), of actions (expr. by verbs or nouns) and even locations, situations or relations (expr. by adverbs, prepositions, conjunctions or nouns)."

A representation of knowledge needs to represent an action or a change, and therefore verbs necessarily come into our scope. We must then include some form of verbs in our knowledge representation.

The simplest way to do this is to adopt verbs among our descriptors. Verbs have many forms, as they have grammatical conjugation and differences in mood and voice. Concept formation means some kind of abstraction, so we must neglect characteristics that are less important. The simplest choice is to adopt just the idea of action or change; this means that we use only the infinitive (or basic) form and neglect all forms of conjugation, as well as mood and voice.

At first this will be difficult for some people to accept, because many natural languages have some kind of grammatical conjugation which is so familiar to them. There are, however, also many languages that have no conjugation at all but can still express everything necessary for us by the addition of some other words (mostly of an adverbial nature). For instance, we can think of Chinese and Indonesian. An example is shown below:

Chinese         English       Indonesian
Wo shui       = I sleep.    = Saya tidur.
Wo shui-le    = I slept.    = Saya sudah tidur.

With these in mind, we can adopt verbs in their infinitive form as descriptors and represent them by a symbol such as | flow |, in contrast to a noun descriptor, shown as flow. Past and future tense can be represented as | flow_p | and | flow_f |, where the tense symbols p and f are subscripts.
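As a minimal sketch of how such verb-descriptors might be held in a program, the following Python fragment stores the infinitive form together with an optional tense symbol. The class name `VerbDescriptor`, its fields, and the plain-text rendering (bars around the verb, an underscore standing in for the subscript) are assumptions made for illustration, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerbDescriptor:
    stem: str                    # infinitive (basic) form, e.g. "flow"
    tense: Optional[str] = None  # None = unmarked, "p" = past, "f" = future

    def render(self) -> str:
        """Write the verb between bars; an underscore stands in for the subscript."""
        suffix = f"_{self.tense}" if self.tense else ""
        return f"| {self.stem}{suffix} |"


print(VerbDescriptor("flow").render())       # | flow |
print(VerbDescriptor("flow", "p").render())  # | flow_p |
print(VerbDescriptor("flow", "f").render())  # | flow_f |
```

Mood, voice and conjugation have no field here, which mirrors the decision to neglect them.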

Of course, reducing the number of verb descriptors is as essential as it is for noun descriptors. This is a task for the compiler of descriptors.

4. Case representation of noun descriptors

As shown earlier, a differentiation of noun descriptors was felt necessary, at least for the grammatical subject and object. In other cases some other representation is also necessary. This necessity was already felt in the 1960s, and a well-known device was proposed under the name of "roles". Such roles were implemented in experimental systems, but after these trials no commercial retrieval system has employed them. This is explained by the increase in indexing effort and the small gain in effectiveness for existing retrieval systems based on subject representation of documents.

Now for knowledge representation, the situation is quite different. Without such devices a correct representation of knowledge is almost impossible. The role system proposed in the Thesaurus of Engineering Terms² was intended for chemical documentation and is not applicable to other disciplines. Other proposals were abundant, but these were also discipline- or problem-oriented and no general discussion was made. The "of-a" relation is frequently used to construct a kind of hierarchy. A less known author³ once proposed using prepositions and links to represent syntactical relations in subject representation. He gave examples such as the following (converted into the descriptor representation of ISO 2788):

(FISHES + MARKET) IN San Francisco
    Fish market in San Francisco

FISHES IN (MARKET IN San Francisco)
    Fishes (sold or exhibited) in San Francisco Market

(FISHES + MARKET) IN San Francisco IN SONGS
    Fish market in San Francisco appeared in a song

(In all these examples, IN is a preposition operator.)

This trial shows that prepositions play an important function in English and many other Western languages, and they are therefore useful in showing the syntactic role of descriptors in subject representation. In other languages, where case-representation is more direct and clear, these prepositions can be replaced by signs. Readers are reminded of the Slavic and Ural-Altaic languages.

Prepositions are very much language-dependent, and it is difficult to give them neutral and abstract representations. In contrast to prepositions, grammatical cases are rather uniform and universal among languages because they are abstract concepts. In this sense, the present author's proposal is to use abstract cases common to most languages and to limit the number of cases to a minimum, as shown below:

case            symbol    corresponding preposition
nominative      xxxx_n
genitive        xxxx_g    of
dative          xxxx_d    to
accusative      xxxx_c
ablative        xxxx_b    from
locative        xxxx_l    at, in, on
instrumental    xxxx_i    by

Here xxxx means a noun-descriptor; the case symbol is a subscript, written after an underscore.
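The case table can be written down as a small lookup structure. The Python sketch below is only an illustration: the names `Case`, `PREPOSITIONS` and `noun` are hypothetical, the underscore stands in for the subscript, and the locative symbol is taken to be l, as in the examples of Section 7.

```python
from enum import Enum


class Case(Enum):
    NOMINATIVE = "n"
    GENITIVE = "g"
    DATIVE = "d"
    ACCUSATIVE = "c"
    ABLATIVE = "b"
    LOCATIVE = "l"      # assumed symbol; see the lead-in
    INSTRUMENTAL = "i"


# English prepositions that each case roughly corresponds to in the table.
# Nominative and accusative have no corresponding preposition.
PREPOSITIONS = {
    Case.GENITIVE: ("of",),
    Case.DATIVE: ("to",),
    Case.ABLATIVE: ("from",),
    Case.LOCATIVE: ("at", "in", "on"),
    Case.INSTRUMENTAL: ("by",),
}


def noun(descriptor: str, case: Case) -> str:
    """Attach a case symbol to a noun-descriptor."""
    return f"{descriptor}_{case.value}"


print(noun("west", Case.ABLATIVE))  # west_b
print(noun("east", Case.DATIVE))    # east_d
```

Because the cases are a closed set of seven values, an enumeration keeps an indexer from inventing ad hoc roles.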

One of the special points of this proposal is that a symbol is positively assigned to the nominative, although many natural languages have no sign for the nominative because it usually comes at the beginning of a sentence.

5. Other syntactic and logical relations

There is still the necessity to have representations of other relations that are easy to understand.

a. For the qualifying function, adjectival descriptors are not necessary, because an adjectival expression is derived from a noun-descriptor by adding a modifying symbol m to it, i.e. xxxx_m. If a word with the same symbol is used after a verb-descriptor, this word represents an adverbial expression.

For the relation of quantity (larger than, smaller than and equal), the usual signs <, > and = are used if necessary. Dynamic quantity relations are handled by verb-descriptors.

b. The Time relation is, in a sense, ontological but is treated separately here. The relations "before", "after" and "simultaneous" must be represented. "Before" and "after" are just inverses of each other, so only the following operators are necessary:

POST meaning "after"
SIMT meaning "simultaneous"

Example: A POST B means A occurs after B. A SIMT B means A occurs simultaneously with B.

c. The Inclusion relation is shown by the operator INCL: A INCL B means A includes B. This operator is used unless the relation between A and B is clearly shown by the hierarchy (chain) of noun-descriptors.

Example: DOCUMENTS INCL MAGNETIC RECORDS

d. The "If-then" or conditional relation is shown by the operator COND. An example is:

(CURRENT_n /\ | FLOW | /\ TERMINALS_d) COND (GATES_n /\ | OPEN |)
= Current flows into the terminal if the gate opens.

Remark: For the operators POST, INCL and COND the operands are not commutative.
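The remark on operand order can be made concrete with a short Python sketch. The class `Relation` and the function `same_statement` are hypothetical names chosen for illustration. Treating SIMT as commutative is an inference from the remark, which lists only POST, INCL and COND as having non-interchangeable operands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    operator: str  # "POST", "SIMT", "INCL" or "COND"
    left: str
    right: str


# Assumption: only SIMT may have its operands swapped without changing the meaning.
COMMUTATIVE = {"SIMT"}


def same_statement(a: Relation, b: Relation) -> bool:
    """Return True if a and b express the same relation between the same operands."""
    if a == b:
        return True
    return (
        a.operator == b.operator
        and a.operator in COMMUTATIVE
        and a.left == b.right
        and a.right == b.left
    )


print(same_statement(Relation("SIMT", "A", "B"), Relation("SIMT", "B", "A")))  # True
print(same_statement(Relation("POST", "A", "B"), Relation("POST", "B", "A")))  # False
```

A retrieval program built on such relations would therefore have to store the operands of POST, INCL and COND as an ordered pair.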

6. A Summary of Grammar

a. Different descriptors are combined by the operators AND (/\), OR (v) and those shown in 5.

b. Concepts are represented by
1) noun-descriptors, which represent static or unchanging objects and phenomena. These are shown by xxxx.
2) verb-descriptors, which represent change and action. These are shown by | xxxx |.

c. Noun-descriptors are accompanied by one of the case-representing symbols (shown in 4.). Verb-descriptors are accompanied, if necessary, by one of the tense symbols (shown in 3.).

d. Adjectival and adverbial expressions are derived from noun-descriptors and verb-descriptors respectively, by adding a modifying symbol m at the end.

e. Ontological relations are represented by the following operators: POST for the time relation, and INCL for the inclusion relation.

f. A train of descriptors, with symbols attached, and of operators forms a sentence that represents a knowledge statement.

7. Examples

1. A green apple
   apples_n /\ greenness_m

2. Apples ripe in October.
   apples_n /\ | ripe | /\ October_l

3. Trains slow down at a corner.
   trains_n /\ | reduce | /\ speed_c /\ curves_l

4. Weather moves from West to East on the Earth.
   the weather_n /\ | move | /\ west_b /\ east_d /\ the Earth_l

5. Rifles were brought to Japan by Portuguese.
   ..._n /\ Portugal_g /\ | bring_p | /\ rifles_c /\ Japan_d
   In this example, a semantic factoring is made for the word "Portuguese."

6. Oxidation of alcohol produces acetone.
   alcohol_n /\ | become | /\ acetone_d /\ oxidation_i

7. Shortage of insolation causes shortage of rice crop.
   (shortage_n /\ insolation_g) /\ | cause | /\ (shortage_n /\ (rice_n /\ crop_g)_g)
   In this example, a double bracket is used.
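With case symbols attached, the two sentences (A) and (B) of Section 1 no longer collapse into one formulation. The Python sketch below shows this. The helper names `noun`, `verb` and `sentence` are hypothetical, the underscore stands in for the subscript, and the two representations are constructed here by following the grammar of Section 6; they are not given in the paper itself.

```python
def noun(descriptor: str, case: str) -> str:
    """Attach a case symbol (n, g, d, c, b, l, i) to a noun-descriptor."""
    return f"{descriptor}_{case}"


def verb(stem: str, tense: str = "") -> str:
    """Write a verb-descriptor between bars, with an optional tense symbol."""
    return f"| {stem}_{tense} |" if tense else f"| {stem} |"


def sentence(*terms: str) -> frozenset[str]:
    """AND-coordination of terms; the order of the terms carries no meaning."""
    return frozenset(terms)


# (A) A dog bites a man.   (B) A man bites a dog.
a = sentence(noun("dogs", "n"), verb("bite"), noun("men", "c"))
b = sentence(noun("men", "n"), verb("bite"), noun("dogs", "c"))

# The AND operation is still commutative, but the case symbols keep
# the grammatical subject and object apart.
print(a == b)  # False
print(" /\\ ".join(sorted(a)))
```

The distinction comes entirely from the operands, which is consistent with the argument of Section 2 that the AND operation itself remains unchanged.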

References

1 E. Wüster: Einführung in die Allgemeine Terminologielehre und Terminologische Lexikographie. Wien/New York: Springer 1979.

2 Engineering Joint Council: Thesaurus of Engineering and Scientific Terms. New York: Engineering Joint Council 1962.

3 L. Anzai: Fourth Ann. Mtg. Info. Sci. Tech. (JPN), 1968.