ASCUS: an Error-Tolerant Mycological Classification System*
©Verlag Ferdinand Berger & Söhne Ges.m.b.H., Horn, Austria, download unter www.biologiezentrum.at

O. PETRINI¹, C. V. RUSCA² & I. SZABO²

¹ Mikrobiologisches Institut, ETH-Zentrum, 8092 Zürich, Switzerland
² Institut de Microtechnique, DMT, EPFL, 1015 Lausanne, Switzerland

O. PETRINI, C. V. RUSCA & I. SZABO (1990). ASCUS: an error-tolerant mycological classification system. - SYDOWIA 42: 273-285.

ASCUS, an error-tolerant classification system to be used in the identification of fungal taxa is described. ASCUS is a hybrid system and combines a connexionist with a rule-based expert system to be used by experts for the preparation of identification keys and by novices for the identification of fungi. The system is tolerant and is not too sensitive to mistakes by the user. It also has a built-in mechanism to deal with user uncertainty and vague qualifiers.

The recent development of powerful, yet comparatively inexpensive hardware and increasingly user-friendly software now allows most mycologists to organize their collections in databases, to analyze morphological and ecological data with complex statistical packages and to eventually write monographs and research papers with sophisticated word processors at home on their personal computers.

The introduction of databases that collect and apply the knowledge of experts has led to the development of computer systems to assist in the identification of organisms by scientists (e.g. plant pathologists, ecologists) who are not taxonomists but have to rely on the use of taxonomic techniques (e.g. SHAPIRO & al., 1974; ESTEP & al., 1989).
In mycology, the use of computer keys for the identification of fungi is comparatively recent (KORF & ZHUANG, 1985; POLONELLI & al., 1985; MARGOT, 1980; MARGOT & al., 1984). All applications so far developed, however, are computer versions of synoptic or dichotomous keys; they can neither deal with any kind of uncertainty nor tolerate mistakes by the user. Moreover, they have no, or very modest, graphics capabilities and do not allow the preparation of a graphically oriented user interface. The recent introduction of hypermedia (see below) has overcome the problem of adding graphics to databases and using them either as identification information or as on-line help to illustrate the scientific jargon used in the key. For instance, HyperCard has been successfully used to produce a key to guide identification of zooplankton in the North Sea (ESTEP & al., 1989), and several prototypes written in HyperCard and SuperCard exist to help mycologists in the identification of selected fungal genera (H. CLÉMENÇON, unpublished; O. PETRINI & L. PETRINI, unpublished). All these applications, however, are computerized synoptic or dichotomous keys: in no case has a mechanism for reasoning with uncertainty been embedded in such applications, although recently an expert system has been developed which makes use of HyperCard's excellent graphical capabilities (EVANS, 1990).

* Paper based on a talk given at the Fourth International Mycological Congress, Symposium G-2, Computers and Information Systems, held in Regensburg, FRG, 28th August - 3rd September 1990.

Subjectively defined characters (e.g. shapes and size of spores or ascocarps, frequency of occurrence of a given taxon) are very common in mycology. Conjunctive (AND), disjunctive (OR), and mixed (AND/OR) sorting is also currently used in most fungal descriptions.
The presence of linguistic descriptors ("rare", "often", "more or less") makes the identification of a fungus a difficult task for many novices. All these features lead to vague or uncertain definitions. Thus, knowledge-based classification systems able to deal with uncertainty become a necessary tool for the preparation of robust (able to withstand intrinsic contradictions) computer-assisted identification systems in mycology (PETRINI & RUSCA, 1989).

We report here on a project (ASCUS) currently underway in our laboratories to develop an error-tolerant classification system to be used in the identification of fungal taxa. A startup prototype (ASCUS-0), currently under pre-release (alpha) testing, is described in detail.

Some terminology

Confusion exists among biologists on the meaning of terms such as hypermedia (hypertext) and expert systems, as well as on the terminology used to describe some of their properties. Some definitions are given below; terms which are not exhaustively explained here can be found in SHAPIRO & ECKROTH (1987).

Hypermedia is "... an approach to information management in which data is stored in a network of nodes connected by links. Nodes can contain text, graphics, audio, video, as well as source code [of computer applications] or other forms of data. The nodes ... are meant to be viewed through a structure editor." (SMITH & WEISS, 1988). The hypermedia environment is one which allows information (the "nodes") to be linked and accessed by association (the links), in a way similar to that in which human beings rapidly access diverse types of information. Dealing with uncertainty is not directly possible with hypermedia tools, but methods can be found to simulate it, or inference engines (see below) prepared with external expert systems can be activated as external commands by the hypermedia.
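
The node-and-link organization quoted above can be pictured with a minimal Python sketch. It is only an illustration of the general idea, not how HyperCard or any other package stores its data; the names `Node` and `reachable` and the three sample nodes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One unit of information (plain text here; could be graphics, audio, ...)."""
    name: str
    content: str
    links: List[str] = field(default_factory=list)  # names of associated nodes


def reachable(nodes: Dict[str, Node], start: str) -> List[str]:
    """Return the node names reachable from `start` by following links (depth-first)."""
    seen: List[str] = []
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen or name not in nodes:
            continue
        seen.append(name)
        # Reverse so that links are visited in the order they were declared.
        stack.extend(reversed(nodes[name].links))
    return seen


# Hypothetical example: a key entry linked to two illustrated glossary entries.
nodes = {
    "key": Node("key", "Identification key entry", ["ascus", "spore"]),
    "ascus": Node("ascus", "Illustrated glossary entry for 'ascus'", ["spore"]),
    "spore": Node("spore", "Illustrated glossary entry for 'spore'"),
}
print(reachable(nodes, "key"))
```

Following links from the key entry reaches both glossary entries, which is the kind of access by association the definition describes.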
Software packages based on the hypermedia philosophy are e.g. HyperCard, SuperCard (both for the Macintosh) and HyperPAD (on the IBM-PC and compatibles). The existing hypermedia packages generally work well for building front ends (user interfaces) for other software (e.g. expert systems) or tutorial systems.

An expert system is a computer programme able to do a well-defined kind of reasoning by using a database (the so-called knowledge base) that may incorporate facts and rules (GRAHAM & JONES, 1988). The expert system's knowledge is based not only on formal textbook information but also on judgmental or heuristic knowledge derived from the experience of a specialist. An expert system usually consists of five main components:

- A user interface
- A knowledge base
- An inference engine
- An explanation module
- A knowledge elicitation module

Knowledge can be represented as verbal descriptors and graphical illustrations, as a set of production rules, or in a connexionist way.

The knowledge base (KB) contains an abstract representation of the knowledge used by an expert in a given area to solve a family of problems such as classification, diagnosis, advisory, tutoring, or planning. Knowledge can be stored as a collection of verbal descriptors or graphical illustrations (collectively also called objects) in nodes, connected by arcs (the equivalent of hypermedia links in the expert system terminology), which represent the relationships between objects or their characterizations.

Knowledge is also often represented as a set of production rules. The following is an example of a production rule:

Rule: "IF an ascus is present in the specimen THEN the specimen is an ascomycete"
Fact: "the specimen has an ascus"

Given this rule and this fact, the inference engine (see below) will deduce: "the specimen is an ascomycete".
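
The deduction in the example above can be sketched in a few lines of Python. This is a generic forward-chaining loop written for illustration, assuming rules are simple premise sets with a single conclusion; it is not the ASCUS inference engine, and the names `Rule` and `forward_chain` are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# A production rule: IF every premise is a known fact THEN add the conclusion.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(rules: List[Rule], facts: Set[str]) -> Set[str]:
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# The rule and the fact from the text.
rules: List[Rule] = [
    (frozenset({"the specimen has an ascus"}), "the specimen is an ascomycete"),
]
derived = forward_chain(rules, {"the specimen has an ascus"})
print("the specimen is an ascomycete" in derived)
```

Without the fact "the specimen has an ascus" the rule never fires and nothing is deduced. A strict matcher of this kind also shows why such keys cannot tolerate user mistakes or vague answers.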
[Figure 1, panels: a) formal neuron (sums, thresholds); b) 2-layer network, the perceptron, used as a classifier with inputs (characters) and outputs (species); c) 3-layer network with input, hidden and output layers; d) fully connected network (Hopfield net).]

Fig. 1. - a. The formal neuron. - b. The perceptron (two-layered neuronal network). - c. A three-layered network. - d. A fully connected network ('Hopfield net'). - For further explanations see text.

Knowledge need not be represented exclusively as a collection of character symbols. One of the most common symbolic representations is the use of numbers. The contingency tables, for example, and many other tools used in statistics (e.g. correlation coefficients, regression lines, tables of covariance) can be seen as numerical representations of a particular kind of knowledge.

Knowledge can also be represented in a connexionist way, a paradigm increasingly used in many scientific fields (e.g. biology, psychology, physics; LIPPMANN, 1987). The basic element of this representation is the formal neuron (Fig. 1a), a processing element which roughly attempts to mimic the structural and functional unit of the nervous system in animals. A formal neural network results from the interconnexion of many formal neurons. In a two-layered network (Fig. 1b) each input (in this case a taxon's character) is potentially connected to each output (a taxon). In the connexionistic jargon this is called the perceptron topology. Fig. 1c is the graphical representation of the three-layered neuronal network. The formal neurons of the intermediate