A Case-Based Approach to Software Component Retrieval

Carmen Fernández-Chamizo, Luis Hernández-Yáñez, Pedro A. González-Calero and Alvaro Urech Baqué
Dep. Informática - Fac. Física - Universidad Complutense
28040 Madrid, SPAIN
34-1-3944381 (Phone); 34-1-3944687 (FAX); [email protected]

From: AAAI Technical Report SS-93-07. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.

This work is supported by the Spanish Committee of Science & Technology (CICYT, TIC92-0058).

Abstract

A major problem concerning the reusability of software is the retrieval of software components. Different approaches, ranging from automatic indexing methods to knowledge-based systems, have been followed to solve this problem. In this paper we present three prototypes which implement different methods of component indexing. From these prototypes we propose a hybrid system that uses CBR as its primary approach but takes advantage of IR methods to facilitate access to the adequate components.

1. Introduction

Software reuse is widely believed to be one of the most promising technologies for improving software quality and productivity (Biggerstaff & Richter 1987).

Work on reusability has followed several approaches. As Krueger (Krueger 1992) says, we can find very different types and levels of reusability. He divides the different approaches to software reuse into eight categories: high-level languages, design and code scavenging, source code components, software schemas, application generators, very high-level languages, transformational systems and software architectures. These approaches rely on reuse techniques that range from abstractions at a very low level, such as assembly language patterns, to very high-level abstractions, as in software designs. However, there are many technical and non-technical problems to be solved before widespread software reuse becomes a reality. One of these problems is the classification, storage and retrieval of reusable components. Our current work is aimed at the solution of this problem.

If a reusability library is to be successful, it must be structured to help the reuser to find software components of interest, browse through related components to locate the component that best suits the current needs, and build a custom-tailored software component usable in the software system being developed.

The user should be able to find possible components with only an approximate idea of the component's function, without having to write a detailed formal specification.

2. Software Component Retrieval

One of the fundamental issues in the reusability of software is the retrieval of software components. In order to reuse a software component, we have to be able to retrieve it from a component base in an easy and efficient way. Therefore, software component retrieval focuses on two major problem areas: the classification of the software components and the retrieval method. Both problems are interrelated, because the way the components are organized influences the retrieval method to be used.

Traditional approaches to software retrieval fall into two complementary categories: high-level classification techniques, which emphasize retrieval by software category, and low-level cross-reference tools, which facilitate various kinds of browsing at the code level. We are primarily concerned with high-level techniques, but low-level tools are also needed to establish some important relationships between components at the code level.

The high-level classification of the software components can follow basically two approaches. We can extract the classification information from the component itself (code, documentation). Or, on the other hand, we can implement a classification scheme using information about the components that lies outside of them. That is, we can provide for each component some additional information on which the classification scheme can be based; this information could be obtained, for example, from an analysis of the domain. Most of the systems based on the first approach use automatic indexing methods, while the systems based on the second approach are usually "knowledge-based systems".

2.1. Automatic indexing approach

Due to the increasing size of natural language descriptions of software components in recent libraries, information retrieval (IR) techniques based on statistical methods (Salton 1986) are becoming more usual in component retrieval. This approach extracts information from the natural-language documentation of the software components. It doesn't use any semantic knowledge and it doesn't intend to understand the documentation. The goal of this approach is to characterize each component by a set of indices that are automatically extracted from its natural language documentation.

Maarek (Maarek et al. 1991) identifies three requirements that should be fulfilled by IR techniques in the software domain: allow multiple word retrieval, select only key indices, and achieve high precision.

The system proposed in (Frakes and Nejmeh 1987) uses an existing IR system, CATALOG, for storing and retrieving C software components. Each component is characterized by a set of single-term indices that are automatically extracted from the natural-language headers of C programs.

The atomic indexing unit in the Guru system (Maarek et al. 1991) is the lexical affinity (LA). An LA between two units of language stands for a correlation of their common appearance. A set of LA-based indices (or profile) is built for each document by performing statistical analysis of the distribution of words not only in the document, but also in the corpus formed by the set of all the documents in the repository. LA-based profiles are automatically built for each component description. Those LAs which have a high resolving power in the document with respect to the corpus are selected as indices. In Guru, all software components are classified, stored, compared, and retrieved according to these indices. They can even be organized into hierarchies for browsing using clustering techniques. At the retrieval stage, the user can specify a query in free-style natural language, which is indexed using the same indexing technique. The set of LAs extracted from the query directs the repository search. Various ranking measures can then be used by the Guru system to select the best candidate for a particular query. This system has been applied to the UNIX command set and also to an object-oriented class library (Helm & Maarek 1991).

As the information provided by IR tools is derived automatically, this approach presents advantages in cost, transportability and scalability. Statistical methods, however, can't substitute for meaning.

2.2. Knowledge-Based approach

There is a growing interest in the potential contributions of artificial intelligence to software engineering (Arango 1988). Development of knowledge-based tools for software reusability is one of the most promising research topics in this area.

The key feature of this approach is that it draws semantic information about software components from a human expert. Knowledge-based systems are often very sophisticated. As a tradeoff, they require domain analysis and a great deal of pre-encoded, manually provided semantic information.

Prieto-Díaz (Prieto-Diaz & Freeman 1987) created a classification scheme based on library science. He proposes a set of six facets: three related to the functionality of the component and three related to its environment. The different values a facet can have are called terms. These terms are structured around certain supertypes that represent organizing concepts. Conceptual distances between terms are assigned by the user. To classify a component, a value for each facet, a term, must be given, so each component is characterized by a six-tuple of terms. The search for a reusable component is accomplished by entering a query with six terms. The set of terms is finite, but a thesaurus is provided to help formulate the query. The conceptual graph that organizes domain concepts represents manually encoded knowledge about the domain.

The system proposed in (Wood and Sommerville 1988) uses Conceptual Dependency (Schank 1972) to represent knowledge about software components. They define the "component descriptor frames" in order to represent the function performed by the component and the objects manipulated by the function. After analyzing several application domains, they identified from 10 to 25 basic functions (primitive actions) for software. There is one basic function for each classification of conceptually similar verbs, that is, verbs that describe semantically similar software functions. Also required is a classification of the objects manipulated by software components into classes or "nominals" that represent conceptually similar objects. The interface to the system is forms-based.

Embley and Woodfield (Embley & Woodfield 1987) define a knowledge structure for a software library consisting of abstract data types (ADTs). This knowledge [...]

[...] hierarchy ensures that components are properly organized and categorized. The taxonomy can also be useful in query formulation and reformulation. When querying the database, if there are no answers or if there are too many answers, the hierarchy can be used to specialize, generalize, or look for alternatives for an appropriate portion of the query, then modify this portion and query again. LASSIE incorporates a natural-language interface which uses a list of compatibility tuples to parse the input. Compatibility tuples, which indicate plausible associations among objects, are obtained from the frame-like knowledge representation mechanism used by LASSIE.

All these knowledge-based systems have several things in common. They all represent the knowledge in frame-like structures [...]
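
As an illustration of the faceted scheme of Prieto-Díaz and Freeman described in Section 2.2, the following minimal Python sketch classifies each component by a six-tuple of terms and ranks components by the summed conceptual distance between the query terms and the component terms. It is only a sketch under stated assumptions, not the authors' implementation: the facet names, the sample terms, the distance values, the default distance of 1.0 for unlisted pairs, the use of a plain sum as the ranking measure, and all function and class names are hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Assumed facet names: the text only says that three facets describe the
# functionality of a component and three describe its environment.
FACETS = ("function", "objects", "medium",
          "system_type", "functional_area", "setting")

# Hypothetical user-assigned conceptual distances between terms (symmetric).
# Identical terms have distance 0.0; unlisted pairs default to 1.0.
DISTANCES = {
    frozenset(("add", "append")): 0.2,
    frozenset(("add", "insert")): 0.3,
    frozenset(("array", "list")): 0.4,
}


def term_distance(a, b):
    """Conceptual distance between two terms of the same facet."""
    if a == b:
        return 0.0
    return DISTANCES.get(frozenset((a, b)), 1.0)


@dataclass(frozen=True)
class Component:
    name: str
    terms: tuple  # one term per facet, in FACETS order


def query_distance(query, component):
    """Sum of per-facet distances between a query and a component."""
    return sum(term_distance(q, t) for q, t in zip(query, component.terms))


def retrieve(query, library, limit=3):
    """Return up to `limit` components, closest to the query first."""
    if len(query) != len(FACETS):
        raise ValueError("a query needs one term per facet")
    return sorted(library, key=lambda c: query_distance(query, c))[:limit]


# Hypothetical component library, each entry classified by a six-tuple.
LIBRARY = [
    Component("list_append",
              ("append", "list", "memory", "utility", "editing", "general")),
    Component("array_insert",
              ("insert", "array", "memory", "utility", "editing", "general")),
    Component("file_compress",
              ("compress", "file", "disk", "utility", "archiving", "general")),
]

QUERY = ("add", "array", "memory", "utility", "editing", "general")
ranked = retrieve(QUERY, LIBRARY)
print([c.name for c in ranked])
```

With these sample values the query does not match any component exactly, yet the distance table still lets the closest components surface first, which is the role the text assigns to the user-defined conceptual distances.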
