
A Case-Based Approach to Software Component Retrieval

Carmen Fernández-Chamizo, Luis Hernández-Yáñez, Pedro A. González-Calero and Alvaro Urech Baqué

Dep. Informática - Fac. Física - Universidad Complutense, 28040 Madrid, SPAIN. 34-1-3944381 (Phone); 34-1-3944687 (FAX); [email protected]

From: AAAI Technical Report SS-93-07. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.

Abstract

A major problem concerning the reusability of software is the retrieval of software components. Different approaches, ranging from automatic indexing methods to knowledge-based systems, have been followed to solve this problem. In this paper we present three prototypes which implement different methods of component indexing. From these prototypes we propose a hybrid system that uses CBR as its primary approach while taking advantage of IR methods to facilitate access to the adequate components.

1. Introduction

Software reuse is widely believed to be one of the most promising technologies for improving software quality and productivity (Biggerstaff & Richter 1987).

Work on reusability has followed several approaches. As Krueger (Krueger 1992) says, we can find very different types and levels of reusability. He divides the different approaches to software reuse into eight categories: high-level languages, design and code scavenging, components, software schemas, application generators, very high-level languages, transformational systems and software architectures. These approaches rely on reuse techniques that range from abstractions at a very low level, such as assembly language patterns, to very high level abstractions, such as software designs. However, there are many technical and non-technical problems to be solved before widespread software reuse becomes a reality. One of these problems is the classification, storage and retrieval of reusable components. Our current work is aimed at the solution of this problem.

If a reusability library is to be successful, it must be structured to help the reuser to find software components of interest, browse through related components to locate the component that best suits the current needs, and build a custom-tailored software component usable in the software system being developed.

The user should be able to find possible components with only an approximate idea of the component's function, without having to write a detailed formal specification.

This work is supported by the Spanish Committee of Science & Technology (CICYT, TIC92-0058).

2. Software Component Retrieval

One of the fundamental issues in the reusability of software is the retrieval of software components. In order to reuse a software component, we have to be able to retrieve it from a component base in an easy and efficient way. Therefore, software component retrieval focuses on two major problem areas: the classification of the software components and the retrieval method. Both problems are interrelated, because the way the components are organized influences the retrieval method to be used.

Traditional approaches to software retrieval fall into two complementary categories: high-level classification techniques, which emphasize retrieval by software category, and low-level cross-reference tools, which facilitate various kinds of browsing at the code level. We are primarily concerned with high-level techniques, but low-level tools are also needed to establish some important relationships between components at the code level.

The high-level classification of the software components can follow basically two approaches. We can extract the classification information from the component itself (code, documentation). Or, on the other hand, we can implement a classification scheme using information about the components that lies outside of them. That is, we can provide for each component some additional information on which the classification scheme is based; this information could be obtained, for example, from an analysis of the domain. Most of the systems based on the first approach use automatic indexing methods, while the systems based on the second approach are usually "knowledge-based systems".

2.1. Automatic indexing approach

Due to the increasing size of natural language descriptions of software components in recent libraries, information retrieval (IR) techniques based on statistical methods (Salton 1986) are becoming more usual in component retrieval. This approach extracts information from the natural-language documentation of the software components. It doesn't use any semantic knowledge and it doesn't intend to understand the documentation. The goal of this approach is to characterize each component by a set of indices that are automatically extracted from its natural language documentation.

Maarek (Maarek et al. 1991) identifies three requirements that should be fulfilled by IR techniques in the software domain: allow multiple word retrieval, select only key indices, and achieve high precision.

The system proposed in (Frakes and Nejmeh 1987) uses an existing IR system, CATALOG, for storing and retrieving C software components. Each component is characterized by a set of single-term indices that are automatically extracted from the natural-language headers of C programs.

The atomic indexing unit in the Guru system (Maarek et al. 1991) is the lexical affinity (LA). An LA between two units of language stands for a correlation of their common appearance. A set of LA-based indices (or profile) is built for each document by performing statistical analysis of the distribution of words not only in the document, but also in the corpus formed by the set of all the documents in the repository. LA-based profiles are automatically built for each component description. Those LAs which have a high resolving power in the document with respect to the corpus are selected as indices. In Guru, all software components are classified, stored, compared, and retrieved according to these indices. They can even be organized into hierarchies for browsing using clustering techniques. At the retrieval stage, the user can specify a query in free-style natural language, which is indexed using the same indexing technique. The set of LAs extracted from the query directs the repository search. Various ranking measures can then be used by the Guru system to select the best candidate for a particular query. This system has been applied to the UNIX command set and also to an object-oriented class library (Helm & Maarek 1991).

As the information provided by IR tools is derived automatically, this approach presents advantages in cost, transportability and scalability. Statistical methods, however, can't substitute for meaning.

2.2. Knowledge-Based approach

There is a growing interest in the potential contributions of artificial intelligence to software engineering (Arango 1988). Development of knowledge-based tools for software reusability is one of the most promising research topics in this area.

The key feature of this approach is that it draws semantic information about software components from a human expert. Knowledge-based systems are often very sophisticated. As a tradeoff, they require domain analysis and a great deal of pre-encoded, manually provided semantic information.

Prieto-Díaz (Prieto-Díaz & Freeman 1987) created a classification scheme based on library science. He proposes a set of six facets: three related to the functionality of the component and three related to its environment. The different values a facet can have are called terms. These terms are structured around certain supertypes that represent organizing concepts. Conceptual distances between terms are assigned by the user. To classify a component, a value (a term) must be given for each facet, so each component is characterized by a six-tuple of terms. The search for a reusable component is accomplished by entering a query with six terms. The set of terms is finite, but a thesaurus is provided to help formulate the query. The conceptual graph that organizes domain concepts represents manually encoded knowledge about the domain.

The system proposed in (Wood and Sommerville 1988) uses Conceptual Dependency (Schank 1972) to represent knowledge about software components. They define "component descriptor frames" in order to represent the function performed by the component and the objects manipulated by the function. After analyzing several application domains, they identified from 10 to 25 basic functions (primitive actions) for software. There is one basic function for each classification of conceptually similar verbs, that is, verbs that describe semantically similar software functions. Also required is a classification of the objects manipulated by software components into classes or "nominals" that represent conceptually similar objects. The interface to the system is forms-based.

Embley and Woodfield (Embley & Woodfield 1987) define a knowledge structure for a software library consisting of abstract data types (ADTs). This knowledge structure supports different relations among ADTs. The ADTs in the library include natural language descriptions and keywords to assist in finding and browsing activities. Relationships in the ADT library can be automatic (relationships implied by ADT information contained in the library) or user-defined (explicitly provided by the user). Defined relationships for automatically determining closeness, generalizations, specializations, generics, and aggregations implicitly impose a useful structure on the library. In addition, library administrators may also define their own generalizations, specializations, and classifications to further aid the user. The "find" operation, implemented in the prototype, allows several means by which an ADT can be located: by name, by general descriptions in natural language, by keywords or by operation descriptions. Closeness between natural-language descriptions is determined by evaluating the number and frequency of content words found in both the given and candidate descriptions.

The LASSIE system (Devanbu et al. 1991) embodies a frame-based knowledge representation. Software components are described in terms of the operations that they perform. Each of these actions is described by giving its actor, object, recipient, agent, environment, and so on. These relationships are coded in a specialized knowledge representation system which classifies them into a conceptual hierarchy. The four principal object types are OBJECT, ACTION, DOER, and STATE. Nodes below DOER and OBJECT represent the architectural component of the system. Nodes below ACTION represent the system's functional component. The relationship between the two system components is captured by various slot-filler relationships between ACTIONs, OBJECTs and DOERs. This taxonomic hierarchy ensures that components are properly organized and categorized. The taxonomy can also be useful in query formulation and reformulation. When querying the database, if there are no answers or if there are too many answers, the hierarchy can be used to specialize, generalize, or look for alternatives for an appropriate portion of the query; the user modifies this portion and queries again. LASSIE incorporates a natural-language interface which uses a list of compatibility tuples to parse the input. Compatibility tuples, which indicate plausible associations among objects, are obtained from the frame-like knowledge representation mechanism used by LASSIE.

All these knowledge-based systems have several things in common. They all represent the knowledge in frame-like structures with similar sets of slot-fillers. They all organize the knowledge around the functions performed by the components, although some of them also include frame representations of the objects involved. In every single case, the characterization of the components with slot-fillers is done manually, following a pre-established model of the domain.

Another approach is undertaken in CODEFINDER (Henninger 1991). This system faces a slightly different goal. It explores the problem of finding program examples which are relevant to a design task. CODEFINDER uses an associative spreading activation method for locating software objects.

Curtis (Curtis 1989) has analyzed different software indexing methods used by knowledge-based systems. He postulates that the effective use of a reusable library will require an indexing scheme similar to the knowledge structures built into most programmers working in an application area. The challenge of such indexing schemes is that the organization of a programmer's knowledge will change with increased experience.

3. Previous work

We have developed three prototypes: ARGOS, SARC and PROSA. ARGOS is an IR system which helps UNIX users find out the commands they need. The SARC system implements a component classification mechanism based on facets. The PROSA system is a CBR system for Program Synthesis which indexes software components in a Dynamic Knowledge Base. These prototypes are described below.
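As a reference point for the prototypes that follow, the lexical affinity (LA) indexing of Section 2.1 can be sketched in a few lines. This is a minimal illustration, not Guru's implementation: the five-word window, the small stop list standing in for closed-class words, and the scoring function (document frequency weighted by the negative log of the LA's relative corpus frequency) are all assumptions chosen to show the idea of "resolving power with respect to the corpus".

```python
import math
import re
from collections import Counter

# Hypothetical stop list standing in for closed-class words.
CLOSED_CLASS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "from", "is", "it"}


def lexical_affinities(text, window=5):
    """Count unordered word pairs that co-occur within `window` words."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in CLOSED_CLASS]
    pairs = Counter()
    for i, left in enumerate(words):
        # Pair each word with the next `window` open-class words.
        for right in words[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def profile(doc, corpus, top=3):
    """Return the `top` LAs of `doc`, ranked by an assumed resolving-power score.

    `corpus` must contain `doc`, so every LA of the document has a corpus count.
    """
    doc_las = lexical_affinities(doc)
    corpus_las = Counter()
    for other in corpus:
        corpus_las.update(lexical_affinities(other))
    total = sum(corpus_las.values())

    def score(la):
        # Frequent in the document, rare in the corpus: high score.
        return doc_las[la] * -math.log(corpus_las[la] / total)

    # Ties are broken alphabetically so the result is deterministic.
    return sorted(doc_las, key=lambda la: (-score(la), la))[:top]


corpus = [
    "copy a file to a directory",
    "remove a file from a directory",
    "print a file",
]
print(profile(corpus[0], corpus, top=2))
```

A query would be indexed with the same `lexical_affinities` function and compared against the stored profiles; the comparison and ranking measures are left out here because the text does not specify them.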

3.1. ARGOS: An IR assistant for UNIX

ARGOS (Buenaga et al. 1992) is an on line intelligent assistant for UNIX based on IR techniques, user modeling and hypertext. The goal of this system is to obtain information about the UNIX command (or commands) that implements an operation described by the user in natural language. The UNIX manual text files constitute the document database. ARGOS uses the method proposed in (Maarek 1991) to characterize each document by detecting the lexical affinities (LAs).

The user will normally use ARGOS to solve a particular problem, but he or she will need to make several queries to find the solution. To display the information on screen, the system takes into account not only the last query, but the whole set of previous queries relative to the user's problem. There is a button for the user to click on when the documentation displayed is irrelevant to his or her problem. The system will use this information about irrelevance when doing the subsequent searches. This is accomplished by the ARGOS User Model Module. The user may partially reset the user model to indicate that he or she wants to start a new search on a different subject.

ARGOS uses hypertext techniques as a tool for navigating through the information. When the user wants to know more about a topic, ARGOS searches the related information in the document database and displays it. The hypertext links are not fixed. They are generated dynamically, depending on the interest of the user.

The user model tries to guess what the user's interests are in order to modify the subsequent searches or to make corrections to the output of the IR module. The user model is incremental. The information is implicitly obtained through the interaction with the user.

The analysis of the documents in the database is performed only once. The results of the analysis are stored in auxiliary files. In the analysis of each document, the following goals are accomplished:

• Only open class words (nouns, verbs, adjectives and adverbs) are considered. Closed class words (pronouns, prepositions, conjunctions and interjections) are ignored.

• Words are taken into their canonical form, eliminating morphemes and suffixes.

• For each document, the set of words and their frequency of appearance is computed. The set of lexical affinities and their frequency of appearance is also computed.

• Total values of frequency of appearance of the words and LAs in the whole corpus are computed.

• Values for the resolving power of each word and LA are calculated for each document.

3.2. SARC: A faceted system for class retrieval

We have also developed the SARC system (Cervigón & Hernández 1992), a prototype that helps reusers in the retrieval of software components. The software components are the classes of an object-oriented library and the system lets the user select one of two libraries: Smalltalk or Object Professional. The system incorporates a friendly user interface with multiple windows, mouse support and context-sensitive help.

SARC integrates high-level functional knowledge and low-level knowledge. The high-level functional knowledge is built into the classification scheme, a simplified version of the faceted classification of (Prieto-Díaz 1991) with some modifications due to the differences between object classes and general software components. The components are characterized by facets. For each facet, a component has an associated set of terms (or keywords). In order to find candidate components, the user selects one term for each facet and the system shows a list with all the components that match the required terms, if any.

The system also uses low-level knowledge acquired by a deep analysis of the source code of the components. The faceted classification is constructed through the interaction with expert programmers who characterize the new components they incorporate into the library. This means that the characterization is more or less subjective. For this reason, the system uses an affinity metric based on the code shared between any pair of components (Chidamber & Kemerer 1991). These affinity values let the system associate with each candidate a set of "near" components to be considered by the user. The user establishes the affinity threshold the system uses in this search. SARC uses other metrics in order to compute the reuse complexity of the selected components, sorting them by the complexity values.

Once the system displays the selected components, the user can inspect the code and access a hypertext browser that allows him or her to get more information about the selected and other related components.

Finally, when a new component is added to the library, the system asks the programmer for the terms that characterize the facets of the new component. In order to help the user in the characterization of the facet that reflects the functionality of the component, the system extracts keywords from the comments of the source code. The user may select any of those keywords, or any other word, as terms.

3.3. PROSA: A CBR system for Program Synthesis

The study of the behavior of human programmers can offer useful guidance for supporting software reuse, because human programmers perform many programming tasks by "reusing" previously acquired mental schemas, rather than by reasoning from first principles (Rich and Waters 1988) (Steier 1991).

Programming is usually taught by example. From a set of training programs, students can abstract some program schemas, or even some general programming techniques. Therefore, we consider that CBR (Riesbeck & Schank 1989) is a natural approach to the software design problem. Following this approach we developed a prototype, the PROSA System (Méndiz et al. 1992), which is described below.

The core of the system is a Dynamic Knowledge Base (DKB) (Fernández-Valmayor and Fernández Chamizo 1992), implemented as a discrimination net of frame-like structures named MOPs (Memory Organization Packets) (Schank 1980). This system was initially developed to model the learning processes in a different application domain (Fernández-Valmayor and Fernández Chamizo 1991).

Initially, we introduced into the DKB two kinds of information: a basic Concept Dictionary and a Case Library. The Concept Dictionary is a hierarchical structure that contains the basic terminology to be used in the application domain. At any time, the user can expand the dictionary (and the Case Library) according to his or her needs. Each concept definition is introduced as a name and a property list (a set of attribute-value pairs characterizing the concept). We access the concepts by giving their names or by giving their attribute lists. Access by name is provided by a concept table which links each name with the MOP corresponding to the concept in the DKB.

There are two kinds of entities in the Concept Dictionary: objects and operations.

Objects refer to entities that are manipulated by a program. We use primitive objects like characters, integers, links, etc. and non-primitive objects like linear lists, trees, graphs, etc. The system knows implicitly about the primitive objects. Non-primitive objects are introduced into the system by giving their names and a set of attributes. New objects are defined in terms of the objects (primitive or non-primitive) known by the system.

Operations refer to the actions that manipulate the objects. We can distinguish between primitive operations (non-defined operations) and non-primitive operations (stated in terms of primitive or previously defined operations).

The system makes abstractions from the common attributes of various objects or operations. When an abstract concept is obtained in this manner, the system asks the user for a name. This name, if provided by the user, will identify the abstract concept.

Each case stored in the Case Library consists of a problem and its solution. A solution is described at different levels of abstraction (stepwise refinement paradigm). At each level, the solution is expressed in pseudocode as the composition of several subproblem solutions (software components). Therefore, we can represent recursively the refinement tree which corresponds to the problem solution. The final refinement level in this tree can be directly translated to an imperative programming language. Problem descriptions and solutions are also introduced into the system as property lists and they are stored in the DKB as instance MOPs. Problem (and subproblem) solutions are accessible by giving the problem attributes or some of the concepts mentioned in the problem (or subproblem) description.

Cases are indexed in the DKB by the operations they implement. The subproblems obtained in the decomposition of the initial problem are also indexed by the operations they implement. These indexes allow the reuse of subproblem solutions in other problems.

When introducing a new training case, the system attempts to find common attributes between the new case and existing cases resident in the DKB, similar to the abstraction process we described for the Concept Dictionary. From these common attributes the system generates abstraction MOPs which represent generic problems. Common attributes are selected from the problem descriptions (without refinement trees). Therefore, they correspond to abstract problem descriptions instead of abstract solutions.

When a new problem is presented, the system searches the DKB for analogous problems, following a top-down strategy. The major goal of PROSA is the adaptation of the retrieved solutions to solve the new problems. The problem solving mechanisms used in this adaptation process are out of the scope of this paper, where we are concerned with the software indexing methods.

4. A Hybrid Approach to Software Components Retrieval

The goal of our current work is to develop a tool for indexing and retrieving software components from a library of object classes. In fact, this tool will be part of a wider environment that will embody several tools for helping and teaching the usage of the class library. We intend to take advantage of the results of the three previously developed systems, but taking into account the different requirements of the new system:

• Object-oriented design, unlike classical functional design, bases the modular decomposition of a software system on the classes of objects the system manipulates, not on the functions the system performs (Meyer 1987). Therefore, in a class library, the software components are classes of objects which include the operational blocks (methods) that are applicable to these objects.

• In a class library, the documentation is organized around object classes instead of around functional blocks. ARGOS and PROSA assumed that each software description corresponded to a functional block.

• Due to the inheritance mechanism, many class descriptions are incomplete because the inherited methods are described in the ancestor classes.

• The Case Library of software components we used in the PROSA system was specifically designed for it. Therefore, the functional description of each software component was tailored to fit the property list required by the prototype. Now, the goal is to apply the same indexing methods to a commercially available class library.
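The third requirement above, that a class description is incomplete without the methods documented in its ancestors, can be illustrated with a small sketch. This is only one possible way to complete a description before indexing, under stated assumptions: the `ClassEntry` record, the `full_description` helper and the `Done` method of `Root` are hypothetical names introduced for the example, and single inheritance is assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ClassEntry:
    """One class description; `methods` maps a selector to its purpose line."""
    name: str
    parent: Optional[str] = None
    methods: Dict[str, str] = field(default_factory=dict)


def full_description(name, library):
    """Merge own and inherited method purposes; the nearest definition wins."""
    merged = {}
    current = library.get(name)
    while current is not None:
        for selector, purpose in current.methods.items():
            # setdefault keeps the description found closest to `name`.
            merged.setdefault(selector, purpose)
        current = library.get(current.parent) if current.parent else None
    return merged


# Illustrative library fragment; the Root entry is invented for the example.
library = {
    "Root": ClassEntry("Root", None, {"Done": "Dispose of the object."}),
    "StringDict": ClassEntry(
        "StringDict",
        "Root",
        {
            "Load": "Load a string dictionary from a stream.",
            "Store": "Store a string dictionary in a stream.",
        },
    ),
}
print(sorted(full_description("StringDict", library)))
```

Walking the ancestor chain once per class keeps the indexing step independent of how the manual happens to distribute method descriptions across the hierarchy.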

Load

Declaration
    constructor Load(var S : IdStream);
Purpose
    Load a string dictionary from a stream.
Description
    S is a properly initialized stream object (a stream file opened for reading, for example). Note that S can be any descendant of the IdStream type, which includes DosIdStream, BufIdStream, and MemIdStream. Load reads the next sequence of bytes from the stream S. These bytes must have been written by a previous call to StringDict.Store. Load allocates heap space for the hash table, then reads a series of strings and data values from the stream and adds them to the dictionary. If Load fails, the constructor will return False and an error code will be stored in the global variable InitStatus. The stream registration procedure for a StringDict is StringDictStream.

Store

Declaration
    procedure Store(var S : IdStream);
Purpose
    Store a string dictionary in a stream.
Description
    S is a properly initialized stream object (a stream file opened for writing, usually). Note that S can be any descendant of the IdStream type, which includes DosIdStream, BufIdStream, and MemIdStream (although you typically wouldn't write to a MemIdStream). Store writes the current size of the hash table followed by a packed image of all the strings and data values for the dictionary. This image may be read by a later call to Load. To check for errors that may have occurred during the Store, call the stream's GetStatus function as shown in the example below. The stream registration procedure for a StringDict is StringDictStream.

Figure 1
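Entries such as those in Figure 1 carry labelled Declaration, Purpose and Description fields, and the one-line Purpose is the part the indexing works from. The following is a minimal sketch of how such an entry might be split into its fields; the `split_entry` helper is a hypothetical name, and the sketch assumes the three labels never occur as capitalized words inside the field bodies.

```python
import re

FIELDS = ("Declaration", "Purpose", "Description")


def split_entry(entry):
    """Split one manual entry into its name and labelled fields."""
    # The capture group makes re.split keep the labels; a trailing colon
    # after a label, if present, is consumed and discarded.
    pattern = r"\b(" + "|".join(FIELDS) + r")\b:?"
    parts = re.split(pattern, entry)
    # parts = [text before first label, label, body, label, body, ...]
    fields = {"Name": parts[0].strip()}
    for label, body in zip(parts[1::2], parts[2::2]):
        fields[label] = body.strip()
    return fields


entry = (
    "Load Declaration constructor Load(var S : IdStream); "
    "Purpose Load a string dictionary from a stream. "
    "Description S is a properly initialized stream object."
)
print(split_entry(entry)["Purpose"])
```

Only the Purpose field would be handed to the domain analysis; the longer Description text is the kind of material better suited to statistical indexing.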

As a starting point, we have selected two widely available class libraries, Object Professional (Turbo Power Software 1990) and Smalltalk (Digitalk 1986), to experiment with. In this way, we can benefit from the experience acquired using the faceted SARC system, which works with the same class libraries. We are particularly interested in the set of terms selected to characterize each class, and in the low-level knowledge obtained from the deep analysis of the source code (inheritance and client relationships between classes).

The class libraries we have selected are relatively small. They contain around one hundred classes and one thousand methods. The associated documentation contains approximately two thousand pages of text, most of which are natural language descriptions with some examples of use at the source code level.

The user will describe his or her needs by giving a free-text general description of the class or of the methods he or she requires. The retrieving process constitutes an iterative process of querying the system to retrieve adequate components. This process can be implemented as a formulate-retrieve-reformulate cycle (Devanbu et al. 1991). If the answer to an initial query is unsatisfactory, the user can reformulate the query by using partial descriptions of the retrieved classes, or browse amongst functionally related classes. Another approach would be based on an interactive process where, after the first query, the system interrogates the user to disambiguate the meaning of the query. We consider that the browsing approach is the most adequate approach for our initial prototype.

The key points of the system will be the indexing and retrieving methods.

Indexing methods

We will assign an index not only to the classes but also to every method of each class. Each method will be linked to the owner class and each class will be linked to every one of its methods. This will allow access by class and access by method.

We intend to use a hybrid approach (knowledge-based and automatic free-text indexing) to characterize the software components. We describe this approach in the context of the Object Professional library. A class description may range from 3 to 55 text pages. A method, however, is usually described in less than one page and its purpose is described with only one line. These figures do not hold in other libraries, but the proportion is more or less conserved. For example, the object StringDict ("string dictionary") is described in twelve manual pages, including the description of its thirteen methods. Some fragments of two of its method descriptions are shown in Fig. 1.

We have analyzed the purpose description of every method in this library, and also some methods of other libraries, and we can conclude that the syntax and the vocabulary used in these descriptions are very simple. By performing a domain analysis, we can identify a set of basic functions and objects that allows us to express the functionalities of the methods. At the moment, we are characterizing each method by a MOP structure that will be stored in the Dynamic Knowledge Base (DKB) that we used in the PROSA system. Continuing with the previous example, we will have two MOPs characterizing the two methods of Fig. 1.

I-M-LOAD.#12 isA M-LOAD
    selector    : Load
    implementor : I-M-StringDict
    code        : I-M-LOAD-IMPLEMENTATION.#15
    origin      : I-M-Stream
    destination : I-M-StringDict

I-M-STORE.#5 isA M-STORE
    selector    : Store
    implementor : I-M-StringDict
    code        : I-M-STORE-IMPLEMENTATION.#8
    origin      : I-M-StringDict
    destination : I-M-Stream

Every method MOP includes three slots specifying the owner class of that method (implementor), the actual implementation of the method (code), and the identifier of the method (selector). The MOP also includes the slots corresponding to the specific action being described (origin and destination in this case).

These two MOPs will be linked in the DKB under the corresponding action type, LOAD or STORE. Both actions are placed under the basic action ASSIGN, because they are instances of it.

This process will produce a classification of the methods in the library. A method will usually be classified under only one action, but when a method implements several actions it will be placed under all of them.

MOPs corresponding to functionally related methods will be close in the DKB. For example, there are many other "store methods" in the library. One of them corresponds to the class SingleList and its associated MOP is:

I-M-STORE.#7 isA M-STORE
    selector    : Store
    implementor : I-M-SingleList
    code        : I-M-STORE-IMPLEMENTATION.#12
    origin      : I-M-SingleList
    destination : I-M-Stream

This MOP will be placed under the same M-STORE MOP as the MOP representing the previous "store method". These methods, however, were not close in the class hierarchy because they belong to different classes.

An abstraction MOP will be generated from the two MOPs because they have the same filler for the destination slot:

M-STORE-TO-STREAM isA M-STORE
    destination : I-M-Stream

A class description, as previously stated, can extend to several text pages. Therefore, it would be a very complex task to express all this information as MOPs. We think this is the right place to use statistical indexing methods. So, we will use ARGOS to obtain the lexical affinity based indices of every class in the library. These indices will be used in the first step of the analysis of the user's query. Furthermore, we will create for each class a MOP containing information about its structural description, its relationships with other classes (inheritance, client) and the links to all its methods. The structural description defines a set of values for the class, i.e. a list of other classes needed to construct the current class. For example,

StringDict = object(Root)      {Dictionary of strings}
    sdSize   : Word;           {Maximum elements in hash pool}
    sdUsed   : Word;           {Current elements in hash pool}
    sdPool   : HashPoolPtr;    {Pointer to hash pool}
    sdStatus : Word;           {Status variable}

The structural description and the relationships with other classes will be obtained from the source code using the low-level tool of the SARC system. MOPs corresponding to classes are stored in the DKB under the MOP OBJECT at the root level of the memory. A fragment of the resulting memory organization is shown in Fig. 2.

Figure 2

Retrieving method

The analysis of the user's query will be made in two steps:

1. The query is indexed by the same statistical method used to index the classes. The profile of lexical affinities detected in the query allows access to the classes with similar profiles. MOPs corresponding to these classes and all their methods will be activated.

2. The query is analyzed again by an expectation-based parser (Dyer 1983) (Martin 1989) which uses the MOPs activated in step 1 as expectations. When the parser recognizes the methods or the structural description of a class, this class will be displayed as an answer to the query. A complete matching between the query and the selected class is not necessary, because the class required by the user may not exist in the library. In this case, the user wants to obtain a functionally close class to adapt it. If the parser has recognized several classes, the class with the most recognized methods will be displayed. If several classes have the same number of recognized methods, the ratio between the functionality required by the user and the global functionality of the class will be used to select the candidate class. For example, if the query was "a dictionary to be used to detect whether a particular word is in the dictionary or not", two classes would be selected, StringDict and StringSet. Both implement the two methods required, but the first one implements another eleven methods and the second one doesn't have any others. So StringSet will be displayed because it better fits the user requirements.

This parser has to be complemented with a browser. This browser will allow the user to explore the classes functionally related to the class selected by the system. The browser will also allow the user to explore similar methods belonging to different classes.

The current state of the prototype only supports the indexing methods. We are undertaking the design of the parser and specifying the browser features. We are also recoding some parts of the prototype because the previous systems were implemented in different languages and on different machines.

The prototype is being implemented in BIM Prolog under SunOS 4.1 on a SUN Sparc workstation.

5. Conclusions

We have presented a hybrid approach to the indexing and retrieving problem for software components. This approach consists of using CBR techniques to represent knowledge about classes and methods, but simultaneously using statistical indexing methods to facilitate the access to classes.

Automatic indexing methods have proved to be successful when applied to software components. We don't have any results that predict better results in component retrieval by adding knowledge-based methods.

Anyway, we believe that CBR approaches can be useful when teaching (or aiding) the user to reuse software components. If some of the components in the library come from past designs, the experience acquired in the designing of such a system can be of use to novices. Even experienced designers can save time and effort using reusable design templates and exploring rationale from previous designs, instead of finding their way through trial and error prototyping.

As we undertook not only the retrieving task but also the educational task, we need to represent the user's mental model of the domain, and also be able to reason about the functionality of the components. Therefore, in order to develop an environment which helps programmers learn about the library's structure and contents, we have chosen CBR as the primary approach. We expect that this approach will produce, as a side-effect, a better performance in the retrieving of software components. Even in this case, the information provided by IR tools would be useful since it can be derived automatically.

6. References

Arango, G., Baxter, I. & Freeman, P., 1988. A Framework for Incremental Progress in the Application of Artificial Intelligence to Software Engineering, ACM Software Eng. Notes, vol. 13, no. 1, 46-50.

Biggerstaff, T.J. & Richter, C., 1987. Reusability Framework, Assessment, and Directions, IEEE Software, vol. 4, 2.

Buenaga, M., Fernandez, B., Vaquero, A. & Urech, A., 1992. An OnLine Intelligent Assistant for Unix, Tech. Report DIA 92/16. Department of Computer Science, Complutense University of Madrid.

Cervigón, C., Hernández, L., 1992. Manual de uso del Sistema de Ayuda para la Recuperación de Clases. Tech. Report DIA 92/11. Department of Computer Science, Complutense University of Madrid.

Chidamber, S.R. & Kemerer, C.F., 1991. "Towards a Metrics Suite for Object-Oriented Design". OOPSLA-91.

Curtis, B., 1989. "Cognitive issues in reusing software artifacts", Software Reusability, volume II, Applications and Experience, (Biggerstaff, T.J. & Perlis, A.J., eds.), ACM Press, Addison-Wesley Publishing Company.

Devanbu, P., Ballard, B.W., Brachman, R.J. & Selfridge, P.G., 1991. "LASSIE: A Knowledge-Based Software Information System", Automating Software Design (Lowry, M.R. & McCartney, R.D., eds.), AAAI Press / The MIT Press.

Digitalk, 1986. Smalltalk/V. Tutorial and Programming Handbook.

Dyer, M., 1983. In Depth Understanding. MIT Press.

Embley, D.W. & Woodfield, S.N., 1987. "A Knowledge Structure for Reusing Abstract Data Types", Proceedings of the Ninth International Conference on Software Engineering, ACM, 360-368.

Fernández-Valmayor, A. & Fernández Chamizo, C., 1991. "An Educational Application of Memory Organization Models", Advanced Research on Computers in Education (R. Lewis & S. Otsuki, eds.). Elsevier Science Publishers B.V., IFIP.

Fernández-Valmayor, A. & Fernández Chamizo, C., 1992. Educational and Research Utilization of a Dynamic Knowledge Base, Computers & Education, 18, No. 1-3, 51-61.

Frakes, W.B. & Nejmeh, B.A., 1987. "Software Reuse through Information Retrieval". Proceedings of the 20th Annual HICSS, Kona, HI.

Helm, R., Maarek, Y.S., 1991. "Integrating Information Retrieval and Domain Specific Approaches for Browsing and Retrieval in Object-Oriented Class Libraries". OOPSLA-91.

Henninger, S., 1991. "Retrieving Software Objects in an Example-Based Programming Environment". Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval.

Krueger, C.W., 1992. Software Reuse. ACM Computing Surveys, vol. 24, 2.

Maarek, Y.S. et al., 1991. An Information Retrieval Approach for Automatically Constructing Software Libraries. IEEE Transactions on Software Engineering, vol. 17, 8.

Martin, C., 1989. "Case-based ". Inside Case-Based Reasoning. Riesbeck, C.K. & Schank, R.C. (eds.) 1989. Hillsdale, N.J. Lawrence Erlbaum Associates.

Méndiz, I., Fernández-Chamizo, C. & Fernández-Valmayor, A., 1992. " Using Case Based Reasoning", 12 IFIP Congress, Poster Session: and Maintenance, Madrid, Sept. 1992.

Meyer, B., 1987. Reusability: The Case for Object-Oriented Design. IEEE Software, vol. 4, 2.

Prieto-Díaz, R., Freeman, P., 1987. Classifying Software for Reusability. IEEE Software, January.

Prieto-Díaz, R., 1991. Implementing Faceted Classification for Software Reuse. Communications of the ACM, vol. 34, nº 5.

Rich, C. & Waters, R.C., 1988. Automatic Programming: Myths and Prospects, Computer, August 1988, 40-51.

Riesbeck, C.K. & Schank, R.C., 1989. Inside Case-Based Reasoning. Hillsdale, N.J. Lawrence Erlbaum Associates.

Salton, G., 1986. Another look at Automatic Text-Retrieval Systems, Communications of the ACM, vol. 29.

Schank, R.C., 1972. Conceptual Dependency: A Theory of Natural Language Understanding, Cognitive Psychology, 552-631.

Schank, R.C., 1980. Language and Memory, Cognitive Science, 4, 243-284.

Steier, D., 1991. "Automating Algorithm Design within a General Architecture for Intelligence", Automating Software Design (Lowry, M.R. & McCartney, R.D., eds.), AAAI Press / The MIT Press.

Turbo Power Software, 1990. Object Professional User's Manual.

Wood, M. & Sommerville, I., 1988. An Information Retrieval System for Software Components, ACM SIGIR, vol. 22, 3,4.
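The class-selection rule described under "Retrieving method" (prefer the class with the most recognized methods, then break ties by the ratio of required functionality to the global functionality of the class) can be illustrated with a minimal Python sketch. This is not the prototype's code, which is written in BIM Prolog. The names `ClassEntry` and `select_class`, the method names, and the use of set intersection in place of the expectation-based parser are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ClassEntry:
    """A library class and the names of the methods it implements (hypothetical)."""
    name: str
    methods: frozenset


def select_class(required, candidates):
    """Pick the candidate with the most recognized methods; break ties by
    the ratio of recognized methods to the total methods of the class."""
    def score(entry):
        # Assumption: "recognized" is simulated by plain set intersection.
        recognized = len(required & entry.methods)
        ratio = Fraction(recognized, len(entry.methods)) if entry.methods else Fraction(0)
        return (recognized, ratio)
    return max(candidates, key=score)


# Illustrative data mirroring the example in the text: both classes offer the
# two required methods; StringDict has eleven more, StringSet has no others.
required = frozenset({"add", "includes"})
string_dict = ClassEntry("StringDict", required | frozenset(f"extra{i}" for i in range(11)))
string_set = ClassEntry("StringSet", required)

best = select_class(required, [string_dict, string_set])
print(best.name)
```

With this data both classes score two recognized methods, so the tie is decided by the ratio (2/13 for StringDict against 2/2 for StringSet), which matches the outcome given in the text. Comparing a tuple keeps the two criteria in strict priority order, so the ratio only matters when the recognized counts are equal.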
