
chime: Customizable Hyperlink Insertion and Maintenance Engine for Software Environments

P. Devanbu, Dept. of Computer Science, University of California, Davis, CA, USA; +1 530 752 7324; [email protected]
Y-F. Chen, E. Gansner, AT&T Labs–Research, 190, Park Drive, Florham Park, NJ 95030; +1 973 360 8653/8646; chen,[email protected]
H. Muller, J. Martin, Dept. of Computer Science, University of Victoria, Victoria, BC V8W 3P6; +1 250 721 7630; hausi,[email protected]

Abstract

Source code browsing is an important part of program comprehension. Browsers expose semantic and syntactic relationships (such as those between objects and their definitions) in GUI-accessible forms. These relationships are derived using tools that perform static analysis on the original software documents. Implementing such browsers is tricky. Program comprehension strategies vary, and it is necessary to provide the right browsing support. Analysis tools to derive the relevant cross-relationships are often difficult to build. Tools to browse distributed documents require extensive coding for the GUI, as well as for data communications. Therefore, there are powerful motivations for using existing static analysis tools in conjunction with WWW technology to implement browsers for distributed software projects. The chime framework provides a flexible, customizable platform for inserting HTML links into software documents using information generated by existing software analysis tools. Using the chime specification language and a simple, retargetable interface, it is possible to quickly incorporate a range of different link insertion tools for software documents into an existing, legacy software development environment. This enables tool builders to offer customized browsing support with a well-known GUI. This paper describes the chime architecture and our experience with several retargeting efforts of this system.

1 Introduction

The World Wide Web (WWW) is an accessible, powerful, and ubiquitous medium for the delivery of, and access to, widely distributed documents. Its low cost, flexibility, and ease of access have led to rapid advances in browser technology. HTTP clients such as web browsers support sophisticated interactive capabilities: hot lists, browsing history, selection forms, etc. In addition, they can handle different types of networking protocols. These clients are ubiquitous and are constantly being enhanced. There is thus a strong incentive to leverage this increasingly powerful browsing technology in other contexts; in particular, chime is aimed at exploiting web-based browsing in software engineering environments. Software development projects are becoming increasingly distributed, in response to personnel costs and market segmentation. Such projects will involve distributed software documents that have complex inter-relationships. Browsers that can handle such distributed documents, and expose their inter-relationships as traversable hyperlinks, will be helpful to developers in these projects. Web-based source code browsing has been discussed by other authors [15, 14, 23]; our specific goal is to facilitate the addition of web-based browsing into existing software development environments (SDEs).

The central focus of chime is the task of inserting HTML links into source code, using information stored in repositories associated with existing SDEs. Specifically, chime assumes that the online documents are stored in some sort of (possibly distributed) repository, and that syntactic and semantic inter-relationships between documents are derived by some analysis tools and stored in a repository in some format (not predetermined). A user of chime can then specify a set of links, the relationship of the links to the contents of this database, and the action to be taken when the links are activated. From this specification, chime generates a link insertion engine that reads documents, queries the database and interprets the results, inserts the appropriate HTML links, and outputs the resulting text. Another key goal of the chime project is to provide flexibility in link insertion. The variability that chime tries to accommodate includes the database implementation, the interpretation of database contents, the number of links, and the meaning of links.

The rest of this paper is organized as follows. In Section 2, we discuss the importance of browsing functions in software development environments, and the motivations of chime. Then, in Section 3, we discuss the details of how chime tries to achieve the goals outlined above. In Section 4, we describe the chime specification language. In Sections 5 and 6, we describe how chime is retargeted to different environments and our experience with several such retargetings; Section 7 evaluates this experience. In Section 8, we discuss relevant work on automatic link insertion, including related projects, and the relationship of that work to the goals of chime; this is followed by the conclusion.

2 HTML Browsing for Software Documents

A key functionality in software development environments is browsing. It has been reported [7] that programmers can spend 50% or more of their time trying to understand the system, especially in large projects. Good browsing support can be helpful during this process. Students of program understanding have described two strategies used by programmers to comprehend programs: top-down and bottom-up. Empirical studies [19, 17] indicate that programmers use both styles of exploration and comprehension. Thus, it would be desirable for a browser to support both kinds of exploration. For example, a C++ programmer might start by first examining the class structure of a system; she might then focus on a few specific methods and their calling structure; then she might scrutinize the precise use of class member data of a specific class within the member functions. This might in turn lead her to examine the type definitions of some class data members, which leads to further exploration of the class structure, etc. Different users' styles and different comprehension tasks require different links. This has led us to emphasize flexible link configuration as a key goal of the chime system.

A secondary goal of the chime system is retargetability. Large legacy projects have a lot of inertia. Part of this is due to personnel and training: it can be time-consuming, expensive and difficult to introduce an entirely new development environment into an ongoing project just because part of the development effort has been delegated to a geographically distant site. So it would be desirable to introduce tools to support distributed development into the project in a minimally disruptive manner.

In addition, because of the complexity of the configuration and build procedures [31] in large projects, it can be very expensive to bring new tools and processes into the environment, especially if these tools need to be run over the entire source base. If at all possible, it is advantageous to introduce new capabilities into the development environment in a minimalist, incremental fashion that avoids global analysis with new tools. A goal of the chime system is to introduce web-based source code browsing into an existing development environment without forcing changes to the configuration and build procedures. This has two advantages. First, existing processes are not disrupted. Second, the developers do not have to learn a new browsing tool: their current familiarity with WWW browsing tools can be leveraged.

A browsing tool [28, 4] includes a graphical user interface (GUI), and it accesses the document repository and a cross-referencing relational database (XRDB). The relational database has the relations corresponding to browsing steps (e.g., from a reference to a function to its definition). The GUI includes document viewers and other devices such as buttons, menus and scroll bars for manipulating the viewers. For example, the GNU Emacs editor [28] can use the GNU tags database to support browsing of C code: a simple mouse action can move the view from a function reference to a function definition. C++ development environments such as those available from Symantec [25] also support various browsing actions. With distributed environments, browsers need to access remote documents transparently in response to browsing requests and queries. It is important to note here that the information in the relational database used by the browser is typically built by analyzing source documents. For example, the databases used by the Ciao [4] browser are built by running analysis tools over C, C++, or Java programs. These analysis tools are quite difficult to build, particularly for complex languages with sophisticated preprocessors. Tools such as CIA [3], CIA++ [13], and Acacia [5], which analyze C and C++, take many person-years to construct. To the extent possible, it would be highly desirable to use the databases built by existing analysis tools.

Some key issues to be addressed in browser implementation are GUI design, extraction of browsing relations from source code, and distribution. The goal of chime is to leverage existing tools and technologies in all of these areas. HTML, the source language of the WWW, provides a powerful and simple GUI paradigm. Public domain (or low-cost) browsers such as Netscape Communicator support HTML and are well known to many users. These browsers come with many desirable features such as "click-to-browse", browsing trails, hot lists, etc. Distribution of documents over different machines that communicate using TCP/IP and associated protocols is handled transparently. Before HTML browsers can be used to browse source code, one must insert software browsing relations into the source code as HTML links. We now describe how chime addresses this link insertion task.

3 The chime Architecture

chime is a meta-tool; it generates tools that insert HTML links into raw text. The overall framework in which chime-generated tools function is shown in Figure 1.

Figure 1: Architecture of a link insertion device (components: Static Analysis Tools, Database, Document Repository (Distributed), Input Source, Link Insertion Engine, Output HTML)

Example and Background

An example fragment of C source code is shown in Figure 2 before and after link insertion. In this fragment, links are inserted into the occurrences of function names; these would be highlighted in browsers. The developer can then click on the names, and the corresponding gateway tool mach/cgi-bin/find would be executed. The effect of this would be to retrieve the source text constituting the definition of the relevant function, into which HTML links would be inserted; the resulting HTML would be returned to the browser. This can then be a basis for further browsing. It is important to note that links are inserted on demand. Storing "with links" and "without links" versions of all source would be undesirable in many cases, given the volume of software documents, the frequency of change, and the importance of presenting accurate information to maintainers. The advantages of dynamic link insertion have been explored at length elsewhere [2, 11].

Input source:
    if (transmutable(a,b)) j = interSolve(a, b); else j = -1;
Output HTML (the function names transmutable and interSolve become link anchors):
    if (transmutable(a,b)) j = interSolve(a, b); else j = -1;

Figure 2: Input Source and Output HTML

As discussed earlier, different sorts of links may be required for different types of program understanding and maintenance tasks. chime can be used to generate link insertion engines that insert different sorts of links. A single chime-based link insertion engine can expose different types of links depending on the current maintenance task. We now describe the elements of an HTML link that are relevant to chime, and the range of variability in links that can be handled.

Links, and their insertion

There are two important parameters of an HTML link: the position, i.e., where the link gets inserted (this is often called the link anchor), and the semantics, i.e., what happens when the user of an HTTP client activates that link by clicking on it.

In most software browsers, the position of the link is determined by an XRDB, which is derived by mechanical static analysis of the source code. For example, the positions in the source code (viz., line numbers and column numbers) of all function calls, variable references, etc., are stored in an XRDB. The link insertion engine uses this information to introduce links in the input software document. If the original XRDB can generate information on link positions in the order of their occurrence in the source code, the insertion can be performed very efficiently.

An HTML link can either be a simple URL that points directly to another HTML page, or an invocation of a CGI engine to generate HTML from a data source. Links can also contain additional information that is used by the CGI engine for its processing. For example, in Figure 2, the first link, http://mach/cgi-bin/find?name=transmutable&type=fundlink, contains two additional parameters, name and type, along with their values. We assume that HTML documents are created "on the fly" from source code; all chime links are of the CGI type.
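To make the link format concrete, here is a small C++ sketch (not chime's code) that turns a line of source into HTML by wrapping each anchored name in an <a> element whose href follows the Figure 2 pattern. The Anchor type and the functions makeUrl, attrEscape and insertLinks are hypothetical. The sketch assumes anchors arrive sorted by column, which is what allows the single left-to-right pass mentioned above, and it assumes the names themselves need no further escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical anchor record: where a name occurs on a line.
struct Anchor {
    std::size_t column;  // 0-based offset of the name within the line
    std::string name;    // identifier to turn into a link
};

// CGI URL for a link of the given type, following the Figure 2 pattern.
std::string makeUrl(const std::string& name, const std::string& linkType) {
    return "http://mach/cgi-bin/find?name=" + name + "&type=" + linkType;
}

// Escape '&' for use inside an HTML attribute value.
std::string attrEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '&') out += "&amp;";
        else out += c;
    }
    return out;
}

// Copy `line`, wrapping each anchored name in an <a> element.
// Anchors must be sorted by column and must not overlap.
std::string insertLinks(const std::string& line,
                        const std::vector<Anchor>& anchors,
                        const std::string& linkType) {
    std::string out;
    std::size_t pos = 0;  // first character of `line` not yet copied
    for (const Anchor& a : anchors) {
        assert(a.column >= pos && a.column + a.name.size() <= line.size());
        out.append(line, pos, a.column - pos);  // text before the anchor
        out += "<a href=\"" + attrEscape(makeUrl(a.name, linkType)) + "\">" +
               a.name + "</a>";
        pos = a.column + a.name.size();         // skip past the linked name
    }
    out.append(line, pos, std::string::npos);   // rest of the line
    return out;
}
```

For the Figure 2 line, anchors at columns 4 (transmutable) and 27 (interSolve) yield the two links. The & separating the CGI parameters is written as &amp; inside the href attribute; browsers decode it back to & before issuing the request.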
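For example, if the first link shown in Figure 2 is clicked, the definition of the function transmutable will be transformed to HTML with embedded links and then displayed. This is done using the find (or other) CGI executable, in several steps.

Step 1: find will query the database to find the URL of the target document that actually contains the definition of the function transmutable. We refer to this query as the target query below, since it finds the target document. Then the document itself will be retrieved.

Step 2: After retrieving the document, the database will be queried again to locate cross-referencing information about where function calls occur within the body of the definition of transmutable. This query is called the anchor query below. Each tuple in the anchor relation resulting from this query gives a position (or anchor) for a link to be inserted.

Step 3: Next, the links will be inserted to invoke a specific CGI such as find; additional information in the form of parameters (such as name and type in Figure 2) that could be used by find can also be placed in the link. This information can be drawn from the corresponding tuple in the anchor relation.

chime allows each of the steps above to be customized, so that both the position and the meaning of links can be tailored. It is also designed to be database-retargetable; the chime user can specify how to get the relevant information out of a range of different XRDBs. In this way, chime can accommodate the browsing needs of different types of tasks (via different links) and can provide a way to add WWW browsing into legacy projects (via XRDB retargetability).

In the rest of this section, we discuss the different customizable aspects of chime.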
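The three steps amount to a fixed control flow with pluggable parts. The C++ sketch below is illustrative only: LinkSteps, AnchorHit and processClick are hypothetical names, and each step is modeled as a std::function rather than as the compiled specification tables chime actually uses. It assumes that anchors on the same line are reported from left to right.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration; chime's real engine is table-driven.
using Params = std::map<std::string, std::string>;  // CGI parameters (name, type, ...)

struct AnchorHit {
    int line;           // 1-based line number in the target document
    std::string match;  // text on that line to turn into a link
};

// The customizable parts of a link, one callable per step.
struct LinkSteps {
    std::function<std::string(const Params&)> targetQuery;                 // Step 1: find target
    std::function<std::vector<std::string>(const std::string&)> fetch;      // Step 1: retrieve it
    std::function<std::vector<AnchorHit>(const std::string&)> anchorQuery;  // Step 2: anchors
    std::function<std::string(const AnchorHit&)> makeLink;                  // Step 3: one <a> element
};

// The fixed control flow shared by every link.
std::vector<std::string> processClick(const LinkSteps& s, const Params& p) {
    const std::string loc = s.targetQuery(p);           // Step 1: target query
    std::vector<std::string> doc = s.fetch(loc);         // Step 1: retrieve document
    std::map<int, std::size_t> resumeAt;                 // per-line search offset
    for (const AnchorHit& a : s.anchorQuery(loc)) {      // Step 2: anchor query
        assert(a.line >= 1 && a.line <= static_cast<int>(doc.size()));
        std::string& text = doc[a.line - 1];
        std::size_t& from = resumeAt[a.line];            // 0 on first use
        const std::size_t col = text.find(a.match, from);
        if (col == std::string::npos) continue;          // stale XRDB entry: skip it
        const std::string link = s.makeLink(a);          // Step 3: insert the link
        text.replace(col, a.match.size(), link);
        from = col + link.size();                        // never re-match inside a link
    }
    return doc;
}
```

Because only the four callables change from one link type to another, the loop plays the role of the part of chime that stays fixed, much as a parser generator's driver stays fixed while its tables vary.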

4 Chime Language

The chime language is used to specify the particulars of the three steps described in the previous section. We first describe the main concepts of the language, and then illustrate the details of the specification language with an example.

chime language concepts

The language deals with several basic concepts of browsing. First, there are documents, which can be of various document types: source code documents, test documents, configuration management documents, formal specification documents, etc. In the following examples, we deal only with source code documents. Each document type is associated with one or more views, which are collections of links. Link specifications are central to chime. They identify, first, for Step 1, the target query that is used to locate the target document. Link specifications also determine the type of the target document (function definition, class definition, etc.). They also specify the anchor query (which is also specified separately) for Step 2, and the details of what parameters are to be inserted into the links (in Step 3). A view represents a specific HTML rendering of a document type, exposing a certain set of links relevant for a particular task, such as design, implementation, maintenance, etc.

There may be several views applicable to a given document type. For example, in a function declaration, one view may expose links from classes to their declarations, another from a class to a list of its members or ancestors, etc. Whenever a document is viewed with chime, alternate views (if available) will be presented as a series of links at the top of the page. A default view is used for each document type; selecting an alternative view makes that the default view for that document type.

A chime application is generated from a set of specifications (see examples in Figures 3 and 5), each describing different elements of the three steps of link insertion described above. Each specification names an element, specifies what type it is (view, link, target query, anchor query, etc.), and gives values for several attributes of the element (e.g., for a link, the specification would say what its target query is, what its anchor query is, etc.).

chime is a domain-specific language [26]. As an aside, we note here that the chime language differs from languages for other domains like source code analysis [8] or web applications [16]. All chime-based applications have the same basic three-step algorithm for link insertion. So, unlike [8, 16], the chime language has no procedural constructs, but a series of table-like structures that describe the elements of each step of the (fixed) three-step application algorithm. In this sense, chime resembles parser-generator or lexer-generator systems, which use a fixed table-driven algorithm; the languages for specifying lexers and parsers have no procedural elements either.

Implementation Details

Now we consider in detail how a browsing action (i.e., a click on a link) is processed in chime, along with the applicable chime specifications and their interpretation. Each URL in the HTML generated by a chime insertion tool names the link that it is associated with. For example, in Figure 2, the name of the link is fundlink¹ in the first URL. This link is described in a chime specification, from which the invoked link insertion engine is generated. The engine recognizes fundlink as a link from a function to its definition; its specification is shown in Figure 3. The attribute resulttype gives the document type of the target document that will be displayed when this link is exercised: in this case, it has the value functioncode, which indicates that it is the body of a function definition. This document type is used to identify a default view that would be used to display the document.

¹ As we shall see below, the names of links, views, etc. are encoded into HTML links. Thus, in practice, to minimize resource usage, it is important to keep these names much shorter than the ones we use in our illustrative examples.

The specification also names a target query to be evaluated against the XRDB. All queries are separately specified in the chime specification.

    fundlink = {LINK
        focus = function
        resulttype = function-code
        anchor-query = funrefs
        target-query = fdlq
        mapparms = {(name,name2)}
    }

Figure 3: Example of a link specification

    fdlq = {TARGET QUERY
        string = "rigigrep Function type.db \
                  | fgrep \"$name$\""
        resultdocument = {ATTRIBUTE
            focus = function
            start = "refline"
            locator = "file1"
        }
    }

    funrefs = {ANCHOR QUERY
        string = "rigigrep \"$loc$\" call.db"
        position = {ATTRIBUTE
            linenumber = "refline"
            matchstring = "name2"
        }
    }

Figure 4: Example specifications of a target query and an anchor query

The target query named in the specification of fundlink is fdlq; this query specification is shown in Figure 4. This specification includes a query string into which certain parameters specified in the link are substituted before the query is executed. For example, the first link in Figure 2 includes a parameter name whose value names the function transmutable; this value would be substituted into the query string. The resulting query "rigigrep Function type.db | fgrep transmutable" would then be evaluated. The precise mechanics of evaluating the query are determined by the database interface, which is described in the following section. The target query specification also describes how to find the document locator in the result. The specification of fdlq says that the file locator is given by the attribute file1 of the resulting tuple, and the starting line number is given by refline. Let us assume here that the document location is file://mach2/src/transmutable.c. Using the document location, the document is then retrieved. The precise mechanics can vary with the particular SDE; the retargeting mechanism is described in the following section. This concludes Step 1 of the link insertion process.

Associated with each document type, there is a default view. In this case (Figure 3), the resulting document type is functioncode. The default view associated with this document type² is plaincodeview, shown in Figure 5. This default view specifies two links, vardlink and fundlink. We now turn back to fundlink (Figure 3) and examine how Step 2 of link insertion proceeds. The fundlink specification names an anchor query, funrefs, which is specified separately (Figure 4).

    plaincodeview = {VIEW
        fortypes =
        links =
    }

Figure 5: Example of a view specification

² Details of associating document types with default views are fairly straightforward, and are omitted here.

The funrefs query specifies a query string into which the document locator is substituted. For example, the locator file://mach2/src/transmutable.c is substituted into the query string given for funrefs, resulting in the actual query "rigigrep "file://mach2/src/transmutable.c" call.db". Evaluating this query results in a list of tuples, each of which corresponds to the location of a link (an anchor position) in the document.

The anchor query also specifies the attributes of the returned anchor tuples that contain location information. For example, the funrefs query corresponding to fundlink would return tuples containing line numbers (in the attribute refline) and strings to be matched in each line (in the attribute name2). For example, a tuple might have the value 200 for the attribute refline and the value "isFixed" for the attribute name2, indicating that an HTML link of the type fundlink should be inserted at the position where the string "isFixed" is matched on line 200 of the target document. Thus, this anchor tuple indicates the position of a call to the function isFixed occurring in the body of the target document containing the definition of the function transmutable.

Once the locations where the HTML links are to be inserted are known, we can begin Step 3, the actual link insertion. Each link always contains the URL of the link insertion engine; in addition, the link always includes a parameter-value pair specifying the type of the link (fundlink in this case). Additional parameter-value pairs, such as the name of the item that is linked (in our example, the function name isFixed), can also be inserted. The values are drawn from the anchor tuple that is associated with each specific link insertion. If such additional link parameters are required, they are specified in the link specification as a mapparms attribute. For example, the fundlink specification indicates that the value of the attribute name2 of the anchor tuple should be entered into the corresponding link as the value of the parameter name. Link insertion is the final step; the resulting HTML is returned as the result of the activation of the original link from the call to the function transmutable.
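Two mechanisms from this walkthrough can be shown in isolation: substituting link parameters such as $name$ and $loc$ into a query string (Figure 4), and using mapparms to copy anchor-tuple attributes into link parameters (Figure 3). This C++ sketch is a minimal illustration under assumed conventions; substitute and makeLinkUrl are hypothetical helpers, keys without a binding are simply copied through, and parameter values are assumed not to need URL encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Bindings = std::map<std::string, std::string>;

// Replace each $key$ in a query template with its binding. Keys with no
// binding are copied through unchanged.
std::string substitute(const std::string& tmpl, const Bindings& b) {
    std::string out;
    std::size_t i = 0;
    while (true) {
        const std::size_t open = tmpl.find('$', i);
        if (open == std::string::npos) break;
        const std::size_t close = tmpl.find('$', open + 1);
        if (close == std::string::npos) break;
        out.append(tmpl, i, open - i);                  // text before $key$
        const std::string key = tmpl.substr(open + 1, close - open - 1);
        const auto it = b.find(key);
        out += (it != b.end()) ? it->second : "$" + key + "$";
        i = close + 1;
    }
    out.append(tmpl, i, std::string::npos);             // trailing text
    return out;
}

// mapparms pairs are (link parameter, anchor-tuple attribute), as in
// fundlink's mapparms = {(name,name2)}. The link type is always appended.
using MapParms = std::vector<std::pair<std::string, std::string>>;

std::string makeLinkUrl(const std::string& engineUrl, const std::string& linkType,
                        const MapParms& mp, const Bindings& anchorTuple) {
    std::string url = engineUrl + "?";
    for (const auto& pm : mp) {
        const auto it = anchorTuple.find(pm.second);
        if (it != anchorTuple.end()) url += pm.first + "=" + it->second + "&";
    }
    return url + "type=" + linkType;
}
```

With the anchor tuple from the example (refline = 200, name2 = isFixed) and fundlink's mapparms, makeLinkUrl produces http://mach/cgi-bin/find?name=isFixed&type=fundlink.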

5 … environments

Two of chime's key objectives are compatibility and leverage. First, we want to introduce WWW browsing into legacy software development environments while minimizing disruption to ongoing projects; second, we would like to reuse the results of existing analysis tools whenever possible. However, this presents an implementation difficulty: the XRDB and the document repository may use a variety of different data models, schemas, and storage strategies. It is impossible to determine a priori the storage format and access methods that could be used, so chime makes some assumptions and tradeoffs. First, we assume a relational model for the XRDB, and a "bag of flat streams" model for the document repository. Within the relational model, the chime language and retargeting machinery can accommodate different schemas for relations, attributes, etc.

The retargeting machinery in chime is based on a couple of design patterns [12]: the adapter pattern and the factory method. The basic chime link insertion engine is implemented in C++. It defines and uses some virtual base classes (interfaces) corresponding to simple relational database access. To retarget chime to a specific XRDB, it is necessary to define an implementation class that inherits publicly (interface inheritance) from each of these base classes, and implements the virtual member functions. In the classic adapter pattern (see [12], page 139), these implementation classes would inherit privately (implementation inheritance) from the actual implementation classes of the particular XRDB.

A synopsis of the public base classes for the XRDB and document interfaces, along with the details of some of the classes, is shown in Figure 6. There are several classes: Document, DB (for database), Relation, Tuple, and Query. Some of the classes are shown in more detail: thus, class DB has methods for opening and closing databases, and for evaluating a query. These are virtual methods to be implemented by a specific database interface. The result of evaluating a query (via method DB::evaluate) is an instance of class Relation (details not shown), which provides ways of iterating through the relation a tuple at a time, and of applying selections. The class Tuple has a method for accessing values of individual attributes. The class attrValue provides virtual methods for accessing values that are not in first normal form. Details are not shown here, but the abstraction used here corresponds to the composite pattern (see [12], page 163), which provides a uniform way to access compositions of objects and individual objects. Likewise, class Document has member functions for processing the individual lines in the document. The Repository class (not shown) has methods for opening and closing documents.

    // String and Map are chime utility types (not shown).
    class DB;
    class Query;
    class Relation;
    class Tuple;
    class Document;
    class Repository;
    class attrValue;

    class DB {
    public:
        virtual void DBopen(Map &parms) {}
        virtual void DBclose() {}
        virtual Relation* evaluate(const Query &) = 0;
    };

    class Tuple {
    public:
        virtual attrValue getField(const String &attr) = 0;
    };

    class Document {
    public:
        virtual int getNextLine(String &) = 0;
        virtual String getId() = 0;
        virtual int linePos() = 0;
        virtual int docSize() = 0;
    };

    /* Factory Method examples */
    Repository &makeRep(Map &parms);
    Query &makeQuery(String &qstr);
    DB &makeDB(Map &parms);

Figure 6: Some of the interface classes in the chime database and document interface

There are also several factory methods (see [12], page 107) to create the top-level objects such as queries, the database and the document repository. These methods are to be implemented to return instances of the implementation classes that derive from the DB, Repository and Query interface classes discussed above. The use of the factory method allows the chime implementation to invoke the creational operations to create instances at the right times, while deferring the actual implementation to the specific retargeting.
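As a minimal sketch of what a retargeting involves, the following C++ code implements simplified, hypothetical versions of the DB, Relation and Tuple interfaces over an in-memory flat-file format with one "file line name" record per line, plus a makeDB factory method. The simplified interfaces use std::string and std::map in place of chime's String and Map types; the record layout is an assumption, and the attribute names are borrowed from the file1, refline and name2 examples above. For brevity the adapter wraps the data by composition (an object adapter) rather than by the private inheritance of the classic form described above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical versions of the Figure 6 interfaces.
struct Tuple {
    virtual ~Tuple() = default;
    virtual std::string getField(const std::string& attr) const = 0;
};

struct Relation {
    virtual ~Relation() = default;
    virtual const Tuple* next() = 0;  // returns nullptr when exhausted
};

struct DB {
    virtual ~DB() = default;
    virtual std::unique_ptr<Relation> evaluate(const std::string& query) = 0;
};

// ---- Retargeting to a hypothetical flat-file XRDB ----
// Each record is "file line name"; a query selects records by name.
class FlatTuple : public Tuple {
public:
    FlatTuple(std::string file, std::string line, std::string name)
        : file_(std::move(file)), line_(std::move(line)), name_(std::move(name)) {}
    std::string getField(const std::string& attr) const override {
        if (attr == "file1") return file_;    // document locator
        if (attr == "refline") return line_;  // line number
        if (attr == "name2") return name_;    // string to match
        return "";                            // unknown attribute
    }
private:
    std::string file_, line_, name_;
};

class FlatRelation : public Relation {
public:
    explicit FlatRelation(std::vector<FlatTuple> rows) : rows_(std::move(rows)) {}
    const Tuple* next() override {
        return pos_ < rows_.size() ? &rows_[pos_++] : nullptr;
    }
private:
    std::vector<FlatTuple> rows_;
    std::size_t pos_ = 0;
};

class FlatFileDB : public DB {
public:
    explicit FlatFileDB(std::string contents) : contents_(std::move(contents)) {}
    std::unique_ptr<Relation> evaluate(const std::string& name) override {
        std::vector<FlatTuple> rows;
        std::istringstream in(contents_);
        std::string file, line, n;
        while (in >> file >> line >> n)       // three fields per record
            if (n == name) rows.emplace_back(file, line, n);
        return std::unique_ptr<Relation>(new FlatRelation(std::move(rows)));
    }
private:
    std::string contents_;
};

// Factory method: the engine asks for "a DB" and never names FlatFileDB.
std::unique_ptr<DB> makeDB(const std::map<std::string, std::string>& parms) {
    return std::unique_ptr<DB>(new FlatFileDB(parms.at("contents")));
}
```

In this arrangement the link-insertion code would hold only the DB pointer returned by makeDB, so moving to a different XRDB means writing a new set of derived classes and a new factory without touching the engine.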

6 Experience

chime has been retargeted to four different environments. Our first experience was using a simple C++ cross-reference analyzer based on gen++ [9], a C++ analyzer generator. Our second experiment was with an old version of CIA [6], a C cross-referencing tool. Our third experiment was with Rigi [29], a reverse engineering environment. Finally, our most recent retargeting was to Acacia [5]. In this section, we describe our experience with these retargetings.

We first describe our experience with gen++ and CIA++. Analyzing C and C++ programs is notoriously difficult because of syntactic and semantic irregularities caused by macros, typedefs, etc. [8]. Given this, the popularity of the languages, and especially of the CIA and Acacia tools, implementing a WWW browsing capability for C and C++ using existing analysis tools was an attractive proposition. The gen++ retargeting was relatively easy: we built an analyzer that produced a simple cross-referencing relation for functions and global variables into a flat-file database. The document repository was simply the Unix™ file system. Our second effort, with CIA, exposed several problems: most notably, the relations were not in first normal form. The CIA family of tools [13, 5, 4] uses a highly compacted database format to store potentially quadratic sizes of cross-reference relations. This format sometimes stores a list of values (for example, a list of line numbers where a name is referenced) in an attribute. Our database interface and the chime language were extended to accommodate this sort of non-first-normal-form attribute value in the XRDB. We implemented several links, including links from functions, variables and macros to their definitions, and views that presented different combinations of these links.

Our third and fourth efforts followed a major restructuring of the chime database interface: originally, we had used templates, but we changed that to use design patterns. This was motivated by the desire to release the chime software in binary form rather than in source form. The third retargeting, to Rigi, was relatively easy and straightforward. The fourth retargeting, to the newest member of the CIA family, Acacia, was easy as well.

7 Evaluation

The chime system is a retargetable, domain-specific framework intended to support the introduction of customizable web-based browsing into software development environments. The arguments for customizability and retargetability have been presented earlier in this paper. As a result of our retargeting work with chime, we believe it is now a mature, reusable framework that is both retargetable and customizable. Our secondary goal in undertaking the retargeting trials described in the previous section was to test another hypothesis: that a retargetable framework such as chime is better than implementing an HTML link insertion engine from scratch for each software development environment. We now discuss our findings relating to this issue.

Our experience with the four retargeting efforts indicates that the bulk of the retargeting effort is implementing the database interface. With both the prior template-based interface and the current adapter-based interface, it takes around 200-250 lines of C++ code (LOC) to implement the interface, about 10-15 hours of coding for someone knowledgeable about the specific XRDB and document repository. Once this is complete, one can write specifications for several different links, and views that group these links in different ways. The database interface code can be leveraged across the different types of links and views, and even across different link insertion engines. The chime specifications are compiled into tables that are used by a core engine that inserts links. This engine is about 1300 LOC (the chime language compiler is not included in this count). The small size is due to heavy use of templates (both custom templates and the C++ Standard Template Library [21]). Roughly a fifth of this code is for interpreting the tables; the rest does the string matching and manipulation, HTML encoding/decoding, database interaction and document access (via the adapter interfaces), attribute/parameter mapping during link insertion, etc. This core link-insertion engine code is leveraged for every retargeting of chime.

One alternative would be to implement a link insertion engine for each software development environment. Such an engine (for reasons described earlier) should be customizable, and allow ready addition of new links and views. This indicates the use of a layered architectural style [27, 24], with the bottom layers implementing the database access and HTML encoding/decoding primitives, the middle layer providing primitives to support basic link insertion functionality, and the top layer implementing different links. Without such a structure, it would be difficult to provide different sets of links for different browsing needs. Building a robust, mature, reusable infrastructure of this type (with the right abstractions) is difficult, and nearly impossible to engineer correctly the first time. The core chime engine provides precisely such an infrastructure. We find that we can leverage about 1K LOC by writing the (roughly) 250 LOC of adapter interfaces to an existing XRDB/repository. The lines-of-code comparison, while providing a baseline, is not definitive. There are several other favourable (+) and unfavourable (-) factors, such as the effort to learn the chime language (-), the effort to design, develop and debug a reusable link-insertion framework similar to chime's from scratch (+), the difficulty of understanding and implementing the chime adapter interfaces (-), the convenience of using the chime language (+), etc. Our experience, however, indicates that the chime framework offers a viable alternative to "from scratch" implementation of WWW browsing in a legacy software environment. In addition, once the retargeting interface is implemented, it is fairly simple and quick to produce as many customized links and views as desired.

are stored in a separate link base and are dynamically inserted into documents at the time they are dispensed by an HTTP server. This system has sophisticated features for handling link insertion into “cooked” documents. For example, Microcosm can read the content of a Microsoft Word™ document, perform the appropriate text formatting, and then insert the links. This delayed, dynamic link insertion policy allows the browsing context to determine which links are actually inserted. Microcosm has special link definition features that allow links to be parametrized, so that a link can depend on its textual context in a semantic way rather than on a specific location in a file. This reduces the “tight binding” between content and links, allowing content to evolve somewhat independently of the linkage. Microcosm’s custom authoring facility has specialized features for creating links.

However, Microcosm uses a specialized database for storing the link information. While chime’s link insertion facilities are more limited, it can accept link information from much more varied sources; indeed, it is predicated on the assumption that this information is readily available for reuse from various existing sources. In the case of software documents, source code analyzers that generate such link information are hard to build and/or modify, so reusing existing ones is valuable. The key contribution of chime in this regard is the retargetable database interface, together with the chime specification language, which specifies the precise mapping of the link insertion information in a given database.

Constellation [30] is a broad conception of a distributed software development environment based on WWW technology. Typical HTTP clients, perhaps augmented with significant client-side GUI extensions, can be used to provide various functions for editing, browsing, debugging, etc. On the server side, enhanced services are offered for source code control, debugging, and other development-oriented activities. Integrated facilities for teleconferencing are also envisioned. chime could help leverage existing WWW and software engineering assets to implement the functionality envisioned in [30].

Hyper-G [11] has some similarities to Microcosm, but it also has many advanced features for handling multilingual documents, as well as links into and out of multimedia content such as video. Hyper-G is also concerned with maintaining the consistency of link information across distributed documents; sophisticated, distributed, probabilistic algorithms have been developed to distribute the state of the link information. chime has a specific focus on link insertion for software documents; however, its link insertion capabilities are more flexible. Using the specification language and the retargetable database interface, it is possible to insert links based on information stored in a wide range of databases derived using existing analysis tools.

Another noteworthy related system is Genera [18], which, like chime, is specification-driven. Using Genera, one can specify detailed ways of formatting sophisticated object-oriented database content; examples of Genera usage in the Human Genome Database are linked from [18]. Since much summary information about software objects may be available in cross-reference databases or software repositories, some of the features of Genera could be used in conjunction with chime for information display. We are exploring this possibility.

Complementary WWW tools

The WWW is a very fertile environment for innovation; numerous tools that supplement the basic web technology are available, and new ones emerge frequently. Browsing source code using the web can leverage these technologies: any type of tool that can work with dynamically generated web pages can be useful in this context. For example, tools such as htmldiff [10] could be adapted to find and view the precise differences in source files, including differences in inter-source-file relationships that are exposed as links. Alerting systems such as netmind [22] are available to monitor and report changes. Convenient graphical tools for visualizing and using browsing histories can also be helpful: tools such as [1] provide graphical visualization of browsing histories in a distributed, shared context that can be used by teams cooperating over an intranet.

A novel application of the web is in distributed code inspections. Empirical studies [23] suggest that distributed inspections based on the WWW can save costs and reduce intervals while preserving effectiveness. New technologies such as GroupWalk [20] allow a distributed group of browsers to follow a leader through a series of links. This is achieved by “slaving” a group of browsers to a master; browsing events are distributed via a notification system. This has a clear application to distributed inspections. In addition, it could be used by an experienced developer to lead a distributed group of novices through a large body of source code and explain its design and function.

A major project in the area of WWW applications to software development is the Hypercode [15] effort at Columbia University. Hypercode is an architecture for distributed collaborative software development that leverages the web infrastructure. Hypercode has an open architecture that accommodates new tools and information sources [14]. On the one hand, chime could fit in nicely as a link insertion tool within the Hypercode framework. On the other hand, the motivations differ. chime can introduce the incremental benefit of WWW browsing into a legacy software development environment while leaving the rest of the environment unchanged. Hypercode is a completely novel approach, based on the WWW-based integration of a distributed set of environments under a uniform WWW-based interface. The persistence of legacy environments offers a continuing and viable opportunity for incremental tools such as chime.

9 Conclusion, Limitations and Future Work

chime is a generator of HTML link-insertion engines. chime is customizable and retargetable. Different software engineering tasks require different strategies for exploring code; the ability to customize links allows browsing support to be tailored to the specific needs of different tasks. The retargetability of chime allows us to introduce web browsing into legacy software development environments with minimal effort and disruption. This has several benefits. First, the analysis tools and build procedures (which build the cross-referencing database) in the legacy environment can be leveraged. Second, programmers do not require retraining; most are familiar with the WWW infrastructure used in chime, and the rest of their environment is undisturbed. Finally, the constant stream of new WWW technologies for collaboration, notification, etc., can be leveraged in conjunction with chime in legacy environments.

chime has several limitations. On the language and link-insertion side, it is a “pure browser”: no facilities have yet been added to integrate it with other elements of the environment, such as editors and debuggers. We are examining this issue carefully; such an integration would introduce a major new paradigm into legacy environments, which was not our original intention. On the database interface side, chime is strongly biased towards a relational model. Other models could be accommodated by an appropriate implementation of the adapter interface; however, this will be harder for non-relational databases. chime-based link insertion engines currently maintain all context information (e.g., the current default view) in the links themselves. This wastes bandwidth; it would not be difficult to change the engines to keep this information in client-side persistent data (cookies) instead. Another problem, endemic to languages like C that have a preprocessor, is that semantic cross-referencing information can occasionally be masked by a macro call. In this case, difficulties may occur during link insertion. Some XRDBs may provide enough information to handle this situation, but chime currently skips over such difficulties and proceeds with link insertion. Finally, the current implementation of chime assumes a single location (URL) for the link insertion engine. This may not be well suited to situations where the XRDB and the document repository are distributed. We are working on approaches to address this problem.

For the immediate future, we are developing an HTML-based front end to the chime specification language that can hide some of the syntax from users and provide a friendlier user interface. Further on, we are interested in extending chime beyond software documents. The main difficulty there is inserting links into “cooked” documents based on information derived by the analysis of “raw” documents. We are exploring approaches based on the plug-in interfaces published for popular HTTP clients.

Acknowledgements: We would like to thank Tom Ball, Naser Barghouti, Alex Borgida, Henry Kautz, Dewayne Perry, and David Rosenblum for their helpful comments and suggestions.

REFERENCES

[1] E. Z. Ayers and J. T. Stasko. Using graphic history in browsing the World Wide Web. In Fourth International World Wide Web Conference, World Wide Web Journal, 1995. http://www.w3.org/pub/WWW/Journal/1/ayers.270/paper/270..
[2] Leslie Carr, David De Roure, Wendy Hall, and Gary Hill. The distributed link service: A tool for publishers, authors, and readers. In Proc. Fourth International World Wide Web Conference. O’Reilly Associates, 1995.
[3] Yih-Farn Chen. Reverse engineering. In Balachander Krishnamurthy, editor, Practical Reusable UNIX Software, chapter 6. John Wiley & Sons, 1995.
[4] Yih-Farn Chen, Glenn S. Fowler, Eleftherios Koutsofios, and Ryan S. Wallach. Ciao: A Graphical Navigator for Software and Document Repositories. In International Conference on Software Maintenance, 1995.
[5] Yih-Farn Chen, Emden Gansner, and Eleftherios Koutsofios. A C++ Data Model Supporting Reachability Analysis and Dead Code Detection. In Proc. Sixth European Software Engineering Conference and Fifth ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 414–431, 1997.
[6] Yih-Farn Chen, Michael Nishimoto, and C. V. Ramamoorthy. The C Information Abstraction System. IEEE Transactions on Software Engineering, 16(3):325–334, March 1990.
[7] T. A. Corbi. Program understanding: A challenge for the 1990’s. IBM Systems Journal, 28(2), 1989.
[8] P. Devanbu. Genoa: a language and front-end independent source code analyzer generator. In Proc. Fourteenth International Conference on Software Engineering, 1992.
[9] P. Devanbu. The gen++ page. http://seclab.cs.ucdavis.edu/~devanbu/genp, 1998.
[10] Fred Douglis, Thomas Ball, Yih-Farn Chen, and Eleftherios Koutsofios. The AT&T Difference Engine: Tracking and viewing changes on the web. World Wide Web, January 1998.
[11] Frank Kappe et al. The Hyper-G system. http://www.iicm.tu-graz.ac.at/chyperg, 1995.
[12] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison Wesley, 1994.
[13] Judith Grass and Y. F. Chen. The C++ Information Abstractor. In The Second USENIX C++ Conference, April 1990.
[14] Gail E. Kaiser and Stephen E. Dossick. Xanth: An architecture for effective utilization of distributed heterogeneous information resources. Technical Report CUCS-003-98, Columbia University Department of Computer Science, March 1998.
[15] Gail E. Kaiser, Stephen E. Dossick, Wenyu Jiang, and Jack Jingshuang Yang. An architecture for WWW-based hypercode environments. In Proc. 1997 International Conference on Software Engineering, 1997.
[16] D. A. Ladd and J. C. Ramming. Programming the web: An application-oriented language for hypermedia. In Proc. 4th Intl. World Wide Web Conference, June 1995.
[17] S. Letovsky. Cognitive processes in program comprehension. In Proc. Second Workshop on Empirical Studies of Programmers, Washington, DC, 1986. Ablex Publishers, Norwood, NJ.
[18] S. Letovsky. Genera: A specification driven web/database gateway tool. http://gdbdoc.gdb.org/letovsky/wgen.html, 1995.
[19] S. Letovsky, J. Pinto, R. Lampert, and E. Soloway. A cognitive analysis of a code inspection. In Proc. Second Workshop on Empirical Studies of Programmers, Washington, DC, 1986. Ablex Publishers, Norwood, NJ.
[20] W. S. Meeks, C. Brooks, and F. J. Hirsch. Staying in the loop: Multicast asynchronous notification for intranet webs. Technical report, The Open Group, 1997. http://www.osf.org/www/waiba/papers/aw3tc/notif.html.
[21] D. R. Musser and A. Saini. STL Tutorial and Reference Guide. Addison Wesley, 1996.
[22] Netmind, Inc. URL minder service. http://www.netmind.com.
[23] J. Perpich, D. E. Perry, A. Porter, L. G. Votta, and M. W. Wade. Anywhere, anytime code inspections: Using the web to remove inspection bottlenecks in large-scale software development. In 19th International Conference on Software Engineering, 1997.
[24] Dewayne E. Perry and Alexander L. Wolf. Foundations for the study of software architecture. ACM SIGSOFT Software Engineering Notes, October 1992.
[25] Symantec Personnel. http://www.symantec.com, 1995.
[26] J. Christopher Ramming. Proc. First Usenix Conference on Domain-Specific Languages. The Usenix Association, October 1997. (Edited).
[27] Mary Shaw and David Garlan. Software Architecture: Perspectives on an Emerging Discipline. Prentice-Hall, 1996.
[28] R. Stallman. GNU Emacs manual. http://www.cs.utah.edu/csinfo/texinfo/emacs19, 1994.
[29] Margaret-Anne Storey, Kenny Wong, and Hausi A. Muller. Rigi: A visualization environment for reverse engineering. In Proc. 1997 International Conference on Software Engineering, 1997.
[30] Nino Vidovic and Dado Vrsalovic. Constellation: A web-based design framework for developing network applications. In Proc. Fourth International World Wide Web Conference. O’Reilly Associates, 1995.
[31] S. Zeigler. Comparing development costs of C and Ada. http://sw-eng-falls-church.va.us/AdaIC/docs/reports/cada/cada art.html.