Experience with ANSI C Markup Language for a Cross-Referencer
Total Page:16
File Type:pdf, Size:1020Kb
Experience with ANSI C Markup Language for a cross-referencer Hayato Kawashima and Katsuhiko Gondow Department of Information Science, Japan Advanced Institute of Science and Technology (JAIST) 1-1 Asahidai Tatsunokuchi Nomi Ishikawa, 923-1292, JAPAN {hayato-k, gondow}@jaist.ac.jp Abstract duced ACML (ANSI C Markup Language) [6]. ACML is a kind of domain-specific language (DSL) to describe a com- The purpose of this paper is twofold: (1) to examine mon data format specific to ANSI C in the domain of CASE the properties of our ANSI C Markup Language (ACML) tools. That is, ACML is defined as a set of XML [1] tags as a domain-specific language (DSL); and (2) to show that and attributes, which describe ANSI C program’s syntax ACML is useful as a DSL by implementing an ANSI C cross- trees, types, symbol tables, and relationships among lan- referencer using ACML. guage constructs. We have introduced ACML as a DSL for developing A DSL is a programming language dedicated to a par- CASE tools. ACML is defined as a set of XML tags and ticular domain or problem. For example, Lex and Yacc are attributes, and describes ANSI C program’s syntax trees, used for lexers and parsers; VHDL for electronic hardware types, symbol tables, and so on. That is, ACML is the DSL description. Furthermore, even markup languages can also which plays the role of intermediate representation among be viewed as DSLs, because they can be good languages to CASE tools. ACML-tagged documents are automatically describe some data’s structures, relationships and semantics generated from ANSI C programs, and then used as input specific to some particular domain, even though they are of CASE tools. likely to lack the language features like variables, control ACML is self-descriptive and has CASE-tool specific in- statements, and/or procedures. In fact, HTML and LATEX formation, which results in high productivity of CASE tools. are widely recognized as DSLs [18][19][23]. To show this, we experimentally implemented an ANSI C The first purpose of this paper is to examine the proper- cross-referencer based on ACML. In the implementation, we ties of ACML as a DSL. ACML is a markup language, and had a good result; it took only 0.5 man-month. is considered as a DSL for the same reason described above. ACML-tagged documents are automatically generated from ANSI C programs, and then used as input of CASE tools. 1. Introduction ACML provides to programmers a way to easily obtain information about ANSI C programs in a self-descriptive CASE (Computer-Aided Software Engineering) tools manner, which is useful to develop CASE tools. To show can be quite useful to reduce the cost of software devel- this, we already implemented Weiser’s slicer as a prelimi- opment, but the cost of developing CASE tools itself is very nary experiment in [6]. But the slicer only processes limited high. One reason is because internal data in CASE tools is ANSI C programs, and utilizes only the syntax structure in usually not available to other tools, although such data can ACML. Thus further experiment is required. be shared among CASE tools. As a result, most CASE tools The second purpose of this paper is to show that ACML have their own individual parsers and analyzers, resulting in is useful as a DSL by implementing an ANSI C cross- low maintainability. It is a key issue to find or develop some referencer using ACML. A cross-referencer is a CASE tool technique to facilitate data sharing, or exchanging among that lists all identifiers including variables, functions, la- CASE tools in an elegant and cost-effective manner. bels and type names in a given program, and allows pro- In order to cut the cost of developing CASE tools for grammers to relate the use of an identifier to its defini- ANSI C programming language [17] (e.g., program slicers, tion, and vice versa. An elaborate cross-referencer consid- cross-referencers and test-case generators), we have intro- erably helps programmers to understand and maintain their 1 source programs. However, the existing implementations ally costs. Such data should be shared among CASE tools. like Cxref [14] and GNU GLOBAL [12] are still not good Thus, it is a key issue to find or develop some technology to enough, since their analysis is likely to be imprecise as a re- facilitate data sharing, or exchanging among CASE tools in sult of their focusing on execution speed. Furthermore, all an elegant and cost-effective manner. To solve the issue, the information in ACML is required to implement our cross- idea of using common data formats have already been intro- referencer. The cross-referencer processes any ANSI C pro- duced in CDIF[5] and PCTE[3]. But, unfortunately, these grams. Thus, in this experiment, the effectiveness of ACML technologies have not come into wide use in CASE tools is examined more comprehensively. In the experiment, we yet. Thus, the quest for the ideal format is still continuing. had a good result; it took only 0.5 man-month. The rest of this paper is organized as follows. Section 2 2.2. Data integration gives a short overview of data integration for CASE tools and the existing standards: CDIF and PCTE. Section 3 dis- The purpose to integrate CASE tools is to make CASE cusses the properties of our ACML as a DSL. Section 4 de- tools more open, interoperable, and reusable. In [4], CASE scribes our XCI (Experimental ANSI C Interpreter), which tool integration is categorized into data integration, con- also works as a converter from ANSI C source code to trol integration, presentation integration, process integration ACML-tagged documents. In Section 5, our preliminary and framework integration. experiment of implementing an ANSI C cross-referencer is Data integration is our major interest, since our goal is to given. In Section 6, we have discussion through the imple- cut the cost of developing CASE tools by introducing com- mentation of our cross-referencer. Section 7 describes re- mon data formats. It is not easy to find a good solution for lated works. Finally, Section 8 gives conclusion and future data integration. We can understand this fact by seeing that works. even simple data integration like integrating newline char- acters in text files (CRLF, CR, and LF) could be problem- 2. Data integration in CASE tools atic. For example, JDK1.2 has several bug fixes related to CRLF. Things are even worse for CASE tools, because: 2.1. Problem • There are various software products like specification diagrams, design documents, programs, memoranda, Generally the cost of developing CASE tools is very manuals, test data, and so on. high. One reason is because internal data in CASE tools is usually not available to other tools, although such data can • For a product, there are many formats for the product. be shared among CASE tools. In other words, such data is For example, there are many programming languages, closed, or even if it is open, it is not very useful. For ex- which have their own syntax (i.e., formats). ample, we cannot easily use internal data of GCC [8], such • as symbol data, syntax data, and type data, for other CASE For a format of some product, there are many tools tools (Figure 1). for it. For example, there are many CASE tools for ANSI C programs like compilers, debuggers, program slicers, and cross-referencers. GCC • There are complex relationships among products men- Internal Data tioned above. Source Object files files Therefore, it is not quite trivial to determine the proper level of abstraction or granularity of the common represen- tation for various kinds of products. Another problem is pointed out in [22]. Data integration assumes that the connected CASE tools share information. Tool A Tool B Tool C This typically involves the development of a database or repository to store the information shared among the tools. This can be a disadvantage of data integration, since such Figure 1. GCC Internal Data a repository is likely to be a quite large and complicated database system. Also, such environments tend to be closed rather than open, both because any new tool is needed to in- As a result, most CASE tools have their own individ- tegrate with the database system and because the database ual parsers and analyzers, resulting in low maintainabil- itself tends to be designed to operate with a particular lan- ity. From this point of view, CASE tools development re- guage. 2 Our idea is that we can avoid these problems for data 3.1. DSL: Domain Specific Language integration by supporting only one programming language (i.e., ANSI C) and using XML, which relieves us from de- A domain-specific language (DSL) is a programming veloping a complicated repository. When it comes to using language dedicated to a particular domain or problem. Usu- XML, it is appropriate to restrict a target language to only ally, DSLs are smaller and easier than general-purpose lan- one, since it is too complicated and ambitious to represent guages (GPLs) in exchange for their limited domain. For some programming languages on only a DTD. example, Lex and Yacc are used for lexers and parsers; SQL for handling relational databases; VHDL for electronic 2.3. The existing data integration technologies hardware description [18]. Furthermore, even markup languages can also be viewed as DSLs, because they can be good languages to describe In this section, we briefly outline CDIF and PCTE from data structures, relationships and semantics specific to some the existing data integration technologies.