Brunner, Tibor and Porkolab, Zoltan Advanced Code Comprehension

Advanced Code Comprehension using Version Control Information

Tibor Brunner; Zoltán Porkoláb

Abstract: Version control systems were originally designed to handle the development process of source code and to synchronize teamwork by resolving conflicts. While this remains their primary purpose, they have proved useful for other purposes as well, including aiding code comprehension. An advanced code comprehension process requires utilizing the full knowledge portfolio of the software system. In this paper we investigate how the version control information of a project can be utilized to extend our understanding of large legacy systems under examination. We show that some of the hidden structural connections between the elements of the program can be revealed most easily by the development history of the system. An industrial-level implementation of the method using git version control information has been developed as an open source extension of the CodeCompass software comprehension framework.

Index Terms: code comprehension, version control, git, software technology

This work was supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.3-VEKOP-16-2017-00002).

T. Brunner (contact person) is a PhD student at the Faculty of Programming Languages and Compilers, Eötvös Loránd University, Hungary (e-mail: [email protected]).

Z. Porkoláb is an Associate Professor at the Faculty of Programming Languages and Compilers, Eötvös Loránd University, Hungary (e-mail: [email protected]).

1. INTRODUCTION

It is a well-known fact that the largest cost factor of a software product over its whole lifetime is the maintenance cost. One of the reasons is that, prior to any maintenance activity (new feature development, bug fixing, etc.), programmers first have to locate the place where the change applies, then have to understand the actual code, and finally have to explore the connections to other parts of the software to decide how to interact in order to avoid a regression. All these activities require an adequate understanding of the code in question and of its environment. Most of these activities are currently impossible to automate, therefore developers have to spend their expensive time carrying them out.

It is therefore no surprise that code comprehension is a key factor of modern software development, exhaustively researched by both industry and academia. Various scientific and industrial papers have been published on the topic in conferences, e.g. in the series of the International Conference of Program Comprehension, and in the Intellectual outputs No. O1 and O2 of the Erasmus+ Key Action 2 (Strategic partnership for higher education) project No.2017-1-SK01-KA203-035402: “Focusing Education on Composability, Comprehensibility and Correctness of Working Software” [16], [31] among others.

Most of the comprehension approaches are based on the source code. That is a logical approach, as the actual software might have already diverged from the original specification and the documentation may also be outdated. Therefore, typical comprehension tools analyze the source code, support fast navigation and feature location, and reveal the internal structure of the software. However, not all the internal connections within the system can be detected by analysing the source. Virtual function calls on polymorphic objects, pointers, references and closures are among the program constructs where static analysis has limitations.

Code comprehension need not be restricted to existing code bases. Important architectural information can be gained from the build system, such as relations between libraries, binaries and source files [32], [33]. Even more interesting structural connections can be revealed from the history of the project development. We can identify which files were added or changed together, how these changes are related to certain commit messages, and which lines were added, removed or changed frequently.

CodeCompass is an open source code comprehension framework [29] developed by Ericsson and Eötvös Loránd University, Budapest, to help the code comprehension process of large legacy systems. The tool is based on the LLVM/Clang compiler infrastructure [5], [14], and has been designed to be extremely scalable, seamlessly working with many million lines of code. Fast search options help locate the requested feature by text search. Once the feature has been located, precise information on language elements for variables, inheritance and aggregation relationships of types, and call points of functions is provided by the LLVM/Clang infrastructure. Easy navigation possibilities and a wide range of visualizations extend far beyond the usual class and function call diagrams and help the user towards a more complete comprehension. To make the comprehension more extensive, CodeCompass utilizes a full portfolio of available information, including build commands and, if available, version control information; git commit and branching history and the blame view are also visualized.

In this paper we investigate the role of version control information for code comprehension purposes. We overview the main categories of the existing comprehension software, using specific tools as examples, in Section 2. In that section we also review current research directions regarding version control systems. Section 3 describes the architecture of CodeCompass in the context of how we collect and use version control information. In Section 4 we present why it is important to include version control information in a code comprehension tool. In Section 5 we show the CodeCompass support for version control information to reveal hidden connections between otherwise unrelated code segments. Section 6 discusses the most common use cases where version control could be used for code comprehension. Our paper concludes in Section 7.

2. RELATED WORK

Code comprehension has recently become a hot research topic, with dedicated user communities and both proprietary and open source tools.

2.1. Code Comprehension Tools

On the software market, there are several tools which aim at some kind of source code comprehension. Some of them use static analysis, others also examine the dynamic behavior of the parsed program. These tools can be divided into different archetypes based on their architectures and their main principles. On the one hand, there are tools with a client-server architecture. Generally, these tools parse the project and store all necessary information in a database. Clients (usually web-based) are served from the database. These tools can be integrated into the workflow as nightly CI runs. This way the developers can always browse and analyze the whole, large, legacy codebase. On the other hand, there are client-heavy applications where a smaller part of the code base is parsed. This is the use case for most of the IDE-based editors, where the frequent modification of the source requires a quick update of the database with the analysis results. In this section we present the most popular tools used in industrial environments from each category.

Woboq Code Browser [6] is a web-based code browser for the C and C++ languages. This tool has extensive features which aim at fast browsing of a software project. The user can quickly find files and named entities using a search field which provides code completion for easy usability. [...] files are generated during a parsing process. The advantage of this approach is that the web client will be fast, since no “on the fly” computation is needed on the server side while browsing. Also, hovering the mouse over a specific function, class, variable, macro, etc. can show the properties of that element. For example, in the case of functions one can see the signature, the place of definition, and the places of usage. For classes, one can check the size of their objects, the class layout, the offsets of their members and the inheritance diagram. For variables, one can inspect their type and the locations where they are written or read.

OpenGrok [11] is a fast source code search and cross reference engine. As opposed to Woboq, this tool doesn’t perform deep language analysis, therefore it is not able to provide semantic information about the particular entities. In addition to text search, it is possible to find symbols or definitions separately. The lack of semantic analysis allows Ctags to support several (41) programming languages. Another advantage of this approach is that it is possible to update the index database incrementally. OpenGrok also offers the opportunity to gather information from version control systems like Mercurial, SVN, Git, etc. It has the ability to search not only in the content of source files but in their history as well. Since most of these version control systems (VCS) provide search functionalities in the project history (including commit messages and source files), OpenGrok can forward these queries to the given VCS. However, there are no extra visualizations to display the “blame view”, with which the developer could understand what other relevant changes happened in other files in the same commit. The branches of the history are invisible too. CodeCompass intends to support these use cases.

Understand [13] is not only a code browsing tool, but also a complete IDE. Its great advantage is that the source code can be edited and the changes in the analysis can be seen immediately. Besides the code browsing functions already mentioned for the previous tools, Understand provides a lot of metrics and reports. Some of these are the lines of code (total/average/maximum, globally or per class), the number of coupled/base/derived classes, lack of cohesion [19], McCabe complexity [24] and many others. Treemap is a common representation method for all metrics. It is a nested rectangular view where nesting represents the hierarchy of elements, and the color and size dimensions represent the metric chosen by the user. For large code bases, the inspection of the architecture is necessary. Visual representation is one of the most helpful ways of displaying such [...]
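
The treemap view described for Understand can be illustrated with a small data-structure sketch. The snippet below is only an illustration under stated assumptions, not how Understand or CodeCompass implement it: the function name `build_treemap`, the node layout and the line counts are all hypothetical. It nests a flat mapping from file paths to a metric into a directory tree, where each node's size is the sum of the metric over everything beneath it, which is the quantity a treemap would draw as rectangle area.

```python
def build_treemap(metric_by_path):
    """Nest a flat {path: metric} mapping into a tree.

    Every node stores the sum of the metric over all files below it,
    so a renderer could use "size" as the rectangle area.
    The node layout is a hypothetical choice made for this sketch.
    """
    root = {"name": "", "size": 0, "children": {}}
    for path, value in metric_by_path.items():
        node = root
        node["size"] += value
        for part in path.split("/"):
            # Create the child node on first visit, then descend into it.
            node = node["children"].setdefault(
                part, {"name": part, "size": 0, "children": {}}
            )
            node["size"] += value
    return root


# Hypothetical lines-of-code values, for illustration only.
loc = {
    "src/parser/lexer.cpp": 420,
    "src/parser/ast.cpp": 310,
    "src/web/server.cpp": 150,
}
tree = build_treemap(loc)
parser = tree["children"]["src"]["children"]["parser"]
print(tree["size"], parser["size"])
```

A second metric, for example a complexity value, could be stored next to `size` in each node and mapped to color in the same way.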
