Advanced Code Comprehension Using Version Control Information

Tibor Brunner; Zoltán Porkoláb

Abstract: Version control systems were originally designed to handle the development process of the source code and to synchronize team work by resolving conflicts. While this purpose remains the primary target, they have proved to be useful for other purposes, including supporting code comprehension. An advanced code comprehension process requires utilization of the full knowledge portfolio of the software system. In this paper we investigate how the version control information of a project can be utilized for extending our apprehension of large legacy systems, providing a better understanding of the software under examination. We show that some of the hidden structural connections between the elements of the program can be revealed most easily by the development history of the system. An industrial level implementation of the method using version control information has been implemented as an open source extension of the CodeCompass software comprehension framework.

Index Terms: code comprehension, version control, git, software technology

This work was supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.3-VEKOP-16-2017-00002).
T. Brunner (contact person) is a PhD student at the Faculty of Programming Languages and Compilers, Eötvös Loránd University, Hungary (e-mail: [email protected]).
Z. Porkoláb is an Associate Professor at the Faculty of Programming Languages and Compilers, Eötvös Loránd University, Hungary (e-mail: [email protected]).

1. INTRODUCTION

It is a well-known fact that the largest cost factor of a software product over its whole lifetime is the maintenance cost. One of the reasons is that, prior to any maintenance activity (new feature development, bug fixing, etc.), programmers first have to locate the place where the change applies, then have to understand the actual code, and finally have to explore the connections to other parts of the software to decide how to interact in order to avoid a regression. All these activities require an adequate understanding of the code in question and of its environment. Most of these activities are currently impossible to automate, therefore the developers have to spend their expensive time carrying out these actions.

Therefore, it is not a surprise that code comprehension is a key factor of modern software development, exhaustively researched by both industry and academia. Various scientific and industrial papers have been published on the topic in conferences, e.g. in the series of the International Conference of Program Comprehension, and in the Intellectual outputs No. O1 and O2 of the Erasmus+ Key Action 2 (Strategic partnership for higher education) project No.2017-1-SK01-KA203-035402: “Focusing Education on Composability, Comprehensibility and Correctness of Working Software” [16], [31] among others.

Most of the comprehension approaches are based on the source code. That is a logical approach, as the actual software might have already diverged from the original specification and the documentation may also be outdated. Therefore, typical comprehension tools analyze the source code, support fast navigation and feature location, and reveal the internal structure of the software. However, not all the internal connections within the system can be detected by analysing the source. Virtual function calls on polymorphic objects, pointers, references and closures are among the program constructs where static analysis has limitations.

Code comprehension need not be restricted to existing code bases. Important architectural information can be gained from the build system, like relations between libraries, binaries and source files [32], [33]. Even more interesting structural connections can be revealed from the history of the project development. We can identify which files were added or changed together, how these changes are related to certain commit messages and which lines were added/removed/changed frequently.

CodeCompass is an open source code comprehension framework [29] developed by Ericsson and Eötvös Loránd University, Budapest, to help the code comprehension process of large legacy systems. The tool is based on the LLVM/Clang compiler infrastructure [5], [14], and has been designed to be extremely scalable, seamlessly working with many million lines of code. Fast search options help locating the requested feature by text search. Once the feature has been located, precise information on language elements for variables, inheritance and aggregation relationships of types, and call points of functions is provided by the LLVM/Clang infrastructure. Easy navigation possibilities and a wide range of visualizations extend far beyond the usual class and function call diagrams and help the user in more complete comprehension. To make the comprehension more extensive, CodeCompass utilizes a full portfolio of available information, including build commands and, if available, version control information; git commit and branching history and the blame view are also visualized.

In this paper we investigate the role of the version control information for code comprehension purposes. We overview the main categories of the existing comprehension software using specific tools as examples in Section 2. In that section we also review current research directions regarding version control systems. Section 3 describes the architecture of CodeCompass in the context of how we collect and use version control information. In Section 4 we present why it is important to include version control information in a code comprehension tool. In Section 5 we show the CodeCompass support for version control information to reveal hidden connections between otherwise unrelated code segments. Section 6 discusses the most common use cases where version control could be used for code comprehension. Our paper concludes in Section 7.

2. RELATED WORK

Code comprehension became a hot research topic recently, with dedicated user communities and both proprietary and open source tools.

2.1. Code Comprehension Tools

On the software market, there are several tools which aim at some kind of source code comprehension. Some of them use static analysis, others also examine the dynamic behavior of the parsed program. These tools can be divided into different archetypes based on their architectures and their main principles. On the one hand there are tools having a server-client architecture. Generally, these tools parse the project and store all necessary information in a database. Clients (usually web-based) are served from the database. These tools can be integrated into the workflow as nightly CI runs. This way the developers can always browse and analyze the whole, large, legacy codebase. On the other hand, there are client-heavy applications where a smaller part of the code base is parsed. This is the use case for most of the IDE based editors, where the frequent modification of the source requires a quick update of the database with the analysis results. In this section we present the most popular tools used in industrial environments from each category.

Woboq Code Browser [6] is a web-based code browser for the C and C++ languages. This tool has extensive features which aim for fast browsing of a software project. The user can quickly find the files and named entities by a search field which provides code completion for easy usability. The navigation in the code base is enabled through a web page consisting of static HTML files. These files are generated during a parsing process. The advantage of this approach is that the web client will be fast since no “on the fly” computation is needed on the server side while browsing. Also, hovering the mouse over a specific function, class, variable, macro, etc. can show the properties of that element. For example, in the case of a function one can see its signature, the place of its definition, and the places of its usages. For classes, one can check the size of their objects, the class layout, the offsets of the members and the inheritance diagram. For variables, one can inspect their type and the locations where they are written or read.

OpenGrok [11] is a fast source code search and cross reference engine. As opposed to Woboq, this tool doesn’t perform deep language analysis, therefore it is not able to provide semantic information about the particular entities. In addition to text search it is possible to find symbols or definitions separately. The lack of semantic analysis allows Ctags to support several (41) programming languages. Also, an advantage of this approach is that it is possible to update the index database incrementally. OpenGrok can also gather information from version control systems like SVN, Git, etc. It has the ability to search not only in the content of source files but in their history as well. Since most of these version control systems (VCS) provide search functionalities in the project history (including commit messages and source files), OpenGrok can forward these queries to the given VCS. However, there are no extra visualizations to display the “blame view” so that the developer could understand what other relevant changes happened in other files in the same commit. The branches of the history are invisible too. CodeCompass intends to support these use cases.

Understand [13] is not only a code browsing tool, but also a complete IDE. Its great advantage is that the source code can be edited and the changes of the analysis can be seen immediately. Besides the code browsing functions already mentioned for previous tools, Understand provides a lot of metrics and reports. Some of these are the lines of code (total/average/maximum globally or per class), number of coupled/base/derived classes, lack of cohesion [19], McCabe complexity [24] and many others. Treemap is a common representation method for all metrics. It is a nested rectangular view where nesting represents the hierarchy of elements, and the color and size dimensions represent the metric chosen by the user. For large code bases, the inspection of the architecture is necessary. Visual representation is one of the most helpful ways of displaying such structures. Understand can show dependency diagrams based on various relations such as function call hierarchy, class inheritance, file dependency,
file inclusion/import. Users can also create their custom diagram type via the API provided by the tool.

CodeSurfer [8] is similar to Understand in the sense that it is also a thick client, static analysis application. Its target is understanding C/C++ or x86 machine code projects. CodeSurfer accomplishes deep language analysis which provides detailed information about the software behavior. For example, it implements pointer analysis to check which pointers may point to a given variable, lists the statements which depend on a selected statement by impact analysis, and uses dataflow analysis to pinpoint where a variable was assigned its value, etc.

Development tools. The aforementioned tools are mainly designed for code comprehension. Another application area of static analysis is writing the code itself. This is a very different way of working in many aspects, which requires a slightly different tool set. Maybe the most widespread IDEs are NetBeans [2] and Eclipse [9], primarily for Java projects, and QtCreator [12], mainly for C++ projects. The recent open source tools tend to be pluginable so their functions can easily be extended according to special needs and domain specific tasks. The greatest benefit of these tools is the ability of incremental parsing, which means the real-time re-analysis of small deviations in the source code. The Visual Studio [15] IDE has a rich interface for code comprehension features, like go to definition and find all references among others.

2.2. Version Control Information Usage

Version control information is used in various software research areas. Most frequently, the center of this research is the connection between commit actions and software quality, and the cost of maintenance. Naturally, this is used for predicting the distribution of software bugs. In [26], the authors developed a regression model that accurately predicts the likelihood of post-release defects for new entities. Similarly, in his PhD thesis [17] the author describes the connection between code maintenance activities, as reflected by version control information, and the deterioration of the code quality. In a related paper [18] the authors show that a connection between version control operations and maintainability really exists, in spite of the fact that the data is coming from different sources.

Apart from software quality, other studies target the development team. The authors of [22] utilize version control information for mining and visualizing networks of software developers. They detect similarities among developers based on common file changes, and construct the network of collaborating developers. The authors show that this approach performs well in revealing the structure of development teams and improving the modularity in visualizations of developer networks.

Unfortunately, information retrieved from version control systems has its limitations. As an example, code quality or developer efficiency cannot be predicted from it according to the research described in [25].

A frequent problem with information mining from version control systems is that they store only atomic information. In [23], the authors suggest a set of heuristics for grouping change-sets of files that frequently change together. The results show that the approach is able to find sequences of changed files.

Version control information can be used not only for extracting data from the source code for code comprehension purposes, but it is also a frequent research target for analyzing comments (either structured or natural language text) in order to get additional information about the system [30].

As we have seen, the information retrieved from version control systems is used for various purposes other than the original intention of source control. However, we are not aware of any earlier published research that targets the code comprehension problem using version control information.

3. THE CODECOMPASS ARCHITECTURE

CodeCompass provides a read-only, searchable and navigable snapshot of the source code, rendered in both textual and graphical formats. CodeCompass is built with a traditional server-client architecture as depicted in Figure 1. The server application provides a Thrift [3] interface to clients over HTTP transport. The primary client that comes pre-packaged with the tool is a web browser based single-page application written in HTML and JavaScript.

Since the interface is specified in the Thrift interface definition language, additional client applications (such as a command line client or an IDE plugin) can be easily written in more than 15 other languages supported by Thrift (including C/C++, Java, Python etc.). An experimental Eclipse plugin is already implemented.

A parsed snapshot of the source code is called a workspace. A workspace is physically stored as a relational database instance and additional files created during the parsing process. The parsing process consists of running different parser plugins on the source code. The most important parser plugins are:

A search parser iterates recursively over all files in the source folder and uses Apache Lucene [1] to collect all words from the source code. These words are stored in a search index, with their exact location (file and position).

The C/C++ parser iterates over a JSON compilation database containing build actions, using
the LLVM/Clang parser [5] and stores the position and type information of specific AST nodes in the database. This database will be used by the C/C++ language service to answer Thrift calls regarding C/C++ source code.

Fig. 1: CodeCompass architecture

Among other parser plugins the Git parser reads the version control information from the source tree (in the .git directories) and stores it into the project database.

CodeCompass has an extensible architecture, so new parser plugins can be written easily in the C/C++ language. Parser plugins can be added to the system as shared objects.

On the webserver, Thrift calls are served by so-called service plugins. A service plugin implements one or more Thrift services and serves client requests based on information stored in a workspace. A Thrift service is a (remotely callable) collection of methods and type definitions. All Thrift services have one implementation with the exception of the language service, which is implemented for C/C++, Java and Python. The language service is distinct in the sense that it provides basic code navigation functionality for the languages it is implemented for. To put it simply, if this interface is implemented for a language, a user will be able to click and query information about symbols in the source code view of a file written in the given language.

The Web-based user interface is organized into a static top area, extensible accordion modules on the left and also extensible center modules on the middle-right.

The source code and different visualizations are shown in the center, while navigation trees and lists, such as the file tree, search results, list of static analysis (CodeChecker) bugs, browsing history, code metrics, and version control navigation are shown on the left. New center modules and accordion panels can be added by developers. The top area shows the search toolbar, the currently opened file, the workspace selector, simple navigation history (breadcrumbs) and a generic menu for user guides.

The version control related functionality is available both from the center modules, by selecting the Team menu item on any code part, and from the revision control module in the accordion part.

4. IMPORTANCE OF VERSION CONTROL IN CODE COMPREHENSION

The original goal of version control systems was to handle the development process of the source code and to synchronize the team work by resolving conflicts in case of colliding code modifications. However, many times the source code management is joined with some issue tracker in order to provide a platform for reporting bugs or collecting feature requests. GitHub [20], GitLab [21] and Bitbucket [28] are the most popular code storage and issue tracker platforms.

The unit of a code modification in Git is the commit. Among others, a Git commit consists of the set of changed lines, the committer, and the date of modification. A hash value computed from these identifies the commit itself. Each commit (apart from the initial one) has one or more parents to which the change is related. This relationship between commits defines the commit history. When a new feature is being introduced, the development of this feature happens on a different branch, i.e. a commit has another child node. This way the different modifications don’t interfere until the feature is ready. As soon as it is complete, the branch can be merged to the main software branch, resulting in two parent commits for the merge node.

Git supports multiple development workflows. Guidelines are available that describe how to organize the joint work among team members. One way of working is keeping the commit history in one line. In this formation every team member is pushing commits to the single master branch. Of course, some review sessions precede this action. Gerrit [10] is a review handler tool that supports this workflow. When a new feature or bug fix has been implemented, it is pushed as one commit to the top of the Git history, so the steps of the new feature’s solution cannot be broken down to smaller commits. Another workflow is creating a new branch for a new feature or bug fix, implementing the solution in several commits and sending a merge request to the master branch through the version control system. The merge command will join the feature branch to the master with a special merge commit which thus has two parents.

In either configuration there is a way of grouping a set of modifications belonging together. From the code comprehension point of view this grouping has an important role, since this way
we can direct our attention to specific code fragments which are enough to consider on their own. By highlighting these parts, the huge code base is narrowed down to a few lines, which makes comprehension much easier.

In a well-organized project it should be easy to orientate. Some languages like Java even enforce the organization of modules in an equivalent directory structure. In C/C++, there is a common practice to separate the interface and the implementation of a module. The interface is located in a header file, conventionally under an include directory, and the implementation goes to the corresponding src folder with the same name except for the file extension (.h vs. .c or .cpp). However, when it comes to feature development, the introduction of a new feature may touch the files of multiple modules located at distant parts of the code base. In the commit history of the CodeCompass project we found that there were 226 non-merge commits (i.e. commits with one parent) which contain code modification or addition, and in 104 of these there was a common modification of an interface and its implementation file together. Besides these, in the 226 commits on average more than two files were edited which are not a common modification of an interface and an implementation.

The presentation of the files modified together is a helpful visualization that lets users see which other files should be inspected together while comprehending a code part. In the next section we discuss what kind of visualizations CodeCompass provides regarding the Git version control system to support this process.

5. VERSION CONTROL SUPPORT IN CODECOMPASS

CodeCompass supports code comprehension via various visualizations that present the different views of git information related to the project. As a starting point, one can initiate the blame view on any source code. The Git blame view shows line-by-line the latest changes (commits) to a given source file as seen on Figure 2. The background color of the committer also holds information: commits that happened recently are colored lighter green, while older changes are darker red. This view is excellent for reviewing why certain lines were added to a source file.

Fig. 2: Git blame view

Clicking on the committer’s name in the blame view, CodeCompass brings us to the commit information. This contains the exact date and the commit message. Here we can still inspect the committed code.

CodeCompass can also show Git commits in a filterable list ordered by the time of the commit. This search facility can be used to list changes made by a person or to filter commits by relevant words in the commit message.

Many times, when we are reading the source code and find a fragment which is interesting in some way or seems suspicious, we would also like to find what other parts of the current file have been modified at the same time. These colors are thus a visual aid for determining which modifications
belong to the introduction of the same new feature or bug fix.

Usually it is a project level decision whether the explanatory comments about the reasons of a modification should be incorporated in the source code or should be described in the commit message. The advantage of writing these comments in the source code is that this way these comments will be version controlled and also inseparable from the code. However, some information belongs to the commit message of the modification (e.g. the issue which this commit solves, some links to external pages or earlier commits, etc.). In CodeCompass we would like to present this information to the user too, immediately at the currently displayed file and line.

CodeCompass implements a branch view that presents which commits have been developed on a separate branch and thus belong together. On Figure 3 we see a typical branch view.

Fig. 3: Git branch view

In case of a software project, the evolution of the program may also carry useful information. The organization of the development process is also up to the project members. One of the most common structures is to maintain a master branch which always contains the latest version of the project. When a new feature is created or a bug fix is being introduced, it is developed on a separate, so-called feature branch. This branch contains one or more commits which make up the change together. When the introduced feature is stable on the feature branch, the changes are merged to the master. The bigger a new feature is, the more suitable it is to separate it into consecutive commits.

All of the visualizations are based on a graphical appearance similar to that of existing revision control tools, to minimize the cognitive effort for the developers when using revision control related information in CodeCompass.

Such use of revision control information of large legacy projects can reveal hidden connections in large software systems and help the complete comprehension of these projects.

6. IMPORTANT USE CASES

In this section we show use cases where version control information helps to understand the connection between different parts of the system, and thus provides essential information for both comprehending and maintaining it.

6.1. Configuration

Let us suppose that we are maintaining a foundation library, like a library for logging. It is common that such a library can be configured without a modification of the code, e.g. via environment variables or configuration files. Such libraries evolve over time: new features are added to the existing ones, and they may also be the target of bug fixes and other maintenance work.

How can we add a new feature (or feature parameters) to the existing system? First, we have to modify the structure of the configuration file. Depending on the technical implementation, this requires changing the (de-)serialization of the configuration file and/or its schema description. Also, the new features should be implemented in the heart of the library.

In any well-structured system, these three actions should be executed in separate parts of the code. The reader of the configuration file is a well-defined sub-library called either in the startup phase of the library, or during static initialization. The schema description is usually a different file, e.g. in an XSD format, also physically separated from the rest of the system. Finally, the actual code to execute the feature is placed somewhere inside the logical structure of the library.

It is easy to see that there are neither direct relations between the files (e.g. they are not in some common subdirectory), nor implicit language connections, like function calls or commonly used global variables, to bind these participants. The developer has no other chance to detect these connections than to consult the version control information: the files were created, checked in and maintained together during the lifetime of the system.

6.2. Plug-ins

Large systems linking against fixed libraries (either statically linked or using shared objects/DLLs) sometimes prove to be too rigid. It is more flexible to apply advanced component-oriented approaches, like plug-ins [27]. With plug-ins it is possible to apply new code to an existing system without any additional compiling/linking action or even other configuration settings. Applications of plug-ins include:

• enabling third-party developers to create features to extend static applications, like audio or video decoders, graphical components, etc.
• adding features to existing large systems.
• reducing the size of the system by eliminating unused features from being statically linked to the program.
• separating software components with incompatible licenses.

The plug-inable infrastructure, by its nature, avoids any direct connection between the plug-ins and the components working with them, otherwise the linking phase could not be avoided. However, this means that no explicit information is held in the system about these implicit dependencies. When a new plug-in is applied or an existing one has been modified, it could be challenging to detect the components that require attention.

In those cases, the version control information is the main source for the maintainer. As plug-ins and their client code are introduced together into the system, the maintainer can detect the corresponding components.

6.3. Databases/Network Connections

Hidden connections between software components in large systems can be manifested in persistent data. One component of the system may write a record to the database and other(s) may retrieve this information without introducing any explicit connection between these modules. A similar situation happens when data is sent over a network connection. Note that even the use of some common data type does not necessarily occur between the communicating entities. The reader and writer side of the communicating partners can use differently named data types with the same structure. It is also not uncommon that, for efficiency reasons, even the structures of the reader and writer buffers are different. In such situations the version control information could be a useful source to reveal the connections.

6.4. A Real World Example

Xerces-C++ is a popular open source library for parsing, validating, serializing and manipulating XML documents [4]. It is written in a portable subset of C++ and makes it easy to give applications the ability to read and write XML data.

The library was prepared for annotation support back in 2003. The modification added a new struct PerfMapElem to the header file ElemStack.hpp. If one starts to investigate what the purpose of this feature addition is and how it has been implemented, it is natural to search for all the usages of this type. However, PerfMapElem is used only in 3 other files. The usages include defining the template parameter of a vector-like container as well as declaring (pointer(s) to) individual objects.

However, it is less obvious that, among others, this modification removed the fElemStack field from the DGXMLScanner class and replaced it with a heap allocated vector ValueVectorOf.

This information can be retrieved easily from the version control information.

Locating the definition of PerfMapElem in ElemStack.hpp, we can use the blame view (see Figure 2) to find the corresponding commit f2cf1160 where this definition has been introduced. Clicking on the left bar of the view, we are automatically navigated to the change view where all the changes included in this commit are presented. This view also reveals that the new feature modified 12 files instead of the originally supposed 4. It would otherwise be a very hard investigation to detect these connections without the version control information.

7. CONCLUSION

Code comprehension is an important research area that supports better understanding of large industrial software systems in order to reduce maintenance costs. Tools supporting comprehension are mainly based on the static analysis of the source code. However, software systems may contain hidden
relationships between their components. Connections between configuration files and their application in the code, and similar ones, are hard to detect based on the source. At the same time, those connections are likely reflected by the software development process, which can be retrieved from the version control information.

CodeCompass is an open source code comprehension framework which is intended to collect the whole information portfolio of the system under investigation. This includes not only the internal structure revealed from the source code, but also additional information, including the git version control information.

The blame view shows the last committers line by line for the source lines, visually expressing the “age” of the code. From here, the developer can easily navigate to the commit information to check the commit message and the other files affected by the commit. One can also compare the code of different commits. Finally, we can inspect the development of the project by the traditional branch view. The applied visualizations inherit the graphical interface of the usual version control tools to make them familiar for the developers.

All these possibilities make the comprehension more complete and help to increase the code quality in the further development process. The full implementation is available as an open source project at [7].

REFERENCES

[1] Apache Lucene. https://lucene.apache.org/core/.
[2] Apache NetBeans. https://netbeans.org.
[3] Apache Thrift. https://Thrift.apache.org.
[4] Apache Xerces-C validating XML parser. https://github.com/apache/xerces-c.
[5] Clang: a C language family frontend for LLVM. https://clang.llvm.org/.
[6] Code Browser by Woboq for C and C++. https://woboq.com/codebrowser.html.
[7] CodeCompass website. https://github.com/Ericsson/CodeCompass.
[8] CodeSurfer. https://www.grammatech.com/products/codesurfer.
[9] Eclipse. https://www.eclipse.org/ide/.
[10] Gerrit Code review home page. https://www.gerritcodereview.com/.
[11] OpenGrok. https://opengrok.github.io/OpenGrok.
[12] Qt Creator. https://www.qt.io/.
[13] SciTools: Understand. https://scitools.com.
[14] The LLVM Compiler Infrastructure. https://llvm.org/.
[15] Visual Studio. https://visualstudio.microsoft.com.
[16] Tibor Brunner. Codecompass: an extensible code comprehension framework. Technical Report IK-TR1, Eötvös Loránd University, Faculty of Informatics, Budapest, May 2018.
[17] Csaba Faragó. Maintainability of source code and its connection to version control history metrics, phd thesis.
[19] Brian Henderson-Sellers. Object-oriented Metrics: Measures of Complexity. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996.
[20] GitHub Inc. home page. https://github.com.
[21] GitLab Inc. home page. https://gitlab.com.
[22] Andrejs Jermakovics, Alberto Sillitti, and Giancarlo Succi. Mining and visualizing developer networks from version control systems. In Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE ’11, pages 24–31, New York, NY, USA, 2011. Association for Computing Machinery.
[23] Huzefa Kagdi, Shehnaaz Yusuf, and Jonathan I. Maletic. Mining sequences of changed-files from version histories. In Proceedings of the 2006 International Workshop on Mining Software Repositories, MSR ’06, pages 47–53, New York, NY, USA, 2006. Association for Computing Machinery.
[24] McCabe. A complexity measure. IEEE Transactions on Software Engineering, 2:308–320, 1976.
[25] Keir Mierle, Kevin Laven, Sam Roweis, and Greg Wilson. Mining student cvs repositories for performance indicators. In Proceedings of the 2005 International Workshop on Mining Software Repositories, MSR ’05, pages 1–5, New York, NY, USA, 2005. Association for Computing Machinery.
[26] Nachiappan Nagappan, Thomas Ball, and Andreas Zeller. Mining metrics to predict component failures. In Proceedings of the 28th International Conference on Software Engineering, ICSE ’06, pages 452–461, New York, NY, USA, 2006. Association for Computing Machinery.
[27] Oscar Nierstrasz, Simon J. Gibbs, and Dennis Tsichritzis. Component-oriented software development. Commun. ACM, 35(9):160–165, 1992.
[28] Atlassian Corporation Plc. Bitbucket home page. https://bitbucket.org.
[29] Zoltán Porkoláb, Tibor Brunner, Dániel Krupp, and Márton Csordás. Codecompass: An open software comprehension framework for industrial usage. In Proceedings of the 26th Conference on Program Comprehension, ICPC ’18, pages 361–369, New York, NY, USA, 2018. ACM.
[30] Y. Shinyama, Y. Arahori, and K. Gondow. Analyzing code comments to boost program comprehension. In 2018 25th Asia-Pacific Software Engineering Conference (APSEC), pages 325–334, 2018.
[31] Csaba Szabó. Programme of the winter school of project no.2017-1-sk01-ka203-035402: “focusing education on composability, comprehensibility and correctness of working software”, 2018. accessed 02-July-2019.
[32] Richárd Szalay, Zoltán Porkoláb, and Dániel Krupp. Measuring mangled name ambiguity in large c/c++ projects. In Zoran Budimac (Ed), Proceedings of the Sixth Workshop on Software Quality Analysis, Monitoring, Improvement, and Applications. Belgrade, Serbia, 2017.
[33] Richárd Szalay, Zoltán Porkoláb, and Dániel Krupp. Towards better symbol resolution for C/C++ programs: A cluster-based solution. In IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM), pages 101–110. IEEE, 2017.

Tibor Brunner is working on his PhD at the Eötvös Loránd University (ELTE), Budapest, Hungary. He is an expert of code comprehension and static analysis. He is also teaching C and C++ programming languages for BSc students.
Zoltán Porkoláb received his doctoral degree in Technical report, Department of Software Engineering, Computer Science from the Eötvös Loránd Uni- University of Szeged, Szeged, May 2016. versity (ELTE), Budapest, Hungary in 2004. He is [18] Csaba Faragó, Péter Heged˝us, Ádám Zoltán Végh, and an Associate Professor at ELTE, and at the same Rudolf Ferenc. Connection between version control op- erations and quality change of the source code. Acta time, he holds Principal C++ Developer position at Cybernetica, 21(4):585–607, 2014. Ericsson Hungary Ltd.
