Evaluating Code Readability and Legibility: an Examination of Human-Centric Studies
Total Page:16
File Type:pdf, Size:1020Kb
2020 IEEE International Conference on Software Maintenance and Evolution (ICSME) Evaluating Code Readability and Legibility: An Examination of Human-centric Studies Delano Oliveira∗, Reydne Bruno∗, Fernanda Madeiral†, Fernando Castor∗ ∗Federal University of Pernambuco, Recife, Brazil, {dho, rbs8, fjclf}@cin.ufpe.br †KTH Royal Institute of Technology, Stockholm, Sweden, [email protected] Abstract—Reading code is an essential activity in software There are systematic literature reviews about program com- maintenance and evolution. Several studies with human subjects prehension studies [16], [17]. To the best of our knowledge, have investigated how different factors, such as the employed however, no previous work has investigated how code readabil- programming constructs and naming conventions, can impact code readability, i.e., what makes a program easier or harder to ity and legibility are evaluated in studies comparing different read and apprehend by developers, and code legibility, i.e., what ways of writing code, in particular, what kinds of tasks these influences the ease of identifying elements of a program. These studies conduct and what response variables they employ. An- studies evaluate readability and legibility by means of different alyzing these tasks and variables can help researchers identify comprehension tasks and response variables. In this paper, we limitations in the evaluations of previous studies, gauge their examine these tasks and variables in studies that compare programming constructs, coding idioms, naming conventions, and comprehensiveness, and improve our understanding of what formatting guidelines, e.g., recursive vs. iterative code. To that said tasks and variables evaluate. end, we have conducted a systematic literature review, where we In this paper, we investigate how code readability and legi- found 54 relevant papers. Most of these studies evaluate code bility are evaluated in studies that aim to identify approaches readability and legibility by measuring the correctness of the that are more readable or legible than others. Our goal is to subjects’ results (83.3%) or simply asking their opinions (55.6%). Some studies (16.7%) rely exclusively on the latter variable. study what types of tasks are performed by human subjects There are still few studies that monitor subjects’ physical signs, in those studies, what cognitive skills are required from them, such as brain activation regions (5%). Moreover, our study shows and what response variables these studies employ. To achieve that some variables are multi-faceted. For instance, correctness this goal, we carried out a systematic literature review, which can be measured as the ability to predict the output of a program, started with 2,843 documents, where 54 papers are relevant answer questions about its behavior, or recall parts of it. These results make it clear that different evaluation approaches require for our research—these papers are the primary studies of our different competencies from subjects, e.g., tracing the program vs. study. Our results show that 40 primary studies involve asking summarizing its goal vs. memorizing its text. To assist researchers subjects to provide information about a program, for example, in the design of new studies and improve our comprehension of to predict its output or the values of its variables, or to provide existing ones, we model program comprehension as a learning a high level description of its functioning. In addition, 30 activity by adapting a preexisting learning taxonomy. This adaptation indicates that some competencies, e.g., tracing, are studies involve asking subjects to provide personal opinions, often exercised in these evaluations whereas others, e.g., relating and 15 studies involve asking subjects to act on code. Most similar code snippets, are rarely targeted. studies employ combinations of these types of tasks. Index Terms—Code readability, code legibility, code under- When considering response variables, the most common standability, code understanding, program comprehension approach is to verify if the subjects are able to correctly provide information about the code or act on it. Correctness I. INTRODUCTION is the response variable of choice in 45 studies. Moreover, Understanding a program usually requires reading code. A 27 studies measure the time required to perform tasks as an program might be easier or harder to be read depending on imprecise proxy to the effort required by these tasks. Besides, its readability, i.e., what makes a program easier or harder to mirroring the aforementioned tasks where subjects are required read and apprehend by developers, and legibility, i.e., what to provide their personal opinions, 30 studies employ opinion influences the ease of identifying elements of a program. as a response variable. Additionally, 9 of the 54 studies use Different factors can influence code readability and legibility, opinion as the sole response variable. Furthermore, studies such as which constructs are employed [1], [2], how the code evaluating legibility tend to employ opinion as a response is formatted with whitespaces [3]–[5], and identifier naming variable proportionally more often than studies focusing on conventions [6]–[13]. Researchers have conducted empirical readability. Also, there are still relatively few studies that mon- studies to investigate several of those factors, where different itor subjects’ physical signs, such as brain activation regions but functionally equivalent ways of writing code are compared, (5%) [18]–[20]. The analysis of the response variables reveals e.g., recursive vs. iterative code [14] and long vs. abbreviated that they are multi-faceted. For example, correctness can be identifiers [10], [15]. These studies involve asking subjects to measured as the ability to predict the output of a program, perform one or more tasks related to source code and assessing answer questions about its general behavior, precisely recall their understanding or the effort involved in the tasks. specific parts, identify the cause of a bug, among other things. 2576-3148/20/$31.00 ©2020 IEEE 348 DOI 10.1109/ICSME46990.2020.00041 To assist researchers in the design of new studies and Daka et al. [29] affirm that “the visual appearance of code in improve our comprehension of existing ones, we adapt an general is referred to as its readability”. The authors clearly existing learning taxonomy to the context of program com- refer to legibility (in the design/linguistics sense) but employ prehension. This adapted taxonomy indicates that some com- the term “readability” possibly because it is more often used petencies, e.g., tracing, are often exercised in these evaluations in the software engineering literature. whereas others, e.g., relating similar code snippets, are rarely Based on the differences between the terms “readability” targeted. The taxonomy also indicates that 37% of the primary and “legibility” that are well-established in other areas such as studies have a narrow focus, requiring a single cognitive skill linguistics [27], design [30], human-computer interaction [31], from the subjects, even though program comprehension is a and education [28], we believe that the two terms should have complex activity. Additionally, the taxonomy highlights the clear, distinct, albeit related, meanings also in the area of soft- tendency of the studies evaluating readability and legibility ware engineering. On the one hand, the structural and semantic to focus on comprehension tasks that do not contextualize characteristics of the source code of a program that affect the the information that is apprehended. In software development ability of developers to understand it while reading the code, practice, program comprehension is often associated with other e.g., programming constructs, coding idioms, and meaningful tasks such as reengineering an existing program or extending identifiers, impact its readability. On the other hand, the it. Existing studies rarely go this far. visual characteristics of the source code of a program, which affect the ability of developers to identify the elements of the II. CODE READABILITY AND LEGIBILITY code while reading it, such as line breaks, spacing, alignment, In software engineering, the terms readability, legibility, un- indentation, blank lines, identifier capitalization, impact its derstandability, and comprehensibility have overlapping mean- legibility. Hereafter, we employ these two terms according ings. For example, Buse and Weimer [21] define “readability to these informal definitions. as a human judgment of how easy a text is to understand”. III. METHODOLOGY In a similar vein, Almeida et al. [22] affirm that “legibility is fundamental to code maintenance; if source code is written Our goal is to investigate how human-centric studies eval- in a complex way, understanding it will require much more uate whether a certain approach is more readable or legible effort”. In addition, Lin and Wu [23] state that ““Software than another one. More specifically, we investigate what tasks understandability” determines whether a system can be un- are performed by human subjects and how their performance derstood by other individuals easily, or whether artifacts of is evaluated in empirical studies aiming to evaluate readability one system can be easily understood by other individuals”. and legibility. We focus on studies that directly compare two Xia et al. [24] treat comprehension and understanding as or more different approaches and have