DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS STOCKHOLM, SWEDEN 2019

Using eye tracking to study variable naming conventions and their effect on code readability

PONTUS BROBERG

SHAPOUR JAHANSHAHI

KTH SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Master in Computer Science
Date: June 7, 2019
Supervisor: Richard Glassey
Examiner: Örjan Ekeberg
School of Electrical Engineering and Computer Science
Swedish title: En studie av variabelnamngivningskonventioners åverkan på läslighet av kod med hjälp av ögonspårning


Abstract

Using camel case when naming variables is largely considered to be best practice when writing code these days. But is it really the best variable naming convention when it comes to code readability and understanding? And how do different variable naming conventions affect the readability of code? This thesis researches these questions using eye tracking technology. Test subjects are timed as they look at and explain code snippets using different variable naming conventions while their gaze is plotted onto a heatmap. The variable naming conventions tested were single letters, single words, multiple words in camel case and multiple words in snake case. From the results, the conclusion is drawn that no significant difference in readability can be confirmed between the different variable naming conventions.

Sammanfattning

Using camel case when naming variables in code is widely considered good practice. But is it really the best way to name variables when it comes to readability and comprehension of code? And how do different variable naming conventions affect the readability of code? This thesis investigates these questions with the help of eye tracking technology. Test subjects were timed while they looked at and tried to explain small code examples with different kinds of variable names, while their gaze was translated into a heatmap. The variable naming styles tested were a single letter, a single word, multiple words in camel case and multiple words in snake case. From the results, the conclusion is drawn that no significant difference in code readability could be found between the different variable naming conventions.

Contents

1 Introduction
1.1 Research Question
1.2 Approach
1.3 Thesis Outline

2 Background
2.1 Variable naming conventions
2.2 Heatmap
2.3 Code readability
2.4 Eye tracking
2.5 Related work

3 Methods
3.1 Preparations
3.1.1 Creating code snippets
3.1.2 Test station setup
3.2 User tests
3.3 Limitations

4 Results
4.1 Test subjects
4.2 Mean time
4.3 Time distribution
4.4 Questions
4.5 Heatmaps

5 Discussion
5.1 Timed data
5.2 Heatmaps in relation to timed data
5.3 Possible improvements
5.4 Future work

6 Conclusions

Bibliography

A Heatmaps

Chapter 1

Introduction

Reading and understanding code is a key component in the life of a programmer, and being able to understand a program quickly is something that is associated with being a good programmer. However, in a perfect world, a programming novice should also be able to understand a piece of code quickly if the code is written in such a way that it is easy to understand. Since about 70% of code consists of identifiers [1], one could argue that code readability mostly depends on how the author chooses to name these variables, methods and classes. Code readability is also very important when maintaining programs. A bit more than two thirds of a program's total lifecycle cost is spent on maintenance, and researchers have noted that reading code seems to be where most of the maintenance time is spent [2].

While studying computer science at KTH, one is introduced to programming languages such as Java, Python and Go during the first year, and all of these languages seem to follow the same kind of praxis, using camel case, when it comes to naming variables. However, one never really gets a scientific explanation as to why you should program using camel case. Instead you are simply told that it is the best variable naming convention and that you should write code according to it. Is this really the best way to name variables, or is it just popular without any scientific reason to back it up?

With eye tracking technology it is possible to get instant feedback on where a developer is looking on a monitor. Using this technology, a developer can be presented with a snippet of code on a monitor and be told to read it, trying to figure out what the code does. The eye tracker can then generate a heatmap showing which parts of the code the developer spent the most time reading.


1.1 Research Question

The purpose of this study is to challenge the praxis of using camel case to name variables when writing code, and to see whether another convention might be better in terms of code readability and comprehension. The study will therefore aim to answer the question: How does using different variable naming conventions affect the readability of a program?

1.2 Approach

In order to answer this question, four different variable naming conventions will be used: single letter, single word, multiple words in camel case and multiple words in snake case. Six different code snippets will be written, and each snippet will come in two versions using two different variable naming conventions. All of the variable names will be chosen to describe the code as well as possible within the limits of the variable naming convention. A test subject will be shown a set of code snippets and, using Tobii eye trackers, their gaze will be translated to a heatmap. All test subjects will also be timed as they try to explain in general terms what the snippet of code does. The data from these tests will then be analyzed and discussed, and conclusions will be drawn.

1.3 Thesis Outline

The following chapter provides relevant background, definitions and a brief explanation of how eye tracking works. It also presents some similar studies that have been done in this area. The third chapter describes the methods used to gather data. In the fourth chapter all results are presented. The fifth and sixth chapters discuss the results and answer the research question.

Chapter 2

Background

This chapter begins with definitions of the variable naming conventions that we use in this study. Following that is a rundown of the eye tracking technology that is used, explaining briefly how it works and what it can do. The last section of the chapter presents related work that has been done previously in the field.

2.1 Variable naming conventions

Using a set of rules for naming variables in code is a wide-spread practice. A prime example of when to use such a variable naming convention is when working with other people on writing and maintaining code [3].

This section defines the four variable naming conventions that we used during the tests. These can essentially be split into two groups: multi-worded variables (camel case and snake case) and single-worded variables (single letter and single word).

Definition: Camel case

Multiple words in camel case (hereafter named MWCC) is a variable naming style used to write variables with multiple words without using spaces. Since a space in code marks the start of a new command or a new part of the code, spaces cannot be used when naming variables. To distinguish the words from each other, every new word starts with a capital letter. An example of MWCC would be sumOfNumbers. [4]


Definition: Snake case

Multiple words in snake case (hereafter MWSC) is a multi-worded variable naming style where underscores are used instead of spaces. The underscores used to separate the words make the variable look somewhat like a snake, hence the name. Note that no letters are capitalized in MWSC. An example of MWSC would be sum_of_numbers. [4]

Definition: Single word

Single worded (hereafter SW) variable names are names that use only one word. No capital letters are used. An example of SW would be sum.

Definition: Single letter

Single letter (hereafter SL) variable names use only a single letter to describe the variable. Letters are not written in uppercase, and the letter is chosen to describe the variable as well as possible. An example of SL would be s, since that is the first letter of the word sum.
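To make the four definitions concrete, the sketch below declares the same accumulator in each style. This is our own illustration, not code from the test material; the variable names are hypothetical examples.

// Illustrative only: the same variable declared in each of the four
// conventions studied. These declarations are our own examples, not
// code from the actual test snippets.
public class NamingConventions {
    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4};

        int s = 0;               // SL: single letter, first letter of "sum"
        int sum = 0;             // SW: single word
        int sumOfNumbers = 0;    // MWCC: multiple words in camel case
        int sum_of_numbers = 0;  // MWSC: multiple words in snake case

        for (int n : numbers) {
            s += n;
            sum += n;
            sumOfNumbers += n;
            sum_of_numbers += n;
        }
        System.out.println(s + " " + sum + " " + sumOfNumbers + " " + sum_of_numbers);
    }
}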

2.2 Heatmap

A heatmap is a graphical representation of a matrix. Individual tiles in a grid are painted with a shaded colour, scaled to represent the value of the corresponding element of the data matrix [5]. If we interpret a computer screen as a grid with rows and columns, we can use a heatmap to visualize which parts of the screen a user looks at the most.
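As a rough sketch of that idea (our own illustration; the study's actual heatmap generation was done through Unity and the tooling of Sundkvist and Persson [11]), gaze samples can be binned into a matrix of screen tiles whose counts are later mapped to colours:

// A minimal sketch of gaze-to-heatmap binning. The screen is divided
// into a grid of tiles; each recorded gaze point increments the count
// of the tile it falls in. Tile counts can then be mapped to colours.
public class GazeHeatmap {
    private final int[][] counts;
    private final int tileSize; // tile side length in pixels (assumed square)

    public GazeHeatmap(int screenWidth, int screenHeight, int tileSize) {
        this.tileSize = tileSize;
        this.counts = new int[screenHeight / tileSize][screenWidth / tileSize];
    }

    // Record one gaze sample given in screen pixel coordinates.
    public void addGazePoint(int x, int y) {
        int row = y / tileSize;
        int col = x / tileSize;
        if (row >= 0 && row < counts.length && col >= 0 && col < counts[0].length) {
            counts[row][col]++;
        }
    }

    // Intensity in [0, 1] for one tile, scaled by the hottest tile so far.
    public double intensity(int row, int col) {
        int max = 1;
        for (int[] rowCounts : counts) {
            for (int c : rowCounts) {
                max = Math.max(max, c);
            }
        }
        return counts[row][col] / (double) max;
    }
}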

2.3 Code readability

Buse and Weimer define readability as "a human judgment of how easy a text is to understand" [2]. They acknowledge that there are many different factors that determine the readability of code. There are formatting aspects, such as the use of proper indentation, fonts and colours, as well as identifier names. Code complexity can also affect code readability [6].

2.4 Eye tracking

According to Tobii's website [7], eye tracking is a sensor technology used in computers and other devices to see where a user is looking. A user can also use this technology to control a computer with their eyes, since the tracker not only shows where a user is looking but can also detect presence, attention and focus. This is especially useful when a user is unable to speak or use their hands.

The eye tracker works by creating a pattern of near-infrared light on the user's eyes with the help of its cameras, projectors and algorithms. High-resolution images are taken of the user's eyes and the light patterns, and then the user's gaze point is calculated with the help of machine learning, image processing and mathematical algorithms. In this thesis we use eye tracking to plot a heatmap that shows where the test subject spent most of their time looking at the code.

2.5 Related work

A similar study from 2010 by Bonita Sharif and Jonathan I. Maletic also used eye tracking to study identifier styles such as camel case and snake case (under_score) [8]. In that study, an eye tracker was used to gather quantitative data as additional insight to other data gathering techniques. The results show that there was no significant difference between the two styles when it came to accuracy; however, subjects seemed to recognize identifiers in the underscore style more quickly than camel case.

Another study on camel case and underscore was conducted in 2009, where the authors wanted to better understand the readability of identifiers [9]. An empirical study of 135 programmers and non-programmers was conducted where the experiment was presented as a game. A subject was shown a phrase and then shown four different clouds moving on the screen, each cloud containing an identifier written in either camel case or underscore style. Only one cloud matched the previously shown phrase exactly, and the subject was to identify this cloud as quickly as possible. Results from this study showed that camel case identifiers provided higher accuracy among all subjects, regardless of programming experience. It was also found that those trained in the camel case style recognized camel cased identifiers faster than identifiers written in underscore style.

Several studies have been done on code readability and how to measure whether code is readable or not. One example is a study by Todd Sedano from 2016, where he presents and tests a new technique for measuring code readability [10]. 21 subjects were involved in this field test, where they all followed Sedano's code readability testing during four sessions. After these sessions, half of the people writing unreadable code were now writing readable code, and all the programmers already writing readable code improved.

Chapter 3

Methods

In this chapter we present the methods used. The first section describes how we prepared the code snippets and set up the test stations. The following section describes the process of conducting the user tests. The final section discusses the limitations of our method.

3.1 Preparations

3.1.1 Creating code snippets

In order to compare our selected variable naming conventions to each other, we created code snippet pairs. Each snippet pair represents a common programming challenge, written using different variable naming conventions. The code snippets in each pair are identical except for the variable names used.

Java was our language of choice in the code snippets because we targeted test subjects who were Computer Science students with at least one year of Java programming experience.

We limited the number of code snippet pairs to six in order to keep each user test within a 15-minute maximum, allowing for as many tests as possible. The distribution of comparisons between the selected variable naming conventions was chosen to reflect our hypothesis. The number of comparisons between Single Letter/Word and Multi-Word variable names was maximized without excluding comparisons between SL and SW, or between MWCC and MWSC.


Code Snippet Pair    Test group 1    Test group 2
1                    SL              MWSC
2                    SW              MWCC
3                    MWCC            MWSC
4                    MWSC            SL
5                    SL              SW
6                    MWCC            SW

Table 3.1: Variable naming conventions used in the code snippets.

Code Snippet 1

This code snippet pair calculates the sum of the values in an array. The snippets in this pair use the SL and MWSC variable naming conventions, respectively.
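The actual pair is shown in figure 3.1. As a hedged reconstruction of what such a pair might look like, the sketch below sums an array once with single letters and once in snake case; apart from numbers_to_add, a name discussed in chapter 5, the names are our own guesses rather than the exact ones used in the tests.

// Hypothetical reconstruction of code snippet pair 1 (summing an array).
// The real snippets appear in figure 3.1; these variable names are our
// own guesses at the two styles.
public class SnippetPair1 {
    // SL version: single letters chosen from the first letter of each word.
    static int f(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // MWSC version: the same logic with snake_case multi-word names.
    static int sum_of_numbers(int[] numbers_to_add) {
        int running_sum = 0;
        for (int i = 0; i < numbers_to_add.length; i++) {
            running_sum += numbers_to_add[i];
        }
        return running_sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(f(data) + " " + sum_of_numbers(data));
    }
}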

Code Snippet 2

This code snippet pair represents a Selection Sort algorithm. The snippets in this pair use the SW and MWCC variable naming conventions, respectively.

Code Snippet 3

This code snippet pair can be interpreted as a random delay function. The snippets in this pair use the MWCC and MWSC variable naming conventions, respectively.

Code Snippet 4

This code snippet pair prints the sum of all even numbers between 1 and 10, inclusive. The snippets in this pair use the MWSC and SL variable naming conventions, respectively.

Code Snippet 5

This code snippet pair computes the minimum distance between coordinate pairs. The snippets in this pair use the SL and SW variable naming conventions, respectively.

Code Snippet 6

This code snippet pair represents a currency exchange. The snippets in this pair use the MWCC and SW variable naming conventions, respectively.

3.1.2 Test station setup

Two parallel test stations were prepared using a pair of Tobii EyeX eye trackers connected to computers running the Unity video game engine. Using the work of Sundkvist and Persson [11], we prepared 2D views in Unity containing our code snippets. During a user test, the test subject looks at the screen, the Tobii EyeX tracks the eye movements of the test subject in relation to the rendered 2D views, and Unity stores the coordinate data for later use.

3.2 User tests

Test subjects were divided randomly into two evenly distributed test groups. Each test group was assigned to a test station. Each test subject was first given an introduction during which we calibrated the eye tracking gear to adjust to the test subject's eyes. When the calibration was satisfactory and the test subject was ready, we moved on to the code snippets.

Each test subject was presented with six code snippets based on which group they were assigned to (see table 3.1). This was done to make sure that no test subject would see the same kind of code with two different variable naming conventions, since it would always be easier to understand the code the second time around. An example of code snippet pair 1 is shown in figure 3.1. The objective was to correctly explain what the code did in general terms. For each of the six code snippets, the time it took for the test subject to reach the objective was recorded using a stopwatch. Finally, after all of the snippets had been seen, the subject was asked three follow-up questions:

1. Did you notice any difference between the code snippets?

2. Out of these four different ways to write variables, which one, if any, do you use?

3. Which of these four variable naming conventions did you feel was the easiest to understand?

Between questions one and two we debriefed the subject on what the difference between the pictures was and what the test was about.

Figure 3.1: Code snippet pair 1

When all the test subjects were done with their tests, heatmaps were generated using the stored coordinate data from each code snippet that the test subject gazed upon [11]. We fused these heatmaps (illustrating the gaze of the test subjects) together, creating one heatmap per code snippet (6 per group, 12 in total).
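The fusion step itself amounts to element-wise addition of per-subject gaze-count matrices. A minimal sketch, assuming heatmaps are stored as equally sized count matrices (our illustration, not the actual tooling from [11]):

// A sketch of the fusion step: per-subject heatmaps of equal dimensions
// are combined element-wise into one accumulated heatmap per snippet.
import java.util.List;

public class HeatmapFusion {
    // Combine per-subject count matrices for one code snippet by
    // element-wise addition. All matrices must have the same dimensions.
    public static int[][] fuse(List<int[][]> subjectHeatmaps) {
        int rows = subjectHeatmaps.get(0).length;
        int cols = subjectHeatmaps.get(0)[0].length;
        int[][] fused = new int[rows][cols];
        for (int[][] heatmap : subjectHeatmaps) {
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                    fused[r][c] += heatmap[r][c];
                }
            }
        }
        return fused;
    }
}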

3.3 Limitations

An important limiting factor was the small sample size. The test subjects were all computer science students with similar experience, which could also have affected the results. Time restraints due to test station availability limited us to 15 minutes per test subject, which led to the decision of having the test subjects look at 6 code snippets each. The test stations were equipped with Tobii EyeX eye trackers of unknown quality, which proved to be a significant problem for us.

Chapter 4

Results

In this chapter we present the results from the user tests. The first two sections show the average and the distribution of time to understand each code snippet. This is followed by the results of the questions posed to the test subjects. Heatmaps of the gazes of the test subjects are mentioned in the last section of this chapter and can be found in Appendix A.

4.1 Test subjects

There were a total of 33 participants divided into two groups (17 in Group 1, 16 in Group 2).

4.2 Mean time

The arithmetic mean of the time to finish the objective for participants within a group was documented separately for every code snippet, and these average values can be seen in figure 4.1. It is clear that for each code snippet it took Group 2 longer on average to understand the code.

Figure 4.1: Average time to answer per code snippet.

4.3 Time distribution

In figure 4.2 we present the distribution of the time it took the test subjects to finish the test for each code snippet, using a box plot. The median values are displayed using horizontal lines in the boxes. The boxes in this graph represent the inter-quartile range (IQR, i.e. the range of data points between the first and third quartiles of the data sets). Outliers and suspected outliers are data points at least 3 and 1.5 inter-quartile ranges, respectively, above the third quartile, and they are pictured as filled or outlined circles, respectively, above the plot. The whiskers underneath and above the boxes show the minimum and maximum values, respectively. If there are any (suspected) outliers, the upper whisker will instead represent the inner fence (i.e. the start of the range where suspected outliers can exist, 1.5 IQRs above the third quartile) [12].
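For concreteness, the sketch below computes these fences from a sample of completion times (hypothetical numbers, not our measured data) using the standard Tukey-style definitions from [12]; quartile interpolation conventions vary between tools, so this is illustrative rather than the exact procedure behind figure 4.2.

// A sketch of the box-plot fences described above. Data points beyond
// the inner fence (Q3 + 1.5 * IQR) are suspected outliers; points beyond
// the outer fence (Q3 + 3 * IQR) are outliers.
import java.util.Arrays;

public class BoxPlotFences {
    // Simple quartile estimate: median of the lower/upper half of the data.
    static double quartile(double[] sorted, boolean upper) {
        int n = sorted.length;
        int half = n / 2;
        double[] part = upper
                ? Arrays.copyOfRange(sorted, n - half, n)
                : Arrays.copyOfRange(sorted, 0, half);
        int m = part.length;
        return m % 2 == 1 ? part[m / 2] : (part[m / 2 - 1] + part[m / 2]) / 2.0;
    }

    public static void main(String[] args) {
        double[] times = {22, 25, 27, 29, 30, 33, 36, 41, 95}; // hypothetical data
        Arrays.sort(times);

        double q1 = quartile(times, false);
        double q3 = quartile(times, true);
        double iqr = q3 - q1;

        double innerFence = q3 + 1.5 * iqr; // beyond this: suspected outlier
        double outerFence = q3 + 3.0 * iqr; // beyond this: outlier

        for (double t : times) {
            if (t > outerFence) System.out.println(t + " s: outlier");
            else if (t > innerFence) System.out.println(t + " s: suspected outlier");
        }
    }
}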

As can be seen in the results from code snippet pairs 1 and 4 in figure 4.2, the median values for MWSC are lower than those of their SL counterparts, and code snippet pairs 2 and 6 show lower median values for SW compared to MWCC. In snippet pair 3, the median of MWCC is lower than MWSC, and in pair 5, the median of SL is lower than SW.

Furthermore, we observed that snippets 3MWCC, 3MWSC and 4SL each had outliers that took more than twice as much time as any other test subject to finish. In snippets 1SL, 4MWSC, 5SL and 5SW there are suspected outliers well beyond the inner fence. Some of the box plots show a large variance, which is most clearly visible in the size of the IQRs of snippet pairs 2 and 5. Group 2 had a greater maximum evaluation time than Group 1 in all but the first code snippet.

Figure 4.2: The distribution of time in seconds to answer each code snippet.

4.4 Questions

Two of the three questions asked after the tests were quantifiable. Figure 4.3 shows, for each naming convention (SL, SW, MWCC, MWSC), how many of the test subjects claim to use it when programming. MWCC and SW are shown to be the most frequently used among our test subjects. Figure 4.4 presents which of the naming conventions the subjects thought was the easiest to understand. Here MWSC seems to be the easiest, with SW and MWCC also receiving many votes. Note that, for both of these questions, no answer and multiple answers were allowed.

Figure 4.3: The variable naming convention used by the test subject.

Figure 4.4: Which variable naming convention was easiest to understand?

4.5 Heatmaps

The heatmaps of the accumulated gazes of the test subjects can be found in Appendix A. The pink dots on the pictures represent where each test subject was looking according to the eye tracker. We fused the heatmaps from all test subjects who viewed the same code snippet, resulting in 12 heatmaps, one for each code snippet. Separate heatmaps for each snippet viewed by each subject would have resulted in too many pictures, and individually these pictures contain insufficient useful data.

Chapter 5

Discussion

This chapter discusses the results in different areas. First, the timed data is discussed. The following section puts the timed data in relation to the generated heatmaps. In the final section we discuss improvements to this study and potential future work.

Before we move on to more detailed discussions, it is important to highlight the subjective nature of these tests. The specific variable names were, as previously explained, chosen to describe the variables as well as possible. But how to describe a variable as well as possible is a highly subjective matter. While we would think that numbers_to_add is a good descriptive name for an array, one could also argue that it could be misinterpreted, or simply not be as clear to someone else.

5.1 Timed data

Overall, the results are quite in line with earlier studies on the subject. SL code snippets, on average, took as long as or longer than the MWCC/MWSC snippets, and no significant difference could be seen between MWCC and MWSC. If we look at snippets 1 and 6, we can see that the average results between the groups are very similar. We believe this is because these two were the least complex snippets, where the variables didn't really need to give much information for the subject to understand the code.

Snippet 2 is an interesting case. Here we can see that MWCC is slower than SW even though the first variable declared in the MWCC snippet is called arrayToSort. We assumed beforehand that this variable name was going to be a dead giveaway and that the subjects, upon seeing this, would immediately identify the code as a sorting algorithm, but this was not the case, since the average time of SW is about 13 seconds shorter. There can be several reasons for this. One of them could be that the subject, looking at the MWCC snippet, thought that there could be something else going on in the code besides sorting. This would result in the subject reading the code thoroughly, hence taking more time to finally answer what it does.

Another reason could be that we are seeing an example of the Hawthorne Effect [13]. This effect is defined as the problem in field studies that arises when subjects are aware that they are participating in an experiment, so that their behaviour is modified from what it would have been had they not been aware of the experiment taking place. What we think might have happened is that since the subject was aware that an experiment was taking place, they read all of the code instead of jumping to a conclusion upon seeing the first arrayToSort variable. It is however important to mention that the Hawthorne Effect has been criticized, and there are studies that conclude that there is no evidence for such an effect [14].

If we look at snippets 3 and 5, where we compare MWCC to MWSC and SL to SW, it is clear that the results don't really favour one over the other. On snippet 3, the averages were only about three seconds apart. Even though the averages on snippet 5 were about 17 seconds apart, the box plot in figure 4.2 shows that the inter-quartile ranges of these snippets overlap and cover both medians, so there isn't any significant difference. The SW snippet has a higher variance, as can be seen from its larger IQR. Not taking the suspected outliers into account would give SW a new average time of 74 seconds, which is much closer to its corresponding SL average time.

Lastly, looking at snippet 4, we see the biggest difference in percentage between the groups, at about 26%. However, the SL snippet has an outlier similar to the one in snippet 5 SW when it comes to the maximum time spent, which can be seen in figure 4.2. The snippet 4 SL maximum time is more than double the maximum time of snippet 4 MWSC. If we choose to calculate the average time for snippet 4 SL without the outlier, we get a new average time of 41 seconds.

Taking all of this into account, we cannot see any significant difference when it comes to naming variables with multiple words or single words, or even single letters. We can however derive from figures 4.3 and 4.4 that conventions such as MWSC and MWCC seem to be a lot more popular, and are perceived as easier to understand from the developer's point of view.

5.2 Heatmaps in relation to timed data

As we can see in Appendix A, the gaze points on the heatmaps do not perfectly align with the text in the code snippets. However, we find it highly improbable that the test subjects were looking at the gaps in-between the text rows instead of at the code itself. The patterns in figure A.6 and figure A.11 indicate that the users followed the patterns of the code snippet rows, but the information got corrupted somehow. This is most likely due to an eye tracking calibration error. However, some information can still be gathered.

When comparing snippet 1 SL and snippet 1 MWSC, it seems that the subjects looking at SL spent most of their time looking at the calculation operations of the code. We can also see that the subjects looking at MWSC have a much more evenly spread out heatmap. The average times of these two snippets were very similar (29 and 30 seconds respectively), and this would suggest that to understand the code using SL variables, the subjects had to calculate every operation, while the subjects looking at MWSC variables only had to read the calculation operations once to understand what was going on.

5.3 Possible improvements

The eye tracking equipment didn’t work quite as expected. We ran into many problems. Some had to do with the hardware, such as sensitivity to changes in light (which we could not control) and varied results between the two test systems (i.e. the eye tracking gear). Others had to do with the test subjects moving their heads around too much. A single test system setup would have required double the time, but would have been more accurate.

We unintentionally introduced a couple of bugs in our code snippets. This resulted in some mistrust from the test subjects, some of whom started to look for more bugs. This could possibly have driven up the completion time of those tests. Also, creating definitive result objectives for each code snippet would have simplified the evaluation of the completion of the tests.

Finally, the questions we asked each test subject after the code snippets could have been put into a questionnaire instead. This would have helped prevent any bias from being introduced while writing down the answers.

5.4 Future work

A point of interest would be to attempt to reduce the variable naming conventions problem to a descriptive variables problem, in order to ascertain the effect on readability that variables using the same naming convention but different levels of descriptiveness have. One way this could be done is by narrowing the problem down to SL variable names and determining whether descriptive variable letters (e.g. 'e' and 'o' representing 'even' and 'odd', respectively) are more readable than alphabetically ordered variable letters (e.g. 'a', 'b', 'c').
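As a sketch of what such a future test snippet could look like (our own hypothetical example, not part of this study's material), the same loop is shown with descriptive and with alphabetically ordered single letters:

// Illustrative sketch of the proposed future experiment: the same loop
// with descriptive single letters versus alphabetically ordered ones.
public class DescriptiveLetters {
    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4, 5, 6};

        // Descriptive letters: 'e' and 'o' hint at "even" and "odd".
        int e = 0, o = 0;
        for (int n : v) {
            if (n % 2 == 0) e += n; else o += n;
        }

        // Alphabetically ordered letters carry no such hint.
        int a = 0, b = 0;
        for (int n : v) {
            if (n % 2 == 0) a += n; else b += n;
        }
        System.out.println(e + " " + o + " | " + a + " " + b);
    }
}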

Chapter 6

Conclusions

We found some indications that Single Letter variables differed from the other naming conventions, but such a conclusion would require further testing.

In conclusion, no significant difference in readability could be confirmed between the variable naming conventions SL, SW, MWCC and MWSC with the data obtained from our tests.

Bibliography

[1] Florian Deissenboeck and Markus Pizka. "Concise and consistent naming". Fetched: 2019-05-13. 2006. url: https://link.springer.com/content/pdf/10.1007/s11219-006-9219-1.pdf.

[2] R. P. L. Buse and W. R. Weimer. "Learning a Metric for Code Readability". In: IEEE Transactions on Software Engineering 36.4 (July 2010), pp. 546–558. issn: 0098-5589. doi: 10.1109/TSE.2009.70.

[3] Steve McConnell. Code Complete, Second Edition. Redmond, WA, USA: Microsoft Press, 2004. isbn: 0735619670, 9780735619678.

[4] Unknown. Most Common Programming Case Types. Fetched: 2019-05-14. 2018. url: https://chaseonsoftware.com/most-common-programming-case-types/#camelcase.

[5] Michael Friendly. "The history of the cluster heat map". In: The American Statistician (2009).

[6] Andreas Bexell. "Comparing functional to imperative Java: with regards to readability, complexity and verbosity". PhD thesis. 2017. url: http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-64712.

[7] Tobii. This is Eye Tracking. Fetched: 2019-05-14. 2019. url: https://www.tobii.com/group/about/this-is-eye-tracking/.

[8] B. Sharif and J. I. Maletic. "An Eye Tracking Study on camelCase and under_score Identifier Styles". In: 2010 IEEE 18th International Conference on Program Comprehension. June 2010, pp. 196–205. doi: 10.1109/ICPC.2010.41.

[9] D. Binkley et al. "To camelcase or under_score". In: 2009 IEEE 17th International Conference on Program Comprehension. May 2009, pp. 158–167. doi: 10.1109/ICPC.2009.5090039.

[10] T. Sedano. "Code Readability Testing, an Empirical Study". In: 2016 IEEE 29th International Conference on Software Engineering Education and Training (CSEET). Apr. 2016, pp. 111–117. doi: 10.1109/CSEET.2016.36.

[11] Leif Tysell Sundkvist and Emil Persson. Code Styling and its Effects on Code Readability and Interpretation. 2017.

[12] T. W. Kirkman. Statistics to Use. Fetched: 2019-06-03. 1996. url: http://www.physics.csbsju.edu/stats/box2.html.

[13] J. G. Adair. "The Hawthorne effect: A reconsideration of the methodological artifact". In: Journal of Applied Psychology 69.2 (1984), pp. 334–345. doi: 10.1037/0021-9010.69.2.334.

[14] Stephen R. G. Jones. "Was There a Hawthorne Effect?" In: American Journal of Sociology 98.3 (1992), pp. 451–468. doi: 10.1086/230046.

Appendix A

Heatmaps

Figure A.1: Heatmap of code snippet 1 (Group 1, SL).


Figure A.2: Heatmap of code snippet 2 (Group 1, SW).

Figure A.3: Heatmap of code snippet 3 (Group 1, MWCC).

Figure A.4: Heatmap of code snippet 4 (Group 1, MWSC).

Figure A.5: Heatmap of code snippet 5 (Group 1, SL).

Figure A.6: Heatmap of code snippet 6 (Group 1, MWCC).

Figure A.7: Heatmap of code snippet 1 (Group 2, MWSC).

Figure A.8: Heatmap of code snippet 2 (Group 2, MWCC).

Figure A.9: Heatmap of code snippet 3 (Group 2, MWSC).

Figure A.10: Heatmap of code snippet 4 (Group 2, SL).

Figure A.11: Heatmap of code snippet 5 (Group 2, SW).

Figure A.12: Heatmap of code snippet 6 (Group 2, SW).

TRITA-EECS-EX-2019:323
