DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 7.5 CREDITS STOCKHOLM, SWEDEN 2019

Comparing syntax highlightings and their effects on code comprehension

ERIK HÄREGÅRD

ALEXANDER KRUGER

KTH SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Bachelor in Computer Science
Date: June 7, 2019
Supervisor: Jeanette Hellgren Kotaleski
Examiner: Örjan Ekeberg
KTH
Swedish title: Jämförelse av syntax highlightings och dess effekt på kodförståelse


Abstract

Syntax highlighting is a system designed to assist a writer or programmer by displaying different parts of a text in a specific color based on its function. In this study, we conducted a practical experiment comparing the effectiveness of two common syntax highlightings, primarily by measuring the speed at which participants could understand a given piece of code. The highlightings chosen for the comparison were Atom UI Standard and Styri. Previous studies in this area, most notably by Advait Sarkar from the University of Cambridge, have shown a generally positive effect of syntax highlighting but have not compared how different types of syntax highlighting affect the reader. Our study was performed with eight participants from a technical high school with a minor background in programming, who answered six questions where some of the code was highlighted and some was not; the order of the questions was found to affect the results on some of them. The results do not show any significant advantage to using syntax highlighting, nor any difference between the effectiveness of the two syntax highlightings. Some participants saw a slight but consistent advantage from using syntax highlighting, but no general or notable conclusion could be drawn about these participants.

Sammanfattning

Syntax highlighting is a system intended to help authors or programmers write text or code by changing the text color based on each part's role in a sentence or instruction. In this study, a practical experiment was carried out that compared the effect of two different syntax highlightings, mainly by measuring how long it took the participants to understand a given piece of code. Previous studies in this area, above all by Advait Sarkar from the University of Cambridge, have shown a generally positive effect of syntax highlighting but have not compared how different syntax highlightings affect the reader. The two syntax highlightings chosen were Atom UI Standard and Styri. Our study was performed with eight students from a technical upper secondary school who had some experience in programming. The participants answered six questions in which they were shown pieces of code. Some of these had syntax highlighting and the rest had none. The order of the questions was the same for all participants. Later analysis showed that the order of the questions had affected the results of some questions. The results show no general significant positive effect of using syntax highlighting, nor any difference between the two selected highlightings. Some participants saw a small, consistent advantage from using syntax highlighting, but not enough for a significant conclusion.

Acknowledgements

We want to acknowledge Jeanette Hellgren Kotaleski for providing quick answers when needed and for her support throughout the project. We would also like to thank Lovisa Johansson for assisting in finding suitable questions. This study would also not have been possible without Maria Franzén, who assisted us in finding appropriate participants and provided excellent feedback on our questions.

Contents

1 Introduction
1.1 Limiting the Scope
1.2 Research Question
1.3 Hypothesis

2 Background
2.1 State of the Art
2.2 Difference of the Syntax Highlightings

3 Method
3.1 Design
3.2 Selection
3.3 Ethical considerations

4 Results

5 Conclusion and Discussion
5.1 Conclusion
5.2 Discussion
5.3 Problems with the Study
5.4 Future Work

Bibliography

Chapter 1

Introduction

The core function of syntax highlighting is to change the color of the text shown to the user, in the hope of making the text more accessible and faster to read [1]. It is important to note that syntax highlighting does not alter the behavior of the code. In the example in Figure 1.2, a function called “System.out.println” is given a string containing the text “HelloWorld”. A function is a named instruction that takes some input, such as a string, and performs an action or produces a result; here, it prints the text to the console. A string is the data type a program uses to store text. Because “System.out.println” is a function, the syntax highlighting system gives it the color blue; similarly, because “HelloWorld” is a string, it is given the color yellow. Different syntax highlightings can assign different colors to the different categories of a language. Syntax highlighting makes it easier to see what purpose each part of a sentence or instruction has; for example, it can show which part is a function and which part is a value such as a string.

Figure 1.1: HelloWorld without syntax highlight

Figure 1.2: HelloWorld with Styri highlight
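To make the mechanism concrete, the following Java sketch (Java 16 or later) splits one line of code into categories and maps each category to a color, in the spirit of Figure 1.2. The line is assumed here to be System.out.println("HelloWorld");. This is a hypothetical simplification written for this text and not how Atom or Styri is actually implemented: real editors use full language grammars, and the class name TinyHighlighter, its categories and its regular expression are our own assumptions. The colors only mirror the blue and yellow described above. Note that the sketch only reads the code and attaches labels to it; the code itself is never changed, which is why highlighting cannot alter a program's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: label the parts of one line of Java and pick a color
// per label. Real editors use complete grammars instead of one regex.
final class TinyHighlighter {

    // One labelled piece of the source line.
    record Token(String text, String category) {}

    // Alternatives are tried left to right: a quoted string is matched as one
    // token before the catch-all punctuation rule could split it, and a name
    // followed by '(' is labelled a function call before the plain-name rule.
    private static final Pattern TOKEN = Pattern.compile(
        "(\"[^\"]*\")"                              // group 1: string literal
        + "|([A-Za-z_][A-Za-z0-9_.]*)(?=\\s*\\()"   // group 2: name before '(' => function
        + "|([A-Za-z_][A-Za-z0-9_]*)"               // group 3: any other name
        + "|(\\S)");                                // group 4: punctuation such as ( ) ;

    static List<Token> tokenize(String line) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            String category;
            if (m.group(1) != null) category = "string";
            else if (m.group(2) != null) category = "function";
            else if (m.group(3) != null) category = "name";
            else category = "punctuation";
            tokens.add(new Token(m.group(), category));
        }
        return tokens;
    }

    // Hypothetical color scheme: only functions and strings get a color.
    static String colorOf(String category) {
        return switch (category) {
            case "function" -> "blue";
            case "string" -> "yellow";
            default -> "default";
        };
    }

    public static void main(String[] args) {
        for (Token t : tokenize("System.out.println(\"HelloWorld\");")) {
            System.out.println(t.text() + " -> " + t.category() + " (" + colorOf(t.category()) + ")");
        }
    }
}
```

Running main lists each piece of the line with its category and color; System.out.println is labelled a function (blue) and "HelloWorld" a string (yellow), while the brackets and semicolon keep the default color.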


Because syntax highlighting can be customized, many programs and larger companies have developed their own styles of syntax highlighting; examples of this are AtomUI and Google's Material Design [2]. This study hopes to aid in the development of syntax highlightings that are more effective and increase the productivity of programmers. Given this goal, a fair question is why we focus on syntax highlighting. As the well-known quote below shows, programmers spend a lot of time reading code. This means that the speed at which programmers can read code strongly affects their overall efficiency. The quote comes from the book “Clean Code: A Handbook of Agile Software Craftsmanship”, a widely cited guide to writing good code, by the similarly well-regarded author Robert C. Martin: “Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code.” [3]

1.1 Limiting the Scope

There are many different syntax highlightings, and we could not reasonably test them all. We have therefore settled on comparing Atom's standard highlighting and Styri's syntax highlighting with each other and with having no syntax highlighting. We attempted to select two well-known syntax highlightings in the hope that their popularity is in some way related to their usefulness [4]. Atom standard highlighting and Styri's syntax highlighting were chosen because they are both popular highlightings in the Atom editor, one of the most popular code editors in the world. The two highlightings are also significantly different from each other, as Figures 2.2 and 2.3 show, so there might be a difference in their effectiveness.

Other studies in the field (see Section 2.1) have used eye tracking as an additional metric, recording the participants' eye movements to assist with the analysis. In this study, we opted not to use eye tracking, primarily because the technology is unreliable without head-mounted setups. We instead chose to focus on the time taken and the correctness of the answers as our metrics.

1.2 Research Question

Which of Atom standard highlight and Styri’s syntax highlight results in the fastest code comprehension compared to having no syntax highlight?

1.3 Hypothesis

In a previous study by Advait Sarkar from the University of Cambridge [5], it was concluded that syntax highlighting had a positive effect on the participants' comprehension of code. In this paper, we aim to study whether the color system used by a specific syntax highlighting can affect the effectiveness of that highlighting. We hypothesize that the type of highlighting will not play a significant role in its effectiveness, and we expect both highlightings to perform better than having none at all. The basis of this hypothesis is that both highlightings used are popular and as such are likely to perform well; while there are differences between them, we consider it unlikely that these are large enough to cause a radical change in performance. Since most modern code editors feature some form of syntax highlighting, we find it reasonable to expect that it has some effect.

Chapter 2

Background

A natural language like English developed to facilitate communication between humans; in the same way, programming languages were made to enable communication between humans and computers [6]. Where English forms sentences by stringing words together, programming languages work in a similar way, but the units are usually called “instructions” or “statements”. By combining words or instructions, we form sentences that adhere to a programming language's specific grammatical rules. For example, a syntactic rule may require a semicolon “;” to mark the end of a statement [7], much as a full stop marks the end of a sentence. Unlike natural languages such as English, programming languages must be extremely precise in their grammar and syntax; there is no room for ambiguity [8]. In this context, ambiguity means that the same statement can be given two different interpretations. While a human can often understand a broken sentence in a natural language, a compiler cannot do the same with code, since it has to follow specific rules for how to interpret each instruction. Ambiguity is undesirable in code because it leaves room for the program to misinterpret the instructions, and for that reason it is avoided in most programming languages [6]. The programmer must therefore write syntactically correct code for the compiler to understand the programmer's intentions and create a working program.

One of the first known editors featuring a system similar in principle to syntax highlighting was Wilfred Hansen's code editor Emily from 1969 [1]. While extremely crude compared to modern systems, it featured a similar concept of changing the way text is displayed based on the grammar of the word or sentence. The system was developed as part of a dissertation submitted to the Department of Computer Science at Stanford University for the degree of Doctor of Philosophy [9]. Emily built on the concept of hierarchic text to offer the programmer syntax-conforming options [10]. The system bore some resemblance to code completion, similar to IntelliSense, but with some aspects of syntax highlighting. Wilfred Hansen is still alive today and has published several other papers and constructed several other systems since 1969.

The first patent application in an area similar to syntax highlighting was filed by Anita H. Klock and Jan B. Chodak of Mattel Inc. on 29 October 1982, and the technique was used in the Intellivision ECS. The ECS had a simple code editor with a basic syntax highlighting system that colored text according to syntax rules. The program was implemented to make programming in BASIC easier for beginners [11]. As seen in Figure 2.1, each type of instruction was given a specified color in the patent [12]. As stated in the abstract of the patent, the reason for this coloring system was to assist the programmer in finding incorrect statements and to provide “a method of overlooking syntax errors to reduce operator frustration during running”. Since then, code editors have evolved but still share some fundamental properties with the original systems from Wilfred Hansen and Anita H. Klock.

Figure 2.1: Figure 8 from the original patent for the syntax highlighting featured in the Intellivision ECS

2.1 State of the Art

In Advait Sarkar's study “The impact of syntax colouring on program comprehension” [5], it is stated that syntax highlighting, or syntax coloring, has not yet been studied extensively. The goal of his study was to understand whether syntax coloring affects program comprehension time and whether this effect varies with programming experience. Python was used for the experiment because of its similarity to pseudocode. Ten graduate students participated in the study, and a Tobii X120 eye tracker was used to capture their eye movements. Sarkar also notes that future work should investigate the impact of different coloring schemes or systems. The conclusion drawn was that the presence of syntax highlighting significantly reduces task completion time, but that the magnitude of this effect decreases as programming experience increases. Tuomas Hakala et al. from the University of Joensuu in Finland studied the effects of three coloring schemes on Java programs in the paper “An Experiment on the Effects of Program Code Highlighting on Visual Search for Local Patterns” [13]. In the study, the authors suggest that many highlightings use a very extensive coloring system and that this leads to cluttered screens. The study therefore focused on a simpler type of syntax highlighting with fewer colors and less complexity. Because of this, none of the 21 participants were familiar with the given syntax highlightings beforehand. The results showed that syntax highlighting had either a minimal effect or no effect compared to black text on a white background. The authors theorized that the colors from syntax highlighting are aesthetically pleasing and could therefore contribute to work satisfaction.

2.2 Difference of the Syntax Highlightings

The two syntax highlightings chosen for this study differ in appearance. For instance, in Atom standard (Figure 2.2) the variables x, y and z in the call to "println" are not highlighted, whereas in Styri (Figure 2.3) they are. Another difference between the two is the choice of colors: Styri uses brighter, almost neon colors for its syntax, whereas Atom standard uses paler ones. Also, in all the examples in this study, the Atom standard highlighting uses more colors in total than Styri does.

Figure 2.2: Atom standard example

Figure 2.3: Styri example

Chapter 3

Method

3.1 Design

A practical experiment was performed to compare the effectiveness of Atom standard and Styri syntax highlighting. The experiment is loosely based on “The impact of syntax colouring on program comprehension” by Advait Sarkar [5] and follows a similar structure. In our experiment, short snippets of code were shown to a participant, each paired with a question; the participant then stated out loud which of four given alternatives the code resulted in. The test had four different versions, and the participants were randomly divided into four groups, one for each version. All versions had the same questions in the same order: 1a, 2b, 3a, 1b, 2a, 3b. The only difference was which code snippets had syntax highlighting, as seen in Figure 3.1. During the experiment, the number of correct answers of each participant was recorded, as well as their time to complete each question; time to complete is used as the primary measure of how effective syntax highlighting is. An administrator noted the answers and measured the time using a stopwatch. After the experiment, each participant answered a short survey, with answers shown in Figure 4.4, in which they described their previous coding experience. The two questions asked for this purpose were: “Did you program before taking this course in programming?” and “How well would you rate your understanding of the course?”. The purpose was to determine whether there is any correlation between coding experience and the effect of syntax highlighting, which could also help reveal whether experience was unevenly distributed among the participants.

Figure 3.1: The four versions of the experiment. The vertical axis shows the version of the experiment; the horizontal axis shows the question number.
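The text states only that the participants were randomly divided into four groups, one per version. A simple way to make such a division while keeping the groups equally large is to shuffle the participant list and deal it out in turn. The Java sketch below illustrates this under our own assumptions: the class name VersionAssignment, the fixed seed and the numbering of participants 1 to 8 are hypothetical. It also does not encode which questions each version highlighted, since that information is only available in Figure 3.1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: shuffle the participants, then deal them round-robin
// into the versions so that every version receives the same number of people
// (when the count divides evenly).
final class VersionAssignment {

    static List<List<Integer>> assign(List<Integer> participantIds, int versions, long seed) {
        List<Integer> shuffled = new ArrayList<>(participantIds);
        Collections.shuffle(shuffled, new Random(seed)); // fixed seed => reproducible draw
        List<List<Integer>> groups = new ArrayList<>();
        for (int v = 0; v < versions; v++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < shuffled.size(); i++) {
            groups.get(i % versions).add(shuffled.get(i)); // deal like cards
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        List<List<Integer>> groups = assign(ids, 4, 2019L);
        for (int v = 0; v < groups.size(); v++) {
            System.out.println("Version " + (v + 1) + ": participants " + groups.get(v));
        }
    }
}
```

Fixing the seed makes the draw reproducible, so the way the groups were formed can be documented and repeated afterwards.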

3.2 Selection

The participants of this study are students at a technical high school, and their previous experience in programming came mostly from their course Programming 1 [14]. In the course, students are expected to learn the basics of programming, including but not limited to knowledge of data structures, structural problem-solving, and experience in at least one programming language. The students had taken the course for a little over half a year and were therefore expected to have some knowledge of basic programming. This group was chosen because programming is not taught before high school level in the Swedish curriculum, so the students could be expected to have similar levels of experience in the field.

The questions for our experiment are written in pairs: the two questions of a pair look very similar to each other and are close in difficulty, but one has syntax highlighting and the other does not. This allowed us to compare the completion times of the two questions and thereby determine the effect of syntax highlighting. It could, however, also make the second question a participant encounters easier, since it is similar in structure and content to the first; this is discussed further in Section 5.2. To assess the questions, we tested several different questions on a volunteer with a similar background in programming; the volunteer was not part of the test group. The results allowed us to fine-tune the questions and their difficulty. This very small-scale pilot experiment also provided valuable insights into many of the practical aspects of testing.

The reason for the different versions of the experiment, shown in Figure 3.1, was to compare the effect of syntax highlighting within each individual, so that differences in individual reading speed and skill would affect the results as little as possible. The versions also make it possible to assess whether the order of the questions had a larger impact than syntax highlighting. The code was written to be intuitive and easy to read, with large text, and did not contain any unusual quirks of Java or “traps”. A “trap” is code written to trick the programmer into choosing the wrong answer by making the real purpose of the code harder to grasp. This study aims to compare syntax highlightings in their ideal environment, not to test the awareness or experience of the participants.

3.3 Ethical considerations

All participants in our study were volunteers and took part willingly, without coercion or other outside influence. As mentioned previously, the participants were students; after talking to their mentor, we visited one of their lectures to ask for volunteers. No personal or identifying information about the participants was recorded, and they were informed that they would remain anonymous in our study. Throughout the experiment, we did our best to maintain a calm and welcoming atmosphere and not to pressure or stress the participants.

Chapter 4

Results

Figure 4.1: Time in seconds for each participant per question, split by test version. Correct answers are marked in green and incorrect ones in red. The self-assessment score is the participant's own assessment of their programming skills, on a scale of one to five where five is the highest.


Figure 4.2: Average time in seconds spent per question, with and without syntax highlighting, split by the form of syntax highlighting used. Lower is better.
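The averages in Figure 4.2 can be understood as grouping every timed answer by the participant's highlighting group and by whether that particular snippet was highlighted, and then taking the mean of each cell. The following Java sketch shows such an aggregation. The record layout, the group labels "Atom" and "Styri" and the demonstration numbers are hypothetical; they are not the measurements of this study.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the aggregation behind a chart like Figure 4.2:
// mean completion time per (highlighting group, highlighted or plain) cell.
final class AverageTimes {

    // One timed answer. 'group' is the highlighting scheme the participant was
    // assigned; 'highlighted' says whether this snippet was shown with that
    // scheme or as plain text.
    record Measurement(String group, boolean highlighted, double seconds) {}

    static Map<String, Double> meanByCell(List<Measurement> data) {
        Map<String, double[]> sums = new TreeMap<>(); // key -> {sum, count}
        for (Measurement m : data) {
            String key = m.group() + (m.highlighted() ? " / highlighted" : " / plain");
            double[] acc = sums.computeIfAbsent(key, k -> new double[2]);
            acc[0] += m.seconds();
            acc[1] += 1;
        }
        Map<String, Double> means = new TreeMap<>();
        sums.forEach((key, acc) -> means.put(key, acc[0] / acc[1]));
        return means;
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration only; these are NOT the study's data.
        List<Measurement> demo = List.of(
            new Measurement("Styri", true, 40.0),
            new Measurement("Styri", false, 50.0),
            new Measurement("Styri", false, 60.0),
            new Measurement("Atom", true, 45.0));
        System.out.println(meanByCell(demo));
    }
}
```

Keeping the plain-text questions of each group as their own cell is what makes it possible to notice effects such as one group simply being faster overall, as discussed in Section 5.3.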

Figure 4.3: Difference in time, in seconds, between the first and second question of each pair, per person, and the average difference for each pair. Pairs where the first question took longer are marked in yellow; pairs where the second took longer are marked in orange.
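Because all versions used the order 1a, 2b, 3a, 1b, 2a, 3b, the first-seen question of each pair is 1a, 2b and 3a respectively. The differences in Figure 4.3 can then be computed as first minus second for each person and averaged per pair, as in the hypothetical Java sketch below. The sign convention (positive when the first-seen question took longer, matching the yellow case in the caption) and the class name are our assumptions, and the demonstration times are made up.

```java
// Hypothetical sketch of the order-effect check behind Figure 4.3.
// For each pair, the first-seen question is 1a, 2b or 3a.
final class OrderEffect {

    // Positive: the first-seen question took longer ("yellow");
    // negative: the second-seen question took longer ("orange").
    static double difference(double firstSeconds, double secondSeconds) {
        return firstSeconds - secondSeconds;
    }

    // Mean first-minus-second difference for one pair across participants.
    static double meanDifference(double[] firstTimes, double[] secondTimes) {
        if (firstTimes.length != secondTimes.length || firstTimes.length == 0) {
            throw new IllegalArgumentException("need one time per participant for both questions");
        }
        double sum = 0;
        for (int i = 0; i < firstTimes.length; i++) {
            sum += difference(firstTimes[i], secondTimes[i]);
        }
        return sum / firstTimes.length;
    }

    public static void main(String[] args) {
        // Made-up times for three participants; NOT data from the study.
        double[] q1a = {30, 25, 40};
        double[] q1b = {20, 25, 30};
        System.out.println("Mean 1a - 1b: " + meanDifference(q1a, q1b) + " s");
    }
}
```

Because the difference is always taken in presentation order rather than highlighted minus plain, a consistent sign across all versions points to an order effect rather than an effect of highlighting.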

Figure 4.4: Answers to the survey questions regarding previous experience as well as what they found difficult in the study.

Chapter 5

Conclusion and Discussion

5.1 Conclusion

The original question was: “Which of Atom standard highlight and Styri’s syntax highlight results in the fastest code comprehension compared to having no syntax highlight?”. This question can be interpreted as three individual questions: is Atom standard highlighting faster than no syntax highlighting; is Styri's syntax highlighting faster than no syntax highlighting; and which of Styri and Atom standard results in the fastest code comprehension? Contrary to our hypothesis and some of the previous work in this field, we found no significant difference in performance between either of the syntax highlightings and plain text, as seen in Figure 4.2. It is worth noting that the participants with the best scores in the study also benefited most from syntax highlighting. There might therefore be a correlation between programming experience and the benefit gained from syntax highlighting.
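The conclusion above does not state which statistical test lies behind the word "significant". Purely as an illustration of what such a check could look like for a within-subject design like ours, the Java sketch below computes a paired t statistic over per-participant differences d_i, where d_i is the participant's mean time with highlighting minus their mean time without. The choice of test, the class name PairedT and the example differences are our assumptions, not the analysis performed in this study. The critical value 2.365 is the two-sided 5% value of Student's t distribution with 7 degrees of freedom and therefore only applies to eight participants.

```java
// Hypothetical sketch: paired t statistic for per-participant differences
// d_i = (mean time with highlighting) - (mean time without highlighting).
final class PairedT {

    static double tStatistic(double[] d) {
        int n = d.length;
        if (n < 2) throw new IllegalArgumentException("need at least two participants");
        double mean = 0;
        for (double x : d) mean += x;
        mean /= n;
        double ss = 0;
        for (double x : d) ss += (x - mean) * (x - mean);
        double sd = Math.sqrt(ss / (n - 1));  // sample standard deviation
        return mean / (sd / Math.sqrt(n));    // t with n - 1 degrees of freedom
    }

    public static void main(String[] args) {
        // Made-up differences in seconds for eight participants; NOT the study's data.
        double[] d = {-3, 2, -5, 1, 0, -4, 2, -1};
        double t = tStatistic(d);
        // Two-sided 5% critical value for 7 degrees of freedom (n = 8 only).
        double critical = 2.365;
        System.out.printf("t = %.3f, significant at 5%%: %b%n", t, Math.abs(t) > critical);
    }
}
```

With only eight participants, even a moderate average difference gives a small t value when the individual differences vary widely, which matches the difficulty of drawing conclusions discussed in Section 5.3.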

5.2 Discussion

In Figure 4.1, two participants stand out: person 3 and person 8. Both show a notable advantage when using syntax highlighting and are the only ones with six correct answers. A possible conclusion would therefore be that people who are experienced and knowledgeable in programming gain an advantage from syntax highlighting. This conclusion assumes that the people who answered the programming questions correctly had more experience than those who did not. It is supported by some of the answers in Figure 4.4 to the questions given after the test, where person 8 rated their understanding of the lessons as four out of five. However, because both persons 3 and 8 claimed to have had no programming experience before the programming lessons, it is unlikely that the conclusion stated above is entirely correct. This doubt is reinforced by person 3 answering that "Everything" was difficult, as seen in Figure 4.4.

To evaluate whether the order of the questions had an impact on the results, we analyzed the data shown in Figure 4.3. From this, we can establish that on questions 1a / 1b, participants spent more time on the question of the pair they saw first: question 1a. In contrast, on questions 2a / 2b, participants generally spent more time on the second question: 2a. For the final pair, questions 3a and 3b, there was no conclusive trend among the participants. These differences in speed applied to both pairs 1a / 1b and 2a / 2b regardless of which syntax highlighting was used and which question of the pair had syntax highlighting enabled. A possible explanation is either that questions 1a and 2a were both harder than their counterparts, or that questions 1a and 1b were too similar, so that participants knew what to look out for on the second question. Another potential explanation is that the order of the questions matters more for shorter questions, as questions 1a / 1b were shorter than both 2a / 2b and 3a / 3b.

In the study performed by Sarkar [5], the conclusion was that syntax highlighting generally improved task completion speed and was a net positive. In this study, we did not reach the same general conclusion based on average results, and some of the participants even performed better without highlighting. Our participants had notably less experience in programming than those in Sarkar's study: his participants were graduate computer science students, while ours had less than one year of experience.
In Sarkar's paper, it was concluded that the effect of highlighting diminished with the experience of the participant [5]; in this study, we found some evidence that the better performers also saw the most prominent positive effect of syntax highlighting. From this, a possible conclusion is that some programming experience is needed before syntax highlighting has any effect, but that once a certain level is reached, the effect starts to diminish.
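The suggested relationship between performance and the benefit of highlighting could be quantified with a correlation coefficient. The Java sketch below computes Pearson's r between a per-participant score, such as the number of correct answers, and the time saved with highlighting. This is a hypothetical illustration rather than an analysis carried out in this study; the class name and the example values are our own, and with only eight participants any such coefficient would be very uncertain.

```java
// Hypothetical sketch: Pearson correlation between a per-participant score
// (e.g. number of correct answers) and the time saved with highlighting.
final class Correlation {

    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equally long series of length >= 2");
        }
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy); // in [-1, 1]; NaN if either series is constant
    }

    public static void main(String[] args) {
        // Made-up values for five participants; NOT the study's data.
        double[] correctAnswers = {3, 4, 4, 5, 6};
        double[] secondsSaved = {-2, 0, 1, 3, 6};
        System.out.printf("r = %.3f%n", pearson(correctAnswers, secondsSaved));
    }
}
```

A value close to 1 would support the idea that better performers gain more from highlighting, but a correlation computed on so few people says little on its own about cause or about experience as such.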

5.3 Problems with the Study

The experiment took place in a school corridor, so the environment was not fully controlled during the study, and the setting differed somewhat between participants. A critical change in the environment was noise from other students: for the early participants there was little to no noise, but for the later participants there was some chatter and music that could have distracted them and affected their performance. No participant remarked on the noise or showed any apparent signs of being distracted, and it is therefore reasonable to assume that it did not sway the results of this study to any large extent. A problem identified early was that the two questions in each pair had to be similar enough in difficulty to take about the same amount of time, without being so similar that the participant would immediately understand the second question of the pair. Another major source of error was the time measurement, especially for the first participant. The physical setup made timing difficult; this was much improved before the second participant, and the error is therefore not considered large enough to change the conclusion. One of the limitations of this study was that we could not establish the participants' previous programming experience. The only information we could obtain about their coding experience was their self-assessment of their skill, shown in Figure 4.1, and some additional comments, shown in Figure 4.4; both of these are very subjective. Because of this, we have no objective way of determining whether the participants of our study had significantly varying experience; all of the participants claimed to have no previous background in programming and rated themselves very similarly in the survey. There were notable differences in their performance on the questions, and the available results do not allow us to conclude whether these were due to differences in experience.
One such example is in Figure 4.2, where the Styri group performed better than the Standard group on the questions without any syntax highlighting; this could imply that the Styri group was, in general, faster at reading code. One of the biggest problems with this study is the number of participants: with only 8 participants, it was very difficult to establish whether any observed effect is real or just random variation. For the same reason, any difference in experience between the participants, or a minor interruption during one of the sessions, had a considerable impact on the overall result of the study.

5.4 Future Work

Because our study did not produce any conclusive evidence on whether one syntax highlighting is superior to others, we recommend that future research in this field iterate on and improve our methodology. Specifically, two clear improvements could be made to our study: 1) control the previous experience of the participants, which could be done with a test or a more extensive survey; and 2) use a larger sample of participants, since with eight people in our study it was difficult to draw any general conclusions. Our study focused mainly on how fast participants answered the questions; other areas to study could be how well they understand the code, how well that understanding is retained over time, and how much of a difference syntax highlighting makes when writing code.

Bibliography

[1] Wilfred J. Hansen. “User engineering principles for interactive systems”. In: Proceedings of the November 16-18, 1971, Fall Joint Computer Conference. ACM, 1971, pp. 523–532.
[2] GitHub: material-design-lite. url: https://github.com/google/material-design-lite.
[3] Robert C. Martin. Clean Code: A Handbook of Agile Software Craftsmanship. Prentice Hall, 2008. isbn: 9780132350884. url: https://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882?SubscriptionId=AKIAIOBINVZYXZQZ2U3A&tag=chimbori05-20&linkCode=xm2&camp=2025&creative=165953&creativeASIN=0132350882.
[4] Ranking the Top 5 Code Editors in 2019. url: https://www.software.com/review/ranking-the-top-5-code-editors-2019.
[5] Advait Sarkar. “The impact of syntax colouring on program comprehension.” In: PPIG. 2015.
[6] Programming language. url: https://www.cs.mcgill.ca/~rwest/wikispeedia/wpcd/wp/p/Programming_language.htm.
[7] Java Code Conventions. url: https://www.oracle.com/technetwork/java/codeconventions-150003.pdf.
[8] Introduction to Programming Languages/Ambiguity. url: https://en.wikibooks.org/wiki/Introduction_to_Programming_Languages/Ambiguity.
[9] Wilfred J. Hansen. “EMILY User's Manual”. In: Applied Math. Div., Argonne Nat'l Lab., Argonne, Ill. (Dec. 1970).
[10] Terrence Dorsey. Semantic Code Highlighting. July 2014. url: https://visualstudiomagazine.com/articles/2014/08/01/semantic-code-highlighting.aspx.
[11] Anita H. Klock and Jan B. Chodak. Syntax error correction method and apparatus.
[12] Anita H. Klock and Jan B. Chodak. Syntax Error Correction Method and Apparatus. Oct. 1986.
[13] Tuomas Hakala, Pekka Nykyri, and Jorma Sajaniemi. “An Experiment on the Effects of Program Code Highlighting on Visual Search for Local Patterns.” In: PPIG. Citeseer, 2006, p. 10.
[14] Ämne - Programmering. url: https://www.skolverket.se/undervisning/gymnasieskolan/laroplan-program-och-amnen-i-gymnasieskolan/gymnasieprogrammen/amne?url=1530314731/syllabuscw/jsp/subject.htm?subjectCode%3DPRR%26lang%3Dsv%26tos%3Dgy%26p%3Dp&sv.url=12.5dfee44715d35a5cdfa92a3.

Figure 1: Question 1a.

Figure 2: Question 2b.

Figure 3: Question 3a.

Figure 4: Question 1b.

Figure 5: Question 2a.

Figure 6: Question 3b.

Figure 7: The survey questions for self-assessment.

TRITA-EECS-EX-2019:337

www.kth.se