Syntax Errors Just Aren't Natural: Improving Error Reporting With

Total Page:16

File Type:pdf, Size:1020Kb

Syntax Errors Just Aren't Natural: Improving Error Reporting With Syntax Errors Just Aren’t Natural: Improving Error Reporting with Language Models Joshua Charles Abram Hindle José Nelson Amaral Campbell Department of Computing Department of Computing Department of Computing Science Science Science University of Alberta University of Alberta University of Alberta Edmonton, Canada Edmonton, Canada Edmonton, Canada [email protected] [email protected] [email protected] ABSTRACT programmers get frustrated by syntax errors, often resorting A frustrating aspect of software development is that com- to erratically commenting their code out in the hope that piler error messages often fail to locate the actual cause of they discover the location of the error that brought their a syntax error. An errant semicolon or brace can result in development to a screeching halt [14]. Syntax errors are many errors reported throughout the file. We seek to find annoying to programmers, and sometimes very hard to find. the actual source of these syntax errors by relying on the con- Garner et al. [4] admit that “students very persistently sistency of software: valid source code is usually repetitive keep seeking assistance for problems with basic syntactic de- and unsurprising. We exploit this consistency by construct- tails.” They, corroborated by numerous other studies [11, 12, ing a simple N-gram language model of lexed source code 10, 17], found that errors related to basic mechanics (semi- tokens. We implemented an automatic Java syntax-error lo- colons, braces, etc.) were the most persistent and common cator using the corpus of the project itself and evaluated its problems that beginner programmers faced. In fact these performance on mutated source code from several projects. errors made up 10% to 20% of novice programmer compiler Our tool, trained on the past versions of a project, can ef- problems. As an example, consider the missing brace at fectively augment the syntax error locations produced by the end of line 2 in the following Java code taken from the the native compiler. Thus we provide a methodology and Lucene 4.0.0 release: 1 for (int i = 0; i < scorers.length; i++) { tool that exploits the naturalness of software source code to 2 if (scorers[i].nextDoc() == NO_MORE_DOCS) detect syntax errors alongside the parser. 5 lastDoc = NO_MORE_DOCS; 6 return; Categories and Subject Descriptors 7 } 8 } D.2.5 [Software Engineering]: Testing and Debugging— Debugging aids, diagnostics; D.3.8 [Programming Lan- This mistake, while easy for an experienced programmer guages]: Processors—Parsing to understand and fix, if they know where to look, causes the Oracle Java compiler 1 to report 50 error messages, includ- General Terms ing those in Figure 1, none of which are on the line with the mistake. This poor error reporting slows down the software Languages, Human Factors development process because a human programmer must ex- amine the source file to locate the error, which can be a very Keywords time consuming process. naturalness, language, n-grams, syntax, error location, NLP Some smart IDEs such as Eclipse, aim to address this problem but still fall short of locating the actual cause of the syntax error as seen in Figure 2. Thus syntax errors and 1. MOTIVATION poor compiler/interpreter error messages have been found Syntax errors plague new programmers as they struggle to be a major problem for inexperienced programmers [16, to learn computer programming [16, 4]. Even experienced 4, 17]. There has been much work on trying to improve error loca- Permission to make digital or hard copies of all or part of this work for tion detection and error reporting. Previous methods tend personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear to rely on rules and heuristics [3, 6, 8]. In contrast to those this notice and the full citation on the first page. Copyrights for components methods, we seek to improve on error location reporting by of this work owned by others than ACM must be honored. Abstracting with exploiting recent research on the regularity and naturalness credit is permitted. To copy otherwise, or republish, to post on servers or to of written source code [7]. Recent research by Hindle et redistribute to lists, requires prior specific permission and/or a fee. Request al. [7] exploits n-gram based language models, applied to permissions from [email protected]. existing source code, to predict tokens for code completion. MSR ’14, May 31 - June 07 2014, Hyderabad, India Copyright is held by the owner/author(s). Publication rights licensed to 1 ACM. Oracle’s Java compiler, version 1.7.0 13, is available ACM 978-1-4503-2863-0/14/05...$15.00. at http://www.oracle.com/technetwork/java/javase/ http://dx.doi.org/10.1145/2597073.2597102. downloads/index.htm Figure 1: Oracle Java Error messages from the motivational example. The effectiveness of the n-gram model at code completion and suggestion is due to the repetitive, and consistent, struc- ture of code. Using this exact same model, and exploiting the natural structure of source code, we can improve error location de- tection. The idea behind this new method is to train a model on compilable source-code token sequences, and then evaluate on new code to see how often those sequences oc- cur within the model. Source code that does not compile should be surprising for an n-gram language model trained on source code that compiles. Intuitively, most available and distributed source code compiles. Projects and their source code are rarely released with syntax errors, although this property may depend on the project’s software develop- ment process. Thus, existing software can act as a corpus of compilable and working software. Furthermore, whenever a source file successfully compiles it can automatically up- date the training corpus because the goal is to augment the compiler’s error messages. Changes to the software might not compile. When that happens, locations in the source file that the model finds surprising should be prioritized as locations that a software engineer should examine. The following sections of the paper examine the feasibility of this new method by testing its ability to locate errors in a variety of situations, as an augmentation of compiler Figure 2: Top: Figure 1’s code snippet put into error reporting. These situations include situations which Eclipse; bottom: another example of failed Eclipse are much more adverse than expected in practice. syntax error location detection. Eclipse often de- The main contributions of this work include: tects syntax errors, but often reports them in the • A method of statistical, probabilistic syntax-error loca- wrong location. Eclipse’s suggested error location is tion detection that exploits n-gram language models. circled in red, and the actual error location is circled in green. • A prototype implementation of an n-gram language- model-based Java syntax error locator, UnnaturalCode,2 that can be used with existing build systems and Java 2. BACKGROUND compilers to suggest locations that might contain syn- An n-gram language model at its lowest level is simply a tax errors. collection of counts. These counts represent the number of times a phrase appears in a corpus. These phrases are re- • A validation of the feasibility of the new syntax error ferred to as n-grams because they consist of at most n words location detection method. or tokens. These counts are then used to infer the probabil- ity of a phrase: the probability is simply the frequency of • A validation of the integration of the new method with occurrence of the phrase in the original corpus. For example, the compiler’s own methods. if the phrase “I’m a little teapot” occurred 7 times in a cor- 3 pus consisting of 700 4-grams, its probability would be .01. • A modified version of MITLM, which has routines devel- However, it is more useful to consider the probability of a oped by the authors to calculate the entropy of short word in its surrounding context. The probability of finding sentences with respect to a large corpus quickly. the word “little” given the context of “I’m a teapot” is much higher because “little” may be the only word that 2 shows up in that context — a probability of 1. UnnaturalCode is available at https://github.com/ orezpraw/unnaturalcode These n-gram models become more accurate as n increases 3The modified MITLM package used in this pa- because they can count longer, more specific phrases. How- per is available at https://github.com/orezpraw/ ever, this relationship between n and accuracy is also prob- MIT-Language-Modeling-Toolkit lematic because most n-grams will not exist in a corpus of human-generated text (or code). Therefore most n-grams Cross Entropy of Project Code compared to English would have a probability of zero for a large enough n. This Code To Code issue can be addressed by using a smoothed model. Smooth- Lucene to Lucene Xerces to Xerces ing increases the accuracy of the model by estimating the English to English probability of unseen n-grams from the probabilities of the largest m-grams (where m<n) that the n-gram consists of and that exist in the corpus. For example, if the corpus does not contain the phrase “I’m a little teapot” but it does con- tain the phrases“I’m a”and“little teapot”it would estimate the probability of“I’m a little teapot”using a function of the two probabilities it does know. In UnnaturalCode, however, the entropy is measured in bits. Entropy in bits is simply CrossEntropy S = − log2 (p) , where p is the probability.
Recommended publications
  • Compiler Error Messages Considered Unhelpful: the Landscape of Text-Based Programming Error Message Research
    Working Group Report ITiCSE-WGR ’19, July 15–17, 2019, Aberdeen, Scotland Uk Compiler Error Messages Considered Unhelpful: The Landscape of Text-Based Programming Error Message Research Brett A. Becker∗ Paul Denny∗ Raymond Pettit∗ University College Dublin University of Auckland University of Virginia Dublin, Ireland Auckland, New Zealand Charlottesville, Virginia, USA [email protected] [email protected] [email protected] Durell Bouchard Dennis J. Bouvier Brian Harrington Roanoke College Southern Illinois University Edwardsville University of Toronto Scarborough Roanoke, Virgina, USA Edwardsville, Illinois, USA Scarborough, Ontario, Canada [email protected] [email protected] [email protected] Amir Kamil Amey Karkare Chris McDonald University of Michigan Indian Institute of Technology Kanpur University of Western Australia Ann Arbor, Michigan, USA Kanpur, India Perth, Australia [email protected] [email protected] [email protected] Peter-Michael Osera Janice L. Pearce James Prather Grinnell College Berea College Abilene Christian University Grinnell, Iowa, USA Berea, Kentucky, USA Abilene, Texas, USA [email protected] [email protected] [email protected] ABSTRACT of evidence supporting each one (historical, anecdotal, and empiri- Diagnostic messages generated by compilers and interpreters such cal). This work can serve as a starting point for those who wish to as syntax error messages have been researched for over half of a conduct research on compiler error messages, runtime errors, and century. Unfortunately, these messages which include error, warn- warnings. We also make the bibtex file of our 300+ reference corpus ing, and run-time messages, present substantial difficulty and could publicly available.
    [Show full text]
  • Common Logic Errors for Programming Learners: a Three-Decade Litera- Ture Survey
    Paper ID #32801 Common Logic Errors for Programming Learners: A Three-decade Litera- ture Survey Nabeel Alzahrani, University of California, Riverside Nabeel Alzahrani is a Computer Science Ph.D. student in the Department of Computer Science and En- gineering at the University of California, Riverside. Nabeel’s research interests include causes of student struggle, and debugging methodologies, in introductory computer programming courses. Prof. Frank Vahid, University of California, Riverside Frank Vahid is a Professor of Computer Science and Engineering at the Univ. of California, Riverside. His research interests include embedded systems design, and engineering education. He is a co-founder of zyBooks.com. c American Society for Engineering Education, 2021 Common Logic Errors for Programming Learners: A Three- Decade Literature Survey Nabeel Alzahrani, Frank Vahid* Computer Science and Engineering University of California, Riverside {nalza001, vahid}@ucr.edu *Also with zyBooks Abstract We surveyed common logic errors made by students learning programming in introductory (CS1) programming classes, as reported in 47 publications from 1985 to 2018. A logic error causes incorrect program execution, in contrast to a syntax error, which prevents execution. Logic errors tend to be harder to detect and fix and are more likely to cause students to struggle. The publications described 166 common logic errors, which we classified into 11 error categories: input (2 errors), output (1 error), variable (7 errors), computation (21 errors), condition (18 errors), branch (14 errors), loop (27 errors), array (5 errors), function (24 errors), conceptual (43 errors), and miscellaneous (4 errors). Among those errors, we highlighted 43 that seemed to be the most common and/or troublesome.
    [Show full text]
  • Automated Debugging of Java Syntax Errors Through Diagnosis
    Uti and Fox Automated debugging of Java syntax errors through diagnosis Ngozi V. Uti and Dr. Richard Fox Department of Mathematics and Computer Science Northern Kentucky University Highland Heights, KY 41099 Abstract Syntax errors arise in a program because the program code misuses the programming language. When these errors are detected by a compiler, they are not necessarily easy to debug because the compiler error message that reports the error is often misleading. JaSD, a knowledge-based system written in Java, is a program that can determine the cause of a given syntax error and suggest how to correct that error. JaSD uses artificial intelligence techniques to perform debugging. Currently, JaSD has enough knowledge to debug one of four different Java syntax errors. This paper describes the JaSD system and the diagnostic approach taken and offers some examples. Conclusions describe how JaSD can be expanded to debug additional errors. Introduction expected error is common because a semicolon is used Syntax errors can arise in a program because the in Java, C, and C++ to end instructions and in Pascal to program code misuses the programming language. The delineate (separate) instructions. Forgetting a semico- compiler, in its attempt to translate the high-level lan- lon is common and yields the semicolon-expected guage program into machine language, has reached an error in any of these languages. Thus, the programmer instruction that it does not understand. The error might learns how the error can arise. However, there are be because the programmer misused the high-level lan- numerous possible causes for this error, omitting a guage (e.g., improper punctuation or not structuring a semicolon being only one of them.
    [Show full text]
  • Introduction to Computers and Programming
    M01_GADD7119_01_SE_C01.QXD 1/30/08 12:55 AM Page 1 Introduction to Computers CHAPTER 1 and Programming TOPICS 1.1 Introduction 1.4 How a Program Works 1.2 Hardware and Software 1.5 Using Python 1.3 How Computers Store Data 1.1 Introduction Think about some of the different ways that people use computers. In school, students use com- puters for tasks such as writing papers, searching for articles, sending email, and participating in online classes. At work, people use computers to analyze data, make presentations, conduct busi- ness transactions, communicate with customers and coworkers, control machines in manufac- turing facilities, and do many other things. At home, people use computers for tasks such as pay- ing bills, shopping online, communicating with friends and family, and playing computer games. And don’t forget that cell phones, iPods®, BlackBerries®, car navigation systems, and many other devices are computers too. The uses of computers are almost limitless in our everyday lives. Computers can do such a wide variety of things because they can be programmed. This means that computers are not designed to do just one job, but to do any job that their programs tell them to do. A program is a set of instructions that a computer follows to perform a task. For example, Figure 1-1 shows screens from two commonly used programs, Microsoft Word and Adobe Photoshop. Microsoft Word is a word processing program that allows you to create, edit, and print documents with your computer. Adobe Photoshop is an image editing program that allows you to work with graphic images, such as photos taken with your digital camera.
    [Show full text]
  • Compile and Runtime Errors in Java
    Compile and Runtime Errors in Java Mordechai (Moti) Ben-Ari Department of Science Teaching Weizmann Institute of Science Rehovot 76100 Israel http://stwww.weizmann.ac.il/g-cs/benari/ January 24, 2007 This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/ by-nc-nd/2.5/; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. Acknowledgement Otto Seppälä and Niko Myller contributed many helpful suggestions. Contents 1 Introduction 5 2 Compile-time Errors 6 2.1 Syntax errors . 7 2.1.1 . expected . 7 2.1.2 unclosed string literal . 7 2.1.3 illegal start of expression . 8 2.1.4 not a statement . 8 2.2 Identifiers . 10 2.2.1 cannot find symbol . 10 2.2.2 . is already defined in . 10 2.2.3 array required but . found . 10 2.2.4 . has private access in . 10 2.3 Computation . 11 2.3.1 variable . might not have been initialized . 11 2.3.2 . in . cannot be applied to . 11 2.3.3 operator . cannot be applied to . ,. 12 2.3.4 possible loss of precision . 12 2.3.5 incompatible types . 12 2.3.6 inconvertible types . 13 2.4 Return statements . 13 2.4.1 missing return statement . 13 2.4.2 missing return value . 14 2.4.3 cannot return a value from method whose result type is void . 14 2.4.4 invalid method declaration; return type required .
    [Show full text]
  • Compile and Runtime Errors in Java
    Compile and Runtime Errors in Java Mordechai (Moti) Ben-Ari Department of Science Teaching Weizmann Institute of Science Rehovot 76100 Israel http://stwww.weizmann.ac.il/g-cs/benari/ August 27, 2007 This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/ by-nc-nd/2.5/; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. Acknowledgement Otto Seppälä and Niko Myller contributed many helpful suggestions. Contents 1 Introduction 5 2 Compile-time Errors 6 2.1 Syntax errors . 7 2.1.1 . expected . 7 2.1.2 unclosed string literal . 7 2.1.3 illegal start of expression . 8 2.1.4 not a statement . 8 2.2 Identifiers . 10 2.2.1 cannot find symbol . 10 2.2.2 . is already defined in . 10 2.2.3 array required but . found . 10 2.2.4 . has private access in . 10 2.3 Computation . 11 2.3.1 variable . might not have been initialized . 11 2.3.2 . in . cannot be applied to . 11 2.3.3 operator . cannot be applied to . ,. 12 2.3.4 possible loss of precision . 12 2.3.5 incompatible types . 12 2.3.6 inconvertible types . 13 2.4 Return statements . 13 2.4.1 missing return statement . 13 2.4.2 missing return value . 14 2.4.3 cannot return a value from method whose result type is void . 14 2.4.4 invalid method declaration; return type required .
    [Show full text]
  • Using Language Models to Detect and Correct Syntax Errors
    Syntax and Sensibility: Using language models to detect and correct syntax errors Eddie Antonio Santos, Joshua Charles Campbell, Dhvani Patel, Abram Hindle, and José Nelson Amaral Department of Computing Science University of Alberta Edmonton, Alberta, Canada {easantos,joshua2,dhvani,hindle1,jamaral}@ualberta.ca Abstract—Syntax errors are made by novice and experienced Listing 1: Syntactically invalid Java code. An open brace ({) is missing programmers alike; however, novice programmers lack the years at the end of line 3. of experience that help them quickly resolve these frustrating errors. Standard LR parsers are of little help, typically resolving 1 public class A{ public static void main(String args[]) { syntax errors and their precise location poorly. We propose 2 3 if (args.length < 2) a methodology that locates where syntax errors occur, and 4 System.out.println("Not enough args!"); suggests possible changes to the token stream that can fix the 5 System.exit(1); error identified. This methodology finds syntax errors by using 6 } language models trained on correct source code to find tokens 7 System.out.println("Hello, world!"); that seem out of place. Fixes are synthesized by consulting the 8 } language models to determine what tokens are more likely at the 9 } estimated error location. We compare n-gram and LSTM (long short-term memory) language models for this task, each trained input to a Java 8 compiler such as OpenJDK’s javac [4], on a large corpus of Java code collected from GitHub. Unlike and it reports that there is an error with the source file, but it prior work, our methodology does not rely that the problem overwhelms the user with misleading error messages to identify source code comes from the same domain as the training data.
    [Show full text]
  • Python for Everybody
    Python for Everybody Exploring Data Using Python 3 Charles R. Severance 1.10. WHAT COULD POSSIBLY GO WRONG? 13 I hate you Python! ^ SyntaxError: invalid syntax >>> if you come out of there, I would teach you a lesson File "<stdin>", line 1 if you come out of there, I would teach you a lesson ^ SyntaxError: invalid syntax >>> There is little to be gained by arguing with Python. It is just a tool. It has no emotions and it is happy and ready to serve you whenever you need it. Its error messages sound harsh, but they are just Python’s call for help. It has looked at what you typed, and it simply cannot understand what you have entered. Python is much more like a dog, loving you unconditionally, having a few key words that it understands, looking you with a sweet look on its face (>>>), and waiting for you to say something it understands. When Python says “SyntaxError: invalid syntax”, it is simply wagging its tail and saying, “You seemed to say something but I just don’t understand what you meant, but please keep talking to me (>>>).” As your programs become increasingly sophisticated, you will encounter three gen- eral types of errors: Syntax errors These are the first errors you will make and the easiest to fix. A syntax error means that you have violated the “grammar” rules of Python. Python does its best to point right at the line and character where it noticed it was confused. The only tricky bit of syntax errors is that sometimes the mistake that needs fixing is actually earlier in the program than where Python noticed it was confused.
    [Show full text]
  • Finding and Correcting Syntax Errors Using Recurrent Neural Networks
    Finding and correcting syntax errors using recurrent neural networks Eddie Antonio Santos1, Joshua Charles Campbell1, Abram Hindle1, and Jose´ Nelson Amaral1 1Department of Computing Science, University of Alberta, Edmonton, Canada ABSTRACT Minor syntax errors are made by novice and experienced programmers alike; however, novice program- mers lack the years of intuition that help them resolve these tiny errors. Standard LR parsers typically resolve syntax errors and their precise location poorly. We propose a methodology that helps locate where syntax errors occur, but also suggests possible changes to the token stream that can fix the error identified. This methodology finds syntax errors by checking if two language models “agree” on each token. If the models disagree, it indicates a possible syntax error; the methodology tries to suggest a fix by finding an alternative token sequence obtained from the models. We trained two LSTM (Long short-term memory) language models on a large corpus of JavaScript code collected from GitHub. The dual LSTM neural network model predicts the correct location of the syntax error 54.74% in its top 4 suggestions and produces an exact fix up to 35.50% of the time. The results show that this tool and methodology can locate and suggest corrections for syntax errors. Our methodology is of practical use to all programmers, but will be especially useful to novices frustrated with incomprehensible syntax errors. Keywords: syntax error correction; LSTM; JavaScript; RNN; syntax error; deep learning; neural network; n-gram; program repair; GitHub 1 INTRODUCTION Computer program source code is often expressed in plain text files. Plain text is a simple, flexible medium that has been preferred by programming languages for decades.
    [Show full text]
  • Examples of Syntax Errors in Writing
    Examples Of Syntax Errors In Writing DouceParked Niels Leon revoked fogging steady.phlegmatically or apotheosizes perspectively when Coleman is dead. Frans conjugating feudally if halest Raynard habilitated or gloat. In the hex file that must make writing of errors syntax in Not track if your papers are you commonly answering these examples of a text and suggestions on a value. 1E Errors Computer Science Circles. Until grasp the programs that therefore have seen have generally ignored the orphan that things can check wrong. What's the difference between grammar and syntax? From your errors? 14 Common Grammatical Mistakes in English And came to. What taste good syntax? Grammatical error however a term used in prescriptive grammar to conquer an. Of course grammar mistakes are until that drastic but it helps me remember to ponder them out of my writing any mistake happens when a. But we write about writing at all manner, making adjustments to a rule is driving down by. This example of writing common sentence in carbs but not write about referencing a right hand side seems unlikely to! Never used together without making me, accuracy in writing of errors syntax in programming languages the above program work the debugger if your own begin to! Some example might see if we write about writing conferences on. In expressing proper adjectives causing, with sentence above in syntax errors of examples. An exaggerated face will function using correct syntax is lesser than in writing errors in the grammatical pattern as well. Using incorrect capitalization One west the service common syntax errors that new developers make aware to capitalize keywords rather peculiar use lowercase Java is.
    [Show full text]
  • Usability of Error Messages for Introductory Students
    Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal Volume 2 Issue 2 Article 5 September 2015 Usability of Error Messages for Introductory Students Paul A. Schliep University of Minnesota, Morris Follow this and additional works at: https://digitalcommons.morris.umn.edu/horizons Part of the Software Engineering Commons Recommended Citation Schliep, Paul A. (2015) "Usability of Error Messages for Introductory Students," Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal: Vol. 2 : Iss. 2 , Article 5. Available at: https://digitalcommons.morris.umn.edu/horizons/vol2/iss2/5 This Article is brought to you for free and open access by the Journals at University of Minnesota Morris Digital Well. It has been accepted for inclusion in Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal by an authorized editor of University of Minnesota Morris Digital Well. For more information, please contact [email protected]. Schliep: Usability of Error Messages for Introductory Students Usability of Error Messages for Introductory Students Paul A. Schliep Division of Science and Mathematics University of Minnesota, Morris Morris, Minnesota, USA 56267 [email protected] ABSTRACT that students struggle with various elements of error mes- Error messages are an important tool programmers use to sages such as terminology and source highlighting [1, 7]. help find and fix mistakes or issues in their code. When Several tools and heuristics are being developed to help ad- an error message is unhelpful, it can be difficult to find the dress issues in error message usability. The goals of these issue and may impose additional challenges in learning the methodologies are to help introductory programmers locate language and concepts.
    [Show full text]
  • Don't Panic! Better, Fewer, Syntax Errors for LR Parsers
    Don’t Panic! Better, Fewer, Syntax Errors for LR Parsers Lukas Diekmann Software Development Team, King’s College London, United Kingdom https://lukasdiekmann.com/ [email protected] Laurence Tratt Software Development Team, King’s College London, United Kingdom https://tratt.net/laurie/ [email protected] Abstract Syntax errors are generally easy to fix for humans, but not for parsers in general nor LR parsers in particular. Traditional ‘panic mode’ error recovery, though easy to implement and applicable to any grammar, often leads to a cascading chain of errors that drown out the original. More advanced error recovery techniques suffer less from this problem but have seen little practical use because their typical performance was seen as poor, their worst case unbounded, and the repairs they reported arbitrary. In this paper we introduce the CPCT + algorithm, and an implementation of that algorithm, that address these issues. First, CPCT + reports the complete set of minimum cost repair sequences for a given location, allowing programmers to select the one that best fits their intention. Second, on a corpus of 200,000 real-world syntactically invalid Java programs, CPCT + is able to repair 98.37%±0.017% of files within a timeout of 0.5s. Finally, CPCT + uses the complete set of minimum cost repair sequences to reduce the cascading error problem, where incorrect error recovery causes further spurious syntax errors to be identified. Across the test corpus, CPCT + reports 435,812±473 error locations to the user, reducing the cascading error problem substantially relative to the 981,628±0 error locations reported by panic mode.
    [Show full text]