Using Language Models to Detect and Correct Syntax Errors

Total Page:16

File Type:pdf, Size:1020Kb

Using Language Models to Detect and Correct Syntax Errors Syntax and Sensibility: Using language models to detect and correct syntax errors Eddie Antonio Santos, Joshua Charles Campbell, Dhvani Patel, Abram Hindle, and José Nelson Amaral Department of Computing Science University of Alberta Edmonton, Alberta, Canada {easantos,joshua2,dhvani,hindle1,jamaral}@ualberta.ca Abstract—Syntax errors are made by novice and experienced Listing 1: Syntactically invalid Java code. An open brace ({) is missing programmers alike; however, novice programmers lack the years at the end of line 3. of experience that help them quickly resolve these frustrating errors. Standard LR parsers are of little help, typically resolving 1 public class A{ public static void main(String args[]) { syntax errors and their precise location poorly. We propose 2 3 if (args.length < 2) a methodology that locates where syntax errors occur, and 4 System.out.println("Not enough args!"); suggests possible changes to the token stream that can fix the 5 System.exit(1); error identified. This methodology finds syntax errors by using 6 } language models trained on correct source code to find tokens 7 System.out.println("Hello, world!"); that seem out of place. Fixes are synthesized by consulting the 8 } language models to determine what tokens are more likely at the 9 } estimated error location. We compare n-gram and LSTM (long short-term memory) language models for this task, each trained input to a Java 8 compiler such as OpenJDK’s javac [4], on a large corpus of Java code collected from GitHub. Unlike and it reports that there is an error with the source file, but it prior work, our methodology does not rely that the problem overwhelms the user with misleading error messages to identify source code comes from the same domain as the training data. the location of the mistake made by the programmer. We evaluated against a repository of real student mistakes. Our A.java:7: error: <identifier> expected tools are able to find a syntactically-valid fix within its top- System.out.println("Hello, world!"); 2 suggestions, often producing the exact fix that the student ^ used to resolve the error. The results show that this tool and A.java:7: error: illegal start of type methodology can locate and suggest corrections for syntax errors. System.out.println("Hello, world!"); Our methodology is of practical use to all programmers, but will ^ be especially useful to novices frustrated with incomprehensible A.java:9: error: class, interface, or enum expected syntax errors. } I. INTRODUCTION ^ 3 errors Computer program source code is often expressed in plain Imagine a novice programmer writing this simple program text files. Plain text is a simple, flexible medium that has been for the first time, being greeted with three error messages, preferred by programmers and their tools for decades. Yet, all of which include strange jargon such as error: illegal plain text can be a major hurdle for novices learning how to start of type. The compiler identified the problem as being code [1, 2, 3]. Not only do novices have to learn the semantics at least on line 7 when the mistake is four lines up, on line 3. of a programming language, but they also have to learn how However, an experienced programmer could look at the source to place arcane symbols in the right order for the computer to code, ponder, and exclaim: “Ah! There is a missing open brace understand their intent. The positioning of symbols in just the ({) at the end of line 3!” Such mistakes involving unbalanced right way is called syntax, and sometimes humans—especially braces are the most frequent errors among new programmers [5], novices—get it wrong [1]. yet the compiler offers little help in resolving them. The tools meant for interpreting the human-written source In this paper, we present Sensibility, which finds and fixes code, parsers, are often made such that they excel in understand- single token syntax errors. We address the following problem: ing well-structured input; however, if they are given an input Given a source code file with a syntax error, how can one with so much as one mistake, they can fail catastrophically. accurately pinpoint its location and produce a single token What’s worse, the parser may come up with a misleading suggestion that will fix it? conclusion as to where the actual error is. Consider the Java source code in Listing 1. A single token—an open brace ({) We compare the use of n-gram models and long short-term at the end of line 3—in the input is different from that of the memory neural network (LSTM) models for modelling source correct file that the programmer intended to write. Give this code for the purposes of correcting syntax errors. All models were trained on a large corpus of hand-written Java source error messages the compiler generates and how long it takes for code collected from GitHub. Whereas prior works offer the programmer to fix them. This research shows that novices solutions that are limited to fixing syntax errors within the and advanced programmers alike struggle with syntax errors same domain as the training data—such as only fixing errors and their accompanying error messages. Dy and Rodrigo [22] within the same source repository [6], or within the same developed a system that detects compiler error messages that do introductory programming assignment [7, 8]—Sensibility not indicate the actual fault, which they name “non-literal error imposes no such restriction. As such, we evaluated against messages”, and assists students in solving them. Nienaltowski et a corpus of real syntax errors collected from the Blackbox al. [23] found that more detailed compiler error messages do repository of novice programmers’ activity [9]. not necessarily help students avoid being confused by error In this paper, we focus on correcting syntax errors at the messages. They also found that students presented with error token level. We do not consider semantic errors that can messages only including the file, line, and a short message did occur given a valid parse tree such as type mismatches, and— better at identifying the actual error in the code more often in our abstract models (Section III-B)—misspelled variable for some types of errors. Marceau et al. [24] developed a names. These errors are already handled by other tools given rubric for grading the error messages produced by compilers. a source file with valid syntax. For example, the Clang C++ Becker’s dissertation [25] attempted to enhance compiler error compiler [10] can detect misspelled variable names and—since messages in order to improve student performance. Barik et it has a valid parse tree—Clang can suggest which variable al. [26] studied how developers visualize compilation errors. in scope may be intended. As such, we focus our attention in They motivate their research with an example of a compiler this paper to errors that occur before a compiler can reach this misreporting the location of a fault. Pritchard [27] shows that stage—namely, errors that prevent a valid parse of a source the most common type of errors novices make in Python are code file. In particular, we consider syntax errors that are non- syntax errors. Brown et al. [5] investigated the opinions of trivial to detect using simple handwritten rules, as used in some educators regarding what they believe are the most common parsers [11, 12]. Java programming mistakes made by novices, and contrasted Our contributions include: it with mistakes mined from the Blackbox dataset. They found • Showing that language models can successfully locate and that educators share a weak consensus of what errors are fix syntax errors in human-written code without parsing. most frequent, and furthermore show that across 14;235;239 • Comparing three different language models including two compilation events, educators’ opinion demonstrates a low level n-gram models and one deep neural network. of agreement against the actual data mined. The most common • Evaluating how all three models perform on a corpus of real- error witnessed was unbalanced parenthesis and brackets, which world syntax errors with known fixes provided by students. was ranked the eleventh most common on average by educators. The other most common errors were semantic and type errors. II. PRIOR WORK Earlier research attempted to tackle errors at the parsing Prior work relevant to this research addresses repositories stage. In 1972, Aho [28] introduced an algorithm to attempt to of source code and repositories of mistakes, past attempts at repair parse failures by minimizing the number of errors that locating and fixing syntax errors in and out of the parser, were generated. In 1976, Thompson [29] provided a theoretical methods for preventing syntax errors, deep learning, and basis for error-correcting probabilistic compiler parsers. He applying deep learning to syntax error locating and fixing. also criticized the lack of probabilistic error correction in then- a) Data-sources and repositories: are needed to train modern compilers. However, this trend continues to this day. models and evaluate the effectiveness of techniques on syntax Parr et al. [30] discusses the strategy used in ANTLR, a errors. Brown et al. [9, 13] created the Blackbox reposi- popular LL(*) parser-generator. The parser attempts to repair tory of programmers’ activity collected from the BlueJ Java the code when it encounters an error using the context around Integrated Development Environment (IDE) [14], which is the error so it can continue parsing. This strategy allows used internationally in introductory computer science classes. ANTLR parsers to detect multiple problems instead of stopping Upon installation, BlueJ asks the user whether they consent on the first error. Jeffery [11] created Merr, an extension of the to anonymized data collection of editor events.
Recommended publications
  • Compiler Error Messages Considered Unhelpful: the Landscape of Text-Based Programming Error Message Research
    Working Group Report ITiCSE-WGR ’19, July 15–17, 2019, Aberdeen, Scotland Uk Compiler Error Messages Considered Unhelpful: The Landscape of Text-Based Programming Error Message Research Brett A. Becker∗ Paul Denny∗ Raymond Pettit∗ University College Dublin University of Auckland University of Virginia Dublin, Ireland Auckland, New Zealand Charlottesville, Virginia, USA [email protected] [email protected] [email protected] Durell Bouchard Dennis J. Bouvier Brian Harrington Roanoke College Southern Illinois University Edwardsville University of Toronto Scarborough Roanoke, Virgina, USA Edwardsville, Illinois, USA Scarborough, Ontario, Canada [email protected] [email protected] [email protected] Amir Kamil Amey Karkare Chris McDonald University of Michigan Indian Institute of Technology Kanpur University of Western Australia Ann Arbor, Michigan, USA Kanpur, India Perth, Australia [email protected] [email protected] [email protected] Peter-Michael Osera Janice L. Pearce James Prather Grinnell College Berea College Abilene Christian University Grinnell, Iowa, USA Berea, Kentucky, USA Abilene, Texas, USA [email protected] [email protected] [email protected] ABSTRACT of evidence supporting each one (historical, anecdotal, and empiri- Diagnostic messages generated by compilers and interpreters such cal). This work can serve as a starting point for those who wish to as syntax error messages have been researched for over half of a conduct research on compiler error messages, runtime errors, and century. Unfortunately, these messages which include error, warn- warnings. We also make the bibtex file of our 300+ reference corpus ing, and run-time messages, present substantial difficulty and could publicly available.
    [Show full text]
  • Common Logic Errors for Programming Learners: a Three-Decade Litera- Ture Survey
    Paper ID #32801 Common Logic Errors for Programming Learners: A Three-decade Litera- ture Survey Nabeel Alzahrani, University of California, Riverside Nabeel Alzahrani is a Computer Science Ph.D. student in the Department of Computer Science and En- gineering at the University of California, Riverside. Nabeel’s research interests include causes of student struggle, and debugging methodologies, in introductory computer programming courses. Prof. Frank Vahid, University of California, Riverside Frank Vahid is a Professor of Computer Science and Engineering at the Univ. of California, Riverside. His research interests include embedded systems design, and engineering education. He is a co-founder of zyBooks.com. c American Society for Engineering Education, 2021 Common Logic Errors for Programming Learners: A Three- Decade Literature Survey Nabeel Alzahrani, Frank Vahid* Computer Science and Engineering University of California, Riverside {nalza001, vahid}@ucr.edu *Also with zyBooks Abstract We surveyed common logic errors made by students learning programming in introductory (CS1) programming classes, as reported in 47 publications from 1985 to 2018. A logic error causes incorrect program execution, in contrast to a syntax error, which prevents execution. Logic errors tend to be harder to detect and fix and are more likely to cause students to struggle. The publications described 166 common logic errors, which we classified into 11 error categories: input (2 errors), output (1 error), variable (7 errors), computation (21 errors), condition (18 errors), branch (14 errors), loop (27 errors), array (5 errors), function (24 errors), conceptual (43 errors), and miscellaneous (4 errors). Among those errors, we highlighted 43 that seemed to be the most common and/or troublesome.
    [Show full text]
  • Automated Debugging of Java Syntax Errors Through Diagnosis
    Uti and Fox Automated debugging of Java syntax errors through diagnosis Ngozi V. Uti and Dr. Richard Fox Department of Mathematics and Computer Science Northern Kentucky University Highland Heights, KY 41099 Abstract Syntax errors arise in a program because the program code misuses the programming language. When these errors are detected by a compiler, they are not necessarily easy to debug because the compiler error message that reports the error is often misleading. JaSD, a knowledge-based system written in Java, is a program that can determine the cause of a given syntax error and suggest how to correct that error. JaSD uses artificial intelligence techniques to perform debugging. Currently, JaSD has enough knowledge to debug one of four different Java syntax errors. This paper describes the JaSD system and the diagnostic approach taken and offers some examples. Conclusions describe how JaSD can be expanded to debug additional errors. Introduction expected error is common because a semicolon is used Syntax errors can arise in a program because the in Java, C, and C++ to end instructions and in Pascal to program code misuses the programming language. The delineate (separate) instructions. Forgetting a semico- compiler, in its attempt to translate the high-level lan- lon is common and yields the semicolon-expected guage program into machine language, has reached an error in any of these languages. Thus, the programmer instruction that it does not understand. The error might learns how the error can arise. However, there are be because the programmer misused the high-level lan- numerous possible causes for this error, omitting a guage (e.g., improper punctuation or not structuring a semicolon being only one of them.
    [Show full text]
  • Introduction to Computers and Programming
    M01_GADD7119_01_SE_C01.QXD 1/30/08 12:55 AM Page 1 Introduction to Computers CHAPTER 1 and Programming TOPICS 1.1 Introduction 1.4 How a Program Works 1.2 Hardware and Software 1.5 Using Python 1.3 How Computers Store Data 1.1 Introduction Think about some of the different ways that people use computers. In school, students use com- puters for tasks such as writing papers, searching for articles, sending email, and participating in online classes. At work, people use computers to analyze data, make presentations, conduct busi- ness transactions, communicate with customers and coworkers, control machines in manufac- turing facilities, and do many other things. At home, people use computers for tasks such as pay- ing bills, shopping online, communicating with friends and family, and playing computer games. And don’t forget that cell phones, iPods®, BlackBerries®, car navigation systems, and many other devices are computers too. The uses of computers are almost limitless in our everyday lives. Computers can do such a wide variety of things because they can be programmed. This means that computers are not designed to do just one job, but to do any job that their programs tell them to do. A program is a set of instructions that a computer follows to perform a task. For example, Figure 1-1 shows screens from two commonly used programs, Microsoft Word and Adobe Photoshop. Microsoft Word is a word processing program that allows you to create, edit, and print documents with your computer. Adobe Photoshop is an image editing program that allows you to work with graphic images, such as photos taken with your digital camera.
    [Show full text]
  • Compile and Runtime Errors in Java
    Compile and Runtime Errors in Java Mordechai (Moti) Ben-Ari Department of Science Teaching Weizmann Institute of Science Rehovot 76100 Israel http://stwww.weizmann.ac.il/g-cs/benari/ January 24, 2007 This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/ by-nc-nd/2.5/; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. Acknowledgement Otto Seppälä and Niko Myller contributed many helpful suggestions. Contents 1 Introduction 5 2 Compile-time Errors 6 2.1 Syntax errors . 7 2.1.1 . expected . 7 2.1.2 unclosed string literal . 7 2.1.3 illegal start of expression . 8 2.1.4 not a statement . 8 2.2 Identifiers . 10 2.2.1 cannot find symbol . 10 2.2.2 . is already defined in . 10 2.2.3 array required but . found . 10 2.2.4 . has private access in . 10 2.3 Computation . 11 2.3.1 variable . might not have been initialized . 11 2.3.2 . in . cannot be applied to . 11 2.3.3 operator . cannot be applied to . ,. 12 2.3.4 possible loss of precision . 12 2.3.5 incompatible types . 12 2.3.6 inconvertible types . 13 2.4 Return statements . 13 2.4.1 missing return statement . 13 2.4.2 missing return value . 14 2.4.3 cannot return a value from method whose result type is void . 14 2.4.4 invalid method declaration; return type required .
    [Show full text]
  • Compile and Runtime Errors in Java
    Compile and Runtime Errors in Java Mordechai (Moti) Ben-Ari Department of Science Teaching Weizmann Institute of Science Rehovot 76100 Israel http://stwww.weizmann.ac.il/g-cs/benari/ August 27, 2007 This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/ by-nc-nd/2.5/; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. Acknowledgement Otto Seppälä and Niko Myller contributed many helpful suggestions. Contents 1 Introduction 5 2 Compile-time Errors 6 2.1 Syntax errors . 7 2.1.1 . expected . 7 2.1.2 unclosed string literal . 7 2.1.3 illegal start of expression . 8 2.1.4 not a statement . 8 2.2 Identifiers . 10 2.2.1 cannot find symbol . 10 2.2.2 . is already defined in . 10 2.2.3 array required but . found . 10 2.2.4 . has private access in . 10 2.3 Computation . 11 2.3.1 variable . might not have been initialized . 11 2.3.2 . in . cannot be applied to . 11 2.3.3 operator . cannot be applied to . ,. 12 2.3.4 possible loss of precision . 12 2.3.5 incompatible types . 12 2.3.6 inconvertible types . 13 2.4 Return statements . 13 2.4.1 missing return statement . 13 2.4.2 missing return value . 14 2.4.3 cannot return a value from method whose result type is void . 14 2.4.4 invalid method declaration; return type required .
    [Show full text]
  • Python for Everybody
    Python for Everybody Exploring Data Using Python 3 Charles R. Severance 1.10. WHAT COULD POSSIBLY GO WRONG? 13 I hate you Python! ^ SyntaxError: invalid syntax >>> if you come out of there, I would teach you a lesson File "<stdin>", line 1 if you come out of there, I would teach you a lesson ^ SyntaxError: invalid syntax >>> There is little to be gained by arguing with Python. It is just a tool. It has no emotions and it is happy and ready to serve you whenever you need it. Its error messages sound harsh, but they are just Python’s call for help. It has looked at what you typed, and it simply cannot understand what you have entered. Python is much more like a dog, loving you unconditionally, having a few key words that it understands, looking you with a sweet look on its face (>>>), and waiting for you to say something it understands. When Python says “SyntaxError: invalid syntax”, it is simply wagging its tail and saying, “You seemed to say something but I just don’t understand what you meant, but please keep talking to me (>>>).” As your programs become increasingly sophisticated, you will encounter three gen- eral types of errors: Syntax errors These are the first errors you will make and the easiest to fix. A syntax error means that you have violated the “grammar” rules of Python. Python does its best to point right at the line and character where it noticed it was confused. The only tricky bit of syntax errors is that sometimes the mistake that needs fixing is actually earlier in the program than where Python noticed it was confused.
    [Show full text]
  • Syntax Errors Just Aren't Natural: Improving Error Reporting With
    Syntax Errors Just Aren’t Natural: Improving Error Reporting with Language Models Joshua Charles Abram Hindle José Nelson Amaral Campbell Department of Computing Department of Computing Department of Computing Science Science Science University of Alberta University of Alberta University of Alberta Edmonton, Canada Edmonton, Canada Edmonton, Canada [email protected] [email protected] [email protected] ABSTRACT programmers get frustrated by syntax errors, often resorting A frustrating aspect of software development is that com- to erratically commenting their code out in the hope that piler error messages often fail to locate the actual cause of they discover the location of the error that brought their a syntax error. An errant semicolon or brace can result in development to a screeching halt [14]. Syntax errors are many errors reported throughout the file. We seek to find annoying to programmers, and sometimes very hard to find. the actual source of these syntax errors by relying on the con- Garner et al. [4] admit that “students very persistently sistency of software: valid source code is usually repetitive keep seeking assistance for problems with basic syntactic de- and unsurprising. We exploit this consistency by construct- tails.” They, corroborated by numerous other studies [11, 12, ing a simple N-gram language model of lexed source code 10, 17], found that errors related to basic mechanics (semi- tokens. We implemented an automatic Java syntax-error lo- colons, braces, etc.) were the most persistent and common cator using the corpus of the project itself and evaluated its problems that beginner programmers faced. In fact these performance on mutated source code from several projects.
    [Show full text]
  • Finding and Correcting Syntax Errors Using Recurrent Neural Networks
    Finding and correcting syntax errors using recurrent neural networks Eddie Antonio Santos1, Joshua Charles Campbell1, Abram Hindle1, and Jose´ Nelson Amaral1 1Department of Computing Science, University of Alberta, Edmonton, Canada ABSTRACT Minor syntax errors are made by novice and experienced programmers alike; however, novice program- mers lack the years of intuition that help them resolve these tiny errors. Standard LR parsers typically resolve syntax errors and their precise location poorly. We propose a methodology that helps locate where syntax errors occur, but also suggests possible changes to the token stream that can fix the error identified. This methodology finds syntax errors by checking if two language models “agree” on each token. If the models disagree, it indicates a possible syntax error; the methodology tries to suggest a fix by finding an alternative token sequence obtained from the models. We trained two LSTM (Long short-term memory) language models on a large corpus of JavaScript code collected from GitHub. The dual LSTM neural network model predicts the correct location of the syntax error 54.74% in its top 4 suggestions and produces an exact fix up to 35.50% of the time. The results show that this tool and methodology can locate and suggest corrections for syntax errors. Our methodology is of practical use to all programmers, but will be especially useful to novices frustrated with incomprehensible syntax errors. Keywords: syntax error correction; LSTM; JavaScript; RNN; syntax error; deep learning; neural network; n-gram; program repair; GitHub 1 INTRODUCTION Computer program source code is often expressed in plain text files. Plain text is a simple, flexible medium that has been preferred by programming languages for decades.
    [Show full text]
  • Examples of Syntax Errors in Writing
    Examples Of Syntax Errors In Writing DouceParked Niels Leon revoked fogging steady.phlegmatically or apotheosizes perspectively when Coleman is dead. Frans conjugating feudally if halest Raynard habilitated or gloat. In the hex file that must make writing of errors syntax in Not track if your papers are you commonly answering these examples of a text and suggestions on a value. 1E Errors Computer Science Circles. Until grasp the programs that therefore have seen have generally ignored the orphan that things can check wrong. What's the difference between grammar and syntax? From your errors? 14 Common Grammatical Mistakes in English And came to. What taste good syntax? Grammatical error however a term used in prescriptive grammar to conquer an. Of course grammar mistakes are until that drastic but it helps me remember to ponder them out of my writing any mistake happens when a. But we write about writing at all manner, making adjustments to a rule is driving down by. This example of writing common sentence in carbs but not write about referencing a right hand side seems unlikely to! Never used together without making me, accuracy in writing of errors syntax in programming languages the above program work the debugger if your own begin to! Some example might see if we write about writing conferences on. In expressing proper adjectives causing, with sentence above in syntax errors of examples. An exaggerated face will function using correct syntax is lesser than in writing errors in the grammatical pattern as well. Using incorrect capitalization One west the service common syntax errors that new developers make aware to capitalize keywords rather peculiar use lowercase Java is.
    [Show full text]
  • Usability of Error Messages for Introductory Students
    Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal Volume 2 Issue 2 Article 5 September 2015 Usability of Error Messages for Introductory Students Paul A. Schliep University of Minnesota, Morris Follow this and additional works at: https://digitalcommons.morris.umn.edu/horizons Part of the Software Engineering Commons Recommended Citation Schliep, Paul A. (2015) "Usability of Error Messages for Introductory Students," Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal: Vol. 2 : Iss. 2 , Article 5. Available at: https://digitalcommons.morris.umn.edu/horizons/vol2/iss2/5 This Article is brought to you for free and open access by the Journals at University of Minnesota Morris Digital Well. It has been accepted for inclusion in Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal by an authorized editor of University of Minnesota Morris Digital Well. For more information, please contact [email protected]. Schliep: Usability of Error Messages for Introductory Students Usability of Error Messages for Introductory Students Paul A. Schliep Division of Science and Mathematics University of Minnesota, Morris Morris, Minnesota, USA 56267 [email protected] ABSTRACT that students struggle with various elements of error mes- Error messages are an important tool programmers use to sages such as terminology and source highlighting [1, 7]. help find and fix mistakes or issues in their code. When Several tools and heuristics are being developed to help ad- an error message is unhelpful, it can be difficult to find the dress issues in error message usability. The goals of these issue and may impose additional challenges in learning the methodologies are to help introductory programmers locate language and concepts.
    [Show full text]
  • Don't Panic! Better, Fewer, Syntax Errors for LR Parsers
    Don’t Panic! Better, Fewer, Syntax Errors for LR Parsers Lukas Diekmann Software Development Team, King’s College London, United Kingdom https://lukasdiekmann.com/ [email protected] Laurence Tratt Software Development Team, King’s College London, United Kingdom https://tratt.net/laurie/ [email protected] Abstract Syntax errors are generally easy to fix for humans, but not for parsers in general nor LR parsers in particular. Traditional ‘panic mode’ error recovery, though easy to implement and applicable to any grammar, often leads to a cascading chain of errors that drown out the original. More advanced error recovery techniques suffer less from this problem but have seen little practical use because their typical performance was seen as poor, their worst case unbounded, and the repairs they reported arbitrary. In this paper we introduce the CPCT + algorithm, and an implementation of that algorithm, that address these issues. First, CPCT + reports the complete set of minimum cost repair sequences for a given location, allowing programmers to select the one that best fits their intention. Second, on a corpus of 200,000 real-world syntactically invalid Java programs, CPCT + is able to repair 98.37%±0.017% of files within a timeout of 0.5s. Finally, CPCT + uses the complete set of minimum cost repair sequences to reduce the cascading error problem, where incorrect error recovery causes further spurious syntax errors to be identified. Across the test corpus, CPCT + reports 435,812±473 error locations to the user, reducing the cascading error problem substantially relative to the 981,628±0 error locations reported by panic mode.
    [Show full text]