Writing Your Own Text Editor, Part 1

Java Magazine, September 2019

Using a layered design and iterative development, a line editor evolves into a text editor in this new article series.

by Ian Darwin
August 26, 2019

After reading the title of this article, you might be thinking, “A text editor in Java? Why would you want to do that? Isn’t Java just for enterprise web apps?” My first answer is heck no! Java is still alive and well on the desktop. My second answer is that getting away from the web environment (some Java developers’ main “comfort zone”) moves the focus toward design and interesting implementation issues.

In this first article, I focus on building a simple line-mode text editor. I do this through iterative implementation. That is, I start with a simple implementation. Then, as I add new features, I update the design and reimplement it. In an upcoming article, I will turn it into a graphical desktop editor.

In all its variations, it will remain a pure text editor; that is, a program for changing a plain-text file. Plain text, with no font or color changes, still makes up the bulk of the world’s computer-facing information: batch/script files, program source files, configuration files, log files, and much more. There’s a big jump from a plain-text editor to a full word processor that provides the ability to select typefaces and type size and color; embed images and spreadsheets; align text left, right, or center; and so much more. It would be difficult to focus on all that in a series of short articles, so for the time being, I’ll focus on plain text.
When you design an editor, the two main things to think about are the user interface (UI) or command layer and buffer management. The UI or command layer defines what the user can do to work on the in-memory buffer. These tasks may be simple commands for a line editor, mouse gestures on a screen editor, voice commands on an interactive system, and so on. Buffer management is how the user works on the text in a buffer. Text editors, along with their larger kin, word processors, generally keep an in-memory copy of the file, called a buffer, on which changes are made. The editor writes this modified text in place of the original file on disk when the user exits the editor or when some kind of “save” command is issued. Buffer management, also known as the model code, is concerned with looking after the in-memory contents of the file being edited.

Both the UI and buffer management, as well as the all-important interface between them, need to be well designed for the end product to be both useful and maintainable.

The Line Between Layers of Code

Defining a Java interface is a powerful (and common) way of separating logical layers of code. Interfaces should be used between layers of an application and on any class that is likely to have (now or later) multiple implementations. The classes can then depend on the interface, not the particular implementations. In this case, the BufferPrims interface meets both needs: It is a layer boundary, and it has multiple implementations.

This is similar to a database-backed application, where you’d likely have an interface between the middle (business logic) layer and the database code. This would allow switching among JDBC, Java Persistence API (JPA)/Hibernate, and maybe NoSQL databases. Some have argued that the JPA EntityManager or Hibernate’s Session are general-enough interfaces for this purpose; others suggest using an application-specific interface.
There’s no one right answer for all applications, but these are the design considerations.

The editor needs an interface between the command layer and the buffer management layer. A very trivial interface between these layers might look like this:

    public interface BufferPrims {
        void addLineAfter(int afterLine, String newLine);
        void deleteLine(int lineNumber);
        void replaceText(String oldText, String newText);
    }

This interface tells the command code nothing much about how the buffer management works, and at the same time it tells the buffer management nothing about how the UI works. That’s important, because it’s desirable to be able to replace either part without having any effect on the other. And that is true of layered software in general: Layers should know nothing at all about the layer above them (so they can be invoked from different UIs, say) and know only how to invoke the layer directly below them, nothing more.

To make an actual editor, the interface needs to be a bit more comprehensive. I wound up with something like this version:

    import java.util.List;

    public interface BufferPrims {

        final int NO_NUM = 0, INF = Integer.MAX_VALUE;

        void addLines(List<String> newLines);
        void addLines(int start, List<String> newLines);
        void deleteLines(int start, int end);
        void clearBuffer();

        void readBuffer(String fileName);
        void writeBuffer(String fileName);

        int getCurrentLineNumber();
        String getCurrentLine();
        int goToLine(int n);
        int size(); // Number of lines, as per old C

        /** Retrieve one or more lines */
        String getLine(int ln);
        List<String> getLines(int i, int j);

        /** Replace first/all occurrences of 'old' regex
         * with 'new' text, in current line only */
        void replace(String oldRE, String newStr, boolean all);

        /** Replace first/all occurrences in each line */
        void replace(String oldRE, String newStr, boolean all,
            int startLine, int endLine);

        boolean isUndoSupported();

        /** Undo the most recent operation;
         * optional method */
        default void undo() {
            throw new UnsupportedOperationException();
        }
    }

Most of the operations probably seem straightforward. The source code for this article provides three implementations of the BufferPrims interface. The optional undo method is in one of these implementations but not the others; I discussed that implementation in a previous article on the Command design pattern.

Some might argue that readBuffer() and writeBuffer() don’t belong here and that the main program should read the file a line at a time and ingest the lines using one of the addLines() methods. The read and write methods are in this interface for efficiency; there may be versions where you read the entire file with a single read operation.

Buffer Management

Having multiple implementations that differ significantly provides some evidence that the interface is based on a sensible design. But it doesn’t say anything about efficiency. Speaking of which, if you didn’t care about efficiency, you could just keep everything in a single String object. Because strings are immutable, this approach would require a lot of reallocation, so a StringBuilder or StringBuffer is a better basis for a naive implementation. In fact, the first implementation of the buffer primitives (BufferPrims), BufferPrimsStringBuffer, uses a StringBuffer.

Although a StringBuffer has some advantages, such as built-in methods for modifying the contents of the buffer, it is not a natural organization for what is essentially a list of lists of words or, more precisely, a list of lines. Accordingly, the StringBuffer implementation, which I talked myself into writing to show that it could be done, has to do some work just to find where the lines begin and end. To keep things consistent, I made the assumption that each line would end with a single newline character ('\n'); carriage returns ('\r') would be banned altogether. With this approach, I can find where one line ends and the next begins just by looking for those newline characters.
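The newline-scanning idea can be sketched with a plain loop. This is a minimal sketch, not the code from the article's source: the class name NewlineScan and the helper names offsetOfLine and lengthOfLineAt are hypothetical, and it assumes 1-based line numbers with every line terminated by a single '\n'.

```java
class NewlineScan {

    /** Offset of the first character of the given 1-based line, or -1 if there is no such line. */
    public static int offsetOfLine(CharSequence buf, int lineNum) {
        if (lineNum < 1) {
            return -1;
        }
        int offset = 0;
        // Skip over (lineNum - 1) newline characters.
        for (int line = 1; line < lineNum; line++) {
            int nl = indexOfNewline(buf, offset);
            if (nl < 0) {
                return -1;
            }
            offset = nl + 1;
        }
        return offset < buf.length() ? offset : -1;
    }

    /** Length of the line starting at offset, not counting its trailing newline. */
    public static int lengthOfLineAt(CharSequence buf, int offset) {
        int nl = indexOfNewline(buf, offset);
        return (nl < 0 ? buf.length() : nl) - offset;
    }

    /** Index of the next '\n' at or after 'from', or -1 if none. */
    private static int indexOfNewline(CharSequence buf, int from) {
        for (int i = from; i < buf.length(); i++) {
            if (buf.charAt(i) == '\n') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        StringBuilder buf = new StringBuilder("alpha\nbeta\ngamma\n");
        int start = offsetOfLine(buf, 2);
        int len = lengthOfLineAt(buf, start);
        System.out.println(buf.substring(start, start + len)); // prints "beta"
    }
}
```

Every lookup rescans the buffer from the beginning, which is the price of storing lines in one flat character sequence.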
Most of the operations in this class end by calling StringBuffer methods to get or set characters at particular positions in the StringBuffer. For example, here’s the code to get the current line’s contents:

    @Override
    public String getCurrentLine() {
        int startOffset = findLineOffset(current);
        int len = findLineLengthAt(startOffset);
        return buffer.substring(startOffset, startOffset + len);
    }

The findLineOffset method uses a regular expression to find the lines, so it’s a bit more than just a for loop looking for newline characters, although that would be a workable naive implementation.

Similarly, the StringBuffer-based line deletion makes use of the same two methods:

    @Override
    public void deleteLines(int startLine, int endLine) {
        if (startLine > endLine) {
            throw new IllegalArgumentException();
        }
        int startOffset = findLineOffset(startLine),
            endOffset = findLineOffset(endLine);
        // The + 1 also removes the last line's trailing newline
        buffer.delete(startOffset,
            endOffset + findLineLengthAt(endOffset) + 1);
    }

The substitute (s) command is probably the most complicated command; I describe the variations of it in the “UI/Command Structure” section. When all the command options have been parsed, the UI layer has to call one of two methods:

    void replace(String oldRE, String newStr, boolean all);
    void replace(String oldRE, String newStr, boolean all,
        int startLine, int endLine);

The variable all controls whether all occurrences, or just the first, are to be replaced. Here’s the implementation of the one-line version of replace:

    @Override
    public void replace(String oldRE, String newStr, boolean all) {
        int startOffset = findLineOffset(current);
        int length = findLineLengthAt(startOffset);
        // substring() and replace() take an end offset, not a length
        int endOffset = startOffset + length;
        String tmp = buffer.substring(startOffset, endOffset);
        tmp = all ?
            tmp.replaceAll(oldRE, newStr) :
            tmp.replaceFirst(oldRE, newStr);
        buffer.replace(startOffset, endOffset, tmp);
    }

A second implementation.
The second implementation of the buffer code, BufferPrimsNoUndo, stores the buffer data in a List<String>, which is a data structure that’s easier to work with on a line-by-line basis. For example, getting the current line consists of just the following code:

    @Override
    public String getCurrentLine() {
        return buffer.get(lineNumToIndex(current));
    }

(Because the line numbers start at 1 but List indices start at 0, there’s a lineNumToIndex() method to convert from a line number to a list index.)
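To illustrate why a list of lines is easier to work with, here is a minimal sketch of a list-backed buffer. It is not the BufferPrimsNoUndo class from the article's source: the class name ListBufferSketch is hypothetical, the method names are borrowed from the BufferPrims interface, and the current-line bookkeeping (0 meaning an empty buffer, new lines inserted after the current line) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class ListBufferSketch {

    private final List<String> buffer = new ArrayList<>();
    private int current = 0; // 1-based current line; 0 means the buffer is empty

    /** Line numbers start at 1, list indices at 0. */
    private static int lineNumToIndex(int lineNum) {
        return lineNum - 1;
    }

    /** Insert the lines after the current line and move to the last one added. */
    public void addLines(List<String> newLines) {
        buffer.addAll(current, newLines);
        current += newLines.size();
    }

    public int goToLine(int n) {
        if (n < 1 || n > buffer.size()) {
            throw new IndexOutOfBoundsException("No line " + n);
        }
        current = n;
        return current;
    }

    public String getCurrentLine() {
        return buffer.get(lineNumToIndex(current));
    }

    /** Delete lines start..end inclusive; no offset arithmetic is needed. */
    public void deleteLines(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException();
        }
        buffer.subList(lineNumToIndex(start), lineNumToIndex(end) + 1).clear();
        current = Math.min(start, buffer.size());
    }

    /** Replace the first or all regex matches in the current line. */
    public void replace(String oldRE, String newStr, boolean all) {
        int i = lineNumToIndex(current);
        String line = buffer.get(i);
        buffer.set(i, all
            ? line.replaceAll(oldRE, newStr)
            : line.replaceFirst(oldRE, newStr));
    }

    public int size() {
        return buffer.size();
    }

    public static void main(String[] args) {
        ListBufferSketch b = new ListBufferSketch();
        b.addLines(List.of("one fish", "two fish fish", "red fish"));
        b.goToLine(2);
        b.replace("fish", "cat", false);
        System.out.println(b.getCurrentLine()); // prints "two cat fish"
    }
}
```

Each operation here is a single List call on whole lines, where the StringBuffer version has to translate line numbers into character offsets first.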
