Rich Text Programming : A Tool for Code Comprehension

Jayanta Majumder, Shikha Sarkar, Sambuddha Majumder

Abstract

This article presents the idea of using rich text for programming. Here we explore some new avenues opened up by a programming environment that allows developers to use rich text formatting options (e.g. fonts, colours, highlighting, embedded images, notes, audio, video etc.) in their program text. We argue that this can be very useful for enhancing program comprehension, especially in the context of maintaining large long-lived legacy systems. Computer programs are arguably the most complex creations of mankind. Having started only a few decades ago, we have already produced billions of lines of code worth trillions of dollars, and trillions of man-hours of cognitive work would be required to maintain them. A tool that improves the maintainability of such valuable assets can be a useful addition to the development toolset. The proposed rich text programming environment helps us take greater advantage of our sensory faculties for meaning extraction and annotation of programs. Semantic colouring facilitated by this approach reduces the chances of programming errors, and rich multimedia features enable heavily annotated code that keeps explanations right where they help most with code comprehension.

Contents

1 Introduction

2 The Rich Text File Format

3 A Richer Programming Environment
  3.1 Richer Program Text
  3.2 Richer Search
    3.2.1 Multiple always-present search widgets and commands
    3.2.2 Support for multi-attribute search indexes
    3.2.3 Search by proximity
    3.2.4 Search patterns with additional constraints
    3.2.5 Searching notes
  3.3 Richer Text Replace
  3.4 Richer Tracing
  3.5 Richer Selection and Insertion
  3.6 Commands and Scripting
  3.7 Compatibility with Other Systems
  3.8 Richer Static Assertions and Expansion

4 Conclusion and Future Work

List of Figures

1 Guy L Steele showing highlighted code
2 Cartoon depiction of the program comprehension process
3 Wire Marker - a browser plug-in for annotating web-pages
4 iAnnotate - a program for annotating PDF and other formats
5 Sample of Spectral rich text file
6 Visual rendering of the rich text given in Figure 5
7 Annotated version of the text in Figure 5
8 Merge conflict markers displayed on Spectral
9 Spectral editor views with highlighted salient symbols
10 annotated with hit counts

1 Introduction

Let us start by noting the following quotes from famous computer scientists:

Let us change our traditional attitude to the construction of programs. Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do. The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style.

- Donald E Knuth, in Literate Programming [6]

In the last 40 years we have written billions of lines of code that will keep programmers employed for trillions of man hours in the next few thousand years to clean up the mess we have made. All that stuff we wrote years ago, you know, these poor sods will be scratching their heads and going "What the hell does this stuff do?" right? And it’s terrible

- Joe Armstrong, in his talk titled "The Mess We’re In" [3]

Programs are written for people to read, and only incidentally for machines to execute.

- Harold Abelson, in Structure and Interpretation of Computer Programs [1]

Management and mitigation of code complexity is crucial to the sustenance of large long-lived products. This goal is approached from many directions, some of which are listed below:

• Design principles (e.g. modularisation, layering, abstraction, etc.)

• Language imposed constraints (e.g. type safety, variable immutability etc.)

• Documentation formalisms (e.g. UML)

• Shared vocabulary of design patterns and idioms

• Coding standards

• Code reviews

Copyright (C) 2015 Jayanta Majumder, Shikha Sarkar, Sambuddha Majumder

However diverse these directions and approaches might seem on the surface, deep down they have a central theme - that of making the code easier to understand. Ease of modification, maintenance, and extension follows as a result of ease of understanding. Programmers often talk about the notion of beautiful code, but any such subjective beauty, when not strongly correlated with the objective notion of cognitive ease, ends up making the code hard to maintain.

Knuth’s literate programming [6] idea approaches this objective by documenting the progression of the development in an order that is most conducive to human understanding. A literate program is a combination of code and documentation that compiles into machine executable code as well as into a richly formatted text that may be published as a book or used as a reference. Many notable programs and books have been written using this approach (e.g. [5], in which the authors note that the book not only describes the implementation of a C compiler, it is the implementation).

Literate programming, for all its beautiful outcomes, did not really catch on, perhaps due to short-term commercial pressures taking priority over long-term maintainability in the software industry. Here we introduce an alternative approach that has some aspects in common with literate programming, but can potentially be a better fit with mainstream programming practice because of its incremental nature: its adoption doesn’t need a big change in practice, and the adoption level can vary continuously from zero to full-blown adoption. The approach is called rich programming (RP) - i.e. programming with rich text. Here we use the term rich text in a broad sense, in that it allows beautiful text formatting (using fonts, colours, underlines, highlighting, hyperlinks, etc.) along with multimedia (images, audio, video etc.).

Rich text is more conducive to human perusal due to the availability of additional visual cues and options for interactivity. So it is common practice to have additional richer material describing the code, in which parts of the code are excerpted and presented along with diagrams and multimedia. For example, the photo in Figure 1 is taken from a Google tech-talk video in which Guy L Steele is presenting some code; note how he uses highlighting to emphasize some salient lines.

Figure 1: Guy L Steele showing highlighted code

Having made these observations, we did what any wannabe inventor would do - we asked ourselves a question: wouldn’t it be nice if the coding itself could generally be done in a rich text environment, where one could highlight, annotate, hyperlink, and so on to make the code easier to understand? In exploring that question, we came up with a new kind of text editor that can make programming more fun and programs easier to read. We named it the Spectral Editor. It is a WYSIWYG tool that allows you to edit rich text while automatically maintaining the underlying plain-text representation. The central idea of this editor is a marriage between rich text and plain text formats which brings about the best of both worlds. It begets the maintainability merits of plain text (e.g. simplicity of automatic merging, concatenation, and version management) alongside the visual appeal of rich text (e.g. fonts, color, graphics, etc.). The editor is built around a file format that represents rich text in a way that preserves line-to-line correspondence with the plain text content. The rich text view would incorporate pictures, notes, audio, video etc. to enrich the programming and program comprehension experience, while the corresponding plain text source code is preserved all along with an intuitive mapping between the plain and the rich text. The video in [8] gives a quick tour of the features of Spectral.

This work arguably belongs in the body of literature on program comprehension. Readers looking for a broader survey of precursor work are advised to consult publications from the annual International Conference on Program Comprehension (ICPC). The papers published in that conference are behind pay-walls, but some precursor works that could be found on public-access sites are [4], [2], [12] and [11].

Despite the substantial literature on program comprehension, its toolkit continues to be (i) the debugger, (ii) the profiler, and (iii) log files, driven by deep and preferably uninterrupted thinking, assisted by a pencil and scratchpad. Figure 2 illustrates this activity in a light vein. The proposed rich text editor aims at augmenting this toolkit by allowing programmers to accumulate explanations, annotations, and diagrams on the code text to keep the additional information in context.

Figure 2: Cartoon depiction of the program comprehension process

Annotating digital documents is not new. In fact there are dozens of editors that allow annotation of popular document formats like Adobe PDF, MS Word Doc, Web pages etc. Figures 3 and 4 show two examples. Table 1 lists many more. However, none of the available annotation tools target program text. The proposed tool aims to address this gap, as elaborated in the following sections.

Figure 3: Wire Marker - a browser plug-in for annotating web-pages

Figure 4: iAnnotate - a program for annotating PDF and other formats

Table 1: Text Annotation Tools

Tool Name                 Website
Insight                   http://www.lunaimaging.com/index.html
Panorama                  http://xml.coverpages.org/panofeat.html
iMarkup                   http://www.bplogix.com/support/imarkup-client.aspx
uTOK                      http://web.media.mit.edu/~orit/utok.html
Critlink                  http://zesty.ca/crit/
The Annotea               http://annotea.org/
NewsTrust                 http://newstrust.net/
HP Trailblazer            http://ceur-ws.org/Vol-175/19_croke_jack_final.pdf
A.nnotate                 http://a.nnotate.com/
Stickis                   http://www.stickis.com/
My WOT                    http://www.mywot.com/
ShiftSpace                http://www.shiftspace.org/
Diigo                     http://www.diigo.com/
SharedCopy                http://sharedcopy.com/
FinalsClub                http://finalsclub.org/
Awesome Highlighter       http://www.awesomehighlighter.com/
Reframe It                http://reframeit.com/
Spinspotter               http://spinspotter.com/
Mendeley                  http://www.mendeley.com/
SideWiki                  http://www.google.com/sidewiki/
Kutano                    http://www.kutano.com/
Goozy                     http://goozy.com/
Markup.io                 http://markup.io/
Shareflow                 http://zenbe.com/
Yousticker / Stickr       http://yousticker.com/ http://stickr.com/
NewsCube                  http://dl.acm.org/citation.cfm?id=1518772
LEMO                      http://www.slideshare.net/bhaslhofer/the-lemo-annotation-framework
Annozilla                 http://annozilla.mozdev.org/
NB                        http://nb.mit.edu/about/
Marginize                 http://www.marginize.com/
DocumentCloud             http://www.documentcloud.org/
RapGenius                 http://rapgenius.com
HowTru                    http://howtru.com/
AnnotateIt                http://annotateit.org/
WebKlipper                http://webklipper.com/webklipper
JournalTalk               http://journaltalk.net
CritiqueIt                https://edu.critiqueit.com/
Crocodoc                  http://crocodoc.com/
Highlighter.com           http://highlighter.com
Alipi - Renarration Web   http://alipi.janastu.org
dbunkr                    http://dbunkr.com
Substance.io              http://substance.io/
BounceApp                 bounceapp.com
iCorrect                  https://www.icorrect.com/
Digress.it                http://digress.it/
Converati                 http://converati.com/
NotableApp                noteableapp.com
Findings                  http://findings.com/
Hypothes.is               http://hypothes.is
LiquidText                http://liquidtext.net/
SweeT Web                 http://wiki.janastu.org/Sweet_Web
Annotary                  https://annotary.com
Lacuna Stories            http://www.lacunastories.com/about/
Annopad                   http://www.annopad.com/
Livefyre Sidenotes        http://web.livefyre.com/apps/sidenotes/
Orseis                    http://en.doc.fidesfit.org/wiki/Main_Page
Filesquare                http://filesq.com
CliqueMe                  http://cliqueme.proboards.com/index.cgi
DisputeFinder             http://confront.intel-research.net/Dispute_Finder.html
Citability                http://citability.org
Pundit                    http://thepund.it/
Annotum                   http://annotum.org/
Goodreader                http://www.goodiware.com/goodreader.html
Notable PDF               https://www.kamihq.com
Ouija.io                  http://ouija.io/demo
Poetica                   http://poetica.com/

Figure 5: Sample of Spectral rich text file

Figure 6: Visual rendering of the rich text given in Figure 5

2 The Rich Text File Format

The proposed rich text file format is a textual sequence of records, each record identified by its type followed by its details. The record type descriptor tells what kind of record is represented by the details that follow - i.e. whether it is a text record, an image record, an audio/visual record, a span record, and so on. A text record’s type descriptor is the letter T, which is followed by the text itself. The type descriptors and the detail blocks are mutually separated by white-space, as are the records. The blocks are enclosed by braces where such delimiting is required due to internal white-space within the block. Alternatively, spaces in a block can be escaped using a backslash. Nothing out of the ordinary so far, but here is the crux - the only blocks that can have a new-line in them are text details, and every new-line in a text detail block corresponds to a new-line in the underlying plain text. This requirement, in itself, ensures that there is a line-to-line correspondence between the rich text and its plain text equivalent.

The font and colours are described by spans. Spans are regions of text enclosed by records of type span-start (given by the type descriptor S) and span-end (given by the type descriptor /S). The detail block for a span record is a name, whose properties may be described in a property (P) record. There are record types for images, multimedia files, and so on. New record types can be introduced without getting in the way of existing record types, because all we need in order to introduce the new type’s meaning is a unique type identifier symbol.

Figure 5 presents the content of a sample rich text file, which when rendered by Spectral would look like what is shown in Figure 6. Figure 7 shows the text content of Figure 5 with some color annotations. The yellow highlighted boldfaced characters in Figure 7 represent record type descriptors.
The first one among them is a P or property type descriptor, followed by its details - i.e. the red-highlighted part - that describes the properties of a span called "h1" (this span is subsequently used for the title text). The details say that the font for the "h1" spans would be boldfaced "Cambria", of size 14, and teal in color. The first text block (i.e. one preceded by the type descriptor T) contains the text "This is the Title". It is surrounded by the span blocks "S h1" and "/S h1", meaning that the font and colour properties of the "h1" span apply to that text. There are some pre-defined span tags (e.g. bg_yellow and bg_turquoise in this example) named after the background color that applies to their spanned text.

Figure 7: Annotated version of the text in Figure 5

A user of the editor does not need to deal with the above details since the editor presents a what-you-see-is-what-you-get (WYSIWYG) interface. One can simply open a text file (let’s say a source code file named "main.cpp") in this editor, and add value to it using color, fonts, graphics, audio, etc. Each time the file is saved, the editor will save the plain-text changes to the original file ("main.cpp") and the rich text to a file named by appending ".hlt" to the original file’s name, which in this case will be "main.cpp.hlt". If one subsequently changes the original file using some other editor, the ".hlt" file will go out of sync with the original text, but the next time it is opened in Spectral, the change will be merged into the rich text. The merged regions of the text will be highlighted with a special background and a helpful merge report will be displayed in a pop-up window. The rich-text merge report would include all the pre-existing content that got modified or deleted due to the merge, and if required these may be copied and pasted back to the editor window.

If we store ".hlt" files in a source control system (e.g. svn, git, perforce, etc.), and if merge conflict markers are introduced by a merge from concurrent changes to the same file, the markers will not corrupt the file.
Instead the conflict markers will get superposed into the rich text, within a text block. This is a slight deviation from the file format described above. The handling of conflict markers may be seen as an additional pre-processing step that scans the files for conflict markers and converts them into text records that comply with the format described earlier. If the conflict block happens to have some graphics in it, the graphics will also be displayed, as shown in Figure 8.

Figure 8: Merge conflict markers displayed on Spectral

Concatenation of two such rich text files would not corrupt the file either, and results in the naturally expected concatenation - one that preserves the rich text markup and graphics while concatenating the textual content. It is also possible to make the hlt format the primary code format in a project, where Makefile rules are used to extract the source code files from hlt files before further compilation. The hlt format is so simple that a mere five-line script in a common scripting language can serve as the plain text extraction tool.
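To make the extraction idea concrete, here is a minimal Python sketch of a plain text extractor. It is not the authors' script, and it rests on assumptions that go beyond what this section states: every record is a type descriptor followed by exactly one detail block, except P records, which are assumed to carry a span name plus one property block; braces inside a braced block are assumed to be balanced or backslash-escaped. The names `tokenize`, `extract_plain_text` and `ARITY`, and the property syntax in the sample, are hypothetical.

```python
def tokenize(src):
    """Split .hlt-like text into blocks (brace-delimited or whitespace-delimited)."""
    blocks, i, n = [], 0, len(src)
    while i < n:
        if src[i].isspace():          # white-space separates blocks and records
            i += 1
            continue
        buf = []
        if src[i] == "{":             # braced block: may contain white-space
            depth, i = 1, i + 1
            while i < n:
                c = src[i]
                if c == "\\" and i + 1 < n:   # backslash escapes the next character
                    buf.append(src[i + 1])
                    i += 2
                    continue
                if c == "{":
                    depth += 1
                elif c == "}":
                    depth -= 1
                    if depth == 0:            # closing brace of this block
                        i += 1
                        break
                buf.append(c)
                i += 1
        else:                         # bare block: ends at unescaped white-space
            while i < n and not src[i].isspace():
                if src[i] == "\\" and i + 1 < n:
                    buf.append(src[i + 1])
                    i += 2
                else:
                    buf.append(src[i])
                    i += 1
        blocks.append("".join(buf))
    return blocks


# Assumed number of detail blocks per record type (an assumption, see above).
ARITY = {"T": 1, "S": 1, "/S": 1, "P": 2}


def extract_plain_text(hlt_source):
    """Concatenate the detail blocks of all text (T) records."""
    blocks = tokenize(hlt_source)
    text, i = [], 0
    while i < len(blocks):
        kind = blocks[i]
        if kind == "T" and i + 1 < len(blocks):
            text.append(blocks[i + 1])
        i += 1 + ARITY.get(kind, 1)   # skip the descriptor and its details
    return "".join(text)


# Hypothetical sample in the spirit of Figure 5.
sample = (
    "P h1 {font Cambria 14 bold teal}\n"
    "S h1 T {This is the Title\n} /S h1\n"
    "T {int main() { return 0; }\n}"
)
plain = extract_plain_text(sample)
```

Because only text blocks may contain new-lines, the number of new-lines in `plain` equals the number of new-lines inside the T blocks, which is the line-to-line correspondence the format relies on.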

3 A Richer Programming Environment

The rich text editor concept consists of the following relatively orthogonal ideas built around the aforementioned rich text format:

• Richer program text - fonts, colors, highlights, images, audio, notes, hyperlinks etc.

• Richer search - regular expression search within notes, within selection, in the vicinity of listed locations, with left/right context, and so on.

• Richer selection - rectangular selection, and selecting multiple regions of various shapes.

• Richer insertion - multiple insertion cursors of various shapes and sizes.

• Richer find/replace - computed replace, replace in selection, replace in the vicinity of listed locations.

• Richer tracing - integrating tracing, profiling, and coverage analysis, and pattern analysis in traces.

• Richer system interface - accessing and databases from within the editor.

• Richer text transforms - edit thousands of files in one go.

We shall present the details of the above ideas in the following subsections:

3.1 Richer Program Text

The Spectral format supports images, audio, video, notes, hyperlinks, and references to other files, which in turn can help produce value-added code that is cross-referenced with other bits of code, data, and multimedia. There are many possible creative uses of these features. For example, one might want to use Spectral to record audio clips during a code walk-through or review meeting. The audio so recorded would be attached to the cursor/caret position that is current when the record button is clicked, thereby keeping it in the context of the code that was being discussed.

Notes, i.e. textual annotations that don’t result in changes to the underlying source code or plain text files, may be similarly attached to code locations. One use of notes is to store dead code. Sometimes people comment out dead code just in case they need to refer to it later. This is arguably a bad practice because it takes up screen real-estate, rendering the live code harder to read. Of course deleting the dead code is better (because after all the source control system would have the old versions), but if it actually becomes necessary to consult dead code, the existing tools don’t really offer a good way of fishing out deleted code. It seems to be a good solution to tuck away dead code into notes while adding a special keyword (say "dead-code") to notes that contain dead code. That way it would be easier to search for them later.

As an application of highlights, one can define commenting standards with highlight colour codes. For example, green highlighted comments might be targeted at new recruits who don’t know much about the code-base yet, yellow comments might be for assumptions and requirements, red comments might be for caveats, purple might be used for describing the purpose, and so on.

The Spectral editor window has a set of widgets to select a highlighter (which is defined by a background colour or foreground colour, or font, or a combination thereof). Once a highlighter is selected, one can apply the highlight to a text region using the usual text selection method, or one can double-click a word and apply the same highlight to all occurrences of that word.
One can also pre-select some text and apply the highlight by a button-click, or use a set of regular expressions to apply highlights. When no highlighter is chosen, double-clicking on a word (with the default binding of double-click, that is) applies a randomly chosen background color to all occurrences of the same word. It might seem frivolous at first glance, but the act of highlighting can significantly enhance the speed of comprehension, since the reader can identify all occurrences of individual symbols using the more conspicuous visual cue of color (rather than actually reading through the text). Figure 9 shows some glimpses of code in the Spectral editor with highlights applied using a few double-clicks. During the early stages of program comprehension it is useful to observe how some salient symbols in the code interact. This sort of observation is expedited by the colorful highlights. Dyslexic programmers may particularly benefit from this feature as it reduces the reading effort.

Figure 9: Spectral editor views with highlighted salient symbols

An example of use: Suppose there is a method that computes a computationally expensive mathematical function of several variables and its gradient (i.e. the grad or ∇). This is the sort of method that gets passed to equation solver or optimisation routines. Let’s say that our job is to slice the method into two methods - one that evaluates just the function and the other just the gradient. The function value and the gradient have a number of common sub-expressions that are computed and stored in certain variables for shared use. There are also sub-expressions and variables that contribute exclusively to either the function value or the gradient. In order to do the slicing, we need to visually distinguish between these three categories.

When we choose a highlighter and double-click on the variables used on the RHS of the assignments that assign to the gradient, we highlight the variables that feed into the gradient. Where these variables are on the LHS of an assignment, we could double-click on the variables used in its RHS, and repeat this process until we reach a closure (i.e. no highlighted variable is on the LHS of an assignment whose RHS hasn’t been thoroughly highlighted). Next we choose another highlighter with settings that don’t fully override the previous highlighter (e.g. the previous highlighter might be all about background color and the new highlighter might be about foreground color or font). Now we could carry out the aforementioned procedure (i.e. highlight till you reach closure) starting with the assignments to the function value. At the end of this whole process, the variables contributing to both the gradient and the function value would have both highlights. Likewise, variables carrying only the initially chosen highlight contribute exclusively to the gradient, and those carrying only the subsequently chosen one contribute exclusively to the function value. Thus it would become very easy to visually distinguish the three categories.
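The "highlight till you reach closure" procedure is, in effect, a transitive closure over assignment dependencies. The Python sketch below mimics it on a made-up method body; it is an illustration of the reasoning, not part of Spectral. The dictionary `assignments`, the variable names, and the function `feeds_into` are all hypothetical, and each variable is assumed to be assigned once (or to have its RHS variables merged into one entry).

```python
def feeds_into(assignments, target):
    """Return every variable that transitively feeds into `target`.

    `assignments` maps an LHS variable to the set of variables on its RHS.
    """
    highlighted = set()
    worklist = [target]
    while worklist:
        lhs = worklist.pop()
        for rhs_var in assignments.get(lhs, ()):
            if rhs_var not in highlighted:   # the "double-click" on a new variable
                highlighted.add(rhs_var)
                worklist.append(rhs_var)     # its own assignment is visited later
    return highlighted


# Hypothetical method body: LHS variable -> variables read on its RHS.
assignments = {
    "r2": {"x", "y"},
    "e": {"r2"},
    "s": {"x"},
    "f": {"e", "s"},          # the function value
    "t": {"y"},
    "grad": {"e", "t", "x"},  # the gradient
}

first_highlight = feeds_into(assignments, "grad")   # e.g. background colour
second_highlight = feeds_into(assignments, "f")     # e.g. foreground colour

shared = first_highlight & second_highlight         # both highlights
gradient_only = first_highlight - second_highlight  # first highlight only
function_only = second_highlight - first_highlight  # second highlight only
```

In this toy example `shared` holds the common sub-expressions and inputs, while `gradient_only` and `function_only` hold the variables that can move wholesale into one of the two sliced methods.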

3.2 Richer Search

It seems efficacious to take advantage of the rich text using an enhanced search functionality. The following sub-sections describe the features developed to that effect.

3.2.1 Multiple always-present search widgets and commands

Search is one of the most frequently used actions during the program comprehension process. So it can enhance productivity if several search boxes are always present on the editor screen (rather than a single search box that pops up as a modal dialog, as in most other editors). Spectral’s top bar has as many as 6 search boxes, one under each highlighter widget. This helps the user to simultaneously search for multiple patterns and highlight them differently.

There is a multiple pattern search feature where one can specify multiple patterns, which are searched and the results reported in the order of occurrence. It also displays an abbreviated result string where each pattern from the pattern set is assigned a single letter and occurrences are listed as a sequence of these letters. These letters are hyperlinks, so on the one hand one could mouse-click on them to jump to the actual occurrence, and on the other hand one could search for patterns in them.

An example of use: Say we are looking at a program trace in Spectral, and we are interested in knowing how often two points of code are hit within a certain scope. In this case 4 patterns are of interest - the two code points of interest, the entry to the said scope and the exit from it. We would have to identify these 4 patterns of text and invoke a multi-pattern search with them. In response, Spectral will, alongside search result hyperlinks, assign 4 abbreviation symbols to these patterns (say a and b for the two code points, c for the scope entry and d for the scope exit) and show a string made of a, b, c, and d in the order of occurrence. By the way, these letters are stable hyperlinks - i.e. they won’t go stale if additional text or lines are inserted in the editor. Once the string is produced, one can search for patterns of interest within that string. For example we could search for c[^ac]*([^ac]*a[^ac]*a[^ac]*)*[^ac]*a[^ac]*b to find cases in which the code point b is hit after an odd number of hits to a following entry into the scope c. In a string segment ...caabdcaabaabaaabaaabd... it would match caabaabaaab. Similarly, if the objective is to find a scope in which a or b is hit the most number of times, one could invoke the search command match_longest c[ab]+d. After a match is found on the abbreviated representation of the patterns, one can jump to the actual occurrences in the trace text since the abbreviated letters are stable hyperlinks to the actual occurrences.

It may be argued that one can produce a regex in terms of the original patterns to achieve the same effect, but such regular expressions may become very long and unwieldy. The proposed approach breaks the task into two logically disparate stages and thus helps to mitigate its complexity, and it takes advantage of the hyperlinks feature of Spectral’s rich text.
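The two-stage search above can be imitated outside the editor with a few lines of Python, which may help in checking such patterns before using them. This is only a sketch: the trace text and event names are invented, `abbreviate` and `match_longest` are hypothetical helpers named after the ideas in the text rather than Spectral's implementation, and a list of character offsets stands in for the stable hyperlinks.

```python
import re


def abbreviate(text, patterns):
    """Return (abbreviation string, offsets) for all pattern hits in order.

    `patterns` maps a single letter to a regular expression. offsets[i] is
    the position in `text` of the hit behind the i-th letter, playing the
    role of the hyperlink back to the actual occurrence.
    """
    hits = sorted(
        (m.start(), letter)
        for letter, pattern in patterns.items()
        for m in re.finditer(pattern, text)
    )
    return "".join(letter for _, letter in hits), [pos for pos, _ in hits]


def match_longest(pattern, text):
    """Longest among the non-overlapping matches found by re.finditer, or None."""
    return max(re.finditer(pattern, text), key=lambda m: len(m.group()), default=None)


# Hypothetical trace: one event per line, event names made up for illustration.
EVENTS = {"a": "hit A", "b": "hit B", "c": "enter scope", "d": "exit scope"}
trace = "\n".join(EVENTS[ch] for ch in "caabdcaabaabaaabaaabd")

abbrev, offsets = abbreviate(trace, {k: re.escape(v) for k, v in EVENTS.items()})

# Stage two: search the abbreviation string instead of the raw trace.
ODD_A_THEN_B = r"c[^ac]*([^ac]*a[^ac]*a[^ac]*)*[^ac]*a[^ac]*b"
odd = re.search(ODD_A_THEN_B, abbrev)
busiest_scope = match_longest(r"c[ab]+d", abbrev)

# Follow the "hyperlink" of the first matched letter back into the trace.
scope_entry_offset = offsets[odd.start()]
```

Here `odd.group()` is the segment quoted in the text, and `scope_entry_offset` points at the corresponding scope entry in the trace. Note that `re.finditer` only reports non-overlapping matches, which is an assumption about how match_longest should behave.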

3.2.2 Support for multi-attribute search indexes

During the program comprehension process it is common to ask questions of the following kind:

• What functions take this type as an argument? How about the ones that take it as a non-const reference?

• What are the subclasses of this class?

• What are the functions that return this type?

• What are the ”workhorse” functions in this module (i.e. longer than a certain specified length)?

• Which functions use this symbol within their bodies?

• Which functions insert into this table? Which ones update this table?

• Which functions consult these tables but don’t modify them?

• What tables are modified by the methods of this class?

• Which functions are not hit by the test-suite?

• Which classes are defined under a specified directory or a set of directories?

Spectral supports querying tables made for answering such queries.
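As an illustration of what such a multi-attribute index could look like, the sketch below builds a tiny in-memory SQLite database and answers three of the questions listed above. The schema, the table and column names, and the sample rows are assumptions made for this example; the section does not describe how Spectral's tables are actually organised.

```python
import sqlite3


def build_index():
    """Create a hypothetical two-table index of functions and their parameters."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE functions (name TEXT, return_type TEXT,
                                line_count INTEGER, hit_by_tests INTEGER);
        CREATE TABLE parameters (function TEXT, type TEXT, passing TEXT);
    """)
    conn.executemany("INSERT INTO functions VALUES (?, ?, ?, ?)", [
        ("solve", "Matrix", 240, 1),
        ("reset", "void", 12, 0),
        ("norm", "double", 30, 1),
    ])
    conn.executemany("INSERT INTO parameters VALUES (?, ?, ?)", [
        ("solve", "Matrix", "non-const reference"),
        ("norm", "Matrix", "const reference"),
        ("reset", "Solver", "pointer"),
    ])
    return conn


def names(conn, sql, args=()):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, args)]


conn = build_index()

# What functions take this type as an argument?
takes_matrix = names(
    conn, "SELECT function FROM parameters WHERE type = ? ORDER BY function",
    ("Matrix",))

# How about the ones that take it as a non-const reference?
mutates_matrix = names(
    conn,
    "SELECT function FROM parameters WHERE type = ? AND passing = ?",
    ("Matrix", "non-const reference"))

# What are the "workhorse" functions (longer than a specified length)?
workhorses = names(conn, "SELECT name FROM functions WHERE line_count > ?", (100,))

# Which functions are not hit by the test-suite?
untested = names(conn, "SELECT name FROM functions WHERE hit_by_tests = 0")
```

Keeping each attribute in its own column is what lets one question be refined into the next (for example, from "takes this type" to "takes it as a non-const reference") by adding a condition rather than writing a new search.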

3.2.3 Search by proximity

Spectral allows us to set proximity filters while searching for patterns. Such filtered search is based on a listing of code locations (which in turn could be compiler errors, warnings, previous search results and so on). In addition to a listing of locations, one has to specify the extent of proximity for the filter, which is represented by two numbers n1 and n2 (which can be positive or negative). For each listed location L, the filtered search will be carried out in the line range [L + n1, L + n2]. One can also choose to do pattern substitution (i.e. replace) on such filtered search.
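A proximity filter of this kind is straightforward to express in code. The following Python sketch searches only the lines in [L + n1, L + n2] around each listed location; the function name `proximity_search`, the 1-based line numbering, and the sample lines are assumptions made for the example.

```python
import re


def proximity_search(lines, locations, n1, n2, pattern):
    """Search `pattern` only within [L + n1, L + n2] for each listed line L.

    Line numbers are 1-based. Returns (line number, line text) pairs.
    """
    regex = re.compile(pattern)
    wanted = set()
    for loc in locations:
        lo = max(1, loc + n1)              # clamp the range to the file
        hi = min(len(lines), loc + n2)
        wanted.update(range(lo, hi + 1))
    return [(n, lines[n - 1]) for n in sorted(wanted) if regex.search(lines[n - 1])]


# Hypothetical file content and a single listed location (e.g. a compiler warning).
lines = ["int a = 0;", "foo(a);", "int b = 1;", "bar(b);", "foo(b);", "return a;"]
near_warning = proximity_search(lines, locations=[4], n1=-1, n2=1, pattern=r"foo")
```

With these inputs only lines 3 to 5 are searched, so the call to foo on line 2 is left out of the result even though it matches the pattern.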

3.2.4 Search patterns with additional constraints

Section 3.2.1 briefly mentioned the match_longest command. It finds the longest string matching the given regular expression. Likewise there is its counterpart match_shortest. Similar is the command match_length_range, which specifies the length range for the match. These commands take an additional optional argument representing a transformer expression, which, when specified, transforms the match, and the length used by the filter is the length of the transformed string. These features come in particularly handy for analysis and comprehension of program traces. The match_with_context command specifies regular expressions for left and right context. A match won’t be accepted by this command unless the beginning and end of the match satisfy the left and right context expressions. This provides some additional matching power which can help in some situations.
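The Python sketch below shows one plausible reading of these commands. The function names mirror the command names, but the signatures are invented here, the transformer is modelled as an ordinary function, and the context check is interpreted as "the text immediately before the match ends with the left expression and the text immediately after starts with the right expression". Spectral's actual semantics may differ on each of these points.

```python
import re


def match_length_range(pattern, text, lo, hi, transform=None):
    """Keep matches whose (optionally transformed) length lies in [lo, hi]."""
    transform = transform or (lambda s: s)
    return [m.group() for m in re.finditer(pattern, text)
            if lo <= len(transform(m.group())) <= hi]


def match_shortest(pattern, text, transform=None):
    """Shortest match by (optionally transformed) length, or None."""
    transform = transform or (lambda s: s)
    found = [m.group() for m in re.finditer(pattern, text)]
    return min(found, key=lambda s: len(transform(s)), default=None)


def match_with_context(pattern, left, right, text):
    """Accept a match only if `left` ends right before it and `right` starts right after it."""
    accepted = []
    for m in re.finditer(pattern, text):
        before, after = text[:m.start()], text[m.end():]
        if re.search(f"(?:{left})\\Z", before) and re.match(right, after):
            accepted.append((m.start(), m.group()))
    return accepted


abbrev = "caabdcaabaabaaabaaabd"          # abbreviation string from Section 3.2.1


def only_a(s):
    """Transformer: count only the hits to a."""
    return s.replace("b", "").replace("c", "").replace("d", "")


short_scopes = match_length_range(r"c[ab]+d", abbrev, 1, 10)
busy_scopes = match_length_range(r"c[ab]+d", abbrev, 5, 20, transform=only_a)
quietest = match_shortest(r"c[ab]+d", abbrev)
last_b = match_with_context(r"b", r"aaa", r"d", abbrev)
```

The transformer makes the length filter count what matters for the question at hand: with `only_a`, a scope passes the filter according to how often a was hit, regardless of how many other events it contains.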

3.2.5 Searching notes

The notes are meant to contain annotations added for various reasons, accumulated over an extended period, perhaps the entire life-cycle. It is advisable to have an agreed convention or standard about putting some metadata on the note. For example, each note might start with a heading line, starting with the text "heading:" followed by a descriptive text, like "heading:John Smith's handover meeting", "heading:Weekly walkthrough 2016-12-23", "heading:Dead code" etc. In addition there could be an author line starting with the word "author:". Spectral has widgets for searching in notes for the current file, and a command called "grepnotes" for regular expression search within notes, recursively within a directory tree.

3.3 Richer Text Replace

The text replacement functionality of spectral builds on the aforementioned search filters (i.e. left/right context filters, proximity filters, selection filters) and supports a powerful feature called computed replacement. It means that the replacement text in the find-replace operation can be produced dynamically (i.e. it can be a function of the match, a number incremented at each match, a random string, a GUID etc.). This can have many uses in en masse transformations and rewriting. Following is an application of this feature that helps with program comprehension: that of instrumented tracing. Since this is quite an elaborate arrangement, we dedicate a section to it rather than including it within the current section.
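The idea of a computed replacement can be sketched with Python's re.sub, which accepts a function as the replacement. The example below inserts a call carrying an incrementing number after every opening brace of a C-like source text. The macro name TRACE_SCOPE and the function instrument are hypothetical, and the brace pattern is deliberately naive (it would also match braces inside strings, comments and initialisers), so this is an illustration of the mechanism rather than a usable instrumenter.

```python
import itertools
import re

def instrument(source, pattern=r"\{", macro="TRACE_SCOPE"):
    """Append a uniquely numbered macro call after every match of pattern."""
    counter = itertools.count(1)

    def replacement(match):
        # The replacement is computed per match: the matched text
        # followed by a call carrying the next unused number.
        return f"{match.group(0)} {macro}({next(counter)});"

    return re.sub(pattern, replacement, source)
```

Applied to "int f(int x) { if (x) { return 1; } return 0; }", this yields "int f(int x) { TRACE_SCOPE(1); if (x) { TRACE_SCOPE(2); return 1; } return 0; }".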

3.4 Richer Tracing

Analysing program trace is a key method of investigating the runtime behavior of a program. Some programmers use the debugger as their primary tool for such investigations, while others use trace analysis. In our experience, especially for heavily multi-threaded programs, trace analysis is a faster and more reliable method of investigation than debugging. The computed replace feature helps with tracing by instrumenting the source files so that a unique id or number is associated with every branching point or point of interest in the source code. The instrumentation so added is just a call to a function or macro, called with the unique id as argument. Let us call these unique ids the location identifiers, as each one is unique to a location in the code. This function or macro can do a number of things at run-time to assist us with program comprehension. Firstly, it can write the id to the trace stream, indicating that the instrumented location has been hit. It can also update a map indexed by the current context and location id, so that it records a hit-count for that location within a given context. The current context could be an identifier for the current test, if it is a test run, or it could be an id for the dataset that is being currently processed, or it could just be a constant default if no such distinction is needed. We use a scope oriented instrumentation for C and C++ since these languages allow detection of scope-exit. In C++ the RAII mechanism (i.e. destructor of an automatic variable) helps detect scope exit, and in C the __cleanup__ attribute (a compiler extension available in GCC and Clang) does the same. So for each scope, the instrumentation code ends up calling a scope entry handler and a scope exit handler and passes the location id to them.
Following is the pseudocode for such scope entry and exit handlers:

function scope_entry(location_id)
{
    lock_the_global_mutex();
    context_id = get_current_context_id();
    thread_id = get_current_thread_id();
    increment scope_depth[thread_id] by 1;
    write_to_trace('E', thread_id, scope_depth[thread_id], location_id);
    increment hit_count[context_id, location_id] by 1;
    unlock_the_global_mutex();
}

function scope_exit(location_id)
{
    lock_the_global_mutex();
    thread_id = get_current_thread_id();
    write_to_trace('X', thread_id, scope_depth[thread_id], location_id);
    decrement scope_depth[thread_id] by 1;
    unlock_the_global_mutex();
}

In the above pseudocode there are two maps (or dictionaries): (i) scope_depth, which is indexed by thread_id, and (ii) hit_count, which is indexed by the pair <context_id, location_id>. The scope_depth map keeps track of the current stack frame depth, whereas hit_count keeps track of the number of times a given location_id is hit within a given context. The addition in the trace file for a scope entry is the quadruple <'E', thread_id, scope_depth, location_id>, and that for a scope exit is the quadruple <'X', thread_id, scope_depth, location_id>. A time-stamp may also be included for completeness, if that is not too expensive. Such a trace contains enough information to reconstruct the scope stack (which is a superset of the function-call stack) at any point in the trace. For a particular trace format (whose description is not included in this article), one could load the trace file on spectral and check the backtrace at any point of the trace. A particular binding of mouse double-click (selectable from the options menu) allows the user to show the source code location corresponding to the location id written on the trace file. A single mouse-click on the back-trace also shows the actual source file location.

The hit_count map shown above keeps a record of the number of times each instrumented location was hit within given contexts. This map can be used to automatically annotate the source code with such information. We have written a little program that loads the hit_count data after a session, and substitutes the instrumented calls in the source file with an annotation containing the recorded hit-count information. Such an annotated source file is very useful for assessing unit test coverage as the code gets thoroughly annotated with hit counts. Figure 10 shows such an annotated code as displayed on spectral. It is easy to do a reverse-lookup on the contexts based on context id. For example, if each individual context is a unit test, spectral allows the user to list the unit test names that had hit a specified instrumented location, thanks to the hit_count table.

Figure 10: Source code annotated with hit counts
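The handlers, the backtrace reconstruction and the reverse lookup described in this section can be tried out with the following Python sketch. It is a minimal illustration under stated assumptions rather than the instrumentation used with spectral: a context manager stands in for RAII or the cleanup attribute, the trace is kept as an in-memory list of quadruples instead of a file, the context id is a single module-level variable (a simplification that ignores per-thread contexts), and the names traced_scope, set_context, backtrace and contexts_hitting are hypothetical.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

_lock = threading.Lock()            # plays the role of the global mutex
scope_depth = defaultdict(int)      # thread_id -> current depth
hit_count = defaultdict(int)        # (context_id, location_id) -> hits
trace = []                          # quadruples (kind, thread_id, depth, location_id)
current_context = "default"         # constant default when no distinction is needed

def set_context(context_id):
    global current_context
    current_context = context_id

def scope_entry(location_id):
    with _lock:
        tid = threading.get_ident()
        scope_depth[tid] += 1
        trace.append(("E", tid, scope_depth[tid], location_id))
        hit_count[(current_context, location_id)] += 1

def scope_exit(location_id):
    with _lock:
        tid = threading.get_ident()
        trace.append(("X", tid, scope_depth[tid], location_id))
        scope_depth[tid] -= 1

@contextmanager
def traced_scope(location_id):
    # Guarantees the exit handler runs however the scope is left.
    scope_entry(location_id)
    try:
        yield
    finally:
        scope_exit(location_id)

def backtrace(records, index, thread_id):
    """Scope stack of thread_id after replaying records[0..index]."""
    stack = []
    for kind, tid, _depth, location_id in records[: index + 1]:
        if tid != thread_id:
            continue
        if kind == "E":
            stack.append(location_id)
        else:
            stack.pop()
    return stack

def contexts_hitting(location_id):
    """Reverse lookup: contexts (e.g. unit tests) that hit a location."""
    return sorted(ctx for (ctx, loc), n in hit_count.items()
                  if loc == location_id and n > 0)
```

An instrumented scope is then written as "with traced_scope(17): ..." where 17 is the location identifier. Replaying the quadruples up to any index gives the stack of enclosing scopes at that point, which is the information needed to show a backtrace for a selected trace line.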

3.5 Richer Selection and Insertion

The selected text serves as operand for a number of operations in Spectral, not just cut, copy and delete. Besides supporting column selection mode, spectral allows a highlight to be converted into a selection. The conversion from highlight to selection makes way for having multiple selections at the same time. Many commands apply their action to the selected region. Following are a few examples:

• When a selection exists, find+replace commands will apply the substitutions only in the selected regions.

• The uc and lc commands will turn the selection to upper case and lower case respectively.

• The menu ”Hyperlink to Selection” on a highlighter will hyperlink the corresponding highlighted text to the selected region.

• The menu ”Copy to HTML Clipboard” copies the selected region into the HTML formatted clipboard (used for transferring formatted text from spectral to office tools like MS Word or Outlook).

• The left-edge of a rectangular selection is used as multiple insertion cursor.

• On pressing Ctrl+ on spectral, the text-to-speech reader feature reads aloud the selected text.
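As an illustration of the first two examples in the list above, the following Python sketch restricts a substitution to a set of selected regions. It assumes that selections are given as non-overlapping (start, end) character offsets; spectral's internal representation of selections is not described here, and the function name is hypothetical.

```python
import re

def replace_in_selections(text, selections, pattern, repl):
    """Apply re.sub only inside the given (start, end) character ranges."""
    out, pos = [], 0
    for start, end in sorted(selections):
        out.append(text[pos:start])                         # untouched gap
        out.append(re.sub(pattern, repl, text[start:end]))  # selected region
        pos = end
    out.append(text[pos:])                                  # untouched tail
    return "".join(out)
```

Since repl may be a function, an operation like uc is a special case: replace_in_selections(text, selections, r".+", lambda m: m.group(0).upper()) upper-cases the selected regions only.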

3.6 Commands and Scripting

The editor is scriptable in Tcl [10]. There is a plugin loader architecture whereby users can add new functionality to the editor. The users can easily add new commands, new menus, new language syntax handlers, new key bindings and so on. The full Tcl 8.6 language is available along with the commands added by spectral. Tcl is a homoiconic language, and it follows the paradigm ”everything is a string”, in the same way that ”everything is a list” in lisp. This property allows for a certain succinctness or laconicness in its syntax that many programmers find very appealing.

3.7 Compatibility with Other Systems

Our editor of choice has been Vi (gvim really), which meant a number of Vi reflexes have become second nature to us. In order not to miss them on spectral, we have added many of the commonly used Vi key bindings. Here are a few examples that Vi users will find familiar.

• Esc takes focus to the command prompt.

• The :w command saves the file.

• The :! command runs an external command.

• / searches for the regex.

• yy copies the current line (to clipboard), likewise 10yy copies 10 lines, and so on. Please note that the copy operation copies the rich text with graphics, fonts and all, not just the plain text.

• dd deletes the current line, 10dd deletes 10 lines, and so on.

• dw deletes the current word, yw copies the current word.

• :%s does regex substitution in the entire file, :10,25s does the substitution between lines 10 and 25. :10s would do the substitution in 10 lines starting with the current.

• p pastes from clipboard.

• :q quits the editor, :wq saves and quits.

Not all vim commands have been implemented, only the ones that the authors missed the most. Some commands behave differently. For example, dw (and yw) deletes (copies) the full word no matter where the cursor is within the word, not just the trailing part. This behavior seemed more convenient as it doesn’t require the user to take the cursor precisely to the beginning to delete/copy the whole word. The substitute command syntax is also slightly different (in that it uses space and not slash as separator, and if space is required within the expressions, the expressions can be enclosed in braces). This particular difference was chosen for ease of implementation (i.e. to avoid wrestling) in Tcl. It doesn’t seem to cause any inconvenience because the muscle-memory reflex attenuates by the time the first couple of characters are typed. Since conscious thinking takes over while writing the expressions, it becomes easy to see that we are not actually on gvim and that space separation is required here. The video in [9] presents a demonstration of the Vi bindings.

Another word on compatibility is that the MS Windows version of spectral supports an HTML formatted clipboard that allows transfer of spectral’s rich text to office tools like MS Word and Outlook.

3.8 Richer Static Assertions and Macro Expansion

Spectral has a built-in static assertion checker and macro expander functionality that can help with reducing errors and enhancing productivity. These macro expansion and assertion scripts are embedded in comments in a fairly language independent manner (i.e. spectral doesn’t care if the comments start with #, ;, %, //, etc.). When the macro expansion (macexp) or assertion checker (ascheck) commands are invoked, spectral looks for these definitions within the text and carries out the expansion and checking.

The assertions are about the code text, and there are many situations in which such assertions make sense. Suppose you are writing a complex mathematical code in which you know that there is a symmetry between coordinate directions (x, y and z). In that case you could assert that within the relevant region of code the number of occurrences of x, y and z should be equal. Spectral would not only report if there is a violation of the equality, it would also show run-length sequences of symbols, so that the likely location of the error is easily pinpointed. Assertions could be about interleaving/alternation patterns of mutually conjugate symbols (e.g. open-vs-close, beginTransact-vs-endTransaction etc.), or about prohibited patterns (e.g. you might want to assert that some symbols are not used in a given region). The assertions not only allow for checking for errors, they also convey additional information to the reader at the program comprehension stage.

The macro expander also finds definitions and calls within comments and carries out the expansion. It supports a number of neat things like variable number of arguments, symbolic permutation/combination generator, string interpolation etc. When invoked for a file, spectral cleans up previous expansions and generates new ones to replace them. The expanded code is clearly marked to avoid inadvertent manual modification.

Both the checker and the macro expander provide a declarative interface, but if needed the entire Tcl 8.6 language is available for unboundedly powerful checks and expansions.
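The two kinds of textual assertion described above (equal occurrence counts with a run-length report, and strict alternation of conjugate symbols) can be sketched in Python as follows. This is not the ascheck command and does not use its comment-embedded syntax, which is not shown in this article; the function names are hypothetical and the symbols are assumed to be plain identifiers matched as whole words.

```python
import re
from itertools import groupby

def _occurrences(code, symbols):
    # Whole-word occurrences of any of the symbols, in textual order.
    alternation = "|".join(re.escape(s) for s in symbols)
    return re.findall(rf"\b(?:{alternation})\b", code)

def run_lengths(code, symbols):
    """Run-length encode the order in which the symbols occur."""
    return [(sym, len(list(run))) for sym, run in groupby(_occurrences(code, symbols))]

def check_equal_counts(code, symbols):
    """Return (ok, counts, run_lengths) for an equal-occurrence assertion."""
    seq = _occurrences(code, symbols)
    counts = {s: seq.count(s) for s in symbols}
    ok = len(set(counts.values())) <= 1
    return ok, counts, run_lengths(code, symbols)

def check_alternation(code, opener, closer):
    """True when opener and closer strictly alternate, opener first."""
    seq = _occurrences(code, [opener, closer])
    return seq == [opener, closer] * (len(seq) // 2)
```

For the text "fx = a*x + b; fy = a*y + b; fz = a*y + b;" the equality assertion on x, y and z fails, and the run-length report [("x", 1), ("y", 2)] points at the second y as the likely copy-paste slip.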

4 Conclusion and Future Work

We obtained some feedback by presenting the idea on the social bookmarking forum called reddit. Some responses were outright discouraging, like the following:

”it sounds like you are brand new to software development. Nobody will buy this you moron holy shit. Why did you even waste your time on this.??????”

Others were rather encouraging, like the following:

“Ignore the haters – I love seeing stuff like this. How we write code has pretty much stagnated for decades so it’s nice to see new and creative approaches to decade-old pain points.”

”I like the concept, its similar to some of the crazy ideas in TempleOS, but It’s too far removed from normal documentation to work. I imagine this software being the bane of my life when I go to contribute to a project and find that I can only read the documentation with a proprietary third party editor I never wanted in the first place that I now have to pay for just so I can understand the code.”

”Sounds like a project a lot of Tcl users would be interested in if it could be used for Tcl code writing as well as plugin hacking.”

We believe the backlash on reddit happened because we had tried to position spectral as a programming editor (i.e. not as a niche tool for code annotation and comprehension). Programmers tend to be quite religious about editors [13], but if we position spectral as a special-purpose tool for program comprehension, it might not seem as repulsive. After all, the cognitive ease that colourful highlighting facilitates arises from broadening the bandwidth of the communication channel connecting the code text and our cerebral cortex. Spectral only adds to the bandwidth of perusal and doesn’t take anything away. Especially while reading code littered with weakly relatable strange words like CLinkSpanToRampJunctionNodeManagerProxy or IAbstractFlowSheetConnnectionTerminatorVisitor, it certainly helps to think non-verbally in terms of this turquoise thingy or that yellow mother, at least until the meanings and purposes become clearer.

Since all the advantages of plain text are retained in the proposed rich text format (due to the line-to-line correspondence with plain text, as described in section 2), there is no reason why rich text programming can’t go main-stream. After all, it doesn’t force anything new; it only adds some extra features that can be useful sometimes.

We plan to devote future work to the user manual and to external tool integrations with spectral. The first three in the pipeline are (i) diff-merge integration, (ii) the GNU debugger (GDB) using the GDB/MI protocol, and (iii) some LLVM/Clang tooling integration. The existing command line diff/merge tools suffice for the spectral format because of the line-to-line correspondence. The new development merely involves showing the difference in rich-text mode. We made an early prototype based on the tkdiff tool, as demonstrated in [7].

References

[1] Harold Abelson and Gerald J. Sussman. Structure and Interpretation of Computer Programs. MIT Press, Cambridge, MA, USA, 2nd edition, 1996.

[2] Paul Anderson and Tim Teitelbaum. Software inspection using codesurfer. In Proceedings of the first Workshop on Inspection in Software Engineering, 2001.

[3] Joe Armstrong. The mess we’re in - url: https://youtu.be/lkxe3hug2l4.

[4] Alastair Dunsmore. Comprehension and visualisation of object-oriented code for inspections. Empirical Foundations of Computer Science (EFoCS), University of Strathclyde, Livingstone Tower, Glasgow G1 1XH, UK, 1998.

[5] Christopher W. Fraser and David R. Hanson. A Retargetable C Compiler: Design and Implementation. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.

[6] Donald E. Knuth. Literate programming. The Computer Journal, 27:97–111, 1984.

[7] Jayanta Majumder and Shikha Sarkar. A prototype difftool for the spectral format - url: https://youtu.be/jnkiajxfl7m.

[8] Jayanta Majumder and Shikha Sarkar. Spectral editor quick tour - 1 - url: https://youtu.be/t5zo0co0yaa.

[9] Jayanta Majumder and Shikha Sarkar. Using vi commands in spectral - url: https://youtu.be/mtcph4m_foc.

[10] John K. Ousterhout, Ken Jones, Eric Foster-Johnson, Donal Fellows, Brian Griffin, and David Welton. Tcl and the Tk Toolkit. Addison-Wesley Professional Computing Series. Addison-Wesley, Upper Saddle River, New Jersey, 2nd edition, 2010.

[11] Margaret-Anne Storey. Theories, tools and research methods in program comprehension: Past, present and future. Software Quality Journal, 14(3):187–208, September 2006.

[12] Anneliese von Mayrhauser and A. Marie Vans. Program comprehension during software maintenance and evolution. Computer, 28(8):44–55, August 1995.

[13] Wikipedia. Editor war - url: https://en.wikipedia.org/wiki/editor_war.

Copyright (C) 2015 Jayanta Majumder, Shikha Sarkar, Sambuddha Majumder