Indexing Common Lisp with Kythe: A Demonstration
Jonathan Godbout
ABSTRACT
For decades Lispers have had the power of code cross-references (jump to definition, list callers, etc.) for any code they've loaded into their Lisp image. But what about cross-referencing code that isn't (or can't be) loaded into the image? Wouldn't it be great if we could ask "who, in the global Lisp community, calls this function?" The only option currently available is to download all Lisp code and use "grep" or similar text-based tools. At Google we use Kythe [4] as a cross-reference database for all Lisp code, whether loaded into our local Lisp image or not. We will show how Lisp is cross-referenced on a static web page with hyperlinks between definitions. With this we can also get call graphs and call hierarchies (some limitations apply).

ACM Reference Format:
Jonathan Godbout. 2020. Indexing Common Lisp with Kythe: A Demonstration. In Proceedings of the 13th European Lisp Symposium (ELS'20). ACM, New York, NY, USA, 3 pages. https://doi.org/10.5281/zenodo.3765987

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
ELS'20, April 27–28 2020, Zürich, Switzerland
© 2020 Copyright held by the owner/author(s).
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM.
https://doi.org/10.5281/zenodo.3765987

1 INTRODUCTION
Almost every software project will have a large number of files and functions. As soon as the number of files goes above 1, or the number of possible on-screen pages goes above 1, users will get confused about what definitions are used where. SLIME [5] has jump-to-definition using "M-.", so when the code has been loaded into the Lisp image we can jump to function definitions and call sites. On websites such as https://www.github.com, where the code is viewed statically on screen, it would be nice to get hyperlinks between the definitions and their usage.

Kythe (https://kythe.io/) is a service that allows users to implement language-specific indexers and then to upload graphs describing the structure of the code. This allows code display and editing engines to provide services like jump-to-definition. At Google we have implemented a Lisp plugin for the Kythe indexer to produce cross-reference data for Google's Common Lisp code base. We will start with a brief overview of Kythe, and then discuss indexing Lisp.

2 KYTHE OVERVIEW
Kythe is a database for storing code graphs for large code bases across multiple languages. Its schema is designed to accommodate facets of different languages. Part of its schema are nodes, which name functions and variables, define exact locations in a file, or give other information about the indexed code. It has VNames, which uniquely identify a node in a code base. It has Edges, which annotate how two nodes relate to each other.

For example, take the variable object from threadp in Bordeaux-threads [7]:

    (defun threadp (object)
      (typep object 'sb-thread:thread))

The variable object next to threadp would have a node:

    {
      ticket: "kythe://corpus??lang=lisp?path=PATH
               #BORDEAUX-THREADS%3A%3AOBJECT%20%3AVARIABLE
               %20loc%3D%2825%3A16-25%3A22%29",
      kind: "variable",
      language: "lisp",
      name: "object",
      qualified_name: "object",
      location: {
        corpus: "corpus",
        path:
          "PATH/TO/bordeaux-threads/src/impl-sbcl.lisp",
        line_number: 25,
        line_number_end: 25,
        column_number: 16,
        column_number_end: 22
      },
      v_name: {
        signature:
          "BORDEAUX-THREADS::OBJECT :VARIABLE loc=(25:16-25:22)",
        corpus: "corpus",
        path:
          "PATH/TO/bordeaux-threads/src/impl-sbcl.lisp",
        language: "lisp"
      }
    }

The VName uniquely identifies the node. The slot kind tells which kind of node this is, so "variable" tells us this is a variable. The slot location gives the source location of the referenced code. The slot ticket is just a URI encoding of the VName. By location reference we mean a node containing the location of a form in the code.

There would be a second node for the instance of the variable which is the first argument to typep. Finally there would be an edge

    {
      source: node1,
      target: node2,
      edge_kind: ref
    }

where node1 and node2 are the first and second nodes discussed above.

For full details on Kythe's schema please see https://kythe.io/docs/schema/.

[Figure 1: Kythe Calling the Lisp Indexer]

3 STRATEGY
In an out-of-band process, we start up a Lisp indexing service, and have it load all the code required to populate the who-calls database with the requisite information. This is essentially how SLIME determines jump-to-definition targets (along with some heuristics needed for problems discussed later).

You may have:

    foo.lisp uses bar.lisp

The Lisp indexing plugin loads bar.lisp and foo.lisp into the Lisp image and the Lisp implementation determines the cross-reference information locally. If we are trying to create all cross-references for foo.lisp, and bar is a function defined in bar.lisp, we can inspect the who-calls database to get this cross-reference.

In SBCL [3] you get all of the top-level defun and defvar forms, but none of the top-level forms that don't define a data structure that is needed later. For example, code that is run at start-up time at the top level, such as (setf *foo* 'foo), may not have a cross-reference in the who-calls database because the compiler can compile the call away. We will go through some examples.

Local variable bindings aren't stored in the who-calls database. If you have a function

    (defun print-a (a)
      (print a))

you would like to have a cross-reference from the a in print-a's lambda list to its use in the function's body. This is not stored in the who-calls database. To solve cases such as this we have a number of parsers (e.g. a "defun" parser) that find the symbols to be bound and store their locations. Iterating through all of the code, with the correct set of parsers, will give us all of the local definitions. Currently our parser is only a decent heuristic, and our method parser does not correctly cross-reference types.

Next we have hidden parameters that don't show up in the code or the who-calls database. Take for example:

    (defstruct bear cat)
    (defun set-bear-cat-friendly (my-bear-cat)
      ... lots of code ...
      (setf (bear-cat my-bear-cat) 'friendly))

We would like a reference from the bear-cat setter to the cat slot in the bear structure. In (most) Lisps this would be fine: we would just add a call to who-calls for (setf bear-cat). But the Lisp language specification does not require such a function to exist. In fact SBCL does not create setf functions for structure-objects, so we must start by going through the code and creating location references for all structure-object accessors.

4 INTER-LANGUAGE REFERENCES
We often make calls from one language into another language, for example Lisp's foreign function calls into C. At Google, the most common format for data interchange between systems is called Protocol Buffers [2], or protobuf for short. A protobuf is a data interchange format that a language can implement.

To implement support for protobuf messages, languages can use their native structures, but they must serialize the messages into a standard format before sending them out. Then any other language that implements the protobuf standard can deserialize and read the messages. The content of the messages can be deserialized without knowledge of the protobuf schema used, but a protobuf schema detailing types and names is required for human-readable output.

Here is an example protobuf schema defining one "message" (a structure) that contains a string:

    syntax = "proto2";
    package example;

    message HelloWorld {
      optional string hello_world_string = 1;
    }

Below we have Lisp code that creates the Lisp standard-object corresponding to the structure.

    (let ((my-proto
            (make-instance 'example:hello-world
                           :hello-world-string "hello-world")))
      (print (hello-world-string my-proto)))

We would like a reference from "hello-world-string" in the Lisp code to the "hello_world_string" in the protobuf schema. Because Kythe is just a database service that stores a graph of the code for contextualization in a language-agnostic form, so long as you know the signature for the "hello_world_string" you can just create a cross-reference in Kythe.

5 MACROS
The use of a small number of parsers to understand local bindings is not ideal, but it is doable for the built-in commands. In contrast, Common Lisp is known for its powerful syntax-extending ability, namely macros. For a detailed look at macros please consult Let Over Lambda [6]; we will go over a basic example below.

    (defvar *process-data-mutex* (make-mutex))

    (defmacro with-data-mutex ((mutex) &body body)
      `(let ((,mutex *process-data-mutex*))
         (sb-thread:get-mutex ,mutex)
         ,@body
         (sb-thread:release-mutex ,mutex)))

    (defun process-data (data)

REFERENCES
[1] Armed Bear Common Lisp. https://abcl.org/.
[2] Protocol Buffers. https://developers.google.com/protocol-buffers. Accessed: 2020-02-10.
[3] Steel Bank Common Lisp.