Es: A shell with higher-order functions Paul Haahr ± Adobe Systems Incorporated Byron Rakitzis ± Network Appliance Corporation

ABSTRACT In the fall of 1990, one of us (Rakitzis) re-implemented the Plan 9 command interpreter, , for use as a . Experience with that shell led us to wonder whether a more general approach to the design of shells was possible, and this paper describes the result of that experimentation. We applied concepts from modern languages, such as Scheme and ML, to shells, which typically are more concerned with UNIX features than language design. Our shell is both simple and highly programmable. By exposing many of the internals and adopting constructs from functional programming languages, we have created a shell which supports new paradigms for programmers.

Although most users think of the shell as an At a super®cial level, es looks like most UNIX interactive command interpreter, it is really a pro- shells. The syntax for pipes, redirection, background gramming language in which each statement runs a jobs, etc., is unchanged from the . Es's command. Because it must satisfy both the interac- programming constructs are new, but reminiscent of tive and programming aspects of command execu- rc and Tcl[6]. tion, it is a strange language, shaped as much by Es is freely redistributable, and is available by history as by design. anonymous ftp from ftp.white.toronto.edu. Ð Brian Kernighan & Rob Pike [1] Using es Introduction Commands A shell is both a and For simple commands, es resembles other the core of an interactive environment. The ancestor shells. For example, newline usually acts as a com- of most current shells is the 7th Edition Bourne mand terminator. These are familiar commands shell[2], which is characterized by simple semantics, which all work in es: a minimal set of interactive features, and syntax that is all too reminiscent of Algol. One recent shell, /tmp rc[3], substituted a cleaner syntax but kept most of rm Ex* the Bourne shell's attributes. However, most recent ps aux | grep '^byron' | developments in shells (e.g., csh, ksh, zsh) have awk '{print $2}' | xargs -9 focused on improving the interactive environment without changing the structure of the underlying For simple uses, es bears a close resemblance language ± shells have proven to be resistant to to rc. For this reason, the reader is referred to the innovation in programming languages. paper on rc for a discussion of quoting rules, redirection, and so on. (The examples shown here, While rc was an experiment in adding modern however, will try to aim for a lowest common syntax to Bourne shell semantics, es is an explora- denominator of shell syntax, so that an understanding tion of new semantics combined with rc-in¯uenced of rc is not a prerequisite for understanding this syntax: es has lexically scoped variables, ®rst-class paper.) functions, and an exception mechanism, which are concepts borrowed from modern programming Functions languages such as Scheme and ML[4, 5]. Es can be programmed through the use of shell functions. Here is a simple function to print the date In es, almost all standard shell constructs (e.g., in yy-mm-dd format: pipes and redirection) are translated into a uniform representation: function calls. The primitive func- fn d { tions which implement those constructs can be mani- date +%y-%m-%d pulated the same way as all other functions: invoked, } replaced, or passed as arguments to other functions. The ability to replace primitive functions in es is key Functions can also be called with arguments. to its extensibility; for example, a user can override Es allows parameters to be speci®ed to functions by the de®nition of pipes to cause remote execution, or placing them between the function name and the the path-searching machinery to implement a path open-brace. This function takes a command cmd and look-up cache. arguments args and applies the command to each argument in turn:

1993 Winter USENIX ± January 25-29, 1993 ± San Diego, CA 53 Es: A shell with higher-order functions Haahr & Rakitzis

fn apply cmd args { command-line. This is called a lambda.2 It takes the for (i = $args) form $cmd $i @ parameters { commands } } 1 In effect, a lambda is a procedure ``waiting to For example: happen.'' For example, it is possible to type: es> apply echo testing 1.. 2.. 3.. es> @ i {cd $i; rm -f *} /tmp testing 1.. directly at the shell, and this runs the inlined func- 2.. tion directly on the argument /tmp. 3.. There is one more thing to notice: the inline Note that apply was called with more than two function that was supplied to apply had a parame- arguments; es assigns arguments to parameters one- ter named i, and the apply function itself used a to-one, and any leftovers are assigned to the last reference to a variable called i. Note that the two parameter. For example: uses did not con¯ict: that is because es function parameters are lexically scoped, much as variables es> fn rev3 a b { are in C and Scheme. echo $c $b $a } Variables es> rev3 1 2 3 4 5 The similarity between shell functions and 3 4 5 2 1 lambdas is not accidental. In fact, function de®nitions are rewritten as assignments of lambdas If there are fewer arguments than parameters, es to shell variables. Thus these two es commands are leaves the leftover parameters null: entirely equivalent: es> rev3 1 fn echon args {echo -n $args} 1 fn-echon = @ args {echo -n $args} So far we have only seen simple strings passed In order not to con¯ict with regular variables, as arguments. However, es functions can also take function variables have the pre®x fn- prepended to program fragments (enclosed in braces) as argu- their names. This mechanism is also used at execu- ments. For example, the apply function de®ned tion time; when a name like apply is seen by es, it above can be used with program fragments typed ®rst looks in its symbol table for a variable by the directly on the command line: name fn-apply. Of course, it is always possible to es> apply @ i {cd $i; rm -f *} \ execute the contents of any variable by dereferencing /tmp /usr/tmp it explicitly with a dollar sign: es> silly-command = {echo hi} This command contains a lot to understand, so let us es> $silly-command break it up slowly. hi In any other shell, this command would usually be split up into two separate commands: The previous examples also show that variables can be set to contain program fragments as well as es> fn cd-rm i { simple strings. In fact, the two can be intermixed: cd $i rm -f * es> mixed = {ls} hello, {wc} world } es> echo $mixed(2) $mixed(4) es> apply cd-rm /tmp /usr/tmp hello, world es> $mixed(1) | $mixed(3) Therefore, the construct 61 61 478 @ i {cd $i; rm -f *} Variables can hold a list of commands, or even is just a way of inlining a function on the a list of lambdas. This makes variables into versatile tools. For example, a variable could be used as a function dispatch table.

1In our examples, we use ``es> '' as es's prompt. The default prompt, which may be overridden, is ``; '' which is interpreted by es as a null command followed by a 2The keyword @ introduces the lambda. Since @ is not a command separator. Thus, whole lines, including special character in es it must be surrounded by white prompts, can be cut and pasted back to the shell for re- space. @ is a poor substitute for the letter λ, but it was execution. In examples, an italic ®xed width font one of the few characters left on a standard keyboard indicates user input. which did not already have a special meaning.

54 1993 Winter USENIX ± January 25-29, 1993 ± San Diego, CA Haahr & Rakitzis Es: A shell with higher-order functions

Binding es> trace echo-nl In the section on functions, we mentioned that es> echo-nl a b c function parameters are lexically scoped. It is also calling echo-nl a b c possible to use lexically-scoped variables directly. a For example, in order to avoid interfering with a glo- calling echo-nl b c bal instance of i, the following scoping syntax can b be used: calling echo-nl c c let (var = value) { calling echo-nl commands which use $var } The reader should note that Lexical binding is useful in shell functions, where it !cmd becomes important to have shell functions that do not clobber each others' variables. is es's ``not'' command, which inverts the sense of the return value of cmd, and Es code fragments, whether used as arguments to commands or stored in variables, capture the ~ subject pattern values of enclosing lexically scoped values. For matches subject against pattern and returns true if example, the subject is the same as the pattern. (In fact, the es> let (h=hello; w=world) { matching is a bit more sophisticated, for the pattern hi = { echo $h, $w } may include wildcards.) } Shells like the Bourne shell and rc support a es> $hi form of local assignment known as dynamic binding. hello, world The shell syntax for this is typically: One use of lexical binding is in rede®ning var=value command functions. A new de®nition can store the previous That notation con¯icts with es's syntax for assign- de®nition in a lexically scoped variable, so that it is ment (where zero or more words are assigned to a only available to the new function. This feature can variable), so dynamic binding has the syntax: be used to de®ne a function for tracing calls to other functions: local (var = value) { fn trace functions { commands which use $var } for (func = $functions) let (old = $(fn-$func)) The difference between the two forms of bind- fn $func args { ing can be seen in an example: echo calling $func $args $old $args es> x = foo } es> let (x = bar) { } echo $x fn lexical { echo $x } The trace function rede®nes all the functions } which are named on its command line with a func- bar tion that prints the function name and arguments and es> lexical then calls the previous de®nition, which is captured bar in the lexically bound variable old. Consider a es> local (x = baz) { recursive function echo-nl which prints its argu- echo $x ments, one per line: fn dynamic { echo $x } } es> fn echo-nl head tail { baz if {!~ $#head 0} { es> dynamic echo $head foo echo-nl $tail } Settor Variables } In addition to the pre®x (fn-) for function exe- es> echo-nl a b c cution described earlier, es uses another pre®x to a search for settor variables. A settor variable set- b foo is a variable which gets evaluated every time the c variable foo changes value. A good example of set- Applying trace to this function yields: tor variable use is the watch function:

1993 Winter USENIX ± January 25-29, 1993 ± San Diego, CA 55 Es: A shell with higher-order functions Haahr & Rakitzis

fn watch vars { non-structured control ¯ow. The built-in function for (var = $vars) { throw raises an exception, which typically consists set-$var = @ { of a string which names the exception and other echo old $var '=' $$var arguments which are speci®c to the named exception echo new $var '=' $* type. For example, the exception error is caught return $* by the default interpreter loop, which treats the } remaining arguments as an error message. Thus: } es> fn in dir cmd { } if {~ $#dir 0} { Watch establishes a settor function for each of its throw error 'usage: in dir cmd' parameters; this settor prints the old and new values } of the variable to be set, like this: fork # run in a subshell cd $dir es> watch x $cmd es> x=foo bar } old x = es> in new x = foo bar usage: in dir cmd es> x=fubar es> in /tmp ls old x = foo bar webster.socket yacc.312 new x = fubar By providing a routine which catches error excep- Return Values tions, a programmer can intercept internal shell UNIX programs exit with a single number errors before the message gets printed. between 0 and 255 reported as their statuses. Es Exceptions are also used to implement the supplants the notion of an exit status with ``rich'' break and return control ¯ow constructs, and to return values. An es function can return not only a provide a way for user code to interact with UNIX number, but any object: a string, a program frag- signals. While six error types are known to the ment, a lambda, or a list which mixes such values. interpreter and have special meanings, any set of The return value of a command is accessed by arguments can be passed to throw. prepending the command with <>: Exceptions are trapped with the built-in es> fn hello-world { catch, which typically takes the form return 'hello, world' catch @ e args { handler } { body } } es> echo <>{hello-world} Catch ®rst executes body; if no exception is raised, hello, world catch simply returns, passing along body's return value. On the other hand, if anything invoked by This example shows rich return values being body throws an exception, handler is run, with e used to implement hierarchical lists: bound to the exception that caused the problem. For fn cons a d { example, the last two lines of in above can be return @ f { $f $a $d } replaced with: } catch @ e msg { fn car p { $p @ a d { return $a } } if {~ $e error} { fn cdr p { $p @ a d { return $d } } echo >[1=2] in $dir: $msg The ®rst function, cons, returns a function } { which takes as its argument another function to run throw $e $msg on the parameters a and d. car and cdr each } invoke the kind of function returned by cons, sup- } { plying as the argument a function which returns the cd $dir ®rst or second parameter, respectively. For example: $cmd } es> echo <>{car <>{cdr <>{ cons 1 <>{cons 2 <>{cons 3 nil}} to better identify for a user where an error came }}} from: 2 es> in /temp ls Exceptions in /temp: chdir /temp: No such file or directory In addition to traditional control ¯ow constructs ± loops, conditionals, subroutines ± es has an excep- tion mechanism which is used for implementing

56 1993 Winter USENIX ± January 25-29, 1993 ± San Diego, CA Haahr & Rakitzis Es: A shell with higher-order functions

Spoo®ng fn %create fd file cmd { if {test -f $file} { Es's versatile functions and variables are only throw error $file exists half of the story; the other part is that es's shell syn- } { tax is just a front for calls on built-in functions. For $&create $fd $file $cmd example: } ls > /tmp/foo } is internally rewritten as In fact, most rede®nitions do not refer to the %create 1 /tmp/foo {ls} $&-forms explicitly, but capture references to them with lexical scoping. Thus, the above rede®nition before it is evaluated. %create is the built-in func- would usually appear as tion which opens /tmp/foo on ®le-descriptor 1 let (create = $fn-%create) and runs ls. fn %create fd file cmd { The value of this rewriting is that the if {test -f $file} { %create function (and that of just about any other throw error $file exists shell service) can be spoofed, that is, overridden by } { the user: when a new %create function is de®ned, $create $fd $file $cmd the default action of redirection is overridden. } Furthermore, %create is not really the built- } in ®le redirection service. It is a hook to the primi- The latter form is preferable because it allows multi- tive $&create, which itself cannot be overridden. ple rede®nitions of a function; the former version That means that it is always possible to access the would always throw away any previous rede®nitions. underlying shell service, even when its hook has been reassigned. Overriding traditional shell built-ins is another common example of spoo®ng. For example, a cd Keeping this in mind, here is a spoof of the operation which also places the current directory in redirection operator that we have been discussing. the title-bar of the window (via the hypothetical This spoof is simple: if the ®le to be created exists command title) can be written as: (determined by running test -f), then the com- mand is not run, similar to the C-shell's ``noclobber'' option:

es> let (pipe = $fn-%pipe) { fn %pipe first out in rest { if {~ $#out 0} { time $first } { $pipe {time $first} $out $in {%pipe $rest} } } } es> cat paper9 | tr -cs a-zA-Z0-9 '\012' | sort | uniq -c | sort -nr | sed 6q 213 the 150 a 120 to 115 of 109 is 96 and 2r 0.3u 0.2s cat paper9 2r 0.3u 0.2s tr -cs a-zA-Z0-9 \012 2r 0.5u 0.2s sort 2r 0.4u 0.2s uniq -c 3r 0.2u 0.1s sed 6q 3r 0.6u 0.2s sort -nr Figure 1: Timing pipeline elements

1993 Winter USENIX ± January 25-29, 1993 ± San Diego, CA 57 Es: A shell with higher-order functions Haahr & Rakitzis

let (cd = $fn-%cd) looked up in a user's $PATH. Es does not provide fn cd { this functionality in the shell, but it can easily be $cd $* added by any user who wants it. The function title `{pwd} %pathsearch (see Figure 2) is invoked to look-up } non-absolute ®le names which are used as com- mands. Spoo®ng can also be used for tasks which other shells cannot do; one example is timing each ele- One other piece of es which can be replaced is ment of a pipeline by spoo®ng %pipe, along the the interpreter loop. In fact, the default interpreter is lines of the pipeline pro®ler suggested by Jon Bent- written in es itself; see Figure 3. ley[7]; see Figure 1. A few details from this example need further Many shells provide some mechanism for cach- explanation. The exception retry is intercepted by ing the full pathnames of executables which are catch when an exception handler is running, and causes the body of the catch routine to be re-run. let (search = $fn-%pathsearch) { fn %pathsearch prog { let (file = <>{$search $prog}) { if {~ $#file 1 && ~ $file /*} { path-cache = $path-cache $prog fn-$prog = $file } return $file } } } fn recache { for (i = $path-cache) fn-$i = path-cache = } Figure 2: Path caching

fn %interactive-loop { let (result = 0) { catch @ e msg { if {~ $e eof} { return $result } {~ $e error} { echo >[1=2] $msg } { echo >[1=2] uncaught exception: $e $msg } throw retry } { while {} { %prompt let (cmd = <>{%parse $prompt}) { result = <>{$cmd} } } } } } Figure 3: Default interactive loop

58 1993 Winter USENIX ± January 25-29, 1993 ± San Diego, CA Haahr & Rakitzis Es: A shell with higher-order functions

%parse prints its ®rst argument to standard error, The difference with these is that they are given reads a command (potentially more than one line names invoked directly by the user; ``.'' is the long) from the current source of command input, and Bourne-compatible command for ``sourcing'' a ®le. throws the eof exception when the input source is Finally, some settor functions are de®ned to exhausted. The hook %prompt is provided for the work around UNIX path searching (and other) conven- user to rede®ne, and by default does nothing. tions. For example, Other spoo®ng functions which either have set-path = @ { been suggested or are in active use include: a ver- local (set-PATH = ) sion of cd which asks the user whether to create a PATH = <>{%flatten : $*} directory if it does not already exist; versions of return $* redirection and program execution which try spelling } correction if ®les are not found; a %pipe to run set-PATH = @ { pipeline elements on (different) remote machines to local (set-path = ) obtain parallel execution; automatic loading of shell path = <>{%fsplit : $*} functions; and replacing the function which is used return $* for tilde expansion to support alternate de®nitions of } home directories. Moreover, for debugging pur- poses, one can use trace on hook functions. A note on implementation: these functions tem- porarily assign their opposite-case settor cousin to Implementation null before making the assignment to the opposite- Es is implemented in about 8000 lines of C. case variable. This avoids in®nite recursion between Although we estimate that about 1000 lines are the two settor functions. devoted to portability issues between different ver- The Environment sions of UNIX, there are also a number of work- UNIX shells typically maintain a table of vari- arounds that es must use in order to blend with UNIX. able de®nitions which is passed on to child processes The path variable is a good example. when they are created. This table is loosely referred The es convention for path searching involves to as the environment or the environment variables. looking through the list elements of a variable called Although traditionally the environment has been path. This has the advantage that all the usual list used to pass values of variables only, the duality of operations can be applied equally to path as any functions and variables in es has made it possible to other variable. However, UNIX programs expect the pass down function de®nitions to subshells. (While path to be a colon-separated list stored in PATH. rc also offered this functionality, it was more of a Hence es must maintain a copy of each variable, kludge arising from the restriction that there was not with a change in one re¯ected as a change in the a separate space for ``environment functions.'') other. Having functions in the environment brings Initialization them into the same conceptual framework as vari- Much of es's initialization is actually done by ables ± they follow identical rules for creation, dele- an es script, called initial.es, which is con- tion, presence in the environment, and so on. Addi- verted by a to a C character string at tionally, functions in the environment are an optimi- compile time and stored internally. The script illus- zation for ®le I/O and parsing time. Since nearly all trates how the default actions for es's parser is set shell state can now be encoded in the environment, up, as well as features such as the path/PATH it becomes super¯uous for a new instance of es, such aliasing mentioned above. as one started by xterm(1), to run a con®guration Much of the script consists of lines like: ®le. Hence shell startup becomes very quick. As a consequence of this support for the fn-%and = $&and environment, a fair amount of es must be devoted to fn-%append = $&append ``unparsing'' function de®nitions so that they may be fn-%background = $&background passed as environment strings. This is complicated a which bind the shell services such as short-circuit- bit more because the lexical environment of a func- and, backgrounding, etc., to the %-pre®xed hook tion de®nition must be preserved at unparsing. This variables. is best illustrated by an example: There are also a set of assignments which bind es> let (a=b) fn foo {echo $a} the built-in shell functions to their hook variables: which lexically binds b to the variable a for the fn-. = $&dot scope of this function de®nition. Therefore, the fn-break = $&break external representation of this function must make fn-catch = $&catch this information explicit. It is encoded as:

1993 Winter USENIX ± January 25-29, 1993 ± San Diego, CA 59 Es: A shell with higher-order functions Haahr & Rakitzis

es> whatis foo unfortunate consequence of making memory %closure(a=b)@ * {echo $a} management in es more complex than that found in other shells. Simple memory reclamation strategies (Note that for cultural compatibility with other such as arena style allocation [8] or reference count- shells, functions with no named parameters use ``*'' ing are unfortunately inadequate; a full garbage col- for binding arguments.) lection system is required to plug all memory leaks. Interactions With Unix Based on our experience with rc's memory use, Unlike most traditional shells, which have we decided that a copying garbage collector would feature sets dictated by the UNIX system call inter- be appropriate for es. The observations leading to face, es contains features which do not interact well this conclusion were: (1) between two separate com- with UNIX itself. For example, rich return values mands little memory is preserved (it roughly make sense from shell functions (which are run corresponds to the storage for environment vari- inside the shell itself) but cannot be returned from ables); (2) command execution can consume large shell scripts or other external programs, because the amounts of memory for a short time, especially exit/wait interface only supports passing small when loops are involved; and, (3) however much integers. This has forced us to build some things memory is used, the working set of the shell will into the shell which otherwise could be external. typically be much smaller than the physical memory The exception mechanism has similar problems. available. Thus, we picked a strategy where we When an exception is raised from a shell function, it traded relatively fast collection times for being propagates as expected; if raised from a subshell, it somewhat wasteful in the amount of memory used in cannot be propagated as one would like it to be: exchange. While a generational garbage collector instead, a message is printed on exit from the sub- might have made sense for the same reasons that we shell and a false exit status is returned. We consider picked a copying collector, we decided to avoid the this unfortunate, but there seemed no reasonable way added complexity implied by switching to the gen- to tie exception propagation to any existing UNIX erational model. mechanism. In particular, the signal machinery is During normal execution of the shell, memory unsuited to the task. In fact, signals complicate the is acquired by incrementing a pointer through a pre- control ¯ow in the shell enough, and cause enough allocated block. When this block is exhausted, all special cases throughout the shell, so as to be more live pointers from outside of garbage collector of a nuisance than a bene®t. memory, the rootset, are examined, and any structure One other unfortunate consequence of our that they point to is copied to a new block. When shoehorning es onto UNIX systems is the interaction the rootset has been scanned, all the freshly copied between lexically scoped variables, the environment, data is scanned similarly, and the process is repeated and subshells. Two functions, for example, may until all reachable data has been copied to the new have been de®ned in the same lexical scope. If one block. At this point, the memory request which trig- of them modi®es a lexically scoped variable, that gered the collection should be able to succeed. If change will affect the variable as seen by the other not, a larger block is allocated and the collection is function. On the other hand, if the functions are run redone. in a subshell, the connection between their lexical During some parts of the shell's execution ± scopes is lost as a consequence of them being notably while the yacc parser driver is running ± it is exported in separate environment strings. This does not possible to identify all of the rootset, so garbage not turn out to be a signi®cant problem, but it does collection is disabled. If an allocation request is not seem intuitive to a programmer with a back- made during this time for which there is not enough ground in functional languages. memory available in the arena, a new chunk of One restriction on es that arose because it had memory is grabbed so that allocation can continue. to work in a traditional UNIX environment is that lists Garbage collectors have developed a reputation are not hierarchical; that is, lists may not contain for being hard to debug. The collection routines lists as elements. In order to be able to pass lists to themselves typically are not the source of the external programs with the same semantics as pass- dif®culty. Even more sophisticated algorithms than ing them to shell functions, we had to restrict lists to the one found in es are usually only a few hundred the same structure as exec-style argument vectors. lines of code. Rather, the most common form of GC Therefore all lists are ¯attened, as in rc and csh. bug is failing to identify all elements of the rootset, Garbage Collection since this is a rather open-ended problem which has Since es incorporates a true lambda calculus, it implications for almost every routine. To ®nd this includes the ability to create true recursive struc- form of bug, we used a modi®ed version of the gar- tures, that is, objects which include pointers to them- bage collector which has two key features: (1) a selves, either directly or indirectly. While this collection is initiated at every allocation when the feature can be useful for programmers, it has the collector is not disabled, and (2) after a collection ®nishes, access to all the memory from the old

60 1993 Winter USENIX ± January 25-29, 1993 ± San Diego, CA Haahr & Rakitzis Es: A shell with higher-order functions region is disabled.3 Thus, any reference to a pointer The current implementation of es has the in garbage collector space which could be invali- undesirable property that all function calls cause the dated by a collection immediately causes a memory C stack to nest. In particular, tail calls consume protection fault. We strongly recommend this tech- stack space, something they could be optimized not nique to anyone implementing a copying garbage to do. Therefore, properly tail recursive functions, collector. such as echo-nl above, which a Scheme or ML There are two performance implications of the programmer would expect to be equivalent to loop- garbage collector; the ®rst is that, occasionally, ing, have hidden costs. This is an implementation while the shell is running, all action must stop while de®ciency which we hope to remedy in the near the collector is invoked. This takes roughly 4% of future. the running time of the shell. More serious is that at Es, in addition to being a good language for the time of any potential allocation, either the collec- shell programming, is a good candidate for a use as tor must be disabled, or all pointers to structures in an embeddable ``scripting'' language, along the lines garbage collector memory must be identi®ed, effec- of Tcl. Es, in fact, borrows much from Tcl ± most tively requiring them to be in memory at known notably the idea of passing around blocks of code as addresses, which defeats the registerization optimiza- unparsed strings ± and, since the requirements on the tions required for good performance from modern two languages are similar, it is not surprising that architectures. It is hard to quantify the performance the syntaxes are so similar. Es has two advantages consequences of this restriction. over most embedded languages: (1) the same code The garbage collector consists of about 250 can be used by the shell or other programs, and lines of code for the collector itself (plus another many functions could be identical; and (2) it sup- 300 lines of debugging code), along with numerous ports a wide variety of programming constructs, such declarations that identify variables as being part of as closures and exceptions. We are currently work- the rootset and small (typically 5 line) procedures to ing on a ``library'' version of es which could be allocate, copy, and scan all the structure types allo- used stand-alone as a shell or linked in other pro- cated from collector space. grams, with or without shell features such as wild- card expansion or pipes. Future Work Conclusions There are several places in es where one would expect to be able to rede®ne the built-in behavior There are two central ideas behind es. The ®rst and no such hook exists. The most notable of these is that a system can be made more programmable by is the wildcard expansion, which behaves identically exposing its internals to manipulation by the user. to that in traditional shells. We hope to expose By allowing spoo®ng of heretofore unmodi®able some of the remaining pieces of es in future ver- shell features, es gives its users great ¯exibility in sions. tailoring their programming environment, in ways that earlier shells would have supported only with One of the least satisfying pieces of es is its modi®cation of shell source itself. parser. We have talked of the distinction between the core language and the full language; in fact, the Second, es was designed to support a model of translation of syntactic sugar (i.e., the convenient programming where code fragments could be treated UNIX shell syntax presented to the user) to core as just one more form of data. This feature is often language features is done in the same yacc-generated approximated in other shells by passing commands parser as the recognition of the core language. around as strings, but this approach requires resort- Unfortunately, this ties the full language in to the ing to baroque quoting rules, especially if the nesting core very tightly, and offers little room for a user to of commands is several layers deep. In es, once a extend the syntax of the shell. construct is surrounded by braces, it can be stored or passed to a program with no fear of mangling. We can imagine a system where the parser only recognizes the core language, and a set of exposed Es contains little that is completely new. It is transformation rules would map the extended syntax a synthesis of the attributes we admire most from which makes es feel like a shell, down to the core two shells ± the venerable Bourne shell and Tom language. The extend-syntax [9] system for Scheme Duff's rc ± and several programming languages, not- provides a good example of how to design such a ably Scheme and Tcl. Where possible we tried to mechanism, but it, like most other macro systems retain the simplicity of es's predecessors, and in designed for Lisp-like languages, does not mesh well several cases, such as control ¯ow constructs, we with the free-form syntax that has evolved for UNIX believe that we have simpli®ed and generalized what shells. was found in earlier shells. We do not believe that es is the ultimate shell. It has a cumbersome and non-extensible syntax, the 3This disabling depends on support. support for traditional shell notations forced some

1993 Winter USENIX ± January 25-29, 1993 ± San Diego, CA 61 Es: A shell with higher-order functions Haahr & Rakitzis unfortunate design decisions, and some of es's reached by electronic mail at [email protected] or features, such as exceptions and rich return values, by surface mail at Adobe Systems Incorporated, do not interact as well with UNIX as we would like 1585 Charleston Road, Mountain View, CA 94039. them to. Nonetheless, we think that es is successful Byron Rakitzis is a system programmer at Net- as both a shell and a programming language, and work Appliance Corporation, where he works on the would miss its features and extensibility if we were design and implementation of their network ®le forced to revert to other shells. server. In his spare time he works on shells and win- Acknowledgements dow systems. His free-software contributions We'd like to thank the many people who include a UNIX version of rc, the Plan 9 shell, and helped both with the development of es and the writ- pico, a version of Gerard Holzmann's picture editor ing of this paper. Dave Hitz supplied essential popi with code generators for SPARC and MIPS. He advice on where to focus our efforts. Chris Sieben- received an A.B. in Physics from Princeton Univer- mann maintained the es mailing list and ftp distribu- sity in 1990. He has two cats, Pooh-Bah and Goldi- tion of the source. Donn Cave, Peter Ho, Noel locks, who try to rule his home life. Byron can be Hunt, John Mackin, Bruce Perens, Steven Rezsutek, reached at [email protected] or at Network Appli- Rich Salz, Scott Schwartz, Alan Watson, and all ance Corporation, 2901 Tasman Drive, Suite 208 other contributors to the list provided many sugges- Santa Clara, CA 95054. tions, which along with a ferocious willingness to experiment with a not-ready-for-prime-time shell, were vital to es's development. Finally, Susan Karp and Beth Mitcham read many drafts of this paper and put up with us while es was under development. References 1. Brian W. Kernighan and Rob Pike, The UNIX Programming Environment, Prentice-Hall, 1984. 2. S. R. Bourne, ``The UNIX Shell,'' Bell Sys. Tech. J., vol. 57, no. 6, pp. 1971-1990, 1978. 3. , ``Rc ± A Shell for Plan 9 and Unix Systems,'' in UKUUG Conference Proceedings, pp. 21-33, Summer 1990. 4. William Clinger and Jonathan Rees (editors), The Revised4 Report on the Algorithmic Language Scheme, 1991. 5. Robin Milner, Mads Tofte, and Robert Harper, The De®nition of Standard ML, MIT Press, 1990. 6. John Ousterhout, ``Tcl: An Embeddable Com- mand Language,'' in Usenix Conference Proceedings, pp. 133-146, Winter 1990. 7. Jon L. Bentley, More Programming Pearls, Addison-Welsey, 1988. 8. David R. Hanson, ``Fast allocation and deallo- cation of memory based on object lifetimes,'' SoftwareÐPractice and Experience, vol. 20, no. 1, pp. 5-12, January, 1990. 9. R. Kent Dybvig, The Scheme Programming Language, Prentice-Hall, 1987.

Author Information Paul Haahr is a computer scientist at Adobe Systems Incorporated where he works on font rendering technology. His interests include program- ming languages, window systems, and computer architecture. Paul received an A.B. in computer sci- ence from Princeton University in 1990. He can be

62 1993 Winter USENIX ± January 25-29, 1993 ± San Diego, CA