An Empirical Investigation of Command-Line Customization
Michael Schröder
TU Wien
Vienna, Austria
[email protected]

Jürgen Cito
TU Wien, Vienna, Austria
Massachusetts Institute of Technology, Cambridge, U.S.A.
[email protected]

arXiv:2012.10206v2 [cs.SE] 6 Aug 2021

ABSTRACT

The interactive command line, also known as the shell, is a prominent mechanism used extensively by a wide range of software professionals (engineers, system administrators, data scientists, etc.). Shell customizations can therefore provide insight into the tasks they repeatedly perform, how well the standard environment supports those tasks, and ways in which the environment could be productively extended or modified. To characterize the patterns and complexities of command-line customization, we mined the collective knowledge of command-line users by analyzing more than 2.2 million shell alias definitions found on GitHub. Shell aliases allow command-line users to customize their environment by defining arbitrarily complex command substitutions. Using inductive coding methods, we found three types of aliases that each enable a number of customization practices: Shortcuts (for nicknaming commands, abbreviating subcommands, and bookmarking locations), Modifications (for substituting commands, overriding defaults, colorizing output, and elevating privilege), and Scripts (for transforming data and chaining subcommands). We conjecture that identifying common customization practices can point to particular usability issues within command-line programs, and that a deeper understanding of these practices can support researchers and tool developers in designing better user experiences. In addition to our analysis, we provide an extensive reproducibility package in the form of a curated dataset together with well-documented computational notebooks enabling further knowledge discovery and a basis for learning approaches to improve command-line workflows.

KEYWORDS

command line, customization practices, collective knowledge, inductive coding

1 INTRODUCTION

A command-line interface, also called a shell, is a textual interface that allows users to interact with the underlying operating system by issuing commands. Expert users, such as system administrators, software developers, researchers, and data scientists, routinely use the shell as it affords them flexibility and the ability to compose multiple commands. They perform a variety of tasks on their systems including navigating and interacting with the filesystem (e.g., ls, mv, cd), using version control (e.g., git, hg), installing packages (e.g., apt-get, npm), or dealing with infrastructure (e.g., docker). Experts can adapt and play with a multitude of commands and arguments, chaining them together to create more complex workflows. All this versatility introduces a common problem in user interfaces of recognition over recall [37], where users have to recall the particularities of syntax and argument combinations, instead of enabling them to use a more recognizable symbol (as in graphical user interfaces).

A way for these experts to introduce recognizability and customize their command-line experience is to attach distinct names to potentially convoluted, but frequently used, command and argument structures, as well as workflows expressed as compositions of commands. This can be achieved by defining shell aliases. An alias substitutes a given name, the alias, with a string value that defines an arbitrarily complex command (or chain of commands). The set of aliases users define provides a window into their preferences expressed as part of their personal configuration. Many users publicly share these configurations on social coding platforms such as GitHub, contributing to a collective knowledge of command-line customizations, which can provide insight into the tasks that expert users repeatedly perform and how well the standard environment supports those tasks.

1.1 Contribution

We see our large-scale analysis on command-line user customizations manifested in alias definitions as a unique window of opportunity to study how the standard environment of the command line could be productively extended, modified, and improved. Our work goes hand in hand with existing efforts to innovate on the experience of command lines that employ techniques from research in systems [23, 43], software engineering and programming languages [10, 54, 55], human-computer interaction [15, 53], and artificial intelligence [1, 24, 31]. Particularly, our extensive qualitative and quantitative analysis, in conjunction with our dataset, form the basis for identifying opportunities for improving command-line experience in the following directions: by characterizing customization practices, we gain a categorical understanding underlying the needs and wants of command-line users; based on our analysis, we identify opportunities for innovation and formulate them as implications, accompanied with concrete scenarios and examples; further, our comprehensive dataset enables the foundation of learning approaches, as part of learning-based program synthesis [7, 44], automated repair [34], and recommendation systems [32]; finally, we also see our results and datasets as a basis for usability research that can impact the design of tools and the future of the shell in general.

We summarize the work in this paper as follows:

• We identified nine Customization Practices, grouped into three high-level themes: Shortcuts introduce new names. They can be used for nicknaming commands (and correcting misspellings in the process), abbreviating subcommands like git push, and bookmarking locations for quick navigation. Modifications change the semantics of commands. We can use these types of aliases for substituting commands, such as replacing more with less, for overriding defaults to customize commands to personal contexts, which often involves colorizing output, and also running certain commands as root by elevating privilege. Aliases that combine multiple commands are Scripts. They enable many ways of transforming data using Unix pipes, and allow for automating repetitive workflows by chaining subcommands.

• A Curated Dataset of Command-Line Customizations, consisting of over 2.2 million shell aliases collected from GitHub. We view our dataset as a playground for fine-grained discovery that can benefit researchers, tool-builders, and command-line users; for example, researchers can use this knowledge base to discover which commands are frequently used together and how they are combined, while tool-builders can see how their programs are being customized. We also describe the effective mining technique we used to distill this knowledge, which allowed us to capture almost the whole population (94.09%) of relevant shell configuration files.

• We formulate Implications for Improving Command-Line Experience that go beyond single customization practices to address shortcomings and tie them to existing user experience research. Codifying emergent behavior [14] found in our customizations enables learning repair rules and discovering workflows. We are able to uncover conceptual design flaws, where customizations indicate frustrations with underlying command structures, supporting prior research on potential flaws in the conceptual design of certain commands [38]. Based on the prevalence of highly variable command redefinitions, we propose contextual defaults, the ability to suggest different command preferences based on user context [51]. Overall, we find that many customizations deal with the tension of Interactivity vs Scripting: commands being used to interactively navigate systems, while at the same time being used within scripts for batch-processing.

We now describe usage and syntax of aliases as a vehicle for customization. We further describe our data

2.1 Usage and Syntax

The alias command allows the user to create alias definitions, defining command substitutions. When the shell processes the command line, it replaces known alias names with their defined string values. For example,

    alias ll='ls -l'

defines the alias name ll, that is replaced by the alias value ls -l. In this case, ls is the standard command for listing directory contents, with the argument -l specifying a long-form output format. So the alias ll (present in many system configurations) is used to specify a default argument to a commonly used command under a different name.

Alias values can be arbitrarily complex strings and can substitute not only simple commands and arguments, but whole chains of commands. The definition

    alias ducks='du -cksh * | sort -hr | head -n 15'

defines the new command ducks by chaining together three different command-line tools in order to return the 15 largest files in the current directory.

In general, an alias definition takes the form

    alias name=value

where value can optionally be enclosed in single (') or double (") quotes and name can be any identifier that is a valid command name.¹ In particular, the alias name can be an existing command, so a re-definition like

    alias grep='grep --color=always'

is possible.

In the remainder of this paper, we will use the more compact notation a → b to indicate an alias that replaces the name a with the value b.

2.2 Dotfiles

Aliases can be entered directly on the command line, in which case they are valid until the shell session ends. To make an alias definition permanent, it is common practice to enter it into a file