Automatic Synthesis of Parallel Unix Commands and Pipelines with Kumquat

Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat Jiasi Shen Martin Rinard Nikos Vasilakis MIT EECS & CSAIL MIT EECS & CSAIL MIT EECS & CSAIL USA USA USA [email protected] [email protected] [email protected] Abstract including pipelines that contain new commands or command options for which combiners were previously unavailable. We present KumQuat, a system for automatically generating KumQuat targets commands that can be expressed as data data parallel implementations of Unix shell commands and parallel divide-and-conquer computations with two phases:1 pipelines. The generated parallel versions split input streams, the first phase executes the original, unmodified command execute multiple instantiations of the original pipeline com- in parallel on disjoint parts of the input; the second phase mands to process the splits in parallel, then combine the combines the partial results from the first phase to obtain resulting parallel outputs to produce the final output stream. the final output. KumQuat generates candidate combiners, KumQuat automatically synthesizes the combine operators, then repeatedly feeds selected inputs to parallelized versions with a domain-specific combiner language acting as a strong of the command that use the candidate combiners. A com- regularizer that promotes efficient inference of correct comparison of the resulting outputs with corresponding outputs biners. from the original serial version of the command enables We evaluate KumQuat on 70 benchmark scripts that to- KumQuat to identify a correct combiner for the command. gether have a total of 427 stages. KumQuat synthesizes a A domain-specific combiner language acts as a strong regu- correct combiner for 113 of the 121 unique commands that larizer that promotes efficient learning of correct combiners. appear in these benchmark scripts. The synthesis times vary The resulting (automatically generated) parallel computa- between 39 seconds and 331 seconds with a median of 60 tion executes directly in the same environment and with the seconds. We present experimental results that show that same program and data locations as the original sequential these combiners enable the effective parallelization of our command. benchmark scripts. This paper makes the following contributions: • Algorithm: We present a new algorithm that automati- 1 Introduction cally synthesizes combiners for parallel and distributed versions of Unix commands. The resulting synthesized The Unix shell, working in tandem with the wide range combiners enable the automatic generation of parallel and of commands it supports, provides a convenient program- distributed versions of Unix commands and pipelines. ming environment for many stream processing computa- • tions. Shell commands—which can be written in multiple Domain-Specific Language: We present a domain-spe- languages—typically execute sequentially on a single pro- cific language for combiner operators. This language sup- cessor. This sequential execution often leaves available data ports both the class of combiner operators relevant to parallelism, in which a command operates on different parts this domain and an efficient algorithm for automatically of an input stream in parallel, unexploited. This observation synthesizing these combiner operators. • We present theorems (Theorem 2 arXiv:2012.15443v2 [cs.PL] 22 Aug 2021 has motivated the development of systems that exploit data Correct Combiners: parallelism in shell pipelines [18, 26]. A key prerequisite is and Theorem 4) that characterize when the combiner syn- obtaining the combiners required to merge the resulting mul- thesis algorithm will identify a correct combiner for a tiple parallel output streams correctly into a single output given set of parallel output streams. stream. Previous systems rely on developers to manually We also present an analysis of the interaction between our implement such combiners and associate them with their input generation algorithm, our benchmark commands, corresponding shell commands [18, 26]. and our combiner synthesis algorithm. This analysis iden- We present a new system, KumQuat, for automatically tifies why KumQuat generates correct combiners for the exploiting data parallelism available in Unix pipelines. Work- ing with the commands in the pipeline as black boxes, KumQuat automatically generates inputs that explore the behavior of 1There is no requirement that the actual internal implementation must the command to infer and automatically generate a combiner be structured as a divide-and-conquer computation — because KumQuat for the command. This capability enables KumQuat to auto- interacts with the command as a black box, the requirement is instead only matically generate data parallel versions of Unix pipelines, that the computation that it implements can be expressed in this way. ,, Jiasi Shen, Martin Rinard, and Nikos Vasilakis 113 of 121 benchmark commands for which correct com- cat $IN| tr-csA-Za-z '\n ' | trA-Za-z| biners exist, identifies command patterns that ensure cor- sort| uniq-c| sort-rn rect combiner synthesis and correct data parallel execution, and provides insight into the rationale behind the Figure 1. Example pipeline that computes word frequen- KumQuat design and the reasons why the KumQuat de- cies [2, 26]. sign can effectively exploit data parallelism available in its target class of commands. function, which may depend on the sort flag that specifies • Experimental Results: We present experimental results the comparison function. The “uniq -c” command produces that characterize the effectiveness of KumQuat on a set of a stream of (count, word) pairs. Given two streams y1 and y2, 70 benchmark scripts that together have a total of 477 com- the combine operator compares the word in the last line of mands, among which 121 are unique and process an input y1 with the word in the first line of y2. If they are the same, it stream. The results show that KumQuat can effectively concatenates y1 and y2 but combines the last and first lines to synthesize combiners for the majority of our benchmark include the sum of the two word counts. Otherwise it simply commands and that these synthesized combiners enable concatenates y1 and y2. As these examples highlight, select- effective parallelizations of our benchmark Unix pipelines. ing an appropriate combine operator for each command is a critical step for obtaining a correct parallel execution. 2 Example The default KumQuat parallel computation applies the Figure 1 presents an example pipeline that we use to illus- combine operator after the parallel execution of each com- trate KumQuat. The pipeline implements a computation that mand to obtain a single output stream for that command. counts the frequency of words in an input document. The In many cases, however, it is possible to enhance parallel six commands in the pipeline (1) read the input document, performance by eliminating intermediate combine operators (2) break the document into lines of words, (3) translate the so that the output substreams for one command feed directly words into lower case, (4) sort the words, (5) remove dupli- into the input substreams for the direct parallel execution cates and prepend each unique word with a count, and (6) of the next command. KumQuat therefore applies an opti- sort the words on their counts in reverse order. mization that automatically eliminates intermediate combine The pipeline conforms to a standard Unix model that struc- operators when possible (Section 3.5). tures computations as pipelines of building-block commands Model of Computation: KumQuat targets commands 5 that process character streams. The example commands pro- that have a combine operator 6 that satisfies cess the streams as lines of words, with the lines and words 5 ¹x ¸¸ x º = 6¹5 ¹x º, 5 ¹x ºº separated by delimiters, in the example newline and space. 1 2 1 2 As the commands process lines, words, or characters, they for all input streams x1, x2, where the streams are (poten- apply a function to each unit and either output the result tially recursively) structured as units separated by delimiters. of the function (commands “tr -cs A-Za-z ’\n’” and KumQuat currently targets character streams structured “tr A-Z a-z”), sort the units according to a certain order as lines with the newline delimiter, so that x1 and x2 ter- (commands “sort” and “sort -rn”), or accumulate a result minate with newlines. ¸¸ denotes string concatenation. A that is output when the command finishes reading the input key step in the parallelization of 5 is the synthesis of a cor- stream and terminates (command “uniq -c”). rect combiner 6 for 5 . To focus the synthesis algorithm on a To exploit the data parallelism available in this computa- productive space of candidate combiners, KumQuat works tion, KumQuat splits the input data stream into substreams, with combiners expressible in a domain-specific combiner then instantiates the commands to process the input sub- language (Figure 3). streams in parallel. The result is a set of parallel output Combiner Synthesis: To infer a combiner 6 for a com- substreams that must then be combined to obtain the single mand 5 , KumQuat works with a set of candidate combiner final output stream of the command. functions. In the current KumQuat implementation, this set Different commands often require different combine op- consists of all combiner functions with seven or fewer nodes erators. The combine operator for command “tr A-Z a-z” in the DSL abstract syntax tree. KumQuat

Load more