Vectorization vs. Compilation in Query Execution

Juliusz Sompolski¹ (VectorWise B.V.), Marcin Zukowski (VectorWise B.V.), Peter Boncz² (Vrije Universiteit Amsterdam)

¹ This work is part of a MSc thesis being written at Vrije Universiteit Amsterdam.
² The author also remains affiliated with CWI Amsterdam.

Proceedings of the Seventh International Workshop on Data Management on New Hardware (DaMoN 2011), June 13, 2011, Athens, Greece. Copyright 2011 ACM 978-1-4503-0658-4.

ABSTRACT

Compiling database queries into executable (sub-)programs provides substantial benefits compared to traditional interpreted execution. Many of these benefits, such as reduced interpretation overhead, better instruction code locality, and opportunities to use SIMD instructions, have previously been provided by redesigning query processors to use a vectorized execution model. In this paper, we try to shed light on the question of how state-of-the-art compilation strategies relate to vectorized execution for analytical database workloads on modern CPUs. For this purpose, we carefully investigate the behavior of vectorized and compiled strategies inside the Ingres VectorWise database system in three use cases: Project, Select and Hash Join. One of the findings is that compilation should always be combined with block-wise query execution. Another contribution is identifying three cases where "loop-compilation" strategies are inferior to vectorized execution. As such, a careful merging of these two strategies is proposed for optimal performance: either by incorporating vectorized execution principles into compiled query plans or by using query compilation to create the building blocks for vectorized processing.

1. INTRODUCTION

Database systems provide many useful abstractions such as data independence, ACID properties, and the possibility to pose declarative, complex, ad-hoc queries over large amounts of data. This flexibility implies that a database server has no advance knowledge of the queries until runtime, which has traditionally led most systems to implement their query evaluators using an interpretation engine. Such an engine evaluates plans consisting of algebraic operators, such as Scan, Join, Project, Aggregation and Select. The operators internally include expressions, which can be boolean conditions used in Joins and Select, calculations used to introduce new columns in Project, and functions like MIN, MAX and SUM used in Aggregation. Most query interpreters follow the so-called iterator model (as described in Volcano [5]), in which each operator implements an API that consists of open(), next() and close() methods. Each next() call produces one new tuple, and query evaluation follows a "pull" model in which next() is called recursively to traverse the operator tree from the root downwards, with the result tuples being pulled upwards.
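To make the cost structure of this model concrete, the following sketch shows what such a pull-based iterator interface can look like in C. It is our own minimal illustration, not code from VectorWise or from this paper; the names Operator, Tuple and run_query are invented for the example. The point is that every produced tuple costs at least one indirect function call per operator in the plan, which is exactly the per-tuple interpretation overhead discussed next.

// A minimal sketch of a Volcano-style iterator API (illustrative only).
typedef struct Tuple    Tuple;      // one row; layout omitted here
typedef struct Operator Operator;   // forward declaration

struct Operator {
    void   (*open) (Operator *op);
    Tuple *(*next) (Operator *op);  // returns NULL when the input is exhausted
    void   (*close)(Operator *op);
    Operator *child;                // input operator, NULL for a Scan
    void     *state;                // operator-private state
};

// Pull-based evaluation: the root's next() recursively pulls from its child,
// one tuple per call, so interpretation work is paid for every single tuple.
static long run_query(Operator *root) {
    long produced = 0;
    root->open(root);
    for (Tuple *t = root->next(root); t != NULL; t = root->next(root))
        produced++;                 // a real engine would consume t here
    root->close(root);
    return produced;
}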
It has been observed that this tuple-at-a-time model leads to interpretation overhead: the situation that much more time is spent in evaluating the query plan than in actually calculating the query result. Additionally, the tuple-at-a-time interpretation model particularly affects high-performance features introduced in modern CPUs [13]. For instance, the fact that units of actual work are hidden in the stream of interpreting code and function calls prevents compilers and modern CPUs from getting the benefits of deep CPU pipelining and SIMD instructions, because for these the work instructions should be adjacent in the instruction stream and independent of each other.

Related Work: Vectorized execution. MonetDB [2] reduced interpretation overhead by using bulk processing, where each operator would fully process its input and only then invoke the next execution stage. This idea has been further improved in the X100 project [1], later evolving into VectorWise, with vectorized execution. It is a form of block-oriented query processing [8], where the next() method produces a block of tuples (typically 100-10000) rather than a single tuple. In the vectorized model, data is represented as small single-dimensional arrays (vectors), easily accessible for CPUs. The effect is (i) that the percentage of instructions spent in interpretation logic is reduced by a factor equal to the vector size, and (ii) that the functions that perform work now typically process an array of values in a tight loop. Such tight loops can be optimized well by compilers, e.g. unrolled when beneficial, and enable compilers to generate SIMD instructions automatically. Modern CPUs also do well on such loops, as function calls are eliminated, branches become more predictable, and out-of-order execution in CPUs often takes multiple loop iterations into execution concurrently, exploiting the deeply pipelined resources of modern CPUs. It was shown that vectorized execution can improve data-intensive (OLAP) queries by a factor of 50.
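The sketch below illustrates this block-oriented interface; the names (VectorBlock, VecOperator, VECTOR_SIZE) and the choice of 1024 values per block are our assumptions for illustration, not VectorWise's actual internal API. The interpretation work of a next() call is now amortized over an entire block, while the per-value work is left to tight loops over the column arrays, such as the map_* primitives of Algorithm 1 below.

#define VECTOR_SIZE 1024                     // illustrative block size

typedef struct {
    int n;                                   // number of values in this block
    int         l_extprice[VECTOR_SIZE];     // one small array ("vector") per column
    signed char l_discount[VECTOR_SIZE];
    signed char l_tax[VECTOR_SIZE];
} VectorBlock;

typedef struct VecOperator VecOperator;
struct VecOperator {
    // returns the number of tuples produced, 0 when the input is exhausted
    int (*next)(VecOperator *op, VectorBlock *out);
    VecOperator *child;
    void        *state;
};

// The per-block driver: one indirect call per block of up to VECTOR_SIZE tuples
// instead of one call per tuple.
static long run_query_vectorized(VecOperator *root) {
    VectorBlock block;
    long produced = 0;
    for (int n = root->next(root, &block); n > 0; n = root->next(root, &block))
        produced += n;
    return produced;
}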
Related Work: Loop-compilation. An alternative strategy for eliminating the ill effects of interpretation is Just-In-Time (JIT) query compilation. On receiving a query for the first time, the query processor compiles (part of) the query into a routine that gets subsequently executed. In Java engines, this can be done through the generation of new Java classes that are loaded using reflection (and JIT compiled by the virtual machine) [10]. In C or C++, source code text is generated, compiled, dynamically loaded, and executed. System R originally skipped compilation by generating assembly directly, but the non-portability of that approach led to its abandonment [4]. Depending on the compilation strategy, the generated code may either solve the whole query ("holistic" compilation [7]) or only certain performance-critical pieces. Other systems that are known to use compilation are ParAccel [9] and the recently announced HyPer system [6]. We will generalise the current state-of-the-art using the term "loop-compilation" strategies, as these typically try to compile the core of the query into a single loop that iterates over tuples. This can be contrasted with vectorized execution, which decomposes operators into multiple basic steps and executes a separate loop for each basic step ("multi-loop").

Compilation removes interpretation overhead and can lead to very concise and CPU-friendly code. In this paper, we put compilation in its most favourable light by assuming that compilation time is negligible. This is often true in OLAP queries, which tend to be rather long-running, and technologies such as JIT in Java and the LLVM framework for C/C++ [12] nowadays provide low (millisecond) latencies for compiling and linking.

Roadmap: vectorization vs. compilation. Vectorized expressions process one or more input arrays and store the result in an output array. Even though systems like VectorWise go to great lengths to ensure that these arrays are CPU cache-resident, this materialization constitutes extra load/store work. Compilation can avoid this work by keeping results in CPU registers as they flow from one expression to the other. Also, compilation as a general technique is orthogonal to any execution strategy, and can only improve […]

2. CASE STUDY: PROJECT

Inspired by the expressions in Q1 of TPC-H, we focus on the following simple Scan-Project query as a micro-benchmark:

SELECT l_extprice*(1-l_discount)*(1+l_tax) FROM lineitem

The scanned columns are all decimals with precision two. VectorWise represents these internally as integers, using the value multiplied by 100 in this case. After scanning and decompression, it chooses the smallest integer type that, given the actual value domain, can represent the numbers.

Algorithm 1: Implementation of an example query using vectorized and compiled modes. Map-primitives are statically compiled functions for combinations of operations (OP), types (T) and input formats (col/val). Dynamically compiled primitives, such as c000(), follow the same pattern as pre-generated vectorized primitives, but may take arbitrarily complex expressions as OP.

// General vectorized primitive pattern
map_OP_T_col_T_col(idx n, T* res, T* col1, T* col2) {
  for (int i = 0; i < n; i++)
    res[i] = OP(col1[i], col2[i]);
}

// The micro-benchmark uses data stored in:
const idx LEN = 1024;
chr tmp1[LEN], tmp2[LEN], one = 100;
sht tmp3[LEN];
int tmp4[LEN];  // final result

// Vectorized code:
map_sub_chr_val_chr_col(LEN, tmp1, &one, l_discount);
map_add_chr_val_chr_col(LEN, tmp2, &one, l_tax);
map_mul_chr_col_chr_col(LEN, tmp3, tmp1, tmp2);
map_mul_int_col_sht_col(LEN, tmp4, l_extprice, tmp3);

// Compiled equivalent of this expression:
c000(idx n, int* res, int* col1, chr* col2, chr* col3) {
  for (idx i = 0; i < n; i++)
    res[i] = col1[i] * ((100 - col2[i]) * (100 + col3[i]));
}
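As a small worked example of this fixed-point representation (the concrete values below are our own, not taken from the paper): each precision-two decimal is stored scaled by 100, so the product of the three factors in the compiled expression carries a scale of 100*100*100 = 10^6.

#include <stdio.h>

int main(void) {
    int  l_extprice = 10000;   // $100.00 stored as 100.00 * 100
    char l_discount = 5;       // 5%  -> 0.05 stored as 0.05 * 100
    char l_tax      = 6;       // 6%  -> 0.06 stored as 0.06 * 100

    // Same shape as the loop body of the dynamically compiled c000():
    int res = l_extprice * ((100 - l_discount) * (100 + l_tax));

    // 10000 * (95 * 106) = 100,700,000, i.e. 100.70 at scale 10^6,
    // which matches 100.00 * 0.95 * 1.06 = 100.70.
    printf("%d (= %.2f)\n", res, res / 1e6);
    return 0;
}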
