Pushing Data-Induced Predicates Through Joins in Big-Data Clusters
Total Page:16
File Type:pdf, Size:1020Kb
Pushing Data-Induced Predicates Through Joins in Big-Data Clusters Srikanth Kandula, Laurel Orr, Surajit Chaudhuri Microsoft ABSTRACT Using daTa sTaTisTics, we converT predicaTes on a Table inTo daTa in- duced predicaTes (diPs) ThaT apply on The joining Tables. Doing so subsTanTially speeds up mulTi-relaTion queries because The beneûts of predicaTe pushdown can now apply beyond jusT The Tables ThaT Figure Ï:Example illusTraTing creaTion and use of a daTa-induced have predicaTes. We use diPs To skip daTa exclusively during query predicaTe which only uses The join columns and is a necessary con- opTimizaTion; i.e., diPs lead To beTTer plans and have no overhead diTion To The True predicaTe, i.e., σ ⇒ dpartkey. during query execuTion. We sTudy how To apply diPs for complex Ï query expressions and how The usefulness of diPs varies wiTh The pruning [ÊÏ]), buT predicaTes over join columns are rare, and These daTa sTaTisTics used To consTrucT diPs and The daTa disTribuTions.Our Techniques do noT apply when The predicaTes use columns ThaT do results show ThaT building diPs using zone-maps which are already noT exisT in The joining Tables. mainTained in Today’s clusTers leads To sizable daTa skipping gains. Some sysTems implemenT a form of sideways informaTion passing Using a new (slighTly larger) sTaTisTic, ªì of The queries in The TPC- over joins [}Ï, CÊ] during query execuTion. For example, They may H, TPC-DS and JoinOrder benchmarks can skip aT leasT kkì of The build a bloom ûlTer over The values of The join column partkey query inpuT. ConsequenTly, The median query in a producTion big- in The rows ThaT saTisfy The predicaTe on The part Table and use This daTa clusTer ûnishes roughly }× fasTer. bloom ûlTer To skip rows from The lineitem Table. UnforTunaTely, This Technique only applies during query execuTion, does noT easily PvLDB Reference FormaT: exTend To general joins and has high overheads, especially during SrikanTh Kandula, Laurel Orr, SurajiT Chaudhuri. Pushing DaTa-Induced PredicaTes _rough Joins in Big-DaTa ClusTers. PvLDB, Ï}(k): xxxx-yyyy, parallel execuTion on large daTasets because consTrucTing The bloom }ªÏ. ûlTer becomes a scheduling barrier delaying The scan of lineitem DOI: hTTps://doi.org/Ϫ.Ï99Ê/kkCÊ}Ê.kkCÊ}} unTil The bloom ûlTer has been consTrucTed. We seek a meThod ThaT can converT predicaTes on a Table To daTa 1. INTRODUCTION skipping opporTuniTies on joining Tables even if The predicaTe columns In This paper, we seek To exTend The beneûts of predicaTe push- are absenT in oTher Tables. Moreover, we seek a meThod ThaT applies down beyond jusT The Tables ThaT have predicaTes. Consider The fol- exclusively during query plan generaTion in order To limiT overheads lowing fragmenT of TPC-H query ¢Ï9 [}ª]. during query execuTion. Finally, we are inTeresTed in a meThod ThaT is easy To mainTain, applies To a broad class of queries and makes SELECT SUM(l extendedprice) FROM lineitem minimalisT assumpTions. JOIN part ON l partkey = p partkey Our TargeT scenario is big-daTa sysTems, e.g., SCOPE[], Spark[k9, WHERE p brand=`:1' AND p container=`:2' Ê], Hive [ʪ], FÏ [9C] or Pig [C9] clusTers ThaT run SQL-like queries _e lineitem Table is much larger Than The part Table, buT because over large daTasets; recenT reports esTimaTe over a million servers in The query predicaTe uses columns ThaT are only available in part, such clusTers [Ï]. predicaTe pushdown cannoT speed up The scan of lineitem. How- Big-daTa sysTems already mainTain daTa sTaTisTics such as The max- ever, iT is easy To see ThaT scanning The enTire lineitem Table will be imum and minimum value of each column aT diòerenT granulariTies wasTeful if only a small number of Those rows will join wiTh The rows of The input; deTails are in Table Ï. In The resT of This paper, for sim- from part ThaT saTisfy The predicaTe on part. pliciTy, we will call This The zone-map sTaTisTic and we use The word If only The predicaTe was on The column used in The join condiTion, parTiTion To denoTe The granulariTy aT which sTaTisTics are mainTained. partkey, Then a varieTy of Techniques become applicable (e.g., al- Using daTa sTaTisTics, we oòer a meThod ThaT converts predicaTes gebraic equivalence [k], magic seT rewriTing [ª, 9}] or value-based on a Table To daTa skipping opporTuniTies on The joining Tables aT query opTimizaTion Time. _e meThod, an example of which is shown This work is licensed under the Creative Commons Attribution- in Figure Ï, begins by using daTa sTaTisTics To eliminaTe parTiTions on NonCommercial-NoDerivatives 4.0 International License. To view a copy Tables ThaT have predicaTes. _is sTep is already implemenTed in some of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For sysTems [9, ÏÊ, ,ÊÏ].NexT, using The daTa sTaTisTics of The parTiTions any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights ThaT saTisfy The local predicaTes, we consTrucT a new predicaTe which licensed to the VLDB Endowment. Ï Proceedings of the VLDB Endowment, Vol. 12, No. 3 Over all The queries in TPC-H [}C] and TPC-DS [}], There are ISSN 2150-8097. zero predicaTes on join columns perhaps because join columns Tend DOI: https://doi.org/10.14778/3368289.3368292 To be opaque sysTem-generaTed idenTiûers. Table Ï: DaTa sTaTisTics mainTained by several sysTems. Scheme STaTisTic GranulariTy ZoneMaps [Ï] max and min value per col- zone Spark [, ÏÊ] umn file ExadaTa [C] max, min and null presenT or per Table region verTica [], ORC[}], null counT per column sTripe, row- ParqueT [ÏC] group BrighThouse [99] hisTograms, char maps per col daTa pack Figure k:A workow which shows changes in red; using parTiTion sTaTisTics, our query opTimizer compuTes daTa-induced predicaTes and ouTputs plans ThaT read less inpuT. range-seT, ThaT improves query performance over zone-maps aTThe Figure }: IllusTraTing The need To move diPs pasT oTher opera- cosT of a small increase in space. We also discuss how To mainTain Tions (le). On a k-way join when all Tables have predicaTes (middle), sTaTisTics when daTasets evolve. In more deTail, The resT of This paper The opTimal schedule only requires Three (parallel) sTeps (righT). has These conTribuTions. ● Using diPs for complex queries leads To novel challenges. capTures all of The join column values conTained in such parTiTions. – Consider TPC-H query¢Ï9 [}ª] in Figure }(le) which has a _is new daTa-induced predicaTe (diP) is a necessary condiTion of The nesTed sub-query on The lineitem Table. CreaTing diPs for acTual predicaTe (i.e., σ ⇒ d) because There may be false-posiTives; only The fragmenT considered in Figure Ï sTill reads The enTire i.e., in The parTiTions ThaT are included in The diP, noT all of The rows lineitem Table for The nesTed group-by. To alleviaTe This, we may saTisfy σ. However, The diP can apply over The joining Table be- use new QO TransformaTion rules To move diPs; in This case, cause iT only uses The join column; in The case of Figure Ï, The diP shown wiTh doTTed arrows, The diP is pulled above a join, moved consTrucTed on The part Table can be applied on The parTiTion sTaTis- sideways To a diòerenT join inpuT and Then pushed below a group- Tics of lineitem To eliminaTe parTiTions. All of These sTeps happen by Thereby ensuring ThaT a shared scan of lineitem will suõce. during query opTimizaTion; our QO eòecTively replaces each Table – When mulTiple joining Tables have predicaTes, a second chal- wiTh a parTiTion subseT of ThaTTable;The reducTion in inpuT size oen lenge arises. Consider The k-way join in Figure }(middle) where Triggers oTher plan changes (e.g., using broadcasT joins which elim- all Tables have local predicaTes. _e ûgure shows four diPs: inaTe a parTiTion-shuøe [9]) leading To more eõcienT query plans. one per Table and per join condiTion. If applying These diPs If The above meThod is implemenTed using zone-maps, which are eliminaTes parTiTions on some joining Table, Then The diPs ThaT mainTained by many sysTems already, The only overhead is an in- were previously consTrucTed on ThaTTable are no longer up-To- crease in query opTimizaTion Time which we show is small in §. daTe. Re-creaTing diPs whenever parTiTion subsets change will For queries wiTh joins, we show ThaT daTa-induced predicaTes oòer increase daTa skipping buT doing so na¨ıvely can consTrucT ex- comparable query performance aT much lower cosT relaTive To maTe- cessively many diPs which increases query opTimizaTion Time. rializing denormalized join views [C] or using join indexes [, }k]. We presenT an opTimal schedule for Tree-like join graphs which _e fundamenTal reason is ThaTThese Techniques use augmenTary converges To ûxed poinT and hence achieves all possible daTa daTa-sTrucTures which are cosTly To mainTain; yeT, Their beneûts are skipping while compuTing The fewesT number of diPs. STar and limiTed To narrow classes of queries (e.g., queries ThaT maTch views, snowake schemas resulT in Tree-like join graphs. We discuss have speciûc join condiTions, or very selecTive predicaTes) [k]. DaTa- how To derive diPs for general join graphs wiThin a cosT-based induced predicaTes, we will show, are useful more broadly. query opTimizer in §k. We also noTe ThaTThe consTrucTion and use of daTa-induced pred- diPs icaTes is decoupled from how The daTasets are laid ouT.Prior work ● We show how diòerenT daTa sTaTisTics can be used To compuTe idenTiûes useful daTa layouts, for example, co-parTiTion Tables on in § and discuss why range-sets represenT a good Trade-oò be- Their join columns [Ï, C}] or clusTer rows ThaT saTisfy The same pred- Tween space usage and poTenTial for daTa skipping.