
Towards Concurrent Stateful Stream Processing on Multicore Processors Towards Concurrent Stateful Stream Processing on Multicore Processors Shuhao Zhang co-author: Yingjun Wu (AWS), !eng"hang (#enmin %ni&ersity of China) Bingsheng He Xtra Computing lead Groupby Prof. Bingsheng He Towards Concurrent Stateful Stream Processing on Multicore Processors Data Stream Processing Systems • [Available] Fast stream processing on large-scale multicore architectures • E.g., BriskStream1, StreamBox2, Grizzly3 • [Inadequate] Support of consistent stateful stream processing 1. BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures, SIGM D!1" 2. StreamBo$: Modern Stream Processing on a Multicore Machine, A%&!19 3. Gri((ly: Efficient Stream Processing %hrough Ada+tive Query Com+ilation, SIGM D!#. ( Towards Concurrent Stateful Stream Processing on Multicore Processors Outline • Motivation • State-of-the-art • Our Approach • Experimental Results ) Towards Concurrent Stateful Stream Processing on Multicore Processors Motivation Example • Road congestion status is shared among different streaming operators • Its consistency needs to be preserved +oll Processing: -- consistent state/ul road computes toll based on stream processing congestion the roadstatus congestion status * Towards Concurrent Stateful Stream Processing on Multicore Processors The Current Common System Design disjoint subset &ommon designs: operator parallelis$ a) Pipelined processing with message passing b) On-demand data parallelism - Towards Concurrent Stateful Stream Processing on Multicore Processors Key-based Stream Partition #oad 0 AY. e/ecutor0 3o 'onflict1 #oad0 copy Road 2 Pay e/tra penalty PIE e/ecutor( if road1 is Road2 e$,ty1 #oad0 Can not efficiently handle general case 2 Towards Concurrent Stateful Stream Processing on Multicore Processors Lock-based State Sharing #oad1 AY. e/ecutor0 #oad0 ( )6 Road 2 PIE e/ecutor( Lead to poor performance 5 Towards Concurrent Stateful Stream Processing on Multicore Processors Access Ordering A toll should based on exactly current road condition, otherwise.. 7 Towards Concurrent Stateful Stream Processing on Multicore Processors Access Ordering &ommon workaround: • Buffer and sort Lead to poor performance 8 Towards Concurrent Stateful Stream Processing on Multicore Processors Outline • Motivation • State-of-the-art • Our Approach • Experimental Results 09 Towards Concurrent Stateful Stream Processing on Multicore Processors Consistent Stateful Stream Processing Employing transactional schematics. • State transaction (): 1. txnt1{Update Road1 Cnt, update Roadx…} @ 10:00; 2. txnt( {#ead Road1 Cnt 6; @ 00:9- • Correct schedule: … 00 Towards Concurrent Stateful Stream Processing on Multicore Processors Existing Solutions Revisited What if we use existing design[1]? • Severe lock contention A new solution for scaling concurrent state access is needed! [1] S-Store: Streaming Meets Transaction Processing, VLDB’15 0( Towards Concurrent Stateful Stream Processing on Multicore Processors Outline • Motivation • State-of-the-art • Our Approach: • TStream (An extension of BriskStream[2]) • Experimental Results [2] BriskStream: Scaling Data Stream Processing on S are!-Memor" M#lticore $rc itect#res, S%&M'D’1( 0) Towards Concurrent Stateful Stream Processing on Multicore Processors Design Overview • Design 1) Dual-Mode Scheduling • Exposing Parallelism • Design 2) Dynamic Restructuring Execution • Exploiting Parallelism. Up to 4.8 times higher throughput with similar or even lower processing latency! 0* Towards Concurrent Stateful Stream Processing on Multicore Processors Design1) Dual-Mode Scheduling • Initial process on an • State access is often the input event PreProcess bottleneck. • It unnecessarily bloc=s • Access to shared states StateAccess all subsequent processes. • Process on an input Let’s postpone it. event with reference to PostProcess obtained state values 0- Towards Concurrent Stateful Stream Processing on Multicore Processors Design 1) Dual-Mode Scheduling (a)PreProcess (b)StateAccess (c)PostProcess (d)PreProcess 'ompute Mode State Access Mode 'ompute Mode Pros: Enlarging inter-event parallelism opportunities. Cons: 3eed to handle ``on-the-flyA e&ents. 02 Towards Concurrent Stateful Stream Processing on Multicore Processors Design 2) Dynamic Restructuring Execution For each state transaction: Dynamically decompose and regroup operations to avoid conflict before actual parallel evaluation is applied. e* e1 e2 e)ecutor1 e)ecutor2 txnt1: {O1, O2} txnt1 t)nt2 txn2: {O3} → O : #ead of Road 1 ' 1 * : %,date to Road 2 Cperation chains: 6 O2 ' ' O3 : #ead of Road 2 1 2 : A thread #oad 1 #oad 2 05 Towards Concurrent Stateful Stream Processing on Multicore Processors Outline • Motivation • State-of-the-art • Our Approach • Experimental Results 07 Towards Concurrent Stateful Stream Processing on Multicore Processors Benchmark Workloads • Four applications: • Grep Sum (GS); Streaming Ledger (SL); Online Bidding (OB); Toll Processing (TP). • Diverse characteristics: • Varying compute/state access ratio • Varying state transaction types: read-only, write-only • Varying data dependencies • Varying multi-partition transactions Experiments conducted on a 4-socket Intel Xeon E7- 4820 server with 128 GB DRAM. 08 Towards Concurrent Stateful Stream Processing on Multicore Processors Overall Performance Comparison (i) In general, TStream perfor$s well at large core counts. (ii) TStrea$ perfor$s slightly better when data dependency is hea&ily presented (SE). (iii) Large room for further impro&ement. (9 Towards Concurrent Stateful Stream Processing on Multicore Processors Recap • [Key Designs] 1) dual-mode scheduling, 2) transaction restructuring • [Results] TStream performs constantly better than prior solutions under varying workloads. https://github.com/Xtra-Computing/briskstream/tree/TStream (0 Towards Concurrent Stateful Stream Processing on Multicore Processors Current Work +e)t &eneration %oT Data 1isit #s 2 Management Plat,orm www.nebula.stream -lea! ." Pro,/ Volker Markl0 +Ditter@nebulastrea$ Thank "ou3 s # ao/4 ang2t#-.erlin/!e (( Towards Concurrent Stateful Stream Processing on Multicore Processors Competitor Schemes Ordering-lock-based approach (LOCK). ① Multi-versioning-based approach (MVLK). ② Static-partition-based approach (PAT). ③ • S-Store No Consistency Guarantee (No-Lock). ④• Remove locks from LOCK scheme. • Represent the performance upper bound. () Towards Concurrent Stateful Stream Processing on Multicore Processors Runtime Breakdown (a) 3o-Lock scheme spends more than 60% in others mainly due to inde/ look up. (b) Synchronization o&erhead dominates in all ,rior solutions regardless of N%BA eGect. (c) +Stream in&ol&es high #BA o&erhead. 3%BA-aDare optimization detailed in our paper. (* Towards Concurrent Stateful Stream Processing on Multicore Processors No free lunch • Currently, TStream has its drawback on efficiently handling transaction abort. • E.g., one that leaves the average speed of a road to become negative. Stay tuned for future work (-.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages25 Page
-
File Size-