Integrating Apache Arrow and FPGAs on OpenPOWER
Johan Peltenburg
Delft University of Technology

Join the Conversation #OpenPOWERSummit

Outline
• Serialization overhead in big data frameworks
• Apache Arrow in-memory format
• An FPGA acceleration framework
• Regular expression matching experiment
• Conclusion

Big Data Analytics Processing Frameworks Landscape
• Application level languages:
  • Java
  • Scala
  • Python
  • R
  • MATLAB
  • …
• A huge variety of tools and languages used
• Cluster computing & analytics frameworks:
  • Hadoop
  • Spark
  • Flink
  • Storm
  • Samza
  • Pandas
  • Drill
  • Impala
  • …
• Most run on JVM, some parts in C/C++
• Countless libraries / extensions

Let’s attach an accelerator to a JVM
• Data is in run-time objects managed by VM
• Where are they?
  • Allocated by the VM in an “unknown” place
  • Can be subject to Garbage Collection
• What do they look like?
  • Not standardized
  • Determined by VM implementation.
  • For OpenJDK:
    • Header
      • Pointer to class
      • Other bits for monitors, threads, etc…
    • Fields
• We must “serialize”

Example: serialize collection of strings
• Traverse the collection object reference
• Traverse the reference to an array of references
• For every string:
  • Traverse the reference to the string object
  • Traverse the character array reference
• Pay a lot of latency to retrieve small amount of bytes
• Many individual copies of short character arrays

(De)serialization

Serialization throughput
TACC POWER8 node with OpenJDK.
Small objects (p = 2, a=1, e=16, N=220)

Apache Arrow & Fletcher

Arrow format crash course

Schema X {
  A: Float (nullable)
  B: List<Char>
  C: Struct{
    E: Int16
    F: Double
  }
}

A table:
Index  A      B      C
0      1.33f  beer   {1, 3.14}
1      7.01f  is     {5, 1.41}
2      ∅      tasty  {3, 1.61}

Buffers in memory:

A, data:
Index  Data
0      1.33f
1      7.01f
2      X

A, validity:
Index  Valid
0      1
1      1
2      0

B, offsets:
Index  Offset
0      0
1      4
2      6
3      11

B, data:
Offset  Data
0       ‘b’
1       ‘e’
2       ‘e’
3       ‘r’
4       ‘i’
5       ‘s’
6       ‘t’
7       ‘a’
8       ‘s’
9       ‘t’
10      ‘y’

C.E, data:
Index  Data
0      1
1      5
2      3

C.F, data:
Index  Data
0      3.14
1      1.41
2      1.61

Fletcher: Arrow and FPGA, general approach

A: Fixed length data (with validity bitmap)

Index  Data
0      1.33f
1      7.01f
2      X

Index  Valid
0      1
1      1
2      0

● User streams in first and last index in the table.
● Column Reader streams the requested rows in order.
● Internal command stream:
  – First element offset in the data word.
  – No. valid elements in the data word.
● Response handler aligns and serializes or parallelizes the data.

B: Variable length data (without validity bitmaps)

Index  Offset
0      0
1      4
2      6
3      11

Offset  Data
0       ‘b’
1       ‘e’
2       ‘e’
3       ‘r’
4       ‘i’
5       ‘s’
6       ‘t’
7       ‘a’
8       ‘s’
9       ‘t’
10      ‘y’

C: Structs (without validity bitmaps)

E:
Index  Data
0      1
1      5
2      3

F:
Index  Data
0      3.14
1      1.41
2      1.61

Other hardware features
Special thanks to: Jeroen van Straten
● Parameterizable:
  – Data & address widths at host memory interface
  – Burst lengths, FIFO depths, optional register slices on all streams
  – No. elements per cycle in output
● Nested lists, lists in structs, structs in list, etc… are supported.
● Not using any vendor IP, but synthesizes in Vivado & Quartus
● Extensive verification through automatic test bench generation (> 10 000 random schemas tested)
  – For example:
    ● Struct(List(List(Struct(Float, List(Struct(Int, Prim(1), String)), List(Boolean)), Int)), Double)
  – Also varies parameters mentioned above

Regular expression matching experiment
R=16 different regular expressions per unit
AWS EC2 F1:
• Virtex Ultrascale+
• N=16 regex units
• 256 regexes being matched in parallel
POWER8 CAPI:
• AlphaData KU3 (Kintex Ultrascale)
• N=8 regex units
• 128 regexes being matched in parallel

Results (1/2)
Figure panels: AWS EC2 F1; CAPI SNAP

Results (2/2)
Figure panels: AWS EC2 F1 (Intel Xeon); POWER8+CAPI

Conclusion
• Serialization may cause significant bottlenecks in big data frameworks
  • Prevents effective deployment of accelerators in some cases
• Apache Arrow can help to alleviate bottlenecks
• We created an FPGA interface generation framework for Arrow
• Fletcher works with SNAP
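
Sketch: Arrow-style buffers for columns A and B
The buffer layout from the Arrow format crash course can be illustrated with a small software sketch. It packs the example strings of column B into one offsets buffer plus one contiguous data buffer, and the nullable floats of column A into a data buffer plus a validity bitmap. This is only a sketch under stated assumptions: the function names are hypothetical, offsets are kept in a plain Python list instead of a packed 32-bit buffer, and Arrow's buffer padding and alignment are left out. It does not use the Arrow library or Fletcher.

```python
import struct


def encode_strings(strings):
    """Pack strings into an offsets list and one contiguous data buffer."""
    offsets = [0]
    data = bytearray()
    for s in strings:
        data += s.encode("utf-8")
        offsets.append(len(data))  # end of this string, start of the next
    return offsets, bytes(data)


def get_string(offsets, data, i):
    """Read string i back with two offset lookups and one slice."""
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


def encode_nullable_floats(items):
    """Return (validity bitmap, packed float32 data); None marks a null."""
    bitmap = bytearray((len(items) + 7) // 8)
    data = bytearray()
    for i, x in enumerate(items):
        if x is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant bit first
        # A null still occupies a slot in the data buffer (value unspecified).
        data += struct.pack("<f", 0.0 if x is None else x)
    return bytes(bitmap), bytes(data)


def is_valid(bitmap, i):
    """True if element i is not null."""
    return (bitmap[i // 8] >> (i % 8)) & 1 == 1


# Hypothetical usage with the values from the example table.
b_offsets, b_data = encode_strings(["beer", "is", "tasty"])
a_valid, a_data = encode_nullable_floats([1.33, 7.01, None])

print(b_offsets)                         # [0, 4, 6, 11]
print(b_data)                            # b'beeristasty'
print(get_string(b_offsets, b_data, 2))  # tasty
print(is_valid(a_valid, 2))              # False
```

Compared with the per-string reference traversal on the serialization slide, reading row i here touches two neighbouring offsets and one contiguous byte range, which is what makes the layout convenient to stream to an accelerator.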

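Sketch: matching several regular expressions over a string column
As a software analogy for the regular expression matching experiment, the sketch below runs a set of patterns over a string column stored as offsets plus one contiguous data buffer. The slides do not state what each regex unit outputs, so counting the number of matching strings per pattern is an assumption made here, as are the function name and the example patterns. The hardware units match in parallel; this loop is sequential and only shows the data access pattern.

```python
import re


def count_matches(offsets, data, patterns):
    """Count, per pattern, how many strings in the column contain a match."""
    compiled = [re.compile(p) for p in patterns]
    counts = [0] * len(compiled)
    for i in range(len(offsets) - 1):
        # Slicing copies the bytes of string i out of the flat buffer.
        s = data[offsets[i]:offsets[i + 1]]
        for r, rx in enumerate(compiled):
            if rx.search(s):
                counts[r] += 1
    return counts


# Hypothetical usage: the example column "beer", "is", "tasty".
offsets = [0, 4, 6, 11]
data = b"beeristasty"
patterns = [rb"ee", rb"s", rb"^t.*y$"]

counts = count_matches(offsets, data, patterns)
print(counts)  # [1, 2, 1]
```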