Mobile App Acceleration Via Fine-Grain Offloading to the Cloud


Chit-Kwan Lin and H. T. Kung
USENIX HotCloud 2014
Copyright © 2014 UpShift Labs, Inc.

Confluence of Forces Points to Offloading to the Cloud

• Devices
  – Proliferation of smartphones & tablets
  – Complex tasks (e.g., imaging, learning, recognition)
  – Emphasis on battery efficiency
• Cloud
  – Cloud computing infrastructures
  – Economies of scale
  – On-demand provisioning
• Network
  – Internet infrastructure
  – Wireless infrastructure (Wi-Fi & cellular)
  – High bandwidths & low latencies
• Together, these forces point to compute offloading

A Simple DSM Supports Offloading

• A distributed shared memory (DSM) keeps the mobile app's memory (objects such as variable_X and method_Y()) continuously replicated to a memory replica held by a cloud daemon
• Continuous replication enables dynamic + fine-grain offloading

Advantages of Dynamic + Fine-Grain

• Dynamic
  – Can offload arbitrary work at runtime
  – Can optimize resource utilization (e.g., battery) at runtime
• Fine-grain
  – Offloading at the level of a method invocation
  – Feels more responsive when a failure requires a local restart to fix
  – Creates a new class of small workloads for cloud providers, useful for leveling out utilization while providing low-latency services to end users

Challenges

• Latency: dictates granularity. If an update takes 5 s to propagate, workloads that run for less than 5 s don't benefit from offloading
• Bandwidth: we should send only deltas, but determining a delta encoding has non-trivial costs
  – e.g., rsync can take 3 round trips (weak hash comparison, strong hash comparison, data) to generate a delta encoding, on top of the time to calculate hashes
• Compute: we should compress to save bandwidth, but compression can be computationally expensive
• Battery: offloading shouldn't end up consuming more of the battery budget than local execution
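The bandwidth/compute tension above can be sketched with Python's standard-library zlib. The synthetic 64 KB blocks and the resulting sizes are illustrative assumptions for this sketch, not measurements from the paper:

```python
import os
import zlib

SIZE = 64 * 1024  # a 64 KB memory block, as in the comparison later

random_block = os.urandom(SIZE)                     # incompressible snapshot
redundant_block = (b"hello world " * 6000)[:SIZE]   # highly redundant snapshot

# Whole-snapshot compression trades CPU for bandwidth: higher zlib levels
# spend more compute for (usually) smaller output.
for label, data in [("random", random_block), ("redundant", redundant_block)]:
    fast = len(zlib.compress(data, 1))  # cheap, weaker compression
    best = len(zlib.compress(data, 9))  # expensive, stronger compression
    print(f"{label}: {len(data)} B -> level 1: {fast} B, level 9: {best} B")

# If only a few bytes changed, the ideal delta is tiny compared with any
# whole-snapshot compression -- but discovering that delta (as rsync does,
# via weak/strong hash exchanges) costs extra round trips and hashing.
old = bytearray(SIZE)
new = bytearray(old)
new[100:104] = b"\x01\x02\x03\x04"
delta = [(i, new[i]) for i in range(SIZE) if new[i] != old[i]]
print(f"delta: {len(delta)} changed bytes out of {SIZE}")  # 4 changed bytes
```

The sketch shows both horns of the dilemma: chasing compression ratio costs compute, while the far smaller delta is only cheap to apply, not to discover.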
Compressive Sensing

• y = Φx, where Φ is an M × N random sampling matrix (M < N), x is an N × 1 K-sparse signal, and y is the M × 1 vector of compressive samples
• Random projection onto a lower-dimensional space randomly mixes the signal elements
• A random Φ preserves the Euclidean lengths/distances of sparse vectors with high probability when M ≥ O(K log(N/K))
• Decode x from y = Φx via optimization (linear programming)

Key Insight

• Writes (deltas) to memory typically constitute a sparse signal that can be compressively sampled

Compressive Replication (1)

• Time t0: device and server memories start out synchronized, with byte values x0
• Both device and server know Φ
• Server calculates y0 = Φx0

Compressive Replication (2)

• Time t1: some values in device memory change, resulting in x1
• Device calculates y1 = Φx1 and transmits y1 to the server

Compressive Replication (3)

• Server calculates y' = y0 − y1 = Φx0 − Φx1 = Φ(x0 − x1) = Φx'
• x' = x0 − x1 is the delta encoding
• y' = Φx' has the same form as the compressive sensing decoding problem, so the server solves for x'

Compressive Replication (4)

• Server calculates x0 − x' = x0 − (x0 − x1) = x1, which is the new memory state
• Server is now updated

Novel Characteristics

• All-in-one: delta encoding + compression
  – The delta encoding is figured out by the server, not the device; it is recovered automatically during decoding
  – The device just sends compressive samples; no additional network costs
• Codec is resource-commensurate, unlike traditional compressors
  – Device: low-complexity encoder
  – Server: higher-complexity decoder
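The replication round above can be sketched end to end in a few lines. This is a toy sketch, not the paper's implementation: the dimensions (N = 128, M = 48, K = 4), the Gaussian Φ, and the use of scipy's `linprog` as the L1 decoder (basis pursuit as an LP over x' = u − v) are all illustrative assumptions; the real system decodes on a GPU.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, M, K = 128, 48, 4  # block length, sample count, changed entries

# Shared random sampling matrix: both sides derive the same Phi from a
# shared seed, so Phi itself is never transmitted.
Phi = rng.standard_normal((M, N))

# Time t0: both sides hold the same memory contents x0.
x0 = rng.integers(0, 256, N).astype(float)
y0 = Phi @ x0                       # server precomputes y0 = Phi x0

# Time t1: a few bytes change on the device -> x1 (delta is K-sparse).
x1 = x0.copy()
changed = rng.choice(N, K, replace=False)
x1[changed] += rng.integers(1, 50, K)

# Device encodes cheaply (one matrix-vector product) and sends only y1.
y1 = Phi @ x1

# Server: y' = y0 - y1 = Phi(x0 - x1) = Phi x', a standard compressive
# sensing decoding problem. Solve min ||x'||_1 s.t. Phi x' = y' as an LP
# with the split x' = u - v, u >= 0, v >= 0.
yp = y0 - y1
c = np.ones(2 * N)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=yp, bounds=(0, None))
xp = res.x[:N] - res.x[N:]          # recovered delta encoding x'

# Server applies the delta: x0 - x' = x1.
x1_recovered = x0 - xp
print(np.allclose(x1_recovered, x1, atol=1e-4))
```

Note how the cost split matches the resource-commensurate codec described above: the device does a single matrix-vector product, while all the expensive L1 decoding lands on the server.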
What do we compare against?

• No similar replication methods exist: compressive replication gives all-in-one delta encoding + compression
  – e.g., with rsync, compression is an additional step, and multiple round trips are needed to determine the delta encoding
• Compressed snapshots are a more appropriate comparison
  – No additional round-trip overheads: just compress the whole memory page (snapshot)
  – Baselines: snappy, zlib
• Metrics
  – Replicate a 64 KB memory block
  – Total latency = encoding + network + decoding
  – Compression ratio (bandwidth cost)

Latency/Compression Trade-off

                           Encoding   Network   Decoding   Total latency   Compression   Update size
                           (ms)       (ms)      (ms)       (ms)            ratio         (KB)
  snappy                   4          15        --         19              3.8 : 1       17.2
  zlib                     487        13        --         500             6.0 : 1       10.9
  compressive replication  53         12        70         135             7.3 : 1       9.0

• Compressive replication achieves the highest compression ratio, with total latency near the ~100 ms threshold of user-perceptible app delay
• It remains good on inputs with high Kolmogorov complexity, which would confound snappy/zlib
• Unlike snappy/zlib, its compression ratio is fixed and its bandwidth cost predictable, because y = Φx is independent of the input's Kolmogorov complexity

Example: Handwriting Recognition App

• UpShift platform for compressive offloading of iOS apps
• SVM for Chinese handwriting recognition; stroke count is the measure of task complexity
• Device: iPad (3rd gen)
• Server: Amazon g2.xlarge (GPU for compressive replication decoding, CPU for SVM evaluation)
• Network: 802.11g in an office setting, 19 ms RTT to a us-east-1a server

App Speedup

• On-device execution time scales poorly with task complexity
• Offloaded execution time stays short due to the low overhead of compressive offloading
• More complex tasks see greater speedup, which is what users expect

Battery Savings

• At high task complexity (25 strokes), over 250 iterations, offloading allows 60% more work to be done with the same battery budget

Summary

• Low-latency DSM updates enable fine-grain offloading
• UpShift supports offloading ~100 ms workloads while keeping resource utilization low
• Achieves significant speedups and battery savings
• Key insight: memory writes are a sparse signal that can be compressively sampled
• Implications for future ARM-based datacenters or x86-based devices

Questions?

[email protected]
