actory PC
UNIVERSITY OF CALIFORNIA, IRVINE MOZILLA RESEARCH PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS: CASE STUDY OF SERVO Rohit Zambre, Lars Bergstrom, Laleh Aghababaie Beni, Aparna Chandramowlishwaran 2,200,000 2,200,000 Google Play Store, June 2016 PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
A SMART BUT CLUTTERED PHONE
▸ “Free” apps not really free
▸ Cost memory
▸ One gatekeeper ONE App? ONE App? PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
PROBLEMS WITH THE MOBILE BROWSER
51% 45%
Crashed Unexpected PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
PROBLEMS WITH THE MOBILE BROWSER
51% 45% 73%
Crashed Unexpected Too Slow PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WHY SLOW?
▸ Fundamental architecture unchanged since 1990s PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WHY SLOW?
▸ Fundamental architecture unchanged since 1990s
2000 PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WHY SLOW?
▸ Fundamental architecture unchanged since 1990s
2000 2016 PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WHY SLOW?
▸ Web page complexity continually increasing
▸ Fundamental architecture unchanged since 1990s
▸ Only sequential optimizations
▸ Incremental feature support OBJECTIVE MAKE THE BROWSER FASTER OBJECTIVE MAKE THE BROWSER FASTER
Revive the Universal Web platform. HOW TO MAKE IT FASTER? HOW TO MAKE IT PARALLELIZE FASTER? BROWSER TASKS PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
RELATED WORK
▸ Parallelizing browser tasks:
▸ Styling: Meyerovich et al. (80x) [28], Badea et al. (1.8x) [21]
▸ Parsing: Zhao et al. (2.4x) [34] PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
RELATED WORK
▸ Parallelizing browser tasks:
▸ Styling: Meyerovich et al. (80x) [28], Badea et al. (1.8x) [21]
▸ Parsing: Zhao et al. (2.4x) [34]
▸ Concurrent/Parallel Browsers:
▸ Concurrent: Google Chrome, Mozilla Firefox
▸ Parallel + Concurrent: Qualcomm’s ZOOMM (2x) PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
THE WORKLOAD OF A BROWSER IS DEPENDENT ON THE PAGE IT IS RENDERING.
Browser Internals PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WORKLOAD VARIETY
PARALLEL OVERHEAD PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WORKLOAD VARIETY
PARALLEL OVERHEAD PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WORKLOAD VARIETY
Creation, deletion, coordination
PARALLEL OVERHEAD OUR IDEA ADAPTIVE PARALLELISM PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
HOW DO WE ADAPTIVELY PARALLELIZE? PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
HOW DO WE ADAPTIVELY PARALLELIZE?
▸ Using predictive models of a browser’s parallel performance and energy PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
HOW DO WE ADAPTIVELY PARALLELIZE?
▸ Using predictive models of a browser’s parallel performance and energy
▸ Generated using machine learning techniques PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
HOW TO USE SUCH A MODEL?
1 DOM-size No attribute-count web-page-size PREDICTIVE PARALLELIZE? number-of-leaves MODEL 2 avg-tree-width max-tree-width 3 avg-work-per-level Yes
HOW MANY THREADS? N PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
OUR CONTRIBUTIONS
▸ Implementation-blind workload characterization
▸ Automated labeling algorithms
▸ Model generation and prediction PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
OUR CONTRIBUTIONS
▸ Implementation-blind workload characterization
▸ Automated labeling algorithms
▸ Model generation and prediction
▸ Extendable to any parallel browser on any platform
▸ Configurable thread configurations, labeling PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
PARALLEL BROWSER INTERNALS
SCRIPT PARSING STYLING LAYOUT PAINTING COMPOSITING
HTML CSS
https://www.url.com DOM FLOW DISPLAY LAYERS SCRIPT TREE TREE LISTS OUTPUT PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
PARALLEL BROWSER INTERNALS
SCRIPT PARSING STYLING LAYOUT PAINTING COMPOSITING
HTML CSS
https://www.url.com DOM FLOW DISPLAY LAYERS SCRIPT TREE TREE LISTS OUTPUT
Styling + Layout = Overall Layout is most computationally intensive PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
PARALLEL BROWSER INTERNALS HTML
BODY
P DIV STYLING LAYOUTHello World TEXT IMAGE
DOM FLOW DISPLAY
PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
PARALLEL BROWSER INTERNALS
STYLING LAYOUT ▸ Styling: attach styles to DOM tree nodes
▸ Layout: compute absolute DOM FLOW DISPLAY TREE TREE LISTS geometric dimensions PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
PARALLEL BROWSER INTERNALS
▸ Multiple DOM nodes can be STYLING LAYOUT styled simultaneously
▸ Parallel, consecutive top- down and bottom-up tree DOM FLOW DISPLAY traversals TREE TREE LISTS ▸ Meyerovich et al. [28] PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
MOZILLA RESEARCH’S SERVO
▸ Written in Rust
▸ Uses work-stealing scheduler
▸ Sequential Overall Layout is 2x faster than Firefox’s
▸ High parallel speedups
▸ 3.2x, 2.5x with 4 threads on humana.com, kohls.com PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WORKLOAD CHARACTERIZATION
▸ DOM tree characteristics intuitively predict available parallelism
▸ Looked at 9 features
▸ Used 7 features PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WORKLOAD CHARACTERIZATION
▸ DOM tree characteristics intuitively predict available parallelism
▸ Looked at 9 features
▸ Used 7 features
Correlation between DOM-tree features and parallel work PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WORKLOAD CHARACTERIZATION
▸ DOM-size: total number of nodes in the DOM tree
▸ attribute-count: total number of attributes in HTML tags
▸ web-page-size: size of the web pages’ HTML (bytes)
▸ number-of-leaves: number of leaves in DOM tree
▸ avg-tree-width: average number of nodes at a level of tree
▸ max-tree-width: largest number of nodes at a level of tree
▸ avg-work-per-level: average work per level of tree PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WORKLOAD CHARACTERIZATION
▸ DOM-size: total number of nodes in the DOM tree
▸ attribute-count: total number of attributes in HTML tags
▸ web-page-size: size of the web pages’ HTML (bytes)
▸ number-of-leaves: number of leaves in DOM tree
▸ avg-tree-width: average number of nodes at a level of tree
▸ max-tree-width: largest number of nodes at a level of tree
▸ avg-work-per-level: average work per level of tree PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
WORKLOAD CHARACTERIZATION PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
PREPARE TRAINING DATA
▸ Define thread-configurations T = { t | t is a thread-configuration }
▸ For each web page
▸ Collect performance and energy usage with | T | thread- configurations
▸ Speedup + Greenup ∀ t ∈ T
pt et
▸ Label using a cost model PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
AUTOMATED LABELING
▸ Performance and Energy Cost Model
▸ Label a web page PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
AUTOMATED LABELING
PETN
{ pk, ek }
PET2
PETs { Performance-Energy Tuple } PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
AUTOMATED LABELING
PM PM+1 PET EM buckets PETN
{ pk, ek } Pj Pj+1
Ej PET2
P1 P2 PETs E1 PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS
AUTOMATED LABELING
PM PM+1 PM PM+1 PET
EM buckets EM PETN
{ pk, ek } Pj Ej Ej PET2 P1 P2 P1 P2 PETs E1 E1 PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS AUTOMATED LABELING PM PM+1 PM PM+1 PET Next SORT { pt, et } EM buckets EM PETN Pj Pj+1 DESCENDING highest bucket w.r.t. pt values Ej Choose next PET { pk, ek } Pj More More Request next YES NO NO et Ej Ej PET PETs in Labels buckets? bucket? PET2 highest bucket > Ej YES NO t P1 P2 P1 P2 1 PETs Loop from last bucket down to first E1 E1 PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS CASE STUDY WITH SERVO ▸ Testbed: Quad-core Intel i7 Ivy Bridge; 8GB RAM; OS X 10.9 ▸ Thread-configurations: 1, 2, 4 ▸ Samples: 535 webpages from Alexa Top 500 + 2012 Fortune 1000 ▸ Google’s Web Page Replay to maintain repeatability ▸ Performance measurements with Servo’s internal instrumentation ▸ Energy measurements with Apple’s powermetrics PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS SUPERVISED LEARNING ▸ Off-the-shelf learning techniques to train the model ▸ Confident accuracies Supervised Learning Evaluation Average Maximum Algorithm Method Accuracy Accuracy Multinomial Logistic Cross-validation 72.22% 87.27% Regression (MNR) Ensemble Learning Cross-validation 71.12% 85.45% Neural Network Holdout-set 77.8% - PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS OUR MODEL VS SERVO ▸ Up to 94.52% performance savings ▸ 2.48 ms with 1 thread vs. 45.41 ms with 4 threads (indeed.com) ▸ Up to 46.32% energy savings ▸ 84.88 J with 1 thread vs. 158.14 J with 4 threads (starbucks.com) ▸ Predict rightly when we need to use 4 threads ▸ 2.15x speedup, 1.17x greenup with 4 threads (booking.com) ▸ 2 threads better than 4 threads ▸ 1.36x with 2 threads vs. 1.21x with 4 threads (creditkarma.com) PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS FUTURE WORK ▸ Guided, work-load aware scheduling ▸ Static + Dynamic ▸ Modeling on big.LITTLE architecture ▸ Modeling parallelism of other browser tasks PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS CONCLUSION ▸ Adaptive parallelism is key ▸ Platform-agnostic modeling technique ▸ using key DOM characteristics ▸ Automated labeling algorithms ▸ Robust accuracies and savings ▸ Productive step towards reviving the universal Web platform We thank IEEE TCPP for providing travel support to attend IEEE HiPC 2016 THANK YOU! Rohit Zambre QUESTIONS? PhD, Computer Engineering University of California, Irvine [email protected] | rohitzambre.com BACKUP SLIDES PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS PARALLEL BROWSER INTERNALS BOD Hello World HTML ▸ Parallel parsing by Zhao et al. [34] PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS AUTOMATED LABELING PM PM+ PM PM+ PET Next SORT { pt, et } EM bucke EM PETN Pj Pj+ DESCENDIN highest G Ej { pk, ek } Pj Ej Ej PET2 P1 P2 P1 P2 PET E1 E1 PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS AUTOMATED LABELING PM PM+ PM PM+ PET Next SORT { pt, et } EM bucke EM PETN Pj Pj+ DESCENDIN highest G Ej { pk, ek } Pj et Ej Ej Label PET2 > YE P1 P2 P1 P2 PET E1 E1 PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS AUTOMATED LABELING PM PM+ PM PM+ PET Next SORT { pt, et } EM bucke EM PETN Pj Pj+ DESCENDIN highest G Ej Choose next { pk, ek } Pj More N et Ej Ej PETs in Label PET2 > YE P1 P2 P1 P2 PET E1 E1 PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS AUTOMATED LABELING PM PM+ PM PM+ PET Next SORT { pt, et } EM bucke EM PETN Pj Pj+ DESCENDIN highest G Ej Choose next { pk, ek } Pj Request next More N More N et Ej Ej YE PET PETs in Label highest PET2 > YE P1 P2 P1 P2 PET E1 E1 PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS AUTOMATED LABELING PM PM+ PM PM+ PET Next SORT { pt, et } EM bucke EM PETN Pj Pj+ DESCENDIN highest G Ej Choose next { pk, ek } Pj Request next More N More N et Ej Ej YE PET PETs in Label highest PET2 > YE P1 P2 P1 P2 PET Loop from last bucket down to E1 E1 PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS LABELING & LEARNING ▸ Two PET buckets 1. 1.3 1. 12.8 0. 0.8 p2 p4 CDF2 CDF4 120 100.00% 100 80.00% 80 60 60.00% 40 40.00% 20 20.00% # OF WEB PAGES 0 0.00% 1 0.8 0.9 1.1 1.2 1.3 1.4 1.5 1.6 0.8 1.6 < > SPEEDUP PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS LABELING & LEARNING ▸ Two PET buckets 1. 1.3 1. 12.8 0. 0.8 ▸ Labels → 1: 59.3%; 2: 9.3%; 4: 31.4% PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS LABELING & LEARNING ▸ Two PET buckets 1. 1.3 1. 12.8 0. 0.8 ▸ Labels → 1: 59.3%; 2: 9.3%; 4: 31.4% ▸ Off-the-shelf supervised learning methods