
UNIVERSITY OF CALIFORNIA, IRVINE

RESEARCH

PARALLEL PERFORMANCE-ENERGY PREDICTIVE MODELING OF BROWSERS: CASE STUDY OF SERVO

Rohit Zambre, Lars Bergstrom, Laleh Aghababaie Beni, Aparna Chandramowlishwaran

2,200,000 apps on the Google Play Store, June 2016

A SMART BUT CLUTTERED PHONE

▸ “Free” apps not really free

▸ Cost memory

▸ One gatekeeper

ONE App?

PROBLEMS WITH THE MOBILE BROWSER

▸ 51%: Crashed

▸ 45%: Unexpected

▸ 73%: Too Slow

WHY SLOW?

▸ Web page complexity continually increasing

▸ Fundamental architecture unchanged since 1990s

▸ Only sequential optimizations

▸ Incremental feature support

OBJECTIVE

MAKE THE BROWSER FASTER

Revive the Universal Web platform.

HOW TO MAKE IT FASTER?

PARALLELIZE BROWSER TASKS

RELATED WORK

▸ Parallelizing browser tasks:

▸ Styling: Meyerovich et al. (80x) [28], Badea et al. (1.8x) [21]

▸ Parsing: Zhao et al. (2.4x) [34]

▸ Concurrent/Parallel Browsers:

▸ Concurrent: Google Chrome, Mozilla

▸ Parallel + Concurrent: Qualcomm’s ZOOMM (2x)

THE WORKLOAD OF A BROWSER IS DEPENDENT ON THE PAGE IT IS RENDERING.

Browser Internals

WORKLOAD VARIETY

PARALLEL OVERHEAD

▸ Creation, deletion, coordination

OUR IDEA

ADAPTIVE PARALLELISM

HOW DO WE ADAPTIVELY PARALLELIZE?

▸ Using predictive models of a browser’s parallel performance and energy

▸ Generated using machine learning techniques

HOW TO USE SUCH A MODEL?

[Flow diagram: page features (DOM-size, attribute-count, web-page-size, number-of-leaves, avg-tree-width, max-tree-width, avg-work-per-level) → PREDICTIVE MODEL → PARALLELIZE? No / Yes → HOW MANY THREADS? N]
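
The decision flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Servo's code: `PageFeatures`, `choose_thread_count`, and `toy_model` are hypothetical names, and the thresholds inside `toy_model` are made up. Only the seven feature names, the thread counts 1, 2, and 4, and the idea that the model answers "parallelize, and with how many threads?" come from the slides.

```python
from dataclasses import astuple, dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class PageFeatures:
    """The seven DOM-tree features named on the slides."""
    dom_size: int
    attribute_count: int
    web_page_size: int  # bytes of HTML
    number_of_leaves: int
    avg_tree_width: float
    max_tree_width: int
    avg_work_per_level: float


def choose_thread_count(
    features: PageFeatures,
    model: Callable[[Sequence[float]], int],
    allowed: Tuple[int, ...] = (1, 2, 4),
) -> int:
    """Ask the model for a label; a label of 1 means 'do not parallelize'."""
    label = model(astuple(features))
    # Fall back to sequential execution on an unexpected label.
    return label if label in allowed else 1


def toy_model(x: Sequence[float]) -> int:
    """Stand-in for a trained classifier (hypothetical thresholds)."""
    dom_size = x[0]
    if dom_size < 500:
        return 1
    return 2 if dom_size < 2000 else 4


small_page = PageFeatures(120, 300, 15_000, 70, 12.0, 40, 3.5)
large_page = PageFeatures(5000, 9000, 400_000, 2600, 250.0, 900, 40.0)
print(choose_thread_count(small_page, toy_model))  # 1
print(choose_thread_count(large_page, toy_model))  # 4
```

In a real deployment `toy_model` would be replaced by the trained classifier; the wrapper only guards against labels outside the configured thread-configurations.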

OUR CONTRIBUTIONS

▸ Implementation-blind workload characterization

▸ Automated labeling algorithms

▸ Model generation and prediction

▸ Extendable to any parallel browser on any platform

▸ Configurable thread configurations, labeling

PARALLEL BROWSER INTERNALS

[Pipeline diagram. Stages: SCRIPT, PARSING, STYLING, LAYOUT, PAINTING, COMPOSITING. Inputs: HTML, CSS, SCRIPT from https://www.url.com. Data structures: DOM TREE, FLOW TREE, DISPLAY LISTS, LAYERS, OUTPUT.]

Styling + Layout = Overall Layout

Overall Layout is most computationally intensive

PARALLEL BROWSER INTERNALS

[Figure: example DOM tree with nodes HTML, BODY, P, DIV, TEXT ("Hello World"), and IMAGE; STYLING and LAYOUT stages over DOM TREE, FLOW TREE, DISPLAY LISTS]

PARALLEL BROWSER INTERNALS

▸ Styling: attach styles to DOM tree nodes

▸ Layout: compute absolute geometric dimensions

[Diagram: STYLING and LAYOUT stages over DOM TREE, FLOW TREE, DISPLAY LISTS]

PARALLEL BROWSER INTERNALS

▸ Multiple DOM nodes can be styled simultaneously

▸ Parallel, consecutive top-down and bottom-up tree traversals

▸ Meyerovich et al. [28]

MOZILLA RESEARCH’S SERVO

▸ Written in Rust

▸ Uses work-stealing scheduler

▸ Sequential Overall Layout is 2x faster than Firefox’s

▸ High parallel speedups

▸ 3.2x and 2.5x with 4 threads on humana.com and kohls.com, respectively

WORKLOAD CHARACTERIZATION

▸ DOM tree characteristics intuitively predict available parallelism

▸ Looked at 9 features

▸ Used 7 features

[Figure: Correlation between DOM-tree features and parallel work]

WORKLOAD CHARACTERIZATION

▸ DOM-size: total number of nodes in the DOM tree

▸ attribute-count: total number of attributes in HTML tags

▸ web-page-size: size of the web page’s HTML (bytes)

▸ number-of-leaves: number of leaves in DOM tree

▸ avg-tree-width: average number of nodes at a level of tree

▸ max-tree-width: largest number of nodes at a level of tree

▸ avg-work-per-level: average work per level of tree
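
Most of the features listed above fall out of one breadth-first pass over the DOM tree. The sketch below is an assumption-laden illustration: `Node` and `dom_features` are hypothetical names, the example tree is hand-built to resemble the HTML/BODY/P/DIV/TEXT/IMAGE example used elsewhere in the deck, and the attributes on it are invented. web-page-size (simply the byte length of the HTML) and avg-work-per-level (whose definition of "work" is not given on the slides) are left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag, its attributes, its children."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def dom_features(root: Node) -> dict:
    """Compute tree-shape features with a single breadth-first traversal."""
    level_widths = []  # level_widths[d] = number of nodes at depth d
    dom_size = attribute_count = leaves = 0
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == len(level_widths):  # first node seen at this depth
            level_widths.append(0)
        level_widths[depth] += 1
        dom_size += 1
        attribute_count += len(node.attrs)
        if not node.children:
            leaves += 1
        queue.extend((child, depth + 1) for child in node.children)
    return {
        "DOM-size": dom_size,
        "attribute-count": attribute_count,
        "number-of-leaves": leaves,
        "avg-tree-width": dom_size / len(level_widths),
        "max-tree-width": max(level_widths),
    }


# Hand-built example tree (the attributes are invented for illustration).
tree = Node("html", children=[
    Node("body", children=[
        Node("p", children=[Node("#text")]),
        Node("div", {"class": "hero"}, [Node("img", {"src": "a.png", "alt": ""})]),
    ]),
])
features = dom_features(tree)
print(features)
```

For this tree the level widths are 1, 1, 2, 2, so the sketch reports a DOM-size of 6, 2 leaves, an average width of 1.5, and a maximum width of 2.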

PREPARE TRAINING DATA

▸ Define thread-configurations T = { t | t is a thread-configuration }

▸ For each web page

▸ Collect performance and energy usage with |T| thread-configurations

▸ Compute speedup (p_t) and greenup (e_t) ∀ t ∈ T

▸ Label using a cost model
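
The per-page speedup and greenup values can be computed as below. This sketch assumes both are ratios against the single-thread run (speedup = time with 1 thread / time with t threads, greenup = energy with 1 thread / energy with t threads); the slides do not spell out the baseline. The function name `performance_energy_tuples` and the measurements are hypothetical.

```python
from typing import Dict, Tuple


def performance_energy_tuples(
    measurements: Dict[int, Tuple[float, float]],
) -> Dict[int, Tuple[float, float]]:
    """Map {threads: (time, energy)} to {threads: (p_t, e_t)}.

    Assumes the 1-thread run is the baseline for both ratios.
    """
    base_time, base_energy = measurements[1]
    return {
        threads: (base_time / time, base_energy / energy)
        for threads, (time, energy) in measurements.items()
    }


# Invented measurements for one page: (layout time in ms, energy in J).
page = {1: (10.0, 100.0), 2: (8.0, 90.0), 4: (5.0, 80.0)}
pets = performance_energy_tuples(page)
print(pets[4])  # (2.0, 1.25)
```

A value above 1 means the thread-configuration beat the sequential run on that axis; a value below 1 means parallel overhead outweighed the gain.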

AUTOMATED LABELING

▸ Performance and Energy Cost Model

▸ Label a web page

[Diagram: the PETs (Performance-Energy Tuples) { p_k, e_k }, PET_1 … PET_N, are placed into PET buckets. Bucket j spans the performance range P_j to P_j+1 and has an energy threshold E_j, for j = 1 … M.]

[Flowchart: labeling a web page]

▸ SORT the PETs { p_t, e_t } in DESCENDING order w.r.t. p_t values

▸ Request the next highest bucket (loop from last bucket down to first)

▸ Choose the next PET in that bucket

▸ e_t > E_j? YES: label the page t

▸ NO: More PETs in bucket? YES: choose the next PET

▸ NO: More buckets? YES: request the next highest bucket

▸ NO: label the page 1
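
One possible reading of the labeling flowchart is sketched below. It is an interpretation, not the authors' algorithm: the `Bucket` type, the half-open bucket ranges, and every threshold value are assumptions introduced for illustration. What is taken from the slides is the order of operations: sort PETs by p_t in descending order, walk the buckets from highest to lowest, accept the first PET whose e_t exceeds the bucket's E_j, and fall back to label 1.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Bucket:
    p_low: float   # P_j, lower performance bound (assumed inclusive)
    p_high: float  # P_j+1, upper performance bound (assumed exclusive)
    e_min: float   # E_j, energy threshold a PET must exceed


def label_page(pets: Dict[int, Tuple[float, float]],
               buckets: Iterable[Bucket]) -> int:
    """Return the thread count to use as this page's label."""
    # Sort PETs in descending order of their performance value p_t.
    ranked = sorted(pets.items(), key=lambda item: item[1][0], reverse=True)
    # Loop from the last (highest) bucket down to the first.
    for bucket in sorted(buckets, key=lambda b: b.p_low, reverse=True):
        for threads, (p_t, e_t) in ranked:
            in_bucket = bucket.p_low <= p_t < bucket.p_high
            if in_bucket and e_t > bucket.e_min:
                return threads
    return 1  # no PET qualified in any bucket: stay sequential


# Two invented buckets; the thresholds are NOT the ones used in the study.
buckets = [Bucket(1.1, 1.5, 1.0), Bucket(1.5, math.inf, 0.9)]

print(label_page({1: (1.0, 1.0), 2: (1.25, 1.11), 4: (2.0, 1.25)}, buckets))  # 4
print(label_page({1: (1.0, 1.0), 2: (1.3, 1.05), 4: (1.2, 0.8)}, buckets))    # 2
print(label_page({1: (1.0, 1.0), 2: (0.9, 0.9), 4: (0.7, 0.6)}, buckets))     # 1
```

Giving the higher-performance bucket a looser energy threshold, as in this made-up configuration, expresses a willingness to spend some energy for a large speedup while demanding an energy win for a small one.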

CASE STUDY WITH SERVO

▸ Testbed: Quad-core Intel i7 Ivy Bridge; 8GB RAM; OS X 10.9

▸ Thread-configurations: 1, 2, 4

▸ Samples: 535 webpages from Alexa Top 500 + 2012 Fortune 1000

▸ Google’s Web Page Replay to maintain repeatability

▸ Performance measurements with Servo’s internal instrumentation

▸ Energy measurements with Apple’s powermetrics

SUPERVISED LEARNING

▸ Off-the-shelf learning techniques to train the model

▸ Confident accuracies

Supervised Learning Algorithm            Evaluation Method    Average Accuracy    Maximum Accuracy
Multinomial Logistic Regression (MNR)    Cross-validation     72.22%              87.27%
Ensemble Learning                        Cross-validation     71.12%              85.45%
Neural Network                           Holdout-set          77.8%               -

OUR MODEL VS SERVO

▸ Up to 94.52% performance savings

▸ 2.48 ms with 1 thread vs. 45.41 ms with 4 threads (indeed.com)

▸ Up to 46.32% energy savings

▸ 84.88 J with 1 thread vs. 158.14 J with 4 threads (starbucks.com)

▸ Correctly predicts when 4 threads are needed

▸ 2.15x speedup, 1.17x greenup with 4 threads (booking.com)

▸ 2 threads better than 4 threads

▸ 1.36x with 2 threads vs. 1.21x with 4 threads (creditkarma.com)
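
The headline savings can be checked from the quoted measurements. The sketch assumes "savings" means the relative reduction of the model's 1-thread choice against the 4-thread run; `percent_savings` is a hypothetical helper name.

```python
def percent_savings(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# indeed.com: 45.41 ms with 4 threads vs. 2.48 ms with 1 thread.
time_savings = percent_savings(45.41, 2.48)
# starbucks.com: 158.14 J with 4 threads vs. 84.88 J with 1 thread.
energy_savings = percent_savings(158.14, 84.88)

print(f"{time_savings:.1f}% time, {energy_savings:.1f}% energy")
```

Both results land within a few hundredths of a percentage point of the 94.52% and 46.32% on the slide; the small gap is consistent with the measurements having been rounded to two decimals before being reported.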

FUTURE WORK

▸ Guided, workload-aware scheduling

▸ Static + Dynamic

▸ Modeling on big.LITTLE architecture

▸ Modeling parallelism of other browser tasks

CONCLUSION

▸ Adaptive parallelism is key

▸ Platform-agnostic modeling technique

▸ using key DOM characteristics

▸ Automated labeling algorithms

▸ Robust accuracies and savings

▸ Productive step towards reviving the universal Web platform

We thank IEEE TCPP for providing travel support to attend IEEE HiPC 2016.

THANK YOU! QUESTIONS?

Rohit Zambre
PhD, Computer Engineering
University of California, Irvine
[email protected] | rohitzambre.com

BACKUP SLIDES

PARALLEL BROWSER INTERNALS

[Figure: PARSING stage; HTML and CSS are parsed into a DOM TREE with nodes HTML, BODY, P, DIV, TEXT ("Hello World"), and IMAGE]

▸ Parallel parsing by Zhao et al. [34]

LABELING & LEARNING

▸ Two PET buckets

[Diagram: the two PET buckets; extracted labels: 1. 1.3 1. 12.8 0. 0.8]

[Figure: histogram of # OF WEB PAGES versus SPEEDUP (< 0.8 to > 1.6) for p2 and p4, with cumulative curves CDF2 and CDF4]

▸ Labels → 1: 59.3%; 2: 9.3%; 4: 31.4%

▸ Off-the-shelf supervised learning methods