[3B2-14] mmi2010040065.3d 30/7/010 19:43 Page 65
...... GOOGLE-WIDE PROFILING: ACONTINUOUS PROFILING INFRASTRUCTURE FOR DATA CENTERS
...... GOOGLE-WIDE PROFILING (GWP), A CONTINUOUS PROFILING INFRASTRUCTURE FOR
DATA CENTERS, PROVIDES PERFORMANCE INSIGHTS FOR CLOUD APPLICATIONS.WITH
NEGLIGIBLE OVERHEAD, GWP PROVIDES STABLE, ACCURATE PROFILES AND A
DATACENTER-SCALE TOOL FOR TRADITIONAL PERFORMANCE ANALYSES.FURTHERMORE,
GWP INTRODUCES NOVEL APPLICATIONS OF ITS PROFILES, SUCH AS APPLICATION-
PLATFORM AFFINITY MEASUREMENTS AND IDENTIFICATION OF PLATFORM-SPECIFIC,
MICROARCHITECTURAL PECULIARITIES.
...... As cloud-based computing grows uniquely qualified for performance monitor- in pervasiveness and scale, understanding data- ing in the data center. Traditionally, sam- center applications’ performance and utiliza- pling tools run on a single machine, tion characteristics is critically important, monitoring specific processes or the system because even minor performance improve- as a whole. Profiling can begin on demand Gang Ren ments translate into huge cost savings. Tradi- to analyze a performance problem, or it can tional performance analysis, which typically run continuously.1 Eric Tune needs to isolate benchmarks, can be too com- Google-WideProfilingcanbetheoreti- plicated or even impossible with modern cally viewed as an extension of the Digital Tipp Moseley datacenter applications. It’s easier and more Continuous Profiling Infrastructure representative to monitor datacenter applica- (DCPI)1 to data centers. GWP is a continu- Yixin Shi tions running on live traffic. However, applica- ous profiling infrastructure; it samples across tion owners won’t tolerate latency degradations machines in multiple data centers and col- Silvius Rus of more than a few percent, so these tools must lects various events—such as stack traces, be nonintrusive and have minimal overhead. hardware events, lock contention profiles, Robert Hundt As with all profiling tools, observer distortion heap profiles, and kernel events—allowing must be minimized to enable meaningful cross-correlation with job scheduling data, Google analysis. (For additional information on re- application-specific data, and other informa- lated techniques, see the ‘‘Profiling: From sin- tion from the data centers. gle systems to data centers’’ sidebar.) GWP collects daily profiles from several Sampling-based tools can bring overhead thousand applications running on thousands and distortion to acceptable levels, so they’re of servers, and the compressed profile ......
0272-1732/10/$26.00 c 2010 IEEE Published by the IEEE Computer Society 65 [3B2-14] mmi2010040065.3d 30/7/010 19:43 Page 66
...... DATACENTER COMPUTING
...... Profiling: From single systems to data centers From gprof1 to Intel VTune (http://software.intel.com/en-us/intel- as MPI calls) from parallel programs but aren’t suitable for continuous vtune), profiling has long been standard in the development process. profiling. Most profiling tools focus on a single execution of a single program. As computing systems have evolved, understanding the bigger picture across multiple machines has become increasingly important. References Continuous, always-on profiling became more important with Morph 1. S.L. Graham, P.B. Kessler, and M.K. Mckusick, ‘‘Gprof: A and DCPI for Digital UNIX. Morph uses low overhead system-wide sam- Call Graph Execution Profiler,’’ Proc. Symp. Compiler Con- pling (approximately 0.3 percent) and binary rewriting to continuously struction (CC 82), ACM Press, 1982, pp. 120-126. evolve programs to adapt to their host architectures.2 DCPI gathers 2. X. Zhang et al., ‘‘System Support for Automatic Profiling and much more robust profiles, such as precise stall times and causes, Optimization,’’ Proc. 16th ACM Symp. Operating Systems and focuses on reporting information to users.3 Principles (SOSP 97), ACM Press, 1997, pp. 15-26. OProfile, a DCPI-inspired tool, collects and reports data much in the 3. J.M. Anderson et al., ‘‘Continuous Profiling: Where Have All same way for a plethora of architectures, though it offers less sophisti- the Cycles Gone?’’ Proc. 16th ACM Symp. Operating Sys- cated analysis. GWP uses OProfile (http://oprofile.sourceforge.net) as a tems Principles (SOSP 97), ACM Press, 1997, pp. 357-390. profile source. High-performance computing (HPC) shares many profiling 4. N.R. Tallent et al., ‘‘Diagnosing Performance Bottlenecks in challenges with cloud computing, because profilers must gather repre- Emerging Petascale Applications,’’ Proc. Conf. High Perfor- sentative samples with minimal overhead over thousands of nodes. mance Computing Networking, Storage and Analysis (SC Cloud computing has additional challenges—vastly disparate applica- 09), ACM Press, 2009, no. 51. tions, workloads, and machine configurations. As a result, different profil- 5. V. Herrarte and E. Lusk, Studying Parallel Program Behavior ing strategies exist for HPC and cloud computing. HPCToolkit uses with Upshot, tech. report, Argonne National Laboratory, lightweight sampling to collect call stacks from optimized programs 1991. and compares profiles to identify scaling issues as parallelism increases.4 6. O. Zaki et al., ‘‘Toward Scalable Performance Visualization Open|SpeedShop (www.openspeedshop.org) provides similar func- with Jumpshot,’’ Int’l J. High Performance Computing Appli- tionality. Similarly, Upshot5 and Jumpshot6 can analyze traces (such cations, vol. 13, no. 3, 1999, pp. 277-288.
database grows by several Gbytes every day. Does a particular memory allocation Profiling at this scale presents significant scheme benefit a particular class of challenges that don’t exist for a single ma- applications? chine. Verifying that the profiles are correct What is the cycles per instruction (CPI) is important and challenging because the for applications across platforms? workloads are dynamic. Managing profiling overhead becomes far more important as Additionally, we can derive higher-level data well, as any unnecessary profiling overhead to more complex but interesting questions, can cost millions of dollars in additional such as which compilers were used for appli- resources. Finally, making the profile data cations in the fleet, whether there are more universally accessible is an additional chal- 32-bit or 64-bit applications running, and lenge. GWP is also a cloud application, how much utilization is being lost by subop- with its own scalability and performance timal job scheduling. issues. With this volume of data, we can answer Infrastructure typical performance questions about datacen- Figure 1 provides an overview of the en- ter applications, including the following: tire GWP system.