
Simple and General Statistical Profiling with PCT

Charles Blake and Steve Bauer
Laboratory for Computer Science, Massachusetts Institute of Technology
[email protected] [email protected]

Abstract

The Profile Collection Toolkit (PCT) provides a novel generalized CPU profiling facility. PCT enables arbitrarily late profiling activation and arbitrarily early report generation. PCT usually requires no re-compilation, re-linking, or even re-starting of programs. Profiling reports gracefully degrade with available debugging data.

PCT uses its controller, dbctl, to drive a debugger's control over a process. dbctl has a configuration language that allows users to specify context-specific debugger commands. These commands can sample general program state, such as call stacks and function parameters.

For systems or situations with poor debugger support, PCT provides several other portable and flexible collection methods. PCT can track most program code, including code in shared libraries and late-loaded shared objects. On Linux, PCT can seamlessly merge kernel CPU time profiles with user-level CPU profiles to create whole-system reports.

1 Introduction

Profiling is the art and science of understanding program performance. There are two main families of profiling techniques, automatic code instrumentation and statistical sampling. Code instrumentation approaches use a high-level language compiler or linker to incorporate new instructions into object file outputs. These instructions count how many times various parts of a program get executed. Some instrumentation systems [12] count function activations while others [1, 21] count more fine-grained control flow transitions. Sampling approaches momentarily suspend programs to sample execution state, such as the value of the program counter. How frequently certain locations occur during an execution estimates the relative fraction of time incurred by those parts of the program.

PCT is a sampling-based profiling system that shows a new way to construct effective performance investigation tools. PCT demonstrates that the same tools programmers are familiar with for answering questions about correctness can be used for effective performance analysis. The philosophy of PCT is that profiling is a particular type of debugging and that the same preparations should be adequate. The focus of PCT is CPU-time profiling rather than real-time profiling, though, in principle, sampling may be applied to either.

PCT is also flexible and easy to use. Enabling profile collection rarely requires re-compiling, re-linking, or even re-starting a program. In its simplest usage, adding a one-word prefix to the command line can activate collection over entire process subtrees and emit a basic analysis report at the end. PCT can track CPU time spent in the main program text, shared libraries, late-loaded dynamic objects, and in kernel code on Linux. PCT works with a variety of programming languages.

A novel aspect of PCT is that it allows sampling semantically rich data such as function call stacks, function parameters, local or global variables, CPU registers, or other execution context. This rich data collection capability is achieved via a debugger-controller program, dbctl. Using debuggers to probe program state allows PCT to sample a wide variety of values. Statistical patterns in these values may explain program performance. For example, statistically typical values of a function's parameters may explain why a program spends a lot of time in that function.

Causally Informative: Maximize ability to explain performance characteristics
Extensible: Sample many kinds of user-defined program state
Late-binding: Defer as long as possible the decision of whether to profile
Early-reporting: Make profile reports available as soon as possible
Non-invasive: Require no extra program build steps or copies of objects
Low-overhead: Minimize extra program run time
Portable: Support informative profiles on any OS and CPU
Robust: Do not rely on program correctness, in particular clean exits
Tolerant: Report quality should gracefully degrade with worse system support, poorer profile data, and less rich debugging data in object files
Exhaustive: Track as much relevant CPU activity as possible
Multilingual: Support different programming languages, multi-language environments

Table 1: Desirable profiling system features.

Additionally, dbctl can drive parallel non-interactive debugging sessions. As the original process creates children, dbctl can spawn off new debugger instances, reliably attaching them to those children. Using a debugger-controller allows a portable implementation of process subtree execution tools, such as function call tracers.

The functionality of PCT gracefully degrades with the available support in the system and the executables of interest. Debugging data or symbol tables are needed for highly meaningful reports. Nevertheless, even stripped binaries allow some analysis. For example, one can track the usage of dynamic library functions or emit annotated disassembly. In concert with instrumentation-based basic block-level profiling, PCT can even estimate CPU cycles per instruction. Sampled data can be windowed in time to isolate different CPU-intensive periods of a program's execution. The various report formats are available through a set of composable primitive programs and shell pipelines.

Profile reports may be generated at any time, even prior to program termination, and several times over the life of one process. Several granularities are available for data aggregation and report formats. Depending on the debugging data available in executables, users can select how to display program locations. This may be at the level of individual instructions, line numbers, functions, source files, or even whole object files or libraries.

The organization of this paper is as follows. Section 2 discusses our design objectives. To make PCT's capabilities more concrete, Section 3 shows a few examples. Section 4 then elaborates upon PCT's implementation of data collection. Section 5 details report generation strategies. Section 6 evaluates the overhead and accuracy of the toolkit. Section 7 discusses some other approaches to profiling. Section 8 describes how to obtain the toolkit. Finally, Section 9 concludes.

2 Design Objectives

The design goals of PCT were driven by user needs and the inadequacies or inaccessibility of prior systems. Table 1 highlights these objectives. The following section argues for the importance of each in turn, and the approach of PCT in general.

Programmers use profiling systems to understand what causes performance characteristics. E.g., if certain functions dominate an execution, then a profile should tell us why those calls are made, and why they might be slow. If functions are called with arguments implying quite different "job sizes", then a profile should be able to capture this for analysis. Exactly how causally informative profiling can or should be is an open issue. More information is better up until some point where overhead and analysis tractability concerns become a problem. Programmers currently have far more a priori knowledge about what to look for than any automatic system can hope to have. A practical answer is an extensible collection system that lets users decide what program variables are most relevant to subsequent performance analysis.

Performance problems often arise only on inputs by end users unanticipated by the developers or in very late stages of testing. These issues are thus discovered at the worst possible time for rebuilding a program and all its dependencies. Long running programs such as system services often have phased behavior. That is, sections of the program with quite distinct performance characteristics execute over various windows in time. Profiles over entire program executions can introduce unwanted averaging over this phased behavior, making results more difficult to interpret. A direct and flexible way to address this problem is to allow late-binding. Ideally, activating and deactivating profile collection should be possible at any stage in the life cycle of a program. As an immediate correspondent, early-reporting is also desirable so that long running programs with highly active phases do not need to terminate before a profile can be examined. Together, these let programmers apply whatever knowledge they have about phased behavior.

Classic instrumentation techniques raise a number of administrative, theoretical, and practical issues. Instrumentation usually requires extra steps to build two versions of programs and libraries, ordinary and instrumented. It is often problematic to require recompilation of all objects in all libraries or to require commercial vendors to provide multiple versions of their libraries. Providing multiple library versions can be a burden even on the various contemporary open source platforms. For instance, profiling-instrumented libraries in /usr/lib on open source distributions are scarce or entirely absent. Also, it is possible to instrument code long after linking it. For example, binary rewriting techniques along the lines of Pixie [23] or Quantify [14] allow this. Completely dynamic instrumentation is also possible [18].

Nonetheless, instrumentation, at whatever time, raises several issues. Instrumented code really is not the same as the original code. Subtle microarchitectural effects can make it hard to understand the overhead of new instructions. Beyond theoretical accuracy issues, there is also a more practical concern in that getting the instrumentation correct is a challenging problem in itself. Profiling instrumentation can interact badly with "new" compiler features, optimization strategies, or uncommon language usage patterns. In the worst case, which is all too frequent, the program produced may not even run correctly. Finally, with the possible exception of fully dynamic instrumentation, this strategy is inherently less extensible. Only a priori data types can be extracted, and this is usually limited to simple counts of executions to avoid re-implementing a good deal of compiler technology.

While instrumentation has the virtue of precision, the above considerations suggest that we should go as far as possible with systems that are non-invasive to the stream of instructions the CPU encounters. In essence, this implies a sampling-based approach. Sampling also has the virtue of incurring tunably low overhead.

Performance problems often arise only when programs are used in very different environments from where they were developed. Platform-specific profiling packages can be more efficient and occasionally more capable. However, they do not help if performance problems cannot be reproduced on supported platforms or environments. Programmers also have a rational resistance to learning and relying upon multiple, disparate system-specific tools and interfaces. Therefore, a more portable system is more valuable.

Portability concerns also suggest a sampling approach. Any preemptive multi-tasking OS already suspends and resumes programs as a matter of course. The only missing pieces for profile collection are a means to suspend frequently and a mechanism to inspect the state of a program. Reading a program's state is inherently simpler than re-writing its code. Thus, sampling is typically no more intrusive than ordinary preemption and requires simpler, less specialized system support than automatic instrumentation.

Some past profiling systems have used in-core buffers that are written to disk in atexit() handlers at the end of a clean program shutdown. A system should not mandate clean termination in order to diagnose performance problems. One phase of a program may warrant performance investigation even if other phases are buggy. Inputs needed to trigger performance pathologies may also instigate incorrect behavior. Conversely, performance pathologies can easily trigger failure modes not ordinarily encountered. Thus, profile collection should ideally be robust against program failure. Many existing implementations could be adapted to be more cautious in this regard.

Bottleneck code can potentially hide anywhere in a program. Restricting profiling coverage to only code compiled or linked into the address space in certain ways leads to many "holes" in the accounting of where execution time was spent. The more exhaustive profiling is, the more likely a profile will unravel performance mysteries.
Mixed programming language systems have become pervasive in modern environments. While C and C++ are a canonical example, a wider range of languages, e.g. Ada, may be supported within one program. If unified source-level debugging exists for these multilingual environments then source-level profiling should also be supported. A profiling system wedded to a particular programming language or code generation system is too inflexible.

PCT is the first profiling system known to the authors to possess all of these properties simultaneously. Extensible and informative data collection is achieved through the ability of source-level debuggers to compute arbitrary expressions and do detailed investigation of program state. PCT is non-invasive and low-overhead since sampling does little more than what the OS ordinarily does during task switching. The sampling rate can be changed to trade off overhead against accuracy. Late-binding is achieved by delaying activation of sampling code or having it instigated by an entirely different process, namely the debugger. Portability derives from relying only on old, well-propagated system facilities dating back to the mid-1980s. Robustness ensues from the earliest possible commitment of data to the OS buffer cache, which is closely related to producing reports as soon as any data has been collected. The system is as multi-lingual as the executable linking environment allows. PCT is as exhaustive as the debugger, OS, and build environment allow. The very small report generation modules enable tolerating various levels of debugging data, customizing reports, and porting PCT to a new platform.

PCT is also a small system. The code for PCT is only 3,500 commented lines of C code and 300 lines of shell scripts. This compact delivery of functionality is possible only because PCT greatly leverages common system facilities.

3 Examples

On many systems getting a quick profile is as simple as: profile myprogram args...

More concretely:

    $ profile ./fingerprint /bin | head
    13.9% /u/cblake/hashfp/binPoly64.C:101
     7.6% /u/cblake/hashfp/binPoly64.C:86
     7.3% /lib/libc-2.1.2.so:getc
     5.3% /u/cblake/hashfp/fingerprint.C:112
     4.3% /u/cblake/hashfp/binPoly64.C:80
     4.2% /u/cblake/hashfp/binPoly64.C:70
     4.0% /u/cblake/hashfp/binPoly64.C:96
     3.7% /u/cblake/hashfp/binPoly64.C:102
     3.0% /u/cblake/hashfp/fingerprint.C:116
     2.8% /u/cblake/hashfp/fingerprint.C:104

By default line numbers are used for source coordinates. If only symbols are available they are used. Finally, raw objectfile:address pairs are printed when there is no debugging data at all.

Profiling mixed kernel and user code on Linux is similar. Below is a quick profile of the disk usage utility, which recurses down a directory tree summing up file allocations:¹

    $ profile -k du -s /disk/pa0 | head
    30.4% /usr/src/linux/vmlinux:iget4
    12.4% /usr/src/linux/vmlinux:ext2_find_entry
     5.4% /usr/src/linux/vmlinux:try_to_free_inodes
     3.5% /usr/src/linux/vmlinux:ext2_read_inode
     3.3% /usr/src/linux/vmlinux:unplug_device
     2.4% /usr/src/linux/vmlinux:lookup_dentry
     2.0% /usr/src/linux/vmlinux:system_call
     1.8% /usr/src/linux/vmlinux:getblk
     1.0% /lib/libc-2.1.2.so:open
     0.9% /lib/libc-2.1.2.so:__lxstat64

¹ Directory and i-node data was pre-read to make the results reflect CPU time spent in the 2.2 kernel.

Note the call hierarchy in the following program:

    int worker(unsigned n) { while (n--) /**/ ; }
    int dispatch_1(unsigned a) { worker(a); }
    int dispatch_2(unsigned b) { worker(b); }
    int main(int ac, char **av) {
        dispatch_1(10000000);
        dispatch_2(20000000);
        return 0;
    }

Before doing any profiling it is obvious that essentially all run time is in the function worker(). There are two paths to this function, as shown clearly via the debugger-based hierarchical profile:

    $ profile -gdb -l3 hier-test
    67.6% worker <- dispatch_2 <- main
    32.4% worker <- dispatch_1 <- main

Finally, consider sampling more semantically rich data. In general this requires amending a 10 line dbctl script similar to the following:

     1 EXEC() ".*" {
     2   #include "gdbprof_prologue.dbctl"
     3   PAT_GROUP(default) {
     4     PAT(1) "=\"SIGVTALRM\"" | OUT("pc") {
     5       "backtrace 4" | OUT("stack");
     6       # OTHER DEBUGGER EXPRESSIONS
     7       "continue";
     8     }
     9     #include "gdbprof_epilogue.dbctl"
    10   }
    11 }

A full description of the pattern-driven state machine language is beyond the scope of this paper. The main EXEC pattern on line 1 restricts which executables the entire rule applies to. Lines 4, 5, and 6 simply capture the pc in the output file "pc", and four levels of stack backtrace in the output file "stack". Adding more debugger commands and data files is just a matter of adding a line to the dbctl script. Depending on the expressions sampled, various post-processing steps may be needed.
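To make the formatter half of -expr concrete, the sketch below shows the core of a scalar-sample formatter in the style of the int-avg program used in the example above. The name int-avg and its exact behavior are taken from the example; the implementation here is our own illustration, not PCT's actual code.

```c
#include <stdio.h>

/* Average a stream of sampled integer values, as an int-avg-style
 * formatter would after the debugger has logged one value of a
 * sampled expression (e.g. worker()'s n) per sample. */
double int_avg(const long *samples, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)samples[i];
    return n ? sum / (double)n : 0.0;
}
```

A real formatter would read one value per line from the debugger's output file on standard input and print the mean in the %.3g style shown above; the statistical summary itself is just this arithmetic.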

We provide a simpler interface for the common case of sampling scalar numbers. Below shows how to sample values of n inside the function worker(), where it is meaningful:

    $ profile -gdb \
        -expr 'hier-test@worker@n' 'int-avg' \
        hier-test
    8.38e+06

The -expr option takes two arguments – a context-specific expression to generate data in the debugger, and a program to format the collected data. The context-specific expression is an '@' separated tuple of strings: a program pattern, a function pattern, a debugger expression.

4 Data Collection

The general implementation philosophy of PCT is to support a full set of options for every aspect of profiling. This minimizes the chance that some system limitation will prevent any profiling outright and enables "best effort" profiling. Small, composable primitives also ease tailoring PCT behavior. Basic users generally use several generic driver scripts, while more advanced users create their own tailored script wrappers.

4.1 Activating Sampling Code

PCT has three basic collection strategies: debuggers, timer signal handlers, and profil() [5]. The first never requires re-starting or re-linking a program, but can have substantial real-time overhead. The latter two are fall-back, library-based strategies which can be used when low overhead is preferable or when debugger support is inadequate. The library-based samplers are, however, less portable. They require linker support for C++-style global initializers and also require either a dynamic library pre-loading facility or manual re-linking. Dynamic pre-loading is commonly available with modern dynamic linkers [3], though not all programs are dynamically linked. In the worst case, if C++-style linking is unavailable, programmers can manually invoke the initializer inside their main() routine. We now examine these samplers in more detail.

The oldest portable profiling primitive is profil(). This directs kernel-resident code to accumulate a histogram of program counter locations in a user-provided buffer. While the call interface does potentially allow multiple executable regions, the authors know of no operating systems that can activate more than one region at a time. To ensure robustness, PCT allocates the userspace buffer as an mmap()-ed file. This also allows the profile to be accessible at any time to other processes, such as report generators. The profil() call to activate kernel-driven collection can be done with either a dynamically pre-loaded or statically linked-in library.

A source-level debugger affords a more general sampling activation strategy. The debugger uses the ptrace() facility and catches all signals delivered to the process, including virtual timer alarms. ptrace() can be used to attach and detach from processes at any time and any number of times over the lifetime of a process. PCT uses a debugger-controller program to implement this procedure portably.

The PCT debugger controller drives the debugger which in turn controls the process via ptrace(). The debugger calls the POSIX setitimer() system call in the context of the target process. This installs virtual time interval timers for the target process. Once these timers are installed, the kernel will deliver VTALRM signals periodically to the target process. At each signal delivery, control will be transferred to the debugger. At this point the debugger driver issues whatever debugger commands are necessary to collect informative data and then continue program execution. For example, a backtrace or where command typically produces a sample of function call stack data. The real time of the sample can also be recorded. Section 4.3 discusses the details of debugger control.

When library code can be used, a pre-main() initializer sets up interval timers, signal handling, and data files. gcc-specific, C++, or system-dependent library section techniques can be used to install the library initializer. Library code may be statically or dynamically linked, or preloaded for dynamically-linked executables via the $LD_PRELOAD environment variable.

PCT collection behavior is controlled through the $PCT environment variable. It controls options such as output directories, histogram granularity, data format, and so on. It also provides a convenient switch for whether profiling happens at all. $PCT and $LD_PRELOAD can both be inherited across fork() and exec(). This conveniently enables profile collection activation on whole process subtrees.

4.2 Collecting Data

4.2.1 Types of Code

The debugger collector supports tracing whatever code the debugger can recognize. All debuggers handle the main program text. Most modern debuggers, e.g. gdb, can debug code in shared libraries and late-loaded object files on most operating systems.

PCT library-based collectors have more specific restrictions. They can collect data on several kinds of code:

• One contiguous region – usually the main program text. This is the oldest style of profiling and works in almost any OS and CPU.

• Shared libraries. On Linux the instantaneous bindings of virtual memory regions to files are exposed. Reading /proc/PID/maps reveals executable regions and corresponding object files. On most BSD OSes ldd reports load addresses of shared libraries. These addresses may be cached in files similar to /proc/PID/maps and read in by the global initializer.

• Late-loaded (e.g. dlopen()ed) code: On Linux, whenever a PC cannot be mapped to a known memory region, the signal handler re-scans /proc/PID/maps to attempt to discover new regions. If it succeeds, the region table is updated and the counts processed. When an object file for the PC cannot be found, further re-scanning is inhibited to suppress repetitive failed searches.

• Kernel code: Linux provides a /proc/profile buffer for the main text of the kernel. Currently, Linux does not support profiling loadable kernel modules.

As mentioned in Section 4, profil()-based collection is generally only available for a single contiguous region of address space. These other types of code are all supported by the more general library-based collector. Profiling kernel modules could be added to Linux or other OS's using the same techniques that PCT uses for managing shared libraries and late-loaded code.

4.2.2 Types of Profiling Data

Collection methods based on profil() or /proc/profile afford little choice as to the type or format of data collected. Other sampling methods, such as $LD_PRELOAD- and debugger-based collection, allow collecting a variety of data. This flexibility creates choices as to what data is collected for later analysis, and how and where it is stored.

PCT provides several data storage formats. The specific profiling situation will usually determine which is best. The choices are:

• Debugger output files: a different log file is used to save the output of each user-specified sampling expression.

• Histogram file: stores frequency counts for various code regions. This guarantees bounded space, but cannot window data.

• Sample-ordered log file: allows simple time-windowing of collection events and post facto histograms, but can grow in size indefinitely.

• Circular log file: similar to the sample-ordered log except that the user bounds the size, which effectively saves only the last N samples.
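The circular log option above can be sketched as a small ring buffer: a user-bounded number of slots, overwritten oldest-first so that exactly the last N samples survive. The field layout is illustrative only, not PCT's on-disk format.

```c
#include <stddef.h>

enum { SLOTS = 4 };                   /* user-chosen bound N */

typedef struct { unsigned long pc; unsigned long when; } sample;

typedef struct {
    sample s[SLOTS];
    size_t next;                      /* total samples ever appended */
} circlog;

/* Append one sample, overwriting the oldest slot once full. */
void circlog_add(circlog *c, unsigned long pc, unsigned long when) {
    c->s[c->next % SLOTS] = (sample){ pc, when };
    c->next++;
}

/* Number of valid samples currently stored (at most SLOTS). */
size_t circlog_len(const circlog *c) {
    return c->next < SLOTS ? c->next : (size_t)SLOTS;
}

/* i-th oldest surviving sample, for i < circlog_len(). */
sample circlog_get(const circlog *c, size_t i) {
    size_t base = c->next < SLOTS ? 0 : c->next - SLOTS;
    return c->s[(base + i) % SLOTS];
}
```

Bounded space comes at the cost of history: unlike the sample-ordered log, only the most recent window of execution can be reconstructed.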
4.3 Controlling Debuggers

The PCT toolkit includes a controller program dbctl for driving debugger tools such as gdb. The controller program is a state machine described by user-specified files. A transition in the state diagram occurs when the controller recognizes a regular expression in the output of the debugger. For each transition, there is a set of debugger commands to issue as well as a series of controller actions. The debugger commands can be any command appropriate for the debugger tool being controlled. For example, controller actions might include logging debugger output, spawning new debuggers to attach to child processes, and capturing specified debugger outputs in internal controller variables.

In the case of profiling, the dbctl tool sets up the interval timers in the processes to be profiled. When the timers expire a signal is raised which transfers control to the debugger and generates output indicating the context of suspension. The controller recognizes various process contexts and issues context-specific instructions. These can include writing out the call stack, local variables and function arguments, or any arbitrary debugger expression.

Profiling is not the only application of dbctl. The tool can also be used to implement a portable call-tracing facility in the spirit of [7] or [11]. While many debuggers have function call tracing capabilities, tracing an entire process tree is more challenging.

The ptrace() mechanism has traditionally had rather weak support for following both a parent and child process across a fork(). Typically, there is a constraint of one-to-one binding between a traced process and a tracing process. After a fork() only one of the potentially traced processes can remain under external control. The other is released to be scheduled by the OS. Breakpoints left in an untraced process cause a SIGTRAP that causes the process to die, since it has no debugger to catch the signal on its behalf. Therefore a debugger arranges things so that breakpoints are disabled across a fork() for one of the two processes.

For dbctl this means that if we arrange to follow the parent, then a fork()ed child could "run away" from the controller, possibly fork()-ing grandchildren before dbctl can attach a new debugger to it. Similarly, if we arrange to follow the child, the parent could run away forking other children before a new debugger can be attached. Ideally, kernels would provide a standard interface to "fork and suspend" ptrace()d processes. Lacking a natural interface to avoid this race condition, PCT developed an interesting workaround. Our fork()-following protocol guarantees that no child process is ever lost and that tracing can be re-enabled immediately after the call to fork().

The protocol works as follows. First, we set a breakpoint at all fork() calls to catch the spawning of children. When the fork() breakpoint is hit, the debugger disables all breakpoints so that the untraced process will not get spurious SIGTRAP signals. It then installs pause() as a signal handler for SIGTRAP. Finally, it sets a breakpoint for the instruction following the call to fork().

The parent remains ptrace()d all along, and immediately traps to the debugger because of the fork()-return breakpoint. All normal breakpoints are re-enabled. The child process id can be found on the stack as the return value from the fork(). dbctl uses this pid to attach a new instance of the debugger to the child process.

Concurrently, the fork()-return breakpoint causes a SIGTRAP to be delivered to the child. Since it no longer has a tracing process, the process' own signal handler is invoked. In this case that function is pause, a system call which simply waits until some signal is delivered. When a newly spawned debugger successfully attaches to the process it interrupts the ongoing pause. The procedure of attaching a new debugger is now complete. dbctl then re-establishes any necessary breakpoints and so on in both processes and lets them run again.

4.4 Limitations

Library-based histogram collectors face a problem with fork()d processes/threads which truly run in parallel (i.e. on multiple CPU systems). The parallel processes can potentially overwrite each other's counter updates. The result is a missed counter increment, with the last writer winning the counter bump. This is rare and is probably not an issue in practice. In any event, one can assess the number of lost counts from the total scheduling time given and the total counts collected. If there is a major discrepancy one can switch to log-file profile collection, which does not share this problem.

Hierarchical samples are currently only supported with the debugger collector. The code to walk back a stack frame is conceivably simple enough for some CPUs to embed directly into our signal handler library. This could drastically reduce overhead at the cost of sacrificing some CPU portability and probably some language neutrality.

Our debugger controller requires that executables either be dynamically linked to the C library, or if statically linked contain a few critical symbols, such as pause, that may not be strictly required to be present. Of course the debugger can also do very little with executables stripped of all symbol data.

Sampling rates are limited by maximum VTALRM delivery rates. These typically range from 1 to 10 ms. Some systems allow increasing this rate. Depending on the richness of collected data, it may not be desirable to increase this rate, as that would entail more real-time overhead.

5 Data Analysis

PCT data collection strategies produce files with quite different information. Designing one monolithic way of reducing this data to programmer-interpretable relationships is hard. Instead PCT provides a toolkit of data aggregation and transformation programs. These can be easily composed via Unix shell pipelines. Their usage is simple enough that users can tailor simple scripts toward individual circumstances and preferences.

5.1 Source Coordinate Resolution

Debuggers emit high-level source coordinates as a matter of course. They are constrained by how much debugging data has been compiled into the executables and libraries being used. If these object files have been compiled with the full complement of debugging data then source and line number-level, function-level, and address-level coordinates are all available. Usually, all are present in the textual debugger output PCT records. This mode of data collection then yields a lot of choices. PCT lets users select the coordinates to be used in reports.

On the other hand, library-based collectors do not resolve program counter addresses to source coordinates while the program is running. Instead they record only program counter addresses and defer higher-level coordinate translation to report generation-time. These addresses are saved in compact binary data formats that keep logs small and minimize IO overhead. There is usually one binary data file for each independently mapped region of program address space. Embedded within these files are the path names and in-memory offsets of the memory-mapped object files. This provides the key information for deferred translation to understand how addresses in memory correspond to addresses in the object files.

Translation of program addresses to more meaningful source coordinates can be awkward. Object file formats vary substantially. The GNU binary file descriptor library gives some relief, allowing the writing of programs which directly access debugging data and symbol tables. This library may be unavailable, out of date, or not support the necessary object file formats. As the examples in Section 3 show, PCT makes best-effort attempts to translate coordinates.

At the least, if executable files retain their symbol table, the system nm program or the debugger can interpret it. If there is no debugger, the GNU binutils package provides a convenient addr2line program which can map PCs to file:line source coordinates. If there is a debugger installed, then a debugger script can operate just as addr2line does, in the restricted capacity of address translation. If there is no debugging data at all, as for stripped binaries, then a disassembly procedure is always an option. Indeed, for instruction-level optimization, it may even be desired to produce count-annotated disassembly files as reports. Of course, assembly-level expertise is required to interpret such reports.

Debuggers emit high-level source coordinates as a matter of course. They are constrained by how much debugging data has been compiled into the executables and libraries being used. If these object files have been compiled with the full complement of debugging data, then source- and line-number-level coordinates are available.

PCT provides a printing program, pct-pr, which bridges the gap between PCT binary data files and programs which perform address translation. pct-pr assumes a simple protocol convenient for shell pipeline syntax. Translators are run as co-processes of pct-pr; i.e., programs read a series of PCs on their standard input and emit corresponding source coordinates to their standard output.

This co-process setup allows fine-tuning of the protocol with pct-pr command-line arguments. For example, printf()-style format strings allow tailoring the object-file PC stream to input requirements, and also allow customizing the output stream. Users can stamp PCT binary data files with particular PC translation requirements, or simply describe which translators to use on the pct-pr command line.

PCT also provides a program, addr2nm, to translate PCs using only the system nm and the symbol tables in executables. This program first dumps nm output into a cache directory if it is not already there. Once this file exists, addr2nm does an approximate binary search on each inbound PC, discovering the symbol with the greatest lower address.

The user interface to these capabilities is simplified by shell script wrappers. For example, pct sym% /tmp/pct/ct/myprogram.*/libm* will create a function-level profile of time spent in the math library.
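The translator protocol is easy to demonstrate end to end. The translator below is a toy stand-in, an awk table mapping two invented addresses to invented file:line coordinates; in real use the co-process would be addr2line, addr2nm, or a debugger script:

```shell
# A translator obeying the co-process protocol: read one PC per line
# on stdin, write one source coordinate per line on stdout.
translate() {
  awk 'BEGIN { map["0x1200"] = "parse.c:42"; map["0x1800"] = "report.c:7" }
       { if ($1 in map) print map[$1]; else print "??:0" }'
}

# pct-pr would run the translator as a co-process; a plain pipe
# demonstrates the same request/response stream:
printf '0x1200\n0x1800\n0xdead\n' | translate
```

The ??:0 convention for unresolvable PCs mirrors what addr2line itself prints.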

5.2 Data Aggregation

One often wants to aggregate profile data over various uses of the same programs or libraries. The canonical example is combining many runs of short-lived programs. In the context of profiling a process tree, one may want to examine many distinct processes as one aggregate set of counts. For example, a libc developer might be interested in all the usages of some particular function, e.g. printf(), throughout a process tree. PCT supports these various styles of aggregation via simple filename conventions and traditional Unix filename patterns. A user can select collections of data files by common filename substrings, such as the names of the object files of interest. For example, pct-pr /tmp/pct/ct/myprogram.*/libc* would generate source coordinates for all samples of the libc code used by myprogram.

The output of source coordinate translation for each file in a collection is a simple pair: a sample count and a label for that location in the program. The granularity of these labels, e.g. function or source:linenumber, determines the notion of similarity for later tabulation. This stream can be sorted with both PCT-specific and standard Unix filters to produce a stream where text lines referring to "similar" code locations are adjacent. A filter can then aggregate counts over text lines with these "similar" suffixes, and hence the similarity determines the level of aggregation. These aggregates are effectively histograms of program counter samples over the address space of the programs. The histogram bins are determined by the granularity of labels; e.g., function-granularity source coordinates will result in a report of the time spent in various functions.

Users often find it easier to think about time fractions rather than raw sample counts. PCT supports this with a filter that totals the whole text stream and then re-emits it with counts converted to percentages. These percentages are normalized to whatever particular selection of counts is under consideration – over multiple runs, over multiple objects, or other combinations.

There are a few PCT report styles that provide more context around the sampled program locations. PCT can create entire copies of source-code files annotated with either counts or time fractions. We also have an Emacs mode, much like grep-mode or compile-mode, to drive examination of filename:linenumber profile reports. This mode allows a user to select report lines of high time fraction; it automatically loads a buffer and warps the cursor to that spot in the code.

6 Evaluation

A few concerns arise in evaluating any profiling system. First, one must ask if profiling overhead is obtrusive relative to real-time events. Large overhead could make results inaccurate. Bearing in mind that programs being profiled may be quite slow already, excessive real-time overhead could also dissuade programmers from using the system. A second concern is the accuracy of the profiling numbers produced by the system. The following subsections discuss these issues.

6.1 Overhead

The overhead of any sampling system is tunably small (or large). There is a fundamental overhead-accuracy trade-off: the more frequently samples are taken, the more overhead is incurred by interrupting the program and recording the samples. However, the larger the sample rate, the more accurate a picture one acquires in a given amount of time.
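This trade-off is easy to quantify: the overhead fraction is roughly the per-sample cost divided by the sampling period. A sketch, using a hypothetical 750 µsec rich-sample cost (mid-range of the measured gdb figures) and the sub-20 µsec library-sampling cost:

```shell
# Overhead fraction = per-sample cost / sampling period (both in usec).
overhead() {
  awk -v cost="$1" -v period="$2" 'BEGIN { printf "%.1f%%\n", 100 * cost / period }'
}

overhead 750 10000   # rich gdb samples at a 10 ms period
overhead 20  10000   # library-based sampling at a 10 ms period
overhead 750 1000    # rich samples at 1 ms approach the factor-of-two regime
```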

Large sample rates may be desirable. Some programs run only briefly, but are CPU-intensive while they run. Accurate probabilities may also motivate a fast sample rate.

On modern CPUs, the cost of signal delivery and the system call to resume execution is typically less than 20 µsec. Thus library-based sampling procedures have very low overhead.

We have measured gdb-based sampling as taking 500..1000 µsec on 700..1300 MHz Pentium III and Athlon-based systems running Linux, FreeBSD, and OpenBSD. The precise time varies depending on the complexity of the parameter lists being decoded, the depth of the stack, the efficiency of the OS, and the CPU. However, almost all of this overhead is gdb making many calls to ptrace() to reconstruct the argument lists of functions in the backtrace. It is possible to arrange a more minimally informative gdb sampling which does no address translation or decoding. This resulted in under 100 µsec per sample, including the round-trip context switch.

These numbers are still relatively encouraging. For 10 ms sampling granularities the overhead is almost unnoticeably small unless quite rich samples are being taken. At 1 ms sampling rates the overhead starts to become near a factor of two, but overhead is not prohibitive until near 100 µsec rates. This also suggests that a debugging library could effect a substantial overhead reduction by computing only the necessary output. Alternately, debugger features could control the verbosity of output more finely.

Also note that, at least on uniprocessors, it does not matter how many processes are being traced: only one process has the CPU at a time. So the same fractional overhead applies to the real time consumed by the entire system of processes.

Reduced real-time performance is the most significant downside of overhead. For very high sampling rates and very rich data collection, a program can run much slower than in its native mode.

6.2 Accuracy

Sampling overhead does not directly impact the accuracy of time estimates. Any constant amount of sampling overhead has the effect of simply increasing the sampling period. Hence only the variance of the time spent handling signals and recording samples might impact the accuracy of program counts. In practice, with code that has known time fractions, sampling time variance seems to have a negligible effect.

Assessing precise time fractions of a program creates a need for large sample sizes. The statistics of location counts are approximately binomial. The program is suspended at some location, i, with probability p_i. The mean of a binomial random variable over N trials, each with probability p_i, is simply N p_i, while the standard deviation is √(N p_i (1 − p_i)). The frequency, p_i = n_i/N, thus has a fractional error proportional to 1/√N. Suppose, for example, location 1 has count n1 and location 2 has count n2. It is easy to show that a two-standard-deviation test for the condition p1 > p2 is approximately n1 − n2 > 2√(n1 + n2). E.g., to decide p1 > p2 for n2 = 1 requires n1 ≥ 5. Fortunately, precise probabilities are usually less important than just identifying the expensive areas of a computation.

Correlations between the time of sampling and paths in the program are more problematic. Consider the specific example of a function which takes almost exactly as long to execute as the time between timer expirations. Also assume this function is repeatedly invoked and dominates the execution time. It should be clear that the program will always be suspended near the same location, even though computation is distributed over all the code implementing this function. [17] discusses this problem in the context of CPU usage statistics. DCPI [8] addresses this problem by randomizing the size of the time interval between samples.

While profil() is inflexible in this regard, any other PCT collection method optionally uses one-shot timers and re-installs timers with randomly spaced delays in the alarm handler. Using large multiples of a typical 10 ms time quantum is likely to seriously reduce the achieved sample size. However, provided that successive periods are unpredictable, even a random alternation between 10 ms and 20 ms guards against a little accidental synchronization. The fixed underlying timer clock makes truly preventing synchronization effects difficult: one cannot build a random interval from even random multiples of a coarse interval. Synchronization at the scale of the underlying time quantum could still cause problems. Truly random sample-to-sample intervals clearly require specialized OS support.
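The two-standard-deviation rule above is mechanical to apply; a small helper (the counts here are invented):

```shell
# Decide p1 > p2 at roughly two standard deviations:
# significant when n1 - n2 > 2 * sqrt(n1 + n2).
significant() {
  awk -v n1="$1" -v n2="$2" \
      'BEGIN { if (n1 - n2 > 2 * sqrt(n1 + n2)) print "yes"; else print "no" }'
}

significant 100 50   # 50 > 2*sqrt(150) ~ 24.5: clearly separated
significant 6 5      # 1 > 2*sqrt(11) ~ 6.6 fails: too few samples
```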

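The synchronization hazard, and the benefit of varying the period, can be seen in a toy simulation (all numbers invented): a loop body of 100 time units sampled with a 100-unit period is always observed at the same phase, while alternating 100- and 170-unit periods spreads samples over many phases:

```shell
# Count how many distinct phases of a 100-unit loop are ever sampled
# when timer periods alternate between p1 and p2.
phases() {
  awk -v p1="$1" -v p2="$2" 'BEGIN {
    t = 0
    for (k = 0; k < 1000; k++) {
      t += (k % 2 == 0) ? p1 : p2   # alternate the two periods
      hit[t % 100] = 1              # loop phase observed at this sample
    }
    n = 0; for (ph in hit) n++
    print n
  }'
}

phases 100 100   # synchronized: every sample lands on one phase
phases 100 170   # alternation: samples spread over ten phases
```

PCT randomizes the choice of delay each time; the deterministic alternation here merely makes the effect reproducible.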
7 Related Work

Profiling is as old an art as writing programs. As with debugging, there has been a large amount of tool-building and research devoted to automating tasks that were originally done with hand-coded instrumentation. These prior profiling systems all take a narrower view of profiling than the PCT philosophy of profiling as a type of debugging in which programmers apply the same familiar tools and preparations.

Automatic instrumentation introduced the possibility of collecting richer data than classic profil()-style samples, such as dynamic call graphs, basic block activations, and control flow transitions. The increasing complexity of software systems has driven a need for multi-process and even whole-system profiling. Both static and dynamic profile-driven optimization have become a focus for those interested in performance.

Additionally, some have considered limited notions of higher-level profiling [22], such as the implementation of abstract data types or other alternative algorithm selection. This approach instrumented programs to record, for example, where data is typically inserted into an ordered list, and used this data to decide between array and linked-list representations: inserts at the beginning favor linked representations, while those at the end favor arrays. PCT might be leveraged to answer similar questions without modifying the program to use an instrumented data structure library.

There have been a large number of compile-time automatic instrumentation systems, all of which are invasive, early-bound, and non-extensible. An early hierarchical profiler was gprof [12]. The programs tcov [6] and gcov [1] are similar to gprof, but instrument basic blocks instead of function calls. Many implementations of these types of profiler are not robust to improper program exits and are not tolerant of inadequate data in some objects.

Many systems have also implemented some form of link-time instrumentation or post-link-time binary re-writing [23, 14, 9, 15, 24, 20]. These address re-building issues somewhat and have some weak extensibility. A significant invasion of foreign code may remain, though: code must be inserted to count executions or, in more involved cases, log procedure arguments.

Recently, a number of researchers have begun investigating the limits of dynamic instrumentation – the re-writing of running executables. IBM has a system called DProbes [18, 10] which enables generic kernel-based late-bound instrumentation. This is similar to our debugger-controller-based approach, but aims to build up a toolset for various machines and architectures rather than relying on the existing debugger infrastructure. Much lower overhead would likely be possible via this approach, but a great deal more work would be needed to allow the sort of arbitrary expressions collectible with debuggers.

Beyond instrumentation systems, there has been significant prior progress in PC-sampling-style profiling as well. There is the classic prof [4], and many latter-day counterparts. The most sophisticated system along these lines is probably the Digital Continuous Profiling Infrastructure (DCPI) [8]. DCPI focuses on understanding how the microarchitectural features of Alpha processors play out in full-system applications. This system is unfortunately proprietary and non-portable, as many of its most impressive features rely upon CPU and OS support. The focus on low-level CPU behavior instead of high-level program semantics makes this system more useful for compiler writers and other assembly-level optimizers. For instance, it does not support hierarchical call-path samples along the lines of [13].

SGI's IRIX-specific SpeedShop [2] system is probably the closest system in spirit to PCT, though it stops short of a full debugger-profiler. It does sampled hierarchical profiling, has graceful report degradation, and is late-binding. However, in addition to being specific to IRIX on MIPS, it is restricted to dynamically linked executables and fails to be exhaustive in terms of kernel-resident and late-loaded code. Being proprietary, it is difficult to evaluate its extensibility.

There have been many kernel-level profiling tools as well. Some system call tracers, like strace -c, support simple system call profiling [7]. Yaghmour's Linux Trace Toolkit [25] is a useful kernel-level event monitoring facility. PCT can leverage the easily accessible /proc/profile data on Linux. PCT's ptrace()-based techniques do not easily extend to kernel code, since the necessary process control features are not typically available for the kernel itself.

The type of non-interactive debugging PCT does is similar to the Expect system for controlling interaction [16]. Our debugger controller is similar, but with configuration syntax tailored to profiling and the ability to handle large subtrees of processes simultaneously. dbctl also does not rely on the relatively slow logic and string processing of Tcl [19]. As noted earlier, dbctl also allows non-interactive process control other than state sampling.

8 Availability

PCT is freely available under an open source license. More information and current software releases can be obtained at the PCT web page:

http://pdos.lcs.mit.edu/~cblake/pct

In the realm of simple profiling, every Unix after AT&T Version 7 has support for profil() functionality, which provides at least some capability. Kernel profile integration is currently only available on Linux. The ldd command on OpenBSD and FreeBSD is informative enough to allow tracking shared library usage and process tree profiling.

Generalized profiling should be available on any system with a good source-level debugger for the programming languages of interest. Currently, gdb works well on the above-mentioned systems as well as Solaris, HP-UX 9, 10, and 11, AIX, Irix, SunOS, various other BSDs, and probably many more platforms. Ports of our interaction scripts to other debuggers are under way.

9 Conclusion

Each year software systems grow in complexity, from multiple code regions per address space to multi-process programs. Correctness becomes harder to achieve, and conventional wisdom is to postpone performance analysis as long as possible. PCT requires no more preparation than debugging does. This allows programmers to interleave the optimization and debugging of their programs however they see fit.

PCT unifies access to a number of existing profiling features that have been available for some time, and extends profiling in new directions. The PCT debugger-based profiling architecture substantially extends the sort of data that automatic profiling can collect. One can sample programmer-definable, context-specific data. Such samples can often more readily expose higher-level algorithmic issues, such as a mismatch between program structures and user inputs. The overhead of PCT scales reasonably with the complexity of the program data being sampled and with sampling rates.

Finally, PCT is very portable by design, requiring no special CPU or OS features or support. Informative data can be gathered on most code in flexible ways. Reports can be generated flexibly, based on various data aggregations, even while the program is still running. These features usually require no recompiling, re-linking, or even re-starting of users' programs.

References

[1] gcov: a test coverage program. http://gcc.gnu.org/onlinedocs/gcc-3.0.

[2] Irix 6.5 SpeedShop user's guide. http://techpubs.sgi.com/.

[3] ld(1). Unix man pages.

[4] prof(1). AT&T Bell Laboratories, Murray Hill, N.J., UNIX Programmer's Manual, January 1979.

[5] profil(2). Unix man pages.

[6] tcov(1). AT&T Bell Laboratories, Murray Hill, N.J., UNIX Programmer's Manual, January 1979.

[7] W. Akkerman. strace home page. http://www.liacs.nl/~wichert/strace/.

[8] J. Anderson, L. Berc, J. Dean, S. Ghemawat, M. Henzinger, S.-T. Leung, R. Sites, M. Vandervoorde, C. Waldspurger, and W. Weihl. Continuous profiling: Where have all the cycles gone? ACM Transactions on Computer Systems, 15(4), Nov. 1997.

[9] T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319–1360, July 1994.

[10] B. Buck and J. K. Hollingsworth. An API for runtime code patching. The International Journal of High Performance Computing Applications, 14(4):317–329, Winter 2000.

[11] B. Driehuis. ltrace home page. http://utopia.knoware.nl/users/driehuis/.

[12] S. Graham, P. Kessler, and M. McKusick. gprof: a call graph execution profiler. SIGPLAN Notices, 17(6):120–126, June 1982.

[13] R. J. Hall and A. J. Goldberg. Call path profiling of monotonic program resources in UNIX. In Proceedings of the Summer 1993 USENIX Conference, June 21–25, 1993, Cincinnati, Ohio, USA, pages 1–13, Berkeley, CA, USA, Summer 1993. USENIX.

[14] R. Hastings and B. Joyce. Purify: Fast detection of memory leaks and access errors. In Proceedings of the Winter 1992 USENIX Conference, pages 125–138, San Francisco, California, 1991.

[15] J. R. Larus and E. Schnarr. EEL: Machine-independent executable editing. In Proceedings of the ACM SIGPLAN '95 Conference on Programming Language Design and Implementation (PLDI), pages 291–300, La Jolla, California, 18–21 June 1995.

[16] D. Libes. expect: Curing those uncontrollable fits of interaction. In Proceedings of the USENIX Summer 1990 Technical Conference, pages 183–192, Berkeley, CA, USA, June 1990. USENIX Association.

[17] S. McCanne and C. Torek. A randomized sampling clock for CPU utilization estimation and code profiling. In Proceedings of the Winter 1993 USENIX Conference, January 25–29, 1993, San Diego, California, USA, pages 387–394, Berkeley, CA, USA, Winter 1993. USENIX Association.

[18] R. J. Moore. A universal dynamic trace for Linux and other operating systems. In Proceedings of the FREENIX Track (FREENIX-01), pages 297–308, Berkeley, CA, June 2001. USENIX Association.

[19] J. K. Ousterhout. Tcl: An embeddable command language. In Proceedings of the USENIX Association Winter Conference, 1990.

[20] T. Romer, G. Voelker, D. Lee, A. Wolman, W. Wong, H. Levy, B. Bershad, and J. B. Chen. Instrumentation and optimization of Win32/Intel executables using Etch. In The USENIX Windows NT Workshop 1997, August 11–13, 1997, Seattle, Washington, pages 1–7, Berkeley, CA, USA, Aug. 1997. USENIX.

[21] A. D. Samples. Profile-driven compilation. Technical Report UCB//CSD-91-627, UC Berkeley, Department of Computer Science, 1991.

[22] A. D. Samples. Compiler implementation of ADTs using profile data. Technical Report 641, UC Berkeley, Department of Computer Science, 1992.

[23] M. D. Smith. Tracing with pixie. Technical Report CSL-TR-91-497, Stanford University, Computer Systems Laboratory, Nov. 1991.

[24] A. Srivastava and A. Eustace. ATOM: A system for building customized program analysis tools. ACM SIGPLAN Notices, 29(6):196–205, June 1994. ACM SIGPLAN '94 Conference on Programming Language Design and Implementation (PLDI).

[25] K. Yaghmour and M. R. Dagenais. Measuring and characterizing system behavior using kernel-level event logging. In Proceedings of the 2000 USENIX Annual Technical Conference (USENIX-00), pages 13–26, Berkeley, CA, June 18–23, 2000. USENIX Association.