Profiling XXXX code with Valgrind

Valgrind is a programming tool for memory debugging, memory leak detection, and profiling.

Valgrind Tools

- None: runs the code in the virtual machine without performing any analysis, and thus has the smallest possible CPU and memory overhead of all tools. Since valgrind itself provides a traceback from a segmentation fault, the none tool provides this traceback at minimal overhead.
- Addrcheck: similar to Memcheck but with much smaller CPU and memory overhead, thus catching fewer types of bugs. Addrcheck has been removed as of version 3.2.0.
- Massif: a heap profiler. The separate GUI massif-visualizer visualizes output from Massif.
- Memcheck: a memory error detector. The problems Memcheck can detect and warn about include the following:
  - Use of uninitialized memory
  - Reading/writing memory after it has been freed
  - Reading/writing off the end of malloc'd blocks
  - Memory leaks
- Helgrind and DRD: detect race conditions in multithreaded code.
- Cachegrind: a cache profiler. The separate GUI KCacheGrind visualizes output from Cachegrind.
- Callgrind: a call-graph analyzer created by Josef Weidendorfer, added to Valgrind as of version 3.2.0. KCacheGrind can visualize output from Callgrind as well as Cachegrind.
- exp-sgcheck: an experimental tool to find stack and global array overrun errors which Memcheck cannot find. Some code results in false positives from this tool.
- exp-dhat: a dynamic heap analysis tool which analyzes how much memory is allocated, for how long, and patterns of memory usage.
- exp-bbv: a performance simulator that extrapolates performance from a small sample set.

Usage

# valgrind --tool=<toolname> <application>
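As an illustration (this example is not from the XXXX code base), a tiny C++ program containing the kinds of errors Memcheck detects could look like this:

    // Demonstration program with the bug classes Memcheck reports.
    #include <cstdlib>
    #include <cstdio>

    int main()
    {
        int *p = (int *)std::malloc(4 * sizeof(int));

        std::printf("%d\n", p[0]); // use of uninitialized memory
        p[4] = 42;                 // write off the end of a malloc'd block

        std::free(p);
        p[0] = 1;                  // write to memory after it has been freed

        (void)std::malloc(16);     // never freed: a memory leak
        return 0;
    }

Running this under valgrind --tool=memcheck --leak-check=full (with whatever name the binary is given) reports each of these errors with a traceback.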

Finding memory leaks in the application

The following section describes how a memory leak in CraneInformationClient was identified and solved. We had earlier seen that the client application was leaking memory, so we executed the profiler with the heap profiler turned on. This was done on a virtual machine running Linux, not on the real hardware.

# valgrind --tool=massif ./CraneInformationClient --config ApplicationConfigurationLinux.xml

During execution, valgrind saves the heap profiling data into a file named massif.out.xxxx, where xxxx is the process ID. This file is hard to read directly, so there is a graphical frontend called massif-visualizer. This is pre-installed in the virtual machine, and can be started with the command:

# massif-visualizer massif.out.xxxx

The initial result from the analysis of CraneInformationClient, after 10 minutes of sampling, was this:

In the diagram above, we can see that the total heap memory consumption is increasing steadily. We allocate a lot of memory with QImageData::create, but after a while it stabilizes at 14.7 MiB of allocated memory. Other function calls, such as qMalloc, seem to keep increasing their memory use all the time. Note that the "peak" values all occur at the end of the sampling period, indicating that the allocated memory is steadily growing.

By default, the result is filtered to show the ten most memory-consuming calls. In the tree above the graph, you can expand each function call to see more details.

Below, two of the items are expanded until we can see something that could be related to our code:

As we know the source code quite well, we can guess that something in CustomValueIndicatorButton and CustomIndicatorButton is leaking memory. After examining the code in the paintEvent of these two components, we found a StyleOption object that was created anew for each repaint and never released, causing a memory leak.
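The report does not reproduce the code, but the pattern was roughly the one sketched below (the class name CustomIndicatorButton is from the report; everything else is illustrative). Heap-allocating the style option on every repaint without a matching delete leaks memory on each frame, while a stack-allocated option is released automatically:

    #include <QPaintEvent>
    #include <QPushButton>
    #include <QStyleOptionButton>
    #include <QStylePainter>

    class CustomIndicatorButton : public QPushButton
    {
    protected:
        void paintEvent(QPaintEvent *) override
        {
            // Leaking version (hypothetical reconstruction):
            //   QStyleOptionButton *opt = new QStyleOptionButton;
            //   initStyleOption(opt);   // 'opt' is never deleted

            // Fixed version: allocate the option on the stack so it is
            // destroyed automatically when paintEvent() returns.
            QStyleOptionButton opt;
            initStyleOption(&opt);

            QStylePainter painter(this);
            painter.drawControl(QStyle::CE_PushButton, opt);
        }
    };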

We could also see a lot of calls to QNetworkAccessManager::get(). After searching the source code, we can see that this is used in the framework for sending HTTP GET requests to the server.

After examining the code that handles the disposal of the HTTP GET requests, we found that a destructor was missing in the class JSONCallPrivate, as highlighted in yellow below:
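The highlighted listing is not reproduced here; the sketch below shows the typical shape of such a fix (JSONCallPrivate is the class named above, the members are assumptions). QNetworkAccessManager::get() returns a QNetworkReply that the caller owns, so a class that stores the reply but lacks a destructor leaks one reply object per request:

    #include <QNetworkAccessManager>
    #include <QNetworkReply>
    #include <QNetworkRequest>
    #include <QUrl>

    class JSONCallPrivate
    {
    public:
        void send(QNetworkAccessManager &manager, const QUrl &url)
        {
            // get() allocates a QNetworkReply that the caller must dispose of.
            reply = manager.get(QNetworkRequest(url));
        }

        // The previously missing destructor: without it, the reply object
        // is never released, leaking memory for every request sent.
        ~JSONCallPrivate()
        {
            if (reply)
                reply->deleteLater();
        }

    private:
        QNetworkReply *reply = nullptr;
    };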

After adding these bug fixes, another profiling sample gave this output:

The allocated memory consumption now stabilizes at 20 MiB, and we can also see that the peaks (now for other function calls) occur at different places in the execution, not at the end of the sample. Here is an even longer sampling. We can see that most of the memory the application uses comes from image handling, but since it stops growing after a while, there is no longer any leak.

Finding performance bottlenecks

Valgrind has a tool called "callgrind" that can be used for performance profiling. A typical profiling run with valgrind looks like this:

# valgrind --tool=callgrind ./myApplication

The profile data will be saved into a file called callgrind.out.xxxx, where xxxx is the process ID. This file must be viewed with a visualizer; the one we have installed in the virtual machine is KCacheGrind.

To use KCacheGrind, simply type:

# kcachegrind callgrind.out.1234

In theory, profiling with callgrind should be quite easy. As an example, we can look at this small example application:
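The original listing is a screenshot and is not reproduced here; a minimal sketch along the same lines could look as follows (apart from mathFunctionCall, log, and cos, which the text mentions, the details are assumptions):

    #include <QApplication>
    #include <QWidget>
    #include <cmath>

    volatile double sink; // keeps the compiler from optimizing the math away

    static void mathFunctionCall()
    {
        // Call three math functions 10,000 times each.
        for (int i = 1; i <= 10000; ++i) {
            sink = std::log(static_cast<double>(i));
            sink = std::cos(static_cast<double>(i));
            sink = std::sqrt(static_cast<double>(i));
        }
    }

    int main(int argc, char *argv[])
    {
        QApplication app(argc, argv);
        mathFunctionCall();

        QWidget window;
        window.show();
        return app.exec();
    }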

This will call three math functions 10,000 times. We execute valgrind with the --tool=callgrind option and get a callgrind.out.xxxx file. Shown in KCacheGrind, we can see the call graph:

This lists all function calls, starting at the initialization of main, with a call tree below. In the tree we can see how CPU time is allocated in each function call. Since this is a Qt application, it calls a lot of CPU-consuming GUI functions. Our small mathFunctionCall function is not easy to find, but it can be searched for in the search bar.

In this example we can see how the CPU load is divided between the log and cos functions. Cos is apparently a much more CPU-consuming operation than log.

This is how the profiler can be used in theory. When using it on a complex application such as XXXX, things get more complicated. Here we use multiple processes, and the code base is much larger, making it hard to get a clear view of what is going on. On top of this, profiling itself is a heavy and memory-consuming task: callgrind typically runs programs about 20-100 times slower than normal.

On the CCpilot XS display, this makes it almost impossible to profile the complete application. During our tests we had to break out parts of the code into test projects to get a profile result.

Conclusions

CPU profiling on the CCpilot XS device is very difficult. The best approach for now seems to be to run the profiler in the virtual machine when testing the complete application, and to create test projects with parts of the code base when profiling on the real hardware.

We did find the memory leaks when profiling in the virtual machine and did not need to run the heap profiler on the CCpilot device.

When it comes to CPU profiling, we could never identify any specific bottlenecks. The CPU load was spread out over the whole application, and no problematic code was found during these investigations.

Tips when profiling in the virtual machine

When profiling XXXX in the virtual machine, valgrind folds results from different threads into "cycles". These are quite hard to interpret; to avoid them, these flags can be added to the valgrind call:

# valgrind --tool=callgrind --separate-callers=5 --separate-recs=19 ./MyApplication --config

Another helpful flag is --instr-atstart=no

It instructs valgrind not to start the profiling right away, since profiling makes the startup _very_ slow. Instead, profiling can be started manually later on, at a specific point in the execution, by calling:

# callgrind_control -i on

This is done from another console window. It turns on the profiler so that a specific page or function can be examined.
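A typical session (the sequence below is assumed from standard callgrind usage, with MyApplication as a stand-in name) could look like this:

# valgrind --tool=callgrind --instr-atstart=no ./MyApplication
# callgrind_control -i on     (in a second console, when the interesting part begins)
# callgrind_control -i off    (when done)

The collected data is written to callgrind.out.xxxx when the application exits; it can also be dumped explicitly with callgrind_control -d.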