Evaluating Methods for Grouping and Comparing Crash Dumps


DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2019

Evaluating methods for grouping and comparing crash dumps

MICHEL CUPURDIJA

Master in Computer Science
Date: January 28, 2019
Supervisors: Alexander Baltatzis, Cyrille Artho
Examiner: Johan Håstad
KTH Royal Institute of Technology, School of Electrical Engineering and Computer Science
Swedish title: Utvärdering av metoder för att gruppera och jämföra krashdumpar

Abstract

Observations suggest that a high percentage of all reported software errors are reoccurrences, in certain cases as high as 75%. This high rate of reoccurrence means that companies waste hours manually re-diagnosing errors that have already been diagnosed. The goal of this thesis was to eliminate or limit cases where errors have to be re-diagnosed, through automated grouping of crash dumps. In this study we constructed a series of tests and evaluated both pre-existing methods and our newly proposed methods for comparing and matching crash dumps. A set of known errors was used as the basis for measuring the matching precision and grouping ability of each method. Our results show a large variation in accuracy between methods and that, generally, the more accurate a method is, the less it offers in terms of grouping ability. With an accuracy ranging from 50% to 90% and a reduction in manual diagnosis of up to 90%, we have shown that automatic grouping of crash dumps can accurately identify reoccurrences and reduce manual diagnosis.

Sammanfattning (Swedish abstract, translated)

The goal of this report was to investigate methods for grouping crash dumps. Reports in the field have shown that up to 75% of reported bugs can be repeated occurrences of the same bug. The aim has therefore been to reduce the need for manual diagnosis by grouping crash dumps that share the same error source. In our study we constructed tests to objectively compare and evaluate the different methods: both pre-existing grouping methods and methods we propose ourselves. The tests measured the precision of the grouping methods as well as their grouping ability. The evaluation showed a large variation in precision between the methods, but also a correlation between grouping ability and precision: methods with high precision tend to have poor grouping ability. Our results show that it is possible to eliminate up to 90% of the manual diagnostic work, with a precision in the range 50-90% depending on the choice of method.

Contents

1 Introduction
  1.1 Problem statement
  1.2 Methodology
  1.3 Delimitation
  1.4 Contribution
  1.5 Ethics and sustainability
  1.6 Thesis outline
2 Background
  2.1 Commercial software
  2.2 Building reliable software
  2.3 Programming languages and their effect on reliability
  2.4 Software verification
    2.4.1 Formal methods
    2.4.2 Software testing
    2.4.3 Automatic crash reporting systems
  2.5 Crash analysis and understanding failures
    2.5.1 System crashes
    2.5.2 Process crashes
  2.6 Understanding crash dumps
  2.7 Tools for crash dump analysis & information extraction
    2.7.1 Locating cause of SIGSEGV using mdb in process crash
  2.8 Machine learning
    2.8.1 Online and offline learning
    2.8.2 Classification approaches
    2.8.3 Measuring the performance of classifiers
3 Related work
  3.1 Automatic crash reporting systems
  3.2 Bucketing algorithms
    3.2.1 Analyzing call stacks
    3.2.2 Definition of an edit distance
4 Comparing crash dumps
  4.1 Symptoms
  4.2 Matching based on symptoms
  4.3 Comparing stack traces
    4.3.1 String comparison
    4.3.2 Recursion removal
    4.3.3 Edit distance
    4.3.4 Prefix matching
    4.3.5 Distance normalization
  4.4 Machine learning
    4.4.1 Nearest neighbor learning algorithm
  4.5 Summary
    4.5.1 Risk discussion
5 Evaluation and results
  5.1 Evaluation overview
    5.1.1 Test data
    5.1.2 Evaluation method for determining precision
    5.1.3 Evaluation method for determining grouping ability
    5.1.4 Evaluation method for machine learning precision
  5.2 Evaluation results
    5.2.1 Precision evaluation results
    5.2.2 Grouping evaluation results
    5.2.3 Machine learning evaluation results
6 Conclusion
7 Future work
Bibliography

Chapter 1: Introduction

Modern software is becoming larger and more complex than ever before. Source code repositories and user bases are growing every day. Although growth is inevitable, from a quality assurance perspective it means more errors encountered more frequently.

It is important to understand that a high percentage of all encountered software errors are reoccurrences [1][2]. This high percentage is often due to the significant amount of time it takes to develop and deploy fixes; applications can therefore remain flawed for a long time. Manual re-diagnosis of old errors that are consistently reported as new bugs is a huge waste of resources.

Clearly there is an opportunity here to automate a process that could significantly reduce the amount of necessary manual diagnosis. If we could determine and pinpoint the exact symptoms of an error, and the symptoms of two or more errors coincide, then we could consider them equal. Considering two errors equal gives us the ability to categorize errors, so that a newly reported error can be placed in an error group with similar symptoms that has already been diagnosed. However, uniquely categorizing errors based on their characteristics is not trivial.
Information is limited and often only presented in the form of a crash dump, which is the main focus of this thesis.

1.1 Problem statement

This thesis will investigate and evaluate methods for grouping errors based on their symptoms, i.e. indications of the cause of an error. The goal is to minimize the amount of manual diagnostics by removing the need for redundant re-diagnosis of already known errors. This thesis therefore aims to answer the following question:

Is it possible to group crashes with sufficient accuracy for an automated system to be trusted with the task of reducing the amount of necessary manual diagnosis?

1.2 Methodology

The overall focus was on exploring the field in order to later apply known theory and evaluate a number of existing methods. The first step was to establish a general understanding of the subject and to locate the information in crash dumps that would allow us to identify the symptoms of an error. The second step was to study certain existing methods for comparing symptoms, specifically within stack trace analysis, and to assess their ability to group errors. These two steps were conducted in several iterations. The outcome of the study determined that four grouping methods were to be implemented and evaluated. The final step was to summarize the analysis of the suggested methods and evaluate the reliability of the concept (see Figure 1.1 for an illustration of the research phase).

Figure 1.1: Workflow during research phase.

1.3 Delimitation

This work is limited to the analysis of crash dumps on Unix-like operating systems. It is also limited to crash dumps from a set of similar applications; the results may therefore not generalize beyond our data set.

1.4 Contribution

This thesis primarily analyzes and compares existing methods for comparing stack traces.
The results are in line with existing research for the individual comparison methods, and our evaluation gives an indication of the circumstances under which each method is ideal. We also introduce a heuristic for removing recursion patterns in stack traces based on the removal of maximal repeats, which shows an increase in accuracy over existing techniques for the tested data set.

1.5 Ethics and sustainability

The training of machine learning algorithms can have a negative impact on sustainability in terms of energy consumption, in particular for large data sets where training over long periods of time is necessary. Furthermore, the required physical hardware is constructed from scarce natural resources. From an ethical perspective, part of this work attempts to improve the ways in which errors are fixed. With a very efficient way of fixing errors once a product has already been distributed to customers, companies could spend less time assuring the quality of a product prior to distribution. Companies could then be more inclined to release unfinished products, which in turn could have varying levels of impact depending on the sector in which the company operates.

1.6 Thesis outline

This thesis is organized as follows: Chapter 2 introduces the reader to crash dumps, what they contain, how they are provoked and how they can be analyzed; Chapter 3 introduces related work already done in the field; Chapter 4 defines how crashes can be categorized and describes methods for such categorization; Chapter 5 explains the evaluation process for the different comparison methods and the results of the evaluation; Chapter 6 discusses the results from the evaluation; and lastly, Chapter 7 concludes the report.

Chapter 2: Background

Writing reliable software is hard, especially in today's landscape where the complexity of modern applications is ever increasing. History has shown that writing software without errors is close to impossible.
Errors can be caused by flaws in the design, or by a correct design implemented in the wrong way.
Recommended publications
  • Introduction to Debugging the Freebsd Kernel
    Introduction to Debugging the FreeBSD Kernel
    John H. Baldwin, Yahoo!, Inc., Atlanta, GA 30327, [email protected], http://people.FreeBSD.org/~jhb

    Abstract: Just like every other piece of software, the FreeBSD kernel has bugs. Debugging a kernel is a bit different from debugging a userland program, as there is nothing underneath the kernel to provide debugging facilities such as ptrace() or procfs. This paper will give a brief overview of some of the tools available for investigating bugs in the FreeBSD kernel. It will cover the in-kernel debugger DDB and the external debugger kgdb, which is used to perform post-mortem analysis on kernel crash dumps.

    1 Introduction: When a userland application encounters a bug, the operating system provides services for investigating the bug. For example, a kernel may save a copy of a process' memory image on disk as a core dump. [...] used either directly by the user or indirectly via other tools such as kgdb [3]. The Kernel Debugging chapter of the FreeBSD Developer's Handbook [4] already covers several details, such as entering DDB, configuring a system to save kernel crash dumps, and invoking kgdb on a crash dump. This paper will not cover these topics; instead, it will demonstrate some ways to use FreeBSD's kernel debugging tools to investigate bugs.

    2 Kernel Crash Messages: The first debugging service the FreeBSD kernel provides is the messages the kernel prints on the console when the kernel crashes. When the kernel encounters an invalid condition (such as an assertion failure or a memory protection violation) it halts execution of the current thread and enters a "panic" state.
  • Process and Memory Management Commands
    Process and Memory Management Commands

    This chapter describes the Cisco IOS XR software commands used to manage processes and memory. For more information about using the process and memory management commands to perform troubleshooting tasks, see the Cisco ASR 9000 Series Aggregation Services Router Getting Started Guide.

    Commands covered: clear context, dumpcore, exception coresize, exception filepath, exception pakmem, exception sparse, exception sprsize, follow, monitor threads, process, process core, process mandatory, show context, show dll, show exception, show memory, show memory compare, show memory heap, show processes.

    clear context: To clear core dump context information, use the clear context command in the appropriate mode.

        clear context location {node-id | all}

    Syntax description: location {node-id | all} (optional) clears core dump context information for a specified node; the node-id argument is expressed in rack/slot/module notation, and the all keyword indicates all nodes. Command default: no default behavior or values. Command modes: Administration EXEC, EXEC. Command history: introduced in Release 3.7.2; no modification in Release 3.9.0. Usage guidelines: to use this command, you must be in a user group associated with a task group that includes the appropriate task IDs.
  • Post Mortem Crash Analysis
    Post Mortem Crash Analysis
    Johan Heander & Magnus Malmborn, January 14, 2007

    Abstract: To improve the quality and reliability of embedded systems it is important to gather information about errors in units already sold and deployed. To achieve this, a system for transmitting error information from the customer back to the developers is needed, and the developers must also have a set of tools to analyze the error reports. The purpose of this master thesis was to develop a fully functioning demonstration system for collection, transmission and interpretation of error reports from Axis network cameras using the Linux operating system. The system has been shown to handle both kernel and application errors and conducts automatic analysis of received data. It also uses the standard HTTP protocol for all network transfers, making it easy to use even on firewalled networks.

    Acknowledgement: We would like to thank our LTH supervisor Jonas Skeppstedt for all he has taught us about computer science in general and operating systems and the C programming language in particular. We would also like to thank Mikael Starvik at Axis Communications for quickly providing us with all hardware and information we needed to complete this thesis, and for providing us with support during our implementation and writing. Finally we thank all the developers working at Axis Communications, many of whom have provided input and reflections on our work.

    Contents: 1 Introduction (1.1 Problem description, 1.2 Problem analysis); 2 Background (2.1 Kernel crashes, 2.2 User space crashes ...)
  • Linux Core Dumps
    Linux Core Dumps
    Kevin Grigorenko, [email protected]

    Many interactions with core dumps: when a process crashes, systemd-coredump or abrtd may capture a multi-gigabyte core file. In most interactions, however, nobody looks at it, and kernel crashes are lost entirely if kdump is not configured.

    So what? Crashes are problems. They may be symptoms of security vulnerabilities, or of application bugs such as data corruption and memory leaks. A hard crash kills outstanding work, and without automatic process restarts, crashes lead to service unavailability; with restarts, a hacker may simply continue trying. We shouldn't be scared of core dumps: when a dog poops inside the house, we don't just `rm -f $poo` or let it pile up; we try to figure out why, and how to avoid it happening again.

    What is a core dump? It's just an ELF-formatted file (like a program) that contains virtual memory contents, register values, and other metadata. A user-land core dump represents the state of a particular process (e.g. from a crash); a kernel core dump (vmcore) represents the state of the kernel (e.g. from a panic) plus process data.

    What is virtual memory? Virtual memory is an abstraction over physical memory (RAM/swap): it simplifies programming and, in user land, provides process isolation, while the kernel and processor translate virtual address references to physical memory locations. A 64-bit process has a 16 EB virtual address space, vastly larger than, say, 8 GB of RAM.

    How much virtual memory is used? Use `ps` or similar tools to query user process virtual memory usage (in KB):

        $ ps -o pid,vsz,rss -p 14062
          PID   VSZ   RSS
        14062 44648 42508

    Virtual memory is broken up into virtual memory areas (VMAs), the sum of which equals VSZ and which may be printed with:

        $ cat /proc/${PID}/smaps
        00400000-0040b000 r-xp 00000000 fd:02 22151273 /bin/cat
        Size: 44 kB
        Rss: 20 kB
        Pss: 12 kB
  • The Complete Freebsd
    The Complete FreeBSD® If you find errors in this book, please report them to Greg Lehey <grog@Free- BSD.org> for inclusion in the errata list. The Complete FreeBSD® Fourth Edition Tenth anniversary version, 24 February 2006 Greg Lehey The Complete FreeBSD® by Greg Lehey <[email protected]> Copyright © 1996, 1997, 1999, 2002, 2003, 2006 by Greg Lehey. This book is licensed under the Creative Commons “Attribution-NonCommercial-ShareAlike 2.5” license. The full text is located at http://creativecommons.org/licenses/by-nc-sa/2.5/legalcode. You are free: • to copy, distribute, display, and perform the work • to make derivative works under the following conditions: • Attribution. You must attribute the work in the manner specified by the author or licensor. • Noncommercial. You may not use this work for commercial purposes. This clause is modified from the original by the provision: You may use this book for commercial purposes if you pay me the sum of USD 20 per copy printed (whether sold or not). You must also agree to allow inspection of printing records and other material necessary to confirm the royalty sums. The purpose of this clause is to make it attractive to negotiate sensible royalties before printing. • Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. • For any reuse or distribution, you must make clear to others the license terms of this work. • Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above.
  • Exniffer: Learning to Rank Crashes by Assessing the Exploitability from Memory Dump
    Exniffer: Learning to Rank Crashes by Assessing the Exploitability from Memory Dump

    Thesis submitted in partial fulfillment of the requirements for the degree of MS in Computer Science & Engineering by Research, by Shubham Tripathi, 201407646, [email protected]. International Institute of Information Technology, Hyderabad - 500 032, India, March 2018. Copyright © Shubham Tripathi, March 2018. All rights reserved.

    Certificate: It is certified that the work contained in this thesis, titled "Exniffer: Learning to Rank Crashes by Assessing the Exploitability from Memory Dump" by Shubham Tripathi, has been carried out under my supervision and is not submitted elsewhere for a degree. Adviser: Prof. Sanjay Rawat.

    "Contemplate and reflect upon knowledge, and you will become a benefactor to others." To my parents.

    Acknowledgments: I would like to express my gratitude to my adviser, Dr. Sanjay Rawat. Sanjay sir helped me stay focused on problems and provided new directions to approach them. Working with him, I have developed my problem-solving skills and learnt about research, and about life in general. I will be thankful to him all my life for providing guidance on various matters and for always motivating me and boosting my confidence, which really helped me in shaping my life. I must also thank all my lab-mates who are working or have worked earlier in CSTAR: Vijayendra, Spandan, Satwik, Charu, Teja, Ishan and Lokesh. I really enjoyed working with them in the lab. I would like to thank all my friends in IIIT for making my stay on the campus a memorable one.
  • Chapter 2: Operating-System Structures
    Chapter 2: Operating-System Structures
    Operating System Concepts, 10th Edition, Silberschatz, Galvin and Gagne ©2018

    Outline: operating system services; user and operating-system interface; system calls; system services; linkers and loaders; why applications are operating-system specific; operating-system design and implementation; operating-system structure; building and booting an operating system; operating-system debugging.

    Objectives: identify services provided by an operating system; illustrate how system calls are used to provide operating system services; compare and contrast monolithic, layered, microkernel, modular, and hybrid strategies for designing operating systems; illustrate the process for booting an operating system; apply tools for monitoring operating system performance; design and implement kernel modules for interacting with a Linux kernel.

    Operating System Services: Operating systems provide an environment for execution of programs, and services to programs and users. One set of operating-system services provides functions that are helpful to the user:
    • User interface - almost all operating systems have a user interface (UI), varying between command-line (CLI), graphical (GUI), touch-screen, and batch.
    • Program execution - the system must be able to load a program into memory and run that program, and end execution either normally or abnormally (indicating an error).
    • I/O operations - a running program may require I/O, which may involve a file or an I/O device.
    • File-system manipulation - the file system is of particular interest.
  • Comparing the Robustness of POSIX Operating Systems
    Comparing the Robustness of POSIX Operating Systems
    Philip Koopman & John DeVale, ECE Department, Institute for Complex Engineered Systems
    [email protected] - (412) 268-5225 - http://www.ices.cmu.edu/koopman - http://www.ices.cmu.edu/ballista

    Overview of Ballista automated robustness testing:
    • Generic robustness testing, based on data types
    • OS testing results: raw results for 15 operating systems; system calls vs. the C library
    • Exception handling diversity: does everyone core dump on the same exceptions? (no)
    • Approximating "silent" failure rates (missing error codes)
    • Conclusions and future work
    (A ballista is an ancient siege weapon for hurling objects at fortified defenses.)

    Ballista combines software testing and fault injection ideas. Software testing requires a test case, a module under test, and an oracle (a "specification"); Ballista uses "bad" value combinations, the module under test, and a watchdog timer plus core dumps. In the input/response space, valid inputs should yield robust operation, while invalid inputs should return an error but may instead produce reproducible or unreproducible failures. Ballista combines ideas from domain testing and syntax testing with fault injection at the API level.

    Scalable test generation: for an API such as write(int filedes, const void *buffer, size_t nbytes), test values are drawn from per-parameter testing objects:
    • File descriptor: FD_CLOSED, FD_OPEN_READ, FD_OPEN_WRITE, FD_DELETED, FD_NOEXIST
    • Memory buffer: BUF_SMALL_1, BUF_MED_PAGESIZE, BUF_LARGE_512MB, BUF_XLARGE_1GB, BUF_HUGE_2GB
    • Size: SIZE_1, SIZE_16, SIZE_PAGE, SIZE_PAGEx16, SIZE_PAGEx16plus1
  • Chapter 2 Operating System Structures
    Operating Systems
    Associate Prof. Yongkun Li, School of Computer Science, University of Science and Technology of China
    http://staff.ustc.edu.cn/~ykli

    Chapter 2: Operating System Structures

    Objectives: operating system services (user interface, system calls); operating system structure; operating system design and implementation; miscellaneous topics (debugging, generation, and system boot).

    Operating System Services: Operating systems provide an environment for execution of programs, and services to programs and users. Services may differ from one OS to another, but fall into common classes serving the convenience of the user and the efficiency of the system.

    OS services for helping users:
    • User interface - almost all operating systems have a user interface (UI), in three forms: command-line (CLI, shell commands), batch (shell scripts), and graphical (GUI, windowing systems).
    • Program execution - load a program into memory, run it, and end execution either normally or abnormally (indicating an error).
    • I/O operations - a running program may require I/O, which may involve a file or an I/O device (common I/Os: read, write, etc.; special functions such as recording CD/DVD). Users usually cannot control I/O devices directly, so the OS provides a means to do I/O, mainly for efficiency and protection.
    • File-system manipulation - the file system is of particular interest; the OS provides a variety of file systems, with major services including reading and writing files and directories, creating and deleting them, searching for a given file, listing file information, and permission management (allow/deny access).
    • Communications - information exchange between processes, either on the same computer or between computers over a network, implemented e.g. via shared memory, where two or more processes read/write to a shared section of memory.
  • Troubleshooting Typical Issues in Oracle Solaris 11.1
    TroubleshootingTypical Issues in Oracle® Solaris 11.1 Part No: E29013–01 October 2012 Copyright © 1998, 2012, Oracle and/or its affiliates. All rights reserved. This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable: U.S. GOVERNMENT END USERS. Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including anyoperating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government. This software or hardware is developed for general use in a variety of information management applications.
  • Benchmarking the Stack Trace Analysis Tool for Bluegene/L
    Benchmarking the Stack Trace Analysis Tool for BlueGene/L
    Gregory L. Lee¹, Dong H. Ahn¹, Dorian C. Arnold², Bronis R. de Supinski¹, Barton P. Miller², Martin Schulz¹

    Published in Parallel Computing: Architectures, Algorithms and Applications, C. Bischof, M. Bücker, P. Gibbon, G.R. Joubert, T. Lippert, B. Mohr, F. Peters (Eds.), John von Neumann Institute for Computing, Jülich, NIC Series, Vol. 38, ISBN 978-3-9810843-4-4, pp. 621-628, 2007. Reprinted in: Advances in Parallel Computing, Volume 15, ISSN 0927-5452, ISBN 978-1-58603-796-3 (IOS Press), 2008. © 2007 by John von Neumann Institute for Computing. Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher mentioned above. http://www.fz-juelich.de/nic-series/volume38

    ¹ Computation Directorate, Lawrence Livermore National Laboratory, Livermore, California, U.S.A. E-mail: {lee218, ahn1, bronis, schulzm}@llnl.gov
    ² Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, U.S.A. E-mail: {darnold, bart}@cs.wisc.edu

    We present STATBench, an emulator of a scalable, lightweight, and effective tool to help debug extreme-scale parallel applications, the Stack Trace Analysis Tool (STAT).
  • Warrior1: a Performance Sanitizer for C++ Arxiv:2010.09583V1 [Cs.SE]
    Warrior1: A Performance Sanitizer for C++
    Nadav Rotem, Lee Howes, David Goldblatt, Facebook, Inc. October 20, 2020

    Abstract: This paper presents Warrior1, a tool that detects performance anti-patterns in C++ libraries. Many programs are slowed down by many small inefficiencies. Large-scale C++ applications are large, complex, and developed by large groups of engineers over a long period of time, which makes the task of identifying inefficiencies difficult. Warrior1 was designed to detect the numerous small performance issues that result from inefficient use of C++ libraries. The tool detects performance anti-patterns such as map double-lookup, vector reallocation, short-lived objects, and lambda object capture by value. Warrior1 is implemented as an instrumented C++ standard library and an off-line diagnostics tool.

    When a vector grows past its capacity, it allocates a new buffer, copies the data and deletes the old buffer. As the vector grows, the buffer size expands in a geometric sequence. Constructing a vector of 10 elements in a loop results in 5 calls to 'malloc' and 4 calls to 'free'. These operations are relatively expensive; moreover, the 4 different buffers pollute the cache and make the program run slower. One way to optimize the performance of this code is to call the 'reserve' method of vector. This method grows the underlying storage of the vector just once and allows non-allocating growth of the vector up to the specified size. Vector growth reallocation is a well-known problem, and there are many other patterns of inefficiency, some of which are described in section 3.4 of the paper.