Windows XP Kernel Crash Analysis Archana Ganapathi, Viji Ganapathi, and David Patterson – University of California, Berkeley

Total Page:16

File Type:pdf, Size:1020Kb

Load more

Windows XP Kernel Crash Analysis Archana Ganapathi, Viji Ganapathi, and David Patterson – University of California, Berkeley ABSTRACT PC users have started viewing crashes as a fact of life rather than a problem. To improve operating system dependability, systems designers and programmers must analyze and understand failure data. In this paper, we analyze Windows XP kernel crash data collected from a population of volunteers who contribute to the Berkeley Open Infrastructure for Network Computing (BOINC) project. We found that OS crashes are predominantly caused by poorly-written device driver code. Users as well as product developers will benefit from understanding the crash behaviors elaborated in this paper. Introduction the dominant failure cause of popular computer sys- tems. In particular, it identifies products that cause the Personal Computer (PC) reliability has become a most user frustration, thus facilitating our efforts to rapidly growing concern both for computer users as build stable, resilient systems. Furthermore, it enables well as product developers. Personal computers run- ning the Microsoft Windows operating system are product evaluation and development of benchmarks often considered overly complex and difficult to man- that rank product quality. These benchmarks can influ- age. As modern operating systems serve as a conflu- ence design prototypes for reliable systems. ence of a variety of hardware and software compo- Within an organization, analyzing failure data nents, it is difficult to pinpoint unreliable components. can improve quality of service. Often, corporations Such unconstrained flexibility allows complex, collect failure data to evaluate causes of downtime. In unanticipated, and unsafe interactions that result in an addition, they perform cost-benefit analysis to unstable environment often frustrating the user. To improve service availability. Some companies extend troubleshoot recurring problems, it is beneficial to their analyses to client sites by gathering failure data data-mine, analyze and document every interaction for at deployment locations. erroneous behaviors. Such failure data provides For example, Microsoft Corporation collects insight into how computer systems behave under var- crash data for their Windows operating system as well ied hardware and software configurations. as applications used by their customers. Unfortunately, To improve dependability, systems designers and due to legal concerns, corporations such as Microsoft programmers must understand operating system failure will usually not share their data with academic data. In this paper, we analyze crash data from a small research groups. Companies do not wish to reveal number of Windows machines. We collected our data their internal vulnerabilities, nor can they share third from a population of volunteers who contribute to the party products’ potential weaknesses. In addition, Berkeley Open Infrastructure for Network Computing many companies disable the reporting feature after (BOINC) project. As our analysis is based on a small viewing proprietary data in the report. While abundant amount of data (with a self-selection bias due to the failure data is generated on a daily basis, very little is nature of BOINC), we acknowledge that our results do readily sharable with the research community. not represent the entire PC population. Nonetheless, The remainder of this paper describes our data the data reveals several useful results for PC users as collection and analysis methodology, including: related well as researchers and product developers. work in the areas of system dependability and failure Most Windows users have experienced at least data analysis, background information about Windows one ‘‘bluescreen’’ during the lifetime of their machine. crash data and the data collection process, crash data A sophisticated PC user will accept Windows crashes analysis and results, a discussion of the merits of as a fact and attempt to cope with them. However, a potential extensions to our work, and a conclusion. novice user will be terrified by the implications of a crash and will continue to be preoccupied with the Related Work thought of causing severe damage to the computer. Jim Gray’s work [Gra86, Gra90] serves a model Analyzing failure data can help users gauge the for most contemporary failure analysis work. Gray did dependability of various products and understand the not perform root cause analysis but rather Outage Cause source of their crashes. that considers the last in the fault chain. In 1989, he From a research perspective, the motivation found that the major source of outages was software, behind failure data-mining is manifold. First, it reveals contributing about 55%, far outrunning its immediate 20th Large Installation System Administration Conference (LISA ’06) 149 Wi n d o w sXP Kernel Crash Analysis Ganapathi, Ganapathi, & Patterson successor, system operations, which contributed 15%. rather than injecting artificial faults as performed by This observation led him to blame software for almost fuzz testing [FM00]. Our study of crash data differs every failure. In an earlier study [G05, GP05], we ana- from error log analysis performed by Kalakech, et al. lyzed Windows application crashes to understand causal [KK+04]; we determine the cause of crashes in addi- relationships in the user-level. Departing from Gray’s tion to time and frequency. outage cause analysis, in our study we perform root Several researchers have provided insights on cause analysis under the assumption that the first crash benchmarking and failure data analysis [BC+02, in a sequence of crashes is responsible for all subse- BS97, OB+02, WM+02]. Wilson, et al. suggest evalu- quent crashes within that event chain. ating the relationship between failures and service The past two decades have produced several stud- availability [WM+02]. Among other metrics, when ies in root-cause analysis for operating systems (OS) evaluating dependability, system stability is a key con- ranging from Guardian OS and Tandem Non-Stop UX cern. Ganapathi, et al. examine Windows XP registry OS to VAX/VMS and Windows NT [Gra90, Kal98, problems and their effect on system stability [GW+04]. LI95, SK+00, SK+02, TI92, TI+95]. In server environ- Levendel suggests using the catastrophic nature of fail- ments, Tandem computers, VAX clusters as well as ures to evaluate system stability [Lev89]. Brown, et al. several operating systems and file servers have been provide a practical perspective on system dependability examined for software defects by several researchers. by incorporating users’ experience in benchmarks Lee and Iyer focussed on software faults in the Tandem [BC+02, BS97]. In our study of crashes, we consider GUARDIAN operating system [LI95], Tang and Iyer these factors when evaluating various applications. considered two VAX clusters running the VAX/VMS operating system [TI92], and Sullivan and Chillarege Overview of Crashes and Crashdumps examined software defects in MVS, DB2, and IMS A crash is an event caused by a problem in the [SC91]. Murphy and Gent also focussed on system operating system (OS) or application (app) requiring crashes in VAX systems over an extended period, OS or app restart. App crashes occur at user level and almost a decade [MG95]. They concluded that system typically involve restarting the crashing application. management was responsible for over 50% of failures An OS crash occurs at kernel-level, and is usually with software trailing at 20% followed by hardware caused by memory corruption, bad drivers or faulty that is responsible for about 10% of failures. system-level routines. OS crashes are more frustrating While examining NFS data availability in Net- than application crashes as they require the user to kill work Appliance’s NetApp filers, Lancaster and Rowe and restart the Windows Explorer process at a mini- attributed power failures and software failures as the mum, more commonly forcing a full machine reboot. largest contributors to downtime; operator failure con- While there are a handful of crashes due to memory tributions were negligible [LR01]. Thakur and Iyer corruption and other common systems problems, a examined failures in a network of 69 SunOS worksta- majority of these OS crashes are caused by device driv- tions [TI96]. They divided problem root causes into ers. These drivers are related to various components network, non-disk and disk-related machine problems. such as display monitors, network and video cards. Kalyanakrishnam, et al. perused six months of event Upon each OS crash or bluescreen generated by logs from a LAN comprising of Windows NT work- the operating system, Windows XP collects failure stations that delivered emails [KK+99]. Using a state data as a minidump. Users have three different options machine model of detailed system failure states to for the amount of information that is collected upon a describe failure timelines on a single node, they con- crash. We use the default (and smallest) option of col- cluded that most automatic system reboot problems lecting small dumps, which are only 64K in size. are software-related; the average downtime is two These small minidumps contain a partial snapshot of hours. Similarly, Xu, et al. considered Windows NT the computer’s state at the time of crash. They include event log entries related to system reboots for a net- a list of loaded drivers, the names and timestamps of work of workstations that were used for enterprise in- binaries that were loaded
Recommended publications
  • What Is an Operating System III 2.1 Compnents II an Operating System

    What Is an Operating System III 2.1 Compnents II an Operating System

    Page 1 of 6 What is an Operating System III 2.1 Compnents II An operating system (OS) is software that manages computer hardware and software resources and provides common services for computer programs. The operating system is an essential component of the system software in a computer system. Application programs usually require an operating system to function. Memory management Among other things, a multiprogramming operating system kernel must be responsible for managing all system memory which is currently in use by programs. This ensures that a program does not interfere with memory already in use by another program. Since programs time share, each program must have independent access to memory. Cooperative memory management, used by many early operating systems, assumes that all programs make voluntary use of the kernel's memory manager, and do not exceed their allocated memory. This system of memory management is almost never seen any more, since programs often contain bugs which can cause them to exceed their allocated memory. If a program fails, it may cause memory used by one or more other programs to be affected or overwritten. Malicious programs or viruses may purposefully alter another program's memory, or may affect the operation of the operating system itself. With cooperative memory management, it takes only one misbehaved program to crash the system. Memory protection enables the kernel to limit a process' access to the computer's memory. Various methods of memory protection exist, including memory segmentation and paging. All methods require some level of hardware support (such as the 80286 MMU), which doesn't exist in all computers.
  • Measuring and Improving Memory's Resistance to Operating System

    Measuring and Improving Memory's Resistance to Operating System

    University of Michigan CSE-TR-273-95 Measuring and Improving Memory’s Resistance to Operating System Crashes Wee Teck Ng, Gurushankar Rajamani, Christopher M. Aycock, Peter M. Chen Computer Science and Engineering Division Department of Electrical Engineering and Computer Science University of Michigan {weeteck,gurur,caycock,pmchen}@eecs.umich.edu Abstract: Memory is commonly viewed as an unreliable place to store permanent data because it is per- ceived to be vulnerable to system crashes.1 Yet despite all the negative implications of memory’s unreli- ability, no data exists that quantifies how vulnerable memory actually is to system crashes. The goals of this paper are to quantify the vulnerability of memory to operating system crashes and to propose a method for protecting memory from these crashes. We use software fault injection to induce a wide variety of operating system crashes in DEC Alpha work- stations running Digital Unix, ranging from bit errors in the kernel stack to deleting branch instructions to C-level allocation management errors. We show that memory is remarkably resistant to operating system crashes. Out of the 996 crashes we observed, only 17 corrupted file cache data. Excluding direct corruption from copy overruns, only 2 out of 820 corrupted file cache data. This data contradicts the common assump- tion that operating system crashes often corrupt files in memory. For users who need even greater protec- tion against operating system crashes, we propose a simple, low-overhead software scheme that controls access to file cache buffers using virtual memory protection and code patching. 1 Introduction A modern storage hierarchy combines random-access memory, magnetic disk, and possibly optical disk or magnetic tape to try to keep pace with rapid advances in processor performance.
  • Clusterfuzz: Fuzzing at Google Scale

    Clusterfuzz: Fuzzing at Google Scale

    Black Hat Europe 2019 ClusterFuzz Fuzzing at Google Scale Abhishek Arya Oliver Chang About us ● Chrome Security team (Bugs--) ● Abhishek Arya (@infernosec) ○ Founding Chrome Security member ○ Founder of ClusterFuzz ● Oliver Chang (@halbecaf) ○ Lead developer of ClusterFuzz ○ Tech lead for OSS-Fuzz 2 Fuzzing ● Effective at finding bugs by exploring unexpected states ● Recent developments ○ Coverage guided fuzzing ■ AFL started “smart fuzzing” (Nov’13) ○ Making fuzzing more accessible ■ libFuzzer - in-process fuzzing (Jan’15) ■ OSS-Fuzz - free fuzzing for open source (Dec’16) 3 Fuzzing mythbusting ● Fuzzing is only for security researchers or security teams ● Fuzzing only finds security vulnerabilities ● We don’t need fuzzers if our project is well unit-tested ● Our project is secure if there are no open bugs 4 Scaling fuzzing ● How to fuzz effectively as a Defender? ○ Not just “more cores” ● Security teams can’t write all fuzzers for the entire project ○ Bugs create triage burden ● Should seamlessly fit in software development lifecycle ○ Input: Commit unit-test like fuzzer in source ○ Output: Bugs, Fuzzing Statistics and Code Coverage 5 Fuzzing lifecycle Manual Automated Fuzzing Upload builds Build bucket Cloud Storage Find crash Write fuzzers De-duplicate Minimize Bisect File bug Fix bugs Test if fixed (daily) Close bug 6 Assign bug ClusterFuzz ● Open source - https://github.com/google/clusterfuzz ● Automates everything in the fuzzing lifecycle apart from “fuzzer writing” and “bug fixing” ● Runs 5,000 fuzzers on 25,000 cores, can scale more ● Cross platform (Linux, macOS, Windows, Android) ● Powers OSS-Fuzz and Google’s fuzzing 7 Fuzzing lifecycle 1. Write fuzzers 2. Build fuzzers 3. Fuzz at scale 4.
  • The Fuzzing Hype-Train: How Random Testing Triggers Thousands of Crashes

    The Fuzzing Hype-Train: How Random Testing Triggers Thousands of Crashes

    SYSTEMS ATTACKS AND DEFENSES Editors: D. Balzarotti, [email protected] | W. Enck, [email protected] | T. Holz, [email protected] | A. Stavrou, [email protected] The Fuzzing Hype-Train: How Random Testing Triggers Thousands of Crashes Mathias Payer | EPFL, Lausanne, Switzerland oftware contains bugs, and some System software, such as a browser, discovered crash. As a dynamic testing S bugs are exploitable. Mitigations a runtime system, or a kernel, is writ­ technique, fuzzing is incomplete for protect our systems in the presence ten in low­level languages (such as C nontrivial programs as it will neither of these vulnerabilities, often stop­ and C++) that are prone to exploit­ cover all possible program paths nor ping the program once a security able, low­level defects. Undefined all data­flow paths except when run violation has been detected. The alter­ behavior is at the root of low­level for an infinite amount of time. Fuzz­ native is to discover bugs during de­ vulnerabilities, e.g., invalid pointer ing strategies are inherently optimiza­ velopment and fix them tion problems where the in the code. The task of available resources are finding and reproducing The idea of fuzzing is simple: execute used to discover as many bugs is difficult; however, bugs as possible, covering fuzzing is an efficient way a program in a test environment with as much of the program to find security­critical random input and see if it crashes. functionality as possible bugs by triggering ex ­ through a probabilistic ceptions, such as crashes, exploration process. Due memory corruption, or to its nature as a dynamic assertion failures automatically (or dereferences resulting in memory testing technique, fuzzing faces several with a little help).
  • Recovering from Operating System Crashes

    Recovering from Operating System Crashes

    Recovering from Operating System Crashes Francis David Daniel Chen Department of Computer Science Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign University of Illinois at Urbana-Champaign Urbana, USA Urbana, USA Email: [email protected] Email: [email protected] Abstract— When an operating system crashes and hangs, it timers and processor support techniques [12] are required to leaves the machine in an unusable state. All currently running detect such crashes. program state and data is lost. The usual solution is to reboot the It is accepted behavior that when an operating system machine and restart user programs. However, it is possible that after a crash, user program state and most operating system state crashes, all currently running user programs and data in is still in memory and hopefully, not corrupted. In this project, volatile memory is lost and unrecoverable because the proces- we use a watchdog timer to reset the processor on an operating sor halts and the system needs to be rebooted. This is inspite system crash. We have modified the Linux kernel and added of the fact that all of this information is available in volatile a recovery routine that is called instead of the normal boot up memory as long as there is no power failure. This paper function when the processor is reset by a watchdog. This resumes execution of user processes after killing the process that was explores the possibility of recovering the operating system executing when the watchdog fired. We have implemented this using the contents of memory in order to resume execution on the ARM architecture and we use a periodic watchdog kick of user programs without any data loss.
  • Protect Your Computer from Viruses, Hackers, & Spies

    Protect Your Computer from Viruses, Hackers, & Spies

    Protect Your Computer from Viruses, Hackers, & Spies Tips for Consumers Consumer Information Sheet 12 • January 2015 Today we use our computers to do so many things. We go online to search for information, shop, bank, do homework, play games, and stay in touch with family and friends. As a result, our com- puters contain a wealth of personal information about us. This may include banking and other financial records, and medical information – information that we want to protect. If your computer is not protected, identity thieves and other fraudsters may be able to get access and steal your personal information. Spammers could use your computer as a “zombie drone” to send spam that looks like it came from you. Malicious viruses or spyware could be deposited on your computer, slowing it down or destroying files. By using safety measures and good practices to protect your home computer, you can protect your privacy and your family. The following tips are offered to help you lower your risk while you’re online. 3 Install a Firewall 3 Use Anti-virus Software A firewall is a software program or piece of Anti-virus software protects your computer hardware that blocks hackers from entering from viruses that can destroy your data, slow and using your computer. Hackers search the down or crash your computer, or allow spammers Internet the way some telemarketers automati- to send email through your account. Anti-virus cally dial random phone numbers. They send protection scans your computer and your out pings (calls) to thousands of computers incoming email for viruses, and then deletes and wait for responses.
  • PC Crash 7.2 Manual

    PC Crash 7.2 Manual

    PC-CRASH A Simulation Program for Vehicle Accidents Operating Manual Version 9.0 November 2010 Dr. Steffan Datentechnik Linz, Austria Distributed and Supported in North America by: www.pc-crash.com Contents CHAPTER 1 INTRODUCTION AND INSTALLATION ........................................................................................ 1 INTRODUCTION ................................................................................................................................................... 1 ABOUT THIS MANUAL .......................................................................................................................................... 1 THE PC-CRASH SOFTWARE PACKAGE ................................................................................................................. 2 HARDWARE REQUIREMENTS ............................................................................................................................... 2 USE OF MOUSE AND PRINTERS ........................................................................................................................... 2 INSTALLATION OF PC-CRASH .............................................................................................................................. 2 Licensing PC-Crash ...................................................................................................................................................... 5 Check for Updates .......................................................................................................................................................
  • Internet Application Development

    Internet Application Development

    8/28/12 Chapter 1 – PHP Crash Course Metropolitan State University ICS 325 Internet Application Development Class Notes – Chapter 1 – PHP Crash Course What is PHP? PHP is a server side scripting language, written specifically for the Web PHP was developed in 1994 by Rasmus Lerdorf PHP originally stood for Personal Home Page PHP currently stands for PHP Hypertext Preprocessor Current version of PHP is 4.3.6 Version 5 is available for early adopters Home page = http://www.php.net PHP strengths include: High performance, Interfacing with Different DBMS (uses ODBC), Libraries, Low Cost, Portability, open source, and thousands of functions. Web server must have PHP installed. PHP scripts are interpreted and executed on the server. Output from a PHP script looks like plain old HTML. The client cannot see your PHP code, only its output. PHP Tags – Short Style <? echo ”<p>Short Style</p>”; ?> XML Style <?php print(“<p>XML Style</p>”); ?> Script Style <script language=’php’>printf(“<p>Script Style</p>”);</script> ASP Style <% echo “<p>ASP Style</p>”; %> PHP Comments – C Style /* this is a multi line comment */ C++ Style // this is a single line comment PERL Style # this is a single line comment Most PHP code statements must end with a ; (semi-colon) Conditionals and loops are an exception to the rule Variables: All variables in PHP begin with $ (dollar sign) All variables are created the first time they are assigned a value. PHP does not require you to declare variables before using them. PHP variables are multi-type (may contain different data types at different times) cs.metrostate.edu/~f itzgesu/courses/ics325/f all04/Ch01Notes.htm 1/9 8/28/12 Chapter 1 – PHP Crash Course Variables from a form are the name of the attribute from the submitting form.
  • Kdump and Introduction to Vmcore Analysis How to Get Started with Inspecting Kernel Failures

    Kdump and Introduction to Vmcore Analysis How to Get Started with Inspecting Kernel Failures

    KDUMP AND INTRODUCTION TO VMCORE ANALYSIS HOW TO GET STARTED WITH INSPECTING KERNEL FAILURES PATRICK LADD TECHNICAL ACCOUNT MANAGER, RED HAT [email protected] slides available at https://people.redhat.com/pladd SUMMARY TOPICS TO BE COVERED WHAT IS A VMCORE, AND WHEN IS IS CAPTURED? CONFIGURATION OF THE KDUMP UTILITY AND TESTING SETTING UP A VMCORE ANALYSIS SYSTEM USING THE "CRASH" UTILITY FOR INITIAL ANALYSIS OF VMCORE CONTENTS WHAT IS A VMCORE? It is the contents of system RAM. Ordinarily, captured via: makedumpfile VMWare suspend files QEMU suspend-to-disk images # hexdump -C vmcore -s 0x00011d8 -n 200 000011d8 56 4d 43 4f 52 45 49 4e 46 4f 00 00 4f 53 52 45 |VMCOREINFO..OSRE| 000011e8 4c 45 41 53 45 3d 32 2e 36 2e 33 32 2d 35 37 33 |LEASE=2.6.32-573| 000011f8 2e 32 32 2e 31 2e 65 6c 36 2e 78 38 36 5f 36 34 |.22.1.el6.x86_64| 00001208 0a 50 41 47 45 53 49 5a 45 3d 34 30 39 36 0a 53 |.PAGESIZE=4096.S| 00001218 59 4d 42 4f 4c 28 69 6e 69 74 5f 75 74 73 5f 6e |YMBOL(init_uts_n| 00001228 73 29 3d 66 66 66 66 66 66 66 66 38 31 61 39 36 |s)=ffffffff81a96| 00001238 39 36 30 0a 53 59 4d 42 4f 4c 28 6e 6f 64 65 5f |960.SYMBOL(node_| 00001248 6f 6e 6c 69 6e 65 5f 6d 61 70 29 3d 66 66 66 66 |online_map)=ffff| 00001258 66 66 66 66 38 31 63 31 61 36 63 30 0a 53 59 4d |ffff81c1a6c0.SYM| 00001268 42 4f 4c 28 73 77 61 70 70 65 72 5f 70 67 5f 64 |BOL(swapper_pg_d| 00001278 69 72 29 3d 66 66 66 66 66 66 66 66 38 31 61 38 |ir)=ffffffff81a8| 00001288 64 30 30 30 0a 53 59 4d 42 4f 4c 28 5f 73 74 65 |d000.SYMBOL(_ste| 00001298 78 74 29 3d 66 66 66 66 |xt)=ffff| 000012a0 WHEN IS ONE WRITTEN? By default, when the kernel encounters a state in which it cannot gracefully continue execution.
  • COMP26120 Academic Session: 2018-19 Lab Exercise 3: Debugging

    COMP26120 Academic Session: 2018-19 Lab Exercise 3: Debugging

    COMP26120 Academic Session: 2018-19 Lab Exercise 3: Debugging; Arrays and Memory Management Duration: 1 lab session For this lab exercise you should do all your work on your ex3 branch. Important: Read the supporting material on (i) pointers and arrays, and (ii) how to debug C programs before the lab starts. Learning Objectives At the end of this lab you should be able to: 1. Identify some common mistakes related to strings and memory management in C; 2. Use structs, unions, pointers, malloc and free in C; 3. Explain some fundamental differences between C and Java - the explicit use of pointers and of memory management; 4. Use GDB and valgrind to help debug C programs. Introduction We strongly recommend you review the lecture slides, and the two sets of reading material on pointers and arrays and debugging before starting this lab. See the course unit website for these documents. You might also find The GNU C Programming Tutorial helpful now and throughout this course. The relevant parts to this lab are Pointers, Strings, and Data Structures. Part 1 In this part you will take the provided C program broken.c and produce a C program fixed.c and a text file part1.txt describing what you did. You do not need to get to the end of this part to get full marks. You need to have made some reasonable progress and write down what you did. You will also need to demonstrate how to use a debugger and Valgrind to the TA and explain what they do. 1 This program (fixed) should take a single string as input on the command line and print the result of making the first letter of each word upper-case and the rest of the letters lower-case.
  • Pointers, Arrays, Memory: AKA the Cause of Those F@#)(#@*( Segfaults

    Pointers, Arrays, Memory: AKA the Cause of Those F@#)(#@*( Segfaults

    Computer Science 61C Spring 2020 Weaver Pointers, Arrays, Memory: AKA the cause of those F@#)(#@*( Segfaults 1 Agenda Computer Science 61C Spring 2020 Weaver • Pointers Redux • This is subtle and important, so going over again • Arrays in C • Memory Allocation 2 Address vs. Value Computer Science 61C Spring 2020 Weaver • Consider memory to be a single huge array • Each cell of the array has an address associated with it • Each cell also stores some value • For addresses do we use signed or unsigned numbers? Negative address?! • Don’t confuse the address referring to a memory location with the value stored there 101 102 103 104 105 ... ... 23 42 ... 3 Pointers Computer Science 61C Spring 2020 Weaver • An address refers to a particular memory location; e.g., it points to a memory location • Pointer: A variable that contains the address of a variable Location (address) 101 102 103 104 105 ... ... 23 42 104 ... x y p name 4 Types of Pointers Computer Science 61C Spring 2020 Weaver • Pointers are used to point to any kind of data (int, char, a struct, etc.) • Normally a pointer only points to one type (int, char, a struct, etc.). • void * is a type that can point to anything (generic pointer) • Use void * sparingly to help avoid program bugs, and security issues, and other bad things! • You can even have pointers to functions… • int (*fn) (void *, void *) = &foo • fn is a function that accepts two void * pointers and returns an int and is initially pointing to the function foo. • (*fn)(x, y) will then call the function 5 More C Pointer Dangers Computer Science 61C Spring 2020 Weaver • Declaring a pointer just allocates space to hold the pointer – it does not allocate the thing being pointed to! • Local variables in C are not initialized, they may contain anything (aka “garbage”) • What does the following code do? void f() { int *ptr; *ptr = 5; } 6 NULL pointers..
  • Automating Problem Analysis and Triage Sasha Goldshtein @Goldshtn Production Debugging

    Automating Problem Analysis and Triage Sasha Goldshtein @Goldshtn Production Debugging

    Automating Problem Analysis and Triage Sasha Goldshtein @goldshtn Production Debugging Requirements Limitations • Obtain actionable • Can’t install Visual information about Studio crashes and errors • Can’t suspend • Obtain accurate production servers performance • Can’t run intrusive information tools In the DevOps Process… Automatic build (CI) Automatic Automatic deployment remediation (CD) Automatic Automatic error triage monitoring and analysis Dump Files Dump Files • A user dump is a snapshot of a running process • A kernel dump is a snapshot of the entire system • Dump files are useful for post-mortem diagnostics and for production debugging • Anytime you can’t attach and start live debugging, a dump might help Limitations of Dump Files • A dump file is a static snapshot • You can’t debug a dump, just analyze it • Sometimes a repro is required (or more than one repro) • Sometimes several dumps must be compared Taxonomy of Dumps • Crash dumps are dumps generated when an application crashes • Hang dumps are dumps generated on-demand at a specific moment • These are just names; the contents of the dump files are the same! Generating a Hang Dump • Task Manager, right- click and choose “Create Dump File” • Creates a dump in %LOCALAPPDATA%\Te mp Procdump • Sysinternals utility for creating dumps • Examples: Procdump -ma app.exe app.dmp Procdump -ma -h app.exe hang.dmp Procdump -ma -e app.exe crash.dmp Procdump -ma -c 90 app.exe cpu.dmp Procdump -m 1000 -n 5 -s 600 -ma app.exe Windows Error Reporting • WER can create dumps automatically