Systems Performance: Enterprise and the Cloud
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
1 Thinking Methodically About Performance
PERFORMANCE Thinking Methodically about Performance The USE method addresses shortcomings in other commonly used methodologies Brendan Gregg, Joyent Performance issues can be complex and mysterious, providing little or no clue to their origin. In the absence of a starting point—or a methodology to provide one—performance issues are often analyzed randomly: guessing where the problem may be and then changing things until it goes away. While this can deliver results—if you guess correctly—it can also be time-consuming, disruptive, and may ultimately overlook certain issues. This article describes system-performance issues and the methodologies in use today for analyzing them, and it proposes a new methodology for approaching and solving a class of issues. Systems-performance analysis is complex because of the number of components in a typical system and their interactions. An environment may be composed of databases, Web servers, load balancers, and custom applications, all running upon operating systems—either bare-metal or virtual. And that’s just the software. Hardware and firmware, including external storage systems and network infrastructure, add many more components to the environment, any of which is a potential source of issues. Each of these components may require its own field of expertise, and a company may not have staff knowledgeable about all the components in its environment. Performance issues may also arise from complex interactions between components that work well in isolation. Solving this type of problem may require multiple domains of expertise to work together. As an example of such complexity within an environment, consider a mysterious performance issue we encountered at Joyent for a cloud-computing customer: the problem appeared to be a memory leak, but from an unknown location. -
Improving Route Scalability: Nexthops As Separate Objects
Improving Route Scalability: Nexthops as Separate Objects September 2019 David Ahern | Cumulus Networks !1 Agenda Executive Summary ▪ If you remember nothing else about this talk … Driving use case Review legacy route API Dive into Nexthop API Benefits of the new API Cumulus Networks !2 Performance with the Legacy Route API route route route prefix/lenroute prefix/lendev prefix/lendev gatewayprefix/len gatewaydev gatewaydev gateway Cumulus Networks !3 Splitting Next Hops from Routes Routes with separate Nexthop objects Legacy Route API route route prefix/len nexthop route nexthop id dev route gateway prefix/lenroute prefix/lendev prefix/lendev gatewayprefix/len gatewaydev gatewaydev gateway route prefix/len nexthop nexthop nexthop id group nexthopdev nexthop[N] gatewaydev gateway Cumulus Networks !4 Dramatically Improves Route Scalability … Cumulus Networks !5 … with the Potential for Constant Insert Times Cumulus Networks !6 Networking Operating System Using Linux APIs Routing daemon or utility manages switchd ip FRR entries in kernel FIBs via rtnetlink APIs SDK userspace ▪ Enables other control plane software to use Linux networking APIs rtnetlink Data path connections, stats, troubleshooting, … FIB notifications FIB Management of hardware offload is separate kernel upper devices tunnels ▪ Keeps hardware in sync with kernel ... eth0 swp1 swp2 swpN Userspace driver with SDK leveraging driver driver driver kernel notifications NIC switch ASIC H / W Cumulus Networks !7 NOS with switchdev Driver In-kernel switchdev driver ip FRR Leverages -
Performance Counters in Htop 3.0
Slide: [ ] Talk: Perf counters in htop 3.0 Presenter: https://hisham.hm PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command Performance counters in htop 3.0 Hisham Muhammad @[email protected] https://hisham.hm Slide: [ 2 ] Date: 2018-08-25 Talk: Perf counters in htop 3.0 Presenter: https://hisham.hm PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command About me original author of htop, a project started in 2004 http://hisham.hm/htop/ lead dev of LuaRocks, package manager for Lua http://luarocks.org/ co-founder of the GoboLinux distribution http://gobolinux.org/ developer at Kong – FLOSS API gateway http://getkong.org/ (we’re hiring!) Slide: [ 3 ] Date: 2018-08-25 Talk: Perf counters in htop 3.0 Presenter: https://hisham.hm PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command What is htop an interactive process manager intended to be “a better top” by this all I originally meant was: scrolling! (versions of top improved a lot since!) Slide: [ 4 ] Date: 2018-08-25 Talk: Perf counters in htop 3.0 Presenter: https://hisham.hm PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command Hello, htop! Slide: [ 5 ] Date: 2018-08-25 Talk: Perf counters in htop 3.0 Presenter: https://hisham.hm PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command htop beyond Linux Linux MacOS FreeBSD OpenBSD DragonFlyBSD Solaris (illumos) Slide: [ 6 ] Date: 2018-08-25 Talk: Perf counters in htop 3.0 Presenter: https://hisham.hm PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command Then Apple released a broken kernel.. -
20 Linux System Monitoring Tools Every Sysadmin Should Know by Nixcraft on June 27, 2009 · 315 Comments · Last Updated November 6, 2012
About Forum Howtos & FAQs Low graphics Shell Scripts RSS/Feed nixcraft - insight into linux admin work 20 Linux System Monitoring Tools Every SysAdmin Should Know by nixCraft on June 27, 2009 · 315 comments · Last updated November 6, 2012 Need to monitor Linux server performance? Try these built-in commands and a few add-on tools. Most Linux distributions are equipped with tons of monitoring. These tools provide metrics which can be used to get information about system activities. You can use these tools to find the possible causes of a performance problem. The commands discussed below are some of the most basic commands when it comes to system analysis and debugging server issues such as: 1. Finding out bottlenecks. 2. Disk (storage) bottlenecks. 3. CPU and memory bottlenecks. 4. Network bottlenecks. #1: top - Process Activity Command The top program provides a dynamic real-time view of a running system i.e. actual process activity. By default, it displays the most CPU-intensive tasks running on the server and updates the list every five seconds. Fig.01: Linux top command Commonly Used Hot Keys The top command provides several useful hot keys: Hot Usage Key t Displays summary information off and on. m Displays memory information off and on. Sorts the display by top consumers of various system resources. Useful for quick identification of performance- A hungry tasks on a system. f Enters an interactive configuration screen for top. Helpful for setting up top for a specific task. o Enables you to interactively select the ordering within top. r Issues renice command. -
Linux: Kernel Release Number, Part II
Linux: Kernel Release Number, Part II Posted by jeremy on Friday, March 4, 2005 - 07:05 In the continued discussion on release numbering for the Linux kernel [story], Linux creator Linus Torvalds decided against trying to add meaning to the odd/even least significant number. Instead, the new plan is to go from the current 2.6.x numbering to a finer-grained 2.6.x.y. Linus will continue to maintain only the 2.6.x releases, and the -rc releases in between. Others will add trivial patches to create the 2.6.x.y releases. Linus cautions that the task of maintaining a 2.6.x.y tree is not going to be enjoyable: "I'll tell you what the problem is: I don't think you'll find anybody to do the parallell 'only trivial patches' tree. They'll go crazy in a couple of weeks. Why? Because it's a _damn_ hard problem. Where do you draw the line? What's an acceptable patch? And if you get it wrong, people will complain _very_ loudly, since by now you've 'promised' them a kernel that is better than the mainline. In other words: there's almost zero glory, there are no interesting problems, and there will absolutely be people who claim that you're a dick-head and worse, probably on a weekly basis." He went on to add, "that said, I think in theory it's a great idea. It might even be technically feasible if there was some hard technical criteria for each patch that gets accepted, so that you don't have the burn-out problem." His suggested criteria included limiting the patch to being 100 lines or less, and requiring that it fix an oops, a hang, or an exploitable security hole. -
Reducing Power Consumption in Mobile Devices by Using a Kernel
IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. Z, NO. B, AUGUST 2017 1 Reducing Event Latency and Power Consumption in Mobile Devices by Using a Kernel-Level Display Server Stephen Marz, Member, IEEE and Brad Vander Zanden and Wei Gao, Member, IEEE E-mail: [email protected], [email protected], [email protected] Abstract—Mobile devices differ from desktop computers in that they have a limited power source, a battery, and they tend to spend more CPU time on the graphical user interface (GUI). These two facts force us to consider different software approaches in the mobile device kernel that can conserve battery life and reduce latency, which is the duration of time between the inception of an event and the reaction to the event. One area to consider is a software package called the display server. The display server is middleware that handles all GUI activities between an application and the operating system, such as event handling and drawing to the screen. In both desktop and mobile devices, the display server is located in the application layer. However, the kernel layer contains most of the information needed for handling events and drawing graphics, which forces the application-level display server to make a series of system calls in order to coordinate events and to draw graphics. These calls interrupt the CPU which can increase both latency and power consumption, and also require the kernel to maintain event queues that duplicate event queues in the display server. A further drawback of placing the display server in the application layer is that the display server contains most of the information required to efficiently schedule the application and this information is not communicated to existing kernels, meaning that GUI-oriented applications are scheduled less efficiently than they might be, which further increases power consumption. -
Jitk: a Trustworthy In-Kernel Interpreter Infrastructure
Jitk: A trustworthy in-kernel interpreter infrastructure Xi Wang, David Lazar, Nickolai Zeldovich, Adam Chlipala, Zachary Tatlock MIT and University of Washington Modern OSes run untrusted user code in kernel In-kernel interpreters - Seccomp: sandboxing (Linux) - BPF: packet filtering - INET_DIAG: socket monitoring - Dtrace: instrumentation Critical to overall system security - Any interpreter bugs are serious! 2/30 Many bugs have been found in interpreters Kernel space bugs - Control flow errors: incorrect jump offset, ... - Arithmetic errors: incorrect result, ... - Memory errors: buffer overflow, ... - Information leak: uninitialized read Kernel-user interface bugs - Incorrect encoding/decoding User space bugs - Incorrect input generated by tools/libraries Some have security consequences: CVE-2014-2889, ... See our paper for a case study of bugs 3/30 How to get rid of all these bugs at once? Theorem proving can help kill all these bugs seL4: provably correct microkernel [SOSP'09] CompCert: provably correct C compiler [CACM'09] This talk: Jitk - Provably correct interpreter for running untrusted user code - Drop-in replacement for Linux's seccomp - Built using Coq proof assistant + CompCert 5/30 Theorem proving: overview specification proof implementation Proof is machine-checkable: Coq proof assistant Proof: correct specification correct implementation Specification should be much simpler than implementation 6/30 Challenges What is the specification? How to translate systems properties into proofs? How to extract a running -
Mesalock Linux: Towards a Memory-Safe Linux Distribution
MesaLock Linux Towards a memory-safe Linux distribution Mingshen Sun MesaLock Linux Maintainer | Baidu X-Lab, USA Shanghai Jiao Tong University, 2018 whoami • Senior Security Research in Baidu X-Lab, Baidu USA • PhD, The Chinese University of Hong Kong • System security, mobile security, IoT security, and car hacking • MesaLock Linux, TaintART, Pass for iOS, etc. • mssun @ GitHub | https://mssun.me !2 MesaLock Linux • Why • What • How !3 Why • Memory corruption occurs in a computer program when the contents of a memory location are unintentionally modified; this is termed violating memory safety. • Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. !4 Stack Buffer Overflow • https://youtu.be/T03idxny9jE !5 Types of memory errors • Access errors • Buffer overflow • Race condition • Use after free • Uninitialized variables • Memory leak • Double free !6 Memory-safety in user space • CVE-2017-13089 wget: Stack-based buffer overflow in HTTP protocol handling • A stack-based buffer overflow when processing chunked, encoded HTTP responses was found in wget. By tricking an unsuspecting user into connecting to a malicious HTTP server, an attacker could exploit this flaw to potentially execute arbitrary code. • https://bugzilla.redhat.com/show_bug.cgi?id=1505444 • POC: https://github.com/r1b/CVE-2017-13089 !7 What • Linux distribution • Memory-safe user space !8 Linux Distribution • A Linux distribution (often abbreviated as distro) is an operating system made from a software collection, which is based upon the Linux kernel and, often, a package management system. !9 Linux Distros • Server: CentOS, Federa, RedHat, Debian • Desktop: Ubuntu • Mobile: Android • Embedded: OpenWRT, Yocto • Hard-core: Arch Linux, Gentoo • Misc: ChromeOS, Alpine Linux !10 Security and Safety? • Gentoo Hardened: enables several risk-mitigating options in the toolchain, supports PaX, grSecurity, SELinux, TPE and more. -
Kernel Boot-Time Tracing
Kernel Boot-time Tracing Linux Plumbers Conference 2019 - Tracing Track Masami Hiramatsu <[email protected]> Linaro, Ltd. Speaker Masami Hiramatsu - Working for Linaro and Linaro members - Tech Lead for a Landing team - Maintainer of Kprobes and related tracing features/tools Why Kernel Boot-time Tracing? Debug and analyze boot time errors and performance issues - Measure performance statistics of kernel boot - Analyze driver init failure - Debug boot up process - Continuously tracing from boot time etc. What We Have There are already many ftrace options on kernel command line ● Setup options (trace_options=) ● Output to printk (tp_printk) ● Enable events (trace_events=) ● Enable tracers (ftrace=) ● Filtering (ftrace_filter=,ftrace_notrace=,ftrace_graph_filter=,ftrace_graph_notrace=) ● Add kprobe events (kprobe_events=) ● And other options (alloc_snapshot, traceoff_on_warning, ...) See Documentation/admin-guide/kernel-parameters.txt Example of Kernel Cmdline Parameters In grub.conf linux /boot/vmlinuz-5.1 root=UUID=5a026bbb-6a58-4c23-9814-5b1c99b82338 ro quiet splash tp_printk trace_options=”sym-addr” trace_clock=global ftrace_dump_on_oops trace_buf_size=1M trace_event=”initcall:*,irq:*,exceptions:*” kprobe_event=”p:kprobes/myevent foofunction $arg1 $arg2;p:kprobes/myevent2 barfunction %ax” What Issues? Size limitation ● kernel cmdline size is small (< 256bytes) ● A half of the cmdline is used for normal boot Only partial features supported ● ftrace has too complex features for single command line ● per-event filters/actions, instances, histograms. Solutions? 1. Use initramfs - Too late for kernel boot time tracing 2. Expand kernel cmdline - It is not easy to write down complex tracing options on bootloader (Single line options is too simple) 3. Reuse structured boot time data (Devicetree) - Well documented, structured data -> V1 & V2 series based on this. Boot-time Trace: V1 and V2 series V1 and V2 series posted at June. -
Ivoyeur: Inotify
COLUMNS iVoyeur inotify DAVE JOSEPHSEN Dave Josephsen is the he last time I changed jobs, the magnitude of the change didn’t really author of Building a sink in until the morning of my first day, when I took a different com- Monitoring Infrastructure bination of freeways to work. The difference was accentuated by the with Nagios (Prentice Hall T PTR, 2007) and is Senior fact that the new commute began the same as the old one, but on this morn- Systems Engineer at DBG, Inc., where he ing, at a particular interchange, I would zig where before I zagged. maintains a gaggle of geographically dispersed It was an unexpectedly emotional and profound metaphor for the change. My old place was server farms. He won LISA ‘04’s Best Paper off to the side, and down below, while my future was straight ahead, and literally under award for his co-authored work on spam construction. mitigation, and he donates his spare time to the SourceMage GNU Linux Project. The fact that it was under construction was poetic but not surprising. Most of the roads I [email protected] travel in the Dallas/Fort Worth area are under construction and have been for as long as anyone can remember. And I don’t mean a lane closed here or there. Our roads drift and wan- der like leaves in the water—here today and tomorrow over there. The exits and entrances, neither a part of this road or that, seem unable to anticipate the movements of their brethren, and are constantly forced to react. -
Chapter 1. Origins of Mac OS X
1 Chapter 1. Origins of Mac OS X "Most ideas come from previous ideas." Alan Curtis Kay The Mac OS X operating system represents a rather successful coming together of paradigms, ideologies, and technologies that have often resisted each other in the past. A good example is the cordial relationship that exists between the command-line and graphical interfaces in Mac OS X. The system is a result of the trials and tribulations of Apple and NeXT, as well as their user and developer communities. Mac OS X exemplifies how a capable system can result from the direct or indirect efforts of corporations, academic and research communities, the Open Source and Free Software movements, and, of course, individuals. Apple has been around since 1976, and many accounts of its history have been told. If the story of Apple as a company is fascinating, so is the technical history of Apple's operating systems. In this chapter,[1] we will trace the history of Mac OS X, discussing several technologies whose confluence eventually led to the modern-day Apple operating system. [1] This book's accompanying web site (www.osxbook.com) provides a more detailed technical history of all of Apple's operating systems. 1 2 2 1 1.1. Apple's Quest for the[2] Operating System [2] Whereas the word "the" is used here to designate prominence and desirability, it is an interesting coincidence that "THE" was the name of a multiprogramming system described by Edsger W. Dijkstra in a 1968 paper. It was March 1988. The Macintosh had been around for four years. -
Linux Pocket Guide.Pdf
3rd Edition Linux Pocket Guide ESSENTIAL COMMANDS Daniel J. Barrett 3RD EDITION Linux Pocket Guide Daniel J. Barrett Linux Pocket Guide by Daniel J. Barrett Copyright © 2016 Daniel Barrett. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebasto‐ pol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promo‐ tional use. Online editions are also available for most titles (http://safaribook‐ sonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected]. Editor: Nan Barber Production Editor: Nicholas Adams Copyeditor: Jasmine Kwityn Proofreader: Susan Moritz Indexer: Daniel Barrett Interior Designer: David Futato Cover Designer: Karen Montgomery Illustrator: Rebecca Demarest June 2016: Third Edition Revision History for the Third Edition 2016-05-27: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781491927571 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Linux Pocket Guide, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellec‐ tual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.