Sys Diag User's Guide

sys_diag User’s Guide Release 8.4 Note: This extended MS Word version is based upon the core self-extracting README_sys_diag.txt file. (Copyright © 1999-2017 by Todd A. Jobson) Pg 1 of 33 _________________________________ Outline of this document : _________________________________ 1.0 sys_diag v.8.3g Overview 2.0 HTML Report – Outline & Interpretation 3.0 Command Line Arguments & Available Parameters 4.0 Common Command Line Usage Examples + 5.0 Capturing sys_diag Command Line output 6.0 Executing sys_diag via Crontab entries 7.0 Reducing System Overhead during Data Capture 8.0 Performance Data: Threshold Analysis and Baselines 9.0 Creating / Viewing Graphs of Performance Data 10.0 sys_diag DIRECTORIES and DATA FILE Descriptions 11.0 Sample sys_diag_BASELING.cfg file 12.0 Sample Command Line Output 13.0 Downloads, Resources and Feedback ---------------------------------------------------------------------------------------------- _______________________________ 1.0 sys_diag v.8.3 Overview : _______________________________ BACKGROUND : Over the course of the past ~15+ yrs as a former SunPS field consulting Architect (now Oracle) employee, sys_diag has been personally developed in my spare time in order to increase productivity and efficiency when working with Solaris systems (system configuration snapshots, workload characterization, historical performance trending, performance analysis, POC / Proof Of Concept load testing, bottleneck root cause identification, capacity planning of stand-alone or as part of larger TCO or consolidation analysis, and/or current/future state architectural assessment). By placing this and prior versions for public use under copyright, hopefully others can reap the many time-saving benefits of this utility, making use of my efforts and sys_diag, to streamline any admin/analysis/assessment activities required of them. This has been an invaluable asset used to characterize / diagnose / analyze workloads across literally hundreds of systems within many of the top Fortune 100 datacenters. As would be expected, the obligations, support, and implications of use are the sole responsibility of the user, as is documented within the header of sys_diag. As a standard “best practice”, this and/or any new workload introduced to a system should always be tested first in a non-production environment for validation and familiarity. INTRODUCTION : sys_diag is a Solaris utility (ksh/awk/javascript) that can perform several functions, among them : system configuration 'snapshot' and reporting (detailed or high-level) along-side performance data capture (over some specified duration or point in time PEAK PERIOD 'snapshot'). Most significantly, after the data is captured, it automatically does correlation, analysis, and reporting of findings/exceptions (based upon configurable thresholds that can be easily changed within the script header). The output provides a single .html report with a color-coded “dashboard” that includes auto-generated chart summaries of findings, along-side system configuration and snapshot details. Each run of sys_diag creates a local sub-directory where all datafiles captured or created (analysis, reports, graphs generated) are stored. Upon completion, sys_diag creates an compressed archive within a single .tar.Z for examination externally. The report format is provided in .html, and .txt as a single file for easy review (without requiring trudging through several subdirectories of separate files potentially (Copyright © 1999-2017 by Todd A. Jobson) Pg 2 of 33 thousands of lines long each, to manually correlate and review for hours /days.. before manually generating the assessment report and/or any graphs needed). This tool will literally save you a week of analysis for complicated configurations that require diagnosis. sys_diag has previously been run on Solaris 2.x (or above) Solaris platforms, and today should be capable of being run on any x86 or SPARC Solaris 8+ system. Version 8.3 includes reporting new Solaris 11.3 capabilities (zones, LDOM’s/OVM, SRM, zfspools, fmd, ipfilter/ipnat, link aggregation, Dtrace probing, etc...). Beyond the Solaris configuration reporting commands (System/storage HW config, OS config, kernel tunables, network/IPMP/Trunking config, ZFS/FS/VM/NFS, users/groups, security, NameSvcs, pkgs, patches, errors/warnings, and system/network performance metrics), sys_diag also captures relevant application configuration details, such as Sun Cluster 2.x/3.x, Veritas VCS/VM/vxfs, Oracle .ora/RAC/CRS/listener.., MySQL.., along with other detailed configuration capture of key files (and tracking of changes via -t), etc. Of all the capabilities, the greatest benefits are found by being able to run this single ksh script on a system and do the analysis from one single report/ file offline/elsewhere. Since sys_diag is a ksh script (using awk for post-processing the data and javascript for dynamic HTML/chart generation), no packages need to be installed, only using standard built-in Solaris Utilities, allowing for the widest range of support. Version 8.3g of sys_diag offers built-in dynamic HTML generation with both javascript embedded dashboard charts, as well as stand-alone .gr.html files for each individual chart. Additionally, the vmstat, iostat, and netstat data is exported in a text format friendly (.gr.txt) format to import and create custom graphs from within OpenOffice or Excel. Regarding the system overhead, sys_diag runs all commands in a serially, (waiting for each command to complete before running the next) impacting system performance the same as if an admin were typing these commands one at a time on a console. The only exception is the background vmstat/mpstat/iostat/netstat (-g) performance gathering of metrics at the specified sampling interval (-I) and total duration (-T), which generally has negligible overhead on a system. *See Section 7 for examples to reduce overhead* Workflow (order of execution) of a typical sys_diag run (with arguments “-g –I1 –l”) : This example uses a 1 second sampling Interval (-I) a DEFAULT Total duration (-T) of 5 minutes = (–T 300) to gather performance data (-g) and create a long (-l) configuration report. *All Commands are run serially, except Background Collection* - Extract README_sys_diag.txt - Beginning BME (0=Begin/1=Midpt/2=EndPt) Profiling SNAPSHOT (#0) [IF NOT –x & is -v|-V] (to profile the system serially with prstat, ps, iostat, netstat, zpool, tcpstat,.. *before any background collection is started*). - Initiate BACKGROUND Data Collection (vm/mp/io/netstat..) at (“-I x”) x sec intervals for total duration default 300 seconds (5mins) or t Total Seconds via “-T t”. - WAIT until the MidPoint of Background Data Collection - Initiate BME Midpoint Profiling SNAPSHOT (#1), *ONLY IF >3mins of Total duration remains, & Not Excluded via “-x”, & using Deep Verbosity via “-V”. - WAIT for Background Data Collection to Complete - Initiate BME Midpoint Profiling SNAPSHOT (#2), *ONLY IF Not Excluded via “-x”, & using verbosity via “-v|-V”. - Capture System Configuration Data for report (following the TOC Table of Contents Outline) - Post-Process the Performance data gathered to identify exceptions. - Generate both the embedded .HTML Javascript charts and stand-alone .html and .gr.txt files (for Excel/OpenOffice custom import chart creation) - Generate the complete .html report - Identify the DataDirectory Path, the HTML Report File link - Create a compressed tar.Z archive of DataDirectory (all+ sys_diag & perflog) * See Section 12 for complete sample command line output running sys_diag * sys_diag is generally run from the same directory (eg. /var/tmp) that will have enough available disk space for storing the data directories and archives (however, the data directory and all files can be removed after each run using –C). When always run from the same directory, a single sys_diag_perflog.out file is appended to as a system chronology of performance each time sys_diag is run, that can later be referred to. NOTE: ** USE Chrome, Firefox as recommended browsers ** (for best viewing open full screen) (Copyright © 1999-2017 by Todd A. Jobson) Pg 3 of 33 ______________________________________________________________________________ 2.0 HTML Report - Outline and Interpretation ______________________________________________________________________________ The final report output that sys_diag produces, comes in 2 formats : .out (Text) Or sysd_hostname_date_time.out.html (HTML/Javascript). Both reports include a “Header” section that summarizes basic system details and characteristics of the Sys_diag snapshot captured. The .html report additionally includes the performance analysis “dashboard”, where data is summarized and color-coded within separate sub-system sections : CPU/Kernel, Memory, Storage IO, Network. Within each of the dashboard sections, details of sub-system “health”, identification of flagged exceptions, embedded charts, and links to detailed “Analysis” of captured data (how/why/where exceptions where flagged) + links to related system details (the data behind the analysis and findings). Beyond the dashboard, you will find the configuration report Table Of Contents that categorize and link all facets of system configuration within 25 Sections (to bring you directly to the relevant data within these sections). 2.1 HTML Report (Sample) Header The following is a sample .html report header from output generated within the global zone of a Solaris 11.3 host named “Newton-S11.3x6” running on an x86 server within a VirtualBox

Sys Diag User's Guide

Web Vmstat Any Distros, Especially Here’S Where Web Vmstat Comes Those Targeted at In

Performance, Scalability on the Server Side

Linux Performance Tools

Java Bytecode Manipulation Framework

UNIX OS Agent User's Guide

System Analysis and Tuning Guide System Analysis and Tuning Guide SUSE Linux Enterprise Server 15 SP1

A Hitchhiker's Guide for Performance Assessment and Benchmarking SAS® Applications

SUSE Linux Enterprise Server 12 SP4 System Analysis and Tuning Guide System Analysis and Tuning Guide SUSE Linux Enterprise Server 12 SP4

Whitepaper Why Nagios and Server Monitoring Are Failing Modern Apps

Licensing Information User Manual Oracle® ILOM

Embedded Android

Informix Best Practices Configuration, ONCONFIG, CPU and Memory Usage