sys_diag

User’s Guide

Release 8.4

Note: This extended MS Word version is based upon the core self-extracting README_sys_diag.txt file.

(Copyright © 1999-2017 by Todd A. Jobson) Pg 1 of 33

______

Outline of this document : ______

1.0 sys_diag v.8.3g Overview

2.0 HTML Report – Outline & Interpretation

3.0 Command Line Arguments & Available Parameters

4.0 Common Command Line Usage Examples +

5.0 Capturing sys_diag Command Line output

6.0 Executing sys_diag via Crontab entries

7.0 Reducing System Overhead during Data Capture

8.0 Performance Data: Threshold Analysis and Baselines

9.0 Creating / Viewing Graphs of Performance Data

10.0 sys_diag DIRECTORIES and DATA FILE Descriptions

11.0 Sample sys_diag_BASELINE.cfg file

12.0 Sample Command Line Output

13.0 Downloads, Resources and Feedback

______

1.0 sys_diag v.8.3 Overview : ______

BACKGROUND :

Over the course of the past ~15 years as a former SunPS (now Oracle) field consulting Architect, I developed sys_diag in my spare time to increase productivity and efficiency when working with Solaris systems (system configuration snapshots, workload characterization, historical performance trending, performance analysis, POC / Proof Of Concept load testing, bottleneck root cause identification, capacity planning of stand-alone systems or as part of larger TCO or consolidation analysis, and/or current/future state architectural assessment).

By placing this and prior versions under copyright for public use, hopefully others can reap the many time-saving benefits of this utility, making use of my efforts and sys_diag to streamline any admin/analysis/assessment activities required of them. It has been an invaluable asset used to characterize / diagnose / analyze workloads across literally hundreds of systems within many of the Fortune 100 datacenters.

As would be expected, the obligations, support, and implications of use are the sole responsibility of the user, as is documented within the header of sys_diag. As a standard “best practice”, this and/or any new workload introduced to a system should always be tested first in a non-production environment for validation and familiarity.

INTRODUCTION :

sys_diag is a Solaris utility (ksh/awk/javascript) that can perform several functions, among them : system configuration 'snapshot' and reporting (detailed or high-level) alongside performance data capture (over a specified duration, or a point-in-time PEAK PERIOD 'snapshot'). Most significantly, after the data is captured, it automatically correlates, analyzes, and reports findings/exceptions (based upon configurable thresholds that can be easily changed within the script header). The output provides a single .html report with a color-coded “dashboard” that includes auto-generated chart summaries of findings, alongside system configuration and snapshot details.

Each run of sys_diag creates a local sub-directory where all datafiles captured or created (analysis, reports, graphs generated) are stored. Upon completion, sys_diag creates a compressed archive (a single .tar.Z) for examination externally.

The report format is provided in .html and .txt as a single file for easy review (without trudging through several subdirectories of separate files, each potentially thousands of lines long, to manually correlate and review for hours or days before manually generating the assessment report and/or any graphs needed). This tool will literally save you a week of analysis for complicated configurations that require diagnosis. sys_diag has previously been run on Solaris 2.x and later platforms, and today should be capable of running on any x86 or SPARC Solaris 8+ system. Version 8.3 includes reporting of new Solaris 11.3 capabilities (zones, LDOMs/OVM, SRM, ZFS pools, fmd, ipfilter/ipnat, link aggregation, DTrace probing, etc...).

Beyond the Solaris configuration reporting commands (system/storage HW config, OS config, kernel tunables, network/IPMP/Trunking config, ZFS/FS/VM/NFS, users/groups, security, NameSvcs, pkgs, patches, errors/warnings, and system/network performance metrics), sys_diag also captures relevant application configuration details, such as Sun Cluster 2.x/3.x, Veritas VCS/VM/vxfs, Oracle .ora/RAC/CRS/listener.., MySQL.., along with detailed capture of other key configuration files (and tracking of changes via -t), etc.

Of all the capabilities, the greatest benefit is being able to run this single ksh script on a system and do the analysis offline/elsewhere from one single report/file. Since sys_diag is a ksh script (using awk for post-processing the data and javascript for dynamic HTML/chart generation), no packages need to be installed; it uses only standard built-in Solaris utilities, allowing for the widest range of support.

Version 8.3g of sys_diag offers built-in dynamic HTML generation with both javascript embedded dashboard charts, as well as stand-alone .gr.html files for each individual chart. Additionally, the vmstat, mpstat, iostat, and netstat data is exported in an import-friendly text format (.gr.txt) for creating custom graphs from within OpenOffice or Excel.

Regarding system overhead, sys_diag runs all commands serially (waiting for each command to complete before running the next), impacting system performance the same as if an admin were typing these commands one at a time on a console. The only exception is the background vmstat/mpstat/iostat/netstat (-g) gathering of performance metrics at the specified sampling interval (-I) and total duration (-T), which generally has negligible overhead on a system. *See Section 7 for examples to reduce overhead*

Workflow (order of execution) of a typical sys_diag run (with arguments "-g -I1 -l") :

This example uses a 1 second sampling Interval (-I) and the DEFAULT Total duration (-T) of 5 minutes (-T 300) to gather performance data (-g) and create a long (-l) configuration report. *All commands are run serially, except Background Collection*

- Extract README_sys_diag.txt

- Beginning BME (0=Begin/1=MidPt/2=EndPt) Profiling SNAPSHOT (#0) [IF NOT -x & is -v|-V] (to profile the system serially with prstat, ps, iostat, netstat, zpool, tcpstat, .. *before any background collection is started*).

- Initiate BACKGROUND Data Collection (vm/mp/io/netstat..) at ("-I x") x sec intervals, for a total duration of 300 seconds (5 mins) by default, or t total seconds via "-T t".

- WAIT until the MidPoint of Background Data Collection

- Initiate BME Midpoint Profiling SNAPSHOT (#1), *ONLY IF >3mins of Total duration remains, & Not Excluded via “-x”, & using Deep Verbosity via “-V”.

- WAIT for Background Data Collection to Complete

- Initiate BME Endpoint Profiling SNAPSHOT (#2), *ONLY IF Not Excluded via "-x", & using verbosity via "-v|-V"*.

- Capture System Configuration Data for the report (following the TOC / Table of Contents outline)

- Post-Process the Performance data gathered to identify exceptions.

- Generate both the embedded HTML javascript charts and the stand-alone .gr.html and .gr.txt files (for Excel/OpenOffice custom import chart creation)

- Generate the complete .html report

- Identify the DataDirectory Path and the HTML Report File link

- Create a compressed tar.Z archive of the DataDirectory (all files + sys_diag & perflog)

* See Section 12 for complete sample command line output from running sys_diag *

sys_diag is generally run from the same directory (eg. /var/tmp) that has enough available disk space for storing the data directories and archives (however, the data directory and all files can be removed after each run using -C). When always run from the same directory, a single sys_diag_perflog.out file is appended to each time sys_diag is run, building a chronology of system performance that can later be referred to.

NOTE: ** Chrome and Firefox are the recommended browsers ** (for best viewing, open full screen)

______

2.0 HTML Report - Outline and Interpretation ______

The final report output that sys_diag produces comes in 2 formats : sysd_hostname_date_time.out (Text) or sysd_hostname_date_time.out.html (HTML/Javascript).

Both reports include a “Header” section that summarizes basic system details and characteristics of the Sys_diag snapshot captured.

The .html report additionally includes the performance analysis “dashboard”, where data is summarized and color-coded within separate sub-system sections : CPU/Kernel, Memory, Storage IO, and Network. Each dashboard section presents details of sub-system “health”, identification of flagged exceptions, embedded charts, and links to the detailed “Analysis” of captured data (how/why/where exceptions were flagged), plus links to related system details (the data behind the analysis and findings).

Beyond the dashboard, you will find the configuration report Table Of Contents that categorizes and links all facets of system configuration within 25 sections (to bring you directly to the relevant data within those sections).

2.1 HTML Report (Sample) Header

The following is a sample .html report header from output generated within the global zone of a Solaris 11.3 host named “Newton-S11.3x6” running on an x86 server within a VirtualBox VM (an excellent way to become familiar with sys_diag, and/or analyze customer data from within Solaris !) :

From the header, most of the basic system/LDOM/zone specifications can quickly be determined, along with the version of sys_diag run, the command line arguments used, the snapshot interval and duration, the times that snapshots occurred, and the location of the data directory for all data captured.

Note the “PERF SNAPSHOTS” label is underlined and highlighted in blue, indicating that by clicking the link, you will be redirected to the section of the report for viewing some of the embedded BME Snapshot Datafiles such as prstat, pmap, pfiles, lockstat, etc.


2.2 HTML Report (Sample) System Utilization “Dashboard”

As noted previously, the HTML report includes a very useful addition beyond the Text (.out) version of the report.

The additional “Dashboard” includes the following sections that recap all system findings and analysis :

- Workload Characterization Summary

- CPU / Kernel Profiling

- Memory Profiling

- Storage IO

- Network IO

NOTE : Within each “sub-system” Dashboard section, charts are only included IF the % of exceptions is > 0. In other words, if there are no issues, no charts are included, and the section is flagged as “Green”.

It should also be noted that viewing the .html javascript embedded (or stand-alone .gr.html) charts requires an internet connection, since the javascript utilizes the Google Charts API and back-end services.

2.2.1 Sample Dashboard Section : Workload Characterization Summary

As can be seen from the example below (from an actual customer t7-4 RAC/ASM POC), all high-level Workload attributes can be seen very quickly :

- Characterization of System Workloads / Types (Applications, DB, Java, ..)

- Identification of the busiest system processing at this point in time

- ID of system workload characteristics: Multi-threaded (1025 LWPs) vs. Single-Threaded (1 LWP)

- Busy process memory footprints

- Busiest users and related applications

- System CPU/Memory/SWAP/Network usage by global/non-global zones vs. total system (or shared)

- System Processor and vcpu configuration

- Several links to analyze the next level of detail for workloads sorted by %CPU/%MEM, LWPs, etc.


2.2.2 Sample Dashboard Section : CPU / Kernel Profiling

The following 2 samples depict entirely different stories.

From both of these examples, it can be seen quickly that there are exceptions flagged, since graphs are included and the sections aren’t outlined as “Green” ;-) The detail below the charts also indicates the “NOTE: XX % : xxx of yyy VMSTAT CPU entries EXCEEDED Thresholds !”

The pie chart depicts the breakdown of vmstat CPU exceptions that are flagged, while the line chart shows the key metrics captured during the background data gathering over the duration of the sys_diag run.

As with all Dashboard sub-system detailed tables, the TOTAL Averages and PEAK’s are depicted for key metrics (by parsing all the non-zero data points gathered during background data collection).

The first example demonstrates how a system can appear 100% IDLE !? while the CPUs are actually waiting on kernel threads to complete execution.. threads that are BLOCKED, waiting on IO (which you can’t immediately tell until you see the Storage IO section below that corresponds to this).

The second example depicts a system (the prior Vbox S11.3 zone) with a different issue: a kernel Run Queue that periodically slows down processing (primarily due to not having enough CPU resources within the VM/zone to handle the “bursty” nature of the workload, which causes the RQ to form and involuntary Context Switching..).


2.2.3 Sample Dashboard Section : Memory Profiling

The following sample “MEMORY Profiling” section (similar to the other dashboard sections) rapidly identifies that there are exceptions flagged, and that the section is CRITICAL, given the “RED” background color. The pie chart depicts the breakdown of vmstat MEMORY exceptions that are flagged, while the line chart shows the key metrics captured during the background data gathering over the duration of the sys_diag run.

Additionally, the Graphs are included to depict “What” Categories and percentages of Exceptions have been flagged, along-side the line chart showing the system-wide memory usage (% Free Memory and System Scan Rate to free memory). Details below the charts indicate the “NOTE: XX % : xxx of yyy VMSTAT MEMORY entries EXCEEDED Thresholds !”

As with all Dashboard sub-system detailed tables, the TOTAL Averages and PEAK’s are depicted for key metrics (by parsing all the non-zero data points gathered during background data collection).

This example also reflects an additional table section only listed for Non-Global-Zones that have Memory “Resource Capping” turned on (limiting non-global zone memory to that cap).

As a contrast to the vmstat Memory data, which reflects “system-wide” metrics, the zone-specific memory usage is shown here as “Zone : Einstein Physical Mem Usage”. This data comes from aggregated “zonestat” data gathered during background collection, but specific to non-global zone memory usage.

NOTE the [3072 MB CAP] listed (of the system’s total 3.6 GB Physical Memory for the Global Zone and Server).

This aggregated zonestat data is broken down by Minimum Used and Peak Used per 3 categories :

- Total Server (Total Zone memory used on the server, including other zones’ System Shared kernel memory)

- System Shared (Memory used by Zone Einstein and shared with all other zones = shared Kernel memory)

- Einstein (Memory used ONLY within non-global Zone Einstein)

The last section for this non-global zone depicts the Resource Capping stats, showing the local non-GZ memory used, in addition to any rcapd “paging” if/when the non-Global Zone goes beyond its MCAP.


2.2.4 Sample Dashboard Section : Storage IO

The following sample “Storage IO” section (similar to the other dashboard sections) rapidly identifies that there are exceptions flagged, and that the section is CRITICAL, given the “RED” background color. The pie charts depict the breakdown of iostat exceptions that are flagged.

One distinction with this section is that the first pie chart includes ALL system devices and exceptions, while the second pie chart and corresponding line chart show the system’s SLOWEST storage device. Similar to other sections, the line chart graphs the key metrics captured during the background data gathering over the duration of the sys_diag run.

The sample below shows the corresponding Storage Bottlenecks that are directly related to the CPU / Kernel dashboard example above where CPU Idle had approached 100% !? due to BLOCKED Kernel Threads (waiting on IO as can be seen here).

Key metrics to take note of for the Storage IO devices are the latencies involved, both for AVERAGE transactions as well as the PEAKs found by post-processing the gathered iostat data. The Active (asvc_t) and Wait (wsvc_t) service times and percentages indicated by the 4 slowest devices (SAN LUNs) are so dramatic that they are crippling the overall usefulness of this DB server (and RAC cluster) !!

The last portion of this dashboard (Controller aggregate totals) has been left out of this example.


2.2.4.1 Sample Dashboard Section : Expanded LARGE Charts

Dashboard LARGE Charts can be opened by clicking on the smaller embedded in-line summary charts within each dashboard section (where exceptions > 0% exist .. aka, Not “Green” sections).

NOTE : The HTML charts are interactive, enabling the user to mouse over any datapoint, or to click on a legend line color on the right to highlight that specific set of data-points.

Each of these charts can be opened independently from the master .html report from any internet connected browser.

The example below is of the prior example 2.2.2 : “VMSTAT : CPU / Kernel” line chart being clicked.


The example below is of a prior VMSTAT CPU Pie Chart being clicked.



The example below is of the prior example’s Storage IO : “Slowest IO Device” LINE Chart being clicked.



2.2.5 Sample Dashboard Section : Network IO Profiling

The following sample “Network IO” section (similar to the other dashboard sections) rapidly identifies (in this case) that there are NO exceptions flagged, and that the section is “healthy”, given the “GREEN” background color.

This is further confirmed by the indication that there are 0% of netstat entries flagged as exceptions : “NOTE: XX % : xxx of yyy NETSTAT entries EXCEEDED Thresholds !”

The one difference with this section is that all individual interfaces are listed.

Also, another difference with this dashboard is the presence of a table showing the busiest Network TCP Connections, in addition to a table showing the aggregated summary of all NICs within the system.

As with all Dashboard sub-system detailed tables, the TOTAL Averages and PEAK’s are depicted for key metrics (by parsing all the non-zero data points gathered during background data collection).

NOTE that ANY DROPPED packets or “Undrained Network Queues” (below as Undrained TCP Ports) would also indicate a potential bottleneck or interface saturation (it could also be a server on the other end not receiving fast enough, or any network device in-between) !


2.6 HTML Report Table of Contents


______

3.0 Command Line Arguments & Available Parameters : ______

COMMAND USAGE :

# sys_diag [-a -A -b -B_ -c -C -d_ -D -e_ -f_ -g -G -H -I_ -l -L_ -n -N -o_ -p_ -q -s -S -T_ -t -u -v -V -x -h|-?]

-a Application / DB Configs (included in -l/-A, Oracle/RAC/MySQL/SunRay ..)

-A ALL Options are turned on, except Debug and -u

-b Generate a Performance Thresholds "Baseline" profile (see -B or default fname used)

-B (1 | 2) Use Baseline file Threshold Analysis Calculation (1=Range HWM, 2=StdDev)

-c Configuration details (included in -l/-A)

-C Cleanup Files and remove Directory if tar works

-d path Base directory for data directory / files

-D Debug mode (ksh set -x .. echo statements/variables/evaluations)

-e email_addr Emails sys_diag .tar.Z file upon completion (assuming sendmail is configured)

-f input_file Used with -t to list configuration files to Track changes of

-g gather Performance data (def: 5 sec samples for 5 mins, unless -I |-T exist)

-G GATHER Extended Perf data (S10+ Dtrace, lockstats+, pmap/pfiles) vs -g

-h | -? Help / Command Usage (this listing) / Version_#

-H HA configuration and stats (Solaris Cluster, VCS, ..)

-I secs Perf Gathering Sample Interval (default is 5 secs)

-l Long Listing (most details, but not -g|-G,-v|-V,-A,-t,-D)

-L label_descr_nospaces (Descriptive Label For Report)

-n Network configuration and stats (also included in -l/-A except ndd settings)

-N No Graph generation in HTML Reports.

-o outfile Output filename (stored under sub-dir created)

-p Specify Individual Performance Subsystems for data capture (for -g | -G). [eg “-p cminp” selects All (CPU|Mem|IO|Net|Process), “-p cn” only cpu & net]

-P -d ./data_dir_path Post-process the Perf data skipped with -S and finish .html rpt

-q Quiet mode, disables command line output. (*not yet fully implemented*)

-s Security configuration

-S SKIP POST PROCESSing of Performance data (use -P -d data_dir to complete)

-t Track configuration / cfg_file changes (Saves/Rpts cfg/file chgs *see -f)

-T secs Perf Gathering Total Duration (default is 300 secs =5 mins)

-u unTarred (do NOT create a tar file)

-v Extended verbosity level 1 (for -g perf gathering, examines more top procs, Also adds pmap/pfiles/ptree, and lightweight lockstat to BME SNAPSHOTS).

-V Deep Verbosity level 2 (adds path_to_inst, netwk dev settings, snoop..) Longer message/error/log listings. Additionally, the probe duration for Dtrace and lockstat sampling is widened from 2 seconds (during -G) to 5 seconds (if -G && -V). Ping is also run against the default route and google.com. If -g|-G & -V, then mdb memory usage is captured (page cache, kernel, anon..).

-x Excludes lockstat, intrstat, plockstat (DTrace usage), pfiles & mdb from -g|-G performance data gathering, also skipping Midpt BME snapshots.


----

BOTH of the following command line syntax examples are functionally the same (order/spacing doesn’t matter):

eg. ./sys_diag -g -v -I 1 -T 600 -l OR ./sys_diag -g -l -I1 -T600 -v

NOTE: NO args equates to a brief rpt with NO Performance capture (No -A,-g/I,-l,-t,-D,-V,..)

** Also, note that option/parameter ordering is flexible, as is the use of white space between options and their parameters (or not). The only requirement is to list every option/parameter separately with a preceding - (-g -l , but not -gl).

------

**********************************
** EXIT Status ** (Return Code) :
**********************************

0 if OK; non-zero if an error occurred or Performance EXCEEDED Thresholds! was found.

IF Performance Gathering and Analysis (-g|-G) has noted EXCEEDED Thresholds!, THEN a bitmask is produced from the following conditions (added together to produce a single integer exit/return code) :

*****************************************
RED (Critical) CPU Alarm        : return_code = return_code + 1
RED (Critical) Memory Alarm     : return_code = return_code + 2
RED (Critical) StorageIO Alrm   : return_code = return_code + 4
RED (Critical) Network Alarm    : return_code = return_code + 8
YELLOW (Warning) CPU Alarm      : return_code = return_code + 16
YELLOW (Warning) Memory Alarm   : return_code = return_code + 32
YELLOW (Warning) StorageIO Alrm : return_code = return_code + 64
YELLOW (Warning) Network Alarm  : return_code = return_code + 128

Therefore, if you take the return code and start by subtracting the highest values, you can identify which subsystems (cpu/memory/storageIO/network) had alarms.

eg. root# echo $? will give you the exit code of the last run command/utility

Therefore, if sys_diag returned an exit code of 129, then that depicts :

return_code - 128 shows that Network Warnings (YELLOW) were present.. and the remaining return_code of 1 shows CPU (RED) Critical Alarms

(essentially, start subtracting the largest exceptions, and take the remainder and go down the list.. so an exit code of 5 would have been RED_IO & RED_CPU)
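The subtraction procedure above can also be done with bitwise arithmetic. The following is a minimal sketch (a hypothetical helper, not part of sys_diag itself) that decodes an exit code using the bit values from the table above:

```shell
#!/bin/sh
# Decode a sys_diag exit code into its documented alarm conditions.
# Bit values are taken from the EXIT Status table above.
decode_sysd_rc() {
    rc=$1
    [ $((rc & 1))   -ne 0 ] && echo "RED (Critical) CPU Alarm"
    [ $((rc & 2))   -ne 0 ] && echo "RED (Critical) Memory Alarm"
    [ $((rc & 4))   -ne 0 ] && echo "RED (Critical) StorageIO Alarm"
    [ $((rc & 8))   -ne 0 ] && echo "RED (Critical) Network Alarm"
    [ $((rc & 16))  -ne 0 ] && echo "YELLOW (Warning) CPU Alarm"
    [ $((rc & 32))  -ne 0 ] && echo "YELLOW (Warning) Memory Alarm"
    [ $((rc & 64))  -ne 0 ] && echo "YELLOW (Warning) StorageIO Alarm"
    [ $((rc & 128)) -ne 0 ] && echo "YELLOW (Warning) Network Alarm"
    return 0
}

# The exit code 129 example from the text: 129 = 128 + 1
decode_sysd_rc 129
```

Running this prints the two alarms from the 129 example (Critical CPU and Warning Network). Such a wrapper is useful when sys_diag is run from cron or a monitoring script and the exit code needs to be turned into an alert message.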

------


______

4.0 Common Command Line Usage examples : ______

./sys_diag -l Creates a LONG (detailed) configuration snapshot report in both HTML (.html) and Text (.out) formats. Without -l, the config report created has minimal system cfg details. Note that -l (as with most cmd line arguments) can be added when capturing performance data to create a more complete rpt.

./sys_diag -g Gathers performance data at the default sampling rate of 5 secs for a total duration of 5 mins, creating a color-coded HTML rpt with header/Dashboard Summary section and performance details/findings/exceptions found. Also runs the BME starting/endpoint snapshots (before/after background data gathering of vm/mp/io/netstat..). *This example will NOT create detailed configuration report sections.

NOTE: -g is meant to gather perf data without overhead, therefore only 1 second lockstat samples are taken. Use -G and/or -V for more detailed system probing (see examples and notes below). Using -v/-V with -g adds pmap/pfiles snapshots, vs. using -G to also capture Dtrace and extended lockstat probing.

** Any time that sys_diag is run with either -g or -G, the performance dashboard/summary section of the command line output is appended to the file sys_diag_perflog.out, which gets copied and archived as part of the final .tar.Z output file. **

./sys_diag -g -l -I 1 -T 600 Gathers perf data at 1 sec samples for 10 mins, also does basic BME Begin/Midpt/Endpoint sampling, and creates a long/detailed configuration report.

./sys_diag -l -g -C Creates a long configuration snapshot report, gathers basic performance data/analysis, and Cleans up (aka removes the data directory) after data directory archive compression (.tar.Z).

./sys_diag -d base_directory_path -l … (-d changes the data directory location to be created)

./sys_diag -G -l -T 600 Gathers DEEP performance & Dtrace/lockstat/pmap data at the default Interval (sampling rate of 5 secs) for 10 mins (including the std data gathering from -g).

*NOTE: this runs all Dtrace/Lockstat/Pmap probing during BME snapshot intervals (beginning snapshot #0 / midpoint #1 with -V / endpoint #2), limiting probing overhead to BEFORE/AFTER the standard data gathering (vmstat, mpstat, iostat, netstat, .. from -g). The MIDPOINT probing occurs at a known point so as not to confuse this activity with other system processing.

*Because of this, standard data collection may not start for 30+ seconds, or until the beginning snapshot (snapshot_#0) is complete. (-g snapshot_#0 activities only take a couple seconds to complete, since they do not include any Dtrace/lockstat.. beyond 1 sec samples).

./sys_diag -G -V -I 1 -T 600 Gathers DEEP, VERBOSE performance & Dtrace/lockstat/pmap data at 1 sec samples for 10 mins (uses 5 second Dtrace and lockstat snapshots, vs. 2 second probing with -G alone, in addition to the standard data gathering from -g).

./sys_diag -g -l -S (gathers perf data, runs long config rpt, and SKIPS Post-Processing and .html report generation)

NOTE: * This allows for completing the post-processing/analysis activities either on another system, or at a later time, as long as the data_directory exists (which can be extracted from the .tar.Z, then referred to via -d data_dir_path ). ** See the next example using -P -d data_path **

./sys_diag -P -d ./data_dir_path (Completes Skipped Post-Processing & .html rpt creation)


______

5.0 Capturing sys_diag command line output : ______

To capture all cmd line output (stdout/stderr) to a file use either :

script [-a] /var/tmp/sys_diag.out (then after running sys_diag, type exit)

OR

./sys_diag -g [..other options..] 1>/var/tmp/sys_diag.out 2>&1 (this will hide all command line output .. all instead going to the file)

NOTE: If the filename used for capturing command line output is /var/tmp/sys_diag.out or uses the same path as the -d base_data_directory , then that file will be automatically copied as part of the .tar.Z created.

______

6.0 Executing sys_diag via CRONTAB entries : ______

To run /var/tmp/sys_diag as a CRON entry (@9am every Friday), with data stored in (-d) /var/tmp, with all cmd line output appended to /var/tmp/sys_diag.out : (set EDITOR=vi;export EDITOR .. as root run "crontab -e" adding the following line)

0 9 * * 5 /var/tmp/sys_diag -g -d /var/tmp 1>>/var/tmp/sys_diag.out 2>&1

To run /var/tmp/sys_diag for tracking configuration and configuration file changes (-t) at midnight every day, using an input file to specify the list of files to track and report on (-f /var/tmp/sysd_tfiles), and storing the data directories for runs under the base directory (-d /var/tmp). All output from sys_diag gets saved (appended) in /var/tmp/sys_diag.out :

0 0 * * * /var/tmp/sys_diag -t -l -f /var/tmp/sysd_tfiles -d /var/tmp 1>>/var/tmp/sys_diag.out 2>&1

Note, that the following describes the first 5 fields for crontab entries :

minute (0-59), hour (0-23), day of the month (1-31), month of the year (1-12), day of the week (0-6 with 0=Sunday).

* Listing a field with either a comma or dash separated list allows multiple times/days (eg. 0 9 * * 1-5 runs Mon-Fri @9am).. & ( 0 9 * * 1,5 runs on Mon & Fri's only) *
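When adding these entries from a script rather than via "crontab -e", a common pattern is to append to the existing crontab instead of overwriting it. The snippet below is a sketch of that pattern using the weekly entry from above; it only previews the merged crontab, so nothing is installed until the output is piped to "crontab -" (as root):

```shell
#!/bin/sh
# Build the weekly sys_diag cron entry from Section 6 and preview the
# merged crontab. Nothing is installed by this preview step.
ENTRY='0 9 * * 5 /var/tmp/sys_diag -g -d /var/tmp 1>>/var/tmp/sys_diag.out 2>&1'

# `crontab -l` may fail if no crontab exists yet; that case is ignored.
{ crontab -l 2>/dev/null; echo "$ENTRY"; }
```

Once the preview looks correct, run { crontab -l 2>/dev/null; echo "$ENTRY"; } | crontab - to install it without clobbering any other entries.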

------


______

7.0 Examples for Reducing System Overhead during Data Capture : ______

When running sys_diag to capture, report, and/or analyze system utilization/ performance data [using -g | -G], several considerations must be made, and alternative approaches with sys_diag are available.

Among the key considerations :

- Whether or not Deep (-G) performance data and probing is required. This will run several Dtrace and lockstat probes of the system that incur more system overhead than the basic (-g) data gathering.

- Whether or not additional Verbosity is specified : (-v) = additional || (-V) = MAXIMUM. (For performance gathering -g|-G, -v|-V adds pmap/pfiles, mdb, lockstat, as well as extending the probing time (lockstat/Dtrace) and the # of top processes analyzed.) [ -V extends the # of top processes pmap lists from 5 (with -v) to 10 ] [ -V extends the probing interval for lockstat/Dtrace from 2 secs (with -v) to 5 secs ]

Performance data and related metrics captured/reported/analyzed are done in the following order of execution :

- Beginning, MidPoint (-V & !-x), and EndPoint Performance Snapshots [before/after background capture] (various ps process listings, kstat network metrics, mdb kernel memory, + additional pmap/pfiles top cpu listings if more Verbosity (-v|-V) is specified, + Dtrace and deeper Lockstat probing (if -G))

- Background performance Data Gathering at a specific sampling Rate (-I) and Total Duration (-T) .. including vmstat, mpstat, iostat, netstat, kstat, etc..

Alternative Considerations/ Ways to reduce system overhead :

- use -G (Dtrace deep probing) ONLY when you need the extended probing data as the overhead will be the most significant of all sys_diag options.

- Extended Verbosity (-V) will increase the number of top processes that are examined during the BME Snapshots, as well as adding pmap/pfiles output (increasing the size of the output).

- Consider using -x option to Exclude lockstat, intrstat, plockstat (DTrace usage), pfiles, and mdb from -g|-G performance data gathering, also skips Midpt BME snapshots. ** Other than running Dtrace (-G), these are the most obtrusive utilities to skip**

- Increase the Interval between samples (i.e. decrease the sampling rate) via -I. Although 1 sec samples give the most granular picture, they produce 5 times as many samples as the default (5 sec interval).

NOTE: systems which have a LOT of Storage IO devices (SANs) can generate IO overhead with 1 second (frequent) sampling during iostat data capture (for hundreds of devices..).
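The interval/duration trade-off above can be sized in advance. The sketch below (not part of sys_diag; the device count is a hypothetical SAN example) estimates how many samples -I/-T will generate, and how many iostat rows that implies:

```shell
# Back-of-the-envelope sizing: samples generated by -I/-T, and resulting
# iostat rows for a host with many LUNs (device count is hypothetical).
interval=1      # -I 1  (1 second samples)
duration=300    # -T 300 (5 minutes)
devices=200     # hypothetical SAN LUN count
samples=$(( duration / interval ))
iostat_rows=$(( samples * devices ))
echo "samples=$samples iostat_rows=$iostat_rows"
```

With the default 5 sec interval the same duration would produce only 60 samples, one fifth of the data volume.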

- Specify individual performance sub-systems for data capture/analysis (-p [c|m|i|n|p]) vs capturing and analyzing data for all subsystems (CPU|Mem|IO|Net|Process).

eg.  ./sys_diag -p cminp   selects All sub-systems
     ./sys_diag -p cn      only selects cpu & network data/analysis

- Skip Post-Processing of Performance data (using -S), which will only capture data and create a .tar.Z that can be Post-Processed later (locally or on a remote system to correlate/analyze the exceptions and generate the .html rpt and Charts later) :

./sys_diag -g -S                        (to gather performance data and Skip post-processing)
./sys_diag -P -d /data_directory_path   (Post-process/analyze/rpt the data_directory later)

- Omit additional command line options, such as Long/Detailed configuration reporting (-l) [though this only runs after performance data capture is completed], and other levels of additional verbosity (-v|-V), etc..
______

NOTE: as always, TEST ANY use of sys_diag (or any utility/program) first in a NON Production environment PRIOR to running in production so that overhead/interactions can be monitored.

(Copyright © 1999-2017 by Todd A. Jobson) Pg 18 of 33

______

8.0 Performance Data: Threshold Analysis and Baselines ______

During normal (non-Baseline) analysis of performance data (captured via standard post-processing -g|-G without using the Baseline arguments -b|-Bx), sys_diag uses built-in performance threshold PARAMETERS that are defined as global variables in the sys_diag header and can be edited (default PARAMETERS listed below).

The final output of performance analysis (after parsing data gathered by sys_diag via -g|-G) is a "dashboard" characterizing workload and performance findings. Generating this dashboard post-processes the gathered data against the thresholds noted below, color coding threshold exceptions by severity on the following scale :

RED    = Severe (Critical)
YELLOW = Caution (Warning)
GREEN  = No Significant EXCEEDED Thresholds! (neither Warning nor Critical Threshold exceptions)
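The color for a subsystem follows from the percentage of samples that exceeded thresholds, compared against that subsystem's *_PCT_YEL / *_PCT_RED parameters. A minimal sketch of that rule (not sys_diag's actual code), using the vmstat CPU defaults (3.0 / 15.0) and illustrative counts:

```shell
# Sketch of the dashboard color-coding rule: % of samples that exceeded
# thresholds vs the *_PCT_YEL / *_PCT_RED parameters (vmstat CPU defaults).
exceeded=82; total=277
awk -v e="$exceeded" -v t="$total" -v yel=3.0 -v red=15.0 'BEGIN {
    pct = 100 * e / t
    sev = (pct >= red) ? "RED" : (pct >= yel) ? "YELLOW" : "GREEN"
    printf "%.4f%% exceeded -> %s\n", pct, sev
}'
```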

(** As noted in Section 2.0 above, sys_diag generates a bitmapped single integer exit/return code which depicts these RED/YELLOW Alarms by subsystem. **)
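A calling script can unpack such a bitmapped return code with simple bitwise tests. The bit-to-subsystem mapping below is purely hypothetical (the actual layout is defined in sys_diag itself, per Section 2.0); the sketch only shows the unpacking technique:

```shell
# Sketch: decode a bitmapped exit code into per-subsystem alarms.
# NOTE: these bit positions are hypothetical, for illustration only --
# consult sys_diag (Section 2.0) for the real layout.
rc=5    # example return code
for pair in 1:CPU 2:MEM 4:IO 8:NET; do
    bit=${pair%%:*}; name=${pair##*:}
    if [ $(( rc & bit )) -ne 0 ]; then
        echo "ALARM: $name"
    fi
done
```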

______

############### Performance Data Gathering Parameters ##############
#
# NOTE: THE FOLLOWING PARAMETERS ONLY AFFECT the -g (PERF ANALYSIS)
# PERF_SECS is the total elapsed seconds for data collection and can
#   be overridden on the command line with -T
# PERF_INTERVAL is the sampling interval in seconds and can be overridden
#   via the command line interval -I

PERF_SECS=300        ## Default Total data gathering Time; 300 secs = 5 mins (-T x)
PERF_INTERVAL=5      ## Default Interval in # of seconds that samples are taken (-I x)

BASELINE_CALC=1      ## Baseline exception thresholds (1=StdDev, 2=Range)
STDDEV_MULTIPLE=2.58

## The following fields are the actual vmstat/mpstat/iostat
## Rules / Thresholds that are tested for and flagged if outside bounds

#VMSTAT_RUNQ_GT=$((num_cores))  ## vmstat exception only if RUNQ (R) > # cores
#VMSTAT_RUNQ_GT=0               ## vmstat entry flagged only if RUNQ (R) field > 0 ## No RunQ allowed
VMSTAT_RUNQ_GT=$((num_cpus))    ## vmstat exception only if RUNQ (R) > # physical cpus

VMSTAT_BLKD_GT=0         ## vmstat entry flagged only if Kthr_B field > X
VMSTAT_WAIT_GT=0         ## vmstat entry flagged only if Kthr_W field > X
VMSTAT_SCANRT_GT=0       ## rate of system scanning for free mem pages
VMSTAT_PCTSYS_GT=40      ## overall system cpu % SYS > X%
VMSTAT_PCTSYSIDLE_LT=15  ## overall system cpu (%SYS > %USR) & (%IDLE < X%)
VMSTAT_PCTIDLE_LT=5      ## overall system cpu % IDLE < X%
VMSTAT_PCT_YEL=3.0       ## Vmstat CPU Warning Threshold > X% of samples
VMSTAT_PCT_RED=15.0      ## Vmstat CPU Critical Threshold > X% of samples

MEM_PCT_YEL=1.0          ## Vmstat MEM Warning Threshold > X% of samples
MEM_PCT_RED=15.0         ## Vmstat MEM Critical Threshold > X% of samples
MEM_PCT_MIN=10           ## % of physical RAM avail Minimum Threshold < X%
SWAP_PCT_MIN=20          ## % Vmem:SWAP avail Minimum Threshold < X%

MPSTAT_ICSW_GT=90        ## involuntary context switches per cpu entry > X & (if %Sys > MPSTAT_SYS)
MPSTAT_ICSW_IGT=300      ## involuntary context switches per cpu entry > X & (if Idle < PCTIDLE_ILT)
MPSTAT_SMTX_GT=200       ## shared mutex spins per cpu entry > X & (if %Sys > MPSTAT_SYS)
MPSTAT_SYS_GT=40         ## mpstat cpu % SYS (kernel) > X
MPSTAT_PCTWT_GT=0        ## mpstat cpu waiting > X% of its time
MPSTAT_PCTIDLE_LT=5      ## mpstat cpu entry flagged only if PCTIDLE < X
MPSTAT_PCTIDLE_ILT=10    ## mpstat cpu entry flagged if (ICSW > ICSW_IGT) & (PCTIDLE_ILT < X)
MPSTAT_PCT_YEL=4.0       ## Mpstat Warning Threshold > X% of samples
MPSTAT_PCT_RED=15.0      ## Mpstat Critical Threshold > X% of samples

IOSTAT_WAIT_GT=0         ## iostat avg # transactions waiting on device queue
IOSTAT_ASVCTM_GT=19      ## iostat avg device time to svc active rqst (asvc_t) > X ms
IOSTAT_WSVCTM_GE=1       ## iostat avg device time rqst is in wait queue (wsvc_t) >= X ms
IOSTAT_PCTWT_GT=0        ## transactions waiting > X% of the time on device (%w)
IOSTAT_PCTBSY_GT=95      ## device busy % (%b) > X% of its time
IOSTAT_PCT_YEL=3.0       ## Iostat Warning Threshold > X% of samples
IOSTAT_PCT_RED=15.0      ## Iostat Critical Threshold > X% of samples

NETSTAT_RX_GT=19750      ## network interface incoming packets per interval
NETSTAT_RX_ERR_GT=0      ## network interface incoming packet errors per interval
NETSTAT_TX_GT=27500      ## network interface outgoing packets per interval
NETSTAT_TX_ERR_GT=0      ## network interface outgoing packet errors per interval
NETSTAT_COLL_GT=0        ## network interface # collisions per interval
NETSTAT_SQ_GT=0          ## network TCP Port Send Queue Packets Undrained
NETSTAT_RQ_GT=0          ## network TCP Port Recv Queue Packets Undrained
NETSTAT_IPKTS_GT=20000   ## network interface incoming packets per interval
NETSTAT_IDROPS_GT=0      ## network interface incoming packet errors per interval
NETSTAT_OPKTS_GT=27500   ## network interface outgoing packets per interval
NETSTAT_ODROPS_GT=0      ## network interface outgoing packet errors per interval
NETSTAT_PCT_YEL=4.0      ## Netstat Warning Threshold > X% of samples
NETSTAT_PCT_RED=15.0     ## Netstat Critical Threshold > X% of samples
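A minimal sketch of how one vmstat sample can be tested against a few of the VMSTAT_* thresholds above. This is not sys_diag's actual parser: the input uses a simplified four-column layout (r, us, sy, id only), and the column positions are assumptions for illustration:

```shell
# Sketch (NOT sys_diag's parser): flag simplified vmstat-style samples
# (columns: r us sy id) against VMSTAT_RUNQ_GT / PCTSYS_GT / PCTIDLE_LT.
printf '17 0 99 0\n1 40 20 40\n' |
awk -v runq_gt=2 -v pctsys_gt=40 -v pctidle_lt=5 \
    '{ r=$1; us=$2; sy=$3; id=$4
       if (r > runq_gt || sy > pctsys_gt || id < pctidle_lt)
           print "EXCEEDED:", $0
       else
           print "ok:", $0 }'
```

The first sample trips the run-queue and %SYS/%IDLE rules; the second passes.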


Performance Analysis of Thresholds and Reporting within each Sub-System :

When the default (non-Baseline) PARAMETERS (noted above) are used, the exception Rule each report/analysis section tests against is listed in that section's header, with the parameters shown as the values being tested.

NOTE: This can be found within the .html final report by clicking the link at the bottom of each Dashboard section labeled “Analysis”, such as “*VMSTAT CPU Analysis*”

Sample report snippets follow : ______

**** CPU (VMSTAT) Findings / EXCEEDED Thresholds!****

VMSTAT file: /var/tmp/sysd_Einstein-S11.3-Zone_170110_2022/sysd_vm_*.out

[entries where (RunQ > 1) or (Kthr_Blocked > 0) or (Kthr_Wait > 0) or (cpu_idle < 5) or ((%Sys > %Usr) and (cpu_idle < 15))]

______

TOTAL CPU AVGS : RUNQ= 1.5 : BThr= 0.0 : USR= 4.4 : SYS= 22.0 : IDLE= 73.6 *( 26.4% Total CPU USED)*
PEAK CPU HWMs  : RUNQ= 17  : BThr= 0   : USR= 17  : SYS= 99   : IDLE= 0    *(100.0% Total CPU USED)*
______

* NOTE: 29.6029 % : 82 of 277 VMSTAT CPU entries EXCEEDED Thresholds! *

VMSTAT (top 100) WARNINGs are sorted by %SYS && %USR cpu (Last Column: NonZero sample #)

 kthr      memory            page            disk          faults      cpu
 r b w   swap   free  re  mf pi po fr de sr cd -- -- --   in   sy  cs us sy id
17 0 0 3915852  88540  27  46  0  0  0  0  0 35  0  0  0  337  506 12  0 99  0   68
12 0 0 4203424 371168 175 392  0  0  0  0  0 23  0  0  0  990 5837 32  3 97  0   72
 2 0 0 4203424 371168 140 197  0  0  0  0  0  0  0  0  0 1699 1025 45  1 96  3   73

... (exceptions listed) ...

Metrics Exceeding Threshold [sorted by exception count] :

CPU:VMSTAT:n:No_Issues:195:70
CPU:VMSTAT:r:Run_Queue_Threads:80:28
CPU:VMSTAT:sgtu:System_Gt_UserCpu:7:2
CPU:VMSTAT:id:Idle_Pct_Cpu:5:1
CPU:VMSTAT:w:Waiting_Kernel_Threads:0:0
CPU:VMSTAT:b:Blocked_Kernel_Threads:0:0

______

**** MEMORY (VMSTAT) Findings ****

VMSTAT file: /var/tmp/sysd_Einstein-S11.3-Zone_170110_2022/sysd_vm_*.out

[entries where (scan_rate (sr) > 0) or ( free_swap (swap) < 838860 K [ < 20 % total_swap (4194300 K) ] ) or ( free_mem < 307200 K [ < 10 % total_ram (3072000 K) ] )]

______

TOTAL MEM AVGS : SR= 0.0 : SWAP_free= 4158689 K : FREE_RAM= 308536 K *( 90.0% Total MEM USED)*
PEAK MEM Usage : SR= 0   : SWAP_free= 3894204 K : FREE_RAM= 40216 K  *( 98.7% Total MEM USED)*
______
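The average "% Total MEM USED" figure follows directly from the total_ram value in the rule above (3072000 K) and the average FREE_RAM. A one-liner reproducing the arithmetic:

```shell
# Reproduce the average "% Total MEM USED" from total_ram (3072000 K)
# and the average FREE_RAM (308536 K) shown in this report section.
awk 'BEGIN { total=3072000; free=308536
             printf "%.1f%% Total MEM USED\n", 100 * (total - free) / total }'
```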

* NOTE: 42.2383 % : 117 of 277 VMSTAT MEMORY entries EXCEEDED Thresholds! *

... (exceptions listed) ...

______

**** CPU (MPSTAT) Findings / EXCEEDED Thresholds!****

MPSTAT file: /var/tmp/sysd_Einstein-S11.3-Zone_170110_2022/sysd_mp_*.out

[entries where Cpu ( ((icsw > 90) || (smtx > 200)) && (%Sys > 40)) or
 ((icsw > 300) && (%Idle < 10)) or (%Wait > 0) or (%Idle < 5)]

______

CPU MP AVGS  : Wt= 0 : Xcal= 11   : csw= 216 : icsw= 26 : migr= 13 : smtx= 26  : syscl= 5603
PEAK MP HWMs : Wt= 0 : Xcal= 1459 : csw= 535 : icsw= 97 : migr= 63 : smtx= 365 : syscl= 14073
______

... (exceptions listed) ...

**** IOSTAT Findings / EXCEEDING Thresholds!****

IOSTAT file: /var/tmp/sysd_Einstein-S11.3-Zone_170110_2022/sysd_io_*.out

[entries where device (transacts_Waiting (wait) > 0) or (wsvc_time_ms >= 1) or (asvc_time_ms > 19) or (%W > 0) or (%Busy > 95)]

* NOTE: 27.1321 % : 1947 of 7176 IOSTAT (non-zero) entries EXCEEDED Thresholds! *

IOSTAT (top 100) WARNINGs reflect the slowest device entries (Last Column: NonZero Sample#)

                             extended device statistics
  r/s    w/s   kr/s    kw/s  wait   actv wsvc_t asvc_t  %w  %b device

 69.0  178.0  560.0 21544.0   0.0   67.4    0.0  273.0   0  89 ssd45  1402
 80.0  163.0  648.0 17476.0   0.0   62.6    0.0  257.5   0  90 ssd50  1404
 87.0  162.0  704.0 23152.0   0.0   63.3    0.0  254.0   0  92 ssd44  1401
 82.0  159.0  664.0 15660.0   0.0   56.8    0.0  235.7   0  91 ssd49  1403
 44.0 1483.2  360.0 14700.9  97.6  247.3   63.9  162.0  84 100 ssd45  1332
 76.0  736.3  608.2 81354.8  16.7  130.9   20.5  161.2  13  92 ssd44   205
 60.0 1246.8  479.9 11636.0  98.2  209.8   75.2  160.5  68  98 ssd50  1768
 37.0 1532.2  304.0 14706.0 157.2  250.2  100.2  159.4  88 100 ssd44  1331
 55.0 1299.8  439.9 12197.9 149.3  213.2  110.2  157.4  70  99 ssd49  1767

... (exceptions listed) ...

Metrics Exceeding Threshold [sorted by exception count] :

IOS:IOSTAT:n:No_Issues:5229:72
IOS:IOSTAT:asvc_t:Actv_Svc_Time_ms:1922:26
IOS:IOSTAT:pctw:%_Wait_Device:1511:21
IOS:IOSTAT:wait:Transactions_Waiting:1491:20
IOS:IOSTAT:wsvc_t:Wait_Svc_Time_ms:1425:19
IOS:IOSTAT:pb:%_Busy_Device:123:1
______

**** NETWORK Findings (Netstat / Kstat) : ****

[entries where interface (RX_Pkts > 19750) or (Rx_Pkt_Errs > 0) or (Tx_Pkts > 27500) or (Tx_Pkt_Errs > 0) or (Collisions > 0)]

... (exceptions listed) ...


______

8.1 PERFORMANCE Data BASELINE (snapshot) Generation : ______

Baselining system utilization offers the capability to create a system performance snapshot, characterizing workload and identifying key performance metric characteristics including : Ranges (LWM / HWM), Mean (Avgs), and Standard Deviations.

To CREATE a baseline of system utilization and performance metrics, add the -b command line argument when capturing / gathering performance data as :

root@/export/home# ./sys_diag -b -g ... or ./sys_diag -b -G ...

To USE and compare current system performance data gathered with a past/prior system baseline taken, you would add the -Bx command line argument as :

root@/export/home# ./sys_diag -B1 -g ... or ./sys_diag -B2 -G ...

(-B1 denotes using the Baseline file to calculate thresholds based upon the Range of HWM metrics, while -B2 denotes using the Baseline file to calculate thresholds based upon Range AND/OR Standard Deviation metric exceptions.)
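A sketch of how a StdDev-based (-B2 style) exception test works, using BASELINE_CALC=1 with the default STDDEV_MULTIPLE=2.58 and the VMSTAT_C_R values from the sample sys_diag_BASELINE.cfg in Section 11 (Avg 0.48, StdDev 0.70). This is illustrative only, not sys_diag's actual baseline code:

```shell
# Sketch of a StdDev baseline exception test: a sample is an exception
# when it exceeds Avg + STDDEV_MULTIPLE * StdDev (values from the sample
# sys_diag_BASELINE.cfg row VMSTAT_C_R: Avg=0.48, StdDev=0.70).
printf '1\n5\n' |
awk -v avg=0.48 -v sd=0.70 -v mult=2.58 '
    BEGIN { limit = avg + mult * sd }      # limit = 0.48 + 2.58*0.70 = 2.286
    { print $1, ($1 > limit) ? "EXCEPTION" : "ok" }'
```

A run-queue sample of 5 (the baseline's own Max) lands well above the 2.286 limit and would be flagged; a sample of 1 would not.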

NOTE: v.8.3 does NOT yet provide baseline data per individual Storage IO or Network Devices, instead using the Performance PARAMETERS noted above for identifying these exceptions.

** A sample BASELINE file created with -b (sys_diag_BASELINE.cfg) is listed in section 11 below. **

------

______

9.0 Creating Graphs of Performance Data ______

Version 8.3g of sys_diag offers built-in dynamic HTML generation, with both JavaScript-embedded dashboard charts and stand-alone .gr.html files for each individual chart.

This is in addition to the existing export of background performance data to text-based .gr.txt files (for importing / parsing externally). With these .gr.txt files, any need to generate custom graphs of vmstat, iostat, and/or netstat data is easily met via OpenOffice / Excel :

From OpenOffice (download free at openoffice.org): Insert -> Sheet from file (delimited by space), then hide any columns that you don't want graphed, following the wizard for graph choices/options.

For Excel: File -> Open (type *.txt) -> Text Import Wizard (Delimited -> Space); at step 3 of 3, mark each column as General OR "Do not import" to delete un-needed columns (columns can also be excluded after a chart is created via Select Data -> Legend Entries: select the column and click Remove). Next, from within the spreadsheet, first try auto-creation without highlighting data: click Insert-> and select the chart type (eg. Line with markers). If the chart looks bizarre, first try changing the chart type (R-click); if it still looks strange, go back to the line chart and use Edit Data to exclude columns with outlier datapoints. The other option is to undo and try again, but first highlight the rows/columns to graph before selecting Insert -> Line Chart with markers.
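Rather than hiding or deleting columns inside the spreadsheet, the space-delimited .gr.txt file can be pre-trimmed before import. A small sketch with awk (the 4-column input and the chosen column numbers are illustrative; pick real column numbers from your file's header):

```shell
# Keep only the columns you intend to graph from a space-delimited .gr.txt
# (illustrative 4-column input; real files have more columns).
printf 'sample us sy id\n1 4 22 74\n2 5 21 74\n' | awk '{ print $1, $4 }'
```

The trimmed output imports cleanly with the same space-delimited wizard steps.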

NOTE: For iostat _io_ .gr.txt file, you must highlight/select the data to graph (all data is sorted in sections by device/c11 then c12/chron entry#)

The graph filenames will resemble the following (assuming hostname “Newton-S11.3x6”) :

- sysd_io_Newton-S11.3x6_170110_002525.out.gr.txt : Master iostat txt datafile (sorted by device)
- sysd_iodev_c1d0.gr.txt : Individual IO device c1d0 iostat data (.txt)
- sysd_iost_line_Newton-S11.3x6.gr.html : HTML Line Chart of SLOWEST iostat device
- sysd_iostall_pie_Newton-S11.3x6.gr.html : HTML Pie Chart for ALL IO Devices (iostat)
- sysd_net1_Newton-S11.3x6.gr.txt : Netstat text data for NIC1
- sysd_net2_Newton-S11.3x6.gr.txt : Netstat text data for NIC2
- sysd_vm_Newton-S11.3x6_170110_002524.out.gr.txt : Master vmstat txt datafile
- sysd_vmcpu_line_Newton-S11.3x6.gr.html
- sysd_vmcpu_pie_Newton-S11.3x6.gr.html
- sysd_vmmem_line_Newton-S11.3x6.gr.html
- sysd_vmmem_pie_Newton-S11.3x6.gr.html

______

10.0 sys_diag DIRECTORIES and DATA FILE Descriptions : ______

The following list is a description of the files you will encounter within the default base directory that sys_diag uses for its data files (or identified with -d) :

[NOTE: "socrates" is the hostname of the system used to generate the following filenames. * most files use the following naming convention : sysd_*_hostname_YYMMDD_HHMM ]

# ls ./sys*

-rwxr-xr-x  1 root  root  186900 May 11 03:44 sys_diag
drwxr-xr-x  1 root  root    2560 May 11 03:44 sysd_socrates_070511_0355
drwxr-xr-x  2 root  root    1024 May 11 03:56 sysd_cfg_mgt

The listing above shows the sys_diag script itself, as well as the two directories that are created when it is run with the -A (or -t) options. The sysd_hostname_YYMMDD_HHMM directory is the data directory where all the data files are stored for reporting and performance data capture. The last directory listed, sysd_cfg_mgt, is only created/used if you run with either -t or -A to initiate tracking of system configuration changes.

The details and descriptions of the contents of both directories are listed below :

# ls ./sysd_socrates_070511_0355/ : SYS_DIAG DATA DIRECTORY (sysd_hostname_YYMMDD_HHMM)

Filename                                          Arg      Description
______

README_sys_diag.txt                               *        The self-extracting README
sys_diag                                          *        A copy of the sys_diag script used
sys_diag.out                                      -        sys_diag command line output (if captured)
sys_diag_perflog.out                              -g|-G    Performance Summary cmdline output (history)
sys_diag_BASELINE.cfg                             -b|-B    Performance BASELINE file (created with -b, and used with -Bx, both require -g|-G)
sysd_etcsystem_socrates_070511_035503.out         -l|-c    /etc/system kernel parameters/tunables file
sysd_socrates_070511_0355.out.html                -g|-G    ****** FINAL .html Report ******
sysd_socrates_070511_0355.out.dash.html           -g|-G    Utilization/Performance Analysis Dashboard .html piece
sysd_socrates_070511_0355.out                     *        sys_diag main .txt output file (for .html / .ps)
sysd_cputrk0_socrates_070511_035504.out           -G       Cputrack top PID data (TLB_misses & % FP) (snap #0)
sysd_cputrk1_socrates_070511_035604.out           -G       Cputrack top PID data (TLB_misses & % FP) (snap #1)
sysd_cputrk2_socrates_070511_035704.out           -G       Cputrack top PID data (TLB_misses & % FP) (snap #2)
sysd_dpio0_socrates_070511_035504.out             -G       Dtrace : IOsnoop for top pids (snap #0)
sysd_dpio1_socrates_070511_035604.out             -G       Dtrace : IOsnoop for top pids (snap #1)
sysd_dpio2_socrates_070511_035704.out             -G       Dtrace : IOsnoop for top pids (snap #2)
sysd_diow0_socrates_070511_035504.out             -G       Dtrace : File IO/IO waits (snap #0)
sysd_diow1_socrates_070511_035604.out             -G       Dtrace : File IO/IO waits (snap #1)
sysd_diow2_socrates_070511_035704.out             -G       Dtrace : File IO/IO waits (snap #2)
sysd_dmpc0_socrates_070511_035504.out             -G       Dtrace : Top ICSW/SMTX/XCAL (#0, if -V||avg_icsw > HWM)
sysd_dmpc1_socrates_070511_035604.out             -G       Dtrace : Top ICSW/SMTX/XCAL (#1, if -V||avg_icsw > HWM)
sysd_dmpc2_socrates_070511_035704.out             -G       Dtrace : Top ICSW/SMTX/XCAL (#2, if -V||avg_icsw > HWM)
sysd_dsyscall_counts0_socrates_070511_035504.out  -G       Dtrace syscall counts by call (snap #0)
sysd_dsyscall_counts1_socrates_070511_035604.out  -G       Dtrace syscall counts by call (snap #1)
sysd_dsyscall_counts2_socrates_070511_035704.out  -G       Dtrace syscall counts by call (snap #2)
sysd_dcalls_by_procs0_socrates_070511_035504.out  -G       Dtrace process syscalls (snap #0)
sysd_dcalls_by_procs1_socrates_070511_035604.out  -G       Dtrace process syscalls (snap #1)
sysd_dcalls_by_procs2_socrates_070511_035704.out  -G       Dtrace process syscalls (snap #2)
sysd_dintrtm0_socrates_070511_035504.out          -G       Dtrace Interrupt times (snap #0)
sysd_dintrtm1_socrates_070511_035604.out          -G       Dtrace Interrupt times (snap #1)
sysd_dintrtm2_socrates_070511_035704.out          -G       Dtrace Interrupt times (snap #2)
sysd_dsdtcnt0_socrates_070511_035504.out          -G       Dtrace sdt_ counts (snap #0)
sysd_dsdtcnt1_socrates_070511_035604.out          -G       Dtrace sdt_ counts (snap #1)
sysd_dsdtcnt2_socrates_070511_035704.out          -G       Dtrace sdt_ counts (snap #2)
sysd_dsinfo_by_procs0_socrates_070511_035504.out  -G       Dtrace process sysinfo counts (snap #0)
sysd_dsinfo_by_procs1_socrates_070511_035604.out  -G       Dtrace process sysinfo counts (snap #1)
sysd_dsinfo_by_procs2_socrates_070511_035704.out  -G       Dtrace process sysinfo counts (snap #2)
sysd_dtcp_rx0_socrates_070511_035504.out          -G       Dtrace process tcp reads (snap #0)
sysd_dtcp_rx1_socrates_070511_035604.out          -G       Dtrace process tcp reads (snap #1)
sysd_dtcp_rx2_socrates_070511_035704.out          -G       Dtrace process tcp reads (snap #2)
sysd_dtcp_tx0_socrates_070511_035504.out          -G       Dtrace process tcp writes (snap #0)
sysd_dtcp_tx1_socrates_070511_035604.out          -G       Dtrace process tcp writes (snap #1)
sysd_dtcp_tx2_socrates_070511_035704.out          -G       Dtrace process tcp writes (snap #2)
sysd_dR_by_procs0_socrates_070511_035504.out      -G       Dtrace process read calls (snap #0)
sysd_dR_by_procs1_socrates_070511_035604.out      -G       Dtrace process read calls (snap #1)
sysd_dR_by_procs2_socrates_070511_035704.out      -G       Dtrace process read calls (snap #2)
sysd_dW_by_procs0_socrates_070511_035504.out      -G       Dtrace process write calls (snap #0)
sysd_dW_by_procs1_socrates_070511_035604.out      -G       Dtrace process write calls (snap #1)
sysd_dW_by_procs2_socrates_070511_035704.out      -G       Dtrace process write calls (snap #2)
sysd_ifcfg_socrates_070511_0356.out               -n|-l|-g Network ifconfig -a output for host socrates
sysd_knetb_hme0_socrates_070511_035522.out        -g|-G    Kstat output beginning snapshot for hme0
sysd_knetb_lo0_socrates_070511_035522.out         -g|-G    Kstat output beginning snapshot for lo0
sysd_knete_hme0_socrates_070511_035721.out        -g|-G    Kstat output ending snapshot for hme0
sysd_knete_lo0_socrates_070511_035721.out         -g|-G    Kstat output ending snapshot for lo0
.... etc.. for all network cards ...
sysd_io_socrates.out                              -g|-G    iostat data captured (raw format)
sysd_io_socrates.out.gr.txt                       -g|-G    iostat data captured (sorted text graph format)
sysd_iox_socrates.out                             -g|-G    iostat exceptions beyond thresholds
sysd_ioavg_socrates.out                           -g|-G    iostat device avgs & peaks from post-processing
sysd_iocavg_socrates.out                          -g|-G    iostat controller averages
sysd_iodev_c1d0.gr.txt                            -g|-G    iostat single device data (text graph format)
.... etc.. for all Storage Devices ...
sysd_iost_line_Newton-S11.3x6.gr.html             -g|-G    iostat slowest device LINE Chart (html format)
sysd_iost_pie_Newton-S11.3x6.gr.html              -g|-G    iostat slowest device PIE Chart (html format)
sysd_iostall_pie_Newton-S11.3x6.gr.html           -g|-G    iostat PIE Chart for ALL devices (html format)
sysd_lockstat_files.out                           -g|-G    Lockstat syntax and output file list
sysd_lI0_socrates_070511_035504.out               -g|-G    Lockstat -I -W -s (snap #0)
sysd_lI1_socrates_070511_035604.out               -g|-G    Lockstat -I -W -s (snap #1)
sysd_lI2_socrates_070511_035722.out               -g|-G    Lockstat -I -W -s (snap #2)
sysd_lA0_socrates_070511_035513.out               -g|-G    Lockstat -A -D (snap #0)
sysd_lA1_socrates_070511_035613.out               -g|-G    Lockstat -A -D (snap #1)
sysd_lA2_socrates_070511_035730.out               -g|-G    Lockstat -A -D (snap #2)
sysd_ls0_socrates_070511_035504.out               -G       Lockstat -s -D (snap #0)
sysd_ls1_socrates_070511_035604.out               -G       Lockstat -s -D (snap #1)
sysd_ls2_socrates_070511_035722.out               -G       Lockstat -s -D (snap #2)
sysd_lP0_socrates_070511_035513.out               -G       Lockstat -AP -D (snap #0)
sysd_lP1_socrates_070511_035613.out               -G       Lockstat -AP -D (snap #1)
sysd_lP2_socrates_070511_035730.out               -G       Lockstat -AP -D (snap #2)
sysd_mdb0_socrates_070511_035504.out              -g|-G && -v|V  mdb kernel memory profile (snapshot #0)
sysd_mdb1_socrates_070511_035604.out              -g|-G && -v|V  mdb kernel memory profile (snapshot #1)
sysd_mdb2_socrates_070511_035722.out              -g|-G && -v|V  mdb kernel memory profile (snapshot #2)
sysd_memx_socrates.out                            -g|-G    vmstat memory exceptions
sysd_mp_socrates.out                              -g|-G    mpstat data captured (raw format)
sysd_mpx_socrates.out                             -g|-G    mpstat exceptions beyond thresholds
sysd_net1_socrates_070511_035522.out              -g|-G    NIC1's netstat output file (NIC1= lo0)
sysd_net1_socrates.gr.txt                         -g|-G    NIC1's graph-reformatted netstat .txt output file
sysd_net1x_socrates.out                           -g|-G    NIC1 netstat traffic (exceptions) beyond thresholds
.... etc.. for all network cards ...
sysd_netstata_socrates_070511_035608.out          -n|-l|-g netstat -a output
sysd_netstat0_socrates_070511_035504.out          -g|-G    netstat -i -a stats summary (snapshot #0)
sysd_netstat1_socrates_070511_035604.out          -g|-G    netstat -i -a stats summary (snapshot #1)
sysd_netstat2_socrates_070511_035722.out          -g|-G    netstat -i -a stats summary (snapshot #2)
sysd_netavg1_socrates.out                         -g|-G    Network average/Peak calculations output file #1
sysd_netavg2_socrates.out                         -g|-G    Network average/Peak calculations output file #2
sysd_pmap0_socrates_070511_035504.out             -g -v    Top x PID details (pmap, pfiles, ptree) (snap #0)
sysd_pmap1_socrates_070511_035604.out             -g -v    Top x PID details (pmap, pfiles, ptree) (snap #1)
sysd_pmap2_socrates_070511_035704.out             -g -v    Top x PID details (pmap, pfiles, ptree) (snap #2)
sysd_psc0_socrates_070511_035504.out              -g|-G    Ps sorted by cpu (snap #0)
sysd_psc1_socrates_070511_035604.out              -g|-G    Ps sorted by cpu (snap #1)
sysd_psc2_socrates_070511_035721.out              -g|-G    Ps sorted by cpu (snap #2)
sysd_psm0_socrates_070511_035504.out              -g|-G    Ps sorted by mem (snap #0)
sysd_psm1_socrates_070511_035604.out              -g|-G    Ps sorted by mem (snap #1)
sysd_psm2_socrates_070511_035721.out              -g|-G    Ps sorted by mem (snap #2)
sysd_PSc_socrates_070511_035543.out               -g|-G    Ps sorted by %cpu
sysd_PSl_socrates_070511_035543.out               -g|-G    Ps sorted by #LWP
sysd_PSm_socrates_070511_035543.out               -g|-G    Ps sorted by %mem
sysd_PStl_socrates_070511_035543.out              -g|-G    Ps sorted by top LWP
sysd_PSzn_socrates_070511_035543.out              -g|-G    Ps sorted by Zone / LWP
sysd_pkg_socrates_070511_035503.out               -l       pkginfo -l (listing)
sysd_rcap_Newton-S11.3x6_170112_235340.out        -l       Non-global zone rcapstat output
sysd_snoop_socrates_070511_035522.out             -g &(-n|-V)   network snoop output
sysd_swapl_socrates_070511_035622.out             -l|(-g|-G)    Physical Swap (swap -l) and phys RAM output
sysd_tcp0_Newton-S11.3x6_170112_235347.out        -G | -g & -v|V  tcpstat (snapshot #0)
sysd_tcp2_Newton-S11.3x6_170112_235531.out        -G | -g & -v|V  tcpstat (snapshot #2)
sysd_vm_socrates.out                              -g|-G    vmstat data captured (raw format)
sysd_vm_socrates.gr.txt                           -g|-G    vmstat reformatted graph datafile
sysd_vmx_socrates.out                             -g|-G    vmstat cpu exceptions beyond thresholds
sysd_vmavg_socrates.out                           -g|-G    vmstat averages and Peak entries
sysd_vmcpu_line_Newton-S11.3x6.gr.html            -g|-G    vmstat CPU LINE Chart (html standalone chart)
sysd_vmcpu_pie_Newton-S11.3x6.gr.html             -g|-G    vmstat CPU PIE Chart (html standalone graph)
sysd_vmmem_line_Newton-S11.3x6.gr.html            -g|-G    vmstat MEMORY LINE Chart (html graph)
sysd_vmmem_pie_Newton-S11.3x6.gr.html             -g|-G    vmstat MEMORY PIE Chart (html graph)
sysd_warn_socrates_070511_035503.out              -l|-g|-G Warning Messages from dmesg/messages/syslog...
sysd_error_socrates_070511_035503.out             -l|-g|-G Error Messages from dmesg/messages/syslog...
sysd_zio0_Newton-S11.3x6_170112_235347.out        -g|-G    zpool iostat -v output (snap #0)
sysd_zio2_Newton-S11.3x6_170112_235531.out        -g|-G    zpool iostat -v output (snap #2)

socrates_change_log.out -t|-A Configuration Tracking change log copy

------

** NOTE: As of sys_diag v8.3g, all graphs also exist as stand-alone .gr.html files !

However, the vmstat, iostat, and netstat .gr.txt files above can also easily be imported using StarOffice/OpenOffice or Excel to generate GRAPHS.

For OpenOffice (download free at openoffice.org): Insert -> Sheet from file (delimited by space), then hide any columns that you don't want graphed, following the wizard for graph choices/options.

For Excel, File->Open (type *.txt) -> Text Import Wizard (Delimited-> Space), then after import, delete un-needed columns.

------

**Configuration Management / Tracking Directory**

# ls ./sysd_cfg_mgt

Filename              Description
______
cfgadm_last.cfg       Last captured /usr/sbin/cfgadm output
eeprom_last.cfg       Last captured /usr/sbin/eeprom output
metastat_last.cfg     Last captured /usr/sbin/metastat output
metadb_last.cfg       Last captured /usr/sbin/metadb output
psrinfo_last.cfg      Last captured /usr/sbin/psrinfo output
prtconf_last.cfg      Last captured /usr/sbin/prtconf output
prtdiag_last.cfg      Last captured /usr/platform/*/sbin/prtdiag -v
sysdef_last.cfg       Last captured /usr/sbin/sysdef -D output

F_hosts_last.cfg         Last captured FILE: /etc/hosts
F_mnttab_last.cfg        Last captured FILE: /etc/mnttab
F_nsswitch_last.cfg      Last captured FILE: /etc/nsswitch.conf
F_resolve_last.cfg       Last captured FILE: /etc/resolv.conf
F_syslog_last.cfg        Last captured FILE: /etc/syslog.conf
F_system_last.cfg        Last captured FILE: /etc/system
socrates_change_log.out  Change log of past/current configuration changes

070511_0356_cfgadm.cfg      Date stamped historical cmd output files
070511_0356_df.cfg
070511_0356_eeprom.cfg
070511_0356_metastat.cfg
070511_0356_metadb.cfg
070511_0356_psrinfo.cfg
070511_0356_prtconf.cfg
070511_0356_prtdiag.cfg
070511_0356_sysdef.cfg

070511_0356_F_hosts.cfg     Date stamped historical configuration FILES
070511_0356_F_mnttab.cfg
070511_0356_F_nsswitch.cfg
070511_0356_F_resolve.cfg
070511_0356_F_syslog.cfg
070511_0356_F_system.cfg

** NOTE: If the -f input_file option is used with -t, then all files listed within the input_file (as one absolute file path per line) will also be tracked for changes.
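A hypothetical example of such an input_file (the file paths and the /tmp location are examples only, not defaults of sys_diag):

```shell
# Build a hypothetical -f input_file: one absolute file path per line.
cat > /tmp/sysd_track_files.txt <<'EOF'
/etc/ssh/sshd_config
/etc/vfstab
EOF
cat /tmp/sysd_track_files.txt
```

It would then be passed along with -t, e.g. ./sys_diag -t -f /tmp/sysd_track_files.txt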

------


______

11.0 A Sample Performance Baseline File : ______

Sample BASELINE file created with -b is listed below (sys_diag_BASELINE.cfg) :

--- Begin -- sys_diag_BASELINE.cfg --

100223_160756 Tue Feb 23 16:07:56 EST 2010
PERF: 2 VERBOSE: 2
sys_diag (v 7.08.2): -G -V -l -b -I1 -T180
Total: 100 processes, 309 lwps, load averages: 0.56, 0.41, 0.29
Physical_CPUs: 1 * AMD Turion(tm) X2 Dual-Core Mobile RM-70
VCPUs_ONLINE: 2 OFFLINE: 0 NO_INTR: 0
Memory size: 2815 Megabytes

Page Summary        Pages     MB   %Tot
------
Kernel             112720    440    16%
ZFS File Data      147767    577    21%
Anon               125112    488    17%
Exec and libs        3621     14     1%
Page cache          23504     91     3%
Free (cachelist)    30681    119     4%
Free (freelist)    275023   1074    38%

INTERVAL: 1 sec(s) samples   TOTAL DURATION: 180 sec(s)
PERF SNAPSHOTS :
*BEGIN_SNAP: Tue Feb 23 16:07:57 EST 2010
*MID_SNAP:   Tue Feb 23 16:13:03 EST 2010
*END_SNAP:   Tue Feb 23 16:16:11 EST 2010
PERF DATA GATHERING :
START : Tue Feb 23 16:11:33 EST 2010
STOP  : Tue Feb 23 16:16:11 EST 2010

Utility_Field Min Max Avg StdDev

VMSTAT_C_R        0        5       0.48     0.70
VMSTAT_C_B        0        0       0.00     0.00
VMSTAT_C_W        0        0       0.00     0.00
VMSTAT_M_SW 2254884  2314700 2293126.73 25451.25
VMSTAT_M_FR 1181300  1232296 1212247.53 22054.37
VMSTAT_M_SR       0        0       0.00     0.00
VMSTAT_C_IN     477     6374     806.70   761.75
VMSTAT_C_SC    1078   130836    4473.91 14000.83
VMSTAT_C_CS     320     4335     485.23   390.39
VMSTAT_C_US       0       59       5.21    12.17
VMSTAT_C_SY       0       47       2.35     5.66
VMSTAT_C_ID       8      100      92.28    16.60
MPSTAT_C_XC       0     5289      74.78   451.61
MPSTAT_C_IN     116     5759     402.62   493.97
MPSTAT_C_IT      13      420     115.22    75.33
MPSTAT_C_CS     144     2735     242.61   206.03
MPSTAT_C_IC       0      746       8.50    42.15
MPSTAT_C_MI       5      111      16.84     9.83
MPSTAT_C_SM       0       36       3.31     4.69
MPSTAT_C_SC     347   129261    2222.93  8997.95
MPSTAT_C_WT       0        0       0.00     0.00

------
TOT_SIO_Thrupt: 2 reads | 426 writes | 153 KB_read | 2258 KB_write | 15 entries

TOT_NET_Thrupt: 11 rx_KB | 2 tx_KB | 156 rx_Pkts | 21 tx_Pkts

NETSTAT_CONNs: 4 ESTABLISHED

--- END -- sys_diag_BASELINE.cfg ---

------


______

12.0 Sample Command Line Output ______

The following sample output was captured on an x86 server running VirtualBox, which hosted a VM with Solaris 11.3. Within that Solaris virtual host, the following snapshot was run from within the Global Zone (where another non-global zone also existed and was configured with a MEM RCAP).

This output reflects basic performance data gathering (“-g”), creation of a long (“-l”) configuration report, expanded verbosity (“-v”) which also adds pmap/ptree/pfiles and light-weight lockstat BME sampling. The data is gathered (captured) at 1 second sampling Intervals (“-I1”) for a Total (“-T60”) of 60 seconds.

NOTE: * All Commands are run serially, except Background Performance Data Gathering *

[email protected]:/export/home/tjobson/Downloads# ./sys_diag -g -l -v -I1 -T60

sys_diag:0112_145543: gather PERFORMANCE Data sys_diag:0112_145543: LONG Configuration Report sys_diag:0112_145543: VERBOSE=1 sys_diag:0112_145543: INTERVAL: 1 sys_diag:0112_145543: TIME Duration: 60 sys_diag:0112_145543: lock_file: 1 sys_diag:0112_145543: Extracting ... README_sys_diag.txt ... sys_diag: ------Beginning Profiling : SNAPSHOT (# 0) ------sys_diag:0112_145543: # /usr/bin/zonestat -q -r summary -R total 1s 3s ... sys_diag:0112_145543: # /usr/bin/rcapstat ... sys_diag:0112_145543: # /usr/bin/tcpstat -l10 -c 1 1 ... sys_diag:0112_145553: # ps -e -o ...(by %CPU) ... Snapshot # 0 sys_diag:0112_145553: # ps -e -o ...(by %MEM) ... Snapshot # 0 sys_diag:0112_145553: # iostat -xcnCXTdz 2 3 ... Snapshot # 0 sys_diag:0112_145558: # iostat -xcCXTdz 1 1 ... Snapshot # 0 sys_diag:0112_145558: # prstat -mL -d d -J -n 300 5 2 ... Snapshot # 0 sys_diag:0112_145558: # /usr/bin/tcpstat -l50 -c 1 5 ... Snapshot # 0 sys_diag:0112_145558: # /usr/sbin/zpool iostat -v 1 5 ... Snapshot # 0 sys_diag:0112_145615: # prstat -mL -p 3505 1 1 ... sys_diag:0112_145616: # pmap -xs 3505 ... sys_diag:0112_145616: # ptree -a 3505 ... sys_diag:0112_145616: # pfiles 3505 ... sys_diag:0112_145616: # prstat -mL -p 851 1 1 ... sys_diag:0112_145616: # pmap -xs 851 ... sys_diag:0112_145616: # ptree -a 851 ... sys_diag:0112_145616: # pfiles 851 ... sys_diag:0112_145617: # prstat -mL -p 3546 1 1 ... sys_diag:0112_145617: # pmap -xs 3546 ... sys_diag:0112_145617: # ptree -a 3546 ... sys_diag:0112_145617: # pfiles 3546 ... sys_diag:0112_145617: # prstat -mL -p 3504 1 1 ... sys_diag:0112_145617: # pmap -xs 3504 ... sys_diag:0112_145617: # ptree -a 3504 ... sys_diag:0112_145617: # pfiles 3504 ... sys_diag:0112_145617: # prstat -mL -p 339 1 1 ... sys_diag:0112_145617: # pmap -xs 339 ... sys_diag:0112_145617: # ptree -a 339 ... sys_diag:0112_145617: # pfiles 339 ... sys_diag:0112_145617: # /usr/bin/netstat -i -a ... sys_diag:0112_145618: # /usr/sbin/lockstat -IW -s 10 sleep 1 ... 
sys_diag:0112_145623: # /usr/sbin/lockstat -AP -D10 -n50000 sleep 1 ...
lockstat: warning: 17215 aggregation drops on CPU 1
lockstat: warning: 16019 aggregation drops on CPU 2
lockstat: warning: 16238 aggregation drops on CPU 1
lockstat: warning: ran out of data records (use -n for more)
sys_diag: --**-- (Background) DATA COLLECTION FOR 60 secs STARTED --**--
sys_diag:0112_145623: # /usr/bin/vmstat -q 1 60 > /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_vm_Newton-S11.3x6_170112_145543.out 2>&1 &
sys_diag:0112_145623: # /usr/bin/mpstat -q 1 60 > /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_mp_Newton-S11.3x6_170112_145543.out 2>&1 &
sys_diag:0112_145623: # /usr/bin/iostat -xn 1 60 > /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_io_Newton-S11.3x6_170112_145543.out 2>&1 &
sys_diag:0112_145631: # /usr/bin/netstat -i -I lo0 1 60 > /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_net1_Newton-S11.3x6_170112_145631.out 2>&1 &
sys_diag:0112_145631: # /usr/bin/kstat -p -T u -n lo0 1> /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_knetb_lo0_Newton-S11.3x6_170112_145631.out 2>&1
sys_diag:0112_145631: # /usr/bin/netstat -i -I net0 1 60 > /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_net2_Newton-S11.3x6_170112_145631.out 2>&1 &

sys_diag:0112_145631: # /usr/bin/kstat -p -T u -n net0 1> /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_knetb_net0_Newton-S11.3x6_170112_145631.out 2>&1
sys_diag:0112_145631: ... WAITING 59.634 seconds for ENDPOINT data collection ...
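
The background launch lines above all follow one pattern: each collector runs for the full duration with its output redirected to a per-collector file, and sys_diag then blocks until the endpoint is reached. A minimal stand-alone sketch of that pattern (an illustration only, not sys_diag's actual code; short `date` loops stand in for long-running collectors such as `/usr/bin/vmstat -q 1 60` and `/usr/bin/iostat -xn 1 60`):

```shell
# Sketch of sys_diag's background-collection pattern.
OUTDIR=$(mktemp -d)

# Each collector is backgrounded with its own output file (stdout+stderr).
( i=0; while [ "$i" -lt 3 ]; do date; i=$((i+1)); sleep 1; done ) \
    > "$OUTDIR/vm.out" 2>&1 &      # stand-in for: vmstat -q 1 60 > ... 2>&1 &
( i=0; while [ "$i" -lt 3 ]; do date; i=$((i+1)); sleep 1; done ) \
    > "$OUTDIR/io.out" 2>&1 &      # stand-in for: iostat -xn 1 60 > ... 2>&1 &

wait    # the "WAITING ... for ENDPOINT data collection" step
wc -l "$OUTDIR/vm.out" "$OUTDIR/io.out"
```

Each output file ends up with one line per sample interval, which is what the post-collection analysis phase later parses.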

sys_diag:0112_145631: ------ Background Data Gathering COMPLETED ------
sys_diag:0112_145631: # /usr/bin/kstat -p -T u -n lo0 2>&1
sys_diag:0112_145631: # /usr/bin/kstat -p -T u -n net0 2>&1
sys_diag: ------ EndPoint Profiling : SNAPSHOT (# 2) ------
sys_diag:0112_145731: # ps -e -o ...(by %CPU) ... Snapshot # 2
sys_diag:0112_145731: # ps -e -o ...(by %MEM) ... Snapshot # 2
sys_diag:0112_145731: # iostat -xcnCXTdz 2 3 ... Snapshot # 2
sys_diag:0112_145735: # iostat -xcCXTdz 1 1 ... Snapshot # 2
sys_diag:0112_145735: # prstat -mL -d d -J -n 300 5 2 ... Snapshot # 2
sys_diag:0112_145735: # /usr/bin/tcpstat -l50 -c 1 5 ... Snapshot # 2
sys_diag:0112_145735: # /usr/sbin/zpool iostat -v 1 5 ... Snapshot # 2
sys_diag:0112_145754: # prstat -mL -p 851 1 1 ...
sys_diag:0112_145754: # pmap -xs 851 ...
sys_diag:0112_145754: # ptree -a 851 ...
sys_diag:0112_145754: # prstat -mL -p 3505 1 1 ...
sys_diag:0112_145754: # pmap -xs 3505 ...
sys_diag:0112_145754: # ptree -a 3505 ...
sys_diag:0112_145754: # prstat -mL -p 3546 1 1 ...
sys_diag:0112_145754: # pmap -xs 3546 ...
sys_diag:0112_145754: # ptree -a 3546 ...
sys_diag:0112_145754: # prstat -mL -p 3504 1 1 ...
sys_diag:0112_145754: # pmap -xs 3504 ...
sys_diag:0112_145754: # ptree -a 3504 ...
sys_diag:0112_145754: # prstat -mL -p 339 1 1 ...
sys_diag:0112_145754: # pmap -xs 339 ...
sys_diag:0112_145754: # ptree -a 339 ...
sys_diag:0112_145754: # /usr/bin/netstat -i -a ...
sys_diag:0112_145755: ------ Data Collection COMPLETE ------
sys_diag: ------ (Foreground) Gathering System Configuration Details ------
sys_diag:0112_145755: # uname -a ...
sys_diag:0112_145755: # hostid ...
sys_diag:0112_145755: # domainname (DNS) ...
sys_diag:0112_145755: ###### SYSTEM CONFIGURATION / LDOM / DEVICE INFO ######
sys_diag:0112_145755: # cat /etc/release ...
sys_diag:0112_145755: # /usr/bin/pkg info entire ...
sys_diag:0112_145755: # prtconf | grep Memory ...
sys_diag:0112_145755: # /usr/sbin/psrinfo -v ...
sys_diag:0112_145755: # /usr/sbin/psrinfo -pv ...
sys_diag:0112_145755: # /usr/sbin/virtinfo ...
sys_diag:0112_145755: # /usr/sbin/psrset -q ...
sys_diag:0112_145755: # /usr/sbin/beadm list -a ...
sys_diag:0112_145755: # /usr/sbin/prtdiag -v ...
sys_diag:0112_145755: # prtconf -dDu ...
sys_diag:0112_145755: # prtconf -dDvu saved to OUTPUT FILE _prtconf_ ...
sys_diag:0112_145757: ###### Enterprise SPARC HW System INFO ######
sys_diag:0112_145757: # Checking Kernel Cage settings ...
sys_diag:0112_145757: # eeprom ...
sys_diag:0112_145757: # /usr/bin/coreadm ...
sys_diag:0112_145757: # /usr/sbin/dumpadm ...
sys_diag:0112_145759: ###### WORKLOAD CHARACTERIZATION ######
sys_diag:0112_145759: # prstat -c -a -d d 1 1 ... (by All Process & Users)
sys_diag:0112_145759: # prstat -c -J -d d 2 2 ... (by Process & SRM Projects)
sys_diag:0112_145759: # prstat -c -Z -d d 1 1 ... (by Process & Zones)
sys_diag:0112_145800: # prstat -c -a -d d 5 2 ...
sys_diag:0112_145805: # prstat -c -v -m -L -d d 1 3 ... (Extended microstate metrics & LWPs)
sys_diag:0112_145805: # pgstat -A -v 1 3 ...
sys_diag:0112_145810: # pgstat -A -v -B core 1 3 ...
sys_diag:0112_145813: # top -S -d1 300 ...
sys_diag:0112_145814: # top -S -d1 -ores 100 ...
sys_diag:0112_145816: # ps -e -o ...(by %CPU) ...
sys_diag:0112_145816: # ps -e -o ...(by %MEM) ...
sys_diag:0112_145816: # ps -e -o ...(by # LWPs per PID) ...
sys_diag:0112_145816: # ps -eL -o ...(by Top cpu LWPs) ...
sys_diag:0112_145817: # ps -eZ -o ...(by Zone) ...
sys_diag:0112_145817: ###### KERNEL / MEMORY PROFILING ######
sys_diag:0112_145817: # vmstat 1 5 ...
sys_diag:0112_145821: # /usr/bin/mpstat 1 3 ...
sys_diag:0112_145821: # /usr/bin/isainfo -v ...
sys_diag:0112_145823: # /usr/bin/ipcs -a ...
sys_diag:0112_145823: # /usr/bin/pagesize ...
sys_diag:0112_145823: # swap -l ...
sys_diag:0112_145823: # swap -s ...

sys_diag:0112_145823: # /usr/bin/vmstat -s ...
sys_diag:0112_145823: # /usr/bin/kstat -n segmap ...
sys_diag:0112_145823: # /usr/bin/kstat -n system_pages ...
sys_diag:0112_145823: # /usr/bin/kstat -m zfs ...
sys_diag:0112_145823: # /usr/bin/kstat -n vm ...
sys_diag:0112_145823: # /usr/sbin/trapstat 1 2 ...
sys_diag:0112_145823: # /usr/bin/vmstat -i ...
sys_diag:0112_145823: ###### Solaris ZONES / SRM / Kernel Config ######
sys_diag:0112_145823: # /usr/sbin/zoneadm list -cvi ...
sys_diag:0112_145823: # /usr/bin/zonestat -q -Td -R total,high 1s 5s ...
sys_diag:0112_145823: # /usr/bin/zonestat -r all -Td 1s 2s ...
sys_diag:0112_145823: # /usr/bin/poolstat -Td -r all ...
sys_diag:0112_145823: # /usr/bin/projects -l ...
sys_diag:0112_145823: # /usr/sbin/psrset -i ...
sys_diag:0112_145823: # /usr/sbin/psrset -p ...
sys_diag:0112_145823: # /usr/sbin/psrset -q ...
sys_diag:0112_145823: # /usr/bin/rcapstat -z 1 5 ...
sys_diag:0112_145823: # /usr/bin/rcapstat 1 5 ...
sys_diag:0112_145823: # /usr/sbin/rctladm -l ...
sys_diag:0112_145823: # /usr/bin/priocntl -l ...
sys_diag:0112_145823: # tail -80 /etc/system ...
sys_diag:0112_145823: # sysdef | tail -85 ...
sys_diag:0112_145823: # modinfo ...
sys_diag:0112_145842: ###### STORAGE / ARRAY ENCLOSURE INFO ######
sys_diag:0112_145842: # prtconf -pv ...
sys_diag:0112_145842: # luxadm probe ...

ERROR: No Fibre Channel Adapters found.
sys_diag:0112_145842: # /usr/sbin/raidctl -l ...
sys_diag:0112_145842: ###### ZFS / VOLUME MANAGEMENT INFO ######
sys_diag:0112_145842: ###### SOLARIS ZFS Info ######
sys_diag:0112_145842: # /usr/sbin/zpool list ...
sys_diag:0112_145842: # /usr/sbin/zfs list ...
sys_diag:0112_145842: # /usr/sbin/zfs list -o space ...
sys_diag:0112_145842: # /usr/sbin/zpool status -v ...
sys_diag:0112_145842: # /usr/sbin/zpool iostat -v 1 5 ...
sys_diag:0112_145842: ###### Sun STMS / MPxIO Info ######
sys_diag:0112_145842: # cat /kernel/drv/fp.conf ...
sys_diag:0112_145842: # cat /kernel/drv/fcp.conf ...
sys_diag:0112_145846: ###### FILESYSTEM INFO ######
sys_diag:0112_145846: # df ...
sys_diag:0112_145846: # df -k ...
sys_diag:0112_145846: # mount -v ...
sys_diag:0112_145846: # /usr/sbin/showmount -a ...
sys_diag:0112_145846: # cat /etc/auto_master ...
sys_diag:0112_145846: # cat /etc/auto_home ...
sys_diag:0112_145846: # cat /etc/vfstab ...
sys_diag:0112_145847: ###### I/O STATS ######
sys_diag:0112_145847: # /usr/bin/iostat -nxe 3 2 ...
sys_diag:0112_145847: # /usr/bin/iostat -xcnXTdzY 1 1 ...
sys_diag:0112_145847: # /usr/bin/iostat -xnp 1 1 ...
sys_diag:0112_145847: # /usr/bin/iostat -xcCn 3 3 ...
sys_diag:0112_145847: # /usr/bin/iostat -xnE ...
sys_diag:0112_145856: ###### NFS INFO ######
sys_diag:0112_145856: # /usr/bin/nfsstat ...
sys_diag:0112_145856: # /usr/bin/nfsstat -m ...
sys_diag:0112_145856: ###### NETWORKING INFO ######
sys_diag:0112_145856: # cat /etc/hosts ...
sys_diag:0112_145856: # /usr/sbin/ifconfig -a ...
sys_diag:0112_145856: # /usr/sbin/ipadm show-addr ...
sys_diag:0112_145856: # /usr/sbin/dladm show-phys ...
sys_diag:0112_145856: # /usr/sbin/dladm show-link ...
sys_diag:0112_145856: # /usr/sbin/dlstat show-link ...
sys_diag:0112_145856: # /usr/sbin/dladm show-vnic ...
sys_diag:0112_145856: # /usr/bin/netstat -i ...
sys_diag:0112_145856: # /usr/bin/netstat -r ...
sys_diag:0112_145856: # /usr/sbin/arp -a ...
sys_diag:0112_145856: # /usr/sbin/ping -s 10.0.2.2 56 5 ...
sys_diag:0112_145856: # /usr/sbin/ping -s 10.0.2.2 1016 5 ...
sys_diag:0112_145856: # /usr/sbin/traceroute -v 10.0.2.2 ...
sys_diag:0112_145856: # /usr/sbin/ping -s Einstein-S11.3-Zone 56 5 ...
sys_diag:0112_145856: # /usr/sbin/ping -s Einstein-S11.3-Zone 1016 5 ...
sys_diag:0112_145856: # /usr/sbin/traceroute -v Einstein-S11.3-Zone ...
sys_diag:0112_145914: # cat /etc/inet/networks ...
sys_diag:0112_145914: # cat /etc/netmasks ...
sys_diag:0112_145914: # tail -30 /etc/inet/ntp.server ...
sys_diag:0112_145914: # /usr/sbin/dladm show-aggr -x ...

sys_diag:0112_145914: # /usr/sbin/dlstat show-aggr ...
sys_diag:0112_145914: # /usr/sbin/ipadm show-if ...
sys_diag:0112_145914: # /usr/sbin/ipadm show-ifprop ...
sys_diag:0112_145914: # /usr/sbin/ipadm show-prop ...
sys_diag:0112_145914: # /usr/sbin/ipadm show-addrprop ...
sys_diag:0112_145914: # /usr/sbin/dlstat show-link -r net0 1 3 ...
sys_diag:0112_145914: # /usr/sbin/dlstat show-link -t net0 1 3 ...
sys_diag:0112_145914: # /usr/sbin/ipfstat -h ...
sys_diag:0112_145914: # /usr/sbin/ipfstat -i -o ...
sys_diag:0112_145914: # /usr/sbin/flowadm ...
sys_diag:0112_145914: # /usr/sbin/flowadm show-flow ...
sys_diag:0112_145914: # /usr/sbin/flowadm show-flowprop ...
sys_diag:0112_145914: # /usr/sbin/ilbadm show-stats -v ...
sys_diag:0112_145918: # /usr/bin/tcpstat -l50 -c 1 3 ...
sys_diag:0112_145918: # /usr/bin/netstat -a ...
sys_diag:0112_145926: # /usr/bin/netstat -s ...
sys_diag:0112_145926: ###### TTY / MODEM INFO ######
sys_diag:0112_145926: # cat /etc/remote ...
sys_diag:0112_145926: # cat /var/adm/aculog ...
sys_diag:0112_145926: ###### USER / ACCOUNT / GROUP Info ######
sys_diag:0112_145927: # w ...
sys_diag:0112_145927: # who -a ...
sys_diag:0112_145927: # cat /etc/passwd ...
sys_diag:0112_145927: # cat /etc/group ...
sys_diag:0112_145927: ###### SERVICES / NAMING RESOLUTION ######
sys_diag:0112_145927: # /usr/bin/svcs -v ...
sys_diag:0112_145927: # /usr/bin/svcs -a -p -v ...
sys_diag:0112_145927: # cat /etc/services ...
sys_diag:0112_145927: # cat /etc/inetd.conf ...
sys_diag:0112_145927: # cat /etc/inittab ...
sys_diag:0112_145927: # cat /etc/nsswitch.conf ...
sys_diag:0112_145927: # cat /etc/resolv.conf ...
sys_diag:0112_145929: # /usr/bin/ypwhich ...
sys_diag:0112_145929: # /usr/sbin/acctadm ...
sys_diag:0112_145929: # /usr/sbin/acctadm -r ...
sys_diag:0112_145929: ###### SECURITY / CONFIG FILES ######
sys_diag:0112_145929: # cat /etc/syslog.conf ...
sys_diag:0112_145929: # cat /etc/pam.conf ...
sys_diag:0112_145929: # cat /etc/default/login ...
sys_diag:0112_145929: # cat /etc/ssh/sshd_config ...
sys_diag:0112_145929: # cat /etc/user_attr ...
sys_diag:0112_145929: # tail -250 /var/adm/sulog ...
sys_diag:0112_145929: # /usr/bin/last reboot ...
sys_diag:0112_145929: # /usr/bin/last -200 ...
sys_diag:0112_145929: # /usr/sbin/ipf -T list ...
sys_diag:0112_145929: # cat /etc/ipf/ipf.conf ...
sys_diag:0112_145929: # /usr/sbin/ipnat -vls ...
sys_diag:0112_145929: ###### HA / CLUSTERING INFO ######
sys_diag:0112_145929: ###### Database Configuration INFO ######
sys_diag:0112_145929: ###### APPLICATION (STATUS/LOG/CONFIG) FILES ######
sys_diag:0112_145929: ###### PACKAGE INFO / SOLARIS REGISTRY ######
sys_diag:0112_145929: # /usr/bin/pkginfo ...
sys_diag:0112_145929: # /usr/bin/pkginfo -l ...
sys_diag:0112_145929: ###### Jumpstart / Automated Installer / Patch Info ######
sys_diag:0112_145929: Capture S11 AI Manifest File ...
sys_diag:0112_145929: * NO Patch Diagnostic Utility found, skipping.

sys_diag:0112_145930: ###### CRONTAB FILE LISTINGS ######
sys_diag:0112_145930: ###### FMD / SYSTEM MESSAGE/LOG FILES ######
sys_diag:0112_145930: # /usr/sbin/fmadm config ...
sys_diag:0112_145930: # /usr/sbin/fmdump ...
sys_diag:0112_145930: # /usr/sbin/fmstat 1 2 ...
sys_diag:0112_145930: # tail -250 /var/adm/messages ...
sys_diag:0112_145930: # /usr/bin/dmesg | tail -250 ...
sys_diag:0112_145930: # tail -250 /var/log/syslog ...
sys_diag:0112_145931: ... gen_html_hdr ...
sys_diag:0112_145931: ###### SYSTEM ANALYSIS : INITIAL FINDINGS : ERRORS / WARNINGS ######
sys_diag:0112_145931: ###### PERFORMANCE DATA : POTENTIAL ISSUES ######

______

sys_diag:0112_145931: ## Analyzing VMSTAT CPU Datafile :

/export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_vm_*.out ...

* NOTE: 13.5593 % : 8 of 59 VMSTAT CPU entries EXCEEDED Thresholds! *

TOTAL CPU AVGS : RUNQ= 0.8 : BThr= 0.0 : USR= 16.2 : SYS= 19.9 : IDLE= 63.8 *( 36.2% Total CPU USED)*
PEAK CPU HWMs : RUNQ= 7 : BThr= 0 : USR= 35 : SYS= 40 : IDLE= 31 *( 69.0% Total CPU USED)*

______
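
The "8 of 59 ... EXCEEDED Thresholds" figure above is simply the ratio of threshold-exceeding samples to total samples, printed to four decimal places. As an illustration only (not sys_diag's actual code), the percentage can be re-computed with a one-liner:

```shell
# 8 threshold-exceeding VMSTAT samples out of 59 total entries.
awk 'BEGIN { printf "%.4f %%\n", 8 / 59 * 100 }'   # prints: 13.5593 %
```

The threshold values themselves are configurable, as described in Section 8.0.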

sys_diag:0112_145932: ## Analyzing VMSTAT MEMORY from Datafile : /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_vm_*.out ...

* NOTE: 57.6271 % : 34 of 59 VMSTAT MEMORY entries EXCEEDED Thresholds! *

TOTAL MEM AVGS : SR= 0.0 : SWAP_free= 4175217 K : FREE_RAM= 273260 K *( 92.3% Total MEM USED)*
PEAK MEM Usage : SR= 0 : SWAP_free= 3868388 K : FREE_RAM= 92592 K *( 97.4% Total MEM USED)*

______

sys_diag:0112_145932: ## Analyzing MPSTAT Datafile : /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_mp_*.out ...

* NOTE: 11.1111 % : 20 of 180 MPSTAT CPU entries EXCEEDED Thresholds! *

CPU MP AVGS : Wt= 0 : Xcal= 324 : csw= 753 : icsw= 119 : migr= 58 : smtx= 17 : syscl= 6130
PEAK MP HWMs: Wt= 0 : Xcal= 10828 : csw= 2944 : icsw= 584 : migr= 242 : smtx= 122 : syscl= 75494

______

sys_diag:0112_145932: ## Analyzing IOSTAT Datafile : /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_io_*.out ...

* NOTE: 60.4651 % : 26 of 43 IOSTAT (non-zero) entries EXCEEDED Thresholds! *

Slowest Storage IO Devices : by *PEAK* wsvc_t :

r/s w/s kr/s kw/s actv wsvc_t asvc_t %w %b device

223.0 281.0 21432.4 92602.4 2.0 4.9 28.2 20.0 100 c1d0

______

Slowest Storage IO Devices : by *AVERAGE* asvc_t (* AVG of non-zero device entries *) :

r/s w/s kr/s kw/s actv wsvc_t asvc_t %w %b device # I/O Samples

11.0 64.4 827.7 15905.8 0.5 0.9 8.0 3.5 27 c1d0 43

______

CONTROLLER IO : AVG and TOTAL Throughput per HBA (*active/non-zero entries only*) :

--- c1d0: AVG  : 11 r/s | 64 w/s | 828 kr/s | 15906 kw/s |
    c1d0: TOTAL: 472 r | 2770 w | 35591 kr | 683950 kw | 43 entries

--- TOT_SIO_Thrupt: 472 reads | 2770 writes | 35591 KB_read | 683950 KB_write | 43 entries

______

sys_diag:0112_145932: ## Analyzing NETSTAT Datafiles : ...

* lo0 : NOTE: 0 % : 0 of 59 NETSTAT entries EXCEEDED Thresholds! *
* net0 : NOTE: 0 % : 0 of 59 NETSTAT entries EXCEEDED Thresholds! * (per 1 sec samples)

------
             *MAX_RX_PKTS  AVG_RX_PKTS  AVG_RX_ERRS  AVG_TX_PKTS  AVG_TX_ERRS  AVG_COLL
NET1 : lo0 :            0          0.0          0.0          0.0          0.0       0.0

------
             *MAX_RX_PKTS  AVG_RX_PKTS  AVG_RX_ERRS  AVG_TX_PKTS  AVG_TX_ERRS  AVG_COLL
NET2 : net0 :         980        471.4          0.0        258.5          0.0       0.0

: net0 :  TOT_RX_Pkts  TOT_RX_KBytes  *RX_Pkt_DROPs*  TOT_TX_Pkts  TOT_TX_KBytes  *TX_Pkt_DROPs*
                28068           3477               0        15385            124               0

------
TOTAL Throughput:  TOT_RX_Pkts  TOT_RX_KBytes  *TOT_RX_DROPs*  TOT_TX_Pkts  TOT_TX_KBytes  *TOT_TX_DROPs*
                         28068           3477               0        15385            124               0

NOTE: ** 1 ESTABLISHED connections (sockets) exist **

NOTE: ** 7 TIME_WAIT sockets exist **

______

* NOTE: CPU=YEL : MEM=RED : IO=RED : NET=GRN *

______

sys_diag:0112_145933: gen_graphs() : VMSTAT : CPU PIE & LINE Charts
sys_diag:0112_145933: gen_graphs() : VMSTAT : MEM PIE & LINE Charts
sys_diag:0112_145933: gen_graphs() : IOSTAT : PIE Chart : ALL Devices
sys_diag:0112_145934: gen_graphs() : IOSTAT : PIE & LINE Charts : Slowest device
sys_diag:0112_145934: ... gen_html_rpt ...

Data Directory : /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455

HTML Report File : file:///export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455/sysd_Newton-S11.3x6_170112_1455.out.html

sys_diag:0112_145934: ## Generating TAR file : /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455.tar ...

Data files have been TARed and compressed in :

*** /export/home/tjobson/Downloads/sysd_Newton-S11.3x6_170112_1455.tar.Z ***

------Sys_Diag Complete ------

sys_diag:0112_145934: lock_file: 0

------

______

13.0 For More Information : Resources and Feedback
______

** See http://blogs.sun.com/toddjobson/ for several blog posts relating to system performance, capacity planning, and systems architecture / availability.

The latest released version of sys_diag can be downloaded from : https://blogs.oracle.com/toddjobson/resource/sys_diag.Z

NOTE: (Right-click and Save As, then make sure to uncompress (# uncompress) the file within a Solaris OS, so that Windows does not corrupt it by converting newlines (NL) into CR-LF pairs.)

If your file does get converted by Windows, you can convert it back to Solaris format via # dos2unix (see its man page), or by following the manual # vi instructions within the sys_diag header.
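
A quick way to check for, and undo, an accidental CR-LF conversion from any POSIX shell (a sketch only; the temporary file here is a hypothetical stand-in for your downloaded sys_diag script, and `tr -d '\r'` is a portable fallback if dos2unix is not installed):

```shell
# Simulate a CR-LF damaged script file, detect the damage, then repair it.
f=$(mktemp)
printf '#!/bin/sh\r\necho hello\r\n' > "$f"     # hypothetical Windows-converted file

grep -c "$(printf '\r')" "$f"                   # non-zero count -> CR-LFs present

tr -d '\r' < "$f" > "$f.fixed"                  # strip carriage returns (dos2unix fallback)
grep -c "$(printf '\r')" "$f.fixed" || true     # 0 -> file is clean again
```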

Comments, RFEs, and suggestions can be forwarded to [email protected] with the subject header “sys_diag”.

Prior distribution was also via Sun Microsystems BigAdmin and SunFreeware.com, now at : http://www.sunfreeware.com/programlistsparc10.html

------

(Copyright © 1999-2017 by Todd A. Jobson) Pg 33 of 33