04/08/2020
njmon and nmon User Meeting 2
Nigel Griffiths
Advanced Technology Support, EMEA, IBM
Platforms: Power, VIOS, AMD64, Z, ARM
email: [email protected]
Open Source: [email protected]
Twitter: @mr_nmon
http://nmon.sourceforge.net/njmon
https://tinyurl.com/njmon
https://www.youtube.com/user/nigelargriffiths
http://tinyurl.com/AIXpert
2
Open Source from IBMers, from Nigel “Mr nmon” Griffiths
So AIX benefits from the latest Time-Series database & graph engines
Stats: CPU, RAM, Disks, Paging, Volume Groups, Logical Volumes, Networks, Adapters, Kernel stats, Tapes, Uptime, User count, AIO, File systems, System Calls, Processes, NFS, GPFS Spectrum Scale, VIOS virtual disks, VIOS SEA, VIOS virtual networks, VIOS SSP, Linux NVIDIA GPUs, AIX rPerf
• Very simple endpoint install
• InfluxDB and Grafana install in 10 minutes
• Grafana starter dashboards, but prime value is creating any graph you want in seconds
Recent updates:
- New faster centralized collector
- New direct to InfluxDB = nimon
- New YouTube videos for Sys Admins
- New Grafana graph templates
JSON output for Elastic & Splunk
Line Protocol for InfluxDB & Prometheus
See https://tinyurl.com/njmon
3
No Change
10 videos, 2 hours 20 minutes, 12,600 views up to August 2020
https://www.youtube.com/user/nigelargriffiths
https://www.youtube.com/watch?v=wN5GNc9HH7Y&list=PLK
4
https://www.youtube.com/watch?v=1Vu-cQciEJc&feature=youtu.be
Using njmon to monitor 150+ POWER9 server cluster running a 5 day soak test (in the blind)
Application was Foam = aero-dynamics modelling for a Formula 1 car racing team
Included Grafana graphs on the 30 foot screen in the centre.
5
Two recent ideas :
1. Not easy to document measures & statistics names!
2. Capturing ad-hoc stats on Big Production Servers
Answers: AIXpert Blog
6
To graph the njmon/nimon data you need to “see” the data structure
• But documentation is boring!
So
• JSON is self documenting – ish!
• The data names do help
• On AIX they are straight from the perfstat library, which is where they came from
Competition was just as bad
• I can’t find the Linux statd stats listed
So I wrote a shell script . . . for nimon
7
In InfluxDB
Measure – a group of related stats
• CPU, memory, disks, network, …
Stats are the numbers / strings
• Usr, sys, wait, idle
• Read, write KB/s, packets, …
For variable resources: sub measures
• CPU0, CPU1, disk1, disk2, net1, net2
So I wrote a shell script . . .
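The measure / sub measure / stat structure described above can be sketched in a few lines of Python. This is not the author's shell script (which queries InfluxDB); it is a minimal illustration that walks a small, hypothetical njmon-style JSON snapshot and lists every stat with its path. All names and numbers in `SAMPLE` are invented for the example.

```python
import json

# Hypothetical njmon-style snapshot: names and values are illustrative only.
SAMPLE = """
{
  "cpu_total": {"usr": 20.5, "sys": 5.5, "wait": 0.0, "idle": 74.0},
  "cpus": {
    "cpu0": {"usr": 21.0, "sys": 6.0},
    "cpu1": {"usr": 20.0, "sys": 5.0}
  },
  "disks": {
    "disk1": {"read_kb": 120.0, "write_kb": 64.5}
  }
}
"""


def list_stats(snapshot):
    """Return sorted 'measure[/sub-measure]/stat' paths found in a snapshot."""
    paths = []

    def walk(node, prefix):
        for name, value in node.items():
            path = prefix + [name]
            if isinstance(value, dict):
                walk(value, path)             # measure or sub measure: descend
            else:
                paths.append("/".join(path))  # a stat: a number or a string

    walk(snapshot, [])
    return sorted(paths)


stat_paths = list_stats(json.loads(SAMPLE))
for path in stat_paths:
    print(path)
```

Running it prints one line per stat, for example `cpus/cpu0/usr`, which is the kind of listing needed before building a graph.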
8
10
nimon_list_stats
Article and script download here: njmon and nimon Listing Tags, Measures & Statistics
https://www.ibm.com/support/pages/njmon-and-nimon-listing-tags-measures-statistics
11
All very nice . . . but I want to add some of my own stats
Or worse – Can you add my wacko idea stats?
12
With njmon, using JSON is a problem: it is very easy to get JSON wrong!
“cpu”: {              “cpu” {
  “usr”: 20.5,          “usr’’ : 20.5,
  “sys”: 5.5,           “sys”: 5.5,
  “waitIO”: 20.0,       “waitIO”: 20.0.
  “idle”: 79.0          “idle”: 79.0,
}                     }
13
“cpu”: {              “cpu” {               Missing :
  “usr”: 20.5,          “usr’’ : 20.5,      Two single quotes
  “sys”: 5.5,           “sys”: 5.5,
  “waitIO”: 20.0,       “waitIO”: 20.0.     Dot, not comma
  “idle”: 79.0          “idle”: 79.0,       Extra comma
}                     }
Allowing users to add JSON is error prone and a support nightmare
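To see how unforgiving JSON is, the two versions from the slide can be run through a parser. A minimal sketch with Python's standard `json` module follows. Two assumptions are made: straight double quotes replace the slide's typographic ones (which are themselves invalid JSON), and the fragment is wrapped in outer braces so the good version is a complete document.

```python
import json

GOOD = '{"cpu": {"usr": 20.5, "sys": 5.5, "waitIO": 20.0, "idle": 79.0}}'

# The broken version: missing colon after "cpu", two single quotes closing
# "usr, a dot instead of a comma after 20.0, and a trailing comma after 79.0.
BAD = '{"cpu" {"usr\'\': 20.5, "sys": 5.5, "waitIO": 20.0. "idle": 79.0,}}'


def check_json(text):
    """Return (True, None) if text parses as JSON, else (False, reason)."""
    try:
        json.loads(text)
        return True, None
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno} column {err.colno}: {err.msg}"


good_ok, _ = check_json(GOOD)
bad_ok, bad_reason = check_json(BAD)
print("good parses:", good_ok)
print("bad parses:", bad_ok, "-", bad_reason)
```

The parser stops at the first problem it meets, so a user fixing the broken version has to go round the loop four times, which is part of the support burden.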
14
But nimon and InfluxDB Line Protocol is safer – ish
cpu usr=20.5,sys=5.5,waitIO=20.0,idle=79.0
• The measure name (the first word, cpu here) must be unique (not in use by nimon)
Or strings: version=“7.2TL4sp2”
• Already had a problem due to the shell stripping out the “
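Building the Line Protocol text in a program, rather than by hand on a shell command line, avoids the quote-stripping problem. The sketch below is a hedged illustration of the InfluxDB line protocol rules for a measurement with fields only (no tags, no timestamp); the function names and the measure names `mycpu` and `os` are made up for the example.

```python
def escape_measure(name):
    """Escape the characters that are special in a measurement name."""
    return name.replace(",", "\\,").replace(" ", "\\ ")


def escape_key(name):
    """Escape the characters that are special in a field key."""
    return name.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_value(value):
    """Format one field value: booleans, integers, floats or quoted strings."""
    if isinstance(value, bool):          # test bool first: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"               # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'                # strings must be double-quoted


def to_line(measure, stats):
    """Build one line: '<measure> <key>=<value>,<key>=<value>...'."""
    fields = ",".join(f"{escape_key(k)}={format_value(v)}" for k, v in stats.items())
    return f"{escape_measure(measure)} {fields}"


cpu_line = to_line("mycpu", {"usr": 20.5, "sys": 5.5, "waitIO": 20.0, "idle": 79.0})
version_line = to_line("os", {"version": "7.2TL4sp2"})
print(cpu_line)
print(version_line)
```

Because the double quotes are added inside the program, the shell never gets a chance to remove them.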
15
Measure command for AIX and Linux
(alongside the standard stats: CPU | Memory | Disks | Network | Kernel | Processes)
• Saves other statistics to the same njmon database.
• If you can get the data via a script, you can send it on with the same njmon tags in 1/100th of a second.
• Then graph OS stats & your stats at the same time.
AIX production servers CANNOT just add software like curl + pre-reqs: security issues, 2 months of pre-production testing + roll out over 2 years!
16
Measure for AIX and Linux: examples
[Diagram labels: Measure, Statistics, InfluxDB, Grafana]
• RDBMS script: measure * -g rdbms -G commits=986.34,rollbacks=23.1,hitratio=99.3
• Sales script: measure * -g sales -G itemsold=32984,avgcost=79.99,profit=-0.003
• Users script: measure * -g user -G online=65389,session_mins=184,click_pm=18.2
• IT-tasks times script: measure * -g tasks -G dataload=47_min,backupmin=124,batch_min=84
* Also need Influx-hostname & Influx-DB-name
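For readers who want to see what sending one of these lines involves, here is a minimal sketch that builds (but does not send) an HTTP write request for an InfluxDB 1.x server using only the Python standard library. It is not how the measure command is implemented. The `host` tag name, the `myserver` hostname and the function name are assumptions made for the example; the `/write?db=` endpoint is the InfluxDB 1.x HTTP write API.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(influx_host, database, group, stats, hostname, port=8086):
    """Build a POST request carrying one Line Protocol line for InfluxDB 1.x.

    group    : the measure name, as given with -g in the examples above
    stats    : 'name=value,name=value' text, as given with -G
    hostname : value for a hypothetical 'host' tag
    """
    line = f"{group},host={hostname} {stats}"
    url = f"http://{influx_host}:{port}/write?" + urlencode({"db": database})
    return Request(url, data=line.encode("utf-8"), method="POST")


req = build_write_request(
    "influxhost", "njmon", "rdbms",
    "commits=986.34,rollbacks=23.1,hitratio=99.3", "myserver",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it needs a reachable InfluxDB: urllib.request.urlopen(req)
```

The request needs the InfluxDB hostname and database name, which matches the footnote on the slide.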
17
https://sysrant.com/aix-metrics-in-prometheus-with-njmon/
18
Added Prometheus Support (-w)
Article on this
• https://www.ibm.com/support/pages/node/1116327
[Diagram: nimon feeding Telegraf, Prometheus and Grafana, with “push” and “pull” paths]
Line Protocol without InfluxDB login details using the -w option
19
Let's talk about Grafana! Wow!! Every release is like Xmas: we get new toys (graphs). There is even a webpage with samples.
20
Let's talk about Grafana! (numbered callouts on the dashboard screenshot)
1. My logo = cool
2. Donut graph, yum
3. Dark mode: Helps you sleep at the desk!
4. LED graphic equaliser: draws attention to red stats
5. Button single stat and graph: high density
6. Blue ridge mountain range graph
7. Carpet graph – see later
21
Anyone heard of the Dolly Parton curve?
22
Anyone heard of the Dolly Parton curve?
[Figure: CPU busy (up to 100%) against time of day (AM to PM), with Morning, Lunch, Afternoon and Batch marked]
23
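Anyone heard of the Dolly Parton curve? Three crunch points
[Figure: CPU busy (up to 100%) against time of day (AM to PM); Morning, Afternoon and Batch are the three crunch points, with Lunch between the first two]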
24
Anyone heard of the Dolly Parton curve? Three crunch points
[Figure: CPU busy (up to 100%) against time of day (AM to PM); Morning, Afternoon and Batch are the three crunch points, with Lunch between the first two]
Problems:
• Averaging the day hides the three crunch points
• Periodic over a day and over a week (typically busier on Friday)
• Periodic over a month (end of month extra reporting) and end of year!
• Batch overrun times
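A small worked example shows why the daily average misleads. The hourly figures below are invented purely to mimic the shape of the curve (batch overnight, a morning peak, a lunch dip, an afternoon peak); they are not measurements from any server.

```python
from statistics import mean

# Hypothetical CPU busy % for each hour of one day (index = hour 0..23).
hourly_busy = [
    85, 90, 60, 20, 10, 10, 15, 30,   # 00-07: batch finishing, then quiet
    70, 95, 90, 70, 40, 65, 92, 88,   # 08-15: morning peak, lunch dip, afternoon peak
    70, 50, 30, 20, 15, 20, 60, 85,   # 16-23: winding down, batch starting
]

daily_average = mean(hourly_busy)
peak = max(hourly_busy)
# Hours at or above an (arbitrary) 85% threshold: the crunch points
crunch_hours = [hour for hour, pct in enumerate(hourly_busy) if pct >= 85]

print(f"daily average: {daily_average:.1f}%")
print(f"peak hour:     {peak}%")
print(f"crunch hours:  {crunch_hours}")
```

With these numbers the day averages a little over 53% busy, which looks comfortable, yet seven separate hours sit at 85% or more.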
25
Heat map for whole days using the Grafana Carpet Plugin
This is an excellent way to determine the busy day + busy hours = first step for trend forecasting
[Figure: carpet plot heat map covering three weeks]
26
Heat map for whole days using the Grafana Carpet Plugin
Heat Map Warning: There are always red parts!
[Figure: carpet plot heat map covering three weeks]
Interesting:
• Peaks 8 to 10 am & 2 pm, Tuesday to Friday
• Busy day is Thursday
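The idea behind the carpet plot, one cell per day and hour, can be sketched outside Grafana too. The Python below is only an illustration of that bucketing, not the plugin's implementation; the sample timestamps and percentages are invented.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def busy_grid(samples):
    """Average CPU busy % per (weekday, hour) cell from (datetime, pct) samples."""
    cells = defaultdict(list)
    for when, pct in samples:
        cells[(DAYS[when.weekday()], when.hour)].append(pct)
    return {cell: mean(values) for cell, values in cells.items()}


# Hypothetical samples from the last week of July 2020
samples = [
    (datetime(2020, 7, 28, 9, 0), 60.0),    # Tuesday
    (datetime(2020, 7, 30, 9, 0), 95.0),    # Thursday
    (datetime(2020, 7, 30, 9, 30), 85.0),   # Thursday, same hour
    (datetime(2020, 7, 30, 14, 0), 80.0),   # Thursday
    (datetime(2020, 7, 31, 3, 0), 10.0),    # Friday
]

grid = busy_grid(samples)
busiest_cell = max(grid, key=grid.get)
print("busiest cell:", busiest_cell, f"{grid[busiest_cell]:.1f}%")
```

The busiest cell gives the busy day and busy hour in one step, which is the starting point for trend forecasting mentioned above.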
27
My to do list : Work out how to graph CPU on successive Fridays 8 am to 10 pm
Ideas to [email protected]
Could be done in “flux” or Grafana
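Until a Flux or Grafana answer turns up, one possible approach is to post-process exported samples. The sketch below is only that, an assumption about how it might be done in Python: it keeps Friday samples between 8 am and 10 pm and keys each by time of day, so successive Fridays could be drawn on one shared axis. The sample data are invented.

```python
from collections import defaultdict
from datetime import datetime

FRIDAY = 4  # datetime.weekday(): Monday is 0


def friday_windows(samples, start_hour=8, end_hour=22):
    """Group (datetime, value) samples from Fridays, start_hour up to end_hour, by date."""
    by_date = defaultdict(list)
    for when, value in samples:
        if when.weekday() == FRIDAY and start_hour <= when.hour < end_hour:
            # Key each point by time of day so all Fridays share one x-axis
            by_date[when.date().isoformat()].append((when.strftime("%H:%M"), value))
    return dict(by_date)


# Hypothetical CPU busy samples
samples = [
    (datetime(2020, 7, 24, 7, 0), 12.0),    # Friday, before the window
    (datetime(2020, 7, 24, 9, 0), 88.0),    # Friday
    (datetime(2020, 7, 24, 23, 0), 70.0),   # Friday, after the window
    (datetime(2020, 7, 30, 9, 0), 95.0),    # Thursday: ignored
    (datetime(2020, 7, 31, 9, 0), 91.0),    # Friday
    (datetime(2020, 7, 31, 14, 30), 84.0),  # Friday
]

fridays = friday_windows(samples)
for day, points in sorted(fridays.items()):
    print(day, points)
```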
28
Some ideas
• Remove the weeds
• One graph + selected time periods
[Figure: five successive Fridays shown side by side]
29
nimon quick-win approach
1 Fresh install of OS on your HW
2 InfluxDB + Grafana download
  AMD64/x86_64 download from
  • influxdata.com/downloads
  • grafana.com/get
  for
  • RHEL7/8 + Centos/Fedora
  • Ubuntu 18/20
  POWER8/9 download from
  • power-devops.com/influxdb
  • power-devops.com/grafana
  • RHEL7/8 or SLES 15
3 InfluxDB + Grafana install: rpm/apt
4 --- Disable your firewall!
5 $ influx
  > create database njmon
  > exit
6 Login to Grafana website port:3000 & add a database source = for influxDB + njmon
7 Install njmon/nimon with ninstall
  Download latest njmon .gzip [AIX + Linux]: tinyurl.com/njmon
8 $ nimon -s30 -k -i influxhost -p 8086 -x njmon -y nigel -z passwd [+crontab -e]
9 In Grafana: import a dashboard by #
  Find graph template number: grafana.com/dashboards [search: njmon]
10 ---- Fix your firewall [ports 8086/3000]
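After the firewall is fixed in the last step, it is worth confirming that the two ports are still reachable from an endpoint. A minimal sketch using Python's standard `socket` module follows; the function name is made up, and `localhost` in the example stands in for your own InfluxDB/Grafana server.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:   # refused, timed out, unreachable or unknown host
        return False


if __name__ == "__main__":
    # 8086 = InfluxDB, 3000 = Grafana (the ports named in the steps above)
    for name, port in (("InfluxDB", 8086), ("Grafana", 3000)):
        state = "reachable" if port_open("localhost", port) else "NOT reachable"
        print(f"{name} port {port}: {state}")
```

This only proves the TCP port answers; it does not check the InfluxDB login or that the njmon database exists.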
30
End of Message - Thank you for your time
Next meeting: 7th September 2020
Feedback + ideas welcome: [email protected] or @mr_nmon or LinkedIn: https://www.linkedin.com/in/nigel-griffiths
31