04/08/2020

njmon and nmon User Meeting 2

Nigel Griffiths
Advanced Technology Support, EMEA
IBM email: [email protected]
Open Source: [email protected]
Twitter: @mr_nmon
Platforms: Power, VIOS, AMD64, Z, ARM
http://nmon.sourceforge.net/njmon
https://tinyurl.com/njmon
https://www.youtube.com/user/nigelargriffiths
http://tinyurl.com/AIXpert

njmon: Open Source from IBMers, from Nigel “Mr nmon” Griffiths
So AIX benefits from the latest Time-Series database & graph engines

Stats: CPU, RAM, Disks, Paging, Volume Groups, Logical Volumes, Networks, Adapters, Kernel stats, Tapes, Uptime, User count, AIO systems, System Calls, Processes, NFS, GPFS Spectrum Scale, VIOS virtual disks, VIOS SEA, VIOS virtual networks, VIOS SSP, NVIDIA GPUs, AIX rPerf

Very simple endpoint install
InfluxDB and Grafana install in 10 minutes
Grafana starter dashboards but prime value is creating any graph you want in seconds

Recent updates:
- New faster centralized collector
- New direct to InfluxDB = nimon
- New YouTube videos for Sys Admins
- New Grafana graph templates

JSON output for Elastic & Splunk
Line Protocol for InfluxDB & Prometheus
See https://tinyurl.com/njmon


No Change

10 videos, 12,600 views up to August 2020, 2 hours 20 minutes

https://www.youtube.com/user/nigelargriffiths
https://www.youtube.com/watch?v=wN5GNc9HH7Y&list=PLKQlFnmiWVydb5QdX2wz9iRfJkuuB2ec1

https://www.youtube.com/watch?v=1Vu-cQciEJc&feature=youtu.be

Using njmon to monitor a 150+ POWER9 server cluster running a 5-day soak (in the blind)

Application was Foam = aerodynamics modelling for a Formula 1 car racing team

Included Grafana graphs on the 30 foot screen in the centre.


Two recent ideas:

1. Not easy to document measure & statistic names!

2. Capturing ad-hoc stats on Big Production Servers

Answers: AIXpert Blog


To graph the njmon/nimon data you need to “see” the data structure
• But documentation is boring!
So
• JSON is self-documenting - ish!
• The data names do help
• On AIX they are straight from the perfstat library, which is where they came from
Competition was just as bad
• I can’t find the Linux statd stats listed

So I wrote a shell script . . . for nimon


In InfluxDB
Measure - a group of related stats
• CPU, memory, disks, network, …
Stats are the numbers
• Usr, sys, idle
• Read, KB/s, packets, …
For variable resources: sub-measures
• CPU0, CPU1, disk1, disk2, net1, net2
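To make the measure / sub-measure / stat idea concrete, here is a minimal Python sketch (not the nimon_list_stats shell script itself). It walks a small nested JSON sample and lists each measure path with its stat names. The sample data and every name in it are made up for illustration; real njmon output is much larger and its names may differ.

```python
import json

# Made-up sample in the general shape of a nested JSON snapshot (assumption,
# not real njmon output).
SAMPLE = """
{
  "cpu_total": {"usr": 20.5, "sys": 5.5, "idle": 74.0},
  "disks": {
    "disk1": {"read_kbs": 120.0, "write_kbs": 64.0},
    "disk2": {"read_kbs": 0.0, "write_kbs": 8.0}
  }
}
"""


def list_stats(node, path=()):
    """Yield (measure path, stat name) for every leaf value in the tree."""
    for key, value in node.items():
        if isinstance(value, dict):
            # A nested object is a measure or a sub-measure: descend into it.
            yield from list_stats(value, path + (key,))
        else:
            # A plain value is a stat belonging to the current measure path.
            yield "/".join(path), key


stats = list(list_stats(json.loads(SAMPLE)))
for measure, stat in stats:
    print(f"{measure:15} {stat}")
```

The same walk works on any depth of nesting, which is why JSON is "self documenting - ish": the structure itself tells you which stats belong to which measure.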


nimon_list_stats
Article and script download here: njmon and nimon Listing Tags, Measures & Statistics

https://www.ibm.com/support/pages/njmon-and-nimon-listing-tags-measures-statistics


All very . . . but I want to add some of my own stats
Or worse - Can you add my wacko idea stats?


With njmon, using JSON is a problem. It is very easy to get JSON wrong!

Right:
    "cpu": {
        "usr": 20.5,
        "sys": 5.5,
        "waitIO": 20.0,
        "idle": 79.0
    }

Wrong:
    "cpu" {
        "usr'' : 20.5,
        "sys": 5.5,
        "waitIO": 20.0.
        "idle": 79.0,
    }


The four mistakes:
    "cpu" {                <- Missing :
        "usr'' : 20.5,     <- Two single quotes
        "sys": 5.5,
        "waitIO": 20.0.    <- Dot instead of comma
        "idle": 79.0,      <- Extra comma
    }

Allowing users to add JSON is error prone and a support nightmare
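A quick way to see how unforgiving JSON is: a minimal Python sketch using the standard json module. The slide's fragment is wrapped in outer braces here so it is a complete JSON document (that wrapping is an assumption for the demo), and only one of the mistakes, the extra comma, is shown.

```python
import json

GOOD = '{"cpu": {"usr": 20.5, "sys": 5.5, "waitIO": 20.0, "idle": 79.0}}'
# Same data with one of the slide's mistakes: an extra comma after the last stat.
BAD = '{"cpu": {"usr": 20.5, "sys": 5.5, "waitIO": 20.0, "idle": 79.0,}}'


def check_json(text):
    """Return (True, "") if text parses as JSON, else (False, reason)."""
    try:
        json.loads(text)
        return True, ""
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"


print(check_json(GOOD))
print(check_json(BAD))
```

One stray character and the whole snapshot is rejected, which is the support nightmare in practice.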


But nimon with InfluxDB Line Protocol is safer - ish

cpu usr=20.5,sys=5.5,waitIO=20.0,idle=79.0
• The measure name (cpu in this example) must be unique (not in use by nimon)

Or strings: version="7.2TL4sp2"
• Already had a problem due to the shell stripping out the double quotes (")
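Building the line in a program instead of in the shell avoids the quote-stripping problem. Below is a minimal Python sketch of a Line Protocol formatter for fields only (no tags, no timestamp). The function names are hypothetical, and it deliberately does not escape spaces, commas or equals signs in measure and stat names, so keep those names simple.

```python
def format_field(value):
    """Format one field value for InfluxDB Line Protocol."""
    if isinstance(value, bool):          # test bool first: True is also an int in Python
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"               # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Strings: wrap in double quotes, escaping backslashes and double quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measure, fields):
    """Build 'measure stat=value,stat=value' from a dict of stats."""
    body = ",".join(f"{key}={format_field(val)}" for key, val in fields.items())
    return f"{measure} {body}"


print(to_line("mycpu", {"usr": 20.5, "sys": 5.5, "waitIO": 20.0, "idle": 79.0}))
print(to_line("myaix", {"version": "7.2TL4sp2"}))
```

The measure names mycpu and myaix are made up here so they do not clash with anything nimon already sends.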


Measure command for AIX and Linux
• Saving other statistics to the same njmon database.
• If you can get the data via a script, you can send it on with the same njmon tags in 1/100th of a second.
• Then graph OS stats & your stats at the same time.

[Diagram labels: CPU | Memory | Disks | Network | Kernel | Processes]

AIX Production Servers CANNOT just add software like curl + pre-reqs: security issues, 2 months pre-production testing + roll out over 2 years!


Measure for AIX and Linux: examples

[Diagram labels: Measure | Statistics | InfluxDB | Grafana]

RDBMS script:          measure * -g rdbms -G commits=986.34,rollbacks=23.1,hitratio=99.3

Sales script:          measure * -g sales -G itemsold=32984,avgcost=79.99,profit=-0.003

Users script:          measure * -g user -G online=65389,session_mins=184,click_pm=18.2

IT-tasks times script: measure * -g tasks -G dataload_min=47,backupmin=124,batch_min=84

* Also need Influx-hostname & Influx-DB-name
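Where Python is already on the server, a similar result can be sketched with the standard library alone. This is not how the measure command works internally. It is a sketch that assumes an InfluxDB 1.x server, whose HTTP /write endpoint accepts Line Protocol in the request body. The function name, the host tag and the server names are hypothetical.

```python
import urllib.parse
import urllib.request


def build_write_request(host, database, measure, stats, tags=None, port=8086):
    """Build (but do not send) an InfluxDB 1.x write request for numeric stats."""
    tag_part = "".join(f",{key}={val}" for key, val in (tags or {}).items())
    field_part = ",".join(f"{key}={float(val)}" for key, val in stats.items())
    line = f"{measure}{tag_part} {field_part}"
    query = urllib.parse.urlencode({"db": database})
    url = f"http://{host}:{port}/write?{query}"
    return urllib.request.Request(url, data=line.encode("utf-8"), method="POST")


req = build_write_request(
    "influxhost", "njmon", "rdbms",
    {"commits": 986.34, "rollbacks": 23.1, "hitratio": 99.3},
    tags={"host": "server1"},            # hypothetical tag name and value
)
print(req.full_url)
print(req.data.decode())
# To actually send it: urllib.request.urlopen(req)
# InfluxDB 1.x answers HTTP 204 on success.
```

If the database needs a user and password they have to be added to the request as well; that part is left out of this sketch.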


https://sysrant.com/aix-metrics-in-prometheus-with-njmon/


Added Prometheus Support (-w)
Article on this
• https://www.ibm.com/support/pages/node/1116327

[Diagram labels: nimon, Telegraf, Prometheus, Grafana; arrows marked push and pull]

Line Protocol without InfluxDB login details using the -w option


Let's talk about Grafana! Wow!! Every release is like Xmas: we get new toys (graphs) - Even a webpage with samples


Let's talk about Grafana!

1. My logo = cool
2. Donut graph, yum
3. Dark mode: Helps you at the desk!
4. LED graphic equaliser: draws attention to red stats
5. Button single stat and graph: high density
6. Blue ridge mountain range graph
7. Carpet graph - see later


Anyone heard of the Dolly Parton curve?

[Chart: CPU busy (up to 100%) against time of day (AM to PM); labels: Morning, Lunch, Afternoon, Batch]

Three crunch points: Morning, Afternoon, Batch

Problems:
• Averaging the day hides the three crunch points
• Periodic over a day and over a week (typically busier on Friday)
• Periodic over a month (end of month extra reporting) and end of year!
• Batch overrun times
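Why averaging the day is misleading, as a minimal Python sketch. The hourly numbers are invented to give a morning, an afternoon and a batch peak with a lunch dip; they are not measurements. The 85% threshold is also just an assumption for the demo.

```python
from statistics import mean

# Invented hourly CPU-busy percentages for one day (list index = hour 0..23).
cpu_busy = [10, 10, 10, 10, 10, 10, 20, 40, 70, 95, 90, 60,
            30, 55, 92, 88, 60, 40, 25, 20, 30, 85, 96, 50]

daily_average = mean(cpu_busy)
peak_hour = max(range(24), key=lambda hour: cpu_busy[hour])
busy_hours = [hour for hour in range(24) if cpu_busy[hour] >= 85]

print(f"Daily average: {daily_average:.1f}%")
print(f"Peak hour: {peak_hour}:00 at {cpu_busy[peak_hour]}%")
print(f"Hours at or above 85%: {busy_hours}")
```

The daily average sits under 50% while several individual hours are above 85%, so the single number says "half empty" for a server that is nearly full three times a day.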


Heat map for whole days using the Grafana Carpet Plugin
This is an excellent way to determine the busy day + busy hours = first step for trend forecasting

Heat Map Warning: There are always red parts!

[Heat map: three weeks shown side by side]

Interesting:
• Peaks 8 to 10 am & 2 pm, Tuesday to Friday
• Busy day is Thursday
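The idea behind a carpet-style heat map can be sketched in a few lines of Python: bucket every sample by weekday and hour, then average each bucket. This is only an illustration of the aggregation, not the Grafana plugin's code, and the sample data is invented with a deliberate hot spot on Thursday at 09:00.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def carpet_grid(samples):
    """Average (timestamp, value) samples into {(weekday, hour): mean}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for when, value in samples:
        cell = (when.weekday(), when.hour)   # Monday is weekday 0
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Invented data: one week of hourly samples, busier on Thursday at 09:00.
start = datetime(2020, 8, 3)                 # a Monday
samples = []
for i in range(7 * 24):
    when = start + timedelta(hours=i)
    busy = 90.0 if (when.weekday() == 3 and when.hour == 9) else 20.0
    samples.append((when, busy))

grid = carpet_grid(samples)
busiest = max(grid, key=grid.get)
print(f"Busiest cell (weekday, hour): {busiest}")
```

With several weeks of data each cell averages the same weekday and hour across weeks, which is what makes the busy day and busy hours stand out.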


My to-do list: Work out how to graph CPU on successive Fridays 8 am to 10 pm

Ideas to [email protected]

Could be done in “flux” or Grafana


Some ideas

Remove the weeds

[Sketch: Fri | Fri | Fri | Fri | Fri]
One graph + selected time periods
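Outside Flux or Grafana, the "remove the weeds" step can be sketched in Python: keep only the samples from Fridays between 8 am and 10 pm, ready to be plotted back to back on one graph. The function name and the data are invented for illustration; this is one possible approach, not a finished answer to the to-do item.

```python
from datetime import datetime, timedelta

FRIDAY = 4  # datetime.weekday(): Monday is 0


def friday_window(samples, start_hour=8, end_hour=22):
    """Keep (timestamp, value) samples from Fridays with start_hour <= hour < end_hour."""
    return [(when, value) for when, value in samples
            if when.weekday() == FRIDAY and start_hour <= when.hour < end_hour]


# Invented data: four weeks of hourly samples starting Monday 3 August 2020.
start = datetime(2020, 8, 3)
samples = [(start + timedelta(hours=i), float(i % 100)) for i in range(28 * 24)]

kept = friday_window(samples)
fridays = sorted({when.date() for when, _ in kept})
print(f"{len(kept)} samples kept from {len(fridays)} Fridays")
```

Everything else (nights, weekends, other weekdays) is dropped, so the remaining points can share one x-axis as Fri, Fri, Fri, Fri.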


nimon quick-win approach

1 Fresh install of OS on your HW
2 InfluxDB + Grafana download
3 InfluxDB + Grafana install: rpm/apt
4 --- Disable your firewall!
5 $ influx
  > create database njmon
  >
6 Login to Grafana website port:3000 & add a database source = for influxDB + njmon
7 Install njmon/nimon with ninstall
8 $ nimon -s30 -k -i influxhost -p 8086 -x njmon -y nigel -z passwd [+crontab -e]
9 In Grafana: import a dashboard by #
10 ---- Fix your firewall [ports 8086/3000]

AMD64/x86_64 download from
• influxdata.com/downloads
• grafana.com/get
• for RHEL7/8 + Centos/Fedora
• 18/20

POWER8/9 download from
• power-devops.com/influxdb
• power-devops.com/grafana
• RHEL7/8 or SLES 15

Download latest njmon .gzip [AIX + Linux]
• tinyurl.com/njmon

Find graph template number
• grafana.com/dashboards [search: njmon]


End of Message
- Thank you for your time
Next meeting: 7th September 2020

Feedback + ideas welcome: [email protected] or @mr_nmon or LinkedIn: https://www.linkedin.com/in/nigel-griffiths
