Nmon Njmon and Nmon User Meeting 2
04/08/2020

njmon and nmon User Meeting 2

Nigel Griffiths, Advanced Technology Support, EMEA, IBM
email: [email protected]
Open Source: [email protected]
Twitter: @mr_nmon
Platforms: Power, VIOS, AMD64, Z, ARM
http://nmon.sourceforge.net/njmon
https://tinyurl.com/njmon
https://www.youtube.com/user/nigelargriffiths
http://tinyurl.com/AIXpert

Open Source from IBMers

So AIX benefits from the latest Time-Series database & graph engines, from Nigel “Mr nmon” Griffiths.

Stats: CPU, RAM, Disks, Paging, Volume Groups, Logical Volumes, Networks, Adapters, Kernel stats, Tapes, Uptime, User count, AIO, File systems, System Calls, Processes, NFS, GPFS Spectrum Scale, VIOS virtual disks, VIOS SEA, VIOS virtual networks, VIOS SSP, Linux NVIDIA GPUs, AIX rPerf

• Very simple endpoint install
• InfluxDB and Grafana install in 10 minutes
• Grafana starter dashboards, but prime value is creating any graph you want in seconds
• JSON output for Elastic & Splunk
• Line Protocol for InfluxDB & Prometheus

Recent updates:
- New faster centralized collector
- New direct to InfluxDB = nimon
- New YouTube videos for Sys Admins
- New Grafana graph templates

See https://tinyurl.com/njmon

No Change
10 videos, 12,600 views up to August 2020, 2 hours 20 minutes
https://www.youtube.com/user/nigelargriffiths
https://www.youtube.com/watch?v=wN5GNc9HH7Y&list=PLKQlFnmiWVydb5QdX2wz9iRfJkuuB2ec1

https://www.youtube.com/watch?v=1Vu-cQciEJc&feature=youtu.be
Using njmon to monitor a 150+ POWER9 server cluster running a 5 day soak test (in the blind)
Application was Foam = aero-dynamics modelling for a Formula 1 car racing team
Included Grafana graphs on the 30 foot screen in the centre.

Two recent ideas:
1. Not easy to document measures & statistics names!
2. Capturing ad-hoc stats on Big Production Servers
Answers: AIXpert Blog

To graph the njmon/nimon data you need to “see” the data structure
• But documentation is boring!
So
• JSON is self documenting – ish!
• The data names do help
• On AIX they are straight from the perfstat library, which is where they came from
Competition was just as bad
• I can’t find the Linux statd stats listed
So I wrote a shell script for nimon.

In InfluxDB
Measure – a group of related stats
• CPU, memory, disks, network, …
Stats are the numbers / strings
• Usr, sys, wait, idle
• Read, write KB/s, packets, …
For variable resources, sub measures
• CPU0, CPU1, disk1, disk2, net1, net2

nimon_list_stats
Article and script download here:
njmon and nimon Listing Tags, Measures & Statistics
https://www.ibm.com/support/pages/njmon-and-nimon-listing-tags-measures-statistics

All very nice, but I want to add some of my own stats
Or worse – Can you add my wacko idea stats?

With njmon, using JSON is a problem. Very easy to get JSON wrong!

Correct:
    "cpu": {
        "usr": 20.5,
        "sys": 5.5,
        "waitIO": 20.0,
        "idle": 79.0
    }

Wrong:
    "cpu" {                 Missing :
        "usr'' : 20.5,      Two single quotes
        "sys": 5.5,
        "waitIO": 20.0.     Dot, not comma
        "idle": 79.0,       Extra comma
    }

Allowing users to add JSON is error prone and a support nightmare.

But nimon and InfluxDB Line Protocol is safer – ish
    cpu                                       (measure: must be unique, not in use by nimon)
    usr=20.5,sys=5.5,waitIO=20.0,idle=79.0
Or strings: version="7.2TL4sp2"
Already had a problem due to the shell stripping out the "

Measure command for AIX and Linux
• Saving other statistics to the same njmon database.
• If you can get the data via a script, you can send it on with the same njmon tags in 1/100th of a second.
• Then graph OS stats & your stats at the same time.

AIX Production Servers CAN NOT just add software like curl + pre-reqs: security issues, 2 months pre-production testing + roll out over 2 years!

[Diagram labels: CPU | Memory | Disks | Network | Kernel | Processes; Measure Statistics; InfluxDB; Grafana]

RDBMS script:          measure * -g rdbms -G commits=986.34,rollbacks=23.1,hitratio=99.3
Sales script:          measure * -g sales -G itemsold=32984,avgcost=79.99,profit=-0.003
Users script:          measure * -g user -G online=65389,session_mins=184,click_pm=18.2
IT-tasks times script: measure * -g tasks -G dataload=47_min,backupmin=124,batch_min=84
* Also need Influx-hostname & Influx-DB-name

https://sysrant.com/aix-metrics-in-prometheus-with-njmon/

Added Prometheus Support (-w)
Article on this:
• https://www.ibm.com/support/pages/node/1116327
[Diagram labels: nimon, push, Telegraf, pull, Prometheus, Grafana]
nimon Line Protocol without InfluxDB login details, using the -w option

Let's talk about Grafana!
Wow!! Every release is like Xmas: we get new toys (graphs)
- Even a webpage with samples

1. My logo = cool
2. Donut graph, yum
3. Dark mode: Helps you sleep at the desk!
4. LED graphic equaliser: draws attention to red stats
5. Button single stat and graph: high density
6. Blue ridge mountain range graph
7. Carpet graph – see later

Anyone heard of the Dolly Parton curve?

[Graph: CPU busy (up to 100%) against time of day (AM to PM); labels: Morning, Lunch, Afternoon, Batch]

Three Crunch points

Problems:
• Averaging the day hides the three crunch points
• Periodic over a day and over a week (typically busier on Friday)
• Periodic over a month (end of month extra reporting) and end of year!
• Batch overrun times

Heat map for whole days using the Grafana Carpet Plugin
This is an excellent way to determine the busy day + busy hours = first step for trend forecasting

[Heat map: one panel per week, three weeks shown]

Heat Map Warning: There are always red parts!
Interesting:
• Peaks 8 to 10 am & 2 pm
• Tuesday to Friday
• Busy day is Thursday

My to do list:
Work out how to graph CPU on successive Fridays 8 am to 10 pm
Ideas to [email protected]
Could be done in “flux” or Grafana

Some ideas
• Remove the weeds
• One graph + selected time periods (Fri, Fri, Fri, Fri, Fri)

nimon quick-win approach

1 Fresh install of OS on your HW
2 InfluxDB + Grafana download
    AMD64/x86_64 download from
    • influxdata.com/downloads
    • grafana.com/get
    for
    • RHEL7/8 + Centos/Fedora
    • Ubuntu 18/20
    POWER8/9 download from
    • power-devops.com/influxdb
    • power-devops.com/grafana
    for
    • RHEL7/8 or SLES 15
3 InfluxDB + Grafana install: rpm/apt
4 --- Disable your firewall!
5 $ influx
  > create database njmon
  > exit
6 Login to Grafana website port:3000 & add a database source = influxDB + njmon
7 Install njmon/nimon with ninstall
    Download latest njmon .gzip [AIX + Linux]: tinyurl.com/njmon
8 $ nimon -s30 -k -i influxhost -p 8086 -x njmon -y nigel -z passwd [+crontab -e]
9 In Grafana: import a dashboard by #
    Find graph template number: grafana.com/dashboards [search: njmon]
10 ---- Fix your firewall [ports 8086/3000]

End of Message
- Thank you for your time
Next meeting: 7th September 2020
Feedback + ideas welcome: [email protected] or @mr_nmon or LinkedIn: https://www.linkedin.com/in/nigel-griffiths
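
Footnote to the Line Protocol and measure slides: a minimal Python sketch of how a script could build one InfluxDB Line Protocol line from its own stats before sending it on. This is not how the measure command itself is implemented. The function names and the `host` tag are made up for illustration, and the real njmon tag names may differ. Numbers are written bare, as on the slides (InfluxDB stores bare numbers as floats), and strings get the double quotes that the shell likes to strip.

```python
def _escape(text, chars):
    """Backslash-escape every character from `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render one field value: booleans, bare numbers, or a quoted string."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):  # bare number, stored by InfluxDB as a float
        return repr(value)
    # Strings: escape backslash first, then the double quote, then wrap in quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def line_protocol(measure, tags, fields):
    """Build 'measure,tag=value field=value,...' (no timestamp: the server adds one)."""
    if not fields:
        raise ValueError("Line Protocol needs at least one field")
    head = _escape(measure, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + format_field(value)
        for key, value in fields.items()
    )
    return head + " " + body


if __name__ == "__main__":
    # "host" is a hypothetical tag; the field names come from the RDBMS slide example.
    print(line_protocol("rdbms", {"host": "aix1"},
                        {"commits": 986.34, "rollbacks": 23.1, "hitratio": 99.3}))
    # prints: rdbms,host=aix1 commits=986.34,rollbacks=23.1,hitratio=99.3
```

Building the line in one place keeps the quoting and escaping rules out of each stats script, which is the part that went wrong with the shell.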