Bringing ZFS information into SNMP

Thomas Stibor

GSI Helmholtz Centre for Heavy Ion Research, HPC

27 January 2014

What is SNMP?

• Simple Network Management Protocol (SNMP) is a protocol for network management.
• It allows collecting information from switches, printers, -boxes, . . . and also configuring them (write access).

thomas@lxdv65:~>snmpget -v 1 -c public localhost 1.3.6.1.2.1.1.1.0
iso.3.6.1.2.1.1.1.0 = STRING: "Linux lxdv65 3.8.13-tstibor-lxdv65-rev1 #1 SMP Wed May 15 12:32:59 CEST 2013 x86_64"
thomas@lxdv65:~>snmpget -v 1 -c public localhost 1.3.6.1.2.1.25.1.4.0
iso.3.6.1.2.1.25.1.4.0 = STRING: "BOOT_IMAGE=/boot/vmlinuz-3.8.13-tstibor-lxdv65-rev1 root=/dev/mapper/vg0-debian ro quiet"

What are these strange-looking numbers, e.g. 1.3.6.1.2.1.25.1.4.0?
• Each Object Identifier (OID for short) identifies a variable that can be read or set via SNMP.
• OIDs are organized hierarchically.

OID(s) as a Tree (snmpwalk)

thomas@lxdv65:~> snmpwalk -c public -v 2c localhost 1.3.6.1.4.1.2021.9.1
iso.3.6.1.4.1.2021.9.1.1.1 = INTEGER: 1
iso.3.6.1.4.1.2021.9.1.1.2 = INTEGER: 2
...
iso.3.6.1.4.1.2021.9.1.1.12 = INTEGER: 12
iso.3.6.1.4.1.2021.9.1.2.1 = STRING: "/"
iso.3.6.1.4.1.2021.9.1.2.2 = STRING: "/sys"
...
iso.3.6.1.4.1.2021.9.1.2.12 = STRING: "//pools-deduplication"
iso.3.6.1.4.1.2021.9.1.3.1 = STRING: "rootfs"
iso.3.6.1.4.1.2021.9.1.3.2 = STRING: ""
...
iso.3.6.1.4.1.2021.9.1.3.5 = STRING: ""
iso.3.6.1.4.1.2021.9.1.3.6 = STRING: ""
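The walk above returns every OID that has 1.3.6.1.4.1.2021.9.1 as its prefix; this subtree is dskEntry of Net-SNMP's UCD-SNMP-MIB, whose columns 1, 2 and 3 are dskIndex, dskPath and dskDevice. A minimal C sketch of that prefix test, assuming OIDs arrive as dotted strings; parse_oid, oid_in_subtree and MAX_OID_LEN are hypothetical helpers, not Net-SNMP API:

```c
#include <stdlib.h>

/* Upper bound on sub-identifiers handled by this sketch (SMIv2 allows at
 * most 128 sub-identifiers per OID). */
#define MAX_OID_LEN 128

/* Parse a dotted OID such as "1.3.6.1.4.1.2021.9.1" into numeric
 * sub-identifiers; a leading '.' (as printed by snmptranslate -On) is
 * accepted. Returns the number of sub-identifiers, or -1 on bad input. */
int parse_oid(const char *text, unsigned long *out, int max)
{
    const char *p = text;
    int n = 0;

    if (*p == '.')
        p++;
    while (*p != '\0') {
        char *end;

        if (*p < '0' || *p > '9' || n == max)
            return -1;
        out[n++] = strtoul(p, &end, 10);
        if (*end == '\0')
            break;                      /* last sub-identifier */
        if (*end != '.' || end[1] == '\0')
            return -1;                  /* junk or trailing '.' */
        p = end + 1;
    }
    return n > 0 ? n : -1;
}

/* Non-zero if `oid` lies at or below `root` in the OID tree, i.e. `root`
 * is a prefix of `oid` compared sub-identifier by sub-identifier. */
int oid_in_subtree(const unsigned long *root, int root_len,
                   const unsigned long *oid, int oid_len)
{
    if (oid_len < root_len)
        return 0;
    for (int i = 0; i < root_len; i++)
        if (oid[i] != root[i])
            return 0;
    return 1;
}
```

Comparing integers rather than strings matters here: "1.3.6.1.4.1.2021.9.10.1" starts with the string "1.3.6.1.4.1.2021.9.1", yet its ninth sub-identifier is 10, so it is not part of the walked subtree.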

[Figure: the subtree 1.3.6.1.4.1.2021.9.1 drawn as a tree with children 1, 2 and 3, whose leaves are 1 . . . 12, 1 . . . 12 and 1 . . . 6]

Human Readable OID(s)

Given an OID:
• What is the semantic meaning of, e.g.,
thomas@lxdv65:~>snmpget -v 1 -c public localhost 1.3.6.1.2.1.25.1.6.0
iso.3.6.1.2.1.25.1.6.0 = Gauge32: 597
• Is there a description giving us more information?
thomas@lxdv65:~>snmptranslate -m SNMPv2-MIB 1.3.6.1.2.1.1.1
SNMPv2-MIB::sysDescr
thomas@lxdv65:~>snmptranslate -m SNMPv2-MIB -On -Td 1.3.6.1.2.1.1.1
.1.3.6.1.2.1.1.1
sysDescr OBJECT-TYPE
  -- FROM       SNMPv2-MIB
  -- TEXTUAL CONVENTION DisplayString
  SYNTAX        OCTET STRING (0..255)
  DISPLAY-HINT  "255a"
  MAX-ACCESS    read-only
  STATUS        current
  DESCRIPTION   "A textual description of the entity. This value should include the full name and version identification of the system's hardware type, software operating-system, and networking software."
::= { iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) system(1) 1 }
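snmptranslate answers these questions by looking the numeric OID up in the MIB tree: conceptually it finds the deepest node it knows and appends the remaining sub-identifiers (here the instance suffix .0). A minimal C sketch of such a longest-prefix lookup over a hard-coded excerpt of the tree; mib_excerpt and translate_oid are hypothetical, illustrative names, and a real tool parses the installed MIB files instead:

```c
#include <stdio.h>
#include <string.h>

/* A tiny, hard-coded excerpt of the MIB tree (illustrative only). */
struct mib_node {
    const char *oid;    /* numeric OID of the node */
    const char *label;  /* its descriptor */
};

static const struct mib_node mib_excerpt[] = {
    { "1",               "iso" },
    { "1.3",             "org" },
    { "1.3.6",           "dod" },
    { "1.3.6.1",         "internet" },
    { "1.3.6.1.2",       "mgmt" },
    { "1.3.6.1.2.1",     "mib-2" },
    { "1.3.6.1.2.1.1",   "system" },
    { "1.3.6.1.2.1.1.1", "sysDescr" },
};

/* Render `oid` as the label of its deepest known ancestor followed by the
 * remaining sub-identifiers, e.g. "1.3.6.1.2.1.1.1.0" -> "sysDescr.0".
 * Returns 0 on success, -1 if nothing matches or `buf` is too small. */
int translate_oid(const char *oid, char *buf, size_t size)
{
    const struct mib_node *best = NULL;
    size_t best_len = 0;

    for (size_t i = 0; i < sizeof mib_excerpt / sizeof mib_excerpt[0]; i++) {
        size_t len = strlen(mib_excerpt[i].oid);

        /* The match must end on a sub-identifier boundary, so "1.3" is
         * not taken as a prefix of "1.35". */
        if (len > best_len && strncmp(oid, mib_excerpt[i].oid, len) == 0 &&
            (oid[len] == '\0' || oid[len] == '.')) {
            best = &mib_excerpt[i];
            best_len = len;
        }
    }
    if (best == NULL)
        return -1;

    int n = snprintf(buf, size, "%s%s", best->label, oid + best_len);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With this excerpt, 1.3.6.1.2.1.1.1.0 becomes sysDescr.0, while the Gauge32 OID 1.3.6.1.2.1.25.1.6.0 only resolves to mib-2.25.1.6.0, because the excerpt contains nothing below mib-2 on that branch.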

This information is provided in Management Information Base (MIB for short) file(s).

Desired ZFS Information to Bring into SNMP

thomas@lxdv65:~>sudo zpool list
NAME      SIZE  ALLOC   FREE   CAP     DEDUP  HEALTH    ALTROOT
domov-0   178M   244K   178M    0%     1.00x  DEGRADED  -
domov-1   178M   235K   178M    0%     1.00x  ONLINE    -
domov-2   178M   235K   178M    0%     1.00x  ONLINE    -
domov-3   178M  28.5M   150M   15%     1.00x  ONLINE    -
domov-4   178M   235K   178M    0%     1.00x  ONLINE    -
domov-5   178M   235K   178M    0%     1.00x  ONLINE    -
domov-6   178M   235K   178M    0%     1.00x  ONLINE    -
domov-7   178M   599K   177M    0%  2048.00x  ONLINE    -
thomas@lxdv65:~>sudo zfs get all | grep "avail\|used "
domov-0  used       163K   -
domov-0  available  86.4M  -
domov-1  used       157K   -
domov-1  available  86.4M  -
domov-2  used       157K   -
domov-2  available  86.4M  -
domov-3  used       18.9M  -
domov-3  available  67.6M  -
domov-4  used       157K   -
domov-4  available  86.4M  -
domov-5  used       157K   -
domov-5  available  86.4M  -
domov-6  used       157K   -
domov-6  available  86.4M  -
domov-7  used       256M   -
domov-7  available  86.2M  -

Specify a MIB file to bring this ZFS information into SNMP.

ZFS-MIB File

ZFS-MIB.txt

ZFS-MIB DEFINITIONS ::= BEGIN

IMPORTS
    OBJECT-TYPE, MODULE-IDENTITY, enterprises, Counter64, Integer32
        FROM SNMPv2-SMI
    ...

--
-- A brief description and update information about the ZFS-MIB.
--
zfs MODULE-IDENTITY
    LAST-UPDATED "201312190000Z"
    ORGANIZATION "GSI"
    CONTACT-INFO "[email protected]"
    DESCRIPTION
        "This MIB module describes read-only ZFS information gathered
        through libzfs. This encompasses the health status, available,
        used and total space, as well as the compression and
        deduplication ratios of pools."

    REVISION "201312190000Z"
    DESCRIPTION
        "Initial revision."

    ::= { hpc 1 }

ZFS-MIB File (cont.)

ZFS-MIB.txt

...
ZFSUnsigned64 ::= TEXTUAL-CONVENTION
    DISPLAY-HINT "d"
    STATUS current
    DESCRIPTION
        "A 64-bit unsigned integer (a type which does not exist in SMIv2)
        holding any unsigned 64-bit integer number. It is defined as a
        Counter64 but does not carry the counter semantics."
    SYNTAX Counter64

-- We are hosted under GSI OID (2021).
gsi OBJECT IDENTIFIER ::= { enterprises 2021 }
hpc OBJECT IDENTIFIER ::= { gsi 255 }

poolTable OBJECT-TYPE
    SYNTAX SEQUENCE OF PoolEntry
    MAX-ACCESS not-accessible
    STATUS current
    DESCRIPTION
        "ZFS Pool watching information."
    ::= { zfs 1 }

poolEntry OBJECT-TYPE
    SYNTAX PoolEntry
    MAX-ACCESS not-accessible
    STATUS current
    DESCRIPTION
        "An entry containing information on a ZFS pool."
    INDEX { poolIndex }
    ::= { poolTable 1 }
...

ZFS-MIB File (cont.)

ZFS-MIB.txt

...
PoolEntry ::= SEQUENCE {
    poolIndex          Integer32,      -- 1
    poolName           DisplayString,  -- 2
    poolHealth         DisplayString,  -- 3
    poolAvail          ZFSUnsigned64,  -- 4
    poolUsed           ZFSUnsigned64,  -- 5
    poolTotal          ZFSUnsigned64,  -- 6
    poolCompressRatio  DisplayString,  -- 7
    poolDedupRatio     DisplayString   -- 8
}

poolIndex OBJECT-TYPE
    SYNTAX Integer32 (0..255)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "Reference index for each observed ZFS pool."
    ::= { poolEntry 1 }

poolName OBJECT-TYPE
    SYNTAX DisplayString (SIZE (0..255))
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "Name of ZFS pool."
    ::= { poolEntry 2 }
...

Inspect our ZFS-MIB File

thomas@lxdv65:~>snmptranslate -Tp -IR ZFS-MIB::poolTable
+--poolTable(1)
   |
   +--poolEntry(1)
      |  Index: poolIndex
      |
      +-- -R-- Integer32 poolIndex(1)
      |        Range: 0..255
      +-- -R-- String    poolName(2)
      |        Textual Convention: DisplayString
      |        Size: 0..255
      +-- -R-- String    poolHealth(3)
      |        Textual Convention: DisplayString
      |        Size: 0..15
      +-- -R-- Counter64 poolAvail(4)
      |        Textual Convention: ZFSUnsigned64
      +-- -R-- Counter64 poolUsed(5)
      |        Textual Convention: ZFSUnsigned64
      +-- -R-- Counter64 poolTotal(6)
      |        Textual Convention: ZFSUnsigned64
      +-- -R-- String    poolCompressRatio(7)
      |        Textual Convention: DisplayString
      |        Size: 0..15
      +-- -R-- String    poolDedupRatio(8)
               Textual Convention: DisplayString
               Size: 0..15

Inspect our ZFS-MIB File (cont.)

thomas@lxdv65:~>snmptranslate -On -Td ZFS-MIB::poolHealth
.1.3.6.1.4.1.2021.255.1.1.1.3
poolHealth OBJECT-TYPE
  -- FROM       ZFS-MIB
  -- TEXTUAL CONVENTION DisplayString
  SYNTAX        OCTET STRING (0..15)
  DISPLAY-HINT  "255a"
  MAX-ACCESS    read-only
  STATUS        current
  DESCRIPTION   "Health status of ZFS pool."
::= { iso(1) org(3) dod(6) internet(1) private(4) enterprises(1) gsi(2021) hpc(255) zfs(1) poolTable(1) poolEntry(1) 3 }
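The column OID .1.3.6.1.4.1.2021.255.1.1.1.3 printed above does not yet name a single value: since poolEntry is indexed by poolIndex, the cell for one pool lives at <column OID>.<poolIndex>, and the mib2c-generated handler decodes such an OID back into table_info->colnum and the row's iterator context. A minimal sketch of composing and splitting these instance OIDs as strings; POOL_ENTRY_OID, pool_cell_oid and pool_cell_parse are hypothetical names. Note also that enterprise number 2021 is the ucdavis arc used by Net-SNMP's own UCD-SNMP-MIB (the dskTable walked earlier sits at 1.3.6.1.4.1.2021.9), so a dedicated private enterprise number would avoid sharing that subtree.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* poolEntry as resolved by snmptranslate above:
 * enterprises.2021 (gsi) .255 (hpc) .1 (zfs) .1 (poolTable) .1 (poolEntry) */
#define POOL_ENTRY_OID ".1.3.6.1.4.1.2021.255.1.1.1"

/* Build the instance OID of one cell: <poolEntry>.<column>.<poolIndex>,
 * where `column` follows the MIB numbering (3 = poolHealth).
 * Returns 0 on success, -1 if `buf` is too small. */
int pool_cell_oid(unsigned column, int pool_index, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s.%u.%d", POOL_ENTRY_OID, column, pool_index);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* The inverse: split a cell OID into column and poolIndex.
 * Returns 0 on success, -1 if `oid` is not a cell of poolTable. */
int pool_cell_parse(const char *oid, unsigned *column, int *pool_index)
{
    size_t len = strlen(POOL_ENTRY_OID);
    const char *p;
    char *end;

    if (strncmp(oid, POOL_ENTRY_OID, len) != 0 || oid[len] != '.')
        return -1;
    p = oid + len + 1;
    unsigned long col = strtoul(p, &end, 10);
    if (end == p || *end != '.')
        return -1;                      /* column must be followed by index */
    p = end + 1;
    long idx = strtol(p, &end, 10);
    if (end == p || *end != '\0')
        return -1;
    *column = (unsigned)col;
    *pool_index = (int)idx;
    return 0;
}
```

For example, pool_cell_oid(3, 1, ...) yields .1.3.6.1.4.1.2021.255.1.1.1.3.1, the poolHealth cell of the row with poolIndex 1.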

• How to implement an SNMP sub-agent daemon once we have specified our MIB file? An excellent starting point: http://www.net-snmp.org/wiki/index.php/Tutorials

From MIB ⇒ C

thomas@lxdv65:~>env MIBS="+ZFS-MIB" mib2c -c mib2c.iterate.conf poolTable
# generates poolTable.h and poolTable.c
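mib2c maps the three ZFSUnsigned64 columns (poolAvail, poolUsed, poolTotal) to Counter64, as the snmptranslate tree shows. Net-SNMP carries such a value as struct counter64 (declared in <net-snmp/library/asn1.h>) with separate high and low 32-bit halves, so the handler has to split each 64-bit byte count. A minimal sketch of that conversion; struct counter64_sketch, to_counter64 and from_counter64 are hypothetical names, and the struct only mirrors the layout of Net-SNMP's struct counter64 so the example compiles without Net-SNMP:

```c
#include <stdint.h>

/* Stand-in for Net-SNMP's `struct counter64`, which keeps a Counter64 as
 * two u_long halves; redeclared so this sketch compiles on its own. */
struct counter64_sketch {
    unsigned long high;   /* upper 32 bits */
    unsigned long low;    /* lower 32 bits */
};

/* Split a 64-bit byte count, such as a pool's available space for the
 * ZFSUnsigned64 column poolAvail, into high/low halves. */
struct counter64_sketch to_counter64(uint64_t value)
{
    struct counter64_sketch c;

    c.high = (unsigned long)(value >> 32);
    c.low = (unsigned long)(value & 0xFFFFFFFFu);
    return c;
}

/* Reassemble the 64-bit value, e.g. on the client side after a GET. */
uint64_t from_counter64(struct counter64_sketch c)
{
    return ((uint64_t)(c.high & 0xFFFFFFFFu) << 32) |
           (uint64_t)(c.low & 0xFFFFFFFFu);
}
```

In the agent, the filled-in struct would then presumably be handed to snmp_set_var_typed_value() with type ASN_COUNTER64 and size sizeof(struct counter64), much as the poolHealth string is handed over with ASN_OCTET_STR; treat that call pattern as an assumption to check against the Net-SNMP headers in use.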

/*
 * Note: this file originally auto-generated by mib2c using
 *  : mib2c.iterate.conf 17821 2009-11-11 09:00:00Z dts12
 */
#ifndef POOLTABLE_H
#define POOLTABLE_H

/* function declarations */
void init_poolTable(void);
void initialize_table_poolTable(void);
Netsnmp_Node_Handler poolTable_handler;
Netsnmp_First_Data_Point poolTable_get_first_data_point;
Netsnmp_Next_Data_Point poolTable_get_next_data_point;

/* column number definitions for table poolTable */
#define COLUMN_POOLINDEX         1
#define COLUMN_POOLNAME          2
#define COLUMN_POOLHEALTH        3
#define COLUMN_POOLAVAIL         4
#define COLUMN_POOLUSED          5
#define COLUMN_POOLTOTAL         6
#define COLUMN_POOLCOMPRESSRATIO 7
#define COLUMN_POOLDEDUPRATIO    8
#endif /* POOLTABLE_H */

From MIB ⇒ C (cont.)

/** Handles requests for the poolTable entries. */
int
poolTable_handler(netsnmp_mib_handler *handler,
                  netsnmp_handler_registration *reginfo,
                  netsnmp_agent_request_info *reqinfo,
                  netsnmp_request_info *requests)
{
    netsnmp_request_info *request;
    netsnmp_table_request_info *table_info;
    struct poolTable_entry *table_entry;
    char result[ZFS_MAXPROPLEN];

    switch (reqinfo->mode) {
    case MODE_GET:
        for (request = requests; request; request = request->next) {
            table_entry = (struct poolTable_entry *)
                netsnmp_extract_iterator_context(request);
            table_info = netsnmp_extract_table_info(request);

            switch (table_info->colnum) {
            ...
            case COLUMN_POOLHEALTH:
                if (!table_entry) {
                    netsnmp_set_request_error(reqinfo, request,
                                              SNMP_NOSUCHINSTANCE);
                    continue;
                }
                /* Pool health. */
                if (get_zpool_prop(libzfs_handle, table_entry->poolName,
                                   ZPOOL_PROP_HEALTH, result) == ERROR)
                    netsnmp_set_request_error(reqinfo, request,
                                              SNMP_NOSUCHINSTANCE);
                else {
                    strcpy(table_entry->poolHealth, result);
                    table_entry->poolHealth_len = strlen(result);
                    snmp_set_var_typed_value(request->requestvb, ASN_OCTET_STR,
                                             (u_char *)table_entry->poolHealth,
                                             table_entry->poolHealth_len);
                }
                ...

Ask libzfs (/dev/zfs) for ZFS Information

First approach, don't do that!

#define COMMAND_ARCSTATS        "/bin/cat /proc//kstat/zfs/arcstats"
#define COMMAND_ZPOOL_HEALTH    "/usr/local/sbin/zpool list -H -o name,health"
#define COMMAND_ZGET_AVAIL_USED "/usr/local/sbin/zfs get -Hpo value used,available"

/* Open the command for reading. */
fp = popen(COMMAND_ZPOOL_HEALTH, "r");
if (fp == NULL) {
    perror("popen");
    return ERROR;
}

i = 0;
while (fgets(line, sizeof(line) - 1, fp) != NULL) {

    line_dup = strdup(line);
    while ((tok_str = strsep(&line_dup, "\t"))) {

        if (n_token == 0) {
            //printf("poolname: %s\n", tok_str);
            name_temp = strdup(tok_str);
        } else if (n_token == 1) {
            tok_str[strlen(tok_str) - 1] = '\0'; /* Remove trailing newline */
            health_temp = strdup(tok_str);
        } else {
            free_pool_info(pool_info);
            return ERROR;
        }
        n_token = (n_token + 1) % 2;
    }
    pool_info[i] = malloc(sizeof(pool_info_t));
    strcpy(pool_info[i]->name, name_temp);
    strcpy(pool_info[i]->health, health_temp);
    ...

Ask libzfs (/dev/zfs) for ZFS Information (cont.)

Much more efficient and cleaner!

#include <stdio.h>
#include <libzfs.h>

int get_zpool_prop(libzfs_handle_t *libzfs_handle, const char *pool_name,
                   zpool_prop_t prop, char result[ZFS_MAXPROPLEN])
{
    zpool_handle_t *zpool_handle;
    int rc;

    if (libzfs_handle == NULL) {
        fprintf(stderr, "Error: libzfs_handle is NULL pointer\n");
        return ERROR;
    }

    zpool_handle = zpool_open_canfail(libzfs_handle, pool_name);
    if (zpool_handle == NULL) {
        fprintf(stderr, "Error: zpool_open_canfail(%p, %s)\n",
                (void *)libzfs_handle, pool_name);
        return ERROR;
    }

    rc = zpool_get_prop(zpool_handle, prop, result, ZFS_MAXPROPLEN, NULL);
    if (rc != SUCCESS) {
        /* Report the pool name: `result` is not valid after a failure. */
        fprintf(stderr, "Error: zpool_get_prop(%p, %s), rc = %d\n",
                (void *)zpool_handle, pool_name, rc);
        zpool_close(zpool_handle);
        return ERROR;
    }

    zpool_close(zpool_handle);
    return SUCCESS;
}
...

Demo (server)

Start by means of the init script:
thomas@lxdv65:~>sudo /etc/init.d/zfsnmpd start
[ ok ] Starting ZFS SNMP Sub-Agent: zfsnmpd.
thomas@lxdv65:~>sudo /etc/init.d/zfsnmpd stop
[ ok ] Stopping ZFS SNMP Sub-Agent: zfsnmpd.

Let's look at the syntax first:
thomas@lxdv65:~/dev/zfsnmpd>sudo ./zfsnmpd --help
unknown parameter: --help
syntax: ./zfsnmpd
  -f (optional parameter for running in foreground)
     (if no parameter is given it runs in background as a daemon)
version 0.1, written by [email protected], HPC Group at GSI, 2014

Start in foreground:
thomas@lxdv65:[1]~/dev/zfsnmpd>sudo ./zfsnmpd -f
NET-SNMP version 5.4.3 AgentX subagent connected
zfsnmpd is up and running.

Client for querying (efficiently) ZFS information

thomas@lxdv65:~/dev/zfsnmpd>./zfsnmp
syntax: ./zfsnmp
  ... (specify hostname(s) or IP address(es))
  -f (specify hostfile where each row contains a hostname or IP address)
  -p [optional] parameter named 'problem' to quickly see whether unhealthy ZFS pools exist
example: ./zfsnmp -f zfsnmphosts.txt -p
example: ./zfsnmp 10.10.2.17 lx-zfs01.gsi.de lx-zfs73.gsi.de
version 0.1, written by [email protected], HPC Group at GSI, 2014
thomas@lxdv65:~/dev/zfsnmpd>./zfsnmp localhost -p
host: 'localhost' has one or several unhealthy pools and functioning can be compromised!
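The -p check above reduces the poolHealth column of every queried host to a yes/no answer. A minimal sketch of that reduction, assuming (the slides do not say so) that every health value other than ONLINE counts as a problem; pool_is_healthy and report_unhealthy are hypothetical names, and the warning text is not zfsnmp's:

```c
#include <stdio.h>
#include <string.h>

/* Assumption of this sketch: only "ONLINE" is healthy; every other
 * ZPOOL_PROP_HEALTH value (DEGRADED, FAULTED, OFFLINE, REMOVED, UNAVAIL, ...)
 * counts as a problem. */
static int pool_is_healthy(const char *health)
{
    return strcmp(health, "ONLINE") == 0;
}

/* Count the unhealthy pools among the `n` poolHealth strings fetched from
 * `host` and print one warning line if there is at least one.
 * Returns the number of unhealthy pools. */
int report_unhealthy(const char *host, const char *const health[], int n)
{
    int bad = 0;

    for (int i = 0; i < n; i++)
        if (!pool_is_healthy(health[i]))
            bad++;
    if (bad > 0)
        printf("host: '%s' has %d unhealthy pool(s)\n", host, bad);
    return bad;
}
```

For the localhost pools in the demo, where only domov-0 is DEGRADED, report_unhealthy returns 1.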

...
void synchronous_query(int pflag)
{
    struct host *hp;
    unsigned int oid_i = 1;
    unsigned int host_index = 0;
    double summary_avail = 0;
    double summary_used = 0;
    double summary_total = 0;

    struct timespec start_time, end_time;
    clock_gettime(CLOCK_MONOTONIC, &start_time);

    /* Iterate over all hosts. */
    for (hp = hosts; hp->name; hp++) {
        struct snmp_session ss, *sp;
        struct oid *op;
        struct oid *op_i;
        ...

Client for querying (efficiently) ZFS information (cont.)

thomas@lxdv65:~/dev/zfsnmpd>./zfsnmp localhost localhost
localhost
 |   name     health     available      used     total  compress     dedup
 +-1 domov-0  DEGRADED      86.39M     0.16M    86.55M     1.00x     1.00x
 +-2 domov-1  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-3 domov-2  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-4 domov-3  ONLINE        67.64M    18.91M    86.55M     2.38x     1.00x
 +-5 domov-4  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-6 domov-5  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-7 domov-6  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-8 domov-7  ONLINE        86.16M   256.01M   342.18M     1.00x  2048.00x

localhost
 |   name     health     available      used     total  compress     dedup
 +-1 domov-0  DEGRADED      86.39M     0.16M    86.55M     1.00x     1.00x
 +-2 domov-1  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-3 domov-2  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-4 domov-3  ONLINE        67.64M    18.91M    86.55M     2.38x     1.00x
 +-5 domov-4  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-6 domov-5  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-7 domov-6  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-8 domov-7  ONLINE        86.16M   256.01M   342.18M     1.00x  2048.00x

summary: 2 host(s) queried in 0.06 secs with overall capacities (summarized over all hosts)
available space: 1.31G
used space: 0.54G
total space: 1.85G

Client for querying (efficiently) ZFS information (cont.)

thomas@lxdv65:~/dev/zfsnmpd>cat zfsnmphosts.txt
# This denotes a comment
127.0.0.1
10.10.1.1
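The host file above lists one host per row and marks comments with '#'. A minimal sketch of reading such a file; that blank rows and trailing comments are also tolerated is an assumption of this sketch, not documented zfsnmp behaviour, and parse_host_line, read_hosts and HOST_MAX_LEN are hypothetical names:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define HOST_MAX_LEN 256   /* assumed limit for one row of the host file */

/* Clean one row in place: cut a '#' comment and surrounding whitespace.
 * Returns the host name, or NULL if nothing is left (blank/comment row). */
char *parse_host_line(char *line)
{
    char *hash = strchr(line, '#');
    size_t len;

    if (hash != NULL)
        *hash = '\0';
    while (isspace((unsigned char)*line))
        line++;
    len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        line[--len] = '\0';          /* also removes the trailing '\n' */
    return len > 0 ? line : NULL;
}

/* Read up to `max` host names from `fp` into `hosts`; returns how many were
 * stored. Rows that do not fit into `line` are not handled by this sketch. */
int read_hosts(FILE *fp, char hosts[][HOST_MAX_LEN], int max)
{
    char line[HOST_MAX_LEN];
    int n = 0;

    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        char *host = parse_host_line(line);

        if (host != NULL)
            snprintf(hosts[n++], HOST_MAX_LEN, "%s", host);
    }
    return n;
}
```

Applied to the file shown, read_hosts returns the two entries 127.0.0.1 and 10.10.1.1.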

thomas@lxdv65:[1]~/dev/zfsnmpd>./zfsnmp -f zfsnmphosts.txt
127.0.0.1
 |   name     health     available      used     total  compress     dedup
 +-1 domov-0  DEGRADED      86.39M     0.16M    86.55M     1.00x     1.00x
 +-2 domov-1  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-3 domov-2  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-4 domov-3  ONLINE        67.64M    18.91M    86.55M     2.38x     1.00x
 +-5 domov-4  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-6 domov-5  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-7 domov-6  ONLINE        86.40M     0.15M    86.55M     1.00x     1.00x
 +-8 domov-7  ONLINE        86.16M   256.01M   342.18M     1.00x  2048.00x

cannot connect to host: 10.10.1.1

summary: 2 host(s) queried in 6.04 secs with overall capacities (summarized over all hosts)
available space: 0.66G
used space: 0.27G
total space: 0.93G

Summary & Outlook

Summary:
• SNMP sub-agent daemon + query client have been developed.
• Implement trap mechanism.
• Final code polishing, e.g. switch between fprintf(stderr,...) and syslog(LOG_ERR,...).
• Debian package (will require GSI-ZFS.deb).
• Will be publicly available (GPL3), e.g. http://git.stibor.net or http://github.com/stibor

Outlook:
• SNMP query client for Lustre.
thomas@[SSH]apollo:~/linux/lustre/lustre-release/snmp>ll
total 136
drwxr-xr-x 2 thomas thomas  4096 Nov 28 13:32 autoconf
-rw-r--r-- 1 thomas thomas 28864 Nov 28 13:32 Lustre-MIB.txt
-rw-r--r-- 1 thomas thomas 27854 Nov 28 13:32 lustre-snmp.c
-rw-r--r-- 1 thomas thomas  1963 Nov 28 13:32 lustre-snmp.h
-rw-r--r-- 1 thomas thomas 18223 Nov 28 13:32 lustre-snmp-trap.c
-rw-r--r-- 1 thomas thomas  1466 Nov 28 13:32 lustre-snmp-trap.h
-rw-r--r-- 1 thomas thomas 23882 Nov 28 13:32 lustre-snmp-util.c
-rw-r--r-- 1 thomas thomas  8553 Nov 28 13:32 lustre-snmp-util.h
-rw-r--r-- 1 thomas thomas   397 Nov 28 13:32 Makefile.am
-rw-r--r-- 1 thomas thomas   214 Nov 28 13:32 README.install