Monitoring Systems and POWER5/6 LPARs with Ganglia

Michael Perzl – [email protected]

Agenda

• Ganglia – what is it?
• Ganglia components and data flow
• An introduction to RRDTool
• Ganglia metrics – what can be measured?
• New POWER5/6 metrics (AIX & Linux)
• Extending Ganglia with gmetric
• Add device specific information to Ganglia
• Ganglia network communication
• Installation issues
• Where to get Ganglia for AIX and Linux on POWER?
• Best practices
• Future additions / plans
• Discussion
• Links

Ganglia – what is it?

Ganglia – what is it? (1/3)

• Ganglia is an Open Source cluster performance monitoring tool and has been extended to include POWER5/6 features like shared processor LPARs, entitlement, physical CPU usage etc. This session covers:
  – the technical details of Ganglia and the POWER5/6 extensions
  – how to set it up and use it to monitor all LPARs in a single machine and lots of machines

Ganglia – what is it? (2/3)

Ganglia properties:
• scalable distributed monitoring system for high-performance computing systems such as clusters and grids
• based on a hierarchical design targeted at federations of clusters
• relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state
• leverages widely used technologies such as
  – XML for data representation
  – XDR (eXternal Data Representation) for compact, portable data transport
  – RRDtool for data storage and visualization
• uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency
• robust implementation
• Open Source, written in C
  – Downloaded 110,000+ times, 145+ countries, 500+ clusters, 2000+ nodes

Ganglia – what is it? (3/3)

Ganglia properties (cont.):
• has been ported to an extensive set of operating systems and processor architectures:
  – AIX
  – Darwin
  – FreeBSD
  – HP-UX
  – IRIX
  – Linux
  – OSF
  – NetBSD
  – Solaris
  – Windows (via Cygwin)
• is currently in use on 500+ clusters around the world
• has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000+ nodes
  – check http://ganglia.info/ for more details

Ganglia components and data flow

Ganglia components

The Ganglia system consists of:
• two unique daemons:
  – Ganglia Monitoring Daemon (gmond)
    • monitoring daemon, collects the metrics
    • runs on each node
  – Ganglia Meta Daemon (gmetad)
    • polls all gmond clients and stores the collected metrics in Round-Robin Databases (RRDs)
• a PHP-based web frontend
• a few other small utility programs
  – gmetric
    • can be used to easily extend Ganglia with additional user-defined metrics
  – gstat
  – gexec

Ganglia – Schematic View

[Figure: schematic view of Ganglia. From: "Ganglia: Past, Present and Future" by Matt Massie, URL: http://ganglia.info/talks/lug_lbl_talk/]

Ganglia Architecture

[Figure: Ganglia architecture diagram]

Ganglia Monitoring Daemon (gmond)

• Ganglia Monitoring Daemon (gmond) is a multi-threaded daemon which runs on each cluster node you want to monitor. Installation is easy:
  – just the daemon and a configuration file (/etc/gmond.conf)
• gmond has four main responsibilities:
  1. monitor changes in host state
  2. announce relevant changes
  3. listen to the state of all other Ganglia nodes via a unicast or multicast channel
  4. answer requests for an XML description of the cluster state
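
The fourth responsibility (answering requests for an XML description of the cluster state) can be illustrated with a small parsing sketch. This is only a sketch under stated assumptions: the sample document is made up, and the element and attribute names used here (GANGLIA_XML, CLUSTER, HOST, METRIC, NAME, VAL, TYPE, UNITS) as well as the host and cluster names are assumptions that do not appear in the slides. Check them against the XML your own gmond actually returns.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a cluster-state document. The element and attribute
# names are assumptions for illustration, not taken from the slides.
SAMPLE_XML = """\
<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="lpar-cluster">
    <HOST NAME="lpar1" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="4" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
    <HOST NAME="lpar2" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.25" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def parse_cluster_state(xml_text):
    """Return {cluster name: {host name: {metric name: value string}}}."""
    root = ET.fromstring(xml_text)
    state = {}
    for cluster in root.iter("CLUSTER"):
        hosts = state.setdefault(cluster.get("NAME"), {})
        for host in cluster.iter("HOST"):
            metrics = hosts.setdefault(host.get("NAME"), {})
            for metric in host.iter("METRIC"):
                # Values are kept as strings; TYPE says how to interpret them.
                metrics[metric.get("NAME")] = metric.get("VAL")
    return state


state = parse_cluster_state(SAMPLE_XML)
for host, metrics in sorted(state["lpar-cluster"].items()):
    print(host, metrics)
```

A consumer such as gmetad would read a document like this from the network rather than from a string; the sketch leaves out the transport on purpose and only shows how the nested cluster, host and metric structure can be walked.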
• Each gmond transmits information in two different ways:
  – unicasting or multicasting host state in external data representation (XDR) format using UDP messages
  – sending XML over a TCP connection

Ganglia Meta Daemon (gmetad) (1/2)

• Ganglia Meta Daemon (gmetad) is a daemon which typically only runs on one specific cluster node – or on more when using a staged setup. Installation is easy:
  – just the daemon and a configuration file (/etc/gmetad.conf)
• Federation in Ganglia is achieved using a tree of point-to-point connections amongst representative cluster nodes to aggregate the state of multiple clusters. At each node in the tree a gmetad
  – periodically polls a collection of child data sources
  – parses the collected XML
  – saves all numeric volatile metrics to round-robin databases
  – exports the aggregated XML over a TCP socket to clients

Ganglia Meta Daemon (gmetad) (2/2)

• Data sources may be either
  – gmond daemons, representing specific clusters or
  – other gmetad daemons, representing sets of clusters
• Data sources use source IP addresses for access control
  – Multiple IP addresses can be specified for failover
  – The capability is natural for aggregating data from clusters since each gmond daemon contains the entire state of its cluster

Ganglia PHP web frontend (1/2)

Web frontend properties:
• provides a view of the gathered information via real-time dynamic web pages
• displays Ganglia data in a meaningful way for system administrators and users
  – For example, one can view the CPU utilization over the past hour, day, week, month, or year
  – The web frontend shows similar graphs for memory usage, disk usage, network statistics, number of running processes, and all other Ganglia metrics

Ganglia PHP web frontend (2/2)

Web frontend properties (cont.):
• depends on the existence of the gmetad which provides it with data from several Ganglia sources
• opens the local port 8651 (by default) and expects to receive a Ganglia XML tree
• the web pages themselves are highly dynamic; any change to the Ganglia data appears immediately on the site
  – This behavior leads to a very responsive site, but requires that the full XML tree be parsed on every page access
  – Therefore, the Ganglia web frontend should run on a fairly powerful, dedicated machine if it presents a large amount of data
• is written in the PHP scripting language and uses graphs generated by gmetad to display history information
• has been tested on many flavors of Unix (primarily Linux) with the Apache web server and the PHP 4.1 module

Ganglia - data flow (1/4)

[Diagram. Elements shown: gmond (one daemon per node/LPAR), /etc/gmond.conf, operating system performance stats, API. Legend: file access, network, web.]

Ganglia - data flow (2/4)

[Diagram. Adds: gmetad (runs on web server), /etc/gmetad.conf, rrdtool, database of statistics, browser.]

Ganglia - data flow (3/4)

[Diagram. Adds: Ganglia FE scripts, Apache2 + PHP5.]

Ganglia - data flow (4/4)

[Diagram. Adds: gmetric (user command).]

Ganglia - data flow again

[Diagram. Several gmond daemons (one per node/LPAR, each with its own /etc/gmond.conf) and only one gmetad instance (with /etc/gmetad.conf) on the web server, together with rrdtool, the database of statistics, the PHP scripts, Apache2 + PHP5 and the browser. Legend: file access, network, web.]

An introduction to RRDTool

RRDTool

• Homepage: http://oss.oetiker.ch/rrdtool/
• RRD is the acronym for Round-Robin Database. RRD is a system to store and display time-series data (e.g., network bandwidth, machine-room temperature, server load average). It stores the data in a very compact way that will not expand over time (fixed size of DB), and it presents useful graphs by processing the data to enforce a certain data density. It can be used either via simple wrapper scripts (from shell or Perl) or via frontends that poll network devices and put a friendly user interface on it.

RRDTool is the industry standard tool to store and display time-series data!

RRDTool example graph

[Figure: example graph taken from http://oss.oetiker.ch/rrdtool/gallery/index.en.html]

Graph shows inbound and outbound call traffic going in and out of the switch via the 6 trunks connected to the Diamond exchange. Inbound traffic is shown as positive and uses a lowest-free fill method. Outbound traffic is shown as negative and uses a distributed fill method. Tech details on RRDtrac.
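
The "fixed size of DB" property can be sketched in a few lines of Python. This is a deliberately simplified model, not how RRDtool is implemented: the class name RoundRobinArchive is made up for illustration, and heartbeat handling, interpolation between step boundaries, counter wrap-around and consolidation of several points into one row are all left out. The step (300 seconds), the archive length (24 rows) and the first four odometer readings are borrowed from the rrdtool example on the following slide; the later readings are synthetic.

```python
from collections import deque

STEP = 300   # seconds between updates, as in the deck's rrdtool example
ROWS = 24    # number of rows the archive keeps


class RoundRobinArchive:
    """Toy model of a fixed-size archive fed by an ever-increasing counter."""

    def __init__(self, rows):
        # A bounded deque drops the oldest row once 'rows' entries exist,
        # so the storage never grows beyond its initial size.
        self.rows = deque(maxlen=rows)
        self.last = None  # (timestamp, counter) of the previous update

    def update(self, timestamp, counter):
        if self.last is not None:
            prev_ts, prev_counter = self.last
            # A counter data source is stored as a rate per second.
            rate = (counter - prev_counter) / (timestamp - prev_ts)
            self.rows.append((timestamp, rate))
        # The very first update only sets the baseline; no rate is known yet.
        self.last = (timestamp, counter)


archive = RoundRobinArchive(ROWS)

# First four readings from the example (timestamp, kilometres).
updates = [(920804700, 12345), (920805000, 12357),
           (920805300, 12363), (920805600, 12363)]
for ts, km in updates:
    archive.update(ts, km)
first_rates = [rate for _, rate in archive.rows]
print("rates in km/s:", first_rates)

# Synthetic readings: keep driving 3 km per step for 50 more steps.
ts, km = updates[-1]
for _ in range(50):
    ts += STEP
    km += 3
    archive.update(ts, km)
print("rows kept:", len(archive.rows))
```

After 53 computed rates only the newest 24 remain, which is the round-robin behaviour the slide describes: old data is overwritten instead of the file growing.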
RRDTool example

    # rrdtool create test.rrd \
        --start 920804400 \
        --step 300 \
        DS:km:COUNTER:600:U:U \
        RRA:AVERAGE:0.5:1:24
    # rrdtool update test.rrd 920804700:12345 920805000:12357 920805300:12363
    # rrdtool update test.rrd 920805600:12363 920805900:12363 920806200:12373
    # rrdtool update test.rrd 920806500:12383 920806800:12393 920807100:12399
    # rrdtool update test.rrd 920807400:12405 920807700:12411 920808000:12415
    # rrdtool update test.rrd 920808300:12420 920808600:12422 920808900:12423
    # rrdtool graph kilometer.png \
        --start 920804400 \
        --end 920808000 \
        DEF:mykm=test.rrd:km:AVERAGE \
        LINE2:mykm#FF0000

Ganglia metrics – what can be monitored?

Metrics

Definition of a metric:
• A metric is a certain observed property of the system.

Number of metrics:
• 34 standard metrics, i.e., metrics available (defined) on all platforms
• Additional platform dependent metrics available
  – Solaris
    • 8 additional metrics available
  – HP-UX
    • 4 additional metrics available
  – AIX
    • 18 additional new metrics available for POWER5/6 !!!
    • details later….
