How to see what is happening inside your OpenStack using Elastic Stack and Prometheus

Introduction & Agenda
• About me
- Csaba Patyi
- Consultant and Instructor at Component Soft Ltd.
- 6 years of experience on the Ops side
- Mirantis and COA Certs
- Linux Foundation Certs
• Agenda
- About Component Soft
- Logging with Elastic Stack
- Monitoring with Prometheus
- (Monasca and Ceilosca)

About Component Soft
• Educational Services:
- Bash, Perl, Python courses
- Red Hat Linux, Advanced Linux
- Java, Scala and C++ courses
- OpenStack, Docker, Kubernetes, Ceph
- Software testing methodologies
- Apache, Tomcat, MySQL
- Veritas Storage Foundation and Cluster
- Authorized Oracle courses
- Next Gen. Telecom and Networking
• Consulting Services:
- OpenStack and Ceph consulting and support services
- Docker and Kubernetes consulting and support services
- Open Source Full Stack consulting and support services
• Contact Us:
- Website: http://www.componentsoft.eu/
- Office and training site: 1116 Budapest, Fehérvári street 126-128
- Phone: +36-1-487 4040
- Fax: +36-1-487-4047

What is OpenStack?
- The technical term: a collection of independent but related projects
[Figure: the OpenStack services and their roles]
- Dashboard (Horizon): provides the web UI
- Identity Service (Keystone): provides authentication & authorization
- Compute Service (Nova)
- Network Service (Neutron): provides connectivity
- Image Store (Glance): provides OS templates
- Object Store (Swift): provides a backstore
- Block Storage Service (Cinder): provides volumes
- Orchestration Service (Heat): automates creation
- Telemetry Service (Ceilometer): measures usage

Distribution of services
[Figure: distribution of OpenStack services across node roles]
- Controller node (3): horizon, keystone, nova-api, nova-scheduler, glance, cinder, neutron-server, swift-proxy, MySQL, RabbitMQ/Qpid
- Network node: neutron-l3-agent, neutron-l2-agent
- Compute node: nova-compute, neutron-l2-agent
- Storage node: swift-account, swift-container, swift-object

VM provisioning in-depth
[Figure: numbered VM provisioning flow between the client, Keystone, nova-api, nova-scheduler, nova-conductor, nova-compute, the hypervisor, Glance, Neutron, Cinder, the message queue (AMQP) and the database]

Hit an error? No problem. Check the logs… Where are they?
Number of log files per project and node role:

Project name                    | Controller | Network | Compute | Storage
Horizon                         | 3          | -       | -       | -
Keystone                        | 5          | -       | -       | -
Nova                            | 9          | 2       | 2       | -
Neutron                         | 3          | 10      | 2       | -
Glance                          | 2          | -       | -       | -
Cinder                          | 5          | -       | -       | 1
Ceilometer (+ aodh and gnocchi) | 23         | 1       | 1       | 1
Heat                            | 4          | -       | -       | -
Swift                           | 2          | -       | -       | 1
SUM                             | 56         | 13      | 5       | 3

3 Controller + 2 Network + 20 Compute + 6 Storage nodes = 31 nodes and 312 log files.
# OpenStack log file locations per service: https://docs.openstack.org/
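The 312 figure can be checked from the per-service counts: sum each node role's column, multiply it by the number of nodes with that role, and add the results. A small Python check; the counts are read off the table in column order (Controller, Network, Compute, Storage), which is an interpretation of the slide layout.

```python
# Log files per project and node role, read off the table
# (roles a project does not run on are simply omitted).
LOG_FILES = {
    "Horizon":    {"controller": 3},
    "Keystone":   {"controller": 5},
    "Nova":       {"controller": 9, "network": 2, "compute": 2},
    "Neutron":    {"controller": 3, "network": 10, "compute": 2},
    "Glance":     {"controller": 2},
    "Cinder":     {"controller": 5, "storage": 1},
    "Ceilometer": {"controller": 23, "network": 1, "compute": 1, "storage": 1},
    "Heat":       {"controller": 4},
    "Swift":      {"controller": 2, "storage": 1},
}

# Number of nodes per role in the example deployment.
NODES = {"controller": 3, "network": 2, "compute": 20, "storage": 6}


def per_role_totals(log_files):
    """Sum the log-file counts of all projects for each node role."""
    totals = {role: 0 for role in NODES}
    for counts in log_files.values():
        for role, n in counts.items():
            totals[role] += n
    return totals


def cluster_total(log_files, nodes):
    """Total log files in the cluster: per-role sum times node count."""
    totals = per_role_totals(log_files)
    return sum(totals[role] * nodes[role] for role in nodes)


print(per_role_totals(LOG_FILES))
print(sum(NODES.values()), "nodes,", cluster_total(LOG_FILES, NODES), "log files")
```

That is 3 × 56 + 2 × 13 + 20 × 5 + 6 × 3 = 312 files spread over 31 machines, which is the argument for shipping them to one central place.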
Logging with Elastic Stack
• Core components:
- Elasticsearch → stores and indexes logs, which makes fast search possible
- Logstash → transforms incoming logs and sends them to Elasticsearch
- Kibana → web interface for the Elastic Stack
• New modules/plugins:
- X-Pack (most features only in the paid version)
- Beats
  • Filebeat → a small agent that collects logs and ships them directly to Elasticsearch or Logstash
  • Metricbeat → a small agent that collects metrics and ships them directly to Elasticsearch or Logstash
  • etc.

Configuration for Filebeat
#/etc/filebeat/filebeat.yml
filebeat:
  prospectors:
    # ...
    - paths:
        - "/var/log/nova/*.log"
      exclude_files: ['\.gz$']
      document_type: nova
      tags: [openstack_service_logs, nova]
    # ...
output:
  logstash:
    enabled: true
    hosts:
      - logstash-server:5044
    index: filebeat
    bulk_max_size: 50

Really simple Logstash config
#/usr/share/logstash/pipeline/logstash.conf
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "(?m)^%{TIMESTAMP_ISO8601:date}\s+\d+\s+(?
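The grok filter's job is to split each OpenStack log line into searchable fields. A rough Python equivalent, assuming the common oslo.log layout (timestamp, PID, level, logger name, optional request context in brackets, message); the group names date, pid, level, module, context and msg are hypothetical and need not match the field names of a real pipeline. In Logstash's Ruby-flavoured regex engine, (?m) lets "." match newlines, which is roughly what re.DOTALL does in Python.

```python
import re

# Assumed oslo.log layout (an assumption, not the slide's full grok pattern):
#   <timestamp> <pid> <LEVEL> <logger> [<request context>] <message>
# Group names are hypothetical; use whatever field names your pipeline needs.
OSLO_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<pid>\d+)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<module>[\w.]+)\s+"
    r"(?:\[(?P<context>[^\]]*)\]\s+)?"  # optional "[req-... - - - - -]"
    r"(?P<msg>.*)$",
    re.DOTALL,  # like grok's (?m): "." also matches newlines in multiline events
)


def parse_oslo_line(line):
    """Split one (possibly multiline) OpenStack log event into fields.

    Returns a dict of named groups, or None if the header does not match.
    """
    m = OSLO_LINE.match(line)
    return m.groupdict() if m else None


sample = (
    "2017-05-29 03:42:56.491 1980 WARNING neutron.db.agents_db "
    "[req-2d2958ae-dcaa-40be-b7d3-6e3513a0f2d3 - - - - -] "
    "Agent healthcheck: found 5 dead agents out of 13:"
)
fields = parse_oslo_line(sample)
# prints: WARNING neutron.db.agents_db Agent healthcheck: found 5 dead agents out of 13:
print(fields["level"], fields["module"], fields["msg"])
```

Lines without a bracketed request context (such as traceback continuation lines) still parse, with context set to None.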
2017-05-29 03:42:56.491 1980 WARNING neutron.db.agents_db [req-2d2958ae-dcaa-40be-b7d3-6e3513a0f2d3 - - - - -] Agent healthcheck: found 5 dead agents out of 13:
Type Last heartbeat host
Metering agent 2017-05-26 11:13:39 network2.openstack.local
Loadbalancerv2 agent 2017-05-26 11:13:41 network2.openstack.local
L3 agent 2017-05-26 11:13:41 network2.openstack.local
Metadata agent 2017-05-26 11:13:39 network2.openstack.local
DHCP agent 2017-05-26 11:13:10 network2.openstack.local

What about multiline logs?
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Failed reporting state!
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "file-location", line 320, in _report_state
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent True)
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "another-file-location", line 88, in report_state
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return method(context, 'report_state', **kwargs)
2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "another-another-file", line 169, in call
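Shipped line by line, every physical line of the traceback above becomes a separate event in Elasticsearch. Filebeat's multiline options (multiline.pattern, multiline.negate, multiline.match) can glue them back together: with negate: true and match: after, consecutive lines that do not match the pattern are appended to the preceding line that does. A Python sketch of that grouping rule, assuming a start-of-event pattern that requires the oslo.log header to be followed by " [" (the request context). oslo.log repeats the header on every traceback line but without the bracketed context, so only the first line of each event matches. Filebeat evaluates the pattern with Go's regexp package; this pattern only uses constructs that behave the same in Python's re.

```python
import re

# Start-of-event pattern: timestamp, PID, level, logger name, then " ["
# opening the request context. Traceback lines repeat the header but have
# no "[...]" block after the logger name, so they do not match.
START = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} "
    r"[0-9]+ (ERROR|WARNING|INFO|DEBUG|TRACE) [0-9A-Za-z._]+ \["
)


def group_events(lines, pattern=START):
    """Emulate multiline.negate: true + multiline.match: after.

    A matching line starts a new event; every non-matching line is appended
    to the current event. (Simplification: non-matching lines seen before
    the first match are collected into one leading event.)
    """
    events = []
    for line in lines:
        if pattern.match(line) or not events:
            events.append([line])
        else:
            events[-1].append(line)
    return ["\n".join(event) for event in events]


prefix = ("2017-02-22 08:29:07.810 1495 ERROR "
          "neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent")
lines = [
    prefix + " [-] Failed reporting state!",
    prefix + " Traceback (most recent call last):",
    prefix + ' File "file-location", line 320, in _report_state',
    prefix + " True)",
    "2017-05-29 03:42:56.491 1980 WARNING neutron.db.agents_db "
    "[req-2d2958ae-dcaa-40be-b7d3-6e3513a0f2d3 - - - - -] "
    "Agent healthcheck: found 5 dead agents out of 13:",
    "Type Last heartbeat host",
    "DHCP agent 2017-05-26 11:13:10 network2.openstack.local",
]
events = group_events(lines)
print(len(events))  # 2: the traceback and the healthcheck warning
```

Note that the level alternation does not include CRITICAL, which oslo.log also emits; a CRITICAL line would be glued onto the preceding event unless it is added to the pattern.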
https://play.golang.org/p/vZWQJt5lQ0

Filebeat multiline config example
#/etc/filebeat/filebeat.yml
filebeat:
  prospectors:
    # ...
    - paths:
        - "/var/log/nova/*.log"
      exclude_files: ['\.gz$']
      document_type: nova
      multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} [0-9]+ (ERROR|WARNING|INFO|DEBUG|TRACE) [0-9A-Za-z._]+ \['
      multiline.negate: true
      multiline.match: after
      tags: [openstack_service_logs, nova]
    # ...
output:
  logstash:
    enabled: true
    hosts:
      - logstash-server:5044
    index: filebeat
    bulk_max_size: 50

Kibana Demo

Build test environment for yourself
• https://github.com/itoperatorguy/openstack-elk-docker

What about monitoring and alarming?
• You have to monitor the physical infrastructure
- Traditional:
  • Zabbix
  • Nagios
  • Icinga
  • Zenoss
  • + a few more
- New “players”:
  • Elastic Stack X-Pack (commercial licence)
  • Prometheus
• You have to monitor the virtual infrastructure
- “Traditional”:
  • Ceilometer
- New “players”:
  • Monasca
  • Ceilosca

Prometheus
• “...an open-source systems monitoring and alerting toolkit originally built at SoundCloud.”
• “joined the Cloud Native Computing Foundation (https://www.cncf.io/) in 2016 as the second hosted project after Kubernetes.”
• Some of the Features
- a multi-dimensional data model (time series identified by metric name and key/value pairs)
- no reliance on distributed storage; single server nodes are autonomous
- time series collection happens via a pull model over HTTP
- targets are discovered via service discovery or static configuration
- multiple modes of graphing and dashboarding support
• Some of the Components
- the main Prometheus server which scrapes and stores time series data
- a push gateway for supporting short-lived jobs
- special-purpose exporters (for HAProxy, StatsD, Graphite, etc.)
- an alertmanager to handle alerts

Prometheus Architecture

Demo deployment of Prometheus

Prometheus exporters for OpenStack
• Consul exporter (official)
• cAdvisor
• Elasticsearch exporter
• Memcached exporter (official)
• MongoDB exporter
• MySQL server exporter (official)
• Node/system metrics exporter (official)
• RabbitMQ exporter
• RabbitMQ Management Plugin exporter
• Ceph exporter
• Gluster exporter
• Apache exporter
• HAProxy exporter (official)

Prometheus configuration
global:
  scrape_interval: 30s
  evaluation_interval: 30s
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'prometheus-swarm'
    cluster: swarm
    replica: "1"

rule_files:
  - "alert.rules_nodes"
  - "alert.rules_tasks"
  - "alert.rules_service-groups"

scrape_configs:
  - job_name: 'node-exporter'
    dns_sd_configs:
      - names:
          - 'tasks.node-exporter'
        type: 'A'
        port: 9100
  - job_name: "node"
    scrape_interval: 5s
    static_configs:
      - targets: ['10.10.10.51:9100',
                  '10.10.10.52:9100',
                  '10.10.10.53:9100',
                  '10.10.10.54:9100',
                  '10.10.10.55:9100',
                  '10.10.10.56:9100',
                  '10.10.10.57:9100']

Prometheus query examples
• Overall CPU usage (%) of a server:
- 100 * (1 - avg by(instance)(irate(node_cpu{mode='idle'}[5m])))
• Per-second rate of HTTP 500 responses, as it was one hour ago:
- rate(api_http_requests_total{status="500"}[5m] offset 1h)
• Disk will fill in 4 hours:
- predict_linear(node_filesystem_free[1h], 4*3600) < 0

Prometheus Alarm Syntax
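• ALERT rules (Prometheus 1.x format; Prometheus 2.0 replaced it with YAML rule files):
ALERT <alert name>
  IF <expression>
  [ FOR <duration> ]
  [ LABELS <label set> ]
  [ ANNOTATIONS <label set> ]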
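The CPU and disk queries above rely on irate() and predict_linear(). As a rough illustration of what those two functions compute, here is a plain-Python sketch over lists of (timestamp, value) samples; it ignores Prometheus details such as staleness and lookback handling, and all sample values are hypothetical. Newer node_exporter releases (0.16.0 onwards) renamed the metrics used in these examples to node_cpu_seconds_total and node_filesystem_free_bytes.

```python
# Samples are (unix_timestamp_seconds, value) pairs, oldest first.


def irate(samples):
    """Per-second rate computed from the last two samples of a counter.

    Like Prometheus, a decrease is treated as a counter reset, in which case
    the newest value itself is used as the increase.
    """
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    increase = v_last if v_last < v_prev else v_last - v_prev
    return increase / (t_last - t_prev)


def cpu_usage_percent(idle_counters):
    """100 * (1 - avg(irate(idle CPU seconds))) over the CPUs of one instance."""
    idle_fractions = [irate(s) for s in idle_counters]
    return 100 * (1 - sum(idle_fractions) / len(idle_fractions))


def predict_linear(samples, seconds_ahead):
    """Fit a least-squares line and evaluate it seconds_ahead after the
    newest sample (Prometheus extrapolates from the query evaluation time)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)


# Hypothetical idle-seconds counters of two CPUs, scraped 5 s apart.
cpu0 = [(0, 100.0), (5, 104.5)]  # idle 4.5 s of 5 s
cpu1 = [(0, 200.0), (5, 203.5)]  # idle 3.5 s of 5 s
print(round(cpu_usage_percent([cpu0, cpu1]), 2))  # 20.0

# Hypothetical free bytes, shrinking by 100 bytes/s over the last hour.
disk = [(t, 1_000_000 - 100 * t) for t in range(0, 3601, 60)]
print(predict_linear(disk, 4 * 3600) < 0)  # True: "disk will fill" fires
```

The comparison with 0 is what turns predict_linear() into an alert condition: the function only returns the extrapolated free space, and the rule fires when that value drops below zero.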
• OpenStack log file locations per service: https://docs.openstack.org/
• Prometheus query examples:
- https://prometheus.io/docs/querying/examples
- https://github.com/infinityworksltd/prometheus-example-queries

Q & A