How to see what is happening inside your OpenStack using Elastic Stack and Prometheus

Introduction & Agenda

- About me
  - Csaba Patyi ([email protected])
  - Consultant and Instructor at Component Soft Ltd.
  - 6 years of experience on the Ops side
  - Mirantis and COA certs
  - Linux Foundation certs
- Agenda
  - About Component Soft
  - Logging with Elastic Stack
  - Monitoring with Prometheus
  - (Monasca and Ceilosca)

About Component Soft

- Educational Services
  - Bash, Perl, Python courses
  - Red Hat Linux, Advanced Linux
  - Java, Scala and C++ courses
  - OpenStack, Docker, Kubernetes, Ceph
  - Software testing methodologies
  - Apache, Tomcat, MySQL
  - Veritas Storage Foundation and Cluster
  - Authorized Oracle courses
  - Next Gen. Telecom and Networking
- Consulting Services
  - OpenStack and Ceph consulting and support services
  - Docker and Kubernetes consulting and support services
  - Open Source Full Stack consulting and support services
- Contact Us
  - Website: http://www.componentsoft.eu/
  - Office and training site: 1116 Budapest, Fehérvári street 126-128
  - Phone: +36-1-487 4040
  - Fax: +36-1-487-4047
  - E-mail: [email protected]

What is OpenStack?

- The technical term: a collection of independent but related projects
  - Horizon (Dashboard): provides web UI
  - Ceilometer (Telemetry Service): measures usage
  - Swift (Object Store): provides backstore
  - Neutron (Network Service): provides connectivity
  - Nova (Compute Service)
  - Glance (Image Store): provides OS templates
  - Cinder (Block Storage): provides volumes
  - Heat (Orchestration Service): automates creation
  - Keystone (Identity Service): provides authentication & authorization

Distribution of services

[Figure: distribution of services across Controller node (3), Network node, Storage node and Compute node. Services shown: neutron-server, horizon, neutron-l3-agent, cinder, neutron-l2-agent, glance, swift-account, nova-api, swift-container, nova-scheduler, swift-object, keystone, MySQL, nova-compute, swift-proxy, RabbitMQ/Qpid]

VM provisioning in-depth

[Figure: numbered VM provisioning flow between Client, Keystone, Glance, hypervisor, nova-api, nova-compute, Neutron, AMQ, nova-scheduler, Cinder, nova-conductor and DB]

Hit an error? No problem. Check the logs… Where are they?

Node role / Project name         Controller  Network  Compute  Swift Storage
Horizon                                   3
Keystone                                  5
Nova                                      9        2        2
Neutron                                   3       10        2
Glance                                    2
Cinder                                    5                                1
Ceilometer (+ aodh and gnocchi)          23        1        1              1
Heat                                      4
Swift                                     2                                1
SUM                                      56       26        5              3

3 Controller, 2 Network, 20 Compute, 6 Storage nodes == 31 nodes and 312 log files.

    # OpenStack log file locations per service: https://docs.openstack.org//config-reference/
    # Count the log files under /var/log/ whose path mentions each project name.
    for i in horizon keystone nova neutron glance cinder ceilometer aodh gnocchi heat swift; do
        COUNT=$(find /var/log/ | grep -E "$i" | grep -c 'log$')
        echo "$i : $COUNT"
    done

Elastic Stack to the rescue

- Formerly known as the ELK stack

  - Elasticsearch → stores and indexes logs, which makes fast search possible
  - Logstash → transforms incoming logs and sends them to Elasticsearch
  - Kibana → web interface for the Elastic Stack
- New modules/plugins:
  - X-Pack (most features only in the paid version)
  - Beats
    - Filebeat → a small agent application that collects logs and sends them directly to Elasticsearch or Logstash
    - Metricbeat → a small agent application that collects metrics and sends them directly to Elasticsearch or Logstash
    - Etc.

Configuration for filebeat

    # /etc/filebeat/filebeat.yml
    filebeat:
      prospectors:
        # ...
        - paths:
            - "/var/log/nova/*.log"
          exclude_files: ['\.gz$']
          document_type: nova
          tags: [openstack_service_logs, nova]
        # ...
    output:
      logstash:
        enabled: true
        hosts:
          - logstash-server:5044
        index: filebeat
        bulk_max_size: 50

Really simple Logstash config

    # /usr/share/logstash/pipeline/logstash.conf
    input {
      beats {
        port => 5044
      }
    }
    filter {
      grok {
        # "loglevel" is an assumed field name; "module" is the field tested below.
        match => { "message" => "(?m)^%{TIMESTAMP_ISO8601:date}\s+\d+\s+(?<loglevel>AUDIT|CRITICAL|DEBUG|INFO|TRACE|WARNING|ERROR)\s(?<module>\S+).*$" }
      }
      if [module] == "iso8601.iso8601" {
        drop {}
      }
    }
    output {
      elasticsearch {
        hosts => "elasticsearch-server:9200"
        sniffing => true
        manage_template => false
        index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
        document_type => "%{[@metadata][type]}"
      }
    }

What about multiline logs?

    2017-05-29 03:42:56.491 1980 WARNING neutron.db.agents_db [req-2d2958ae-dcaa-40be-b7d3-6e3513a0f2d3 - - - - -] Agent healthcheck: found 5 dead agents out of 13:
        Type Last heartbeat host
        Metering agent 2017-05-26 11:13:39 network2.openstack.local
        Loadbalancerv2 agent 2017-05-26 11:13:41 network2.openstack.local
        L3 agent 2017-05-26 11:13:41 network2.openstack.local
        Metadata agent 2017-05-26 11:13:39 network2.openstack.local
        DHCP agent 2017-05-26 11:13:10 network2.openstack.local

    2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Failed reporting state!
    2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
    2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "file-location", line 320, in _report_state
    2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent True)
    2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "another-file-location", line 88, in report_state
    2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return method(context, 'report_state', **kwargs)
    2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "another-another-file", line 169, in call
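The two samples above show why shipping logs line by line is not enough: continuation lines either lack the timestamp header or repeat it without the bracketed request field. As a rough illustration of the grouping a multiline rule has to perform, here is a minimal Python sketch. It assumes the header pattern from the Filebeat multiline example in this deck (with the dot before the milliseconds escaped) and uses Python's `re` module rather than the Go regexp engine that Filebeat uses. The names `HEADER` and `group_events` and the shortened sample lines are hypothetical.

```python
import re

# Assumed header pattern: a line that matches it starts a new log event.
HEADER = re.compile(
    r'^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} [0-9]+ '
    r'(ERROR|WARNING|INFO|DEBUG|TRACE) [0-9A-Za-z._]+ \['
)


def group_events(lines):
    """Emulate multiline.negate: true with multiline.match: after.

    Lines that do NOT match HEADER are appended after the last line that did.
    """
    events = []
    for line in lines:
        if HEADER.match(line) or not events:
            events.append([line])      # header line: start a new event
        else:
            events[-1].append(line)    # continuation: attach to previous event
    return ["\n".join(event) for event in events]


# Shortened versions of the sample log lines (request id abbreviated).
sample = [
    "2017-05-29 03:42:56.491 1980 WARNING neutron.db.agents_db [req-2d2958ae - - - - -] Agent healthcheck: found 5 dead agents out of 13:",
    "    Type Last heartbeat host",
    "    Metering agent 2017-05-26 11:13:39 network2.openstack.local",
    "2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Failed reporting state!",
    "2017-02-22 08:29:07.810 1495 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):",
]

events = group_events(sample)
print(len(events))
```

With these five sample lines the sketch yields two events: the health-check warning with its table rows, and the error with its traceback line.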

https://play.golang.org/p/vZWQJt5lQ0

Filebeat multiline config example

    # /etc/filebeat/filebeat.yml
    filebeat:
      prospectors:
        # ...
        - paths:
            - "/var/log/nova/*.log"
          exclude_files: ['\.gz$']
          document_type: nova
          # A line matching this pattern starts a new event; every other line
          # is appended to the previous one (negate: true, match: after).
          multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} [0-9]+ (ERROR|WARNING|INFO|DEBUG|TRACE) [0-9A-Za-z._]+ \['
          multiline.negate: true
          multiline.match: after
          tags: [openstack_service_logs, nova]
        # ...
    output:
      logstash:
        enabled: true
        hosts:
          - logstash-server:5044
        index: filebeat
        bulk_max_size: 50

Kibana Demo

Build test environment for yourself

- https://github.com/itoperatorguy/openstack-elk-docker

What about monitoring and alarming?

- You have to monitor the physical infrastructure
  - Traditional:
    - Zabbix
    - Nagios
    - Icinga
    - Zenoss
    - + a few more
  - New “players”:
    - Elastic Stack X-Pack (commercial licence)
    - Prometheus
- You have to monitor the virtual infrastructure
  - “Traditional”:
    - Ceilometer
  - New “players”:
    - Monasca
    - Ceilosca

Prometheus

- “...an open-source systems monitoring and alerting toolkit originally built at SoundCloud.”
- “joined the Cloud Native Computing Foundation (https://www.cncf.io/) in 2016 as the second hosted project after Kubernetes.”
- Some of the features
  - a multi-dimensional data model (time series identified by metric name and key/value pairs)
  - no reliance on distributed storage; single server nodes are autonomous
  - time series collection happens via a pull model over HTTP
  - targets are discovered via service discovery or static configuration
  - multiple modes of graphing and dashboarding support
- Some of the components
  - the main Prometheus server which scrapes and stores time series data
  - a push gateway for supporting short-lived jobs
  - special-purpose exporters (for HAProxy, StatsD, Graphite, etc.)
  - an alertmanager

Prometheus Architecture

Demo deployment of Prometheus

Prometheus exporters for OpenStack

- Consul exporter (official)
- cAdvisor
- ElasticSearch exporter
- Memcached exporter (official)
- MongoDB exporter
- MySQL server exporter (official)
- Node/system metrics exporter (official)
- RabbitMQ exporter
- RabbitMQ Management Plugin exporter
- Ceph exporter
- Gluster exporter
- Apache exporter
- HAProxy exporter (official)

Prometheus configuration

    global:
      scrape_interval: 30s
      evaluation_interval: 30s
      labels:
        cluster: swarm
        replica: "1"
      # Attach these labels to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
        monitor: 'prometheus-swarm'

    rule_files:
      - "alert.rules_nodes"
      - "alert.rules_tasks"
      - "alert.rules_service-groups"

    # ...

    scrape_configs:
      # ...
      - job_name: 'node-exporter'
        dns_sd_configs:
          - names:
              - 'tasks.node-exporter'
            type: 'A'
            port: 9100
      - job_name: "node"
        scrape_interval: 5s
        static_configs:
          - targets: ['10.10.10.51:9100',
                      '10.10.10.52:9100',
                      '10.10.10.53:9100',
                      '10.10.10.54:9100',
                      '10.10.10.55:9100',
                      '10.10.10.56:9100',
                      '10.10.10.57:9100']

Prometheus query examples

- Show overall CPU usage for a server
    100 * (1 - avg by(instance)(irate(node_cpu{mode='idle'}[5m])))
- HTTP request rate (status 500), per second, an hour ago
    rate(api_http_requests_total{status="500"}[5m] offset 1h)
- Disk will fill in 4 hours (predicted free space drops below zero)
    predict_linear(node_filesystem_free[1h], 4*3600) < 0

Prometheus Alarm Syntax

    ALERT <alert name>
      IF <expression>
      [ FOR <duration> ]
      [ LABELS <label set> ]
      [ ANNOTATIONS <label set> ]
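The optional FOR clause delays an alert until its IF expression has held for the given duration. The following is a simplified Python model of that behaviour, not Prometheus code: the function name `alert_states`, the 30-second evaluation spacing and the sample data are assumptions made for illustration.

```python
def alert_states(samples, for_seconds):
    """Return the alert state after each rule evaluation.

    samples: list of (timestamp_seconds, condition_is_true) pairs, one per
             evaluation, in time order.
    for_seconds: the FOR duration; 0 means the alert fires immediately.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for ts, is_true in samples:
        if not is_true:
            active_since = None          # condition cleared: reset the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        if ts - active_since >= for_seconds:
            states.append("firing")      # held long enough
        else:
            states.append("pending")     # true, but not yet for FOR seconds
    return states


# Hypothetical evaluations every 30 s with a FOR duration of 60 s.
evaluations = [(0, False), (30, True), (60, True), (90, True), (120, False), (150, True)]
states = alert_states(evaluations, for_seconds=60)
print(states)
```

In this model a short spike that clears before the FOR duration elapses never reaches the firing state, which is the usual reason for adding the clause.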

- OpenStack log file locations per service: https://docs.openstack.org//config-reference/
- Multiline solutions for Logstash: https://dzone.com/articles/using-multiple-grok-statements
- Grok pattern names for Logstash: https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns
- Multiline solutions for Filebeat: https://www.elastic.co/guide/en/logstash/current/multiline.html
- Multiline test site for Filebeat: https://play.golang.org/p/uAd5XHxscu
- Docker ELK-stack deployment example: http://elk-docker.readthedocs.io/
- Good Logstash pipeline config examples: https://github.com/sorantis/elkstack
- Prometheus main site: https://prometheus.io/
- Prometheus Docker Swarm usage: https://github.com/bvis/docker-prometheus-swarm
- Prometheus query examples:
  - https://prometheus.io/docs/querying/examples
  - https://github.com/infinityworksltd/prometheus-example-queries

Q & A