AIOps: Anomaly detection with Prometheus
Spice up your Monitoring with AI
Marcel Hild Principal Software Engineer @ Red Hat AI CoE / Office of the CTO
Marcel Hild, Red Hat Marcel Hild, Red Hat HOW RED HAT SEES AI
Represents a workload Applicable to Red Hat’s Valuable to our Enable customers to requirement for our existing core business in customers as specific build Intelligent Apps platforms across the order to increase Open services and product using Red Hat products hybrid cloud. Source development capabilities, providing as well as our broader and production an Intelligent Platform partner ecosystem. efficiency. experience.
010110 101010 Data as the Foundation
3 Marcel Hild, Red Hat HOW RED HAT SEES AI
Project Thoth and Bots http://bit.ly/2zYfb6h
Represents a workload Applicable to Red Hat’s Valuable to our Enable customers to requirement for our existing core business in customers as specific build Intelligent Apps platforms across the order to increase Open services and product using Red Hat products hybrid cloud. Source development capabilities, providing as well as our broader and production an Intelligent Platform partner ecosystem. efficiency. experience.
010110 101010 Data as the Foundation
4 Marcel Hild, Red Hat HOW RED HAT SEES AI
Project Thoth and Bots http://bit.ly/2zYfb6h
Represents a workload Applicable to Red Hat’s Valuable to our Enable customers to requirement for our existing core business in customers as specific build Intelligent Apps platforms across the order to increase Open services and product using Red Hat products hybrid cloud. Source development capabilities, providing as well as our broader and production an Intelligent Platform partner ecosystem. efficiency. experience. OpenDataHub
http://bit.ly/2y6Nh6m 010110 101010 Data as the Foundation
5 Marcel Hild, Red Hat HOW RED HAT SEES AI
Project Thoth and Bots This Talk http://bit.ly/2zYfb6h
Represents a workload Applicable to Red Hat’s Valuable to our Enable customers to requirement for our existing core business in customers as specific build Intelligent Apps platforms across the order to increase Open services and product using Red Hat products hybrid cloud. Source development capabilities, providing as well as our broader and production an Intelligent Platform partner ecosystem. efficiency. experience. OpenDataHub
http://bit.ly/2y6Nh6m 010110 101010 Data as the Foundation
6 Marcel Hild, Red Hat Overview
Prometheus
Long term storage
Atonomy of an Anømål¥
Integration into monitoring setup
7 Marcel Hild, Red Hat What's not in this talk
shiny product and the holy grail of ready solution to turn your monitoring success story how we turned our monitoring setup into spider demon messy monitoring into an advance ai monitoring
8 Marcel Hild, Red Hat What is in this talk
tools and scripts Q&A to problems all OSS to get you started
9 Marcel Hild, Red Hat What is prometheus?
Marcel Hild, Red Hat Prometheus architecture
Everybody architecture slides
11 Marcel Hild, Red Hat Prometheus architecture
Simplistic world view
12 Marcel Hild, Red Hat Prometheus architecture
Simplistic world view
13 Marcel Hild, Red Hat Prometheus architecture
Simplistic world view
14 Marcel Hild, Red Hat Prometheus architecture
Simplistic world view
15 Marcel Hild, Red Hat Prometheus architecture
Simplistic world view
16 Marcel Hild, Red Hat Prometheus is made for
MONITORING ALERTING
SHORT TERM TIME SERIES DB
17 Marcel Hild, Red Hat What do we need for machine learning? ---> DATA DATA DATA
Marcel Hild, Red Hat Long term storage of Prometheus data
Marcel Hild, Red Hat Too good to be true...
● Prometheus at scale ● Global query view ● Reliable historical data storage ● Unlimited retention ● Downsampling
thanos is in the making, but until then?
20 Marcel Hild, Red Hat Works great, but...
● easily hooked into prometheus with write and read endpoint ● Reliable historical data storage ● Great for data science ○ Pandas integration
Eats RAM for breakfast
gh/AICoE/p-influx http://bit.ly/2y6CvwX
21 Marcel Hild, Red Hat Let’s just store it... prometheus scraper
● container can be configured to scrape any prometheus server ● can scrape all or a subset of the metrics ● stores data in ceph or S3 compliant storage ● can be queried with spark sql ● Future Proof: path to Thanos
gh/AICoE/p-lts http://bit.ly/2Qw9pho
22 Marcel Hild, Red Hat Harness the power of spark to
● Query stored JSON files ● Distribute the workload ● Use spark library
notebook http://bit.ly/2PlZZVG
23 Marcel Hild, Red Hat What do we need for machine learning? ---> CONSISTENT DATA
Marcel Hild, Red Hat Prometheus Metric Types
Gauge Counter Histogram Summary
A Time Series Monotonically Cumulative Snapshot of Increasing Histogram of Values in a Values Time Window
25 Marcel Hild, Red Hat Prometheus Metric Types
Gauge
Counter
26 Marcel Hild, Red Hat Prometheus Metric Types
Histogram
Cumulative
Summary
Time Window
27 Marcel Hild, Red Hat Anatomy of a metric Label B Label B
Label B
Metric
Time Value Time Value Time Value Time Value
Marcel Hild, Red Hat E.g. docker_latency Hostname Operation Type
Clam Controller Kubelet Enabled Docker Operations Latency Time Value Time Value Time Value Time Value
Marcel Hild, Red Hat E.g. docker_latency Hostname free-stg-master-0-3fb6 Operation Type List Images
Clam Controller Kubelet Enabled Docker True
Operations Latency Time Value Every unique combination of labels Time Value Time Value makes up a Time Series Time Value
Marcel Hild, Red Hat Monitoring is hard ● prometheus doesn't enforce a schema ○ /metrics can expose anything it wants GET /metrics ○ no control over what is being exposed by endpoints or targets ○ it can change if your endpoints change versions ● # of metrics to choose from ○ 1000+ for OpenShift ● State of the Art is Dashboards and Alerting ○ Dashboards and Alerting need domain knowledge ● No tools to explore meta-information in metrics
31 Marcel Hild, Red Hat analysis of metrics meta data
32 Marcel Hild, Red Hat analysis of metrics meta data
Meta-data tooling http://bit.ly/2A1hXHX
33 Marcel Hild, Red Hat Anomaly Types
Marcel Hild, Red Hat Components of Time Series
Trend Seasonality Cyclicity Irregularity
Increase or decrease in the Regular pattern of up and It is a medium-term It refers to variations which series over a period of time. down fluctuations. It is a variation caused by occur due to unpredictable short-term variation circumstances, which repeat factors and also do not occurring due to seasonal in irregular intervals. repeat in particular factors. patterns.
Marcel Hild, Red Hat Anomaly Types
36 Marcel Hild, Red Hat Anomaly Detection with Prophet Predicting future data and dynamic thresholds
● list_images operation ● on OpenShift ● monitored by prometheus ● detecting outliers ● upper and lower bands
Marcel Hild, Red Hat Anomaly Detection with Prophet Extracting trends and seasonality
● list_images operation ● on OpenShift ● monitored by prometheus ● upward trends ● intraday seasonality
CoE/prophet http://bit.ly/2pLzGNj
Marcel Hild, Red Hat 39 Marcel Hild, Red Hat The Accumulator
40 Marcel Hild, Red Hat The Tail Probability
41 Marcel Hild, Red Hat Combined
42 Marcel Hild, Red Hat architecture setup so far
Marcel Hild, Red Hat Research Setup 100% OpenSource Tooling
Marcel Hild, Red Hat Now what? I want to
Marcel Hild, Red Hat Prometheus Training Pipeline
46 Marcel Hild, Red Hat Marcel Hild, Red Hat ● Ready to use container ● Local deployment ● Kubernetes ● OpenShift build config
CoE/prom-ad http://bit.ly/2yulCfh
Marcel Hild, Red Hat Runtime configuration
Expose predictions via /metrics endpoint
49 Marcel Hild, Red Hat Marcel Hild, Red Hat Marcel Hild, Red Hat Alerting Rules
52 Marcel Hild, Red Hat
notebooks gh/AICoE/p-influx http://bit.ly/2PlZZVG http://bit.ly/2y6CvwX
CoE/prophet
http://bit.ly/2pLzGNj Project Thoth and Bots http://bit.ly/2zYfb6h Meta-data tooling QUESTIONS? http://bit.ly/2A1hXHX
CoE/prom-ad No more g+ http://bit.ly/2yulCfhfacebook.com/redhatinc
linkedin.com/company/red-hat twitter.com/RedHat OpenDataHub http://bit.ly/2y6Nh6myoutube.com/user/RedHatVideos gh/AICoE/p-lts
http://bit.ly/2Qw9pho Marcel Hild, Red Hat