New tricks with IBM ChatOps:

Achieve Site Reliability Engineering operations with legacy solutions

[Figure: Cloud Service Management and Operations (CSMO) spanning the DevOps environment, Ops / Service Management (ITIL, IT4IT, ZeroOutage), and Site Reliability Engineering]

By nature, IT operations and developers have opposite goals

Jane, Enterprise Developer: Responsible for modernizing existing applications and creating new Cloud Native Workloads.
Todd, Operations / Admin: Responsible for infrastructure, security, and management of the environment.

Both development and IT operations need to meet business demands:

Developers:
• Rapidly create new applications, optimize existing ones, and securely connect their applications with data and services across all clouds
IT Operations and Administrators:
• Quickly set up & manage modern, flexible, and compliant hybrid clouds
• Integrate with new & existing management tools and processes


Change that can be managed. Stability that enables change.

Unified, flexible CSMO toolchain

Dedicated Ops tools vs. ChatOps

Dedicated ITSM tools vs. ChatOps

Traditional Help Desk tools:
• Email
• Phone
• Bridge calls
• …
Plus modern ChatOps tools (Microsoft Teams, WhatsApp, …):
• Instant collaboration between L1, L2, and L3 support, SMEs, various Operations roles, developers, vendors / providers, …
• … and between humans and applications (ITSM, DevOps, etc.) through bots

+ Physical war rooms
Solving through "swarming"

Persistent audit of communication

Incident Management Tool Chain

[Figure: tool chain components (Monitoring of multiple clouds, Event Management, Notification, Collaboration, Logging, Incident Tracing, Ticketing & Trending, Datalake & Analytics, Runbooks, Dashboards & Reporting) arranged along a Monitor, Analyze, Plan, Execute loop]

Incident Management Tool Chain with ChatOps

[Figure: the same tool chain components and Monitor, Analyze, Plan, Execute loop, with ChatOps added]

What is ChatOps?

[Figure: from siloed operations (individual app/website, no sharing or collaboration) to ChatOps (dedicated app/website or platform-based collaboration: human-human collaboration, simple automations, advanced automations, chatbots), contrasted with non-operational consumer-to-business and business-to-business collaboration. Source: Forrester Research, for IT Operations, 2019]

IBM Cloud / ChatOps / November, 2018 / © 2018 IBM Corporation

The inherent benefits of ChatOps in modernizing operations, DevOps and SRE

For existing Operations personas (Todd, Operations / Admin):
• Operators
• Managers
• Level 1-2-3 support
• Subject Matter Experts
Benefits:
• Reduction of “Waste by Motion”: no needless context-switching between tools, easier and closer collaboration between humans.
• Reduction of “Waste by Transport”: no copy/paste between tools (swivel-chair operations), faster access to information.
• Gain the benefits of a dynamic & flexible collaboration platform in parallel to traditional process-oriented tooling, with more transparency.

For new personas (Jane, Enterprise Developer):
• Developers
• Site Reliability Engineers
• Line of Business Owners
Benefits:
• Availability of tools in a familiar environment, improved ramp-up time, and reduced friction to adopt new tools.
• Reduced dependence on traditional operations personas, more collaboration.
• Use your own tools in the new platform.
• More opportunity to learn from others.

Integrate your tools. Simplify your processes. Automate everything.
Higher Transparency & Efficiency. Lower Costs. Better MTTR and other KPIs.

Entry points into ChatOps

Entry points will be different from organization to organization. Some stages may be skipped.

Before ChatOps

Operations support (levels 1, 2, 3), Subject Matter Experts, and Developers each work in their own silo, with ad-hoc human-human collaboration. Examples:
• Communication using phone calls, conference calls, WhatsApp/SMS, etc.
• Physical co-location for war rooms, etc.

Monitoring and event tools: Netcool Operations Insight / IBM Cloud Event Management, IBM Alert Notification, Prometheus, Grafana, 3rd party monitoring solutions, others…
Business Value:
• This is the way we've always done it; comfort zone

ChatOps level 1

• Human-to-human collaboration. CSMO tooling remains the same.

• Slight change to processes:
  • Create dedicated channels for Sev1 incidents
  • Document activities in the collaboration tool
Collaboration tools: Mattermost, Microsoft Teams, others…

• Business Value:
  • Persistent record of the incident
  • Remote/virtual war room
  • Clear communications
  • Reduced Mean Time to Know

ChatOps level 2a

• Human-to-human collaboration.
• Monitoring tools send notifications and information to the collaboration tool (incoming integrations).
• Slight change to processes:
  • Send event information to the channel automatically
  • Send a closing event when the incident is resolved
  • Send notification of a new deployment, issue, pull request, etc.
Collaboration tools: Slack, Mattermost, Microsoft Teams, others…
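An incoming integration can be as small as an HTTP POST. The sketch below assumes a Slack "Incoming Webhooks" URL (the one shown is a placeholder) and a hypothetical event dictionary with `severity`, `node`, `summary`, and `resolved` keys; real Netcool or Prometheus payloads use different field names and would need mapping.

```python
import json
import urllib.request

# Placeholder URL: Slack generates one per channel when an incoming webhook is added.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"


def build_event_message(event: dict) -> dict:
    """Turn a (hypothetical) monitoring event into an incoming-webhook payload."""
    state = "RESOLVED" if event.get("resolved") else "OPEN"
    text = f"[{state}] Sev{event['severity']} on {event['node']}: {event['summary']}"
    return {"text": text}


def post_to_channel(payload: dict, url: str = WEBHOOK_URL) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Sending the same function a copy of the event with `resolved` set covers the "closing event" step without a second code path.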

• Business Value:
  • Reduced Mean Time To Detect and Mean Time To Inform

ChatOps level 2b

• Human-to-human collaboration.
• Pull data from monitoring tools into the collaboration tool (outgoing integrations).
• Slight change to processes. Examples:
  • Query the CMDB for information
  • Query the ticketing system
  • Query metrics
Collaboration tools: Slack, Mattermost, Microsoft Teams, others…
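An outgoing integration boils down to parsing what a user typed and routing it to the right back end. In this minimal sketch the command syntax (`<source> <argument>`) and the three stub handlers are hypothetical; a real deployment would replace each stub with a REST call to the CMDB, ticketing system, or metrics store.

```python
from typing import Callable, Dict


# Hypothetical stubs: each would call the real system's API in practice.
def query_cmdb(arg: str) -> str:
    return f"(stub) CMDB record for {arg}"


def query_ticket(arg: str) -> str:
    return f"(stub) status of ticket {arg}"


def query_metrics(arg: str) -> str:
    return f"(stub) metrics for {arg}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "cmdb": query_cmdb,
    "ticket": query_ticket,
    "metrics": query_metrics,
}


def handle_command(text: str) -> str:
    """Parse '<source> <argument>' typed in the channel and return the reply text."""
    parts = text.strip().split(maxsplit=1)
    if len(parts) != 2:
        return "Usage: <cmdb|ticket|metrics> <argument>"
    source, arg = parts[0].lower(), parts[1]
    handler = HANDLERS.get(source)
    if handler is None:
        return f"Unknown source '{source}'. Try: {', '.join(sorted(HANDLERS))}"
    return handler(arg)
```

For example, `handle_command("cmdb server01")` returns the CMDB stub's reply, which the bot then posts back to the channel.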

• Business Value:
  • Reduced Mean Time to Identify and Mean Time to Know

ChatOps level 3

• Human-to-human collaboration.
• Automated interactions with monitoring tools from within collaboration tools.
• Larger process changes:
  • Update ticket/event status
  • Execute runbooks and view responses
Collaboration tools: Slack, Mattermost, Microsoft Teams, others…
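Executing runbooks from a channel is safest when users can only pick from a pre-approved catalogue, never submit raw shell text. This sketch assumes a hypothetical `RUNBOOKS` table mapping chat names to fixed argument vectors (here harmless Python invocations standing in for real scripts) and returns the output formatted for posting back to the channel.

```python
import subprocess
import sys

# Hypothetical catalogue: chat name -> fixed argv. Users choose a name, never a command line.
RUNBOOKS = {
    "python-version": [sys.executable, "--version"],
    "echo-test": [sys.executable, "-c", "print('runbook ok')"],
}


def run_runbook(name: str, timeout: float = 30.0) -> str:
    """Run an approved runbook and format its result as a chat reply."""
    argv = RUNBOOKS.get(name)
    if argv is None:
        return f"No runbook named '{name}'."
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    # Prefer stdout; fall back to stderr so failures are visible in the channel.
    output = (result.stdout or result.stderr).strip()
    return f"Runbook {name} exited with {result.returncode}:\n{output}"
```

Passing an argument list (no `shell=True`) means nothing a user types is ever interpreted by a shell.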

• Business Value:
  • Reduced Mean Time to Identify, Mean Time to Know, and Mean Time to Repair

ChatOps level 4

• Human-to-human collaboration.
• Bots interact with humans and tools within the collaboration channels.
• Larger process changes:
  • Relay the conversation into the ticketing system
  • Monitor for key words and send updates
  • Update the Knowledge Base
  • Improved interactions (virtual agents)
Collaboration tools: Slack, Mattermost, Microsoft Teams, others…
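Relaying a conversation into the ticketing system mostly means turning channel history into a work-log note. The sketch below assumes a hypothetical `ChatMessage` record with a Unix timestamp (most chat APIs report one) and leaves the actual ticket update call to whichever ITSM API is in use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class ChatMessage:
    user: str
    text: str
    ts: float  # Unix timestamp of the message


def to_worklog(ticket_id: str, messages: List[ChatMessage]) -> str:
    """Render channel messages, oldest first, as a ticket work-log note."""
    lines = [f"Work log for {ticket_id} (relayed from ChatOps channel)"]
    for m in sorted(messages, key=lambda msg: msg.ts):
        stamp = datetime.fromtimestamp(m.ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
        lines.append(f"{stamp} {m.user}: {m.text}")
    return "\n".join(lines)
```

Sorting by timestamp matters because chat APIs often page history newest-first.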

• Business Value:
  • Processes are streamlined; manual toil is replaced by automation.
  • Addition of a security/RBAC layer
  • Continuous improvement and learning
  • Leveraging ChatOps between processes

ChatOps level 5

• Human-to-human collaboration.
• Interact with monitoring tools from within collaboration tools.
• Bots interact with humans and tools within the collaboration channels.
Collaboration tools: Slack, Mattermost, Microsoft Teams, others…

• Cognitive bots (cognitive virtual agents):
  • Recommend solutions and/or participants based on history/Knowledge Base
  • Recommend channels where similar discussions took place
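To illustrate "recommend channels where similar discussions took place", here is a deliberately simple stand-in for a cognitive agent: bag-of-words cosine similarity between a new message and each channel's history. A production virtual agent would use far richer language models; the channel names and history text below are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def _vector(text: str) -> Counter:
    """Lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend_channels(query: str, history: Dict[str, str], top_n: int = 2) -> List[Tuple[str, float]]:
    """Return up to top_n channels whose history overlaps the query, best first."""
    q = _vector(query)
    scored = [(name, _cosine(q, _vector(text))) for name, text in history.items()]
    return sorted([s for s in scored if s[1] > 0], key=lambda s: s[1], reverse=True)[:top_n]
```

Channels with no word overlap are dropped rather than returned with a zero score.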

• Business Value:
  • Continuous improvement and learning
  • Easier on-boarding of processes

Incident Lifecycle with ChatOps

Demo time!


Architecture: The “old dog”

[Figure: the NOI manage-from environment (Probe, MessageBus, Omnibus, Impact) and the manage-to environment (IBM Cloud, Edge, Systems), alongside runbooks, monitoring, dashboards, topology management, AIOps, DevOps, tickets, and services]

The first new trick

[Figure: the same architecture, plus 1. Send Events from Omnibus to Slack]

The 2nd new trick

[Figure: the same architecture, plus Hubot connected to Slack: 2. Respond to direct commands, 3. Respond to key words]
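Hubot scripts (JavaScript/CoffeeScript) distinguish `robot.respond`, which fires only on messages addressed to the bot, from `robot.hear`, which fires on any message in the channel. The Python class below is a hypothetical re-creation of that split, not Hubot's API; the `event <number>` command and the `INC<number>` key word are invented examples.

```python
import re
from typing import Callable, List, Pattern, Tuple

Handler = Callable[..., str]


class MiniBot:
    """Sketch of Hubot-style listeners: respond() for direct commands, hear() for key words."""

    def __init__(self, name: str):
        self.name = name
        self._respond: List[Tuple[Pattern, Handler]] = []
        self._hear: List[Tuple[Pattern, Handler]] = []

    def respond(self, pattern: str):
        def deco(fn: Handler) -> Handler:
            self._respond.append((re.compile(pattern, re.IGNORECASE), fn))
            return fn
        return deco

    def hear(self, pattern: str):
        def deco(fn: Handler) -> Handler:
            self._hear.append((re.compile(pattern, re.IGNORECASE), fn))
            return fn
        return deco

    def receive(self, text: str) -> List[str]:
        replies = []
        # Direct command: message starts with the bot's name, e.g. "hubot event 42".
        addressed = re.match(rf"@?{re.escape(self.name)}[:,]?\s+(.*)", text, re.IGNORECASE)
        if addressed:
            for pattern, fn in self._respond:
                m = pattern.search(addressed.group(1))
                if m:
                    replies.append(fn(m))
        # Key words: checked on every message, addressed or not.
        for pattern, fn in self._hear:
            m = pattern.search(text)
            if m:
                replies.append(fn(m))
        return replies


bot = MiniBot("hubot")


@bot.respond(r"^event (\d+)$")
def show_event(m) -> str:
    return f"Looking up event {m.group(1)}..."


@bot.hear(r"\b(INC\d+)\b")
def track_incident(m) -> str:
    return f"Tracking {m.group(1)} in the ticketing system."
```

A message such as `event 42` without the bot's name is ignored, while `INC123` is picked up wherever it appears.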

A 3rd new trick

[Figure: the same architecture, plus Cloud Functions: 4. Respond to buttons and dialogs]
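Buttons in Slack are built with Block Kit: an `actions` block holding `button` elements, each with an `action_id` and a `value`. When a user clicks, Slack POSTs a form-encoded body whose `payload` field is JSON to the app's Request URL, which in this architecture would be a Cloud Functions action. The action IDs `ack_event` and `resolve_event` are hypothetical names chosen for this sketch.

```python
import json
import urllib.parse
from typing import Tuple


def build_event_buttons(serial: int, summary: str) -> dict:
    """Block Kit message with Acknowledge / Resolve buttons for one event."""
    return {
        "text": f"Event {serial}: {summary}",  # fallback text for notifications
        "blocks": [
            {"type": "section",
             "text": {"type": "mrkdwn", "text": f"*Event {serial}*: {summary}"}},
            {"type": "actions",
             "elements": [
                 {"type": "button", "action_id": "ack_event", "value": str(serial),
                  "style": "primary",
                  "text": {"type": "plain_text", "text": "Acknowledge"}},
                 {"type": "button", "action_id": "resolve_event", "value": str(serial),
                  "text": {"type": "plain_text", "text": "Resolve"}},
             ]},
        ],
    }


def parse_block_action(form_body: str) -> Tuple[str, str, str]:
    """Extract (action_id, value, user id) from the form-encoded click callback."""
    payload = json.loads(urllib.parse.parse_qs(form_body)["payload"][0])
    action = payload["actions"][0]
    return action["action_id"], action["value"], payload["user"]["id"]
```

Carrying the event identifier in each button's `value` keeps the function stateless: everything needed to act on the click arrives in the callback.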

Trick #4

[Figure: the same architecture, plus 5. Update events in Omnibus from the chat side]
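Updating the event usually means writing back to the ObjectServer's `alerts.status` table, keyed by its integer `Serial` column. This sketch only builds the SQL; it assumes two hypothetical button actions and one common convention (acknowledge sets `Acknowledged = 1`, resolve clears the event with `Severity = 0`). How the statement reaches the ObjectServer (its REST interface, `nco_sql`, or a library such as node-omnibus) is outside the sketch.

```python
# Hypothetical mapping from a chat button's action_id to an ObjectServer update.
ACTION_SQL = {
    "ack_event": "update alerts.status set Acknowledged = 1 where Serial = {serial};",
    "resolve_event": "update alerts.status set Severity = 0 where Serial = {serial};",
}


def event_update_sql(action_id: str, serial: str) -> str:
    """Build the update for one event, refusing anything but a plain integer Serial."""
    if action_id not in ACTION_SQL:
        raise ValueError(f"unsupported action: {action_id}")
    if not serial.isdecimal():
        # The value comes from a chat callback; never splice unchecked text into SQL.
        raise ValueError(f"invalid Serial: {serial!r}")
    return ACTION_SQL[action_id].format(serial=int(serial))
```

Validating `Serial` as an integer is what keeps a crafted button value from turning into arbitrary SQL.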

What a good dog!

[Figure: the same architecture, plus 6. Execute commands against the manage-to systems]

Many good dogs!

[Figure: the same architecture, with Mattermost and MS Teams shown alongside Slack]

ChatOps level 2b: pulling information into the collaboration channel

Previous process: 15 minutes
1. Identify the affected business service
2. Contact Change and Release Management
3. Extract a report or consult the Maximo databases
4. Identify possibly offending changes
5. Transfer the information to Incident Management

Current process: 20 seconds, a 97.8% reduction in operational effort
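The 97.8% figure follows directly from the two durations once both are expressed in seconds:

```latex
\frac{15\,\text{min} - 20\,\text{s}}{15\,\text{min}}
  = \frac{900\,\text{s} - 20\,\text{s}}{900\,\text{s}}
  = \frac{880}{900}
  \approx 0.978 = 97.8\%
```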

Real Case

• Identify system-related or server-related acronyms
• Interact with the robot by command

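IBM & Customer confidential

[Figure: real-case architecture: collaboration commands from cloud services across the Internet, HTTPS webhooks into the Netcool DMZ, HTTPS (REST) calls via node-omnibus, PowerShell, Grafana over HTTPS, and the customer network (CEMEXNET)]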

ChatOps level 4: SQL app commands. Processes are starting to change, with bots and automation leading to much higher velocity and transparency.

Further reading and questions

Existing lab material: http://ibm.biz/csmo-chatops-lab

Reach out to me directly for material that's in development: @flyingbarron

ChatOps and the Moon Landing http://ibm.biz/csmo-apollo-chatops