New tricks with IBM ChatOps:
Achieve Site Reliability Engineering operations with legacy solutions
Cloud Service Management and Operations (CSMO)
• DevOps
• Environment Ops
• Service Management, ITIL, IT4IT, ZeroOutage
• Site Reliability Engineering

By nature, IT operations and developers have opposite goals

Jane, Enterprise Developer: Responsible for modernizing existing applications and creating new Cloud Native Workloads.
Todd, Operations / Admin: Responsible for infrastructure, security, and management of the environment.

Both development and IT operations need to meet business demands:

Developers
• Rapidly create new applications, optimize existing ones, and securely connect their applications with data and services across all clouds

IT Operations and Administrators
• Quickly set up & manage modern, flexible, and compliant hybrid clouds
• Integrate with new & existing management tools and processes

Change that can be managed
Stability that enables change
Unified, flexible CSMO toolchain

Dedicated Ops tools vs ChatOps
Dedicated ITSM tools vs ChatOps

Traditional tools:
• Email
• Phone
• Bridge calls
• Instant messaging tools (Skype, WhatsApp)
• …
+ Physical war rooms

Plus modern ChatOps tools:
• Microsoft Teams
• …

Instant collaboration between SMEs:
• Help Desk (L1)
• Various Operations roles (L2)
• Developers, SMEs (L3)
• Vendor / Provider
… and between humans and applications (ITSM, DevOps, etc.) through bots

Solving through “swarming”
Persistent audit of communication

Incident Management Tool Chain
Tool chain components:
• Dashboards & Reporting
• Incident Tracing
• Datalake & Analytics
• Runbooks
• Monitoring
• Event Management
• Notification
• Collaboration
• Logging
• Ticketing & Trending

Phases: Monitor (Multiple Clouds), Analyze, Plan, Execute

Incident Management Tool Chain with ChatOps
The same tool chain components and phases (Monitor, Analyze, Plan, Execute), now with ChatOps.

What is ChatOps?
Siloed operations
• Individual app/website
• No sharing or collaboration

ChatOps
• Human-human collaboration
• Simple automations
• Advanced automations

Diagram labels: Dedicated app/website; Platform based; Consumer to Business; Business to Business; Collaboration platform; Chat Bots platform; Non-operational collaboration

Source: Forrester Research, ChatBots for IT Operations, 2019

IBM Cloud / ChatOps / November, 2018 / © 2018 IBM Corporation

The inherent benefits of ChatOps in modernizing operations, DevOps and SRE
For existing Operations personas (Todd, Operations / Admin) and for new personas (Jane, Enterprise Developer)

Personas: Operators, Developers, Site Reliability Engineers, Managers, Level 1-2-3 support, Line of Business Owners, Subject Matter Experts

Benefits for existing Operations personas:
• Reduction of “Waste by Motion”: no needless context-switching between tools, easier and closer collaboration between humans.
• Reduction of “Waste by Transport”: no copy/paste between tools, faster access to information (swivel chair operations).
• Gain the benefits of a dynamic & flexible collaboration platform in parallel to traditional process-oriented tooling – more transparency.

Benefits for new personas:
• Availability of tools in a familiar environment, improved ramp-up time and reduced friction to adopt new tools.
• Reduced dependence on traditional operations personas, more collaboration.
• Use your own tools in the new platform.
• More opportunity to learn from others.

Integrate your tools. Simplify your processes. Automate everything.
• Higher Transparency & Efficiency
• Lower Costs
• Better MTTR and other KPIs

Entry points into ChatOps
Entry points will be different from organization to organization. Some stages may be skipped.
Before ChatOps

Teams work in silos: Operations Support level 1,2,3; Subject Matter Experts; Developers

Ad-hoc human-human collaboration
Examples:
• Communication using phone calls, conference calls, WhatsApp/SMS, etc.
• Physical co-location for war-rooms, etc.

Monitoring tools: Prometheus, Netcool Operations Insight / Cloud Event Management, IBM Monitoring, Alert Notification, Grafana, 3rd Party Solutions, others…

Business Value:
• This is the way we’ve always done it, comfort zone

ChatOps level 1
• Human-to-human collaboration. CSMO tooling remains the same.
• Slight change to processes:
  • Create dedicated channels for Sev1 incidents
  • Document activities in collaboration tool

Collaboration tools: Slack, MatterMost, Microsoft Teams, others…

• Business Value:
  • Persistent record of incident
  • Remote/virtual war-room
  • Clear communications
  • Reduce Mean Time to Know

ChatOps level 2a
• Human-to-human collaboration.
• Monitoring tools send notifications and information to collaboration tool (incoming integrations).
• Slight change to processes:
  • Send event information to channel automatically
  • Send closing event when incident is resolved
  • Send notification of new deployment, issue, pull request, etc.
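A minimal sketch of such an incoming integration, assuming a Slack incoming webhook (which accepts a JSON body with a `text` field). The event fields (`severity`, `node`, `summary`, `resolved`), the function names, and any webhook URL passed in are hypothetical and not taken from the deck.

```python
import json
import urllib.request


def format_event(event: dict) -> dict:
    """Turn a monitoring event into a chat message payload."""
    # A resolved event produces a closing message instead of an alert.
    state = "RESOLVED" if event.get("resolved") else "OPEN"
    text = (
        f"[{state}] severity {event['severity']} on {event['node']}: "
        f"{event['summary']}"
    )
    return {"text": text}


def post_to_channel(webhook_url: str, event: dict) -> int:
    """POST the payload to an incoming webhook and return the HTTP status."""
    body = json.dumps(format_event(event)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical sample event; only the formatting step is exercised here.
sample = {"severity": 5, "node": "db01", "summary": "Disk 95% full"}
print(format_event(sample)["text"])
```

Sending the same event again with `resolved` set covers the "closing event" case without a second code path.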
• Business Value:
  • Reduction of Mean Time To Detect, Mean Time To Inform

ChatOps level 2b
• Human-to-human collaboration.
• Pull data from monitoring tools into collaboration tool (outgoing integrations).
• Slight change to processes. Examples:
  • Query CMDB for information
  • Query ticketing system
  • Query metrics
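The query examples can be sketched as a small command dispatcher. This is an illustration only: the in-memory `CMDB` and `TICKETS` dictionaries stand in for real systems, and the command names (`cmdb`, `ticket`) are assumptions.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for a CMDB and a ticketing system.
CMDB = {"db01": {"owner": "dba-team", "service": "billing"}}
TICKETS = {"INC1001": {"status": "open", "summary": "Disk 95% full on db01"}}


def query_cmdb(key: str) -> str:
    item = CMDB.get(key)
    if item is None:
        return f"No CMDB entry for {key}"
    return f"{key}: owner={item['owner']}, service={item['service']}"


def query_ticket(key: str) -> str:
    ticket = TICKETS.get(key)
    if ticket is None:
        return f"No ticket {key}"
    return f"{key} [{ticket['status']}]: {ticket['summary']}"


# Maps the first word of a chat message to the function that answers it.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "cmdb": query_cmdb,
    "ticket": query_ticket,
}


def handle_command(message: str) -> str:
    """Answer a two-word chat command such as 'cmdb db01'."""
    parts = message.split()
    if len(parts) != 2 or parts[0] not in HANDLERS:
        return "Usage: " + " | ".join(f"{name} <id>" for name in HANDLERS)
    return HANDLERS[parts[0]](parts[1])


print(handle_command("cmdb db01"))
print(handle_command("ticket INC1001"))
```

Adding a metrics query would mean one more handler function and one more entry in `HANDLERS`.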
• Business Value:
  • Reduce Mean Time to Identify, Mean Time to Know

ChatOps level 3
• Human-to-human collaboration.
• Automated interactions with monitoring tools from within collaboration tools.
• Larger process changes:
  • Update ticket/event status
  • Execute runbooks and view responses
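A rough sketch of running a runbook from chat and recording the outcome on a ticket. The `Ticket` class, the `restart-service` runbook, and the status values are hypothetical; a real integration would call the runbook and ticketing tools instead of these placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ticket:
    ticket_id: str
    status: str = "open"
    log: List[str] = field(default_factory=list)


def restart_service(target: str) -> str:
    # Placeholder for a real runbook step; it only reports what it would do.
    return f"restart requested for {target}"


# Registry of runbooks that may be started from the chat channel.
RUNBOOKS: Dict[str, Callable[[str], str]] = {"restart-service": restart_service}


def run_runbook(ticket: Ticket, name: str, target: str) -> str:
    """Run a registered runbook, log it on the ticket, return the chat reply."""
    if name not in RUNBOOKS:
        return f"Unknown runbook: {name}"
    result = RUNBOOKS[name](target)
    ticket.log.append(f"runbook {name} on {target}: {result}")
    ticket.status = "in progress"
    return f"{ticket.ticket_id} -> {ticket.status}: {result}"


ticket = Ticket("INC1001")
print(run_runbook(ticket, "restart-service", "web01"))
```

Because the reply and the ticket log come from the same call, the channel and the ticket show the same record of what was run.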
• Business Value:
  • Reduce Mean Time to Identify, Mean Time to Know, Mean Time to Repair

ChatOps level 4
• Human-to-human collaboration.
• Bots interact with humans and tools within the collaboration channels.
• Larger process changes:
  • Relay conversation into ticketing system
  • Monitor for key words and send updates
  • Update Knowledge Base
  • Improved interactions (virtual agents)
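Relaying a conversation and watching for key words could look roughly like this. The keyword list and the message format (author, text) are assumptions made for the example.

```python
import re
from typing import List, Tuple

# Hypothetical key words that should trigger an update in the channel.
KEYWORDS = re.compile(r"\b(outage|rollback|root cause)\b", re.IGNORECASE)


def relay_conversation(
    messages: List[Tuple[str, str]]
) -> Tuple[List[str], List[str]]:
    """Return (worklog entries for the ticket, keyword alerts for the channel)."""
    worklog: List[str] = []
    alerts: List[str] = []
    for author, text in messages:
        # Every message is copied into the ticket worklog.
        worklog.append(f"{author}: {text}")
        # Each keyword hit produces one alert line.
        for match in KEYWORDS.finditer(text):
            alerts.append(
                f"keyword '{match.group(1).lower()}' mentioned by {author}"
            )
    return worklog, alerts


chat = [
    ("todd", "Looks like an outage on db01"),
    ("jane", "Starting Rollback now"),
]
worklog, alerts = relay_conversation(chat)
print(alerts)
```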
• Business Value:
  • Processes are streamlined, manual toil is replaced by automation.
  • Addition of security/RBAC layer
  • Continuous improvement and learning.
  • Leveraging ChatOps between processes

ChatOps level 5
• Human-to-human collaboration.
• Interact with monitoring tools from within collaboration tools.
• Bots interact with humans and tools within the collaboration channels.
• Cognitive bots (cognitive virtual agents):
  • Recommend solutions and/or participants based on history/Knowledge Base
  • Recommend channels where similar discussions took place
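As a very simplified illustration of recommending channels with similar discussions, the sketch below ranks past channels by word overlap (Jaccard similarity). This is a stand-in technique chosen for brevity, not the cognitive service the deck refers to; the channel names and transcripts are made up.

```python
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-case words with punctuation stripped; very short words dropped."""
    words = (word.strip(".,:;!?").lower() for word in text.split())
    return {word for word in words if len(word) > 2}


def recommend_channels(
    query: str, history: Dict[str, str], top: int = 2
) -> List[Tuple[str, float]]:
    """Rank past channels by Jaccard similarity to the new problem description."""
    query_tokens = tokens(query)
    scored: List[Tuple[str, float]] = []
    for channel, transcript in history.items():
        channel_tokens = tokens(transcript)
        union = query_tokens | channel_tokens
        score = len(query_tokens & channel_tokens) / len(union) if union else 0.0
        if score > 0:
            scored.append((channel, round(score, 2)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical history of earlier incident channels.
history = {
    "#inc-101": "database disk full on db01",
    "#inc-102": "login page timeout",
    "#inc-103": "disk latency on storage array",
}
print(recommend_channels("disk full on db02", history))
```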
• Business Value:
  • Continuous improvement and learning.
  • Easier on-boarding of processes

Incident Lifecycle with ChatOps
Demo time!
Architecture: The “old dog”

Manage-to: IBM Cloud, Edge, Systems
Management services: Runbooks, Monitoring, Dashboards, Topology, AIOps, DevOps, Tickets
Components: MessageBus Probe, Omnibus, Impact
NOI (Manage-From Environment)

The first new trick
Same architecture, with Slack added:
1. Send events (to Slack)
The 2nd new trick
Same architecture, with Slack and Hubot added:
1. Send events (to Slack)
2. Respond to direct commands
3. Respond to key words
A 3rd new trick
Same architecture, with Slack, Hubot and Cloud Functions added:
1. Send events (to Slack)
2. Respond to direct commands
3. Respond to key words
4. Respond to buttons and dialogs
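Responding to buttons can be sketched as a small serverless handler. The sketch assumes a Slack interactive payload of type `block_actions` arriving as a JSON string in a `payload` parameter, and a `main(params)` entry point in the style of an IBM Cloud Functions (OpenWhisk) Python action. The action ids `ack_event` and `run_runbook` are hypothetical.

```python
import json


def main(params: dict) -> dict:
    """Handle one button click and return the message to show in chat."""
    payload = json.loads(params.get("payload", "{}"))
    if payload.get("type") != "block_actions" or not payload.get("actions"):
        return {"text": "Unsupported interaction"}

    action = payload["actions"][0]
    user = payload.get("user", {}).get("id", "unknown")

    # Hypothetical action ids; each maps a button to an operation on an event.
    if action.get("action_id") == "ack_event":
        return {"text": f"Event {action.get('value')} acknowledged by {user}"}
    if action.get("action_id") == "run_runbook":
        return {"text": f"Runbook requested for event {action.get('value')} by {user}"}
    return {"text": f"Unknown action: {action.get('action_id')}"}


# Hypothetical click on an "Acknowledge" button attached to event 4711.
click = {
    "payload": json.dumps(
        {
            "type": "block_actions",
            "user": {"id": "U123"},
            "actions": [{"action_id": "ack_event", "value": "4711"}],
        }
    )
}
print(main(click)["text"])
```

Keeping the handler stateless suits a function-as-a-service runtime: everything it needs arrives in the request.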
Trick #4
Same architecture, with Slack, Hubot and Cloud Functions:
1. Send events (to Slack)
2. Respond to direct commands
3. Respond to key words
4. Respond to buttons and dialogs
5. Update events
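Updating an event from chat usually ends in an HTTP call to the event manager. The sketch below only builds such a request and does not send it. The endpoint layout (`/events/<id>`), the host name, and the field names are hypothetical and are not the Netcool API.

```python
import json
import urllib.request


def build_event_update(
    base_url: str, event_id: str, fields: dict
) -> urllib.request.Request:
    """Build (but do not send) an HTTP request that updates one event."""
    # Hypothetical endpoint layout; replace with the event manager's real API.
    url = f"{base_url.rstrip('/')}/events/{event_id}"
    body = json.dumps(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


request = build_event_update(
    "https://noi.example.com/api/",
    "4711",
    {"acknowledged": True, "owner": "todd"},
)
print(request.get_method(), request.full_url)
```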
What a good dog!
Same architecture, with Slack, Hubot and Cloud Functions:
1. Send events (to Slack)
2. Respond to direct commands
3. Respond to key words
4. Respond to buttons and dialogs
5. Update events
6. Execute commands (on the Manage-to systems)
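Once a bot can execute commands on managed systems, a permission check in front of the command becomes important. A minimal sketch of such a role check follows; the users, roles, and command names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical role assignments and per-command permissions.
USER_ROLES: Dict[str, Set[str]] = {"todd": {"operator"}, "jane": {"developer"}}
COMMAND_ROLES: Dict[str, Set[str]] = {
    "restart": {"operator"},
    "status": {"operator", "developer"},
}


def authorize(user: str, command: str) -> bool:
    """True if the user holds at least one role allowed to run the command."""
    allowed = COMMAND_ROLES.get(command, set())
    return bool(USER_ROLES.get(user, set()) & allowed)


def execute(user: str, command: str, target: str) -> str:
    """Check permissions, then dispatch the command and return the chat reply."""
    if command not in COMMAND_ROLES:
        return f"Unknown command: {command}"
    if not authorize(user, command):
        return f"{user} is not allowed to run '{command}'"
    # A real bot would call the managed system here; this sketch only echoes.
    return f"{command} dispatched to {target} for {user}"


print(execute("todd", "restart", "web01"))
print(execute("jane", "restart", "web01"))
```

Unknown users get an empty role set, so they are denied by default.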
Many good dogs!
Same architecture and interactions 1 to 6, with Mattermost and MSTeams added alongside Slack.

Previous Process: 15 minutes
• Identify the affected business service
• Contact Change and Release Management
• Extract report or consult Maximo databases
• Identify possible offending changes
• Transfer information to Incident Management

Current process: 20 seconds
• Identify system-related or server-related acronyms
• Interact with the robot by command

97.8% reduction in operational effort

Real Case: ChatOps level 2b – Pulling information into collaboration channel
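The 97.8% figure follows directly from the two durations quoted for this case (15 minutes before, 20 seconds now). A short check, with a hypothetical helper name:

```python
def reduction_percent(previous_seconds: float, current_seconds: float) -> float:
    """Relative reduction in effort, as a percentage rounded to one decimal."""
    return round(100 * (previous_seconds - current_seconds) / previous_seconds, 1)


# 15 minutes = 900 seconds; (900 - 20) / 900 is roughly 0.978.
print(reduction_percent(15 * 60, 20))
```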
IBM & Customer confidential

ChatOps level 4 – Processes starting to change, bots and automation leading to much higher velocity and transparency

Architecture diagram labels:
• Zones: Cloud Services, INTERNET, DMZ, CEMEXNET
• Flows: Collaboration commands, App commands
• Protocols: HTTPS, HTTPS Webhooks, HTTPS (REST), SQL
• Components: Grafana, Netcool, node-omnibus, PowerShell

Further reading and questions
Existing lab material: http://ibm.biz/csmo-chatops-lab
Reach out to me directly for material that’s in development: [email protected] / @flyingbarron
ChatOps and the Moon Landing: http://ibm.biz/csmo-apollo-chatops