Trace3 Research

Trend Report

IT Operations Monitoring & Analytics (ITOMA)

Seeing the difference between visibility and insight in IT Ops

Disclaimer – This document has been prepared solely for Trace3's internal research purposes without any commitment or responsibility on our part. Trace3 accepts no liability for any direct or consequential loss arising from the transmission of this information to third parties. This report is current at the date of writing only and Trace3 will not be responsible for informing of any future changes in circumstances which may affect the accuracy of the information contained in this report. Trace3 does not offer or hold itself out as offering any advice relating to investment, future performance or market acceptance.

© 2019 Trace3, Inc. All Rights Reserved

Executive Summary

IT monitoring tools have been around for years. Yet many still only relay what happened, leaving IT operations teams to handle today’s complexity, volumes, variety and rapid change on their own. Today’s IT operations teams need solutions that automate large-scale data collection and present real-time analysis in a unified view across all IT operational silos, detecting anomalies, correlating events and determining root causes to separate real incidents from the background noise. Choosing the right IT monitoring and analytics solution can be the difference between success and failure of an IT organization. As the IT monitoring field continues to grow and mature, innovations such as the integration of AI and machine learning into monitoring tools will enable businesses to make better decisions in real time.

Report Scope

This report presents a survey of current and emerging IT monitoring and analytics techniques, use cases, products and vendors by:

• Identifying the various monitoring approaches.
• Providing an overview of the various monitoring silos.
• Describing the IT Infrastructure Monitoring (ITIM) use case and providing solution examples.
• Introducing the IT Operations Analytics (ITOA) use case.
• Exploring the emergence of AIOps.
• Advancing Forecasts and Recommendations.

This report does not, however, delve into the various use cases ancillary to, and supportive of, IT monitoring, such as IT Service Management (ITSM), Capacity Planning, Notification Management, Incident and Event Management, Root Cause Analysis, Ticketing or Tracking.

Research Method

This Trace3 Research Trend Report's scope was based on research requests received from Trace3 customers and field engineers. From these requests, relevant areas of the technical landscape were mapped out, including the identification of affected 360 View use cases and the primary players in these use cases. From these use cases, mandatory and desirable feature sets were defined, and key vendors were then given the opportunity to present, describe and demonstrate their current product offerings. After detailed analysis, forecasts and recommendations were drawn.

Analysis

Did You Know...

• The average cost of a data center outage is $740,357. [1]

• 65% of companies own more than 10 different commercial monitoring products. [4]

• 50% of surveyed companies indicated that 50% or fewer of their monitoring tools are actively being used. [4]

• The infrastructure monitoring market is expected to reach $2.47 billion by 2020 at a CAGR of 26.3%. [3]

• A performance monitoring solution can increase IT operating expenses by an average of $800 per server, annually. [5]

Landscape

[Figure: IT Monitoring Landscape, showing AIOps, ITOA and ITIM layered above the App, Cloud, Database, Log, Network, Server, Storage and Web monitoring silos]

Today's IT infrastructure combines elements of physical, virtual and cloud environments in a multi-tier hierarchy, mixing both legacy and modern technologies. This heterogeneous conglomeration not only requires an extensive amount of configuration to deploy, it also demands an even larger level of effort and expertise to monitor during daily operations. In general, IT monitoring can be loosely grouped into the following broad categories:

• Monitoring Silos - includes specialized monitoring tools for Application, Cloud, Database, Log, Network, Server, Storage and Web monitoring.

• IT Infrastructure Monitoring (ITIM) - collects real-time availability and utilization data from the various IT infrastructure components, whether cloud or on-prem, including server/hypervisor, network, database and storage resources. Many of these tools can also perform historical analysis of this data or identify trending patterns in it.

• IT Operational Analytics (ITOA) - uses data science principles (e.g., mathematical algorithms and advanced analytics like machine learning) to understand the patterns in data generated across an organization’s IT landscape, detect anomalies from baseline behavior and correlate these variances to a root cause. Many of the tools in this bucket have recently been branded as AIOps tools, indicating that they have AI/ML embedded in their offering.

• Notification Management - a combination of software and hardware that provides a means of delivering operational messages, alerts and alarms to a group of operators based on rule sets and configuration parameters.

It is important to note that the manufacturers noted above are only representatives of each space, and this landscape diagram should not be considered all-inclusive of the options available on the market today. Also, while each vendor above is shown as a distinct, single use case solution, many products actually overlap into neighboring silos and use cases, making the real-world IT monitoring landscape much more "fuzzy" than depicted. Nonetheless, despite this fuzziness, it is useful to talk about a more delineated landscape in order to isolate use cases for easier analysis.


Many Panes of Glass

Traditional IT monitoring often mirrors the traditional IT organizational structure: network monitoring for the network group, storage monitoring for the storage group and so on. Although this approach avoids potential political/cultural friction, it results in a very singular view of an operational component that is shared across the enterprise. While a practice of convenience, this segregation often causes increased cost and decreased efficiency, as troubleshooting is based on limited views of impacted systems and applications, hindering collaboration and significantly increasing mean time to resolution (MTTR).

Although many tools on the market today span multiple related silos, the primary siloed monitoring tools focus on Application Performance Monitoring (APM), Cloud Monitoring, Database Performance Monitoring (DPM - not to be confused with Database Activity Monitoring), Log Management, Network Performance Monitoring, Server Monitoring, Storage Monitoring and Website Monitoring. For the scope of this report, the myriad vendors offering siloed monitoring solutions will not be detailed.

Most enterprises today possess an accumulation of disjointed tools built for a static, generalized IT environment. Given the ever-increasing magnitude of change within a typical IT organization, siloed tools often relegate operational teams to a reactive defense as opposed to a more proactive offensive stance. Because siloed monitoring solutions cause each team to focus on the behavior of its particular resources, root cause determination often takes a considerable amount of time, if it is possible at all. With delayed or unavailable root cause analysis, what may start as a transient, unexplained network anomaly could escalate into a high-severity outage.

Siloed monitoring tool indications are often left to subjective interpretation by the operational team before and during deployment. Rarely are the deployed rules revisited as the environment evolves over time, leaving the environment susceptible to unforeseen vulnerabilities. For example, an "impossible" alert is assigned no predetermined actions since it should never occur. But as things evolve over time, what was impossible becomes possible: the alert fires and a ticket is created, but no response action is identified. As this unexplained, seemingly inert error expands with enterprise volumes, it can generate a deluge of support tickets that are unmanageable, unaddressable and likely to spiral out of control before remediation can be designed and implemented.
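One way to catch the "impossible" alert before it fires is to audit the deployed rules periodically for any rule that has no response action. The following is a minimal sketch of such an audit, not a feature of any particular monitoring product; the rule names, the dictionary layout and the `action` key are hypothetical and assumed only for illustration.

```python
def rules_without_action(rules):
    """Return the names of alert rules that have no response action defined."""
    return [name for name, rule in rules.items() if not rule.get("action")]


# Hypothetical rule set; names and fields are illustrative assumptions.
alert_rules = {
    "disk_full": {"severity": "critical", "action": "page_storage_oncall"},
    "link_flap": {"severity": "warning", "action": "open_ticket"},
    # The "impossible" alert: defined, but nobody decided what to do if it fires.
    "negative_latency": {"severity": "info", "action": None},
    # A rule where the action field was never filled in at all.
    "clock_skew": {"severity": "warning"},
}

orphaned = rules_without_action(alert_rules)
print(orphaned)
```

Running an audit like this whenever the environment changes would surface rules that need an owner and a documented response before they start generating tickets.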

Nonetheless, siloed monitoring is present in almost every enterprise and still forms the foundation that higher monitoring layers are built upon. Siloed tools provide in-depth visibility into a specific infrastructure resource and are tailored to the special nuances of an enterprise's deployment. These specialized tools also provide detailed information on resource utilization for use in trending and capacity planning. Each infrastructure silo has its key indicators, technical specialties and dominant vendor profiles, resulting in custom monitoring tools specialized for the vagaries of each resource discipline.


Single Pane of Glass 1.0 - ITIM

When searching for the proverbial "Single Pane of Glass", the answer is not found in a mythical application that replaces all of the siloed monitoring tools. Successful unification lies in monitoring both the parts and the whole of the enterprise landscape. IT Infrastructure Monitoring (ITIM) takes siloed monitoring a good way toward this unification.

ITIM solutions monitor the availability, capacity, events, utilization and performance telemetry from multiple silos (typically servers and hypervisors, network, storage and database resources). These tools are often purpose built with a largely static infrastructure combination in mind. However, reality dictates they must continuously evolve.

ITIM tools are available as open-source, commercial open-source, proprietary software and SaaS offerings. Licensing options and costs vary greatly, from per-device pricing to pricing by the number of metrics monitored. Pure-play ITIM leaders include Datadog, GroundWork, Icinga, LogicMonitor, ManageEngine, Nagios, Opsview, Paessler AG, ScienceLogic, SolarWinds and Zenoss, among many others. In addition to these pure-play solutions, many large IT incumbents offer ITIM solution suites, including HPE, CA Technologies, IBM, Microsoft, VMware and BMC.

Given the vast array of solution options available, and the gaps between them, many organizations deploy a combination of ITIM tools (which, to some degree, obviates the unifying function sought in an ITIM). The selection of ITIM solution combinations is often complicated by an unclear understanding or under-evaluation of the current and future enterprise landscape, emerging technologies, time to value, skill levels of operational teams and alignment with business stakeholders.

Yet despite the aims and claims of many ITIM products in the market, there is no magical one-size-fits-all "Single Pane of Glass". While ITIM solutions do indeed offer IT operators an aggregated view of enterprise operations consolidated across their various silos, very few offer any form of event correlation or noise reduction on this amalgamated data flow to help filter the needle from the increased stack of hay. Although some ITIM solutions do allow for programmatic scripting and response to known or foreseen patterns and events, this is far from a correlation engine; the haystack still remains.

So how does the typical IT enterprise span this gap without purchasing and integrating every ITIM tool available? Unfortunately, there is no shortcut: today's enterprises must select solutions that not only meet the current infrastructure topology but can also adapt to the predicted landscape of tomorrow, anticipating, for instance, the impact of technologies like converged and hyperconverged infrastructure, cloud-based resources and services, containers and DevOps. This requires much more than a simple feature comparison.

Despite these challenges, ITIM solutions do help span (but not replace) the various silo-based monitors and certainly simplify the typical tasks facing an IT operations group. Even a semi-unified view into the enterprise resources provided by an ITIM solution can greatly speed fault isolation, remediation, root cause analysis and the prescription of corrective actions.


Single Pane of Glass 2.0 - ITOA

As can be seen from the previous analysis, ITIM solutions do not typically deliver the "Single Pane of Glass" that most enterprises crave, hence the recent emergence of IT Operations Analytics (ITOA).

ITOA tools use data science principles (e.g., mathematical algorithms, advanced analytics, machine learning) to learn the operational patterns of systems, storage, network, cloud and any third-party applications from their logs, events and metrics. From these patterns they build a baseline performance profile against which to detect and report anomalies while filtering and ignoring false positives. Analysts and application owners can investigate these anomalies to either remediate issues or adjust the ITOA model to changing norms.
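The idea of learning a baseline and flagging deviations from it can be illustrated with a very small sketch. The example below uses a rolling mean and standard deviation (a simple z-score test) as a stand-in for the far richer models that ITOA products apply; it is not any vendor's algorithm, and the function name, window size, threshold and sample data are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Return (index, value) pairs that deviate from a rolling baseline.

    Each sample is compared with the mean and standard deviation of the
    previous `window` non-anomalous samples, so the baseline follows gradual
    change but is not skewed by the outliers it reports.
    """
    baseline = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(samples):
        if len(baseline) == window:
            mu = mean(baseline)
            sigma = stdev(baseline)
            # A perfectly flat baseline (sigma == 0) is skipped in this sketch.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append((i, value))
                continue  # keep the outlier out of the baseline
        baseline.append(value)
    return anomalies


# Hypothetical CPU utilization samples (percent), one per polling interval.
cpu_percent = [41, 43, 40, 42, 44, 41, 95, 43, 42, 40]
print(detect_anomalies(cpu_percent))  # → [(6, 95)]
```

Because the baseline is recomputed from recent samples rather than fixed at deployment time, the threshold moves with the environment, which is the property that distinguishes this approach from static alert rules.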

This new ITOA approach promises fewer outages, faster problem resolution, optimized resource utilization and increased operational staff productivity. It also presents a potential solution to the constant environmental evolution to which more rigid solutions are unable to adapt. ITOA solutions also often integrate directly with ITSM tools, chat tools and production call tools, allowing operations teams to bring the right operators to a production issue call, as opposed to all of the operators.

As with many emerging technologies, ITOA is still ill-defined, which unfortunately allows many vendors to slap the "ITOA" moniker on more modest products. As such, this report defines ITOA solutions to be those that satisfy all four of the following requirements:

1. Discover complex patterns in vast amounts of data and extract meaningful insights from these patterns.
2. Correlate various operational events to perform root cause analysis.
3. Dynamically learn the behavior of infrastructure to establish baseline behavior and thresholds.
4. Gather data from multiple data center resource silos.

There are only a handful of solutions on the market today that fit into this strictly defined space. These few can be generalized into three approaches:

• Alert-based - Ingests alerts from every type of monitoring tool and correlates detected events into a unified situation. Examples include BigPanda and Moogsoft.
• Log-based - Ingests logs from all sources to correlate situations and detect anomalies. Examples include products by Loom Systems and Unomaly.
• Wire data based - Ingests layer 2 through 7 network communication to develop a real-time transaction state across Web, Application, Database and Storage tiers without the need for agents, logging tools or configuration rules. Examples include ExtraHop, Corvil, NetFort and Clear Clouds.

Most ITOA architectures use a combination of these approaches and augment them with machine data, agent data, synthetic data and human generated data.
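To make the alert-based approach more concrete, the sketch below groups alerts from different monitoring tools into "situations" purely by how close together in time they arrive. This is a deliberately simplified assumption: the products named above are not documented here, and real correlation engines would also weigh signals such as topology and message content. The `Alert` and `Situation` types, the `correlate` function and the 60-second gap are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    timestamp: float  # seconds since an arbitrary start time
    source: str       # monitoring silo that raised the alert
    message: str


@dataclass
class Situation:
    alerts: list = field(default_factory=list)

    @property
    def sources(self):
        """Distinct silos that contributed alerts to this situation."""
        return sorted({a.source for a in self.alerts})


def correlate(alerts, max_gap=60.0):
    """Group alerts into situations by temporal proximity.

    An alert joins the current situation when it arrives no more than
    `max_gap` seconds after that situation's most recent alert.
    """
    situations = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if situations and alert.timestamp - situations[-1].alerts[-1].timestamp <= max_gap:
            situations[-1].alerts.append(alert)
        else:
            situations.append(Situation(alerts=[alert]))
    return situations


# Hypothetical alerts from four separate siloed tools.
incoming = [
    Alert(45.0, "app", "checkout latency above 2s"),
    Alert(0.0, "network", "packet loss on core switch"),
    Alert(600.0, "storage", "volume 80% full"),
    Alert(20.0, "database", "connection pool exhausted"),
]

situations = correlate(incoming)
for s in situations:
    print(len(s.alerts), s.sources)
```

With this sample data, the three alerts within the first minute collapse into one situation spanning the network, database and application silos, while the unrelated storage alert ten minutes later stands alone. Operators would then see two items to triage instead of four.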

It is important to note that most IT monitoring vendors recognize that ITOA features are where customers are pulling the industry. However, ITOA solutions have "big data" principles at their core, allowing them to ingest, correlate and make sense of vast amounts of unstructured data. This DNA must be spliced into the product from inception, making it exceedingly difficult for ITIM vendors to simply add ITOA features to their next product release.


The emergence of AIOps

As the ITOA market has continued to mature and advance, Artificial Intelligence (AI) and Machine Learning (ML) capabilities have found their way into many of the ITOA offerings. Such tools have been labeled in the market as AIOps. This distinction has been used to further separate true ITOA tools from traditional monitoring solutions. These tools utilize machine learning and advanced analytics to provide anomaly detection, root cause determination, event correlation and guidance for resolving common IT problems. AIOps tools sift through the large amounts of log, network, machine and alert data to provide insights into the behaviors and dependencies of IT environments.

[Figure source: https://dzone.com/articles/aiops-the-future-of-it-ops]

In the previous section, we highlighted a number of vendors offering compelling ITOA solutions, such as BigPanda, Moogsoft, Loom Systems, Unomaly, ExtraHop, Corvil, NetFort and ClearClouds. These vendors have all begun to take the plunge into AIOps territory by implementing AI in their platforms. Other vendors beginning to offer solutions in this space include StackState, OpsRamp, Signifai (now a part of New Relic), Anodot and Dynatrace.

The IT Operations Analytics market, now also known as AIOps, will continue to grow. As a result, features like anomaly detection, root cause determination and event correlation will become more robust and more integrated with existing tool sets. One additional challenge we see facing the ITOA market is the ability to centralize data sources. Companies like Cribl.io are beginning to tackle this problem, and we expect other solutions to emerge.

Trace3's Take

Forecasts

1. Siloed offerings will continue to add features to monitor neighboring resources in the continuing quest for the Single Pane of Glass. These consolidations will result in monitoring platforms with similar limitations to those found in today's ITIM solutions but will be a cost-effective alternative for small and medium enterprises.

2. IT Monitoring and APM products will continue to add features traditionally found in each other's markets, blurring the distinctions between application, database and infrastructure silos.

3. The adoption of big data principles by ITOA will follow a similar path to that of previous big data technologies, resulting in an IT operational data lake with a large analytics platform serving up intelligence, visibility tools, reporting and predictive analytics.

4. IT Operations groups will shift from a tools-centric organization structure to a data-driven interdisciplinary approach.

5. Big Data solutions providers will refit their existing suite of tools to handle IT Ops data. This will be an easier transition for them than for the ITIM providers.

6. Advanced analytics and machine learning will become table stakes in monitoring tools. Initially this will create a flurry of unsubstantiated rebranding efforts by vendors eager to catch up, but these will eventually either acquire their way into ITOA or exit the market.

7. ITOA and AIOps will help evolve tomorrow's IT organization from a reactive speeds and feeds provider focused on capacity availability into a proactive data-driven fulfillment engine delivering stability, agility and innovation ahead of business needs.

Recommendations

1. IT organizations implementing or overhauling their IT monitoring suite should define the objective and desired outcome prior to choosing a tool. Tool selection must then be centered on fitting the stated requirements, constraints and process needs (not vice versa).

2. Cultural inertia is one of the main inhibitors to adoption of new IT monitoring paradigms. Therefore, when making toolset selections, organizations should evaluate the skillsets of their existing operational staff and determine how much cultural change can be tolerated.

3. When evaluating any monitoring tool, give preference to those that integrate their data with other tools or expose their APIs to allow for loose coupling with other tools in the enterprise's monitoring landscape. This will allow for the future mix-and-match choices that are inevitable in an ever-evolving IT deployment.

4. There should be a continuous effort not only to use the features of existing tools more effectively and in deeper conjunction with other platforms, but also to identify opportunities for rationalization and consolidation of monitoring tools.

5. IT Monitoring is but one facet of data collection and reporting within the enterprise. IT monitoring can, and should, be brought into alignment with security and business intelligence reporting initiatives.

6. IT monitoring tool suites must integrate seamlessly with the often overlooked supporting operational tools such as ITSM workflow, notification management, incident management, service tools, communication tools and alerting tools.

7. Apart from the primary feature sets, ancillary characteristics of the tools should be included in the selection evaluation, including the required levels of technical knowledge, pricing models, levels of automation, ease of deployment, alerting, visualization and scalability.

8. Remember that analytics goes beyond unstructured text searches and scaled-down alerting. Pattern discovery and inference are critical capabilities to look for when choosing a solution.

Appendix

Featured Use Cases

Application Performance Management

APM monitors and manages the performance and availability of software applications. APM strives to detect and diagnose application performance problems to maintain an expected level of service. Mobile application performance is defined by the user's perception of how well the application performs. This means that the performance of an application is measured by how responsive it is, how quickly it starts up, how well it uses device memory and how well it uses device power.

Cloud Monitoring

Cloud monitoring is the process of reviewing, monitoring and managing the operational workflow and processes within a cloud-based IT asset or infrastructure. It is the use of manual or automated IT monitoring and management techniques to ensure that a cloud infrastructure or platform performs optimally.

Database Monitoring

Database activity monitoring (DAM) is a database security technology for monitoring and analyzing database activity that operates independently of the database management system (DBMS) and does not rely on any form of native (DBMS-resident) auditing or native logs such as trace or transaction logs. DAM is typically performed continuously and in real time.

Database activity monitoring and prevention (DAMP) is an extension to DAM that goes beyond monitoring and alerting.

IT Infrastructure Monitoring

Operational monitoring and management refers to collecting key system performance metrics at periodic intervals over time. This information provides the critical data needed to refine the initial configuration so that it is more closely tailored to requirements, and also helps teams prepare to address new problems that might appear on their own or following software upgrades, increases in data or user volumes, or new application deployments.

IT Operations Analytics

IT operations analytics (ITOA) is the practice of monitoring systems and gathering, processing, analyzing and interpreting data from various IT operations sources to guide decisions and predict potential issues.

Log Management

Log aggregation is an approach to storing large volumes of computer-generated log messages (also known as audit records, audit trails, event logs, etc.) in a centralized repository where they are used as data for IT operations, security analytics or forensics.

Network Performance Management

Network Performance Management is a system that continuously monitors a network and notifies a network administrator through messaging systems (usually e-mail) when a device fails or an outage occurs. Network monitoring is usually performed through the use of software applications and tools.


Sources

1 - Emerson Network Power Study Says Unplanned Data Center Outages Cost Companies Nearly $9,000 Per Minute - Emerson Network Power - 2016
2 - Application Performance Monitoring - Industry Challenges, State of the Art, and the cause for unified monitoring - 2015
3 - Infrastructure Monitoring Market by Technology - Marketsandmarkets.com - 2016
4 - Current Enterprise Application Monitoring Tools Often Siloed and Underutilized by IT Organizations, Reports New Research - AppDynamics - 2015
5 - Avoiding the Hidden Costs of Performance Monitoring Tools - SevOne - 2016


About Trace3 Research

To solve the IT problems of tomorrow, our research engineers leverage Trace3's unique access across the technology landscape to derive impartial insights. By identifying and analyzing technology and market trends, we enable our customers to prepare for and master tomorrow's challenges before they arrive. Trace3 Research leverages our partnerships with 500 established and emerging technology companies, the real-world experience of over 250 engineers, a 3000-client ecosystem and deep relationships with dozens of the top Silicon Valley venture capital firms to spot trends ahead of most industry pundits. This allows you to take advantage of Trace3 Research's unique access to gain an inside advantage on tomorrow's trends and reduce your technical and business risk.

(end of report)
