White Paper IT Operations Management

The State of Analytics in IT Operations

Introduction

If you lived through the AI hype of the 1990s or earlier, you might be skeptical about seeing the term in such frequent use these days. AI can mean many different things, and always has.

It has also changed names a few times, from IT Operations Analytics (ITOA) to Algorithmic IT Ops, AIOps, and Cognitive Operations. But over the past two years, with influential analyst firms like Gartner and Forrester getting on board with the term, AI is getting more respect, and it’s getting more practical.

According to a recent Forrester survey, ITOA is the number one application of AI technology that businesses are considering. Also high on Forrester’s list are business insight and security, both of which are related to ITOA at a fundamental level.

IT Ops analytics starts with data collection, or monitoring data. Analytics depends on data, and lots of it. Data is the food that fuels analytics; without it, analytics has nothing to examine for the patterns or anomalies that provide insight.

IT Operations Analytics holds considerable promise for making day-to-day IT Ops work easier. But what does this mean for IT Ops specialists who aren’t trained in analytics? Do they now need to take classes in data science and learn to write the algorithms that lie at the heart of analytics capabilities?

No. But it does mean that IT Ops specialists should be at least familiar with the kinds of analytics being used, increasingly, in their industry. They should take advantage of whatever analytic capabilities are embedded in their tools, and they should know when to seek guidance from other teams in the organization (security, big data, or business intelligence teams, for example) when they have questions or want to improve their analytics skills.

There’s a lot to consider. Here is an overview of what’s happening today in the ITOA space, along with some expert advice.

IT Ops Teams: Don’t Panic Over Analytics

Compared to specialists in security or big data, where analytics is a core part of the job description, the analytic skills within an IT Ops organization tend to be relatively low, which is to be expected. The technology and the field itself are fairly new within IT Ops. Besides, “analytics within IT Ops isn’t usually something that demands a data scientist,” says Michele Goetz, principal analyst with Forrester Research, who specializes in business insights, artificial intelligence, information management, architecture, and strategy.


“IT Ops teams in mid to large size orgs often tap into the analytic know-how within other teams, if possible,” says Goetz. This might include a security, business intelligence, or big data team that can provide basic help or training for those just getting their feet wet with analytics.

“But what I see is that IT Ops specialists tend to rely on the analytics capabilities within the platforms and solutions acquired for the IT Ops organization,” says Goetz. “These are not the same capabilities you find in security analytics tools, which represent some of the most sophisticated capabilities on the market.”

Instead, these are tools to help with performance and to monitor and predict spending, what Goetz calls “the block-and-tackle job of running and maintaining the platform, keeping the lights on, being agile to support business needs. These are the things that require operational analytics.”

Some teams are running models that give them a better understanding of cost to performance. They’re managing resources with tools that can give them more detail than the higher-level performance metrics they might have used in the past. “This is not big, sophisticated predictive and prescriptive modelling that you might see in other parts of the business. The best IT Ops teams are looking to mine a little deeper into the system data that comes from their infrastructure or learning about the types of queries they can run against the system and figure out better styles of workload management,” Goetz says.

So, don’t panic if you don’t have the skills to be more sophisticated with analytics. But you may want to begin exploring machine learning: “at least take a look at what your own solutions offer for embedded machine learning in the operations,” says Goetz. “This will bring you up a level over the coming year.”

Controlling The Spend: The Number One Target For IT Ops Analytics

Analytics, in addition to all its other uses, has a focus on the spending side of IT Ops. You’re trying to lower your cost to performance. What is the total cost of ownership for your technologies? How do you right-size your resources, and what’s happening with your outsourcing and contracting? The goal is to get smarter about the resources you need and the investments you need to make.

CIOs are constantly under pressure to contain their budgets, which means justifying expenditures against the business value. And much of the analytics that can help with this comes out of the box for many tools.

“Take automated data warehouses,” Goetz continues. “Based on years of understanding how data centers run in the cloud, what those workloads are, all that understanding is built into the tools.”

Rather than requiring users to figure out their particular environment, the machine smarts can guide you via patterns that come preconfigured. Meanwhile, “vendors of IT Ops technology continue to learn how different types of workloads and administrative tasks are informing how you’re managing and optimizing those environments as well as managing it toward your cost to performance models,” Goetz says. “This guidance is not as targeted, of course, as if a or Facebook data scientist were opening the box and tweaking the model. But this is going to be much more of the norm than needing a data scientist in-house. The pretrained environments are usually sufficient.”

IT Service Management: Analytics Drives Issue Resolution

An efficient, authoritative IT service desk can be a business’s best line of defense when customers call with critical software problems. It can put the customer at ease with faster ticket resolution, and resolution is faster still when the problem involves a known issue that can be resolved via smart self-service based on analytics.

“Good analytics leverages information across a number of different sources to help a worker opening up a ticket,” says Jeff Jamieson, CEO of Whitlock Infrastructure Solutions. “A problem is described, and the analytics engine underneath can tell you, essentially, ‘wait... we just had 15 other people log this same problem.’ We’re seeing our customers adopting machine learning as a way to drive down the time and cost of tickets.” (More on machine learning below.)
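
The "15 other people logged this same problem" scenario can be illustrated with a very small sketch. It assumes tickets are plain-text descriptions and uses word-overlap (Jaccard) similarity, which is a stand-in chosen for brevity and not the method of any particular service desk product. The function names, the sample tickets, and the 0.3 threshold are all hypothetical.

```python
import re

def tokens(text):
    """Lowercase a ticket description and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity between two token sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_tickets(new_ticket, open_tickets, threshold=0.3):
    """Return the open tickets whose wording overlaps the new ticket."""
    new_tokens = tokens(new_ticket)
    return [t for t in open_tickets
            if jaccard(new_tokens, tokens(t)) >= threshold]

# Hypothetical sample data for illustration only.
open_tickets = [
    "Cannot log in to email after password reset",
    "Email login fails after password reset",
    "Printer on floor 3 is out of toner",
]
matches = similar_tickets("Email login fails after my password reset", open_tickets)
print(f"{len(matches)} similar tickets already logged")
```

A production engine would likely use richer text features and ticket metadata, but the principle is the same: compare the new description against what has already been logged before a worker starts from scratch.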

As ITSM teams monitor business services they provide to customers, capabilities like smart search, smart ticket, virtual agents for 24x7 support, and social collaboration, all based on machine learning and analytics, help meet the related service level agreements (SLAs). These issues can cover a broad range of areas and may have to do with business processes like order-to-cash or infrastructure services like email.

But the biggest ITSM payoff for analytics may lie in understanding how long it takes to resolve tickets. Identifying root causes and providing solutions before they become widely reported problems improves your business’s reputation, reduces labor costs, and leads to better services.


Anomaly Detection and Resolution via ChatOps

As IT Ops teams use tools to define baselines for normal operations, they’re setting the stage for anomaly detection: the ability to find what’s out of spec or overloaded, conditions that indicate something is out of bounds.

“When there’s an outage or a failure, there’s a common reaction on the business side: ‘Hey, we pay all this money for monitoring tools; why didn’t you catch this problem?’” says Jamieson.

“Typically, you can only catch things that you anticipate. The things that drive our customers crazy are events that they can’t even imagine: events, for example, based on a piece of infrastructure that no one has a clue was there.

“But the beauty of analytics-driven anomaly detection is that you don’t have to know everything that might go wrong. While there are millions of log files that have captured what’s going on in your environment, analytics can point you to 3, 4, or 6 areas that seem to be most relevant to your problem, based on data. This is a new type of opportunity.”

With analytics built into performance monitoring tools, IT Ops teams may have the ability to review timelines for performance on specific servers and see where and when performance took a hit. The next step is to find out why, which is root cause analysis.
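
To make the idea of a baseline concrete, here is a minimal sketch of one common approach: compare each new sample with the mean and standard deviation of the samples just before it. It assumes a simple list of CPU readings; the window size, the threshold of three standard deviations, and the sample values are illustrative assumptions, not settings from any monitoring tool.

```python
from statistics import mean, stdev

def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples that deviate sharply from the preceding window."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: treat any change as out of bounds.
            if samples[i] != mu:
                anomalies.append(i)
        elif abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical CPU utilization readings (percent), one per interval.
cpu = [41, 43, 40, 42, 44, 42, 41, 95, 43, 42]
print(find_anomalies(cpu))  # [7]
```

The returned index tells you where and when performance took a hit; it says nothing yet about why.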

ChatOps can speed up this process since automation can alert IT Ops teams about an issue in play. “ChatOps runs the gamut from service management, where an agent taking customer calls can use bots to improve efficiency,” says Jamieson, “to core monitoring systems, where ChatOps helps you quickly pull together experts and the right folks to explore a problem, and review suggestions made by the chatbots at work within the integration.”

Plus, by leveraging an autonomous agent or a bot to source up data right away, you can discover quickly if your business systems are aligned to resolve the issue, says Jamieson.

“Auto resolution of events based on specific criteria is the goal. It’s expensive to operate a help desk, to have Level 2, Level 3 engineers distracted by having to solve gobs of problems, to have war rooms of people trying to solve problems instead of doing their regular jobs. All of that is a huge cost to IT.”

When AI is paired with runbooks, automated remediation becomes reality.
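
As a rough illustration of "auto resolution of events based on specific criteria," the sketch below pairs matching criteria with runbook steps and escalates anything it does not recognize. The event fields, the rules, and the action strings are all hypothetical; a real runbook automation product would execute actions and track their outcome where this sketch only returns a description.

```python
def restart_service(event):
    """Hypothetical runbook step: describe a service restart."""
    return f"restart {event['service']} on {event['host']}"

def clear_temp_files(event):
    """Hypothetical runbook step: describe a temp-file cleanup."""
    return f"clear temp files on {event['host']}"

# Each rule pairs a matching criterion with a runbook step.
RUNBOOK = [
    (lambda e: e["type"] == "service_down", restart_service),
    (lambda e: e["type"] == "disk_full" and e.get("percent_used", 0) >= 95,
     clear_temp_files),
]

def remediate(event):
    """Return the action for the first matching rule, or escalate to a human."""
    for matches, action in RUNBOOK:
        if matches(event):
            return action(event)
    return f"escalate to engineer: {event['type']} on {event['host']}"

print(remediate({"type": "service_down", "service": "nginx", "host": "web-01"}))
print(remediate({"type": "fan_failure", "host": "db-02"}))
```

Keeping an explicit escalation path matters: events that match no rule still reach a person, while routine ones no longer distract Level 2 and Level 3 engineers.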

Figure 1. This analytics dashboard shows both anomaly detection in the upper right and log analytics in the lower right, where 2.9M log messages were processed to find 20 significant ones. Courtesy: Micro Focus

Being able to distill millions of log file data points down to a few key anomalies still requires human intelligence to examine the output. But automation and analytics working together can drive down costs compared to older methods.
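
One simple way to distill a large volume of log messages, sketched below, is to mask the variable parts of each message so that routine messages collapse into a few templates, then surface the templates that occur rarely. This is an illustrative assumption about how such a reduction could work, not a description of the dashboard in Figure 1; the function names and sample messages are hypothetical.

```python
import re
from collections import Counter

def template(message):
    """Mask numbers and hex values so messages differing only in values group together."""
    return re.sub(r"\b(0x[0-9a-f]+|\d+)\b", "<n>", message.lower())

def rare_messages(messages, max_count=1):
    """Return the messages whose template appears at most max_count times."""
    counts = Counter(template(m) for m in messages)
    return [m for m in messages if counts[template(m)] <= max_count]

# Hypothetical log lines for illustration only.
logs = [
    "Connection from 10.0.0.5 port 5021 accepted",
    "Connection from 10.0.0.9 port 5022 accepted",
    "Connection from 10.0.0.7 port 5100 accepted",
    "Disk controller timeout on device 3",
    "Connection from 10.0.0.5 port 5300 accepted",
]
print(rare_messages(logs))  # ['Disk controller timeout on device 3']
```

Rarity alone does not make a message significant, which is why a person still has to examine the short list that comes out.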

Machine Learning 101

As Michele Goetz noted above, the analytics that come built-in with popular IT Ops tools for monitoring, load balancing, etc. will generally offer what teams need for operations analysis, at least enough to get started. This capability is usually supported by machine learning – the use of preconfigured algorithms that, over time, allow a system to alert and often respond to conditions set by the user.


Although creating and manipulating complex algorithms typically requires advanced training in data science, putting them to use is less complicated.

“What you’re trying to accomplish is fairly straightforward,” says Torrey Jones, principal analyst at Greenlight Group. “You take a set of information (the more the better) and split that into ‘good’ or desired information, and ‘bad’ or undesired information. You will most likely have some unknown information left over.”

Once you have identified these subsets in the data, you feed them into the machine, which is a mathematical equation that operates on the unknown subset of information. This is the process of “training,” providing the machine a base level of understanding for what you want and don’t want in your data. Over time, as you feed the machine unknown information (i.e., metric and log data from your IT infrastructure and systems), the machine is able to discern good or bad data on its own. “The decisions are based upon the original subsets you gave it, but the mathematical calculation is self-teaching. The more information you give it, the better the machine gets at determining if the data reveals a desired or undesired state” (i.e., conditions are normal, or conditions are anomalous and need attention).

“Of course, it’s still up to you, the human, to tell the machine that something it processed as good is actually bad, or vice versa,” says Jones. Over time, with more human-based corrections and more data, the machine gets very good at predicting anomalies in the data. “Note that an anomaly can be good or bad. In IT Ops, we are typically only concerned with the bad anomalies, things that indicate a failure condition may be occurring or has occurred.”
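
The train, classify, and correct loop described above can be sketched with a deliberately simple model. The example assumes each sample is a pair of metrics (CPU percent, error rate percent) and uses a nearest-centroid classifier, chosen here only because it fits in a few lines; the algorithms embedded in commercial tools are generally more sophisticated. The class name, labels, and sample values are hypothetical.

```python
from math import dist

class StateClassifier:
    """Nearest-centroid classifier over labeled 'good' and 'bad' samples."""

    def __init__(self):
        self.examples = {"good": [], "bad": []}

    def train(self, sample, label):
        """Add a labeled sample to the training set."""
        self.examples[label].append(sample)

    def _centroid(self, label):
        points = self.examples[label]
        return [sum(column) / len(points) for column in zip(*points)]

    def predict(self, sample):
        """Return the label whose centroid is closest (needs both labels trained)."""
        return min(self.examples,
                   key=lambda label: dist(sample, self._centroid(label)))

    def correct(self, sample, true_label):
        """Human feedback: record what the sample really was."""
        self.train(sample, true_label)

clf = StateClassifier()
for sample in [(30, 1), (40, 2), (35, 1)]:
    clf.train(sample, "good")
for sample in [(95, 20), (90, 15)]:
    clf.train(sample, "bad")

print(clf.predict((60, 3)))   # good
clf.correct((60, 3), "bad")   # an operator says this state was actually bad
print(clf.predict((62, 4)))   # bad
```

After the correction, a similar sample is classified differently, which is the "human tells the machine" step in miniature.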

IT Ops Analytics and the Cloud

“As more applications move to the cloud, the CIO tends to have more sleepless nights,” says Stefan Bergstein, chief software architect for hybrid cloud at Micro Focus. Executives must ensure that the entire infrastructure, which is audited and indirectly managed by the lines of business, is safe and secure. “Whether your infrastructure is on-prem or in the cloud, you want to prevent information leakage, you don’t want open ports, etc. It means that enforcing best practices and compliance is key and any kind of analytics and machine learning that can identify configuration settings or patterns of usage that suggest an anomaly or a breach... all of that is critical.”

As important as anomaly detection and root cause analysis are, the ability to offer procurement guidance to users can be just as important a use case. Bergstein explains: “Say I’m a user requesting a service on an instance of the cloud. I want to know the best size and location for the machine. If I’m located in Germany and I want to deploy something, the system should know that I will be best served by information I get from Frankfurt, because the server there meets the compliance requirements governing my request. The data should not be acquired, in this case, across national borders. Or perhaps the data should simply not leave the country.”

Based on either best practices or hard coded rules, as well as history that the system learns over time, a system should suggest the right settings or workloads, the image type, and machine type. All of this makes it faster and easier for the user to get the right configuration in place.
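
The hard-coded-rules half of this guidance can be sketched as follows. The region catalog, machine type names, and vCPU counts are invented for illustration and do not correspond to any cloud provider's offering; the learned-history half is left out entirely.

```python
# Hypothetical catalog: region -> country code and machine types (name -> vCPUs).
REGIONS = {
    "frankfurt": {"country": "DE", "machine_types": {"small": 2, "medium": 4, "large": 8}},
    "dublin": {"country": "IE", "machine_types": {"small": 2, "large": 8}},
    "virginia": {"country": "US", "machine_types": {"small": 2, "medium": 4, "xlarge": 16}},
}

def recommend(user_country, vcpus_needed, data_must_stay_in_country):
    """Suggest (region, machine_type), or None if no region satisfies the rules."""
    candidates = []
    for region, info in REGIONS.items():
        in_country = info["country"] == user_country
        if data_must_stay_in_country and not in_country:
            continue  # compliance rule: data may not cross the border
        fitting = [(cpus, name) for name, cpus in info["machine_types"].items()
                   if cpus >= vcpus_needed]
        if fitting:
            cpus, name = min(fitting)  # smallest machine that is big enough
            # Sort key: in-country regions first, then the smallest machine.
            candidates.append((not in_country, cpus, region, name))
    if not candidates:
        return None
    _, _, region, name = min(candidates)
    return region, name

print(recommend("DE", 3, data_must_stay_in_country=True))    # ('frankfurt', 'medium')
print(recommend("DE", 16, data_must_stay_in_country=True))   # None
print(recommend("DE", 16, data_must_stay_in_country=False))  # ('virginia', 'xlarge')
```

Returning None when nothing qualifies is a deliberate choice in this sketch: it is safer to tell the user that no compliant option exists than to quietly suggest a region that breaks the residency rule.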

“It’s also important to know how long a task is taking,” says Bergstein. “If I’m deploying a configuration or a patch to my networked devices, I want to predict how long the change is going to take. Or if I’m requesting services on the cloud, and I know the process will take only a few minutes, I can wait for the completion in front of my monitor. But if it will take longer, then I should probably schedule that for a later time.”
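
A minimal way to predict how long a task will take is to look at how long it took in the past. The sketch below assumes a list of past durations in minutes and uses a nearest-rank 90th percentile as a conservative estimate; the percentile, the 10-minute waiting limit, and the sample histories are illustrative assumptions.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def wait_or_schedule(past_minutes, wait_limit=10):
    """Return the estimate and whether to wait at the monitor or schedule for later."""
    estimate = percentile(past_minutes, 90)
    decision = "wait" if estimate <= wait_limit else "schedule"
    return estimate, decision

# Hypothetical histories, in minutes.
cloud_requests = [3, 4, 2, 5, 3, 4, 3, 4, 5, 3]
patch_rollouts = [20, 45, 30, 25, 60]

print(wait_or_schedule(cloud_requests))  # (5, 'wait')
print(wait_or_schedule(patch_rollouts))  # (60, 'schedule')
```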

The Overlap of Security and IT Ops

As analytics capabilities continue to improve across the full spectrum of IT tools, the boundaries between security, operations, business intelligence, and ITOA are getting blurry. Take for example network traffic analytics. Tools like Cisco’s NetFlow have worked for many years in the on-prem environment to monitor IP network traffic going in or out of a system. Now cloud providers such as AWS allow you to access net flow data that can be used to detect anomalies for security purposes.

Network management tools should be able to analyze that data as well. “Only a few years ago this sort of information was not available for cloud analytics,” says Bergstein. “The benefit is showing correlations between traffic: Do I have too much or too little traffic between specific machines?” This is a key use case for analytics in cloud management. More on networking analytics below.

While security teams are certainly using a variety of analytics tools to keep data, applications, and systems safe by looking for threats, it’s often the case that the IT Ops teams using analytics have a wider purview, says Jeff Jamieson. “They see anomalies of all types, affecting both performance and economics. IT Ops might work with security teams to feed what they have done into a much broader set of logs and data feeds.”


For example, if a security team is using a SIEM (security information and event management) system, IT Ops teams can leverage those systems as a log-capture facility. “We see our customers ingesting data from Splunk, ArcSight, and Logstash as a source for log file information,” says Jamieson. “All of that can be rolled into a single system for anomaly detection and other analytics purposes.”

In an ideal world, the practice of security isn’t just about protecting perimeters and discovering anomalies, but also about tying back into the business, understanding how it operates, and how IT Ops fits into that model.

“Operations should be federated between all operational units with shared responsibilities,” says Michele Goetz. “What I’m beginning to see is that security, privacy, regulatory, legal, and compliance are all becoming intertwined within the CISO tool suites. Security concerns aren’t limited to the security specialists in a company. They need to broaden, and take into consideration business operations holistically. Besides, you have to realize that the security breaches actually occur in the IT operations space.”

Analytics for the Essentials: Networking and Backup/Recovery

There are at least two other areas of IT Operations where analytics is playing an increasing role: networking (whether traditional or virtual) and backup and recovery methods.

Analytics in Networking

Unexpected network traffic, which can slow performance considerably, is often caused by unauthorized network device configuration changes to physical, SDN, and virtual controllers. (Gartner reports that 40% of mission-critical service outages are caused by configuration-related issues.) The analytics should, ideally, alert networking staff not only to the drop in performance but also to any configuration change that may be its root cause. This will focus operations staff on a configuration check first, which is much more efficient than an events-only model, or a log-file based triage model.

“The correlation between out-of-compliance device configuration and a network performance hit can be achieved with a general analytics tool,” says Frank Bonifazi, product marketing manager, network operations management at Micro Focus. “But it requires mining data from multiple sources, and significant network domain knowledge. For example, knowing that her company’s network is well-designed, a network professional might suspect that CPU over-utilization is being caused by configuration changes.” In the illustration below, we can see an out-of-the-box correlated view superimposing configuration events over a specific device performance graph.

Figure 2. This graph shows two configuration changes (vertical blue lines) time-correlated with out-of-normal CPU utilization (purple curve). In this case, the first change was done without approval, and the second configuration change was to return it to the approved company compliance policy. Courtesy: Micro Focus
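
The time correlation shown in Figure 2 can be approximated in a few lines: given the time of a performance anomaly on a device, list the configuration changes made to that device shortly beforehand. The record fields, the 30-minute window, and the sample events are hypothetical and are not drawn from any Micro Focus product.

```python
from datetime import datetime, timedelta

def changes_before_anomaly(device, anomaly_time, config_changes,
                           window=timedelta(minutes=30)):
    """Return config changes on the device made within the window before the anomaly."""
    return [
        change for change in config_changes
        if change["device"] == device
        and timedelta(0) <= anomaly_time - change["time"] <= window
    ]

# Hypothetical change log for illustration only.
config_changes = [
    {"device": "router-1", "time": datetime(2018, 8, 1, 9, 40), "approved": False},
    {"device": "router-2", "time": datetime(2018, 8, 1, 9, 50), "approved": True},
    {"device": "router-1", "time": datetime(2018, 8, 1, 11, 15), "approved": True},
]

cpu_spike = datetime(2018, 8, 1, 9, 55)
suspects = changes_before_anomaly("router-1", cpu_spike, config_changes)
for change in suspects:
    status = "approved" if change["approved"] else "NOT approved"
    print(f"{change['time']:%H:%M} change on {change['device']} ({status})")
```

A change found this way is a lead, not a verdict; it simply tells staff where to run the configuration check first.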

Analytics in Backup and Recovery

Even backup and recovery processes are benefiting from analytics. As the traditional function of the data center expands to include the cloud, data resides in multiple locations, gets accessed by local and remote users, and is often spread across the organization in different versions, formats, and media.

The essential question is whether the IT teams that manage the backup environment are equipped to identify issues such as unbalanced use of backup resources or an inability to meet the target service-level agreements (SLAs) for mission-critical applications, and to resolve future resource conflicts or other system issues before they lead to outages and data loss.
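
One system issue that can be anticipated in this way is running out of backup storage. The sketch below assumes one storage-used reading per day and fits a straight line through the readings to estimate how many days remain before capacity is reached. Linear growth is a simplifying assumption, and the figures are invented for illustration.

```python
def days_until_full(daily_used_tb, capacity_tb):
    """Estimate days until storage is full from a least-squares linear trend.

    Needs at least two daily readings. Returns None if usage is not growing.
    """
    n = len(daily_used_tb)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_used_tb) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_used_tb))
             / sum((x - x_mean) ** 2 for x in xs))
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    full_day = (capacity_tb - intercept) / slope
    # Count from the most recent reading, never returning a negative value.
    return max(0.0, full_day - (n - 1))

# Hypothetical readings: terabytes used on five consecutive days.
used = [50, 52, 54, 56, 58]
print(days_until_full(used, capacity_tb=100))  # 21.0
```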


Figure 3. This dashboard for a backup and recovery system shows users at a glance how many backup sessions failed, when, and on which media they failed.

Another goal in backup/recovery analytics is reducing capital expenditure (CAPEX) and operating expenditure (OPEX) through high utilization of the infrastructure. This can keep administrators from resorting to reactionary approaches to problem resolution that often lead to complicated future challenges.

Key use cases for analytics in backup/recovery include:
■ Real-time predictive analytics that provide insight into daily use of the backup process, as well as future performance and capacity gaps regarding data sets and infrastructure.
■ “What-if” scenario evaluations that help teams understand whether or not SLAs are achievable, and suggest best ways to balance the demands of new data sets within the existing infrastructure.
■ Storage capacity planning for monitoring ongoing data growth and how the available storage media is being filled. I.e., if data continues to grow at the current rate, how much new storage will be needed before a storage shortage occurs?
■ Identifying potential resource conflicts and systematic issues before they cascade into outages and data loss.

Use IT Ops Analytics for Better Business Alignment

If you’re able to see and understand what’s running through your IT Ops environment, through your architecture, and use analytics to map these things to what your stakeholders care most about, your value as an IT Ops specialist will become more obvious to the business.

Goetz poses three key questions that IT management cares about: “Where are they bottlenecked with limited resources? Where has the technology failed to meet their needs? What friction is there?”

“The better enterprise architecture teams are using analytics to see what’s happening on their landscapes and managing that back to the requests coming in,” she says, “and looking at the utilization by the business teams themselves so that they can make decisions like impact analysis or reuse within their environments. They’re coming at it in a smarter way, rather than just taking requests like in a deli and trying to get those done.”

If you can put analytics on top of your own operational practices, the business wins: You can reduce costs, and you get IT Ops into better alignment with business goals “rather than building a bunch of platforms that will just start collecting dust through low adoption,” Goetz notes.

Ultimately, you will understand where resources can be deployed and reused, and you’ll help IT leadership make better investment decisions.

Learn More At
www.microfocus.com/opsbridge
www.microfocus.com/SMA
www.microfocus.com/NOM
www.microfocus.com/dataprotector

Additional contact information and office locations: www.microfocus.com


162-000172-001 | M | 08/18 | © 2018 Micro Focus or one of its affiliates. Micro Focus and the Micro Focus logo, among others, are trademarks or registered trademarks of Micro Focus or its subsidiaries or affiliated companies in the United Kingdom, United States and other countries. All other marks are the property of their respective owners.