ASKING THE HARD QUESTIONS: REPORTING IN VIPR SRM

Daniel Stafford, Advisory Systems Engineer, EMC, [email protected] (Words and figures)

Tiffany Stafford, [email protected] (Illustrations)

Table of Contents

Introduction: What is a Hard Question?

Automating the Answers

About this Article

Basic Search Skills and the Data Model

Metric Search

Property Search

Notes on the ViPR SRM Data Model

Why is Metric and Property Searching a Foundational Skill?

Building a Table with a Simple Expansion

More on Simple Expansions

Adding Related Disks

Adding Physical Hosts and Disk Capacity

The Basics of Time Management

Recipes for Success: Common Time Management Configurations

Adding Disk Capacity

Recipes for Success: Common Complex Expansions

Configuring the Expansion for Physical Hosts

Configuring the Expansion for Virtual Machines

Data Enrichment

Registering a Collector

Configuring a Tag Set

Saving a Tag Set

Checking for Updates

Using Data Enrichment for Application Chargeback

Alerting on Reports

2015 EMC Proven Professional Knowledge Sharing 2

Building the Report

Scheduling the Report

Configuring the Alert

Notes on the Report Data Adapter

The Alerting Definition

Automating the Policy Change

Testing Port Deregistration

Conclusion

Disclaimer: The views, processes or methodologies published in this article are those of the author. They do not necessarily reflect EMC Corporation’s views, processes or methodologies.


Introduction: What is a Hard Question?

Most enterprise information technology products have some sort of built-in reporting capability. In storage arrays this might be something like Unisphere® for VMAX® or Isilon® InsightIQ. VMware vCenter natively shows statistics about a VMware environment. Oracle has Automatic Workload Repository (AWR) and Oracle Enterprise Manager (OEM).

The common thread among these tools is that they were written with the intent of reporting on a specific product or system. This initial architectural decision introduces some inherent limitations into their capabilities. For instance, it may be difficult to ask questions that involve long time periods. It may be difficult to scale to report on a large number of the target systems in a unified way. Most importantly, it may not be possible to build a query or report that the developer did not envision.

This means traditional tools are most often used to answer easy questions. Answering hard questions with only these tools available often means doing painstaking analysis by hand.

Let’s consider a few examples of this:

| Easy Question | Hard Question | Value of the Hard Answer |
|---|---|---|
| Draw a graph of the write throughput to LUN 70 on VMAX 1581 over the past day | Draw a graph of the total write throughput to all LUNs associated with critical apps across VMAXs, XIVs, and NetApp Filers in the Eastside Data Center over the past month | Allows network team to right-size the WAN circuit used for |
| Which processor had the highest average utilization last week? | Rank the utilization of all processors based on a combination of average, maximum, and 95th percentile utilization | Enable orchestration engine to provision new load based on performance, leading to higher utilization and less spend |
| Which ports on the SAN fabric are down? | Which ports on the SAN fabric are up but haven’t passed a packet in the past month? | Reclaim SAN ports to avoid new purchase |
| What is the oversubscription ratio on this thin pool? | What is the average filesystem utilization of every host attached to this thick array? | Accurately design a thin storage solution to save money at refresh time |
| Send an alert when a pool reaches 80% full | Trigger a provisioning stop for an array when a combined set of pools reaches 80% or disk utilization is consistently over 70% | Provide safeguards to improve application availability |
| How many disks are attached to this host? | Which of those disks are local and which are on arrays? What arrays are they located on, what are their IDs, and what SAN ports are used? Are there other hosts using the same array? | Perform migrations with less labor and lower risk of error |

The common thread among the hard questions is that they are the things actually being asked by senior resources. This is because their answers have direct, hard-dollar impact on budgets.

Often the only reason for asking the easy questions is as research in service of the hard questions. This is part of the labor-intensive analysis necessary to answer hard questions.

This investment of labor also means that the number of hard questions that can be answered is inherently limited: There are always more questions than answers. The ability to get directly to the hard answers without the labor investment has the potential to change the way an IT organization operates in fundamental ways.

Automating the Answers

EMC’s ViPR SRM has become a very popular monitoring and reporting package for good reason. It can collect data from a diverse set of IT infrastructure products, produce thousands of useful out-of-the-box reports, and scale to meet the needs of the world’s largest data centers. Many of these out-of-the-box reports are even the sort that would qualify as ‘hard questions’ under the criteria we’ve laid out above.


This is the proposition that leads many enterprises to deploy ViPR SRM. What many organizations quickly discover is that many of their hard questions can’t be answered by those thousands of built-in reports. At this point, customization is needed.

Custom reporting in any tool can be intimidating. In this article we will attempt to overcome that intimidation. Starting from the perspective of a new administrator, we build the foundational knowledge necessary to create custom reports which address specific business problems. In the process, a number of recipes are developed which can be used as-is to immediately get interesting and valuable results not included in an off-the-shelf install.

About this Article

This article is written as a how-to guide in which previously learned skills build on each other. It can be read by itself, but for maximum benefit it is intended as a sort of lab guide. By following along in your own ViPR SRM environment as you read, you will find yourself retaining more and going beyond the text to solve problems specific to your own needs.

Explore: Boxes like this suggest experiments for you to try on your own.

If you don’t have a ViPR SRM environment in which to try this out, reach out to your EMC or Partner Systems Engineer. They can provide you with links to online virtual environments that are meant for learning.

Throughout this article Online Checkpoints are often referenced. These can be found at https://community.emc.com/people/DannoOfNashville/blog/2015/05/03/knowledge-sharing-2015-vipr-srm-reports.

If you run into trouble following the configuration steps, or simply want to jump right to a usable report, visit this site and download the appropriate checkpoint. This site will also include any errata discovered after publication.


Basic Search Skills and the Data Model

Metric Search

Before discussing any of the theory behind ViPR SRM’s data model, it is most instructive to first spend some time exploring it. Let’s start with a search using the Advanced Search dialog.

Leave the filter as-is, set to Everything (*). For now, just enter the following in the Expansion box: devtype device parttype part name

The meaning of this will become clear as you click through the search results. The results start with:

- A list of Device Types (devtype) which…
- Leads to a list of Devices of that type (device)…
- Then to Component Types associated with that device (parttype)…
- On to Components of that type (part), and finally to…
- Time series Metrics which are associated with that component (name)

Explore: Any property can be used in this search expansion. Try adding source to the front, or vstatus anywhere in the list.


Property Search

The components (devtype, device, parttype, part, name) which make up the expansion are known as properties. These particular five properties are special because every metric (name) in the database has them. However, there are typically many more properties associated with most collected metrics. Let’s look at some of those now.

To do this we’ll use the Management of Database Metrics interface, located in ‘Administration’. (Note: Non-Admin users should browse to ‘Modules’ instead of ‘Administration’)

Because this interface returns raw metrics, we’ll add a Filter to ensure the search is fast and we don’t get too many results. Filters can be created graphically by clicking on the box which starts with Everything. Try refining the filter down to a particular device type, device, and component type.

When you run this query, a matching list of metrics will appear. Clicking on any of these will display a list of properties attached to the metric.

Note that any of these properties can be used as a Filter term. They can be accessed with an easy autocomplete interface by choosing ‘Using a Wizard’. Advanced users may choose to simply ‘Edit Expression’ and enter the filter directly. The syntax of these filters is very similar to SQL or CQL.


Notes on the ViPR SRM Data Model

It is important to note here that the metrics are stored in an entirely flat, unstructured data model. When we search in the original example, the only reason the results appear in a tree-like structure is that we have requested it with the expansion (devtype device parttype part name). This is a good starting expansion because these properties are common to almost every metric. By combining different filters and expansion patterns we can impose almost any structure that makes sense for a given situation.

Explore: Do a search with a Filter/Expansion set that lets you drill down to view LUNs by Pool (hint: poolname). Try the same thing with Virtual Machines by vSphere Cluster (hint: cluster).
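A minimal Python sketch can make this concrete. It is not ViPR SRM code: it assumes each metric is nothing more than a flat dictionary of properties (the sample records are made up) and imposes a tree by grouping on the expansion properties in order, the way the devtype device parttype part name expansion does in the Advanced Search.

```python
from collections import defaultdict

# Made-up sample metrics: each one is only a flat set of properties.
METRICS = [
    {"devtype": "Array", "device": "VMAX1581", "parttype": "LUN", "part": "0070", "name": "WriteRequests"},
    {"devtype": "Array", "device": "VMAX1581", "parttype": "LUN", "part": "0070", "name": "Capacity"},
    {"devtype": "Array", "device": "VMAX1581", "parttype": "LUN", "part": "0071", "name": "Capacity"},
    {"devtype": "Host", "device": "dbhost01", "parttype": "Disk", "part": "sdb", "name": "Capacity"},
]


def expand(metrics, properties):
    """Group flat metric records into nested dicts, one level per expansion property."""
    if not properties:
        return metrics  # leaf: the metrics that share every property value above
    first, rest = properties[0], properties[1:]
    groups = defaultdict(list)
    for metric in metrics:
        if first in metric:  # assumption of this sketch: metrics without the property are left out
            groups[metric[first]].append(metric)
    return {value: expand(members, rest) for value, members in groups.items()}


tree = expand(METRICS, ["devtype", "device", "parttype", "part", "name"])
print(sorted(tree))                              # ['Array', 'Host']
print(sorted(tree["Array"]["VMAX1581"]["LUN"]))  # ['0070', '0071']

# The same records, given a different structure by a different expansion.
by_name = expand(METRICS, ["name"])
print({name: len(ms) for name, ms in by_name.items()})  # {'WriteRequests': 1, 'Capacity': 3}
```

Changing only the list passed to expand reshapes the same records: the structure lives in the query, not in the stored data.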

Why is Metric and Property Searching a Foundational Skill?

As you read through the rest of this article, references to Metric and Property names may seem to appear out of nowhere. For example, in the first report (Online Checkpoint #1), we use the property deviceid, which contains the UUID of a VMware Virtual Machine. In the second (Online Checkpoint #2), we use the metric ‘Capacity’ to sum up the SAN capacity consumed by each Virtual Machine.

When these appear, refer back to this Basic Search Skills section. You will always be able to find them in one of two ways:

- Grouping relationships can be found by using the Advanced Search. For example:
  - A devtype ‘Array’ will include one or more ‘LUN’ parttypes, and each associated part will have a metric with name ‘WriteRequests’.
- Metadata (properties) can be found using Management of Database Metrics. For example:
  - Every metric associated with a devtype ‘VirtualMachine’ stores the associated UUID in property deviceid.
  - VMware Datastores have their unique World Wide Names (WWNs) stored in property partsn. Property partsn is also used for the WWNs of Array LUNs, making partsn a unique key which connects these different data sets.


Building a Table with a Simple Expansion

To handle Jordan’s request, we’ll need to use Edit Mode. Once in Edit Mode, we’ll want to add our custom report in the section called ‘My Reports’. As you might guess from the name, My Reports is a private sandbox. The reports built here are visible only to you. It’s a place where experimentation (and mistakes) are encouraged.

To add a new node, click the ‘New Node’ button. The other buttons along the bar shown are respectively Cut, Copy, Paste, Paste as Link, and Remove nodes.

Once the node is added, let’s do the following to populate it.

1. In the Filtering and Expansion tab:
   a. Give the report a friendly name.
   b. Set a filter for devices of type VirtualMachine whose source is the VMware Collector.
2. In the Report Configuration tab:
   a. Change the report type to Standard Table.
3. In the Report Details tab:
   a. Add a property column called ‘Virtual Machine’ with the property ‘device’.


Once this is done, let’s go back to Browse Mode and take a look at our handiwork.

As you can see from Figure 8, something is amiss. My lab environment with fewer than 200 Virtual Machines (VMs) generated nearly 15,000 lines. What went wrong?

As we learned while using Search, the data in ViPR SRM is inherently unstructured. This means that unless we impose a structure, the engine will simply assume we wish to show one Time-Series metric per table line.

The way we impose this structure is with an expansion. An expansion groups metrics based on common properties. For example, an expansion on deviceid (which is the VM UUID) groups all of the metrics with a common deviceid property together on the same row.
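The effect on the table can be sketched the same way. In this hypothetical Python example (the records, the metric names other than Capacity, and the row layout are invented for illustration), a table with no expansion emits one line per metric, while an expansion on deviceid folds every metric that shares a VM UUID into a single row.

```python
from collections import defaultdict

# Invented metrics for two VMs; deviceid stands in for the VMware UUID.
METRICS = [
    {"deviceid": "uuid-1", "device": "web01", "hypervsr": "esx01", "name": "CPUUsage"},
    {"deviceid": "uuid-1", "device": "web01", "hypervsr": "esx01", "name": "MemoryUsage"},
    {"deviceid": "uuid-1", "device": "web01", "hypervsr": "esx01", "name": "Capacity"},
    {"deviceid": "uuid-2", "device": "db01", "hypervsr": "esx02", "name": "CPUUsage"},
    {"deviceid": "uuid-2", "device": "db01", "hypervsr": "esx02", "name": "Capacity"},
]


def table_rows(metrics, expansion=None, columns=("device", "hypervsr")):
    """Return table rows: one per metric without an expansion, one per distinct value with one."""
    if expansion is None:
        return [{col: m[col] for col in columns} for m in metrics]
    rows = defaultdict(dict)
    for m in metrics:
        row = rows[m[expansion]]
        for col in columns:  # property columns show the values shared by the grouped metrics
            row.setdefault(col, m[col])
        row.setdefault("metrics", []).append(m["name"])
    return list(rows.values())


print(len(table_rows(METRICS)))              # 5
print(len(table_rows(METRICS, "deviceid")))  # 2
```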


To fix the report:

4. Add a Child Node to the report.
5. On the Child Node, in Filtering and Expansion:
   a. Add deviceid as an Expansion Property.

Now when we view the report, it will have just one Virtual Machine per line.

Since Jordan also wanted to know which vSphere host each VM is running on, we’ll need the name of the property that holds that data. A quick search of Management of Database Metrics reveals that hypervsr is the correct property. Adding it as an additional Property column, just as we did for device, will produce the desired table.


More on Simple Expansions

Consider a sample dataset based on the filter devtype==’Array’ & parttype==’LUN’ for a moment. Suppose that each LUN has time-series Metrics (name) for just IOPS and Capacity. What happens if different expansions are applied?

Of these, only the device expansion would be considered conventionally useful. It is more common with large datasets to see expansions on combinations of properties, as in Figure 13.


It is also very common to see multiple levels of expansion used for drilling down. When the user clicks on a table row, the next level down is filtered to just the data associated with the row (Figure 14).

Explore: Try this yourself. Copy/paste a report and try changing the expansion. Try to build a simple drill-down.


Adding Related Disks

When building out report nodes, filters are additive. A common practice is to start at a top node with a very wide filter, such as all data associated with the VMware Collector (source==’VMWare-Collector’). As the report drills into details of the data, the filter gradually becomes more restrictive.

Expansions are part of this process. Besides grouping common metrics, they also introduce a filter based on the values of those common properties. This can be seen visually by comparing the report tree in Edit Mode to the report tree in Browse Mode. A node with a simple expansion may become many nodes, each one a unique data set within that expansion.

This means by adding a child node underneath the node with the device expansion, we can filter for components particular to a given device. We will add a child node with a filter for parttype==’Disk’. The filter scope will change to ‘expansion and selection’ because we will also add a simple expansion on part to this node.
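One way to picture additive filters is as a conjunction that grows on the way down the tree. The Python sketch below is an illustration under simple assumptions (equality-only filters, invented records), not how ViPR SRM evaluates its filter language; the value chosen from the device expansion is modeled as one more equality test.

```python
# Equality-only stand-ins for node filters; real ViPR SRM filters are richer.
def node_filter(**required):
    """Match metrics whose properties equal every given value."""
    return lambda m: all(m.get(k) == v for k, v in required.items())


def effective_filter(*filters):
    """Filters are additive: a metric must pass the filter of every node on the path."""
    return lambda m: all(f(m) for f in filters)


# Invented metrics from the VMware Collector.
METRICS = [
    {"source": "VMWare-Collector", "devtype": "VirtualMachine", "device": "web01", "parttype": "Disk", "part": "Hard disk 1"},
    {"source": "VMWare-Collector", "devtype": "VirtualMachine", "device": "web01", "parttype": "Disk", "part": "Hard disk 2"},
    {"source": "VMWare-Collector", "devtype": "VirtualMachine", "device": "web01", "parttype": "VirtualDisk", "part": "scsi0:0"},
    {"source": "VMWare-Collector", "devtype": "VirtualMachine", "device": "db01", "parttype": "Disk", "part": "Hard disk 1"},
]

path = effective_filter(
    node_filter(source="VMWare-Collector"),  # wide filter on the top node
    node_filter(device="web01"),             # value selected from the device expansion
    node_filter(parttype="Disk"),            # filter on the new child node
)
print([m["part"] for m in METRICS if path(m)])  # ['Hard disk 1', 'Hard disk 2']
```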

Exploring the tree in Browse mode will reveal that each VM can now be expanded into a list of Disks associated with that VM.


Since the VMware Administrator wants to see a count of Disks, we’ll need to add a Formula using the Formula tab. In this case we’ll use the ChildCount formula, which returns a count of child nodes.

Explore: Take a moment to look at the available formulas.

When you create a Formula, the formula result has a scope. This is a fancy way of saying the result is only visible from certain places. Formula results in ViPR SRM always have a scope which includes the node on which they were created, plus the parent of that node.

In this case, we’ve created the Formula on the ‘device’ node (because we wish to count its children), and we want to display it on the ‘VM List with Disk Count’ node, both of which are in the scope by default. However, this will be an important fact to remember later when we wish to display a formula result outside its default scope.
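As a rough mental model, here is a hypothetical Python sketch of ChildCount and its default scope (invented VM and disk names; this is not the ViPR SRM formula engine): the count is computed on each device node and read by the node directly above it.

```python
# Invented Browse-Mode tree: report node -> device nodes -> Disk (part) nodes.
REPORT = {
    "name": "VM List with Disk Count",
    "children": [
        {"name": "web01", "children": [{"name": "Hard disk 1"}, {"name": "Hard disk 2"}]},
        {"name": "db01", "children": [{"name": "Hard disk 1"}, {"name": "Hard disk 2"}, {"name": "Hard disk 3"}]},
    ],
}


def child_count(node):
    """ChildCount analogue: the number of child nodes directly under this node."""
    return len(node.get("children", []))


# The formula is created on each device node; its result is visible to that node
# and to its parent, so the parent can show one Disk Count value per VM row.
disk_count_column = {vm["name"]: child_count(vm) for vm in REPORT["children"]}
print(disk_count_column)  # {'web01': 2, 'db01': 3}
```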


Once this is done, a Disk Count column will show the number of disks associated with the VM. Clicking on a VM will show a list of those disks by name.


Adding Physical Hosts and Disk Capacity

There are a number of ways to add Physical Hosts to this table alongside the Virtual Machines. The simplest would be to expand the filter to include devtype==’Host’. This will work because the tree structure we’ve used in the report so far is the same: both VMs and Physical Hosts have associated Disk parttypes.

A slightly more complex way starts the same: we expand the top-level filter to encompass Physical Hosts. However, we would then restrict the filter on the existing child node (the device and part nodes above) to apply only to devtype==’VirtualMachine’ and build a second child node to apply to only devtype==’Host’.

In this example, we will choose the second option. This method can be useful for a number of reasons:

- It allows data sets with different structures to be displayed in a normalized way.
  - For example, performance statistics on VMDKs appear on VirtualDisk parts, whereas those same statistics for a Raw Device Mapping (RDM) appear on Disk parts. Using different nodes for each allows us to pass up these metrics in a common way.
  - This is also important if different parts of the dataset require different expansions. In this case the VMs need to be expanded on their unique deviceid (VMware UUID), but the Hosts should be expanded on device, the hostname.
- It allows different data to be displayed when the user clicks to drill down.
  - For example, drilling down to a VM might display reports about the associated and Datastore, whereas drilling down to a Physical Host would not need these reports.
- It allows us a finer degree of control over how certain items are displayed.


In the screenshot above we’ve added a column which shows the property devtype, which displays ‘VirtualMachine’ for VMs and ‘Host’ for physical hosts.

Suppose instead we wish to use the labels ‘VM Guest’ and ‘Bare Metal’. To do this, we will use a new type of formula called a Nop along with the Value-to-String formatter. This sort of customization can help make reports much more readable.
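The idea behind the Value-to-String formatter is a lookup from raw value to display label. A hypothetical Python equivalent is shown below; passing unmapped values through unchanged is an assumption of this sketch, not documented formatter behavior.

```python
# Hypothetical label table mirroring the 'VM Guest' / 'Bare Metal' example.
DEVTYPE_LABELS = {"VirtualMachine": "VM Guest", "Host": "Bare Metal"}


def display_devtype(raw_value):
    """Return the friendly label for a devtype; unmapped values pass through (assumption)."""
    return DEVTYPE_LABELS.get(raw_value, raw_value)


for raw in ("VirtualMachine", "Host", "Hypervisor"):
    print(f"{raw} -> {display_devtype(raw)}")
```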

Adding the capacity of all of the disks will involve another new formula, the Spatial Sum. To do this we need to think again about scope. The capacity should be summed on each VM or Host. However, the Capacity metric resides a level below this in the tree. This means a Nop will be required. Start by adding a Nop formula to the lowest child nodes. This Nop will filter for name==’Capacity’.


When you add the sum formula (math.Spatial.Sum) to the Host and VM nodes, take a moment to look at the formula configuration. It follows a pattern common among many formula types.

First, the formula requests an input parameter. This could be a metric selected from the filtered/expanded data available (‘Filter on this node’), a Formula Result, a Property Value, or a set of Combined Parameters (which can combine any of the above). For this example we will choose the result of the Nop formula on the child.

The second configuration group on the sum formula is ‘Settings’. In this case, they will be left at their defaults. However, on many other formulas these settings are central to the desired operation.

The third configuration group consists of the output settings. The only thing we will set in this case is a name. The other two options are used in other circumstances. ‘Show in Graphs’ allows the result to be shown in a simple chart on the same node. ‘Default result’ allows that formula result to be displayed as part of a stack chart or TopN report.

Once the Nop and Spatial Sum formulas are added, we are ready to display the result in the table. This will be a Value and will use the Formula Result from the Sum. With this column we will need to do something new: Modify the column’s Time Management settings.
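The Nop plus Spatial Sum pattern can be modeled as "select a metric on each child, then add across the children". The Python sketch below is a simplification under stated assumptions: each metric holds a single current value (real ViPR SRM metrics are time series), and the disk names and sizes are invented. "Spatial" is read here as summing across nodes rather than across time.

```python
# Invented tree: VM node -> Disk child nodes -> metric values (GB for Capacity).
VM_DISKS = {
    "web01": {
        "Hard disk 1": {"Capacity": 40.0, "ReadRequests": 12.0},
        "Hard disk 2": {"Capacity": 100.0, "ReadRequests": 3.0},
    },
    "db01": {
        "Hard disk 1": {"Capacity": 250.0, "ReadRequests": 80.0},
    },
}


def nop(metrics, name):
    """Nop analogue on a child node: pick out one metric (name=='Capacity') unchanged."""
    return metrics.get(name)


def spatial_sum(values):
    """Spatial Sum analogue on the parent node: add the children's values together."""
    return sum(v for v in values if v is not None)


capacity_gb = {
    vm: spatial_sum(nop(metrics, "Capacity") for metrics in disks.values())
    for vm, disks in VM_DISKS.items()
}
print(capacity_gb)  # {'web01': 140.0, 'db01': 250.0}
```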


The Basics of Time Management

The Time Management settings on a column are there to solve a problem which exists on every ViPR SRM table:

There are many different ways that one might wish to summarize a set of time-series data:

- Display the average over the period
- Display the maximum or minimum over the period
- Display the sum of all values in the period (numerical integration)
- Display the last value in the period

Some of these operations might be computationally intensive. Imagine asking ViPR SRM to display the average IOPS on a LUN over the last six months. If it used real-time data collected every five minutes, this would involve retrieving and averaging over 50,000 values for every LUN displayed.

To simplify this, the ViPR SRM engine continuously calculates values such as the average, minimum, maximum, and sum for various periods (each hour, day, and week). Rather than average all of the real-time values, we would average the rolled-up one-week averages. Provided each week contains the same number of samples, this produces the same answer with far less computation – 26 values versus 52,560, or about 2000 times more efficient.
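The arithmetic behind those figures, and the condition under which the shortcut gives an identical average, can be checked with a short Python sketch. The random sample values are invented; the comparison assumes every week contains the same number of five-minute samples.

```python
import math
import random

SAMPLES_PER_DAY = 24 * 60 // 5            # five-minute collection: 288 samples per day
raw_samples = SAMPLES_PER_DAY * 365 // 2  # about six months of real-time values
weekly_values = 26                        # six months of one-week aggregates
print(raw_samples, round(raw_samples / weekly_values))  # 52560 2022

# Average of weekly averages vs. average of every raw sample, using full weeks.
random.seed(1)
weeks = [[random.uniform(0, 1000) for _ in range(SAMPLES_PER_DAY * 7)] for _ in range(weekly_values)]
raw_avg = sum(sum(week) for week in weeks) / sum(len(week) for week in weeks)
rollup_avg = sum(sum(week) / len(week) for week in weeks) / len(weeks)
print(math.isclose(raw_avg, rollup_avg))  # True
```

If the weeks held different numbers of samples (a partial first or last week, or collection gaps), the average of averages would weight each week equally and could drift from the raw average.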

The Time Management settings which can be configured to produce these results are:

- Sampling Period – Should the calculation use real-time data or one of the statistical aggregates (one hour, one day, one week)?
- Sampling Type – Which sort of statistical aggregate (average, min, max, sum, last, or count) should be used?
- Column Time Range(s) – ‘Inherit from report’ will use the time range from the ‘Report Configuration’ tab. Optionally, a particular period may be chosen, or multiple columns can be generated for multiple time ranges.
- Recover… – Should all values in the time range be used for calculation, or should we simply display the last one?
- Temporal Aggregation – If ‘Recover…’ was set to All Values, this specifies whether the set of values should be averaged, summed, or if the min, max, or count should be displayed.
- Time Threshold – If ‘Recover…’ was set to Last, this specifies how far to go back looking for a last value.
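To see how these settings combine for a single table cell, here is a hypothetical Python model. The pre-computed rollup values are invented, Time Threshold is left out for brevity, and the code only mirrors the decision flow described above, not the ViPR SRM engine.

```python
from statistics import mean

# Invented rollups for one metric: (Sampling Period, Sampling Type) -> one value
# per period inside the column's time range, oldest first.
ROLLUPS = {
    ("1w", "average"): [120.0, 135.0, 110.0, 150.0],
    ("1w", "max"): [400.0, 520.0, 380.0, 610.0],
}

TEMPORAL_AGGREGATIONS = {"average": mean, "sum": sum, "min": min, "max": max, "count": len}


def column_value(period, sampling_type, recover="all", temporal="average"):
    """Compute one cell from the Time Management settings (simplified)."""
    values = ROLLUPS[(period, sampling_type)]       # Sampling Period + Sampling Type
    if recover == "last":                           # Recover... = Last: newest value only
        return values[-1]
    return TEMPORAL_AGGREGATIONS[temporal](values)  # Recover... = All + Temporal Aggregation


print(column_value("1w", "average"))                  # 128.75
print(column_value("1w", "max", temporal="max"))      # 610.0
print(column_value("1w", "average", recover="last"))  # 150.0
```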


Recipes for Success: Common Time Management Configurations

Recipe #1

Useful for: Showing the last value of a metric, such as with LUN or Pool Capacities. Also good for metrics which don’t change much, such as CPU counts.

Recipe #2

Useful for: Showing the average of a value over a period with minimal front-end load. Common when displaying average IOPS, CPU utilization, or memory usage.

Recipe #3

Useful for: Showing the peak value over a period.

Recipe #4

Useful for: Estimating total change over time. For example, this might be used to estimate the required size of a RecoverPoint journal or the space required for snapshot deltas.


Adding Disk Capacity

To review, the steps to add a Disk Capacity column are:

This will provide an accurate value for each Host and VM. However, as configured, the column’s presentation could still be improved. Since all of the values are in gigabytes (GB), some of the very high and very low values can be difficult to read.

To improve readability, we will use the Scaling feature in the column’s Value Settings. Scaling can be simple multiplication or division. It can also be unit-aware. For this case, unit auto-scaling will work perfectly.

With these settings, Capacity will be scaled to the most appropriate unit (Figure 24).
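Unit auto-scaling can be approximated as "move to the next unit while the number is large, and to the previous unit while it is below one". In this Python sketch the unit ladder and the base of 1024 are assumptions; the text does not say whether ViPR SRM scales by 1000 or 1024.

```python
# Assumed unit ladder and base; the text does not specify either.
UNITS = ["MB", "GB", "TB", "PB"]


def autoscale(value, unit="GB", base=1024):
    """Rescale value until it sits between 1 and base, or the unit ladder runs out."""
    i = UNITS.index(unit)
    while value >= base and i < len(UNITS) - 1:
        value, i = value / base, i + 1
    while 0 < value < 1 and i > 0:
        value, i = value * base, i - 1
    return f"{value:.2f} {UNITS[i]}"


for capacity_gb in (0.5, 40, 2048, 3.5 * 1024 ** 2):
    print(autoscale(capacity_gb))  # 512.00 MB, 40.00 GB, 2.00 TB, 3.50 PB
```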


Adding LUN and Array Information

Using the skills we’ve already developed, it’s now fairly simple to turn the boring drill-down list into a table of disk names.

All of the information about the associated LUNs is locked up in a completely different dataset. Fortunately, one of ViPR SRM’s strengths is making connections between different datasets. To do this, we’ll need to learn to use a new tool: Complex Expansions.

Complex expansions extend the metaphor we explored previously with simple expansions:


In a complex expansion, a new filter is created to find the new dataset. The complex expansion itself:

- Splits the data based on common properties (just like a simple expansion)
- Connects the split data to parent nodes with matching source data

In the report we’ve built so far, this existing source data (on the lowest child node associated with a physical host) would be for a particular Disk. Looking in Management of Database Metrics, we can see this Disk has a property called partsn which is a LUN World Wide Name (WWN). A complex expansion can use this to find the parttype LUN with the same WWN.
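Stripped to its essentials, this kind of complex expansion is a join between two record sets on a shared value. The Python sketch below is illustrative only: the host, array, and WWN-like values are made up, and a disk with no matching LUN is simply reported as such, which is also how a local disk would show up.

```python
# Made-up records from two data sets; only partsn ties them together.
HOST_DISKS = [
    {"devtype": "Host", "device": "dbhost01", "parttype": "Disk", "part": "sdb", "partsn": "wwn-0070"},
    {"devtype": "Host", "device": "dbhost01", "parttype": "Disk", "part": "sda", "partsn": "wwn-local"},
]
ARRAY_LUNS = [
    {"devtype": "Array", "device": "VMAX1581", "parttype": "LUN", "part": "0070", "partsn": "wwn-0070"},
    {"devtype": "Array", "device": "VMAX1581", "parttype": "LUN", "part": "0071", "partsn": "wwn-0071"},
]


def complex_expand(sources, targets, source_prop, target_prop):
    """Pair each source record with the target records whose target_prop matches its source_prop."""
    index = {}
    for target in targets:
        index.setdefault(target.get(target_prop), []).append(target)
    return [(source, index.get(source.get(source_prop), [])) for source in sources]


for disk, luns in complex_expand(HOST_DISKS, ARRAY_LUNS, "partsn", "partsn"):
    where = ", ".join(f"{lun['device']} LUN {lun['part']}" for lun in luns) or "no array LUN"
    print(f"{disk['device']}:{disk['part']} -> {where}")
```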

Let’s take a moment to review the steps to build a complex expansion that joins datasets based on common properties:

In this step we choose a template for the complex expansion. This limits the future steps to a smaller set of features to simplify the configuration process.

In this case we are primarily considering ‘Join properties having a different name’. To open up all of the options, choose ‘Manually configure the complex expansion’.

Here we choose the source property on the existing data set (which resides on the parent node) which we want to use to connect to some other data set.

Here we select a target property which resides on the new data set we wish to find.

Another common configuration item is Level Up, which removes previous filter constraints. The most common selection is to level up to Maximum, which allows you to create a fresh filter on this node.

This step captures any other Complex Expansion modifiers, such as wildcard or regex matching, or splitting a property on a separator.


Recipes for Success: Common Complex Expansions

This table describes a number of commonly encountered expansions. It is important to note that source and target are arbitrary. Any of the expansions below can work in both directions.

Note as well that the filters have been simplified. In writing real-world reports it is a best practice to specify source and vstatus properties. The particular circumstances of your report may also lead you to add additional restrictions.

Some of the expansions below join on multiple properties. This is to ensure that a unique match is made. Just as every switch has a Port 1, nearly every array will have a LUN 0.
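Why a second join property matters can be shown with two arrays that each have a LUN 0. In this hypothetical Python sketch the array names and values are invented; the property names follow the VMAX Storage Group to LUN, Step #2 row of the complex expansion table (Join lunname to part, Join device to device).

```python
# Invented LUN records: both arrays have a LUN named '0'.
LUNS = [
    {"device": "array-A", "part": "0", "Capacity": 100},
    {"device": "array-B", "part": "0", "Capacity": 500},
]
# Invented StorageGroupToLUN record pointing at LUN 0 on array-B.
SG_TO_LUN = {"device": "array-B", "lunname": "0", "sgname": "SG_Oracle"}


def matches(source, targets, pairs):
    """Targets that agree with the source on every (source_prop, target_prop) pair."""
    return [t for t in targets if all(source.get(s) == t.get(tp) for s, tp in pairs)]


print(len(matches(SG_TO_LUN, LUNS, [("lunname", "part")])))  # 2: ambiguous
unique = matches(SG_TO_LUN, LUNS, [("lunname", "part"), ("device", "device")])
print([(t["device"], t["part"]) for t in unique])  # [('array-B', '0')]
```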

| Connection | Source Data Filter | Target Data Filter | Expansion Configuration |
|---|---|---|---|
| Host Disk to Array LUN | devtype==’Host’ & parttype==’Disk’ | devtype==’Array’ & parttype==’LUN’ | Join partsn to partsn |
| VM RDM to Array LUN | devtype==’VirtualMachine’ & parttype==’Disk’ & rdmname & partsn | devtype==’Array’ & parttype==’LUN’ | Join partsn to partsn |
| VM Virtual Disk to VMDK File | devtype==’VirtualMachine’ & parttype==’Disk’ & part=’HARD DISK%’ | devtype==’VirtualMachine’ & parttype==’File’ | Join partdesc to part |
| VMDK File to Datastore | devtype==’VirtualMachine’ & parttype==’File’ | devtype==’Datastore’ | Join linkedto to device |
| Datastore to Array LUN | devtype==’Datastore’ | devtype==’Array’ & parttype==’LUN’ | Join partsn to partsn |
| VMAX Storage Group to LUN, Step #1 | parttype==’Storage Group’ | parttype==’StorageGroupToLUN’ | Join part to sgname; Join device to device |
| VMAX Storage Group to LUN, Step #2 | parttype==’StorageGroupToLUN’ | parttype==’LUN’ | Join lunname to part; Join device to device |
| Host HBA to Switch Port | (devtype==’Host’ \| devtype==’Hypervisor’) & parttype==’Port’ | devtype==’FabricSwitch’ & iftype==’fibreChannel’ | Join partsn to portwwn |

Explore: What other discovered components might have connections? Use Management of Database metrics to find their common properties.


Configuring the Expansion for Physical Hosts

Executing these steps to connect to an Array LUN looks something like this:

Now the device and part properties from the LUN can be displayed on the top level table (or the drill-down into the list of Disks). The device is the name of the storage array, whereas part is the name of the LUN.


Configuring the Expansion for Virtual Machines

This only solves the Physical Host half of the equation. We still need to make the same connections for Virtual Machines.

For Disks which are RDMs, this connection can be made the same way as with Disks attached to Physical Hosts. The RDM has an associated WWN stored in the partsn property which can be linked to an Array LUN using a Complex Expansion.

For Disks which are VMDKs, the process has a number of additional steps, depicted below.

To get Array and LUN data for Virtual Machines, we’ll need to start by splitting the original simple expansion on part into two pieces. One will have a filter (on Expansion and Selection) to capture only RDMs. The other will have a filter (also on Expansion and Selection) to capture only VMDKs.

On the RDM node, we can copy/paste the complex expansion node from the Physical Host tree. As stated before, this connection will work exactly the same as with a Physical Host. Copying and pasting the LUN Properties Nop formulas to make this data available is left as an exercise for the reader.


To complete the connection on the VMDK node, we will add a set of Complex Expansions based on the chart above: Disk to File, File to Datastore, Datastore to LUN.

Once the Complex Expansions have been added, the device and part properties can be passed up the tree using Nop formulas, just as in the previous two examples.
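The VMDK path is the same kind of join applied three times in a row. A hypothetical Python sketch follows; the property names come from the Disk to File, File to Datastore, and Datastore to LUN rows of the complex expansion table, but every value (including what partdesc contains) is invented for illustration.

```python
# Invented records, one list per data set along the VMDK path.
VM_DISKS = [{"device": "web01", "parttype": "Disk", "part": "Hard disk 1", "partdesc": "[ds01] web01/web01.vmdk"}]
VMDK_FILES = [{"device": "web01", "parttype": "File", "part": "[ds01] web01/web01.vmdk", "linkedto": "ds01"}]
DATASTORES = [{"devtype": "Datastore", "device": "ds01", "partsn": "wwn-0070"}]
ARRAY_LUNS = [
    {"devtype": "Array", "device": "VMAX1581", "parttype": "LUN", "part": "0070", "partsn": "wwn-0070"},
    {"devtype": "Array", "device": "VMAX1581", "parttype": "LUN", "part": "0071", "partsn": "wwn-0071"},
]


def hop(records, targets, source_prop, target_prop):
    """One complex-expansion step: follow source_prop -> target_prop for every record."""
    return [
        t for r in records for t in targets
        if r.get(source_prop) is not None and r.get(source_prop) == t.get(target_prop)
    ]


files = hop(VM_DISKS, VMDK_FILES, "partdesc", "part")      # Disk -> VMDK File
datastores = hop(files, DATASTORES, "linkedto", "device")  # VMDK File -> Datastore
luns = hop(datastores, ARRAY_LUNS, "partsn", "partsn")     # Datastore -> Array LUN
print([(lun["device"], lun["part"]) for lun in luns])  # [('VMAX1581', '0070')]
```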


Extreme Time Management

Earlier we established some simple recipes for Time Management. These can be scaled up to accomplish things that would be very difficult in a traditional reporting tool. To demonstrate this, we will build a report which finds SAN ports which are connected but have not passed any traffic in the past three months.

The link status of a port is available in the property partstat. Depending on whether the switch environment is Brocade or Cisco, a connected port will have a partstat value of either ‘online’ or ‘up’, respectively.

Finally, to reduce the number of entries in our table (and eliminate the need for computationally expensive sorting), we will take advantage of the value filtering feature. This allows us to display a table row only if the resulting value matches a Boolean expression. This can be found in the Advanced settings for a table value.

These steps are summarized in Figure 34.
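The logic of this report (a connected status, traffic summed over the range, and a value filter that keeps only zero totals) can be sketched in Python. This is a hypothetical model: the port names and daily traffic counts are invented, and daily_frames stands in for whichever traffic metric the report uses, which is not named in the text.

```python
# Invented port records: link status plus ~three months of daily traffic totals.
PORTS = [
    {"device": "brocade01", "part": "port 1", "partstat": "online", "daily_frames": [0] * 90},
    {"device": "brocade01", "part": "port 2", "partstat": "online", "daily_frames": [0] * 89 + [1200]},
    {"device": "mds01", "part": "fc1/5", "partstat": "up", "daily_frames": [0] * 90},
    {"device": "mds01", "part": "fc1/6", "partstat": "down", "daily_frames": [0] * 90},
]

CONNECTED = {"online", "up"}  # Brocade reports 'online', Cisco reports 'up'


def idle_connected_ports(ports):
    """Connected ports whose traffic, summed over the whole range, is zero."""
    rows = []
    for port in ports:
        if port["partstat"] not in CONNECTED:  # filter: connected ports only
            continue
        total = sum(port["daily_frames"])      # temporal aggregation: sum over the range
        if total == 0:                         # value filter: show the row only if the sum is 0
            rows.append((port["device"], port["part"]))
    return rows


print(idle_connected_ports(PORTS))  # [('brocade01', 'port 1'), ('mds01', 'fc1/5')]
```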


Explore: What other under-utilized components could we detect through filtering and time management?


Data Enrichment

Opening up any given metric in Management of Database Metrics reveals a wealth of metadata. The Data Enrichment process allows us to add custom metadata which is meaningful to a particular business. Some common uses include tagging discovered components with:

- Location
- Business application
- Business purpose
- Installation, lifecycle, or maintenance dates
- Business or IT contact
- Cost data
- Service Catalog Assignments (Gold, Silver, Bronze, etc.)

There are two interfaces in Centralized Management which allow custom metadata tagging. The older, traditional one is Data Enrichment, which is also the more flexible of the two. In ViPR SRM 3.5, the Groups Management interface was added. Groups Management is intended to allow simple, wizard-driven tagging for a set of common use cases. The tags it populates are often referenced in built-in reports. By contrast, almost any tag added in Data Enrichment requires some reporting customization.

The Groups Management interface is self-explanatory, especially to a user who has a basic comfort level with the ViPR SRM data model. For this reason, we’ll focus on Data Enrichment, which can be more powerful (and tuned more finely) in the hands of an experienced user.

Registering a Collector

The first step in using Data Enrichment is registering a collector module. Browse to ‘Data Enrichment’ in the Centralized Management interface; at the top level of the tree, a ‘Register a new module’ button is available.


A given Collector host may have many Collector modules – one or more for each type of infrastructure it is collecting data from. To avoid the management overhead of registering all of these, a best practice is to register the “Load-Balancer :: DataEnrichment” module. This ensures that any configured enrichment rules will act on all data which passes through the collector host.

Configuring a Tag Set

Once you register a module, drill into that module to configure Tag Sets with the ‘New Tagging’ button.

Each ‘New Tagging’ which is added consists of a list of keys and properties. These keys and properties follow a basic template:


When adding a new key, there are a few choices to make:

Column order is important because a tagging ruleset can be imported from an Excel worksheet or a CSV file. Setting this order correctly tells ViPR SRM what to expect when looking through the file.
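To see why column order matters on import, here is a minimal Python sketch of reading a tagging file by column position. It is not the ViPR SRM importer. The `host` key and the `appli` and `location` properties are hypothetical choices for illustration.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical tag set: column positions must agree with the order configured
# for each key and property in the 'New Tagging' definition.
KEY_COLUMNS = {0: "host"}                       # value matched against the metric
PROPERTY_COLUMNS = {1: "appli", 2: "location"}  # values written as new properties

Rule = Tuple[Dict[str, str], Dict[str, str]]


def load_tag_rules(csv_text: str) -> List[Rule]:
    """Turn each CSV row into (keys to match, properties to add) by column position."""
    rules: List[Rule] = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        keys = {name: row[pos] for pos, name in KEY_COLUMNS.items()}
        props = {name: row[pos] for pos, name in PROPERTY_COLUMNS.items()}
        rules.append((keys, props))
    return rules


if __name__ == "__main__":
    print(load_tag_rules("srv01,Payroll,Nashville\nsrv02,CRM,Denver\n"))
```

If the file's second and third columns were swapped, ‘Nashville’ would land in appli. A correct column order prevents exactly that mistake.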

The type of match is also very important. For example, choosing ‘String’ will allow you to match exact strings of characters. One of the most flexible options is ‘Regex’, which is short for ‘Regular Expressions’. Regex is a language built for pattern matching, supported by many operating systems and programming languages. Here we’ll explore a few common regex recipes.

Explore: The examples below just scratch the surface of what is possible with regular expressions. Check the Internet for in-depth tutorials.

Regex Matches

.*FOO.* Matches when ‘FOO’ is anywhere in the string, such as ‘FOOBAR’ or ‘EATFOOD’

^FOO.* Matches when ‘FOO’ is at the beginning of the string. ‘FOOBAR’ would match but ‘EATFOOD’ would not.

^.{3}FOO.* Any string where the letters ‘FOO’ are characters 4, 5, and 6. ‘EATFOOD’ would match but ‘FOOBAR’ would not.


.*[fF][oO][oO].* Makes the match case-insensitive. Both ‘foobar’ and ‘FOOBAR’ would match.

.*FOO\d.* Will match when ‘FOO’ is followed by a single digit. ‘FOO1BAR’ would match, but ‘FOOBAR’ would not.
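The recipes above can be checked quickly with Python’s `re` module. This sketch assumes the ‘Regex’ match type applies the pattern to the whole property value, which would explain why every recipe is padded with `.*`. Python’s regex dialect is close to, but not guaranteed identical to, the one ViPR SRM uses.

```python
import re

# (pattern, values expected to match, values expected not to match)
RECIPES = [
    (r".*FOO.*", ["FOOBAR", "EATFOOD"], ["BAR"]),
    (r"^FOO.*", ["FOOBAR"], ["EATFOOD"]),
    (r"^.{3}FOO.*", ["EATFOOD"], ["FOOBAR"]),
    (r".*[fF][oO][oO].*", ["foobar", "FOOBAR"], ["BAR"]),
    (r".*FOO\d.*", ["FOO1BAR"], ["FOOBAR"]),
]


def tag_matches(pattern: str, value: str) -> bool:
    # Assumption: the whole value must match the pattern, hence fullmatch.
    return re.fullmatch(pattern, value) is not None


for pattern, should_match, should_not in RECIPES:
    for value in should_match:
        assert tag_matches(pattern, value), (pattern, value)
    for value in should_not:
        assert not tag_matches(pattern, value), (pattern, value)
print("all recipes behave as described")
```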

Configuring a property is much simpler. It is only necessary to set a property name and column position, and optionally a default value. Here are a few tips to keep in mind when choosing a property name.

- Property names can only be up to eight characters long. Any extra characters will get truncated. This means homeaddress will become homeaddr.
- Property names are case sensitive. HomeAddr is different from homeaddr.
- Always check for collisions. If your metric already contains a property called lunname, overwriting it is likely to have unintended consequences, such as reports not working.
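The three tips can be turned into a quick pre-flight check. This is a hypothetical helper, not part of ViPR SRM. It applies the eight-character truncation and a case-sensitive collision test against a set of property names assumed to already exist on the metric.

```python
from typing import List, Set

MAX_PROPERTY_LEN = 8  # longer names are truncated, per the tips above


def stored_name(name: str) -> str:
    """Return the name as it will actually be stored after truncation."""
    return name[:MAX_PROPERTY_LEN]


def check_property_name(name: str, existing: Set[str]) -> List[str]:
    """List problems with a proposed property name; an empty list means it looks safe."""
    problems = []
    stored = stored_name(name)
    if stored != name:
        problems.append(f"'{name}' will be truncated to '{stored}'")
    # Set membership is case sensitive, just like property names.
    if stored in existing:
        problems.append(f"'{stored}' collides with an existing property")
    return problems


if __name__ == "__main__":
    existing = {"device", "part", "lunname"}  # hypothetical properties already on the metric
    for candidate in ("homeaddress", "lunname", "LunName", "appli"):
        print(candidate, check_property_name(candidate, existing))
```

LunName passes the collision check because names are case sensitive, although a name that differs only in case is still easy to confuse in reports.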

Saving a Tag Set

When you click Save, the resulting dialog will list all registered modules. This is an opportunity to apply this tagging configuration to a larger part of the environment. This can be a very convenient feature. Suppose you want to apply the same Data Enrichment rules to every VMware Collector: just check each one and choose ‘Update’.

These steps are shown in Figure 38.


Checking for Updates

Once you’ve implemented a set of Data Enrichment rules, it will take some time for them to be applied. Typically, two things must happen before the new properties will be available:

- The collector on which the rules are applied must complete a collection cycle
- The property store must be updated on the Frontend host

If there is a need to iterate quickly, each of these can be manually initiated.

You can restart the registered collector to ensure a new cycle starts quickly. This can be done in the GUI from Centralized Management: find the appropriate collector-manager on the Collector host(s) and choose to restart the service. It can also be done by connecting to the Collector host(s) over SSH and using the manage-modules script, typically located at /opt/APG/bin/manage-modules.sh.

To update the property store, browse to the Frontend host in Centralized Management and run the import-properties task. This can also be done in the terminal using the /opt/APG/bin/manage-tasks.sh script. This task normally runs nightly, but can be run at any time.


Both shell scripts print help text describing their syntax when run with no arguments.

Once these tasks are complete, the enriched properties can be seen in Management of Database Metrics.

Fun fact: Running the import-properties task is known colloquially among ViPR SRM engineers as “Kicking the property store”.


Using Data Enrichment for Application Chargeback

At this point, we will assume you have used Data Enrichment to apply a set of appli (Application) tags to the hosts and virtual machines in your environment.

The report we’ve already developed for displaying Array, LUN, and total Capacity for each host can be re-used here. We can create a new node and copy/paste the existing report as a child. By adding a simple expansion on appli to this child node, we can aggregate per-application capacity using a Sum formula. This is the same process we followed when we aggregated per-LUN capacity on the nodes which had expansions on device.

We can also use a ChildCount formula on the newly pasted node to get a count of all Hosts and Virtual Machines associated with an application.
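Conceptually, the expansion on appli groups hosts and VMs by tag. The Sum formula totals their capacity, and ChildCount counts them. Here is a minimal Python sketch of that aggregation, with hypothetical field names (`appli`, `capacity_gb`) and a hypothetical ‘untagged’ bucket for hosts without a tag.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def chargeback(hosts: List[Dict]) -> Dict[str, Tuple[float, int]]:
    """Map each appli tag to (total capacity, number of hosts and VMs)."""
    capacity: Dict[str, float] = defaultdict(float)
    members: Dict[str, int] = defaultdict(int)
    for host in hosts:
        app = host.get("appli", "untagged")   # hypothetical bucket for untagged hosts
        capacity[app] += host["capacity_gb"]  # plays the role of the Sum formula
        members[app] += 1                     # plays the role of ChildCount
    return {app: (capacity[app], members[app]) for app in capacity}


if __name__ == "__main__":
    inventory = [
        {"name": "srv01", "appli": "Payroll", "capacity_gb": 100.0},
        {"name": "vm-pay2", "appli": "Payroll", "capacity_gb": 250.0},
        {"name": "vm-crm1", "appli": "CRM", "capacity_gb": 50.0},
    ]
    print(chargeback(inventory))
```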

This will make the configuration of an application-focused report very simple:


The result is a straightforward application-focused capacity report. It can be drilled into to view per-host information.

Explore: What out-of-the-box reports could be improved with tags customized to your business?


Alerting on Reports

The alert engine in ViPR SRM is very flexible. It can generate emails or SNMP traps, write log entries, and execute arbitrary actions. These can be triggered by different stimuli:

- Incoming alerts from other sources (Alert Consolidation)
- Simple analysis of incoming collection data (APG Values Socket Listener)
- Results of scheduled reports (APG Report Data)

In this case, the Director of Cloud has requested changes to provisioning policy based on array performance. Such a decision is probably best based on a combination of multiple metrics rather than a single value. For that reason, we will use the APG Report Data Adapter.

The rule we will implement is:

The formula above (Figure 42) is an artificial ‘score’ describing how busy a VMAX FA (Front- End Adapter) processor is.

Cloud storage provisioning in this environment is managed by the ViPR Controller. To stop new provisioning, we will deregister the busy FA port from the vArray it is associated with. This will not affect capacity which is already allocated, but will prevent the ViPR Controller from using this FA in the future.


Building the Report

This report will have a structure similar to those we have built in the past. The primary difference is the addition of the Math.Spatial.Average, Math.Spatial.Max, and Math.Spatial.Percentile formulas. These allow us to apply concepts such as the Average and Maximum of a set without using Time Management. The Percentile formula provides a capability that isn’t possible with normal Time Management rules.
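As described above, these formulas work across the members of a set rather than across a time range. The Python sketch below shows the same three ideas on a hypothetical list of per-member utilization values. The nearest-rank percentile is one common definition. The method ViPR SRM uses (for example, whether it interpolates) is not confirmed here and may give different results.

```python
import math
from statistics import mean
from typing import List


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile; ViPR SRM's exact percentile method may differ."""
    if not values or not 0 < pct <= 100:
        raise ValueError("values must be non-empty and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical utilization of each member of one set at a single timestamp.
    utilization = [12.0, 40.0, 55.0, 61.0, 90.0]
    print("average:", mean(utilization))
    print("max:", max(utilization))
    print("80th percentile:", nearest_rank_percentile(utilization, 80))
```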

Scheduling the Report

This can be done from the Tools menu: choose ‘Schedule this Report’. After configuring a schedule, the most important step is checking the ‘Local Manager’ box on the ‘Alert’ tab.


In the background, this works by sending the XML output of the report to this folder on the Alerting Backend host:

/opt/APG/Backends/Alerting-Backend/Default/custom/Adapter/Watch4net Report Data Adapter/APG Report Data

Explore: Check the report data folder on the Backend Host to ensure Scheduled Reports are coming across. Note the format of the XML output.

Configuring the Alert

To find the Alerting interface, browse to Administration (for full admin users) or Modules (for normal users). Alternatively, browse to http://[frontend-host]:58080/alerting-frontend/

Notes on the Report Data Adapter

First, ensure that the Report Data Adapter is installed. It should appear in the ‘Adapters’ section of the tree. If it is not installed, this process can be started by clicking the ‘Create a New Element’ button, which can be found in the same location as the ‘New Node’ button in Edit Mode.

It is also possible to edit the Report Data Adapter settings here. Temporarily reducing the Time Check value can speed up troubleshooting, allowing fast iteration when configuring and testing a new alert.


The name of the Report Data Adapter instance is also significant, as it will be referenced in a filter. In environments with a wide variety of report-based alerts, multiple adapter instances may be configured with different names and complementary file masks.

The Alerting Definition

In the ‘Alerting Definitions’ section, a new definition will be created, similar to creating a new node when in Edit Mode.

Once in the alert configuration section, different blocks can be dragged and dropped to create the alert logic. A typical alert will consist of at least three blocks:

- A Filtered Entry, which defines which data the alert acts on
- A Condition such as a Comparator, which checks the filtered value against a logical test
- An Action, which occurs when the Condition is met (or not met)

The Filtered Entry for this alert will check for three properties: adapterName, reportName, and name. The first and second are straightforward. The adapterName property will match the name of the Report Data adapter. The reportName property will match the name of the scheduled report.

The third has some special rules. When a table report is parsed, each numerical column is turned into a metric. The name property of the metric is based on the name of the column, but it is important to note that the parser removes all spaces. This means that a column named “Utilization Score” would get a name property of “UtilizationScore”. Non-numerical columns are turned into properties for the numerical columns. For example, the report scheduled above has non-numerical columns named “VMAX” and “FA”. This means the numerical metric on the “Utilization Score” column will get tagged with properties VMAX and FA.

Explore: Properties have an eight-character limit. How would this impact column names on a report meant to be parsed by the Alerting Backend?
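The parsing rule can be sketched in Python to make the resulting metric shape concrete. This is an illustration, not the Report Data Adapter’s code. The sketch takes the list of numerical columns as an explicit argument. Guessing types from cell values would misfire here, since an array serial such as the hypothetical ‘000195700123’ looks numeric.

```python
from typing import Dict, List


def row_to_metrics(row: Dict[str, str], numeric_columns: List[str]) -> List[Dict[str, object]]:
    """Turn one report row into metrics: one per numeric column, tagged with the rest."""
    # Non-numerical columns become properties shared by every metric in the row.
    props = {col: val for col, val in row.items() if col not in numeric_columns}
    metrics: List[Dict[str, object]] = []
    for column in numeric_columns:
        metric: Dict[str, object] = dict(props)
        metric["name"] = column.replace(" ", "")  # the parser removes all spaces
        metric["value"] = float(row[column])
        metrics.append(metric)
    return metrics


if __name__ == "__main__":
    row = {"VMAX": "000195700123", "FA": "7E", "Utilization Score": "415.5"}
    print(row_to_metrics(row, ["Utilization Score"]))
```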

Once a Filtered Entry is configured, we can connect it to a Comparator. This will be a Constant Comparator Operation with a ">" operator and a constant value of “400”.

Finally, we can define an action. The change in provisioning policy can wait a moment – for now, let’s just configure an email alert.

Alert actions (such as Emails) will accept certain keywords:

- TMST – Timestamp of the alert
- VALUE – The value that triggered the alert
- PROP.'xxxx' – Displays property xxxx
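In the alerting interface this logic is assembled by dragging blocks, not by writing code. The Python sketch below only illustrates how the Filtered Entry, the Constant Comparator (> 400), and an email Action fit together, with the keywords expanded in one pass. The adapter name, report name, and timestamp are hypothetical. The keyword handling is an assumption based on the list above.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical Filtered Entry values: adapter instance name, scheduled report
# name, and the space-free column name.
FILTER = {
    "adapterName": "APG Report Data",
    "reportName": "FA Utilization",
    "name": "UtilizationScore",
}
THRESHOLD = 400.0  # Constant Comparator: value > 400

TOKEN = re.compile(r"PROP\.'([^']+)'|TMST|VALUE")


def render(template: str, metric: Dict[str, Any], tmst: int) -> str:
    """Expand the TMST, VALUE, and PROP.'xxxx' keywords in one pass."""
    def substitute(match):
        if match.group(1) is not None:
            return str(metric.get(match.group(1), ""))
        if match.group(0) == "TMST":
            return str(tmst)
        return str(metric["value"])
    return TOKEN.sub(substitute, template)


def evaluate(metric: Dict[str, Any], tmst: int, template: str) -> Optional[str]:
    """Return the email body if the alert fires, otherwise None."""
    if any(metric.get(key) != wanted for key, wanted in FILTER.items()):
        return None  # Filtered Entry: not the data this alert watches
    if not float(metric["value"]) > THRESHOLD:
        return None  # Comparator not met
    return render(template, metric, tmst)  # Action: build the message


if __name__ == "__main__":
    sample = dict(FILTER, value=415.5, VMAX="000195700123", FA="7E")
    print(evaluate(sample, 1430000000, "FA PROP.'FA' on PROP.'VMAX' scored VALUE at TMST"))
```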

Once this alert is configured to send an email in the desired format, we can move on to the provisioning policy change.

Automating the Policy Change

We’ll be modifying a parameter in ViPR Controller. To do this, the ViPR Controller CLI will need to be installed on the ViPR SRM Alerting Backend. The exact procedure for doing so can be found at http://www.emc.com/techpubs/vipr/installing_the_vipr_cli-1.htm.

Once the CLI is installed, we can create an External Process action in the ViPR SRM alert definition. This action expects a command, a set of command parameters, and any environment parameters.

Explore: Try issuing different commands. For example, the command ‘/bin/date’ with parameters ‘>>,/home/apg/date.txt’ will create a simple log of alert times.

The command will be run as user ‘apg’ – this means the environment variables required for the ViPR CLI will need to exist for this user as well. The simplest way to do this is likely to define these variables in the ‘Environment Parameters’ box when creating the External Process action. These will be placed on one line, separated by commas. The required variables for executing a ViPR CLI command are typically:

PATH=/opt/ViPR/cli/bin:$PATH

PYTHONPATH=/opt/ViPR/cli/bin:$PYTHONPATH

ViPR_HOSTNAME=[ViPR Controller FQDN]

ViPR_PORT=4443

We also need to define the command and its arguments. The full command we wish to run is:

viprcli storageport deregister -name [FA Port] -type vmax -serialnumber [Array Serial]

To configure this in the External Process action, first we enter the binary in the ‘Command’ box. This must include the full path, typically ‘/opt/ViPR/cli/viprcli’.

In the command parameters box, the arguments must be entered one at a time, comma-separated.

The Array Serial number and FA Port name can be populated using the PROP keyword with the column header names: PROP.'VMAX' and PROP.'FA'.

Note in the screenshot that ‘:0’ has been appended to the FA property. This is because the port must be specified to match ViPR Controller’s nomenclature. There is an assumption here that only the VMAX’s zero ports are in use. If both the zero and one ports are in use, it will be necessary to create two actions to deregister each one individually.
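The following Python sketch shows what the External Process settings amount to once the comma-separated boxes are split and the PROP keywords are filled in. It performs a dry run and prints the command instead of executing it. The command path, viprcli arguments, and variable names come from the text above. The hostname `vipr.example.com` is hypothetical. How the alerting backend splits the boxes, and whether it expands `$PATH`, is assumed rather than confirmed.

```python
import re
import shlex
from typing import Dict, List

# Values as they might be typed into the External Process action.
COMMAND = "/opt/ViPR/cli/viprcli"
PARAMETERS = "storageport,deregister,-name,PROP.'FA':0,-type,vmax,-serialnumber,PROP.'VMAX'"
ENVIRONMENT = ("PATH=/opt/ViPR/cli/bin:$PATH,PYTHONPATH=/opt/ViPR/cli/bin:$PYTHONPATH,"
               "ViPR_HOSTNAME=vipr.example.com,ViPR_PORT=4443")  # hostname is hypothetical

PROP = re.compile(r"PROP\.'([^']+)'")


def build_argv(metric: Dict[str, str]) -> List[str]:
    """Split the comma-separated parameters and fill in PROP.'xxxx' keywords."""
    args = [PROP.sub(lambda m: metric[m.group(1)], p) for p in PARAMETERS.split(",")]
    return [COMMAND] + args


def build_env(line: str) -> Dict[str, str]:
    """Split the comma-separated NAME=value pairs; values are kept verbatim here."""
    env = {}
    for pair in line.split(","):
        key, _, value = pair.partition("=")
        env[key] = value
    return env


if __name__ == "__main__":
    argv = build_argv({"VMAX": "000195700123", "FA": "7E"})  # hypothetical serial and FA
    # Dry run: print instead of executing (e.g. with subprocess.run).
    print(" ".join(shlex.quote(a) for a in argv))
    print(build_env(ENVIRONMENT))
```

Because this sketch splits parameters on commas, it cannot pass an argument that itself contains a comma.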

Testing Port Deregistration

It is possible to re-register ports in ViPR Controller after they have been deregistered. To quickly test that the process is correctly configured end-to-end, try the following:

- Add a part filter to the Filtered Entry for a single VMAX Controller
- Set the Comparator threshold very low to ensure it will trigger
- Manually run the Scheduled Report
- Once the action has been observed to fire correctly, re-register the port in ViPR Controller (Physical Assets / Storage Systems / [Select System] / Storage Ports / [Register Port])

Note: Between the writing and publication of this paper, the ViPR Controller 2.2 release implemented a feature similar to what is described above which does not require ViPR SRM. As such, consider this an example of what can be done with ViPR SRM by treating it as a platform rather than a simple monitoring dashboard.


Conclusion

During the course of this article, we’ve started simple and built up to a number of fairly complex reports. These tutorials are intended to allow an administrator to explore ViPR SRM reporting. Users who invest time in developing these skills will find themselves much more confident the next time they encounter a Hard Question.

The skills presented here are only the beginning. ViPR SRM is a deep platform, with capabilities and intricacies that cannot be plumbed in a single article. Consider this article a first step – a Hello World app for a new language. As you continue on your journey, consider some of the additional resources below for learning more.

The official documentation for ViPR SRM can be found online at: https://community.emc.com/docs/DOC-35810

The EMC Community supporting the broader ViPR portfolio can be found at: https://community.emc.com/community/products/vipr This community includes videos, demos, and even posts by users who share the custom reports they’ve created.

Your humble author maintains a blog which regularly discusses ViPR SRM (and other enterprise-technology-adjacent topics): http://eastsidegeek.typepad.com/

The EMC Community post referenced in the introduction will also be updated as needed with more information related to this document: https://community.emc.com/people/DannoOfNashville/blog/2015/05/03/knowledge-sharing-2015-vipr-srm-reports

For users who wish to go deeper in a hands-on setting, EMC Education Services offers a ViPR SRM Advanced Reporting class, as well as classes around the installation, maintenance, and general use of the environment: https://education.emc.com/

Finally, I encourage everyone to become actively involved in the online communities above. Questioning, exploration, sharing, and diversity of opinion make the entire ecosystem stronger. The more you participate, the more the community will become the one you want to see.


“If this machine gave you the truth immediately, you would not recognize it, because your heart would not have been purified by the long quest”

– Umberto Eco, Foucault’s Pendulum

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
