Protection Manager 3.8 Default Provisioning Explained

This document explains how version 3.8 of Protection Manager performs secondary and tertiary volume provisioning using its default provisioning feature and Resource Pools. In particular, it explains the various size and space variables that will cause an aggregate to be disqualified from use for provisioning a secondary or tertiary volume. If you are not already familiar with Resource Pools, it is recommended that you first read the contents of the “Understanding Resource Pools” folder of the online help that is part of the NetApp Management Console. The folder includes an overview of the provisioning processes. The sections “Sequence for selecting backup destination volumes” and “Sequence for selecting mirror destination volumes” provide high-level explanations of the provisioning decision processes.

The Decision Process for Default Provisioning

Before walking through the processes, let’s take a look at the important options and variables used in the decision process.

The Dynamic Secondary Sizing option

The 3.8 version of Protection Manager includes a new feature called dynamic secondary sizing (see the Admin Guide for details on this feature). This feature is controlled by the dpDynamicSecondarySizing global option, which must be enabled for the provisioning described here to take place. You can check the current setting by issuing “dfm options list dpDynamicSecondarySizing” on the command line of the DFM server, and change it with the “dfm options set dpDynamicSecondarySizing=[Yes|No]” command. If dynamic secondary sizing is disabled, Protection Manager uses different calculations; those calculations are explained in the No Dynamic Secondary Sizing section, near the end of this document. By default, dynamic secondary sizing is disabled on upgrades from earlier releases and enabled on new installations.

Dynamic secondary sizing includes resizing the secondary volume after it is initially provisioned. This volume resizing is not discussed here.

The Source Volume/Qtree variables

Objects that can be added to the primary node of a dataset include storage systems, aggregates, volumes and qtrees. Regardless of which objects make up the primary data, Protection Manager works at the volume level of the source objects to determine how much space needs to be provisioned.

When Protection Manager is going through its provisioning decision process, it looks at the following information for each volume in the primary node of the dataset.

- Source Volume’s Total Size: the size of the volume’s data space plus the size of the volume’s snap reserve space; in other words, the total size of the volume.
- Source Volume’s Used Size: the amount of space consumed in the data space portion of the volume. Space consumed in the snap reserve area does not contribute to the Source Volume’s Used Size.
- To-Be-Provisioned Volume’s Projected Size: Protection Manager’s best estimate of the size of the to-be-provisioned secondary or tertiary volume, based on the current values of the Source Volume’s Total Size and Used Size. Ideally, the provisioned volume would be large enough to avoid rebaselining the relationship(s) in the volume, but small enough that space is not wasted. The formula for calculating the Projected Size is to take the maximum of ([1.2 times the Source Volume’s Total Size] or [2.0 times the Source Volume’s Used Size]) and then add 10%. The Examples section will help clarify the use of the Projected Size.
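The Projected Size formula above can be sketched in Python. This is an illustration of the arithmetic described in the text, not Protection Manager code; the function name and unit handling are my own.

```python
def projected_size(total_size, used_size):
    """Estimate the To-Be-Provisioned Volume's Projected Size (sketch).

    Per the formula above: the maximum of 1.2 x the source volume's
    Total Size and 2.0 x its Used Size, plus 10%. Any consistent size
    unit (e.g. GB) works, as long as both inputs use the same unit.
    """
    return max(1.2 * total_size, 2.0 * used_size) * 1.10
```

For a 60GB volume with 20GB used, this yields max(72, 40) + 10% = 79.2GB, matching Example 1 later in this document.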

Before reviewing the specific variables, one common-sense point deserves an explicit statement. A number of the variables discussed below come in pairs of the form abcNearlyFullThreshold and abcFullThreshold. Protection Manager’s default settings ensure that the NearlyFull value is less than the corresponding Full value, as you would expect. If you change any of these default settings, you should maintain that relationship: the NearlyFull value should always be less than the Full value.

The Resource Pool variables

Objects that can be added to a resource pool include storage systems and aggregates. Adding a storage system to a resource pool is simply shorthand for adding all of that storage system’s aggregates to the resource pool. When provisioning a secondary or tertiary node of a dataset, Protection Manager looks at resource pool variables and at variables of the aggregates within the selected resource pool(s). When you create a resource pool there are four “Space Threshold” values that you can set for that resource pool. The “factory installed” default values are displayed in the resource pool wizard.

1. Nearly Full threshold is a percentage of the size of the resource pool. This setting does not figure into the provisioning decision process; it is a trigger point for generating the Resource Pool Space Nearly Full event.
2. Full threshold is a percentage of the size of the resource pool. This setting does not figure into the provisioning decision process; it is a trigger point for generating the Resource Pool Space Full event.
3. Nearly overcommitted threshold is a per-aggregate threshold for each aggregate in the resource pool. This value can be overridden by an individual aggregate’s setting. This setting figures into the provisioning decision process, and it is also a trigger point for generating the Aggregate Almost Overcommitted event for aggregates in the resource pool.
4. Overcommitted threshold is a per-aggregate threshold for each aggregate in the resource pool. This value can be overridden by an individual aggregate’s setting. It is a trigger point for generating the Aggregate Overcommitted event for aggregates in the resource pool.

Of these four variables, only the Nearly overcommitted threshold is used in provisioning.

You can enable or disable the use of the Nearly overcommitted and Overcommitted thresholds. If you disable the aggregate overcommitted thresholds, then the resource pool setting for the Nearly overcommitted threshold will not be used in the provisioning decision process.

The Aggregate Variables

In the resource pool wizard, you can set the Nearly overcommitted threshold (%) for aggregates that are in the selected resource pool. In addition to those settings, Protection Manager supports a set of global and individual settings for aggregates.

These next two variables have global settings. They can also be set on individual aggregates. The setting of an individual aggregate takes precedence over the global value.

1. Aggregate Nearly Full Threshold (%) is an indication of how full you are willing to allow the aggregate to become. Think of this as a high water mark for actual used data in the aggregate. The default global setting is 80%. Individual aggregates can have their own settings, too.
2. Aggregate Nearly Overcommitted Threshold (%) is an indication of how overcommitted you are willing to allow the aggregate to become. Think of this as the high water mark for thin provisioning within the aggregate. The default global setting is 300%. Individual aggregates can have their own settings, too.

The global settings for these thresholds can be viewed and modified from the Operations Manager WebUI, under ControlCenter->Setup->Options->DefaultThresholds. The individual aggregate settings can be viewed and modified from the Operations Manager WebUI by selecting the intended aggregate (ControlCenter->Home->MemberDetails->Aggregates) and then clicking the Edit Settings option on the left-hand side of the display, under Aggregate Tools.

Along with the two Nearly thresholds, there are Aggregate Full Threshold and Aggregate Overcommitted Threshold, but these do not factor into the provisioning decision process.

Scope for the Variables

Each aggregate may have an Aggregate Nearly Full Threshold value. If the aggregate does not have its own setting, then the global setting will be used for the aggregate.

Each aggregate may have an Aggregate Nearly Overcommitted Threshold value. If the aggregate does not have its own setting, then the setting of the containing resource pool will be used, provided the resource pool’s aggregate overcommitted thresholds are enabled. If the aggregate doesn’t have its own setting and it can’t get the setting from the resource pool, then the global setting will be used for the aggregate. Along with these threshold settings, each aggregate has a current value for its used space and its committed space. These values can be viewed in the Space Breakout tab of the Resource Pool display.
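The precedence rules above can be sketched as a small resolution function. This is an illustration of the lookup order described in the text; the function and parameter names are my own, not Protection Manager internals.

```python
def effective_nearly_overcommitted(aggr_setting, pool_setting,
                                   pool_thresholds_enabled, global_setting):
    """Resolve an aggregate's Nearly Overcommitted threshold (sketch).

    Per the text: the aggregate's own setting wins; otherwise the
    containing resource pool's setting is used, but only when the pool's
    aggregate overcommitted thresholds are enabled; otherwise the global
    setting applies. None means "no setting at that level".
    """
    if aggr_setting is not None:
        return aggr_setting
    if pool_thresholds_enabled and pool_setting is not None:
        return pool_setting
    return global_setting
```

For example, an aggregate with no setting of its own in a pool whose thresholds are disabled falls through to the 300% global default.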

Disqualifying an aggregate

Now we have all the variables needed to determine if an aggregate is disqualified due to size constraints.

How an aggregate gets disqualified from being used for volume provisioning

Here’s what Protection Manager does for each source volume:

1. ProtMgr selects an aggregate from the resource pool.
2. ProtMgr adds the aggregate’s Used Size to the Source Volume’s Used Size. Let’s call this the Potential Used Space.
3. ProtMgr divides the Potential Used Space by the aggregate’s Total Size. This yields a percentage value. Let’s call this the Potential Used Percentage.
4. ProtMgr compares the Potential Used Percentage with the aggregate’s NearlyFull Threshold percentage. If the Potential Used Percentage is greater than the NearlyFull percentage, the threshold would be exceeded, so the aggregate is disqualified. The aggregate will not be used to provision the secondary or tertiary volume, and Protection Manager goes back to step 1.
5. Now Protection Manager looks at overcommit values. It adds the aggregate’s Committed Size to the To-Be-Provisioned Volume’s Projected Size. Let’s call this the Potential Committed Size.
6. ProtMgr divides the Potential Committed Size by the aggregate’s Total Size. This yields a percentage value. Let’s call this the Potential Committed Percentage.
7. ProtMgr compares the Potential Committed Percentage to the aggregate’s NearlyOvercommitted Threshold percentage. If the Potential Committed Percentage is greater than the aggregate’s NearlyOvercommitted Threshold percentage, the threshold would be exceeded, so the aggregate is disqualified. It will not be used to provision the secondary or tertiary volume, and Protection Manager goes back to step 1.
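The steps above can be sketched as a single qualification check. This is an illustrative summary of the decision process described in the text, not Protection Manager code; names and argument order are my own.

```python
def aggregate_qualifies(aggr_total, aggr_used, aggr_committed,
                        src_vol_used, projected_size,
                        nearly_full_pct, nearly_overcommitted_pct):
    """Run both threshold checks from the steps above (sketch).

    All sizes must be in the same unit (e.g. GB); thresholds are in
    percent. Returns True only if the aggregate survives both checks.
    """
    # Steps 2-4: Nearly Full check on used space
    potential_used_pct = (aggr_used + src_vol_used) / aggr_total * 100
    if potential_used_pct > nearly_full_pct:
        return False
    # Steps 5-7: Nearly Overcommitted check on committed space
    potential_committed_pct = (aggr_committed + projected_size) / aggr_total * 100
    if potential_committed_pct > nearly_overcommitted_pct:
        return False
    return True
```

With the numbers from Example 2 later in this document (a 1,000GB aggregate with 600GB used and 1,200GB committed, a source volume with 200GB used and a 792GB projected size), only the aggregate with 85%/250% thresholds passes both checks.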

Protection Manager will keep trying to find an aggregate from the selected Resource Pool(s). If it gets to the last available aggregate and still cannot find a qualified aggregate then Protection Manager will issue a set of error messages. Here is an example message from the Preview page of the Apply Protection Policy Wizard.

Conformance Results

=== SEVERITY === Error: Attention: Provisioning a new flexible volume (backup secondary) failed. === ACTION === No physical resources exist, so thin provision a new flexible volume (backup secondary) of size 3.60 GB for qtree filer_X:/test_vol/- into node 'Backup' and then attempt to create a backup relationship using Qtree SnapMirror first, then try SnapVault if Qtree SnapMirror relationship creation fails. === REASON ===

Storage system : 'filer_z.lab_A.netapp.com'(108):

Aggregate : 'filer_z:aggr_01'(1277):

- Nearly overcommit threshold of the aggregate will exceed: 'filer_z:aggr_01'(1277)[Overcommits to: 273.655 % (3.94 TB), Nearly overcommitted threshold : 100 % (1.44 TB)]

=== SUGGESTION ===

Suggestions related to storage system 'filer_z.lab_A.netapp.com'(108):

Suggestions related to aggregate 'filer_z:aggr_01'(1277):

- Increase aggregate nearly overcommitted threshold for 'filer_z:aggr_01'(1277) to thinly provision volumes.

The REASON portion of the example shows that using aggregate filer_z:aggr_01 to provision a new volume of 3.60GB would cause the aggregate’s nearly overcommitted threshold to be exceeded. This disqualifies the aggregate, and since no other aggregates are available, Protection Manager issues the Conformance Results shown above.

Two Additional Attributes

In addition to the size considerations discussed so far, there are two attributes that figure into the aggregate disqualification process. The first is storage system licenses. As Protection Manager considers the aggregates in the resource pool, it also has to consider the licenses installed on the aggregate’s hosting storage system. For example, if you are trying to provision a volume for a Mirror policy, then only aggregates on storage systems with the SnapMirror license will be considered. The same type of requirement exists for a Backup policy. There may be plenty of space on an aggregate, but if the containing storage system does not have the proper license, the aggregate is disqualified.

The other attribute is the new 3.8 Fan-In feature (see the Admin Guide for details). If fan-in is set to something greater than 1, then size calculations will be based upon the sum of the source volumes’ space attributes. For example, if fan-in is set to 2 and you are trying to provision secondary storage for a dataset with 2 volumes, using the Backup policy, the Source variables for both volumes will be added in the calculation of “Projected” size.

No Dynamic Secondary Sizing

If dynamic secondary sizing is disabled, then the following calculation is used.

- To-Be-Provisioned Volume’s Projected Size is calculated by taking the maximum of [(Source Volume’s Used Size), (0.8 * Source Volume’s Total Size)] and multiplying the result by 1.5.

Note again, if the Fan-In feature is enabled, Protection Manager uses the sum of the source volume size variables.
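The non-dynamic formula can be sketched the same way as the dynamic one. Again, this is an illustration of the arithmetic described above, not Protection Manager code.

```python
def projected_size_no_dss(total_size, used_size):
    """Projected Size when dynamic secondary sizing is disabled (sketch).

    Per the text: max(Used Size, 0.8 * Total Size) * 1.5. Any consistent
    size unit works; with fan-in greater than 1, pass the summed source
    volume sizes.
    """
    return max(used_size, 0.8 * total_size) * 1.5
```

For a 60GB volume with 20GB used, this yields max(20, 48) * 1.5 = 72GB, a smaller estimate than the 79.2GB the dynamic formula produces for the same volume.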

OpenSystem SnapVault (OSSV) variables

From Protection Manager’s provisioning perspective, there are two classifications for OSSV agents. If the OSSV agent is on a non-ESX server, then Protection Manager does not have a way to get the values for the Source Volume variables. In this case, Protection Manager assumes a default of 10.0GB for both the Source Volume’s Used Size and the To-Be-Provisioned Volume’s Projected Size.

When the OSSV agent is running on an ESX Server, Protection Manager is able to get the size of the virtual disk for the VM(s) to be protected. Protection Manager uses 1.5 times the size of the virtual disk for both the Source Volume’s Used Size and the To-Be-Provisioned Volume’s Projected Size. For example, if a VM virtual disk is 8.0 GB (as shown in the VMCenter Infrastructure Client) then Protection Manager will use 12.0 GB (8.0GB * 1.5) as the Source Volume’s Used Size and the To-Be-Provisioned Volume’s Projected Size.
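The two OSSV cases can be sketched as one small function. This is an illustration of the sizing rules just described; the function name and the use of None to signal a non-ESX host are my own.

```python
def ossv_assumed_size_gb(virtual_disk_gb=None):
    """Size Protection Manager assumes for an OSSV source (sketch).

    Per the text: on a non-ESX host the source sizes are unknown, so a
    default of 10.0 GB is assumed; on an ESX host, 1.5 x the VM's
    virtual disk size is used. The result serves as both the Source
    Volume's Used Size and the Projected Size.
    """
    if virtual_disk_gb is None:    # non-ESX OSSV agent: sizes unavailable
        return 10.0
    return 1.5 * virtual_disk_gb   # ESX-hosted agent: 1.5 x virtual disk
```

An 8.0GB virtual disk yields 12.0GB, matching the example above.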

Examples

In these examples, we look at the individual aggregate threshold settings when doing the calculations. Remember that an aggregate’s NearlyFull threshold comes from either the global setting or the specific aggregate setting. The aggregate’s NearlyOvercommitted threshold comes from the global setting, the resource pool setting, or the specific aggregate setting.

Example 1: This example shows how the To-Be-Provisioned Volume’s Projected Size is calculated. In the remaining examples we simply show the value, without doing the calculation.

SrcVolTotalSize = 60GB
SrcVolUsedSize = 20GB
ToBeProvProjSize = max [(1.2 * 60GB), (2.0 * 20GB)] + 10% = 72GB + 7.2GB = 79.2GB

Example 2: In this example we look at one source volume and consider 3 candidate aggregates. The aggregates are the same total size, but their current values and their thresholds vary.

The Source Volume:
SrcVolTotalSize = 600GB
SrcVolUsedSize = 200GB
ToBeProvProjSize = 792GB

Consider aggregate aggr_ABC:
CurrentAggrTotalSize = 1,000GB
CurrentAggrUsedSize = 600GB
CurrentAggrCommitSize = 1,200GB

AggrNearlyFullThreshold % = 70%
AggrNearlyOvercommittedThreshold % = 150%

PotentialUsedSpace = SrcVolUsedSize + CurrentAggrUsedSize
PotentialUsedSpace = 200GB + 600GB = 800GB
PotentialUsedPercentage = PotentialUsedSpace / CurrentAggrTotalSize
PotentialUsedPercentage = 800GB / 1,000GB = 80%

PotentialUsedPercentage (80%) exceeds AggrNearlyFullThreshold (70%) so aggregate aggr_ABC is disqualified.

Consider aggregate aggr_DEF:
CurrentAggrTotalSize = 1,000GB
CurrentAggrUsedSize = 600GB
CurrentAggrCommitSize = 1,200GB

AggrNearlyFullThreshold % = 85%
AggrNearlyOvercommittedThreshold % = 150%

PotentialUsedSpace = SrcVolUsedSize + CurrentAggrUsedSize
PotentialUsedSpace = 200GB + 600GB = 800GB
PotentialUsedPercentage = PotentialUsedSpace / CurrentAggrTotalSize
PotentialUsedPercentage = 800GB / 1,000GB = 80%
PotentialUsedPercentage (80%) does not exceed AggrNearlyFullThreshold (85%), so go to the next threshold check.

PotentialCommittedSize = ToBeProvProjSize + CurrentAggrCommitSize
PotentialCommittedSize = 792GB + 1,200GB = 1,992GB
PotentialCommittedPercentage = PotentialCommittedSize / CurrentAggrTotalSize
PotentialCommittedPercentage = 1,992GB / 1,000GB = 199.2%
PotentialCommittedPercentage (199.2%) exceeds AggrNearlyOvercommittedThreshold (150%), so aggregate aggr_DEF is disqualified.

Consider aggregate aggr_GHI:
CurrentAggrTotalSize = 1,000GB
CurrentAggrUsedSize = 600GB
CurrentAggrCommitSize = 1,200GB

AggrNearlyFullThreshold % = 85%
AggrNearlyOvercommittedThreshold % = 250%

PotentialUsedSpace = SrcVolUsedSize + CurrentAggrUsedSize
PotentialUsedSpace = 200GB + 600GB = 800GB
PotentialUsedPercentage = PotentialUsedSpace / CurrentAggrTotalSize
PotentialUsedPercentage = 800GB / 1,000GB = 80%
PotentialUsedPercentage (80%) does not exceed AggrNearlyFullThreshold (85%), so go to the next threshold check.

PotentialCommittedSize = ToBeProvProjSize + CurrentAggrCommitSize
PotentialCommittedSize = 792GB + 1,200GB = 1,992GB
PotentialCommittedPercentage = PotentialCommittedSize / CurrentAggrTotalSize
PotentialCommittedPercentage = 1,992GB / 1,000GB = 199.2%
PotentialCommittedPercentage (199.2%) does not exceed AggrNearlyOvercommittedThreshold (250%), so aggregate aggr_GHI is qualified!

All three of the candidate aggregates had 400GB of unused space, but only the last one, aggr_GHI, qualified for the provisioning of the secondary volume. The first two aggregates were disqualified because of their threshold settings. As mentioned, these thresholds are configurable. You can tune them based on what you consider important.

I hope you found this article helpful.