Posts Tagged ‘drs’

Quick-Take: Removable Media and Update Manager Host Remediation

January 31, 2013

Thanks to a spate of upgrades to vSphere 5.1, I recently (re)discovered the following inconvenient result when applying an update to a DRS cluster from Update Manager (5.1.0.13071, using vCenter Server Appliance 5.1.0 build 947673):

Remediate entity ‘vm11.solori.labs’  Host has VMs ‘View-PSG’ , vUM5 with connected removable media devices. This prevents putting the host into maintenance mode. Disconnect the removable devices and try again.

Immediately I thought: “Great! I left a host-only ISO connected to these VMs.” However, that assumption was as flawed as Update Manager’s assumption that the workloads cannot be vMotion’d without disconnecting the removable media. In fact, the removable media in question was connected to a shared ISO repository available to all hosts in the cluster. The blame was mine, not Update Manager’s: I had forgotten that Update Manager’s default response to removable media is to abort the process. Since cluster remediation is a powerful feature made possible by Distributed Resource Scheduler (DRS) in Enterprise (and above) vSphere editions, and one that may be new to many (especially uplifted “Advanced AK” users), it seemed worth reviewing and blogging about.

Why is this a big deal?

More to the point, why does this seem to run contrary to a “common sense” response?

First, the manual process for remediating a host in a DRS cluster would include the following steps (a scripted sketch follows the list):

  1. Applying “Maintenance Mode” to the host,
  2. Selecting the appropriate action for “powered-off and suspended” workloads, and
  3. Allowing DRS to choose placement and finally vMotion those workloads to an alternate host.
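
For readers who prefer to script this manual path, here is a minimal sketch using pyVmomi (VMware’s Python SDK for the vSphere API), assuming a fully automated DRS cluster; the vCenter address, credentials, and host name are placeholders for this lab:

    # Minimal sketch of the manual remediation path with pyVmomi.
    # Assumes a fully automated DRS cluster; the vCenter address,
    # credentials, and host name below are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()      # lab use only
    si = SmartConnect(host="vcsa.solori.labs", user="administrator",
                      pwd="password", sslContext=ctx)
    content = si.RetrieveContent()

    # Locate the host to be remediated.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == "vm11.solori.labs")
    view.Destroy()

    # Steps 1-3: request maintenance mode; a fully automated DRS cluster
    # chooses placement and vMotions the running workloads, and the flag
    # below also relocates powered-off/suspended VMs.
    WaitForTask(host.EnterMaintenanceMode_Task(timeout=0,
                                               evacuatePoweredOffVms=True))
    print("%s is now in maintenance mode" % host.name)

    Disconnect(si)

Once the host has been patched, ExitMaintenanceMode_Task() returns it to service and DRS rebalances the cluster.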

In the case of VMs with removable media attached, this set of actions will result in the workloads being vMotion’d (without warning or hesitation) so long as the other hosts in the cluster have access to the removable media source (i.e. shared storage, not “Host Device”). However, in the case of Update Manager remediation, the following are documented roadblocks to a successful remediation (without administrative override):

  1. A CD/DVD drive is attached (any method),
  2. A floppy drive is attached (any method),
  3. HA admission control prevents migration of the virtual machine,
  4. DPM is enabled on the cluster,
  5. EVC is disabled on the cluster,
  6. DRS is disabled on the cluster (preventing migration),
  7. Fault Tolerance (FT) is enabled for a VM on a host in the cluster.

Therefore it is “by design” that a scheduled remediation would have failed, even if the removable media were eligible for vMotion. To assist in evaluating “obstacles to successful deferred remediation,” a cluster remediation report is available (see below).

Generating a remediation report prior to scheduling an Update Manager remediation.

In fact, the report will list all possible roadblocks to remediation whether or not matching overrides are selected (potentially misleading, and certainly not useful for predicting the outcome of the remediation attempt). While this, too, is counterintuitive, it serves as a reminder of the show-stoppers to successful remediation. For the offending “removable media” roadblock, the appropriate override check-box can be found on the options page just prior to the remediation report:

Disabling removable media during Update Manager-driven remediation.

The inclusion of this override allows Update Manager to slog through the remediation without regard to the attached status of removable media. Likewise, the other remediation overrides enable successful completion of the remediation process; these overrides are:

  1. Maintenance Mode Settings:
    1. VM Power State prior to remediation: Do not change, Power off, Suspend;
    2. Temporarily disable any removable media devices;
    3. Retry maintenance mode in case of failure (delay and attempts);
  2. Cluster Settings:
    1. Temporarily Disable Distributed Power Management (forces “sleeping” hosts to power-on prior to next steps in remediation);
    2. Temporarily Disable High Availability Admission Control (allows for host remediation to violate host-resource reservation margins);
    3. Temporarily Disable Fault Tolerance (FT) (VMware admonishes that all cluster hosts be remediated in the same update cycle to maintain FT compatibility);
    4. Enable parallel remediation for hosts in cluster (will not violate DRS anti-affinity constraints);
      1. Automatically determine the maximum number of concurrently remediated hosts, or
      2. Limit the number of concurrent hosts (1-32);
    5. Migrate powered off and suspended virtual machines to other hosts in the cluster (helpful when a remediation leaves a host in an unserviceable condition);
  3. PXE Booted ESXi Host Settings:
    1. Allow installation of additional software on PXE booted ESXi 5.x hosts (requires the use of an updated PXE boot image – Update Manager will NOT reboot the PXE booted ESXi host.)

These settings are available at the time of remediation scheduling and as host/cluster defaults (Update Manager Admin View.)
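
Several of the cluster-level roadblocks listed earlier can also be checked ahead of time against the vSphere API. Here is a rough pre-check sketch with pyVmomi, assuming an already-connected ServiceInstance (si) like the one in the earlier snippet; the cluster name is a placeholder, and this is no substitute for Update Manager’s own remediation report:

    # Rough pre-remediation check of cluster-level roadblocks (pyVmomi).
    # Assumes an existing, connected ServiceInstance `si`; the cluster
    # name below is a placeholder.
    from pyVmomi import vim

    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Lab-Cluster")
    view.Destroy()

    cfg = cluster.configurationEx
    print("DRS enabled:             ", cfg.drsConfig.enabled)
    print("DPM enabled:             ", cfg.dpmConfigInfo.enabled)
    print("HA admission control on: ", cfg.dasConfig.admissionControlEnabled)
    print("EVC mode:                ", cluster.summary.currentEVCModeKey or "disabled")

    # Per-VM checks: FT protection and connected removable media.
    removable = (vim.vm.device.VirtualCdrom, vim.vm.device.VirtualFloppy)
    for host in cluster.host:
        for vm in host.vm:
            if vm.runtime.faultToleranceState not in (None, "notConfigured", "disabled"):
                print("FT enabled:", vm.name)
            for dev in vm.config.hardware.device:
                if isinstance(dev, removable) and dev.connectable.connected:
                    print("Connected removable media:", vm.name, dev.deviceInfo.label)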

SOLORI’s Take: So while the remediation process is NOT as similar to the manual process as one might think, it can (almost) be made to function equivalently. There IS a big difference between disabling removable media and making vMotion-aware decisions about hosts. Perhaps VMware could take a few cycles to determine whether or not a host is actually bound to a removable media device (either through a Host Device or a local storage resource) and make a more intelligent decision about removable media.

vSphere already has the ability to identify point-resource dependencies; it would be nice to see this information more intelligently correlated where cluster management is concerned. Currently, instead of “asking” DRS for a dependency list, it seems to simply ask the hosts “do you have removable media plugged into any VMs?” and, if the answer is “yes,” stop right there. Still, not very intuitive for a feature (DRS) that has been around since Virtual Infrastructure 3 and vCenter 2.
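
As a rough illustration of the kind of decision being suggested, the sketch below (pyVmomi again, reusing a connected ServiceInstance from the earlier snippets) walks a host’s VMs and separates CD-ROM backings that genuinely pin a VM to its host (host or client device passthrough, or an ISO on storage not visible to multiple hosts) from shared-ISO backings that are perfectly vMotion-friendly. The host name is a placeholder and the classification is only an approximation of the logic Update Manager could apply:

    # Sketch: distinguish host-bound removable media from shared-ISO media.
    # Assumes a connected ServiceInstance `si`; the host name is a placeholder.
    from pyVmomi import vim

    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == "vm11.solori.labs")
    view.Destroy()

    # Non-datastore backings that typically block vMotion while connected.
    device_backings = (
        vim.vm.device.VirtualCdrom.AtapiBackingInfo,
        vim.vm.device.VirtualCdrom.PassthroughBackingInfo,
        vim.vm.device.VirtualCdrom.RemoteAtapiBackingInfo,
        vim.vm.device.VirtualCdrom.RemotePassthroughBackingInfo,
    )

    for vm in host.vm:
        for dev in vm.config.hardware.device:
            if not isinstance(dev, vim.vm.device.VirtualCdrom):
                continue
            if not dev.connectable.connected:
                continue
            backing = dev.backing
            if isinstance(backing, vim.vm.device.VirtualCdrom.IsoBackingInfo):
                ds = backing.datastore
                shared = bool(ds and ds.summary.multipleHostAccess)
                verdict = "shared ISO (vMotion-friendly)" if shared else "host-local ISO"
                print(vm.name, backing.fileName, "->", verdict)
            elif isinstance(backing, device_backings):
                print(vm.name, dev.deviceInfo.label, "-> device-backed, would block vMotion")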

In The Lab: vSphere DPM, Quirky but Functional

July 24, 2009
ESX Hosts in Standby Using DRS's DPM Extension

It would be hard to complain that virtualization is contributing to any kind of “global warming” hysteria. In fact and by its very consolidating nature, virtualization offers many advantages over “traditional” computing models that make it “green” even in its most basic format. VMware reinforces this argument with the claims that “every server that is virtualized saves 7,000 kWh of electricity and four tons of carbon dioxide emissions per year.”

However, virtualization’s promise was born of the recognition that x86 servers were commonly operating with enormous “excess capacity.” Even in typical virtualized deployments, servers are driven to only 30-40% of capacity, and where excess capacity abounds there exists the potential for slashing available on-line resources in a quest to limit power consumption.

Enter VMware’s power saving extension to its Distributed Resource Scheduler (DRS) that scrubs VMware clusters for unused capacity, automatically consolidates stray virtual machines away from near-idle members and shuts those members down to conserve power. This magic green genie is called Distributed Power Management (DPM) and, while simple to configure, has a few quirks that not only hinder its effectiveness but actually make no sense (more on that later).
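
For what it’s worth, turning DPM on does not even require a trip to the cluster settings dialog. Here is a minimal sketch of enabling it with pyVmomi (VMware’s Python SDK for the vSphere API); the vCenter address, credentials, and cluster name are placeholders, and the behavior is set to fully automated:

    # Minimal sketch: enable DPM on a DRS cluster via pyVmomi.
    # vCenter address, credentials, and cluster name are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()      # lab use only
    si = SmartConnect(host="vcenter.solori.labs", user="administrator",
                      pwd="password", sslContext=ctx)
    content = si.RetrieveContent()

    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Lab-Cluster")
    view.Destroy()

    # DPM rides on top of DRS, so DRS should already be enabled on the cluster.
    spec = vim.cluster.ConfigSpecEx(
        dpmConfig=vim.cluster.DpmConfigInfo(
            enabled=True,
            defaultDpmBehavior="automated"))   # "manual" yields recommendations only
    WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))

    Disconnect(si)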

DRS's DPM In Action

“VMware DPM monitors the cumulative demand of all virtual machines in the cluster for memory and CPU resources and compares this to the total available resource capacity of all hosts in the cluster. If sufficient excess capacity is found, VMware DPM places one or more hosts in standby mode and powers them off after migrating their virtual machines to other hosts. Conversely, when capacity is deemed to be inadequate, DRS brings hosts out of standby mode (powers them on) and migrates virtual machines, using VMotion, to them. When making these calculations, VMware DPM considers not only current demand, but it also honors any user-specified virtual machine resource reservations.”

vSphere Resource Management Guide, Managing Power Resources
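
To make the quoted behavior concrete, here is a toy sketch of that comparison: aggregate VM demand (never counted below a VM’s reservation) versus aggregate host capacity, with a headroom target deciding whether a host could be idled. The figures and the 60% target are invented for illustration; this is not VMware’s actual DPM algorithm:

    # Toy illustration of the demand-vs-capacity comparison quoted above.
    # Not VMware's DPM algorithm; the figures and 60% target are made up.

    def cluster_has_excess_capacity(vm_demands_mhz, vm_reservations_mhz,
                                    host_capacities_mhz, target_utilization=0.60):
        """True if the smallest host could be idled while keeping the
        cluster at or below the target utilization."""
        # Honor reservations: a VM counts for at least its reservation,
        # even when its current demand is lower.
        demand = sum(max(d, r) for d, r in
                     zip(vm_demands_mhz, vm_reservations_mhz))
        remaining = sum(host_capacities_mhz) - min(host_capacities_mhz)
        return demand <= remaining * target_utilization

    # Example: three 10 GHz hosts carrying four lightly loaded VMs.
    demands      = [1200, 800, 2500, 600]     # MHz of current demand per VM
    reservations = [1000, 0,   2000, 0]       # MHz reserved per VM
    hosts        = [10000, 10000, 10000]      # MHz of capacity per host
    print(cluster_has_excess_capacity(demands, reservations, hosts))  # True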

Read the rest of this entry »