
Quick Take: Syslog Stops Working after Upgrade to ESXi 5.0 Update 1

March 24, 2012

If you’ve recently upgraded your ESXi hosts from 5.0 build 456551 and were logging to syslog, it’s possible that your events are no longer being received by your syslog server. There was a “feature” in ESXi 5.0 build 456551 that allowed syslog traffic to escape the ESXi firewall regardless of the firewall setting. This can be especially easy to miss if you originally upgraded from ESXi 4.x, where no firewall configuration was needed for syslog traffic.

VMware notes that syslog traffic was not affected by the ESXi firewall in v5 build 456551. See KB2003322 for details.

However, in ESXi 5.0 Update 1 the firewall rules definitely apply, and if you were “grandfathered-in” during the upgrade from build 456551, check the syslog output for your ESXi 5 servers. If you’re no longer getting syslog entries, either set the policy in the host’s Configuration->Security Profile->Properties… control panel:

Enabling syslog traffic in the ESXi firewall within the vSphere Client interface.

 

Or use ESXCLI to do the work (especially with multiple hosts):

esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true

esxcli network firewall refresh

That will take care of the “absent” syslog entries.
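For more than a couple of hosts, the same two commands can be pushed out in a loop. Here is a minimal sketch, assuming the ESXi Shell/SSH is enabled on each host, key-based root login is configured, and the hostnames live in a file named hosts.txt (the file name and login method are assumptions for illustration):

# hosts.txt holds one ESXi hostname per line
while read host; do
  ssh "root@${host}" 'esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true && esxcli network firewall refresh'
done < hosts.txt

PowerCLI’s Get-EsxCli cmdlet can accomplish the same thing without enabling SSH on the hosts.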

SOLORI’s Take: Gotcha! As ESXi becomes more like ESX in terms of provisioning, old-school ESXiers (like me) need to make sure they’re up-to-speed on the latest changes in ESXi. Ashamed to admit it, but this exact scenario got me in my home lab… Until I stumbled onto KB2003322 I didn’t think to go back and check the ESXi firewall settings – after all, it was previously working 😉


Quick Take: VMware ESXi 5.0, Patch ESXi50-Update01

March 16, 2012

VMware releases ESXi 5.0 Complete Update 1 for vSphere 5. An important change for this release is the inclusion of general and security-only image profiles:

Starting with ESXi 5.0 Update 1, VMware patch and update releases contain general and security-only image profiles. Security-only image profiles are applicable to new security fixes only. No new bug fixes are included, but bug fixes from earlier patch/update releases are included.

The general release image profile supersedes the security-only profile. Application of the general release image profile applies to new security and bug fixes.

The security-only image profiles are identified with the additional “s” identifier in the image profile name.

Just a few of the more interesting bugs fixed in this release:

PR 712342: Cannot assign VMware vSphere Hypervisor license key to an ESXi host with pRAM greater than 32GB

PR 719895: Unable to add a USB device to a virtual machine (KB 1039359).

PR 721191: Modifying snapshots using the commands vim-cmd vmsvc/snapshot.remove or vim-cmd vmsvc/snapshot.revert will fail when applied against certain snapshot tree structures.

This issue is resolved in this release. Now a unique identifier, snapshotId, is created for every snapshot associated to a virtual machine. You can get the snapshotId by running the command vim-cmd vmsvc/snapshot.get <vmid>. You can use the following new syntax when working with the same commands:

Revert to snapshot: vim-cmd vmsvc/snapshot.revert <vmid> <snapshotId> [suppressPowerOff/suppressPowerOn]
Remove a snapshot: vim-cmd vmsvc/snapshot.remove <vmid> <snapshotId>
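As a quick, hypothetical example (the vmid of 42 and snapshotId of 12 below are made up; list your own with vim-cmd vmsvc/getallvms and the snapshot.get command above):

# vmid 42 and snapshotId 12 are hypothetical; substitute values reported by your own host
vim-cmd vmsvc/snapshot.get 42          # prints the VM's snapshot tree with each snapshotId
vim-cmd vmsvc/snapshot.revert 42 12    # revert to that specific snapshot
vim-cmd vmsvc/snapshot.remove 42 12    # remove that specific snapshot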

PR 724376: Data corruption might occur if you copy large amounts of data (more than 1GB) from a 64-bit Windows virtual machine to a USB storage device.

PR 725429: Applying a host profile to an in-compliance host causes non-compliance (KB 2003472).

PR 728257: On a pair of HA storage controllers configured for redundancy, if you take over one controller, the datastores that reside on LUNs on the taken over controller might show inactive and remain inactive until you perform a rescan manually.

PR 734366: Purple diagnostic screen with vShield or third-party vSphere integrated firewall products (KB 2004893)

PR 734707: Virtual machines on a vNetwork Distributed Switch (vDS) configured with VLANs might lose network connectivity upon boot if you configure Private VLANs on the vDS. However, disconnecting and reconnecting the uplink solves the problem. This issue has been observed on be2net NICs and ixgbe vNICs.

PR 742242: XCOPY commands that VAAI sends to the source storage device might fail. By default, XCOPY commands should be sent to the destination storage device in accordance with VAAI specification.

PR 750460: Adding and removing a physical NIC might cause an ESXi host to fail with a purple screen. The purple diagnostic screen displays an error message similar to the following:

NDiscVlanCheck (data=0x2d16, timestamp=<value optimized out>) at bora/vmkernel/public/list.h:386

PR 751803: When disks larger than 256GB are protected using vSphere Replication (VR), any operation that causes an internal restart of the virtual disk device causes the disk to complete a full sync. Internal restarts are caused by a number of conditions including any time:

  • A virtual machine is restarted
  • A virtual machine is vMotioned
  • A virtual machine is reconfigured
  • A snapshot is taken of the virtual machine
  • Replication is paused and resumed

PR 754047: When you upgrade VMware Tools, the upgrade might fail because some Linux distributions periodically delete old files and folders in /tmp. The VMware Tools upgrade requires this directory in /tmp for auto upgrades.

PR 766179: ESXi host installed on a server with more than 8 NUMA nodes fails and displays a purple screen.

PR 769677: If you perform a VMotion operation to an ESXi host on which the boot-time option “pageSharing” is disabled, the ESXi host might fail with a purple screen.

Disabling pageSharing severely affects performance of the ESXi host. Because pageSharing should never be disabled, starting with this release, the “pageSharing” configuration option is removed.

PR 773187: On an ESXi host, if you configure the Network I/O Control (NetIOC) to set the Host Limit for Virtual Machine Traffic to a value higher than 2000Mbps, the bandwidth limit is not enforced.

PR 773769: An ESXi host halts and displays a purple diagnostic screen when using Network I/O Control with a Network Adapter that does not support VLAN Offload (KB 2011474).

PR 788962: When an ESXi host encounters a corrupt VMFS volume, VMFS driver might leak memory causing VMFS heap exhaustion. This stops all VMFS operations causing orphaned virtual machines and missing datastores. vMotion operations might not work and attempts to start new virtual machines might fail with errors about missing files and memory exhaustion. This issue might affect all ESXi hosts that share the corrupt LUN and have running virtual machines on that LUN.

PR 789483: After you upgrade to ESXi 5.0 from ESXi 4.x, Windows 2000 Terminal Servers might perform poorly. The consoles of these virtual machines might stop responding and their CPU usage shows a constant 100%.

PR 789789: An ESXi host might fail with a purple screen when a virtual machine connected to a VMXNET 2 vNIC is powered on. The purple diagnostic screen displays an error message similar to the following:

0x412261b07ef8:[0x41803b730cf4]Vmxnet2VMKDevTxCoalesceTimeout@vmkernel#nover+0x2b stack: 0x412261b0
0x412261b07f48:[0x41803b76669f]Net_HaltCheck@vmkernel#nover+0xf6 stack: 0x412261b07f98

You might also observe an error message similar to the following written to VMkernel.log:

WARNING: Vmxnet2: 5720: failed to enable port 0x2000069 on vSwitch1: Limit exceeded^[[0m

SOLORI’s Take: Lions, tigers and bears – oh my! In all, I count seven (7) unique PSD bugs (listed in the full KB) along with some rather head-scratching gotchas.  Lots of reasons to keep your vSphere hosts current in this release to be sure… Use Update Manager or start your update journey here…


In-the-Lab: NexentaStor vs ESXi, Redux

February 24, 2012

In my last post, I mentioned a much complained about “idle” CPU utilization quirk with NexentaStor when running as a virtual machine. After reading many supposed remedies in forum postings (some referenced in the last blog, none of which worked) I went pit-bull on the problem… and got lucky.

As an avid (er, frequent) NexentaStor user, I find the luster of the NMV (Nexenta’s web GUI) has worn off. Nearly 100% of my day-to-day operations are on the command line and/or Nexenta’s CLI (dubbed NMC). This includes power-off events (from NMC, issue “setup appliance power-off” or “setup appliance reboot”).

For me, the problem cropped-up while running storage benchmarks on some virtual storage appliances for a client. These VSA’s are bound to a dedicated LSI 9211-8i SAS/6G controller using VMware’s PCI pass-through (Host Configuration, Hardware, Advanced Settings). The VSA uses the LSI controller to access SAS/6G disks and SSDs in a connected JBOD – this approach allows for many permutations on storage HA and avoids physical RDMs and VMDKs. Using a JBOD allows for attachments to PCIe-equipped blades, dense rack servers, etc. and has little impact on VM CPU utilization (in theory).

So I was very surprised to find idle CPU utilization (according to ESXi’s performance charting) hovering around 50% from a fresh installation. This runs contrary to my experience with NexentaStor, but I’ve seen reports of such issues on the forums and even on my own blog. I’ve never been able to reproduce more than a 15-20% per vCPU bias between what’s reported in the VM and what ESXi/vCenter sees. I’ve always attributed this difference to vSMP and virtual hardware (disk activity) which is not seen by the OS but is handled by the VMM.

CPU record of idle and IOzone testing of SAS-attached VSA

During the testing phase I was primarily looking at disk throughput, but I noticed a persistent CPU utilization of 50% – even when idle. Regardless, the 4 vCPU VSA appeared to perform well (about 725MB/sec 2-process throughput on initial write) despite the CPU deficit (the 3 vCPU test pictured above ran at about 600MB/sec write). However, after writing my last blog entry, the 50% CPU leech just kept bothering me.

After wasting several hours researching and tweaking with very little (positive) effect, a client e-mail prompted an NMV walk-through which resulted in an unexpected consequence: the act of powering-off the VSA from the web GUI (NMV) resulted in significantly reduced idle CPU utilization.

Getting lucky: noticing a trend after using NMV to reboot for a client walk-through of the GUI.

Working with the 3 vCPU VSA over the previous several hours, I had consistently used the NMC (CLI) to reboot and power-off the VM. Surely the simple act of using the NMV to shut down the VSA couldn’t have anything to do with idle CPU consumption, could it? Remembering that these were fresh installations, I wondered whether this was specific to a fresh installation or whether it could show up in an upgrade as well. According to the forums, this only hampered VMs, not hardware installations.

I grabbed a NexentaStor 3.1.0 VM out of the library (one that had been progressively upgraded from 3.0.1) and set about the upgrade process. The result was unexpected: no difference in idle CPU from the previous version; this problem was NOT specific to 3.1.2, but specific to the installation/setup process itself (at least that was the prevailing hypothesis.)

Regression into my legacy VSA library, upgrading from 3.1.1 to 3.1.2 to check if the problem follows the NexentaStor version.

If anything, the upgraded VSA exhibited slightly less idle CPU utilization than its previous version. Noteworthy, however, was the extremely high CPU utilization as the VSA sat waiting for a yes/no response (NMC/CLI) to the “would you like to reboot now” question at the end of the upgrade process (see chart above). Once “no” was selected, CPU dropped immediately to normal levels.

Now it seemed apparent that some vestige of the web-based setup process (completed by a series of “wizard” pages) must be lingering around (much like the yes/no CPU glutton). Fortunately, I had another freshly installed VSA to test with – exactly configured and processed as the first one. I fired-up the NMV and shut down the VSA…

Confirming the impact of the "fix" on a second freshly installed NexentaStor VSA

After powering-on the VM from the vSphere Client it was obvious. This VSA had been running idle for some time, so its idle performance baseline – established across several prior reboots from the CLI – was well recorded by the ESXi host (see above). The resulting drop in idle CPU was nothing short of astounding: the 3 vCPU configuration dropped from a 50% average utilization to 23% idle utilization. Naturally, these findings (still anecdotal) have been forwarded on to engineers at Nexenta. Unfortunately, now I have to go back and re-run my storage benchmarks; hopefully clearing the underlying bug has reduced the needed vCPU count…


In-the-Lab: NexentaStor and VMware Tools, You Need to Tweak It…

February 24, 2012

While working on an article about complex VSAs (i.e. virtual storage appliances with PCIe pass-through SAS controllers), an old issue came back up again: NexentaStor virtual machines still have a problem installing VMware Tools since the platform branched from OpenSolaris and began using Illumos. While this isn’t totally Nexenta’s fault – there is no “Nexenta” OS type in VMware to choose from – it would be nice if a dummy package were present to allow a smooth installation of VMware Tools; the problem persists even in the latest NexentaStor release: 3.1.2.

I could not find where I had documented the fix in SOLORI’s blog, so here it is… Note, the NexentaStor VM is configured as an Oracle Solaris 11 (64-bit) virtual machine for the purpose of vCenter/ESXi. This establishes the VM’s relationship to a specific VMware Tools load. Installation of VMware Tools in NexentaStor is covered in detail in an earlier blog entry.

VMware Tools bombs-out at SUNWuiu8 package failure. Illumos-based NexentaStor has no such package.

Instead, we need to modify the vmware-config-tools.pl script directly to compensate for the loss of the SUNWuiu8 package that is explicitly required in the installation script.

Commenting out the SUNWuiu8 related section allows the tools to install with no harm to the system or functionality.

Note that the full “if” stanza where the VMware Tools installer checks for ‘tools-for-solaris’ must be commented out. Since the SUNWuiu8 package does not exist – and more importantly is not needed for Illumos/Nexenta – removing the reference to it is a good thing. Now the installation can proceed as normal.
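If you would rather not hunt for the stanza by eye, a quick grep narrows it down (the line number will vary by VMware Tools version, and the commenting itself is still a manual edit):

# locate the SUNWuiu8 / 'tools-for-solaris' check inside the installer script
grep -n "SUNWuiu8" vmware-config-tools.pl
# then comment out the entire surrounding "if" stanza by prefixing each of its lines with '#'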

After the changes, installation completes as normal.

That’s all there is to getting the “Oracle Solaris” version of VMware Tools to work in newer NexentaStor virtual machines – now back to really fast VSA’s with JBOD-attached storage…

SOLORI’s Note: There is a long-standing bug that affects NexentaStor 3.1.x running as a virtual machine. Currently there is no known workaround to keep NexentaStor from running up 50% CPU utilization from ESXi’s perspective. Inside the NexentaStor VM we see very little CPU utilization, but from the performance tab we see 50% utilization on every vCPU allocated to the VM. Nexenta is reportedly looking into the cause of the problem.

I looked through this and there is nothing that stands out other than a huge number of interrupts while idle. I am not sure where those interrupts are coming from. I see something occasionally called volume-check and nmdtrace which could be causing the interrupts.

Nexenta Support

A bug report was reportedly filed a couple of days ago to investigate the issue further.


VMware vCenter5: Revenge of Y2K, aka Worst Host Import Fail Ever!

January 6, 2012

I was recently involved in a migration for an enterprise client leapfrogging from vSphere 4.0 to vSphere 5.0. Their platform is an AMD server farm with modern, socket G34 CPU blades and 10G Ethernet connectivity – all moving parts on VMware’s Hardware Compatibility List for all versions of vSphere involved in the process.

Supermicro AS-2022TG Platform Compatibility

Intel 10G Ethernet, i82599EB Chipset based NIC

Although VMware lists the 2022TG-HIBQRF as ESXi 5.0 compatible and not the 2022TG-HTRF, it is worth noting that the only difference between the two is the presence of an on-board Mellanox ConnectX-2 QDR InfiniBand controller: the motherboards and BIOS are exactly the same, the Mellanox SMT components are just missing on the HTRF version.

It is also key to note that VMware distinguishes the ESXi-compatible platform by supported BIOS version 2.0a (Supermicro’s current version) versus 1.0b for the HTRF version. The current BIOS version is also required for AMD Opteron 6200 series CPUs, which are not a factor in this upgrade (i.e. only 6100-series CPUs are in use). For this client, the hardware support level of the installed BIOS (1.0c) was sufficient.

Safe Assumptions

So is it safe to assume that a BIOS update is not necessary when migrating to a newer version of vSphere? In the past, the need has been feature driven. For instance, proper use of new hardware features like Intel EPT, AMD RVI or VMDirectPath (PCI pass-through) has required BIOS updates. All of these features were supported by the “legacy” version of vSphere and the existing BIOS – so it sounds safe to assume a direct import into vCenter 5 will work and then we can let vCenter manage the ESXi update, right?

Well, not entirely: when importing the host to vCenter5, the process gets all the way through inventory import and then fails abruptly with a terse message: “A general system error occurred: internal error.” Looking at the error details in vCenter5 is of no real help.

Import of ESXi 4 host fails in vCenter5 for unknown reason.

A search of the term in VMware Communities is of no help either (returns non-relevant issues). However, digging down to the vCenter5 VPXD log (typically found in the hidden directory structure “C:\ProgramData\VMware\VMware VirtualCenter\Logs\”) does return a nugget that is both helpful and obscure.

Reviewing the vCenter VPXD log for evidence of the import problem.

If you’ve read through these logs before, you’ll note that the SSL certificate check has been disabled. This was defeated in vCenter Server Settings to rule-out potentially stale SSL certificates on the “legacy” ESXi nodes – it was not helpful in mitigating the error. The section highlighted was, however, helpful in uncovering a relevant VMware Knowledgebase article – the key language, “Alert:false@ D:/build/ob/bora-455964/bora/vim/lib/vdb/vdb.cpp:3253” turns up only one KB article – and it’s a winner.

Knowledge Base article search for cryptic VPXD error code.

It is important – if not helpful – to note that searching KB for “import fail internal error” does return nine different (and unrelated) articles, but it does NOT return this KB (we’ve made a request to VMware to make this KB easier to find in a simpler search). VMware’s KB2008366 illuminates the real reason why the host import fails: non-Y2K compliant BIOS date is rejected as NULL data by vCenter5.

Y2K Date Requirement, Really?

Yes, the spectre of Y2K strikes 12 years later and stands as the sole roadblock to importing your perfectly functioning ESXi 4 host into vCenter5. According to the KB article, you can tell if you’re on the hook for a BIOS update by checking the “Hardware/Processors” information pane in the “Host Configuration” tab inside vCenter4.

ESXi 4.x host BIOS version/date exposed in vCenter4

According to vCenter date policy, this platform was minted in 1910. The KB makes it clear that any two-digit year will be imported as 19XX, where XX is the two digit year. Seeing as how not even a precursor of ESX existed in 1999, this choice is just dead stupid. Even so, the x86 PC wasn’t even invented until 1978, so a simple “date check” inequality (i.e. if “two_digit_date” < 78 then “four_digit_date” = 2000 + “two_digit_date”) would have resolved the problem for the next 65 years.
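In shell terms, the pivot being suggested amounts to a couple of lines (a sketch of the idea only, not VMware’s code, using 78 as the cut-off):

yy=10                                  # e.g. a BIOS release date ending in a two-digit "10"
if [ "$yy" -lt 78 ]; then yyyy=$((2000 + yy)); else yyyy=$((1900 + yy)); fi
echo "$yyyy"                           # prints 2010, not 1910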

Instead, VMware will have you go through the process of upgrading and testing a new (and, as 6200 Opterons are just now available to the upgrade market, a likely unnecessary) BIOS version on your otherwise “trusty” platform.

Non-Y2K compliant BIOS date

Y2K-compliant BIOS date, post upgrade

Just to add insult to injury with this upgrade process, the BIOS upgrade for this platform comes with an added frustration: the IPMI/BMC firmware must also be updated to accommodate the new hardware monitoring capabilities of the new BIOS. Without the BMC update, vCenter will complain of Northbridge chipset overheat warnings from the platform until the BMC firmware is updated.

So, after the BIOS update, BMC update and painstaking hours (to days) of “new” product testing, we arrive at the following benefit: vCenter reads the BIOS release date correctly.

vCenter5 only wants Y2K compliant BIOS release dates for imported hosts

Bar Unnecessarily High

VMware actually says, “if the BIOS release date of the host is in the MM/DD/YY format, contact the hardware vendor to obtain the current MM/DD/YYYY format.” Really? So my platform is not vCenter5 worthy unless the BIOS date is four-digit year formatted? Put another way, VMware’s coders can create the premier cloud platform but they can’t handle a simple Y2K date inequality. #FAIL

Forget “the vRAM tax”, this obstacle is just dead stupid and unnecessary; and it will stand in the way of many more vSphere 5 upgrades. Relying on a BIOS update for a platform that was previously supported (remember 1.0b BIOS above?) just to account for the BIOS date is arbitrary at best, and it does not pose a compelling argument to your vendor’s support wing when dealing with an otherwise flawless BIOS.

SOLORI’s Take:

We’ve submitted a vCenter feature request to remove this exclusion for hundreds of vSphere 4.x hosts, maybe you should too…


Quick-Take: VMworld 2011, Thoughts on the Airplane

August 28, 2011

On the way to VMworld this morning I started out by listening to @Scott_lowe, @mike_laverick and @duncanyp discuss stretched clusters and some esoteric storage considerations. Then I was off reading @sakacc blogging about his take on stretched clusters and the black hole of node failure when I stumbled on a retweet from @bgracely (via @andreliebovici) about the spectre of change in our industry. Suddenly these things seemed very well related within the context of my destination: VMworld 2011.

Back about a month ago when vSphere 5 was announced the buzz about the “upgrade” was consumed by discussions about licensing and vRAM. Naturally, this was not the focus VMware was hoping for, especially considering how much of a step forward vSphere 5 is over VS4. Rather, VMware – by all deserved rights – wanted to hear “excited” conversations about how VS5 was closing the gap on vCloud architecture problems and pain-points.

Personally, I managed to keep the vRAM licensing issue out of SOLORI’s blog for two reasons: 1) the initial vRAM targets were so off that VMware had to make a change, and 2) significant avenues for the discussion were available elsewhere. That does not mean I wasn’t outspoken about my thoughts on vRAM – made obvious by contributions to some community discussions on the topic – or VMware’s reasoning for moving to vRAM. Suffice to say VMware did “the right thing” – as I had confidence they would – and the current vRAM targets capture 100% of my clients without additional licenses.

I hinted that VS5 answers a lot of the hanging questions from VS4 in terms of facilitating how cloud confederations are architected, but the question is: in the distraction, did VS5’s “goodness” get lost in the scuffle? If so, can they get back the mind share they may have lost to Chicken Little reactionaries?

First, if VMware’s lost ground to anyone, it’s VMware. The vast majority of cool-headed admins I talked to were either not affected by vRAM or were willing to take a wait-and-see outlook on vSphere 5 with continued use of vSphere 4.1. Some did evaluate Hyper-V’s “readiness” but most didn’t blink. By comparison, vSphere 4.1 still had more to offer private cloud than anything else.

Secondly, vSphere 5 “goodness” did get lost in the scuffle, and that’s okay! It may be somewhat counter intuitive but I believe VMware will actually come out well ahead of their “would be” position in the market, and it is precisely because of these things, not just in spite of them. Here’s my reasoning:

1) In the way the vSphere 5 launch announcement and vRAM licensing debacle unfolded, a lot of the “hot air” about vRAM was vented along the way. Subsequently, VMware gained some service cred by actually listening to their client base and making a significant change to their platform pricing model. VMware got more bang for their buck out of that move than the stock price will ever show: given the timing of the S&P ratings splash, I would have expected to see a slight hit. Fortunately, 20-30% sector slides trump vRAM, and only Microsoft is talking about vRAM now (that is, until they adopt something similar).

On that topic, anytime you can get your competitor talking about your product instead of theirs, it usually turns out to be a good thing. Even in this case, where the topic has nothing to do with the needs of most businesses, negative marketing against vRAM will ultimately do more to establish VMware as an innovator than an “already too expensive alternative to XYZ.”

2) SOLORI’s law of conservation of marketing momentum: goodness preserved, not destroyed. VMworld 2011 turns out to be perfectly timed to generate excitement in all of the “goodness” that vSphere 5 has to offer. More importantly, it can now do so with increased vigor and without a lot of energy siphoned-off discussing vRAM, utilization models and what have you: been there done that, on to the meat and away with the garnish.

3) Again it’s odd timing, but the market slide has more folks looking at cloud than ever before. Confidence in cloud offerings has been a deterrent for private cloud users, partly because of the “no clear choices” scenario and partly because of concerns about data migration in and around the public cloud. Instability and weak growth in the world economy have people reevaluating CAPEX-heavy initiatives as well as priorities. The bar for cloud offerings has never been lower.

In vSphere 5, VMware hints at the ability for more cloud providers to be transparent to the subscriber: if they adopt vSphere. Ultimately, this will facilitate vendor agnosticism much like the early days of the Internet. Back then, operators discovered that common protocols allowed for dial-up vendors to share resources in a reciprocal and transparent manner. This allowed the resources of provider A to be utilized by a subscriber of provider B: the end user was completely unaware of the difference. For those that don’t have strict requirements on where their data “lives” and/or are more interested in adherence to availability and SLA requirements, this can actually induce a broader market instead of a narrower one.

If you’ve looked past vRAM, you may have noticed for yourself that vSphere has more to deliver cloud offerings than ever before. VMware will try to convince you that whether cloud bursting, migrating to cloud or expanding hybrid cloud options, having a common underlying architecture promotes better flexibility and reduces overall cost and complexity. They want you to conclude that vSphere 5 is the basis for that architecture. Many will come away from Las Vegas – having seen it – believing it too.

So, as I – and an estimated 20K+ other virtualization junkies – head off to Las Vegas for a week of geek overload, parties and social networking, my thoughts turn to @duncanyp‘s 140+ improvements, enhancements and advances waiting back home in my vSphere 5 lab. Last week he challenged his “followers” to be the first to post examples of all of them; with the myriad of hands-on labs and expert sessions just over the horizon, I hope to do it one better and actually experience them first hand.

These things all add up to a win-win for VMware and a strong showing for VMworld. It’s going to be an exciting and – tip of the hat to @bgracely – industry changing week! Now off to the fray…

References:

See Mike Laverick’s chinwag podcasts

See Chad Sakac’s (@sakacc) VirtualGeek blog on stretched cluster issues to overcome

(excuse typos today, wordpress iPad…)


Short-Take: Nexenta 3.1 Adds VAAI Support, Auto-Sync Resume

August 3, 2011

Nexenta Systems Inc. released version 3.1 of its open storage software yesterday with a couple of VMware vSphere-specific feature enhancements. These enhancements are specifically targeted at VMware’s vStorage API for Array Integration (VAAI), which promises to accelerate certain “costly” storage operations by pushing their implementation to the storage array instead of the ESX host.

From NexentaStor 3.1 Release Notes, the primitives implemented in 3.1 that contribute to VAAI support include:

  • SCSI Write Same: Supported in vSphere 4.1 and later
    Example. Accelerates zero block writes when creating new virtual disks.
  • SCSI ATS (Atomic Test & Set): Supported in vSphere 4.1 and later.
    Example. Enables specific LUN “region” to be locked instead of entire LUN when cloning a VM.
  • SCSI Block Copy: Supported in vSphere 4.1 and later.
    Example. Avoids reading and writing of block data “through” the ESX host during a block copy operation by allowing VMware to instruct the SAN to do so.
  • SCSI Unmap: Supported in vSphere 5 and later. Enables freed blocks to be returned to the zpool for new allocation when no longer used for VM storage.

Additional “optimizations” and improvements from Nexenta in 3.1 include:

  • In-flight deduplication
  • ARC performance enhancements
  • multiple connections per session for iSCSI
  • DMU fast path for iSCSI (i.e. no extra copy)
  • Auto-sync “resume” with progress bar in GUI/NMV and ability to change source/destination paths OTF
  • Parallel tasks in NMV (i.e. no more busy process “hangs”)
  • Improved CIFS performance
  • Support for multiple DC/DC fail-over for CIFS
  • Better cross-forest trusts with CIFS
  • Configuration monitoring/reporting via “ConfGuard” plug-in
  • Multiple VIP per service for HA Cluster, fail-over of local users and elimination of separate heartbeat device
  • JBOD management for select devices from within the NMV

Given the addition of VAAI features, the upgrade offers some compelling reasons to make the move to NexentaStor 3.1 and at the same time removes obstacles to choosing NexentaStor as a VMware iSCSI platform for SMB/SME (versus the low-end EMC VNXe, which at last look was still waiting on VAAI support). However, for existing vSphere 4.1+ environments, a word of caution: you will want to “test, test, test” before upgrading to (or enabling) VAAI (fortunately, there’s a NexentaStor VSA available).
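Part of that testing can be as simple as asking the host which primitives it actually negotiated per device. For example, on ESXi 5.x (the naa identifier below is a placeholder; list real device identifiers with esxcli storage core device list):

# ESXi 5.x: show per-device VAAI primitive status (the naa.* identifier is a placeholder)
esxcli storage core device vaai status get -d naa.600144f000000000000000000000000a
# ESXi/ESX 4.1: the per-device "VAAI Status" field appears in the corestorage listing
esxcli corestorage device list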

Auto-Sync Resume

In the past, NexentaStor’s auto-sync plug-in has been the only integrated means of block replication from one storage pool (or array) to another. The plug-in allows periodic replication events to be scheduled, each drawing from a marker snapshot until the replication is complete. Upon extended error (where the replication fails), the failure causes a roll-back to the marker point, eliminating any data that has transferred between the pools. For WAN replication this can be costly, as no check-points are created along the way.

More problematically, there has been no way to recreate a replication service in the event it has been deleted or gone missing (i.e. the zpool moved to a new host). This forces the replication to start over from scratch – a problem for very large datasets. With Auto-Sync 3.1, the latter problem is resolved, provided NexentaStor can find at least one pair of identical snapshots for the file system.

Where I find this new “feature” particularly helpful is in seed replications to external storage devices (i.e. USB 2.0 arrays, JBODs, etc.). It allows a replication to external, removable storage to (1) be completed locally, (2) be shipped to a central repository, and (3) be continued over the WAN by a newly created remote replication service that picks up the replication updates.

Additionally, consider the case where the above local-to-WAN replication seeding takes place over the course of several months and the hardware at the central repository fails, requiring the replication pool to be moved to another NexentaStor instance. In the past, the limitation on auto-sync would have required a brand new replication set, regardless of the consistency of the replicated data on the relocated pool. Now, a new (replacement) service can be created pointing to the new destination, and auto-sync promises to find the data – intact – and resume the replication updates starting with the last identical marker snapshot.

NexentaStor Native Transport

The default transport for replication in NexentaStor 3.1 is now NexentaStor’s TCP-based Remote Replication protocol (RR). While SSH is still an option for non-NexentaStor destinations, netcat is no longer supported for auto-sync replications. While no indication of performance benefits is available, two tunable parameters are exposed for RR auto-sync services (per service): TCP connection count (-n) and TCP package size (-P). The defaults are 4 and 1024, respectively, meaning 4 connections and a 1024KB PDU size for the replication session.

Conclusions

For VMware vSphere deployments in SMB, SME and ROBO environments, NexentaStor 3.1 looks to be a good fit, offering high-performance CIFS, NFS, iSCSI and Fibre Channel options in a unified storage environment complete with VAAI support to accelerate vStorage applications. For VMware View installations using NexentaStor, the VAAI/ATS feature should resolve some iSCSI locking behavior issues that have made NFS more attractive (at the cost of the SCSI-based VAAI features). That said, with the storage provisioning changes in View 4.5 and the upcoming View 5, the ability to pick from FC, iSCSI or NFS (especially at 10G) from within the same storage platform has definite advantages (if not complexity implications). Suffice to say, NexentaStor’s update is adding more open storage tools to the VMware virtualization architect’s bag of tricks.

NexentaStor 3.1 is available for download now.

Update, 8/12/2011:

Nexenta has found some problems with 3.1 post Q/A. They’ve released this statement on the matter:

Nexenta places the highest importance on maintaining access to and integrity of customer data. The purpose of this Technical Bulletin is to make you aware of an issue with the process of upgrading to 3.1. Nexenta has discovered an issue with the software delivery mechanism we use. This issue can result in errors during the upgrade process and some functionality not being installed properly. Please postpone upgrading to v3.1 until our next Technical Bulletin update. We are actively working to get this corrected and get it back to 100 % service as fast as possible. Until the issue is resolved we have removed 3.1 from the website and suspended upgrades. Thanks for your patience.

Nexenta Support, Aug. 6, 2011

According to sources from within Nexenta, the problems appear to be more related to APT repository/distribution issues “rather than the 3.1 codebase.” All ISO and repository distribution for 3.1 has been pulled until further notice and links to information about 3.1 on the corporate Nexenta site are no longer working…

Update, 8/17/2011:

Today, while working on a follow-up post, the lab systems (virtual storage appliances) were updated to NexentaStor 3.1.1 (both Enterprise and Community editions). Since a question was raised about the applicability of the VAAI enhancements to Community Edition (NexentaStor CE), I’ve got a teaser for you: see the following image of two identical LUNs mounted to an ESXi host from NexentaStor Enterprise Edition (NSEE) and NexentaStor Community Edition (NSCE). If you look closely, you’ll notice they BOTH show “supported” status.

vSphere VMFS5 Datastores provided by NexentaStor Community (VSA04) and Enterprise (VSA03) editions.

Update, 8/19/2011:

Nexenta officially re-released NexentaStor 3.1 today in the form of version 3.1.1 – it is available for download now.


Short-Take: VMware View PCoIP Client for Android

July 15, 2011

Today VMware released a “Tech Preview” version of VMware’s View Client for Android: a PCoIP-only client suitable for LAN and WAN (via PCoIP Secure Gateway). We had a quick first look this evening when the application appeared on Android Market – a free download – and it looks great. On my NotionInk Adam tablet (NVidia 1GHz dual-core) running Honeycomb 3.0.1 the display updates were just as snappy as on my iPad2 running View Client for iPad. The only problem I experienced in the hour or so of working with the client is the lack of three-finger support in the Adam/Honeycomb port to spawn the pop-up keyboard.

The View PCoIP Client for Android supports the same saved desktop icon paradigm as its iPad predecessor for quick access.

The View PCoIP Client for Android allows for desktop connections to stay active even when the app is not in the foreground - a one-up on the iPad predecessor.

Android View PCoIP Client - Task switching to other Android application

Task switching in View PCoIP Client for Android works just like any other Android application.

Android View PCoIP Client - Retrieving a View desktop from background

View PCoIP Client for Android is easily restored from the background without reconnection delays.

And yes, that last screen shot shows 1 bar on AT&T’s 3G network, and it’s totally usable just like on the iPad. If you’re waiting for a rocking View client before plunking down money on that 10.1″ ASUS Eee Pad Transformer (now with Honeycomb 3.1) and its keyboard/mousepad “docking” station (complete with an additional run-time doubling battery), then wait no more: Android has arrived. Remember though, this is just a “Tech Preview” and the apple needs a bit more polishing before you go running to your CIO…

SOLORI’s Note: Although the View Client for Android was “optimized” for the 1280×800 format, it still had no problem with the more limited 1024×600 Pixel Qi display on my NotionInk Adam. In fact, changes in rotation on the Android seemed faster than on the iPad2, and multitasking on the Honeycomb system did not seem to be affected by a backgrounded desktop.

As another test of compatibility, I tried small-screen PCoIP goodness on my Samsung Fascinate and it rocks! Beware, there is just enough display to be useful with the pop-up keyboard on-screen, and the scroll-back on the screen with the keyboard in the foreground made for interesting URL entry while trying to get to Hulu. Still, audio was clear and frame rates ran at about 3-5 fps (visual estimate) while remaining very clear. Task switching on the single-core Android Froyo device worked flawlessly too.

How did Hulu fare on Honeycomb? Unfortunately it was not up to scratch in full screen, but I found it passable in the embedded mode (Mozilla 3.6). This kind of performance issue will likely be very platform dependent on Android version, CPU, display and vendor tweaks to the Google Android kernel – especially hacked kernels like the NI Adam (tested). Unlike the Apple-controlled IOS, Android leaves a lot of performance enhancements to platform providers and most just pass-on the reference kernel without significant improvement in performance. For a “preview” release, Team Fox at VMware has delivered the goods.

VMware’s official blog post has a quick walk-through video. A User Guide and Release Notes are also available from VMware.


Short-Take: Clock Ticking on VI 3.5 Updates, June 1 Deadline

May 26, 2011

If you’re still not quite ready to upgrade from VI 3.x to vSphere, time may be running out on your ESX hosts to stay “current” inside of VI3 unless you act before June 1, 2011. If your VMware VI3 hosts have not been patched since November of 2010, you are at risk for losing update/patching capabilities unless you apply ESX350-201012410-BG before the deadline. This patch ONLY addresses the expiring secure key on the ESX host which will otherwise become invalid on June 1, 2011.
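A quick way to check whether a host already carries the bulletin is to query it from the ESX 3.5 service console:

# list installed bulletins and look for the expiring-key patch
esxupdate query | grep ESX350-201012410-BG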

If you need to bring your hosts current (without upgrading to vSphere) the last full patch release from VMware for VI 3.5 addresses the following issues:

Enablement of Intel Xeon Processor 3400 Series – Support for the Intel Xeon processor 3400 series has been added. Support includes Enhanced VMotion capabilities. For additional information on previous processor families supported by Enhanced VMotion, see Enhanced VMotion Compatibility (EVC) processor support (KB 1003212).

Driver Update for Broadcom bnx2 Network Controller – The driver for bnx2 controllers has been upgraded to version 1.6.9. This driver supports bootcode upgrade on bnx2 chipsets and requires bmapilnx and lnxfwnx2 tools upgrade from Broadcom. This driver also adds support for Network Controller – Sideband Interface (NC-SI) for SOL (serial over LAN) applicable to Broadcom NetXtreme 5709 and 5716 chipsets.

Driver Update for LSI SCSI and SAS Controllers – The driver for LSI SCSI and SAS controllers is updated to version 2.06.74. This version of the driver is required to provide a better support for shared SAS environments.

Newly Supported Guest Operating Systems – Support for the following guest operating systems has been added specifically for this release:

  • Windows 7 Enterprise (32-bit and 64-bit)
  • Windows 7 Ultimate (32-bit and 64-bit)
  • Windows 7 Professional (32-bit and 64-bit)
  • Windows 7 Home Premium (32-bit and 64-bit)
  • Windows 2008 R2 Standard Edition (64-bit)
  • Windows 2008 R2 Enterprise Edition (64-bit)
  • Windows 2008 R2 Datacenter Edition (64-bit)
  • Windows 2008 R2 Web Server (64-bit)
  • Ubuntu Desktop 9.04 (32-bit and 64-bit)
  • Ubuntu Server 9.04 (32-bit and 64-bit)

For more complete information about supported guests included in this release, see the VMware Compatibility Guide: http://www.vmware.com/resources/compatibility/search.php?deviceCategory=software.

Newly Supported Management Agents – See VMware ESX Server Supported Hardware Lifecycle Management Agents for current information on supported management agents.

Newly Supported Network Cards – This release of ESX Server supports HP NC375T (NetXen) PCI Express Quad Port Gigabit Server Adapter.

Newly Supported SATA Controllers – This release of ESX Server supports the Intel Ibex Peak SATA AHCI controller.

Note:
  • Some limitations apply in terms of support for SATA controllers. For more information, see SATA Controller Support in ESX 3.5. (KB 1008673)
  • Storing VMFS datastores on native SATA drives is not supported.

This patch comes with a roll-up approach that VMware describes this way:

Note: As part of the end of availability for some VMware Virtual Infrastructure product releases, the ESX 3.5 Update 5 upgrade package ESX350-Update05.zip has been replaced by ESX350-Update05a.zip in order to remove dependencies upon patches that will no longer be available for download. Hosts upgraded using ESX350-Update05a.zip are equivalent to those upgraded using the older package, but patch bundles released before ESX 3.5 Update 5 will not be required during the upgrade process.


Short-Take: vSphere 4.0 U3 Out Today, Drivers & Fixes

May 6, 2011

Today, VMware announces the release of vSphere 4.0 Update 3 (U3). Many, many fixes and enhancements – some rolled-in from (or influenced by) vSphere 4.1. Updates to ESX, ESXi, vCenter and vCenter Update Manager are available now (see below for links).

Don't forget to click the "View History" link to expose the vCenter and ESX updates available for older versions...

VMware ESX/ESXi

  • Enhanced APD handling with automatic failover
  • Inclusion of additional drivers
  • Bug and security fixes
  • Additional guest operating system support
  • Updated VM Tools (WDDM, XPDM and PVSCSI drivers)

Patch Release ESX400-Update03 contains the following individual bulletins:

ESX400-201105201-UG: Updates the VMware ESX 4.0 Core and CIM components
ESX400-201105202-UG: Updates the VMware-esx-remove-rpms
ESX400-201105203-UG: Updates the VMware ESX 4.0 EHCI HCD device driver
ESX400-201105204-UG: Updates the VMware ESX 4.0 USB core component
ESX400-201105205-UG: Updates the VMware ESX 4.0 SCSI qla4xxx driver
ESX400-201105206-UG: Updates the VMware ESX 4.0 USB storage component
ESX400-201105207-UG: Updates the VMware ESX 4.0 SCSI qla2xxx driver
ESX400-201105209-UG: Updates the VMware ESX 4.0 e1000e driver
ESX400-201105210-UG: Updates the VMware ESX 4.0 mptsas, mptspi device drivers
ESX400-201105212-UG: Updates the VMware ESX 4.0 nx-nic device driver
ESX400-201105215-UG: Updates the VMware ESX 4.0 scsi hpsa device driver
ESX400-201105216-UG: Updates the VMware ESX 4.0 bnx2x device driver
ESX400-201105217-UG: Updates the VMware ESX 4.0 Megaraid SAS device driver
ESX400-201105218-UG: Updates the VMware ESX 4.0 bnx2 device driver
ESX400-201105219-UG: Updates the VMware ESX 4.0 SCSI-AIC 79xx device driver

Patch Release ESXi400-Update03 contains the following individual bulletins:

ESXi400-201105201-UG: Updates Firmware
ESXi400-201105202-UG: Updates Tools
ESXi400-201105203-UG: Updates VI Client
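For hosts not managed by Update Manager, the 4.0 Update 3 bundles can also be applied from the vSphere CLI with vihostupdate. A rough sketch (the hostname and bundle filename below are placeholders; put the host in maintenance mode first):

# apply the update bundle to a single ESXi 4.0 host, then confirm the installed bulletins
vihostupdate.pl --server esxi01.example.com --username root --install --bundle ESXi400-Update03.zip
vihostupdate.pl --server esxi01.example.com --username root --query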

VMware vCenter™

  • Windows 2008 R2 is now a supported platform
  • Additional guest operating system customization support
  • Additional vCenter Server database support
  • Bug and security fixes

VMware vCenter Server 4.0 Update 3 offers the following improvements:

Guest Operating System Customization Improvements: vCenter Server adds support for customization of the following guest operating systems:

  • RHEL 6.0 (32-bit and 64-bit)
  • SLES 11 SP1 (32-bit and 64-bit)
  • Windows 7 SP1 (32-bit and 64-bit)
  • Windows Server 2008 R2 SP1

Additional vCenter Server Database Support: vCenter Server now supports the following databases:

  • Microsoft SQL Server 2008 R2 (32-bit and 64-bit)
  • Oracle 11g R2 (32-bit and 64-bit)
  • IBM DB2 9.7.2 (32-bit and 64-bit)

For more information about using IBM DB2 – 9.7.2 database with vCenter Server 4.0 Update 3, see KB 1037354.

Additional vCenter Server Operating System Support: You can install vCenter Server on Windows Server 2008 R2.

Resolved Issues: In addition, this release delivers a number of bug fixes that have been documented in the Resolved Issues section.

VMware vCenter Update Manager

  • Windows 2008 R2 is now a supported platform
  • Additional vCenter Server database support