Archive for February, 2012

In-the-Lab: NexentaStor vs ESXi, Redux

February 24, 2012

In my last post, I mentioned a much-complained-about “idle” CPU utilization quirk with NexentaStor when running as a virtual machine. After reading many supposed remedies in forum postings (some referenced in the last post, none of which worked), I went pit-bull on the problem… and got lucky.

As an avid (er, frequent) NexentaStor user, the luster of the NMV (Nexenta’s Web GUI) has worn off. Nearly 100% of my day-to-day operations are on the command line and/or Nexenta’s CLI (dubbed NMC). This process includes power-off events (from NMC, issue “setup appliance power-off” or “setup appliance reboot”).

For me, the problem cropped up while running storage benchmarks on some virtual storage appliances for a client. These VSAs are bound to a dedicated LSI 9211-8i SAS/6G controller using VMware’s PCI pass-through (Host Configuration, Hardware, Advanced Settings). The VSA uses the LSI controller to access SAS/6G disks and SSDs in a connected JBOD – this approach allows for many permutations on storage HA and avoids physical RDMs and VMDKs. Using a JBOD allows for attachments to PCIe-equipped blades, dense rack servers, etc. and has little impact on VM CPU utilization (in theory).

So I was very surprised to find idle CPU utilization (according to ESXi’s performance charting) hovering around 50% from a fresh installation. This runs contrary to my experience with NexentaStor, but I’ve seen reports of such issues on the forums and even on my own blog. I’ve never been able to reproduce more than a 15-20% per vCPU bias between what’s reported in the VM and what ESXi/vCenter sees. I’ve always attributed this difference to vSMP and virtual hardware (disk activity) which is not seen by the OS but is handled by the VMM.

CPU record of idle and IOzone testing of SAS-attached VSA

During the testing phase, I was primarily looking at disk throughput, but I noticed a persistent CPU utilization of 50% – even when idle. Regardless, the 4 vCPU VSA appeared to perform well (about 725MB/sec 2-process throughput on initial write) despite the CPU deficit (3 vCPU test pictured above, about 600MB/sec write). However, after writing my last blog entry, the 50% CPU leech just kept bothering me.

After I had wasted several hours researching and tweaking with very little (positive) effect, a client e-mail prompted an NMV walk-through, which had an unexpected consequence: the act of powering off the VSA from the web GUI (NMV) resulted in significantly reduced idle CPU utilization.

Getting lucky: noticing a trend after using NMV to reboot for a client walk-through of the GUI.

Working with the 3 vCPU VSA over the previous several hours, I had consistently used the NMC (CLI) to reboot and power off the VM. The simple fact of using the NMV to shut down the VSA couldn’t have anything to do with idle CPU consumption, could it? Remembering that these were fresh installations, I wondered whether this was specific to a fresh installation or could also show up in an upgrade. According to the forums, this only hampered VMs, not hardware.

I grabbed a NexentaStor 3.1.0 VM out of the library (one that had been progressively upgraded from 3.0.1) and set about the upgrade process. The result was unexpected: no difference in idle CPU from the previous version; this problem was NOT specific to 3.1.2, but specific to the installation/setup process itself (at least that was the prevailing hypothesis.)

Regression into my legacy VSA library, upgrading from 3.1.1 to 3.1.2 to check if the problem follows the NexentaStor version.

If anything, the upgraded VSA exhibited slightly less idle CPU utilization than its previous version. Noteworthy, however, was the extremely high CPU utilization as the VSA sat waiting for a yes/no response (NMC/CLI) to the “would you like to reboot now” question at the end of the upgrade process (see chart above). Once “no” was selected, CPU dropped immediately to normal levels.

Now it seemed apparent that a vestige of the web-based setup process (completed by a series of “wizard” pages) must be lingering around (much like the yes/no CPU glutton). Fortunately, I had another freshly installed VSA to test with – configured and processed exactly like the first one. I fired up the NMV and shut down the VSA…

Confirming the impact of the "fix" on a second freshly installed NexentaStor VSA

After powering on the VM from the vSphere Client, it was obvious. This VSA had been running idle for some time, so its idle performance baseline – established earlier across several reboots from the CLI – was well recorded by the ESXi host (see above). The resulting drop in idle CPU was nothing short of astounding: the 3 vCPU configuration had dropped from a 50% average idle utilization to 23%. Naturally, these findings (still anecdotal) have been forwarded on to engineers at Nexenta. Unfortunately, now I have to go back and re-run my storage benchmarks; hopefully clearing the underlying bug has reduced the needed vCPU count…
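To put the drop in perspective, here is a quick back-of-the-envelope calculation in Python. It is only a sketch: it assumes the ESXi chart figure is an average across all three vCPUs (so that percentage points translate into "vCPU-equivalents"), and the function name is made up for illustration.

```python
def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Return the percentage by which utilization fell relative to the baseline."""
    return (before_pct - after_pct) / before_pct * 100.0


# Figures quoted in the post: idle utilization before and after the NMV shutdown.
before, after, vcpus = 50.0, 23.0, 3

absolute_drop = before - after                   # percentage points
relative_drop = relative_reduction(before, after)
# Assumption: the chart percentage is averaged over all vCPUs in the VM.
vcpu_equivalents = absolute_drop / 100.0 * vcpus

print(f"absolute drop: {absolute_drop:.0f} points")
print(f"relative drop: {relative_drop:.0f}%")
print(f"vCPU-equivalents freed: {vcpu_equivalents:.2f}")
```

In other words, idle overhead fell by a bit more than half, which under the stated assumption is a little under one vCPU's worth of host time.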

In-the-Lab: NexentaStor and VMware Tools, You Need to Tweak It…

February 24, 2012

While working on an article on complex VSAs (i.e. a virtual storage appliance with PCIe pass-through SAS controllers), an old issue came back up: NexentaStor virtual machines still have a problem installing VMware Tools since it branched from Open Solaris and began using Illumos. While this isn’t totally Nexenta’s fault – there is no “Nexenta” OS type in VMware to choose from – it would be nice if a dummy package was present to allow a smooth installation of VMware Tools; this is even the case with the latest NexentaStor release: 3.1.2.

I could not find where I had documented the fix in SOLORI’s blog, so here it is… Note, the NexentaStor VM is configured as an Oracle Solaris 11 (64-bit) virtual machine for the purpose of vCenter/ESXi. This establishes the VM’s relationship to a specific VMware Tools load. Installation of VMware Tools in NexentaStor is covered in detail in an earlier blog entry.

VMware Tools bombs-out at SUNWuiu8 package failure. Illumos-based NexentaStor has no such package.

Instead, we need to modify the vmware-config-tools.pl script directly to compensate for the loss of the SUNWuiu8 package that is explicitly required in the installation script.

Commenting out the SUNWuiu8 related section allows the tools to install with no harm to the system or functionality.

Note that the full “if” stanza where the VMware Tools installer checks for ‘tools-for-solaris’ must be commented out. Since the SUNWuiu8 package does not exist – and, more importantly, is not needed for Illumos/Nexenta – removing the reference to it is a good thing. Now the installation can proceed as normal.
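For anyone who would rather script the edit than do it by hand, the following Python sketch comments out a whole brace-delimited “if” stanza once it finds the marker string on the stanza’s opening line. Treat it as an illustration only: the `SAMPLE` Perl fragment is a made-up stand-in (not the actual contents of vmware-config-tools.pl), the function name is hypothetical, and the brace counting is naive (it ignores braces inside strings or comments), so check the result by eye before running the installer.

```python
def comment_out_stanza(source: str, marker: str) -> str:
    """Prefix '# ' to every line of the `if` block whose opening line contains `marker`."""
    out = []
    depth = 0           # current brace depth inside the stanza
    in_stanza = False   # True while we are inside the block being commented out
    seen_open = False   # True once the stanza's first '{' has been seen
    for line in source.splitlines():
        if not in_stanza and marker in line and line.lstrip().startswith("if"):
            in_stanza = True
        if in_stanza:
            out.append("# " + line)
            # Naive brace tracking: assumes no braces inside strings or comments.
            depth += line.count("{") - line.count("}")
            seen_open = seen_open or "{" in line
            if seen_open and depth == 0:
                in_stanza = False
                seen_open = False
        else:
            out.append(line)
    return "\n".join(out) + ("\n" if source.endswith("\n") else "")


# Hypothetical stand-in for the relevant part of the Perl script.
SAMPLE = """\
sub check_deps {
  if (vmware_product() eq 'tools-for-solaris') {
    if (!pkg_installed('SUNWuiu8')) {
      error("SUNWuiu8 is required.");
    }
  }
  return 1;
}
"""

result = comment_out_stanza(SAMPLE, "tools-for-solaris")
print(result)
```

Only the outer stanza (including the nested SUNWuiu8 check) ends up commented out; the surrounding subroutine is left untouched. Work on a copy of the script so the original can be restored if the pattern matches more than intended.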

After the changes, installation completes as normal.

That’s all there is to getting the “Oracle Solaris” version of VMware Tools to work in newer NexentaStor virtual machines – now back to really fast VSAs with JBOD-attached storage…

SOLORI’s Note: There is a long-standing bug that affects NexentaStor 3.1.x running as a virtual machine. Currently there is no known workaround to keep NexentaStor from running up 50% CPU utilization from ESXi’s perspective. Inside the NexentaStor VM we see very little CPU utilization, but from the performance tab, we see 50% utilization on every vCPU allocated to the VM. Nexenta is reportedly looking into the cause of the problem.

I looked through this and there is nothing that stands out other than a huge number of interrupts while idle. I am not sure where those interrupts are coming from. I see something occasionally called volume-check and nmdtrace which could be causing the interrupts.

Nexenta Support

A bug report was reportedly filed a couple of days ago to investigate the issue further.