
Quick-Take: NexentaStor 3.1.3 New AD Group Feature Can Break AD Shares

June 12, 2012

The latest update of NexentaStor may not go too smoothly if you are using Windows Server 2008 AD servers and delegating shares via NexentaStor. While the update includes a long-sought-after fix in AD capabilities (see pull quote below), it may require a tweak to the CIFS server settings to get things back on track.

Domain Group Support

It is now possible to allow Domain groups as members of local groups. When a Windows client authenticates with NexentaStor using a domain account, NexentaStor consults the domain controller for information about that user’s membership in domain groups. NexentaStor also computes group memberships based on its _local_ groups database, adding both local and domain groups based on local group memberships, which are allowed to be indirect. NexentaStor’s computation of group memberships previously did not correctly handle domain groups as members of local groups.

– NexentaStor 3.1.3 Release Notes

In the past, some of NexentaStor’s in-place upgrades have reset the “lmauth_level” of the associated SMB share server from its user-configured value back to a “default” of four (4). This did not work very well in an AD environment where the servers were Windows Server 2008 running in native authentication mode. The fix was to change “lmauth_level” to two (2) via the NMV or NMC (“sharectl set -p lmauth_level=2 smb”) and restart the service. If you have this issue, the giveaway kernel log entries are as follows:

smbd[7501]: [ID 702911 daemon.notice] smbd_dc_update: myad.local: locate failed
smbd[7501]: [ID 702911 daemon.notice] smbd_dc_monitor: domain service not responding

However, the rules have changed; Nexenta’s new guidance is:

Summary Description CIFS Issue

A recent patch release by Microsoft has necessitated a change to the CIFS authorization setting. Without changing this setting, customers will see CIFS disconnects or the appliance being unable to join the Active Directory domain. If you experience CIFS disconnects or problems joining your Active Directory domain, please modify the ‘lmauth_level’ setting.

# sharectl set -p lmauth_level=4 smb

– NexentaStor 3.1.3 Release Notes

While this may work for others out there, it did not work universally across my tested Windows Server 2008 R2, native-mode AD servers. Worse, it appeared to work with some shares but not all, which can lead to some confusion about the actual cause (or resolution) of the problem based on the Nexenta release notes. Fortunately (or not, depending on your perspective), the genesis of NexentaStor is clearly heading toward an intersection with Illumos – although the current kernel is still based on OpenSolaris (134f) – and a post from OpenIndiana points users to the right solution.

(Jonathan Leafty) I always thought it was weird that lmauth_level had to be set to 2 so I
bumped it back to the default of 3 and restarted smb and it worked...
(Gordon Ross) Glad you found that.  I probably should have sent a "heads-up" when the
"extended security outbound" enhancement went in.  People who have
adjusted down lmauth_level should put it back to the default.

– CIFS in Domain Mode (AD 2008), OpenIndiana Discussion Group (openindiana-discuss@openindiana.org)
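
For reference, here’s a minimal sketch of applying that advice from the NexentaStor command line (the SMF service name below is the stock illumos/Solaris CIFS server instance – verify it with ‘svcs’ on your build before restarting anything):

# Check the current LM authentication level
sharectl get -p lmauth_level smb

# Put it back to the Solaris/OpenIndiana default of 3
sharectl set -p lmauth_level=3 smb

# Restart the CIFS service so the new level takes effect
svcadm restart svc:/network/smb/server:default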

Following the OpenIndiana advice re-enabled all previously configured shares. This mode is also the default for Solaris, although NexentaStor continues to use a different one. According to the man page for smb on Nexenta (‘man smb(4)’), the difference between ‘lmauth_level=3’ and ‘lmauth_level=4’ is as follows:

lmauth_level

Specifies the LAN Manager (LM) authentication level. The LM compatibility level controls the type of user authentication to use in workgroup mode or domain mode. The default value is 3.

The following describes the behavior at each level.

2 – In Windows workgroup mode, the Solaris CIFS server accepts LM, NTLM, LMv2, and NTLMv2 requests. In domain mode, the SMB redirector on the Solaris CIFS server sends NTLM requests.

3 – In Windows workgroup mode, the Solaris CIFS server accepts LM, NTLM, LMv2, and NTLMv2 requests. In domain mode, the SMB redirector on the Solaris CIFS server sends LMv2 and NTLMv2 requests.

4 – In Windows workgroup mode, the Solaris CIFS server accepts NTLM, LMv2, and NTLMv2 requests. In domain mode, the SMB redirector on the Solaris CIFS server sends LMv2 and NTLMv2 requests.

5 – In Windows workgroup mode, the Solaris CIFS server accepts LMv2 and NTLMv2 requests. In domain mode, the SMB redirector on the Solaris CIFS server sends LMv2 and NTLMv2 requests.

– Manpage for smb(4)

This illustrates either a continued dependency on LAN Manager (absent in ‘lmauth_level=4’) or a bug, as indicated in the OpenIndiana thread. Either way, more testing is needed to determine whether this issue is unique to my particular 2008 AD environment or a general issue with the current smb/server facility in NexentaStor…

SOLORI’s Take: So while NexentaStor defaults back to ‘lmauth_level=4’ and ‘lmauth_level=2’ is now broken (for my environment), the “default” for OpenIndiana and Solaris (‘lmauth_level=3’) is a winner; as to why – that’s a follow-up question… Meanwhile, proceed with caution when upgrading to NexentaStor 3.1.3 if your appliance is integrated into AD – testing with the latest virtual appliance for the win.


Short Take: VMware View Client for Android, ICS Update

May 17, 2012

An updated VMware View Client for Android devices hit the street today, sporting a couple of enhancements for Google’s Android OS running the relatively new Ice Cream Sandwich (ICS) release; other improvements are for View 5.1 deployments only.

Here’s a list of the new features in the update available now on Google Play:

– Support for ICS
– Mouse support with hover, right click and scroll wheel (ICS)
– Updated look and feel and improvements for smaller screens
– New Settings dialog includes security mode settings
– Up to 2x better video playback performance
– Optimized for View 5.1
– RADIUS two factor authentication with View 5.1
– Save password option (administrator approval required) with View 5.1
– French, German, Spanish keyboard support with View 5.1

The update is a 5.32MB download, and is available free of charge.


Quick-Take: vCenter 5.0 Dies Within 48 Hours of Installation, Error 1000

May 1, 2012

After upgrading a View installation for a client this weekend from View 4.0 to View 5.0, all seemed well. The upgrade process took them from vSphere 4.0U2 to vSphere 5.0U1 in the bargain – about 15-20 hours of work including backups and staging. Testing and the first 24 hours of production went swimmingly, with no negative reports or hiccups. (The upgrade process and the spectres of dead pilots-turned-production are an issue for another blog post.)

I got a call about vCenter 5.0 dying (and then magically working again before the local admin could get to it – a couple of minutes or so). Two mysteries: one easy, one VERY frustrating…

Mystery One – vCenter Dies and Comes Back to Life

This was the easy one: by default, the VMware VirtualCenter Server service is set to a “300000 millisecond” recovery delay upon failure. The local site admin didn’t have his prayers answered; the system simply recovered as planned. (Note to upgraders – set your recovery delay to as much or as little hold-down time as your site needs – probably no less than 120000 milliseconds.)

The VMware VirtualCenter Server service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 300000 milliseconds: Restart the service.

– Service Control Manager

Why would five minutes (yep, 300000 milliseconds) be a good amount of recovery time? The Socratic answer is this: how long will it take for all of the vCenter log and dump files to be written in your environment? In the case of this issue, the dump file was about 500MB in size, with about another 150MB in various other logs. At a “leisurely pace” of 5 MB/sec (let’s assume the worst), writing that roughly 650MB works out to about 130 seconds – call it two minutes of “hold time” before restart.
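
If you’d rather not click through the Services MMC, here’s a quick sketch of adjusting that recovery delay from an elevated command prompt on the vCenter server (assuming the service’s short name is vpxd – confirm it with the first command before changing anything):

rem Confirm the short name of the VMware VirtualCenter Server service
sc query vpxd

rem On failure, restart the service after 120 seconds (120000 ms) and reset the failure count daily
sc failure vpxd reset= 86400 actions= restart/120000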

Mystery Two – vCenter Died. Why?

Here’s the problem: vCenter needs to be bullet-proof. vCenter’s installer asks for your environment’s size during the installation and sets parameters to accommodate those basic needs. Also, during the SQL upgrade process from vCenter 4.0 to 5.0, the database’s recovery model is switched from SIMPLE (the recommended setting for vCenter) to BULK-LOGGED, but just for the duration of the upgrade. After the upgrade it’s reset back to SIMPLE.

Fast forward 48 hours. vCenter is running with a couple of hundred virtual machines in a View environment and is tracking all of that lovely host and performance data we appreciate when dealing with complex enterprise systems. It’s happily responding to the View Connection Server’s requests for power-ons and power-offs when, all of a sudden, the worst happens: it crashes!

Suddenly, tens of thousands of dollars’ worth of infrastructure is waiting out a 5-minute recovery interval, and View logins requiring VM power-ons won’t happen until it’s over. All is not right in your virtual world now, buckaroo! Let’s see if Windows Event Viewer can elicit a solution:

The description for Event ID 1000 from source VMware VirtualCenter Server cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

Log directory: C:\ProgramData\VMware\VMware VirtualCenter\Logs.

the message resource is present but the message is not found in the string/message table

– Event Viewer, Application Log

Okay, Event ID 1000 – there’s got to be a KB on that one; but seriously, ID 1000 sounds too generic for me to have a ton of hope. Sure enough, the VMware Knowledge Base immediately coughs up KB article 1015101, applicable to vCenter 5.0. Unfortunately, this vCenter Server is not installed on an IIS platform, so that’s just an empty rabbit hole…

Next, let’s have a look at the vCenter Server logs (thoughtfully pointed to in the Event log, above) at or around the time of failure. Sure enough, there is a gzipped log with the restart time stamp available. A quick glance at the end of the log shows the following “impending doom” quality message:

--> 
--> Panic: TerminateHandler called
--> Backtrace:
--> backtrace[00] rip 000000018013deba (no symbol)
--> backtrace[01] rip 0000000180101518 (no symbol)
 ... 
--> backtrace[60] rip 00000000708f2fdf (no symbol)
--> backtrace[61] rip 00000000708f3080 (no symbol)
-->

– vCenter vpxd-X.log file

But a sobering look above the doomsday report gives us a better idea as to the real culprit: SQL execution failed. What? Did I hear you whisper “kill your DBA”? Before walking down to the DBA and calling him out for leaving you in the lurch, let’s visit the SQL logs (perhaps you will have to talk to the DBA after all if your vCenter admins don’t have access to SQL logs in your environment). Here’s what my SQL log for the vCenter database said:

05/01/2012 08:05:21,spid62,Unknown,The transaction log for database 'VIM_VCDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
05/01/2012 08:05:21,spid62,Unknown,Error: 9002, Severity: 17, State: 4.
 ...
05/01/2012 08:00:04,spid75,Unknown,The transaction log for database 'VIM_VCDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
05/01/2012 08:00:04,spid75,Unknown,Error: 9002, Severity: 17, State: 4.

– Microsoft SQL Server Log for VIM_VCDB (vCenter)

Note that something to this effect also shows up as a diagnostic message inside the vCenter log – reducing the number of times you need to traipse down to the DBA’s cubby for a chat. Okay, that cinches it: the DBA’s been meddling in my vCenter database again – probably with some unscheduled and undocumented maintenance. We’re definitely going to have that talk now, right? Nope.
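
Incidentally, the error text’s own suggestion is easy to follow if you have rights on the SQL instance; a quick sketch (the instance name below assumes the bundled SQL Express instance VMware names SQLEXP_VIM – substitute your own server\instance):

rem Ask SQL Server why space in the VIM_VCDB transaction log cannot be reused
sqlcmd -S .\SQLEXP_VIM -E -Q "SELECT name, recovery_model_desc, log_reuse_wait_desc FROM sys.databases WHERE name = 'VIM_VCDB'"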

Resolution

Remember that upgrade we did 48 hours ago? As part of the upgrade process, the database is upgraded from vCenter 4.0’s format to the more information-rich vCenter 5.0 format. Along the way, the upgrade process changes the SQL database’s recovery model from the preferred “SIMPLE” mode to the “BULK-LOGGED” mode so that a failed upgrade can be more easily rolled back.

Beware:

BULK-LOGGED mode can create a HUGE transaction log during a vCenter upgrade. There are MANY posts about the TLOG filling up during these upgrades, with a consensus that the TLOG needs to be allowed to grow to at least 4x the size of your vCenter database or the process will not complete.

You’ve been warned.

In the case of this upgrade, I happen to know that the TLOG was allowed to grow to at least 4x the size of the vCenter database PRIOR to the upgrade process. In fact, during this upgrade (final stage) it grew to 1.5x of the vCenter database size. What was unknown to me – until now – is that the TLOG’s maximum allowed growth was reset to 500MB when the database was returned to “SIMPLE” mode. During a time of high activity (perhaps processing the last 24 hours of data) the TLOG needed to exceed that amount, couldn’t, and vCenter crashed accordingly. The simple fix is to increase the TLOG limit back to the original setting that works well for the environment.
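
Here’s a minimal sketch of checking and restoring the log’s growth cap with sqlcmd (the instance name, logical log file name, and 4GB cap below are illustrative – read the real names and pick a site-appropriate limit from the first query before altering anything):

rem List the data and log files for VIM_VCDB with their current size and growth limits
sqlcmd -S .\SQLEXP_VIM -E -Q "SELECT name, size, max_size, growth FROM VIM_VCDB.sys.database_files"

rem Raise the transaction log's MAXSIZE back to a value that fits the environment (example: 4 GB)
sqlcmd -S .\SQLEXP_VIM -E -Q "ALTER DATABASE VIM_VCDB MODIFY FILE (NAME = 'VIM_VCDB_log', MAXSIZE = 4096MB)"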

SOLORI’s Take:

Ouch! Someone feels set up for failure. I never want to hear a customer say: “Gosh, everything was great until I logged into vCenter [with the vSphere Client] and then, all of a sudden, things went sideways” – especially when the cause is that SQL Server has been silently modified with a setting known to make it choke, subsequently bringing vCenter to a crashing halt.

VMware: if you’re modifying my database parameters POST-INSTALL, you need to WARN ME or post it in the install or upgrade docs. I’ve combed them and can’t find it… Let’s get the upgrade process modified so that the database settings are restored after the database is returned to SIMPLE mode, okay?

Updated 05/02/2012: Corrected intro grammar. Link to TLOG upgrade issue added.


Quick Take: Syslog Stops Working after Upgrade to ESXi 5.0 Update 1

March 24, 2012

If you’ve recently upgraded your ESXi from 5.0 build 456551 and were logging to syslog, it’s possible that your events are no longer being received by your syslog server. It seems there was a “feature” in ESXi 5.0 build 456551 that allowed syslog to escape the ESXi firewall regardless of the firewall setting. This could be especially problematic if you upgraded from ESXi 4.x, where no firewall configuration was needed for syslog traffic.

VMware notes that syslog traffic was not affected by the ESXi firewall in v5 build 456551. See KB2003322 for details.

However, in ESXi 5.0 Update 1 the firewall rule definitely applies, so if you were “grandfathered in” during the upgrade to build 456551, check the syslog entries for your ESXi 5 servers. If you’re no longer getting syslog entries, either set the policy in the host’s Configuration->Security Profile->Properties… control panel:

Enabling syslog traffic in the ESXi firewall within the vSphere Client interface.

Or use ESXCLI to do the work (especially with multiple hosts):

esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true

esxcli network firewall refresh

That will take care of the “absent” syslog entries.
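
To double-check from the ESXi shell that the rule actually took – and to nudge the logger if it still isn’t forwarding – something like the following should work on ESXi 5.x:

# Confirm the syslog ruleset is now enabled in the firewall
esxcli network firewall ruleset list | grep -i syslog

# Review the syslog configuration and reload the logger
esxcli system syslog config get
esxcli system syslog reload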

SOLORI’s Take: Gotcha! As ESXi becomes more like ESX in terms of provisioning, old-school ESXiers (like me) need to make sure they’re up-to-speed on the latest changes in ESXi. Ashamed to admit it, but this exact scenario got me in my home lab… Until I stumbled onto KB2003322 I didn’t think to go back and check the ESXi firewall settings – after all, it was previously working 😉


Quick Take: VMware ESXi 5.0, Patch ESXi50-Update01

March 16, 2012

VMware releases ESXi 5.0 Complete Update 1 for vSphere 5. An important change for this release is the inclusion of general and security-only image profiles:

Starting with ESXi 5.0 Update 1, VMware patch and update releases contain general and security-only image profiles. Security-only image profiles are applicable to new security fixes only. No new bug fixes are included, but bug fixes from earlier patch/update releases are included.

The general release image profile supersedes the security-only profile. Application of the general release image profile applies to new security and bug fixes.

The security-only image profiles are identified with the additional “s” identifier in the image profile name.

Just a few of the more interesting bugs fixed in this release:

PR 712342: Cannot assign VMware vSphere Hypervisor license key to an ESXi host with pRAM greater than 32GB

PR 719895: Unable to add a USB device to a virtual machine (KB 1039359).

PR 721191: Modifying snapshots using the commands vim-cmd vmsvc/snapshot.remove or vim-cmd vmsvc/snapshot.revert will fail when applied against certain snapshot tree structures.

This issue is resolved in this release. Now a unique identifier, snapshotId, is created for every snapshot associated to a virtual machine. You can get the snapshotId by running the command vim-cmd vmsvc/snapshot.get <vmid>. You can use the following new syntax when working with the same commands:

Revert to snapshot: vim-cmd vmsvc/snapshot.revert <vmid> <snapshotId> [suppressPowerOff/suppressPowerOn]
Remove a snapshot: vim-cmd vmsvc/snapshot.remove <vmid> <snapshotId>
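
A quick illustrative sequence (the vmid and snapshotId values below are made up – read the real ones from the first two commands):

# Find the vmid of the virtual machine
vim-cmd vmsvc/getallvms

# List its snapshot tree, including each snapshot's snapshotId
vim-cmd vmsvc/snapshot.get 42

# Remove one snapshot by vmid and snapshotId
vim-cmd vmsvc/snapshot.remove 42 8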

PR 724376: Data corruption might occur if you copy large amounts of data (more than 1GB) from a 64-bit Windows virtual machine to a USB storage device.

PR 725429: Applying a host profile to an in-compliance host causes non-compliance (KB 2003472).

PR 728257: On a pair of HA storage controllers configured for redundancy, if you take over one controller, the datastores that reside on LUNs on the taken over controller might show inactive and remain inactive until you perform a rescan manually.

PR 734366: Purple diagnostic screen with vShield or third-party vSphere integrated firewall products (KB 2004893)

PR 734707: Virtual machines on a vNetwork Distributed Switch (vDS) configured with VLANs might lose network connectivity upon boot if you configure Private VLANs on the vDS. However, disconnecting and reconnecting the uplink solves the problem. This issue has been observed on be2net NICs and ixgbe vNICs.

PR 742242: XCOPY commands that VAAI sends to the source storage device might fail. By default, XCOPY commands should be sent to the destination storage device in accordance with VAAI specification.

PR 750460: Adding and removing a physical NIC might cause an ESXi host to fail with a purple screen. The purple diagnostic screen displays an error message similar to the following:

NDiscVlanCheck (data=0x2d16, timestamp=<value optimized out>) at bora/vmkernel/public/list.h:386

PR 751803: When disks larger than 256GB are protected using vSphere Replication (VR), any operation that causes an internal restart of the virtual disk device causes the disk to complete a full sync. Internal restarts are caused by a number of conditions including any time:

  • A virtual machine is restarted
  • A virtual machine is vMotioned
  • A virtual machine is reconfigured
  • A snapshot is taken of the virtual machine
  • Replication is paused and resumed

PR 754047: When you upgrade VMware Tools, the upgrade might fail because some Linux distributions periodically delete old files and folders in /tmp. The VMware Tools upgrade requires this directory in /tmp for auto upgrades.

PR 766179: ESXi host installed on a server with more than 8 NUMA nodes fails and displays a purple screen.

PR 769677: If you perform a VMotion operation to an ESXi host on which the boot-time option “pageSharing” is disabled, the ESXi host might fail with a purple screen.

Disabling pageSharing severely affects performance of the ESXi host. Because pageSharing should never be disabled, starting with this release, the “pageSharing” configuration option is removed.

PR 773187: On an ESXi host, if you configure the Network I/O Control (NetIOC) to set the Host Limit for Virtual Machine Traffic to a value higher than 2000Mbps, the bandwidth limit is not enforced.

PR 773769: An ESXi host halts and displays a purple diagnostic screen when using Network I/O Control with a Network Adapter that does not support VLAN Offload (KB 2011474).

PR 788962: When an ESXi host encounters a corrupt VMFS volume, VMFS driver might leak memory causing VMFS heap exhaustion. This stops all VMFS operations causing orphaned virtual machines and missing datastores. vMotion operations might not work and attempts to start new virtual machines might fail with errors about missing files and memory exhaustion. This issue might affect all ESXi hosts that share the corrupt LUN and have running virtual machines on that LUN.

PR 789483: After you upgrade to ESXi 5.0 from ESXi 4.x, Windows 2000 Terminal Servers might perform poorly. The consoles of these virtual machines might stop responding and their CPU usage might show a constant 100%.

PR 789789: ESXi host might fail with a purple screen when a virtual machine connected to VMXNET 2 vNIC is powered on. The purple diagnostic screen displays an error message similar to the following:

0x412261b07ef8:[0x41803b730cf4]Vmxnet2VMKDevTxCoalesceTimeout@vmkernel#nover+0x2b stack: 0x412261b0
0x412261b07f48:[0x41803b76669f]Net_HaltCheck@vmkernel#nover+0xf6 stack: 0x412261b07f98

You might also observe an error message similar to the following written to VMkernel.log:

WARNING: Vmxnet2: 5720: failed to enable port 0x2000069 on vSwitch1: Limit exceeded^[[0m

SOLORI’s Take: Lions, tigers and bears – oh my! In all, I count seven (7) unique PSOD bugs (listed in the full KB) along with some rather head-scratching gotchas. Lots of reasons to keep your vSphere hosts current in this release, to be sure… Use Update Manager or start your update journey here…


In-the-Lab: NexentaStor vs ESXi, Redux

February 24, 2012

In my last post, I mentioned a much-complained-about “idle” CPU utilization quirk with NexentaStor when running as a virtual machine. After reading many supposed remedies in forum postings (some referenced in the last blog, none of which worked), I went pit-bull on the problem… and got lucky.

As an avid (er, frequent) NexentaStor user, I find the luster of the NMV (Nexenta’s web GUI) has worn off. Nearly 100% of my day-to-day operations are on the command line and/or Nexenta’s CLI (dubbed NMC). That includes power-off events (from NMC, issue “setup appliance power-off” or “setup appliance reboot”).

For me, the problem cropped up while running storage benchmarks on some virtual storage appliances for a client. These VSAs are bound to a dedicated LSI 9211-8i SAS/6G controller using VMware’s PCI pass-through (Host Configuration, Hardware, Advanced Settings). The VSA uses the LSI controller to access SAS/6G disks and SSDs in a connected JBOD – this approach allows for many permutations on storage HA and avoids physical RDMs and VMDKs. Using a JBOD allows for attachments to PCIe-equipped blades, dense rack servers, etc., and has little impact on VM CPU utilization (in theory).

So I was very surprised to find idle CPU utilization (according to ESXi’s performance charting) hovering around 50% on a fresh installation. This runs contrary to my experience with NexentaStor, but I’ve seen reports of such issues on the forums and even on my own blog. I’ve never been able to reproduce more than a 15-20% per-vCPU bias between what’s reported in the VM and what ESXi/vCenter sees. I’ve always attributed this difference to vSMP and virtual hardware (disk activity) which is not seen by the OS but is handled by the VMM.

CPU record of idle and IOzone testing of SAS-attached VSA

During the testing phase, I’m primarily looking at disk throughput, but I noticed a persistent CPU utilization of 50% – even when idle. Regardless, the 4 vCPU VSA appears to perform well (about 725MB/sec 2-process throughput on initial write) despite the CPU deficit (3 vCPU test pictured above, about 600MB/sec write). However, after writing my last blog entry, the 50% CPU leech just kept bothering me.

After wasting several hours researching and tweaking with very little (positive) effect, a client e-mail prompted an NMV walk-through, which had an unexpected consequence: the act of powering-off the VSA from the web GUI (NMV) resulted in significantly reduced idle CPU utilization.

Getting lucky: noticing a trend after using NMV to reboot for a client walk-through of the GUI.

Working with the 3 vCPU VSA over the previous several hours, I had consistently used the NMC (CLI) to reboot and power-off the VM. Simply using the NMV to shut down the VSA couldn’t have anything to do with idle CPU consumption, could it? Remembering that these were fresh installations, I wondered whether this was specific to a fresh installation or could also show up in an upgrade. According to the forums, this only hampered VMs, not hardware installations.

I grabbed a NexentaStor 3.1.0 VM out of the library (one that had been progressively upgraded from 3.0.1) and set about the upgrade process. The result was unexpected: no difference in idle CPU from the previous version; this problem was NOT specific to 3.1.2, but specific to the installation/setup process itself (at least that was the prevailing hypothesis).

Regression into my legacy VSA library, upgrading from 3.1.1 to 3.1.2 to check if the problem follows the NexentaStor version.

If anything, the upgraded VSA exhibited slightly less idle CPU utilization than its previous version. Noteworthy, however, was the extremely high CPU utilization as the VSA sat waiting for a yes/no response (NMC/CLI) to the “would you like to reboot now” question at the end of the upgrade process (see chart above). Once “no” was selected, CPU dropped immediately to normal levels.

Now it seemed apparent that some vestige of the web-based setup process (completed by a series of “wizard” pages) must be lingering around (much like the yes/no CPU glutton). Fortunately, I had another freshly installed VSA to test with – configured and processed exactly like the first one. I fired up the NMV and shut down the VSA…

Confirming the impact of the "fix" on a second fresh installed NexentaStor VSA

After powering-on the VM from the vSphere Client, it was obvious. This VSA had been running idle for some time, so its idle performance baseline – established across several prior reboots from the CLI – was well recorded by the ESXi host (see above). The resulting drop in idle CPU was nothing short of astounding: the 3 vCPU configuration dropped from 50% average utilization to 23% idle utilization. Naturally, these findings (still anecdotal) have been forwarded on to engineers at Nexenta. Unfortunately, now I have to go back and re-run my storage benchmarks; hopefully clearing the underlying bug has reduced the needed vCPU count…


In-the-Lab: NexentaStor and VMware Tools, You Need to Tweak It…

February 24, 2012

While working on an article about complex VSAs (i.e., virtual storage appliances with PCIe pass-through SAS controllers), an old issue came back up again: NexentaStor virtual machines still have a problem installing VMware Tools since the OS branched from OpenSolaris and began using Illumos. While this isn’t totally Nexenta’s fault – there is no “Nexenta” OS type in VMware to choose from – it would be nice if a dummy package were present to allow a smooth installation of VMware Tools; the problem persists in the latest NexentaStor release: 3.1.2.

I could not find where I had documented the fix in SOLORI’s blog, so here it is… Note that the NexentaStor VM is configured as an Oracle Solaris 11 (64-bit) virtual machine for the purposes of vCenter/ESXi. This establishes the VM’s relationship to a specific VMware Tools load. Installation of VMware Tools in NexentaStor is covered in detail in an earlier blog entry.

VMware Tools bombs-out at SUNWuiu8 package failure. Illumos-based NexentaStor has no such package.

Instead, we need to modify the vmware-config-tools.pl script directly to compensate for the loss of the SUNWuiu8 package that is explicitly required in the installation script.

Commenting out the SUNWuiu8 related section allows the tools to install with no harm to the system or functionality.

Note that the full “if” stanza where the VMware Tools installer checks for ‘tools-for-solaris’ must be commented out. Since the SUNWuiu8 package does not exist – and, more importantly, is not needed for Illumos/Nexenta – removing the reference to it is a good thing. Now the installation can proceed as normal.
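
If you need to find the stanza quickly, grep for the package and tag names (the script path below assumes the default install location – adjust it for your VMware Tools version):

# Locate the package check that needs commenting out
grep -n SUNWuiu8 /usr/bin/vmware-config-tools.pl
grep -n tools-for-solaris /usr/bin/vmware-config-tools.pl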

After the changes, installation completes as normal.

That’s all there is to getting the “Oracle Solaris” version of VMware Tools to work in newer NexentaStor virtual machines – now back to really fast VSAs with JBOD-attached storage…

SOLORI’s Note: There is currently a long-standing bug that affects NexentaStor 3.1.x running as a virtual machine. There is no known workaround to keep NexentaStor from running up 50% CPU utilization from ESXi’s perspective. Inside the NexentaStor VM we see very little CPU utilization, but from the performance tab we see 50% utilization on every vCPU allocated to the VM. Nexenta is reportedly looking into the cause of the problem.

I looked through this and there is nothing that stands out other than a huge number of interrupts while idle. I am not sure where those interrupts are coming from. I see something occasionally called volume-check and nmdtrace which could be causing the interrupts.

Nexenta Support

A bug report was reportedly filed a couple of days ago to investigate the issue further.