Archive for the ‘Open Source Storage’ Category


Quick-Take: NexentaStor 4.0.1GA

April 14, 2014

Our open storage partner, Nexenta Systems Inc., hit a milestone this month by releasing NexentaStor 4.0.1 for general availability. This release is significant mainly because it is the first commercial release of NexentaStor based on the open source Illumos kernel rather than Oracle’s OpenSolaris (now closed source). With this move, Nexenta makes good on its promise of “open source technology” that enables hardware independence and targeted flexibility.

Some highlights in 4.0.1:

  • Faster Install times
  • Better HA Cluster failover times and “easier” cluster manageability
  • Support for large memory host configurations – up to 512GB of DRAM per head/controller
  • Improved handling of intermittently faulty devices (disks with irregular I/O responses under load)
  • New (read: “not backward compatible”) Auto-Sync replication (user configurable zfs+ssh still available for backward compatibility) with support for replication of HA to/from non-HA clusters
    • Includes LZ4 compression (fast) option
    • Better Control of “Force Flags” from NMV
    • Better Control of Buffering and Connections
  • L2ARC Compression now supported
    • Potentially doubles the effective coverage of L2ARC (for compressible data sets)
    • Supports LZ4 compression (fast)
    • Automatically applied if the dataset is likewise compressed (see the example after this list)
  • Server Message Block v2.1 support for Windows (some caveats for IDMAP users)
  • iSCSI support for Microsoft Server 2012 Cluster and Cluster Shared Volume (CSV)
  • Guided storage pool configuration wizards – Performance, Balanced and Capacity modes
  • Enhanced Support Data and Log Gathering
  • High Availability Cluster plug-in (RSF-1) binaries are now part of the installation image
  • VMware: Much better VMXNET3 support
    • no more log spew
    • MTU settings work from NMV
  • VMware: Install to PVSCSI (boot disk) from ISO no longer requires tricks
  • Upgrade from 3.x is currently “disruptive” – promised “non-disruptive” in next maintenance update
  • Improved DTrace capabilities from NMC shell for
    • COMSTAR/iSCSI/FC
    • general IO
  • Snappier, more stable NMV/GUI
    • Service port changes from 2000 to 8457
    • Multi-NMS default
    • Fast refresh for ZFS containers
    • RSF-1 defaults in “Server” settings
    • Improved iSCSI
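
Enabling LZ4 compression on a dataset, so that the new L2ARC compression can kick in, is a one-liner from the appliance’s expert shell. A minimal sketch using stock ZFS commands and a hypothetical pool/folder name (the NMV exposes the same compression property per folder):

zfs set compression=lz4 tank/vmstore           # hypothetical pool/folder name
zfs get compression,compressratio tank/vmstore # confirm the setting and ratio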

See Nexenta’s 4.0.1 Release Notes for additional changes and details.

Note, the 18TB Community Edition EULA is still hampered by the “non-commercial” language, restricting its use to home, education and academic (i.e. training, testing, lab, etc.) targets. However, the “total amount of Storage Space” metric in the Community license is a deviation from the Enterprise licensing (typically a “raw” storage entitlement):

2.2 If You have acquired a Community Edition license, the total amount of Storage Space is limited as specified on the Site and is subject to change without notice. The Community Edition may ONLY be used for educational, academic and other non-commercial purposes expressly excluding any commercial usage. The Trial Edition licenses may ONLY be used for the sole purposes of evaluating the suitability of the Product for licensing of the Enterprise Edition for a fee. If You have obtained the Product under discounted educational pricing, You are only permitted to use the Product for educational and academic purposes only and such license expressly excludes any commercial purposes.

– NexentaStor EULA, Version 4.0; Last updated: March 18, 2014

For those who operate under the Community license, this means your total physical storage is UNLIMITED, provided your space “IN USE” stays under 18TB (18,432 GB) at all times. Where this matters is in constructing useful arrays from “currently available” disks (SATA, SAS, etc.). Let’s say you needed 16TB of AVAILABLE space using “modern” 3TB disks. Because spinning disks individually larger than about 600GB rebuild slowly enough that another failure could occur BEFORE the rebuild completes (risking data loss), mirror or raidz2/raidz3 would be your best bet for array configuration.

SOLORI Note: Richard Elling made this concept exceedingly clear back in 2010, and his “ZFS data protection comparison” of 2, 3 and 4-way mirrors to raidz, raidz2 and raidz3 is still a great reference on the topic.

Elling’s MTTDL Comparison by RAID Type

 

Given 16TB in 3-way mirror or raidz2 (roughly equivalent MTTDL predictors), your 3TB disk count would follow as:

3-way Mirror Disks := RoundUp( 16 * (1024 / 1000)^3 / 70% / ( 3 * (1000 / 1024)^3 )  ) * 3 = 27 disks, or

6-disk Raidz2 Disks := RoundUp( 16 * (1024 / 1000)^3 / 70% / ( 4 * 3 * (1000 / 1024)^3 )  ) * 6 = 18 disks

By “raw” licensing standards, the 3-way mirror would require a 76TB license while the raidz2 volume would require a 51TB license – a difference of 25TB in licensing (around $5,300 retail). However, under the Community License, the “cost” is exactly the same, allowing for a considerable amount of flexibility in array loadout and configuration.

Why do I need 54TB of raw disk to make 16TB of “AVAILABLE” storage in raidz2?

The RAID grouping we’ve chosen is 6-disk raidz2 – akin to 4 data and 2 parity disks in RAID6 (without the fixed stripe requirement or the “write hole” penalty). This means, on average, one third of the space consumed on-disk will be parity information, so right off the top we’re losing 33% of the disk capacity. Likewise, disk manufacturers sell capacity in decimal TB while ZFS reports it in binary TiB, so we lose another 7% of “capacity” in that conversion. Additionally, we like to keep a healthy amount of space reserved for new block allocation and recommend 30% unused space as a target. All combined, a 6-disk raidz2 array is, at best, about 43% efficient in terms of capacity (by contrast, a 3-way mirror is only about 22% space efficient). For an array built on 3TB disks, we therefore get only about 1.3TB of usable storage – per disk – with 6-disk raidz2 (by contrast, 10-disk raidz nets only 160GB of additional “usable” space per disk).
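
For anyone who wants to reproduce the arithmetic, here is the same calculation as a quick awk sketch (it simply mirrors the formulas above – it is not a Nexenta sizing tool):

awk 'BEGIN {
  need = 16; disk = 3; fill = 0.70               # TB needed, TB per disk, fill target
  up = (1024/1000)^3; down = (1000/1024)^3       # TB/TiB conversion factors from above
  m = need * up / fill / (1 * disk * down)       # 3-way mirror: 1 data disk per group
  z = need * up / fill / (4 * disk * down)       # 6-disk raidz2: 4 data disks per group
  printf "3-way mirror : %d disks\n", (m > int(m) ? int(m)+1 : m) * 3   # 27
  printf "6-disk raidz2: %d disks\n", (z > int(z) ? int(z)+1 : z) * 6   # 18
}'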

SOLORI’s Take: If you’re running 3.x in production, 4.0.1 is not suitable for in-place upgrades (yet), so testing and waiting for the “non-disruptive” maintenance release is your best option. For new installations – especially inside a VM or hypervisor environment as a Virtual Storage Appliance (VSA) – version 4.0.1 presents a better option than its 3.x siblings. If you’re familiar with 3.x, there’s not much new on the NMV side beyond better tunables and snappier response.


NFS and VMware: Perfect for Small Business? Part 1 – Introduction

August 22, 2012

Nexenta Systems’ “open storage” software made significant inroads into the VMware community over the last year with NFS storage. Even though Nexenta has been a VMware partner for much longer, the storage vendor really made its debut at last year’s VMworld 2011 Hands-on-Labs by showcasing its NFS-for-VMware solution running on commodity hardware:

And, here’s the kicker, NexentaStor was running on industry standard hardware from Supermicro with STEC drives for write and read cache and 7200 rpm SAS drives for capacity.  Monday some DRAM on one of the four servers (two HA pairs) failed.  And no end users noticed because of our HA cluster performed correctly and failed over.  Meanwhile our load increased from a designed 33% to over 60% of the total load of the Hands on Lab due to unspecified issues with either NetApp or EMC.

Evan Powell, CEO – Nexenta Systems, VMworld Reviewed

While this was indeed an important inflection point in the VMware/Nexenta relationship, in broader terms Nexenta’s success at VMworld was probably the moment when commodity NFS stepped out of the shadow of block storage. To be fair, there are many enterprise alternatives to Nexenta for NFS storage – like NetApp and EMC – but there are few that can be deployed on commodity hardware, fewer that offer both hardware and virtual storage appliances, and fewer still that have commercially licensed and community licensed distributions of the same platform.

If you’ve ever asked the question, “what’s the best storage solution for my vSphere stack?” I’d be willing to bet that NFS was not high on the list of recommendations. If you’ve looked at the related product marketing materials, as I have, or engaged front-line VMware personnel in a discussion of primary storage solutions between 2009 and 2011, as I have, you’d be hard-pressed to leave the conversation with a recommendation to use NFS. If Nexenta’s appearance can “prove” that open storage solutions based on NFS (and commodity hardware) are “ready” for big cloud infrastructures, can it be true that it’s a perfect fit for a small business’ private cloud? I’d say a resounding YES, but…

Introduction, NFS versus Block Storage

Before you say, “thanks for the tip, Collin, but who needs commercial stuff when NFS services are included in practically every Linux distribution, and ‘no cost’ solutions – like FreeNAS – make NFS cheap and easy?” consider this: while such solutions have been very popular with lab and bare-bones users, most enterprises (even small ones) require a “bet the business” level of support and stability that isn’t often found in “community supported” distributions and do-it-yourself implementations. And even though any NFSv3 server – properly sized and configured – should work with VMware according to its abilities, it’s up to you to decide whether the basket fits your eggs. The commercial NFS vendors really know their stuff, so you’re buying expertise, experience and a well-refined playbook: something you’ll be giving up when you go it alone.

Despite NFS being “block storage’s whipping boy,” to say it is “not ready for prime time” in today’s VMware product matrix would be the height of FUD-peddling. On the contrary, a well-known multi-vendor post from EMC’s Chad Sakac and NetApp’s Vaughn Stewart made a great case for NFS in the enterprise back in 2009. Since then, improvements in NFS offerings and vSphere capabilities have increased NFS’ appeal in that space, not diminished it. To quote the Virtual Geek:

“NFS is an absolutely legitimate storage model for VMware – with many advantages.”

– Chad Sakac, aka Virtual Geek, EMC VP VMware Technology Alliance

Certainly there is a lot to like in pairing NFS with vSphere 5.x no matter the scale of the enterprise. Here are some of the high-points:

  • NFS works seamlessly with Storage I/O Control and Network I/O Control to support converged network architectures;
  • NFS exposes VMDKs to 3rd party tools and scripts without VMFS proxies, enabling:
    • Simple Backup/Recovery of VM, VMDK from NAS is a file copy operation
    • Linux, Windows 7, etc. support NFS clients out of the box
    • Replication of a VM or VMDK from NAS can be achieved simply with rsync (see the sketch after this list)
    • Use of snapshotted NFS volumes does not require ESX/VMFS
  • Reclamation of unused storage is not array dependent (file deletes return to storage immediately without SCSI Unmap support or equivalent)
  • Not subject to LUN locking and related performance issues in block/VMFS
  • It’s simpler to use: in the link above, VMware dedicates 24 pages to block/VMFS and only 3 to NFS
  • Presentation and management of NAS storage is very familiar (it’s a filer)
  • NFS is very forgiving of “imperfect” network configurations – compared to iSCSI, especially where network time-outs and latency are concerned
  • NFS storage does not need to be available at ESXi boot time, enabling VMs to exist on VSA running on-top of the host ESXi server (enabling recursive storage possibilities and reduced/shared hardware costs)
  • Mounting an NFS snapshot to vSphere does not include a signature operation (or risk possible collision)
  • NFS does not require VAAI to resolve SCSI file locking and VM loading limitations consistent with SCSI-based block storage
  • vSphere 5 currently supports 256 NFS mounts per host
    • NFS.MaxVolumes (per host) – default 8, max 256 (see the note after this list)
  • Single file size not limited on NFS file systems, however
    • Without 3rd party NAS VAAI, all VMDKs on NAS are always thin provisioned
    • Single file size limited to NAS vendor file system constraints
    • VMDK uses 512-byte sectors, so it suffers from the same limitations as physical disks and still carries the 2TB-minus-512-byte limit (since VMware has no 4KB-sector VMDK, there is no way to support 2TB+ VMDKs on NFS until that changes)
  • NFS volumes are not limited in size
    • For NetApp WAFL, the limit is up to 100TB (with restrictions)
    • For NexentaStor, the limit is determined by the zpool size
  • On-line expansion of an NFS file system is a one-step operation: expand the file system on the filer
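
Two of the points above are worth making concrete. Backing up or replicating a VM really is just a file copy off the NFS export; a minimal sketch with hypothetical paths and host names, assuming the ZFS snapshot directory is visible on the share:

rsync -av /volumes/vmstore/.zfs/snapshot/nightly/vm01/ backuphost:/backups/vm01/

Likewise, the NFS.MaxVolumes default of 8 is easily raised from an ESXi 5.x shell (a stock VMware advanced option, not storage-vendor specific; VMware also suggests raising the Net.TcpipHeapSize/Net.TcpipHeapMax values when mounting many NFS datastores, so check the guidance for your build):

esxcli system settings advanced set -o /NFS/MaxVolumes -i 64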

That said, NFS still cannot replace block storage on Tier 1 applications that were designed for block storage. Even iSCSI – arguably the least common denominator in shared block storage for VMware – still has some built-in advantages (and unique disadvantages) as compared to NFS. Likewise, when we’re talking about block storage in VMware we’re usually talking about VMFS too:

  • Writes are almost always asynchronous, making even low-end iSCSI “appear” to be faster than low-end NFS
  • Interface redundancy is straightforward and deterministic, with many good options for redundancy
  • Storage latency in iSCSI/block is “more predictable” across common use cases
  • vSphere 5 currently supports 256 LUNs per host (similar to NFS mount limit)
    • Disk.MaxLUN (per target) – default 256, max 256
    • Total VMFS LUNs per host cannot exceed Disk.MaxLUN, regardless of type (FC, SAS, iSCSI, etc.)
  • vSphere VMFS3/5 limits single file size (VMDK and virtual RDM) to 2TB (minus 512 bytes)
  • VMFS3 limits single volume size to 50-64TB depending on block size chosen when formatted
  • VMFS5 limits single volume size to 64TB for VMFS5 (always uses 1MB block size)
  • vSphere’s storage telemetry is still geared toward block rather than filer storage, making trouble-shooting of “performance issues” easier on block
  • Pairing storage to interface is much easier to do, even on-the-fly
  • Exchange 2010 expressly forbids the use of NAS storage as VMDK datastores
  • Virtual RDM and Clustering (shared block) require block storage (in some cases, not even iSCSI qualifies for support)
  • Tier 1 application support on block-based storage is generally better (familiarity and testing)
  • VMware VAAI for block storage ships with vSphere; similar acceleration features for NAS must come from the vendor (creating a much less robust out-of-the-box experience for SMB)
  • On-line VMFS expansion usually requires two steps, with some caveats:
    • For expansions using a single LUN under 2TB: (1) expand the underlying LUN on the SAN, (2) expand VMFS into the new space on the LUN
    • Single LUN expansions over 2TB require VMFS5
    • VMFS3 volume expansion beyond 2TB requires multiple extents, each of which may not exceed 2TB-512B – loss of a single extent in a multi-extent volume could mean loss of the entire volume
    • VMFS5 supports single LUNs (extents) as large as 60TB

Sparse VAAI issues aside, NFS is a great go-to storage protocol for most virtual workloads that do not strictly require block or shared-block storage back-ends (clustering, et al). Where NFS struggles today – in terms of VMware implementations in the SMB space – is in network resiliency. It is not that you cannot make NFS resilient to network failures; it’s more that redundancy is not neatly baked into the service or protocol as it is for iSCSI, SAS and Fiber Channel – these block-based services have mature, multi-session and multi-path capabilities at the service level (multi-path targets and initiators).

Note about 2TB VMDK limitations – given that most modern OSes running as supported virtual machines support some form of LUN concatenation (extents) to bypass 2TB physical disk limitations, the very same facilities can be leveraged to bypass the 2TB VMDK limits for these OSes. While this is not an optimal solution, it is a supported one. Today’s physical disks that exceed 2TB in size do so with 4KB sectors instead of 512B sectors. Currently, there is no 4KB sector VMDK analog.

Next Up, NFS and Path Redundancy

Hopefully by now there’s a compelling argument to look deeper into the NFS/VMware question, but – as with most shared, network storage – the rubber meets the road at the network layer. To me, the secret to making NFS more robust is the network architecture that underpins it: depending on the complexity of the environment, the network layer will make or break an NFS implementation. In some ways there’s a lot more to making NFS “redundant” (due to its lack of multipath capabilities): it’s not impossible; it’s not difficult; it’s just full of options and caveats.

Unlike block storage, you can’t “throw up two network interfaces, two target ports and two initiator ports” and easily have path redundancy and multipath data. With NFS, the network – not the storage service – does most of the “heavy lifting” and – as you’ll see in the next post – NFS has absolutely no concept of multipath. Therefore, I’m going to spend the next entry reviewing some of the main points driving network and NFS service dependencies that make understanding NFS network resiliency a bit more accessible.


Quick-Take: NexentaStor 3.1.3 New AD Group Feature, Can Break AD Shares

June 12, 2012

The latest update of NexentaStor may not go too smoothly if you are using Windows Server 2008 AD servers and delegating shares via NexentaStor. While the latest update includes a long-sought-after fix in AD capabilities (see pull quote below), it may require a tweak to the CIFS Server settings to get things back on track.

Domain Group Support

It is now possible to allow Domain groups as members of local groups. When a Windows client authenticates with NexentaStor using a domain account, NexentaStor consults the domain controller for information about that user’s membership in domain groups. NexentaStor also computes group memberships based on its _local_ groups database, adding both local and domain groups based on local group memberships, which are allowed to be indirect. NexentaStor’s computation of group memberships previously did not correctly handle domain groups as members of local groups.

NexentaStor 3.1.3 Release Notes

In the past, some of NexentaStor’s in-place upgrades have reset the “lmauth_level” of the associated SMB share server from its user-configured value back to a “default” of four (4). This did not work very well in an AD environment where the servers were Windows Server 2008 running in native authentication mode. The fix was to change the “lmauth_level” to two (2) via the NMV or NMC (“sharectl set -p lmauth_level=2 smb”) and restart the service. If you have this issue, the giveaway kernel log entries are as follows:

smbd[7501]: [ID 702911 daemon.notice] smbd_dc_update: myad.local: locate failed
smbd[7501]: [ID 702911 daemon.notice] smbd_dc_monitor: domain service not responding

However, the rules have changed in some applications; Nexenta’s new guidance is:

Summary Description CIFS Issue

A recent patch release by Microsoft has necessitated a changed to the CIFS authorization setting. Without changing this setting, customers will see CIFS disconnects or the appliance being unable to join the Active Directory domain. If you experience CIFS disconnects or problems joining your Active Directory domain, please modify the ‘lmauth_level’ setting.

# sharectl set -p lmauth_level=4 smb

– NexentaStor 3.1.3 Release Notes

While this may work for others out there, it did not work universally across my tested Windows Server 2008 R2, native AD mode servers. Worse, it appears to work with some shares but not all, which can lead to some confusion about the actual cause (or resolution) of the problem based on the Nexenta release notes. Fortunately (or not, depending on your perspective), the genesis of NexentaStor is clearly heading toward an intersection with Illumos – although the current kernel is still based on OpenSolaris (134f) – and a post from OpenIndiana points users to the right solution.

(Jonathan Leafty) I always thought it was weird that lmauth_level had to be set to 2 so I
bumped it back to the default of 3 and restarted smb and it worked...
(Gordon Ross) Glad you found that.  I probably should have sent a "heads-up" when the
"extended security outbound" enhancement went in.  People who have
adjusted down lmauth_level should put it back the the default.

– CIFS in Domain Mode (AD 2008), OpenIndiana Discussion Group (openindiana-discuss@openindiana.org)

Following the OpenIndiana advice re-enabled all previously configured shares. This setting (‘lmauth_level=3’) is also the default for Solaris, although NexentaStor continues to ship a different default. According to the man page for smb on Nexenta (‘man smb(4)’), the difference between ‘lmauth_level=3’ and ‘lmauth_level=4’ is as follows:

lmauth_level

Specifies the LAN Manager (LM) authentication level. The LM compatibility level controls the type of user authentication to use in workgroup mode or
domain mode. The default value is 3.

The following describes the behavior at each level.

2 – In Windows workgroup mode, the Solaris CIFS server accepts LM, NTLM, LMv2, and NTLMv2 requests. In domain mode, the SMB redirector on
the Solaris CIFS server sends NTLM requests.

3 – In Windows workgroup mode, the Solaris CIFS server accepts LM, NTLM, LMv2, and NTLMv2 requests. In domain mode, the SMB redirector on
the Solaris CIFS server sends LMv2 and NTLMv2 requests.

4 – In Windows workgroup mode, the Solaris CIFS server accepts NTLM, LMv2, and NTLMv2 requests. In domain mode, the SMB redirector on the
Solaris CIFS server sends LMv2 and NTLMv2 requests.

5 – In Windows workgroup mode, the Solaris CIFS server accepts LMv2 and NTLMv2 requests. In domain mode, the SMB redirector on the Solaris
CIFS server sends LMv2 and NTLMv2 requests.

Manpage for SMB(4)

This illustrates either a continued dependency on LAN Manager (absent in ‘lmauth_level=4’) or a bug as indicated in the OpenIndiana thread. Either way, more testing is needed to determine whether this issue is unique to my particular 2008 AD environment or a general problem with the current smb/server facility in NexentaStor…
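
For reference, the change that restored the shares in my environment boils down to a property set and a service restart. A sketch using the stock illumos sharectl/svcadm tools (NMC has equivalent setup commands, and the SMF service name may differ slightly on NexentaStor):

sharectl get -p lmauth_level smb     # check the current value
sharectl set -p lmauth_level=3 smb   # the OpenIndiana/Solaris default
svcadm restart smb/server            # restart the CIFS service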

SOLORI’s Take: So while NexentaStor defaults back to ‘lmauth_level=4’ and ‘lmauth_level=2’ is now broken (for my environment), the “default” for OpenIndiana and Solaris (‘lmauth_level=3’) is a winner; as to why – that’s a follow-up question… Meanwhile, proceed with caution when upgrading to NexentaStor 3.1.3 if your appliance is integrated into AD – testing with the latest virtual appliance for the win.


In-the-Lab: NexentaStor vs ESXi, Redux

February 24, 2012

In my last post, I mentioned a much-complained-about “idle” CPU utilization quirk with NexentaStor when running as a virtual machine. After reading many supposed remedies in forum postings (some referenced in the last blog, none of which worked), I went pit-bull on the problem… and got lucky.

As an avid (er, frequent) NexentaStor user, I find the luster of the NMV (Nexenta’s web GUI) has worn off. Nearly 100% of my day-to-day operations are on the command line and/or Nexenta’s CLI (dubbed NMC). This includes power-off events (from NMC, issue “setup appliance power-off” or “setup appliance reboot”).

For me, the problem cropped up while running storage benchmarks on some virtual storage appliances for a client. These VSAs are bound to a dedicated LSI 9211-8i SAS/6G controller using VMware’s PCI pass-through (Host Configuration, Hardware, Advanced Settings). The VSA uses the LSI controller to access SAS/6G disks and SSDs in a connected JBOD – this approach allows for many permutations on storage HA and avoids physical RDMs and VMDKs. Using a JBOD allows for attachment to PCIe-equipped blades, dense rack servers, etc. and has little impact on VM CPU utilization (in theory).

So I was very surprised to find idle CPU utilization (according to ESXi’s performance charting) hovering around 50% on a fresh installation. This runs contrary to my experience with NexentaStor, though I’ve seen reports of such issues on the forums and even on my own blog. I’ve never been able to reproduce more than a 15-20% per-vCPU bias between what’s reported in the VM and what ESXi/vCenter sees, and I’ve always attributed that difference to vSMP and virtual hardware (disk activity) which is not seen by the OS but is handled by the VMM.
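
For anyone who wants to compare the two views themselves, the quick-and-dirty method I use is a guest-side sampler next to the host-side counters (a sketch; run the first two from a root shell inside the VSA and the last from the ESXi host’s tech support/SSH shell):

mpstat 5     # guest view: per-CPU utilization as the VSA sees it
vmstat 5     # guest view: run queue, interrupts and CPU summary
esxtop       # host view: press 'c' and watch %USED and %RDY for the VSA's group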

CPU record of idle and IOzone testing of SAS-attached VSA

During the testing phase, I was primarily looking at disk throughput, but I noticed a persistent CPU utilization of 50% – even when idle. Regardless, the 4 vCPU VSA appeared to perform well (about 725MB/sec 2-process throughput on initial write) despite the CPU deficit (the 3 vCPU test pictured above managed about 600MB/sec on write). However, after writing my last blog entry, the 50% CPU leech just kept bothering me.

After wasting several hours researching and tweaking with very little (positive) effect, a client e-mail prompted an NMV walk-through that had an unexpected consequence: powering off the VSA from the web GUI (NMV) resulted in significantly reduced idle CPU utilization.

Getting lucky: noticing a trend after using NMV to reboot for a client walk-through of the GUI.

Working with the 3 vCPU VSA over the previous several hours, I had consistently used the NMC (CLI) to reboot and power-off the VM. Simply using the NMV to shut down the VSA couldn’t have anything to do with idle CPU consumption, could it? Remembering that these were fresh installations, I wondered whether this was specific to a fresh installation or could also show up in an upgrade. According to the forums, this only hampered VMs, not hardware installations.

I grabbed a NexentaStor 3.1.0 VM out of the library (one that had been progressively upgraded from 3.0.1) and set about the upgrade process. The result was unexpected: no difference in idle CPU from the previous version; the problem was NOT specific to 3.1.2, but to the installation/setup process itself (at least that was the prevailing hypothesis).

Regression into my legacy VSA library, upgrading from 3.1.1 to 3.1.2 to check if the problem follows the NexentaStor version.

If anything, the upgraded VSA exhibited slightly less idle CPU utilization than its previous version. Noteworthy, however, was the extremely high CPU utilization as the VSA sat waiting for a yes/no response (NMC/CLI) to the “would you like to reboot now” question at the end of the upgrade process (see chart above). Once “no” was selected, CPU dropped immediately to normal levels.

Now it seemed apparent that some vestige of the web-based setup process (completed by a series of “wizard” pages) must be lingering around (much like the yes/no CPU glutton). Fortunately, I had another freshly installed VSA to test with – configured and processed exactly like the first one. I fired up the NMV and shut down the VSA…

Confirming the impact of the "fix" on a second freshly installed NexentaStor VSA

After powering on the VM from the vSphere Client, it was obvious. This VSA had been running idle for some time, so its idle performance baseline – established prior across several reboots from the CLI – was well recorded by the ESXi host (see above). The resulting drop in idle CPU was nothing short of astounding: the 3 vCPU configuration dropped from 50% average utilization to 23% idle utilization. Naturally, these findings (still anecdotal) have been forwarded to engineers at Nexenta. Unfortunately, now I have to go back and re-run my storage benchmarks; hopefully clearing the underlying bug has reduced the needed vCPU count…


In-the-Lab: NexentaStor and VMware Tools, You Need to Tweak It…

February 24, 2012

While working on an article about complex VSAs (i.e. a virtual storage appliance with PCIe pass-through SAS controllers), an old issue came up again: NexentaStor virtual machines still have a problem installing VMware Tools since the platform branched from OpenSolaris and began using Illumos. While this isn’t entirely Nexenta’s fault – there is no “Nexenta” OS type in VMware to choose from – it would be nice if a dummy package were present to allow a smooth installation of VMware Tools; this is still the case with the latest NexentaStor release, 3.1.2.

I could not find where I had documented the fix in SOLORI’s blog, so here it is… Note, the NexentaStor VM is configured as an Oracle Solaris 11 (64-bit) virtual machine for the purpose of vCenter/ESXi. This establishes the VM’s relationship to a specific VMware Tools load. Installation of VMware Tools in NexentaStor is covered in detail in an earlier blog entry.

VMware Tools bombs-out at SUNWuiu8 package failure. Illumos-based NexentaStor has no such package.

Instead, we need to modify the vmware-config-tools.pl script directly to compensate for the absence of the SUNWuiu8 package that is explicitly required by the installation script.

Commenting out the SUNWuiu8 related section allows the tools to install with no harm to the system or functionality.

Note that the full “if” stanza where the VMware Tools installer checks for ‘tools-for-solaris’ must be commented out. Since the SUNWuiu8 package does not exist – and, more importantly, is not needed for Illumos/Nexenta – removing the reference to it does no harm to the system or its functionality. Now the installation can proceed as normal.
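
Locating the stanza takes only a moment. A rough sketch (the exact file and line numbers vary by VMware Tools version, so inspect before editing):

cd vmware-tools-distrib              # the directory extracted from the Tools tarball
grep -rn "SUNWuiu8" .                # find the package check described above
# comment out the enclosing "if" stanza (the tools-for-solaris check), then
# re-run the installer/configuration script as usual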

After the changes, installation completes as normal.

That’s all there is to getting the “Oracle Solaris” version of VMware Tools to work in newer NexentaStor virtual machines – now back to really fast VSAs with JBOD-attached storage…

SOLORI’s Note: There is a long-standing bug that affects NexentaStor 3.1.x running as a virtual machine. Currently there is no known workaround to keep NexentaStor from running up 50% CPU utilization from ESXi’s perspective. Inside the NexentaStor VM we see very little CPU utilization, but from the performance tab we see 50% utilization on every vCPU allocated to the VM. Nexenta is reportedly looking into the cause of the problem.

I looked through this and there is nothing that stands out other that a huge number of interrupts while idle. I am not sure where those interrupts are coming from. I see something occasionally called volume-check and nmdtrace which could be causing the interrupts.

Nexenta Support

A bug report was reportedly filed a couple of days ago to investigate the issue further.


Short-Take: Nexenta 3.1 Adds VAAI Support, Auto-Sync Resume

August 3, 2011

Nexenta Systems Inc. released version 3.1 of its open storage software yesterday with a couple of VMware vSphere-specific feature enhancements. These enhancements specifically target VMware’s vStorage API for Array Integration (VAAI), which promises to accelerate certain “costly” storage operations by pushing their implementation to the storage array instead of the ESX host.

From NexentaStor 3.1 Release Notes, the primitives implemented in 3.1 that contribute to VAAI support include:

  • SCSI Write Same: Supported in vSphere 4.1 and later
    Example. Accelerates zero block writes when creating new virtual disks.
  • SCSI ATS (Atomic Test & Set): Supported in vSphere 4.1 and later.
    Example. Enables specific LUN “region” to be locked instead of entire LUN when cloning a VM.
  • SCSI Block Copy: Supported in vSphere 4.1 and later.
    Example. Avoids reading and writing of block data “through” the ESX host during a block copy operation by allowing VMware to instruct the SAN to do so.
  • SCSI Unmap: Supported in vSphere 5 and later. Enables freed blocks to be returned to the zpool for new allocation when no longer used for VM storage.

Additional “optimizations” and improvements from Nexenta in 3.1 include:

  • In-flight deduplication
  • ARC performance enhancements
  • multiple connections per session for iSCSI
  • DMU fast path for iSCSI (i.e. no extra copy)
  • Auto-sync “resume” with progress bar in GUI/NMV and ability to change source/destination paths OTF
  • Parallel tasks in NMV (i.e. no more busy process “hangs”)
  • Improved CIFS performance
  • Support for multiple DC/DC fail-over for CIFS
  • Better cross-forest trusts with CIFS
  • Configuration monitoring/reporting via “ConfGuard” plug-in
  • Multiple VIP per service for HA Cluster, fail-over of local users and elimination of separate heartbeat device
  • JBOD management for select devices from within the NMV

Given the addition of VAAI features, the upgrade offers some compelling reasons to make the move to NexentaStor 3.1 and at the same time removes obstacles to choosing NexentaStor as a VMware iSCSI platform for SMB/SME (versus the low-end EMC VNXe, which at last look was still waiting on VAAI support). However, for existing vSphere 4.1+ environments, a word of caution: you will want to “test, test, test” before upgrading to (or enabling) VAAI (fortunately, there’s a NexentaStor VSA available).
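
Part of that testing is simply watching the primitives from the host side. From an ESXi 5.x shell, for example (these are stock VMware commands and advanced options, not Nexenta-specific; shown as a sketch for verifying status and temporarily disabling an offload while you test):

esxcli storage core device vaai status get                                       # per-LUN primitive support
esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove       # current block-copy offload setting
esxcli system settings advanced set -o /DataMover/HardwareAcceleratedMove -i 0   # disable while testing (restore with -i 1)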

Auto-Sync Resume

In the past, NexentaStor’s auto-sync plug-in has been the only integrated means of block replication from one storage pool (or array) to another. The plug-in allows periodic replication events to be scheduled, each drawing from a marker snapshot until the replication is complete. Upon extended error (where the replication fails), the failure causes a roll-back to the marker point, eliminating any data that has transferred between the pools. For WAN replication, this can be costly, as no check-points are created along the way.

More problematic, there has been no way to recreate a replication service in the event it has been deleted or gone missing (i.e. the zpool was moved to a new host). This forces the replication to start over from scratch – a problem for very large datasets. With Auto-Sync 3.1, the latter problem is resolved, provided NexentaStor can find at least one pair of identical snapshots for the file system.

Where I find this new “feature” particularly helpful is in seeding replications to external storage devices (i.e. USB 2.0 arrays, JBODs, etc.). This allows a replication to external, removable storage to (1) be completed locally, (2) be shipped to a central repository, and (3) have a remote replication service created to continue the replication updates over the WAN.

Additionally, consider the case where the above local-to-WAN replication seeding takes place over the course of several months and the hardware at the central repository fails, requiring the replication pool to be moved to another NexentaStor instance. In the past, the limitation on auto-sync would have required a brand new replication set, regardless of the consistency of the replicated data on the relocated pool. Now, a new (replacement) service can be created pointing to the new destination, and auto-sync promises to find the data – intact – and resume the replication updates starting with the last identical marker snapshot.

NexentaStor Native Transport

The default transport for replication in NexentaStor 3.1 is now NexentaStor’s TCP-based Remote Replication protocol (RR). While SSH is still an option for non-NexentaStor destinations, netcat is no longer supported for auto-sync replications. While there is no published indication of the performance benefit, two tunable parameters are available for RR auto-sync services (per service): TCP connection count (-n) and TCP package size (-P). Defaults are 4 and 1024, respectively, meaning 4 connections and a 1024KB PDU size for the replication session.

Conclusions

For VMware vSphere deployments in SMB, SME and ROBO environments, NexentaStor 3.1 looks to be a good fit, offering high-performance CIFS, NFS, iSCSI and Fiber Channel options in a unified storage environment complete with VAAI support to accelerate vStorage applications. For VMware View installations using NexentaStor, the VAAI/ATS feature should resolve some iSCSI locking behavior issues that have made NFS more attractive despite NFS giving up the SCSI-based VAAI features. That said, with the storage provisioning changes in View 4.5 and the upcoming View 5, the ability to pick from FC, iSCSI or NFS (especially at 10G) within the same storage platform has definite advantages (if not complexity implications). Suffice it to say, NexentaStor’s update adds more open storage tools to the VMware virtualization architect’s bag of tricks.

NexentaStor 3.1 is available for download now.

Update, 8/12/2011:

Nexenta has found some problems with 3.1 post Q/A. They’ve released this statement on the matter:

Nexenta places the highest importance on maintaining access to and integrity of customer data. The purpose of this Technical Bulletin is to make you aware of an issue with the process of upgrading to 3.1. Nexenta has discovered an issue with the software delivery mechanism we use. This issue can result in errors during the upgrade process and some functionality not being installed properly. Please postpone upgrading to v3.1 until our next Technical Bulletin update. We are actively working to get this corrected and get it back to 100 % service as fast as possible. Until the issue is resolved we have removed 3.1 from the website and suspended upgrades. Thanks for your patience.

Nexenta Support, Aug. 6, 2011

According to sources within Nexenta, the problems appear to be related more to APT repository/distribution issues “rather than the 3.1 codebase.” All ISO and repository distribution for 3.1 has been pulled until further notice, and links to information about 3.1 on the corporate Nexenta site are no longer working…

Update, 8/17/2011:

Today, while working on a follow-up post, the lab systems (virtual storage appliances) were updated to NexentaStor 3.1.1 (both Enterprise and Community editions). Since a question was raised about the applicability of the VAAI enhancements to Community Edition (NexentaStor CE), I’ve got a teaser for you: see the following image of two identical LUNs mounted to an ESXi host from NexentaStor Enterprise Edition (NSEE) and NexentaStor Community Edition (NSCE). If you look closely, you’ll notice they BOTH show “supported” status.

vSphere VMFS5 Datastores provided by NexentaStor Community (VSA04) and Enterprise (VSA03) editions.

Update, 8/19/2011:

Nexenta officially re-released NexentaStor 3.1 today in the form of version 3.1.1 – it is available for download now.


In-the-Lab: Default Rights on CIFS Shares

December 6, 2010

Following up on the last installment on managing CIFS shares, there have been a considerable number of questions about how to establish domain user rights on the share. From these questions it is apparent that my explanation of root-level share permissions could have been clearer. To that end, I want to look at the default shares in a Windows SBS Server 2008 R2 environment and translate those settings to a working NexentaStor CIFS share deployment.

Evaluating Default Shares

In SBS Server 2008, a number of default shares are promulgated by the SBS Server. Excluding the “hidden” shares, these include:

  • Address
  • ExchangeOAB
  • NETLOGON
  • Public
  • RedirectedFolders
  • SYSVOL
  • UserShares
  • Printers

Therefore, a useful exercise in rights deployment is to recreate a couple of these shares on a NexentaStor system and detail the methodology. I have chosen the NETLOGON and SYSVOL shares, as these two represent default shares common to all Windows server environments. Here are their relative permissions:

NETLOGON

From the Windows file browser, the NETLOGON share has default permissions that look like this:

NETLOGON Share permissions

Looking at this same permission set from the command line (ICACLS.EXE), the permissions look like this:

NETLOGON permissions as reported from icacls

The key things to observe here are the use of Windows built-in users and NT AUTHORITY accounts. Also, it is noteworthy that some administrative privileges differ depending on inheritance. For instance, the Administrators’ rights are less than “Full” on the share itself, yet “Full” when inherited by sub-dirs and files, whereas SYSTEM’s permissions are “Full” in both contexts.

SYSVOL

From the Windows file browser, the SYSVOL share has default permissions that look like this:

SYSVOL network share permissions

Looking at this same permission set from the command line (ICACLS.EXE), the permissions look like this:

SYSVOL permissions from ICACLS.EXE

Note that the Administrators’ privileges are truncated (not “Full”) with respect to the inherited rights on sub-dirs and files when compared to the NETLOGON share ACL.

Create CIFS Shares in NexentaStor

On a ZFS pool, create a new folder using the Web GUI (NMV) that will represent the SYSVOL share. This will look something like the following:
Creating the SYSVOL share