Archive for the ‘In-the-Lab’ Category

In-the-Lab: NexentaStor vs ESXi, Redux

February 24, 2012

In my last post, I mentioned a much-complained-about “idle” CPU utilization quirk with NexentaStor when running as a virtual machine. After reading many supposed remedies in forum postings (some referenced in the last post; none worked), I went pit-bull on the problem… and got lucky.

As an avid (er, frequent) NexentaStor user, the luster of the NMV (Nexenta’s Web GUI) has worn off. Nearly 100% of my day-to-day operations are on the command line and/or Nexenta’s CLI (dubbed NMC). This process includes power-off events (from NMC, issue “setup appliance power-off” or “setup appliance reboot”).

For me, the problem cropped up while running storage benchmarks on some virtual storage appliances for a client. These VSAs are bound to a dedicated LSI 9211-8i SAS/6G controller using VMware’s PCI pass-through (Host Configuration, Hardware, Advanced Settings). The VSA uses the LSI controller to access SAS/6G disks and SSDs in a connected JBOD – this approach allows for many permutations on storage HA and avoids physical RDMs and VMDKs. Using a JBOD allows for attachments to PCIe-equipped blades, dense rack servers, etc. and has little impact on VM CPU utilization (in theory).

So I was very surprised to find idle CPU utilization (according to ESXi’s performance charting) hovering around 50% from a fresh installation. This runs contrary to my experience with NexentaStor, but I’ve seen reports of such issues on the forums and even on my own blog. I’ve never been able to reproduce more than a 15-20% per vCPU bias between what’s reported in the VM and what ESXi/vCenter sees. I’ve always attributed this difference to vSMP and virtual hardware (disk activity) which is not seen by the OS but is handled by the VMM.

CPU record of idle and IOzone testing of SAS-attached VSA

During the testing phase, I was primarily looking at disk throughput, but I noticed a persistent CPU utilization of 50% – even when idle. Regardless, the 4 vCPU VSA appeared to perform well (about 725MB/sec 2-process throughput on initial write) despite the CPU deficit (3 vCPU test pictured above, about 600MB/sec write). However, after writing my last blog entry, the 50% CPU leech just kept bothering me.

After wasting several hours researching and tweaking with very little (positive) effect, a client e-mail prompted an NMV walk-through, which had an unexpected consequence: the act of powering off the VSA from the web GUI (NMV) resulted in significantly reduced idle CPU utilization.

Getting lucky: noticing a trend after using NMV to reboot for a client walk-through of the GUI.

Working with the 3 vCPU VSA over the previous several hours, I had consistently used the NMC (CLI) to reboot and power-off the VM. Simply using the NMV to shut down the VSA couldn’t have anything to do with idle CPU consumption, could it? Remembering that these were fresh installations, I wondered whether this was specific to a fresh installation or could also show up in an upgrade. According to the forums, this only hampered VMs, not hardware.

I grabbed a NexentaStor 3.1.0 VM out of the library (one that had been progressively upgraded from 3.0.1) and set about the upgrade process. The result was unexpected: no difference in idle CPU from the previous version; this problem was NOT specific to 3.1.2, but specific to the installation/setup process itself (at least that was the prevailing hypothesis.)

Regression into my legacy VSA library, upgrading from 3.1.1 to 3.1.2 to check if the problem follows the NexentaStor version.

If anything, the upgraded VSA exhibited slightly less idle CPU utilization than its previous version. Noteworthy, however, was the extremely high CPU utilization as the VSA sat waiting for a yes/no response (NMC/CLI) to the “would you like to reboot now” question at the end of the upgrade process (see chart above). Once “no” was selected, CPU dropped immediately to normal levels.

Now it seemed apparent that a vestige of the web-based setup process (completed by a series of “wizard” pages) must be lingering around (much like the yes/no CPU glutton.) Fortunately, I had another freshly installed VSA to test with – configured and processed exactly as the first one. I fired-up the NMV and shut down the VSA…

Confirming the impact of the "fix" on a second fresh installed NexentaStor VSA

After powering-on the VM from the vSphere Client it was obvious. This VSA had been running idle for some time, so its idle performance baseline – established previously across several reboots from the CLI – was well recorded by the ESXi host (see above.) The resulting drop in idle CPU was nothing short of astounding: the 3 vCPU configuration dropped from a 50% average idle utilization to 23%. Naturally, these findings (still anecdotal) have been forwarded on to engineers at Nexenta. Unfortunately, now I have to go back and re-run my storage benchmarks; hopefully clearing the underlying bug has reduced the needed vCPU count…

In-the-Lab: NexentaStor and VMware Tools, You Need to Tweak It…

February 24, 2012

While working on an article on complex VSAs (i.e. a virtual storage appliance with PCIe pass-through SAS controllers) an old issue came up again: NexentaStor virtual machines still have a problem installing VMware Tools since it branched from OpenSolaris and began using Illumos. While this isn’t totally Nexenta’s fault – there is no “Nexenta” OS type in VMware to choose from – it would be nice if a dummy package was present to allow a smooth installation of VMware Tools; this is even the case with the latest NexentaStor release: 3.1.2.

I could not find where I had documented the fix in SOLORI’s blog, so here it is… Note, the NexentaStor VM is configured as an Oracle Solaris 11 (64-bit) virtual machine for the purpose of vCenter/ESXi. This establishes the VM’s relationship to a specific VMware Tools load. Installation of VMware Tools in NexentaStor is covered in detail in an earlier blog entry.

VMware Tools bombs-out at SUNWuiu8 package failure. Illumos-based NexentaStor has no such package.

Instead, we need to modify the vmware-config-tools.pl script directly to compensate for the loss of the SUNWuiu8 package that is explicitly required in the installation script.

Commenting out the SUNWuiu8 related section allows the tools to install with no harm to the system or functionality.

Note that the full “if” stanza in which the VMware Tools installer checks for ‘tools-for-solaris’ must be commented out. Since the SUNWuiu8 package does not exist – and more importantly is not needed for Illumos/Nexenta – removing the reference to it is a good thing. Now the installation can proceed as normal.
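Before editing, it helps to keep a copy of the original script and find exactly where the package is referenced. The following is only a rough sketch: `find_pkg_refs` is a hypothetical helper name, and the location of `vmware-config-tools.pl` depends on where the tools archive was unpacked on your system.

```shell
#!/bin/sh
# Hypothetical helper (not part of VMware Tools): print the line numbers in a
# script that mention a given package name, so the surrounding stanza can be
# reviewed and commented out by hand.
# Usage: find_pkg_refs <script> <package>
find_pkg_refs() {
    grep -n -- "$2" "$1"
}

# Example session (file location is an assumption):
#   cp vmware-config-tools.pl vmware-config-tools.pl.orig
#   find_pkg_refs vmware-config-tools.pl SUNWuiu8
```

The helper only locates the references; deciding where the “if” stanza begins and ends is still a manual step.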

After the changes, installation completes as normal.

That’s all there is to getting the “Oracle Solaris” version of VMware Tools to work in newer NexentaStor virtual machines – now back to really fast VSA’s with JBOD-attached storage…

SOLORI’s Note: There is a long-standing bug that affects NexentaStor 3.1.x running as a virtual machine. Currently there is no known workaround to keep NexentaStor from running up a 50% CPU utilization from ESXi’s perspective. Inside the NexentaStor VM we see very little CPU utilization, but from the performance tab, we see 50% utilization on every configured vCPU allocated to the VM. Nexenta is reportedly looking into the cause of the problem.

I looked through this and there is nothing that stands out other than a huge number of interrupts while idle. I am not sure where those interrupts are coming from. I see something occasionally called volume-check and nmdtrace which could be causing the interrupts.

Nexenta Support

A bug report was reportedly filed a couple of days ago to investigate the issue further.

In-the-Lab: Tweak 2008R2 post-clone for View Transfer Server

April 4, 2011

View Transfer Server supports Server 2008 R2 but does not support the use of the “default” virtual LSI Logic SAS controller. If you’ve already carved-out a cloning template using the LSI Logic SAS template, it is not necessary to create a new template (or fresh installation) just to spool-up a Transfer Server. In fact, it will take you TWO re-boots from clone completion to LSI Logic Parallel replacement.

CAUTION: You must configure the virtual machine that hosts View Transfer Server with an LSI Logic Parallel SCSI controller. You cannot use a SAS or VMware paravirtual controller.

On Windows Server 2008 virtual machines, the LSI Logic SAS controller is selected by default. You must change this selection to an LSI Logic Parallel controller before you install the operating system.

– VMware View Upgrades (EN-000526-00), Page 13

Here’s the process to take you from completed Server 2008/R2 clone with LSI Logic SAS to LSI Logic Parallel – by-passing the Windows blue screen at boot:

  1. Clone your Server 2008/R2 server as normal,
  2. Shut down the clone and edit settings,
    1. Change Options>Advanced>Boot Options to “Force BIOS Setup” on next reboot;
    2. Hardware>Add…>Hard Disk>Create a new virtual disk>4GB, Thin Provisioning>SCSI(1:0)
    3. Hardware>SCSI Controller 1>Change Type…>LSI Logic Parallel
    4. Power-on

      Dropping-in a "dummy" LSI Logic Parallel disk to enable the drive controller for View Transfer Server.

  3. Boot the modified VM and (optionally) confirm new drive and controller
    1. Boot VM
    2. Modify boot order to ensure SAS boot priority

      Modify boot order in BIOS to ensure that the SAS controller is primary.

    3. (optional) Open Server Manager>Diagnostics>Device Manager
      1. View “Storage controllers”

        Confirming the operational status of both LSI controller types: Parallel and SAS.

    4. Shutdown
  4. Edit settings to modify boot and remove additional disk
    1. Hardware>SCSI Controller 0>Change Type…>LSI Logic Parallel
    2. Hard Disk 2>Remove>Remove from virtual machine and delete files from disk
      1. SCSI Controller 1 will automatically be removed
    3. Save and power-on
  5. Boot disk will now be LSI Logic Parallel

NOTE: In this example, the Server 2008/R2 VM is composed onto a single LSI Logic SAS disk (Hard Disk 1, SCSI controller 0). If your VM template is different, substitute your specific disk and/or controller numbers accordingly.
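If you have shell access to the datastore, the controller types can also be double-checked from the VM’s .vmx file. This is a sketch under assumptions: `list_scsi_controllers` is a hypothetical helper, the .vmx path is a placeholder, and it relies on the `scsiN.virtualDev` key, where `lsisas1068` denotes LSI Logic SAS and `lsilogic` denotes LSI Logic Parallel.

```shell
#!/bin/sh
# Hypothetical helper: print each SCSI controller in a .vmx file together with
# its virtual device type, one "controller type" pair per line.
# Usage: list_scsi_controllers <file.vmx>
list_scsi_controllers() {
    sed -n 's/^\(scsi[0-9][0-9]*\)\.virtualDev *= *"\([^"]*\)".*/\1 \2/p' "$1"
}

# Example (datastore path is a placeholder):
#   list_scsi_controllers /vmfs/volumes/<datastore>/<vm>/<vm>.vmx
# After the final step of the procedure, scsi0 should report "lsilogic".
```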

Nice, simple and now ready to install the View Transfer Server. Now on to the PCoIP Secure Gateway…

In-the-Lab: Default Rights on CIFS Shares

December 6, 2010

Following up on the last installment of managing CIFS shares, there have been a considerable number of questions as to how to establish domain user rights on the share. From these questions it is apparent that my explanation about root-level share permissions could have been clearer. To that end, I want to look at default shares from a Windows SBS Server 2008 R2 environment and translate those settings to a working NexentaStor CIFS share deployment.

Evaluating Default Shares

In SBS Server 2008, a number of default shares are promulgated from the SBS Server. Excluding the “hidden” shares, these include:

  • Address
  • ExchangeOAB
  • NETLOGON
  • Public
  • RedirectedFolders
  • SYSVOL
  • UserShares
  • Printers

Therefore, it follows that a useful exercise in rights deployment might be to recreate a couple of these shares on a NexentaStor system and detail the methodology. I have chosen the NETLOGON and SYSVOL shares as these two represent default shares common in all Windows server environments. Here are their relative permissions:

NETLOGON

From the Windows file browser, the NETLOGON share has default permissions that look like this:

NETLOGON Share permissions

Looking at this same permission set from the command line (ICACLS.EXE), the permissions look like this:

NETLOGON permissions as reported from icacls
The key to observe here is the use of Windows built-in users and NT Authority accounts. Also, it is noteworthy that some administrative privileges are different depending on inheritance. For instance, the Administrator’s rights are less than “Full” permissions on the share, however they are “Full” when inherited to sub-dirs and files, whereas SYSTEM’s permissions are “Full” in both contexts.

SYSVOL

From the Windows file browser, the SYSVOL share has default permissions that look like this:

SYSVOL network share permissions

Looking at this same permission set from the command line (ICACLS.EXE), the permissions look like this:

SYSVOL permissions from ICACLS.EXE
Note that the Administrators’ privileges are truncated (not “Full”) with respect to the inherited rights on sub-dirs and files when compared to the NETLOGON share ACL.

Create CIFS Shares in NexentaStor

On a ZFS pool, create a new folder using the Web GUI (NMV) that will represent the SYSVOL share. This will look something like the following:
Creating the SYSVOL share

In-the-Lab: NexentaStor vs. Grub

November 16, 2010

In this In-the-Lab segment we’re going to look at how to recover from a failed ZFS version update in case you’ve become ambitious with your NexentaStor installation after the last Short-Take on ZFS/ZPOOL versions. If you used the “root shell” to make those changes, chances are your grub is failing after reboot. If so, this blog can help, but before you read on, observe this necessary disclaimer:

NexentaStor is an appliance operating system, not a general purpose one. The accepted way to manage the system volume is through the NMC shell and NMV web interface. Using a “root shell” to configure the file system(s) is unsupported and may void your support agreement(s) and/or license(s).

That said, let’s assume that you updated the syspool filesystem and zpool to the latest versions using the “root shell” instead of the NMC (i.e. following a system update where zfs and zpool warnings declare that your pool and filesystems are too old, etc.) In such a case, the resulting syspool will not be bootable until you update grub (this happens automagically when you use the NMC commands.) When this happens, you’re greeted with the following boot prompt:

grub>

Grub is now telling you that it has no idea how to boot your NexentaStor OS. Chances are there are two things that will need to happen before your system boots again:

  1. Your boot archive will need updating, pointing to the latest checkpoint;
  2. Your master boot record (MBR) will need to have grub installed again.

We’ll update both in the same recovery session to save time (this assumes you know or have a rough idea about your intended boot checkpoint – it is usually the highest numbered rootfs-nmu-NNN checkpoint, where NNN is a three digit number.) The first step is to load the recovery console. This could have been done from the “Safe Mode” boot menu option if grub was still active. However, since grub is blown-away, we’ll boot from the latest NexentaStor CD and select the recovery option from the menu.

Import the syspool

Then, we login as “root” (empty password.) From this “root shell” we can import the existing (disks connected to active controllers) syspool with the following command:

# zpool import -f syspool

Note the use of the “-f” flag to force the import of the pool. Chances are, the pool will not have been “destroyed” or “exported” so zpool will “think” the pool belongs to another system (your boot system, not the rescue system). As a precaution, zpool assumes that the pool is still “in use” by the “other system” and the import is rejected to avoid “importing an imported pool” which would be completely catastrophic.

With the syspool imported, we need to mount the correct (latest) checkpointed filesystem as our boot reference for grub, destroy the local zpool.cache file (in case the pool disks have been moved, but are all still present), update the boot archive to correspond to the mounted checkpoint and install grub to the disk(s) in the pool (i.e. each mirror member).

List the Checkpoints

# zfs list -r syspool

From the resulting list, we’ll pick our highest-numbered checkpoint; for the sake of this article let’s say it’s “rootfs-nmu-013” and mount it.
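If the list is long, picking the highest-numbered checkpoint can be scripted. A minimal sketch, assuming the checkpoints follow the rootfs-nmu-NNN naming described above and that the pool name contains no hyphen; `latest_checkpoint` is a hypothetical helper name, not a NexentaStor command.

```shell
#!/bin/sh
# Hypothetical helper: read dataset names on stdin and print the
# rootfs-nmu-NNN entry with the highest number.
latest_checkpoint() {
    # Keep only checkpoint datasets, sort numerically on the NNN field
    # (third field when splitting on "-"), and take the last one.
    grep 'rootfs-nmu-[0-9][0-9]*$' | sort -t- -k3,3n | tail -1
}

# Example from the recovery shell:
#   zfs list -H -o name -r syspool | latest_checkpoint
```

The highest number is only “usually” the right checkpoint, so it is worth eyeballing the full list before mounting anything.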

Mount the Checkpoint

# mkdir /tmp/syspool
# mount -F zfs syspool/rootfs-nmu-013 /tmp/syspool

Remove the ZPool Cache File

# cd /tmp/syspool/etc/zfs
# rm -f zpool.cache

Update the Boot Archive

# bootadm update-archive -R /tmp/syspool

Determine the Active Disks

# zpool status syspool

For the sake of this article, let’s say the syspool was a three-way mirror and the zpool status returned the following:

  pool: syspool
 state: ONLINE
  scan: resilvered 8.64M in 0h0m with 0 errors on Tue Nov 16 12:34:40 2010
config:
        NAME           STATE     READ WRITE CKSUM
        syspool        ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            c6t13d0s0  ONLINE       0     0     0
            c6t14d0s0  ONLINE       0     0     0
            c6t15d0s0  ONLINE       0     0     0

errors: No known data errors

This enumerates the three disk mirror as being composed of disks/slices c6t13d0s0, c6t14d0s0 and c6t15d0s0. We’ll use that information for the grub installation.

Install Grub to Each Mirror Disk

# cd /tmp/syspool/boot/grub
# installgrub -f -m stage[12] /dev/rdsk/c6t13d0s0
# installgrub -f -m stage[12] /dev/rdsk/c6t14d0s0
# installgrub -f -m stage[12] /dev/rdsk/c6t15d0s0
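For larger mirrors, the member slices can be pulled out of the zpool status output instead of being typed by hand. This is a sketch, assuming device names of the cXtYdZsN form shown above and an awk that understands “+” in regular expressions (nawk or gawk); `pool_slices` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print the
# cXtYdZsN slice names it contains, one per line.
pool_slices() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ { print $1 }'
}

# Example from the recovery shell, run inside /tmp/syspool/boot/grub:
#   zpool status syspool | pool_slices | while read -r slice; do
#       installgrub -f -m stage1 stage2 "/dev/rdsk/$slice"
#   done
```

The loop runs the same installgrub invocation as the three commands above, once per mirror member.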

Unmount and Reboot

# umount /tmp/syspool
# sync
# reboot

Now, the system should be restored to a bootable configuration based on the selected system checkpoint. A similar procedure can be found on Nexenta’s site when using the “Safe Mode” boot option. If you follow that process, you’ll quickly encounter an error – likely intentional and meant to elicit a call to support for help. See if you can spot the step…

In-the-Lab: Windows Server 2008 R2 Template for VMware

September 30, 2010

As it turns out, the reasonably simple act of cloning a Windows Server 2008 R2 (insert edition here) has been complicated by the number of editions, changes from the 2008 release through 2008 R2, as well as user profile management changes since its release. If you’re like me, you like to tweak your templates to limit customization steps in post-deployment. While most of these customizations can now be set up in group policies from AD, the deployment of non-AD members has become a lot more difficult – especially where custom defaults are needed or required.

Here’s my quick recipe to build a custom image of Windows Server 2008 R2 that has been tested with Standard, Enterprise and Foundation editions.

Create VM, use VMXNET3 as NIC(s), 40GB “thin” disk, using 2008 R2 Wizard

This is a somewhat “mix to taste” step. We use ISO images and encourage their use. The size of the OS volume will end-up being somewhere around 8GB of actual space-on-disk after this step, making 40GB sound like overkill. However, the OS volume will bloat-up to 18-20GB pretty quick after updates, roles and feature additions. Adding application(s) will quickly chew-up the rest.

  • Edit Settings… ->
    • Options -> Advanced -> General -> Uncheck “Enable logging”
    • Hardware -> CD/DVD Drive 1 ->
      • Click “Datastore ISO File”
        • Browse to Windows 2008 R2 ISO image
      • Check “Connect at power on”
    • Options -> Advanced -> Boot Options -> Force BIOS Setup
      • Check “The next time the virtual machine boots, force entry into the BIOS setup screen”
  • Power on VM
  • Install Windows Server 2008 R2

Use Custom VMware Tools installation to disable “Shared Folders” feature:

It is important that VMware Tools be installed next, if for no other reason than to make the rest of the process quicker and easier. The additional step of disabling “Shared Folders” is for ESX/vSphere environments where shared folders are not supported. Since this option is installed by default, it can/should be removed in vSphere installations.

  • VM -> Guest -> Install VMware Tools ->
    • Custom -> VMware Device Drivers -> Disable “Shared Folder” feature
  • Restart

Complete Initial Configuration Tasks:

Once the initial installation is complete, we need to complete the 2008 R2 basic configuration. If you are working in an AD environment, this is not the time to join the template to the domain as GPO conflicts may hinder manual template defaults. We’ve chosen a minimal package installation based on our typical deployment profile. Some features/roles may differ in your organization’s template (mix to taste).

  • Set time zone -> Date and Time ->
    • Internet Time -> Change Settings… -> Set to local time source
    • Date and Time -> Change time zone… -> Set to local time zone
  • Provide computer name and domain -> Computer name ->
    • Enterprise Edition: W2K8R2ENT-TMPL
    • Standard Edition: W2K8R2STD-TMPL
    • Foundation Edition: W2K8R2FND-TMPL
    • Note: Don’t join to a domain just yet…
  • Restart Later
  • Configure Networking
    • Disable QoS Packet Scheduler
  • Enable automatic updating and feedback
    • Manually configure settings
      • Windows automatic updating -> Change Setting… ->
        • Important updates -> “check for updates but let me choose whether to download and install them”
        • Recommended updates -> Check “Give me recommended updates the same way I receive important updates”
        • Who can install updates -> Uncheck “Allow all users to install updates on this computer”
      • Windows Error Reporting -> Change Setting… ->
        • Select “I don’t want to participate, and don’t ask me again”
      • Customer Experience Improvement Program -> Change Setting… ->
        • Select “No, I don’t want to participate”
  • Download and install updates
    • Bring to current (may require several reboots)
  • Add features (to taste)
    • .NET Framework 3.5.1 Features
      • Check WCF Activation, Non-HTTP Activation
        • Pop-up: Click “Add Required Features”
    • SNMP Services
    • Telnet Client
    • TFTP Client
    • Windows PowerShell Integrated Scripting Environment (ISE)
  • Check for updates after new features
    • Install available updates
  • Enable Remote Desktop
    • System Properties -> Remote
      • Windows 2003 AD
        • Select “Allow connections from computers running any version of Remote Desktop”
      • Windows 2008 AD (optional)
        • Select “Allow connections only from computers running Remote Desktop with Network Level Authentication”
  • Windows Firewall
    • Turn Windows Firewall on or off
      • Home or work location settings
        • Turn off Windows Firewall
      • Public network location settings
        • Turn off Windows Firewall
  • Complete Initial Configuration Tasks
    • Check “Do not show this window at logon” and close

Modify and Silence Server Manager

(Optional) Parts of this step may violate your local security policies, however, it’s more than likely that a GPO will ultimately override this configuration. We find it useful to have this disabled for “general purpose” templates – especially in a testing/lab environment where the security measures will be defeated as a matter of practice.

  • Security Information -> Configure IE ESC
    • Select Administrators Off
    • Select Users Off
  • Select “Do not show me this console at logon” and close

Modify Taskbar Properties

Making the taskbar usable for your organization is another matter of taste. We like smaller icons and maximizing desktop utility. We also hate being nagged by the notification area…

  • Right-click Taskbar -> Taskbar and Start Menu Properties ->
    • Taskbar -> Check “Use small icons”
    • Taskbar -> Customize… ->
      • Set all icons to “Only show notifications”
      • Click “Turn system icons on or off”
        • Turn off “Volume”
    • Start Menu -> Customize…
      • Uncheck “Use large icons”

Modify default settings in Control Panel

Some Control Panel changes will help “optimize” the performance of the VM by disabling unnecessary features like screen saver and power management. We like to see our corporate logo on server desktops (regardless of performance implications) so now’s the time to make that change as well.

  • Control Panel -> Power Options -> High Performance
    • Change plan settings -> Turn off the display -> Never
  • Control Panel -> Sound ->
    • Pop-up: “Would you like to enable the Windows Audio Service?” – No
    • Sound -> Sounds -> Sound Scheme: No Sounds
    • Uncheck “Play Windows Startup sound”
  • Control Panel -> VMware Tools -> Uncheck “Show VMware Tools in the taskbar”
  • Control Panel -> Display -> Change screen saver -> Screen Saver -> Blank, Wait 10 minutes
  • Change default desktop image (optional)
    • Copy your desktop logo background to a public folder (i.e. “c:\Users\Public\Public Pictures”)
    • Control Panel -> Display -> Change desktop background -> Browse…
    • Find picture in browser, Picture position stretch

Disable Swap File

Disabling swap will allow the defragment step to be more efficient and will disable VMware’s advanced memory management functions. This is only temporary and we’ll be enabling swap right before committing the VM to template.

  • Computer Properties -> Visual Effects -> Adjust for best performance
  • Computer Properties -> Advanced System Settings ->
    • System Properties -> Advanced -> Performance -> Settings… ->
    • Performance Options -> Advanced -> Change…
      • Uncheck “Automatically manage paging file size for all drives”
      • Select “No paging file”
      • Click “Set” to disable swap file

Remove hibernation file and set boot timeout

It has been pointed out that the hibernation and timeout settings will get re-enabled by the sysprep operation. Removing the hibernation files now will help the defragment step. We’ll reinforce these steps in the customization wizard later.

  • cmd: powercfg -h off
  • cmd: bcdedit /timeout 5

Disable indexing on C:

Indexing the OS disk can suck performance and increase disk I/O unnecessarily. Chances are, this template (when cloned) will be heavily cached on your disk array so indexing in the OS will not likely benefit the template. We prefer to disable this feature as a matter of practice.

  • C: -> Properties -> General ->
    • Uncheck “Allow files on this drive to have contents indexed in addition to file properties”
    • Apply -> Apply changes to C:\ only (or files and folders, to taste)

Housekeeping

Time to clean-up and prepare for a streamlined template. The first step is intended to aid the copying of “administrator defaults” to “user defaults.” If this does not apply, just defragment.

Remove “Default” user settings:

  • C:\Users -> Folder Options -> View -> Show hidden files…
  • C:\Users\Default -> Delete “NTUser.*” Delete “Music, Pictures, Saved Games, Videos”

Defragment

  • C: -> Properties -> Tools -> Defragment Now…
    • Select “(C:)”
    • Click “Defragment disk”

Copy Administrator settings to “Default” user

The “formal” way of handling this step requires a third-party utility. We’re giving credit to Jason Samuel for consolidating other bloggers’ methods because he was the first to point out the importance of the “unattend.xml” file and it really saved us some time. His blog post also includes a link to an example “unattend.xml” file that can be modified for your specific use, as we have.

  • Jason Samuel points out a way to “easily” copy Administrator settings to defaults, by activating the CopyProfile node in an “unattend.xml” file used by sysprep.
  • Copy your “unattend.xml” file to C:\windows\system32\sysprep
  • Edit unattend.xml for environment and R2 version
    • Update offline image pointer to correspond to your virtual CD
      • E.g. wim:d:… -> wim:f:…
    • Update OS offline image source pointer, valid sources are:
      • Windows Server 2008 R2 SERVERDATACENTER
      • Windows Server 2008 R2 SERVERDATACENTERCORE
      • Windows Server 2008 R2 SERVERENTERPRISE
      • Windows Server 2008 R2 SERVERENTERPRISECORE
      • Windows Server 2008 R2 SERVERSTANDARD
      • Windows Server 2008 R2 SERVERSTANDARDCORE
      • Windows Server 2008 R2 SERVERWEB
      • Windows Server 2008 R2 SERVERWEBCORE
      • Windows Server 2008 R2 SERVERWINFOUNDATION
    • Any additional changes necessary
  • NOTE: now would be a good time to snapshot/backup the VM
  • cmd: cd \windows\system32\sysprep
  • cmd: sysprep /generalize /oobe /reboot /unattend:unattend.xml
    • Check “Generalize”
    • Shutdown Options -> Reboot
  • Login
  • Skip Activation
  • Administrator defaults are now system defaults
  • Reset Template Name
    • Computer Properties -> Advanced System Settings -> Computer name -> Change…
      • Enterprise Edition: W2K8R2ENT-TMPL
      • Standard Edition: W2K8R2STD-TMPL
      • Foundation Edition: W2K8R2FND-TMPL
    • If this will be an AD member clone, join template to the domain now
    • Restart
  • Enable Swap files
    • Computer Properties -> Advanced System Settings ->
      • System Properties -> Advanced -> Performance -> Settings… ->
      • Performance Options -> Advanced -> Change…
        • Check “Automatically manage paging file size for all drives”
  • Release IP
    • cmd: ipconfig /release
  • Shutdown
  • Convert VM to template

Convert VM Template to Clone

Use the VMware Customization Wizard to create a re-usable script for cloning the template. Now’s a good time to test that your template will create a usable clone. If it fails, go check the “red letter” items and make sure your setup is correct. The following hints will help improve your results.

  • Remove hibernation related files and reset boot delay to 5 seconds in Customization Wizard
  • Remember that the ISO is still mounted by default. Once VMs are deployed from the template, it should be removed after the customization process is complete and additional roles/features are added.

That’s the process we have working at SOLORI. It’s not rocket science, but if you miss an important step you’re likely to be visited by an error in “pass [specialize]” that will have you starting over. Note: this also happens when your AD credentials are bad, your license key is incorrect (version/edition mismatch, typo, etc.) or other nondescript issues – too bad the error code is unhelpful…

VMware Management Assistant Panics on Magny Cours

August 11, 2010

VMware’s current version of its vSphere Management Assistant – also known as vMA (pronounced “vee mah”) – will crash when run on an ESX host using AMD Magny Cours processors. This behavior was discovered recently when installing the vMA on an AMD Opteron 6100 system (aka. Magny Cours) causing a “kernel panic” on boot after deploying the OVF template. Something of note is the crash also results in 100% vCPU utilization until the VM is either powered-off or reset:

vMA Kernel Panic on Import

As it turns out, no manner of tweaks to the virtual machine’s virtualization settings nor OS boot/grub settings (i.e. noapic, etc.) seem to cure the ills for vMA. However, we did discover that the OVF deployed appliance was configured as a VMware Virtual Machine Hardware Version 4 machine:

vMA 4.1 defaults to Virtual Machine Hardware Version 4

Since our lab vMA deployments have all been upgraded to Virtual Machine Hardware Version 7 for some time (and for functional benefits as well), we updated the vMA to Version 7 and tried again:

Upgrade vMA Virtual Machine Version...

This time, with Virtual Hardware Version 7 (and no other changes to the VM), the vMA boots as it should:

vMA Booting after Upgrade to Virtual Hardware Version 7

Since the Magny Cours CPU is essentially a pair of tweaked 6-core Opteron CPUs in a single package, we took the vMA into the lab and deployed it to an ESX server running on AMD 2435 6-core CPUs: the vMA booted as expected, even with Virtual Hardware Version 4. A quick check of the community and support boards shows a few issues with older RedHat/CentOS kernels (like vMA’s) but no reports of kernel panic with Magny Cours. Perhaps there are just not that many AMD Opteron 6100 deployments out there with vMA yet…

h1

ZFS Pool Import Fails After Power Outage

July 15, 2010
The early summer storms have taken their toll on Alabama, and UPS failures (and short-falls) have been popping up all over. Add consolidated, shared storage to the equation and you have a recipe for potential data loss – at least this is what we’ve been seeing recently. JBODs with separate power rails, limited UPS run-time and/or no generator backup only raise the odds.

Even with ZFS pools, data integrity in a power event cannot be guaranteed – especially when employing “desktop” drives and RAID controllers with RAM cache and no BBU (or perhaps a “bad storage admin” that has managed to disable the ZIL). When this happens, NexentaStor (and other ZFS storage devices) may even show all members in the ZFS pool as “ONLINE” as if they are awaiting proper import. However, when an import is attempted (either automatically on reboot or manually) the pool fails to import.

From the command line, the suspect pool’s status might look like this:

root@NexentaStor:~# zpool import
pool: pool0
id: 710683863402427473
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
        pool0        ONLINE
          mirror-0   ONLINE
            c1t12d0  ONLINE
            c1t13d0  ONLINE
          mirror-1   ONLINE
            c1t14d0  ONLINE
            c1t15d0  ONLINE
Looks good, but the import may fail like this:
root@NexentaStor:~# zpool import pool0
cannot import 'pool0': I/O error
Not good. This probably indicates that something is not right with the array. Let’s try to force the import and see what happens:

Nope. Now this is the point where most people start to get nervous, their neck tightens-up a bit and they begin to flip through a mental calendar of backup schedules and catalog backup repositories – I know I do. However, it’s the next one that makes most administrators really nervous when trying to “force” the import:

root@NexentaStor:~# zpool import -f pool0
pool: pool0
id: 710683863402427473
status: The pool metadata is corrupted and the pool cannot be opened.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
cannot import 'pool0': I/O error
Really not good. Did it really suggest going to backup? Ouch!

In this case, something must have happened to corrupt metadata – perhaps the non-BBU cache on the RAID device when power failed. Expensive lesson learned? Not yet. The ZFS file system still presents you with options, namely “acceptable data loss” for the period of time accounted for in the RAID controller’s cache. Since ZFS writes data in transaction groups and transaction groups normally commit in 20-30 second intervals, that RAID controller’s lack of BBU puts some or all of that pending group at risk. Here’s how to tell, by testing the forced import as if data loss were allowed:

root@NexentaStor:~# zpool import -nfF pool0
Would be able to return data to its state as of Fri May 7 10:14:32 2010.
Would discard approximately 30 seconds of transactions.
or
root@NexentaStor:~# zpool import -nfF pool0
WARNING: can't open objset for pool0
If the first output is acceptable, then proceeding without the “n” option will produce the desired effect by “rewinding” (read: ignoring) the last couple of transaction groups and importing the “truncated” pool. The dry-run import reports the approximate number of seconds’ worth of data that cannot be restored. Depending on the bandwidth and utilization of your system, this could be very little data or several MB worth of transaction(s).
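The decision between the two dry-run outcomes can be scripted. The following is a minimal sketch that classifies the output of “zpool import -nfF” based only on the two sample messages shown above; the function name `classify_dry_run` and the labels it prints are our own (hypothetical), and other ZFS releases may word these messages differently.

```shell
# Classify `zpool import -nfF <pool>` output read from stdin.
# Prints "rewind <seconds>" when a rewind is offered, "no-rewind" otherwise.
# The matched strings come from the sample outputs in this post only.
classify_dry_run() {
    out=$(cat)
    case "$out" in
        *"Would be able to return"*)
            # Pull the number out of "Would discard approximately N seconds ..."
            secs=$(printf '%s\n' "$out" |
                sed -n 's/.*discard approximately \([0-9][0-9]*\) seconds.*/\1/p')
            echo "rewind ${secs:-unknown}"
            ;;
        *)
            echo "no-rewind"
            ;;
    esac
}

# Usage:
#   zpool import -nfF pool0 2>&1 | classify_dry_run
```

If the result is a rewind of an acceptable length, the real recovery is the same command without the “n”: “zpool import -fF pool0”.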

What to do about the second option? From the man pages on “zpool import” Sun/Oracle says the following:

zpool import [-o mntopts] [-o property=value] … [-d dir | -c cachefile] [-D] [-f] [-R root] [-F [-n]] -a
Imports all pools found in the search directories. Identical to the previous command, except that all pools with a sufficient number of devices available are imported. Destroyed pools, pools that were previously destroyed with the “zpool destroy” command, will not be imported unless the -D option is specified.

-o mntopts
Comma-separated list of mount options to use when mounting datasets within the pool. See zfs(1M) for a description of dataset properties and mount options.

-o property=value
Sets the specified property on the imported pool. See the “Properties” section for more information on the available pool properties.

-c cachefile
Reads configuration from the given cachefile that was created with the “cachefile” pool property. This cachefile is used instead of searching for devices.

-d dir
Searches for devices or files in dir. The -d option can be specified multiple times. This option is incompatible with the -c option.

-D
Imports destroyed pools only. The -f option is also required.

-f
Forces import, even if the pool appears to be potentially active.

-F
Recovery mode for a non-importable pool. Attempt to return the pool to an importable state by discarding the last few transactions. Not all damaged pools can be recovered by using this option. If successful, the data from the discarded transactions is irretrievably lost. This option is ignored if the pool is importable or already imported.

-a
Searches for and imports all pools found.

-R root
Sets the “cachefile” property to “none” and the “altroot” property to “root”.

-n
Used with the -F recovery option. Determines whether a non-importable pool can be made importable again, but does not actually perform the pool recovery. For more details about pool recovery mode, see the -F option, above.

No real help here. What the documentation omits is the “-X” option. This option is only valid with the “-F” recovery mode setting; however, it is NOT well documented. Suffice it to say it is the last resort before acquiescing to real problem solving… Assuming the standard recovery mode “depth” of transaction replay is not quite enough to get you over the hump, the “-X” option gives you an “extended replay” by seemingly providing a scrub-like search through the transaction groups (read: “potentially time consuming”) until it arrives at the last reliable transaction group in the dataset.
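Put together, the import attempts in this post form an escalation ladder, from least to most invasive. The sketch below only prints that ladder for a given pool so it can be reviewed before anything is run; `recovery_ladder` is a hypothetical helper name, and the “-X” step reflects the behavior described above rather than documented semantics, so treat it as an assumption for your ZFS release.

```shell
# Print the import attempts discussed in this post, in escalating order.
# Nothing is executed; review each step before running it by hand.
recovery_ladder() {
    pool="$1"
    echo "zpool import $pool"       # plain import
    echo "zpool import -f $pool"    # forced import
    echo "zpool import -nfF $pool"  # dry run: what would a rewind discard?
    echo "zpool import -fF $pool"   # rewind the last few transaction groups
    echo "zpool import -fFX $pool"  # extended rewind, last resort, can be slow
}

# Usage:
#   recovery_ladder pool0
```

Each step past the dry run discards data that cannot be brought back, so stop at the first one that succeeds.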
Lessons to be learned from this excursion into pool recovery are as follows:
  1. Enterprise SAS good; desktop SATA could be a trap
  2. Redundant Power + UPS + Generator = Protected; Anything else = Risk
  3. SAS/RAID Controller + Cache + BBU = Fast; SAS/RAID Controller + Cache – BBU = Train Wreck

The data integrity functions in ZFS are solid when used appropriately. When architecting your HOME/SOHO/SMB NAS appliance, pay attention to the hidden risks of “promised performance” that may walk you down the plank towards a tape backup (or resume writing) event. Better to leave the 5-15% performance benefit on the table or purchase adequate BBU/UPS/Generator resources to support your system in worst-case events. In complex environments, a pending power loss can be properly mitigated through management supervisors and clever scripts: turning down resources in advance of total failure. How valuable is your data???

h1

In-the-Lab: Install VMware Tools on NexentaStor VSA

June 17, 2010

Physical lab resources can be a challenge to “free-up” just to test a potential storage appliance. With NexentaStor, you can download a pre-configured VMware (or Xen) appliance from NexentaStor.Org, but what if you want to build your own? Here’s a little help on the subject:

  1. Download the ISO from NexentaStor.Org (see link above);
  2. Create a VMware virtual machine:
    1. 2 vCPU
    2. 4GB RAM (leaves about 3GB for ARC);
    3. CD-ROM (mapped to the ISO image);
    4. One (optionally two if you want to simulate the OS mirror) 4GB, thin provisioned SCSI disks (LSI Logic Parallel);
    5. Guest Operating System type: Sun Solaris 10 (64-bit)
    6. One E1000 for Management/NAS
    7. (optional) One E1000 for iSCSI
  3. Streamline the guest by disabling unnecessary components:
    1. floppy disk
    2. floppy controller (remove from BIOS)
    3. primary IDE controller (remove from BIOS)
    4. COM ports (remove from BIOS)
    5. Parallel ports (remove from BIOS)
  4. Boot to ISO and install NexentaStor CE
    1. (optionally) choose second disk as OS mirror during install
  5. Register your installation with Nexenta
    1. http://www.nexenta.com/register-eval
    2. (optional) Select “Solori” as the partner
  6. Complete initial WebGUI configuration wizard
    1. If you will join it to a domain, use the domain FQDN (e.g. microsoft.msft)
    2. If you choose “Optimize I/O performance…” remember to re-enable ZFS intent logging under Settings>Preferences>System
      1. Sys_zil_disable = No
  7. Shutdown the VSA
    1. Settings>Appliance>PowerOff
  8. Re-direct the CD-ROM
    1. Connect to Client Device
  9. Power-on the VSA and install VMware Tools
    1. login as admin
      1. assume root shell with “su” and root password
    2. From vSphere Client, initiate the VMware Tools install
    3. cd /tmp
      1. untar VMware Tools with “tar zxvf /media/VMware\ Tools/vmware-solaris-tools.tar.gz”
    4. cd to /tmp/vmware-tools-distrib
      1. install VMware Tools with “./vmware-install.pl”
      2. Answer with defaults during install
    5. Check that VMware Tools shows an OK status
      1. IP address(es) of interfaces should now be registered

        VMware Tools are registered.

  10. Perform a test “Shutdown” of your VSA
    1. From the vSphere Client, issue VM>Power>Shutdown Guest

      System shutting down from VMware Tools request.

    2. Restart the VSA…

      VSA restarting in vSphere
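The unpack-and-install portion of step 9 above can be sketched as a small shell function, run from the root shell after the vSphere Client has mounted the VMware Tools media. The function name `install_vmware_tools` is ours (hypothetical); the default tarball path is the one used in step 9, and the `gzip | tar` pipeline is simply a portable stand-in for “tar zxvf”.

```shell
# Unpack the VMware Tools tarball and launch its installer.
# $1: absolute path to the tools tarball (defaults to the path from step 9)
# $2: working directory to unpack into (defaults to /tmp)
# The body runs in a subshell so the caller's working directory is untouched.
install_vmware_tools() (
    tarball="${1:-/media/VMware Tools/vmware-solaris-tools.tar.gz}"
    workdir="${2:-/tmp}"
    [ -r "$tarball" ] || { echo "tools tarball not found: $tarball" >&2; exit 1; }
    cd "$workdir" || exit 1
    gzip -dc "$tarball" | tar xf - || exit 1
    cd vmware-tools-distrib || exit 1
    # Interactive installer: answer with the defaults, as in step 9.
    ./vmware-install.pl
)

# Usage (as root on the VSA):
#   install_vmware_tools
```

If the function reports that the tarball is missing, the Tools install has probably not been initiated from the vSphere Client yet.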

Now VMware Tools has been installed and you’re ready to add more virtual disks and build ZFS storage pools. If you get a warning about HGFS not loading properly at boot time:

HGFS module mismatch warning.

it is not usually a big deal, but the VMware Host-Guest File System (HGFS) has been known to cause issues in some installations. Since the NexentaStor appliance is not a general purpose operating system, you should customize the install to not use HGFS at all. To disable it, perform the following:

  1. Edit “/kernel/drv/vmhgfs.conf”
    1. Change:     name="vmhgfs" parent="pseudo" instance=0;
    2. To:     #name="vmhgfs" parent="pseudo" instance=0;
  2. Re-boot the VSA

Upon reboot, there will be no complaint about the offending HGFS module. Remember that, after updating VMware Tools at a future date, the HGFS configuration file will need to be adjusted again. By the way, this process works just as well on the NexentaStor Commercial edition, however you might want to check with technical support prior to making such changes to a licensed/supported deployment.
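Because the edit has to be repeated after every VMware Tools update, it may be worth scripting. Here is a minimal sketch that comments out the vmhgfs entry and keeps a backup copy; the function name `disable_hgfs` and the `.bak` suffix are our own choices (hypothetical), and the file path is the one given above.

```shell
# Comment out the vmhgfs driver entry in the given driver .conf file.
# A copy of the previous file is kept alongside it as <file>.bak.
# Lines that are already commented out are left alone.
disable_hgfs() {
    conf="$1"
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    cp -p "$conf" "$conf.bak" || return 1
    # Prefix any active 'name="vmhgfs" ...' line with '#'.
    sed 's/^[[:space:]]*name="vmhgfs"/#&/' "$conf.bak" > "$conf"
}

# Usage (as root on the VSA), followed by a reboot:
#   disable_hgfs /kernel/drv/vmhgfs.conf
```

Note that a second run overwrites the backup with the already-edited file, so copy the original elsewhere first if you want to keep it.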