Even with ZFS pools, data integrity across a power event cannot be guaranteed – especially when employing “desktop” drives and RAID controllers with RAM cache but no BBU (or perhaps a “bad storage admin” who has managed to disable the ZIL). When this happens, NexentaStor (and other ZFS storage devices) may even show all members of the ZFS pool as “ONLINE”, as if they are awaiting a proper import. However, when an import is attempted (either automatically on reboot or manually), the pool fails to import.
From the command line, the suspect pool’s status might look like this:
root@NexentaStor:~# zpool import
  pool: pool0
    id: 710683863402427473
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        pool0        ONLINE
          mirror-0   ONLINE
            c1t12d0  ONLINE
            c1t13d0  ONLINE
          mirror-1   ONLINE
            c1t14d0  ONLINE
            c1t15d0  ONLINE
root@NexentaStor:~# zpool import pool0
cannot import 'pool0': I/O error
Nope. Now this is the point where most people start to get nervous: their neck tightens up a bit and they begin to flip through a mental calendar of backup schedules and catalog of backup repositories – I know I do. However, it’s the next one that makes most administrators really nervous, when trying to “force” the import:
root@NexentaStor:~# zpool import -f pool0
  pool: pool0
    id: 710683863402427473
status: The pool metadata is corrupted and the pool cannot be opened.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
cannot import 'pool0': I/O error
In this case, something must have happened to corrupt the metadata – perhaps the non-BBU cache on the RAID device when power failed. Expensive lesson learned? Not yet. The ZFS file system still presents you with options, namely “acceptable data loss” for the period of time covered by the RAID controller’s cache. Since ZFS writes data in transaction groups, and transaction groups normally commit at 20-30 second intervals, that RAID controller’s lack of BBU puts some or all of the pending group at risk. Here’s how to tell, by testing the forced import as if data loss were allowed:
root@NexentaStor:~# zpool import -nfF pool0
Would be able to return data to its state as of Fri May 7 10:14:32 2010.
Would discard approximately 30 seconds of transactions.
root@NexentaStor:~# zpool import -nfF pool0
WARNING: can't open objset for pool0
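When even the dry run throws a warning like this, the pool’s labels and uberblocks can still be inspected read-only with zdb before anything destructive is attempted. A minimal sketch – the device path and pool name come from the example above, and exact zdb behavior can vary by release:

```shell
# Dump the four ZFS labels from one member disk (read-only); each
# label records the pool GUID, txg and vdev layout.
zdb -l /dev/rdsk/c1t12d0s0

# Display the uberblocks of the exported pool; the txg and timestamp
# indicate how far back a -F rewind would need to go.
zdb -e -u pool0
```

If zdb can read the labels but stumbles on the uberblocks, that is consistent with damage in the trailing transaction groups that -F tries to rewind past.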
What to do about that second outcome? Here is what the Sun/Oracle man page for “zpool import” has to say:
zpool import [-o mntopts] [-o property=value] … [-d dir | -c cachefile] [-D] [-f] [-R root] [-F [-n]] -a

Imports all pools found in the search directories. Identical to the previous command, except that all pools with a sufficient number of devices available are imported. Destroyed pools – pools that were previously destroyed with the “zpool destroy” command – will not be imported unless the -D option is specified.
- -o mntopts
- Comma-separated list of mount options to use when mounting datasets within the pool. See zfs(1M) for a description of dataset properties and mount options.
- -o property=value
- Sets the specified property on the imported pool. See the “Properties” section for more information on the available pool properties.
- -c cachefile
- Reads configuration from the given cachefile that was created with the “cachefile” pool property. This cachefile is used instead of searching for devices.
- -d dir
- Searches for devices or files in dir. The -d option can be specified multiple times. This option is incompatible with the -c option.
- -D
- Imports destroyed pools only. The -f option is also required.
- -f
- Forces import, even if the pool appears to be potentially active.
- -F
- Recovery mode for a non-importable pool. Attempt to return the pool to an importable state by discarding the last few transactions. Not all damaged pools can be recovered by using this option. If successful, the data from the discarded transactions is irretrievably lost. This option is ignored if the pool is importable or already imported.
- -a
- Searches for and imports all pools found.
- -R root
- Sets the “cachefile” property to “none” and the “altroot” property to “root”.
- -n
- Used with the -F recovery option. Determines whether a non-importable pool can be made importable again, but does not actually perform the pool recovery. For more details about pool recovery mode, see the -F option, above.
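Putting -f and -F together (and dropping the -n dry-run flag) gives the actual recovery attempt. A sketch of the sequence, assuming the dry run reported an acceptable window of discarded transactions:

```shell
# Attempt recovery: rewind to the last consistent transaction group,
# irretrievably discarding the trailing transactions.
zpool import -f -F pool0

# If the import succeeds, verify pool health...
zpool status -v pool0

# ...and scrub to validate every surviving block against its checksum.
zpool scrub pool0
```

A scrub afterward is cheap insurance that the data which survived the rewind still checksums cleanly.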
Some lessons to take away:

- Enterprise SAS good; desktop SATA could be a trap
- Redundant Power + UPS + Generator = Protected; Anything else = Risk
- SAS/RAID Controller + Cache + BBU = Fast; SAS/RAID Controller + Cache – BBU = Train Wreck
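As for the “bad storage admin” scenario from the opening paragraph: on OpenSolaris-era systems such as NexentaStor, a globally disabled ZIL shows up as a non-zero zil_disable kernel tunable. A sketch of how to check – the mdb syntax assumes an OpenSolaris/illumos kernel of that vintage:

```shell
# Print the current value of the zil_disable tunable;
# 0 means the ZIL is active (the safe default).
echo "zil_disable/D" | mdb -k

# It is typically set (dangerously) at boot via /etc/system:
#   set zfs:zil_disable = 1
grep zil_disable /etc/system
```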
The data integrity functions in ZFS are solid when used appropriately. When architecting your HOME/SOHO/SMB NAS appliance, pay attention to the hidden risks of “promised performance” that may walk you down the plank toward a restore-from-tape (or résumé-writing) event. Better to leave the 5-15% performance benefit on the table, or purchase adequate BBU/UPS/generator resources to sustain your system through worst-case events. In complex environments, a pending power loss can be properly mitigated through management supervisors and clever scripts, turning down resources in advance of total failure. How valuable is your data?
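Those “clever scripts” can be as simple as a UPS-daemon hook that quiesces ZFS before the batteries are exhausted. A hypothetical low-battery hook in the style of apcupsd’s shutdown events – the pool name and shutdown syntax are illustrative assumptions for a Solaris-derived host:

```shell
#!/bin/sh
# Hypothetical UPS low-battery hook: flush and export the pool while
# power remains, so the next import is clean instead of a -F recovery.

POOL="pool0"          # assumption: pool name from this article

sync                  # push dirty buffers toward stable storage
zfs unmount -a        # unmount datasets to stop new writes
zpool export "$POOL"  # close out the pool with a final, consistent txg

# Power the host down before the UPS runs dry (Solaris shutdown syntax).
/usr/sbin/shutdown -y -g 0 -i 5
```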