SME Stack V0.1, Part 3 – Storage Solutions

January 2, 2009

If storage is the key, shared storage is the key that opens all locks. In the early days of file servers, shared storage meant a common file store presented to users over the network infrastructure. This storage was commonly found in a DAS array – usually RAID1 or RAID5 – and managed by a general-purpose server operating system (like Windows or NetWare). Eventually, such storage paradigms adopted clustering technologies for high availability, but the underlying principles remained much the same: an extrapolation of a general-purpose implementation.

Today, shared storage means something completely different. The need for the “file servers” of old has not disappeared, but the place where stored data ultimately resides has moved from DAS to the network. Network attached storage – in the form of filers and network block devices – is replacing DAS as companies retire legacy systems and expand their data storage and business continuity horizons. Today, commercial and open source software options abound that provide stability, performance, scalability, redundancy and feature sets that can deliver increased functionality and accelerated ROI to their adopters.

How can the increased acquisition cost of a filer or SAN be justified in the small enterprise? Aside from the benefits touched on in Part 2, the justification begins with the cost of continuing to do things in the legacy way. Noting that a filer and/or SAN’s real job is to abstract DAS to a network entity, the cost of NAS/SAN is in the software used to implement it along with the feature licenses necessary to fit your company’s needs. We are all familiar with the redundancy methods used in DAS in the form of RAID1 (N+N), RAID5 (N+1) or RAID6 (N+2) and the use of hot and stand-by spares – usually one per array – that fix the cost of storage on a “per gigabyte” basis. With NAS/SAN, this cost is equal to or greater than that of DAS simply because the basis for storage is the same: disk arrays.

Features and redundancy are how NAS/SAN costs tend to climb. Volume snapshots, management utilities, tiering capabilities, network mirroring and error checking should be built in to any NAS/SAN implementation being considered today. Additionally, backup and recovery, deduplication and high availability must also be considered “basic features,” although many vendors charge extra for them. Before even considering “performance enhancing” architectural choices, each of these features comes with an intrinsic cost in terms of additional storage requirements. For instance, snapshots require an increase in available raw storage proportional to the rate of storage growth and the number of active snapshots, while mirroring implies a doubling of raw storage costs. As a rule of thumb, snapshots accelerate storage growth by 15-30% over the data creation rate. That is a shift of 15-30% on the “Storage and Growth Rate” chart from Part 2 which, in turn, implies a 50% growth rate for an organization of almost any size.

Let us examine the concept of raw storage versus effective storage. Raw storage is almost universally understood: the sum total of all storage elements in their unformatted, uncommitted state. For instance, if you start with 12 individual 500GB hard disks then you have roughly 6TB of “raw storage” on hand (12 x 500GB = 6,000GB or 6TB). Even assuming the formatting cost was zero, if these drives were apportioned as a single RAID6 array the effective storage would be reduced to 5TB as implemented (RAID6, (12-2) x 500GB = 5,000GB or 5TB). Using that same raw storage as a network mirror of RAID5 arrays, the effective storage is reduced to 2.5TB (RAID5 mirrored, (12/2-1) x 500GB = 2,500GB or 2.5TB). This basic and typical deployment model results in a storage efficiency factor of only 42% when compared to raw storage. This fact has similar ramifications for your storage budget.
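The arithmetic above can be reproduced with a short Python sketch. The function names are illustrative only, and the model assumes zero formatting overhead and no hot spares, as in the example.

```python
def raid_effective_gb(disks: int, disk_gb: float, parity_disks: int) -> float:
    """Usable capacity of one parity RAID array, ignoring formatting overhead."""
    if disks <= parity_disks:
        raise ValueError("array needs more disks than parity disks")
    return (disks - parity_disks) * disk_gb


def mirrored_effective_gb(disks: int, disk_gb: float, parity_disks: int) -> float:
    """Usable capacity when the disks are split into two mirrored arrays."""
    if disks % 2:
        raise ValueError("a mirror needs an even number of disks")
    # Only one half of the mirror contributes usable capacity.
    return raid_effective_gb(disks // 2, disk_gb, parity_disks)


DISKS, DISK_GB = 12, 500
raw_gb = DISKS * DISK_GB                                  # 12 x 500GB
raid6_gb = raid_effective_gb(DISKS, DISK_GB, 2)           # RAID6: N+2
mirror_r5_gb = mirrored_effective_gb(DISKS, DISK_GB, 1)   # RAID5 (N+1), mirrored

for label, gb in [("RAID6", raid6_gb), ("mirrored RAID5", mirror_r5_gb)]:
    print(f"{label}: {gb / 1000:.1f}TB effective, {gb / raw_gb:.0%} of raw")
```

Run as written, this reports 5.0TB (83% of raw) for the single RAID6 array and 2.5TB (42% of raw) for the mirrored RAID5 pair. Changing `DISKS` or `DISK_GB` shows how the efficiency factor moves with array size.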

Returning to our prototype company in Part 2, the graphics design company needed 2TB today in a storage system that meets redundancy and availability requirements and provides for tomorrow’s needs. Based on our raw storage comparison and the baseline snapshot growth acceleration, the company’s effective storage needs will exceed 2TB in six months’ time and, if purchasing for an 18-month implementation, will increase to 3.6TB. For a redundant cluster of storage with an effective capacity of 3.6TB, the company would need nearly 8TB of raw storage just to meet the capacity requirements. This, of course, does not allow for performance considerations, tiering or backup-to-disk uses.
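As a rough check on the growth figures, here is a minimal Python sketch that projects the 2TB footprint forward. It assumes the 50% growth rate is an annual rate that compounds, which the text does not state; a different growth model would give somewhat different numbers. The function name is hypothetical.

```python
def projected_tb(current_tb: float, annual_growth: float, months: int) -> float:
    """Compound the current footprint forward by the given number of months."""
    return current_tb * (1 + annual_growth) ** (months / 12)


CURRENT_TB = 2.0       # effective storage needed today
ANNUAL_GROWTH = 0.50   # assumption: 50% per year, compounded

for months in (6, 12, 18):
    size = projected_tb(CURRENT_TB, ANNUAL_GROWTH, months)
    print(f"{months:>2} months: {size:.2f}TB effective")
```

Under that assumption the need passes 2TB well before the six-month mark and lands a little under 3.7TB at 18 months, in the neighborhood of the 3.6TB planning figure.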

While storage optimization is well beyond the scope of this discussion, the employment of optimization techniques could set the raw storage requirements well above 300-400% of effective storage needs. As many vendors recognize storage density as a component of their license costs, it becomes important to know whether your licensing terms are based on raw or effective/delivered storage. This is where ROI calculations can become nebulous, and it is often a place where consumers become lost. Here, too, is a place where thin provisioning methods can actually be used as a trap by some storage vendors, where licensing costs may take a steep jump from “initial storage needs” to a company’s 18-month fulfillment.

Another way storage vendors extract a premium for per-unit storage costs (raw storage) is at the altar of the “approved storage module.” These are typically the same drives with slightly tweaked parameters and disk signatures that uniquely identify them as OEM parts. Without such a disk in the drive caddy (or hot-swap tray) the system will fail to recognize the part as “compatible” and refuse to add the storage to the system. While this ensures design performance compliance, it also results in “vendor lock-in,” which can have long-reaching support and financial consequences. As an example, while a 750GB drive from your local hardware vendor may cost $130 in the “RAID compatible” variant, the same drive (with its tweaks) sold as an approved module will set you back $800-$1,250. The increase in effective storage may also result in a higher maintenance and support bill if licensing is tied to storage volume.
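To put the drive-price example on a per-gigabyte basis, the following Python sketch uses only the prices quoted above. The helper name is illustrative, and the result is expressed as a multiple of the commodity price rather than as a percentage increase.

```python
def cost_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Raw storage cost in dollars per gigabyte for a single drive."""
    return price_usd / capacity_gb


CAPACITY_GB = 750
commodity = cost_per_gb(130, CAPACITY_GB)    # local hardware vendor
oem_low = cost_per_gb(800, CAPACITY_GB)      # approved module, low end
oem_high = cost_per_gb(1250, CAPACITY_GB)    # approved module, high end

print(f"commodity: ${commodity:.2f}/GB")
print(f"approved module: ${oem_low:.2f}-${oem_high:.2f}/GB")
print(f"premium: {oem_low / commodity:.1f}x to {oem_high / commodity:.1f}x")
```

With these inputs the commodity drive works out to about $0.17/GB against $1.07-$1.67/GB for the approved module, a multiple of roughly 6.2x to 9.6x before any capacity-based licensing is added.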

With raw storage costs upward of 600% above commodity storage costs, it is no wonder there is a burgeoning market for OSS-based storage appliances where the costs of “scale out” are quickly offset by the reduced cost of raw storage. In a follow-up post, we’ll examine a state-of-the-art OSS platform and compare it to a similarly equipped commercial product. This approach will be the foundation of our storage ecosystem and the virtualization infrastructure built on top of it. In this approach, we will also contrast all-in-one enclosures vs. appliance-plus-JBOD architectures for performance, flexibility and cost.
