Shanghai Economics 101 – Conclusion
May 6, 2009
In the past entries, we’ve looked only at the high-end processors as applied to system prices, and we’ll continue to use those as references through the end of this one. We’ll take a look at other price/performance tiers in a later blog, but we want to finish-up on the same footing as we began; again, with an eye to how these systems play in a virtualization environment.
We decided to finish this series with an analysis of real world application instead of just theory. We keep seeing 8-to-1, 16-to-1 and 20-to-1 consolidation ratios (VM-to-host) being offered as “real world” in today’s environment so we wanted to analyze what that meant from an economic side.
The Fallacy of Consolidation Ratios
First, consolidation ratios that speak in terms of VM-to-host are not very informative. For instance, a 16-to-1 consolidation ratio sounds good until you realize it was achieved on a $16,000 4Px4C platform. This ratio results in a $1,000-per-VM cost to the consolidator.
In contrast, let’s take the same 16-to-1 ratio on a $6,000 2Px4C platform and it results in a $375-per-VM cost to the consolidator: a savings of nearly 60%. The key to the savings is in vCPU-to-Core consolidation ratio (provided sufficient memory exists to support it). In the first example that ratio was 1:1, but in the last example the ratio is 2:1. Can we find 16:1 vCPU-to-Core ratios out there? Sure, in test labs, but in the enterprise we think the valid range of vCPU-to-Core consolidation ratios is much more conservative, ranging from 1:1 to 8:1 with the average (or sweet spot) falling somewhere between 3:1 and 4:1.
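The arithmetic here is simple enough to sketch in a few lines; this minimal Python illustration uses the platform prices and VM counts quoted above:

```python
# Sketch of the $/VM math above, using the platform prices and VM
# counts quoted in the text.
def cost_per_vm(platform_cost, vms_per_host):
    """Hardware cost borne by each consolidated VM."""
    return platform_cost / vms_per_host

four_socket = cost_per_vm(16000, 16)  # 4Px4C host at 1:1 vCPU-to-Core
two_socket = cost_per_vm(6000, 16)    # 2Px4C host at 2:1 vCPU-to-Core
savings = 1 - two_socket / four_socket
```

With the same 16-to-1 VM-to-host ratio, the cheaper 2P platform cuts the per-VM hardware cost from $1,000 to $375 — the savings comes from the higher vCPU-to-Core ratio, not from the headline consolidation number.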
Second, we must note that memory is a growing aspect of the virtualization equation. Modern operating systems no longer “sip” memory and 512MB for a Windows or Linux VM is becoming more an exception than a rule. That puts pressure on both CPU and memory capacity as driving forces for consolidation costs. As operating system “bloat” increases, administrative pressure to satisfy their needs will mount, pushing the “provisioned” amount of memory per VM ever higher.
Until “hot add” memory is part of DRS planning and the requisite operating systems support it, system admins will be forced to either over commit memory, purchase memory based on peak needs or purchase memory based on average memory needs and trust DRS systems to handle the balancing act. In any case, memory is a growing factor in systems consolidation and virtualization.
Modeling the Future
Using data from the University of Chicago as a baseline and extrapolating forward through 2010, we’ve developed a simple model to predict vMEM and vCPU allocation trends. This approach establishes three key metrics (already used in previous entries) that determine/predict system capacity: Average Memory/VM (vMVa), Average vCPU/VM (vCVa) and Average vCPU/Core (vCCa).
Average Memory per VM (vMVa)
Average memory per VM is determined by taking the allocated memory of all VM’s in a virtualized system – across all hosts – and dividing that by the total number of VM’s in the system (not including non-active templates.) This number is assumed to grow as virtualization moves from consolidation to standardized deployment.
Dividing the physical memory in a virtualization platform by the vMVa component provides an estimate of the average number of VM’s that platform can contribute to the system. Since memory is only a single factor, this metric may indicate a number of VM’s not achievable depending on the vCPU requirements of the VM’s to be managed.
Average vCPU per VM (vCVa)
Average virtual CPU per VM is determined by taking the sum of all virtual CPUs in a virtualized system – across all hosts – and dividing that by the total number of VM’s in the system (again, not including non-active templates.) This number is assumed to grow as operating systems become more resource hungry and standardized deployment encroaches on more administrative domains.
Average vCPU per Core (vCCa)
Average virtual CPU per processor core is determined by taking the sum of all virtual CPUs in a virtualized system and dividing by the total number of physical processor cores in the system. This number is expected to move slightly higher as virtualization standardizes and more applications “administratively require” multiple vCPUs.
This is indirectly related to the scalability of the underlying CPU architecture of the virtualization system. This indirect relationship owes itself to the administrative nature of CPU allocations across modern virtualization systems. In the future, DRS-type of performance-based resource allocation may lend more value to this metric. Multiplying this metric times the number of physical cores in a virtualization platform will provide an estimate of the average capacity of that virtualization host (given sufficient memory to sustain the predicted VM’s).
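A minimal sketch of how the three metrics combine into a capacity estimate; the VM inventory and host specifications below are invented for illustration:

```python
# Each tuple is (allocated memory in GB, vCPUs) for one active VM,
# gathered across all hosts; the inventory is hypothetical.
vms = [(2, 1), (4, 2), (1, 1), (8, 4), (2, 2)]
system_cores = 16  # total physical cores across the system (assumed)

vMVa = sum(mem for mem, _ in vms) / len(vms)      # avg memory per VM
vCVa = sum(cpu for _, cpu in vms) / len(vms)      # avg vCPU per VM
vCCa = sum(cpu for _, cpu in vms) / system_cores  # avg vCPU per core

# Capacity estimate for a candidate host: the memory-bound VM count
# versus the vCPU-bound count; the smaller figure governs.
host_mem_gb, host_cores = 48, 8
mem_bound_vms = host_mem_gb / vMVa
cpu_bound_vms = (vCCa * host_cores) / vCVa
est_capacity = min(mem_bound_vms, cpu_bound_vms)
```

The `min()` at the end captures the point made above: the memory-derived VM count is only achievable if the vCPU side can sustain it, and vice versa.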
Resource Use and Effective Cost
Modern operating systems are not designed to be virtualized: they are designed to consume and use all available resources. This presents an administrative layer to the virtual machine provisioning step that challenges the user to make choices that affect not only the virtual machine’s performance, but the overall performance and available resources of the virtualization pool. The administrative layer defines a default “cost” for virtual assets like memory, CPU and disk. These “costs” have a relationship not only to the direct cost of the equipment, but to the “opportunity cost” of future resource availability.
Here’s our idea of how the pressures of operating system changes will affect typical memory allocation in virtual environments.
This trend shows a growth of about 512MB per VM per year on average. Before we see what that means to the hardware cost per VM, let’s do the same for CPU allocation:
Holding vCCa constant at 3.38 would have meant purchasing more hardware and virtualization licenses. Since the introduction of one machine equates to at least two CPU licenses in VMware, increasing vCCa (within a tolerance limit) is acceptable. Taking into account the bias towards “buy for growth” and away from “upgrades,” purchasing decisions tend to move towards new system allocation instead of component-wise upgrades. We expect to see this continue with the adoption of new platforms like Nehalem-EP/EX and AMD G34/C32.
How does this affect per VM cost at the hardware level? As we’ve seen, driving consolidation factors reduces cost in a couple of major areas:
- Administrative costs,
- VMware license costs,
- Support costs,
- Backup license costs,
- Infrastructure costs (LAN and SAN ports and switches, etc.),
- Operating system costs (multi-VM per license, etc.)
By driving-up per-host consolidation rates, these cost savings can be more effectively realized. This is why decisions on system purchases should be made – assuming acceptable performance – around $/VM and not necessarily $/system. Here’s a snapshot of our reference platforms based on 2008 metrics:
This characteristic was developed by holding the average ratio of vCPU to Core (vCCa) constant at 3.38. This represents an increase in memory allocation of 87% and CPU allocation increase of 31% over three years. Notice the inflection point in the graph for 48GB systems demonstrates a function of CPU loading and not memory limitation. For loads through 2010, 48GB-64GB appears to be the sweet-spot in $/VM.
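Those allocation trends can be projected with a simple loop; only the growth rates come from our charts – the 2008 baseline allocations below are assumed purely for illustration:

```python
# Projects vMVa and vCVa forward from assumed 2008 baselines of
# 1.75GB and 1.6 vCPUs per VM (hypothetical figures). The growth
# rates – ~512MB/VM/year of memory and ~31% vCPU growth over three
# years – are the trends described above.
base_mem_mb, base_vcpus = 1792, 1.6
for i, year in enumerate(range(2008, 2012)):
    vmva_mb = base_mem_mb + 512 * i       # linear memory trend
    vcva = base_vcpus * 1.31 ** (i / 3)   # ~31% growth over 3 years
    print(year, vmva_mb, round(vcva, 2))
```

Under these assumed baselines, the 2011 memory allocation lands at roughly 3.3GB/VM – an increase of about 87% over 2008, consistent with the figure above.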
The “not so good news” for Shanghai is that – based on the expected benefit of Nehalem-EP’s SMT – the math no longer holds-up in our projected cost per VM analysis for 2010. While it looks bleak for Shanghai, the opposite is projected to be true for Istanbul. In our projected analysis, Istanbul carries a clear $100/VM advantage over comparably equipped Nehalem-EP systems – even when giving Nehalem-EP a serious per-core advantage in CPU capacity.
Fortunately for those invested in AMD Eco-Systems to date (socket-F capable systems), the road to Istanbul is a short one. While we’ve covered the ROI aspect of Istanbul drop-in replacement, and the fallacy of replacing socket-F systems with Nehalem-EP, we’ve taken a final look at cost per VM as it relates to projected enterprise costs from 2009 through 2011.
Expected Cost of Virtualization
We wanted to use our vCPU and vMem growth projections to model a typical enterprise – entering into virtualization today – and compare the economic merits of platform choice through 2011. We have chosen the sweet spot for each represented platform where cost per VM is either a minimum at the beginning of the trend or at the end. This helped us determine the right memory size for the platforms: 48GB Nehalem-EP, 32GB Shanghai and 48GB Istanbul.
We’ve made an exception in the Istanbul case, factoring in the costs (material and labor) of upgrading the memory from 48GB to 64GB in 2011. These additional costs include upgrading all systems purchased in 2009 and 2010 as well. This was the only case where memory upgrades reduced the cost per VM of the platform (based on our VM averages.)
We begin our analysis by assuming an enterprise load of 150 virtual machines, expected annual virtual machine growth of 20%, and an average platform cost reduction of 15% due to competitive pressures and demand. Next we apply our provisioning model based on our projected vMVa and vCVa metrics as projected above (see table.) Then, we calculate the number of initial systems, their projected power consumption and add additional systems and power consumption through 2011.
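A hedged sketch of that sizing loop; the per-host VM capacity and the 2009 platform price below are illustrative assumptions rather than the study’s exact inputs:

```python
import math

# 150 VMs entering virtualization in 2009, 20% annual VM growth,
# 15% annual platform cost reduction (figures from the text); the
# 22-VM host capacity (at a 60% utilization target) and the $6,000
# platform price are assumptions for illustration.
vm_count, platform_price = 150, 6000.0
vms_per_host = 22

systems_owned, capex = 0, 0.0
for year in (2009, 2010, 2011):
    systems_needed = math.ceil(vm_count / vms_per_host)
    systems_added = max(0, systems_needed - systems_owned)
    capex += systems_added * platform_price
    systems_owned = systems_needed
    vm_count *= 1.20         # 20% annual virtual machine growth
    platform_price *= 0.85   # 15% annual platform cost reduction
```

A fuller model would also shrink `vms_per_host` each year as the projected vMVa and vCVa grow – that erosion of per-host capacity is what drives the later-year system counts in our charts.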
We feel a 60% utilization target is a good balance of operational performance and availability protection for 3+ virtual hosts managing typical loads. For the 60% utilization target, we find the following:
While Shanghai represents a cost benefit in 2009, the expected higher demands on virtual resources in 2010 coupled with the reduction in system costs for Nehalem-EP allow Nehalem to overcome initial fixed costs by 2011. The advantage goes to Istanbul in year-over-year cost savings.
Shanghai and Nehalem-EP are fairly matched until 2011 when the number of Shanghai-based systems would balloon due to the 32GB memory limit of the chosen system (30 Shanghai systems versus 24 Nehalem-EP systems). Here, a memory upgrade would not prove effective as Shanghai’s processor core resources would reach their limit before the additional memory could prove cost effective. However, these charts indicate that a rolling upgrade from Shanghai/Barcelona to Istanbul in 2009/2010 would put the OPEX on-track with Istanbul, recovering a significant value in the investment. This is great news for existing Shanghai/Barcelona users.
For systems with “peak demands” we’d recommend a less aggressive 40% utilization – preferably in a pooled subset of the larger virtualization pool – to handle peak loads with adequate performance. This is also a good utilization ratio for systems designed to operate in paired pools – allowing for 80% operational capacity in the event of a single system failure. The higher cost of the increased latent potential is obvious:
Here we see Shanghai faring a little better in 2010 versus Nehalem-EP, with similar ballooning of costs in 2011. As in the 60% utilization study, we see the effects of the much increased vCPU demand closing the gap between Istanbul and Nehalem-EP systems (in numerical quantity) in 2011, resulting in 32 total systems in use for Nehalem-EP in 2011 and 27 total systems in use for Istanbul; Shanghai is a distant third with 45 systems in use by 2011.
It is no surprise that with more systems in use come higher projected power costs:
As can be seen from the chart, a smaller number of systems translates into a smaller power-cost footprint. This translates into about a $2,000/year savings for Istanbul over Nehalem-EP. [Note: DRS-driven DPM would be most effective in a 40% target utilization environment. This more conservative utilization ratio would allow up to 50% of the virtualization farm to be turned-off during off-peak times of the day. While the effects would be more dramatic for Shanghai – with its higher number of servers – the results would be proportional across the sample.]
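A back-of-the-envelope version of the power comparison; the 350W average draw and $0.10/kWh utility rate are assumed for illustration and are not the study’s figures:

```python
# Annual power cost for a given host count under an assumed average
# draw and utility rate (both hypothetical).
def annual_power_cost(systems, watts_per_system=350, rate_per_kwh=0.10):
    kwh_per_year = systems * watts_per_system * 24 * 365 / 1000
    return kwh_per_year * rate_per_kwh

# Fewer hosts means a smaller power footprint – e.g. the 2011 counts
# of 27 Istanbul versus 32 Nehalem-EP systems from the 40% study:
istanbul_saving = annual_power_cost(32) - annual_power_cost(27)
```

The exact dollar figure depends entirely on the assumed draw and rate; the point is that the savings scales linearly with the difference in host count.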
We wanted to wrap-up this series with a couple of observations about the relative values of Nehalem-EP and Shanghai and the projected value of Istanbul for AMD. We also wanted to emphasize some unexpected results uncovered during this analysis as they apply to “typical” deployments.
As we’ve said in previous posts, Nehalem-EP represents an excellent choice for those just starting out in virtualization. Compared to AMD’s Shanghai, it offers better scalability and price per VM over the long term. In 2009 dollars, it represents a slightly higher cost of about $30/VM, but its scalability allows it to out-pace Shanghai when vCPU demands increase in 2010 and 2011, giving it a 3-year cost advantage over Shanghai of about 4%. While this is not enough benefit to drop an existing AMD infrastructure, it makes continuing with an Intel channel a no-brainer. [Note that these comments dismiss the potential to upgrade Shanghai to Istanbul in June/2009, at which time the Istanbul projections would dominate the AMD side of the equation.]
Had Barcelona released as Shanghai in 2007, it would have been as technologically meaningful for AMD as the Nehalem-EP launch in 2009 was for Intel. That said, Shanghai has not received the accolades it deserved over the last year, and AMD certainly deserves a hand in Intel’s success with Nehalem for showing them the way. Even though our projections show Shanghai at a loss in cost/VM in 2011, that still gives the Shanghai processor two good years of potential value [2008-2009].
However, with the release of Istanbul, AMD is faced with a hard decision with respect to Shanghai: where does it fit in the line-up and at what price? A 2P Shanghai system will only be capable of sustaining 18 VM’s in 2009 (based on 60% resource utilization) compared to Nehalem-EP’s 22 VM’s per host – that still equates to a 7% system price advantage for Shanghai today, mostly due to DDR3 price premiums. In 2010 (perhaps Q3/2009) that changes significantly, where load profiles suggest a 15% price/VM advantage in Nehalem-EP’s favor. To bring Shanghai’s value in line, there would need to be a 30-40% reduction in price per CPU this year or Istanbul will simply cannibalize the market.
[Perhaps a good role for Shanghai is in the X3 market for virtualization. This would deliver higher L3/core ratios and possibly allow higher core speeds at the same average power mark. This is a diminishing return path, as vCPU/CPU will begin to drive purchasing decisions in 2010 – but that’s Magny-Cours’ problem to solve. Throughout 2009 and 2010, AMD needs to continue to find a value proposition for Shanghai: another position Intel doesn’t find themselves in.]
AMD Istanbul & Magny-Cours
AMD has to make a flawless leap in their game of leap-frog with Intel when they bring Istanbul to market. Istanbul needs to release across 80% of the speed band and maintain the same (or lower) power envelope as Shanghai. Likewise, they need to follow quickly with Magny-Cours to counter Intel’s rumored 6-core Nehalem-EP and 8-core Nehalem-EX in 2010. Unless “snoop filter” brings-off some amazing tricks in 2P, the aging memory architecture coupled to socket-F is likely to keep Istanbul out of the top VMmark slot until Magny-Cours follows-up in Q1/2010.
Since our analysis was predicated on Istanbul pricing-in at Shanghai+15% (current prices), our price-performance comparisons are only valid in that context. If AMD’s pricing comes in +/-5% of that mark, they’re in good shape. If Istanbul comes in with a price too much higher, it will give-up more market share to Nehalem-EP. [Historically, AMD has done a pretty good job of maintaining its price point on the “higher-end” by either reducing the price of the chips being replaced or introducing the “new” chip at a slightly lower price at the same speed rating as the chip it replaces. We think this approach will continue to prevail.]
We’re confident that the Istanbul pairing will show significant gains in the 24-core standings, perhaps posting 29.2@20 tiles [est.] in its inaugural 4P VMmark and 18.3@12 tiles for 2P (creating a new 12-core category). That would place Istanbul 2P in a price-competitive situation with Nehalem-EP (as shown) and Istanbul 4P in competition with existing 8P systems. Naturally, 8P systems based on Shanghai will be eligible for upgrade to Istanbul, and we’d expect to see VMmark numbers in the 37@28 tiles mark.
So it begs the question: where does 4P fit into this mix? Based on our “very fuzzy” numbers, the price-performance of the 4P Istanbul fits pretty nicely into the equation. This was a surprise, as 4P systems usually run about 30% higher than their 2P siblings. However, Istanbul could offer some short-term price parity with Nehalem-EP if it releases at the correct price point. The following assumes an “Istanbul 8493SE” would sell for about $3K/ea:
Likewise, the power characteristic is no different than the projected 2P systems (4P systems equate to 1/2 the system number).
While the 4P Istanbul systems still result in a 30% higher cost than the 2P Istanbul, the 4P Istanbul is right in line with the Nehalem-EP price, with the bonus of additional HT3 bandwidth (due to the two additional processors). This makes Istanbul a good choice for the “special purpose” virtualization systems we hinted about earlier in the post [SQL, VDI, ERP, etc.].
It was somewhat of a surprise that Nehalem-EP did well in the price per VM against Shanghai; however, when you consider what Opteron did to Xeon in the past, it makes a lot of sense that Intel would come out with guns blazing. We will follow-up on these concepts for more “average performance” systems and look for more surprises and trends. In another follow-up, we’ll evaluate what 8-core and 12-core processors will do to this analysis, and what role – if any – 4P and 8P have in future virtualization systems.
Across the range of analysis we see shifting consolidation ratios in keeping with the vCPU/CPU limitations we have already identified. In our 2009-2011 time-line, we see 2P VM/host consolidation ratios as high as 27:1 with Istanbul in 2009 and as low as 12:1 with Shanghai in 2011. Nehalem is somewhere in-between, ranging from a high of 22:1 to a low of 17:1. Core count here is definitely the limiting factor, with memory starting to show-up in 2010/2011. This does not mean that memory does not play a role, and we feel we’ve been very conservative in our VM growth model [with respect to VM memory bloat]. If we’re off by even 50% on the memory side, that makes 2010 look like our 2011 projections and memory configurations and [memory] costs [and capacities] will begin to play a dominant role.
A final surprise was the emphasis on core utilization versus memory utilization in our study. While we’ve highlighted Intel and AMD’s reduction in DIMM capacity [in their new/projected products] as “short sighted,” it would appear – based on our typical load studies – that 48-64GB seems to be right for price-per-VM in 2P systems based on today’s core count. We do expect that number to double by Q3/2010 in keeping with core count and refinements in distributed load management. Until then the vCPU/core ratio (our test ratio is 3.38) will hold-back large memory footprints. This is a good thing for both AMD’s new DDR3 systems and Intel’s Nehalem as 8GB DDR3 sticks cost over $1,000 each today.
[Note: comments about vCPU/core ratios and system loading assume all loads are treated with equal priority. Some virtual machine managers, like VMware’s ESX and vSphere, allow for priority adjustments that can shift consolidation ratios higher while “protecting” performance sensitive applications. While this can provide meaningful shifts in consolidation ratios, its effects would be consistent across similar architectures, with execution speed, VM exit and instruction issue latency dominating the results.]