Posts Tagged ‘AMD Shanghai’


Shanghai Economics 101 – Continued

May 4, 2009

Let’s look at some more real-world applications of what we’ve learned from the VMmark results for Nehalem and what they mean in a practical comparison. We’ll award Nehalem-EP’s SMT a 25% bonus in our comparisons when vCPU/core count is taken into the measurement. In a 6:1 consolidation (six vCPUs per core), this means 60 vCPUs for 2P Nehalem and 48 vCPUs for Shanghai. Using this bias, the following cost characteristics are revealed for VMs with average memory footprints of 1.5GB, for the Nehalem-EP 3.2GHz system:

Nehalem-EP Configuration | Street $ | 1536MB VMs, 1 vCPU | Max vCPUs (6/c) | Cost/VM
2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 24GB DDR3/1333 | $7,017.69 | 13 | 60 | $539.82
2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 48GB DDR3/1066 | $7,755.99 | 28 | 60 | $277.00
2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 72GB DDR3/800 | $8,708.19 | 42 | 60 | $207.34
2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 96GB DDR3/1066 | $21,969.99 | 57 | 60 | $385.44
2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 144GB DDR3/800 | $30,029.19 | 60 | 60 | $500.49
2 x 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 144GB DDR3/800 | $60,058.38 | 120 | 120 | $500.49
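The vCPU ceiling and Cost/VM columns above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming four cores per socket, the 6:1 consolidation ratio and the 25% SMT bonus stated earlier. The function names are ours, and the VM counts are copied from the table rather than derived, since the per-VM memory overhead behind them is not spelled out.

```python
def max_vcpus(sockets, cores_per_socket, vcpus_per_core=6, smt_bonus=0.0):
    """vCPU ceiling: physical cores x consolidation ratio, scaled by any SMT bonus."""
    return round(sockets * cores_per_socket * vcpus_per_core * (1 + smt_bonus))


def cost_per_vm(street_price, vm_count):
    """Street price spread evenly across the VMs the host can carry."""
    return round(street_price / vm_count, 2)


NEHALEM_CEILING = max_vcpus(2, 4, smt_bonus=0.25)  # 2P/8C with the 25% SMT bonus
SHANGHAI_CEILING = max_vcpus(2, 4)                 # 2P/8C, no SMT

# (memory in GB, street price, 1536MB VMs that fit), copied from the table above
nehalem_rows = [
    (24, 7017.69, 13),
    (48, 7755.99, 28),
    (72, 8708.19, 42),
    (96, 21969.99, 57),
    (144, 30029.19, 60),
]

for memory_gb, price, vms in nehalem_rows:
    usable_vms = min(vms, NEHALEM_CEILING)  # never count more VMs than vCPUs
    print(f"{memory_gb}GB: {usable_vms} VMs at ${cost_per_vm(price, usable_vms):,.2f} each")
```

The `min()` shows why the 144GB configuration stops paying off: it is capped by the vCPU ceiling, not by memory, so the extra memory only raises the cost per VM.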

We’ll compare this to a 2P Shanghai system at 3.1GHz:

Shanghai 2P/HT3 Configuration | Street $ | 1536MB VMs, 1 vCPU | Max vCPUs (6/c) | Cost/VM | Savings per VM | Savings %
2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 32GB DDR2/800 | $5,892.12 | 18 | 48 | $327.34 | $212.48 | 39.36%
2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 48GB DDR2/800 | $6,352.12 | 28 | 48 | $226.86 | $50.14 | 18.10%
2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 64GB DDR2/533 | $6,462.52 | 37 | 48 | $174.66 | $32.68 | 15.76%
2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 80GB DDR2/667 | $8,422.12 | 47 | 48 | $179.19 | $28.14 | 13.57%
2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 96GB DDR2/667 | $11,968.72 | 48 | 48 | $249.35 | $136.09 | 35.31%
2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 128GB DDR2/533 | $14,300.92 | 48 | 48 | $297.94 | $202.55 | 40.47%
2 x 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 128GB DDR2/533 | $28,601.83 | 96 | 96 | $297.94 | $202.55 | 40.47%
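The savings columns appear to follow from comparing each Shanghai configuration against a Nehalem-EP configuration of similar capacity. The sketch below assumes that pairing (32GB against 24GB, 64GB and 80GB against 72GB, 128GB against 144GB, and so on), which is inferred from the arithmetic rather than stated outright. It works from unrounded cost-per-VM values, and the function names are ours.

```python
def cost_per_vm(street_price, vm_count):
    """Unrounded cost per VM, so rounding happens only once at the end."""
    return street_price / vm_count


def savings(nehalem, shanghai):
    """Return (savings per VM in $, savings in %) of Shanghai relative to Nehalem-EP.

    Each argument is a (street price, VM count) pair.
    """
    nehalem_cost = cost_per_vm(*nehalem)
    shanghai_cost = cost_per_vm(*shanghai)
    difference = nehalem_cost - shanghai_cost
    return round(difference, 2), round(100 * difference / nehalem_cost, 2)


# (Nehalem-EP price, VMs) vs. (Shanghai price, VMs); the pairing is an assumption
pairs = {
    "24GB vs 32GB": ((7017.69, 13), (5892.12, 18)),
    "48GB vs 48GB": ((7755.99, 28), (6352.12, 28)),
    "72GB vs 64GB": ((8708.19, 42), (6462.52, 37)),
    "72GB vs 80GB": ((8708.19, 42), (8422.12, 47)),
    "96GB vs 96GB": ((21969.99, 57), (11968.72, 48)),
    "144GB vs 128GB": ((30029.19, 60), (14300.92, 48)),
}

for label, (nehalem, shanghai) in pairs.items():
    dollars, percent = savings(nehalem, shanghai)
    print(f"{label}: ${dollars:.2f} per VM ({percent:.2f}%)")
```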



Quick Take: VMware – Shanghai vs. Nehalem-EP

April 26, 2009

Johan De Gelas at AnandTech has an interesting article comparing a 2P Shanghai (2384, 2.7GHz) vs. 2P Nehalem-EP (X5570, 2.93GHz) and the comparison in VMmark is stunning… until you do your homework and reference the results. Johan is comparing the VMmark of a 64GB configured 2P Opteron running ESX3.5-Update 3 against a 72GB configured 2P Nehalem-EP running vSphere (ESX v4.0).

When I see benchmarks like these quoted by AnandTech I start to wonder why they consider the results “analytical…” In any case, there are significant ramifications to larger memory pools and higher clock speeds in VMmark, and these results show that fact. Additionally, the results also seem to indicate:

  • VMware vSphere (ESX v4.0) takes serious advantage of the new hyperthreading in Nehalem-EP
  • Nehalem-EP’s TurboBoost appears to tip the value proposition in favor of the X5570 over the W5580, all things considered

Judging from the Supermicro VMmark score, the Nehalem-EP (adjusted for differences in processor speed) turns in about a 6% performance advantage over the Shanghai with comparable memory footprints. Had the Opteron been given additional memory, perhaps the tile and benchmark scores would have better illustrated this conclusion. It is unclear whether or not vSphere is significantly more efficient at resource scheduling, but the results seem to indicate that – at least with Nehalem’s new hyperthreading – it is more efficient.

Platform | Memory | VMware Version | VMmark Score | Rating (Clock Adj.) Per Tile
HP ProLiant (2xOpteron 2384, 2.7GHz) | 64GB DDR2/533 | ESX v3.5.0 Update 3 | 11.28 @8 tiles |
(2xX5570, 2.93GHz w/3.2GHz TurboBoost) | 72GB DDR3/1066 | ESX v3.5.0 Update 4 BETA | 14.22 @10 tiles |
Dell PowerEdge (2xX5570, 2.93GHz w/3.3GHz TurboBoost) | 96GB DDR3/1066 | ESX v4.0 | 23.90 |
HP ProLiant DL370 G6 (2xW5580, 3.2GHz w/3.3GHz TurboBoost) | 96GB DDR3/1066 | ESX v4.0 | 23.96 |
HP ProLiant DL585 G5 (4x8386SE, 2.8GHz) | 128GB DDR2/667 | ESX v3.5.0 Update 3 | 20.43 @14 tiles |
HP ProLiant DL585 G5 (4x8393SE, 3.1GHz) | 128GB DDR2/667 | ESX v4.0 | 22.11 @15 tiles |
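How the clock adjustment was done is not spelled out. One reading that lands near the roughly 6% figure is to divide each VMmark score by clock speed, using the 3.2GHz TurboBoost clock for the 72GB X5570 result. The sketch below works under that assumption; the helper names are ours, and the base-clock variant is shown only for contrast.

```python
def score_per_ghz(vmmark_score, clock_ghz):
    """Normalize a VMmark score by processor clock."""
    return vmmark_score / clock_ghz


def advantage_pct(candidate, baseline):
    """Percentage by which the candidate's normalized score exceeds the baseline's."""
    return round(100 * (candidate / baseline - 1), 1)


shanghai = score_per_ghz(11.28, 2.7)        # 2x Opteron 2384, 64GB, 8 tiles
nehalem_turbo = score_per_ghz(14.22, 3.2)   # 2x X5570, 72GB, TurboBoost clock (assumption)
nehalem_base = score_per_ghz(14.22, 2.93)   # same result at the nominal clock

print(advantage_pct(nehalem_turbo, shanghai))  # 6.4
print(advantage_pct(nehalem_base, shanghai))   # 16.2
```

The gap between the two results shows how sensitive a clock-adjusted comparison is to which clock is treated as the effective one.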

One thing is clear from these VMmark examples: Nehalem-EP is a huge step in the right direction for Intel, and it potentially blurs the line between 2P and 4P systems. AMD will not have much breathing room with Istanbul in the 2P space against Nehalem-EP for system refreshes unless it can show similar gains and scalability. Where Istanbul will shine is in its drop-in capability in existing 2P, 4P and 8P platforms.

SOLORI’s take: These are exciting times for those just getting into virtualization. VMmark would seem to indicate that consolidation factors unlocked by Nehalem-EP come close to rivaling 4P platforms at about 75% of the cost. If I were buying a new system today, I would be hard-pressed to ignore Nehalem as a basis for my ecosystem. However, the socket-F Opteron systems still have about 8-12 months of competitive life in them, at which point they become just another workhorse. Nehalem-EP still does not provide enough incentive to shatter an established ecosystem.

SOLORI’s 2nd take: AMD has a lot of ground to cover with Istanbul and Magny-Cours in the few short months that remain in 2009. The “hearts and minds” of system refresh and new entrants into virtualization are at stake and Nehalem-EP offers some conclusive value to those entering the market.

With entrenched customers, AMD needs to avoid making them feel “left behind” before the market shifts definitively. AMD could do worse than getting some SR5690-based Istanbul platforms out on the VMmark circuit – especially with its HP and Supermicro partners. We’d also like to see some Magny-Cours VMmarks prior to the general availability of the G34 systems.


AMD and Intel I/O Virtualization

April 26, 2009

Virtualization now reaches an I/O barrier where consolidated applications must vie for increasingly limited I/O resources. Early virtualization techniques – both software and hardware assisted – concentrated on process isolation and gross context switching to accelerate the “bulk” of the virtualization process: running multiple virtual machines without significant processing degradation.

As consolidation potentials are greatly enhanced by new processors with many more execution contexts (threads and cores) the limitations imposed on I/O – software translation and emulation of device communication – begin to degrade performance. This degradation further limits consolidation, especially where significant network traffic (over 3Gbps of non-storage VM traffic per virtual server) or specialized device access comes into play.

I/O Virtualization – The Next Step-Up

Intrinsic to AMD-V in revision “F” Opterons and newer AM2 processors is I/O virtualization that enables hardware-assisted memory management in the form of a Graphics Aperture Remapping Table (GART) and the Device Exclusion Vector (DEV). These two facilities provide address translation of I/O device access to a limited range of the system physical address space and provide limited I/O device classification and memory protection.

Combined with specialized software, GART and DEV provided primitive I/O virtualization but were limited to the confines of the memory map. Direct interaction with devices and virtualization of device contexts in hardware are not efficiently possible in this approach, as VMs need to rely on hypervisor control of device access. AMD defined its I/O virtualization strategy as AMD IOMMU in 2006 (now AMD-Vi) and has continued to improve it through 2009.
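To make the distinction concrete, here is a deliberately simplified toy model, not AMD’s register-level interface. It assumes a DEV that holds one bit per 4KB physical page, where a set bit blocks device access, and contrasts it with an IOMMU-style table that translates addresses per device. All class and method names are hypothetical.

```python
PAGE_SIZE = 4096


class ToyDeviceExclusionVector:
    """Protection only: one flag per physical page; a set flag blocks device DMA."""

    def __init__(self, num_pages):
        self.protected = [False] * num_pages

    def protect(self, phys_addr, length):
        first = phys_addr // PAGE_SIZE
        last = (phys_addr + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self.protected[page] = True

    def dma_allowed(self, phys_addr):
        return not self.protected[phys_addr // PAGE_SIZE]


class ToyIommu:
    """Translation and isolation: each device sees only the pages mapped for it."""

    def __init__(self):
        self.tables = {}  # device id -> {device page number: host page number}

    def map_page(self, device_id, device_page, host_page):
        self.tables.setdefault(device_id, {})[device_page] = host_page

    def translate(self, device_id, device_addr):
        page, offset = divmod(device_addr, PAGE_SIZE)
        table = self.tables.get(device_id, {})
        if page not in table:
            raise PermissionError(f"{device_id}: no mapping for page {page}")
        return table[page] * PAGE_SIZE + offset


dev = ToyDeviceExclusionVector(num_pages=16)
dev.protect(0x2000, 0x1800)          # covers pages 2 and 3
print(dev.dma_allowed(0x1000))       # True: page 1 is unprotected
print(dev.dma_allowed(0x3000))       # False: page 3 is protected

iommu = ToyIommu()
iommu.map_page("nic0", device_page=0, host_page=5)
print(hex(iommu.translate("nic0", 0x10)))  # 0x5010
```

In the first model a device is either blocked from a page or not, so the hypervisor still has to mediate everything else. In the second, a device can be handed to a VM with its own view of memory, which is the step up that a full IOMMU provides.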

With the release of new motherboard chipsets (AMD SR5690) in 2009, significant performance gains in I/O will be brought to the platform with end-to-end I/O virtualization. Motherboard refreshes based on the SR5690 should enable Shanghai and Istanbul processors to take advantage of the full AMD IOMMU specification (now AMD-Vi).

Similarly, Intel’s VT-d approach combines chipset and CPU features to solve the problem in much the same way. Because the memory controller was architecturally separate from the CPU, earlier processors not only had to carry the additional instruction enhancements but also had to be coupled to northbridge chipsets that contained support. This feature was initially available in the Intel Q35 desktop chipset in Q3/2007.