Archive for the ‘AMD’ Category


VMware Management Assistant Panics on Magny-Cours

August 11, 2010

VMware’s current version of its vSphere Management Assistant – also known as vMA (pronounced “vee mah”) – will crash when run on an ESX host using AMD Magny-Cours processors. We discovered this behavior recently when installing the vMA on an AMD Opteron 6100 (a.k.a. Magny-Cours) system: deploying the OVF template resulted in a kernel panic on first boot. Of note, the crash also drives vCPU utilization to 100% until the VM is powered off or reset:

vMA Kernel Panic on Import

As it turns out, no amount of tweaking the virtual machine’s virtualization settings or the guest’s boot/grub settings (e.g. noapic) cures the ills for vMA. However, we did notice that the OVF-deployed appliance was configured as a VMware Virtual Machine Hardware Version 4 machine:

vMA 4.1 defaults to Virtual Machine Hardware Version 4

Since our lab vMA deployments have all run Virtual Machine Hardware Version 7 for some time (for its functional benefits as well), we upgraded the vMA to Version 7 and tried again:

Upgrade vMA Virtual Machine Version...

This time, with Virtual Hardware Version 7 (and no other changes to the VM), the vMA boots as it should:

vMA Booting after Upgrade to Virtual Hardware Version 7

Since the Magny-Cours CPU is essentially a pair of tweaked 6-core Opteron CPUs in a single package, we took the vMA into the lab and deployed it to an ESX server running on AMD Opteron 2435 6-core CPUs: the vMA booted as expected, even with Virtual Hardware Version 4. A quick check of the community and support boards shows a few issues with older Red Hat/CentOS kernels (like vMA’s) but no reports of kernel panics with Magny-Cours. Perhaps there are just not that many AMD Opteron 6100 deployments out there with vMA yet…
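
For those who would rather script the workaround than click through the vSphere Client, here is a minimal sketch. The .vmx path and helper name are ours (illustrative only); the supported route remains the client’s “Upgrade Virtual Hardware” action, which this merely mirrors at the config-file level:

```python
# Hypothetical sketch: set a powered-off VM's virtual hardware version by
# rewriting the virtualHW.version key in its .vmx file. The supported route
# is the vSphere Client's "Upgrade Virtual Hardware" action; this simply
# mirrors the resulting config change. The path below is illustrative only.
import re

VMX_PATH = "/vmfs/volumes/datastore1/vMA/vMA.vmx"  # adjust for your host

def set_hw_version(vmx_path: str, version: int = 7) -> None:
    """Rewrite (or append) the virtualHW.version key in a .vmx file."""
    with open(vmx_path) as f:
        text = f.read()
    new_text, count = re.subn(
        r'virtualHW\.version\s*=\s*"\d+"',
        f'virtualHW.version = "{version}"',
        text,
    )
    if count == 0:
        # Key not present: append it to the config.
        new_text = text.rstrip("\n") + f'\nvirtualHW.version = "{version}"\n'
    with open(vmx_path, "w") as f:
        f.write(new_text)

if __name__ == "__main__":
    set_hw_version(VMX_PATH, 7)  # vMA boots cleanly at Version 7 in our lab
```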


vSphere 4 Update 2 Released

June 11, 2010

VMware vSphere 4, Update 2 has been released with the following changes to ESXi:

The following information provides highlights of some of the enhancements available in this release of VMware ESXi:

  • Enablement of Fault Tolerance Functionality for Intel Xeon 56xx Series processors— vSphere 4.0 Update 1 supports the Intel Xeon 56xx Series processors without Fault Tolerance. vSphere 4.0 Update 2 enables Fault Tolerance functionality for the Intel Xeon 56xx Series processors.
  • Enablement of Fault Tolerance Functionality for Intel i3/i5 Clarkdale Series and Intel Xeon 34xx Clarkdale Series processors— vSphere 4.0 Update 1 supports the Intel i3/i5 Clarkdale Series and Intel Xeon 34xx Clarkdale Series processors without Fault Tolerance. vSphere 4.0 Update 2 enables Fault Tolerance functionality for the Intel i3/i5 Clarkdale Series and Intel Xeon 34xx Clarkdale Series processors.
  • Enablement of IOMMU Functionality for AMD Opteron 61xx and 41xx Series processors— vSphere 4.0 Update 1 supports the AMD Opteron 61xx and 41xx Series processors without input/output memory management unit (IOMMU). vSphere 4.0 Update 2 enables IOMMU functionality for the AMD Opteron 61xx and 41xx Series processors.
  • Enhancement of the resxtop utility— vSphere 4.0 U2 includes an enhancement of the performance monitoring utility, resxtop. The resxtop utility now provides visibility into the performance of NFS datastores by displaying the following statistics for them: Reads/s, writes/s, MBreads/s, MBwrtn/s, cmds/s and GAVG/s (guest latency). (A parsing sketch follows this list.)
  • Additional Guest Operating System Support— ESX/ESXi 4.0 Update 2 adds support for Ubuntu 10.04. For a complete list of supported guest operating systems with this release, see the VMware Compatibility Guide.
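
As a rough illustration of how those new NFS counters might be consumed, here is a hedged Python sketch that filters resxtop batch-mode output (e.g. resxtop -b > stats.csv) for guest-latency columns. resxtop’s exact counter and column naming varies by build, so the substring matching below is an assumption, not the documented schema:

```python
# Hedged sketch: filter resxtop batch-mode output (resxtop -b > stats.csv)
# for NFS datastore latency columns. Batch mode emits one CSV header row
# followed by one row per sample; exact counter names vary by build, so the
# substring matching here ("GAVG", datastore name) is an assumption.
import csv

def nfs_gavg_columns(csv_path: str, datastore: str):
    """Yield (header, samples) for columns naming the datastore and GAVG."""
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    headers, samples = rows[0], rows[1:]
    for i, header in enumerate(headers):
        if datastore in header and "GAVG" in header:
            yield header, [row[i] for row in samples]

if __name__ == "__main__":
    for header, values in nfs_gavg_columns("stats.csv", "nfs-datastore1"):
        print(header, values[:5])  # first five latency samples per column
```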

In addition, this release delivers a number of bug fixes that are documented in the Resolved Issues section of the release notes.

ESXi 4 Update 2 Release Notes

Noted in the release is official support for AMD’s IOMMU in Opteron 6100 and 4100 processors – available in 1P, 2P and 4P configurations. This finally closes the (functional) gap between AMD’s Opteron and Intel’s Nehalem line-up. Likewise, FT support for many new Intel processors has been added, and the new NFS performance counters in resxtop will make storage troubleshooting a bit easier. Grab your applicable update at VMware’s download site now (SnS required).


Quick-Take: AMD Dodeca-core Opteron, Real Soon Now

March 3, 2010

In a recent blog post, John Fruehe recounted a few highlights from the recent server analyst event at AMD/Austin concerning the upcoming release of AMD’s new 12-core (dodeca) Opteron 6100 series processor – previously known as Magny-Cours. While not much “new” was officially said outside of NDA privilege, here’s what we’re reading from his post:

1. Unlike previous launches, AMD is planning to have “boots on the ground” this time, with vendors and supply alignments in place to ship product against anticipated demand. While it is now well known that Magny-Cours has been shipping to certain OEM and institutional customers for some time, our guess is that the 2000/8000-series 6-core HE parts have been hard to come by for a reason – and that reason has 12 cores, not 6;

Obviously the big topic was the new AMD Opteron™ 6000 Series platforms that will be launching very soon.  We had plenty of party favors – everyone walked home with a new 12-core AMD Opteron 6100 Series processor, code name “Magny-Cours”.

– Fruehe on AMD’s pending launch

2. The timing is right! With Intel’s 8-core Nehalem-EX and 6-core Core i7/Nehalem-EP being demoed about, there is more pressure than ever for AMD to step up with a competitive part. Likewise, DDR3 is neck-and-neck with DDR2 in affordability and way ahead with low-power variants that more than compensate for power-hungry CPU profiles. AMD needs to deliver mainstream performance with 24 cores and 96GB DRAM within the power envelope of 12 cores and 64GB to be a player. With 1.35V DDR3 parts paired to better power efficiency in the 6100, this could be a possibility;

We demonstrated a benchmark running on two servers, one based on the Six-Core AMD Opteron processor codenamed “Istanbul,” and one 12-core “Magny-Cours”-based platform.  You would have seen that the power consumption for the two is about the same at each utilization level.  However, there is one area where there was a big difference – at idle.  The “Magny-Cours”-based platform was actually lower!

– AMD’s Fruehe on Opteron 6100’s power consumption

3. Performance in scaled virtualization matters – raw single-threaded performance is secondary. In virtual architectures, clusters of systems must perform as one in an orchestrated ballet of performance and efficiency. For some clusters, dynamic load migration to favor lower power consumption is a priority – relying on solid power efficiency under high-load conditions. For other clusters, workload is spread to maximize the performance available to key workloads – relying on solid power efficiency under generally light loads. For many environments, multi-generational hardware will be commonplace, and AMD is counting on its wider range of migration compatibility to hold on to customers that have not yet jumped ship for Intel’s Nehalem-EP/EX.

“We demonstrated Microsoft Hyper-V running on two different servers, one based on a Quad-Core AMD Opteron processor codenamed “Barcelona” (circa 2007) and a brand new “Magny-Cours”-based system. …companies might have problems moving a 2010 VM to a 2007 server without limiting the VM features. (For example, in order to move a virtual machine from an Intel  “Nehalem”-based system to a “Harpertown” [or earlier] platform, the customer must not enable nested paging in the “Nehalem” virtual machine, which can reduce the overall performance of the VM.)”

– AMD’s Fruehe, extolling the virtues of Opteron generational compatibility

SOLORI’s Take: It would appear that Magny-Cours has more under the MCM hood than a pair of tweaked Istanbul processors (as previously charged). To manage better idle power and equivalent constant-load power in spite of a two-to-one core ratio and a similar 45nm process, AMD must have built considerably better power management into the design; core speed, however, did not benefit. With the standard “Maranello” 6100 series coming in at 1.9, 2.1 and 2.2GHz, an HE variant at 1.7GHz and an SE version running at 2.3GHz, finding parity with an existing cluster of 2.4, 2.6 and 2.8GHz six-core servers may be difficult. Still, Maranello/G34 CPUs will come in at 85, 115 and 140W TDP.

That said, Fruehe has a point about virtualization platform deployment and processor speed: it is not necessary to trim out an entire farm with top-bin parts – only a small portion of the cluster needs to operate with top-band performance marks. The rest of the market is looking for predictable performance, scalability and power efficiency per thread. While SMT makes a good run at efficiency per thread, it does so at the expense of predictable performance. Here’s hoping that AMD’s C1E (or whatever their power-sipping special sauce will be called) does nothing to interfere with predictable performance…

As we’ve said before, memory capacity and bandwidth (as a function of system power and core/thread capacity) are key factors in a CPU’s viability in a virtualization stack. With 12 DIMM slots per CPU (3-DPC, 4-channel), AMD inherits an enviable position over Intel’s current line-up of 2P solutions, able to offer 50% more memory per cluster node without resorting to 8GB DIMMs. That said, it’s up to OEMs to deliver rack server designs that feature 12 DIMMs per CPU and not hold back with 8-DIMM variants. In the blade and half-size market, cramming 8 DIMMs per board (effectively 1-DPC for 2P Magny-Cours) is challenge enough, let alone 24 DIMMs! Perhaps we’ll see single-socket blades with 12 DIMMs (12 cores, 48/96GB DDR3) or 2P blades with only one 12-DIMM memory bank (one-hop NUMA) in the short term.

SOLORI’s 2nd Take: It makes sense that AMD would showcase its leading OEM partners, because its success will be determined by what those OEMs bring to market. With VDI finally poised to make a big market impact, we’d expect to see the first systems delivered in 2-DPC configurations (8 DIMMs per CPU, an economical ~2.5GB/core) which could serve the VDI and HPC segments equally. However, with Windows 7 gaining momentum, what’s good for HPC might not cut it for long in the VDI segment, where expectations of 4-6 VMs per core at 1-2GB/VM are mounting.

Besides the launch date, what wasn’t said was who these OEMs are and how many systems they’ll be delivering at launch. Whoever they are, they need to be (1) financially stronger than AMD, (2) in an aggressive marketing position with respect to today’s key growth market (server and desktop virtualization), and (3) willing to put AMD-based products “above the fold” in their marketing and e-commerce initiatives. AMD needs to “represent” in a big way before a tide of new technologies makes it yesterday’s news. We have high hopes that AMD’s recent “perfect” execution streak will continue.


Quick Take: Year-end DRAM Price Follow-up, Thoughts on 2010

December 30, 2009

Looking at memory prices one last time before the year is out, prices of our “benchmark” Kingston DDR3 server DIMMs are on the decline. While the quad-rank 8GB DDR3/1066 DIMMs are below the $565 target price (at $514) we predicted back in August, the dual-rank equivalents (on our benchmark list) are still hovering around $670 each. Likewise, while the retail price of the 8GB DDR2/667 parts continues to rise, inventory and promotional pricing has kept them flat at $433 each, giving large-footprint DDR2 systems a roughly $2,000 price advantage (based on 64GB systems).
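
A quick back-of-the-envelope check of that ~$2,000 figure (our own arithmetic, using the spot prices from the tables below):

```python
# Back-of-the-envelope check of the DDR2 advantage on a 64GB build:
# eight 8GB DIMMs per system at the December spot prices quoted below.
ddr2_8gb = 433.00   # KVR667D2D4P5/8G, Dec '09
ddr3_8gb = 670.00   # dual-rank 8GB DDR3 street price, Dec '09
sticks = 64 // 8    # eight sticks for a 64GB build
advantage = sticks * (ddr3_8gb - ddr2_8gb)
print(f"DDR2 advantage on 64GB: ${advantage:,.0f}")  # ~$1,900, i.e. ~$2,000
```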

Benchmark Server (Spot) Memory Pricing – Dual Rank DDR2 Only

| DDR2 Reg. ECC Series (1.8V) | Price Jun ’09 | Price Sep ’09 | Price Dec ’09 |
|---|---|---|---|
| KVR800D2D4P6/4G – 4GB 800MHz DDR2 ECC Reg with Parity CL6 DIMM Dual Rank, x4 (5.400W operating) | $100.00 | $117.00 (up 17%) | $140.70 (up 23%; promo price, retail $162) |
| KVR667D2D4P5/4G – 4GB 667MHz DDR2 ECC Reg with Parity CL5 DIMM Dual Rank, x4 (5.940W operating) | $80.00 | $103.00 (up 29%) | $97.99 (down 5%; retail $160) |
| KVR667D2D4P5/8G – 8GB 667MHz DDR2 ECC Reg with Parity CL5 DIMM Dual Rank, x4 (7.236W operating) | $396.00 | $433.00 | $433.00 (flat; promo price, retail $515) |
Benchmark Server (Spot) Memory Pricing – Dual Rank DDR3 Only

| DDR3 Reg. ECC Series (1.5V) | Price Jun ’09 | Price Sep ’09 | Price Dec ’09 |
|---|---|---|---|
| KVR1333D3D4R9S/4G – 4GB 1333MHz DDR3 ECC Reg w/Parity CL9 DIMM Dual Rank, x4 w/Therm Sen (3.960W operating) | $138.00 | $151.00 (up 10%) | $135.99 (down 10%) |
| KVR1066D3D4R7S/4G – 4GB 1066MHz DDR3 ECC Reg w/Parity CL7 DIMM Dual Rank, x4 w/Therm Sen (5.085W operating) | $132.00 | $151.00 (up 15%) | $137.59 (down 9%; retail $162) |
| KVR1066D3D4R7S/8G – 8GB 1066MHz DDR3 ECC Reg w/Parity CL7 DIMM Dual Rank, x4 w/Therm Sen (4.110W operating) | $1,035.00 | $917.00 (down 11.5%) | $667.00 (down 28%; avail. 1/10) |

As the year ends, OEMs are expected to “pull up inventory,” according to DRAMeXchange, in advance of a predicted market shortfall somewhere in Q2/2010. Demand for greater memory capacity is being driven by Windows 7 and 64-bit processors, with 4GB the well-established minimum system footprint at the end of 2009. With Server 2008 systems demanding 6GB+ and an increased shift toward large-memory-footprint virtualization servers and blades, the market price for DDR3 – just turning the corner against DDR2 in Q1/2010 – will likely flatten on growing demand.

SOLORI’s Take: With Samsung and Hynix doubling CAPEX spending in 2010, we’d be surprised to see anything more than a 30% drop in retail 4GB and 8GB server memory prices by Q3/2010, given the anticipated demand. That puts 8GB DDR3/1066 at $470/stick versus $330 for 2x 4GB – on track with our August 2009 estimates. The increase in compute, I/O and memory densities in 2010 will be market-changing, and memory demand will play a small (but significant) role in that development.

In the battle to “feed” the virtualization servers of 2H/2010, the 4-channel “behemoth” Magny-Cours system could have a serious memory/price advantage with 8-DIMM (2-DPC) or 12-DIMM (3-DPC) per-CPU configurations of 64GB (2.6GB/thread) and 96GB (3.9GB/thread) DDR3/1066 using only 4GB sticks (assuming a 2P configuration). Similar GB/thread loads on Nehalem-EP6 “Gulftown” (6-core/12-thread) could be had with 72GB DDR3/800 (18x 4GB, 3-DPC) or 96GB DDR3/1066 (12x 8GB, 2-DPC), presenting the solution architect with a choice between either a performance (memory bandwidth) or price (about $2,900 more) crunch. This means Magny-Cours could show a $2-3K per-system price advantage versus Nehalem-EP6 in $/VM-optimized VDI implementations.

Where the rubber starts to meet the road, in a virtualization context, is with the (unannounced) Nehalem-EP8 (8-core/16-thread), which would need 96GB (12x 8GB, 2-DPC) to maintain 2.6GB/thread capacity against Magny-Cours. This creates a memory-based price differential – in Magny-Cours’ favor – of about $3K per system/blade in the 2P space. At the high end (3.9GB/thread), the EP8 system would need a full 144GB (running at DDR3/800 timing) to maintain GB/thread parity with 2P Magny-Cours – creating a $5,700 system price differential and possibly a good reason why we’ll not actually see an 8-core/16-thread variant of Nehalem-EP in 2010.

Assuming that EP8 has 30% greater thread capacity than Magny-Cours (32 threads versus 24 in a 2P system), a $5,700 difference in system price would require a 2P Magny-Cours system to cost about $19,000 just to make it an even value proposition. We’d be shocked to see an MC processor priced above $2,600/socket, putting the target system price in the $8-9K range (24-core, 2P, 96GB DDR3/1066). That said, with VDI growth on the move, a 4GB/thread baseline is not unrealistic (4 VMs/thread at 1GB per virtual desktop) given current best practices. If our numbers are conservative, that’s roughly $100 in equipment cost per virtual desktop – about 20% less than today’s 2P equivalents in the VDI space. In retrospect, this realization makes VMware’s decision to license VDI per concurrent user and NOT per socket a very forward-thinking one!
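
To make the arithmetic behind that estimate explicit, here is a small sketch; every input is our own estimate from the paragraphs above (not a vendor quote), and it lands just under the ~$100-per-desktop figure:

```python
# Re-deriving the cost-per-desktop estimate above. All inputs are this
# article's own estimates, not vendor pricing.
system_cost = 8_500        # midpoint of the $8-9K 2P Magny-Cours target
threads = 2 * 12           # 2P x 12 cores, no SMT on Magny-Cours
vms_per_thread = 4         # 4 VMs/thread at ~1GB per virtual desktop
desktops = threads * vms_per_thread             # 96 virtual desktops
memory_gb_needed = desktops * 1                 # 96GB at 1GB/desktop
print(f"Memory required: {memory_gb_needed}GB")            # fits the 96GB build
print(f"Equipment cost/desktop: ${system_cost / desktops:.0f}")  # ~$89
```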

Of course, we’re talking about rack servers and double-size or non-standard blades here: after all, where can we put 24 DIMM slots (2P, 3-DPC, 4-channel memory) on an SFF blade? Vendors will have a hard enough time with 8-DIMM-per-processor (2P, 2-DPC, 4-channel memory) configurations today. Plus, all that dense compute and I/O will need to get out of the box somehow (10GE, IB, etc.). It’s easy to see that HPC and virtualization platform demands are converging, and we think that’s good for both markets.

SOLORI’s 2nd Take: Why does the 8GB DIMM draw less power than the 4GB DIMM at the same speed and voltage? The 4GB stick is based on 36x 256M x 4-bit DDR3-1066 FBGAs (60nm) while the 8GB stick is based on 36x 512M x 4-bit DDR3-1066 FBGAs (likely 50nm). According to Samsung, the smaller feature size offers nearly a 40% improvement in power consumption (per FBGA). Since the sticks use the same number of FBGA components (1Gb vs 2Gb densities), the roughly 20% power savings at the stick level seems reasonable.
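
The stick-level numbers quoted in the tables above bear this out; a trivial check:

```python
# Sanity check on the operating power quoted for the KVR1066D3D4R7S parts
# in the tables above: the 8GB stick draws less than the 4GB stick.
watts_4gb = 5.085   # 4GB, 36x 1Gb FBGAs (60nm)
watts_8gb = 4.110   # 8GB, 36x 2Gb FBGAs (likely 50nm)
savings = 1 - watts_8gb / watts_4gb
print(f"Stick-level power savings: {savings:.0%}")  # ~19%, i.e. the ~20% cited
```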

The prospect of lower power at higher memory densities will drive additional market share to modules based on 2Gb DRAM chips. The gulf between DDR2 and DDR3 will continue to expand as tooling shifts to majority-DDR3 production. While minority leader Hynix announced a 50nm 2Gb DDR2 part earlier this year (2009), the chip giant Samsung continues to use 60nm for its 2Gb DDR2. Recently, Hynix announced successful validation of its 40nm-class 2Gb DDR3 part operating at 1333MHz and saving up to 40% power over the 50nm design. Similarly, Samsung is leading the DRAM arms race with 30nm, 4Gb DDR3 production, which will show up in 1.35V 16GB UDIMMs and RDIMMs in 2010, offering additional power-saving benefits over 40-50nm designs. Meanwhile, Samsung has all but abandoned advances on DDR2 feature sizes.

The writing is on the wall for DDR2 systems: unit costs are rising, demand is shrinking, research is stagnant and a new wave of DDR3-based hardware is just over the horizon (1H/2010). As demand and DDR3’s advantages heat up later in 2010, these factors will show the door to DDR2-based systems (which enjoyed a brief resurgence in 2009 due to DDR3 supply problems and marginal power differences). Kudos to AMD for calling the adoption curve spot-on!


NEC Offers “Dunnington” Liposuction, Tops 64-Core VMmark

November 19, 2009

NEC’s venerable Express5800/A1160 is back at the top of the VMmark chart, this time establishing the brand-new 64-core category with a score of 48.23@32 tiles – surpassing its 48-core 3rd-place posting by over 30%. NEC’s new 16-socket, 64-core, 256GB “Dunnington” X7460 Xeon-based score represents a big jump in performance over its predecessor, with a per-tile ratio of 1.507 – up 6% from the 48-core ratio of 1.419.

To put this into perspective, the highest VMmark achieved to date is the 53.73@35 tiles (tile ratio 1.535) posted by the 48-core HP DL785 G6 in August 2009. If you are familiar with the “Dunnington” X7460, you know that it’s a 6-core, 130W giant with 16MB of L3 cache and a 1,000-unit price just south of $3,000 per socket. So that raises the question: how does 6 cores x 16 sockets = 64? Well, it’s not pro-rationing from the Obama administration’s “IT fairness” czar. NEC chose to disable the 4th and 6th cores of each socket, reducing the working cores from 96 to 64.

At $500/core, NEC’s gambit may represent an expensive form of “core liposuction,” but a necessary one to meet VMware’s “logical processors per host” limit of 64. That’s right: vSphere currently caps logical processors according to the following formula:

CPU_Sockets x Cores_Per_Socket x Threads_Per_Core <= 64
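
Rendered as a quick sketch (the limit formula above, checked against the configurations discussed in this post):

```python
# vSphere 4's logical-CPU ceiling, applied to the configurations at hand.
LCPU_LIMIT = 64

def logical_cpus(sockets: int, cores: int, threads_per_core: int = 1) -> int:
    """Logical processors as vSphere counts them."""
    return sockets * cores * threads_per_core

assert logical_cpus(16, 6) == 96             # stock 16P Dunnington: over the cap
assert logical_cpus(16, 4) == LCPU_LIMIT     # NEC's "core liposuction" build
assert logical_cpus(4, 8, 2) == LCPU_LIMIT   # 4P Nehalem-EX, HT enabled
assert logical_cpus(8, 8, 1) == LCPU_LIMIT   # 8P Nehalem-EX, HT disabled
```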

According to VMware, the other 32 cores would have been “ignored” by vSphere had they been enabled. Since “ignored” is a nebulous term (aka “undefined”), NEC did the “scientific” thing by disabling 32 cores and calling the system a 64-core server. The win here: a net 6% improvement in performance per tile over the 6-core configuration – ostensibly from the reduced core loading on the 16MB of L3 cache per socket and reduction in memory bus contention.

Moving forward to 2010, what does this mean for vSphere hardware configurations in the wake of 8-core/16-thread Intel Nehalem-EX and 12-core/12-thread AMD Magny-Cours processors? With Magny-Cours limited to 4-socket systems, we won’t be seeing any VMmarks from the boys in green beyond 48 cores. Likewise, the boys in blue will be trapped by a VMware limitation (albeit a somewhat arbitrary and artificial one) into a 4-socket, 64-thread (HT) configuration or an 8-socket, 64-core (HT-disabled) configuration for their Nehalem-EX platform – even with the six-core variant of EX. It looks like VMware will need to lift the 64-LCPU embargo by Q2/2010 just to keep up.


Quick Take: Red Hat and Microsoft Virtual Inter-Op

October 9, 2009

This week Red Hat and Microsoft announced support for certain of each other’s OSes as guests on their respective hypervisors – Kernel Virtual Machine (KVM) and Hyper-V. This comes on the heels of Red Hat’s Enterprise Server 5.4 announcement last month.

KVM is Red Hat’s new hypervisor, which leverages the Linux kernel to accelerate support for hardware and capabilities. It was Red Hat and AMD that first demonstrated live migration between AMD- and Intel-based hypervisors using KVM late last year – then something of a “Holy Grail” of hypervisor feats. With nearly a year of improvements and integration into its Red Hat Enterprise Server and Fedora “free and open source” offerings, Red Hat is almost ready to strike out in a commercially viable way.

Microsoft now officially supports the following Red Hat guest operating systems in Hyper-V:

Red Hat Enterprise Linux 5.2, 5.3 and 5.4

Red Hat likewise officially supports the following Microsoft guest operating systems in KVM:

Windows Server 2003, 2008 and 2008 R2

The goal of the announcement and associated agreements between Red Hat and Microsoft was to enable a fully supported virtualization infrastructure for enterprises with Red Hat and Microsoft assets. As such, Microsoft and Red Hat are committed to supporting their respective products whether the hypervisor environment is all Red Hat, all Hyper-V or totally heterogeneous – mixing Red Hat KVM and Microsoft Hyper-V as necessary.

“With this announcement, Red Hat and Microsoft are ensuring their customers can resolve any issues related to Microsoft Windows on Red Hat Enterprise Virtualization, and Red Hat Enterprise Linux operating on Microsoft Hyper-V, regardless of whether the problem is related to the operating system or the virtualization implementation.”

Red Hat press release, October 7, 2009

Many in the industry cite Red Hat’s adoption of KVM as a step backwards [from Xen], requiring the re-development of a significant amount of support code. However, Red Hat’s use of libvirt as a common management API has allowed the change to happen much more rapidly than critics’ assumptions allowed. At Red Hat Summit 2009, key Red Hat officials were keen to point out just how tasty their “dog food” is:

Tim Burke, Red Hat’s vice president of engineering, said that Red Hat already runs much of its own infrastructure, including mail servers and file servers, on KVM, and is working hard to promote KVM with key original equipment manufacturer partners and vendors.

And Red Hat CTO Brian Stevens pointed out in his Summit keynote that with KVM inside the Linux kernel, Red Hat customers will no longer have to choose which applications to virtualize; virtualization will be everywhere and the tools to manage applications will be the same as those used to manage virtualized guests.

Xen vs. KVM, by Pam Derringer, SearchDataCenter.com

For system integrators and virtual infrastructure practices, Red Hat’s play is creating opportunities for differentiation. With a focus on lightweight, high-performance, I/O-driven virtualization applications and no need to support the years-old established processes that are dragging on Xen and VMware, KVM stands to leapfrog the competition in the short term.

SOLORI’s Take: This news is good for Red Hat and Microsoft customers alike. Indeed, it shows that Microsoft realizes its licenses are being sold into the enterprise whether or not they run on physical hardware. With 20+:1 consolidation ratios now common, that represents a 5:1 license-to-hardware sale for Microsoft, regardless of the hypervisor. And with KVM’s demonstrated CPU-agnostic migration capabilities, this opens the door to a more diverse virtualization infrastructure than ever before.

On the Red Hat side, it demonstrates how rapidly Red Hat has matured its offering following the shift to KVM earlier this year. While KVM is new to Red Hat, it is not new to Linux or to aggressive early adopters, having been merged into the Linux kernel with 2.6.20 back in early 2007. With support already in active projects like ConVirt (VM life-cycle management), OpenNebula (cloud administration tools), Ganeti, and Enomaly’s Elastic Computing Platform, the game of catch-up for Red Hat and KVM is very likely to be a short one.


Quick Take: Nehalem/Istanbul Comparison at AnandTech

October 7, 2009

Johan De Gelas and crew present an interesting comparison of Dunnington, Shanghai, Istanbul and Nehalem in a new post at AnandTech this week. In the test line-up are the “top bin” parts from Intel and AMD in 4-core and 6-core incarnations:

  • Intel Nehalem-EP Xeon, X5570 2.93GHz, 4-core, 8-thread
  • Intel “Dunnington” Xeon, X7460, 2.66GHz, 6-core, 6-thread
  • AMD “Shanghai” Opteron 2389/8389, 2.9GHz, 4-core, 4-thread
  • AMD “Istanbul” Opteron 2435/8435, 2.6GHz, 6-core, 6-thread

Most important for virtualization systems architects is how vCPU scheduling affects “measured” performance. The telling piece comes from the difference in comparison results when vCPU scheduling is equalized:

AnandTech's Quad Sockets v. Dual Sockets Comparison. Oct 6, 2009.

When comparing the results, De Gelas hits on the I/O factor which chiefly separates VMmark from vAPUS:

The result is that VMmark with its huge number of VMs per server (up to 102 VMs!) places a lot of stress on the I/O systems. The reason for the Intel Xeon X5570’s crushing VMmark results cannot be explained by the processor architecture alone. One possible explanation may be that the VMDq (multiple queues and offloading of the virtual switch to the hardware) implementation of the Intel NICs is better than the Broadcom NICs that are typically found in the AMD based servers.

Johan De Gelas, AnandTech, Oct 2009

This is yet another issue that VMware architects struggle with in complex deployments. Latency in “Dunnington” is a huge contributor to its downfall and a reason why the Penryn architecture was a dead end. Combined with 8 additional threads in the 2P form factor, Nehalem delivers twice the number of hardware execution contexts as Shanghai, resulting in significant efficiencies for Nehalem where small working data sets are involved.

When larger sets are used – as in vAPUS – Istanbul’s additional cores allow it to close the gap to within the clock-speed difference versus Nehalem (about 12%). In contrast to VMmark, which implies a 3:2 advantage for Nehalem, the vAPUS results suggest a much closer performance gap in more aggressive virtualization use cases.

SOLORI’s Take: We differ with De Gelas on the reduction of vAPUS’ data set to accommodate the “cheaper” memory build of the Nehalem system. While this offers some advantages in testing, it also diminishes one of Opteron’s greatest strengths: access to cheap and abundant memory. Here we have the testing conundrum: fit the test around the competitors, or the competitors around the test. The former approach biases toward the “pure performance” aspect of the competitors, while the latter is more typical of use-case testing.

We do not construe this issue as intentional bias on AnandTech’s part; however, it is another vector to consider when evaluating the results. De Gelas delivers a report worth reading in its entirety, and we view it as a primer on the issues that will define the first half of 2010.