Posts Tagged ‘anandtech’

Quick Take: Nehalem/Istanbul Comparison at AnandTech

October 7, 2009

Johan De Gelas and crew present an interesting comparison of Dunnington, Shanghai, Istanbul and Nehalem in a new post at AnandTech this week. In the test line-up are the “top bin” parts from Intel and AMD in 4-core and 6-core incarnations:

  • Intel Nehalem-EP Xeon, X5570 2.93GHz, 4-core, 8-thread
  • Intel “Dunnington” Xeon, X7460, 2.66GHz, 6-core, 6-thread
  • AMD “Shanghai” Opteron 2389/8389, 2.9GHz, 4-core, 4-thread
  • AMD “Istanbul” Opteron 2435/8435, 2.6GHz, 6-core, 6-thread

Most important for virtualization systems architects is how vCPU scheduling affects “measured” performance. The telling piece is the difference in results when vCPU scheduling is equalized:

AnandTech's Quad Sockets v. Dual Sockets Comparison. Oct 6, 2009.

When comparing the results, De Gelas hits on the I/O factor which chiefly separates VMmark from vAPUS:

The result is that VMmark with its huge number of VMs per server (up to 102 VMs!) places a lot of stress on the I/O systems. The reason for the Intel Xeon X5570’s crushing VMmark results cannot be explained by the processor architecture alone. One possible explanation may be that the VMDq (multiple queues and offloading of the virtual switch to the hardware) implementation of the Intel NICs is better than the Broadcom NICs that are typically found in the AMD based servers.

Johan De Gelas, AnandTech, Oct 2009

This is yet another issue VMware architects struggle with in complex deployments. “Dunnington’s” high latency is a major contributor to its downfall and a key reason the Penryn-derived architecture was a dead end. With eight additional threads in the 2P form factor, Nehalem delivers twice as many hardware execution contexts as Shanghai, yielding significant efficiencies for Nehalem where small working data sets are involved.

When larger data sets are used – as in vAPUS – Istanbul’s additional cores allow it to close the gap to within the clock-speed difference versus Nehalem (about 12%). Where VMmark implies a 3:2 advantage for Nehalem, the vAPUS results suggest a much narrower performance gap in more aggressive virtualization use cases.
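As a sanity check on those figures, here is a quick back-of-the-envelope sketch (our arithmetic, not AnandTech’s) of the execution-context counts and the clock-speed gap from the line-up above:

```python
# Top-bin 2P parts from the line-up above (GHz, sockets, cores, threads/core).
x5570 = {"clock": 2.93, "sockets": 2, "cores": 4, "threads_per_core": 2}
opteron_2435 = {"clock": 2.6, "sockets": 2, "cores": 6, "threads_per_core": 1}

def execution_contexts(cpu):
    """Hardware execution contexts in a box: sockets * cores * threads per core."""
    return cpu["sockets"] * cpu["cores"] * cpu["threads_per_core"]

clock_gap = x5570["clock"] / opteron_2435["clock"] - 1

print(execution_contexts(x5570))         # 16 for 2P Nehalem-EP with SMT
print(execution_contexts(opteron_2435))  # 12 for 2P Istanbul
print(f"{clock_gap:.1%}")                # 12.7% -- the "about 12%" clock-speed gap
```

The same function applied to a 2P Shanghai (4 cores, no SMT) gives 8 contexts, which is the “twice the number” comparison against Nehalem noted above.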

SOLORI’s Take: We differ with De Gelas on reducing vAPUS’ data set to accommodate the “cheaper” memory build of the Nehalem system. While this simplifies testing, it also diminishes one of the Opteron’s greatest strengths: access to cheap and abundant memory. Herein lies the testing conundrum: fit the test around the competitors, or the competitors around the test. The former biases the comparison toward the “pure performance” aspect of the competitors, while the latter is more typical of use-case testing.

We do not construe this issue as intentional bias on AnandTech’s part; however, it is another vector to consider when evaluating the results. De Gelas delivers a report worth reading in its entirety, and we view it as a primer on the issues that will define the first half of 2010.

AMD Istanbul Reviews

June 1, 2009

Let’s look at what other sites are saying about Istanbul in detailed performance testing and reviews.

“Make no mistake, though: this Istanbul system is very much a match for the [Nehalem X5550] in terms of power-efficient performance.”

Scott Wasson, TechReport.com

“Istanbul can work with “only” 6 threads, but each thread gets a 64 KB L1 and an in comparison copious amount of 512 KB of L2. In a nutshell, It is clear that the new AMD “Istanbul” Opteron targets a specific market: a few compute intensive HPC applications, large databases and most importantly: “heavy” virtualized workload. This is a relatively “new” market where the AMD 2435 shines.”

Johan De Gelas, AnandTech IT Portal

Johan also correctly points out that VMware’s VMM scheduler defaults to a logical-processor partition boundary of 4 cores. Strictly speaking, the configuration entry – VMkernel.Boot.cpuCellSize – defaults to “zero,” which signifies “auto-configure,” and the auto-configure default is a value of “4.” (Editor: we point this out because changing “auto-configure” defaults can have “unintended” effects when system updates roll out.) While this works well for anything from single-core, quad-socket systems through quad-core, multi-socket systems, it does not work well when the per-socket core count is not a multiple of the cell size. In the case of Istanbul (and Intel’s Dunnington) the correct setting is:

VMkernel.Boot.cpuCellSize = 6

The resulting change elicited the following from De Gelas in his vAPUS Mark I tests: “The six-core Opteron keeps up with the best Xeons available!” These results were, of course, under ESX 3.5 and referred to performance relative to Intel’s Nehalem-EP X5570. In ESX 4.0, the VMM is tuned for Intel’s SMT and provides a boost in performance for Nehalem-EP SKUs with SMT. The vAPUS testing demonstrates some interesting performance and tuning characteristics. Again, it’s worth the look…
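To see why the default cell size penalizes six-core parts, here is a minimal sketch (our illustrative model of the cpuCellSize behavior, not VMware’s actual scheduler code) of how one socket’s cores partition into scheduler cells:

```python
def cells(cores_per_socket, cell_size):
    """Partition one socket's cores into scheduler cells of at most cell_size.
    Illustrative model of VMkernel.Boot.cpuCellSize, not VMware's implementation."""
    cores = list(range(cores_per_socket))
    return [cores[i:i + cell_size] for i in range(0, cores_per_socket, cell_size)]

# The default cell size of 4 splits a 6-core Istanbul/Dunnington socket unevenly:
print(cells(6, 4))  # [[0, 1, 2, 3], [4, 5]] -- a full cell plus a stranded 2-core cell
# Setting cpuCellSize = 6 keeps each 6-core socket inside a single cell:
print(cells(6, 6))  # [[0, 1, 2, 3, 4, 5]]
```

In this model, a quad-core socket under the default is exactly one cell, which is why the issue only surfaces once core counts per socket exceed four.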


“AMD’s strategy (blog) is to talk about virtualization and power efficiency and offering those features across all of its processors. AMD called out Intel for rejiggering features based on the chip.”

Larry Dignan, ZDNet

“So, this is the death of the Quad-Core AMD Opteron processor codenamed “Shanghai,” right? Hardly.”

John Fruehe, AMD

“We are really excited about Istanbul and you can bet that we’ll be introducing new AMD-based rack, tower and blade servers in very short order. AMD’s execution was nothing short of flawless. Once again, they delivered ahead of schedule. And right around the corner you’ll see a full suite of Istanbul server products from HP.”

Paul Gottsegen, Vice-president, Integrated Marketing, Enterprise Servers & Storage, HP

“AMD is introducing its Six-Core AMD Opteron processors (code named Istanbul) ahead of schedule and Dell is pleased to offer it in our portfolio including the Dell PowerEdge 2970, R805 and R905 rack servers and the PowerEdge M605, M805, M905 blade servers. We are committed to bringing efficiency to enterprise computing by simplifying technology and lowering the cost of managing IT environments, and the AMD Istanbul processors in our servers help us do just that.”

Matt McGinnis, Senior Manager, Dell Global Communications

“[With] AMD-V technology coupled with our server design with massive memory capacity and I/O scalability, we are seeing whopping improvements in virtualization performance [in] our initial benchmarks. We expect to continue to have industry-leading benchmarks for four-socket servers with Istanbul in the PowerEdge R905.”

Sally Stevens, Vice President, Dell Platform Marketing

Opteron vs. Nehalem-EP at AnandTech

May 22, 2009

AnandTech’s Johan De Gelas has an interesting article on what he calls “real world virtualization,” using a benchmark process his team calls “vApus Mk I” and running it on ESX 3.5 Update 4. Essentially, it is a suite of Web 2.0-flavored apps running entirely on Windows in a mixed 32/64-bit structure. We’re cautiously encouraged by this effort, as it throws the field of potential reviewers wide open.

Additionally, he finally comes to the same conclusion we’ve presented (in an economic-impact context) about Shanghai’s virtualization value proposition. While his results are consistent with what we have been describing – that Shanghai holds a good price-performance position against Nehalem-EP – some elements of his process need further refinement.

Our biggest issue is with his handling of 32-bit virtual machines (VMs) and the disclosure of AMD’s Rapid Virtualization Indexing (RVI) use with those VMs. In the De Gelas post, he points out some well-known “table thrashing” consequences of TLB misses:

“However, the web portal (MCS eFMS) will give the hypervisor a lot of work if Hardware Assisted Paging (RVI, NPT, EPT) is not available. If EPT or RVI is available, the TLBs (Translation Lookaside Buffer) of the CPUs will be stressed quite a bit, and TLB misses will be costly.”

However, the MCS eFMS web portal (2 VMs) is running in a 32-bit OS. What makes this problematic is that VMware’s default handling of page tables in 32-bit VMs is “shadow page tables” under VMware’s binary translation engine (BT). In other words, RVI is not enabled by default in ESX 3.5.x:

“By default, ESX automatically runs 32bit VMs (Mail, File, and Standby) with BT, and runs 64bit VMS (Database, Web, and Java) with AMD-V + RVI.”

VROOM! Blog, March 2009
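The quoted ESX 3.5 defaults can be summarized in a small sketch (our illustration of the stated policy, not VMware code):

```python
def default_mmu_mode(guest_is_64bit, rvi_available=True):
    """Model of the ESX 3.5 defaults quoted above: 32-bit guests run under
    binary translation with shadow page tables; 64-bit guests get AMD-V + RVI
    when the hardware offers it. Illustrative only, not VMware's logic."""
    if guest_is_64bit and rvi_available:
        return "AMD-V + RVI"
    return "BT + shadow page tables"

print(default_mmu_mode(False))  # the 32-bit MCS eFMS web-portal VMs land here
print(default_mmu_mode(True))   # the 64-bit Database/Web/Java VMs land here
```

This is why the TLB-miss discussion above is at odds with the test configuration: by default, the 32-bit web-portal VMs never exercise RVI at all.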
