Posts Tagged ‘cost per vm’


Quick Take: IBM Tops VMmark, Crushes Record with 4P Nehalem-EX

April 7, 2010

It was merely a matter of time before one of the new core-rich titans – Intel’s 8-core “Beckton” Nehalem-EX (Xeon 7500) or AMD’s 12-core “Magny-Cours” (Opteron 6100) – made a name for itself on VMware’s VMmark benchmark. Today, Intel draws first blood in the form of a 4-processor, 32-core, 64-thread monster from IBM: the x3850 X5 running four Xeon X7560 processors (2.266GHz – 2.67GHz w/turbo, 130W TDP each) and 384GB of DDR3-1066 low-power registered DIMMs. Weighing in at 70.78@48 tiles, the 4P IBM System x3850 X5 beats the next highest system – the 48-core DL785 G5, which set the record of 53.73@35 tiles back in August 2009 – by over 30%.

At $3,800+ per socket for the tested Beckton chip, this is no real 2P alternative. In fact, a pair of Cisco UCS B250 M2 blades will get 52 tiles running for much less money. Looking at processor and memory configurations alone, this is a $67K+ enterprise server, resulting in a moderately-high $232/VM price point for the IBM x3850 X5.
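The $/VM figure can be reproduced with a quick back-of-the-envelope calculation. The sketch below assumes six VMs per VMmark tile (an assumption on our part, though it is consistent with the $232/VM figure above) and takes $67,000 as the low end of the “$67K+” estimate. The function name is hypothetical and used only for illustration.

```python
# Back-of-the-envelope $/VM from a VMmark result.
VMS_PER_TILE = 6  # assumption: six workload VMs per VMmark tile


def cost_per_vm(system_cost, tiles, vms_per_tile=VMS_PER_TILE):
    """Return dollars per VM for a system running `tiles` VMmark tiles."""
    return system_cost / (tiles * vms_per_tile)


# IBM x3850 X5: processor and memory cost only, 48 tiles.
ibm_cost_per_vm = cost_per_vm(67_000, 48)

# Margin of the new score over the previous 53.73 record.
record_margin = 70.78 / 53.73 - 1

print(f"IBM x3850 X5: ${ibm_cost_per_vm:.2f}/VM")
print(f"Margin over previous record: {record_margin:.1%}")
```

Because the system price is quoted as “$67K+”, the result should be read as a floor on $/VM rather than an exact figure.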

SOLORI’s Take: The most interesting aspect of the EX benchmark is its clock-adjusted scaling factor: between 70% and 91% versus a 2P/8-core Nehalem-EP reference (Cisco UCS B200 M1, 25.06@17 tiles). The unpredictable nature of Intel’s “turbo” feature – varying with thermal loads and per-core conditions – makes an exact clock-for-clock comparison difficult. However, if the scaling factor is 90%, the EX blows away our previous expectations about the platform’s scalability. Where did we go wrong when we predicted a conservative 44@39 tiles? We’re looking at three things: (1) a bad assumption about the effectiveness of “turbo” in the EP VMmark case (setting Ref_EP_Clock to 3.33 GHz), (2) underestimating EX’s scaling efficiency (assumed 70%), and (3) assuming a 2.26GHz clock for EX.
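One way to arrive at a 70% to 91% range is to compare VMmark score per core per GHz against the EP reference, once with the EX at its 2.26GHz base clock and once at the 2.93GHz “turbo” figure IBM claims. This is a sketch of that reasoning, not necessarily how the range above was produced. The 2.93GHz EP reference clock is taken from the estimation formulas in this post, and the helper name is hypothetical.

```python
def per_core_per_ghz(score, cores, clock_ghz):
    """VMmark score normalized by core count and clock speed."""
    return score / cores / clock_ghz


# Reference: 2P/8-core Nehalem-EP (Cisco UCS B200 M1), 25.06@17 tiles.
# Assumption: 2.93 GHz reference clock, as in the estimation formulas.
ep_ref = per_core_per_ghz(25.06, 8, 2.93)

# Nehalem-EX result: 70.78@48 tiles on 32 cores.
scaling_at_base = per_core_per_ghz(70.78, 32, 2.26) / ep_ref   # no turbo credit
scaling_at_turbo = per_core_per_ghz(70.78, 32, 2.93) / ep_ref  # full turbo credit

print(f"Scaling if EX ran at 2.26 GHz: {scaling_at_base:.1%}")
print(f"Scaling if EX ran at 2.93 GHz: {scaling_at_turbo:.1%}")
```

The two results bracket the true figure: the more “turbo” actually contributed during the run, the closer the real scaling factor sits to the lower bound.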

Choosing our minimum QPI/HT3 scalability factor of 75%, the predicted performance was derived this way, using the HP ProLiant BL490 G6 as a baseline:

Est. Tiles = EP_Tiles_per_core( 2.13 ) * 32 cores * Scaling_Efficiency( 75% ) * EX_Clock( 2.26 ) / EP_Clock( 2.93 ) = 39 tiles

Est. Score = Est_Tiles( 40 ) * EP_Score_per_Tile( 1.43 ) * Est_EX_Clock( 2.26 ) / Ref_EP_Clock( 2.93 ) = 44.12

Est. Nehalem-EX VMmark -> 44.12@39 tiles
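The two formulas above translate directly into code. The following is a minimal sketch with hypothetical function names; the inputs are the figures quoted in the formulas. Note that the tile estimate is truncated to a whole tile, while the score formula above plugs in 40 tiles, so the sketch does the same to reproduce the quoted score.

```python
def estimate_tiles(tiles_per_core, cores, efficiency, clock_ghz, ref_clock_ghz):
    """Scale a reference tiles-per-core figure by cores, efficiency and clock ratio."""
    return tiles_per_core * cores * efficiency * clock_ghz / ref_clock_ghz


def estimate_score(tiles, score_per_tile, clock_ghz, ref_clock_ghz):
    """Scale a reference score-per-tile figure by tile count and clock ratio."""
    return tiles * score_per_tile * clock_ghz / ref_clock_ghz


# Original prediction: 75% efficiency, 2.26 GHz EX vs. 2.93 GHz EP baseline.
raw_tiles = estimate_tiles(2.13, 32, 0.75, 2.26, 2.93)
est_tiles = int(raw_tiles)  # truncate to whole tiles

# The score formula in the text uses 40 tiles, so we use that value here.
est_score = estimate_score(40, 1.43, 2.26, 2.93)

print(f"Raw tile estimate: {raw_tiles:.2f} -> {est_tiles} tiles")
print(f"Score estimate:    {est_score:.2f}")
```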

Correcting for the as-tested clock/turbo numbers, using AMD’s 2P-to-4P VMmark scaling efficiency of 83%, and shifting to the new UCS baseline (with a newer ESX version), the Nehalem-EX prediction becomes:

Est. Tiles = EP_Tiles_per_core( 2.13 ) * 32 cores * Scaling_Efficiency( 83% ) * EX_Clock( 2.67 ) / EP_Clock( 2.93 ) = 51 tiles

Est. Score = Est_Tiles( 51 ) * EP_Score_per_Tile( 1.47 ) * Est_EX_Clock( 2.67 ) / Ref_EP_Clock( 2.93 ) = 68.32

Est. Nehalem-EX VMmark -> 68.3@51 tiles

Clearly, this approach either overestimates the scaling efficiency or underestimates the “turbo” mode. IBM claims that a 2.93 GHz “turbo” setting is viable where Intel suggests 2.67 GHz is the maximum, so there is a source of potential bias. Looking at the tiles-per-core ratio of the VMmark result, the Nehalem-EX drops from 2.13 tiles per core on EP/2P platforms to 1.5 tiles per core on EX/4P platforms – about a 30% drop in per-core loading efficiency. That indicator matches well with our initial 75% scaling efficiency moving from 2P to 4P – something that AMD demonstrated with Istanbul last August. Given the high TDP of EX and IBM’s 2.93 GHz “turbo” specification, it’s possible that “turbo” is adding clock cycles (and power consumption) and compensating for a “lower” scaling efficiency than we’ve assumed. Looking at the same estimation with 2.93GHz “clock” and 71% efficiency (1.5/2.13), the numbers fall in line with VMmark:

Est. Tiles = EP_Tiles_per_core( 2.13 ) * 32 cores * Scaling_Efficiency( 71% ) * EX_Clock( 2.93 ) / EP_Clock( 2.93 ) = 48 tiles

Est. Score = Est_Tiles( 48 ) * EP_Score_per_Tile( 1.47 ) * Est_EX_Clock( 2.93 ) / Ref_EP_Clock( 2.93 ) = 70.56

Est. Nehalem-EX VMmark -> 70.56@48 tiles

This gives us a good basis for evaluating 2P vs. 4P Nehalem systems: a scaling factor of 71% and the ability to push the clock towards the 3GHz mark within its thermal envelope. Both of these conclusions fit typical 2P-to-4P norms and Intel’s process history.
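To see how the two revised scenarios stack up against the published 70.78@48 result, the same estimation can be run side by side. This is a sketch under the assumptions already stated in the formulas above (2.13 tiles per core, 1.47 score per tile, 2.93GHz EP reference clock); the function and variable names are hypothetical.

```python
MEASURED_SCORE, MEASURED_TILES = 70.78, 48  # published IBM x3850 X5 result


def estimate(efficiency, ex_clock_ghz, tiles_per_core=2.13, cores=32,
             score_per_tile=1.47, ep_clock_ghz=2.93):
    """Return (whole tiles, score) for a given scaling efficiency and EX clock."""
    clock_ratio = ex_clock_ghz / ep_clock_ghz
    tiles = int(tiles_per_core * cores * efficiency * clock_ratio)  # truncate
    score = tiles * score_per_tile * clock_ratio
    return tiles, score


scenarios = {
    "83% @ 2.67GHz": (0.83, 2.67),
    "71% @ 2.93GHz": (0.71, 2.93),
}

results = {}
for label, (efficiency, clock) in scenarios.items():
    tiles, score = estimate(efficiency, clock)
    error = (score - MEASURED_SCORE) / MEASURED_SCORE
    results[label] = (tiles, score, error)
    print(f"{label}: {score:.2f}@{tiles} tiles "
          f"(score error {error:+.1%}, tile delta {tiles - MEASURED_TILES:+d})")
```

The 71% scenario is partly circular, since its efficiency was derived from the measured tiles-per-core ratio, so the tile match is expected; the closeness of the score is the more meaningful check.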

SOLORI’s 2nd Take: So where does that leave AMD’s newest 12-core chip? To date, no VMmark exists for AMD’s Magny-Cours, and AMD chips tend not to do as well in VMmark as their Intel peers due to the benchmark’s SMT-friendly loads. However, we can’t resist using the same analysis against AMD/MC’s 2.4GHz Opteron 6174SE (theoretical) using the 2P HP DL385 G6 as a baseline for core loading and the HP DL785 G6 for tile performance (best of the best cases):

Est. Tiles = HP_Tiles_per_core( 0.92 ) * 48 cores * Scaling_Efficiency( 83% ) * MC_Clock( 2.3 ) / HP_Clock( 2.6 ) = 33 tiles

Est. Score = Est_Tiles( 34 ) * HP_Score_per_Tile( 1.54 ) * Est_MC_Clock( 2.3 ) / Ref_HP_Clock( 2.8 ) = 41.8

Est. 4P Magny-Cours VMmark -> 41.8@33 tiles

That’s nowhere near good enough to top the current 8P, 48-core Istanbul VMmark at 53.73@35 tiles, so we’ll likely have to wait for faster 6100 parts to see any new AMD records. However, assuming AMD’s proposition is still “value 4P,” about 200 VMs at under $18K/server gets you around $90/VM or less.


The Cost of Benchmarks

May 8, 2009

We’ve been challenged to back up our comparison of Nehalem-EP systems to Opteron Shanghai in price-performance based on prevailing VMmark scores available on VMware’s site. In earlier posts, our analysis predicted “comparable” price-performance results between Shanghai and Nehalem-EP systems based on the economics of today’s memory and processor availability.

So what we’ve done here is taken the on-line configurations of some of the benchmark competitors. To make things very simple, we’ve just configured memory and CPU as tested – no HBA or 10GE cards to skew the results. The only exception – as pointed out by our challenger – is that we’ve taken the option of using “street price” memory where “street price” is better than the server manufacturer’s memory price.

Here’s our line-up:

| System | Processor | Qty. | Speed (GHz) | Speed (GHz, Opt) | Memory Configuration | Street Price |
|---|---|---|---|---|---|---|
| Inspur NF5280 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $18,668.58 |
| Dell PowerEdge R710 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $16,893.00 |
| IBM System x 3650M2 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $21,546.00 |
| Dell PowerEdge M610 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $21,561.00 |
| HP ProLiant DL370 G6 | W5580 | 2 | 3.2 | 3.2 | 96GB (12x8GB) DDR3 1066 | $18,636.00 |
| Dell PowerEdge R710 | X5570 | 2 | 2.93 | 3.2 | 96GB (12x8GB) DDR3 1066 | $16,893.00 |
| Dell PowerEdge R805 | 2384 | 2 | 2.7 | 2.7 | 64GB (8x8GB) DDR2 533 | $6,955.00 |
| Dell PowerEdge R905 | 8384 | 4 | 2.7 | 2.7 | 128GB (16x8GB) DDR2 667 | $11,385.00 |

Here we see Dell offering very aggressive DDR3/1066 pricing [for the R710], allowing us to go with on-line configurations, and HP offering overly expensive DDR2/667 memory prices (by a factor of 2), forcing us to go with 3rd party memory. In fact, IBM’s on-line configuration tool did not allow us to build the as-tested memory configuration [for the 3650M2], and neither did Dell’s [for the M610], so we had to apply street memory prices. So here’s how they rank with respect to VMmark:

| System | VMware Version | VMmark Score | VMmark Tiles | Score/Tile | Cost/Tile |
|---|---|---|---|---|---|
| Inspur NF5280 | ESX Server 4.0 build 148592 | 23.45 | 17 | 1.38 | $1,098.15 |
| Dell PowerEdge R710 | ESX Server 4.0 build 150817 | 23.55 | 16 | 1.47 | $1,055.81 |
| IBM System x 3650M2 | ESX Server 4.0 build 148592 | 23.89 | 17 | 1.41 | $1,267.41 |
| Dell PowerEdge M610 | ESX Server 4.0 | 23.9 | 17 | 1.41 | $1,273.59 |
| HP ProLiant DL370 G6 | ESX Server 4.0 build 148783 | 23.96 | 16 | 1.50 | $1,164.75 |
| Dell PowerEdge R710 | ESX Server 4.0 | 24 | 17 | 1.41 | $993.71 |
| Dell PowerEdge R805 | ESX Server 3.5 U4 build 120079 | 11.22 | 8 | 1.40 | $869.38 |
| Dell PowerEdge R905 | ESX Server 3.5 U3 build 120079 | 20.35 | 14 | 1.45 | $813.21 |

As you can easily see, the cost-per-tile (analogous to $/VM) favors the Shanghai systems. In fact, the one system that we’ve taken criticism for including in our previous comparisons – the Supermicro 6026T-NTR+ with 72GB of DDR3/1066 (running at DDR3/800) – actually leads the pack in Nehalem-EP $/tile, but we’ve excluded it from our tables since it has been argued to be a “sub-optimal” configuration and an outlier. Again, the sweet spot for price-performance for Nehalem, Shanghai and Istanbul is in the 48GB to 80GB range with inexpensive memory: simple economics.
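The Score/Tile and Cost/Tile columns are simple ratios of street price, VMmark score and tile count. The sketch below recomputes them for a subset of the systems listed above and ranks them by cost per tile; the data structure and function name are ours, for illustration only.

```python
# (system, street price in $, VMmark score, tiles) for a subset of the line-up.
systems = [
    ("Inspur NF5280", 18668.58, 23.45, 17),
    ("Dell PowerEdge R710", 16893.00, 24.00, 17),
    ("Dell PowerEdge R805", 6955.00, 11.22, 8),
    ("Dell PowerEdge R905", 11385.00, 20.35, 14),
]


def per_tile(price, score, tiles):
    """Return (score per tile, cost per tile)."""
    return score / tiles, price / tiles


# Rank by cost per tile, cheapest first.
ranked = sorted(
    ((name, *per_tile(price, score, tiles)) for name, price, score, tiles in systems),
    key=lambda row: row[2],
)

for name, score_per_tile, cost_per_tile in ranked:
    print(f"{name:<22} {score_per_tile:.2f} score/tile  ${cost_per_tile:,.2f}/tile")
```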

Please note that not one of the 2P VMmark scores listed on VMware’s official VMmark results tally carries the Opteron 2393SE version of the processor (3.1GHz) or HT3-enabled motherboards. It is likely that we’ll see neither HT3-enabled scores nor 2P ESX 4.0 scores until Istanbul’s release in the coming month. If Shanghai’s $/tile is competitive with Nehalem’s today (again, in the 48GB to 80GB configurations), Istanbul – with the same memory and system costs – will be even more so.

Update: AMD’s Margaret Lewis has a similar take with comparison prices for AMD using DDR2/533 configurations. Her numbers – like our previous posts – resolve to $/VM; however, she provides some good “street prices” for more “mainstream” configurations of Intel Nehalem-EP and AMD Shanghai systems. See her results and conclusions on AMD’s blog.


Shanghai Economics 101 – Continued

May 4, 2009

Let’s look at some more real world applications of what we’ve learned from the VMmark results for Nehalem and what it means in a practical comparison. We’ll award Nehalem-EP’s SMT a 25% bonus in our comparisons when vCPU/core count is taken into the measurement. In a 6:1 consolidation, this means 60 vCPUs for 2P Nehalem and 48 vCPUs for Shanghai. Using this bias, the following cost characteristics are revealed for VMs with average memory footprints of 1.5GB, for the Nehalem-EP 3.2GHz system:

| Nehalem-EP Configuration | Street $ | 1536MB VMs (1 vCPU each) | Max vCPUs (6/core) | Cost/VM |
|---|---|---|---|---|
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 24GB DDR3/1333 | $7,017.69 | 13 | 60 | $539.82 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 48GB DDR3/1066 | $7,755.99 | 28 | 60 | $277.00 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 72GB DDR3/800 | $8,708.19 | 42 | 60 | $207.34 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 96GB DDR3/1066 | $21,969.99 | 57 | 60 | $385.44 |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 144GB DDR3/800 | $30,029.19 | 60 | 60 | $500.49 |
| 2 x 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 144GB DDR3/800 | $60,058.38 | 120 | 120 | $500.49 |
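The vCPU ceilings used in this comparison follow from the 6:1 consolidation ratio and the 25% SMT bonus described earlier. A minimal sketch of that arithmetic, with a hypothetical function name and assuming four cores per socket for both platforms (consistent with the 2P/8C configurations listed):

```python
def max_vcpus(sockets, cores_per_socket, vcpus_per_core=6, smt_bonus=0.0):
    """vCPU ceiling for a host at a given consolidation ratio and SMT bonus."""
    physical_cores = sockets * cores_per_socket
    return round(physical_cores * vcpus_per_core * (1 + smt_bonus))


nehalem_vcpus = max_vcpus(2, 4, smt_bonus=0.25)  # 2P/8C with 25% SMT credit
shanghai_vcpus = max_vcpus(2, 4)                 # 2P/8C, no SMT

print(f"Nehalem-EP 2P ceiling: {nehalem_vcpus} vCPUs")
print(f"Shanghai 2P ceiling:   {shanghai_vcpus} vCPUs")
```

Once a configuration has enough memory to hold more single-vCPU VMs than this ceiling allows, the ceiling becomes the binding limit, which is why the 144GB row tops out at 60 VMs.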

We’ll compare this to a Shanghai 2P system at 3.1GHz, with savings measured against the Nehalem-EP system:

| Shanghai 2P/HT3 Configuration | Street $ | 1536MB VMs (1 vCPU each) | Max vCPUs (6/core) | Cost/VM | Savings per VM | Savings % |
|---|---|---|---|---|---|---|
| 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 32GB DDR2/800 | $5,892.12 | 18 | 48 | $327.34 | $212.48 | 39.36% |
| 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 48GB DDR2/800 | $6,352.12 | 28 | 48 | $226.86 | $50.14 | 18.10% |
| 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 64GB DDR2/533 | $6,462.52 | 37 | 48 | $174.66 | $32.68 | 15.76% |
| 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 80GB DDR2/667 | $8,422.12 | 47 | 48 | $179.19 | $28.14 | 13.57% |
| 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 96GB DDR2/667 | $11,968.72 | 48 | 48 | $249.35 | $136.09 | 35.31% |
| 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 128GB DDR2/533 | $14,300.92 | 48 | 48 | $297.94 | $202.55 | 40.47% |
| 2 x 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 128GB DDR2/533 | $28,601.83 | 96 | 96 | $297.94 | $202.55 | 40.47% |
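The savings columns compare each Shanghai configuration's cost per VM with a nearby Nehalem-EP configuration. The pairing used below is inferred from the published savings figures and is an assumption on our part, as are the names in the code. The sketch recomputes cost per VM and savings for three of the rows from street price and VM count.

```python
# (Shanghai memory, price, VMs, Nehalem-EP memory, price, VMs)
# Assumption: each Shanghai row is paired with the Nehalem-EP row shown here.
pairs = [
    ("32GB DDR2/800", 5892.12, 18, "24GB DDR3/1333", 7017.69, 13),
    ("48GB DDR2/800", 6352.12, 28, "48GB DDR3/1066", 7755.99, 28),
    ("80GB DDR2/667", 8422.12, 47, "72GB DDR3/800", 8708.19, 42),
]


def savings(shanghai_price, shanghai_vms, nehalem_price, nehalem_vms):
    """Return (Shanghai $/VM, savings per VM, savings as a fraction of Nehalem $/VM)."""
    shanghai_cost = shanghai_price / shanghai_vms
    nehalem_cost = nehalem_price / nehalem_vms
    saved = nehalem_cost - shanghai_cost
    return shanghai_cost, saved, saved / nehalem_cost


rows = []
for s_mem, s_price, s_vms, n_mem, n_price, n_vms in pairs:
    cost, saved, fraction = savings(s_price, s_vms, n_price, n_vms)
    rows.append((cost, saved, fraction))
    print(f"Shanghai {s_mem} vs. Nehalem-EP {n_mem}: "
          f"${cost:.2f}/VM, saves ${saved:.2f} ({fraction:.2%})")
```

Working from unrounded cost-per-VM values matters here: subtracting the rounded table figures for the 80GB row would be off by a cent.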
