Posts Tagged ‘benchmark’


Quick Take: Dell/Nehalem Take #2, 2P VMmark Spot

September 9, 2009

The new first runner-up spot for VMmark in the “8 core” category was taken yesterday by Dell’s R710, just edging out the previous runner-up, the HP ProLiant BL490 G6, by 0.1% – a virtual dead heat. Equipped with a pair of Xeon X5570 processors ($1,386 each, bulk list) and 96GB of registered DDR3/1066 (12×8GB), the 2U rack-mount R710 weighs in with a tile ratio of 1.43 over 102 VMs:

  • Dell R710 w/redundant high-output power supply, ($18,209)
  • 2 x Intel Xeon X5570 Processors (included)
  • 96GB ECC DDR3/1066 (12×8GB) (included)
  • 2 x Broadcom NetXtreme II 5709 dual-port Gigabit Ethernet w/TOE (included)
  • 1 x Intel PRO/1000 VT quad-port Gigabit Ethernet (1x PCIe-x4 slot, $529)
  • 3 x QLogic QLE2462 FC HBA (1x PCIe slot, $1,219/ea)
  • 1 x LSI1078 SAS Controller (on-board)
  • 8 x 15K SAS OS drives, RAID10 (included)
  • Required ProSupport package ($2,164)
  • Total as Configured: $24,559 ($241/VM, not including storage)
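The per-tile and per-VM figures are simple division over the bill of materials. The sketch below reproduces them, assuming a VMmark 1.x tile of six VMs (so 102 VMs is 17 tiles); `config_total` is a hypothetical helper, not part of any tool.

```python
# Sketch: per-tile and per-VM cost for the Dell R710 as priced above.
# Assumes a VMmark 1.x tile of six VMs; config_total is a hypothetical helper.
VMS_PER_TILE = 6


def config_total(items):
    """Sum (unit_price, quantity) pairs into a system price in dollars."""
    return sum(price * qty for price, qty in items)


r710_items = [
    (18_209, 1),  # R710 w/ redundant PSU, 2 x X5570, 96GB, 8 OS drives
    (529, 1),     # Intel PRO/1000 VT quad-port NIC
    (1_219, 3),   # QLogic QLE2462 FC HBAs
    (2_164, 1),   # required ProSupport package
]

total = config_total(r710_items)
vms = 102
tiles = vms // VMS_PER_TILE

print(f"total ${total:,}, {tiles} tiles, "
      f"${total / tiles:,.0f}/tile, ${total / vms:,.0f}/VM")
# prints: total $24,559, 17 tiles, $1,445/tile, $241/VM
```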

Three Dell/EMC CX3-40f arrays provided the storage backing for the test. Each array included 8GB of cache, 2 enclosures and 15 15K disks; the storage system delivered 19 LUNs of about 300GB each. Intel’s Hyper-Threading and “Turbo Boost” were enabled (8 threads per processor, with cores clocking up to 3.33GHz), as was VT, while embedded SATA and USB were disabled, as is common practice.

At about $1,445/tile ($241/VM), the new “second dog” delivers its best at a 20% price premium over Lenovo’s “top dog”, although the non-standard OS drive configuration accounts for half of the difference, with Dell’s mandatory support package making up the remainder. Using a simple RAID1 SAS configuration and eliminating the support package would have dropped the cost to $20,421 – a dead heat with Lenovo at $182/VM.
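Splitting the gap between the as-configured and stripped-down R710 shows the roughly even split between support and OS drives. This sketch assumes, as the paragraph implies, that everything other than the ProSupport package is attributable to the OS-drive change.

```python
# Sketch: where the difference between the as-configured and stripped-down R710 goes.
# Assumes everything other than ProSupport is attributable to the OS-drive change.
full_config = 24_559
reduced_config = 20_421  # simple RAID1 SAS, no ProSupport (figure from the text)
support = 2_164

savings = full_config - reduced_config
drive_share = savings - support

print(f"savings ${savings:,}: support ${support:,} ({support / savings:.0%}), "
      f"OS drives ${drive_share:,} ({drive_share / savings:.0%})")
# prints: savings $4,138: support $2,164 (52%), OS drives $1,974 (48%)
```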

Comparing the Dell R710 with the 2P, 12-core HP DL385 G6 Istanbul system benchmarked at 15.54@11 tiles:

  • HP DL385 G6 ($5,840)
  • 2 x AMD 2435 Istanbul Processors (included)
  • 64GB ECC DDR2/667 (8×8GB) ($433/DIMM, street)
  • 2 x Broadcom 5709 dual-port Gigabit Ethernet (on-board)
  • 1 x Intel 82571EB dual-port Gigabit Ethernet (1x PCIe slot, $150/ea)
  • 1 x QLogic QLE2462 FC HBA (1x PCIe slot, $1,219/ea)
  • 1 x HP SAS Controller (on-board)
  • 2 x SAS OS drive (included)
  • $10,673/system total (versus $14,696 complete from HP)

Direct pricing puts Istanbul at $1,336/tile ($223/VM), a 7.5% per-VM savings over the Dell R710. Going to the street for memory only changes the Istanbul picture to $970/tile ($162/VM), a 33% savings over the R710.
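The same arithmetic applies to the 11-tile DL385 G6 at both HP's complete price and the street-memory build. A minimal sketch, again assuming six VMs per tile and comparing against the R710's $24,559 over 102 VMs; `per_tile_and_vm` is a hypothetical helper.

```python
# Sketch: HP DL385 G6 (11 tiles) per-tile/per-VM cost versus the R710's $/VM.
# Assumes six VMs per VMmark tile; per_tile_and_vm is a hypothetical helper.
VMS_PER_TILE = 6
R710_PER_VM = 24_559 / 102


def per_tile_and_vm(total, tiles):
    """Return (dollars per tile, dollars per VM) for a system price."""
    return total / tiles, total / (tiles * VMS_PER_TILE)


hp_street = 5_840 + 8 * 433 + 150 + 1_219  # base + street DIMMs + NIC + HBA
hp_direct = 14_696                         # complete system from HP

for label, total in (("direct", hp_direct), ("street memory", hp_street)):
    tile_cost, vm_cost = per_tile_and_vm(total, tiles=11)
    savings = 1 - vm_cost / R710_PER_VM
    print(f"{label}: ${tile_cost:,.0f}/tile, ${vm_cost:,.0f}/VM, "
          f"{savings:.1%} below the R710")
# prints:
# direct: $1,336/tile, $223/VM, 7.5% below the R710
# street memory: $970/tile, $162/VM, 32.8% below the R710
```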

SOLORI’s Take: Istanbul continues to offer a 20-30% CAPEX value proposition against Nehalem in the virtualization use case – even without the IOMMU and higher memory bandwidth promised in the upcoming Magny-Cours. With the HE parts running around $500 per processor, the OPEX benefits are there for Istanbul too. It is difficult to understand why HP charges $900/DIMM for 8GB PC2-5300 sticks when they are available on the street for 50% less – that’s a 100% markup. Looking at what HP charges for 8GB DDR3/1066 ($1,700/DIMM), they are at least consistent. HP’s memory pricing practice makes one thing clear: customers are not buying large memory configurations from their system vendors…
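HP's per-DIMM price can be backed out of the two DL385 G6 totals listed earlier. This sketch assumes, as the text states, that only the memory differs between the $14,696 HP configuration and the $10,673 street build; the result lands near the roughly $900/DIMM cited, a bit more than double street.

```python
# Sketch: back out HP's implied 8GB DDR2 DIMM price from the two DL385 G6 totals.
# Assumes, as the text says, that only the memory differs between the totals.
hp_complete = 14_696   # DL385 G6 as configured by HP
street_build = 10_673  # same system with eight street-priced DIMMs
street_dimm = 433
dimms = 8

hp_dimm = street_dimm + (hp_complete - street_build) / dimms
markup = hp_dimm / street_dimm - 1

print(f"implied HP DIMM: ${hp_dimm:,.0f} ({markup:.0%} over street)")
# prints: implied HP DIMM: $936 (116% over street)
```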

By contrast, Dell appears happy to offer decent prices on 8GB DDR3/1066 with the R710, at approximately $837/DIMM – almost at par with street prices. Looking to see if this parity held up with Dell’s AMD offerings, we examined the prices offered with Dell’s R805: at $680/DIMM, Dell’s prices were significantly better than HP’s, but they still exceeded the market by 50%. Even so, we were able to configure a Dell R805 with AMD 2435s for much less than the equivalent HP system:

  • Dell R805 w/redundant power ($7,214)
  • 2 x AMD 2435 Istanbul Processors (included)
  • 64GB ECC DDR2/667 (8×8GB) ($433/ea, street)
  • 4 x Broadcom 5708 Gigabit Ethernet (on-board)
  • 1 x Intel PRO/1000 PT dual-port Gigabit Ethernet (1x PCIe slot, included)
  • 1 x QLogic QLE2462 FC HBA (1x PCIe slot, included)
  • 1 x Dell PERC SAS Controller (on-board)
  • 2 x SAS OS drive (included)
  • $10,678/system total (versus $12,702 complete from Dell)

This offering from Dell should deliver performance equivalent to HP’s DL385 G6, and similar per-VM savings compared to the Nehalem-based R710. Even at the $12,702 price as delivered from Dell, the R805 represents a potential $192/VM price point – about $50/VM less than the R710, which costs roughly 25% more per VM.
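The R805 figures follow the same pattern. This is a sketch assuming, as the paragraph suggests, that the R805 delivers the DL385 G6's 11 tiles (66 VMs); the R710 baseline is $24,559 over 102 VMs.

```python
# Sketch: Dell R805 $/VM, assuming it matches the DL385 G6's 11 tiles (66 VMs).
VMS = 11 * 6
r710_per_vm = 24_559 / 102

r805_street = 7_214 + 8 * 433  # base system + eight street-priced DIMMs
r805_dell = 12_702             # complete from Dell

for label, total in (("street memory", r805_street), ("Dell complete", r805_dell)):
    per_vm = total / VMS
    print(f"{label}: ${total:,} -> ${per_vm:,.0f}/VM, "
          f"${r710_per_vm - per_vm:,.0f}/VM below the R710")

premium = r710_per_vm / (r805_dell / VMS) - 1
print(f"R710 premium over the Dell-priced R805: {premium:.0%}")
# prints:
# street memory: $10,678 -> $162/VM, $79/VM below the R710
# Dell complete: $12,702 -> $192/VM, $48/VM below the R710
# R710 premium over the Dell-priced R805: 25%
```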


First 48-core VMmark Appears

June 18, 2009

Following in the footsteps of the first 12-core VMmark comes the current champion at 33.85@24 tiles using 48 cores – and, despite the timing, it is not an Istanbul server. Today’s leader is the IBM System x3950 M2 running eight 6-core Intel Xeon MP “Dunnington” X7460 processors with 256GB of DDR2/667 RAM (5.3GB/core).

This score edges out the previous champion, the HP ProLiant DL785 G5 with eight 4-core Opteron 8393SE processors, which reigned at 31.56@21 tiles. Compared with the 4-socket, 24-core IBM System x3850 M2 Xeon that leads the 24-core category, this doubling of socket/core count yielded only a 50% increase in capacity. This scaling inefficiency is less typical of the 2P-to-4P transition but seems to plague the 4P-to-8P segment.
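Normalizing the two 8P results by tiles and cores makes the "brute force" point concrete: the IBM system wins on total score, but the Opteron system gets more score per core. A minimal sketch using only the scores, tiles and core counts quoted above; `scaling_efficiency` is a hypothetical helper, and the 1.5x/2x inputs restate the 24-core to 48-core comparison in the text.

```python
# Sketch: normalize the two 8P VMmark results quoted above by tiles and cores.
results = {
    "IBM x3950 M2 (48 cores)": (33.85, 24, 48),  # (score, tiles, cores)
    "HP DL785 G5 (32 cores)": (31.56, 21, 32),
}

for name, (score, tiles, cores) in results.items():
    print(f"{name}: {score / tiles:.2f} per tile, {score / cores:.3f} per core")

lead = 33.85 / 31.56 - 1
print(f"IBM lead: {lead:.1%} more score from {48 / 32 - 1:.0%} more cores")


def scaling_efficiency(capacity_gain, core_gain):
    """Ratio of capacity multiplier to core-count multiplier (1.0 = linear)."""
    return capacity_gain / core_gain


# The 24-core -> 48-core step described above: about 1.5x capacity from 2x cores.
print(f"4P-to-8P scaling efficiency: {scaling_efficiency(1.5, 2.0):.0%}")
```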

“The x3950 M2 is based on the fourth generation of IBM Enterprise X-Architecture®, and is designed to deliver innovation with enhanced reliability and availability features that enable optimal performance for databases, enterprise applications and virtualized environments.”

IBM News Blurb

“I’m really looking forward to even more virtualization benchmarks which are coming very soon.”

– Elisabeth Stahl, IBM Benchmarking and Systems Performance Blog

Looking at the virtualization notes, we discover what it takes to keep 48 cores fed for such a result:

  • 4 x QLogic QLE2462 HBAs (dual-port, 4Gbps FC)
  • 1 x IBM DS4800 with 4GB cache
    • 19 EXP 810 storage expansion units
    • 1.8TB in 49 LUNs
      • 280 15K disks total
  • 21 IBM x336 clients
    • DP 3.2GHz Xeon
    • 3GB RAM
    • Server 2003 R2
  • 2 IBM x335 clients
    • DP Xeon 3.06GHz
    • 2.5GB RAM
    • Server 2003 R2
  • Eight vSwitches
    • 120 ports total
  • 4 x Intel PRO/1000 PT dual-port 1Gb Ethernet controllers
    • one port per vSwitch
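Dividing the test bed by the 24 tiles gives a rough sense of the per-tile resources involved. This is only a sketch, assuming per-tile ratios are a useful yardstick and that each FC port runs at its nominal 4Gbps; the counts themselves come from the list above.

```python
# Sketch: per-tile view of the x3950 M2 test bed listed above (24 tiles).
tiles = 24
disks = 280
fc_ports = 4 * 2     # four dual-port QLE2462 HBAs
fc_gbps = 4          # nominal line rate per FC port
clients = 21 + 2     # x336 plus x335 client systems

print(f"{disks / tiles:.1f} disks per tile")
print(f"{fc_ports * fc_gbps} Gbps nominal FC "
      f"({fc_ports * fc_gbps / tiles:.2f} Gbps per tile)")
print(f"{clients} client systems driving {tiles} tiles")
# prints:
# 11.7 disks per tile
# 32 Gbps nominal FC (1.33 Gbps per tile)
# 23 client systems driving 24 tiles
```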

While the Dunnington tops the list by sheer brute force, it’s safe to assume that, with the 32-core Opteron nipping at its heels, 48-core Istanbul results will displace it soon (possibly what Elisabeth Stahl’s “Benchmarking and Systems Performance Blog” alludes to above). More interestingly, will AMD’s much-touted “HT Assist” allow the 8P Istanbul to break the 4P-to-8P “curse” of scaling inefficiency? If not, it would show that much work remains before the relatively “massive” core counts of 2010 are upon us.


Opteron vs. Nehalem-EP at AnandTech

May 22, 2009

AnandTech’s Johan De Gelas has an interesting article on what he calls “real world virtualization”, using a benchmark process his team calls “vApus Mk I” running on ESX 3.5 Update 4. Essentially, it is a suite of Web 2.0-flavored apps running entirely on Windows in a mixed 32/64-bit structure. We’re cautiously encouraged by this effort, as it throws the field of potential reviewers wide open.

Additionally, he finally comes to the same conclusion we’ve presented (in an economic impact context) about Shanghai’s virtualization value proposition. While his results are consistent with what we have been describing – that Shanghai has a good price-performance position against Nehalem-EP – some elements of his process need further refinement.

Our biggest issue is his handling of 32-bit virtual machines (VMs) and his disclosure that AMD’s Rapid Virtualization Indexing (RVI) was used with 32-bit VMs. In his post, De Gelas points out some well-known “table thrashing” consequences of TLB misses:

“However, the web portal (MCS eFMS) will give the hypervisor a lot of work if Hardware Assisted Paging (RVI, NPT, EPT) is not available. If EPT or RVI is available, the TLBs (Translation Lookaside Buffer) of the CPUs will be stressed quite a bit, and TLB misses will be costly.”
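The quote's warning that TLB misses become costly with hardware-assisted paging follows from the two-dimensional page walk that RVI and EPT perform: every guest page-table entry sits at a guest-physical address that must itself be translated through the host's nested tables. The sketch below counts the textbook worst case, assuming 4-level nested tables on a 64-bit host and ignoring any caching of intermediate entries (real processors cache them, so actual costs are lower). `nested_walk_refs` is a hypothetical name.

```python
# Sketch: worst-case memory references for one TLB miss under two-dimensional
# (nested) paging, ignoring any caching of intermediate page-table entries.


def nested_walk_refs(guest_levels, host_levels):
    """Page-table reads needed to translate one guest-virtual address.

    Each guest page-table entry lives at a guest-physical address that needs a
    full host walk (host_levels reads) before the entry itself can be read, and
    the final guest-physical address needs one more host walk:
        guest_levels * (host_levels + 1) + host_levels
    """
    return guest_levels * (host_levels + 1) + host_levels


HOST_LEVELS = 4  # assumption: 4-level nested tables on a 64-bit host

for guest, levels in (("32-bit non-PAE", 2), ("64-bit", 4)):
    print(f"{guest} guest: {nested_walk_refs(levels, HOST_LEVELS)} references "
          f"(vs {levels} for a native walk)")
# prints:
# 32-bit non-PAE guest: 14 references (vs 2 for a native walk)
# 64-bit guest: 24 references (vs 4 for a native walk)
```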

However, the MCS eFMS web portal (2 VMs) runs on a 32-bit OS. This is problematic because VMware’s default handling of page tables in 32-bit VMs is “shadow page tables” via VMware’s binary translation engine (BT). In other words, RVI is not enabled by default for 32-bit VMs in ESX 3.5.x:

“By default, ESX automatically runs 32bit VMs (Mail, File, and Standby) with BT, and runs 64bit VMS (Database, Web, and Java) with AMD-V + RVI.”

– VROOM! Blog, 3/2009
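The default in the VROOM! quote amounts to a rule keyed on guest bitness. The sketch below encodes only the two cases the quote states (on RVI-capable AMD hosts); `default_monitor_mode` is hypothetical and is not a VMware API. Adding the 32-bit MCS eFMS portal shows why it would run under BT with shadow page tables unless that default is overridden.

```python
# Sketch: the ESX 3.5 default described in the VROOM! quote, keyed on guest bitness.
# default_monitor_mode is hypothetical; it covers only the cases the quote states.


def default_monitor_mode(guest_bits):
    if guest_bits == 64:
        return "AMD-V + RVI"
    if guest_bits == 32:
        return "BT (shadow page tables)"
    raise ValueError("the quoted default only covers 32- and 64-bit guests")


vms = {"Mail": 32, "File": 32, "Standby": 32, "Database": 64, "Web": 64,
       "Java": 64, "MCS eFMS portal": 32}
for name, bits in vms.items():
    print(f"{name}: {bits}-bit -> {default_monitor_mode(bits)}")
```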
