Posts Tagged ‘nehalem-ep’


Fujitsu RX300 S5 Rack Server Takes 8-core VMmark Lead

November 11, 2009

Fujitsu’s RX300 S5 rack server takes the top spot in VMware’s VMmark for 8-core systems today with a score of 25.16@17 tiles. Loaded with two of Intel’s top-bin 3.33GHz, 130W Nehalem-EP processors (W5590, turbo to 3.6GHz per core) and 96GB of DDR3-1333 R-ECC memory, the RX300 bested the former champ – the HP ProLiant BL490c G6 blade – by only about 2.5%.

With 17 tiles and 102 virtual machines on a single 2U box, the RX300 S5 demonstrates precisely how well vSphere scales on today’s x86 commodity platforms. It also appears to demonstrate both the value and the limits of Intel’s “turbo mode” in its top-bin Nehalem-EP processors, especially in the virtualization use case; we’ll get to that later. In any case, the resulting equation is:

More * (Threads + Memory + I/O) = Dense Virtualization

We could have added “higher execution rates” to that equation; however, virtualization is a scale-out application in which threads, memory pool and I/O capabilities dominate the capacity equation – not clock speed. Adding 50% more clock provides smaller virtualization gains than adding 50% more cores, and reducing memory and context latency likewise provides better gains than simply upping the clock speed. That’s why a dual-processor, quad-core 2.6GHz Nehalem system will crush a quad-processor, dual-core 3.5GHz (ill-fated) Tulsa.

Speaking of Tulsa, unlike Tulsa’s rather anaemic first-generation hyper-threading, Intel’s improved SMT in Nehalem “virtually” adds more core “power” to the Xeon by contributing up to 100% more thread capacity. This is demonstrated by Nehalem-EP’s contribution of roughly 2 tiles per core in VMmark, where AMD’s six-core Istanbul provides only about 1 tile per core. But what exactly is a VMmark tile, and how do cores versus threads play into the result?
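As a rough check on the “tiles per core” claim, the sketch below divides each system’s tile count by its core and thread counts. The Nehalem-EP figures (17 tiles on two SMT-enabled quad-cores) come from the Fujitsu result above; the Istanbul figures (11 tiles on two six-cores without SMT) come from the HP DL385 G6 result later on this page. The class and function names are ours, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class VMmarkResult:
    name: str
    tiles: int
    cores: int
    threads: int


def density(result: VMmarkResult) -> tuple[float, float]:
    """Return (tiles per physical core, tiles per hardware thread)."""
    return result.tiles / result.cores, result.tiles / result.threads


results = [
    VMmarkResult("Fujitsu RX300 S5 (2x W5590)", tiles=17, cores=8, threads=16),
    VMmarkResult("HP DL385 G6 (2x Opteron 2435)", tiles=11, cores=12, threads=12),
]

for r in results:
    per_core, per_thread = density(r)
    print(f"{r.name}: {per_core:.2f} tiles/core, {per_thread:.2f} tiles/thread")
```

Per core, the SMT-enabled Nehalem-EP lands near two tiles and Istanbul near one; per hardware thread, the two are much closer (about 1.06 versus 0.92).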


The Illustrated VMmark "Tile" Load

As you can see, a “VMmark Tile” – or just “tile” for short – is composed of 6 virtual machines, half running Windows and half running SUSE Linux. Likewise, half of the tile’s virtual machines run in 64-bit mode while the other half run in 32-bit mode. As a whole, the tile is composed of 10 virtual CPUs, 5GB of RAM and 62GB of storage. Looking at how the parts contribute to the whole, the tile is relatively balanced:

Operating System / Mode                32-bit   64-bit   Memory   vCPU   Disk
Windows Server 2003 R2                 67%      33%      45%      50%    58%
SUSE Linux Enterprise Server 10 SP2    33%      67%      55%      50%    42%
32-bit (all VMs)                       50%      N/A      30%      40%    58%
64-bit (all VMs)                       N/A      50%      70%      60%    42%
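The vCPU and memory percentages in the table can be reproduced from a per-VM breakdown of the tile. The sketch below assumes the VMmark 1.x tile layout as we understand it (mail, Java and standby servers on Windows; web, database and file servers on SUSE), with the per-VM vCPU and memory sizes shown in the comments; treat those sizes as our assumption rather than a quote from the VMmark documentation. Disk is left out because a per-VM disk breakdown is not given here.

```python
from collections import defaultdict

# Assumed VMmark 1.x tile layout: (workload, OS, bits, vCPUs, memory in GB)
TILE = [
    ("mail",     "Windows", 32, 2, 1.0),
    ("java",     "Windows", 64, 2, 1.0),
    ("standby",  "Windows", 32, 1, 0.25),
    ("web",      "SLES",    64, 2, 0.5),
    ("database", "SLES",    64, 2, 2.0),
    ("file",     "SLES",    32, 1, 0.25),
]


def shares(key_index: int) -> dict:
    """Percent of tile vCPUs and memory, grouped by OS (index 1) or bitness (index 2)."""
    total_cpu = sum(vm[3] for vm in TILE)
    total_mem = sum(vm[4] for vm in TILE)
    cpu, mem = defaultdict(float), defaultdict(float)
    for vm in TILE:
        cpu[vm[key_index]] += vm[3]
        mem[vm[key_index]] += vm[4]
    # (vCPU %, memory %) per group, rounded to whole percent
    return {k: (round(100 * cpu[k] / total_cpu), round(100 * mem[k] / total_mem)) for k in cpu}


print("Totals:", sum(vm[3] for vm in TILE), "vCPUs,", sum(vm[4] for vm in TILE), "GB RAM")
print("By OS (vCPU %, memory %):", shares(1))
print("By mode (vCPU %, memory %):", shares(2))
```

If these per-VM sizes are right, they account for the 10 vCPU / 5GB totals and every vCPU and memory cell in the table.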

If we stop here and accept that today’s best x86 processors from AMD and Intel are capable of providing 1 tile for each thread, we can look at the thread count and calculate the number of tiles and the resulting memory requirement. While that sounds like a good “rule of thumb” approach, it ignores specific use-case scenarios in which synthetic threads (like HT and SMT) do not scale linearly the way core threads do; in those cases, SMT accounts for only about a 12% gain over a single-threaded core, clock-for-clock. For this reason, processors from AMD and Intel in 2010 will feature more cores – 12 for AMD and 8 for Intel in their Magny-Cours and Nehalem-EX (aka “Beckton”), respectively.
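A minimal sketch of that rule of thumb, assuming one tile per hardware thread and 5GB of RAM per tile (the tile size given above). The function name and the `smt_gain` knob are ours: 1.0 treats the second SMT thread as a full core, while 0.12 models the pessimistic case where it adds only about 12% of a core.

```python
import math

GB_PER_TILE = 5  # memory footprint of one VMmark tile


def estimate_tiles(cores: int, smt: bool, smt_gain: float = 1.0) -> tuple[int, int]:
    """Rule-of-thumb (tiles, GB of RAM) for a given core count.

    One tile per effective hardware thread. With SMT, the second thread on
    each core counts as `smt_gain` of a core (hypothetical parameter).
    """
    effective_threads = cores * (1 + smt_gain) if smt else cores
    tiles = math.floor(effective_threads)
    return tiles, tiles * GB_PER_TILE


print(estimate_tiles(8, smt=True))                 # 2P quad-core Nehalem-EP, linear SMT
print(estimate_tiles(8, smt=True, smt_gain=0.12))  # same box, pessimistic SMT scaling
print(estimate_tiles(12, smt=False))               # 2P six-core Istanbul
```

For reference, the Fujitsu system actually ran 17 tiles on 8 SMT-enabled cores, slightly beating the linear estimate of 16; the pessimistic setting shows how far off the rule of thumb can be when SMT does not scale.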

Learning from the Master

If we want to gather some information about a specific field, we consult an expert, right? Judging from the results, Fujitsu’s latest dual-processor entry has definitely earned the title “Master of VMmark” in 2P systems – at least for now. So instead of the usual VMmark $/VM analysis (which is well established for recent VMmark entries), let’s look at the solution profile and try to glean some nuggets to take back to our data centers.

It’s Not About Raw Speed

First, we’ve noted that the processor used is not Intel’s standard “rack server” fare, but the more workstation-oriented W-series Nehalem at 130W TDP. With “turbo mode” active, this CPU is capable of driving the 3.33GHz core – on a per-core basis – up to 3.6GHz. Since we’re seeing only a 2.5% improvement in overall score versus the ProLiant blade at 2.93GHz, we can infer that the 2.93GHz X5570 Xeon is spending a lot of time at 3.33GHz – its “turbo” speed – while the power-hungry W5590 spends little time at 3.6GHz. How can we say this? By looking at the ratio of score to tile count as a function of clock speed.

We know that the X5570 can run up to 3.33GHz, per core, according to thermal conditions on the chip. With proper cooling, this could mean up to 100% of the time (sorry, Google). Assuming for a moment that this is the case in the HP test environment (and there is sufficient cause to think so), the ratio of VMmark score to tile count and CPU frequency is 0.433 (24.54/17/3.33). If we examine the same ratio for the W5590, assuming a clock speed of 3.33GHz, we get 0.444 – a difference of 2.5%, or the contribution of “turbo” in the W5590. Likewise, if you back-figure the “apparent speed” of the X5570 using the ratio of the clock-locked W5590, you arrive at 3.25GHz for the X5570 (an 11% gain over its base clock). In either case, it is clear that “turbo” is a better value at the low end of the Nehalem spectrum, as there isn’t enough thermal headroom for it to work well in the W-series.
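The ratio argument above is easy to reproduce. This sketch simply encodes the article’s own assumptions (the X5570 pinned at its 3.33GHz turbo ceiling, the W5590 treated as clock-locked at its 3.33GHz base); the function name is ours.

```python
def score_per_tile_ghz(score: float, tiles: int, ghz: float) -> float:
    """VMmark score normalized by tile count and assumed effective clock (GHz)."""
    return score / tiles / ghz


# HP BL490c G6 (2x X5570): assumed to run at its 3.33GHz turbo ceiling
x5570 = score_per_tile_ghz(24.54, 17, 3.33)
# Fujitsu RX300 S5 (2x W5590): assumed clock-locked at its 3.33GHz base
w5590 = score_per_tile_ghz(25.16, 17, 3.33)

turbo_contribution = w5590 / x5570 - 1
# Back-figure the X5570's apparent clock using the W5590's ratio
apparent_x5570_ghz = 24.54 / 17 / w5590
gain_over_base = apparent_x5570_ghz / 2.93 - 1

print(f"X5570 ratio: {x5570:.3f}, W5590 ratio: {w5590:.3f}")
print(f"Turbo contribution in the W5590: {turbo_contribution:.1%}")
print(f"Apparent X5570 clock: {apparent_x5570_ghz:.2f}GHz ({gain_over_base:.0%} over 2.93GHz)")
```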

VMmark Equals Meager Network Use

Second, we’re not seeing “fancy” networking tricks out of VMmark submissions. In the past, we’ve commented on the use of “consumer grade” switches in VMmark tests. For this reason, we can consider VMmark’s I/O dependency as related almost exclusively to storage. With respect to networking, the Fujitsu team simply interfaced three 1Gbps network adapter ports to the internal switch of the blade enclosure used to run the client-side load suite and ran with the test. Here’s what that looks like:


Networking Simplified: The leader’s simple virtual networking topology.

Note that the network interfaces used for the VMmark trial are not from the on-board i82575EB network controller but from the PCI-Express quad-port adapter using its older cousin, the i82571EB. What is key here is that VMmark is not bound by network performance, and additional network ports are more likely to increase IRQ sharing (and reduce performance) than to deliver any benefit from “optimized” network flows.

Keeping Storage “Simple”

Third, Fujitsu’s approach to storage is elegantly simple: several “inexpensive” arrays with intelligent LUN allocation. For this, Fujitsu employed eight of its ETERNUS DX80 Disk Storage Systems with 7 additional storage shelves for a total of 172 working disks and 23 LUNs. For simplicity, Fujitsu used a pair of 8Gbps FC ports to feed ESX and at least one port per DX80, all connected through a Brocade 5100 fabric switch. The result looked something like this:


Fujitsu's VMmark Storage Topology: 8 Controllers, 7 Shelves, 172 Disks and 23 LUNs.

And yes, the ESX server is configured to boot from SAN, using no locally attached storage. Note that the virtual machine configuration files, VM swap and ESX boot/swap are contained in a separate DX80 system. This “non-default” approach allows the working VMDKs of the virtual machines to be isolated – from a storage perspective – from the swap file overhead, about 5GB per tile. Again, this is a benchmark scenario, not an enterprise deployment, so trade-offs are in favour of performance, not CAPEX or OPEX.

Even if the DX80 solution falls into the $1K/TB range, saying that this approach to storage is “economic” requires a deeper look. At 33 rack units for the solution – including the FC switch but not the blade chassis – this configuration has a hefty datacenter footprint. In contrast to the old-school server/blade approach, however, one rack at roughly three servers per U is a huge savings over the 2 racks of blades or 3 racks of 1U rack servers. Had each of those servers or blades had a mirrored pair of disks, we’d be talking about 200+ disks spinning in those racks versus the 172 disks in the ETERNUS arrays, so the consolidated approach still represents a 15.7% savings in storage-related power/space.
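The 15.7% and “three servers per U” figures follow from a few counts. This sketch assumes, as the paragraph does, one physical server per VM in the old-school case with a mirrored pair of disks each, and counts the 2U RX300 S5 plus the 33U of storage and FC switching; the constant names are ours.

```python
VMS = 102               # 17 tiles x 6 VMs per tile
DISKS_PER_SERVER = 2    # assumption: one mirrored pair per physical server
ETERNUS_DISKS = 172     # working disks across the eight DX80 systems and shelves
RACK_UNITS = 33 + 2     # storage plus FC switch (33U), plus the 2U RX300 S5

legacy_disks = VMS * DISKS_PER_SERVER
disk_savings = 1 - ETERNUS_DISKS / legacy_disks
vms_per_u = VMS / RACK_UNITS

print(f"{legacy_disks} legacy disks vs {ETERNUS_DISKS}: {disk_savings:.1%} fewer spindles")
print(f"{vms_per_u:.1f} virtual servers per rack unit")
```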

When will storage catch up?

Compared to a 98% reduction in network ports, a 30-80% reduction in server/storage CAPEX (based on $1K/TB SAN) and a 50-75% reduction in overall datacenter footprint, why is a 15% reduction in datacenter storage footprint acceptable? After all, storage – in the Fujitsu VMmark case – now represents 94% of the datacenter footprint. Even if the load were spread less aggressively across five ESX servers (a conservative 20:1 loading), the share of space taken by storage only falls to about 75%.

How can storage catch up to virtualization densities? First, with 2.5″ SAS drives, a bank of 172 disks can be made to occupy only 16U with very strong performance. This drops storage to about 60% of the datacenter footprint – 10U for hypervisors, 16U for storage, 26U total for this example. Moving from 3.5″ to 2.5″ drives takes care of the physical scaling issue with acceptable returns, but yields only minimal power savings.

Saving power in storage platforms is not going to be achieved by simply shrinking disk drives; shrinking the NUMBER of disks required per “effective” LUN is what’s necessary to overcome the power demands of modern, high-performance storage. This is where non-traditional technology like FLASH/SSD is being applied to improve performance while using fewer disks and proportionately less power. For example, instead of dedicating disks on a per-LUN basis, carving LUNs out of disk pools accelerated by FLASH (a hybrid storage pool) can result in a 30-40% reduction in disk count – when applied properly – and that means a 30-40% reduction in datacenter space and power utilization.
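The footprint percentages in this section come from simple ratios of rack units. This sketch assumes 2U per ESX server (as with the RX300 S5) and uses the 33U, 10U and 16U figures from the text; the helper name is ours. The ratios come out at about 94%, 77% and 62%, close to the rounded figures quoted. The last loop applies the 30-40% hybrid-pool spindle reduction to the 172-disk baseline.

```python
def storage_share(storage_u: float, hypervisor_u: float) -> float:
    """Fraction of the combined rack footprint taken by storage."""
    return storage_u / (storage_u + hypervisor_u)


scenarios = {
    "Fujitsu as tested (one 2U server)": (33, 2),
    "Five 2U ESX servers at ~20:1": (33, 10),
    "2.5-inch SAS shelves, five servers": (16, 10),
}
for label, (storage_u, hyper_u) in scenarios.items():
    print(f"{label}: storage = {storage_share(storage_u, hyper_u):.0%} of footprint")

# Hybrid (flash-accelerated) pool: 30-40% fewer spindles than the 172-disk baseline
for reduction in (0.30, 0.40):
    print(f"{reduction:.0%} reduction -> about {round(172 * (1 - reduction))} disks")
```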

Lessons Learned

Here are our “take aways” from the Fujitsu VMmark case:

1) Top-bin performance is at the losing end of diminishing returns. Unless your budget can accommodate this fact, purchasing decisions about virtualization compute platforms need to be aligned with $/VM within an acceptable performance envelope. When shopping for CPUs, make sure the top bin’s “little brother” has the same architecture and feature set, and go with the unit priced for the mainstream. (Don’t forget to factor memory density into the equation…) Regardless, try to stay within a $190-280/VM equipment budget for your hypervisor hardware and shoot for a 20-to-1 consolidation ratio (that’s at least $3,800-5,600 per server/blade).

2) While networking is not important to VMmark, this is likely not the case for most enterprise applications. Therefore, VMmark is not a good comparison case for your network-heavy applications. Also, adding more network ports increases capacity and redundancy, but does so at the risk of IRQ-sharing problems (ESX, not ESXi), not to mention the additional cost and number of network switching ports. This is where we think 10GE will significantly change the equation in 2010. Remember to add up the total number of in-use ports – including out-of-band management – when factoring in switch density. For net new installations, look for a switch that provides 10GE/SR or 10GE/CX4 options, and go with 10GE/SR if power savings are driving your solution.

3) Storage should be simple, easy to manage, cheap (relatively speaking), dense and low-power. To meet these goals, look for storage technologies that utilize FLASH memory, tiered spindle types, smart block caching and other approaches to limit spindle count without sacrificing performance. Remember to factor in at least the cost of DAS when approximating your storage budget – about $150/VM in simple consolidation cases and $750/VM for more mission critical applications (that’s a range of $9,000-45,000 for a 3-server virtualization stack). The economies in managed storage come chiefly from the administration of the storage, but try to identify storage solutions that reduce datacenter footprint including both rack space and power consumption. Here’s where offerings from Sun and NexentaStor are showing real gains.
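Lessons 1 and 3 boil down to a few multiplications. This sketch uses the article’s $/VM rules of thumb and 20:1 consolidation ratio; the function name and dictionary keys are ours.

```python
def stack_budget(servers: int, vms_per_server: int = 20,
                 hw_per_vm: tuple = (190, 280),
                 storage_per_vm: tuple = (150, 750)) -> dict:
    """Rule-of-thumb equipment budget ranges for a small virtualization stack.

    hw_per_vm:      hypervisor hardware $/VM (low, high), from lesson 1
    storage_per_vm: storage $/VM (simple consolidation, mission critical), from lesson 3
    """
    vms = servers * vms_per_server
    return {
        "vms": vms,
        "hypervisor_per_server": tuple(p * vms_per_server for p in hw_per_vm),
        "storage_total": tuple(p * vms for p in storage_per_vm),
    }


print(stack_budget(3))  # the 3-server stack discussed above
```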

We’d like to see VMware update VMmark to include system power specifications so we can better gauge – from the sidelines – which solution stack(s) perform according to our needs. VMmark has served its purpose by giving the community a standard against which different platforms could be compared in terms of resultant performance. With the world’s eyes on power consumption and the ecological impact of datacenter choices, adding a “power utilization component” to the “server side” of the VMmark test would not be a significant “tweak.” Here’s how we think it can be done:

  1. Require power consumption of the server/VMmark related components be recorded, including:
    1. the ESX platform (rack server, blade & blade chassis, etc.)
    2. the storage platform providing ESX and test LUN(s) (all heads, shelves, switches, etc.)
    3. the switching fabric (e.g. Ethernet, 10GE, FC, etc.)
  2. Power delivered to the test harness platforms, client load machines, etc. can be ignored;
  3. Power measurements should be recorded at the following times:
    1. All equipment off (validation check);
    2. Start-up;
    3. Single tile load;
    4. 100% tile capacity;
    5. 75% tile capacity;
    6. 50% tile capacity;
  4. Power measurements should be recorded using a time-power data-logger with readings recorded as 5-minute averages;
  5. Notations should be made concerning “cache warm-up” intervals, if applicable, where “cache optimized” storage is used.
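The proposal does not prescribe tooling, so the sketch below assumes a data logger that exports timestamped wattage readings, each tagged with one of the test phases listed above; the phase labels and the function name are hypothetical. It buckets readings into 5-minute windows per phase and averages each bucket. Cache warm-up notations (item 5) would be recorded alongside and are not modeled here.

```python
from collections import defaultdict
from statistics import mean

WINDOW_S = 300  # 5-minute averaging window (item 4)
PHASES = ("all off", "start-up", "single tile", "100% tiles", "75% tiles", "50% tiles")


def five_minute_averages(samples):
    """Average (t_seconds, watts, phase) readings into 5-minute windows per phase.

    Returns {phase: [(window_start_s, average_watts), ...]}, ordered by window start.
    """
    buckets = defaultdict(list)
    for t, watts, phase in samples:
        if phase not in PHASES:
            raise ValueError(f"unknown test phase: {phase!r}")
        window_start = int(t // WINDOW_S) * WINDOW_S
        buckets[(phase, window_start)].append(watts)
    averages = defaultdict(list)
    for (phase, window_start), readings in sorted(buckets.items()):
        averages[phase].append((window_start, mean(readings)))
    return dict(averages)
```

Keeping the phase tag on every reading means the full-, 75%- and 50%-capacity runs can be reported side by side from a single log.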

Why is this important? In the wake of the VCE announcement, solution stacks like VCE need to be measured against each other in an easy-to-“consume” way. Is VCE the best platform versus a component solution provided by your local VMware integrator? Given that the differentiated VCE components are chiefly UCS, Cisco switching and EMC storage, it will be helpful to have a testing platform that can better differentiate “packaged solutions” instead of uncorrelated vendor “propaganda.”

Let us know what your thoughts are on the subject, either on Twitter or on our blog…


Quick Take: HP Blade Tops 8-core VMmark w/OC’d Memory

September 25, 2009

HP’s ProLiant BL490c G6 server blade now tops the VMware VMmark table for 8-core systems – just squeaking past rack servers from Lenovo and Dell with a score of 24.54@17 tiles: a new 8-core record. The half-height blade was equipped with two quad-core Intel Xeon X5570 processors (Nehalem-EP, 95W TDP) and 96GB of ECC Registered DDR3-1333 (12x 8GB, 2-DIMM/channel) memory.

In our follow-up, we found that HP’s online configuration tool does not allow for DDR3-1333 memory, so we went to the street for a comparison. For starters, we examined the online price from HP with DDR3-1066 memory plus the added QLogic QMH2462 Fibre Channel adapter ($750) and an additional NC360m dual-port Gigabit Ethernet controller ($320), which came to a grand total of $28,280 for the blade (about $277/VM, not including blade chassis or SAN storage).

Stripping memory from the build-out results in a $7,970 floor for the hardware, sans memory. Going to the street to find 8GB sticks with DDR3-1333 ratings and HP support yielded the Kingston KTH-PL313K3/24G kit (3x 8GB DIMMs), of which we would need three to complete the build-out. At $4,773 per kit, the completed system comes to $22,289 (about $218/VM, not including chassis or storage), which may do more to demonstrate Kingston’s value in the marketplace than HP’s penchant for “over-priced” memory.

Now, the interesting disclosure from HP’s testing team is this:

Notes from HP's VMmark submission.

While this appears to boost memory performance significantly for HP’s latest run (compared to the 24.24@17 tiles score back in May 2009), it does so at the risk of running the Nehalem-EP memory controller out of specification – essentially, driving the controller beyond its rated load. It is hard for us to imagine that this specific configuration would be vendor-supported if used in a problematic customer installation.

SOLORI’s Take: Those of you following closely may be asking yourselves: “Why did HP choose to over-clock the memory controller in this run by pushing a 1066MHz, 2DPC limit to 1333MHz?” It would appear the answer is self-evident: the extra 6% was needed to put them over the Lenovo machine. This issue raises a new question about the VMmark validation process: “Should out-of-specification configurations be allowed in the general benchmark corpus?” It is our opinion that VMmark should represent off-the-shelf, fully-supported configurations only – not esoteric configuration tweaks and questionable over-clocking practices.

Should there be an “unlimited” category in the VMmark arena? Who knows? How many enterprises knowingly commit their mission critical data and processes to systems running over-clocked processors and over-driven memory controllers? No hands? That’s what we thought… Congratulations anyway to HP for clawing their way to the top of the VMmark 8-core heap…


Quick Take: Magny-Cours Spotted, Pushed to 3GHz for wPrime

September 13, 2009

Andreas Galistel at NordicHardware posted an article showing a system running a pair of engineering samples of the Magny-Cours processor at 3.0GHz. Undoubtedly these images were culled from a report “leaked” on XtremeSystems forums showing a “DINAR2” motherboard with SR5690 chipset – in single and dual processor installations – running Magny-Cours at the more typical pre-release speed of 1.7GHz.

We know that Magny-Cours is essentially an MCM (multi-chip module) of two Istanbul dies delivered in the rectangular socket G34 package. One illuminating detail in the two posts is the reported “reduction” in L3 cache from 12MB (6MB x 2 in MCM) to 10MB (5MB x 2 in MCM). Where did the additional cache go? That’s easy: since a 2P Magny-Cours installation is essentially a 4P Istanbul configuration, these processors have the new HT Assist feature enabled, giving 1MB of cache from each die in the MCM to HT Assist.

“wPrime uses a recursive call of Newton’s method for estimating functions, with f(x)=x²-k, where k is the number we’re sqrting, until Sgn(f(x)/f'(x)) does not equal that of the previous iteration, starting with an estimation of k/2. It then uses an iterative calling of the estimation method a set amount of times to increase the accuracy of the results. It then confirms that n(k)²=k to ensure the calculation was correct. It repeats this for all numbers from 1 to the requested maximum.”

wPrime site
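To make the quoted description concrete, here is a loose Python interpretation of it. This is not wPrime’s source code: the extra-iteration count, the stall guard and the floating-point tolerance standing in for the “n(k)² = k” check are our assumptions, and the real benchmark spreads the work across threads rather than looping serially as this sketch does.

```python
import math


def newton_sqrt(k: float, extra_iters: int = 6, max_iters: int = 2000) -> float:
    """Approximate sqrt(k) with Newton's method on f(x) = x**2 - k.

    Start at k/2 and iterate until the sign of f(x)/f'(x) flips (or the step
    vanishes), then run a fixed number of extra refinement iterations.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return 0.0
    x = k / 2.0
    prev_sign = 0
    for _ in range(max_iters):
        step = (x * x - k) / (2.0 * x)  # f(x) / f'(x)
        sign = (step > 0) - (step < 0)
        if sign == 0 or (prev_sign != 0 and sign != prev_sign):
            break
        prev_sign = sign
        new_x = x - step
        if new_x == x:  # floating point has stalled; nothing left to gain
            break
        x = new_x
    for _ in range(extra_iters):
        x -= (x * x - k) / (2.0 * x)
    return x


def wprime_like(maximum: int) -> int:
    """Square-root every integer 1..maximum, verify each result, return the count."""
    for k in range(1, maximum + 1):
        root = newton_sqrt(float(k))
        # Floating-point stand-in for the "n(k)^2 = k" correctness check
        if not math.isclose(root * root, k, rel_tol=1e-12):
            raise ArithmeticError(f"sqrt check failed for {k}")
    return maximum
```

Calling `wprime_like(32_000_000)` would match the 32M workload in size, though a single-threaded Python loop will be far slower than the benchmark itself.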

Another intriguing thing about the XtremeSystems post in particular is the reported wPrime 32M and 1024M completion times. Compared to the hyper-threading-enabled 2P Xeon W5590 (130W TDP) running wPrime 32M at 3.33GHz (3.6GHz turbo) in 3.950 seconds, the 2P 3.0GHz Magny-Cours completed wPrime 32M in an unofficial 3.539 seconds – about 10% quicker while running a 10% slower clock. Through the myopic lens of this result, it would appear AMD’s choice of “real cores” over hyper-threading delivers its punch.
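The “10% quicker on a 10% slower clock” comparison works out as follows, using the W5590’s 3.33GHz base clock as the paragraph does; if the W5590 had held its 3.6GHz turbo throughout, the clock gap would widen to about 17%. Constant names are ours.

```python
W5590_SECONDS, MAGNY_SECONDS = 3.950, 3.539   # wPrime 32M completion times
W5590_GHZ, MAGNY_GHZ = 3.33, 3.0              # base clocks as compared in the text

time_saved = 1 - MAGNY_SECONDS / W5590_SECONDS
clock_deficit = 1 - MAGNY_GHZ / W5590_GHZ
clock_deficit_vs_turbo = 1 - MAGNY_GHZ / 3.6  # if the W5590 ran at 3.6GHz turbo

print(f"Magny-Cours finished {time_saved:.1%} sooner")
print(f"on a clock {clock_deficit:.1%} slower ({clock_deficit_vs_turbo:.1%} vs. turbo)")
```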

SOLORI’s Take: As a “reality check,” we can compare the reigning quad-socket, quad-core Opteron 8393 SE results in wPrime 32M and wPrime 1024M: 3.90 and 89.52 seconds, respectively. Adjusted for clock and core count versus its Shanghai cousin, the Magny-Cours engineering samples – at 3.54 and 75.77 seconds, respectively – turned in times about 10% slower than our calculus predicted. While still “record breaking” for 2P systems, we expected the Magny-Cours/Istanbul cores to outperform Shanghai clock-for-clock, even at this stage of the game.

Due to the multi-threaded nature of the wPrime benchmark, it is likely that the HT Assist feature – enabled in a 2P Magny-Cours system by default – is the cause of the discrepancy. By reducing the available L3 cache by 1MB per die – 4MB of L3 cache total – HT Assist actually could be creating a slow-down. However, there are several things to remember here:

  • These are engineering samples qualified for 1.7GHz operation
  • Speed enhancements were performed with tools not yet adapted to Magny-Cours
  • The author indicated a lack of control over AMD’s Cool ‘n Quiet technology which could have made “as tested” core clocks somewhat lower than what CPUz reported (at least during the extended tests)
  • It is speculated that AMD will launch Magny-Cours at 2.2GHz (top bin), making the 2.6+ GHz results atypical
  • The BIOS and related dependencies are likely still being “baked”

The engineering-sample speed tests posted on the XtremeSystems forum at a more “typical” 2.6GHz track with the 3.0GHz overclock results: 3.947 seconds and 79.625 seconds for wPrime 32M and 1024M, respectively, on the 2P Magny-Cours. Even at that speed, the 24-core system is on par with the 2P Nehalem system clocked nearly a GHz faster. Oddly, Intel lists the W5590 as not supporting “turbo” or hyper-threading, although actual testing makes it clear that Intel’s marketing material is incorrect.

Assuming Magny-Cours improves slightly on its way to market, we already know how 24-core Istanbul stacks up against 16-thread Nehalem in VMmark and what that means for Nehalem-EP. This partly explains the marketing shift as Intel tries to position Nehalem-EP as destined for workstations instead of servers. Whether you consider this move a prelude to the coming Nehalem-EX v. Magny-Cours combat or an attempt to keep Intel’s server chip power average down by eliminating the 130W+ parts from the “server” list, Intel and AMD will each attempt to win the war before the first shot is fired. Either way, we see nothing that disrupts the price-performance and power-performance comparison models that dominate the server markets.

[Ed: The 10% difference is likely due to the fact that the author was unable to get “more than one core” clocked at 3.0GHz. Likewise, he was uncertain that all cores were reliably clocking at 2.6GHz for the longer wPrime tests. Again, this engineering sample was designed to run at 1.7GHz and was not likely “hand picked” to run at much higher clocks. He speculated that some form of dynamic core clocking linked to temperature was affecting clock stability – perhaps due to some AMD-P tweaks in Magny-Cours.]


Quick Take: 6-core “Gulftown” Nehalem-EP Spotted, Tested

August 10, 2009

TechReport is reporting on a Taiwanese overclocker who may be testing a pair of Nehalem 6-core processors (2P) slated for release early in 2010. Likewise, AlienBabelTech mentions a Chinese website, HKEPC, that has preliminary testing completed on the desktop (1P) variant of the 6-core. While these could be different 32nm silicon parts, it is more likely – judging from the CPU-Z outputs and provided package pictures – that these are the same sample SKUs tested as 1P and 2P LGA-1366 components.

What does this mean for AMD and the only 6-core shipping today? Since Intel is still projecting Q2/2010 for the server part, AMD has a decent opportunity to grow market share for Istanbul. Intel’s biggest rival will be itself – facing a wildly growing number of SKUs across its i-line from i5, i7, i8 and i9 “families” with multiple speed and feature variants. Clearly, the non-HT version would stand as a direct competitor to Istanbul’s native 6-core SKUs. Likewise, Istanbul may be no match for the 6-core Nehalem with HT and “turbo core” feature set.

However, with an 8-core “Beckton” Nehalem variant on the horizon, it might be hard to understand just where Gulftown fits in Intel’s picture. Intel faces a serious production issue, filling fab capacity with 4-core, 6-core and 8-core processors, each with speed, power, socket and HT variants from which to supply high-speed, high-power SKUs and lower-speed, low-power SKUs for 1P, 2P and 4P+ destinations. Doing the simple math with 3 SKUs per part, Intel would be offering the market a minimum of 18 base parts according to its current marketing strategy: 9 with HT/turbo, 9 without HT/turbo. For socket LGA-1366, this could easily mean 40+ SKUs with 1xQPI and 2xQPI variants included (up from 23).
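The “simple math” above is a product of three independent choices. Core counts come from the 4-, 6- and 8-core parts discussed here; the three speed-bin labels are placeholders of ours, not Intel SKU names.

```python
from itertools import product

CORE_COUNTS = (4, 6, 8)
SPEED_BINS = ("high", "mid", "low")   # hypothetical labels: 3 SKUs per part
HT_TURBO = (True, False)              # with or without HT/turbo

base_parts = list(product(CORE_COUNTS, SPEED_BINS, HT_TURBO))
with_ht = [part for part in base_parts if part[2]]

print(len(base_parts), "base parts,", len(with_ht), "with HT/turbo")
```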

SOLORI’s take: Intel will have to create some interesting “crippling or pricing tricks” to keep Gulftown from cannibalizing the Gainestown market. If they follow their “normal” playbook, we predict the next 10 months will play out like this:

  1. Initially there will be no 8-core product for 1P and 2P systems (LGA-1366), allowing for artificially high margins on the 8-core EX chip (LGA-1567), slowing the inevitable cannibalization of the 4-core/2P market, and easing production burdens;
  2. Intel will silently and abruptly kill Itanium in favor of “hyper-scale” Nehalem-EX variants;
  3. Gulftown will remain high-power (90-130W TDP) and be positioned against AMD’s G34 systems and Magny-Cours – plotting 12-core against 12-thread;
  4. Intel creates a “socket refresh” (LGA-1566?) to enable “inexpensive” 2P-4P platforms from its Gulftown/Beckton line-up in 2H/2010 (ostensibly to maintain parity with G34) without hurting EX;
  5. Revised, lower-power variants of Gainestown will be positioned against AMD’s C32 target market;
  6. Intel will cut SKUs in favor of higher margins, increasing speed and features for “same dollar” cost;
  7. Non-HT parts will begin to disappear in 4-core configurations completely;
  8. Intel’s AES enhancements in Gulftown will allow it to further differentiate itself in storage and security markets;

It would be a mistake for Intel to continue growing SKU count or provide too much overlap between 4-core HT and 6-core non-HT offerings. If purchasing trends soften in 4Q/09 and remain (relatively) flat through 2Q/10, Intel will benefit from a leaner, well differentiated line-up. AMD has already announced a “leaner” plan for G34/C32. If all goes well at the fabs, 1H/2010 will be a good ole fashioned street fight between blue and green.


AMD Istanbul and Intel Nehalem-EP: Street Prices

June 22, 2009

It’s been three weeks since the official launch of AMD’s 6-core Istanbul processor, and we wanted to take a look at prevailing street prices for the DIY upgrade option.

Istanbul Pricing (Street)

AMD “Istanbul” Opteron™ Processor Family
2400 Series                                             Price      8400 Series                                             Price
2.6GHz Six-Core, 6-Thread AMD Opteron 2435 (75W ACP)    $1060.77   2.6GHz Six-Core, 6-Thread AMD Opteron 8435 (75W ACP)
2.4GHz Six-Core, 6-Thread AMD Opteron 2431 (75W ACP)               2.4GHz Six-Core, 6-Thread AMD Opteron 8431 (75W ACP)
2.2GHz Six-Core, 6-Thread AMD Opteron 2427 (75W ACP)

Nehalem-EP/EX Pricing (Street)

After almost two months on the market, Nehalem-EP has been on the street long enough to see a 1-3% drop in prices. How does Istanbul stack up against Nehalem-EP/Xeon pricing?

Intel “Nehalem” Xeon Processor Family
EP Series Price EX Series Price
2.66GHz Quad-Core, 8-Thread Intel Xeon EP X5550 (95W TDP) $999.95
Quad-Core, 8-Thread Intel Xeon EX TBD
2.4GHz Quad-Core, 8-Thread Intel Xeon EP E5530 (80W TDP) $548.66
Quad-Core, 8-Thread Intel Xeon EX TBD
2.26GHz Quad-Core, 8-Thread Intel Xeon EP E5520 (80W TDP) $400.15
2.26GHz Quad-Core, 8-Thread Intel Xeon EP L5520 (60W TDP) $558.77

Compared to the competing Nehalem SKUs, Istanbul is fetching a premium price. This is likely due to what AMD perceives to be the broader market that Istanbul is capable of serving (and its newness relative to demand, et al). Of course, there are no Xeon Nehalem-EX SKUs in supply to compare against Istanbul in the 4P and 8P segments, but in 2P, it appears Istanbul is running 6% higher at the top-bin SKU and 27% higher at the lower-bin SKU – with the exception of the 60W TDP part, for which Intel demands a 13% premium over the 2.2GHz Istanbul part.
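Of the street prices listed above, only the top-bin pair has both sides filled in, so this sketch checks just that premium (the helper name is ours).

```python
def premium(price: float, reference: float) -> float:
    """Fractional price premium of `price` over `reference`."""
    return price / reference - 1


opteron_2435 = 1060.77   # 2.6GHz six-core Istanbul, street
xeon_x5550 = 999.95      # 2.66GHz quad-core Nehalem-EP, street

print(f"Istanbul top-bin premium: {premium(opteron_2435, xeon_x5550):.1%}")
```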

This last SKU is the “green datacenter” battleground part. Since the higher priced 2.6GHz Istanbul rates a 15W (ACP) premium over the L5520, it will be interesting to see if system integrators will compare it to the low-power Xeon in power-performance implementations. Comparing SPECpower_ssj2008 between similarly configured Xeon L5520 and X5570, the performance-per-watt is within 2% for relatively anemic, dual-channel 8GB memory configurations.

In a virtualization system, this memory configuration would jump from an unusable 8GB to at least 48GB, increasing average power consumption by another 45-55W and dropping the performance-per-watt ratio by about 25%. Looking at the relative performance-per-watt of the Nehalem-EP as compared to the Istanbul in TechReport’s findings earlier this month, one could extrapolate that the virtualization performance-per-watt for Istanbul is very competitive – even with the lower-power Xeon – in large memory configurations. We’ll have to wait for similar SPECpower_ssj2008 in 4P configurations to know for sure.

System Memory Pricing (Street)

System memory represents 15-20% of system pricing – more in very large memory footprints. We’ve indicated that Istanbul’s time-to-market strategy shows a clear advantage (CAPEX) in memory pricing alone – more than compensating for the slight premium in CPU pricing.

System Memory Pricing
DDR2 Series (1.8V) Price DDR3 Series (1.5V) Price

4GB 800MHz DDR2 ECC Reg with Parity CL6 DIMM Dual Rank, x4 (5.4W)

4GB 1333MHz DDR3 ECC Reg w/Parity CL9 DIMM Dual Rank, x4 w/Therm Sen (3.96W)


4GB 667MHz DDR2 ECC Reg with Parity CL5 DIMM Dual Rank, x4 (5.94W)

4GB 1066MHz DDR3 ECC Reg w/Parity CL7 DIMM Dual Rank, x4 w/Therm Sen (5.09W)

8GB 667MHz DDR2 ECC Reg with Parity CL5 DIMM Dual Rank, x4 (7.236W)

8GB 1066MHz DDR3 ECC Reg w/Parity CL7 DIMM Dual Rank, x4 w/Therm Sen (6.36W)

These parts show a 28%, 40% and 62% price premium for DDR3 components versus DDR2, which indicates Istanbul’s savings window is still wide open. Since DDR3 prices are not expected to fall until Q3 at the earliest, this cost differential is expected to influence “private cloud” virtualization systems more strongly. However, with the 0.3V lower voltage requirement on the DDR3 modules, Nehalem-EP actually has a slight advantage from an operational power perspective in dual-channel configurations. When using triple-channel for the same memory footprint, Nehalem-EP’s memory consumes about 58% more power (9x4GB DDR3 vs. 4x8GB DDR2).
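The 58% figure can be reproduced from the per-DIMM wattages in the table, assuming the triple-channel build uses the 4GB DDR3-1066 parts; that choice is our inference, since it is the combination that yields 58%. With the 4GB DDR3-1333 parts instead, the gap narrows to about 23%.

```python
DIMM_WATTS = {             # per-DIMM power from the memory table above
    "DDR2-667 8GB": 7.236,
    "DDR3-1066 4GB": 5.09,
    "DDR3-1333 4GB": 3.96,
}

ddr2_dual = 4 * DIMM_WATTS["DDR2-667 8GB"]          # 4 x 8GB = 32GB
ddr3_triple = 9 * DIMM_WATTS["DDR3-1066 4GB"]       # 9 x 4GB = 36GB
ddr3_triple_1333 = 9 * DIMM_WATTS["DDR3-1333 4GB"]

extra = ddr3_triple / ddr2_dual - 1
extra_1333 = ddr3_triple_1333 / ddr2_dual - 1

print(f"DDR2: {ddr2_dual:.1f}W, DDR3-1066: {ddr3_triple:.1f}W ({extra:+.0%})")
print(f"DDR3-1333: {ddr3_triple_1333:.1f}W ({extra_1333:+.0%})")
```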


First 12-core VMmark for Istanbul Appears

June 10, 2009

VMware has posted the VMmark score for the first Istanbul-based system and it’s from HP: the ProLiant DL385 G6. While it’s not at the top of the VMmark chart at 15.54@11 tiles (technically it is at the top of the 12-core benchmark list), it still shows a compelling price-performance picture.

Comparing Istanbul’s VMmark Scores

For comparison’s sake, we’ve chosen the HP DL385 G5 and HP DL380 G6 as they were configured for their VMmark tests. In the case of the ProLiant DL380 G6, we could only configure the X5560 and not the X5570 as tested, so the price is actually LOWER on the DL380 G6 than the “as tested” configuration. Likewise, we chose PC2-5300 (DDR2/667, 8x8GB) memory for the DL385 G5 versus the more expensive PC2-4200 (DDR2/533) memory as configured in 2008.

As configured for pricing, each system comes with processor, memory, 2 SATA drives and VMware Infrastructure Standard for 2 processors. Note that in testing, additional NICs, HBAs and storage are configured; such additions are not included herein. We have omitted these additional equipment features as they would be common to a deployment set and have no real influence on relative pricing.

Systems as Configured for Pricing Comparison

System                  Processor      Speed (GHz)  Cores  Threads  Memory (GB)  Memory Speed (MHz)  Street Price
HP ProLiant DL385 G5    Opteron 2384   2.7          8      8        64           667                 $10,877.00
HP ProLiant DL385 G6    Opteron 2435   2.6          12     12       64           667                 $11,378.00
HP ProLiant DL380 G6    Xeon X5560*    2.93         8      16       96           1066                $30,741.00

Here’s some good news: 50% more cores for only 5% more money (sound like an economic stimulus?). The comparison Nehalem-EP system is nearly 3x the price of the Istanbul system.
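The headline ratios fall straight out of the pricing table. Price per core is an extra view we’ve added for comparison, not a figure from the original; the helper name is ours.

```python
# (cores, street price in USD) as configured for the pricing comparison above
SYSTEMS = {
    "DL385 G5": (8, 10_877),
    "DL385 G6": (12, 11_378),
    "DL380 G6": (8, 30_741),
}


def delta(new: float, old: float) -> float:
    """Fractional change from old to new."""
    return new / old - 1


g5_cores, g5_price = SYSTEMS["DL385 G5"]
g6_cores, g6_price = SYSTEMS["DL385 G6"]
xeon_price = SYSTEMS["DL380 G6"][1]

print(f"Cores: {delta(g6_cores, g5_cores):+.0%}, price: {delta(g6_price, g5_price):+.1%}")
print(f"DL380 G6 vs DL385 G6 price: {xeon_price / g6_price:.1f}x")
for name, (cores, price) in SYSTEMS.items():
    print(f"{name}: ${price / cores:,.0f} per core")
```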



Intel’s €1.1B Slap on the Wrist, Must Sell 2.3M Chips

May 13, 2009

May 13th, 2009 – besides being my birthday – marks the day that the European Commission imposed a €1.1B fine (about $1.4B US) on Intel for going “to great lengths to cover up its anti-competitive actions” and in the process having “harmed millions of European consumers.” This is according to EU Competition Commissioner Neelie Kroes, in an address in Brussels today. The fine could have been as large as €4B, and will go to the EU’s annual budget – not consumers.

Commissioner Kroes was seen holding up an Intel PII/PIII processor card (SECC2) during the news conference, giving some scope to what has been a very long and drawn-out process going back to 2000. At the heart of the matter have been Intel’s “illegal anticompetitive practices to exclude competitors from the market for computer chips called x86 central processing units (CPUs)” – namely AMD. These were apparently manifested in behind-the-scenes rebates and discounts in exchange for a reduction or termination of AMD-based products.

In a press release from Intel’s President and CEO, Paul Otellini, the fined chip maker offered this defense:

Intel takes strong exception to this decision. We believe the decision is wrong and ignores the reality of a highly competitive microprocessor marketplace – characterized by constant innovation, improved product performance and lower prices. There has been absolutely zero harm to consumers. Intel will appeal.

Intel must cover their fine immediately with a bank guarantee which will stay sequestered until their appeal is either exhausted or the decision reversed. Based on EU’s hunger for this type of commercial justice, the money could be tied-up for many years. But the question remains, does Intel have a history of anti-competitive behavior beyond the test of rigorous competition?

Intel’s history tells a compelling story: the EU joins Japan (2004) and South Korea (2008) in finding Intel engaged in anti-competitive behavior. The question remains: how will the EU’s decision play in the US courts as AMD’s ongoing antitrust suit (2005) against Intel continues to unfold? Delayed until 2010 due to the lengthy list of depositions scheduled for the case, the EU’s decision will likely do more to tarnish Intel’s new “Promoting Innovation” campaign than settle the dispute.

So what does Intel need to do to weather the EU’s wrath? In product terms, Intel needs to move 2,262,752 of its Nehalem-EP (5500-series) chips to cover the loss. Based on a predicted 40M-unit replacement market in the US, that’s less than 5%, and it’s under 2.5% of the market if they are 2P systems. However, Intel has promised a 9:1 consolidation value for the replacement, with some estimating that number moves to 18:1 with good results for SMT (depending on the workload).

What does this mean from an Intel 5500-series sales perspective? Here’s our estimate, using Intel’s 9:1 and 18:1 math (not forgetting the 4.5:1 for the dual-core):

Nehalem Units Needed Retail Value 9:1 18:1
W5580 12,545 $20,072,000.00 0.56%
X5570 121,713 $168,694,218.00 5.48%
X5560 168,227 $197,162,044.00 7.57%
X5550 174,450 $167,123,100.00 7.85%
E5540 531,715 $395,595,960.00 23.93%
E5530 419,636 $222,407,080.00 18.88%
E5520 183,533 $68,457,809.00 8.26%
E5506 262,704 $69,879,264.00 5.91%
E5504 250,051 $56,011,424.00 5.63%
E5502 106,312 $19,986,656.00 1.20%
L5520 10,516 $5,573,480.00 0.24%
L5506 21,350 $9,031,050.00 0.96%
Total 2,262,752 $1,399,994,085.00 12.97% 73.49%

By these estimates, Intel will need to close 86.5% of the total replacement market to be able to cover the EU fines. All this assumes, of course, that they don’t offer discounts off of their “published” per-1000 chip prices. Good luck, Intel, on an exciting marketing campaign!