AMD Istanbul Launch: Shipping Today

June 1, 2009
AMD Opteron "Istanbul" 6-core processor die

June 1, 2009 – Today, AMD is announcing the general availability of its new single-die, 6-core Opteron processor code-named “Istanbul.” We have weighed in on the promised benefits of Istanbul based on pre-release material that was not under non-disclosure protections. Now, we’re able to disclose the rest of the story.

First, we got a chance to talk with Mike Goddard, AMD Server Products CTO, about Istanbul and how the G34/C32 platforms are shaping up. According to Goddard, “things went really well with Istanbul; it’s no big secret that the silicon we’re using in Istanbul is the same silicon we’re using in Magny-Cours.” Needless to say, Istanbul has many more forward-looking capabilities than Socket-F’s legacy chipsets can support.

“We had always been planning a refresh to Socket-F with 5690,” says Goddard, “but Istanbul got pulled in beyond our ability to pull in the chipset.” Consequently, while there could be Socket-F platforms based on the next-generation 5690/5100 chipset, Goddard suggests that “most OEMs will realign their platform development around [G34/C32, Q1/2010].”

In common parlance, Istanbul is a “genie in a bottle,” and we won’t see its true potential until it resurfaces in its Magny-Cours/G34 configuration. However, a few of these next-generation tweaks will trickle down to Socket-F systems:

  • AMD PowerCap Manager (via BIOS extensions)
  • Enhanced AMD PowerNow! Technology
  • AMD CoolCore Technology extended to L3 cache
  • HT Assist (aka probe filter) for increased memory bandwidth
  • HT 3.0 with an increase to 4.8GT/sec and IMC improvements
  • 5 new part SKUs
  • Better 2P Performance Parity with Nehalem-EP

That’s in addition to 50% more cores in the same power envelope: not an insignificant improvement. In side-by-side comparisons with “Shanghai” quad-core at the same clock frequency, Istanbul delivers 2W lower idle power and 34% better SPECpower_ssj2008 results (1,297 overall) using identical systems with just a processor swap. In fact, Istanbul exceeded Shanghai’s average power only at 80% actual load and beyond, and it remained within 5% of Shanghai even at 100% load.
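The core-count and SPECpower claims above reduce to simple arithmetic. The sketch below is a back-of-the-envelope illustration, not AMD's methodology: it assumes both parts share a 75W ACP envelope (the figure quoted later for the 2400-series Istanbul SKUs), splits that envelope evenly across cores (ignoring uncore power for the L3, memory controller and HT links), and reads "34% better" as relative to Shanghai's score. The helper names are hypothetical.

```python
# Back-of-the-envelope arithmetic for the Istanbul vs. Shanghai comparison.
# Assumption: both parts share the same 75W ACP envelope (the Istanbul figure).

def per_core_watts(envelope_w: float, cores: int) -> float:
    """Split a socket's power envelope evenly across its cores."""
    return envelope_w / cores


def implied_baseline(new_score: float, pct_better: float) -> float:
    """Back out the baseline score from an 'X% better' claim."""
    return new_score / (1 + pct_better / 100)


ENVELOPE_W = 75.0  # assumed shared envelope, in watts
shanghai_w = per_core_watts(ENVELOPE_W, 4)  # quad-core
istanbul_w = per_core_watts(ENVELOPE_W, 6)  # six-core

print(f"Shanghai: {shanghai_w:.2f} W/core, Istanbul: {istanbul_w:.2f} W/core")
print(f"Per-core budget reduction: {1 - istanbul_w / shanghai_w:.1%}")
print(f"Implied Shanghai SPECpower_ssj2008 score: {implied_baseline(1297, 34):.0f}")
```

Under these assumptions each Istanbul core runs on a third less power than a Shanghai core, and the 1,297 result implies a Shanghai score of roughly 968.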

New Feature: AMD PowerCap Manager

AMD’s PowerCap Manager is a BIOS extension that limits the maximum power consumption of Istanbul processors during system operation. The BIOS option caps the frequency and voltage supplied to the processor and offers four settings:

  • Disabled – full-range voltage and clock to processor, no effective power savings
  • -1 – Caps power consumption to 70% of normal, effectively saving 30% power
  • -2 – Caps power consumption to 60% of normal, effectively saving 40% power
  • -3 – Caps power consumption to 40% of normal, effectively saving 60% power

This feature is targeted at workloads that are more thread-sensitive than frequency-sensitive (i.e., the bulk of today’s “cloud computing” loads). It allows system administrators to “dial in” power efficiency to optimize rack and blade systems where deterministic power consumption is more critical than absolute performance.
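A minimal sketch of how the four PowerCap settings might be reasoned about when budgeting rack power. It assumes each cap applies as a simple fraction of a 75W per-socket baseline; the real feature works by capping frequency and voltage in the BIOS, and `capped_power` and `mildest_setting_within` are hypothetical helpers, not an AMD interface.

```python
# Illustrative model of the four PowerCap Manager settings.
# Assumption: each cap is treated as a fraction of a 75W baseline; the real
# mechanism caps processor frequency and voltage in firmware.
from typing import Optional

POWERCAP_FRACTION = {
    "Disabled": 1.00,  # full-range voltage and clock, no savings
    "-1": 0.70,        # ~30% savings
    "-2": 0.60,        # ~40% savings
    "-3": 0.40,        # ~60% savings
}


def capped_power(baseline_w: float, setting: str) -> float:
    """Power ceiling (W) for a given PowerCap setting."""
    return baseline_w * POWERCAP_FRACTION[setting]


def mildest_setting_within(budget_w: float, baseline_w: float = 75.0) -> Optional[str]:
    """Least restrictive setting whose ceiling fits a per-socket power budget."""
    # Walk settings from least to most restrictive.
    for setting, frac in sorted(POWERCAP_FRACTION.items(), key=lambda kv: -kv[1]):
        if baseline_w * frac <= budget_w:
            return setting
    return None  # even the deepest cap exceeds the budget


for setting in POWERCAP_FRACTION:
    print(f"{setting:>8}: ceiling {capped_power(75.0, setting):5.1f} W")
print(mildest_setting_within(50.0))  # -2
```

Picking the mildest setting that fits a budget mirrors the "dial in" use case: an administrator gives up only as much frequency headroom as the rack's power allocation requires.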

New Feature: Enhanced AMD PowerNow! and CoolCore

PowerNow! improves power efficiency by reducing the voltage and frequency supplied to cores based on demand. AMD’s “smart fetch” enhancements allow core clocks to be turned off during idle cycles to reduce power consumption even further. Additionally, AMD’s CoolCore has been extended to shut down unused portions of the L3 cache. These features are primarily useful in non-virtualized workloads where significant opportunities for idle time exist.
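Scaling voltage and frequency together pays off because of the standard CMOS dynamic-power approximation, in which power scales with the square of voltage and linearly with frequency. The sketch below uses hypothetical voltage/frequency pairs, not published Istanbul P-states, and ignores leakage power.

```python
# Why lowering voltage and frequency together saves power, using the standard
# CMOS dynamic-power approximation P ~ C * V^2 * f. The voltage/frequency
# pairs below are hypothetical, not published Istanbul P-states.

def relative_dynamic_power(v: float, f: float, v_ref: float, f_ref: float) -> float:
    """Dynamic power at (v, f) as a fraction of power at (v_ref, f_ref).

    Switched capacitance C cancels out of the ratio. Leakage is not modeled.
    """
    return (v / v_ref) ** 2 * (f / f_ref)


V_TOP, F_TOP = 1.30, 2.6  # hypothetical top P-state (volts, GHz)
V_LOW, F_LOW = 1.00, 0.8  # hypothetical low P-state (volts, GHz)

ratio = relative_dynamic_power(V_LOW, F_LOW, V_TOP, F_TOP)
print(f"Low P-state: ~{ratio:.0%} of top-state dynamic power")

# Gating the core clock (f -> 0) removes dynamic power entirely in this model.
print(relative_dynamic_power(V_LOW, 0.0, V_TOP, F_TOP))  # 0.0
```

The last line is the idea behind turning off core clocks during idle cycles: with no switching, dynamic power disappears and only leakage remains.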

HT Assist Improvements

New Feature: HT Assist

In modern multi-core processors, cores on the same die commonly share a third-level cache, which lets each core fetch cached memory locations without going to main memory (a much slower access). In multi-socket systems, cache coherency between processors in adjacent sockets must be maintained by constant updates between physical processors. For two processors, this update process is fairly straightforward, but for 4 or 8 processors, the traffic between processors can consume significant bus resources just to maintain cache coherence.

Before HT Assist, each processor essentially broadcast updates to every other processor each time a cacheline entered its L3 cache. Before the updating processor could proceed with its cached value, it had to wait for all other processors to acknowledge. This consumes significant “effective” bandwidth and increases cache latency. With HT Assist, each L3 cache sets aside 1MB to track other processors’ cache entries, reducing both the probe traffic and the number of times a physical processor must wait for acknowledgments. For workloads where data sets are not shared between physical processors (especially NUMA-aware workloads), this can result in significantly higher effective memory bandwidth.
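To make the broadcast-versus-filter difference concrete, here is a deliberately simplified model that only counts probe messages per cache miss. It is not AMD's probe filter design: the `ProbeFilter` class is a hypothetical directory mapping each cache line to the set of sockets holding it, and it ignores evictions, writes and the filter's own capacity limits.

```python
# Simplified model of broadcast probing vs. a probe filter (directory).
# Not AMD's implementation: it only counts probe messages per cache miss.
from typing import Dict, Set


def broadcast_probes(num_sockets: int) -> int:
    """Without a filter, every miss probes all other sockets."""
    return num_sockets - 1


class ProbeFilter:
    """Hypothetical directory: cache line address -> sockets caching it."""

    def __init__(self) -> None:
        self.sharers: Dict[int, Set[int]] = {}

    def miss(self, requester: int, line: int) -> int:
        """Record that `requester` caches `line`; return probes actually sent."""
        holders = self.sharers.setdefault(line, set())
        probes = len(holders - {requester})  # only sockets known to hold the line
        holders.add(requester)
        return probes


# NUMA-friendly workload: each socket touches only its own 100 lines.
SOCKETS = 4
pf = ProbeFilter()
directed = sum(pf.miss(s, line=s * 1000 + i) for s in range(SOCKETS) for i in range(100))
broadcast = SOCKETS * 100 * broadcast_probes(SOCKETS)
print(f"broadcast probes: {broadcast}, filtered probes: {directed}")
```

With no sharing, the filter sends no probes while broadcast sends three per miss on a 4P system, which is the NUMA-aware case the paragraph describes. Shared lines still generate directed probes, so the benefit shrinks as data sharing grows.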

Because HT Assist reduces the probe traffic needed before a cacheline can be used, overall memory access latency is reported to drop by 7-10 ns. This closes the memory latency gap that Nehalem opened up over Shanghai by 25-30%. In practice, HT Assist is primarily beneficial for 4P and 8P systems and is disabled by default in 2P systems. For most 2P workloads, the loss of L3 cache (1MB borrowed for HT Assist) may cancel out any benefit from HT Assist, since probe traffic is already much lower in 2P systems.
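Taken together, the quoted latency figures imply a rough size for the original Nehalem/Shanghai latency gap. The sketch treats the 7-10 ns and 25-30% ranges as independent bounds, which is an assumption, and also shows the 2P trade-off using Istanbul's 6MB shared L3 (a figure not stated in the article).

```python
# What the quoted figures imply. If a 7-10 ns latency reduction closes 25-30%
# of the Nehalem/Shanghai latency gap, then gap = reduction / fraction closed.
# Assumption: the two ranges are treated as independent bounds.

def implied_gap_ns(reduction_ns: float, fraction_closed: float) -> float:
    return reduction_ns / fraction_closed


low = implied_gap_ns(7, 0.30)    # smallest gap consistent with both ranges
high = implied_gap_ns(10, 0.25)  # largest gap consistent with both ranges
print(f"Implied Nehalem/Shanghai latency gap: {low:.1f} to {high:.1f} ns")

# The 2P trade-off: HT Assist borrows 1MB of Istanbul's 6MB shared L3.
L3_MB, HT_ASSIST_MB = 6, 1
print(f"L3 left for data: {L3_MB - HT_ASSIST_MB}MB ({HT_ASSIST_MB / L3_MB:.1%} less)")
```

A one-sixth smaller L3 is a real cost, which is why the filter only earns its keep when probe traffic is heavy, as in 4P and 8P systems.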

New Capability: HyperTransport 3.0 and IMC Improvements

HyperTransport 3.0 has been available in Shanghai in the 73EE, 77EE, 79HE, 81HE, 87, 89 and 93SE parts, and it will be standard on all Istanbul SKUs. While HT 3.0 supports clock rates up to 2.6GHz, enabling HT speeds up to 5.2GT/sec, Istanbul will only support HT 3.0 speeds up to 4.8GT/sec in its Socket-F form factor. For systems shipping with HT 3.0 enabled, CPU-to-CPU traffic (including memory) maxes out at 4.8GT/sec, while CPU-to-chipset traffic remains at HT 1.0 rates, or 2GT/sec. With DCA 2.0-enabled systems (which require the upcoming SR5690/SB5100 chipset), these rates will increase significantly.
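HyperTransport speeds are quoted in transfers per second, so converting them to bandwidth requires a link width. Assuming a full 16-bit link (2 bytes per transfer in each direction), the rates above translate as follows; the helper name is hypothetical.

```python
# Convert HyperTransport transfer rates (GT/s) into link bandwidth.
# Assumption: a full-width 16-bit link, i.e. 2 bytes per transfer per direction.

def ht_bandwidth_gbs(gt_per_s: float, width_bits: int = 16) -> float:
    """Per-direction bandwidth of one HT link, in GB/s."""
    return gt_per_s * width_bits / 8


LINKS = [
    ("HT 3.0 spec maximum", 5.2),
    ("Istanbul on Socket-F", 4.8),
    ("CPU-to-chipset", 2.0),
]
for label, rate in LINKS:
    one_way = ht_bandwidth_gbs(rate)
    print(f"{label:<22} {one_way:4.1f} GB/s each way, {2 * one_way:4.1f} GB/s aggregate")
```

On a 16-bit link, Istanbul's 4.8GT/sec cap works out to 9.6 GB/s per direction, against 4 GB/s per direction on the 2GT/sec chipset link.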

The integrated memory controller (IMC) has been significantly overhauled to accommodate the needs of G34, C32 and DDR3. While this leaves some capabilities unused in Socket-F, the memory controller is now decoupled from the HT bus, allowing 4.8GT/sec HT 3.0 transfers even with a 2.2GHz memory controller. Additionally, the IMC now supports ECC with x8 memory modules, which should improve memory bandwidth in ECC applications.

New Part SKUs Available

For 2P (2000-series) platforms, initial Istanbul parts will be the 2427, 2431 and 2435, running at 2.2GHz, 2.4GHz and 2.6GHz, respectively, at 75W average CPU power (ACP). Initial 1,000-unit pricing for the 2400 lineup is reported to be $455, $698 and $989 for the 2427, 2431 and 2435, respectively. As indicated, these will all be HT 3.0-enabled (backward compatible with HT 1.0) and carry the improved memory controller.

For 4P and 8P systems, two SKUs will initially be available: the 8431 and 8435 at 2.4GHz and 2.6GHz, respectively. These 8400-series parts are expected to be priced at $2,149 and $2,649, respectively, in 1,000-unit quantities. We expect HT Assist to play a big role in 4P/8P system refreshes and upgrades, as 4-way STREAM memory bandwidth improves by as much as 60%.
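Normalizing the quoted 1,000-unit prices by core count makes the 2P and 4P/8P price tiers easier to compare. This is simple arithmetic on the article's figures, using the six-core count common to all Istanbul parts; the helper name is hypothetical.

```python
# Price per core for the launch SKUs, using the 1,000-unit prices quoted above.
# Every Istanbul part has six cores.

CORES = 6
SKUS = {  # model: (clock in GHz, 1,000-unit price in USD)
    "2427": (2.2, 455),
    "2431": (2.4, 698),
    "2435": (2.6, 989),
    "8431": (2.4, 2149),
    "8435": (2.6, 2649),
}


def price_per_core(price_usd: float, cores: int = CORES) -> float:
    return price_usd / cores


for model, (ghz, price) in SKUS.items():
    print(f"Opteron {model} @ {ghz}GHz: ${price_per_core(price):8.2f} per core")
```

At the same clock, the 4P/8P-capable parts cost roughly three times as much per core as their 2P counterparts.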

Better Performance Parity with Nehalem-EP

No discussion of a new processor would be complete without a preview of the chip’s performance potential. Enter Andy Parma, AMD Server Marketing, with some details from AMD’s test lab. Parma has been working closely with Goddard’s performance team to inventory Istanbul’s advantages for Socket-F systems. “What you see in our comparison is the top bin of [Intel’s] 80W TDP [‘Gainstown,’ Nehalem-EP],” said Parma of the chosen comparison systems. “They have their higher TDP parts; we expect to be introducing, in Q3/2009, our SE [Istanbul] product line, which are more geared towards competing with those parts.” Until Q3, AMD is focusing on the “mainstream” market, leaving the top performance seat to Gainstown’s highest-TDP offering.

SPECjbb 2005 Comparison. Istanbul, Shanghai and Gainstown - 75-80W SKUs Compared, 2P Servers.

Much has been said about Nehalem-EP comparisons to Shanghai based on benchmarks that take advantage of Nehalem-EP’s updated SMT (multi-threading) capabilities. Parma released some of AMD’s internal benchmarks of Istanbul in 2P and 4P configurations compared to the Intel Gainstown (2P) and Dunnington (4P) product lines. Intel’s 4-core, 8-thread SMT E5540 “Gainstown” Nehalem-EP processor is the highest-performance SKU in the same power tier as AMD’s initial Istanbul offering.

At 2.6GHz and 75W ACP, Istanbul significantly narrows the gap to Nehalem-EP (E5540, 2.53GHz) relative to Shanghai in AMD’s SPECint_rate2006 and SPECjbb2005 comparisons. According to Parma’s results, the performance gap closes to within 6% and 2%, respectively. This indicates near-90% threading scaling efficiency and a 5% improvement in integer performance.

While these results speak well for AMD’s Istanbul in the short term, Intel has four additional SKUs from 95-130W that offer higher performance in the 2P segment and will remain unanswered until Q3/2009. However, for modern virtualization workloads where Intel’s SMT and EPT are less effective and power consumption is a factor, Istanbul looks very competitive; this represents the bulk of the mainstream 2P virtualization segment.

SPECint_rate2006 Comparison. Istanbul, Shanghai and Dunnington, 75/90W SKUs Compared, 4P Servers.

For better or worse, Intel’s 6-core Dunnington is still marketed as Intel’s go-to virtualization workhorse for 4P systems. AMD chose the 2.4GHz, 6-core, 45nm E7450 “Dunnington”, rated at 90W TDP, as its 4P comparison from Intel. While Intel offers a higher-performance variant, the 2.66GHz X7460, it is rated at 130W TDP per socket, which puts it out of the mainstream.

Similar to the 2P comparisons, the 4P results show Istanbul pulling away from Dunnington by 59% and 37% for SPECint_rate2006 and SPECjbb2005, respectively, and demonstrate significant improvements over Shanghai. Here, the real story is that Istanbul has increased AMD’s 2P-to-4P integer scaling efficiency from 90% to 98%.
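The scaling-efficiency figures above (near 90% for threading, and 90% to 98% for 2P-to-4P integer scaling) follow the usual definition: measured speedup divided by the ideal speedup. The sketch below back-calculates the speedups those percentages imply; the 1.35x and 1.96x inputs are derived from the article's percentages, not separately measured results, and the helper name is hypothetical.

```python
# Scaling efficiency = measured speedup / ideal speedup, where the ideal is the
# ratio of resources (cores or sockets). The speedups fed in below are
# back-calculated from the article's 90% and 98% figures, not measured data.

def scaling_efficiency(speedup: float, before: int, after: int) -> float:
    """Fraction of the ideal linear speedup actually achieved."""
    return speedup / (after / before)


# 4 -> 6 cores at the same clock: ideal speedup is 1.5x,
# so ~90% efficiency corresponds to a ~1.35x speedup.
print(f"{scaling_efficiency(1.35, 4, 6):.0%}")  # 90%

# 2P -> 4P: ideal speedup is 2x, so 98% efficiency means 4P ~ 1.96x of 2P.
print(f"{scaling_efficiency(1.96, 2, 4):.0%}")  # 98%
```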

Istanbul represents the performance bump AMD needed in the interval between Nehalem-EP’s launch and C32/G34 in Q1/2010. AMD’s Istanbul will have a solid market position in the 2P space until the 8-core variant of Nehalem-EP hits the streets. Likewise, Istanbul will remain unchallenged in the 4P/8P segment until the 8-core Nehalem-EX is released in Q1/2010. We expect to see excellent results in the 12-core, 24-core and 48-core virtualization benchmarks in the coming weeks.

Further, in a virtualization comparison, we must consider how core count affects virtualization licensing. When released in May 2009, VMware’s vSphere drew a line in the core-count sand at 6 cores per CPU. For CPUs with more than 6 cores (the current Dunnington standard), vSphere must be licensed at the “Advanced” or “Enterprise Plus” level just to accommodate the core count. This shift up in licensing, while adding features, represents a per-socket markup of 185% ($1,450) and 22% ($620), respectively.
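The licensing argument boils down to a threshold check plus a markup calculation. The sketch below encodes the 6-cores-per-CPU line described above and backs out the per-socket base price implied by each dollar delta and percentage markup, assuming the percentages are markups over the pre-upgrade price. The function names are hypothetical, and the results are implied figures, not VMware list prices.

```python
# Sketch of the vSphere core-count rule described above: editions capped at
# 6 cores per CPU force an upgrade for CPUs with more cores.

CORE_LIMIT = 6  # cores per CPU before an edition upgrade is required


def needs_upgrade(cores_per_cpu: int) -> bool:
    return cores_per_cpu > CORE_LIMIT


def implied_base_price(delta_usd: float, markup_pct: float) -> float:
    """Pre-upgrade price implied by a dollar delta and a percentage markup."""
    return delta_usd / (markup_pct / 100)


for cores in (4, 6, 8, 12):
    print(f"{cores:>2}-core CPU: upgrade required = {needs_upgrade(cores)}")
print(f"Base implied by +$1,450 at 185%: ${implied_base_price(1450, 185):,.0f}")
print(f"Base implied by +$620 at 22%:    ${implied_base_price(620, 22):,.0f}")
```

The contrast explains why the same dollar-scale delta reads so differently: $1,450 nearly triples a roughly $784 license, while $620 adds about a fifth to a roughly $2,818 one.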

We expect to see vSphere’s pricing model shift to accommodate 8-core and possibly 12-core CPUs more generously as 2009 comes to a close (Q4/2009 to Q1/2010) and market forces cause VMware’s licensing position to be called into question. Until that time, the 8-core Nehalem-EP and Nehalem-EX, if launched before VMware resets its core-per-CPU bar, will put Intel’s offering at a disadvantage in the VMware space. This use case will either drive Nehalem 8-core system builders to solutions other than vSphere, or stand as a “virtualization tax” that impacts Nehalem 8-core TCO.

Virtualization Use Case Comparisons

The virtualization segment poised to benefit the most from the Istanbul announcement is SMB. This is especially the case for vSphere Essentials, which is aggressively positioned towards the growing SMB market. Looking at the total solution cost for Istanbul versus Nehalem-EP (not including storage, which represents the biggest variable in virtualization today), we’ve configured two similar systems and generated a $/VM analysis to see where Istanbul stands relative to Nehalem.

We chose the Supermicro 6062T-NTR+ system with an LSI 3442/1068E add-on and 73GB SAS RAID1 for the Nehalem-EP platform, and the Tyan TA26B2932-SI with integrated LSI 1068E and 73GB SAS RAID1 for Istanbul. Each was configured with six 1Gbps ports in total and enough memory to reach its “sweet spot” of $/VM (72GB DDR3/800 for Nehalem-EP and 64GB DDR2/533 for Istanbul). The cost of the listed VMware solution and the effective cost of the 1Gbps Ethernet ports are also figured into the dollar amounts. We then applied our SMB load profile of 2.37GB/VM and 1.7 vCPU/VM (averages) to calculate the $/VM estimate.
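The Cost/VM column in the tables that follow is the total street cost divided by the number of VMs hosted. A minimal sketch, assuming 24 VMs per host (what the VM column works out to: 48, 72 and 120 VMs for 2, 3 and 5 hosts) and ignoring hypervisor memory overhead in the capacity check; the helper names are hypothetical.

```python
# Reproduce the Cost/VM column: total street cost divided by VMs hosted.
# Assumption: 24 VMs per host, as implied by the VM column in the tables.

VMS_PER_HOST = 24
GB_PER_VM = 2.372  # SMB profile: 2,372MB per VM


def cost_per_vm(street_usd: float, hosts: int) -> float:
    return street_usd / (hosts * VMS_PER_HOST)


def fits_memory(host_gb: float) -> bool:
    """Check the per-host VM count fits in RAM (ignores hypervisor overhead)."""
    return VMS_PER_HOST * GB_PER_VM <= host_gb


print(f"${cost_per_vm(15_714.15, 2):.2f}/VM")  # Nehalem-EP E5540, Essentials
print(f"${cost_per_vm(12_129.39, 2):.2f}/VM")  # Istanbul 2427, Essentials
print(fits_memory(64), fits_memory(72))
```

Both memory configurations comfortably hold 24 VMs at about 57GB, so the VM count is the same across platforms and the $/VM difference comes down to street cost.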

Looking at Nehalem-EP (2.53GHz E5540 and 2.26GHz E5520 SKUs), we’re estimating a $295/VM to $503/VM cost range (best case) for VMware solutions ranging from vSphere Essentials (2-3 systems) to vSphere Advanced (3-5 systems). These platforms should be able to sustain 45-125 virtual workloads.

Nehalem-EP 2P/QPI Configuration | Street $ | VMs (2,372MB, 1.7 vCPUs each) | Max vCPUs (3.38/c) | Cost/VM | VMware Solution
2 x 2P/8C Nehalem-EP, E5540 2.53GHz, 5.86GT QPI with 72GB DDR3/800 | $15,714.15 | 48 | 82 | $327.38 | vSphere Essentials
3 x 2P/8C Nehalem-EP, E5540 2.53GHz, 5.86GT QPI with 72GB DDR3/800 | $25,702.73 | 72 | 123 | $356.98 | vSphere Essentials Plus
3 x 2P/8C Nehalem-EP, E5540 2.53GHz, 5.86GT QPI with 72GB DDR3/800 | $34,777.73 | 72 | 123 | $483.02 | vSphere Advanced
5 x 2P/8C Nehalem-EP, E5540 2.53GHz, 5.86GT QPI with 72GB DDR3/800 | $60,364.88 | 120 | 205 | $503.04 | vSphere Advanced

Nehalem-EP 2P/QPI Configuration | Street $ | VMs (2,372MB, 1.7 vCPUs each) | Max vCPUs (3.38/c) | Cost/VM | VMware Solution
2 x 2P/8C Nehalem-EP, E5520 2.26GHz, 5.86GT QPI with 72GB DDR3/800 | $14,202.15 | 48 | 82 | $295.88 | vSphere Essentials
3 x 2P/8C Nehalem-EP, E5520 2.26GHz, 5.86GT QPI with 72GB DDR3/800 | $23,434.73 | 72 | 123 | $325.48 | vSphere Essentials Plus
3 x 2P/8C Nehalem-EP, E5520 2.26GHz, 5.86GT QPI with 72GB DDR3/800 | $32,509.73 | 72 | 123 | $451.52 | vSphere Advanced
5 x 2P/8C Nehalem-EP, E5520 2.26GHz, 5.86GT QPI with 72GB DDR3/800 | $56,584.88 | 120 | 205 | $471.54 | vSphere Advanced

For Istanbul (2.6GHz 2435 and 2.2GHz 2427 SKUs), we’re estimating a $253/VM to $473/VM cost range (best case) for VMware solutions ranging from vSphere Essentials (2-3 systems) to vSphere Advanced (3-5 systems). Like the EP systems, these platforms should be able to sustain 45-125 virtual workloads.

Istanbul 2P/HT3 Configuration | Street $ | VMs (2,372MB, 1.7 vCPUs each) | Max vCPUs (3.38/c) | Cost/VM | VMware Solution
2 x 2P/12C Istanbul, 2435, 2.6GHz, 4.8GT HT3 with 64GB DDR2/533 | $14,265.39 | 48 | 82 | $297.20 | vSphere Essentials
3 x 2P/12C Istanbul, 2435, 2.6GHz, 4.8GT HT3 with 64GB DDR2/533 | $23,529.59 | 72 | 123 | $326.80 | vSphere Essentials Plus
3 x 2P/12C Istanbul, 2435, 2.6GHz, 4.8GT HT3 with 64GB DDR2/533 | $32,604.59 | 72 | 123 | $452.84 | vSphere Advanced
5 x 2P/12C Istanbul, 2435, 2.6GHz, 4.8GT HT3 with 64GB DDR2/533 | $56,742.98 | 120 | 205 | $472.86 | vSphere Advanced

Istanbul 2P/HT3 Configuration | Street $ | VMs (2,372MB, 1.7 vCPUs each) | Max vCPUs (3.38/c) | Cost/VM | VMware Solution
2 x 2P/12C Istanbul, 2427, 2.2GHz, 4.8GT HT3 with 64GB DDR2/533 | $12,129.39 | 48 | 82 | $252.70 | vSphere Essentials
3 x 2P/12C Istanbul, 2427, 2.2GHz, 4.8GT HT3 with 64GB DDR2/533 | $20,325.59 | 72 | 123 | $282.30 | vSphere Essentials Plus
3 x 2P/12C Istanbul, 2427, 2.2GHz, 4.8GT HT3 with 64GB DDR2/533 | $29,400.59 | 72 | 123 | $408.34 | vSphere Advanced
5 x 2P/12C Istanbul, 2427, 2.2GHz, 4.8GT HT3 with 64GB DDR2/533 | $51,402.98 | 120 | 205 | $428.36 | vSphere Advanced

For SMBs, this means a very close price-performance choice between Nehalem-EP and Istanbul within the same power footprint. Istanbul’s biggest advantage in the SMB space is really the $2,075 (15%) difference in virtualization entry cost. That $2,000 is much more significant for an SMB just entering the virtualization life cycle than for enterprise customers. For SMBs, the savings can easily translate into backup and storage licenses or integration services, resulting in better TCO for Istanbul.
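The entry-cost gap can be checked against the tables by comparing the cheapest 2-host vSphere Essentials configurations: the E5520 for Nehalem-EP and the 2427 for Istanbul. This pairing is our reading of which rows the figure compares.

```python
# Entry-cost gap between the cheapest 2-host vSphere Essentials configs
# in the tables: Nehalem-EP E5520 vs. Istanbul 2427.
nehalem_entry = 14_202.15
istanbul_entry = 12_129.39

delta = nehalem_entry - istanbul_entry
print(f"${delta:,.2f} ({delta / nehalem_entry:.1%} of the Nehalem-EP entry cost)")
```

This works out to about $2,073 (14.6%), within a few dollars of the rounded figure above.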
