Posts Tagged ‘magny-cours’


Quick Take: Magny-Cours Spotted, Pushed to 3GHz for wPrime

September 13, 2009

Andreas Galistel at NordicHardware posted an article showing a system with a pair of Magny-Cours engineering samples running at 3.0GHz. Undoubtedly these images were culled from a report “leaked” on the XtremeSystems forums showing a “DINAR2” motherboard with SR5690 chipset – in single and dual processor installation – running Magny-Cours at the more typical pre-release speed of 1.7GHz.

We know that Magny-Cours is essentially an MCM of Istanbul delivered in the rectangular socket G34 package. One illuminating detail in the two posts is the reported “reduction” in L3 cache from 12MB (2 x 6MB in MCM) to 10MB (2 x 5MB in MCM). Where did the missing cache go? That’s easy: since a 2P Magny-Cours installation is essentially a 4P Istanbul configuration, these processors have the new HT Assist feature enabled – giving 1MB of cache from each chip in the MCM to HT Assist.

“wPrime uses a recursive call of Newton’s method for estimating functions, with f(x) = x^2 - k, where k is the number we’re sqrting, until Sgn(f(x)/f'(x)) does not equal that of the previous iteration, starting with an estimation of k/2. It then uses an iterative calling of the estimation method a set amount of times to increase the accuracy of the results. It then confirms that n(k)^2 = k to ensure the calculation was correct. It repeats this for all numbers from 1 to the requested maximum.”

- wPrime site
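The procedure quoted above can be sketched in a few lines of Python. This is only an illustration of the description, not wPrime's actual source: the function names, the number of refinement steps, the step cap and the verification tolerance are all assumptions made for the example.

```python
import math


def newton_sqrt(k, refine_steps=4, max_steps=200):
    """Estimate sqrt(k) with Newton's method on f(x) = x^2 - k.

    refine_steps and max_steps are illustrative assumptions; the quoted
    description only says "a set amount of times".
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return 0.0
    x = k / 2.0          # starting estimate of k/2, as in the quote
    prev_sign = 0
    for _ in range(max_steps):
        step = (x * x - k) / (2.0 * x)        # f(x) / f'(x)
        sign = (step > 0) - (step < 0)
        # Stop when the step is exactly zero or its sign flips relative
        # to the previous iteration. For k > 4 the estimate approaches
        # the root from above, so the loop normally ends on rounding or
        # on the max_steps cap instead of a true sign change.
        if sign == 0 or (prev_sign != 0 and sign != prev_sign):
            break
        x -= step
        prev_sign = sign
    # A fixed number of extra iterations to tighten the estimate.
    for _ in range(refine_steps):
        x -= (x * x - k) / (2.0 * x)
    return x


def verify_range(maximum, rel_tol=1e-9):
    """Check that root^2 is (approximately) k for every k in 1..maximum."""
    for k in range(1, maximum + 1):
        root = newton_sqrt(float(k))
        if not math.isclose(root * root, k, rel_tol=rel_tol):
            return False
    return True


if __name__ == "__main__":
    print(newton_sqrt(2.0))
    print(verify_range(1000))
```

Because every value of k is independent of the others, a workload like this splits cleanly across threads, which is why core count weighs so heavily in the results discussed here.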

Another intriguing thing about the XtremeSystems post in particular is the reported wPrime 32M and 1024M completion times. Compared to the hyper-threading-enabled 2P Xeon W5590 (130W TDP) running wPrime 32M at 3.33GHz (3.6GHz turbo) in 3.950 seconds, the 2P 3.0GHz Magny-Cours completed wPrime 32M in an unofficial 3.539 seconds – about 10% quicker while running a 10% slower clock. Through the myopic lens of this result, it would appear AMD’s choice of “real cores” versus hyper-threading delivers its punch.
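The "10% quicker at a 10% slower clock" comparison is simple arithmetic on the figures quoted above. A minimal sketch, where the helper name is an assumption and the clock comparison uses the W5590's 3.33GHz base clock, not its turbo clock:

```python
def percent_reduction(baseline, value):
    """How much lower `value` is than `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


# Figures quoted in the post (wPrime 32M, 2P systems).
XEON_W5590_SECONDS = 3.950
MAGNY_COURS_SECONDS = 3.539
XEON_W5590_GHZ = 3.33       # base clock, turbo ignored
MAGNY_COURS_GHZ = 3.0

time_advantage = percent_reduction(XEON_W5590_SECONDS, MAGNY_COURS_SECONDS)
clock_deficit = percent_reduction(XEON_W5590_GHZ, MAGNY_COURS_GHZ)

if __name__ == "__main__":
    print(f"completion time: {time_advantage:.1f}% shorter")
    print(f"clock speed:     {clock_deficit:.1f}% lower")
```

Both values land close to 10%, which is where the round numbers in the paragraph come from.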

SOLORI’s Take: As a “reality check” we can compare the reigning quad-socket, quad-core Opteron 8393 SE result in wPrime 32M and wPrime 1024M at 3.90 and 89.52 seconds, respectively. Adjusted for clock and core count versus its Shanghai cousin, the Magny-Cours engineering samples – at 3.54 and 75.77 seconds, respectively – turned in times about 10% slower than our calculus predicted. While still “record breaking” for 2P systems, we expected the Magny-Cours/Istanbul cores to out-perform Shanghai clock-per-clock – even at this stage of the game.

Due to the multi-threaded nature of the wPrime benchmark, it is likely that the HT Assist feature – enabled in a 2P Magny-Cours system by default – is the cause of the discrepancy. By reducing the available L3 cache by 1MB per die – 4MB of L3 cache total – HT Assist actually could be creating a slow-down. However, there are several things to remember here:

  • These are engineering samples qualified for 1.7GHz operation
  • Speed enhancements were performed with tools not yet adapted to Magny-Cours
  • The author indicated a lack of control over AMD’s Cool’n’Quiet technology, which could have made “as tested” core clocks somewhat lower than what CPU-Z reported (at least during the extended tests)
  • It is speculated that AMD will launch Magny-Cours at 2.2GHz (top bin), making the 2.6+ GHz results non-typical
  • The BIOS and related dependencies are likely still being “baked”

The more “typical” engineering sample tests posted on the XtremeSystems forum track with the 3.0GHz overclock results: at 2.6GHz, the 2P Magny-Cours system completed wPrime 32M and 1024M in 3.947 seconds and 79.625 seconds, respectively. Even at that speed, the 24-core system is on par with the 2P Nehalem system clocked nearly a GHz faster. Oddly, Intel reports the W5590 as not supporting “turbo” or hyper-threading, although it is clear from actual testing that Intel’s marketing is incorrect.

Assuming Magny-Cours improves slightly on its way to market, we already know how 24-core Istanbul stacks up against 16-thread Nehalem in VMmark and what that means for Nehalem-EP. This partly explains the marketing shift as Intel tries to position Nehalem-EP as destined for workstations instead of servers. Whether you consider this move a prelude to the coming Nehalem-EX v. Magny-Cours combat or an attempt to keep Intel’s server chip power average down by eliminating the 130W+ parts from the “server” list, Intel and AMD will each attempt to win the war before the first shot is fired. Either way, we see nothing that disrupts the price-performance and power-performance comparison models that dominate the server markets.

[Ed: The 10% difference is likely due to the fact that the author was unable to get "more than one core" clocked at 3.0GHz. Likewise, he was uncertain that all cores were reliably clocking at 2.6GHz for the longer wPrime tests. Again, this engineering sample was designed to run at 1.7GHz and was not likely "hand picked" to run at much higher clocks. He speculated that some form of dynamic core clocking linked to temperature was affecting clock stability - perhaps due to some AMD-P tweaks in Magny-Cours.]


Quick-Take: VMworld 2009 Wrap-Up

September 8, 2009

VMworld 2009 in San Francisco started off with a crash and a fist fight, but ended without further incident. If you’re looking for what happened, it would be hard to beat Duncan Epping’s link-summary of the San Francisco VMworld 2009 at Yellow-Bricks, so we won’t even try. Likewise, Chad Sakac has a great EMC viewpoint on his Virtualgeek blog, and – fresh from his new book release – Scott Lowe has some great detail about the VMworld keynotes, events and sessions he attended.

There is a great no-spin commentary on VMworld’s “softer underbelly” on Jon William Toigo’s Drunken Data blog – especially the post about Xsigo’s participation in VMworld 2009. Also, Brian Madden has a great wrap-up video of interviews from the VMworld floor including VMware’s Client Virtualization Platform (CVP) and the software implementation of Teradici’s PC-over-IP.

AMD’s IOMMU was on display using a test mule with two 12-core 6100 processors and an SR5690 chipset. The targets were a FirePro graphics card and a Solarflare 10GE NIC. For IOMMU-based virtualization to have broad appeal, hardware device segmentation must be supported in a manner compatible with vMotion (live migration). No segmentation was hinted at in AMD’s demo (for FirePro), but the fact that vSphere+IOMMU+Magny-Cours equated to enough stability to be openly demonstrating the technology says a lot about the maturity of AMD’s upcoming chips and chipsets. On the other hand, Solarflare’s demonstration previewed – in 10GE – what could be possible in a future version of IOV for GPUs:

“The flexible vNIC demonstration will highlight the Solarstorm server adapter’s scalable, virtualized architecture, supporting 100s of virtual machines and 1000s of vNICs. The Solarstorm vNIC architecture provides flexible mapping of vNICs, so that each guest OS can have its own vNIC, as well as traffic management, enabling prioritization and isolation of IP flows between vNICs.”

- Solarflare Press Release

SOLORI’s Take: The controversy surrounding VMware’s “focus” on the VMware “sphere” of products was a non-starter. The name VMworld does not stand for “Virtualization World” – it stands for “VMware World” and denying competitors “marketing access” to that venue seems like a reasonable restriction. While it may seem like a strong-arm tactic to some, insisting that vendors/partners are there “for VMworld only” – and hence restricting cross-marketing efforts in and around the venue – makes it more difficult for direct competitors to play the “NASCAR-style marketing” (as Toigo calls it) game.

VMworld is a showcase for technologies driving the virtualization eco-system as seen from VMware’s perspective. While there is a growing number of competitors for virtualization mind-share, VMware’s pace and vision – to date – have been driven by careful observation of use-case more so than innovation for innovation’s sake. It is this attention to business need that has made VMware successful and what defines VMworld’s focus – and it is in that light that VMworld 2009 looks like a great success.


Quick Take: HP Sets Another 48-core VMmark Milestone

August 26, 2009

Not satisfied with a landmark VMmark score that crossed the 30 tile mark for the first time, HP’s performance team went back to the benches two weeks later and took another swing at the performance crown. Well, the effort paid off, and HP significantly out-paced their two-week-old record with a score of 53.73@35 tiles in the heavy weight, 48-core category.

Using the same 8-processor HP ProLiant DL785 G6 platform as in the previous run – complete with 2.8GHz AMD Opteron 8439 SE 6-core chips and 256GB DDR2/667 – the new score comes with significant performance bumps in the javaserver, mailserver and database results achieved by the same system configuration as the previous attempt – including the same ESX 4.0 version (164009). So what changed to add an additional 5 tiles to the team’s run? It would appear that someone was unsatisfied with the storage configuration on the mailserver run.

Given that the tile ratio of the previous run ran about 6% higher than its 24-core counterpart, there may have been a small indication that untapped capacity was available. According to the run notes, the only reported change to the test configuration – aside from the addition of the 5 LUNs and 5 clients needed to support the 5 additional tiles – was a notation indicating that the “data drive and backup drive for all mailserver VMs” were repartitioned using AutoPart v1.6.

The change in performance numbers effectively reduces the virtualization cost of the system by 15% to about $257/VM – closing-in on its 24-core sibling to within $10/VM and stretching-out its lead over “Dunnington” rivals to about $85/VM. While virtualization is not the primary application for 8P systems, this demonstrates that 48-core virtualization is definitely viable.
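The roughly 15% drop in cost per VM follows directly from the tile count, since the hardware did not change between runs. A small sketch of that reasoning, assuming the six-VMs-per-tile layout of VMmark 1.x (background knowledge, not stated in this post) and an unchanged system price between the 30-tile and 35-tile runs:

```python
VMS_PER_TILE = 6  # assumption: VMmark 1.x tile layout


def vms_for_tiles(tiles):
    """Total VM count for a run with the given number of tiles."""
    return tiles * VMS_PER_TILE


def cost_per_vm_reduction(old_tiles, new_tiles):
    """Percent drop in cost per VM when the same hardware hosts more tiles.

    With a fixed system price P, cost per VM is P / (tiles * 6), so the
    price cancels out and only the tile ratio matters.
    """
    return (1.0 - old_tiles / new_tiles) * 100.0


if __name__ == "__main__":
    print(vms_for_tiles(35))
    print(f"{cost_per_vm_reduction(30, 35):.1f}% lower cost per VM")
```

The tile ratio alone gives a reduction of a little over 14%, in line with the "about 15%" quoted above.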

SOLORI’s Take: HP’s performance team has done a great job tuning its flagship AMD platform, demonstrating that platform performance is not just related to hertz or core-count but requires balanced tuning and performance all around. This improvement in system tuning demonstrates an 18% increase in incremental scalability – approaching within 3% of the 12-core to 24-core scaling factor, making it actually a viable consideration in the virtualization use case.

In recent discussions with AMD about the SR5690 chipset applications for Socket-F, AMD re-iterated that the mainstream focus for SR5690 has been Magny-Cours and the Q1/2010 launch. Given the close relationship between Istanbul and Magny-Cours – detailed nicely by Charlie Demerjian at Semi-Accurate – the bar is clearly fixed for 2P and 4P virtualization systems designed around these chips. Extrapolating from the similarities and improvements to I/O and memory bandwidth, we expect to  see 2P VMmarks besting 32@23 and 4P scores over 54@39 from HP, AMD and Magny-Cours.

SOLORI’s 2nd Take: Intel has been plugging away with its Nehalem-EX for 8-way systems and – delivering 128-threads – promises to deliver some insane VMmarks. Assuming Intel’s EX scales as efficiently as AMD’s new Opterons have, extrapolations indicate performance for the 4P, 64-thread Nehalem-EX should fall between 41@29 and 44@31 given the current crop of speed and performance bins. Using the same methods, our calculus predicts an 8P, 128-thread EX system should deliver scores between 64@45 and 74@52.

With EX expected to clock at 2.66GHz with 140W TDP and AMD’s MCM-based Magny-Cours doing well to hit 130W ACP in the same speed bins, CIOs balancing power and performance considerations will need to break out the spreadsheets to determine the winners here. With both systems running 4-channel DDR3, there will be no power or price advantage given on either side to memory differences: relative price-performance and power consumption of the CPUs will be major factors. Assuming our extrapolations are correct, we’re looking at a slight edge to AMD in performance-per-watt in the 2P segment, and a significant advantage in the 4P segment.


Quick Take: 6-core “Gulftown” Nehalem-EP Spotted, Tested

August 10, 2009

TechReport is reporting on a Taiwanese overclocker who may be testing a pair of Nehalem 6-core processors (2P) slated for release early in 2010. Likewise, AlienBabelTech mentions a Chinese website, HKEPC, that has preliminary testing completed on the desktop (1P) variant of the 6-core. While these could be different 32nm silicon parts, it is more likely – judging from the CPU-Z outputs and provided package pictures – that these are the same sample SKUs tested as 1P and 2P LGA-1366 components.

What does this mean for AMD and the only 6-core shipping today? Since Intel’s still projecting Q2/2010 for the server part, AMD has a decent opportunity to grow market share for Istanbul. Intel’s biggest rival will be itself – facing a wildly growing number of SKUs across its i-line from i5, i7, i8 and i9 “families” with multiple speed and feature variants. Clearly, the non-HT version would stand as a direct competitor to Istanbul’s native 6-core SKUs. Likewise, Istanbul may be no match for the 6-core Nehalem with HT and “turbo core” feature set.

However, with an 8-core “Beckton” Nehalem variant on the horizon, it might be hard to understand just where Gulftown fits in Intel’s picture. Intel faces a serious production issue, filling fab capacity with 4-core, 6-core and 8-core processors, each with speed, power, socket and HT variants from which to supply high-speed, high-power SKUs and lower-speed, low-power SKUs for 1P, 2P and 4P+ destinations. Doing the simple math with 3 SKUs per part (3 core counts x 3 SKUs each), Intel would be offering the market a minimum of 18 base parts under its current marketing strategy: 9 with HT/turbo, 9 without HT/turbo. For socket LGA-1366, this could easily mean 40+ SKUs with 1xQPI and 2xQPI variants included (up from 23).
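The "simple math" behind the 18 base parts can be enumerated explicitly. In this sketch the bin labels ("high", "mid", "low") are hypothetical placeholders for the 3 SKUs per part; only the core counts and the HT/turbo split come from the paragraph above.

```python
from itertools import product

CORE_COUNTS = (4, 6, 8)
SPEED_BINS = ("high", "mid", "low")   # hypothetical labels for 3 SKUs per part
HT_TURBO = (True, False)

# Every combination of core count, speed bin and HT/turbo setting.
base_parts = list(product(CORE_COUNTS, SPEED_BINS, HT_TURBO))

with_ht = [part for part in base_parts if part[2]]
without_ht = [part for part in base_parts if not part[2]]

if __name__ == "__main__":
    print(len(base_parts), len(with_ht), len(without_ht))
```

Adding one more two-way split, such as the 1xQPI and 2xQPI variants, doubles the count again, which shows how quickly the line-up approaches the 40+ SKU range.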

SOLORI’s take: Intel will have to create some interesting “crippling or pricing tricks” to keep Gulftown from cannibalizing the Gainestown market. If they follow their “normal” play book, we predict the next 10 months will play out like this:

  1. Initially there will be no 8-core product for 1P and 2P systems (LGA-1366), allowing for artificially high margins on the 8-core EX chip (LGA-1567), slowing the inevitable cannibalization of the 4-core/2P market, and easing production burdens;
  2. Intel will silently and abruptly kill Itanium in favor of “hyper-scale” Nehalem-EX variants;
  3. Gulftown will remain high-power (90-130W TDP) and be positioned against AMD’s G34 systems and Magny-Cours – pitting 12-core against 12-thread;
  4. Intel creates a “socket refresh” (LGA-1566?) to enable “inexpensive” 2P-4P platforms from its Gulftown/Beckton line-up in 2H/2010 (ostensibly to maintain parity with G34) without hurting EX;
  5. Revised, lower-power variants of Gainestown will be positioned against AMD’s C32 target market;
  6. Intel will cut SKUs in favor of higher margins, increasing speed and features for “same dollar” cost;
  7. Non-HT parts will begin to disappear in 4-core configurations completely;
  8. Intel’s AES enhancements in Gulftown will allow it to further differentiate itself in storage and security markets.

It would be a mistake for Intel to continue growing SKU count or provide too much overlap between 4-core HT and 6-core non-HT offerings. If purchasing trends soften in 4Q/09 and remain (relatively) flat through 2Q/10, Intel will benefit from a leaner, well-differentiated line-up. AMD has already announced a “leaner” plan for G34/C32. If all goes well at the fabs, 1H/2010 will be a good old-fashioned street fight between blue and green.


Shanghai Economics 101

April 30, 2009

Before the release of the Istanbul 6-core processor we wanted to preview the CAPEX comparisons we’ve been working on between today’s Opteron (Shanghai) and today’s Nehalem-EP. The results are pretty startling and mostly due to the Nehalem-EP’s limited memory addressing capability. Here are the raw numbers for comparable performance systems (i.e. high-end):

| Nehalem-EP Configuration | Street $ | Shanghai HT3 Configuration | Street $ | Savings $ | Savings % |
|---|---|---|---|---|---|
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 24GB DDR3/1333 | $7,017.69 | 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 32GB DDR2/800 | $5,892.12 | $1,125.57 | 16.04% |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 48GB DDR3/1066 | $7,755.99 | 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 48GB DDR2/800 | $6,352.12 | $1,403.87 | 18.10% |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 96GB DDR3/1066 | $21,969.99 | 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 96GB DDR2/667 | $11,968.72 | $10,001.27 | 45.52% |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 144GB DDR3/800 | $30,029.19 | 2P/8C Shanghai, 2393 SE, 3.1GHz, 4.4GT HT3 with 128GB DDR2/533 | $14,300.92 | $15,728.27 | 52.38% |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 96GB DDR3/1066 | $21,969.99 | 4P/16C Shanghai, 8393 SE, 3.1GHz, 4.4GT HT3 with 96GB DDR2/800 | $17,512.87 | $4,457.12 | 20.29% |
| 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 144GB DDR3/800 | $30,029.19 | 4P/16C Shanghai, 8393 SE, 3.1GHz, 4.4GT HT3 with 192GB DDR2/667 | $28,746.07 | $1,283.12 | 4.27% |
| 2 x 2P/8C, Nehalem-EP, W5580 3.2GHz, 6.4GT QPI with 144GB (288GB total) DDR3/800 | $60,058.38 | 1 x 4P/16C Shanghai, 8393 SE, 3.1GHz, 4.4GT HT3 with 256GB DDR2/533 | $33,410.47 | $26,647.92 | 44.37% |

Even the 4-socket Shanghai 8393 SE averages 23% lower implementation cost than Nehalem-EP and provides 16 “real” cores versus 8 “real” cores in the process. Even at 50% theoretical efficiency using Nehalem’s SMT, the 4P Shanghai represents a solid choice in the performance segment. An Istanbul drop-in upgrade widens the gulf in capabilities even further.
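The savings columns and the 23% average for the 4-socket comparisons can be reproduced from the street prices in the table above. A minimal sketch; the variable and function names are assumptions, and the prices are copied from the three 8393 SE rows:

```python
# (Nehalem-EP street price, Shanghai street price) for the three
# 4P/16C Shanghai 8393 SE comparisons in the table.
ROWS_4P = [
    (21969.99, 17512.87),
    (30029.19, 28746.07),
    (60058.38, 33410.47),
]


def savings(nehalem_price, shanghai_price):
    """Return (dollars saved, percent saved relative to the Nehalem-EP price)."""
    saved = nehalem_price - shanghai_price
    return saved, saved / nehalem_price * 100.0


def average_savings_percent(rows):
    """Unweighted mean of the per-row savings percentages."""
    return sum(savings(n, s)[1] for n, s in rows) / len(rows)


if __name__ == "__main__":
    for nehalem, shanghai in ROWS_4P:
        dollars, percent = savings(nehalem, shanghai)
        print(f"${dollars:,.2f} saved ({percent:.2f}%)")
    print(f"average: {average_savings_percent(ROWS_4P):.1f}%")
```

The average here is a plain mean of the three percentages, which is the reading that matches the 23% figure; weighting by system price would give a different number.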

Based on today’s economics and the history of seamless vMotion between Opteron processors, 4P/24C Istanbul will be a no-brainer investment. With 2P/24C and 4P/48C Magny-Cours on the way to handle the “really big” tasks, a Shanghai-Istanbul eco-system looks like an economic stimulus all its own.


Magny-Cours Spotted

April 29, 2009
Magny-Cours, 12-core Processor


AMD’s next generation “G34” socket Magny-Cours processor was spotted recently by XbitLabs running in AMD’s 4-way test mule platform. We’ve talked about Magny-Cours and socket-G34 before, but had no picture until now. The multi-chip module (MCM) heritage is obvious given its rectangular shape.

Critical for AMD will be HT3+DCA2 efficiency and memory bandwidth to counter the apparent success of Nehalem-EP’s SMT technology. Although AMD does not consider hyperthreading to be a viable technology for them, it appears to be working for Intel in benchmark cases.

While it seems logical that more “physical” cores should scale better than the “logical” cores provided by SMT, Intel is gaining some ground on legacy “physical core” systems, demonstrating what appears to be linear scaling in VMmark. However, Intel has a fine reputation for chasing – and mastering – benchmark performance only to show marginal gains in real-world applications.

Meanwhile, the pressure mounts on Istanbul’s successful launch in June with white box vendors making ready for the next wave of “product release buzz” to stimulate sinking sales. Decision makers will have a lot of spreadsheet work to do to determine where the real price-performance lies. Based on the high cost of dense DDR3 and DDR2, the 16-DIMM/CPU advantage is weighing heavily on AMD’s side from a CAPEX and OPEX perspective (DDR2 is already a well-entrenched component of all socket-F platforms).

Up to now, Intel’s big benchmark winners have been the W5580 and X5570 with $1,700 and $1,500 unit prices, respectively. Compounded with high-cost DDR3 dual-rank memory, or reduction in memory bandwidth (which eliminates a significant advantage), the high-end Nehalem-EP is temporarily caught in an economic bind, severely limiting its price-performance suitability.


Opteron Turns 6: Plus Istanbul and a New Road-map

April 22, 2009

AMD released an updated technology road-map for its Opteron processor family, beginning with the early availability of Istanbul – its Socket-F compatible 6-core processor – shipping for revenue in May and available from OEMs in June. This information was delivered in a webcast today.

AMD Istanbul 6-core Processor


“…up to 30 percent more performance within the same power envelope and on the same platform as current Quad-Core AMD Opteron…”

Additionally, AMD updated the availability of its Direct Connect Architecture 2.0 to be available only in the Opteron 4000 and 6000 series (socket C32 and G34, respectively). Companies waiting for the 12-core “Magny-Cours” processor will have to switch to the G34 platform in 2010. AMD announced that it is already shipping this 45nm part to sampling partners, and some customers will receive parts in 2H/2009. Magny-Cours is expected to be available from OEMs and system vendors in 1H/2010.

“Opteron 4000 series is also planned for introduction in 2010 for 1P and 2P servers and designed to address virtualized Web and cloud computing environments. The 4000 series will launch with 4- and 6-core processors…”

AMD believes, with core counts on the rise, dense computing (HPC and data center virtualization or cloud) will rely on the 4000 series and its more “green friendly” low power parts called “EE” offering comparable performance at 40W average power. This will create a differential in the server space between 4000 and 6000 (much like 2000 and 8000 today) but with overlap in the 2P market (unlike 2000/8000). The 6000 series is envisioned as a “high performance computing” part where power sensitivity is not the major concern.
