Posts Tagged ‘AMD SR5690’


Quick Take: Magny-Cours Spotted, Pushed to 3GHz for wPrime

September 13, 2009

Andreas Galistel at NordicHardware posted an article showing a system with a pair of Magny-Cours engineering samples running at 3.0GHz. Undoubtedly these images were culled from a report “leaked” on the XtremeSystems forums showing a “DINAR2” motherboard with the SR5690 chipset – in single and dual processor installations – running Magny-Cours at the more typical pre-release speed of 1.7GHz.

We know that Magny-Cours is essentially an MCM of Istanbul delivered in the rectangular socket G34 package. One illuminating thing about the two posts is the reported “reduction” in L3 cache from 12MB (2 x 6MB in MCM) to 10MB (2 x 5MB in MCM). Where did the missing 2MB go? That’s easy: since a 2P Magny-Cours installation is essentially a 4P Istanbul configuration, these processors have the new HT Assist feature enabled – giving 1MB of cache from each chip in the MCM to HT Assist.

“wPrime uses a recursive call of Newton’s method for estimating functions, with f(x)=x^2-k, where k is the number we’re sqrting, until Sgn(f(x)/f'(x)) does not equal that of the previous iteration, starting with an estimation of k/2. It then uses an iterative calling of the estimation method a set amount of times to increase the accuracy of the results. It then confirms that n(k)^2=k to ensure the calculation was correct. It repeats this for all numbers from 1 to the requested maximum.”

wPrime site
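
For readers who want to see the quoted procedure in motion, here is a minimal Python sketch of it. This is not wPrime's source code: it uses plain loops rather than recursion, and the function names, the `refine_steps` and `max_steps` parameters and the tolerance in the final check are our own assumptions for illustration.

```python
import math


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def newton_sqrt(k, refine_steps=4, max_steps=200):
    """Estimate sqrt(k) for k > 0 with Newton's method on f(x) = x^2 - k."""
    x = k / 2.0  # starting estimate from the quoted description
    previous = None
    # Phase 1: step until the sign of f(x)/f'(x) differs from the last one.
    # max_steps is an assumed safety cap, not part of the description.
    for _ in range(max_steps):
        step = (x * x - k) / (2.0 * x)  # f(x) / f'(x)
        current = sign(step)
        if current == 0 or (previous is not None and current != previous):
            break
        previous = current
        x -= step
    # Phase 2: a fixed number of extra Newton steps to tighten the estimate.
    for _ in range(refine_steps):
        x -= (x * x - k) / (2.0 * x)
    return x


def run(maximum):
    """Take the root of every integer from 1 to maximum and verify each one."""
    for k in range(1, maximum + 1):
        root = newton_sqrt(float(k))
        # Confirm that root^2 is k, within an assumed floating-point tolerance.
        if not math.isclose(root * root, k, rel_tol=1e-9):
            raise ArithmeticError(f"verification failed for k={k}")
    return maximum
```

Because every number from 1 to the maximum is handled independently, the work splits cleanly across threads, which is why core count matters so much in the results discussed here.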

Another intriguing thing about the XtremeSystems post in particular is the reported wPrime 32M and 1024M completion times. Compared to the hyper-threading-enabled 2P Xeon W5590 (130W TDP) running wPrime 32M at 3.33GHz (3.6GHz turbo) in 3.950 seconds, the 2P 3.0GHz Magny-Cours completed wPrime 32M in an unofficial 3.539 seconds – about 10% quicker while running a 10% slower clock. Through the myopic lens of this result, it would appear AMD’s choice of “real cores” versus hyper-threading delivers its punch.
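
The two “10%” figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the helper name is ours, and the clock comparison uses the stated base clocks and ignores turbo.

```python
def percent_lower(value, baseline):
    """Percentage by which value falls below baseline."""
    return (baseline - value) / baseline * 100.0


# wPrime 32M completion times in seconds, as reported above.
time_gain = percent_lower(3.539, 3.950)
# Base clocks in GHz; turbo on the Xeon is ignored (our simplification).
clock_gap = percent_lower(3.0, 3.33)

print(f"{time_gain:.1f}% quicker at a {clock_gap:.1f}% lower clock")
```

That works out to roughly 10.4% less time at a clock about 9.9% lower.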

SOLORI’s Take: As a “reality check” we can compare the reigning quad-socket, quad-core Opteron 8393 SE result in wPrime 32M and wPrime 1024M at 3.90 and 89.52 seconds, respectively. Adjusted for clock and core count versus its Shanghai cousin, the Magny-Cours engineering samples – at 3.54 and 75.77 seconds, respectively – turned in times about 10% slower than our calculus predicted. While still “record breaking” for 2P systems, we expected the Magny-Cours/Istanbul cores to out-perform Shanghai clock-per-clock – even at this stage of the game.

Due to the multi-threaded nature of the wPrime benchmark, it is likely that the HT Assist feature – enabled in a 2P Magny-Cours system by default – is the cause of the discrepancy. By reducing the available L3 cache by 1MB per die – 4MB of L3 cache total – HT Assist actually could be creating a slow-down. However, there are several things to remember here:

  • These are engineering samples qualified for 1.7GHz operation
  • Speed enhancements were performed with tools not yet adapted to Magny-Cours
  • The author indicated a lack of control over AMD’s Cool’n’Quiet technology which could have made “as tested” core clocks somewhat lower than what CPU-Z reported (at least during the extended tests)
  • It is speculated that AMD will launch Magny-Cours at 2.2GHz (top bin), making the 2.6+ GHz results non-typical
  • The BIOS and related dependencies are likely still being “baked”

The more “typical” engineering sample speed tests posted on the XtremeSystems forum track with the 3.0GHz overclock results: at 2.6GHz, the 2P Magny-Cours completed wPrime 32M and 1024M in 3.947 and 79.625 seconds, respectively. Even at that speed, the 24-core system is on par with the 2P Nehalem system clocked nearly a GHz faster. Oddly, Intel reports the W5590 as not supporting “turbo” or hyper-threading, although actual testing makes it clear that Intel’s marketing is incorrect.

Assuming Magny-Cours improves slightly on its way to market, we already know how 24-core Istanbul stacks up against 16-thread Nehalem in VMmark and what that means for Nehalem-EP. This partly explains the marketing shift as Intel tries to position Nehalem-EP as destined for workstations instead of servers. Whether you consider this move a prelude to the Nehalem-EX v. Magny-Cours combat to come or an attempt to keep Intel’s server chip power average down by eliminating the 130W+ parts from the “server” list, Intel and AMD will each attempt to win the war before the first shot is fired. Either way, we see nothing that disrupts the price-performance and power-performance comparison models that dominate the server markets.

[Ed: The 10% difference is likely due to the fact that the author was unable to get “more than one core” clocked at 3.0GHz. Likewise, he was uncertain that all cores were reliably clocking at 2.6GHz for the longer wPrime tests. Again, this engineering sample was designed to run at 1.7GHz and was not likely “hand picked” to run at much higher clocks. He speculated that some form of dynamic core clocking linked to temperature was affecting clock stability – perhaps due to some AMD-P tweaks in Magny-Cours.]


Quick Take: DDR3 Prices on the Rise

August 4, 2009

In the current server-class arms race, Intel and AMD have secured separate quarters: Intel’s rival QPI architecture coupled to a 3-channel DDR3 memory bus and functional hyper-threading cores (top bin parts) holds the pure performance sector; while AMD’s improved Istanbul cores can be delivered 6 at a time and paired with inexpensive DDR2 memory to achieve better price-performance (acquisition). Both solutions deliver about the same economies in power consumption under virtualized loads.

All in all, the Twin2 with Xeon L5520 CPUs is the best platform for those seeking an affordable server with an excellent performance/watt ratio at an affordable price. On the other hand, if performance/price is the most important criterion followed by performance/watt, we would probably opt for the six-core Opteron version of the Twin2. Supermicro has “a blade killer” available with the Twin², especially for those people who like to keep the hardware costs low.

Johan De Gelas, AnandTech, July 22, 2009

Global DDR2 and DDR3 Capacity

Meanwhile, the cost differential between DDR3 and DDR2 continues to widen due to increased demand in the notebook sector and reduced supply (capacity). According to DRAMeXchange, the trend will continue into Q4/09 as suppliers are expected to commit up to 30% of capacity to DDR3 by that time.

At the same time, DDR3 prices continue to inch up, by 5% in July, while DDR2 prices have appeared to bottom-out. This trend in DDR3 pricing is consistent across all speed ratings (1066/1333/1600) and, despite artificial downward price pressure from Samsung, has managed to drift upward 20% since May, 2009.

DRAMeXchange, DDR3 1Gb 128Mx8 1333MHz Price Chart

DDR3 Price Trend, May to August, 2009

Because low-end, lower-priced 2GB DDR3/1066 ($60/stick) memory shows little advantage over 2GB DDR2/800 ($35/stick), the 70% price premium keeps DDR2 in demand. With the added economic pressures of the world economy and the cautious growth outlook of the manufacturing sector, the cross-over from DDR2 to DDR3 will come at a significant cost: either to the consumer or the supplier.

Until the cross-over, DDR2-based systems will continue to be a favorite in price-sensitive applications (i.e., where total system cost plays a significant role in purchasing decisions). As an example of this economic inequality, let’s take the HP DL380 G6 and DL385 G6 as a comparison. Adding 16GB to the DL380 adds about $760 to the price tag (4x4GB DDR3-1066), while adding the same amount of memory to the DL385 adds only $410 (4x4GB DDR2-800). This comparison demonstrates an 85% price premium of DDR3 versus DDR2, a bit higher (percentage-wise) than the desktop norm of 70%.
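
Both premiums follow directly from the prices quoted. A minimal sketch in Python (the function and variable names are ours):

```python
def premium_percent(ddr3_price, ddr2_price):
    """Extra cost of DDR3 over DDR2, as a percentage of the DDR2 price."""
    return (ddr3_price - ddr2_price) / ddr2_price * 100.0


# 2GB desktop sticks: DDR3/1066 at $60 versus DDR2/800 at $35.
desktop = premium_percent(60, 35)
# 16GB server upgrade (4x4GB): DL380 G6 at $760 versus DL385 G6 at $410.
server = premium_percent(760, 410)

print(f"desktop: {desktop:.0f}%  server: {server:.0f}%")
```

The unrounded values are about 71% and 85%.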

SOLORI’s Take: While the cost of memory in desktop systems typically represents a small portion of the overall system cost, the same cannot be said for virtualization systems where entry configurations weigh in at 16GB and often run from 48GB to 72GB in “fully loaded” systems. This, as our calculus has shown, is where the sweet-spot of $/VM is delivered.

In such configurations, the cost of DDR3 memory can triple the system cost ($6,370 for 2P, L5506 w/12x4GB DDR3-1066R vs. $5,201 for 2P 2427 w/12x4GB DDR2-800). Moving to the higher memory footprint in 2P systems is typically not cost effective because core count cannot keep up with the memory needs of the virtual machine inventory. However, if it were possible to utilize additional memory in the 2P platform, our benchmark 8GB DDR3-1066 versus DDR2-667 price comparison tells another story. At $900/stick, 8GB DDR3 is still priced at about 235% of 8GB DDR2, making 96GB DDR3 systems (2P Xeon w/HT) nearly $6,200 per server more costly than their DDR2 counterparts (2P Istanbul) based on memory pricing alone.
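
The 96GB figures can be cross-checked with a little arithmetic. The sketch below assumes 12 sticks of 8GB per server and back-computes the DDR2 price from the roughly $6,200 difference; the post does not state that DDR2 price, so treat it as an inferred value rather than a quote.

```python
STICK_GB = 8
SYSTEM_GB = 96
DDR3_PER_STICK = 900      # dollars, from the text
EXTRA_PER_SERVER = 6200   # dollars, "nearly $6,200" from the text

sticks = SYSTEM_GB // STICK_GB                   # 12 DIMMs per server
ddr3_total = sticks * DDR3_PER_STICK             # memory bill with DDR3
ddr2_total = ddr3_total - EXTRA_PER_SERVER       # implied DDR2 memory bill
ddr2_per_stick = ddr2_total / sticks             # implied, not a quoted price
price_ratio = DDR3_PER_STICK / ddr2_per_stick    # DDR3 price over DDR2 price

print(f"{sticks} sticks, DDR3 ${ddr3_total}, implied DDR2 ${ddr2_total}")
print(f"implied DDR2 stick ${ddr2_per_stick:.0f}, ratio {price_ratio:.2f}")
```

On these assumptions the implied 8GB DDR2 stick costs about $383, and the 235% figure lines up with DDR3 costing roughly 2.35 times as much.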

SOLORI’s 2nd Take: We’re hoping to see Tyan and Supermicro release SR5690 chipset-based systems – promised in Q3/2009 – to take advantage of this pricing trend and round-out the Istanbul offering before Q1/2010 ushers-in the next wave of multi-core systems. With 10G prices on the decline, we think today’s virtualization applications make Istanbul+IOMMU a good price-performance and price-feature fit in the 32-64GB memory footprint space, leaving Nehalem-EP with only the performance niche to its credit. The only question is: where is SR5690?


AMD and Intel I/O Virtualization

April 26, 2009

Virtualization now reaches an I/O barrier where consolidated applications must vie for increasingly limited I/O resources. Early virtualization techniques – both software and hardware assisted – concentrated on process isolation and gross context switching to accelerate the “bulk” of the virtualization process: running multiple virtual machines without significant processing degradation.

As consolidation potentials are greatly enhanced by new processors with many more execution contexts (threads and cores) the limitations imposed on I/O – software translation and emulation of device communication – begin to degrade performance. This degradation further limits consolidation, especially where significant network traffic (over 3Gbps of non-storage VM traffic per virtual server) or specialized device access comes into play.

I/O Virtualization – The Next Step-Up

Intrinsic to AMD-V in revision “F” Opterons and newer AM2 processors is I/O virtualization enabling hardware assisted memory management in the form of a Graphics Aperture Remapping Table (GART) and the Device Exclusion Vector (DEV). These two facilities provide address translation of I/O device access to a limited range of the system physical address space and provide limited I/O device classification and memory protection.
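
To make the two facilities concrete, here is a toy model in Python. It is a conceptual sketch only: the class names, the 4KB page size, the “excluded page” polarity and the pass-through behaviour outside the aperture are our assumptions for illustration, not the layout of the actual hardware tables.

```python
PAGE_SIZE = 4096  # assumed page granularity for this sketch


class ToyGart:
    """Remaps a contiguous device-visible aperture onto scattered physical pages."""

    def __init__(self, aperture_base, page_map):
        self.aperture_base = aperture_base
        self.page_map = page_map  # aperture page index -> physical page number

    def translate(self, device_addr):
        offset = device_addr - self.aperture_base
        if offset < 0 or offset >= len(self.page_map) * PAGE_SIZE:
            return device_addr  # outside the aperture: no translation applied
        page, within = divmod(offset, PAGE_SIZE)
        return self.page_map[page] * PAGE_SIZE + within


class ToyDev:
    """Marks physical pages that devices in a domain may not touch."""

    def __init__(self, excluded_pages):
        self.excluded = set(excluded_pages)

    def allows(self, phys_addr):
        return phys_addr // PAGE_SIZE not in self.excluded


def device_access(gart, dev, device_addr):
    """Translate a device address, then apply the exclusion check."""
    phys = gart.translate(device_addr)
    if not dev.allows(phys):
        raise PermissionError(f"device access to {phys:#x} is excluded")
    return phys
```

With `ToyGart(0x10000000, [7, 3])` and `ToyDev({7})`, an access to the second aperture page lands in physical page 3, while an access to the first aperture page is refused. Translation covers only the aperture and protection is a single allow/deny bit per page, which is the sense in which these facilities are limited.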

Combined with specialized software, GART and DEV provided primitive I/O virtualization but were limited to the confines of the memory map. Direct interaction with devices and virtualization of device contexts in hardware are not efficiently possible in this approach, as VMs need to rely on hypervisor control of device access. AMD defined its I/O virtualization strategy as AMD IOMMU in 2006 (now AMD-Vi) and has continued to improve it through 2009.

With the release of new motherboard chipsets (AMD SR5690) in 2009, significant performance gains in I/O will be brought to the platform with end-to-end I/O virtualization. Motherboard refreshes based on the SR5690 should enable Shanghai and Istanbul processors to take advantage of the full AMD IOMMU specification (now AMD-Vi).

Similarly, Intel’s VT-d approach combines chipset and CPU features to solve the problem in much the same way. Because the memory controller was architecturally separate from the CPU, earlier processors not only had to carry the additional instruction enhancements but also had to be coupled to northbridge chipsets that contained support. This feature was initially available in the Intel Q35 desktop chipset in Q3/2007.