Posts Tagged ‘nehalem-ex’


Quick Take: IBM Tops VMmark, Crushes Record with 4P Nehalem-EX

April 7, 2010

It was merely a matter of time before one of the new core-rich titans – Intel’s 8-core “Beckton” Nehalem-EX (Xeon 7500) or AMD’s 12-core “Magny-Cours” (Opteron 6100) – made a name for itself on VMware’s VMmark benchmark. Today, Intel draws first blood in the form of a 4-processor, 32-core, 64-thread monster from IBM: the x3850 X5 running four Xeon X7560s (2.26GHz base, 2.67GHz w/turbo, 130W TDP each) and 384GB of DDR3-1066 low-power registered DIMMs. Weighing-in at 70.78@48 tiles, the 4P IBM System x3850 X5 handily beats the next highest system – the 48-core DL785 G6, which set the record of 53.73@35 tiles back in August, 2009 – besting it by over 30%.

At $3,800+ per socket for the tested Beckton chip, this is no real 2P alternative. In fact, a pair of Cisco UCS B250 M2 blades will get 52 tiles running for much less money. Looking at processor and memory configurations alone, this is a $67K+ enterprise server, resulting in a moderately-high $232/VM price point for the IBM x3850 X5.
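That $232/VM figure is easy to sanity-check, assuming the standard VMmark 1.x tile of six VMs (the ~$67K figure is the article’s estimate for processors and memory alone, not a full configuration price):

```python
# Rough $/VM check for the IBM x3850 X5 result.
tiles = 48
vms = tiles * 6                # 288 VMs under load with 6 VMs per VMmark tile
price_per_vm = 67_000 / vms    # article's ~$67K CPU + memory estimate
print(round(price_per_vm))     # ~233, in line with the quoted $232/VM
```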

SOLORI’s Take: The most interesting aspect of the EX benchmark is its clock-adjusted scaling factor: between 70% and 91% versus a 2P/8-core Nehalem-EP reference (Cisco UCS B200 M1, 25.06@17 tiles). The unpredictable nature of Intel’s “turbo” feature – varying with thermal loads and per-core conditions – makes an exact clock-for-clock comparison difficult. However, if the scaling factor is 90%, the EX blows away our previous expectations about the platform’s scalability. Where did we go wrong when we predicted a conservative 44@39 tiles? We’re looking at three things: (1) a bad assumption about the effectiveness of “turbo” in the EP VMmark case (setting Ref_EP_Clock to 3.33 GHz), (2) underestimating EX’s scaling efficiency (assumed 70%), and (3) assuming a 2.26GHz clock for EX.

Choosing our minimum QPI/HT3 scalability factor of 75%, the predicted performance was derived this way, using the HP ProLiant BL490 G6 as a baseline:

Est. Tiles = EP_Tiles_per_core( 2.13 ) * 32 cores * Scaling_Efficiency( 75% ) * EX_Clock( 2.26 ) / EP_Clock( 2.93 ) = 39 tiles

Est. Score = Est_Tiles( 40 ) * EP_Score_per_Tile( 1.43 ) * Est_EX_Clock( 2.26 ) / Ref_EP_Clock( 2.93 ) = 44.12

Est. Nehalem-EX VMmark -> 44.12@39 tiles

Correcting for the as-tested clock/turbo numbers, using AMD’s 2P-to-4P VMmark scaling efficiency of 83%, and shifting to the new UCS baseline (with a newer ESX version), the Nehalem-EX prediction factors out to:

Est. Tiles = EP_Tiles_per_core( 2.13 ) * 32 cores * Scaling_Efficiency( 83% ) * EX_Clock( 2.67 ) / EP_Clock( 2.93 ) = 51 tiles

Est. Score = Est_Tiles( 51 ) * EP_Score_per_Tile( 1.47 ) * Est_EX_Clock( 2.67 ) / Ref_EP_Clock( 2.93 ) = 68.32

Est. Nehalem-EX VMmark -> 68.3@51 tiles

Clearly, this approach either overestimates the scaling efficiency or underestimates the “turbo” mode. IBM claims that a 2.93 GHz “turbo” setting is viable where Intel suggests 2.67 GHz is the maximum, so there is a potential source of bias. Looking at the tiles-per-core ratio of the VMmark result, the Nehalem-EX drops from 2.13 tiles per core on EP/2P platforms to 1.5 tiles per core on EX/4P platforms – about a 30% drop in per-core loading efficiency. That indicator matches well with our initial 75% scaling efficiency moving from 2P to 4P – something AMD demonstrated with Istanbul last August. Given the high TDP of EX and IBM’s 2.93 GHz “turbo” specification, it’s possible that “turbo” is adding clock cycles (and power consumption), compensating for a lower scaling efficiency than we assumed. Re-running the estimate with a 2.93GHz “clock” and 71% efficiency (1.5/2.13), the numbers fall in line with the VMmark result:

Est. Tiles = EP_Tiles_per_core( 2.13 ) * 32 cores * Scaling_Efficiency( 71% ) * EX_Clock( 2.93 ) / EP_Clock( 2.93 ) = 48 tiles

Est. Score = Est_Tiles( 48 ) * EP_Score_per_Tile( 1.47 ) * Est_EX_Clock( 2.93 ) / Ref_EP_Clock( 2.93 ) = 70.56

Est. Nehalem-EX VMmark -> 70.56@48 tiles
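The three estimates above can be reproduced with a short script. The function name and rounding choices here are mine, not part of the original methodology:

```python
# Sketch of the tile/score estimation used in the text.
def estimate(tiles_per_core, cores, eff, clock, ref_clock, score_per_tile):
    tiles = int(tiles_per_core * cores * eff * clock / ref_clock)
    score = round(tiles * score_per_tile * clock / ref_clock, 2)
    return score, tiles

# (1) Original conservative prediction: 75% scaling at the 2.26GHz base clock
# (the post rounded 39.4 tiles up to 40 before scoring, hence its 44.12).
print(estimate(2.13, 32, 0.75, 2.26, 2.93, 1.43))
# (2) Corrected: 83% scaling (AMD's 2P-to-4P factor) at the 2.67GHz turbo clock
print(estimate(2.13, 32, 0.83, 2.67, 2.93, 1.47))   # (68.32, 51)
# (3) As-observed: 71% scaling at IBM's claimed 2.93GHz turbo clock
print(estimate(2.13, 32, 0.71, 2.93, 2.93, 1.47))   # (70.56, 48)
```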

This gives us a good basis for evaluating 2P vs. 4P Nehalem systems: a scaling factor of 71%, with clocks pushed toward the 3GHz mark within the thermal envelope. Both conclusions fit typical 2P-to-4P norms and Intel’s process history.

SOLORI’s 2nd Take: So where does that leave AMD’s newest 12-core chip? To date, no VMmark exists for AMD’s Magny-Cours, and AMD chips tend not to do as well in VMmark as their Intel peers due to the benchmark’s SMT-friendly loads. However, we can’t resist applying the same analysis to a (theoretical) 2.3GHz AMD/MC Opteron 6174 SE, using the 2P HP DL385 G6 as a baseline for core loading and the HP DL785 G6 for tile performance (best of the best cases):

Est. Tiles = HP_Tiles_per_core( 0.92 ) * 48 cores * Scaling_Efficiency( 83% ) * MC_Clock( 2.3 ) / HP_Clock( 2.6 ) = 33 tiles

Est. Score = Est_Tiles( 33 ) * HP_Score_per_Tile( 1.54 ) * Est_MC_Clock( 2.3 ) / Ref_HP_Clock( 2.8 ) = 41.8

Est. 4P Magny-Cours VMmark -> 41.8@33 tiles
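The Magny-Cours arithmetic, inlined for checking. Note the two different baselines, as in the text – DL385 G6 (2.6GHz) for per-core loading and DL785 G6 (2.8GHz) for per-tile score – and that the tile product actually computes to ~32.4, which the estimate rounds up to 33:

```python
# Magny-Cours 4P estimate from the two HP baselines.
mc_tiles = 0.92 * 48 * 0.83 * 2.3 / 2.6     # ~32.4 -> "33 tiles"
mc_score = 33 * 1.54 * 2.3 / 2.8            # ~41.7, quoted as 41.8
print(round(mc_tiles, 1), round(mc_score, 1))
```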

That’s nowhere near good enough to top the current 8P, 48-core Istanbul VMmark of 53.73@35 tiles, so we’ll likely have to wait for faster 6100 parts to see any new AMD records. However, assuming AMD’s proposition is still “value 4P,” about 200 VMs at under $18K/server gets you around $90/VM or less.


VMware PartnerExchange2010 – Day 1-2

February 9, 2010

View of the Mandalay Bay from VMware's Alumni Lounge

It’s my second day at the beautiful Mandalay Bay in Las Vegas, Nevada and VMware PartnerExchange 2010. Yesterday was filled with travel and a generous “Tailgate Party” with burgers, dogs, beverages and lots of VMware geeks! I managed to catch the last quarter of the game from the Mandalay Bay Poker Room where I added to my chip stack at the 1/2 No-Limit Texas Hold ‘Em tables. Then it was early to bed – about 9PM PST – where I studied for the upcoming VCP410 exam.

Today (Monday) was occupied with a partners-only VMware Certified Professional, Version 4, Preparation Course which outlined the VCP4 Blueprint, question examples and test-taking strategies. The “best answer,” multiple-choice format of the VCP410 exam promises to offer me some challenges as I apply black-and-white logic to a few shades-of-grey questions. The best strategy to overcome such an obstacle: read the question in its entirety, eliminate all wrong answers, then choose the answer(s) that best satisfy the entire question. A key example is this from the on-line “mock-up” exam:

What is the maximum number of vNetwork switch ports per ESX host and vCenter Server instance?

a.  4,088 for vNetwork standard switches; 4,096 for vNetwork Distributed switches

b.  4,096 for both types of switches

c.  4,088 for vNetwork standard switches; 6,000 for vNetwork distributed switches

d.  512 for both types of virtual switches

Well, it might have been obvious that “c” is the “correct” answer, but “a” is right off of Page 6 of the vSphere Configuration Maximums guide. Both are solidly “correct” answers; it’s just that “c” speaks to both the ESX question and the vCenter question, making it more correct. However, neither is completely correct, since vDS ports are bound by vCenter and ESX host, while vSS ports are bound only by ESX host. Since neither answer “a” nor “c” specifies which limitation it is answering – host or vCenter – it is left to subjective reasoning to infer the intent. According to Jon Hall (VMware, Florida), the most ports any one vNetwork switch can have in any one host is 4,088 – regardless of type. Therefore, to reach the “total virtual network ports per host” maximum (vDS and vSS ports combined), at least one switch of each type must exist. Alone, either type can only reach 4,088 ports; however, the Configuration Maximums document never spells this out for the vNetwork Distributed Switch. Hopefully this exception will be foot-noted in the next revision of the document. [Note: the additional information Jon provided about vDS-type vNetwork switches logically invalidates “a” as a response.]

Following the VCP4 Prep Course, I “recharged” in the Alumni Lounge. VMware had snacks and drinks to quell the appetite and lots of power outlets to restore my iPhone and laptop. While I waited, I contacted the wife and got the 4-1-1 on our baby, checked e-mail and ran through the “mock-up” exam a couple of times. Then it was off to the Welcome Reception at the VMware Experience Hall where sponsors and exhibitors had their wares on display.

iPhone Screen Capture of the ESX Host Running Nehalem-EX, 4P/16C/32T

iPhone Screen Capture of the ESX Host Running Nehalem-EX, 4P/32C/64T

Just inside the Hall – across from the closest beverage station – was Intel’s booth, and the boys in blue were demonstrating vMotion over 10GE NICs. Yes, it was fast (as you’d expect), but the real kick was the “upcoming” 10GE Base-T adapters set to challenge the current price-performance leader: 10GE Base-CR (also supporting SFP+). At under $400/port for 10GE, it’s hard to remember a reason for using 1Gbps NICs… Oh yes, the prohibitive per-port cost of 10GE switches. Arista Networks to the rescue???

Intel was also showing their “modular server” system. Unfortunately, the current offering doesn’t allow for SAS JBOD expansion in a meaningful way (read: running NexentaStor on one or two of the “blades”), but after sharing our love of SAS with the guys in the blue booth, interests were piqued. Evan, expect a call from Intel’s server group… Seriously, with 14x 2.5″ drives in a SAS Expander-interconnected chassis, NexentaStor + SSD + 15K SAS would rock!

Last but not least, Intel was proudly showing their 4P Nehalem-EX running VMware ESX with 512GB of RAM (DDR3) and demonstrating 64 active threads (pictured). This build-out offers lots of virtualization goodness at a heretofore unknown price point. Suffice to say, at 1.8GHz it’s not a screamer, but the RAS features are headed in the right direction. When you rope together 64 threads (about 125-250 VMs) and 1TB worth of VMs (yes, 1TB RAM – about $250K worth using “on-loan Samsung parts”), you are talking about a lot of eggs in one basket. By enhancing the RAS capabilities of these giant systems, component failures become less catastrophic – eventually allowing only a few VMs to be impacted by a point failure instead of ALL running VMs on the box.

vCenter ESX Host Status Showing 512GB of RAM

In case you haven’t seen an ESX host with 512GB of available RAM, check out the screen capture (excuse the iPhone quality) to the right. That’s about $33K worth of DDR3 memory sitting in that box; assuming the EX processors run $2K a piece and allowing $6K for the remainder of the system, that’s nearly $6K/VM in this demo: fantastically decadent! Of course – and in all due fairness to the boys in blue – VM density was not the goal of this demonstration: RAS was, and the 2-bit error scrubbing – while about as exciting as watching paint dry – is pretty cool and soon to be needed (as indicated above) for systems of this capacity.

Other vendors visited were Wyse and Xsigo. The boys in yellow (Wyse) were pimping their thin/zero clients with some compelling examples of PCoIP (Wyse 20p) and MMR (Wyse r90lew). The PCoIP demos featured end-to-end hardware Teradici cards displaying clips from Avatar, while the MMR demo featured 720p movie clips from an iMAX cut of dog fight training. While the PCoIP was impressive and flawless, the upcoming MMR enhancements – while flawed in the beta I saw – were nothing short of impressive.

No, that's not Xsigo's secret sauce: it's the chocolate fountain at VMware's Welcome Reception.

Considering that the MMR-capable thin client was running a 1.5GHz AMD Sempron, the 720p Windows Media stream looked all the better. Looking back at the virtual machine from the ESX console, only about 10-15% of a core was being consumed to “render” the video. But that’s the beauty of MMR: redirect the processor-intensive decoding to the end-point and just send the stream un-decoded. While PCoIP is a win in LANs with knowledge workers and call center applications, the MMR-based thin clients look pretty good for education and YouTube-happy C-level employees looking to catch up on their Hulu…

I managed to catch the Xsigo boys as the night wound down, and they assured me that “mom’s cooking” back at the HQ. “Very soon” we should be hearing about a Xsigo I/O Director option that is a better fit for ROBO and SME deployments. The best part about Xsigo’s I/O virtualization technology in VMware applications: it delivers without a proprietary blade or server requirement! I’m really looking forward to getting some Xsigo into the SOLORI lab this summer…


Quick Take: Magny-Cours Spotted, Pushed to 3GHz for wPrime

September 13, 2009

Andreas Galistel at NordicHardware posted an article showing a system running a pair of engineering samples of the Magny-Cours processor running at 3.0GHz. Undoubtedly these images were culled from a report “leaked” on XtremeSystems forums showing a “DINAR2” motherboard with SR5690 chipset – in single and dual processor installation – running Magny-Cours at the more typical pre-release speed of 1.7GHz.

We know that Magny-Cours is essentially an MCM of Istanbul delivered in the rectangular socket G34 package. One illuminating thing about the two posts is the reported “reduction” in L3 cache from 12MB (2 x 6MB in the MCM) to 10MB (2 x 5MB in the MCM). Where did the additional cache go? That’s easy: since a 2P Magny-Cours installation is essentially a 4P Istanbul configuration, these processors have the new HT Assist feature enabled – giving up 1MB of cache from each die in the MCM to HT Assist.
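The cache accounting is simple enough to sketch (the variable names are mine, for illustration):

```python
# L3 accounting for the 2-die Magny-Cours MCM with HT Assist enabled:
# each die donates 1MB of its 6MB L3 to the HT Assist probe-filter directory.
dies, l3_per_die_mb, ht_assist_mb = 2, 6, 1
usable_l3 = dies * (l3_per_die_mb - ht_assist_mb)
print(usable_l3)   # 10 (MB) -- the reported "2 x 5MB"
```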

“wPrime uses a recursive call of Newton’s method for estimating functions, with f(x)=x²-k, where k is the number we’re sqrting, until Sgn(f(x)/f'(x)) does not equal that of the previous iteration, starting with an estimation of k/2. It then uses an iterative calling of the estimation method a set amount of times to increase the accuracy of the results. It then confirms that n(k)²=k to ensure the calculation was correct. It repeats this for all numbers from 1 to the requested maximum.”

wPrime site
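The quoted description maps to a short routine. This is my reading of the wPrime text, not wPrime’s actual source: Newton’s method on f(x) = x²-k, starting at k/2, stopping when the sign of the Newton step flips (a single pass is coarse for small k, which is why wPrime re-calls the estimator to refine accuracy):

```python
def wprime_sqrt(k):
    # Newton's method on f(x) = x^2 - k, f'(x) = 2x, starting at k/2 (k > 0),
    # iterating until Sgn(f(x)/f'(x)) differs from the previous iteration.
    x = k / 2.0
    prev = (x * x - k) / (2.0 * x)      # Newton step f(x)/f'(x)
    while True:
        x -= prev
        cur = (x * x - k) / (2.0 * x)
        if cur == 0.0 or (cur > 0) != (prev > 0):
            return x
        prev = cur

print(wprime_sqrt(100))   # converges to ~10.0
print(wprime_sqrt(2))     # 1.5 -- a coarse first pass, refined by wPrime
```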

Another intriguing thing about the XtremeSystems post in particular is the reported wPrime 32M and 1024M completion times. Compared to the hyper-threading-enabled 2P Xeon W5590 (130W TDP) running wPrime 32M at 3.33GHz (3.6GHz turbo) in 3.950 seconds, the 2P 3.0GHz Magny-Cours completed wPrime 32M in an unofficial 3.539 seconds – about 10% quicker while running a 10% slower clock. Through the myopic lens of this result, it would appear AMD’s choice of “real cores” versus hyper-threading delivers its punch.

SOLORI’s Take: As a “reality check,” we can compare the reigning quad-socket, quad-core Opteron 8393 SE results in wPrime 32M and wPrime 1024M: 3.90 and 89.52 seconds, respectively. Adjusted for clock and core count versus its Shanghai cousin, the Magny-Cours engineering samples – at 3.54 and 75.77 seconds, respectively – turned in times about 10% slower than our calculus predicted. While still “record breaking” for 2P systems, we expected the Magny-Cours/Istanbul cores to out-perform Shanghai clock-for-clock – even at this stage of the game.

Due to the multi-threaded nature of the wPrime benchmark, it is likely that the HT Assist feature – enabled by default in a 2P Magny-Cours system – is the cause of the discrepancy. By reducing the available L3 cache by 1MB per die – 4MB of L3 cache total – HT Assist could actually be creating a slow-down. However, there are several things to remember here:

  • These are engineering samples qualified for 1.7GHz operation
  • Speed enhancements were performed with tools not yet adapted to Magny-Cours
  • The author indicated a lack of control over AMD’s Cool ‘n Quiet technology which could have made “as tested” core clocks somewhat lower than what CPUz reported (at least during the extended tests)
  • It is speculated that AMD will release Magny-Cours at 2.2GHz (top bin) upon release, making the 2.6+ GHz results non-typical
  • The BIOS and related dependencies are likely still being “baked”

The more “typical” engineering-sample speed tests posted on the XtremeSystems forum track with the 3.0GHz overclock results: at a more typical clock speed of 2.6GHz, the 2P Magny-Cours turned in 3.947 seconds and 79.625 seconds for wPrime 32M and 1024M, respectively. Even at that speed, the 24-core system is on par with the 2P Nehalem system clocked nearly a GHz faster. Oddly, Intel reports the W5590 as not supporting “turbo” or hyper-threading, although it is clear from actual testing that Intel’s marketing is incorrect.

Assuming Magny-Cours improves slightly on its way to market, we already know how 24-core Istanbul stacks up against 16-thread Nehalem in VMmark and what that means for Nehalem-EP. This partly explains the marketing shift as Intel tries to position Nehalem-EP as destined for workstations instead of servers. Whether you consider this move a prelude to the ensuing Nehalem-EX vs. Magny-Cours combat to come, or an attempt to keep Intel’s average server-chip power down by eliminating the 130W+ parts from the “server” list, Intel and AMD will each attempt to win the war before the first shot is fired. Either way, we see nothing that disrupts the price-performance and power-performance comparison models that dominate the server markets.

[Ed: The 10% difference is likely due to the fact that the author was unable to get “more than one core” clocked at 3.0GHz. Likewise, he was uncertain that all cores were reliably clocking at 2.6GHz for the longer wPrime tests. Again, this engineering sample was designed to run at 1.7GHz and was not likely “hand picked” to run at much higher clocks. He speculated that some form of dynamic core clocking linked to temperature was affecting clock stability – perhaps due to some AMD-P tweaks in Magny-Cours.]


Quick Take: HP Sets Another 48-core VMmark Milestone

August 26, 2009

Not satisfied with a landmark VMmark score that crossed the 30-tile mark for the first time, HP’s performance team went back to the benches two weeks later and took another swing at the performance crown. The effort paid off: HP significantly out-paced its two-week-old record with a score of 53.73@35 tiles in the heavyweight, 48-core category.

Using the same 8-processor HP ProLiant DL785 G6 platform as in the previous run – complete with 2.8GHz AMD Opteron 8439 SE 6-core chips and 256GB DDR2/667 – the new score comes with significant performance bumps in the javaserver, mailserver and database results, achieved with the same ESX 4.0 build (164009). So what changed to add an additional 5 tiles to the team’s run? It would appear that someone was unsatisfied with the storage configuration on the mailserver run.

Given that the tile ratio of the previous run was about 6% higher than its 24-core counterpart, there may have been a small indication that untapped capacity was available. According to the run notes, the only reported changes to the test configuration – aside from the addition of the 5 LUNs and 5 clients needed to support the 5 additional tiles – was a notation indicating that the “data drive and backup drive for all mailserver VMs” were repartitioned using AutoPart v1.6.

The change in performance numbers effectively reduces the virtualization cost of the system by 15% to about $257/VM – closing-in on its 24-core sibling to within $10/VM and stretching-out its lead over “Dunnington” rivals to about $85/VM. While virtualization is not the primary application for 8P systems, this demonstrates that 48-core virtualization is definitely viable.
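The 15% figure follows directly from holding the system price fixed while spreading it over more tiles. The ~$54K price below is back-derived from the quoted $257/VM figure; it is an assumption for illustration, not a published configuration price:

```python
# How 5 extra tiles cut $/VM by ~15% at a (roughly) fixed system price.
vms_per_tile = 6                              # standard VMmark 1.x tile
system_price = 257 * 35 * vms_per_tile        # ~$53,970 (back-derived)
per_vm_before = system_price / (30 * vms_per_tile)   # ~$300/VM at 30 tiles
per_vm_after = system_price / (35 * vms_per_tile)    # $257/VM at 35 tiles
print(round(1 - per_vm_after / per_vm_before, 3))    # ~0.143 -> "about 15%"
```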

SOLORI’s Take: HP’s performance team has done a great job tuning its flagship AMD platform, demonstrating that platform performance is not just about hertz or core count but requires balanced tuning all around. This improvement in system tuning demonstrates an 18% increase in incremental scalability – approaching within 3% of the 12-core to 24-core scaling factor – making 8P a viable consideration in the virtualization use case.

In recent discussions with AMD about the SR5690 chipset applications for Socket-F, AMD re-iterated that the mainstream focus for SR5690 has been Magny-Cours and the Q1/2010 launch. Given the close relationship between Istanbul and Magny-Cours – detailed nicely by Charlie Demerjian at Semi-Accurate – the bar is clearly fixed for 2P and 4P virtualization systems designed around these chips. Extrapolating from the similarities and improvements to I/O and memory bandwidth, we expect to see 2P VMmarks besting 32@23 and 4P scores over 54@39 from HP, AMD and Magny-Cours.

SOLORI’s 2nd Take: Intel has been plugging away with its Nehalem-EX for 8-way systems, and – delivering 128 threads – it promises some insane VMmarks. Assuming Intel’s EX scales as efficiently as AMD’s new Opterons have, extrapolations indicate performance for the 4P, 64-thread Nehalem-EX should fall between 41@29 and 44@31 given the current crop of speed and performance bins. Using the same methods, our calculus predicts an 8P, 128-thread EX system should deliver scores between 64@45 and 74@52.

With EX expected to clock at 2.66GHz with 140W TDP and AMD’s MCM-based Magny-Cours doing well to hit 130W ACP in the same speed bins, CIO’s balancing power and performance considerations will need to break-out the spreadsheets to determine the winners here. With both systems running 4-channel DDR3, there will be no power or price advantage given on either side to memory differences: relative price-performance and power consumption of the CPU’s will be major factors. Assuming our extrapolations are correct, we’re looking at a slight edge to AMD in performance-per-watt in the 2P segment, and a significant advantage in the 4P segment.


Quick Take: 6-core “Gulftown” Nehalem-EP Spotted, Tested

August 10, 2009

TechReport is reporting on a Taiwanese overclocker who may be testing a pair of Nehalem 6-core processors (2P) slated for release early in 2010. Likewise, AlienBabelTech mentions a Chinese website, HKEPC, that has preliminary testing completed on the desktop (1P) variant of the 6-core. While these could be different 32nm silicon parts, it is more likely – judging from the CPU-Z outputs and provided package pictures – that these are the same sample SKUs tested as 1P and 2P LGA-1366 components.

What does this mean for AMD and the only 6-core shipping today? Since Intel is still projecting Q2/2010 for the server part, AMD has a decent opportunity to grow market share for Istanbul. Intel’s biggest rival will be itself – facing a wildly growing number of SKUs across its i-line, from i5, i7, i8 and i9 “families,” with multiple speed and feature variants. Clearly, the non-HT version would stand as a direct competitor to Istanbul’s native 6-core SKUs. Likewise, Istanbul may be no match for the 6-core Nehalem with HT and the “turbo core” feature set.

However, with an 8-core “Beckton” Nehalem variant on the horizon, it might be hard to understand just where Gulftown fits in Intel’s picture. Intel faces a serious production issue, filling fab capacity with 4-core, 6-core and 8-core processors, each with speed, power, socket and HT variants from which to supply high-speed, high-power SKUs and lower-speed, low-power SKUs for 1P, 2P and 4P+ destinations. Doing the simple math with 3 SKUs per part, Intel would be offering the market a minimum of 18 base parts according to its current marketing strategy: 9 with HT/turbo, 9 without. For socket LGA-1366, this could easily mean 40+ SKUs with 1xQPI and 2xQPI variants included (up from 23).
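The SKU math above can be sketched as a simple product of variant axes. The axes and counts here are inferred from the paragraph, not Intel’s actual roadmap:

```python
# Counting the SKU explosion described above.
core_counts = 3                  # 4-, 6- and 8-core parts
bins = 3                         # speed/power SKUs per part
ht_variants = 2                  # with and without HT/turbo
base_parts = core_counts * bins * ht_variants      # 18 (9 HT + 9 non-HT)
qpi_variants = 2                 # 1xQPI and 2xQPI on LGA-1366
lga1366_skus = base_parts * qpi_variants           # 36 -> "40+" with extras
print(base_parts, lga1366_skus)
```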

SOLORI’s take: Intel will have to create some interesting “crippling or pricing tricks” to keep Gulftown from cannibalizing the Gainestown market. If they follow their “normal” play book, we predict the next 10 months will play out like this:

  1. Initially there will be no 8-core product for 1P and 2P systems (LGA-1366), allowing for artificially high margins on the 8-core EX chip (LGA-1567), slowing the inevitable cannibalization of the 4-core/2P market, and easing production burdens;
  2. Intel will silently and abruptly kill Itanium in favor of “hyper-scale” Nehalem-EX variants;
  3. Gulftown will remain high-power (90-130W TDP) and be positioned against AMD’s G34 systems and Magny-Cours – plotting 12-core against 12-thread;
  4. Intel creates a “socket refresh” (LGA-1566?) to enable “inexpensive” 2P-4P platforms from its Gulftown/Beckton line-up in 2H/2010 (ostensibly to maintain parity with G34) without hurting EX;
  5. Revised, lower-power variants of Gainestown will be positioned against AMD’s C32 target market;
  6. Intel will cut SKUs in favor of higher margins, increasing speed and features for “same dollar” cost;
  7. Non-HT parts will begin to disappear in 4-core configurations completely;
  8. Intel’s AES enhancements in Gulftown will allow it to further differentiate itself in storage and security markets;

It would be a mistake for Intel to continue growing SKU count or provide too much overlap between 4-core HT and 6-core non-HT offerings. If purchasing trends soften in 4Q/09 and remain (relatively) flat through 2Q/10, Intel will benefit from a leaner, well differentiated line-up. AMD has already announced a “leaner” plan for G34/C32. If all goes well at the fabs, 1H/2010 will be a good ole fashioned street fight between blue and green.


Server Watch: Istanbul, G34, C32, Itanium and Nehalem-EX

May 29, 2009
Istanbul is launching in June, 2009 and will be a precursor to the G34 and C32 platforms to come in Q1/2010. To that end, AMD will be providing an overview of its next generation of Direct Connect Architecture, or DCA 2.0, which separates Socket-F systems from G34/C32. This overview will be available as a live webcast on June 1, 2009 at 11:00AM Central Time. In advance of the announcement, AMD has (silently) reduced prices for its Opteron processors across the board. This move will place additional pressure on the already-weakened (virtualization) price-performance of Intel’s Nehalem-EP systems.

We expect to hear more news about Istanbul’s availability in keeping with Tyan’s upcoming announcement next week. Based on current technology and economic trends, Istanbul and G34 could offer AMD a solid one-two punch to counter Intel’s relentless “tick-tock” pace. With Nehalem server sales weak despite early expectations and compounding economic pressures, market timing may be more ideally suited to AMD’s products than Intel’s for a change. As Gartner puts it, “the timing of Nehalem is a bit off, and it probably won’t make much of an impact this year.”

In the meantime, Phil Hughes at AMD has posted a personal reflection on Opteron’s initial launch, starting with the IBM e325 in 2003 and ending with Opteron’s impact on the Intel Itanium market by year-end (while resisting a reference to “the sinking of the Itanic“). Phil acknowledges Sun’s influence on Opteron and links to some news articles from 2003. See his full post, “The Sun Also Rises,” here… As 64-bit processors go, 2003 was much more the year of the Opteron than “the year of the Itanium” (as predicted by Intel’s Paul Otellini).

Speaking of Itanium, TechWorld has an article outlining how Intel’s upcoming Nehalem-EX – with the addition of MCA technology derived from Itanium – could bring an end to the beleaguered proprietary platform. TechWorld cites Insight 64 analyst Nathan Brookwood as saying the new Xeon will finally break Intel’s policy of artificially crippling the x86 processor, which has prevented Xeon from being competitive with Itanium. The 8-core, SMT-enabled EX processor was recently demonstrated by IBM in an 8-socket configuration.