Monday, June 11, 2012

Can Intel Catch ARM?

First let’s preface this discussion with some information on process nodes. It seems that others like to claim they are ahead of Intel on SOC process nodes by pointing out that they are on 28nm while Intel is still on 32nm. This is nothing more than pure marketing fluff. They are the same process node; 28nm is merely an optical shrink of the 32nm node. The transistors are packed a bit closer together, but the underlying process is the same. So the following processes are equivalent: 45/40nm, 32/28nm, 22/20nm, 14/12nm, etc.
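As a rough illustration of why a half-node step is not a full node, here is a minimal sketch assuming ideal geometric scaling, where die area shrinks with the square of the feature-size ratio. Treating node names as literal dimensions is itself a simplification (the labels are partly marketing), so these are idealized upper bounds on the savings, not measured values.

```python
def ideal_area_scale(old_nm: float, new_nm: float) -> float:
    """Fraction of the original die area remaining after an ideal shrink.

    Assumes every feature scales linearly with the node name, which real
    processes do not achieve; this is an idealized bound only.
    """
    return (new_nm / old_nm) ** 2


half_node = ideal_area_scale(32, 28)  # optical shrink within the same node
full_node = ideal_area_scale(32, 22)  # step to the next full node

print(f"32nm -> 28nm: {half_node:.2f} of the original area")
print(f"32nm -> 22nm: {full_node:.2f} of the original area")
```

Even under these ideal assumptions, the 32/28nm step removes less than a quarter of the area, while a full-node step roughly halves it.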

Before I get into my main points I want to look at what I call the x86 myth. Boiled down to its simplest terms this myth states that the transistor overhead needed to support the x86 instruction set prevents x86 chips from being as small and efficient as an ARM (or other non-x86 chip). At one time this was certainly true. It is estimated that the x86 instruction set took up ~30% of the transistor budget on the original Pentium processor. The requirements for the x86 instruction set haven’t grown much with subsequent generations, but for the sake of argument, let’s pick an obscenely high number and say that the requirements have doubled. So that would give us 60% of the transistors in the original Pentium chip dedicated to the x86 instruction set that non-x86 chips can either repurpose or eliminate.

According to Wikipedia, the Pentium chip had 3,100,000 transistors. 60% of that would be 1,860,000 transistors. That sounds like a lot of transistors, but let’s put that in perspective. I can’t find a verified transistor count or die size for Medfield. (The best I could find was that the transistor count is ~1/4 of a dual core Conroe, and Anandtech estimated the die size at ~53mm^2.) Again referring to Wikipedia, the transistor count of a dual core Conroe is 291 million transistors. One quarter of the Conroe transistor count is 72,750,000 transistors. Our theoretical 1.86 million transistors would be about 2.6% of the total transistor count, and that is a high end estimate. So the disadvantage Intel accrues from using the x86 instruction set is an increase in transistor count and die size of ~2.6%. I’d hardly call that a make or break proposition. I’ve looked for die size information on Qualcomm’s Snapdragon S4 chip and haven’t found anything, but I did find a die size for Nvidia’s Tegra 3 chip: it is listed as 83mm^2. Since this is built on a 40nm process, we should look at what building it on a 28nm process would give for a die size if we want an apples-to-apples comparison with Medfield.
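The arithmetic in this paragraph can be reproduced in a few lines. The inputs are the figures quoted above; the 60% share is the post's deliberately inflated assumption, and the Medfield transistor count is the unverified "~1/4 of a dual core Conroe" estimate, so treat the output as a rough bound rather than a measurement.

```python
# Figures quoted in the post (Wikipedia transistor counts)
PENTIUM_TRANSISTORS = 3_100_000
CONROE_DUAL_CORE_TRANSISTORS = 291_000_000

# Deliberately pessimistic assumption from the post: x86 support costs
# 60% of the original Pentium's transistor budget.
X86_SHARE_OF_PENTIUM = 0.60

x86_overhead = PENTIUM_TRANSISTORS * X86_SHARE_OF_PENTIUM
medfield_estimate = CONROE_DUAL_CORE_TRANSISTORS / 4  # "~1/4 of a Conroe"
overhead_fraction = x86_overhead / medfield_estimate

print(f"x86 overhead:      {x86_overhead:,.0f} transistors")
print(f"Medfield estimate: {medfield_estimate:,.0f} transistors")
print(f"overhead share:    {overhead_fraction:.1%}")
```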

One of the chip industry’s dirty little secrets is that you don’t actually get a 50% decrease in die size when you move to the next process node. You achieve pretty close to this in the cache regions, where the layout is very symmetrical and you can pack the transistors for maximum density, but the results in the logic regions of the chip are a lot worse. As a result, you only get about a 30% decrease in die size for a typical microprocessor rather than the theoretical 50% decrease.

Applying the above logic to the Tegra 3 chip, you get a reduction from 83mm^2 to 58mm^2. That puts it in the same ballpark as Intel’s Medfield at 32nm. In fact, Tegra 3 should be almost 10% larger than the Medfield chip even on the same process node. Since cost is roughly proportional to size, Intel should be able to manufacture Medfield for roughly 9% less than Nvidia can manufacture Tegra 3 at 28nm. So much for the cost impact of the x86 instruction set.
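Here is a sketch of the shrink and cost estimate under the post's stated assumptions: a ~30% area reduction per node (not the ideal 50%), Anandtech's ~53mm^2 Medfield estimate, and manufacturing cost proportional to die area. With these inputs Tegra 3 comes out about 9.6% larger, and the Medfield cost advantage works out slightly under 10%.

```python
TEGRA3_AREA_40NM = 83.0    # mm^2, as quoted in the post
MEDFIELD_AREA_32NM = 53.0  # mm^2, Anandtech's estimate as quoted
TYPICAL_SHRINK = 0.30      # post's rule of thumb: ~30% area cut per node

tegra3_area_28nm = TEGRA3_AREA_40NM * (1 - TYPICAL_SHRINK)
size_ratio = tegra3_area_28nm / MEDFIELD_AREA_32NM

# First-order assumption: cost per die is proportional to die area
medfield_relative_cost = MEDFIELD_AREA_32NM / tegra3_area_28nm

print(f"Tegra 3 at 28nm: {tegra3_area_28nm:.1f} mm^2")
print(f"Tegra 3 is {size_ratio - 1:.1%} larger than Medfield")
print(f"Medfield costs {1 - medfield_relative_cost:.1%} less to build")
```

Cost proportional to area is a first-order simplification; because yield also falls as die area grows, the real cost gap for the larger die would tend to be somewhat wider.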

While I don’t see an issue with Medfield due to the x86 nature of the chip, Intel still faces the challenges that any new entrant to a market faces when trying to replace an entrenched competitor. They have to provide a compelling value proposition. As you will see from the analysis below, I believe that Intel's biggest advantage in this space will be cost. That's right, I said cost. I'll be interested in any other views, but please have the courtesy to read my analysis and identify specific issues rather than making general blanket statements.

Intel’s phone (I’m looking at the system as a whole here because that is what the end user really cares about) is reported to be a middle of the road smartphone. It is a competitive entry, but doesn’t offer anything truly compelling. Intel is wisely targeting this smartphone at emerging markets that don’t have the well-established base that exists in the US and Europe. This move allows Intel to avoid some of the difficulties of trying to oust a well-established incumbent. In order to be successful in the long term, though, Intel is going to have to offer a compelling reason to choose their product over ARM. So Intel is going to have to offer comparable (preferably better) performance at lower power, and an equivalent (or lower) price. What I’d like to do here is evaluate Intel’s plan to achieve these goals.

The smartphone game is all about maximum performance at minimum power, but power and performance are inextricably linked. For a given process node, if you increase performance, you are going to have to use more power. If you want to lower power, you are going to have to give up some performance. If you want to improve your performance without increasing power you either have to improve your design, or change your process (i.e. a node shrink). Intel plans on two process shrinks and a redesign over the next 2 years.

Intel’s latest entry into the smartphone space, Medfield, is based on the 32nm process. Most of the competing devices are based on the 45/40nm process, so Intel has a one-node process lead over those products. The latest and greatest ARM processors that are just starting to hit the market (Qualcomm Snapdragon and TI OMAP5) are based on a 28nm process. Looking at benchmarks for the Snapdragon S4 (Krait) processor, it looks like the SOCs based on this processor outperform the Medfield offering. Perhaps more interesting is that Tegra 3 also seems to offer better performance, although it is fabbed on a 40nm process. One would expect that shrinking the other processor designs to the 28nm node would give them a similar performance advantage over Medfield and possibly give Tegra 3 top honors.

Anandtech’s review of Medfield shows that Intel still has a ways to go to achieve this goal. The HTC One X/S use the Tegra 3 and the Snapdragon S4 chips, respectively. These phones represent the next generation of ARM based phones and generally outperform the Medfield chip. Unfortunately, battery life data for these phones isn’t available in the review, so no comparison between the Medfield phone and the newest ARM phones can be made regarding battery life. Medfield’s battery life is, however, compared with a number of other phones, and it generally falls in the bottom half, though above the bottom quarter, of the phones evaluated. Intel will need to close this gap. Snapdragon S4 is supposed to provide battery life comparable to its predecessor while providing better performance. Based on that, I would expect Medfield to lag behind the Snapdragon S4 in both power and performance. If Tegra 3 were fabbed on 28nm, I would expect it to edge out Medfield on power as well as performance.

Intel's current roadmap shows them introducing a new design by the end of this year that is supposed to increase performance. Given the power sensitive nature of the phone market, it is my assumption that this redesign will match the power consumption of the current devices. I have no data to support this, but Intel's phone effort is currently being run by Mike Bell, who was involved in the development of the original iPhone, so I'm sure he knows what this market values. Based on what Intel has released on the redesign and my assumption of comparable power consumption, Intel can expect an improvement over Medfield's current power/performance metric by the end of the year. The redesign should put Intel back in the lead on performance, where they are already competitive with the top phones.

I'm not claiming Intel has any magic bullets here, though. I suspect the redesign will end up increasing die size and giving up the size/cost advantage I indicated that Intel currently has over the hypothetical Tegra 3 processor on 28nm. Intel can increase their transistor budget by ~10% and still maintain size/cost equivalency. In order to stay in the same power envelope, they will have to adopt more rigorous power control and/or reduce processor speed. My thought is that Intel's approach will be to move to out-of-order execution, which will allow greater processing efficiency at lower speeds while costing more transistors.

It is my opinion that the redesign will put Intel back in the lead on performance, but will still leave them lagging behind on battery power and sacrifice their cost/size advantage. Intel's roadmap calls for Intel to move to 22nm in 2013 and 14nm in 2014. I believe this is where Intel will rely on their process technology to close the gap. My expectation here is that the 22nm offering will use the same design as the improved 32nm design and the following comments are based on this assumption. Given the short time between 22 and 14nm, I believe Intel will have to use the same design on 14nm that they do on 22nm. While Intel hasn't announced anything beyond this I believe the next logical step will be another redesign on 14nm.

Shrinking to 22nm will give Intel their cost/size advantage back and will improve their power efficiency. If ARM were to do nothing between now and 2014, I expect that Intel's shrink to 14nm would give them unquestioned leadership in all three key metrics: cost, power, and performance. Intel's critics are quick to point out, and rightly so, that ARM will not be sitting still for the next 2 years. Let's look at what ARM's roadmap shows between now and 2014.

ARM is claiming to have 20nm products ready to go at the end of 2013. Given a typical 2 year development cycle they would see 12nm in 2015, 1 year after Intel goes to 14nm. This timeline also lines up well with TSMC’s current roadmap. Interestingly, TSMC has recently announced that they will only be offering a single 20nm process instead of the High Performance and Low Power variants they originally proposed. They cited the “lack of a noticeable performance difference” between the two processes as the reason for the change. This is very different from the data that Intel presented for 22nm. Intel showed significant differences between transistors designed for low power and those designed for high speed. This leads me to believe that TSMC’s process will underperform compared to Intel’s equivalent process.

I have been unable to find a timeline for ARM processor designs. I did find an article that indicates that ARM's future plans focus on a big.LITTLE theme: they will be using Cortex-A7 cores when the device has less computational demand and switching to a Cortex-A15 core when the computational demands are higher. Despite the lack of any public statements from ARM, I have assumed a number of redesigns in the table below, where I compare the roadmaps of ARM and Intel.

Quarter   Intel       ARM
2012 Q3   32nm Mk1    28nm Mk1
2012 Q4   32nm Mk2    28nm Mk1
2013 Q1   32nm Mk2    28nm Mk2?
2013 Q2   32nm Mk2    28nm Mk2?
2013 Q3   22nm Mk1    28nm Mk2?
2013 Q4   22nm Mk1    20nm Mk1
2014 Q1   22nm Mk1    20nm Mk1
2014 Q2   14nm Mk1    20nm Mk2?
2014 Q3   14nm Mk1    20nm Mk2?
2014 Q4   14nm Mk1    20nm Mk2?
2015 Q1   14nm Mk2?   12nm Mk1?


The table above summarizes the two roadmaps. "?" marks indicate process nodes or redesigns that I'm assuming will occur. Mk1 is an initial design on a process node and Mk2 is a redesign on a given process node. Comparing the roadmaps in the table above I expect the next couple of years to unfold as shown in the tables below.

Metric        2012 Q3    2012 Q4
Performance   ARM        Intel
Power         ARM        ARM
Cost          Intel      ARM/Intel

Metric        2013 Q1    2013 Q2    2013 Q3    2013 Q4
Performance   ARM/Intel  ARM/Intel  Intel      ARM/Intel
Power         ARM        ARM        ARM/Intel  ARM
Cost          Intel      Intel      Intel      ARM/Intel

Metric        2014 Q1    2014 Q2    2014 Q3    2014 Q4    2015 Q1
Performance   ARM/Intel  ARM/Intel  ARM/Intel  ARM/Intel  ARM/Intel
Power         ARM        ARM/Intel  ARM/Intel  ARM/Intel  ARM
Cost          ARM/Intel  Intel      Intel      Intel      ARM/Intel


Note that these tables compare estimates of Intel's offerings with the leading edge ARM products. When compared to ARM's older products, Intel may well have an advantage. I've also made no assumptions that foundries would have yield issues that would delay migration to a given process node, or any assumptions regarding inferior performance of foundry processes. I also assumed that Intel and ARM had an equivalent design frequency of 3 quarters with the exception of the 20nm node where I assumed 2 quarters for ARM. In short, I have tried not to bias my evaluation based on any assumptions of process or design superiority.

Examination of these tables shows that Intel will catch up to ARM's leading edge performance by the end of this year and then either match or exceed it. Intel will struggle to match the power efficiency of ARM until 2014, when both companies will maintain relative parity. The analysis I've performed above shows that Intel's real advantage here, and the thing they will have to leverage to gain traction in this market, is cost.

Most analysts believe that Intel will not be willing to cut margins enough to be competitive on cost. However, I bring two counterarguments to the table. First, Intel's process lead gives them an inherent cost advantage. Second, on several occasions I have heard Intel's Paul Otellini state that he expects SOC products to make up the majority of Intel's production on a volume basis, but not on a cost basis. To me this indicates that he realizes Intel will have to sacrifice some degree of margin on these products to be competitive. To offset this, Intel will still have server and PC revenues to maintain their margins. As smartphone sales go up, so do server sales, and servers are Intel's highest margin chips. All these factors lead me to believe that before the end of 2014 Intel will be a major player in the smartphone market and will use cost as their primary advantage.

Tuesday, November 2, 2010

Intel's Achronix Strategy

There have been several reports recently on Intel's agreement to build FPGAs for Achronix on their upcoming 22nm technology. As far as I know, this is a first for Intel. Not only are they building someone else's designs on an Intel process, but they are building those devices on Intel's leading edge technology.

Intel makes the most money off of their leading edge process. In recent presentations Intel has made a big deal out of how quickly they are ramping their newest process technologies. Faster ramps mean earlier crossover from the old technology to the new technology. Driving towards earlier crossover means higher profit margins and shorter time to repay the development and retooling costs associated with moving to a new process node. So I have to ask: Why would Intel sacrifice any of their early leading edge capacity for what is essentially foundry work?

The articles I've seen have suggested two reasons. The first is that Intel is looking to offset some of the R&D costs of process development. The second is that Intel wants to get back into the Field Programmable Gate Array (FPGA) game.

In my opinion, the idea that Intel is looking to offset R&D costs with this move is absolute rubbish. Anyone that is willing to take an objective look at this would come to the same conclusion. Let me give an example to demonstrate why I don't think this line of speculation is worth the pixels it takes to print it.

Suppose I can sell a product for $100 and it costs me $50 to make. Let's also say the design work costs me $1000 up front. So if I sell 1000 units, I make $50000 minus the $1000 for design work. I net a total of $49000.

In the foundry model, I save the $1000 design cost up front and still spend $50000 to make the 1000 units, but I can't keep the full $50000 margin, because the customer wants to make a profit as well; recouping their design costs isn't sufficient. So let's say the customer pays a price that leaves me 70% of that margin. That gives me $35000 in profit.

The model here is grossly oversimplified, but it illustrates the point. Building my own designs I make $49000, and building product as a foundry I make $35000. That means I'm making significantly less on the foundry product, and last I checked, making less isn't going to help offset my development costs. Instead of helping me, it reduces my margins and increases the time it is going to take me to recoup my R&D investment. Remember, we are talking about Intel's leading edge technology here, not trying to fill fabs running an old technology and keep them profitable longer.
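The toy model can be written out directly. The $35000 figure corresponds to the fab keeping 70% of the $50000 margin it would have earned selling the parts itself, and the `margin_share` parameter below encodes that reading. If the parts instead sold at 70% of the $100 price, the foundry profit would drop to $20000, which only strengthens the argument. All numbers are the post's illustrative values, not real Intel or Achronix figures.

```python
def own_design_profit(units: int, price: float, unit_cost: float,
                      design_cost: float) -> float:
    """Profit when the fab designs and sells its own product."""
    return units * (price - unit_cost) - design_cost


def foundry_profit(units: int, price: float, unit_cost: float,
                   margin_share: float) -> float:
    """Profit when a customer supplies the design.

    margin_share is the fraction of the product margin the customer leaves
    to the fab; the customer keeps the rest. The fab pays no design cost.
    """
    return units * (price - unit_cost) * margin_share


own = own_design_profit(units=1000, price=100, unit_cost=50, design_cost=1000)
foundry = foundry_profit(units=1000, price=100, unit_cost=50, margin_share=0.70)

print(f"own design: ${own:,.0f}")
print(f"foundry:    ${foundry:,.0f}")
print(f"shortfall:  {1 - foundry / own:.0%}")
```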

The second theory is that Intel wants to get back into the FPGA game. Intel once had an FPGA program and sold it. In the EE Times article a spokesman for Achronix was quoted as saying:

"If Intel wanted to be in the FPGA business they would be already. They certainly have the cash."

And he is right. If all Intel wanted was to be in the FPGA business, they would simply buy Achronix or a similar company.

I believe the author of the EE Times article comes close to explaining what Intel is doing when the author says:

The relationship with Achronix could be a precursor to Intel eventually combining programmable logic with its Atom cores on the same die to create a new type of device. Earlier this year both Xilinx and Actel Corp. announced products that combined their programmable logic technology with hard ARM processor cores.

In my opinion the author of the EE Times article isn't looking far enough ahead to see what Intel is really looking to accomplish. While Intel may well want to create a new device that combines Atom and FPGA circuitry, I believe there is a much larger scope to this announcement. This move is really about Intel's Atom SOC strategy, not just FPGA devices.

In order to be a real player in the SOC space (smartphones, autotainment systems, etc.) Intel needs to develop a robust SOC capability they don't currently have. Up to this point the SOC designs that I've seen Intel previewing are all in-house Intel designs. But many of the players in the SOC space have their own proprietary designs they build around the central processing core. To make that happen, Intel needs to learn how to build external designs on the Intel process.

But my reading leads me to believe that Intel's design rules are fairly restrictive when compared to the traditional foundries. Since we are talking SOCs here, Intel can't just tweak the process for an individual customer. The external designs have to work well with the same process Intel is using to manufacture Atom. In order to work effectively with customers in this new space, Intel needs to learn how to work in conjunction with external design teams to get the designs laid out in a way that will take advantage of Intel's process capabilities and yield well.

I believe the Achronix move is actually a first step in Intel's SOC strategy, a strategy that will allow Intel's customers to design their unique features around an Atom core to make a truly unique product. If this strategy proves successful, Intel and their partners will be able to offer a distinct product with clear differentiation in the marketplace. This is how Intel intends to differentiate future Atom products from competing ARM products.

Wednesday, January 13, 2010

A Closer Look at Intel's Process Lead

In my previous analysis I did not look closely at the effects of Intel's process lead over the ARM products currently being manufactured. IMHO, process node isn't nearly as important as design for these small, low power applications.

Let's look at Snapdragon in a little more detail to illustrate my point. I mentioned multi-tasking as an example of Atom having more horsepower than comparable ARM products, and I believe the LG phone will have more horsepower than the Nexus One using Snapdragon. But as Ho Ho pointed out on Roborat's blog, ARM isn't that bad. You can see what Snapdragon can do at this link. I think any rational person has to agree that the performance in the video isn't painfully slow. In my mind this gets the performance of the Snapdragon design past the "can it do what I want to do" hurdle. That takes us to the next big differentiator, battery life.

Look at the numbers I threw out there for the power usage of the Snapdragon design. This thing uses 2-3x less power than Atom and is built on the same process node as Atom without the advantage of high-k/metal gate (HK/MG). If you assume that you will get a 25% power reduction with each process shrink, you can see that Atom won't reach power parity with the current Snapdragon design until the 11nm node if you rely on process shrinks alone to get you there.
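The node count can be checked with a short loop. This sketch assumes Atom starts on 45nm, that the next shrinks are labeled 32nm, 22nm, 16nm, and 11nm (hypothetical labels for illustration), that each shrink cuts power by the post's 25%, and that Snapdragon stands still; it applies the reduction to both the 2x and 3x ends of the gap.

```python
def nodes_to_parity(power_ratio: float, reduction_per_node: float) -> int:
    """Count the shrinks needed for a chip drawing `power_ratio` times a
    rival's power to match it, if each shrink cuts power by
    `reduction_per_node` and the rival does not improve."""
    if not 0.0 < reduction_per_node < 1.0:
        raise ValueError("reduction_per_node must be between 0 and 1")
    nodes = 0
    remaining = power_ratio
    while remaining > 1.0:
        remaining *= 1.0 - reduction_per_node
        nodes += 1
    return nodes


# Assumed node sequence after 45nm (hypothetical labels for illustration)
NODES = ["32nm", "22nm", "16nm", "11nm"]

for ratio in (2.0, 3.0):
    n = nodes_to_parity(ratio, 0.25)
    print(f"{ratio:.0f}x gap closes after {n} shrinks, at {NODES[n - 1]}")
```

The 3x end of the range is what pushes parity out to 11nm; a 2x gap would close one node earlier.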


If Atom is going to reach where Snapdragon is today by the 22nm node using only process shrinks, Intel would have to achieve a whopping 45% power reduction on each of the next 2 process nodes. I don't see that happening, but even if it did, that would still give Qualcomm 3 years to improve the power efficiency of the current Snapdragon design.
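The required per-node reduction follows from solving (1 - r)^n = 1/ratio for r. A minimal sketch, assuming the same 2-3x gap and two shrinks (45nm to 32nm to 22nm): for the 3x end it comes to roughly 42% per node, close to the ~45% figure above, and even the 2x end needs about 29% per node.

```python
def required_reduction(power_ratio: float, nodes: int) -> float:
    """Per-node fractional power cut r that closes a `power_ratio` gap in
    `nodes` shrinks, from (1 - r) ** nodes == 1 / power_ratio."""
    return 1.0 - (1.0 / power_ratio) ** (1.0 / nodes)


for ratio in (2.0, 3.0):
    r = required_reduction(ratio, nodes=2)  # 45nm -> 32nm -> 22nm
    print(f"{ratio:.0f}x gap in 2 shrinks: {r:.0%} less power per node")
```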

No matter how badly the foundries may struggle with advanced processes, I just can't see Intel ending up with the 4 node process lead they would need to close this gap on process alone. This also assumes that there is no room for further optimization in the Snapdragon design over the next 6 years when Intel reaches 11nm.

So if Atom is going to compete successfully with the leading edge ARM processors, it is going to come down to Intel's ability to reduce the power requirements of their design while maintaining functionality. Intel's process lead may allow for less efficient designs at the high end, but it is not going to be sufficient to compete effectively in small form factors. The game is different when you are dealing with small form factors, and business as usual isn't going to cut it.

Monday, January 11, 2010

ARM vs Atom: Intel's Newest Challenge

Cross posted on Roborat's blog

The ARM architecture offers several advantages when compared to Atom.


ARM has smaller die sizes than the Atom processors which gives ARM a cost advantage. Having been designed for use in space sensitive environments, the ARM core is smaller than the Atom equivalent. This is the case even though Atom is being manufactured on a more advanced process than most of the current ARM designs.


In addition, ARM is more highly integrated than Atom. Almost all ARM products for use in the mobile space are single chip SOC solutions. This offers a substantial size advantage over the current Atom solution which requires three chips and the upcoming solution (Moorestown) that is still a two chip solution. Atom won't offer a single chip solution prior to the advent of Medfield sometime in 2011. So Atom won't be able to match ARM for solution size or integration until somewhere between 1 and 2 years from now.


But the biggest advantage ARM holds right now is in power efficiency. Qualcomm's Snapdragon processor is the poster child for ARM's high performance processors, so I'll use that as a reference point. The Snapdragon processor is reported to use 250-500mW under load and 10mW at idle. Atom's Moorestown, due out later this year, should use ~750-1000mW under load and ~35mW at idle. So ARM offers about a 2-3X power efficiency advantage over the Atom platform.
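The 2-3X figure comes from comparing the two load ranges end to end. This sketch uses only the reported and expected numbers quoted above and assumes the low ends and high ends of the ranges correspond to comparable workloads; pairing Moorestown's best case against Snapdragon's worst (or vice versa) would widen the spread to 1.5X-4X.

```python
# Reported / expected figures quoted in the post, in milliwatts
SNAPDRAGON_LOAD_MW = (250, 500)
SNAPDRAGON_IDLE_MW = 10
MOORESTOWN_LOAD_MW = (750, 1000)
MOORESTOWN_IDLE_MW = 35

# Compare low end with low end and high end with high end
load_ratios = [m / s for m, s in zip(MOORESTOWN_LOAD_MW, SNAPDRAGON_LOAD_MW)]
idle_ratio = MOORESTOWN_IDLE_MW / SNAPDRAGON_IDLE_MW

print(f"load: {min(load_ratios):.1f}x to {max(load_ratios):.1f}x")
print(f"idle: {idle_ratio:.1f}x")
```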


With all these disadvantages, one wonders what Atom can bring to the table.


First and foremost is sheer processing power. If you look at Intel's marketing around the LG GW990 from CES, you will see an emphasis on multi-tasking. ARM is closing the gap on responsiveness on single apps, but the x86 architecture that Atom is based on still seems to have more horsepower and allows you to do more things at once.


Another big advantage that Atom currently enjoys is the ability to run Flash applications. However, Adobe is reportedly working with ARM to enable their processor designs to run Flash applications, so this advantage is going to be short lived. It has helped Atom become the dominant netbook processor, but it will not continue to drive future growth.


The last advantage that the Atom brings to the table is the ability to run Windows. By being able to run Windows, Atom brings a large software infrastructure to the table for any device it is installed on. But this advantage isn’t quite as big as it might seem at first glance.


The Atom processor was designed to be a “good enough” processor for basic PC tasks like browsing the internet, viewing video, etc. But it lacks the power to run large applications well. So while Atom may be capable of running x86 applications, the experience with many of them is poor. If the software doesn’t run well it is not much better than not running at all.


The use of Atom in small form factors further offsets the advantage of using existing software. Many of the current applications don’t fit these small form factors very well. This can be fixed, but requires that the code be modified to correct the problem. Having to modify the code for this purpose nullifies much of the advantage of being able to use the existing software.


Intel’s marketing along the software lines seems to have matured beyond the idea of basic software compatibility of late. They are placing a greater emphasis on cross platform portability. I believe that this is a more realistic assessment of the x86 advantage than focusing on the software because it focuses on one of the few real weaknesses of the ARM architecture.


ARM doesn’t manufacture chips, it sells licenses to use its architecture. Each licensee is free to modify the basic design to suit the licensee’s needs. This results in an ecosystem where the various implementations from different vendors may not be compatible with each other even though they are based on the same core architecture.


Systems built around the x86 architecture bring the guarantee of cross system compatibility. Not in the sense that you can move the software directly, but rather in the ability to link the systems together and transfer data between them. So by choosing Atom, you know you are choosing a device that will work and play well with your other devices.


In summary, ARM and Atom are rapidly converging to similar levels of computing power and energy efficiency. Within a few years I believe there will only be one key differentiator between the two architectures. The differentiator will be the ease with which you can move data between your various computing applications.


Due to the homogeneous nature of the hardware infrastructure Intel is building, I believe this gives them a substantial advantage. However, there is still a need for urgency on Intel’s part. If ARM becomes the entrenched incumbent architecture in this new space, it will take far longer for Intel to move Atom down into the smaller devices. I believe the x86 architecture, warts and all, will become the dominant architecture in personal computing devices. But if Intel doesn’t move quickly enough, they will miss the initial growth curve and the resulting profits that come from riding that curve.

Friday, August 21, 2009

I've been thinking about this article from The Tech Report on AMD's Istanbul server chip. One thing that caught my eye was the SPECpower_ssj2008 score. The score they reported was quite a bit lower than the score at SPEC.org. So I decided to take a more detailed look at this for myself.

First let me explain the methodology I'm going to use. I intend to take the best overall model for a given set of benchmarks for both Intel and AMD and base my comparison on that. I believe that using SPEC scores is the best method for comparing competing systems, because the vendors who are posting the scores have done everything they can to tweak the systems to give the best performance possible. Anyone who tries to claim bias or attempts to make one system look bad relative to another just doesn't get the free market system. These guys want to sell systems, and you can't do that if you don't do all you can to make your offering look good.

For those who claim that the benchmarks are tweaked to favor Intel, all you have to do is look at the detailed breakdown and compare the tests that are relevant to your workload against each other. Most of these benchmarks represent real world server tasks. So in essence, you can customize the results for your workload. I don't intend to do that in this post, but might at some future date.

So let's start with the SPEC CPU scores. The AMD systems posted at SPEC.org are limited to a couple of 1 socket systems and a couple of 2 socket systems for the CINT and CFP scores. The only results for the Xeon X5550 system (the Intel chip with a comparable clock speed) are for 2 socket systems, so that is what I'll compare. I will note that the AMD systems seem to take a hit on these scores in the 2 socket configuration and are actually a little faster in the 1 socket configuration. However, when we get to the rate scores, 2 socket is clearly the way to go. As you can see below, the Intel system offers superior speed for both floating point and integer workloads.

It is interesting to note that the AMD systems show larger gains from base to peak performance than the Intel systems. I suspect that is due to Intel's Turbo Boost implementation. It appears that Turbo Boost keeps the Intel system operating near the peak values at all times. These results would seem to throw a bit of cold water on the claim I've seen thrown about that Turbo Boost only works for short periods of time.

Next we'll take a look at the rate values. This has traditionally been a strong point for AMD due to the use of HyperTransport rather than the FSB that Intel used up to the release of the Nehalem processor. I remember arguing with someone (the name is withheld to protect the guilty) that switching away from the FSB was going to yield huge throughput improvements for Intel. Said individual kept trying to paint the change as some sort of negative for Intel, but I couldn't see how that would be the case. I guess we can answer that question now, so let's take a look.

In this case I have included results for the Opteron 2435 (2.6GHz) and the 2439 (2.8GHz) as well as the Xeon 5550 (2.66GHz) and 5560 (2.8GHz).


Again, we see that the Intel peak numbers are not that much higher than the baseline numbers while the AMD systems show a substantial increase at peak performance. The increase in integer rate performance brings the AMD systems to performance levels that are nearly comparable to the Intel systems.

I find it interesting that moving up a speed bin doesn't produce a throughput increase proportional to the increase in processor clock speed.

Now we'll take a look at the big claim from The Tech Report article: that the power/performance ratios of the two systems are comparable. First let me note that there is only one official power score for the Opteron 2435 posted at SPEC.org. There are no postings for the Xeon 5550, but there are several for the Xeon 5570. To make a "fair" comparison, I did the following. I took the values for the Xeon 5570 and scaled the output by the difference in processor speed. I then reduced the values by an additional 5%. For those that want to see the math, I multiplied the number of ssj_ops (or the performance to power ratio) by (2.66/2.93)*0.95 to get a final value. I plotted this number against the unmodified power values for the Xeon 5570, which should be higher since it is clocked higher. The results are plotted below as the "Hypothetical Xeon 5550". If someone can propose a better way to estimate the performance, I'm more than willing to give it a whirl and see what it produces.
So while the Xeon X5570 clearly offers a better power/performance ratio than the Opteron 2435, even the Hypothetical Xeon 5550 offers a better power/performance ratio than the Opteron 2435 at all but the highest loadings.
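The scaling applied to the Xeon 5570 results is a single multiplicative factor. A sketch, assuming the nominal clocks of 2.93GHz (Xeon 5570) and 2.66GHz (Xeon 5550) used in the formula above, the post's extra 5% derating, and throughput that tracks clock speed linearly; because power is left at the 5570's values, the same factor applies to ssj_ops and to the performance-to-power ratio.

```python
XEON_5570_GHZ = 2.93
XEON_5550_GHZ = 2.66
EXTRA_DERATE = 0.95  # the post's additional, deliberately conservative 5% cut

# Linear clock scaling plus derating; assumes throughput tracks clock speed
SCALE = (XEON_5550_GHZ / XEON_5570_GHZ) * EXTRA_DERATE


def hypothetical_5550(value_5570: float) -> float:
    """Scale a Xeon 5570 ssj_ops (or ssj_ops/watt) value down to an
    estimated Xeon 5550 value; the 5570's power draw is kept unchanged."""
    return value_5570 * SCALE


print(f"scale factor: {SCALE:.4f}")
```

For example, a hypothetical 5570 result of 1000 ssj_ops/watt would be credited as about 862 for the Hypothetical Xeon 5550.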

Finally, I took a quick look at pricing for a couple of the systems that I've used here for comparison. Without digging into a lot of the details, it seems that the systems are priced within a few hundred dollars of each other. I don't see that as a significant difference for systems that are priced around $6500 to begin with.

So it looks like the Opteron 2435 Istanbul processor still doesn't close the gap to Nehalem across a broad range of workloads. As I said at the beginning, there are sure to be workloads that the Opteron is a better choice for, and if that is what you are running you should give serious consideration to going that route.

Unfortunately, these results are even worse for AMD than the numbers themselves seem to indicate. In order to get within a respectable distance of the Nehalem processor, AMD has had to go with a 6 core die. While the number of cores really isn't an issue, the resulting die size is. Assuming equivalent defect densities (0.05 defects per cm^2), the yield for Nehalem at 263mm^2 is 190 good die per wafer and for Istanbul at 346mm^2 is 129 good die per wafer. So to get equivalent revenue per wafer, AMD will have to sell an Istanbul chip for roughly 47% more than an equivalent Nehalem chip (190/129 ≈ 1.47). But Istanbul's performance is, at best, comparable to a Xeon 5550, and it is priced comparably. So AMD is taking a substantial margin hit when selling this part. Unfortunately, they don't have a choice, since this is the most competitive chip they have to offer.
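The price premium follows directly from the ratio of good die per wafer. The post does not say which die-per-wafer or yield model produced the 190 and 129 counts, so the yield function below uses a simple Poisson model, Y = exp(-D*A), as an assumption; it shows the per-die yield penalty from the larger area, while the premium is computed from the post's own die counts.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free die under a simple Poisson model,
    Y = exp(-D * A), with the area converted from mm^2 to cm^2."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)


def price_premium(good_die_small: int, good_die_large: int) -> float:
    """Extra price per large die needed to match revenue per wafer."""
    return good_die_small / good_die_large - 1.0


DEFECT_DENSITY = 0.05  # defects per cm^2, as assumed in the post

print(f"Nehalem (263mm^2) yield:  {poisson_yield(263, DEFECT_DENSITY):.1%}")
print(f"Istanbul (346mm^2) yield: {poisson_yield(346, DEFECT_DENSITY):.1%}")
print(f"premium for 190 vs 129 good die: {price_premium(190, 129):.1%}")
```

Under this model the yield fraction only drops from about 88% to 84%, so most of the gap between the two die counts comes from fewer candidate die fitting on each wafer.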