Friday, August 21, 2009

I've been thinking about this article from Tech Report on AMD's Istanbul server chip. One thing that caught my eye was the SPECpower_ssj2008 score. The score they reported was quite a bit lower than the score posted at SPEC.org, so I decided to take a more detailed look at this for myself.

First let me explain the methodology I'm going to use. I intend to take the best overall model for a given set of benchmarks for both Intel and AMD and base my comparison on that. I believe that using SPEC scores is the best method for comparing competing systems because the vendors who post the scores have done everything they can to tweak their systems for the best possible performance. Anyone who claims the results are biased, or that a vendor is being made to look bad relative to another, just doesn't get the free market system. These guys want to sell systems, and you can't do that if you don't do all you can to make your offering look good.

For those who claim that the benchmarks are tweaked to favor Intel, all you have to do is look at the detailed breakdown and compare the tests that are relevant to your workload against each other. Most of these benchmarks represent real world server tasks. So in essence, you can customize the results for your workload. I don't intend to do that in this post, but might at some future date.

So let's start with the SPEC CPU scores. The AMD systems posted at SPEC.org are limited to a couple of 1 socket systems and a couple of 2 socket systems for the CINT and CFP scores. The only results for the Xeon X5550 system (the Intel chip with a comparable clock speed) are for 2 socket systems, so that is what I'll compare. I will note that the AMD systems seem to take a hit on these scores in the 2 socket configuration and are actually a little faster with 1 socket. However, when we get to the rate scores, 2 socket is clearly the way to go. As you can see below, the Intel system offers superior speed for both floating point and integer workloads.

It is interesting to note that the AMD systems show a larger increase from base to peak performance than the Intel systems. I suspect that is due to Intel's Turbo Boost implementation. It appears that Turbo Boost keeps the Intel system operating near the peak values at all times. These results would seem to throw a bit of cold water on the claim I've seen thrown about that Turbo Boost only works for short periods of time.

Next we'll take a look at the rate values. This has traditionally been a strong point for AMD due to its use of HyperTransport rather than the FSB that Intel used up until the release of the Nehalem processor. I remember arguing with someone (the name is withheld to protect the guilty) that switching away from the FSB was going to yield huge throughput improvements for Intel. Said individual kept trying to paint the change as some sort of negative for Intel, but I couldn't see how that would be the case. I guess we can answer that question now, so let's take a look.

In this case I have included results for the Opteron 2435 (2.6GHz) and the 2439 (2.8GHz) as well as the Xeon 5550 (2.66GHz) and 5560 (2.8GHz).


Again, we see that the Intel peak numbers are not that much higher than the baseline numbers while the AMD systems show a substantial increase at peak performance. The increase in integer rate performance brings the AMD systems to performance levels that are nearly comparable to the Intel systems.

I find it interesting that moving up a speed bin doesn't produce a throughput increase proportional to the increase in processor clock speed.
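To put a number on "proportional," one rough approach is to divide the relative gain in rate score by the relative gain in clock speed. The sketch below is only an illustration: the function names are my own, and the only real inputs are the clock speeds listed above. The rate scores themselves would have to be plugged in from the SPEC.org results.

```python
def relative_gain(low: float, high: float) -> float:
    """Fractional increase going from `low` to `high` (0.05 means 5%)."""
    return high / low - 1.0


def scaling_efficiency(rate_low: float, rate_high: float,
                       clk_low: float, clk_high: float) -> float:
    """Throughput gain divided by clock gain.

    1.0 means throughput scaled perfectly with clock speed;
    anything below 1.0 means the faster bin under-delivers.
    """
    return relative_gain(rate_low, rate_high) / relative_gain(clk_low, clk_high)


# Clock speed increases for the two pairs of parts compared above (GHz).
opteron_clock_gain = relative_gain(2.6, 2.8)   # Opteron 2435 -> 2439
xeon_clock_gain = relative_gain(2.66, 2.8)     # Xeon 5550 -> 5560

print(f"Opteron clock gain: {opteron_clock_gain:.1%}")
print(f"Xeon clock gain:    {xeon_clock_gain:.1%}")
```

The Opteron pair is a bump of roughly 7.7% in clock and the Xeon pair roughly 5.3%, so any rate score improvement smaller than that gives an efficiency below 1.0.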

Now we'll take a look at the big claim from the Tech Report article: that the power to performance ratios of the two systems are comparable. First let me note that there is only one official power score for the Opteron 2435 posted at SPEC.org. There are no postings for the Xeon 5550, but there are several for the Xeon 5570. To make a "fair" comparison, I did the following. I took the values for the Xeon 5570 and scaled the output by the difference in processor speed. I then reduced the values by an additional 5%. For those who want to see the math, I multiplied the number of ops (or the performance to power ratio) by (2.66/2.93)*0.95 to get a final value. I plotted this number against the unmodified power values for the Xeon 5570, which should be higher than what a real 5550 would draw, since the 5570 is clocked higher. The results are plotted below as the "Hypothetical Xeon 5550". If someone can propose a better way to estimate the performance, I'm more than willing to give it a whirl and see what it produces.
So the Xeon X5570 is clearly shown to offer a better power/performance ratio than the Opteron 2435. Even the Hypothetical Xeon 5550 offers a better power/performance ratio than the Opteron 2435 at all but the highest loadings.
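For anyone who wants to reproduce the "Hypothetical Xeon 5550" numbers, the scaling described above boils down to one multiplication. Here is a minimal sketch; the function name and parameter names are my own, and the input value is just 1.0 so that the output is the combined scaling factor rather than any real SPEC result.

```python
def scale_to_hypothetical_5550(value_5570: float,
                               clk_5550: float = 2.66,
                               clk_5570: float = 2.93,
                               derate: float = 0.95) -> float:
    """Scale a Xeon 5570 ops (or performance to power) value down to an
    estimated Xeon 5550 value: clock ratio first, then an extra 5% cut."""
    return value_5570 * (clk_5550 / clk_5570) * derate


# Feeding in 1.0 shows the combined factor applied to every 5570 data point.
factor = scale_to_hypothetical_5550(1.0)
print(f"combined scaling factor: {factor:.4f}")
```

In other words, each published 5570 value is knocked down by roughly 14% before being compared with the Opteron 2435, while the 5570's power draw is left untouched.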

Finally, I took a quick look at pricing for a couple of the systems that I've used here for comparison. Without digging into a lot of the details, it seems that the systems are priced within a few hundred dollars of each other. I don't see that as a significant difference for systems that are priced around $6500 to begin with.

So it looks like the Opteron 2435 Istanbul processor still doesn't close the gap to Nehalem across a broad range of workloads. As I said at the beginning, there are sure to be workloads that the Opteron is a better choice for, and if that is what you are running you should give serious consideration to going that route.

Unfortunately, these results are even worse for AMD than the numbers themselves seem to indicate. In order to get within a respectable distance of the Nehalem processor, AMD has had to go with a 6 core die. While the number of cores really isn't an issue, the resulting die size is. Assuming equivalent defect densities (0.05 defects per cm^2), the yield for Nehalem at 263mm^2 is 190 good die per wafer and for Istanbul at 346mm^2 is 129 good die per wafer. So to get equivalent revenue per wafer, AMD would have to sell an Istanbul chip for about 47% more (190/129) than an equivalent Nehalem chip. But Istanbul's performance is, at best, comparable to a Xeon 5550, and it is priced comparably. So AMD is taking a substantial margin hit when selling this part. Unfortunately, they don't have a choice since this is the most competitive chip they have to offer.
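The break-even price premium follows directly from the die counts. The sketch below takes the 190 and 129 figures quoted above at face value and assumes that what is being equalized is revenue per wafer; the function name is my own, and it doesn't try to re-derive the yield numbers themselves.

```python
def break_even_premium(good_die_a: float, good_die_b: float) -> float:
    """Fractional price premium chip B needs so that a wafer of B brings in
    the same revenue as a wafer of A (0.25 means 25% more per chip)."""
    return good_die_a / good_die_b - 1.0


nehalem_good_die = 190    # good die per wafer quoted for Nehalem (263mm^2)
istanbul_good_die = 129   # good die per wafer quoted for Istanbul (346mm^2)

premium = break_even_premium(nehalem_good_die, istanbul_good_die)
print(f"Istanbul break-even price premium: {premium:.1%}")
```

With those die counts the ratio is 190/129, which works out to a premium of about 47%.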