Monday, June 11, 2012

Can Intel Catch ARM?

First let’s preface this discussion with some information on process nodes. Others like to claim they are ahead of Intel on SOC process nodes by pointing out that they are on 28nm while Intel is still on 32nm. This is nothing more than marketing fluff. They are the same process node; 28nm is merely an optical shrink of the 32nm node. The transistors are packed a bit closer together, but the underlying process is the same. So the following processes will be treated as equivalent: 45/40nm, 32/28nm, 22/20nm, 14/12nm, etc.

Before I get into my main points I want to look at what I call the x86 myth. Boiled down to its simplest terms, this myth states that the transistor overhead needed to support the x86 instruction set prevents x86 chips from being as small and efficient as an ARM (or other non-x86) chip. At one time this was certainly true. It is estimated that the x86 instruction set took up ~30% of the transistor budget on the original Pentium processor. The requirements for the x86 instruction set haven’t grown much with subsequent generations, but for the sake of argument, let’s pick an obscenely high number and say that the requirements have doubled. That would be equivalent to 60% of the transistors in the original Pentium chip being dedicated to the x86 instruction set, transistors that non-x86 chips can either repurpose or eliminate.

According to Wikipedia, the Pentium chip had 3,100,000 transistors. 60% of that would be 1,860,000 transistors. That sounds like a lot of transistors, but let’s put that in perspective. I can’t find a verified transistor count or die size for Medfield. (The best I could find was that the transistor count is ~1/4 of a dual core Conroe, and Anandtech estimated the die size at ~53mm^2.) Again referring to Wikipedia, the transistor count of a dual core Conroe is 291 million transistors. One quarter of the Conroe transistor count is 72,750,000 transistors. Our theoretical 1.86 million transistors would be about 2.6% of the total transistor count. And that is a high end estimate. So the disadvantage Intel accrues from using the x86 instruction set is an increase in transistor count and die size of ~2.6%. I hardly call that a make or break proposition.

I’ve looked for die size information on Qualcomm’s Snapdragon S4 chip and haven’t found anything. But I did find a die size for Nvidia’s Tegra 3 chip. It is listed as 83mm^2. Since this is built on a 40nm process, we should look at what die size building it on a 28nm process would give if we want an apples-to-apples comparison with Medfield.
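The x86 overhead estimate above is simple enough to check in a few lines. This is only a sketch of the arithmetic using the figures quoted in the text; the constant and function names are my own, and the 60% share and the "one quarter of Conroe" transistor count are the same rough assumptions made above, not verified data.

```python
# Figures quoted in the text (Wikipedia transistor counts as cited there).
PENTIUM_TRANSISTORS = 3_100_000
X86_SHARE_ASSUMED = 0.60              # deliberately pessimistic: double the ~30% estimate
CONROE_DUAL_CORE_TRANSISTORS = 291_000_000
MEDFIELD_FRACTION_OF_CONROE = 0.25    # "~1/4 of a dual core Conroe" (unverified estimate)


def x86_overhead():
    """Return (x86 transistors, estimated Medfield transistors, overhead fraction)."""
    x86_transistors = PENTIUM_TRANSISTORS * X86_SHARE_ASSUMED
    medfield_transistors = CONROE_DUAL_CORE_TRANSISTORS * MEDFIELD_FRACTION_OF_CONROE
    return x86_transistors, medfield_transistors, x86_transistors / medfield_transistors


x86, medfield, fraction = x86_overhead()
print(f"x86 overhead: {x86:,.0f} of {medfield:,.0f} transistors ({fraction:.1%})")
```

Changing X86_SHARE_ASSUMED back to the original 30% estimate halves the result, which is why the 2.6% figure should be read as an upper bound.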

One of the chip industry's dirty little secrets is that you don’t actually get a 50% decrease in die size when you move to the next process node. You achieve pretty close to this in the cache regions, where the layout is very symmetrical and you can pack the transistors for maximum density, but the results in the logic regions of the chip are a lot worse. As a result you only get about a 30% decrease in die size for a typical microprocessor rather than the theoretical 50% decrease.

Applying the above logic to the Tegra 3 chip you get a reduction from 83mm^2 to 58mm^2. That puts it in the same ballpark as Intel’s Medfield at 32nm. In fact, Tegra 3 should be almost 10% larger than the Medfield chip even on the same process node. Since cost is roughly proportional to die size, Intel should be able to manufacture Medfield for ~10% less than Nvidia can manufacture Tegra 3 at 28nm. So much for the cost impact of the x86 instruction set.
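The die size comparison above can be sketched the same way. The die areas and the 30% and 50% shrink figures are the ones quoted in the text; the names are hypothetical, and treating cost as proportional to area is the same simplification made above (it ignores yield and wafer price differences between processes).

```python
TEGRA3_DIE_MM2_40NM = 83.0     # figure quoted in the text
MEDFIELD_DIE_MM2_32NM = 53.0   # Anandtech estimate quoted in the text
TYPICAL_SHRINK = 0.30          # ~30% area reduction per node for a typical processor
IDEAL_SHRINK = 0.50            # theoretical area reduction per node


def shrink(area_mm2, reduction):
    """Die area after one node shrink with the given fractional reduction."""
    return area_mm2 * (1.0 - reduction)


tegra3_28nm = shrink(TEGRA3_DIE_MM2_40NM, TYPICAL_SHRINK)
tegra3_ideal = shrink(TEGRA3_DIE_MM2_40NM, IDEAL_SHRINK)
size_ratio = tegra3_28nm / MEDFIELD_DIE_MM2_32NM

print(f"Tegra 3 at 28nm (typical shrink): {tegra3_28nm:.1f} mm^2")
print(f"Tegra 3 at 28nm (ideal shrink):   {tegra3_ideal:.1f} mm^2")
print(f"Tegra 3 / Medfield area ratio:    {size_ratio:.2f}")
```

With the typical 30% shrink the ratio comes out just under 1.10, which is where the "almost 10% larger" figure comes from. With the ideal 50% shrink Tegra 3 would instead come out smaller than Medfield, so the conclusion depends heavily on which shrink factor you believe.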

While I don’t see an issue with Medfield due to the x86 nature of the chip, Intel still faces the challenges that any new entrant to a market faces when trying to replace an entrenched competitor. They have to provide a compelling value proposition. As you will see from the analysis below, I believe that Intel's biggest advantage in this space will be cost. That's right, I said cost. I'll be interested in any other views, but please have the courtesy to read my analysis and identify specific issues rather than making general blanket statements.

Intel’s phone (I’m looking at the system as a whole here because that is what the end user really cares about) is reported to be a middle of the road smartphone. It is a competitive entry, but doesn’t offer anything truly compelling. Intel is wisely targeting this smartphone at emerging markets, which don’t have the well established installed base that exists in the US and Europe. This move allows Intel to avoid some of the difficulties of trying to oust a well established incumbent. In order to be successful in the long term, though, Intel is going to have to offer a compelling reason to choose their product over ARM. So Intel is going to have to offer comparable (preferably better) performance at lower power, and an equivalent (or lower) price. What I’d like to do here is evaluate Intel’s plan to achieve these goals.

The smartphone game is all about maximum performance at minimum power, but power and performance are inextricably linked. For a given process node, if you increase performance, you are going to have to use more power. If you want to lower power, you are going to have to give up some performance. If you want to improve your performance without increasing power you either have to improve your design, or change your process (i.e. a node shrink). Intel plans on two process shrinks and a redesign over the next 2 years.

Intel’s latest entry into the smartphone space, Medfield, is based on the 32nm process. Most of the competing devices are based on the 45/40nm process, so Intel has a one process node lead over those products. The latest and greatest ARM processors that are just starting to hit the market (Qualcomm Snapdragon and TI OMAP5) are based on a 28nm process. Looking at benchmarks for the Snapdragon S4 (Krait) processor, it looks like the SOCs based on this processor outperform the Medfield offering. Perhaps more interesting is that Tegra 3 also seems to offer better performance even though it is fabbed on a 40nm process. One would expect that shrinking the other processor designs to the 28nm node would give them a similar performance advantage over Medfield and possibly give Tegra 3 top honors.

Anandtech’s review of Medfield shows that Intel still has a ways to go to achieve this goal. The HTC One X and One S use the Tegra 3 and the Snapdragon S4 chips respectively. These phones represent the next generation of ARM based phones and generally outperform the Medfield chip. Unfortunately, battery life data for them isn’t available in the review, so no battery life comparison between the Medfield phone and the newest ARM phones can be made. But Medfield battery life is compared to a number of other phones and generally falls in the bottom half, but above the bottom quarter, of the phones evaluated. Intel will need to close this gap. Snapdragon S4 is supposed to provide comparable battery life to its predecessor while providing better performance. Based on that I would expect Medfield to lag behind the Snapdragon S4 in both power and performance. If Tegra 3 were fabbed on 28nm I would expect it to also edge Medfield on power as well as performance.

Intel's current roadmap shows them introducing a new design by the end of this year that is supposed to increase performance. Given the power sensitive nature of the phone market, it is my assumption that this redesign will match the power consumption of the current devices. I have no data to support this, but Intel's phone efforts are currently being run by Mike Bell, who was involved in the development of the original iPhone, so I'm sure he knows what this market values. Based on what Intel has released on the redesign and my assumption of comparable power consumption, Intel can expect an improvement over Medfield's current power/performance metric by the end of the year. The redesign should put Intel back in the lead on performance, where they are already competitive with the top phones.

I'm not claiming Intel has any magic bullets here though. I suspect the redesign will end up increasing die size and giving up the size/cost advantage I indicated that Intel currently has over the hypothetical Tegra 3 processor on 28nm. Intel can increase their transistor budget by ~10% and still maintain size/cost equivalency. In order to stay in the same power envelope they will have to adopt more rigorous power control and/or reduce processor speed. My thought is that moving to out-of-order execution will be Intel's approach, which will allow greater processing efficiency at lower clock speeds while costing more transistors.

It is my opinion that the redesign will put Intel back in the lead on performance, but will still leave them lagging behind on battery power and sacrifice their cost/size advantage. Intel's roadmap calls for Intel to move to 22nm in 2013 and 14nm in 2014. I believe this is where Intel will rely on their process technology to close the gap. My expectation here is that the 22nm offering will use the same design as the improved 32nm design and the following comments are based on this assumption. Given the short time between 22 and 14nm, I believe Intel will have to use the same design on 14nm that they do on 22nm. While Intel hasn't announced anything beyond this I believe the next logical step will be another redesign on 14nm.

Shrinking to 22nm will give Intel their cost/size advantage back and will improve their power efficiency. If ARM were to do nothing between now and 2014, I expect that Intel's shrink to 14nm would give them unquestioned leadership in all three key metrics: cost, power, and performance. Intel's critics are quick to point out that ARM will not be sitting still for the next 2 years, and rightly so. Let's look at what ARM's roadmap shows between now and 2014.

ARM is claiming to have 20nm products ready to go at the end of 2013. Given a typical 2 year development cycle they would see 12nm in 2015, 1 year after Intel goes to 14nm. This timeline also lines up well with TSMC’s current roadmap. Interestingly, TSMC has recently announced that they will only be offering a single 20nm process instead of the High Performance and Low Power variants they originally proposed. They cited the “lack of a noticeable performance difference” between the two processes as the reason for the change. This is very different from the data that Intel presented for 22nm. Intel showed significant differences between transistors designed for low power and those designed for high speed. This leads me to believe that TSMC’s process will underperform compared to Intel’s equivalent process.

I have been unable to find a timeline for ARM processor designs. I did find an article that indicates that ARM's future plans focus on a big.LITTLE theme. They will be using Cortex-A7 cores for times when the device has less computational demand and switch to a Cortex-A15 core when the computational demands are higher. Despite the lack of any public statements from ARM, I have assumed a number of redesigns in the table below, where I compare the roadmaps of ARM and Intel.

Firm   2012 Q3    2012 Q4    2013 Q1    2013 Q2    2013 Q3    2013 Q4    2014 Q1    2014 Q2    2014 Q3    2014 Q4    2015 Q1
Intel  32nm Mk1   32nm Mk2   32nm Mk2   32nm Mk2   22nm Mk1   22nm Mk1   22nm Mk1   14nm Mk1   14nm Mk1   14nm Mk1   14nm Mk2?
ARM    28nm Mk1   28nm Mk1   28nm Mk2?  28nm Mk2?  28nm Mk2?  20nm Mk1   20nm Mk1   20nm Mk2?  20nm Mk2?  20nm Mk2?  12nm Mk1?

The table above summarizes the two roadmaps. "?" marks indicate process nodes or redesigns that I'm assuming will occur. Mk1 is an initial design on a process node and Mk2 is a redesign on a given process node. Comparing the roadmaps in the table above I expect the next couple of years to unfold as shown in the tables below.

Metric       2012 Q3    2012 Q4
Performance  ARM        Intel
Cost         Intel      ARM/Intel

Metric       2013 Q1    2013 Q2    2013 Q3    2013 Q4
Performance  ARM/Intel  ARM/Intel  Intel      ARM/Intel
Cost         Intel      Intel      Intel      ARM/Intel

Metric       2014 Q1    2014 Q2    2014 Q3    2014 Q4    2015 Q1
Performance  ARM/Intel  ARM/Intel  ARM/Intel  ARM/Intel  ARM/Intel
Power        ARM        ARM/Intel  ARM/Intel  ARM/Intel  ARM
Cost         ARM/Intel  Intel      Intel      Intel      ARM/Intel

Note that these tables compare estimates of Intel's offerings with the leading edge ARM products. When compared to ARM's older products, Intel may well have an advantage. I've also made no assumptions that foundries would have yield issues that would delay migration to a given process node, or any assumptions regarding inferior performance of foundry processes. I also assumed that Intel and ARM had an equivalent design frequency of 3 quarters with the exception of the 20nm node where I assumed 2 quarters for ARM. In short, I have tried not to bias my evaluation based on any assumptions of process or design superiority.
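As a rough cross-check on the roadmap table, the sketch below encodes the process node each side is on per quarter and applies the node equivalences from the opening paragraph (32/28nm, 22/20nm, 14/12nm) to show who is a full node generation ahead. The data is copied from the roadmap table, including the entries marked "?" there; the names and the generation numbering are my own. It only captures process generation, so it does not reproduce the design revisions (Mk1/Mk2) or the judgment calls behind the performance, power, and cost tables.

```python
QUARTERS = ["2012 Q3", "2012 Q4", "2013 Q1", "2013 Q2", "2013 Q3", "2013 Q4",
            "2014 Q1", "2014 Q2", "2014 Q3", "2014 Q4", "2015 Q1"]

# Process node (nm) per quarter, copied from the roadmap table.
# Entries marked "?" in the table are assumptions there as well.
INTEL_NODES = [32, 32, 32, 32, 22, 22, 22, 14, 14, 14, 14]
ARM_NODES = [28, 28, 28, 28, 28, 20, 20, 20, 20, 20, 12]

# Equivalence classes from the opening paragraph: 45/40, 32/28, 22/20, 14/12.
GENERATION = {45: 0, 40: 0, 32: 1, 28: 1, 22: 2, 20: 2, 14: 3, 12: 3}


def node_lead(intel_nm, arm_nm):
    """Return which side is on the newer node generation, or 'even'."""
    diff = GENERATION[intel_nm] - GENERATION[arm_nm]
    if diff > 0:
        return "Intel"
    if diff < 0:
        return "ARM"
    return "even"


leads = {q: node_lead(i, a) for q, i, a in zip(QUARTERS, INTEL_NODES, ARM_NODES)}
for quarter, lead in leads.items():
    print(f"{quarter}: {lead}")
```

Under these assumptions Intel holds a full node generation lead only in 2013 Q3 and from 2014 Q2 through 2014 Q4; in every other quarter the two sides are on equivalent nodes.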

Examination of these tables shows that Intel will match or exceed ARM's leading edge performance by the end of this year. Intel will struggle to match the power efficiency of ARM until 2014, after which the two will be at relative parity. The analysis I've performed above shows that Intel's real advantage here, and the thing they will have to leverage to gain traction in this market, is cost.

Most analysts believe that Intel will not be willing to cut margins enough to be competitive on cost. However, I bring two counter arguments to the table. First, Intel's process lead gives them an inherent cost advantage. Second, on several occasions I have heard Intel's Paul Otellini state that he expects SOC products to make up the majority of Intel's production on a volume basis, but not on a cost basis. To me this indicates that he realizes Intel will have to sacrifice some degree of margin on these products to be competitive. To offset this, Intel will still have server and PC revenues to maintain their margins. As smartphone sales go up, so do server sales, and servers are Intel's highest margin chips. All these factors lead me to believe that before the end of 2014 Intel will be a major player in the smartphone market and will use cost as their primary advantage.