Click

Saturday, January 30, 2010

Istanbul versus Nehalem, some extra notes


My last post generated quite a bit of discussion, some of it based on misunderstandings. In this post I'll try to make a few things more clear. In a previous post, I pointed out that there are a good indications that a dual Nehalem EP has a 40 to 100% advantage over Shanghai (depending on the application, based on the SAP and Core i7 workstation benchmarks).

If Istanbul is introduced in the early part of H2 2009, AMD will have a small window of opportunity of competing with a hex-core versus a quad-core (Intel's Nehalem EP). Time will tell of course how small, large or non-existing this window will be.

In well threaded applications, the best a "hex-core Shanghai" can do is give about a 30-40% boost to performance compared to the current Shanghai, which is most likely not enough to close the gap with the upcoming Nehalem CPU (let alone the 32 nm hex-core version). However, Istanbul is more than a hex-core Shanghai. The improved memory controller and HT-assist can lower the latency of inter-CPU syncing and increase the effective memory bandwidth. For that reason, Istanbul will do better than just "a shanghai with 2 added cores" in many applications such as SAP, OLTP databases, Virtualization scenario's and HPC. Depending on the application, Istanbul might prove to be competitive with the quad-core Nehalem. It is clear that the hex-core "Westmere" which will have a slightly improved architecture will be a different matter.

But back to the "this higher amount of bandwidth will allow the quad Istanbul to stay out of the reach of the dual Nehalem EP Xeons" comment. It is very embarrassing, and simply bad PR if a quad socket platform is beaten by a dual socket platform in any benchmark. This is something we have witnessed in the early SAP numbers. That is why I commented that the improved "uncore" will help the quad socket Istanbul to stay out of the reach of the dual Nehalem EP. I was and am not implying that people who would consider a dual Nehalem EP are suddenly going to consider a quad Istanbul.

It is clear those looking for a 4S and 2S server are in a slightly overlapping but mostly different market. Quad socket is mostly chosen for large back end applications such as OLTP databases or for virtualization consolidation. The number of DIMM slots in that case is a very important factor. However, even with the advantage of having more DIMM slots, better RAS etc., a quad socket platform that cannot outperform a dual socket platform will leave a bad taste in the mouth of potential buyers. It is important that there is a minimal performance advantage.

The fact that the performance/power ratio of such a quad server will be worse than a dual socket server is an entirely different discussion. IBM's market research (see the picture below) shows which form factor is bought mostly for consolidating VMs. As you can see it comes down to some people being convinced that a number of 4-socket rack servers is the best way, others are firm believers that about twice as much low power 2-socket blades is the way to go. It is very hard to convince the latter or former group to switch sides and that is why I feel that 2S and 4S servers are mostly in different markets.

In many cases, the number of virtual machines you can consolidate on one physical server is mostly a function of the amount of RAM. If the number of DIMM slots allows you to consolidate twice as many virtual machines on the quad socket machine, the consumed energy might be better than using two DP machines with the same number of DIMMs.

So despite the fact that the two DP machines have a lot more CPU power, the "scale up" buyers still prefer to go for a large box with more memory; they are not limited by raw CPU power, but by the amount of RAM that they can put in this server. It is these people that AMD will target with their 4S platform, a platform which has - especially for virtualization - a number of advantages over the current Intel 4S "Dunnington" platform... at least until Intel's octal-core arrives. Whether you choose the 2S blades or 4S rack servers depends on whether you believe in the "scale up" or "scale out" philosophy.

The conclusion is that many 4S rack servers are not only bought for raw CPU performance, but for the amount of RAM, their RAS features, and so on. However, it is clear that a 4S server should still outperform 2S servers so that the group of buyers who are believers in the "scale up" philosophy feel good about their purchase.

No comments:

Post a Comment