We’re mulling over a lot of hot topics at C++ and Beyond 2012. Don’t forget we’re at almost 75% capacity, so hurry up and register now to secure a seat for you or your team (and don’t forget the 10% discount for teams). Here’s a session I hope you’ll find informative. I sure learned a thing or two while working on it.
There’s been a lot of talk lately about multicore CPUs and how to use them effectively. In all this excitement, it’s often forgotten that each CPU is quite parallel in and of itself – it has multiple ALUs, superscalar execution, pipelining, and more – generally lots of dedicated, redundant hardware ready to support Instruction-Level Parallelism (ILP). Much of ILP detection and exploitation happens automatically…
…but the effectiveness of such mechanisms is heavily dependent on high-level coding style. Code snippets with identical effects may exhibit very different performance profiles, and often it’s the code that looks smaller, tighter, and more economical that’s the loser. We need to retrain some of our code-reviewing taste buds if we care about performance.
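A classic illustration of this (a sketch of the general technique, not necessarily one of the talk’s examples) is summing an array. The one-accumulator version looks tighter, but every addition depends on the previous one, forming a serial chain; splitting the work across several independent accumulators hands the CPU multiple chains it can execute concurrently:

```cpp
#include <cstddef>

// The "tight" version: one accumulator, so each addition must wait for
// the previous one to retire -- a single loop-carried dependency chain.
double sum_serial(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// The "wasteful"-looking version: four independent accumulators create
// four dependency chains that the CPU's multiple ALUs can overlap.
double sum_ilp(const double* a, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)   // pick up the leftover elements
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

Note that for floating-point data the two versions associate the additions differently, which is exactly why a conforming compiler won’t make this transformation for you by default – the programmer has to express it.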
The problem with wasting CPU-level parallelism is that its benefits are impossible to recover at a higher level. Losing ILP essentially means renouncing some valuable CPU real estate – that dark silicon will never light up. This problem affects multi-core and single-core processors alike (the latter still in heavy use in embedded systems).
The good news is, there are things we can do at a high level to improve the utilization of CPUs’ ILP capabilities. This talk demonstrates with examples and numbers that certain styles of programming naturally lead to better ILP, which is readily exploited by the compiler and hardware to attain spectacular speed improvements – all in spite of the code not initiating any explicit parallel actions, and generally not looking parallel at all.