Summary
This brand-new talk covers the state of the art for using C++ (not just C) for general-purpose computation on graphics processing units (GPGPU). The first half of the talk discusses the most important issues and techniques to consider when using GPUs for high-performance computation, especially where we have to change our traditional advice for doing the same computation on the CPU. The second half focuses on upcoming C++ language and library extensions that bring key abstractions for GPGPU — and in time considerably more — directly into C++.
Description
The mainstream hardware platform, from grandma’s PC or tablet on up, now typically contains a modern GPU that can offer 10x to 100x speedups for certain interesting computation workloads. If you care about performance on commodity hardware, you won’t willingly leave that kind of performance on the table. It’s time to think of the GPU as another available coprocessor, or compute accelerator, that’s not just for high-end custom GPGPU servers any more.
There’s one small problem: Standard C++ cannot access that juicy performance directly because it doesn’t (yet) have the language abstractions needed for running parts of the same program on heterogeneous processors. That’s unfortunate, because it’s C++’s job as a systems programming language to give full access to the hardware. While C++ evolves in this direction, in the interim, environments like CUDA and OpenCL are filling this gap using a combination of libraries and extensions to C.
This talk covers the following issues:
- Language subsets due to hardware heterogeneity. The mainstream computer is now fundamentally heterogeneous. Different compute resources (traditional CPU, Cell-style SPU, GPU) support only subsets of the C and C++ languages; for example, many lack support for pointers to pointers, pointers to functions, new, malloc, and even fundamental types like short int.
- Performance differences and pitfalls. Code executing on GPUs can have very different performance characteristics from the same code running on CPUs. Even one of C++’s most basic language features, an “if” statement, can silently cost an order of magnitude or more of performance.
- Non-uniform and fragmented memory. Most current GPUs do not share memory with the CPU, and so data must be (usually explicitly) transferred before and after the computation. Further, the GPU memory itself has a notion of cache-like memory shared by subgroups of threads, but that “cache” is not automatic; the programmer must manage it explicitly.
- Rapidly changing hardware. GPGPU hardware designs in particular are still in great flux. Which programming techniques work well on today’s hardware and also produce code that is friendly to tomorrow’s different hardware?
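To make the “if” pitfall from the list above concrete, here is a minimal sketch in plain C++ of a toy cost model. It is not GPU code and not any real API. It assumes that lanes execute in lockstep groups, so a group whose lanes disagree on a branch pays for both sides. The names `BranchCost` and `total_cycles`, the group width, and the cycle counts are all hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy cost model only (assumption): not a real GPU API or a measured figure.
struct BranchCost {
    int then_cost;  // illustrative cycles for the "then" side
    int else_cost;  // illustrative cycles for the "else" side
};

// Modeled cycles for `if (pred[i]) ... else ...` executed by lockstep groups
// of `width` lanes. A group pays for each side that at least one lane takes.
int total_cycles(const std::vector<bool>& pred, std::size_t width,
                 BranchCost cost) {
    assert(width > 0);
    int total = 0;
    for (std::size_t start = 0; start < pred.size(); start += width) {
        const std::size_t end = std::min(start + width, pred.size());
        bool any_then = false;
        bool any_else = false;
        for (std::size_t i = start; i < end; ++i) {
            if (pred[i]) any_then = true;
            else         any_else = true;
        }
        if (any_then) total += cost.then_cost;  // some lane took "then"
        if (any_else) total += cost.else_cost;  // some lane took "else"
    }
    return total;
}
```

Under this model, with a group width of 32 and a cost of 100 per side, 64 lanes whose predicates are grouped (32 true, then 32 false) cost 200 cycles, while the same 64 lanes with alternating predicates cost 400. That is only a factor of two for one branch; the sketch does not model how nested divergent branches or unequal side costs could compound into larger losses.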
Finally, we’ll also consider how to address the above issues (some of which are temporary) in a way that treats GPGPU as just an interesting current midpoint on the road to mainstream heterogeneous computation: spreading a computational workload across available parallel processing assets, from vector units and multicore, to GPGPU and APU, to elastic cloud computing. And, unlike the approach taken by CUDA and OpenCL, our goal is to find solutions not for C but for C++, leveraging C++’s strength of strong abstractions and STL techniques while still flying close to today’s morphing metal.
April 6, 2011 at 12:52 pm
[…] This talk is related to, but different from, the GPU talk I’ll be presenting in August at C++ and Beyond 2011 (aka C&B). You can expect the above keynote to be, well, […]
April 27, 2011 at 5:05 am
Will slides etc. be posted for these talks for those of us not going to the conference?
April 28, 2011 at 9:51 am
[…] presentations on efficiently processing big data sets, taking advantage of the computational capabilities of GPUs, understanding the C++0x memory model, and more (all created just for C&B), plus the chance to […]
July 17, 2011 at 5:14 am
Hi Herb –
Really liked your introduction of C++ AMP on Channel9, and am looking forward to this session! A discomforting issue though is related to the current limitations of services running in Session 0. It would be great to create Windows Service applications that can fully leverage the power of the GPU (e.g. SQLCUDA database server, DXVA-enabled video processing service, etc.). Server applications are where I especially need this power. Perhaps you will have advice on what the future of this will be.
(Note, see the following link for typical discussion:
http://social.msdn.microsoft.com/Forums/en-US/windowsgeneraldevelopmentissues/thread/1cd6456c-9a14-4a7c-b6a3-e3e2249358e9).
July 26, 2011 at 1:01 pm
[…] you get going. So my talks on C++0x (C++ Renaissance, and How To Teach C++) and GPU programming (C++ and the GPU… and Beyond) have grown considerably longer and in-depth than I initially expected. There’s only so much […]