By Peter Varhol
RapidMind lets you take an existing application and run it on top of its runtime platform, which automatically finds and exploits opportunities for parallel execution on multicore systems.
We’re in the midst of another computing revolution akin to the advent of inexpensive PCs and the widespread use of 32-bit processors. We’re still early in the revolution, and its real benefits may be years away from being realized. But with careful attention to systems and software, engineers can achieve some of these benefits today.
To understand this revolution and the dramatic impact multicore processors have on system performance, you must look back to the origin of Moore’s Law. Named after Intel co-founder Gordon Moore, Moore’s Law loosely states that the number of transistors that can be placed on a chip doubles roughly every two years, a trend often quoted as a doubling of processor performance every 18 months.
Processor manufacturers such as Intel kept pace with this trend largely in two ways: higher clock speeds and smaller components. Faster clocks let processors cycle more quickly, and smaller components let them pack more circuits into the same space.
But neither approach could continue indefinitely. Very high clock speeds raise power consumption and bring undesirable electrical characteristics as they push processors further into the radio-frequency spectrum, while more and smaller components make heat dissipation increasingly difficult.
These limitations didn’t kill Moore’s Law, however. Microprocessors had already begun using multiple execution pipelines to make better use of each clock cycle. The next logical step was for designers to replicate not only the pipelines but also the associated core execution circuitry, making a single processor die in many ways equivalent to a multiprocessor system.
And now we all have them. This article was written on a dual-core laptop that is already three years old, and if you’re not already using a quad-core processor, your next computer will almost certainly have one. Users of high-performance computing (HPC), such as design engineers, care about multicore because of their need for speed, especially for modeling and real-time simulation.
But multicore is not the clear win it might appear to be at first glance. Neither operating systems nor application software yet take full advantage of multiple cores, and it may be a decade or more before we can fully exploit the power already available to us today.
Fortunately, systems and software vendors are helping us get there. Intel is providing debugging and profiling tools to those who write and maintain their own code, and Microsoft and the Linux community are working on better parallelism in operating systems.
Multicore processors derive their power from groups of individual cores. A core is the set of circuits and functions that provides the essential ability to execute instructions. It includes the processing pipeline, registers, arithmetic units, and private on-chip caches; in general, everything other than the peripherals and the buses that connect the cores to each other and to the world beyond the processor. The cores share a die, and may share a clock and some cache, but each core fetches and executes its own stream of instructions independently, so a single clock cycle can accomplish up to several times as much work as on a single-core processor.
Dual-core versions of Intel’s Atom deliver multicore performance in a small, low-power package for small computers or embedded systems.
Because several cores can, in principle, each do useful work on the same clock cycle, there is less need to run at high clock speeds, which reduces heat generation and power consumption. And while multiple cores typically mean a larger die, a single multicore chip results in a simpler motherboard than multiple individual processors.
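The power argument can be made concrete with the standard first-order model of dynamic (switching) power in CMOS logic, where α is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. The 80 percent figures below are illustrative assumptions, not measurements of any real processor: because a lower clock usually permits a lower supply voltage, two cores each running at 80 percent of the original frequency and voltage draw roughly the power of one full-speed core.

```latex
\begin{align*}
P_{\text{dyn}} &\approx \alpha\, C\, V^{2} f \\
\frac{P_{\text{two cores}}}{P_{\text{one core}}} &= 2 \times (0.8)^{2} \times 0.8 = 1.024 \\
\frac{\text{peak throughput}_{\text{two cores}}}{\text{throughput}_{\text{one core}}} &= 2 \times 0.8 = 1.6
\end{align*}
```

The 1.6x figure is a ceiling that holds only if the work divides perfectly between the two cores, which is precisely the software problem that follows.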
Software—The Hard Part
While it’s easy enough to incorporate multicore processors into desktop systems, writing software that takes full advantage of them is much harder. Keeping all of the cores engaged in running code from a single application is difficult and error-prone.
Writing multithreaded applications whose threads run in parallel is one of the most technically difficult activities computing professionals can attempt. Two types of bugs are especially common in multithreaded code. The first is a race condition. Suppose one thread depends on data produced by another. If the first thread needs the data before the second has produced it, the first thread fails in some manner.
Race conditions are especially difficult to debug because they seem to occur at random. Sometimes the threads complete successfully because the thread providing the data wins the “race.” Other times the program fails because the data is not yet available when the first thread needs it.
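A minimal Python sketch of one standard remedy for the ordering race just described: instead of assuming the data is already there, the consuming thread waits for an explicit signal from the producing thread, here using the standard library’s `threading.Event`. The names `Mailbox` and `run_pipeline` are hypothetical, chosen only for illustration.

```python
import threading


class Mailbox:
    """Hands a single value from a producer thread to a consumer thread.

    get() blocks until put() has stored the value, so the result no longer
    depends on which thread happens to run first. (Hypothetical class,
    written for this illustration only.)
    """

    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def put(self, value):
        self._value = value  # store the data first...
        self._ready.set()    # ...then signal that it is available

    def get(self, timeout=None):
        # Wait for the producer's signal instead of reading immediately.
        if not self._ready.wait(timeout):
            raise TimeoutError("producer never supplied the data")
        return self._value


def run_pipeline():
    box = Mailbox()
    results = []

    def consumer():
        # Depends on data produced by the other thread. Without the wait in
        # get(), it would read None and crash whenever it "won" the race.
        results.append(box.get(timeout=5) * 2)

    def producer():
        box.put(21)

    threads = [threading.Thread(target=consumer),
               threading.Thread(target=producer)]
    for t in threads:  # the consumer deliberately starts first
        t.start()
    for t in threads:
        t.join()
    return results[0]
```

Calling `run_pipeline()` returns 42 no matter how the operating system schedules the two threads, which is exactly the property the unsynchronized version lacks.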
The second type of bug is deadlock. If each of two threads requires a resource held by the other, and neither gives up what it holds, then neither thread can move forward. Because threads are typically designed to hold onto resources until they are done with them, neither thread ever completes.
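One common way to rule out the deadlock described above is to make every thread acquire shared locks in the same global order. In the hypothetical Python sketch below, two threads each need both locks but request them in opposite orders; the illustrative helper `acquire_in_order` (not a library function) sorts the locks first, so neither thread can end up holding one lock while waiting forever for the other.

```python
import threading


def acquire_in_order(*locks):
    """Acquire every lock in one global order (here, by id()), so no thread
    can hold one lock while waiting for a lock another thread holds."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(ordered):
    for lock in reversed(ordered):
        lock.release()


def run_transfers(n=10_000):
    # lock_a guards balance "a", lock_b guards balance "b"; every transfer
    # touches both balances, so it needs both locks at once.
    lock_a, lock_b = threading.Lock(), threading.Lock()
    balances = {"a": 100, "b": 100}

    def worker(first, second, amount):
        # The two threads request the locks in opposite orders: a classic
        # recipe for deadlock if each acquired them in the order given.
        for _ in range(n):
            held = acquire_in_order(first, second)
            try:
                balances["a"] -= amount
                balances["b"] += amount
            finally:
                release_all(held)

    t1 = threading.Thread(target=worker, args=(lock_a, lock_b, 1))
    t2 = threading.Thread(target=worker, args=(lock_b, lock_a, -1))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return balances
```

Sorting by `id()` is simply a convenient total order within one process; a larger system might instead assign each lock a fixed rank and check it at acquisition time.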
Because Intel has a vested interest in the success of multicore processors, it is on the verge of releasing Intel Parallel Studio, a set of plug-ins for Microsoft Visual Studio that help developers build applications with parallel components. It consists of three components: Intel Parallel Composer, Intel Parallel Inspector, and Intel Parallel Amplifier. Parallel Composer includes a parallel debugger plug-in that simplifies the debugging of parallel code and helps ensure thread correctness. Parallel Inspector detects threading and memory errors and provides guidance to help ensure application reliability. Parallel Amplifier is a performance profiler that makes it straightforward to find multicore performance bottlenecks without needing to know the processor architecture or assembly code.
Innovative third-party tools also exist to help build parallel applications. RapidMind offers a platform that lets you take single-threaded applications and run them on top of it for fully protected multithreading (you also have to make a few minor code changes).
Another company whose technology has the potential to address race conditions is Virtutech, whose Simics simulator provides reversible execution and debugging. You can also vary the speed of each core on the simulated processor to help locate the cause of a race condition.
Finally, CriticalBlue offers two products that help multicore developers create and improve parallel applications. Prism lets developers analyze code for parallel opportunities, perform what-if analyses of parallel strategies, re-code the application, verify its safe execution, and tune the result. Multicore Cascade adds cross-core software partitioning, task dependency analysis, and verification for optimizing software efficiency across architectures that mix processors with Cascade coprocessors.
None of these tools yet supports multithreaded programming across the board. RapidMind is limited to C++ (although founder and chief scientist Michael McCool tells me that future versions will support .NET), while Virtutech targets embedded system development. But tools like these are how we will take advantage of our growing processor power in the future.
Benefit to High-Performance Users
At its best, multicore processing lets a single application run multiple threads of execution at once. With a design application, for example, you can complete a design and run a simulation of it in the background while annotating the original design at the same time.
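In code, this pattern amounts to “start the long job, keep working, collect the result later.” A minimal Python sketch using the standard `concurrent.futures` module follows; `run_simulation` and `design_session` are hypothetical stand-ins for a real design tool’s solver and user session. Note that in standard CPython builds, threads running pure-Python, CPU-bound code do not execute on separate cores simultaneously because of the global interpreter lock; `ProcessPoolExecutor`, which offers the same `submit`/`result` interface, is one way to move such a simulation onto another core.

```python
from concurrent.futures import ThreadPoolExecutor


def run_simulation(steps):
    """Hypothetical stand-in for a long-running simulation: explicit-Euler
    integration of a quantity decaying toward zero."""
    x, dt = 1.0, 0.001
    for _ in range(steps):
        x -= x * dt
    return x


def design_session():
    annotations = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Launch the simulation in the background...
        future = pool.submit(run_simulation, 200_000)
        # ...while the foreground keeps annotating the design.
        for note in ("label inlet", "dimension flange", "add tolerance"):
            annotations.append(note)
        # Block only at the point where the simulation result is needed.
        result = future.result()
    return annotations, result
```

The key design choice is that `future.result()` is called as late as possible, so the foreground work never waits on the simulation until it actually needs the answer.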
At a higher level, multicore processors will also let engineers and other HPC users run two active applications simultaneously. Today we often have multiple applications open but work in only one at a time. In the future, multiple processor cores will let us run two or more applications at once, with each actively executing.
What multicore processors won’t do is help you get a single, inherently sequential task done more quickly. If your application’s thread has dependencies throughout, or your work involves heavy user interaction with a single application, you will likely see little or no improvement in perceived performance.
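This limit is captured by Amdahl’s law: if only a fraction p of a task can run in parallel, n cores can speed it up by at most 1 / ((1 − p) + p/n), because the serial portion takes as long as it ever did. A minimal Python sketch, with a hypothetical function name:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup from Amdahl's law. The serial part of the
    work (1 - p) runs at the original speed; only the parallel part p is
    divided among the cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)
```

For a task that is half serial, `amdahl_speedup(0.5, 4)` gives 1.6, and no number of cores can push it past 2x; a task with no parallel portion gets exactly 1.0, no speedup at all.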
Better use of multiple processor cores will arrive gradually, in fits and starts, so most of the performance and productivity improvements of multicore processors are still down the road. But some design software vendors are delivering the parallel goods today. Check with your software vendor: if it supports parallel operation, upgrade to multicore machines immediately. If not, follow its product path, and time your hardware upgrades to coincide with the availability of software that can take advantage of multiple cores. But be prepared for an explosion of multicore applications sometime in the future.
Peter Varhol has been involved with software development and systems management for years. You can send him comments about this article via e-mail to DE-Editors@deskeng.com.