Sooner or later, advanced CAE software users will have to face the invasion of the zombie desktop computers. It’s a common anxiety among those who rely on underpowered desktop computers to run complex simulation jobs. Fully engaged in number crunching, the computers come to almost a standstill, not quite dead but barely alive.
In a recent survey DE conducted on behalf of IBM Platform Computing, 39% of more than 1,000 respondents stated they’re running their applications exclusively on workstations. Only 11% are running their programs exclusively on clusters. The same participants also reported running a variety of simulations: complete systems (60%), large assemblies and models (58%), and parts and small assemblies (52%). The disparity between the volume of simulation executed and the number of clusters involved suggests a significant segment of the participants may have accepted the zombie workstation syndrome as a fact of life.
Two easy methods will prevent the return of the zombies:
1. Set up an in-house cluster to tackle the CAE jobs and free up the workstations; or
2. Offload them to a remote cluster or on-demand cloud computing vendor.
Our chat with server experts, custom cluster vendors and cloud computing pioneers sheds light on the sometimes not-so-obvious choices you must make with each option.
Open Source or Commercial Code?
In the survey, respondents listed what they felt were barriers to cluster deployment. The top ones include lack of budget for hardware and software (47%), lack of skilled IT staff (31%), and uncertainty associated with migrating desktop applications to the cluster environment (24%).
Budget concern, especially among smaller and midsized firms, may nudge high-performance computing (HPC) buyers toward open-source software. After paying for the hardware, the open-source cluster management software’s $0 price tag is attractive.
“We’re big proponents of open-source cluster management for general HPC and select simulation cluster users,” notes Brett Newman, Microway’s HPC sales and marketing specialist. “We develop our Microway Cluster Management Suite package based upon Ganglia. It’s robust software, and it’s free with Microway clusters.”
But Nick Werstiuk, a product line executive with IBM Platform Computing, points out that open-source software is not exactly turnkey: “Open source generally requires [users] to pull together a variety of software, which they must integrate or support on their own. The expertise you need to integrate or pull together these solutions is something you need to consider.”
Whereas cluster-management software exists both as open source and commercial code, the job-scheduling option may be best confined to commercial solutions sanctioned by the simulation software vendor. “ANSYS, for example, supports only Platform LSF, PBS Professional, and Windows HPC as third-party schedulers,” Microway’s Newman says, adding that some open-source schedulers, such as Torque or Grid Engine, “aren’t wise choices for ANSYS users.”
The Emergence of On-demand Clusters
Software-as-a-Service (SaaS) was yesterday’s discussion. Today, the heated debate revolves around Infrastructure-as-a-Service (IaaS) or Platform-as-a-Service (PaaS). The emerging market is served by vendors who offer hardware resources augmented with cluster- and queue-management software as an integrated bundle to process simulation jobs. Like SaaS vendors, IaaS and PaaS vendors offer their products as on-demand solutions, remotely accessible to subscribers who pay a recurring fee.
Respondents listed lack of budget for hardware and software as the top barrier to cluster adoption, followed by the lack of skilled IT staff to manage the cluster.
Only 18% of respondents stated they are familiar with the benefits of HPC.
Rescale, headed by former Boeing structural and software engineer Joris Poort, is among the new crop of vendors targeting the CAE market with PaaS offerings. The company certainly gets a fair amount of business from those who have no desire to invest in and maintain HPC hardware onsite. But Rescale has also found a new type of simulation user: those seeking scalability.
“Most of our customers actually have on-premise HPC clusters. Some of them say the reason they came to Rescale was because their businesses tend to peak and drop, but their onsite HPC capacity is flat,” says Sunny Manivannan, Rescale’s vice president of business development. “When their demand goes over the supply, they want to be able to tap into Rescale’s infrastructure.”
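The peak-and-drop pattern Manivannan describes can be put into rough numbers. The sketch below is only an illustration: the weekly demand figures and the onsite capacity are hypothetical values chosen for the example, not survey or Rescale data. It computes how many core-hours would spill over a flat onsite capacity in each period, and how well the onsite cluster is used overall.

```python
def overflow_core_hours(demand, capacity):
    """Core-hours per period that exceed a flat onsite capacity."""
    return [max(0, d - capacity) for d in demand]


def onsite_utilization(demand, capacity):
    """Fraction of the onsite capacity actually used across all periods."""
    used = sum(min(d, capacity) for d in demand)
    return used / (capacity * len(demand))


# Hypothetical weekly demand in core-hours (assumed values for illustration).
weekly_demand = [800, 1200, 2500, 900, 3100, 1000]
# Hypothetical flat onsite capacity in core-hours per week.
onsite_capacity = 1500

burst = overflow_core_hours(weekly_demand, onsite_capacity)
total_burst = sum(burst)
utilization = onsite_utilization(weekly_demand, onsite_capacity)

print("Overflow per week:", burst)
print("Total overflow core-hours:", total_burst)
print(f"Onsite utilization: {utilization:.0%}")
```

With these assumed figures, two of the six weeks exceed the onsite capacity while the cluster sits partly idle the rest of the time, which is the situation where renting on-demand capacity for the peaks may cost less than sizing the onsite cluster for the worst week.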
It’s not just startups that are dipping their toes into the uncharted waters, however. Altair, an established name in simulation, has repackaged its CAE platform products, HyperWorks, as a cloud offering. Altair customers purchase pools of HyperWorks Units, which function like tokens, to get access to select Altair software titles. With HyperWorks On Demand (HWOD), customers may also use these units to pay for access to Altair’s computing resources, available on-demand. Based on Altair’s PBS Works suite, the integrated software lets you remotely submit, monitor and manage jobs from the web.
Linux or Windows?
The choice for cluster management software may also be dictated by the preferred cluster operating environment. Most workstation versions of CAD and CAE software are developed for Windows OS. This may lead first-time cluster buyers to conclude that supplementing the pre-existing Windows workstations with a Windows cluster makes the most sense. That, however, may not be the best way to decide the cluster’s operating system.
“Some simulation software packages support more features in their Windows or Linux versions,” says Microway’s Newman. “We strongly recommend that customers consult their hardware and simulation software vendors to assess what’s best for their needs.”
“[Windows HPC Server] offers robust support for Windows hardware and applications, but not for Linux applications,” notes IBM’s Werstiuk. “A lot of these CAE applications do have a large Linux footprint, and clients want the ability to run those on Linux environment.”
Rescale’s own cluster setup offers additional clues. “When we started, we were pure Linux. Now we’re looking at Windows options, too, because some small proprietary codes are written for Windows,” says Manivannan. “Big commercial codes are written for both Linux and Windows, with occasional differences in features based on OS. Linux is more widely accepted as the standard for larger batch jobs.”
Balancing Core and Node Counts with Interconnect Speed
If you throw in an excessive number of server units (computing nodes) in the cluster, the hardware suffers from increased chatter — the back-and-forth communication among the computing nodes to coordinate and attack your simulation job. If your interconnects, the pipelines that join the computing nodes, are not up to par, they could become a performance bottleneck.
“Purchasing a very small cluster of 2P [dual CPU] servers to run modest jobs can sometimes be a poor choice,” says Newman, offering 32 total cores as an example. “There’s latency involved in the internode communication. Small clusters are most likely to rely on Gigabit Ethernet with its much higher latency as well.
“CAE software users might have much faster results by forgoing a tiny cluster and purchasing a single big node with 32 cores instead,” he says. “The entire model might even fit into memory.”
For big CAE jobs, Newman says, there are strong benefits to choosing a fast interconnect: “CAE users want ample bandwidth to transmit job data with as little latency as possible. At up to 56Gb per second and less-than-1 microsecond latency, InfiniBand allows for the strongest performance scaling.”
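To see why latency and bandwidth both matter, a common first-order estimate treats the time to send one message between nodes as a fixed latency plus the message size divided by the bandwidth. The sketch below applies that simple model. The InfiniBand figures use the nominal 56 gigabits per second and 1 microsecond mentioned above; the Gigabit Ethernet latency of 50 microseconds is an assumed round number for illustration, not a measurement. Real interconnects deliver somewhat less than their nominal rates.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bits_per_s):
    """First-order estimate: fixed latency plus serialization time."""
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s


# Assumed link parameters (illustrative, not benchmark results).
GIGABIT_ETHERNET = {"latency_s": 50e-6, "bandwidth_bits_per_s": 1e9}
INFINIBAND = {"latency_s": 1e-6, "bandwidth_bits_per_s": 56e9}

# A small message (1 KiB) and a larger one (1 MiB).
for size in (1024, 1024 * 1024):
    t_eth = transfer_time(size, **GIGABIT_ETHERNET)
    t_ib = transfer_time(size, **INFINIBAND)
    print(
        f"{size:>8} bytes: Ethernet {t_eth * 1e6:9.1f} us, "
        f"InfiniBand {t_ib * 1e6:7.1f} us, ratio {t_eth / t_ib:5.1f}x"
    )
```

Under these assumptions, small messages are dominated by latency and large ones by bandwidth, so a solver that exchanges many small messages per iteration feels the slower link in both regimes. Inside a single large node there is no network hop at all, which is the reasoning behind Newman's single-node suggestion.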
The GPU Grid Defense
Graphics processing unit (GPU) maker NVIDIA offers another way to keep the zombies away: Put your machines on a grid. Launched officially during NVIDIA GPU Tech Conference 2013, the hardware premiered as the NVIDIA Grid visual computing appliance (VCA). Essentially, it’s a GPU-powered cluster, capable of supporting eight to 16 users.
With this approach, users may use inexpensive laptops and consumer desktops as client devices to remotely access powerful virtual machines, each with its own designated virtual GPU. The virtual GPU will make a difference for those who routinely engage in real-time visualization or GPU-accelerated simulation.
Dawn of the Living
Complex CAE jobs will almost always test the limits of individual workstations. When these jobs are running, the burden on the CPU is significant. The ever-increasing size and complexity of digital models, a consequence of manufacturers’ increased reliance on digital simulation, suggest that CAE clusters, both offsite and onsite, are bound to become a way of life for designers and engineers for the foreseeable future.
Like the backend IT, the front-end terminals are evolving, too. In the past, most CAD and CAE users had to be physically seated before their desktops, as they were the only machines powerful enough to perform the tasks. The emergence of remote servers, PC over IP, virtualization and cloud computing cuts the umbilical cord, allowing design software users to perform simulation from a mobile tablet, a phone or an ordinary laptop from a Wi-Fi-enabled café or an airport lounge.
As zombie computers face their twilight, the formerly desk-bound professionals get a chance to operate beyond the cubicles — in sunshine and daylight.
Free White Paper
IBM Platform Computing and Desktop Engineering have produced “5 Easy Steps to a High Performance Cluster,” a white paper that is available as a free download here.