
Pushing Clusters Further for CAE

By Stan Posey

Ask design engineers and CAE analysts about important computing trends, and clusters will emerge as a topic of conversation before practically anything else. In a relatively short time, clusters have become nearly ubiquitous in design and R&D facilities. And they continue to propagate, with analyst firm IDC projecting a 25 percent compound annual growth rate in cluster sales through 2006.

That growth has been largely fueled by the low initial cost of conventional cluster systems, whose commodity technology components have led to their being branded as "white box" clusters. By leveraging the buyer-friendly economies of scale that come with today’s plain-vanilla clusters, engineering organizations have found a relatively easy way to outwardly expand computing capacity by adding dual-processor nodes and connecting them with inexpensive interconnect technologies.


The Silicon Graphics Prism deskside system is an extension of the Silicon Graphics Prism family. SGI describes its deskside system as the bridge between traditional workstations and rackmount Silicon Graphics Prism visualization systems, bringing large memory and extended I/O capabilities to users at a lower price point. Image courtesy of SGI.

The good news is that these resources have done a reasonable job of running CAE applications that solve problems by partitioning workloads across individual locations of memory that usually amount to no more than 2GB per node. These programs include FEA (finite element analysis) applications for structural impact analysis and nearly all CFD (computational fluid dynamics) software. In general, codes whose algorithms don’t demand high rates of data exchange between nodes—and whose partition scheme is based on a domain decomposition for load balance—should scale well on white box clusters. This is because they place the data associated with each thread on the node where it will be executed, thus parsing and distributing the data load across available memory.
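The partitioning idea behind that scaling is easy to sketch. The Python fragment below is a minimal illustration, not any production solver's code: it splits a mesh's cells into near-equal ranges, one per node, so each node holds only its own data. Real CFD and FEA codes use graph partitioners such as METIS and exchange boundary data via MPI, but the load-balance arithmetic is the same in spirit.

```python
# Minimal sketch of domain decomposition for load balance:
# divide a mesh's cells into near-equal partitions, one per
# cluster node. Illustrative only.

def decompose(n_cells, n_nodes):
    """Return (start, end) cell ranges, one per node."""
    base, extra = divmod(n_cells, n_nodes)
    ranges, start = [], 0
    for rank in range(n_nodes):
        size = base + (1 if rank < extra else 0)  # spread the remainder
        ranges.append((start, start + size))
        start += size
    return ranges

if __name__ == "__main__":
    # e.g., 1,000,003 cells spread across 8 dual-processor nodes
    parts = decompose(1_000_003, 8)
    sizes = [end - start for start, end in parts]
    print(sizes)       # partition sizes differ by at most one cell
    print(sum(sizes))  # every cell assigned exactly once
```

Because each node's partition fits in its local memory, the data load is parsed and distributed exactly as the paragraph above describes.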

The bad news is that design, analysis, and manufacturing environments do not always rely on just one kind of application. Instead, they often run a mix of software, some of which—like most FEA implicit structural analysis applications—simply cannot be considered cluster-production-ready. These codes generally lack the ability to efficiently distribute models and workloads across memory partitions. What’s more, HPC (high-performance computing) application productivity isn’t solely tied to parallel capability, but is also closely related to file I/O throughput. And conventional capacity-centered computing clusters just can’t offer a high-performance I/O solution.

These systems simply cannot meet all application objectives for organizations that are working to reduce cost of cluster ownership, reverse the "server sprawl" spawned by proliferation of single- and dual-processor cluster nodes, and achieve more flexible resource management. The answer lies in a more versatile, balanced cluster architecture.

Capacity + Capability = Flexibility

Three CAE disciplines—implicit and explicit FEA for structural analysis and CFD for fluid flow simulation—exhibit the kind of resource demands that highlight the importance of a balanced HPC system architecture. A balanced architecture would combine traditional capacity-centered computing with the kind of capability-oriented computing architectures that in recent years has marked the design of the world’s fastest servers and supercomputers. The result is a dramatically more flexible system capable of deriving excellent performance from applications suited for both distributed-memory and shared-memory architectures. By serving as platforms for multiple types of CAE application software, these balanced cluster architectures make it easier to reduce product design-cycle time and costs while improving design quality.

Achieving a genuine balanced approach to cluster computing requires certain technology components, chief among them: high-speed processors with large cache, large addressable memory, high memory-to-processor bandwidth rates, high disk-to-memory I/O rates, and a low-latency interconnect that provides efficient parallel scalability to hundreds of processors. In general terms, CFD can scale efficiently to hundreds of CPUs, explicit FEA can scale to more than 50 CPUs, and implicit FEA can scale to up to eight CPUs. The challenge is to ensure that these codes scale efficiently, with as close to a linear performance speed-up as possible, while maximizing file I/O performance and making use of available memory efficiently and effectively.
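The scaling limits above can be made concrete with Amdahl's law, which bounds parallel speedup by a code's serial fraction. The parallel fractions in this Python sketch are illustrative assumptions chosen to mirror the rough limits cited for each discipline, not measured values for any particular application.

```python
# Amdahl's-law sketch of why implicit FEA (significant serial
# solver phases) tops out at a handful of CPUs while CFD
# (almost fully parallel) scales to hundreds. The parallel
# fractions below are assumptions for illustration.

def speedup(parallel_fraction, n_cpus):
    """Ideal speedup on n_cpus under Amdahl's law."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

for name, p, n in [("implicit FEA", 0.90, 8),
                   ("explicit FEA", 0.99, 64),
                   ("CFD",          0.999, 256)]:
    s = speedup(p, n)
    print(f"{name:12s} {n:3d} CPUs -> speedup {s:6.1f} "
          f"(efficiency {s / n:.0%})")
```

Even a 10 percent serial fraction caps speedup below 5x on eight CPUs, which is why near-linear scaling demands both highly parallel algorithms and the balanced memory and I/O resources described above.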


The SGI Altix 1350 is a factory-integrated cluster that matches performance and functionality of SGI’s flagship Altix systems at a reduced total cost of ownership. The ability of the SGI Altix 1350 clusters to scale at the node rather than by adding new nodes for increased computing needs means that users have fewer nodes to connect, manage and provision. Image courtesy of Silicon Graphics, Inc.

Never has this goal been more important. Every day, engineers and analysts face a veritable explosion in the size of models, number of simulations, and the amount of results data. Everyone demands increased realism in digital prototypes, which, in turn, increases the burden on an application’s ability to cut time-to-solution, a key metric of success in a production environment.

Consider the case of one auto industry supplier that designs and manufactures parts and components for the world’s largest vehicle manufacturers. In just the past two years, its CAE engineers report that axle and drive-train model sizes have grown 400%. Such models put an ever-greater burden on conventional server architectures, leading to memory limitations that can degrade performance and create productivity bottlenecks.

However, recent developments in clustering technology addressing these concerns mean that the rapidly deepening data glut need not create nightmares for engineers. A new generation of Linux OS-based clusters made of large shared-memory nodes provides the best of both worlds, accommodating both parallel and throughput-intensive CAE jobs. Design engineering and analysis groups worldwide already have begun implementation of clusters that handily deliver generous shared-memory parallel resources "on-node," and provide scalable capabilities "off-node."  These systems not only scale CPUs just as white box clusters do, but they also independently scale memory and I/O like the high-performance supercomputers that continually shatter records in CFD and FEA performance.

One Cluster Environment, Multiple Applications

For engineers, the problem has been that CFD applications distribute simulations across CPUs on multiple nodes, while FEA structural analysis tasks run within a single large-node, shared-memory environment. The emerging "hybrid cluster" approach allows multiple applications to run in a single-system environment that does not force engineers to choose between the single-discipline considerations of CFD or FEA tasks. Thus, engineers throughout an organization can accomplish both—and they do not need multiple clusters to get it done. Additionally, they still can rely on open-source software and industry-standard components to ensure cost-effectiveness.

Hybrid cluster systems combine distributed- and shared-memory parallel deployment of a high-throughput system architecture, resulting in single nodes that can scale to up to 32 processors and 384GB of memory. With the ability to leverage a shared-memory architecture, engineers can use hybrid clusters to conduct memory-resident analyses with the data as a complete entity, without breaking it into smaller partitions to be handled by individual processors. This spares engineers and programmers from having to devote time to developing rules for dividing data into smaller sets, a common requirement in distributed-memory parallel applications.
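A back-of-the-envelope check shows why the large shared-memory node matters for memory-resident analysis. In this Python sketch the bytes-per-degree-of-freedom figure is an assumed round number for illustration, not a vendor specification or a measured value.

```python
# Sketch: can a model stay memory-resident as a single entity
# on one node, or must it be partitioned? BYTES_PER_DOF is an
# assumed working-set cost per degree of freedom, for
# illustration only.

NODE_MEM_GB = {"white-box node": 2, "hybrid shared-memory node": 384}
BYTES_PER_DOF = 5_000  # assumption: solver working set per DOF

def fits(n_dof, node_gb):
    """True if the whole model's working set fits in one node."""
    return n_dof * BYTES_PER_DOF <= node_gb * 1024**3

model_dof = 20_000_000  # a large implicit FEA model (assumed size)
for node, gb in NODE_MEM_GB.items():
    verdict = "memory-resident" if fits(model_dof, gb) else "must partition"
    print(f"{node:26s} ({gb:3d} GB): {verdict}")
```

Under these assumptions the model's roughly 93GB working set overwhelms a 2GB node but occupies only a fraction of a 384GB shared-memory node, so the analysis can proceed without any partitioning rules at all.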

Hybrid cluster systems also answer mounting demands for I/O bandwidth by incorporating high-speed interconnect technology capable of transferring data at 6.4GB per second. Further, by scaling at the node instead of adding new nodes every time your computing needs increase, you spend less on server administration and interconnect fabric because you have fewer nodes to connect, manage, and provision.
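To put the 6.4GB-per-second figure in perspective, this sketch compares transfer times for a hypothetical results set. The results-set size and the gigabit-Ethernet comparison rate are assumptions, and both rates ignore protocol overhead.

```python
# Sketch: time to move a results data set over the interconnect.
# 6.4 GB/s is the rate cited above; 0.125 GB/s (gigabit
# Ethernet) is an assumed comparison point. Overheads ignored.

def transfer_seconds(size_gb, rate_gb_per_s):
    """Idealized transfer time for size_gb at the given rate."""
    return size_gb / rate_gb_per_s

results_gb = 50  # assumed transient CFD results set
for name, rate in [("hybrid interconnect", 6.4),
                   ("gigabit Ethernet", 0.125)]:
    print(f"{name:20s}: {transfer_seconds(results_gb, rate):7.1f} s")
```

Even in this idealized comparison, the difference is seconds versus minutes per data set, which compounds quickly across the many simulations a production group runs each day.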

Hybrid cluster systems offer a natural bridge to "next level" solutions, including powerful Linux visualization systems and immersive environments, such as the six-sided CAVE displays used for interactive design reviews and collaboration.

Hybrid clusters represent positive news for users. But the news is just as compelling for managers with their eye on the bottom line. With fewer systems, workgroups can generate solutions faster, and the cost of cluster ownership drops dramatically with fewer expenses tied to interconnect fabric and administration.

Clearly, clusters have begun to reach far beyond the limitations of the plain white box. For CAE engineers and analysts, that’s a paradigm shift worth talking about.

Stan Posey manages HPC applications and industry development strategy for SGI's North American Field Organization. He holds a master's degree in mechanical engineering from the University of Tennessee-Knoxville.


  Faster, Better, Now

The bold vision of balanced architectures in computing systems may be relatively new to the world of white box clusters, but systems capable of driving multiple types of applications have existed for years—primarily in the UNIX marketplace.  Now, Linux users are getting their hands on them as well.

Servers like SGI Altix have shown that open-source software need not be relegated to commodity platforms. Launched in 2003, Altix servers have consistently dominated real-world application tests while dispelling long-held myths about Linux system scalability. A single Altix node now can support 512 processors, a feat thought impossible not long ago. Recently, Altix technology became available in preconfigured cluster solutions designed for easy deployability and management.



The Silicon Graphics Altix 350 system is a midrange Linux OS-based server capable of scaling up to 16 processors utilizing a global shared-memory architecture.

The SGI shared-memory architecture also benefits Linux visualization systems via the Silicon Graphics Prism family of scalable visualization systems. These systems allow engineers to visualize and interact with their entire CAE simulation data sets at high resolution and zoom in to see details of any particular area of interest. Consequently, an engineer can more rapidly understand specific results, and cross-functional teams can engage in collaborative analysis, posing interactive what-if questions as initial insights emerge from the data.

SGI Altix and Silicon Graphics Prism systems run more than 200 CAE applications, including ABAQUS, ADINA, ANSYS, ANSYS CFX, CFD++, EnSight, FIELDVIEW, FLUENT, LS-DYNA, MSC.Nastran, NX Nastran, PAM-CRASH, RADIOSS, and STAR-CD. Based on 64-bit Linux and Intel Itanium 2 processors, these systems are also compatible with all applications adhering to these open standards.  Furthermore, with Transitive QuickTransit dynamic binary translation technology, IRIX/MIPS applications run transparently on the Silicon Graphics Prism platform without any source code or binary changes.

With new systems like these entering the marketplace, imagining what-if is easier than ever.—SP

Product Information

Silicon Graphics, Inc.
Mountain View, CA
