Not to spoil it for you, but the ultimate answer to the Great Question of Life, the Universe and Everything is 42, according to Douglas Adams’ Hitchhiker’s Guide to the Galaxy. The answer is a great illustration of the need to frame the right questions to get the answers you need from the deluge of engineering data. Image courtesy of Del Rey.
Many simulation technologies have been restricted until now by the granularity of the solution that can be employed. For example, in computational fluid dynamics (CFD), desired accuracy has been compromised by the physical size of the mesh and the time steps achievable.
To capture phenomena such as chaotic vortex patterns in highly turbulent flow demands a very high level of fidelity. Computing power has been a limiting factor in achieving the extreme level of simulation required in this case. It may be several years before sufficiently powerful computing is available to achieve the required granularity for the average user. The important point is that the baseline we are seeking is an accurate high-fidelity representation of the physics involved.
Other simulation areas in a similar position are combustion technology and climate modeling. Indeed, it could be argued that climate modeling is so chaotic that the simulation process will never achieve adequate fidelity. Many claim that weather forecasting will always run at least a day behind!
Structural analysis at a macro level, however, is not usually that chaotic. We are able to predict the response of components to many loading phenomena very accurately with a modest mesh density. In other words, we have reached the ideal point: sufficient model fidelity is easily achievable with average computing resources.
Most static, dynamic and mildly nonlinear analyses fall into this category. If we move beyond this into impact, low cycle fatigue and micro-mechanical response of composites, then we have not reached that required fidelity using typical computing resource. However, this discussion relates to the more mundane, but vital categories of analysis that most of us are involved with from day-to-day.
Over the past five years, I’ve watched with great interest as basic structural finite element analysis (FEA) has moved to an ever-greater element count for quite simple models. Much of this is premature, given the state of FEA pre- and post-processing technology. My fundamental questions are: What is driving analysts to this “big data” trend, and how can we avoid it until we are really ready for it?
Making Big Data
There is an enormous excitement in many areas over the concept and promise of big data. Many feel that there is a virtual explosion of big data about to hit us. The essence of the paradigm is that dramatically increasing the fidelity, scope and interchangeability of data will allow higher levels of productivity and competitiveness in industry as we understand better what that data is telling us.
Mainstream examples of data captured at ever-faster rates include:
- Indirect and direct tracking of human opinions, habits and responses on the Internet;
- GPS data linked to activities on smartphones and other mobile devices; and
- Sensor data fed to the central computer on modern cars.
The McKinsey Global Institute (MGI) paper1 on big data predicts that we will all be immersed in its creation and influence over the next few years. The researchers cite five main areas where industries will benefit:
- Creating exhaustive data sets will mean that all data is readily available and less time will be wasted in searching for it. The data will be more naturally integrated between R&D and manufacturing, for example.
- Increased experimentation and variability in simulation models, combined with efficient data mining, will allow increases in product performance.
- Increasing the granularity of research or simulation data will improve the modeling by allowing greater accuracy and better targeting. This includes anything from FEA to consumer marketing.
- Replacing and/or supporting human decision-making by automated risk engines will improve product performance, as is done in financial and insurance circles now.
- New business or operational models will emerge, such as assessing driver insurance risk by monitoring actual behavior.
Now some of these sound downright scary, but with a bit of imagination we can see many of these points resonating in a design, analysis and production environment. I will talk about some FEA areas where I think we can see real benefits.
For this explosion of data to mean anything, of course, we need technologies that can actually deal with its volume and interpretation. How and where do we store it? How do we retrieve it, and how do we figure out what it is telling us?
The MGI paper has a very important section on this. It emphasizes that disciplines such as ours that consist of a large amount of legacy data and disparate systems will need the development of new and innovative systems to deal with big data.
That really is the basis of my argument: I feel we are nowhere near having these types of systems in place in a general sense. There is a strong growth in the areas of simulation data management (SDM) and simulation process data management (SPDM), and that is bringing big benefits. However, that is not the full story, as it is essentially managing the data flow. What is missing is the technology to be able to interpret what that data means — or as the MGI paper puts it, the ability to “integrate, analyze, visualize and consume the growing torrent of big data.”
Without additional interpretation of the results, what is the point of increasing the fidelity of an FEA mesh? We converge to an accurate stress solution at a very specific mesh density; beyond that, we are just wasting computing power and our own time. If the increased fidelity can actually tell us something more useful, that’s great. But we don’t have tools to enable that as yet. I will discuss later what these tools might be.
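The convergence argument above can be sketched in a few lines of code. This is a toy illustration, not a real solver: the hypothetical `solve()` function stands in for an FEA run and simply mimics a peak-stress estimate that approaches an assumed exact value of 200 MPa as the mesh refines. The point is the stopping criterion, not the physics.

```python
# Toy mesh-convergence check: refine until the peak stress stops changing.
# solve() is a hypothetical stand-in for a real FEA run; it mimics an
# estimate that approaches an assumed exact stress of 200 MPa.

def solve(num_elements):
    """Hypothetical FEA run: returns peak stress (MPa) for a given mesh size."""
    return 200.0 * (1.0 - 1.0 / num_elements)  # error shrinks with refinement

def converged_stress(tolerance=0.01, start=10):
    n = start
    previous = solve(n)
    while True:
        n *= 2                      # double the element count each pass
        current = solve(n)
        if abs(current - previous) / abs(current) < tolerance:
            return n, current       # further refinement is wasted effort
        previous = current

elements, stress = converged_stress()
print(f"Converged at {elements} elements, peak stress ~{stress:.1f} MPa")
```

Once the relative change drops below the tolerance, every extra element beyond that mesh density buys nothing; that is the "very specific mesh density" the paragraph refers to.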
Our Future Role with Big Data
Terabytes Terrify Me and Petabytes Petrify Me!
To get a handle on the size of data we are discussing, the amount of data stored in the US Library of Congress is sometimes used as a yardstick. As of 2007, the written works collection was estimated at 10 terabytes. The current storage of all US academic research libraries runs at 2 petabytes. Storing all the words ever spoken in the world is commonly assessed at 5 exabytes. There is no warranty on the accuracy of these numbers, but they are indicative!
To remind ourselves:
A 4TB external hard drive has a low-end street price of $200. You can pay much more for a faster data transfer rate, but it clearly shows the trend in data storage for the average user.
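The yardsticks above are easy to put in hard-drive terms. A quick back-of-the-envelope calculation, using the sidebar's own figures and decimal units (1 TB = 10^12 bytes):

```python
import math

# Rough yardsticks from the sidebar, in decimal units (1 TB = 10**12 bytes).
TB, PB, EB = 10**12, 10**15, 10**18

collections = {
    "Library of Congress written works (2007)": 10 * TB,
    "All US academic research libraries": 2 * PB,
    "All words ever spoken": 5 * EB,
}

DRIVE_BYTES = 4 * TB      # the $200, 4TB external drive mentioned above
DRIVE_PRICE = 200

for name, size in collections.items():
    drives = math.ceil(size / DRIVE_BYTES)
    print(f"{name}: {drives:,} drives (~${drives * DRIVE_PRICE:,})")
```

Three drives hold the Library of Congress; holding all the words ever spoken would take 1.25 million of them, which gives a feel for why petabyte-scale FEA output is a storage problem, not just a post-processing one.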
The MGI paper defines three categories of people who will be involved with Big Data, and whose skills will be increasingly needed:
- Deep Analytical
- Big Data Savvy
- Supporting Technology
After looking carefully at the MGI data matrix that supports this, I came to the conclusion that most of us fit the “Big Data Savvy” occupation described there as “engineer.” That requires us to have some knowledge of data technology, but the vital ingredient is the ability to “define key questions data can answer.” If you remember The Hitchhiker’s Guide to the Galaxy, the answer to the Great Question is given2:
“All right,” said Deep Thought. “The Answer to the Great Question…”
“Of Life, the Universe and Everything…” said Deep Thought.
“Is…” said Deep Thought, and paused.
“Forty-two,” said Deep Thought, with infinite majesty and calm.
The quote really sums it up well — how to pose the right question and not to just accept a deluge of data as being the “answer.”
Let’s Go Green
At an FEA conference two years ago, I first heard the term petabyte used in relation to data size. A petabyte is 1,000 terabytes. That was the volume of data being considered for a fairly average analysis.
I don’t actually know of a post-processor that can handle output of that quantity. So we have a disconnect: careless meshing or idealization will generate huge numbers of elements, and solvers are now reaching the stage where they can generate correspondingly huge volumes of result data with average computing resource. However, we can’t handle that amount of data, and in many cases it adds nothing to our understanding of the results.
Let’s keep our model size down. Don’t fill up the computing resource just because it’s there. Think of the challenge as a green element footprint instead of a green carbon footprint!
If we keep model size modest for “normal” analyses, it opens up the possibility of running more analyses rather than bigger analyses. This means we can follow a couple of interesting paths:
- Optimization: Modern optimization techniques have an inexhaustible appetite for analyses. To be most efficient and effective, genetic algorithm, artificial intelligence (AI) and design of experiments (DOE) methods really need analysis counts in the thousands. Software capable of spawning FEA analysis variants, retrieving the data and presenting it in a meaningful way is quite mature. The best visualization tools let the engineer understand why a particular configuration has evolved and what the key drivers are, enabling a direct link to practical design insights. This is the best example I see today of the use of big data in FEA.
- Stochastics: The very term “stochastics” scares most engineers; it certainly scared me until a few years ago. But all it really means is introducing variability into the modeling. Much traditional focus is on variations in material properties and the like. I am more interested, however, in the idea of automatically generating a range of mesh densities, boundary conditions and loading conditions to see how robust a particular solution is. I think this has tremendous potential in supplementing, to some extent, the skill and experience needed in FEA. The ability to assess the risk of an inaccurate analysis dovetails nicely with some of the risk assessment models described in the MGI paper. The data mining tools needed to explore this type of result are quite mature, and this is where I see a place for expanding big data in FEA right now.
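A DOE-style run of the kind the optimization bullet describes can be sketched very simply. Everything here is hypothetical: `evaluate()` stands in for dispatching one FEA variant and retrieving a single response, and the two design variables (thickness and fillet radius) and the objective are invented for illustration. The pattern, spawning every variant and then mining the results for the best one, is the real content.

```python
import itertools

# Minimal full-factorial DOE sketch: spawn analysis variants over two design
# variables and mine the results for the best configuration. evaluate() is a
# hypothetical stand-in for running one FEA variant and retrieving a response.

def evaluate(thickness_mm, fillet_mm):
    """Invented objective: mass penalty plus a stress penalty (lower is better)."""
    mass = thickness_mm * 2.0
    stress = 500.0 / (thickness_mm * (1.0 + 0.1 * fillet_mm))
    return mass + stress

thicknesses = [2.0, 3.0, 4.0, 5.0]   # candidate values for each design variable
fillets = [1.0, 2.0, 4.0]

# Full factorial: one "analysis" per combination of design-variable values.
results = {(t, f): evaluate(t, f)
           for t, f in itertools.product(thicknesses, fillets)}

best = min(results, key=results.get)
print(f"Best variant: thickness={best[0]} mm, fillet={best[1]} mm, "
      f"objective={results[best]:.1f}")
```

A real study replaces the dictionary comprehension with job submission to a solver and swaps the full factorial for a GA or Latin hypercube once the variable count grows, but the spawn-collect-mine loop is the same.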
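The stochastic robustness idea in the second bullet amounts to a small Monte Carlo loop: rerun the analysis with randomly perturbed inputs and look at the spread of the answer. In this sketch `solve()` is again a hypothetical stand-in for an FEA run, with an invented mesh-dependent error term; the nominal 10 kN load and the candidate mesh densities are assumptions for illustration only.

```python
import random
import statistics

# Sketch of a stochastic robustness check: rerun a stand-in analysis with
# randomly perturbed load and mesh density, then examine the scatter of the
# result. A wide scatter flags a solution sensitive to modeling choices.

random.seed(42)  # reproducible run

def solve(load_kn, num_elements):
    """Hypothetical FEA run: peak stress with an invented mesh-dependent error."""
    exact = load_kn * 12.5
    mesh_error = exact / num_elements       # coarser mesh, bigger error
    return exact + mesh_error

samples = []
for _ in range(1000):
    load = random.gauss(10.0, 0.5)              # nominal 10 kN, 5% scatter
    elements = random.choice([50, 100, 200])    # alternative mesh densities
    samples.append(solve(load, elements))

mean = statistics.mean(samples)
cov = statistics.stdev(samples) / mean          # coefficient of variation
print(f"Mean peak stress {mean:.1f}, scatter {100 * cov:.1f}%")
```

If the coefficient of variation stays small across mesh and loading perturbations, the solution is robust; if it balloons, that is exactly the risk signal the MGI-style risk engines would consume.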
At some point in the history of idealization, it will become common to model fabricated structures such as ships and aircraft with solid elements, rather than shells and beams. This will be an enormous extrapolation of data, and will require a total rethink of how we store, manipulate and interrogate structural analysis results. The challenge will be to have a process that will add to our understanding of structural load paths and stress patterns in complex structures. The danger will be in producing vast quantities of data that defy interrogation as to which structural responses are critical and why. This will require software developers to push the innovation aspects discussed in the MGI paper to the max.
Editor’s Note: Tony Abbey teaches live NAFEMS FEA classes in the US, Europe and Asia. He also teaches NAFEMS e-learning classes globally. Contact firstname.lastname@example.org for details.
Tony Abbey is a consultant analyst with his own company, FETraining. He also works as training manager for NAFEMS, responsible for developing and implementing training classes, including a wide range of e-learning classes. Send e-mail about this article to DE-Editors@deskeng.com.
1 James Manyika, Michael Chui, Brad Brown, et al. Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute, 2011.
2 Douglas Adams. The Hitchhiker’s Guide to the Galaxy. Harmony Books, 1979.