Baker Hughes Develops Predictive Maintenance Software for Gas and Oil Extraction Equipment Using Data Analytics and Machine Learning

In periods of peak demand, Baker Hughes crews work around the clock to tap oil and natural gas reservoirs. At a single well site, as many as 20 trucks may operate simultaneously, with positive displacement pumps injecting a mixture of water and sand at high pressures deep into drilled wells. These pumps and their internal parts, including valves, valve seats, seals, and plungers, are costly, accounting for about $100,000 of the $1.5 million total cost of the truck.

Truck with positive displacement pump.

To monitor the pumps for potentially catastrophic wear and predict failures before they occur, Baker Hughes analyzes pump sensor data with MATLAB and applies MATLAB machine learning algorithms.

“We saw three advantages in using MATLAB to develop our pump health monitoring system,” says Gulshan Singh, reliability principal and team lead for drilling services at Baker Hughes. “The first is speed; development in C or any other language would have taken longer. The second is automation; MATLAB enabled us to automate the processing of large data sets. The third is the wide variety of technologies that MATLAB provides for working with data, including basic statistical analysis, spectral analysis, filtering, and predictive modeling using artificial neural networks.”

A well site using positive displacement pumps.


If a truck at an active site has a pump failure, Baker Hughes must immediately replace the truck to ensure continuous operation. Sending spare trucks to each site costs the company tens of millions of dollars in revenue that those trucks could generate if they were in active use at another site. The inability to accurately predict when valves and pumps will require maintenance underpins other costs. Too-frequent maintenance wastes effort and results in parts being replaced when they are still usable, while too-infrequent maintenance risks damaging pumps beyond repair.

“MATLAB gave us the ability to convert previously unreadable data into a usable format; automate filtering, spectral analysis, and transform steps for multiple trucks and regions; and ultimately, apply machine learning techniques in real time to predict the ideal time to perform maintenance.”

 — Gulshan Singh, Baker Hughes

Baker Hughes engineers wanted to develop a system that could determine when a machine was about to fail and needed maintenance. To develop this system, the team needed to process and analyze up to a terabyte of data collected at 50,000 samples per second from sensors installed on 10 trucks operating in the field. From this large data set, they needed to identify the parameters that were useful in predicting failures.


Baker Hughes engineers used MATLAB to develop pump health monitoring software that uses data analytics for predictive maintenance.

They imported data gathered in the field from temperature, pressure, vibration, and other sensors into MATLAB. The team worked with a MathWorks support engineer to develop a custom script for reading and parsing sensor data stored in binary files in a proprietary format.
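For readers who have not written a binary parser, the following is a minimal Python sketch of the general idea. It is not the team's script, which was written in MATLAB, and the article does not describe the real proprietary format. The record layout used here (a little-endian header holding a channel ID and a sample count, followed by 32-bit float samples) and the helper names `parse_records` and `make_record` are hypothetical.

```python
import struct

# Hypothetical record layout (NOT the real proprietary format):
#   uint16 channel_id, uint32 n_samples, then n_samples float32 values,
#   all little-endian with no padding.
HEADER = struct.Struct("<HI")


def parse_records(blob):
    """Parse concatenated records into {channel_id: [samples, ...]}."""
    channels = {}
    offset = 0
    while offset < len(blob):
        if len(blob) - offset < HEADER.size:
            raise ValueError("truncated header")
        channel_id, n = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        body_size = 4 * n  # 4 bytes per float32 sample
        if len(blob) - offset < body_size:
            raise ValueError("truncated sample block")
        samples = struct.unpack_from(f"<{n}f", blob, offset)
        offset += body_size
        # A channel may appear in several records; append in file order.
        channels.setdefault(channel_id, []).extend(samples)
    return channels


def make_record(channel_id, samples):
    """Build one record in the hypothetical layout (used only for the demo)."""
    body = struct.pack(f"<{len(samples)}f", *samples)
    return HEADER.pack(channel_id, len(samples)) + body


# Demo: two channels interleaved in one byte string.
blob = make_record(1, [0.5, 1.5]) + make_record(2, [2.0]) + make_record(1, [-1.0])
parsed = parse_records(blob)
```

Checking for truncated headers and sample blocks matters in practice, because field recordings can end mid-record when a logger loses power.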

Working in MATLAB, the Baker Hughes team analyzed the imported data to determine which signals in the data had the strongest influence on equipment wear and tear. This step included performing Fourier transforms and spectral analysis as well as filtering out large movements of the truck, pump, and fluid to better detect the smaller vibrations of the valves and valve seats.
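The effect of this filtering step can be illustrated with a small, self-contained Python sketch. It assumes that the large movements are concentrated at low frequencies and the valve vibrations at higher ones; the article does not give the actual frequencies, amplitudes, or filter design. The 4 Hz and 60 Hz tones, the 20 Hz cutoff, and the function names are illustrative choices only. A naive discrete Fourier transform is used to keep the example dependency-free; real pipelines would use an FFT routine.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for a short demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def highpass(x, fs, cutoff_hz):
    """Zero every frequency bin below cutoff_hz, then transform back."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        freq = min(k, n - k) * fs / n  # bins k and n-k share the same |frequency|
        if freq < cutoff_hz:
            X[k] = 0
    return idft(X)


def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest non-DC peak in the one-sided spectrum."""
    n = len(x)
    X = dft(x)
    k = max(range(1, n // 2), key=lambda i: abs(X[i]))
    return k * fs / n


fs = 256  # sample rate in Hz (illustrative; far below the 50,000 Hz in the field)
n = 256   # one second of data, so each DFT bin is 1 Hz wide
# Large, slow 4 Hz "pump" motion plus a small 60 Hz "valve" vibration.
signal = [10.0 * math.sin(2 * math.pi * 4 * t / fs)
          + 0.2 * math.sin(2 * math.pi * 60 * t / fs) for t in range(n)]
filtered = highpass(signal, fs, cutoff_hz=20)
```

Before filtering, the spectrum is dominated by the slow, large-amplitude component; after the low-frequency bins are removed, the small high-frequency vibration becomes the strongest remaining feature, which is the kind of signal the team wanted to isolate.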

To automate the processing of almost one terabyte of collected data, the team wrote MATLAB scripts that they executed overnight.

The engineers discovered that data captured from pressure, vibration, and timing sensors was the most relevant for predicting machine failures.

Working with the MathWorks support engineer, the team evaluated several machine learning techniques using Statistics and Machine Learning Toolbox™ and Neural Network Toolbox™. This initial evaluation showed that neural networks produced the most accurate results. The group created and trained a neural network to use sensor data to predict pump failures. They validated this model using additional data from the field that was not used to build the model.
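The train-then-validate workflow described above can be sketched in a few lines of Python. This is not the Baker Hughes model: the team used MATLAB toolboxes and real field data, and the article does not disclose the network architecture. Everything below is an assumption made for illustration, including the two synthetic features (stand-ins for vibration and pressure measurements), the labeling rule, the `TinyNet` class, and the hyperparameters.

```python
import math
import random


def make_data(n, rng):
    """Synthetic stand-in for sensor features; label 1.0 means 'failure ahead'."""
    rows = []
    while len(rows) < n:
        vib, pres = rng.random(), rng.random()
        if abs(vib + pres - 1.0) < 0.1:
            continue  # keep a margin so the toy problem is cleanly separable
        rows.append(((vib, pres), 1.0 if vib + pres > 1.0 else 0.0))
    return rows


class TinyNet:
    """2 inputs -> `hidden` tanh units -> 1 sigmoid output."""

    def __init__(self, hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w[0] * x[0] + w[1] * x[1] + b)
             for w, b in zip(self.w1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def train_step(self, x, y, lr):
        """One stochastic gradient descent step on the cross-entropy loss."""
        h, p = self.forward(x)
        dz = p - y  # gradient of the loss w.r.t. the output pre-activation
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
            self.w2[j] -= lr * dz * hj
            self.w1[j][0] -= lr * dh * x[0]
            self.w1[j][1] -= lr * dh * x[1]
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz


def accuracy(net, rows):
    """Fraction of rows whose thresholded prediction matches the label."""
    hits = sum((net.forward(x)[1] > 0.5) == (y == 1.0) for x, y in rows)
    return hits / len(rows)


rng = random.Random(0)
data = make_data(200, rng)
train, holdout = data[:150], data[150:]  # holdout rows are never used for training
net = TinyNet(hidden=4, rng=rng)
for _ in range(200):  # epochs
    rng.shuffle(train)
    for x, y in train:
        net.train_step(x, y, lr=0.1)
holdout_accuracy = accuracy(net, holdout)
```

The point of the sketch is the split: the model is scored only on rows it never saw during training, which mirrors the team's use of additional field data to validate the network.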

Field tests confirmed the pump health monitoring system’s ability to predict pump failures.

Baker Hughes’ predictive maintenance alarm system, based on MATLAB.


  • Savings of more than $10 million projected. “In a single year, we can spend a significant amount of revenue just on the maintenance and replacement of internal pump components, such as valves, valve seats, plungers, and seals,” says Thomas Jaeger, senior product manager at Baker Hughes. “We estimate that the software we developed in MATLAB will reduce the overall costs by 30–40%, and that’s in addition to the savings we’ll see from eliminating the need for extra trucks onsite.”
  • Development time reduced tenfold. “MATLAB enabled us to perform our desired analyses and processing, including machine learning,” Singh says. “With a lower-level language, you can’t always find the libraries you need and complete the project within the allocated time of weeks. If we had to write our own code using lower-level language libraries for all the built-in MATLAB capabilities we needed, it would likely have taken an order of magnitude longer to complete this project.”
  • Multiple types of data easily accessed. “MATLAB made it easy to combine multiple kinds of data into one analysis application,” says Singh. “We were even able to use sensor data from a proprietary file format.”

About DE Guest

This article was contributed to Digital Engineering by a guest author.