Industrial Machine Learning: Why you have to do it.
Holger Amort

Machine Learning (ML) has seen exponential growth over the last five years, and many analytical platforms have adopted ML technologies to provide packaged solutions to their users. So why has Machine Learning become mainstream?

  • Machine Learning libraries have automated the model development cycle through pipelines, so users can easily benchmark a variety of data cleansing, feature engineering, algorithm selection, and optimization steps.
  • No more tedious manual testing of all the various permutations: autotuning libraries have automated both hyperparameter tuning and algorithm selection.
  • The increased interest and rapidly expanding user base have led to excellent public-domain research, documentation, training material, and code examples.
  • And, most importantly, most libraries are available under the very permissive MIT license.

Let’s take a look at Multivariate Analysis (MVA). While many of its algorithms have been widely available for a long time, MVA is, technically, still a subset of ML. MVA typically refers to two algorithms:

  • Principal Component Analysis (PCA) - applied to unsupervised learning tasks (e.g. clustering) and feature engineering
  • Partial Least Squares (PLS) - used for supervised learning (e.g. regression)

As such, MVA has become a de facto standard in manufacturing, particularly in batch processing. Some typical use cases are:

  • Multivariate process control (MPC) limits (T² and SPE) based on PCA
  • Predicting yield or other product-relevant metrics from process variables
  • Calibration of online measurements (FTIR, Raman, and other vibrational spectroscopy)
  • Midpoint corrections
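The first use case, multivariate control limits, can be sketched as follows: from a fitted PCA model, each sample gets a Hotelling's T² statistic (distance within the model plane) and an SPE/Q statistic (residual off the model plane). The percentile-based limits below are a simplification; in practice, limits are usually derived from F- and chi-squared distributions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))                  # illustrative in-control data

pca = PCA(n_components=3).fit(X)
T = pca.transform(X)                            # scores

# Hotelling's T^2: squared scores scaled by each component's variance
t2 = np.sum(T**2 / pca.explained_variance_, axis=1)

# SPE (Q): squared reconstruction residual of each sample
X_hat = pca.inverse_transform(T)
spe = np.sum((X - X_hat) ** 2, axis=1)

# Simple empirical control limits at the 95th percentile
t2_limit = np.percentile(t2, 95)
spe_limit = np.percentile(spe, 95)
```

New process samples are then flagged when either statistic exceeds its limit.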

In principle, industrial datasets are no different from other supervised or unsupervised learning problems, and they can be evaluated using a wide range of algorithms. Multivariate Analysis was preferred because it offered global and local explainability. MVA models are multivariate extensions of well-understood linear regression that provide a weight (slope) for each variable. This enables critical understanding and optimization of the underlying process dynamics, which is a very important aspect in manufacturing.


In the past, many ML algorithms were considered black box models, because the inner mechanics of the model were not transparent to the user. These model types had limited utility in manufacturing since they could not answer the WHY and therefore lacked credibility.

This has very much changed. Today, model explainers are a very active field of ML research, and excellent libraries are available for analyzing the underlying mechanics of highly complex model architectures.
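As a minimal sketch of such an explainer, scikit-learn's model-agnostic permutation importance ranks features by how much shuffling each one degrades model performance (synthetic data, illustrative only):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=300)    # only feature 0 matters

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the drop in the model's score
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
top_feature = int(np.argmax(result.importances_mean))   # -> 0
```

Dedicated libraries such as SHAP go further and attribute each individual prediction to the input variables, answering the WHY at the sample level.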

The following shows an example of applying ML technologies to a typical MVA project type. In the original publication ( ), several preprocessing steps were studied together with PLS to build a predictive model. All steps were performed with commercial off-the-shelf software, and the analysis was worked manually.

Using ML pipelines, the same study can be structured as follows:

import numpy as np
import xgboost as xgb
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import GridSearchCV, KFold

# SNV, MSC, and SavitzkyGolay are custom, scikit-learn-compatible spectral
# preprocessing transformers from the original study (defined elsewhere).
# Assumed here: 10-fold cross validation and explained-variance scoring,
# which were undefined in the original snippet.
kf_10 = KFold(n_splits=10, shuffle=True, random_state=42)
score = 'explained_variance'

pipeline = Pipeline(steps=[('preprocess', None), ('regression', None)])

preprocessing_options = [{'preprocess': (SNV(),)},
                         {'preprocess': (MSC(),)},
                         {'preprocess': (SavitzkyGolay(9, 2, 1),)},
                         {'preprocess': (make_pipeline(SNV(), SavitzkyGolay(9, 2, 1)),)}]

regression_options = [{'regression': (PLSRegression(),),
                       'regression__n_components': np.arange(1, 10)},
                      {'regression': (LinearRegression(),)},
                      {'regression': (xgb.XGBRegressor(objective="reg:squarederror",
                                                       random_state=42),)}]

# Cross every preprocessing option with every regression option
param_grid = []
for preprocess in preprocessing_options:
    for regression in regression_options:
        param_grid.append({**preprocess, **regression})

search = GridSearchCV(pipeline, param_grid=param_grid, scoring=score,
                      n_jobs=2, cv=kf_10, refit=False)

This small code example tests every combination of preprocessing and regression steps and automatically selects the best model. A combination of SNV (Standard Normal Variate), first derivative, and XGBoost showed the highest cross-validated explained variance (0.958).
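With refit=False, the winning combination is read from cv_results_ rather than best_estimator_. The self-contained sketch below illustrates the pattern with stand-in regressors and synthetic data (the preprocessing options and spectral data of the study are omitted):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 6))
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=120)

# Placeholder step, replaced by each candidate estimator in the grid
pipeline = Pipeline(steps=[('regression', None)])
param_grid = [{'regression': (LinearRegression(),)},
              {'regression': (Ridge(),), 'regression__alpha': [0.1, 1.0]}]

search = GridSearchCV(pipeline, param_grid=param_grid, scoring='r2',
                      cv=KFold(n_splits=10, shuffle=True, random_state=42),
                      refit=False)
search.fit(X, y)

# Locate the best-scoring parameter combination manually
best = int(np.argmax(search.cv_results_['mean_test_score']))
best_params = search.cv_results_['params'][best]
best_score = search.cv_results_['mean_test_score'][best]
```

Skipping the refit avoids one extra full fit when only the ranking of combinations is of interest.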

The transformed spectra and the model weights can be overlaid to provide insights into the model mechanics:


Multivariate Analysis (MVA) has been successfully applied in manufacturing and is here to stay. But there is no doubt that Machine Learning (ML) data engineering concepts will be widely applied to this domain as well. Pipelines and autotuning libraries will ultimately replace the manual work of data transformation selection, model selection, and hyperparameter tuning. New ML algorithms and Deep Learning models, in combination with local and global explainers, will expand Manufacturing Intelligence and provide key insights into process dynamics.

Special Thanks

Thanks to Dr. Salvador Garcia-Munoz for providing code examples and data sets.

For more information, please contact us.

© All rights reserved.