High Frequency Data

Processes in industrial operation occur often at different time scales, some are fast (sub seconds to hours), and others are slow (hours, days, weeks, or months). In a biotechnology facility for example, there are slow moving batch processes, fast purification steps and very fast filling lines. Capturing events at different time scales and analyzing them, requires a data strategy for the acquisition, storage, and analysis.

To optimize storage space and network bandwidth, the OSIsoft PI system differentiates between high frequency data also known as snapshot values and compressed or archived data. Data are archived from the snapshot table by applying a swinging door compression algorithm. This data strategy has proven to be great balance between displaying real time data in high resolution as well as storing sufficiently enough data for historical data analysis.

The drawback of this approach is that the snapshot queue contains only a single value for each process variable, so analysis based on snapshot or event driven data is limited to single point. There are some valuable use cases such as statistical process control, alarm management or event triggers. However, Machine Learning (ML) or multivariate models (MVA) are usually based on time series vectors.

To accommodate advance modeling of high frequency data, the OSIsoft PI system requires expansion off the snapshot table to a low latency time series storage:

High Frequency Data

The requirements for the Snapshot Db are primarily driven by read speed as well as write speed. Some open-source time series databases such as QuestDB that allows a million writes per seconds are available now. The read speeds are even more impressive: We measured ~ 800K read/sec for a standard OSIsoft PI system, whereas a low latency TSDB is faster by a factor of 800 - 1,000 (see demo:  QuestDB · Console )

An additional benefit of using open source TSDB is that it allows us to add open-source ML and MVA libraries as well as, to take advantage of the very rich open-source visualization ecosphere. For example, the following shows a Grafana Dashboard of the Snapshot Db:


The OSIsoft PI system has been designed to capture real time events in a snapshot table and store compressed data in the PI Data Archive. This data architecture is optimized for short term event data and long-term data storage. Missing in this scenario are capabilities to store and analyze high frequency data for which modern low latency time series databases can provide. By adding a dedicated high frequency data store, fast processes can be monitored and analyzed in parallel to an already existing data infrastructure. This will open a large range of new uses cases that are difficult or impossible to realize with existing systems.

For information, please contact us.

machin learning with osi pi

Python based machine learning (ML) libraries have evolved at an unbelievable pace. It is most impressive that the time-consuming steps such as data encoding, feature selection, model comparison and even model optimization have been fully automated. For example, the relatively new Python library PyCaret calculates the metrics of over 21 different regression models and selects the best one with just a few lines of codes. Machine learning with OSI Pi has come along way.

There are plenty of industrial applications, where these algorithms could be successfully applied. But there are two major bottlenecks for successful projects:

  1. Historical Data collection for the Model Development
    1. Real time data collection for the Model Integration

Model Development data could be downloaded in Excel or text\csv files and analyzed offline. The drawback is that this approach cannot be productized and is limited to off-line applications.

To accelerate the model development and model integration (MD\MI pipelines) for the OSIsoft PI System, TQS has developed a Python library called TQS Pandas PiFrames for OSIsoft® PI System® that connects to the PI System and provides PI data as Pandas data frames. The Pandas data frame is the preferred data structure in Python for data scientists and is supported by many ML libraries. Therefore, the TQS Pandas PiFrames for OSIsoft® PI System® can be easily integrated into ML projects in both model development and model integration.

The following shows some code examples in Python.

  1. Connecting to the PI Data Historian and PI System:

cdf = ConnectToDefaultAF()
cdf = ConnectToDefaultPI()

df = GetMultipleAttributeValuesByVariable("Bio Reactor 1",["Temperature","Concentration","Level"],'t-2h','t',60,0,None)

The resulting data frame is a time series:

The data frame can also be arranged by variable columns:

df = GetMultipleAttributeValuesByFrame("Batch_0_*","Bio Reactor 1",["Temperature","Concentration","Level"],'t-7d','t',60,0,None)

During the last couple of months, we have developed use cases around OSIsoft PI system that are based on the TQS Pandas PiFrames for OSIsoft® PI System® library:

The library has shown to significantly reduce the model development and model integration time.


Machine Learning and AI projects are often slow to develop and difficult to integrate. The main reason is that most Python libraries are expecting Pandas data frames (or Numpy arrays) and these data structures are not readily available in industrial automation. TQS Integration has developed the TQS Pandas PiFrames for OSIsoft® PI System® libraries to accelerate both model development and model integration. The library is user friendly, fast and scales well for all common machine learning (ML) applications.

For information, please contact us.

Machine Learning (ML) has seen an exponential growth during the last five years and many analytical platforms have adopted ML technologies to provide packaged solutions to their users. So, why has Machine Learning become mainstream?

Let’s take a look at Technically Multivariate Analysis (MVA). While many algorithms have been widely available for a long time, MVA is still considered a subset of ML algorithms. MVA typically refers to two algorithms:

As such, MVA has become a de facto standard in manufacturing batch processing and others. Some typical use cases are:

In principle, industrial datasets are not different from other supervised or unsupervised learning problems and they can be evaluated using a wide range of algorithms. Multivariate Analysis was preferred because it offered global and local explainability. MVA models are multivariate extensions of the well understood linear regression that provide weights (slope) for each variable. This enables critical understanding and optimization of underlying process dynamics which is a very important aspect in manufacturing.


In the past, many ML algorithms were considered black box models, because the inner mechanics of the model were not transparent to the user. These model types had limited utility in manufacturing since they could not answer the WHY and therefore lacked credibility.

This has very much changed. Today, model explainers in ML are a very active field of research and excellent libraries have become available to analyze the underlying model mechanics of highly complex architectures.

The following shows an example of applying ML technologies to a typical MVA project type. In the original publication (https://journals.sagepub.com/doi/10.1366/0003702021955358 ), several preprocessing steps have been studied together with PLS to build a predictive model. All steps were performed using commercial off the shelf software that manually worked the analysis.

Using ML pipelines, the same study can be structured as follows:

pipeline=Pipeline(steps= [('preprocess', None), ('regression',None)])
preprocessing_options=[{'preprocess': (SNV(),)},
                       {'preprocess': (MSC(),)},
                       {'preprocess': (SavitzkyGolay(9,2,1),)},
                       {'preprocess': (make_pipeline(SNV(),SavitzkyGolay(9,2,1)),)}]

regression_options=[{'regression': (PLSRegression(),), 'regression__n_components': np.arange(1,10)},
                    {'regression': (LinearRegression(),)},
                    {'regression': (xgb.XGBRegressor(objective="reg:squarederror", random_state=42),)}]
param_grid = []
for preprocess in preprocessing_options:
    for regression in regression_options:
        param_grid.append({**preprocess, **regression})
search=GridSearchCV(pipeline,param_grid=param_grid, scoring=score, n_jobs=2,cv=kf_10,refit=False)

This small code example manages to test every combination of prepossessing and regression steps, then automatically select the best model. [A combination of SNV (Standard Normal Variate), 1st derivative and XGBoost showed the highest cross validated explained variance of 0.958].

The transformed spectra and the model weights can be overlaid to provide insights into the model mechanics:


Multivariate Analysis (MVA) has been successfully applied in manufacturing and is here to stay. But there is no doubt that Machine Learning (ML) data engineering concepts will be widely applied to this domain as well. Pipelines and autotuning libraries will ultimately replace the manual work of selecting data transformation, model selection and hyper parameter tuning. New ML algorithms and Deep Learner, in combination with local and global explainer, will expand Manufacturing Intelligence and provide key insights into Process Dynamics.

Special Thanks

Thanks to Dr. Salvador Garcia-Munoz for providing code examples and data sets.

For more information, please contact us.

digital twin

Have you ever wondered if it were possible to predict process conditions in manufacturing? Know what is likely to happen before it actually happens in your business processes? Digital Twin might just be your answer.


There are several different definitions of Digital Twins or Clones and many use them interchangeably with terms such as Industry 4.0 or the Industrial Internet of Things (IIOT). Fundamentally, Digital Twins are digital representations of a physical asset, process or product, and they behave similarly to the object they represent. The concept of Digital Clones has been around for some time. Earlier models were based on engineering principles and approximations, however they required very deep domain expertise, were time consuming and were limited to a few use cases.

Today Digital Clones are virtual models that are built entirely by using massive historical datasets and Machine Learning (ML) to extract the underlying dynamics. The data driven approach makes Digital Clones accessible for a wide range of applications. Therefore, the potential for Digital Twins is enormous and includes process enhancements\optimization, equipment life cycle management, energy reductions, safety improvements just to name a few.

Building digital clones require:

1.      A large historical data set or data historian

2.      High data quality and sufficient data granularity

3.      Very fast data access

4.      A large GPU for the model development and real time predictions

5.      A supporting data structure to manage the development, deployment, and maintenance of ML models

The following shows the application of a Digital Twin to a batch process example. The model is built with 30 second interpolated data using a window of past data to predict future (5 min) data points:

So, what’s all the hype of Digital Clones? Well, not only are they able to predict process conditions, they also provide explanatory power on what drives the process - the underlying dynamics. The following dashboard shows a replay of this analysis including the estimate of the model weights:


In summary, the availability of enterprise level data historians and deep learning libraries allow Digital Clones to be implemented on the equipment and process level throughout manufacturing. The technology allows a wide range of applications and offer an insight into the process dynamics that were not previously available, improving data integrity and data access while achieving trust and data transparency with your partners. This helps to digitalize data management and processes to lower risk and improve efficient data sharing with partners.

Please contact us for more information.

Machine Learning (ML) will undoubtedly transform manufacturing and grow from a few selected application such as Predictive Maintenance to a wide range of use cases. The technology already exists today, libraries are widely available under open-source licenses and on-premises IT infrastructure as well as cloud service allow these applications to scale.

So, what is holding it back?

One area that limits the wide adoption of ML models is the underlying data structure. Companies have heavily invested in their data infrastructure and the creation of meta databases (mostly ISA-95 and ISA-88), but the productizing of ML models is still lagging. There are several reasons for this:

Industrial standards ISA-95 and ISA-88 provide a framework to structure the equipment and batch model, but by design do not support ML modeling. For example, one equipment can have several ML uses cases that all require a different structure, e.g. example multivariate batch modeling, predictive maintenance, forecasts for predictive control, …

One approach to structure industrial models is ML Relational Mapping (MLRM). It builds on the already existing object relational mapping (ORM) by linking existing type systems. The concept does not require restructuring existing data models and is therefore fast to implement:

MLRM adds an additional type or class that links for example equipment and batch types as well as provides definitions for the ML model. By separating the functionality, this approach does not clutter the existing type system and provides the flexibility to define different models for one class or multi class models without the need to restructure.

The following shows an OSIsoft AF based UI that implements MLRM:


Machine Learning applications will show grow rapidly in the Manufacturing Environment. The challenge will be to provide the right structure, so that ML models can be built on top of existing type systems. ML Relational Mapping (MLRM) provides a flexible approach by implementing a model specific type system that links to existing data models.

© All rights reserved.