How to build Machine Learning with OSIsoft PI and Python

Python-based machine learning (ML) libraries have evolved at a remarkable pace. Most impressive is that time-consuming steps such as data encoding, feature selection, model comparison and even model optimization have been fully automated. For example, the relatively new Python library PyCaret calculates the metrics of over 21 different regression models and selects the best one with just a few lines of code. Machine learning with OSIsoft PI has come a long way.

There are plenty of industrial applications where these algorithms could be applied successfully. But there are two major bottlenecks for successful projects:

  1. Historical data collection for model development
  2. Real-time data collection for model integration

Model development data could be downloaded to Excel or text/CSV files and analyzed offline. The drawback is that this approach cannot be productized and is limited to offline applications.
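As a minimal sketch of the offline workflow described above, the following parses a small CSV export with the Python standard library. The column names and values are hypothetical and only illustrate the idea; a real export will have its own layout.

```python
import csv
import io
from datetime import datetime
from statistics import mean

# Hypothetical CSV export; column names and values are assumptions for illustration.
CSV_EXPORT = """Timestamp,Temperature,Level
2021-03-01 08:00:00,36.8,1.20
2021-03-01 08:01:00,37.1,1.22
2021-03-01 08:02:00,37.0,1.25
"""

def load_export(text):
    """Parse the CSV text into a list of rows with typed values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "Timestamp": datetime.strptime(record["Timestamp"], "%Y-%m-%d %H:%M:%S"),
            "Temperature": float(record["Temperature"]),
            "Level": float(record["Level"]),
        })
    return rows

rows = load_export(CSV_EXPORT)
mean_temperature = mean(row["Temperature"] for row in rows)
print(f"{len(rows)} rows, mean temperature {mean_temperature:.2f}")
```

The data in such a file is a static snapshot: every new analysis needs a new manual export, which is why this route does not carry over to real-time model integration.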

To accelerate model development and model integration (MD/MI pipelines) for the OSIsoft PI System, TQS has developed a Python library called TQS Pandas PiFrames for OSIsoft® PI System® that connects to the PI System and provides PI data as Pandas data frames. The Pandas data frame is the preferred data structure in Python for data scientists and is supported by many ML libraries. Therefore, TQS Pandas PiFrames for OSIsoft® PI System® can be easily integrated into ML projects in both model development and model integration.

The following shows some code examples in Python.

  1. Connecting to the PI Data Historian and PI System:


cdf = ConnectToDefaultAF()
cdf = ConnectToDefaultPI()


df = GetMultipleAttributeValuesByVariable("Bio Reactor 1",["Temperature","Concentration","Level"],'t-2h','t',60,0,None)

The resulting data frame is a time series:


The data frame can also be arranged by variable columns:

df = GetMultipleAttributeValuesByFrame("Batch_0_*","Bio Reactor 1",["Temperature","Concentration","Level"],'t-7d','t',60,0,None)

During the last couple of months, we have developed use cases around the OSIsoft PI System that are based on the TQS Pandas PiFrames for OSIsoft® PI System® library:

The library has been shown to significantly reduce model development and model integration time.

Summary

Machine Learning and AI projects are often slow to develop and difficult to integrate. The main reason is that most Python libraries expect Pandas data frames (or NumPy arrays), and these data structures are not readily available in industrial automation. TQS Integration has developed the TQS Pandas PiFrames for OSIsoft® PI System® library to accelerate both model development and model integration. The library is user-friendly, fast and scales well for all common machine learning (ML) applications.

For more information, please contact us.

It seems many companies think that Message Queuing Telemetry Transport (MQTT) can solve all their data integration issues, and there has been a lot of industry chatter about this topic. But can it really do that?

MQTT has been used successfully to communicate data for over 20 years. It is lightweight by design and has fared well when benchmarked against competing protocols (e.g. OPC-UA). The central component of the MQTT architecture is the message broker, which allows devices to publish data to, or subscribe to data from, a central repository. This architecture makes MQTT very attractive as a central data exchange in manufacturing to integrate different components such as the automation layer, historians, MES, ERP and others.
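The role of the broker can be illustrated with a toy, in-memory stand-in. This is not an MQTT client or a real broker: the class name, the topic strings and the exact-match routing are assumptions made only to show how publishers and subscribers are decoupled.

```python
from collections import defaultdict

class InMemoryBroker:
    """Toy stand-in for a message broker: routes messages by exact topic match."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a callback that is invoked for every message on this topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Deliver the payload to every callback registered for this topic.
        for callback in self._subscribers[topic]:
            callback(topic, payload)

broker = InMemoryBroker()
historian_log = []
mes_log = []

# Two independent consumers (hypothetical historian and MES) subscribe to one topic.
broker.subscribe("plant/reactor/temperature", lambda t, p: historian_log.append((t, p)))
broker.subscribe("plant/reactor/temperature", lambda t, p: mes_log.append((t, p)))

broker.publish("plant/reactor/temperature", "37.1")
broker.publish("plant/reactor/level", "1.2")  # no subscriber, so nothing is delivered
```

The publisher never needs to know which systems consume its data, which is what makes the broker attractive as a central exchange.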

The MQTT message has two components:

  1. Topic
  2. Payload

The MQTT topic is used to route messages and allows subscribers to filter them. Routing by topic requires a design that specifies the location of the data source. If the MQTT broker is used at the enterprise level, it is recommended to use the ISA-95 standard to define the MQTT topic, an approach often referred to as the Unified Namespace (UNS). The following shows a topic in the unified namespace:

Enterprise A/Site A/Area A/Process Cell A/Bio Reactor 0
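A short sketch of how such a topic could be assembled from ISA-95 hierarchy levels and how a subscriber filter with the MQTT wildcards `+` (one level) and `#` (all remaining levels) selects it. The function names are hypothetical, and the matcher is simplified: it does not handle special cases such as topics beginning with `$`.

```python
def build_topic(*levels):
    """Join hierarchy levels into a single MQTT topic string."""
    for level in levels:
        # '/', '+' and '#' are reserved in MQTT topics, so reject them inside a level.
        if not level or any(ch in level for ch in "/+#"):
            raise ValueError(f"invalid topic level: {level!r}")
    return "/".join(levels)

def topic_matches(topic_filter, topic):
    """Check a topic against a filter that may contain '+' and '#' wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, f in enumerate(filter_levels):
        if f == "#":
            return True  # multi-level wildcard matches everything that remains
        if i >= len(topic_levels):
            return False
        if f != "+" and f != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)

topic = build_topic("Enterprise A", "Site A", "Area A", "Process Cell A", "Bio Reactor 0")

print(topic)
print(topic_matches("Enterprise A/Site A/#", topic))
print(topic_matches("Enterprise A/+/Area A/+/Bio Reactor 0", topic))
print(topic_matches("Enterprise A/Site B/#", topic))
```

Because the location is encoded in the topic, a consumer can subscribe to a whole site or to a single unit without knowing anything about the payload.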

MQTT by itself does not specify a message structure, and in many IoT applications the payload is simply a JSON string. The JSON is deserialized by the MQTT subscriber (here Ignition) into a tree structure:

OSIsoft AF provides an extensive class or type system for assets (equipment), frames (batch, alarms, OEE) and transfers (traceability and genealogy), where enterprise level equipment structures are either manually created or autogenerated by interface specific connectors.

Publishing OSIsoft AF data into the MQTT unified namespace simply means serializing the AF structure into the MQTT topic and attaching the attribute values as JSON payloads. Like any other MQTT component, OSIsoft AF can be a subscriber, a publisher, or both. As a subscriber, OSIsoft AF can deserialize the MQTT message, and the resulting AF structure can be contextualized by templating (inheritance), categorizing, and referencing.
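The round trip can be sketched with a nested dictionary standing in for the asset hierarchy. This does not use any OSIsoft API; the tree layout, the `attributes` key and the function names are assumptions chosen only to illustrate serializing a hierarchy into topic and JSON payload pairs and rebuilding it on the subscriber side.

```python
import json

# Hypothetical asset hierarchy; names and attribute values are illustrative only.
asset_tree = {
    "Enterprise A": {
        "Site A": {
            "Area A": {
                "Process Cell A": {
                    "Bio Reactor 0": {
                        "attributes": {"Temperature": 37.1, "Level": 1.2},
                    },
                },
            },
        },
    },
}

def serialize(tree, path=()):
    """Yield (topic, payload) pairs, one per element that carries attributes."""
    for name, node in tree.items():
        if name == "attributes":
            # The element path becomes the topic, the attributes the JSON payload.
            yield "/".join(path), json.dumps(node)
        else:
            yield from serialize(node, path + (name,))

def deserialize(messages):
    """Rebuild the nested structure from (topic, payload) pairs."""
    tree = {}
    for topic, payload in messages:
        node = tree
        for level in topic.split("/"):
            node = node.setdefault(level, {})
        node["attributes"] = json.loads(payload)
    return tree

messages = list(serialize(asset_tree))
for topic, payload in messages:
    print(topic, payload)
```

Note that the rebuilt tree contains only names and values. Templates, categories and references are not part of the message and have to be added afterwards.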

MQTT with the unified namespace is a very elegant way of routing structured data, but the JSON payload is not an ideal solution for several reasons:

(1) There is no clear definition of how to structure industrial sensor/time series data.
(2) JSON structures are bulky.
(3) There is no standard way of compressing the payloads.
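The size argument can be made concrete with a small comparison. The samples, the JSON field names and the binary layout below are all assumptions for illustration; the point is only the relative size of a self-describing JSON payload versus a fixed binary layout, and the effect of generic compression.

```python
import json
import struct
import zlib

# Hypothetical samples: (epoch milliseconds, value) pairs for one sensor.
samples = [(1_614_585_600_000 + i * 1000, 37.0 + 0.01 * i) for i in range(100)]

# JSON: the field names are repeated for every single sample.
json_bytes = json.dumps(
    [{"timestamp": ts, "value": v} for ts, v in samples]
).encode("utf-8")

# Fixed binary layout: 8-byte unsigned integer plus 8-byte double per sample.
binary_bytes = b"".join(struct.pack("<Qd", ts, v) for ts, v in samples)

# Generic compression shrinks the JSON, but publisher and subscriber
# must agree on it themselves, since the payload format is not standardized.
compressed_json = zlib.compress(json_bytes)

print("JSON bytes:           ", len(json_bytes))
print("Binary bytes:         ", len(binary_bytes))
print("Compressed JSON bytes:", len(compressed_json))
```

Both alternatives are smaller than plain JSON, but each requires a private agreement between sender and receiver, which is exactly the gap a payload standard is meant to close.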

Is there an alternative?


The SparkplugB standard was developed to address both the data structure and throughput requirements for sending industrial sensor data. There are a couple of key mechanisms to accomplish this:

It has exactly five components:

Structuring the unified namespace requires splitting the asset definition between the topic and the payload, and can lead to a structure identical to the one described above in the JSON example:

The SparkplugB payload can also be used to serialize (publish) or deserialize (subscribe) OSIsoft AF structure. To subscribe to MQTT SparkplugB, OSIsoft offers a compliant MQTT connector.
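A rough sketch of how an asset path might be split between a SparkplugB topic and a metric name. The SparkplugB topic has the form `spBv1.0/group_id/message_type/edge_node_id/device_id`. How the asset levels are assigned to those IDs is an assumption made here for illustration (other mappings are possible), and the Protobuf encoding of the actual payload is left out.

```python
SPARKPLUG_NAMESPACE = "spBv1.0"
MESSAGE_TYPES = {"NBIRTH", "NDEATH", "DBIRTH", "DDEATH", "NDATA", "DDATA", "NCMD", "DCMD"}

def split_asset_path(asset_path, attribute, message_type="DDATA"):
    """Split an asset path into a SparkplugB topic and a metric name.

    Assumed mapping: the first three levels become group ID, edge node ID
    and device ID; the remaining levels are carried in the metric name.
    """
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {message_type}")
    levels = asset_path.split("/")
    if len(levels) < 3:
        raise ValueError("need at least three levels for group, edge node and device")
    group_id, edge_node_id, device_id = levels[:3]
    topic = "/".join(
        [SPARKPLUG_NAMESPACE, group_id, message_type, edge_node_id, device_id]
    )
    # Metric names may contain '/' to express the rest of the hierarchy.
    metric_name = "/".join(levels[3:] + [attribute])
    return topic, metric_name

topic, metric_name = split_asset_path(
    "Enterprise A/Site A/Area A/Process Cell A/Bio Reactor 0", "Temperature"
)
print(topic)
print(metric_name)
```

Joining the device-level part of the topic with the metric name recovers the original asset path, so a subscriber can rebuild the same hierarchy as in the JSON case.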

Summary

The MQTT SparkplugB standard together with the unified namespace concept is an efficient way to exchange sensor or other time series data. There are, however, a few limitations that need to be considered:

Despite the above limitations, an edge node can readily publish an ISA-95 compliant OSIsoft AF asset structure into the unified namespace. The time-consuming task of mapping OPC or PLC tags to human-readable asset paths was already completed when the OSIsoft AF system was set up and configured. An additional benefit is that equipment-centric calculations can be streamed to the MQTT broker and consumed by MES or ERP systems.

When MQTT SparkplugB data are consumed by the OSIsoft AF system, the equipment-centric data stream can be deserialized into an asset structure.

This can lead to significant savings in the integration task, which normally requires the tedious step of mapping the automation layer into the IT layer.

There is an added effort to contextualize the data and add additional abstraction layers such as base classes/inheritance. Frames and transfers must also be configured to allow for the modeling of time-based models (MVA or ML). These steps are essential to roll out MVA/ML models at scale.

MQTT SparkplugB is a real step forward in Level 1, 2 and 3 integration. Level 3+ systems as well as MVA/ML models require an extensive type system that can’t readily be flattened into the MQTT SparkplugB standard.

For now, Level 3+ and MVA/ML systems still require a fair amount of integration and configuration.

For more information, please contact us.

Multivariate Analysis (MVA) is a well-established technique to analyse highly correlated process variables. It is well known in batch processing, but is also successfully applied in discrete and continuous processing. Compared to single-variable applications such as statistical process control, MVA has been shown to be superior in detecting process drifts and upsets. In practice, the implementation of MVA requires two different data structures or models:

Event Frames are usually autogenerated from the batch execution system (BES) and reflect the logical/automation sequences for recipe execution. Both AF Elements and Event Frames are used to create MVA models and calculate statistics. Below is an example of a multivariate model that combines the autogenerated Event Frame “Unit Procedure” and the process variables in the Element “Bio Reactor 0”:

This type of analysis is typically used for batch-to-batch comparison (Hotelling's T² and SPE-X statistics) and batch evolution monitoring in the pharmaceutical, biotech and chemical industries.
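To give a feel for the T² statistic, the following is a minimal two-variable sketch of Hotelling's T², computed directly on the raw variables with the standard library. Commercial MVA tools typically compute it on the scores of a latent-variable model instead; the reference data and the function name here are hypothetical.

```python
from statistics import mean

def hotelling_t2(reference, observation):
    """Hotelling's T² of a two-variable observation against reference data."""
    xs = [p[0] for p in reference]
    ys = [p[1] for p in reference]
    mx, my = mean(xs), mean(ys)
    n = len(reference)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in reference) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = observation[0] - mx, observation[1] - my
    # d^T S^-1 d, using the closed-form inverse of a 2x2 matrix.
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical (temperature, concentration) pairs from good batches;
# the two variables are strongly correlated.
reference = [(36.8, 1.10), (37.0, 1.20), (37.2, 1.31),
             (37.1, 1.24), (36.9, 1.15), (37.3, 1.35)]

t2_on_trend = hotelling_t2(reference, (37.05, 1.22))
t2_off_trend = hotelling_t2(reference, (37.3, 1.10))
print(t2_on_trend, t2_off_trend)
```

The second observation lies inside the range of each variable taken on its own, so single-variable limits would not flag it. It breaks the correlation between the two variables, however, and that is what drives its T² value up.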

Challenge

One of the shortcomings of using automation phases is that they seldom line up with the time frames that are critical for the underlying process evolution (process phases). Often there is a mismatch in granularity: process phases are either longer or shorter in duration than the automation phases. Also, the start and end might be based on specific process conditions, for example temperature, batch maturity, online measurements and others. The mismatch between automation and process phases causes misalignment in the MVA model and a broadening of the process control envelopes. The resulting models are often not optimal.
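The idea of a process phase defined by a process condition rather than by an automation step can be sketched generically. The function below is a hypothetical illustration, not the API of any particular product: it returns the time segments during which a condition on the measured value holds.

```python
def find_segments(samples, condition):
    """Return (start, end) timestamps of contiguous runs where condition holds."""
    segments = []
    start = None
    previous_ts = None
    for ts, value in samples:
        if condition(value):
            if start is None:
                start = ts  # the condition just became true: open a segment
        elif start is not None:
            # The condition just became false: close at the last matching sample.
            segments.append((start, previous_ts))
            start = None
        previous_ts = ts
    if start is not None:
        segments.append((start, previous_ts))  # segment still open at end of data
    return segments

# Hypothetical temperature trace as (minute, degrees C) pairs.
trace = [(0, 30.0), (1, 34.5), (2, 36.8), (3, 37.2), (4, 37.1), (5, 35.0), (6, 31.0)]

# Process phase: the time during which the temperature is at or above 36.5 degrees C.
peak_phase = find_segments(trace, lambda temp: temp >= 36.5)
print(peak_phase)
```

A segment found this way follows the process itself, so its start and end shift from batch to batch together with the actual process behaviour.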

Solution

SEEQ has developed a platform that excels in creating time series segments as well as in time series data cleansing and conditioning. The platform provides several different approaches to define very precise start and end conditions. The following shows the definition of a new capsule based on a profile search that focuses solely on the process peak temperature:

These capsules can be utilized in other applications through an API and blended with other PI data models to create very precise multivariate models:

Benefits

Multivariate Analysis is a powerful method to analyze highly correlated process data. It depends on equipment/process models and time series segments. OSIsoft PI provides data models for both, and the time segments are typically populated automatically from a BES or MES system. SEEQ provides new capabilities to create highly precise time segments, called capsules, that refine the MVA analysis and create meaningful process envelopes. The integration is seamless since both systems provide powerful APIs to their time series data and models. The resulting MVA models target specific process phases and can be used to create improved process control limits or regression analyses.

Please contact us for more information.
