Bringing AI/ML to Cumulocity - Cloud and Edge Inferencing

While agentic AI and large language models are getting a lot of attention right now — and rightfully so — traditional AI techniques like machine learning, deep learning, and statistical modeling continue to play an equally important role, especially in the IoT and timeseries domain. Use cases such as anomaly detection, timeseries forecasting, predictive maintenance, and computer vision rely heavily on these approaches and remain central to how organizations extract value from sensor and device data.

This article focuses squarely on that space. It covers how data generated by devices registered to Cumulocity can be used to train machine learning models externally — and once those models are trained, how they can be deployed back to Cumulocity for inferencing, either in the cloud or at the edge, depending on the use case.

This is a foundation article. Follow-up posts will dive deeper into specific use cases and hands-on examples. But first, let’s get the big picture right.

The End-to-End Analytics Pipeline

Let’s start from the top. How does data flow through Cumulocity, and where does ML fit in?

It begins with data ingestion. Devices send telemetry data — things like temperature, vibration, pressure — either through thin-edge (running locally on the device) or directly to the Cumulocity cloud/on-prem environment.

This operational data is typically short-term and high-frequency. For long-term retention, it gets regularly exported or archived into a data lake — for example, using DataHub. Having months or even years of historical data in one place is incredibly valuable when it comes to training machine learning models.

From the data lake, a data scientist or AI developer pulls the data into their ML training environment. This step happens outside Cumulocity. Developers are free to use whatever framework they prefer — TensorFlow, PyTorch, Scikit-learn, you name it. There are no restrictions here.
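As a rough illustration, here's a minimal sketch of that step, assuming DataHub has offloaded measurements to the data lake as Parquet files; the bucket path, column names, and model choice are hypothetical:

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# DataHub offloads operational data to the data lake in Parquet format.
# The path and feature columns below are placeholders for your own setup
# (reading from S3 additionally requires the s3fs package).
df = pd.read_parquet("s3://my-datalake/measurements/2024/")
features = df[["temperature", "vibration", "pressure"]].dropna()

# Train an unsupervised anomaly detector on the historical telemetry
model = IsolationForest(contamination=0.01, random_state=42)
model.fit(features)
```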

Once the model is trained and validated, it gets packaged and deployed back into Cumulocity’s operational environment. And this is where it gets interesting, because we have two deployment options depending on your use case:

  1. Cloud inference — The model runs in the cloud. Streaming Analytics sends incoming device data to the model for live inference.

  2. Edge inference — The model runs directly on the device. This is ideal for scenarios with limited connectivity or resource-constrained hardware.

Cumulocity offers different repositories suited for different operational contexts — the File Repository for managing assets in the cloud environment, and the Software Repository and Configuration Repository for managing packages and configurations on edge devices.

In both cases, the loop is closed. Incoming data feeds the model, the model produces results, and those results drive actions — like raising alarms, creating events, generating tickets, or feeding labeled data back into the data lake for continuous improvement.

Why ONNX?

Before diving into the cloud and edge architectures, there’s one important building block to talk about: ONNX — the Open Neural Network Exchange.

Here’s the challenge: AI developers train models using different frameworks. One team might use TensorFlow, another prefers PyTorch. If every framework had to be supported natively inside Cumulocity, things would get complicated fast.

ONNX solves this elegantly. It’s an open-source, interoperable standard for representing machine learning models. You train your model in whatever framework you like, then convert it to the ONNX format. After that, you only need a single ONNX Runtime to execute it — regardless of how it was originally built.
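For example, a scikit-learn model can be converted with the skl2onnx package. A minimal sketch — the model and input shape here are stand-ins:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# A small stand-in model; in practice this is your trained model
model = IsolationForest(random_state=42).fit(np.random.rand(100, 3))

# Declare the input name and shape the ONNX Runtime will see at inference time
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, 3]))]
)
with open("anomaly_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```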

The other big advantage is portability. You can train a model once and deploy the same ONNX model to the cloud. Or you can optimize and quantize it to run efficiently on edge devices. One model format, multiple deployment targets.
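One possible route for the edge case is ONNX Runtime's own quantization tooling, shown here as a sketch:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Store weights as 8-bit integers to shrink the model for constrained hardware
quantize_dynamic(
    model_input="anomaly_model.onnx",
    model_output="anomaly_model.int8.onnx",
    weight_type=QuantType.QInt8,
)
```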

That’s why Cumulocity uses ONNX Runtime as its standard for ML inferencing. It keeps things simple, flexible, and framework-agnostic.

More details on the ONNX standard can be found on the ONNX project site (onnx.ai).

AI/ML Inferencing in the Cloud

Let's start with the cloud-based approach.

A device connected to Cumulocity generates telemetry data, which lands in the operational data store. For long-term needs, this data is archived into a data lake. From there, it’s used in the ML training environment — covering everything from data labeling and feature engineering to model training and validation. The final trained model is exported in ONNX format.

Now, to bring this model into Cumulocity, you simply upload it to the File Repository. The File Repository already supports file management with versioning, so your models are tracked and managed just like any other operational asset. No new tooling needed.
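Under the hood, the File Repository is backed by the standard inventory binaries REST API, so the upload can also be scripted. A minimal sketch with placeholder tenant URL and credentials:

```python
import json
import requests

C8Y = "https://<tenant>.cumulocity.com"       # placeholder tenant URL
AUTH = ("<tenant>/<username>", "<password>")  # placeholder credentials

meta = {"name": "anomaly_model.onnx", "type": "application/octet-stream"}
with open("anomaly_model.onnx", "rb") as f:
    resp = requests.post(
        f"{C8Y}/inventory/binaries",
        auth=AUTH,
        files={
            "object": (None, json.dumps(meta), "application/json"),
            "file": ("anomaly_model.onnx", f, "application/octet-stream"),
        },
    )
resp.raise_for_status()
print("Model uploaded:", resp.json()["self"])
```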

Next comes the inference pipeline, built using Streaming Analytics in a low-code way. A typical ML pipeline has three steps (a Python sketch of the equivalent logic follows below):

Pre-processing — Raw sensor data rarely matches what the model expects. You might need to normalize values, compute rolling averages, or reshape the data. Such custom logic can be implemented within Streaming Analytics using the Smart Function Block.

Inference — The pre-processed data is passed to the ONNX model. The ONNX Block loads the model directly from the File Repository and runs the inference.

Post-processing — Based on the model’s output, you define what happens next using the Smart Function Block. If an anomaly is detected, you can generate alarms or events. You can create new measurements for visualization on dashboards. You can trigger external actions like raising a ticket in a third-party system. And you can send labeled data back to the data lake — feeding the loop for continuous model improvement.
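In Streaming Analytics these steps are wired together low-code with the blocks above. Purely for illustration, here's what the equivalent logic looks like in plain Python with ONNX Runtime; the model file, scaling constants, and anomaly label convention are assumptions based on the IsolationForest example earlier:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("anomaly_model.onnx")
input_name = session.get_inputs()[0].name

# Hypothetical statistics from the training data, used for normalization
FEATURE_MEAN = np.array([21.0, 0.02, 101.0], dtype=np.float32)
FEATURE_STD = np.array([3.0, 0.01, 1.5], dtype=np.float32)

def score_sample(raw_reading):
    # Pre-processing: scale the raw reading the same way the model was trained
    x = (np.asarray([raw_reading], dtype=np.float32) - FEATURE_MEAN) / FEATURE_STD
    # Inference: run the ONNX model
    outputs = session.run(None, {input_name: x})
    # Post-processing: skl2onnx exports IsolationForest with a label output,
    # where -1 marks an anomaly
    return outputs[0][0]

if score_sample([29.7, 0.09, 99.1]) == -1:
    print("Anomaly detected -> raise an alarm / create an event")
```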

The key message here: your existing Cumulocity infrastructure — the File Repository, Streaming Analytics, Smart Functions — is what powers the ML pipeline. You bring your trained model, and Cumulocity provides the operational framework to run it.

You can find more details on the Smart Function Block in the Streaming Analytics documentation.

AI/ML Inferencing on the Edge

Now shifting to edge inferencing, which takes a slightly different approach. Here, the inference happens on the device itself — not in the cloud.

For this, a generic ONNX Pipeline Runner has been developed and will be made available as an open-source solution via GitHub. It’s packaged as a standard Debian package and delivered through Cumulocity’s Software Repository. Once installed on the device via the Software tab, it becomes a reusable inference engine on the edge.

The important word here is generic. The runner is not tied to any specific use case. It doesn’t care whether you’re doing thermal imaging, anomaly detection, or predictive maintenance. It simply executes a standard three-step pipeline: pre-processing, ONNX inference, and post-processing.

So how does it know what to do? Through configuration files. The runner expects three components — the pre-processing logic, the ONNX model, and the post-processing logic — delivered as configuration files. These files are managed in Cumulocity using the Configuration Repository and can be pushed directly to the device.

This design gives you real flexibility. Working on a thermal analytics use case? Push the corresponding configuration files to the device. Later, want to switch to anomaly detection? Just push a different set of files. The runner itself stays the same — no redeployment needed.

Think of it this way: the runner is the engine, and the pipeline is defined by whatever configuration you feed it.
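The actual file formats and paths will be defined by the open-source runner once it's published. Purely to make the idea concrete, a generic engine of this kind could look roughly like this — everything here is hypothetical:

```python
import importlib.util
import onnxruntime as ort

def load_step(path, name):
    # Load a pre-/post-processing step delivered as a configuration file
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# The three components pushed to the device via the Configuration Repository
# (paths and file layout are hypothetical)
pre = load_step("/etc/onnx-runner/preprocess.py", "pre")
post = load_step("/etc/onnx-runner/postprocess.py", "post")
session = ort.InferenceSession("/etc/onnx-runner/model.onnx")

def run_pipeline(raw_sample):
    inputs = pre.transform(raw_sample)   # step 1: pre-processing
    outputs = session.run(None, inputs)  # step 2: ONNX inference
    return post.handle(outputs)          # step 3: post-processing
```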

The result? ML inference runs locally on the edge. Only the results — events, measurements, alerts — are sent back to the cloud. This keeps bandwidth low and response times fast.
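Sending a result back can be as lightweight as a single MQTT message. A sketch using Cumulocity's static SmartREST template 400 (create event), with placeholder credentials:

```python
import paho.mqtt.client as mqtt

client = mqtt.Client(client_id="edge-device-01")  # paho-mqtt 1.x constructor
client.username_pw_set("<tenant>/<device-user>", "<password>")
client.connect("<tenant>.cumulocity.com", 1883)

# SmartREST static template 400: create an event (event type, event text)
client.publish("s/us", "400,ml_anomaly,Anomaly detected by edge pipeline")
client.disconnect()
```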

And again, notice the pattern: you’re using Cumulocity’s existing repositories (Software Repository for the runner, Configuration Repository for the pipeline files) to manage everything. No separate ML infrastructure needed on the device side.

Wrapping Up

Here’s the key takeaway:

Train your models using your own preferred tools and frameworks. Then manage and run those models on Cumulocity — making ML a natural part of your analytics pipeline.

There’s no need to change how models are built. Cumulocity meets developers where they are. Whether deploying to the cloud or to the edge, the platform’s existing repositories and tools handle model management, versioning, and pipeline orchestration.

This article covered the foundational architecture. Upcoming posts will get hands-on with specific use cases — walking through real examples of cloud and edge inferencing step by step.

Stay tuned!
