Stop Feeding IoT Telemetry Directly Into Your LLM
Right now, there is a massive push from enterprise leadership to connect overarching AI orchestrators—like Microsoft Copilot or custom LangChain frameworks—directly to physical operations. Executives envision the ultimate industrial conversational interface: asking an AI to diagnose a vibrating bearing on the factory floor and automatically generating a work order. It is a great vision. But it often lands on the desks of Enterprise Architects and Data teams as an impossible mission.
Non-technical stakeholders see the magic of GenAI and assume you can simply feed raw, high-frequency IoT telemetry directly into a Large Language Model. Technical teams scratch their heads and foresee a production nightmare.
We’ve seen dozens of highly capable engineering teams get trapped in the “sandbox,” struggling to reconcile management’s “Agentic AI” expectations with the harsh realities of data plumbing. They are being asked to build a bridge where the foundations don’t match. Why? Because the physics of streaming data breaks the economics of Generative AI. If you attempt to route raw data points directly into an LLM, you will likely encounter three primary barriers:
- Latency Constraints: The reasoning time required by an LLM is typically too slow to support real-time responses for critical asset failures.
- Significant Compute Costs: High-frequency telemetry ingestion leads to excessive token consumption, which can quickly make the project’s scaling costs unsustainable.
- Risk of Hallucination: Without domain-specific grounding, a generic LLM lacks the physical context to accurately interpret machine states, often leading to unreliable or “guessed” diagnostic outputs.
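To make the compute-cost barrier concrete, here is a rough back-of-the-envelope sketch. Every number in it (sensor count, sampling rate, tokens per reading, anomalies per day, and the price per million tokens) is an illustrative assumption rather than a measured figure or a vendor price, so substitute your own.

```python
# Back-of-the-envelope token estimate.
# All inputs are illustrative assumptions, not measurements or real prices.

SECONDS_PER_DAY = 24 * 60 * 60


def daily_tokens_streaming(sensors: int, hz: float, tokens_per_reading: int) -> int:
    """Tokens per day if every raw reading is sent to the LLM."""
    return int(sensors * hz * SECONDS_PER_DAY * tokens_per_reading)


def daily_tokens_event_driven(anomalies_per_day: int, tokens_per_dossier: int) -> int:
    """Tokens per day if the LLM only receives one context bundle per anomaly."""
    return anomalies_per_day * tokens_per_dossier


def daily_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count into a daily cost at a given (hypothetical) price."""
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    price = 1.0  # hypothetical USD per million input tokens
    streaming = daily_tokens_streaming(sensors=100, hz=10, tokens_per_reading=20)
    event_driven = daily_tokens_event_driven(anomalies_per_day=50, tokens_per_dossier=4_000)
    print(f"streaming:    {streaming:>13,} tokens/day, ${daily_cost(streaming, price):,.2f}")
    print(f"event-driven: {event_driven:>13,} tokens/day, ${daily_cost(event_driven, price):,.2f}")
```

Under these assumed inputs the streaming path consumes several thousand times more tokens than the event-driven path. The exact ratio matters less than the shape: streaming cost grows with sampling rate, while event-driven cost grows only with the number of anomalies.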
As IoT experts who have spent over a decade routing, processing, and managing millions of industrial data streams, our advice to architects is simple: Stop building brittle bridges between raw OT data and IT chatbots. Safely exposing physical operations to Agentic AI calls for a fundamentally different approach—an architecture that carefully separates deterministic math from contextual meaning.
A Proven Alternative: The Two-Expert Architecture
In our experience at Cumulocity, the most resilient industrial AI deployments rely on a “Two-Expert” architecture: fast, traditional Machine Learning handles the raw telemetry, and Generative AI queries contextualized data only after an anomaly occurs.
Here is the blueprint for how we recommend building this:
Expert 1: The Watchdog (Deterministic Math at Machine Speed)
Rather than placing an LLM directly in the live data stream, we advise using a deterministic engine—what we call the Watchdog. Sitting directly in the high-speed telemetry feed, this layer uses traditional rules engines and lightweight ML (via ONNX models) to crunch thousands of data points a second.
It handles complex failure patterns and subtle sensor drifts at machine speed. It asks one simple question: “Is this normal?” Because it doesn’t rely on GenAI, it costs fractions of a cent and operates in milliseconds.
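As a minimal sketch of the Watchdog idea, the class below answers “Is this normal?” with a rolling z-score over recent readings. It is a simple stand-in for the rules engines and ONNX models mentioned above, not Cumulocity’s implementation; the class name, window size, and threshold are all hypothetical choices.

```python
from collections import deque
from statistics import mean, pstdev


class Watchdog:
    """Rolling z-score detector (hypothetical stand-in for a rules engine or ML model)."""

    def __init__(self, window: int = 50, z_threshold: float = 4.0, min_samples: int = 10):
        self.history = deque(maxlen=window)  # most recent normal readings
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def is_anomalous(self, value: float) -> bool:
        anomalous = False
        # Only judge once there is enough history to form a baseline.
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.z_threshold
            else:
                # A perfectly flat baseline: any deviation counts as anomalous.
                anomalous = value != mu
        # Keep anomalies out of the baseline so they do not mask later ones.
        if not anomalous:
            self.history.append(value)
        return anomalous


if __name__ == "__main__":
    watchdog = Watchdog()
    readings = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 1.02, 5.0]
    for reading in readings:
        if watchdog.is_anomalous(reading):
            print(f"anomaly detected: {reading}")
```

The check runs in plain arithmetic over a small in-memory window, with no model call in the hot path. Only a reading flagged here would hand off to the Investigator.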
Expert 2: The Investigator (Grounded Context and Reasoning)
When the Watchdog detects an anomaly, your Agentic AI (the Investigator) wakes up. But here is the critical difference: The Investigator does not look at the raw data stream. Instead, it interacts with a unified IoT Semantic Layer via the Model Context Protocol (MCP). Because the IoT platform has already normalized the messy OT payloads into a structured, contextualized asset model, the AI knows exactly what it is looking at.
It investigates the alarm by reviewing its context: associated measurements, asset configuration changes, and related historical events. It acts as a specialized Task Agent that automatically generates a complete diagnostic dossier detailing the likely cause, potential impact, and suggested next steps for the overarching Enterprise Copilot.
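The handoff can be sketched as follows. This example deliberately leaves out the MCP transport and the Semantic Layer itself; it only illustrates that the model receives a small, structured context bundle rather than the raw stream. The `AnomalyContext` fields, the asset names, and the `ask_llm` callable are all hypothetical and do not reflect any actual Cumulocity or MCP schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass
class AnomalyContext:
    """Hypothetical context bundle assembled after the Watchdog raises an alarm."""
    asset_id: str
    asset_type: str
    alarm: str
    recent_measurements: dict  # summarized values, not raw samples
    config_changes: list = field(default_factory=list)
    related_events: list = field(default_factory=list)


def build_prompt(ctx: AnomalyContext) -> str:
    """Turn the structured context into a grounded diagnostic request."""
    return (
        "You are diagnosing an industrial asset. Using only the context below, "
        "report the likely cause, potential impact, and suggested next steps.\n"
        + json.dumps(asdict(ctx), indent=2, sort_keys=True)
    )


def investigate(ctx: AnomalyContext, ask_llm: Callable[[str], str]) -> str:
    """ask_llm is any callable mapping a prompt to a reply (assumed, not a real API)."""
    return ask_llm(build_prompt(ctx))


if __name__ == "__main__":
    ctx = AnomalyContext(
        asset_id="pump-17",
        asset_type="centrifugal pump",
        alarm="bearing vibration above baseline",
        recent_measurements={"vibration_rms_mm_s": {"mean_1h": 2.1, "latest": 7.8}},
        config_changes=["speed setpoint raised"],
        related_events=["bearing replaced"],
    )
    # A stub stands in for the real model so the sketch runs offline.
    print(investigate(ctx, ask_llm=lambda prompt: f"(stub reply to {len(prompt)} chars)"))
```

Keeping the model behind a plain callable means the same context bundle can be passed to whichever orchestrator the enterprise already uses.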
The Foundation: Grounding Intelligence in Physical Truth
The secret to this architecture isn’t just the shiny new AI model you choose; it is about combining the proven operational logic you already have in place with a robust data foundation.
You don’t need to reinvent the wheel or throw away what works. The stream processing, threshold rules, and traditional ML models your teams have built over the years are still incredibly effective at doing what they’ve always done best. By introducing a Semantic Layer, you aren’t replacing that foundation—you are supercharging it, ensuring your AI is fully grounded in your existing, domain-specific reality.
It is time to build on a platform designed for the Agentic Era. By leveraging Cumulocity’s out-of-the-box MCP server backed by our robust Semantic Layer, organizations can safely connect any external enterprise AI orchestrator to their asset data in minutes, not months.
Let what already works keep doing its job, let the fast ML watch the streams, and let the grounded AI author the solutions.
Ready to see this architecture in action? Read our recent technical update on how to implement this “Two-Expert” pattern using the new ONNX and AI Agent blocks in Cumulocity Streaming Analytics.
Want to read more about Traditional / Analytics AI versus Agentic AI? Read “Why Analytics AI and Agentic AI Are Different and Why Both Matter for Your Industrial Operations.”

