IN-DEPTH TECH

16 min read

by Asheesh Goja

Published on 03/21/2022

Last updated on 02/28/2025

The Architect's Guide to the AIoT - Part 1

"All you really need to know for the moment is that the AIoT is a lot more complicated than you might think, even if you start from a position of thinking it’s pretty damn complicated in the first place." - Inspired by HG2G

Introduction

Cloud computing, artificial intelligence, and internet-connected devices are the indispensable technological pillars of contemporary digital society. However, a greater untapped potential, one that can usher in the next generation of digital transformation and innovation, lies latent at the convergence of these technologies.

"The combined power of AI and IoT collectively referred to as the Artificial Intelligence of Things or AIoT, promises to unlock unrealized customer value in a broad swath of industry verticals such as edge analytics, autonomous vehicles, personalized fitness, remote healthcare, precision agriculture, smart retail, predictive maintenance, and industrial automation."

In principle, combining AI with IoT seems to be the obvious, logical progression in the evolution of these technologies. In practice, though, building an AIoT solution is fraught with seemingly insurmountable architectural and engineering challenges. In this three-part series, I will discuss these challenges in detail and address them by proposing an overarching architectural framework. I hope this series gives you the architectural context and perspective needed to build an industrial-grade, scalable, and robust AIoT application. Here is the series breakdown:

Part 1: AIoT Architecture - In this part, you will get a thorough grounding in the AIoT problem space, understand the inherent challenges, and investigate emergent behaviors. I will present a set of effective solution patterns that address these challenges, along with a comprehensive reference architecture. The reference architecture will serve as a cognitive map of the hitherto uncharted territory of AIoT architectures. It will help you pair AIoT problem scenarios with applicable solution patterns and viable technology stacks.

Part 2: AIoT Infrastructure - Here, using the reference architecture, you will see how to establish an edge infrastructure for an AIoT application. The infrastructure is built using various CNCF open-source projects from the Kubernetes ecosystem, such as K3s, Argo, Longhorn, and Strimzi. You will see how to configure and install these projects on a cluster of single-board computers equipped with AI accelerators, such as the NVIDIA® Jetson Nano™ and the Google Coral Edge TPU™.

Part 3: AIoT Design - In the concluding part, you will see how to design and build an AIoT application that simulates an industrial predictive maintenance scenario. In this scenario, analog sensors monitor an induction motor by sensing its power utilization, vibration, sound, and temperature, and this data is then processed by an AIoT application. The application, powered by a TPU accelerator, applies a logistic regression model to predict and prevent motor breakdown. You will see how ML pipelines measure drift, re-train, and re-deploy the model. Using design artifacts such as event diagrams and deployment topology models, you will get an in-depth view of the system's design. You will find ample code and configuration samples in C++, Go, Python, and YAML. These samples show how to configure, code, build (ARM64 compatible), containerize (distroless), deploy, and orchestrate AIoT modules and services as MLOps pipelines across heterogeneous infrastructure tiers. This part also includes IoT device firmware code along with circuit schematics.

The Problem - "Illusion of simplicity"

Building a “hello world” AIoT application is simple: train a model in the cloud, embed it in a device, simulate some sensor data, perform inferences, blink a few LEDs, and you are done. However, this simplicity is illusory. Engineering a “real world” AIoT solution is a different ballgame altogether, with an order of magnitude more complexity, and it requires deep technical know-how that spans multiple domains of electrical engineering and computer science. In designing a “real world” AIoT solution, one encounters a myriad of challenges that necessitate a careful examination of various problem scenarios, emergent behaviors, conflicting requirements, and tradeoffs. Let's discuss the architecturally significant ones in more detail.

Emergent Operational Complexity

AI- and IoT-based solutions often incorporate dissimilar design principles, industry standards, development methodologies, security controls, and software/firmware delivery pipelines. They run on heterogeneous computational platforms, operating systems, and network topologies, and they exhibit a broad range of computing, storage, bandwidth, and energy utilization capabilities. This disparity between the hardware and software of AI and IoT systems results in significant emergent operational complexity when they are combined in an AIoT solution.

[Figure: Emergent Operational Complexity]

Embedding a trained model and running inferences on an edge device is a relatively simple problem to solve. However, in the real world, a deployed model often drifts, which requires drift monitoring, re-training, and re-deployment. Data quality and timeliness are essential for drift detection, necessitating continuous sensor data collection, processing, validation, and training. Updated ML models then need to be re-deployed to the IoT devices using continuous delivery pipelines. Hence, the lifecycle of an AIoT application includes both ML- and IoT-related build, test, and deploy toolchains and processes, and one needs to account for the entire end-to-end operation of an AIoT solution, encompassing software development, delivery, security, and monitoring.
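
As a rough illustration of drift monitoring, the sketch below compares the distribution of one sensor feature at training time with a recent post-deployment window using the Population Stability Index (PSI). PSI, the ten-bin layout, and the 0.2 alert threshold are assumptions chosen for illustration (the threshold is a common rule of thumb), not part of the reference implementation.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `current` against a `reference` sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference signal

    def fractions(sample: Sequence[float]) -> list:
        counts = [0] * bins
        for x in sample:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drifted(reference: Sequence[float], current: Sequence[float], threshold: float = 0.2) -> bool:
    # 0.2 is an assumed rule-of-thumb cut-off, not a universal constant.
    return psi(reference, current) > threshold
```

Here `reference` would be the feature values the model was trained on and `current` the latest window of readings; a positive `drifted` result is the kind of signal that would kick off the re-training and re-deployment steps.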

Computational Complexity

The computational complexity, both space and time, of learning algorithms differs significantly from that of inference. To illustrate this point, let's look at the complexity of the logistic regression algorithm in this table:

[Table: Computational Complexity]

Notice the training time complexity of logistic regression using Newton-Raphson optimization versus its inference time. The training complexity is polynomial, while the inference complexity is linear. For example, a resource-constrained device does not have the computational power to train a logistic regression model but can easily handle logistic regression inference. Conversely, an AI-accelerated device (say, one with an onboard GPU) might be overkill, from both a cost and a computational power perspective, if used just for inference. This is an important consideration that needs to be accounted for architecturally.
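
To make the asymmetry concrete, here is a minimal pure-Python sketch of logistic regression trained with Newton-Raphson and then used for inference. Each training iteration builds a d x d Hessian over all n samples, roughly O(n*d^2), and solves a linear system, O(d^3), while inference is a single O(d) dot product. The small ridge term `lam` is an assumption added to keep the Hessian invertible; it is not part of the original discussion.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def solve(a: list, b: list) -> list:
    """Solve a x = b by Gaussian elimination with partial pivoting: O(d^3)."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):
        x[r] = (m[r][d] - sum(m[r][c] * x[c] for c in range(r + 1, d))) / m[r][r]
    return x


def train_newton(xs: list, ys: list, iters: int = 10, lam: float = 1e-3) -> list:
    """Training: per iteration, O(n*d^2) to build the Hessian plus O(d^3) to solve."""
    d = len(xs[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [lam * wj for wj in w]
        hess = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for i in range(d):
                grad[i] += (p - y) * x[i]
                for j in range(d):
                    hess[i][j] += p * (1 - p) * x[i] * x[j]
        step = solve(hess, grad)  # Newton step: H^-1 g
        w = [wj - sj for wj, sj in zip(w, step)]
    return w


def predict(w: list, x: list) -> float:
    """Inference: one dot product and a sigmoid, O(d)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
```

Only `predict` and the learned weight vector need to live on a constrained device; `train_newton` is the part that belongs on a tier with more compute.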

Resource Constraints

The computational complexity of ML tasks quickly overwhelms resource-constrained devices that have limited energy, memory, compute, and storage capacity. Most ML frameworks are too heavyweight for embedded devices. Standard hardware-agnostic performance metrics, such as FLOPS and multiply-accumulate operations (MACs), lack the fidelity to measure real performance on a particular edge ML device. Optimization strategies targeted at such hardware can introduce errors that erode model efficacy. Compute-intensive inferences can also starve IoT devices and interfere with real-time sensing and actuation subroutines.

Security and Privacy

Deriving actionable and meaningful insight from the data collected by AIoT devices requires processing and analyzing the sensor data on the edge tier. However, such data often has to stay on the device for privacy and security reasons. Edge devices lack the physical security guarantees of a data center, and a single compromised edge node can significantly widen the scope of a security breach. Low-energy and low-bandwidth IoT protocols are particularly prone to such attacks. Applying appropriate security controls is therefore essential to ensure data security and privacy. However, this creates a particularly intractable set of requirements, as computation-intensive security controls compete for power, resources, and bandwidth on devices that are inherently resource-constrained.

Latency Constraints

Autonomous vehicles, robotics, and industrial automation often require instant action through low-latency “sense, decide, and act” real-time loops. Even with the ML logic embedded on the device, the context needed to make a decision often requires the IoT device to communicate frequently with the edge tier. This makes closed-loop, AI-enabled decisions particularly challenging in real-world scenarios.

The Solution - “AIoT Patterns”

To address these challenges in their entirety, one needs to take a holistic view of the problem space and uncover a set of recurring problems that span both the AI and IoT domains. My approach to expressing the solution is based extensively on the language of patterns. Various architectural and design patterns can be quite effective in managing the complexity of running the entire AIoT solution on the edge tier. Embedded ML patterns can also help address device resource constraints. Running the entire ML pipeline on the edge tier, closer to the sensors, minimizes or eliminates the dependency on the cloud tier. This can vastly reduce network latency and address security concerns.

Application Architecture Patterns

Tiered Infrastructure

Manage complexity by creating a clear separation of concerns using a tiered architecture. Partition the infrastructure into tiers to separate training from inference and data acquisition activities. This allows each tier to be scaled, power-managed, and secured independently. As you will see in the subsequent sections, separating inference from learning and running them on separate tiers allows training jobs to run on AI-accelerated hardware such as GPUs or TPUs, while inference jobs run on resource-constrained hardware. This separation also minimizes the power demands on battery-powered hardware, as the energy-intensive training jobs can now run on a dedicated tier of devices with wired AC/DC power.

Event-driven architecture

Process high-volume, high-velocity IoT data in real time with minimal latency and maximum concurrency using messages and event streams. Allow continuous flow, interpretation, and processing of events, while minimizing temporal coupling between sensor data producers and consumers. This pattern facilitates a loosely coupled structure and organization of services on heterogeneous computational platforms. It also enables each service to scale and fail independently, creating clear isolation boundaries.

Event Streaming for ML

Establish a durable and reliable event streaming mechanism for communication between the services involved in training, inference, and orchestration. Command and data messages persist as streams and are ordered within a partition. Consumers can process the streams as events occur or retrospectively, and they can join a stream at any time to replay, ignore, or asynchronously process past messages.
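
A toy, in-memory model of these streaming semantics: events with the same key land in the same partition, are ordered by offset within it, and stay available so a consumer can start from any offset and replay history. The class and method names are hypothetical, and the sketch only loosely mirrors the partition/offset model of brokers such as Apache Kafka, which Strimzi runs on Kubernetes.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Toy append-only event log with keyed partitions and consumer offsets."""

    partitions: int = 3
    _data: dict = field(default_factory=lambda: defaultdict(list), init=False, repr=False)

    def produce(self, key: str, value: dict) -> tuple:
        # A stable hash sends every event with the same key to the same partition,
        # which is what preserves per-key ordering.
        p = zlib.crc32(key.encode()) % self.partitions
        self._data[p].append(value)
        return p, len(self._data[p]) - 1  # (partition, offset)

    def consume(self, partition: int, offset: int = 0) -> list:
        # Events are retained, so any consumer can start from any offset,
        # including offset 0 to replay the full history.
        return list(self._data[partition][offset:])
```

A consumer that joins late simply calls `consume(partition, 0)` to replay, or resumes from the last offset it committed.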

Publish and Subscribe for IoT

Establish lightweight, bandwidth-efficient pub/sub messaging to communicate with the IoT devices. Such messages cannot be replayed or retransmitted once received. A new subscriber will not receive any past messages, and message order is not guaranteed.
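
For contrast, a fire-and-forget pub/sub broker keeps no history: a message goes only to whoever is subscribed at that instant. The class below is a hypothetical in-memory stand-in. It delivers synchronously and in order, so it does not model the loss or reordering a real networked IoT transport may exhibit.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class PubSub:
    """Toy broker: nothing is stored, so nothing can be replayed."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[bytes], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        # Deliver only to current subscribers; .get() avoids creating empty topics.
        handlers = self._subs.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)  # 0 means the message was silently dropped
```

A subscriber added after a publish never sees that earlier message, which is exactly the gap the event streaming side is meant to fill.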

Protocol Bridge

Bridge the two event-driven patterns by converting the pub/sub messages into event streams and vice versa.
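
A minimal sketch of the bridge, assuming a hypothetical topic convention of `devices/<id>/telemetry` for inbound data and `devices/<id>/commands` for outbound commands, with JSON payloads. Inbound pub/sub messages are appended to a named stream; outbound command events from a stream become device-specific pub/sub messages, queued here in an outbox rather than sent over a real transport.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple


class Bridge:
    """Toy bridge between fire-and-forget pub/sub messages and append-only streams."""

    def __init__(self) -> None:
        self.streams: Dict[str, List[dict]] = defaultdict(list)
        self.outbox: List[Tuple[str, bytes]] = []  # pub/sub messages awaiting delivery

    def on_pubsub_message(self, topic: str, payload: bytes) -> None:
        # Inbound: "devices/<id>/<kind>" is appended to the stream named <kind>.
        _, device_id, kind = topic.split("/")
        self.streams[kind].append({"device": device_id, **json.loads(payload)})

    def on_stream_event(self, event: dict) -> None:
        # Outbound: a command event is routed to the target device's command topic.
        topic = f"devices/{event['device']}/commands"
        self.outbox.append((topic, json.dumps(event["command"]).encode()))
```

Tagging each streamed record with the originating device ID keeps it meaningful once it leaves the device-specific topic namespace.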

Streaming API sidecar

Use the sidecar pattern to isolate and decouple embedded inference from communication with event streams. This keeps the inference modules lean and portable, with minimal dependencies, which is ideal for constrained device deployments.
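
The split can be sketched as two functions: an inference module that is pure computation with no messaging dependencies, and a sidecar that owns stream consumption and result publication. In a Kubernetes deployment the two would typically run as separate containers in the same pod, talking over localhost; here they are plain functions, and the scoring formula is a hypothetical placeholder rather than a trained model.

```python
from typing import Callable, Dict, Iterable, Iterator


def infer(reading: Dict[str, float]) -> Dict[str, object]:
    """Inference module: pure computation, no messaging or network dependencies."""
    # Hypothetical linear score standing in for a real embedded model.
    score = 0.8 * reading["vibration"] + 0.2 * reading["temp"] / 100
    return {"anomaly": score > 0.5, "score": score}


def sidecar(stream: Iterable[dict], model: Callable[[dict], dict]) -> Iterator[dict]:
    """Sidecar: consumes events, calls the model, and shapes results for publishing."""
    for event in stream:
        result = model(event["payload"])
        yield {"key": event["key"], **result}
```

Because `infer` never touches the stream API, it can be swapped for a quantized model or rebuilt for a different target without changing the sidecar.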

Embedded ML Patterns

ML techniques for constrained devices

Various techniques to adapt the model architecture and reduce its complexity and size can be quite effective in minimizing resource utilization. Here are a few examples:

Model partitioning
Caching
Early stopping/termination
Data compression/sparsification
Patch-based inference, such as MCUNetV2

Model Compression

Compressing the model can significantly reduce the inference time and consequently minimize resource consumption. In the reference implementation, I will be using quantization to compress the model.
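
Quantization here means mapping float32 weights to 8-bit integers, which cuts weight storage by 4x. The sketch below uses the common affine scheme `real ≈ scale * (q - zero_point)` with a single scale per tensor; it is an illustration under those assumptions, not the exact quantizer used in the reference implementation.

```python
def quantize(values: list) -> tuple:
    """Per-tensor affine int8 quantization: real ~= scale * (q - zero_point)."""
    lo = min(min(values), 0.0)  # the range must include 0 so that 0.0 is exact
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 or 1.0  # 256 int8 levels span [lo, hi]
    zero_point = round(-128 - lo / scale)  # the int8 code that represents 0.0
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q: list, scale: float, zero_point: int) -> list:
    return [(qi - zero_point) * scale for qi in q]
```

Each restored value lands within about one quantization step (`scale`) of the original; that residual is the kind of error that, as noted under Resource Constraints, can erode model efficacy.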

Binarized Neural Networks

Binarizing weights and activations to only two values (1, -1) can improve performance and reduce energy utilization. However, the use of this strategy needs to be carefully weighed against the loss of accuracy.
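
Binarization pays off because a dot product of two {+1, -1} vectors reduces to bit operations: pack each sign into a bit, XOR the packed words, and count mismatches, so the result is n - 2 × mismatches. The helper names below are hypothetical; this is a sketch of the idea rather than an optimized kernel.

```python
def binarize(values: list) -> int:
    """Pack signs into an int: bit i is 1 if values[i] >= 0 (+1), else 0 (-1)."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed {+1, -1} vectors of length n."""
    mismatches = bin(a_bits ^ b_bits).count("1")  # positions where the signs differ
    return (n - mismatches) - mismatches  # matches contribute +1, mismatches -1
```

On hardware, the XOR and popcount run over whole machine words at once, which is where the speed and energy savings come from; the sign-only representation is also where the accuracy loss comes from.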

DSP

Using digital signal processing close to the point of data acquisition can significantly improve the signal-to-noise ratio and eliminate inconsequential data. In industrial IoT scenarios, training a model on raw sensor data tends to train it on the noise rather than the signal. Transforms such as the Fourier, Hilbert, and wavelet transforms can vastly improve both training and inference efficiency.
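
As a small DSP example, the function below removes the DC offset from a window of vibration samples and returns the frequency of the strongest spectral component. It uses a naive O(n^2) discrete Fourier transform for readability, whereas a real device would use an FFT; the function name and signature are hypothetical.

```python
import math


def dominant_frequency(samples: list, sample_rate_hz: float) -> float:
    """Return the frequency (Hz) of the strongest non-DC DFT bin."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove the DC offset first
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):  # positive frequencies only
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return best_k * sample_rate_hz / n  # bin index -> Hz
```

A feature like this (a handful of numbers per window instead of hundreds of raw samples) is what would be fed to training and inference.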

Multi-stage inference

Perform closed-loop, low-latency inference for anomaly detection and intervention at the edge, close to the point of data acquisition. Use context-specific inference for predictive analytics at an aggregate level. In the reference implementation, these are referred to as "Level 1" and "Level 2" inferencing, respectively.
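
The two levels can be sketched as follows. Level 1 is a cheap per-sample check that can run on the device inside the real-time loop; Level 2 looks at an aggregated window plus context on the inference tier. The thresholds, the `load` context field, and the returned action names are all hypothetical.

```python
from statistics import mean
from typing import Dict, List


def level1(reading: float, limit: float = 0.8) -> bool:
    """Level 1 (device): constant-time threshold check for immediate intervention."""
    return reading > limit


def level2(window: List[float], context: Dict[str, str]) -> str:
    """Level 2 (inference tier): context-specific analysis over an aggregated window."""
    trend = window[-1] - window[0]
    if context.get("load") == "high" and trend > 0.2:
        return "schedule_maintenance"  # rising readings under heavy load
    return "alert" if mean(window) > 0.6 else "ok"
```

Keeping Level 1 this simple is what lets it meet the actuation deadline, while Level 2 can afford richer context because it runs off the critical path.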

MLOps Patterns

Reproducibility Pattern - Containerize workloads, Pipeline execution

Package ML tasks such as ingest, extract, drift detection, train, etc., and related dependencies as containerized workloads. Use container orchestration to manage the workload deployments. Use container workflow pipelines to automate continuous training, evaluation, and delivery.

AI Accelerator aware orchestration strategy

Use AI accelerator aware workload placement strategies to ensure workloads that require AI acceleration are placed on appropriate computational hardware.
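
A simplified placement filter, assuming each node advertises its accelerators and free memory: keep only nodes that provide every accelerator a workload needs and have enough memory, then prefer the node with the fewest unused accelerators so that plain inference jobs do not occupy GPU or TPU nodes. The data model is hypothetical; in Kubernetes, the same intent is usually expressed declaratively with node labels, affinity rules, and extended resources rather than hand-written code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    accelerators: Set[str] = field(default_factory=set)
    free_mem_mb: int = 0


@dataclass
class Workload:
    name: str
    needs: Set[str] = field(default_factory=set)
    mem_mb: int = 0


def place(workload: Workload, nodes: List[Node]) -> Optional[str]:
    """Return the name of the best-fitting node, or None if nothing fits."""
    fits = [
        n for n in nodes
        if workload.needs <= n.accelerators  # every required accelerator is present
        and n.free_mem_mb >= workload.mem_mb
    ]
    if not fits:
        return None
    # Prefer nodes with the fewest accelerators the workload would leave idle,
    # then the most free memory.
    best = min(fits, key=lambda n: (len(n.accelerators - workload.needs), -n.free_mem_mb))
    return best.name
```

The tie-breaking rule encodes the cost argument made earlier: an accelerator is wasted on a workload that only needs inference.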

Edge Learning

Bring the entire learning pipeline to the edge tier, eliminating the dependency on the cloud tier. Run and manage ML tasks such as extract, drift detection, training, validation, and model compression on the edge tier.

Directed Acyclic Graphs

Express the desired state and flow of the ML tasks and their dependencies as directed acyclic graphs (DAG). Use a container workflow engine to achieve the desired state and flow.
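
The ordering a workflow engine derives from such a DAG can be illustrated with Kahn's topological sort. The task names below are assumptions based on the ML tasks listed earlier; an engine such as Argo Workflows resolves this ordering for you from a declarative spec.

```python
from collections import deque
from typing import Dict, List, Set


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Kahn's algorithm: order tasks so each runs after all of its dependencies.

    `deps` maps a task to the set of tasks it depends on.
    Raises ValueError if the graph contains a cycle, i.e. it is not a DAG.
    """
    tasks = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {t: len(deps.get(t, ())) for t in tasks}
    dependents: Dict[str, List[str]] = {t: [] for t in tasks}
    for task, ds in deps.items():
        for d in ds:
            dependents[d].append(task)
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))  # sorted for determinism
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in sorted(dependents[task]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError("cycle detected: the pipeline is not a DAG")
    return order
```

For a pipeline where `train` depends on both `extract` and `detect_drift`, the sort guarantees training only starts once both have finished, and a cyclic definition is rejected up front.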

Automated container orchestration

Use declarative automation to deploy, manage and monitor containerized workloads across various edge infrastructure tiers.
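
Declarative automation rests on a reconciliation loop: compare the declared desired state with the observed state and emit corrective actions until they match. The sketch below shows a single reconciliation pass over replica counts; the workload names and action tuples are hypothetical, and a real orchestrator runs this continuously.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """One reconciliation pass: diff desired vs observed replica counts."""
    actions = []
    for name in sorted(set(desired) | set(actual)):
        want, have = desired.get(name, 0), actual.get(name, 0)
        if want > have:
            actions.append(("start", name, want - have))
        elif want < have:
            actions.append(("stop", name, have - want))
    return actions
```

Anything running but not declared is stopped, and anything declared but missing is started, so the operator only ever edits the desired state.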

Formalizing AIoT patterns in a reference architecture is an effective strategy to decompose the problem space, identify recurring scenarios and apply repeatable best practices and patterns to resolve them.

The Reference Architecture

Using the aforementioned patterns, this reference architecture attempts to manage the complexity of developing, deploying, and monitoring an AIoT solution across a wide range of heterogeneous computational hardware and network topologies. It does this by proposing a distributed event-driven architecture hosted on a multi-tier infrastructure. The multi-tiered architecture creates clear, distinct boundaries for the network, security, scalability, durability, and reliability guarantees of the solution. Each tier can be independently secured and scaled based on the nature of its workload, data privacy requirements, and computational hardware characteristics.

[Figure: The Reference Architecture]

The three infrastructure tiers host various components and services, have specific roles, and establish a clear separation of the following concerns:

Control
Data
Intelligence
Model/Artifacts
Communication

Let's examine the characteristics of each tier in more detail and understand how a tiered event-driven architecture addresses these concerns.

Things Tier

The Things Tier hosts the Perception components. The sensors and actuators in this tier serve as the primary interface to the physical world. Components in this tier sense the physical environment, digitize the signal, and process and transmit it to the other tiers. The Things Tier is composed of constrained edge devices and is architected to meet the following requirements and operational constraints:

Role and Responsibilities

Interface with the sensors and digitize the analog signals
Preprocess data using DSP filters
Perform closed-loop inferences
Interface with actuators
Provide protocol gateway services for sensor-node-to-gateway communication
Provide IoT gateway services for communication with the outside world
Package, normalize, aggregate, and transmit data using lightweight messaging protocols
Respond to command messages and perform operations such as triggering a model OTA download
Minimize data loss
Ensure low latency between inference and actuation
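
To illustrate packaging data for lightweight messaging protocols, the sketch below packs one reading into a fixed 12-byte binary frame with Python's `struct` module. The field layout and scaling factors (hundredths of a degree, milli-g, milliamps) are hypothetical choices for illustration, not a format defined by the reference implementation.

```python
import struct

# Hypothetical little-endian frame: uint16 device id, uint32 epoch seconds,
# int16 temperature (0.01 degC), uint16 vibration (milli-g), uint16 current (mA).
FRAME = struct.Struct("<HIhHH")


def pack_reading(device_id: int, ts: int, temp_c: float, vib_g: float, current_a: float) -> bytes:
    # Scale to fixed-point integers so each value fits a 2-byte field.
    return FRAME.pack(device_id, ts, round(temp_c * 100), round(vib_g * 1000), round(current_a * 1000))


def unpack_reading(payload: bytes) -> dict:
    device_id, ts, temp, vib, cur = FRAME.unpack(payload)
    return {"device_id": device_id, "ts": ts, "temp_c": temp / 100,
            "vib_g": vib / 1000, "current_a": cur / 1000}
```

An equivalent JSON object with the same fields would be several times larger, which matters on low-bandwidth links such as BLE or LoRa.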

Operating environment

Microcontroller, SoC
8-, 16-, or 32-bit architecture
RTOS or Super Loop
Sensor or mote nodes

Resources

Low power consumption computational workloads
Limited on-device memory and storage
No scalability options
No file system
Power consumption - Peak milliwatts to microwatts, quiescent nanowatts
Power source - Battery, solar, or harvested
No on-board thermal management

Network

Wireless sensor networks between simple sensor nodes and the gateway
Star, tree, or mesh topologies
Use of low-power, low-bandwidth IoT protocols such as BLE, LoRa, or Zigbee
Limited bandwidth and intermittent connectivity

Security

Gateway initiated connections to the outside world with asymmetric key cryptography
Strict device identity and encryption using on-chip secure cryptoprocessors such as Trusted Platform Module (TPM)

Inference Tier

The inference tier hosts the Cognition services that analyze data coming from the Things Tier and generate real-time actionable insights and alerts. This tier is architected to meet the following requirements and operational constraints:

Role and Responsibilities

Respond to command events from the MLOps layer
Download the latest ML models in response to command events
Subscribe to various context enrichment event streams
Perform context specific inferences
Generate insights using event stream processing
Synthesize higher-order alert events by integrating inferences with event stream processing insights
Maximize data timeliness

Operating environment

Embedded Microprocessor or Single-board Computers
ARM architecture
Embedded Linux or RTOS operating systems

Resources

Moderately intensive computational workloads
Power consumption - Peak milliwatts, quiescent microwatts
Power source - Battery or external power supply
Passive thermal management such as heat sink

Network

Moderate bandwidth and throughput

Security

Data in-transit secured using mutual TLS
No data at rest is allowed on this tier

Platform Tier

The platform tier hosts two categories of services - MLOps and Platform Services. It logically partitions training-related activities from platform services, enabling computationally intensive training jobs to run on dedicated AI accelerated devices. This tier is architected to meet the following requirements and operational constraints:

Role and responsibilities - MLOps Layer

Provide mechanisms to express MLOps workflows, pipelines, and dependencies as directed acyclic graphs (DAGs)
Provide mechanisms to declaratively define AI accelerator aware workload placement strategies
Orchestrate MLOps pipelines for data collection, processing, validation, and training
Provide continuous deployment capabilities for embedded ML models
Produce command events to orchestrate various model deployment and training activities
Ingest streaming data, normalize and create training data
Detect drift in the models
Compress models and store them in the artifacts registry
Provide MLOps dashboard services
Maximize data quality

Role and responsibilities - Platform Service Layer

Coordinate workload orchestration with the local Control Agents
Manage deployment and monitoring of containerized workloads and services
Enable lightweight messaging to communicate with the IoT devices
Provide durable and reliable event streaming services
Bridge the messaging and streaming protocols
Provide private container registry services
Provide artifacts repository, metadata, and training datastore services
Store and serve quantized models
Provide embedded ML model over the air (Model OTA) services

Operating environment

Single-board Computers with AI Acceleration such as GPU or TPU
ARM or x86 architecture
Embedded Linux operating system

Resources

IOPS intensive workloads
Large high throughput storage
Shared file system
Computation and memory intensive workloads
Large on-device memory
Active thermal management such as conductive or Peltier cooling

Network and Communication

High bandwidth and throughput

Security

Data in-transit secured using mutual TLS
Encrypt data at rest

Summary

In this article, we explored the AIoT problem landscape, its emergent behaviors, and architecturally significant use cases. We saw how a tiered event-driven architecture, combined with AIoT patterns formalized in a reference architecture, achieves a clean separation of concerns, addresses emergent behaviors, and manages the ensuing complexity. In Part 2 of this series, we will see how to build a concrete infrastructure implementation of this reference architecture that is capable of hosting a real-world AIoT application.
