In Part 1 of our Architect's Guide to AIoT series, we explored the AIoT (Artificial Intelligence of Things) problem space, its emergent behaviors, and the architecturally significant challenges, and we learned how to address them using AIoT patterns and a comprehensive reference architecture. In this post, I will show you how to apply the principles and patterns of that reference architecture to build a real-world AIoT application that can run on resource-constrained edge devices.
While the reference architecture formalizes recurring scenarios and repeatable best practices into abstract AIoT patterns, the reference implementation offers concrete archetypes that can be used as foundational building blocks for any AIoT application. In this implementation, I have attempted to maximize the use of open-source projects; however, in certain areas none existed, so I wrote my own. I have coded these unopinionated modules with a deliberate openness to both extension and modification.
The reference implementation can be used as a collection of individual reusable libraries and templates, or as a unified application framework.
The reference implementation is organized into two sections: the reference infrastructure and the AIoT application services.
The core platform and MLOps services of the reference infrastructure use various CNCF projects from the Kubernetes ecosystem, such as K3S, Argo, Longhorn, and Strimzi, along with custom-coded modules in Go and Python. Here is the complete list of the mappings:
| Tier | Layer | Reference Architecture | Reference Implementation |
|---|---|---|---|
| Platform | Platform Services | Lightweight Pub/Sub Broker, Protocol Bridge, Event Streaming Broker, Model OTA Service, Model Registry, Device Registry, Training Datastore, Container Registry, Container Orchestration Engine, Container Workflow Engine, Edge Native Storage | Embedded Go MQTT Broker, MQTT-Kafka Protocol Bridge, Kafka/Strimzi, Model OTA Server, Model Registry μService, Device Registry μService, Training Datastore μService, Docker Registry Service, K3S, Argo Workflows, Longhorn |
| Platform | MLOps | MLOps CD, MLOps UI, Control and Data Events, Training Pipelines, Ingest Pipelines, MLOps DAGs | Argo CD, Argo Dashboard, Control and Data Topics, Argo Workflows Training Pipeline, Data Ingest μService, Argo Demo DAG |
The AIoT application services, which are covered in detail in the next post in this AIoT series, primarily comprise custom-coded modules in C++, Python, and Go.
| Tier | Layer | Reference Architecture | Reference Implementation |
|---|---|---|---|
| Inference | Cognition | Alerts, Compressed ML Model, Context-Specific Inferencing, Streaming Data, Orchestration Agent | Motor Condition Alerts, Quantized TF Lite Model, PyCoral Logistic Regression Module, Kafka, K3S Agent |
| Things | Perception | Protocol Gateway, Sensor Data Acquisition, Pre-Processing Filter, FOTA ML Model, Actuator Control, Closed-Loop Inferencing, Aggregation | OpenMQTTGateway, Sensor Module, FFT DSP Module, TF Lite Model Download, Servo Controller Module, TFLM Module, Aggregation Module |
Each infrastructure tier of this implementation uses a particular type of hardware and AI acceleration to ensure the resource availability, scalability, security, and durability guarantees of the tier are met. Each tier can independently scale and fail, enabling services on each tier to be deployed, managed, and secured independently. The hardware and OS specifications for each tier are listed here:
| Infrastructure Tier | Device | AI Accelerator | Compute | Memory | OS/Kernel |
|---|---|---|---|---|---|
| Platform | Jetson Nano DevKit | GPU: 128-core NVIDIA Maxwell™ | CPU: Quad-core ARM® A57 @ 1.43 GHz | 2 GB 64-bit LPDDR4 | Ubuntu 18.04.6 LTS, 4.9.253-tegra |
| Platform | Raspberry Pi 4 | None | Quad-core Cortex-A72 @ 1.5 GHz | 4 GB LPDDR4 | Debian GNU/Linux 10 (buster), 5.10.63-v8+ |
| Inference | Coral Dev Board | GPU: Vivante GC7000Lite; TPU: Edge TPU; VPU: 4Kp60 HEVC/H.265 | Quad-core Cortex-A53 @ 1.5 GHz | 1 GB LPDDR4 | Mendel GNU/Linux 5 (Eagle), 4.14.98-imx |
| Inference | ESP32 SoC | None | MCU: Dual-core Xtensa® 32-bit LX6 @ 40 MHz | 448 KB ROM, 520 KB SRAM | ESP-IDF FreeRTOS |
| Things | ESP32 SoC | None | MCU: Dual-core Xtensa® 32-bit LX6 @ 40 MHz | 448 KB ROM, 520 KB SRAM | ESP-IDF FreeRTOS |
I will now show you how to configure each tier and prepare it to host an AIoT application.
The concrete implementation of the things tier runs on an ESP32 SoC. The next post gets into the details of the hardware setup.
The concrete implementation of the inference tier runs on a cluster of three Coral Dev Boards and an ESP32 SoC. This tier hosts the cognition-layer services listed in the mapping table above.
The cluster of TPU Dev Boards comprises ARM devices running Mendel Linux; these devices host the TFLite PyCoral modules. We will first install the latest Mendel Linux OS on the Dev Boards by following these steps (note: these steps are specific to macOS):
Install ADB tools on your laptop or PC
brew install android-platform-tools
Use a serial terminal at 115200 baud to connect to the device
screen /dev/tty.SLAB_USBtoUART 115200
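From here, the remaining steps follow Coral's getting-started flow. A sketch, assuming you have already downloaded and unpacked the latest Mendel image archive from the Coral site (the directory name varies by release):
# at the U-Boot prompt in the serial console, put the board into fastboot mode
fastboot 0
# back on the laptop, confirm the board is visible and flash the unpacked image
fastboot devices
cd mendel-enterprise-eagle-flashcard   # illustrative; depends on the release you downloaded
bash flash.sh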
The concrete implementation of the platform tier runs on a cluster of two Raspberry Pi devices and an NVIDIA Jetson Nano device.
Here are the steps to configure this tier.
SSH into the device and confirm the OS is 64-bit ARM by running
dpkg --print-architecture
Update the OS using
sudo apt-get update
sudo apt-get upgrade
Add the following parameters to the end of the line in /boot/cmdline.txt (this is required for K3S and containerd to work correctly)
cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1
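For illustration, the resulting /boot/cmdline.txt stays a single line and ends up looking roughly like this (the root PARTUUID and console parameters below are placeholders; your existing values will differ, and a reboot is required for the cgroup change to take effect):
console=serial0,115200 console=tty1 root=PARTUUID=<your-partuuid> rootfstype=ext4 fsck.repair=yes rootwait cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1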
SSH into the device and remove any existing Docker installation using the following commands
dpkg -l | grep -i docker
sudo apt-get purge -y docker-engine docker docker.io docker-ce docker-ce-cli
sudo apt-get autoremove -y --purge docker-engine docker docker.io docker-ce
sudo rm -rf /var/lib/docker /etc/docker
sudo rm /etc/apparmor.d/docker
sudo groupdel docker
sudo rm -rf /var/run/docker.sock
sudo rm -rf ~/.docker
At this point, the edge devices have all the prerequisite firmware and OS configurations needed to install and run the platform services. We will now install and configure various platform services for MLOps, communication, and container orchestration.
In this reference infrastructure, K3S is set up in a single-server node configuration with an embedded SQLite database; the setup requires two separate steps.
The first step is to install and run the K3S server on the platform tier (a Raspberry Pi 4 device or an equivalent VM). Here are the steps:
Install and run the server control node
#replace the <IP Address> with the IP Address of the device or VM
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--write-kubeconfig ~/.kube/config --write-kubeconfig-mode 666 --tls-san <IP Address> --node-external-ip=<IP Address>" sh -
Confirm proper setup by using crictl
crictl info
Get the token to authorize the agent nodes
cat /var/lib/rancher/k3s/server/token
The agent nodes get installed on all the tiers except the things tier. Install the K3S agent on the Jetson Nano and Coral TPU Dev Kits, and then confirm proper setup using crictl
#replace the <IP Address> with the IP Address of the K3S server node
#replace the <TOKEN> with the token from the server node
curl -sfL https://get.k3s.io | K3S_URL=https://<IP Address>:6443 K3S_TOKEN=<TOKEN> sh -
crictl info
With each successful agent node setup, you should be able to see the entire cluster by running this command on the K3S server node
kubectl get nodes -o wide -w
This is what I see on my cluster:
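Roughly, the listing looks like this (the agent node names match the labels used later in this guide; the server node name, AGE, and VERSION values are placeholders):
NAME                      STATUS   ROLES                  AGE   VERSION
agentnode-coral-tpu1      Ready    <none>                 12d   v1.21.x+k3s1
agentnode-coral-tpu2      Ready    <none>                 12d   v1.21.x+k3s1
agentnode-coral-tpu3      Ready    <none>                 12d   v1.21.x+k3s1
agentnode-nvidia-jetson   Ready    <none>                 12d   v1.21.x+k3s1
agentnode-raspi1          Ready    <none>                 12d   v1.21.x+k3s1
agentnode-raspi2          Ready    <none>                 12d   v1.21.x+k3s1
<server-node-name>        Ready    control-plane,master   12d   v1.21.x+k3s1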
Install Longhorn by following these steps:
Create a new namespace architectsguide2aiot and label Raspberry Pi device 1
kubectl create ns architectsguide2aiot
kubectl label nodes agentnode-raspi1 controlnode=active
Add a node selector in the longhorn.yaml file to run the following longhorn CRDs only on devices labeled controlnode=active
apiVersion: v1
kind: ConfigMap
metadata:
  name: longhorn-default-setting
  namespace: longhorn-system
data:
  default-setting.yaml: |-
    backup-target:
    backup-target-credential-secret:
    system-managed-components-node-selector: "controlnode:active"
# ...
# add the following to each of these resources:
#   DaemonSet/longhorn-manager
#   Service/longhorn-ui
#   Deployment/longhorn-driver-deployer
      nodeSelector:
        controlnode: active
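With the node selectors in place, apply the manifest. A sketch, assuming longhorn.yaml was downloaded from a Longhorn release (the version tag in the URL is illustrative and may differ from the one you use):
wget https://raw.githubusercontent.com/longhorn/longhorn/v1.2.3/deploy/longhorn.yaml
# edit longhorn.yaml as described above, then:
kubectl apply -f longhorn.yaml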
Install the NGINX ingress controller by following its installation instructions, and then create the following ingress for the Longhorn frontend:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ingress
  namespace: longhorn-system
  annotations:
    # type of authentication
    nginx.ingress.kubernetes.io/auth-type: basic
    # prevent the controller from redirecting (308) to HTTPS
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    # name of the secret that contains the user/password definitions
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    # message to display with an appropriate context why the authentication is required
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
spec:
  rules:
    - http:
        paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: longhorn-frontend
                port:
                  number: 80
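The ingress references a basic-auth secret, which must exist before the ingress is applied. One way to create it, following the approach in the Longhorn docs (USER and PASSWORD are placeholders):
USER=<USER>; PASSWORD=<PASSWORD>; echo "${USER}:$(openssl passwd -stdin -apr1 <<< ${PASSWORD})" >> auth
kubectl -n longhorn-system create secret generic basic-auth --from-file=auth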
Open the Longhorn dashboard and navigate to Settings > General. Apply the following settings and save:
- Replica Node Level Soft Anti-Affinity: true
- Replica Zone Level Soft Anti-Affinity: true
- System Managed Components Node Selector: controlnode:active
Label Raspberry Pi device 2
kubectl label nodes agentnode-raspi2 controlnode=active
Wait until all the CSI drivers and plugins are deployed and running on Raspberry Pi device 2:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
longhorn-csi-plugin-rw5qv 2/2 Running 4 (18h ago) 10d 10.42.5.50 agentnode-raspi2 <none> <none>
longhorn-manager-dtbp5 1/1 Running 2 (18h ago) 10d 10.42.5.48 agentnode-raspi2 <none> <none>
instance-manager-e-f74eeb54 1/1 Running 0 172m 10.42.5.53 agentnode-raspi2 <none> <none>
engine-image-ei-4dbdb778-jbw5g 1/1 Running 2 (18h ago) 10d 10.42.5.52 agentnode-raspi2 <none> <none>
instance-manager-r-9f692f5b 1/1 Running 0 171m 10.42.5.54 agentnode-raspi2 <none> <none>
Open the volumes panel and then create a new volume with the following settings
Name: artifacts-registry-volm
Size: 1 Gi
Replicas: 1
Frontend: Block Device
Using the dashboard, create a PV and a PVC named artifacts-registry-volm in the architectsguide2aiot namespace, and verify:
kubectl get pv,pvc -n architectsguide2aiot
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/artifacts-registry-volm 1Gi RWO Retain Bound architectsguide2aiot/artifacts-registry-volm longhorn-static 12d
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/artifacts-registry-volm Bound artifacts-registry-volm 1Gi RWO longhorn-static 12d
Here are the steps to install and configure a private Docker registry on the platform tier:
Install Docker on a Raspberry Pi 4 device or an equivalent VM
sudo apt-get update
sudo apt-get remove docker docker-engine docker.io
sudo apt install docker.io
sudo systemctl start docker
sudo systemctl enable docker
Now start the Docker distribution service on the device or VM. This is the local Docker registry; the -d flag runs it in detached mode.
docker run -d -p 5000:5000 --restart=always --name registry registry:2
Edit /etc/docker/daemon.json to add an insecure registry entry
{
"insecure-registries": ["localhost:5000"]
}
Note: I highly recommend using a secure registry with a proper CA and signed certs by following these instructions. But for this reference infrastructure, I am taking a shortcut and configuring an insecure registry.
Restart the docker service
systemctl restart docker.service
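As a quick sanity check that the registry is back up after the restart, query the Docker registry API's catalog endpoint; a fresh registry returns an empty repository list:
curl http://localhost:5000/v2/_catalog
# {"repositories":[]}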
Configure a mirror endpoint on the K3S server node by editing /etc/rancher/k3s/registries.yaml
#replace the <IP Address> with the IP Address of the node hosting the docker registry service
mirrors:
  "docker.<IP Address>.nip.io:5000":
    endpoint:
      - "http://docker.<IP Address>.nip.io:5000"
Do the same on each agent node. After the agent restarts, the containerd configuration that K3S generates (/var/lib/rancher/k3s/agent/etc/containerd/config.toml) should contain the mirror entries:
#replace the <IP Address> with the IP Address of the node hosting the docker registry service
[plugins.cri.registry]
  [plugins.cri.registry.mirrors]
    [plugins.cri.registry.mirrors."docker.io"]
      endpoint = ["https://registry-1.docker.io"]
    [plugins.cri.registry.mirrors."docker.<IP Address>.nip.io:5000"]
      endpoint = ["http://docker.<IP Address>.nip.io:5000"]
Restart the k3s-agent service and verify its configuration using crictl
systemctl restart k3s-agent.service
crictl info
We also need to set up Docker buildx, which is used to build the ARM64-compatible inference module images. On the device hosting the Docker registry, create and bootstrap a buildx builder:
docker buildx create --name mybuilder
docker buildx use mybuilder
docker buildx inspect --bootstrap
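As a usage sketch, an ARM64 image for one of the inference modules could then be built and pushed to the local registry like this (the image name and build context are illustrative; with an insecure registry, the builder may additionally need an http-enabled registry entry in its buildkitd configuration):
docker buildx build --platform linux/arm64 \
  -t docker.<IP Address>.nip.io:5000/aiot/inference-module:v1 \
  --push .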
Argo Workflows is used in this reference infrastructure to run parallel ML jobs expressed as DAGs. Here are the installation and configuration steps:
Deploy the Argo workflow CRDs
kubectl create ns architectsguide2aiot
kubectl apply -n architectsguide2aiot -f https://github.com/argoproj/argo-workflows/releases/download/v3.1.11/install.yaml
Switch the workflow executor to the Kubernetes API. A workflow executor is a process that conforms to a specific interface, allowing Argo to perform actions such as monitoring pod logs, collecting artifacts, and managing container lifecycles.
kubectl patch configmap/workflow-controller-configmap \
-n architectsguide2aiot \
--type merge \
-p '{"data":{"containerRuntimeExecutor":"k8sapi"}}'
Port-forward to open the Argo console in a browser
kubectl -n architectsguide2aiot port-forward svc/argo-server 2746:2746
Get the auth token
kubectl -n architectsguide2aiot exec argo-server-<pod name> -- argo auth token
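To verify the executor configuration end to end, a minimal DAG workflow can be submitted (an illustrative smoke test, not one of the MLOps pipelines from this series):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-smoke-test-
  namespace: architectsguide2aiot
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: a
            template: echo
          - name: b
            dependencies: [a]
            template: echo
    - name: echo
      container:
        image: alpine:3.14
        command: [echo, "hello from the DAG"]
Submit it with kubectl create -f <file>.yaml -n architectsguide2aiot and watch the two tasks complete in the Argo console.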
Strimzi provides the images and operators to run and manage Kafka on a Kubernetes cluster. We will now install and configure Strimzi on one of the Raspberry Pi devices. This deployment includes a single-broker Kafka cluster and its ZooKeeper ensemble, along with the Strimzi Cluster, Topic, and User Operators.
Here are the installation steps:
Create a namespace for the Strimzi deployment (skip this if the namespace already exists)
kubectl create ns architectsguide2aiot
Apply the Strimzi install file and then provision the Kafka Cluster
kubectl create -f 'https://strimzi.io/install/latest?namespace=architectsguide2aiot' -n architectsguide2aiot
kubectl apply -f 'https://strimzi.io/examples/latest/kafka/kafka-persistent-single.yaml' -n architectsguide2aiot
Modify kafka-persistent-single.yaml to enable the nodeport external listener:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: architectsguide2aiot-aiotops-cluster
spec:
  kafka:
    version: 2.8.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
      - name: external
        port: 9094
        type: nodeport
        tls: false
        configuration:
          bootstrap:
            nodePort: 32199
          brokers:
            - broker: 0
              nodePort: 32000
            - broker: 1
              nodePort: 32001
            - broker: 2
              nodePort: 32002
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
Modify the tolerations and affinities to limit scheduling of the Kafka pods to specific nodes:
  template:
    pod:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "Kafka"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: dedicated
                    operator: In
                    values:
                      - Kafka
Apply the modified configuration and wait for all the services to start
kubectl apply -f kafka-persistent-single.yaml -n architectsguide2aiot
kubectl wait kafka/architectsguide2aiot-aiotops-cluster --for=condition=Ready --timeout=300s -n architectsguide2aiot
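The external listener can then be smoke-tested from outside the cluster with kcat (formerly kafkacat; not installed as part of this guide) by querying broker metadata through the bootstrap nodeport; <Node IP> is the IP of any cluster node:
kcat -b <Node IP>:32199 -L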
For the lightweight pub/sub broker and the protocol bridge, see the corresponding sections in the next post.
Devices with AI accelerators such as GPUs or TPUs need to be labeled to ensure that ML workloads are placed on the appropriate AI-accelerated device.
kubectl label nodes agentnode-coral-tpu1 tpuAccelerator=true
kubectl label nodes agentnode-coral-tpu2 tpuAccelerator=true
kubectl label nodes agentnode-coral-tpu3 tpuAccelerator=true
kubectl label nodes agentnode-nvidia-jetson gpuAccelerator=true
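These labels can then be referenced from workload specs. For example, a pod spec fragment (illustrative; the actual deployments come in the next post) that pins a TPU inference module to the Coral nodes:
  nodeSelector:
    tpuAccelerator: "true"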
To prevent Strimzi from scheduling workloads on the devices in the inference tier, use the following taints:
kubectl taint nodes agentnode-coral-tpu1 dedicated=Kafka:NoSchedule
kubectl taint nodes agentnode-coral-tpu2 dedicated=Kafka:NoSchedule
kubectl taint nodes agentnode-coral-tpu3 dedicated=Kafka:NoSchedule
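To verify that a taint is in place on a node:
kubectl describe node agentnode-coral-tpu1 | grep Taints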
In this post, we followed a detailed step-by-step guide for establishing a reference infrastructure on edge devices by installing and configuring CNCF projects such as Argo, K3S, Strimzi, and Longhorn, along with various custom services.
In the concluding part of the Architect's Guide to AIoT series, we will build a "real-world" AIoT reference application using TensorFlow Lite and TFLM, and deploy and manage it on this infrastructure.