Published on 00/00/0000
Last updated on 00/00/0000
AI/ML
10 min read
Distributed multi-agent software is a system composed of multiple autonomous agents that run on independent computing nodes and communicate over a network through event-driven workflows to answer a user query.
These applications appear to the end user as a single coherent system, even though the constituent agents operate independently within their own environments and are often unaware of each other’s role and involvement in the workflow. This design gives considerable flexibility in how agent capabilities can be dynamically composed and orchestrated.
Key characteristics of such systems are:
The journey toward composing scalable multi-agent software has evolved from single-process applications, where all agents run within the same process, to sophisticated distributed architectures. Central to this evolution is the agent runtime, a specialized program/API that manages agent identities, lifecycles, and communication patterns. Much like how programming language runtimes provide the necessary infrastructure for code execution, agent runtimes provide the communication infrastructure for agents to interact, collaborate, and solve complex problems together.
The core constructs of the Internet of Agents (IoA) remote gateway distributed agent runtime (DAR) address the unique challenges of multi-agent systems. DAR enables the construction of multi-agent software that operates across network boundaries while maintaining the coordination necessary for collaborative problem-solving.
By providing the critical infrastructure for agent communication, lifecycle management, security enforcement, and operational monitoring, the agent gateway represents the next step in unlocking the full potential of multi-agent software.
The simplest form of multi-agent software operates within a standalone runtime, a single-process environment where specialized agents and tools execute collaboratively. In this configuration, data and memory sharing across agents and tools occurs naturally, similar to how methods within a class access shared attributes, as shown below. When Agent2 needs to recognize Agent1 or share data, no special mechanisms are required because both operate within the same memory space.
class StandaloneRuntime:  # illustrative class name; the original class header is not shown
    def __init__(self):
        # Shared data store for all agents
        self.shared_data = {}

    def agent1(self, key, value):
        # Agent 1 stores data in the shared store
        self.shared_data[key] = value
        print(f"Agent 1: Stored '{value}' with key '{key}'")

    def agent2(self, key):
        # Agent 2 retrieves data from the shared store
        value = self.shared_data.get(key, "Not found")
        print(f"Agent 2: Retrieved '{value}' using key '{key}'")
        return value
For straightforward use cases with limited complexity and well-defined agent interactions, a standalone runtime provides efficiency and simplicity. Development teams can rapidly prototype multi-agent software without addressing cross-process communication and lifecycle management challenges.
Limitations of standalone agent runtime
Despite its simplicity, a standalone agent runtime becomes limiting for building multi-agent software as the system scales:
Limited extensibility
Extending multi-agent software requires substantial development effort: new agents must be built from the ground up, and the entire workflow (for example, a LangGraph graph) must be reconfigured by modifying existing edges and adding conditional edges to accommodate new message flows. This process becomes increasingly complex as the number of agents grows.
Standalone architectures create silos that prevent leveraging externally developed components. Agents cannot be easily integrated across organizational boundaries, requiring redundant development of similar capabilities and limiting innovation.
Limited reusability
Organizations cannot effectively reuse agent components across projects or incorporate agents developed by third parties, resulting in duplicated efforts and inconsistent implementations.
A distributed agent runtime consists of agents and an agent gateway, which orchestrates inter-agent communication using agent gateway protocol (AGP) primitives. The AGP specification defines a standardized communication framework for AI agents.
It supports diverse messaging patterns, including request-response, publish-subscribe, fire-and-forget, and streaming. Built on gRPC, the agent gateway exposes a host server to which remote agents connect. This gateway ensures secure, scalable, and efficient interactions between agents, enabling robust multi-agent collaboration.
Such a distributed agent runtime enables developers to build event-driven multi-agent workflows that address the limitations of standalone agent runtimes by enabling seamless cross-process communication, agent lifecycle management, and privacy preservation in multi-agent software. This runtime allows agents—developed in different programming languages and frameworks—to interact across distributed environments, running on different host machines over a network.
The following messaging patterns are defined by the AGP and can be implemented by the agent gateway and third-party remote agents.
The agent gateway consists of three primary components:
The control plane handles agent administration, including tenant management, namespace organization, agent categorization, and authentication. It features a registration service for agent onboarding, agent discovery for metadata, and token rotation for secure authentication with OAuth2 tokens. End-to-end encryption ensures message privacy via payload-level encryption by agents and transport-level encryption using HTTPS.
The data plane is responsible for efficiently forwarding messages between agents, ensuring seamless communication. Designed for scalability, it is a critical component of the system, present in both the agent SDK and the backend service.
The gateway interface (Python bindings) provides a convenient way for agents built using heterogeneous frameworks to communicate within a distributed system. These bindings act as a bridge between Python-based clients and the underlying gRPC-based gateway, enabling seamless interaction with both the control plane and data plane services.
The gateway server manages agent registrations and message routing, while the Python client interacts with these services, managing subscriptions, sending messages, and handling responses asynchronously.
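To make "handling responses asynchronously" concrete, here is a minimal sketch of how a client could pair each request with its eventual response using correlation IDs and `asyncio` futures. It does not use the actual gateway bindings: `SketchClient`, `fake_gateway_send`, and the message dictionary layout are all hypothetical names chosen for illustration, and the gateway is replaced by a local stand-in that echoes the payload back.

```python
import asyncio
import itertools


class SketchClient:
    """Hypothetical client that matches responses to requests by correlation id."""

    def __init__(self, send):
        self._send = send        # coroutine that hands a message to the gateway
        self._pending = {}       # correlation id -> Future awaiting the response
        self._ids = itertools.count(1)

    async def request(self, topic, payload, timeout=1.0):
        correlation_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[correlation_id] = future
        try:
            await self._send({"id": correlation_id, "topic": topic, "payload": payload})
            # Suspend this coroutine only; other tasks keep running meanwhile.
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(correlation_id, None)

    def on_response(self, message):
        # Called whenever a response arrives; resolves the matching request.
        future = self._pending.get(message["id"])
        if future is not None and not future.done():
            future.set_result(message["payload"])


async def demo():
    async def fake_gateway_send(message):
        # Stand-in for the network (assumption): reply with the upper-cased payload
        # on a later event-loop iteration.
        reply = {"id": message["id"], "payload": message["payload"].upper()}
        asyncio.get_running_loop().call_soon(client.on_response, reply)

    client = SketchClient(fake_gateway_send)
    return await client.request("acme/prod/finance", "ping")
```

Running `asyncio.run(demo())` sends one request and returns the reply once the stand-in gateway delivers it. A fire-and-forget message would simply call the send coroutine without registering a future.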
Agent identification and routing
Each agent is uniquely identified using a hierarchical structure:
AgentID = Organization/Namespace/Agent-type/Agent-UUID
Organization – The tenant that owns and registers the agent.
Namespace – A logical partition for traffic segmentation.
Agent-type – The category/role of the agent, such as "finance" or "healthcare".
Agent-UUID – A unique identifier assigned per agent instance, changing with each restart.
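The identifier scheme above can be sketched as a small value type. This is an illustration only: the `AgentID` class, its field names, and the use of `uuid4` are assumptions, not the gateway's actual implementation.

```python
from dataclasses import dataclass, field
import uuid


@dataclass(frozen=True)
class AgentID:
    """Hierarchical identifier: Organization/Namespace/Agent-type/Agent-UUID."""

    organization: str
    namespace: str
    agent_type: str
    # Assumption: a fresh random UUID per instance models "changing with each restart".
    agent_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __str__(self) -> str:
        return "/".join(
            (self.organization, self.namespace, self.agent_type, self.agent_uuid)
        )

    @classmethod
    def parse(cls, text: str) -> "AgentID":
        parts = text.split("/")
        if len(parts) != 4 or not all(parts):
            raise ValueError(
                f"expected Organization/Namespace/Agent-type/Agent-UUID, got {text!r}"
            )
        return cls(*parts)


# Two starts of the same agent type share the first three levels
# but receive different UUIDs.
first_start = AgentID("acme", "prod", "finance")
after_restart = AgentID("acme", "prod", "finance")
```

Because the first three levels stay stable across restarts, other agents can address the agent type without knowing which instance is currently running.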
Messages are routed based on topic structures, which define how they reach their destination:
The gateway maintains subscription tables to efficiently map incoming messages to their intended recipients. The gateway keeps track of active agents and their connections via a connection table.
Optimized subscription tables
To ensure efficient routing, subscriptions are structured hierarchically:
This structure optimizes message delivery by reducing lookup overhead and ensuring efficient memory access.
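One way to picture a hierarchical subscription table is as nested maps keyed by the AgentID levels. The exact layout used by the gateway is not reproduced in this article, so the nesting order, the `SubscriptionTable` class, and the integer connection IDs below are assumptions made for the sketch.

```python
from collections import defaultdict


class SubscriptionTable:
    """Nested map: organization -> namespace -> agent type -> {agent UUID: connection id}."""

    def __init__(self):
        self._table = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def subscribe(self, org, namespace, agent_type, agent_uuid, connection_id):
        self._table[org][namespace][agent_type][agent_uuid] = connection_id

    def unsubscribe(self, org, namespace, agent_type, agent_uuid):
        self._table[org][namespace][agent_type].pop(agent_uuid, None)

    def lookup(self, org, namespace, agent_type, agent_uuid=None):
        """Return connection ids for one instance, or for every instance of a type."""
        # .get() avoids creating empty entries for topics nobody subscribed to.
        instances = self._table.get(org, {}).get(namespace, {}).get(agent_type, {})
        if agent_uuid is None:
            return sorted(instances.values())
        return [instances[agent_uuid]] if agent_uuid in instances else []
```

Each lookup walks a fixed number of levels regardless of how many agents are registered, which is one plausible reading of the reduced lookup overhead described above.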
Communication patterns
The gateway uses the following three message forwarding strategies to route messages efficiently based on the intended recipients.
Unicast forwarding
Messages are sent to a specific agent instance using a fully qualified topic. The system validates the destination and ensures point-to-point delivery.
Broadcast forwarding
Messages are sent to all agents of a particular type by omitting the agent-UUID and using a fan-out mechanism. This is useful for distributing system-wide updates or commands.
Anycast forwarding
A message is sent to one randomly selected instance of an agent type, ensuring load balancing while minimizing unnecessary message duplication.
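The three strategies differ only in how recipients are chosen from the instances registered for an agent type. The sketch below assumes those instances are available as a plain `{agent_uuid: connection_id}` dictionary; the function name `select_recipients` and the strategy strings are hypothetical.

```python
import random


def select_recipients(instances, strategy, agent_uuid=None, rng=random):
    """Pick target connection ids from {agent_uuid: connection_id} for one agent type."""
    if strategy == "unicast":
        # Fully qualified topic: validate the destination, then deliver point-to-point.
        if agent_uuid not in instances:
            raise LookupError(f"unknown agent instance: {agent_uuid!r}")
        return [instances[agent_uuid]]
    if strategy == "broadcast":
        # Agent-UUID omitted: fan out to every instance of the type.
        return list(instances.values())
    if strategy == "anycast":
        # One randomly selected instance receives the message.
        if not instances:
            raise LookupError("no instances available for anycast")
        return [rng.choice(list(instances.values()))]
    raise ValueError(f"unknown forwarding strategy: {strategy!r}")
```

Passing a seeded `random.Random` instance as `rng` makes the anycast choice reproducible in tests.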
Event handling and message processing
Connection events
Subscription management
Message handling
This section explores a distributed agent runtime for multi-agent software using an event-driven publish-subscribe model. The system uses an agent gateway and the agent gateway protocol (AGP) to facilitate messaging between agents built on heterogeneous agentic frameworks. In this proof of concept, four remote agents, running on different hosts, connect to a remote gateway and use topics for structured communication.
System design
One gateway host (gRPC server): Manages core logic and request processing.
Four gateway clients (gRPC clients): Send requests to invoke gateway functions for agent interactions. Each client is part of a LangGraph-based or AutoGen-based agentic application.
Agents & tools | Framework | Protocol | Publishes to | Subscribes to | Host/Deployment | Runtime |
IoA Agent Gateway | gRPC | AGP | - | - | EC2 instance #1 | gRPC Agent Host Servicer |
Writer Agent | LangGraph | AGP | Group Chat Topic | Group Chat Topic, Writer Topic | EC2 instance #4 | gRPC Agent Runtime Worker #1 |
Reviewer Agent | AutoGen | AGP | Group Chat Topic | Group Chat Topic, Editor Topic | EC2 instance #2 | gRPC Agent Runtime Worker #2 |
User Interface Agent | AutoGen | AGP | - | UI Topic | Local Machine | gRPC Agent Runtime Worker #3 |
Central Orchestrator Agent | AutoGen | AGP | UI Topic | Group Chat Topic | EC2 instance #3 | gRPC Agent Runtime Worker #4 |
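The topic wiring in the table can be simulated in a single process to check who receives what. This sketch stands in for the gateway with a hypothetical `InMemoryBroker`; the short agent and topic names are illustrative, and skipping delivery back to the sender is an assumption rather than documented gateway behavior.

```python
from collections import defaultdict


class InMemoryBroker:
    """Single-process stand-in for the gateway's publish-subscribe routing."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> [(agent name, inbox)]

    def subscribe(self, topic, agent_name, inbox):
        self._subscribers[topic].append((agent_name, inbox))

    def publish(self, topic, sender, message):
        delivered = []
        for agent_name, inbox in self._subscribers[topic]:
            if agent_name != sender:  # assumption: senders do not receive their own messages
                inbox.append((topic, sender, message))
                delivered.append(agent_name)
        return delivered


# Subscriptions copied from the "Subscribes to" column of the table.
SUBSCRIPTIONS = {
    "writer": ["group_chat", "writer"],
    "reviewer": ["group_chat", "editor"],
    "ui": ["ui"],
    "orchestrator": ["group_chat"],
}

broker = InMemoryBroker()
inboxes = {name: [] for name in SUBSCRIPTIONS}
for name, topics in SUBSCRIPTIONS.items():
    for topic in topics:
        broker.subscribe(topic, name, inboxes[name])

# The writer publishes a draft to the group chat; the orchestrator reports to the UI.
draft_recipients = broker.publish("group_chat", "writer", "draft v1")
ui_recipients = broker.publish("ui", "orchestrator", "final text")
```

In this simulation the writer's draft reaches the reviewer and the orchestrator, while the UI agent only sees what the orchestrator publishes to the UI topic.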
The proof of concept validates that these integrated technologies and architectural patterns work together to create a scalable, flexible system capable of supporting heterogeneous AI workflows across organizational boundaries.
The decision between standalone and distributed runtime architectures depends on specific application requirements:
Choose standalone runtime when:
Choose distributed runtime when:
The distributed agent runtime delivers multiple advantages for AI systems:
Cross-organizational integration
Development and operational efficiency
This architecture enables unprecedented integration possibilities while simplifying the development of complex, collaborative AI systems that work effectively across organizational boundaries.
To learn more, collaborate with us at AGNTCY, an open source collective building the infrastructure for the Internet of Agents.