AI/ML

10 min read

by

Aditya Patel

Published on 04/15/2025

Last updated on 04/15/2025

Composing event-driven multi-agent workflows with a gRPC-based distributed agent runtime

Distributed multi-agent software consists of multiple autonomous agents that run on independent computing nodes and communicate over a network through event-driven workflows to answer a user query.

These applications appear to the end user as a single coherent system, even though the constituent agents operate independently within their own environments and are often unaware of each other’s role and involvement in the workflow. This design gives considerable flexibility in how agent capabilities can be dynamically composed and orchestrated.

Key characteristics of such systems are:

Decentralization: No single point of control, with agents making local decisions.
Scalability: Ability to add more agents to the system without significant reconfiguration.
Concurrency: Agents operate independently and in parallel.
Autonomy: Each agent can perceive its environment, make decisions, and take actions.
Event-driven communication: Agents emit and react to events without knowing which other agents will handle their messages, promoting loose coupling.
Coordination: Workflows can emerge from agent interactions, even when agents have no direct interdependencies.

Multi-agent software has evolved from single-process applications, where all agents run within the same process, to sophisticated distributed architectures. Central to this evolution is the agent runtime: a specialized program or API that manages agent identities, lifecycles, and communication patterns. Much as programming language runtimes provide the infrastructure needed for code execution, agent runtimes provide the communication infrastructure that lets agents interact, collaborate, and solve complex problems together.

The core constructs of the Internet of Agents (IoA) remote gateway distributed agent runtime (DAR) address the unique challenges of multi-agent systems. DAR enables the construction of sophisticated multi-agent software that can operate seamlessly across network boundaries while maintaining the coordination necessary for collaborative problem-solving.

By providing the critical infrastructure for agent communication, lifecycle management, security enforcement, and operational monitoring, the agent gateway represents the next step in unlocking the full potential of multi-agent software.

Agent runtimes

1. Standalone agent runtime

The simplest form of multi-agent software operates within a standalone runtime: a single-process environment where specialized agents and tools execute collaboratively. In this configuration, data and memory sharing across agents and tools occurs naturally, similar to how methods within a class access shared attributes, as shown below. When Agent2 needs to recognize Agent1 or share data, no special mechanisms are required because they operate within the same memory space.

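class StandaloneRuntime:
    # Illustrative class name: one process hosts both agents and their shared state
    def __init__(self):
        # Shared data store for all agents
        self.shared_data = {}

    def agent1(self, key, value):
        # Agent 1 stores data in the shared store
        self.shared_data[key] = value
        print(f"Agent 1: Stored '{value}' with key '{key}'")

    def agent2(self, key):
        # Agent 2 retrieves data from the shared store
        value = self.shared_data.get(key, "Not found")
        print(f"Agent 2: Retrieved '{value}' using key '{key}'")
        return value

# Both agents read and write the same in-process dictionary
runtime = StandaloneRuntime()
runtime.agent1("topic", "quarterly report")
runtime.agent2("topic")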

For straightforward use cases with limited complexity and well-defined agent interactions, a standalone runtime provides efficiency and simplicity. Development teams can rapidly prototype multi-agent software without addressing cross-process communication and lifecycle management challenges.

Limitations of standalone agent runtime

Despite its simplicity, a standalone agent runtime becomes limiting as multi-agent software scales:

Limited extensibility

Extending multi-agent software requires substantial development effort, such as building new agents from the ground up or reconfiguring the entire LangGraph workflow (for example, modifying existing edges and adding conditional edges) to accommodate new message flows. This process becomes increasingly complex as the number of agents grows.

Standalone architectures create silos that prevent leveraging externally developed components. Agents cannot be easily integrated across organizational boundaries, requiring redundant development of similar capabilities and limiting innovation.

Limited reusability

Organizations cannot effectively reuse agent components across projects or incorporate agents developed by third parties, resulting in duplicated efforts and inconsistent implementations.

2. Distributed agent runtime (gRPC-based) - agent gateway and agent gateway protocol (AGP)

A distributed agent runtime comprises agents and an agent gateway, which orchestrates inter-agent communication using agent gateway protocol (AGP) primitives. The AGP specification defines a standardized communication framework for AI agents.

It supports diverse messaging patterns, including request-response, publish-subscribe, fire-and-forget, and streaming. Built on gRPC, agent gateway exposes a host server which remote agents connect to. This gateway ensures secure, scalable, and efficient interactions between agents, enabling robust multi-agent collaboration.

Such a distributed agent runtime lets developers build event-driven multi-agent workflows that address the limitations of standalone agent runtimes through seamless cross-process communication, agent lifecycle management, and privacy preservation. Agents developed in different programming languages and frameworks can interact across distributed environments, running on different host machines over a network.

The following messaging patterns, along with the accompanying security measures, are defined by AGP and can be implemented by the agent gateway and third-party remote agents.

Request-response: Supports synchronous communication between agents.
Publish-subscribe: Allows agents to publish messages to topics and subscribe to receive messages from topics.
Fire-and-forget: Enables agents to send messages without waiting for a response.
Streaming: Supports both unidirectional and bidirectional streaming.
Security: Employs authentication, authorization, and end-to-end encryption to protect data privacy and integrity.

The agent gateway consists of three primary components:

Control plane
Data plane
Gateway interface (developer-friendly Python bindings)

The control plane handles agent administration, including tenant management, namespace organization, agent categorization, and authentication. It features a registration service for agent onboarding, agent discovery for metadata, and token rotation for secure authentication with OAuth2 tokens. End-to-end encryption ensures message privacy via payload-level encryption by agents and transport-level encryption using HTTPS.

The data plane is responsible for efficiently forwarding messages between agents, ensuring seamless communication. Designed for scalability, it is a critical component of the system, present in both the agent SDK and the backend service.

The gateway interface (Python bindings) provides a convenient way for agents built using heterogeneous frameworks to communicate within a distributed system. These bindings act as a bridge between Python-based clients and the underlying gRPC-based gateway, enabling seamless interaction with both the control plane and data plane services.

Key functions of the gateway interface

Agent communication: The bindings allow agents to send and receive messages. Using gateway.publish() and gateway.receive(), agents can exchange messages asynchronously, ensuring efficient communication even in complex systems.
Session and agent management: The bindings provide methods to create and manage agent sessions. Through gateway.create_agent() and gateway.subscribe(), agents can register with the gateway server and establish communication channels for ongoing interactions.
Dynamic route management: With gateway.set_route(), agents can dynamically define routes for message delivery, ensuring messages are directed to the correct recipients across distributed systems.
Asynchronous operations: Built on Python’s asyncio library, the bindings enable non-blocking operations, allowing agents to handle multiple tasks concurrently without waiting for messages or replies, ensuring responsiveness in real-time environments.
OpenTelemetry integration: For observability, the gateway interface integrates with OpenTelemetry to trace agent activity and monitor performance, providing insights into the system’s health.

The gateway server manages agent registrations and message routing, while the Python client interacts with these services, managing subscriptions, sending messages, and handling responses asynchronously.
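To make the call pattern above concrete, here is a minimal sketch that uses an in-process stand-in instead of the real bindings. The class `InMemoryGateway`, the shared `hub` dictionary, and every method signature are assumptions made for illustration; only the method names (`create_agent`, `subscribe`, `set_route`, `publish`, `receive`) come from the text, and the actual bindings may take different arguments.

```python
import asyncio
from collections import defaultdict


class InMemoryGateway:
    """Hypothetical in-process stand-in for the gateway bindings.

    Method names follow the article; all signatures are assumptions.
    """

    def __init__(self, hub):
        self._hub = hub                # shared dict: topic -> list of inbox queues
        self._inbox = asyncio.Queue()  # messages delivered to this agent
        self._route = None             # topic that publish() sends to
        self.agent_id = None

    async def create_agent(self, organization, namespace, agent_type):
        # Register a local identity for this client
        self.agent_id = f"{organization}/{namespace}/{agent_type}"
        return self.agent_id

    async def subscribe(self, topic):
        # Ask to receive everything published to this topic
        self._hub[topic].append(self._inbox)

    async def set_route(self, topic):
        # Choose where subsequent publish() calls are delivered
        self._route = topic

    async def publish(self, payload):
        if self._route is None:
            raise RuntimeError("call set_route() before publish()")
        for inbox in self._hub[self._route]:
            await inbox.put((self.agent_id, payload))

    async def receive(self):
        # Wait without blocking the event loop until a message arrives
        return await self._inbox.get()


async def demo():
    hub = defaultdict(list)
    writer, reviewer = InMemoryGateway(hub), InMemoryGateway(hub)
    await writer.create_agent("acme", "default", "writer")
    await reviewer.create_agent("acme", "default", "reviewer")
    await reviewer.subscribe("acme/default/reviewer")
    await writer.set_route("acme/default/reviewer")
    await writer.publish("draft v1")
    return await reviewer.receive()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The writer never holds a reference to the reviewer; it only knows a topic string, which is the loose coupling the bindings are meant to provide.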

Agent identification and routing

Each agent is uniquely identified using a hierarchical structure:

AgentID = Organization/Namespace/Agent-type/Agent-UUID

Organization – The tenant that owns and registers the agent.
Namespace – A logical partition for traffic segmentation.
Agent-type – The category/role of the agent, such as "finance" or "healthcare".
Agent-UUID – A unique identifier assigned per agent instance, changing with each restart.
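The four-part identifier can be modeled as a small value type. The sketch below is illustrative only: the `AgentID` class and its `new` and `parse` helpers are hypothetical names, and the slash-separated string form is taken from the structure shown above.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentID:
    """Hypothetical value type for Organization/Namespace/Agent-type/Agent-UUID."""
    organization: str
    namespace: str
    agent_type: str
    agent_uuid: str

    @classmethod
    def new(cls, organization, namespace, agent_type):
        # A fresh UUID per start, so the identifier changes with each restart
        return cls(organization, namespace, agent_type, str(uuid.uuid4()))

    @classmethod
    def parse(cls, text):
        parts = text.split("/")
        if len(parts) != 4 or not all(parts):
            raise ValueError(
                f"expected Organization/Namespace/Agent-type/Agent-UUID, got {text!r}"
            )
        return cls(*parts)

    def __str__(self):
        return "/".join(
            (self.organization, self.namespace, self.agent_type, self.agent_uuid)
        )
```

Calling `AgentID.new("acme", "prod", "finance")` twice yields two identifiers that share the first three segments but differ in the last one.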

Topic structure and message routing

Messages are routed based on topic structures, which define how they reach their destination:

One-to-many: Organization/Namespace/Agent-type
- Targets all agents of a specific type within the namespace.
- The fan-out parameter controls whether messages go to all instances or a subset.
One-to-one: Organization/Namespace/Agent-type/Agent-UUID
- Sends messages to a specific agent instance based on its unique identifier.
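One way to read the two topic shapes is by segment count: three segments address an agent type, four address a single instance. The helper names `classify_topic` and `topic_matches` below are hypothetical, and the fan-out parameter is not modeled.

```python
def classify_topic(topic):
    """Return 'one-to-many' for 3-segment topics and 'one-to-one' for 4-segment topics."""
    parts = topic.split("/")
    if not all(parts) or len(parts) not in (3, 4):
        raise ValueError(f"unsupported topic shape: {topic!r}")
    return "one-to-many" if len(parts) == 3 else "one-to-one"


def topic_matches(topic, agent_id):
    """True if the agent with this full AgentID string is addressed by the topic."""
    classify_topic(topic)  # validates the topic shape
    topic_parts = topic.split("/")
    agent_parts = agent_id.split("/")
    # A topic addresses an agent when it is a prefix of (or equal to) the AgentID
    return agent_parts[:len(topic_parts)] == topic_parts
```

With this reading, `acme/prod/finance` matches every finance instance in the namespace, while `acme/prod/finance/1234` matches only the instance whose UUID is `1234`.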

The gateway maintains subscription tables to efficiently map incoming messages to their intended recipients. It also keeps track of active agents and their connections via two connection tables:

Agent-to-connection table: Maps agent identifiers to active network connections.
Reverse connection table: Allows efficient cleanup when connections drop by mapping connections back to their agents.
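The pair of tables can be sketched with two dictionaries kept in sync. This is an assumption about the data layout, not the gateway's actual implementation; `ConnectionTables` and its method names are hypothetical.

```python
class ConnectionTables:
    """Hypothetical forward and reverse connection tables."""

    def __init__(self):
        self.agent_to_conn = {}   # agent ID -> connection ID
        self.conn_to_agents = {}  # connection ID -> set of agent IDs

    def register(self, agent_id, conn_id):
        # If the agent moved to a new connection, drop the stale reverse entry
        old = self.agent_to_conn.get(agent_id)
        if old is not None and old != conn_id:
            self.conn_to_agents[old].discard(agent_id)
        self.agent_to_conn[agent_id] = conn_id
        self.conn_to_agents.setdefault(conn_id, set()).add(agent_id)

    def drop_connection(self, conn_id):
        """Remove every agent reachable through a dropped connection and return them."""
        agents = self.conn_to_agents.pop(conn_id, set())
        for agent_id in agents:
            self.agent_to_conn.pop(agent_id, None)
        return agents
```

The reverse table is what keeps cleanup cheap: when a connection drops, the affected agents are found with one lookup instead of a scan over every registered agent.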

Optimized subscription tables

To ensure efficient routing, subscriptions are structured hierarchically:

Main table: Maps organization/namespace pairs to agent-type-specific tables.
Agent-type tables: Track agents within each category for quick lookups.

This structure optimizes message delivery by reducing lookup overhead and ensuring efficient memory access.
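A nested mapping is one plausible way to realize this two-level layout. The sketch below assumes the main table is keyed by an (organization, namespace) tuple and that each agent-type table is a set of agent UUIDs; the class and method names are hypothetical.

```python
from collections import defaultdict


class SubscriptionTable:
    """Hypothetical two-level table: (org, namespace) -> agent type -> agent UUIDs."""

    def __init__(self):
        self._main = defaultdict(lambda: defaultdict(set))

    def subscribe(self, organization, namespace, agent_type, agent_uuid):
        self._main[(organization, namespace)][agent_type].add(agent_uuid)

    def unsubscribe(self, organization, namespace, agent_type, agent_uuid):
        type_tables = self._main.get((organization, namespace))
        if type_tables is None:
            return
        agents = type_tables.get(agent_type)
        if agents is None:
            return
        agents.discard(agent_uuid)
        # Prune empty levels so the table does not grow without bound
        if not agents:
            del type_tables[agent_type]
        if not type_tables:
            del self._main[(organization, namespace)]

    def lookup(self, organization, namespace, agent_type):
        """Return a copy of the UUIDs subscribed under this agent type."""
        type_tables = self._main.get((organization, namespace), {})
        return set(type_tables.get(agent_type, ()))
```

Each lookup is two hash probes regardless of how many tenants or agent types exist, which is the kind of reduced lookup overhead the structure aims for.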

Communication patterns

The gateway uses the following three message forwarding strategies to route messages efficiently based on the intended recipients.

Unicast forwarding

Messages are sent to a specific agent instance using a fully qualified topic. The system validates the destination and ensures point-to-point delivery.

Broadcast forwarding

Messages are sent to all agents of a particular type by omitting the agent-UUID and using a fan-out mechanism. This is useful for distributing system-wide updates or commands.

Anycast forwarding

A message is sent to one randomly selected instance of an agent type, ensuring load balancing while minimizing unnecessary message duplication.
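The three strategies differ only in how they pick recipients from the set of instances registered for an agent type. The functions below are a sketch under that assumption; their names and the optional `rng` parameter are hypothetical.

```python
import random


def unicast(instances, agent_uuid):
    """Deliver to exactly one named instance, validating that it exists."""
    if agent_uuid not in instances:
        raise LookupError(f"unknown agent instance: {agent_uuid!r}")
    return [agent_uuid]


def broadcast(instances):
    """Deliver to every instance of the agent type."""
    return sorted(instances)


def anycast(instances, rng=random):
    """Deliver to one randomly chosen instance of the agent type."""
    if not instances:
        raise LookupError("no instances available for anycast")
    return [rng.choice(sorted(instances))]
```

Passing a seeded `random.Random` as `rng` makes the anycast choice reproducible in tests while leaving the default behavior random.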

Event handling and message processing

Connection events

On connection: The system authenticates the agent and updates its connection tables.
On disconnection: The system removes the agent’s entries, updates subscription tables, and notifies other components if needed.

Subscription management

Subscribe events: Register an agent for receiving messages of a specific topic.
Unsubscribe events: Remove an agent from subscription lists and propagate updates.

Message handling

The system validates incoming messages, matches them against active subscriptions, and forwards them accordingly.
If no direct match is found, default routing strategies are applied to ensure message delivery.
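The validate, match, and forward steps can be sketched as a single function. The message shape (a dict with `topic` and `payload` keys) and the `default_route` callback are assumptions, since the text does not specify which default routing strategies the gateway applies.

```python
def handle_message(message, subscriptions, default_route=None):
    """Validate a message, match it against subscriptions, and list deliveries.

    subscriptions maps a topic string to a set of recipient IDs.
    default_route is a hypothetical fallback called when nothing matches.
    """
    topic = message.get("topic")
    if not isinstance(topic, str) or not topic or "payload" not in message:
        raise ValueError("message needs a non-empty 'topic' and a 'payload'")

    recipients = subscriptions.get(topic)
    if not recipients and default_route is not None:
        recipients = default_route(message)

    # One (recipient, payload) pair per delivery, in a stable order
    return [(recipient, message["payload"]) for recipient in sorted(recipients or ())]
```

For example, a caller could pass a `default_route` that returns a dead-letter recipient so unmatched messages are not silently dropped.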

Proof of concept: Multi-agent software using distributed agent runtime (Agent Gateway and AGP)

This section explores a distributed agent runtime for multi-agent software using an event-driven publish-subscribe model. The system leverages an agent gateway and the agent gateway protocol (AGP) to facilitate messaging between agents built on heterogeneous agentic frameworks. Four remote agents, running on different hosts, connect to a remote gateway and use topics for structured communication.

System design

One gateway host (gRPC server): Manages core logic and request processing.

Four gateway clients (gRPC clients): Send requests to invoke gateway functions for agent interactions. Each client is part of a LangGraph- or AutoGen-based agentic application.

Components

Agents & tools	Framework	Protocol	Publishes to	Subscribes to	Host/Deployment	Runtime
IoA Agent Gateway	gRPC	AGP	-	-	EC2 instance #1	gRPC Agent Host Servicer
Writer Agent	LangGraph	AGP	GroupChat Topic	GroupChat Topic, Writer Topic	EC2 instance #4	gRPC Agent Runtime Worker #1
Reviewer Agent	AutoGen	AGP	GroupChat Topic	GroupChat Topic, Editor Topic	EC2 instance #2	gRPC Agent Runtime Worker #2
User Interface Agent	AutoGen	AGP	-	UI Topic	Local Machine	gRPC Agent Runtime Worker #3
Central Orchestrator Agent	AutoGen	AGP	UI Topic	GroupChat Topic	EC2 instance #3	gRPC Agent Runtime Worker #4
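The subscriptions in the table determine who receives each published message. The sketch below transcribes the "Subscribes to" column into a dictionary, writing the topic names as GroupChat, Writer, Editor, and UI; the `recipients` helper is hypothetical and ignores hosts, transport, and whether a publisher also receives its own message.

```python
# Transcribed from the "Subscribes to" column of the components table
SUBSCRIPTIONS = {
    "Writer Agent": {"GroupChat", "Writer"},
    "Reviewer Agent": {"GroupChat", "Editor"},
    "User Interface Agent": {"UI"},
    "Central Orchestrator Agent": {"GroupChat"},
}


def recipients(topic):
    """Return the agents subscribed to a topic, in alphabetical order."""
    return sorted(
        agent for agent, topics in SUBSCRIPTIONS.items() if topic in topics
    )
```

Under this reading, a message published to the GroupChat topic reaches the Writer, Reviewer, and Central Orchestrator agents, while a message published to the UI topic reaches only the User Interface Agent.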

Flow Diagram

[Figure: flow diagram of the proof-of-concept system]

Watch a demo of the system in action

Key characteristics

Distributed agents with gRPC interconnectivity
- Agents run on different hosts/machines, with no direct knowledge of each other
- gRPC provides high-performance, language-agnostic communication between components
- Geographic distribution becomes possible, enabling global-scale agent systems
Event-driven communication through PUB-SUB messaging
- Agents utilize a publish-subscribe model where messages are published to topics
- The runtime ensures message delivery based on subscriptions
- This event-driven approach decouples senders from receivers, enhancing system flexibility
- Agents can respond dynamically to system events without tight coupling
Heterogeneous framework integration
- Agents are implemented using different frameworks such as LangGraph and AutoGen
- Each runtime instance typically hosts a single agent, enhancing fault isolation

The proof of concept validates that these integrated technologies and architectural patterns work together to create a scalable, flexible system capable of supporting heterogeneous AI workflows across organizational boundaries.

Choosing the right architecture

The decision between standalone and distributed runtime architectures depends on specific application requirements:

Choose standalone runtime when:

Building simple prototypes
Working with a limited number of agents that share common resources
Operating within a single development team
Performance is critical and inter-process communication would add unacceptable overhead

Choose distributed runtime when:

Incorporating agents from multiple sources or organizations
Building complex systems that may scale beyond a single machine
Supporting diverse programming languages and frameworks
Creating resilient systems that require workload distribution
Developing enterprise-grade software with long-term extensibility requirements

Benefits of the distributed runtime paradigm

The distributed agent runtime delivers multiple advantages for AI systems:

Cross-organizational integration

Agents from different organizations can seamlessly interact
Third parties can leverage existing agents without rebuilding them
Agent capabilities become accessible services over networks

Development and operational efficiency

Teams can develop specialized agents independently
Workloads can be distributed across machines for better scaling
Runtime infrastructure handles agent lifecycle management

Build with us at the AGNTCY

This architecture enables unprecedented integration possibilities while simplifying the development of complex, collaborative AI systems that work effectively across organizational boundaries.

To learn more, collaborate with us at the AGNTCY - an open source collective building the infrastructure for the Internet of Agents.
