Today's AI systems are incredibly sophisticated, revolutionizing the way businesses operate. From answering customer inquiries to advanced problem-solving, AI is transforming workflows across industries. These advanced systems don't just follow rules: they make decisions, collaborate with other AI, and adapt to unpredictable situations. And while they can do amazing things, they also face unique challenges that traditional monitoring metrics don't capture.
These gaps have made it clear that AI observability must extend beyond traditional metrics like latency, error rates, token usage, and cost. Businesses need smarter tools that can answer higher-order questions: How can organizations ensure that AI consistently delivers useful and accurate results? How do companies monitor these systems to minimize errors and prevent missed opportunities?
Splunk and AGNTCY have teamed up to address these challenges head-on, introducing tools and standards designed to transform how organizations monitor and improve their AI systems.
Splunk is collaborating with AGNTCY, a Linux Foundation initiative, to establish open standards for monitoring AI systems. Here's how:
Splunk and AGNTCY are advancing agentic semantic conventions within the OpenTelemetry (OTel) schema, a vendor-neutral open standard designed for annotating, tracking, and measuring LLM and agent-level telemetry. By contributing this schema to OTel and adopting it in Splunk, customers gain a consistent, portable way to capture and share AI performance data across different systems and vendors.
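To make this concrete, here is a minimal sketch (not an official example) of annotating a single LLM call with semantic-convention attributes in Python. The attribute names come from the current OpenTelemetry GenAI conventions, which are still incubating; the agent-level conventions being contributed may extend or rename them, and the model client below is a stand-in.

```python
# A minimal sketch of annotating an LLM call with OTel GenAI semantic-convention
# attributes. The conventions are still incubating, so names may evolve.
from opentelemetry import trace

tracer = trace.get_tracer("my-agent-app")

def call_llm(prompt: str) -> str:
    # Stand-in for a real model client; swap in your provider's SDK call.
    return f"echo: {prompt}"

def ask_model(prompt: str) -> str:
    # The conventions name spans "{operation} {model}", e.g. "chat gpt-4o".
    with tracer.start_as_current_span("chat gpt-4o") as span:
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.system", "openai")
        span.set_attribute("gen_ai.request.model", "gpt-4o")

        answer = call_llm(prompt)

        # Report the real token counts from your provider's response here.
        span.set_attribute("gen_ai.usage.input_tokens", 42)
        span.set_attribute("gen_ai.usage.output_tokens", 128)
        return answer

print(ask_model("What is my card's APR?"))
```

Because every framework tags its spans the same way, a backend can aggregate and compare them without vendor-specific parsing.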
Building on those semantic conventions, Splunk's integration of the AGNTCY Metrics Compute Engine (MCE) can calculate next-gen quality metrics, such as factual accuracy and coherence, alongside operational signals (latency, errors, throughput). By replacing custom-built pipelines with a reliable, vendor-neutral solution, MCE streamlines performance monitoring, empowering teams to optimize AI systems with actionable insights.
The AGNTCY Metrics Compute Engine (MCE) delivers a comprehensive, dual-layered analysis of AI system performance. It moves beyond conventional monitoring by integrating quantitative statistical analysis with advanced qualitative evaluation, providing a complete picture of operational efficiency and output quality.
MCE’s expanded metrics are generated through a two-step approach:
One of the core challenges in observing modern AI systems is data chaos. Telemetry from diverse LLMs, autonomous agents, and various frameworks arrives in countless formats.
The MCE normalizes telemetry from disparate AI frameworks, including LLMs and agents, into a unified schema based on vendor-neutral OpenTelemetry standards. This process creates a foundation of standardized data, eliminating the data silos and inconsistencies that obscure performance insights.
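As a purely illustrative sketch of that idea, here is what mapping two made-up vendor payloads onto one shared record might look like; the field names are invented for this example, and the real schema lives in the AGNTCY repos.

```python
# Hypothetical normalization: map telemetry from different frameworks onto one
# OTel-aligned record. The source payload shapes below are invented.
from dataclasses import dataclass

@dataclass
class NormalizedLLMRecord:
    system: str          # e.g. "vendor_a", "vendor_b"
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float

def from_vendor_a(event: dict) -> NormalizedLLMRecord:
    # Vendor A nests token usage and reports latency in seconds.
    return NormalizedLLMRecord(
        system="vendor_a",
        model=event["model"],
        input_tokens=event["usage"]["prompt_tokens"],
        output_tokens=event["usage"]["completion_tokens"],
        latency_ms=event["duration_s"] * 1000,
    )

def from_vendor_b(event: dict) -> NormalizedLLMRecord:
    # Vendor B flattens token counts and already uses milliseconds.
    return NormalizedLLMRecord(
        system="vendor_b",
        model=event["model_name"],
        input_tokens=event["tokens_in"],
        output_tokens=event["tokens_out"],
        latency_ms=event["latency_ms"],
    )
```

Once everything lands in the same shape, downstream metrics can be computed once instead of per framework.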
MCE assesses both the quantitative mechanics of the system and the qualitative value of its output.
Together, these methods provide a 360-degree view of AI performance: the operational heartbeat and the quality of the results.
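A rough, hypothetical sketch of those two layers side by side: operational statistics computed from normalized records, plus a per-response quality score. The scoring function here is a toy placeholder standing in for a real qualitative evaluator.

```python
# Hypothetical dual-layer assessment: operational stats plus a quality score.
# The coherence_score heuristic is a toy stand-in for a real evaluator.
import statistics

def operational_summary(latencies_ms: list[float], errors: int, total: int) -> dict:
    ordered = sorted(latencies_ms)
    return {
        "p50_latency_ms": statistics.median(ordered),
        # Approximate p95 by index for this small sample.
        "p95_latency_ms": ordered[int(0.95 * (len(ordered) - 1))],
        "error_rate": errors / total if total else 0.0,
    }

def coherence_score(response: str) -> float:
    # Placeholder heuristic; a real evaluator would judge structure and accuracy.
    sentences = [s for s in response.split(".") if s.strip()]
    return min(1.0, len(sentences) / 5)

print(operational_summary([120.0, 180.0, 240.0, 950.0], errors=1, total=4))
print(coherence_score("Your minimum payment is due on the 15th. Interest accrues daily."))
```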
Picture this: a retail bank launches an AI assistant in its mobile app to answer credit card and loan questions. While performance metrics show fast responses and few errors, customers still call in for clarifications or complain about unclear repayment steps.
With Splunk’s upcoming AI observability, powered by the AGNTCY Metrics Compute Engine (MCE) and Telemetry Hub, the bank can go beyond basic metrics. Every conversation between the AI and customers is analyzed in real time—not just for technical performance, but also for conversational quality: Did the assistant use the right tone? Was it accurate and compliant? Did it maintain context throughout the chat?
The MCE scores each interaction for coherence and flow, revealing where customers struggle, like fee disputes that may need clearer instructions or APR queries that slip if rate feeds lag. The Telemetry Hub lets the bank compare different AI versions, analyze performance by mobile versus web channels, and connect these insights to business results, such as tracking whether better clarity leads to fewer call center contacts.
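To show the kind of slicing that enables, here is a small hypothetical example (records and field names invented) of comparing per-interaction coherence across assistant versions and channels:

```python
# Hypothetical comparison of per-interaction coherence by assistant version
# and channel; the records below are invented for illustration.
from collections import defaultdict
from statistics import mean

interactions = [
    {"version": "v1", "channel": "mobile", "coherence": 0.72},
    {"version": "v2", "channel": "mobile", "coherence": 0.88},
    {"version": "v2", "channel": "web", "coherence": 0.81},
    {"version": "v1", "channel": "web", "coherence": 0.69},
]

by_segment: dict[tuple, list] = defaultdict(list)
for item in interactions:
    by_segment[(item["version"], item["channel"])].append(item["coherence"])

for (version, channel), scores in sorted(by_segment.items()):
    print(f"{version} / {channel}: avg coherence {mean(scores):.2f} across {len(scores)} chats")
```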
This holistic approach ensures the AI not only works reliably but also communicates effectively, driving better customer experiences and business outcomes.
AI is moving from single-model deployments to multi-agent systems, where specialized agents collaborate on complex tasks. Interoperability—not lock-in—will determine who scales. That’s why AGNTCY is contributing agent semantic conventions to OpenTelemetry (OTel) and hardening shared compute foundations like Telemetry Hub and the MCE.
Taken together, these open building blocks make quality and collaboration measurable today and extend naturally to multi-agent metrics tomorrow—so teams can evolve without losing observability, portability, or vendor neutrality.
"Splunk is excited to partner with the AGNTCY project to establish an open source infrastructure and open standards for agentic applications. This effort will drive observability of these complex systems through standardized instrumentation and unified telemetry across vendors and agents in OpenTelemetry. Splunk's AI agent monitoring will build on this open foundation, leveraging components such as the AGNTCY Metrics Compute Engine, to provide visibility and insights into the performance of agentic and LLM-based applications." — Patrick Lin, Senior Vice President and General Manager of Observability at Splunk
We recommend that you start with AGNTCY Observe (Obs & Eval)—our open source toolkit for instrumenting LLMs/agents and computing quality metrics like factual accuracy and coherence alongside latency and errors.
The GitHub repos below include a quick start and sample app so you can stream telemetry, run our Metrics Compute Engine (MCE), and view results in your preferred dashboard.
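The repos carry the authoritative setup; as a generic illustration of the telemetry-streaming side, standard OpenTelemetry OTLP export to a collector is typically all the wiring a sample app needs (the endpoint below is a placeholder, not a real service address).

```python
# Generic OTLP export wiring: send spans to a collector endpoint so a metrics
# engine or dashboard can pick them up. The endpoint is a placeholder.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("sample-agent")
with tracer.start_as_current_span("chat gpt-4o"):
    pass  # instrumented LLM/agent calls go here
```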
If you're curious about more than metrics, then learn how AGNTCY is shaping the future of multi-agent systems, explore the AGNTCY project on GitHub, and visit the centralized docs at AGNTCY.org.
Whether you're experimenting with agent orchestration or planning to deploy agentic architectures at scale, AGNTCY offers the building blocks for trustworthy, interoperable AI collaboration.