Software development was one of the earliest domains to embrace generative AI (GenAI). Through its ability to understand and produce human-like code, GenAI enables developers to focus on higher-level problem-solving and creative tasks. For many software developers, GenAI has become an essential tool, driving innovation and skyrocketing productivity.
The basic mode of operation for large language models (LLMs) is quite restrictive: given a prompt and a context, predict the next token. For complex tasks—such as building complete projects or scanning large legacy codebases for problems—GenAI users need a more structured and composable approach. This is where agent-based systems come in.
With an agent-based system, engineers break complex tasks down into subtasks, then delegate those subtasks to autonomous AI agents that can operate independently or in collaboration with other agents. Software development teams that use agents manage complex workflows better: the resulting software contains fewer human errors and ships faster.
GenAI agents are autonomous software entities capable of generating content, making decisions, and interacting with their environment to complete specific tasks. They can adapt and learn from new data, offering flexibility and handling complex tasks with minimal human intervention. When you work with GenAI agents, you move well beyond simple chatbot interactions that most individuals associate with GenAI and LLMs.
The complexity and dynamism of software systems make software development an area where GenAI agents excel. Understanding the many different types of agents is useful, especially when you need to compose a multi-agent system to orchestrate the completion of complex tasks.
Reflective agents analyze their own actions and decisions, giving them a self-awareness to adapt and improve. This process of continuous monitoring lets the agent evaluate its performance against predefined goals or expectations. It identifies areas where behavior can be optimized and then adjusts its strategies and algorithms accordingly.
In dynamic environments such as software development—where conditions and requirements are constantly changing—reflective agents thrive. They’re able to maintain high performance even in the face of uncertainty.
As an example, consider a reflective agent that creates a pull request (PR) without any tests. A human developer leaves a review comment such as, “Write some tests!” From this, the agent learns the concept that code changes need accompanying tests for PRs to be approved. In future PRs, the agent will ensure that code has the necessary tests to validate the suggested changes.
Note that in a multi-agent system, the human reviewer can itself be replaced by another GenAI agent.
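To make the loop concrete, here is a minimal sketch of a reflective agent, assuming a hypothetical llm_complete(prompt) helper that wraps whichever LLM you use; the agent turns review feedback into reusable lessons and folds them into future prompts.

# Minimal sketch of a reflective agent. llm_complete(prompt) is a
# hypothetical helper that wraps your LLM of choice.
class ReflectiveCodeAgent:
    def __init__(self, llm_complete):
        self.llm_complete = llm_complete
        self.lessons: list[str] = []  # accumulated self-improvement notes

    def open_pull_request(self, task: str) -> str:
        # Fold previously learned lessons into the prompt.
        guidance = "\n".join(f"- {lesson}" for lesson in self.lessons)
        prompt = (
            f"Implement the following change and describe the PR:\n{task}\n"
            f"Apply these lessons from past reviews:\n{guidance or '- none yet'}"
        )
        return self.llm_complete(prompt)

    def reflect_on_review(self, review_comment: str) -> None:
        # Turn a review comment into a reusable lesson, e.g.
        # "Write some tests!" -> "Every code change needs accompanying tests."
        lesson = self.llm_complete(
            "Summarize this PR review comment as a general rule to follow "
            f"in future pull requests:\n{review_comment}"
        )
        self.lessons.append(lesson.strip())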
Software development relies extensively on tools, and GenAI agents must be able to invoke these tools to perform their tasks. Some of the things these tool-using agents can do include:
Tool-using agents are configured with the knowledge, access, and credentials to use these tools just as a human developer would.
In a multi-agent system, tool-using agents typically focus on interacting with the tools, while other agents issue the requests for them to perform tasks. Examples of tasks include listing all the files in a particular GitHub repository or running a linter on a source file.
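As a rough illustration, a tool-using agent might expose capabilities like these as callable tools. The registry, function names, and the choice of ruff as the linter are assumptions for the sketch, not a prescribed interface; real agents typically describe their tools to the LLM as structured function definitions.

# Illustrative tool registry for a tool-using agent (hypothetical names).
import subprocess
from pathlib import Path

def list_repo_files(repo_path: str) -> list[str]:
    """List the files in a local checkout of a repository."""
    return sorted(str(p) for p in Path(repo_path).rglob("*") if p.is_file())

def run_linter(source_file: str) -> str:
    """Run a linter on a source file and return its output."""
    result = subprocess.run(
        ["ruff", "check", source_file], capture_output=True, text=True
    )
    return result.stdout or "No issues found."

# The orchestrating agent issues a request; the tool-using agent maps it
# to the right tool and returns the result.
TOOLS = {"list_repo_files": list_repo_files, "run_linter": run_linter}

def execute_tool_call(name: str, **kwargs):
    return TOOLS[name](**kwargs)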
A model-based reflex agent uses an internal model of the environment to make decisions. It looks at the current state of a system and the history of previous states, comparing these against its internal model to make more informed decisions. The internal model enables the agent to handle more complex scenarios, especially those in which the action it needs to take depends on an understanding of how the world works or might change.
How might a model-based reflex agent manage resource allocation in a cloud environment? Its internal model gives it an understanding of optimal cloud performance and how various resource changes affect cloud metrics and states. As the agent monitors the current state of the system—such as CPU usage, memory consumption, and network traffic—it adjusts resources accordingly. If the agent detects an increase in traffic that could lead to a bottleneck, it could automatically scale up resources based on its model of past usage patterns and current system state, ensuring optimal performance.
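A minimal sketch of such an agent might look like the following; the metrics, thresholds, and scaling rule are illustrative stand-ins for the agent's internal model, not values from any particular platform.

# Sketch of a model-based reflex agent for cloud resource allocation.
from dataclasses import dataclass, field

@dataclass
class SystemState:
    cpu_usage: float       # fraction of capacity, 0.0-1.0
    memory_usage: float    # fraction of capacity, 0.0-1.0
    requests_per_sec: float

@dataclass
class ResourceAgent:
    replicas: int = 2
    history: list = field(default_factory=list)  # past observed states

    def decide(self, state: SystemState) -> int:
        """Compare the current state and recent history against the
        internal model, then return the desired replica count."""
        self.history.append(state)
        recent = self.history[-5:]
        avg_cpu = sum(s.cpu_usage for s in recent) / len(recent)

        # Internal model: sustained CPU above 80% signals a looming
        # bottleneck; below 30% means we are over-provisioned.
        if avg_cpu > 0.8:
            self.replicas += 1
        elif avg_cpu < 0.3 and self.replicas > 1:
            self.replicas -= 1
        return self.replicas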
A multi-agent system is the embodiment of collaboration, accomplishing complex workflows by stitching together the work of many types of agents, including the ones discussed above. With modern GenAI agents, it's possible to create a fully distributed, peer-based system; swarm robotics, for example, often mimics ant colonies, which coordinate without central control.
Within the context of multi-agent usage in software development, a common approach is to employ a high-level orchestrating agent. This kind of agent ensures that interdependent tasks are executed in the right order and that information flows between the agents. These high-level agents are often model-based reflex agents that hold the state of the overall workflow and drive it forward.
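Below is a bare-bones sketch of that orchestration pattern: the orchestrator tracks which tasks have completed, runs each task once its dependencies are satisfied, and passes results downstream. The agents here are placeholder callables standing in for real GenAI agents.

# Sketch of a high-level orchestrating agent that tracks workflow state
# and runs interdependent tasks in dependency order.
def orchestrate(tasks: dict, dependencies: dict) -> dict:
    """tasks maps a task name to an agent callable; dependencies maps a
    task name to the tasks whose outputs it needs."""
    results: dict = {}
    completed: set = set()

    while len(completed) < len(tasks):
        progressed = False
        for name, agent in tasks.items():
            deps = dependencies.get(name, [])
            if name not in completed and all(d in completed for d in deps):
                # Pass upstream results downstream so information flows
                # between agents.
                results[name] = agent({d: results[d] for d in deps})
                completed.add(name)
                progressed = True
        if not progressed:
            raise ValueError("Circular or unsatisfiable dependencies")
    return results

# Example wiring: plan -> implement -> test
outputs = orchestrate(
    tasks={
        "plan": lambda ctx: "step-by-step plan",
        "implement": lambda ctx: f"code based on: {ctx['plan']}",
        "test": lambda ctx: f"tests for: {ctx['implement']}",
    },
    dependencies={"implement": ["plan"], "test": ["implement"]},
)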
Agentic workflows in software development involve using GenAI agents in collaboration to iteratively plan, execute, and refine tasks. They bring about more adaptable and scalable development processes. Agentic workflows have the following key components:
Adopting agentic workflows in software development can bring substantial benefits to engineering teams. Let’s highlight the most notable ones.
Well-designed systems often follow a consistent pattern where common activities result in a predictable stream of work.
Consider the task of adding a new service and exposing it as a set of RESTful API endpoints. The human developer may define the service in their programming language of choice, implementing it as a class with methods. This implementation step might be done in collaboration with a GenAI agent.
However, preceding the task of exposing the new service as a REST API is a highly mechanical set of subtasks, including:
These steps can be fully delegated to an agent. Although manually building automation is an option, the solution might be brittle and require extensive maintenance. In contrast, a GenAI agent can handle these tasks and adapt to changes.
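For illustration, here is the kind of routing boilerplate such an agent might generate, assuming a Python service exposed with FastAPI; the OrderService and endpoint are hypothetical.

# Hypothetical boilerplate an agent might generate to expose a service
# as a RESTful endpoint, assuming FastAPI.
from fastapi import FastAPI, HTTPException

app = FastAPI()

class OrderService:
    """Stand-in for the service the developer implemented."""
    def get_order(self, order_id: int) -> dict:
        return {"id": order_id, "status": "shipped"}

service = OrderService()

@app.get("/orders/{order_id}")
def get_order(order_id: int) -> dict:
    order = service.get_order(order_id)
    if order is None:
        raise HTTPException(status_code=404, detail="Order not found")
    return order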
To ensure software quality, automated testing is a key practice for mature software teams. However, many trivial bugs lie dormant because of insufficient test coverage. This is where agents excel. A multi-agent system tasked with improving the quality of a software system can:
Software quality receives a significant boost once agents get involved.
Managing a large-scale system—with its many teams, infrastructure components, in-house systems, and third-party integrations—is challenging, to say the least. Innovation moves quickly, and changes are frequent. With this level of complexity, human developers find it difficult to grasp the overall status of the system or allocate resources optimally. Here, a multi-agent system can assist by continuously probing and analyzing the entire system state, adapting to changes, and recommending resource-allocation measures.
Multi-agent systems act as a force multiplier on the effectiveness of individual agents. Consider a multi-agent system tasked with debugging and performance optimization for a large system. In this scenario, we would see the following interplay between agents:
A multi-agent system such as this demonstrates the power of specialized agents working together to achieve a common goal.
As GenAI agents automate repetitive tasks and optimize workflows, development teams can focus on innovation. Efficiency and quality go up, enabling an enterprise to develop features rapidly. With their developers leveraging GenAI, organizations are better positioned to deliver high-quality products that meet customer and stakeholder expectations.
Although the benefits of using GenAI agents can be far-reaching, building agents and integrating them into your dev workflow is certainly not trivial. The challenges and limitations are significant enough to merit attention before jumping in.
An agentic workflow may be ideal for many situations, but the long tail of edge cases can still represent a significant range of scenarios. For example, in testing, an agent may be able to generate test cases with “100% code coverage,” in the sense that each line of code is covered by a test. However, this doesn’t equate to complete coverage. Consider this Python function that divides two integers and returns the result:
def divide(nom: int, denom: int) -> float:
    return nom / denom
Here is a test that verifies it works correctly:
def test_divide():
    assert divide(6, 2) == 3.0
The test exercises every line of code, and it passes, so the naive interpretation is that we have 100% code coverage and the code is correct. However, the divide function will raise an exception if the denominator is 0. It will also fail if nom or denom are not numbers: in Python, function parameters are just objects, so you can call divide with any object, and the int in the function signature is just a type hint that isn't enforced at runtime.
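For instance, tests like the following (written in pytest style, matching the test above and assuming the divide function defined earlier) catch failure modes that the line-coverage metric says nothing about:

import pytest

def test_divide_by_zero():
    # Line coverage was already 100%, yet this case was never exercised.
    with pytest.raises(ZeroDivisionError):
        divide(6, 0)

def test_divide_non_numeric():
    # Type hints are not enforced at runtime, so this call is possible.
    with pytest.raises(TypeError):
        divide("6", 2)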
Agentic workflows lead to the autonomous generation and execution of code, and that code might contain vulnerabilities or expose sensitive data. Why? Keep in mind that many underlying LLMs have been trained on massive volumes of source code, including code that is insecure or simply outdated, relying on dependencies with known vulnerabilities.
This is a serious problem. In an agentic workflow, agents could generate code on the fly to perform their jobs—and human engineers would not be in the loop to audit it.
Integrating agentic workflows into existing processes and infrastructure may require making substantial changes to how tasks are managed and executed. Data flows, communication protocols, and security measures might all need modification to bring about compatibility. The integration process can be complex and time-consuming.
Couple this complexity with the potential impact on your underlying infrastructure. Will you be able to support the increased computational demands and communication needs of agentic workflows? Your infrastructure may need considerable upgrades.
The changes you may need to make are not just technical ones; they also involve organizational adjustments like allocating budget, retraining staff, and updating policies.
In agentic workflows, the additional processing required by each agent can significantly increase computational load. This may lead to latency issues. The cumulative effect of GenAI agents at work can strain system resources. In real-time or frequently changing environments, these delays can impact overall performance.
Additionally, adopting agentic workflows means needing better observability through logging, tracing, and metrics, which further adds to the resource load. In summary, using GenAI agents requires striking a delicate balance between effective system monitoring and optimal performance, especially in high-stakes environments where speed is critical.
Implementing multi-agent systems may be an expensive endeavor. Building the right agents can be resource-intensive, requiring a team of skilled engineers and domain experts. Deploying these systems often necessitates infrastructure upgrades, such as enhanced servers or cloud resources, to handle the increased computational demands.
Tack onto this the ongoing costs of continuous monitoring, maintenance, and regular updates, all of which are necessary to operate effectively in a changing environment. The complexity of multi-agent systems may also lead to the need for specialized support, further driving up costs. Organizations must carefully consider these expenses when deciding whether to implement multi-agent systems.
The integration of GenAI agents and agentic workflows represents a significant step for the future of software development. As engineering teams look to adopt these advancements, they must find an equilibrium between innovation and the stability and reliability of their existing systems.
Ready to learn more about how agents can empower enterprise IT? Check out this article about intent-driven internet of LLM agents (IIOAs).