Beyond Orchestration: Challenges in Managing Multi-Agent Software Workflows
While multi-agent systems show promise, structural limitations and high token costs present significant hurdles to effective orchestration. This post explores the technical boundaries of what can and cannot be delegated to autonomous agents in software development.
Introduction: Moving Beyond the Single-Agent Era toward Multi-Agent Orchestration
The recent emergence of AI coding agents such as Claude Code has fundamentally shifted the development paradigm. A single agent can now deliver results that exceed expectations, provided it is given clear task definitions and the appropriate tools. But progress has not stopped there. Developers are now asking a new question: "Can we give a single agent a high-level goal and have it autonomously decompose that goal into sub-tasks and distribute them to subordinate agents?"
This is more than advanced prompting; it is a paradigm shift. As Addy Osmani has noted, we are moving from a "Conductor" model, in which the developer guides a single performer in real time, to an "Orchestrator" model, in which the developer manages an ensemble of agents collaborating asynchronously. The critical skill is therefore no longer running a single loop of code generation but coordinating a team of agents, each with its own context window and area of responsibility.
Behind this optimistic outlook, however, lie serious challenges. A high-level agent that interprets goals, delegates to sub-agents, and aggregates the results looks sound in theory, but real implementations often run into walls of cost, efficiency, and structural complexity.
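The decompose, delegate, and aggregate loop described above fits in a few lines. This is a minimal sketch under stated assumptions: `decompose`, `run_subagent`, and `orchestrate` are hypothetical names, the sub-agents are stubs rather than real model calls, and the sub-tasks are assumed to be independent so they can run concurrently.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubTask:
    component: str
    spec: str


def decompose(goal: str, components: list[str]) -> list[SubTask]:
    # Hypothetical planner. A real orchestrator would ask a model to split the
    # goal; here we emit one independent sub-task per component so the sketch
    # stays deterministic.
    return [SubTask(c, f"{goal} in the {c} component") for c in components]


async def run_subagent(task: SubTask) -> tuple[str, str]:
    # Stand-in for a model call. Each sub-agent receives only its own spec,
    # not the orchestrator's full context window.
    await asyncio.sleep(0)
    return task.component, f"done: {task.spec}"


async def orchestrate(goal: str, components: list[str]) -> dict[str, str]:
    subtasks = decompose(goal, components)
    # Delegate: sub-agents run concurrently (asyncio.gather preserves order).
    results = await asyncio.gather(*(run_subagent(t) for t in subtasks))
    # Aggregate: collect per-component results into one report.
    return dict(results)


if __name__ == "__main__":
    report = asyncio.run(orchestrate("add structured logging", ["api", "worker", "cli"]))
    for component, output in report.items():
        print(f"{component}: {output}")
```

The sketch is deliberately optimistic: it assumes the sub-tasks share no state, which is exactly the assumption that tends to break down in real systems.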
Latest Trends and Implementation Cases in Multi-Agent Systems
The current AI agent ecosystem is full of experiments that go beyond simple chatbots toward building autonomous organizations. The most notable organize agents like a "city" or a "company." Steve Yegge’s Gastown treats Claude Code agents as components of a city, while Paperclip aims for a "Zero Human Company," arranging agents in a corporate organizational chart. These projects show that agents can be more than tools: they can act as autonomous participants within a workflow.
Platforms and infrastructure are changing as well. Anthropic has unveiled experimental features such as "Claude Code Agent Teams" to support collaboration between agents, and OpenAI has built hand-offs between agents into its Agents SDK. Experimental testbeds such as Scion support container-based multi-agent orchestration, dynamically managing specialized agents that each have their own identity and workspace across both local and remote clusters.
At the product level, the way humans and agents collaborate is being redefined. Cursor has declared the era of "Cloud Agents," in which autonomous agents run in parallel on independent VMs. Likewise, the issue tracker Linear has signaled with "Linear Next" a shift from simple issue tracking to a platform where humans and agents collaborate seamlessly. Taken together, the trend is moving steadily from "individual task execution" toward "autonomous organizational management."
Structural Limitations of Orchestration: The Issues of Cost and Efficiency
Multi-agent systems are not a guaranteed path forward, however. A hands-on analysis on shalomeir’s inside mode finds that multi-agent orchestration risks severe token consumption and runaway costs. In direct testing of the Gastown project, token costs reached approximately $5,000. The cause was not simply the volume of work but the cumulative overhead of each agent re-verifying previous context and synchronizing state every time a task was handed over.
The root cause of this inefficiency is the "cost of state transfer." When tasks are divided among multiple agents, each agent must review the preceding context to understand the current situation. This creates a structural limitation: sharing and re-verifying context between agents consumes significantly more tokens than the actual execution (writing code or generating documents). As a result, costs can rise at least tenfold compared with a single-agent session while productivity actually declines.
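To see why state transfer can dominate, consider a deliberately simplified token model. It is an assumption for illustration, not a measurement from the Gastown test. A single agent reads a shared base context once and then produces all the work. In a hand-off chain, the k-th agent must first read the base context plus everything its predecessors produced, possibly more than once to re-verify it, before doing its own share. The function names and all numbers below are hypothetical.

```python
def single_agent_tokens(base_context: int, n_tasks: int, work_per_task: int) -> int:
    # One agent reads the shared context once, then produces every sub-task's output.
    return base_context + n_tasks * work_per_task


def handoff_chain_tokens(base_context: int, n_tasks: int, work_per_task: int,
                         rereads: int = 1) -> int:
    # Agent k (0-indexed) inherits the base context plus the output of its
    # k predecessors, reads that `rereads` times to re-verify state, then
    # produces its own work_per_task tokens.
    total = 0
    for k in range(n_tasks):
        inherited = base_context + k * work_per_task
        total += rereads * inherited + work_per_task
    return total


if __name__ == "__main__":
    # Hypothetical sizes: a 2,000-token brief and 1,000 tokens of output per sub-task.
    for n in (1, 5, 10):
        single = single_agent_tokens(2_000, n, 1_000)
        chain = handoff_chain_tokens(2_000, n, 1_000, rereads=2)
        print(f"{n:>2} sub-tasks: single={single:,} chain={chain:,} ratio={chain / single:.1f}x")
```

In this model the single-agent cost grows linearly with the number of sub-tasks while the hand-off chain grows quadratically, because every new agent re-reads everything before it. The exact multiplier depends entirely on the assumed sizes and re-read count. Real LLM billing is more complicated (a single agent also re-sends its context on each turn, and prompt caching changes the price), so treat this only as intuition for the shape of the curve.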
Academic research points the same way. UC Berkeley's MAST study ("Why Do Multi-Agent LLM Systems Fail?") released a dataset of failure cases in large-scale multi-agent systems and warns that structural limitations can prevent such systems from operating as intended. The cost explosion and inefficiency observed in the Gastown test are consistent with the structural flaws this research identifies: simply adding more agents does not by itself increase productivity.
Conclusion: Criteria for Effective Delegation and the Changing Role of Developers
So which direction should we take? When building multi-agent systems, it is essential to distinguish between using a workflow framework such as LangGraph and building a system that creates agents dynamically, such as Gastown. The former executes within a predefined pipeline; the latter is closer to running an autonomous organization. Developers must understand the difference between these two models and choose the tool that fits their specific purpose.
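The difference between the two models can be made concrete with a deliberately simplified contrast. This is plain Python, not LangGraph's or Gastown's actual API; `run_pipeline`, `run_dynamic`, and the `max_agents` cap are hypothetical names chosen for illustration.

```python
from typing import Callable

Step = Callable[[str], str]


def run_pipeline(steps: list[Step], task: str) -> str:
    # Workflow model: the sequence of steps is fixed before execution,
    # so the number of agent invocations is known up front (len(steps)).
    state = task
    for step in steps:
        state = step(state)
    return state


def run_dynamic(task: str,
                planner: Callable[[str], list[str]],
                spawn: Callable[[str], str],
                max_agents: int = 10) -> list[str]:
    # Organization model: a planner decides at runtime how many agents to create,
    # so the agent count (and therefore cost) is only known after planning.
    subtasks = planner(task)
    if len(subtasks) > max_agents:
        raise RuntimeError(f"planner requested {len(subtasks)} agents; cap is {max_agents}")
    return [spawn(sub) for sub in subtasks]


if __name__ == "__main__":
    print(run_pipeline([str.strip, str.upper], "  fix login bug  "))
    results = run_dynamic(
        "migrate services",
        planner=lambda t: [f"{t}: shard {i}" for i in range(3)],
        spawn=lambda sub: f"agent handled <{sub}>",
    )
    print(results)
```

The practical difference is where the agent count is decided. In the pipeline it is fixed when the code is written, so cost is predictable; in the dynamic model it is decided at runtime, which is why some explicit budget, here a simple cap, becomes necessary.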
The core competency required of developers moving forward will be establishing "criteria for effective delegation." You cannot delegate every task to an agent.
- Delegatable Areas: Tasks with clear specifications, verifiable outputs, and repetitive patterns.
- Areas Requiring Direct Management: Tasks requiring high creativity, complex context comprehension, or areas where the cost of state transfer between agents outweighs the execution benefits.
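One way to make these criteria explicit is a small checklist function. This is a sketch under assumptions: the `TaskProfile` fields and `should_delegate` are hypothetical, the three delegatable traits are read as all required (the list joins them with "and"), and the two cost fields are assumed to be estimates in the same unit, such as tokens.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    clear_spec: bool
    verifiable_output: bool     # e.g. tests or a type checker can judge the result
    repetitive: bool
    needs_creativity: bool
    needs_deep_context: bool
    handoff_cost: int           # estimated cost of transferring state to another agent
    execution_benefit: int      # estimated cost saved by having an agent do the work


def should_delegate(task: TaskProfile) -> bool:
    # Any "direct management" signal vetoes delegation.
    if task.needs_creativity or task.needs_deep_context:
        return False
    # State transfer that costs as much as it saves is not worth delegating.
    if task.handoff_cost >= task.execution_benefit:
        return False
    # Otherwise delegate only tasks with all three delegatable traits.
    return task.clear_spec and task.verifiable_output and task.repetitive


if __name__ == "__main__":
    add_type_hints = TaskProfile(True, True, True, False, False,
                                 handoff_cost=500, execution_benefit=5_000)
    redesign_architecture = TaskProfile(False, False, False, True, True,
                                        handoff_cost=20_000, execution_benefit=10_000)
    print(should_delegate(add_type_hints), should_delegate(redesign_architecture))
```

The cost comparison encodes the last point in the list: even a well-specified, repetitive task stays with the developer when handing it over would cost more than the agent saves.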
Ultimately, the developer's role will evolve from a "person who writes code" to an "orchestrator who defines clear specifications, logically decomposes tasks, and verifies final results." How we compose this ensemble of agent teams and how we ensure reliability through quality gates will become the core competitive advantage in the future of software engineering.
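A quality gate can be as simple as a list of checks that every agent output must pass before it is accepted. The sketch below assumes outputs are Python source strings; `quality_gate` and the three check functions are hypothetical examples, and a real gate would more likely run the project's test suite, linters, and type checker.

```python
from dataclasses import dataclass
from typing import Callable

# A check inspects an agent's output and returns (passed, reason_if_failed).
Check = Callable[[str], tuple[bool, str]]


@dataclass
class GateResult:
    accepted: bool
    failures: list[str]


def quality_gate(output: str, checks: list[Check]) -> GateResult:
    # Run every check instead of stopping at the first failure,
    # so all problems can be reported back in a single round.
    failures = [reason for passed, reason in (check(output) for check in checks) if not passed]
    return GateResult(accepted=not failures, failures=failures)


def non_empty(output: str) -> tuple[bool, str]:
    return bool(output.strip()), "output is empty"


def no_todo_markers(output: str) -> tuple[bool, str]:
    return "TODO" not in output, "output still contains TODO markers"


def parses_as_python(output: str) -> tuple[bool, str]:
    # compile() only checks syntax; it does not execute the code.
    try:
        compile(output, "<agent-output>", "exec")
        return True, ""
    except SyntaxError as exc:
        return False, f"syntax error: {exc.msg}"


if __name__ == "__main__":
    checks = [non_empty, no_todo_markers, parses_as_python]
    print(quality_gate("def add(a, b):\n    return a + b\n", checks))
    print(quality_gate("def add(a, b:\n    pass  # TODO\n", checks))
```

Collecting all failures rather than stopping at the first is a deliberate choice here: the full list can go back to the agent in one round, instead of paying the state-transfer cost once per defect.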
Evidence-Based Summary
While multi-agent systems show promise, structural limitations and high token costs present significant hurdles to effective orchestration.
Evidence source: Why Doesn't Multi-Agent Orchestration Work Well? (멀티 에이전트 오케스트레이션은 왜 잘 안 되는가?) - shalomeir’s inside mode
This post explores the technical boundaries of what can and cannot be delegated to autonomous agents in software development.
Evidence source: AddyOsmani.com - The Code Agent Orchestra - what makes multi-agent coding work