Daniel Knauf is the Chief Technology Officer, Americas at Merkle.
We are at a turning point in artificial intelligence. Single-function chatbots once sufficed; today, specialized AI agents can manage travel, process payments or even draft proposals. However, as more brands launch their own AI agents, customers face an overwhelming maze of interfaces and interactions, a fragmentation that threatens the very purpose of AI: to simplify lives.
The solution lies in agent-to-agent orchestration, a paradigm where AI agents communicate and collaborate to address complex needs. This approach offers a unified, streamlined experience, eliminating the need for users to manage multiple systems.
The Next Step: Agent-To-Agent Orchestration
Agent orchestration allows personal agents to collaborate with other agents, even across brands and ecosystems. Instead of managing multiple tools, users interact with a single “conductor” agent, which delegates tasks to specialized agents in the background. This creates a seamless, integrated experience that transforms complex ecosystems into unified workflows.
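To make the “conductor” idea concrete, here is a minimal Python sketch, assuming each specialized agent can be called like a local function. The `Conductor` class, the skill names and the two toy agents are hypothetical; in practice each specialist would sit behind a network API, and the sub-tasks would come from the conductor's own interpretation of the user's request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A specialized agent is modeled as a plain function (task in, result out).
# Hypothetical simplification: real agents would be remote services behind an API.
AgentFn = Callable[[dict], dict]


@dataclass
class Conductor:
    """The single agent the user talks to; it routes each sub-task to a specialist."""
    agents: Dict[str, AgentFn] = field(default_factory=dict)

    def register(self, skill: str, agent: AgentFn) -> None:
        self.agents[skill] = agent

    def handle(self, subtasks: List[dict]) -> List[dict]:
        results = []
        for task in subtasks:
            agent = self.agents.get(task["skill"])
            if agent is None:
                # No specialist for this skill: report it rather than fail the whole request.
                results.append({"skill": task["skill"], "error": "no agent registered"})
            else:
                results.append({"skill": task["skill"], **agent(task)})
        return results


# Hypothetical specialists for two of the tasks mentioned in the introduction.
def travel_agent(task: dict) -> dict:
    return {"status": "booked", "route": task["route"]}


def payments_agent(task: dict) -> dict:
    return {"status": "charged", "amount": task["amount"]}


conductor = Conductor()
conductor.register("travel", travel_agent)
conductor.register("payments", payments_agent)
results = conductor.handle([
    {"skill": "travel", "route": "JFK-SFO"},
    {"skill": "payments", "amount": 420},
    {"skill": "proposals", "topic": "Q3 plan"},
])
for r in results:
    print(r)
```

Unknown skills are reported back instead of raising, so one missing specialist does not sink an otherwise complete request.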
By enabling agents to interact and share capabilities, organizations can offer efficient and consistent experiences, restoring simplicity and enhancing customer satisfaction.
Scaling Human-Like Intelligence
AI agents must replicate the nuanced decision-making of human representatives, who blend intuition, domain expertise and guided procedures. Agent orchestration achieves this by dynamically coordinating tasks using a modular architecture. Each specialized service, such as payment processing or troubleshooting, operates as a microservice, while the orchestration layer connects these services logically to resolve complex issues.
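The orchestration layer's job of connecting services logically can be sketched as running microservices in dependency order over a shared context. This assumes a hypothetical refund scenario and uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to order the steps; the service functions are placeholders, not a real troubleshooting or payment API.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple


# Hypothetical microservices: each reads the shared context and returns new facts.
def troubleshoot(ctx: dict) -> dict:
    # Placeholder diagnosis: pretend a duplicate charge was found on the order.
    return {"issue": "duplicate_charge", "refund_due": ctx["charge"]}


def process_refund(ctx: dict) -> dict:
    return {"refunded": ctx["refund_due"]}


def notify_customer(ctx: dict) -> dict:
    return {"message": f"Refunded {ctx['refunded']:.2f} for {ctx['issue']}"}


SERVICES: Dict[str, Callable[[dict], dict]] = {
    "troubleshoot": troubleshoot,
    "refund": process_refund,
    "notify": notify_customer,
}

# The orchestration plan: each step lists the steps it depends on.
PLAN: Dict[str, Set[str]] = {
    "troubleshoot": set(),
    "refund": {"troubleshoot"},
    "notify": {"refund"},
}


def run_workflow(plan: Dict[str, Set[str]], services: Dict[str, Callable[[dict], dict]],
                 ctx: dict) -> Tuple[dict, List[str]]:
    ctx = dict(ctx)  # do not mutate the caller's dict
    order = list(TopologicalSorter(plan).static_order())
    for step in order:
        ctx.update(services[step](ctx))  # merge each service's output into the shared context
    return ctx, order


final_ctx, order = run_workflow(PLAN, SERVICES, {"charge": 59.0})
print(order)                 # ['troubleshoot', 'refund', 'notify']
print(final_ctx["message"])  # Refunded 59.00 for duplicate_charge
```

Declaring dependencies rather than hard-coding a sequence lets the orchestrator add or swap services without rewriting the workflow by hand.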
This orchestration layer mimics human adaptability, ensuring that AI systems not only automate repetitive tasks but also navigate intricate workflows, addressing user demands without frequent human intervention.
Broadcasting Capabilities: Agent Directories
For agents to collaborate effectively, they must understand each other’s capabilities. Future ecosystems will feature standardized directories that list each agent’s functions along with its required inputs and expected outputs. These directories allow agents to identify the best collaborators for specific tasks.
By exposing capabilities in machine-readable formats, organizations maintain control while enabling authorized agents to negotiate and delegate. This turns isolated services into interconnected networks of expertise, reducing complexity and enhancing flexibility.
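A capability directory could be as simple as a list of structured entries that can be exported as JSON and queried by skill. The sketch below is illustrative only: the `Capability` fields, agent IDs and the first-match selection rule are assumptions, and the `allowed_callers` list stands in for whatever authorization scheme an organization actually uses.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Set


@dataclass
class Capability:
    agent_id: str
    skill: str
    inputs: List[str]           # fields the caller must supply
    outputs: List[str]          # fields the agent returns
    allowed_callers: List[str]  # agents authorized to delegate to this one


class CapabilityDirectory:
    """Hypothetical in-memory registry; a real one would be a shared, access-controlled service."""

    def __init__(self) -> None:
        self._entries: List[Capability] = []

    def publish(self, cap: Capability) -> None:
        self._entries.append(cap)

    def to_json(self) -> str:
        # Machine-readable export that other agents could fetch and parse.
        return json.dumps([asdict(c) for c in self._entries], indent=2)

    def find(self, skill: str, available: Set[str], caller: str) -> Optional[Capability]:
        # First agent that offers the skill, whose inputs the caller can supply,
        # and that has authorized this caller.
        for cap in self._entries:
            if cap.skill == skill and set(cap.inputs) <= available and caller in cap.allowed_callers:
                return cap
        return None


directory = CapabilityDirectory()
directory.publish(Capability("acme-payments", "process_payment", ["amount", "currency", "token"],
                             ["receipt_id"], ["acme-support", "personal-agent"]))
directory.publish(Capability("acme-support", "troubleshoot", ["order_id", "symptom"],
                             ["diagnosis"], ["personal-agent"]))

match = directory.find("process_payment", {"amount", "currency", "token"}, caller="acme-support")
print(match.agent_id if match else None)  # acme-payments
print(directory.find("troubleshoot", {"order_id", "symptom"}, caller="acme-payments"))  # None
```

The second lookup fails because `acme-support` has not authorized `acme-payments` as a caller, which is how the publishing organization keeps control over who may delegate to its agent.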
Transforming Customer Experience
Agent orchestration revolutionizes the customer experience. Instead of juggling multiple chatbots or apps, users issue a single natural-language request (a prompt). Their personal agent consults capability directories, identifies the appropriate agents and oversees task completion. This unified approach simplifies interactions, saving time and effort.
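The first step of that flow, turning a prompt into a list of needed capabilities, might look like the following. The keyword table is a deliberately naive, hypothetical stand-in for a real intent model such as an LLM classifier, and the hard-coded `DIRECTORY` mapping replaces an actual directory lookup.

```python
import re
from typing import Dict, List

# Hypothetical keyword table standing in for a real intent model.
INTENT_KEYWORDS: Dict[str, List[str]] = {
    "book_flight": ["flight", "fly"],
    "book_hotel": ["hotel", "stay"],
    "book_ride": ["ride", "taxi", "airport pickup"],
}

# Hypothetical directory: skill -> agent that advertises it.
DIRECTORY: Dict[str, str] = {
    "book_flight": "airline-agent",
    "book_hotel": "hotel-agent",
    "book_ride": "rideshare-agent",
}


def detect_intents(prompt: str) -> List[str]:
    text = prompt.lower()
    found = []
    for skill, words in INTENT_KEYWORDS.items():
        # Whole-word match so that, e.g., "ride" does not fire inside "overridden".
        if any(re.search(rf"\b{re.escape(w)}\b", text) for w in words):
            found.append(skill)
    return found


def plan(prompt: str) -> Dict[str, str]:
    # Map each detected intent to the agent registered for it; unknown skills are skipped.
    return {skill: DIRECTORY[skill] for skill in detect_intents(prompt) if skill in DIRECTORY}


print(plan("Fly me to Chicago on Friday and find a hotel near the venue"))
# {'book_flight': 'airline-agent', 'book_hotel': 'hotel-agent'}
```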
Brands adopting this model gain a competitive edge by becoming synonymous with efficiency and reliability. Over time, public directories could lead to “Agent Stores,” where brands list agent capabilities for broader collaboration. For instance, an airline’s agent might coordinate with hotel and rideshare agents to deliver a seamless travel experience.
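The airline scenario works because each agent's output becomes the next agent's input. In this hypothetical sketch, the flight's arrival time drives the ride pickup, and the pickup date drives hotel check-in. The agent functions, the fixed flight duration and the 30-minute buffer are made-up, and time zones are ignored for brevity.

```python
from datetime import datetime, timedelta


# Hypothetical agents: each returns a dict the next agent can consume.
# Times are naive datetimes; a real system would carry time zones.
def airline_agent(origin: str, dest: str, depart: datetime) -> dict:
    arrival = depart + timedelta(hours=5, minutes=30)  # made-up flight duration
    return {"flight": f"{origin}-{dest}", "arrival": arrival, "airport": dest}


def rideshare_agent(pickup_at: datetime, pickup: str, dropoff: str) -> dict:
    return {"pickup_at": pickup_at, "from": pickup, "to": dropoff}


def hotel_agent(city: str, check_in, nights: int) -> dict:
    return {"city": city, "check_in": check_in, "check_out": check_in + timedelta(days=nights)}


def plan_trip(origin: str, dest: str, depart: datetime, hotel_address: str, nights: int) -> dict:
    flight = airline_agent(origin, dest, depart)
    # Book the ride 30 minutes after landing to allow for deplaning and baggage.
    ride = rideshare_agent(flight["arrival"] + timedelta(minutes=30), flight["airport"], hotel_address)
    # Check-in follows the actual arrival, which may fall on the next calendar day.
    hotel = hotel_agent(dest, ride["pickup_at"].date(), nights)
    return {"flight": flight, "ride": ride, "hotel": hotel}


trip = plan_trip("JFK", "SFO", datetime(2025, 3, 14, 20, 0), "123 Market St", nights=2)
print(trip["ride"]["pickup_at"])  # 2025-03-15 02:00:00
print(trip["hotel"]["check_in"])  # 2025-03-15
```

If the airline agent later reports a delay, rerunning `plan_trip` with the new arrival would shift the ride and, if needed, the check-in date automatically.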
Orchestration also redefines personalization. Beyond remembering purchase histories, advanced systems tailor entire processes to individual needs, proactively assembling agents to meet evolving demands. This creates a level of support that feels intuitive and anticipatory, driving loyalty and trust.
Proposed Architecture For Orchestration
• User Interaction Layer: A single interface where users submit requests, leaving the complexity to the orchestration system.
• Orchestration Layer: Interprets user intent, consults directories, applies rules and coordinates agents.
• Capability Directory: A registry of agent functionalities, ensuring seamless collaboration.
• Context/Policy Engine: Stores user data, enforces privacy and shapes outcomes based on policies.
• Interoperability Layer: Ensures agents adhere to consistent protocols for compatibility.
• Specialized Agents: Execute domain-specific tasks assigned by the orchestrator.
• Response Aggregation: Combines results into a unified response for the user.
This architecture transforms today’s fragmented systems into integrated solutions, offering simplicity and efficiency.
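The seven layers above can be wired together in a few dozen lines to show how a request flows through them. Everything here is hypothetical (agent names, the `demo/1.0` protocol tag, the keyword-based intent step); the point is the shape of the pipeline, with each section of code labeled by the layer it represents.

```python
from typing import Callable, Dict, List

# --- Capability Directory (hypothetical entries) ---
DIRECTORY: Dict[str, dict] = {
    "order_status": {"agent": "logistics-agent", "needs": ["order_id"]},
    "refund": {"agent": "payments-agent", "needs": ["order_id", "card_last4"]},
}

# --- Context/Policy Engine: stored user data plus a sharing rule ---
USER_CONTEXT = {"order_id": "A-1001", "card_last4": "4242", "home_address": "private"}


def policy_filter(needs: List[str], context: dict) -> dict:
    # Share only the fields a capability declares it needs; everything else stays private.
    return {k: context[k] for k in needs if k in context}


# --- Specialized Agents ---
def logistics_agent(payload: dict) -> str:
    return f"Order {payload['order_id']} ships tomorrow"


def payments_agent(payload: dict) -> str:
    return f"Refund issued to card ending {payload['card_last4']}"


AGENTS: Dict[str, Callable[[dict], str]] = {
    "logistics-agent": logistics_agent,
    "payments-agent": payments_agent,
}


# --- Interoperability Layer: one message shape for every agent call ---
def send(agent_id: str, skill: str, payload: dict) -> dict:
    message = {"protocol": "demo/1.0", "skill": skill, "payload": payload}
    return {"agent": agent_id, "result": AGENTS[agent_id](message["payload"])}


# --- Orchestration Layer ---
def interpret(request: str) -> List[str]:
    # Stand-in for real intent understanding: one keyword cue per skill.
    text = request.lower()
    return [skill for skill, cue in (("order_status", "where"), ("refund", "refund")) if cue in text]


def orchestrate(request: str) -> str:
    replies = []
    for skill in interpret(request):
        entry = DIRECTORY[skill]
        payload = policy_filter(entry["needs"], USER_CONTEXT)
        replies.append(send(entry["agent"], skill, payload))
    # --- Response Aggregation ---
    return "; ".join(r["result"] for r in replies) or "Sorry, I could not find a matching service."


# --- User Interaction Layer ---
print(orchestrate("Where is my order, and can I get a refund for the damaged item?"))
# Order A-1001 ships tomorrow; Refund issued to card ending 4242
```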
Preparing For Agent Orchestration
To prepare for agent orchestration, organizations must focus on laying a strong foundation for modularity, integration and interoperability. The first step is to ensure that existing systems and services are modular, with clearly defined inputs, outputs and dependencies. This modular architecture is essential for creating an ecosystem where agents can seamlessly collaborate. Organizations should also begin cataloging the capabilities of their AI agents and microservices in structured directories. These directories should include metadata and access policies, enabling agents to quickly identify and collaborate with the appropriate partners.
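Cataloging can start with a simple publish-time check that every entry declares the fields named above. The sketch below assumes a hypothetical schema (`name`, `version`, `inputs`, `outputs`, `dependencies`, `access_policy`) and rejects entries that are incomplete or depend on services nobody has cataloged.

```python
from typing import Dict, List, Tuple

# Hypothetical minimum schema for a catalog entry.
REQUIRED_FIELDS = {"name", "version", "inputs", "outputs", "dependencies", "access_policy"}


def validate_entry(entry: Dict) -> List[str]:
    """Return a list of problems; an empty list means the entry can be published."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - set(entry))]
    for f in ("inputs", "outputs", "dependencies"):
        if f in entry and not isinstance(entry[f], list):
            problems.append(f"{f} must be a list")
    if "access_policy" in entry and not entry["access_policy"].get("allowed_callers"):
        problems.append("access_policy must name at least one allowed caller")
    return problems


def build_catalog(entries: List[Dict]) -> Tuple[Dict[str, Dict], Dict[str, List[str]]]:
    catalog: Dict[str, Dict] = {}
    rejected: Dict[str, List[str]] = {}
    for e in entries:
        problems = validate_entry(e)
        if problems:
            rejected[e.get("name", "<unnamed>")] = problems
        else:
            catalog[e["name"]] = e
    # Drop entries whose dependencies are not cataloged; repeat until stable,
    # since removing one service can break another that relied on it.
    changed = True
    while changed:
        changed = False
        for name, e in list(catalog.items()):
            missing = [d for d in e["dependencies"] if d not in catalog]
            if missing:
                rejected[name] = [f"unknown dependency: {d}" for d in missing]
                del catalog[name]
                changed = True
    return catalog, rejected


entries = [
    {"name": "payments", "version": "1.2", "inputs": ["amount", "currency"], "outputs": ["receipt_id"],
     "dependencies": [], "access_policy": {"allowed_callers": ["checkout"]}},
    {"name": "checkout", "version": "0.9", "inputs": ["cart_id"], "outputs": ["order_id"],
     "dependencies": ["payments", "inventory"], "access_policy": {"allowed_callers": ["personal-agent"]}},
    {"name": "troubleshooting", "inputs": ["symptom"], "outputs": ["diagnosis"], "dependencies": []},
]
catalog, rejected = build_catalog(entries)
print(sorted(catalog))  # ['payments']
print(rejected)
```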
In addition to building modular systems and directories, organizations must address interoperability by adopting standardized communication protocols. This ensures that agents across different brands or ecosystems can integrate easily without requiring custom configurations. By focusing on these foundational elements, businesses can position themselves to fully embrace agent-to-agent orchestration and deliver a better customer experience.
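Standardized protocols largely come down to every agent agreeing on one message shape. The envelope below is a hypothetical format (the `agent-msg` name, field list and versioning rule are assumptions), not an existing standard; it shows how a receiver can reject messages it cannot safely interpret instead of needing a custom integration per partner.

```python
import json
import uuid
from typing import Any, Dict

# Hypothetical envelope; a real ecosystem would adopt a published standard instead.
PROTOCOL = "agent-msg"
SUPPORTED_MAJOR = 1
REQUIRED = ("protocol", "version", "id", "sender", "recipient", "skill", "payload")


def make_message(sender: str, recipient: str, skill: str, payload: Dict[str, Any]) -> str:
    return json.dumps({
        "protocol": PROTOCOL,
        "version": "1.1",
        "id": str(uuid.uuid4()),  # unique ID so replies and retries can be matched
        "sender": sender,
        "recipient": recipient,
        "skill": skill,
        "payload": payload,
    })


def parse_message(raw: str) -> Dict[str, Any]:
    msg = json.loads(raw)
    missing = [k for k in REQUIRED if k not in msg]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if msg["protocol"] != PROTOCOL:
        raise ValueError(f"unknown protocol: {msg['protocol']}")
    # Accept any minor version within the supported major version.
    if int(msg["version"].split(".")[0]) != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported version: {msg['version']}")
    return msg


raw = make_message("airline-agent", "hotel-agent", "book_hotel", {"city": "Denver", "nights": 2})
msg = parse_message(raw)
print(msg["skill"], msg["payload"])  # book_hotel {'city': 'Denver', 'nights': 2}
```

Treating minor versions as backward-compatible and major versions as breaking is one common convention; the important part is that the rule is shared by every participant.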
Roadblocks To Watch For
While the benefits of agent orchestration are compelling, organizations must address several challenges to unlock its potential. One significant hurdle is ensuring data privacy and compliance. As agents collaborate, they must operate within strict boundaries, accessing only authorized information. Strong governance frameworks and policy enforcement are critical to mitigate risks and maintain trust.
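One way to keep collaborating agents inside strict boundaries is field-level access control on shared user context, with every request logged for audit. The `PolicyEngine` class, agent IDs and sample user data below are hypothetical, and the sketch covers only authorization, not consent management, retention or encryption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class PolicyEngine:
    """Hypothetical field-level access control for shared user context."""
    user_data: Dict[str, str]
    grants: Dict[str, Set[str]]  # agent ID -> fields it is authorized to read
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def read(self, agent_id: str, requested: List[str]) -> Dict[str, str]:
        allowed = self.grants.get(agent_id, set())  # unknown agents get nothing
        released = {}
        for f in requested:
            if f in allowed and f in self.user_data:
                released[f] = self.user_data[f]
                self.audit_log.append((agent_id, f, "allowed"))
            else:
                self.audit_log.append((agent_id, f, "denied"))
        return released


engine = PolicyEngine(
    user_data={"name": "Ana", "email": "ana@example.com", "passport_no": "X1234567", "loyalty_id": "L-88"},
    grants={"airline-agent": {"name", "passport_no", "loyalty_id"}, "hotel-agent": {"name", "email"}},
)
print(engine.read("hotel-agent", ["name", "passport_no"]))  # {'name': 'Ana'}
print([e for e in engine.audit_log if e[2] == "denied"])  # [('hotel-agent', 'passport_no', 'denied')]
```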
Another challenge is overcoming interoperability gaps. Many organizations operate in siloed environments where systems are not designed to work together. This lack of compatibility can hinder the seamless integration needed for orchestration. Finally, businesses should prepare for the upfront investment required to build orchestration frameworks, including infrastructure upgrades, capability directories and standardized APIs. These efforts, while resource-intensive, will be instrumental in driving long-term success.
The Path Forward
Agent orchestration is the next evolution in AI. By turning complexity into a competitive advantage, it allows organizations to meet customer demands with precision and agility. Users no longer need to navigate tools or interfaces; they can focus on goals, trusting the AI ecosystem to handle the details.
This vision ultimately leads us to “agent harmony,” representing a future where AI agents collaborate dynamically to deliver intuitive and effective results. It is a shift from managing tools to managing outcomes, with technology acting as an invisible helper. As organizations embrace this model, they pave the way for AI systems that are not only efficient but also deeply fulfilling for users.
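Take Sora going viral, for example. The few Chinese-language articles about it either go around in circles without saying anything, or string a handful of buzzwords together into nonsense, with not a trace of real AI insight. In fact, quite a few innovations built on Stable Diffusion have come out of China, such as IP-Adapter and LCM, but those all belong to the clever-tricks school. Sora, on the other hand, is in a sense an upgraded Stable Diffusion, a perfect embodiment of OpenAI's "brute force works miracles" ethos.
The Sora technical report has 13 authors. It was led by two people fresh out of their PhDs and one who had worked a few years after an undergraduate degree, all young, and reportedly they ground away at it for a year. A project like this involves a huge amount of data processing and compute, so reaching petabyte scale would not be surprising; the stamina required is well beyond the ordinary, and the outcome stays highly uncertain along the way. Stable Diffusion is actually not one model but a pipeline of three. At its core, it first uses CLIP (also an OpenAI creation) to turn the text into something like a mold for the image, then uses a U-Net (a network shaped a bit like a U-lock, and a widely used architecture) to perform diffusion, which roughly means conjuring the image step by step out of nothing, guided by the mold (though in a so-called latent space), and finally uses a VAE (probably an autoencoder trained by Stability AI) to map the result back to pixel space. Diffusion is a relatively solid mathematical model, hence, presumably, the word "stable" in the name. CLIP and the VAE are trained separately. Sora likely has a similar structure, just with all three models replaced. The counterpart to the VAE is what they call a video compression network, and the U-Net is replaced by a transformer-based diffusion model. It is called a Diffusion Transformer, but that name is misleading, because it is really transformer-based diffusion. Splitting the video into spacetime patches and then tokenizing them is standard practice. The generative model that plays the role of CLIP's conditioning is probably similar as well, with GPT used to enhance it. So getting all of this done in one year, even if nothing in it is especially profound, represents an enormous amount of work.
So the online articles claiming this transformer is something remarkably advanced are talking nonsense. The transformer is a basic building block, and every OpenAI model uses it. When Google invented it, it had an encoder and a decoder, and each decoder block had two attention layers; OpenAI stripped all that down to a single attention layer plus an MLP, then piled on parameters and data without restraint, and the miracle happened. Also, "condition" here does not mean a condition in the everyday sense; it is closer to using a mold to guide or shape the final result, and the word "conditioning" is hard to render in Chinese. In short, the Stable Diffusion playbook can be run all over again. Attention is still all you need. OpenAI's signature is a brute-force aesthetic that steamrolls everything, humans included.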
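The step of breaking a video into spacetime patches and tokenizing them can be illustrated with a toy example. This is a minimal sketch assuming a grayscale video stored as nested lists and non-overlapping patches; according to OpenAI's technical report, Sora takes its patches from the output of the video compression network rather than raw pixels, and models of this kind follow the split with a learned projection into embedding vectors, which is omitted here.

```python
from typing import List

Video = List[List[List[float]]]  # [frame][row][col], grayscale for simplicity


def spacetime_patches(video: Video, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Cut a video into non-overlapping pt x ph x pw blocks and flatten each into a token."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    tokens = []
    # Walk patch origins in time, then row, then column order.
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                token = [video[t][y][x]
                         for t in range(t0, t0 + pt)
                         for y in range(y0, y0 + ph)
                         for x in range(x0, x0 + pw)]
                tokens.append(token)
    return tokens


# A toy 4-frame, 4x4 video whose pixel value encodes its (t, y, x) position.
video = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(4)] for t in range(4)]
tokens = spacetime_patches(video, pt=2, ph=2, pw=2)
print(len(tokens), len(tokens[0]))  # 8 8
print(tokens[0])  # [0, 1, 10, 11, 100, 101, 110, 111]
```

Each token mixes neighboring pixels across both space and time, which is what lets a transformer treat a video as a sequence just as it treats text.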