Tips for building AI agents

This document summarizes key insights and views on the current state and future potential of AI agents, differentiating them from simpler AI workflows, highlighting practical implementation tips, and addressing common misconceptions.

1. Key Definitions and Distinctions

A core area of the discussion revolves around defining what constitutes an "AI agent" as opposed to a "workflow," a distinction born from extensive customer interactions and internal development.

Workflow:

  • Characterized by a "fixed number of steps" where "you have one prompt, you take the output of it, you feed it into prompt B, take the output of that, feed it into prompt C, and then you're done."
  • It's "on rails through a fixed number of steps," meaning the path is predefined and predictable.
  • Each prompt in a workflow is "very specific," taking a single input and transforming it into a specific output (e.g., categorizing a user question).
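
As a minimal sketch of this pattern (with `call_llm` as a hypothetical stand-in for a real LLM API call, stubbed here so the snippet runs), a three-step workflow chains fixed prompts:

```python
# Minimal workflow sketch: three fixed prompts chained in a predefined order.
# call_llm is a hypothetical stub standing in for a real model call.

def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM API call."""
    return f"output({prompt})"

def workflow(user_question: str) -> str:
    # Prompt A: categorize the question -- one specific transformation.
    category = call_llm(f"Categorize this question: {user_question}")
    # Prompt B: draft an answer using that category.
    draft = call_llm(f"Answer this {category} question: {user_question}")
    # Prompt C: polish the draft. Three steps, then we're done.
    return call_llm(f"Polish this answer: {draft}")
```

The path is identical for every input: exactly three model calls, in a fixed order.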

Agent:

  • The defining characteristic is that "you're letting the LLM decide sort of how many times to run. You're having it continuing to loop until it's found a resolution."
  • It's "more autonomous" and the number of steps to complete a task is unknown beforehand (e.g., customer support conversations, iterating on code changes).
  • An agent prompt is "much more open-ended and usually give the model tools or multiple things to check and say, 'Hey, here's the question, and you can do web searches or you can edit these code files or run code and keep doing this until you have the answer.'"
  • Agents become "more and more prevalent and more and more capable" as models and tools improve.
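
The contrast shows up clearly in code. In a sketch along these lines (with `run_llm` and `execute_tool` as hypothetical stubs so the snippet runs), the model, not the program, decides when the loop ends:

```python
# Minimal agent-loop sketch: the model keeps requesting tools until it
# decides it has a resolution. run_llm and execute_tool are stubs.

def run_llm(messages):
    """Stub: a real call would return either a tool request or a final answer."""
    if len(messages) < 4:          # pretend the model asks for tools a few times
        return {"tool": "web_search", "input": "query"}
    return {"answer": "done"}

def execute_tool(name, tool_input):
    """Stub standing in for running a real tool (search, code edit, ...)."""
    return f"result of {name}({tool_input})"

def agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):     # safety cap; normally the model stops itself
        reply = run_llm(messages)
        if "answer" in reply:      # the model decided it found a resolution
            return reply["answer"]
        observation = execute_tool(reply["tool"], reply["input"])
        messages.append({"role": "tool", "content": observation})
    return "no resolution within the step budget"
```

Unlike the workflow, the number of iterations is unknown beforehand; `max_steps` is only a safety net.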

This distinction is crucial for developers to understand when deciding "where agents are appropriate" and ensuring they "shouldn't go after a fly with a bazooka."

2. Practical Advice for Building Agents

The experts offer several actionable tips for developers working with AI agents:

  • Empathy for the Model: Developers should "act like Claude" and "put ourselves in that environment" to understand how the model perceives instructions and available tools. "There's a lot of context and a lot of knowledge that the model maybe does not have and we have to be empathetic to the model and we have to make a lot of that clear in the prompt, in the tool description, and in the environment."
  • Detailed Tool Descriptions: A common pitfall is providing "incredibly bare bones" tool descriptions with "no documentation, the parameters are named A and B." Developers must remember that "it is still a model and you need to be prompt engineering in the descriptions of your tools themselves." Good tool descriptions are "part of the same prompt" and influence the model's behavior.
  • Measurable Results and Feedback Loops: It's critical to "make sure that you have a way to measure your results." Without feedback on performance, developers can "end up building a lot sort of without realizing that either it's not working or maybe something much simpler would've actually done just as good a job." Agents require "some mechanism to get feedback as you're iterating," otherwise "you're just gonna have noise" and won't "converge to the right answer."
  • Start Simple and Iterate: Begin by "starting as simple as possible and having that measurable result as you're, you know, building more complexity into it."
  • Build for Model Improvement: Companies should build products such that "as the models get smarter, your product gets better and better," rather than having their "moat" disappear; the "orchestration around the code, which will persist even as the model gets better, is kind of their niche."

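
To illustrate the tool-description point, here is a sketch contrasting a bare-bones tool definition with a documented one. The schema shape below is illustrative, not any particular API's exact format:

```python
# Illustrative tool definitions. The "bad" one matches the pitfall described
# above: no documentation, parameters named A and B. The "good" one treats
# descriptions as part of the prompt.

bad_tool = {
    "name": "lookup",
    "parameters": {"a": "string", "b": "string"},  # undocumented, cryptic names
}

good_tool = {
    "name": "lookup_order",
    "description": (
        "Look up a customer order by ID. Use this before answering any "
        "question about shipping status. Returns order status and ETA."
    ),
    "parameters": {
        "order_id": {
            "type": "string",
            "description": "The order ID from the user's message, e.g. 'ORD-12345'.",
        },
        "include_history": {
            "type": "boolean",
            "description": "Set true to also return past status changes.",
        },
    },
}
```

The model sees these definitions the same way it sees the prompt, so every description is an opportunity to steer behavior.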
3. Overhyped vs. Underhyped Aspects of Agents

The discussion includes a "hot take" on current perceptions of AI agents:

Overhyped:

  • "Agents for consumers are fairly overhyped right now."
  • This is because it can be as hard to "fully specify your preferences and what the task is as to just do it yourself," and the results are "very expensive to verify."
  • "Trying to have an agent fully book a vacation for you, describing exactly what you want your vacation to be... is almost just as hard as just going and booking it yourself. And it's very high risk."
  • Consumer agents require building up "context so that the model already knows your preferences" over time, which "takes time."

Underhyped:

  • "Things that save people time, even if it's a very small amount of time."
  • Even automating a one-minute task allows it to be done "a hundred times more than you previously would," scaling up previously cost-prohibitive activities.
  • The sweet spot for agents is "a set of tasks that's valuable and complex, but also maybe the cost of error or cost of monitoring error is relatively low."

4. Successful Use Cases and Future Outlook

Current Promising Use Cases:

  • Coding Agents: "Super exciting because they're verifiable, at least partially." Code has "this great property that you can write tests for it," allowing the agent to get "more signal every time it goes through a loop" and "converge on the right answer." The current blocker for coding agents is the lack of "perfect unit tests" for real-world cases, requiring new ways to "verify and... add tests for the things that you really care about."
  • Search Agents: A "really valuable task" where "it's very hard to do deep, iterative search, but you can always trade off some precision for recall and then just get a little bit more documents or a little bit more information than is needed and filter it down."
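
The test-driven loop that makes coding agents verifiable can be sketched as follows; `propose_fix` is a stub standing in for the model, and the "unit tests" are ordinary assertions:

```python
# Sketch of why code is verifiable: run tests after each attempt and use
# the failures as signal. propose_fix stubs out the model's code generation.

def run_tests(add):
    """Return a list of failure messages (empty means all tests pass)."""
    failures = []
    if add(2, 3) != 5:
        failures.append("add(2, 3) should be 5")
    if add(-1, 1) != 0:
        failures.append("add(-1, 1) should be 0")
    return failures

def propose_fix(attempt: int):
    """Stub: the first draft is buggy, the second is correct."""
    if attempt == 0:
        return lambda a, b: a - b   # buggy first draft
    return lambda a, b: a + b       # corrected after seeing the failures

def coding_agent(max_attempts: int = 5):
    for attempt in range(max_attempts):
        candidate = propose_fix(attempt)
        if not run_tests(candidate):   # more signal every time through the loop
            return candidate
    return None
```

Each pass through the loop gives the agent concrete failures to react to, which is exactly the feedback that lets it converge.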

Future of Agents (2025):

  • Business Adoption: Expect "a lot of business adoption of agents, starting to automate a lot of repetitive tasks and really scale up a lot of things that people wanted to do more of before but were too expensive." This includes scenarios where "every single pull request triggers a coding agent to come and update all of your documentation."
  • Multi-Agent Environments: A speculative but interesting area of research is what "a multi-agent environment would look like." The example of having a "bunch of Claudes... play Werewolf together" illustrates the potential for "interesting interaction... that you just haven't seen before." However, "in production we haven't even seen a lot of successful single agents," so multi-agent systems are a further-off exploration for understanding model behavior rather than an immediate practical application.

Conclusion

The conversation emphasizes a pragmatic approach to AI agents, stressing clear definitions, practical development considerations like prompt and tool engineering, and a focus on verifiable, high-value tasks. While consumer-facing agents are currently overhyped, the potential for agents to automate repetitive business tasks and scale existing processes is immense. The future will likely see increasing adoption in enterprise settings and continued research into more complex, autonomous systems.

FAQ - Building Effective AI Agents

What is the core distinction between an "agent" and a "workflow" in AI?

The key difference lies in autonomy and a fixed number of steps. A workflow involves a series of predefined LLM calls chained together, where the path and number of steps are known in advance. Each step typically performs a specific transformation, like categorizing user input. In contrast, an agent is more autonomous; it allows the LLM to decide how many times to loop and what actions to take until a resolution is found. This makes agents suitable for open-ended tasks where the number of steps is unpredictable, such as customer support conversations or iterative code changes.

Why is there a need for formal definitions of AI agents now?

The growing sophistication and capability of AI models have led to the emergence of two distinct patterns: pre-orchestrated workflows and more autonomous agents. Customers and developers have been using various terms for these concepts, leading to confusion. Anthropic saw a need to provide clear definitions, diagrams, and code examples to facilitate better communication and understanding within the industry, especially as models become more capable of handling agentic workflows.

What are some practical tips for developers building AI agents?

Developers should adopt an "empathetic" approach by trying to see the world through the model's lens. This involves providing clear, comprehensive instructions in the prompt, detailed descriptions for tools, and sufficient context within the environment. Many developers focus on detailed prompts but neglect to properly document the tools they provide to the model, similar to how an engineer needs good documentation for functions they use. Remember, the tool descriptions are part of the overall prompt and significantly influence the model's behavior.

What are some overhyped and underhyped aspects of AI agents currently?

Overhyped: Consumer-facing agents, especially for complex tasks like booking an entire vacation, are currently overhyped. Specifying preferences for such tasks can be as difficult as doing it manually, and the high risk associated with errors (e.g., booking flights without user confirmation) makes real-world implementation challenging without significant context building and stepping stones.

Underhyped: Automating small, repetitive tasks that save even a minute of time is significantly underhyped. While seemingly minor, these automations can dramatically scale up activities that were previously cost or time-prohibitive, allowing for a 10x or 100x increase in output.

What are current successful use cases for AI agents?

Coding and search are two prominent examples where agents demonstrate significant utility. Coding agents are particularly exciting due to their verifiability. The ability to run tests provides immediate feedback, allowing the agent to iterate and converge on the correct solution. While perfect unit tests are rare, this feedback mechanism is crucial for improving agent performance. Agentic search is also highly effective for deep, iterative information retrieval where some trade-off between precision and recall is acceptable.
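
The precision-for-recall trade-off described here can be sketched as a two-pass retrieval: over-retrieve, then filter down. The documents, scores, and threshold below are illustrative placeholders (in practice the second pass would be an LLM relevance judgment):

```python
# Sketch of recall-first agentic search: fetch more candidates than needed,
# then discard the irrelevant ones. Scores here are toy stand-ins.

DOCS = [
    ("how to reset a password", 0.9),
    ("password reset troubleshooting", 0.8),
    ("unrelated release notes", 0.3),
    ("office lunch menu", 0.1),
]

def broad_search(query: str, k: int = 4):
    """Over-retrieve: take more candidates than we expect to keep."""
    return sorted(DOCS, key=lambda d: d[1], reverse=True)[:k]

def filter_relevant(candidates, threshold: float = 0.5):
    """Second pass: keep only candidates that clear the relevance bar."""
    return [doc for doc, score in candidates if score >= threshold]

results = filter_relevant(broad_search("password reset"))
```

Accepting some extra documents in the first pass is cheap; missing the right document entirely is not.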

What are the main challenges in improving the performance of coding agents?

While models have made incredible strides in writing code, the next major limiting factor for coding agents is the verification process. In real-world scenarios, perfect unit tests are often unavailable. The challenge lies in finding ways to effectively verify and add tests for critical functionalities, allowing the model itself to assess the correctness of its code and iterate before human intervention is required. Without a strong feedback loop, agents risk generating noise rather than converging on correct answers.

What does the future of AI agents look like in 2025?

In 2025, there's an expectation of widespread business adoption of agents for automating repetitive tasks and scaling up operations. This includes automating tasks like updating documentation every time a pull request is made, which was previously cost-prohibitive. There's also emerging interest in multi-agent environments, where multiple AI agents interact and coordinate, potentially leading to novel emergent behaviors, though widespread production use of multi-agent systems is still distant.
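
As an illustration of the pull-request example, such a trigger might be wired up roughly like this; the event shape and `update_docs_agent` are hypothetical, with the agent stubbed out so the snippet runs:

```python
# Hypothetical sketch: a CI job or webhook handler that hands each pull
# request's diff to a documentation-updating agent.

def update_docs_agent(diff: str) -> str:
    """Stub standing in for an agent that rewrites docs based on a code diff."""
    return f"docs updated for: {diff}"

def on_pull_request(event: dict) -> str:
    # In a real setup this would receive the event from the CI system.
    diff = event.get("diff", "")
    return update_docs_agent(diff)
```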

What general advice is given to developers exploring AI agents?

The most crucial advice is to measure your results and start as simply as possible. Building in a vacuum without feedback mechanisms can lead to inefficient or ineffective solutions. Developers should focus on creating products where the product itself improves as the underlying AI models get smarter, rather than building solutions whose "moat" disappears with model advancements. It's also beneficial to start building "agentic muscle" to better understand these capabilities as the landscape shifts.
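
A minimal version of "measure your results" is a small eval harness run after every change; the cases and toy agent below are placeholder stand-ins for a real eval set:

```python
# Sketch of a feedback loop: score the agent on a fixed eval set so each
# change can be measured instead of guessed at. The cases are illustrative.

def my_agent(question: str) -> str:
    """Toy agent standing in for a real LLM-backed one."""
    return {"What is 2+2?": "4", "Capital of France?": "Paris"}.get(question, "")

EVAL_CASES = [
    ("What is 2+2?", "4"),
    ("Capital of France?", "Paris"),
]

def evaluate(agent) -> float:
    """Fraction of eval cases the agent answers correctly."""
    passed = sum(agent(q) == expected for q, expected in EVAL_CASES)
    return passed / len(EVAL_CASES)
```

Even a handful of cases turns "it seems better" into a number that can converge, which is the point made above about avoiding noise.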
