AI Agents: A Detailed Briefing

This briefing summarizes the key concepts and distinctions related to Large Language Models (LLMs), AI Workflows, and AI Agents, drawing heavily from "AI Agents, Clearly Explained." The document aims to provide a clear understanding for non-technical individuals who use AI tools regularly.

---

I. Large Language Models (LLMs): The Foundation

LLMs are the bedrock upon which popular AI chatbots like ChatGPT, Google Gemini, and Claude are built. They excel at generating and editing text based on vast amounts of training data.

Key Traits:

  • Input-Output Paradigm: A human provides an input (prompt), and the LLM produces an output based on its training data.
    "if I were to ask Chachi BT to draft an email requesting a coffee chat my prompt is the input and the resulting email... is the output."
  • Limited Proprietary Knowledge: LLMs lack access to personal or internal company data.
    As the source states, if asked when the next coffee chat is, "even without seeing the response, both you and I know ChatGPT is going to fail because it doesn't know that information; it doesn't have access to my calendar."
  • Passivity: LLMs are reactive; they "wait for our prompt and then respond."
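
To make this input-output paradigm concrete, here is a minimal sketch in Python. The `call_llm` function is a hypothetical stand-in for whatever chat-completion API you use; the point is that the model only sees the prompt it is given and has no access to private data such as a calendar.

```python
# Minimal sketch of the LLM input-output paradigm.
# `call_llm` is a hypothetical stand-in for any chat-completion API.

def call_llm(prompt: str) -> str:
    # Placeholder: substitute a real LLM API call here.
    return f"[model output for: {prompt!r}]"

# Input: a prompt written by a human.
prompt = "Draft a short email requesting a coffee chat next week."

# Output: text generated purely from the model's training data.
print(call_llm(prompt))

# Asking about private data fails, because the model has no access to it:
print(call_llm("When is my next coffee chat?"))  # it cannot know; no calendar access
```
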
---

II. AI Workflows: Predefined Paths

AI Workflows build upon LLMs by enabling them to interact with external tools and data sources, but always within human-defined parameters.

Key Traits:

  • Human-Defined Logic: Humans program a "predefined path" or "control logic" for the LLM to follow. An example is telling an LLM, "Every time I ask about a personal event, perform a search query and fetch data from my Google Calendar before providing a response." (See the sketch after this list.)
  • Tool Integration: Workflows can incorporate multiple steps and tools, such as accessing Google Calendar, an API for weather data, or a text-to-audio model.
  • Lack of Autonomy: The critical distinction is that "if a human is the decision maker there is no AI agent involvement." Even with hundreds or thousands of steps, if a human dictates the sequence, it remains an AI workflow.
  • Retrieval Augmented Generation (RAG): This is a specific type of AI workflow.
    "In simple terms rag is a process that helps AI models look things up before they answer like accessing my calendar or the weather service essentially Rag is just a type of AI workflow."
  • Manual Iteration: If the output of a workflow is unsatisfactory (e.g., a social media post "is not funny enough"), a human must "manually go back and rewrite the prompt" or adjust the workflow. This trial and error is "currently being done by me a human."
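
To illustrate the "predefined path" and RAG ideas above, here is a rough sketch. `call_llm` and `fetch_calendar_events` are hypothetical placeholders; what matters is that the control logic (when to look things up) is written by the human, not decided by the model.

```python
# Sketch of a human-defined AI workflow with a RAG-style lookup step.
# `call_llm` and `fetch_calendar_events` are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt!r}]"          # stand-in for a real LLM call

def fetch_calendar_events() -> list[str]:
    return ["Coffee chat with Alex, Thursday 10:00"]  # stand-in for a Google Calendar query

PERSONAL_KEYWORDS = ("coffee chat", "meeting", "appointment")

def answer(question: str) -> str:
    # Human-defined control logic: every time the question is about a personal
    # event, fetch calendar data *before* asking the model (the RAG step).
    if any(keyword in question.lower() for keyword in PERSONAL_KEYWORDS):
        context = "\n".join(fetch_calendar_events())
        question = f"My calendar:\n{context}\n\nQuestion: {question}"
    return call_llm(question)

print(answer("When is my next coffee chat?"))
```

Note that the if-statement is the human's decision, not the model's; add more branches and tools and it is still a workflow, because the human remains the decision maker.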

Real-World Example:

The source illustrates an AI workflow using make.com:

  • Google Sheets: Compiling links to news articles.
  • Perplexity: Summarizing news articles.
  • Claude: Drafting LinkedIn and Instagram posts based on a human-written prompt.
  • Scheduling: Automating the process to run daily.
This is an AI workflow because it "follows a predefined path set by me: step one, you do this; step two, you do this; step three, you do this; and finally, remember to run daily at 8 am."
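
The same fixed-sequence idea can be sketched as a simple script rather than a make.com scenario. Every function below is a hypothetical placeholder for the corresponding tool (Google Sheets, Perplexity, Claude); the human dictates both the order of the steps and the schedule.

```python
# Sketch of the news-to-social-posts workflow as a fixed sequence of steps.
# All functions are hypothetical placeholders for the real tools.

def read_article_links_from_sheet() -> list[str]:
    return ["https://example.com/article-1"]              # step 1: Google Sheets

def summarize(url: str) -> str:
    return f"[summary of {url}]"                          # step 2: Perplexity

def draft_posts(summary: str) -> dict[str, str]:
    return {"linkedin": f"[LinkedIn post: {summary}]",    # step 3: Claude, following
            "instagram": f"[Instagram post: {summary}]"}  # a human-written prompt

def run_workflow() -> None:
    for url in read_article_links_from_sheet():
        print(draft_posts(summarize(url)))

# Step 4: a scheduler (cron, make.com, etc.) runs this daily at 8 am.
# If the posts "are not funny enough", a human must edit the prompt and rerun;
# the workflow never decides to fix itself.
run_workflow()
```
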
---

III. AI Agents: Autonomous Reasoning and Action

AI Agents represent the next level of AI capability, distinguishing themselves by replacing the human decision-maker with an LLM.

Key Traits:

  • LLM as Decision Maker: The "one massive change that has to happen in order for this AI workflow to become an AI agent is for me, the human decision maker, to be replaced by an LLM."
  • Reasoning: An AI agent must "reason or think about the best approach" to achieve a goal. For example, in the social media post scenario, an AI agent would determine the "most efficient way to compile these news articles."
  • Action via Tools: Agents must "act aka do things via tools," independently selecting and utilizing tools to achieve their objective. This involves assessing which tool is most appropriate (e.g., Google Sheets over Microsoft Word for compiling links if the user is already connected to Google).
  • Iteration and Self-Correction: A critical capability of AI agents is their ability to "iterate" and improve autonomously. Unlike workflows where a human manually rewrites prompts, "an AI agent will be able to do the same thing autonomously." This could involve an agent using another LLM to critique its own output and repeat cycles until criteria are met.
  • ReAct Framework: The "most common configuration for AI agents is the ReAct framework" because "all AI agents must reason and act."
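
As a minimal sketch (not a definitive implementation), a ReAct-style loop could look like the following. The `call_llm` stand-in, the tools, and the critique prompt are all hypothetical; the point is that the LLM, not a human, plans the next action, observes the interim result, and decides whether to iterate.

```python
# Minimal sketch of a ReAct-style agent loop: reason -> act -> observe -> iterate.
# `call_llm`, the tools, and the critique prompt are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt!r}]"   # stand-in for a real LLM call

TOOLS = {
    "search_news": lambda query: f"[articles about {query}]",
    "write_post": lambda notes: f"[draft post based on {notes}]",
}

def run_agent(goal: str, max_iterations: int = 5) -> str:
    result = "nothing yet"
    for _ in range(max_iterations):
        # Reason: the LLM decides what to do next, given the goal and progress so far.
        plan = call_llm(f"Goal: {goal}\nProgress so far: {result}\nWhat should happen next?")

        # Act: carry out the plan with tools (collapsed to two fixed calls in this sketch).
        notes = TOOLS["search_news"](goal)
        result = TOOLS["write_post"](f"{plan} {notes}")

        # Observe and iterate: a second LLM call critiques the interim result and
        # decides whether another pass is needed -- the human is out of the loop.
        verdict = call_llm(f"Is this good enough for the goal '{goal}'? Reply yes or no:\n{result}")
        if "yes" in verdict.lower():
            break
    return result

print(run_agent("Write a funny LinkedIn post about today's AI news"))
```

With a real model behind `call_llm`, the critique step is what replaces the human's "manually go back and rewrite the prompt" trial and error.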

Real-World Example:

An AI vision agent developed by Andrew Ng demonstrates this:

  • When a user searches for "skier," the agent first "reason[s] what a skier looks like (a person on skis going really fast in snow for example)."
  • It then "act[s] by looking at clips in video footage, trying to identify what it thinks a skier is, indexing that clip, and then returning that clip to us."
The key is that "an AI agent did all that instead of a human reviewing the footage beforehand manually identifying the skier and adding tags."
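
A very rough sketch of the same reason-then-act pattern, with entirely hypothetical placeholder functions: the agent first builds its own description of a "skier," then scans the footage against that description instead of relying on human-added tags.

```python
# Sketch of the reason-then-act pattern from the vision-agent example.
# `call_llm` and `clip_matches` are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    return "a person on skis moving fast over snow"   # stand-in for a real LLM call

def clip_matches(clip: str, description: str) -> bool:
    return "ski" in clip                              # stand-in for a vision model

def find_clips(keyword: str, footage: list[str]) -> list[str]:
    # Reason: the agent works out what the keyword looks like.
    description = call_llm(f"Describe what a {keyword} looks like in video footage.")
    # Act: scan the clips, keep (index) the ones that match, and return them.
    return [clip for clip in footage if clip_matches(clip, description)]

print(find_clips("skier", ["beach_day.mp4", "ski_trip.mp4"]))
```
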
---

IV. Summary of Levels:

  • Level 1 (LLM): "We provide an input and the LLM responds with an output. Easy."
  • Level 2 (AI Workflow): "We provide an input and tell the LLM to follow a predefined path that may involve retrieving information from external tools. The key trait here is that the human programs a path for the LLM to follow."
  • Level 3 (AI Agent): "The AI agent receives a goal and the LLM performs reasoning to determine how best to achieve the goal, takes action using tools to produce an interim result, observes that interim result, decides whether iterations are required, and produces a final output that achieves the initial goal. The key trait here is that the LLM is a decision maker in the workflow."

AI Agents, Workflows, and LLMs: FAQ

What are Large Language Models (LLMs) and what are their limitations?

Large Language Models (LLMs) are the foundation for popular AI chatbots like ChatGPT, Google Gemini, and Claude. They excel at generating and editing text based on the input they receive. However, LLMs have two key limitations: they have limited knowledge of proprietary or personal information (like your calendar data), and they are passive, meaning they only respond when prompted by a human.

How do AI workflows improve upon basic LLMs?

AI workflows build on LLMs by allowing them to follow predefined paths to access and use external tools or data. For example, you can instruct an LLM to search your Google Calendar for an event. While an improvement, these workflows are still limited because they can only follow the exact steps a human has programmed. If a follow-up question requires information not covered by the predefined path, the workflow will fail. Retrieval Augmented Generation (RAG) is a common type of AI workflow where the AI looks up information before answering a query.

What is the fundamental difference between an AI workflow and an AI agent?

The crucial difference is who makes the decisions. In an AI workflow, a human sets the predefined path and makes all the decisions if the output isn't satisfactory, requiring manual adjustments. In contrast, an AI agent replaces the human decision-maker with an LLM. The AI agent autonomously reasons, takes action, observes its results, and iterates to achieve a given goal.

What are the three key traits of an AI agent?

AI agents possess three primary traits:

  • Reasoning: They can think about the most efficient approach to achieve a goal.
  • Acting: They can use various tools to perform tasks and produce results.
  • Iterating: They can observe their own output, identify shortcomings, and autonomously refine their process to meet the desired criteria, often by adding steps or critiquing their own work.

What is the "ReAct" framework in the context of AI agents?

The "ReAct" framework is the most common configuration for AI agents because it embodies their core functionalities: "Reason" and "Act." This framework emphasizes that AI agents must be able to think through a problem (reason) and then execute tasks using available tools (act) to reach their objective.

Can you provide a real-world example of an AI workflow?

A real-world example of an AI workflow could involve compiling news article links from Google Sheets, summarizing them using a tool like Perplexity, drafting social media posts (LinkedIn and Instagram) with Claude, and then scheduling these posts to run automatically every day. This is an AI workflow because a human has defined each step in the process, and any necessary adjustments (like making a post funnier) would require manual intervention.

Can you provide a real-world example of an AI agent?

A real-world AI agent example is an AI vision agent designed to identify specific objects, like "skiers," in video footage. When given the keyword "skier," the agent first reasons what a skier looks like, then acts by analyzing video clips to identify what it believes is a skier, indexes that clip, and returns it to the user. The significant aspect is that the AI agent performs all these reasoning and acting steps autonomously, without human pre-tagging or manual review.

How can we visualize the progression from LLMs to AI workflows to AI agents?

  • Level 1 (LLM): You provide an input, and the LLM produces an output based on its training data. It's a simple input-output relationship.
  • Level 2 (AI Workflow): You provide an input and instruct the LLM to follow a predefined path, which may involve retrieving information from external tools. The human programs the path.
  • Level 3 (AI Agent): You provide a goal. The LLM then reasons how to achieve that goal, takes action using tools, observes the interim results, decides if further iterations are needed, and ultimately produces a final output that meets the initial goal. Here, the LLM itself is the decision-maker.