Issue 2: AI Edge Magazine

AI Edge covers planning agents, LangChain, security trends, Claude 4, and MCP integration—curated insights for developers, founders, and tech leaders.


The Curious Mind of a Machine: Building AI Agents with LangChain and OpenAI


A few years ago, language models operated as glorified autocomplete engines, predicting words based on probability rather than purpose. Today, with the right architecture, these models can reason, evaluate, calculate, and deliver targeted outcomes. The shift? They have been given a sense of agency.

LangChain is more than a prompt orchestrator. It introduces a functional layer of autonomy that allows AI to make decisions, invoke tools, and dynamically solve problems. It equips machines with the logic to ask: What should I know next? And where should I go to find it?
In this chapter, we explore how LangChain, paired with OpenAI’s latest models, transforms a static predictor into an intelligent agent capable of adaptive, purposeful reasoning.

From Chains to Choices


In earlier architectures, language models were bound by linear logic—what LangChain calls “chains.” These were static pipelines where a user’s prompt passed through a fixed sequence: model call, output, and optional post-processing. While efficient, they lacked adaptability.
Agents change that paradigm. Rather than following a predetermined route, they assess the task at each step and determine the next best move. Should they retrieve data, perform a calculation, ask a clarifying question, or conclude with a final answer?

This dynamic loop of observation, decision-making, and action transforms the model from a passive responder into an autonomous problem solver. Each response informs the next step, enabling the agent to navigate complexity with purpose.


Inside the Mind of a LangChain Agent


While an agent in LangChain is technically a wrapper around a language model, its core innovation lies in its behavioral loop—not its structure.
The agent maintains an evolving awareness of its current state: what information it holds, what actions it has taken, and what remains unresolved. At each iteration, it consults the language model to determine the next step. If the model identifies a need—be it executing a calculation, running a search, or querying a database—the agent invokes the appropriate tool.
The result is fed back into the loop, updating the agent’s memory and informing its next decision. This self-refining process continues until a final answer is ready.

Why Agents? Why Now?


Traditional chains operate on rigid, predefined paths. Each step in the process is hard-coded, making outcomes reliable but limiting adaptability.
But real-world tasks are rarely linear. An AI system might need to search the web, perform a computation, and consult a private database—sometimes all within a single interaction. Rigid chains can’t handle this complexity.

Agents offer a dynamic alternative. They analyze the task at each step, choose the right tool, and adapt as they progress. This looped reasoning—deciding whether to search, calculate, reflect, or simply respond—enables agents to behave more like human problem-solvers than static scripts.
In this guide, we’ll explore how to build such agents using LangChain and OpenAI’s 2025 API stack. You’ll learn how tools, prompts, and large language models work in concert to create reasoning systems that are both reliable and flexible.

We assume a working knowledge of Python and the fundamentals of large language models. What follows is a step-by-step breakdown of how these components interact—and how to make your AI agent think clearly, act purposefully, and deliver consistently.

The Mental Model: A Detective at Work


Consider a detective unraveling a case. Each new clue—be it a fingerprint, an interview, or a surveillance record—feeds the next line of inquiry. The process is not linear; it’s iterative, reasoned, and adaptive.


LangChain agents follow a similar loop:


  • Think: Assess the current state and identify the next step.
  • Act: Choose a tool or pose a new question.
  • Observe: Evaluate the output or result.
  • Repeat: Continue until the objective is met or confidence is reached.

At each step, the language model decides whether to act or simply respond — and that choice is central to how the agent reasons.



Key Components of an Agent


Building an autonomous AI agent involves orchestrating three core elements—tools, the reasoning engine, and a structured memory system:

Tools: These are external functions or APIs, each defined by a name and description. The agent calls them to carry out tasks such as web search, calculation, or database retrieval.

LLM: This is the reasoning engine, the large language model (e.g., GPT-4o-mini, Gemini 2.0). It chooses the next action or generates a final answer, depending on context and tool results.

Prompt/Scratchpad: The prompt guides the LLM with usage instructions, guardrails, and tool distinctions. The scratchpad acts as memory, storing past actions and outcomes to maintain continuity throughout the reasoning loop. 

Tools: Building Blocks for Actions


A tool is simply a Python function wrapped with metadata. For example, to make a calculator tool that evaluates arithmetic expressions, you might write:
from langchain.tools import Tool

def calculate_expression(expr: str) -> str:
    try:
        # Note: eval is convenient for a demo but unsafe on untrusted input;
        # a production tool should use a restricted math parser instead.
        result = eval(expr)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

def return_dummy_weather(city: str) -> str:
    return f"The weather in {city} is cloudy"

calc_tool = Tool(
    name="Calculator",
    description="Performs simple arithmetic. Input should be a valid Python expression, e.g. '2+2'.",
    func=calculate_expression
)

# Dummy weather tool (func must point to the weather function, not the calculator)
weather_tool = Tool(
    name="WeatherSearch",
    description="Tells the current weather for a city. Input should be a city name, e.g. 'Paris'.",
    func=return_dummy_weather
)

This calculator tool tells the agent that whenever it needs to perform a math operation, it can call the tool named "Calculator" with a string input. The agent's prompt will include the tool’s name and description, along with optional formatting instructions. That description should be clear and specific: vague or incomplete descriptions can confuse the agent, leading it to select the wrong tool or misuse the right one.
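
To check exactly what the agent will see, you can render the tool list as it appears in the prompt. Here is a small sketch (render_text_description is a LangChain helper; its module path can vary by version):

from langchain.tools.render import render_text_description

print(render_text_description([calc_tool, weather_tool]))
# Calculator: Performs simple arithmetic. Input should be a valid Python expression, e.g. '2+2'.
# WeatherSearch: Tells the current weather for a city. Input should be a city name, e.g. 'Paris'.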

LangChain includes many built-in tools and wrappers for common use cases. For example:

  • Web Search Tool: Interfaces such as TavilySearchResults or the GoogleSerperAPIWrapper allow the agent to perform web searches. These typically require API keys for access.
  • Retriever Tool: This wraps a vector database or document store. In a common pattern, you might first load documents and create a retriever, then expose it to the agent using a tool constructor. The retriever then fetches relevant text snippets from your data in response to a query. (Sketches of the search and retriever patterns follow this list.)
  • Custom API Tools: You can define tools that call any external API. For instance, a weather tool that retrieves forecast data or a JIRA tool that creates new tickets. The agent only needs a Python function reference; LangChain handles the actual call when the agent decides to use it.
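
For instance, here is a hedged sketch of wiring up a Tavily search tool and a retriever tool (it assumes TAVILY_API_KEY and OPENAI_API_KEY are set and the faiss-cpu package is installed; the sample documents are placeholders):

from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.tools.retriever import create_retriever_tool

# Web search tool; requires a TAVILY_API_KEY environment variable
search_tool = TavilySearchResults(max_results=3)

# Retriever tool over a tiny in-memory vector store (placeholder documents)
vectorstore = FAISS.from_texts(
    ["Our refund policy allows returns within 30 days.",
     "Support hours are 9am to 5pm, Monday to Friday."],
    embedding=OpenAIEmbeddings()
)
retriever_tool = create_retriever_tool(
    vectorstore.as_retriever(),
    name="CompanyDocs",
    description="Searches internal company documents. Input should be a plain-text query."
)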

When giving tools to an agent, we put them in a list:
tools = [calc_tool, weather_tool, search_tool, retriever_tool, ...]

The agent will see this list, typically as part of the prompt or through Tool objects, and may choose among them.
Each tool should ideally perform a clear, atomic function. Complex or multi-step logic can confuse the agent. If needed, break tasks into simpler tools or chains and let the agent sequence them. 

Language Model: The Reasoning Engine

At the core of every LangChain agent is a large language model, responsible for interpreting the current context, reasoning through decisions, and determining the next step. These models are typically chat-optimized (like GPT-4o, Claude, or Gemini) and trained to follow complex instructions across multi-turn conversations.
In LangChain, integrating an LLM is straightforward. A typical implementation might look like this:

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", temperature=0.0)

The temperature setting controls randomness in the model’s output. For agents that require consistency, particularly when interacting with tools or executing code, a lower temperature (close to 0) ensures predictable, repeatable behavior.
Once initialized, the LLM becomes the agent’s reasoning engine, interpreting prompts, invoking tools when necessary, and ultimately producing the final response.

Prompt (Agent Scratchpad)

The agent prompt template defines how the LLM is instructed to behave. A common pattern, often called ReAct-style, includes the following components:
  • System / Instruction: Explains to the assistant that it is an agent with access to specific tools. For example: "You are an agent designed to answer questions. You have access to the following tools…"

  • Tool Descriptions: Lists each tool’s name and description so the model understands what actions it can take.

  • Format Guide: Provides instructions on how the model should format its reasoning. This might involve a structured JSON or markdown format. You can also use libraries like Pydantic to enforce precise, well-formed JSON objects for tool calls (a short sketch follows this list).
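
For example, here is a brief sketch of enforcing a typed input schema with Pydantic via StructuredTool (the CalcInput schema is illustrative and reuses the calculate_expression function from earlier):

from pydantic import BaseModel, Field
from langchain_core.tools import StructuredTool

class CalcInput(BaseModel):
    expr: str = Field(description="A valid Python arithmetic expression, e.g. '2 + 2'")

# Wrap the existing function with a typed schema so tool calls arrive as well-formed JSON
typed_calc_tool = StructuredTool.from_function(
    func=calculate_expression,
    name="Calculator",
    description="Performs simple arithmetic on a Python expression.",
    args_schema=CalcInput
)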

Example prompt based on our Calculator and Weather tools:
<Persona>
  You are a helpful, precise AI assistant capable of solving user queries using available tools.
  You can perform reasoning, fetch information, and carry out calculations when needed.
</Persona>
<Guardrails>
  - Only call a tool if it's clearly required to answer the question.
  - Do not guess values or fabricate information.
  - Never perform code execution or arithmetic by yourself; use the Calculator tool for all such tasks.
</Guardrails>
<AvailableTools>
  <Tool>
    <Name>Calculator</Name>
    <Description>
      Performs simple arithmetic. Input must be a valid Python expression, such as '3 * (4 + 5)'.
      Use this tool only for basic math operations (e.g., +, -, *, /, parentheses).
    </Description>
    <Format>
      To use this tool, return:
      Action: Calculator
      Action Input: 2 + 2
    </Format>
  </Tool>
  <Tool>
    <Name>Weather</Name>
    <Description>
      Tells the current weather for a city. Input should be a city name, e.g. 'Paris'.
    </Description>
    <Format>
      To use this tool, return:
      Action: Weather
      Action Input: Paris
    </Format>
  </Tool>
</AvailableTools>

How the Agent Thinks: A Step-by-Step Loop


Under the hood, an agent operates as a loop — prompting the language model, interpreting its output, executing tools, and updating its internal state. Each cycle brings the agent closer to an answer. Conceptually, here is how the process unfolds:


1. Initial Input: The loop begins with a user query. This is accompanied by any system-level instructions, tool descriptions, or context-setting prompts.


2. Language Model Response: The language model evaluates the input and produces one of two outcomes—either a final answer or an instruction to perform an external action.

3. Tool Invocation: If the response calls for action, the agent triggers the appropriate tool, passing along the required input. For example, a call to a calculator might look like a query for a specific computation. 

4. Observe: The result from the tool — whether text, structured data, or some other output — is captured. The agent records this in the scratchpad, expanding the context for the next decision.

5. Loop or End: The agent checks if the LLM signalled a final answer or if any stopping criteria (max steps/time) are met. If not finished, it goes back to step 2: it calls the LLM again, now including the new observations in the prompt. This continues, building up a chain of reasoning. 

6. Return Answer: When the agent determines the task is complete, it delivers the final response to the user — shaped by everything it has seen, done, and learned in the loop.


This process is illustrated by the following simplified pseudocode:

import json
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

def process_with_tool_loop(user_input: str):
    MAX_ITERATIONS = 10
    current_iteration = 0
    messages = [
        SystemMessage(content="You are a helpful assistant with access to a calculator tool."),
        HumanMessage(content=user_input)
    ]

    while current_iteration < MAX_ITERATIONS:
        print(f"Iteration {current_iteration + 1}")
        response = llm.invoke(messages)

        # Check if the LLM wants to call a function.
        # (With the newer tool-calling API, you would inspect response.tool_calls instead.)
        if not response.additional_kwargs.get("function_call"):
            print(f"Final answer: {response.content}")
            break

        function_call = response.additional_kwargs["function_call"]
        function_name = function_call["name"]
        function_args = json.loads(function_call["arguments"])

        # Execute the matching tool
        if function_name == "Calculator":
            tool_result = calculate_expression(function_args.get("expr", ""))
        elif function_name == "WeatherSearch":
            tool_result = return_dummy_weather(function_args.get("city", ""))
        else:
            tool_result = f"Unknown tool: {function_name}"

        # Add the function call and its result to the conversation
        messages.append(response)
        messages.append(AIMessage(content=f"Function result: {tool_result}"))

        current_iteration += 1

    return response.content
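
A hypothetical invocation of the loop above (the llm must be configured to emit function calls; with current LangChain versions you would bind the tools first and check response.tool_calls instead of additional_kwargs):

# Hypothetical usage; assumes llm was bound to the tools, e.g.
# llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools([calc_tool, weather_tool])
answer = process_with_tool_loop("What is (12 * 7) + 5?")
print(answer)  # Expected to route through the Calculator tool and return 89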

Managing Conversation History

In AI chat systems, preserving conversation history is essential for maintaining coherence and context. The system must remember what has been said, what tools have been used, and what responses were returned in order to generate meaningful answers.

That is where the Conversation History Service comes in. Its role is to convert stored messages into LangChain-compatible formats — standardised message types such as human messages, AI responses, and tool interactions. This formatting is especially important when working with OpenAI models, where tool invocation and multi-turn reasoning rely on a consistent message structure. 

Not all models follow the same format. While OpenAI’s models like GPT-4o-mini expect specific conventions, other language models such as Gemini may require different approaches, particularly when supporting agentic behaviour. The message transformation logic must therefore adapt to match each model’s unique input requirements. 

This system:
  • Handles multiple sender types (USER, AI, TOOL)
  • Ensures messages are properly ordered and valid according to OpenAI LLM (gpt-4o-mini) requirements
  • Constructs an array of Langchain messages starting with the system prompt

To support robust reasoning, the system stores the full conversation history, including every tool call and its corresponding response, in a persistent database. Before each new language model invocation, the service retrieves this history and reformulates it according to the requirements of LangChain or the target model.


For example:
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage

def convert_to_langchain_message(message, next_message=None):
    sender_type = message.get("sender_type")

    if sender_type == "TOOL":
        # Tool results must reference the tool call that produced them
        return ToolMessage(
            tool_call_id=message.get("tool_call_id"),
            content=message.get("content")
        )
    elif sender_type == "USER":
        return HumanMessage(content=message.get("content"))
    else:  # Assume AI
        # Skip AI messages whose tool calls have no recorded TOOL response;
        # OpenAI rejects conversations with dangling tool calls.
        tool_calls = message.get("additional_metadata", {}).get("tool_calls")
        if tool_calls and (next_message is None or next_message.get("sender_type") != "TOOL"):
            return None
        return AIMessage(
            content=message.get("content"),
            additional_kwargs=message.get("additional_metadata", {})
        )

The service loops through the stored conversation messages and, based on the sender_type, converts each into the appropriate LangChain message:
  • TOOL ➜ ToolMessage
  • USER ➜ HumanMessage
  • Otherwise (typically AI) ➜ AIMessage
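
A minimal driver for this converter might look like the following sketch (build_history is a hypothetical helper; it assumes the stored messages are dictionaries in chronological order):

from langchain_core.messages import SystemMessage

def build_history(stored_messages, system_prompt: str):
    # Start with the system prompt, as the agent's message structure requires
    history = [SystemMessage(content=system_prompt)]
    for i, msg in enumerate(stored_messages):
        next_msg = stored_messages[i + 1] if i + 1 < len(stored_messages) else None
        converted = convert_to_langchain_message(msg, next_msg)
        if converted is not None:  # skip AI messages with dangling tool calls
            history.append(converted)
    return history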

Best Practices and Advanced Considerations

Building a reliable agent takes more than plugging in a language model. It requires deliberate choices in configuration, in prompting, and in constraint. Here are a few key practices that can help shape smarter, more stable behaviour.

Write Clear Tool Descriptions: The agent depends entirely on how tools are described. These descriptions serve as its mental map, and vague directions will lead it astray. Each tool should include a concise explanation of its purpose, inputs, outputs, and any usage constraints. Ambiguity at this stage often results in the agent selecting the wrong tool or misapplying the right one.

Guide Reasoning with Few-Shot Examples: By default, agents use zero-shot prompting — they operate without prior examples. But when their behaviour is erratic or too vague, a well-crafted few-shot prompt can help. Include one or two sample interactions in the system prompt to show how each tool should be used. These examples serve as scaffolding for more accurate reasoning. 
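
As an illustration, a single worked example embedded in the system prompt might look like this (the wording follows the common ReAct convention and is purely illustrative):

BASE_INSTRUCTIONS = "You are an agent designed to answer questions using the tools provided.\n"

FEW_SHOT_EXAMPLE = """
Example:
Question: What is 15% of 240?
Thought: This is arithmetic, so I should use the Calculator tool.
Action: Calculator
Action Input: 240 * 0.15
Observation: 36.0
Final Answer: 15% of 240 is 36.
"""

# Prepend the worked example to the base instructions before building the agent
system_prompt = BASE_INSTRUCTIONS + FEW_SHOT_EXAMPLE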

Control for Consistency: Language models are probabilistic, and randomness can derail decision-making. For agents, a low temperature setting (such as 0.1 or 0.2) encourages consistency. It reduces hallucinations, improves tool reliability, and keeps the reasoning loop grounded.

Set Iteration Limits: Without clear boundaries, agents can fall into infinite loops, repeatedly calling tools without ever concluding. To prevent this, LangChain's AgentExecutor allows you to set constraints on execution. Parameters such as max_iterations (which defaults to 15) and max_execution_time ensure the agent eventually stops, even if it fails to produce a final answer.
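
Here is a minimal sketch of wiring these limits into an AgentExecutor, assuming the llm and tools list defined earlier (the prompt wording is illustrative):

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the available tools when needed."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # slot where the agent records its work
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=5,         # stop after five think/act cycles
    max_execution_time=30,    # or after 30 seconds, whichever comes first
)
result = executor.invoke({"input": "What is (12 * 7) + 5?"})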


Conclusion

LangChain offers a powerful foundation for building intelligent agents that combine large language model reasoning with tool-based execution. With the right configuration—clear prompts, precise tool definitions, and well-defined constraints—developers can design systems capable of handling multi-step tasks, dynamic decision-making, and real-time data interaction. 

Remember that agents are powerful but also require careful crafting of prompts, descriptions, and limits to behave reliably. Whether you're building a QA chatbot that searches the web, an analytics assistant that processes databases, or any autonomous tool-based LLM system, understanding the agent loop and its components is key.
With the foundations in this guide, you can start designing your own LangChain agents and explore more advanced topics like multi-agent coordination or integration with LangGraph for complex pipelines. Happy agent-building!

Up next: A technical deep dive into Google Gemini and how to harness its multimodal capabilities to build next-generation agents that reason, adapt, and act across text, images, and structured data in real time.
