How to Build an AI Agent from Scratch Using LangChain and OpenAI

April 27, 2026

What an AI Agent Actually Is

An AI agent is a program that takes a goal, decides which tools to use, calls those tools, reads the results, and repeats until the goal is met. Unlike a simple LLM call where you send a prompt and get a response, an agent operates in a loop. It thinks, acts, observes, and thinks again.

The simplest example: you ask an agent to find the current weather in Mumbai and suggest a restaurant nearby. The agent calls a weather API, reads the temperature, calls a restaurant search API filtered by location, picks one, and returns the answer. A single LLM call cannot do all of that, because the model has no way to fetch live data without tools.
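To make the think-act-observe loop concrete, here is a minimal sketch in plain Python with no LLM involved. Everything in it is hypothetical: `fake_model` stands in for the model's decision step, and `get_weather` and `find_restaurant` return canned strings rather than calling real APIs.

```python
from typing import Callable

# Hypothetical tools: plain functions returning canned data instead of calling real APIs.
def get_weather(city: str) -> str:
    return f"31 C and sunny in {city}"

def find_restaurant(city: str) -> str:
    return f"Spice Garden, central {city}"

TOOLS: dict[str, Callable[[str], str]] = {
    "get_weather": get_weather,
    "find_restaurant": find_restaurant,
}

def fake_model(goal: str, observations: list[str]) -> dict:
    """Stand-in for the LLM: pick the next action based on what has been observed so far."""
    if len(observations) == 0:
        return {"tool": "get_weather", "arg": "Mumbai"}
    if len(observations) == 1:
        return {"tool": "find_restaurant", "arg": "Mumbai"}
    return {"final": "; ".join(observations)}

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = fake_model(goal, observations)          # think
        if "final" in decision:
            return decision["final"]
        result = TOOLS[decision["tool"]](decision["arg"])  # act
        observations.append(result)                        # observe
    return "Stopped: step limit reached"

print(run_agent("Weather in Mumbai and a restaurant nearby"))
```

A real agent replaces `fake_model` with an LLM call, but the control flow is the same: decide, call a tool, feed the result back, and stop on a final answer or a step limit.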

The Stack

We will use LangChain for the agent framework and OpenAI's GPT-4o for the reasoning engine. GPT-4o supports function calling natively, which means the model outputs structured tool calls instead of free text that you have to parse. This is more reliable than the older ReAct-style agents that relied on the model printing "Action: tool_name" in plain text.
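The difference between the two styles is easiest to see side by side. The sketch below is illustrative only: the `structured_call` dictionary is loosely modeled on an OpenAI-style tool call (a name plus a JSON-encoded argument string), and the regex is one possible way to parse ReAct-style text, not the parser any particular library uses.

```python
import json
import re
from typing import Optional

# Illustrative shape: a tool name plus a JSON-encoded argument string.
structured_call = {
    "name": "search_web",
    "arguments": '{"query": "population of Tokyo"}',
}

def parse_structured(call: dict) -> tuple[str, dict]:
    """Arguments arrive as JSON, so parsing is a single json.loads call."""
    return call["name"], json.loads(call["arguments"])

# ReAct-style output: the tool call is embedded in free text.
react_text = """Thought: I need the population first.
Action: search_web
Action Input: population of Tokyo"""

ACTION_RE = re.compile(r"Action:\s*(\w+)\s*\nAction Input:\s*(.*)")

def parse_react(text: str) -> Optional[tuple[str, str]]:
    """Regex parsing returns None as soon as the model drifts from the expected format."""
    match = ACTION_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()

print(parse_structured(structured_call))
print(parse_react(react_text))
print(parse_react("I'll use the search_web tool for this."))
```

The last call shows the weak point of text parsing: a perfectly reasonable sentence from the model yields no usable tool call.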

Step 1: Define Your Tools

An agent is only as useful as its tools. Each tool is a Python function wrapped with a description that tells the LLM when and how to use it.

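from langchain.tools import tool
import requests

@tool
def search_web(query: str) -> str:
    """Search the web for current information. Use this when the user asks about recent events, prices, or facts you do not know."""
    # Tavily's search endpoint expects a POST with a JSON body.
    # Check the current Tavily docs for the authentication format.
    response = requests.post(
        'https://api.tavily.com/search',
        json={'query': query, 'api_key': 'tvly-...'},
        timeout=10,  # do not let a hung request stall the agent loop
    )
    # Keep the top three results and return them as one newline-separated string.
    results = response.json()['results'][:3]
    return '\n'.join(f"{r['title']}: {r['content']}" for r in results)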

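@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression. Use this for any arithmetic, unit conversions, or numerical calculations."""
    # Warning: eval() executes arbitrary Python. This is acceptable for a local
    # demo only; never expose it to untrusted input.
    try:
        return str(eval(expression))
    except Exception as e:
        # Return the error as text so the agent can read it and retry.
        return f"Error: {e}"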

The docstrings matter. The LLM reads them to decide which tool to call. Vague descriptions lead to wrong tool selections.
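Because the calculate tool above passes model-generated text to eval(), it can run arbitrary code. One possible alternative, sketched here under the assumption that only basic arithmetic is needed, is to parse the expression with Python's ast module and allow a small whitelist of operators. The name `safe_calculate` is hypothetical, and the function is shown without the @tool decorator so it runs on its own.

```python
import ast
import operator

# Whitelist of permitted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def _eval_node(node: ast.AST):
    # Plain numbers only (bool is excluded even though it subclasses int).
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")

def safe_calculate(expression: str) -> str:
    """Evaluate +, -, *, /, % arithmetic without calling eval()."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval_node(tree.body))
    except (ValueError, SyntaxError, ZeroDivisionError, OverflowError) as e:
        return f"Error: {e}"

print(safe_calculate("2 + 3 * 4"))
print(safe_calculate("__import__('os').system('ls')"))
```

Exponentiation is deliberately left out of the whitelist, since a very large power can hang the process. Function calls, attribute access, and names are all rejected because they are not in the allowed node types.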

Step 2: Create the Agent

from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_functions_agent, AgentExecutor
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

llm = ChatOpenAI(model='gpt-4o', temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful assistant. Use tools when needed. Be concise.'),
    MessagesPlaceholder(variable_name='chat_history', optional=True),
    ('human', '{input}'),
    MessagesPlaceholder(variable_name='agent_scratchpad')
])

tools = [search_web, calculate]
agent = create_openai_functions_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

Step 3: Run It

result = executor.invoke({'input': 'What is the population of Tokyo, and what is that divided by the number of train stations in the city?'})
print(result['output'])

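With verbose=True, the executor prints each tool call and its result, so you can watch the agent work through the problem. It should call search_web to find both numbers, then call calculate to do the division. If a tool returns an error or an unhelpful result, the model sees that output on the next step and can try a different call.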

Adding Memory

Without memory, each call to the agent is stateless. It forgets everything from the previous turn. For conversational agents, you need memory.

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    return_messages=True,
    k=10  # keep last 10 turns
)

executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True
)

ConversationBufferWindowMemory keeps the last k turns in the prompt. For longer conversations, consider ConversationSummaryMemory, which summarizes older turns to save tokens.
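The windowing idea itself is simple. The following is a minimal sketch of the concept in plain Python, not LangChain's implementation; the class name `WindowMemory` and its methods are hypothetical.

```python
from collections import deque

class WindowMemory:
    """Keep only the most recent k (human, ai) turns."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen drops the oldest item automatically when full.
        self.turns = deque(maxlen=k)

    def save_turn(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def as_messages(self) -> list:
        """Flatten the stored turns into (role, text) pairs for the prompt."""
        messages = []
        for human, ai in self.turns:
            messages.append(("human", human))
            messages.append(("ai", ai))
        return messages

memory_sketch = WindowMemory(k=2)
memory_sketch.save_turn("q1", "a1")
memory_sketch.save_turn("q2", "a2")
memory_sketch.save_turn("q3", "a3")  # the first turn falls out of the window
print(memory_sketch.as_messages())
```

The trade-off is visible in the last call: anything older than k turns is gone entirely, which is why a summarizing memory can be a better fit for long conversations.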

Error Handling and Guardrails

Agents fail. They call the wrong tool, they loop endlessly, they hallucinate tool names. Build for these failure modes:

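Max iterations: Set max_iterations=10 on the AgentExecutor (the default is 15). If the agent has not finished within 10 steps, the executor stops the loop and returns a message saying the iteration limit was hit, rather than a partial answer.
Timeouts: Wrap tool calls in a timeout. A search API that hangs for 30 seconds will stall the entire agent loop. AgentExecutor also accepts max_execution_time (in seconds) to cap the whole run.
Fallback responses: Set handle_parsing_errors=True so that a malformed tool call is sent back to the model as an error observation it can retry, instead of raising an exception.
Input validation: Validate tool inputs before executing them. The model controls the arguments, so it might pass a destructive SQL statement to your database tool. Treat every tool input as untrusted.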
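For the timeout item, one possible approach is to run the tool function in a worker thread and stop waiting after a deadline. This is a sketch using only the standard library; `call_with_timeout`, `slow_tool`, and `fast_tool` are hypothetical names, and the sleep stands in for a hung API call.

```python
import concurrent.futures
import time
from typing import Callable

def call_with_timeout(fn: Callable[[str], str], arg: str,
                      timeout_s: float, fallback: str) -> str:
    """Run fn(arg) in a worker thread and give up waiting after timeout_s seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, arg)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback
    except Exception as e:
        # Surface tool failures as text the agent can read.
        return f"Error: {e}"
    finally:
        # Do not block on a stuck worker. The thread itself keeps running
        # until fn returns; Python cannot forcibly kill it.
        pool.shutdown(wait=False)

def slow_tool(query: str) -> str:
    time.sleep(0.3)  # stands in for a hung API call
    return f"late result for {query}"

def fast_tool(query: str) -> str:
    return f"result for {query}"

print(call_with_timeout(fast_tool, "tokyo", timeout_s=0.05, fallback="Tool timed out"))
print(call_with_timeout(slow_tool, "tokyo", timeout_s=0.05, fallback="Tool timed out"))
```

Returning a fallback string rather than raising lets the agent see the timeout as an observation and decide what to do next. For HTTP tools, setting a timeout on the request itself is the simpler first line of defense, since this wrapper only stops waiting and does not cancel the underlying work.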

When Not to Build an Agent

If the task is a single LLM call with a well-defined prompt, do not build an agent. Agents add latency (multiple LLM calls), cost (token usage multiplied), and failure modes. Use an agent only when the task requires multiple steps, tool use, or conditional logic that cannot be hardcoded in advance.

Frequently Asked Questions

How much does an agent cost per query?

Each agent step is a separate LLM call. A 3-step agent using GPT-4o costs roughly $0.03 to $0.10 per query, depending on prompt length and tool outputs.
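To see how a figure in that range can arise, here is a rough back-of-the-envelope estimator. All numbers in the usage example are assumptions for illustration: the per-million-token prices and the token counts are placeholders, so substitute current pricing and your own measurements. The function also assumes every step uses the same number of tokens, whereas in practice the prompt grows as tool results accumulate.

```python
def estimate_query_cost(
    steps: int,
    input_tokens_per_step: int,
    output_tokens_per_step: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost in dollars of one agent query, assuming uniform steps."""
    input_cost = steps * input_tokens_per_step * input_price_per_million / 1_000_000
    output_cost = steps * output_tokens_per_step * output_price_per_million / 1_000_000
    return input_cost + output_cost

# Assumed values for illustration only: 3 steps, 3,000 input and 300 output
# tokens per step, at $2.50 and $10.00 per million tokens.
cost = estimate_query_cost(
    steps=3,
    input_tokens_per_step=3000,
    output_tokens_per_step=300,
    input_price_per_million=2.50,
    output_price_per_million=10.00,
)
print(f"${cost:.4f}")
```

Under these assumptions the estimate comes to about three cents per query. Longer system prompts, verbose tool outputs, or more steps push it up quickly, because every step resends the accumulated context.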

Can I use open-source models for agents?

Yes, but quality drops. Llama 3 70B and Mistral Large can handle simple tool use. For complex multi-step reasoning, GPT-4o and Claude 3.5 Sonnet are still more reliable.