Building AI Tool-Calling Agents from Scratch with Python
A hands-on guide to building LLM agents that call real tools — from a single weather function to a multi-tool database agent with safety constraints and interactive chat.
In my previous post, I covered how function calling works under the hood — constrained decoding, token generation, and parallel tool calls. This post is the practical companion: we'll build agents step by step, going from a single tool call to a full interactive multi-tool agent connected to a PostgreSQL database.
The examples use the OpenAI Python SDK pointed at OpenRouter — so you can use any model (GPT-4o, Claude, Gemini, etc.) by changing one line.
Part 1: The Basic Pattern — One Tool, Three Steps
Tool calling is a 3-step process. The model never executes your function — it only decides which function to call and with what arguments. Your code does the actual work.
Step 1: Define the Tool Schema
This is a JSON description of your function. The model reads this to understand what tools are available:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'Berlin'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units (default: celsius)"
                    }
                },
                "required": ["city"]
            }
        }
    }
]
This tells the model: "There's a function called get_weather. It takes a required city string and an optional units enum." The model uses description fields to decide when and how to call it.
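Hand-writing these schemas gets tedious as tools multiply. A minimal generator can derive one from a function's signature and type hints (a sketch; libraries like Pydantic produce richer schemas with per-field descriptions):

```python
import inspect
from typing import get_type_hints

# Map Python annotations to JSON Schema type names
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def schema_from_function(func) -> dict:
    """Build a minimal OpenAI-style tool schema from a function's signature."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    props, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        props[name] = {"type": PY_TO_JSON.get(hints.get(name, str), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> required parameter
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {"type": "object", "properties": props, "required": required},
        },
    }

def get_weather(city: str, units: str = "celsius") -> dict:
    """Get the current weather for a given city."""
    ...

print(schema_from_function(get_weather)["function"]["parameters"]["required"])  # ['city']
```

Note the limits of the sketch: it cannot infer enums or field descriptions, which is exactly the information the model relies on most, so generated schemas still deserve a manual pass.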
Step 2: Write the Actual Function
This is your Python function — the one that actually runs. It could call an API, query a database, read a file, anything:
def get_weather(city: str, units: str = "celsius") -> dict:
    """In real life this would call a weather API."""
    fake_data = {
        "Berlin": {"temp": 18, "condition": "Cloudy"},
        "Tokyo": {"temp": 25, "condition": "Sunny"},
        "New York": {"temp": 22, "condition": "Rainy"},
    }
    weather = fake_data.get(city, {"temp": 20, "condition": "Unknown"})
    if units == "fahrenheit":
        weather["temp"] = weather["temp"] * 9 / 5 + 32
    weather["city"] = city
    weather["units"] = units
    return weather
Step 3: The Tool-Calling Flow
Now wire it together. Send the user's message with your tool definitions, check if the model wants to call a tool, execute it locally, and send the result back:
import os
import json

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)

# Round 1: Send message + tool definitions
messages = [
    {"role": "user", "content": "What's the weather like in Berlin?"}
]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
)
assistant_message = response.choices[0].message

# The model didn't reply with text — it replied with a tool call
tool_call = assistant_message.tool_calls[0]
# tool_call.function.name = "get_weather"
# tool_call.function.arguments = '{"city": "Berlin"}'

# Round 2: Execute the function locally
args = json.loads(tool_call.function.arguments)
result = get_weather(**args)

# Round 3: Send the result back
messages.append(assistant_message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result)
})
final = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)
# "The weather in Berlin is 18°C and cloudy."
The Key Insight
The model's response isn't text — it's a structured instruction saying "call this function with these arguments." Your code executes it, sends back the result, and then the model uses that result to write its final answer.
User: "What's the weather in Berlin?"
        │
        ▼
Model: "Call get_weather(city='Berlin')"          ← not text, a tool_call
        │
        ▼
Your code: get_weather("Berlin")                  ← runs locally
        │  returns {"temp": 18, "condition": "Cloudy"}
        ▼
Model: "The weather in Berlin is 18°C and cloudy."  ← final text answer
Part 2: Using Tools for Structured Output — The Classification Trick
Here's a pattern that surprises most people: you can use tool calling NOT to "do something" but to force the model to return structured JSON in a specific format.
The trick is tool_choice — it forces the model to always call a specific function. The function's parameter schema becomes your output schema.
Defining the Output Schema as a Tool
classify_email_tool = {
    "type": "function",
    "function": {
        "name": "classify_email",
        "description": "Classify an email into a category with confidence score",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "enum": [
                        "job_opportunity", "spam", "newsletter",
                        "personal", "invoice", "support_request"
                    ],
                    "description": "The email category"
                },
                "confidence": {
                    "type": "number",
                    "description": "Confidence score from 0.0 to 1.0"
                },
                "summary": {
                    "type": "string",
                    "description": "One-sentence summary of the email"
                },
                "action_required": {
                    "type": "boolean",
                    "description": "Whether the user needs to take action"
                },
                "priority": {
                    "type": "string",
                    "enum": ["low", "medium", "high"],
                    "description": "Priority level"
                }
            },
            "required": ["category", "confidence", "summary", "action_required", "priority"]
        }
    }
}
Forcing the Model to Use It
The magic line is tool_choice — it removes the model's option to respond with plain text:
def classify_email(email_text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are an email classifier. Analyze the email and classify it."},
            {"role": "user", "content": f"Classify this email:\n\n{email_text}"}
        ],
        tools=[classify_email_tool],
        tool_choice={
            "type": "function",
            "function": {"name": "classify_email"}
        },
    )
    tool_call = response.choices[0].message.tool_calls[0]
    return json.loads(tool_call.function.arguments)
The Result
Feed it any email and you get back a perfectly structured dict:
email = """
Subject: Senior Python Developer - Remote - €80k-100k

Hi, I found your profile on LinkedIn. We have an exciting opportunity
for a Senior Python Developer at a fintech startup. The role involves
building ML pipelines and microservices. Interested in chatting?
"""

result = classify_email(email)

{
    "category": "job_opportunity",
    "confidence": 0.95,
    "summary": "LinkedIn recruiter reaching out for a Senior Python Developer role at a fintech startup",
    "action_required": true,
    "priority": "medium"
}
Every time. Same schema. Same field names. Same types. No parsing, no regex, no "please respond in JSON" prompt tricks.
When to Use This Pattern
This is ideal for any task where you need structured data extraction:
- Email/ticket classification
- Extracting entities from text (names, dates, addresses)
- Sentiment analysis with confidence scores
- Parsing unstructured logs into structured records
You're not really "calling a tool" — you're using the tool-calling machinery as a structured output enforcer.
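Even with a forced tool call, it's worth validating the parsed arguments before trusting them, since models occasionally emit out-of-range values despite the schema. A minimal check against the classification schema above:

```python
# Mirrors the enum and required list from classify_email_tool
CATEGORIES = {"job_opportunity", "spam", "newsletter", "personal", "invoice", "support_request"}
REQUIRED = {"category", "confidence", "summary", "action_required", "priority"}

def validate_classification(result: dict) -> list:
    """Return a list of problems; an empty list means the result is usable."""
    problems = []
    missing = REQUIRED - result.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if result.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {result.get('category')!r}")
    conf = result.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append(f"confidence out of range: {conf!r}")
    return problems
```

The remaining enum (priority) can be checked the same way; on failure, either retry the call or fall back to a default classification.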
Part 3: Connecting to a Real Database
Now let's build something real: an agent that can explore and query a PostgreSQL database. This is where tool calling gets powerful — the model can chain multiple tools together to answer a question.
The Tools
We define three database tools:
tools = [
    {
        "type": "function",
        "function": {
            "name": "list_tables",
            "description": "List all tables in the database. Call this first to discover what data is available.",
            "parameters": {"type": "object", "properties": {}}
        }
    },
    {
        "type": "function",
        "function": {
            "name": "describe_table",
            "description": "Get the column names and types for a specific table. Use this to understand the table structure before querying.",
            "parameters": {
                "type": "object",
                "properties": {
                    "table_name": {
                        "type": "string",
                        "description": "Name of the table to describe"
                    }
                },
                "required": ["table_name"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "query_database",
            "description": "Execute a READ-ONLY SQL query (SELECT only). Returns up to 20 rows.",
            "parameters": {
                "type": "object",
                "properties": {
                    "sql": {
                        "type": "string",
                        "description": "The SQL SELECT query to execute. MUST be a SELECT statement."
                    }
                },
                "required": ["sql"]
            }
        }
    }
]
Notice the description fields — they guide the model's behavior. "Call this first to discover what data is available" teaches the model to explore before querying.
The Implementations with Safety Constraints
import os
import json

import psycopg2

def get_db_connection():
    # Assumes connection details live in a DATABASE_URL environment
    # variable, e.g. "postgresql://user:pass@localhost:5432/mydb"
    return psycopg2.connect(os.environ["DATABASE_URL"])

def list_tables() -> str:
    conn = get_db_connection()
    try:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT table_name FROM information_schema.tables
                WHERE table_schema = 'public' ORDER BY table_name;
            """)
            tables = [row[0] for row in cur.fetchall()]
            return json.dumps({"tables": tables, "count": len(tables)})
    finally:
        conn.close()

def describe_table(table_name: str) -> str:
    conn = get_db_connection()
    try:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT column_name, data_type, is_nullable, column_default
                FROM information_schema.columns
                WHERE table_schema = 'public' AND table_name = %s
                ORDER BY ordinal_position;
            """, (table_name,))
            columns = [
                {"name": row[0], "type": row[1], "nullable": row[2], "default": str(row[3]) if row[3] else None}
                for row in cur.fetchall()
            ]
            return json.dumps({"table": table_name, "columns": columns})
    finally:
        conn.close()

def query_database(sql: str) -> str:
    # SAFETY: only allow SELECT queries
    cleaned = sql.strip().upper()
    if not cleaned.startswith("SELECT"):
        return json.dumps({"error": "Only SELECT queries are allowed!"})
    # Block dangerous keywords (substring match — deliberately strict)
    for keyword in ["INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE", "CREATE"]:
        if keyword in cleaned:
            return json.dumps({"error": f"Blocked: query contains '{keyword}'"})
    conn = get_db_connection()
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            columns = [desc[0] for desc in cur.description]
            rows = cur.fetchmany(20)  # limit to 20 rows
            result = [dict(zip(columns, row)) for row in rows]
            # Convert non-serializable types (dates, Decimals) to strings
            for row in result:
                for key, value in row.items():
                    if not isinstance(value, (str, int, float, bool, type(None))):
                        row[key] = str(value)
            return json.dumps({"columns": columns, "rows": result, "row_count": len(result)})
    except Exception as e:
        return json.dumps({"error": str(e)})
    finally:
        conn.close()
Safety matters. The model generates SQL — you should never trust it blindly. Our query_database function:
- Only allows SELECT statements
- Blocks dangerous keywords (DROP, DELETE, etc.)
- Limits results to 20 rows
- Handles errors gracefully
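The substring check above is deliberately strict, which means it also rejects harmless queries like SELECT created_at FROM stocks (the column name contains CREATE). A word-boundary variant avoids those false positives (a sketch):

```python
import re
from typing import Optional

BLOCKED = ["INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE", "CREATE"]

def check_sql(sql: str) -> Optional[str]:
    """Return an error message, or None if the query passes the checks."""
    cleaned = sql.strip().upper()
    if not cleaned.startswith("SELECT"):
        return "Only SELECT queries are allowed!"
    for keyword in BLOCKED:
        # \b matches a word boundary, so CREATED_AT no longer trips CREATE
        if re.search(rf"\b{keyword}\b", cleaned):
            return f"Blocked: query contains '{keyword}'"
    return None
```

Either way, keyword filters are defense in depth at best; the reliable guard is connecting with a database role that has only SELECT privileges.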
The Agent Loop
This is the core pattern. The model keeps calling tools until it has enough information to answer:
def chat_with_db(user_question: str):
    messages = [
        {
            "role": "system",
            "content": "You are a helpful data analyst. Use the tools to explore the database and answer questions. Start by listing tables if you don't know the schema."
        },
        {"role": "user", "content": user_question}
    ]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
        )
        assistant_message = response.choices[0].message

        # No tool calls → model is done
        if not assistant_message.tool_calls:
            print(assistant_message.content)
            break

        # Execute each tool call
        messages.append(assistant_message)
        for tool_call in assistant_message.tool_calls:
            func_name = tool_call.function.name
            args = json.loads(tool_call.function.arguments) if tool_call.function.arguments else {}

            # Dispatch to the right function
            func = TOOL_FUNCTIONS[func_name]
            result = func(**args)

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })
What Happens When You Ask a Question
Ask: "How many records are in each table?"
The model doesn't know the database schema, so it chains tools automatically:
Turn 1:  Model calls list_tables()
         → Returns ["stocks", "prices", "dividends"]

Turn 2:  Model calls query_database("SELECT COUNT(*) FROM stocks")
         Model calls query_database("SELECT COUNT(*) FROM prices")
         Model calls query_database("SELECT COUNT(*) FROM dividends")
         → Returns counts for each table

Turn 3:  Model generates final answer:
         "The database contains 3 tables:
          - stocks: 500 records
          - prices: 125,000 records
          - dividends: 3,200 records"
The model decided on its own to first discover the tables, then count each one. That's the power of the agentic loop — the model plans its own approach.
Part 4: The Full Multi-Tool Agent
Now let's put it all together: an interactive agent with multiple tools, a dispatch registry, and persistent conversation history.
Adding Non-Database Tools
Agents aren't limited to one domain. You can mix any kind of tool:
from datetime import datetime

def get_current_time() -> str:
    now = datetime.now()
    return json.dumps({"datetime": now.isoformat(), "timezone": "local"})

def calculate(expression: str) -> str:
    allowed = set("0123456789+-*/.() ")
    if not all(c in allowed for c in expression):
        return json.dumps({"error": "Invalid characters in expression"})
    try:
        result = eval(expression)
        return json.dumps({"expression": expression, "result": result})
    except Exception as e:
        return json.dumps({"error": str(e)})
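The character allowlist keeps names and quotes out of eval, but sharp edges remain: a model-generated "9**9**9" passes the filter and effectively hangs the process. A stricter evaluator (a sketch, not part of the original agent) walks the AST and permits only basic arithmetic nodes:

```python
import ast
import operator

# Only these operators are allowed; note Pow is deliberately absent
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calculate(expression: str):
    """Evaluate arithmetic by walking the AST; reject everything else."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Disallowed expression: {ast.dump(node)}")
    return walk(ast.parse(expression, mode="eval").body)

print(safe_calculate("2 + 3 * 4"))  # 14
```

Anything outside the whitelist (function calls, attribute access, exponentiation) raises ValueError, which the tool wrapper can turn into the same {"error": ...} JSON as before.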
The Dispatch Dictionary
This is the glue — a registry that maps function names to Python functions. No if/else chains:
TOOL_FUNCTIONS = {
    "list_tables": list_tables,
    "describe_table": describe_table,
    "query_database": query_database,
    "get_current_time": get_current_time,
    "calculate": calculate,
}
To add a new tool, you follow a simple recipe:
- Write the Python function
- Write the JSON schema (for the model)
- Add it to TOOL_FUNCTIONS
- Add the schema to the tools list
That's it. The agent loop handles the rest.
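The recipe in action: registering a hypothetical word_count tool takes a few lines (the matching JSON schema is omitted here for brevity):

```python
import json

TOOL_FUNCTIONS = {}  # the registry from above, shown empty for this sketch

def word_count(text: str) -> str:
    """Hypothetical example tool: count words in a string."""
    return json.dumps({"words": len(text.split())})

TOOL_FUNCTIONS["word_count"] = word_count

# The agent loop dispatches by name, exactly as before:
result = TOOL_FUNCTIONS["word_count"](text="tool calling is just dispatch")
print(result)  # {"words": 5}
```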
The Production Agent Loop
def run_agent(messages: list) -> str:
    max_iterations = 10  # safety limit

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
        )
        assistant_message = response.choices[0].message

        # No tool calls → model is done
        if not assistant_message.tool_calls:
            return assistant_message.content

        # Execute all tool calls
        messages.append(assistant_message)
        for tool_call in assistant_message.tool_calls:
            func_name = tool_call.function.name
            args = json.loads(tool_call.function.arguments) if tool_call.function.arguments else {}

            func = TOOL_FUNCTIONS.get(func_name)
            if func:
                result = func(**args)
            else:
                result = json.dumps({"error": f"Unknown tool: {func_name}"})

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

    return "Agent reached maximum iterations without finishing."
Two important safety features:
- max_iterations prevents infinite loops — if the model keeps calling tools forever, we cut it off
- Unknown tool handling — if the model hallucinates a function name, we return an error instead of crashing
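Because the loop is plain Python, you can unit-test it without an API key by swapping the client call for a scripted stand-in (fake_model and get_time below are test fixtures of my own, not real APIs):

```python
import json
from types import SimpleNamespace

def fake_model(messages):
    """Scripted stand-in for the chat API: one tool call, then a text answer."""
    answered_tool = any(isinstance(m, dict) and m.get("role") == "tool" for m in messages)
    if not answered_tool:
        call = SimpleNamespace(id="call_1",
                               function=SimpleNamespace(name="get_time", arguments="{}"))
        return SimpleNamespace(content=None, tool_calls=[call])
    return SimpleNamespace(content="It is noon.", tool_calls=None)

TOOL_FUNCTIONS = {"get_time": lambda: json.dumps({"time": "12:00"})}

def run_agent(messages, model=fake_model, max_iterations=10):
    # Same loop as above, with the API call replaced by the `model` callable
    for _ in range(max_iterations):
        msg = model(messages)
        if not msg.tool_calls:
            return msg.content
        messages.append(msg)
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments or "{}")
            result = TOOL_FUNCTIONS[call.function.name](**args)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    return "Agent reached maximum iterations without finishing."

print(run_agent([{"role": "user", "content": "What time is it?"}]))  # It is noon.
```

This kind of harness also lets you test the failure paths (unknown tools, malformed arguments) deterministically.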
Interactive Chat with History
def main():
    system_prompt = {
        "role": "system",
        "content": (
            "You are a helpful data analyst with access to a PostgreSQL database. "
            "You can list tables, describe their structure, query data, "
            "do calculations, and check the current time."
        )
    }
    messages = [system_prompt]

    while True:
        user_input = input("\n>>> You: ").strip()
        if user_input.lower() in ("quit", "exit", "q"):
            break
        messages.append({"role": "user", "content": user_input})
        answer = run_agent(messages)
        print(f"\n<<< Agent: {answer}")
        messages.append({"role": "assistant", "content": answer})
Because the messages list persists across turns, the agent remembers context:
>>> You: What tables are in the database?
<<< Agent: The database has 3 tables: stocks, prices, and dividends.
>>> You: Show me the top 5 most expensive stocks.
<<< Agent: [already knows the schema from the previous turn — goes straight to querying]
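One practical consequence of persistent history: it grows every turn (tool results included) and will eventually exceed the context window. A simple, lossy mitigation keeps the system prompt plus the most recent messages (a sketch; real apps often summarize old turns instead):

```python
def trim_history(messages: list, keep_last: int = 20) -> list:
    """Keep the system prompt plus the most recent messages."""
    def role(m):
        # History may mix plain dicts and SDK message objects
        return m["role"] if isinstance(m, dict) else getattr(m, "role", None)
    system = [m for m in messages if role(m) == "system"]
    rest = [m for m in messages if role(m) != "system"]
    return system + rest[-keep_last:]
```

One caution: cutting mid-exchange can separate a tool result from the assistant message that requested it, which some APIs reject, so trimming at user-turn boundaries is safer.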
The Architecture at a Glance
Every tool-calling agent follows the same pattern, regardless of complexity:
┌─────────────────────────────────────────────────────────┐
│ TOOL DEFINITIONS │
│ JSON schemas that tell the model what's available │
└──────────────────────┬──────────────────────────────────┘
│
┌──────────────────────▼──────────────────────────────────┐
│ AGENT LOOP │
│ while True: │
│ response = model(messages + tools) │
│ if no tool_calls: return response ← done │
│ for each tool_call: │
│ execute function │
│ append result to messages │
│ loop back │
└──────────────────────┬──────────────────────────────────┘
│
┌──────────────────────▼──────────────────────────────────┐
│ DISPATCH DICTIONARY │
│ {"func_name": actual_python_function, ...} │
│ Routes model decisions to real code │
└──────────────────────┬──────────────────────────────────┘
│
┌──────────────────────▼──────────────────────────────────┐
│ PYTHON FUNCTIONS │
│ The actual code that runs: API calls, DB queries, │
│ calculations, file I/O — anything you can code │
└─────────────────────────────────────────────────────────┘
Key Takeaways
- The model never executes code. It returns structured instructions (tool calls). Your code runs the functions locally and sends results back. This means you control what the agent can actually do.
- Tool calling doubles as structured output. By using tool_choice to force a specific function, you can make any LLM return reliable, schema-validated JSON — no parsing needed.
- The agent loop is just a while loop. Send messages + tools → check for tool calls → execute → append results → repeat. Every agent framework (LangChain, CrewAI, etc.) is built on this same loop.
- Safety is your responsibility. The model generates SQL and function arguments. Always validate inputs, restrict operations (SELECT only), limit result sizes, and add iteration caps.
- Adding tools is a recipe, not a rewrite. Write the function, write the schema, add to the dispatch dictionary. The agent loop handles routing and chaining automatically.
- Start simple, add complexity gradually. One tool → structured output → real database → multi-tool agent. Each step builds on the last. You don't need a framework to build a capable agent.