Building AI Tool-Calling Agents from Scratch with Python

A hands-on guide to building LLM agents that call real tools — from a single weather function to a multi-tool database agent with safety constraints and interactive chat.

In my previous post, I covered how function calling works under the hood — constrained decoding, token generation, and parallel tool calls. This post is the practical companion: we'll build agents step by step, going from a single tool call to a full interactive multi-tool agent connected to a PostgreSQL database.

The examples use the OpenAI Python SDK pointed at OpenRouter — so you can use any model (GPT-4o, Claude, Gemini, etc.) by changing one line.


Part 1: The Basic Pattern — One Tool, Three Steps

Tool calling is a 3-step process. The model never executes your function — it only decides which function to call and with what arguments. Your code does the actual work.

Step 1: Define the Tool Schema

This is a JSON description of your function. The model reads this to understand what tools are available:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. 'Berlin'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units (default: celsius)"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

This tells the model: "There's a function called get_weather. It takes a required city string and an optional units enum." The model uses description fields to decide when and how to call it.

Step 2: Write the Actual Function

This is your Python function — the one that actually runs. It could call an API, query a database, read a file, anything:

def get_weather(city: str, units: str = "celsius") -> dict:
    """In real life this would call a weather API."""
    fake_data = {
        "Berlin": {"temp": 18, "condition": "Cloudy"},
        "Tokyo": {"temp": 25, "condition": "Sunny"},
        "New York": {"temp": 22, "condition": "Rainy"},
    }
    weather = fake_data.get(city, {"temp": 20, "condition": "Unknown"})
    if units == "fahrenheit":
        weather["temp"] = weather["temp"] * 9 / 5 + 32
    weather["city"] = city
    weather["units"] = units
    return weather
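Before wiring this to a model, it's worth a quick sanity check that the function behaves on its own (the definition is repeated here so the snippet runs standalone):

```python
def get_weather(city: str, units: str = "celsius") -> dict:
    """Stub from above — in real life this would call a weather API."""
    fake_data = {
        "Berlin": {"temp": 18, "condition": "Cloudy"},
        "Tokyo": {"temp": 25, "condition": "Sunny"},
        "New York": {"temp": 22, "condition": "Rainy"},
    }
    weather = fake_data.get(city, {"temp": 20, "condition": "Unknown"})
    if units == "fahrenheit":
        weather["temp"] = weather["temp"] * 9 / 5 + 32
    weather["city"] = city
    weather["units"] = units
    return weather

print(get_weather("Tokyo", units="fahrenheit"))
# {'temp': 77.0, 'condition': 'Sunny', 'city': 'Tokyo', 'units': 'fahrenheit'}
```

If the function misbehaves here, no amount of prompt engineering will save the agent later.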

Step 3: The Tool-Calling Flow

Now wire it together. Send the user's message with your tool definitions, check if the model wants to call a tool, execute it locally, and send the result back:

import os
import json

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)

# Round 1: Send message + tool definitions
messages = [
    {"role": "user", "content": "What's the weather like in Berlin?"}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
)

assistant_message = response.choices[0].message

# The model didn't reply with text — it replied with a tool call
tool_call = assistant_message.tool_calls[0]
# tool_call.function.name = "get_weather"
# tool_call.function.arguments = '{"city": "Berlin"}'

# Round 2: Execute the function locally
args = json.loads(tool_call.function.arguments)
result = get_weather(**args)

# Round 3: Send the result back
messages.append(assistant_message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result)
})

final = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
)

print(final.choices[0].message.content)
# "The weather in Berlin is 18°C and cloudy."

The Key Insight

The model's response isn't text — it's a structured instruction saying "call this function with these arguments." Your code executes it, sends back the result, and then the model uses that result to write its final answer.

User: "What's the weather in Berlin?"
  │
  ▼
Model: "Call get_weather(city='Berlin')"     ← not text, a tool_call
  │
  ▼
Your code: get_weather("Berlin")             ← runs locally
  │ returns {"temp": 18, "condition": "Cloudy"}
  ▼
Model: "The weather in Berlin is 18°C and cloudy."  ← final text answer

Part 2: Using Tools for Structured Output — The Classification Trick

Here's a pattern that surprises most people: you can use tool calling NOT to "do something" but to force the model to return structured JSON in a specific format.

The trick is tool_choice — it forces the model to always call a specific function. The function's parameter schema becomes your output schema.

Defining the Output Schema as a Tool

classify_email_tool = {
    "type": "function",
    "function": {
        "name": "classify_email",
        "description": "Classify an email into a category with confidence score",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {
                    "type": "string",
                    "enum": [
                        "job_opportunity", "spam", "newsletter",
                        "personal", "invoice", "support_request"
                    ],
                    "description": "The email category"
                },
                "confidence": {
                    "type": "number",
                    "description": "Confidence score from 0.0 to 1.0"
                },
                "summary": {
                    "type": "string",
                    "description": "One-sentence summary of the email"
                },
                "action_required": {
                    "type": "boolean",
                    "description": "Whether the user needs to take action"
                },
                "priority": {
                    "type": "string",
                    "enum": ["low", "medium", "high"],
                    "description": "Priority level"
                }
            },
            "required": ["category", "confidence", "summary", "action_required", "priority"]
        }
    }
}

Forcing the Model to Use It

The magic line is tool_choice — it removes the model's option to respond with plain text:

def classify_email(email_text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are an email classifier. Analyze the email and classify it."},
            {"role": "user", "content": f"Classify this email:\n\n{email_text}"}
        ],
        tools=[classify_email_tool],
        tool_choice={
            "type": "function",
            "function": {"name": "classify_email"}
        },
    )

    tool_call = response.choices[0].message.tool_calls[0]
    return json.loads(tool_call.function.arguments)

The Result

Feed it any email and you get back a perfectly structured dict:

email = """
Subject: Senior Python Developer - Remote - €80k-100k
Hi, I found your profile on LinkedIn. We have an exciting opportunity
for a Senior Python Developer at a fintech startup. The role involves
building ML pipelines and microservices. Interested in chatting?
"""

result = classify_email(email)
# result:
{
    "category": "job_opportunity",
    "confidence": 0.95,
    "summary": "LinkedIn recruiter reaching out for a Senior Python Developer role at a fintech startup",
    "action_required": True,
    "priority": "medium"
}

Every time. Same schema. Same field names. Same types. No parsing, no regex, no "please respond in JSON" prompt tricks.
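One caveat: the schema guarantees shape, not semantics. An enum field can only take listed values, but a plain number field like confidence isn't range-limited to 0.0–1.0 by the schema above. A thin stdlib-only validator (the names here are illustrative, not from the post's code) catches the rare out-of-range value:

```python
ALLOWED_CATEGORIES = {
    "job_opportunity", "spam", "newsletter",
    "personal", "invoice", "support_request",
}

def validate_classification(result: dict) -> dict:
    """Raise ValueError if the model's output violates our semantic rules."""
    if result["category"] not in ALLOWED_CATEGORIES:
        raise ValueError(f"Unknown category: {result['category']}")
    if not 0.0 <= result["confidence"] <= 1.0:
        raise ValueError(f"Confidence out of range: {result['confidence']}")
    if result["priority"] not in {"low", "medium", "high"}:
        raise ValueError(f"Unknown priority: {result['priority']}")
    return result

validate_classification({
    "category": "spam", "confidence": 0.9, "summary": "Lottery scam",
    "action_required": False, "priority": "low",
})  # passes silently and returns the dict
```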

When to Use This Pattern

This is ideal for any task where you need structured data extraction:

  • Email/ticket classification
  • Extracting entities from text (names, dates, addresses)
  • Sentiment analysis with confidence scores
  • Parsing unstructured logs into structured records

You're not really "calling a tool" — you're using the tool-calling machinery as a structured output enforcer.
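The extraction case works the same way: arrays are just another JSON Schema type. A hypothetical extract_entities tool (names assumed for illustration) would look like this:

```python
# The parameter schema *is* the output schema — here the model would be
# forced to return three string arrays.
extract_entities_tool = {
    "type": "function",
    "function": {
        "name": "extract_entities",
        "description": "Extract people, dates, and locations mentioned in the text",
        "parameters": {
            "type": "object",
            "properties": {
                "people": {"type": "array", "items": {"type": "string"}},
                "dates": {"type": "array", "items": {"type": "string"}},
                "locations": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["people", "dates", "locations"],
        },
    },
}
```

Pass it with the same forced tool_choice as classify_email and every response is a dict with exactly those three lists.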


Part 3: Connecting to a Real Database

Now let's build something real: an agent that can explore and query a PostgreSQL database. This is where tool calling gets powerful — the model can chain multiple tools together to answer a question.

The Tools

We define three database tools:

tools = [
    {
        "type": "function",
        "function": {
            "name": "list_tables",
            "description": "List all tables in the database. Call this first to discover what data is available.",
            "parameters": {"type": "object", "properties": {}}
        }
    },
    {
        "type": "function",
        "function": {
            "name": "describe_table",
            "description": "Get the column names and types for a specific table. Use this to understand the table structure before querying.",
            "parameters": {
                "type": "object",
                "properties": {
                    "table_name": {
                        "type": "string",
                        "description": "Name of the table to describe"
                    }
                },
                "required": ["table_name"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "query_database",
            "description": "Execute a READ-ONLY SQL query (SELECT only). Returns up to 20 rows.",
            "parameters": {
                "type": "object",
                "properties": {
                    "sql": {
                        "type": "string",
                        "description": "The SQL SELECT query to execute. MUST be a SELECT statement."
                    }
                },
                "required": ["sql"]
            }
        }
    }
]

Notice the description fields — they guide the model's behavior. "Call this first to discover what data is available" teaches the model to explore before querying.

The Implementations with Safety Constraints
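The snippets below assume a get_db_connection() helper that isn't shown in this post. A minimal sketch using psycopg2 and environment variables might look like this (the variable names and defaults are assumptions — adjust to your setup):

```python
import os

def get_db_connection():
    # psycopg2 is imported lazily so this module still loads in
    # environments where the driver isn't installed.
    import psycopg2
    return psycopg2.connect(
        host=os.getenv("DB_HOST", "localhost"),
        port=int(os.getenv("DB_PORT", "5432")),
        dbname=os.getenv("DB_NAME", "postgres"),
        user=os.getenv("DB_USER", "postgres"),
        password=os.getenv("DB_PASSWORD", ""),
    )
```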

import psycopg2

def list_tables() -> str:
    conn = get_db_connection()
    try:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT table_name FROM information_schema.tables
                WHERE table_schema = 'public' ORDER BY table_name;
            """)
            tables = [row[0] for row in cur.fetchall()]
            return json.dumps({"tables": tables, "count": len(tables)})
    finally:
        conn.close()


def describe_table(table_name: str) -> str:
    conn = get_db_connection()
    try:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT column_name, data_type, is_nullable, column_default
                FROM information_schema.columns
                WHERE table_schema = 'public' AND table_name = %s
                ORDER BY ordinal_position;
            """, (table_name,))
            columns = [
                {"name": row[0], "type": row[1], "nullable": row[2], "default": str(row[3]) if row[3] else None}
                for row in cur.fetchall()
            ]
            return json.dumps({"table": table_name, "columns": columns})
    finally:
        conn.close()


import re  # used below for word-boundary keyword matching

def query_database(sql: str) -> str:
    # SAFETY: only allow SELECT queries
    cleaned = sql.strip().upper()
    if not cleaned.startswith("SELECT"):
        return json.dumps({"error": "Only SELECT queries are allowed!"})

    # Block dangerous keywords. Word boundaries matter: a plain substring
    # check would reject harmless column names like "created_at"
    # (which contains "CREATE")
    for keyword in ["INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE", "CREATE"]:
        if re.search(rf"\b{keyword}\b", cleaned):
            return json.dumps({"error": f"Blocked: query contains '{keyword}'"})

    conn = get_db_connection()
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            columns = [desc[0] for desc in cur.description]
            rows = cur.fetchmany(20)  # limit to 20 rows
            result = [dict(zip(columns, row)) for row in rows]

            # Convert non-serializable types
            for row in result:
                for key, value in row.items():
                    if not isinstance(value, (str, int, float, bool, type(None))):
                        row[key] = str(value)

            return json.dumps({"columns": columns, "rows": result, "row_count": len(result)})
    except Exception as e:
        return json.dumps({"error": str(e)})
    finally:
        conn.close()

Safety matters. The model generates SQL — you should never trust it blindly. Our query_database function:

  1. Only allows SELECT statements
  2. Blocks dangerous keywords (DROP, DELETE, etc.)
  3. Limits results to 20 rows
  4. Handles errors gracefully

The Agent Loop

This is the core pattern. The model keeps calling tools until it has enough information to answer:

def chat_with_db(user_question: str):
    messages = [
        {
            "role": "system",
            "content": "You are a helpful data analyst. Use the tools to explore the database and answer questions. Start by listing tables if you don't know the schema."
        },
        {"role": "user", "content": user_question}
    ]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
        )

        assistant_message = response.choices[0].message

        # No tool calls → model is done
        if not assistant_message.tool_calls:
            print(assistant_message.content)
            break

        # Execute each tool call
        messages.append(assistant_message)

        for tool_call in assistant_message.tool_calls:
            func_name = tool_call.function.name
            args = json.loads(tool_call.function.arguments) if tool_call.function.arguments else {}

            # Dispatch to the right function
            func = TOOL_FUNCTIONS[func_name]
            result = func(**args)

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

What Happens When You Ask a Question

Ask: "How many records are in each table?"

The model doesn't know the database schema, so it chains tools automatically:

Turn 1: Model calls list_tables()
  → Returns ["stocks", "prices", "dividends"]

Turn 2: Model calls query_database("SELECT COUNT(*) FROM stocks")
        Model calls query_database("SELECT COUNT(*) FROM prices")
        Model calls query_database("SELECT COUNT(*) FROM dividends")
  → Returns counts for each table

Turn 3: Model generates final answer:
  "The database contains 3 tables:
   - stocks: 500 records
   - prices: 125,000 records
   - dividends: 3,200 records"

The model decided on its own to first discover the tables, then count each one. That's the power of the agentic loop — the model plans its own approach.


Part 4: The Full Multi-Tool Agent

Now let's put it all together: an interactive agent with multiple tools, a dispatch registry, and persistent conversation history.

Adding Non-Database Tools

Agents aren't limited to one domain. You can mix any kind of tool:

from datetime import datetime

def get_current_time() -> str:
    now = datetime.now()
    return json.dumps({"datetime": now.isoformat(), "timezone": "local"})


def calculate(expression: str) -> str:
    # The character whitelist keeps eval() from reaching names or builtins;
    # "**" is blocked separately so a huge exponent can't hang the process.
    allowed = set("0123456789+-*/.() ")
    if not all(c in allowed for c in expression):
        return json.dumps({"error": "Invalid characters in expression"})
    if "**" in expression:
        return json.dumps({"error": "Exponentiation is not allowed"})
    try:
        result = eval(expression)
        return json.dumps({"expression": expression, "result": result})
    except Exception as e:
        return json.dumps({"error": str(e)})

The Dispatch Dictionary

This is the glue — a registry that maps function names to Python functions. No if/else chains:

TOOL_FUNCTIONS = {
    "list_tables": list_tables,
    "describe_table": describe_table,
    "query_database": query_database,
    "get_current_time": get_current_time,
    "calculate": calculate,
}

To add a new tool, you follow a simple recipe:

  1. Write the Python function
  2. Write the JSON schema (for the model)
  3. Add it to TOOL_FUNCTIONS
  4. Add the schema to the tools list

That's it. The agent loop handles the rest.
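Applying the recipe to a hypothetical word_count tool looks like this (the registry and tool list are shown empty so the snippet stands alone; in the real agent they're the ones defined earlier):

```python
import json

TOOL_FUNCTIONS = {}  # the dispatch registry from earlier (empty here)
tools = []           # the schema list the model sees (empty here)

# 1. Write the Python function
def word_count(text: str) -> str:
    return json.dumps({"words": len(text.split()), "characters": len(text)})

# 2. Write the JSON schema for the model
word_count_schema = {
    "type": "function",
    "function": {
        "name": "word_count",
        "description": "Count the words and characters in a piece of text",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "The text to analyze"}
            },
            "required": ["text"],
        },
    },
}

# 3. Register the implementation
TOOL_FUNCTIONS["word_count"] = word_count

# 4. Expose the schema to the model
tools.append(word_count_schema)
```

Four small additions and the existing agent loop can route, execute, and chain the new tool without any other changes.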

The Production Agent Loop

def run_agent(messages: list) -> str:
    max_iterations = 10  # safety limit

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
        )

        assistant_message = response.choices[0].message

        # No tool calls → model is done
        if not assistant_message.tool_calls:
            return assistant_message.content

        # Execute all tool calls
        messages.append(assistant_message)

        for tool_call in assistant_message.tool_calls:
            func_name = tool_call.function.name
            args = json.loads(tool_call.function.arguments) if tool_call.function.arguments else {}

            func = TOOL_FUNCTIONS.get(func_name)
            if func:
                result = func(**args)
            else:
                result = json.dumps({"error": f"Unknown tool: {func_name}"})

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })

    return "Agent reached maximum iterations without finishing."

Two important safety features:

  • max_iterations prevents infinite loops — if the model keeps calling tools forever, we cut it off
  • Unknown tool handling — if the model hallucinates a function name, we return an error instead of crashing

Interactive Chat with History

def main():
    system_prompt = {
        "role": "system",
        "content": (
            "You are a helpful data analyst with access to a PostgreSQL database. "
            "You can list tables, describe their structure, query data, "
            "do calculations, and check the current time."
        )
    }

    messages = [system_prompt]

    while True:
        user_input = input("\n>>> You: ").strip()
        if user_input.lower() in ("quit", "exit", "q"):
            break

        messages.append({"role": "user", "content": user_input})
        answer = run_agent(messages)
        print(f"\n<<< Agent: {answer}")
        messages.append({"role": "assistant", "content": answer})

Because messages persists across turns, the agent remembers context:

>>> You: What tables are in the database?
<<< Agent: The database has 3 tables: stocks, prices, and dividends.

>>> You: Show me the top 5 most expensive stocks.
<<< Agent: [already knows the schema from the previous turn — goes straight to querying]

The Architecture at a Glance

Every tool-calling agent follows the same pattern, regardless of complexity:

┌─────────────────────────────────────────────────────────┐
│                    TOOL DEFINITIONS                     │
│   JSON schemas that tell the model what's available     │
└──────────────────────┬──────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│                     AGENT LOOP                          │
│   while True:                                           │
│     response = model(messages + tools)                  │
│     if no tool_calls: return response  ← done           │
│     for each tool_call:                                 │
│       execute function                                  │
│       append result to messages                         │
│     loop back                                           │
└──────────────────────┬──────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│                 DISPATCH DICTIONARY                      │
│   {"func_name": actual_python_function, ...}            │
│   Routes model decisions to real code                   │
└──────────────────────┬──────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│               PYTHON FUNCTIONS                          │
│   The actual code that runs: API calls, DB queries,     │
│   calculations, file I/O — anything you can code        │
└─────────────────────────────────────────────────────────┘

Key Takeaways

  1. The model never executes code. It returns structured instructions (tool calls). Your code runs the functions locally and sends results back. This means you control what the agent can actually do.

  2. Tool calling doubles as structured output. By using tool_choice to force a specific function, you can make any LLM return reliable, schema-validated JSON — no parsing needed.

  3. The agent loop is just a while loop. Send messages + tools → check for tool calls → execute → append results → repeat. Every agent framework (LangChain, CrewAI, etc.) is built on this same loop.

  4. Safety is your responsibility. The model generates SQL and function arguments. Always validate inputs, restrict operations (SELECT only), limit result sizes, and add iteration caps.

  5. Adding tools is a recipe, not a rewrite. Write the function, write the schema, add to the dispatch dictionary. The agent loop handles routing and chaining automatically.

  6. Start simple, add complexity gradually. One tool → structured output → real database → multi-tool agent. Each step builds on the last. You don't need a framework to build a capable agent.
